<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Code-itch &#187; TinySeqXML</title>
	<atom:link href="http://www.code-itch.com/blog/category/tinyseqxml/feed/" rel="self" type="application/rss+xml" />
	<link>http://www.code-itch.com/blog</link>
	<description>A non-coder's attempts at writing useful code</description>
	<lastBuildDate>Sun, 19 Dec 2010 11:47:39 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.3.1</generator>
		<item>
		<title>XML parsing &#8211; python</title>
		<link>http://www.code-itch.com/blog/2007/04/xml-parsing-python/</link>
		<comments>http://www.code-itch.com/blog/2007/04/xml-parsing-python/#comments</comments>
		<pubDate>Thu, 12 Apr 2007 17:17:29 +0000</pubDate>
		<dc:creator>harijay</dc:creator>
				<category><![CDATA[diveintopython]]></category>
		<category><![CDATA[Excel]]></category>
		<category><![CDATA[Python]]></category>
		<category><![CDATA[TinySeqXML]]></category>
		<category><![CDATA[Uncategorized]]></category>
		<category><![CDATA[UnicodeEncodeError]]></category>

		<guid isPermaLink="false">http://codeitch.wordpress.com/2007/04/12/xml-parsing-python/</guid>
		<description><![CDATA[[youtube=http://www.youtube.com/watch?v=L-_6tiTR8v0] XML is a great way to organize information. I first learnt of the power of XML to systematize information when I used it to output a whole bunch of search results from NCBI in the Tinyseq XML format. Once I had this XML document, I could read it into Excel and then very [...]]]></description>
			<content:encoded><![CDATA[<p>[youtube=http://www.youtube.com/watch?v=L-_6tiTR8v0]</p>
<p>XML is a great way to organize information. I first learnt of the power of XML to systematize information when I used it to output a whole bunch of search results from NCBI in the <a href="http://xml.coverpages.org/NCBI-XML-Toolbox.txt">Tinyseq XML format</a>. Once I had this XML document, I could read it into Excel and then very easily analyze the information, since it was nicely laid out as an Excel sheet.</p>
<p><a href="http://harijay.wordpress.com/2006/09/28/moving-elns-offsite/">Backpackit, a service I use to take notes detailing my experimental research results</a>,<br />
outputs all of the account data in XML format. Before I can move this data elsewhere, it helps to understand the data structure. So the first task I set out to do was to parse the XML output.</p>
<p>I decided to use Python for this, because I felt using Java here would be like using an elephant to crush a fly (or whatever the expression is). Also, a lot of the data is text, and I have always used Perl to handle text. So a general basis for my codeitch will be: what I did in Perl before, I would like to do in Python now. Java will be reserved for more heavyweight tasks.</p>
<p>What I needed my program to do was:</p>
<ol>
<li>Read the XML output</li>
<li>Create objects for each element or node in the output</li>
</ol>
<p>Once I have these objects, I can imagine asking questions such as how many objects have embedded images, how many have outgoing links, and so on.</p>
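<p>Here is a minimal sketch of that kind of question, assuming Python 3 and the standard xml.dom.minidom module. The export layout (<code>notes</code>, <code>note</code>, <code>body</code>, <code>img</code>, <code>a</code>) and the helper name <code>count_notes_containing</code> are made up for illustration; the real Backpackit schema is not reproduced here.</p>

```python
from xml.dom import minidom

# Hypothetical export: the element names below are assumptions,
# not the real Backpackit schema.
SAMPLE = (
    '<notes>'
    '<note id="1"><body>Gel result <img src="gel.png"/></body></note>'
    '<note id="2"><body>See <a href="http://example.com/protocol">protocol</a></body></note>'
    '<note id="3"><body>Plain text only</body></note>'
    '</notes>'
)


def count_notes_containing(document, note_tag, child_tag):
    """Count note_tag elements that have at least one child_tag descendant."""
    notes = document.getElementsByTagName(note_tag)
    # getElementsByTagName returns an empty (falsy) list when nothing matches.
    return sum(1 for note in notes if note.getElementsByTagName(child_tag))


document = minidom.parseString(SAMPLE)
notes_with_images = count_notes_containing(document, "note", "img")
notes_with_links = count_notes_containing(document, "note", "a")
print(notes_with_images, notes_with_links)
```

<p>Each DOM element already behaves like an object with its own children, so for simple counting there may be no need to build separate classes first.</p>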
<p>The <a href="http://www.diveintopython.org/">&#8220;Dive Into Python&#8221;</a> book gave me a quick introduction to the <a href="http://www.diveintopython.org/xml_processing/parsing_xml.html">xml.dom package</a>. I then ran into some encoding (codec) issues and learnt all about <a href="http://www.reportlab.com/i18n/python_unicode_tutorial.html">&#8220;utf8&#8221; and &#8220;iso8859&#8221; character encoding</a>. Once I learnt <a href="http://wiki.wxpython.org/index.cgi/UnicodeEncodeError">how to handle the UnicodeEncodeError</a>, I had a full-fledged three-line program that parsed my input file, created the document object and, as proof of successful parsing, printed my XML file back out.</p>
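<p>A rough sketch of what such a parse-and-print-back program can look like, assuming Python 3 and xml.dom.minidom. This is not the exact program from the screencast: the sample document and the function name <code>parse_and_echo</code> are invented for illustration. The idea is to ask for an explicit output encoding, so the serialized document comes back as bytes rather than relying on an implicit ASCII conversion of non-ASCII text.</p>

```python
from xml.dom import minidom

# Hypothetical sample standing in for the exported XML file;
# it contains non-ASCII characters on purpose.
SAMPLE = (
    '<?xml version="1.0" encoding="UTF-8"?>'
    '<notes><note>caf\u00e9 r\u00e9sum\u00e9</note></notes>'
)


def parse_and_echo(xml_text):
    """Parse XML text into a DOM document and serialize it back as UTF-8 bytes."""
    document = minidom.parseString(xml_text.encode("utf-8"))
    # With an explicit encoding, toxml() returns bytes instead of text.
    return document, document.toxml(encoding="utf-8")


document, echoed = parse_and_echo(SAMPLE)
# Printing the bytes object shows escaped UTF-8 bytes, which is safe
# even on a terminal that cannot display the accented characters.
print(echoed)
```

<p>To read from a file instead, <code>minidom.parse</code> accepts a filename or an open file object in place of the in-memory string used here.</p>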
<p>The screencast above documents my travails.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.code-itch.com/blog/2007/04/xml-parsing-python/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
	</channel>
</rss>

