The Python Web services developer: RSS for Python
By Mike Olson and Uche Ogbuji - 2004-01-14

RSS.py

Mark Nottingham's RSS.py is a Python library for RSS processing. It is very complete and well-written. It requires Python 2.2 and PyXML 0.7.1. Installation is easy; just download the Python file from Mark's home page and copy it to somewhere in your PYTHONPATH.

Most users of RSS.py need only concern themselves with two classes it provides: CollectionChannel and TrackingChannel. The latter seems the more useful of the two. TrackingChannel is a data structure that contains all the RSS data indexed by the key of each item. CollectionChannel is a similar data structure, but organized more as RSS documents themselves are, with the top-level channel information pointing to the item details using hash values for the URLs. You will probably use the utility namespace declarations in the RSS.ns structure. Listing 1 is a simple script that downloads and parses an RSS feed for Python news, and prints out all the information from the various items in a simple listing.

Listing 1


  
from RSS import ns, CollectionChannel, TrackingChannel

#Create a tracking channel, which is a data structure that
#Indexes RSS data by item URL
tc = TrackingChannel()

#Returns the RSSParser instance used, which can usually be ignored
tc.parse("http://www.python.org/channews.rdf")

RSS10_TITLE = (ns.rss10, 'title')
RSS10_DESC = (ns.rss10, 'description')

#You can also use tc.keys()
items = tc.listItems()
for item in items:
    #Each item is a (url, order_index) tuple
    url = item[0]
    print "RSS Item:", url
    #Get all the data for the item as a Python dictionary
    item_data = tc.getItem(item)
    print "Title:", item_data.get(RSS10_TITLE, "(none)")
    print "Description:", item_data.get(RSS10_DESC, "(none)")



We start by creating a TrackingChannel instance and populating it with data parsed from the RSS feed at http://www.python.org/channews.rdf. RSS.py uses tuples as the property names for RSS data. This may seem unusual to those not used to XML processing techniques, but it is a useful way of being precise about what was in the original RSS file: each tuple pairs a namespace with an element name, so an RSS 0.91 title element is not considered equivalent to an RSS 1.0 one. An application that does not care about this distinction can ignore the namespace portion of each tuple, but the basic API stays wedded to the syntax of the original RSS file so that this information is not lost. In the code, we use this property data to gather all the items from the news feed for display. Notice that we are careful not to assume which properties any particular item might have. We retrieve properties using the safe form:



    print "Title:", item_data.get(RSS10_TITLE, "(none)")

This form provides a default value if the property is not found. The plain dictionary lookup below does not, and raises a KeyError when the property is missing:



    print "Title:", item_data[RSS10_TITLE]

This precaution is necessary because you cannot know in advance which elements a given RSS feed uses. Listing 2 shows the output from Listing 1.
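As a small illustration of the tuple-keyed lookups described above, the following sketch uses plain dictionaries as stand-ins for the item data that RSS.py returns. It is written for Python 3 rather than the Python 2.2 of the listings, and it does not import RSS.py. The NO_NS placeholder for elements without a namespace and the get_by_local_name helper are assumptions made for this example, not part of the RSS.py API.

```python
# Stand-in data only: plain dictionaries shaped like the item data in Listing 1.
# Nothing here imports or reproduces RSS.py.

RSS10_NS = "http://purl.org/rss/1.0/"  # the RSS 1.0 namespace URI
NO_NS = None  # assumed placeholder for "no namespace" (for example RSS 0.91)

RSS10_TITLE = (RSS10_NS, "title")
RSS10_DESC = (RSS10_NS, "description")

# Two hypothetical items carrying the same title under different namespaces.
rss10_item = {RSS10_TITLE: "Python 2.2.2b1"}
rss091_item = {(NO_NS, "title"): "Python 2.2.2b1"}


def get_by_local_name(item_data, local_name, default="(none)"):
    """Return the first value whose key has this local name, in any namespace."""
    for (namespace, name), value in item_data.items():
        if name == local_name:
            return value
    return default


# Safe form: a missing property falls back to the default.
print("Description:", rss10_item.get(RSS10_DESC, "(none)"))

# Unsafe form: a missing property raises KeyError.
try:
    print("Description:", rss10_item[RSS10_DESC])
except KeyError:
    print("Description: lookup raised KeyError")

# The exact tuple does not match an item from a different RSS version...
print("Title (exact key):", rss091_item.get(RSS10_TITLE, "(none)"))

# ...but an application may choose to ignore the namespace portion.
print("Title (any namespace):", get_by_local_name(rss091_item, "title"))
```

Matching on the exact tuple keeps the RSS versions distinct, while the helper shows one way an application could deliberately blur that distinction.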

Listing 2



$ python listing1.py 
RSS Item: http://www.python.org/2.2.2/
Title: Python 2.2.2b1
Description: (none)
RSS Item: http://sf.net/projects/spambayes/
Title: spambayes project
Description: (none)
RSS Item: http://www.mems-exchange.org/software/scgi/
Title: scgi 0.5
Description: (none)
RSS Item: http://roundup.sourceforge.net/
Title: Roundup 0.4.4
Description: (none)
RSS Item: http://www.pygame.org/
Title: Pygame 1.5.3
Description: (none)
RSS Item: http://www.cosc.canterbury.ac.nz/~greg/python/Pyrex/
Title: Pyrex 0.4.4.1
Description: (none)
RSS Item: http://www.tundraware.com/Software/hb/
Title: hb 1.88
Description: (none)
RSS Item: http://www.tundraware.com/Software/abck/
Title: abck 2.2
Description: (none)
RSS Item: http://www.terra.es/personal7/inigoserna/lfm/
Title: lfm 0.9
Description: (none)
RSS Item: http://www.tundraware.com/Software/waccess/
Title: waccess 2.0
Description: (none)
RSS Item: http://www.krause-software.de/jinsitu/
Title: JinSitu 0.3
Description: (none)
RSS Item: http://www.alobbs.com/pykyra/
Title: PyKyra 0.1.0
Description: (none)
RSS Item: http://www.havenrock.com/developer/treewidgets/index.html
Title: TreeWidgets 1.0a1
Description: (none)
RSS Item: http://civil.sf.net/
Title: Civil 0.80
Description: (none)
RSS Item: http://www.stackless.com/
Title: Stackless Python Beta
Description: (none)

Of course, you should expect somewhat different output, because the news items will have changed by the time you try it. The RSS.py channel objects also provide methods for adding and modifying RSS information, and you can write the result back out in RSS 1.0 format using the output() method. Try this by writing back out the information parsed in Listing 1. Start the script in interactive mode by running python -i listing1.py. At the resulting Python prompt, run the following:




>>> result = tc.output(items)
>>> print result

The result is an RSS 1.0 document, printed to the console. This requires RSS.py version 0.42 or later; the output() method in earlier versions has a bug.
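To give a rough idea of what an RSS 1.0 document built from tuple-keyed item data looks like, here is a minimal sketch that uses only the Python 3 standard library. It is not how RSS.py implements output(), and it covers only a simplified subset of RSS 1.0. The build_rss10 and qname functions, the channel URL, and the channel title are hypothetical names and values chosen for this example; the item URLs and titles are taken from Listing 2.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
RSS10_NS = "http://purl.org/rss/1.0/"

# Serialize with the prefixes RSS 1.0 documents conventionally use.
ET.register_namespace("rdf", RDF_NS)
ET.register_namespace("", RSS10_NS)


def qname(key):
    """Turn a (namespace, local name) tuple into ElementTree's {ns}name form."""
    namespace, local_name = key
    return "{%s}%s" % (namespace, local_name)


def build_rss10(channel_url, channel_data, items):
    """Build a simplified RSS 1.0 document (hypothetical helper, not RSS.py).

    channel_data maps (namespace, name) tuples to strings.
    items is a list of (url, item_data) pairs with the same kind of mapping.
    """
    about = qname((RDF_NS, "about"))
    root = ET.Element(qname((RDF_NS, "RDF")))

    channel = ET.SubElement(root, qname((RSS10_NS, "channel")), {about: channel_url})
    for key, value in channel_data.items():
        ET.SubElement(channel, qname(key)).text = value

    # The channel lists its items in an rdf:Seq table of contents.
    toc = ET.SubElement(channel, qname((RSS10_NS, "items")))
    seq = ET.SubElement(toc, qname((RDF_NS, "Seq")))

    for url, item_data in items:
        ET.SubElement(seq, qname((RDF_NS, "li")), {"resource": url})
        item = ET.SubElement(root, qname((RSS10_NS, "item")), {about: url})
        for key, value in item_data.items():
            ET.SubElement(item, qname(key)).text = value

    return ET.tostring(root, encoding="unicode")


RSS10_TITLE = (RSS10_NS, "title")
RSS10_LINK = (RSS10_NS, "link")

document = build_rss10(
    "http://www.python.org/",  # placeholder channel URL for this example
    {RSS10_TITLE: "Python news (example)"},  # made-up channel title
    [
        ("http://www.python.org/2.2.2/",
         {RSS10_TITLE: "Python 2.2.2b1", RSS10_LINK: "http://www.python.org/2.2.2/"}),
        ("http://www.pygame.org/",
         {RSS10_TITLE: "Pygame 1.5.3", RSS10_LINK: "http://www.pygame.org/"}),
    ],
)
print(document)
```

Because the element names are kept as (namespace, name) tuples all the way through, the same dictionaries could carry properties from other namespaces without any change to the serialization code.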




First published by IBM developerWorks


Article copyright and all rights retained by the author.