Mark Pilgrim offers another module for RSS file parsing. It doesn't provide all the features and options that RSS.py does, but it does offer a very liberal parser, which deals well with all the confusing diversity in the world of RSS. To quote from the rssparser.py page:
You see, most RSS feeds suck. Invalid characters, unescaped ampersands (Blogger feeds), invalid entities (Radio feeds), unescaped and invalid HTML (The Register's feed most days). Or just a bastardized mix of RSS 0.9x elements with RSS 1.0 elements (Movable Type feeds).
Then there are feeds, like Aaron's feed, which are too bleeding edge. He puts an excerpt in the description element but puts the full text in the content:encoded element (as CDATA). This is valid RSS 1.0, but nobody actually uses it (except Aaron), few news aggregators support it, and many parsers choke on it. Other parsers are confused by the new elements (guid) in RSS 0.94 (see Dave Winer's feed for an example). And then there's Jon Udell's feed, with the
fullitemelement that he just sort of made up.
It's funny to consider this in the light of the fact that XML and Web services are supposed to increase interoperability. Anyway, rssparser.py is designed to deal with all the madness.
Installing rssparser.py is also very easy. You download the Python
file (see Resources), rename it from "rssparser.py.txt" to
"rssparser.py", and copy it to your
PYTHONPATH. I also
suggest getting the optional timeoutsocket module which improves the
timeout behavior of socket operations in Python, and thus can help
getting RSS feeds less likely to stall the application thread in case
As you can see, the code is much simpler. The trade-off between RSS.py and rssparser.py is largely that the former has more features, and maintains more syntactic information from the RSS feed. The latter is simpler, and a more forgiving parser (the RSS.py parser only accepts well-formed XML).
The output should be the same as in Listing 2.