- by Joel
- 11/06/2007
- Programming, Ruby
Rhapsody, a subscription music service I've used for the last couple of years, makes a lot of their data available through RSS feeds and XML. For a project I'm working on I wanted to parse one of their RSS feeds with Ruby, and I thought I'd share how I did it.
The Rhapsody feeds are RSS 2.0 but also include Rhapsody-specific elements in the 'rhap' namespace, so I needed a parser that could be extended to handle these elements. Ruby includes a library for parsing RSS, but after reading the source I wasn't sure how to extend it. Fortunately, some searching turned up the Syndication library, which the author designed to be easily extended. After doing a 'gem install syndication' you're all set to begin coding.
The Syndication parser works by defining objects that map (roughly) to elements in the RSS document. As it parses, it maps each element's tag to an object attribute, replacing the colon after the namespace (if any) with an underscore. For example, rhap:rcid becomes rhap_rcid, and if an attr_accessor named rhap_rcid exists on the object, it is set to the value of rhap:rcid. I wanted to extend the Syndication::RSS::Item class, so I started by defining a module called Syndication::Rhapsody::Item with all the properties I was interested in. With that defined, it was simply a matter of including the module in the Syndication::RSS::Item class:
```ruby
require 'syndication/rss'

module Syndication
  module Rhapsody
    module Item
      attr_accessor :rhap_rcid
      attr_accessor :rhap_artist
      attr_accessor :rhap_artist_rcid
      attr_accessor :rhap_album
      attr_accessor :rhap_album_rcid
      attr_accessor :rhap_album_art
      attr_accessor :rhap_album_release_date
      attr_accessor :rhap_album_original_release_date
      attr_accessor :rhap_album_type

      # Need to override the tag2method defined in class Container because it
      # doesn't deal with tags with dashes in them. Ruby can't handle method
      # names with dashes so we switch to underscores.
      def tag2method(tag)
        return tag.downcase.gsub(/[:-]/, '_') + '='
      end
    end
  end

  module RSS
    class Item
      include Rhapsody::Item
    end
  end
end
```
You'll notice that I also ended up overriding the tag2method method. Ruby (quite sensibly) doesn't allow hyphens in method or variable names, so the elements in the 'rhap' namespace whose names contain a hyphen were getting ignored. To fix that, I simply had tag2method substitute an underscore for each hyphen, as well as for the namespace colon.
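To make the effect of the override concrete, here is a minimal, self-contained sketch of the idea. It does not use the Syndication library: SketchContainer, SketchItem, SketchRhapsodyItem, and the store method are hypothetical stand-ins invented for illustration, and the default tag2method shown is an assumption about how a colon-only mapping would behave, not the library's actual source.

```ruby
# Hypothetical stand-in for a parser container class (not Syndication's code).
class SketchContainer
  # Assumed default mapping: only the namespace colon becomes an underscore.
  def tag2method(tag)
    tag.downcase.tr(':', '_') + '='
  end

  # Assign the value if a matching setter exists; otherwise ignore the tag.
  def store(tag, value)
    setter = tag2method(tag)
    return false unless respond_to?(setter)

    public_send(setter, value)
    true
  end
end

class SketchItem < SketchContainer
  attr_accessor :title
end

# Mixin playing the role of Syndication::Rhapsody::Item.
module SketchRhapsodyItem
  attr_accessor :rhap_rcid, :rhap_album_art

  # Map both ':' and '-' to '_' so hyphenated tags find their setters.
  def tag2method(tag)
    tag.downcase.gsub(/[:-]/, '_') + '='
  end
end

# Before the mixin, the hyphenated tag maps to "rhap_album-art=",
# which no object responds to, so the value is silently dropped.
ignored = SketchItem.new.store('rhap:album-art', 'cover.jpg')
puts "stored before mixin? #{ignored}"

# Reopen the class and include the module, as the post does for
# Syndication::RSS::Item. The module now sits ahead of the superclass
# in the method lookup chain, so its tag2method wins.
class SketchItem
  include SketchRhapsodyItem
end

item = SketchItem.new
item.store('rhap:rcid', 'alb.123')
item.store('rhap:album-art', 'cover.jpg')
puts item.rhap_rcid
puts item.rhap_album_art
```

The override works because an included module is searched after the class itself but before its superclass, so a method defined in the module shadows the inherited one without touching the library's source.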
With Syndication::RSS::Item modified, it's now simply a matter of creating the parser and reading the feed:
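```ruby
require 'open-uri'
# The module defined above, assumed to be saved as syndication/rhapsody.rb
# on the load path; it requires 'syndication/rss' itself.
require 'syndication/rhapsody'

class RhapsodyReader
  def initialize(url = 'http://feeds.rhapsody.com/new-releases.rss')
    @url = url
  end

  def read
    parser = Syndication::RSS::Parser.new
    @feed = nil

    # URI.open comes from open-uri; a bare open(@url) only works on Ruby < 3.0.
    URI.open(@url) do |s|
      content = s.read
      @feed = parser.parse content
    end

    @feed.items.each do |item|
      # The accessor keeps its namespace prefix: rhap:rcid maps to rhap_rcid.
      puts "Got rcid '#{item.rhap_rcid}'"
    end
  end
end
```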
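Once the items carry the rhap_ accessors, it can be handy to gather them into a plain hash for the rest of a project. The following is only a sketch: rhapsody_fields, RHAPSODY_FIELDS, and FakeItem are hypothetical names, and FakeItem is a Struct standing in for a parsed item so the example runs without the gem or a network connection. The field names are the accessors defined in the module earlier.

```ruby
# Accessor names taken from the Syndication::Rhapsody::Item module above.
RHAPSODY_FIELDS = %i[
  rhap_rcid rhap_artist rhap_artist_rcid rhap_album rhap_album_rcid
  rhap_album_art rhap_album_release_date rhap_album_original_release_date
  rhap_album_type
].freeze

# Collect whichever Rhapsody fields an item actually carries into a hash,
# skipping accessors the item lacks and values that were never set.
def rhapsody_fields(item)
  RHAPSODY_FIELDS.each_with_object({}) do |field, fields|
    next unless item.respond_to?(field)

    value = item.public_send(field)
    fields[field] = value unless value.nil?
  end
end

# Stand-in for a parsed feed item (hypothetical, for illustration only).
FakeItem = Struct.new(:rhap_rcid, :rhap_artist, :rhap_album)

sample = FakeItem.new('alb.123', 'Example Artist', nil)
p rhapsody_fields(sample)
```

With a real feed, the same helper could be called on each element of @feed.items inside RhapsodyReader#read, assuming the items respond to the accessors as described.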
For more information on the Rhapsody web service, see http://webservices.rhapsody.com/. As I mentioned above, Rhapsody also makes some of their data available as non-RSS XML, and I'll write about using REXML to parse it in the future.