Rhapsody, a subscription music service I've used for the last couple of years, makes a lot of their data available through RSS feeds and XML. For a project I'm working on, I wanted to parse one of their RSS feeds with Ruby, and I thought I'd share how I did it.

The Rhapsody feeds are RSS 2.0 but also include Rhapsody-specific elements in the 'rhap' namespace, so I needed a parser that could be extended to handle these elements. Ruby includes a library for parsing RSS, but after reading its source I wasn't sure how to extend it. Fortunately, some searching turned up the Syndication library, which its author designed to be easily extended. After running 'gem install syndication', you're all set to begin coding.

The Syndication parser works by defining objects that map (roughly) to elements in the RSS document. As the parser works through an element, it maps the child elements it finds to attributes on the corresponding object, prepending the namespace (if any) followed by an underscore. For example, rhap:rcid becomes rhap_rcid, and if the object has an attr_accessor named rhap_rcid, it gets set to the value of the rhap:rcid element. I wanted to extend the Syndication::RSS::Item class, so I started by defining a module called Syndication::Rhapsody::Item with accessors for all the properties I was interested in. With that defined, it was simply a matter of including this module in the Syndication::RSS::Item class:

```ruby
# Save as syndication/rhapsody.rb somewhere on the load path; the reader
# script requires it as 'syndication/rhapsody'.
require 'syndication/rss'

module Syndication
  module Rhapsody
    module Item
      # Accessors for elements in the 'rhap' namespace. The parser sets
      # rhap_rcid from <rhap:rcid>, and so on.
      attr_accessor :rhap_rcid
      attr_accessor :rhap_artist
      attr_accessor :rhap_artist_rcid
      attr_accessor :rhap_album
      attr_accessor :rhap_album_rcid
      attr_accessor :rhap_album_art
      attr_accessor :rhap_album_release_date
      attr_accessor :rhap_album_original_release_date
      attr_accessor :rhap_album_type

      # Need to override the tag2method defined in class Container because it
      # doesn't deal with tags with dashes in them. Ruby can't handle method
      # names with dashes, so we switch them (and the namespace colon) to
      # underscores.
      def tag2method(tag)
        tag.downcase.gsub(/[:-]/, '_') + '='
      end
    end
  end

  module RSS
    class Item
      include Rhapsody::Item
    end
  end
end
```
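The mapping rule described above can be illustrated without the library at all. This is a minimal sketch under stated assumptions, not Syndication's actual code: `SampleItem` and `assign_element` are hypothetical names, and the sketch assumes the parser builds a setter name by replacing the namespace colon with an underscore and calls that setter only if the object responds to it.

```ruby
# Hypothetical stand-in for an RSS item object; not a class from the Syndication gem.
class SampleItem
  attr_accessor :title
  attr_accessor :rhap_rcid
end

# Hypothetical helper showing the assumed mapping rule: "rhap:rcid" becomes the
# setter "rhap_rcid=", which is called only if the object actually defines it.
def assign_element(obj, tag, text)
  setter = tag.downcase.sub(':', '_') + '='
  return false unless obj.respond_to?(setter)

  obj.public_send(setter, text)
  true
end

item = SampleItem.new
assign_element(item, 'rhap:rcid', 'example-rcid')  # sets item.rhap_rcid
assign_element(item, 'rhap:mystery', 'ignored')    # no rhap_mystery= accessor, so skipped
puts item.rhap_rcid                                # prints example-rcid
```

The `respond_to?` check is what makes unrecognized elements harmless: anything without a matching accessor is simply skipped, which is why adding accessors through a mixin is enough to start capturing the 'rhap' data.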

You'll notice that I also ended up overriding the tag2method method. Ruby (quite sensibly) doesn't allow hyphens in method names, so elements in the 'rhap' namespace with a hyphen in their names were getting ignored. To fix that, my tag2method replaces hyphens, as well as the namespace colon, with underscores.
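The override takes effect because `include` inserts a module into a class's ancestor chain between the class and its superclass, so a method defined in the module wins over one inherited from the superclass (but not over one the class defines itself). The sketch below reproduces that arrangement with simplified, hypothetical stand-ins: `Container`, `Item` and `RhapsodyItem` are not the Syndication source, the base `tag2method` is an assumed version that only converts the colon, and `rhap:album-art` is an assumed example of a hyphenated tag.

```ruby
# Simplified stand-in for the library's Container class; the real Syndication
# implementation may differ. This version only converts the namespace colon.
class Container
  def tag2method(tag)
    tag.downcase.sub(':', '_') + '='
  end
end

# Stand-in for Syndication::RSS::Item, which inherits tag2method from Container.
class Item < Container
end

# Stand-in for Syndication::Rhapsody::Item: converts both colons and hyphens.
module RhapsodyItem
  def tag2method(tag)
    tag.downcase.gsub(/[:-]/, '_') + '='
  end
end

puts Container.new.tag2method('rhap:album-art') # prints rhap_album-art= (hyphen survives)

# Reopen the class and mix the module in, as the post does.
class Item
  include RhapsodyItem
end

p Item.ancestors.take(3)                   # prints [Item, RhapsodyItem, Container]
puts Item.new.tag2method('rhap:album-art') # prints rhap_album_art=
```

If `Item` defined its own `tag2method`, the module's version would be shadowed and you would need `prepend` (Ruby 2.0+) instead of `include`.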

With Syndication::RSS::Item extended, it's now just a matter of creating the parser and reading the feed:

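```ruby
require 'open-uri'
require 'syndication/rhapsody' # the module defined above

class RhapsodyReader
  def initialize(url = 'http://feeds.rhapsody.com/new-releases.rss')
    @url = url
  end

  def read
    parser = Syndication::RSS::Parser.new
    # URI.open is the open-uri entry point on Ruby 2.5+; since Ruby 3.0,
    # Kernel#open no longer accepts URLs.
    content = URI.open(@url) { |s| s.read }
    @feed = parser.parse(content)

    @feed.items.each do |item|
      # The accessor is rhap_rcid, matching Syndication::Rhapsody::Item.
      puts "Got rcid '#{item.rhap_rcid}'"
    end
  end
end

# Usage: RhapsodyReader.new.read
```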

For more information on the Rhapsody web services, see http://webservices.rhapsody.com/. As I mentioned above, Rhapsody also makes some of their data available as non-RSS XML, and I'll write about using REXML to parse it in the future.
