When I think about writing or modifying software without some way of knowing if my changes have had unintended consequences it makes me nervous - nervous in the way a pilot might feel if he were asked to fly a plane with no instrument cluster. Yet, on the other hand, when I spend more time fixing/maintaining/debugging tests than doing feature development it makes me think there has got to be a better way - the instrument panel exists for the benefit of the pilot, not vice versa.

At work we run a pretty wide variety of tests against our software - JUnit, JsUnit, and Selenium tests that span everything from short method-level tests to large functional tests that start up the server with a mock environment. The short JUnit tests have never given me a problem and they run very quickly, so they have a high utility/maintenance ratio. The Selenium tests, on the other hand, take somewhere in the range of an hour to run, seem to be somewhat flaky, and are difficult to troubleshoot. Most of the problem is not with the Selenium part, per se, but with the way the tests are run as part of an automated build and the fact that server start-up takes a long time. The overall result, though, is that the utility/maintenance ratio is very low. It's frustrating to spend a lot of time maintaining something that isn't part of the core product and isn't actively improving the end-user's experience - especially when it keeps me from working on things that would.

As a result, I've come to the conclusion that when the utility/maintenance ratio for a test suite drops below some value N you should consider it broken and take steps to fix it. I'm not sure what the specific fix is, but you should either decrease the amount of time you have to spend in maintenance or increase the utility of the tests dramatically. Since it's easier to measure time spent in maintenance I'd start there. The utility of a test suite tends to be binary - it either catches a problem or it doesn't.

I'm not big on New Year's resolutions. First, it seems to me like people who get one chance a year to make a resolution are at a disadvantage compared to those who can introspect at any point in the year and decide that "Starting now I want to ________." Secondly, a resolution seems like a thing that is more often broken than maintained, and once broken left by the wayside until the next opportunity to make (and possibly break) it again. For these reasons I prefer goals. A goal can be set at any time and you never 'break' your goal. You may not have achieved it yet, but that doesn't imply that you've stopped trying or somehow missed your opportunity to achieve it.

While thinking about the past year I realized that I learned several new programming languages but didn't really accomplish anything significant in any of them. I learned Ruby (and Rails), Python for work, and spent some time looking at Scala (which I really like), and have recently started learning Objective-C and Cocoa. My goal, starting now, is to stick with a language until I've accomplished something significant in the language. I'm not sure what I mean by significant yet - possibly something that can be used by others - but when I accomplish it I'll know.

Rhapsody, a subscription music service I've used for the last couple years, makes a lot of their data available through RSS feeds and XML. For a project I'm working on I wanted to parse one of their RSS feeds with Ruby and I thought I'd share how I did it.

The Rhapsody feeds are RSS 2.0 but also include Rhapsody-specific elements in the 'rhap' namespace, so it was necessary to use a parser that could be extended to parse these elements. Ruby includes a library for parsing RSS, but after reading the source I wasn't sure how to extend it. Fortunately, some searching turned up the Syndication library, which the author designed to be easily extended. After doing a 'gem install syndication' you're all set to begin coding.

The Syndication parser works by defining objects that map (roughly) to elements in the RSS document. When the parser is parsing an element it maps element attributes to object attributes with namespace (if any) prepended to the object attribute with an underscore. For example, the attribute rhap:rcid would become rhap_rcid and if an attr_accessor named rhap_rcid existed on the object it would get set to the value of the rhap:rcid attribute. I wanted to extend the Syndication::RSS::Item class so I started out by defining a module called Syndication::Rhapsody::Item that had all the properties I was interested in. Then, with that defined, it was simply a matter of including this module in the Syndication::RSS::Item class:

require 'syndication/rss'

module Syndication
  module Rhapsody
    module Item
      attr_accessor :rhap_rcid
      attr_accessor :rhap_artist
      attr_accessor :rhap_artist_rcid
      attr_accessor :rhap_album
      attr_accessor :rhap_album_rcid
      attr_accessor :rhap_album_art
      attr_accessor :rhap_album_release_date
      attr_accessor :rhap_album_original_release_date
      attr_accessor :rhap_album_type
      
      # Need to override the tag2method defined in class Container because it
      # doesn't deal with tags with dashes in them. Ruby can't handle method
      # names with dashes so we switch to underscores.
      def tag2method(tag)
        return tag.downcase.gsub(/[:-]/, '_') + '='
      end
    end
  end
  
  module RSS
    class Item
      include Rhapsody::Item
    end
  end
end

You'll notice that I also ended up overriding the definition of the tag2method method. This is because Ruby doesn't allow hyphens in method names (quite sensibly), so the elements in the 'rhap' namespace that had a hyphen in them were getting ignored. To fix that I simply had tag2method substitute an underscore for a hyphen.
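
To see the mapping in isolation, here is a minimal sketch that runs without the Syndication gem. The tag2method body is the same as the override above; the FakeItem class and the 'rhap:album-art' tag name are stand-ins I made up for illustration, not part of the library or necessarily of the feed.

```ruby
# Same transformation as the tag2method override above, pulled out into a
# standalone function so it can be tried without the Syndication gem.
def tag2method(tag)
  tag.downcase.gsub(/[:-]/, '_') + '='
end

# Hypothetical stand-in for an item class; only the accessor matters here.
class FakeItem
  attr_accessor :rhap_album_art
end

item = FakeItem.new
setter = tag2method('rhap:album-art') # colon and hyphen both become underscores

# Only call the setter if a matching accessor exists; unknown tags are skipped.
item.send(setter, 'cover.jpg') if item.respond_to?(setter)
puts item.rhap_album_art
```

Without the hyphen substitution the generated name would be rhap_album-art=, which no attr_accessor can match, so the element would be silently dropped.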

With the Syndication::RSS::Item modified it's now simply a matter of creating the parser and reading the feed:

require 'syndication/rss'
require 'open-uri'
require 'syndication/rhapsody'

class RhapsodyReader
  def initialize(url = 'http://feeds.rhapsody.com/new-releases.rss')
    @url = url
  end

  def read
    parser = Syndication::RSS::Parser.new
    @feed = nil
    open(@url) do |s|
      content = s.read
      @feed = parser.parse content
    end
    
    @feed.items.each do |item|
      puts "Got rcid '#{item.rhap_rcid}'"
    end
  end
end

For more information on the Rhapsody web-service you can go here: http://webservices.rhapsody.com/. As I mentioned above, Rhapsody also makes some of their data available as non-RSS XML and I'll write about using REXML to parse it in the future.

I have a few programming related things I'd like to do or am in the process of doing and I thought I'd write them down here to keep track of them.

  1. Learn Scala. This is partly motivated by my desire to use the lift framework and partly by interest in the language itself. I used to shudder involuntarily when I heard the word 'functional' (unpleasant flashbacks of YAPL-to-C translators written in LISP in college) but the concept of a functional/OO hybrid language seems like a good way to restore my relationship with my highly-parenthesized brethren.
  2. Write a JSON serializer in Java. I think there are probably a couple different approaches that I can take and I'm curious which is best. I might also try to write a deserializer. This is a project I'm undertaking for informational purposes and not to use the actual code. I might also try to write a Scala implementation just to see how they differ.
  3. Learn Python. This is something I need to do for work. I've heard a lot of good things about Python so I'm looking forward to working with it. This is more of a passive learning since I'll learn as much as I need to know to do what I need to do. Probably not the best way to learn a language, but for now it'll have to suffice.
  4. Write a JavaScript parser that will let me use Java style class and interface definitions and then convert it to JavaScript on the fly. I hate the way JavaScript does classes and member access and I want some way to clearly define what my objects look like and behave like. I would love to be able to write:
    
    package com.joelpm.widgets;
    
    public class WidgetFoo {
      protected var width;
      protected var height;
    
      public WidgetFoo(var width, var height) {
        this.width = width;
        this.height = height;
      }
    
      public var getWidth() { return width; }
      public var getHeight() { return height; }
    
      public var move(var xDist, var yDist) {
        // do move...
      }
    }
    
    
    Instead of having the equivalent JavaScript that might look something like this:
    
    var com = {};
    com.joelpm = {};
    com.joelpm.widgets = {};
    
    com.joelpm.widgets.WidgetFoo = function(width, height) {
      this.width = width;
      this.height = height;
    }
    
    com.joelpm.widgets.WidgetFoo.prototype.getHeight = function() {
      return this.height;
    }
    
    com.joelpm.widgets.WidgetFoo.prototype.getWidth = function() {
      return this.width;
    }
    
    com.joelpm.widgets.WidgetFoo.prototype.move = function(xDist, yDist) {
      // do move...
    }
    
    
    Clearly, the JS parser would just convert the much nicer looking Java style class declaration syntax into the equivalent JS syntax and evaluate it, but development would be much nicer and the code would be clearer to both myself and anyone else who has to read it. I know there are JS compilers like Google's GWT that will turn actual Java into JS, but I'm looking for something that doesn't require a compile cycle to pick up changes. Anyway, this is probably something I'll never get around to, but the thought was prompted by a discussion I had yesterday.
  5. In theory I also want to keep working on my RoR picture application. I've wanted an application that would let me upload all my pictures to a directory on my webhost via FTP and then process them all with a web-interface. Once they're all uploaded I want to go to the web-interface, type in the directory name, and click a button. The app should then find all new pictures, load the details (EXIF, size, etc) into a DB, create thumbnails, and create a default album. From there I want to be able to add the photos to custom albums and set privacy. I've got some basic code to read a dir and insert photos into a DB but there's a lot left to do. Recently this has been less important than learning Scala, but I may bounce back and forth between the two.

Looking back over my list I guess I've got enough to keep me busy for a while, and that's not counting the non-programming projects I'd also like to tackle at some point, like setting up my Ubuntu workstation as a wireless router that connects to the internet using my EVDO card. So much to do, so little time...

Last night I created a new Ruby-on-Rails project and got a Subversion repository set up for it so that I could hack away on my 'fun project' whenever I had a few minutes. This morning I read about a programming language called Scala that compiles down to Java bytecode and an accompanying web framework called lift that is like Rails but performs 6x faster and is multi-threaded.

I'm greatly disappointed. How's a guy supposed to get into something new if every time he turns around there's something even cooler out there?

For now I'll stay focused on RoR since I don't have a hosting provider that lets me run a JVM, but Scala + Lift looks really cool.

It's been a while since I programmed something for fun. I could be wrong, but the only thing turned up by a quick perusal of my memory was a Java applet I wrote in 1999 that read documents off the server and displayed their contents. I was a sophomore in college at the time.

Part of the reason it's been a while since I wrote something strictly for fun is because I'm fortunate enough to work on things that are fun. Later that year I got paid to write a Java program that ran on Linux and controlled a Sony video-conferencing camera via the serial port. The next year I got to implement terrain rendering algorithms in C++/OpenGL as part of my job. And then there were all the programs I wrote for classes.

In the professional world I've been fortunate to work on all sorts of things, from platform software to social software, but it's been a while since I banged away on something I started myself and it would be fun to get back into that. I thought maybe hacking on Mephisto would be my next project, but we'll have to see. Whatever I work on will have to be easy to work on in small increments since there's not a lot of time available for it.

At work I'm part of a reading group that's going through Java Concurrency in Practice. This week we're reading chapter 7 on Task Cancellation and the authors discuss the difference between how a task and a thread deal with the interruption status bit. The idea is that a task is just borrowing the thread, it doesn't own it, and thus it has a responsibility to be a good steward of thread properties - the interrupted flag in this case. Their point is helpfully illustrated with this real life example:

If you are house-sitting for someone, you don't throw out the mail that comes while they're away - you save it and let them deal with it when they get back, even if you do read their magazines.
Goetz, Brian. Java Concurrency in Practice. New Jersey: Addison-Wesley, 2006

I found the illustration humorous and also appreciated the authors' ability to reduce the idea to something any layman could understand. I highly recommend the book to anyone who is doing multi-threaded programming in Java (and probably C# as well). It's given me a much better understanding of the concepts and made me more aware of the complexities of multi-threaded software.

Since I get to play with Java at work all day I decided to spend the free minutes I have on Ruby/Rails. Towards that end I decided I'd start with an app that already existed and peek at its innards while extending it. A blogging application seemed like the perfect start and Mephisto, based on what I've read, is much simpler than Typo so I chose it.

My goal was to get a full development environment set up so that I could develop and run locally, store code in Subversion, and deploy to the live site using Capistrano. My local development is done on OS X and the live site is on a shared host provided by Site5.

There were a couple articles that I found very helpful. The first is over at TheBitGuru titled Setting up Capistrano on Site5. This article will help you get your Subversion repository set up and your Rails application 'Capistranized.' After you've done that head over to fluct.isono.us and check out Moving Home... with Capistrano on Site5 and part two Deploy Mephisto on Rails RC1 with Capistrano.

After those articles you should have Mephisto checked into your Subversion repository on Site5 and deployable with a shared frozen rails via James' shared_rails rake task. It was at this point that I started running into some problems. I would deploy my code and nothing would happen - attempts to access joelpm.com would result in a 0 byte response and there were no results in the production log and nothing in the fastcgi crash log.

As a sanity check I ran Adam Greenfield's Mephisto install script and everything worked, so I knew that it was possible to get Mephisto running on Site5. After that I started on the dozen different attempts to fix things, one of which included rewriting the shared_rails task based on the code in the rails:freeze:edge task.

shared_rails.rake:

desc "deploy shared rails environment"
task :deploy_shared_rails do
  
  ENV['SHARED_PATH'] = '../../shared' unless ENV['SHARED_PATH']
  ENV['RAILS_PATH']  = File.join(ENV['SHARED_PATH'], 'rails')
  svn_root = "http://dev.rubyonrails.org/svn/rails"
  symlink_path  = 'vendor/rails'

  if ENV['TAG']
    rails_svn = "#{svn_root}/tags/#{ENV['TAG']}"
    export_path = "#{ENV['RAILS_PATH']}/tag_#{ENV['TAG']}"
  else
    rails_svn = "#{svn_root}/trunk"
    if ENV['REVISION'].nil?
      ENV['REVISION'] = /^r(\d+)/.match(%x{svn -qr HEAD log #{svn_root}})[1]
      puts "REVISION not set. Using HEAD, which is revision #{ENV['REVISION']}."
    end
    export_path = "#{ENV['RAILS_PATH']}/rev_#{ENV['REVISION']}"   
  end
      
  # do we need to export this tag/revision?
  unless File.exist?(export_path)
    puts "setting up rails " + (ENV['TAG'] ? "tag #{ENV['TAG']}" : "rev #{ENV['REVISION']}")

    mkdir_p export_path

    get_framework do |framework|
      system "svn export #{rails_svn}/#{framework} #{export_path}/#{framework}" + (ENV['REVISION'] ? " -r #{ENV['REVISION']}" : "")
    end
  end

  puts 'linking rails'
  rm_rf   symlink_path
  mkdir_p symlink_path

  get_framework do |framework|
    ln_s File.expand_path("#{export_path}/#{framework}"), "#{symlink_path}/#{framework}"
  end
  
  touch symlink_path + (ENV['TAG'] ? "/TAG_#{ENV['TAG']}" :  "/REVISION_#{ENV['REVISION']}")
end

def get_framework
  %w( railties actionpack activerecord actionmailer activesupport actionwebservice ).each do |framework|
    yield framework
  end
end

I don't think rewriting the task actually fixed anything since the problem was elsewhere, but it was a good learning exercise and I'm posting it here for your reference.

When I tried running dispatch.fcgi from the command line I kept getting this error:

undefined method `downcase' for nil:NilClass //vendor/rails/actionpack/lib/action_controller/request.rb:20:in `method'

I thought surely this was the reason I couldn't get Mephisto running so I tried different things to solve the problem. However, it turned out that the problem was that when running from the command line one of the environment parameters doesn't get set. If you run the command like this everything works: REQUEST_METHOD=GET public/dispatch.fcgi.
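
The failure is easy to reproduce outside Rails. The sketch below is a simplified stand-in for the lookup, not the actual request.rb source; the request_method helper and the plain hash playing the role of the environment are my own assumptions for illustration.

```ruby
# Simplified stand-in for the failing lookup; this is not the Rails source.
# env plays the role of the CGI environment handed to the dispatcher.
def request_method(env)
  env['REQUEST_METHOD'].downcase.to_sym
end

begin
  # On a bare command line nothing sets REQUEST_METHOD, so the lookup is nil.
  request_method({})
rescue NoMethodError => e
  puts "fails with #{e.class}"
end

# With the variable supplied, as in REQUEST_METHOD=GET public/dispatch.fcgi:
puts request_method({ 'REQUEST_METHOD' => 'GET' }).inspect
```

A web server always supplies the variable, which is why the error only shows up when running by hand and says nothing about the real problem.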

After dispatch.fcgi was happily returning a response I decided that the problem must be somewhere between Apache and fcgi so I took a look at the Apache error logs. Sure enough, there were a bunch of errors about the permissions on the /public directory being too permissive. Chmod'ing to 755 fixed everything and Mephisto finally came up.
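
If you'd rather check for this before deploying, something like the following works on a Unix-like host. It's a rough sketch that assumes "too permissive" means the group or world write bits are set; the helper name is made up and a temporary directory stands in for public/.

```ruby
require 'tmpdir'

# True when the group or others have write permission on path.
# (Hypothetical helper; assumes a Unix-style permission model.)
def group_or_world_writable?(path)
  (File.stat(path).mode & 0o022) != 0
end

Dir.mktmpdir do |dir|
  File.chmod(0o777, dir)                 # wide open, like my broken deploy
  puts group_or_world_writable?(dir)

  File.chmod(0o755, dir)                 # the fix: only the owner may write
  puts group_or_world_writable?(dir)
end
```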

I'm very happy with the current setup. I'm able to develop locally using Locomotive/TextMate/Terminal and deploy the latest code just by running 'cap deploy'. Many thanks to TheBitGuru and fluct.isono.us for their very helpful articles. I'll be happy to answer any questions I can about getting things up and running on Site5.

I've used a few revision control systems (RCS) over the years. At IBM I used CMVC and CVS, at Xanga I used SourceGear's Vault and Subversion, and while I was interning at SAIC/Demaco I even got a little experience with ClearCase. All of these are fine RCSs, each with its strengths and weaknesses. CMVC, at least in the way it was used at IBM, is an excellent cross-platform solution that does not only revision control but also defect tracking, build management, and will make you ice cream if properly configured (not really) - I hated it until I learned how to use it and then I loved it. CVS is probably the most widely used RCS and it's good at what it does, though it has some shortcomings around folder management, which is one of the reasons we have Subversion. Subversion is a solid tool that seeks to pick up where CVS left off and in my experience (on Windows, using TortoiseSVN) it worked great. Vault, from SourceGear, is another solid offering. And I don't remember much about ClearCase, so I won't comment on it.

The obvious omission here (to anyone doing Visual Studio development) is Visual SourceSafe, which I can now say I've had the misfortune of using. I've read blog entries on how awful VSS is, talked to colleagues who had bad experiences, and today I experienced my first extensive code-loss due to VSS. While it is entirely possible that the fault was mine and I clicked the wrong button, or didn't check the right box, or made some other user error, I never had that problem with any of the other five RCSs I've used. That leads me to conclude that if VSS didn't outright do something wrong then the behavior of the app is so counter-intuitive that it might as well be wrong. While I don't relish redoing all my work, the code that I'll end up rewriting will probably be written better; however, I can't quite bring myself to face the CSS files yet.

This is one of the reasons I can't wait for Apple's OS X Leopard and Time Machine. I would love to have an RCS for all my local files - not that it will help me out much with VSS, but if Apple does it now, Microsoft will probably release it in whatever comes after Windows Vista, though hopefully by then I won't be doing development on a Windows platform.

When we started work on the new profile manager, specifically the parts that involve managing your friends and invites, we wanted to use AJAX to provide a fast and effective user interface. At the start I spent a lot of time beating my head against a wall trying to figure out how to structure the JS code so it wasn't one big hack. While a lot of the JS code was for display purposes, some of it was also business logic, which now straddled both the client and the server.

We decided early on to use Microsoft's Atlas framework and one of the benefits it provides is the ability to use namespaces and create classes in JavaScript. From a programming perspective this at least allows you to organize your code (though it doesn't do anything to ensure you have a good architecture). To achieve a simple yet solid architecture I wanted to create an MVC based system. This system would have simple objects, retrieved from the server via Atlas ASMX proxies (another big benefit of Atlas); business objects that wrapped these simple objects and provided some simple functionality; views that were HTML representations of these business objects; a single page class that managed all the resources for the current page; and various util classes like sorters (for sorting objects), pagers (for paging through arrays of objects), and controls (a JS/XHTML/CSS combo that provides UI functionality).

The image below is a simplified view of this architecture:

As you can see, one object may have multiple views associated with it, rendered in different contexts of the page.

Here's an example of how this looks in practice:

The items highlighted in red are views of business objects - specifically a Friend object and a Group object. The Group object has two views - one on the drop-down menu (which is a control, highlighted in green) and one on the Edit view of the Friend object. The Friend object also has two views - an Edit view (the layer that is hovering, which is actually a view built on a control, though it's not highlighted in green) and the Default view, which is the card-like representation of a Friend.

The separation between the view and the model is very clean; however, the controller and the view aren't separated as cleanly as I would prefer. This is because the view frequently provides actions for the user to perform. For example, if I were to click the checkbox next to 'strangers' and then click the 'Save' button, the Friend object is updated and then a web-service call is made by the view to the server to update the Friend object in the DB. The view needs to know whether or not the call succeeded (and it's asynchronous) and then act accordingly.

One of the benefits of this system is that views are only rendered for objects that are currently visible. Doing this greatly reduces the complexity of the DOM. For instance, Dan has over 1300 friends, and though the page takes two or three seconds to load (over broadband), once the data is loaded navigating through it is very fast. Also, because all the data is loaded on the client we can provide very fast sorting and filtering (without a roundtrip). The burden of this sorting and filtering isn't borne by our servers, either. In fact, none of the data is sorted by the DB; all it has to do is SELECT WITH(NOLOCK) WHERE - which is very quick.

As mentioned in previous posts, the profile.xanga.com site is hosted at a different co-lo than the main www.xanga.com site (I'm going to refer to the co-los by the subdomain that lives at each one from here on out). A relatively small amount of data is shared between the two sites and the read/update ratio for that data is pretty high, so we were able to optimize for that scenario.

One set of data that takes this scenario to an extreme is the Metros data (stored at the www co-lo). This data is like a tree with great breadth and very shallow depth. It's persisted as a table in a DB and the most frequent operation is a look-up on a leaf node to map an ID to a name. Updates happen very infrequently and consist of a leaf node being updated or created; the tree structure itself doesn't change.

Because we need to look up information from this table on nearly every page request to the profile site we decided to cache all the data in memory on each of the web-servers and periodically refresh it. When IIS started it would make a call to the metros DB at the other co-lo and load all the data into memory. This provided very fast lookups and reduced the load on the metros DB at the expense of some memory on each of the webservers (and memory is cheap).

This worked great, with one small problem - whenever we redeployed code to the webservers they would all restart and try to repopulate the local caches. This swamped the DB server and resulted in only some of the webservers getting the data; the rest timed out and threw errors.

To get around this I ended up writing a little app that makes a request to the DB to get all the data and then writes it to an XML file. The XML file is then robocopied to each of the webservers. The app is set up as a scheduled task on one of the servers and runs every couple hours.

Although this solution is really simple it has a bunch of benefits. First, compared to the other approach, only a fraction of the data gets sent between the two co-los (we're requesting it once instead of N times). Second, it's also much less DB intensive (for the same reason). Third, it provides a level of redundancy - if the DB happens to be down IIS will just read the old copy of the XML file and at least we'll have some of the data. And finally, I'm pretty sure that reading a local XML file is faster than requesting all the data from a remote SQL Server, so the cache gets populated faster.
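
The real thing was a small Windows app plus IIS start-up code, but the shape of the idea fits in a few lines. Here's a rough sketch in Ruby using REXML; the function names, the XML layout (a metros root with metro elements carrying an id attribute), and the [id, name] row format are all assumptions made for illustration, not what we actually deployed.

```ruby
require 'rexml/document'

# Hypothetical snapshot writer, run by the scheduled task. rows is an array
# of [id, name] pairs standing in for the result of the single DB query.
def write_snapshot(rows, path)
  doc = REXML::Document.new
  root = doc.add_element('metros')
  rows.each do |id, name|
    root.add_element('metro', 'id' => id.to_s).text = name
  end
  File.open(path, 'w') { |f| doc.write(f) }
end

# Hypothetical loader, run when a web server starts. It builds the in-memory
# id => name lookup from the local file instead of calling the remote DB.
def load_cache(path)
  doc = File.open(path) { |f| REXML::Document.new(f) }
  cache = {}
  doc.elements.each('metros/metro') do |el|
    cache[el.attributes['id'].to_i] = el.text
  end
  cache
end
```

Because the loader only ever touches the local file, a restart of every web server at once costs the DB nothing, and a stale file still yields a usable (if slightly old) cache.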

I'm sure it's nothing new or novel, but I found it interesting.

I've been reading through Programming Ruby: The Pragmatic Programmers' Guide and came across this passage:
Because if itself is an expression, you can get really obscure with statements such as

if artist == "John Coltrane"
  artist = "'Trane"
end unless use_nicknames == "no"

This path leads to the gates of madness.
That made me laugh. It was also a relief, though, as it seems to indicate that Ruby programmers (perhaps unlike Perl programmers) aren't set on doing things just because they can.