As mentioned in previous posts, the profile.xanga.com site is hosted at a different co-lo than the main www.xanga.com site (I'm going to refer to the co-los by the subdomain that lives at each one from here on out). A relatively small amount of data is shared between the two sites and the read/update ratio for that data is pretty high, so we were able to optimize for that scenario.

One set of data that takes this scenario to an extreme is the Metros data (stored at the www co-lo). This data is like a tree with great breadth and very shallow depth. It's persisted as a table in a DB and the most frequent operation is a look-up on a leaf node to map an ID to a name. Updates happen very infrequently and consist of a leaf node being updated or created; the tree structure itself doesn't change.

Because we need to look up information from this table on nearly every page request to the profile site we decided to cache all the data in memory on each of the web-servers and periodically refresh it. When IIS started it would make a call to the metros DB at the other co-lo and load all the data into memory. This provided very fast lookups and reduced the load on the metros DB at the expense of some memory on each of the webservers (and memory is cheap).

This worked great, with one small problem - whenever we redeployed code to the webservers they would all restart and try to repopulate the local caches. This swamped the DB server and resulted in only some of the webservers getting the data, the rest timed out and threw errors.

To get around this I ended up writing a little app that makes a request to the DB to get all the data and then writes it to an XML file. The XML file is then robocopied to each of the webservers. The app is set up as a scheduled task on one of the servers and runs every couple hours.

Although this solution is really simple it has a bunch of benefits. First, compared to the other approach, only a fraction of the data gets sent between the two co-los (we're requesting it once instead of N times). Second, it's also much less DB intensive (for the same reason). Third, it provides a level of redundancy - if the DB happens to be down IIS will just read the old copy of the XML file and at least we'll have some of the data. And finally, I'm pretty sure that reading a local XML file is faster than requesting all the data from a remote SQL Server, so the cache gets populated faster.

I'm sure it's nothing new or novel, but I found it interesting.