- by Joel
- 03/30/2006
- AJAX, Performance, Xanga
- 1 comment
One of the problems with uploading files via a web page is that you have no idea how the upload is progressing until it finishes. However, AJAX makes it possible to check on the progress of an upload by making periodic calls to the server to find out how much of the file has been transferred. You can even go a step further and make your AJAX component/control smart enough to restart the upload if no bytes have been transferred after a few seconds. It's a great way to improve the user experience, and we use a third-party component at Xanga that does just that.
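To make the "restart if no bytes have been transferred" idea concrete, here is a minimal sketch of the stall check such a client might run after each progress poll. This is not the third-party component's code; the `ProgressSample` shape, the `isStalled` name, and the threshold are all hypothetical.

```typescript
// One progress sample as a polling client might record it (hypothetical shape).
interface ProgressSample {
  timeMs: number;        // when the server was polled
  bytesReceived: number; // bytes the server reported at that time
}

// Returns true when the reported byte count has not changed
// for at least stallMs, judged from the most recent samples.
function isStalled(samples: ProgressSample[], stallMs: number): boolean {
  let latest: ProgressSample | undefined;
  let unchangedSince = 0;
  // Walk from the newest sample backwards while the byte count stays the same.
  for (const sample of samples.slice().reverse()) {
    if (latest === undefined) {
      latest = sample;
      unchangedSince = sample.timeMs;
      continue;
    }
    if (sample.bytesReceived !== latest.bytesReceived) break;
    unchangedSince = sample.timeMs;
  }
  return latest !== undefined && latest.timeMs - unchangedSince >= stallMs;
}
```

A client would call `isStalled(samples, 2000)` after each poll and restart the upload when it returns true. The 2000 ms threshold is an arbitrary choice for illustration.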
However, when there is an error in the server-side application that is supposed to receive the uploaded image, this type of smart behavior by the client can come back to bite you - which is what happened to us. Because the upload app was failing with an error, the image transfer never started. The AJAX component would see that the upload had failed and try again about a second and a half later. This cycle repeated endlessly until the user tired of waiting and closed the browser window or tab. With several thousand users trying to upload images, this effectively DDoSed everything at that co-lo. The routers were at 100% CPU utilization trying to keep up with all the incoming requests, which slowed down service to everything - profile.xanga.com included, simply because it sat behind the same routers.
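One common way to keep a retrying client from turning into a flood is to cap the number of attempts and back off exponentially between them. The sketch below is an illustration of that technique, not what the component did or what we deployed: every constant and function name is an assumption, and treating any HTTP 5xx as "stop retrying" is only one possible policy.

```typescript
// Hypothetical retry policy; all constants are illustrative assumptions.
const BASE_DELAY_MS = 1500; // first retry delay, matching the ~1.5 s in the post
const MAX_DELAY_MS = 60000; // never wait longer than a minute
const MAX_ATTEMPTS = 5;     // give up after this many failed attempts

// Returns how long to wait before the next attempt, or null to give up.
// failedAttempts counts the attempts that have already failed (1, 2, 3, ...).
function nextRetryDelayMs(failedAttempts: number, httpStatus: number | null): number | null {
  // Assumption: a 5xx means the server app itself is broken, so retrying won't help.
  if (httpStatus !== null && httpStatus >= 500) return null;
  if (failedAttempts >= MAX_ATTEMPTS) return null;
  // Double the delay after each failure, up to the cap.
  const delay = BASE_DELAY_MS * Math.pow(2, failedAttempts - 1);
  return Math.min(delay, MAX_DELAY_MS);
}

// Rough aggregate load from clients that retry forever at a fixed interval.
function requestsPerSecond(clients: number, retryIntervalMs: number): number {
  return clients / (retryIntervalMs / 1000);
}
```

As a rough sense of scale, `requestsPerSecond(3000, 1500)` works out to 2000 requests per second, where 3000 is just a stand-in for "several thousand" users. With the capped policy above, each client makes at most five attempts and then stops. Adding random jitter to each delay is a frequent refinement so that clients don't retry in lockstep.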
The problem was exacerbated by the fact that the tool we use to aggregate and analyze errors from all the servers was misconfigured on the upload servers, so we had no idea that the application was broken. I'm not sure who figured out that the upload app was broken (Bob?), or how, but once that was known the rest of the story fell into place quickly enough. Needless to say, the problem has been fixed and the error aggregation tool has been properly configured.
May 21st, 2007 at 01:09 PM as if you needed another blog? i like this one though...had similar slowness with fleaflicker in august that drove me crazy. after i was 100% certain it wasn't my code, i complained to my hosting service...turned out to be a faulty switch (which they maintain). you and the xangans should swing by lauren's bday tomorrow--will be a good time.