Monday 9 July 2007

Service Availability - oh the irony

I only posted about this last Friday and, wouldn't you know it, we had an outage this morning. It was brief, and more of a slow-down than an outage, but timing is everything.

We are redeveloping our web site at the moment and will be including a section for reports on outages and availability. Until then, however, I feel I've let the genie out of the bottle and should follow through with an explanation of what caused the outage.

In short, we brought a set of new front-end application servers online and our load balancers did not behave as we expected. They ignored all but one of the new boxes and gave all traffic to that one box. The new machines are faster than the old ones, but not that much faster.
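For the curious, this kind of lopsided distribution is easy to spot if you compare per-server request counts. The snippet below is only a rough sketch of the idea, not what we actually run: the function name, the server names, the counts, and the 50% threshold are all made up for illustration.

```python
def find_skewed_backends(request_counts, max_share=0.5):
    """Return the backends whose share of all requests exceeds max_share.

    request_counts maps a backend name to the number of requests it served
    over some window. Both the mapping and the threshold are hypothetical.
    """
    total = sum(request_counts.values())
    if total == 0:
        # No traffic at all, so there is nothing to compare.
        return []
    return sorted(
        name
        for name, count in request_counts.items()
        if count / total > max_share
    )


# Made-up numbers resembling this morning: one new box takes nearly everything.
sample = {
    "app-new-1": 9700,
    "app-new-2": 0,
    "app-new-3": 0,
    "app-old-1": 150,
    "app-old-2": 150,
}
print(find_skewed_backends(sample))
```

A check like this, run shortly after new servers join the pool, would flag the one overloaded box before users notice the slow-down.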

It was picked up very quickly and resolved within 3-4 minutes. We're updating our procedures to make sure this doesn't happen again. Sorry if you were affected.
