Tuesday 11 March 2008

Putting SLAP to use - Dynamic Config

Now that I've got the specification of the SLAP protocol out of the way, I'd like to describe how we're using it and the benefits we are seeing. The first is the dynamic configuration of service endpoints.

In a service like ours we have a series of nodes that process messages and apply rules to manipulate and route them. If we want to change the location for a service, we have to first notify all the upstream services of the change of location.

In a planned scenario this can be laborious. All services that may make use of the service have to be reconfigured, often requiring a restart. It is possible to have these services monitor a config file for changes to avoid a restart but this doesn't remove the need to manually go through each service config and make the necessary changes. This becomes increasingly complex in a distributed system spanning many machines and networks.

The unplanned scenario is even less desirable. There are many reasons why a service can stop and this will generally be at the most inopportune times. Having to manually reconfigure services in response to a failure is not acceptable in a high availability scenario like ours.

The other problem with the unplanned scenario is that the client service has to deal with the unavailability at the point it's attempting to consume the service. Say, for example, this is a network service: the client could be waiting for a network timeout before it recognises that the service is not accepting requests. It then has to fire-fight, cleaning up because it thinks the service is unavailable.

Far better for it to be told categorically that the service is unavailable. It can then queue requests, or do whatever has been coded as appropriate, while it waits to be informed that the service is operational again.
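To make the "queue requests while you wait" idea concrete, here is a minimal sketch in Python. Everything in it is an assumption for illustration: the QueueingClient class, the on_availability callback and the send function are hypothetical names, not part of SLAP or of our services.

```python
from collections import deque
from typing import Callable, Deque


class QueueingClient:
    """Hypothetical client that holds requests while a service is down."""

    def __init__(self, send: Callable[[str], None]) -> None:
        self._send = send              # assumed transport function
        self._available = True
        self._pending: Deque[str] = deque()

    @property
    def pending(self) -> int:
        """Number of requests waiting for the service to come back."""
        return len(self._pending)

    def on_availability(self, available: bool) -> None:
        """Called when the client is told the service state has changed."""
        self._available = available
        if available:
            # Flush held requests in the order they arrived.
            while self._pending:
                self._send(self._pending.popleft())

    def request(self, message: str) -> None:
        """Send straight away if the service is up, otherwise queue."""
        if self._available:
            self._send(message)
        else:
            self._pending.append(message)
```

The point is that the client never has to sit on a network timeout: it is told the state, and the decision to send or queue is a local one.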

It is possible to use separate load-balancers to handle this kind of outage, but they cost money, need configuring, draw power and the solution they bring is by no means dynamic. I'll discuss load-balancing using SLAP in a subsequent post.

In the SLAP world, services are not configured to use fixed endpoints like sip:ems3.prod1.esendex.com:8067 but rather are bound to well-known service names, e.g. slap:smsrouter. The actual endpoints of the service are advertised in ANNOUNCE messages so that clients can maintain a record of the current state of the services they need to consume.
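As a rough illustration of that client-side record, the sketch below keeps a table keyed by service name and updates it from announcements. The Announce fields and the "up"/"down" state values are assumptions made for the example; they are not the actual ANNOUNCE format.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class Announce:
    """Hypothetical, simplified view of an ANNOUNCE."""
    service: str   # well-known name, e.g. "slap:smsrouter"
    endpoint: str  # concrete URI, e.g. "sip:ems3.prod1.esendex.com:8067"
    state: str     # assumed values: "up" or "down"


class ServiceStateTable:
    """Local record of which endpoints currently serve each service name."""

    def __init__(self) -> None:
        self._endpoints: Dict[str, Dict[str, str]] = {}

    def apply(self, announce: Announce) -> None:
        """Record the latest announced state for an endpoint."""
        by_endpoint = self._endpoints.setdefault(announce.service, {})
        by_endpoint[announce.endpoint] = announce.state

    def resolve(self, service: str) -> Optional[str]:
        """Return an endpoint announced as up, or None if there is none."""
        for endpoint, state in self._endpoints.get(service, {}).items():
            if state == "up":
                return endpoint
        return None
```

Moving a service then means announcing from the new endpoint; nobody edits a client config file.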

If a client service wants to make use of an smsrouter service, it checks the state in its local service state table before sending the request to the correct service URI. This information is kept up to date by the service. Most will announce their state on a periodic basis, but I'd also consider it good practice for the service to send an ANNOUNCE when its state changes, perhaps when it's under load or is shutting down, to ensure all clients are kept up to date.
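The two announcing habits described above (periodic, plus immediately on a state change) could be sketched on the service side like this. The AnnouncingService class, the publish callable, the 30 second interval and the state names are all hypothetical; the time is passed in explicitly to keep the example deterministic.

```python
from typing import Callable, Optional, Tuple

# (service name, endpoint, state): an assumed stand-in for a real ANNOUNCE.
Announcement = Tuple[str, str, str]


class AnnouncingService:
    """Hypothetical service that announces periodically and on change."""

    def __init__(
        self,
        name: str,
        endpoint: str,
        publish: Callable[[Announcement], None],
        interval: float = 30.0,  # assumed period in seconds
    ) -> None:
        self.name = name
        self.endpoint = endpoint
        self._publish = publish
        self._interval = interval
        self._state = "up"
        self._last_sent: Optional[float] = None

    def _announce(self, now: float) -> None:
        self._publish((self.name, self.endpoint, self._state))
        self._last_sent = now

    def tick(self, now: float) -> None:
        """Periodic path: announce if the interval has elapsed."""
        if self._last_sent is None or now - self._last_sent >= self._interval:
            self._announce(now)

    def set_state(self, state: str, now: float) -> None:
        """Change path: announce straight away when the state changes."""
        if state != self._state:
            self._state = state
            self._announce(now)
```

Announcing on change also resets the periodic timer here, which avoids sending a duplicate straight after a state change.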

The development team at Esendex have also found it very useful when building and debugging services, as Jonathan describes in "SLAP, my service’s up!". This was an unforeseen benefit, but another one that has cut out a lot of the hassle of debugging services. The guys can step through the code, find the issue, write the unit test and rebuild very quickly, which gets us to market far quicker.

More on load balancing soon.
