Redundant Web servers
Daniel Feenberg
feenberg-fCu/yNAGv6M at public.gmane.org
Mon Dec 14 18:39:46 EST 2009
A plan to improve web server availability:
Assume one has a backup webserver with similar capacity to the production
server. Redundancy can be achieved by adding a second A record for the www
name, pointing to the backup webserver. Then the DNS server will return
both records for each query, in random order. If both webservers are up,
there is obviously no problem. If one is down, we anticipate that the browsers will
go on to the 2nd address if the first server address does not respond. The
question is, how universal is this ability? I have been doing some
experiments and I believe it is near universal among recent browsers.
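As a rough illustration of the fallback behaviour in question, here is a
minimal Python sketch of a client that tries each returned address in turn.
It is not how any particular browser is implemented. The function names, the
injectable connect parameter and the 30 second default timeout are all
assumptions made for the example.

```python
import socket


def resolve_ipv4(host, port=80):
    """Return the IPv4 addresses DNS gives for host, in the order received."""
    infos = socket.getaddrinfo(host, port, socket.AF_INET, socket.SOCK_STREAM)
    return [sockaddr[0] for _family, _type, _proto, _canon, sockaddr in infos]


def connect_first_working(addresses, port=80, timeout=30.0,
                          connect=socket.create_connection):
    """Try each address in order and return (address, connection) for the
    first one that accepts a connection.

    The timeout (an assumed value) bounds how long a dead address can
    stall the client before it moves on to the next one.
    """
    last_error = None
    for addr in addresses:
        try:
            return addr, connect((addr, port), timeout)
        except OSError as exc:  # refused, unreachable or timed out
            last_error = exc
    raise ConnectionError("no address responded") from last_error
```

A call such as connect_first_working(resolve_ipv4("www.example.org")) would
then behave like a switching browser, where www.example.org is only a
placeholder name. A non-switching client corresponds to passing just the
first address.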
Using the latest versions of six browsers on various machines about the
office:
MSIE 8.0.6001.18702
Opera 10.10
Safari 4.0.4 531.21.10
Firefox 3.5.30729
Chrome 4.0.239.30
Konqueror 4.3.3
the worst result when one server was down was a delay of about 30 seconds
before the page was loaded. I conclude that some browsers have a 30 second
timeout before trying the next IP address. FF 2.0 and Lynx never switched.
I was able to test Safari 4.0, FF 2.0 and Chrome 3.0 on Adobe Browserlab,
and those all failed to switch within the Browserlab timeout. On my PC,
Safari 4.0.4 did manage to switch after about 30 seconds, so perhaps
Browserlab is too quick to give up.
During periods when one server was down, users of non-switching browsers
would have a 50% chance of getting the bad server in an individual browser
session, but the chance that at least one of two servers is down is about
double the chance that a single server is down. The two effects roughly
cancel, so this is close to a wash for the older browsers and a pretty big
win for the newer ones. It is true that a
user with an older browser could close his browser and wait 5 minutes (our
DNS TTL) for another chance, but probably most users wouldn't do that.
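The arithmetic behind that estimate can be sketched in a few lines of
Python. This assumes each server is down independently with the same
probability p, and that a non-switching browser picks one of the two
addresses at random; neither assumption was measured, and the function name
is made up for the example.

```python
def outage_probabilities(p):
    """Chance that a visitor cannot reach the site, for a per-server
    downtime probability p (servers assumed to fail independently)."""
    exactly_one_down = 2 * p * (1 - p)  # roughly double p when p is small
    both_down = p * p
    return {
        # One server only: the visitor fails whenever it is down.
        "single_server": p,
        # Two servers, old browser: 50% chance of picking the bad one,
        # and certain failure when both are down.
        "non_switching": 0.5 * exactly_one_down + both_down,
        # Two servers, new browser: fails only when both are down.
        "switching": both_down,
    }
```

Under these assumptions the non-switching figure works out to p again,
which is the wash described above, while the switching figure drops to p
squared.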
This isn't perfect, but I think it achieves some valuable redundancy at low
cost, and does not introduce any single point of failure that didn't exist
before. It does not require any special topology, hardware or skills
either.
On our web site all internal links are relative. This means that once a
browser session finds a working server, it will stay with that server - so
there is only one delay per visitor. If the links were absolute, then
there would be a delay on 50% of page views, which is not very attractive.
See also:
http://www.tenereillo.com/GSLBPageOfShame.htm
but he doesn't name any compatible browsers, and back in 2004, I don't
think there were many.
I did a similar experiment some years ago with mail clients. The results
were much worse, but maybe I should try again:
http://www.nber.org/redundant.html
Comments or suggestions?
Daniel Feenberg