Technical issues on Linux
John Chambers
jc at minya.bcs.org
Mon Aug 12 22:39:00 EDT 1996
| >I'm looking for hints on fixing any of three technical problems associated
| >with managing high-volume Linux servers...
| >
| >1) The TCP connect() system call apparently never times out in current
| > versions of Linux. This caused mail server crashes here during the AOL
| > outage Tuesday (hundreds of sendmail processes building up, waiting
| > indefinitely for the AOL server to come back up).
|
| I'm surprised this would be in the kernel. I would think the
| timeouts are settable by the application. Perhaps a newer version of
| sendmail or another of the many mailers (smail or procmail come to
| mind) might handle this better? You might be correct though, this
| should be a simple test program to write and see if it does actually
| time out.
Hmmm ... I've tackled this problem before, on a couple of Unices, and
never came up with a solution that actually worked. Do you have code
that reliably times out a connect() after n seconds? What does the
code look like? Does it really work, even with an uncooperative other
end to the connection?
The connect() call itself takes no timeout argument; you can check
this with `man 2 connect`. So there's no help there.
The obvious thing to do is to set an alarm() n seconds in the future.
This might work somewhere, but on the systems I've used, here is what
happens: the alarm handler gets called, sets the appropriate flags,
etc., and returns. It returns into the kernel, with the process still
blocked in connect(), and control never comes back to the program.
You'd expect connect() to fail with EINTR, but in my experience this
never happens. It waits until the TCP code times out the connection,
and only then does it return EINTR.
I'd guess that this is a bug in the original BSD code, but it seems to
be a bug that has been propagated to a lot of systems.
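For anyone who wants to reproduce the alarm() experiment, here is a minimal C sketch. The helper name connect_with_alarm() is hypothetical. One possible explanation for the behavior described above, at least on current systems, is restart semantics rather than a bug: handlers installed with BSD-style signal() get SA_RESTART, and after such a handler returns, the kernel restarts certain interrupted calls (on Linux, connect() is one of them), so the process simply goes back to waiting. The sketch therefore installs the handler with sigaction() and sa_flags set to 0, which should make connect() fail with EINTR when the alarm fires. It assumes a single-threaded program with no other pending alarm().

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <signal.h>
#include <string.h>
#include <unistd.h>
#include <sys/socket.h>

/* The handler does nothing; its only job is to interrupt the blocked connect(). */
static void on_alarm(int sig)
{
    (void)sig;
}

/*
 * Hypothetical helper: connect() bounded by alarm(secs).
 * Returns 0 on success, or -1 with errno set (EINTR if the alarm fired).
 * Assumes a single-threaded process; it replaces any pending alarm().
 */
int connect_with_alarm(int fd, const struct sockaddr *addr, socklen_t len,
                       unsigned secs)
{
    struct sigaction sa, old;
    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_alarm;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = 0; /* deliberately no SA_RESTART, so connect() is not restarted */
    if (sigaction(SIGALRM, &sa, &old) < 0)
        return -1;

    alarm(secs);
    int rc = connect(fd, addr, len);
    int saved = errno;

    alarm(0);                         /* cancel the alarm if it has not fired */
    sigaction(SIGALRM, &old, NULL);   /* put the previous handler back */
    errno = saved;
    return rc;
}
```

Note that POSIX says an interrupted connect() does not abort the connection attempt, which continues asynchronously, so after a timeout the socket is normally closed rather than reused.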
Maybe I'll try to find some spare time to work up a test on linux. In
the meantime, does anyone know if linux's connect() can actually be
timed out in a controlled, reliable fashion? (If so, I'd have one more
argument in support of linux. ;-)
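For comparison, a technique often used instead of alarm() is a non-blocking connect(): put the socket in O_NONBLOCK mode, let connect() return EINPROGRESS, wait for the socket to become writable using select() with a timeout, then read SO_ERROR to learn whether the handshake succeeded. The sketch below illustrates that approach under stated assumptions; connect_timeout() is a hypothetical name, and reporting an expired deadline as ETIMEDOUT is a choice of this sketch, not of any library.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <fcntl.h>
#include <string.h>
#include <unistd.h>
#include <netinet/in.h>   /* struct sockaddr_in, for IPv4 callers */
#include <sys/select.h>
#include <sys/socket.h>
#include <sys/time.h>

/*
 * Hypothetical helper: non-blocking connect() bounded by select().
 * Returns 0 on success, or -1 with errno set (ETIMEDOUT if the deadline
 * passes). Assumes fd < FD_SETSIZE, as select() requires.
 */
int connect_timeout(int fd, const struct sockaddr *addr, socklen_t len, int secs)
{
    int flags = fcntl(fd, F_GETFL, 0);
    if (flags < 0 || fcntl(fd, F_SETFL, flags | O_NONBLOCK) < 0)
        return -1;

    int rc = connect(fd, addr, len);
    if (rc < 0 && errno == EINPROGRESS) {
        fd_set wfds;
        FD_ZERO(&wfds);
        FD_SET(fd, &wfds);
        struct timeval tv = { .tv_sec = secs, .tv_usec = 0 };

        int n = select(fd + 1, NULL, &wfds, NULL, &tv);
        if (n == 0) {
            errno = ETIMEDOUT;          /* deadline passed, handshake unfinished */
            rc = -1;
        } else if (n > 0) {
            /* Writable means the handshake ended; SO_ERROR says how. */
            int err = 0;
            socklen_t elen = sizeof err;
            if (getsockopt(fd, SOL_SOCKET, SO_ERROR, &err, &elen) < 0) {
                rc = -1;
            } else if (err != 0) {
                errno = err;            /* e.g. ECONNREFUSED */
                rc = -1;
            } else {
                rc = 0;
            }
        } else {
            rc = -1;                    /* select() failed; errno already set */
        }
    }

    int saved = errno;
    fcntl(fd, F_SETFL, flags);          /* restore the caller's blocking mode */
    errno = saved;
    return rc;
}
```

Because no signals are involved, this sidesteps the restart question entirely, and each socket gets its own deadline, which may suit a process juggling many outbound connections.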