Technical issues on Linux

John Chambers jc at minya.bcs.org
Mon Aug 12 22:39:00 EDT 1996


| >I'm looking for hints on fixing any of three technical problems associated
| >with managing high-volume Linux servers...
| >
| >1)  The TCP connect() system call apparently never times out in current
| >  versions of Linux.  This caused mail server crashes here during the AOL
| >  outage Tuesday (hundreds of sendmail processes building up, waiting
| >  indefinitely for the AOL server to come back up).
| 
| I'm surprised this would be in the kernel.  I would think the
| timeouts are settable by the application.  Perhaps a newer version of
| sendmail or another of the many mailers (smail or procmail come to
| mind) might handle this better?  You might be correct though; it
| should be simple to write a test program and see if it does actually
| time out.

Hmmm ...  I've tackled this problem before, on a couple of Unices, and
never  came up with a solution that actually worked.  Do you have code
that reliably times out a connect() after n seconds?   What  does  the
code look like?  Does it really work, even with an uncooperative other
end to the connection?

The connect() call itself lacks any timeout mechanism; you  can  check
this with `man 2 connect`.  So there's no help there.

The obvious thing to do is to set an alarm() n seconds in the future.
This might work somewhere, but on the systems I've used, what happens
is this: the alarm handler gets called, sets the appropriate flags,
etc., and returns.  It returns into the kernel, with the process still
hung in the connect(), and control doesn't come back to the process.
You'd expect connect() to fail with EINTR, but in my experience this
never happens.  It waits until the TCP routines time out the
connection, and only then does it return the EINTR.
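
For concreteness, the alarm() approach described above might look like
the minimal sketch below.  The helper name connect_with_alarm and its
signature are made up for illustration, and IPv4 is assumed.  The
handler is installed with sigaction() and without SA_RESTART, which is
an assumption about how one would ask the kernel for EINTR; whether
connect() really returns early on a given system is exactly the open
question.

```c
#define _POSIX_C_SOURCE 200809L
#include <arpa/inet.h>
#include <assert.h>
#include <errno.h>
#include <netinet/in.h>
#include <signal.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* The handler does nothing; its only job is to interrupt the blocked call. */
static void on_alarm(int signo) { (void)signo; }

/* Hypothetical helper: blocking connect() guarded by alarm().
 * Returns a connected fd, or -1 with errno set
 * (ETIMEDOUT if the alarm interrupted the connect). */
int connect_with_alarm(const char *ip, unsigned short port,
                       unsigned int seconds)
{
    struct sockaddr_in addr;
    struct sigaction sa, old_sa;
    int fd, rc, saved_errno;

    assert(seconds > 0);

    memset(&addr, 0, sizeof addr);
    addr.sin_family = AF_INET;
    addr.sin_port = htons(port);
    if (inet_pton(AF_INET, ip, &addr.sin_addr) != 1) {
        errno = EINVAL;
        return -1;
    }

    fd = socket(AF_INET, SOCK_STREAM, 0);
    if (fd < 0)
        return -1;

    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_alarm;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = 0;                 /* assumption: no SA_RESTART */
    if (sigaction(SIGALRM, &sa, &old_sa) < 0) {
        saved_errno = errno;
        close(fd);
        errno = saved_errno;
        return -1;
    }

    alarm(seconds);                  /* arm the timer */
    rc = connect(fd, (struct sockaddr *)&addr, sizeof addr);
    saved_errno = errno;
    alarm(0);                        /* disarm it again */
    sigaction(SIGALRM, &old_sa, NULL);

    if (rc < 0) {
        close(fd);
        /* Treat an interrupted connect as a timeout. */
        errno = (saved_errno == EINTR) ? ETIMEDOUT : saved_errno;
        return -1;
    }
    return fd;
}
```

A caller would use something like connect_with_alarm("192.0.2.1", 25, 10)
and check for -1 with errno == ETIMEDOUT.  Note that alarm() is
per-process, so this sketch would clash with any other use of SIGALRM
in the same program.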

I'd guess that this is a bug in the original BSD code, but it seems to
be a bug that has been propagated to a lot of systems.

Maybe I'll try to find some spare time to work up a test on Linux.  In
the meantime, does anyone know if Linux's connect() can actually be
timed out in a controlled, reliable fashion? (If so, I'd have one more
argument in support of Linux.  ;-)
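
One alternative worth testing is not the alarm() approach at all but a
swapped-in technique: put the socket into non-blocking mode, start the
connect, and wait for it to finish with select().  Below is a minimal
sketch, assuming IPv4; the helper name connect_with_timeout is
hypothetical, and the code has not been tried here against an
uncooperative peer.

```c
#define _POSIX_C_SOURCE 200809L
#include <arpa/inet.h>
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/select.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical helper: non-blocking connect() with a select() timeout.
 * Returns a connected, blocking-mode fd, or -1 with errno set
 * (ETIMEDOUT if nothing happened within timeout_sec). */
int connect_with_timeout(const char *ip, unsigned short port, int timeout_sec)
{
    struct sockaddr_in addr;
    struct timeval tv;
    fd_set wfds;
    socklen_t len;
    int fd, flags, rc, err = 0;

    assert(timeout_sec > 0);

    memset(&addr, 0, sizeof addr);
    addr.sin_family = AF_INET;
    addr.sin_port = htons(port);
    if (inet_pton(AF_INET, ip, &addr.sin_addr) != 1) {
        errno = EINVAL;
        return -1;
    }

    fd = socket(AF_INET, SOCK_STREAM, 0);
    if (fd < 0)
        return -1;

    /* Switch to non-blocking so connect() returns immediately. */
    flags = fcntl(fd, F_GETFL, 0);
    if (flags < 0 || fcntl(fd, F_SETFL, flags | O_NONBLOCK) < 0)
        goto fail;

    rc = connect(fd, (struct sockaddr *)&addr, sizeof addr);
    if (rc < 0 && errno != EINPROGRESS)
        goto fail;

    if (rc < 0) {
        /* Connection in progress: the socket becomes writable
         * when the attempt succeeds or fails. */
        FD_ZERO(&wfds);
        FD_SET(fd, &wfds);
        tv.tv_sec = timeout_sec;
        tv.tv_usec = 0;
        rc = select(fd + 1, NULL, &wfds, NULL, &tv);
        if (rc == 0) {
            errno = ETIMEDOUT;
            goto fail;
        }
        if (rc < 0)
            goto fail;

        /* SO_ERROR tells success (0) from failure (an errno value). */
        len = sizeof err;
        if (getsockopt(fd, SOL_SOCKET, SO_ERROR, &err, &len) < 0)
            goto fail;
        if (err != 0) {
            errno = err;
            goto fail;
        }
    }

    /* Restore the original (blocking) mode for the caller. */
    if (fcntl(fd, F_SETFL, flags) < 0)
        goto fail;
    return fd;

fail:
    err = errno;
    close(fd);
    errno = err;
    return -1;
}
```

The point of this design is that the process never sleeps inside
connect() itself, so the timeout does not depend on a signal
interrupting the call.  The sketch does not retry select() if a signal
interrupts it, and it assumes the fd is below FD_SETSIZE.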



