| >I'm looking for hints on fixing any of three technical problems associated
| >with managing high-volume Linux servers...
| >
| >1) The TCP connect() system call apparently never times out in current
| >   versions of Linux. This caused mail server crashes here during the AOL
| >   outage Tuesday (hundreds of sendmail processes building up, waiting
| >   indefinitely for the AOL server to come back up).
|
| I'm surprised this would be in the kernel. I would think the
| timeouts are settable by the application. Perhaps a newer version of
| sendmail or another of the many mailers (smail or procmail come to
| mind) might handle this better? You might be correct though, this
| should be a simple test file to write and see if it does actually
| time out.

Hmmm ... I've tackled this problem before, on a couple of Unices, and never came up with a solution that actually worked. Do you have code that reliably times out a connect() after n seconds? What does the code look like? Does it really work, even with an uncooperative other end to the connection?

The connect() call itself takes no timeout argument; you can check this with `man 2 connect`. So there's no help there.

The obvious thing to do is to set an alarm() n seconds in the future. This might work somewhere, but what it has done on the systems I've used is this: the alarm callback routine gets called, sets the appropriate flags, etc., and returns. It returns into the kernel, with the process still hung on the connect(), and control doesn't come back to the process. You'd expect connect() to return with an EINTR, but in my experience, this never happens. It waits until the tcp routines time out the connection, and only then does it return the EINTR.

I'd guess that this is a bug in the original BSD code, but it seems to be a bug that has been propagated to a lot of systems. Maybe I'll try to find some spare time to work up a test on linux.

In the meantime, does anyone know if linux's connect() can actually be timed out in a controlled, reliable fashion? (If so, I'd have one more argument in support of linux. ;-)
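
For anyone who wants something to experiment with, one approach often suggested instead of alarm() is to put the socket into non-blocking mode, start the connect(), and wait for it with select() and a timeout. The following is only a sketch of that idea, not something taken from this thread. It is written in Python because its socket module is a thin wrapper over the same system calls; the function name connect_with_timeout and its parameters are made up for illustration, and it assumes IPv4 and a Linux-style EINPROGRESS return from the non-blocking connect.

```python
import errno
import os
import select
import socket


def connect_with_timeout(host, port, timeout):
    """Hypothetical helper: return a connected TCP socket, or raise
    TimeoutError if the connection is not established within `timeout`
    seconds. Other failures (e.g. connection refused) raise OSError."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        sock.setblocking(False)
        # A non-blocking connect() returns immediately. EINPROGRESS means
        # the TCP handshake has started but has not finished yet.
        err = sock.connect_ex((host, port))
        if err not in (0, errno.EINPROGRESS):
            raise OSError(err, os.strerror(err))
        if err == errno.EINPROGRESS:
            # The socket becomes writable once the attempt completes,
            # whether it succeeded or failed. select() gives up after
            # `timeout` seconds and returns empty lists.
            _, writable, _ = select.select([], [sock], [], timeout)
            if not writable:
                raise TimeoutError(
                    f"connect to {host}:{port} timed out after {timeout}s")
            # SO_ERROR holds the real outcome of the connect attempt.
            err = sock.getsockopt(socket.SOL_SOCKET, socket.SO_ERROR)
            if err != 0:
                raise OSError(err, os.strerror(err))
        sock.setblocking(True)  # hand back an ordinary blocking socket
        return sock
    except BaseException:
        sock.close()
        raise
```

A caller would use it as `sock = connect_with_timeout("192.0.2.10", 25, 30)` (the address is a placeholder) and treat TimeoutError as "the other end is not answering". Because the process is waiting in select() rather than inside connect(), the timeout does not depend on a signal interrupting the system call. Whether this behaves well against every uncooperative peer is something to test rather than assume.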
BLU is a member of BostonUserGroups.
We also thank MIT for the use of their facilities.