There are some very bright people on this list. I'm betting at least a few of them can help me out with a problem I'm having. I'd prefer to keep the discussion on the list, but if someone wanted to e-mail me privately about this, you could do so at code at pizzashack dot org.

I've got some Python code that uses select.select() to capture all the output of a subprocess (both stdout and stderr, see below). This code works as expected on a variety of Fedora systems running Python > 2.4.0, but on a Debian 3.1 system running Python 2.2.1 it's a no-go. I'm trying to figure out where the bug is... I've been looking at this for a while, and I'm starting to think it's not in my code at all, but instead perhaps it's in the underlying tools or OS. Searching for bugs in Python 2.2.1 didn't turn up anything useful...

The behavior I see is as follows. The call to select() returns:

    [<fobj corresponding to the child's STDOUT>] [] []

If and only if the total amount of output is greater than the specified buffer size, then reading on this file hangs indefinitely.

For what it's worth, the program whose output I need to capture with this generates about 17k of output to STDERR, and about 1k of output to STDOUT, at essentially random intervals. But I also ran it with a test shell script that generates roughly 40k of output to each file object, alternating between STDOUT and STDERR in roughly line-sized chunks, with the same results. Using that shell script, my code works fine on a variety of systems, but the very same code is broken on Debian 3.1 with Python 2.2.1. [I know, I'm repeating myself.]

Yes, I'm aware that this version of Python is quite old, but I don't have a great deal of control over that (though if this is indeed a Python bug, as opposed to a problem with my implementation, it might provide some leverage to get it upgraded)...

Thanks in advance for any help you can provide.

The code in question (quite short) follows:

    def capture(cmd):
        buffsize = 8192
        inlist = []
        inbuf = ""
        errbuf = ""

        io = popen2.Popen3(cmd, True, buffsize)
        inlist.append(io.fromchild)
        inlist.append(io.childerr)
        while True:
            ins, outs, excepts = select.select(inlist, [], [])
            for i in ins:
                x = i.read()
                if not x:
                    inlist.remove(i)
                else:
                    if i == io.fromchild:
                        inbuf += x
                    if i == io.childerr:
                        errbuf += x
            if not inlist:
                break
        if io.wait():
            raise FailedExitStatus, errbuf
        return (inbuf, errbuf)

If anyone would like, I could also provide a shell script and a main program one could use to test this function...

MY ANALYSIS
-----------

The whole point of using select() is that it should only return a list of file objects which are ready for reading or writing. In this case, in both the working case (Python 2.4+ on Red Hat) and the non-working case (Python 2.2.1 on Debian 3.1), select() returns the file object corresponding to the subprocess's STDOUT, which *should* mean that there is data ready to be read on that file descriptor. However, the actual read blocks, and both the parent and the child go to sleep.

This should be impossible. That is the very problem select() is designed to solve...

I note that I've set the buffer size to 8k. At the very least, as soon as the process wrote 8k to STDOUT, there should be data ready to read. Assuming full buffering is enabled for the pipe that connects STDOUT of the subprocess to the parent, the call to select() should block until one of the following conditions occurs:

  - 8k of data is written by the child to STDOUT
  - any amount of data is written to STDERR
  - the child process terminates

The last point is perhaps noteworthy; if the child process only has 4k of data to write to STDOUT, and never writes anything to STDERR, then the buffer will never fill.
However, the program will terminate, at which point (assuming there was no explicit call to close() previously) the operating system will close all open file descriptors, and flush all of the child's I/O buffers. At that point, the parent process, which would be sleeping in select(), will wake up, read the 4k of data, and (eventually) close its end of the pipe (an additional iteration through the select() loop will be required, I believe).

Should the program write output to STDERR before the 8k STDOUT buffer is full, then again, the parent, sleeping in select(), will awaken, and select() will return the file object corresponding to the parent's end of the pipe connecting to the child's STDERR.

Again, all of this is the essence of what select() does. It is supposed to guarantee that any file descriptors (or objects) it returns are in fact ready for data to be read or written.

I think there are only a few possibilities:

  1. My implementation of the select() loop is subtly broken. This seems like the most likely case to me; however, I've been over it a bunch of times, and I can't find anything wrong with it. It's undeniable that select() is returning a file object, and that reads on that file object immediately after the call to select() block. I can't see how this could be possible, barring a bug somewhere else.

  2. select.select() is broken in the version of Python I'm using.

  3. The select() system call is somehow broken in the Linux kernel I'm using. I tend to rule this out, because I'm reasonably certain someone would have noticed this before I did. The kernel in question is being used on thousands of machines (I'm not exaggerating) which run a variety of network-oriented programs. I can't imagine that none of them uses select() (though perhaps it's possible that none use it in quite the manner I'm using it here). But it may be worth looking at... I could write an implementation of a select() loop in C and see how that works.

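One angle on possibility 1, offered as a hedged sketch rather than a diagnosis: select() only indicates that a read will not block for the data that is already available, whereas read() with no size argument on a Python file object keeps reading until EOF. If the child fills the other pipe in the meantime, it could stall before it ever closes STDOUT, and both processes would then sleep. The sketch below reads with os.read() on the raw descriptors instead, which returns whatever is available up to the requested size. It uses subprocess.Popen (which does not exist in Python 2.2) purely so that it runs on a current Python on a POSIX system. The name capture is reused from the code above; the chunk parameter and the shape of the returned tuple are assumptions made for illustration. On Python 2.2 the same idea would presumably apply to io.fromchild.fileno() and io.childerr.fileno().

```python
import os
import select
import subprocess


def capture(cmd, chunk=8192):
    """Collect a child's stdout and stderr without blocking on either pipe.

    Illustrative sketch only: `chunk` and the return layout are assumptions.
    """
    proc = subprocess.Popen(cmd, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
    # Use the raw file descriptors so no Python-level buffering is involved.
    out_fd = proc.stdout.fileno()
    err_fd = proc.stderr.fileno()
    chunks = {out_fd: [], err_fd: []}
    pending = [out_fd, err_fd]

    while pending:
        ready, _, _ = select.select(pending, [], [])
        for fd in ready:
            # os.read() returns as soon as any data is available (at most
            # `chunk` bytes). An empty result means the child closed that end.
            data = os.read(fd, chunk)
            if data:
                chunks[fd].append(data)
            else:
                pending.remove(fd)

    status = proc.wait()
    proc.stdout.close()
    proc.stderr.close()
    return status, b"".join(chunks[out_fd]), b"".join(chunks[err_fd])
```

Called as capture(["sh", "-c", "echo out; echo err >&2"]), this should return the child's exit status followed by the captured STDOUT and STDERR as byte strings. Unlike the original, it returns the status instead of raising on a non-zero exit, which is a simplification for the sketch.
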
If you can see any flaw in my analysis, or in my implementation, by all means point it out!

--
Derek D. Martin
http://www.pizzashack.org/
GPG Key ID: 0xDFBEAD02

-=-=-=-=-
This message is posted from an invalid address. Replying to it will result in undeliverable mail due to spam prevention. Sorry for the inconvenience.

_______________________________________________
Discuss mailing list
[hidden email]
http://lists.blu.org/mailman/listinfo/discuss