There are some very bright people on this list. I'm betting at least
a few of them can help me out with a problem I'm having. I'd prefer
to keep the discussion on the list, but if someone wanted to e-mail me
privately about this, you could do so at code at pizzashack dot org.
I've got some Python code that uses select.select() to capture all the
output of a subprocess (both stdout and stderr, see below). This code
works as expected on a variety of Fedora systems running Python > 2.4.0,
but on a Debian 3.1 system running Python 2.2.1 it's a no-go. I'm
trying to figure out where the bug is... I've been looking at this
for a while, and I'm starting to think it's not in my code at all, but
instead perhaps it's in the underlying tools or OS. Searching for
bugs in Python 2.2.1 didn't turn up anything useful...
The behavior I see is as follows. The call to select() returns:
[<fobj corresponding to the child's STDOUT>] [] []
If and only if the total amount of output is greater than the
specified buffer size, then reading on this file hangs indefinitely.
For what it's worth, the program whose output I need to capture with
this generates about 17k of output to STDERR, and about 1k of output
to STDOUT, at essentially random intervals. But I also ran it with a
test shell script that generates roughly 40k of output to each file
object, alternating between STDOUT and STDERR in roughly line-sized
chunks, with the same results. Using that shell script, my code works
fine on a variety of systems, but using all the same code, is broken
on Debian 3.1 with Python 2.2.1 on it. [I know, I'm repeating myself.]
Yes, I'm aware that this version of Python is quite old, but I don't
have a great deal of control over that (though if this is indeed a
Python bug, as opposed to a problem with my implementation, it might
provide some leverage to get it upgraded)... Thanks in advance for
any help you can provide. The code in question (quite short) follows:
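import popen2
import select

# Minimal stand-in so the snippet runs on its own; the real program
# defines its own FailedExitStatus.
class FailedExitStatus(Exception):
    pass

def capture(cmd):
    buffsize = 8192
    inlist = []
    inbuf = ""
    errbuf = ""
    # capturestderr=True gives the child's STDERR its own pipe (io.childerr)
    io = popen2.Popen3(cmd, True, buffsize)
    inlist.append(io.fromchild)
    inlist.append(io.childerr)
    while True:
        ins, outs, excepts = select.select(inlist, [], [])
        for i in ins:
            x = i.read()
            if not x:
                # empty string: the child closed this pipe
                inlist.remove(i)
            else:
                if i == io.fromchild:
                    inbuf += x
                if i == io.childerr:
                    errbuf += x
        if not inlist:
            break
    if io.wait():
        raise FailedExitStatus, errbuf
    return (inbuf, errbuf)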
If anyone would like, I could also provide a shell script and a main
program one could use to test this function...
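A possible rework, offered as a sketch rather than a tested drop-in: read the raw descriptors with os.read(), which makes a single read(2) call and returns whatever is already available, instead of the file objects' read(), which with no size argument keeps reading until EOF. It is written for Python 3, with subprocess.Popen (taking an argument list) standing in for popen2.Popen3, so it would need backporting for 2.2 (os.read() itself exists there). FailedExitStatus and the CHILD test program are hypothetical stand-ins. POSIX only, since select() does not accept pipes on Windows.

```python
import os
import select
import subprocess
import sys


class FailedExitStatus(Exception):
    """Hypothetical stand-in for the exception raised in the original code."""


def capture(cmd, chunk=8192):
    """Run cmd (an argument list) and return (stdout_bytes, stderr_bytes)."""
    proc = subprocess.Popen(cmd, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
    fd_out = proc.stdout.fileno()
    fd_err = proc.stderr.fileno()
    chunks = {fd_out: [], fd_err: []}
    open_fds = [fd_out, fd_err]
    while open_fds:
        ready, _, _ = select.select(open_fds, [], [])
        for fd in ready:
            # One read(2) call: returns up to `chunk` bytes that are already
            # available, so it does not block once select() reported fd ready.
            data = os.read(fd, chunk)
            if data:
                chunks[fd].append(data)
            else:
                # b"" means EOF: the child closed its end of this pipe.
                open_fds.remove(fd)
    proc.stdout.close()
    proc.stderr.close()
    out = b"".join(chunks[fd_out])
    err = b"".join(chunks[fd_err])
    if proc.wait():
        raise FailedExitStatus(err)
    return out, err


# Hypothetical test child: about 40k to each stream, alternating line by
# line, in the spirit of the shell script mentioned above.
CHILD = (
    "import sys\n"
    "for i in range(1000):\n"
    "    sys.stdout.write('out %04d ' % i + 'x' * 31 + '\\n')\n"
    "    sys.stderr.write('err %04d ' % i + 'y' * 31 + '\\n')\n"
)

if __name__ == "__main__":
    out, err = capture([sys.executable, "-c", CHILD])
    print(len(out), len(err))  # 41000 41000
```

Keeping both descriptors in one select() call and appending to per-descriptor lists means neither pipe is left undrained while the other is being read.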
MY ANALYSIS
-----------
The whole point of using select() is that it should only return a list
of file objects which are ready for reading or writing. In this case,
in both the working case (Python 2.4+ on Red Hat) and the non-working
case (Python 2.2.1 on Debian 3.1), select() returns the file object
corresponding to the subprocess's STDOUT, which *should* mean that
there is data ready to be read on that file descriptor. However, the
actual read blocks, and both the parent and the child go to sleep.
This should be impossible. That is the very problem select() is
designed to solve...
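Readiness from select() only promises that a single read(2) on the descriptor will not block; it says nothing about how much data is there. A Python file object's read() with no size argument, as in the i.read() call above, keeps reading until EOF, so it can still block after select() has reported the descriptor ready. The sketch below (Python 3, POSIX, with a bare os.pipe() standing in for the child's STDOUT) shows the single-call behavior of os.read():

```python
import os
import select

r, w = os.pipe()
os.write(w, b"partial line")   # the writer keeps its end open

# Zero timeout: only asks whether r is readable right now.
ready, _, _ = select.select([r], [], [], 0)
print(ready == [r])            # True: some data is available

# One read(2) call returns what is buffered without waiting for more.
data = os.read(r, 8192)
print(data)                    # b'partial line'

# By contrast, os.fdopen(r, "rb").read() with no size argument would keep
# reading until w is closed, i.e. it would block here indefinitely.
os.close(w)
os.close(r)
```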
I note that I've set the buffer size to 8k. At the very least, as
soon as the process has written 8k to STDOUT, there should be data
ready to read. Assuming full buffering is enabled for the pipe that
connects STDOUT of the subprocess to the parent, the call to select()
should block until one of the following conditions occurs:
- 8k of data is written by the child to STDOUT
- any amount of data is written to STDERR
- the child process terminates
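One detail worth checking in the reasoning above: the buffsize argument to popen2.Popen3 sets the buffer size of the parent's file objects, not the capacity of the kernel pipe. The pipe has its own fixed capacity, and a child writing into a full pipe sleeps until the parent drains it. Per the Linux pipe(7) man page, that capacity was one page (4096 bytes on i386) before Linux 2.6.11 and 65536 bytes since. This sketch (Python 3, POSIX; the helper name pipe_capacity is hypothetical) measures it by filling a non-blocking pipe one byte at a time:

```python
import os


def pipe_capacity():
    """Count how many bytes fit in a fresh pipe before a write would block."""
    r, w = os.pipe()
    os.set_blocking(w, False)  # a full pipe now raises instead of sleeping
    total = 0
    try:
        while True:
            total += os.write(w, b"x")
    except BlockingIOError:
        pass  # EAGAIN: the pipe is full
    finally:
        os.close(r)
        os.close(w)
    return total


print(pipe_capacity())
```

If the Debian 3.1 machine runs a kernel older than 2.6.11, this would be one plausible way for the same code to hang there and not elsewhere: while the parent sits in the STDOUT read() waiting for EOF, the child keeps writing STDERR, stalls once that 4k pipe is full, and so never closes STDOUT. With 40k per stream, a 64k pipe never fills. This is a hypothesis to verify, not a diagnosis.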
The last point is perhaps noteworthy; if the child process only has 4k
of data to write to STDOUT, and never writes anything to STDERR, then
the buffer will never fill. However, the program will terminate, at
which point (assuming there was no explicit call to close()
previously) the operating system will close all open file descriptors,
and flush all of the child's I/O buffers. At that point, the parent
process, which would be sleeping in select(), will wake up, read the
4k of data, and (eventually) close its end of the pipe (an additional
iteration through the select() loop will be required, I believe).
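The extra iteration guessed at above can be checked in isolation. In this sketch (Python 3, POSIX), a bare pipe stands in for the child: it writes its last 4k and closes its end, as happens when the process exits. select() then reports the read end ready twice, once for the data and once more for EOF, which os.read() signals by returning b"":

```python
import os
import select

r, w = os.pipe()
os.write(w, b"x" * 4096)  # the child's last 4k of output...
os.close(w)               # ...then it exits, closing its end

reads = []
while True:
    ready, _, _ = select.select([r], [], [])
    chunk = os.read(r, 8192)
    reads.append(len(chunk))
    if not chunk:         # b"" signals EOF: the writer is gone
        break
os.close(r)
print(reads)              # [4096, 0]
```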
Should the program write output to STDERR before the 8k STDOUT buffer
is full, then again, the parent, sleeping in select(), will awaken, and
select will return the file object corresponding to the parent's end
of the pipe connecting to the child's STDERR. Again, all of this is the
essence of what select() does. It is supposed to guarantee that any
file descriptors (or objects) it returns are in fact ready for data to
be read or written.
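The STDERR case can be sketched the same way (Python 3, POSIX; two bare pipes stand in for the child's STDOUT and STDERR). A small write to the STDERR pipe alone is enough for select() to return, and it returns only that descriptor:

```python
import os
import select

out_r, out_w = os.pipe()  # stands in for the child's STDOUT pipe
err_r, err_w = os.pipe()  # stands in for the child's STDERR pipe

os.write(err_w, b"warning\n")  # nothing at all is written to STDOUT

# 1-second timeout so a mistake here returns instead of hanging.
ready, _, _ = select.select([out_r, err_r], [], [], 1.0)
print(ready == [err_r])        # True
msg = os.read(err_r, 8192)
print(msg)                     # b'warning\n'

for fd in (out_r, out_w, err_r, err_w):
    os.close(fd)
```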
I think there are only a few possibilities:
1. My implementation of the select() loop is subtly broken. This
   seems like the most likely case to me; however, I've been over it
   a bunch of times, and I can't find anything wrong with it. It's
   undeniable that select() is returning a file object, and that
   reading from that file object immediately after the call to
   select() blocks. I can't see how this could be possible, barring a
   bug somewhere else.
2. select.select() is broken in the version of Python I'm using.
3. The select() system call is somehow broken in the Linux kernel I'm
   using. I tend to rule this out, because I'm reasonably certain
   someone would have noticed this before I did. The kernel in
   question is being used on thousands of machines (I'm not
   exaggerating) which run a variety of network-oriented programs. I
   can't imagine that none of them uses select() (though perhaps it's
   possible that none use it in quite the manner I'm using it here).
   But it may be worth looking at... I could write an implementation
   of a select() loop in C and see how that works.
If you can see any flaw in my analysis, or in my implementation, by
all means point it out!
--
Derek D. Martin http://www.pizzashack.org/ GPG Key ID: 0xDFBEAD02
-=-=-=-=-
This message is posted from an invalid address. Replying to it will result in
undeliverable mail due to spam prevention. Sorry for the inconvenience.
_______________________________________________
Discuss mailing list
http://lists.blu.org/mailman/listinfo/discuss