
BLU Discuss list archive



A little Python-fu with select()



 There are some very bright people on this list.  I'm betting at least 
a few of them can help me out with a problem I'm having.  I'd prefer 
to keep the discussion on the list, but if someone wanted to e-mail me 
privately about this, you could do so at code at pizzashack dot org. 

I've got some Python code that uses select.select() to capture all the 
output of a subprocess (both stdout and stderr, see below).  This code 
works as expected on a variety of Fedora systems running Python > 2.4.0, 
but on a Debian 3.1 system running Python 2.2.1 it's a no-go.  I'm 
trying to figure out where the bug is...  I've been looking at this 
for a while, and I'm starting to think it's not in my code at all, but 
instead perhaps it's in the underlying tools or OS.  Searching for 
bugs in Python 2.2.1 didn't turn up anything useful... 

The behavior I see is as follows.  The call to select() returns: 

  [<fobj corresponding to the child's STDOUT>] [] [] 

If and only if the total amount of output is greater than the 
specified buffer size, then reading on this file hangs indefinitely. 
For what it's worth, the program whose output I need to capture with 
this generates about 17k of output to STDERR, and about 1k of output 
to STDOUT, at essentially random intervals.  But I also ran it with a 
test shell script that generates roughly 40k of output to each file 
object, alternating between STDOUT and STDERR in roughly line-sized 
chunks, with the same results.  Using that shell script, my code works 
fine on a variety of systems, but the very same code is broken on 
Debian 3.1 with Python 2.2.1. [I know, I'm repeating myself.] 
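
A rough Python stand-in for the kind of test generator described above might 
look like the following.  This is only a sketch: the name emit_alternating, 
the 64-byte line length and the exact totals are assumptions for illustration, 
not the shell script that was actually used. 

```python
def emit_alternating(out, err, total_bytes=40 * 1024, line_len=64):
    """Write about total_bytes to each stream, one line at a time, alternating."""
    # Hypothetical helper: every line is line_len bytes including the newline.
    line = "x" * (line_len - 1) + "\n"
    written = 0
    while written < total_bytes:
        out.write(line)
        out.flush()   # flush so the two streams really interleave
        err.write(line)
        err.flush()
        written += len(line)
    return written    # bytes written to EACH stream
```

Calling emit_alternating(sys.stdout, sys.stderr) from a small script gives a 
child process that sends roughly 40k to each pipe in line-sized chunks. 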

Yes, I'm aware that this version of Python is quite old, but I don't 
have a great deal of control over that (though if this is indeed a 
python bug, as opposed to a problem with my implementation, it might 
provide some leverage to get it upgraded)...  Thanks in advance for 
any help you can provide.  The code in question (quite short) follows: 

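import popen2
import select

# Assumed definition; the exception class is not shown in the original post.
class FailedExitStatus(Exception):
    pass

def capture(cmd): 
    buffsize = 8192 
    inlist = [] 
    inbuf = "" 
    errbuf = "" 

    # capturestderr=True: separate pipes for the child's stdout and stderr
    io = popen2.Popen3(cmd, True, buffsize) 
    inlist.append(io.fromchild) 
    inlist.append(io.childerr) 
    while True: 
        # sleep until select() reports at least one pipe as readable
        ins, outs, excepts = select.select(inlist, [], []) 
        for i in ins: 
            x = i.read() 
            if not x: 
                # empty read means EOF: stop watching this pipe
                inlist.remove(i) 
            else: 
                if i == io.fromchild: 
                    inbuf += x 
                if i == io.childerr: 
                    errbuf += x 
        if not inlist: 
            break 
    # a non-zero wait() status is treated as failure
    if io.wait(): 
        raise FailedExitStatus, errbuf 
    return (inbuf, errbuf) 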

If anyone would like, I could also provide a shell script and a main 
program one could use to test this function... 

MY ANALYSIS 
----------- 

The whole point of using select() is that it should only return a list 
of file objects which are ready for reading or writing.  In this case, 
in both the working case (Python 2.4+ on Red Hat) and the non-working 
case (Python 2.2.1 on Debian 3.1), select() returns the file object 
corresponding to the subprocess's STDOUT, which *should* mean that 
there is data ready to be read on that file descriptor.  However, the 
actual read blocks, and both the parent and the child go to sleep. 

This should be impossible.  That is the very problem select() is 
designed to solve... 

I note that I've set the buffer size to 8k.  At the very least, as 
soon as the process wrote 8k to STDOUT, there should be data ready to 
read.  Assuming full buffering is enabled for the pipe that connects 
STDOUT of the subprocess to the parent, the call to select() should 
block until one of the following conditions occurs: 

 - 8k of data is written by the child to STDOUT 

 - any amount of data is written to STDERR 

 - the child process terminates 

The last point is perhaps noteworthy; if the child process only has 4k 
of data to write to STDOUT, and never writes anything to STDERR, then 
the buffer will never fill.  However, the program will terminate, at 
which point (assuming there was no explicit call to close() 
previously) the operating system will close all open file descriptors, 
and flush all of the child's I/O buffers.  At that point, the parent 
process, which would be sleeping in select(), will wake up, read the 
4k of data, and (eventually) close its end of the pipe (an additional 
iteration through the select() loop will be required, I believe). 
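
The end-of-file case described above can be checked in isolation with a plain 
pipe.  The sketch below is an assumption-laden illustration (modern Python 3, 
a hypothetical helper named drain_after_close, and closing the write end as a 
stand-in for the child exiting); it is not the code from the post. 

```python
import os
import select

def drain_after_close(payload, bufsize=8192):
    """Write payload into a pipe, close the write end, then read via select()."""
    r, w = os.pipe()
    os.write(w, payload)   # payload is assumed to fit in the pipe
    os.close(w)            # stands in for the child terminating
    chunks = []
    iterations = 0
    while True:
        # select() reports the read end as readable for data AND for EOF
        select.select([r], [], [])
        iterations += 1
        chunk = os.read(r, bufsize)
        if not chunk:      # empty read: EOF reached
            break
        chunks.append(chunk)
    os.close(r)
    return b"".join(chunks), iterations
```

With a 4k payload the data comes back first, and the empty read that signals 
EOF only shows up on a later pass through the loop. 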

Should the program write output to STDERR before the 8k STDOUT buffer 
is full, then again, the parent, sleeping in select(), will awaken, and 
select will return the file object corresponding to the parent's end 
of the pipe connecting to the child's STDERR.  Again, all of this is the 
essence of what select() does.  It is supposed to guarantee that any 
file descriptors (or objects) it returns are in fact ready for data to 
be read or written. 
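
One thing that may be worth checking here: the readiness that select() 
reports applies to a single read() system call.  A file object's read() with 
no size argument keeps reading until EOF, so it can go on to issue further 
reads that do block.  The sketch below (Python 3, hypothetical helper name 
read_when_ready) shows the single-read behavior on a bare pipe whose writer is 
still open. 

```python
import os
import select

def read_when_ready(payload, bufsize=8192):
    """Return what one os.read() yields after select() says the pipe is ready."""
    r, w = os.pipe()
    try:
        os.write(w, payload)   # write end stays open, so there is no EOF yet
        ready, _, _ = select.select([r], [], [], 1.0)
        if r not in ready:
            return None
        # One read(2): returns the bytes available now, up to bufsize,
        # without waiting for the buffer to fill or for EOF.
        return os.read(r, bufsize)
    finally:
        os.close(r)
        os.close(w)
```

Here read_when_ready(b"hello") returns the five bytes immediately, even though 
far less than bufsize was written and the writer never closed the pipe. 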

I think there are only a few possibilities: 

1. My implementation of the select() loop is subtly broken.  This 
   seems like the most likely case to me; however I've been over it a 
   bunch of times, and I can't find anything wrong with it.  It's 
   undeniable that select is returning a file object, and that reads 
   on that file object immediately after the call to select block.  I 
   can't see how this could be possible, barring a bug somewhere else. 

2. select.select() is broken in the version of Python I'm using.   

3. The select() system call is somehow broken in the Linux kernel I'm 
   using.  I tend to rule this out, because I'm reasonably certain 
   someone would have noticed this before I did.  The kernel in 
   question is being used on thousands of machines (I'm not 
   exaggerating) which run a variety of network-oriented programs.  I 
   can't imagine that none of them uses select() (though perhaps it's 
   possible that none use it in quite the manner I'm using it here). 
   But it may be worth looking at...  I could write an implementation 
   of a select() loop in C and see how that works. 
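
Before going to C, a cheaper experiment might be a select() loop that skips 
the file objects and calls os.read() on the raw descriptors, so that each 
ready descriptor gets exactly one read.  The sketch below is written for 
modern Python 3 on a POSIX system and uses subprocess, which does not exist in 
Python 2.2.1; the name capture_raw and its return value are assumptions.  On 
the old system the same idea would presumably be 
os.read(i.fileno(), buffsize) in place of i.read().  If this version does not 
hang, that would point at possibility 1: the parent blocked in a 
read-until-EOF on one pipe while the child is stuck writing to the other, 
full one.  Pipe capacity differs between kernel versions, which might explain 
why only some systems show the hang. 

```python
import os
import select
import subprocess

def capture_raw(cmd, bufsize=8192):
    """Collect a child's stdout and stderr with select() plus os.read()."""
    proc = subprocess.Popen(cmd, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
    out_fd = proc.stdout.fileno()
    err_fd = proc.stderr.fileno()
    chunks = {out_fd: [], err_fd: []}
    open_fds = [out_fd, err_fd]
    while open_fds:
        ready, _, _ = select.select(open_fds, [], [])
        for fd in ready:
            data = os.read(fd, bufsize)   # one read per ready descriptor
            if data:
                chunks[fd].append(data)
            else:
                open_fds.remove(fd)       # empty read: EOF on this pipe
    proc.stdout.close()
    proc.stderr.close()
    status = proc.wait()
    return status, b"".join(chunks[out_fd]), b"".join(chunks[err_fd])
```

Unlike the original capture(), this returns the exit status instead of raising 
on a non-zero value; that is a simplification for the sketch. 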

If you can see any flaw in my analysis, or in my implementation, by 
all means point it out! 

-- 
Derek D. Martin    http://www.pizzashack.org/   GPG Key ID: 0xDFBEAD02 
-=-=-=-=- 
This message is posted from an invalid address.  Replying to it will result in 
undeliverable mail due to spam prevention.  Sorry for the inconvenience. 

 

