python giveth and python taketh away
Or to put it another way: the fun I had with the new subprocess module in Python (well, new to 2.4).
I’ve been doing some major refactoring of the code we use to do our continuous builds and during one of the sessions I discovered the new subprocess module. It appeared to be the answer to some hairy code that Morgen and others created in Hardhat (our current build tool) to work around the chore of running child processes across three platforms.
Some quick test code revealed that it really simplifies a lot of the basic spawn-child-process-and-gather-its-output activities that are at the core of any build process. For example:
import subprocess
p = subprocess.Popen(['ls', '-la'], stdout=subprocess.PIPE,
                     stderr=subprocess.PIPE)
out, err = p.communicate()
print "STDOUT follows:"
print out
print "STDERR follows:"
print err
This creates a Popen object, tells it to use pipes for STDOUT and STDERR, and starts “ls -la” right away. The p.communicate() call is a helper method that reads all of the output, waits for the command to finish, and returns the output as a tuple (out, err). Both are returned as strings.
That works fine if you want to get the output as two different strings. I really prefer to see them intermixed, so I took a look at the source and saw that subprocess.STDOUT was available, and that triggered an Aha! moment when I remembered that they were pipes: stderr can simply be pointed at the same pipe as stdout.
import subprocess
p = subprocess.Popen(['ls', '-la'], stdout=subprocess.PIPE,
                     stderr=subprocess.STDOUT)
out, err = p.communicate()
print "STDOUT follows:"
print out
print "STDERR follows:"
print err
This code shows that all of the output is returned in the variable out and the variable err is assigned a value of None. Not bad! Next I started experimenting with the other Popen methods myself instead of using the communicate() helper, and that led me to try the following:
import subprocess
p = subprocess.Popen(['ls', '-la'], stdout=subprocess.PIPE,
                     stderr=subprocess.STDOUT)
p.wait()
out = p.stdout.readlines()
print "return code [%d]" % p.returncode
print out
Now this worked like gangbusters for me: the output is intermixed and also returned as a list (something which most of my build helper routines handle), and I can also pass back the return code for log output. Looking good :)
That is, until I started seeing really odd hangs in my testing. At first I suspected a lot of other things (bad files, corrupt downloads, internet lag, etc.), but eventually came to realize that the child process was sending back so much output that the pipe filled up: the child blocked waiting for something to read from the pipe, while my code was blocked in p.wait() waiting for the child to exit. At least that’s what I think is happening, so I decided to test my theory, and when I rearranged the code to do this:
out = p.stdout.readlines()
p.wait()
things started flowing again. Reading first also opens the door to logging the data as it arrives instead of waiting until the end, though for that you need to read p.stdout a line at a time, since readlines() does not return until the child closes its end of the pipe. That makes the output appear as it’s generated. While this helps only with the appearance of activity, it is noticeable when you are watching a terminal session sitting there for a couple of minutes with nothing happening.
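Here is a rough sketch of the read-first-then-wait pattern wrapped up as a helper. It is written in Python 3 syntax so it runs on a current interpreter (the snippets above are 2.4-era), and the names run_and_stream and log are made up for illustration; they are not part of subprocess or Hardhat. It assumes the child's output is line-oriented and decodes as UTF-8.

```python
import subprocess
import sys


def run_and_stream(cmd, log=None):
    """Run cmd with stderr merged into stdout; return (returncode, lines).

    Hypothetical helper for illustration only.
    """
    p = subprocess.Popen(cmd, stdout=subprocess.PIPE, stderr=subprocess.STDOUT)
    lines = []
    # Drain the pipe first. Iterating over p.stdout hands back lines as
    # they arrive and stops at EOF, so the pipe never stays full.
    for raw in p.stdout:
        line = raw.decode("utf-8", "replace").rstrip("\r\n")
        lines.append(line)
        if log is not None:
            log(line)  # e.g. write to the build log right away
    p.stdout.close()
    # Only now ask for the exit status; the child has nothing left to
    # block on, so wait() cannot hang on a full pipe.
    returncode = p.wait()
    return returncode, lines


if __name__ == "__main__":
    # A child that prints far more than a typical pipe buffer holds.
    child = [sys.executable, "-c",
             "for i in range(20000): print('line %d' % i)"]
    code, output = run_and_stream(child)
    print("return code [%d], %d lines" % (code, len(output)))
```

How promptly lines show up still depends on how the child buffers its own output, so treat the "as it arrives" part as best effort.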
Anywho, hopefully this will appear in someone’s Google search and be helpful to them. When I was googling for examples the only hits I got were for the Python docs, and those were not very useful to me.