Hello, First off, many thanks for libguestfs... very useful! I'm trying to run libguestfs as an RPC service by using a python daemon with the python libguestfs bindings. With this service I'm "centrally" editing VM images, and often running many GuestFS() on the same image file in a short amount of time to edit the image's contents. The problem I'm running across is that each GuestFS() instance called from within the python daemon doesn't terminate itself (until a timeout much later?). I believe this sometimes causes my image edits to clobber each other and corrupt the image now and then. Each image edit usually amounts to: g = guestfs.GuestFS() g.add_drive(image_file) g.launch() g.mount("/dev/vda1", "/") g.write(file_name, file_contents) g.sync() g.umount_all() g.close() del g As I understand things, there is a reference count that should close "g" once the reference count = 0. However, I can't seem to make that happen inside of my python daemon which continues to run. Any clues on how I might accomplish this or what I'm doing wrong? Thanks
On Fri, Apr 29, 2011 at 01:38:19AM -0400, T Johnson wrote:> Hello, > > First off, many thanks for libguestfs... very useful! > > I'm trying to run libguestfs as an RPC service by using a python > daemon with the python libguestfs bindings. With this service I'm > "centrally" editing VM images, and often running many GuestFS() on the > same image file in a short amount of time to edit the image's > contents. > > The problem I'm running across is that each GuestFS() instance called > from within the python daemon doesn't terminate itself (until a > timeout much later?). I believe this sometimes causes my image edits > to clobber each other and corrupt the image now and then. Each image > edit usually amounts to: > > g = guestfs.GuestFS() > g.add_drive(image_file) > g.launch() > g.mount("/dev/vda1", "/") > g.write(file_name, file_contents) > g.sync() > g.umount_all() > g.close() > del g > > > As I understand things, there is a reference count that should close > "g" once the reference count = 0. However, I can't seem to make that > happen inside of my python daemon which continues to run.First thing is to find out if 'del g' really closes the handle or not. It's my understanding (of Python bindings) that 'del g' *should* close the handle immediately, assuming you haven't hidden a reference to 'g' somewhere else in the program. Unfortunately we left out a trace call in guestfs_close so instead of using ordinary call tracing you have to do this using debugging: g.set_verbose (1) print >> sys.stderr, "I'm about to call del g ..." del g print >> sys.stderr, "I've called del g." This should produce a sequence of messages like: I'm about to call del g ... closing guestfs handle 0x1234 (state 2) # ie. guestfs_close called I've called del g. So we could get one of two results here: either guestfs_close is being called, or guestfs_close is not being called. If it's *not* being called then it's something to do with reference counting (or you're tucking a reference away somewhere else). I don't know a lot about how Python refcounting is supposed to work, but it's possible there is a mistake in the bindings, eg. not incrementing or decrementing a reference somewhere. If guestfs_close *is* being called then it's a more serious problem in the underlying library. guestfs_close is supposed to kill the qemu subprocess by sending it a SIGTERM, so really it should exit almost immediately, and we also waitpid for the process so really guestfs_close should not be returning before qemu has really exited. If this isn't happening, then getting 'strace' output would be the most useful thing. Rich. -- Richard Jones, Virtualization Group, Red Hat http://people.redhat.com/~rjones virt-df lists disk usage of guests without needing to install any software inside the virtual machine. Supports Linux and Windows. http://et.redhat.com/~rjones/virt-df/
On Sat, Apr 30, 2011 at 01:10:57AM -0400, T Johnson wrote:> Thanks for responding. I'm by no means a python expert so I could very > well be doing something wrong, but I've been playing a bit with what > you've suggested and noted a few interesting things. > > 1) As soon as I create 'g', something seems to take over stderr and > stdout so I can no longer print debugging output like you suggested.This is very odd. AFAIK nothing in libguestfs should affect stdout or stderr. When you say "create 'g'" do you mean "g = guestfs.GuestFS()" (ie. create the handle) or "g.launch ()" (ie. launch the subprocess)? I guess it is possible that something in g.launch (ie. forking off a subprocess) might affect stdout/stderr, although we use it in an awful lot of other programs and no one has noticed anything before. Here's my test program: ---------------------------------------------------------------------- #!/usr/bin/python import sys import guestfs print >> sys.stderr, "creating the handle" g = guestfs.GuestFS() print >> sys.stderr, "add_drive_ro" g.add_drive_ro ("/dev/null") print >> sys.stderr, "launch" g.launch () print >> sys.stderr, "set_verbose" g.set_verbose (1) print >> sys.stderr, "delete the handle" #g.close () del g print >> sys.stderr, "finished" ---------------------------------------------------------------------- BTW you can't directly call g.close() from Python because that method is not exposed (instead, it is called implicitly when you delete g). So your program ought to be throwing an exception when you try to call g.close(). I'm not sure why you are not seeing that. You're not trying to create the underlying libguestfsmod objects directly? Don't do that. Here is the output from my test program: ---------------------------------------------------------------------- creating the handle add_drive_ro launch set_verbose delete the handle libguestfs: closing guestfs handle 0x25106a0 (state 2) libguestfs: send_to_daemon: 44 bytes: 00 00 00 28 | 20 00 f5 f5 | 00 00 00 04 | 00 00 01 1a | 00 00 00 00 | ... libguestfs: recv_from_daemon: 40 bytes: 20 00 f5 f5 | 00 00 00 04 | 00 00 01 1a | 00 00 00 01 | 00 12 34 00 | ... libguestfs: sending SIGTERM to process 13874 finished ---------------------------------------------------------------------- As you can see it's completely as expected (python-libguestfs-1.11.3-1.fc16.x86_64).> 2) (I assume) because of #1, g.set_verbose(1) results in no output to > stdout/stderr either, so it hasn't been helpful so far. > > 3) However, I then decided to try g.set_trace(1) and that does provide > output. In the example I provided before I get this (editing config > settings in /etc/sysconfig/network):Unfortunately (because of a bug) closing the handle does not result in any trace output, except ...> libguestfs: trace: add_drive "/mnt/images/test_server.img" > libguestfs: trace: add_drive = 0 > libguestfs: trace: launch > libguestfs: trace: launch = 0 > libguestfs: trace: mount "/dev/vda1" "/" > libguestfs: trace: mount = 0 > libguestfs: trace: write "/etc/sysconfig/network" "..." > libguestfs: trace: write = 0 > > I can't seem to get any other ouput.. but it is very interesting that > once I kill the process I see the remainder of the calls failing from > trace output: > > libguestfs: trace: umount_all > libguestfs: trace: umount_all = -1 (error) > libguestfs: trace: sync > libguestfs: trace: sync = -1 (error) > libguestfs: trace: umount_all > libguestfs: trace: umount_all = -1 (error) > libguestfs: trace: sync > libguestfs: trace: sync = -1 (error)... when you close a handle which has autosync enabled, which is now the default, you'll see output like this. It seems like all your handles are getting closed at once. I have no idea why.> So, I guess that is what's failing.. but I haven't been able to > determine why yet. > > This is something I've been running into on much older versions of > libguestfs all the way up to now which I'm trying your recent > "1.7.17-17.el6" RPMs for rhel6.Ah, so this could be RHEL 6.0? This could well be a RHEL 6.0 bug. Definitely try out the RHEL 6.1 packages from: http://people.redhat.com/~rjones/rhel6.1-libguestfs-preview/ You'll find a lot of things are better. It's much faster for one. Rich. -- Richard Jones, Virtualization Group, Red Hat http://people.redhat.com/~rjones virt-df lists disk usage of guests without needing to install any software inside the virtual machine. Supports Linux and Windows. http://et.redhat.com/~rjones/virt-df/
Hello, I've been strapped for time to work on this lately, but have a couple updates. I have been using the updated RHEL 6.1 packages, so unfortunately that doesn't seem to solve the problem yet. I'm still having some trouble capturing all the output, but I have manged to collect this verbose output after aborting a "stalled" request: [ 35.206396] end_request: I/O error, dev vda, sector 2048 [ 35.207363] Buffer I/O error on device vda1, logical block 0 [ 35.207384] lost page write due to I/O error on vda1 [ 35.207384] Buffer I/O error on device vda1, logical block 1 [ 35.207384] lost page write due to I/O error on vda1 [ 35.207384] end_request: I/O error, dev vda, sector 1574912 [ 35.207384] Buffer I/O error on device vda1, logical block 196608 [ 35.207384] lost page write due to I/O error on vda1 [ 35.207384] end_request: I/O error, dev vda, sector 1575624 [ 35.207384] Buffer I/O error on device vda1, logical block 196697 [ 35.207384] lost page write due to I/O error on vda1 check_for_daemon_cancellation_or_eof: 0x1786f80 g->state = 3, fd = 66 child_cleanup: 0x1786f80: child process died I should probably note that this is being done over NFS mounted images. I'm fairly sure there aren't NFS problems though, as the NFS mounts are quite stable and otherwise seem to respond just fine against the same mounts at the same time this problem occurs. I'm hoping the above might trigger some ideas but I'm still working on trying to figure out the lack of verbose data to stderr otherwise. Thanks, TJ On Sat, Apr 30, 2011 at 5:32 AM, Richard W.M. Jones <rjones at redhat.com> wrote:> On Sat, Apr 30, 2011 at 01:10:57AM -0400, T Johnson wrote: >> Thanks for responding. I'm by no means a python expert so I could very >> well be doing something wrong, but I've been playing a bit with what >> you've suggested and noted a few interesting things. >> >> 1) As soon as I create 'g', something seems to take over stderr and >> stdout so I can no longer print debugging output like you suggested. > > This is very odd. ?AFAIK nothing in libguestfs should affect stdout or > stderr. > > When you say "create 'g'" do you mean "g = guestfs.GuestFS()" (ie. > create the handle) or "g.launch ()" (ie. launch the subprocess)? ?I > guess it is possible that something in g.launch (ie. forking off a > subprocess) might affect stdout/stderr, although we use it in an awful > lot of other programs and no one has noticed anything before. > > Here's my test program: > > ---------------------------------------------------------------------- > #!/usr/bin/python > > import sys > import guestfs > > print >> sys.stderr, "creating the handle" > g = guestfs.GuestFS() > print >> sys.stderr, "add_drive_ro" > g.add_drive_ro ("/dev/null") > print >> sys.stderr, "launch" > g.launch () > print >> sys.stderr, "set_verbose" > g.set_verbose (1) > print >> sys.stderr, "delete the handle" > #g.close () > del g > print >> sys.stderr, "finished" > ---------------------------------------------------------------------- > > BTW you can't directly call g.close() from Python because that method > is not exposed (instead, it is called implicitly when you delete g). > So your program ought to be throwing an exception when you try to call > g.close(). ?I'm not sure why you are not seeing that. > > You're not trying to create the underlying libguestfsmod objects > directly? ?Don't do that. > > Here is the output from my test program: > > ---------------------------------------------------------------------- > creating the handle > add_drive_ro > launch > set_verbose > delete the handle > libguestfs: closing guestfs handle 0x25106a0 (state 2) > libguestfs: send_to_daemon: 44 bytes: 00 00 00 28 | 20 00 f5 f5 | 00 00 00 04 | 00 00 01 1a | 00 00 00 00 | ... > libguestfs: recv_from_daemon: 40 bytes: 20 00 f5 f5 | 00 00 00 04 | 00 00 01 1a | 00 00 00 01 | 00 12 34 00 | ... > libguestfs: sending SIGTERM to process 13874 > finished > ---------------------------------------------------------------------- > > As you can see it's completely as expected > (python-libguestfs-1.11.3-1.fc16.x86_64). > >> 2) (I assume) because of #1, g.set_verbose(1) results in no output to >> stdout/stderr either, so it hasn't been helpful so far. >> >> 3) However, I then decided to try g.set_trace(1) and that does provide >> output. In the example I provided before I get this (editing config >> settings in /etc/sysconfig/network): > > Unfortunately (because of a bug) closing the handle does not result > in any trace output, except ... > >> libguestfs: trace: add_drive "/mnt/images/test_server.img" >> libguestfs: trace: add_drive = 0 >> libguestfs: trace: launch >> libguestfs: trace: launch = 0 >> libguestfs: trace: mount "/dev/vda1" "/" >> libguestfs: trace: mount = 0 >> libguestfs: trace: write "/etc/sysconfig/network" "..." >> libguestfs: trace: write = 0 >> >> I can't seem to get any other ouput.. but it is very interesting that >> once I kill the process I see the remainder of the calls failing from >> trace output: >> >> libguestfs: trace: umount_all >> libguestfs: trace: umount_all = -1 (error) >> libguestfs: trace: sync >> libguestfs: trace: sync = -1 (error) >> libguestfs: trace: umount_all >> libguestfs: trace: umount_all = -1 (error) >> libguestfs: trace: sync >> libguestfs: trace: sync = -1 (error) > > ... when you close a handle which has autosync enabled, which is now > the default, you'll see output like this. ?It seems like all your > handles are getting closed at once. ?I have no idea why. > >> So, I guess that is what's failing.. but I haven't been able to >> determine why yet. >> >> This is something I've been running into on much older versions of >> libguestfs all the way up to now which I'm trying your recent >> "1.7.17-17.el6" RPMs for rhel6. > > Ah, so this could be RHEL 6.0? ?This could well be a RHEL 6.0 bug. > Definitely try out the RHEL 6.1 packages from: > > http://people.redhat.com/~rjones/rhel6.1-libguestfs-preview/ > > You'll find a lot of things are better. ?It's much faster for one. > > Rich. > > -- > Richard Jones, Virtualization Group, Red Hat http://people.redhat.com/~rjones > virt-df lists disk usage of guests without needing to install any > software inside the virtual machine. ?Supports Linux and Windows. > http://et.redhat.com/~rjones/virt-df/ >