I've had two unexplained instances now of qemu-dm crashing for no apparent reason, and with no clue as to why. There is nothing in the log files or anything.

I know I can get it to crash by doing funky things with vnc, but I don't believe that that was the case here...

Any suggestions? Is this a known problem that is fixed by a patch anywhere?

Thanks

James

_______________________________________________
Xen-users mailing list
Xen-users@lists.xensource.com
http://lists.xensource.com/xen-users
> -----Original Message-----
> From: xen-users-bounces@lists.xensource.com
> [mailto:xen-users-bounces@lists.xensource.com] On Behalf Of James Harper
> Sent: 15 June 2007 04:05
> To: xen-users@lists.xensource.com
> Subject: [Xen-users] qemu-dm crashing under 3.1
>
> I've had two unexplained instances now of qemu-dm crashing for no
> apparent reason, and with no clue as to why. There is nothing in the log
> files or anything.
>
> I know I can get it to crash by doing funky things with vnc, but I don't
> believe that that was the case here...
>
> Any suggestions? Is this a known problem that is fixed by a patch
> anywhere?

How about connecting gdb to qemu-dm? Find the PID for qemu-dm, then in gdb do "attach <pid>" followed by "c" (continue). When it crashes or exits, gdb will catch it, and you'll be able to do a traceback (at least if it didn't just exit with an error code).

I've not seen any problem like this, nor any fix for it.

--
Mats
> How about connecting gdb to the qemu-dm (find the PID for qemu, then in
> gdb do "attach <pid>; c"). When it crashes or exits, it will catch it in
> gdb, and you'll be able to do a traceback (at least if it didn't just
> exit with an error code).
>
> I've not seen any problem like this, nor any fix for it.

Okay... I can produce a segfault in a Windows domain with vnc, but not in a Linux domain in text mode.

'backtrace' says:

"
#0  0x0000000000409b25 in ?? ()
#1  0x000000000046c041 in ?? ()
#2  0x000000000040b6d6 in ?? ()
#3  0x00002b43022004ca in __libc_start_main () from /lib/libc.so.6
#4  0x0000000000404bba in ?? ()
#5  0x00007fffa94a6f08 in ?? ()
#6  0x0000000000000000 in ?? ()
"

Which isn't helpful to me. Is there another command I can try to give something more useful? Perhaps I need debug info compiled in...

Thanks

James
> -----Original Message-----
> From: James Harper [mailto:james.harper@bendigoit.com.au]
> Sent: 15 June 2007 11:26
> To: Petersson, Mats; xen-users@lists.xensource.com
> Subject: RE: [Xen-users] qemu-dm crashing under 3.1
>
> Okay... I can produce a segfault in a windows domain with vnc, but not
> in a linux domain in text mode.
>
> 'backtrace' says:
>
> "
> #0  0x0000000000409b25 in ?? ()
> #1  0x000000000046c041 in ?? ()
> #2  0x000000000040b6d6 in ?? ()
> #3  0x00002b43022004ca in __libc_start_main () from /lib/libc.so.6
> #4  0x0000000000404bba in ?? ()
> #5  0x00007fffa94a6f08 in ?? ()
> #6  0x0000000000000000 in ?? ()
> "
>
> Which isn't helpful to me. Is there another command I can try to give
> something more useful? Perhaps I need debug info compiled in...

That would make it a lot more readable. You could try just "objdump -d qemu-dm" and see if that gives you a clue as to which function it's in, but a debug build would be much easier to read.

You can rebuild JUST qemu-dm by going to .../tools/ioemu and doing "make clean all". First modify the CFLAGS in the makefile with "CFLAGS += -g" to add debug symbols. Then copy your new qemu-dm to /usr/lib[64]/xen/bin on the target machine (if that's not the same machine you're building the code on).

--
Mats
> That would make it a lot more readable - you could try just "objdump -d
> qemu-dm" and see if that gives you a clue of which function it's in, but
> using a debug build would make it much more readable.
>
> You can re-build JUST qemu-dm by going to .../tools/ioemu and doing
> "make clean all" - first modify the CFLAGS in makefile with "CFLAGS +=
> -g" to add debug symbols. Then copy your new qemu-dm to
> /usr/lib[64]/xen/bin on the target machine (if that's not the same
> machine you're building the code on).

Because it was a debian build it needed a bit of persuasion...

Here's the output (excuse the line wrapping):

"
Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread 47890972539360 (LWP 4609)]
0x0000000000409b25 in main_loop_wait (timeout=10)
    at
/usr/local/src/xen/xen-3.1-3.1.0-rc10+hg15040/debian/build/source/tools/
ioemu/vl.c:5224
5224            if (ioh->fd_write && FD_ISSET(ioh->fd, &wfds)) {

(gdb) bt
#0  0x0000000000409b25 in main_loop_wait (timeout=10)
    at
/usr/local/src/xen/xen-3.1-3.1.0-rc10+hg15040/debian/build/source/tools/
ioemu/vl.c:5224
#1  0x000000000046c041 in main_loop ()
    at
/usr/local/src/xen/xen-3.1-3.1.0-rc10+hg15040/debian/build/source/tools/
ioemu/target-i386-dm/helper2.c:628
#2  0x000000000040b6d6 in main (argc=21, argv=0x7fff2fa03468)
    at
/usr/local/src/xen/xen-3.1-3.1.0-rc10+hg15040/debian/build/source/tools/
ioemu/vl.c:6903

(gdb) print ioh
$1 = (IOHandlerRecord *) 0x9224b0
(gdb) print ioh->fd_write
$2 = (IOHandler *) 0x4691e0 <vnc_client_write>
(gdb) print ioh->fd
$3 = 9932400
(gdb) print wfds
$4 = {fds_bits = {0 <repeats 16 times>}}
(gdb) print &wfds
$5 = (fd_set *) 0x7fff2fa00760
(gdb)
"

The only thing that strikes me as odd is the value of ioh->fd... isn't that a little bit high for a fd number?

James
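To put the ioh->fd value in context: the wfds dump shows 16 words in fds_bits, which on a 64-bit build is presumably 1024 bits, the usual FD_SETSIZE on Linux. POSIX leaves FD_SET and FD_ISSET undefined for descriptors outside the range 0 to FD_SETSIZE - 1, so an fd of 9932400 indexes far outside the set. The following is a minimal sketch of a bounds-checked wrapper; fd_is_selectable and fd_isset_checked are hypothetical helper names and are not part of vl.c.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <sys/select.h>

/* Hypothetical helper (not from vl.c): true if fd may legally be passed
 * to FD_SET/FD_ISSET, i.e. it lies in the range [0, FD_SETSIZE). */
static bool fd_is_selectable(int fd)
{
    return fd >= 0 && fd < FD_SETSIZE;
}

/* Hypothetical wrapper: refuses to index the set with an out-of-range fd
 * instead of reading whatever memory happens to sit beyond it. */
static bool fd_isset_checked(int fd, fd_set *set)
{
    assert(set != NULL);
    if (!fd_is_selectable(fd))
        return false;
    return FD_ISSET(fd, set) != 0;
}
```

With a check like this, fd_isset_checked(9932400, &wfds) would simply return false. That would only hide the symptom, though: a descriptor value that large suggests the record holding it no longer contains valid data.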
On Fri, Jun 15, 2007 at 09:19:12PM +1000, James Harper wrote:
> (gdb) print ioh
> $1 = (IOHandlerRecord *) 0x9224b0
> (gdb) print ioh->fd_write
> $2 = (IOHandler *) 0x4691e0 <vnc_client_write>
> (gdb) print ioh->fd
> $3 = 9932400
> (gdb) print wfds
> $4 = {fds_bits = {0 <repeats 16 times>}}
> (gdb) print &wfds
> $5 = (fd_set *) 0x7fff2fa00760
> (gdb)
> "
>
> The only thing that strikes me as odd is the value of ioh->fd... isn't
> that a little bit high for a fd number?

That looks like the VNC / event loop corruption bug Anthony & myself fixed in upstream QEMU a few months back. You might want to give the attached patch a go & see if it helps.

Regards,
Dan.
--
|=- Red Hat, Engineering, Emerging Technologies, Boston.  +1 978 392 2496 -=|
|=-           Perl modules: http://search.cpan.org/~danberr/              -=|
|=-               Projects: http://freshmeat.net/~danielpb/               -=|
|=-  GnuPG: 7D3B9505   F3C9 553F A1DA 4AC2 5648 23C1 B3DF F742 7D3B 9505  -=|
> That looks like the VNC / event loop corruption bug Anthony & myself fixed
> in upstream QEMU a few months back. You might want to give the attached
> patch a go & see if it helps.

The point I got to in looking through the code was that the vnc_read function could close the fd and deallocate things before the second FD_ISSET check... I assume that's what your patch fixes?

I can no longer make it crash with your patch applied, so I believe it is fixed. Woohoo!

Could the same race condition occur in any other code paths? The other crash I've seen appears to be similar but not related to the vnc stuff...

Thanks!

James
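The scenario described above (a read callback tearing down its own handler record before the loop goes on to test the write side of the same record) can be sketched as follows. The patch itself is not reproduced in this thread, so this is not that patch: it illustrates one common remedy, deferred deletion, in which a callback only marks a record and the loop frees it after the pass. All type and function names are hypothetical and only loosely mirror the IOHandlerRecord list walked in vl.c; the select() readiness checks are left out to keep it short.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Illustrative names only; not the actual QEMU structures. */
typedef struct Handler Handler;
typedef void HandlerFn(Handler *self);

struct Handler {
    int fd;
    HandlerFn *on_read;
    HandlerFn *on_write;
    bool deleted;   /* set instead of freeing while the list is being walked */
    Handler *next;
};

static Handler *handlers;   /* head of the singly linked handler list */
static int demo_writes;     /* counts calls to demo_write() */

static Handler *handler_add(int fd, HandlerFn *on_read, HandlerFn *on_write)
{
    Handler *h = calloc(1, sizeof(*h));
    if (!h)
        return NULL;
    h->fd = fd;
    h->on_read = on_read;
    h->on_write = on_write;
    h->next = handlers;
    handlers = h;
    return h;
}

/* Safe to call from inside a callback: only marks the record. */
static void handler_remove(Handler *h)
{
    h->deleted = true;
}

/* Unlink and free every record that was marked during the pass. */
static void handlers_reap(void)
{
    Handler **link = &handlers;
    while (*link) {
        Handler *h = *link;
        if (h->deleted) {
            *link = h->next;
            free(h);
        } else {
            link = &h->next;
        }
    }
}

/* One pass over the list; every fd is treated as readable and writable. */
static int dispatch_once(void)
{
    int calls = 0;
    for (Handler *h = handlers; h; h = h->next) {
        if (!h->deleted && h->on_read) {
            h->on_read(h);
            calls++;
        }
        /* The read callback may have removed this handler. The record is
         * still allocated, so re-checking the flag here is safe. */
        if (!h->deleted && h->on_write) {
            h->on_write(h);
            calls++;
        }
    }
    handlers_reap();
    return calls;
}

/* Demo callbacks: the reader "closes the connection", the writer counts. */
static void demo_read_closes(Handler *self) { handler_remove(self); }
static void demo_write(Handler *self) { (void)self; demo_writes++; }
```

Registering a handler with handler_add(5, demo_read_closes, demo_write) and calling dispatch_once() runs the read callback, skips the write callback because the record is now marked, and frees the record only once the walk has finished. Freeing inside the callback instead would leave the loop reading fd and on_write from released memory, which is consistent with the garbage fd value in the gdb output.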
On Fri, Jun 15, 2007 at 10:04:01PM +1000, James Harper wrote:
> The point I got to in looking through the code was that the vnc_read
> function could close the fd and deallocate things before the second
> FD_ISSET function... I assume that's what your patch fixes?

Yep, that's exactly the scenario.

> I can no longer make it crash with your patch applied, so I believe it
> is fixed. Woohoo!
>
> Could the same race condition occur in any other code paths? The other
> crash I've seen appears to be similar but not related to the vnc
> stuff...

Well, depending on how lucky you are, you may or may not see an immediate crash from the bug I patched. In your case it was fairly immediate, but I've seen it hit this & only crash later - it depends on which random piece of memory is getting scribbled on :-)

Dan.
> Well depending on how lucky you are you may or may not see an immediate
> crash from the bug I patched. In your case it was fairly immediate, but
> I've seen it hit this & only crash later - depends on what random
> piece of memory are getting scribbled on :-)

I was able to reproduce it very quickly by hitting the refresh button on the browser in a TightVNC Java console. I assume that the timing of hanging up the old connection and creating a new one was just right to bring about this bug. So I'm pretty confident that it is now fixed for me :)

Thanks again!

James