On Tue, Oct 14, 2014 at 03:40:22PM +0000, VONDRA Alain wrote:
> Rich,
> I've followed your instructions to trace, but I am not very skilful with gdb, maybe I made a mistake :
>
> (1) As root do:
>
>   echo core.%p > /proc/sys/kernel/core_pattern   -> OK
>
> (2) Before running virt-v2v, do:
>
>   ulimited -c unlimited   -> I think it's ulimit -c unlimited -> OK
>
> and you should get a core.* file in the current directory when qemu-img segfaults. Attach that file to gdb to get a stack trace:
>
>   gdb /usr/bin/qemu-img core.XYZ   -> Do I need to wait for the crash, because I don't have any core ???

Yes, you have to wait for qemu-img to crash before there will be a core dump. If it's not crashing, then connect to qemu-img directly, something like this:

  gdb /usr/bin/qemu-img `pidof qemu-img`

and run this command:

> (gdb) t a a bt

to show the stack trace in all threads.

If qemu-img is consuming CPU then it's probably not hung.

I'm still interested to find out why fstrim didn't work. Can you run:

  guestfish --ro -d unc-srv-qual03
  ><fs> run
  ><fs> part-list /dev/sda
  ><fs> part-list /dev/sdb
  ><fs> part-list /dev/sdc
  ><fs> part-list /dev/sdd
  (etc)

I'd be interested to see if the partitions are unaligned, which is the only reason why fstrim should fail on NTFS.

Rich.

-- 
Richard Jones, Virtualization Group, Red Hat http://people.redhat.com/~rjones
Read my programming and virtualization blog: http://rwmj.wordpress.com
virt-builder quickly builds VMs from scratch
http://libguestfs.org/virt-builder.1.html
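
For anyone who would rather script step (2) than type it into a shell, here is a minimal Python sketch of the same idea. It uses only the standard `resource` module; the function name `raise_core_limit` is made up for illustration. Note one difference: `ulimit -c unlimited` asks for an unlimited core size, whereas this sketch only raises the soft limit as far as the current hard limit, so treat it as an approximation.

```python
import resource


def raise_core_limit():
    """Raise the soft core-file size limit to the current hard limit.

    Hypothetical helper, not part of virt-v2v or qemu. It returns the
    (soft, hard) pair that is in effect after the change.
    """
    soft, hard = resource.getrlimit(resource.RLIMIT_CORE)
    # Raising the soft limit up to the hard limit needs no privileges.
    resource.setrlimit(resource.RLIMIT_CORE, (hard, hard))
    return resource.getrlimit(resource.RLIMIT_CORE)


def describe(limit):
    """Render a limit value the way `ulimit -c` would talk about it."""
    return "unlimited" if limit == resource.RLIM_INFINITY else str(limit)
```

Resource limits are inherited by child processes, so a program started from the same process after `raise_core_limit()` runs with the raised limit. Whether a core file actually appears still depends on the `core_pattern` setting from step (1).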
Rich,
The results of the guestfish command :

><fs> part-list /dev/sda
[0] = {
  part_num: 1
  part_start: 1048576
  part_end: 85906685951
  part_size: 85905637376
}
><fs> part-list /dev/sdb
[0] = {
  part_num: 1
  part_start: 32256
  part_end: 322126640639
  part_size: 322126608384
}
><fs> part-list /dev/sdc
[0] = {
  part_num: 1
  part_start: 32256
  part_end: 32210196479
  part_size: 32210164224
}
><fs> part-list /dev/sdd
[0] = {
  part_num: 1
  part_start: 32256
  part_end: 32210196479
  part_size: 32210164224
}
><fs> part-list /dev/sdf
[0] = {
  part_num: 1
  part_start: 32256
  part_end: 21476206079
  part_size: 21476173824
}
><fs> part-list /dev/sdg
[0] = {
  part_num: 1
  part_start: 32256
  part_end: 42952412159
  part_size: 42952379904
}
><fs> part-list /dev/sdh
[0] = {
  part_num: 1
  part_start: 32256
  part_end: 214745610239
  part_size: 214745577984
}
><fs> part-list /dev/sdi
[0] = {
  part_num: 1
  part_start: 32256
  part_end: 32224857087
  part_size: 32224824832
}

Alain

-----Original Message-----
From: Richard W.M. Jones [mailto:rjones@redhat.com]
Sent: Tuesday, 14 October 2014 22:52
To: VONDRA Alain
Cc: libguestfs@redhat.com
Subject: Re: [Libguestfs] Virt-v2v conversion issue
On Wed, Oct 15, 2014 at 12:03:26PM +0000, VONDRA Alain wrote:
> Rich,
> The results of the guestfish command :
>
> ><fs> part-list /dev/sda
> [0] = {
>   part_num: 1
>   part_start: 1048576
>   part_end: 85906685951
>   part_size: 85905637376
> }
> ><fs> part-list /dev/sdb
> [0] = {
>   part_num: 1
>   part_start: 32256
>   part_end: 322126640639
>   part_size: 322126608384
> }
[etc]

Ah .. this does explain why the fstrim failed on every disk apart from /dev/sda. My patches to implement fstrim in NTFS give up unless the partition is aligned sufficiently:

  http://comments.gmane.org/gmane.comp.file-systems.ntfs-3g.devel/1074

/dev/sda is sufficiently aligned, but the other 8 disks are not.

This is rather unfortunate for you because fstrim would undoubtedly have saved network bandwidth as well as making qemu-img do less. At least that mystery is solved.

Rich.

-- 
Richard Jones, Virtualization Group, Red Hat http://people.redhat.com/~rjones
Read my programming and virtualization blog: http://rwmj.wordpress.com
virt-df lists disk usage of guests without needing to install any software inside the virtual machine. Supports Linux and Windows.
http://people.redhat.com/~rjones/virt-df/
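
The alignment difference can be checked with a little arithmetic on the part_start values reported in this thread. The following Python sketch is only an illustration: the exact alignment the NTFS fstrim patch requires is not stated here, so `ASSUMED_ALIGNMENT` (4096 bytes) is an assumption, and the function names are made up.

```python
# part_start values in bytes, copied from the part-list output in this thread.
PART_STARTS = {
    "/dev/sda": 1048576,
    "/dev/sdb": 32256,
    "/dev/sdc": 32256,
    "/dev/sdd": 32256,
    "/dev/sdf": 32256,
    "/dev/sdg": 32256,
    "/dev/sdh": 32256,
    "/dev/sdi": 32256,
}

# ASSUMPTION: the alignment the patch actually requires is not given in
# this thread; 4096 bytes is used purely for illustration.
ASSUMED_ALIGNMENT = 4096


def largest_pow2_alignment(offset):
    """Return the largest power of two that divides a positive offset."""
    return offset & -offset


def is_aligned(offset, alignment=ASSUMED_ALIGNMENT):
    """True if offset is a whole multiple of alignment."""
    return offset % alignment == 0


def report(starts=PART_STARTS):
    """Map each device to (largest power-of-two alignment, aligned?)."""
    return {dev: (largest_pow2_alignment(s), is_aligned(s))
            for dev, s in starts.items()}
```

With these numbers, 1048576 is exactly 1 MiB (2^20), while 32256 is 63 * 512, so a partition starting there is aligned to 512 bytes and to no larger power of two. Under the assumed 4096-byte requirement, only /dev/sda passes, which matches the explanation above.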
Now, the conversion hangs on the first disk at 54,03/100%, here is the gdb result :

(gdb) t a a bt

Thread 2 (Thread 0x7f0f3cea9700 (LWP 9583)):
#0  sem_timedwait () at ../nptl/sysdeps/unix/sysv/linux/x86_64/sem_timedwait.S:101
#1  0x00007f0f45b765e7 in qemu_sem_timedwait ()
#2  0x00007f0f45b0750c in worker_thread ()
#3  0x00007f0f4203bdf3 in start_thread (arg=0x7f0f3cea9700) at pthread_create.c:308
#4  0x00007f0f41d6901d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:113

Thread 1 (Thread 0x7f0f45aa88c0 (LWP 9582)):
#0  0x00007f0f41d5eb0f in __GI_ppoll (fds=0x7f0f470d7a00, nfds=1, timeout=<optimized out>, sigmask=0x0) at ../sysdeps/unix/sysv/linux/ppoll.c:56
#1  0x00007f0f45b146db in qemu_poll_ns ()
#2  0x00007f0f45b15430 in aio_poll ()
#3  0x00007f0f45b0eedd in bdrv_prwv_co ()
#4  0x00007f0f45b0efd3 in bdrv_rw_co ()
#5  0x00007f0f45b05b35 in img_convert ()
#6  0x00007f0f41c94af5 in __libc_start_main (main=0x7f0f45b01e80 <main>, argc=10, ubp_av=0x7fff19cb3ab8, init=<optimized out>, fini=<optimized out>, rtld_fini=<optimized out>, stack_end=0x7fff19cb3aa8) at libc-start.c:274
#7  0x00007f0f45b022ed in _start ()

Alain
On Wed, Oct 15, 2014 at 12:23:17PM +0000, VONDRA Alain wrote:
> Now, the conversion hangs on the first disk at 54,03/100%, here is the gdb result :

OK I just talked to Paolo about this, and it's fairly serious.

Can you get a core dump from this? You will need to set up the ulimit etc as outlined in the previous email:

  https://www.redhat.com/archives/libguestfs/2014-October/msg00102.html

then run the conversion until it hangs, then 'killall -BUS qemu-img' which should cause qemu-img to drop a core dump file.

You can send me (privately) the core dump and I will open a BZ about this and ensure that the necessary people see this.

Thanks,

Rich.

-- 
Richard Jones, Virtualization Group, Red Hat http://people.redhat.com/~rjones
Read my programming and virtualization blog: http://rwmj.wordpress.com
libguestfs lets you edit virtual machines. Supports shell scripting, bindings from many languages.
http://libguestfs.org
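
The 'killall -BUS' step works because the default action for SIGBUS is to terminate the process and, if the core limit allows it, write a core dump. Here is a small Python sketch of that mechanism, using a harmless sleeping child as a stand-in for the hung qemu-img. The function names are hypothetical, and nothing here touches a real qemu-img process.

```python
import signal
import subprocess
import sys


def spawn_stand_in():
    """Start a sleeping child process.

    This is only a stand-in for a hung qemu-img, so the sketch can be
    tried without a real conversion running.
    """
    return subprocess.Popen(
        [sys.executable, "-c", "import time; time.sleep(60)"]
    )


def kill_with_sigbus(proc):
    """Send SIGBUS to the child and return its exit status.

    killall -BUS does the same thing, but selects processes by name.
    """
    proc.send_signal(signal.SIGBUS)
    return proc.wait()
```

Calling `kill_with_sigbus(spawn_stand_in())` should return the negated signal number, which is how `subprocess` reports a child killed by a signal. Whether a core file is left behind depends on the core limit and core_pattern settings discussed earlier in the thread.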