Jaden Liang
2014-Nov-08 10:20 UTC
[Gluster-users] [Gluster-devel] glusterfs crashed, led by liblvm2app.so with BD xlator
Hi all,

We are testing the BD xlator to verify KVM running on gluster. After some simple tests, we encountered a coredump of glusterfs caused by liblvm2app.so. We hope someone here might give some advice about this issue.

We have been debugging for some time, and found that this coredump is triggered by a thread-safety issue. From the core file, the top function is _update_mda() with an invalid pointer that comes from lvmcache_foreach_mda(). As we know, glusterfsd has several io-threads to simulate async io, so more than one thread can run into bd_statfs_cbk(). In liblvm2app.so, _text_read() looks up an info item in a hash table named _pvid_hash; if no info item exists, it allocates a new one. However, there is no lock protecting these operations! With multiple threads, liblvm2app.so can crash via the following sequence.

Thread A and thread B enter bd_statfs_cbk() at the same time:
1. A allocates a new info node, puts it into _pvid_hash, and calls lvmcache_foreach_mda().
2. B looks up and gets the info node created by A in _pvid_hash, and passes it to lvmcache_del_mdas(), which frees the info node.
3. A keeps using the info node that has been freed by B.
4. Memory crash...

Steps to reproduce:

1. Create a BD volume with the BD xlator following the standard method. Mount it on a glusterfs client.

2. Write a simple test script crash_bd.sh:

#!/bin/bash
while :; do
    i=0
    while [ $i -lt 10 ]; do
        df > /dev/null
        i=`expr $i + 1`
    done
    sleep 10
done

3. Start several instances of crash_bd.sh at the same time:

# ./crash_bd.sh &
# ./crash_bd.sh &
# ./crash_bd.sh &
# ./crash_bd.sh &

4. Wait a few minutes, and df will report an error like this:

df: `/mnt/bd_vol': Transport endpoint is not connected

glusterfs has crashed.

Note: if we set the io-thread count to 1, the BD xlator runs very well!

Any information is appreciated!

Core detail:

Core was generated by `/usr/sbin/glusterfsd -s host-005056b50a23 --volfile-id bd.host-0050'.
Program terminated with signal 11, Segmentation fault.
#0  _update_mda (mda=0x11fb000, baton=0x7f83b1d0a6d0) at format_text/text_label.c:328
353     format_text/text_label.c: No such file or directory.
(gdb) bt
#0  _update_mda (mda=0x11fb000, baton=0x7f83b1d0a6d0) at format_text/text_label.c:328
#1  0x00007f83b59a0e09 in lvmcache_foreach_mda (info=info@entry=0x11faf00, fun=fun@entry=0x7f83b59c1a60 <_update_mda>, baton=baton@entry=0x7f83b1d0a6d0) at cache/lvmcache.c:1880
#2  0x00007f83b59c0e5f in _text_read (l=<optimized out>, dev=0x11ee8d8, buf=<optimized out>, label=0x7f83b1d0a958) at format_text/text_label.c:459
#3  0x00007f83b59c27e7 in label_read (dev=0x11ee8d8, result=result@entry=0x7f83b1d0a958, scan_sector=scan_sector@entry=0) at label/label.c:284
#4  0x00007f83b599dd2b in lvmcache_fmt_from_vgname (cmd=cmd@entry=0x11d3c40, vgname=vgname@entry=0x11d2c50 "bd-vg", vgid=vgid@entry=0x0, revalidate_labels=revalidate_labels@entry=1) at cache/lvmcache.c:506
#5  0x00007f83b59e1ad8 in _vg_read (cmd=cmd@entry=0x11d3c40, vgname=vgname@entry=0x11d2c50 "bd-vg", vgid=vgid@entry=0x0, warnings=warnings@entry=1, consistent=consistent@entry=0x7f83b1d0ab48, precommitted=precommitted@entry=0) at metadata/metadata.c:3143
#6  0x00007f83b59e2ecc in vg_read_internal (cmd=cmd@entry=0x11d3c40, vgname=vgname@entry=0x11d2c50 "bd-vg", vgid=vgid@entry=0x0, warnings=warnings@entry=1, consistent=consistent@entry=0x7f83b1d0ab48) at metadata/metadata.c:3549
#7  0x00007f83b59e30cc in _vg_lock_and_read (misc_flags=0, status_flags=0, lock_flags=33, vgid=0x0, vg_name=0x11d2c50 "bd-vg", cmd=0x11d3c40) at metadata/metadata.c:4235
#8  vg_read (cmd=cmd@entry=0x11d3c40, vg_name=vg_name@entry=0x11d2c50 "bd-vg", vgid=vgid@entry=0x0, flags=0) at metadata/metadata.c:4343
#9  0x00007f83b599753f in _lvm_vg_open (mode=0x7f83b5c8971e "r", vgname=0x11d2c50 "bd-vg", libh=0x11d3c40, flags=<optimized out>) at lvm_vg.c:221
#10 lvm_vg_open (libh=0x11d3c40, vgname=0x11d2c50 "bd-vg", mode=mode@entry=0x7f83b5c8971e "r", flags=flags@entry=0) at lvm_vg.c:238
#11 0x00007f83b5c7ee36 in bd_statfs_cbk (frame=0x7f83b95416e4, cookie=<optimized out>, this=0x119eb90, op_ret=0, op_errno=0, buff=0x7f83b1d0ac70, xdata=0x0) at bd.c:353
......
(gdb) f 1
#1  0x00007f83b59a0e09 in lvmcache_foreach_mda (info=info@entry=0x11faf00, fun=fun@entry=0x7f83b59c1a60 <_update_mda>, baton=baton@entry=0x7f83b1d0a6d0) at cache/lvmcache.c:1899
1899    cache/lvmcache.c: No such file or directory.
(gdb) p *info
$1 = {list = {n = 0x11fa650, p = 0x11fa650}, mdas = {n = 0x11fafd0, p = 0x11fafd0}, das = {n = 0x11fb000, p = 0x11fb000}, bas = {n = 0x11faf30, p = 0x11faf30}, vginfo = 0x11fa640, label = 0x11faed0, fmt = 0x11f8480, dev = 0x11ee8d8, device_size = 531870253056, status = 1}
(gdb) info threads
  Id   Target Id                          Frame
  11   Thread 0x7f83b3786700 (LWP 24272)  0x00007f7bc917cdec in _dev_close (dev=0x16421d0, immediate=immediate@entry=0) at device/dev-io.c:624
  10   Thread 0x7f83bb968700 (LWP 23306)  0x00007f83ba5e40d3 in epoll_wait () from /lib/x86_64-linux-gnu/libc.so.6
  9    Thread 0x7f83b250c700 (LWP 24276)  0x00007f83bac822d4 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib/x86_64-linux-gnu/libpthread.so.0
  8    Thread 0x7f83b2d0d700 (LWP 24275)  0x00007f83bac8264b in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib/x86_64-linux-gnu/libpthread.so.0
  7    Thread 0x7f83b350e700 (LWP 24274)  0x00007f83ba5b4bdd in nanosleep () from /lib/x86_64-linux-gnu/libc.so.6
  6    Thread 0x7f83b3887700 (LWP 24271)  0x00007f83bac822d4 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib/x86_64-linux-gnu/libpthread.so.0
  5    Thread 0x7f83b6ed7700 (LWP 23310)  0x00007f83bac858ad in nanosleep () from /lib/x86_64-linux-gnu/libpthread.so.0
  4    Thread 0x7f83b7d57700 (LWP 23309)  0x00007f83bac8264b in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib/x86_64-linux-gnu/libpthread.so.0
  3    Thread 0x7f83b8558700 (LWP 23308)  0x00007f83bac8264b in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib/x86_64-linux-gnu/libpthread.so.0
  2    Thread 0x7f83b8d59700 (LWP 23307)  0x00007f83bac85d77 in do_sigwait () from /lib/x86_64-linux-gnu/libpthread.so.0
* 1    Thread 0x7f83b1d0b700 (LWP 26000)  _update_mda (mda=0x11fb000, baton=0x7f83b1d0a6d0) at format_text/text_label.c:353
(gdb) thread 11
[Switching to thread 11 (Thread 0x7f83b3786700 (LWP 24272))]
#0  0x00007f83bac8578d in fsync () from /lib/x86_64-linux-gnu/libpthread.so.0
(gdb) bt
#0  0x00007f7bc917cdec in _dev_close (dev=0x16421d0, immediate=immediate@entry=0) at device/dev-io.c:624
#1  0x00007f7bc917d257 in dev_close (dev=<optimized out>) at device/dev-io.c:631
#2  0x00007f83b59c1a88 in _update_mda (mda=0x11fafd0, baton=0x7f83b37856d0) at format_text/text_label.c:361
#3  0x00007f83b59a0e09 in lvmcache_foreach_mda (info=info@entry=0x11faf00, fun=fun@entry=0x7f83b59c1a60 <_update_mda>, baton=baton@entry=0x7f83b37856d0) at cache/lvmcache.c:1899
#4  0x00007f83b59c0e5f in _text_read (l=<optimized out>, dev=0x11ee8d8, buf=<optimized out>, label=0x7f83b3785958) at format_text/text_label.c:459
#5  0x00007f83b59c27e7 in label_read (dev=0x11ee8d8, result=result@entry=0x7f83b3785958, scan_sector=scan_sector@entry=0) at label/label.c:284
#6  0x00007f83b599dd2b in lvmcache_fmt_from_vgname (cmd=cmd@entry=0x11d3c40, vgname=vgname@entry=0x11d2c50 "bd-vg", vgid=vgid@entry=0x0, revalidate_labels=revalidate_labels@entry=1) at cache/lvmcache.c:506
#7  0x00007f83b59e1ad8 in _vg_read (cmd=cmd@entry=0x11d3c40, vgname=vgname@entry=0x11d2c50 "bd-vg", vgid=vgid@entry=0x0, warnings=warnings@entry=1, consistent=consistent@entry=0x7f83b3785b48, precommitted=precommitted@entry=0) at metadata/metadata.c:3143
#8  0x00007f83b59e2ecc in vg_read_internal (cmd=cmd@entry=0x11d3c40, vgname=vgname@entry=0x11d2c50 "bd-vg", vgid=vgid@entry=0x0, warnings=warnings@entry=1, consistent=consistent@entry=0x7f83b3785b48) at metadata/metadata.c:3549
#9  0x00007f83b59e30cc in _vg_lock_and_read (misc_flags=0, status_flags=0, lock_flags=33, vgid=0x0, vg_name=0x11d2c50 "bd-vg", cmd=0x11d3c40) at metadata/metadata.c:4235
#10 vg_read (cmd=cmd@entry=0x11d3c40, vg_name=vg_name@entry=0x11d2c50 "bd-vg", vgid=vgid@entry=0x0, flags=0) at metadata/metadata.c:4343
#11 0x00007f83b599753f in _lvm_vg_open (mode=0x7f83b5c8971e "r", vgname=0x11d2c50 "bd-vg", libh=0x11d3c40, flags=<optimized out>) at lvm_vg.c:221
#12 lvm_vg_open (libh=0x11d3c40, vgname=0x11d2c50 "bd-vg", mode=mode@entry=0x7f83b5c8971e "r", flags=flags@entry=0) at lvm_vg.c:238
#13 0x00007f83b5c7ee36 in bd_statfs_cbk (frame=0x7f83b95412dc, cookie=<optimized out>, this=0x119eb90, op_ret=0, op_errno=0, buff=0x7f83b3785c70, xdata=0x0) at bd.c:353
......
(gdb) f 3
#3  0x00007f83b59a0e09 in lvmcache_foreach_mda (info=info@entry=0x11faf00, fun=fun@entry=0x7f83b59c1a60 <_update_mda>, baton=baton@entry=0x7f83b37856d0) at cache/lvmcache.c:1899
1899    in cache/lvmcache.c
(gdb) p *info
$2 = {list = {n = 0x11fa650, p = 0x11fa650}, mdas = {n = 0x11fafd0, p = 0x11fafd0}, das = {n = 0x11fb000, p = 0x11fb000}, bas = {n = 0x11faf30, p = 0x11faf30}, vginfo = 0x11fa640, label = 0x11faed0, fmt = 0x11f8480, dev = 0x11ee8d8, device_size = 531870253056, status = 1}
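The race described above boils down to an unsynchronized lookup-or-insert on _pvid_hash. The following minimal C sketch illustrates that pattern; it is not LVM's actual code, and names such as pv_info and pvid_lookup_or_create are hypothetical stand-ins for the info nodes that _text_read() keeps in _pvid_hash:

```c
#include <stdlib.h>
#include <string.h>

/* Illustrative stand-in for the per-PV info node kept in _pvid_hash. */
struct pv_info {
    char pvid[40];          /* PV identifier, as in lvmcache */
    int  refs;              /* illustrative payload */
};

#define TABLE_SIZE 64
static struct pv_info *table[TABLE_SIZE];

static unsigned hash_pvid(const char *pvid)
{
    unsigned h = 5381;
    while (*pvid)
        h = h * 33 + (unsigned char)*pvid++;
    return h % TABLE_SIZE;
}

/* Look up an info node; allocate and insert one if missing.  Because no
 * lock covers the whole lookup-allocate-insert sequence, two threads in
 * bd_statfs_cbk() can race: thread B may fetch the node thread A just
 * inserted and free it (as lvmcache_del_mdas() does), while A is still
 * iterating over it in lvmcache_foreach_mda() -- a use-after-free. */
struct pv_info *pvid_lookup_or_create(const char *pvid)
{
    unsigned slot = hash_pvid(pvid);
    struct pv_info *info = table[slot];          /* unlocked lookup */
    if (!info) {                                 /* race window opens here */
        info = calloc(1, sizeof(*info));
        strncpy(info->pvid, pvid, sizeof(info->pvid) - 1);
        table[slot] = info;                      /* unlocked insert */
    }
    return info;
}
```

Single-threaded, the function behaves as intended (the second lookup returns the node the first call created); the bug only appears once two io-threads interleave inside the unlocked window.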
Vijay Bellur
2014-Nov-10 10:47 UTC
[Gluster-users] [Gluster-devel] glusterfs crashed, led by liblvm2app.so with BD xlator
On 11/08/2014 03:50 PM, Jaden Liang wrote:
>
> Hi all,
>
> We are testing BD xlator to verify the KVM running with gluster. After some
> simple tests, we encountered a coredump of glusterfs lead by liblvm2app.so.
> Hope some one here might give some advises about this issue.
>
> We have debug for some time, and found out this coredump is triggered by a
> thread-safe issue. From the core file, the top function is _update_mda()
> with a invailid pointer which is from lvmcache_foreach_mda(). As we know,
> the glusterfsd has some io threads to simulate the async io. That will
> make more than 1 thread run into bd_statfs_cbk(). And in liblvm2app.so,
> _text_read() will look up an info in a hash table named _pvid_hash. If no
> info item exist, it will allocate a new one. However, there isn't any lock
> to protect this operations! liblvm2app.so will get crashed with
> multi-thread like this precedures:
>
> Thread A and thread B go into bd_statfs_cbk() at the same time:
> 1. A allocate an new info node, and put it into _pvid_hash, call
> lvmcache_foreach_mda().
> 2. B looks up and get the info generaed by A in _pvid_hash, pass it to
> lvmcache_del_mdas(), this will free the info node.
> 3. A keep using the info node which has been freed by B.
> 4. Memory crash...

Thanks for the report and the steps to recreate the problem.

> #9  0x00007f83b599753f in _lvm_vg_open (mode=0x7f83b5c8971e "r",
> vgname=0x11d2c50 "bd-vg", libh=0x11d3c40,
> flags=<optimized out>) at lvm_vg.c:221
> #10 lvm_vg_open (libh=0x11d3c40, vgname=0x11d2c50 "bd-vg",
> mode=mode@entry=0x7f83b5c8971e "r", flags=flags@entry=0)
> at lvm_vg.c:238
> #11 0x00007f83b5c7ee36 in bd_statfs_cbk (frame=0x7f83b95416e4,
> cookie=<optimized out>, this=0x119eb90, op_ret=0, op_errno=0,
> buff=0x7f83b1d0ac70, xdata=0x0) at bd.c:353

One quick fix would be to serialize calls to lvm_vg_open() by holding a
lock in the bd xlator. Have you tried that?

-Vijay
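The serialization suggested above can be sketched as follows. This is an assumption about how the fix could look, not code from bd.c: fake_vg_open() stands in for the real lvm_vg_open()/lvm_vg_close() work, and bd_lvm_lock is a hypothetical process-wide mutex, so only one io-thread at a time touches liblvm2app's unlocked caches. The counters exist purely so the serialization can be observed:

```c
#include <pthread.h>

static pthread_mutex_t bd_lvm_lock = PTHREAD_MUTEX_INITIALIZER;

/* Observation counters: under the mutex, at most one thread can ever be
 * "inside" the LVM call at a time.  They are only modified while the lock
 * is held, so they need no synchronization of their own. */
static int inside = 0;
static int max_inside = 0;

/* Stand-in for the liblvm2app work done from bd_statfs_cbk(). */
static void fake_vg_open(void)
{
    inside++;
    if (inside > max_inside)
        max_inside = inside;
    /* ... lvm_vg_open()/lvm_vg_close() work would happen here ... */
    inside--;
}

/* What bd_statfs_cbk() would call instead of lvm_vg_open() directly. */
static void guarded_vg_open(void)
{
    pthread_mutex_lock(&bd_lvm_lock);
    fake_vg_open();
    pthread_mutex_unlock(&bd_lvm_lock);
}

/* Simulated io-thread issuing many statfs-style requests. */
static void *worker(void *arg)
{
    (void)arg;
    for (int i = 0; i < 1000; i++)
        guarded_vg_open();
    return NULL;
}

/* Run n concurrent workers and report the maximum number of threads that
 * were ever inside the (fake) LVM call simultaneously.  With the mutex in
 * place this is always 1, no matter how many io-threads run. */
int run_workers(int n)
{
    pthread_t t[16];
    if (n > 16)
        n = 16;
    for (int i = 0; i < n; i++)
        pthread_create(&t[i], NULL, worker, NULL);
    for (int i = 0; i < n; i++)
        pthread_join(t[i], NULL);
    return max_inside;
}
```

The cost is that all BD statfs requests serialize on one lock, which is why this is only a quick fix; a proper fix would make the _pvid_hash handling in liblvm2app thread-safe.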