I tried to run "zfs list" on my system, but looks that this command will hangs. This command can not return even if I press "contrl+c" as following: root at intel7:/export/bench/io/filebench/results# zfs list ^C^C^C^C ^C^C^C^C .. When this happens, I am running filebench benchmark with oltp workload. But "zpool status" shows that all pools are in good statu like following: root at intel7:~# zpool status pool: rpool state: ONLINE status: The pool is formatted using an older on-disk format. The pool can still be used, but some features are unavailable. action: Upgrade the pool using ''zpool upgrade''. Once this is done, the pool will no longer be accessible on older software versions. scan: none requested config: NAME STATE READ WRITE CKSUM rpool ONLINE 0 0 0 c8t0d0s0 ONLINE 0 0 0 errors: No known data errors pool: tpool state: ONLINE scan: none requested config: NAME STATE READ WRITE CKSUM tpool ONLINE 0 0 0 c10t1d0 ONLINE 0 0 0 errors: No known data errors My system is running B141 and tpool is using the latest version 26. Tried command "truss -p `pgrep zfs`", but it failes like following: root at intel7:~# truss -p `pgrep zfs` truss: unanticipated system error: 5060 Looks that zfs is in deadlock state, but I dont know what is the cause. I have tried to run filebench/oltp workload several times, each time it will leads to this state. But if I run filebench with other workload such as fileserver, webwerver, this issue does not happen. Thanks Zhihui
It looks like the txg_sync_thread for this pool is blocked and never returns, which in turn blocks many other threads. I have tried changing zfs_vdev_max_pending from 10 to 35 and retested the workload several times; with that setting the issue does not happen. But if I change it back to 10, it happens very easily. Is there a known bug for this, or any suggestion on how to solve it?

All the go_filebench threads are piled up on one condition variable (1730 waiters):

> ffffff0502c3378c::wchaninfo -v
ADDR             TYPE NWAITERS   THREAD           PROC
ffffff0502c3378c cond     1730:  ffffff051cc6b500 go_filebench
                                 ffffff051ce61020 go_filebench
                                 ffffff051cc4e4e0 go_filebench
                                 ffffff051d115120 go_filebench
                                 ffffff051e9ed000 go_filebench
                                 ffffff051bf644c0 go_filebench
                                 ffffff051c65b000 go_filebench
                                 ffffff051c728500 go_filebench
                                 ffffff050d83a8c0 go_filebench
                                 ffffff051c528c00 go_filebench
                                 ffffff051b750800 go_filebench
                                 ffffff051cdd7520 go_filebench
                                 ffffff051ce71bc0 go_filebench
                                 ffffff051cb5e840 go_filebench
                                 ffffff051cbdec60 go_filebench
                                 ffffff0516473c60 go_filebench
                                 ffffff051d132820 go_filebench
                                 ffffff051d13a400 go_filebench
                                 ffffff050fbf0b40 go_filebench
                                 ffffff051ce7a400 go_filebench
                                 ffffff051b781820 go_filebench
                                 ffffff051ce603e0 go_filebench
                                 ffffff051d1bf840 go_filebench
                                 ffffff051c6c24c0 go_filebench
                                 ffffff051d204100 go_filebench
                                 ffffff051cbdf160 go_filebench
                                 ffffff051ce52c00 go_filebench
                                 .......

They are all waiting in zil_commit:

> ffffff051cc6b500::findstack -v
stack pointer for thread ffffff051cc6b500: ffffff0020a76ac0
[ ffffff0020a76ac0 _resume_from_idle+0xf1() ]
  ffffff0020a76af0 swtch+0x145()
  ffffff0020a76b20 cv_wait+0x61(ffffff0502c3378c, ffffff0502c33700)
  ffffff0020a76b70 zil_commit+0x67(ffffff0502c33700, 6b255, 14)
  ffffff0020a76d80 zfs_write+0xaaf(ffffff050b5c9140, ffffff0020a76e40, 40, ffffff0502dab258, 0)
  ffffff0020a76df0 fop_write+0x6b(ffffff050b5c9140, ffffff0020a76e40, 40, ffffff0502dab258, 0)
  ffffff0020a76ec0 pwrite64+0x244(1a, b6f2a000, 800, b841a800, 0)
  ffffff0020a76f10 sys_syscall32+0xff()

From the zil_commit code, I tried to find the thread whose stack contains a call to zil_commit_writer.
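As a shortcut, instead of scanning the waiter list by hand, mdb's ::stacks dcmd can deduplicate all kernel thread stacks and filter on a function; a suggested invocation (not part of the session above):

> ::stacks -c zil_commit_writer
> ::stacks -m zfs

The first form lists only threads with zil_commit_writer somewhere on their stack; the second summarizes every thread running in the zfs module, which makes a blocking chain like the one traced below easier to spot.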
The thread I found, ffffff051d10fba0, has not returned from zil_commit_writer, so it never calls cv_broadcast to wake up the waiters:

> ffffff051d10fba0::findstack -v
stack pointer for thread ffffff051d10fba0: ffffff0021ab9a10
[ ffffff0021ab9a10 _resume_from_idle+0xf1() ]
  ffffff0021ab9a40 swtch+0x145()
  ffffff0021ab9a70 cv_wait+0x61(ffffff051ae1b988, ffffff051ae1b980)
  ffffff0021ab9ab0 zio_wait+0x5d(ffffff051ae1b680)
  ffffff0021ab9b20 zil_commit_writer+0x249(ffffff0502c33700, 6b250, e)
  ffffff0021ab9b70 zil_commit+0x91(ffffff0502c33700, 6b250, e)
  ffffff0021ab9d80 zfs_write+0xaaf(ffffff050b5c9540, ffffff0021ab9e40, 40, ffffff0502dab258, 0)
  ffffff0021ab9df0 fop_write+0x6b(ffffff050b5c9540, ffffff0021ab9e40, 40, ffffff0502dab258, 0)
  ffffff0021ab9ec0 pwrite64+0x244(14, bfbfb800, 800, 88f3f000, 0)
  ffffff0021ab9f10 sys_syscall32+0xff()

It is waiting on zio ffffff051ae1b680:

> ffffff051ae1b680::zio -r
ADDRESS           TYPE  STAGE            WAITER
ffffff051ae1b680  NULL  CHECKSUM_VERIFY  ffffff051d10fba0
 ffffff051a9c1978 WRITE VDEV_IO_START    -
 ffffff052454d348 WRITE VDEV_IO_START    -
 ffffff051572b960 WRITE VDEV_IO_START    -
 ffffff050accb330 WRITE VDEV_IO_START    -
 ffffff0514453c80 WRITE VDEV_IO_START    -
 ffffff0524537648 WRITE VDEV_IO_START    -
 ffffff05090e9660 WRITE VDEV_IO_START    -
 ffffff05151cb698 WRITE VDEV_IO_START    -
 ffffff0514668658 WRITE VDEV_IO_START    -
 ffffff0514835690 WRITE VDEV_IO_START    -
 ffffff05198979a0 WRITE VDEV_IO_START    -
 ffffff0507e1d038 WRITE VDEV_IO_START    -
 ffffff0510727028 WRITE VDEV_IO_START    -
 ffffff0523a25018 WRITE VDEV_IO_START    -
 ffffff0523d729c0 WRITE VDEV_IO_START    -
 ffffff052465b990 WRITE VDEV_IO_START    -
 ffffff052395f008 WRITE DONE             -
 ffffff0514cbc350 WRITE VDEV_IO_START    -
 ffffff05146f2688 WRITE VDEV_IO_START    -
 ffffff0509454048 WRITE VDEV_IO_START    -
 ffffff0524186038 WRITE VDEV_IO_START    -
 ffffff051166e9a0 WRITE DONE             -
 ffffff0515256960 WRITE VDEV_IO_START    -
 ffffff0518edf010 WRITE VDEV_IO_START    -
 ffffff0514b2f688 WRITE VDEV_IO_START    -
 ffffff05158b4040 WRITE VDEV_IO_START    -
 ffffff052448d648 WRITE DONE             -
 ffffff0512354380 WRITE VDEV_IO_START    -
 ffffff051aafe6a0 WRITE VDEV_IO_START    -
 ffffff051524e350 WRITE VDEV_IO_START    -
 ffffff051a707058 WRITE VDEV_IO_START    -
 ffffff0524679c88 WRITE DONE             -
 ffffff051acef058 WRITE DONE             -

One child, ffffff051acef058, is in the DONE stage, but its executor thread is stuck:

> ffffff051acef058::print zio_t io_executor
io_executor = 0xffffff002089ac40

> 0xffffff002089ac40::findstack -v
stack pointer for thread ffffff002089ac40: ffffff002089a720
[ ffffff002089a720 _resume_from_idle+0xf1() ]
  ffffff002089a750 swtch+0x145()
  ffffff002089a800 turnstile_block+0x760(ffffff051d186418, 0, ffffff051fcf0340, fffffffffbc07db8, 0, 0)
  ffffff002089a860 mutex_vector_enter+0x261(ffffff051fcf0340)
  ffffff002089a890 txg_rele_to_sync+0x2a(ffffff05121bece8)
  ffffff002089a8c0 dmu_tx_commit+0xee(ffffff05121bec98)
  ffffff002089a8f0 zil_lwb_write_done+0x5f(ffffff051acef058)
  ffffff002089a960 zio_done+0x383(ffffff051acef058)
  ffffff002089a990 zio_execute+0x8d(ffffff051acef058)
  ffffff002089a9f0 zio_notify_parent+0xa6(ffffff051acef058, ffffff052391b9b8, 1)
  ffffff002089aa60 zio_done+0x3e2(ffffff052391b9b8)
  ffffff002089aa90 zio_execute+0x8d(ffffff052391b9b8)
  ffffff002089ab30 taskq_thread+0x248(ffffff050c418910)
  ffffff002089ab40 thread_start+8()

> ffffff05121bece8::print -t txg_handle_t
txg_handle_t {
    tx_cpu_t *th_cpu = 0xffffff051fcf0340
    uint64_t th_txg = 0xf36
}

> ffffff051fcf0340::mutex
            ADDR  TYPE             HELD MINSPL OLDSPL WAITERS
ffffff051fcf0340 adapt ffffff050dc5d3a0      -      -     yes

> ffffff050dc5d3a0::findstack -v
stack pointer for thread ffffff050dc5d3a0: ffffff0023589970
[ ffffff0023589970 _resume_from_idle+0xf1() ]
  ffffff00235899a0 swtch+0x145()
  ffffff0023589a50 turnstile_block+0x760(ffffff051ce0c948, 0, ffffff05083403c8, fffffffffbc07db8, 0, 0)
  ffffff0023589ab0 mutex_vector_enter+0x261(ffffff05083403c8)
  ffffff0023589b30 dmu_tx_try_assign+0xab(ffffff0514395018, 2)
  ffffff0023589b70 dmu_tx_assign+0x2a(ffffff0514395018, 2)
  ffffff0023589d80 zfs_write+0x65f(ffffff050b5c9640, ffffff0023589e40, 40, ffffff0502dab258, 0)
  ffffff0023589df0 fop_write+0x6b(ffffff050b5c9640, ffffff0023589e40, 40, ffffff0502dab258, 0)
  ffffff0023589ec0 pwrite64+0x244(16, b6f7c000, 800, a7ef7800, 0)
  ffffff0023589f10 sys_syscall32+0xff()

> ffffff0514395018::print dmu_tx_t
{
    tx_holds = {
        list_size = 0x50
        list_offset = 0x8
        list_head = {
            list_next = 0xffffff0508054840
            list_prev = 0xffffff050da3b1f8
        }
    }
    tx_objset = 0xffffff05028c8940
    tx_dir = 0xffffff04e7785400
    tx_pool = 0xffffff0502ceac00
    tx_txg = 0xf36
    tx_lastsnap_txg = 0x1
    tx_lasttried_txg = 0
    tx_txgh = {
        th_cpu = 0xffffff051fcf0340
        th_txg = 0xf36
    }
    tx_tempreserve_cookie = 0
    tx_needassign_txh = 0
    tx_callbacks = {
        list_size = 0x20
        list_offset = 0
        list_head = {
            list_next = 0xffffff0514395098
            list_prev = 0xffffff0514395098
        }
    }
    tx_anyobj = 0
    tx_err = 0
}

> ffffff05083403c8::mutex
            ADDR  TYPE             HELD MINSPL OLDSPL WAITERS
ffffff05083403c8 adapt ffffff002035cc40      -      -     yes

The holder of that mutex is the txg_sync_thread itself:

> ffffff002035cc40::findstack -v
stack pointer for thread ffffff002035cc40: ffffff002035c590
[ ffffff002035c590 _resume_from_idle+0xf1() ]
  ffffff002035c5c0 swtch+0x145()
  ffffff002035c5f0 cv_wait+0x61(ffffff05123ce350, ffffff05123ce348)
  ffffff002035c630 zio_wait+0x5d(ffffff05123ce048)
  ffffff002035c690 dbuf_read+0x1e8(ffffff0509c758e0, 0, a)
  ffffff002035c710 dmu_buf_hold+0xac(ffffff05028c8940, ffffffffffffffff, 0, 0, ffffff002035c748, 1)
  ffffff002035c7b0 zap_lockdir+0x6d(ffffff05028c8940, ffffffffffffffff, 0, 1, 1, 0, ffffff002035c7d8)
  ffffff002035c840 zap_lookup_norm+0x55(ffffff05028c8940, ffffffffffffffff, ffffff002035c920, 8, 1, ffffff002035c8b8, 0, 0, 0, 0)
  ffffff002035c8a0 zap_lookup+0x2d(ffffff05028c8940, ffffffffffffffff, ffffff002035c920, 8, 1, ffffff002035c8b8)
  ffffff002035c910 zap_increment+0x64(ffffff05028c8940, ffffffffffffffff, ffffff002035c920, fffffffeffef7e00, ffffff0511d9bc80)
  ffffff002035c990 zap_increment_int+0x68(ffffff05028c8940, ffffffffffffffff, 0, fffffffeffef7e00, ffffff0511d9bc80)
  ffffff002035c9f0 do_userquota_update+0x69(ffffff05028c8940, 100108000, 3, 0, 0, 1, ffffff0511d9bc80)
  ffffff002035ca50 dmu_objset_do_userquota_updates+0xde(ffffff05028c8940, ffffff0511d9bc80)
  ffffff002035cad0 dsl_pool_sync+0x112(ffffff0502ceac00, f34)
  ffffff002035cb80 spa_sync+0x37b(ffffff0501269580, f34)
  ffffff002035cc20 txg_sync_thread+0x247(ffffff0502ceac00)
  ffffff002035cc30 thread_start+8()

> ffffff05123ce048::zio -r
ADDRESS           TYPE  STAGE            WAITER
ffffff05123ce048  NULL  CHECKSUM_VERIFY  ffffff002035cc40
 ffffff051a9a9338 READ  VDEV_IO_START    -
 ffffff050e3a4050 READ  VDEV_IO_DONE     -
  ffffff0519173c90 READ VDEV_IO_START    -

> ffffff0519173c90::print zio_t io_done
io_done = vdev_cache_fill

The zio ffffff0519173c90 is a vdev cache read request that never completes, so the txg_sync_thread stays blocked. Putting the chain together: the ZIL writer waits on its lwb zio; that zio's done callback (zil_lwb_write_done -> dmu_tx_commit -> txg_rele_to_sync) is stuck on the tx_cpu lock ffffff051fcf0340; that lock is held by a thread blocked in dmu_tx_try_assign on mutex ffffff05083403c8; and that mutex is held by the txg_sync_thread, which is itself waiting on this read zio. I don't know why the zio cannot be satisfied and reach the DONE stage. While ZFS is hung like this, dd against the raw device backing the pool works fine.

Thanks
Zhihui
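For reference, the zfs_vdev_max_pending workaround mentioned at the top of this message can be applied at runtime with mdb (the tunable is a 32-bit int, so /W is the matching write size):

root@intel7:~# echo "zfs_vdev_max_pending/W 0t35" | mdb -kw

and made persistent across reboots with a line in /etc/system:

set zfs:zfs_vdev_max_pending = 35

Raising the per-vdev queue depth only makes the hang harder to trigger; it is not a fix for the underlying deadlock.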
I don't recall seeing this issue before. The best thing to do is to file a bug and include a pointer to the crash dump.

- George
Thanks, I have filed the bug, but I don't know how to provide the crash dump. If the bug is accepted, the responsible engineer can get the crash dump file from me.
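For what it's worth, a dump of the live, hung system can be captured without a reboot, assuming dumpadm is already configured with a dedicated dump device (these are the standard Solaris commands and default paths, not confirmed on this particular machine):

root@intel7:~# dumpadm
root@intel7:~# savecore -L

savecore -L writes a unix.N/vmcore.N pair into the savecore directory (by default /var/crash/intel7); that pair is what the bug report should point to, and "mdb unix.N vmcore.N" replays the same analysis offline.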