River Tarnell
2008-Sep-30 03:10 UTC
[zfs-discuss] S10U5: deadlock between ''zfs receive'' and ''zfs list''
-----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 hi, i have an X4500 running Solaris 10 Update 5 (with all current patches). it has a stripe-mirror ZFS pool over 44 disks with 2 hot spare. the system is entirely idle, except that every 60 seconds, a ''zfs recv'' is run. a couple of days ago, while the recv was running, i tried to destroy the pool. the ''zfs destroy'' command hung; the ''zfs receive'' also seemed to be hung. neither would reply to ''kill -9'', and i could not do anything with the pool. i had to reboot the system to recover it. i saved the core dump and have gathered some information on the state at the time it hung. this is the ''zfs receive'' thread: SolarisCAT(vmcore.0/10X)> thread 0xffffffff9d1c1500 ==== user (LWP_SYS) thread: 0xffffffff9d1c1500 PID: 10090 ===cmd: /sbin/zfs receive export/filebench at zrep-75 t_wchan: 0xffffffff9d1c16d6 sobj: condition var (from genunix:delay+0xa0) t_procp: 0xffffffffaefbc710 p_as: 0xffffffffae4087e8 size: 4661248 RSS: 1970176 hat: 0xffffffff9d2583a0 cpuset: zone: global t_stk: 0xfffffe8000185f10 sp: 0xfffffe80001859f0 t_stkbase: 0xfffffe8000181000 t_pri: 59(TS) t_epri: 60 pctcpu: 0.019277 t_lwp: 0xfffffe84cc7ed8a0 lwp_regs: 0xfffffe8000185f10 mstate: LMS_SLEEP ms_prev: LMS_SYSTEM ms_state_start: 0.212252471 seconds earlier ms_start: 10 minutes 45.391859933 seconds earlier psrset: 0 last CPU: 0 idle: 0 ticks (0 seconds) start: Fri Sep 26 21:46:06 2008 age: 645 seconds (10 minutes 45 seconds) syscall: #54 ioctl(, 0x0) (sysent: genunix:ioctl+0x0) tstate: TS_SLEEP - awaiting an event tflg: T_DFLTSTK - stack is default size tpflg: TP_TWAIT - wait to be freed by lwp_wait TP_MSACCT - collect micro-state accounting information tsched: TS_LOAD - thread is in memory TS_DONT_SWAP - thread/LWP should not be swapped pflag: SKILLED - SIGKILL has been posted to the process SMSACCT - process is keeping micro-state accounting SMSFORK - child inherits micro-state accounting pc: unix:_resume_from_idle+0xf8 resume_return: addq $0x8,%rsp unix:_resume_from_idle+0xf8 resume_return() unix:swtch+0x12a() genunix:cv_wait+0x68() genunix:delay+0xa0() zfs:dnode_special_close+0x20() zfs:dmu_objset_evict+0x37() zfs:dsl_dataset_evict+0x9f() zfs:dbuf_evict_user+0x44() zfs:dmu_buf_rele+0x179() zfs:dsl_dataset_close+0x56() zfs:dmu_recvbackup+0x5bc() zfs:zfs_ioc_recvbackup+0xa3() zfs:zfsdev_ioctl+0x13e() genunix:cdev_ioctl+0x1d() specfs:spec_ioctl+0x50() genunix:fop_ioctl+0x25() genunix:ioctl+0xac() unix:_syscall32_save+0xbf() - -- switch to user thread''s user stack -- this is the ''zfs destroy'' thread: SolarisCAT(vmcore.0/10X)> thread 0xffffffff98c3e400 ==== user (LWP_SYS) thread: 0xffffffff98c3e400 PID: 10110 ===cmd: zfs destroy export/filebench t_wchan: 0xffffffffb8c381d8 sobj: mutex owner: 0xffffffff9d1c1500 top owner is waiting for a cv: 0xffffffff9d1c1500 t_procp: 0xffffffffaf551e50 p_as: 0xffffffffae408628 size: 4579328 RSS: 1826816 hat: 0xffffffff9d258270 cpuset: zone: global t_stk: 0xfffffe80001d3f10 sp: 0xfffffe80001d3a40 t_stkbase: 0xfffffe80001cf000 t_pri: 60(TS) pctcpu: 0.000006 t_lwp: 0xfffffe84cc6d7760 lwp_regs: 0xfffffe80001d3f10 mstate: LMS_SLEEP ms_prev: LMS_SYSTEM ms_state_start: 5 minutes 6.360093741 seconds earlier ms_start: 5 minutes 6.366816867 seconds earlier psrset: 0 last CPU: 2 idle: 30615 ticks (5 minutes 6.15 seconds) start: Fri Sep 26 21:51:45 2008 age: 306 seconds (5 minutes 6 seconds) syscall: #54 ioctl(, 0x0) (sysent: genunix:ioctl+0x0) tstate: TS_SLEEP - awaiting an event tflg: T_DFLTSTK - stack is default size tpflg: TP_TWAIT - wait to be freed by lwp_wait TP_MSACCT - collect micro-state accounting information tsched: TS_LOAD - thread is in memory TS_DONT_SWAP - thread/LWP should not be swapped pflag: SKILLED - SIGKILL has been posted to the process SMSACCT - process is keeping micro-state accounting SMSFORK - child inherits micro-state accounting pc: unix:_resume_from_idle+0xf8 resume_return: addq $0x8,%rsp unix:_resume_from_idle+0xf8 resume_return() unix:swtch+0x12a() genunix:turnstile_block+0x1f9() unix:mutex_vector_enter+0x239() zfs:dbuf_read+0x6c() zfs:dmu_bonus_hold+0x8f() zfs:dsl_dataset_open_obj+0x55() zfs:dsl_dataset_open_spa+0x142() zfs:dsl_dataset_open+0x16() zfs:dmu_objset_open+0x49() zfs:zfs_ioc_objset_stats+0x3d() zfs:zfsdev_ioctl+0x13e() genunix:cdev_ioctl+0x1d() specfs:spec_ioctl+0x50() genunix:fop_ioctl+0x25() genunix:ioctl+0xac() unix:_syscall32_save+0xbf() - -- switch to user thread''s user stack -- and this is a ''zfs list'' command which was also hung: SolarisCAT(vmcore.0/10X)> thread 0xffffffff98d78060 ==== user (LWP_SYS) thread: 0xffffffff98d78060 PID: 10100 ===cmd: zfs list t_wchan: 0xffffffffb8c381d8 sobj: mutex owner: 0xffffffff9d1c1500 top owner is waiting for a cv: 0xffffffff9d1c1500 t_procp: 0xffffffffaf553720 p_as: 0xffffffffaf490ee0 size: 4841472 RSS: 2166784 hat: 0xffffffffaf4f0740 cpuset: zone: global t_stk: 0xfffffe800015bf10 sp: 0xfffffe800015ba00 t_stkbase: 0xfffffe8000157000 t_pri: 60(TS) pctcpu: 0.000001 t_lwp: 0xfffffe84cc77e850 lwp_regs: 0xfffffe800015bf10 mstate: LMS_SLEEP ms_prev: LMS_SYSTEM ms_state_start: 5 minutes 55.321220812 seconds earlier ms_start: 5 minutes 55.331209204 seconds earlier psrset: 0 last CPU: 2 idle: 35511 ticks (5 minutes 55.11 seconds) start: Fri Sep 26 21:50:56 2008 age: 355 seconds (5 minutes 55 seconds) syscall: #54 ioctl(, 0x0) (sysent: genunix:ioctl+0x0) tstate: TS_SLEEP - awaiting an event tflg: T_DFLTSTK - stack is default size tpflg: TP_TWAIT - wait to be freed by lwp_wait TP_MSACCT - collect micro-state accounting information tsched: TS_LOAD - thread is in memory TS_DONT_SWAP - thread/LWP should not be swapped pflag: SMSACCT - process is keeping micro-state accounting SMSFORK - child inherits micro-state accounting pc: unix:_resume_from_idle+0xf8 resume_return: addq $0x8,%rsp unix:_resume_from_idle+0xf8 resume_return() unix:swtch+0x12a() genunix:turnstile_block+0x1f9() unix:mutex_vector_enter+0x239() zfs:dbuf_read+0x6c() zfs:dmu_bonus_hold+0x8f() zfs:dsl_dataset_open_obj+0x55() zfs:dsl_dataset_open_spa+0x142() zfs:dsl_dataset_open+0x16() zfs:dmu_objset_open+0x49() zfs:zfs_ioc_objset_stats+0x3d() zfs:zfs_ioc_dataset_list_next+0x13e() zfs:zfsdev_ioctl+0x13e() genunix:cdev_ioctl+0x1d() specfs:spec_ioctl+0x50() genunix:fop_ioctl+0x25() genunix:ioctl+0xac() unix:_syscall32_save+0xbf() - -- switch to user thread''s user stack -- the list and destroy are both waiting on the same mtx: SolarisCAT(vmcore.0/10X)> mutex -l 0xffffffffb8c381d8 adaptive mutex: owner: 0xffffffff9d1c1500 waiters: true turnstile @ 0xffffffff9884f240 ts_next: 0x0 ts_free: 0xffffffff9879bec8 ts_sobj: 0xffffffffb8c381d8 ts_waiters: 2 ts_epri: 60 ts_inheritor: 0xffffffff9d1c1500 ts_prioinv: 0x0 writer sleepq: thread pri pctcpu idle PID wchan command 0xffffffff98d78060 60 0.000 -5m55.11s 10100 0xffffffffb8c381d8 zfs list 0xffffffff98c3e400 60 0.000 -5m6.15s 10110 0xffffffffb8c381d8 zfs destroy export/filebench reader sleepq: NULL while the recv waits on a cv: SolarisCAT(vmcore.0/10X)> cv 0xffffffff9d1c16d6 cv_waiters: 1 sleepq_head @ 0xfffffffffbcd1ac0, sq_lock: 0x0 thread pri pctcpu idle PID wchan command 0xffffffff9d1c1500 60 0.019 0s 10090 0xffffffff9d1c16d6 /sbin/zfs receive export/filebench at zrep-75 now, i can''t be entirely sure that the system was really hung rather than blocking on the ''zfs recv'' (which often takes a very long time, even for a smallish stream); however, i waited at least 5 minutes, which is usually enough for the recv to complete. i realise this is an OpenSolaris forum, but i assume people won''t mind reports of possibly serious bugs... - river. -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.8 (SunOS) iEYEARECAAYFAkjhmJgACgkQIXd7fCuc5vIvqwCgiTSBW0RBzvju+Jc1DXeonvgh goQAnjCZrZd2dEGE5kxKeFflO0Dazira =kHGP -----END PGP SIGNATURE-----