River Tarnell
2008-Sep-30 03:10 UTC
[zfs-discuss] S10U5: deadlock between ''zfs receive'' and ''zfs list''
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1
hi,
i have an X4500 running Solaris 10 Update 5 (with all current patches). it has
a stripe-mirror ZFS pool over 44 disks with 2 hot spare. the system is
entirely idle, except that every 60 seconds, a ''zfs recv'' is
run. a couple of
days ago, while the recv was running, i tried to destroy the pool. the
''zfs
destroy'' command hung; the ''zfs receive'' also seemed
to be hung. neither would
reply to ''kill -9'', and i could not do anything with the pool.
i had to reboot
the system to recover it.
i saved the core dump and have gathered some information on the state at the
time it hung. this is the ''zfs receive'' thread:
SolarisCAT(vmcore.0/10X)> thread 0xffffffff9d1c1500
==== user (LWP_SYS) thread: 0xffffffff9d1c1500 PID: 10090 ===cmd: /sbin/zfs
receive export/filebench at zrep-75
t_wchan: 0xffffffff9d1c16d6 sobj: condition var (from genunix:delay+0xa0)
t_procp: 0xffffffffaefbc710
p_as: 0xffffffffae4087e8 size: 4661248 RSS: 1970176
hat: 0xffffffff9d2583a0 cpuset:
zone: global
t_stk: 0xfffffe8000185f10 sp: 0xfffffe80001859f0 t_stkbase: 0xfffffe8000181000
t_pri: 59(TS) t_epri: 60 pctcpu: 0.019277
t_lwp: 0xfffffe84cc7ed8a0 lwp_regs: 0xfffffe8000185f10
mstate: LMS_SLEEP ms_prev: LMS_SYSTEM
ms_state_start: 0.212252471 seconds earlier
ms_start: 10 minutes 45.391859933 seconds earlier
psrset: 0 last CPU: 0
idle: 0 ticks (0 seconds)
start: Fri Sep 26 21:46:06 2008
age: 645 seconds (10 minutes 45 seconds)
syscall: #54 ioctl(, 0x0) (sysent: genunix:ioctl+0x0)
tstate: TS_SLEEP - awaiting an event
tflg: T_DFLTSTK - stack is default size
tpflg: TP_TWAIT - wait to be freed by lwp_wait
TP_MSACCT - collect micro-state accounting information
tsched: TS_LOAD - thread is in memory
TS_DONT_SWAP - thread/LWP should not be swapped
pflag: SKILLED - SIGKILL has been posted to the process
SMSACCT - process is keeping micro-state accounting
SMSFORK - child inherits micro-state accounting
pc: unix:_resume_from_idle+0xf8 resume_return: addq $0x8,%rsp
unix:_resume_from_idle+0xf8 resume_return()
unix:swtch+0x12a()
genunix:cv_wait+0x68()
genunix:delay+0xa0()
zfs:dnode_special_close+0x20()
zfs:dmu_objset_evict+0x37()
zfs:dsl_dataset_evict+0x9f()
zfs:dbuf_evict_user+0x44()
zfs:dmu_buf_rele+0x179()
zfs:dsl_dataset_close+0x56()
zfs:dmu_recvbackup+0x5bc()
zfs:zfs_ioc_recvbackup+0xa3()
zfs:zfsdev_ioctl+0x13e()
genunix:cdev_ioctl+0x1d()
specfs:spec_ioctl+0x50()
genunix:fop_ioctl+0x25()
genunix:ioctl+0xac()
unix:_syscall32_save+0xbf()
- -- switch to user thread''s user stack --
this is the ''zfs destroy'' thread:
SolarisCAT(vmcore.0/10X)> thread 0xffffffff98c3e400
==== user (LWP_SYS) thread: 0xffffffff98c3e400 PID: 10110 ===cmd: zfs destroy
export/filebench
t_wchan: 0xffffffffb8c381d8 sobj: mutex owner: 0xffffffff9d1c1500
top owner is waiting for a cv: 0xffffffff9d1c1500
t_procp: 0xffffffffaf551e50
p_as: 0xffffffffae408628 size: 4579328 RSS: 1826816
hat: 0xffffffff9d258270 cpuset:
zone: global
t_stk: 0xfffffe80001d3f10 sp: 0xfffffe80001d3a40 t_stkbase: 0xfffffe80001cf000
t_pri: 60(TS) pctcpu: 0.000006
t_lwp: 0xfffffe84cc6d7760 lwp_regs: 0xfffffe80001d3f10
mstate: LMS_SLEEP ms_prev: LMS_SYSTEM
ms_state_start: 5 minutes 6.360093741 seconds earlier
ms_start: 5 minutes 6.366816867 seconds earlier
psrset: 0 last CPU: 2
idle: 30615 ticks (5 minutes 6.15 seconds)
start: Fri Sep 26 21:51:45 2008
age: 306 seconds (5 minutes 6 seconds)
syscall: #54 ioctl(, 0x0) (sysent: genunix:ioctl+0x0)
tstate: TS_SLEEP - awaiting an event
tflg: T_DFLTSTK - stack is default size
tpflg: TP_TWAIT - wait to be freed by lwp_wait
TP_MSACCT - collect micro-state accounting information
tsched: TS_LOAD - thread is in memory
TS_DONT_SWAP - thread/LWP should not be swapped
pflag: SKILLED - SIGKILL has been posted to the process
SMSACCT - process is keeping micro-state accounting
SMSFORK - child inherits micro-state accounting
pc: unix:_resume_from_idle+0xf8 resume_return: addq $0x8,%rsp
unix:_resume_from_idle+0xf8 resume_return()
unix:swtch+0x12a()
genunix:turnstile_block+0x1f9()
unix:mutex_vector_enter+0x239()
zfs:dbuf_read+0x6c()
zfs:dmu_bonus_hold+0x8f()
zfs:dsl_dataset_open_obj+0x55()
zfs:dsl_dataset_open_spa+0x142()
zfs:dsl_dataset_open+0x16()
zfs:dmu_objset_open+0x49()
zfs:zfs_ioc_objset_stats+0x3d()
zfs:zfsdev_ioctl+0x13e()
genunix:cdev_ioctl+0x1d()
specfs:spec_ioctl+0x50()
genunix:fop_ioctl+0x25()
genunix:ioctl+0xac()
unix:_syscall32_save+0xbf()
- -- switch to user thread''s user stack --
and this is a ''zfs list'' command which was also hung:
SolarisCAT(vmcore.0/10X)> thread 0xffffffff98d78060
==== user (LWP_SYS) thread: 0xffffffff98d78060 PID: 10100 ===cmd: zfs list
t_wchan: 0xffffffffb8c381d8 sobj: mutex owner: 0xffffffff9d1c1500
top owner is waiting for a cv: 0xffffffff9d1c1500
t_procp: 0xffffffffaf553720
p_as: 0xffffffffaf490ee0 size: 4841472 RSS: 2166784
hat: 0xffffffffaf4f0740 cpuset:
zone: global
t_stk: 0xfffffe800015bf10 sp: 0xfffffe800015ba00 t_stkbase: 0xfffffe8000157000
t_pri: 60(TS) pctcpu: 0.000001
t_lwp: 0xfffffe84cc77e850 lwp_regs: 0xfffffe800015bf10
mstate: LMS_SLEEP ms_prev: LMS_SYSTEM
ms_state_start: 5 minutes 55.321220812 seconds earlier
ms_start: 5 minutes 55.331209204 seconds earlier
psrset: 0 last CPU: 2
idle: 35511 ticks (5 minutes 55.11 seconds)
start: Fri Sep 26 21:50:56 2008
age: 355 seconds (5 minutes 55 seconds)
syscall: #54 ioctl(, 0x0) (sysent: genunix:ioctl+0x0)
tstate: TS_SLEEP - awaiting an event
tflg: T_DFLTSTK - stack is default size
tpflg: TP_TWAIT - wait to be freed by lwp_wait
TP_MSACCT - collect micro-state accounting information
tsched: TS_LOAD - thread is in memory
TS_DONT_SWAP - thread/LWP should not be swapped
pflag: SMSACCT - process is keeping micro-state accounting
SMSFORK - child inherits micro-state accounting
pc: unix:_resume_from_idle+0xf8 resume_return: addq $0x8,%rsp
unix:_resume_from_idle+0xf8 resume_return()
unix:swtch+0x12a()
genunix:turnstile_block+0x1f9()
unix:mutex_vector_enter+0x239()
zfs:dbuf_read+0x6c()
zfs:dmu_bonus_hold+0x8f()
zfs:dsl_dataset_open_obj+0x55()
zfs:dsl_dataset_open_spa+0x142()
zfs:dsl_dataset_open+0x16()
zfs:dmu_objset_open+0x49()
zfs:zfs_ioc_objset_stats+0x3d()
zfs:zfs_ioc_dataset_list_next+0x13e()
zfs:zfsdev_ioctl+0x13e()
genunix:cdev_ioctl+0x1d()
specfs:spec_ioctl+0x50()
genunix:fop_ioctl+0x25()
genunix:ioctl+0xac()
unix:_syscall32_save+0xbf()
- -- switch to user thread''s user stack --
the list and destroy are both waiting on the same mtx:
SolarisCAT(vmcore.0/10X)> mutex -l 0xffffffffb8c381d8
adaptive mutex: owner: 0xffffffff9d1c1500 waiters: true
turnstile @ 0xffffffff9884f240
ts_next: 0x0 ts_free: 0xffffffff9879bec8 ts_sobj: 0xffffffffb8c381d8
ts_waiters: 2 ts_epri: 60 ts_inheritor: 0xffffffff9d1c1500
ts_prioinv: 0x0
writer sleepq:
thread pri pctcpu idle PID wchan command
0xffffffff98d78060 60 0.000 -5m55.11s 10100 0xffffffffb8c381d8 zfs list
0xffffffff98c3e400 60 0.000 -5m6.15s 10110 0xffffffffb8c381d8 zfs
destroy export/filebench
reader sleepq: NULL
while the recv waits on a cv:
SolarisCAT(vmcore.0/10X)> cv 0xffffffff9d1c16d6
cv_waiters: 1
sleepq_head @ 0xfffffffffbcd1ac0, sq_lock: 0x0
thread pri pctcpu idle PID wchan command
0xffffffff9d1c1500 60 0.019 0s 10090 0xffffffff9d1c16d6
/sbin/zfs receive export/filebench at zrep-75
now, i can''t be entirely sure that the system was really hung rather
than
blocking on the ''zfs recv'' (which often takes a very long
time, even for a
smallish stream); however, i waited at least 5 minutes, which is usually enough
for the recv to complete.
i realise this is an OpenSolaris forum, but i assume people won''t mind
reports
of possibly serious bugs...
- river.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.8 (SunOS)
iEYEARECAAYFAkjhmJgACgkQIXd7fCuc5vIvqwCgiTSBW0RBzvju+Jc1DXeonvgh
goQAnjCZrZd2dEGE5kxKeFflO0Dazira
=kHGP
-----END PGP SIGNATURE-----