Hi.

I'm observing the following deadlock.

One thread holds the zfsvfs->z_hold_mtx[i] lock and waits for I/O:

Tracing pid 1188 tid 100114 td 0xc41b1660
sched_switch(c41b1660,0,1,180,9bbb0333,...) at sched_switch+0x1b6
mi_switch(1,0,c0666ae9,1bf,c41b1660,...) at mi_switch+0x1ee
sleepq_switch(c41b1660,0,c0666ae9,21d,c41b1660,...) at sleepq_switch+0xf0
sleepq_wait(de465f94,de465f7c,c43fc60a,1,0,...) at sleepq_wait+0x60
_cv_wait(de465f94,de465f7c,c43fc4cd,34f,0,...) at _cv_wait+0x1fc
zio_wait(de465d68,c43ea3c5,240,4000,0,...) at zio_wait+0x99
dbuf_read(d3a39594,de465d68,2,c43ee6df,d2e09900,...) at dbuf_read+0x2e7
dnode_hold_impl(c4bbf000,a075,0,1,c43ec318,...) at dnode_hold_impl+0x15a
dnode_hold(c4bbf000,a075,0,c43ec318,f899a758,...) at dnode_hold+0x35
dmu_bonus_hold(c480e540,a075,0,0,f899a7a0,...) at dmu_bonus_hold+0x31
zfs_zget(c3f4f000,a075,0,f899a89c,1,...) at zfs_zget+0x7a
zfs_dirent_lock(f899a8a0,c4d18c64,f899a928,f899a89c,6,...) at zfs_dirent_lock+0x619
zfs_dirlook(c4d18c64,f899a928,f899ac20,0,0,...) at zfs_dirlook+0x272
zfs_lookup(cbef4414,f899a928,f899ac20,f899ac34,2,...) at zfs_lookup+0x2df
zfs_freebsd_lookup(f899aa70,c0685c24,cbef4414,cbef4414,f899ac34,...) at zfs_freebsd_lookup+0xdd
VOP_CACHEDLOOKUP_APV(c4405800,f899aa70,f899ac34,f899ac20,c3f65200,...) at VOP_CACHEDLOOKUP_APV+0xc5
vfs_cache_lookup(f899aaf4,c066e847,cbef4414,1,cbef4414,...) at vfs_cache_lookup+0xd3
VOP_LOOKUP_APV(c4405800,f899aaf4,c41b1660,c066d7b8,19e,...) at VOP_LOOKUP_APV+0xe5
lookup(f899ac0c,c066d7b8,c9,bf,ca5b032c,...) at lookup+0x53e
namei(f899ac0c,0,c0679402,8d6,c065f266,...) at namei+0x2fe
kern_rename(c41b1660,2820d040,2820c040,0,f899ad2c,...) at kern_rename+0x4a
rename(c41b1660,f899acfc,8,c06758bf,c06924e0,...) at rename+0x29
syscall(f899ad38) at syscall+0x283
Xint0x80_syscall() at Xint0x80_syscall+0x20
--- syscall (128, FreeBSD ELF32, rename), eip = 0x280d1cdb, esp = 0xbfbfe7fc, ebp = 0xbfbfea48 ---

Another thread tries to finish the I/O, but can't, because it is trying to
acquire the zfsvfs->z_hold_mtx[i] lock:

Tracing command spa_zio_intr_2 pid 1117 tid 100020 td 0xc3b38440
sched_switch(c3b38440,0,1,180,9d2b0e9f,...) at sched_switch+0x1b6
mi_switch(1,0,c0666ae9,1bf,c3b38440,...) at mi_switch+0x1ee
sleepq_switch(c3b38440,0,c0666ae9,21d,c3f4f6a4,...) at sleepq_switch+0xf0
sleepq_wait(c3f4f6a4,0,c43fba6f,3,0,...) at sleepq_wait+0x60
_sx_xlock_hard(c3f4f6a4,c3b38440,0,c43fb80c,31e,...) at _sx_xlock_hard+0x286
_sx_xlock(c3f4f6a4,0,c43fb80c,31e,c7af015c,...) at _sx_xlock+0xb8
zfs_zinactive(d0d00988,0,c4401326,e27,d0d00988,...) at zfs_zinactive+0xa2
zfs_inactive(c7af015c,c3aa9600,0,c7af015c,f8aa2888,...) at zfs_inactive+0x307
zfs_freebsd_inactive(f8aa28a0,c06856cc,c7af0230,c7af0230,c7af015c,...) at zfs_freebsd_inactive+0x32
VOP_INACTIVE_APV(c4405800,f8aa28a0,c066e0c7,8e9,c06b5120,...) at VOP_INACTIVE_APV+0xc5
vinactive(c7af0230,0,c066e0c7,859,858,...) at vinactive+0xb1
vrele(c7af015c,c71317d0,f8aa2924,c04d16b0,c71317d0,...) at vrele+0x18f
zfs_get_done(c5832d8c,c71317d0,292,28d,d2387b70,...) at zfs_get_done+0xad
dmu_sync_done(d11e56b4,ccc323c0,c6940930,0,0,...) at dmu_sync_done+0x1f0
arc_write_done(d11e56b4,d4863b2c,e38a2208,4d3,d11d68f0,...) at arc_write_done+0x44a
zio_done(d11e56b4,c44033e0,f8aa2a38,c434906a,d13c9e00,...) at zio_done+0xb2
zio_next_stage(d11e56b4,d11e56f8,80,0,f8aa2a68,...) at zio_next_stage+0x236
zio_assess(d11e56b4,c3b384d8,c069938c,c43fc4cd,d11e58c8,...) at zio_assess+0x843
zio_next_stage(d11e56b4,c43fc4cd,36d,36a,f8aa2b44,...) at zio_next_stage+0x236
zio_wait_for_children(d11e56b4,12,d11e58bc,f8aa2b8c,c436cc76,...) at zio_wait_for_children+0x99
zio_wait_children_done(d11e56b4,c0a575a0,c3b384d8,c0697e04,c0679402,...) at zio_wait_children_done+0x25
zio_next_stage(d11e56b4,c1074808,0,c0679402,8d6,...) at zio_next_stage+0x236
zio_vdev_io_assess(d11e56b4,c44033e0,40,cef4def8,c3,...) at zio_vdev_io_assess+0x2a7
zio_next_stage(d11e56b4,c3b38440,f8aa2c3c,246,c0a116b4,...) at zio_next_stage+0x236
vdev_mirror_io_done(d11e56b4,f8aa2cf8,c42c94de,d11e56b4,0,...) at vdev_mirror_io_done+0x113
zio_vdev_io_done(d11e56b4,0,c43e6f2d,33d,c442ec14,...) at zio_vdev_io_done+0x21
taskq_thread(c442ebf4,f8aa2d38,c065fcb9,31e,c3b37aa0,...) at taskq_thread+0x29e
fork_exit(c42c9240,c442ebf4,f8aa2d38) at fork_exit+0xb8
fork_trampoline() at fork_trampoline+0x8
--- trap 0, eip = 0, esp = 0xf8aa2d70, ebp = 0 ---

As you can see, the SPA ZIO INTR thread wants to finish an I/O request and
calls arc_write_done(). Along the way we call vn_rele(), which may end up
calling zfs_zinactive() if this was the last vnode reference.
zfs_zinactive() tries to acquire the zfsvfs->z_hold_mtx[i] lock, but can't,
because it is already held by the thread waiting for the I/O. The I/O can't
be finished, and the lock can't be released.

I started to see this deadlock after the last integration, so it was
introduced more or less during the last month. Any ideas how to solve it?

-- 
Pawel Jakub Dawidek                       http://www.wheel.pl
pjd at FreeBSD.org                           http://www.FreeBSD.org
FreeBSD committer                         Am I Evil? Yes, I Am!
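The cycle is easier to see stripped of the ZFS machinery. The toy program
below is an illustration only, not ZFS code; the names merely mirror the
stacks above. It deadlocks in exactly the same way: one thread takes a hash
mutex and then waits for "I/O", while the thread that would complete that
I/O must first take the same mutex.

	/*
	 * Minimal userland model of the deadlock above (illustration only).
	 * lookup_thread stands in for thread #1: it takes the hash mutex
	 * (z_hold_mtx[i]) and then waits for "I/O" to complete.
	 * intr_thread stands in for the spa_zio_intr thread: it has to take
	 * the same hash mutex (the VN_RELE()/zfs_zinactive() step) before it
	 * can mark the I/O as done, so neither thread makes progress.
	 */
	#include <pthread.h>
	#include <stdio.h>
	#include <unistd.h>

	static pthread_mutex_t hold_mtx = PTHREAD_MUTEX_INITIALIZER;	/* z_hold_mtx[i] */
	static pthread_mutex_t io_mtx = PTHREAD_MUTEX_INITIALIZER;
	static pthread_cond_t io_cv = PTHREAD_COND_INITIALIZER;
	static int io_done = 0;

	static void *
	lookup_thread(void *arg)
	{
		pthread_mutex_lock(&hold_mtx);	/* zfs_zget(): grab z_hold_mtx[i] */
		pthread_mutex_lock(&io_mtx);
		while (!io_done)		/* dbuf_read() -> zio_wait() */
			pthread_cond_wait(&io_cv, &io_mtx);
		pthread_mutex_unlock(&io_mtx);
		pthread_mutex_unlock(&hold_mtx);
		return (NULL);
	}

	static void *
	intr_thread(void *arg)
	{
		sleep(1);			/* let lookup_thread win the race */
		pthread_mutex_lock(&hold_mtx);	/* zfs_zinactive(): blocks forever */
		pthread_mutex_lock(&io_mtx);
		io_done = 1;			/* "I/O completion": never reached */
		pthread_cond_signal(&io_cv);
		pthread_mutex_unlock(&io_mtx);
		pthread_mutex_unlock(&hold_mtx);
		return (NULL);
	}

	int
	main(void)
	{
		pthread_t t1, t2;

		pthread_create(&t1, NULL, lookup_thread, NULL);
		pthread_create(&t2, NULL, intr_thread, NULL);
		pthread_join(t1, NULL);		/* never returns: deadlocked */
		pthread_join(t2, NULL);
		printf("not reached\n");
		return (0);
	}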
On Mon, Nov 05, 2007 at 02:22:12PM +0100, Pawel Jakub Dawidek wrote:
> Hi.
>
> I'm observing the following deadlock.
>
> One thread holds the zfsvfs->z_hold_mtx[i] lock and waits for I/O:
[...]
> Another thread tries to finish the I/O, but can't, because it is trying to
> acquire the zfsvfs->z_hold_mtx[i] lock:
[...]
> As you can see, the SPA ZIO INTR thread wants to finish an I/O request and
> calls arc_write_done(). Along the way we call vn_rele(), which may end up
> calling zfs_zinactive() if this was the last vnode reference.
> zfs_zinactive() tries to acquire the zfsvfs->z_hold_mtx[i] lock, but can't,
> because it is already held by the thread waiting for the I/O. The I/O can't
> be finished, and the lock can't be released.
>
> I started to see this deadlock after the last integration, so it was
> introduced more or less during the last month. Any ideas how to solve it?

I was able to trigger this deadlock with older code as well. It seems I
fixed it by changing the zfs_get_done() function from:

	void
	zfs_get_done(dmu_buf_t *db, void *vzgd)
	{
		zgd_t *zgd = (zgd_t *)vzgd;
		rl_t *rl = zgd->zgd_rl;
		vnode_t *vp = ZTOV(rl->r_zp);

		dmu_buf_rele(db, vzgd);
		zfs_range_unlock(rl);
		VN_RELE(vp);
		zil_add_vdev(zgd->zgd_zilog,
		    DVA_GET_VDEV(BP_IDENTITY(zgd->zgd_bp)));
		kmem_free(zgd, sizeof (zgd_t));
	}

to something like this:

	void
	zfs_get_done(dmu_buf_t *db, void *vzgd)
	{
		zgd_t *zgd = (zgd_t *)vzgd;
		rl_t *rl = zgd->zgd_rl;
		vnode_t *vp = ZTOV(rl->r_zp);

		dmu_buf_rele(db, vzgd);
		zfs_range_unlock(rl);
		zgd->zgd_vnode = vp;
		zil_add_vdev(zgd->zgd_zilog,
		    DVA_GET_VDEV(BP_IDENTITY(zgd->zgd_bp)));
		/*
		 * Put the vnode onto the list. Call VN_RELE() on this
		 * vnode from within another kernel thread later (also
		 * free zgd there).
		 */
		mtx_lock(&zgd_done_mtx);
		TAILQ_INSERT_TAIL(&zgd_done, zgd, zgd_next);
		mtx_unlock(&zgd_done_mtx);
		wakeup(&zgd_done);
	}

which basically means that VN_RELE() will be called from another thread
later. This seems to work, but I changed the order of the VN_RELE() and
zil_add_vdev() calls. Is that safe? Doing zil_add_vdev() from the additional
thread as well didn't work for me.

-- 
Pawel Jakub Dawidek                       http://www.wheel.pl
pjd at FreeBSD.org                           http://www.FreeBSD.org
FreeBSD committer                         Am I Evil? Yes, I Am!
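For completeness, the draining side of that change could look roughly like
the sketch below. Only zgd_done, zgd_done_mtx, zgd_next, and zgd_vnode come
from the snippet above; the thread function itself (and its name) is an
assumption, not part of the actual patch.

	/*
	 * Sketch of the consumer side (assumption, not the actual patch):
	 * a dedicated kernel thread drains the zgd_done list and performs
	 * the deferred VN_RELE() calls outside of the I/O completion path.
	 */
	static void
	zgd_done_thread(void *arg __unused)
	{
		zgd_t *zgd;

		for (;;) {
			mtx_lock(&zgd_done_mtx);
			while ((zgd = TAILQ_FIRST(&zgd_done)) == NULL) {
				/* Wait for zfs_get_done() to call wakeup(). */
				msleep(&zgd_done, &zgd_done_mtx, PRIBIO,
				    "zgddone", 0);
			}
			TAILQ_REMOVE(&zgd_done, zgd, zgd_next);
			mtx_unlock(&zgd_done_mtx);

			/*
			 * If this drops the last reference, zfs_zinactive()
			 * now runs in this thread instead of the
			 * spa_zio_intr taskq, so it can no longer block I/O
			 * completion.
			 */
			VN_RELE(zgd->zgd_vnode);
			kmem_free(zgd, sizeof (zgd_t));
		}
	}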
Pawel,

I'm not quite sure I understand why thread #1 below is stalled. Is
there only a single thread available for IO completion?

-Mark

Pawel Jakub Dawidek wrote:
> Hi.
>
> I'm observing the following deadlock.
>
> One thread holds the zfsvfs->z_hold_mtx[i] lock and waits for I/O:
[...]
> Another thread tries to finish the I/O, but can't, because it is trying to
> acquire the zfsvfs->z_hold_mtx[i] lock:
[...]
> As you can see, the SPA ZIO INTR thread wants to finish an I/O request and
> calls arc_write_done(). Along the way we call vn_rele(), which may end up
> calling zfs_zinactive() if this was the last vnode reference.
> zfs_zinactive() tries to acquire the zfsvfs->z_hold_mtx[i] lock, but can't,
> because it is already held by the thread waiting for the I/O. The I/O can't
> be finished, and the lock can't be released.
>
> I started to see this deadlock after the last integration, so it was
> introduced more or less during the last month. Any ideas how to solve it?
On Wed, Nov 07, 2007 at 07:01:52AM -0700, Mark Maybee wrote:
> Pawel,
>
> I'm not quite sure I understand why thread #1 below is stalled. Is
> there only a single thread available for IO completion?

There are a few, but I believe thread #2 is trying to complete the very
I/O request on which thread #1 is waiting. Thread #2 can't complete this
I/O request, because the lock it needs to acquire is held by thread #1.

> > One thread holds the zfsvfs->z_hold_mtx[i] lock and waits for I/O:
> >
> > Tracing pid 1188 tid 100114 td 0xc41b1660
[...]
> > _cv_wait(de465f94,de465f7c,c43fc4cd,34f,0,...) at _cv_wait+0x1fc
> > zio_wait(de465d68,c43ea3c5,240,4000,0,...) at zio_wait+0x99
> > dbuf_read(d3a39594,de465d68,2,c43ee6df,d2e09900,...) at dbuf_read+0x2e7
> > dnode_hold_impl(c4bbf000,a075,0,1,c43ec318,...) at dnode_hold_impl+0x15a
> > dnode_hold(c4bbf000,a075,0,c43ec318,f899a758,...) at dnode_hold+0x35
> > dmu_bonus_hold(c480e540,a075,0,0,f899a7a0,...) at dmu_bonus_hold+0x31
> > zfs_zget(c3f4f000,a075,0,f899a89c,1,...) at zfs_zget+0x7a
> > zfs_dirent_lock(f899a8a0,c4d18c64,f899a928,f899a89c,6,...) at zfs_dirent_lock+0x619
> > zfs_dirlook(c4d18c64,f899a928,f899ac20,0,0,...) at zfs_dirlook+0x272
> > zfs_lookup(cbef4414,f899a928,f899ac20,f899ac34,2,...) at zfs_lookup+0x2df
[...]
> > Another thread tries to finish the I/O, but can't, because it is trying to
> > acquire the zfsvfs->z_hold_mtx[i] lock:
> >
> > Tracing command spa_zio_intr_2 pid 1117 tid 100020 td 0xc3b38440
[...]
> > _sx_xlock(c3f4f6a4,0,c43fb80c,31e,c7af015c,...) at _sx_xlock+0xb8
> > zfs_zinactive(d0d00988,0,c4401326,e27,d0d00988,...) at zfs_zinactive+0xa2
> > zfs_inactive(c7af015c,c3aa9600,0,c7af015c,f8aa2888,...) at zfs_inactive+0x307
[...]
> > zfs_get_done(c5832d8c,c71317d0,292,28d,d2387b70,...) at zfs_get_done+0xad
> > dmu_sync_done(d11e56b4,ccc323c0,c6940930,0,0,...) at dmu_sync_done+0x1f0
> > arc_write_done(d11e56b4,d4863b2c,e38a2208,4d3,d11d68f0,...) at arc_write_done+0x44a
> > zio_done(d11e56b4,c44033e0,f8aa2a38,c434906a,d13c9e00,...) at zio_done+0xb2
> > zio_next_stage(d11e56b4,d11e56f8,80,0,f8aa2a68,...) at zio_next_stage+0x236
> > zio_assess(d11e56b4,c3b384d8,c069938c,c43fc4cd,d11e58c8,...) at zio_assess+0x843
> > zio_next_stage(d11e56b4,c43fc4cd,36d,36a,f8aa2b44,...) at zio_next_stage+0x236
> > zio_wait_for_children(d11e56b4,12,d11e58bc,f8aa2b8c,c436cc76,...) at zio_wait_for_children+0x99
> > zio_wait_children_done(d11e56b4,c0a575a0,c3b384d8,c0697e04,c0679402,...) at zio_wait_children_done+0x25
> > zio_next_stage(d11e56b4,c1074808,0,c0679402,8d6,...) at zio_next_stage+0x236
> > zio_vdev_io_assess(d11e56b4,c44033e0,40,cef4def8,c3,...) at zio_vdev_io_assess+0x2a7
> > zio_next_stage(d11e56b4,c3b38440,f8aa2c3c,246,c0a116b4,...) at zio_next_stage+0x236
> > vdev_mirror_io_done(d11e56b4,f8aa2cf8,c42c94de,d11e56b4,0,...) at vdev_mirror_io_done+0x113
> > zio_vdev_io_done(d11e56b4,0,c43e6f2d,33d,c442ec14,...) at zio_vdev_io_done+0x21
[...]

BTW. Why do I always see vdev_mirror_io_done() even if I don't use a mirror?
Here I had a RAIDZ vdev.

-- 
Pawel Jakub Dawidek                       http://www.wheel.pl
pjd at FreeBSD.org                           http://www.FreeBSD.org
FreeBSD committer                         Am I Evil? Yes, I Am!
Hmm, it seems rather unlikely that these two IOs are related. Thread 1
is trying to read a dnode in order to extract the znode data from its
bonus buffer. Thread 2 is completing a dmu_sync() write (so this is
the result of a zil operation). While it's possible that the dmu_sync()
write may involve reading some of the blocks in the dnode from Thread 1,
this should not result in Thread 1 waiting for anything.

I think vdev_mirror_io_done() often shows up in the stack because the
ditto-ing code leverages that code path.

-Mark

Pawel Jakub Dawidek wrote:
> On Wed, Nov 07, 2007 at 07:01:52AM -0700, Mark Maybee wrote:
>> Pawel,
>>
>> I'm not quite sure I understand why thread #1 below is stalled. Is
>> there only a single thread available for IO completion?
>
> There are a few, but I believe thread #2 is trying to complete the very
> I/O request on which thread #1 is waiting. Thread #2 can't complete this
> I/O request, because the lock it needs to acquire is held by thread #1.
[...]
> BTW. Why do I always see vdev_mirror_io_done() even if I don't use a mirror?
> Here I had a RAIDZ vdev.
On Wed, Nov 07, 2007 at 07:41:54AM -0700, Mark Maybee wrote:
> Hmm, it seems rather unlikely that these two IOs are related. Thread 1
> is trying to read a dnode in order to extract the znode data from its
> bonus buffer. Thread 2 is completing a dmu_sync() write (so this is
> the result of a zil operation). While it's possible that the dmu_sync()
> write may involve reading some of the blocks in the dnode from Thread 1,
> this should not result in Thread 1 waiting for anything.

Maybe my interpretation is wrong, but moving this VN_RELE() to a separate
kernel thread, to be called later, fixes the deadlock for me.

> I think vdev_mirror_io_done() often shows up in the stack because the
> ditto-ing code leverages that code path.

Ahh, ok.

-- 
Pawel Jakub Dawidek                       http://www.wheel.pl
pjd at FreeBSD.org                           http://www.FreeBSD.org
FreeBSD committer                         Am I Evil? Yes, I Am!
Pawel Jakub Dawidek wrote:
> On Wed, Nov 07, 2007 at 07:41:54AM -0700, Mark Maybee wrote:
>> Hmm, it seems rather unlikely that these two IOs are related. Thread 1
>> is trying to read a dnode in order to extract the znode data from its
>> bonus buffer. Thread 2 is completing a dmu_sync() write (so this is
>> the result of a zil operation). While it's possible that the dmu_sync()
>> write may involve reading some of the blocks in the dnode from Thread 1,
>> this should not result in Thread 1 waiting for anything.
>
> Maybe my interpretation is wrong, but moving this VN_RELE() to a separate
> kernel thread, to be called later, fixes the deadlock for me.

Right, but I think the problem here is that you are running out of IO
threads. Do you maintain separate thread sets for each "type" of IO
(i.e. reads, writes, etc.)? We found, early on, that if we didn't
reserve threads for specific types of IOs, we could run into this type
of deadlock.

-Mark
On Wed, Nov 07, 2007 at 08:57:53AM -0700, Mark Maybee wrote:
> Pawel Jakub Dawidek wrote:
>> On Wed, Nov 07, 2007 at 07:41:54AM -0700, Mark Maybee wrote:
>>> Hmm, it seems rather unlikely that these two IOs are related. Thread 1
>>> is trying to read a dnode in order to extract the znode data from its
>>> bonus buffer. Thread 2 is completing a dmu_sync() write (so this is
>>> the result of a zil operation). While it's possible that the dmu_sync()
>>> write may involve reading some of the blocks in the dnode from Thread 1,
>>> this should not result in Thread 1 waiting for anything.
>>
>> Maybe my interpretation is wrong, but moving this VN_RELE() to a separate
>> kernel thread, to be called later, fixes the deadlock for me.
>
> Right, but I think the problem here is that you are running out of IO
> threads. Do you maintain separate thread sets for each "type" of IO
> (i.e. reads, writes, etc.)? We found, early on, that if we didn't
> reserve threads for specific types of IOs, we could run into this type
> of deadlock.

Yes, I have (A * B * C) I/O threads, where A is the number of I/O types (6),
B is the number of CPUs, and C is 2 (issue and intr threads). I believe this
is the same as in Solaris.

-- 
Pawel Jakub Dawidek                       http://www.wheel.pl
pjd at FreeBSD.org                           http://www.FreeBSD.org
FreeBSD committer                         Am I Evil? Yes, I Am!
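For reference, that layout corresponds roughly to the sketch below. This is
not copied from spa.c; the helper name, the taskq array fields, and the use
of max_ncpus for the per-taskq thread count are assumptions, inferred from
the A * B * C description and the spa_zio_issue/spa_zio_intr thread names
visible in the stack traces above.

	/*
	 * Rough sketch of a per-type issue/intr taskq setup (assumption,
	 * not the actual spa.c code).  One issue taskq and one intr taskq
	 * per I/O type gives A * C taskqs, each with B worker threads.
	 */
	static void
	spa_create_zio_taskqs(spa_t *spa)
	{
		int t;

		for (t = 0; t < ZIO_TYPES; t++) {	/* A = 6 I/O types */
			spa->spa_zio_issue_taskq[t] = taskq_create(
			    "spa_zio_issue", max_ncpus,	/* B = number of CPUs */
			    maxclsyspri, 50, INT_MAX, TASKQ_PREPOPULATE);
			spa->spa_zio_intr_taskq[t] = taskq_create(
			    "spa_zio_intr", max_ncpus,	/* C = issue + intr */
			    maxclsyspri, 50, INT_MAX, TASKQ_PREPOPULATE);
		}
	}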