Hi.

We're testing the most recent ZFS version from OpenSolaris ported to
FreeBSD. Kris (CCed) observed a strange situation. In the function
arc_read() he hit a panic on an assertion that we are trying to unlock
a lock which is not being held:

	rw_enter(&pbuf->b_hdr->b_datalock, RW_READER);

	err = arc_read_nolock(pio, spa, bp, done, private, priority,
	    flags, arc_flags, zb);

	rw_exit(&pbuf->b_hdr->b_datalock);	<--- THIS ONE

The only possibility is that b_hdr for pbuf was changed somewhere
between rw_enter() and rw_exit(). We diagnosed this further, and the
b_hdr field is changed in the arc_release() function, here:

	buf->b_hdr = nhdr;

The backtrace for this change is the following:

arc_release() at arc_release+0x4ec
dbuf_write() at dbuf_write+0x26b
dbuf_sync_list() at dbuf_sync_list+0x3eb
dbuf_sync_list() at dbuf_sync_list+0x17f
dbuf_sync_list() at dbuf_sync_list+0x17f
dbuf_sync_list() at dbuf_sync_list+0x17f
dbuf_sync_list() at dbuf_sync_list+0x17f
dbuf_sync_list() at dbuf_sync_list+0x17f
dbuf_sync_list() at dbuf_sync_list+0x17f
dnode_sync() at dnode_sync+0x9bd
dmu_objset_sync() at dmu_objset_sync+0x120
dsl_pool_sync() at dsl_pool_sync+0x72
spa_sync() at spa_sync+0x2f3
txg_sync_thread() at txg_sync_thread+0x2cd

Do you have any ideas how to fix it? Kris has a way to reproduce it in
his environment and I'm sure he could try a patch, if you could provide
one.

-- 
Pawel Jakub Dawidek                       http://www.wheel.pl
pjd at FreeBSD.org                           http://www.FreeBSD.org
FreeBSD committer                         Am I Evil? Yes, I Am!
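
The failure mode described in the message above can be sketched in a few lines of user-space C. This is only an illustration under stated assumptions: the toy_* types and functions are hypothetical stand-ins invented for this sketch, not the real ARC structures, and the "cached pointer" variant is not claimed to be the patch from this thread or the eventual fix. The toy lock merely counts holders, so that an unlock of a lock nobody holds can be reported instead of panicking.

```c
#include <stdbool.h>

/* Toy reader lock (hypothetical): counts holders so misuse is detectable. */
struct toy_rwlock { int readers; };

static void toy_enter(struct toy_rwlock *l) { l->readers++; }

/* Returns false where a kernel assertion would fire: lock not held. */
static bool toy_exit(struct toy_rwlock *l)
{
	if (l->readers == 0)
		return false;
	l->readers--;
	return true;
}

/* Hypothetical stand-ins for the buffer header and the buffer. */
struct toy_hdr { struct toy_rwlock b_datalock; };
struct toy_buf { struct toy_hdr *b_hdr; };

/* Models the header swap seen in arc_release(): buf->b_hdr = nhdr. */
static void toy_release(struct toy_buf *buf, struct toy_hdr *nhdr)
{
	buf->b_hdr = nhdr;
}

/* Reported pattern: the lock address is re-derived from b_hdr at unlock. */
static bool read_rederive(struct toy_buf *pbuf, struct toy_hdr *nhdr)
{
	toy_enter(&pbuf->b_hdr->b_datalock);
	toy_release(pbuf, nhdr);	/* simulates the sync thread running in between */
	return toy_exit(&pbuf->b_hdr->b_datalock);	/* a different lock now */
}

/* Variant: remember which header was locked and unlock that same one. */
static bool read_cached(struct toy_buf *pbuf, struct toy_hdr *nhdr)
{
	struct toy_hdr *hdr = pbuf->b_hdr;

	toy_enter(&hdr->b_datalock);
	toy_release(pbuf, nhdr);
	return toy_exit(&hdr->b_datalock);
}

/* Each demo builds two headers and one buffer, then runs one pattern. */
bool demo_rederive(void)
{
	struct toy_hdr a = { { 0 } }, b = { { 0 } };
	struct toy_buf buf = { &a };

	return read_rederive(&buf, &b);
}

bool demo_cached(void)
{
	struct toy_hdr a = { { 0 } }, b = { { 0 } };
	struct toy_buf buf = { &a };

	return read_cached(&buf, &b);
}
```

In this model demo_rederive() reports a failed unlock and leaves the old header's lock held, while demo_cached() pairs the enter and exit on the same lock. Caching the pointer only keeps the pair matched; by itself it says nothing about whether the old header is still valid at unlock time, which the real code would also have to guarantee.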
On Tue, Jul 29, 2008 at 12:41:16PM +0200, Pawel Jakub Dawidek wrote:
> Hi.
>
> We're testing the most recent ZFS version from OpenSolaris ported to
> FreeBSD. Kris (CCed) observed a strange situation. In the function
> arc_read() he hit a panic on an assertion that we are trying to unlock
> a lock which is not being held:
>
> 	rw_enter(&pbuf->b_hdr->b_datalock, RW_READER);
>
> 	err = arc_read_nolock(pio, spa, bp, done, private, priority,
> 	    flags, arc_flags, zb);
>
> 	rw_exit(&pbuf->b_hdr->b_datalock);	<--- THIS ONE
>
> The only possibility is that b_hdr for pbuf was changed somewhere. We
> diagnosed this further, and the b_hdr field is changed in the
> arc_release() function, here:
>
> 	buf->b_hdr = nhdr;
[...]

We have a simple test case that reproduces the panic and a patch that
seems to work for us. I don't really understand the code well enough to
be able to prepare the right fix.

The patch is here:

	http://people.freebsd.org/~pjd/patches/arc.patch

Script to reproduce it:

	#!/bin/sh
	fs=tank/$$
	while true; do
		zfs clone tank/boom@boom $fs
		find /$fs > /dev/null
		zfs destroy $fs
	done

To use it, you first need to create the pool tank and the tank/boom
dataset, then put some files into /tank/boom/ (we had the FreeBSD
source tree in there), then take a snapshot tank/boom@boom and:

	# cp boom.sh /tank/
	# cd /tank
	# ./boom.sh &; ./boom.sh &; ./boom.sh &; ./boom.sh &

You'll need an SMP machine to reproduce it. We tried to reproduce it on
a recent OpenSolaris, but we have only one CPU there and we weren't
able to reproduce it. Maybe it is a FreeBSD-specific problem, but it
doesn't look like it. Could you guys at least try to reproduce it on
some SMP OpenSolaris machine?

Thanks in advance!

-- 
Pawel Jakub Dawidek                       http://www.wheel.pl
pjd at FreeBSD.org                           http://www.FreeBSD.org
FreeBSD committer                         Am I Evil? Yes, I Am!
This is a known bug:

	6732083 arc_read() panic: rw_exit: lock not held

with a known cause. The fix you suggest works, but it is rather ugly.
We are working on a fix now.

-Mark