Pawel Jakub Dawidek
2009-Sep-09 10:18 UTC
[zfs-code] mze_insert() panic (avl_find() succeeded inside avl_add())
Hi. There is a bug (6709336), which was closed with ''Not Reproducible'' reason, but I''ve a file system over here, where it is easy to reproduce. Dmitry (fs owner) send me a snapshot of it (zfs send output) and I can reproduce it at will after doing ''ls'' in one of the directories. It looks like there are three entires in directory that are duplicated. After adding this patch, so it stops panicing: http://people.freebsd.org/~pjd/patches/zap_micro.c.2.patch and some more debug we can see something like this: mzap_open: obj=6108 mze_insert: i=00: adding mze_name=[daily.20080701.gz] hash=16887356416216530944 mze_insert: i=01: adding mze_name=[daily.20080702.gz] hash=14729145520459087872 mze_insert: i=02: adding mze_name=[daily.20080703.gz] hash=15098007138724741120 mze_insert: i=03: adding mze_name=[daily.20080704.gz] hash=9227684390178390016 mze_insert: i=04: adding mze_name=[daily.20080705.gz] hash=11376611903104090112 mze_insert: i=05: adding mze_name=[daily.20080706.gz] hash=13533957483210473472 mze_insert: i=06: adding mze_name=[daily.20080707.gz] hash=11978674440062894080 mze_insert: i=07: adding mze_name=[daily.20080708.gz] hash=1783752625867456512 mze_insert: i=08: adding mze_name=[daily.20080709.gz] hash=373697339124088832 mze_insert: i=09: adding mze_name=[daily.20080710.gz] hash=12364645971085230080 mze_insert: i=10: adding mze_name=[daily.20080711.gz] hash=13147843840310247424 mze_insert: i=11: adding mze_name=[daily.20080712.gz] hash=10395742459047968768 mze_insert: i=12: adding mze_name=[daily.20080713.gz] hash=10208414471135690752 mze_insert: i=13: adding mze_name=[daily.20080714.gz] hash=15862282000518873088 mze_insert: i=14: adding mze_name=[daily.20080715.gz] hash=13964729096542879744 mze_insert: i=15: adding mze_name=[daily.20080716.gz] hash=16717135042526052352 mze_insert: i=16: adding mze_name=[daily.20080717.gz] hash=18019389811235749888 mze_insert: i=17: adding mze_name=[daily.20080718.gz] hash=4966448873967976448 mze_insert: i=18: adding mze_name=[daily.20080719.gz] hash=6413922602988863488 mze_insert: i=19: adding mze_name=[daily.20080720.gz] hash=5759986962857459712 mze_insert: i=20: adding mze_name=[daily.20080721.gz] hash=5909039301739413504 mze_insert: i=21: adding mze_name=[daily.20080722.gz] hash=8372896562954633216 mze_insert: i=22: adding mze_name=[daily.20080723.gz] hash=7627984238364590080 mze_insert: i=23: adding mze_name=[daily.20080724.gz] hash=4059358240683589632 mze_insert: i=24: adding mze_name=[daily.20080725.gz] hash=2718826723431940096 mze_insert: i=25: adding mze_name=[daily.20080726.gz] hash=254638509216759808 mze_insert: i=26: adding mze_name=[daily.20080727.gz] hash=2190475858316099584 mze_insert: i=27: adding mze_name=[daily.20080728.gz] hash=11573062813869932544 mze_insert: i=28: adding mze_name=[daily.20080729.gz] hash=13651902507838865408 mze_insert: i=29: adding mze_name=[daily.20080730.gz] hash=1423949746464096256 mze_insert: i=30: adding mze_name=[daily.20080731.gz] hash=1021306183190839296 mzap_open: i=31: empty mze_name mze_insert: i=32: ALREADY EXISTS! mze_name=[daily.20080706.gz] hash=13533957483210473472 mzap_open: i=33: empty mze_name mze_insert: i=34: ALREADY EXISTS! mze_name=[daily.20080711.gz] hash=13147843840310247424 mzap_open: i=35: empty mze_name mze_insert: i=36: ALREADY EXISTS! mze_name=[daily.20080715.gz] hash=13964729096542879744 mzap_open: i=37: empty mze_name mzap_open: i=38: empty mze_name zpool scrub shows not errors. Any idea how we ended up in this situation? If there is ZFS developer interested in tracking it further down, maybe Dmitry will be able to provide snapshot of this corrupted file system. -- Pawel Jakub Dawidek http://www.wheel.pl pjd at FreeBSD.org http://www.FreeBSD.org FreeBSD committer Am I Evil? Yes, I Am! -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: application/pgp-signature Size: 187 bytes Desc: not available URL: <http://mail.opensolaris.org/pipermail/zfs-code/attachments/20090909/6a93225c/attachment.bin>
Pawel Jakub Dawidek
2009-Sep-09 10:30 UTC
[zfs-code] mze_insert() panic (avl_find() succeeded inside avl_add())
On Wed, Sep 09, 2009 at 12:18:53PM +0200, Pawel Jakub Dawidek wrote:> Hi. > > There is a bug (6709336), which was closed with ''Not Reproducible'' reason, but > I''ve a file system over here, where it is easy to reproduce. Dmitry (fs owner) > send me a snapshot of it (zfs send output) and I can reproduce it at will after > doing ''ls'' in one of the directories. > > It looks like there are three entires in directory that are duplicated. > > After adding this patch, so it stops panicing: > > http://people.freebsd.org/~pjd/patches/zap_micro.c.2.patch > > and some more debug we can see something like this:[...]> If there is ZFS developer interested in tracking it further down, maybe Dmitry > will be able to provide snapshot of this corrupted file system.Actually I was able to reduce the file system by removing irrelevant files and data (I unlinked non-colliding files and truncated colliding files to 0), now the snapshot is around 6MB. Dmitry, are you ok with me publishing such stripped down file system? -- Pawel Jakub Dawidek http://www.wheel.pl pjd at FreeBSD.org http://www.FreeBSD.org FreeBSD committer Am I Evil? Yes, I Am! -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: application/pgp-signature Size: 187 bytes Desc: not available URL: <http://mail.opensolaris.org/pipermail/zfs-code/attachments/20090909/b98e8e50/attachment.bin>
Pawel Jakub Dawidek
2009-Sep-09 12:51 UTC
[zfs-code] mze_insert() panic (avl_find() succeeded inside avl_add())
On Wed, Sep 09, 2009 at 03:25:58PM +0400, Dmitry Morozovsky wrote:> On Wed, 9 Sep 2009, Pawel Jakub Dawidek wrote: > > PJD> > If there is ZFS developer interested in tracking it further down, maybe Dmitry > PJD> > will be able to provide snapshot of this corrupted file system. > PJD> > PJD> Actually I was able to reduce the file system by removing irrelevant > PJD> files and data (I unlinked non-colliding files and truncated colliding > PJD> files to 0), now the snapshot is around 6MB. > PJD> > PJD> Dmitry, are you ok with me publishing such stripped down file system? > > as you''ve stripped down the content, it''s perfectly ok.Actually I don''t think it is stripped down entirely. The snapshot still has 6MB and this is more or less how big duplicated files were, so I believe data from those 3 files is still there. -- Pawel Jakub Dawidek http://www.wheel.pl pjd at FreeBSD.org http://www.FreeBSD.org FreeBSD committer Am I Evil? Yes, I Am! -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: application/pgp-signature Size: 187 bytes Desc: not available URL: <http://mail.opensolaris.org/pipermail/zfs-code/attachments/20090909/166a89e4/attachment.bin>
Matthew Ahrens
2009-Sep-09 17:07 UTC
[zfs-code] mze_insert() panic (avl_find() succeeded inside avl_add())
Unfortunately, I don''t think that the send stream would help us diagnose it further. What we really need is a way to reproduce how to get into this situation in the first place. --matt Pawel Jakub Dawidek wrote:> Hi. > > There is a bug (6709336), which was closed with ''Not Reproducible'' reason, but > I''ve a file system over here, where it is easy to reproduce. Dmitry (fs owner) > send me a snapshot of it (zfs send output) and I can reproduce it at will after > doing ''ls'' in one of the directories. > > It looks like there are three entires in directory that are duplicated. > > After adding this patch, so it stops panicing: > > http://people.freebsd.org/~pjd/patches/zap_micro.c.2.patch > > and some more debug we can see something like this: > > mzap_open: obj=6108 > mze_insert: i=00: adding mze_name=[daily.20080701.gz] hash=16887356416216530944 > mze_insert: i=01: adding mze_name=[daily.20080702.gz] hash=14729145520459087872 > mze_insert: i=02: adding mze_name=[daily.20080703.gz] hash=15098007138724741120 > mze_insert: i=03: adding mze_name=[daily.20080704.gz] hash=9227684390178390016 > mze_insert: i=04: adding mze_name=[daily.20080705.gz] hash=11376611903104090112 > mze_insert: i=05: adding mze_name=[daily.20080706.gz] hash=13533957483210473472 > mze_insert: i=06: adding mze_name=[daily.20080707.gz] hash=11978674440062894080 > mze_insert: i=07: adding mze_name=[daily.20080708.gz] hash=1783752625867456512 > mze_insert: i=08: adding mze_name=[daily.20080709.gz] hash=373697339124088832 > mze_insert: i=09: adding mze_name=[daily.20080710.gz] hash=12364645971085230080 > mze_insert: i=10: adding mze_name=[daily.20080711.gz] hash=13147843840310247424 > mze_insert: i=11: adding mze_name=[daily.20080712.gz] hash=10395742459047968768 > mze_insert: i=12: adding mze_name=[daily.20080713.gz] hash=10208414471135690752 > mze_insert: i=13: adding mze_name=[daily.20080714.gz] hash=15862282000518873088 > mze_insert: i=14: adding mze_name=[daily.20080715.gz] hash=13964729096542879744 > mze_insert: i=15: adding mze_name=[daily.20080716.gz] hash=16717135042526052352 > mze_insert: i=16: adding mze_name=[daily.20080717.gz] hash=18019389811235749888 > mze_insert: i=17: adding mze_name=[daily.20080718.gz] hash=4966448873967976448 > mze_insert: i=18: adding mze_name=[daily.20080719.gz] hash=6413922602988863488 > mze_insert: i=19: adding mze_name=[daily.20080720.gz] hash=5759986962857459712 > mze_insert: i=20: adding mze_name=[daily.20080721.gz] hash=5909039301739413504 > mze_insert: i=21: adding mze_name=[daily.20080722.gz] hash=8372896562954633216 > mze_insert: i=22: adding mze_name=[daily.20080723.gz] hash=7627984238364590080 > mze_insert: i=23: adding mze_name=[daily.20080724.gz] hash=4059358240683589632 > mze_insert: i=24: adding mze_name=[daily.20080725.gz] hash=2718826723431940096 > mze_insert: i=25: adding mze_name=[daily.20080726.gz] hash=254638509216759808 > mze_insert: i=26: adding mze_name=[daily.20080727.gz] hash=2190475858316099584 > mze_insert: i=27: adding mze_name=[daily.20080728.gz] hash=11573062813869932544 > mze_insert: i=28: adding mze_name=[daily.20080729.gz] hash=13651902507838865408 > mze_insert: i=29: adding mze_name=[daily.20080730.gz] hash=1423949746464096256 > mze_insert: i=30: adding mze_name=[daily.20080731.gz] hash=1021306183190839296 > mzap_open: i=31: empty mze_name > mze_insert: i=32: ALREADY EXISTS! mze_name=[daily.20080706.gz] hash=13533957483210473472 > mzap_open: i=33: empty mze_name > mze_insert: i=34: ALREADY EXISTS! mze_name=[daily.20080711.gz] hash=13147843840310247424 > mzap_open: i=35: empty mze_name > mze_insert: i=36: ALREADY EXISTS! mze_name=[daily.20080715.gz] hash=13964729096542879744 > mzap_open: i=37: empty mze_name > mzap_open: i=38: empty mze_name > > zpool scrub shows not errors. > > Any idea how we ended up in this situation? > > If there is ZFS developer interested in tracking it further down, maybe Dmitry > will be able to provide snapshot of this corrupted file system. > > > > ------------------------------------------------------------------------ > > _______________________________________________ > zfs-code mailing list > zfs-code at opensolaris.org > http://mail.opensolaris.org/mailman/listinfo/zfs-code
Andrey Kuzmin
2009-Sep-09 18:09 UTC
[zfs-code] mze_insert() panic (avl_find() succeeded inside avl_add())
FS snapshot (hope that''s what send stream contains) has tx ids for (at least) all non-free blocks, thus providing a history, kind of. Should be helpful, imho. Regards, Andrey On Wed, Sep 9, 2009 at 9:07 PM, Matthew Ahrens<Matthew.Ahrens at sun.com> wrote:> Unfortunately, I don''t think that the send stream would help us diagnose it > further. ?What we really need is a way to reproduce how to get into this > situation in the first place. > > --matt > > Pawel Jakub Dawidek wrote: >> >> Hi. >> >> There is a bug (6709336), which was closed with ''Not Reproducible'' reason, >> but >> I''ve a file system over here, where it is easy to reproduce. ?Dmitry (fs >> owner) >> send me a snapshot of it (zfs send output) and I can reproduce it at will >> after >> doing ''ls'' in one of the directories. >> >> It looks like there are three entires in directory that are duplicated. >> >> After adding this patch, so it stops panicing: >> >> ? ? ? ?http://people.freebsd.org/~pjd/patches/zap_micro.c.2.patch >> >> and some more debug we can see something like this: >> >> ?mzap_open: obj=6108 >> mze_insert: i=00: adding mze_name=[daily.20080701.gz] >> hash=16887356416216530944 >> mze_insert: i=01: adding mze_name=[daily.20080702.gz] >> hash=14729145520459087872 >> mze_insert: i=02: adding mze_name=[daily.20080703.gz] >> hash=15098007138724741120 >> mze_insert: i=03: adding mze_name=[daily.20080704.gz] >> hash=9227684390178390016 >> mze_insert: i=04: adding mze_name=[daily.20080705.gz] >> hash=11376611903104090112 >> mze_insert: i=05: adding mze_name=[daily.20080706.gz] >> hash=13533957483210473472 >> mze_insert: i=06: adding mze_name=[daily.20080707.gz] >> hash=11978674440062894080 >> mze_insert: i=07: adding mze_name=[daily.20080708.gz] >> hash=1783752625867456512 >> mze_insert: i=08: adding mze_name=[daily.20080709.gz] >> hash=373697339124088832 >> mze_insert: i=09: adding mze_name=[daily.20080710.gz] >> hash=12364645971085230080 >> mze_insert: i=10: adding mze_name=[daily.20080711.gz] >> hash=13147843840310247424 >> mze_insert: i=11: adding mze_name=[daily.20080712.gz] >> hash=10395742459047968768 >> mze_insert: i=12: adding mze_name=[daily.20080713.gz] >> hash=10208414471135690752 >> mze_insert: i=13: adding mze_name=[daily.20080714.gz] >> hash=15862282000518873088 >> mze_insert: i=14: adding mze_name=[daily.20080715.gz] >> hash=13964729096542879744 >> mze_insert: i=15: adding mze_name=[daily.20080716.gz] >> hash=16717135042526052352 >> mze_insert: i=16: adding mze_name=[daily.20080717.gz] >> hash=18019389811235749888 >> mze_insert: i=17: adding mze_name=[daily.20080718.gz] >> hash=4966448873967976448 >> mze_insert: i=18: adding mze_name=[daily.20080719.gz] >> hash=6413922602988863488 >> mze_insert: i=19: adding mze_name=[daily.20080720.gz] >> hash=5759986962857459712 >> mze_insert: i=20: adding mze_name=[daily.20080721.gz] >> hash=5909039301739413504 >> mze_insert: i=21: adding mze_name=[daily.20080722.gz] >> hash=8372896562954633216 >> mze_insert: i=22: adding mze_name=[daily.20080723.gz] >> hash=7627984238364590080 >> mze_insert: i=23: adding mze_name=[daily.20080724.gz] >> hash=4059358240683589632 >> mze_insert: i=24: adding mze_name=[daily.20080725.gz] >> hash=2718826723431940096 >> mze_insert: i=25: adding mze_name=[daily.20080726.gz] >> hash=254638509216759808 >> mze_insert: i=26: adding mze_name=[daily.20080727.gz] >> hash=2190475858316099584 >> mze_insert: i=27: adding mze_name=[daily.20080728.gz] >> hash=11573062813869932544 >> mze_insert: i=28: adding mze_name=[daily.20080729.gz] >> hash=13651902507838865408 >> mze_insert: i=29: adding mze_name=[daily.20080730.gz] >> hash=1423949746464096256 >> mze_insert: i=30: adding mze_name=[daily.20080731.gz] >> hash=1021306183190839296 >> ?mzap_open: i=31: empty mze_name >> mze_insert: i=32: ALREADY EXISTS! mze_name=[daily.20080706.gz] >> hash=13533957483210473472 >> ?mzap_open: i=33: empty mze_name >> mze_insert: i=34: ALREADY EXISTS! mze_name=[daily.20080711.gz] >> hash=13147843840310247424 >> ?mzap_open: i=35: empty mze_name >> mze_insert: i=36: ALREADY EXISTS! mze_name=[daily.20080715.gz] >> hash=13964729096542879744 >> ?mzap_open: i=37: empty mze_name >> ?mzap_open: i=38: empty mze_name >> >> zpool scrub shows not errors. >> >> Any idea how we ended up in this situation? >> >> If there is ZFS developer interested in tracking it further down, maybe >> Dmitry >> will be able to provide snapshot of this corrupted file system. >> >> >> >> ------------------------------------------------------------------------ >> >> _______________________________________________ >> zfs-code mailing list >> zfs-code at opensolaris.org >> http://mail.opensolaris.org/mailman/listinfo/zfs-code > > _______________________________________________ > zfs-code mailing list > zfs-code at opensolaris.org > http://mail.opensolaris.org/mailman/listinfo/zfs-code >
Pawel Jakub Dawidek
2009-Sep-09 20:53 UTC
[zfs-code] mze_insert() panic (avl_find() succeeded inside avl_add())
On Wed, Sep 09, 2009 at 10:07:08AM -0700, Matthew Ahrens wrote:> Unfortunately, I don''t think that the send stream would help us diagnose it > further. What we really need is a way to reproduce how to get into this > situation in the first place.Yes and we unfortunatelly cannot provide a way to reproduce it. I''m still looking at it and this is what I found: When I delete duplicated entries, export&import the pool to flush the cache I can see the covered entries. They are almost exactly the same (same name, same size, same md5 sum), the only thing that is different are inodes - those duplicates have different inode number that the ones I removed. -- Pawel Jakub Dawidek http://www.wheel.pl pjd at FreeBSD.org http://www.FreeBSD.org FreeBSD committer Am I Evil? Yes, I Am! -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: application/pgp-signature Size: 187 bytes Desc: not available URL: <http://mail.opensolaris.org/pipermail/zfs-code/attachments/20090909/0326f9cb/attachment.bin>
Matthew Ahrens
2009-Sep-09 22:13 UTC
[zfs-code] mze_insert() panic (avl_find() succeeded inside avl_add())
Pawel Jakub Dawidek wrote:> On Wed, Sep 09, 2009 at 10:07:08AM -0700, Matthew Ahrens wrote: >> Unfortunately, I don''t think that the send stream would help us diagnose it >> further. What we really need is a way to reproduce how to get into this >> situation in the first place. > > Yes and we unfortunatelly cannot provide a way to reproduce it. > > I''m still looking at it and this is what I found: > > When I delete duplicated entries, export&import the pool to flush the > cache I can see the covered entries. They are almost exactly the same > (same name, same size, same md5 sum), the only thing that is different > are inodes - those duplicates have different inode number that the ones > I removed.That''s an interesting piece of information. And can you tell if all the inode numbers referenced are valid files still? Their zp_parent should point to this directory. --matt
Pawel Jakub Dawidek
2009-Sep-10 05:27 UTC
[zfs-code] mze_insert() panic (avl_find() succeeded inside avl_add())
On Wed, Sep 09, 2009 at 03:13:27PM -0700, Matthew Ahrens wrote:> Pawel Jakub Dawidek wrote: > >On Wed, Sep 09, 2009 at 10:07:08AM -0700, Matthew Ahrens wrote: > >>Unfortunately, I don''t think that the send stream would help us diagnose > >>it further. What we really need is a way to reproduce how to get into > >>this situation in the first place. > > > >Yes and we unfortunatelly cannot provide a way to reproduce it. > > > >I''m still looking at it and this is what I found: > > > >When I delete duplicated entries, export&import the pool to flush the > >cache I can see the covered entries. They are almost exactly the same > >(same name, same size, same md5 sum), the only thing that is different > >are inodes - those duplicates have different inode number that the ones > >I removed. > > That''s an interesting piece of information. And can you tell if all the > inode numbers referenced are valid files still? Their zp_parent should > point to this directory.From what I can see all the files are ok. Both initial and covered znodes'' zp_parent point at the parent directory inode. I can even rename initial files, export&import and now I''ve access to all six files. -- Pawel Jakub Dawidek http://www.wheel.pl pjd at FreeBSD.org http://www.FreeBSD.org FreeBSD committer Am I Evil? Yes, I Am! -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: application/pgp-signature Size: 187 bytes Desc: not available URL: <http://mail.opensolaris.org/pipermail/zfs-code/attachments/20090910/886416f7/attachment.bin>