Dennis Clarke
2009-May-10 15:24 UTC
[zfs-discuss] using zdb -e -bbcsL to debug that hung thread issue
---------------------------- Original Message ----------------------------
Subject: Re: I see you're running zdb -e -bbcsL
From: "Victor Latushkin" <Victor.Latushkin at Sun.COM>
Date: Sun, May 10, 2009 11:17
To: dclarke at blastwave.org
--------------------------------------------------------------------------

Dennis Clarke wrote:
> # w
> 3:14pm up 11:24, 3 users, load average: 0.46, 0.29, 0.23
> User tty login@ idle JCPU PCPU what
> dclarke console 1:22pm 1:52 2:02 1:31 /usr/lib/nwam-manager
> dclarke pts/4 1:44pm 1:10 zpool import -f -R /mnt/foo 1598
> dclarke pts/7 1:49pm 9 ssh -2 -4 -e^ -l dclarke loginz.
> dclarke pts/8 1:51pm 3 ssh -2 -4 -e^ -l dclarke mail.li
> dclarke pts/10 2:07pm 20 w
> iktorn pts/11 3:06pm 4 zpool import
> iktorn pts/12 3:13pm 1 1 zdb -e -bbcsL 159890708868077350
>
> Now I need to go read the manual to see what zdb is :-)

thus far I see some output from that :

dclarke@neptune:~$ cat ../iktorn/zdb/zdb-ebbcsL.out

I know that will wrap all wrong for people to see.

see :
http://www.blastwave.org/dclarke/blog/files/zdb-ebbcsL.README

Traversing all blocks to verify metadata checksums ...
zdb_blkptr_cb: Got error 50 reading <0, 34, 0, 0> [L0 SPA space map]
    0x1000L/0x200P DVA[0]=<0:0x6091e400:0x200> DVA[1]=<0:0x3e091e400:0x200>
    DVA[2]=<0:0x78091e400:0x200> fletcher4 lzjb LE contiguous birth=1461 fill=1
    cksum=0x1f7bc0ee12:0x6fcfd90640d:0x10787c83addaf:0x1f3ef97a921b6f -- skipping
zdb_blkptr_cb: Got error 50 reading <0, 35, 0, 0> [L0 SPA space map]
    0x1000L/0x200P DVA[0]=<0:0x6091e600:0x200> DVA[1]=<0:0x3e091e600:0x200>
    DVA[2]=<0:0x78091e600:0x200> fletcher4 lzjb LE contiguous birth=1461 fill=1
    cksum=0x1f7bc0ee12:0x6fcfd90640d:0x10787c83addaf:0x1f3ef97a921b6f -- skipping
zdb_blkptr_cb: Got error 50 reading <0, 36, 0, 0> [L0 SPA space map]
    0x1000L/0x200P DVA[0]=<0:0x6091e200:0x200> DVA[1]=<0:0x3e091e200:0x200>
    DVA[2]=<0:0x78091e200:0x200> fletcher4 lzjb LE contiguous birth=1461 fill=1
    cksum=0x1f522c92e2:0x6ae50e6dbad:0xf0a944e70790:0x1b6468e6c6f56a -- skipping
zdb_blkptr_cb: Got error 50 reading <0, 37, 0, 0> [L0 SPA space map]
    0x1000L/0x200P DVA[0]=<0:0x803d9800:0x200> DVA[1]=<0:0x4003d9800:0x200>
    DVA[2]=<0:0x7a03d9800:0x200> fletcher4 lzjb LE contiguous birth=1509 fill=1
    cksum=0x1f2b0a539f:0x763a6e219f4:0x1200601439c63:0x22f1a766cefc7c -- skipping
zdb_blkptr_cb: Got error 50 reading <0, 38, 0, 0> [L0 SPA space map]
    0x1000L/0x200P DVA[0]=<0:0x803d9a00:0x200> DVA[1]=<0:0x4003d9a00:0x200>
    DVA[2]=<0:0x7a03d9a00:0x200> fletcher4 lzjb LE contiguous birth=1509 fill=1
    cksum=0x1f2b0a539f:0x763a6e219f4:0x1200601439c63:0x22f1a766cefc7c -- skipping
zdb_blkptr_cb: Got error 50 reading <0, 39, 0, 0> [L0 SPA space map]
    0x1000L/0x200P DVA[0]=<0:0x803d9600:0x200> DVA[1]=<0:0x4003d9600:0x200>
    DVA[2]=<0:0x7a03d9600:0x200> fletcher4 lzjb LE contiguous birth=1509 fill=1
    cksum=0x1f2b0a539f:0x763a6e219f4:0x1200601439c63:0x22f1a766cefc7c -- skipping
zdb_blkptr_cb: Got error 50 reading <0, 48, 0, 0> [L0 SPA space map]
    0x1000L/0x400P DVA[0]=<0:0xc1263c00:0x400> DVA[1]=<0:0x441263c00:0x400>
    DVA[2]=<0:0x7c1263c00:0x400> fletcher4 lzjb LE contiguous birth=648 fill=1
    cksum=0x22b93a8434:0x190afe8456c3:0x9632f68e6719b:0x2703bc59856dd31 -- skipping
zdb_blkptr_cb: Got error 50 reading <0, 49, 0, 0> [L0 SPA space map]
    0x1000L/0x400P DVA[0]=<0:0xc1264000:0x400> DVA[1]=<0:0x441264000:0x400>
    DVA[2]=<0:0x7c1264000:0x400> fletcher4 lzjb LE contiguous birth=648 fill=1
    cksum=0x24c7dc289e:0x1a54426b8513:0x9cb262a2e8e04:0x286474a6a40f0ea -- skipping
zdb_blkptr_cb: Got error 50 reading <0, 50, 0, 0> [L0 SPA space map]
    0x1000L/0x400P DVA[0]=<0:0xc1264400:0x400> DVA[1]=<0:0x441264400:0x400>
    DVA[2]=<0:0x7c1264400:0x400> fletcher4 lzjb LE contiguous birth=648 fill=1
    cksum=0x24c7dc289e:0x1a54426b8513:0x9cb262a2e8e04:0x286474a6a40f0ea -- skipping

Error counts:

        errno  count
           50      9
block traversal size 1561281536 != alloc 20934112256 (unreachable 19372830720)

        bp count:            4121
        bp logical:     521589760    avg: 126568
        bp physical:    520441856    avg: 126290    compression: 1.00
        bp allocated:  1561281536    avg: 378859    compression: 0.33
        SPA allocated: 20934112256   used: 26.17%

Blocks  LSIZE  PSIZE  ASIZE    avg   comp  %Total  Type
     8  56.0K  10.0K  30.0K  3.75K   5.60    0.00  deferred free
     1    512    512  1.50K  1.50K   1.00    0.00  object directory
     2     1K     1K  3.00K  1.50K   1.00    0.00  object array
     1    16K  1.50K  4.50K  4.50K  10.67    0.00  packed nvlist
     -      -      -      -      -      -       -  packed nvlist size
     1    16K     1K  3.00K  3.00K  16.00    0.00  bplist
     -      -      -      -      -      -       -  bplist header
     -      -      -      -      -      -       -  SPA space map header
    48   192K  37.0K   111K  2.31K   5.19    0.01  SPA space map
     1  12.0K  12.0K  12.0K  12.0K   1.00    0.00  ZIL intent log
    24   384K  30.0K  76.0K  3.17K  12.80    0.00  DMU dnode
     4     4K     2K  5.00K  1.25K   2.00    0.00  DMU objset
     -      -      -      -      -      -       -  DSL directory
     4     2K     2K  6.00K  1.50K   1.00    0.00  DSL directory child map
     3  1.50K  1.50K  4.50K  1.50K   1.00    0.00  DSL dataset snap map
     7  96.5K  11.0K  33.0K  4.71K   8.77    0.00  DSL props
     -      -      -      -      -      -       -  DSL dataset
     -      -      -      -      -      -       -  ZFS znode
     -      -      -      -      -      -       -  ZFS V0 ACL
 3.91K   497M   496M  1.45G   381K   1.00   99.98  ZFS plain file
     7  3.50K  3.50K  8.50K  1.21K   1.00    0.00  ZFS directory
     3  1.50K  1.50K  3.50K  1.17K   1.00    0.00  ZFS master node
     3  1.50K  1.50K  3.50K  1.17K   1.00    0.00  ZFS delete queue
     -      -      -      -      -      -       -  zvol object
     -      -      -      -      -      -       -  zvol prop
     -      -      -      -      -      -       -  other uint8[]
     -      -      -      -      -      -       -  other uint64[]
     -      -      -      -      -      -       -  other ZAP
     -      -      -      -      -      -       -  persistent error log
     1   128K  5.00K  15.0K  15.0K  25.60    0.00  SPA history
     -      -      -      -      -      -       -  SPA history offsets
     -      -      -      -      -      -       -  Pool properties
     -      -      -      -      -      -       -  DSL permissions
     -      -      -      -      -      -       -  ZFS ACL
     -      -      -      -      -      -       -  ZFS SYSACL
     -      -      -      -      -      -       -  FUID table
     -      -      -      -      -      -       -  FUID table size
     -      -      -      -      -      -       -  DSL dataset next clones
     -      -      -      -      -      -       -  scrub work queue
 4.02K   497M   496M  1.45G   370K   1.00  100.00  Total

                        capacity     operations    bandwidth    ---- errors ----
description            used  avail   read  write   read  write  read write cksum
15989070886807735056  19.5G  55.0G     97      0  5.89M      0     0     0    11
/dev/dsk/c0d0p0       19.5G  55.0G     97      0  5.89M      0     0     0   145
dclarke@neptune:~$

Dennis

ps: I'm just being verbose and sharing what I see on the maillist. Other people may be able to write a man page for zdb that helps us understand what it does.
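
The error records and summary figures in the output above can be cross-checked mechanically. What follows is a minimal Python sketch, not part of zdb: `parse_record` and `summary_checks` are hypothetical helper names, the reading of each DVA triple as vdev:offset:allocated-size is an assumption about the printed format, and the floor-division averages are simply what happens to reproduce the printed figures.

```python
import re

# One error record copied from the zdb output above (object 34).
RECORD = (
    "zdb_blkptr_cb: Got error 50 reading <0, 34, 0, 0> [L0 SPA space map] "
    "0x1000L/0x200P DVA[0]=<0:0x6091e400:0x200> DVA[1]=<0:0x3e091e400:0x200> "
    "DVA[2]=<0:0x78091e400:0x200> fletcher4 lzjb LE contiguous birth=1461 fill=1"
)

# Summary figures copied from the same output.
TRAVERSED = 1561281536      # "block traversal size" / "bp allocated"
SPA_ALLOC = 20934112256     # "alloc" / "SPA allocated"
BP_COUNT = 4121
BP_LOGICAL = 521589760
BP_PHYSICAL = 520441856


def parse_record(line):
    """Split one zdb_blkptr_cb error line into its fields (hypothetical helper)."""
    errno = int(re.search(r"Got error (\d+)", line).group(1))
    # The four numbers in angle brackets after "reading".
    bookmark = tuple(
        int(n) for n in re.search(r"reading <([^>]+)>", line).group(1).split(",")
    )
    # Logical and physical sizes, printed in hex with L and P suffixes.
    lsize, psize = (
        int(x, 16)
        for x in re.search(r"(0x[0-9a-f]+)L/(0x[0-9a-f]+)P", line).groups()
    )
    # Assumption: each DVA is printed as <vdev:offset:allocated size>.
    dvas = [
        (int(vdev), int(offset, 16), int(asize, 16))
        for vdev, offset, asize in re.findall(
            r"DVA\[\d\]=<(\d+):(0x[0-9a-f]+):(0x[0-9a-f]+)>", line
        )
    ]
    return {"errno": errno, "bookmark": bookmark,
            "lsize": lsize, "psize": psize, "dvas": dvas}


def summary_checks():
    """Recompute the derived figures in the summary from the raw counts."""
    return {
        "unreachable": SPA_ALLOC - TRAVERSED,
        "avg_logical": BP_LOGICAL // BP_COUNT,
        "avg_physical": BP_PHYSICAL // BP_COUNT,
        "avg_allocated": TRAVERSED // BP_COUNT,
        "logical_over_physical": round(BP_LOGICAL / BP_PHYSICAL, 2),
        "logical_over_allocated": round(BP_LOGICAL / TRAVERSED, 2),
    }


if __name__ == "__main__":
    print(parse_record(RECORD))
    print(summary_checks())
```

For the record shown, this yields three copies of 0x200 bytes each on vdev 0, and the subtraction reproduces the "unreachable" figure of 19372830720 printed by zdb.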
Victor Latushkin
2009-May-10 18:37 UTC
[zfs-discuss] using zdb -e -bbcsL to debug that hung thread issue
Dennis Clarke wrote:
>
> ---------------------------- Original Message ----------------------------
> Subject: Re: I see you're running zdb -e -bbcsL
> From: "Victor Latushkin" <Victor.Latushkin at Sun.COM>
> Date: Sun, May 10, 2009 11:17
> To: dclarke at blastwave.org
> --------------------------------------------------------------------------
>
> Dennis Clarke wrote:
>> # w
>> 3:14pm up 11:24, 3 users, load average: 0.46, 0.29, 0.23
>> User tty login@ idle JCPU PCPU what
>> dclarke console 1:22pm 1:52 2:02 1:31 /usr/lib/nwam-manager
>> dclarke pts/4 1:44pm 1:10 zpool import -f -R /mnt/foo 1598
>> dclarke pts/7 1:49pm 9 ssh -2 -4 -e^ -l dclarke loginz.
>> dclarke pts/8 1:51pm 3 ssh -2 -4 -e^ -l dclarke mail.li
>> dclarke pts/10 2:07pm 20 w
>> iktorn pts/11 3:06pm 4 zpool import
>> iktorn pts/12 3:13pm 1 1 zdb -e -bbcsL 159890708868077350
>>
>> Now I need to go read the manual to see what zdb is :-)
>
> thus far I see some output from that :
>
> dclarke@neptune:~$ cat ../iktorn/zdb/zdb-ebbcsL.out

zdb -ebbcsL is doing this:

-e is for an exported pool
-bb is to walk the entire block tree with verbosity 2 (hence 'b' two times)
-c is to checksum only metadata blocks (to avoid reading data blocks and save time; adding a second 'c' would cause checksumming of all blocks)
-s is to output a summary of operations at the end
-L is to avoid loading space maps (since we know at least one is corrupted)

So it allows us to check the consistency of the on-disk metadata for the pool without importing it in-kernel.

> I know that will wrap all wrong for people to see.
>
> see :
> http://www.blastwave.org/dclarke/blog/files/zdb-ebbcsL.README
>
> Traversing all blocks to verify metadata checksums ...
> zdb_blkptr_cb: Got error 50 reading <0, 34, 0, 0> [L0 SPA space map]
> 0x1000L/0x200P DVA[0]=<0:0x6091e400:0x200> DVA[1]=<0:0x3e091e400:0x200>
> DVA[2]=<0:0x78091e400:0x200> fletcher4 lzjb LE contiguous birth=1461
> fill=1 cksum=0x1f7bc0ee12:0x6fcfd90640d:0x10787c83addaf:0x1f3ef97a921b6f
> -- skipping

This is the block we are trying to read in (compare the DVAs and other fields with the ::blkptr output in the other thread).
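
The flag breakdown given in the reply above can be written down as a small lookup table. This is a minimal Python sketch for illustration only: `ZDB_FLAGS` and `build_command` are hypothetical names, the descriptions paraphrase the reply rather than any zdb documentation, and the pool identifier in the usage line is the one shown in the description column of the zdb statistics.

```python
# Flag meanings as described in the reply; not taken from zdb's own documentation.
ZDB_FLAGS = [
    ("-e", "operate on an exported pool"),
    ("-bb", "walk the entire block tree with verbosity 2"),
    ("-c", "checksum metadata blocks only (a second 'c' checksums all blocks)"),
    ("-s", "print a summary of operations at the end"),
    ("-L", "do not load space maps"),
]


def build_command(pool_id):
    """Return the argv list for `zdb -e -bbcsL <pool_id>` (hypothetical helper)."""
    first = ZDB_FLAGS[0][0]
    # Merge the remaining single-dash flags into one group, as in the thread.
    merged = "-" + "".join(flag[1:] for flag, _ in ZDB_FLAGS[1:])
    return ["zdb", first, merged, pool_id]


if __name__ == "__main__":
    for flag, meaning in ZDB_FLAGS:
        print(f"{flag:4} {meaning}")
    print(" ".join(build_command("15989070886807735056")))
```

The function only builds the argument list and never runs zdb, so it is safe to try on a machine without ZFS.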