Hello experts, and especially those in the know about the low-level side of ZFS! I've mentioned this in another thread, but the question may deserve some more general attention: my main data pool shows a discrepancy between the traversal and alloc sizes, as quoted below.

In another thread it was suggested that, due to some corruption, ZFS might end up with two different block pointers referring to the same logical block - so one of them has a checksum mismatch, and the pool as a whole has an accounting error. This sounds credible to me.

Question to the low-level gurus: can this really happen, despite ZFS's best efforts to keep everything consistent? Is it possible to determine which block pointer is invalid, and perhaps force its destruction or reallocation, similar to the discovery and "recovery" of "lost clusters" on other filesystems (maybe using some of the ZFS-forensics Python scripts out there)? See the P.S. at the end of this message for the kind of zdb invocations I have in mind.

If this really concerns only some 12 KB of user data, I won't mind corrupting or losing it, if that would make my pool "sane" again. I do want to end up with a consistent pool, even if that means forging some blocks on disk behind ZFS's back and perhaps sacrificing a little bit of data - preferably knowing which data, so that if I have another copy lying around I can restore it, and otherwise I'd at least know, and accept, what is gone. However, I do not want to recreate this pool from scratch, because I have no backup of it and no means to back up everything it holds: as a presumably reliable home storage box it *was* the backup, and the original storage has since been repurposed and filled up as well ;)

More detailed ZDB walks (i.e. with leaked-block detection) have failed so far, probably due to depleting RAM (8 GB) and swap (35 GB); I might retry with a bit more swap soon. All the tests quoted below were done on oi_148a on home-brewed hardware (an ex-desktop, no ECC), and I plan to redo some of them on oi_151a.

Sorry, the post now becomes lengthy due to the "screenshots"...

======= Simple ZDB walk of the pool:

# time zdb -bsvL -e 1601233584937321596

Traversing all blocks ...
block traversal size 9044810649600 != alloc 9044810661888 (unreachable 12288)

        bp count:            85247389
        bp logical:     8891475160064      avg: 104302
        bp physical:    7985515234304      avg:  93674     compression:  1.11
        bp allocated:  12429088972800      avg: 145800     compression:  0.72
        bp deduped:     3384278323200      ref>1: 13909855 deduplication:  1.27
        SPA allocated:  9044810661888      used: 75.64%

Blocks   LSIZE   PSIZE   ASIZE     avg    comp  %Total  Type
     -       -       -       -       -       -       -  unallocated
     2     32K      4K   72.0K   36.0K    8.00    0.00  object directory
     3   1.50K   1.50K    108K   36.0K    1.00    0.00  object array
     2     32K   2.50K   72.0K   36.0K   12.80    0.00  packed nvlist
     -       -       -       -       -       -       -  packed nvlist size
 7.80K    988M    208M   1.12G    147K    4.75    0.01  bpobj
     -       -       -       -       -       -       -  bpobj header
     -       -       -       -       -       -       -  SPA space map header
  185K    761M    523M   6.57G   36.3K    1.46    0.06  SPA space map
    22   1020K   1020K   1.58M   73.6K    1.00    0.00  ZIL intent log
  933K   14.6G   3.11G   25.2G   27.6K    4.69    0.22  DMU dnode
 1.75K   3.50M    898K   42.0M   24.0K    3.99    0.00  DMU objset
     -       -       -       -       -       -       -  DSL directory
   390    243K    200K   13.7M   36.0K    1.21    0.00  DSL directory child map
   388    298K    208K   13.6M   36.0K    1.43    0.00  DSL dataset snap map
   715   10.2M   1.14M   25.1M   36.0K    8.92    0.00  DSL props
     -       -       -       -       -       -       -  DSL dataset
     -       -       -       -       -       -       -  ZFS znode
     -       -       -       -       -       -       -  ZFS V0 ACL
 76.1M   8.06T   7.25T   11.2T    150K    1.11   98.67  ZFS plain file
 2.17M   2.76G   1.33G   52.7G   24.3K    2.08    0.46  ZFS directory
   341    314K    171K   7.99M   24.0K    1.84    0.00  ZFS master node
   856   25.4M   1.16M   20.1M   24.1K   21.92    0.00  ZFS delete queue
     -       -       -       -       -       -       -  zvol object
     -       -       -       -       -       -       -  zvol prop
     -       -       -       -       -       -       -  other uint8[]
     -       -       -       -       -       -       -  other uint64[]
     -       -       -       -       -       -       -  other ZAP
     -       -       -       -       -       -       -  persistent error log
    33   4.02M    763K   4.46M    139K    5.39    0.00  SPA history
     -       -       -       -       -       -       -  SPA history offsets
     1     512     512   36.0K   36.0K    1.00    0.00  Pool properties
     -       -       -       -       -       -       -  DSL permissions
 17.1K   12.7M   8.63M    411M   24.0K    1.48    0.00  ZFS ACL
     -       -       -       -       -       -       -  ZFS SYSACL
     5   80.0K   5.00K    120K   24.0K   16.00    0.00  FUID table
     -       -       -       -       -       -       -  FUID table size
 1.37K    723K    705K   49.3M   36.0K    1.03    0.00  DSL dataset next clones
     -       -       -       -       -       -       -  scan work queue
 2.69K   2.57M   1.36M   64.5M   24.0K    1.89    0.00  ZFS user/group used
     -       -       -       -       -       -       -  ZFS user/group quota
     -       -       -       -       -       -       -  snapshot refcount tags
 1.87M   7.48G   4.41G   67.4G   36.0K    1.70    0.58  DDT ZAP algorithm
     2     32K   4.50K   72.0K   36.0K    7.11    0.00  DDT statistics
    21   10.5K   10.5K    504K   24.0K    1.00    0.00  System attributes
   288    144K    144K   6.75M   24.0K    1.00    0.00  SA master node
   288    432K    144K   6.75M   24.0K    3.00    0.00  SA attr registration
   576   9.00M   1008K   13.5M   24.0K    9.14    0.00  SA attr layouts
     -       -       -       -       -       -       -  scan translations
     -       -       -       -       -       -       -  deduplicated block
 1.84K   4.73M   1.20M   66.3M   36.0K    3.95    0.00  DSL deadlist map
     -       -       -       -       -       -       -  DSL deadlist map hdr
    94   68.0K   50.0K   3.30M   36.0K    1.36    0.00  DSL dir clones
    11   1.38M   49.5K    792K   72.0K   28.44    0.00  bpobj subobj
    20    258K   49.5K    864K   43.2K    5.21    0.00  deferred free
   815   22.0M   10.3M   23.8M   29.9K    2.13    0.00  dedup ditto
 81.3M   8.09T   7.26T   11.3T    142K    1.11  100.00  Total

                           capacity   operations   bandwidth  ---- errors ----
description              used  avail  read  write  read write  read write cksum
pool                    8.23T  2.65T    89      0  418K     0     0     0     0
  raidz2                8.23T  2.65T    89      0  418K     0     0     0     0
    /dev/dsk/c6t0d0s0                    8      0  497K     0     0     0     0
    /dev/dsk/c6t1d0s0                   13      0  531K     0     0     0     0
    /dev/dsk/c6t2d0s0                    9      0  479K     0     0     0     0
    /dev/dsk/c6t3d0s0                    8      0  495K     0     0     0     0
    /dev/dsk/c6t4d0s0                   13      0  531K     0     0     0     0
    /dev/dsk/c6t5d0s0                    9      0  481K     0     0     0     0

real    632m2.412s
user    311m54.827s
sys     5m1.200s

======== Attempts to analyze the pool with leaked-block detection failed, most likely due to running out of virtual memory (despite adding lots of swap, 35 GB), as shown below:

root@openindiana:~# time zdb -bsvc -e 1601233584937321596

Traversing all blocks to verify checksums and verify nothing leaked ...
Assertion failed: zio_wait(zio_claim(0L, zcb->zcb_spa, refcnt ? 0 : spa_first_txg(zcb->zcb_spa), bp, 0L, 0L, ZIO_FLAG_CANFAIL)) == 0 (0x2 == 0x0), file ../zdb.c, line 1950
Abort

real    7197m41.288s
user    291m39.256s
sys     25m48.133s

root@openindiana:~# time zdb -bb -e 1601233584937321596

Traversing all blocks to verify nothing leaked ...
(LAN disconnected; RAM/swap used up)

=== Virtual memory was depleted by this ZDB walk attempt:

# iostat -Xn -Td c6t0d0 c6t1d0 c6t2d0 c6t3d0 c6t4d0 c6t5d0 60
iostat: near-zero disk activity (20-40 KB/s in 60-second averages)

# top
last pid:  4154;  load avg:  0.16,  0.17,  0.17;  up 7+06:59:56        17:10:31
50 processes: 49 sleeping, 1 on cpu
CPU states: 92.0% idle, 0.0% user, 8.0% kernel, 0.0% iowait, 0.0% swap
Kernel: 1618 ctxsw, 142 trap, 1163 intr, 774 syscall, 77 flt, 256 pgin, 264 pgout
Memory: 8191M phys mem, 128M free mem, 35G total swap, 998M free swap

   PID USERNAME NLWP PRI NICE  SIZE   RES STATE    TIME    CPU COMMAND
  4133 root      205  60    0   23G 6541M sleep  255:04  0.24% zdb
  4028 root        1  59    0 2300K 1380K sleep   35:23  0.18% vmstat
  3887 root        1  59    0 3984K  896K cpu/1   27:08  0.13% top
  3705 jack        1  59    0 7596K 1048K sleep    2:59  0.01% sshd

# vmstat 1
 kthr      memory            page            disk          faults      cpu
 r b w   swap  free  re  mf  pi  po  fr de  sr f0 lf lf rm   in   sy   cs us sy id
 0 2 0 404468 130880  2  76 269 285 285  0 636  0  0  0  0 1660  762 1834  0 10 90
 0 2 0 404468 130928  1  66 228 272 276  0 461  0  0  0  0 1453  786 1625  0  8 92
 0 2 0 404468 130880  1  77 275 235 235  0 356  0  0  0  0 1307  750 1618  0  8 92
 0 2 0 404468 130916  1  67 237 277 277  0 621  0  0  0  0 1593 1105 2022  0 10 90
 0 2 0 404468 130876  4  75 255 231 231  0 598  0  0  0  0 1356  751 1897  0  8 91
 0 1 0 404468 130872  6  79 256 272 272  0 584  0  0  0  0 1337  765 1702  0  8 92
 0 2 0 404468 130908  3  70 237 281 281  0 583  0  0  0  0 1593  762 1885  0 10 90
 0 2 0 404468 130868  6  86 287 275 275  0 534  0  0  0  0 1074  754 1635  0  8 92
 0 2 0 404468 130884  2  71 245 273 273  0 597  0  0  0  0 1327  763 1652  0  8 92
 0 2 0 404468 130900  6  85 287 318 318  0 520  0  0  0  0 1370  770 1913  0 10 89

======== Also, when my non-redundant rpool failed, I could not walk it with ZDB at all - the tool kept reporting errors and bailing out, as shown below. At the moment this is irrelevant to my current problem, since I've recreated that rpool, but it still seems wrong that errors in file data prevented the pool from working at all (being imported by zpool, or checked by zdb).

root@openindiana:~# time zdb -bb -e 17995958177810353692

Traversing all blocks to verify nothing leaked ...
Assertion failed: ss->ss_start <= start (0x79e22600 <= 0x79e1dc00), file ../../../uts/common/fs/zfs/space_map.c, line 173
Abort (core dumped)

real    0m12.184s
user    0m0.367s
sys     0m0.474s

root@openindiana:~# time zdb -bsvc -e 17995958177810353692

Traversing all blocks to verify checksums and verify nothing leaked ...
Assertion failed: ss->ss_start <= start (0x79e22600 <= 0x79e1dc00), file ../../../uts/common/fs/zfs/space_map.c, line 173
Abort (core dumped)

real    0m12.019s
user    0m0.360s
sys     0m0.458s

=== There were some errors, but all the known ones proved to be in file data - so why the fatal bailout (and also the inability to import that rpool - the machine hung on every attempt)?

root@openindiana:~# time zdb -bsvcL -e 17995958177810353692

Traversing all blocks to verify checksums ...
zdb_blkptr_cb: Got error 50 reading <182, 19177, 0, 1> DVA[0]=<0:a8c8e600:20000> [L0 ZFS plain file] fletcher4 uncompressed LE contiguous unique single size=20000L/20000P birth=82L/82P fill=1 cksum=3401f5fe522b:109ee10ba48ed38c:e7f49c220f7b8bc:ff405ef051b91e65 -- skipping
zdb_blkptr_cb: Got error 50 reading <182, 19202, 0, 1> DVA[0]=<0:a9030a00:20000> [L0 ZFS plain file] fletcher4 uncompressed LE contiguous unique single size=20000L/20000P birth=82L/82P fill=1 cksum=11c4c738b0ba:7bb81bce3313913:8f85a7abf1b9e34:58e8746d63119393 -- skipping
zdb_blkptr_cb: Got error 50 reading <182, 24924, 0, 0> DVA[0]=<0:b1aaec00:14a00> [L0 ZFS plain file] fletcher4 uncompressed LE contiguous unique single size=14a00L/14a00P birth=85L/85P fill=1 cksum=270679cd905d:6119a969a134566:6f0f7da64c4d2d90:3ab86aa985abef02 -- skipping
zdb_blkptr_cb: Got error 50 reading <182, 24944, 0, 0> DVA[0]=<0:b1cdf000:10800> [L0 ZFS plain file] fletcher4 uncompressed LE contiguous unique single size=10800L/10800P birth=85L/85P fill=1 cksum=1ebb4d1ae9f5:3cf5f42afa9a332:757613fc2d2de7b3:5f197017333a4f89 -- skipping
zdb_blkptr_cb: Got error 50 reading <493, 947, 0, 165> DVA[0]=<0:b3efc200:20000> [L0 ZFS plain file] fletcher4 uncompressed LE contiguous unique single size=20000L/20000P birth=26691L/26691P fill=1 cksum=2cdc2ae22d10:b33d31bcbc0d8da:f1571c9975e151b0:a037073594569635 -- skipping

Error counts:

        errno  count
           50      5

block traversal size 11986202624 != alloc 11986203136 (unreachable 512)

        bp count:          405927
        bp logical:    15030449664      avg: 37027
        bp physical:   12995855872      avg: 32015      compression:  1.16
        bp allocated:  13172434944      avg: 32450      compression:  1.14
        bp deduped:     1186232320      ref>1: 12767    deduplication:  1.09
        SPA allocated: 11986203136      used: 56.17%

//Jim
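
P.S. To make the "forensics" part of my question more concrete, below is roughly the kind of zdb usage I have in mind for tracking a bad block pointer down to a file, and for looking at the block itself. I have not actually run these against the damaged pool yet, so please treat them as a sketch: <pool> and <dataset> are placeholders (the data pool would need -e plus its GUID, as in the runs above), and the object and DVA numbers are simply taken from the rpool error lines above for illustration - those lines name blocks as <objset, object, level, blkid>.

List the datasets and their IDs, to see which dataset is, say, objset 182:

# zdb -d <pool>

Dump that object verbosely; with enough -d's zdb should print its type, size and (for a plain file) its path, so I'd know which file is affected:

# zdb -ddddd <pool>/<dataset> 19177

Read the raw block directly by its DVA (vdev:offset:size), e.g. the first failing one above:

# zdb -R <pool> 0:a8c8e600:20000

Also, if the zdb build in oi_151a supports the -A/-AA/-AAA options (ignore assertion failures / attempt panic recovery), I would try adding them to get past the space_map.c assertion and at least complete a walk of a damaged pool - though I'm not sure how trustworthy the resulting statistics would be. Corrections welcome if I've got any of this wrong.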