Andriy Gapon
2021-Feb-23 10:52 UTC
lots of "no such file or directory" errors in zfs filesystem
On 23/02/2021 05:25, Chris Anderson wrote:
> so I can't ls -i the file since that triggers the no such file warning. if I run
> zdb -dddd on the inode of a directory which contains one of those missing files,
> I can get the inode of the file from that, but I don't get anything particularly
> interesting in the output.
>
> most of the files that are missing are in directories with a large number of
> files (the largest has 180k) but I managed to find a directory which had a
> single file entry that is missing:
>
> Dataset tank/home/cva [ZPL], ID 196, cr_txg 163, 109G, 908537 objects, rootbp
> DVA[0]=<0:13210311000:1000> DVA[1]=<0:18b9a02c000:1000> [L0 DMU objset]
> fletcher4 uncompressed LE contiguous unique double size=800L/800P
> birth=46916371L/46916371P fill=908537
> cksum=11fdd21d1d:13cb24c87a6e:da0c9bf1b5df3:715ab2ec45b7b09
>
>     Object  lvl   iblk   dblk  dsize  dnsize  lsize   %full  type
>      38268    1   128K     1K      0     512     1K  100.00  ZFS directory
>                                                264   bonus  ZFS znode
>         dnode flags: USED_BYTES USERUSED_ACCOUNTED
>         dnode maxblkid: 0
>         uid     1001
>         gid     1001
>         atime   Sun Aug  6 02:00:41 2017
>         mtime   Wed Apr 15 12:12:42 2020
>         ctime   Wed Apr 15 12:12:42 2020
>         crtime  Sat Aug  5 15:10:07 2017
>         gen     23881023
>         mode    40755
>         size    3
>         parent  38176
>         links   2
>         pflags  40800000144
>         xattr   0
>         rdev    0x0000000000000000
>         microzap: 1024 bytes, 1 entries
>
>                 hash_test.go = 38274 (type: Regular File)
>
> # zdb -dddd tank/home/cva 38274
> Dataset tank/home/cva [ZPL], ID 196, cr_txg 163, 109G, 908537 objects, rootbp
> DVA[0]=<0:13210311000:1000> DVA[1]=<0:18b9a02c000:1000> [L0 DMU objset]
> fletcher4 uncompressed LE contiguous unique double size=800L/800P
> birth=46916371L/46916371P fill=908537
> cksum=11fdd21d1d:13cb24c87a6e:da0c9bf1b5df3:715ab2ec45b7b09
>
>     Object  lvl   iblk   dblk  dsize  dnsize  lsize   %full  type
> zdb: dmu_bonus_hold(38274) failed, errno 2

So, this looks like a "simple" problem.
Unfortunately, it is very hard to tell retrospectively what bug caused it.
The directory has an entry for the file, but the file does not actually exist
(or has a different ID).
This is a logical inconsistency, not a data integrity issue.
So, a scrub, being a data integrity check, would not detect such an issue.
Hypothetical zfs_fsck is needed to find and repair such logical problems.

Does that pool and filesystem have any special history?
I mean upgrades, replication via send/recv, moving between OS-s, etc.

-- 
Andriy Gapon
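
[Editor's sketch, not part of the original message.] The inconsistency described above (a directory entry whose target object is missing) can at least be enumerated from userland, even though a scrub will not report it. The following Python sketch walks a tree and records every name that the directory listing returns but that `lstat` rejects with ENOENT, which matches the symptom reported in this thread. The function name `find_dangling_entries` and the injectable `lstat` parameter are hypothetical, introduced only for illustration and testing. It only finds such entries and does not repair anything.

```python
import os
import stat


def find_dangling_entries(root, lstat=os.lstat):
    """Return sorted paths that a directory lists but lstat reports as missing.

    `lstat` is injectable only so the behaviour can be exercised without a
    damaged filesystem; by default the real os.lstat is used.
    """
    dangling = []
    stack = [root]
    while stack:
        directory = stack.pop()
        try:
            # Collect names first so the directory handle is closed promptly,
            # which matters for directories with very many entries.
            with os.scandir(directory) as entries:
                paths = [entry.path for entry in entries]
        except OSError:
            continue  # unreadable directory: skip it rather than abort
        for path in paths:
            try:
                st = lstat(path)
            except FileNotFoundError:
                # The name is listed, but the object behind it is not there.
                dangling.append(path)
                continue
            except OSError:
                continue  # other errors (e.g. EACCES) are not what we look for
            if stat.S_ISDIR(st.st_mode):
                stack.append(path)  # descend without following symlinks
    return sorted(dangling)
```

Calling `find_dangling_entries("/home/cva")` (the path is an assumption about where the dataset is mounted) would return the list of affected paths. On a live filesystem a file deleted between the listing and the `lstat` call shows up as a false positive, so results are best confirmed with a second run or on a quiescent dataset.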
Chris Anderson
2021-Feb-23 17:30 UTC
lots of "no such file or directory" errors in zfs filesystem
On Tue, Feb 23, 2021 at 4:53 AM Andriy Gapon <avg at freebsd.org> wrote:
> On 23/02/2021 05:25, Chris Anderson wrote:
> > so I can't ls -i the file since that triggers the no such file warning. if I run
> > zdb -dddd on the inode of a directory which contains one of those missing files,
> > I can get the inode of the file from that, but I don't get anything particularly
> > interesting in the output.
> >
> > most of the files that are missing are in directories with a large number of
> > files (the largest has 180k) but I managed to find a directory which had a
> > single file entry that is missing:
> >
> > Dataset tank/home/cva [ZPL], ID 196, cr_txg 163, 109G, 908537 objects, rootbp
> > DVA[0]=<0:13210311000:1000> DVA[1]=<0:18b9a02c000:1000> [L0 DMU objset]
> > fletcher4 uncompressed LE contiguous unique double size=800L/800P
> > birth=46916371L/46916371P fill=908537
> > cksum=11fdd21d1d:13cb24c87a6e:da0c9bf1b5df3:715ab2ec45b7b09
> >
> >     Object  lvl   iblk   dblk  dsize  dnsize  lsize   %full  type
> >      38268    1   128K     1K      0     512     1K  100.00  ZFS directory
> >                                                264   bonus  ZFS znode
> >         dnode flags: USED_BYTES USERUSED_ACCOUNTED
> >         dnode maxblkid: 0
> >         uid     1001
> >         gid     1001
> >         atime   Sun Aug  6 02:00:41 2017
> >         mtime   Wed Apr 15 12:12:42 2020
> >         ctime   Wed Apr 15 12:12:42 2020
> >         crtime  Sat Aug  5 15:10:07 2017
> >         gen     23881023
> >         mode    40755
> >         size    3
> >         parent  38176
> >         links   2
> >         pflags  40800000144
> >         xattr   0
> >         rdev    0x0000000000000000
> >         microzap: 1024 bytes, 1 entries
> >
> >                 hash_test.go = 38274 (type: Regular File)
> >
> > # zdb -dddd tank/home/cva 38274
> > Dataset tank/home/cva [ZPL], ID 196, cr_txg 163, 109G, 908537 objects, rootbp
> > DVA[0]=<0:13210311000:1000> DVA[1]=<0:18b9a02c000:1000> [L0 DMU objset]
> > fletcher4 uncompressed LE contiguous unique double size=800L/800P
> > birth=46916371L/46916371P fill=908537
> > cksum=11fdd21d1d:13cb24c87a6e:da0c9bf1b5df3:715ab2ec45b7b09
> >
> >     Object  lvl   iblk   dblk  dsize  dnsize  lsize   %full  type
> > zdb: dmu_bonus_hold(38274) failed, errno 2
>
> So, this looks like a "simple" problem.
> Unfortunately, it is very hard to tell retrospectively what bug caused it.
> The directory has an entry for the file, but the file does not actually exist
> (or has a different ID).
> This is a logical inconsistency, not a data integrity issue.
> So, a scrub, being a data integrity check, would not detect such an issue.
> Hypothetical zfs_fsck is needed to find and repair such logical problems.

ah, I see. that makes sense.

> Does that pool and filesystem have any special history?
> I mean upgrades, replication via send/recv, moving between OS-s, etc.

nope, it led a pretty boring life. that zfs filesystem was created on that
server and has been on the same two mirrored disks for its lifetime.

it has had freebsd upgrades applied as they became available. zfs upgrades
were for the most part avoided until quite recently (though the problem
existed prior to the upgrades).

the server does have a relatively modest amount of ram (2GB). dunno if that
makes it more likely that these kinds of issues get triggered.