I have been tracking down a problem with "zfs diff" that reveals
itself variously as a hang (unkillable process), panic or error,
depending on the ZFS kernel version, but seems to be caused by
corruption within the pool.  I am using FreeBSD but the issue looks to
be generic ZFS, rather than FreeBSD-specific.

The hang and panic are related to the rw_enter() in
opensolaris/uts/common/fs/zfs/zap.c:zap_get_leaf_byblk()

The error is:
  Unable to determine path or stats for object 2128453 in tank/beckett/home@20120518: Invalid argument

A scrub reports no issues:

root@FB10-64:~ # zpool status
  pool: tank
 state: ONLINE
status: The pool is formatted using a legacy on-disk format.  The pool can
        still be used, but some features are unavailable.
action: Upgrade the pool using 'zpool upgrade'.  Once this is done, the
        pool will no longer be accessible on software that does not support
        feature flags.
  scan: scrub repaired 0 in 3h24m with 0 errors on Wed Nov 14 01:58:36 2012
config:

        NAME        STATE     READ WRITE CKSUM
        tank        ONLINE       0     0     0
          ada2      ONLINE       0     0     0

errors: No known data errors

But zdb says that object is the child of a plain file - which isn't sane:

root@FB10-64:~ # zdb -vvv tank/beckett/home@20120518 2128453
Dataset tank/beckett/home@20120518 [ZPL], ID 605, cr_txg 8379, 143G, 2026419 objects, rootbp DVA[0]=<0:266a0efa00:200> DVA[1]=<0:31b07fbc00:200> [L0 DMU objset] fletcher4 lzjb LE contiguous unique double size=800L/200P birth=8375L/8375P fill=2026419 cksum=1acdb1fbd9:93bf9c61e94:1b35c72eb8adb:389743898e4f79

    Object  lvl   iblk   dblk  dsize  lsize   %full  type
   2128453    1    16K  1.50K  1.50K  1.50K  100.00  ZFS plain file
                                        264   bonus  ZFS znode
        dnode flags: USED_BYTES USERUSED_ACCOUNTED
        dnode maxblkid: 0
        path    ???<object#2128453>
        uid     1000
        gid     1000
        atime   Fri Mar 23 16:34:52 2012
        mtime   Sat Oct 22 16:13:42 2011
        ctime   Sun Oct 23 21:09:02 2011
        crtime  Sat Oct 22 16:13:42 2011
        gen     2237174
        mode    100444
        size    1089
        parent  2242171
        links   1
        pflags  40800000004
        xattr   0
        rdev    0x0000000000000000

root@FB10-64:~ # zdb -vvv tank/beckett/home@20120518 2242171
Dataset tank/beckett/home@20120518 [ZPL], ID 605, cr_txg 8379, 143G, 2026419 objects, rootbp DVA[0]=<0:266a0efa00:200> DVA[1]=<0:31b07fbc00:200> [L0 DMU objset] fletcher4 lzjb LE contiguous unique double size=800L/200P birth=8375L/8375P fill=2026419 cksum=1acdb1fbd9:93bf9c61e94:1b35c72eb8adb:389743898e4f79

    Object  lvl   iblk   dblk  dsize  lsize   %full  type
   2242171    3    16K   128K  25.4M  25.5M  100.00  ZFS plain file
                                        264   bonus  ZFS znode
        dnode flags: USED_BYTES USERUSED_ACCOUNTED
        dnode maxblkid: 203
        path    /jashank/Pictures/sch/pdm-a4-11/stereo-pair-2.png
        uid     1000
        gid     1000
        atime   Fri Mar 23 16:41:53 2012
        mtime   Mon Oct 24 21:15:56 2011
        ctime   Mon Oct 24 21:15:56 2011
        crtime  Mon Oct 24 21:15:37 2011
        gen     2286679
        mode    100644
        size    26625731
        parent  7001490
        links   1
        pflags  40800000004
        xattr   0
        rdev    0x0000000000000000

root@FB10-64:~ # zdb -vvv tank/beckett/home@20120518 7001490
Dataset tank/beckett/home@20120518 [ZPL], ID 605, cr_txg 8379, 143G, 2026419 objects, rootbp DVA[0]=<0:266a0efa00:200> DVA[1]=<0:31b07fbc00:200> [L0 DMU objset] fletcher4 lzjb LE contiguous unique double size=800L/200P birth=8375L/8375P fill=2026419 cksum=1acdb1fbd9:93bf9c61e94:1b35c72eb8adb:389743898e4f79

    Object  lvl   iblk   dblk  dsize  lsize   %full  type
   7001490    1    16K    512     1K    512  100.00  ZFS directory
                                        264   bonus  ZFS znode
        dnode flags: USED_BYTES USERUSED_ACCOUNTED
        dnode maxblkid: 0
        path    /jashank/Pictures/sch/pdm-a4-11
        uid     1000
        gid     1000
        atime   Thu May 17 03:38:32 2012
        mtime   Mon Oct 24 21:15:37 2011
        ctime   Mon Oct 24 21:15:37 2011
        crtime  Fri Oct 14 22:17:44 2011
        gen     2088407
        mode    40755
        size    6
        parent  6370559
        links   2
        pflags  40800000144
        xattr   0
        rdev    0x0000000000000000
        microzap: 512 bytes, 4 entries

                stereo-pair-2.png = 2242171 (type: Regular File)
                stereo-pair-2.xcf = 7002074 (type: Regular File)
                stereo-pair-1.xcf = 7001512 (type: Regular File)
                stereo-pair-1.png = 2241802 (type: Regular File)

root@FB10-64:~ #

The above experiments were carried out on a partial copy of the pool.
The main pool started quite a long while ago and has been upgraded and
moved several times using send/recv (which happily and quietly
replicates the corruption).  Note that I have never (intentionally)
used extended attributes within the pool, but it has been exported to
Windows XP via Samba and possibly to OS X via NFSv3.

Does anyone have any suggestions for fixing the corruption?  One
suggestion was "tar c | tar x" but that is a last resort (since there
are 54 filesystems and ~1900 snapshots in the pool).

-- 
Peter Jeremy
On 11/16/2012 07:15 PM, Peter Jeremy wrote:
> I have been tracking down a problem with "zfs diff" that reveals
> itself variously as a hang (unkillable process), panic or error,
> depending on the ZFS kernel version but seems to be caused by
> corruption within the pool.  I am using FreeBSD but the issue looks to
> be generic ZFS, rather than FreeBSD-specific.
>
> The hang and panic are related to the rw_enter() in
> opensolaris/uts/common/fs/zfs/zap.c:zap_get_leaf_byblk()
>
> The error is:
>   Unable to determine path or stats for object 2128453 in tank/beckett/home@20120518: Invalid argument

Is the pool importing properly at least?  Maybe you can create another
volume and transfer the data over for that volume, then destroy it?

There are special things you can do with import where you can roll back
to a certain txg on the import if you know the damage is recent.
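For reference, the txg rollback mentioned above is exposed through zpool import's recovery mode; a rough sketch follows (pool name "tank" as in this thread, and note that -F only discards the last few transaction groups, so it can only help with very recent damage):

    # Dry run: report whether a recovery-mode import would succeed and
    # roughly how much recent data it would discard.
    zpool import -F -n tank

    # Import read-only while rewinding, so the result can be inspected
    # before committing to it.
    zpool import -o readonly=on -F tank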
On 2012-Nov-19 11:02:06 -0500, Ray Arachelian <ray@arachelian.com> wrote:
>Is the pool importing properly at least?  Maybe you can create another
>volume and transfer the data over for that volume, then destroy it?

The pool is imported and passes all tests except "zfs diff".  Creating
another pool _is_ an option but I'm not sure how to transfer the data
across - using "zfs send | zfs recv" replicates the corruption and
"tar -c | tar -x" loses all the snapshots.

>There are special things you can do with import where you can roll back
>to a certain txg on the import if you know the damage is recent.

The damage exists in the oldest snapshot for that filesystem.

-- 
Peter Jeremy
On Mon, Nov 19, 2012 at 9:03 AM, Peter Jeremy <peter@rulingia.com> wrote:
> On 2012-Nov-19 11:02:06 -0500, Ray Arachelian <ray@arachelian.com> wrote:
>>Is the pool importing properly at least?  Maybe you can create another
>>volume and transfer the data over for that volume, then destroy it?
>
> The pool is imported and passes all tests except "zfs diff".  Creating
> another pool _is_ an option but I'm not sure how to transfer the data
> across - using "zfs send | zfs recv" replicates the corruption and
> "tar -c | tar -x" loses all the snapshots.

Create new pool.
Create new filesystem.
rsync data from /path/to/filesystem/.zfs/snapshot/snapname/ to new filesystem.
Snapshot new filesystem.
rsync data from /path/to/filesystem/.zfs/snapshot/snapname+1/ to new filesystem.
Snapshot new filesystem.

See if zfs diff works.

If it does, repeat the rsync/snapshot steps for the rest of the snapshots.

-- 
Freddie Cash
fjwcash@gmail.com
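A minimal sketch of that procedure for a single filesystem, assuming both datasets are mounted at their default paths; the names tank/beckett/home and newpool/home are examples only:

    #!/bin/sh
    # Rebuild one filesystem on a new pool, snapshot by snapshot.
    src=tank/beckett/home      # existing filesystem (example name)
    dst=newpool/home           # rebuilt copy (example name)

    zfs create "$dst"

    # Walk the source snapshots in creation order and replay each one.
    zfs list -H -t snapshot -o name -s creation -d 1 "$src" |
    while read snap; do
            name=${snap#*@}
            rsync -aH --delete "/$src/.zfs/snapshot/$name/" "/$dst/"
            zfs snapshot "$dst@$name"
    done

rsync -H preserves hard links within each snapshot's tree, which matters here, but (as noted later in the thread) rsync does not carry NFSv4-style ACLs.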
On 11/19/2012 12:03 PM, Peter Jeremy wrote:
> On 2012-Nov-19 11:02:06 -0500, Ray Arachelian <ray@arachelian.com> wrote:
>
> The damage exists in the oldest snapshot for that filesystem.

Are you able to delete that snapshot?
On 2012-Nov-19 09:10:36 -0800, Freddie Cash <fjwcash@gmail.com> wrote:
>Create new pool.
>Create new filesystem.
>rsync data from /path/to/filesystem/.zfs/snapshot/snapname/ to new filesystem.
>Snapshot new filesystem.
>rsync data from /path/to/filesystem/.zfs/snapshot/snapname+1/ to new filesystem.
>Snapshot new filesystem.
>
>See if zfs diff works.
>
>If it does, repeat the rsync/snapshot steps for the rest of the snapshots.

Yep - that's the fallback solution.  With 1874 snapshots spread over 54
filesystems (including a couple of clones), that's a major undertaking.
(And it loses timestamp information).

-- 
Peter Jeremy
On 2012-Nov-19 13:47:01 -0500, Ray Arachelian <ray@arachelian.com> wrote:
>On 11/19/2012 12:03 PM, Peter Jeremy wrote:
>> The damage exists in the oldest snapshot for that filesystem.
>Are you able to delete that snapshot?

Yes, but it makes no difference - the corrupt object still exists in
the current pool, so deleting an old snapshot does not remove it.  What
I was hoping was that someone would have a suggestion on removing the
corruption in-place - using zdb, zhack or similar.

-- 
Peter Jeremy
On 11/16/12 17:15, Peter Jeremy wrote:
> I have been tracking down a problem with "zfs diff" that reveals
> itself variously as a hang (unkillable process), panic or error,
> depending on the ZFS kernel version but seems to be caused by
> corruption within the pool.  I am using FreeBSD but the issue looks to
> be generic ZFS, rather than FreeBSD-specific.
>
> The hang and panic are related to the rw_enter() in
> opensolaris/uts/common/fs/zfs/zap.c:zap_get_leaf_byblk()

There is probably nothing wrong with the snapshots.  This is a bug in
ZFS diff.  The ZPL parent pointer is only guaranteed to be correct for
directory objects.  What you probably have is a file that was hard
linked multiple times and the parent pointer (i.e. directory) was
recycled and is now a file.

> The error is:
>   Unable to determine path or stats for object 2128453 in tank/beckett/home@20120518: Invalid argument
>
> [...]
>
> Does anyone have any suggestions for fixing the corruption?  One
> suggestion was "tar c | tar x" but that is a last resort (since there
> are 54 filesystems and ~1900 snapshots in the pool).
On 2012-11-19 20:28, Peter Jeremy wrote:
> Yep - that's the fallback solution.  With 1874 snapshots spread over 54
> filesystems (including a couple of clones), that's a major undertaking.
> (And it loses timestamp information).

Well, as long as you have and know the base snapshots for the clones,
you can recreate them at the same branching point on the new copy too.

Remember to use something like "rsync -cavPHK --delete-after --inplace
src/ dst/" to do the copy, so that files removed from the source
snapshot are removed on the target, changes are detected by file
checksum verification (not only size and timestamp), and changes take
place within the target's copy of the file (not as rsync's default
copy-and-rewrite) in order for the retained snapshot history to remain
sensible and space-saving.

Also, while you are at it, you can use different settings on the new
pool, based on your achieved knowledge of your data - perhaps using
better compression (IMHO stale old data that became mostly read-only
is a good candidate for gzip-9), setting proper block sizes for files
of databases and disk images, maybe setting better checksums, and if
your RAM vastness and data similarity permit - perhaps employing dedup
(run "zdb -S" on the source pool to simulate dedup and see if you get
any better than 3x savings - then it may become worthwhile).

But, yes, this will take quite a while to effectively walk your pool
several thousand times, if you do the plain rsync from each snapdir.
Perhaps, if "zfs diff" does perform reasonably for you, you can feed
its output as the list of objects to replicate in rsync's input and
save many cycles this way.

Good luck,
//Jim Klimov
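As a rough illustration of the tuning and dedup-simulation suggestions above (dataset names and property values are examples, not recommendations for this particular pool):

    # Simulated dedup statistics for the existing pool; the summary line
    # at the end of the histogram reports the estimated dedup ratio.
    zdb -S tank

    # Possible settings for rebuilt datasets holding mostly read-only
    # data, and for a database, respectively.
    zfs create -o compression=gzip-9 -o checksum=sha256 newpool/archive
    zfs create -o recordsize=16k newpool/db   # match the DB page size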
On 2012-11-19 20:58, Mark Shellenbaum wrote:
> There is probably nothing wrong with the snapshots.  This is a bug in
> ZFS diff.  The ZPL parent pointer is only guaranteed to be correct for
> directory objects.  What you probably have is a file that was hard
> linked multiple times and the parent pointer (i.e. directory) was
> recycled and is now a file

Interesting... do the ZPL files in ZFS keep pointers to parents?

How in the COW transactiveness could the parent directory be removed,
and not the pointer to it from the files inside it?  Is this possible
in current ZFS, or could this be a leftover in the pool from its
history with older releases?

Thanks,
//Jim
Oh, and one more thing: rsync is only good if your filesystems don't
really rely on ZFS/NFSv4-style ACLs.  If you need those, you are stuck
with Solaris tar or Solaris cpio to carry the files over, or you have
to script up replication of ACLs after rsync somehow.

You should also replicate the "local" zfs attributes of your datasets,
"zfs allow" permissions, ACLs on ".zfs/shares/*" (if any, for CIFS) -
at least of their currently relevant live copies, which is also not
fatally difficult scripting (I don't know if it is possible to fetch
the older attribute values from snapshots - which were in force at
that past moment of time; if somebody knows anything on this - plz
write).

On another note, to speed up the rsyncs, you can try to save on the
encryption (if you do this within a trusted LAN) - use rsh, or ssh with
"arcfour" or "none" enc. algos, or perhaps rsync over NFS as if you are
in the local filesystem.

HTH,
//Jim
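A sketch of capturing the per-dataset state mentioned above before a rebuild - locally set (and received) properties plus delegated permissions; the output file names are arbitrary examples:

    # Locally-set and received properties for every dataset, for later
    # re-application with "zfs set" on the new pool.
    zfs get -r -H -s local,received all tank > /tmp/tank-props.txt

    # Delegated "zfs allow" permissions, per dataset.
    zfs list -r -H -o name tank | while read ds; do
            echo "== $ds"
            zfs allow "$ds"
    done > /tmp/tank-allow.txt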
On 19 November, 2012 - Jim Klimov sent me these 1,1K bytes:

> Oh, and one more thing: rsync is only good if your filesystems don't
> really rely on ZFS/NFSv4-style ACLs.  If you need those, you are stuck
> with Solaris tar or Solaris cpio to carry the files over, or you have
> to script up replication of ACLs after rsync somehow.

Ugly hack that seems to do the trick for us is to first rsync, then:

#!/usr/local/bin/perl -w
# aclcopy.pl - copy ACLs from files under /export to the already
# rsync'ed copies of those files under /newdir/export.
for my $oldfile (@ARGV) {
    my $newfile = $oldfile;
    $newfile =~ s{/export}{/newdir/export};
    next if -l $oldfile;                 # skip symlinks

    # "ls -ladV" prints the ACL, one entry per line, below the usual
    # long-listing line.
    open(F, "-|", "/bin/ls", "-ladV", "--", $oldfile);
    my @a = <F>;
    close(F);

    my $crap = shift @a;                 # drop the long-listing line
    chomp(@a);
    for (@a) { $_ =~ s/ //g; }           # strip whitespace from each ACL entry
    my $acl = join(",", @a);             # build a "chmod A=" spec

    system("/bin/chmod", "A=" . $acl, $newfile);
}

/bin/find /export -acl -print0 | xargs -0 /blah/aclcopy.pl

/Tomas

-- 
Tomas Forsman, stric@acc.umu.se, acc.umu.se/~stric
|- Student at Computing Science, University of Umeå
`- Sysadmin at {cs,acc}.umu.se
On 11/19/12 1:14 PM, Jim Klimov wrote:
> On 2012-11-19 20:58, Mark Shellenbaum wrote:
>> There is probably nothing wrong with the snapshots.  This is a bug in
>> ZFS diff.  The ZPL parent pointer is only guaranteed to be correct for
>> directory objects.  What you probably have is a file that was hard
>> linked multiple times and the parent pointer (i.e. directory) was
>> recycled and is now a file
>
> Interesting... do the ZPL files in ZFS keep pointers to parents?

The parent pointer for hard linked files is always set to the last link
to be created.

$ mkdir dir.1
$ mkdir dir.2
$ touch dir.1/a
$ ln dir.1/a dir.2/a.linked
$ rm -rf dir.2

Now the parent pointer for "a" will reference a removed directory.

The parent pointer is a single 64 bit quantity that can't track all the
possible parents a hard linked file could have.

Now when the original dir.2 object number is recycled you could have a
situation where the parent pointer for "a" points to a non-directory.

The ZPL never uses the parent pointer internally.  It is only used by
zfs diff and other utility code to translate object numbers to full
pathnames.  The ZPL has always set the parent pointer, but it is more
for debugging purposes.

> How in the COW transactiveness could the parent directory be
> removed, and not the pointer to it from the files inside it?
> Is this possible in current ZFS, or could this be a leftover
> in the pool from its history with older releases?
On 2012-11-19 22:38, Mark Shellenbaum wrote:
> The parent pointer is a single 64 bit quantity that can't track all the
> possible parents a hard linked file could have.

I believe it is the inode number of the parent, or similar to that -
and an available inode number can get recycled and used by newer
objects?

> Now when the original dir.2 object number is recycled you could have a
> situation where the parent pointer for "a" points to a non-directory.
>
> The ZPL never uses the parent pointer internally.  It is only used by
> zfs diff and other utility code to translate object numbers to full
> pathnames.  The ZPL has always set the parent pointer, but it is more
> for debugging purposes.

Thanks, very interesting!  Now that this value is used and somewhat
exposed to users, isn't it time to replace it with some nvlist or a
different object type that would hold all such parent pointers for
hardlinked files (perhaps moving from a single integer to an nvlist
when there is more than one link from a directory to a file inode)?

At least, it would make zfs diff more consistent and reliable, though
at a cost of some complexity... inodes do already track their reference
counts.  If we keep track of one referrer explicitly, why not track
them all?

Thanks for info,
//Jim
On 2012-Nov-19 21:10:56 +0100, Jim Klimov <jimklimov@cos.ru> wrote:
>On 2012-11-19 20:28, Peter Jeremy wrote:
>> Yep - that's the fallback solution.  With 1874 snapshots spread over 54
>> filesystems (including a couple of clones), that's a major undertaking.
>> (And it loses timestamp information).
>
>Well, as long as you have and know the base snapshots for the clones,
>you can recreate them at the same branching point on the new copy too.

Yes, it's just painful.

>Also, while you are at it, you can use different settings on the new
>pool, based on your achieved knowledge of your data

This pool has a rebuild in its future anyway so I have this planned.

>- perhaps using better compression (IMHO stale old data that became
>mostly read-only is a good candidate for gzip-9), setting proper block
>sizes for files of databases and disk images, maybe setting better
>checksums, and if your RAM vastness and data similarity permit -
>perhaps employing dedup

After reading the horror stories and reading up on how dedupe works,
this is definitely not on the list.

>(run "zdb -S" on the source pool to simulate dedup and see if you get
>any better than 3x savings - then it may become worthwhile).

Not without lots more RAM - and that would mean a whole new box.

>Perhaps, if "zfs diff" does perform reasonably for you, you can feed
>its output as the list of objects to replicate in rsync's input and
>save many cycles this way.

The starting point of this saga was that "zfs diff" failed, so that
isn't an option.

On 2012-Nov-19 21:24:19 +0100, Jim Klimov <jimklimov@cos.ru> wrote:
>fatally difficult scripting (I don't know if it is possible to fetch
>the older attribute values from snapshots - which were in force at
>that past moment of time; if somebody knows anything on this - plz
>write).

The best way to identify past attributes is probably to parse "zfs
history", though that won't help for "received" attributes.

-- 
Peter Jeremy
On 2012-Nov-19 14:38:30 -0700, Mark Shellenbaum <Mark.Shellenbaum@oracle.com> wrote:
>On 11/19/12 1:14 PM, Jim Klimov wrote:
>> On 2012-11-19 20:58, Mark Shellenbaum wrote:
>>> There is probably nothing wrong with the snapshots.  This is a bug in
>>> ZFS diff.  The ZPL parent pointer is only guaranteed to be correct for
>>> directory objects.  What you probably have is a file that was hard
>>> linked multiple times and the parent pointer (i.e. directory) was
>>> recycled and is now a file

Ah.  Thank you for that.  I knew about the parent pointer; I wasn't
aware that ZFS didn't manage it "correctly".

>The parent pointer for hard linked files is always set to the last link
>to be created.
>
>$ mkdir dir.1
>$ mkdir dir.2
>$ touch dir.1/a
>$ ln dir.1/a dir.2/a.linked
>$ rm -rf dir.2
>
>Now the parent pointer for "a" will reference a removed directory.

I've done some experimenting and confirmed this behaviour.  I gather
zdb bypasses the ARC, because the change of parent pointer after the
ln(1) only becomes visible after a sync.

>The ZPL never uses the parent pointer internally.  It is only used by
>zfs diff and other utility code to translate object numbers to full
>pathnames.  The ZPL has always set the parent pointer, but it is more
>for debugging purposes.

I didn't realise that.  I agree that the above scenario can't be
tracked with a single parent pointer, but I assumed that ZFS reset the
parent to "unknown" rather than leaving it as a pointer to a random,
no-longer-valid object.

This probably needs to be documented as a caveat on "zfs diff" -
especially since it can cause hangs and panics with older kernel code.

-- 
Peter Jeremy
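For anyone who wants to repeat that verification, a rough sketch (the dataset tank/test and its mountpoint are examples; on ZFS the inode number reported by ls -i is the object number that zdb expects):

    # Reproduce the stale parent pointer on a scratch dataset.
    cd /tank/test
    mkdir dir.1 dir.2
    ls -id dir.2              # note dir.2's object number
    touch dir.1/a
    ln dir.1/a dir.2/a.linked
    rm -rf dir.2
    sync                      # the updated pointer only reaches disk after a sync

    # Inspect the file's dnode; its "parent" field still holds the
    # object number of the removed dir.2.
    ls -i dir.1/a
    zdb -vvv tank/test <object number from ls -i>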