I have been tracking down a problem with "zfs diff" that reveals
itself variously as a hang (unkillable process), panic or error,
depending on the ZFS kernel version, but seems to be caused by
corruption within the pool.  I am using FreeBSD but the issue looks to
be generic ZFS, rather than FreeBSD-specific.

The hang and panic are related to the rw_enter() in
opensolaris/uts/common/fs/zfs/zap.c:zap_get_leaf_byblk()

The error is:
  Unable to determine path or stats for object 2128453 in tank/beckett/home@20120518: Invalid argument

A scrub reports no issues:

root@FB10-64:~ # zpool status
  pool: tank
 state: ONLINE
status: The pool is formatted using a legacy on-disk format.  The pool can
        still be used, but some features are unavailable.
action: Upgrade the pool using 'zpool upgrade'.  Once this is done, the
        pool will no longer be accessible on software that does not support
        feature flags.
  scan: scrub repaired 0 in 3h24m with 0 errors on Wed Nov 14 01:58:36 2012
config:

        NAME        STATE     READ WRITE CKSUM
        tank        ONLINE       0     0     0
          ada2      ONLINE       0     0     0

errors: No known data errors

But zdb says that object is the child of a plain file - which isn't sane:

root@FB10-64:~ # zdb -vvv tank/beckett/home@20120518 2128453
Dataset tank/beckett/home@20120518 [ZPL], ID 605, cr_txg 8379, 143G, 2026419 objects, rootbp DVA[0]=<0:266a0efa00:200> DVA[1]=<0:31b07fbc00:200> [L0 DMU objset] fletcher4 lzjb LE contiguous unique double size=800L/200P birth=8375L/8375P fill=2026419 cksum=1acdb1fbd9:93bf9c61e94:1b35c72eb8adb:389743898e4f79

    Object  lvl   iblk   dblk  dsize  lsize   %full  type
   2128453    1    16K  1.50K  1.50K  1.50K  100.00  ZFS plain file
                                        264   bonus  ZFS znode
        dnode flags: USED_BYTES USERUSED_ACCOUNTED
        dnode maxblkid: 0
        path    ???<object#2128453>
        uid     1000
        gid     1000
        atime   Fri Mar 23 16:34:52 2012
        mtime   Sat Oct 22 16:13:42 2011
        ctime   Sun Oct 23 21:09:02 2011
        crtime  Sat Oct 22 16:13:42 2011
        gen     2237174
        mode    100444
        size    1089
        parent  2242171
        links   1
        pflags  40800000004
        xattr   0
        rdev    0x0000000000000000

root@FB10-64:~ # zdb -vvv tank/beckett/home@20120518 2242171
Dataset tank/beckett/home@20120518 [ZPL], ID 605, cr_txg 8379, 143G, 2026419 objects, rootbp DVA[0]=<0:266a0efa00:200> DVA[1]=<0:31b07fbc00:200> [L0 DMU objset] fletcher4 lzjb LE contiguous unique double size=800L/200P birth=8375L/8375P fill=2026419 cksum=1acdb1fbd9:93bf9c61e94:1b35c72eb8adb:389743898e4f79

    Object  lvl   iblk   dblk  dsize  lsize   %full  type
   2242171    3    16K   128K  25.4M  25.5M  100.00  ZFS plain file
                                        264   bonus  ZFS znode
        dnode flags: USED_BYTES USERUSED_ACCOUNTED
        dnode maxblkid: 203
        path    /jashank/Pictures/sch/pdm-a4-11/stereo-pair-2.png
        uid     1000
        gid     1000
        atime   Fri Mar 23 16:41:53 2012
        mtime   Mon Oct 24 21:15:56 2011
        ctime   Mon Oct 24 21:15:56 2011
        crtime  Mon Oct 24 21:15:37 2011
        gen     2286679
        mode    100644
        size    26625731
        parent  7001490
        links   1
        pflags  40800000004
        xattr   0
        rdev    0x0000000000000000

root@FB10-64:~ # zdb -vvv tank/beckett/home@20120518 7001490
Dataset tank/beckett/home@20120518 [ZPL], ID 605, cr_txg 8379, 143G, 2026419 objects, rootbp DVA[0]=<0:266a0efa00:200> DVA[1]=<0:31b07fbc00:200> [L0 DMU objset] fletcher4 lzjb LE contiguous unique double size=800L/200P birth=8375L/8375P fill=2026419 cksum=1acdb1fbd9:93bf9c61e94:1b35c72eb8adb:389743898e4f79

    Object  lvl   iblk   dblk  dsize  lsize   %full  type
   7001490    1    16K    512     1K    512  100.00  ZFS directory
                                        264   bonus  ZFS znode
        dnode flags: USED_BYTES USERUSED_ACCOUNTED
        dnode maxblkid: 0
        path    /jashank/Pictures/sch/pdm-a4-11
        uid     1000
        gid     1000
        atime   Thu May 17 03:38:32 2012
        mtime   Mon Oct 24 21:15:37 2011
        ctime   Mon Oct 24 21:15:37 2011
        crtime  Fri Oct 14 22:17:44 2011
        gen     2088407
        mode    40755
        size    6
        parent  6370559
        links   2
        pflags  40800000144
        xattr   0
        rdev    0x0000000000000000
        microzap: 512 bytes, 4 entries

                stereo-pair-2.png = 2242171 (type: Regular File)
                stereo-pair-2.xcf = 7002074 (type: Regular File)
                stereo-pair-1.xcf = 7001512 (type: Regular File)
                stereo-pair-1.png = 2241802 (type: Regular File)

root@FB10-64:~ #

The above experiments were carried out on a partial copy of the pool.
The main pool started quite a long while ago and has been upgraded and
moved several times using send/recv (which happily and quietly
replicates the corruption).  Note that I have never (intentionally)
used extended attributes within the pool, but it has been exported to
Windows XP via Samba and possibly to OS X via NFSv3.

Does anyone have any suggestions for fixing the corruption?  One
suggestion was "tar c | tar x" but that is a last resort (since there
are 54 filesystems and ~1900 snapshots in the pool).

-- 
Peter Jeremy
On 11/16/2012 07:15 PM, Peter Jeremy wrote:
> I have been tracking down a problem with "zfs diff" that reveals
> itself variously as a hang (unkillable process), panic or error,
> depending on the ZFS kernel version but seems to be caused by
> corruption within the pool.  I am using FreeBSD but the issue looks to
> be generic ZFS, rather than FreeBSD-specific.
>
> The hang and panic are related to the rw_enter() in
> opensolaris/uts/common/fs/zfs/zap.c:zap_get_leaf_byblk()
>
> The error is:
>   Unable to determine path or stats for object 2128453 in tank/beckett/home@20120518: Invalid argument

Is the pool importing properly at least?  Maybe you can create another
volume and transfer the data over for that volume, then destroy it?

There are special things you can do with import where you can roll back
to a certain txg on the import if you know the damage is recent.
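For reference, the txg rollback mentioned above is exposed through zpool import's recovery mode; a rough sketch follows (pool name "tank" as in this thread, and note that -F only discards the last few transaction groups, so it can only help with very recent damage):

    # Dry run: report whether a recovery-mode import would succeed and
    # roughly how much recent data it would discard.
    zpool import -F -n tank

    # Import read-only while rewinding, so the result can be inspected
    # before committing to it.
    zpool import -o readonly=on -F tank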
On 2012-Nov-19 11:02:06 -0500, Ray Arachelian <ray@arachelian.com> wrote:
>Is the pool importing properly at least?  Maybe you can create another
>volume and transfer the data over for that volume, then destroy it?

The pool is imported and passes all tests except "zfs diff".  Creating
another pool _is_ an option but I'm not sure how to transfer the data
across - using "zfs send | zfs recv" replicates the corruption and
"tar -c | tar -x" loses all the snapshots.

>There are special things you can do with import where you can roll back
>to a certain txg on the import if you know the damage is recent.

The damage exists in the oldest snapshot for that filesystem.

-- 
Peter Jeremy
On Mon, Nov 19, 2012 at 9:03 AM, Peter Jeremy <peter@rulingia.com> wrote:
> On 2012-Nov-19 11:02:06 -0500, Ray Arachelian <ray@arachelian.com> wrote:
>>Is the pool importing properly at least?  Maybe you can create another
>>volume and transfer the data over for that volume, then destroy it?
>
> The pool is imported and passes all tests except "zfs diff".  Creating
> another pool _is_ an option but I'm not sure how to transfer the data
> across - using "zfs send | zfs recv" replicates the corruption and
> "tar -c | tar -x" loses all the snapshots.

Create new pool.
Create new filesystem.
rsync data from /path/to/filesystem/.zfs/snapshot/snapname/ to new filesystem.
Snapshot new filesystem.
rsync data from /path/to/filesystem/.zfs/snapshot/snapname+1/ to new filesystem.
Snapshot new filesystem.

See if zfs diff works.

If it does, repeat the rsync/snapshot steps for the rest of the snapshots.

-- 
Freddie Cash
fjwcash@gmail.com
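A minimal sketch of that procedure for a single filesystem, assuming both datasets are mounted at their default paths; the names tank/beckett/home and newpool/home are examples only:

    #!/bin/sh
    # Rebuild one filesystem on a new pool, snapshot by snapshot.
    src=tank/beckett/home      # existing filesystem (example name)
    dst=newpool/home           # rebuilt copy (example name)

    zfs create "$dst"

    # Walk the source snapshots in creation order and replay each one.
    zfs list -H -t snapshot -o name -s creation -d 1 "$src" |
    while read snap; do
            name=${snap#*@}
            rsync -aH --delete "/$src/.zfs/snapshot/$name/" "/$dst/"
            zfs snapshot "$dst@$name"
    done

rsync -H preserves hard links within each snapshot's tree, which matters here, but (as noted later in the thread) rsync does not carry NFSv4-style ACLs.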
On 11/19/2012 12:03 PM, Peter Jeremy wrote:
> On 2012-Nov-19 11:02:06 -0500, Ray Arachelian <ray@arachelian.com> wrote:
>
> The damage exists in the oldest snapshot for that filesystem.

Are you able to delete that snapshot?
On 2012-Nov-19 09:10:36 -0800, Freddie Cash <fjwcash@gmail.com> wrote:
>Create new pool.
>Create new filesystem.
>rsync data from /path/to/filesystem/.zfs/snapshot/snapname/ to new filesystem.
>Snapshot new filesystem.
>rsync data from /path/to/filesystem/.zfs/snapshot/snapname+1/ to new filesystem.
>Snapshot new filesystem.
>
>See if zfs diff works.
>
>If it does, repeat the rsync/snapshot steps for the rest of the snapshots.

Yep - that's the fallback solution.  With 1874 snapshots spread over 54
filesystems (including a couple of clones), that's a major undertaking.
(And it loses timestamp information).

-- 
Peter Jeremy
On 2012-Nov-19 13:47:01 -0500, Ray Arachelian <ray@arachelian.com> wrote:
>On 11/19/2012 12:03 PM, Peter Jeremy wrote:
>> The damage exists in the oldest snapshot for that filesystem.
>Are you able to delete that snapshot?

Yes, but it makes no difference - the corrupt object still exists in
the current pool, so deleting an old snapshot does not remove it.  What
I was hoping was that someone would have a suggestion on removing the
corruption in-place - using zdb, zhack or similar.

-- 
Peter Jeremy
On 11/16/12 17:15, Peter Jeremy wrote:
> I have been tracking down a problem with "zfs diff" that reveals
> itself variously as a hang (unkillable process), panic or error,
> depending on the ZFS kernel version but seems to be caused by
> corruption within the pool.  I am using FreeBSD but the issue looks to
> be generic ZFS, rather than FreeBSD-specific.
>
> The hang and panic are related to the rw_enter() in
> opensolaris/uts/common/fs/zfs/zap.c:zap_get_leaf_byblk()

There is probably nothing wrong with the snapshots.  This is a bug in
ZFS diff.  The ZPL parent pointer is only guaranteed to be correct for
directory objects.  What you probably have is a file that was hard
linked multiple times and the parent pointer (i.e. directory) was
recycled and is now a file.

> The error is:
>   Unable to determine path or stats for object 2128453 in tank/beckett/home@20120518: Invalid argument
>
> [...]
>
> Does anyone have any suggestions for fixing the corruption?  One
> suggestion was "tar c | tar x" but that is a last resort (since there
> are 54 filesystems and ~1900 snapshots in the pool).
On 2012-11-19 20:28, Peter Jeremy wrote:
> Yep - that's the fallback solution.  With 1874 snapshots spread over 54
> filesystems (including a couple of clones), that's a major undertaking.
> (And it loses timestamp information).

Well, as long as you have and know the base snapshots for the clones,
you can recreate them at the same branching point on the new copy too.

Remember to use something like "rsync -cavPHK --delete-after --inplace
src/ dst/" to do the copy, so that files removed from the source
snapshot are removed on the target, changes are detected by file
checksum verification (not only size and timestamp), and changes take
place within the target's copy of the file (not as rsync's default
copy-and-rewrite) in order for the retained snapshot history to remain
sensible and space-saving.

Also, while you are at it, you can use different settings on the new
pool, based on your achieved knowledge of your data - perhaps using
better compression (IMHO stale old data that became mostly read-only
is a good candidate for gzip-9), setting proper block sizes for files
of databases and disk images, maybe setting better checksums, and if
your RAM vastness and data similarity permit - perhaps employing dedup
(run "zdb -S" on the source pool to simulate dedup and see if you get
any better than 3x savings - then it may become worthwhile).

But, yes, this will take quite a while to effectively walk your pool
several thousand times, if you do the plain rsync from each snapdir.
Perhaps, if "zfs diff" does perform reasonably for you, you can feed
its output as the list of objects to replicate in rsync's input and
save many cycles this way.

Good luck,
//Jim Klimov
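As a rough illustration of the tuning and dedup-simulation suggestions above (dataset names and property values are examples, not recommendations for this particular pool):

    # Simulated dedup statistics for the existing pool; the summary line
    # at the end of the histogram reports the estimated dedup ratio.
    zdb -S tank

    # Possible settings for rebuilt datasets holding mostly read-only
    # data, and for a database, respectively.
    zfs create -o compression=gzip-9 -o checksum=sha256 newpool/archive
    zfs create -o recordsize=16k newpool/db   # match the DB page size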
On 2012-11-19 20:58, Mark Shellenbaum wrote:
> There is probably nothing wrong with the snapshots.  This is a bug in
> ZFS diff.  The ZPL parent pointer is only guaranteed to be correct for
> directory objects.  What you probably have is a file that was hard
> linked multiple times and the parent pointer (i.e. directory) was
> recycled and is now a file

Interesting... do the ZPL files in ZFS keep pointers to parents?

How in the COW transactiveness could the parent directory be removed,
and not the pointer to it from the files inside it?  Is this possible
in current ZFS, or could this be a leftover in the pool from its
history with older releases?

Thanks,
//Jim
Oh, and one more thing: rsync is only good if your filesystems don't
really rely on ZFS/NFSv4-style ACLs.  If you need those, you are stuck
with Solaris tar or Solaris cpio to carry the files over, or you have
to script up replication of ACLs after rsync somehow.

You should also replicate the "local" zfs attributes of your datasets,
"zfs allow" permissions, ACLs on ".zfs/shares/*" (if any, for CIFS) -
at least of their currently relevant live copies, which is also not
fatally difficult scripting (I don't know if it is possible to fetch
the older attribute values from snapshots - which were in force at
that past moment of time; if somebody knows anything on this - plz
write).

On another note, to speed up the rsyncs, you can try to save on the
encryption (if you do this within a trusted LAN) - use rsh, or ssh with
"arcfour" or "none" enc. algos, or perhaps rsync over NFS as if you are
in the local filesystem.

HTH,
//Jim
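A sketch of capturing the per-dataset state mentioned above before a rebuild - locally set (and received) properties plus delegated permissions; the output file names are arbitrary examples:

    # Locally-set and received properties for every dataset, for later
    # re-application with "zfs set" on the new pool.
    zfs get -r -H -s local,received all tank > /tmp/tank-props.txt

    # Delegated "zfs allow" permissions, per dataset.
    zfs list -r -H -o name tank | while read ds; do
            echo "== $ds"
            zfs allow "$ds"
    done > /tmp/tank-allow.txt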
On 19 November, 2012 - Jim Klimov sent me these 1,1K bytes:

> Oh, and one more thing: rsync is only good if your filesystems don't
> really rely on ZFS/NFSv4-style ACLs.  If you need those, you are stuck
> with Solaris tar or Solaris cpio to carry the files over, or you have
> to script up replication of ACLs after rsync somehow.

Ugly hack that seems to do the trick for us is to first rsync, then:

#!/usr/local/bin/perl -w
# aclcopy.pl - copy ACLs from files under /export to the already
# rsync'ed copies of those files under /newdir/export.
for my $oldfile (@ARGV) {
    my $newfile = $oldfile;
    $newfile =~ s{/export}{/newdir/export};
    next if -l $oldfile;                 # skip symlinks

    # "ls -ladV" prints the ACL, one entry per line, below the usual
    # long-listing line.
    open(F, "-|", "/bin/ls", "-ladV", "--", $oldfile);
    my @a = <F>;
    close(F);

    my $crap = shift @a;                 # drop the long-listing line
    chomp(@a);
    for (@a) { $_ =~ s/ //g; }           # strip whitespace from each ACL entry
    my $acl = join(",", @a);             # build a "chmod A=" spec

    system("/bin/chmod", "A=" . $acl, $newfile);
}

/bin/find /export -acl -print0 | xargs -0 /blah/aclcopy.pl

/Tomas

-- 
Tomas Forsman, stric@acc.umu.se, acc.umu.se/~stric
|- Student at Computing Science, University of Umeå
`- Sysadmin at {cs,acc}.umu.se
On 11/19/12 1:14 PM, Jim Klimov wrote:
> On 2012-11-19 20:58, Mark Shellenbaum wrote:
>> There is probably nothing wrong with the snapshots.  This is a bug in
>> ZFS diff.  The ZPL parent pointer is only guaranteed to be correct for
>> directory objects.  What you probably have is a file that was hard
>> linked multiple times and the parent pointer (i.e. directory) was
>> recycled and is now a file
>
> Interesting... do the ZPL files in ZFS keep pointers to parents?

The parent pointer for hard linked files is always set to the last link
to be created.

$ mkdir dir.1
$ mkdir dir.2
$ touch dir.1/a
$ ln dir.1/a dir.2/a.linked
$ rm -rf dir.2

Now the parent pointer for "a" will reference a removed directory.

The parent pointer is a single 64 bit quantity that can't track all the
possible parents a hard linked file could have.

Now when the original dir.2 object number is recycled you could have a
situation where the parent pointer for "a" points to a non-directory.

The ZPL never uses the parent pointer internally.  It is only used by
zfs diff and other utility code to translate object numbers to full
pathnames.  The ZPL has always set the parent pointer, but it is more
for debugging purposes.

> How in the COW transactiveness could the parent directory be
> removed, and not the pointer to it from the files inside it?
> Is this possible in current ZFS, or could this be a leftover
> in the pool from its history with older releases?
On 2012-11-19 22:38, Mark Shellenbaum wrote:
> The parent pointer is a single 64 bit quantity that can't track all the
> possible parents a hard linked file could have.

I believe it is the inode number of the parent, or similar to that -
and an available inode number can get recycled and used by newer
objects?

> Now when the original dir.2 object number is recycled you could have a
> situation where the parent pointer for "a" points to a non-directory.
>
> The ZPL never uses the parent pointer internally.  It is only used by
> zfs diff and other utility code to translate object numbers to full
> pathnames.  The ZPL has always set the parent pointer, but it is more
> for debugging purposes.

Thanks, very interesting!  Now that this value is used and somewhat
exposed to users, isn't it time to replace it with some nvlist or a
different object type that would hold all such parent pointers for
hardlinked files (perhaps moving from a single integer to an nvlist
when there is more than one link from a directory to a file inode)?

At least, it would make zfs diff more consistent and reliable, though
at a cost of some complexity... inodes do already track their reference
counts.  If we keep track of one referrer explicitly, why not track
them all?

Thanks for info,
//Jim
On 2012-Nov-19 21:10:56 +0100, Jim Klimov <jimklimov@cos.ru> wrote:
>On 2012-11-19 20:28, Peter Jeremy wrote:
>> Yep - that's the fallback solution.  With 1874 snapshots spread over 54
>> filesystems (including a couple of clones), that's a major undertaking.
>> (And it loses timestamp information).
>
>Well, as long as you have and know the base snapshots for the clones,
>you can recreate them at the same branching point on the new copy too.

Yes, it's just painful.

>Also, while you are at it, you can use different settings on the new
>pool, based on your achieved knowledge of your data

This pool has a rebuild in its future anyway so I have this planned.

>- perhaps using better compression (IMHO stale old data that became
>mostly read-only is a good candidate for gzip-9), setting proper block
>sizes for files of databases and disk images, maybe setting better
>checksums, and if your RAM vastness and data similarity permit -
>perhaps employing dedup

After reading the horror stories and reading up on how dedupe works,
this is definitely not on the list.

>(run "zdb -S" on the source pool to simulate dedup and see if you get
>any better than 3x savings - then it may become worthwhile).

Not without lots more RAM - and that would mean a whole new box.

>Perhaps, if "zfs diff" does perform reasonably for you, you can feed
>its output as the list of objects to replicate in rsync's input and
>save many cycles this way.

The starting point of this saga was that "zfs diff" failed, so that
isn't an option.

On 2012-Nov-19 21:24:19 +0100, Jim Klimov <jimklimov@cos.ru> wrote:
>fatally difficult scripting (I don't know if it is possible to fetch
>the older attribute values from snapshots - which were in force at
>that past moment of time; if somebody knows anything on this - plz
>write).

The best way to identify past attributes is probably to parse "zfs
history", though that won't help for "received" attributes.

-- 
Peter Jeremy
On 2012-Nov-19 14:38:30 -0700, Mark Shellenbaum <Mark.Shellenbaum@oracle.com> wrote:
>On 11/19/12 1:14 PM, Jim Klimov wrote:
>> On 2012-11-19 20:58, Mark Shellenbaum wrote:
>>> There is probably nothing wrong with the snapshots.  This is a bug in
>>> ZFS diff.  The ZPL parent pointer is only guaranteed to be correct for
>>> directory objects.  What you probably have is a file that was hard
>>> linked multiple times and the parent pointer (i.e. directory) was
>>> recycled and is now a file

Ah.  Thank you for that.  I knew about the parent pointer; I wasn't
aware that ZFS didn't manage it "correctly".

>The parent pointer for hard linked files is always set to the last link
>to be created.
>
>$ mkdir dir.1
>$ mkdir dir.2
>$ touch dir.1/a
>$ ln dir.1/a dir.2/a.linked
>$ rm -rf dir.2
>
>Now the parent pointer for "a" will reference a removed directory.

I've done some experimenting and confirmed this behaviour.  I gather
zdb bypasses the ARC, because the change of parent pointer after the
ln(1) only becomes visible after a sync.

>The ZPL never uses the parent pointer internally.  It is only used by
>zfs diff and other utility code to translate object numbers to full
>pathnames.  The ZPL has always set the parent pointer, but it is more
>for debugging purposes.

I didn't realise that.  I agree that the above scenario can't be
tracked with a single parent pointer, but I assumed that ZFS reset the
parent to "unknown" rather than leaving it as a pointer to a random,
no-longer-valid object.

This probably needs to be documented as a caveat on "zfs diff" -
especially since it can cause hangs and panics with older kernel code.

-- 
Peter Jeremy
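For anyone who wants to repeat that verification, a rough sketch (the dataset tank/test and its mountpoint are examples; on ZFS the inode number reported by ls -i is the object number that zdb expects):

    # Reproduce the stale parent pointer on a scratch dataset.
    cd /tank/test
    mkdir dir.1 dir.2
    ls -id dir.2              # note dir.2's object number
    touch dir.1/a
    ln dir.1/a dir.2/a.linked
    rm -rf dir.2
    sync                      # the updated pointer only reaches disk after a sync

    # Inspect the file's dnode; its "parent" field still holds the
    # object number of the removed dir.2.
    ls -i dir.1/a
    zdb -vvv tank/test <object number from ls -i>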