thr3ads.net - zfs discuss - [zfs-discuss] tracking error to file [May 2006]

Gregory Shaw

2006-May-19 19:23 UTC

[zfs-discuss] tracking error to file

In my testing, I've found the following error:

zpool status -v
   pool: local
state: ONLINE
status: One or more devices has experienced an error resulting in data
         corruption.  Applications may be affected.
action: Restore the file in question if possible.  Otherwise restore the
         entire pool from backup.
    see: http://www.sun.com/msg/ZFS-8000-8A
scrub: none requested
config:

         NAME        STATE     READ WRITE CKSUM
         local       ONLINE       0     0     0
           c0d1p0    ONLINE       0     0     0
           c2d0p1    ONLINE       0     0     0
           c3d0p1    ONLINE       0     0     0
           c0d0s7    ONLINE       0     0     0

errors: The following persistent errors have been detected:

           DATASET  OBJECT  RANGE
           1b       2402    lvl=0 blkid=1965

I haven't found a way to report in human terms what the above object
refers to.  Is there such a method?

I can clear the error using existing tools, but I'd like to know what
is broken before I destroy it.

Thanks!

-----
Gregory Shaw, IT Architect
Phone: (303) 673-8273        Fax: (303) 673-8273
ITCTO Group, Sun Microsystems Inc.
1 StorageTek Drive ULVL4-382           greg.shaw at sun.com (work)
Louisville, CO 80028-4382                 shaw at fmsoft.com (home)
"When Microsoft writes an application for Linux, I've Won." - Linus Torvalds

Matthew Ahrens

2006-May-22 06:25 UTC

[zfs-discuss] tracking error to file

On Fri, May 19, 2006 at 01:23:02PM -0600, Gregory Shaw wrote:
>           DATASET  OBJECT  RANGE
>           1b       2402    lvl=0 blkid=1965
> 
> I haven't found a way to report in human terms what the above object
> refers to.  Is there such a method?

There isn't any great method currently, but you can use 'zdb' to find
this information.  The quickest way would be to first determine the name
of dataset 0x1b (=27):

	# zdb local | grep "ID 27,"
	Dataset local/ahrens [ZPL], ID 27, ...

Then get info on that particular object in that filesystem:

	# zdb -vvv <dataset_name> 2402
	...
	    Object  lvl   iblk   dblk  lsize  asize  type
	      2402    1    16K  3.50K  3.50K  2.50K  ZFS plain file
					 264  bonus  ZFS znode
		path    /raidz/usr/src/uts/common/fs/zfs/dmu.c
	...

The "path" listed is relative to the filesystem's mountpoint.

--matt
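
The hexadecimal-to-decimal step above can be sketched in Python. This is only an illustration: the function name dataset_grep_pattern is made up here, and it assumes the DATASET column of 'zpool status -v' is hexadecimal, as the "0x1b (=27)" conversion in the reply implies. It only builds the pattern to hand to grep; it does not run zdb.

```python
def dataset_grep_pattern(dataset_hex: str) -> str:
    """Build the grep pattern for a DATASET value copied from `zpool status -v`.

    Assumption: the DATASET column is hexadecimal, while the zdb
    "Dataset ... ID n," lines show the ID in decimal.
    """
    dataset_id = int(dataset_hex, 16)  # e.g. "1b" -> 27
    return f"ID {dataset_id},"


# Pattern for the error row quoted in this message.
print(dataset_grep_pattern("1b"))
```

The trailing comma in the pattern matters: without it, "ID 27" would also match IDs such as 270 or 2711.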

Gregory Shaw

2006-May-22 15:21 UTC

[zfs-discuss] tracking error to file

Thanks!  I will do the below.

I brought it up on the alias, as I thought the problem would be
encountered by a user eventually.  They'll want the same information
-- What does the error impact?

On May 22, 2006, at 12:25 AM, Matthew Ahrens wrote:
> On Fri, May 19, 2006 at 01:23:02PM -0600, Gregory Shaw wrote:
>>           DATASET  OBJECT  RANGE
>>           1b       2402    lvl=0 blkid=1965
>>
>> I haven't found a way to report in human terms what the above object
>> refers to.  Is there such a method?
>
> There isn't any great method currently, but you can use 'zdb' to find
> this information.  The quickest way would be to first determine the
> name of dataset 0x1b (=27):
>
> 	# zdb local | grep "ID 27,"
> 	Dataset local/ahrens [ZPL], ID 27, ...
>
> Then get info on that particular object in that filesystem:
>
> 	# zdb -vvv <dataset_name> 2402
> 	...
> 	    Object  lvl   iblk   dblk  lsize  asize  type
> 	      2402    1    16K  3.50K  3.50K  2.50K  ZFS plain file
> 					 264  bonus  ZFS znode
> 		path    /raidz/usr/src/uts/common/fs/zfs/dmu.c
> 	...
>
> The "path" listed is relative to the filesystem's mountpoint.
>
> --matt
> _______________________________________________
> zfs-discuss mailing list
> zfs-discuss at opensolaris.org
> http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
-----
Gregory Shaw, IT Architect
Phone: (303) 673-8273        Fax: (303) 673-8273
ITCTO Group, Sun Microsystems Inc.
1 StorageTek Drive ULVL4-382           greg.shaw at sun.com (work)
Louisville, CO 80028-4382                 shaw at fmsoft.com (home)
"When Microsoft writes an application for Linux, I've Won." - Linus Torvalds

Wout Mertens

2006-May-23 09:49 UTC

[zfs-discuss] tracking error to file

Can that same method be used to figure out what files changed between  
snapshots?

Wout.

On 22 May 2006, at 08:25, Matthew Ahrens wrote:
> On Fri, May 19, 2006 at 01:23:02PM -0600, Gregory Shaw wrote:
>>           DATASET  OBJECT  RANGE
>>           1b       2402    lvl=0 blkid=1965
>>
>> I haven't found a way to report in human terms what the above object
>> refers to.  Is there such a method?
>
> There isn't any great method currently, but you can use 'zdb' to find
> this information.  The quickest way would be to first determine the
> name of dataset 0x1b (=27):
>
> 	# zdb local | grep "ID 27,"
> 	Dataset local/ahrens [ZPL], ID 27, ...
>
> Then get info on that particular object in that filesystem:
>
> 	# zdb -vvv <dataset_name> 2402
> 	...
> 	    Object  lvl   iblk   dblk  lsize  asize  type
> 	      2402    1    16K  3.50K  3.50K  2.50K  ZFS plain file
> 					 264  bonus  ZFS znode
> 		path    /raidz/usr/src/uts/common/fs/zfs/dmu.c
> 	...
>
> The "path" listed is relative to the filesystem's mountpoint.
>
> --matt
> _______________________________________________
> zfs-discuss mailing list
> zfs-discuss at opensolaris.org
> http://mail.opensolaris.org/mailman/listinfo/zfs-discuss

Matthew Ahrens

2006-May-23 16:44 UTC

[zfs-discuss] tracking error to file

On Tue, May 23, 2006 at 11:49:47AM +0200, Wout Mertens wrote:
> Can that same method be used to figure out what files changed between
> snapshots?

To figure out what files changed, we need to (a) figure out what object
numbers changed, and (b) do the object number to file name translation.

The method I described (using zdb) will not be involved in either step.
zdb is an undocumented interface, and using it for this purpose is only
a workaround.  However, the same algorithms implemented in zdb will be
used to do step (b), the object number to file name translation.

--matt

Russell Blaine

2006-Sep-27 22:32 UTC

[zfs-discuss] Re: tracking error to file

The zdb object -> path trick doesn't give me a path name:


errors: The following persistent errors have been detected:

          DATASET  OBJECT  RANGE
          13       a51b    lvl=0 blkid=9

bash-3.00# zdb mypool | grep "ID 19,"
Dataset mypool/rab [ZPL], ID 19, cr_txg 6, last_txg 4391649, 80.3G, 41883 objects

bash-3.00# zdb -vvv mypool/rab a51b
Dataset mypool/rab [ZPL], ID 19, cr_txg 6, last_txg 4391649, 80.3G, 41883
objects, rootbp [L0 DMU objset] 400L/200P DVA[0]=<1:4408daa00:200>
DVA[1]=<0:8d7323200:200> DVA[2]=<1:6a1c4ee00:200> fletcher4 lzjb LE
contiguous birth=4391649 fill=41883
cksum=b79e8d8b0:469ba0a4696:e05ec517a391:1ea5669d90270d

    ZIL header: claim_txg 0, seq 0

        first block: [L0 ZIL intent log] 20000L/20000P
DVA[0]=<1:31c560000:20000> zilog uncompressed LE contiguous birth=4030488
fill=0 cksum=7e20922ee4d68bf1:e4a75d71f8cd7cb5:13:1

        Block seqno 1, won't claim


    Object  lvl   iblk   dblk  lsize  asize  type
         0    6    16K    16K  22.1M  15.2M  DMU dnode

Should I be concerned? If the corruption isn't in my data, and ZFS
metadata is self-consistent at all times, does the corruption matter?

bash-3.00# uname -a
SunOS xxxx 5.11 onnv-gate:2006-09-26 i86pc i386 i86pc
 
 
This message posted from opensolaris.org

Matthew Ahrens

2006-Sep-27 22:55 UTC

[zfs-discuss] Re: tracking error to file

Russell Blaine wrote:
> The zdb object -> path trick doesn't give me a path name:
> 
> 
> errors: The following persistent errors have been detected:
> 
>           DATASET  OBJECT  RANGE
>           13       a51b    lvl=0 blkid=9
>
> bash-3.00# zdb -vvv mypool/rab a51b

Try 0xa51b.

--matt
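
A small sketch of that fix in Python, assuming (as the reply indicates) that the OBJECT column is hexadecimal and that zdb reads an un-prefixed argument as decimal. The helper name zdb_object_command is hypothetical; it only formats the command line and does not invoke zdb.

```python
import shlex


def zdb_object_command(dataset_name: str, object_hex: str) -> str:
    """Format a `zdb -vvv` command for an OBJECT value from `zpool status -v`.

    Assumption: OBJECT is hexadecimal, so it is passed to zdb with an
    explicit 0x prefix. A value that already has the prefix is accepted too.
    """
    object_arg = hex(int(object_hex, 16))  # "a51b" -> "0xa51b"
    return f"zdb -vvv {shlex.quote(dataset_name)} {object_arg}"


# Command for the error row quoted above (dataset 0x13 = ID 19 = mypool/rab).
print(zdb_object_command("mypool/rab", "a51b"))
```

Passing the bare string a51b is not a valid decimal number, which would explain why no matching object, and so no path, was printed.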

Russell Blaine

2006-Sep-28 01:45 UTC

[zfs-discuss] Re: Re: tracking error to file

That was it. Thanks, Matt.
 
 
This message posted from opensolaris.org

Davin Milun

2007-Feb-19 05:19 UTC

[zfs-discuss] Re: tracking error to file

I have one that looks like this:
  pool: preplica-1
 state: ONLINE
status: One or more devices has experienced an error resulting in data
        corruption.  Applications may be affected.
action: Restore the file in question if possible.  Otherwise restore the
        entire pool from backup.
   see: http://www.sun.com/msg/ZFS-8000-8A
 scrub: none requested
config:

        NAME        STATE     READ WRITE CKSUM
        preplica-1  ONLINE       2     0     2
          c2t0d0    ONLINE       0     0     0
          c2t1d0    ONLINE       0     0     0
          c2t2d0    ONLINE       2     0     2
          c2t3d0    ONLINE       0     0     0

errors: The following persistent errors have been detected:

          DATASET  OBJECT  RANGE
          36       3a2939  lvl=0 blkid=0

% uname -a
SunOS preplica01 5.10 Generic_118833-17 sun4u sparc SUNW,Sun-Fire-V210

% zpool list
NAME                    SIZE    USED   AVAIL    CAP  HEALTH     ALTROOT
preplica-1            9.06T   8.78T    291G    96%  ONLINE     -


This is a replicated filesystem, that is kept up to date with zfs send/recv, and
is never even mounted locally.  Originally the error was in a regular inode.  So
I did the find -inum thing, and found the filename.  I cp'ed the file
and deleted the old copy on the original filesystem, and did some incremental
zfs send|recv's to propagate the fix here.   And I expected the problem
to go away.

But instead it started looking like that above.

I tried the trick with zdb listed here, but 
  zdb preplica-1 | grep "ID 36,"
is taking forever to complete.  But none of the filesystems listed near the
front of the output have ID 36.

So I tried the zdb -vvv of 0x3a2939 on each of the filesystems that I have - and
none of them was ID 36!  Not even the one that the bad inode had originally been
reported in.

Any suggestions?

I know that it's a relatively old version of Solaris 10, with a fairly
old patchset.

Should I be concerned about this error?  I do know what caused it (a bad disk in
the underlying hardware raid5 storage - yes... I know... I know... :-)  - which
was removed). So I'm not concerned about ongoing corruption from this
specific problem.   I just want to know what file is impacted by it.

Thanks!
Davin.
 
 
This message posted from opensolaris.org
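
One thing that may be worth checking here, on the assumption taken from the earlier replies in this thread that the DATASET column is hexadecimal: 36 would then be dataset ID 54 in decimal, so the pattern to look for would be "ID 54," and not "ID 36,". The sketch below scans "Dataset ..." lines in the format shown earlier in the thread for a hexadecimal dataset value. The function name find_dataset and the regular expression are illustrative only, and the sample line is copied from the earlier message, not from this pool.

```python
import re

# Matches lines such as:
#   Dataset mypool/rab [ZPL], ID 19, cr_txg 6, ...
DATASET_LINE = re.compile(r"^Dataset (?P<name>\S+) \[[^\]]*\], ID (?P<id>\d+),")


def find_dataset(zdb_lines, dataset_hex):
    """Return the dataset name whose decimal ID equals the hex DATASET value."""
    wanted = int(dataset_hex, 16)  # assumption: DATASET column is hexadecimal
    for line in zdb_lines:
        match = DATASET_LINE.match(line.strip())
        if match and int(match.group("id")) == wanted:
            return match.group("name")
    return None


sample = [
    "Dataset mypool/rab [ZPL], ID 19, cr_txg 6, last_txg 4391649, 80.3G, 41883 objects",
]
print(find_dataset(sample, "13"))  # hex 13 is decimal 19
print(int("36", 16))               # decimal ID to search for in this case
```

Feeding it the saved output of zdb on the pool, one line at a time, avoids re-running the slow zdb pass for every pattern tried.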

eric kustarz

2007-Feb-20 17:27 UTC

[zfs-discuss] Re: tracking error to file

On Feb 18, 2007, at 9:19 PM, Davin Milun wrote:
> I have one that looks like this:
>   pool: preplica-1
>  state: ONLINE
> status: One or more devices has experienced an error resulting in data
>         corruption.  Applications may be affected.
> action: Restore the file in question if possible.  Otherwise restore the
>         entire pool from backup.
>    see: http://www.sun.com/msg/ZFS-8000-8A
>  scrub: none requested
> config:
>
>         NAME        STATE     READ WRITE CKSUM
>         preplica-1  ONLINE       2     0     2
>           c2t0d0    ONLINE       0     0     0
>           c2t1d0    ONLINE       0     0     0
>           c2t2d0    ONLINE       2     0     2
>           c2t3d0    ONLINE       0     0     0
>
> errors: The following persistent errors have been detected:
>
>           DATASET  OBJECT  RANGE
>           36       3a2939  lvl=0 blkid=0
>
> % uname -a
> SunOS preplica01 5.10 Generic_118833-17 sun4u sparc SUNW,Sun-Fire-V210
>
> % zpool list
> NAME                    SIZE    USED   AVAIL    CAP  HEALTH     ALTROOT
> preplica-1            9.06T   8.78T    291G    96%  ONLINE     -
>
>
> This is a replicated filesystem, that is kept up to date with zfs  
> send/recv, and is never even mounted locally.  Originally the error  
> was in a regular inode.  So I did the find -inum thing, and found  
> the filename.  I cp'ed the file and deleted the old copy on the
> original filesystem, and did some incremental zfs send|recv's to
> propagate the fix here.   And I expected the problem to go away.

If you run a 'zpool scrub preplica-1', then the persistent error log
will be cleaned up.  In the future, we'll have a background scrubber
to make your life easier.

eric

Wade.Stuart at fallon.com

2007-Feb-20 18:43 UTC

[zfs-discuss] Re: tracking error to file

>
> If you run a 'zpool scrub preplica-1', then the persistent error log
> will be cleaned up.  In the future, we'll have a background scrubber
> to make your life easier.
>
> eric

Eric,

      Great news!  Are there any details about how this will be implemented
yet?  I am most curious to how tunable it will be as far as system
resources (CPU/IO etc).

-Wade

eric kustarz

2007-Feb-20 19:54 UTC

[zfs-discuss] Re: tracking error to file

On Feb 20, 2007, at 10:43 AM, Wade.Stuart at fallon.com wrote:
>> If you run a 'zpool scrub preplica-1', then the persistent error log
>> will be cleaned up.  In the future, we'll have a background scrubber
>> to make your life easier.
>>
>> eric
>
> Eric,
>
>       Great news!  Are there any details about how this will be
> implemented yet?  I am most curious to how tunable it will be as far
> as system resources (CPU/IO etc).

No details yet, still working those out along with the infrastructure
to make it happen.

eric
