thr3ads.net - zfs discuss - [zfs-discuss] tracking an error back to a file [Jul 2006]


Gregory Shaw

2006-Jul-20 13:39 UTC

[zfs-discuss] tracking an error back to a file

Hi.  I'm in the process of writing an introductory paper on ZFS.
The paper is meant to be something that could be given to a systems  
admin at a site to introduce ZFS and document common procedures for  
using ZFS.

In the paper, I want to document the method for identifying which  
file has a checksum error.  In previous discussions on this alias,  
I've used the following method:

zpool status -v
  pool: local
 state: ONLINE
status: One or more devices has experienced an error resulting in data
        corruption.  Applications may be affected.
action: Restore the file in question if possible.  Otherwise restore the
        entire pool from backup.
   see: http://www.sun.com/msg/ZFS-8000-8A
 scrub: scrub completed with 4 errors on Wed Jul 12 20:38:03 2006
config:

         NAME        STATE     READ WRITE CKSUM
         local       ONLINE       0     0     8
           c0d0s7    ONLINE       0     0     4
           c1d0s2    ONLINE       0     0     4

errors: The following persistent errors have been detected:

           DATASET      OBJECT  RANGE
           local/music  31018   6291456-6422528
           local/music  37932   1572864-1703936
           local/music  12895   4063232-4194304
           local/music  7782    3145728-3276800

  zdb -vvv local/music 31018
Dataset local/music [ZPL], ID 21, cr_txg 286098, last_txg 569229,
266G, 47341 objects, rootbp [L0 DMU objset] 400L/200P
DVA[0]=<1:1e60334600:200> DVA[1]=<0:1f34545e00:200>
DVA[2]=<1:209bb8a00:200> fletcher4 lzjb LE contiguous birth=569229
fill=47341 cksum=bfbec0b7e:4cabe29d1ca:f8ffe68a911f:22341ff0761b57

     Object  lvl   iblk   dblk  lsize  asize  type
      31018    2    16K   128K  7.50M  7.51M  ZFS plain file
                                  264  bonus  ZFS znode
         path    /Mos Def/Black on Both Sides/03 Love.mp3
         atime   Tue Jul  4 01:26:27 2006
         mtime   Sat Apr 15 20:17:19 2006
         ctime   Tue Jul  4 01:26:27 2006
         crtime  Tue Jul  4 01:26:26 2006
         gen     328624
         mode    100755
         size    7762952
         parent  26652
         links   1
         xattr   0
         rdev    0x0000000000000000

The above is a real error that I've encountered on a snv_41 machine
that I use to store a backup of my music collection.  It's an x86
(32-bit) machine that has either bad disks or a bad controller.
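
As a rough illustration of the two steps above (a sketch, not a supported interface), the Python below parses the persistent-errors table in the format shown in this message and builds the matching zdb command for each row. The names PoolError, parse_persistent_errors and zdb_command are made up for this example, and the table layout is assumed to be exactly what this snv_41 machine printed; other builds may format it differently.

```python
import re
from typing import List, NamedTuple


class PoolError(NamedTuple):
    """One row of the persistent-errors table (hypothetical helper type)."""
    dataset: str
    obj: int
    start: int
    end: int


# Matches rows of the form: DATASET  OBJECT  START-END
# The header row does not match because OBJECT and RANGE are not numeric.
ROW = re.compile(r"^\s*(\S+)\s+(\d+)\s+(\d+)-(\d+)\s*$")


def parse_persistent_errors(status_text: str) -> List[PoolError]:
    """Collect the error rows that follow the 'errors:' line."""
    errors = []
    in_errors = False
    for line in status_text.splitlines():
        if line.startswith("errors:"):
            in_errors = True
            continue
        if in_errors:
            m = ROW.match(line)
            if m:
                errors.append(PoolError(m.group(1), int(m.group(2)),
                                        int(m.group(3)), int(m.group(4))))
    return errors


def zdb_command(err: PoolError) -> List[str]:
    """Build the argument list for the zdb invocation used in this thread."""
    return ["zdb", "-vvv", err.dataset, str(err.obj)]


# Sample input copied from the zpool status -v output above.
SAMPLE = """\
errors: The following persistent errors have been detected:

           DATASET      OBJECT  RANGE
           local/music  31018   6291456-6422528
           local/music  37932   1572864-1703936
"""

for err in parse_persistent_errors(SAMPLE):
    print(" ".join(zdb_command(err)))
```

Each RANGE in the table spans 131072 bytes (for example 6422528 - 6291456), which lines up with the 128K dblk value zdb reports for object 31018, so each row appears to describe a single data block.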

My question:  Is the above an interface that should be documented as  
the method for identifying what file has an error?  Or is there some  
other interface that is either better documented or better supported?

I don't want to put unstable interfaces in the document if I can
avoid it.

Thanks!

-----
Gregory Shaw, IT Architect
Phone: (303) 673-8273        Fax: (303) 673-8273
ITCTO Group, Sun Microsystems Inc.
1 StorageTek Drive MS 4382              greg.shaw at sun.com (work)
Louisville, CO 80028-4382                 shaw at fmsoft.com (home)
"When Microsoft writes an application for Linux, I've Won." - Linus Torvalds



Eric Schrock

2006-Jul-20 15:29 UTC


The zdb interface is certainly unstable.  We plan on automatically doing
this at a future date (bugid not handy), but it's a little tricky for
live filesystems.  If your filesystem is undergoing a lot of churn, you
may notice that zdb(1M) will blow up with an I/O error or assertion
failure somewhere, because it's not in sync with the kernel's version.
Eventually, we will have a method of doing this at the ZPL layer, so
that we can correctly get this information for mounted filesystems.

So feel free to demonstrate this (it's the only usable workaround at the
moment), with the caveat that:

	- zdb(1M) is unstable and can change at any point
	- it may not work on a live pool
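
Given those two caveats, anything that scripts this lookup probably has to treat a zdb failure as normal. Below is a minimal Python sketch under that assumption: it retries a few times and returns None instead of raising. The helper names (extract_path, run_zdb, lookup_path), the retry count and the timeout are all invented for illustration, and the "path" line it looks for is simply what zdb printed in the output quoted in this thread, which could change at any point.

```python
import subprocess
from typing import Callable, List, Optional


def extract_path(zdb_output: str) -> Optional[str]:
    """Return the value of the 'path' line in a zdb object dump, if any."""
    for line in zdb_output.splitlines():
        parts = line.strip().split(None, 1)
        if len(parts) == 2 and parts[0] == "path":
            return parts[1]
    return None


def run_zdb(args: List[str]) -> subprocess.CompletedProcess:
    """Run zdb and capture its output as text (timeout is an arbitrary guess)."""
    return subprocess.run(args, capture_output=True, text=True, timeout=300)


def lookup_path(dataset: str, obj: int, attempts: int = 3,
                runner: Callable = run_zdb) -> Optional[str]:
    """Try to map (dataset, object) to a path; give up quietly on failure.

    zdb may hit an I/O error or assertion failure on a live pool, so a
    non-zero exit status or missing 'path' line just triggers another try.
    """
    cmd = ["zdb", "-vvv", dataset, str(obj)]
    for _ in range(attempts):
        try:
            result = runner(cmd)
        except (OSError, subprocess.SubprocessError):
            continue  # zdb missing, timed out, or could not be started
        if result.returncode == 0:
            path = extract_path(result.stdout)
            if path is not None:
                return path
    return None
```

A call such as lookup_path("local/music", 31018) would then yield either the path string or None. The runner parameter is there only so the logic can be exercised without a real pool.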

We've also thought about how to repair such damage.  Plain file contents
are pretty easy, but metadata can be tricky, because we don't know the
extent of blocks that it references.  So if we just delete it, we'll
leak blocks now and forever.

- Eric


--
Eric Schrock, Solaris Kernel Development       http://blogs.sun.com/eschrock
