The zdb interface is certainly unstable. We plan to do this mapping
automatically at a future date (bugid not handy), but it's a little
tricky for live filesystems. If your filesystem is undergoing a lot of
churn, you may notice that zdb(1M) will blow up with an I/O error or
assertion failure somewhere, because its view of the pool is not in
sync with the kernel's.
Eventually, we will have a method of doing this at the ZPL layer, so
that we can correctly get this information for mounted filesystems.
So feel free to demonstrate this (it's the only usable workaround at the
moment), with the caveat that:
- zdb(1M) is unstable and can change at any point
- it may not work on a live pool
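
As a rough sketch of that workaround (not a supported interface), the
shell function below reads `zpool status -v` output and prints one zdb
command per damaged object. It only prints the commands and never runs
zdb itself. The function name list_zdb_commands is made up for
illustration, and the parsing assumes the DATASET/OBJECT/RANGE table
format quoted below, which may change along with everything else here.

```shell
#!/bin/sh
# Hypothetical helper: read `zpool status -v` output on stdin and print
# one "zdb -vvv <dataset> <object>" command per damaged object.
# Assumes the DATASET / OBJECT / RANGE error table shown in this thread.
list_zdb_commands() {
    awk '
        # The error table starts right after its header line.
        $1 == "DATASET" && $2 == "OBJECT" { in_table = 1; next }
        # A blank line ends the table.
        in_table && NF == 0 { in_table = 0 }
        # Rows are: dataset, object number, byte range.
        # Print each dataset/object pair only once.
        in_table && NF == 3 && $2 ~ /^[0-9]+$/ && !seen[$1, $2]++ {
            printf "zdb -vvv %s %s\n", $1, $2
        }
    '
}
```

Something like `zpool status -v local | list_zdb_commands` would then
give you the commands to review and run by hand, ideally while the pool
is quiet.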
We've also thought about how to repair such damage. Plain file contents
are pretty easy, but metadata can be tricky, because we don't know the
extent of blocks that it references. So if we just delete it, we'll
leak blocks now and forever.
- Eric
On Thu, Jul 20, 2006 at 07:39:08AM -0600, Gregory Shaw wrote:
> Hi. I'm in the process of writing an introductory paper on ZFS.
> The paper is meant to be something that could be given to a systems
> admin at a site to introduce ZFS and document common procedures for
> using ZFS.
>
> In the paper, I want to document the method for identifying which
> file has a checksum error. In previous discussions on this alias,
> I've used the following method:
>
> zpool status -v
>   pool: local
>  state: ONLINE
> status: One or more devices has experienced an error resulting in data
>         corruption. Applications may be affected.
> action: Restore the file in question if possible. Otherwise restore the
>         entire pool from backup.
>    see: http://www.sun.com/msg/ZFS-8000-8A
>  scrub: scrub completed with 4 errors on Wed Jul 12 20:38:03 2006
> config:
>
>         NAME        STATE     READ WRITE CKSUM
>         local       ONLINE       0     0     8
>           c0d0s7    ONLINE       0     0     4
>           c1d0s2    ONLINE       0     0     4
>
> errors: The following persistent errors have been detected:
>
>         DATASET      OBJECT  RANGE
>         local/music  31018   6291456-6422528
>         local/music  37932   1572864-1703936
>         local/music  12895   4063232-4194304
>         local/music  7782    3145728-3276800
>
> zdb -vvv local/music 31018
> Dataset local/music [ZPL], ID 21, cr_txg 286098, last_txg 569229,
> 266G, 47341 objects, rootbp [L0 DMU objset] 400L/200P
> DVA[0]=<1:1e60334600:200> DVA[1]=<0:1f34545e00:200>
> DVA[2]=<1:209bb8a00:200> fletcher4 lzjb LE contiguous birth=569229
> fill=47341 cksum=bfbec0b7e:4cabe29d1ca:f8ffe68a911f:22341ff0761b57
>
>     Object  lvl   iblk   dblk  lsize  asize  type
>      31018    2    16K   128K  7.50M  7.51M  ZFS plain file
>                                  264  bonus  ZFS znode
>         path    /Mos Def/Black on Both Sides/03 Love.mp3
>         atime   Tue Jul 4 01:26:27 2006
>         mtime   Sat Apr 15 20:17:19 2006
>         ctime   Tue Jul 4 01:26:27 2006
>         crtime  Tue Jul 4 01:26:26 2006
>         gen     328624
>         mode    100755
>         size    7762952
>         parent  26652
>         links   1
>         xattr   0
>         rdev    0x0000000000000000
>
> The above is a real error that I've encountered on a snv_41 machine
> that I use to store a backup of my music collection. It's an x86
> (32-bit) machine that has either bad disks or a bad controller.
>
> My question: Is the above an interface that should be documented as
> the method for identifying what file has an error? Or is there some
> other interface that is either better documented or better supported?
>
> I don't want to put unstable interfaces in the document if I can
> avoid it.
>
> Thanks!
>
> -----
> Gregory Shaw, IT Architect
> Phone: (303) 673-8273 Fax: (303) 673-8273
> ITCTO Group, Sun Microsystems Inc.
> 1 StorageTek Drive MS 4382 greg.shaw at sun.com (work)
> Louisville, CO 80028-4382 shaw at fmsoft.com (home)
> "When Microsoft writes an application for Linux, I've Won."
>   - Linus Torvalds
>
>
> _______________________________________________
> zfs-discuss mailing list
> zfs-discuss at opensolaris.org
> http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
--
Eric Schrock, Solaris Kernel Development http://blogs.sun.com/eschrock