In my testing, I've found the following error:

zpool status -v
  pool: local
 state: ONLINE
status: One or more devices has experienced an error resulting in data
        corruption.  Applications may be affected.
action: Restore the file in question if possible.  Otherwise restore the
        entire pool from backup.
   see: http://www.sun.com/msg/ZFS-8000-8A
 scrub: none requested
config:

        NAME        STATE     READ WRITE CKSUM
        local       ONLINE       0     0     0
          c0d1p0    ONLINE       0     0     0
          c2d0p1    ONLINE       0     0     0
          c3d0p1    ONLINE       0     0     0
          c0d0s7    ONLINE       0     0     0

errors: The following persistent errors have been detected:

          DATASET  OBJECT  RANGE
          1b       2402    lvl=0 blkid=1965

I haven't found a way to report in human terms what the above object refers to. Is there such a method?

I can clear the error using existing tools, but I'd like to know what is broken before I destroy it.

Thanks!

-----
Gregory Shaw, IT Architect
Phone: (303) 673-8273        Fax: (303) 673-8273
ITCTO Group, Sun Microsystems Inc.
1 StorageTek Drive ULVL4-382           greg.shaw at sun.com (work)
Louisville, CO 80028-4382              shaw at fmsoft.com (home)
"When Microsoft writes an application for Linux, I've Won." - Linus Torvalds
On Fri, May 19, 2006 at 01:23:02PM -0600, Gregory Shaw wrote:
> DATASET  OBJECT  RANGE
> 1b       2402    lvl=0 blkid=1965
>
> I haven't found a way to report in human terms what the above object
> refers to.  Is there such a method?

There isn't any great method currently, but you can use 'zdb' to find this information. The quickest way would be to first determine the name of dataset 0x1b (=27):

# zdb local | grep "ID 27,"
Dataset local/ahrens [ZPL], ID 27, ...

Then get info on that particular object in that filesystem:

# zdb -vvv <dataset_name> 2402
...
    Object  lvl   iblk   dblk  lsize  asize  type
      2402    1    16K  3.50K  3.50K  2.50K  ZFS plain file
                                 264  bonus  ZFS znode
        path    /raidz/usr/src/uts/common/fs/zfs/dmu.c
...

The "path" listed is relative to the filesystem's mountpoint.

--matt
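The hex-to-decimal step in the reply above (0x1b = 27) can be scripted. What follows is only a minimal sketch, assuming a POSIX shell; the helper names hex_to_dec and dataset_pattern are hypothetical and are not part of zdb or zpool.

```shell
# Hypothetical helper: convert a hex ID, as printed in the DATASET column
# of `zpool status -v`, to decimal.
hex_to_dec() {
    # $1 is a hex string without a 0x prefix, e.g. "1b"
    echo "$((0x$1))"
}

# Hypothetical helper: build the pattern used to search zdb output
# for a dataset ID, matching the 'ID 27,' form shown in the reply.
dataset_pattern() {
    echo "ID $(hex_to_dec "$1"),"
}

dataset_pattern 1b    # prints: ID 27,
```

Against a real pool, the pattern would be used as in the reply, for example: zdb local | grep "$(dataset_pattern 1b)". That command is not run in the sketch because it needs a live pool.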
Thanks! I will do the below.

I brought it up on the alias, as I thought the problem would be encountered by a user eventually. They'll want the same information -- What does the error impact?

On May 22, 2006, at 12:25 AM, Matthew Ahrens wrote:
> There isn't any great method currently, but you can use 'zdb' to find
> this information.
> [...]
Can that same method be used to figure out what files changed between snapshots?

Wout.

On 22 May 2006, at 08:25, Matthew Ahrens wrote:
> There isn't any great method currently, but you can use 'zdb' to find
> this information.
> [...]
On Tue, May 23, 2006 at 11:49:47AM +0200, Wout Mertens wrote:
> Can that same method be used to figure out what files changed between
> snapshots?

To figure out what files changed, we need to (a) figure out what object numbers changed, and (b) do the object number to file name translation.

The method I described (using zdb) will not be involved in either step. zdb is an undocumented interface, and using it for this purpose is only a workaround. However, the same algorithms implemented in zdb will be used to do step (b), the object number to file name translation.

--matt
The zdb object -> path trick doesn't give me a path name:

errors: The following persistent errors have been detected:

          DATASET  OBJECT  RANGE
          13       a51b    lvl=0 blkid=9

bash-3.00# zdb mypool | grep "ID 19,"
Dataset mypool/rab [ZPL], ID 19, cr_txg 6, last_txg 4391649, 80.3G, 41883 objects
bash-3.00# zdb -vvv mypool/rab a51b
Dataset mypool/rab [ZPL], ID 19, cr_txg 6, last_txg 4391649, 80.3G, 41883 objects, rootbp [L0 DMU objset] 400L/200P DVA[0]=<1:4408daa00:200> DVA[1]=<0:8d7323200:200> DVA[2]=<1:6a1c4ee00:200> fletcher4 lzjb LE contiguous birth=4391649 fill=41883 cksum=b79e8d8b0:469ba0a4696:e05ec517a391:1ea5669d90270d

    ZIL header: claim_txg 0, seq 0

        first block: [L0 ZIL intent log] 20000L/20000P DVA[0]=<1:31c560000:20000> zilog uncompressed LE contiguous birth=4030488 fill=0 cksum=7e20922ee4d68bf1:e4a75d71f8cd7cb5:13:1

        Block seqno 1, won't claim

    Object  lvl   iblk   dblk  lsize  asize  type
         0    6    16K    16K  22.1M  15.2M  DMU dnode

Should I be concerned? If the corruption isn't in my data, and ZFS metadata is self-consistent at all times, does the corruption matter?

bash-3.00# uname -a
SunOS xxxx 5.11 onnv-gate:2006-09-26 i86pc i386 i86pc

This message posted from opensolaris.org
Russell Blaine wrote:
> The zdb object -> path trick doesn't give me a path name:
>
> errors: The following persistent errors have been detected:
>
>           DATASET  OBJECT  RANGE
>           13       a51b    lvl=0 blkid=9
>
> bash-3.00# zdb -vvv mypool/rab a51b

Try 0xa51b.

--matt
That was it. Thanks, Matt.
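The fix in this exchange amounts to passing zdb the object number with a 0x prefix, since the OBJECT column appears to be printed in hex. A minimal sketch, assuming a POSIX shell: the helper name zdb_object_cmd is hypothetical, and it only prints the command instead of running it.

```shell
# Hypothetical helper: print (not run) the zdb command for an object,
# adding the 0x prefix to the hex object number from `zpool status -v`.
zdb_object_cmd() {
    dataset=$1
    obj=${2#0x}          # strip a leading 0x if the caller already added one
    echo "zdb -vvv $dataset 0x$obj"
}

zdb_object_cmd mypool/rab a51b    # prints: zdb -vvv mypool/rab 0xa51b
```

Printing the command first makes it easy to check the dataset name and object number before running zdb against a live pool.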
I have one that looks like this:

  pool: preplica-1
 state: ONLINE
status: One or more devices has experienced an error resulting in data
        corruption.  Applications may be affected.
action: Restore the file in question if possible.  Otherwise restore the
        entire pool from backup.
   see: http://www.sun.com/msg/ZFS-8000-8A
 scrub: none requested
config:

        NAME          STATE     READ WRITE CKSUM
        preplica-1    ONLINE       2     0     2
          c2t0d0      ONLINE       0     0     0
          c2t1d0      ONLINE       0     0     0
          c2t2d0      ONLINE       2     0     2
          c2t3d0      ONLINE       0     0     0

errors: The following persistent errors have been detected:

          DATASET  OBJECT  RANGE
          36       3a2939  lvl=0 blkid=0

% uname -a
SunOS preplica01 5.10 Generic_118833-17 sun4u sparc SUNW,Sun-Fire-V210

% zpool list
NAME          SIZE   USED  AVAIL  CAP  HEALTH  ALTROOT
preplica-1   9.06T  8.78T   291G  96%  ONLINE  -

This is a replicated filesystem that is kept up to date with zfs send/recv, and is never even mounted locally. Originally the error was in a regular inode. So I did the find -inum thing, and found the filename. I cp'ed the file and deleted the old copy on the original filesystem, and did some incremental zfs send|recv's to propagate the fix here. And I expected the problem to go away. But instead it started looking like the output above.

I tried the trick with zdb listed here, but zdb preplica-1 | grep "ID 36," is taking forever to complete, and none of the filesystems listed near the front of the output have ID 36. So I tried the zdb -vvv of 0x3a2939 on each of the filesystems that I have, and none of them was ID 36! Not even the one that the bad inode had originally been reported in.

Any suggestions?

I know that it's a relatively old version of Solaris 10, with a fairly old patchset.

Should I be concerned about this error? I do know what caused it (a bad disk in the underlying hardware raid5 storage - yes... I know... I know... :-) - which was removed). So I'm not concerned about ongoing corruption from this specific problem. I just want to know what file is impacted by it.

Thanks!

Davin.
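The "find -inum thing" mentioned above needs the object number in decimal. A minimal sketch, assuming a POSIX shell and assuming, as the post implies, that the object number in the error report is the file's inode number. The helper name find_by_object_cmd and the path /path/to/mountpoint are hypothetical, and the command is only printed, not run.

```shell
# Hypothetical helper: print (not run) a find command that looks up a file
# by inode number, given the hex object number from `zpool status -v`.
# Assumption: the ZFS object number is the file's inode number.
find_by_object_cmd() {
    mountpoint=$1
    obj=${2#0x}                       # accept "3a2939" or "0x3a2939"
    echo "find $mountpoint -inum $((0x$obj))"
}

find_by_object_cmd /path/to/mountpoint 3a2939
# prints: find /path/to/mountpoint -inum 3811641
```

If the DATASET column follows the same hex convention as in the earlier messages of this thread, dataset 36 would correspond to decimal 54 when searching zdb output.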
On Feb 18, 2007, at 9:19 PM, Davin Milun wrote:
> [...]
> This is a replicated filesystem, that is kept up to date with zfs
> send/recv, and is never even mounted locally.  Originally the error
> was in a regular inode.  So I did the find -inum thing, and found
> the filename.  I cp'ed the file and deleted the old copy on the
> original filesystem, and did some incremental zfs send|recv's to
> propagate the fix here.  And I expected the problem to go away.

If you run a 'zpool scrub preplica-1', then the persistent error log will be cleaned up. In the future, we'll have a background scrubber to make your life easier.

eric
> If you run a 'zpool scrub preplica-1', then the persistent error log
> will be cleaned up.  In the future, we'll have a background scrubber
> to make your life easier.
>
> eric

Eric,

Great news! Are there any details about how this will be implemented yet? I am most curious as to how tunable it will be as far as system resources (CPU/IO etc).

-Wade
On Feb 20, 2007, at 10:43 AM, Wade.Stuart at fallon.com wrote:
> Eric,
>
> Great news!  Are there any details about how this will be
> implemented yet?  I am most curious as to how tunable it will be
> as far as system resources (CPU/IO etc).

No details yet, still working those out along with the infrastructure to make it happen.

eric