I had a checksum error occur in a file. Since only one file is corrupt (and it's a link library at that) I don't want to blow away the whole pool to remove the corrupt file. However, I can't figure out any way to unlink the file. Using "rm" to try to unlink the file, I get EIO:

% rm llib-lip.ln
rm: llib-lip.ln not removed: I/O error

Trying to truncate it is also no dice:

% cat >llib-lip.ln
llib-lip.ln: I/O error

What are the expected paths for recovery here?

I took a look at:
http://www.sun.com/msg/ZFS-8000-8A

That page isn't helpful since it just says to "restore the file". Well, you can't restore a file if you can't clean up the old corrupted one!

(Also, BTW, that page has a typo. You might want to get it fixed; I didn't know where the doc bugs for those messages should go.)

- Eric
On Wed, 19 Jul 2006, Eric Lowe wrote:

> (Also BTW that page has a typo, you might want to get the typo fixed, I
> didn't know where the doc bugs should go for those messages)
>
> - Eric

Product: event_registry
Category: events
Sub-Category: msg

-tim
>> (Also BTW that page has a typo, you might want to get the typo fixed,
>> I didn't know where the doc bugs should go for those messages)
>>
> Product: event_registry
> Category: events
> Sub-Category: msg

Thanks, I filed 6450642.

- Eric
What does 'zpool status -v' show? This sounds like you have corruption in the dnode (a.k.a. metadata). This corruption is unrepairable at the moment, since we have no way of knowing the extent of the blocks that this dnode may be referencing. You should be able to move this file aside, however.

- Eric

On Wed, Jul 19, 2006 at 01:27:23PM -0500, Eric Lowe wrote:
> What are the expected paths for recovery here?
> _______________________________________________
> zfs-discuss mailing list
> zfs-discuss at opensolaris.org
> http://mail.opensolaris.org/mailman/listinfo/zfs-discuss

--
Eric Schrock, Solaris Kernel Development       http://blogs.sun.com/eschrock
Eric Schrock wrote:
> What does 'zpool status -v' show? This sounds like you have corruption

# zpool status -v
  pool: junk
 state: ONLINE
status: One or more devices has experienced an error resulting in data
        corruption. Applications may be affected.
action: Restore the file in question if possible. Otherwise restore the
        entire pool from backup.
   see: http://www.sun.com/msg/ZFS-8000-8A
 scrub: none requested
config:

        NAME        STATE     READ WRITE CKSUM
        junk        ONLINE       0     0     0
          raidz     ONLINE       0     0     0
            c0d0    ONLINE       0     0     0
            c1d0    ONLINE       0     0     0
            c1d1    ONLINE       0     0     0

errors: The following persistent errors have been detected:

          DATASET  OBJECT  RANGE
          27       4a2e5   lvl=2 blkid=0

> in the dnode (a.k.a. metadata). This corruption is unrepairable at the
> moment, since we have no way of knowing the extent of the blocks that
> this dnode may be referencing. You should be able to move this file
> aside, however.

Trying to move it panic'd my machine.

However, I am running build 36 (big disclaimer). It's time for a BFU. ;)

- Eric
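The persistent-error row above is terse. As a rough sketch, it can be decoded as follows. The OBJECT column is evidently hexadecimal (it contains the digits a and e), and Eric Schrock's reply later in the thread confirms that lvl=2 marks a level 2 indirect block. The helper name, the field names and the reading of blkid as "block index at that level" are assumptions drawn from this one output, not from any documented format.

```python
from dataclasses import dataclass

@dataclass
class PersistentError:
    dataset: str    # dataset ID, kept exactly as printed (not interpreted)
    object_id: int  # object (dnode) number; the OBJECT column is hex
    level: int      # indirection level: 0 = data block, >0 = indirect block
    blkid: int      # assumed: index of the damaged block at that level

def parse_error_row(row: str) -> PersistentError:
    """Hypothetical helper for rows like '27 4a2e5 lvl=2 blkid=0'."""
    dataset, obj, *range_fields = row.split()
    fields = dict(f.split("=", 1) for f in range_fields)
    return PersistentError(dataset, int(obj, 16),
                           int(fields["lvl"]), int(fields["blkid"]))

err = parse_error_row("27       4a2e5   lvl=2 blkid=0")
print(err.object_id)                            # 303845
print("indirect" if err.level > 0 else "data")  # indirect
```

A non-zero level is the detail that matters here: it says the damage is in a block of pointers, not in file data.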
Well, the fact that it's a level 2 indirect block indicates why it can't simply be removed. We don't know what data it refers to, so we can't free the associated blocks. The panic on move is quite interesting; after BFU, give it another shot and file a bug if it still happens.

- Eric

On Thu, Jul 20, 2006 at 02:28:38PM -0500, Eric Lowe wrote:
> Trying to move it panic'd my machine.
>
> However I am running build 36 (big disclaimer). It's time for a BFU. ;)
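To see why an unreadable level 2 indirect block is so much worse than a bad data block, consider how much of the file one such block stands in for. A back-of-the-envelope sketch, using 128-byte block pointers and assuming 16 KB indirect blocks and 128 KB data blocks (both sizes are illustrative assumptions, not values read from this pool):

```python
BLKPTR_SIZE = 128  # bytes per ZFS block pointer

def l0_blocks_covered(level: int, indirect_block_size: int) -> int:
    """Data (L0) blocks reachable through one block at `level`."""
    ptrs_per_indirect = indirect_block_size // BLKPTR_SIZE
    return ptrs_per_indirect ** level

def bytes_covered(level: int, indirect_block_size: int,
                  data_block_size: int) -> int:
    return l0_blocks_covered(level, indirect_block_size) * data_block_size

IND = 16 * 1024    # assumed indirect block size
DATA = 128 * 1024  # assumed data block size
print(l0_blocks_covered(0, IND))             # 1  (an L0 block is just itself)
print(l0_blocks_covered(2, IND))             # 16384
print(bytes_covered(2, IND, DATA) // 2**30)  # 2  (GiB)
```

Losing an L0 block loses one known, fixed-size piece that can be freed on its own. Losing this L2 block loses the pointers to everything beneath it (up to 16384 data blocks under these assumptions), so the space those blocks occupy can no longer be located through the file and freed.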
> Well the fact that it's a level 2 indirect block indicates why it can't
> simply be removed. We don't know what data it refers to, so we can't
> free the associated blocks. The panic on move is quite interesting -
> after BFU give it another shot and file a bug if it still happens.

What's the long term solution for this type of corruption? Will there be a 'fsck'-like utility that can find all valid items and make sure they're connected properly, or is something else possible?

--
Darren Dunham                                          ddunham at taos.com
Senior Technical Consultant         TAOS             http://www.taos.com/
Got some Dr Pepper?                            San Francisco, CA bay area
          < This line left intentionally blank to confuse you. >
On Thu, 20 Jul 2006, Darren Dunham wrote:

> What's the long term solution for this type of corruption? Will there
> be a 'fsck'-like utility that can find all valid items and make sure
> they're connected properly, or is something else possible?

This is deja vu and positively scary. In the bad old days, when we were cursed with hierarchical databases[1], one ran the DB "fsck" equivalent. And sometimes it worked; and sometimes it didn't. And sometimes there were bugs in the hierarchical fsck/repair utility that could turn your minor DB "issue" into a totally trashed DB! :(

We still have hierarchical databases, of course, but today's software technology and practices have made them far less vulnerable to nasty bugs. Try running the test suite for the SleepyCat DB sometime when you have a machine you want to exercise...

ZFS has a very reasonable tree-like data structure, but ... the memory of hierarchical DB fsck-like utilities really scares me... There has to be a better way.

[1] and there were few alternatives.

Al Hopper  Logical Approach Inc, Plano, TX.  al at logical-approach.com
           Voice: 972.379.2133 Fax: 972.379.2134  Timezone: US CDT
OpenSolaris.Org Community Advisory Board (CAB) Member - Apr 2005
             OpenSolaris Governing Board (OGB) Member - Feb 2006
See:
http://www.opensolaris.org/jive/thread.jspa?threadID=11305&tstart=0

Basically, the first step is to identify the file in question so the user knows what's been lost. The second step is a way to move these blocks into purgatory, where they won't take up filesystem namespace but will still account for used space. The final step is to actually delete the blocks and then do a garbage-collection type of operation to find which blocks are no longer referenced.

This is a hugely complicated task, as dealing with snapshots and DMU accounting is going to be horrific, if possible at all. It is, however, on our (rather long) list of things to tackle.

If your corruption is in an L0 data block, you're in a much better situation, because we know the size of the block and can safely free it without worrying about what else it might reference.

- Eric

On Thu, Jul 20, 2006 at 01:20:23PM -0700, Darren Dunham wrote:
> What's the long term solution for this type of corruption? Will there
> be a 'fsck'-like utility that can find all valid items and make sure
> they're connected properly, or is something else possible?
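The garbage-collection step described above is, in spirit, mark-and-sweep: mark every block reachable from the live roots, then treat any allocated block that was never marked as orphaned. A toy sketch, assuming a set of block names stands in for the pool's record of allocated space and a dict stands in for the pointers each block holds (all names are hypothetical, and it deliberately ignores snapshots and DMU accounting, the parts called horrific above):

```python
from typing import Dict, List, Set

def find_orphans(allocated: Set[str],
                 children: Dict[str, List[str]],
                 roots: List[str]) -> Set[str]:
    """Mark-and-sweep: return allocated blocks unreachable from `roots`.

    Assumes every block reachable from the roots is readable; the pointers
    inside an unlinked (possibly corrupt) block are never consulted.
    """
    marked: Set[str] = set()
    stack = list(roots)
    while stack:                  # mark phase (iterative depth-first walk)
        blk = stack.pop()
        if blk in marked:
            continue
        marked.add(blk)
        stack.extend(children.get(blk, []))
    return allocated - marked     # sweep phase

# Toy pool: file "a" is healthy; file "b" had a corrupt L2 indirect block
# and has already been unlinked from "fs" (the "purgatory" step).
allocated = {"fs", "a", "a.0", "b", "b.L2", "b.L1", "b.0", "b.1"}
children = {"fs": ["a"], "a": ["a.0"],
            "b": ["b.L2"], "b.L2": ["b.L1"], "b.L1": ["b.0", "b.1"]}
print(sorted(find_orphans(allocated, children, ["fs"])))
# ['b', 'b.0', 'b.1', 'b.L1', 'b.L2']
```

The point of the sweep is that it never has to read the corrupt indirect block: the orphaned space is found from the allocation side. Snapshots break this toy, since a block a snapshot still references is live even when no current file points at it.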
Note that there are two common reasons to have an fsck-like utility:

1. Detect corruption
2. Repair corruption

For the first, we have scrubbing (and eventually background scrubbing), so it's pointless in the ZFS world. For the latter, the things it repairs are known pathologies endemic to the underlying filesystem. For example, it knows how to reconnect inodes if you were in the middle of adding the corresponding directory entry, how to fix up the global inode table, etc.

For the type of corruption we're talking about, there is no repair procedure, period. We cannot deal with arbitrary corruption any more than other filesystems can. However, we do have the advantage of always knowing when something is corrupted, and knowing what that particular block should have been. The best we can hope for is a) to identify orphaned blocks resulting from corruption and b) to provide a way to move/free these files so they don't permanently pollute the filesystem namespace.

- Eric

On Thu, Jul 20, 2006 at 03:34:07PM -0500, Al Hopper wrote:
> There has to be a better way.
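"Always knowing what that particular block should have been" comes from each block pointer carrying the checksum of the block it points to, so a scrub can verify the whole tree top-down from the root. A minimal sketch of that detection step, assuming SHA-256 via hashlib purely for illustration; the classes, addresses and layout are hypothetical stand-ins, not ZFS's on-disk format:

```python
import hashlib
from dataclasses import dataclass, field
from typing import Dict, List

@dataclass
class BlockPointer:
    addr: str        # hypothetical on-disk address of the child
    checksum: bytes  # checksum the parent expects the child to have

@dataclass
class Block:
    data: bytes
    children: List[BlockPointer] = field(default_factory=list)

def block_digest(blk: Block) -> bytes:
    """Checksum covering the payload and the child pointers it contains."""
    h = hashlib.sha256(blk.data)
    for ptr in blk.children:
        h.update(ptr.addr.encode())
        h.update(ptr.checksum)
    return h.digest()

def scrub(disk: Dict[str, Block], ptr: BlockPointer) -> List[str]:
    """Verify the tree below `ptr`; return addresses that fail their checksum."""
    blk = disk[ptr.addr]
    if block_digest(blk) != ptr.checksum:
        # The parent recorded what this block should have been, and this
        # is not it. Its own pointers can't be trusted, so stop here.
        return [ptr.addr]
    bad: List[str] = []
    for child in blk.children:
        bad.extend(scrub(disk, child))
    return bad

leaf = Block(b"file data")
root = Block(b"", [BlockPointer("leaf", block_digest(leaf))])
disk = {"leaf": leaf, "root": root}
root_ptr = BlockPointer("root", block_digest(root))

print(scrub(disk, root_ptr))        # []
disk["leaf"] = Block(b"file dat4")  # simulate silent corruption
print(scrub(disk, root_ptr))        # ['leaf']
```

This covers only the "detect" half. Nothing in the walk says how to rebuild the bad block, which is why there is no general repair procedure without a second copy.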
> Basically, the first step is to identify the file in question so the
> user knows what's been lost. The second step is a way to move these
> blocks into purgatory, where they won't take up filesystem namespace,
> but still account for used space. The final step is to actually delete
> the blocks and then do a garbage-collection type of operation to find
> which blocks are no longer referenced.

The GC operation is what I was referring to by a 'fsck'-like utility (not that it has to be a stand-alone utility in the way fsck is).

> This is a hugely complicated task, as dealing with snapshots and DMU
> accounting is going to be horrific, if possible at all. It is, however,
> on our (rather long) list of things to tackle.

Understood. I was really just interested in learning whether this was a "we wanted to get other stuff out the door first" or a "we're not sure it's possible" type of problem.
> However, we do have the advantage of always knowing when something
> is corrupted, and knowing what that particular block should have been.

We also have ditto blocks for all metadata, so that even if any block of ZFS metadata is destroyed, we always have another copy. Bill Moore describes ditto blocks in detail here:

http://blogs.sun.com/roller/page/bill?entry=ditto_blocks_the_amazing_tape

Jeff
On Thu, Jul 20, 2006 at 03:45:54PM -0700, Jeff Bonwick wrote:
> We also have ditto blocks for all metadata, so that even if any block
> of ZFS metadata is destroyed, we always have another copy.
> Bill Moore describes ditto blocks in detail here:
>
> http://blogs.sun.com/roller/page/bill?entry=ditto_blocks_the_amazing_tape

Right. And I should point out that if Eric had been running build 38 or later, this data corruption would not have happened; it would have been automatically repaired using ditto blocks (the bad block was an L2 indirect block, of which there would have been 2 copies).

--Bill
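The self-healing Bill describes works because a block pointer can record more than one copy of the same block, and the expected checksum tells the reader which copy is intact. A minimal sketch of that read path, assuming a dict of address to bytes stands in for the disks and SHA-256 stands in for the block checksum (the function and address names are hypothetical, and real ZFS does this inside its I/O pipeline, not like this):

```python
import hashlib
from typing import Dict, List, Optional

def read_with_ditto(disk: Dict[str, bytes], copies: List[str],
                    expected: bytes) -> Optional[bytes]:
    """Return a copy whose SHA-256 matches `expected`, rewriting any bad
    copies from it (self-healing). Returns None if no copy is good.
    For simplicity this sketch checks every copy instead of stopping
    at the first good one."""
    good: Optional[bytes] = None
    bad: List[str] = []
    for addr in copies:
        data = disk.get(addr)
        if data is not None and hashlib.sha256(data).digest() == expected:
            if good is None:
                good = data
        else:
            bad.append(addr)
    if good is not None:
        for addr in bad:
            disk[addr] = good  # repair the damaged copy
    return good

block = b"L2 indirect block contents"
expected = hashlib.sha256(block).digest()
disk = {"copy0": b"bit-rotted garbage", "copy1": block}
print(read_with_ditto(disk, ["copy0", "copy1"], expected) == block)  # True
print(disk["copy0"] == block)                                         # True
```

With only one copy, the same read simply returns None, which is the EIO Eric saw.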
Hello Bill,

Friday, July 21, 2006, 7:31:25 AM, you wrote:

BM> Right. And I should point out that if Eric had been running build 38 or
BM> later, this data corruption would not have happened - it would have been
BM> automatically repaired using ditto blocks (the bad block was a L2
BM> indirect block - of which there would have been 2 copies).

However, something may still be broken there: on two different servers (v240, T2000) I see CKSUM errors for ditto blocks on a daily basis, and it's hard to believe I have a hardware problem that hits only metadata blocks.

More at:
http://www.opensolaris.org/jive/thread.jspa?threadID=9846&tstart=0

--
Best regards,
 Robert                            mailto:rmilkowski at task.gda.pl
                                       http://milek.blogspot.com
After reading the ditto blocks blog (good article, btw), an idea occurred to me:

Since we use ditto blocks to preserve critical filesystem data, would it be practical to add a filesystem property that would cause all files in a filesystem to be stored as mirrored blocks?

That would allow dual-copy behavior selectable on a filesystem boundary, even in a vdev pool. That could be handy for those who have a little bit of critical data and a lot of not-so-critical data.

On Jul 20, 2006, at 4:45 PM, Jeff Bonwick wrote:
> We also have ditto blocks for all metadata, so that even if any block
> of ZFS metadata is destroyed, we always have another copy.

-----
Gregory Shaw, IT Architect
Phone: (303) 673-8273        Fax: (303) 673-8273
ITCTO Group, Sun Microsystems Inc.
1 StorageTek Drive MS 4382              greg.shaw at sun.com (work)
Louisville, CO 80028-4382               shaw at fmsoft.com (home)
"When Microsoft writes an application for Linux, I've Won." - Linus Torvalds
Hello Gregory,

Friday, July 21, 2006, 3:22:17 PM, you wrote:

> Since we use ditto blocks to preserve critical filesystem data, would
> it be practical to add a filesystem property that would cause all
> files in a filesystem to be stored as mirrored blocks?

IIRC that's already planned.
On Fri, Jul 21, 2006 at 07:22:17AM -0600, Gregory Shaw wrote:
> After reading the ditto blocks blog (good article, btw), an idea
> occurred to me:
>
> Since we use ditto blocks to preserve critical filesystem data, would
> it be practical to add a filesystem property that would cause all
> files in a filesystem to be stored as mirrored blocks?

Yep, that's the plan. I even mention it in the blog. :)

--Bill
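Gregory's proposal amounts to a per-filesystem copy count feeding the same ditto mechanism. A hypothetical sketch of how such a setting might pick the number of copies for each block: the parameter name `fs_copies` and the rule that metadata gets one more copy than data are assumptions for illustration, while the cap of three reflects the number of copies a ZFS block pointer can record.

```python
MAX_COPIES = 3  # a block pointer can record at most three copies

def copies_for(block_kind: str, fs_copies: int = 1) -> int:
    """Hypothetical: how many copies to write for one block.
    `fs_copies` stands in for the proposed per-filesystem setting."""
    if not 1 <= fs_copies <= MAX_COPIES:
        raise ValueError("fs_copies must be between 1 and 3")
    if block_kind == "data":
        return fs_copies
    if block_kind == "metadata":
        # Assumed rule: metadata keeps one more copy than data, capped.
        return min(fs_copies + 1, MAX_COPIES)
    raise ValueError(f"unknown block kind: {block_kind}")

# A "critical" filesystem and a scratch filesystem in the same pool:
print(copies_for("data", 2), copies_for("metadata", 2))  # 2 3
print(copies_for("data"), copies_for("metadata"))        # 1 2
```

Because the count is chosen per block at write time, critical and not-so-critical filesystems can share one pool without changing the pool's vdev layout.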
Eric Schrock wrote:
> Well the fact that it's a level 2 indirect block indicates why it can't
> simply be removed. We don't know what data it refers to, so we can't
> free the associated blocks. The panic on move is quite interesting -
> after BFU give it another shot and file a bug if it still happens.

I'm still seeing the panic (build 42) when trying to 'mv' the file with corrupt indirect blocks. The problem looks like 6424466 and 6440780; the panic string is "data after EOF". Email me offline if you would like to collect the core from my system.

- Eric
Eric Lowe wrote:
> I'm still seeing the panic (build 42) when trying to 'mv' the file with
> corrupt indirect blocks. The problem looks like 6424466 and 6440780, the
> panic string is "data after EOF". Email me offline if you would like to
> collect the core from my system.
>
> - Eric

Yup, this is a duplicate of 6424466 (6440780 is also probably a dup of 6424466). You are seeing this panic on an 'mv' because of some old debug code in dnode_sync() scanning the dnode contents. The "data after EOF" message is bogus; the real problem is your data corruption. Anyway, this is not going to go away until I put back a fix for 6424466. Sorry about that.

-Mark