Harry Putnam
2010-Mar-09 02:32 UTC
[zfs-discuss] what to do when errors occur during scrub
I'm a little at a loss here as to what to do about these two errors that turned up during a scrub.

The discs involved are a matched pair in mirror mode.

zpool status -v z3 (wrapped for mail):

------- --------- ---=--- --------- --------
 scrub: scrub completed after 1h48m with 2 errors on Mon Mar 8 10:26:49 2010
config:

        NAME        STATE     READ WRITE CKSUM
        z3          ONLINE       0     0     2
          mirror-0  ONLINE       0     0     4
            c5d0    ONLINE       0     0     4
            c6d0    ONLINE       0     0     4

errors: Permanent errors have been detected in the following files:
  [NOTE: Edited to ease reading -ed -hp]
  z3/projects@zfs-auto-snap:monthly-2009-08-30-09:26:/Training/\
  [... huge path snipped ...]/2_Database.mov

  /t/bk-test-DiskDamage-021710_005252/rsnap/misc/hourly.4/\
  [... huge path snipped ...]/es.utf-8.sug
------- --------- ---=--- --------- --------

Those are just two on disk files.

Can it be as simple as just deleting them?

Or is something more technical required.
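The config table in the status output above can be filtered mechanically. The following is a minimal sketch, not part of the original post: the function name `cksum_errors` is hypothetical, and it assumes the column layout shown above (NAME, STATE, READ, WRITE, CKSUM). It prints every row whose CKSUM count is nonzero.

```shell
# cksum_errors: read `zpool status` output on stdin and print the name
# and CKSUM count of each config row whose CKSUM column is not 0.
# Assumption: the table has the five columns shown in this thread.
cksum_errors() {
    awk '
        # The header row marks the start of the config table.
        $1 == "NAME" && $5 == "CKSUM" { in_table = 1; next }
        # The "errors:" line ends it.
        $1 == "errors:"               { in_table = 0 }
        # Inside the table, report rows with a nonzero checksum count.
        in_table && NF >= 5 && $5 != "0" { print $1, $5 }
    '
}

# Usage on a live system (not run here):
#   zpool status -v z3 | cksum_errors
```

In a mirror, a block can only be repaired from the other side if that side still holds a good copy, so equal nonzero counts on both disks, as reported here, are consistent with the "permanent errors" message.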
Harry Putnam
2010-Mar-09 17:08 UTC
[zfs-discuss] what to do when errors occur during scrub
[I hope this isn't a repost double whammy. I posted this message under `Message-ID: <87fx4ai5sp.fsf at newsguy.com>' over 15 hrs ago but it never appeared on my nntp server (gmane) far as I can see]

I'm a little at a loss here as to what to do about these two errors that turned up during a scrub.

The discs involved are a matched pair in mirror mode.

zpool status -v z3 (wrapped for mail):

------- --------- ---=--- --------- --------
 scrub: scrub completed after 1h48m with 2 errors on Mon Mar 8 10:26:49 2010
config:

        NAME        STATE     READ WRITE CKSUM
        z3          ONLINE       0     0     2
          mirror-0  ONLINE       0     0     4
            c5d0    ONLINE       0     0     4
            c6d0    ONLINE       0     0     4

errors: Permanent errors have been detected in the following files:
  [NOTE: Edited to ease reading -ed -hp]
  z3/projects@zfs-auto-snap:monthly-2009-08-30-09:26:/Training/\
  [... huge path snipped ...]/2_Database.mov

  /t/bk-test-DiskDamage-021710_005252/rsnap/misc/hourly.4/\
  [... huge path snipped ...]/es.utf-8.sug
------- --------- ---=--- --------- --------

Those are just two on disk files.

Can it be as simple as just deleting them?

Or is something more technical required.
Cindy Swearingen
2010-Mar-09 20:18 UTC
[zfs-discuss] what to do when errors occur during scrub
Hi Harry,

Reviewing other postings where permanent errors were found on redundant ZFS configs, one was resolved by re-running the zpool scrub and one resolved itself because the files with the permanent errors were most likely temporary files.

One of the files with permanent errors below is a snapshot and the other looks like another backup.

I would recommend the top section of this troubleshooting wiki to determine if hardware issues are causing these permanent errors:

http://www.solarisinternals.com/wiki/index.php/ZFS_Troubleshooting_Guide

If it turns out that some hardware problem, power failure, or other event caused these errors and if rerunning the scrub doesn't remove these files, then I would remove them manually (if you have copies of the data somewhere else).

Thanks,

Cindy

On 03/09/10 10:08, Harry Putnam wrote:
> [...]
> errors: Permanent errors have been detected in the following files:
> [NOTE: Edited to ease reading -ed -hp]
> z3/projects@zfs-auto-snap:monthly-2009-08-30-09:26:/Training/\
> [... huge path snipped ...]/2_Database.mov
>
> /t/bk-test-DiskDamage-021710_005252/rsnap/misc/hourly.4/\
> [... huge path snipped ...]/es.utf-8.sug
> [...]
Harry Putnam
2010-Mar-09 22:57 UTC
[zfs-discuss] what to do when errors occur during scrub
Cindy Swearingen <Cindy.Swearingen at Sun.COM> writes:

> Hi Harry,
>
> Reviewing other postings where permanent errors were found on
> redundant ZFS configs, one was resolved by re-running the zpool scrub
> and one resolved itself because the files with the permanent errors
> were most likely temporary files.

What search strings did you use to find those? I always seem to use search strings that miss what I'm after. It's helpful to see how others conduct searches.

> One of the files with permanent errors below is a snapshot and the other
> looks another backup.
>
> I would recommend the top section of this troubleshooting wiki to
> determine if hardware issues are causing these permanent errors:
>
> http://www.solarisinternals.com/wiki/index.php/ZFS_Troubleshooting_Guide

A lot of that seems horribly complex for what is apparently (and this may turn out to be wishful thinking) a pretty minor problem. But it does say that repeated scrubs will most likely remove all traces of corruption (assuming it's not caused by hardware). However, I see no evidence that the `scrub' command is doing anything at all (more on that below).

I decided to take the line of least resistance and simply deleted the file.

As you guessed, they were backups and, luckily for me, redundant.

So following a scrub... I see errors that look more technical.

But first, the info given by `zpool status' appears either to be referencing an earlier scrub or to be seriously wrong in what it reports.

root # zpool status -vx z3
  pool: z3
 state: ONLINE
status: One or more devices has experienced an error resulting in data
        corruption.  Applications may be affected.
action: Restore the file in question if possible.  Otherwise restore the
        entire pool from backup.
   see: http://www.sun.com/msg/ZFS-8000-8A
 scrub: scrub completed after 1h48m with 2 errors on Mon Mar 8 10:26:49 2010
config:

------- --------- ---=--- --------- --------
I just ran a scrub moments ago, but `status' is still reporting one from earlier in the day. It says 1 hr and 48 minutes, but that is completely wrong too.
------- --------- ---=--- --------- --------

        NAME        STATE     READ WRITE CKSUM
        z3          ONLINE       0     0     2
          mirror-0  ONLINE       0     0     4
            c5d0    ONLINE       0     0     4
            c6d0    ONLINE       0     0     4

errors: Permanent errors have been detected in the following files:

        <0x42>:<0x552d>
        z3/t:<0xe1d99f>
------- --------- ---=--- --------- --------

The `status' report, even though it seems to have bogus information about the scrub, does show different output for the errors.

Are those hex addresses of devices or what? There is nothing at all on z3/t.

Also - it appears `zpool scrub -s z3' doesn't really do anything.

The status report above is taken immediately after a scrub command.

The `scrub -s' command just returns the prompt... no output and apparently no scrub either.

Does the failure to scrub indicate it cannot be scrubbed? Does a status report that shows the pool online and not degraded really mean anything, or is that just as spurious as the scrub info there?

Sorry if I seem like a lazy dog, but I don't really see a section in the troubleshooting guide (from viewing the outline of sections) that appears to deal directly with scrubbing.

Apparently I'm supposed to read and digest the whole thing so as to know what to do... but I quickly get completely lost in the discussion.

They say to use fmdump for a list of defective hardware... but I don't see anything that appears to indicate a problem, unless the two entries from March 5th mean something that is not apparent.

fmdump

(I removed the exact times from the lines so this wouldn't wrap)

[...]

Mar 05 ... 9ea9e105-72b1-69bd-e1e6-88322cd8b847 ZFS-8000-GH
Mar 05 ... a37779fb-8018-ec8c-bd72-ec32f4b40ff6 ZFS-8000-GH
Mar 08 ... a37779fb-8018-ec8c-bd72-ec32f4b40ff6 FMD-8000-4M Repaired
Mar 08 ... a37779fb-8018-ec8c-bd72-ec32f4b40ff6 FMD-8000-6U Resolved
Mar 08 ... 9ea9e105-72b1-69bd-e1e6-88322cd8b847 FMD-8000-4M Repaired
Mar 08 ... 9ea9e105-72b1-69bd-e1e6-88322cd8b847 FMD-8000-6U Resolved
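One way to read the fmdump listing above is to pair each fault UUID with its later events: both UUIDs first appear on Mar 05 and both later show a "Resolved" event on Mar 08. The sketch below automates that pairing. It is only an illustration under stated assumptions: the function name `unresolved_faults` is hypothetical, and it assumes the line layout quoted in the post (month, day, time, UUID, message ID, optional event word), which may differ on other releases.

```shell
# unresolved_faults: read `fmdump` output on stdin and print each fault
# UUID that never appears on a line whose event word is "Resolved".
# Assumption: fields are month, day, time, UUID, message ID, [event].
unresolved_faults() {
    awk '
        # Only consider lines whose 4th field looks like a UUID.
        $4 ~ /^[0-9a-f]+-[0-9a-f]+-[0-9a-f]+-[0-9a-f]+-[0-9a-f]+$/ {
            # Remember UUIDs in the order they are first seen.
            if (!($4 in seen)) { seen[$4] = 1; order[++n] = $4 }
            if ($6 == "Resolved") resolved[$4] = 1
        }
        END {
            for (i = 1; i <= n; i++)
                if (!(order[i] in resolved)) print order[i]
        }
    '
}

# Usage on a live system (not run here):
#   fmdump | unresolved_faults
```

Fed the six lines quoted in the post, this prints nothing, which matches the poster's reading that fmdump shows no outstanding problem.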
Cindy Swearingen
2010-Mar-10 00:09 UTC
[zfs-discuss] what to do when errors occur during scrub
Hi Harry,

Part of my job is to make this stuff easier, but we have some hurdles to go in the area of device management and troubleshooting.

The relevant zfs-discuss threads include 'permanent errors' if that helps.

You might try running zpool clear on the pool between the scrubs. I will add this as a recovery step in the t/s wiki.

I believe you'll see the hex output if the files no longer exist.

You could try using this syntax to isolate any recent errors on your devices:

# fmdump -eV | grep c5d0
# fmdump -eV | grep c6d0

Cindy
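The recovery step suggested above (clear the pool's error counters, scrub again, then re-check status) can be sketched as a small wrapper. This is an illustration only: the function name `rescrub` and the `DRY_RUN` variable are hypothetical, and by default the commands are printed rather than executed so nothing touches a real pool by accident.

```shell
# rescrub: clear error counters on a pool, start a fresh scrub, then
# show status. With DRY_RUN=1 (the default in this sketch) the commands
# are only printed; set DRY_RUN=0 to actually run them as root.
rescrub() {
    pool=$1
    [ -n "$pool" ] || { echo "usage: rescrub POOL" >&2; return 2; }
    for cmd in "zpool clear $pool" "zpool scrub $pool" "zpool status -v $pool"; do
        if [ "${DRY_RUN:-1}" = "1" ]; then
            echo "$cmd"
        else
            # Word splitting is intended; assumes a pool name without spaces.
            $cmd || return 1
        fi
    done
}

# Usage:
#   rescrub z3              # prints the three commands
#   DRY_RUN=0 rescrub z3    # runs them
```

Note that `zpool scrub` returns as soon as the scrub has started, so the status shown immediately afterwards describes a scrub in progress; the error list is only meaningful once the scrub has completed.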
David Dyer-Bennet
2010-Mar-10 02:53 UTC
[zfs-discuss] what to do when errors occur during scrub
On 3/9/2010 4:57 PM, Harry Putnam wrote:
> Also - it appears `zpool scrub -s z3' doesn't really do anything.
> The status report above is taken immediately after a scrub command.
>
> The `scrub -s' command just returns the prompt... no output and
> apparently no scrub either.

The "-s" switch is documented to STOP a scrub, though I've never used it.

-- 
David Dyer-Bennet, dd-b at dd-b.net; http://dd-b.net/
Snapshots: http://dd-b.net/dd-b/SnapshotAlbum/data/
Photos: http://dd-b.net/photography/gallery/
Dragaera: http://dragaera.info
Harry Putnam
2010-Mar-10 18:07 UTC
[zfs-discuss] what to do when errors occur during scrub
David Dyer-Bennet <dd-b at dd-b.net> writes:

> On 3/9/2010 4:57 PM, Harry Putnam wrote:
>> Also - it appears `zpool scrub -s z3' doesn't really do anything.
>> The status report above is taken immediately after a scrub command.
>>
>> The `scrub -s' command just returns the prompt... no output and
>> apparently no scrub either.
>
> The "-s" switch is documented to STOP a scrub, though I've never used it.

egad... and so it is...
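Since `zpool scrub z3` starts a scrub and `zpool scrub -s z3` stops one, running only the `-s` form would leave the old "scrub completed ... Mar 8" line in the status output, which fits what was reported earlier in the thread. A quick way to see which scrub a status report describes is to pull out its `scrub:` line. The sketch below does that; the function names `scrub_line` and `scrub_state` are hypothetical, the "completed" wording comes from the output quoted in this thread, and the "in progress" wording is an assumption that may differ between ZFS versions (later releases label this line differently).

```shell
# scrub_line: read `zpool status` output on stdin and print the text
# that follows the "scrub:" label, if any.
scrub_line() {
    sed -n 's/^[[:space:]]*scrub:[[:space:]]*//p'
}

# scrub_state: classify the scrub line as running, completed or unknown.
# Assumption: a running scrub is reported with the words "in progress".
scrub_state() {
    line=$(scrub_line)
    case $line in
        *"in progress"*) echo running ;;
        *completed*)     echo completed ;;
        *)               echo unknown ;;
    esac
}

# Usage on a live system (not run here):
#   zpool scrub z3                  # start a scrub (no -s)
#   zpool status z3 | scrub_state
#   zpool status z3 | scrub_line    # shows the completion timestamp
```

Comparing the timestamp in the `scrub_line` output with the time the scrub command was issued shows whether the report is fresh or left over from an earlier run.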