Harry Putnam
2009-Mar-30 02:28 UTC
[zfs-discuss] Data corruption during resilver operation
I'm in well over my head with this report from zpool status:

  root # zpool status z3
    pool: z3
   state: DEGRADED
  status: One or more devices has experienced an error resulting in data
          corruption.  Applications may be affected.
  action: Restore the file in question if possible.  Otherwise restore the
          entire pool from backup.
     see: http://www.sun.com/msg/ZFS-8000-8A
   scrub: resilver completed after 0h7m with 38 errors on Sun Mar 29 18:37:28 2009
  config:

          NAME        STATE     READ WRITE CKSUM
          z3          DEGRADED     0     0    40
            mirror    DEGRADED     0     0    80
              c5d0    DEGRADED     0     0    80  too many errors
              c6d0    DEGRADED     0     0    80  too many errors

This is the latest thing, and apparently the result of a series of steps
I've taken to increase a zpool mirror's size.

There was quite a lot of huffing and puffing with the SATA controller that
holds this mirror, but the short version is:

zpool z3 was created as a mirror on two older 200 GB SATA I disks, on an
Adaptec 1205SA PCI controller.

After deciding I wanted to increase the size of this pool, I detached one
disk and pulled it out.  I replaced it with a newer, bigger SATA II WD
750 GB disk.  When I attempted to start up and attach this disk, I didn't
get past the boot process, and discovered my SATA controller could not
handle the newer SATA II disk.  No boot was possible.

I finally got the SATA controller working by flashing the two-part BIOS
with the latest BIOS for that card (Sil 3112a chip).

I restarted with one original 200 GB disk and one new 750 GB disk.  It
booted, and I was able to attach the new larger drive and begin the
resilvering process.

I went on to other things, but when I checked back I found the error
report cited above.

I started looking through the data but didn't really see much wrong.  I
checked the byte counts with `du -sb` on the zpool and on the source of
the data on a remote Linux host.  They were not the same, but quite
close.  I didn't think that meant much, since these are two different
filesystems: zfs and reiserfs.

I went to the web page cited in the report to see what I could learn.  To
summarize, it said this was serious business: that the data might not
even be recoverable, and that it definitely needed to be replaced from a
clean backup.

Using `zpool status -v z3` I learned there were 51 files said to be
corrupt.  But when I looked at the files, they were not part of the
original data.

The original data was put there by an rsync process from a remote host,
and contained none of the named files.  These files are of the form
(wrapped for mail):

  z3/www/reader@zfs-auto-snap:frequent-2009-03-29-18:55:\
    /www/localhost/htdocs/lcweb/TrainingVids/VegasTraining/\
      VegasTraiiningTransitions.avi

  (All on one line)

I'm not at all clear on what this is.  The part after the colon is what
was rsynced over.  The files that turned up in the report are all *.mov,
*.avi, *.mpg, or *.pdf.

I didn't make any snapshots, nor did I set anything to have them made
automatically... so I'm not sure where this snapshot came from, or even
whether it is in fact a snapshot.

Is it somehow a product of the resilvering?

When I go to the root of this filesystem (/www) and run a find command
like:

  find . -name 'VegasTraiiningTransitions.avi'

the file is found.  I haven't been able to test whether they play yet,
but I'm wondering what this snapshot stuff means, and what I should do
about it.

The warning clearly suggests they must be replaced with good copies.

That wouldn't be too big a deal, but I do still have the other new disk
to insert and resilver.

So what is the smart move here?  Replace the data before continuing with
the enlargement of the pool?  Or something else?
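
For reference, the in-place mirror upgrade described above corresponds
roughly to the command sequence below.  This is a sketch only: the device
names are taken from the status output, and on builds of this vintage the
extra capacity typically shows up only after both halves have been
replaced, possibly after an export/import cycle.

  # detach one side of the mirror so the disk can be swapped
  zpool detach z3 c6d0

  # ...power down, physically replace the 200 GB disk with the 750 GB one...

  # attach the new disk alongside the surviving half and let it resilver
  zpool attach z3 c5d0 c6d0
  zpool status z3        # wait for "resilver completed"

  # then repeat for the other side
  zpool detach z3 c5d0
  zpool attach z3 c6d0 c5d0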
You are seeing snapshots from Time Slider's automatic snapshot service.

If you have a copy of each of these 51 files elsewhere, I suppose you
could re-copy them to the mirror and then do 'zpool clear [poolname]' to
reset the error counter.

On Sun, Mar 29, 2009 at 10:28 PM, Harry Putnam <reader at newsguy.com> wrote:
> So what is the smart move here?  Replace the data before continuing
> with the enlargement of the pool?  Or something else?
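
To see where those snapshots come from, one can list them and check the
auto-snapshot SMF services.  A sketch; the service instance name below is
the standard zfs-auto-snapshot "frequent" one, which is an assumption
about this particular install.

  # list snapshots created by the automatic snapshot service
  zfs list -t snapshot | grep zfs-auto-snap

  # see which auto-snapshot instances are running
  svcs -a | grep auto-snapshot

  # an unwanted instance can be disabled, e.g. the "frequent" one
  svcadm disable svc:/system/filesystem/zfs/auto-snapshot:frequent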
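The re-copy-and-clear sequence in command form (a sketch: "linuxhost" is
a placeholder for the remote rsync source mentioned earlier, and note
that blocks still referenced by a snapshot cannot be repaired by
overwriting the live file, so the affected snapshots may need to be
destroyed as well):

  # identify the damaged files
  zpool status -v z3

  # re-copy the affected files from the source
  rsync -av linuxhost:/www/ /www/

  # damaged blocks held only by a snapshot go away with the snapshot
  # (name taken from the zpool status -v report)
  zfs destroy z3/www/reader@zfs-auto-snap:frequent-2009-03-29-18:55

  # reset the error counters and re-verify
  zpool clear z3
  zpool scrub z3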
Harry Putnam
2009-Mar-30 19:03 UTC
[zfs-discuss] Data corruption during resilver operation
Blake <blake.irvin at gmail.com> writes:
> You are seeing snapshots from Time Slider's automatic snapshot service.
>
> If you have a copy of each of these 51 files elsewhere, I suppose you
> could re-copy them to the mirror and then do 'zpool clear [poolname]'
> to reset the error counter.

Thanks... I did try copying from the source to replace those, but it
didn't appear to make any difference; I still got the errors.

I finally just assumed I'd done something untoward during all the shuffle
of upgrading a 200 GB mirror to a 750 GB mirror, and flashing the BIOS of
the PCI SATA controller card in the middle.

So I resorted to: zpool destroy badpool

I finished the switch from 200 GB to 750 GB with no zpool on either disk,
created the mirror using the two 750 GB disks, and finally rsynced the
data across from a Linux machine to the new zpool as before.
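
In command form, that start-over path looks roughly like this.  It is a
sketch: the device names carry over from earlier in the thread, and the
dataset layout and rsync source are assumptions.

  # give up on the damaged pool entirely
  zpool destroy z3

  # recreate the mirror on the two 750 GB disks
  zpool create z3 mirror c5d0 c6d0

  # recreate a filesystem and pull the data across again
  zfs create z3/www
  rsync -av linuxhost:/www/ /z3/www/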
Sounds like the best way - I was about to suggest that anyway :)

On Mon, Mar 30, 2009 at 3:03 PM, Harry Putnam <reader at newsguy.com> wrote:
> Created the mirror using the two 750 GB disks, and finally rsynced the
> data across from a Linux machine to the new zpool as before.