Harry Putnam
2009-Mar-30 02:28 UTC
[zfs-discuss] Data corruption during resilver operation
I'm in well over my head with this report from zpool status:

  root # zpool status z3
    pool: z3
   state: DEGRADED
  status: One or more devices has experienced an error resulting in data
          corruption.  Applications may be affected.
  action: Restore the file in question if possible.  Otherwise restore the
          entire pool from backup.
     see: http://www.sun.com/msg/ZFS-8000-8A
   scrub: resilver completed after 0h7m with 38 errors on Sun Mar 29 18:37:28 2009
  config:

          NAME        STATE     READ WRITE CKSUM
          z3          DEGRADED     0     0    40
            mirror    DEGRADED     0     0    80
              c5d0    DEGRADED     0     0    80  too many errors
              c6d0    DEGRADED     0     0    80  too many errors

This is the latest thing, and apparently the result of a series of steps
I've taken to increase a zpool mirror's size.

There was quite a lot of huffing and puffing with the SATA controller that
holds this mirror, but the short version is:

zpool z3 was created as a mirror on two older 200 GB SATA I disks, on an
Adaptec 1205SA PCI controller.

After deciding I wanted to increase the size of this pool, I detached one
disk and pulled it out.  I replaced it with a newer, bigger SATA II WD
750 GB disk.  When I attempted to start up and attach this disk, I didn't
get past the boot process, and discovered my SATA controller could not
handle the newer SATA II disk.  No boot was possible.

I finally got the SATA controller working by flashing the two-part BIOS
with the latest BIOS for that card (Sil 3112a chip).

I restarted with one original 200 GB disk and one new 750 GB disk.  It
booted, and I was able to attach the new larger drive and begin the
resilvering process.

I went on to other things, but when I checked back I found the error
report cited above.

I started looking through the data but didn't really see much wrong.  I
checked the byte counts with `du -sb` on the zpool and on the source of
the data on a remote Linux host.  They were not the same, but quite
close.  I didn't think that meant much, since these are two different
filesystems: zfs and reiserfs.

I went to the web page cited in the report to see what I could learn.  To
summarize, it said this was serious business: that the data might not
even be recoverable, and that it definitely needed to be replaced from a
clean backup.

Using `zpool status -v z3` I learned there were 51 files said to be
corrupt.  But when I looked at the files, they were not part of the
original data.

The original data was put there by an rsync process from a remote host,
and contained none of the named files.  These files are of the form
(wrapped for mail):

  z3/www/reader@zfs-auto-snap:frequent-2009-03-29-18:55:\
    /www/localhost/htdocs/lcweb/TrainingVids/VegasTraining/\
      VegasTraiiningTransitions.avi

  (All on one line)

I'm not at all clear on what this is.  The part after the colon is what
was rsynced over.  The files that turned up in the report are all *.mov,
*.avi, *.mpg, or *.pdf.

I didn't make any snapshots, nor did I set anything to have them made
automatically... so I'm not sure where this snapshot came from, or even
whether it is in fact a snapshot.

Is it somehow a product of the resilvering?

When I go to the root of this filesystem (/www) and run a find command
like:

  find . -name 'VegasTraiiningTransitions.avi'

the file is found.  I haven't been able to test whether they play yet,
but I'm wondering what this snapshot stuff means, and what I should do
about it.

The warning clearly suggests they must be replaced with good copies.

That wouldn't be too big a deal, but I do still have the other new disk
to insert and resilver.

So what is the smart move here?  Replace the data before continuing with
the enlargement of the pool?  Or something else?
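
For reference, the in-place mirror upgrade described above corresponds
roughly to the command sequence below.  This is a sketch only: the device
names are taken from the status output, and on builds of this vintage the
extra capacity typically shows up only after both halves have been
replaced, possibly after an export/import cycle.

  # detach one side of the mirror so the disk can be swapped
  zpool detach z3 c6d0

  # ...power down, physically replace the 200 GB disk with the 750 GB one...

  # attach the new disk alongside the surviving half and let it resilver
  zpool attach z3 c5d0 c6d0
  zpool status z3        # wait for "resilver completed"

  # then repeat for the other side
  zpool detach z3 c5d0
  zpool attach z3 c6d0 c5d0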
You are seeing snapshots from Time Slider's automatic snapshot service.

If you have a copy of each of these 51 files elsewhere, I suppose you
could re-copy them to the mirror and then do 'zpool clear [poolname]' to
reset the error counter.

On Sun, Mar 29, 2009 at 10:28 PM, Harry Putnam <reader at newsguy.com> wrote:
> So what is the smart move here?  Replace the data before continuing
> with the enlargement of the pool?  Or something else?
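
To see where those snapshots come from, one can list them and check the
auto-snapshot SMF services.  A sketch; the service instance name below is
the standard zfs-auto-snapshot "frequent" one, which is an assumption
about this particular install.

  # list snapshots created by the automatic snapshot service
  zfs list -t snapshot | grep zfs-auto-snap

  # see which auto-snapshot instances are running
  svcs -a | grep auto-snapshot

  # an unwanted instance can be disabled, e.g. the "frequent" one
  svcadm disable svc:/system/filesystem/zfs/auto-snapshot:frequent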
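The re-copy-and-clear sequence in command form (a sketch: "linuxhost" is
a placeholder for the remote rsync source mentioned earlier, and note
that blocks still referenced by a snapshot cannot be repaired by
overwriting the live file, so the affected snapshots may need to be
destroyed as well):

  # identify the damaged files
  zpool status -v z3

  # re-copy the affected files from the source
  rsync -av linuxhost:/www/ /www/

  # damaged blocks held only by a snapshot go away with the snapshot
  # (name taken from the zpool status -v report)
  zfs destroy z3/www/reader@zfs-auto-snap:frequent-2009-03-29-18:55

  # reset the error counters and re-verify
  zpool clear z3
  zpool scrub z3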
Harry Putnam
2009-Mar-30 19:03 UTC
[zfs-discuss] Data corruption during resilver operation
Blake <blake.irvin at gmail.com> writes:
> You are seeing snapshots from Time Slider's automatic snapshot service.
>
> If you have a copy of each of these 51 files elsewhere, I suppose you
> could re-copy them to the mirror and then do 'zpool clear [poolname]'
> to reset the error counter.

Thanks... I did try copying from the source to replace those, but it
didn't appear to make any difference; I still got the errors.

I finally just assumed I'd done something untoward during all the shuffle
of upgrading a 200 GB mirror to a 750 GB mirror, and flashing the BIOS of
the PCI SATA controller card in the middle.

So I resorted to: zpool destroy badpool

I finished the switch from 200 GB to 750 GB with no zpool on either disk,
created the mirror using the two 750 GB disks, and finally rsynced the
data across from a Linux machine to the new zpool as before.
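
In command form, that start-over path looks roughly like this.  It is a
sketch: the device names carry over from earlier in the thread, and the
dataset layout and rsync source are assumptions.

  # give up on the damaged pool entirely
  zpool destroy z3

  # recreate the mirror on the two 750 GB disks
  zpool create z3 mirror c5d0 c6d0

  # recreate a filesystem and pull the data across again
  zfs create z3/www
  rsync -av linuxhost:/www/ /z3/www/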
Sounds like the best way - I was about to suggest that anyway :)

On Mon, Mar 30, 2009 at 3:03 PM, Harry Putnam <reader at newsguy.com> wrote:
> Created the mirror using the two 750 GB disks, and finally rsynced the
> data across from a Linux machine to the new zpool as before.