Aleksandr Levchuk
2010-Feb-03 21:45 UTC
[zfs-discuss] What happens when: file-corrupted and no-redundancy?
Hardware RAID6 + hot spare has worked well for us, so I wanted to stick with our SAN for data protection. I understand that the end-to-end checks of ZFS make it better at detecting corruption.

In my case, I can imagine that ZFS would FREEZE the whole volume when a single block or file is found to be corrupted. Ideally, I would not like this to happen and would instead like to get a log with the names of the corrupted files.

What exactly happens when ZFS detects a corrupted block/file and does not have redundancy to correct it?

Alex

--
---------------------------------------------------------------
Aleksandr Levchuk
Homepage: http://biocluster.ucr.edu/~alevchuk/
Cell Phone: (951) 368-0004
Bioinformatic Systems and Databases
Lab Phone: (951) 905-5232
Institute for Integrative Genome Biology
University of California, Riverside
---------------------------------------------------------------
Aleksandr Levchuk
2010-Feb-03 23:15 UTC
[zfs-discuss] What happens when: file-corrupted and no-redundancy?
We switched to OpenSolaris + ZFS. RAID6 + hot spare on LSI Engenio SAN hardware worked well for us. (I'm used to the SAN management GUI. Also, something that RAID-Z would not be able to do: the SAN lights up the amber LEDs on the drives that fail, so I know which one to replace.)

So, I wanted to try to stick to the hardware RAID for data protection. I understand that the end-to-end checks of ZFS make it better at detecting corruption.

In my case, I can imagine that ZFS would FREEZE the whole volume when a single block or file is found to be corrupted. Ideally, I would not like this to happen and would instead like to get a log with the names of the corrupted files.

What exactly happens when ZFS detects a corrupted block/file and does not have redundancy to correct it?

Alex
--
This message posted from opensolaris.org
Richard Elling
2010-Feb-03 23:57 UTC
[zfs-discuss] What happens when: file-corrupted and no-redundancy?
On Feb 3, 2010, at 3:15 PM, Aleksandr Levchuk wrote:

> We switched to OpenSolaris + ZFS. RAID6 + hot spare on LSI Engenio SAN hardware worked well for us. (I'm used to the SAN management GUI. Also, something that RAID-Z would not be able to do: the SAN lights up the amber LEDs on the drives that fail, so I know which one to replace.)

In Solaris, lighting service LEDs is handled by the FMA framework agents. When ZFS sees repeated checksum failures from a device, it notifies FMA, which is responsible for diagnosis and subsequent service alerts.

> So, I wanted to try to stick to the hardware RAID for data protection. I understand that the end-to-end checks of ZFS make it better at detecting corruption.
>
> In my case, I can imagine that ZFS would FREEZE the whole volume when a single block or file is found to be corrupted.

No.

> Ideally, I would not like this to happen and would instead like to get a log with the names of the corrupted files.
>
> What exactly happens when ZFS detects a corrupted block/file and does not have redundancy to correct it?

It depends on the failmode property setting and on how the application reading the file handles errors. In all cases, the "zpool status -xv" command can display the name of the corrupted file or metadata.

-- richard
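[A minimal sketch of the two things mentioned above: inspecting/setting the failmode property and listing permanently corrupted files. The pool name "tank" and the choice of failmode value are illustrative assumptions, not taken from the thread.]

  # Show how the pool reacts to catastrophic, unrecoverable I/O failure:
  # wait (the default, blocks I/O), continue (return EIO), or panic.
  zpool get failmode tank

  # Optionally change the behaviour; "continue" returns EIO to callers
  # instead of blocking the pool. (Assumed choice for this example.)
  zpool set failmode=continue tank

  # Show only unhealthy pools, verbosely, including the names of files
  # or metadata objects with permanent (uncorrectable) errors.
  zpool status -xv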
Robert Milkowski
2010-Feb-04 00:22 UTC
[zfs-discuss] What happens when: file-corrupted and no-redundancy?
On 03/02/2010 23:15, Aleksandr Levchuk wrote:

> We switched to OpenSolaris + ZFS. RAID6 + hot spare on LSI Engenio SAN hardware worked well for us. (I'm used to the SAN management GUI. Also, something that RAID-Z would not be able to do: the SAN lights up the amber LEDs on the drives that fail, so I know which one to replace.)
>
> So, I wanted to try to stick to the hardware RAID for data protection. I understand that the end-to-end checks of ZFS make it better at detecting corruption.
>
> In my case, I can imagine that ZFS would FREEZE the whole volume when a single block or file is found to be corrupted.
>
> Ideally, I would not like this to happen and would instead like to get a log with the names of the corrupted files.
>
> What exactly happens when ZFS detects a corrupted block/file and does not have redundancy to correct it?
>
> Alex

Your wish is... that's exactly what should happen: zpool status -v should provide you with a list of the affected files, which you should then be able to delete. In case the corrupted block contains metadata, ZFS should actually be able to fix it on the fly for you, as all metadata blocks are kept in at least two copies even if no redundancy is configured at the pool level.

Let's test it:

milek@r600:~# mkfile 128m file1
milek@r600:~# zpool create test `pwd`/file1
milek@r600:~# zpool status test
  pool: test
 state: ONLINE
 scrub: none requested
config:

        NAME                        STATE     READ WRITE CKSUM
        test                        ONLINE       0     0     0
          /export/home/milek/file1  ONLINE       0     0     0

errors: No known data errors
milek@r600:~#
milek@r600:~# cp /bin/bash /test/file1
milek@r600:~# cp /bin/bash /test/file2
milek@r600:~# cp /bin/bash /test/file3
milek@r600:~# cp /bin/bash /test/file4
milek@r600:~# cp /bin/bash /test/file5
milek@r600:~# cp /bin/bash /test/file6
milek@r600:~# cp /bin/bash /test/file7
milek@r600:~# cp /bin/bash /test/file8
milek@r600:~# cp /bin/bash /test/file9
milek@r600:~# sync
milek@r600:~# dd if=/dev/zero of=file1 seek=50 count=10000 conv=notrunc
10000+0 records in
10000+0 records out
5120000 bytes (5.1 MB) copied, 0.179617 s, 28.5 MB/s
milek@r600:~# sync
milek@r600:~# zpool scrub test
milek@r600:~# zpool status -v test
  pool: test
 state: DEGRADED
status: One or more devices has experienced an error resulting in data
        corruption. Applications may be affected.
action: Restore the file in question if possible. Otherwise restore the
        entire pool from backup.
   see: http://www.sun.com/msg/ZFS-8000-8A
 scrub: scrub completed after 0h0m with 7 errors on Thu Feb 4 00:18:40 2010
config:

        NAME                        STATE     READ WRITE CKSUM
        test                        DEGRADED     0     0     7
          /export/home/milek/file1  DEGRADED     0     0    29  too many errors

errors: Permanent errors have been detected in the following files:

        /test/file1

milek@r600:~#
milek@r600:~# rm /test/file1
milek@r600:~# sync
milek@r600:~# zpool scrub test
milek@r600:~# zpool status -v test
  pool: test
 state: DEGRADED
status: One or more devices has experienced an unrecoverable error. An
        attempt was made to correct the error. Applications are unaffected.
action: Determine if the device needs to be replaced, and clear the errors
        using 'zpool clear' or replace the device with 'zpool replace'.
   see: http://www.sun.com/msg/ZFS-8000-9P
 scrub: scrub completed after 0h0m with 0 errors on Thu Feb 4 00:19:55 2010
config:

        NAME                        STATE     READ WRITE CKSUM
        test                        DEGRADED     0     0     7
          /export/home/milek/file1  DEGRADED     0     0    29  too many errors

errors: No known data errors
milek@r600:~# zpool clear test
milek@r600:~# zpool scrub test
milek@r600:~# zpool status -v test
  pool: test
 state: ONLINE
 scrub: scrub completed after 0h0m with 0 errors on Thu Feb 4 00:20:12 2010
config:

        NAME                        STATE     READ WRITE CKSUM
        test                        ONLINE       0     0     0
          /export/home/milek/file1  ONLINE       0     0     0

errors: No known data errors
milek@r600:~#
milek@r600:~# ls -la /test/
total 7191
drwxr-xr-x  2 root root     10 2010-02-04 00:19 .
drwxr-xr-x 28 root root     30 2010-02-04 00:17 ..
-r-xr-xr-x  1 root root 799040 2010-02-04 00:17 file2
-r-xr-xr-x  1 root root 799040 2010-02-04 00:17 file3
-r-xr-xr-x  1 root root 799040 2010-02-04 00:17 file4
-r-xr-xr-x  1 root root 799040 2010-02-04 00:17 file5
-r-xr-xr-x  1 root root 799040 2010-02-04 00:17 file6
-r-xr-xr-x  1 root root 799040 2010-02-04 00:17 file7
-r-xr-xr-x  1 root root 799040 2010-02-04 00:18 file8
-r-xr-xr-x  1 root root 799040 2010-02-04 00:18 file9
milek@r600:~#

--
Robert Milkowski
http://milek.blogspot.com
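[A side note on the metadata point above: ZFS can also keep extra copies of ordinary data blocks on a non-redundant pool via the copies property. This is an editorial illustration using the test pool from the transcript; "file10" is a hypothetical file name. Note that copies=2 only applies to data written after the property is set, and it does not protect against losing the whole device.]

  # Ask ZFS to store two copies of each data block for this dataset.
  zfs set copies=2 test

  # Confirm the setting.
  zfs get copies test

  # Blocks of files written from now on are stored twice, so a single
  # corrupted block can usually be repaired from the second copy.
  cp /bin/bash /test/file10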
Robert Milkowski
2010-Feb-04 17:28 UTC
[zfs-discuss] What happens when: file-corrupted and no-redundancy?
On 03/02/2010 21:45, Aleksandr Levchuk wrote:

> Hardware RAID6 + hot spare has worked well for us, so I wanted to stick with our SAN for data protection. I understand that the end-to-end checks of ZFS make it better at detecting corruption.
>
> In my case, I can imagine that ZFS would FREEZE the whole volume when a single block or file is found to be corrupted.
>
> Ideally, I would not like this to happen and would instead like to get a log with the names of the corrupted files.
>
> What exactly happens when ZFS detects a corrupted block/file and does not have redundancy to correct it?
>
> Alex

I will repeat myself (as I sent the email below just yesterday...).

ZFS won't freeze a pool if a single block is corrupted, even if no redundancy is configured at the ZFS level. zpool status -v should provide you with a list of the affected files, which you should then be able to delete. In case the corrupted block contains metadata, ZFS should actually be able to fix it on the fly for you, as all metadata blocks are kept in at least two copies even if no redundancy is configured at the pool level.

Let's test it:

milek@r600:~# mkfile 128m file1
milek@r600:~# zpool create test `pwd`/file1
milek@r600:~# zpool status test
  pool: test
 state: ONLINE
 scrub: none requested
config:

        NAME                        STATE     READ WRITE CKSUM
        test                        ONLINE       0     0     0
          /export/home/milek/file1  ONLINE       0     0     0

errors: No known data errors
milek@r600:~#
milek@r600:~# cp /bin/bash /test/file1
milek@r600:~# cp /bin/bash /test/file2
milek@r600:~# cp /bin/bash /test/file3
milek@r600:~# cp /bin/bash /test/file4
milek@r600:~# cp /bin/bash /test/file5
milek@r600:~# cp /bin/bash /test/file6
milek@r600:~# cp /bin/bash /test/file7
milek@r600:~# cp /bin/bash /test/file8
milek@r600:~# cp /bin/bash /test/file9
milek@r600:~# sync
milek@r600:~# dd if=/dev/zero of=file1 seek=50 count=10000 conv=notrunc
10000+0 records in
10000+0 records out
5120000 bytes (5.1 MB) copied, 0.179617 s, 28.5 MB/s
milek@r600:~# sync
milek@r600:~# zpool scrub test
milek@r600:~# zpool status -v test
  pool: test
 state: DEGRADED
status: One or more devices has experienced an error resulting in data
        corruption. Applications may be affected.
action: Restore the file in question if possible. Otherwise restore the
        entire pool from backup.
   see: http://www.sun.com/msg/ZFS-8000-8A
 scrub: scrub completed after 0h0m with 7 errors on Thu Feb 4 00:18:40 2010
config:

        NAME                        STATE     READ WRITE CKSUM
        test                        DEGRADED     0     0     7
          /export/home/milek/file1  DEGRADED     0     0    29  too many errors

errors: Permanent errors have been detected in the following files:

        /test/file1

milek@r600:~#
milek@r600:~# rm /test/file1
milek@r600:~# sync
milek@r600:~# zpool scrub test
milek@r600:~# zpool status -v test
  pool: test
 state: DEGRADED
status: One or more devices has experienced an unrecoverable error. An
        attempt was made to correct the error. Applications are unaffected.
action: Determine if the device needs to be replaced, and clear the errors
        using 'zpool clear' or replace the device with 'zpool replace'.
   see: http://www.sun.com/msg/ZFS-8000-9P
 scrub: scrub completed after 0h0m with 0 errors on Thu Feb 4 00:19:55 2010
config:

        NAME                        STATE     READ WRITE CKSUM
        test                        DEGRADED     0     0     7
          /export/home/milek/file1  DEGRADED     0     0    29  too many errors

errors: No known data errors
milek@r600:~# zpool clear test
milek@r600:~# zpool scrub test
milek@r600:~# zpool status -v test
  pool: test
 state: ONLINE
 scrub: scrub completed after 0h0m with 0 errors on Thu Feb 4 00:20:12 2010
config:

        NAME                        STATE     READ WRITE CKSUM
        test                        ONLINE       0     0     0
          /export/home/milek/file1  ONLINE       0     0     0

errors: No known data errors
milek@r600:~#
milek@r600:~# ls -la /test/
total 7191
drwxr-xr-x  2 root root     10 2010-02-04 00:19 .
drwxr-xr-x 28 root root     30 2010-02-04 00:17 ..
-r-xr-xr-x  1 root root 799040 2010-02-04 00:17 file2
-r-xr-xr-x  1 root root 799040 2010-02-04 00:17 file3
-r-xr-xr-x  1 root root 799040 2010-02-04 00:17 file4
-r-xr-xr-x  1 root root 799040 2010-02-04 00:17 file5
-r-xr-xr-x  1 root root 799040 2010-02-04 00:17 file6
-r-xr-xr-x  1 root root 799040 2010-02-04 00:17 file7
-r-xr-xr-x  1 root root 799040 2010-02-04 00:18 file8
-r-xr-xr-x  1 root root 799040 2010-02-04 00:18 file9
milek@r600:~#

--
Robert Milkowski
http://milek.blogspot.com
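[For completeness, a short sketch of how one might tear down the test pool used in the transcripts above once finished. zpool destroy is the standard command; the file path is the file-backed vdev created earlier in the test.]

  # Destroy the throwaway pool and remove its backing file.
  zpool destroy test
  rm /export/home/milek/file1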