Andrej Gortchivkin
2010-Apr-02 01:52 UTC
[zfs-discuss] RAID-Z with Permanent errors detected in files
Hi All,

I just came across a strange (well... at least for me) situation with ZFS and I hope you might be able to help me out. Recently I built a new machine from scratch for my storage needs, which include various CIFS / NFS and, most importantly, VMware ESX based operations (in conjunction with COMSTAR). The machine is based on fairly new hardware and runs x86 OpenSolaris B134 with a RAID-Z pool on top of 4 x 1TB SATA-2 Samsung HDDs, plus one additional HDD as a hot spare.

Yesterday one of the HDDs decided to produce some errors, and although I wasn't surprised by that, I was surprised to find permanent errors in some files.

Here is the output I got right after the resilvering:

--------------------------------------------------------------------------------------------------

  pool: ZPOOL_SAS_1234
 state: DEGRADED
status: One or more devices has experienced an error resulting in data
        corruption. Applications may be affected.
action: Restore the file in question if possible. Otherwise restore the
        entire pool from backup.
   see: http://www.sun.com/msg/ZFS-8000-8A
 scrub: resilver completed after 2h45m with 4 errors on Fri Apr  2 03:01:34 2010
config:

        NAME            STATE     READ WRITE CKSUM
        ZPOOL_SAS_1234  DEGRADED   381     0     0
          c7t0d0        ONLINE       0     0     0
          c7t1d0        ONLINE       0     0     0
          c7t2d0        ONLINE       0     0     0
          spare-3       DEGRADED   363     0     1
            c7t3d0      DEGRADED   381     0     3  too many errors
            c7t4d0      ONLINE       0     0   730  326G resilvered
        spares
          c7t4d0        INUSE     currently in use

errors: Permanent errors have been detected in the following files:

        /ZPOOL_SAS_1234/iSCSI/ESX/ESX_Cluster_01/LUN1_DATASTORE01
        /ZPOOL_SAS_1234/iSCSI/ESX/ESX_Cluster_01/LUN2_DATASTORE02
        /ZPOOL_SAS_1234/iSCSI/ESX/ESX_Cluster_01/LUN5_DATASTORE05

--------------------------------------------------------------------------------------------------

Although I'm sure the "c7t3d0" HDD is having some issues (obviously I'm about to replace it), I still don't understand why I would get corruption in those files, considering that all the other drives show zero problems in the READ, WRITE and CKSUM columns. Perhaps I'm missing something about the ZFS concept and its redundancy, but my understanding of RAID-Z is that it operates in a way similar to RAID-5, which should mean that if one HDD goes down for whatever reason, the data stored in my ZFS pool / datasets should remain unharmed thanks to the redundancy.

As additional information, the LUNX_DATASTOREXX entries are files which I exported via COMSTAR toward a few ESX machines. I'm not sure whether there is any relationship between the indicated errors and the way I use my storage box, but I can certainly tell that the files were under heavy load at the time of the HDD failure.

Any advice would be appreciated very much.

Cheers
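A note on reading that output: a pool with actual RAID-Z redundancy groups its member disks under a raidz1 vdev in "zpool status" (shown as raidz1 or raidz1-0 depending on the build) instead of listing them directly under the pool name. A minimal sketch of what the config section would look like with redundancy in place; the layout below is illustrative, not taken from the system above:

        NAME            STATE     READ WRITE CKSUM
        ZPOOL_SAS_1234  ONLINE       0     0     0
          raidz1        ONLINE       0     0     0
            c7t0d0      ONLINE       0     0     0
            c7t1d0      ONLINE       0     0     0
            c7t2d0      ONLINE       0     0     0
            c7t3d0      ONLINE       0     0     0
        spares
          c7t4d0        AVAIL

In the output above, by contrast, the four disks sit directly under the pool name, which is the signature of a plain stripe.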
Ian Collins
2010-Apr-02 01:58 UTC
[zfs-discuss] RAID-Z with Permanent errors detected in files
On 04/ 2/10 02:52 PM, Andrej Gortchivkin wrote:
> Although I'm sure the "c7t3d0" HDD is having some issues (obviously I'm
> about to replace it), I still don't understand why I would get corruption
> in those files, considering that all the other drives show zero problems
> in the READ, WRITE and CKSUM columns.

You don't appear to have any redundancy! How did you create the pool (should be in "zpool history")?

-- 
Ian.
Andrej Gortchivkin
2010-Apr-02 02:30 UTC
[zfs-discuss] RAID-Z with Permanent errors detected in files
I created the pool by using:

zpool create ZPOOL_SAS_1234 raidz c7t0d0 c7t1d0 c7t2d0 c7t3d0

However, now that you mention the lack of redundancy, I see where the problem is. I guess it will remain a mystery how this happened, since I'm very careful when entering the commands and I'm sure I didn't miss the "raidz" parameter.

Could this by any chance have been caused by some sort of bug in a previous release of OpenSolaris? I've been using the OS on this machine since around B11X (can't remember exactly, but maybe 119) and I've kept upgrading it constantly until now.

Thanks for the quick response and your help.

p.s. I guess I was too lazy to check the pool itself. No kidding, I should have started with the simple checks first :)
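For what it's worth, the raidz keyword is the only thing separating a parity group from a plain stripe in that command. A hedged sketch of the difference, reusing the device names above:

# Without the keyword: a dynamic stripe (RAID-0 style); losing any one
# disk loses data, which matches the status output posted earlier.
zpool create ZPOOL_SAS_1234 c7t0d0 c7t1d0 c7t2d0 c7t3d0

# With the keyword: single-parity RAID-Z; the pool survives one disk failure.
zpool create ZPOOL_SAS_1234 raidz c7t0d0 c7t1d0 c7t2d0 c7t3d0

# The hot spare seen in the status output would then be attached with:
zpool add ZPOOL_SAS_1234 spare c7t4d0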
Ian Collins
2010-Apr-02 02:35 UTC
[zfs-discuss] RAID-Z with Permanent errors detected in files
On 04/ 2/10 03:30 PM, Andrej Gortchivkin wrote:
> However, now that you mention the lack of redundancy, I see where the
> problem is. I guess it will remain a mystery how this happened, since
> I'm very careful when entering the commands and I'm sure I didn't miss
> the "raidz" parameter.

No, the command syntax has been there from the beginning.... Better luck next time!

-- 
Ian.
Lutz Schumann
2010-Apr-02 21:50 UTC
[zfs-discuss] RAID-Z with Permanent errors detected in files
> I guess it will remain a mystery how this happened, since I'm very
> careful when entering the commands and I'm sure I didn't miss the
> "raidz" parameter.

You can be sure by calling "zpool history".

Robert
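A minimal sketch of that check; the command is standard, but the timestamped output line below is hypothetical:

# Replays every command ever run against the pool, starting with the
# original "zpool create", so it shows whether "raidz" was really given:
zpool history ZPOOL_SAS_1234

# Hypothetical output if the keyword was omitted:
# 2009-08-15.10:12:01 zpool create ZPOOL_SAS_1234 c7t0d0 c7t1d0 c7t2d0 c7t3d0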