thr3ads.net - zfs discuss - [zfs-discuss] Permanent errors on two files [Dec 2009]

Gary Mills

2009-Dec-04 19:19 UTC

[zfs-discuss] Permanent errors on two files

I just noticed this today:

    # zpool status -v
      pool: space
     state: ONLINE
    status: One or more devices has experienced an error resulting in data
            corruption.  Applications may be affected.
    action: Restore the file in question if possible.  Otherwise restore the
            entire pool from backup.
       see: http://www.sun.com/msg/ZFS-8000-8A
     scrub: none requested
    config:
    
            NAME        STATE     READ WRITE CKSUM
            space       ONLINE       0     0     0
              c0t1d0    ONLINE       0     0     0
    
    errors: Permanent errors have been detected in the following files:
    
            space/dcc:<0x11e887>
            space/dcc:<0xba25aa>

The device here is a hardware mirror of two 146-gig SAS drives.
How can ZFS detect errors when it has no redundancy?  How do I
determine what files these are?  Will a scrub fix it?  This is a
production system, so I want to be careful.

It's running Solaris 10 5/09 s10x_u7wos_08 X86.

-- 
-Gary Mills-        -Unix Group-        -Computer and Network Services-

Cindy Swearingen

2009-Dec-04 21:52 UTC

Hi Gary,

To answer your questions, the hardware read some data and ZFS detected
a problem with the checksums in this dataset and reported this problem.
ZFS can do this regardless of ZFS redundancy.
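As a rough illustration of why detection does not need redundancy, the sketch below keeps a checksum apart from the data and recomputes it on read. It is only an analogy under stated assumptions: it uses a temporary file and the POSIX cksum utility, not ZFS's on-disk format or its actual checksum algorithms, and the name verify_block is made up for this example.

```shell
#!/bin/sh
# Analogy only: NOT how ZFS stores or verifies checksums internally.
workdir=$(mktemp -d)
block="$workdir/block"

# "Write" a block and keep its checksum separately from the data.
printf 'important data\n' > "$block"
stored_sum=$(cksum < "$block")

# verify_block is a hypothetical helper: recompute on read, compare to stored.
verify_block() {
    [ "$(cksum < "$1")" = "$2" ]
}

if verify_block "$block" "$stored_sum"; then before=ok; else before=corrupt; fi

# Simulate silent corruption of the only copy (one byte changed).
printf 'important dbta\n' > "$block"
if verify_block "$block" "$stored_sum"; then after=ok; else after=corrupt; fi

echo "before: $before, after: $after"
rm -rf "$workdir"
```

The mismatch is detected with a single copy, but nothing in the sketch can repair it; repair needs a second good copy, which is where mirroring at the ZFS level or copies=2 would come in.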

I don't think a scrub will fix these permanent errors, but it depends
on the corruption. If it's data, but not redundant and no copies=2,
then probably not. If it's metadata, then multiple copies exist, but
it depends on the extent of the corruption.

If space/dcc is a dataset, is it mounted? ZFS might not be able to
print the filenames if the dataset is not mounted, but I'm not sure
if this is why only object numbers are displayed.

The zpool status -v command will generally print out filenames, dnode
object numbers, or identify metadata corruption problems. These look
like object numbers, because they are large, rather than metadata
objects, but an expert will have to comment.
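The bracketed values in the error list are hexadecimal. A small sketch for viewing them in decimal, in case that is easier to compare with other tool output; the helper name hex_to_dec is an assumption for this example, not part of any ZFS tool.

```shell
#!/bin/sh
# hex_to_dec is a hypothetical helper: print each 0x... argument with its decimal value.
hex_to_dec() {
    for hex in "$@"; do
        printf '%s = %d\n' "$hex" "$hex"
    done
}

# Object numbers taken from the zpool status -v output in this thread.
converted=$(hex_to_dec 0x11e887 0xba25aa)
printf '%s\n' "$converted"
```

For the two values reported here this prints 1173639 and 12199338.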

You might be able to identify these object numbers with zdb, but
I'm not sure how to do that.

I would also check fmdump -eV to see how frequently the hardware
has had problems.

Cindy

Victor Latushkin

2009-Dec-04 22:52 UTC

On Dec 5, 2009, at 0:52, Cindy Swearingen <Cindy.Swearingen at Sun.COM> wrote:
> Hi Gary,
>
> To answer your questions, the hardware read some data and ZFS detected
> a problem with the checksums in this dataset and reported this problem.
> ZFS can do this regardless of ZFS redundancy.
>
> I don't think a scrub will fix these permanent errors, but it depends
> on the corruption. If it's data, but not redundant and no copies=2,
> then probably not. If it's metadata, then multiple copies exist, but
> it depends on the extent of the corruption.
>
> If space/dcc is a dataset, is it mounted? ZFS might not be able to
> print the filenames if the dataset is not mounted, but I'm not sure
> if this is why only object numbers are displayed.
>
> The zpool status -v command will generally print out filenames, dnode
> object numbers, or identify metadata corruption problems. These look
> like object numbers, because they are large, rather than metadata
> objects, but an expert will have to comment.

Yes, these are object numbers, and the most likely reason they are not
turned into filenames is that the corresponding files no longer exist.

So I'd run scrub another time. If the files are gone and there are no
other corruptions, scrub will reset the error log and zpool status
should become clean.

> You might be able to identify these object numbers with zdb, but
> I'm not sure how do that.

You can try to use zdb this way to check if these objects still exist:

    zdb -d space/dcc 0x11e887 0xba25aa

Victor

> I would also check fmdump -eV to see how frequent the hardware
> has had problems.
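With more than a couple of entries, the zdb command line can be assembled from the error list instead of typed by hand. The following is a minimal sketch that only prints the command (a dry run) and does not execute zdb; the function name build_zdb_commands and the sample text are assumptions, and the parsing relies on entries having the dataset:<0x...> form shown in this thread.

```shell
#!/bin/sh
# Sample error-list lines as printed by zpool status -v in this thread.
error_list='        space/dcc:<0x11e887>
        space/dcc:<0xba25aa>'

# build_zdb_commands is a hypothetical helper: group "dataset:<0xHEX>" entries
# by dataset and print one "zdb -d dataset obj..." line per dataset.
build_zdb_commands() {
    awk '
        match($0, /:<0x[0-9a-fA-F]+>/) {
            dataset = substr($0, 1, RSTART - 1)
            sub(/^[ \t]+/, "", dataset)
            # Skip the leading ":<" and drop the trailing ">".
            object = substr($0, RSTART + 2, RLENGTH - 3)
            if (!(dataset in objects)) order[++n] = dataset
            objects[dataset] = objects[dataset] " " object
        }
        END {
            for (i = 1; i <= n; i++) print "zdb -d " order[i] objects[order[i]]
        }
    '
}

zdb_cmds=$(printf '%s\n' "$error_list" | build_zdb_commands)
printf '%s\n' "$zdb_cmds"
```

On a live system the input would come from zpool status -v rather than a variable, and the printed line can be reviewed before running it by hand.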

Gary Mills

2009-Dec-07 02:25 UTC

On Fri, Dec 04, 2009 at 02:52:47PM -0700, Cindy Swearingen wrote:
> 
> If space/dcc is a dataset, is it mounted? ZFS might not be able to
> print the filenames if the dataset is not mounted, but I'm not sure
> if this is why only object numbers are displayed.

Yes, it's mounted and is quite an active filesystem.

> I would also check fmdump -eV to see how frequent the hardware
> has had problems.

That shows ZFS checksum errors in July, but nothing since that time.
There were also DIMM errors before that, starting in June.  We
replaced the failed DIMMs, also in July.  This is an X4450 with ECC
memory.  There were no disk errors reported.  I suppose we can blame
the memory.

-- 
-Gary Mills-        -Unix Group-        -Computer and Network Services-

Gary Mills

2009-Dec-07 02:29 UTC

On Sat, Dec 05, 2009 at 01:52:12AM +0300, Victor Latushkin wrote:
> On Dec 5, 2009, at 0:52, Cindy Swearingen <Cindy.Swearingen at Sun.COM>
> wrote:
> 
> >The zpool status -v command will generally print out filenames, dnode
> >object numbers, or identify metadata corruption problems. These look
> >like object numbers, because they are large, rather than metadata
> >objects, but an expert will have to comment.
> 
> Yes, these are object numbers, and the most likely reason they are not
> turned into filenames is that the corresponding files no longer exist.

That seems to be the case:

    # zdb -d space/dcc 0x11e887 0xba25aa
    Dataset space/dcc [ZPL], ID 21, cr_txg 19, 20.5G, 3672408 objects

> So I'd run scrub another time. If the files are gone and there are no
> other corruptions, scrub will reset the error log and zpool status
> should become clean.

That worked.  After the scrub, there are no errors reported.

> >You might be able to identify these object numbers with zdb, but
> >I'm not sure how do that.
> 
> You can try to use zdb this way to check if these objects still exist
> 
> zdb -d space/dcc 0x11e887 0xba25aa

-- 
-Gary Mills-        -Unix Group-        -Computer and Network Services-
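
To confirm the outcome described above in a script rather than by eye, the status text can be checked for the permanent-errors line. This is a sketch under assumptions: the function name has_permanent_errors is made up, the sample strings stand in for real zpool status -v output, and the match depends on the wording shown in this thread.

```shell
#!/bin/sh
# Sample text standing in for zpool status -v output before and after the scrub.
status_before='errors: Permanent errors have been detected in the following files:

        space/dcc:<0x11e887>
        space/dcc:<0xba25aa>'
status_after='errors: No known data errors'

# has_permanent_errors is a hypothetical helper: exit 0 if the status text on
# stdin contains the permanent-errors line, non-zero otherwise.
has_permanent_errors() {
    grep -q '^[[:space:]]*errors: Permanent errors'
}

if printf '%s\n' "$status_before" | has_permanent_errors; then before=dirty; else before=clean; fi
if printf '%s\n' "$status_after" | has_permanent_errors; then after=dirty; else after=clean; fi

echo "before scrub: $before, after scrub: $after"
```

On a live pool the input would be piped from zpool status -v space once the scrub started with zpool scrub space has finished.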
