Hey guys,
I had a ZFS raidz1 pool that was working fine until there was a power outage, and
now I'm getting this:
pool: tank
state: DEGRADED
status: One or more devices are faulted in response to IO failures.
action: Make sure the affected devices are connected, then run 'zpool clear'.
see: http://www.sun.com/msg/ZFS-8000-HC
scrub: scrub in progress for 0h49m, 0.00% done, 7059912h41m to go
config:
        NAME        STATE     READ WRITE CKSUM
        tank        DEGRADED    28     0     0
          raidz1-0  DEGRADED   183    13     6
            c6d0    ONLINE       0     0     0
            c7d0    DEGRADED     9   214    22  too many errors
            c7d1    DEGRADED    13    27     0  too many errors

errors: 28 data errors, use '-v' for a list
When I use the -v option to see the errors I get stuff like this:
<metadata>:<0x18>
<metadata>:<0x19>
...
So my question is, when the scrub is done, what should I do to fix the problems?
I think those 28 errors refer to 28 different files. Let's say I have no
problem deleting those files to save the rest of the drives; how do I go about
doing this?
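
In case it helps to be concrete, here is the rough sequence I had in mind,
assuming the errors end up pointing at real file paths rather than <metadata>
entries (the path below is just a made-up example):

# zpool status -v tank            (list the damaged files once the scrub finishes)
# rm /tank/some/damaged/file      (remove each affected file)
# zpool clear tank                (clear the error counters)
# zpool scrub tank                (re-scrub to confirm the pool is clean)

Is that the right general idea?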
Thanks.
--
This message posted from opensolaris.org
OK, so now I have no idea what to do. The scrub is not working either. The pool
is only 3x 1.5 TB drives, so it should not take this long. Does anyone know what
I should do next?
pool: tank
state: DEGRADED
status: One or more devices are faulted in response to IO failures.
action: Make sure the affected devices are connected, then run 'zpool clear'.
see: http://www.sun.com/msg/ZFS-8000-HC
scrub: scrub in progress for 39h40m, 0.00% done, 336791319h2m to go
config:
        NAME        STATE     READ WRITE CKSUM
        tank        DEGRADED    28     0     0
          raidz1-0  DEGRADED   183    13     6
            c6d0    ONLINE       0     0     0
            c7d0    DEGRADED     9   214    22  too many errors
            c7d1    DEGRADED    13    27     0  too many errors

errors: 28 data errors, use '-v' for a list
Thanks.
--
This message posted from opensolaris.org
Jim Klimov
2011-Jul-05 13:39 UTC
[zfs-discuss] I/O Currently Suspended Need Help Repairing
Hello,

Are you certain that after the outage your disks are indeed accessible?
* What does BIOS say?
* Are there any errors reported in "dmesg" output or the "/var/adm/messages" file?
* Does the "format" command return in a timely manner?
** Can you access and print the disk labels in the "format" command?
   (Select a disk by number, enter "p", enter "p" again.)

From the output below it seems that there is some hardware problem, like
connectivity (loose SATA cables, etc.) or the disks are fried. And with two
disks out of the 3-disk set degraded, the outlook is grim (if they are indeed
FUBAR). If it is just a connectivity problem with at least one of the disks,
you have a chance of the scrub salvaging the data.

So, more or less, start by doing what the status command said, in this case ;)

2011-07-04 23:52, zfsnoob4 wrote:
> Ok so now I have no idea what to do. The scrub is not working either. The
> pool is only 3x 1.5TB drives so it should not take so long. Does anyone know
> what I should do next?
>
>   pool: tank
>  state: DEGRADED
> status: One or more devices are faulted in response to IO failures.
> action: Make sure the affected devices are connected, then run 'zpool clear'.
>    see: http://www.sun.com/msg/ZFS-8000-HC
>  scrub: scrub in progress for 39h40m, 0.00% done, 336791319h2m to go
> config:
>
>         NAME        STATE     READ WRITE CKSUM
>         tank        DEGRADED    28     0     0
>           raidz1-0  DEGRADED   183    13     6
>             c6d0    ONLINE       0     0     0
>             c7d0    DEGRADED     9   214    22  too many errors
>             c7d1    DEGRADED    13    27     0  too many errors
>
> errors: 28 data errors, use '-v' for a list
>
> Thanks.

HTH,
//Jim Klimov
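
P.S. In case it is useful, here is a rough sketch of the checks I mean, assuming
a stock (Open)Solaris install (the grep pattern is only an example):

# dmesg | tail -100                              (recent kernel messages)
# egrep -i "error|timeout|sata" /var/adm/messages
# iostat -En                                     (per-device soft/hard/transport error counters)
# cfgadm -al                                     (are the disks still seen as connected/configured?)
# format                                         (select a disk by number, then "p", "p" to print its label)

If any of these hang or show a disk as missing, it is more likely a cabling or
controller problem than a ZFS one.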
Thanks for your help,
I did a zpool clear and now this happens:
pool: tank
state: ONLINE
status: One or more devices has experienced an error resulting in data
corruption. Applications may be affected.
action: Restore the file in question if possible. Otherwise restore the
entire pool from backup.
see: http://www.sun.com/msg/ZFS-8000-8A
scrub: scrub in progress for 61h58m, 0.00% done, 526074322h7m to go
config:
        NAME        STATE     READ WRITE CKSUM
        tank        ONLINE       0     0     0
          raidz1-0  ONLINE       0     0     0
            c6d0    ONLINE       0     0     0
            c7d0    ONLINE       0     0    12  15.5K repaired
            c7d1    ONLINE       0     0     0
errors: Permanent errors have been detected in the following files:
<metadata>:<0x0>
<metadata>:<0x1>
<metadata>:<0x3>
<metadata>:<0x14>
<metadata>:<0x15>
<metadata>:<0x16>
<metadata>:<0x18>
<metadata>:<0x19>
<metadata>:<0x1a>
<metadata>:<0x1b>
<metadata>:<0x1c>
<metadata>:<0x1d>
<metadata>:<0x1e>
<metadata>:<0x1f>
tank/server:<0x0>
tank/media:<0x0>
So there are two things going on. It says there is still a scrub in progress, but
it's at 0% after 61 hours. Is there a way to stop it? Should I do that?
Also, for the error files, how do I delete those files and clear the error?
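
(From what I can tell from the zpool(1M) man page, stopping the scrub would be
something like:

# zpool scrub -s tank

but I'm not sure whether stopping it is safe or sensible while the pool is in
this state.)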
Thanks again for your help.
--
This message posted from opensolaris.org
Reading through this page (http://dlc.sun.com/osol/docs/content/ZFSADMIN/gbbwl.html),
it seems like all I need to do is 'rm' the file. The problem is finding it in the
first place. Near the bottom of this page it says:

"If the damage is within a file data block, then the file can safely be removed,
thereby clearing the error from the system. The first step is to try to locate the
file by using the find command and specify the object number that is identified in
the zpool status output under DATASET/OBJECT/RANGE output as the inode number to
find. For example:

# find -inum 6"

But the find -inum command isn't working. Has anyone run into this?
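
For reference, here is roughly the kind of thing I've been trying (the object
number and mount point below are just examples, since my actual errors are
<metadata>:<0x...> and dataset:<0x0> entries):

# zpool status -v tank
# find /tank -inum 24 -print      (Solaris find seems to want an explicit path; 24 is only an example object number)

I also gather that zdb can dump a particular object for inspection, something
like "zdb -dddd tank/media 0", though I'm not sure that helps with the
<metadata> entries.

Thanks.
--
This message posted from opensolaris.org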