Hey guys,
I had a ZFS raidz1 pool that was working fine until there was a power outage, and
now I'm getting this:
pool: tank
state: DEGRADED
status: One or more devices are faulted in response to IO failures.
action: Make sure the affected devices are connected, then run 'zpool clear'.
see: http://www.sun.com/msg/ZFS-8000-HC
scrub: scrub in progress for 0h49m, 0.00% done, 7059912h41m to go
config:
        NAME        STATE     READ WRITE CKSUM
        tank        DEGRADED    28     0     0
          raidz1-0  DEGRADED   183    13     6
            c6d0    ONLINE       0     0     0
            c7d0    DEGRADED     9   214    22  too many errors
            c7d1    DEGRADED    13    27     0  too many errors

errors: 28 data errors, use '-v' for a list
When I use the -v option to see the errors I get stuff like this:
<metadata>:<0x18>
<metadata>:<0x19>
...
So my question is, when the scrub is done, what should I do to fix the problems?
I think those 28 errors refer to 28 different files. Let's say I have no
problem deleting those files to save the rest of the drives; how do I go about
doing this?
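
In case it helps to be concrete, here is the rough sequence I had in mind,
assuming the errors end up pointing at real file paths rather than <metadata>
entries (the path below is just a made-up example):

# zpool status -v tank            (list the damaged files once the scrub finishes)
# rm /tank/some/damaged/file      (remove each affected file)
# zpool clear tank                (clear the error counters)
# zpool scrub tank                (re-scrub to confirm the pool is clean)

Is that the right general idea?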
Thanks.
--
This message posted from opensolaris.org
OK, so now I have no idea what to do. The scrub is not working either. The pool
is only 3x 1.5 TB drives, so it should not take this long. Does anyone know what
I should do next?
pool: tank
state: DEGRADED
status: One or more devices are faulted in response to IO failures.
action: Make sure the affected devices are connected, then run 'zpool clear'.
see: http://www.sun.com/msg/ZFS-8000-HC
scrub: scrub in progress for 39h40m, 0.00% done, 336791319h2m to go
config:
        NAME        STATE     READ WRITE CKSUM
        tank        DEGRADED    28     0     0
          raidz1-0  DEGRADED   183    13     6
            c6d0    ONLINE       0     0     0
            c7d0    DEGRADED     9   214    22  too many errors
            c7d1    DEGRADED    13    27     0  too many errors

errors: 28 data errors, use '-v' for a list
Thanks.
--
This message posted from opensolaris.org
Jim Klimov
2011-Jul-05 13:39 UTC
[zfs-discuss] I/O Currently Suspended Need Help Repairing
Hello,

Are you certain that after the outage your disks are indeed accessible?
* What does BIOS say?
* Are there any errors reported in "dmesg" output or the "/var/adm/messages" file?
* Does the "format" command return in a timely manner?
** Can you access and print the disk labels in the "format" command?
   (Select a disk by number, enter "p", enter "p" again.)

From the output below it seems that there is some hardware problem, like
connectivity (loose SATA cables, etc.) or the disks are fried. And with two
disks out of the 3-disk set degraded, the outlook is grim (if they are indeed
FUBAR). If it is just a connectivity problem with at least one of the disks,
you have a chance of the scrub salvaging the data.

So, more or less, start by doing what the status command said, in this case ;)

2011-07-04 23:52, zfsnoob4 wrote:
> Ok so now I have no idea what to do. The scrub is not working either. The
> pool is only 3x 1.5TB drives so it should not take so long. Does anyone know
> what I should do next?
>
>   pool: tank
>  state: DEGRADED
> status: One or more devices are faulted in response to IO failures.
> action: Make sure the affected devices are connected, then run 'zpool clear'.
>    see: http://www.sun.com/msg/ZFS-8000-HC
>  scrub: scrub in progress for 39h40m, 0.00% done, 336791319h2m to go
> config:
>
>         NAME        STATE     READ WRITE CKSUM
>         tank        DEGRADED    28     0     0
>           raidz1-0  DEGRADED   183    13     6
>             c6d0    ONLINE       0     0     0
>             c7d0    DEGRADED     9   214    22  too many errors
>             c7d1    DEGRADED    13    27     0  too many errors
>
> errors: 28 data errors, use '-v' for a list
>
> Thanks.

HTH,
//Jim Klimov
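
P.S. In case it is useful, here is a rough sketch of the checks I mean, assuming
a stock (Open)Solaris install (the grep pattern is only an example):

# dmesg | tail -100                              (recent kernel messages)
# egrep -i "error|timeout|sata" /var/adm/messages
# iostat -En                                     (per-device soft/hard/transport error counters)
# cfgadm -al                                     (are the disks still seen as connected/configured?)
# format                                         (select a disk by number, then "p", "p" to print its label)

If any of these hang or show a disk as missing, it is more likely a cabling or
controller problem than a ZFS one.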
Thanks for your help,
I did a zpool clear and now this happens:
pool: tank
state: ONLINE
status: One or more devices has experienced an error resulting in data
corruption. Applications may be affected.
action: Restore the file in question if possible. Otherwise restore the
entire pool from backup.
see: http://www.sun.com/msg/ZFS-8000-8A
scrub: scrub in progress for 61h58m, 0.00% done, 526074322h7m to go
config:
        NAME        STATE     READ WRITE CKSUM
        tank        ONLINE       0     0     0
          raidz1-0  ONLINE       0     0     0
            c6d0    ONLINE       0     0     0
            c7d0    ONLINE       0     0    12  15.5K repaired
            c7d1    ONLINE       0     0     0
errors: Permanent errors have been detected in the following files:
<metadata>:<0x0>
<metadata>:<0x1>
<metadata>:<0x3>
<metadata>:<0x14>
<metadata>:<0x15>
<metadata>:<0x16>
<metadata>:<0x18>
<metadata>:<0x19>
<metadata>:<0x1a>
<metadata>:<0x1b>
<metadata>:<0x1c>
<metadata>:<0x1d>
<metadata>:<0x1e>
<metadata>:<0x1f>
tank/server:<0x0>
tank/media:<0x0>
So there are two things going on. It says there is still a scrub in progress, but
it's at 0% after 61 hours. Is there a way to stop it? Should I do that?
Also, for the error files, how do I delete those files and clear the error?
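
(From what I can tell from the zpool(1M) man page, stopping the scrub would be
something like:

# zpool scrub -s tank

but I'm not sure whether stopping it is safe or sensible while the pool is in
this state.)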
Thanks again for your help.
--
This message posted from opensolaris.org
Reading through this page (http://dlc.sun.com/osol/docs/content/ZFSADMIN/gbbwl.html),
it seems like all I need to do is 'rm' the file. The problem is finding it in the
first place. Near the bottom of this page it says:

"If the damage is within a file data block, then the file can safely be removed,
thereby clearing the error from the system. The first step is to try to locate the
file by using the find command and specify the object number that is identified in
the zpool status output under DATASET/OBJECT/RANGE output as the inode number to
find. For example:

# find -inum 6"

But the find -inum command isn't working. Has anyone run into this?
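
For reference, here is roughly the kind of thing I've been trying (the object
number and mount point below are just examples, since my actual errors are
<metadata>:<0x...> and dataset:<0x0> entries):

# zpool status -v tank
# find /tank -inum 24 -print      (Solaris find seems to want an explicit path; 24 is only an example object number)

I also gather that zdb can dump a particular object for inspection, something
like "zdb -dddd tank/media 0", though I'm not sure that helps with the
<metadata> entries.

Thanks.
--
This message posted from opensolaris.org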