Scott Aitken
2012-Jul-16  15:37 UTC
[zfs-discuss] Corrupted pool: I/O error and Bad exchange descriptor
Hi all,
this is a follow-up to the help I was soliciting with my corrupted pool.
The short story is that I can have no confidence in the labels on 2 of the 5
drives in my RAIDZ array, for various reasons.
There is even a possibility that one drive has the label of another (a mirroring
accident).
Anyhoo, for some odd reason, the drives finally mounted (they are actually
drive images on another ZFS pool which I have snapshotted).
When I imported the pool, ZFS complained that two of the datasets would not
mount, but the remainder did.
It seems that small files read OK (perhaps they are small enough to fit in a
single block, and hence are probably mirrored rather than striped, assuming my
understanding of what happens to small files is correct).
But on larger files I get:
root@openindiana-01:/ZP-8T-RZ1-01/incoming# cp httpd-error.log.zip /mnt2/
cp: reading `httpd-error.log.zip': I/O error
and on some directories:
root@openindiana-01:/ZP-8T-RZ1-01/usr# ls -al
ls: cannot access obj: Bad exchange descriptor
total 54
drwxr-xr-x  5 root root  5 2011-11-03 16:28 .
drwxr-xr-x 11 root root 11 2011-11-04 13:14 ..
??????????  ? ?    ?     ?                ? obj
drwxr-xr-x 68 root root 83 2011-10-30 01:00 ports
drwxr-xr-x 22 root root 31 2011-09-25 02:00 src
Here is the zpool status output:
root@openindiana-01:/ZP-8T-RZ1-01# zpool status
 pool: ZP-8T-RZ1-01
state: DEGRADED
status: One or more devices has experienced an error resulting in data
       corruption.  Applications may be affected.
action: Restore the file in question if possible.  Otherwise restore the
       entire pool from backup.
  see: http://www.sun.com/msg/ZFS-8000-8A
 scan: scrub in progress since Sat Nov  5 23:57:46 2011
   112G scanned out of 6.93T at 6.24M/s, 318h17m to go
   305M repaired, 1.57% done
config:
       NAME                      STATE     READ WRITE CKSUM
       ZP-8T-RZ1-01              DEGRADED     0     0  356K
         raidz1-0                DEGRADED     0     0  722K
           12339070507640025002  UNAVAIL      0     0     0  was /dev/lofi/2
           /dev/lofi/5           DEGRADED     0     0     0  too many errors (repairing)
           /dev/lofi/4           DEGRADED     0     0     0  too many errors (repairing)
           /dev/lofi/3           DEGRADED     0     0 74.4K  too many errors (repairing)
           /dev/lofi/1           DEGRADED     0     0     0  too many errors (repairing)
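As a rough sanity check on the scrub estimate above, the arithmetic can be sketched as follows. This only uses the rounded figures printed by zpool status, and it assumes binary units (1T = 1024G, 1G = 1024M), so the result can only approximate the reported 318h17m and 1.57%. The function name scrub_estimate is made up for this sketch.

```python
# Rough check of the scrub estimate from the rounded zpool status figures.
# Assumption: binary units (1T = 1024G, 1G = 1024M).

def scrub_estimate(scanned_gib, total_tib, rate_mib_s):
    """Return (percent_done, hours_left) for a scrub in progress."""
    total_gib = total_tib * 1024
    remaining_mib = (total_gib - scanned_gib) * 1024
    percent_done = 100.0 * scanned_gib / total_gib
    hours_left = remaining_mib / rate_mib_s / 3600
    return percent_done, hours_left


# Figures from the scan lines: 112G scanned out of 6.93T at 6.24M/s.
percent, hours = scrub_estimate(scanned_gib=112, total_tib=6.93, rate_mib_s=6.24)
print(f"{percent:.2f}% done, about {hours:.0f} hours to go")
```

At this scan rate the scrub would need roughly two weeks to finish, which is consistent with what zpool status reports.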
All those errors may be caused by one disk actually carrying the wrong label.
I'm not entirely sure.
Also, while it's complaining that /dev/lofi/2 is UNAVAIL, the device certainly
is available.
It's probably just not labelled with '12339070507640025002'.
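One way the wrong-label suspicion could be checked is sketched below. It assumes the vdev GUID in each device's label has already been read out by hand (for example with zdb -l on each lofi device). The function name find_label_problems and the placeholder GUIDs "A" to "D" are hypothetical; only 12339070507640025002 comes from the zpool status output. The sketch reports GUIDs that appear on more than one device and expected GUIDs that appear on none.

```python
from collections import defaultdict


def find_label_problems(device_guids, expected_guids):
    """Compare the GUIDs found in device labels with the GUIDs the pool expects.

    device_guids:   dict mapping device path -> GUID read from its label
    expected_guids: set of GUIDs listed in the pool configuration
    Returns (duplicates, missing).
    """
    by_guid = defaultdict(list)
    for dev, guid in device_guids.items():
        by_guid[guid].append(dev)
    # A GUID seen on two devices suggests one drive carries another's label.
    duplicates = {g: sorted(devs) for g, devs in by_guid.items() if len(devs) > 1}
    # An expected GUID seen on no device shows up as UNAVAIL in zpool status.
    missing = sorted(expected_guids - set(by_guid))
    return duplicates, missing


# Hypothetical example data: "A".."D" are placeholders, not real GUIDs.
labels = {
    "/dev/lofi/1": "A",
    "/dev/lofi/2": "B",
    "/dev/lofi/3": "C",
    "/dev/lofi/4": "D",
    "/dev/lofi/5": "B",
}
expected = {"A", "B", "C", "D", "12339070507640025002"}

duplicates, missing = find_label_problems(labels, expected)
print("duplicated GUIDs:", duplicates)
print("missing GUIDs:", missing)
```

With this made-up input, the placeholder GUID "B" is reported on two devices and 12339070507640025002 is reported as missing, which is the pattern a mirroring accident would be expected to leave behind.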
I'd love to get some of my data back.  Any recovery is a bonus.
If anyone is keen, I have enabled SSH into the OpenIndiana box
which I'm using to try to recover the pool, so if you'd like
to take a shot, please let me know.
Thanks in advance,
Scott