Hi there!
Failure of one disk destroyed all my data. Physically removing the broken
disk caused zpool to show another disk twice, which led to the final
data corruption. Everything was fine until the broken disk c7t0d0 was
removed. Is there any way to remove the second c5t0d0 from the pool, which
should be the missing c7t0d0? How can I bring this pool back up in degraded
mode with the two working disks that are left?
Tomppa
Timeline:
*** Jan 2007 raidz1 zpool v was built using three LaCie 500 GB USB disks:
c5t0d0, c7t0d0 and c8t0d0
*** Jul 6 11:12 c7t0d0 fails
> Jul 6 11:12:29 iki scsi: [ID 107833 kern.warning] WARNING: /pci@8,700000/usb@2,2/storage@4/disk@0,0 (sd3):
> Jul 6 11:12:29 iki SCSI transport failed: reason 'timeout': retrying command
> Jul 6 11:12:33 iki scsi: [ID 107833 kern.warning] WARNING: /pci@8,700000/usb@2,2/storage@4/disk@0,0 (sd3):
> Jul 6 11:12:33 iki SCSI transport failed: reason 'tran_err': retrying command
> Jul 6 11:13:33 iki Error for Command: read(10) Error Level: Retryable
> Jul 6 11:13:33 iki scsi: [ID 107833 kern.notice] Requested Block: 90452642 Error Block: 90452642
> Jul 6 11:13:33 iki scsi: [ID 107833 kern.notice] Vendor: SAMSUNG Serial Number:
> Jul 6 11:13:33 iki scsi: [ID 107833 kern.notice] Sense Key: No Additional Sense
> Jul 6 11:13:33 iki scsi: [ID 107833 kern.notice] ASC: 0x0 (no additional sense info), ASCQ: 0x0, FRU: 0x0
> Jul 6 11:13:33 iki scsi: [ID 107833 kern.warning] WARNING: /pci@8,700000/usb@2,2/storage@4/disk@0,0 (sd3):
> Jul 6 11:13:33 iki Error for Command: read(10) Error Level: Retryable
> Jul 6 11:13:33 iki scsi: [ID 107833 kern.notice] Requested Block: 90452642 Error Block: 90452642
> Jul 6 11:13:33 iki scsi: [ID 107833 kern.notice] Vendor: SAMSUNG Serial Number:
> Jul 6 11:13:33 iki scsi: [ID 107833 kern.notice] Sense Key: No Additional Sense
> Jul 6 11:13:33 iki scsi: [ID 107833 kern.notice] ASC: 0x0 (no additional sense info), ASCQ: 0x0, FRU: 0x0
> Jul 6 11:13:33 iki scsi: [ID 107833 kern.warning] WARNING: /pci@8,700000/usb@2,2/storage@4/disk@0,0 (sd3):
*** Jul 8 03:35:26
resilver completed with 0 errors on Sun Jul 8 03:35:26 2007
*** Jul 8 12:00
> % zpool status
>   pool: v
>  state: ONLINE
> status: One or more devices has experienced an unrecoverable error. An
>         attempt was made to correct the error. Applications are
>         unaffected.
> action: Determine if the device needs to be replaced, and clear the
>         errors using 'zpool clear' or replace the device with
>         'zpool replace'.
>    see: http://www.sun.com/msg/ZFS-8000-9P
>  scrub: resilver completed with 0 errors on Sun Jul 8 03:35:26 2007
> config:
>
>         NAME        STATE     READ WRITE CKSUM
>         v           ONLINE       0     0   369
>           raidz1    ONLINE       0     0   369
>             c5t0d0  ONLINE       0     0     1
>             c7t0d0  ONLINE      29 2.54K   507
>             c8t0d0  ONLINE       0     0     0
>
> errors: No known data errors
> % format -e
> Searching for disks...done
>
>
> AVAILABLE DISK SELECTIONS:
>        0. c2t0d0 <SUN146G cyl 14087 alt 2 hd 24 sec 848>
>           /pci@8,600000/SUNW,qlc@4/fp@0,0/ssd@w21000000871ad9fd,0
>        1. c2t1d0 <SUN146G cyl 14087 alt 2 hd 24 sec 848>
>           /pci@8,600000/SUNW,qlc@4/fp@0,0/ssd@w21000000871d4348,0
>        2. c5t0d0 <SAMSUNG-HD501LJ-CR10-465.76GB>
>           /pci@8,700000/usb@2,2/storage@5/disk@0,0
>        3. c7t0d0 <drive not available>
>           /pci@8,700000/usb@2,2/storage@4/disk@0,0
>        4. c8t0d0 <SAMSUNG-HD501LJ-CR10-465.76GB>
>           /pci@8,700000/usb@2,2/storage@3/disk@0,0
> Specify disk (enter its number): ^D
>
> % iostat -En
> c5t0d0 Soft Errors: 900 Hard Errors: 0 Transport Errors: 0
> Vendor: SAMSUNG Product: HD501LJ Revision: CR10 Serial No:
> Size: 500.11GB <500107862016 bytes>
> Media Error: 0 Device Not Ready: 0 No Device: 0 Recoverable: 0
> Illegal Request: 900 Predictive Failure Analysis: 0
> sd3 Soft Errors: 4258 Hard Errors: 2 Transport Errors: 2698
> Vendor: SAMSUNG Product: HD501LJ Revision: CR10 Serial No:
> Size: 500.11GB <500107862016 bytes>
> Media Error: 0 Device Not Ready: 0 No Device: 0 Recoverable: 0
> Illegal Request: 787 Predictive Failure Analysis: 0
> c8t0d0 Soft Errors: 843 Hard Errors: 0 Transport Errors: 0
> Vendor: SAMSUNG Product: HD501LJ Revision: CR10 Serial No:
> Size: 500.11GB <500107862016 bytes>
> Media Error: 0 Device Not Ready: 0 No Device: 0 Recoverable: 0
> Illegal Request: 843 Predictive Failure Analysis: 0
*** Jul 8 14:12 c7t0d0 physically removed
> % zpool status
>   pool: v
>  state: ONLINE
> status: One or more devices has experienced an error resulting in data
>         corruption. Applications may be affected.
> action: Restore the file in question if possible. Otherwise restore the
>         entire pool from backup.
>    see: http://www.sun.com/msg/ZFS-8000-8A
>  scrub: resilver completed with 0 errors on Sun Jul 8 03:35:26 2007
> config:
>
>         NAME        STATE     READ WRITE CKSUM
>         v           ONLINE       0     0 9.96K
>           raidz1    ONLINE       0     0 9.96K
>             c5t0d0  ONLINE       0     0    40
>             c5t0d0  ONLINE      29 2.54K 1.35K
>             c8t0d0  ONLINE       0     0     0
>
> errors: 0 data errors, use '-v' for a list
*** Jul 10 09:30 zfs unmount v ; zpool export v ; zpool import v
which caused this panic:
> Jul 10 09:30:19 iki savecore: [ID 570001 auth.error] reboot after panic: assertion failed: dmu_read(os, smo->smo_object, offset, size, entry_map) == 0 (0x6 == 0x0), file: ../../common/fs/zfs/space_map.c, line: 307
*** Jul 10 14:37
> % zpool import
>   pool: v
>     id: 16534952157184541936
>  state: DEGRADED
> status: One or more devices contains corrupted data.
> action: The pool can be imported despite missing or damaged devices. The
>         fault tolerance of the pool may be compromised if imported.
>    see: http://www.sun.com/msg/ZFS-8000-4J
> config:
>
>         v           DEGRADED
>           raidz1    DEGRADED
>             c5t0d0  FAULTED   corrupted data
>             c5t0d0  ONLINE
>             c8t0d0  ONLINE
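
The duplicated c5t0d0 entry in the listings above can also be spotted mechanically. The following is only a minimal sketch, assuming a POSIX awk and that device names have the cXtYdZ form seen here. The function name find_dup_vdevs is made up for illustration and is not part of ZFS. It only reports a name that appears twice in the config; it does not repair anything.

```shell
# Hypothetical helper (not a ZFS command): read "zpool status" or
# "zpool import" output on stdin and print every cXtYdZ device name
# that is listed more than once.
find_dup_vdevs() {
    awk '
        # Strip a leading "> " so that mail-quoted output also works.
        { sub(/^>[ \t]*/, "") }
        # Only consider lines whose first field looks like a disk name.
        $1 ~ /^c[0-9]+t[0-9]+d[0-9]+$/ {
            # Print the name once, when its second occurrence is seen.
            if (++seen[$1] == 2) print $1
        }
    '
}
```

Usage would be something like `zpool status v | find_dup_vdevs`. Fed the Jul 8 14:12 listing above, it prints c5t0d0.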