Kwang Whee Lee
2012-Jul-10 02:25 UTC
[zfs-discuss] Repairing Faulted ZFS pool and missing disks
Hello all,

I have been struggling with ZFS and my data on OpenSolaris 2009.06 and Solaris 11. Last month, my ZFS pool tank (configured as RAIDZ1) became unavailable, and 4 out of 6 SCSI disks could no longer be recognized by the OpenSolaris format command.

1) The four missing Seagate disks (1000.20GB) are c7t0d0, c7t1d0, c7t3d0, and c7t4d0.

root@MEL-SUN-X2270:~# format
Searching for disks...done

AVAILABLE DISK SELECTIONS:
       0. c7t2d0 <ATA-ST31000528AS-CC38-931.51GB>
          /pci@0,0/pci8086,340e@7/pci1000,3150@0/sd@2,0
       1. c7t5d0 <ATA-ST31000528AS-CC37-931.51GB>
          /pci@0,0/pci8086,340e@7/pci1000,3150@0/sd@5,0
       2. c9d0 <DEFAULT cyl 60797 alt 2 hd 255 sec 63>
          /pci@0,0/pci-ide@1f,2/ide@0/cmdk@0,0
       3. c10d0 <DEFAULT cyl 60797 alt 2 hd 255 sec 63>
          /pci@0,0/pci-ide@1f,2/ide@1/cmdk@0,0
       4. c10d1 <DEFAULT cyl 3888 alt 2 hd 255 sec 63>
          /pci@0,0/pci-ide@1f,2/ide@1/cmdk@1,0

root@MEL-SUN-X2270:~# iostat -E
cmdk0   Soft Errors: 0 Hard Errors: 0 Transport Errors: 0
Model: HITACHI HUA7250 Revision: Serial No: GTF402P6GUUS3F Size: 500.10GB <500101152768 bytes>
Media Error: 0 Device Not Ready: 0 No Device: 0 Recoverable: 0
Illegal Request: 0
cmdk1   Soft Errors: 0 Hard Errors: 0 Transport Errors: 0
Model: HITACHI HUA7250 Revision: Serial No: GTF402P6GUUGEF Size: 500.10GB <500101152768 bytes>
Media Error: 0 Device Not Ready: 0 No Device: 0 Recoverable: 0
Illegal Request: 0
cmdk2   Soft Errors: 0 Hard Errors: 0 Transport Errors: 0
Model: SSDSA2SH032G1SB Revision: Serial No: CVEM02830008032 Size: 32.00GB <31999500288 bytes>
Media Error: 0 Device Not Ready: 0 No Device: 0 Recoverable: 0
Illegal Request: 0
sd1     Soft Errors: 0 Hard Errors: 0 Transport Errors: 0
Vendor: ATA Product: ST31000528AS Revision: CC35 Serial No:
Size: 1000.20GB <1000204886016 bytes>
Media Error: 0 Device Not Ready: 0 No Device: 0 Recoverable: 0
Illegal Request: 0 Predictive Failure Analysis: 0
sd2     Soft Errors: 0 Hard Errors: 0 Transport Errors: 0
Vendor: ATA Product: ST31000528AS Revision: CC35 Serial No:
Size: 1000.20GB <1000204886016 bytes>
Media Error: 0 Device Not Ready: 0 No Device: 0 Recoverable: 0
Illegal Request: 0 Predictive Failure Analysis: 0
sd3     Soft Errors: 0 Hard Errors: 0 Transport Errors: 0
Vendor: ATA Product: ST31000528AS Revision: CC38 Serial No:
Size: 1000.20GB <1000204886016 bytes>
Media Error: 0 Device Not Ready: 0 No Device: 0 Recoverable: 0
Illegal Request: 0 Predictive Failure Analysis: 0
sd4     Soft Errors: 0 Hard Errors: 0 Transport Errors: 0
Vendor: ATA Product: ST31000528AS Revision: CC37 Serial No:
Size: 1000.20GB <1000204886016 bytes>
Media Error: 0 Device Not Ready: 0 No Device: 0 Recoverable: 0
Illegal Request: 0 Predictive Failure Analysis: 0
sd5     Soft Errors: 0 Hard Errors: 0 Transport Errors: 0
Vendor: ATA Product: ST31000528AS Revision: CC35 Serial No:
Size: 1000.20GB <1000204886016 bytes>
Media Error: 0 Device Not Ready: 0 No Device: 0 Recoverable: 0
Illegal Request: 0 Predictive Failure Analysis: 0
sd6     Soft Errors: 0 Hard Errors: 0 Transport Errors: 0
Vendor: ATA Product: ST31000528AS Revision: CC37 Serial No:
Size: 1000.20GB <1000204886016 bytes>
Media Error: 0 Device Not Ready: 0 No Device: 0 Recoverable: 0
Illegal Request: 0 Predictive Failure Analysis: 0

root@MEL-SUN-X2270:~# zpool status -v
  pool: rpool
 state: ONLINE
 scrub: none requested
config:

        NAME         STATE     READ WRITE CKSUM
        rpool        ONLINE       0     0     0
          mirror     ONLINE       0     0     0
            c9d0s0   ONLINE       0     0     0
            c10d0s0  ONLINE       0     0     0

errors: No known data errors

  pool: tank
 state: UNAVAIL
status: One or more devices could not be opened. There are insufficient
        replicas for the pool to continue functioning.
action: Attach the missing device and online it using 'zpool online'.
   see: http://www.sun.com/msg/ZFS-8000-3C
 scrub: none requested
config:

        NAME        STATE     READ WRITE CKSUM
        tank        UNAVAIL      0     0     0  insufficient replicas
          raidz1    UNAVAIL      0     0     0  insufficient replicas
            c7t0d0  UNAVAIL      0     0     0  cannot open
            c7t1d0  UNAVAIL      0     0     0  cannot open
            c7t2d0  ONLINE       0     0     0
            c7t3d0  UNAVAIL      0     0     0  cannot open
            c7t4d0  UNAVAIL      0     0     0  cannot open
            c7t5d0  ONLINE       0     0     0

  pool: temppool1
 state: UNAVAIL
status: One or more devices could not be opened. There are insufficient
        replicas for the pool to continue functioning.
action: Attach the missing device and online it using 'zpool online'.
   see: http://www.sun.com/msg/ZFS-8000-3C
 scrub: none requested
config:

        NAME         STATE     READ WRITE CKSUM
        temppool1    UNAVAIL      0     0     0  insufficient replicas
          c13t0d0    UNAVAIL      0     0     0  cannot open

2) I did try power-cycling my SunFire X2270 server and the J4200 array. The tank pool remained faulted and the disks were still not visible to OpenSolaris. I got the same result when I tried the command #zpool status -x tank.

Below are the warning messages I get.

Jun 18 22:30:37 MEL-SUN-X2270 scsi: [ID 243001 kern.warning] WARNING: /pci@0,0/pci8086,340e@7/pci1000,3150@0 (mpt0):
Jun 18 22:30:37 MEL-SUN-X2270  mpt_handle_event_sync: IOCStatus=0x8000, IOCLogInfo=0x31110700
Jun 18 22:30:37 MEL-SUN-X2270 scsi: [ID 243001 kern.warning] WARNING: /pci@0,0/pci8086,340e@7/pci1000,3150@0 (mpt0):
Jun 18 22:30:37 MEL-SUN-X2270  mpt_handle_event: IOCStatus=0x8000, IOCLogInfo=0x31110700
Jun 18 22:30:39 MEL-SUN-X2270 scsi: [ID 365881 kern.info] /pci@0,0/pci8086,340e@7/pci1000,3150@0 (mpt0):
Jun 18 22:30:39 MEL-SUN-X2270  Log info 0x31110700 received for target 1.
Jun 18 22:30:39 MEL-SUN-X2270  scsi_status=0x0, ioc_status=0x804b, scsi_state=0xc
Jun 18 22:30:39 MEL-SUN-X2270 scsi: [ID 365881 kern.info] /pci@0,0/pci8086,340e@7/pci1000,3150@0 (mpt0):
Jun 18 22:30:39 MEL-SUN-X2270  Log info 0x31110700 received for target 1.
Jun 18 22:30:39 MEL-SUN-X2270  scsi_status=0x0, ioc_status=0x804b, scsi_state=0xc
Jun 18 22:30:39 MEL-SUN-X2270 scsi: [ID 365881 kern.info] /pci@0,0/pci8086,340e@7/pci1000,3150@0 (mpt0):
Jun 18 22:30:39 MEL-SUN-X2270  Log info 0x31110700 received for target 1.
messages.1

3) I installed Solaris 11 on another disk and tried running the format command. I am able to see all the disks within the array now.

root@solaris11:~# format
Searching for disks...done

AVAILABLE DISK SELECTIONS:
       0. c3t0d0 <ATA-ST31000528AS-CC35-931.51GB>
          /pci@0,0/pci8086,340e@7/pci1000,3150@0/sd@0,0
          /dev/chassis/SUN-Storage-J4200.0939QAJ006/SCSI_Device__0/disk
       1. c3t1d0 <ATA-ST31000528AS-CC35-931.51GB>
          /pci@0,0/pci8086,340e@7/pci1000,3150@0/sd@1,0
          /dev/chassis/SUN-Storage-J4200.0939QAJ006/SCSI_Device__1/disk
       2. c3t2d0 <ATA-ST31000528AS-CC38-931.51GB>
          /pci@0,0/pci8086,340e@7/pci1000,3150@0/sd@2,0
          /dev/chassis/SUN-Storage-J4200.0939QAJ006/SCSI_Device__2/disk
       3. c3t3d0 <ATA-ST31000528AS-CC49 cyl 60797 alt 2 hd 255 sec 126>
          /pci@0,0/pci8086,340e@7/pci1000,3150@0/sd@3,0
          /dev/chassis/SUN-Storage-J4200.0939QAJ006/SCSI_Device__3/disk
       4. c3t4d0 <ATA-ST31000528AS-CC35-931.51GB>
          /pci@0,0/pci8086,340e@7/pci1000,3150@0/sd@4,0
          /dev/chassis/SUN-Storage-J4200.0939QAJ006/SCSI_Device__4/disk
       5. c3t5d0 <ATA-ST31000528AS-CC37-931.51GB>
          /pci@0,0/pci8086,340e@7/pci1000,3150@0/sd@5,0
          /dev/chassis/SUN-Storage-J4200.0939QAJ006/SCSI_Device__5/disk
       6. c5d0 <MCBQE64G- SE752X069-0001 cyl 7780 alt 2 hd 255 sec 63>
          /pci@0,0/pci-ide@1f,2/ide@0/cmdk@0,0

4) I tried to import the zpool from the new Solaris 11 install; below is the result.

root@solaris11:~# zpool import
  pool: tank
    id: 4902002746703797589
 state: FAULTED
status: The pool was last accessed by another system.
action: The pool cannot be imported due to damaged devices or data.
        The pool may be active on another system, but can be imported using
        the '-f' flag.
   see: http://www.sun.com/msg/ZFS-8000-EY
config:

        tank          FAULTED  corrupted data
          raidz1-0    DEGRADED
            c3t0d0    ONLINE
            c3t1d0    ONLINE
            c3t2d0    ONLINE
            c7t3d0    UNAVAIL  cannot open
            c3t4d0    ONLINE
            c3t5d0    ONLINE

5) Then I ran the command #zpool import -f tank; it fails with the error below.

root@solaris11:~# zpool import -f tank
cannot import 'tank': I/O error
        Destroy and re-create the pool from
        a backup source.

What can I do next to bring back my zpool, as I have some important VMs stored on this tank?

Your input and advice is highly appreciated!

Regards,
Lee
Gregg Wonderly
2012-Jul-10 03:33 UTC
[zfs-discuss] Repairing Faulted ZFS pool and missing disks
First, try unplugging and reconnecting the cabling on your drives. There may be some corrosion on the connectors caused by oxidation.

If that doesn't change anything, then try disconnecting the missing drives one at a time and see if the pool will come up with one disk missing. One of the drives may be creating a voltage/signaling problem if it has an internal circuit failure of a capacitor or resistor.

If format can see the drives at some point, but the pool will not import, try using "zpool clear" to remove the "permanent failure indicated" errors from the pool, and then the pool should import. Run a scrub to check the integrity of the pool if you don't have the ability to copy it elsewhere at that point.

Usually when a pool fails like this, I'd be tempted to copy it to other devices, and then I would really not use the failed devices any longer, unless I could put them on another controller and see them work correctly.

I recently had a 4x2TB mirrored pool show up with several failed drives. It turned out to be the controller. After replacing the controller, I could bring the pool back up and it scrubbed 3 times without problems, so it is fine for now; I have another pool with the same data on it that I can recover from if needed.

Gregg
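A minimal sketch of the sequence suggested above, assuming the pool is named tank and the devices have become visible again; note that zpool clear operates on an imported pool, so in practice the import has to succeed first:

# once the devices are visible, try the import, then clear the latched
# error states and verify the data with a scrub
zpool import -f tank
zpool clear tank
zpool scrub tank
zpool status -v tank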
Kwang Whee Lee
2012-Jul-10 04:18 UTC
[zfs-discuss] Repairing Faulted ZFS pool and missing disks
Hi Gregg,

Thanks for your prompt reply. The issue here is that the state of the pool is shown as "UNAVAIL", and it returns "cannot open 'tank': no such pool" when I try #zpool clear tank.

The original operating system running on my X2270 host HDD is OpenSolaris 2009.06. Checking the messages file:

Jun 29 00:54:43 MEL-SUN-X2270 genunix: [ID 751201 kern.notice] NOTICE: One or more I/O devices have been retired

This is when four of the array disks became invisible.

If I install another OpenSolaris OS on a new hard disk within the host, all six disks in the array become visible (via #format). From here, I try to import the pool but it fails.

"Usually when a pool fails like this, I'd be tempted to copy it to other devices, and then I would really not use the failed devices any longer, unless I could put them on another controller and see them work correctly"

May I know how you copy it to other devices?

Does anyone know how to un-retire / repair the fault from OpenSolaris? If the disks are visible to ZFS, then hopefully the command #zpool import tank will work.

Regards,
Kwang Whee Lee
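On the un-retire question: a hedged sketch, assuming the device retirement is recorded as an FMA fault on the original 2009.06 install; exact fmadm subcommands and output differ between releases, and the UUID below is a placeholder to be taken from the fmadm faulty listing:

# list outstanding faults and note the UUID/FMRI of the affected devices
fmadm faulty

# mark the fault as repaired so the retired devices can be reattached
fmadm repair <uuid-from-fmadm-faulty>

# rebuild the device tree and re-scan the attached targets afterwards
devfsadm -Cv
cfgadm -al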
Gregg Wonderly
2012-Jul-10 19:06 UTC
[zfs-discuss] Repairing Faulted ZFS pool and missing disks
On Jul 9, 2012, at 11:18 PM, Kwang Whee Lee wrote:

> Hi Gregg,
>
> Thanks for your prompt reply. The issue here is that the state of the pool is shown as "UNAVAIL", and it returns "cannot open 'tank': no such pool" when I try #zpool clear tank.

Was this on the 2009.06 or the new version of Solaris?

> The original operating system running on my X2270 host HDD is OpenSolaris 2009.06. Checking the messages file:
>
> Jun 29 00:54:43 MEL-SUN-X2270 genunix: [ID 751201 kern.notice] NOTICE: One or more I/O devices have been retired
>
> This is when four of the array disks became invisible.
>
> If I install another OpenSolaris OS on a new hard disk within the host, all six disks in the array become visible (via #format). From here, I try to import the pool but it fails.

You said that you did the following:

> root@solaris11:~# zpool import
>   pool: tank
>     id: 4902002746703797589
>  state: FAULTED
> status: The pool was last accessed by another system.
> action: The pool cannot be imported due to damaged devices or data.
>         The pool may be active on another system, but can be imported using
>         the '-f' flag.
>    see: http://www.sun.com/msg/ZFS-8000-EY
> config:
>
>         tank          FAULTED  corrupted data
>           raidz1-0    DEGRADED
>             c3t0d0    ONLINE
>             c3t1d0    ONLINE
>             c3t2d0    ONLINE
>             c7t3d0    UNAVAIL  cannot open
>             c3t4d0    ONLINE
>             c3t5d0    ONLINE
>
> 5) Then I ran the command #zpool import -f tank; it fails with the error below.
>
> root@solaris11:~# zpool import -f tank
> cannot import 'tank': I/O error
>         Destroy and re-create the pool from
>         a backup source.

So, at this point, did you try "zpool clear tank" and then do the import? What happens if you remove c7t3d0 from the system and do the import? If this is a raidz1, then it should work with one drive out of the pool. If uncabling doesn't work, put another drive in the c7t3d0 slot.

What zpool is trying to do here is not start the pool when you've made a mistake. So, since it can't talk to some drives, it won't start the pool. It's being a little too extreme in its diagnosis of the failures, I think.

> "Usually when a pool fails like this, I'd be tempted to copy it to other devices, and then I would really not use the failed devices any longer, unless I could put them on another controller and see them work correctly"
>
> May I know how you copy it to other devices?

If a drive is accessible, I plug in another of the same size and use "dd if=XXXX of=YYYY bs=10240000" to copy the disk at the surface level. If dd cannot read the disk, then you are out of luck. I can't overstate the need to have copies of things somewhere. If you don't have another pool, and you have space, set copies=2 on the pool and ZFS will put copies of files on other drives, so that if you lose some blocks in a catastrophic way, it can use the copies to recover.

> Does anyone know how to un-retire / repair the fault from OpenSolaris? If the disks are visible to ZFS, then hopefully the command #zpool import tank will work.

You need to be at a point where all of your drives are recognized by the OS as functioning disks. Whether they are all the correct disks, or not, is the trick. When a single device hard-fails, you must put a functioning disk in that slot, and then the pool will import. Once it is imported, you can use "zpool replace tank c7t3d0", for example, to cause ZFS to rebuild the pool onto that new/replacement disk. Without doing the "zpool replace", your pool will be in a degraded state, and any further failures will compromise the integrity of your pool.
Using copies= on the pool can help recover from that situation, though.

Gregg Wonderly
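A hedged sketch of the copy-and-replace steps described above. The source and destination device names are placeholders, dd assumes the target disk is at least as large as the source, and copies is a dataset property that only affects data written after it is set:

# surface-level copy of a still-readable member disk onto a spare of the
# same size (raw whole-disk devices; substitute your own controller/target numbers)
dd if=/dev/rdsk/c7t3d0p0 of=/dev/rdsk/c8t0d0p0 bs=10240000

# once the pool imports in a degraded state, resilver onto the new disk
zpool replace tank c7t3d0

# keep two copies of newly written blocks to ride out future block loss
zfs set copies=2 tank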
Kwang Whee Lee
2012-Jul-11 06:43 UTC
[zfs-discuss] Repairing Faulted ZFS pool and missing disks
Hi Gregg,

Regarding your questions, below are my answers:

1) Was this on the 2009.06 or the new version of Solaris?

This is on the new version, Solaris 11.

2) So, at this point, did you try "zpool clear tank" and then do the import? What happens if you remove c7t3d0 from the system and do the import? If this is a raidz1, then it should work with one drive out of the pool. If uncabling doesn't work, put another drive in the c7t3d0 slot.

I did try zpool clear tank on the freshly installed Solaris 11, but it returns "cannot open 'tank': no such pool". If I remove c7t3d0 from the array and run "# zpool import -f tank", I get the same result as mentioned earlier. I did try to put another drive in the same c7t3d0 slot but still could not import the tank.

root@solaris11:~# zpool import -f tank
cannot import 'tank': I/O error
        Destroy and re-create the pool from
        a backup source.

3) You need to be at a point where all of your drives are recognized by the OS as functioning disks. Whether they are all the correct disks, or not, is the trick. When a single device hard-fails, you must put a functioning disk in that slot, and then the pool will import. Once it is imported, you can use "zpool replace tank c7t3d0", for example, to cause ZFS to rebuild the pool onto that new/replacement disk. Without doing the "zpool replace", your pool will be in a degraded state, and any further failures will compromise the integrity of your pool. Using copies= on the pool can help recover from that situation, though.

The ZFS pool is raidz1. Should I format the new replacement disk using fdisk? How can I change the label of the new disk to EFI? Please advise, thank you!

Regards,
Kwang Whee Lee
What happens if you remove c7t3d0 from the system and do the import? If this is a raidz1, then it should work with one drive out of the pool. If uncabling doesn''t work, put another, drive in the c3t7d0 slot. What ZPOOL is trying to do here, is not "start the pool" when you''ve made a mistake. So, since it can''t talk to some drives, it won''t start the pool. It''s being a little too extreme in its diagnosis of the failures I think. "Usually when a pool fails like this, I''d be tempted to copy it to other devices, and then I would really not use the failed devices any longer, unless I could put them on another controller and see them work correctly" May I know how you copy it to other devices? If a drive is accessible. I plug in another of the same size, and use "dd if=XXXX of=YYYY bs=10240000" to copy the disk, at the surface level. If dd can not read the disk, then you are out of luck. I can''t overstate the need to have copies of things somewhere. If you don''t have another pool, and you have space, set copies=2 on the pool and ZFS will put copies of files on other drives so that if you lose some blocks in a catastrophic way, it can use the copies to recover. Is anyone know how to un-retire /repair the fault from OpenSolaris? If the disks are visible to the ZFS, then hopefully this command #import zpool tank will work. You need to be at a point where all of your drives are recognized by the OS as functioning disks. Whether they are all the correct disks, or not, is the trick. When a single device, hard fails, you must put a functioning disk in that slot, and then the pool will import. Once it is imported, then you can use "zpool replace tank c7t3d0" for example to cause ZFS to rebuild the pool onto that new/replacement disk. Without doing the "zpool replace", your pool will be in a degraded state, and any further failures will compromise the integrity of your pool. Using copies= on the pool, can help recover from that situation though. Gregg Wonderly Regards, Kwang Whee Lee From: Gregg Wonderly [mailto:gregg at wonderly.org]<mailto:[mailto:gregg at wonderly.org]> Sent: Tuesday, 10 July 2012 1:34 PM To: Kwang Whee Lee Cc: zfs-discuss at opensolaris.org<mailto:zfs-discuss at opensolaris.org> Subject: Re: [zfs-discuss] Repairing Faulted ZFS pool and missing disks First, try unplugging and reconnecting the cabling on your drives. There may be some corrosion on the connectors caused by oxidation. If that doesn''t change anything, then next try disconnecting one of the missing drives at a time and see if the pool will come up with one disk missing. One of the drives may be creating a voltage/signaling problem if it has an internal circuit failure of a Capacitor or Resistor. If format can see the drives at some point, but the pool will not import, try using "zpool clear" to remove the "permanent failure indicated" errors from the pool and then the pool should import. Run a scrub to check the integrity of the pool if you don''t have the ability to "copy it elsewhere" at that point. Usually when a pool fails like this, I''d be tempted to copy it to other devices, and then I would really not use the failed devices any longer, unless I could put them on another controller and see them work correctly. I recently had a 4x2tb mirrored pool show up with several failed drives. It turned out to be the controller. 
After replacing the controller, I could bring the pool back up and it scrubbed 3 times without problems, so it is fine for now, I have another pool with the same data on it that I can recover from if needed. Gregg On Jul 9, 2012, at 9:25 PM, Kwang Whee Lee wrote: Hello all, I have been struggled with ZFS and my data on the OpenSolaris 2009.06 and Solaris 11. Last month, my ZFS pool tank (with RAIDz1 configured) became unavailable and 4 out of 6 SCSI disks could not be recognized by OpenSolaris #format command. 1) The four missing Seagate disks (1000.20GB) are c7t0d0, c7t1d0, c7t3d0, and c7t4d0. root at MEL-SUN-X2270:~# format Searching for disks...done AVAILABLE DISK SELECTIONS: 0. c7t2d0 <ATA-ST31000528AS-CC38-931.51GB> /pci at 0,0/pci8086,340e at 7/pci1000,3150 at 0/sd at 2,0 1. c7t5d0 <ATA-ST31000528AS-CC37-931.51GB> /pci at 0,0/pci8086,340e at 7/pci1000,3150 at 0/sd at 5,0 2. c9d0 <DEFAULT cyl 60797 alt 2 hd 255 sec 63> /pci at 0,0/pci-ide at 1f,2/ide at 0/cmdk at 0,0 3. c10d0 <DEFAULT cyl 60797 alt 2 hd 255 sec 63> /pci at 0,0/pci-ide at 1f,2/ide at 1/cmdk at 0,0 4. c10d1 <DEFAULT cyl 3888 alt 2 hd 255 sec 63> /pci at 0,0/pci-ide at 1f,2/ide at 1/cmdk at 1,0 root at MEL-SUN-X2270:~# iostat -E cmdk0 Soft Errors: 0 Hard Errors: 0 Transport Errors: 0 Model: HITACHI HUA7250 Revision: Serial No: GTF402P6GUUS3F Size: 500.10GB <500101152768 bytes> Media Error: 0 Device Not Ready: 0 No Device: 0 Recoverable: 0 Illegal Request: 0 cmdk1 Soft Errors: 0 Hard Errors: 0 Transport Errors: 0 Model: HITACHI HUA7250 Revision: Serial No: GTF402P6GUUGEF Size: 500.10GB <500101152768 bytes> Media Error: 0 Device Not Ready: 0 No Device: 0 Recoverable: 0 Illegal Request: 0 cmdk2 Soft Errors: 0 Hard Errors: 0 Transport Errors: 0 Model: SSDSA2SH032G1SB Revision: Serial No: CVEM02830008032 Size: 32.00GB <31999500288 bytes> Media Error: 0 Device Not Ready: 0 No Device: 0 Recoverable: 0 Illegal Request: 0 sd1 Soft Errors: 0 Hard Errors: 0 Transport Errors: 0 Vendor: ATA Product: ST31000528AS Revision: CC35 Serial No: Size: 1000.20GB <1000204886016 bytes> Media Error: 0 Device Not Ready: 0 No Device: 0 Recoverable: 0 Illegal Request: 0 Predictive Failure Analysis: 0 sd2 Soft Errors: 0 Hard Errors: 0 Transport Errors: 0 Vendor: ATA Product: ST31000528AS Revision: CC35 Serial No: Size: 1000.20GB <1000204886016 bytes> Media Error: 0 Device Not Ready: 0 No Device: 0 Recoverable: 0 Illegal Request: 0 Predictive Failure Analysis: 0 sd3 Soft Errors: 0 Hard Errors: 0 Transport Errors: 0 Vendor: ATA Product: ST31000528AS Revision: CC38 Serial No: Size: 1000.20GB <1000204886016 bytes> Media Error: 0 Device Not Ready: 0 No Device: 0 Recoverable: 0 Illegal Request: 0 Predictive Failure Analysis: 0 sd4 Soft Errors: 0 Hard Errors: 0 Transport Errors: 0 Vendor: ATA Product: ST31000528AS Revision: CC37 Serial No: Size: 1000.20GB <1000204886016 bytes> Media Error: 0 Device Not Ready: 0 No Device: 0 Recoverable: 0 Illegal Request: 0 Predictive Failure Analysis: 0 sd5 Soft Errors: 0 Hard Errors: 0 Transport Errors: 0 Vendor: ATA Product: ST31000528AS Revision: CC35 Serial No: Size: 1000.20GB <1000204886016 bytes> Media Error: 0 Device Not Ready: 0 No Device: 0 Recoverable: 0 Illegal Request: 0 Predictive Failure Analysis: 0 sd6 Soft Errors: 0 Hard Errors: 0 Transport Errors: 0 Vendor: ATA Product: ST31000528AS Revision: CC37 Serial No: Size: 1000.20GB <1000204886016 bytes> Media Error: 0 Device Not Ready: 0 No Device: 0 Recoverable: 0 Illegal Request: 0 Predictive Failure Analysis: 0 root at MEL-SUN-X2270:~# zpool status -v pool: 
Kwang Whee Lee
2012-Jul-11 06:44 UTC
[zfs-discuss] Repairing Faulted ZFS pool and missing disks
Hi Gregg,

Regarding your questions, here are my answers:

1) Was this on the 2009.06 or the new version of Solaris?

This is on the new version, Solaris 11.

2) So, at this point, did you try "zpool clear tank" and then do the import? What happens if you remove c7t3d0 from the system and do the import? If this is a raidz1, then it should work with one drive out of the pool. If uncabling doesn't work, put another drive in the c7t3d0 slot.

I did try zpool clear tank on the freshly installed Solaris 11, but it returns "cannot open 'tank': no such pool". If I remove c7t3d0 from the array and run "# zpool import -f tank", I get the same result as mentioned earlier. I did try to put another drive in the same c7t3d0 slot but still could not import the tank.

root at solaris11:~# zpool import -f tank
cannot import 'tank': I/O error
        Destroy and re-create the pool from
        a backup source.

3) You need to be at a point where all of your drives are recognized by the OS as functioning disks. Whether they are all the correct disks or not is the trick. When a single device hard-fails, you must put a functioning disk in that slot, and then the pool will import. Once it is imported, then you can use "zpool replace tank c7t3d0" for example to cause ZFS to rebuild the pool onto that new/replacement disk. Without doing the "zpool replace", your pool will be in a degraded state, and any further failures will compromise the integrity of your pool. Using copies= on the pool can help recover from that situation though.

The ZFS pool is raidz1. Should I format the new replacement disk using fdisk? How can I change the label of the new disk to EFI?

Please advise, thank you!

Regards,
Kwang Whee Lee

From: Gregg Wonderly [mailto:gregg at wonderly.org]
Sent: Wednesday, 11 July 2012 5:06 AM
To: Kwang Whee Lee
Cc: zfs-discuss at opensolaris.org
Subject: Re: [zfs-discuss] Repairing Faulted ZFS pool and missing disks

On Jul 9, 2012, at 11:18 PM, Kwang Whee Lee wrote:

Hi Gregg,

Thanks for your prompt reply. The issue here is that the state of the pool is shown "UNAVAIL" and it returns "cannot open 'tank': no such pool" when I try #zpool clear tank.

Was this on the 2009.06 or the new version of Solaris? This is on the new version of Solaris.

The original operating system running on my X2270 host HDD is OpenSolaris 2009.06. Checking the messages file:

Jun 29 00:54:43 MEL-SUN-X2270 genunix: [ID 751201 kern.notice] NOTICE: One or more I/O devices have been retired

This is when four of the array disks became invisible. If I install another OpenSolaris OS on a new hard disk within the host, all six disks in the array become visible (via #format). From here, I try to import the pool, but it fails.
Gregg Wonderly
2012-Jul-11 14:11 UTC
[zfs-discuss] Repairing Faulted ZFS pool and missing disks
On Jul 11, 2012, at 1:44 AM, Kwang Whee Lee wrote:

> Hi Gregg,
>
> Regarding your questions, here are my answers:
>
> 1) Was this on the 2009.06 or the new version of Solaris? This is on the new version, Solaris 11.
> 2) So, at this point, did you try "zpool clear tank" and then do the import? What happens if you remove c7t3d0 from the system and do the import? If this is a raidz1, then it should work with one drive out of the pool. If uncabling doesn't work, put another drive in the c7t3d0 slot.
>
> I did try zpool clear tank on the freshly installed Solaris 11, but it returns "cannot open 'tank': no such pool". If I remove c7t3d0 from the array and run "# zpool import -f tank", I get the same result as mentioned earlier. I did try to put another drive in the same c7t3d0 slot but still could not import the tank.
>
> root at solaris11:~# zpool import -f tank
> cannot import 'tank': I/O error
>         Destroy and re-create the pool from
>         a backup source.

There has to be a disk/controller/cable error then. For each device in the pool, do the following to see if the devices are accessible:

dd if=/dev/rdsk/c7t?d0p0 of=/dev/null count=100 bs=1024000

That should read data from the disk and write it to /dev/null, silently. You should just get an in and out block count report. If you get an I/O error, then that drive/controller is having problems, and that is what you need to focus on.

If you have another controller on the machine, recable the disk to that controller, use format to figure out the device name, and use the 'dd' command above with the appropriate if= argument disk name to see if you can read it on that controller.

> 3) You need to be at a point where all of your drives are recognized by the OS as functioning disks. Whether they are all the correct disks or not is the trick. When a single device hard-fails, you must put a functioning disk in that slot, and then the pool will import. Once it is imported, then you can use "zpool replace tank c7t3d0" for example to cause ZFS to rebuild the pool onto that new/replacement disk.
>
> Without doing the "zpool replace", your pool will be in a degraded state, and any further failures will compromise the integrity of your pool. Using copies= on the pool can help recover from that situation though.
>
> The ZFS pool is raidz1. Should I format the new replacement disk using fdisk? How can I change the label of the new disk to EFI?

If the disk has been used before, then you'll just need to zero out the disk so that it is not partitioned, using dd with something like the following parameters. Please check the output (of=) device name carefully, because this will destroy data on that output device/file.

dd if=/dev/zero of=/dev/rdsk/c7t3d0p0 count=1000 bs=102400

Gregg Wonderly
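To run that read test across every disk in the pool in one pass, a small loop like the one below can be used from a root shell. The c3t0d0 through c3t5d0 names come from the Solaris 11 format listing earlier in this thread, so substitute whatever format reports on your own system; the test only reads, nothing is written to the disks.

for d in c3t0d0 c3t1d0 c3t2d0 c3t3d0 c3t4d0 c3t5d0; do
    echo "=== $d ==="
    dd if=/dev/rdsk/${d}p0 of=/dev/null count=100 bs=1024000
done

Any device that produces an I/O error instead of the usual records in/out summary is the one to chase, per the advice above.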