Hello,

one drive of my mirror failed today, but 'zpool status' shows it as "online".
Every process using a ZFS mount hangs. Also, 'zpool offline /dev/ad1'
hangs indefinitely.

Here's the dmesg output for the failing (and correctly detached) device:

ad1: TIMEOUT - FLUSHCACHE48 retrying (1 retry left)
ata3: port is not ready (timeout 10000ms) tfd = 00000080
ata3: hardware reset timeout
ad1: FAILURE - device detached

But:

zpool status
  pool: URUBAmirrorP1
 state: ONLINE
status: One or more devices are faulted in response to IO failures.
action: Make sure the affected devices are connected, then run 'zpool clear'.
   see: http://www.sun.com/msg/ZFS-8000-JQ
 scrub: none requested
config:

        NAME             STATE     READ WRITE CKSUM
        URUBAmirrorP1    ONLINE       0    7K     0
          ad1            ONLINE       3 14,9K     0
          ad2            ONLINE       0     0     0

A reboot doesn't work; somebody had to reset the machine.

How should such an error event be handled? Isn't a mirror useless if
there's no way to continue with the one remaining good drive?
If the OS were on the same pool, the whole machine would be inaccessible
with a failing drive?!

Thanks,

-Harry
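
The kernel messages above already name the drive that went away. As a minimal sketch, assuming Python is available and that the messages keep the "adN: FAILURE - device detached" form shown in this report, the detached devices can be pulled out of saved dmesg text. The function name detached_devices and the constant names are hypothetical, and nothing here talks to the kernel or to zpool.

```python
import re

# Matches kernel lines such as "ad1: FAILURE - device detached".
# The message format is taken from the dmesg excerpt in this report.
DETACHED_RE = re.compile(r"^(\w+): FAILURE - device detached\s*$", re.MULTILINE)


def detached_devices(dmesg_text):
    """Return device names reported as detached, in order, without duplicates."""
    seen = []
    for name in DETACHED_RE.findall(dmesg_text):
        if name not in seen:
            seen.append(name)
    return seen


# Sample text copied from the report; a real run would read saved dmesg output.
SAMPLE_DMESG = """\
ad1: TIMEOUT - FLUSHCACHE48 retrying (1 retry left)
ata3: port is not ready (timeout 10000ms) tfd = 00000080
ata3: hardware reset timeout
ad1: FAILURE - device detached
"""

if __name__ == "__main__":
    print(detached_devices(SAMPLE_DMESG))
```

Comparing this list with the devices that 'zpool status' still shows as ONLINE would expose the mismatch described above.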
Harald Schmalzbauer wrote on 05.05.2010 14:41 (localtime):
> Hello,
>
> one drive of my mirror failed today, but 'zpool status' shows it as "online".
> Every process using a ZFS mount hangs. Also, 'zpool offline /dev/ad1'
> hangs indefinitely....

Sorry, I made a mistake with 'zpool create'. Somehow the little word
"mirror" must have got lost, so the pool wasn't a mirror but a stripe.
In that case, of course, I can't take one vdev offline.
Sorry for the noise.

But I took the opportunity to do some tests with the failing drive and
created a _real_ mirror.
Creating it works without failures, but using the mirror again leads to:

ad1: TIMEOUT - FLUSHCACHE48 retrying (1 retry left)
ad1: TIMEOUT - FLUSHCACHE48 retrying (1 retry left)
ad1: TIMEOUT - FLUSHCACHE48 retrying (1 retry left)
ad1: TIMEOUT - FLUSHCACHE48 retrying (1 retry left)
ad1: TIMEOUT - FLUSHCACHE48 retrying (1 retry left)
ad1: TIMEOUT - FLUSHCACHE48 retrying (1 retry left)
ata3: port is not ready (timeout 10000ms) tfd = 00000080
ata3: hardware reset timeout
ad1: FAILURE - device detached

Now zpool reports the vdev ad1 as still online, although it has been
detached and 'atacontrol list' doesn't show it anymore:

zpool status
  pool: URUBAmirrorP1
 state: ONLINE
status: One or more devices has experienced an unrecoverable error.  An
        attempt was made to correct the error.  Applications are unaffected.
action: Determine if the device needs to be replaced, and clear the errors
        using 'zpool clear' or replace the device with 'zpool replace'.
   see: http://www.sun.com/msg/ZFS-8000-9P
 scrub: none requested
config:

        NAME             STATE     READ WRITE CKSUM
        URUBAmirrorP1    ONLINE       0     0     0
          mirror         ONLINE       0     0     0
            ad1          ONLINE       3  302K     0
            ad2          ONLINE       0     0     0

errors: No known data errors

atacontrol list
ATA channel 2:
    Master:  ad0 <TRANSCEND/20090520> SATA revision 1.x
    Slave:       no device present
ATA channel 3:
    Master:      no device present
    Slave:       no device present
ATA channel 4:
    Master:  ad2 <SAMSUNG HD154UI/1AG01118> SATA revision 2.x
    Slave:       no device present
ATA channel 5:
    Master:  ad3 <ST3750640NS/3.AEG> SATA revision 1.x
    Slave:       no device present

How should such a failure be handled? Do I have to manually mark the
drive offline for zpool?

Thanks,

-Harry
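
The stripe-versus-mirror mix-up described above is visible in the config section of 'zpool status': in a mirror the disks sit below a "mirror" vdev, while in a stripe they sit directly below the pool. Below is a minimal Python sketch of that check on saved status text. It assumes the nesting is conveyed by indentation, as in the output above, and that redundant vdev names start with "mirror" or "raidz". The names top_level_vdevs and is_redundant are hypothetical, and the sketch only reads text; it does not call zpool.

```python
# Vdev name prefixes treated as redundant (an assumption of this sketch).
REDUNDANT_PREFIXES = ("mirror", "raidz")


def top_level_vdevs(status_text):
    """Return names of the vdevs directly below the pool in 'zpool status' text."""
    rows = []            # (indent, name) for each row of the config table
    seen_config = False  # True once the "config:" label has been passed
    in_table = False     # True once the NAME/STATE header row has been passed
    for line in status_text.splitlines():
        stripped = line.strip()
        if not in_table:
            if stripped == "config:":
                seen_config = True
            elif seen_config and stripped.startswith("NAME"):
                in_table = True
            continue
        if not stripped or stripped.startswith("errors:"):
            break  # end of the config table
        expanded = line.expandtabs(8)
        indent = len(expanded) - len(expanded.lstrip())
        rows.append((indent, stripped.split()[0]))
    if len(rows) < 2:
        return []
    pool_indent = rows[0][0]  # first row is the pool itself
    children = []
    for indent, name in rows[1:]:
        if indent <= pool_indent:
            break  # a sibling section such as "logs" or "spares"
        children.append((indent, name))
    if not children:
        return []
    top = min(indent for indent, _ in children)
    return [name for indent, name in children if indent == top]


def is_redundant(status_text):
    """True if every top-level vdev is a mirror or raidz group."""
    vdevs = top_level_vdevs(status_text)
    return bool(vdevs) and all(v.startswith(REDUNDANT_PREFIXES) for v in vdevs)


# Config tables modelled on the two status outputs in this thread.
STRIPE_STATUS = """\
config:

    NAME             STATE     READ WRITE CKSUM
    URUBAmirrorP1    ONLINE       0    7K     0
      ad1            ONLINE       3 14,9K     0
      ad2            ONLINE       0     0     0
"""

MIRROR_STATUS = """\
config:

    NAME             STATE     READ WRITE CKSUM
    URUBAmirrorP1    ONLINE       0     0     0
      mirror         ONLINE       0     0     0
        ad1          ONLINE       3  302K     0
        ad2          ONLINE       0     0     0

errors: No known data errors
"""

if __name__ == "__main__":
    print(top_level_vdevs(STRIPE_STATUS), is_redundant(STRIPE_STATUS))
    print(top_level_vdevs(MIRROR_STATUS), is_redundant(MIRROR_STATUS))
```

Run against the first status output, the disks ad1 and ad2 come back as top-level vdevs, which is the sign that the pool was created without the "mirror" keyword.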