thr3ads.net - freebsd stable - ZFS (zpool) doesn't detect failed drive [May 2010]

Harald Schmalzbauer

2010-May-05 12:41 UTC

ZFS (zpool) doesn't detect failed drive

Hello,

one drive of my mirror failed today, but 'zpool status' shows it as
"online".
Every process using a ZFS mount hangs. Also 'zpool offline /dev/ad1'
hangs indefinitely.

Here's the dmesg of the failing (and correctly detached) device:
ad1: TIMEOUT - FLUSHCACHE48 retrying (1 retry left)
ata3: port is not ready (timeout 10000ms) tfd = 00000080
ata3: hardware reset timeout
ad1: FAILURE - device detached

But:
zpool status
   pool: URUBAmirrorP1
  state: ONLINE
status: One or more devices are faulted in response to IO failures.
action: Make sure the affected devices are connected, then run 'zpool clear'.
    see: http://www.sun.com/msg/ZFS-8000-JQ
  scrub: none requested
config:

	NAME        STATE     READ WRITE CKSUM
	URUBAmirrorP1  ONLINE       0    7K     0
	  ad1       ONLINE       3 14,9K     0
	  ad2       ONLINE       0     0     0



A reboot didn't work; somebody had to reset the machine.
How should such an error event be handled? Isn't a mirror useless if
there's no way to continue with the one remaining good drive?
If the OS were on the same pool, would the complete machine be
inaccessible with a failing drive?

Thanks,

-Harry



Harald Schmalzbauer

2010-May-05 14:56 UTC

ZFS (zpool) doesn't detect failed drive

Harald Schmalzbauer wrote on 05.05.2010 14:41 (localtime):
> Hello,
>
> one drive of my mirror failed today, but 'zpool status' shows it as
> "online".
> Every process using a ZFS mount hangs. Also 'zpool offline /dev/ad1'
> hangs indefinitely....

Sorry, I made an error with zpool create. Somehow the little word
"mirror" must have been lost, so the pool wasn't a mirror but a stripe.
Then of course I can't take one vdev offline. Sorry for the noise.
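
For anyone who wants to rule out the same mistake: 'zpool create <pool> ad1 ad2' builds a stripe, while 'zpool create <pool> mirror ad1 ad2' builds a mirror. Below is a minimal sketch of a check that reads 'zpool status' output on stdin and reports whether the config table contains a mirror vdev. The function name has_mirror_vdev is made up for illustration, and the check only looks at the vdev names in the table, nothing else.

```shell
#!/bin/sh
# has_mirror_vdev (hypothetical helper): reads `zpool status` output on
# stdin and returns 0 if the config table lists a mirror vdev, 1 otherwise.
# A stripe lists the disks directly under the pool name; a mirror has an
# intermediate line named "mirror" (or "mirror-N" on newer ZFS versions).
has_mirror_vdev() {
    awk '
        # The config table starts at the NAME/STATE header line.
        $1 == "NAME" && $2 == "STATE" { in_config = 1; next }
        # Inside the table, look for a vdev whose name is mirror or mirror-N.
        in_config && $1 ~ /^mirror(-[0-9]+)?$/ { found = 1 }
        END { exit (found ? 0 : 1) }
    '
}
```

It could be used as: zpool status URUBAmirrorP1 | has_mirror_vdev && echo "pool has a mirror vdev"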
But I took the opportunity to do some tests with that failing drive and
created a _real_ mirror. That works without failures, but using the
mirror again leads to:
ad1: TIMEOUT - FLUSHCACHE48 retrying (1 retry left)
ad1: TIMEOUT - FLUSHCACHE48 retrying (1 retry left)
ad1: TIMEOUT - FLUSHCACHE48 retrying (1 retry left)
ad1: TIMEOUT - FLUSHCACHE48 retrying (1 retry left)
ad1: TIMEOUT - FLUSHCACHE48 retrying (1 retry left)
ad1: TIMEOUT - FLUSHCACHE48 retrying (1 retry left)
ata3: port is not ready (timeout 10000ms) tfd = 00000080
ata3: hardware reset timeout
ad1: FAILURE - device detached

Now zpool still reports the vdev ad1 as online, although it has been
detached and 'atacontrol list' doesn't show it anymore:

zpool status
  pool: URUBAmirrorP1
 state: ONLINE
status: One or more devices has experienced an unrecoverable error.  An
	attempt was made to correct the error.  Applications are unaffected.
action: Determine if the device needs to be replaced, and clear the errors
	using 'zpool clear' or replace the device with 'zpool replace'.
   see: http://www.sun.com/msg/ZFS-8000-9P
 scrub: none requested
config:

	NAME           STATE     READ WRITE CKSUM
	URUBAmirrorP1  ONLINE       0     0     0
	  mirror       ONLINE       0     0     0
	    ad1        ONLINE       3  302K     0
	    ad2        ONLINE       0     0     0

errors: No known data errors

atacontrol list
ATA channel 2:
    Master:  ad0 <TRANSCEND/20090520> SATA revision 1.x
    Slave:       no device present
ATA channel 3:
    Master:      no device present
    Slave:       no device present
ATA channel 4:
    Master:  ad2 <SAMSUNG HD154UI/1AG01118> SATA revision 2.x
    Slave:       no device present
ATA channel 5:
    Master:  ad3 <ST3750640NS/3.AEG> SATA revision 1.x
    Slave:       no device present
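
The mismatch between the two outputs above (zpool lists ad1, atacontrol does not) can also be checked mechanically. This is only a rough sketch under some assumptions: both outputs have been saved to files, the disks use the adN naming seen here, and the function name missing_vdevs is invented for this example. It does not change the pool in any way; it only prints the devices zpool knows about that the ATA layer no longer lists.

```shell
#!/bin/sh
# missing_vdevs STATUS_FILE ATALIST_FILE  (hypothetical helper)
# Prints every adN device that appears in saved `zpool status` output
# but is not listed as Master or Slave in saved `atacontrol list` output.
missing_vdevs() {
    status_file=$1
    atalist_file=$2
    # Leaf vdevs in the config table: lines whose first field is adN.
    awk '$1 ~ /^ad[0-9]+$/ { print $1 }' "$status_file" | sort -u |
    while read -r dev; do
        # Attached disks show up as "Master: adN <model> ..." or
        # "Slave: adN <model> ..." in the atacontrol listing.
        if ! awk -v d="$dev" '
                ($1 == "Master:" || $1 == "Slave:") && $2 == d { f = 1 }
                END { exit !f }
            ' "$atalist_file"; then
            echo "$dev"
        fi
    done
}
```

For example: zpool status > /tmp/zs.txt; atacontrol list > /tmp/ata.txt; missing_vdevs /tmp/zs.txt /tmp/ata.txt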

How should such a failure be handled?
Do I have to manually mark the drive offline for zpool?

Thanks,

-Harry

