Peter Eriksson
2006-Nov-21 14:58 UTC
[zfs-discuss] ZFS goes catatonic when drives go dead?
This is a bit frustrating... If I create a zpool with some disks on a SAN (A3500FC)
on a Sun Ultra 10 running Solaris 10 6/06 with all the latest patches:

[0] kraiklyn:~# zpool status
  pool: galahad
 state: ONLINE
 scrub: none requested
config:

        NAME        STATE     READ WRITE CKSUM
        galahad     ONLINE       0     0     0
          raidz     ONLINE       0     0     0
            c1t5d0  ONLINE       0     0     0
            c1t5d1  ONLINE       0     0     0
            c1t5d2  ONLINE       0     0     0
            c1t5d3  ONLINE       0     0     0
            c1t5d4  ONLINE       0     0     0
          raidz     ONLINE       0     0     0
            c1t5d5  ONLINE       0     0     0
            c1t5d6  ONLINE       0     0     0
            c1t5d7  ONLINE       0     0     0
            c1t5d8  ONLINE       0     0     0
            c1t5d9  ONLINE       0     0     0

errors: No known data errors

Then I fail a drive to simulate a disk failure (the same thing happens if I actually
pull the disk from the system), and this happens:

[1] kraiklyn:~# drivutil -f 20 c1t5d0

drivutil succeeded!
[0] kraiklyn:~# zpool status
  pool: galahad
 state: ONLINE
 scrub: none requested

It never recovers from this state. I went for some coffee at this point and it was
still sitting there when I came back. If I start another terminal and then try
"format", that too hangs.

I can't even reboot the machine (it just hangs) without having to do
a RETURN ~ Ctrl-B to get out to the OpenBoot prompt.

After a reboot it shows the expected output:

[0] kraiklyn:~# zpool status
  pool: galahad
 state: DEGRADED
status: One or more devices could not be used because the label is missing or
        invalid.  Sufficient replicas exist for the pool to continue
        functioning in a degraded state.
action: Replace the device using 'zpool replace'.
   see: http://www.sun.com/msg/ZFS-8000-4J
 scrub: none requested
config:

        NAME        STATE     READ WRITE CKSUM
        galahad     DEGRADED     0     0     0
          raidz     DEGRADED     0     0     0
            c1t5d0  ONLINE       0     0     0
            c1t5d1  UNAVAIL      0     0     0  corrupted data
            c1t5d2  ONLINE       0     0     0
            c1t5d3  ONLINE       0     0     0
            c1t5d4  ONLINE       0     0     0
          raidz     ONLINE       0     0     0
            c1t5d5  ONLINE       0     0     0
            c1t5d6  ONLINE       0     0     0
            c1t5d7  ONLINE       0     0     0
            c1t5d8  ONLINE       0     0     0
            c1t5d9  ONLINE       0     0     0

errors: No known data errors

However, if I stay away from ZFS/zpool and just use the raw devices (or use SVM to
handle things) then things work as expected - the LUN goes away/generates I/O errors
when I fail that disk, and things come back when I "unfail" it. No hangs...

It almost feels like ZFS causes some "lock" in the kernel.
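For what it's worth, a rough sketch of how the stuck processes could be inspected from
another, still-responsive shell, assuming mdb -k can still attach (these are standard
Solaris debugging dcmds; interpreting the resulting stacks is another matter):

    # Stacks of the hung zpool and format processes:
    echo "::pgrep zpool | ::walk thread | ::findstack -v" | mdb -k
    echo "::pgrep format | ::walk thread | ::findstack -v" | mdb -k

    # Or dump every kernel thread and look for zfs/zio functions in the stacks:
    echo "::threadlist -v" | mdb -k > /var/tmp/threads.txt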
Peter Eriksson
2006-Nov-21 16:30 UTC
[zfs-discuss] Re: ZFS goes catatonic when drives go dead?
Heh... Found a workaround: wrap all the real disk devices with SVM metadevices, and
then put those into a ZFS raidz volume. Now... this doesn't *really* feel right,
somehow, but if it works, then it works... :-)

# zpool status
  pool: foobar
 state: ONLINE
 scrub: none requested
config:

        NAME                  STATE     READ WRITE CKSUM
        foobar                ONLINE       0     0     0
          raidz               ONLINE       0     0     0
            /dev/md/dsk/d101  ONLINE       0     0     0
            /dev/md/dsk/d102  ONLINE       0     0     0
            /dev/md/dsk/d103  ONLINE       0     0     0
            /dev/md/dsk/d104  ONLINE       0     0     0
            /dev/md/dsk/d105  ONLINE       0     0     0

# drivutil -f 23 c1t5d0
# cp /var/adm/messages /foobar/
# zpool status
  pool: foobar
 state: DEGRADED
status: One or more devices could not be opened.  Sufficient replicas exist for
        the pool to continue functioning in a degraded state.
action: Attach the missing device and online it using 'zpool online'.
   see: http://www.sun.com/msg/ZFS-8000-D3
 scrub: none requested
config:

        NAME                  STATE     READ WRITE CKSUM
        foobar                DEGRADED     0     0     0
          raidz               DEGRADED     0     0     0
            /dev/md/dsk/d101  ONLINE       0     0     0
            /dev/md/dsk/d102  UNAVAIL      0   141     0  cannot open
            /dev/md/dsk/d103  ONLINE       0     0     0
            /dev/md/dsk/d104  ONLINE       0     0     0
            /dev/md/dsk/d105  ONLINE       0     0     0

errors: No known data errors

# zpool online foobar /dev/md/dsk/d102
Bringing device /dev/md/dsk/d102 online

# zpool status
  pool: foobar
 state: ONLINE
status: One or more devices has experienced an unrecoverable error.  An
        attempt was made to correct the error.  Applications are unaffected.
action: Determine if the device needs to be replaced, and clear the errors
        using 'zpool clear' or replace the device with 'zpool replace'.
   see: http://www.sun.com/msg/ZFS-8000-9P
 scrub: resilver completed with 0 errors on Tue Nov 21 17:29:40 2006
config:

        NAME                  STATE     READ WRITE CKSUM
        foobar                ONLINE       0     0     0
          raidz               ONLINE       0     0     0
            /dev/md/dsk/d101  ONLINE       0     0     0
            /dev/md/dsk/d102  ONLINE       0   141     0
            /dev/md/dsk/d103  ONLINE       0     0     0
            /dev/md/dsk/d104  ONLINE       0     0     0
            /dev/md/dsk/d105  ONLINE       0     0     0

errors: No known data errors
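For completeness, a rough sketch of how the one-to-one metadevices above could have been
built before creating the pool; the underlying slices and the replica slice are
assumptions, and SVM needs its state database replicas in place before metainit will run:

    # State database replicas (slice is hypothetical):
    metadb -a -f -c 3 c0t0d0s7

    # One simple one-slice concat per physical disk:
    metainit d101 1 1 c1t5d0s0
    metainit d102 1 1 c1t5d1s0
    metainit d103 1 1 c1t5d2s0
    metainit d104 1 1 c1t5d3s0
    metainit d105 1 1 c1t5d4s0

    # Then the raidz on top of the metadevices:
    zpool create foobar raidz /dev/md/dsk/d101 /dev/md/dsk/d102 /dev/md/dsk/d103 \
        /dev/md/dsk/d104 /dev/md/dsk/d105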
Richard Elling
2006-Nov-21 18:45 UTC
[zfs-discuss] ZFS goes catatonic when drives go dead?
I think this is in the FAQ; if not, then it should be.

Full integration between ZFS and FMA is not yet available. Until it is,
there are some failure modes which are not handled perfectly well. How the
system reacts to the various failure scenarios during this interim period
really depends on the entire software+firmware+hardware stack.
 -- richard

Peter Eriksson wrote:
> This is a bit frustrating... If I create a zpool with some disks on a SAN (A3500FC)
> on a Sun Ultra 10 running Solaris 10 6/06 with all the latest patches:
> [...]
> It never recovers from this state. I went for some coffee at this point and it was
> still sitting there when I came back. If I start another terminal and then try
> "format", that too hangs.
> [...]
> It almost feels like ZFS causes some "lock" in the kernel.
Peter Eriksson
2006-Nov-22 11:38 UTC
[zfs-discuss] Re: ZFS goes catatonic when drives go dead?
There is nothing in the ZFS FAQ about this. I also fail to see how FMA could make any
difference, since it seems that ZFS is deadlocking somewhere in the kernel when this
happens...

It works if you wrap all the physical devices inside SVM metadevices and use those for
your ZFS zpool instead, i.e.:

  metainit d101 1 1 c1t5d0s0
  metainit d102 1 1 c1t5d1s0
  metainit d103 1 1 c1t5d2s0
  zpool create foo raidz /dev/md/dsk/d101 /dev/md/dsk/d102 /dev/md/dsk/d103

Another, unrelated observation - I've noticed that ZFS often works *faster* if I wrap a
physical partition inside a metadevice and then feed that to zpool instead of using the
raw partition directly. Example: testing ZFS on a spare 40GB partition of the boot ATA
disk in a Sun Ultra 10/440 gives horrible performance numbers. If I wrap that partition
into a simple metadevice and feed it to ZFS, things work much faster. I.e.:

Zpool containing one normal disk partition:

  # /bin/time mkfile 1G 1G
  real     2:46.5
  user        0.4
  sys        24.1
  --> 6MB/s (that was actually the best number I got - the worst was 3:03 minutes)

Zpool containing one SVM metadevice containing the same disk partition:

  # /bin/time mkfile 1G 1G
  real     1:41.6
  user        0.3
  sys        23.3
  --> 10MB/s

(Idle machine in both cases, mkfile rerun a couple of times with the same results. I
removed the 1G file between reruns, of course.)
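For anyone who wants to repeat the comparison, a rough sketch of the two test setups
back to back; the pool, metadevice and slice names here are made up, and clean-up steps
are included so the two runs are comparable:

    SLICE=c0t0d0s4                      # hypothetical spare 40GB slice

    # Raw slice directly under ZFS:
    zpool create testpool $SLICE
    /bin/time mkfile 1G /testpool/1G
    zpool destroy testpool

    # Same slice wrapped in a one-to-one SVM metadevice:
    metainit d110 1 1 $SLICE
    zpool create testpool /dev/md/dsk/d110
    /bin/time mkfile 1G /testpool/1G
    zpool destroy testpool
    metaclear d110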
Richard Elling
2006-Nov-22 17:26 UTC
[zfs-discuss] Re: ZFS goes catatonic when drives go dead?
Peter Eriksson wrote:
> There is nothing in the ZFS FAQ about this. I also fail to see how FMA could make any
> difference, since it seems that ZFS is deadlocking somewhere in the kernel when this
> happens...

Some people don't see a difference between "hung" and "patiently waiting."
There are failure modes where you would patiently wait. With full FMA
integration the system will know that patiently waiting is futile.

> Another, unrelated observation - I've noticed that ZFS often works *faster* if I wrap a
> physical partition inside a metadevice and then feed that to zpool instead of using the
> raw partition directly. Example: testing ZFS on a spare 40GB partition of the boot ATA
> disk in a Sun Ultra 10/440 gives horrible performance numbers. If I wrap that partition
> into a simple metadevice and feed it to ZFS, things work much faster.

More likely this is:

  6421427 netra x1 slagged by NFS over ZFS leading to long spins in the
          ATA driver code

 -- richard
Pawel Jakub Dawidek
2006-Nov-23 11:09 UTC
[zfs-discuss] Re: ZFS goes catatonic when drives go dead?
On Wed, Nov 22, 2006 at 03:38:05AM -0800, Peter Eriksson wrote:
> Another, unrelated observation - I've noticed that ZFS often works *faster* if I wrap a
> physical partition inside a metadevice and then feed that to zpool instead of using the
> raw partition directly. Example: testing ZFS on a spare 40GB partition of the boot ATA
> disk in a Sun Ultra 10/440 gives horrible performance numbers. If I wrap that partition
> into a simple metadevice and feed it to ZFS, things work much faster. I.e.:
>
> Zpool containing one normal disk partition:
>
>   # /bin/time mkfile 1G 1G
>   real     2:46.5
>   user        0.4
>   sys        24.1
>   --> 6MB/s (that was actually the best number I got - the worst was 3:03 minutes)
>
> Zpool containing one SVM metadevice containing the same disk partition:
>
>   # /bin/time mkfile 1G 1G
>   real     1:41.6
>   user        0.3
>   sys        23.3
>   --> 10MB/s
>
> (Idle machine in both cases, mkfile rerun a couple of times with the same results. I
> removed the 1G file between reruns, of course.)

It may be because for raw disks ZFS flushes the write cache (via
DKIOCFLUSHWRITECACHE), which can be an expensive operation and depends
heavily on the disks/controllers used. I doubt it does the same for
metadevices, but I may be wrong.

-- 
Pawel Jakub Dawidek                       http://www.wheel.pl
pjd at FreeBSD.org                        http://www.FreeBSD.org
FreeBSD committer                         Am I Evil? Yes, I Am!
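One way to see what the drive itself is set up to do, sketched here on the assumption
that the driver exposes the cache mode pages to format(1M)'s expert mode (the cache
menu is not offered for every disk/HBA combination):

    # format -e
      (select the disk, e.g. c1t5d0, from the list)
    format> cache
    cache> write_cache
    write_cache> display     # reports whether the drive's volatile write cache is enabled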
Pawel Jakub Dawidek
2006-Nov-23 11:19 UTC
[zfs-discuss] Re: ZFS goes catatonic when drives go dead?
On Thu, Nov 23, 2006 at 12:09:09PM +0100, Pawel Jakub Dawidek wrote:
> It may be because for raw disks ZFS flushes the write cache (via
> DKIOCFLUSHWRITECACHE), which can be an expensive operation and depends
> heavily on the disks/controllers used. I doubt it does the same for
> metadevices, but I may be wrong.

Oops, you operate on partitions... I think for partitions ZFS disables
the write cache on the disks... Anyway, I'll leave the answer to someone
more clueful.

-- 
Pawel Jakub Dawidek                       http://www.wheel.pl
pjd at FreeBSD.org                        http://www.FreeBSD.org
FreeBSD committer                         Am I Evil? Yes, I Am!
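As a short illustration of the distinction being discussed (pool and device names are
hypothetical, and the behaviour is as commonly described for ZFS of this era): when
handed a whole disk, ZFS puts an EFI label on it and takes charge of the drive's write
cache itself; when handed a slice, it leaves the label and the existing cache setting
alone.

    # Whole disk - ZFS relabels it (EFI) and manages the drive's write cache:
    zpool create tank c1t5d0

    # Slice - ZFS uses only that slice and does not touch the cache setting:
    zpool create tank c1t5d0s0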