I was asked a few interesting questions by a co-worker regarding ZFS and, after much googling, I still can't find answers.  I've seen several people ask these questions, only to have them answered indirectly.

First: if I have a pool that consists of one raidz1 vdev of three 500GB disks, and I go to the store, buy a fourth 500GB disk, and add it to the pool as a second vdev, what happens when that fourth disk has a hardware failure?

Second: let's say I have two disks and I create a non-parity pool [2 vdevs creating /tank] with a single child filesystem [/tank/fscopies2] that has the copies=2 attribute.  If I lose one of these disks, will I still have access to my files?  And if I were to add a third disk to this pool as a third vdev at some future point, is there any scenario in which a hardware failure would leave the rest of the pool unreadable?
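For reference, the two layouts I have in mind would be built roughly like this (device names are just placeholders):

-----8<----
# layout 1: a three-disk raidz1, plus the new disk as a second,
# unprotected vdev (zpool warns about the mismatched replication
# level and needs -f)
zpool create tank raidz1 c1t0d0 c1t1d0 c1t2d0
zpool add -f tank c1t3d0

# layout 2: two plain disks striped together, with a child filesystem
# that asks for two copies of every block
zpool create tank c1t0d0 c1t1d0
zfs create -o copies=2 tank/fscopies2
-----8<----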
On Thu, Nov 20, 2008 at 9:54 AM, Kam <cam at mrlane.com> wrote:

> First: if I have a pool that consists of one raidz1 vdev of three 500GB
> disks, and I go to the store, buy a fourth 500GB disk, and add it to the
> pool as a second vdev, what happens when that fourth disk has a hardware
> failure?

Your pool is toast.  If you set copies=2, and get lucky enough that copies
of every block on the standalone are copied to the raidz vdev, you might be
able to survive, but the odds are pretty slim.  Long story short, don't
ever do it.  Not only will performance suffer, you've basically wasted the
drive used for parity in the raidz vdev.  The pool isn't any better than a
raid-0 at this point.

> Second: let's say I have two disks and I create a non-parity pool [2 vdevs
> creating /tank] with a single child filesystem [/tank/fscopies2] that has
> the copies=2 attribute.  If I lose one of these disks, will I still have
> access to my files?  And if I were to add a third disk to this pool as a
> third vdev at some future point, is there any scenario in which a hardware
> failure would leave the rest of the pool unreadable?

Same as above: not necessarily.  There's nothing guaranteeing where the two
copies will exist, just that there will be two.  They may both be on one
disk, or they may not.  This is more to protect against corrupt blocks if
you only have a single drive, than against losing an entire drive.

Moral of the story continues to be: if you want protection against a failed
disk, use a raid algorithm that provides it.

--Tim
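P.S.  If the goal is to grow that raidz pool without giving up redundancy, the usual approach is to add a whole redundant vdev rather than a bare disk.  Roughly (device names made up):

-----8<----
# add a second three-disk raidz1 vdev; zfs stripes new writes across
# both vdevs, and either vdev can still survive losing one disk
zpool add tank raidz1 c2t0d0 c2t1d0 c2t2d0

# or, for a pool built from mirrors, add another mirror pair
zpool add tank mirror c2t0d0 c2t1d0
-----8<----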
>>>>> "t" == Tim <tim at tcsac.net> writes:

    >> a fourth 500GB disk and add
    >> it to the pool as a second vdev, what happens when that
    >> fourth disk has a hardware failure?

     t> If you set copies=2, and get lucky enough
     t> that copies of every block on the standalone are copied to the
     t> raidz vdev, you might be able to survive,

no, of course you won't survive.  Just try it with file vdevs before
pontificating about it.

-----8<----
bash-3.00# mkfile 64m t0
bash-3.00# mkfile 64m t1
bash-3.00# mkfile 64m t2
bash-3.00# mkfile 64m t00
bash-3.00# pwd -P
/usr/export
bash-3.00# zpool create foolpool raidz1 /usr/export/t{0..2}
bash-3.00# zpool add foolpool /usr/export/t00
invalid vdev specification
use '-f' to override the following errors:
mismatched replication level: pool uses raidz and new vdev is file
bash-3.00# zpool add -f foolpool /usr/export/t00
bash-3.00# zpool status -v foolpool
  pool: foolpool
 state: ONLINE
 scrub: none requested
config:

        NAME                STATE     READ WRITE CKSUM
        foolpool            ONLINE       0     0     0
          raidz1            ONLINE       0     0     0
            /usr/export/t0  ONLINE       0     0     0
            /usr/export/t1  ONLINE       0     0     0
            /usr/export/t2  ONLINE       0     0     0
          /usr/export/t00   ONLINE       0     0     0

errors: No known data errors
bash-3.00# zfs set copies=2 foolpool
bash-3.00# cd /
bash-3.00# pax -rwpe sbin foolpool/
bash-3.00# > /usr/export/t00
bash-3.00# pax -w foolpool/ > /dev/null
bash-3.00# zpool status -v foolpool
  pool: foolpool
 state: DEGRADED
status: One or more devices has experienced an unrecoverable error.  An
        attempt was made to correct the error.  Applications are unaffected.
action: Determine if the device needs to be replaced, and clear the errors
        using 'zpool clear' or replace the device with 'zpool replace'.
   see: http://www.sun.com/msg/ZFS-8000-9P
 scrub: none requested
config:

        NAME                STATE     READ WRITE CKSUM
        foolpool            DEGRADED     4     0    21
          raidz1            ONLINE       0     0     0
            /usr/export/t0  ONLINE       0     0     0
            /usr/export/t1  ONLINE       0     0     0
            /usr/export/t2  ONLINE       0     0     0
          /usr/export/t00   DEGRADED     4     0    21  too many errors

errors: No known data errors
bash-3.00# zpool offline foolpool /usr/export/t00
cannot offline /usr/export/t00: no valid replicas
bash-3.00# zpool export foolpool
panic[cpu0]/thread=2a1016b7ca0: assertion failed: vdev_config_sync(rvd, txg) == 0, file: ../../common/fs/zfs/spa.c, line: 3125

000002a1016b7850 genunix:assfail+78 (7b72c668, 7b72b680, c35, 183d800, 1285c00, 0)
  %l0-3: 0000000000000422 0000000000000081 000003001df5e580 0000000070170880
  %l4-7: 0000060016076c88 0000000000000000 0000000001887800 0000000000000000
000002a1016b7900 zfs:spa_sync+244 (3001df5e580, 42, 30043434e30, 7b72c400, 7b72b400, 4)
  %l0-3: 0000000000000000 000003001df5e6b0 000003001df5e670 00000600155cce80
  %l4-7: 0000030056703040 0000060013659200 000003001df5e708 00000000018c2e98
000002a1016b79c0 zfs:txg_sync_thread+120 (60013659200, 42, 2a1016b7a70, 60013659320, 60013659312, 60013659310)
  %l0-3: 0000000000000000 00000600136592d0 00000600136592d8 0000060013659316
  %l4-7: 0000060013659314 00000600136592c8 0000000000000043 0000000000000042

syncing file systems... done
[...first reboot...]
WARNING: ZFS replay transaction error 30, dataset boot/usr, seq 0x134c, txtype 9

NOTICE: iscsi: SendTarget discovery failed (11)     [``patiently waits'' forever]

~#Type 'go' to resume
{0} ok boot -m milestone=none
Resetting ...
[...second reboot...]

# /sbin/mount -o remount,rw /
# /sbin/mount /usr
# iscsiadm remove discovery-address 10.100.100.135
# iscsiadm remove discovery-address 10.100.100.138
# cd /usr/export
# mkdir hide
# mv t0 t1 t2 t00 hide
mv: cannot access t00                               [haha ZFS.]
# sync
# reboot
syncing file systems... done
[...third reboot...]
SunOS Release 5.11 Version snv_71 64-bit
Copyright 1983-2007 Sun Microsystems, Inc.  All rights reserved.
Use is subject to license terms.
NOTICE: mddb: unable to get devid for 'sd', 0xffffffff
NOTICE: mddb: unable to get devid for 'sd', 0xffffffff
NOTICE: mddb: unable to get devid for 'sd', 0xffffffff
WARNING: /pci at 8,700000/usb at 5,3 (ohci0): Connecting device on port 4 failed
Hostname: terabithia.th3h.inner.chaos
/usr/sbin/pmconfig: "/etc/power.conf" line 37, cannot find ufs mount point for "/usr/.CPR"
Reading ZFS config: done.
Mounting ZFS filesystems: (9/9)

terabithia.th3h.inner.chaos console login: root
Password:
Nov 20 13:09:30 terabithia.th3h.inner.chaos login: ROOT LOGIN /dev/console
Last login: Mon Aug 18 03:04:12 on console
Sun Microsystems Inc.   SunOS 5.11      snv_71  October 2007
You have new mail.
# exec bash
bash-3.00# zpool status -v foolpool
  pool: foolpool
 state: UNAVAIL
status: One or more devices could not be opened.  There are insufficient
        replicas for the pool to continue functioning.
action: Attach the missing device and online it using 'zpool online'.
   see: http://www.sun.com/msg/ZFS-8000-3C
 scrub: none requested
config:

        NAME                STATE     READ WRITE CKSUM
        foolpool            UNAVAIL      0     0     0  insufficient replicas
          raidz1            UNAVAIL      0     0     0  insufficient replicas
            /usr/export/t0  UNAVAIL      0     0     0  cannot open
            /usr/export/t1  UNAVAIL      0     0     0  cannot open
            /usr/export/t2  UNAVAIL      0     0     0  cannot open
          /usr/export/t00   UNAVAIL      0     0     0  cannot open
bash-3.00# cd /usr/export
bash-3.00# mv hide/* .
bash-3.00# zpool clear foolpool
cannot open 'foolpool': pool is unavailable
bash-3.00# zpool status -v foolpool
  pool: foolpool
 state: UNAVAIL
status: One or more devices could not be used because the label is missing
        or invalid.  There are insufficient replicas for the pool to continue
        functioning.
action: Destroy and re-create the pool from a backup source.
   see: http://www.sun.com/msg/ZFS-8000-5E
 scrub: none requested
config:

        NAME                STATE     READ WRITE CKSUM
        foolpool            UNAVAIL      0     0     0  insufficient replicas
          raidz1            ONLINE       0     0     0
            /usr/export/t0  ONLINE       0     0     0
            /usr/export/t1  ONLINE       0     0     0
            /usr/export/t2  ONLINE       0     0     0
          /usr/export/t00   UNAVAIL      0     0     0  corrupted data
bash-3.00# zpool export foolpool
bash-3.00# zpool import -d /usr/export foolpool
internal error: Value too large for defined data type
Abort (core dumped)
bash-3.00# rm core
bash-3.00# zpool import -d /usr/export
  pool: foolpool
    id: 8355048046034000632
 state: FAULTED
status: One or more devices contains corrupted data.
action: The pool cannot be imported due to damaged devices or data.
   see: http://www.sun.com/msg/ZFS-8000-5E
config:

        foolpool            UNAVAIL  insufficient replicas
          raidz1            ONLINE
            /usr/export/t0  ONLINE
            /usr/export/t1  ONLINE
            /usr/export/t2  ONLINE
          /usr/export/t00   UNAVAIL  corrupted data
bash-3.00# ls -l
total 398218
drwxr-xr-x   2 root     root           2 Nov 20 13:10 hide
drwxr-xr-x   6 root     root           6 Oct  5 02:04 nboot
-rw------T   1 root     root     67108864 Nov 20 12:50 t0
-rw------T   1 root     root     67028992 Nov 20 12:50 t00   [<- lol*2. where did THAT come from?
-rw------T   1 root     root     67108864 Nov 20 12:50 t1     and slightly too small, see?]
-rw------T   1 root     root     67108864 Nov 20 12:50 t2
-----8<----

     t> They may both be on one disk, or they may not.  This is more
     t> to protect against corrupt blocks if you only have a single
     t> drive, than against losing an entire drive.

It's not because there is some data which is not spread across both
drives.  It's because ZFS won't let you get at ANY of the data, even
what's spread, because of a variety of sanity checks, core dumps, and
kernel panics.
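And to be clear about what that knob is actually for: copies is a per-dataset property, it only applies to blocks written after it is set, and it is aimed at isolated bad sectors on an unredundant pool, not at a whole missing vdev.  Something like (dataset names are just examples):

-----8<----
# only data written from now on gets two (possibly same-disk) copies;
# blocks already on disk keep whatever copy count they were born with
zfs set copies=2 tank/home
zfs get -r copies tank
-----8<----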
copies=2 is a really silly feature, IMHO.
On Thu, Nov 20, 2008 at 12:35 PM, Miles Nordin <carton at ivy.net> wrote:

>      t> If you set copies=2, and get lucky enough
>      t> that copies of every block on the standalone are copied to the
>      t> raidz vdev, you might be able to survive,
>
> no, of course you won't survive.  Just try it with file vdevs before
> pontificating about it.
>
> [test transcript snipped]
>
>      t> They may both be on one disk, or they may not.  This is more
>      t> to protect against corrupt blocks if you only have a single
>      t> drive, than against losing an entire drive.
>
> It's not because there is some data which is not spread across both
> drives.  It's because ZFS won't let you get at ANY of the data, even
> what's spread, because of a variety of sanity checks, core dumps, and
> kernel panics.
>
> copies=2 is a really silly feature, IMHO.

Pretty sure ALL of the above are settings that can be changed.  Not to
mention that *backline*, from what I've seen, can still get the data if you
have valid copies of the blocks.

--Tim
>>>>> "t" == Tim <tim at tcsac.net> writes:

     t> Pretty sure ALL of the above are settings that can be changed.

nope.  But feel free to be more specific.  Or repeat the test
yourself---it only takes like fifteen minutes.  It'd take five if not
for the rebooting.

     t> Not to mention that *backline*, from what I've seen, can still
     t> get the data if you have valid copies of the blocks.

Can you elaborate?  There have been a lot of posts here about pools in
FAULTED state that won't import---I'm sure those posters would be even
more interested than I to know how they can recover their data using
*backline*.
On Thu, Nov 20, 2008 at 5:02 PM, Miles Nordin <carton at ivy.net> wrote:

>      t> Pretty sure ALL of the above are settings that can be changed.
>
> nope.  But feel free to be more specific.  Or repeat the test
> yourself---it only takes like fifteen minutes.  It'd take five if not
> for the rebooting.

Uhh, yes.  There's more than one post here describing how to set what the
system does when the pool is in a faulted state or loses a drive: wait,
panic, or proceed.  If we had any sort of decent search function I'd
already have the response.  As such, it'll take a bit of time for me to
dig it up.

>      t> Not to mention that *backline*, from what I've seen, can still
>      t> get the data if you have valid copies of the blocks.
>
> Can you elaborate?  There have been a lot of posts here about pools in
> FAULTED state that won't import---I'm sure those posters would be even
> more interested than I to know how they can recover their data using
> *backline*.

I'd have to search the list, but I've read of more than one person here
who has worked with a Sun engineer to manually re-create the metadata.
On Thu, Nov 20, 2008 at 5:02 PM, Miles Nordin <carton at ivy.net> wrote:

>      t> Pretty sure ALL of the above are settings that can be changed.
>
> nope.  But feel free to be more specific.  Or repeat the test
> yourself---it only takes like fifteen minutes.  It'd take five if not
> for the rebooting.

Here's this one:

PSARC 2007/567 zpool failmode property
http://prefetch.net/blog/index.php/2008/03/01/configuring-zfs-to-gracefully-deal-with-failures/
http://mail.opensolaris.org/pipermail/onnv-notify/2007-October/012782.html
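On builds that have it, failmode is just a pool property (pool name is a placeholder):

-----8<----
# failmode controls what the pool does after a catastrophic failure:
# 'wait' (the default) blocks I/O until the devices come back,
# 'continue' returns EIO to new writes, and 'panic' panics the host
zpool set failmode=continue tank
zpool get failmode tank
-----8<----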
>>>>> "t" == Tim <tim at tcsac.net> writes:

     t> Uhh, yes.  There's more than one post here describing how to
     t> set what the system does when the pool is in a faulted state
     t> or loses a drive: wait, panic, or proceed.

(a) search for it, try it, then report your results.  Don't counter a
    reasonably-designed albeit quick test with stardust and fantasy
    based on vague innuendos.

(b) ``ALL of the above are settings that can be changed.''  Please
    tell me how to fix ``ALL'' my problems, which you claim are all
    fixable without enumerating them.  I will:

    1. panic on exporting the pool

    2. 'zpool import foolpool' coredumps

    3. pools backed by unavailable iSCSI targets and stored in
       zpool.cache prevent bootup

    4. WARNING: ZFS replay transaction error 30, dataset boot/usr,
       seq 0x134c, txtype 9

    5. disappearing, reappearing /usr/export/t00 (the backing for the
       file vdev was another ZFS pool, so this is also a ZFS problem)

Yeah, I'm pressing the issue.  But I'm annoyed because I gave the
question an honest go and got a clear result, and instead of repeating
my test as you easily could, you dismiss the result without even
reading it carefully.  I think it's possible to give much better advice
by trying the thing out than by extrapolating from marketing claims.

     t> I've read of more than one person here who has worked with a
     t> Sun engineer to manually re-create the metadata.

(a) not in this situation, they didn't.  The loss was much less
    catastrophic than a whole vdev---in fact it was never proven that
    anything was lost at all by the underlying storage.  I think the
    pool may also have had a simpler structure, like a single-device
    vdev exported as a LUN from a SAN.

(b) that's not the same thing as *backline*, whatever that is.

(c) that's pretty far short of ``if you get lucky enough ... you might
    be able to survive.''  My claim: there is zero chance of importing
    the pool in the normal, supported way after this failure scenario.

(d) I don't think the OP had in mind counting on support from Sun
    engineering when you responded by suggesting an unredundant pool
    with copies=2 might suit his needs.
On Thu, Nov 20, 2008 at 5:37 PM, Miles Nordin <carton at ivy.net> wrote:

> (b) ``ALL of the above are settings that can be changed.''  Please
>     tell me how to fix ``ALL'' my problems, which you claim are all
>     fixable without enumerating them.
>
> [list of five problems snipped]
>
> Yeah, I'm pressing the issue.  But I'm annoyed because I gave the
> question an honest go and got a clear result, and instead of repeating
> my test as you easily could, you dismiss the result without even
> reading it carefully.

I gave you the PSARC.

> (d) I don't think the OP had in mind counting on support from Sun
>     engineering when you responded by suggesting an unredundant pool
>     with copies=2 might suit his needs.

Which is why I told him not to do it, and that his chances were slim...
>>>>> "t" == Tim <tim at tcsac.net> writes:

     t> I gave you the PSARC.

test it.

     t> Which is why I told him not to do it, and that his chances
     t> were slim...

his chances are zero.
>>>>> "t" == Tim <tim at tcsac.net> writes:

     t> PSARC 2007/567 zpool failmode property

I think this is in s10u6, so it's been in stable Solaris for twenty-one
days.  It's not in the release I tested, so it's worth trying again on a
newer release.

I tried on snv_83a, and instead of getting a panic when I export the
pool, I get a panic when I try to import it:

-----8<-----
bash-3.2# mkfile 64m t0
bash-3.2# mkfile 64m t1
bash-3.2# mkfile 64m t2
bash-3.2# mkfile 64m t00
bash-3.2# pwd
/export/v
bash-3.2# zpool create foolpool raidz /export/v/t{0..2}
bash-3.2# zpool add foolpool /export/v/t00
invalid vdev specification
use '-f' to override the following errors:
mismatched replication level: pool uses raidz and new vdev is file
bash-3.2# zpool add -f foolpool /export/v/t00
bash-3.2# zpool set failmode=continue foolpool
bash-3.2# zfs set copies=2 foolpool
bash-3.2# cd /
bash-3.2# pax -rwpe sbin foolpool/
bash-3.2# zpool status -v foolpool
  pool: foolpool
 state: ONLINE
 scrub: none requested
config:

        NAME              STATE     READ WRITE CKSUM
        foolpool          ONLINE       0     0     0
          raidz1          ONLINE       0     0     0
            /export/v/t0  ONLINE       0     0     0
            /export/v/t1  ONLINE       0     0     0
            /export/v/t2  ONLINE       0     0     0
          /export/v/t00   ONLINE       0     0     0

errors: No known data errors
bash-3.2# > /export/v/t00
bash-3.2# zpool status -v foolpool
  pool: foolpool
 state: ONLINE
 scrub: none requested
config:

        NAME              STATE     READ WRITE CKSUM
        foolpool          ONLINE       0     0     0
          raidz1          ONLINE       0     0     0
            /export/v/t0  ONLINE       0     0     0
            /export/v/t1  ONLINE       0     0     0
            /export/v/t2  ONLINE       0     0     0
          /export/v/t00   ONLINE       0     0     0

errors: No known data errors
bash-3.2# pax -w foolpool/ > /dev/null
bash-3.2# zpool status -v foolpool
  pool: foolpool
 state: DEGRADED
status: One or more devices has experienced an unrecoverable error.  An
        attempt was made to correct the error.  Applications are unaffected.
action: Determine if the device needs to be replaced, and clear the errors
        using 'zpool clear' or replace the device with 'zpool replace'.
   see: http://www.sun.com/msg/ZFS-8000-9P
 scrub: none requested
config:

        NAME              STATE     READ WRITE CKSUM
        foolpool          DEGRADED     5     0    22
          raidz1          ONLINE       0     0     0
            /export/v/t0  ONLINE       0     0     0
            /export/v/t1  ONLINE       0     0     0
            /export/v/t2  ONLINE       0     0     0
          /export/v/t00   DEGRADED     5     0    22  too many errors

errors: No known data errors
bash-3.2# zpool export foolpool
bash-3.2# zpool import -d /export/v
  pool: foolpool
    id: 2069059560600538796
 state: DEGRADED
status: One or more devices contains corrupted data.
action: The pool can be imported despite missing or damaged devices.  The
        fault tolerance of the pool may be compromised if imported.
   see: http://www.sun.com/msg/ZFS-8000-4J
config:

        foolpool          DEGRADED
          raidz1          ONLINE
            /export/v/t0  ONLINE
            /export/v/t1  ONLINE
            /export/v/t2  ONLINE
          /export/v/t00   UNAVAIL  corrupted data
bash-3.2# zpool import -d /export/v foolpool
panic: assertion failed: vdev_config_sync(rvd->vdev_child, rvd->vdev_children, txg) == 0 (0x5 == 0x0), file: ../../common/fs/zfs/spa.c, line: 4095
-----8<-----