If I yank out a disk in a 4-disk raidz2 array, shouldn't the other disks
pick up without any errors? I have a 3120 JBOD, and when I yanked out a
disk, everything got hosed. That's okay, because I'm just testing and
wanted to see raidz2 in action when a disk goes down.

Am I missing a step? I set up the disks with these commands:

  zpool create apool raidz2 c#t#d0 c#t#d0 c#t#d0 c#t#d0
  zfs create apool/export_home
  zfs create apool/export_backup
  zfs set mountpoint=/export/home apool/export_home
  zfs set mountpoint=/export/backup apool/export_backup

Thanks for your help/advice.

Brian.
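(For anyone reproducing Brian's test: a quick way to see what ZFS makes
of the missing disk is to check the pool state, sketched below assuming
the pool is laid out as above; c1t2d0 is a hypothetical device name
standing in for one of the c#t#d0 disks.

  # overall pool health; with one disk gone, a raidz2 pool should
  # report DEGRADED (still serving data), not FAULTED
  zpool status -v apool

  # simulate a disk pull in software rather than physically yanking it
  zpool offline apool c1t2d0

  # bring the disk back and let the pool resilver
  zpool online apool c1t2d0

A raidz2 vdev carries two disks' worth of parity, so the four-disk pool
should survive up to two missing disks.)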
On Mon, Nov 19, 2007 at 04:33:26PM -0700, Brian Lionberger wrote:
> If I yank out a disk in a 4-disk raidz2 array, shouldn't the other
> disks pick up without any errors?
> I have a 3120 JBOD, and when I yanked out a disk, everything got
> hosed. That's okay, because I'm just testing and wanted to see raidz2
> in action when a disk goes down.
>
> Am I missing a step?

What version of Solaris are you running? What does "got hosed" mean?

There have been many improvements in proactively detecting failure,
culminating in build 77 of Nevada. Earlier builds:

- Were unable to distinguish device removal from devices misbehaving,
  depending on the driver and hardware.

- Did not diagnose a series of I/O failures as disk failure.

- Allowed several (painful) SCSI retries and continued to queue up I/O,
  even if the disk was fatally damaged.

Most classes of hardware would behave reasonably well on device removal,
but certain classes caused cascading failures in ZFS, all of which
should be resolved in build 77 or later.

- Eric

--
Eric Schrock, FishWorks                    http://blogs.sun.com/eschrock
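(When a build with these improvements does diagnose a pulled disk, the
diagnosis can also be inspected from the fault-management side; a sketch
using standard Solaris commands, though the exact output differs
between builds:

  # only the pools with problems, plus a suggested action
  zpool status -x

  # faults FMA has diagnosed, e.g. a ZFS device fault
  fmadm faulty

  # the raw error telemetry that led to the diagnosis
  fmdump -e
)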
Hi Eric, everyone,

Eric Schrock wrote:
> There have been many improvements in proactively detecting failure,
> culminating in build 77 of Nevada. Earlier builds:
>
> - Were unable to distinguish device removal from devices misbehaving,
>   depending on the driver and hardware.
>
> - Did not diagnose a series of I/O failures as disk failure.
>
> - Allowed several (painful) SCSI retries and continued to queue up I/O,
>   even if the disk was fatally damaged.
>
> Most classes of hardware would behave reasonably well on device removal,
> but certain classes caused cascading failures in ZFS, all of which
> should be resolved in build 77 or later.

I seem to be having exactly the problems you are describing (see my
postings with the subject 'zfs on a raid box'), so I would very much
like to give b77 a try. I'm currently running b76, as that's the latest
SXCE available. Are the sources to anything beyond b76 already
available? Would I need to build it, or bfu?

I'm seeing ZFS not making use of available hot spares when I pull a
disk, long and indeed painful SCSI retries, and very poor write
performance on a degraded zpool. I hope to be able to test whether b77
fares any better with this.

Regards, Paul Boven.
--
Paul Boven <boven at jive.nl> +31 (0)521-596547
Unix/Linux/Networking specialist
Joint Institute for VLBI in Europe - www.jive.nl
VLBI - It's a fringe science
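(On the hot-spare point: a spare has to be configured into the pool
before ZFS can use it, and on builds where it does not engage
automatically it can be swapped in by hand. A sketch against the apool
from earlier in the thread, with c1t3d0 as the failed disk and c2t0d0
as the spare, both hypothetical names:

  # make a disk available to the pool as a hot spare
  zpool add apool spare c2t0d0

  # if the spare does not kick in on its own, replace the failed
  # disk with it manually
  zpool replace apool c1t3d0 c2t0d0

  # after the failed disk is repaired, detach the spare to return
  # it to the available-spares list
  zpool detach apool c2t0d0
)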
comment on retries below...

Paul Boven wrote:
> Hi Eric, everyone,
>
> Eric Schrock wrote:
>> There have been many improvements in proactively detecting failure,
>> culminating in build 77 of Nevada. Earlier builds:
>>
>> - Were unable to distinguish device removal from devices misbehaving,
>>   depending on the driver and hardware.
>>
>> - Did not diagnose a series of I/O failures as disk failure.
>>
>> - Allowed several (painful) SCSI retries and continued to queue up I/O,
>>   even if the disk was fatally damaged.
>>
>> Most classes of hardware would behave reasonably well on device removal,
>> but certain classes caused cascading failures in ZFS, all of which
>> should be resolved in build 77 or later.
>
> I seem to be having exactly the problems you are describing (see my
> postings with the subject 'zfs on a raid box'), so I would very much
> like to give b77 a try. I'm currently running b76, as that's the latest
> SXCE available. Are the sources to anything beyond b76 already
> available? Would I need to build it, or bfu?
>
> I'm seeing ZFS not making use of available hot spares when I pull a
> disk, long and indeed painful SCSI retries, and very poor write
> performance on a degraded zpool. I hope to be able to test whether b77
> fares any better with this.

The SCSI retries are implemented at the driver level (usually sd),
below ZFS. By default, the timeout (60 seconds) and retry (3 or 5)
counters are somewhat conservative and are intended to apply to a wide
variety of hardware, including slow CD-ROMs and ancient processors.
Depending on your situation and business requirements, these may be
tuned. There is a pretty good article on BigAdmin which describes
tuning the FC side of the equation (the ssd driver):

http://www.sun.com/bigadmin/features/hub_articles/tuning_sfs.jsp

Beware: making these tunables too small can lead to an unstable system.
The article does a good job of explaining how interdependent the
tunables are, so hopefully you can make wise choices.
 -- richard
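(For the sd side, such tuning goes in /etc/system. A sketch only: the
variable names below are the documented sd/ssd globals, but the values
are purely illustrative, not recommendations; check the defaults for
your build and heed Richard's warning before changing anything.

  * shrink the per-command timeout and retry count for sd (SCSI disks)
  * documented defaults are 60 seconds and 5 retries
  set sd:sd_io_time = 10
  set sd:sd_retry_count = 3

  * FC disks use the ssd driver, which has its own copies
  set ssd:ssd_io_time = 10
  set ssd:ssd_retry_count = 3

A reboot is needed for /etc/system changes to take effect.)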
On Tue, Nov 20, 2007 at 11:02:55AM +0100, Paul Boven wrote:
>
> I seem to be having exactly the problems you are describing (see my
> postings with the subject 'zfs on a raid box'), so I would very much
> like to give b77 a try. I'm currently running b76, as that's the latest
> SXCE available. Are the sources to anything beyond b76 already
> available? Would I need to build it, or bfu?

The sources, yes (you can pull them from the ON mercurial mirror). It
looks like the latest SX:CE is still on build 76, so it doesn't seem
like you can get a binary distro yet.

> I'm seeing ZFS not making use of available hot spares when I pull a
> disk, long and indeed painful SCSI retries, and very poor write
> performance on a degraded zpool. I hope to be able to test whether b77
> fares any better with this.

What hardware/driver are you using? Build 76 should have the ability to
recognize removed devices via DKIOCGETSTATE and immediately transition
to the REMOVED state instead of going through the SCSI retry logic
(3 x 60 seconds). Build 77 added a 'probe' operation on I/O failure
that will try to read/write some basic data to the disk and, if that
fails, will immediately mark the disk as FAULTED without having to wait
for retries to fail and for FMA diagnosis to offline the device.

- Eric

--
Eric Schrock, FishWorks                    http://blogs.sun.com/eschrock
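(Short of writing a program against the DKIOCGETSTATE ioctl, the
closest you can get to watching this detection from a shell is probably
the attachment-point and error-counter views; a sketch, with c1 and
c1t2d0 as hypothetical controller and disk names:

  # receptacle/occupant state of the attachment points on controller c1
  cfgadm -al c1

  # per-device soft/hard/transport error counters from the driver
  iostat -En c1t2d0

  # what ZFS itself now believes about the device (REMOVED, FAULTED, ...)
  zpool status apool
)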