Hi.

I installed Solaris Express Developer Edition (b79) on a Supermicro
quad-core Harpertown E5405 with 8 GB RAM and two internal SATA drives,
and installed Solaris onto one of the internal drives. I added an Areca
ARC-1680 SAS controller and configured it in JBOD mode, then attached an
external SAS cabinet with 16 SAS drives of 1 TB (931 binary GB). I
created a raidz2 pool with ten disks and one spare, and copied some
400 GB of small files, each approx. 1 MB. To simulate a disk crash I
pulled one disk out of the cabinet; ZFS faulted the drive, pulled in the
spare and started a resilver.

During the resilver one of the remaining disks had a checksum error and
was marked as degraded. The zpool is now unavailable. I first tried to
add another spare but got an I/O error. I then tried to replace the
degraded disk by adding a new one:

# zpool add ef1 c3t1d3p0
cannot open '/dev/dsk/c3t1d3p0': I/O error

Partial dmesg:

Jul 25 13:14:00 malene arcmsr: [ID 419778 kern.notice] arcmsr0: scsi id=1 lun=3 ccb='0xffffff02e0ca0800' outstanding command timeout
Jul 25 13:14:00 malene arcmsr: [ID 610198 kern.notice] arcmsr0: scsi id=1 lun=3 fatal error on target, device was gone
Jul 25 13:14:00 malene arcmsr: [ID 658202 kern.warning] WARNING: arcmsr0: tran reset level=1
Jul 25 13:14:00 malene arcmsr: [ID 658202 kern.warning] WARNING: arcmsr0: tran reset level=0
Jul 25 13:15:00 malene arcmsr: [ID 419778 kern.notice] arcmsr0: scsi id=8 lun=0 ccb='0xffffff02e0c8be00' outstanding command timeout
Jul 25 13:15:00 malene arcmsr: [ID 610198 kern.notice] arcmsr0: scsi id=8 lun=0 fatal error on target, device was gone
Jul 25 13:15:00 malene arcmsr: [ID 419778 kern.notice] arcmsr0: scsi id=0 lun=0 ccb='0xffffff02e0c92a00' outstanding command timeout
Jul 25 13:15:00 malene arcmsr: [ID 610198 kern.notice] arcmsr0: scsi id=0 lun=0 fatal error on target, device was gone
Jul 25 13:15:00 malene arcmsr: [ID 658202 kern.warning] WARNING: arcmsr0: tran reset level=1
Jul 25 13:15:00 malene arcmsr: [ID 658202 kern.warning] WARNING: arcmsr0: tran reset level=0
Jul 25 13:15:00 malene arcmsr: [ID 419778 kern.notice] arcmsr0: scsi id=0 lun=5 ccb='0xffffff02e0c97200' outstanding command timeout
Jul 25 13:15:00 malene arcmsr: [ID 610198 kern.notice] arcmsr0: scsi id=0 lun=5 fatal error on target, device was gone
Jul 25 13:15:00 malene arcmsr: [ID 658202 kern.warning] WARNING: arcmsr0: tran reset level=1
Jul 25 13:15:00 malene arcmsr: [ID 658202 kern.warning] WARNING: arcmsr0: tran reset level=0
Jul 25 13:15:00 malene arcmsr: [ID 419778 kern.notice] arcmsr0: scsi id=1 lun=3 ccb='0xffffff02e0ca0800' outstanding command timeout
Jul 25 13:15:00 malene arcmsr: [ID 610198 kern.notice] arcmsr0: scsi id=1 lun=3 fatal error on target, device was gone
Jul 25 13:15:00 malene arcmsr: [ID 658202 kern.warning] WARNING: arcmsr0: tran reset level=1
Jul 25 13:15:00 malene arcmsr: [ID 658202 kern.warning] WARNING: arcmsr0: tran reset level=0
Jul 25 13:15:00 malene scsi: [ID 107833 kern.warning] WARNING: /pci@0,0/pci8086,25f9@6/pci10b5,8533@0/pci10b5,8533@9/pci17d3,1680@0/sd@1,3 (sd8):
Jul 25 13:15:00 malene   offline or reservation conflict

/usr/sbin/zpool status
  pool: ef1
 state: DEGRADED
status: One or more devices are faulted in response to persistent errors.
        Sufficient replicas exist for the pool to continue functioning in a
        degraded state.
action: Replace the faulted device, or use 'zpool clear' to mark the device
        repaired.
 scrub: resilver in progress, 0.02% done, 5606h29m to go
config:

        NAME            STATE     READ WRITE CKSUM
        ef1             DEGRADED     0     0     0
          raidz2        DEGRADED     0     0     0
            spare       ONLINE       0     0     0
              c3t0d0p0  ONLINE       0     0     0
              c3t1d2p0  ONLINE       0     0     0
            c3t0d1p0    ONLINE       0     0     0
            c3t0d2p0    ONLINE       0     0     0
            c3t0d0p0    FAULTED     35 1.61K     0  too many errors
            c3t0d4p0    ONLINE       0     0     0
            c3t0d5p0    DEGRADED     0     0    34  too many errors
            c3t0d6p0    ONLINE       0     0     0
            c3t0d7p0    ONLINE       0     0     0
            c3t1d0p0    ONLINE       0     0     0
            c3t1d1p0    ONLINE       0     0     0
        spares
          c3t1d2p0      INUSE     currently in use

errors: No known data errors

When I try to start cli64 to access the ARC-1680 card it hangs as well.
Is this a deficiency in the arcmsr driver?

-- 
regards
Claus

When lenity and cruelty play for a kingdom,
the gentlest gamester is the soonest winner.

Shakespeare
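For reference, zpool add grows the pool by attaching an additional top-level
vdev; swapping out a failing member of an existing raidz2 vdev is normally
done with zpool replace. A minimal sketch, with device names taken from the
status output above purely as examples (c3t0d5p0 as the degraded member,
c3t1d3p0 as the replacement):

  # zpool replace ef1 c3t0d5p0 c3t1d3p0    (resilver the new disk in place)
  # zpool status -v ef1                    (watch resilver progress)

Of course neither form can work while the controller is returning I/O errors
for the new device, which is the real problem being reported here.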
Hi Claus,

Claus Guttesen wrote:
> Hi.
>
> I installed solaris express developer edition (b79) on a supermicro
> quad-core harpertown E5405 with 8 GB ram and two internal sata-drives.
> I installed solaris onto one of the internal drives. I added an areca
> arc-1680 sas-controller and configured it in jbod-mode. I attached an
> external sas-cabinet with 16 sas-drives 1 TB (931 binary GB). I
> created a raidz2-pool with ten disks and one spare. I then copied some
> 400 GB of small files each approx. 1 MB. To simulate a disk-crash I
> pulled one disk out of the cabinet and zfs faulted the drive and used
> the spare and started a resilver.

I'm not convinced that this is a valid test; yanking a disk out
will have physical-layer effects apart from removing the device
from your system. I think relling or roch would have something
to say on this also.

> During the resilver-process one of the remaining disks had a
> checksum-error and was marked as degraded. The zpool is now
> unavailable. I first tried to add another spare but got I/O-error. I
> then tried to replace the degraded disk by adding a new one:
>
> # zpool add ef1 c3t1d3p0
> cannot open '/dev/dsk/c3t1d3p0': I/O error
>
> Partial dmesg:
>
> Jul 25 13:14:00 malene arcmsr: [ID 419778 kern.notice] arcmsr0: scsi
> id=1 lun=3 ccb='0xffffff02e0ca0800' outstanding command timeout
> Jul 25 13:14:00 malene arcmsr: [ID 610198 kern.notice] arcmsr0: scsi
> id=1 lun=3 fatal error on target, device was gone
> Jul 25 13:14:00 malene arcmsr: [ID 658202 kern.warning] WARNING:
> arcmsr0: tran reset level=1

tran reset with level=1 is a bus reset.

> Jul 25 13:14:00 malene arcmsr: [ID 658202 kern.warning] WARNING:
> arcmsr0: tran reset level=0

tran reset with level=0 is a target-specific reset, which arcmsr
doesn't support.

...

> Jul 25 13:15:00 malene arcmsr: [ID 419778 kern.notice] arcmsr0: scsi
> id=1 lun=3 ccb='0xffffff02e0ca0800' outstanding command timeout
> Jul 25 13:15:00 malene arcmsr: [ID 610198 kern.notice] arcmsr0: scsi
> id=1 lun=3 fatal error on target, device was gone

The command timed out because your system configuration was unexpectedly
changed in a manner which arcmsr doesn't support.

....

> /usr/sbin/zpool status
>   pool: ef1
>  state: DEGRADED
> status: One or more devices are faulted in response to persistent errors.
>         Sufficient replicas exist for the pool to continue functioning in a
>         degraded state.
> action: Replace the faulted device, or use 'zpool clear' to mark the device
>         repaired.
>  scrub: resilver in progress, 0.02% done, 5606h29m to go
> config:
>
>         NAME            STATE     READ WRITE CKSUM
>         ef1             DEGRADED     0     0     0
>           raidz2        DEGRADED     0     0     0
>             spare       ONLINE       0     0     0
>               c3t0d0p0  ONLINE       0     0     0
>               c3t1d2p0  ONLINE       0     0     0
>             c3t0d1p0    ONLINE       0     0     0
>             c3t0d2p0    ONLINE       0     0     0
>             c3t0d0p0    FAULTED     35 1.61K     0  too many errors
>             c3t0d4p0    ONLINE       0     0     0
>             c3t0d5p0    DEGRADED     0     0    34  too many errors
>             c3t0d6p0    ONLINE       0     0     0
>             c3t0d7p0    ONLINE       0     0     0
>             c3t1d0p0    ONLINE       0     0     0
>             c3t1d1p0    ONLINE       0     0     0
>         spares
>           c3t1d2p0      INUSE     currently in use
>
> errors: No known data errors

A double disk failure while resilvering - not a good state for your
pool to be in.

Can you wait for the resilver to complete? Every minute that goes
by tends to decrease the estimate on how long remains.

In addition, why are you using p0 devices rather than GPT-labelled
disks (or whole-disk s0 slices)?

> When I try to start cli64 to access the arc-1680-card it hangs as well.
> Is this a deficiency in the arcmsr-driver?

I'll quibble - "this" can mean several things.

Yes, there seems to be an issue with arcmsr's handling of uncoordinated
device removal. I advise against doing this.

I don't know how cli64 works and you haven't provided any messages output
from the system at the time when "it hangs" - is that the cli64 util,
the system, your zpool? ...

For interest - which version of arcmsr are you running?


James C. McPherson
--
Senior Kernel Software Engineer, Solaris
Sun Microsystems
http://blogs.sun.com/jmcp       http://www.jmcp.homeunix.com/blog
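A minimal sketch of the suggestion above: create the same layout on whole
disks instead of p0 partition devices, so ZFS can put its own EFI/GPT label
on each drive (device names are only examples; Claus does exactly this later
in the thread):

  # zpool create ef1 raidz2 c3t0d0 c3t0d1 c3t0d2 c3t0d3 c3t0d4 \
        c3t0d5 c3t0d6 c3t0d7 c3t1d0 c3t1d1 \
        spare c3t1d2

Given whole disks, ZFS will also enable the drives' write caches, which it
does not do for partition or slice devices.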
>> I installed solaris express developer edition (b79) on a supermicro
>> quad-core harpertown E5405 with 8 GB ram and two internal sata-drives.
>> I installed solaris onto one of the internal drives. I added an areca
>> arc-1680 sas-controller and configured it in jbod-mode. I attached an
>> external sas-cabinet with 16 sas-drives 1 TB (931 binary GB). I
>> created a raidz2-pool with ten disks and one spare. I then copied some
>> 400 GB of small files each approx. 1 MB. To simulate a disk-crash I
>> pulled one disk out of the cabinet and zfs faulted the drive and used
>> the spare and started a resilver.
>
> I'm not convinced that this is a valid test; yanking a disk out
> will have physical-layer effects apart from removing the device
> from your system. I think relling or roch would have something
> to say on this also.

In later tests I will use zpool to offline the disk instead. Thank you
for pointing this out.

>> During the resilver-process one of the remaining disks had a
>> checksum-error and was marked as degraded. The zpool is now
>> unavailable. I first tried to add another spare but got I/O-error. I
>> then tried to replace the degraded disk by adding a new one:
>>
>> # zpool add ef1 c3t1d3p0
>> cannot open '/dev/dsk/c3t1d3p0': I/O error
>>
>> Partial dmesg:
>>
>> Jul 25 13:14:00 malene arcmsr: [ID 419778 kern.notice] arcmsr0: scsi
>> id=1 lun=3 ccb='0xffffff02e0ca0800' outstanding command timeout
>> Jul 25 13:14:00 malene arcmsr: [ID 610198 kern.notice] arcmsr0: scsi
>> id=1 lun=3 fatal error on target, device was gone
>> Jul 25 13:14:00 malene arcmsr: [ID 658202 kern.warning] WARNING:
>> arcmsr0: tran reset level=1
>
> tran reset with level=1 is a bus reset
>
>> Jul 25 13:14:00 malene arcmsr: [ID 658202 kern.warning] WARNING:
>> arcmsr0: tran reset level=0
>
> tran reset with level=0 is a target-specific reset, which arcmsr
> doesn't support.
>
> ...
>
>> Jul 25 13:15:00 malene arcmsr: [ID 419778 kern.notice] arcmsr0: scsi
>> id=1 lun=3 ccb='0xffffff02e0ca0800' outstanding command timeout
>> Jul 25 13:15:00 malene arcmsr: [ID 610198 kern.notice] arcmsr0: scsi
>> id=1 lun=3 fatal error on target, device was gone
>
> The command timed out because your system configuration was unexpectedly
> changed in a manner which arcmsr doesn't support.

Are there alternative JBOD-capable SAS controllers in the same range as
the ARC-1680 that are compatible with Solaris? I chose the ARC-1680
since it's well supported on FreeBSD and Solaris.

>> /usr/sbin/zpool status
>>   pool: ef1
>>  state: DEGRADED
>> status: One or more devices are faulted in response to persistent errors.
>>         Sufficient replicas exist for the pool to continue functioning in a
>>         degraded state.
>> action: Replace the faulted device, or use 'zpool clear' to mark the device
>>         repaired.
>>  scrub: resilver in progress, 0.02% done, 5606h29m to go
>> config:
>>
>>         NAME            STATE     READ WRITE CKSUM
>>         ef1             DEGRADED     0     0     0
>>           raidz2        DEGRADED     0     0     0
>>             spare       ONLINE       0     0     0
>>               c3t0d0p0  ONLINE       0     0     0
>>               c3t1d2p0  ONLINE       0     0     0
>>             c3t0d1p0    ONLINE       0     0     0
>>             c3t0d2p0    ONLINE       0     0     0
>>             c3t0d0p0    FAULTED     35 1.61K     0  too many errors
>>             c3t0d4p0    ONLINE       0     0     0
>>             c3t0d5p0    DEGRADED     0     0    34  too many errors
>>             c3t0d6p0    ONLINE       0     0     0
>>             c3t0d7p0    ONLINE       0     0     0
>>             c3t1d0p0    ONLINE       0     0     0
>>             c3t1d1p0    ONLINE       0     0     0
>>         spares
>>           c3t1d2p0      INUSE     currently in use
>>
>> errors: No known data errors
>
> a double disk failure while resilvering - not a good state for your
> pool to be in.

The degraded disk came after I pulled the first disk and was not
intended. :-)

> Can you wait for the resilver to complete? Every minute that goes
> by tends to decrease the estimate on how long remains.

The resilver had approx. three hours remaining when the second disk was
marked as degraded. After that the resilver process (and access to the
raidz2 pool as such) stopped.

> In addition, why are you using p0 devices rather than GPT-labelled
> disks (or whole-disk s0 slices)?

My ignorance. I'm a fairly seasoned FreeBSD administrator and had
previously used da0, da1, da2 etc. when I defined a similar raidz2 on
FreeBSD. But when I installed Solaris I initially only saw lun 0 on
targets 0 and 1, and then tried the devices that I saw. And the p0
device in /dev/dsk was the first to respond to my zpool create
command. :^) Modifying /kernel/drv/sd.conf made all the LUNs visible.
Solaris is a different kind of animal.

I have destroyed the pool and created a new raidz2 using the c3t0d0,
c3t0d1, c3t0d2 etc. devices instead.

> I don't know how cli64 works and you haven't provided any messages output
> from the system at the time when "it hangs" - is that the cli64 util,
> the system, your zpool? ...

I tried to start the program but it hung. Here is an example from when
I can access the utility:

CLI> disk info
  #  Enc#  Slot#     ModelName                Capacity   Usage
==============================================================================
  1  01    Slot#1    N.A.                        0.0GB   N.A.
  2  01    Slot#2    N.A.                        0.0GB   N.A.
  3  01    Slot#3    N.A.                        0.0GB   N.A.
  4  01    Slot#4    N.A.                        0.0GB   N.A.
  5  01    Slot#5    N.A.                        0.0GB   N.A.
  6  01    Slot#6    N.A.                        0.0GB   N.A.
  7  01    Slot#7    N.A.                        0.0GB   N.A.
  8  01    Slot#8    N.A.                        0.0GB   N.A.
  9  02    SLOT 000  SEAGATE ST31000640SS     1000.2GB   JBOD
 10  02    SLOT 001  SEAGATE ST31000640SS     1000.2GB   JBOD
 11  02    SLOT 002  SEAGATE ST31000640SS     1000.2GB   JBOD
 12  02    SLOT 003  SEAGATE ST31000640SS     1000.2GB   JBOD
 13  02    SLOT 004  SEAGATE ST31000640SS     1000.2GB   JBOD
 14  02    SLOT 005  SEAGATE ST31000640SS     1000.2GB   JBOD
 15  02    SLOT 006  SEAGATE ST31000640SS     1000.2GB   JBOD
 16  02    SLOT 007  SEAGATE ST31000640SS     1000.2GB   JBOD
 17  02    SLOT 008  SEAGATE ST31000640SS     1000.2GB   JBOD
 18  02    SLOT 009  SEAGATE ST31000640SS     1000.2GB   JBOD
 19  02    SLOT 010  SEAGATE ST31000640SS     1000.2GB   JBOD
 20  02    SLOT 011  SEAGATE ST31000640SS     1000.2GB   JBOD
 21  02    SLOT 012  SEAGATE ST31000640SS     1000.2GB   JBOD
 22  02    SLOT 013  SEAGATE ST31000640SS     1000.2GB   JBOD
 23  02    SLOT 014  SEAGATE ST31000640SS     1000.2GB   JBOD
 24  02    SLOT 015  SEAGATE ST31000640SS     1000.2GB   JBOD
==============================================================================

> For interest - which version of arcmsr are you running?

I'm running the version that was supplied on the CD, this is
1.20.00.15 from 2007-04-04. The firmware is V1.45 from 2008-3-27.

-- 
regards
Claus

When lenity and cruelty play for a kingdom,
the gentlest gamester is the soonest winner.

Shakespeare
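A sketch of the software-only variant of the pull-the-disk test that Claus
mentions above, assuming the new c3t0dN device names (adjust to your own
pool):

  # zpool offline ef1 c3t0d3       (take the member out of service)
  # zpool online ef1 c3t0d3        (bring it back; ZFS resilvers the delta)

or, to exercise a hot spare and a replacement:

  # zpool offline ef1 c3t0d3
  # zpool replace ef1 c3t0d3 c3t1d3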
Chad Leigh -- Shire.Net LLC
2008-Jul-25 13:41 UTC
[zfs-discuss] zfs, raidz, spare and jbod
On Jul 25, 2008, at 7:27 AM, Claus Guttesen wrote:

> I'm running the version that was supplied on the CD, this is
> 1.20.00.15 from 2007-04-04. The firmware is V1.45 from 2008-3-27.

Check the version at the Areca website. They may have a more recent
driver there. The dates are later for the 1.20.00.15 and there is a
-71010 extension.

Otherwise, file a bug with Areca. They are pretty good about responding.

Chad

---
Chad Leigh -- Shire.Net LLC
Your Web App and Email hosting provider
chad at shire.net
>> I'm running the version that was supplied on the CD, this is
>> 1.20.00.15 from 2007-04-04. The firmware is V1.45 from 2008-3-27.
>
> Check the version at the Areca website. They may have a more recent driver
> there. The dates are later for the 1.20.00.15 and there is a -71010
> extension.
>
> Otherwise, file a bug with Areca. They are pretty good about responding.

I actually tried this driver as well, but according to the pkginfo file
the driver from the ftp-server is VERSION=1.20.00.13,REV=2006.08.14
whereas the supplied driver is VERSION=1.20.00.15,REV=2007.08.14.

-- 
regards
Claus

When lenity and cruelty play for a kingdom,
the gentlest gamester is the soonest winner.

Shakespeare
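A quick way to compare what the package database says with what the kernel
has actually loaded (the Areca package name varies by driver bundle, so it
is shown here only as a placeholder):

  # modinfo | grep -i arcmsr       (revision of the loaded arcmsr module)
  # pkginfo | grep -i arc          (find the installed Areca package name)
  # pkginfo -l <areca-package>     (shows the VERSION/REV strings quoted above)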
>>>>> "jcm" == James C McPherson <James.McPherson at Sun.COM> writes:

   jcm> I'm not convinced that this is a valid test; yanking a disk

it is the ONLY valid test.  it's just testing more than ZFS.
Miles Nordin wrote:
>>>>>> "jcm" == James C McPherson <James.McPherson at Sun.COM> writes:
>
>    jcm> I'm not convinced that this is a valid test; yanking a disk
>
> it is the ONLY valid test.  it's just testing more than ZFS.

disagree.  It is only a test of the failure mode of yanking a disk.
I will submit that this failure mode is often best solved by door
locks, not software.

FWIW, I did post a Pareto chart of disk failure modes we measured in
our installed base over a large sample size.
http://blogs.sun.com/relling/entry/zfs_copies_and_data_protection
"yanking a disk" did not even make the "other" category.
 -- richard
>>>>> "re" == Richard Elling <Richard.Elling at Sun.COM> writes:

    re> I will submit that this failure mode is often best
    re> solved by door locks, not software.

First, not just door locks, but:

 * redundant power supplies

 * sleds and Maintain Me, Please lights

 * high-strung extremely conservative sysadmins who take months to do
   small jobs and demand high salaries

 * racks, pedestals, separate rooms, mains wiring diversity

in short, all the costly and cumbersome things ZFS is supposed to make
optional.

Secondly, from skimming the article you posted, ``did not even make
the Other category'' in this case seems to mean the study doesn't
consider it, not that you captured some wholistic reliability data and
found that it didn't occur.

Thirdly, as people keep saying over and over in here, the reason they
pull drives is to simulate the kind of fails-to-spin,
fails-to-IDENTIFY, spews-garbage-onto-the-bus drive that many of us
have seen cause lower-end systems to do weird things.  If it didn't
happen, we wouldn't have *SEEN* it, and wouldn't be trying to simulate
it.  You can't make me distrust my own easily-remembered experience
from like two months ago by plotting some bar chart.

A month ago you were telling us these tiny boards with some $10
chinese chip that split one SATA connector into two, built into Sun's
latest JBOD drive sleds, are worth a 500% markup on 1TB drives because
in the real world, cables fail, controllers fail, drives spew garbage
onto busses, therefore simple fan-out port multipliers are not good
enough---you need this newly-conceived ghetto-multipath.  Now you're
telling me failed controllers, cables, and drive firmware are allowed
to lock a whole kernel because they ``don't even make the Other
category.''  Sorry, that does not compute.

I think I'm going to want a ``simulate channel A failure'' button on
this $700 sled.  If only the sled weren't so expensive I could
simulate it myself by sanding off the resist and scribbling over the
traces with a pencil or something.  I basically don't trust any of it
any more, and I'll stop pulling drives when I have a
drive-failure-simulator I trust more than that procedure.  'zpool
offline' is not a drive-failure-simulator---I've already established
on my own system it's very different, and there is at least one fix
going into b94 trying to close that gap.

I'm sorry, this is just ridiculous.
On Fri, 25 Jul 2008, Miles Nordin wrote:

> I think I'm going to want a ``simulate channel A failure'' button on
> this $700 sled.  If only the sled weren't so expensive I could

Why don't you just purchase the smallest possible drive from Sun and
replace it with a cheap graymarket 1.5TB drive from some random place
on the net?  Then install the small drive in your home PC.  That is
what the rest of us who don't care about reliability, warranty, or
service do (but the home PC runs great!).

Bob
======================================
Bob Friesenhahn
bfriesen at simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer,    http://www.GraphicsMagick.org/
>>>>> "bf" == Bob Friesenhahn <bfriesen at simple.dallas.tx.us> writes:

    bf> purchase the smallest possible drive

right, good point.  The failed-channel-simulator could be constructed
from the smallest drive/sled module.
Miles Nordin wrote:
>>>>>> "re" == Richard Elling <Richard.Elling at Sun.COM> writes:
>
>     re> I will submit that this failure mode is often best
>     re> solved by door locks, not software.
>
> First, not just door locks, but:
>
>  * redundant power supplies
>
>  * sleds and Maintain Me, Please lights
>
>  * high-strung extremely conservative sysadmins who take months to do
>    small jobs and demand high salaries
>
>  * racks, pedestals, separate rooms, mains wiring diversity
>
> in short, all the costly and cumbersome things ZFS is supposed to make
> optional.

:-)  I don't think it is in the ZFS design scope to change diversity...

> Secondly, from skimming the article you posted, ``did not even make
> the Other category'' in this case seems to mean the study doesn't
> consider it, not that you captured some wholistic reliability data and
> found that it didn't occur.

You are correct that in the samples we collected, we had no records of
disks spontaneously falling out of the system.  The failures we
collected for this study were those not caused by service actions.

> Thirdly, as people keep saying over and over in here, the reason they
> pull drives is to simulate the kind of fails-to-spin,
> fails-to-IDENTIFY, spews-garbage-onto-the-bus drive that many of us
> have seen cause lower-end systems to do weird things.  If it didn't
> happen, we wouldn't have *SEEN* it, and wouldn't be trying to simulate
> it.  You can't make me distrust my own easily-remembered experience
> from like two months ago by plotting some bar chart.

What happens when the device suddenly disappears is that the device
selection fails.  This exercises a code path that is relatively short
and does the obvious.  A failure to spin exercises a very different
code path, because the host can often talk to the disk but the disk
itself is sick.

> A month ago you were telling us these tiny boards with some $10
> chinese chip that split one SATA connector into two, built into Sun's
> latest JBOD drive sleds, are worth a 500% markup on 1TB drives because
> in the real world, cables fail, controllers fail, drives spew garbage
> onto busses, therefore simple fan-out port multipliers are not good
> enough---you need this newly-conceived ghetto-multipath.  Now you're
> telling me failed controllers, cables, and drive firmware are allowed
> to lock a whole kernel because they ``don't even make the Other
> category.''  Sorry, that does not compute.

I believe the record will show that there are known bugs in the Marvell
driver which have caused this problem for SATA drives.  In the JBOD
sled case, this exact problem would not exist because you hot-plug to
SAS interfaces, not SATA interfaces -- different controller and driver.

> I think I'm going to want a ``simulate channel A failure'' button on
> this $700 sled.  If only the sled weren't so expensive I could
> simulate it myself by sanding off the resist and scribbling over the
> traces with a pencil or something.  I basically don't trust any of it
> any more, and I'll stop pulling drives when I have a
> drive-failure-simulator I trust more than that procedure.  'zpool
> offline' is not a drive-failure-simulator---I've already established
> on my own system it's very different, and there is at least one fix
> going into b94 trying to close that gap.
>
> I'm sorry, this is just ridiculous.

With parallel SCSI this was a lot easier -- we could just wire a switch
into the bus and cause stuck-at faults quite easily.  With SAS and SATA
it is more difficult because they only share differential pairs in a
point-to-point link.  There is link detection going on all of the time,
which precludes testing for stuck-at faults.  Each packet has CRCs, so
in order to induce a known bad packet for testing you'll have to write
some code which makes intentionally bad packets.  But this will only
really test the part of the controller chip which does CRC validation,
which is, again, probably not what you want.  It actually works a lot
more like Ethernet, which also has differential signalling, link
detection, and CRCs.

But if you really just want to do fault injections, then you should
look at ztest,
http://opensolaris.org/os/community/zfs/ztest/
though it is really a ZFS code-path exerciser and not a Marvell driver
path exerciser.  If you want to test the Marvell code path then you
might look at project COMSTAR, which will allow you to configure
another host to look like a disk; then you can make all sorts of
simulated disk faults by making unexpected responses, broken packets,
really slow responses, etc.
http://opensolaris.org/os/project/comstar/
 -- richard
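A rough outline of the COMSTAR approach described above: export a zvol from
a second host as an iSCSI LU and use that as the "disk" under test, so
faults (delays, dropped sessions, an offlined LU) can be injected from the
target side. Command names assume a build with COMSTAR and its iSCSI target
port provider installed, and the pool/zvol names are made up, so treat this
as a sketch rather than a recipe:

  # zfs create -V 50g tank/fakedisk
  # svcadm enable stmf
  # sbdadm create-lu /dev/zvol/rdsk/tank/fakedisk
  # stmfadm add-view <lu-guid-from-sbdadm-output>
  # itadm create-target

The initiator host then sees an ordinary-looking disk whose behaviour is
entirely under your control on the target side.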
Claus Guttesen wrote:
...
>>> Jul 25 13:15:00 malene arcmsr: [ID 419778 kern.notice] arcmsr0: scsi
>>> id=1 lun=3 ccb='0xffffff02e0ca0800' outstanding command timeout
>>> Jul 25 13:15:00 malene arcmsr: [ID 610198 kern.notice] arcmsr0: scsi
>>> id=1 lun=3 fatal error on target, device was gone
>> The command timed out because your system configuration was unexpectedly
>> changed in a manner which arcmsr doesn't support.
>
> Are there alternative jbod-capable sas-controllers in the same range
> as the arc-1680 that are compatible with solaris? I chose the
> arc-1680 since it's well-supported on FreeBSD and Solaris.

I don't know, quite probably :)  Have a look at the HCL for Solaris 10,
Solaris Express and OpenSolaris 2008.05 -

http://www.sun.com/bigadmin/hcl/
http://www.sun.com/bigadmin/hcl/data/sx/
http://www.sun.com/bigadmin/hcl/data/os/

>>> /usr/sbin/zpool status
>>>   pool: ef1
>>>  state: DEGRADED
>>> status: One or more devices are faulted in response to persistent errors.
>>>         Sufficient replicas exist for the pool to continue functioning in a
>>>         degraded state.
>>> action: Replace the faulted device, or use 'zpool clear' to mark the device
>>>         repaired.
>>>  scrub: resilver in progress, 0.02% done, 5606h29m to go
>>> config:
>>>
>>>         NAME            STATE     READ WRITE CKSUM
>>>         ef1             DEGRADED     0     0     0
>>>           raidz2        DEGRADED     0     0     0
>>>             spare       ONLINE       0     0     0
>>>               c3t0d0p0  ONLINE       0     0     0
>>>               c3t1d2p0  ONLINE       0     0     0
>>>             c3t0d1p0    ONLINE       0     0     0
>>>             c3t0d2p0    ONLINE       0     0     0
>>>             c3t0d0p0    FAULTED     35 1.61K     0  too many errors
>>>             c3t0d4p0    ONLINE       0     0     0
>>>             c3t0d5p0    DEGRADED     0     0    34  too many errors
>>>             c3t0d6p0    ONLINE       0     0     0
>>>             c3t0d7p0    ONLINE       0     0     0
>>>             c3t1d0p0    ONLINE       0     0     0
>>>             c3t1d1p0    ONLINE       0     0     0
>>>         spares
>>>           c3t1d2p0      INUSE     currently in use
>>>
>>> errors: No known data errors
>> a double disk failure while resilvering - not a good state for your
>> pool to be in.
>
> The degraded disk came after I pulled the first disk and was not intended. :-)

That's usually the case :)

>> Can you wait for the resilver to complete? Every minute that goes
>> by tends to decrease the estimate on how long remains.
>
> The resilver had approx. three hours remaining when the second disk
> was marked as degraded. After that the resilver process (and access
> as such) to the raidz2-pool stopped.

I think that's probably to be expected.

>> In addition, why are you using p0 devices rather than GPT-labelled
>> disks (or whole-disk s0 slices)?
>
> My ignorance. I'm a fairly seasoned FreeBSD-administrator and had
> previously used da0, da1, da2 etc. when I defined a similar raidz2 on
> FreeBSD. But when I installed solaris I initially saw lun 0 on target
> 0 and 1 and then tried the devices that I saw. And the p0-device in
> /dev/dsk was the first to respond to my zpool create-command. :^)

Not to worry - every OS handles things a little differently in that area.

> Modifying /kernel/drv/sd.conf made all the lun's visible.

Yes - by default the Areca will only present targets, not any luns
underneath, so sd.conf modification is necessary. I'm working on
getting that fixed.


James C. McPherson
--
Senior Kernel Software Engineer, Solaris
Sun Microsystems
http://blogs.sun.com/jmcp       http://www.jmcp.homeunix.com/blog
> I installed solaris express developer edition (b79) on a supermicro
> quad-core harpertown E5405 with 8 GB ram and two internal sata-drives.
> I installed solaris onto one of the internal drives. I added an areca
> arc-1680 sas-controller and configured it in jbod-mode. I attached an
> external sas-cabinet with 16 sas-drives 1 TB (931 binary GB). I
> created a raidz2-pool with ten disks and one spare. I then copied some
> 400 GB of small files each approx. 1 MB. To simulate a disk-crash I
> pulled one disk out of the cabinet and zfs faulted the drive and used
> the spare and started a resilver.
>
> During the resilver-process one of the remaining disks had a
> checksum-error and was marked as degraded. The zpool is now
> unavailable. I first tried to add another spare but got I/O-error. I
> then tried to replace the degraded disk by adding a new one:
>
> # zpool add ef1 c3t1d3p0
> cannot open '/dev/dsk/c3t1d3p0': I/O error
>
> Partial dmesg:
>
> Jul 25 13:14:00 malene arcmsr: [ID 419778 kern.notice] arcmsr0: scsi
> id=1 lun=3 ccb='0xffffff02e0ca0800' outstanding command timeout
> Jul 25 13:14:00 malene arcmsr: [ID 610198 kern.notice] arcmsr0: scsi
> id=1 lun=3 fatal error on target, device was gone
> Jul 25 13:14:00 malene arcmsr: [ID 658202 kern.warning] WARNING:
> arcmsr0: tran reset level=1
>
> Is this a deficiency in the arcmsr-driver?

I believe I have found the problem. I tried to define a raid-5 volume
on the ARC-1680 card and still saw errors as mentioned above. Areca
support suggested that I upgrade to the latest Solaris drivers (located
in the beta folder) and upgrade the firmware as well. I did both and it
somewhat solved my problems, but I had very poor write performance,
2-6 MB/s.

So I deleted my zpool, changed the ARC-1680 configuration and put all
disks in passthrough mode. I created a new zpool, performed similar
tests and have not experienced any abnormal behaviour.

I'm re-installing the server with FreeBSD and will do similar tests and
report back.

-- 
regards
Claus

When lenity and cruelty play for a kingdom,
the gentlest gamester is the soonest winner.

Shakespeare
Hi,

One of the things you could have done to continue the resilver is
"zpool clear". This would have let you continue to replace the drive
you pulled out. Once that was done, you could have then figured out
what was wrong with the second faulty drive.

The second drive only had checksum errors; ZFS was doing its job, and
the data on it was still usable. zpool clear would have kept the pool
online, albeit with lots of complaints.

I have had to use zpool clear multiple times on one of my zpools after
a PSU failure took out a HDD and damaged another. Mobo and RAM died
too :( The damaged drive racked up thousands and thousands of errors
while I replaced the dead drive. In the end I only lost one small file.

I'm no expert, but that's how I got around a similar problem.

This message posted from opensolaris.org
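A minimal sketch of the sequence described above, using the pool name from
this thread (device names are examples only):

  # zpool clear ef1                        (reset the pool-wide fault state)
  # zpool replace ef1 c3t0d3p0 c3t1d3p0    (carry on replacing the pulled disk)
  # zpool status ef1                       (the resilver should continue)

zpool clear can also be aimed at a single device, e.g. "zpool clear ef1
c3t0d5p0", if only one disk's error counters need resetting.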
I had the same problem described by kometen with our Areca ARC-1680
controller on OpenSolaris 2008.05. We were using the controller in JBOD
mode and allowing zpool to use entire disks. Setting the drives to
pass-through mode in the Areca controller manager solved the issue.

Also worth noting: after the zpool degraded while operating in JBOD
mode and the controller was subsequently reconfigured to pass each disk
through, zpool import was able to recover the corrupted zpool.
-- 
This message posted from opensolaris.org
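For reference, the recovery step mentioned above is the normal import path
once the devices are visible again; a sketch, reusing the pool name from
earlier in the thread as an example:

  # zpool import           (scan attached devices for importable pools)
  # zpool import ef1       (import by name; add -f if the pool is flagged
                            as potentially active on another system)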
Could you explain whether you did any specific configuration on the
Areca RAID controller other than setting it to RAID and manually
marking every disk as pass-through, so that the disks are visible from
OpenSolaris?

I have an ARC-1680ix-16. I have tried two configurations: JBOD, and
RAID with all drives made pass-through as you suggest. In both
instances I can only see two hard drives from OpenSolaris. I have tried
a RHEL 5.2 installation with the Areca controller and I can see all the
drives using RAID with pass-through.

Do I need any boot parameters for OpenSolaris or something else?

The Areca controller assigned the following settings to the drives when
configured as pass-through:

Channel-SCSI_ID-LUN   Disk#
0-0-0                 01
0-0-1                 02
0-0-2                 03
0-0-3                 04
0-0-4                 05
0-0-5                 06
0-0-6                 07
0-0-7                 08
----------
0-1-0                 09
0-1-1                 10
0-1-2                 11
0-1-3                 12
0-1-4                 13
0-1-5                 14
0-1-6                 15
0-1-7                 16

My firmware is 1.45 and I am using the areca driver that comes with
OpenSolaris 2008.11.

Any help would be greatly appreciated.
-- 
This message posted from opensolaris.org
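Earlier in the thread the missing LUNs were made visible by adding entries
to /kernel/drv/sd.conf. A sketch of what that typically looks like for the
target/LUN layout listed above; the exact entries (and whether class or
parent is used to bind them) depend on your configuration, so verify against
your own controller listing before relying on this:

  name="sd" class="scsi" target=0 lun=1;
  name="sd" class="scsi" target=0 lun=2;
  ...
  name="sd" class="scsi" target=1 lun=7;

followed by "update_drv -f sd" (or a reconfiguration reboot) and "devfsadm"
so the new /dev/dsk nodes appear.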
I have installed OpenSolaris build 129 on our server. It has 12 disks
on an Areca 1130 controller, using the latest firmware.

I have put all the disks in JBOD and am running them in raidz2. After a
while the system hangs with:

arcmsr0: tran reset (level 0x1) called for target 4 lun 0
target reset not supported

What can I do? I really want it to work!

(I am going to set all the disks to pass-through on Monday.)
-- 
This message posted from opensolaris.org

(Attachments: solaris troubles.jpg, solaris troubles2.jpg --
http://mail.opensolaris.org/pipermail/zfs-discuss/attachments/20100109/e5e26997/attachment.jpg
http://mail.opensolaris.org/pipermail/zfs-discuss/attachments/20100109/e5e26997/attachment-0001.jpg)
We had a similar problem on an Areca 1680. It was caused by a drive that
didn't properly reset (it took ~2 seconds each time, according to the
drive tray's LED). Replacing the drive solved this problem, but then we
hit another problem which you can see in this thread:

http://opensolaris.org/jive/thread.jspa?threadID=121335&tstart=0

I'm curious whether you have a similar setup and encounter the same
problems. How did you set up your pools?

Please tell me if you have any luck setting the drives to pass-through.

Thanks,
Arnaud

Le 09/01/10 14:26, Rob a écrit :
> I have installed Opensolaris build 129 on our server. It has 12 disk
> at a Areca 1130 controller. Using the latest firmware.
>
> I have put all the disk in jbod and running them in raidz2. After a
> while the systems hangs with arcmsr0: tran reset (level 0x1) called
> for target4 lun 0
> target reset not supported
>
> What can I do? I really want it to work!
>
> (I am gone set all the disk to pass-through monday)
Hello Arnaud,

Thanks for your reply. We have a system (2 x Xeon 5410, Intel S5000PSL
mobo and 8 GB memory) with 12 x 500 GB SATA disks on an Areca 1130
controller. rpool is a mirror over 2 disks; 8 disks are in raidz2 with
1 spare. We have 2 aggregated links.

Our goal is an ESX storage system; I am using iSCSI and NFS to serve
space to our ESX 4.0 servers.

We can remove a disk with no problem. I can do a replace and the disk
is resilvered. That works fine here. Our problem comes when we give the
server a harder time: copy 60 GB+ of data or do some other work to put
the system under load, and it hangs. This happens after 5 minutes, or
after 30 minutes, or later, but it hangs. Then we get the problems
shown in the attached pictures.

I have also emailed Areca. I hope they can fix it.

Regards,
Rob
-- 
This message posted from opensolaris.org