Willard Korfhage
2010-Apr-12 03:59 UTC
[zfs-discuss] Why would zfs have too many errors when underlying raid array is fine?
I'm struggling to get a reliable OpenSolaris system on a file server. I'm running an Asus P5BV-C/4L server motherboard, 4GB ECC RAM, an E3110 processor, and an Areca 1230 with 12 1-TB disks attached. In a previous posting, it looked like RAM or the power supply might be a problem, so I ended up upgrading everything except the RAID card and the disks. I'm running OpenSolaris preview build 134.

I started off by setting up all the disks as pass-through disks and tried to make a raidz2 array using all of them. It would work for a while, then suddenly every disk in the array would have too many errors and the system would fail. I don't know why the sudden failure, but eventually I gave up. Instead, I used the Areca card to create a RAID-6 array with a hot spare and created a pool directly on the 8TB disk the RAID card exposed. I'll let the card handle the redundancy and ZFS handle just the file system. Disk performance is noticeably faster, by the way, compared to software RAID.

I have been testing the system, and it suddenly failed again:

# zpool status -v
  pool: bigraid
 state: DEGRADED
status: One or more devices are faulted in response to IO failures.
action: Make sure the affected devices are connected, then run 'zpool clear'.
   see: http://www.sun.com/msg/ZFS-8000-HC
 scrub: none requested
config:

        NAME        STATE     READ WRITE CKSUM
        bigraid     DEGRADED     0     0     7
          c4t0d0    DEGRADED     0     0    34  too many errors

errors: Permanent errors have been detected in the following files:

        <metadata>:<0x1>
        <metadata>:<0x18>
        bigraid:<0x3>

The RAID card says the array is fine - no errors - so something is going on with ZFS. I'm out of ideas at this point, except that build 134 might be unstable and I should install an earlier, more stable version. Is there anything I'm missing that I should check?
Will Murnane
2010-Apr-12 04:49 UTC
[zfs-discuss] Why would zfs have too many errors when underlying raid array is fine?
On Sun, Apr 11, 2010 at 23:59, Willard Korfhage <opensolaris at familyk.org> wrote:
> I'm struggling to get a reliable OpenSolaris system on a file server. I'm running an Asus P5BV-C/4L server motherboard, 4GB ECC RAM, an E3110 processor, and an Areca 1230 with 12 1-TB disks attached. In a previous posting, it looked like RAM or the power supply might be a problem, so I ended up upgrading everything except the RAID card and the disks. I'm running OpenSolaris preview build 134.

What power supply are you running now, and how are the disks connected to it? I had problems with my array caused by not enough power cables running to the disk backplanes. I ran some more cables and it cleared up.

> The RAID card says the array is fine - no errors - so something is going on with ZFS. I'm out of ideas at this point, except that build 134 might be unstable and I should install an earlier, more stable version. Is there anything I'm missing that I should check?

Does anything show up in /var/adm/messages when the badness happens? fmadm faulty?

Will
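P.S. Concretely, right after a failure I would look at something like the following (all standard OpenSolaris tools, nothing Areca-specific; the tail counts are just a suggestion):

# tail -100 /var/adm/messages     (kernel and driver messages around the time of the failure)
# fmadm faulty                    (faults the fault manager has actually diagnosed)
# fmdump -eV | tail -200          (the raw error telemetry behind those diagnoses)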
Willard Korfhage
2010-Apr-12 05:39 UTC
[zfs-discuss] Why would zfs have too many errors when underlying raid array is fine?
It is a Corsair 650W modular power supply, with 2 or 3 disks per cable. However, the Areca card is not reporting any errors, so I think power to the disks is unlikely to be a problem.

Here's what is in /var/adm/messages:

Apr 11 22:37:41 fs9 fmd: [ID 377184 daemon.error] SUNW-MSG-ID: ZFS-8000-GH, TYPE: Fault, VER: 1, SEVERITY: Major
Apr 11 22:37:41 fs9 EVENT-TIME: Sun Apr 11 22:37:41 CDT 2010
Apr 11 22:37:41 fs9 PLATFORM: System-Product-Name, CSN: System-Serial-Number, HOSTNAME: fs9
Apr 11 22:37:41 fs9 SOURCE: zfs-diagnosis, REV: 1.0
Apr 11 22:37:41 fs9 EVENT-ID: f6d2aef7-d5fc-e302-a68e-a50a91e81d2d
Apr 11 22:37:41 fs9 DESC: The number of checksum errors associated with a ZFS device
Apr 11 22:37:41 fs9 exceeded acceptable levels. Refer to http://sun.com/msg/ZFS-8000-GH for more information.
Apr 11 22:37:41 fs9 AUTO-RESPONSE: The device has been marked as degraded. An attempt
Apr 11 22:37:41 fs9 will be made to activate a hot spare if available.
Apr 11 22:37:41 fs9 IMPACT: Fault tolerance of the pool may be compromised.
Apr 11 22:37:41 fs9 REC-ACTION: Run 'zpool status -x' and replace the bad device.
Apr 11 22:37:42 fs9 fmd: [ID 377184 daemon.error] SUNW-MSG-ID: ZFS-8000-HC, TYPE: Error, VER: 1, SEVERITY: Major
Apr 11 22:37:42 fs9 EVENT-TIME: Sun Apr 11 22:37:42 CDT 2010
Apr 11 22:37:42 fs9 PLATFORM: System-Product-Name, CSN: System-Serial-Number, HOSTNAME: fs9
Apr 11 22:37:42 fs9 SOURCE: zfs-diagnosis, REV: 1.0
Apr 11 22:37:42 fs9 EVENT-ID: 89b2ef1c-c689-66a0-a7f7-d015a1b7f260
Apr 11 22:37:42 fs9 DESC: The ZFS pool has experienced currently unrecoverable I/O
Apr 11 22:37:42 fs9 failures. Refer to http://sun.com/msg/ZFS-8000-HC for more information.
Apr 11 22:37:42 fs9 AUTO-RESPONSE: No automated response will be taken.
Apr 11 22:37:42 fs9 IMPACT: Read and write I/Os cannot be serviced.
Apr 11 22:37:42 fs9 REC-ACTION: Make sure the affected devices are connected, then run
Apr 11 22:37:42 fs9 'zpool clear'.
Ian Collins
2010-Apr-12 06:20 UTC
[zfs-discuss] Why would zfs have too many errors when underlying raid array is fine?
On 04/12/10 05:39 PM, Willard Korfhage wrote:
> It is a Corsair 650W modular power supply, with 2 or 3 disks per cable. However, the Areca card is not reporting any errors, so I think power to the disks is unlikely to be a problem.
>
> Here's what is in /var/adm/messages:
>
> Apr 11 22:37:41 fs9 fmd: [ID 377184 daemon.error] SUNW-MSG-ID: ZFS-8000-GH, TYPE: Fault, VER: 1, SEVERITY: Major
> Apr 11 22:37:41 fs9 EVENT-TIME: Sun Apr 11 22:37:41 CDT 2010
> Apr 11 22:37:41 fs9 PLATFORM: System-Product-Name, CSN: System-Serial-Number, HOSTNAME: fs9
> Apr 11 22:37:41 fs9 SOURCE: zfs-diagnosis, REV: 1.0
> Apr 11 22:37:41 fs9 EVENT-ID: f6d2aef7-d5fc-e302-a68e-a50a91e81d2d
> Apr 11 22:37:41 fs9 DESC: The number of checksum errors associated with a ZFS device
> Apr 11 22:37:41 fs9 exceeded acceptable levels. Refer to http://sun.com/msg/ZFS-8000-GH for more information.
> Apr 11 22:37:41 fs9 AUTO-RESPONSE: The device has been marked as degraded. An attempt
> Apr 11 22:37:41 fs9 will be made to activate a hot spare if available.
> Apr 11 22:37:41 fs9 IMPACT: Fault tolerance of the pool may be compromised.
> Apr 11 22:37:41 fs9 REC-ACTION: Run 'zpool status -x' and replace the bad device.
> Apr 11 22:37:42 fs9 fmd: [ID 377184 daemon.error] SUNW-MSG-ID: ZFS-8000-HC, TYPE: Error, VER: 1, SEVERITY: Major
> Apr 11 22:37:42 fs9 EVENT-TIME: Sun Apr 11 22:37:42 CDT 2010
> Apr 11 22:37:42 fs9 PLATFORM: System-Product-Name, CSN: System-Serial-Number, HOSTNAME: fs9
> Apr 11 22:37:42 fs9 SOURCE: zfs-diagnosis, REV: 1.0
> Apr 11 22:37:42 fs9 EVENT-ID: 89b2ef1c-c689-66a0-a7f7-d015a1b7f260
> Apr 11 22:37:42 fs9 DESC: The ZFS pool has experienced currently unrecoverable I/O
> Apr 11 22:37:42 fs9 failures. Refer to http://sun.com/msg/ZFS-8000-HC for more information.
> Apr 11 22:37:42 fs9 AUTO-RESPONSE: No automated response will be taken.
> Apr 11 22:37:42 fs9 IMPACT: Read and write I/Os cannot be serviced.
> Apr 11 22:37:42 fs9 REC-ACTION: Make sure the affected devices are connected, then run
> Apr 11 22:37:42 fs9 'zpool clear'.

Anything before that?

--
Ian.
Tonmaus
2010-Apr-12 07:37 UTC
[zfs-discuss] Why would zfs have too many errors when underlying raid array is fine?
Hi,

> I started off by setting up all the disks as pass-through disks and tried to make a raidz2 array using all of them. It would work for a while, then suddenly every disk in the array would have too many errors and the system would fail.

I had exactly the same experience with my Areca controller. Actually, I couldn't get it to work unless I put the whole controller in JBOD mode. Neither 12 x "RAID-0 arrays" with single disks nor pass-through was workable. I had kernel panics and pool corruption all over the place, sometimes with, sometimes without additional corruption messages from the Areca panel. I am not sure if this relates to the rest of your problem, though.

Regards,

Tonmaus
Willard Korfhage
2010-Apr-12 10:51 UTC
[zfs-discuss] Why would zfs have too many errors when underlying raid array is fine?
Just a message 7 hours earlier warning that an IRQ being shared by drivers with different interrupt levels might result in reduced performance.
Willard Korfhage
2010-Apr-12 11:03 UTC
[zfs-discuss] Why would zfs have too many errors when underlying raid array is fine?
I was wondering if the controller itself has problems. My card's firmware is version 1.42, and the firmware on the website is up to 1.48. I see the firmware released last September says:

  Fix Opensolaris+ZFS to add device to mirror set in JBOD or passthrough mode

and

  Fix SATA raid controller Seagate HDD error handling

I'm not using mirroring, but I am using Seagate drives. Looks like I should do a firmware upgrade.
Tonmaus
2010-Apr-12 12:41 UTC
[zfs-discuss] Why would zfs have too many errors when underlying raid array is fine?
Upgrading the firmware is a good idea, as there are other issues with Areca controllers that have only been solved recently. For example, 1.42 is probably still affected by a problem with SCSI labels that can cause trouble when importing a pool.

-Tonmaus
Willard Korfhage
2010-Apr-12 13:10 UTC
[zfs-discuss] Why would zfs have too many errors when underlying raid array is fine?
I upgraded to the latest firmware. When I rebooted the machine, the pool was back, with no errors. I was surprised.

I will work with it more and see if it stays good. I've done a scrub, so now I'll put more data on it and stress it some more.

If the firmware upgrade fixed everything, then I've got a question about which I am better off doing: keep it as-is, with the RAID card providing redundancy, or turn it all back into pass-through drives and let ZFS handle it, making the Areca card just a really expensive way of getting a bunch of SATA interfaces?
David Magda
2010-Apr-12 13:33 UTC
[zfs-discuss] Why would zfs have too many errors when underlying raid array is fine?
On Mon, April 12, 2010 09:10, Willard Korfhage wrote:
> If the firmware upgrade fixed everything, then I've got a question about which I am better off doing: keep it as-is, with the RAID card providing redundancy, or turn it all back into pass-through drives and let ZFS handle it, making the Areca card just a really expensive way of getting a bunch of SATA interfaces?

Unless there's a specific feature that the card does, I'd say that ZFS would give you more capabilities: scrubbing, reporting, recovery on checksum errors, more efficient rebuilds (i.e., only copying blocks that are used). If the hardware ever goes south, you'll also be able to move the disks to any arbitrary machine and do a 'zpool import'.

At least for DAS, there are very few reasons to use fancy cards nowadays (also true with Linux and LVM to a certain extent).
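For example, moving the pool to another box is roughly this (the oldbox/newbox prompts are only illustrative, and the pool name is the one from your earlier posts):

oldbox# zpool export bigraid
newbox# zpool import              (scans the attached disks and lists importable pools)
newbox# zpool import bigraid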
Kyle McDonald
2010-Apr-12 17:10 UTC
[zfs-discuss] Why would zfs have too many errors when underlying raid array is fine?
On 4/12/2010 9:10 AM, Willard Korfhage wrote:
> I upgraded to the latest firmware. When I rebooted the machine, the pool was back, with no errors. I was surprised.
>
> I will work with it more and see if it stays good. I've done a scrub, so now I'll put more data on it and stress it some more.
>
> If the firmware upgrade fixed everything, then I've got a question about which I am better off doing: keep it as-is, with the RAID card providing redundancy, or turn it all back into pass-through drives and let ZFS handle it, making the Areca card just a really expensive way of getting a bunch of SATA interfaces?

As one of the other posters mentioned, there may be a third way that might give you something close to "the best of both worlds".

Try using the Areca card to make 12 single-disk RAID 0 LUNs, and then use those in ZFS. I'm not sure of the definition of 'passthrough', but if it disables any battery-backed cache that the card may have, then setting up 12 HW RAID LUNs instead should give you an improvement by allowing the card to cache writes.

The one downside of doing this vs. something more like 'jbod' is that if the controller dies you will need to move the disks to another Areca controller, whereas with 12 'jbod' connections you could move them to pretty much any controller you wanted.

-Kyle
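P.S. The ZFS side looks about the same whichever way you carve up the controller; a rough sketch, assuming the 12 LUNs show up as c4t0d0 through c4t1d3 (use whatever names format(1M) actually reports):

# zpool create bigraid raidz2 \
    c4t0d0 c4t0d1 c4t0d2 c4t0d3 c4t0d4 c4t0d5 \
    c4t0d6 c4t0d7 c4t1d0 c4t1d1 c4t1d2 c4t1d3
# zpool status bigraid            (all 12 devices should show ONLINE)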
Ragnar Sundblad
2010-Apr-13 22:42 UTC
[zfs-discuss] Why would zfs have too many errors when underlying raid array is fine?
On 12 apr 2010, at 19.10, Kyle McDonald wrote:

> On 4/12/2010 9:10 AM, Willard Korfhage wrote:
>> I upgraded to the latest firmware. When I rebooted the machine, the pool was back, with no errors. I was surprised.
>>
>> I will work with it more and see if it stays good. I've done a scrub, so now I'll put more data on it and stress it some more.
>>
>> If the firmware upgrade fixed everything, then I've got a question about which I am better off doing: keep it as-is, with the RAID card providing redundancy, or turn it all back into pass-through drives and let ZFS handle it, making the Areca card just a really expensive way of getting a bunch of SATA interfaces?
>>
>
> As one of the other posters mentioned, there may be a third way that might give you something close to "the best of both worlds".
>
> Try using the Areca card to make 12 single-disk RAID 0 LUNs, and then use those in ZFS. I'm not sure of the definition of 'passthrough', but if it disables any battery-backed cache that the card may have, then setting up 12 HW RAID LUNs instead should give you an improvement by allowing the card to cache writes.
>
> The one downside of doing this vs. something more like 'jbod' is that if the controller dies you will need to move the disks to another Areca controller, whereas with 12 'jbod' connections you could move them to pretty much any controller you wanted.

And that if you use the write cache in the controller and the controller dies, parts of your recently written data are only in the dead controller, and your pool may be more or less corrupt and may have to be rolled back a few versions to be rescued, or may not be rescuable at all. This may or may not be acceptable.

/ragge
Willard Korfhage
2010-Apr-14 00:03 UTC
[zfs-discuss] Why would zfs have too many errors when underlying raid array is fine?
These are all good reasons to switch back to letting ZFS handle it.

I did put about 600GB of data on the pool as configured with RAID 6 on the card, verified the data, and scrubbed it a couple of times in the process, and there were no problems, so it appears that the firmware upgrade fixed my problems. However, I'm going to switch it back to pass-through disks, remake the pool, and try it again.
Victor Latushkin
2010-Apr-14 00:07 UTC
[zfs-discuss] Why would zfs have too many errors when underlying raid array is fine?
On Apr 14, 2010, at 2:42 AM, Ragnar Sundblad wrote:
>
> On 12 apr 2010, at 19.10, Kyle McDonald wrote:
>
>> On 4/12/2010 9:10 AM, Willard Korfhage wrote:
>>> I upgraded to the latest firmware. When I rebooted the machine, the pool was back, with no errors. I was surprised.
>>>
>>> I will work with it more and see if it stays good. I've done a scrub, so now I'll put more data on it and stress it some more.
>>>
>>> If the firmware upgrade fixed everything, then I've got a question about which I am better off doing: keep it as-is, with the RAID card providing redundancy, or turn it all back into pass-through drives and let ZFS handle it, making the Areca card just a really expensive way of getting a bunch of SATA interfaces?
>>>
>>
>> As one of the other posters mentioned, there may be a third way that might give you something close to "the best of both worlds".
>>
>> Try using the Areca card to make 12 single-disk RAID 0 LUNs, and then use those in ZFS. I'm not sure of the definition of 'passthrough', but if it disables any battery-backed cache that the card may have, then setting up 12 HW RAID LUNs instead should give you an improvement by allowing the card to cache writes.
>>
>> The one downside of doing this vs. something more like 'jbod' is that if the controller dies you will need to move the disks to another Areca controller, whereas with 12 'jbod' connections you could move them to pretty much any controller you wanted.
>
> And that if you use the write cache in the controller and the controller dies, parts of your recently written data are only in the dead controller, and your pool may be more or less corrupt and may have to be rolled back a few versions to be rescued, or may not be rescuable at all. This may or may not be acceptable.

There was a successful recovery of what seemed to be the result of a lost cache on an Areca controller; see this thread:

http://opensolaris.org/jive/thread.jspa?threadID=109007

It was a manual recovery, but these days we have 'zpool import -fFX <poolname>' that would do the same in a much more user-friendly manner.

--
regards
victor
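P.S. For the archives, the recovery invocation is simply the line below, run against a pool that is not currently imported: -f forces the import, -F attempts to roll back to an earlier consistent transaction group, and -X allows a deeper ("extreme") rewind if needed.

# zpool import -fFX bigraid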
Willard Korfhage
2010-Apr-15 04:27 UTC
[zfs-discuss] Why would zfs have too many errors when underlying raid array is fine?
As I mentioned earlier, I removed the hardware-based RAID-6 array, changed all the disks to pass-through disks, and made a raidz2 pool using all the disks. I used my backup program to copy 55GB of data to the pool, and now I have errors all over the place.

# zpool status -v
  pool: bigraid
 state: DEGRADED
status: One or more devices has experienced an unrecoverable error. An
        attempt was made to correct the error. Applications are unaffected.
action: Determine if the device needs to be replaced, and clear the errors
        using 'zpool clear' or replace the device with 'zpool replace'.
   see: http://www.sun.com/msg/ZFS-8000-9P
 scrub: scrub completed after 0h4m with 0 errors on Wed Apr 14 22:56:36 2010
config:

        NAME        STATE     READ WRITE CKSUM
        bigraid     DEGRADED     0     0     0
          raidz2-0  DEGRADED     0     0    24
            c4t0d0  ONLINE       0     0     3
            c4t0d1  ONLINE       0     0     2
            c4t0d2  ONLINE       0     0     2
            c4t0d3  DEGRADED     0     0     2  too many errors
            c4t0d4  ONLINE       0     0     2
            c4t0d5  ONLINE       0     0     2
            c4t0d6  ONLINE       0     0     1
            c4t0d7  ONLINE       0     0     0
            c4t1d0  ONLINE       0     0     0
            c4t1d1  ONLINE       0     0     2
            c4t1d2  ONLINE       0     0     2
            c4t1d3  ONLINE       0     0     4

errors: No known data errors

So, ZFS on hardware-supported RAID was fine, but ZFS on pass-through disks is not. I'm at a loss to explain it. Any ideas?
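If it helps narrow this down, I can also post the driver-level error counters, which should show whether the OS itself logged any transport or media errors, as opposed to ZFS-only checksum mismatches:

# iostat -En                      (per-device soft/hard/transport error counts)
# fmdump -eV | tail -200          (recent FMA ereports in full detail)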
Tonmaus
2010-Apr-15 13:54 UTC
[zfs-discuss] Why would zfs have too many errors when underlying raid array is fine?
My understanding of "passthrough disk" from the Areca documentation is that single drives are exempted from the RAID controller regime and that the port behaves just like a plain HBA port. Now, on my Areca controller (r.i.p.) that mode always created the biggest havoc with ZFS/OpenSolaris, including zpool states just like yours. That was on an older firmware, though. 12 x RAID-0 was only marginally better than pass-through.

What I maybe did not mention is that we tried Ubuntu/dmraid on the same hardware for an afternoon, but there the initialisation of the RAID crashed with a reproducible kernel panic.

I think I mentioned it before: the only thing that worked decently was putting the whole controller in JBOD mode. Yes, it is an expensive way of providing a bunch of SATA ports... in my case it wasn't that bad, as I got a 1170 for approximately 400 euros, but it was still too expensive given the performance under ZFS, so I returned it for a full refund and replaced it with a pair of LSIs.

Regards,

Tonmaus
Willard Korfhage
2010-Apr-15 22:07 UTC
[zfs-discuss] Why would zfs have too many errors when underlying raid array is fine?
I've got a Supermicro AOC-USAS-L8I on the way because I gather from these forums that it works well. I'll just wait for that, then try 8 disks on that and 4 on the motherboard SATA ports.