Willard Korfhage
2010-Apr-12 03:59 UTC
[zfs-discuss] Why would zfs have too many errors when underlying raid array is fine?
I'm struggling to get a reliable OpenSolaris system on a file server. I'm running an Asus P5BV-C/4L server motherboard, 4GB ECC RAM, an E3110 processor, and an Areca 1230 with 12 1-TB disks attached. In a previous posting, it looked like RAM or the power supply might be a problem, so I ended up upgrading everything except the RAID card and the disks. I'm running OpenSolaris preview build 134.

I started off by setting up all the disks as pass-through disks and tried to make a raidz2 array using all of them. It would work for a while, then suddenly every disk in the array would have too many errors and the system would fail. I don't know why the sudden failure, but eventually I gave up. Instead, I used the Areca card to create a RAID-6 array with a hot spare and created a pool directly on the 8TB disk the RAID card exposed. I'll let the card handle the redundancy and ZFS handle just the file system. Disk performance is noticeably faster, by the way, compared to software RAID.

I have been testing the system, and it suddenly failed again:

# zpool status -v
  pool: bigraid
 state: DEGRADED
status: One or more devices are faulted in response to IO failures.
action: Make sure the affected devices are connected, then run 'zpool clear'.
   see: http://www.sun.com/msg/ZFS-8000-HC
 scrub: none requested
config:

        NAME        STATE     READ WRITE CKSUM
        bigraid     DEGRADED     0     0     7
          c4t0d0    DEGRADED     0     0    34  too many errors

errors: Permanent errors have been detected in the following files:

        <metadata>:<0x1>
        <metadata>:<0x18>
        bigraid:<0x3>

The RAID card says the array is fine - no errors - so something is going on with ZFS. I'm out of ideas at this point, except that build 134 might be unstable and I should install an earlier, more stable version. Is there anything I'm missing that I should check?
Will Murnane
2010-Apr-12 04:49 UTC
[zfs-discuss] Why would zfs have too many errors when underlying raid array is fine?
On Sun, Apr 11, 2010 at 23:59, Willard Korfhage <opensolaris at familyk.org> wrote:
> I'm struggling to get a reliable OpenSolaris system on a file server. I'm running an Asus P5BV-C/4L server motherboard, 4GB ECC RAM, an E3110 processor, and an Areca 1230 with 12 1-TB disks attached. In a previous posting, it looked like RAM or the power supply might be a problem, so I ended up upgrading everything except the RAID card and the disks. I'm running OpenSolaris preview build 134.

What power supply are you running now, and how are the disks connected to it? I had problems with my array caused by not enough power cables running to the disk backplanes. I ran some more cables and it cleared up.

> The RAID card says the array is fine - no errors - so something is going on with ZFS. I'm out of ideas at this point, except that build 134 might be unstable and I should install an earlier, more stable version. Is there anything I'm missing that I should check?

Does anything show up in /var/adm/messages when the badness happens? fmadm faulty?

Will
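P.S. Concretely, right after a failure I would look at something like the following (all standard OpenSolaris tools, nothing Areca-specific; the tail counts are just a suggestion):

# tail -100 /var/adm/messages     (kernel and driver messages around the time of the failure)
# fmadm faulty                    (faults the fault manager has actually diagnosed)
# fmdump -eV | tail -200          (the raw error telemetry behind those diagnoses)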
Willard Korfhage
2010-Apr-12 05:39 UTC
[zfs-discuss] Why would zfs have too many errors when underlying raid array is fine?
It is a Corsair 650W modular power supply, with 2 or 3 disks per cable. However, the Areca card is not reporting any errors, so I think power to the disks is unlikely to be a problem.

Here's what is in /var/adm/messages:

Apr 11 22:37:41 fs9 fmd: [ID 377184 daemon.error] SUNW-MSG-ID: ZFS-8000-GH, TYPE: Fault, VER: 1, SEVERITY: Major
Apr 11 22:37:41 fs9 EVENT-TIME: Sun Apr 11 22:37:41 CDT 2010
Apr 11 22:37:41 fs9 PLATFORM: System-Product-Name, CSN: System-Serial-Number, HOSTNAME: fs9
Apr 11 22:37:41 fs9 SOURCE: zfs-diagnosis, REV: 1.0
Apr 11 22:37:41 fs9 EVENT-ID: f6d2aef7-d5fc-e302-a68e-a50a91e81d2d
Apr 11 22:37:41 fs9 DESC: The number of checksum errors associated with a ZFS device
Apr 11 22:37:41 fs9 exceeded acceptable levels. Refer to http://sun.com/msg/ZFS-8000-GH for more information.
Apr 11 22:37:41 fs9 AUTO-RESPONSE: The device has been marked as degraded. An attempt
Apr 11 22:37:41 fs9 will be made to activate a hot spare if available.
Apr 11 22:37:41 fs9 IMPACT: Fault tolerance of the pool may be compromised.
Apr 11 22:37:41 fs9 REC-ACTION: Run 'zpool status -x' and replace the bad device.
Apr 11 22:37:42 fs9 fmd: [ID 377184 daemon.error] SUNW-MSG-ID: ZFS-8000-HC, TYPE: Error, VER: 1, SEVERITY: Major
Apr 11 22:37:42 fs9 EVENT-TIME: Sun Apr 11 22:37:42 CDT 2010
Apr 11 22:37:42 fs9 PLATFORM: System-Product-Name, CSN: System-Serial-Number, HOSTNAME: fs9
Apr 11 22:37:42 fs9 SOURCE: zfs-diagnosis, REV: 1.0
Apr 11 22:37:42 fs9 EVENT-ID: 89b2ef1c-c689-66a0-a7f7-d015a1b7f260
Apr 11 22:37:42 fs9 DESC: The ZFS pool has experienced currently unrecoverable I/O
Apr 11 22:37:42 fs9 failures. Refer to http://sun.com/msg/ZFS-8000-HC for more information.
Apr 11 22:37:42 fs9 AUTO-RESPONSE: No automated response will be taken.
Apr 11 22:37:42 fs9 IMPACT: Read and write I/Os cannot be serviced.
Apr 11 22:37:42 fs9 REC-ACTION: Make sure the affected devices are connected, then run
Apr 11 22:37:42 fs9 'zpool clear'.
Ian Collins
2010-Apr-12 06:20 UTC
[zfs-discuss] Why would zfs have too many errors when underlying raid array is fine?
On 04/12/10 05:39 PM, Willard Korfhage wrote:
> It is a Corsair 650W modular power supply, with 2 or 3 disks per cable. However, the Areca card is not reporting any errors, so I think power to the disks is unlikely to be a problem.
>
> Here's what is in /var/adm/messages:
>
> Apr 11 22:37:41 fs9 fmd: [ID 377184 daemon.error] SUNW-MSG-ID: ZFS-8000-GH, TYPE: Fault, VER: 1, SEVERITY: Major
> Apr 11 22:37:41 fs9 EVENT-TIME: Sun Apr 11 22:37:41 CDT 2010
> Apr 11 22:37:41 fs9 PLATFORM: System-Product-Name, CSN: System-Serial-Number, HOSTNAME: fs9
> Apr 11 22:37:41 fs9 SOURCE: zfs-diagnosis, REV: 1.0
> Apr 11 22:37:41 fs9 EVENT-ID: f6d2aef7-d5fc-e302-a68e-a50a91e81d2d
> Apr 11 22:37:41 fs9 DESC: The number of checksum errors associated with a ZFS device
> Apr 11 22:37:41 fs9 exceeded acceptable levels. Refer to http://sun.com/msg/ZFS-8000-GH for more information.
> Apr 11 22:37:41 fs9 AUTO-RESPONSE: The device has been marked as degraded. An attempt
> Apr 11 22:37:41 fs9 will be made to activate a hot spare if available.
> Apr 11 22:37:41 fs9 IMPACT: Fault tolerance of the pool may be compromised.
> Apr 11 22:37:41 fs9 REC-ACTION: Run 'zpool status -x' and replace the bad device.
> Apr 11 22:37:42 fs9 fmd: [ID 377184 daemon.error] SUNW-MSG-ID: ZFS-8000-HC, TYPE: Error, VER: 1, SEVERITY: Major
> Apr 11 22:37:42 fs9 EVENT-TIME: Sun Apr 11 22:37:42 CDT 2010
> Apr 11 22:37:42 fs9 PLATFORM: System-Product-Name, CSN: System-Serial-Number, HOSTNAME: fs9
> Apr 11 22:37:42 fs9 SOURCE: zfs-diagnosis, REV: 1.0
> Apr 11 22:37:42 fs9 EVENT-ID: 89b2ef1c-c689-66a0-a7f7-d015a1b7f260
> Apr 11 22:37:42 fs9 DESC: The ZFS pool has experienced currently unrecoverable I/O
> Apr 11 22:37:42 fs9 failures. Refer to http://sun.com/msg/ZFS-8000-HC for more information.
> Apr 11 22:37:42 fs9 AUTO-RESPONSE: No automated response will be taken.
> Apr 11 22:37:42 fs9 IMPACT: Read and write I/Os cannot be serviced.
> Apr 11 22:37:42 fs9 REC-ACTION: Make sure the affected devices are connected, then run
> Apr 11 22:37:42 fs9 'zpool clear'.

Anything before that?

--
Ian.
Tonmaus
2010-Apr-12 07:37 UTC
[zfs-discuss] Why would zfs have too many errors when underlying raid array is fine?
Hi,

> I started off by setting up all the disks as pass-through disks and tried to make a raidz2 array using all of them. It would work for a while, then suddenly every disk in the array would have too many errors and the system would fail.

I had exactly the same experience with my Areca controller. Actually, I couldn't get it to work unless I put the whole controller in JBOD mode. Neither 12 x "RAID-0 arrays" with single disks nor pass-through was workable. I had kernel panics and pool corruption all over the place, sometimes with, sometimes without additional corruption messages from the Areca panel. I am not sure if this relates to the rest of your problem, though.

Regards,

Tonmaus
Willard Korfhage
2010-Apr-12 10:51 UTC
[zfs-discuss] Why would zfs have too many errors when underlying raid array is fine?
Just a message 7 hours earlier warning that an IRQ being shared by drivers with different interrupt levels might result in reduced performance.
Willard Korfhage
2010-Apr-12 11:03 UTC
[zfs-discuss] Why would zfs have too many errors when underlying raid array is fine?
I was wondering if the controller itself has problems. My card's firmware is version 1.42, and the firmware on the website is up to 1.48. I see the firmware released last September says:

  Fix Opensolaris+ZFS to add device to mirror set in JBOD or passthrough mode

and

  Fix SATA raid controller Seagate HDD error handling

I'm not using mirroring, but I am using Seagate drives. Looks like I should do a firmware upgrade.
Tonmaus
2010-Apr-12 12:41 UTC
[zfs-discuss] Why would zfs have too many errors when underlying raid array is fine?
Upgrading the firmware is a good idea, as there are other issues with Areca controllers that have only been solved recently. For example, 1.42 is probably still affected by a problem with SCSI labels that can cause trouble when importing a pool.

-Tonmaus
Willard Korfhage
2010-Apr-12 13:10 UTC
[zfs-discuss] Why would zfs have too many errors when underlying raid array is fine?
I upgraded to the latest firmware. When I rebooted the machine, the pool was back, with no errors. I was surprised.

I will work with it more and see if it stays good. I've done a scrub, so now I'll put more data on it and stress it some more.

If the firmware upgrade fixed everything, then I've got a question about which I am better off doing: keep it as-is, with the RAID card providing redundancy, or turn it all back into pass-through drives and let ZFS handle it, making the Areca card just a really expensive way of getting a bunch of SATA interfaces?
David Magda
2010-Apr-12 13:33 UTC
[zfs-discuss] Why would zfs have too many errors when underlying raid array is fine?
On Mon, April 12, 2010 09:10, Willard Korfhage wrote:
> If the firmware upgrade fixed everything, then I've got a question about which I am better off doing: keep it as-is, with the RAID card providing redundancy, or turn it all back into pass-through drives and let ZFS handle it, making the Areca card just a really expensive way of getting a bunch of SATA interfaces?

Unless there's a specific feature that the card does, I'd say that ZFS would give you more capabilities: scrubbing, reporting, recovery on checksum errors, more efficient rebuilds (i.e., only copying blocks that are used). If the hardware ever goes south, you'll also be able to move the disks to any arbitrary machine and do a 'zpool import'.

At least for DAS, there are very few reasons to use fancy cards nowadays (also true with Linux and LVM to a certain extent).
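For example, moving the pool to another box is roughly this (the oldbox/newbox prompts are only illustrative, and the pool name is the one from your earlier posts):

oldbox# zpool export bigraid
newbox# zpool import              (scans the attached disks and lists importable pools)
newbox# zpool import bigraid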
Kyle McDonald
2010-Apr-12 17:10 UTC
[zfs-discuss] Why would zfs have too many errors when underlying raid array is fine?
On 4/12/2010 9:10 AM, Willard Korfhage wrote:
> I upgraded to the latest firmware. When I rebooted the machine, the pool was back, with no errors. I was surprised.
>
> I will work with it more and see if it stays good. I've done a scrub, so now I'll put more data on it and stress it some more.
>
> If the firmware upgrade fixed everything, then I've got a question about which I am better off doing: keep it as-is, with the RAID card providing redundancy, or turn it all back into pass-through drives and let ZFS handle it, making the Areca card just a really expensive way of getting a bunch of SATA interfaces?

As one of the other posters mentioned, there may be a third way that might give you something close to "the best of both worlds".

Try using the Areca card to make 12 single-disk RAID 0 LUNs, and then use those in ZFS. I'm not sure of the definition of 'passthrough', but if it disables any battery-backed cache that the card may have, then setting up 12 HW RAID LUNs instead should give you an improvement by allowing the card to cache writes.

The one downside of doing this vs. something more like 'jbod' is that if the controller dies you will need to move the disks to another Areca controller, whereas with 12 'jbod' connections you could move them to pretty much any controller you wanted.

-Kyle
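P.S. The ZFS side looks about the same whichever way you carve up the controller; a rough sketch, assuming the 12 LUNs show up as c4t0d0 through c4t1d3 (use whatever names format(1M) actually reports):

# zpool create bigraid raidz2 \
    c4t0d0 c4t0d1 c4t0d2 c4t0d3 c4t0d4 c4t0d5 \
    c4t0d6 c4t0d7 c4t1d0 c4t1d1 c4t1d2 c4t1d3
# zpool status bigraid            (all 12 devices should show ONLINE)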
Ragnar Sundblad
2010-Apr-13 22:42 UTC
[zfs-discuss] Why would zfs have too many errors when underlying raid array is fine?
On 12 apr 2010, at 19.10, Kyle McDonald wrote:

> On 4/12/2010 9:10 AM, Willard Korfhage wrote:
>> I upgraded to the latest firmware. When I rebooted the machine, the pool was back, with no errors. I was surprised.
>>
>> I will work with it more and see if it stays good. I've done a scrub, so now I'll put more data on it and stress it some more.
>>
>> If the firmware upgrade fixed everything, then I've got a question about which I am better off doing: keep it as-is, with the RAID card providing redundancy, or turn it all back into pass-through drives and let ZFS handle it, making the Areca card just a really expensive way of getting a bunch of SATA interfaces?
>>
>
> As one of the other posters mentioned, there may be a third way that might give you something close to "the best of both worlds".
>
> Try using the Areca card to make 12 single-disk RAID 0 LUNs, and then use those in ZFS. I'm not sure of the definition of 'passthrough', but if it disables any battery-backed cache that the card may have, then setting up 12 HW RAID LUNs instead should give you an improvement by allowing the card to cache writes.
>
> The one downside of doing this vs. something more like 'jbod' is that if the controller dies you will need to move the disks to another Areca controller, whereas with 12 'jbod' connections you could move them to pretty much any controller you wanted.

And that if you use the write cache in the controller and the controller dies, parts of your recently written data are only in the dead controller, and your pool may be more or less corrupt and may have to be rolled back a few versions to be rescued, or may not be rescuable at all. This may or may not be acceptable.

/ragge
Willard Korfhage
2010-Apr-14 00:03 UTC
[zfs-discuss] Why would zfs have too many errors when underlying raid array is fine?
These are all good reasons to switch back to letting ZFS handle it.

I did put about 600GB of data on the pool as configured with RAID 6 on the card, verified the data, and scrubbed it a couple of times in the process, and there were no problems, so it appears that the firmware upgrade fixed my problems. However, I'm going to switch it back to pass-through disks, remake the pool, and try it again.
Victor Latushkin
2010-Apr-14 00:07 UTC
[zfs-discuss] Why would zfs have too many errors when underlying raid array is fine?
On Apr 14, 2010, at 2:42 AM, Ragnar Sundblad wrote:
>
> On 12 apr 2010, at 19.10, Kyle McDonald wrote:
>
>> On 4/12/2010 9:10 AM, Willard Korfhage wrote:
>>> I upgraded to the latest firmware. When I rebooted the machine, the pool was back, with no errors. I was surprised.
>>>
>>> I will work with it more and see if it stays good. I've done a scrub, so now I'll put more data on it and stress it some more.
>>>
>>> If the firmware upgrade fixed everything, then I've got a question about which I am better off doing: keep it as-is, with the RAID card providing redundancy, or turn it all back into pass-through drives and let ZFS handle it, making the Areca card just a really expensive way of getting a bunch of SATA interfaces?
>>>
>>
>> As one of the other posters mentioned, there may be a third way that might give you something close to "the best of both worlds".
>>
>> Try using the Areca card to make 12 single-disk RAID 0 LUNs, and then use those in ZFS. I'm not sure of the definition of 'passthrough', but if it disables any battery-backed cache that the card may have, then setting up 12 HW RAID LUNs instead should give you an improvement by allowing the card to cache writes.
>>
>> The one downside of doing this vs. something more like 'jbod' is that if the controller dies you will need to move the disks to another Areca controller, whereas with 12 'jbod' connections you could move them to pretty much any controller you wanted.
>
> And that if you use the write cache in the controller and the controller dies, parts of your recently written data are only in the dead controller, and your pool may be more or less corrupt and may have to be rolled back a few versions to be rescued, or may not be rescuable at all. This may or may not be acceptable.

There was a successful recovery of what seemed to be the result of a lost cache on an Areca controller; see this thread:

http://opensolaris.org/jive/thread.jspa?threadID=109007

It was a manual recovery, but these days we have 'zpool import -fFX <poolname>' that would do the same in a much more user-friendly manner.

--
regards
victor
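P.S. For the archives, the recovery invocation is simply the line below, run against a pool that is not currently imported: -f forces the import, -F attempts to roll back to an earlier consistent transaction group, and -X allows a deeper ("extreme") rewind if needed.

# zpool import -fFX bigraid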
Willard Korfhage
2010-Apr-15 04:27 UTC
[zfs-discuss] Why would zfs have too many errors when underlying raid array is fine?
As I mentioned earlier, I removed the hardware-based RAID-6 array, changed all the disks to pass-through disks, and made a raidz2 pool using all the disks. I used my backup program to copy 55GB of data to the pool, and now I have errors all over the place.

# zpool status -v
  pool: bigraid
 state: DEGRADED
status: One or more devices has experienced an unrecoverable error. An
        attempt was made to correct the error. Applications are unaffected.
action: Determine if the device needs to be replaced, and clear the errors
        using 'zpool clear' or replace the device with 'zpool replace'.
   see: http://www.sun.com/msg/ZFS-8000-9P
 scrub: scrub completed after 0h4m with 0 errors on Wed Apr 14 22:56:36 2010
config:

        NAME        STATE     READ WRITE CKSUM
        bigraid     DEGRADED     0     0     0
          raidz2-0  DEGRADED     0     0    24
            c4t0d0  ONLINE       0     0     3
            c4t0d1  ONLINE       0     0     2
            c4t0d2  ONLINE       0     0     2
            c4t0d3  DEGRADED     0     0     2  too many errors
            c4t0d4  ONLINE       0     0     2
            c4t0d5  ONLINE       0     0     2
            c4t0d6  ONLINE       0     0     1
            c4t0d7  ONLINE       0     0     0
            c4t1d0  ONLINE       0     0     0
            c4t1d1  ONLINE       0     0     2
            c4t1d2  ONLINE       0     0     2
            c4t1d3  ONLINE       0     0     4

errors: No known data errors

So, ZFS on hardware-supported RAID was fine, but ZFS on pass-through disks is not. I'm at a loss to explain it. Any ideas?
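If it helps narrow this down, I can also post the driver-level error counters, which should show whether the OS itself logged any transport or media errors, as opposed to ZFS-only checksum mismatches:

# iostat -En                      (per-device soft/hard/transport error counts)
# fmdump -eV | tail -200          (recent FMA ereports in full detail)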
Tonmaus
2010-Apr-15 13:54 UTC
[zfs-discuss] Why would zfs have too many errors when underlying raid array is fine?
My understanding of "passthrough disk" from the Areca documentation is that single drives are exempted from the RAID controller regime and that the port behaves just like a plain HBA port. Now, on my Areca controller (r.i.p.) that mode always created the biggest havoc with ZFS/OpenSolaris, including zpool states just like yours. That was on an older firmware, though. 12 x RAID-0 was only marginally better than pass-through.

What I maybe did not mention is that we tried Ubuntu/dmraid on the same hardware for an afternoon, but there the initialisation of the RAID crashed with a reproducible kernel panic.

I think I mentioned it before: the only thing that worked decently was putting the whole controller in JBOD mode. Yes, it is an expensive way of providing a bunch of SATA ports... in my case it wasn't that bad, as I got a 1170 for approximately 400 euros, but it was still too expensive given the performance under ZFS, so I returned it for a full refund and replaced it with a pair of LSIs.

Regards,

Tonmaus
Willard Korfhage
2010-Apr-15 22:07 UTC
[zfs-discuss] Why would zfs have too many errors when underlying raid array is fine?
I've got a Supermicro AOC-USAS-L8I on the way because I gather from these forums that it works well. I'll just wait for that, then try 8 disks on that and 4 on the motherboard SATA ports.