I am using ZFS on FreeBSD 7.0_beta3. This is the first time I have used ZFS, and I have run into something that I am not sure is normal but am very concerned about.

SYSTEM INFO:
hp 320s (storage array)
12 disks (750GB each)
2GB RAM
1GB flash drive (running the OS)

When I take a disk offline and replace it with my spare, the pool shows numerous errors after the spare rebuild. See below:

 scrub: resilver completed with 946 errors on Thu Dec 6 15:15:32 2007
config:

        NAME          STATE     READ WRITE CKSUM
        fatty         DEGRADED     0     0 3.71K
          raidz2      DEGRADED     0     0 3.71K
            da0       ONLINE       0     0     0
            da1       ONLINE       0     0     0
            da2       ONLINE       0     0     0
            da3       ONLINE       0     0   300
            da4       ONLINE       0     0     0
            da5       ONLINE       0     0     0
            da6       ONLINE       0     0   253
            da7       ONLINE       0     0     0
            da8       ONLINE       0     0     0
            spare     DEGRADED     0     0     0
              da9     OFFLINE      0     0     0
              da11    ONLINE       0     0     0
            da10      ONLINE       0     0     0
        spares
          da11        INUSE     currently in use

errors: 801 data errors, use '-v' for a list

After I detach the spare da11 and bring da9 back online, all the errors go away.

  pool: fatty
 state: ONLINE
status: One or more devices has experienced an unrecoverable error. An
        attempt was made to correct the error. Applications are unaffected.
action: Determine if the device needs to be replaced, and clear the errors
        using 'zpool clear' or replace the device with 'zpool replace'.
   see: http://www.sun.com/msg/ZFS-8000-9P
 scrub: resilver completed with 0 errors on Thu Dec 6 15:57:23 2007
config:

        NAME        STATE     READ WRITE CKSUM
        fatty       ONLINE       0     0 3.71K
          raidz2    ONLINE       0     0 3.71K
            da0     ONLINE       0     0     0
            da1     ONLINE       0     0     0
            da2     ONLINE       0     0     0
            da3     ONLINE       0     0   300
            da4     ONLINE       0     0     0
            da5     ONLINE       0     0     0
            da6     ONLINE       0     0   253
            da7     ONLINE       0     0     0
            da8     ONLINE       0     0     0
            da9     ONLINE       0     0     0
            da10    ONLINE       0     0     0
        spares
          da11      AVAIL

errors: No known data errors

Thanks
>         NAME          STATE     READ WRITE CKSUM
>         fatty         DEGRADED     0     0 3.71K
>           raidz2      DEGRADED     0     0 3.71K
>             da0       ONLINE       0     0     0
>             da1       ONLINE       0     0     0
>             da2       ONLINE       0     0     0
>             da3       ONLINE       0     0   300
>             da4       ONLINE       0     0     0
>             da5       ONLINE       0     0     0
>             da6       ONLINE       0     0   253
>             da7       ONLINE       0     0     0
>             da8       ONLINE       0     0     0
>             spare     DEGRADED     0     0     0
>               da9     OFFLINE      0     0     0
>               da11    ONLINE       0     0     0
>             da10      ONLINE       0     0     0
>         spares
>           da11        INUSE     currently in use
>
> errors: 801 data errors, use '-v' for a list
>
> After I detach the spare da11 and bring da9 back online all the errors
> go away.

Theory: Suppose da3 and da6 are bad drives, have cabling issues, or sit on a controller (different from the one serving the other drives) that is corrupting data. If you then replace da9 with da11, the resilver has to read from these drives, which triggers the checksum errors.

Once you bring da9 back in, it is either entirely up to date or very close to it, so the amount of I/O required to resilver it is very small and may not trigger problems.

If this theory is correct, a scrub (zpool scrub fatty) should encounter checksum errors on da3 and da6.

-- 
/ Peter Schuller

PGP userID: 0xE9758B7D or 'Peter Schuller <peter.schuller at infidyne.com>'
Key retrieval: Send an E-Mail to getpgpkey at scode.org
E-Mail: peter.schuller at infidyne.com Web: http://www.scode.org
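To check the theory above after running `zpool scrub fatty`, the device table from `zpool status -v fatty` can be scanned for rows with a non-zero CKSUM counter. The following is only a minimal sketch: the helper name `checksum_errors` and the embedded sample text are hypothetical, and it assumes the whitespace-separated column layout seen in the output quoted in this thread.

```python
import re

# Sample text laid out like the `zpool status` output quoted in this thread
# (shortened; a real run would capture the command's output instead).
SAMPLE_STATUS = """\
config:

        NAME        STATE     READ WRITE CKSUM
        fatty       DEGRADED     0     0 3.71K
          raidz2    DEGRADED     0     0 3.71K
            da0     ONLINE       0     0     0
            da3     ONLINE       0     0   300
            da6     ONLINE       0     0   253
            spare   DEGRADED     0     0     0
              da9   OFFLINE      0     0     0
              da11  ONLINE       0     0     0
        spares
          da11      INUSE     currently in use

errors: 801 data errors, use '-v' for a list
"""

# A device row is assumed to be: name, STATE, then READ, WRITE, CKSUM counters.
ROW = re.compile(r"^\s*(\S+)\s+([A-Z]+)\s+(\S+)\s+(\S+)\s+(\S+)\s*$")
# Counters look like "0", "300" or "3.71K".
COUNTER = re.compile(r"^\d+(\.\d+)?[KMGT]?$")


def checksum_errors(status_text):
    """Return {row name: CKSUM string} for rows whose CKSUM counter is non-zero."""
    result = {}
    for line in status_text.splitlines():
        match = ROW.match(line)
        if not match:
            continue
        name, _state, read, write, cksum = match.groups()
        # Skip the header row and lines such as "da11 INUSE currently in use".
        if not all(COUNTER.match(c) for c in (read, write, cksum)):
            continue
        if cksum != "0":
            result[name] = cksum
    return result


if __name__ == "__main__":
    for name, count in checksum_errors(SAMPLE_STATUS).items():
        print(f"{name}: {count} checksum errors")
```

The pool and raidz2 rows are reported along with the individual disks, because their counters are non-zero too. If the theory holds, the counters on da3 and da6 should keep growing after each scrub while the other disks stay at 0.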
--On 07 December 2007 11:18 -0600 Jason Morton <jasonm at layeredtechnologies.com> wrote:

> I am using ZFS on FreeBSD 7.0_beta3. This is the first time I have used
> ZFS and I have run into something that I am not sure if this is normal,
> but am very concerned about.
>
> SYSTEM INFO:
> hp 320s (storage array)
> 12 disks (750GB each)
> 2GB RAM
> 1GB flash drive (running the OS)

Hi There,

I've been running ZFS under FreeBSD 7.0 for a few months now, and we also have a lot of HP / ProLiant kit. Touch wood, so far we've not seen any issues.

The first thing I'd suggest is to make sure you have the absolute *latest* firmware from HP's site for the BIOS and the RAID controller (a P400, I think, on the 320s).

We've had a number of problems in the past with drives 'disappearing', arrays locking up, and errors under previous firmware, all of which were (finally) resolved by updated firmware. Even our latest delivered batch of 360s and 380s didn't have anything like 'current' firmware on.

> When I take a disk offline and replace it with my spare, after the spare
> rebuild it shows there are numerous errors. see below:
> scrub: resilver completed with 946 errors on Thu Dec 6 15:15:32 2007

Since they're checksum errors, they probably won't be logged on the console (ZFS detected them, not necessarily the underlying CAM layers), but it's worth checking in case something "isn't happy".

With that in mind, you might also want to check whether da3 and da6 have anything in common, either in the physical drives or in where they sit in the 320s's drive bay/box allocations, as shown by the RAID controller config (F8 at boot time while the RAID is initialising).

-Kp
On Dec 7, 2007, at 1:05 PM, Karl Pielorz wrote:

> --On 07 December 2007 11:18 -0600 Jason Morton
> <jasonm at layeredtechnologies.com> wrote:
>
>> I am using ZFS on FreeBSD 7.0_beta3. This is the first time I have used
>> ZFS and I have run into something that I am not sure if this is normal,
>> but am very concerned about.
>>
>> SYSTEM INFO:
>> hp 320s (storage array)
>> 12 disks (750GB each)
>> 2GB RAM
>> 1GB flash drive (running the OS)
>
> Hi There,
>
> I've been running ZFS under FreeBSD 7.0 for a few months now, and we also
> have a lot of HP / Proliant Kit - and, touch wood, so far - we've not seen
> any issues.

Jason,

Now that FreeBSD 7 has been out for a while, have you had any problems with your ZFS pools?

-joe