I've had my 10x500 ZFS+ running for probably 6 months now and had thought it was scrubbing occasionally (wrong), so I started a scrub this morning. It's almost done now and I got this:

errors: No known data errors
# zpool status
  pool: pile
 state: ONLINE
status: One or more devices has experienced an unrecoverable error.  An
        attempt was made to correct the error.  Applications are unaffected.
action: Determine if the device needs to be replaced, and clear the errors
        using 'zpool clear' or replace the device with 'zpool replace'.
   see: http://www.sun.com/msg/ZFS-8000-9P
 scrub: scrub in progress, 97.93% done, 0h5m to go
config:

        NAME        STATE     READ WRITE CKSUM
        pile        ONLINE       0     0     0
          raidz2    ONLINE       0     0     0
            c5t0d0  ONLINE       0     0     0
            c5t1d0  ONLINE       0     0     0
            c5t2d0  ONLINE       0     0     0
            c5t3d0  ONLINE       0     0     0
            c5t4d0  ONLINE       0     0     1
            c5t5d0  ONLINE       0     0     0
            c5t6d0  ONLINE       0     0     1
            c5t7d0  ONLINE       0     0     0
            c3d0    ONLINE       0     0     1
            c4d0    ONLINE       0     0     0

So it says it's a minor error but still one to be concerned about. I thought resilvering takes care of checksum errors, does it not? Should I be running to buy 3 new 500GB drives?

Thanks,
Sam

This message posted from opensolaris.org
Could this somehow be related to the rather large (100GB) difference between what 'zfs list' and 'zpool list' report?

# zpool list
NAME   SIZE   USED   AVAIL  CAP  HEALTH  ALTROOT
pile   4.53T  4.31T  223G   95%  ONLINE  -

# zfs list
NAME   USED   AVAIL  REFER  MOUNTPOINT
pile   3.44T  120G   3.44T  /pile

I know there should be a 1TB difference in SIZE, but the difference in AVAIL makes no sense.
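A rough sanity check of those numbers, offered as a sketch rather than a definitive explanation: it assumes 'zpool list' counts raw space including raidz2 parity while 'zfs list' counts usable space, and that this ZFS release holds back about 1/64 of the usable space as an internal reservation. The 1/64 figure and all variable names below are assumptions, not something stated in this thread.

```python
# Sketch: reconcile 'zpool list' (raw space) with 'zfs list' (usable space)
# for a 10-disk raidz2. Figures are copied from the listings above.

DISKS = 10
PARITY = 2  # raidz2 keeps two disks' worth of parity


def usable_fraction(disks, parity):
    """Approximate share of raw raidz space left after parity."""
    return (disks - parity) / disks


frac = usable_fraction(DISKS, PARITY)

pool_size_t = 4.53   # zpool list SIZE, in T
pool_used_t = 4.31   # zpool list USED, in T
pool_avail_g = 223   # zpool list AVAIL, in G

usable_size_t = pool_size_t * frac
usable_used_t = pool_used_t * frac
usable_avail_g = pool_avail_g * frac

# ASSUMPTION: roughly 1/64 of the usable space is held back by ZFS itself.
reserve_g = usable_size_t * 1024 / 64
expected_avail_g = usable_avail_g - reserve_g

print(f"usable size  ~ {usable_size_t:.2f}T")
print(f"usable used  ~ {usable_used_t:.2f}T")
print(f"usable avail ~ {usable_avail_g:.0f}G before reservation")
print(f"reservation  ~ {reserve_g:.0f}G")
print(f"expected 'zfs list' AVAIL ~ {expected_avail_g:.0f}G")
```

Under those assumptions the scaled USED and AVAIL figures land close to the 3.44T and 120G that 'zfs list' shows, which would suggest the gap is an accounting difference and not a symptom of the checksum errors.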
Arne Schwabe
2008-Jul-30 02:57 UTC
[zfs-discuss] My first 'unrecoverable error', what to do?
Sam wrote:
> I've had my 10x500 ZFS+ running for probably 6 months now and had thought it was scrubbing occasionally (wrong), so I started a scrub this morning. It's almost done now and I got this:
>
> errors: No known data errors
> # zpool status
>   pool: pile
>  state: ONLINE
> status: One or more devices has experienced an unrecoverable error.  An
>         attempt was made to correct the error.  Applications are unaffected.
> action: Determine if the device needs to be replaced, and clear the errors
>         using 'zpool clear' or replace the device with 'zpool replace'.
>    see: http://www.sun.com/msg/ZFS-8000-9P
>  scrub: scrub in progress, 97.93% done, 0h5m to go
> config:
>
>         NAME        STATE     READ WRITE CKSUM
>         pile        ONLINE       0     0     0
>           raidz2    ONLINE       0     0     0
>             c5t0d0  ONLINE       0     0     0
>             c5t1d0  ONLINE       0     0     0
>             c5t2d0  ONLINE       0     0     0
>             c5t3d0  ONLINE       0     0     0
>             c5t4d0  ONLINE       0     0     1
>             c5t5d0  ONLINE       0     0     0
>             c5t6d0  ONLINE       0     0     1
>             c5t7d0  ONLINE       0     0     0
>             c3d0    ONLINE       0     0     1
>             c4d0    ONLINE       0     0     0
>
> So it says it's a minor error but still one to be concerned about. I thought resilvering takes care of checksum errors, does it not? Should I be running to buy 3 new 500GB drives?
>

Failures can have different causes. Maybe a cable is defective. Also, occasional defective sectors are "normal" and are handled quite well by the drive's defect management.

You can use zpool clear to reset the counters to 0.

Arne
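As a minimal sketch of the counter reset described above: the pool name pile comes from the original post, and 'zpool clear' and 'zpool status -v' are existing subcommands. The helper names clear_commands and run_clear are hypothetical, and checking the status straight after the clear is an added step, not something suggested in the thread.

```python
import subprocess


def clear_commands(pool):
    """Return the argv lists for resetting and then re-checking a pool."""
    return [
        ["zpool", "clear", pool],         # reset READ/WRITE/CKSUM counters to 0
        ["zpool", "status", "-v", pool],  # show the pool state afterwards
    ]


def run_clear(pool, dry_run=True):
    """Run the commands, or only return them when dry_run is True.

    Executing them needs a host with ZFS and sufficient privileges,
    so the default is to do nothing but report what would be run.
    """
    commands = clear_commands(pool)
    if not dry_run:
        for argv in commands:
            subprocess.run(argv, check=True)
    return commands


if __name__ == "__main__":
    for argv in run_clear("pile"):
        print(" ".join(argv))
```

If the counters climb again on the same devices after a later scrub, that would point at the disk, cable, or controller and not at a one-off bad sector.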
Bob Friesenhahn
2008-Jul-30 05:01 UTC
[zfs-discuss] My first 'unrecoverable error', what to do?
On Tue, 29 Jul 2008, Sam wrote:
> So it says it's a minor error but still one to be concerned about. I
> thought resilvering takes care of checksum errors, does it not?
> Should I be running to buy 3 new 500GB drives?

Presumably these are SATA drives. Studies show that typical SATA drives tend to produce recurring data errors during their lifetime, so a few data errors are likely nothing to be alarmed about. If you see many tens or hundreds, then there would be cause for concern. Enterprise SCSI drives produce very few such errors, and evidence suggests that on those drives data errors may portend doom.

I have yet to see an error here. Knock on wood!

Bob
=====================================
Bob Friesenhahn
bfriesen at simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer, http://www.GraphicsMagick.org/
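The "a few versus many tens" rule of thumb above can be sketched as a small check over the config table from 'zpool status'. The sample text is copied from the original post. The function names and the alarm_at threshold of 20 are illustrative assumptions, since "many tens" is not a precise number, and the pattern only handles plain integer counters.

```python
import re

# Config section as posted at the top of the thread.
SAMPLE = """\
        NAME        STATE     READ WRITE CKSUM
        pile        ONLINE       0     0     0
          raidz2    ONLINE       0     0     0
            c5t0d0  ONLINE       0     0     0
            c5t1d0  ONLINE       0     0     0
            c5t2d0  ONLINE       0     0     0
            c5t3d0  ONLINE       0     0     0
            c5t4d0  ONLINE       0     0     1
            c5t5d0  ONLINE       0     0     0
            c5t6d0  ONLINE       0     0     1
            c5t7d0  ONLINE       0     0     0
            c3d0    ONLINE       0     0     1
            c4d0    ONLINE       0     0     0
"""

# name, state, then three integer counters; the header row does not match.
ROW = re.compile(r"^\s*(\S+)\s+(\S+)\s+(\d+)\s+(\d+)\s+(\d+)\s*$")


def devices_with_errors(status_text):
    """Map device name -> (read, write, cksum) for rows with any nonzero count."""
    found = {}
    for line in status_text.splitlines():
        match = ROW.match(line)
        if not match:
            continue
        name, _state, read, write, cksum = match.groups()
        counts = (int(read), int(write), int(cksum))
        if any(counts):
            found[name] = counts
    return found


def needs_attention(found, alarm_at=20):
    """Names whose total error count reaches alarm_at (an arbitrary cut-off)."""
    return [name for name, counts in found.items() if sum(counts) >= alarm_at]


if __name__ == "__main__":
    errors = devices_with_errors(SAMPLE)
    print("devices with errors:", errors)
    print("above threshold:", needs_attention(errors))
```

With the posted output, three devices show a single checksum error each and none reaches the threshold, which matches the advice that a handful of errors on SATA drives is not yet a reason to replace hardware.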
Robert Milkowski
2008-Jul-30 10:07 UTC
[zfs-discuss] My first 'unrecoverable error', what to do?
Hello Sam,

Wednesday, July 30, 2008, 3:23:55 AM, you wrote:

S> I've had my 10x500 ZFS+ running for probably 6 months now and had
S> thought it was scrubbing occasionally (wrong), so I started a scrub
S> this morning. It's almost done now and I got this:

S> errors: No known data errors
S> # zpool status
S>   pool: pile
S>  state: ONLINE
S> status: One or more devices has experienced an unrecoverable error.  An
S>         attempt was made to correct the error.  Applications are unaffected.
S> action: Determine if the device needs to be replaced, and clear the errors
S>         using 'zpool clear' or replace the device with 'zpool replace'.
S>    see: http://www.sun.com/msg/ZFS-8000-9P
S>  scrub: scrub in progress, 97.93% done, 0h5m to go
S> config:

S>         NAME        STATE     READ WRITE CKSUM
S>         pile        ONLINE       0     0     0
S>           raidz2    ONLINE       0     0     0
S>             c5t0d0  ONLINE       0     0     0
S>             c5t1d0  ONLINE       0     0     0
S>             c5t2d0  ONLINE       0     0     0
S>             c5t3d0  ONLINE       0     0     0
S>             c5t4d0  ONLINE       0     0     1
S>             c5t5d0  ONLINE       0     0     0
S>             c5t6d0  ONLINE       0     0     1
S>             c5t7d0  ONLINE       0     0     0
S>             c3d0    ONLINE       0     0     1
S>             c4d0    ONLINE       0     0     0

S> So it says it's a minor error but still one to be concerned about.
S> I thought resilvering takes care of checksum errors, does it not?
S> Should I be running to buy 3 new 500GB drives?

ZFS only reported to you that there were some checksum errors - all of them were corrected and no bad data was given back to applications.

--
Best regards,
Robert Milkowski
mailto:milek at task.gda.pl
http://milek.blogspot.com