Today, I noticed this:

[joe at coruscant$] zpool status
  pool: tank
 state: ONLINE
status: One or more devices has experienced an unrecoverable error.  An
        attempt was made to correct the error.  Applications are unaffected.
action: Determine if the device needs to be replaced, and clear the errors
        using 'zpool clear' or replace the device with 'zpool replace'.
   see: http://www.sun.com/msg/ZFS-8000-9P
 scrub: resilver completed after 0h0m with 0 errors on Sat Apr  4 08:31:49 2009
config:

        NAME        STATE     READ WRITE CKSUM
        tank        ONLINE       0     0     0
          raidz1    ONLINE       0     0     0
            c2t0d0  ONLINE       0     0     0
            c2t1d0  ONLINE       0     0     0
            c2t4d0  ONLINE       0     0     0
          raidz1    ONLINE       0     0     0
            c2t2d0  ONLINE       0     0     0
            c2t3d0  ONLINE       0     0     4  36K resilvered
            c2t5d0  ONLINE       0     0     0

errors: No known data errors

I think this means a disk is failing and that ZFS did a good job of keeping everything sane.

According to http://www.sun.com/msg/ZFS-8000-9P:

    The Message ID: ZFS-8000-9P indicates a device has exceeded the acceptable limit of errors allowed by the system. See document 203768 for additional information.

Unfortunately, I'm not *authorized* to see that document.

Question: I'm assuming the disk is dying. How can I get more information from the OS to confirm?

Rant: Sun, you suck for telling me to read a document for additional information, and then denying me access.
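To see at a glance which disk the counters point at, the config table can be filtered mechanically. The function below is a minimal sketch, not a Sun tool: the name `nonzero_errors` is hypothetical, and it assumes the NAME/STATE/READ/WRITE/CKSUM table layout shown in the output above.

```shell
#!/bin/sh
# Hypothetical helper: read `zpool status` output on stdin and print every
# vdev whose READ, WRITE or CKSUM counter is not "0".
# Assumes the NAME STATE READ WRITE CKSUM table layout shown above.
nonzero_errors() {
    awk '
        # Header row: the config table starts on the next line.
        $1 == "NAME" && $3 == "READ" { in_table = 1; next }
        # "errors:" line: the table has ended.
        $1 == "errors:" { in_table = 0; next }
        # Table rows: name, state, then the three error counters.
        in_table && NF >= 5 && ($3 != "0" || $4 != "0" || $5 != "0") {
            print $1, "read=" $3, "write=" $4, "cksum=" $5
        }
    '
}
```

Piping `zpool status tank` into `nonzero_errors` would print a single line for c2t3d0, since its CKSUM column is the only non-zero counter. Abbreviated counts such as 1.2K are non-zero strings, so they are reported too.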
On Fri, Apr 3, 2009 at 10:41 AM, Joe S <js.lists at gmail.com> wrote:
> Today, I noticed this:
> ...

Running Nevada 105.

Incidentally, I tried upgrading to Nevada 110, but the OS wouldn't finish booting. It stopped at the part where it was trying to mount my ZFS filesystems.
I booted back into 105, which boots fine, but when I ran zpool status, I noticed that message.
On Fri, Apr 3, 2009 at 12:41 PM, Joe S <js.lists at gmail.com> wrote:
> ...
> Rant: Sun, you suck for telling me to read a document for additional
> information, and then denying me access.

On that front... I'm wondering if we could get a project going to mirror all of those pages to a non-Sun-hosted site. Just in case that IBM thing really does come to fruition :)

--Tim
On Fri, 3 Apr 2009, Joe S wrote:
>
> I think this means a disk is failing and that ZFS did a good job of
> keeping everything sane.

The disk is not necessarily "failing" (i.e. headed for the dumpster). Notice that only 36K had to be resilvered. If this is a SATA disk, then write down a note that this occurred, clear the errors, and then wait for additional errors to crop up on the same disk. If this is an enterprise SCSI/SAS/FC disk, then the situation could indicate something more serious, since media failures are much less common.

By all means, do a 'zpool scrub' on the pool to make sure that there is no other data waiting to fail.

> Rant: Sun, you suck for telling me to read a document for additional
> information, and then denying me access.

Yes, this sucks. Presumably if you paid for a support contract you could see the additional information.

Bob
--
Bob Friesenhahn
bfriesen at simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer, http://www.GraphicsMagick.org/
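Bob's routine (note the event, clear the counters, scrub, then watch the same disk) could be scripted roughly as follows. This is a hedged sketch: the function name `record_and_clear` and the note file location are hypothetical, and setting `ZPOOL="echo zpool"` turns it into a dry run that only prints the commands.

```shell
#!/bin/sh
# Sketch of the "note it, clear it, scrub it" routine described above.
# ZPOOL is left unquoted below on purpose so that ZPOOL="echo zpool"
# splits into a command plus argument for a dry run.
ZPOOL=${ZPOOL:-zpool}
# Hypothetical location for the running list of disk incidents.
NOTE_FILE=${NOTE_FILE:-/var/tmp/zfs-disk-notes.log}

record_and_clear() {
    pool=$1
    dev=$2
    # 1. Write down that the errors occurred, with a timestamp.
    printf '%s %s %s: errors noted, clearing counters\n' \
        "$(date '+%Y-%m-%d %H:%M')" "$pool" "$dev" >> "$NOTE_FILE"
    # 2. Reset the error counters for that one device only.
    $ZPOOL clear "$pool" "$dev"
    # 3. Start a scrub; it runs in the background, so check progress
    #    later with "zpool status".
    $ZPOOL scrub "$pool"
}
```

For the pool above this would be called as `record_and_clear tank c2t3d0`. If `zpool status -v tank` later shows the CKSUM count on c2t3d0 climbing again, the note file records how often it has happened.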
On Fri, Apr 3, 2009 at 12:51 PM, Bob Friesenhahn <bfriesen at simple.dallas.tx.us> wrote:
> ...
> Yes, this sucks. Presumably if you paid for a support contract you could
> see the additional information.

I have a support contract and cannot see it. I'm assuming it's internal only.

--Tim
Joe,

I just checked the referenced document; it walks through an example of replacing the failed/faulted device.

In the ZFS Administration Guide, I found the URL below on repairing a device in a zpool:

http://docs.sun.com/app/docs/doc/819-5461/gbbvf?l=en&a=view

That URL is linked from the Chapter 11 portion of the ZFS Administration Guide, on troubleshooting problems:

http://docs.sun.com/app/docs/doc/819-5461/gavwg?l=en&a=view

The link was in the paragraph below:

    Physically Reattaching the Device

    Exactly how a missing device is reattached depends on the device in question. If the device is a network-attached drive, connectivity should be restored. If the device is a USB or other removable media, it should be reattached to the system. If the device is a local disk, a controller might have failed such that the device is no longer visible to the system. In this case, the controller should be replaced at which point the disks will again be available. Other pathologies can exist and depend on the type of hardware and its configuration. If a drive fails and it is no longer visible to the system (an unlikely event), the device should be treated as a damaged device. Follow the procedures outlined in Repairing a Damaged Device.

I do agree that when we (Sun) point people to additional steps, any externally available documents should be referenced before an internal-only link.

Geoff

On Fri, 2009-04-03 at 11:45, Joe S wrote:
> On Fri, Apr 3, 2009 at 10:41 AM, Joe S <js.lists at gmail.com> wrote:
> > Today, I noticed this:
> > ...
>
> Running Nevada 105.
> ...

--
Geoff Shipman - (303) 272-9955
Systems Technology Service Center - Operating System
Solaris and Network Technology Domain
Americas Systems Technology Service Center
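If the disk does turn out to be damaged, the 'zpool replace' step named in the status message can be wrapped in a small function. This is a sketch under assumptions: `replace_disk` is a hypothetical name, c2t6d0 in the usage note is a made-up replacement device, and `ZPOOL="echo zpool"` gives a dry run.

```shell
#!/bin/sh
# Sketch: replace a suspect disk in a pool, then show resilver progress.
# ZPOOL stays unquoted below so that ZPOOL="echo zpool" works as a dry run.
ZPOOL=${ZPOOL:-zpool}

replace_disk() {
    pool=$1
    old=$2
    new=${3:-}
    # With no new device, zpool replace reuses the old device name,
    # e.g. after a new disk was swapped into the same slot.
    # ${new:+"$new"} expands to nothing when $new is empty.
    $ZPOOL replace "$pool" "$old" ${new:+"$new"}
    # Resilvering onto the new disk starts immediately; the old disk is
    # detached once it completes.
    $ZPOOL status -v "$pool"
}
```

For example, `replace_disk tank c2t3d0 c2t6d0` moves the data onto a (hypothetical) spare disk, while `replace_disk tank c2t3d0` covers a new disk installed in the same slot.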
On Fri, Apr 03, 2009 at 10:41:40AM -0700, Joe S wrote:
> Today, I noticed this:
...
> According to http://www.sun.com/msg/ZFS-8000-9P:
>
> The Message ID: ZFS-8000-9P indicates a device has exceeded the
> acceptable limit of errors allowed by the system. See document 203768
> for additional information.
...

I had the same thing on a Thumper with S10u6 one or two months ago. Since the logs did not show any disk errors or warnings for the last 6 months, I just cleared the pool, scrubbed it, and finally put the temporarily used hot spare back into the hot spare pool. There have been no errors or warnings for that disk since then, so it was obviously a false (brain-damaged) alarm ...

regards,
jel.
--
Otto-von-Guericke University http://www.cs.uni-magdeburg.de/
Department of Computer Science Geb. 29 R 027, Universitaetsplatz 2
39106 Magdeburg, Germany Tel: +49 391 67 12768
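jel's decision rested on months of clean logs. A rough version of that check is sketched below. `disk_warnings` is a hypothetical name, and kernel messages may name the disk by its physical path or sd instance (for example sd5) rather than c2t3d0, so pass whichever name appears in your logs.

```shell
#!/bin/sh
# Hypothetical helper: count lines in a syslog file that mention a disk
# name and contain "error" or "warning" in any letter case.
# Assumes a POSIX awk (on Solaris, nawk or /usr/xpg4/bin/awk).
disk_warnings() {
    dev=$1
    # Solaris writes kernel messages to /var/adm/messages by default;
    # pass a rotated copy of the log to look further back.
    log=${2:-/var/adm/messages}
    awk -v dev="$dev" '
        index($0, dev) && tolower($0) ~ /error|warning/ { n++ }
        END { print n + 0 }
    ' "$log"
}
```

A count of 0 across the retained logs is the kind of evidence jel relied on before clearing the pool instead of replacing the disk.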