Jonathan
2010-Apr-14 07:05 UTC
[zfs-discuss] Replaced drive in zpool, was fine, now degraded - ohno
I just started replacing drives in this zpool (to increase storage). I pulled the first drive and replaced it with a new drive, and all was well. It resilvered with 0 errors. That was 5 days ago. Just today I was looking around and noticed that my pool was degraded (I see now that this occurred last night). Sure enough, there are 12 read errors on the new drive.

I'm on snv_111b. I attempted to get smartmontools working, but it doesn't seem to want to work as these are all SATA drives. fmdump indicates that the read errors occurred within about 10 minutes of one another.

Is it safe to say this drive is bad, or is there anything else I can do about this?

Thanks,
Jon

--------------------------------------------------------
$ zpool status MyStorage
  pool: MyStorage
 state: DEGRADED
status: One or more devices are faulted in response to persistent errors.
        Sufficient replicas exist for the pool to continue functioning in a
        degraded state.
action: Replace the faulted device, or use 'zpool clear' to mark the device
        repaired.
 scrub: scrub completed after 8h7m with 0 errors on Sun Apr 11 13:07:40 2010
config:

        NAME          STATE     READ WRITE CKSUM
        MyStorage     DEGRADED     0     0     0
          raidz1      DEGRADED     0     0     0
            c5t0d0    ONLINE       0     0     0
            c5t1d0    ONLINE       0     0     0
            c6t1d0    ONLINE       0     0     0
            c7t1d0    FAULTED     12     0     0  too many errors

errors: No known data errors
--------------------------------------------------------
$ fmdump
TIME                 UUID                                 SUNW-MSG-ID
Apr 09 16:08:04.4660 1f07d23f-a4ba-cbbb-8713-d003d9771079 ZFS-8000-D3
Apr 13 22:29:02.8063 e26c7e32-e5dd-cd9c-cd26-d5715049aad8 ZFS-8000-FD
--------------------------------------------------------
That first log is the original drive being replaced. The second is the read errors on the new drive.
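(For reference, a drive swap like the one described above usually comes down to the sequence sketched below. The exact commands were not shown in the thread, so the offline step and the assumption that the new disk comes up at the same c7t1d0 path are guesses, not what was actually typed.)

--------------------------------------------------------
# Hypothetical replacement sequence for one disk in the raidz1 vdev.
# Take the old disk out of service before pulling it.
$ zpool offline MyStorage c7t1d0

# (physically swap the disk)

# Tell ZFS to resilver onto the new disk at the same path.
$ zpool replace MyStorage c7t1d0

# Watch the resilver and confirm it finishes with 0 errors.
$ zpool status -v MyStorage
--------------------------------------------------------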
Richard Elling
2010-Apr-14 16:45 UTC
[zfs-discuss] Replaced drive in zpool, was fine, now degraded - ohno
On Apr 14, 2010, at 12:05 AM, Jonathan wrote:

> I just started replacing drives in this zpool (to increase storage). I pulled the first drive and replaced it with a new drive, and all was well. It resilvered with 0 errors. That was 5 days ago. Just today I was looking around and noticed that my pool was degraded (I see now that this occurred last night). Sure enough, there are 12 read errors on the new drive.
>
> I'm on snv_111b. I attempted to get smartmontools working, but it doesn't seem to want to work as these are all SATA drives. fmdump indicates that the read errors occurred within about 10 minutes of one another.

Use "iostat -En" to see the nature of the I/O errors.

> Is it safe to say this drive is bad, or is there anything else I can do about this?

It is safe to say that there was trouble reading from the drive at some time in the past. But you have not determined the root cause -- the info available in zpool status is not sufficient.
 -- richard
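(To check just the suspect disk rather than every device, the command can take the disk name as an operand; c7t1d0 here is simply the device named in the zpool status output above.)

--------------------------------------------------------
# Extended device error statistics (-E) with descriptive cXtYdZ names (-n),
# for the one drive that is showing read errors.
$ iostat -En c7t1d0

# Or list every device and scan for nonzero Hard/Media error counts.
$ iostat -En
--------------------------------------------------------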
Jonathan
2010-Apr-14 16:56 UTC
[zfs-discuss] Replaced drive in zpool, was fine, now degraded - ohno
I just ran 'iostat -En'. This is what was reported for the drive in question (all other drives showed 0 errors across the board). All of the drives reported the "Illegal Request ... Predictive Failure Analysis" counts, not just this one.

------------------------------------------------------------------------------
c7t1d0  Soft Errors: 0 Hard Errors: 36 Transport Errors: 0
Vendor: ATA      Product: SAMSUNG HD203WI  Revision: 0002  Serial No:
Size: 2000.40GB <2000398934016 bytes>
Media Error: 36 Device Not Ready: 0 No Device: 0 Recoverable: 0
Illegal Request: 126 Predictive Failure Analysis: 0
------------------------------------------------------------------------------
Eric Andersen
2010-Apr-14 17:00 UTC
[zfs-discuss] Replaced drive in zpool, was fine, now degraded - ohno
> I'm on snv_111b. I attempted to get smartmontools working, but it
> doesn't seem to want to work as these are all SATA drives.

Have you tried using '-d sat,12' when using smartmontools?

opensolaris.org/jive/thread.jspa?messageID=473727
Richard Elling
2010-Apr-14 17:02 UTC
[zfs-discuss] Replaced drive in zpool, was fine, now degraded - ohno
On Apr 14, 2010, at 9:56 AM, Jonathan wrote:

> I just ran 'iostat -En'. This is what was reported for the drive in question (all other drives showed 0 errors across the board). All of the drives reported the "Illegal Request ... Predictive Failure Analysis" counts, not just this one.
>
> ------------------------------------------------------------------------------
> c7t1d0  Soft Errors: 0 Hard Errors: 36 Transport Errors: 0
> Vendor: ATA      Product: SAMSUNG HD203WI  Revision: 0002  Serial No:
> Size: 2000.40GB <2000398934016 bytes>
> Media Error: 36 Device Not Ready: 0 No Device: 0 Recoverable: 0
> Illegal Request: 126 Predictive Failure Analysis: 0
> ------------------------------------------------------------------------------

Don't worry about the illegal requests; they are not permanent. Do worry about the media errors. Though this is the most common HDD error, it is also a cause of data loss. Fortunately, ZFS detected this and repaired it for you. Other file systems may not be so gracious.
 -- richard
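(If the drive stays in the pool for now, the generic way to see whether the media errors keep coming back is to clear the fault and scrub again; this is just the standard clear-and-scrub cycle implied by the "action" line in zpool status, not a step anyone in the thread prescribed.)

--------------------------------------------------------
# Reset the error counters and bring the faulted disk back online.
$ zpool clear MyStorage c7t1d0

# Re-read and verify every block in the pool against its checksums.
$ zpool scrub MyStorage

# Report status only if the pool is unhealthy; "is healthy" means the
# scrub turned up nothing new.
$ zpool status -x MyStorage
--------------------------------------------------------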
Jonathan
2010-Apr-14 17:08 UTC
[zfs-discuss] Replaced drive in zpool, was fine, now degraded - ohno
Yeah, I tried that:

------------------------------------------
$ smartctl -d sat,12 -i /dev/rdsk/c5t0d0
smartctl 5.39.1 2010-01-28 r3054 [i386-pc-solaris2.11] (local build)
Copyright (C) 2002-10 by Bruce Allen, http://smartmontools.sourceforge.net

Smartctl: Device Read Identity Failed (not an ATA/ATAPI device)
------------------------------------------

I'm thinking something changed between snv_111 and snv_132 (the build mentioned in that post).
Cindy Swearingen
2010-Apr-14 17:27 UTC
[zfs-discuss] Replaced drive in zpool, was fine, now degraded - ohno
Jonathan,

For a different diagnostic perspective, you might use the fmdump -eV command to see what FMA indicates for this device. This level of diagnostics is below the ZFS level and considerably more detailed, so you can see when these errors began and how long they persisted.

Cindy
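(In practice that looks something like the sketch below. The -t/-T window and the date format are assumptions for illustration, since fmdump accepts several time formats, and the error log usually requires elevated privileges to read.)

--------------------------------------------------------
# Verbose error-report detail (-e = error events, -V = full payload),
# narrowed to the window around the Apr 13 fault shown earlier.
$ pfexec fmdump -eV -t "13Apr10 22:00:00" -T "13Apr10 23:00:00"

# Or dump everything and look for the suspect vdev's device path,
# which ZFS ereports carry in their payload.
$ pfexec fmdump -eV | grep -i c7t1d0
--------------------------------------------------------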
Jonathan
2010-Apr-14 17:28 UTC
[zfs-discuss] Replaced drive in zpool, was fine, now degraded - ohno
> Do worry about the media errors. Though this is the most common HDD
> error, it is also a cause of data loss. Fortunately, ZFS detected this
> and repaired it for you.

Right. I assume you do recommend swapping the faulted drive out, though?

> Other file systems may not be so gracious.
> -- richard

As we are all too aware, I'm sure :)