Paul B. Henson
2010-Jan-09 17:45 UTC
[zfs-discuss] x4500 failed disk, not sure if hot spare took over correctly
We just had our first x4500 disk failure (which of course had to happen
late Friday night <sigh>). I've opened a ticket on it but don't expect a
response until Monday, so I was hoping to verify the hot spare took over
correctly and we still have redundancy pending device replacement.

This is an S10U6 box:

SunOS cartman 5.10 Generic_141445-09 i86pc i386 i86pc

Looks like the first errors started yesterday morning:

Jan  8 07:46:02 cartman marvell88sx: [ID 268337 kern.warning] WARNING: marvell88sx1: device on port 2 failed to reset
Jan  8 07:46:15 cartman marvell88sx: [ID 268337 kern.warning] WARNING: marvell88sx1: device on port 2 failed to reset
Jan  8 07:46:32 cartman sata: [ID 801593 kern.warning] WARNING: /pci@0,0/pci1022,7458@2/pci11ab,11ab@1:
Jan  8 07:46:32 cartman  SATA device at port 2 - device failed
Jan  8 07:46:32 cartman scsi: [ID 107833 kern.warning] WARNING: /pci@0,0/pci1022,7458@2/pci11ab,11ab@1/disk@2,0 (sd26):
Jan  8 07:46:32 cartman  Command failed to complete...Device is gone

ZFS failed the drive about 11:15PM:

Jan  8 23:15:01 cartman zpool_check[3702]: [ID 702911 daemon.error] zpool export status: One or more devices has experienced an unrecoverable error. An
Jan  8 23:15:01 cartman zpool_check[3702]: [ID 702911 daemon.error] zpool export status: attempt was made to correct the error. Applications are unaffected.
Jan  8 23:15:01 cartman zpool_check[3702]: [ID 702911 daemon.error] unknown header see
Jan  8 23:15:01 cartman zpool_check[3702]: [ID 702911 daemon.error] warning: pool export health DEGRADED

However, the errors continue still:

Jan  9 03:54:48 cartman scsi: [ID 107833 kern.warning] WARNING: /pci@0,0/pci1022,7458@2/pci11ab,11ab@1/disk@2,0 (sd26):
Jan  9 03:54:48 cartman  Command failed to complete...Device is gone
[...]
Jan  9 07:56:12 cartman scsi: [ID 107833 kern.warning] WARNING: /pci@0,0/pci1022,7458@2/pci11ab,11ab@1/disk@2,0 (sd26):
Jan  9 07:56:12 cartman  Command failed to complete...Device is gone
Jan  9 07:56:12 cartman scsi: [ID 107833 kern.warning] WARNING: /pci@0,0/pci1022,7458@2/pci11ab,11ab@1/disk@2,0 (sd26):
Jan  9 07:56:12 cartman  drive offline

If ZFS removed the drive from the pool, why does the system keep
complaining about it? Is fault management stuff still poking at it?

Here's the zpool status output:

  pool: export
 state: DEGRADED
[...]
 scrub: scrub completed after 0h6m with 0 errors on Fri Jan  8 23:21:31 2010

        NAME          STATE     READ WRITE CKSUM
        export        DEGRADED     0     0     0
          mirror      DEGRADED     0     0     0
            c0t2d0    ONLINE       0     0     0
            spare     DEGRADED 18.9K     0     0
              c1t2d0  REMOVED      0     0     0
              c5t0d0  ONLINE       0     0 18.9K

        spares
          c5t0d0      INUSE     currently in use

Is the pool/mirror/spare still supposed to show up as degraded after the
hot spare is deployed? There are 18.9K checksum errors on the disk that
failed, but there are also 18.9K read errors on the hot spare?

The scrub started at 11pm last night, and the disk got booted at 11:15pm,
so presumably the scrub came across the failures the OS had been reporting.
The last scrub status shows that scrub completing successfully. What
happened to the resilver status? How can I tell if the resilver was
successful? Did the resilver start and complete while the scrub was still
running and its status output was lost? Is there any way to see the status
of past scrubs/resilvers, or is only the most recent one available?

Fault management doesn't report any problems:

root@cartman ~ # fmdump
TIME                 UUID                                 SUNW-MSG-ID
fmdump: /var/fm/fmd/fltlog is empty

Shouldn't this show a failed disk?
fmdump -e shows tons of bad stuff:

Jan 08 07:46:32.9467 ereport.fs.zfs.probe_failure
Jan 08 07:46:36.2015 ereport.fs.zfs.io
[...]
Jan 08 07:51:05.1865 ereport.fs.zfs.io

None of that results in a fault diagnosis?

Mostly I'd like to verify my hot spare is working correctly. Given the
spare status is "degraded", the read errors on the spare device, and the
lack of successful resilver status output, it seems like the spare might
not have been added successfully.

Thanks for any input you might provide...

-- 
Paul B. Henson  |  (909) 979-6361  |  csupomona.edu/~henson
Operating Systems and Network Analyst  |  henson@csupomona.edu
California State Polytechnic University  |  Pomona CA 91768
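(For reference, the verification being attempted above comes down to a
handful of stock Solaris 10 commands. A minimal sketch only -- the pool
name "export" is taken from the output above, and the exact wording of
the output varies between releases:)

    # Terse health check -- prints nothing if all pools are healthy
    zpool status -x

    # Full detail: look for the spare vdev, "INUSE", and the scrub/resilver line
    zpool status -v export

    # Diagnosed faults, as opposed to the raw error telemetry from fmdump -e
    fmadm faulty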
Eric Schrock
2010-Jan-09 21:09 UTC
[zfs-discuss] x4500 failed disk, not sure if hot spare took over correctly
On Jan 9, 2010, at 9:45 AM, Paul B. Henson wrote:

> If ZFS removed the drive from the pool, why does the system keep
> complaining about it?

It's not failing in the sense that it's returning I/O errors, but it's
flaky, so it's attaching and detaching. Most likely it decided to attach
again and then you got transport errors.

> Is fault management stuff still poking at it?

No.

> Is the pool/mirror/spare still supposed to show up as degraded after the
> hot spare is deployed?

Yes.

> There are 18.9K checksum errors on the disk that failed, but there are
> also 18.9K read errors on the hot spare?

This is a bug recently fixed in OpenSolaris.

> The last scrub status shows that scrub completing successfully. What
> happened to the resilver status?

If there was a scrub it will show as the last thing completed.

> How can I tell if the resilver was successful?

If the scrub was successful.

> Did the resilver start and complete while the scrub was still running
> and its status output was lost?

No, only one can be active at any time.

> Is there any way to see the status of past scrubs/resilvers, or is only
> the most recent one available?

Only the most recent one.

> None of that results in a fault diagnosis?

When the device is in the process of going away, no. From the OS
perspective this disk was physically removed from the system.

> Mostly I'd like to verify my hot spare is working correctly. Given the
> spare status is "degraded", the read errors on the spare device, and the
> lack of successful resilver status output, it seems like the spare might
> not have been added successfully.

No, it's fine. DEGRADED just means the pool is not operating at the ideal
state. By definition a hot spare is always DEGRADED. As long as the spare
itself is ONLINE it's fine.

Hope that helps,

- Eric

-- 
Eric Schrock, Fishworks                        blogs.sun.com/eschrock
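(If in doubt, the simplest way to convince yourself the spare holds a good
copy of everything is to run another scrub and let it finish cleanly. A
sketch, using the pool name from this thread:)

    # Start a fresh scrub; a clean completion means every block on the
    # spare was read back and passed its checksum.
    zpool scrub export

    # Check progress/result; repeat until it reports
    # "scrub completed ... with 0 errors"
    zpool status export | grep scrub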
Ian Collins
2010-Jan-09 22:04 UTC
[zfs-discuss] x4500 failed disk, not sure if hot spare took over correctly
Paul B. Henson wrote:

> We just had our first x4500 disk failure (which of course had to happen
> late Friday night <sigh>), I've opened a ticket on it but don't expect a
> response until Monday so was hoping to verify the hot spare took over
> correctly and we still have redundancy pending device replacement.
>
> This is an S10U6 box:
>
> Here's the zpool status output:
>
>   pool: export
>  state: DEGRADED
> [...]
>  scrub: scrub completed after 0h6m with 0 errors on Fri Jan  8 23:21:31 2010
>
>         NAME          STATE     READ WRITE CKSUM
>         export        DEGRADED     0     0     0
>           mirror      DEGRADED     0     0     0
>             c0t2d0    ONLINE       0     0     0
>             spare     DEGRADED 18.9K     0     0
>               c1t2d0  REMOVED      0     0     0
>               c5t0d0  ONLINE       0     0 18.9K
>
>         spares
>           c5t0d0      INUSE     currently in use
>
> Is the pool/mirror/spare still supposed to show up as degraded after the
> hot spare is deployed?

Yes, the spare will show as degraded until you replace it.

I had a pool on a 4500 that lost one drive, then swapped out 3 more due to
brain farts from that naff Marvell driver. It was a bit of a concern for a
while seeing two degraded devices in one raidz vdev!

> The scrub started at 11pm last night, the disk got booted at 11:15pm,
> presumably the scrub came across the failures the OS had been reporting.
> The last scrub status shows that scrub completing successfully. What
> happened to the resilver status? How can I tell if the resilver was
> successful? Did the resilver start and complete while the scrub was still
> running and its status output was lost? Is there any way to see the status
> of past scrubs/resilvers, or is only the most recent one available?

You only see the last one, but a resilver is a scrub.

> Mostly I'd like to verify my hot spare is working correctly. Given the
> spare status is "degraded", the read errors on the spare device, and the
> lack of successful resilver status output, it seems like the spare might
> not have been added successfully.

It has - "scrub completed after 0h6m with 0 errors".

-- 
Ian.
Paul B. Henson
2010-Jan-09 23:17 UTC
[zfs-discuss] x4500 failed disk, not sure if hot spare took over correctly
On Sat, 9 Jan 2010, Eric Schrock wrote:

> > If ZFS removed the drive from the pool, why does the system keep
> > complaining about it?
>
> It's not failing in the sense that it's returning I/O errors, but it's
> flaky, so it's attaching and detaching. Most likely it decided to attach
> again and then you got transport errors.

Ok, how do I make it stop logging messages about the drive until it is
replaced? It's still filling up the logs with the same errors about the
drive being offline.

Looks like hdadm isn't it:

root@cartman ~ # hdadm offline disk c1t2d0
/usr/bin/hdadm[1762]: /dev/rdsk/c1t2d0d0p0: cannot open
/dev/rdsk/c1t2d0d0p0 is not available

Hmm, I was able to unconfigure it with cfgadm:

root@cartman ~ # cfgadm -c unconfigure sata1/2::dsk/c1t2d0

It went from:

sata1/2::dsk/c1t2d0    disk    connected    configured    failed

to:

sata1/2                disk    connected    unconfigured  failed

Hopefully that will stop the errors until it's replaced and not break
anything else :).

> No, it's fine. DEGRADED just means the pool is not operating at the
> ideal state. By definition a hot spare is always DEGRADED. As long as
> the spare itself is ONLINE it's fine.

The spare shows as "INUSE", but I'm guessing that's fine too.

> Hope that helps

That was perfect, thank you very much for the review. Now I can not worry
about it until Monday :).

-- 
Paul B. Henson  |  (909) 979-6361  |  csupomona.edu/~henson
Operating Systems and Network Analyst  |  henson@csupomona.edu
California State Polytechnic University  |  Pomona CA 91768
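(A trivial way to confirm the unconfigure actually quieted things down is
to watch the system log for further complaints from that disk. A sketch,
using the sd26 instance name from the earlier messages:)

    # Silence here means the transport errors have stopped
    tail -f /var/adm/messages | grep sd26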
Cindy Swearingen
2010-Jan-11 16:56 UTC
[zfs-discuss] x4500 failed disk, not sure if hot spare took over correctly
Hi Paul,

Example 11-1 in this section describes how to replace a disk on an x4500
system:

docs.sun.com/app/docs/doc/819-5461/gbcet?a=view

Cindy

On 01/09/10 16:17, Paul B. Henson wrote:
> On Sat, 9 Jan 2010, Eric Schrock wrote:
>
>>> If ZFS removed the drive from the pool, why does the system keep
>>> complaining about it?
>>
>> It's not failing in the sense that it's returning I/O errors, but it's
>> flaky, so it's attaching and detaching. Most likely it decided to attach
>> again and then you got transport errors.
>
> Ok, how do I make it stop logging messages about the drive until it is
> replaced? It's still filling up the logs with the same errors about the
> drive being offline.
>
> Looks like hdadm isn't it:
>
> root@cartman ~ # hdadm offline disk c1t2d0
> /usr/bin/hdadm[1762]: /dev/rdsk/c1t2d0d0p0: cannot open
> /dev/rdsk/c1t2d0d0p0 is not available
>
> Hmm, I was able to unconfigure it with cfgadm:
>
> root@cartman ~ # cfgadm -c unconfigure sata1/2::dsk/c1t2d0
>
> It went from:
>
> sata1/2::dsk/c1t2d0    disk    connected    configured    failed
>
> to:
>
> sata1/2                disk    connected    unconfigured  failed
>
> Hopefully that will stop the errors until it's replaced and not break
> anything else :).
>
>> No, it's fine. DEGRADED just means the pool is not operating at the
>> ideal state. By definition a hot spare is always DEGRADED. As long as
>> the spare itself is ONLINE it's fine.
>
> The spare shows as "INUSE", but I'm guessing that's fine too.
>
>> Hope that helps
>
> That was perfect, thank you very much for the review. Now I can not worry
> about it until Monday :).
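(For readers without access to that document: replacement on this kind of
setup generally comes down to the sequence below once the new drive is
physically installed. This is only a sketch using the device names from
this thread, not a substitute for the documented example; on S10U6 the hot
spare should return to the available list on its own after the resilver,
but it can also be detached manually:)

    # Bring the SATA port back online with the new disk in place
    cfgadm -c configure sata1/2

    # Resilver onto the replacement disk in the same slot
    zpool replace export c1t2d0

    # Watch the resilver; once it completes, c5t0d0 should drop back to
    # the spares list (or detach it by hand if it does not)
    zpool status export
    # zpool detach export c5t0d0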
Paul B. Henson
2010-Jan-12 01:42 UTC
[zfs-discuss] x4500 failed disk, not sure if hot spare took over correctly
On Sat, 9 Jan 2010, Eric Schrock wrote:

> No, it's fine. DEGRADED just means the pool is not operating at the
> ideal state. By definition a hot spare is always DEGRADED. As long as
> the spare itself is ONLINE it's fine.

One more question on this: so there's no way to tell, just from the
status, the difference between a pool degraded due to a disk failure but
still fully redundant thanks to a hot spare, and a pool degraded due to a
disk failure that has lost redundancy? I guess you can review the pool
details for the specifics, but for large pools it seems it would be
valuable to be able to quickly distinguish these states from the short
status.

Thanks...

-- 
Paul B. Henson  |  (909) 979-6361  |  csupomona.edu/~henson
Operating Systems and Network Analyst  |  henson@csupomona.edu
California State Polytechnic University  |  Pomona CA 91768
Eric Schrock
2010-Jan-12 01:48 UTC
[zfs-discuss] x4500 failed disk, not sure if hot spare took over correctly
On 01/11/10 17:42, Paul B. Henson wrote:
> On Sat, 9 Jan 2010, Eric Schrock wrote:
>
>> No, it's fine. DEGRADED just means the pool is not operating at the
>> ideal state. By definition a hot spare is always DEGRADED. As long as
>> the spare itself is ONLINE it's fine.
>
> One more question on this: so there's no way to tell, just from the
> status, the difference between a pool degraded due to a disk failure but
> still fully redundant thanks to a hot spare, and a pool degraded due to a
> disk failure that has lost redundancy? I guess you can review the pool
> details for the specifics, but for large pools it seems it would be
> valuable to be able to quickly distinguish these states from the short
> status.

No, there is no way to tell if a pool has DTL (dirty time log) entries.

- Eric

-- 
Eric Schrock, Fishworks                        blogs.sun.com/eschrock
Paul B. Henson
2010-Jan-12 02:35 UTC
[zfs-discuss] x4500 failed disk, not sure if hot spare took over correctly
On Mon, 11 Jan 2010, Eric Schrock wrote:

> No, there is no way to tell if a pool has DTL (dirty time log) entries.

Hmm, I hadn't heard that term before, but based on a quick search I take
it that's the list of data in the pool that is not fully redundant? So if
a 2-way mirror vdev lost a half, everything written after the loss would
be on the DTL, and if the same device came back, recovery would entail
just running through the DTL and writing out what it missed? Although
presumably if the failed device were replaced with another device
entirely, all of the data would need to be written out.

I'm not quite sure that answered my question. My original question was,
for example: given a 2-way mirror, one half fails. There is a hot spare
available, which is pulled in, and while the pool isn't optimal, it does
have the same number of devices that it's supposed to. On the other hand,
the same mirror loses a device, there's no hot spare, and the pool is
short one device. My understanding is that in both scenarios the pool
status would be "DEGRADED", but it seems there's an important difference.
In the first case, another device could fail and the pool would still be
ok. In the second, another device failing would result in complete loss
of data.

While you can tell the difference between these two states by looking at
the detailed output and seeing whether a hot spare is in use, I was just
saying that it would be nice for the short status to have some distinction
between "device failed, hot spare in use" and "device failed, keep fingers
crossed" ;).

Back to your answer: if the existence of DTL entries means the pool
doesn't have full redundancy for some data, and you can't tell if a pool
has DTL entries, are you saying there's no way to tell if the current
state of your pool could survive a device failure? If a resilver
successfully completes, barring another device failure, doesn't that mean
the pool is restored to full redundancy? I feel like I must be
misunderstanding something :(.

Thanks...

-- 
Paul B. Henson  |  (909) 979-6361  |  csupomona.edu/~henson
Operating Systems and Network Analyst  |  henson@csupomona.edu
California State Polytechnic University  |  Pomona CA 91768
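(For what it's worth, a crude approximation of that distinction is
possible today by scraping zpool output. A sketch only -- the text format
of zpool status is not a stable interface, and matching "INUSE" is a
heuristic, not a real redundancy check:)

    #!/bin/sh
    # Flag DEGRADED pools and note whether a hot spare appears to be
    # covering the failure.
    for pool in `zpool list -H -o name`; do
        health=`zpool list -H -o health $pool`
        [ "$health" = "DEGRADED" ] || continue
        if zpool status $pool | grep INUSE > /dev/null; then
            echo "$pool: DEGRADED, but a hot spare is in use"
        else
            echo "$pool: DEGRADED with no spare in use -- redundancy may be reduced"
        fi
    done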
Eric Schrock
2010-Jan-12 03:28 UTC
[zfs-discuss] x4500 failed disk, not sure if hot spare took over correctly
On Jan 11, 2010, at 6:35 PM, Paul B. Henson wrote:

> On Mon, 11 Jan 2010, Eric Schrock wrote:
>
>> No, there is no way to tell if a pool has DTL (dirty time log) entries.
>
> Hmm, I hadn't heard that term before, but based on a quick search I take
> it that's the list of data in the pool that is not fully redundant? So if
> a 2-way mirror vdev lost a half, everything written after the loss would
> be on the DTL, and if the same device came back, recovery would entail
> just running through the DTL and writing out what it missed? Although
> presumably if the failed device were replaced with another device
> entirely, all of the data would need to be written out.
>
> I'm not quite sure that answered my question. My original question was,
> for example: given a 2-way mirror, one half fails. There is a hot spare
> available, which is pulled in, and while the pool isn't optimal, it does
> have the same number of devices that it's supposed to. On the other hand,
> the same mirror loses a device, there's no hot spare, and the pool is
> short one device. My understanding is that in both scenarios the pool
> status would be "DEGRADED", but it seems there's an important difference.
> In the first case, another device could fail and the pool would still be
> ok. In the second, another device failing would result in complete loss
> of data.
>
> While you can tell the difference between these two states by looking at
> the detailed output and seeing whether a hot spare is in use, I was just
> saying that it would be nice for the short status to have some
> distinction between "device failed, hot spare in use" and "device failed,
> keep fingers crossed" ;).
>
> Back to your answer: if the existence of DTL entries means the pool
> doesn't have full redundancy for some data, and you can't tell if a pool
> has DTL entries, are you saying there's no way to tell if the current
> state of your pool could survive a device failure? If a resilver
> successfully completes, barring another device failure, doesn't that mean
> the pool is restored to full redundancy? I feel like I must be
> misunderstanding something :(.

DTLs are a more specific answer to your question. A DTL entry implies that
a toplevel vdev has a known period of time for which there is invalid data
on it or one of its children. This may be because a device failed and is
accumulating DTL time, because a new replacing or spare vdev was attached,
or because a device was unplugged and then plugged back in. Your example
(hot spares) is but one of the ways in which this can happen, but in any
of these cases it implies that data is not fully replicated.

There is obviously a way to detect this in the kernel; it's simply not
exported to userland in any useful way. The reason I focused on DTLs is
that if any mechanism were provided to distinguish a pool lacking full
redundancy, it would be based on DTLs - nothing else makes sense.

- Eric

> Thanks...
>
> -- 
> Paul B. Henson  |  (909) 979-6361  |  csupomona.edu/~henson
> Operating Systems and Network Analyst  |  henson@csupomona.edu
> California State Polytechnic University  |  Pomona CA 91768

-- 
Eric Schrock, Fishworks                        blogs.sun.com/eschrock