Follow-up...
The problems I was having were NOT ZFS problems, but a bug in the
sd/ssd (SCSI) driver.
Last night I rebooted the server and about half of the zpools came
up FAULTED. I tried a whole host of things, including checking the ZFS
labels with zdb -l; everything looked good, but ZFS would not let me
use the pools (I had pools where all the devices were ONLINE but all
the vdevs were FAULTED).
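For anyone seeing the same symptom, the checks were roughly the
following (the device name below is just a placeholder for one of the
pool's LUNs, not a real one from this box):

    # zpool status -x                            # list only the pools reporting problems
    # zdb -l /dev/rdsk/c5tXXXXXXXXXXXXXXXXd0s0   # dump the four ZFS labels on one LUN
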
I opened a case with Oracle at about 8:30 PM and got a call back
from Stephen Foster, a support engineer in the kernel group, within
half an hour. He asked me to check a couple more things and quickly
got to the point of recommending a specific version of the sd/ssd
driver patch (specific because of other patch requirements on this box).
After hopping through a couple of small hoops to get the sd/ssd patch
and our IDR patch to play nicely together, the system came up and all
the zpools were as I expected (all but two were ONLINE; the two that
weren't were DEGRADED due to missing devices).
I 'zpool replace'd the last of the (really) failed devices, started
a scrub running on one other suspect pool, and went to bed.
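For completeness, the replace and scrub were just the standard
commands, roughly like this (pool and device names here are
placeholders, not the real ones):

    # zpool replace tank c5tDEADLUNd0 c5tSPARELUNd0   # swap the failed LUN for spare capacity
    # zpool scrub tank2                               # start a scrub on the suspect pool
    # zpool status tank2                              # check scrub / resilver progress
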
The problem the sd/ssd patch fixes is that the sd/ssd driver was not
handling a couple of very specific SCSI commands fast enough for ZFS.
The problem shows itself when importing pools under Solaris 10U9 or
when making other changes to the underlying storage. It is apparently
10U9 specific.
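If you want to see what you already have before calling support, the
loaded sd driver revision and the installed patch list are visible
with the usual Solaris commands (I am not quoting a patch number
here, since the right one depends on the other patches on the box):

    # modinfo | grep sd    # loaded kernel modules; look for the sd/ssd driver revision
    # showrev -p           # installed patches; check the sd/ssd patch level
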
On Fri, May 20, 2011 at 1:12 PM, Paul Kraus <paul at kraus-haus.org> wrote:
> I have run into a more serious and scary situation after our array
> outage yesterday.
>
> As I posted earlier today, I came in this morning and found 9 LUNs
> offline (out of over 120). Not a big deal, as the rest of the array was
> OK (and still is), and the other arrays are fine. Everything is
> mirrored across arrays. I started "zpool replace"ing bad LUNs with
> some excess capacity we have. The first two went fine, the third is
> still resilvering. The fourth, on the other hand, has been a
> nightmare. Here is the current state:
>
>   pool: deadbeef
>  state: UNAVAIL
> status: One or more devices are faulted in response to IO failures.
> action: Make sure the affected devices are connected, then run 'zpool clear'.
>    see: http://www.sun.com/msg/ZFS-8000-HC
>  scrub: resilver in progress for 2h4m, 0.07% done, 3186h1m to go
> config:
>
>         NAME                                          STATE     READ WRITE CKSUM
>         deadbeef                                      UNAVAIL      0     0     0  insufficient replicas
>           mirror-0                                    DEGRADED     0     0     0
>             c5t600C0FF00000000009278536638D9B07d0     ONLINE       0     0     0
>             replacing-1                               DEGRADED     0     0     0
>               c5t600C0FF0000000000922614781B19005d0   UNAVAIL      0     0     0  corrupted data
>               c5t600C0FF00000000009277F7905F6DD05d0   ONLINE       0     0     0  38K resilvered
>           mirror-1                                    UNAVAIL      0     0     0  corrupted data
>             c5t600C0FF0000000000927852FB91AD301d0     ONLINE       0     0     0
>             c5t600C0FF0000000000922614781B19006d0     ONLINE       0     0     0  14K resilvered
>           mirror-2                                    ONLINE       0     0     0
>             c5t600C0FF00000000009277F6FA1A14C06d0     ONLINE       0     0     0  31K resilvered
>             c5t600015D000060200000000000000B361d0     ONLINE       0     0     0
>           mirror-3                                    DEGRADED     0     0     0
>             replacing-0                               DEGRADED     0     0     0
>               c5t600C0FF000000000092261491D9A9F09d0   UNAVAIL      0     0     0  cannot open
>               c5t600015D000060200000000000000B365d0   ONLINE       0     0     0  32.9M resilvered
>             c5t600C0FF00000000009277F7905F6DD02d0     ONLINE       0     0     0  2.50K resilvered
>
> errors: 134 data errors, use '-v' for a list
>
>    Now, of all these UNAVAIL and FAULTED devices only one is actually
> bad: c5t600C0FF000000000092261491D9A9F09d0 is from the RAID set that
> is dead. When the array was cold booted yesterday there was a
> temporary outage of the LUNs from the other two RAID sets as well
> (c5t600C0FF0000000000922614781B19005d0 and
> c5t600C0FF0000000000922614781B19006d0). We have seen this before, and
> usually we just do a 'zpool clear' of the device and a resilver gets
> us back where we need to be.
>
>    This time has been different... I did a 'zpool clear deadbeef
> c5t600C0FF0000000000922614781B19005d0' and the zpool immediately went
> UNAVAIL with c5t600C0FF00000000009278536638D9B07d0 going UNAVAIL. I
> did a 'zpool clear deadbeef c5t600C0FF00000000009278536638D9B07d0' and
> it came right back.
>
>    At that point I confirmed that I could read from both
> c5t600C0FF00000000009278536638D9B07d0 and
> c5t600C0FF0000000000922614781B19005d0 using dd. I also let the
> resilver in progress complete, which it did in about an hour with no
> issues.
>
>    I then did the zpool replace on
> c5t600C0FF000000000092261491D9A9F09d0 in mirror-3 (the really dead
> device) and was rewarded with an UNAVAIL pool again. I cleared a
> number of known good devices and then got the pool back.
>
>    At this point I assumed the ZFS label on
> c5t600C0FF0000000000922614781B19005d0 had somehow gotten corrupted,
> so I tried a zpool replace of it with itself; even with -f it would
> not let me. So I tried replacing it with a different LUN, as you can
> see above. That was when it all went into the crapper and has stayed
> there. zpool clear does not even return (and can't be killed).
> mirror-1 reports UNAVAIL but both halves report ONLINE.
>
>    I am afraid to EXPORT in case it won't IMPORT, but I have also
> started the process to restore from the replicated copy of the data
> from a remote site. After lunch I will probably try an EXPORT /
> IMPORT and see if that gets me anywhere.
>
> NOTE: there are 16 other pools on this server, one of which is
> resilvering, one of which still has bad LUNs I need to replace, and
> the rest are fine. This pool has a capacity of 1.5 TB and is about
> 1.37 TB used; the remaining pool to clean up is 8 TB used out of 9 TB,
> and we really can't afford to have these kinds of problems with that one.
>
--
{--------1---------2---------3---------4---------5---------6---------7---------}
Paul Kraus
-> Senior Systems Architect, Garnet River ( http://www.garnetriver.com/ )
-> Sound Coordinator, Schenectady Light Opera Company ( http://www.sloctheater.org/ )
-> Technical Advisor, RPI Players