Hi,
Can anyone identify whether this is a known issue (perhaps 6667208) and
if the fix is going to be pushed out to Solaris 10 anytime soon? I'm
getting badly beaten up over this weekly, essentially anytime we drop a
packet between our twenty-odd iscsi-backed zones and the filer.
Chris was kind enough to provide his synopsis here (thanks Chris):
http://utcc.utoronto.ca/~cks/space/blog/solaris/ZFSFailmodeProblem
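(For anyone coming to this cold: the knob Chris's post discusses is the
pool-level failmode property. Something like the following is how you'd
flip it, with 'tank' standing in for the real pool name; per Chris's
notes quoted below, neither setting actually lets the pool recover:

    zpool get failmode tank            # defaults to 'wait'
    zpool set failmode=continue tank
)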
Also, I really need a workaround for the meantime. Is someone out
there handy enough with the undocumented stuff to recommend a zdb
command or something that will pound the delinquent pool into submission
without crashing everything? Surely there's a pool hard-reset command
somewhere for the QA guys, right?
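To be concrete, this is roughly the sequence I'm hoping can be made to
work (the pool name 'tank' and the SMF FMRI are placeholders; adjust
for whatever your initiator service is actually called):

    # find the wedged pool
    zpool status -x

    # bounce the iscsi initiator so the LUNs come back
    svcadm restart svc:/network/iscsi_initiator:default

    # then try to get ZFS to let go of the errors and re-import
    zpool clear tank
    zpool export -f tank && zpool import tank

The trouble, per Chris's experience quoted below, is that once the pool
is in this state the zpool commands themselves tend to hang.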
thx
jake
Chris Siebenmann wrote:
> You write:
> | Now I'd asked about this some months ago, but didn't get an answer
> | so forgive me for asking again: What's the difference between wait
> | and continue in my scenario? Will this allow the one faulted pool to
> | fully fail and accept that it's broken, thereby allowing me to frob
> | the iscsi initiator, re-import the pool and restart the zone? [...]
>
> Our experience here in a similar iscsi-based environment is that
> neither 'wait' nor 'continue' will enable the pool to recover, and
> that frequently the entire system will eventually hang in a state
> where no ZFS pools can be used and the system can't even be rebooted
> cleanly.
>
> My primary testing has been on Solaris 10 update 6, and I wrote
> up the results here:
> http://utcc.utoronto.ca/~cks/space/blog/solaris/ZFSFailmodeProblem
>
> I have recently been able to do preliminary testing on Solaris 10
> update 8, and it appears to behave more or less the same.
>
> I wish I had better news for you.
>
> - cks