OK, so this is another "my pool got eaten" problem. Our setup:
Nevada 77 when it happened, now running 87.
9 iSCSI vdevs exported from Linux boxes off of hardware RAID (running Linux for
drivers on the RAID controllers). The pool itself is simply striped.
Our problem:
Power got yanked to 8 of the 9 vdevs. At the time, we had ZIL disabled and
write-back caching enabled on the vdevs for performance reasons. The ZIL *was*
going to be re-enabled, but Murphy's Law says things crash beforehand.
On attempting to bring the system back up after a reboot, all the vdevs and the
pool itself are marked FAULTED with corrupted data.
What we've attempted:
Since last Thursday (today is the following Wednesday), we've tried
zpool import -F with this weekend's nightly build, to no avail.
In addition, I've been placing dtrace probes in the kernel to see
where it's dying and how, to determine whether it's a "turn
off sanity checks and mount r/o" issue, or whether our data
is hopelessly munged. This attempt has turned into a bit of a goose chase, with
possibilities popping up and failure modes branching quicker than I can take a
close look at them.
My partner here is working on the possibility of an offline file-grabbing
program, which shows some progress, but not much yet.
Our biggest problem is that neither of us is experienced in kernel-land debugging
or filesystems, and I at least am rather inexperienced with the debugging power
tools available on Solaris, such as mdb, and with uses of dtrace beyond looking
at function return values and entry arguments.
Is there someone who has a bit more experience with this who can help us?
-- Matt
This message posted from opensolaris.org