Chris Murray
2008-Aug-25 19:02 UTC
[zfs-discuss] Unable to import zpool since system hang during zfs destroy
Hi all,
I have a RAID-Z zpool made up of 4 x SATA drives running on Nexenta 1.0.1
(OpenSolaris b85 kernel). It holds some ZFS filesystems and a few volumes that
are shared to various Windows boxes over iSCSI. On one particular iSCSI volume,
I discovered that I had mistakenly deleted some files from the FAT32 partition
that is on it. The files were still in a ZFS snapshot that was made earlier in
the morning so I made use of the ZFS clone command to create a separate copy of
the volume. I accessed it in Windows, got the files I needed, and then proceeded
to delete it using "zfs destroy". During the process, disk activity
stopped, my SSH windows stopped responding and Windows lost all iSCSI
connections, reporting delayed write failed for the volumes that disappeared.
I powered down the Nexenta box and started it back up, where it hung with the
following output:
SunOS Release 5.11 Version NexentaOS_20080312 64-bit
Loading Nexenta...
Hostname: mammoth
This is before the usual "Reading ZFS config: done" and "Mounting
ZFS filesystems" indicators. The only way I could bring the system up was to
disconnect all four SATA drives before power-on. I can then export the zpool,
reboot, and the system comes up without complaint. However, of course, the pool
isn't imported. When I execute "zpool import", the pool is
detected fine:
  pool: zp
    id: 2070286287887108251
 state: ONLINE
action: The pool can be imported using its name or numeric identifier.
config:

        zp          ONLINE
          raidz1    ONLINE
            c0t1d0  ONLINE
            c0t0d0  ONLINE
            c0t3d0  ONLINE
            c0t2d0  ONLINE
The next issue is that when the pool is actually imported ("zpool import -f
zp"), it too hangs the whole system, albeit after a minute or so of disk
activity. A "zpool iostat zp 10" during that time is below:
               capacity     operations    bandwidth
pool         used  avail   read  write   read  write
----------  -----  -----  -----  -----  -----  -----
zp          1.73T  1018G  1.13K      7  4.43M  23.6K
zp          1.73T  1018G  1.05K      0  4.07M      0
zp          1.73T  1018G  1.15K      0  4.88M      0
zp          1.73T  1018G    457      0  1.36M      0
zp          1.73T  1018G    668      0  2.49M      0
zp          1.73T  1018G    411      0  1.80M      0
[system stopped at this point and wouldn't accept keypresses any more]
I'm lost as to what to do - every time the pool is imported, it briefly
turns up in "zpool status", but will then hang the system to the
extent that I must power off, disconnect drives, power up, zpool export, and
reboot, just to be able to start typing commands again!!
So far I've tried:
1. Rebooting with only one of the SATA drives attached at a time. All four times
the OS came up fine, but of course "zpool status" reported the pool as
having insufficient replicas. I don't know whether powering up with two
or three drives will work; I didn't want to try any permutations in
case I made things worse.
2. Checking with "fmdump -e": the only output relating to ZFS concerns
missing vdevs and is presumably from when I was rebooting with drives
disconnected.
3. "dd if=/dev/rdsk/c0t0d0 of=/dev/null bs=1048576" and the
equivalents for the other three drives are all currently running and I await the
results. Given that a scrub takes about 7 hours, I expect I'll have to
leave this overnight.
4. "zdb -e zp" is now at the stage of "Traversing all blocks to
verify checksums and verify nothing leaked ...". I expect this will also
take some time.
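The read test in step 3 can be sketched as a loop over the four pool members. This is only an illustration: the device names are taken from the "zpool import" listing, the dd invocation is the one from step 3, and the DRY_RUN switch and read_test_cmd helper are assumptions added here so the commands can be previewed before touching the disks.

```shell
#!/bin/sh
# Sketch: read every pool member end to end and discard the data, so that
# any unreadable sectors show up as dd errors.
# DISKS is taken from the "zpool import" listing in this post.
# DRY_RUN and read_test_cmd are hypothetical helpers, not part of any tool.
DISKS="c0t0d0 c0t1d0 c0t2d0 c0t3d0"
DRY_RUN="${DRY_RUN:-1}"   # default: only print the commands

read_test_cmd() {
    # Print the dd command line for one raw disk device.
    echo "dd if=/dev/rdsk/$1 of=/dev/null bs=1048576"
}

for d in $DISKS; do
    cmd=$(read_test_cmd "$d")
    if [ "$DRY_RUN" = "1" ]; then
        # Preview only.
        echo "$cmd"
    else
        # Run the read test; report which disk failed, if any.
        $cmd || echo "read test failed on $d" >&2
    fi
done
```

Run with DRY_RUN=0 to actually perform the reads. The disks are read one after another here; the original post ran all four at the same time.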
While I wait for the results from "dd" and "zdb", is there
anything else I can try in order to get the pool up and running again?
I have spotted some previous, similar posts regarding hanging, notably this one:
http://opensolaris.org/jive/thread.jspa?threadID=70205&tstart=15
Unfortunately, I am a bit of a Nexenta/OpenSolaris/Unix newbie so a lot of that
is way over my head, and when the system completely hangs, I have no choice but
to power off. Any help is much appreciated!
Thanks,
Chris
This message posted from opensolaris.org
Miles Nordin
2008-Aug-25 19:23 UTC
[zfs-discuss] Unable to import zpool since system hang during zfs destroy
>>>>> "cm" == Chris Murray <chrismurray84 at gmail.com> writes:

    cm> The next issue is that when the pool is actually imported
    cm> ("zpool import -f zp"), it too hangs the whole system, albeit
    cm> after a minute or so of disk activity.

could it be #6573681?

http://www.opensolaris.org/jive/message.jspa?messageID=261936&tstart=0#261936
Victor Latushkin
2008-Aug-25 20:19 UTC
[zfs-discuss] Unable to import zpool since system hang during zfs destroy
Miles Nordin wrote:
>>>>>> "cm" == Chris Murray <chrismurray84 at gmail.com> writes:
>
>     cm> The next issue is that when the pool is actually imported
>     cm> ("zpool import -f zp"), it too hangs the whole system, albeit
>     cm> after a minute or so of disk activity.
>
> could it be #6573681?
>
> http://www.opensolaris.org/jive/message.jspa?messageID=261936&tstart=0#261936

It is pretty likely; at least the visible symptoms are very similar. This bug is
fixed in Solaris Nevada build 94. Any chance you could try that?

Victor
Chris Murray
2008-Aug-25 20:43 UTC
[zfs-discuss] Unable to import zpool since system hang during zfs destroy
Ah-ha! That certainly looks like the same issue, Miles - well spotted!

As it happens, the "zdb" command failed with "out of memory -- generating core
dump", whereas all four dd's completed successfully.

I'm downloading snv96 right now - I'll install it in the morning and post my
results both here and in the thread you mention. If this works, I may go back
to OpenSolaris - I've been unable to use Nexenta as an iSCSI target for ESXi
3.5 because of the b85 kernel, so this upgrade to b96 may kill two birds with
one stone.
Chris Murray
2008-Aug-25 22:40 UTC
[zfs-discuss] Unable to import zpool since system hang during zfs destroy
That's a good point - I'll try snv94 if I can get my hands on it - any idea
where the download for it is? I've been going round in circles and all I can
come up with are the variants of snv96 - CD, DVD (2 images), DVD (single
image). Maybe that's a sign I should give up for the night!

Chris
Chris Murray
2008-Aug-26 10:47 UTC
[zfs-discuss] Unable to import zpool since system hang during zfs destroy
OK, I used the development 2008.11 (b95) live CD earlier this morning to import
the pool, and it worked fine. I then rebooted back into Nexenta and all is well.

Many thanks for the help, guys!

Chris
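For anyone hitting the same hang, the recovery described in this thread can be sketched roughly as below, run from a live CD whose kernel includes the fix (build 94 or later, per Victor). The pool name zp is from the thread; the run helper, the DRY_RUN switch, and the final export step are assumptions rather than something the posts confirm.

```shell
#!/bin/sh
# Sketch of the recovery path, run from a live CD with a newer kernel.
# POOL comes from the thread; DRY_RUN and run() are hypothetical helpers.
POOL="zp"
DRY_RUN="${DRY_RUN:-1}"   # default: only print the commands

run() {
    # Echo the command in dry-run mode, otherwise execute it.
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

recover_pool() {
    # Forced import, as in the original post ("zpool import -f zp").
    run zpool import -f "$POOL" || return 1
    # Confirm the pool and its devices report ONLINE.
    run zpool status "$POOL" || return 1
    # Assumption: export cleanly before rebooting into the original OS.
    run zpool export "$POOL"
}

recover_pool
```

With the default DRY_RUN=1 the script only prints the three zpool commands; set DRY_RUN=0 on the live CD to execute them.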