Chris Murray
2008-Aug-25 19:02 UTC
[zfs-discuss] Unable to import zpool since system hang during zfs destroy
Hi all, I have a RAID-Z zpool made up of 4 x SATA drives running on Nexenta 1.0.1 (OpenSolaris b85 kernel). It holds some ZFS filesystems and a few volumes that are shared to various Windows boxes over iSCSI. On one particular iSCSI volume, I discovered that I had mistakenly deleted some files from the FAT32 partition on it. The files were still in a ZFS snapshot taken earlier that morning, so I used the "zfs clone" command to create a separate copy of the volume. I accessed it in Windows, got the files I needed, and then proceeded to delete the clone using "zfs destroy". During the process, disk activity stopped, my SSH windows stopped responding, and Windows lost all iSCSI connections, reporting "delayed write failed" for the volumes that disappeared.

I powered down the Nexenta box and started it back up, where it hung with the following output:

SunOS Release 5.11 Version NexentaOS_20080312 64-bit
Loading Nexenta...
Hostname: mammoth

This is before the usual "Reading ZFS config: done" and "Mounting ZFS filesystems" indicators. The only way I could bring the system up was to disconnect all four SATA drives before power-on. I can then export the zpool, reboot, and the system comes up without complaint. However, of course, the pool isn't imported. When I execute "zpool import", the pool is detected fine:

  pool: zp
    id: 2070286287887108251
 state: ONLINE
action: The pool can be imported using its name or numeric identifier.
config:

        zp          ONLINE
          raidz1    ONLINE
            c0t1d0  ONLINE
            c0t0d0  ONLINE
            c0t3d0  ONLINE
            c0t2d0  ONLINE

The next issue is that when the pool is actually imported ("zpool import -f zp"), it too hangs the whole system, albeit after a minute or so of disk activity.
A "zpool iostat zp 10" during that time is below:

               capacity     operations    bandwidth
pool         used  avail   read  write   read  write
----------  -----  -----  -----  -----  -----  -----
zp          1.73T  1018G  1.13K      7  4.43M  23.6K
zp          1.73T  1018G  1.05K      0  4.07M      0
zp          1.73T  1018G  1.15K      0  4.88M      0
zp          1.73T  1018G    457      0  1.36M      0
zp          1.73T  1018G    668      0  2.49M      0
zp          1.73T  1018G    411      0  1.80M      0

[system stopped at this point and wouldn't accept keypresses any more]

I'm lost as to what to do. Every time the pool is imported, it briefly turns up in "zpool status", but then hangs the system to the extent that I must power off, disconnect the drives, power up, run "zpool export", and reboot, just to be able to start typing commands again!

So far I've tried:

1. Rebooting with only one of the SATA drives attached at a time. All four times the OS came up fine, but of course "zpool status" reported the pool as having insufficient replicas. I don't know whether powering up with two or three drives will work; I didn't want to try any permutations in case I made things worse.
2. Checking with "fmdump -e": the only output relating to ZFS concerns missing vdevs, and is presumably from when I have been rebooting with drives disconnected.
3. "dd if=/dev/rdsk/c0t0d0 of=/dev/null bs=1048576" and the equivalents for the other three drives are all currently running and I await the results. Given that a scrub takes about 7 hours, I expect I'll have to leave this overnight.
4. "zdb -e zp" is now at the stage of "Traversing all blocks to verify checksums and verify nothing leaked ...". I expect this will also take some time.

While I wait for the results from "dd" and "zdb", is there anything else I can try in order to get the pool up and running again?
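Step 3 reads each disk end to end with dd. As a minimal sketch, assuming a POSIX sh, the four reads can be started in parallel with a per-disk pass/fail report. The function name read_test is hypothetical (not a ZFS or Solaris tool), and the device paths come from the post; on other systems the raw device may need a slice or partition suffix such as s0 or p0. A clean read only shows that each disk returns data without I/O errors; it does not verify ZFS checksums, which is what a scrub or "zdb" does.

```shell
# read_test DEVICE...
# Hypothetical helper: starts one dd per device in the background, then
# waits for each in argument order and prints "OK" or "FAILED".
# Returns 1 if any read failed, 0 otherwise.
read_test() {
    pids=""
    for dev in "$@"; do
        # Read the whole device in 1 MiB blocks, discarding the data;
        # dd's statistics and error messages are also discarded.
        dd if="$dev" of=/dev/null bs=1048576 2>/dev/null &
        pids="$pids $!"
    done

    status=0
    for dev in "$@"; do
        # Pop the first PID off the space-separated list; PIDs were
        # recorded in the same order as the device arguments.
        pids=${pids# }
        pid=${pids%% *}
        pids=${pids#"$pid"}
        # wait returns the exit status of that particular dd.
        if wait "$pid"; then
            echo "OK     $dev"
        else
            echo "FAILED $dev"
            status=1
        fi
    done
    return "$status"
}
```

For the pool in this thread, the call would be `read_test /dev/rdsk/c0t0d0 /dev/rdsk/c0t1d0 /dev/rdsk/c0t2d0 /dev/rdsk/c0t3d0`. A FAILED line (or a non-zero return) points at the disk or cable to look at first.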
I have spotted some previous, similar posts regarding hanging, notably this one: http://opensolaris.org/jive/thread.jspa?threadID=70205&tstart=15. Unfortunately, I am a bit of a Nexenta/OpenSolaris/Unix newbie, so a lot of that is way over my head, and when the system completely hangs, I have no choice but to power off. Any help is much appreciated!

Thanks,
Chris

This message posted from opensolaris.org
Miles Nordin
2008-Aug-25 19:23 UTC
>>>>> "cm" == Chris Murray <chrismurray84 at gmail.com> writes:

    cm> The next issue is that when the pool is actually imported
    cm> ("zpool import -f zp"), it too hangs the whole system, albeit
    cm> after a minute or so of disk activity.

could it be #6573681?

http://www.opensolaris.org/jive/message.jspa?messageID=261936&tstart=0#261936
Victor Latushkin
2008-Aug-25 20:19 UTC
Miles Nordin wrote:
>>>>>> "cm" == Chris Murray <chrismurray84 at gmail.com> writes:
>
>     cm> The next issue is that when the pool is actually imported
>     cm> ("zpool import -f zp"), it too hangs the whole system, albeit
>     cm> after a minute or so of disk activity.
>
> could it be #6573681?
>
> http://www.opensolaris.org/jive/message.jspa?messageID=261936&tstart=0#261936

It is pretty likely; at least the visible symptoms are very similar. This bug is fixed in Solaris Nevada build 94. Any chance you could try that?

Victor
Chris Murray
2008-Aug-25 20:43 UTC
Ah-ha! That certainly looks like the same issue, Miles - well spotted! As it happens, the "zdb" command failed with "out of memory -- generating core dump", whereas all four dd's completed successfully. I'm downloading snv96 right now; I'll install it in the morning and post my results both here and in the thread you mention. If this works, I may move back to OpenSolaris. I've been unable to use Nexenta as an iSCSI target for ESXi 3.5 because of the b85 kernel, so this upgrade to b96 may kill two birds with one stone.
Chris Murray
2008-Aug-25 22:40 UTC
That's a good point - I'll try snv94 if I can get my hands on it. Any idea where the download for it is? I've been going round in circles, and all I can find are the variants of snv96: CD, DVD (2 images), and DVD (single image). Maybe that's a sign I should give up for the night!

Chris
Chris Murray
2008-Aug-26 10:47 UTC
OK, I used the development 2008.11 (b95) LiveCD earlier this morning to import the pool, and it worked fine. I then rebooted back into Nexenta and all is well. Many thanks for the help, guys!

Chris