Zachary Bedell
2010-Aug-06 02:01 UTC
[zfs-discuss] zpool 'stuck' after failed zvol destroy and reboot
Greetings, all!

I've recently jumped into OpenSolaris after years of using Gentoo for my primary server OS, and I've run into a bit of trouble on my main storage zpool. Reading through the archives, it seems like the symptoms I'm seeing are fairly common, though the causes seem to vary a bit. Also, it seems like many of the issues were fixed at or around snv_99, while I'm running dev-134.

The trouble started when I created and subsequently tried to destroy a 2TB zvol. The zpool hosting it has compression & dedup enabled on the root. I've run into issues previously with zvol destruction taking a long time, so I was expecting this to take a while, but alas it managed to hang and lock the system up tight before it completed.

Immediately after starting the `zfs destroy` process, all I/O to the zpool stopped dead: no NFS, no Xen zvol access, no local POSIX access. Any attempt to run `zpool <anything>` hung indefinitely and couldn't be Ctrl-C'd or kill -9'd. For about two hours there was disk activity on the pool (blinken lights), but then everything stopped. No more lights, no response on the network, and the console's monitor was stuck in powersave with no keyboard or mouse activity able to wake it up. I let it sit that way for another hour or so before giving up and hitting the BRS (big red switch).

Upon reboot, the OS hung at "Reading ZFS Config: -", again with blinken lights. That ran for about an hour, then locked up as above. To get back into the system, I pulled the four Samsung HD203WI drives as well as the two OCZ Vertex SSDs (split between ZIL, L2ARC, and swap for the Linux xVM) and was able to boot up with just the rpool.

Knowing that my system is RAM constrained (4GB, but 1.5GB of that dedicated to a Linux guest under xVM), I thought that perhaps the ZFS ARC was exhausting system memory and ultimately killing the system. Turning to the Evil Tuning Guide, I tried adding "set zfs:zfs_arc_max = 0x20000000" to /etc/system, followed by `bootadm update-archive` and a reboot.
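For reference, that cap works out to 512 MiB. A minimal sketch sanity-checking the arithmetic and restating the tuning steps (the 0x20000000 value is this poster's choice for a 4GB box, not a general recommendation):

```shell
# The ARC cap from the post, sanity-checked: 0x20000000 bytes = 512 MiB.
arc_max=0x20000000
printf 'zfs_arc_max = %d bytes (%d MiB)\n' $(( arc_max )) $(( arc_max / 1024 / 1024 ))
# -> zfs_arc_max = 536870912 bytes (512 MiB)

# The corresponding /etc/system line, followed by a boot-archive rebuild:
#   set zfs:zfs_arc_max = 0x20000000
#   bootadm update-archive && reboot
```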
Once the system rebooted (still without the six devices for the pool), I hot-added the six devices, did `cfgadm -al` followed by `cfgadm -c configure <the devices>`, and had everything connected and spinning again. `zpool status` was of course livid about the state of disrepair the pool was in:

  pool: tank
 state: UNAVAIL
status: One or more devices could not be opened. There are insufficient
        replicas for the pool to continue functioning.
action: Attach the missing device and online it using 'zpool online'.
   see: http://www.sun.com/msg/ZFS-8000-3C
 scrub: none requested
config:

        NAME          STATE     READ WRITE CKSUM
        tank          UNAVAIL      0     0     0  insufficient replicas
          raidz1-0    UNAVAIL      0     0     0  insufficient replicas
            c4t1d0    UNAVAIL      0     0     0  cannot open
            c4t0d0    UNAVAIL      0     0     0  cannot open
            c5t0d0    UNAVAIL      0     0     0  cannot open
            c5t1d0    UNAVAIL      0     0     0  cannot open
        logs
          mirror-1    UNAVAIL      0     0     0  insufficient replicas
            c7t4d0s0  UNAVAIL      0     0     0  cannot open
            c7t5d0s0  UNAVAIL      0     0     0  cannot open

So I crossed my fingers, did a `zpool clear tank`, and... nothing. The command hung indefinitely, but I had blinken lights again, so it was definitely doing *something*. I headed off for $DAY_JOB, fingers still crossed (which makes driving difficult, believe me...).

Through the magic of open WiFi on an otherwise closed corporate network, I was able to VPN back home and keep an eye on things. At this point, `zpool <anything>` still hung immortally (I tried everything short of a stake through the heart, but it wouldn't be killed). Any existing SSH sessions continued to work (accessed via VNC to my desktop where they'd been left open), but any new SSH attempt hung before getting a shell:

    fawn:~ pendor$ ssh pin
    Last login: Fri Aug  6 00:08:30 2010 from fawn.thebedells
    (the end -- nothing beyond this)

Peeking at `ps` showed that each new SSH attempt tried to run /usr/sbin/quota and seemingly died there. I'm assuming that I could probably somehow hack my sshd or PAM config to skip quota, but that's more or less moot.
Curiously, console interactive login to X as well as remote VNC sessions to the server both worked fine and gave me a functional Gnome environment. While this process was ongoing, `zpool iostat` was off the menu, but running plain old `iostat -dex 10` gave:

                          extended device statistics          ---- errors ---
    device   r/s   w/s  kr/s  kw/s wait actv svc_t  %w  %b s/w h/w trn tot
    sd0      0.0   0.0   0.0   0.0  0.0  0.0   0.0   0   0   0   0   0   0
    sd1      0.1   5.6   6.4  30.8  0.0  0.0   0.9   0   0   0   0   0   0
    sd2      0.0   5.6   0.0  30.8  0.0  0.0   0.6   0   0   0   0   0   0
    sd7     69.3   0.0  71.3   0.0  0.0  0.6   8.3   0  55   0   0   0   0
    sd8     69.0   0.0  71.8   0.0  0.0  0.5   7.5   0  49   0   0   0   0
    sd9     67.5   0.0  73.5   0.0  0.0  0.6   8.3   0  52   0   0   0   0
    sd10    68.3   0.0  68.5   0.0  0.0  0.6   8.6   0  55   0   0   0   0
    sd11     0.0   0.0   0.0   0.0  0.0  0.0   0.0   0   0   0   0   0   0
    sd12     0.0   0.0   0.0   0.0  0.0  0.0   0.0   0   0   0   0   0   0
    sd15     0.0   0.0   0.0   0.0  0.0  0.0   0.0   0   0   0   0   0   0
    sd16     0.0   0.0   0.0   0.0  0.0  0.0   0.0   0   0   0   0   0   0
    sd17     0.0   0.0   0.0   0.0  0.0  0.0   0.0   0   0   0   0   0   0
    sd18     0.0   0.0   0.0   0.0  0.0  0.0   0.0   0   0   0   0   0   0

There's a *little* bit of throughput coming from the drives and intermittent writes going to the SSDs. The read rate is pretty much constant (between 70 & 140 per 10-second interval), but the SSD writes only show up sporadically, maybe once or twice a minute.

Alas, about four hours after starting the process this time, everything went dark and the system stopped responding. Upon returning home, I found the blinken lights had ceased, and any attempt to access the console failed to wake the monitor. I pulled the drives & hit the BRS yet again.

At this point, I'm still operating on the assumption that "this too shall pass," and that given enough RAM and patience, the pool might eventually import. Given that a goodly chunk of the server's measly 4GB was consumed by an inaccessible xVM guest, I hacked my grub menu.lst to allow me to boot a non-PV kernel and booted up with the full 4GB at Solaris' disposal. Repeating the above dance with cfgadm and zpool clear has started the import process again, and I anxiously await its outcome.
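With `zpool iostat` hung, a plain `iostat` piped through awk can at least isolate the pool's data disks. A minimal sketch, exercised here against a canned line from the listing above so the pattern can be checked offline (the sd7-sd10 names are specific to this box and are an assumption anywhere else):

```shell
# Filter iostat-style output down to the pool's data disks (sd7..sd10).
sample='sd7 69.3 0.0 71.3 0.0 0.0 0.6 8.3 0 55 0 0 0 0
sd11 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0 0 0 0 0 0'
printf '%s\n' "$sample" | awk '$1 ~ /^sd(7|8|9|10)$/ { print $1 " r/s=" $2 }'
# -> sd7 r/s=69.3

# Live use would be:  iostat -dex 10 | awk '$1 ~ /^sd(7|8|9|10)$/'
```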
That said, I'm hoping someone with a bit more knowledge of ZFS than I (which would be roughly the entire population of the planet) might have some suggestion for a shortcut to get this pool running again, or at the very least some suggestion that might allow me to delete zvols in the future without having to perform major surgery. So far I've been running about 90 minutes with the full 4GB available, and the blinken lights persist.

I'll give what vitals I can on the system in the hope they might help. I'm not sure how much detail I'll be able to give, as I'm not yet even qualified to be a complete Solaris n00b (some parts are missing). That said, I'm as comfortable as can be on Unix-like systems (Linux & Darwin by day) and would love to help diagnose this issue in detail. SSH tunneling, etc. is an option if anyone's up for it. For all my willingness to help, though, the data on this pool is rather on the important side, so I'd rather avoid any debugging likely to lead to data loss if at all possible.

The system is OpenSolaris dev-b134 running on x64: a Supermicro MBD-X7SBE motherboard with an Intel E6400 Wolfdale 2.8GHz dual-core CPU and 4GB of DDR2-667 RAM. The disks are connected to two Supermicro AOC-SAT2-MV8 SATA controllers (staggered across both of them), with the ZIL & L2ARC SSDs on two of the motherboard's built-in SATA ports. Two WD Caviar Blue drives, also on the motherboard SATA, are mirrored to make the rpool. The whole mess is stored in a NORCO RPC-4220 case with 20 hot-swap (allegedly) SATA bays. The top shelf of four bays goes to the MB SATA, while the remaining 16 are staggered across the two MV8s via SFF-8087 -> 4x SATA splitter cables. My zpools are at version 22. The system usually runs as a dom0, but for the time being I've switched to a vanilla kernel so as to give maximal resources to ZFS. As mentioned above, the pool in question has dedup and compression active.
Given my reading over the last several days, I've come to conclude that enabling those was probably a BadIdea(tm), but c'est la vie, at least for the time being. Best case, if anyone can tell me how to escape import heck and get back to a running system, I'll be your friend forever... Failing that, if there were some way to get a PROGRESS report of how the thing is doing, that would at least satisfy my OCD need to know what's going on. If any additional info would help diagnose the issue, just name it. Thanks in advance for any assistance!

Best regards,
Zac Bedell
-- This message posted from opensolaris.org
Richard Jahnel
2010-Aug-06 03:16 UTC
[zfs-discuss] zpool 'stuck' after failed zvol destroy and reboot
Assuming there are no other volumes sharing slices of those disks, why import? Just overwrite the disks with a new pool, using the -f flag during creation. I'm just sayin', since you were destroying the volume anyway, I presume there is no data we are trying to preserve here.
Zachary Bedell
2010-Aug-06 13:23 UTC
[zfs-discuss] zpool 'stuck' after failed zvol destroy and reboot
Alas, the pool in question has a dozen-odd other ZFS datasets that range in importance from "nice to have" to "let's not even think about it". On the bright side, at about 14 hours in, my lights are still blinken. Here's hoping the +RAM and -xVM were the difference.

One related question: in the unfortunate event this locks up again, does anyone know whether the zvol destroy is resumed on a subsequent import attempt, or simply restarted? I don't mind rinse, reboot, repeat if it'll eventually get through the DDT check I suspect is going on. If it's just starting over each time, I'll probably bite the bullet and max my motherboard out at its astounding 8GB RAM capacity.
Richard Jahnel
2010-Aug-06 20:44 UTC
[zfs-discuss] zpool 'stuck' after failed zvol destroy and reboot
For ARC reasons if no other, I would max it out to the 8 GB regardless.
Roy Sigurd Karlsbakk
2010-Aug-07 21:30 UTC
[zfs-discuss] zpool 'stuck' after failed zvol destroy and reboot
----- Original Message -----
> Greetings, all!
>
> I've recently jumped into OpenSolaris after years of using Gentoo for
> my primary server OS, and I've run into a bit of trouble on my main
> storage zpool. Reading through the archives, it seems like the
> symptoms I'm seeing are fairly common though the causes seem to vary a
> bit. Also, it seems like many of the issues were fixed at or around
> snv99 while I'm running dev-134.
>
> The trouble started when I created and subsequently tried to destroy a
> 2TB zvol. The zpool hosting it has compression & dedup enabled on the
> root.

The current dedup code is said to be good with a truckload of memory and sufficient L2ARC, but at 4GB of RAM it sucks quite badly. I've done some testing with 134 and dedup on a 12TB box, and even removing small datasets (<1TB) may take a long time. If this happens to you and you get an (unexpected?) reboot, let ZFS spend hours (or days) mounting the filesystems, and it'll probably be OK after some time. Last time this happened to me, the box hung for some seven hours. I've heard of others talking about days.

Vennlige hilsener / Best regards

roy
--
Roy Sigurd Karlsbakk
(+47) 97542685
roy at karlsbakk.net
http://blogg.karlsbakk.net/
--
In all pedagogy it is essential that the curriculum be presented intelligibly. It is an elementary imperative for all pedagogues to avoid excessive use of idioms of foreign origin. In most cases, adequate and relevant synonyms exist in Norwegian.
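Roy's memory point can be made concrete with a back-of-envelope sketch, assuming the commonly cited ~320 bytes of core per dedup-table entry and the 8 KiB default volblocksize for zvols. The real table is sized by unique blocks actually written, so a mostly empty 2TB zvol would need far less; this is an upper bound for a fully written one:

```shell
# Rough DDT sizing for a fully written 2 TiB zvol at the default 8 KiB
# volblocksize. The ~320 bytes/entry figure is an oft-cited approximation,
# not an exact on-disk number.
vol_bytes=$(( 2 * 1024 * 1024 * 1024 * 1024 ))  # 2 TiB
blk=8192                                        # default zvol volblocksize
per_entry=320                                   # approx. core bytes per DDT entry
entries=$(( vol_bytes / blk ))
printf '%d entries -> ~%d GiB of DDT\n' "$entries" \
    $(( entries * per_entry / 1024 / 1024 / 1024 ))
# -> 268435456 entries -> ~80 GiB of DDT
```

An upper bound of roughly 80 GiB of dedup-table metadata against 4 GB of RAM (with a 512 MiB ARC cap) goes a long way toward explaining a multi-hour destroy.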
Zachary Bedell
2010-Aug-08 00:54 UTC
[zfs-discuss] zpool 'stuck' after failed zvol destroy and reboot
It's alive!! Some 20 hours after starting, the zpool import finished, and all data is alive & well.

I ordered the extra RAM, so that'll no doubt help in the future. I'm also in the process of de-Xen'ing the server I had running under xVM, so Solaris will get the whole 8GB to itself. Finally, I turned off compression & dedup on all the datasets and used zfs send to dump them to a clean pool that hasn't and won't see dedup.

Lesson learned on dedup... Toy home servers need not apply. =) I need to do a bit of benchmarking on compression on the new drives, as decompressing everything expanded several of the datasets a bit more than I would have liked. I might turn it back on selectively, as long as it's only dedup that causes `zfs destroy` to take an eternity.

Thanks all for the calm words. It turned out I just needed to wait it out, but I'm not very good at waiting when sick storage arrays are involved. =)

-Zac
Roy Sigurd Karlsbakk
2010-Aug-08 01:07 UTC
[zfs-discuss] zpool 'stuck' after failed zvol destroy and reboot
----- Original Message -----
> It's alive!! Some 20 hours after starting, the zpool import finished,
> and all data is alive & well.
>
> I ordered the extra RAM, so that'll no doubt help in the future. I'm
> also in the process of de-Xen'ing the server I had running under xVM,
> so Solaris will get the whole 8GB to itself. Finally, I turned off
> compression & dedup on all the datasets, used zfs send to dump them to
> a clean pool that hasn't and won't see dedup.
>
> Lesson learned on dedup... Toy home servers need not apply. =)

Compression is safe, though.

Vennlige hilsener / Best regards

roy
--
Roy Sigurd Karlsbakk
(+47) 97542685
roy at karlsbakk.net
http://blogg.karlsbakk.net/
Zachary Bedell
2010-Aug-11 15:06 UTC
[zfs-discuss] zpool 'stuck' after failed zvol destroy and reboot
Just wanted to post a bit of closure to this thread quickly... Most of the "import taking too long" threads I've found on the list tend to fade out without any definitive answer as to what went wrong. I needed something a bit more concrete to make me happy.

After zfs send'ing everything to a fresh pool, I destroyed the problematic pool and re-created it without dedup. I also recreated the original conditions of minimal RAM (only 4GB, with half of that eaten by xVM) and created a new 2TB zvol. I re-ran my original experiment trying to get lofi encryption set up on top of the zvol (which worked, but proved far too slow to be of any use, alas...), then zfs destroy'd the zvol. The destroy was close enough to instantaneous on the same hardware as the original problem. It was definitely the dedup (made far more acute by memory starvation) that caused the zfs destroy and subsequent re-import to take forever.

So for the sake of keyword searches: ZFS dedup (or dedupe) makes zfs destroy of a large zvol take a long time, possibly causing kernel panic and/or system lockup on systems with low memory (2GB-4GB confirmed, possibly higher). zfs import on a subsequent reboot also takes a long time, but will eventually finish given adequate RAM. Disabling xVM and returning RAM from Xen to the dom0 may improve the speed and/or prevent system crashes before completion.

Thanks again, all!

-Zac
Ville Ojamo
2010-Aug-12 04:46 UTC
[zfs-discuss] zpool 'stuck' after failed zvol destroy and reboot
I am having a similar issue at the moment: 3 GB RAM under ESXi, but dedup for this zvol (1.2 TB) was turned off and only 300 GB was used. The pool does contain other datasets with dedup turned on, but they are small enough that I'm not hitting the memory limits (been there, tried that, never again without maxing out the RAM + SSD).

I tried to destroy the zvol, waited for a long time, and due to some unexpected environmental problems needed to pull the plug on the box quickly to save it. Now the boot has been sitting at "Reading ZFS config: *" for a few days, but I have time to wait. ESXi monitoring confirms CPU activity but very little I/O.

My point: this particular zvol did not have deduplication turned on, but it seems I am still hitting the same problem.

snv_134

BTW, this is a PoC box with nothing too important on it and I have some spare time, so if I can help somehow, for example with kernel debugging, let me know.
Richard Elling
2010-Aug-16 00:09 UTC
[zfs-discuss] zpool 'stuck' after failed zvol destroy and reboot
On Aug 11, 2010, at 9:46 PM, Ville Ojamo wrote:

> I am having a similar issue at the moment.. 3 GB RAM under ESXi, but dedup for this zvol (1.2 T) was turned off and only 300 G was used. The pool does contain other datasets with dedup turned on but are small enough so I'm not hitting the memory limits (been there, tried that, never again without maxing out the RAM + SSD).
>
> Tried to destroy the zvol, waited for a long time, and due to some unexpected environmental problems needed to pull the plug on the box quickly to save it. Now the boot is at "Reading ZFS config: *" since a few days, but I have time to wait. ESXi monitoring confirms CPU activity but very little I/O.
>
> My point, this particular zvol did not have deduplication turned on but it seems I am still hitting the same problem.
>
> snv_134
>
> BTW this is a PoC box with nothing too important on it and I have some spare time, so if I can help somehow, for example with kernel debugging let me know.

There are several fixes since b134, such as:

CR 6948890 snapshot deletion can induce pathologically long spa_sync() times
http://bugs.opensolaris.org/bugdatabase/view_bug.do?bug_id=6948890

You might be hitting this, also. I know we've integrated this fix into the later Nexenta releases.
 -- richard

--
Richard Elling
richard at nexenta.com
+1-760-896-4422
Enterprise class storage for everyone
www.nexenta.com