Zachary Bedell
2010-Aug-06 02:01 UTC
[zfs-discuss] zpool 'stuck' after failed zvol destroy and reboot
Greetings, all!
I've recently jumped into OpenSolaris after years of using Gentoo for
my primary server OS, and I've run into a bit of trouble on my main
storage zpool. Reading through the archives, it seems like the symptoms
I'm seeing are fairly common though the causes seem to vary a bit.
Also, it seems like many of the issues were fixed at or around snv99 while
I'm running dev-134.
The trouble started when I created and subsequently tried to destroy a 2TB zvol.
The zpool hosting it has compression & dedup enabled on the root.
I've run into issues previously with zvol destruction taking a long
time, so I was expecting this to take a while, but alas it managed to hang and
lock the system up tight before it completed.
Immediately after starting the `zfs destroy` process, all I/O to the zpool
stopped dead. No NFS, no Xen zvol access, no local POSIX access. Any attempts
to run `zpool <anything>` hung indefinitely and couldn't be
Ctrl-C'd or kill -9'd. For about two hours, there was disk
activity on the pool (blinken lights), but then everything stopped. No more
lights, no response on network, and the console's monitor was stuck in
powersave with no keyboard or mouse activity able to wake it up. I let it sit
that way for another hour or so before giving up and hitting the BRS.
Upon reboot, the OS hung at "Reading ZFS Config: -", again with
blinken lights. That ran for about an hour, then locked up as above.
In order to get back into the system, I pulled the four Samsung HD203WI drives
as well as the two OCZ Vertex SSDs (split between ZIL, L2ARC, and swap
for the Linux xVM) and was able to boot up with just the rpool.
Knowing that my system is RAM constrained (4GB, but 1.5GB of that dedicated to a Linux guest under xVM), I thought that perhaps the ZFS ARC was exhausting system memory and ultimately killing the system. Turning to the Evil Tuning Guide, I tried adding "set zfs:zfs_arc_max = 0x20000000" to /etc/system, followed by a bootadm update-archive and a reboot.
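For reference, that tuning amounts to roughly the following (a minimal sketch; 0x20000000 is a 512MB ARC cap, and the archive rebuild assumes the usual bootadm invocation):

    # cap the ZFS ARC at 512MB (0x20000000 bytes)
    echo 'set zfs:zfs_arc_max = 0x20000000' >> /etc/system

    # rebuild the boot archive so the change is picked up, then reboot
    bootadm update-archive
    init 6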
Once the system rebooted (still without the six devices for the pool), I hot-added the six devices, did `cfgadm -al` followed by `cfgadm -c configure <the devices>`, and had everything connected and spinning again. `zpool status` of course was livid about the state of disrepair the pool was in:
  pool: tank
 state: UNAVAIL
status: One or more devices could not be opened. There are insufficient
        replicas for the pool to continue functioning.
action: Attach the missing device and online it using 'zpool online'.
   see: http://www.sun.com/msg/ZFS-8000-3C
 scrub: none requested
config:

        NAME          STATE     READ WRITE CKSUM
        tank          UNAVAIL      0     0     0  insufficient replicas
          raidz1-0    UNAVAIL      0     0     0  insufficient replicas
            c4t1d0    UNAVAIL      0     0     0  cannot open
            c4t0d0    UNAVAIL      0     0     0  cannot open
            c5t0d0    UNAVAIL      0     0     0  cannot open
            c5t1d0    UNAVAIL      0     0     0  cannot open
        logs
          mirror-1    UNAVAIL      0     0     0  insufficient replicas
            c7t4d0s0  UNAVAIL      0     0     0  cannot open
            c7t5d0s0  UNAVAIL      0     0     0  cannot open
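(For completeness, the reattach dance above was roughly the following; the attachment point names are illustrative only, since the real Ap_Ids come from the `cfgadm -al` listing on a given box:)

    # list attachment points and spot the newly inserted disks
    cfgadm -al

    # configure each re-inserted device; 'sata1/0' etc. are placeholder Ap_Ids
    cfgadm -c configure sata1/0
    cfgadm -c configure sata1/1
    # ...and so on for the remaining drives and the two SSDs

    zpool status tank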
So I crossed my fingers, did a `zpool clear tank` and..... Nothing.... The
command hung indefinitely, but I had blinken lights again, so it was definitely
doing *something*. I headed off for $DAY_JOB, fingers still crossed (which
makes driving difficult, believe me...)
Through the magic of open WiFi on an otherwise closed corporate network, I was
able to VPN back home and keep an eye on things. At this point, `zpool
<anything>` still hung immortally (tried everything short of a stake
through the heart, but it wouldn't be killed). Any existing SSH
sessions continued to work (accessed via VNC to my desktop where they'd
been left open), but any new SSH attempts hung before getting a shell:
fawn:~ pendor$ ssh pin
Last login: Fri Aug 6 00:08:30 2010 from fawn.thebedells
(the end -- nothing beyond this)
Peeking at `ps` showed that each new SSH attempt tried to run /usr/sbin/quota
and seemingly died there. I'm assuming that I could probably somehow
hack my SSHd or PAM config to skip quota, but that's more or less moot.
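(If it ever did come to that, my understanding is that the quota call usually comes from the stock Solaris /etc/profile rather than from sshd or PAM itself, so the hack would presumably be as simple as commenting it out there -- an assumption on my part, not something I've tested:)

    # /etc/profile -- comment out the login-time quota check
    #       /usr/sbin/quota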
Curiously, console interactive login to X as well as remote VNC sessions to the
server both worked fine and gave me a functional Gnome environment.
While this process was on-going, `zpool iostat` was off the menu, but running
plain-old `iostat -dex 10` gave:
                           extended device statistics                  ---- errors ----
    device    r/s    w/s   kr/s   kw/s  wait  actv  svc_t  %w  %b  s/w  h/w  trn  tot
    sd0       0.0    0.0    0.0    0.0   0.0   0.0    0.0   0   0    0    0    0    0
    sd1       0.1    5.6    6.4   30.8   0.0   0.0    0.9   0   0    0    0    0    0
    sd2       0.0    5.6    0.0   30.8   0.0   0.0    0.6   0   0    0    0    0    0
    sd7      69.3    0.0   71.3    0.0   0.0   0.6    8.3   0  55    0    0    0    0
    sd8      69.0    0.0   71.8    0.0   0.0   0.5    7.5   0  49    0    0    0    0
    sd9      67.5    0.0   73.5    0.0   0.0   0.6    8.3   0  52    0    0    0    0
    sd10     68.3    0.0   68.5    0.0   0.0   0.6    8.6   0  55    0    0    0    0
    sd11      0.0    0.0    0.0    0.0   0.0   0.0    0.0   0   0    0    0    0    0
    sd12      0.0    0.0    0.0    0.0   0.0   0.0    0.0   0   0    0    0    0    0
    sd15      0.0    0.0    0.0    0.0   0.0   0.0    0.0   0   0    0    0    0    0
    sd16      0.0    0.0    0.0    0.0   0.0   0.0    0.0   0   0    0    0    0    0
    sd17      0.0    0.0    0.0    0.0   0.0   0.0    0.0   0   0    0    0    0    0
    sd18      0.0    0.0    0.0    0.0   0.0   0.0    0.0   0   0    0    0    0    0
There's a *little* bit of throughput coming from the drives and
intermittent writes going to the SSDs. The read rate is pretty much
constant (between 70 & 140 per 10 seconds), but the SSD write only shows up
sporadically, maybe once or twice a minute.
Alas, about four hours after starting the process this time, everything went
dark, and the system stopped responding. Upon returning home, I found the
blinken lights had ceased, and any attempt to access the console failed to wake
the monitor. I pulled the drives & hit the BRS yet again.
At this point, I'm still operating on the assumption that "this too shall pass," and that given enough RAM and patience, the pool might eventually import. Given that a goodly chunk of the server's measly 4GB was consumed by an inaccessible xVM, I hacked my grub menu.lst to allow me to boot to a non-PV kernel and booted up with the full 4GB at Solaris' disposal. Repeating the above dance with cfgadm and zpool clear has started the import process again, and I anxiously await its outcome.
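For anyone curious, the menu.lst hack amounts to adding a bare-metal entry alongside the xVM one -- something along these lines, with the findroot/bootfs lines copied from whatever the existing entries already use (the BE name below is just an example):

    title OpenSolaris snv_134 (bare metal, no xVM)
    findroot (pool_rpool,0,a)
    bootfs rpool/ROOT/opensolaris
    kernel$ /platform/i86pc/kernel/$ISADIR/unix -B $ZFS-BOOTFS
    module$ /platform/i86pc/$ISADIR/boot_archive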
That said, I'm hoping someone with a bit more knowledge of ZFS than I (which would be roughly the entire population of the planet) might have some suggestion for a shortcut to get this pool running again, or at the very least some suggestion that might allow me to delete zvols in the future without having to perform major surgery. So far I've been running about 90 minutes with the full 4GB available, and the blinken lights persist.

I'll give what vitals I can on the system in the hopes they might help. I'm not sure how much detail I'll be able to give, as I'm not yet even qualified to be a complete Solaris n00b (some parts are missing). That said, I'm comfortable as can be on Unix-like systems (Linux & Darwin by day), and would love to help diagnose this issue in detail. SSH tunneling, etc. is an option if anyone's up for it. For all my willingness to help, though, the data on this pool is rather on the important side, so any debugging likely to lead to data loss I'd rather avoid if at all possible.
The system is OpenSolaris dev-b134 running on x64. Supermicro MBD-X7SBE
motherboard with an Intel E6400 Wolfdale 2.8GHz dual core CPU and 4GB of
DDR2-667 RAM. The disks are connected to two Supermicro AOC-SAT2-MV8 SATA
controllers (staggered across both of them) with the ZIL & L2ARC
SSDs on two of the motherboard's built-in SATA ports. Two WD Caviar Blue drives, also on the motherboard SATA, are mirrored to make the rpool. The whole mess is housed in a NORCO RPC-4220 case with 20 hot-swap (allegedly) SATA bays. The top shelf of four bays goes to the MB SATA, while the remaining 16 are staggered across the two MV8s via SFF-8087->4x SATA splitter cables. My zpools are at version 22.

The system usually runs as a dom0, but for the time being I've switched to a vanilla kernel so as to give maximal resources to ZFS.
As mentioned above, the pool in question has dedup and compression active. Given my reading over the last several days, I've come to conclude enabling those was probably a BadIdea(tm), but c'est la vie, at least for the time being.
Best case, if anyone can tell me how to escape import heck and get back to a running system, I'll be your friend forever... Failing that, if there were some way to get a PROGRESS report of how the thing is doing, that would at least satisfy my OCD need to know what's going on.
If any additional info would help diagnose the issue, just name it.
Thanks in advance for any assistance!
Best regards,
Zac Bedell
Richard Jahnel
2010-Aug-06 03:16 UTC
[zfs-discuss] zpool 'stuck' after failed zvol destroy and reboot
Assuming there are no other volumes sharing slices of those disks, why import? Just overwrite the disks with a new pool, using the -f flag during creation. I'm just sayin', since you were destroying the volume anyway, I presume there is no data we are trying to preserve here.
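For what it's worth, the "just recreate it" route would look roughly like this, reusing the device names from the zpool status output earlier (and, obviously, destroying whatever was on those disks):

    # wipes the old pool outright; only sensible if nothing on it needs preserving
    zpool create -f tank raidz1 c4t1d0 c4t0d0 c5t0d0 c5t1d0 \
        log mirror c7t4d0s0 c7t5d0s0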
Zachary Bedell
2010-Aug-06 13:23 UTC
[zfs-discuss] zpool 'stuck' after failed zvol destroy and reboot
Alas, the pool in question has a dozen-odd other ZFS filesystems that range in importance from "nice to have" to "let's not even think about it". On the bright side, at about 14 hours in, my lights are still blinken. Here's hoping the +RAM and -xVM were the difference.

One related question: in the unfortunate event this locks up again, does anyone know whether on a subsequent import attempt the zvol destroy is resumed or simply restarted? I don't mind rinse, reboot, repeat if it'll eventually get through the DDT check I suspect is going on. If it's just starting over each time, I'll probably bite the bullet and max my motherboard out at its astounding 8GB RAM capacity.
Richard Jahnel
2010-Aug-06 20:44 UTC
[zfs-discuss] zpool 'stuck' after failed zvol destroy and reboot
For ARC reasons if no other, I would max it out to the 8GB regardless.
Roy Sigurd Karlsbakk
2010-Aug-07 21:30 UTC
[zfs-discuss] zpool 'stuck' after failed zvol destroy and reboot
----- Original Message -----
> Greetings, all!
>
> I've recently jumped into OpenSolaris after years of using Gentoo for
> my primary server OS, and I've run into a bit of trouble on my main
> storage zpool. Reading through the archives, it seems like the
> symptoms I'm seeing are fairly common though the causes seem to vary a
> bit. Also, it seems like many of the issues were fixed at or around
> snv99 while I'm running dev-134.
>
> The trouble started when I created and subsequently tried to destroy a
> 2TB zvol. The zpool hosting it has compression & dedup enabled on the
> root.

The current dedup code is said to be good with a truckload of memory and sufficient L2ARC, but at 4GB of RAM it sucks quite badly. I've done some testing with 134 and dedup on a 12TB box, and even removing small datasets (<1TB) may take a long time. If this happens to you and you get an (unexpected?) reboot, let ZFS spend hours (or days) mounting the filesystems, and it'll probably be OK after some time. Last time this happened to me, the box hung for some seven hours. I've heard of others talking about days.

Best regards,

roy
--
Roy Sigurd Karlsbakk
(+47) 97542685
roy at karlsbakk.net
http://blogg.karlsbakk.net/
--
In all pedagogy it is essential that the curriculum be presented intelligibly. It is an elementary imperative for all pedagogues to avoid excessive use of idioms of foreign origin. In most cases, adequate and relevant synonyms exist in Norwegian.
Zachary Bedell
2010-Aug-08 00:54 UTC
[zfs-discuss] zpool 'stuck' after failed zvol destroy and reboot
It's alive!! Some 20 hours after starting, the zpool import finished, and all data is alive & well.

I ordered the extra RAM, so that'll no doubt help in the future. I'm also in the process of de-Xen'ing the server I had running under xVM, so Solaris will get the whole 8GB to itself. Finally, I turned off compression & dedup on all the datasets, used zfs send to dump them to a clean pool that hasn't and won't see dedup.

Lesson learned on dedup... Toy home servers need not apply. =)

I need to do a bit of benchmarking on compression on the new drives, as decompressing everything expanded several of the datasets a bit more than I would have liked. Might turn it on selectively, as long as it's only dedup that causes `zfs destroy` to take an eternity.

Thanks all for the calm words. Turned out I just needed to wait it out, but I'm not very good at waiting when sick storage arrays are involved. =)

-Zac
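For the archives, the migration was essentially a recursive snapshot plus send/receive along these lines (the dataset and pool names here are placeholders rather than my real ones):

    # snapshot the dataset tree, then replicate it to the clean pool
    zfs snapshot -r tank/data@migrate
    zfs send -R tank/data@migrate | zfs recv -d newtank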
Roy Sigurd Karlsbakk
2010-Aug-08 01:07 UTC
[zfs-discuss] zpool 'stuck' after failed zvol destroy and reboot
----- Original Message -----
> It's alive!! Some 20 hours after starting, the zpool import finished,
> and all data is alive & well.
>
> I ordered the extra RAM, so that'll no doubt help in the future. I'm
> also in the process of de-Xen'ing the server I had running under xVM,
> so Solaris will get the whole 8GB to itself. Finally, I turned off
> compression & dedup on all the datasets, used zfs send to dump them to
> a clean pool that hasn't and won't see dedup.
>
> Lesson learned on dedup... Toy home servers need not apply. =)

Compression is safe, though.

Best regards,

roy
--
Roy Sigurd Karlsbakk
(+47) 97542685
roy at karlsbakk.net
http://blogg.karlsbakk.net/
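PS: it's also cheap to apply selectively, per dataset, and the payoff is visible afterwards -- roughly like this (the dataset name is just an example):

    zfs set compression=on tank/somedataset
    # later, see how much it actually saves
    zfs get compressratio tank/somedataset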
Zachary Bedell
2010-Aug-11 15:06 UTC
[zfs-discuss] zpool 'stuck' after failed zvol destroy and reboot
Just wanted to post a bit of closure to this thread quick... Most of the "import taking too long" threads I've found on the list tend to fade out without any definitive answer as to what went wrong. I needed something a bit more concrete to make me happy.

After zfs send'ing everything to a fresh pool, I destroyed the problematic pool and re-created it without dedup. I also recreated the original conditions of minimal RAM (only 4GB with half of that eaten by xVM) and created a new 2TB zvol. I re-ran my original experiment trying to get lofi encryption set up on top of the zvol (which worked, but proved far too slow to be of any use, alas...), then zfs destroy'd the zvol. Destroy was close enough to instantaneous on the same hardware as the original problem. It was definitely the dedup (made far more acute by memory starvation) that caused the zfs destroy and subsequent re-import to take forever.

So for the sake of keyword searches: zfs dedup (or dedupe) makes zfs destroy of a large zvol take a long time, possibly causing kernel panic and/or system lockup on systems with low memory (2GB - 4GB confirmed, possibly higher). zfs import on subsequent reboot also takes a long time, but will eventually finish given adequate RAM. Disabling xVM and returning RAM from Xen to the dom0 may improve the speed and/or prevent system crashes before completion.

Thanks again, all!

-Zac
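P.S. for the keyword searchers: before turning dedup on at all, zdb can give a rough idea of what the dedup table would cost, and on a pool that already has dedup it can show how big the DDT actually is -- something like this, assuming a pool named tank:

    # simulate dedup on an existing pool and print the would-be DDT histogram
    zdb -S tank

    # on a pool with dedup already enabled, show actual DDT statistics
    zdb -DD tank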
Ville Ojamo
2010-Aug-12 04:46 UTC
[zfs-discuss] zpool 'stuck' after failed zvol destroy and reboot
I am having a similar issue at the moment: 3GB RAM under ESXi, but dedup for this zvol (1.2T) was turned off and only 300G was used. The pool does contain other datasets with dedup turned on, but they're small enough that I'm not hitting the memory limits (been there, tried that, never again without maxing out the RAM + SSD).

I tried to destroy the zvol, waited for a long time, and due to some unexpected environmental problems needed to pull the plug on the box quickly to save it. Now the boot has been sitting at "Reading ZFS config: *" for a few days, but I have time to wait. ESXi monitoring confirms CPU activity but very little I/O.

My point: this particular zvol did not have deduplication turned on, but it seems I am still hitting the same problem.

snv_134

BTW this is a PoC box with nothing too important on it and I have some spare time, so if I can help somehow, for example with kernel debugging, let me know.
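If kernel-level detail would actually help and a shell is still reachable while the box is grinding, the usual starting point (offered as a suggestion, not a recipe) would be to pull kernel thread stacks out of mdb, e.g.:

    # dump kernel thread stacks, grouped and filtered to the zfs module
    echo "::stacks -m zfs" | mdb -k

    # or look at the pool/vdev state the kernel sees
    echo "::spa -v" | mdb -k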
Richard Elling
2010-Aug-16 00:09 UTC
[zfs-discuss] zpool 'stuck' after failed zvol destroy and reboot
On Aug 11, 2010, at 9:46 PM, Ville Ojamo wrote:

> I am having a similar issue at the moment: 3GB RAM under ESXi, but dedup for this zvol (1.2T) was turned off and only 300G was used. The pool does contain other datasets with dedup turned on, but they're small enough that I'm not hitting the memory limits (been there, tried that, never again without maxing out the RAM + SSD).
>
> I tried to destroy the zvol, waited for a long time, and due to some unexpected environmental problems needed to pull the plug on the box quickly to save it. Now the boot has been sitting at "Reading ZFS config: *" for a few days, but I have time to wait. ESXi monitoring confirms CPU activity but very little I/O.
>
> My point: this particular zvol did not have deduplication turned on, but it seems I am still hitting the same problem.
>
> snv_134
>
> BTW this is a PoC box with nothing too important on it and I have some spare time, so if I can help somehow, for example with kernel debugging, let me know.

There are several fixes since b134, such as

CR 6948890: snapshot deletion can induce pathologically long spa_sync() times
http://bugs.opensolaris.org/bugdatabase/view_bug.do?bug_id=6948890

You might be hitting this, also. I know we've integrated this fix into the later Nexenta releases.
-- richard

--
Richard Elling
richard at nexenta.com
+1-760-896-4422
Enterprise class storage for everyone
www.nexenta.com