Yaverot
2011-Mar-06 05:14 UTC
[zfs-discuss] How long should an empty destroy take? snv_134
I'm (still) running snv_134 on a home server. My main pool "tank" filled up last night (1G free remaining).

So today I bought new drives, adding them one at a time and running format between each one to see what name they received. As I had a pair of names, I zpool create/add newtank mirror cxxt0d0 cyyt0d0 them.

Then I got to the point where I need my unused drives. They were too small to act as spares for tank, but I didn't want to lose track of them, so I stuck them in another pool called "others".

We're heading into the 3rd hour of the zpool destroy on "others". The system isn't locked up, as it responds to local keyboard input, and existing ssh & smb connections. While this destroy is "running" all other zpool/zfs commands appear to be hung.

The others pool never had more than 100G in it at one time, never had any snapshots, and was empty for at least two weeks prior to the destroy command. I don't think dedup was ever used on it, but that should hardly matter when the pool was already empty. "others" was never shared via smb or nfs.

zpool destroy on an empty pool should be on the order of seconds, right?

I really don't want to reboot/power down the server, as I'll lose my current connections, and if there are problems I don't know when the system will be working again to re-establish them. Yes, I've triple checked, I'm not destroying tank.

While writing this email, I attempted a new ssh connection; it got to the "Last login:" line, but hasn't made it to the prompt. So I really don't want to take the server down physically.

Doing a df, "others" doesn't show, but rpool, tank, and newtank do. Another indication I issued destroy on the right pool. The smb connection is slower than normal, but still usable.
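For reference, the rough shape of those commands (the device names below are made up, standing in for the real c#t#d# names that format reported):

    # create the new pool from the first mirror pair, then grow it a pair at a time
    zpool create newtank mirror c5t0d0 c6t0d0
    zpool add newtank mirror c7t0d0 c8t0d0
    # park the leftover small disks in a scratch pool; this is the pool being destroyed
    zpool create others c9t0d0 c11t0d0
    zpool destroy others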
Richard Elling
2011-Mar-06 17:47 UTC
[zfs-discuss] How long should an empty destroy take? snv_134
On Mar 5, 2011, at 9:14 PM, Yaverot wrote:

> I'm (still) running snv_134 on a home server. My main pool "tank" filled up last night (1G free remaining).
> So today I bought new drives, adding them one at a time and running format between each one to see what name they received.
> As I had a pair of names, I zpool create/add newtank mirror cxxt0d0 cyyt0d0 them.
>
> Then I got to the point where I need my unused drives. They were too small to act as spares for tank, but I didn't want to lose track of them, so I stuck them in another pool called "others".
>
> We're heading into the 3rd hour of the zpool destroy on "others".

"zpool destroy" or "zfs destroy"?
 -- richard

> The system isn't locked up, as it responds to local keyboard input, and existing ssh & smb connections.
> While this destroy is "running" all other zpool/zfs commands appear to be hung.
>
> The others pool never had more than 100G in it at one time, never had any snapshots, and was empty for at least two weeks prior to the destroy command. I don't think dedup was ever used on it, but that should hardly matter when the pool was already empty.
> "others" was never shared via smb or nfs.
>
> zpool destroy on an empty pool should be on the order of seconds, right?
>
> I really don't want to reboot/power down the server, as I'll lose my current connections, and if there are problems I don't know when the system will be working again to re-establish them.
>
> Yes, I've triple checked, I'm not destroying tank.
> While writing this email, I attempted a new ssh connection; it got to the "Last login:" line, but hasn't made it to the prompt. So I really don't want to take the server down physically.
>
> Doing a df, "others" doesn't show, but rpool, tank, and newtank do. Another indication I issued destroy on the right pool.
> The smb connection is slower than normal, but still usable.
Edward Ned Harvey
2011-Mar-07 01:14 UTC
[zfs-discuss] How long should an empty destroy take? snv_134
> From: zfs-discuss-bounces at opensolaris.org [mailto:zfs-discuss-
> bounces at opensolaris.org] On Behalf Of Yaverot
>
> We're heading into the 3rd hour of the zpool destroy on "others".
> The system isn't locked up, as it responds to local keyboard input, and

I bet you, you're in a semi-crashed state right now, which will degrade into a full system crash. You'll have no choice but to power cycle. Prove me wrong, please. ;-)

I bet, as soon as you type in any "zpool" or "zfs" command ... even "list" or "status" ... they will also hang indefinitely.

Is your pool still 100% full? That's probably the cause. I suggest, if possible, immediately deleting something and destroying an old snapshot to free up a little bit of space. And then you can move onward...

> While this destroy is "running" all other zpool/zfs commands appear to be
> hung.

Oh, sorry, didn't see this before I wrote what I wrote above. This just further confirms what I said above.

> zpool destroy on an empty pool should be on the order of seconds, right?

zpool destroy is instant, regardless of how much data there is in a pool. zfs destroy is instant for an empty volume, but zfs destroy takes a long time for a lot of data.

But as mentioned above, that's irrelevant to your situation. Because your system is crashed, and even if you try init 0 or init 6... they will fail. You have no choice but to power cycle.

For the heck of it, I suggest init 0 first. Then wait half an hour, and power cycle. Just to try and make the crash as graceful as possible.

As soon as it comes back up, free up a little bit of space, so you can avoid a repeat.

> Yes, I've triple checked, I'm not destroying tank.
> While writing the email, I attempted a new ssh connection, it got to the "Last
> login:" line, but hasn't made it to the prompt.

Oh, sorry, yet again this is confirming what I said above. Semi-crashed and degrading into a full crash. Right now, you cannot open any new command prompts. Soon it will stop responding to ping. (Maybe 2-12 hours.)
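A minimal sketch of that kind of clean-up, with a hypothetical dataset and snapshot name (whatever actually exists on tank will differ):

    # list snapshots sorted by space used, then destroy an old one to free space
    zfs list -t snapshot -o name,used -s used
    zfs destroy tank/data@2010-12-01
    # confirm the pool has some headroom again
    zpool list tank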
Edward Ned Harvey
2011-Mar-07 01:21 UTC
[zfs-discuss] How long should an empty destroy take? snv_134
> From: zfs-discuss-bounces at opensolaris.org [mailto:zfs-discuss-
> bounces at opensolaris.org] On Behalf Of Yaverot
>
> I'm (still) running snv_134 on a home server. My main pool "tank" filled up
> last night (1G free remaining).

There is (or was) a bug that would sometimes cause the system to crash when 100% full.
http://comments.gmane.org/gmane.os.solaris.opensolaris.zfs/41227

In that thread, the crash was related to being 100% full, running a scrub, and some write operations all at the same time. By any chance were you running a scrub? I am curious whether or not the scrub is actually an ingredient in that failure scenario, or if the scrub was just coincidence for me.
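One quick way to check (pool name assumed; on a build of that vintage, zpool status reports scrub state on its "scrub:" line):

    # shows "scrub in progress", "scrub completed", or "none requested"
    zpool status tank | grep scrub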
Nathan Kroenert
2011-Mar-07 03:49 UTC
[zfs-discuss] How long should an empty destroy take? snv_134
Why wouldn't they try a reboot -d? That would at least get some data in the form of a crash dump, if at all possible... A power cycle seems a little medieval to me... at least in the first instance.

The other thing I have noted is that sometimes things do get wedged, and if you can find where (mdb -k and take a poke at the stack of some of the zfs/zpool commands that are hung, to see what they were operating on), trying a zpool clear on that zpool can help. Not that I'm recommending that you should *need* to, but that has got me unwedged on occasion. (Though usually when I have done something administratively silly... ;)

Nathan.

On 7/03/2011 12:14 PM, Edward Ned Harvey wrote:
>> From: zfs-discuss-bounces at opensolaris.org [mailto:zfs-discuss-
>> bounces at opensolaris.org] On Behalf Of Yaverot
>>
>> We're heading into the 3rd hour of the zpool destroy on "others".
>> The system isn't locked up, as it responds to local keyboard input, and
>
> I bet you, you're in a semi-crashed state right now, which will degrade into
> a full system crash. You'll have no choice but to power cycle. Prove me
> wrong, please. ;-)
>
> I bet, as soon as you type in any "zpool" or "zfs" command ... even "list"
> or "status" ... they will also hang indefinitely.
>
> Is your pool still 100% full? That's probably the cause. I suggest, if
> possible, immediately deleting something and destroying an old snapshot to
> free up a little bit of space. And then you can move onward...
>
>> While this destroy is "running" all other zpool/zfs commands appear to be
>> hung.
>
> Oh, sorry, didn't see this before I wrote what I wrote above. This just
> further confirms what I said above.
>
>> zpool destroy on an empty pool should be on the order of seconds, right?
>
> zpool destroy is instant, regardless of how much data there is in a pool.
> zfs destroy is instant for an empty volume, but zfs destroy takes a long
> time for a lot of data.
>
> But as mentioned above, that's irrelevant to your situation. Because your
> system is crashed, and even if you try init 0 or init 6... they will fail.
> You have no choice but to power cycle.
>
> For the heck of it, I suggest init 0 first. Then wait half an hour, and
> power cycle. Just to try and make the crash as graceful as possible.
>
> As soon as it comes back up, free up a little bit of space, so you can avoid
> a repeat.
>
>> Yes, I've triple checked, I'm not destroying tank.
>> While writing the email, I attempted a new ssh connection, it got to the "Last
>> login:" line, but hasn't made it to the prompt.
>
> Oh, sorry, yet again this is confirming what I said above. Semi-crashed and
> degrading into a full crash.
> Right now, you cannot open any new command prompts.
> Soon it will stop responding to ping. (Maybe 2-12 hours.)
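A rough sketch of the sort of poking described above, assuming you have already found the PID of a hung zpool/zfs command with ps (the PID 1234 below is made up):

    # find the hung command's PID
    ps -ef | grep zpool
    # dump the kernel stacks of its threads from the live kernel
    echo "0t1234::pid2proc | ::walk thread | ::findstack -v" | mdb -k
    # then, if the pool merely looks wedged on a faulted device:
    zpool clear others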
Yaverot
2011-Mar-07 07:46 UTC
[zfs-discuss] How long should an empty destroy take? snv_134
Follow up, and current status:

In the morning I cut power (before receiving the 4 replies). Turning it on again, I got too impatient waiting for a text screen for diagnostics to show, and overfilled the keyboard buffer. I forced it off again (to stop the beeps), then waited longer before attempting to switch it from splash-screen to console.

When it came up, "others" was still there, and a disk (c16) was "faulted". Since I only used the pool for light testing and for holding the names of devices, it was a plain stripe, so the whole pool was faulted. My guess is that the disk switched to faulted between the zpool status and the "zpool destroy others", and the destroy then got stuck trying to write the "not-in-use" label to the unavail disk.

I was able to "zpool destroy -f others" and add those disks to my newtank (using -f on the add). So newtank is now large enough for a send/recv from tank. It isn't done yet, but a scrub on tank takes about 36 hours (newtank is mirrors instead of tank's raidz3).

Two drives show faulted in tank. One I found: it renamed itself from either c12 or c14 to c21, but my attempt to add it back to the pool gave an error that c10 is already part of tank. Yes, c10 is part of tank, but the command line referred to c14 and c21, so why talk about c10? Getting the data onto newtank seemed the best thing to push for, so I'm doing the send/recv with tank degraded; one more disk can disappear before any data is at risk.

My power-off/reboot before running an export/import loop on newtank means all those drives have different names now than the ones I wrote on them. :(

rpool remains 1% in use. tank reports 100% full (with 1.44G free). "others" is destroyed, but I know that c16 is still physically connected and hasn't been zfs-delabeled, should it ever online itself. zfs list shows data being recv'ed on newtank.

So:
1. send/recv tank->newtank progressing, and will hopefully finish with no problems.
2. Two disks apparently disappeared, as they aren't part of any pool and don't show in format either.*
3. One disk renamed itself and therefore can't be re-added/reattached to tank (it is now c21).
4. All the drives put into newtank before the destroy showed up, but with different names. newtank imported cleanly (at the time it was still empty).

*Or I don't see them because I get lost in the order. Comparing the output requires scrolling back & forth, and the lists aren't sorted the same.
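For anyone following along, a minimal sketch of that kind of migration, with a hypothetical snapshot name (the actual send/recv options used aren't stated above):

    # take a recursive snapshot of tank, then replicate the whole tree into newtank
    zfs snapshot -r tank@migrate
    zfs send -R tank@migrate | zfs recv -Fd newtank
    # watch progress from another shell
    zfs list -r newtank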
Edward Ned Harvey
2011-Mar-07 14:26 UTC
[zfs-discuss] How long should an empty destroy take? snv_134
> From: zfs-discuss-bounces at opensolaris.org [mailto:zfs-discuss-
> bounces at opensolaris.org] On Behalf Of Yaverot
>
> rpool remains 1% in use. tank reports 100% full (with 1.44G free),

I recommend: When creating your new pool, use slices of the new disks which are 99% of the size of the new disks, instead of using the whole new disks. This is a more reliable way of avoiding the problem "my new replacement disk for the failed disk is slightly smaller than the failed disk and therefore I can't replace it."

I also recommend: In every pool, create some space reservation. So when and if you ever hit 100% usage again and start to hit the system crash scenario, you can do a zfs destroy (snapshot) and delete the space reservation, in order to avoid the system crash scenario you just witnessed. Hopefully.
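A minimal sketch of such a reservation, with a made-up dataset name and size (refreservation would work as well):

    # carve out a small emergency reserve on the pool
    zfs create newtank/reserve
    zfs set reservation=5G newtank/reserve
    # if the pool ever hits 100% again, release it to regain breathing room
    zfs set reservation=none newtank/reserve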