Jack Kielsmeier
2009-Dec-08 04:07 UTC
[zfs-discuss] ZFS pool unusable after attempting to destroy a dataset with dedup enabled
Howdy,

I upgraded from snv_125 to snv_128a because I wanted to do some dedup testing :). I have two zfs pools: rpool and vault. I upgraded my vault zpool version and turned on dedup on the dataset vault/shared_storage. I also turned on gzip compression on this dataset. Before I turned on dedup, I made a new dataset, vault/shared_storage_temp, and copied all the data to it (just in case something crazy happened to my dedup'd dataset, since dedup is new). I then removed all data on my dedup'd dataset and copied everything back from the temp dataset.

After I realized my space savings weren't going to be that great, I decided to delete the vault/shared_storage dataset:

zfs destroy vault/shared_storage

This hung and couldn't be killed. I force-rebooted my system and then couldn't boot into Solaris; it hung at "Reading ZFS config". I then booted into single-user mode (multiple times), and any zfs or zpool command froze. Next I rebooted into my snv_125 environment. As it should, it ignored my vault zpool, since the pool's version is higher than it can understand. I forced a zpool export of vault and rebooted. I could then boot back into snv_128, and zpool import listed the vault pool. However, I cannot import it by name or identifier; the command hangs, as does any additional zfs or zpool command. I cannot kill or kill -9 the processes.

Is there anything I can do to get my pool imported? I haven't done much troubleshooting at all on OpenSolaris; I'd be happy to run any suggested commands and provide output.

Thank you for the assistance.
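For reference, a minimal sketch of the setup sequence described above, using the pool and dataset names from the post (the exact invocations are an assumption; they do not appear in the original message):

   zpool upgrade vault                             # move the pool to a dedup-capable on-disk version
   zfs set dedup=on vault/shared_storage           # enable deduplication on the dataset
   zfs set compression=gzip vault/shared_storage   # enable gzip compression on the same dataset
   zfs destroy vault/shared_storage                # the command that hung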
Markus Kovero
2009-Dec-08 08:46 UTC
[zfs-discuss] ZFS pool unusable after attempting to destroy a dataset with dedup enabled
Hi,

Are you sure zfs isn't just working through transactions after the zfs destroy was forcibly stopped? Sometimes (always, it seems) zfs/zpool commands just hang if you destroy larger datasets; in reality zfs is just doing its job, and if you reboot the server during a dataset destroy it will take some time to come back up.

So how long have you waited? Have you tried removing /etc/zfs/zpool.cache, then booting into snv_128, doing the import, and watching the disks with iostat to see whether there is any activity?

Yours,
Markus Kovero
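A rough sketch of the recovery sequence Markus suggests, assuming the pool name vault from the original post:

   # boot to single-user mode so the pool is not auto-imported, then:
   rm /etc/zfs/zpool.cache      # forget the cached pool config; the pool itself is untouched
   reboot                       # boot into snv_128 normally
   zpool import vault           # may run for hours while the interrupted destroy is replayed
   iostat -xn 5                 # in another session: confirm the disks are actually busy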
Jack Kielsmeier
2009-Dec-08 14:23 UTC
[zfs-discuss] ZFS pool unusable after attempting to destroy a dataset with dedup enabled
I waited about 20 minutes or so. I'll try your suggestions tonight.

I didn't look at iostat. I just figured it was hung after waiting that long, but now that I know it can take a very long time, I will watch it and make sure it's doing something.

Thanks. I'll post my results either tonight or tomorrow morning.
Tim Cook
2009-Dec-08 17:00 UTC
[zfs-discuss] ZFS pool unusable after attempting to destroy a dataset with dedup enabled
On Tue, Dec 8, 2009 at 8:23 AM, Jack Kielsmeier <jackal at netins.net> wrote:

> I waited about 20 minutes or so. I'll try your suggestions tonight.
>
> I didn't look at iostat. I just figured it was hung after waiting that
> long, but now that I know it can take a very long time, I will watch it
> and make sure it's doing something.

How big was the pool you destroyed, and what are the system specs?

--Tim
Jack Kielsmeier
2009-Dec-08 17:26 UTC
[zfs-discuss] ZFS pool unusable after attempting to destroy a dataset with dedup enabled
The pool is roughly 4.5 TB (raidz1, 4x 1.5 TB disks). I didn't attempt to destroy the pool, only a dataset within the pool. The dataset is/was about 1.2 TB.

System specs:

Intel Q6600 (2.4 GHz quad core)
4 GB RAM
2x 500 GB drives in a zfs mirror (rpool)
4x 1.5 TB drives in a zfs raidz1 array (vault)

The 1.5 TB drives are attached to a PCI SATA card (Silicon Image); the rpool drives use the integrated SATA ports.

Please let me know if you need further specs. And thank you.
Jack Kielsmeier
2009-Dec-08 23:38 UTC
[zfs-discuss] ZFS pool unusable after attempting to destroy a dataset with dedup enabled
It's been about 45 minutes now since I started trying to import the pool. I see disk activity (see below).

What concerns me is that my free memory keeps shrinking as time goes on. I now have 185 MB free out of 4 gigs (and 2 gigs of swap free). I hope this doesn't exhaust all my memory and freeze my box.

I'll post updated information later.

iostat output (taken every 5 seconds):

                    extended device statistics
    r/s    w/s   kr/s   kw/s wait actv wsvc_t asvc_t  %w  %b device
    0.0    0.0    0.0    0.0  0.0  0.0    0.0    0.0   0   0 c5
    0.0    0.0    0.0    0.0  0.0  0.0    0.0    0.0   0   0 c5t1d0
    0.0    0.0    0.0    0.0  0.0  0.0    0.0    0.0   0   0 c5d0
    0.0    0.0    0.0    0.0  0.0  0.0    0.0    0.0   0   0 c6d0
   29.8    0.0 1894.7    0.0  0.0  0.6    0.0   20.2   0  60 c3d0
   28.8    0.0 1843.3    0.0  0.0  0.4    0.0   12.6   0  36 c3d1
   31.2    0.0 1984.3    0.0  0.0  0.7    0.0   21.0   0  65 c4d0
   29.0    0.0 1830.9    0.0  0.0  0.3    0.0   11.5   0  33 c4d1
Tue Dec  8 17:34:15 CST 2009
     cpu
 us sy wt id
  1  1  0 97
                    extended device statistics
    r/s    w/s   kr/s   kw/s wait actv wsvc_t asvc_t  %w  %b device
    0.0    0.0    0.0    0.0  0.0  0.0    0.0    0.0   0   0 c5
    0.0    0.0    0.0    0.0  0.0  0.0    0.0    0.0   0   0 c5t1d0
    0.0    0.0    0.0    0.0  0.0  0.0    0.0    0.0   0   0 c5d0
    0.0    0.0    0.0    0.0  0.0  0.0    0.0    0.0   0   0 c6d0
   29.8    0.0 1881.9    0.0  0.0  0.6    0.0   20.6   0  61 c3d0
   32.6    0.0 2086.3    0.0  0.0  0.4    0.0   11.7   0  38 c3d1
   30.2    0.0 1932.7    0.0  0.0  0.6    0.0   20.2   0  61 c4d0
   30.2    0.0 1932.7    0.0  0.0  0.4    0.0   12.4   0  37 c4d1
Tue Dec  8 17:34:20 CST 2009
     cpu
 us sy wt id
  1  1  0 98
                    extended device statistics
    r/s    w/s   kr/s   kw/s wait actv wsvc_t asvc_t  %w  %b device
    0.0    0.0    0.0    0.0  0.0  0.0    0.0    0.0   0   0 c5
    0.0    0.0    0.0    0.0  0.0  0.0    0.0    0.0   0   0 c5t1d0
    0.0    0.0    0.0    0.0  0.0  0.0    0.0    0.0   0   0 c5d0
    0.0    0.0    0.0    0.0  0.0  0.0    0.0    0.0   0   0 c6d0
   27.2    0.0 1728.2    0.0  0.0  0.6    0.0   20.3   0  55 c3d0
   30.2    0.0 1932.8    0.0  0.0  0.4    0.0   13.0   0  39 c3d1
   30.8    0.0 1958.6    0.0  0.0  0.7    0.0   21.5   0  66 c4d0
   31.6    0.0 2009.8    0.0  0.0  0.4    0.0   11.3   0  36 c4d1
Tue Dec  8 17:34:25 CST 2009
     cpu
 us sy wt id
  1  1  0 98
                    extended device statistics
    r/s    w/s   kr/s   kw/s wait actv wsvc_t asvc_t  %w  %b device
    0.0    0.0    0.0    0.0  0.0  0.0    0.0    0.0   0   0 c5
    0.0    0.0    0.0    0.0  0.0  0.0    0.0    0.0   0   0 c5t1d0
    0.0    0.0    0.0    0.0  0.0  0.0    0.0    0.0   0   0 c5d0
    0.0    0.0    0.0    0.0  0.0  0.0    0.0    0.0   0   0 c6d0
   30.8    0.0 1971.2    0.0  0.0  0.6    0.0   20.4   0  63 c3d0
   29.2    0.0 1868.8    0.0  0.0  0.4    0.0   13.0   0  38 c3d1
   30.2    0.0 1932.8    0.0  0.0  0.6    0.0   20.3   0  61 c4d0
   30.2    0.0 1920.2    0.0  0.0  0.4    0.0   12.1   0  37 c4d1
Tim Cook
2009-Dec-08 23:59 UTC
[zfs-discuss] ZFS pool unusable after attempting to destroy a dataset with dedup enabled
On Tue, Dec 8, 2009 at 5:38 PM, Jack Kielsmeier <jackal at netins.net> wrote:

> It's been about 45 minutes now since I started trying to import the pool.
> I see disk activity (see below).
>
> What concerns me is that my free memory keeps shrinking as time goes on.
> I now have 185 MB free out of 4 gigs (and 2 gigs of swap free).
> [...]

That's expected; ZFS will use all the memory it can unless you tell it not to. It shouldn't freeze the box.

--Tim
Jack Kielsmeier
2009-Dec-09 02:36 UTC
[zfs-discuss] ZFS pool unusable after attempting to destroy a dataset with dedup enabled
Ah, good to know! I'm learning all kinds of stuff here :)

The command (zpool import) is still running, and I'm still seeing disk activity.

Any rough idea how long this command should take? It looks like each disk is being read at a rate of 1.5-2 megabytes per second.

Going worst case, assuming each disk is 1572864 megs (the 1.5 TB disks are actually smaller than this due to the 'rounding' drive manufacturers do) and a 2 megs/sec read rate per disk, the most I should hopefully have to wait is:

1572864 (megs) / 2 (megs/second) / 60 (seconds/minute) / 60 (minutes/hour) / 24 (hours/day) = 9.1 days

Again, I don't know whether the zpool import is looking at the entire contents of the disks, or what exactly it's doing, but I'm hoping that would be the 'maximum' I'd have to wait for this command to finish :)
Brent Jones
2009-Dec-09 03:41 UTC
[zfs-discuss] ZFS pool unusable after attempting to destroy a dataset with dedup enabled
On Tue, Dec 8, 2009 at 6:36 PM, Jack Kielsmeier <jackal at netins.net> wrote:

> The command (zpool import) is still running, and I'm still seeing disk
> activity. Any rough idea how long this command should take? Looks like
> each disk is being read at a rate of 1.5-2 megabytes per second.
> [...]

I submitted a bug a while ago about this:

bugs.opensolaris.org/bugdatabase/view_bug.do?bug_id=6855208

I'll escalate since I have a support contract. But yes, I see this as a serious bug. I thought my machine had locked up entirely as well; it took about 2 days to finish a destroy on a volume about 12 TB in size.

--
Brent Jones
brent at servuhome.net
Jack Kielsmeier
2009-Dec-09 04:06 UTC
[zfs-discuss] ZFS pool unusable after attempting to destroy a dataset with dedup enabled
The server just went "almost" totally unresponsive :(

I can still hear the disks thrashing. If I press keys on the keyboard, my login screen does not show up. I had a VNC session hang and can no longer get back in. I can try to ssh to the server; I get prompted for my username and password, but it never drops me to a prompt:

-------------------------
login as: redshirt
Using keyboard-interactive authentication.
Password:
-------------------------

It just hangs there.

I also run VirtualBox with a Ubuntu VM; that VM went unresponsive as well.
Jack Kielsmeier
2009-Dec-09 04:16 UTC
[zfs-discuss] ZFS pool unusable after attempting to destroy a dataset with dedup enabled
I just hard-rebooted my server. I'm moving my VM off to my laptop so it can continue to run :)

Then, if it "freezes" again, I'll just let it sit, since I did hear the disks thrashing.
Tim Cook
2009-Dec-09 04:18 UTC
[zfs-discuss] ZFS pool unusable after attempting to destroy a dataset with dedup enabled
On Tue, Dec 8, 2009 at 10:16 PM, Jack Kielsmeier <jackal at netins.net> wrote:

> I just hard-rebooted my server. I'm moving my VM off to my laptop so it
> can continue to run :)
>
> Then, if it "freezes" again, I'll just let it sit, since I did hear the
> disks thrashing.

As long as you've already rebooted, you should limit the amount of memory ZFS can use before you restart the import.

--Tim
Jack Kielsmeier
2009-Dec-09 04:38 UTC
[zfs-discuss] ZFS pool unusable after attempting to destroy a dataset with dedup enabled
Ok,

When searching for how to do that, I see that it requires a modification to /etc/system. I'm thinking I'll limit it to 1 GB, so the entry (which must be in hex) appears to be:

set zfs:zfs_arc_max = 0x40000000

Then I'll reboot the server and try the import again.

Thanks for the continued assistance.
Jack Kielsmeier
2009-Dec-09 05:23 UTC
[zfs-discuss] ZFS pool unusable after attempting to destroy a dataset with dedup enabled
Upon further research, it appears I need to limit both ncsize and arc_max. I think I'll use:

set ncsize = 0x30000000
set zfs:zfs_arc_max = 0x10000000

That should give me a max of 1 GB used between both. If I should be using different values (or other settings), please let me know :)
Jack Kielsmeier
2009-Dec-09 05:31 UTC
[zfs-discuss] ZFS pool unusable after attempting to destroy a dataset with dedup enabled
Yikes, posted too soon. I don't want to set my ncsize that high!!! (I was thinking the entry was memory, but it's a number of entries.)

set ncsize = 250000
set zfs:zfs_arc_max = 0x10000000

Now THIS should hopefully make it so the process only takes around 1 GB of RAM.
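As a quick sanity check of the hex values in this exchange (plain arithmetic, not from the original posts), the shell can do the conversion:

   $ printf '%d\n' 0x40000000   # 1073741824 bytes = 1 GiB (the value first proposed)
   1073741824
   $ printf '%d\n' 0x10000000   # 268435456 bytes = 256 MiB (the value actually used)
   268435456

Note that ncsize is a count of DNLC entries, not bytes, so 250000 is an entry count, and the final settings cap the ARC itself at 256 MiB.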
Jack Kielsmeier
2009-Dec-09 05:47 UTC
[zfs-discuss] ZFS pool unusable after attempting to destroy a dataset with dedup enabled
Ok, I have started the zpool import again. Looking at iostat, it looks like I'm getting comparable read speeds (possibly a little slower):

                    extended device statistics
    r/s    w/s   kr/s   kw/s wait actv wsvc_t asvc_t  %w  %b device
    0.0    0.0    0.0    0.0  0.0  0.0    0.0    0.0   0   0 c5
    0.0    0.0    0.0    0.0  0.0  0.0    0.0    0.0   0   0 c5t1d0
    0.0    0.0    0.0    0.0  0.0  0.0    0.0    0.0   0   0 c5d0
    0.0    0.0    0.0    0.0  0.0  0.0    0.0    0.0   0   0 c6d0
   26.4    0.0 1689.6    0.0  0.0  0.6    0.0   21.9   0  58 c3d0
   28.0    0.0 1792.0    0.0  0.0  0.4    0.0   14.0   0  39 c3d1
   29.0    0.0 1856.0    0.0  0.0  0.6    0.0   22.0   0  64 c4d0
   27.6    0.0 1766.4    0.0  0.0  0.4    0.0   13.2   0  37 c4d1
Tue Dec  8 23:44:50 CST 2009
     cpu
 us sy wt id
  0  0  0 99
                    extended device statistics
    r/s    w/s   kr/s   kw/s wait actv wsvc_t asvc_t  %w  %b device
    0.0    0.0    0.0    0.0  0.0  0.0    0.0    0.0   0   0 c5
    0.0    0.0    0.0    0.0  0.0  0.0    0.0    0.0   0   0 c5t1d0
    0.0   13.8    0.0   54.6  0.0  0.0    1.4    2.4   0   3 c5d0
    0.6   12.6   26.2   54.6  0.0  0.0    0.4    1.3   0   2 c6d0
   28.2    0.0 1804.8    0.0  0.0  0.6    0.0   21.9   0  62 c3d0
   28.6    0.0 1817.8    0.0  0.0  0.4    0.0   13.8   0  40 c3d1
   27.2    0.0 1740.8    0.0  0.0  0.6    0.0   21.0   0  57 c4d0
   27.4    0.0 1741.0    0.0  0.0  0.4    0.0   12.9   0  35 c4d1

I also have a ton of free RAM now. I think I could bump up my settings quite a bit, but I'm not going to (at least not yet).

I'm heading to bed. I'll post an update tomorrow.
Jack Kielsmeier
2009-Dec-09 06:01 UTC
[zfs-discuss] ZFS pool unusable after attempting to destroy a dataset with dedup enabled
> I submitted a bug a while ago about this:
>
> bugs.opensolaris.org/bugdatabase/view_bug.do?bug_id=6855208
>
> I'll escalate since I have a support contract. But yes, I see this as
> a serious bug. I thought my machine had locked up entirely as well; it
> took about 2 days to finish a destroy on a volume about 12 TB in size.
>
> --
> Brent Jones
> brent at servuhome.net

Thanks for escalating this.
Fajar A. Nugraha
2009-Dec-09 06:08 UTC
[zfs-discuss] ZFS pool unusable after attempting to destroy a dataset with dedup enabled
On Wed, Dec 9, 2009 at 10:41 AM, Brent Jones <brent at servuhome.net> wrote:

> I submitted a bug a while ago about this:
>
> bugs.opensolaris.org/bugdatabase/view_bug.do?bug_id=6855208
>
> I'll escalate since I have a support contract. But yes, I see this as
> a serious bug. I thought my machine had locked up entirely as well; it
> took about 2 days to finish a destroy on a volume about 12 TB in size.

So the cause is the large size of the destroyed dataset? Yikes!

I mean, in a classic volume manager + fs setup, destroying a filesystem is as simple as removing the volume. I'd hoped that zfs could do the same thing in terms of speed, but apparently that's not the case (at least for now).

--
Fajar
Michael Herf
2009-Dec-09 07:37 UTC
[zfs-discuss] ZFS pool unusable after attempting to destroy a dataset with dedup enabled
Am in the same boat, exactly. I destroyed a large set and rebooted, with a scrub running on the same pool.

My reboot stuck on "Reading ZFS Config: *" for several hours (disks were active). I cleared the zpool.cache from single-user mode and am doing an import (I can boot again). I wasn't able to boot my build 123 environment (kernel panic), even though my rpool is an older version.

zpool import is pegging all 4 disks in my RAIDZ-1. I can't touch zpool/zfs commands during the import or they hang... but regular iostat is ok for watching what's going on.

I didn't limit ARC memory (box has 6 GB); we'll see if that's ok.
Markus Kovero
2009-Dec-09 07:44 UTC
[zfs-discuss] ZFS pool unusable after attempting to destroy a dataset with dedup enabled
From what I've noticed, if one destroys a dataset that is, say, 50-70 TB and reboots before the destroy is finished, it can take up to several _days_ before the pool is back up again.

So, nowadays I'm doing rm -fr BEFORE issuing zfs destroy whenever possible.

Yours,
Markus Kovero
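A minimal sketch of the workaround Markus describes, assuming a dataset mounted at /vault/shared_storage (names borrowed from earlier in the thread):

   rm -rf /vault/shared_storage/*      # free the blocks file by file while the system stays usable
   zfs destroy vault/shared_storage    # the destroy then has very little left to do

The idea, as this thread suggests, is that rm frees space incrementally and can be stopped safely, whereas an interrupted zfs destroy of a huge dataset must be finished at the next import.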
Michael Herf
2009-Dec-09 08:15 UTC
[zfs-discuss] ZFS pool unusable after attempting to destroy a dataset with dedup enabled
zpool import done! Back online.

Total downtime for the 4 TB pool was about 8 hours; I don't know how much of this was completing the destroy transaction.
Hi all,

Is there any way to generate a report on the deduplication feature of ZFS within a zpool/zfs pool? I mean, it's nice to have the dedup ratio, but I think it would also be good to have a report where we could see which directories/files have been found to be duplicated and therefore "suffered" deduplication.

Thanks for your time,
Bruno
On Wed, Dec 9, 2009 at 2:26 PM, Bruno Sousa <bsousa at epinfante.com> wrote:

> Is there any way to generate a report on the deduplication feature of
> ZFS within a zpool/zfs pool? I mean, it's nice to have the dedup ratio,
> but I think it would also be good to have a report where we could see
> which directories/files have been found to be duplicated and therefore
> "suffered" deduplication.

Nice to have at first glance, but could you detail any specific use-case you see?

Regards,
Andrey
Hi Andrey,

For instance, I talked about deduplication to my manager and he was happy, because less data = less storage, and therefore lower costs. However, the IT group of my company now needs to provide the management board a report of duplicated data found per share, and in our case one share means one specific company department/division.

Bottom line, the mindset is something like:

    * one share equals a specific department within the company
    * the department demands an X amount of data storage
    * the data storage costs Y
    * making a report of the amount of data consumed by a department, before and after deduplication, means that data storage costs can be seen per department
    * if there's a cost reduction due to the usage of deduplication, part of that money can be used for business, either IT-related subjects or general business
    * the management board wants to see numbers related to costs, and not things like "the ratio of deduplication in SAN01 is 3x", because for management this is "geek talk"

I hope I was somehow clear, but I can try to explain better if needed.

Thanks,
Bruno
Jack Kielsmeier
2009-Dec-09 11:47 UTC
[zfs-discuss] ZFS pool unusable after attempting to destroy a dataset with dedup enabled
> zpool import done! Back online.
>
> Total downtime for the 4 TB pool was about 8 hours; I don't know how
> much of this was completing the destroy transaction.

Lucky you! :)

My box has gone totally unresponsive again :( I cannot even ping it now, and I can't hear the disks thrashing.
On Wed, Dec 9, 2009 at 2:47 PM, Bruno Sousa <bsousa at epinfante.com> wrote:

> * making a report of the amount of data consumed by a department,
>   before and after deduplication, means that data storage costs can
>   be seen per department

Do you currently have tools that report storage usage per share? What you ask for looks like a request to make these deduplication-aware.

> * if there's a cost reduction due to the usage of deduplication, part
>   of that money can be used for business [...]
> * the management board wants to see numbers related to costs, and not
>   things like "the ratio of deduplication in SAN01 is 3x", because
>   for management this is "geek talk"

Just divide storage costs by the deduplication factor (>1), and there you are (provided you can do it by department).

Regards,
Andrey
Jack Kielsmeier
2009-Dec-09 12:51 UTC
[zfs-discuss] ZFS pool unusable after attempting to destroy a dataset with dedup enabled
I have disabled all 'non-important' processes (gdm, ssh, vnc, etc.). I am now starting this process locally on the server via the console, with about 3.4 GB of RAM free.

I still have my entries in /etc/system limiting how much RAM zfs can use.
Hi,

The tools to report storage usage per share are du -h / df -h :), so yes, these tools could be made deduplication-aware. I know, for instance, that Microsoft has a feature (in Win2003 R2) called File Server Resource Manager, which includes the ability to make Storage Reports, and one of those reports is Duplicated Files.

Bottom line, if ZFS can deliver such a capability, I think Solaris/OpenSolaris would gain yet another competitive edge over other solutions, and therefore more customers could see more and more advantages in choosing ZFS-based storage.

Bruno
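ZFS has no per-file dedup report, since dedup happens at the block level, but as a rough pool-wide view zdb can dump the dedup table statistics. A sketch, assuming the pool name vault from this thread (output format varies by build):

   zdb -D vault    # DDT summary: entry counts plus dedup/compress/copies ratios
   zdb -DD vault   # adds a histogram of blocks by reference count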
Bob Friesenhahn
2009-Dec-09 16:58 UTC
[zfs-discuss] ZFS pool unusable after attempting to destroy a dataset with dedup enabled
On Wed, 9 Dec 2009, Markus Kovero wrote:

> From what I've noticed, if one destroys a dataset that is, say, 50-70 TB
> and reboots before the destroy is finished, it can take up to several
> _days_ before the pool is back up again. So, nowadays I'm doing rm -fr
> BEFORE issuing zfs destroy whenever possible.

It stands to reason that if deduplication is done via reference counting, then whenever a deduplicated block is freed its reference count needs to be decremented, and it needs to be done atomically. Blocks such as full-length zeroed blocks (common for zfs logical volumes) are likely to be quite heavily duplicated. That may be where this bottleneck is coming from.

Bob
--
Bob Friesenhahn
bfriesen at simple.dallas.tx.us, simplesystems.org/users/bfriesen
GraphicsMagick Maintainer,    GraphicsMagick.org
On Dec 9, 2009, at 3:47 AM, Bruno Sousa wrote:

> For instance, I talked about deduplication to my manager and he was
> happy, because less data = less storage, and therefore lower costs.
> However, the IT group of my company now needs to provide the management
> board a report of duplicated data found per share, and in our case one
> share means one specific company department/division.
> [...]
> I hope I was somehow clear, but I can try to explain better if needed.

Snapshots, copies, compression, deduplication, and (eventually) encryption occur at the block level, not the file level. Hence, file-level accounting works as long as you do not try to make a 1:1 relationship to physical space.

But your problem, as described above, is one of managerial accounting. IMHO, trying to apply a technical solution to a managerial accounting problem is akin to catching a greased pig. It is much easier to just do what businessmen do: manage managerial accounting.
en.wikipedia.org/wiki/Managerial_accounting
-- richard
Hi,

Despite the fact that I agree in general with your comments, in reality it all comes down to money. So in this case, if I could prove that ZFS was able to find X amount of duplicated data, and that X amount of data has a price of Y per GB, IT could be seen as a business enabler instead of a cost centre.

But indeed, you're right: in my case a technical solution is trying to answer a managerial question. However, isn't that why IT was invented? I believe that's why I get my paycheck each month :)

Bruno
On Wed, 9 Dec 2009, Bruno Sousa wrote:

> Despite the fact that I agree in general with your comments, in reality
> it all comes down to money. So in this case, if I could prove that ZFS
> was able to find X amount of duplicated data, and that X amount of data
> has a price of Y per GB, IT could be seen as a business enabler instead
> of a cost centre.

Most of the cost of storing business data is related to the cost of backing it up and administering it, rather than the cost of the system on which it is stored. In this case it is reasonable to know the total amount of user data (and charge for it), since it likely needs to be backed up and managed. Deduplication does not help much here.

Bob
--
Bob Friesenhahn
bfriesen at simple.dallas.tx.us, simplesystems.org/users/bfriesen
GraphicsMagick Maintainer,    GraphicsMagick.org
On Wed, Dec 9, 2009 at 10:43 PM, Bob Friesenhahn <bfriesen at simple.dallas.tx.us> wrote:

> Most of the cost of storing business data is related to the cost of
> backing it up and administering it, rather than the cost of the system
> on which it is stored. [...] Deduplication does not help much here.

Um, I thought deduplication had been invented to reduce the backup window :).

Regards,
Andrey
Hi,

The data needs to be stored somewhere, and usually we need a server, a disk array, and disks; more data means more disks, and more active disks mean more power usage, therefore higher costs and less green IT :) So, from my point of view, deduplication is relevant for lowering costs, but in order to show that, there has to be a way to measure those costs/savings. Yes, this probably represents less than 20% of the total cost, but it is a cost no matter what.

However, maybe I'm driving down the wrong road...

Bruno
On Wed, 9 Dec 2009, Andrey Kuzmin wrote:

> Um, I thought deduplication had been invented to reduce the backup window :).

Unless the backup system also supports deduplication, in what way does deduplication reduce the backup window?

Bob
--
Bob Friesenhahn
bfriesen at simple.dallas.tx.us, simplesystems.org/users/bfriesen
GraphicsMagick Maintainer,    GraphicsMagick.org
Jack Kielsmeier
2009-Dec-09 23:12 UTC
[zfs-discuss] ZFS pool unusable after attempting to destroy a dataset with dedup enabled
> I have disabled all 'non-important' processes (gdm, ssh, vnc, etc.). I am
> now starting this process locally on the server via the console, with
> about 3.4 GB of RAM free.
>
> I still have my entries in /etc/system limiting how much RAM zfs can use.

Going on 10 hours now, still importing. Still at just under 2 MB/s read speed on each disk in the pool.
Jack Kielsmeier
2009-Dec-09 23:41 UTC
[zfs-discuss] ZFS pool unusable after attempting to destroy a dataset with dedup enabled
> Going on 10 hours now, still importing. Still at just under 2 MB/s read
> speed on each disk in the pool.

And now it's frozen again. It's been frozen for 10 minutes now.

I had iostat running on the console. At the time of the freeze, it started writing to the zfs pool disks; previous to that, it had been all reads.

The console cursor is still blinking at least, so it's not a hard lock. I'm just going to let it sit for a while and see what happens.
Cindy Swearingen
2009-Dec-09 23:44 UTC
[zfs-discuss] ZFS pool unusable after attempting to destroy a dataset with dedup enabled
I wonder if you are hitting this bug:

bugs.opensolaris.org/bugdatabase/view_bug.do?bug_id=6905936
Deleting large files or filesystems on a dedup=on filesystem stalls the whole system

Cindy
Jack Kielsmeier
2009-Dec-10 00:49 UTC
[zfs-discuss] ZFS pool unusable after attempting to destroy a dataset with dedup enabled
Ah, that could be it!

This leaves me hopeful, as that bug report suggests it will eventually finish!
On Dec 9, 2009, at 11:07 AM, Bruno Sousa wrote:

> Despite the fact that I agree in general with your comments, in reality
> it all comes down to money. So in this case, if I could prove that ZFS
> was able to find X amount of duplicated data, and that X amount of data
> has a price of Y per GB, IT could be seen as a business enabler instead
> of a cost centre.
> [...]

OK, I think I've pulled your leg just a bit :-) Here is the problem: if you charge per byte, then when you dedup, the cost per byte increases. Why? Because you have both fixed and variable costs, and dedup will reduce your variable (per-byte) cost.

cost = fixed cost ($) + [per-byte cost ($/byte) * bytes]

The best way to solve this is through managerial accounting (aka change the rules :-), which happens quite often in business. See also Captain Kirk's response to the Kobayashi Maru:
en.wikipedia.org/wiki/Kobayashi_Maru

Finally, as my managerial accounting professor says, "don't lose money" :-)
-- richard
Hi,

Couldn't agree more... but I just asked if there was such a tool :)

Bruno
Jack Kielsmeier
2009-Dec-11 04:41 UTC
[zfs-discuss] ZFS pool unusable after attempting to destroy a dataset with dedup enabled
My import is still going (I hope; I can't confirm, since my system appears to be totally locked except for the little blinking console cursor). It's been well over a day.

I'm less hopeful now, but will still let it "do its thing" for another couple of days.
Jack Kielsmeier
2009-Dec-12 23:46 UTC
[zfs-discuss] ZFS pool unusable after attempting to destroy a dataset with dedup enabled
It's been over 72 hours since my last import attempt. The system is still non-responsive. No idea if it's doing anything.
Jack Kielsmeier
2009-Dec-13 04:09 UTC
[zfs-discuss] ZFS pool unusable after attempting to destroy a dataset with dedup enabled
My system was pingable again, but unfortunately I had disabled all services such as ssh. My console was still hung, but I wondered if I had hung USB hardware (since I use a USB keyboard and everything had been hung for days). I force-rebooted, and the pool was not imported :(

I started the process off again, this time with remote services enabled, and am telling myself not to touch the sucker for 7 days. We'll see if that lasts :)
Ross
2009-Dec-14 10:46 UTC
[zfs-discuss] ZFS pool unusable after attempting to destroy a dataset with dedup enabled
Thanks for the update. It's no help to you, of course, but I'm watching your progress with interest. Your progress updates are very much appreciated.
Jack Kielsmeier
2009-Dec-15 02:50 UTC
[zfs-discuss] ZFS pool unusable after attempting to destroy a dataset with dedup enabled
Thanks.

I've decided now to only post when:

1) I have my zfs pool back, or
2) I give up

I should note that there are periods of time when I can ping my server (rarely), but most of the time not. I have not been able to ssh into it, and the console is hung (minus the little blinking cursor).

I'm going to let this "run" until the end of the week. If I don't have my zpool back by then, I'm guessing I never will.
Victor Latushkin
2009-Dec-15 06:31 UTC
[zfs-discuss] ZFS pool unusable after attempting to destroy a dataset with dedup enabled
On Dec 15, 2009, at 5:50, Jack Kielsmeier <jackal at netins.net> wrote:

> I'm going to let this "run" until the end of the week. If I don't have
> my zpool back by then, I'm guessing I never will.

Don't give up! Let's wait a bit longer, and if it doesn't work we'll see what can be done.

Regards,
Victor
Jack Kielsmeier
2009-Dec-15 07:25 UTC
[zfs-discuss] ZFS pool unusable after attempting to destroy a dataset with dedup enabled
> Don't give up! Let's wait a bit longer, and if it doesn't work we'll
> see what can be done.
>
> Regards,
> Victor

Ah, thanks. As long as there is stuff to try, I won't give up. I miss being able to use my server, but I'll live :)

I should note that I can live with losing the data that I had on my pool. While I would prefer recovering it, I can stand to lose it.
Cindy Swearingen
2009-Dec-16 18:05 UTC
[zfs-discuss] ZFS pool unusable after attempting to destroy a dataset with dedup enabled
Jack,

We'd like to get a crash dump from this system to determine the root cause of the system hang. You can get a crash dump from a live system like this:

# savecore -L
dumping to /dev/zvol/dsk/rpool/dump, offset 65536, content: kernel
 0:18 100% done
100% done: 49953 pages dumped, dump succeeded
savecore: System dump time: Wed Dec 16 10:37:51 2009
savecore: Saving compressed system crash dump in /var/crash/v120-brm-08/vmdump.0
savecore: Decompress the crash dump with 'savecore -vf /var/crash/v120-brm-08/vmdump.0'

It won't impact the running system.

Then, upload the crash dump file by following these instructions:

wikis.sun.com/display/supportfiles/Sun+Support+Files+-+Help+and+Users+Guide

Let us know when you get it uploaded.

Thanks,

Cindy
Jack Kielsmeier
2009-Dec-16 19:13 UTC
[zfs-discuss] ZFS pool unusable after attempting to destroy a dataset with dedup enabled
> We'd like to get a crash dump from this system to determine the root
> cause of the system hang. You can get a crash dump from a live system
> like this:
>
> # savecore -L
> [...]

I'd be glad to do this, but I have a question.

If the dump needs to happen while the system is hanging, how can I run the dump? :) I cannot ssh in, and my console is completely unresponsive.

Would running the dump help at all when my system is not hung? If so, I can hard-reboot the server and run said command.

I could also run the dump when I first start the zpool import. My system does not hang until stuff is being written to the pool. It takes several hours for this to happen (last time it was something like 14 hours of reading, and then mass writes started showing up in iostat; the freeze always happens exactly when writes to the disks get very busy). When iostat is refreshing every 5 seconds, I only get one output that shows writes before it freezes.

Thanks
Cindy Swearingen
2009-Dec-16 20:25 UTC
[zfs-discuss] ZFS pool unusable after attempting to destroy a dataset with dedup enabled
In some cases, root logged into the console can still function, but if
not, then you'd need to shut down the system and run sync. I can walk
you through those steps if you need them.

If you've been tortured long enough, then feel free to upload a crash
dump and let us know.

Thanks,

Cindy

On 12/16/09 12:13, Jack Kielsmeier wrote:
> If the dump needs to happen while the system is hanging, how can I run the dump? :) I cannot ssh in, and my console is completely unresponsive.
>
> Would running the dump help at all when my system is not hung? If so I can hard reboot the server and run said command.
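On x86, the "shut down the system and run sync" path usually amounts to forcing a crash dump as the machine goes down. A minimal sketch of one way to do that, assuming a root shell that still responds; this is a general Solaris technique, not necessarily Cindy's exact steps:

  # force a kernel crash dump, then reboot; savecore collects it on the
  # next boot
  reboot -d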
Jack Kielsmeier
2009-Dec-16 21:34 UTC
[zfs-discuss] ZFS pool unusable after attempting to destroy a dataset with dedup enabled
I'll see what I can do. I have a busy couple of days, so it may not be until Friday that I can spend much time on this.

Thanks
--
This message posted from opensolaris.org
Jack Kielsmeier
2009-Dec-18 04:13 UTC
[zfs-discuss] ZFS pool unusable after attempting to destroy a dataset with dedup enabled
Ok, my console is 100% completely hung; I'm not going to be able to enter any commands when it freezes.

I can't even get the numlock light to change its status.

This time I even plugged in a PS/2 keyboard instead of USB, thinking maybe it was USB dying during the hang, but not so.

I have hard rebooted my system again.

I'm going to set up a script that will continuously run savecore; after 10 runs, I'll reset the bounds file. Hopefully by doing it this way, I'll get a savecore right as the system starts to go unresponsive.

I'll post the script I'll be running here shortly after I write it.

Also, as far as using 'sync', I'm not sure what exactly I would do there.
--
This message posted from opensolaris.org
Jack Kielsmeier
2009-Dec-18 04:40 UTC
[zfs-discuss] ZFS pool unusable after attempting to destroy a dataset with dedup enabled
Ok, this is the script I am running (as a background process). This script doesn't matter much; it's just here for reference, as I'm running into problems just running the savecore command while the zpool import is running.

####################################################################
#!/bin/bash
# Loop savecore -L forever; every 10 dumps, remove the bounds file so
# the dump numbering resets instead of growing without limit.
count=1
rm /var/crash/opensol/bounds
/usr/bin/savecore -L
while [ 1 ]
do
        if [ $count == 10 ]
        then
                count=1
                rm /var/crash/opensol/bounds
        fi
        savecore -L
        count=`expr $count + 1`
done
####################################################################

opensol was the name of the system before I renamed it to wd40; crash data is still set to be put in /var/crash/opensol.

I have started another zpool import of the vault volume:

####################################################################
root@wd40:~# zpool import
  pool: vault
    id: 4018273146420816291
 state: ONLINE
action: The pool can be imported using its name or numeric identifier.
config:

        vault       ONLINE
          raidz1-0  ONLINE
            c3d0    ONLINE
            c3d1    ONLINE
            c4d0    ONLINE
            c4d1    ONLINE

root@wd40:~# zpool import 4018273146420816291 &
[1] 1093
####################################################################

After starting the import, savecore -L no longer finishes:

root@wd40:/var/adm# savecore -L
dumping to /dev/zvol/dsk/rpool/dump, offset 65536, content: kernel
 0:05 100% done
100% done: 153601 pages dumped, dump succeeded

It should be saying that it's saving to /var/crash/opensol/, but instead it just hangs and never returns me to a prompt.

Previous to running zpool import, the savecore command took anywhere from 10-15 seconds to finish.

If I cd to /var/crash/opensol, there is no new file created.

I tried firing off savecore again, same result. A ps listing shows the savecore commands:

root@wd40:/var/crash/opensol# ps -ef | grep savecore
    root  1092  1061   0 22:27:55 ?           0:01 savecore -L
    root  1134  1083   0 22:33:28 pts/3       0:00 grep savecore
    root  1113   787   0 22:30:23 ?           0:01 savecore -L

(One of these is from the script I was running when I started the import manually, the other from when I just ran the savecore -L command by itself.)

I cannot kill these processes, even with a kill -9.

I then hard rebooted my server yet again (as it hangs if it's in the process of a zpool import).

After the reboot, all I did was ssh in, disable gdm, run my zpool import command, and try another savecore (this time not using my script above, just a simple savecore -L as root from the command line). Once again it hangs:

root@wd40:~# zpool import 4018273146420816291 &
[1] 783
root@wd40:~# savecore -L
dumping to /dev/zvol/dsk/rpool/dump, offset 65536, content: kernel
 0:05 100% done
100% done: 138876 pages dumped, dump succeeded
--
This message posted from opensolaris.org
Victor Latushkin
2009-Dec-18 06:41 UTC
[zfs-discuss] ZFS pool unusable after attempting to destroy a dataset with dedup enabled
On 18.12.09 07:13, Jack Kielsmeier wrote:
> Ok, my console is 100% completely hung; I'm not going to be able to
> enter any commands when it freezes.
>
> I have hard rebooted my system again.

I think it may be better to boot the system with kmdb loaded - you need to
edit your GRUB menu OpenSolaris entry and add -k to the kernel$ line.

Or you can just load kmdb from the console:

mdb -K

then type :c to continue.

When the system freezes, you can use the F1-A key combination to drop into
kmdb, and then you can type $<systemdump to generate a crash dump and
reboot.

Regards,
victor

--
Victor Latushkin                 phone: x11467 / +74959370467
TSC-Kernel EMEA                  mobile: +78957693012
Sun Services, Moscow             blog: blogs.sun.com/vlatushkin
Sun Microsystems
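To make those steps concrete, here is a sketch of what the edits look like. The kernel$ path below is an assumption (it varies by install); only the -k flag and the kmdb/systemdump commands come from Victor's instructions:

  # In GRUB, press 'e' on the OpenSolaris entry and append -k to the
  # kernel$ line, e.g. (path illustrative):
  kernel$ /platform/i86pc/kernel/$ISADIR/unix -k -B $ZFS-BOOTFS

  # Or, on an already-running system, load kmdb and continue:
  # mdb -K
  # [0]> :c

  # After the hang, press F1-A to drop into kmdb, then force a dump:
  # [0]> $<systemdump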
Jack Kielsmeier
2009-Dec-18 14:20 UTC
[zfs-discuss] ZFS pool unusable after attempting to destroy a dataset with dedup enabled
Ah! Ok, I will give this a try tonight! Thanks. -- This message posted from opensolaris.org
Jack Kielsmeier
2009-Dec-19 02:43 UTC
[zfs-discuss] ZFS pool unusable after attempting to destroy a dataset with dedup enabled
Ok, I have started my import after using the -k on my kernel line (I just did a test dump using this method to make sure it works ok, and it does).

I have also added the following to my /etc/system file and rebooted:

set snooping=1

According to this page:

developers.sun.com/solaris/articles/manage_core_dump.html

"Sometimes, the system will hang without any response even when you use kmdb or OBP. In this case, use the "deadman timer." The deadman timer allows the OS to force a kernel panic in the event of a system hang. This feature is available on x86 and SPARC systems. Add the following line to /etc/system and reboot so the deadman timer will be enabled."

And this will force a kernel panic. I'll wait for the system to hang and give it a try.

Again, thanks for all the help.
--
This message posted from opensolaris.org
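A quick way to confirm the setting took effect after the reboot (an added aside, not part of the quoted article; snooping is the kernel variable the /etc/system line sets):

  # print the live value; non-zero means the deadman timer is armed
  echo "snooping/D" | mdb -k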
Jack Kielsmeier
2009-Dec-20 16:05 UTC
[zfs-discuss] ZFS pool unusable after attempting to destroy a dataset with dedup enabled
Ok, dump uploaded!

############################################################
Thanks for your upload

Your file has been stored as "/cores/redshirt-vmdump.0" on the Supportfiles service.
Size of the file (in bytes) : 1743978496.
The file has a cksum of : 2878443682 .
############################################################

It's about 1.7 GB compressed!
--
This message posted from opensolaris.org
Jack Kielsmeier
2009-Dec-21 23:00 UTC
[zfs-discuss] ZFS pool unusable after attempting to destroy a dataset with dedup enabled
I don't mean to sound ungrateful (because I really do appreciate all the help I have received here), but I am really missing the use of my server. Over Christmas, I want to be able to use my laptop (right now, it's acting as a server for some of the things my OpenSolaris server did). This means I will need to get my server back up and running in full working order by then.

All the data that I lost is unimportant data, so I'm not really missing anything there.

Again, I do appreciate all the help, but I'm going to "give up" if no solution can be found in the next couple of days. This is simply because I want to be able to use my hardware.

What I plan on doing is simply formatting each disk that was part of the bad pool and creating a new one.
--
This message posted from opensolaris.org
tom wagner
2009-Dec-27 00:50 UTC
[zfs-discuss] ZFS pool unusable after attempting to destroy a dataset with dedup enabled
I am having the exact same problem after destroying a dataset with a few gigabytes of data and dedup. I typed "zfs destroy vault/virtualmachines", which was a zvol with dedup turned on, and the server hung; couldn't ping, couldn't get on the console. Next bootup, same thing: it just hangs when importing the filesystems.

I removed one of the pool disks and also all the mirrors so that I can experiment without losing the original data. But all of my behaviors are acting the exact same way as in this thread, although I can't lose this particular data, as it's been a week since the last backup, so I am freaking out a little. I am on build 130 and was going to do another backup, but was troubleshooting a different issue regarding poor iscsi performance before I did the backup. I deleted that zvol with dedup on and now I'm in the same boat as the parent: same hangs.

An interesting thing that happens is that when I hit the power button, which usually tells the system to start a shutdown, during these hangs it says not enough kernel memory. Perhaps a memory leak during the failed destroy is causing the hangups. But literally every description in here matches my symptoms, and during an import I can see the disks get hit pretty hard for about 3 minutes and then stop cold turkey; the system is unresponsive, just a blinking cursor at the console, and I can hit enter to generate a newline but everything is blank, so the console is locked pretty hard.

The pool is made up of 3 mirrored vdevs, a cache and a log device. Everything was running great until I destroyed that one little deduped zvol. I was able to destroy other zvols right before that one.

Has anybody had a chance to look at the dump the poster sent up?

Thanks
--
This message posted from opensolaris.org
Jack Kielsmeier
2009-Dec-27 06:52 UTC
[zfs-discuss] ZFS pool unusable after attempting to destroy a dataset with dedup enabled
I still haven't given up :)

I moved my Virtual Machines to my main rig (which gets rebooted often, so this is 'not optimal' to say the least) :)

I have since upgraded to 129.

I noticed that even if timeslider/autosnaps are disabled, a zpool command still gets generated every 15 minutes. Since all zpool/zfs commands freeze during the import, I'd have hundreds of hung zpool processes. I stopped this by commenting out all jobs on the zfssnap crontab as well as the auto-snap cleanup job on root's crontab. This did nothing to resolve my issue, but I figured I should note it. I'd copy and paste the exact jobs, but my server is once again hung.

I'm going to upgrade my server (new motherboard that supports more than 4GB of RAM). I'll have double the RAM; perhaps there is some sort of RAM issue going on. I really wanted to get 16GB of RAM, but my own personal budget will not allow it :)
--
This message posted from opensolaris.org
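For anyone wanting to do the same, the crontabs in question can be inspected and edited along these lines; zfssnap is the user the OpenSolaris auto-snapshot jobs run as, but treat the exact job names as an assumption since they aren't shown above:

  # list the jobs before touching them (run as root)
  crontab -l zfssnap
  crontab -l root

  # then comment out the zfs-auto-snapshot lines
  crontab -e zfssnap
  crontab -e root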
Jack Kielsmeier
2009-Dec-27 06:54 UTC
[zfs-discuss] ZFS pool unusable after attempting to destroy a dataset with dedup enabled
Just wondering, how much RAM is in your system?
--
This message posted from opensolaris.org
tom wagner
2009-Dec-27 19:27 UTC
[zfs-discuss] ZFS pool unusable after attempting to destroy a dataset with dedup enabled
4 Gigabytes.

The hang on my system happens much faster. I can watch the drives light up and run iostat, but 3 minutes in, like clockwork, everything gets hung and I'm left with a blinking cursor at the console that newlines but doesn't do anything. Although if I run kmdb and hit F1-A, I can get into the debugger.

I'm thinking that upon import it sees the destroy never finished and tries again, and the same thing that hung the system during the original destroy is hanging the pool again and again during these imports.

For the heck of it I even tried using -F to roll the uberblock, but no joy. I wouldn't recommend this, as it can be destructive, but since I pulled the mirrored drive physically out of my pool, I still have a good copy of the original pool.

I'm really worried that I won't get this data back. I'm hoping it's just a resource leak issue or something rather than corrupted metadata from destroying a dedup zvol.
--
This message posted from opensolaris.org
Jack Kielsmeier
2009-Dec-27 22:31 UTC
[zfs-discuss] ZFS pool unusable after attempting to destroy a dataset with dedup enabled
It sounds like you have less data on yours; perhaps that is why yours freezes faster.

Whatever mine is doing during the import, it reads my disks for nearly 24 hours, and then starts writing to the disks. The reads start out fast, then they just sit, going at something like 20 KB/second on each disk in my raidz1 pool. As soon as it's done reading whatever it's reading, it starts to write; that is when the freeze happens.

I think the folks here from Sun that have been assisting are on holiday break. I'm guessing there won't be further assistance from them until after the first of the year.
--
This message posted from opensolaris.org
Jack Kielsmeier
2009-Dec-27 22:35 UTC
[zfs-discuss] ZFS pool unusable after attempting to destroy a dataset with dedup enabled
Here is iostat output of my disks being read:

    r/s    w/s   kr/s   kw/s wait actv wsvc_t asvc_t  %w  %b device
   45.3    0.0   27.6    0.0  0.0  0.6    0.0   13.3   0  60 c3d0
   44.3    0.0   27.0    0.0  0.0  0.3    0.0    7.7   0  34 c3d1
   43.5    0.0   27.4    0.0  0.0  0.5    0.0   12.6   0  55 c4d0
   41.1    0.0   24.9    0.0  0.0  0.3    0.0    8.0   0  33 c4d1

Very, very slow.

It didn't used to take as long to freeze for me, but every time I restart the process, the 'reading' portion of the zpool import seems to take much longer.
--
This message posted from opensolaris.org
scottford
2009-Dec-27 22:47 UTC
[zfs-discuss] ZFS pool unusable after attempting to destroy a dataset with dedup enabled
I have a pool in the same state. I deleted a fileset that was compressed and deduped and had a bunch of zero blocks in it. The delete ran for a while and then it hung. Trying to import with any combination of -f or -fF or -fFX gives the same results you guys get. zdb -eud shows all my filesets, and the one I deleted gives error 16, inconsistent data.

My pool has 12 750GB drives with 2 vdevs, 6 drives in each raidz2 set. I have 12GB RAM and a Core i7. When I run the import it can return as fast as a few hours or as long as 3 days, depending on the options I choose. Each run does end with a system hang. At some points all 8 logical cpus are running at 50% for hours.

I don't mind rolling back to a previous consistent state; I just need some help getting it right.

I started on snv_128 and have since upgraded to 129. Haven't tried 130 yet. I tried limiting the arc size according to another post and it just took longer to get to the system hang.

Scott
--
This message posted from opensolaris.org
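For reference, the zdb invocation Scott mentions runs against an exported (un-imported) pool; -e, -u, and -d are the flags for an exported pool, the uberblock, and the dataset listing. The pool name below is an assumption taken from elsewhere in this thread:

  # list datasets and uberblock info without importing the pool
  zdb -e -u -d vault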
Jack Kielsmeier
2009-Dec-28 03:03 UTC
[zfs-discuss] ZFS pool unusable after attempting to destroy a dataset with dedup enabled
One thing that bugged me is that I cannot ssh as myself to my box when a zpool import is running. It just hangs after accepting my password. I had to convert root from a role to a user and ssh as root to my box.

I now know why this is: when I log in, /usr/sbin/quota gets called. This must run a zfs or zpool command to get quota information, which hangs during an import.
--
This message posted from opensolaris.org
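If anyone else hits this, the quota call comes from the stock login scripts. A minimal workaround, assuming the default Solaris /etc/profile is what invokes it on your build, is to comment that line out until the pool is healthy:

  # in /etc/profile, disable the per-login quota check
  #       /usr/sbin/quota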
scottford
2009-Dec-29 19:21 UTC
[zfs-discuss] ZFS pool unusable after attempting to destroy a dataset with dedup enabled
I booted the snv_130 live CD and ran zpool import -fFX, and it took a day, but it imported my pool and rolled it back to a previous version. I haven't looked to see what was missing, but I didn't need any of the changes from the last few weeks.

Scott
--
This message posted from opensolaris.org
tom wagner
2009-Dec-29 20:05 UTC
[zfs-discuss] ZFS pool unusable after attempting to destroy a dataset with dedup enabled
> I booted the snv_130 live CD and ran zpool import -fFX, and it took a
> day, but it imported my pool and rolled it back to a previous version.

I'll give it a shot. Hope this works. Will report back if it succeeds.
--
This message posted from opensolaris.org
Jack Kielsmeier
2009-Dec-30 06:55 UTC
[zfs-discuss] ZFS pool unusable after attempting to destroy a dataset with dedup enabled
I got my pool back!!!!

Did a rig upgrade (new motherboard, processor, and 8 GB of RAM), re-installed OpenSolaris 2009.06, did an upgrade to snv_130, and did the import!

The import only took about 4 hours!

I have a hunch that I was running into some sort of issue with not having enough RAM previously. Of course, that's just a guess.
--
This message posted from opensolaris.org
Jack Kielsmeier
2009-Dec-30 06:57 UTC
[zfs-discuss] ZFS pool unusable after attempting to destroy a dataset with dedup enabled
I should note that my import command was:

zpool import -f vault
--
This message posted from opensolaris.org
tom wagner
2009-Dec-30 18:26 UTC
[zfs-discuss] ZFS pool unusable after attempting to destroy a dataset with dedup enabled
Yeah, still no joy on getting my pool back. I think I might have to try grabbing another server with a lot more memory and slapping the HBA and the drives in that. Can ZFS deal with a controller change?
--
This message posted from opensolaris.org
Richard Elling
2009-Dec-30 18:35 UTC
[zfs-discuss] ZFS pool unusable after attempting to destroy a dataset with dedup enabled
On Dec 30, 2009, at 10:26 AM, tom wagner wrote:
> Yeah, still no joy on getting my pool back. I think I might have to
> try grabbing another server with a lot more memory and slapping the
> HBA and the drives in that. Can ZFS deal with a controller change?

Yes.
 -- richard
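A bit of color on that "yes": ZFS identifies pool members by the labels on the disks, not by their device paths, so after moving drives to a new HBA a plain import finds them under whatever new c#t#d# names they get. A sketch, with the pool name taken from this thread:

  # on the old box, if it is still responsive
  zpool export vault

  # move the disks, then on the new controller
  zpool import          # shows the pool and its new device names
  zpool import vault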
Jack Kielsmeier
2009-Dec-31 14:12 UTC
[zfs-discuss] ZFS pool unusable after attempting to destroy a dataset with dedup enabled
> Yeah, still no joy on getting my pool back. I think I might have to
> try grabbing another server with a lot more memory and slapping the
> HBA and the drives in that. Can ZFS deal with a controller change?

Just some more info that 'may' help. After I upgraded to 8GB of RAM, I did not limit the amount of RAM zfs can take. So if you are doing any kind of limiting in /etc/system, you may want to take that out.
--
This message posted from opensolaris.org
tom wagner
2010-Jan-01 22:23 UTC
[zfs-discuss] ZFS pool unusable after attempting to destroy a dataset with dedup enabled
Yeah, still no joy. I moved the disks to another machine altogether with 8GB and a quad core Intel versus the dual core AMD I was using, and it still just hangs the box on import. This time I did a nohup zpool import -fFX vault after booting off the b130 live DVD on this machine into single user text mode so I'd have minimal processes, and the machine still hangs tighter than a drum. Can't even hit enter and get a newline this way, probably because the bash process is locked. I've left it for 24 hours like this and will leave it for another day or two to see if it is actually doing anything behind the scenes.

I guess my plan B will be to leave these disks in a closet and try again some time in the future; hopefully in some later build the kinks get worked out enough with dedup to deal with my pool, as I'd really not like to lose the data in this pool.
--
This message posted from opensolaris.org
Richard Elling
2010-Jan-01 22:43 UTC
[zfs-discuss] ZFS pool unusable after attempting to destroy a dataset with dedup enabled
On Jan 1, 2010, at 2:23 PM, tom wagner wrote:
> Yeah, still no joy. I moved the disks to another machine altogether
> with 8GB and a quad core Intel versus the dual core AMD I was using,
> and it still just hangs the box on import. [...]

Are the drive lights blinking? If so, then let it do its work. Rebooting won't help because when the pool is imported, the destroy will continue. See other recent threads in this forum on the subject for more insight.

opensolaris.org/jive/forum.jspa?forumID=80&start=0
 -- richard
tom wagner
2010-Jan-01 23:49 UTC
[zfs-discuss] ZFS pool unusable after attempting to destroy a dataset with dedup enabled
That's the thing: the drive lights aren't blinking, but I was thinking maybe the writes are going so slowly that it's possible they aren't registering. And since I can't keep a running iostat, I can't tell if anything is going on. I can however get into kmdb. Is there something in there that can monitor storage activity or anything? Probably not, but it's worth asking.
--
This message posted from opensolaris.org
Markus Kovero
2010-Jan-02 12:10 UTC
[zfs-discuss] ZFS pool unusable after attempting to destroy a dataset with dedup enabled
If the pool isn't rpool, you might want to boot into single-user mode (-s after the kernel parameters on boot), remove /etc/zfs/zpool.cache, and then reboot. After that you can just ssh into the box and watch iostat during the import.

Yours
Markus Kovero
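Spelled out, that sequence looks roughly like this; removing the cache file means the pool is no longer auto-imported at boot, so the box comes up cleanly. The pool name and iostat interval below are illustrative:

  # boot with -s appended to the kernel line, then in single-user mode:
  rm /etc/zfs/zpool.cache
  reboot

  # after the reboot, from another machine:
  ssh root@server
  iostat -xn 5 &        # keep device statistics scrolling
  zpool import vault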
Colin Raven
2010-Jan-02 18:36 UTC
[zfs-discuss] ZFS pool unusable after attempting to destroy a dataset with dedup enabled
On Sat, Jan 2, 2010 at 13:10, Markus Kovero <Markus.Kovero at nebula.fi> wrote:
> If the pool isn't rpool, you might want to boot into single-user mode
> (-s after the kernel parameters on boot), remove /etc/zfs/zpool.cache,
> and then reboot. After that you can just ssh into the box and watch
> iostat during the import.

Wow, it's utterly priceless tidbits like this that keep me addicted to zfs-discuss.
tom wagner
2010-Jan-02 19:18 UTC
[zfs-discuss] ZFS pool unusable after attempting to destroy a dataset with dedup enabled
> If the pool isn't rpool, you might want to boot into single-user mode
> (-s after the kernel parameters on boot), remove /etc/zfs/zpool.cache,
> and then reboot. After that you can just ssh into the box and watch
> iostat during the import.

Hey Markus,

Thanks for the suggestion, but as stated in the thread, I am booting using "-s -kv -m verbose", and deleting the cache file was one of the first troubleshooting steps we and the others affected did. The other problem is that we were all starting an iostat at the console and ssh'ing in during multiuser mode and starting the import, but the eventual hang starts hanging iostat as well and kills the ssh.

Seems like this issue is affecting more users than just me, judging from this and the other threads I've been watching.

Update on the other stuff: this is day 3 of my import and still no joy.

Thanks,
~Bryan
--
This message posted from opensolaris.org
Jack Kielsmeier
2010-Jan-02 19:51 UTC
[zfs-discuss] ZFS pool unusable after attempting to destroy a dataset with dedup enabled
> That's the thing: the drive lights aren't blinking, but I was thinking
> maybe the writes are going so slowly that it's possible they aren't
> registering. And since I can't keep a running iostat, I can't tell if
> anything is going on.
>
> Oh, and for the other guys, was your ZIL on an ssd or in the pool? My
> ZIL is on a 30GB ssd from ocz and my l2arc is on another ssd of the
> same type. I'm wondering if your ZILs are in the pool and that is
> therefore helping your recovery where I may be hitting a simultaneous
> bug.
>
> Message was edited by: tomwag

No SSDs in my system. ZIL is in the pool.
--
This message posted from opensolaris.org
Markus Kovero
2010-Jan-02 22:29 UTC
[zfs-discuss] ZFS pool unusable after attempting to destroy a dataset with dedup enabled
> Hey Markus,
>
> Thanks for the suggestion, but as stated in the thread, I am booting
> using "-s -kv -m verbose", and deleting the cache file was one of the
> first troubleshooting steps we and the others affected did. [...]
>
> Update on the other stuff: this is day 3 of my import and still no joy.

Oh, my bad, I didn't go through the thread so closely. Anyway, it seems a bit odd that it's blocking I/O completely. Have you tried reading from the pool's member disks with dd before the import and checking the iostat error counters for hw/transport errors? Did you try a different set of RAM in the other server? Faulty RAM could do this as well. And is your swap device okay? If it happens to swap onto a faulty device during the import, that might cause interesting behavior as well.

Yours
Markus Kovero
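As a concrete version of those checks - the device names below are the ones from Jack's pool earlier in the thread, so substitute your own; p0 is the x86 whole-disk node:

  # read a member disk end to end; media errors will surface on the console
  dd if=/dev/rdsk/c3d0p0 of=/dev/null bs=1024k

  # show per-device soft/hard/transport error counters
  iostat -En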
tom wagner
2010-Jan-04 16:47 UTC
[zfs-discuss] ZFS pool unusable after attempting to destroy a dataset with dedup enabled
In this last iteration, I switched to a completely different box with twice the resources. Somehow, from the symptoms, I don't think trying it on one of the 48 or 128GB servers at work is going to change the outcome; the hang happens too fast. It seems like something in the destroy is causing this system hang. On this last iteration I also used a live DVD of b130 on the 8GB system, which another person in this thread who experienced the problem was able to use to get his pool back.

As far as hardware errors: since I used a completely different system for this last import, nothing was the same except for the disks, and even then I used only half the mirrored disks from the pool and tried the import on each half. The reason I am only using half of the mirrored disks is so that if I screw up the pool while experimenting to get my data back, I still have a copy of the pool to use for later tests.

Also, to any devs: there is a thread on here where a user is also experiencing the problem, and he is offering up his server for remote access for a dev to check out the problem.

After 4 days, I stopped the import as it was just sort of sitting there hung. Anyone know if I can import a pool without the log device? I'd like to try that next if there is some way to do that.
--
This message posted from opensolaris.org
Jack Kielsmeier
2010-Jan-08 08:22 UTC
[zfs-discuss] ZFS pool unusable after attempting to destroy a dataset with dedup enabled
I'm thinking that the issue is simply with zfs destroy, not with dedup or compression.

Yesterday I decided to do some iscsi testing. I created a new dataset in my pool, 1TB. I did not use compression or dedup. After copying about 700GB of data from my windows box (NTFS on top of the iscsi disk), I decided I didn't want to use it, so I attempted to delete the dataset. Once again, the command froze.

I removed the zfs cache file and am now trying to import my pool... again.

This time, the memory fills up QUICKLY; I hit 8GB used in about an hour, then the box completely freezes. iostat shows each of my disks being read at about 10 MB/s up until the freeze.

It does not matter if I limit the ARC size in /etc/system; the behavior is the same.
--
This message posted from opensolaris.org
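For completeness, the ARC cap discussed in this thread lives in /etc/system and looks like the line below; the 4 GB value is only an example, and per the reports above it did not prevent this particular hang:

  * /etc/system: limit the ZFS ARC to 4 GB (0x100000000 bytes)
  set zfs:zfs_arc_max = 0x100000000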