Georg S. Duck
2010-Jan-27 11:10 UTC
[zfs-discuss] zfs destroy hangs machine if snapshot exists- workaround found
Hi,
I was suffering for weeks from the following problem: a zfs dataset contained an automatic (monthly) snapshot that used 2.8 TB of data. The dataset was deprecated, so I chose to destroy it after I had deleted some files; eventually it was completely empty except for the snapshot, which still locked 2.8 TB on the pool.

'zfs destroy -r pool/dataset'

hung the machine within seconds, leaving it completely unresponsive. No related messages could be found in the logs. The issue was reproducible. The same happened with

'zfs destroy pool/dataset@snapshot'

Thus, the conclusion was that the snapshot was indeed the problem.

Solution: after trying several things, including updating the system to snv_130 and snv_131, I had the idea of rolling the dataset back to the snapshot before making another zfs destroy attempt:

'zfs rollback pool/dataset@snapshot'
'zfs unmount -f pool/dataset'
'zfs destroy -r pool/dataset'

Et voilà! It worked.

Conclusion: I guess there is something wrong in how zfs handles snapshots during a recursive dataset destruction. As it seems, the destruction only succeeds if the dataset is consistent with the snapshot. Even if the workaround seems viable, a fix for the issue would be appreciated.

Regards,
Tonmaus
--
This message posted from opensolaris.org
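Collected in one place, the workaround amounts to the following sketch ("pool/dataset" and "@snapshot" are placeholders for your own names):

```shell
# Workaround from this thread: make the live dataset consistent
# with its snapshot before destroying it.
zfs rollback pool/dataset@snapshot   # revert the dataset to the snapshot state
zfs unmount -f pool/dataset          # force-unmount so nothing holds it open
zfs destroy -r pool/dataset          # the recursive destroy now succeeds
```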
erik.ableson
2010-Jan-27 11:39 UTC
[zfs-discuss] zfs destroy hangs machine if snapshot exists- workaround found
On 27 Jan 2010, at 12:10, Georg S. Duck wrote:
> Hi,
> I was suffering for weeks from the following problem: a zfs dataset contained an automatic (monthly) snapshot that used 2.8 TB of data. The dataset was deprecated, so I chose to destroy it after I had deleted some files; eventually it was completely empty except for the snapshot, which still locked 2.8 TB on the pool.
>
> 'zfs destroy -r pool/dataset'
>
> hung the machine within seconds, leaving it completely unresponsive. No related messages could be found in the logs. The issue was reproducible. The same happened with
>
> 'zfs destroy pool/dataset@snapshot'
>
> Thus, the conclusion was that the snapshot was indeed the problem.

For info, I have exactly the same situation here, with a snapshot that cannot be deleted and that results in the same symptoms: total freeze, even on the console. The server responds to pings, but that's it. All iSCSI, NFS and ssh connections are cut. Currently running b130.

I'll try the workaround once I get some spare space to migrate the contents.

Erik
Georg S. Duck
2010-Jan-27 13:02 UTC
[zfs-discuss] zfs destroy hangs machine if snapshot exists- workaround found
> Server responds to pings, but that's it. All iSCSI, NFS and ssh connections are cut.

That's consistent with my findings, adding that SMB is cut as well. During one vain attempt to destroy the dataset@snapshot I got a "[ID 224711 kern.warning] WARNING: Memory pressure: TCP defensive mode on". When I had a separate ssh session open with 'top' running, I could watch the CPU load go through the roof before that session died along with everything else.

> For info, I have exactly the same situation here with a snapshot that cannot be deleted that results in the same symptoms.

That would rule out an empty dataset as a relevant side condition.

> I'll try the workaround once I get some spare space to migrate the contents.

If your final aim isn't the destruction of the dataset, that exacerbates the situation. After I had understood the issue with snapshots, my choice was to de-activate all automatic snapshots on non-rpool pools; I have different backup protocols in place anyhow. Note that automatic snapshots are on by default.

Regards,
Tonmaus
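For reference, a sketch of how automatic snapshots can be switched off on a non-root pool, assuming the standard OpenSolaris zfs-auto-snapshot service ("tank" is a placeholder pool name):

```shell
# The zfs-auto-snapshot SMF service honours the com.sun:auto-snapshot
# user property, which is inherited by child datasets.
zfs set com.sun:auto-snapshot=false tank           # disable all schedules on the pool
zfs set com.sun:auto-snapshot:monthly=false tank   # or disable just the monthly schedule
```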
Tim Haley
2010-Jan-27 15:26 UTC
[zfs-discuss] zfs destroy hangs machine if snapshot exists- workaround found
On 01/27/10 04:39 AM, erik.ableson wrote:
> On 27 Jan 2010, at 12:10, Georg S. Duck wrote:
>> Hi,
>> I was suffering for weeks from the following problem: a zfs dataset contained an automatic (monthly) snapshot that used 2.8 TB of data. The dataset was deprecated, so I chose to destroy it after I had deleted some files; eventually it was completely empty except for the snapshot, which still locked 2.8 TB on the pool.
>>
>> 'zfs destroy -r pool/dataset'
>>
>> hung the machine within seconds, leaving it completely unresponsive. No related messages could be found in the logs. The issue was reproducible. The same happened with
>>
>> 'zfs destroy pool/dataset@snapshot'
>>
>> Thus, the conclusion was that the snapshot was indeed the problem.
>
> For info, I have exactly the same situation here, with a snapshot that cannot be deleted and that results in the same symptoms: total freeze, even on the console. The server responds to pings, but that's it. All iSCSI, NFS and ssh connections are cut. Currently running b130.
>
> I'll try the workaround once I get some spare space to migrate the contents.
>
> Erik
>
> _______________________________________________
> zfs-discuss mailing list
> zfs-discuss at opensolaris.org
> http://mail.opensolaris.org/mailman/listinfo/zfs-discuss

This sounds like yet another instance of

6910767 deleting large holey objects hangs other I/Os

I have a module based on 130 that includes this fix, if you would like to try it.

-tim
Tonmaus
2010-Jan-27 20:00 UTC
[zfs-discuss] zfs destroy hangs machine if snapshot exists- workaround found
> This sounds like yet another instance of
>
> 6910767 deleting large holey objects hangs other I/Os
>
> I have a module based on 130 that includes this fix if you would like to try it.
>
> -tim

Hi Tim,

6910767 seems to be about ZVOLs. The dataset here was not a ZVOL. I had a 1.4 TB ZVOL on the same pool that also wasn't easy to kill. It hung the machine as well, but only once: it was gone after a forced reboot.

Regards,
Tonmaus
Alasdair Lumsden
2010-Apr-27 16:39 UTC
[zfs-discuss] zfs destroy hangs machine if snapshot exists- workaround found
Hi - was there any progress on this issue? I'd be interested to know if any bugs were filed regarding it, and whether there's a way to follow up on the progress.

Cheers,

Alasdair
Lo Zio
2010-Jul-01 12:23 UTC
[zfs-discuss] zfs destroy hangs machine if snapshot exists- workaround found
I also have this problem: with 134, if I delete big snapshots the server hangs, responding only to ping. I also have the ZVOL issue. Any news about these being solved? In my case this is a big problem, since I'm using osol as a file server...

Thanks
Roy Sigurd Karlsbakk
2010-Jul-01 18:29 UTC
[zfs-discuss] zfs destroy hangs machine if snapshot exists- workaround found
----- Original Message -----
> I also have this problem: with 134, if I delete big snapshots the server hangs, responding only to ping.
> I also have the ZVOL issue.
> Any news about these being solved?
> In my case this is a big problem, since I'm using osol as a file server...

Are you using dedup? If so, that's the reason - dedup isn't ready for production yet unless you have sufficient RAM/L2ARC.

Vennlige hilsener / Best regards

roy
--
Roy Sigurd Karlsbakk
(+47) 97542685
roy at karlsbakk.net
http://blogg.karlsbakk.net/
--
In all pedagogy it is essential that the curriculum be presented intelligibly. It is an elementary imperative for all pedagogues to avoid excessive use of idioms of foreign origin. In most cases, adequate and relevant synonyms exist in Norwegian.
Lo Zio
2010-Jul-01 19:23 UTC
[zfs-discuss] zfs destroy hangs machine if snapshot exists- workaround found
Thanks roy, I read a lot around and was also thinking it was a dedup-related problem, although I did not find any indication of how much RAM is enough, and never found anything saying "Do not use dedup, it will definitely crash your server". I'm using a Dell Xeon with 4 GB of RAM; maybe it is not an uber-server, but it works really well (when it is not hung, I mean).

Do you have an idea of the optimal config to have 1.5 TB of available space in 10 datasets (5 deduped), and 10 rotating snapshots?

Thanks
Roy Sigurd Karlsbakk
2010-Jul-01 19:32 UTC
[zfs-discuss] zfs destroy hangs machine if snapshot exists- workaround found
----- Original Message -----
> Thanks roy, I read a lot around and was also thinking it was a dedup-related problem, although I did not find any indication of how much RAM is enough, and never found anything saying "Do not use dedup, it will definitely crash your server". I'm using a Dell Xeon with 4 GB of RAM; maybe it is not an uber-server, but it works really well (when it is not hung, I mean).
> Do you have an idea of the optimal config to have 1.5 TB of available space in 10 datasets (5 deduped), and 10 rotating snapshots?
> Thanks

Erik Trimble had a post on this today, pasted below. Look through that - but seriously, with 4 GB of RAM and no L2ARC, dedup is of no use.

roy

Actually, I think the rule of thumb is 270 bytes per DDT entry. It's 200 bytes of ARC for every L2ARC entry; the DDT itself doesn't count against this ARC space usage.

E.g.: I have 1 TB of 4k files that are to be deduped, and it turns out that I have about a 5:1 dedup ratio. I'd also like to see how much ARC usage I eat up with a 160 GB L2ARC.

(1) How many entries are there in the DDT? 1 TB of 4k blocks means there are 2^28 blocks (about 268 million). However, at a 5:1 dedup ratio, I'm only actually storing 20% of that, so I have about 54 million unique blocks. Thus, I need a DDT of about 270 * 54 million =~ 14.5 GB in size.

(2) My L2ARC is 160 GB in size, but I'm using 14.5 GB for the DDT. Thus, I have about 145 GB free for use as a data cache. 145 GB / 4k =~ 35 million blocks can be stored in the remaining L2ARC space. However, referencing those 35 million blocks takes up 200 * 35 million =~ 7 GB of space in ARC.

Thus, I'd better have at least 7 GB of RAM allocated solely for L2ARC reference pointers, and no other use.

--
Vennlige hilsener / Best regards

roy
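The rule-of-thumb arithmetic above is easy to redo as a quick shell calculation. The 270- and 200-byte constants are this thread's rules of thumb, not guarantees; note that 1 TiB of 4k blocks is 2^40 / 2^12 = 2^28, about 268 million blocks:

```shell
# Sketch of the DDT / L2ARC sizing arithmetic from the thread:
# ~270 bytes per DDT entry, ~200 bytes of ARC per L2ARC entry.
data_bytes=$(( 1024 * 1024 * 1024 * 1024 ))   # 1 TiB of data
block_size=4096                               # 4k blocks
dedup_ratio=5                                 # assumed 5:1 dedup

total_blocks=$(( data_bytes / block_size ))
unique_blocks=$(( total_blocks / dedup_ratio ))
ddt_bytes=$(( unique_blocks * 270 ))          # size of the dedup table

l2arc_bytes=$(( 160 * 1000 * 1000 * 1000 ))   # 160 GB SSD for L2ARC
cache_bytes=$(( l2arc_bytes - ddt_bytes ))    # L2ARC left for data cache
cache_blocks=$(( cache_bytes / block_size ))
arc_overhead=$(( cache_blocks * 200 ))        # ARC needed for L2ARC pointers

echo "unique blocks:     $unique_blocks"
echo "DDT size (GB):     $(( ddt_bytes / 1000000000 ))"
echo "ARC overhead (GB): $(( arc_overhead / 1000000000 ))"
```

With these inputs the DDT comes out around 14 GB and the ARC overhead for the remaining L2ARC around 7 GB - well beyond a 4 GB machine before any other ARC use.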
Erik Trimble
2010-Jul-01 19:33 UTC
[zfs-discuss] zfs destroy hangs machine if snapshot exists- workaround found
On 7/1/2010 12:23 PM, Lo Zio wrote:
> Thanks roy, I read a lot around and was also thinking it was a dedup-related problem, although I did not find any indication of how much RAM is enough, and never found anything saying "Do not use dedup, it will definitely crash your server". I'm using a Dell Xeon with 4 GB of RAM; maybe it is not an uber-server, but it works really well (when it is not hung, I mean).
> Do you have an idea of the optimal config to have 1.5 TB of available space in 10 datasets (5 deduped), and 10 rotating snapshots?
> Thanks

Take a look at the archives for these threads:

Dedup RAM requirements, vs. L2ARC?
http://mail.opensolaris.org/pipermail/zfs-discuss/2010-June/042661.html

Dedup performance hit
http://mail.opensolaris.org/pipermail/zfs-discuss/2010-June/042235.html

4 GB of RAM is likely to be *way* too small to run dedup with your setup. You almost certainly need an SSD for L2ARC, and probably at least 2x the RAM. The "hangs" you see are likely the Dedup Table being built on the fly from the datasets, which is massively I/O intensive.

--
Erik Trimble
Java System Support
Mailstop: usca22-123
Phone: x17195
Santa Clara, CA
Timezone: US/Pacific (GMT-0800)
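Before guessing at DDT size, the pool's dedup table statistics can be inspected directly; a sketch, assuming a pool named "tank" (a placeholder):

```shell
# zdb -D prints a summary of the dedup table (entry counts and
# in-core/on-disk size per entry); -DD adds a reference-count
# histogram showing how well the data actually dedups.
zdb -D tank
zdb -DD tank

# The pool-wide dedup ratio also shows up in the DEDUP column of:
zpool list tank
```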
Lo Zio
2010-Jul-02 10:37 UTC
[zfs-discuss] zfs destroy hangs machine if snapshot exists- workaround found
Sorry roy, but reading the post you pointed me to - "meaning about 1.2 GB per 1 TB stored on 128 kB blocks" - I have 1.5 TB and 4 GB of RAM, and not all of it is deduped. Why do you say it's *way* too small? It should be *way* enough.

From the performance point of view it is not a problem; I use that machine to store backups, and using delta-snapshots it transfers a reasonable amount of data each night.

BTW, do you have any info on whether 2010.WHATEVER will support dedup in a stable way?

Also, you said it should be a very I/O-intensive task that blocks my server, but before getting hung, iostat and zpool iostat show very little I/O (about 500 KB/s) and no CPU usage. Now I have started a destroy -r on a deduped dataset of about 500 GB, and it crashed. Do you think it may come back up in a few days? I have until Monday to test.

Thanks
Roy Sigurd Karlsbakk
2010-Jul-02 11:19 UTC
[zfs-discuss] zfs destroy hangs machine if snapshot exists- workaround found
----- Original Message -----
> Sorry roy, but reading the post you pointed me to - "meaning about 1.2 GB per 1 TB stored on 128 kB blocks" - I have 1.5 TB and 4 GB of RAM, and not all of it is deduped.
> Why do you say it's *way* too small? It should be *way* enough.
> From the performance point of view it is not a problem; I use that machine to store backups, and using delta-snapshots it transfers a reasonable amount of data each night.
> BTW, do you have any info on whether 2010.WHATEVER will support dedup in a stable way?

I have no idea if 201\d\.\d+ will have usable dedup at release time, but on my test system (Intel Core 2 Duo 2.3 GHz, 8 GB RAM, 8 x 2 TB disks and a couple of 160 GB X25-Ms) dedup doesn't behave very well when removing deduped data. It takes forever, and if an unexpected reboot happens while this is running, it will hang osol on bootup while reading zfs data - well, "hang" is the wrong word, but it'll finish whatever it started before booting up. Since this can take hours or even days, and the system won't be very useful while it's doing this, I've decided to halt testing on dedup until something comes out of Oracle.

Vennlige hilsener / Best regards

roy
Lo Zio
2010-Jul-10 08:04 UTC
[zfs-discuss] zfs destroy hangs machine if snapshot exists- workaround found
In any case, do you have an idea of how to solve my current problem? I have 450 GB in a deduped dataset I want to destroy, and every attempt I make results in a machine hang. I just want to destroy a dataset and all of its snapshots! I tried unmounting before zfs destroy, but had no luck...

Thanks