Brent Jones
2009-Dec-27 21:35 UTC
[zfs-discuss] [osol-help] zfs destroy stalls, need to hard reboot
On Sun, Dec 27, 2009 at 12:55 AM, Stephan Budach <stephan.budach at jvm.de> wrote:
> Brent,
>
> I have known about that bug for a couple of weeks, but it was filed against v111 and we're at v130. I have also searched the ZFS part of this forum and really couldn't find much about this issue.
>
> The other issue I noticed is that, contrary to the statements I read, other operations do not keep working once zfs is underway destroying a big dataset. When destroying the 3 TB dataset, the other zvol that had been exported via iSCSI stalled as well, and that's really bad.
>
> Cheers,
> budy

I just tested your claim, and you appear to be correct.

I created a couple of dummy ZFS filesystems, loaded them with about 2 TB, exported them via CIFS, and destroyed one of them. The destroy took the usual amount of time (about 2 hours), and, quite to my surprise, all I/O on the ENTIRE zpool stalled. I don't recall seeing this prior to 130; in fact, I know I would have noticed it, as we create and destroy large ZFS filesystems very frequently.

So it seems the original issue I reported many months back has actually gained some new negative impacts :(

I'll try to escalate this with my Sun support contract, but Sun support still isn't very familiar with OpenSolaris, so I doubt I will get very far.

Cross-posting to zfs-discuss as well, as others may have seen this and may know of a solution or workaround.

--
Brent Jones
brent at servuhome.net
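For anyone who wants to poke at this on a scratch pool, a minimal sketch of such a test is below. The pool and dataset names and the fill size are only placeholders, so adjust them to whatever you have handy; the idea is simply to run the destroy in the background and watch the write latency on an unrelated dataset in the same pool.

#!/usr/bin/env python
# Rough reproduction sketch: create a throwaway dataset, fill it, destroy it,
# and watch whether unrelated I/O on the same pool stalls while the destroy runs.
# Pool/dataset names and the fill size are placeholders; run as root.
import os
import subprocess
import time

POOL    = "tank"                   # assumption: an existing scratch pool
VICTIM  = POOL + "/destroytest"    # dataset we fill and then destroy
WITNESS = POOL + "/witness"        # dataset we keep writing to during the destroy

subprocess.check_call(["zfs", "create", VICTIM])
subprocess.check_call(["zfs", "create", WITNESS])

# Fill the victim dataset with incompressible data (about 4 GiB here; scale up to taste).
with open("/%s/fill.bin" % VICTIM, "wb") as f:
    for _ in range(4096):
        f.write(os.urandom(1024 * 1024))    # 1 MiB per write

# Kick off the destroy in the background.
destroy = subprocess.Popen(["zfs", "destroy", VICTIM])

# Meanwhile, time small synchronous writes to the other dataset on the same pool.
witness_file = "/%s/probe.bin" % WITNESS
while destroy.poll() is None:
    t0 = time.time()
    with open(witness_file, "wb") as f:
        f.write(b"x" * 8192)
        f.flush()
        os.fsync(f.fileno())
    print("witness write latency: %.3f s" % (time.time() - t0))
    time.sleep(1)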
Brent Jones
2009-Dec-29 08:34 UTC
[zfs-discuss] [osol-help] zfs destroy stalls, need to hard reboot
On Sun, Dec 27, 2009 at 1:35 PM, Brent Jones <brent at servuhome.net> wrote:
> I just tested your claim, and you appear to be correct.
>
> I created a couple of dummy ZFS filesystems, loaded them with about 2 TB, exported them via CIFS, and destroyed one of them. The destroy took the usual amount of time (about 2 hours), and, quite to my surprise, all I/O on the ENTIRE zpool stalled. I don't recall seeing this prior to 130; in fact, I know I would have noticed it, as we create and destroy large ZFS filesystems very frequently.
>
> So it seems the original issue I reported many months back has actually gained some new negative impacts :(

I did some more testing, and it seems this is 100% reproducible ONLY if the filesystem and/or the entire pool had compression or de-dupe enabled at some point. It doesn't seem to matter whether de-dupe/compression was enabled for 5 minutes or for the entire life of the pool: as soon as either is turned on in snv_130, doing any type of mass change (like deleting a big filesystem) will hang ALL I/O for a significant amount of time.

If I create a filesystem with neither enabled, fill it with a few TB of data, and do a 'zfs destroy' on it, it goes pretty quickly, just a couple of minutes, with no noticeable impact on system I/O.

I'm curious about the 7000 series appliances, since those supposedly ship now with de-dupe as a fully supported option. Is the core ZFS code on the 7000 appliances significantly different from a recent build of OpenSolaris? My sales rep assures me there's very little overhead from enabling de-dupe on the 7000 series (which he's trying to sell us, obviously), but I can't see how that could be, when I have the same hardware the 7000s run on (a fully loaded X4540).

Any thoughts from anyone?

--
Brent Jones
brent at servuhome.net
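For anyone who wants to run a similar with/without comparison on a scratch pool, a rough sketch is below. The pool name and fill size are placeholders, and the timing is just wall clock around the destroy; it is not meant to be a rigorous benchmark.

#!/usr/bin/env python
# Sketch: time 'zfs destroy' for an otherwise identical dataset with dedup on vs. off.
# Pool name and fill size are placeholders; run as root on a scratch pool.
import os
import subprocess
import time

POOL = "tank"   # assumption: an existing scratch pool

def fill(mountpath, megabytes):
    """Write incompressible data so compression cannot skew the comparison."""
    with open(os.path.join(mountpath, "fill.bin"), "wb") as f:
        for _ in range(megabytes):
            f.write(os.urandom(1024 * 1024))

def timed_destroy(name, dedup):
    subprocess.check_call(["zfs", "create", name])
    subprocess.check_call(["zfs", "set", "compression=off", name])
    subprocess.check_call(["zfs", "set", "dedup=%s" % ("on" if dedup else "off"), name])
    fill("/" + name, 2048)                      # about 2 GiB per run in this sketch
    t0 = time.time()
    subprocess.check_call(["zfs", "destroy", name])
    return time.time() - t0

print("dedup=off destroy took %.1f s" % timed_destroy(POOL + "/nodedup", False))
print("dedup=on  destroy took %.1f s" % timed_destroy(POOL + "/dedup", True))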
Stephan Budach
2009-Dec-29 09:32 UTC
[zfs-discuss] [osol-help] zfs destroy stalls, need to hard reboot
Hi Brent,

what you have noticed makes sense, and that behaviour has been present since v127, when dedupe was introduced in OpenSolaris. It also fits my observations. I thought I had totally messed up one of my OpenSolaris boxes, which I had used to take my first steps with ZFS/dedupe, and re-creating the same zpool on another OpenSolaris box immediately returned the pool to high-performance I/O. Alas, after I enabled dedupe on one of the zfs vols, the system started to show those issues again.

If I understand correctly, ZFS calculates a SHA-256 checksum anyway, so dedupe really shouldn't impact performance significantly. I have installed OpenSolaris on a Dell R610 with two current Nehalem CPUs and 12 GB of RAM, and I couldn't notice a difference in I/O with or without dedupe configured.

Budy
Richard Elling
2009-Dec-29 17:50 UTC
[zfs-discuss] [osol-help] zfs destroy stalls, need to hard reboot
On Dec 29, 2009, at 12:34 AM, Brent Jones wrote:
> I did some more testing, and it seems this is 100% reproducible ONLY if the filesystem and/or the entire pool had compression or de-dupe enabled at some point. It doesn't seem to matter whether de-dupe/compression was enabled for 5 minutes or for the entire life of the pool: as soon as either is turned on in snv_130, doing any type of mass change (like deleting a big filesystem) will hang ALL I/O for a significant amount of time.

I don't believe compression matters, but dedup can really make a big difference. When you enable dedup, the deduplication table (DDT) is created to keep track of the references to blocks. When you remove a file, the reference counter needs to be decremented for each block in the file. When a DDT entry has a reference count of zero, the block can be freed.

When you destroy a file system (or dataset) which has dedup enabled, all of the blocks written since dedup was enabled need to have their reference counters decremented. This workload looks like a small, random read followed by a small write. With luck, the small, random read will already be in the ARC, but you can't escape the small write (though the writes should be coalesced).

Bottom line: rm or destroy of deduplicated files or datasets creates a flurry of small, random I/O to the pool. If the devices in the pool are not optimized for lots of small, random I/O, then this activity will take a long time.

...which brings up a few interesting questions: does it make sense to remove deduplicated files? How do we schedule automatic snapshot removal?

I filed an RFE on a method to address this problem. I'll pass along the CR if or when it is assigned.
 -- richard
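To make that bookkeeping concrete, here is a toy model of the reference counting. It is purely an illustration, not the real on-disk DDT layout, but it shows why destroying a deduped dataset degenerates into one small read-modify-write of a table entry per block.

# Toy model of deduplication-table reference counting. Illustration only,
# not the real ZFS DDT format. The point: freeing a deduped block means a
# small read and a small write of its DDT entry, one per block destroyed.
import hashlib

ddt = {}   # checksum -> reference count

def write_block(data):
    key = hashlib.sha256(data).digest()
    ddt[key] = ddt.get(key, 0) + 1      # new entry, or one more reference
    return key

def free_block(key):
    ddt[key] -= 1                       # small read + small write per block
    if ddt[key] == 0:
        del ddt[key]                    # refcount hit zero: block really freed
        return True                     # space reclaimed
    return False                        # other references still exist

# Two "files" that share a block:
file_a = [write_block(b"hello"), write_block(b"world")]
file_b = [write_block(b"hello")]        # dedups against file_a's first block

# "Destroying" file_a touches the DDT once for every one of its blocks.
freed = sum(free_block(k) for k in file_a)
print("blocks actually freed:", freed)      # 1, because "hello" is still referenced by file_b
print("DDT entries remaining:", len(ddt))   # 1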
Eric D. Mudama
2009-Dec-29 18:03 UTC
[zfs-discuss] [osol-help] zfs destroy stalls, need to hard reboot
On Tue, Dec 29 at 9:50, Richard Elling wrote:
> I don't believe compression matters, but dedup can really make a big difference. When you enable dedup, the deduplication table (DDT) is created to keep track of the references to blocks.

Are there any published notes on relative DDT size compared to file count, dedup efficiency, pool size, etc., for admins to make server capacity-planning decisions?

--eric

--
Eric D. Mudama
edmudama at mail.bounceswoosh.org
Richard Elling
2009-Dec-29 18:10 UTC
[zfs-discuss] [osol-help] zfs destroy stalls, need to hard reboot
On Dec 29, 2009, at 10:03 AM, Eric D. Mudama wrote:
> Are there any published notes on relative DDT size compared to file count, dedup efficiency, pool size, etc., for admins to make server capacity-planning decisions?

I think it is still too early to tell. The community will need to do more experiments and share results :-) Also, the DDT is not instrumented, quite unlike the ARC, for instance. I've been making some DTrace measurements, but am not yet ready to share any results.
 -- richard
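In the meantime, a very rough first cut at capacity planning could look like the sketch below. Note that the roughly 320 bytes per in-core DDT entry used here is only an assumed ballpark figure, not a measured or published number; swap in real numbers once the community has them.

# Back-of-the-envelope DDT sizing. BYTES_PER_DDT_ENTRY is an assumed ballpark,
# not a measured figure; replace it when better data is available.
BYTES_PER_DDT_ENTRY = 320

def ddt_ram_estimate(data_bytes, avg_block_size, dedup_ratio):
    """Rough RAM needed to hold the whole DDT for the deduped data in a pool.

    data_bytes     : logical data stored in deduped datasets
    avg_block_size : typical block size (recordsize for file data)
    dedup_ratio    : logical/physical ratio, e.g. 2.0 means 2x dedup
    """
    total_blocks = data_bytes / float(avg_block_size)
    unique_blocks = total_blocks / dedup_ratio   # one DDT entry per unique block
    return unique_blocks * BYTES_PER_DDT_ENTRY

# Example: 10 TB of data, 128 KB records, 2x dedup works out to roughly 12-13 GB of DDT.
ten_tb = 10 * 2**40
print("%.1f GB" % (ddt_ram_estimate(ten_tb, 128 * 1024, 2.0) / 2**30))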
Stephan Budach
2009-Dec-30 09:33 UTC
[zfs-discuss] [osol-help] zfs destroy stalls, need to hard reboot
Richard,

well, I am willing to experiment with dedup, but I am quite unsure how to share my results effectively. That is, what would be the interesting data that would help improve ZFS/dedup, and how should that data be presented? I reckon that beyond sharing general issues, some hard facts might be of interest here.

Cheers,
budy
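One low-effort way to produce comparable numbers would be to capture the same few command outputs before and after each experiment. A minimal sketch is below; the pool name is a placeholder, and zdb wants root.

#!/usr/bin/env python
# Sketch of a minimal data-collection pass for sharing dedup observations.
# It captures a handful of command outputs into one text file for posting.
# The pool name is a placeholder; run as root so zdb can read the pool.
import subprocess

POOL = "tank"   # assumption: the pool under test

COMMANDS = [
    ["zpool", "get", "dedupratio", POOL],                          # overall dedup ratio
    ["zpool", "list", POOL],                                       # size / alloc / free
    ["zfs", "get", "-r", "dedup,compression,compressratio,used", POOL],
    ["zdb", "-DD", POOL],                                          # DDT statistics/histogram
]

with open("dedup-report.txt", "w") as out:
    for cmd in COMMANDS:
        out.write("### %s\n" % " ".join(cmd))
        try:
            out.write(subprocess.check_output(cmd).decode())
        except subprocess.CalledProcessError as e:
            out.write("command failed: %s\n" % e)
        out.write("\n")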
William D. Hathaway
2009-Dec-30 13:28 UTC
[zfs-discuss] [osol-help] zfs destroy stalls, need to hard reboot
I know dedup is on the roadmap for the 7000 series, but I don't think it is officially supported yet, since we would have seen a note about the release of the software on the FishWorks wiki: http://wikis.sun.com/display/FishWorks/Software+Updates