Lutz Schumann
2010-Feb-26 19:42 UTC
[zfs-discuss] Freeing unused space in thin provisioned zvols
Hello list,

ZFS can be used for both file-level (zfs) and block-level access (zvol). Zvols are always thin provisioned (space is allocated on first write). We use zvols with COMSTAR for iSCSI and FC access - and excuse me in advance, as this may be more of a COMSTAR question.

When reading from a freshly created zvol, no data comes from disk. All reads are satisfied by ZFS, and COMSTAR returns zeroes (I guess) for all reads.

Now if a virtual machine writes to the zvol, blocks are allocated on disk. Reads are now served partly from disk (for all blocks written) and partly from the ZFS layer (all unwritten blocks).

If the virtual machine (which may be VMware / Xen / Hyper-V) deletes blocks / frees space within the zvol, this also means a write - usually in the metadata area only. Thus the underlying storage system does not know which blocks in a zvol are really used.

So reducing the size of zvols is really difficult / not possible. Even if one deletes everything in the guest, the blocks stay allocated. If one zeroes out all blocks, even more space is allocated.

For this purpose TRIM (ATA) / PUNCH (SCSI) has been introduced. With these commands the guest can tell the storage which blocks are no longer used. Those commands are not available in COMSTAR today :(

However, I had the idea that COMSTAR could get the same result the way VMware did it some time ago with "vmware tools".

Idea:
- If the guest writes a block containing only zeroes, the block is freed again.
- If someone reads this block again, it will get the same zeroes it would get if the zeroes had been written.
- The checksum of an all-zero block can be hard-coded for SHA-1 / Fletcher, so the comparison "is this a zero-only block?" is easy.

With this in place, a host wishing to free thin-provisioned zvol space can fill the unused blocks with zeroes using simple tools (e.g. dd if=/dev/zero of=/MYFILE bs=1M; rm /MYFILE) and the space is freed again on the zvol side.
Does anyone know why this is not incorporated into ZFS?

-- This message posted from opensolaris.org
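The hard-coded-checksum comparison the post proposes can be sketched in a few lines of Python. This is only an illustration of the idea, not how ZFS actually implements checksums: the 128K block size, the SHA-1 choice, and the function names are all assumptions for the example.

```python
import hashlib

BLOCK_SIZE = 128 * 1024  # illustrative block size, not a ZFS constant

# The checksum of an all-zero block is a constant, so it can be
# precomputed once and compared against every incoming write.
ZERO_BLOCK = bytes(BLOCK_SIZE)
ZERO_SHA1 = hashlib.sha1(ZERO_BLOCK).hexdigest()

def is_zero_block(block: bytes) -> bool:
    """Return True if a written block contains only zeroes.

    Comparing against the precomputed zero checksum means no extra
    work when a checksum is already computed in the write path.
    """
    return hashlib.sha1(block).hexdigest() == ZERO_SHA1

# A storage backend could then turn an all-zero write into a
# "free this block" operation instead of an allocation.
```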
Tomas Ögren
2010-Feb-26 19:48 UTC
[zfs-discuss] Freeing unused space in thin provisioned zvols
On 26 February, 2010 - Lutz Schumann sent me these 2,2K bytes:

> Hello list,
>
> ZFS can be used for both file-level (zfs) and block-level access (zvol). Zvols are always thin provisioned (space is allocated on first write). We use zvols with COMSTAR for iSCSI and FC access - and excuse me in advance, as this may be more of a COMSTAR question.
>
> When reading from a freshly created zvol, no data comes from disk. All reads are satisfied by ZFS, and COMSTAR returns zeroes (I guess) for all reads.
>
> Now if a virtual machine writes to the zvol, blocks are allocated on disk. Reads are now served partly from disk (for all blocks written) and partly from the ZFS layer (all unwritten blocks).
>
> If the virtual machine (which may be VMware / Xen / Hyper-V) deletes blocks / frees space within the zvol, this also means a write - usually in the metadata area only. Thus the underlying storage system does not know which blocks in a zvol are really used.
>
> So reducing the size of zvols is really difficult / not possible. Even if one deletes everything in the guest, the blocks stay allocated. If one zeroes out all blocks, even more space is allocated.
>
> For this purpose TRIM (ATA) / PUNCH (SCSI) has been introduced. With these commands the guest can tell the storage which blocks are no longer used. Those commands are not available in COMSTAR today :(
>
> However, I had the idea that COMSTAR could get the same result the way VMware did it some time ago with "vmware tools".
>
> Idea:
> - If the guest writes a block containing only zeroes, the block is freed again.
> - If someone reads this block again, it will get the same zeroes it would get if the zeroes had been written.
> - The checksum of an all-zero block can be hard-coded for SHA-1 / Fletcher, so the comparison "is this a zero-only block?" is easy.
>
> With this in place, a host wishing to free thin-provisioned zvol space can fill the unused blocks with zeroes using simple tools (e.g. dd if=/dev/zero of=/MYFILE bs=1M; rm /MYFILE) and the space is freed again on the zvol side.
>
> Does anyone know why this is not incorporated into ZFS?

What you can do until then is enable compression (e.g. lzjb) on the zvol, do the dd dance in the client, and then disable the compression again.

/Tomas
--
Tomas Ögren, stric at acc.umu.se, http://www.acc.umu.se/~stric/
|- Student at Computing Science, University of Umeå
`- Sysadmin at {cs,acc}.umu.se
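The "compression dance" above boils down to three steps on the storage side and one in the guest. A minimal sketch, assuming a pool/zvol named `tank/myvol` and `/MYFILE` as the zero-fill target inside the guest (both names are placeholders):

```shell
# On the ZFS host: enable compression so all-zero writes become holes.
zfs set compression=lzjb tank/myvol

# Inside the guest: fill the free space with zeroes, then delete the file.
dd if=/dev/zero of=/MYFILE bs=1M ; rm /MYFILE

# On the ZFS host: optionally turn compression back off for future writes.
zfs set compression=off tank/myvol
```

These are administrative commands against a live pool, so they are shown as a fragment rather than something runnable in isolation.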
Bill Sommerfeld
2010-Feb-26 19:53 UTC
[zfs-discuss] Freeing unused space in thin provisioned zvols
On 02/26/10 11:42, Lutz Schumann wrote:

> Idea:
> - If the guest writes a block containing only zeroes, the block is freed again.
> - If someone reads this block again, it will get the same zeroes it would get if the zeroes had been written.
> - The checksum of an all-zero block can be hard-coded for SHA-1 / Fletcher, so the comparison "is this a zero-only block?" is easy.
>
> With this in place, a host wishing to free thin-provisioned zvol space can fill the unused blocks with zeroes using simple tools (e.g. dd if=/dev/zero of=/MYFILE bs=1M; rm /MYFILE) and the space is freed again on the zvol side.

You've just described how ZFS behaves when compression is enabled -- a block of zeros is compressed to a hole represented by an all-zeros block pointer.

> Does anyone know why this is not incorporated into ZFS?

It's in there. Turn on compression to use it.

- Bill
Lutz Schumann
2010-Feb-26 19:55 UTC
[zfs-discuss] Freeing unused space in thin provisioned zvols
This would be an idea and I thought about it. However, I see the following problems:

1) Using deduplication: this will reduce the on-disk size, but the DDT will grow forever, and deleting zvols will then mean a lot of time and work (see other threads on the list regarding DDT memory issues).

2) Compression: as I understand it, if I do zfs send/receive (which we do for DR), the data grows back to its original size on the wire. This makes it difficult.

Regards,
Robert
Marc Nicholas
2010-Feb-26 20:06 UTC
[zfs-discuss] Freeing unused space in thin provisioned zvols
On Fri, Feb 26, 2010 at 2:42 PM, Lutz Schumann <presales at storageconcepts.de> wrote:

> Now if a virtual machine writes to the zvol, blocks are allocated on disk. Reads are now served partly from disk (for all blocks written) and partly from the ZFS layer (all unwritten blocks).
>
> If the virtual machine (which may be VMware / Xen / Hyper-V) deletes blocks / frees space within the zvol, this also means a write - usually in the metadata area only. Thus the underlying storage system does not know which blocks in a zvol are really used.

You're using VMs and *not* using dedupe?! VMs are almost the perfect use-case for dedupe :)

-marc
Richard Elling
2010-Feb-26 22:22 UTC
[zfs-discuss] Freeing unused space in thin provisioned zvols
On Feb 26, 2010, at 11:55 AM, Lutz Schumann wrote:

> This would be an idea and I thought about it. However, I see the following problems:
>
> 1) Using deduplication: this will reduce the on-disk size, but the DDT will grow forever, and deleting zvols will then mean a lot of time and work (see other threads on the list regarding DDT memory issues).

Use compression and deduplication. If you are mostly concerned about the zero fills, then the zle compressor is very fast and efficient.

> 2) Compression: as I understand it, if I do zfs send/receive (which we do for DR), the data grows back to its original size on the wire. This makes it difficult.

uhmmm... compress it, too :-)
-- richard

ZFS storage and performance consulting at http://www.RichardElling.com
ZFS training on deduplication, NexentaStor, and NAS performance
http://nexenta-atlanta.eventbrite.com (March 16-18, 2010)
Lutz Schumann
2010-May-08 12:04 UTC
[zfs-discuss] Freeing unused space in thin provisioned zvols
I have to come back to this issue after a while because it just hit me.

I have a VMware vSphere 4 test host with various machines in it for performance tests and other things, so a lot of I/O benchmarks are run and a lot of data is created during those benchmarks. The vSphere test machine is connected via FC to the storage box (Nexenta 3.0).

Currently the data visible within the VMs is ~10 GB (mostly OS only). On the ESX host it is a lot more (no thin provisioning). On the Nexenta box a lot more is used (~500 GB) because it has all been written once.

The test machine does not have a lot of memory because I don't need it, so enabling deduplication is not an option; benchmarks have also shown that its performance really suffers. Compression is an option, yes, and I get GREAT compression ratios (> 10x). However, on the wire the full 500 GB are transferred. There is currently NO way to shrink the volumes.

I could compress the wire with gzip, but this would be VERY inefficient: (SOURCE) disk read (compressed) -> decompress to memory -> compress on wire -> (TARGET) decompress on the other side -> compress for disk -> disk write (compressed). This means a lot of CPU is used. Not nice.

Now, if there were a "write all zeroes, free the block again" feature in ZFS (which I suggest in this thread), I could do the following: fill the disks within the VMs with zeroes (dd if=/dev/zero of=/MYFILE bs=1M ...). This would effectively write a lot of all-zero blocks through COMSTAR to ZFS. ZFS could then free those blocks, and I could then send the zvol to backup. This would mean 10 GB used in the VM -> zvol size of ~10 GB -> 10 GB transferred. Doesn't this sound better?

Of course we could also wait for punch/trim support, tune the whole stack, and wait until all components in such a setup support trim... but I believe this will take years. So the approach mentioned above sounds like a pragmatic solution.

Where can proposals like this be placed? bugs.opensolaris.com?
Brandon High
2010-May-08 14:23 UTC
[zfs-discuss] Freeing unused space in thin provisioned zvols
On Sat, May 8, 2010 at 5:04 AM, Lutz Schumann <presales at storageconcepts.de> wrote:

> Now, if there were a "write all zeroes, free the block again" feature in ZFS (which I suggest in this thread), I could do the following:
>
> Fill the disks within the VMs with zeroes (dd if=/dev/zero of=/MYFILE bs=1M ...). This would effectively write a lot of all-zero blocks through COMSTAR to ZFS. ZFS could then free those blocks, and I could then send the zvol to backup. This would mean 10 GB used in the VM -> zvol size of ~10 GB -> 10 GB transferred.

If you set compression=zle, you'll only be compressing strings of 0s. It's low overhead for the zfs system to maintain.

Doing a dd like you suggest works well to reclaim most freed space from the zvol. You can unlink the file that you're writing to before the dd is finished so that the VM doesn't see the disk as full for more than a split second.

Windows VMs can use SDelete (http://technet.microsoft.com/en-us/sysinternals/bb897443.aspx) to zero-fill their disks to free space on the server.

-B

--
Brandon High : bhigh at freaks.com
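The unlink trick Brandon describes relies on POSIX semantics: an unlinked file stays alive as long as a process holds it open, so `dd` can keep zero-filling after the name is gone. A small demonstration, using `/tmp/zerofill` as a stand-in path for the guest filesystem (path and sizes are illustrative):

```shell
# Start the zero-fill in the background.
dd if=/dev/zero of=/tmp/zerofill bs=1M count=16 &
DD_PID=$!
sleep 1                      # give dd time to create and open the file
rm -f /tmp/zerofill          # unlink it; dd keeps writing via its open fd
wait "$DD_PID"               # zero-fill completes against the unlinked inode
# The blocks are released when dd closes the fd, so the filesystem
# never appears full for more than a moment.
```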
Datnus
2013-Feb-10 10:57 UTC
[zfs-discuss] Freeing unused space in thin provisioned zvols
I run dd if=/dev/zero of=testfile bs=1024k count=50000 inside the iSCSI vmfs from ESXi, and then rm testfile. However, zpool list doesn't decrease at all. In fact, the used storage increases when I do the dd.

FreeNAS 8.0.4 and ESXi 5.0. Help. Thanks.
Koopmann, Jan-Peter
2013-Feb-10 12:01 UTC
[zfs-discuss] Freeing unused space in thin provisioned zvols
Why should it? Unless you shrink the vmdk and use a ZFS variant with SCSI UNMAP support (I believe currently only Nexenta, but correct me if I am wrong), the blocks will not be freed, will they?

Kind regards,
JP

Sent from a mobile device.

On 10.02.2013 at 11:01, "Datnus" <vinhdat82 at yahoo.com> wrote:

> I run dd if=/dev/zero of=testfile bs=1024k count=50000 inside the iSCSI vmfs from ESXi, and then rm testfile.
>
> However, zpool list doesn't decrease at all. In fact, the used storage increases when I do the dd.
>
> FreeNAS 8.0.4 and ESXi 5.0. Help. Thanks.
Jim Klimov
2013-Feb-10 13:13 UTC
[zfs-discuss] Freeing unused space in thin provisioned zvols
On 2013-02-10 10:57, Datnus wrote:

> I run dd if=/dev/zero of=testfile bs=1024k count=50000 inside the iSCSI vmfs from ESXi, and then rm testfile.
>
> However, zpool list doesn't decrease at all. In fact, the used storage increases when I do the dd.
>
> FreeNAS 8.0.4 and ESXi 5.0. Help. Thanks.

Did you also enable compression (any non-"off" kind) for the ZVOL which houses your iSCSI volume?

The zero-fill procedure does allocate (logically) the blocks requested in the sparse volume. If this volume is stored on ZFS with compression (active at the moment you write these blocks), then ZFS detects an all-zeroes block and uses no space to store it, only adding a block pointer entry to reference its emptiness. This way you get some growth in metadata, but none in userdata for the volume.

If by doing this trick you "overwrite" the non-empty but logically "deleted" blocks in the VM's filesystem housed inside iSCSI in the ZVOL, then the backend storage should shrink by releasing those non-empty blocks. Ultimately, if you use snapshots, those released blocks are reassigned to the snapshots of the ZVOL; so in order to get usable free space on your pool, you'd have to destroy all those older snapshots (between the creation and deletion times of the no-longer-useful blocks).

If you have reservations about compression for VMs (performance-wise or otherwise), take a look at the "zle" compression mode, which only reduces consecutive strings of zeroes.

Also, I'd reiterate: the compression mode takes effect for blocks written after the mode was set. For example, if you prefer to store your datasets generally uncompressed for any reason, you can enable a compression mode, zero-fill the VM disk's free space as you did, and then re-disable compression on the volume for any further writes.
Also note that if you "zfs send" or otherwise copy the data off the dataset into another (backup) one, only the compression method last set on the target dataset is applied to the new writes into it - regardless of the absence, presence, or type of compression on the original dataset.

HTH,
//Jim Klimov
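Jim's point about zle (only consecutive strings of zeroes are reduced; everything else passes through) can be illustrated with a toy run-length encoder. This is not the on-disk zle format ZFS uses, just the idea behind it:

```python
def zle_encode(data: bytes):
    """Toy encoder in the spirit of ZFS's zle compressor:
    runs of zero bytes collapse to a count, non-zero bytes are
    stored literally."""
    out, i = [], 0
    while i < len(data):
        j = i
        if data[i] == 0:
            while j < len(data) and data[j] == 0:
                j += 1
            out.append(("zeros", j - i))   # a whole run costs one entry
        else:
            while j < len(data) and data[j] != 0:
                j += 1
            out.append(("literal", data[i:j]))
        i = j
    return out

def zle_decode(encoded) -> bytes:
    """Inverse of zle_encode: expand zero runs, copy literals."""
    return b"".join(bytes(v) if kind == "zeros" else v
                    for kind, v in encoded)
```

A mostly-zero "block" like a freshly zero-filled region encodes to a couple of tiny entries instead of kilobytes of zeroes, which is why the dd trick costs almost nothing when compression is on.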
Koopmann, Jan-Peter
2013-Feb-10 13:55 UTC
[zfs-discuss] Freeing unused space in thin provisioned zvols
I forgot about compression. Makes sense. As long as the zeroes find their way to the backend storage, this should work. Thanks!

Kind regards,
JP
Darren J Moffat
2013-Feb-12 10:25 UTC
[zfs-discuss] Freeing unused space in thin provisioned zvols
On 02/10/13 12:01, Koopmann, Jan-Peter wrote:

> Why should it?
>
> Unless you shrink the vmdk and use a ZFS variant with SCSI UNMAP support (I believe currently only Nexenta, but correct me if I am wrong), the blocks will not be freed, will they?

Solaris 11.1 has ZFS with SCSI UNMAP support.

--
Darren J Moffat
Stefan Ring
2013-Feb-12 11:30 UTC
[zfs-discuss] Freeing unused space in thin provisioned zvols
>> Unless you shrink the vmdk and use a ZFS variant with SCSI UNMAP support (I believe currently only Nexenta, but correct me if I am wrong), the blocks will not be freed, will they?
>
> Solaris 11.1 has ZFS with SCSI UNMAP support.

Freeing unused blocks works perfectly well with fstrim (Linux) consuming an iSCSI zvol served up by oi151a6.
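On the Linux initiator side, the fstrim approach Stefan mentions amounts to mounting the filesystem that sits on the iSCSI-exported zvol and issuing discards for its free space. A sketch, with the mountpoint as a placeholder:

```shell
# One-shot: send discards (SCSI UNMAP over iSCSI) for all unused blocks
# of the mounted filesystem; -v reports how much was trimmed.
fstrim -v /mnt/zvol-fs

# Alternatively, mount with continuous discard so every delete
# immediately releases blocks on the backend:
#   mount -o discard /dev/sdX1 /mnt/zvol-fs
```

This requires a real mounted filesystem on a discard-capable device, so it is shown as a fragment rather than something runnable here.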
Thomas Nau
2013-Feb-12 15:07 UTC
[zfs-discuss] Freeing unused space in thin provisioned zvols
Darren,

On 02/12/2013 11:25 AM, Darren J Moffat wrote:

> On 02/10/13 12:01, Koopmann, Jan-Peter wrote:
>> Why should it?
>>
>> Unless you shrink the vmdk and use a ZFS variant with SCSI UNMAP support (I believe currently only Nexenta, but correct me if I am wrong), the blocks will not be freed, will they?
>
> Solaris 11.1 has ZFS with SCSI UNMAP support.

Seems I have skipped that one... Are there any related tools, e.g. to release all "zero" blocks or the like? Of course it's then up to the admin to know what all this is about, or to wreck the data.

Thomas
Darren J Moffat
2013-Feb-12 15:36 UTC
[zfs-discuss] Freeing unused space in thin provisioned zvols
On 02/12/13 15:07, Thomas Nau wrote:

> Seems I have skipped that one... Are there any related tools, e.g. to release all "zero" blocks or the like? Of course it's then up to the admin to know what all this is about, or to wreck the data.

No tools; ZFS does it automatically when freeing blocks, provided the underlying device advertises the functionality. ZFS ZVOLs shared over COMSTAR advertise SCSI UNMAP as well.

--
Darren J Moffat
Casper.Dik at oracle.com
2013-Feb-12 15:45 UTC
[zfs-discuss] Freeing unused space in thin provisioned zvols
> No tools; ZFS does it automatically when freeing blocks, provided the underlying device advertises the functionality.
>
> ZFS ZVOLs shared over COMSTAR advertise SCSI UNMAP as well.

If a system was running something older, e.g. Solaris 11, the "free" blocks will not be marked as such on the server, even after the system upgrades to Solaris 11.1.

There might be a way to force that by disabling compression, creating a large file full of NULs, and then removing it. But you need to check that this actually has an effect before you try it.

Casper
Sašo Kiselkov
2013-Feb-12 16:11 UTC
[zfs-discuss] Freeing unused space in thin provisioned zvols
On 02/10/2013 01:01 PM, Koopmann, Jan-Peter wrote:

> Why should it?
>
> I believe currently only Nexenta, but correct me if I am wrong.

The code was mainlined a while ago, see:

https://github.com/illumos/illumos-gate/blob/master/usr/src/uts/common/io/comstar/lu/stmf_sbd/sbd.c#L3702-L3730
https://github.com/illumos/illumos-gate/blob/master/usr/src/uts/common/fs/zfs/zvol.c#L1697-L1754

Thanks should go to the guys at Nexenta for contributing this to the open-source effort.

Cheers,
--
Saso