tim Kries
2010-Apr-23 15:41 UTC
[zfs-discuss] ZFS deduplication ratio on Server 2008 backup VHD files
Hi,

I have been playing with OpenSolaris for a while now. Today I tried to deduplicate the backup VHD files that Windows Server 2008 generates. I made a backup before and after installing the AD role and copied both files to the share on OpenSolaris (build 134). At first I got a straight 1.00x; then I set the recordsize to 4k (to match NTFS) and it jumped to 1.29x. But it should be a lot better, right?

Is there something I missed?

Regards,
Tim
Richard Jahnel
2010-Apr-23 17:37 UTC
[zfs-discuss] ZFS deduplication ratio on Server 2008 backup VHD files
You might note that dedup only dedups data written after the flag is set. It does not retroactively dedup data that was already written.
tim Kries
2010-Apr-23 17:49 UTC
[zfs-discuss] ZFS deduplication ratio on Server 2008 backup VHD files
It was active the whole time. I made a new ZFS dataset with -o dedup=on and copied the files with the default recordsize: no dedup. Then I deleted the files, set the recordsize to 4k, copied again and got a dedup ratio of 1.29x.
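In commands, what I did was roughly this (the pool and dataset names here are made up):

    # dataset with dedup enabled from the start
    zfs create -o dedup=on tank/vhd

    # ... copy files, check ratio, delete files ...

    # recordsize only affects files written afterwards
    zfs set recordsize=4k tank/vhd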
Khyron
2010-Apr-23 17:56 UTC
[zfs-discuss] ZFS deduplication ratio on Server 2008 backup VHD files
A few things come to mind...

1. A lot better than... what? Setting the recordsize to 4K got you some deduplication, but maybe the pertinent question is what you were expecting.

2. Dedup is fairly new. I haven't seen any reports of experiments like yours, so... CONGRATULATIONS!! You're probably the first. Or at least the first willing to discuss it with the world as a matter of public record. Since dedup is new, you can't expect much in the way of previous experience with it. I also haven't seen coordinated experiments comparing various configurations with dedup off and then on.

In the end, the question is whether that level of dedup is going to be enough for you. Is dedup even important? Is it just a "gravy" feature, or a key requirement? You're in unexplored territory, it appears.
tim Kries
2010-Apr-23 18:13 UTC
[zfs-discuss] ZFS deduplication ratio on Server 2008 backup VHD files
Dedup is a key element for my purpose, because I am planning a central repository for something like 150 Windows Server 2008 (R2) servers, which would take a lot less storage if their backups dedup well.
Constantin Gonzalez
2010-Apr-26 08:45 UTC
[zfs-discuss] ZFS deduplication ratio on Server 2008 backup VHD files
Hi Tim,

thanks for sharing your dedup experience. Especially for virtualization, having a good pool of experience will help a lot of people.

So you see a dedup ratio of 1.29x for two installations of Windows Server 2008 on the same ZFS backing store, if I understand you correctly. What dedup ratios do you see for the third, fourth and fifth server installation?

Also, maybe dedup is not the only way to save space. What compression ratio do you get? And: have you tried setting up one Windows system, then setting up the next one based on a ZFS clone of the first?
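In sketch form (the pool and dataset names are made up):

    # snapshot the first, fully installed server image
    zfs snapshot tank/vhd/server01@golden

    # further servers start as clones and share all unchanged blocks
    zfs clone tank/vhd/server01@golden tank/vhd/server02

Hope this helps,
   Constantin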
tim Kries
2010-Apr-26 15:51 UTC
[zfs-discuss] ZFS deduplication ratio on Server 2008 backup VHD files
Hi,

The setup was this: fresh installation of 2008 R2 -> server backup with the backup feature -> move VHD to ZFS -> install the Active Directory role -> backup again -> move VHD to the same share.

I am kind of confused that changing the recordsize changes the dedup ratio, since I thought dedup worked on 256-bit blocks. I have to set up OpenSolaris again since it died in my VirtualBox (not sure why), so I can't test more server installations at the moment.

Compression seemed to work pretty well (I used gzip-6); the compression ratio was about 4x, I think. But I don't think that would work for production systems, since you would need some serious CPU power.

I will set up another test in a few hours. Personally I am not sure that using clones is a good idea for Windows Server 2008, with all the problems around SIDs...
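For reference, I read both ratios straight off the pool and dataset (names made up):

    zpool get dedupratio tank
    zfs get compressratio,compression tank/vhd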
tim Kries
2010-Apr-26 19:31 UTC
[zfs-discuss] ZFS deduplication ratio on Server 2008 backup VHD files
I found the VHD specification here:

http://download.microsoft.com/download/f/f/e/ffef50a5-07dd-4cf8-aaa3-442c0673a029/Virtual%20Hard%20Disk%20Format%20Spec_10_18_06.doc

I am not sure I understand it correctly, but it looks like the data gets packed into the VHD with no empty space, so even a slight difference near the beginning of the file shifts everything behind it and ruins the pattern for block-based dedup. As I am not an expert on file systems, I would appreciate it if someone with more expertise could take a look at this. It would be a real shame.
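If I read the spec right, here is a rough illustration, assuming the default 2 MB block size of a dynamic VHD: each data block in the file is preceded by a 512-byte sector bitmap, so even when a block itself starts on a 4k boundary in the file, the guest's data begins 512 bytes into it. Relative to the block start:

    guest 4k cluster 0 -> file offsets  512 ..  4607  (straddles two 4k ZFS records)
    guest 4k cluster 1 -> file offsets 4608 ..  8703  (straddles the next two)

So the same guest data written into a fixed VHD and a dynamic VHD would get carved into different 4k ZFS records, and nothing would match.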
Brandon High
2010-Apr-26 23:54 UTC
[zfs-discuss] ZFS deduplication ratio on Server 2008 backup VHD files
On Mon, Apr 26, 2010 at 8:51 AM, tim Kries <tim.kreis at gmx.de> wrote:
> I am kind of confused that changing the recordsize changes the dedup ratio, since I thought dedup worked on 256-bit blocks.

Dedup works on blocks of either recordsize or volblocksize. The checksum is computed per block written, and those checksums are used to dedup the data. (The checksum is 256 bits; the blocks it covers are whole records.)

With a recordsize of 128k, two blocks with a one-byte difference would not dedup. With an 8k recordsize, 15 out of 16 blocks would dedup. Repeat over the entire VHD.

Setting the recordsize to a multiple of the VHD's internal block size and ensuring that the internal filesystem is block-aligned will probably help improve dedup ratios. So for an NTFS guest with 4k clusters, use a 4k, 8k or 16k recordsize, and make sure that the partitions inside the VHD are aligned to the recordsize you're using.

VHD supports fixed-size and dynamic-size images. If you're using a fixed image, the space is pre-allocated. That doesn't mean you'll waste the unused space on ZFS with compression, since all those zeros take up almost no space, and the VHD file stays block-aligned. I'm not sure a dynamic-size image stays block-aligned where there is empty space. Using compression=zle will compress only the zeros, with almost no CPU penalty.

Using a COMSTAR iSCSI volume is probably an even better idea, since you won't have the POSIX layer in the path, and you won't have the VHD file header throwing off your block alignment.
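In sketch form (the names and the size are made up, and the rest of the COMSTAR target/view setup is omitted):

    # compress only the runs of zeros, at almost no CPU cost
    zfs set compression=zle tank/vhd

    # or bypass the VHD file entirely: a 4k-block zvol served over iSCSI
    zfs create -V 100G -o volblocksize=4k -o dedup=on tank/vol01
    sbdadm create-lu /dev/zvol/rdsk/tank/vol01

-B

--
Brandon High : bhigh at freaks.com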
Tim.Kreis
2010-Apr-27 05:50 UTC
[zfs-discuss] ZFS deduplication ratio on Server 2008 backup VHD files
The problem is that Windows Server Backup seems to choose dynamic VHDs (which would make sense in most cases), and I don't know if there is a way to change that. Using iSCSI volumes won't help in my case, since the servers are running on physical hardware.
Roy Sigurd Karlsbakk
2010-Apr-27 12:42 UTC
[zfs-discuss] ZFS deduplication ratio on Server 2008 backup VHD files
----- "Tim.Kreis" <tim.kreis at gmx.de> skrev:> The problem is that the windows server backup seems to choose dynamic > > vhd (which would make sense in most cases) and I dont know if there is > a > way to change that. Using ISCSI-volumes wont help in my case since > servers are running on physical hardware.It should work well anyway, if you (a) fill up the server with memory and (b) reduce block size to 8k or even less. But do (a) before (b). Dedup is very memory hungry roy