Jeremy Kitchen
2010-Mar-03 00:54 UTC
[zfs-discuss] couple of quick questions with regard to dedup
So, I'm playing around with dedup and trying to get it set up how I want, with little impact on performance (we're using zfs primarily for storage of backups, using rsync to copy the files from our linux servers to our opensolaris/zfs 'backupbricks'). Currently running snv_133 on x86, zpool version 22, zfs version 4.

First question: I can't seem to get fletcher4,verify enabled for the dedup property on my filesystems. Whenever I try, I get this:

# zfs set dedup=fletcher4,verify raid3153
cannot set property for 'raid3153': 'dedup' must be one of 'on | off | verify | sha256[,verify]'

sha256,verify is the same as just verify (at least for now, and having it set explicitly is a Good Thing, imo), but I don't see anything about fletcher4 in there (the forms that error says it does accept are listed in the P.S. below, for reference). I saw fletcher4 had been disabled at one point because of the endianness issue, but it's been re-enabled, yes?

Second question (a two-parter, actually), regarding verify: I notice that the way it works is to compare the checksums, and if it finds a match, to then do a direct byte comparison of the data.

First part: would it not be more efficient to simply store two different checksums (both can be fast, so long as they're different algorithms) and compare one checksum and then the other? The likelihood of two differing blocks colliding under both algorithms at once seems astronomically low (even more so than what is already there), and it avoids the penalty of having to re-read the entire block from disk; a rough back-of-envelope estimate is in the P.P.S. below. Maybe it's a trade-off of increased memory usage vs. increased reads? Memory is cheap, disk bandwidth is not, so maybe having the option of a larger DDT with this feature would be useful.

Second part: is there any way to enable some sort of logging when a checksum collision is noticed while verify is on? The reason I ask is that we have well over 2PB of data we're backing up to our zfs storage, and will have much more in the future, so if anyone is going to hit a collision, it's probably going to be us. However, if the empirical collision rate is incredibly low, or non-existent, and I can confirm that with some sort of alert when one occurs, I could justify turning verify off in my dedup settings.

Third question: is there a way to look at the current size of the DDT as it exists in memory? (The closest thing I've found is sketched in the P.P.P.S. below.) I need to know whether I need more RAM in my systems to support dedup. We currently have 32GB of RAM in some systems and 16GB in others (the ones based on our old x4500s), with each system housing about 100TB of data.

Thanks for everything, folks. zfs is awesome, and the new dedup and userquota features are ones I am eagerly looking forward to implementing in our setup. Keep up the good work!

-Jeremy
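P.S. For reference, and going purely by that error text (I haven't tried these on anything other than snv_133), the forms the property parser says it will accept are:

# zfs set dedup=on raid3153
# zfs set dedup=off raid3153
# zfs set dedup=verify raid3153
# zfs set dedup=sha256 raid3153
# zfs set dedup=sha256,verify raid3153

so fletcher4[,verify] just isn't in the accepted list here, whatever the current state of the endianness fix.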
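P.P.S. A rough back-of-envelope on the collision question, assuming 128K records, roughly 2PB of unique data, and that sha256 behaves like an ideal hash (all assumptions; our real mix varies): 2PB / 128K is about 2^34 blocks, so the birthday bound on any two of them sharing a sha256 value is around (2^34)^2 / 2^257, i.e. roughly 2^-189. Asking a second, well-distributed b-bit checksum to also match on that same pair cuts the chance by another factor of about 2^-b (less for something weak like fletcher). So the numbers look comfortable either way; my question is really whether the second checksum buys enough confidence to justify skipping the read-back.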
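P.P.P.S. On the DDT size question, the closest thing I've found so far is zdb's dedup reporting, which seems to print per-table entry counts (and a histogram with -DD):

# zdb -DD <poolname>

If I'm reading that output right, the in-core footprint can then be roughly estimated as entries times the per-entry in-core size (which I understand is on the order of a few hundred bytes, though I'm not sure of the exact figure). I'd love confirmation that this is the right way to measure it, or whether there's a kstat that reports it directly.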