I'm curious as to how send/recv intersects with dedupe. If I send/recv a deduped filesystem, is the data sent in its deduped form, i.e. sent once and followed by pointers for subsequent duplicate data, or is the data sent in expanded form, with the receiving system then having to redo the dedupe process?

Obviously sending it deduped is more efficient in terms of bandwidth and CPU time on the recv side, but it may also be more complicated to achieve?

Also, do we know yet what effect block size has on dedupe? My guess is that a smaller block size will perhaps give a better duplication match rate, but at the cost of higher CPU usage and perhaps reduced performance, as the system will need to store larger dedupe hash tables?

Regards,
Tristan
Tristan, there's another dedup system for "zfs send" in PSARC 2009/557. This can be used independently of whether the in-pool data was deduped.

Case log: http://arc.opensolaris.org/caselog/PSARC/2009/557/
Discussion: http://www.opensolaris.org/jive/thread.jspa?threadID=115082

So I believe your deduped data is rehydrated for sending, and then (within the send stream) this other method may be used to save space in transit. What the pool on the receiving end does with it will depend on its local dedup settings.

HTH...

-cheers, CSB
--
This message posted from opensolaris.org
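To illustrate the idea of deduplicating inside the send stream itself, here is a minimal Python sketch. It is not the actual "zfs send" stream format from PSARC 2009/557: the record layout, the function names dedup_stream and rehydrate, and the choice of SHA-256 are all assumptions made for illustration.

```python
import hashlib


def dedup_stream(blocks):
    """Hypothetical stream dedup: emit each distinct block once.

    Yields ("data", block) the first time a block is seen and
    ("ref", index) afterwards, where index is the position of the
    earlier record that carried the same bytes.
    """
    seen = {}  # digest -> index of the first record with this content
    for i, block in enumerate(blocks):
        digest = hashlib.sha256(block).digest()
        if digest in seen:
            yield ("ref", seen[digest])
        else:
            seen[digest] = i
            yield ("data", block)


def rehydrate(records):
    """Receiver side: expand references back into full blocks."""
    out = []
    for kind, payload in records:
        out.append(payload if kind == "data" else out[payload])
    return out
```

The sender in this sketch works on fully expanded blocks, so it behaves the same whether or not the source pool was deduped. The receiver gets full blocks back after rehydrate, and what it stores would then depend on its own dedup setting.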
Tristan Ball wrote:
> I'm curious as to how send/recv intersects with dedupe... if I send/recv
> a deduped filesystem, is the data sent in its de-duped form, ie just
> sent once, followed by the pointers for subsequent dupe data, or is the
> data sent in expanded form, with the recv side system then having to
> redo the dedupe process?

The on-disk dedup and dedup of the stream are actually separate features. Stream dedup hasn't been integrated yet. It will be a choice at *send* time whether the stream is to be deduplicated.

> Obviously sending it deduped is more efficient in terms of bandwidth and
> CPU time on the recv side, but it may also be more complicated to achieve?

A stream can be deduped even if the on-disk format isn't, and vice versa.

> Also - do we know yet what effect block size has on dedupe? My guess is
> that a smaller block size will perhaps give a better duplication match
> rate, but at the cost of higher CPU usage and perhaps reduced
> performance, as the system will need to store larger de-dupe hash tables?

That really depends on how the applications write blocks and what your data is like. It could go either way very easily.

As with all dedup, it is a trade-off between IO bandwidth and CPU/memory. Sometimes dedup will improve performance, since like compression it can reduce IO requirements, but depending on the workload the CPU/memory overhead may or may not be worth it (same with compression).

--
Darren J Moffat
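The block size trade-off can be made concrete with a small Python sketch. This is a toy model under stated assumptions, not how ZFS builds its dedup table: the function name dedup_stats is hypothetical, blocks are fixed-size slices of a byte string, and SHA-256 stands in for the block checksum.

```python
import hashlib


def dedup_stats(data: bytes, block_size: int):
    """Return (total_blocks, unique_blocks) for fixed-size blocks.

    unique_blocks is the number of entries a dedup table would need
    in this toy model; total_blocks is how many blocks were written.
    """
    blocks = [data[i:i + block_size] for i in range(0, len(data), block_size)]
    unique = {hashlib.sha256(b).digest() for b in blocks}
    return len(blocks), len(unique)


# 24 bytes: two identical 8-byte records, then one that shares only its first half.
sample = b"AAAABBBB" * 2 + b"AAAACCCC"
print(dedup_stats(sample, 8))  # (3, 2): 2 table entries, 16 bytes stored
print(dedup_stats(sample, 4))  # (6, 3): 3 table entries, 12 bytes stored
```

For this particular input the smaller block size stores less data but needs more table entries and more hashing. Data whose duplicates do not line up with the smaller blocks would pay the extra table cost without the space saving, which is why the result depends on how the application writes its blocks.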
Hi Darren,

More below...

Darren J Moffat wrote:
> Tristan Ball wrote:
>
>> Obviously sending it deduped is more efficient in terms of bandwidth
>> and CPU time on the recv side, but it may also be more complicated to
>> achieve?
>
> A stream can be deduped even if the on-disk format isn't, and vice versa.

Is the send dedup'ing more efficient if the filesystem is already dedup'd? If both are enabled, do they share anything?

-Kyle
Kyle McDonald wrote:
> Darren J Moffat wrote:
>> A stream can be deduped even if the on-disk format isn't, and vice versa.
>
> Is the send dedup'ing more efficient if the filesystem is already
> dedup'd? If both are enabled, do they share anything?
>
> -Kyle

At this time, no. But very shortly we hope to tie the two together better, to make use of the existing checksums and duplication info available in the on-disk and in-kernel structures.

Lori
Kyle McDonald wrote:
> Darren J Moffat wrote:
>> A stream can be deduped even if the on-disk format isn't, and vice versa.
>
> Is the send dedup'ing more efficient if the filesystem is already
> dedup'd? If both are enabled, do they share anything?

ZFS send deduplication is still in development, so I'd rather let the engineers working on it say what they are doing, if they wish to.

--
Darren J Moffat