Kjetil Torgrim Homme
2010-Jan-24 15:26 UTC
[zfs-discuss] optimise away COW when rewriting the same data?
I was looking at the performance of using rsync to copy some large files which change only a little between each run (database files). I take a snapshot after every successful run of rsync, so when using rsync --inplace, only the changed portions of each file occupy new disk space.

Unfortunately, performance wasn't too good: the source server in question simply didn't have much CPU to spare for the rsync delta algorithm, and in addition the algorithm creates read I/O load on the destination server. So I had to switch it off and transfer the whole file instead. In this particular case, that means I need 120 GB to store each run rather than 10, but that's the way it goes.

If I had enabled deduplication, this would be a moot point; dedup would take care of it for me. Judging from early reports, my server will probably not have the required oomph to handle it, so I'm holding off until I get to replace it with a server with more RAM and CPU.

But it occurred to me that this is a special case which could be beneficial in many cases -- if the filesystem uses secure checksums, it could check the existing block pointer and see whether the replacement data matches. (Due to the (infinitesimal) potential for hash collisions, this should be configurable the same way it is for dedup.) In essence, rsync's writes would become no-ops, and very little CPU would be wasted on either side of the pipe. Even in the absence of snapshots, this would leave the filesystem less fragmented, since the COW is avoided. This would be a win-win if the ZFS pipeline can communicate the correct information between layers.

Are there any ZFS hackers who can comment on the feasibility of this idea?

--
Kjetil T. Homme
Redpill Linpro AS - Changing the game
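To make the idea concrete, here is a rough sketch of the check I have in mind. The function names are made up for illustration -- this is not the real DMU write path, and sha256_of_buf() does not exist -- only the block-pointer types and macros are real. It just shows the shape of the short-circuit:

    /*
     * Hypothetical sketch only -- write_is_noop() and sha256_of_buf()
     * are invented names, not real ZFS functions.  On an overwrite of
     * a fully populated block, compare the checksum of the incoming
     * data against the checksum already stored in the block pointer,
     * and skip the COW entirely when they match.
     */
    static boolean_t
    write_is_noop(const blkptr_t *bp, const void *newdata, size_t size)
    {
            zio_cksum_t newcksum;

            /* only safe with a collision-resistant checksum */
            if (BP_GET_CHECKSUM(bp) != ZIO_CHECKSUM_SHA256)
                    return (B_FALSE);

            sha256_of_buf(newdata, size, &newcksum);  /* invented helper */

            return (ZIO_CHECKSUM_EQUAL(bp->blk_cksum, newcksum));
    }

The point is that the comparison only needs data which is already at hand during the write: the incoming buffer and the block pointer of the block being overwritten.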
David Magda
2010-Jan-24 17:43 UTC
[zfs-discuss] optimise away COW when rewriting the same data?
On Jan 24, 2010, at 10:26, Kjetil Torgrim Homme wrote:

> But it occurred to me that this is a special case which could be
> beneficial in many cases -- if the filesystem uses secure checksums,
> it could check the existing block pointer and see if the replaced
> data matches. [...]
>
> Are there any ZFS hackers who can comment on the feasibility of this
> idea?

There is a bug that requests an API in ZFS' DMU library to get checksum data:

6856024 - DMU checksum API
http://bugs.opensolaris.org/bugdatabase/view_bug.do?bug_id=6856024

It specifically mentions Lustre, and not anything like the ZFS POSIX interface to files (ZPL). There's also:

> Here's another: file comparison based on values derived from files'
> checksum or dnode block pointer. This would allow for very efficient
> file comparison between filesystems related by cloning. Such values
> might be made available through an extended attribute, say.

http://bugs.opensolaris.org/bugdatabase/view_bug.do?bug_id=6366224

It's been brought up before on zfs-discuss: the two options would be linking against some kind of ZFS-specific library, or using an ioctl() of some kind. As it stands, ZFS is really the only mainstream(-ish) file system that does checksums, and so there's no standard POSIX call for such things. Perhaps as more file systems add this functionality, something will come of it.
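To give a flavour of what the extended-attribute idea from 6366224 might look like from userland -- purely hypothetical, nothing like this exists today, and the attribute name "SUNWzfs.cksum" is invented -- on Solaris you would reach it through attropen(3C):

    /*
     * Hypothetical: suppose ZFS exposed a per-file checksum through a
     * named extended attribute, as 6366224 suggests.  The attribute
     * name is invented for illustration.
     */
    #include <fcntl.h>
    #include <stdio.h>
    #include <unistd.h>

    int
    main(int argc, char **argv)
    {
            unsigned char cksum[32];        /* sha256 is 256 bits */
            ssize_t i, n;
            int fd;

            fd = attropen(argv[1], "SUNWzfs.cksum", O_RDONLY);
            if (fd == -1) {
                    perror("attropen");
                    return (1);
            }
            n = read(fd, cksum, sizeof (cksum));
            for (i = 0; i < n; i++)
                    printf("%02x", cksum[i]);
            printf("\n");
            close(fd);
            return (0);
    }

The attraction of the xattr route is that existing tools can read it with ordinary open/read calls, rather than needing a ZFS-specific library or ioctl.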
Kjetil Torgrim Homme
2010-Jan-24 23:20 UTC
[zfs-discuss] optimise away COW when rewriting the same data?
David Magda <dmagda at ee.ryerson.ca> writes:

> On Jan 24, 2010, at 10:26, Kjetil Torgrim Homme wrote:
>
>> But it occurred to me that this is a special case which could be
>> beneficial in many cases -- if the filesystem uses secure checksums,
>> it could check the existing block pointer and see if the replaced
>> data matches. [...]
>>
>> Are there any ZFS hackers who can comment on the feasibility of this
>> idea?
>
> There is a bug that requests an API in ZFS' DMU library to get
> checksum data:
>
> 6856024 - DMU checksum API
> http://bugs.opensolaris.org/bugdatabase/view_bug.do?bug_id=6856024

That would work, but it would require rsync to do the checksum calculation itself for the comparison. ZFS would then recalculate the checksum if the data was actually written, so work is wasted for local copies. It would be interesting to extend the rsync protocol to take advantage of this, though, so that the checksum can be calculated on the remote host.

Hmmm... it would need very ZFS-specific support, e.g., the recordsize is potentially different for each file, and likewise the checksum algorithm. Fixing a library seems easier than patching the kernel, so your approach is probably better anyhow.

> It specifically mentions Lustre, and not anything like the ZFS POSIX
> interface to files (ZPL). There's also:
>
>> Here's another: file comparison based on values derived from files'
>> checksum or dnode block pointer. This would allow for very efficient
>> file comparison between filesystems related by cloning. Such values
>> might be made available through an extended attribute, say.
>
> http://bugs.opensolaris.org/bugdatabase/view_bug.do?bug_id=6366224
>
> It's been brought up before on zfs-discuss: the two options would be
> linking against some kind of ZFS-specific library, or using an ioctl()
> of some kind. As it stands, ZFS is really the only mainstream(-ish)
> file system that does checksums, and so there's no standard POSIX call
> for such things. Perhaps as more file systems add this functionality,
> something will come of it.

The checksum algorithms would need to be very strictly specified. That's not a problem for sha256, I guess, but fletcher4 probably doesn't have an independent implementation which is 100% compatible with ZFS -- and GPL-licensed, as would be needed for rsync and many other applications.

--
Kjetil T. Homme
Redpill Linpro AS - Changing the game
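For what it's worth, the fletcher4 loop itself is tiny; a from-scratch version would look roughly like the sketch below (the struct name is mine, based on my reading of the OpenSolaris sources). The hard part isn't the loop, it's pinning down byte order and the handling of buffers whose size isn't a multiple of four bytes -- exactly what a strict specification would have to cover:

    /*
     * Sketch of a ZFS-style fletcher4: four 64-bit running sums over
     * the input taken as native-endian 32-bit words.  Byte order and
     * short-buffer handling are the open compatibility questions.
     */
    #include <stddef.h>
    #include <stdint.h>

    struct cksum256 { uint64_t word[4]; };  /* invented name */

    static void
    fletcher4(const void *buf, size_t size, struct cksum256 *out)
    {
            const uint32_t *ip = buf;
            const uint32_t *end = ip + (size / sizeof (uint32_t));
            uint64_t a = 0, b = 0, c = 0, d = 0;

            for (; ip < end; ip++) {
                    a += *ip;       /* Fletcher-style cascaded sums */
                    b += a;
                    c += b;
                    d += c;
            }
            out->word[0] = a;
            out->word[1] = b;
            out->word[2] = c;
            out->word[3] = d;
    }

An independent implementation would have to match all of those choices bit-for-bit before its checksums could be compared against what ZFS stores.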