Kjetil Torgrim Homme
2010-Jan-24 15:26 UTC
[zfs-discuss] optimise away COW when rewriting the same data?
I was looking at the performance of using rsync to copy some large files which change only a little between each run (database files). I take a snapshot after every successful run of rsync, so when using rsync --inplace, only the changed portions of each file occupy new disk space.

Unfortunately, performance wasn't too good: the source server in question simply didn't have much CPU to spare for the rsync delta algorithm, and in addition the algorithm creates read I/O load on the destination server. So I had to switch it off and transfer the whole file instead. In this particular case, that means I need 120 GB to store each run rather than 10, but that's the way it goes.

If I had enabled deduplication, this would be a moot point; dedup would take care of it for me. Judging from early reports, my server will probably not have the required oomph to handle it, so I'm holding off until I get to replace it with a server with more RAM and CPU.

But it occurred to me that this is a special case which could be beneficial in many cases -- if the filesystem uses secure checksums, it could check the existing block pointer and see whether the replacement data matches. (Due to the (infinitesimal) potential for hash collisions, this should be configurable the same way it is for dedup.) In essence, rsync's writes would become no-ops, and very little CPU would be wasted on either side of the pipe. Even in the absence of snapshots, this would leave the filesystem less fragmented, since the COW is avoided. This would be a win-win if the ZFS pipeline can communicate the correct information between layers.

Are there any ZFS hackers who can comment on the feasibility of this idea?

--
Kjetil T. Homme
Redpill Linpro AS - Changing the game
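To make the idea concrete, here is a rough sketch of the check I have in mind. The function names are made up for illustration -- this is not the real DMU write path, and sha256_of_buf() does not exist -- only the block-pointer types and macros are real. It just shows the shape of the short-circuit:

    /*
     * Hypothetical sketch only -- write_is_noop() and sha256_of_buf()
     * are invented names, not real ZFS functions.  On an overwrite of
     * a fully populated block, compare the checksum of the incoming
     * data against the checksum already stored in the block pointer,
     * and skip the COW entirely when they match.
     */
    static boolean_t
    write_is_noop(const blkptr_t *bp, const void *newdata, size_t size)
    {
            zio_cksum_t newcksum;

            /* only safe with a collision-resistant checksum */
            if (BP_GET_CHECKSUM(bp) != ZIO_CHECKSUM_SHA256)
                    return (B_FALSE);

            sha256_of_buf(newdata, size, &newcksum);  /* invented helper */

            return (ZIO_CHECKSUM_EQUAL(bp->blk_cksum, newcksum));
    }

The point is that the comparison only needs data which is already at hand during the write: the incoming buffer and the block pointer of the block being overwritten.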
David Magda
2010-Jan-24 17:43 UTC
[zfs-discuss] optimise away COW when rewriting the same data?
On Jan 24, 2010, at 10:26, Kjetil Torgrim Homme wrote:

> But it occurred to me that this is a special case which could be
> beneficial in many cases -- if the filesystem uses secure checksums,
> it could check the existing block pointer and see if the replaced
> data matches. [...]
>
> Are there any ZFS hackers who can comment on the feasibility of this
> idea?

There is a bug that requests an API in ZFS' DMU library to get checksum data:

6856024 - DMU checksum API
http://bugs.opensolaris.org/bugdatabase/view_bug.do?bug_id=6856024

It specifically mentions Lustre, and not anything like the ZFS POSIX interface to files (ZPL). There's also:

> Here's another: file comparison based on values derived from files'
> checksum or dnode block pointer. This would allow for very efficient
> file comparison between filesystems related by cloning. Such values
> might be made available through an extended attribute, say.

http://bugs.opensolaris.org/bugdatabase/view_bug.do?bug_id=6366224

It's been brought up before on zfs-discuss: the two options would be linking against some kind of ZFS-specific library, or using an ioctl() of some kind. As it stands, ZFS is really the only mainstream(-ish) file system that does checksums, and so there's no standard POSIX call for such things. Perhaps as more file systems add this functionality, something will come of it.
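To give a flavour of what the extended-attribute idea from 6366224 might look like from userland -- purely hypothetical, nothing like this exists today, and the attribute name "SUNWzfs.cksum" is invented -- on Solaris you would reach it through attropen(3C):

    /*
     * Hypothetical: suppose ZFS exposed a per-file checksum through a
     * named extended attribute, as 6366224 suggests.  The attribute
     * name is invented for illustration.
     */
    #include <fcntl.h>
    #include <stdio.h>
    #include <unistd.h>

    int
    main(int argc, char **argv)
    {
            unsigned char cksum[32];        /* sha256 is 256 bits */
            ssize_t i, n;
            int fd;

            fd = attropen(argv[1], "SUNWzfs.cksum", O_RDONLY);
            if (fd == -1) {
                    perror("attropen");
                    return (1);
            }
            n = read(fd, cksum, sizeof (cksum));
            for (i = 0; i < n; i++)
                    printf("%02x", cksum[i]);
            printf("\n");
            close(fd);
            return (0);
    }

The attraction of the xattr route is that existing tools can read it with ordinary open/read calls, rather than needing a ZFS-specific library or ioctl.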
Kjetil Torgrim Homme
2010-Jan-24 23:20 UTC
[zfs-discuss] optimise away COW when rewriting the same data?
David Magda <dmagda at ee.ryerson.ca> writes:

> On Jan 24, 2010, at 10:26, Kjetil Torgrim Homme wrote:
>
>> But it occurred to me that this is a special case which could be
>> beneficial in many cases -- if the filesystem uses secure checksums,
>> it could check the existing block pointer and see if the replaced
>> data matches. [...]
>>
>> Are there any ZFS hackers who can comment on the feasibility of this
>> idea?
>
> There is a bug that requests an API in ZFS' DMU library to get
> checksum data:
>
> 6856024 - DMU checksum API
> http://bugs.opensolaris.org/bugdatabase/view_bug.do?bug_id=6856024

That would work, but it would require rsync to do the checksum calculation itself for the comparison. ZFS would then recalculate the checksum if the data was actually written, so work is wasted for local copies. It would be interesting to extend the rsync protocol to take advantage of this, though, so that the checksum can be calculated on the remote host.

Hmmm... it would need very ZFS-specific support, e.g., the recordsize is potentially different for each file, and likewise the checksum algorithm. Fixing a library seems easier than patching the kernel, so your approach is probably better anyhow.

> It specifically mentions Lustre, and not anything like the ZFS POSIX
> interface to files (ZPL). There's also:
>
>> Here's another: file comparison based on values derived from files'
>> checksum or dnode block pointer. This would allow for very efficient
>> file comparison between filesystems related by cloning. Such values
>> might be made available through an extended attribute, say.
>
> http://bugs.opensolaris.org/bugdatabase/view_bug.do?bug_id=6366224
>
> It's been brought up before on zfs-discuss: the two options would be
> linking against some kind of ZFS-specific library, or using an ioctl()
> of some kind. As it stands, ZFS is really the only mainstream(-ish)
> file system that does checksums, and so there's no standard POSIX call
> for such things. Perhaps as more file systems add this functionality,
> something will come of it.

The checksum algorithms would need to be very strictly specified. That's not a problem for sha256, I guess, but fletcher4 probably doesn't have an independent implementation which is 100% compatible with ZFS -- and GPL-licensed, as would be needed for rsync and many other applications.

--
Kjetil T. Homme
Redpill Linpro AS - Changing the game
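For what it's worth, the fletcher4 loop itself is tiny; a from-scratch version would look roughly like the sketch below (the struct name is mine, based on my reading of the OpenSolaris sources). The hard part isn't the loop, it's pinning down byte order and the handling of buffers whose size isn't a multiple of four bytes -- exactly what a strict specification would have to cover:

    /*
     * Sketch of a ZFS-style fletcher4: four 64-bit running sums over
     * the input taken as native-endian 32-bit words.  Byte order and
     * short-buffer handling are the open compatibility questions.
     */
    #include <stddef.h>
    #include <stdint.h>

    struct cksum256 { uint64_t word[4]; };  /* invented name */

    static void
    fletcher4(const void *buf, size_t size, struct cksum256 *out)
    {
            const uint32_t *ip = buf;
            const uint32_t *end = ip + (size / sizeof (uint32_t));
            uint64_t a = 0, b = 0, c = 0, d = 0;

            for (; ip < end; ip++) {
                    a += *ip;       /* Fletcher-style cascaded sums */
                    b += a;
                    c += b;
                    d += c;
            }
            out->word[0] = a;
            out->word[1] = b;
            out->word[2] = c;
            out->word[3] = d;
    }

An independent implementation would have to match all of those choices bit-for-bit before its checksums could be compared against what ZFS stores.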