Nirbheek Chauhan
2010-Dec-06 12:41 UTC
"Appending" data to the middle of a file using btrfs-specific features
Hello,

I'd like to know if there has been any discussion about adding a new
feature to write (add) data at an offset, but without overwriting
existing data, or re-writing the existing data. Essentially, in-place
addition/removal of data to a file at a place other than the end of
the file.

Some possible use-cases of such a feature would be:

(a) Databases (currently hack around this by allocating sparse files)
(b) Delta-patching (rsync, patch, xdelta, etc)
(c) Video editors (especially if combined with reflink copies)

Besides I/O savings, it would also have significant space savings if
the current subvolume being written to has been snapshotted (a common
use-case for incremental backups).

I've been told that the problem is somewhat difficult to solve
properly under block-based representation of data, but I was hoping
that btrfs' reflink mechanism and its space-efficient packing of small
files might make it doable.

A hack I can think of is to do a BTRFS_IOC_CLONE_RANGE into a new file
(up to the offset), writing whatever data is required, and then doing
another BTRFS_IOC_CLONE_RANGE with an offset for the rest of the
original file. This can be followed by a rename() over the original
file. Similarly for removing data from the middle of a file. Would
this work? Would it be cleaner to implement something equivalent
internally?

Thanks!

--
~Nirbheek Chauhan
Gentoo GNOME+Mozilla Team
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
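[Editorial sketch] The clone/write/clone/rename hack described above could look roughly like the following. This is a minimal sketch, not a tested tool: plan_insert() and insert_bytes() are hypothetical helper names, the temporary file name is an assumption, and the ioctl number is derived from the kernel's _IOW(0x94, 13, ...) definition with a 32-byte argument struct. As far as I understand, the clone ioctl only accepts block-aligned offsets (and block-aligned lengths unless the range ends at the source's EOF), so inserting at an arbitrary byte offset would need extra copying that this sketch leaves out.

```python
import fcntl
import os
import struct

# BTRFS_IOC_CLONE_RANGE = _IOW(0x94, 13, struct btrfs_ioctl_clone_range_args)
# with a 32-byte struct: (s64 src_fd, u64 src_offset, u64 src_length,
# u64 dest_offset).
BTRFS_IOC_CLONE_RANGE = 0x4020940D


def pack_clone_range(src_fd, src_offset, src_length, dest_offset):
    """Pack the ioctl argument struct in native byte order."""
    return struct.pack("=qQQQ", src_fd, src_offset, src_length, dest_offset)


def plan_insert(file_size, offset, insert_len):
    """Return the clone operations as (src_offset, length, dest_offset).

    Zero-length operations are dropped on purpose: a length of 0 asks the
    ioctl to clone up to EOF, which is not what is wanted here.
    """
    if not 0 <= offset <= file_size:
        raise ValueError("offset outside file")
    head = (0, offset, 0)
    tail = (offset, file_size - offset, offset + insert_len)
    return [op for op in (head, tail) if op[1] > 0]


def insert_bytes(path, offset, data):
    """Hypothetical helper: build a new file by cloning, then rename over.

    Only expected to work on a filesystem that supports the clone ioctl,
    and only for block-aligned offset and len(data).
    """
    tmp = path + ".tmp"  # assumed temporary name, same directory
    with open(path, "rb") as src, open(tmp, "wb") as dst:
        size = os.fstat(src.fileno()).st_size
        for src_off, length, dst_off in plan_insert(size, offset, len(data)):
            args = pack_clone_range(src.fileno(), src_off, length, dst_off)
            fcntl.ioctl(dst.fileno(), BTRFS_IOC_CLONE_RANGE, args)
        dst.seek(offset)
        dst.write(data)  # fill the gap left between the two clones
    os.rename(tmp, path)
```

Removing data from the middle would use the same two clones with the tail's destination offset moved backwards instead of forwards.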
Chris Mason
2010-Dec-06 16:05 UTC
Re: "Appending" data to the middle of a file using btrfs-specific features
Excerpts from Nirbheek Chauhan's message of 2010-12-06 07:41:16 -0500:
> Hello,
>
> I'd like to know if there has been any discussion about adding a new
> feature to write (add) data at an offset, but without overwriting
> existing data, or re-writing the existing data. Essentially, in-place
> addition/removal of data to a file at a place other than the end of
> the file.
>
> Some possible use-cases of such a feature would be:
>
> (a) Databases (currently hack around this by allocating sparse files)
> (b) Delta-patching (rsync, patch, xdelta, etc)
> (c) Video editors (especially if combined with reflink copies)
>
> Besides I/O savings, it would also have significant space savings if
> the current subvolume being written to has been snapshotted (a common
> use-case for incremental backups).
>
> I've been told that the problem is somewhat difficult to solve
> properly under block-based representation of data, but I was hoping
> that btrfs' reflink mechanism and its space-efficient packing of small
> files might make it doable.
>
> A hack I can think of is to do a BTRFS_IOC_CLONE_RANGE into a new file
> (up to the offset), writing whatever data is required, and then doing
> another BTRFS_IOC_CLONE_RANGE with an offset for the rest of the
> original file. This can be followed by a rename() over the original
> file. Similarly for removing data from the middle of a file. Would
> this work? Would it be cleaner to implement something equivalent
> internally?

It would work yes. The operation has three cases:

1) file size doesn't change
2) extend the file with new bytes in the middle
3) make the file smaller removing bytes in the middle

#1 is the easiest case, you can just use the clone range ioctl directly

For #2 and #3, all of the file pointers past the bytes you want to add
or remove need to be updated with a new file offset. I'd say for an
initial implementation to use the IOC_CLONE_RANGE code, and after
everything is working we can look at optimizing it with a shift ioctl if
it makes sense.

Of the use cases you list, video editors seems the most useful.
Databases already have things pretty much under control, and delta
patching wants to go to a new file anyway. Video editing software has
long been looking for ways to do this.

-chris
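[Editorial sketch] Cases #2 and #3 above come down to re-keying every file pointer past the edit point. A toy model of that shift is sketched below. It is not btrfs code: Extent, insert_gap() and remove_range() are made-up names, disk_ref is only a placeholder for an on-disk location, and the sketch assumes the edit falls exactly on extent boundaries so that no extent has to be split.

```python
from typing import List, NamedTuple


class Extent(NamedTuple):
    file_offset: int  # logical position of the extent within the file
    length: int       # number of bytes the extent covers
    disk_ref: str     # placeholder for the on-disk location


def insert_gap(extents: List[Extent], at: int, n: int) -> List[Extent]:
    """Case 2: open an n-byte gap at `at` by shifting later extents up."""
    out = []
    for e in extents:
        if e.file_offset < at < e.file_offset + e.length:
            raise ValueError("this sketch does not split extents")
        if e.file_offset >= at:
            e = e._replace(file_offset=e.file_offset + n)
        out.append(e)
    return out


def remove_range(extents: List[Extent], at: int, n: int) -> List[Extent]:
    """Case 3: drop bytes [at, at + n) and shift later extents down."""
    out = []
    for e in extents:
        end = e.file_offset + e.length
        if end <= at:
            out.append(e)                      # entirely before the cut
        elif e.file_offset >= at + n:
            out.append(e._replace(file_offset=e.file_offset - n))
        elif e.file_offset >= at and end <= at + n:
            continue                           # entirely inside the cut
        else:
            raise ValueError("this sketch does not split extents")
    return out
```

In both cases the data itself is untouched; only the offsets that map file positions to extents change, which is what a dedicated shift ioctl would presumably do in one step.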
Nirbheek Chauhan
2010-Dec-06 19:14 UTC
Re: "Appending" data to the middle of a file using btrfs-specific features
On Mon, Dec 6, 2010 at 9:35 PM, Chris Mason <chris.mason@oracle.com> wrote:
> Excerpts from Nirbheek Chauhan's message of 2010-12-06 07:41:16 -0500:
[snip]
>> Some possible use-cases of such a feature would be:
>>
>> (a) Databases (currently hack around this by allocating sparse files)
>> (b) Delta-patching (rsync, patch, xdelta, etc)
>> (c) Video editors (especially if combined with reflink copies)
>>
>> Besides I/O savings, it would also have significant space savings if
>> the current subvolume being written to has been snapshotted (a common
>> use-case for incremental backups).
>>
[snip]
>> A hack I can think of is to do a BTRFS_IOC_CLONE_RANGE into a new file
>> (up to the offset), writing whatever data is required, and then doing
>> another BTRFS_IOC_CLONE_RANGE with an offset for the rest of the
>> original file. This can be followed by a rename() over the original
>> file. Similarly for removing data from the middle of a file. Would
>> this work? Would it be cleaner to implement something equivalent
>> internally?
>
> It would work yes. The operation has three cases:
>
> 1) file size doesn't change
> 2) extend the file with new bytes in the middle
> 3) make the file smaller removing bytes in the middle
>
> #1 is the easiest case, you can just use the clone range ioctl directly
>
> For #2 and #3, all of the file pointers past the bytes you want to add
> or remove need to be updated with a new file offset. I'd say for an
> initial implementation to use the IOC_CLONE_RANGE code, and after
> everything is working we can look at optimizing it with a shift ioctl if
> it makes sense.
>

Alrighty, I'll try this and report back any bugs and/or suggestions.

> Of the use cases you list, video editors seems the most useful.
> Databases already have things pretty much under control, and delta
> patching wants to go to a new file anyway. Video editing software has
> long been looking for ways to do this.
>

As an aside, my primary motivation for this was that doing an
incremental backup of things like git bare repositories and databases
using btrfs subvolume snapshots is expensive w.r.t. disk space. Even
though rsync calculates a binary delta before transferring data, it
has to write everything out (except if just appending). So in that
case, each "incremental" backup is hardly so.

Thanks for your help! :)

--
~Nirbheek Chauhan
Gentoo GNOME+Mozilla Team
Chris Mason
2010-Dec-06 19:33 UTC
Re: "Appending" data to the middle of a file using btrfs-specific features
Excerpts from Nirbheek Chauhan's message of 2010-12-06 14:14:59 -0500:
> On Mon, Dec 6, 2010 at 9:35 PM, Chris Mason <chris.mason@oracle.com> wrote:
> > Excerpts from Nirbheek Chauhan's message of 2010-12-06 07:41:16 -0500:
> [snip]
> >> Some possible use-cases of such a feature would be:
> >>
> >> (a) Databases (currently hack around this by allocating sparse files)
> >> (b) Delta-patching (rsync, patch, xdelta, etc)
> >> (c) Video editors (especially if combined with reflink copies)
> >>
> >> Besides I/O savings, it would also have significant space savings if
> >> the current subvolume being written to has been snapshotted (a common
> >> use-case for incremental backups).
> >>
> [snip]
> >> A hack I can think of is to do a BTRFS_IOC_CLONE_RANGE into a new file
> >> (up to the offset), writing whatever data is required, and then doing
> >> another BTRFS_IOC_CLONE_RANGE with an offset for the rest of the
> >> original file. This can be followed by a rename() over the original
> >> file. Similarly for removing data from the middle of a file. Would
> >> this work? Would it be cleaner to implement something equivalent
> >> internally?
> >
> > It would work yes. The operation has three cases:
> >
> > 1) file size doesn't change
> > 2) extend the file with new bytes in the middle
> > 3) make the file smaller removing bytes in the middle
> >
> > #1 is the easiest case, you can just use the clone range ioctl directly
> >
> > For #2 and #3, all of the file pointers past the bytes you want to add
> > or remove need to be updated with a new file offset. I'd say for an
> > initial implementation to use the IOC_CLONE_RANGE code, and after
> > everything is working we can look at optimizing it with a shift ioctl if
> > it makes sense.
> >
>
> Alrighty, I'll try this and report back any bugs and/or suggestions.
>
> > Of the use cases you list, video editors seems the most useful.
> > Databases already have things pretty much under control, and delta
> > patching wants to go to a new file anyway. Video editing software has
> > long been looking for ways to do this.
> >
>
> As an aside, my primary motivation for this was that doing an
> incremental backup of things like git bare repositories and databases
> using btrfs subvolume snapshots is expensive w.r.t. disk space. Even
> though rsync calculates a binary delta before transferring data, it
> has to write everything out (except if just appending). So in that
> case, each "incremental" backup is hardly so.

Oh, I see what you mean. Yes that is definitely an interesting use
case.

-chris
Freddie Cash
2010-Dec-06 19:35 UTC
Re: "Appending" data to the middle of a file using btrfs-specific features
On Mon, Dec 6, 2010 at 11:14 AM, Nirbheek Chauhan
<nirbheek.chauhan@gmail.com> wrote:
> As an aside, my primary motivation for this was that doing an
> incremental backup of things like git bare repositories and databases
> using btrfs subvolume snapshots is expensive w.r.t. disk space. Even
> though rsync calculates a binary delta before transferring data, it
> has to write everything out (except if just appending). So in that
> case, each "incremental" backup is hardly so.

Since btrfs is Copy-on-Write, have you experimented with --inplace on
the rsync command-line? That way, rsync writes the changes "over-top"
of the existing file, thus allowing btrfs to only write out the blocks
that have changed, via CoW?

We do this with our ZFS rsync backups, and found disk usage to go way
down over the default "write out new data to new file, rename overtop"
method that rsync uses.

There's also the --no-whole-file option which causes rsync to only
send delta changes for existing files, another useful feature with CoW
filesystems.

--
Freddie Cash
fjwcash@gmail.com
Nirbheek Chauhan
2010-Dec-06 20:30 UTC
Re: "Appending" data to the middle of a file using btrfs-specific features
On Tue, Dec 7, 2010 at 1:05 AM, Freddie Cash <fjwcash@gmail.com> wrote:
> On Mon, Dec 6, 2010 at 11:14 AM, Nirbheek Chauhan
> <nirbheek.chauhan@gmail.com> wrote:
>> As an aside, my primary motivation for this was that doing an
>> incremental backup of things like git bare repositories and databases
>> using btrfs subvolume snapshots is expensive w.r.t. disk space. Even
>> though rsync calculates a binary delta before transferring data, it
>> has to write everything out (except if just appending). So in that
>> case, each "incremental" backup is hardly so.
>
> Since btrfs is Copy-on-Write, have you experimented with --inplace on
> the rsync command-line? That way, rsync writes the changes "over-top"
> of the existing file, thus allowing btrfs to only write out the blocks
> that have changed, via CoW?
>
> We do this with our ZFS rsync backups, and found disk usage to go way
> down over the default "write out new data to new file, rename overtop"
> method that rsync uses.
>
> There's also the --no-whole-file option which causes rsync to only
> send delta changes for existing files, another useful feature with CoW
> filesystems.
>

I had tried the --inplace option, but it didn't seem to do anything
for me, so I didn't explore that further. However, after following
your suggestion and retrying with --no-whole-file, I see that the
behaviour is quite different! It seems that --whole-file is enabled by
default for local file transfers, and so --inplace had no effect.

But the behaviour of --inplace is not entirely to write out *only* the
blocks that have changed. From what I could make out, it does the
following:

(1) Calculate a delta b/w the src and trg files
(2) Seek to the first difference in the target file
(3) Start writing data

I'm glossing over the final step because I didn't look deeper, but I
think you can safely assume that after the first difference, all data
is rewritten. So this is halfway between "rewrite the whole file" and
"write only the changed bits into the file". It doesn't actually use
any CoW features from what I can see. There is lots of room for btrfs
reflinking magic. :)

Note that I tested this behaviour on a btrfs partition with a vanilla
rsync-3.0.7 tarball; the copy you use with ZFS might be doing some CoW
magic.

Thanks for the tip!

--
~Nirbheek Chauhan
Gentoo GNOME+Mozilla Team
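[Editorial sketch] The three observed steps can be mimicked on an ordinary file to see how much gets written. This models only the behaviour as described in the message above (find the first difference, seek there, write everything after it); it is not rsync's actual algorithm, and first_difference() and inplace_update() are hypothetical names.

```python
import os
import tempfile


def first_difference(old: bytes, new: bytes) -> int:
    """Index of the first differing byte, or the shorter length if none."""
    n = min(len(old), len(new))
    for i in range(n):
        if old[i] != new[i]:
            return i
    return n


def inplace_update(path: str, new_data: bytes) -> int:
    """Overwrite `path` in place from the first difference onwards.

    Returns the number of bytes written, i.e. what a CoW filesystem would
    have to allocate fresh blocks for.
    """
    with open(path, "r+b") as f:
        old = f.read()
        start = first_difference(old, new_data)
        f.seek(start)
        f.write(new_data[start:])
        f.truncate(len(new_data))  # handles the case where the file shrank
    return len(new_data) - start
```

With this model, a one-byte insertion near the start of a large file rewrites almost the whole file, even though nearly all of its content is unchanged and merely shifted.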
Freddie Cash
2010-Dec-06 20:42 UTC
Re: "Appending" data to the middle of a file using btrfs-specific features
On Mon, Dec 6, 2010 at 12:30 PM, Nirbheek Chauhan
<nirbheek.chauhan@gmail.com> wrote:
> On Tue, Dec 7, 2010 at 1:05 AM, Freddie Cash <fjwcash@gmail.com> wrote:
>> On Mon, Dec 6, 2010 at 11:14 AM, Nirbheek Chauhan
>> <nirbheek.chauhan@gmail.com> wrote:
>>> As an aside, my primary motivation for this was that doing an
>>> incremental backup of things like git bare repositories and databases
>>> using btrfs subvolume snapshots is expensive w.r.t. disk space. Even
>>> though rsync calculates a binary delta before transferring data, it
>>> has to write everything out (except if just appending). So in that
>>> case, each "incremental" backup is hardly so.
>>
>> Since btrfs is Copy-on-Write, have you experimented with --inplace on
>> the rsync command-line? That way, rsync writes the changes "over-top"
>> of the existing file, thus allowing btrfs to only write out the blocks
>> that have changed, via CoW?
>>
>> We do this with our ZFS rsync backups, and found disk usage to go way
>> down over the default "write out new data to new file, rename overtop"
>> method that rsync uses.
>>
>> There's also the --no-whole-file option which causes rsync to only
>> send delta changes for existing files, another useful feature with CoW
>> filesystems.
>>
> I had tried the --inplace option, but it didn't seem to do anything
> for me, so I didn't explore that further. However, after following
> your suggestion and retrying with --no-whole-file, I see that the
> behaviour is quite different! It seems that --whole-file is enabled by
> default for local file transfers, and so --inplace had no effect.

Yes, correct, --whole-file is used for local transfers since it's
assumed you have all the disk I/O in the world, so why try to limit
the amount of data transferred. :)

> But the behaviour of --inplace is not entirely to write out *only* the
> blocks that have changed. From what I could make out, it does the
> following:
>
> (1) Calculate a delta b/w the src and trg files
> (2) Seek to the first difference in the target file
> (3) Start writing data

That may be true, I've never looked into the actual algorithm(s) that
rsync uses. Just played around with CLI options until we found the
set that works best in our situation (--inplace --delete-during
--no-whole-file --numeric-ids --hard-links --archive, over SSH with
HPN patches).

> I'm glossing over the final step because I didn't look deeper, but I
> think you can safely assume that after the first difference, all data
> is rewritten. So this is halfway between "rewrite the whole file" and
> "write only the changed bits into the file". It doesn't actually use
> any CoW features from what I can see. There is lots of room for btrfs
> reflinking magic. :)
>
> Note that I tested this behaviour on a btrfs partition with a vanilla
> rsync-3.0.7 tarball; the copy you use with ZFS might be doing some CoW
> magic.

All the CoW "magic" is handled by the filesystem, and not the tools on
top. If the tool only updates X bytes, which fit into 1 block on the
fs, then only that 1 block gets updated via CoW.

Personally, I don't think the tools need to be updated to understand
CoW or to integrate with the underlying FS. Instead, they should just
operate on blocks of X size, and let the FS figure out what to do.
Otherwise, you end up with "rsync for ZFS", "rsync for BtrFS", "rsync
for FAT32", etc.

But, I'm just a lowly sysadmin, what do I know about filesystem
internals? ;)

--
Freddie Cash
fjwcash@gmail.com
Nirbheek Chauhan
2010-Dec-07 07:38 UTC
Re: "Appending" data to the middle of a file using btrfs-specific features
On Tue, Dec 7, 2010 at 2:12 AM, Freddie Cash <fjwcash@gmail.com> wrote:
> On Mon, Dec 6, 2010 at 12:30 PM, Nirbheek Chauhan
> <nirbheek.chauhan@gmail.com> wrote:
>> But the behaviour of --inplace is not entirely to write out *only* the
>> blocks that have changed. From what I could make out, it does the
>> following:
>>
>> (1) Calculate a delta b/w the src and trg files
>> (2) Seek to the first difference in the target file
>> (3) Start writing data
>
> That may be true, I've never looked into the actual algorithm(s) that
> rsync uses. Just played around with CLI options until we found the
> set that works best in our situation (--inplace --delete-during
> --no-whole-file --numeric-ids --hard-links --archive, over SSH with
> HPN patches).
>
>> I'm glossing over the final step because I didn't look deeper, but I
>> think you can safely assume that after the first difference, all data
>> is rewritten. So this is halfway between "rewrite the whole file" and
>> "write only the changed bits into the file". It doesn't actually use
>> any CoW features from what I can see. There is lots of room for btrfs
>> reflinking magic. :)
>>
>> Note that I tested this behaviour on a btrfs partition with a vanilla
>> rsync-3.0.7 tarball; the copy you use with ZFS might be doing some CoW
>> magic.
>
> All the CoW "magic" is handled by the filesystem, and not the tools on
> top. If the tool only updates X bytes, which fit into 1 block on the
> fs, then only that 1 block gets updated via CoW.
>

I'm quite sure that's what happens in btrfs too, but the thing about
updating in-place is that if you have

ABCDXXXEFGH

which needs to change to

ABCDZZZEFGH

You're all good. Only the blocks corresponding to XXX will be updated.
But if the change is

ABCDZZZZEFGH

You'll need to start rewriting EFGH since there's no way to insert
data in the middle (afaik) of a file with standard syscalls. Maybe
later you get a set of changes which sync you up with the file's
contents again, but the chances of that happening in a large file are
quite remote. That's why I said that it can be safely assumed that
after the first difference, all data is rewritten.

The only way to get around this on the filesystem level that I can
think of is data de-duplication; the filesystem doesn't let go of the
blocks for a while, and does reflinking if the same data is written
again. Perhaps that's what ZFS is doing, I have no idea :)

--
~Nirbheek Chauhan
Gentoo GNOME+Mozilla Team
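[Editorial sketch] The ABCD... example above can be checked mechanically by comparing the old and new contents block by block at the same offsets, which is roughly what an in-place overwrite on a CoW filesystem amounts to. changed_blocks() is a hypothetical helper; a block size of 1 treats every letter as its own block.

```python
from typing import List


def changed_blocks(old: bytes, new: bytes, block_size: int = 1) -> List[int]:
    """Indices of blocks whose content differs at the same file offset."""
    longest = max(len(old), len(new))
    nblocks = -(-longest // block_size)  # ceiling division
    return [
        i
        for i in range(nblocks)
        if old[i * block_size:(i + 1) * block_size]
        != new[i * block_size:(i + 1) * block_size]
    ]


old = b"ABCDXXXEFGH"
same_size = changed_blocks(old, b"ABCDZZZEFGH")   # only the XXX blocks
inserted = changed_blocks(old, b"ABCDZZZZEFGH")   # everything from X onwards
```

A same-size replacement touches only the three replaced blocks, while the one-letter insertion shifts EFGH so that every block from the first difference to the end of the file compares as changed.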
Andrey Kuzmin
2010-Dec-07 07:50 UTC
Re: "Appending" data to the middle of a file using btrfs-specific features
On Mon, Dec 6, 2010 at 7:05 PM, Chris Mason <chris.mason@oracle.com> wrote:
> Excerpts from Nirbheek Chauhan's message of 2010-12-06 07:41:16 -0500:
>> Hello,
>>
>> I'd like to know if there has been any discussion about adding a new
>> feature to write (add) data at an offset, but without overwriting
>> existing data, or re-writing the existing data. Essentially, in-place
>> addition/removal of data to a file at a place other than the end of
>> the file.
>>
>> Some possible use-cases of such a feature would be:
>>
>> (a) Databases (currently hack around this by allocating sparse files)
>> (b) Delta-patching (rsync, patch, xdelta, etc)
>> (c) Video editors (especially if combined with reflink copies)
>>
>> Besides I/O savings, it would also have significant space savings if
>> the current subvolume being written to has been snapshotted (a common
>> use-case for incremental backups).
>>
>> I've been told that the problem is somewhat difficult to solve
>> properly under block-based representation of data, but I was hoping
>> that btrfs' reflink mechanism and its space-efficient packing of small
>> files might make it doable.
>>
>> A hack I can think of is to do a BTRFS_IOC_CLONE_RANGE into a new file
>> (up to the offset), writing whatever data is required, and then doing
>> another BTRFS_IOC_CLONE_RANGE with an offset for the rest of the
>> original file. This can be followed by a rename() over the original
>> file. Similarly for removing data from the middle of a file. Would
>> this work? Would it be cleaner to implement something equivalent
>> internally?
>
> It would work yes. The operation has three cases:
>
> 1) file size doesn't change
> 2) extend the file with new bytes in the middle
> 3) make the file smaller removing bytes in the middle
>
> #1 is the easiest case, you can just use the clone range ioctl directly

This doesn't seem to be interesting; it looks just like a traditional
COW overwrite.

>
> For #2 and #3, all of the file pointers past the bytes you want to add
> or remove need to be updated with a new file offset. I'd say for an
> initial implementation to use the IOC_CLONE_RANGE code, and after
> everything is working we can look at optimizing it with a shift ioctl if
> it makes sense.

Not sure how btrfs implements versioned B-trees, but other
snapshot-capable file-systems I'm aware of utilize a DITTO B-tree entry
that says "for this range, consult previous version tree". One can
imagine a DITTO(n) extension that would say "subtract n from look-up
key and then consult previous version tree", effectively achieving
range shift behavior. FWIW.

Regards,
Andrey

>
> Of the use cases you list, video editors seems the most useful.
> Databases already have things pretty much under control, and delta
> patching wants to go to a new file anyway. Video editing software has
> long been looking for ways to do this.
>
> -chris
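[Editorial sketch] The DITTO(n) idea can be illustrated with a toy lookup. Everything here is hypothetical and greatly simplified: the "tree" is a flat list of ranges, the previous version is a plain byte string, and none of it reflects how btrfs or any real filesystem stores its metadata.

```python
from typing import List, Tuple, Union

# An entry covers keys [start, end). Its payload is either literal bytes
# (new data) or an int n meaning DITTO(n): subtract n from the look-up key
# and consult the previous version. DITTO(0) is the plain "unchanged" marker.
Entry = Tuple[int, int, Union[bytes, int]]


def read_byte(tree: List[Entry], prev: bytes, key: int) -> int:
    """Resolve one byte of the new version."""
    for start, end, payload in tree:
        if start <= key < end:
            if isinstance(payload, bytes):
                return payload[key - start]
            return prev[key - payload]
    raise KeyError(key)


def read_all(tree: List[Entry], prev: bytes) -> bytes:
    """Materialise the whole new version from the tree and its parent."""
    size = max(end for _, end, _ in tree)
    return bytes(read_byte(tree, prev, k) for k in range(size))


prev = b"ABCDEFGH"
# Insert "ZZ" at offset 4: the tail is the old tail, shifted up by 2.
inserted = [(0, 4, 0), (4, 6, b"ZZ"), (6, 10, 2)]
# Remove "EF": the tail is the old tail, shifted down by 2.
removed = [(0, 4, 0), (4, 6, -2)]
```

The point of the shifted entry is that the tail of the file is described by a single range record rather than by rewriting every pointer after the edit.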
Nirbheek Chauhan
2010-Dec-07 11:29 UTC
Re: "Appending" data to the middle of a file using btrfs-specific features
[I think the mail was sent to just me due to a reply-accident, I've
re-added the mailing list for this reply]

On Tue, Dec 7, 2010 at 3:50 PM, David Pottage
<david@electric-spoon.com> wrote:
> On 06/12/10 12:41, Nirbheek Chauhan wrote:
>>
>> I'd like to know if there has been any discussion about adding a new
>> feature to write (add) data at an offset, but without overwriting
>> existing data, or re-writing the existing data. Essentially, in-place
>> addition/removal of data to a file at a place other than the end of
>> the file.
>>
>> Some possible use-cases of such a feature would be:
>>
>> (a) Databases (currently hack around this by allocating sparse files)
>> (b) Delta-patching (rsync, patch, xdelta, etc)
>> (c) Video editors (especially if combined with reflink copies)
>>
>> Besides I/O savings, it would also have significant space savings if
>> the current subvolume being written to has been snapshotted (a common
>> use-case for incremental backups).
>>
>
> This idea was discussed back in June. (Search the archives for "Complex
> filesystem operations: split and join")
>
> Back then the idea was to achieve insertion and removal of data by splitting
> and joining existing files, so to insert data in the middle of a file, you
> would cut it in two, append data to the first file and then re-join it.
>

Aha, I searched the archives and I found the thread in question[1],
thanks! The original thread seems to have gone for a split/join
implementation that would work with vfat along with a new syscall.

> I think that direct insertion and removal of data is a cleaner idea, though
> it may result in a more complex API. You could still achieve cutting files
> into two by creating a COW copy of the file and truncating one, and removing
> a block of bytes from the start of the other.
>

I agree, being able to manipulate file stream in a way similar to
inserting/deleting in linked lists would introduce new possibilities
(and challenges, I'm sure). As you mentioned in the original thread,
it's quite strange that there's no way to do this with current file
API.

> I still think it would be a good idea to be able to join files together with
> a file system API call, so the equivalent of:
>
> cat track1.mp3 track2.mp3 track3.mp3 > mix_tape.mp3
>
> Could be done as a filesystem call to create mix_tape.mp3 as a de-duplicated
> copy of the contents of the three source files, without many megabytes of
> I/O.
>

Ah, this is relatively straightforward with the clone_range ioctl.
There was some talk about a reflink() or clone() syscall a while
ago[2], perhaps that could be extended as reflink_range() so that it
could be used with other filesystems which support reflinks as well.

1. http://thread.gmane.org/gmane.linux.kernel/996835
2. http://lwn.net/Articles/333783/

--
~Nirbheek Chauhan
Gentoo GNOME+Mozilla Team
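[Editorial sketch] Joining files with the clone_range ioctl means cloning each source, in full, to a running destination offset. The planning step is sketched below. plan_concat() is a hypothetical name and the 4096-byte block size is an assumption. As far as I understand, the ioctl rejects destination offsets that are not block-aligned, so the plan flags those segments: they would need an ordinary copy instead of a clone, which is likely for files such as MP3s whose sizes are rarely a multiple of the block size.

```python
from typing import List, Tuple


def plan_concat(
    sizes: List[int], block_size: int = 4096
) -> List[Tuple[int, int, int, bool]]:
    """Plan the join as (source_index, length, dest_offset, cloneable).

    `cloneable` is True when the destination offset is block-aligned,
    which is assumed here to be what the clone ioctl requires.
    """
    ops = []
    dest = 0
    for index, size in enumerate(sizes):
        ops.append((index, size, dest, dest % block_size == 0))
        dest += size
    return ops


# Three sources whose sizes happen to be friendly to cloning, and two
# that are not.
aligned_plan = plan_concat([8192, 4096, 100])
unaligned_plan = plan_concat([100, 200])
```

Only the last source may have an arbitrary size without affecting alignment, since nothing is placed after it.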