Are there any plans to implement something akin to ZFS send/recv, to be able to create a stream representation of a snapshot and restore it later/somewhere else? I've spent some time trawling the mailing list and wiki, but I don't see anything there.

Cheers,
Pat
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html

On Friday 12 March 2010, Pat Patterson wrote:
> Are there any plans to implement something akin to ZFS send/recv, to
> be able to create a stream representation of a snapshot and restore it
> later/somewhere else? I've spent some time trawling the mailing list
> and wiki, but I don't see anything there.

I spent a bit of time on this topic, trying to find an efficient way to back up data incrementally.

AFAICT "zfs send" and "zfs recv" do the same thing that tar does: they transform a tree (or the difference between a tree and its snapshot) into a stream, and vice versa.

Transforming a tree into a stream is not very interesting. The interesting part is how to compare a tree and its snapshot. In fact, a snapshot of a tree should be a pointer to the original tree, and when a file is modified, the modified part (the extents of the file, the directories on the path) is branched off (yes, I know that this is a big simplification of the process). The key is that the file system knows which parts of a snapshot are still equal to the source and which are not.

If this kind of data were available to user space, comparing a tree and its snapshot would be very fast.

Reading the btrfs documentation, it seems that a "version number" is associated with each transaction. With this "version number" for a directory, we would be able to verify the equality of two trees by comparing only their roots. This would increase the speed of comparing two trees.

But I was never able to get this "version number". There is the ioctl command FS_IOC_GETVERSION, which seems to return this number. But when a directory or one of its children is updated, this number doesn't change.

I tried to hack the kernel code in order to test different "version" numbers: I tried inode->i_generation, btrfs_inode->generation, btrfs_inode->sequence, and btrfs_inode->{last|last_sub|logged}_trans... But none of the above was useful for my purpose.
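
For reference, a minimal sketch of querying FS_IOC_GETVERSION from user space in Python. The request number is built from the generic Linux _IOR('v', 1, long) encoding, which is an assumption that holds on x86 and ARM but not on every architecture. The helper name get_inode_version is hypothetical. On file systems that implement this ioctl the value is normally the inode generation, which would be consistent with the observation that it does not change when a directory's children are updated.

```python
import fcntl
import os
import struct
from typing import Optional

# _IOR('v', 1, long) in the generic Linux ioctl encoding.
# Assumption: x86/ARM-style bit layout; some architectures differ.
_IOC_READ = 2
FS_IOC_GETVERSION = (
    (_IOC_READ << 30) | (struct.calcsize("l") << 16) | (ord("v") << 8) | 1
)


def get_inode_version(path: str) -> Optional[int]:
    """Return the FS_IOC_GETVERSION value for path, or None if unavailable."""
    try:
        # os.open works for both regular files and directories.
        fd = os.open(path, os.O_RDONLY)
    except OSError:
        return None
    buf = bytearray(struct.calcsize("l"))
    try:
        fcntl.ioctl(fd, FS_IOC_GETVERSION, buf)
    except OSError:
        # The file system (or platform) does not support this ioctl.
        return None
    finally:
        os.close(fd)
    # Linux file systems typically store a 32-bit value at the start
    # of the buffer, so only the first four bytes are decoded.
    return struct.unpack_from("I", buf)[0]
```

Calling get_inode_version("/mnt/some/dir") before and after modifying a file in that directory is one way to reproduce the observation above.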

Even though there is no clear conclusion, I hope that this note is useful for starting a discussion on this matter.

Regards,
Goffredo

--
gpg key@ keyserver.linux.it: Goffredo Baroncelli (ghigo) <kreijackATinwind.it>
Key fingerprint = 4769 7E51 5293 D36C 814E C054 BF04 F161 3DC5 0512

On Fri, Mar 12, 2010 at 07:30:01PM +0100, Goffredo Baroncelli wrote:
> Reading the btrfs documentation, it seems that a "version number" is
> associated with each transaction. With this "version number" for a
> directory, we would be able to verify the equality of two trees by
> comparing only their roots. This would increase the speed of comparing
> two trees.

Every btree block and file extent includes the transaction id of when it was created. When COW is on, this means it includes the transaction id of when it was last modified.

Finding updated file extents means searching through the tree based on transaction id (ignoring any branch in the tree older than transid X), which is exactly what the treelog code does to efficiently log fsyncs. This is especially easy because the tree node pointers include the expected transaction id of what they are pointing to, so you can skip reading any tree block with an old pointer.

In the subvol branch, we have a new ioctl to do tree searches from userland based on these ranges. It can very easily be used to make a list of files (and extents in those files) that have been updated since a given transid.

> But I was never able to get this "version number". There is the ioctl
> command FS_IOC_GETVERSION, which seems to return this number. But when
> a directory or one of its children is updated, this number doesn't change.
>
> I tried to hack the kernel code in order to test different "version"
> numbers: I tried inode->i_generation, btrfs_inode->generation,
> btrfs_inode->sequence, and btrfs_inode->{last|last_sub|logged}_trans...
> But none of the above was useful for my purpose.

Right, I decided instead to store the generation in the file extent pointer. We needed it for other things as well, and it makes it possible to find individual extents that have changed in a file instead of just flagging the file as modified.

This would be a good project if anyone is interested, I'm happy to send along full details.

-chris
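
The pruning idea described above can be illustrated with a toy model. This is a sketch in Python, not btrfs code: the Node, Pointer, and Extent classes and the find_updated function are hypothetical names. It assumes, per the description, that every pointer records the transid of the block it points to and that COW rewrites the path up to the root whenever a leaf changes, so a pointer is never older than anything beneath it.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Union


@dataclass
class Extent:
    path: str
    offset: int
    generation: int  # transid that created this extent


@dataclass
class Pointer:
    generation: int  # expected transid of the block pointed to
    block: "Node"


@dataclass
class Node:
    items: List[Union[Pointer, Extent]] = field(default_factory=list)


def find_updated(node: Node, min_transid: int, stats: Dict[str, int]) -> List[Extent]:
    """Collect extents with generation >= min_transid, skipping old branches."""
    stats["blocks_read"] = stats.get("blocks_read", 0) + 1
    found: List[Extent] = []
    for item in node.items:
        if item.generation < min_transid:
            # Old pointer or old extent: nothing below it can be newer.
            continue
        if isinstance(item, Pointer):
            found.extend(find_updated(item.block, min_transid, stats))
        else:
            found.append(item)
    return found


# Two leaves; only the second was rewritten in transaction 12.
old_leaf = Node([Extent("a", 0, 5), Extent("b", 0, 7)])
new_leaf = Node([Extent("c", 0, 7), Extent("c", 4096, 12)])
root = Node([Pointer(7, old_leaf), Pointer(12, new_leaf)])

stats: Dict[str, int] = {}
changed = find_updated(root, 10, stats)
print(changed, stats)
```

In this toy tree the search for transid 10 reads the root and the rewritten leaf only; the leaf behind the generation-7 pointer is never touched.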

Hi Chris

On Monday 15 March 2010, Chris Mason wrote:
> Finding updated file extents means searching through the tree based on
> transaction id (ignoring any branch in the tree older than transid X),
> which is exactly what the treelog code does to efficiently log fsyncs.
> This is especially easy because the tree node pointers include the
> expected transaction id of what they are pointing to, so you can skip
> reading any tree block with an old pointer.

If I understand correctly, you are saying that it is possible to find the files updated between two transaction ids. That would be wonderful. One question comes to mind, though: what if the transaction doesn't contain the snapshot alone? Could the "delta" contain writes that happened after the second transaction or before the first transaction?

> In the subvol branch, we have a new ioctl to do tree searches from
> userland based on these ranges. It can very easily be used to make a
> list of files (and extents in those files) that have been updated since
> a given transid.
>
> [...]
>
> Right, I decided instead to store the generation in the file extent
> pointer. We needed it for other things as well, and it makes it
> possible to find individual extents that have changed in a file instead
> of just flagging the file as modified.
>
> This would be a good project if anyone is interested, I'm happy to send
> along full details.

If you are able to provide further details, I am interested. I would appreciate any suggestion on how to extract the transaction ID given a file (or a directory).

Goffredo

On Tue, Mar 16, 2010 at 07:17:12PM +0100, Goffredo Baroncelli wrote:
> If you are able to provide further details, I am interested.
> I would appreciate any suggestion on how to extract the transaction ID
> given a file (or a directory).

The new btrfs subvol find-new command has an example to build up a list of files that have changed based on the generation in the extent field. This is only the start of what a real tool needs, but it should definitely help anyone interested in this.

The usage is

btrfs subvol find-new <path> <generation>

If you pass a generation of zero, it'll list every file in the filesystem. Otherwise it will only list files with extents > the given generation.

The generations are done on each extent, and have nothing to do with mtime/ctime. So if you just touch a file, it won't show up in the list. For this tool to be real it will also need to check inode times against a reference time.

The list is per-subvol only, but there's no reason it can't descend into other subvols from userland.

Filtering the search by subdirectory is an exercise for the reader. In many cases it will actually be slower than doing the whole FS, but in others it'll be much faster.

Another thing to keep in mind is the search only finds extents after they have been written to the disk.

Example output:

btrfs subvol find-new /mnt 0 | head

# btrfs subvol find-new /mnt/foo 0 | head -n 3
inode 263 file offset 0 len 452 disk start 0 offset 0 gen 10017 flags INLINE linux.ext3/.git/hooks/applypatch-msg.sample
inode 264 file offset 0 len 160 disk start 0 offset 0 gen 10017 flags INLINE linux.ext3/.git/hooks/post-commit.sample
inode 267 file offset 0 len 8192 disk start 12582912 offset 0 gen 10017 flags NONE linux.ext3/.git/hooks/pre-rebase.sample

So we have two small inline files and one file with a regular extent.
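
A minimal sketch of parsing this output in Python, assuming each extent is printed on a single line in exactly the layout of the sample and that the path is the last field. The names FindNewRecord, parse_find_new, and changed_paths are hypothetical. Lines that do not match the layout are skipped.

```python
import re
from typing import Iterable, List, NamedTuple


class FindNewRecord(NamedTuple):
    inode: int
    file_offset: int
    length: int
    disk_start: int
    disk_offset: int
    generation: int
    flags: str
    path: str


# Assumed line layout, taken from the sample output in this thread.
_LINE = re.compile(
    r"^inode (\d+) file offset (\d+) len (\d+) "
    r"disk start (\d+) offset (\d+) gen (\d+) flags (\S+) (.+)$"
)


def parse_find_new(lines: Iterable[str]) -> List[FindNewRecord]:
    """Turn find-new output lines into records, ignoring unknown lines."""
    records: List[FindNewRecord] = []
    for line in lines:
        match = _LINE.match(line.rstrip("\n"))
        if match is None:
            continue
        *numbers, flags, path = match.groups()
        records.append(FindNewRecord(*map(int, numbers), flags, path))
    return records


def changed_paths(records: Iterable[FindNewRecord]) -> List[str]:
    """Return each path once, in first-seen order (files can have many extents)."""
    return list(dict.fromkeys(record.path for record in records))
```

The resulting paths are relative to the subvolume. One possible use is to write them one per line to a file and hand that to a tool that accepts a file list, such as rsync's --files-from option.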

The fields tell us:

inode number in the subvol
logical start of range in file
logical length of range in file (for a compressed file this would be the uncompressed size)
Extent start on disk
Offset into that extent on disk
Generation number of this extent (transid that created it)
Any flags: COMPRESS,INLINE,PREALLOC

The extent number on disk is included so that files sharing the same extents can be identified.

I'm sure we'll have to grow this a bit and play with it, but it's definitely a start. Just let me know if you have any questions. Extra points to the first person that finds a way to send this as a file list for rsync.

One important thing to remember if you want to use this to make a backup program is that it won't tell you about any files that have been removed. There are a few different ways to get this information, but the easiest way is to make a manifest of the directory listings for any directory that has been changed.

The search ioctl exposes the whole btrfs btree to userland, and the find-new command has a few different examples of ways you might use this. Once you get a feel for the searches a lot of things get easier, but the learning curve is pretty steep. Please ask questions early and often if you play with this and don't get the results you expect.

-chris
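
The manifest approach to detecting removed files could look roughly like the following Python sketch. It assumes the backup tool stores the directory listings from the previous run and already knows which directories to re-check. The names build_manifest and removed_entries are hypothetical, and nothing here is part of the find-new command itself.

```python
import os
from typing import Dict, Iterable, List, Set

Manifest = Dict[str, Set[str]]


def build_manifest(directories: Iterable[str]) -> Manifest:
    """Record the current listing of each directory that still exists."""
    manifest: Manifest = {}
    for directory in directories:
        try:
            manifest[directory] = set(os.listdir(directory))
        except FileNotFoundError:
            # The directory itself was removed; leave it out so that all
            # of its old entries are reported as removed.
            continue
    return manifest


def removed_entries(old: Manifest, new: Manifest) -> List[str]:
    """Return paths listed in the old manifest but absent from the new one."""
    removed: List[str] = []
    for directory in sorted(old):
        missing = old[directory] - new.get(directory, set())
        removed.extend(os.path.join(directory, name) for name in sorted(missing))
    return removed
```

A backup run would call build_manifest on the changed directories, compare the result against the stored manifest with removed_entries, delete those paths from the backup, and then store the new manifest for the next run.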