Hi Chris! Hi Alexander! I''m working on the kernel side of a feature request we have from the snapper developers. Snapper can be used (in conjunction) with btrfs on openSUSE to roll back system changes made via YaST and zypper. If you want to know more, you can see the following page: http://doc.opensuse.org/documentation/html/openSUSE/opensuse-reference/cha.snapper.html Right now snapper does this by running ''find'' and ''diff'' against snapshots it has taken. This is obviously expensive and we''re looking at something like the btrfs send ioctl to reduce the overhead during this operation. In fact, send works quite nicely except that it always reads changed file data from disk and puts it in the stream - snapper doesn''t want this overhead and would prefer to produce the changed data itself. Knowing that there *was* a data update however is still critical. So I have a conceptually simple patch which passes a flag to modify this behavior. The patch is included below however I will quote the description here: "This patch adds the flag, BTRFS_SEND_FLAG_NO_FILE_DATA to the btrfs send ioctl code. When this flag is set, the btrfs send code will never write file data into the stream (thus also avoiding expensive reads of that data in the first place). BTRFS_SEND_C_UPDATE_EXTENT commands will be sent (instead of BTRFS_SEND_C_WRITE) with an offset, length pair indicating the extent in question. This patch does not affect the operation of BTRFS_SEND_C_CLONE commands - they will continue to be sent when a search finds an appropriate extent to clone from." What do you all think of this approach? Is it too ugly to put a conditional command in the stream? Do we want to just add another ioctl? I''m loathe to go in *that* direction because I fear I''d be replicating most of the send code _anyway_ (well, it could just be a wrapper ioctl()). Thanks in advance for looking at this. --Mark -- Mark Fasheh From: Mark Fasheh <mfasheh@suse.de> btrfs: add "no file data" flag to btrfs send ioctl This patch adds the flag, BTRFS_SEND_FLAG_NO_FILE_DATA to the btrfs send ioctl code. When this flag is set, the btrfs send code will never write file data into the stream (thus also avoiding expensive reads of that data in the first place). BTRFS_SEND_C_UPDATE_EXTENT commands will be sent (instead of BTRFS_SEND_C_WRITE) with an offset, length pair indicating the extent in question. This patch does not affect the operation of BTRFS_SEND_C_CLONE commands - they will continue to be sent when a search finds an appropriate extent to clone from. Signed-off-by: Mark Fasheh <mfasheh@suse.de> --- fs/btrfs/ioctl.h | 7 +++++++ fs/btrfs/send.c | 48 ++++++++++++++++++++++++++++++++++++++++++++---- fs/btrfs/send.h | 1 + 3 files changed, 52 insertions(+), 4 deletions(-) diff --git a/fs/btrfs/ioctl.h b/fs/btrfs/ioctl.h index 731e287..1f6cfdd 100644 --- a/fs/btrfs/ioctl.h +++ b/fs/btrfs/ioctl.h @@ -363,6 +363,13 @@ struct btrfs_ioctl_received_subvol_args { __u64 reserved[16]; /* in */ }; +/* + * Caller doesn''t want file data in the send stream, even if the + * search of clone sources doesn''t find an extent. UPDATE_EXTENT + * commands will be sent instead of WRITE commands. + */ +#define BTRFS_SEND_FLAG_NO_FILE_DATA 0x1 + struct btrfs_ioctl_send_args { __s64 send_fd; /* in */ __u64 clone_sources_count; /* in */ diff --git a/fs/btrfs/send.c b/fs/btrfs/send.c index e78b297..f97b5e6 100644 --- a/fs/btrfs/send.c +++ b/fs/btrfs/send.c @@ -85,6 +85,7 @@ struct send_ctx { u32 send_max_size; u64 total_send_size; u64 cmd_send_size[BTRFS_SEND_C_MAX + 1]; + u64 flags; /* ''flags'' member of btrfs_ioctl_send_args is u64 */ struct vfsmount *mnt; @@ -3707,6 +3708,42 @@ out: return ret; } +/* + * Send an update extent command to user space. + */ +static int send_update_extent(struct send_ctx *sctx, + u64 offset, u32 len) +{ + int ret = 0; + struct fs_path *p; + +verbose_printk("btrfs: send_update_extent offset=%llu, len=%d\n", offset, len); + + p = fs_path_alloc(sctx); + if (!p) + return -ENOMEM; + + ret = begin_cmd(sctx, BTRFS_SEND_C_UPDATE_EXTENT); + if (ret < 0) + goto out; + + ret = get_cur_path(sctx, sctx->cur_ino, sctx->cur_inode_gen, p); + if (ret < 0) + goto out; + + TLV_PUT_U64(sctx, BTRFS_SEND_A_FILE_OFFSET, offset); + TLV_PUT_U64(sctx, BTRFS_SEND_A_CLONE_LEN, len); + TLV_PUT_PATH(sctx, BTRFS_SEND_A_PATH, p); + TLV_PUT_U64(sctx, BTRFS_SEND_A_SIZE, len); + + ret = send_cmd(sctx); + +tlv_put_failure: +out: + fs_path_free(sctx, p); + return ret; +} + static int send_write_or_clone(struct send_ctx *sctx, struct btrfs_path *path, struct btrfs_key *key, @@ -3742,7 +3779,11 @@ static int send_write_or_clone(struct send_ctx *sctx, goto out; } - if (!clone_root) { + if (clone_root) { + ret = send_clone(sctx, offset, len, clone_root); + } else if (sctx->flags & BTRFS_SEND_FLAG_NO_FILE_DATA) { + ret = send_update_extent(sctx, offset, len); + } else { while (pos < len) { l = len - pos; if (l > BTRFS_SEND_READ_SIZE) @@ -3755,10 +3796,7 @@ static int send_write_or_clone(struct send_ctx *sctx, pos += ret; } ret = 0; - } else { - ret = send_clone(sctx, offset, len, clone_root); } - out: return ret; } @@ -4570,6 +4608,8 @@ long btrfs_ioctl_send(struct file *mnt_file, void __user *arg_) INIT_RADIX_TREE(&sctx->name_cache, GFP_NOFS); INIT_LIST_HEAD(&sctx->name_cache_list); + sctx->flags = arg->flags; + sctx->send_filp = fget(arg->send_fd); if (IS_ERR(sctx->send_filp)) { ret = PTR_ERR(sctx->send_filp); diff --git a/fs/btrfs/send.h b/fs/btrfs/send.h index 1bf4f32..8bb18f7 100644 --- a/fs/btrfs/send.h +++ b/fs/btrfs/send.h @@ -86,6 +86,7 @@ enum btrfs_send_cmd { BTRFS_SEND_C_UTIMES, BTRFS_SEND_C_END, + BTRFS_SEND_C_UPDATE_EXTENT, __BTRFS_SEND_C_MAX, }; #define BTRFS_SEND_C_MAX (__BTRFS_SEND_C_MAX - 1) -- 1.7.7 -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Hi Mark, I like your approach in general. Two comments on the patch below. On Mon, December 31, 2012 at 22:33 (+0100), Mark Fasheh wrote:> Hi Chris! Hi Alexander! > > I''m working on the kernel side of a feature request we have from the snapper > developers. Snapper can be used (in conjunction) with btrfs on openSUSE to > roll back system changes made via YaST and zypper. > > If you want to know more, you can see the following page: > > http://doc.opensuse.org/documentation/html/openSUSE/opensuse-reference/cha.snapper.html > > Right now snapper does this by running ''find'' and ''diff'' against snapshots > it has taken. This is obviously expensive and we''re looking at something > like the btrfs send ioctl to reduce the overhead during this operation. > > In fact, send works quite nicely except that it always reads changed file > data from disk and puts it in the stream - snapper doesn''t want this > overhead and would prefer to produce the changed data itself. Knowing that > there *was* a data update however is still critical. > > So I have a conceptually simple patch which passes a flag to modify this > behavior. The patch is included below however I will quote the description > here: > > "This patch adds the flag, BTRFS_SEND_FLAG_NO_FILE_DATA to the btrfs send > ioctl code. When this flag is set, the btrfs send code will never write file > data into the stream (thus also avoiding expensive reads of that data in the > first place). BTRFS_SEND_C_UPDATE_EXTENT commands will be sent (instead of > BTRFS_SEND_C_WRITE) with an offset, length pair indicating the extent in > question. > > This patch does not affect the operation of BTRFS_SEND_C_CLONE commands - > they will continue to be sent when a search finds an appropriate extent to > clone from." > > > What do you all think of this approach? Is it too ugly to put a conditional > command in the stream? Do we want to just add another ioctl? I''m loathe to > go in *that* direction because I fear I''d be replicating most of the send > code _anyway_ (well, it could just be a wrapper ioctl()). > > Thanks in advance for looking at this. > --Mark > > -- > Mark Fasheh > > > From: Mark Fasheh <mfasheh@suse.de> > > btrfs: add "no file data" flag to btrfs send ioctl > > This patch adds the flag, BTRFS_SEND_FLAG_NO_FILE_DATA to the btrfs send > ioctl code. When this flag is set, the btrfs send code will never write file > data into the stream (thus also avoiding expensive reads of that data in the > first place). BTRFS_SEND_C_UPDATE_EXTENT commands will be sent (instead of > BTRFS_SEND_C_WRITE) with an offset, length pair indicating the extent in > question. > > This patch does not affect the operation of BTRFS_SEND_C_CLONE commands - > they will continue to be sent when a search finds an appropriate extent to > clone from. > > Signed-off-by: Mark Fasheh <mfasheh@suse.de> > --- > fs/btrfs/ioctl.h | 7 +++++++ > fs/btrfs/send.c | 48 ++++++++++++++++++++++++++++++++++++++++++++---- > fs/btrfs/send.h | 1 + > 3 files changed, 52 insertions(+), 4 deletions(-) > > diff --git a/fs/btrfs/ioctl.h b/fs/btrfs/ioctl.h > index 731e287..1f6cfdd 100644 > --- a/fs/btrfs/ioctl.h > +++ b/fs/btrfs/ioctl.h > @@ -363,6 +363,13 @@ struct btrfs_ioctl_received_subvol_args { > __u64 reserved[16]; /* in */ > }; > > +/* > + * Caller doesn''t want file data in the send stream, even if the > + * search of clone sources doesn''t find an extent. UPDATE_EXTENT > + * commands will be sent instead of WRITE commands. > + */ > +#define BTRFS_SEND_FLAG_NO_FILE_DATA 0x1 > + > struct btrfs_ioctl_send_args { > __s64 send_fd; /* in */ > __u64 clone_sources_count; /* in */ > diff --git a/fs/btrfs/send.c b/fs/btrfs/send.c > index e78b297..f97b5e6 100644 > --- a/fs/btrfs/send.c > +++ b/fs/btrfs/send.c > @@ -85,6 +85,7 @@ struct send_ctx { > u32 send_max_size; > u64 total_send_size; > u64 cmd_send_size[BTRFS_SEND_C_MAX + 1]; > + u64 flags; /* ''flags'' member of btrfs_ioctl_send_args is u64 */ > > struct vfsmount *mnt; > > @@ -3707,6 +3708,42 @@ out: > return ret; > } > > +/* > + * Send an update extent command to user space. > + */ > +static int send_update_extent(struct send_ctx *sctx, > + u64 offset, u32 len) > +{ > + int ret = 0; > + struct fs_path *p; > + > +verbose_printk("btrfs: send_update_extent offset=%llu, len=%d\n", offset, len);I know the send code has some of those, too, but it really shouldn''t. Please don''t add any new ones. If you think that''s useful, please use proper indentation or even better, pr_debug with proper indentation.> + > + p = fs_path_alloc(sctx); > + if (!p) > + return -ENOMEM; > + > + ret = begin_cmd(sctx, BTRFS_SEND_C_UPDATE_EXTENT); > + if (ret < 0) > + goto out; > + > + ret = get_cur_path(sctx, sctx->cur_ino, sctx->cur_inode_gen, p); > + if (ret < 0) > + goto out; > + > + TLV_PUT_U64(sctx, BTRFS_SEND_A_FILE_OFFSET, offset); > + TLV_PUT_U64(sctx, BTRFS_SEND_A_CLONE_LEN, len); > + TLV_PUT_PATH(sctx, BTRFS_SEND_A_PATH, p); > + TLV_PUT_U64(sctx, BTRFS_SEND_A_SIZE, len);Having both CLONE_LEN and SIZE doesn''t seem right. The sequence seems a bit odd here, too, but it''s in good company with the rest of send.c. -Jan> + > + ret = send_cmd(sctx); > + > +tlv_put_failure: > +out: > + fs_path_free(sctx, p); > + return ret; > +} > + > static int send_write_or_clone(struct send_ctx *sctx, > struct btrfs_path *path, > struct btrfs_key *key, > @@ -3742,7 +3779,11 @@ static int send_write_or_clone(struct send_ctx *sctx, > goto out; > } > > - if (!clone_root) { > + if (clone_root) { > + ret = send_clone(sctx, offset, len, clone_root); > + } else if (sctx->flags & BTRFS_SEND_FLAG_NO_FILE_DATA) { > + ret = send_update_extent(sctx, offset, len); > + } else { > while (pos < len) { > l = len - pos; > if (l > BTRFS_SEND_READ_SIZE) > @@ -3755,10 +3796,7 @@ static int send_write_or_clone(struct send_ctx *sctx, > pos += ret; > } > ret = 0; > - } else { > - ret = send_clone(sctx, offset, len, clone_root); > } > - > out: > return ret; > } > @@ -4570,6 +4608,8 @@ long btrfs_ioctl_send(struct file *mnt_file, void __user *arg_) > INIT_RADIX_TREE(&sctx->name_cache, GFP_NOFS); > INIT_LIST_HEAD(&sctx->name_cache_list); > > + sctx->flags = arg->flags; > + > sctx->send_filp = fget(arg->send_fd); > if (IS_ERR(sctx->send_filp)) { > ret = PTR_ERR(sctx->send_filp); > diff --git a/fs/btrfs/send.h b/fs/btrfs/send.h > index 1bf4f32..8bb18f7 100644 > --- a/fs/btrfs/send.h > +++ b/fs/btrfs/send.h > @@ -86,6 +86,7 @@ enum btrfs_send_cmd { > BTRFS_SEND_C_UTIMES, > > BTRFS_SEND_C_END, > + BTRFS_SEND_C_UPDATE_EXTENT, > __BTRFS_SEND_C_MAX, > }; > #define BTRFS_SEND_C_MAX (__BTRFS_SEND_C_MAX - 1) >-- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Wed, Jan 02, 2013 at 12:44:00PM +0100, Jan Schmidt wrote:> Hi Mark, > > I like your approach in general. Two comments on the patch below.Oh great! Thanks for looking at this Jan!> > diff --git a/fs/btrfs/send.c b/fs/btrfs/send.c > > index e78b297..f97b5e6 100644 > > --- a/fs/btrfs/send.c > > +++ b/fs/btrfs/send.c > > @@ -85,6 +85,7 @@ struct send_ctx { > > u32 send_max_size; > > u64 total_send_size; > > u64 cmd_send_size[BTRFS_SEND_C_MAX + 1]; > > + u64 flags; /* ''flags'' member of btrfs_ioctl_send_args is u64 */ > > > > struct vfsmount *mnt; > > > > @@ -3707,6 +3708,42 @@ out: > > return ret; > > } > > > > +/* > > + * Send an update extent command to user space. > > + */ > > +static int send_update_extent(struct send_ctx *sctx, > > + u64 offset, u32 len) > > +{ > > + int ret = 0; > > + struct fs_path *p; > > + > > +verbose_printk("btrfs: send_update_extent offset=%llu, len=%d\n", offset, len); > > I know the send code has some of those, too, but it really shouldn''t. Please > don''t add any new ones. If you think that''s useful, please use proper > indentation or even better, pr_debug with proper indentation.Ugh yeah I see that now. Definitely I was "going with the flow" so to speak. I''ll kill that line completely most likely. It''s not really telling us a whole lot I couldn''t get from other means.> > + > > + p = fs_path_alloc(sctx); > > + if (!p) > > + return -ENOMEM; > > + > > + ret = begin_cmd(sctx, BTRFS_SEND_C_UPDATE_EXTENT); > > + if (ret < 0) > > + goto out; > > + > > + ret = get_cur_path(sctx, sctx->cur_ino, sctx->cur_inode_gen, p); > > + if (ret < 0) > > + goto out; > > + > > + TLV_PUT_U64(sctx, BTRFS_SEND_A_FILE_OFFSET, offset); > > + TLV_PUT_U64(sctx, BTRFS_SEND_A_CLONE_LEN, len); > > + TLV_PUT_PATH(sctx, BTRFS_SEND_A_PATH, p); > > + TLV_PUT_U64(sctx, BTRFS_SEND_A_SIZE, len); > > Having both CLONE_LEN and SIZE doesn''t seem right. The sequence seems a bit odd > here, too, but it''s in good company with the rest of send.c.You''re right though that CLONE_LEN doesn''t have any reason to be there. I''m partial to changing the sequence then: TLV_PUT_PATH(sctx, BTRFS_SEND_A_PATH, p); TLV_PUT_U64(sctx, BTRFS_SEND_A_FILE_OFFSET, offset); TLV_PUT_U64(sctx, BTRFS_SEND_A_SIZE, len); Seems more sane from a readability standpoint. "Oh, this <path> had an extent change in the range (<offset>, <len>). Thanks again! --Mark -- Mark Fasheh -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html