Hi Chris, Those patches (except the last two) have been sent to the mailing list before, and they fix real bugs (but not critical bugs). Please consider queue them for 2.6.38. The free-space cache fixes were reviewed by Josef. Other patches fix some memory leaks, fix an acl bug, and fix the file clone ioctl. Note: not all the patches are from Fujitsu. Those patches can be pulled from: git://repo.or.cz/linux-btrfs-devel.git bug-fixes Ian Kent (1): Btrfs: Fix memory leak on finding existing super Li Zefan (8): btrfs: Fix threshold calculation for block groups smaller than 1GB btrfs: Add helper function free_bitmap() btrfs: Free fully occupied bitmap in cluster btrfs: Update stats when allocating from a cluster btrfs: Add a helper try_merge_free_space() btrfs: Check mergeable free space when removing a cluster Btrfs: Fix memory leak at umount Btrfs: Fix file clone when source offset is not 0 Miao Xie (2): Btrfs: Don''t return acl info when mounting with noacl option Btrfs: Fix memory leak in writepage fixup work Tero Roponen (1): Btrfs: Free correct pointer after using strsep fs/btrfs/acl.c | 6 ++ fs/btrfs/disk-io.c | 2 + fs/btrfs/free-space-cache.c | 161 ++++++++++++++++++++++++++----------------- fs/btrfs/inode.c | 1 + fs/btrfs/ioctl.c | 5 +- fs/btrfs/super.c | 7 ++- 6 files changed, 117 insertions(+), 65 deletions(-) -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Li Zefan
2011-Jan-27 08:42 UTC
[PATCH 01/12] btrfs: Fix threshold calculation for block groups smaller than 1GB
If a block group is smaller than 1GB, the extent entry threadhold calculation will always set the threshold to 0. So as free space gets fragmented, btrfs will switch to use bitmap to manage free space, but then will never switch back to extents due to this bug. Reviewed-by: Josef Bacik <josef@redhat.com> Signed-off-by: Li Zefan <lizf@cn.fujitsu.com> --- fs/btrfs/free-space-cache.c | 8 ++++++-- 1 files changed, 6 insertions(+), 2 deletions(-) diff --git a/fs/btrfs/free-space-cache.c b/fs/btrfs/free-space-cache.c index 60d6842..42f4015 100644 --- a/fs/btrfs/free-space-cache.c +++ b/fs/btrfs/free-space-cache.c @@ -1016,14 +1016,18 @@ static void recalculate_thresholds(struct btrfs_block_group_cache *block_group) u64 max_bytes; u64 bitmap_bytes; u64 extent_bytes; + u64 size = block_group->key.offset; /* * The goal is to keep the total amount of memory used per 1gb of space * at or below 32k, so we need to adjust how much memory we allow to be * used by extent based free space tracking */ - max_bytes = MAX_CACHE_BYTES_PER_GIG * - (div64_u64(block_group->key.offset, 1024 * 1024 * 1024)); + if (size < 1024 * 1024 * 1024) + max_bytes = MAX_CACHE_BYTES_PER_GIG; + else + max_bytes = MAX_CACHE_BYTES_PER_GIG * + div64_u64(size, 1024 * 1024 * 1024); /* * we want to account for 1 more bitmap than what we have so we can make -- 1.6.3 -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Remove some duplicated code. This prepares for the next patch. Reviewed-by: Josef Bacik <josef@redhat.com> Signed-off-by: Li Zefan <lizf@cn.fujitsu.com> --- fs/btrfs/free-space-cache.c | 37 ++++++++++++++++--------------------- 1 files changed, 16 insertions(+), 21 deletions(-) diff --git a/fs/btrfs/free-space-cache.c b/fs/btrfs/free-space-cache.c index 42f4015..850104f 100644 --- a/fs/btrfs/free-space-cache.c +++ b/fs/btrfs/free-space-cache.c @@ -1175,6 +1175,16 @@ static void add_new_bitmap(struct btrfs_block_group_cache *block_group, recalculate_thresholds(block_group); } +static void free_bitmap(struct btrfs_block_group_cache *block_group, + struct btrfs_free_space *bitmap_info) +{ + unlink_free_space(block_group, bitmap_info); + kfree(bitmap_info->bitmap); + kfree(bitmap_info); + block_group->total_bitmaps--; + recalculate_thresholds(block_group); +} + static noinline int remove_from_bitmap(struct btrfs_block_group_cache *block_group, struct btrfs_free_space *bitmap_info, u64 *offset, u64 *bytes) @@ -1215,13 +1225,8 @@ again: if (*bytes) { struct rb_node *next = rb_next(&bitmap_info->offset_index); - if (!bitmap_info->bytes) { - unlink_free_space(block_group, bitmap_info); - kfree(bitmap_info->bitmap); - kfree(bitmap_info); - block_group->total_bitmaps--; - recalculate_thresholds(block_group); - } + if (!bitmap_info->bytes) + free_bitmap(block_group, bitmap_info); /* * no entry after this bitmap, but we still have bytes to @@ -1254,13 +1259,8 @@ again: return -EAGAIN; goto again; - } else if (!bitmap_info->bytes) { - unlink_free_space(block_group, bitmap_info); - kfree(bitmap_info->bitmap); - kfree(bitmap_info); - block_group->total_bitmaps--; - recalculate_thresholds(block_group); - } + } else if (!bitmap_info->bytes) + free_bitmap(block_group, bitmap_info); return 0; } @@ -1689,13 +1689,8 @@ u64 btrfs_find_space_for_alloc(struct btrfs_block_group_cache *block_group, ret = offset; if (entry->bitmap) { bitmap_clear_bits(block_group, entry, offset, bytes); - if (!entry->bytes) { - unlink_free_space(block_group, entry); - kfree(entry->bitmap); - kfree(entry); - block_group->total_bitmaps--; - recalculate_thresholds(block_group); - } + if (!entry->bytes) + free_bitmap(block_group, entry); } else { unlink_free_space(block_group, entry); entry->offset += bytes; -- 1.6.3 -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
If there''s no more free space in a bitmap, we should free it. Reviewed-by: Josef Bacik <josef@redhat.com> Signed-off-by: Li Zefan <lizf@cn.fujitsu.com> --- fs/btrfs/free-space-cache.c | 2 ++ 1 files changed, 2 insertions(+), 0 deletions(-) diff --git a/fs/btrfs/free-space-cache.c b/fs/btrfs/free-space-cache.c index 850104f..cb0137e 100644 --- a/fs/btrfs/free-space-cache.c +++ b/fs/btrfs/free-space-cache.c @@ -1788,6 +1788,8 @@ static u64 btrfs_alloc_from_bitmap(struct btrfs_block_group_cache *block_group, ret = search_start; bitmap_clear_bits(block_group, entry, ret, bytes); + if (entry->bytes == 0) + free_bitmap(block_group, entry); out: spin_unlock(&cluster->lock); spin_unlock(&block_group->tree_lock); -- 1.6.3 -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Li Zefan
2011-Jan-27 08:43 UTC
[PATCH 04/12] btrfs: Update stats when allocating from a cluster
When allocating extent entry from a cluster, we should update the free_space and free_extents fields of the block group. Reviewed-by: Josef Bacik <josef@redhat.com> Signed-off-by: Li Zefan <lizf@cn.fujitsu.com> --- fs/btrfs/free-space-cache.c | 17 ++++++++++++++--- 1 files changed, 14 insertions(+), 3 deletions(-) diff --git a/fs/btrfs/free-space-cache.c b/fs/btrfs/free-space-cache.c index cb0137e..2974c47 100644 --- a/fs/btrfs/free-space-cache.c +++ b/fs/btrfs/free-space-cache.c @@ -1843,15 +1843,26 @@ u64 btrfs_alloc_from_cluster(struct btrfs_block_group_cache *block_group, entry->offset += bytes; entry->bytes -= bytes; - if (entry->bytes == 0) { + if (entry->bytes == 0) rb_erase(&entry->offset_index, &cluster->root); - kfree(entry); - } break; } out: spin_unlock(&cluster->lock); + if (!ret) + return 0; + + spin_lock(&block_group->tree_lock); + + block_group->free_space -= bytes; + if (entry->bytes == 0) { + block_group->free_extents--; + kfree(entry); + } + + spin_unlock(&block_group->tree_lock); + return ret; } -- 1.6.3 -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
When adding a new extent, we''ll firstly see if we can merge this extent to the left or/and right extent. Extract this as a helper try_merge_free_space(). As a side effect, we fix a small bug that if the new extent has non-bitmap left entry but is unmergeble, we''ll directly link the extent without trying to drop it into bitmap. This also prepares for the next patch. Reviewed-by: Josef Bacik <josef@redhat.com> Signed-off-by: Li Zefan <lizf@cn.fujitsu.com> --- fs/btrfs/free-space-cache.c | 75 ++++++++++++++++++++++++------------------ 1 files changed, 43 insertions(+), 32 deletions(-) diff --git a/fs/btrfs/free-space-cache.c b/fs/btrfs/free-space-cache.c index 2974c47..cf67dc3 100644 --- a/fs/btrfs/free-space-cache.c +++ b/fs/btrfs/free-space-cache.c @@ -1363,22 +1363,14 @@ out: return ret; } -int btrfs_add_free_space(struct btrfs_block_group_cache *block_group, - u64 offset, u64 bytes) +bool try_merge_free_space(struct btrfs_block_group_cache *block_group, + struct btrfs_free_space *info) { - struct btrfs_free_space *right_info = NULL; - struct btrfs_free_space *left_info = NULL; - struct btrfs_free_space *info = NULL; - int ret = 0; - - info = kzalloc(sizeof(struct btrfs_free_space), GFP_NOFS); - if (!info) - return -ENOMEM; - - info->offset = offset; - info->bytes = bytes; - - spin_lock(&block_group->tree_lock); + struct btrfs_free_space *left_info; + struct btrfs_free_space *right_info; + bool merged = false; + u64 offset = info->offset; + u64 bytes = info->bytes; /* * first we want to see if there is free space adjacent to the range we @@ -1392,27 +1384,11 @@ int btrfs_add_free_space(struct btrfs_block_group_cache *block_group, else left_info = tree_search_offset(block_group, offset - 1, 0, 0); - /* - * If there was no extent directly to the left or right of this new - * extent then we know we''re going to have to allocate a new extent, so - * before we do that see if we need to drop this into a bitmap - */ - if ((!left_info || left_info->bitmap) && - (!right_info || right_info->bitmap)) { - ret = insert_into_bitmap(block_group, info); - - if (ret < 0) { - goto out; - } else if (ret) { - ret = 0; - goto out; - } - } - if (right_info && !right_info->bitmap) { unlink_free_space(block_group, right_info); info->bytes += right_info->bytes; kfree(right_info); + merged = true; } if (left_info && !left_info->bitmap && @@ -1421,8 +1397,43 @@ int btrfs_add_free_space(struct btrfs_block_group_cache *block_group, info->offset = left_info->offset; info->bytes += left_info->bytes; kfree(left_info); + merged = true; } + return merged; +} + +int btrfs_add_free_space(struct btrfs_block_group_cache *block_group, + u64 offset, u64 bytes) +{ + struct btrfs_free_space *info; + int ret = 0; + + info = kzalloc(sizeof(struct btrfs_free_space), GFP_NOFS); + if (!info) + return -ENOMEM; + + info->offset = offset; + info->bytes = bytes; + + spin_lock(&block_group->tree_lock); + + if (try_merge_free_space(block_group, info)) + goto link; + + /* + * There was no extent directly to the left or right of this new + * extent then we know we''re going to have to allocate a new extent, so + * before we do that see if we need to drop this into a bitmap + */ + ret = insert_into_bitmap(block_group, info); + if (ret < 0) { + goto out; + } else if (ret) { + ret = 0; + goto out; + } +link: ret = link_free_space(block_group, info); if (ret) kfree(info); -- 1.6.3 -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Li Zefan
2011-Jan-27 08:44 UTC
[PATCH 06/12] btrfs: Check mergeable free space when removing a cluster
After returing extents from a cluster to the block group, some extents in the block group may be mergeable. Reviewed-by: Josef Bacik <josef@redhat.com> Signed-off-by: Li Zefan <lizf@cn.fujitsu.com> --- fs/btrfs/free-space-cache.c | 26 ++++++++++++++++++++------ 1 files changed, 20 insertions(+), 6 deletions(-) diff --git a/fs/btrfs/free-space-cache.c b/fs/btrfs/free-space-cache.c index cf67dc3..a5501ed 100644 --- a/fs/btrfs/free-space-cache.c +++ b/fs/btrfs/free-space-cache.c @@ -987,11 +987,18 @@ tree_search_offset(struct btrfs_block_group_cache *block_group, return entry; } -static void unlink_free_space(struct btrfs_block_group_cache *block_group, - struct btrfs_free_space *info) +static inline void +__unlink_free_space(struct btrfs_block_group_cache *block_group, + struct btrfs_free_space *info) { rb_erase(&info->offset_index, &block_group->free_space_offset); block_group->free_extents--; +} + +static void unlink_free_space(struct btrfs_block_group_cache *block_group, + struct btrfs_free_space *info) +{ + __unlink_free_space(block_group, info); block_group->free_space -= info->bytes; } @@ -1364,7 +1371,7 @@ out: } bool try_merge_free_space(struct btrfs_block_group_cache *block_group, - struct btrfs_free_space *info) + struct btrfs_free_space *info, bool update_stat) { struct btrfs_free_space *left_info; struct btrfs_free_space *right_info; @@ -1385,7 +1392,10 @@ bool try_merge_free_space(struct btrfs_block_group_cache *block_group, left_info = tree_search_offset(block_group, offset - 1, 0, 0); if (right_info && !right_info->bitmap) { - unlink_free_space(block_group, right_info); + if (update_stat) + unlink_free_space(block_group, right_info); + else + __unlink_free_space(block_group, right_info); info->bytes += right_info->bytes; kfree(right_info); merged = true; @@ -1393,7 +1403,10 @@ bool try_merge_free_space(struct btrfs_block_group_cache *block_group, if (left_info && !left_info->bitmap && left_info->offset + left_info->bytes == offset) { - unlink_free_space(block_group, left_info); + if (update_stat) + unlink_free_space(block_group, left_info); + else + __unlink_free_space(block_group, left_info); info->offset = left_info->offset; info->bytes += left_info->bytes; kfree(left_info); @@ -1418,7 +1431,7 @@ int btrfs_add_free_space(struct btrfs_block_group_cache *block_group, spin_lock(&block_group->tree_lock); - if (try_merge_free_space(block_group, info)) + if (try_merge_free_space(block_group, info, true)) goto link; /* @@ -1636,6 +1649,7 @@ __btrfs_return_cluster_to_free_space( node = rb_next(&entry->offset_index); rb_erase(&entry->offset_index, &cluster->root); BUG_ON(entry->bitmap); + try_merge_free_space(block_group, entry, false); tree_insert_offset(&block_group->free_space_offset, entry->offset, &entry->offset_index, 0); } -- 1.6.3 -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
fs_info, which is allocated in open_ctree(), should be freed in close_ctree(). Signed-off-by: Li Zefan <lizf@cn.fujitsu.com> --- fs/btrfs/disk-io.c | 2 ++ 1 files changed, 2 insertions(+), 0 deletions(-) diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c index a5d2249..089871e 100644 --- a/fs/btrfs/disk-io.c +++ b/fs/btrfs/disk-io.c @@ -2513,6 +2513,8 @@ int close_ctree(struct btrfs_root *root) kfree(fs_info->chunk_root); kfree(fs_info->dev_root); kfree(fs_info->csum_root); + kfree(fs_info); + return 0; } -- 1.6.3 -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Li Zefan
2011-Jan-27 08:44 UTC
[PATCH 08/12] Btrfs: Fix memory leak on finding existing super
From: Ian Kent <raven@themaw.net> We missed a memory deallocation in commit 450ba0ea. If an existing super block is found at mount and there is no error condition then the pre-allocated tree_root and fs_info are no not used and are not freeded. Signed-off-by: Ian Kent <raven@themaw.net> Signed-off-by: Li Zefan <lizf@cn.fujitsu.com> --- fs/btrfs/super.c | 2 ++ 1 files changed, 2 insertions(+), 0 deletions(-) diff --git a/fs/btrfs/super.c b/fs/btrfs/super.c index 61bd79a..f50253c 100644 --- a/fs/btrfs/super.c +++ b/fs/btrfs/super.c @@ -654,6 +654,8 @@ static int btrfs_get_sb(struct file_system_type *fs_type, int flags, } btrfs_close_devices(fs_devices); + kfree(fs_info); + kfree(tree_root); } else { char b[BDEVNAME_SIZE]; -- 1.6.3 -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Li Zefan
2011-Jan-27 08:45 UTC
[PATCH 09/12] Btrfs: Free correct pointer after using strsep
From: Tero Roponen <tero.roponen@gmail.com> We must save and free the original kstrdup()''ed pointer because strsep() modifies its first argument. Signed-off-by: Tero Roponen <tero.roponen@gmail.com> Signed-off-by: Li Zefan <lizf@cn.fujitsu.com> --- fs/btrfs/super.c | 5 +++-- 1 files changed, 3 insertions(+), 2 deletions(-) diff --git a/fs/btrfs/super.c b/fs/btrfs/super.c index f50253c..78ee681 100644 --- a/fs/btrfs/super.c +++ b/fs/btrfs/super.c @@ -277,7 +277,7 @@ static int btrfs_parse_early_options(const char *options, fmode_t flags, struct btrfs_fs_devices **fs_devices) { substring_t args[MAX_OPT_ARGS]; - char *opts, *p; + char *opts, *orig, *p; int error = 0; int intarg; @@ -291,6 +291,7 @@ static int btrfs_parse_early_options(const char *options, fmode_t flags, opts = kstrdup(options, GFP_KERNEL); if (!opts) return -ENOMEM; + orig = opts; while ((p = strsep(&opts, ",")) != NULL) { int token; @@ -326,7 +327,7 @@ static int btrfs_parse_early_options(const char *options, fmode_t flags, } out_free_opts: - kfree(opts); + kfree(orig); out: /* * If no subvolume name is specified we use the default one. Allocate -- 1.6.3 -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Li Zefan
2011-Jan-27 08:45 UTC
[PATCH 10/12] Btrfs: Don''t return acl info when mounting with noacl option
From: Miao Xie <miaox@cn.fujitsu.com> Steps to reproduce: # mkfs.btrfs /dev/sda2 # mount /dev/sda2 /mnt # touch /mnt/file0 # setfacl -m ''u:root:x,g::x,o::x'' /mnt/file0 # umount /mnt # mount /dev/sda2 -o noacl /mnt # getfacl /mnt/file0 ... user::rw- user:root:--x group::--x mask::--x other::--x The output should be: user::rw- group::--x other::--x Signed-off-by: Miao Xie <miaox@cn.fujitsu.com> Signed-off-by: Li Zefan <lizf@cn.fujitsu.com> --- fs/btrfs/acl.c | 6 ++++++ 1 files changed, 6 insertions(+), 0 deletions(-) diff --git a/fs/btrfs/acl.c b/fs/btrfs/acl.c index 2222d16..3c52fc8 100644 --- a/fs/btrfs/acl.c +++ b/fs/btrfs/acl.c @@ -37,6 +37,9 @@ static struct posix_acl *btrfs_get_acl(struct inode *inode, int type) char *value = NULL; struct posix_acl *acl; + if (!IS_POSIXACL(inode)) + return NULL; + acl = get_cached_acl(inode, type); if (acl != ACL_NOT_CACHED) return acl; @@ -82,6 +85,9 @@ static int btrfs_xattr_acl_get(struct dentry *dentry, const char *name, struct posix_acl *acl; int ret = 0; + if (!IS_POSIXACL(dentry->d_inode)) + return -EOPNOTSUPP; + acl = btrfs_get_acl(dentry->d_inode, type); if (IS_ERR(acl)) -- 1.6.3 -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Li Zefan
2011-Jan-27 08:45 UTC
[PATCH 11/12] Btrfs: Fix memory leak in writepage fixup work
From: Miao Xie <miaox@cn.fujitsu.com> fixup, which is allocated when starting page write to fix up the extent without ORDERED bit set, should be freed after this work is done. Signed-off-by: Miao Xie <miaox@cn.fujitsu.com> Signed-off-by: Li Zefan <lizf@cn.fujitsu.com> --- fs/btrfs/inode.c | 1 + 1 files changed, 1 insertions(+), 0 deletions(-) diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c index 5f91944..3a6edc4 100644 --- a/fs/btrfs/inode.c +++ b/fs/btrfs/inode.c @@ -1544,6 +1544,7 @@ out: out_page: unlock_page(page); page_cache_release(page); + kfree(fixup); } /* -- 1.6.3 -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Li Zefan
2011-Jan-27 08:46 UTC
[PATCH 12/12] Btrfs: Fix file clone when source offset is not 0
Suppose: - the source extent is: [0, 100] - the src offset is 10 - the clone length is 90 - the dest offset is 0 This statement: new_key.offset = key.offset + destoff - off will produce such an extent for the dest file: [ino, BTRFS_EXTENT_DATA_KEY, -10] , which is obviously wrong. Signed-off-by: Li Zefan <lizf@cn.fujitsu.com> --- fs/btrfs/ioctl.c | 5 ++++- 1 files changed, 4 insertions(+), 1 deletions(-) diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c index f87552a..1b61dab 100644 --- a/fs/btrfs/ioctl.c +++ b/fs/btrfs/ioctl.c @@ -1788,7 +1788,10 @@ static noinline long btrfs_ioctl_clone(struct file *file, unsigned long srcfd, memcpy(&new_key, &key, sizeof(new_key)); new_key.objectid = inode->i_ino; - new_key.offset = key.offset + destoff - off; + if (off <= key.offset) + new_key.offset = key.offset + destoff - off; + else + new_key.offset = destoff; trans = btrfs_start_transaction(root, 1); if (IS_ERR(trans)) { -- 1.6.3 -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Excerpts from Li Zefan''s message of 2011-01-27 03:42:14 -0500:> Hi Chris, > > Those patches (except the last two) have been sent to the mailing list > before, and they fix real bugs (but not critical bugs). Please consider > queue them for 2.6.38.Thanks, I''ve got these along with more from Josef and some others from the list. They have been running all weekend with good results, so I''ll send out the pull request on Monday. -chris -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Jan Schmidt
2012-Jan-26 13:52 UTC
Re: [PATCH 12/12] Btrfs: Fix file clone when source offset is not 0
I was looking at the clone range ioctl and have some remarks: On 27.01.2011 09:46, Li Zefan wrote:> diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c > index f87552a..1b61dab 100644 > --- a/fs/btrfs/ioctl.c > +++ b/fs/btrfs/ioctl.c > @@ -1788,7 +1788,10 @@ static noinline long btrfs_ioctl_clone(struct file *file, unsigned long srcfd, > > memcpy(&new_key, &key, sizeof(new_key)); > new_key.objectid = inode->i_ino; > - new_key.offset = key.offset + destoff - off; > + if (off <= key.offset) > + new_key.offset = key.offset + destoff - off; > + else > + new_key.offset = destoff;^^^^^^^ 1) This looks spurious to me. What if destoff isn''t aligned? That''s what the "key.offset - off" code above is for. Before the patch, the code didn''t work at all, I agree. But this fix can only work for aligned requests. 2) The error in new_key also has propagated to the extent item''s backref and wasn''t fixed there. I did a range clone and ended up with an extent item like that: item 30 key (1318842368 EXTENT_ITEM 131072) itemoff 1047 itemsize 169 extent refs 8 gen 11103 flags 1 [...] extent data backref root 257 objectid 272 offset 18446744073709494272 count 1 The last offset (equal to -14 * 4k) is obviously wrong. I didn''t figure out how the variables are computed, but it looks like there''s something wrong with the "datao" u64 to me. 3) Then, there''s this code comment: 2180 /* 2181 * TODO: 2186 * - allow ranges within the same file to be cloned (provided 2187 * they don''t overlap)? 2188 */ This should be safe to do. Even with the current interface, we only check for inode equality. If they differ, cloning is permitted. Make a full-file clone, and you''ll end up with two inodes referring to the same extent. Detection of overlapping areas seems to be missing, though and should be added. Until that, the inode check stands as a (very weak) protection against overlapping clone requests. -Jan -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
David Sterba
2012-Jan-26 16:17 UTC
Re: [PATCH 12/12] Btrfs: Fix file clone when source offset is not 0
On Thu, Jan 26, 2012 at 02:52:32PM +0100, Jan Schmidt wrote:> I was looking at the clone range ioctl and have some remarks: > > On 27.01.2011 09:46, Li Zefan wrote: > > diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c > > index f87552a..1b61dab 100644 > > --- a/fs/btrfs/ioctl.c > > +++ b/fs/btrfs/ioctl.c > > @@ -1788,7 +1788,10 @@ static noinline long btrfs_ioctl_clone(struct file *file, unsigned long srcfd, > > > > memcpy(&new_key, &key, sizeof(new_key)); > > new_key.objectid = inode->i_ino; > > - new_key.offset = key.offset + destoff - off; > > + if (off <= key.offset) > > + new_key.offset = key.offset + destoff - off; > > + else > > + new_key.offset = destoff; > ^^^^^^^ > 1) This looks spurious to me. What if destoff isn''t aligned? That''s what > the "key.offset - off" code above is for. Before the patch, the code > didn''t work at all, I agree. But this fix can only work for aligned > requests.Source range and destination offset are accepted iff are aligned: 2300 /* verify the end result is block aligned */ 2301 if (!IS_ALIGNED(off, bs) || !IS_ALIGNED(off + len, bs) || 2302 !IS_ALIGNED(destoff, bs)) 2303 goto out_unlock; david -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Li Zefan
2012-Jan-30 06:33 UTC
Re: [PATCH 12/12] Btrfs: Fix file clone when source offset is not 0
Jan Schmidt wrote:> I was looking at the clone range ioctl and have some remarks: > > On 27.01.2011 09:46, Li Zefan wrote: >> diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c >> index f87552a..1b61dab 100644 >> --- a/fs/btrfs/ioctl.c >> +++ b/fs/btrfs/ioctl.c >> @@ -1788,7 +1788,10 @@ static noinline long btrfs_ioctl_clone(struct file *file, unsigned long srcfd, >> >> memcpy(&new_key, &key, sizeof(new_key)); >> new_key.objectid = inode->i_ino; >> - new_key.offset = key.offset + destoff - off; >> + if (off <= key.offset) >> + new_key.offset = key.offset + destoff - off; >> + else >> + new_key.offset = destoff; > ^^^^^^^ > 1) This looks spurious to me. What if destoff isn''t aligned? That''s what > the "key.offset - off" code above is for. Before the patch, the code > didn''t work at all, I agree. But this fix can only work for aligned > requests. > > 2) The error in new_key also has propagated to the extent item''s backref > and wasn''t fixed there. I did a range clone and ended up with an extent > item like that: > item 30 key (1318842368 EXTENT_ITEM 131072) itemoff 1047 > itemsize 169 > extent refs 8 gen 11103 flags 1 > [...] > extent data backref root 257 objectid 272 offset > 18446744073709494272 count 1 > > The last offset (equal to -14 * 4k) is obviously wrong. I didn''t figure > out how the variables are computed, but it looks like there''s something > wrong with the "datao" u64 to me. >Unfortunately this is expected. The calculation is: extent_item.extent_data_ref.offset = file_pos - file_extent.extent_offset so you may get negative offset. The design idea was to reduce the number of extent backrefs in that a data backref can point to different file extents in the same file (in this case the "count" field > 1). We didn''t expect nagetive offset until range clone was implemented. -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Jan Schmidt
2012-Jan-30 10:03 UTC
Re: [PATCH 12/12] Btrfs: Fix file clone when source offset is not 0
On 30.01.2012 07:33, Li Zefan wrote:> Jan Schmidt wrote: >> I was looking at the clone range ioctl and have some remarks: >> >> On 27.01.2011 09:46, Li Zefan wrote: >>> diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c >>> index f87552a..1b61dab 100644 >>> --- a/fs/btrfs/ioctl.c >>> +++ b/fs/btrfs/ioctl.c >>> @@ -1788,7 +1788,10 @@ static noinline long btrfs_ioctl_clone(struct file *file, unsigned long srcfd, >>> >>> memcpy(&new_key, &key, sizeof(new_key)); >>> new_key.objectid = inode->i_ino; >>> - new_key.offset = key.offset + destoff - off; >>> + if (off <= key.offset) >>> + new_key.offset = key.offset + destoff - off; >>> + else >>> + new_key.offset = destoff; >> ^^^^^^^ >> 1) This looks spurious to me. What if destoff isn''t aligned? That''s what >> the "key.offset - off" code above is for. Before the patch, the code >> didn''t work at all, I agree. But this fix can only work for aligned >> requests. >> >> 2) The error in new_key also has propagated to the extent item''s backref >> and wasn''t fixed there. I did a range clone and ended up with an extent >> item like that: >> item 30 key (1318842368 EXTENT_ITEM 131072) itemoff 1047 >> itemsize 169 >> extent refs 8 gen 11103 flags 1 >> [...] >> extent data backref root 257 objectid 272 offset >> 18446744073709494272 count 1 >> >> The last offset (equal to -14 * 4k) is obviously wrong. I didn''t figure >> out how the variables are computed, but it looks like there''s something >> wrong with the "datao" u64 to me. >> > > Unfortunately this is expected. The calculation is: > > extent_item.extent_data_ref.offset = file_pos - file_extent.extent_offset > > so you may get negative offset.I see where the negative offset comes from. But what can this offset be used for?> The design idea was to reduce the number of extent backrefs in that > a data backref can point to different file extents in the same file > (in this case the "count" field > 1). We didn''t expect nagetive > offset until range clone was implemented.Reducing the number of backrefs is a good thing. In case of count > 1, it''s clear that the offset cannot reference all of the extent data items. We have different design choices: a) Use the above computation and leave the filesystem with an unusable offset value for extent backrefs. b) Use either one of the extent data item offsets this backref references. c) Always use a predefined constant (like 0 or -1) when count > 1. d) Disallow count > 1 for those refs and turn them into shared refs as soon as count gets > 1. I don''t like a :-) -Jan -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Li Zefan
2012-Feb-01 09:44 UTC
Re: [PATCH 12/12] Btrfs: Fix file clone when source offset is not 0
Jan Schmidt wrote:> On 30.01.2012 07:33, Li Zefan wrote: >> Jan Schmidt wrote: >>> I was looking at the clone range ioctl and have some remarks: >>> >>> On 27.01.2011 09:46, Li Zefan wrote: >>>> diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c >>>> index f87552a..1b61dab 100644 >>>> --- a/fs/btrfs/ioctl.c >>>> +++ b/fs/btrfs/ioctl.c >>>> @@ -1788,7 +1788,10 @@ static noinline long btrfs_ioctl_clone(struct file *file, unsigned long srcfd, >>>> >>>> memcpy(&new_key, &key, sizeof(new_key)); >>>> new_key.objectid = inode->i_ino; >>>> - new_key.offset = key.offset + destoff - off; >>>> + if (off <= key.offset) >>>> + new_key.offset = key.offset + destoff - off; >>>> + else >>>> + new_key.offset = destoff; >>> ^^^^^^^ >>> 1) This looks spurious to me. What if destoff isn''t aligned? That''s what >>> the "key.offset - off" code above is for. Before the patch, the code >>> didn''t work at all, I agree. But this fix can only work for aligned >>> requests. >>> >>> 2) The error in new_key also has propagated to the extent item''s backref >>> and wasn''t fixed there. I did a range clone and ended up with an extent >>> item like that: >>> item 30 key (1318842368 EXTENT_ITEM 131072) itemoff 1047 >>> itemsize 169 >>> extent refs 8 gen 11103 flags 1 >>> [...] >>> extent data backref root 257 objectid 272 offset >>> 18446744073709494272 count 1 >>> >>> The last offset (equal to -14 * 4k) is obviously wrong. I didn''t figure >>> out how the variables are computed, but it looks like there''s something >>> wrong with the "datao" u64 to me. >>> >> >> Unfortunately this is expected. The calculation is: >> >> extent_item.extent_data_ref.offset = file_pos - file_extent.extent_offset >> >> so you may get negative offset. > > I see where the negative offset comes from. But what can this offset be > used for? > >> The design idea was to reduce the number of extent backrefs in that >> a data backref can point to different file extents in the same file >> (in this case the "count" field > 1). We didn''t expect nagetive >> offset until range clone was implemented. > > Reducing the number of backrefs is a good thing. In case of count > 1, > it''s clear that the offset cannot reference all of the extent data > items. We have different design choices: > > a) Use the above computation and leave the filesystem with an unusable > offset value for extent backrefs. > > b) Use either one of the extent data item offsets this backref references. > > c) Always use a predefined constant (like 0 or -1) when count > 1. > > d) Disallow count > 1 for those refs and turn them into shared refs as > soon as count gets > 1. >I expressed the same doubt. See this thread: http://marc.info/?t=131425912800001&r=1&w=2 -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Yan, Zheng
2012-Feb-02 04:25 UTC
Re: [PATCH 12/12] Btrfs: Fix file clone when source offset is not 0
On Thu, Jan 27, 2011 at 4:46 PM, Li Zefan <lizf@cn.fujitsu.com> wrote:> Suppose: > - the source extent is: [0, 100] > - the src offset is 10 > - the clone length is 90 > - the dest offset is 0 > > This statement: > > new_key.offset = key.offset + destoff - off > > will produce such an extent for the dest file: > > [ino, BTRFS_EXTENT_DATA_KEY, -10] > > , which is obviously wrong. > > Signed-off-by: Li Zefan <lizf@cn.fujitsu.com> > --- > fs/btrfs/ioctl.c | 5 ++++- > 1 files changed, 4 insertions(+), 1 deletions(-) > > diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c > index f87552a..1b61dab 100644 > --- a/fs/btrfs/ioctl.c > +++ b/fs/btrfs/ioctl.c > @@ -1788,7 +1788,10 @@ static noinline long btrfs_ioctl_clone(struct file *file, unsigned long srcfd, > > memcpy(&new_key, &key, sizeof(new_key)); > new_key.objectid = inode->i_ino; > - new_key.offset = key.offset + destoff - off; > + if (off <= key.offset) > + new_key.offset = key.offset + destoff - off; > + else > + new_key.offset = destoff; > > trans = btrfs_start_transaction(root, 1); > if (IS_ERR(trans)) {This is a disk format change, will cause Oops when deleting or truncating the file. -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Yan, Zheng
2012-Feb-02 04:31 UTC
Re: [PATCH 12/12] Btrfs: Fix file clone when source offset is not 0
On Mon, Jan 30, 2012 at 6:03 PM, Jan Schmidt <list.btrfs@jan-o-sch.net> wrote:> On 30.01.2012 07:33, Li Zefan wrote: >> Jan Schmidt wrote: >>> I was looking at the clone range ioctl and have some remarks: >>> >>> On 27.01.2011 09:46, Li Zefan wrote: >>>> diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c >>>> index f87552a..1b61dab 100644 >>>> --- a/fs/btrfs/ioctl.c >>>> +++ b/fs/btrfs/ioctl.c >>>> @@ -1788,7 +1788,10 @@ static noinline long btrfs_ioctl_clone(struct file *file, unsigned long srcfd, >>>> >>>> memcpy(&new_key, &key, sizeof(new_key)); >>>> new_key.objectid = inode->i_ino; >>>> - new_key.offset = key.offset + destoff - off; >>>> + if (off <= key.offset) >>>> + new_key.offset = key.offset + destoff - off; >>>> + else >>>> + new_key.offset = destoff; >>> ^^^^^^^ >>> 1) This looks spurious to me. What if destoff isn''t aligned? That''s what >>> the "key.offset - off" code above is for. Before the patch, the code >>> didn''t work at all, I agree. But this fix can only work for aligned >>> requests. >>> >>> 2) The error in new_key also has propagated to the extent item''s backref >>> and wasn''t fixed there. I did a range clone and ended up with an extent >>> item like that: >>> item 30 key (1318842368 EXTENT_ITEM 131072) itemoff 1047 >>> itemsize 169 >>> extent refs 8 gen 11103 flags 1 >>> [...] >>> extent data backref root 257 objectid 272 offset >>> 18446744073709494272 count 1 >>> >>> The last offset (equal to -14 * 4k) is obviously wrong. I didn''t figure >>> out how the variables are computed, but it looks like there''s something >>> wrong with the "datao" u64 to me. >>> >> >> Unfortunately this is expected. The calculation is: >> >> extent_item.extent_data_ref.offset = file_pos - file_extent.extent_offset >> >> so you may get negative offset. > > I see where the negative offset comes from. But what can this offset be > used for? >The offset in backref isn''t equal to the offset of the file extent, it''s just a hint for searching file extents, Negative offset isn''t that bad if you consider it as value that is close to zero.>> The design idea was to reduce the number of extent backrefs in that >> a data backref can point to different file extents in the same file >> (in this case the "count" field > 1). We didn''t expect nagetive >> offset until range clone was implemented. > > Reducing the number of backrefs is a good thing. In case of count > 1, > it''s clear that the offset cannot reference all of the extent data > items. We have different design choices: > > a) Use the above computation and leave the filesystem with an unusable > offset value for extent backrefs. > > b) Use either one of the extent data item offsets this backref references. > > c) Always use a predefined constant (like 0 or -1) when count > 1. > > d) Disallow count > 1 for those refs and turn them into shared refs as > soon as count gets > 1. > > I don''t like a :-) > > -Jan > -- > To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in > the body of a message to majordomo@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html-- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Li Zefan
2012-Feb-02 05:31 UTC
Re: [PATCH 12/12] Btrfs: Fix file clone when source offset is not 0
Yan, Zheng wrote:> On Thu, Jan 27, 2011 at 4:46 PM, Li Zefan <lizf@cn.fujitsu.com> wrote: >> Suppose: >> - the source extent is: [0, 100] >> - the src offset is 10 >> - the clone length is 90 >> - the dest offset is 0 >> >> This statement: >> >> new_key.offset = key.offset + destoff - off >> >> will produce such an extent for the dest file: >> >> [ino, BTRFS_EXTENT_DATA_KEY, -10] >> >> , which is obviously wrong. >> >> Signed-off-by: Li Zefan <lizf@cn.fujitsu.com> >> --- >> fs/btrfs/ioctl.c | 5 ++++- >> 1 files changed, 4 insertions(+), 1 deletions(-) >> >> diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c >> index f87552a..1b61dab 100644 >> --- a/fs/btrfs/ioctl.c >> +++ b/fs/btrfs/ioctl.c >> @@ -1788,7 +1788,10 @@ static noinline long btrfs_ioctl_clone(struct file *file, unsigned long srcfd, >> >> memcpy(&new_key, &key, sizeof(new_key)); >> new_key.objectid = inode->i_ino; >> - new_key.offset = key.offset + destoff - off; >> + if (off <= key.offset) >> + new_key.offset = key.offset + destoff - off; >> + else >> + new_key.offset = destoff; >> >> trans = btrfs_start_transaction(root, 1); >> if (IS_ERR(trans)) { > > This is a disk format change, will cause Oops when deleting or > truncating the file. >This is a bug fix, never mean to make a change on disk format. I''ll appreciate if you point out what exactly is wrong in this fix. -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Yan, Zheng
2012-Feb-02 06:44 UTC
Re: [PATCH 12/12] Btrfs: Fix file clone when source offset is not 0
On Thu, Feb 2, 2012 at 1:31 PM, Li Zefan <lizf@cn.fujitsu.com> wrote:> Yan, Zheng wrote: >> On Thu, Jan 27, 2011 at 4:46 PM, Li Zefan <lizf@cn.fujitsu.com> wrote: >>> Suppose: >>> - the source extent is: [0, 100] >>> - the src offset is 10 >>> - the clone length is 90 >>> - the dest offset is 0 >>> >>> This statement: >>> >>> new_key.offset = key.offset + destoff - off >>> >>> will produce such an extent for the dest file: >>> >>> [ino, BTRFS_EXTENT_DATA_KEY, -10] >>> >>> , which is obviously wrong. >>> >>> Signed-off-by: Li Zefan <lizf@cn.fujitsu.com> >>> --- >>> fs/btrfs/ioctl.c | 5 ++++- >>> 1 files changed, 4 insertions(+), 1 deletions(-) >>> >>> diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c >>> index f87552a..1b61dab 100644 >>> --- a/fs/btrfs/ioctl.c >>> +++ b/fs/btrfs/ioctl.c >>> @@ -1788,7 +1788,10 @@ static noinline long btrfs_ioctl_clone(struct file *file, unsigned long srcfd, >>> >>> memcpy(&new_key, &key, sizeof(new_key)); >>> new_key.objectid = inode->i_ino; >>> - new_key.offset = key.offset + destoff - off; >>> + if (off <= key.offset) >>> + new_key.offset = key.offset + destoff - off; >>> + else >>> + new_key.offset = destoff; >>> >>> trans = btrfs_start_transaction(root, 1); >>> if (IS_ERR(trans)) { >> >> This is a disk format change, will cause Oops when deleting or >> truncating the file. >> > > This is a bug fix, never mean to make a change on disk format. I''ll > appreciate if you point out what exactly is wrong in this fix.sorry, I was misleaded by the backref discuss, your fix is correct -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Jan Schmidt
2012-Feb-02 21:00 UTC
Re: [PATCH 12/12] Btrfs: Fix file clone when source offset is not 0
On 02.02.2012 05:31, Yan, Zheng wrote:> On Mon, Jan 30, 2012 at 6:03 PM, Jan Schmidt <list.btrfs@jan-o-sch.net> wrote: >> On 30.01.2012 07:33, Li Zefan wrote: >>> Jan Schmidt wrote: >>>> [...] >>> Unfortunately this is expected. The calculation is: >>> >>> extent_item.extent_data_ref.offset = file_pos - file_extent.extent_offset >>> >>> so you may get negative offset. >> >> I see where the negative offset comes from. But what can this offset be >> used for? >> > > The offset in backref isn''t equal to the offset of the file extent, > it''s just a hint for searching > file extentsI see. Thanks for clarifying. And, Li: Thanks for bringing the old thread back to my mind. Nevertheless, that still seems quite a strange choice to me. Is there a deep sense behind it that is deeper than I''m currently looking? -Jan -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html