Hello,

This patch series adds an initial implementation of restriper (it's a clever name for a relocation framework that allows selective profile changing and selective balancing, with some goodies like pausing/resuming and reporting progress to the user).

Profile changing is global (per-FS) so far; per-subvolume profiles require some discussion and can be implemented in the future. This is an RFC, so some features/problems are not yet implemented/resolved. The current TODO list is as follows:

1) do pause/cancel via trans commit instead of waiting for the current chunk to be fully relocated
2) fix problems with left-over chunks (the ones we were relocating to when the crash occurred); this is going to become a bigger problem when item 1 is done
3) fix remount problems (get rid of deadlocks that occur on remount while relocating, stop restriper on remounts)
4) issue a discard on removed chunks - 1 GiB+ discards can be a big deal

There are also a couple of problems related to profile changing and resuming that I'm working on right now. But the basic infrastructure is all there and is ready for review.

This patchset deprecates Hugo's "Balance management" patch series. Originally this was supposed to be just a profile changing thing merged with those patches, but the merge turned out to be a complete rewrite. The filters part was rewritten to be per-chunk-type and the management part was thrown away because we now store an item to disk, there is a difference between pausing and cancelling, locking is different, etc. I'm happy to integrate any ideas that got dropped as a result of this.

Thanks to Arne who did an early review and Chris for overall guidance. Any comments/suggestions are appreciated.

The series is on top of 3.1-rc3, available at:

git://github.com/idryomov/btrfs-unstable.git restriper-rfc

Thanks,

		Ilya

Ilya Dryomov (21):
  Btrfs: get rid of *_alloc_profile fields
  Btrfs: introduce masks for chunk type and profile
  Btrfs: add BTRFS_AVAIL_ALLOC_BIT_SINGLE bit
  Btrfs: make avail_*_alloc_bits fields dynamic
  Btrfs: add basic restriper infrastructure
  Btrfs: implement online profile changing
  Btrfs: add basic infrastructure for selective balancing
  Btrfs: soft profile changing mode (aka soft convert)
  Btrfs: profiles filter
  Btrfs: usage filter
  Btrfs: devid filter
  Btrfs: devid subset filter
  Btrfs: virtual address space subset filter
  Btrfs: save restripe parameters to disk
  Btrfs: recover restripe on mount
  Btrfs: allow for cancelling restriper
  Btrfs: allow for pausing restriper
  Btrfs: allow for resuming restriper after it was paused
  Btrfs: add skip_restripe mount option
  Btrfs: get rid of btrfs_balance() function
  Btrfs: add restripe progress reporting

 fs/btrfs/ctree.h       |  156 ++++++++++-
 fs/btrfs/disk-io.c     |   15 +-
 fs/btrfs/extent-tree.c |  118 ++++++--
 fs/btrfs/ioctl.c       |  214 ++++++++++++-
 fs/btrfs/ioctl.h       |   44 +++
 fs/btrfs/super.c       |    8 +-
 fs/btrfs/volumes.c     |  780 +++++++++++++++++++++++++++++++++++++++++++++---
 fs/btrfs/volumes.h     |   57 ++++-
 8 files changed, 1304 insertions(+), 88 deletions(-)

-- 
1.7.5.4

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html

{data,metadata,system}_alloc_profile fields have been unused for a long
time now.  Get rid of them.

Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
---
 fs/btrfs/ctree.h       |    3 ---
 fs/btrfs/disk-io.c     |    3 ---
 fs/btrfs/extent-tree.c |   10 ++++------
 fs/btrfs/volumes.c     |    6 ++----
 4 files changed, 6 insertions(+), 16 deletions(-)

diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
index 03912c5..dcf2fd7 100644
--- a/fs/btrfs/ctree.h
+++ b/fs/btrfs/ctree.h
@@ -1095,9 +1095,6 @@ struct btrfs_fs_info {
 	u64 avail_data_alloc_bits;
 	u64 avail_metadata_alloc_bits;
 	u64 avail_system_alloc_bits;
-	u64 data_alloc_profile;
-	u64 metadata_alloc_profile;
-	u64 system_alloc_profile;
 
 	unsigned data_chunk_allocations;
 	unsigned metadata_ratio;
diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index 07b3ac6..46d0412 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -1997,9 +1997,6 @@ struct btrfs_root *open_ctree(struct super_block *sb,
 
 	fs_info->generation = generation;
 	fs_info->last_trans_committed = generation;
-	fs_info->data_alloc_profile = (u64)-1;
-	fs_info->metadata_alloc_profile = (u64)-1;
-	fs_info->system_alloc_profile = fs_info->metadata_alloc_profile;
 
 	ret = btrfs_init_space_info(fs_info);
 	if (ret) {
diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
index f5be06a..4e1b763 100644
--- a/fs/btrfs/extent-tree.c
+++ b/fs/btrfs/extent-tree.c
@@ -2998,14 +2998,12 @@ u64 btrfs_reduce_alloc_profile(struct btrfs_root *root, u64 flags)
 static u64 get_alloc_profile(struct btrfs_root *root, u64 flags)
 {
 	if (flags & BTRFS_BLOCK_GROUP_DATA)
-		flags |= root->fs_info->avail_data_alloc_bits &
-			 root->fs_info->data_alloc_profile;
+		flags |= root->fs_info->avail_data_alloc_bits;
 	else if (flags & BTRFS_BLOCK_GROUP_SYSTEM)
-		flags |= root->fs_info->avail_system_alloc_bits &
-			 root->fs_info->system_alloc_profile;
+		flags |= root->fs_info->avail_system_alloc_bits;
 	else if (flags & BTRFS_BLOCK_GROUP_METADATA)
-		flags |= root->fs_info->avail_metadata_alloc_bits &
-			 root->fs_info->metadata_alloc_profile;
+		flags |= root->fs_info->avail_metadata_alloc_bits;
+
 	return btrfs_reduce_alloc_profile(root, flags);
 }
 
diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index f2a4cc7..ed96275 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -2711,8 +2711,7 @@ static noinline int init_first_rw_device(struct btrfs_trans_handle *trans,
 		return ret;
 
 	alloc_profile = BTRFS_BLOCK_GROUP_METADATA |
-			(fs_info->metadata_alloc_profile &
-			 fs_info->avail_metadata_alloc_bits);
+				fs_info->avail_metadata_alloc_bits;
 	alloc_profile = btrfs_reduce_alloc_profile(root, alloc_profile);
 
 	ret = __btrfs_alloc_chunk(trans, extent_root, &map, &chunk_size,
@@ -2722,8 +2721,7 @@ static noinline int init_first_rw_device(struct btrfs_trans_handle *trans,
 	sys_chunk_offset = chunk_offset + chunk_size;
 
 	alloc_profile = BTRFS_BLOCK_GROUP_SYSTEM |
-			(fs_info->system_alloc_profile &
-			 fs_info->avail_system_alloc_bits);
+				fs_info->avail_system_alloc_bits;
 	alloc_profile = btrfs_reduce_alloc_profile(root, alloc_profile);
 
 	ret = __btrfs_alloc_chunk(trans, extent_root, &sys_map,
-- 
1.7.5.4

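A short sketch of why the removal above should be behavior-preserving: the dropped fields only ever held (u64)-1 (see the open_ctree() hunk), and AND-ing with all ones is the identity. The helper names below are hypothetical and exist only for this illustration; they are not part of the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Old form: avail bits ANDed with a field that only ever held (u64)-1. */
static uint64_t old_profile_bits(uint64_t avail_bits)
{
	const uint64_t alloc_profile = (uint64_t)-1; /* as set in open_ctree() */

	return avail_bits & alloc_profile;
}

/* New form: the always-all-ones mask is simply dropped. */
static uint64_t new_profile_bits(uint64_t avail_bits)
{
	return avail_bits;
}

/* Hypothetical self-check: both forms agree on a few sample bit patterns. */
static void check_forms_agree(void)
{
	const uint64_t samples[] = {
		0, 1ULL << 3, (1ULL << 4) | (1ULL << 6), UINT64_MAX
	};
	size_t i;

	for (i = 0; i < sizeof(samples) / sizeof(samples[0]); i++)
		assert(old_profile_bits(samples[i]) ==
		       new_profile_bits(samples[i]));
}
```

Calling check_forms_agree() from a test harness passes silently, which is all the patch relies on.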
Ilya Dryomov
2011-Aug-23 20:01 UTC
[PATCH 02/21] Btrfs: introduce masks for chunk type and profile
Chunk's type and profile are encoded in u64 flags field.  Introduce
masks to easily access them.

Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
---
 fs/btrfs/ctree.h       |    8 ++++++++
 fs/btrfs/extent-tree.c |   12 +++---------
 fs/btrfs/volumes.c     |   11 ++---------
 3 files changed, 13 insertions(+), 18 deletions(-)

diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
index dcf2fd7..b882c95 100644
--- a/fs/btrfs/ctree.h
+++ b/fs/btrfs/ctree.h
@@ -717,6 +717,14 @@ struct btrfs_csum_item {
 #define BTRFS_BLOCK_GROUP_RAID10   (1 << 6)
 #define BTRFS_NR_RAID_TYPES	   5
 
+#define BTRFS_BLOCK_GROUP_TYPE_MASK	(BTRFS_BLOCK_GROUP_DATA |    \
+					 BTRFS_BLOCK_GROUP_SYSTEM |  \
+					 BTRFS_BLOCK_GROUP_METADATA)
+
+#define BTRFS_BLOCK_GROUP_PROFILE_MASK	(BTRFS_BLOCK_GROUP_RAID0 |   \
+					 BTRFS_BLOCK_GROUP_RAID1 |   \
+					 BTRFS_BLOCK_GROUP_DUP |     \
+					 BTRFS_BLOCK_GROUP_RAID10)
 struct btrfs_block_group_item {
 	__le64 used;
 	__le64 chunk_objectid;
diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
index 4e1b763..de4c639 100644
--- a/fs/btrfs/extent-tree.c
+++ b/fs/btrfs/extent-tree.c
@@ -559,8 +559,7 @@ static struct btrfs_space_info *__find_space_info(struct btrfs_fs_info *info,
 	struct list_head *head = &info->space_info;
 	struct btrfs_space_info *found;
 
-	flags &= BTRFS_BLOCK_GROUP_DATA | BTRFS_BLOCK_GROUP_SYSTEM |
-		 BTRFS_BLOCK_GROUP_METADATA;
+	flags &= BTRFS_BLOCK_GROUP_TYPE_MASK;
 
 	rcu_read_lock();
 	list_for_each_entry_rcu(found, head, list) {
@@ -2924,9 +2923,7 @@ static int update_space_info(struct btrfs_fs_info *info, u64 flags,
 		INIT_LIST_HEAD(&found->block_groups[i]);
 	init_rwsem(&found->groups_sem);
 	spin_lock_init(&found->lock);
-	found->flags = flags & (BTRFS_BLOCK_GROUP_DATA |
-				BTRFS_BLOCK_GROUP_SYSTEM |
-				BTRFS_BLOCK_GROUP_METADATA);
+	found->flags = flags & BTRFS_BLOCK_GROUP_TYPE_MASK;
 	found->total_bytes = total_bytes;
 	found->disk_total = total_bytes * factor;
 	found->bytes_used = bytes_used;
@@ -2947,10 +2944,7 @@ static int update_space_info(struct btrfs_fs_info *info, u64 flags,
 
 static void set_avail_alloc_bits(struct btrfs_fs_info *fs_info, u64 flags)
 {
-	u64 extra_flags = flags & (BTRFS_BLOCK_GROUP_RAID0 |
-				   BTRFS_BLOCK_GROUP_RAID1 |
-				   BTRFS_BLOCK_GROUP_RAID10 |
-				   BTRFS_BLOCK_GROUP_DUP);
+	u64 extra_flags = flags & BTRFS_BLOCK_GROUP_PROFILE_MASK;
 	if (extra_flags) {
 		if (flags & BTRFS_BLOCK_GROUP_DATA)
 			fs_info->avail_data_alloc_bits |= extra_flags;
diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index ed96275..af4bf56 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -2908,12 +2908,8 @@ again:
 		}
 	}
 	if (rw & REQ_DISCARD) {
-		if (map->type & (BTRFS_BLOCK_GROUP_RAID0 |
-				 BTRFS_BLOCK_GROUP_RAID1 |
-				 BTRFS_BLOCK_GROUP_DUP |
-				 BTRFS_BLOCK_GROUP_RAID10)) {
+		if (map->type & BTRFS_BLOCK_GROUP_PROFILE_MASK)
 			stripes_required = map->num_stripes;
-		}
 	}
 	if (multi_ret && (rw & (REQ_WRITE | REQ_DISCARD)) &&
 	    stripes_allocated < stripes_required) {
@@ -2937,10 +2933,7 @@ again:
 
 	if (rw & REQ_DISCARD)
 		*length = min_t(u64, em->len - offset, *length);
-	else if (map->type & (BTRFS_BLOCK_GROUP_RAID0 |
-			      BTRFS_BLOCK_GROUP_RAID1 |
-			      BTRFS_BLOCK_GROUP_RAID10 |
-			      BTRFS_BLOCK_GROUP_DUP)) {
+	else if (map->type & BTRFS_BLOCK_GROUP_PROFILE_MASK) {
 		/* we limit the length of each bio to what fits in a stripe */
 		*length = min_t(u64, em->len - offset,
 				map->stripe_len - stripe_offset);
-- 
1.7.5.4

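To illustrate what the two masks from the patch above buy, here is a minimal userspace sketch that splits a chunk's flags word into its type and its profile. Only BTRFS_BLOCK_GROUP_RAID10 (bit 6) is visible in the hunk; the other bit positions are an assumption based on the ctree.h of that era, and chunk_type()/chunk_profile() are hypothetical helpers, not functions from the series.

```c
#include <assert.h>
#include <stdint.h>

/* Bit values assumed from ctree.h; only RAID10 appears in the hunk above. */
#define BTRFS_BLOCK_GROUP_DATA		(1ULL << 0)
#define BTRFS_BLOCK_GROUP_SYSTEM	(1ULL << 1)
#define BTRFS_BLOCK_GROUP_METADATA	(1ULL << 2)
#define BTRFS_BLOCK_GROUP_RAID0		(1ULL << 3)
#define BTRFS_BLOCK_GROUP_RAID1		(1ULL << 4)
#define BTRFS_BLOCK_GROUP_DUP		(1ULL << 5)
#define BTRFS_BLOCK_GROUP_RAID10	(1ULL << 6)

#define BTRFS_BLOCK_GROUP_TYPE_MASK	(BTRFS_BLOCK_GROUP_DATA |    \
					 BTRFS_BLOCK_GROUP_SYSTEM |  \
					 BTRFS_BLOCK_GROUP_METADATA)

#define BTRFS_BLOCK_GROUP_PROFILE_MASK	(BTRFS_BLOCK_GROUP_RAID0 |   \
					 BTRFS_BLOCK_GROUP_RAID1 |   \
					 BTRFS_BLOCK_GROUP_DUP |     \
					 BTRFS_BLOCK_GROUP_RAID10)

/* Hypothetical helper: keep only the data/system/metadata bits. */
static uint64_t chunk_type(uint64_t flags)
{
	return flags & BTRFS_BLOCK_GROUP_TYPE_MASK;
}

/* Hypothetical helper: keep only the RAID/DUP bits (0 means SINGLE). */
static uint64_t chunk_profile(uint64_t flags)
{
	return flags & BTRFS_BLOCK_GROUP_PROFILE_MASK;
}

/* Usage: a RAID1 metadata chunk decomposes cleanly into its two parts. */
static void masks_example(void)
{
	uint64_t flags = BTRFS_BLOCK_GROUP_METADATA | BTRFS_BLOCK_GROUP_RAID1;

	assert((BTRFS_BLOCK_GROUP_TYPE_MASK &
		BTRFS_BLOCK_GROUP_PROFILE_MASK) == 0);
	assert(chunk_type(flags) == BTRFS_BLOCK_GROUP_METADATA);
	assert(chunk_profile(flags) == BTRFS_BLOCK_GROUP_RAID1);
	assert(chunk_profile(BTRFS_BLOCK_GROUP_DATA) == 0);
}
```

The last assertion shows the case that motivates the next patch: a SINGLE chunk has no profile bit at all.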
Ilya Dryomov
2011-Aug-23 20:01 UTC
[PATCH 03/21] Btrfs: add BTRFS_AVAIL_ALLOC_BIT_SINGLE bit
Right now on-disk BTRFS_BLOCK_GROUP_* profile bits are used for
avail_{data,metadata,system}_alloc_bits fields, which are there to tell
us about available allocation profiles in the fs.  When a chunk is
created, its profile is OR'ed with the respective avail_alloc_bits field.
Since SINGLE is denoted by 0 in the on-disk format, currently there is
no way to tell when such chunks become available.  Restriper needs that
information, so add a separate bit for the SINGLE profile.

This bit is going to be in-memory only, it should never be written out
to disk, so it's not a disk format change.  However, to avoid remappings
in the future, reserve the corresponding on-disk bit.

Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
---
 fs/btrfs/ctree.h       |   12 ++++++++++++
 fs/btrfs/extent-tree.c |   22 ++++++++++++++--------
 2 files changed, 26 insertions(+), 8 deletions(-)

diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
index b882c95..5b00eb8 100644
--- a/fs/btrfs/ctree.h
+++ b/fs/btrfs/ctree.h
@@ -725,6 +725,17 @@ struct btrfs_csum_item {
 					 BTRFS_BLOCK_GROUP_RAID1 |   \
 					 BTRFS_BLOCK_GROUP_DUP |     \
 					 BTRFS_BLOCK_GROUP_RAID10)
+/*
+ * We need a bit for restriper to be able to tell when chunks of type
+ * SINGLE are available.  It is used in avail_*_alloc_bits.
+ */
+#define BTRFS_AVAIL_ALLOC_BIT_SINGLE	(1 << 7)
+
+/*
+ * To avoid troubles or remappings, reserve on-disk bit.
+ */
+#define BTRFS_BLOCK_GROUP_RESERVED	(1 << 7)
+
 struct btrfs_block_group_item {
 	__le64 used;
 	__le64 chunk_objectid;
@@ -1100,6 +1111,7 @@ struct btrfs_fs_info {
 	spinlock_t ref_cache_lock;
 	u64 total_ref_cache_size;
 
+	/* SINGLE has it's own bit for these three */
 	u64 avail_data_alloc_bits;
 	u64 avail_metadata_alloc_bits;
 	u64 avail_system_alloc_bits;
diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
index de4c639..ed35eb5 100644
--- a/fs/btrfs/extent-tree.c
+++ b/fs/btrfs/extent-tree.c
@@ -2945,14 +2945,17 @@ static int update_space_info(struct btrfs_fs_info *info, u64 flags,
 static void set_avail_alloc_bits(struct btrfs_fs_info *fs_info, u64 flags)
 {
 	u64 extra_flags = flags & BTRFS_BLOCK_GROUP_PROFILE_MASK;
-	if (extra_flags) {
-		if (flags & BTRFS_BLOCK_GROUP_DATA)
-			fs_info->avail_data_alloc_bits |= extra_flags;
-		if (flags & BTRFS_BLOCK_GROUP_METADATA)
-			fs_info->avail_metadata_alloc_bits |= extra_flags;
-		if (flags & BTRFS_BLOCK_GROUP_SYSTEM)
-			fs_info->avail_system_alloc_bits |= extra_flags;
-	}
+
+	/* on-disk -> in-memory */
+	if (extra_flags == 0)
+		extra_flags = BTRFS_AVAIL_ALLOC_BIT_SINGLE;
+
+	if (flags & BTRFS_BLOCK_GROUP_DATA)
+		fs_info->avail_data_alloc_bits |= extra_flags;
+	if (flags & BTRFS_BLOCK_GROUP_METADATA)
+		fs_info->avail_metadata_alloc_bits |= extra_flags;
+	if (flags & BTRFS_BLOCK_GROUP_SYSTEM)
+		fs_info->avail_system_alloc_bits |= extra_flags;
 }
 
 u64 btrfs_reduce_alloc_profile(struct btrfs_root *root, u64 flags)
@@ -2986,6 +2989,9 @@ u64 btrfs_reduce_alloc_profile(struct btrfs_root *root, u64 flags)
 	     (flags & BTRFS_BLOCK_GROUP_RAID10) |
 	     (flags & BTRFS_BLOCK_GROUP_DUP)))
 		flags &= ~BTRFS_BLOCK_GROUP_RAID0;
+
+	/* in-memory -> on-disk */
+	flags &= ~BTRFS_AVAIL_ALLOC_BIT_SINGLE;
 	return flags;
 }
 
-- 
1.7.5.4

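The two conversions in the patch above ("on-disk -> in-memory" and "in-memory -> on-disk") can be sketched as a pair of small pure functions. This is a userspace illustration only: disk_to_avail_profile() and avail_to_disk_flags() are hypothetical names, and the bit values other than bit 6 and bit 7 are assumed from ctree.h.

```c
#include <assert.h>
#include <stdint.h>

/* Bit values assumed from ctree.h; bit 7 is the one added by the patch. */
#define BTRFS_BLOCK_GROUP_DATA		(1ULL << 0)
#define BTRFS_BLOCK_GROUP_RAID0		(1ULL << 3)
#define BTRFS_BLOCK_GROUP_RAID1		(1ULL << 4)
#define BTRFS_BLOCK_GROUP_DUP		(1ULL << 5)
#define BTRFS_BLOCK_GROUP_RAID10	(1ULL << 6)
#define BTRFS_BLOCK_GROUP_PROFILE_MASK	(BTRFS_BLOCK_GROUP_RAID0 |   \
					 BTRFS_BLOCK_GROUP_RAID1 |   \
					 BTRFS_BLOCK_GROUP_DUP |     \
					 BTRFS_BLOCK_GROUP_RAID10)
#define BTRFS_AVAIL_ALLOC_BIT_SINGLE	(1ULL << 7)

/* on-disk -> in-memory: a profile of 0 (SINGLE) gets its own bit. */
static uint64_t disk_to_avail_profile(uint64_t chunk_flags)
{
	uint64_t profile = chunk_flags & BTRFS_BLOCK_GROUP_PROFILE_MASK;

	return profile ? profile : BTRFS_AVAIL_ALLOC_BIT_SINGLE;
}

/* in-memory -> on-disk: the SINGLE bit must never reach the disk. */
static uint64_t avail_to_disk_flags(uint64_t flags)
{
	return flags & ~BTRFS_AVAIL_ALLOC_BIT_SINGLE;
}

/* Usage: SINGLE becomes visible in memory and vanishes again on disk. */
static void single_bit_example(void)
{
	assert(disk_to_avail_profile(BTRFS_BLOCK_GROUP_DATA) ==
	       BTRFS_AVAIL_ALLOC_BIT_SINGLE);
	assert(disk_to_avail_profile(BTRFS_BLOCK_GROUP_DATA |
				     BTRFS_BLOCK_GROUP_RAID1) ==
	       BTRFS_BLOCK_GROUP_RAID1);
	assert(avail_to_disk_flags(BTRFS_BLOCK_GROUP_DATA |
				   BTRFS_AVAIL_ALLOC_BIT_SINGLE) ==
	       BTRFS_BLOCK_GROUP_DATA);
}
```

Keeping the conversion at the two boundaries is what lets the rest of the code treat SINGLE like any other profile bit.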
Ilya Dryomov
2011-Aug-23 20:01 UTC
[PATCH 04/21] Btrfs: make avail_*_alloc_bits fields dynamic
Currently when new chunks are created, the respective avail_alloc_bits
field is updated to reflect profiles of all chunks present in the system.
However, when chunks are removed, the corresponding profile bits are
never cleared.

This patch clears the corresponding bit of the avail_alloc_bits field
when the last chunk of that type goes away.  Restriper needs this to
properly operate when "downgrading".

Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
---
 fs/btrfs/extent-tree.c |   20 ++++++++++++++++++++
 1 files changed, 20 insertions(+), 0 deletions(-)

diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
index ed35eb5..a04f99b 100644
--- a/fs/btrfs/extent-tree.c
+++ b/fs/btrfs/extent-tree.c
@@ -7197,6 +7197,22 @@ int btrfs_make_block_group(struct btrfs_trans_handle *trans,
 	return 0;
 }
 
+static void clear_avail_alloc_bits(struct btrfs_fs_info *fs_info, u64 flags)
+{
+	u64 extra_flags = flags & BTRFS_BLOCK_GROUP_PROFILE_MASK;
+
+	/* on-disk -> in-memory */
+	if (extra_flags == 0)
+		extra_flags = BTRFS_AVAIL_ALLOC_BIT_SINGLE;
+
+	if (flags & BTRFS_BLOCK_GROUP_DATA)
+		fs_info->avail_data_alloc_bits &= ~extra_flags;
+	if (flags & BTRFS_BLOCK_GROUP_METADATA)
+		fs_info->avail_metadata_alloc_bits &= ~extra_flags;
+	if (flags & BTRFS_BLOCK_GROUP_SYSTEM)
+		fs_info->avail_system_alloc_bits &= ~extra_flags;
+}
+
 int btrfs_remove_block_group(struct btrfs_trans_handle *trans,
 			     struct btrfs_root *root, u64 group_start)
 {
@@ -7207,6 +7223,7 @@ int btrfs_remove_block_group(struct btrfs_trans_handle *trans,
 	struct btrfs_key key;
 	struct inode *inode;
 	int ret;
+	int index;
 	int factor;
 
 	root = root->fs_info->extent_root;
@@ -7222,6 +7239,7 @@ int btrfs_remove_block_group(struct btrfs_trans_handle *trans,
 	free_excluded_extents(root, block_group);
 
 	memcpy(&key, &block_group->key, sizeof(key));
+	index = get_block_group_index(block_group);
 	if (block_group->flags & (BTRFS_BLOCK_GROUP_DUP |
 				  BTRFS_BLOCK_GROUP_RAID1 |
 				  BTRFS_BLOCK_GROUP_RAID10))
@@ -7296,6 +7314,8 @@ int btrfs_remove_block_group(struct btrfs_trans_handle *trans,
 	 * are still on the list after taking the semaphore
 	 */
 	list_del_init(&block_group->list);
+	if (list_empty(&block_group->space_info->block_groups[index]))
+		clear_avail_alloc_bits(root->fs_info, block_group->flags);
 	up_write(&block_group->space_info->groups_sem);
 
 	if (block_group->cached == BTRFS_CACHE_STARTED)
-- 
1.7.5.4

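The "clear the bit when the last chunk of that profile goes away" rule from the patch above can be modelled in userspace with a per-profile counter standing in for the kernel's per-index block group lists. Everything below is a hypothetical sketch: struct avail_tracker, chunk_added(), chunk_removed() and the index order are made up for illustration and do not come from the series.

```c
#include <assert.h>
#include <stdint.h>

/* In-memory profile bits; values assumed from ctree.h plus the SINGLE bit. */
#define BTRFS_BLOCK_GROUP_RAID0		(1ULL << 3)
#define BTRFS_BLOCK_GROUP_RAID1		(1ULL << 4)
#define BTRFS_BLOCK_GROUP_DUP		(1ULL << 5)
#define BTRFS_BLOCK_GROUP_RAID10	(1ULL << 6)
#define BTRFS_AVAIL_ALLOC_BIT_SINGLE	(1ULL << 7)

/* Hypothetical index order; stands in for get_block_group_index(). */
enum { IDX_RAID10, IDX_RAID1, IDX_DUP, IDX_RAID0, IDX_SINGLE, NR_PROFILES };

/* Counters replace the kernel's per-index block group lists. */
struct avail_tracker {
	uint64_t avail_bits;
	unsigned int nr_chunks[NR_PROFILES];
};

static int profile_index(uint64_t profile_bit)
{
	if (profile_bit & BTRFS_BLOCK_GROUP_RAID10)
		return IDX_RAID10;
	if (profile_bit & BTRFS_BLOCK_GROUP_RAID1)
		return IDX_RAID1;
	if (profile_bit & BTRFS_BLOCK_GROUP_DUP)
		return IDX_DUP;
	if (profile_bit & BTRFS_BLOCK_GROUP_RAID0)
		return IDX_RAID0;
	return IDX_SINGLE;
}

/* Chunk creation: the profile bit becomes (or stays) available. */
static void chunk_added(struct avail_tracker *t, uint64_t profile_bit)
{
	t->nr_chunks[profile_index(profile_bit)]++;
	t->avail_bits |= profile_bit;
}

/* Chunk removal: clear the bit only when no chunk of that profile is left. */
static void chunk_removed(struct avail_tracker *t, uint64_t profile_bit)
{
	int idx = profile_index(profile_bit);

	assert(t->nr_chunks[idx] > 0);
	if (--t->nr_chunks[idx] == 0)
		t->avail_bits &= ~profile_bit;
}
```

With two RAID1 chunks tracked, removing one leaves the RAID1 bit set; removing the second clears it, which is the behavior restriper needs when "downgrading".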
Ilya Dryomov
2011-Aug-23 20:01 UTC
[PATCH 05/21] Btrfs: add basic restriper infrastructure
Add basic restriper infrastructure: ioctl to start restripe, all
restripe ioctl data structures, add data structure for tracking
restriper's state to fs_info.  Duplicate balancing code for restriper,
btrfs_balance() will be removed when restriper is implemented.

Explicitly disallow any volume operations when restriper is running.
(previously this restriction relied on volume_mutex being held during
the execution of any volume operation)

Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
---
 fs/btrfs/ctree.h   |    5 +
 fs/btrfs/disk-io.c |    4 +
 fs/btrfs/ioctl.c   |  107 ++++++++++++++++++++++----
 fs/btrfs/ioctl.h   |   37 +++++++++
 fs/btrfs/volumes.c |  219 ++++++++++++++++++++++++++++++++++++++++++++++++++--
 fs/btrfs/volumes.h |   18 ++++
 6 files changed, 369 insertions(+), 21 deletions(-)

diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
index 5b00eb8..65d7562 100644
--- a/fs/btrfs/ctree.h
+++ b/fs/btrfs/ctree.h
@@ -895,6 +895,7 @@ struct btrfs_block_group_cache {
 };
 
 struct reloc_control;
+struct restripe_control;
 struct btrfs_device;
 struct btrfs_fs_devices;
 struct btrfs_delayed_root;
@@ -1116,6 +1117,10 @@ struct btrfs_fs_info {
 	u64 avail_metadata_alloc_bits;
 	u64 avail_system_alloc_bits;
 
+	spinlock_t restripe_lock;
+	struct mutex restripe_mutex;
+	struct restripe_control *restripe_ctl;
+
 	unsigned data_chunk_allocations;
 	unsigned metadata_ratio;
 
diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index 46d0412..fa2301b 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -1700,6 +1700,10 @@ struct btrfs_root *open_ctree(struct super_block *sb,
 	init_rwsem(&fs_info->scrub_super_lock);
 	fs_info->scrub_workers_refcnt = 0;
 
+	spin_lock_init(&fs_info->restripe_lock);
+	mutex_init(&fs_info->restripe_mutex);
+	fs_info->restripe_ctl = NULL;
+
 	sb->s_blocksize = 4096;
 	sb->s_blocksize_bits = blksize_bits(4096);
 	sb->s_bdi = &fs_info->bdi;
diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c
index 970977a..9dfc686 100644
--- a/fs/btrfs/ioctl.c
+++ b/fs/btrfs/ioctl.c
@@ -1165,13 +1165,21 @@ static noinline int btrfs_ioctl_resize(struct btrfs_root *root,
 	if (!capable(CAP_SYS_ADMIN))
 		return -EPERM;
 
+	mutex_lock(&root->fs_info->volume_mutex);
+	if (root->fs_info->restripe_ctl) {
+		printk(KERN_INFO "btrfs: restripe in progress\n");
+		ret = -EINVAL;
+		goto out;
+	}
+
 	vol_args = memdup_user(arg, sizeof(*vol_args));
-	if (IS_ERR(vol_args))
-		return PTR_ERR(vol_args);
+	if (IS_ERR(vol_args)) {
+		ret = PTR_ERR(vol_args);
+		goto out;
+	}
 
 	vol_args->name[BTRFS_PATH_NAME_MAX] = '\0';
 
-	mutex_lock(&root->fs_info->volume_mutex);
 	sizestr = vol_args->name;
 	devstr = strchr(sizestr, ':');
 	if (devstr) {
@@ -1188,7 +1196,7 @@ static noinline int btrfs_ioctl_resize(struct btrfs_root *root,
 		printk(KERN_INFO "resizer unable to find device %llu\n",
 		       (unsigned long long)devid);
 		ret = -EINVAL;
-		goto out_unlock;
+		goto out_free;
 	}
 	if (!strcmp(sizestr, "max"))
 		new_size = device->bdev->bd_inode->i_size;
@@ -1203,7 +1211,7 @@ static noinline int btrfs_ioctl_resize(struct btrfs_root *root,
 		new_size = memparse(sizestr, NULL);
 		if (new_size == 0) {
 			ret = -EINVAL;
-			goto out_unlock;
+			goto out_free;
 		}
 	}
 
@@ -1212,7 +1220,7 @@ static noinline int btrfs_ioctl_resize(struct btrfs_root *root,
 	if (mod < 0) {
 		if (new_size > old_size) {
 			ret = -EINVAL;
-			goto out_unlock;
+			goto out_free;
 		}
 		new_size = old_size - new_size;
 	} else if (mod > 0) {
@@ -1221,11 +1229,11 @@ static noinline int btrfs_ioctl_resize(struct btrfs_root *root,
 
 	if (new_size < 256 * 1024 * 1024) {
 		ret = -EINVAL;
-		goto out_unlock;
+		goto out_free;
 	}
 	if (new_size > device->bdev->bd_inode->i_size) {
 		ret = -EFBIG;
-		goto out_unlock;
+		goto out_free;
 	}
 
 	do_div(new_size, root->sectorsize);
@@ -1238,7 +1246,7 @@ static noinline int btrfs_ioctl_resize(struct btrfs_root *root,
 		trans = btrfs_start_transaction(root, 0);
 		if (IS_ERR(trans)) {
 			ret = PTR_ERR(trans);
-			goto out_unlock;
+			goto out_free;
 		}
 		ret = btrfs_grow_device(trans, device, new_size);
 		btrfs_commit_transaction(trans, root);
@@ -1246,9 +1254,10 @@ static noinline int btrfs_ioctl_resize(struct btrfs_root *root,
 		ret = btrfs_shrink_device(device, new_size);
 	}
 
-out_unlock:
-	mutex_unlock(&root->fs_info->volume_mutex);
+out_free:
 	kfree(vol_args);
+out:
+	mutex_unlock(&root->fs_info->volume_mutex);
 	return ret;
 }
 
@@ -2014,14 +2023,25 @@ static long btrfs_ioctl_add_dev(struct btrfs_root *root, void __user *arg)
 	if (!capable(CAP_SYS_ADMIN))
 		return -EPERM;
 
+	mutex_lock(&root->fs_info->volume_mutex);
+	if (root->fs_info->restripe_ctl) {
+		printk(KERN_INFO "btrfs: restripe in progress\n");
+		ret = -EINVAL;
+		goto out;
+	}
+
 	vol_args = memdup_user(arg, sizeof(*vol_args));
-	if (IS_ERR(vol_args))
-		return PTR_ERR(vol_args);
+	if (IS_ERR(vol_args)) {
+		ret = PTR_ERR(vol_args);
+		goto out;
+	}
 
 	vol_args->name[BTRFS_PATH_NAME_MAX] = '\0';
 	ret = btrfs_init_new_device(root, vol_args->name);
 
 	kfree(vol_args);
+out:
+	mutex_unlock(&root->fs_info->volume_mutex);
 	return ret;
 }
 
@@ -2036,14 +2056,25 @@ static long btrfs_ioctl_rm_dev(struct btrfs_root *root, void __user *arg)
 	if (root->fs_info->sb->s_flags & MS_RDONLY)
 		return -EROFS;
 
+	mutex_lock(&root->fs_info->volume_mutex);
+	if (root->fs_info->restripe_ctl) {
+		printk(KERN_INFO "btrfs: restripe in progress\n");
+		ret = -EINVAL;
+		goto out;
+	}
+
 	vol_args = memdup_user(arg, sizeof(*vol_args));
-	if (IS_ERR(vol_args))
-		return PTR_ERR(vol_args);
+	if (IS_ERR(vol_args)) {
+		ret = PTR_ERR(vol_args);
+		goto out;
+	}
 
 	vol_args->name[BTRFS_PATH_NAME_MAX] = '\0';
 	ret = btrfs_rm_device(root, vol_args->name);
 
 	kfree(vol_args);
+out:
+	mutex_unlock(&root->fs_info->volume_mutex);
 	return ret;
 }
 
@@ -2833,6 +2864,50 @@ static long btrfs_ioctl_scrub_progress(struct btrfs_root *root,
 	return ret;
 }
 
+static long btrfs_ioctl_restripe(struct btrfs_root *root, void __user *arg)
+{
+	struct btrfs_ioctl_restripe_args *rargs;
+	struct btrfs_fs_info *fs_info = root->fs_info;
+	struct restripe_control *rctl;
+	int ret;
+
+	if (!capable(CAP_SYS_ADMIN))
+		return -EPERM;
+
+	if (fs_info->sb->s_flags & MS_RDONLY)
+		return -EROFS;
+
+	mutex_lock(&fs_info->restripe_mutex);
+
+	rargs = memdup_user(arg, sizeof(*rargs));
+	if (IS_ERR(rargs)) {
+		ret = PTR_ERR(rargs);
+		goto out;
+	}
+
+	rctl = kzalloc(sizeof(*rctl), GFP_NOFS);
+	if (!rctl) {
+		kfree(rargs);
+		ret = -ENOMEM;
+		goto out;
+	}
+
+	rctl->fs_info = fs_info;
+	rctl->flags = rargs->flags;
+
+	memcpy(&rctl->data, &rargs->data, sizeof(rctl->data));
+	memcpy(&rctl->meta, &rargs->meta, sizeof(rctl->meta));
+	memcpy(&rctl->sys, &rargs->sys, sizeof(rctl->sys));
+
+	ret = btrfs_restripe(rctl);
+
+	/* rctl freed in unset_restripe_control */
+	kfree(rargs);
+out:
+	mutex_unlock(&fs_info->restripe_mutex);
+	return ret;
+}
+
 long btrfs_ioctl(struct file *file, unsigned int
 		cmd, unsigned long arg)
 {
@@ -2905,6 +2980,8 @@ long btrfs_ioctl(struct file *file, unsigned int
 		return btrfs_ioctl_scrub_cancel(root, argp);
 	case BTRFS_IOC_SCRUB_PROGRESS:
 		return btrfs_ioctl_scrub_progress(root, argp);
+	case BTRFS_IOC_RESTRIPE:
+		return btrfs_ioctl_restripe(root, argp);
 	}
 
 	return -ENOTTY;
diff --git a/fs/btrfs/ioctl.h b/fs/btrfs/ioctl.h
index ad1ea78..798f1d4 100644
--- a/fs/btrfs/ioctl.h
+++ b/fs/btrfs/ioctl.h
@@ -109,6 +109,41 @@ struct btrfs_ioctl_fs_info_args {
 	__u64 reserved[124];			/* pad to 1k */
 };
 
+struct btrfs_restripe_args {
+	__u64 profiles;
+	__u64 usage;
+	__u64 devid;
+	__u64 pstart;
+	__u64 pend;
+	__u64 vstart;
+	__u64 vend;
+
+	__u64 target;
+
+	__u64 flags;
+
+	__u64 unused[8];
+} __attribute__ ((__packed__));
+
+struct btrfs_restripe_progress {
+	__u64 expected;
+	__u64 considered;
+	__u64 completed;
+};
+
+struct btrfs_ioctl_restripe_args {
+	__u64 flags;
+	__u64 state;
+
+	struct btrfs_restripe_args data;
+	struct btrfs_restripe_args sys;
+	struct btrfs_restripe_args meta;
+
+	struct btrfs_restripe_progress stat;
+
+	__u64 unused[72];			/* pad to 1k */
+};
+
 #define BTRFS_INO_LOOKUP_PATH_MAX 4080
 struct btrfs_ioctl_ino_lookup_args {
 	__u64 treeid;
@@ -248,4 +283,6 @@ struct btrfs_ioctl_space_args {
 				 struct btrfs_ioctl_dev_info_args)
 #define BTRFS_IOC_FS_INFO _IOR(BTRFS_IOCTL_MAGIC, 31, \
 			       struct btrfs_ioctl_fs_info_args)
+#define BTRFS_IOC_RESTRIPE _IOW(BTRFS_IOCTL_MAGIC, 32, \
+				struct btrfs_ioctl_restripe_args)
 #endif
diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index af4bf56..0e4a276 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -1262,7 +1262,6 @@ int btrfs_rm_device(struct btrfs_root *root, char *device_path)
 	bool clear_super = false;
 
 	mutex_lock(&uuid_mutex);
-	mutex_lock(&root->fs_info->volume_mutex);
 
 	all_avail = root->fs_info->avail_data_alloc_bits |
 		root->fs_info->avail_system_alloc_bits |
@@ -1427,7 +1426,6 @@ error_close:
 	if (bdev)
 		blkdev_put(bdev, FMODE_READ | FMODE_EXCL);
 out:
-	mutex_unlock(&root->fs_info->volume_mutex);
 	mutex_unlock(&uuid_mutex);
 	return ret;
 error_undo:
@@ -1604,7 +1602,6 @@ int btrfs_init_new_device(struct btrfs_root *root, char *device_path)
 	}
 
 	filemap_write_and_wait(bdev->bd_inode->i_mapping);
-	mutex_lock(&root->fs_info->volume_mutex);
 
 	devices = &root->fs_info->fs_devices->devices;
 	/*
@@ -1728,8 +1725,7 @@ int btrfs_init_new_device(struct btrfs_root *root, char *device_path)
 		ret = btrfs_relocate_sys_chunks(root);
 		BUG_ON(ret);
 	}
-out:
-	mutex_unlock(&root->fs_info->volume_mutex);
+
 	return ret;
 error:
 	blkdev_put(bdev, FMODE_EXCL);
@@ -1737,7 +1733,7 @@ error:
 		mutex_unlock(&uuid_mutex);
 		up_write(&sb->s_umount);
 	}
-	goto out;
+	return ret;
 }
 
 static noinline int btrfs_update_device(struct btrfs_trans_handle *trans,
@@ -2155,6 +2151,217 @@ error:
 }
 
 /*
+ * Should be called with both restripe and volume mutexes held to
+ * serialize other volume operations (add_dev/rm_dev/resize) wrt
+ * restriper.  Same goes for unset_restripe_control().
+ */
+static void set_restripe_control(struct restripe_control *rctl)
+{
+	struct btrfs_fs_info *fs_info = rctl->fs_info;
+
+	spin_lock(&fs_info->restripe_lock);
+	fs_info->restripe_ctl = rctl;
+	spin_unlock(&fs_info->restripe_lock);
+}
+
+static void unset_restripe_control(struct btrfs_fs_info *fs_info)
+{
+	struct restripe_control *rctl = fs_info->restripe_ctl;
+
+	spin_lock(&fs_info->restripe_lock);
+	fs_info->restripe_ctl = NULL;
+	spin_unlock(&fs_info->restripe_lock);
+
+	kfree(rctl);
+}
+
+static int __btrfs_restripe(struct btrfs_root *dev_root)
+{
+	struct list_head *devices;
+	struct btrfs_device *device;
+	u64 old_size;
+	u64 size_to_free;
+	struct btrfs_root *chunk_root = dev_root->fs_info->chunk_root;
+	struct btrfs_path *path;
+	struct btrfs_key key;
+	struct btrfs_key found_key;
+	struct btrfs_trans_handle *trans;
+	int ret;
+	int enospc_errors = 0;
+
+	/* step one make some room on all the devices */
+	devices = &dev_root->fs_info->fs_devices->devices;
+	list_for_each_entry(device, devices, dev_list) {
+		old_size = device->total_bytes;
+		size_to_free = div_factor(old_size, 1);
+		size_to_free = min(size_to_free, (u64)1 * 1024 * 1024);
+		if (!device->writeable ||
+		    device->total_bytes - device->bytes_used > size_to_free)
+			continue;
+
+		ret = btrfs_shrink_device(device, old_size - size_to_free);
+		if (ret == -ENOSPC)
+			break;
+		BUG_ON(ret);
+
+		trans = btrfs_start_transaction(dev_root, 0);
+		BUG_ON(IS_ERR(trans));
+
+		ret = btrfs_grow_device(trans, device, old_size);
+		BUG_ON(ret);
+
+		btrfs_end_transaction(trans, dev_root);
+	}
+
+	/* step two, relocate all the chunks */
+	path = btrfs_alloc_path();
+	if (!path) {
+		ret = -ENOMEM;
+		goto error;
+	}
+
+	key.objectid = BTRFS_FIRST_CHUNK_TREE_OBJECTID;
+	key.offset = (u64)-1;
+	key.type = BTRFS_CHUNK_ITEM_KEY;
+
+	while (1) {
+		ret = btrfs_search_slot(NULL, chunk_root, &key, path, 0, 0);
+		if (ret < 0)
+			goto error;
+
+		/*
+		 * this shouldn't happen, it means the last relocate
+		 * failed
+		 */
+		if (ret == 0)
+			BUG_ON(1); /* DIS - break ? */
+
+		ret = btrfs_previous_item(chunk_root, path, 0,
+					  BTRFS_CHUNK_ITEM_KEY);
+		if (ret)
+			BUG_ON(1); /* DIS - break ? */
+
+		btrfs_item_key_to_cpu(path->nodes[0], &found_key,
+				      path->slots[0]);
+		if (found_key.objectid != key.objectid)
+			break;
+
+		/* chunk zero is special */
+		if (found_key.offset == 0)
+			break;
+
+		btrfs_release_path(path);
+		ret = btrfs_relocate_chunk(chunk_root,
+					   chunk_root->root_key.objectid,
+					   found_key.objectid,
+					   found_key.offset);
+		if (ret && ret != -ENOSPC)
+			goto error;
+		if (ret == -ENOSPC)
+			enospc_errors++;
+		key.offset = found_key.offset - 1;
+	}
+
+error:
+	btrfs_free_path(path);
+	if (enospc_errors) {
+		printk(KERN_INFO "btrfs: restripe finished with %d enospc "
+		       "error(s)\n", enospc_errors);
+		ret = -ENOSPC;
+	}
+
+	return ret;
+}
+
+/*
+ * Should be called with restripe_mutex held
+ */
+int btrfs_restripe(struct restripe_control *rctl)
+{
+	struct btrfs_fs_info *fs_info = rctl->fs_info;
+	u64 allowed;
+	int ret;
+
+	mutex_lock(&fs_info->volume_mutex);
+
+	/*
+	 * Profile changing sanity checks
+	 */
+	allowed = BTRFS_AVAIL_ALLOC_BIT_SINGLE;
+	if (fs_info->fs_devices->num_devices == 1)
+		allowed |= BTRFS_BLOCK_GROUP_DUP;
+	else if (fs_info->fs_devices->num_devices < 4)
+		allowed |= (BTRFS_BLOCK_GROUP_RAID0 | BTRFS_BLOCK_GROUP_RAID1);
+	else
+		allowed |= (BTRFS_BLOCK_GROUP_RAID0 | BTRFS_BLOCK_GROUP_RAID1 |
+				BTRFS_BLOCK_GROUP_RAID10);
+
+	if (rctl->data.target & ~allowed) {
+		printk(KERN_ERR "btrfs: unable to start restripe with target "
+		       "data profile %llu\n",
+		       (unsigned long long)rctl->data.target);
+		ret = -EINVAL;
+		goto out;
+	}
+	if (rctl->sys.target & ~allowed) {
+		printk(KERN_ERR "btrfs: unable to start restripe with target "
+		       "system profile %llu\n",
+		       (unsigned long long)rctl->sys.target);
+		ret = -EINVAL;
+		goto out;
+	}
+	if (rctl->meta.target & ~allowed) {
+		printk(KERN_ERR "btrfs: unable to start restripe with target "
+		       "metadata profile %llu\n",
+		       (unsigned long long)rctl->meta.target);
+		ret = -EINVAL;
+		goto out;
+	}
+
+	if (rctl->data.target & BTRFS_BLOCK_GROUP_DUP) {
+		printk(KERN_ERR "btrfs: dup for data is not allowed\n");
+		ret = -EINVAL;
+		goto out;
+	}
+
+	/* allow to reduce meta or sys integrity only if force set */
+	allowed = BTRFS_BLOCK_GROUP_DUP | BTRFS_BLOCK_GROUP_RAID1 |
+			BTRFS_BLOCK_GROUP_RAID10;
+	if (((rctl->sys.flags & BTRFS_RESTRIPE_ARGS_CONVERT) &&
+	     (fs_info->avail_system_alloc_bits & allowed) &&
+	     !(rctl->sys.target & allowed)) ||
+	    ((rctl->meta.flags & BTRFS_RESTRIPE_ARGS_CONVERT) &&
+	     (fs_info->avail_metadata_alloc_bits & allowed) &&
+	     !(rctl->meta.target & allowed))) {
+		if (rctl->flags & BTRFS_RESTRIPE_FORCE) {
+			printk(KERN_INFO "btrfs: force reducing metadata "
+			       "integrity\n");
+		} else {
+			printk(KERN_ERR "btrfs: can't reduce metadata "
+			       "integrity\n");
+			ret = -EINVAL;
+			goto out;
+		}
+	}
+
+	set_restripe_control(rctl);
+	mutex_unlock(&fs_info->volume_mutex);
+
+	ret = __btrfs_restripe(fs_info->dev_root);
+
+	mutex_lock(&fs_info->volume_mutex);
+	unset_restripe_control(fs_info);
+	mutex_unlock(&fs_info->volume_mutex);
+
+	return ret;
+
+out:
+	mutex_unlock(&fs_info->volume_mutex);
+	kfree(rctl);
+	return ret;
+}
+
+/*
  * shrinking a device means finding all of the device extents past
  * the new size, and then following the back refs to the chunks.
  * The chunk relocation code actually frees the device extent
diff --git a/fs/btrfs/volumes.h b/fs/btrfs/volumes.h
index 6d866db..8804c5c 100644
--- a/fs/btrfs/volumes.h
+++ b/fs/btrfs/volumes.h
@@ -168,6 +168,23 @@ struct map_lookup {
 #define map_lookup_size(n) (sizeof(struct map_lookup) + \
 			    (sizeof(struct btrfs_bio_stripe) * (n)))
 
+#define BTRFS_RESTRIPE_FORCE		(1ULL << 3)
+
+/*
+ * Profile changing flags
+ */
+#define BTRFS_RESTRIPE_ARGS_CONVERT	(1ULL << 8)
+
+struct btrfs_restripe_args;
+struct restripe_control {
+	struct btrfs_fs_info *fs_info;
+	u64 flags;
+
+	struct btrfs_restripe_args data;
+	struct btrfs_restripe_args sys;
+	struct btrfs_restripe_args meta;
+};
+
 int btrfs_account_dev_extents_size(struct btrfs_device *device, u64 start,
 				   u64 end, u64 *length);
 
@@ -211,6 +228,7 @@ struct btrfs_device *btrfs_find_device(struct btrfs_root *root, u64 devid,
 int btrfs_shrink_device(struct btrfs_device *device, u64 new_size);
 int btrfs_init_new_device(struct btrfs_root *root, char *path);
 int btrfs_balance(struct btrfs_root *dev_root);
+int btrfs_restripe(struct restripe_control *rctl);
 int btrfs_chunk_readonly(struct btrfs_root *root, u64 chunk_offset);
 int find_free_dev_extent(struct btrfs_trans_handle *trans,
 			 struct btrfs_device *device, u64 num_bytes,
-- 
1.7.5.4

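The "profile changing sanity checks" at the top of btrfs_restripe() in the patch above boil down to a device-count table. The sketch below restates that table as a pure userspace function so it can be read and tested in isolation; allowed_targets() and target_is_allowed() are hypothetical names, and the RAID0/RAID1/DUP bit values are assumed from ctree.h.

```c
#include <assert.h>
#include <stdint.h>

/* Bit values assumed from ctree.h; SINGLE is the in-memory bit from 03/21. */
#define BTRFS_BLOCK_GROUP_RAID0		(1ULL << 3)
#define BTRFS_BLOCK_GROUP_RAID1		(1ULL << 4)
#define BTRFS_BLOCK_GROUP_DUP		(1ULL << 5)
#define BTRFS_BLOCK_GROUP_RAID10	(1ULL << 6)
#define BTRFS_AVAIL_ALLOC_BIT_SINGLE	(1ULL << 7)

/* Mirrors the if/else ladder in btrfs_restripe(): which targets may be used. */
static uint64_t allowed_targets(unsigned int num_devices)
{
	uint64_t allowed = BTRFS_AVAIL_ALLOC_BIT_SINGLE;

	if (num_devices == 1)
		allowed |= BTRFS_BLOCK_GROUP_DUP;
	else if (num_devices < 4)
		allowed |= BTRFS_BLOCK_GROUP_RAID0 | BTRFS_BLOCK_GROUP_RAID1;
	else
		allowed |= BTRFS_BLOCK_GROUP_RAID0 | BTRFS_BLOCK_GROUP_RAID1 |
			   BTRFS_BLOCK_GROUP_RAID10;

	return allowed;
}

/* A target is rejected (-EINVAL in the patch) if it has any disallowed bit. */
static int target_is_allowed(uint64_t target, unsigned int num_devices)
{
	return (target & ~allowed_targets(num_devices)) == 0;
}

/* Usage: RAID10 needs at least four devices, DUP only fits a single one. */
static void sanity_check_example(void)
{
	assert(!target_is_allowed(BTRFS_BLOCK_GROUP_RAID10, 3));
	assert(target_is_allowed(BTRFS_BLOCK_GROUP_RAID10, 4));
	assert(target_is_allowed(BTRFS_BLOCK_GROUP_DUP, 1));
	assert(!target_is_allowed(BTRFS_BLOCK_GROUP_DUP, 2));
}
```

Note that this covers only the device-count check; the patch additionally refuses DUP for data and requires BTRFS_RESTRIPE_FORCE before reducing metadata or system redundancy.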
Profile changing is done by initializing the target field in the
respective btrfs_restripe_args structs and launching a balance.  In this
mode the reducing code picks restriper's target profile if it's
available instead of doing a blind reduce.  If the target profile is not
yet available, it falls back to plain reducing.

Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
---
 fs/btrfs/extent-tree.c |   54 +++++++++++++++++++++++++++++++++++++++++++++++-
 1 files changed, 53 insertions(+), 1 deletions(-)

diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
index a04f99b..05e55d1 100644
--- a/fs/btrfs/extent-tree.c
+++ b/fs/btrfs/extent-tree.c
@@ -2968,6 +2968,34 @@ u64 btrfs_reduce_alloc_profile(struct btrfs_root *root, u64 flags)
 	u64 num_devices = root->fs_info->fs_devices->rw_devices +
 		root->fs_info->fs_devices->missing_devices;
 
+	/* pick restriper's target profile if it's available */
+	spin_lock(&root->fs_info->restripe_lock);
+	if (root->fs_info->restripe_ctl) {
+		struct restripe_control *rctl = root->fs_info->restripe_ctl;
+		u64 t = 0;
+
+		if ((flags & BTRFS_BLOCK_GROUP_DATA) &&
+		    (rctl->data.flags & BTRFS_RESTRIPE_ARGS_CONVERT) &&
+		    (flags & rctl->data.target)) {
+			t = BTRFS_BLOCK_GROUP_DATA | rctl->data.target;
+		} else if ((flags & BTRFS_BLOCK_GROUP_SYSTEM) &&
+			   (rctl->sys.flags & BTRFS_RESTRIPE_ARGS_CONVERT) &&
+			   (flags & rctl->sys.target)) {
+			t = BTRFS_BLOCK_GROUP_SYSTEM | rctl->sys.target;
+		} else if ((flags & BTRFS_BLOCK_GROUP_METADATA) &&
+			   (rctl->meta.flags & BTRFS_RESTRIPE_ARGS_CONVERT) &&
+			   (flags & rctl->meta.target)) {
+			t = BTRFS_BLOCK_GROUP_METADATA | rctl->meta.target;
+		}
+
+		if (t) {
+			spin_unlock(&root->fs_info->restripe_lock);
+			t &= ~BTRFS_AVAIL_ALLOC_BIT_SINGLE;
+			return t;
+		}
+	}
+	spin_unlock(&root->fs_info->restripe_lock);
+
 	if (num_devices == 1)
 		flags &= ~(BTRFS_BLOCK_GROUP_RAID1 | BTRFS_BLOCK_GROUP_RAID0);
 	if (num_devices < 4)
@@ -2987,8 +3015,9 @@ u64 btrfs_reduce_alloc_profile(struct btrfs_root *root, u64 flags)
 	if ((flags & BTRFS_BLOCK_GROUP_RAID0) &&
 	    ((flags & BTRFS_BLOCK_GROUP_RAID1) |
 	     (flags & BTRFS_BLOCK_GROUP_RAID10) |
-	     (flags & BTRFS_BLOCK_GROUP_DUP)))
+	     (flags & BTRFS_BLOCK_GROUP_DUP))) {
 		flags &= ~BTRFS_BLOCK_GROUP_RAID0;
+	}
 
 	/* in-memory -> on-disk */
 	flags &= ~BTRFS_AVAIL_ALLOC_BIT_SINGLE;
@@ -6519,6 +6548,29 @@ static u64 update_block_group_flags(struct btrfs_root *root, u64 flags)
 	u64 stripped = BTRFS_BLOCK_GROUP_RAID0 |
 		BTRFS_BLOCK_GROUP_RAID1 | BTRFS_BLOCK_GROUP_RAID10;
 
+	if (root->fs_info->restripe_ctl) {
+		struct restripe_control *rctl = root->fs_info->restripe_ctl;
+		u64 t = 0;
+
+		/* pick restriper's target profile and return */
+		if (flags & BTRFS_BLOCK_GROUP_DATA &&
+		    rctl->data.flags & BTRFS_RESTRIPE_ARGS_CONVERT) {
+			t = BTRFS_BLOCK_GROUP_DATA | rctl->data.target;
+		} else if (flags & BTRFS_BLOCK_GROUP_SYSTEM &&
+			   rctl->sys.flags & BTRFS_RESTRIPE_ARGS_CONVERT) {
+			t = BTRFS_BLOCK_GROUP_SYSTEM | rctl->sys.target;
+		} else if (flags & BTRFS_BLOCK_GROUP_METADATA &&
+			   rctl->meta.flags & BTRFS_RESTRIPE_ARGS_CONVERT) {
+			t = BTRFS_BLOCK_GROUP_METADATA | rctl->meta.target;
+		}
+
+		if (t) {
+			/* in-memory -> on-disk */
+			t &= ~BTRFS_AVAIL_ALLOC_BIT_SINGLE;
+			return t;
+		}
+	}
+
 	/*
 	 * we add in the count of missing devices because we want
 	 * to make sure that any RAID levels on a degraded FS
-- 
1.7.5.4
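The selection rule added to btrfs_reduce_alloc_profile() can be sketched in userspace as below. This is only an illustration under stated assumptions, not kernel code: the BG_*, AVAIL_SINGLE and ARGS_CONVERT constants copy the bit values used in this series, while struct args and pick_target() are hypothetical names introduced for the example. A return value of 0 stands for "target not available, fall back to plain reducing".

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative copies of the bit values; the short names are hypothetical. */
#define BG_DATA       (1ULL << 0)
#define BG_RAID0      (1ULL << 3)
#define BG_RAID1      (1ULL << 4)
#define AVAIL_SINGLE  (1ULL << 7)	/* in-memory only "single" bit */
#define ARGS_CONVERT  (1ULL << 8)

struct args {
	uint64_t flags;		/* ARGS_* */
	uint64_t target;	/* profile to convert to, in-memory format */
};

/*
 * Return type | target (in on-disk format) if a convert is in progress
 * for this chunk type and the target is among the candidate flags,
 * otherwise 0.
 */
uint64_t pick_target(uint64_t flags, uint64_t type, const struct args *a)
{
	assert(type != 0);

	if ((flags & type) && (a->flags & ARGS_CONVERT) &&
	    (flags & a->target))
		return (type | a->target) & ~AVAIL_SINGLE;

	return 0;
}
```

Note that a "single" target yields just the type bits, because the SINGLE bit exists only in memory and is stripped before the value goes to disk.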
Ilya Dryomov
2011-Aug-23 20:01 UTC
[PATCH 07/21] Btrfs: add basic infrastructure for selective balancing
This allows a separate set of filters for each chunk type (data, meta,
sys).  The code, however, is generic, and the switch on chunk type is
done only once.

This commit also adds a type filter: it makes it possible to balance,
for example, meta and system chunks w/o touching data ones.

Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
---
 fs/btrfs/volumes.c |   67 +++++++++++++++++++++++++++++++++++++++++++++++++--
 fs/btrfs/volumes.h |   12 +++++++++
 2 files changed, 76 insertions(+), 3 deletions(-)

diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index 0e4a276..95c6310 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -2175,6 +2175,30 @@ static void unset_restripe_control(struct btrfs_fs_info *fs_info)
 	kfree(rctl);
 }
 
+static int should_restripe_chunk(struct btrfs_root *root,
+				 struct extent_buffer *leaf,
+				 struct btrfs_chunk *chunk, u64 chunk_offset)
+{
+	struct restripe_control *rctl = root->fs_info->restripe_ctl;
+	u64 chunk_type = btrfs_chunk_type(leaf, chunk);
+	struct btrfs_restripe_args *rargs = NULL;
+
+	/* type filter */
+	if (!((chunk_type & BTRFS_BLOCK_GROUP_TYPE_MASK) &
+	      (rctl->flags & BTRFS_RESTRIPE_TYPE_MASK))) {
+		return 0;
+	}
+
+	if (chunk_type & BTRFS_BLOCK_GROUP_DATA)
+		rargs = &rctl->data;
+	else if (chunk_type & BTRFS_BLOCK_GROUP_SYSTEM)
+		rargs = &rctl->sys;
+	else if (chunk_type & BTRFS_BLOCK_GROUP_METADATA)
+		rargs = &rctl->meta;
+
+	return 1;
+}
+
 static int __btrfs_restripe(struct btrfs_root *dev_root)
 {
 	struct list_head *devices;
@@ -2182,10 +2206,13 @@ static int __btrfs_restripe(struct btrfs_root *dev_root)
 	u64 old_size;
 	u64 size_to_free;
 	struct btrfs_root *chunk_root = dev_root->fs_info->chunk_root;
+	struct btrfs_chunk *chunk;
 	struct btrfs_path *path;
 	struct btrfs_key key;
 	struct btrfs_key found_key;
 	struct btrfs_trans_handle *trans;
+	struct extent_buffer *leaf;
+	int slot;
 	int ret;
 	int enospc_errors = 0;
 
@@ -2241,8 +2268,10 @@ static int __btrfs_restripe(struct btrfs_root *dev_root)
 		if (ret)
 			BUG_ON(1); /* DIS - break ? */
 
-		btrfs_item_key_to_cpu(path->nodes[0], &found_key,
-				      path->slots[0]);
+		leaf = path->nodes[0];
+		slot = path->slots[0];
+		btrfs_item_key_to_cpu(leaf, &found_key, slot);
+
 		if (found_key.objectid != key.objectid)
 			break;
 
@@ -2250,6 +2279,14 @@ static int __btrfs_restripe(struct btrfs_root *dev_root)
 		if (found_key.offset == 0)
 			break;
 
+		chunk = btrfs_item_ptr(leaf, slot, struct btrfs_chunk);
+
+		if (!should_restripe_chunk(chunk_root, leaf, chunk,
+					   found_key.offset)) {
+			btrfs_release_path(path);
+			goto loop;
+		}
+
 		btrfs_release_path(path);
 		ret = btrfs_relocate_chunk(chunk_root,
 					   chunk_root->root_key.objectid,
@@ -2259,6 +2296,7 @@ static int __btrfs_restripe(struct btrfs_root *dev_root)
 			goto error;
 		if (ret == -ENOSPC)
 			enospc_errors++;
+loop:
 		key.offset = found_key.offset - 1;
 	}
 
@@ -2285,8 +2323,30 @@ int btrfs_restripe(struct restripe_control *rctl)
 	mutex_lock(&fs_info->volume_mutex);
 
 	/*
-	 * Profile changing sanity checks
+	 * In case of mixed groups both data and meta should be picked,
+	 * and identical options should be given for both of them.
 	 */
+	allowed = btrfs_super_incompat_flags(&fs_info->super_copy);
+	if ((allowed & BTRFS_FEATURE_INCOMPAT_MIXED_GROUPS) &&
+	    (rctl->flags & (BTRFS_RESTRIPE_DATA | BTRFS_RESTRIPE_METADATA))) {
+		if (!(rctl->flags & BTRFS_RESTRIPE_DATA) ||
+		    !(rctl->flags & BTRFS_RESTRIPE_METADATA) ||
+		    memcmp(&rctl->data, &rctl->meta, sizeof(rctl->data))) {
+			printk(KERN_ERR "btrfs: with mixed groups data and "
+			       "metadata restripe options must be the same\n");
+			ret = -EINVAL;
+			goto out;
+		}
+	}
+
+	/*
+	 * Profile changing sanity checks.  Skip them if a simple
+	 * balance is requested.
+	 */
+	if (!((rctl->data.flags | rctl->sys.flags | rctl->meta.flags) &
+	      BTRFS_RESTRIPE_ARGS_CONVERT))
+		goto do_restripe;
+
 	allowed = BTRFS_AVAIL_ALLOC_BIT_SINGLE;
 	if (fs_info->fs_devices->num_devices == 1)
 		allowed |= BTRFS_BLOCK_GROUP_DUP;
@@ -2344,6 +2404,7 @@ int btrfs_restripe(struct restripe_control *rctl)
 		}
 	}
 
+do_restripe:
 	set_restripe_control(rctl);
 	mutex_unlock(&fs_info->volume_mutex);
 
diff --git a/fs/btrfs/volumes.h b/fs/btrfs/volumes.h
index 8804c5c..f40227e 100644
--- a/fs/btrfs/volumes.h
+++ b/fs/btrfs/volumes.h
@@ -168,6 +168,18 @@ struct map_lookup {
 #define map_lookup_size(n) (sizeof(struct map_lookup) + \
 			    (sizeof(struct btrfs_bio_stripe) * (n)))
 
+/*
+ * Restriper's general "type" filter.  Shares bits with chunk type for
+ * simplicity, RESTRIPE prefix is used to avoid confusion.
+ */
+#define BTRFS_RESTRIPE_DATA		(1ULL << 0)
+#define BTRFS_RESTRIPE_SYSTEM		(1ULL << 1)
+#define BTRFS_RESTRIPE_METADATA		(1ULL << 2)
+
+#define BTRFS_RESTRIPE_TYPE_MASK	(BTRFS_RESTRIPE_DATA | \
+					 BTRFS_RESTRIPE_SYSTEM | \
+					 BTRFS_RESTRIPE_METADATA)
+
 #define BTRFS_RESTRIPE_FORCE		(1ULL << 3)
 
 /*
-- 
1.7.5.4
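Because the RESTRIPE type bits share positions with the chunk type bits, the type filter reduces to a single mask intersection. A minimal userspace sketch of that test follows; the TYPE_* names and type_selected() are hypothetical stand-ins for the kernel macros and are not part of the patch.

```c
#include <assert.h>
#include <stdint.h>

/* Same bit positions for chunk type and restripe type; names are hypothetical. */
#define TYPE_DATA      (1ULL << 0)
#define TYPE_SYSTEM    (1ULL << 1)
#define TYPE_METADATA  (1ULL << 2)
#define TYPE_MASK      (TYPE_DATA | TYPE_SYSTEM | TYPE_METADATA)

/* Return 1 if the chunk's type was requested in the restripe flags. */
int type_selected(uint64_t chunk_type, uint64_t restripe_flags)
{
	return ((chunk_type & TYPE_MASK) &
		(restripe_flags & TYPE_MASK)) != 0;
}
```

Profile bits (RAID1 and friends) and FORCE live above bit 2, so masking both sides keeps them from influencing the result. A mixed data+metadata chunk matches if either of its types was requested, which is presumably why the patch insists on identical options for both in the mixed-groups case.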
Ilya Dryomov
2011-Aug-23 20:01 UTC
[PATCH 08/21] Btrfs: soft profile changing mode (aka soft convert)
When converting from one profile to another with soft mode on, restriper
won't touch chunks that already have the profile we are converting to.
This is useful if, e.g., half of the fs was converted earlier.

The soft mode switch is per-type (like everything else).  This means
that we can, for example, convert meta chunks the "hard" way while
converting data chunks selectively with the soft switch.

Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
---
 fs/btrfs/volumes.c |   26 ++++++++++++++++++++++++++
 fs/btrfs/volumes.h |    5 ++++-
 2 files changed, 30 insertions(+), 1 deletions(-)

diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index 95c6310..ff252ef 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -2175,6 +2175,26 @@ static void unset_restripe_control(struct btrfs_fs_info *fs_info)
 	kfree(rctl);
 }
 
+/*
+ * Restripe filters.  Return 1 if chunk should be 'filtered out',
+ * ie should not be restriped.
+ */
+static int chunk_soft_convert_filter(u64 chunk_profile,
+				     struct btrfs_restripe_args *rargs)
+{
+	BUG_ON(!(rargs->flags & BTRFS_RESTRIPE_ARGS_CONVERT));
+
+	chunk_profile &= BTRFS_BLOCK_GROUP_PROFILE_MASK;
+
+	if (chunk_profile == 0)
+		chunk_profile = BTRFS_AVAIL_ALLOC_BIT_SINGLE;
+
+	if (rargs->target & chunk_profile)
+		return 1;
+
+	return 0;
+}
+
 static int should_restripe_chunk(struct btrfs_root *root,
 				 struct extent_buffer *leaf,
 				 struct btrfs_chunk *chunk, u64 chunk_offset)
@@ -2196,6 +2216,12 @@ static int should_restripe_chunk(struct btrfs_root *root,
 	else if (chunk_type & BTRFS_BLOCK_GROUP_METADATA)
 		rargs = &rctl->meta;
 
+	/* soft profile changing mode */
+	if ((rargs->flags & BTRFS_RESTRIPE_ARGS_SOFT) &&
+	    chunk_soft_convert_filter(chunk_type, rargs)) {
+		return 0;
+	}
+
 	return 1;
 }
 
diff --git a/fs/btrfs/volumes.h b/fs/btrfs/volumes.h
index f40227e..1852f69 100644
--- a/fs/btrfs/volumes.h
+++ b/fs/btrfs/volumes.h
@@ -183,9 +183,12 @@ struct map_lookup {
 #define BTRFS_RESTRIPE_FORCE		(1ULL << 3)
 
 /*
- * Profile changing flags
+ * Profile changing flags.  When SOFT is set we won't relocate chunk if
+ * it already has the target profile (even though it may be
+ * half-filled).
  */
 #define BTRFS_RESTRIPE_ARGS_CONVERT	(1ULL << 8)
+#define BTRFS_RESTRIPE_ARGS_SOFT	(1ULL << 9)
 
 struct btrfs_restripe_args;
 struct restripe_control {
-- 
1.7.5.4
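The soft convert test hinges on one detail: on disk a "single" chunk has no profile bit at all, so it is mapped to the in-memory SINGLE bit before being compared with the target. A userspace sketch of that comparison follows; the constants copy the bit values used in this series, and soft_filter_skips() is a hypothetical name for illustration only.

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative copies of the bit values; the short names are hypothetical. */
#define BG_DATA       (1ULL << 0)
#define BG_RAID0      (1ULL << 3)
#define BG_RAID1      (1ULL << 4)
#define BG_DUP        (1ULL << 5)
#define BG_RAID10     (1ULL << 6)
#define PROFILE_MASK  (BG_RAID0 | BG_RAID1 | BG_DUP | BG_RAID10)
#define AVAIL_SINGLE  (1ULL << 7)	/* in-memory only "single" bit */

/* Return 1 if the chunk already has the target profile and can be skipped. */
int soft_filter_skips(uint64_t chunk_type, uint64_t target)
{
	uint64_t profile = chunk_type & PROFILE_MASK;

	assert(target != 0);	/* assumption: a convert always names a target */

	if (profile == 0)
		profile = AVAIL_SINGLE;	/* no profile bits means "single" */

	return (target & profile) != 0;
}
```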
Select chunks based on a given profile mask.

Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
---
 fs/btrfs/volumes.c |   20 ++++++++++++++++++++
 fs/btrfs/volumes.h |    5 +++++
 2 files changed, 25 insertions(+), 0 deletions(-)

diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index ff252ef..f045615 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -2179,6 +2179,20 @@ static void unset_restripe_control(struct btrfs_fs_info *fs_info)
  * Restripe filters.  Return 1 if chunk should be 'filtered out',
  * ie should not be restriped.
  */
+static int chunk_profiles_filter(u64 chunk_profile,
+				 struct btrfs_restripe_args *rargs)
+{
+	chunk_profile &= BTRFS_BLOCK_GROUP_PROFILE_MASK;
+
+	if (chunk_profile == 0)
+		chunk_profile = BTRFS_AVAIL_ALLOC_BIT_SINGLE;
+
+	if (rargs->profiles & chunk_profile)
+		return 0;
+
+	return 1;
+}
+
 static int chunk_soft_convert_filter(u64 chunk_profile,
 				     struct btrfs_restripe_args *rargs)
 {
@@ -2216,6 +2230,12 @@ static int should_restripe_chunk(struct btrfs_root *root,
 	else if (chunk_type & BTRFS_BLOCK_GROUP_METADATA)
 		rargs = &rctl->meta;
 
+	/* profiles filter */
+	if ((rargs->flags & BTRFS_RESTRIPE_ARGS_PROFILES) &&
+	    chunk_profiles_filter(chunk_type, rargs)) {
+		return 0;
+	}
+
 	/* soft profile changing mode */
 	if ((rargs->flags & BTRFS_RESTRIPE_ARGS_SOFT) &&
 	    chunk_soft_convert_filter(chunk_type, rargs)) {
diff --git a/fs/btrfs/volumes.h b/fs/btrfs/volumes.h
index 1852f69..9f96ad8 100644
--- a/fs/btrfs/volumes.h
+++ b/fs/btrfs/volumes.h
@@ -183,6 +183,11 @@ struct map_lookup {
 #define BTRFS_RESTRIPE_FORCE		(1ULL << 3)
 
 /*
+ * Restripe filters
+ */
+#define BTRFS_RESTRIPE_ARGS_PROFILES	(1ULL << 0)
+
+/*
  * Profile changing flags.  When SOFT is set we won't relocate chunk if
  * it already has the target profile (even though it may be
  * half-filled).
-- 
1.7.5.4
Select chunks that are less than X percent full.

Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
---
 fs/btrfs/volumes.c |   33 +++++++++++++++++++++++++++++++++
 fs/btrfs/volumes.h |    1 +
 2 files changed, 34 insertions(+), 0 deletions(-)

diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index f045615..b49ecfa 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -2193,6 +2193,33 @@ static int chunk_profiles_filter(u64 chunk_profile,
 	return 1;
 }
 
+static u64 div_factor_fine(u64 num, int factor)
+{
+	if (factor == 100)
+		return num;
+	num *= factor;
+	do_div(num, 100);
+	return num;
+}
+
+static int chunk_usage_filter(struct btrfs_fs_info *fs_info, u64 chunk_offset,
+			      struct btrfs_restripe_args *rargs)
+{
+	struct btrfs_block_group_cache *cache;
+	u64 chunk_used, user_thresh;
+	int ret = 1;
+
+	cache = btrfs_lookup_block_group(fs_info, chunk_offset);
+	chunk_used = btrfs_block_group_used(&cache->item);
+
+	user_thresh = div_factor_fine(cache->key.offset, rargs->usage);
+	if (chunk_used < user_thresh)
+		ret = 0;
+
+	btrfs_put_block_group(cache);
+	return ret;
+}
+
 static int chunk_soft_convert_filter(u64 chunk_profile,
 				     struct btrfs_restripe_args *rargs)
 {
@@ -2236,6 +2263,12 @@ static int should_restripe_chunk(struct btrfs_root *root,
 		return 0;
 	}
 
+	/* usage filter */
+	if ((rargs->flags & BTRFS_RESTRIPE_ARGS_USAGE) &&
+	    chunk_usage_filter(rctl->fs_info, chunk_offset, rargs)) {
+		return 0;
+	}
+
 	/* soft profile changing mode */
 	if ((rargs->flags & BTRFS_RESTRIPE_ARGS_SOFT) &&
 	    chunk_soft_convert_filter(chunk_type, rargs)) {
diff --git a/fs/btrfs/volumes.h b/fs/btrfs/volumes.h
index 9f96ad8..c6baf4b 100644
--- a/fs/btrfs/volumes.h
+++ b/fs/btrfs/volumes.h
@@ -186,6 +186,7 @@ struct map_lookup {
  * Restripe filters
  */
 #define BTRFS_RESTRIPE_ARGS_PROFILES	(1ULL << 0)
+#define BTRFS_RESTRIPE_ARGS_USAGE	(1ULL << 1)
 
 /*
  * Profile changing flags.  When SOFT is set we won't relocate chunk if
-- 
1.7.5.4
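The usage threshold is plain integer arithmetic: threshold = chunk_size * percent / 100, and a chunk is selected when its used bytes are strictly below that threshold. A userspace sketch is given below. percent_of() and usage_selected() are hypothetical names; the kernel code uses do_div() for the 64-bit division, whereas ordinary division is enough here. The bound on pct is an assumption of this sketch, not something the patch enforces.

```c
#include <assert.h>
#include <stdint.h>

/* num * pct / 100, rounded down; pct is assumed to be 0..100. */
uint64_t percent_of(uint64_t num, unsigned int pct)
{
	assert(pct <= 100);	/* assumption made by this sketch only */

	if (pct == 100)
		return num;
	return num * pct / 100;
}

/* Return 1 if the chunk is less than pct percent full. */
int usage_selected(uint64_t chunk_size, uint64_t used, unsigned int pct)
{
	return used < percent_of(chunk_size, pct);
}
```

For a 1 GiB chunk and a 90% threshold this gives 1073741824 * 90 / 100 = 966367641 bytes. Because the comparison is strict, a completely full chunk is never selected, even at 100%.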
Relocate chunks which have at least one stripe located on a device with
devid X.

Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
---
 fs/btrfs/volumes.c |   23 +++++++++++++++++++++++
 fs/btrfs/volumes.h |    1 +
 2 files changed, 24 insertions(+), 0 deletions(-)

diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index b49ecfa..ce2a9e0 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -2220,6 +2220,23 @@ static int chunk_usage_filter(struct btrfs_fs_info *fs_info, u64 chunk_offset,
 	return ret;
 }
 
+static int chunk_devid_filter(struct extent_buffer *leaf,
+			      struct btrfs_chunk *chunk,
+			      struct btrfs_restripe_args *rargs)
+{
+	struct btrfs_stripe *stripe;
+	int num_stripes = btrfs_chunk_num_stripes(leaf, chunk);
+	int i;
+
+	for (i = 0; i < num_stripes; i++) {
+		stripe = btrfs_stripe_nr(chunk, i);
+		if (btrfs_stripe_devid(leaf, stripe) == rargs->devid)
+			return 0;
+	}
+
+	return 1;
+}
+
 static int chunk_soft_convert_filter(u64 chunk_profile,
 				     struct btrfs_restripe_args *rargs)
 {
@@ -2269,6 +2286,12 @@ static int should_restripe_chunk(struct btrfs_root *root,
 		return 0;
 	}
 
+	/* devid filter */
+	if ((rargs->flags & BTRFS_RESTRIPE_ARGS_DEVID) &&
+	    chunk_devid_filter(leaf, chunk, rargs)) {
+		return 0;
+	}
+
 	/* soft profile changing mode */
 	if ((rargs->flags & BTRFS_RESTRIPE_ARGS_SOFT) &&
 	    chunk_soft_convert_filter(chunk_type, rargs)) {
diff --git a/fs/btrfs/volumes.h b/fs/btrfs/volumes.h
index c6baf4b..1b8dc3e 100644
--- a/fs/btrfs/volumes.h
+++ b/fs/btrfs/volumes.h
@@ -187,6 +187,7 @@ struct map_lookup {
  */
 #define BTRFS_RESTRIPE_ARGS_PROFILES	(1ULL << 0)
 #define BTRFS_RESTRIPE_ARGS_USAGE	(1ULL << 1)
+#define BTRFS_RESTRIPE_ARGS_DEVID	(1ULL << 2)
 
 /*
  * Profile changing flags.  When SOFT is set we won't relocate chunk if
-- 
1.7.5.4
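The devid filter is a linear scan over the chunk's stripes that stops at the first match. A userspace sketch of the same scan over a plain array follows; struct stripe and chunk_on_device() are hypothetical simplifications of the on-disk stripe items and the extent buffer accessors used in the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified stand-in for an on-disk stripe; hypothetical layout. */
struct stripe {
	uint64_t devid;		/* device the stripe lives on */
	uint64_t offset;	/* physical offset on that device */
};

/* Return 1 if at least one of the n stripes lives on device devid. */
int chunk_on_device(const struct stripe *stripes, size_t n, uint64_t devid)
{
	size_t i;

	assert(stripes != NULL || n == 0);

	for (i = 0; i < n; i++) {
		if (stripes[i].devid == devid)
			return 1;
	}

	return 0;
}
```

Note the inverted convention: the kernel filter returns 0 for "keep this chunk" and 1 for "filter it out", while the sketch returns 1 on a match to read more naturally on its own.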
Select chunks which have at least one byte of at least one stripe
located on a device with devid X in a given [pstart,pend) physical
address range.

This filter only works when devid filter is turned on.

Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
---
 fs/btrfs/volumes.c |   45 +++++++++++++++++++++++++++++++++++++++++++++
 fs/btrfs/volumes.h |    1 +
 2 files changed, 46 insertions(+), 0 deletions(-)

diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index ce2a9e0..4393f6d 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -2237,6 +2237,45 @@ static int chunk_devid_filter(struct extent_buffer *leaf,
 	return 1;
 }
 
+/* [pstart, pend) */
+static int chunk_drange_filter(struct extent_buffer *leaf,
+			       struct btrfs_chunk *chunk,
+			       u64 chunk_offset,
+			       struct btrfs_restripe_args *rargs)
+{
+	struct btrfs_stripe *stripe;
+	int num_stripes = btrfs_chunk_num_stripes(leaf, chunk);
+	u64 stripe_offset;
+	u64 stripe_length;
+	int factor;
+	int i;
+
+	BUG_ON(!(rargs->flags & BTRFS_RESTRIPE_ARGS_DEVID));
+
+	if (btrfs_chunk_type(leaf, chunk) & (BTRFS_BLOCK_GROUP_DUP |
+	     BTRFS_BLOCK_GROUP_RAID1 | BTRFS_BLOCK_GROUP_RAID10))
+		factor = 2;
+	else
+		factor = 1;
+	factor = num_stripes / factor;
+
+	for (i = 0; i < num_stripes; i++) {
+		stripe = btrfs_stripe_nr(chunk, i);
+		if (btrfs_stripe_devid(leaf, stripe) != rargs->devid)
+			continue;
+
+		stripe_offset = btrfs_stripe_offset(leaf, stripe);
+		stripe_length = btrfs_chunk_length(leaf, chunk);
+		do_div(stripe_length, factor);
+
+		if (stripe_offset < rargs->pend &&
+		    stripe_offset + stripe_length > rargs->pstart)
+			return 0;
+	}
+
+	return 1;
+}
+
 static int chunk_soft_convert_filter(u64 chunk_profile,
 				     struct btrfs_restripe_args *rargs)
 {
@@ -2292,6 +2331,12 @@ static int should_restripe_chunk(struct btrfs_root *root,
 		return 0;
 	}
 
+	/* drange filter, makes sense only with devid filter */
+	if ((rargs->flags & BTRFS_RESTRIPE_ARGS_DRANGE) &&
+	    chunk_drange_filter(leaf, chunk, chunk_offset, rargs)) {
+		return 0;
+	}
+
 	/* soft profile changing mode */
 	if ((rargs->flags & BTRFS_RESTRIPE_ARGS_SOFT) &&
 	    chunk_soft_convert_filter(chunk_type, rargs)) {
diff --git a/fs/btrfs/volumes.h b/fs/btrfs/volumes.h
index 1b8dc3e..8d4bbcb 100644
--- a/fs/btrfs/volumes.h
+++ b/fs/btrfs/volumes.h
@@ -188,6 +188,7 @@ struct map_lookup {
 #define BTRFS_RESTRIPE_ARGS_PROFILES	(1ULL << 0)
 #define BTRFS_RESTRIPE_ARGS_USAGE	(1ULL << 1)
 #define BTRFS_RESTRIPE_ARGS_DEVID	(1ULL << 2)
+#define BTRFS_RESTRIPE_ARGS_DRANGE	(1ULL << 3)
 
 /*
  * Profile changing flags.  When SOFT is set we won't relocate chunk if
-- 
1.7.5.4
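Two pieces of arithmetic carry the drange filter: the per-device stripe length, which is the chunk length divided by the number of data stripes (half the stripe count for DUP, RAID1 and RAID10), and a half-open interval overlap test. Both are sketched below in userspace form; stripe_length() and ranges_overlap() are hypothetical helper names that do not appear in the patch.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Per-device length of one stripe.  "mirrored" is nonzero for DUP,
 * RAID1 and RAID10, where every byte is stored twice.
 */
uint64_t stripe_length(uint64_t chunk_length, int num_stripes, int mirrored)
{
	int data_stripes = num_stripes / (mirrored ? 2 : 1);

	assert(data_stripes > 0);
	return chunk_length / (uint64_t)data_stripes;
}

/* Does [start, start + len) intersect the half-open range [lo, hi)? */
int ranges_overlap(uint64_t start, uint64_t len, uint64_t lo, uint64_t hi)
{
	return start < hi && start + len > lo;
}
```

With a 1 GiB chunk, four RAID10 stripes come out at 512 MiB each and four RAID0 stripes at 256 MiB each. Because the range is half-open, a stripe that ends exactly at the start of the range, or starts exactly at its end, does not count as overlapping.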
Ilya Dryomov
2011-Aug-23 20:01 UTC
[PATCH 13/21] Btrfs: virtual address space subset filter
Select chunks which have at least one byte located inside a given
[vstart, vend) virtual address space range.

Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
---
 fs/btrfs/volumes.c |   20 ++++++++++++++++++++
 fs/btrfs/volumes.h |    3 ++-
 2 files changed, 22 insertions(+), 1 deletions(-)

diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index 4393f6d..eccd458 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -2276,6 +2276,20 @@ static int chunk_drange_filter(struct extent_buffer *leaf,
 	return 1;
 }
 
+/* [vstart, vend) */
+static int chunk_vrange_filter(struct extent_buffer *leaf,
+			       struct btrfs_chunk *chunk,
+			       u64 chunk_offset,
+			       struct btrfs_restripe_args *rargs)
+{
+	if (chunk_offset < rargs->vend &&
+	    chunk_offset + btrfs_chunk_length(leaf, chunk) > rargs->vstart)
+		/* at least part of the chunk is inside this vrange */
+		return 0;
+
+	return 1;
+}
+
 static int chunk_soft_convert_filter(u64 chunk_profile,
 				     struct btrfs_restripe_args *rargs)
 {
@@ -2337,6 +2351,12 @@ static int should_restripe_chunk(struct btrfs_root *root,
 		return 0;
 	}
 
+	/* vrange filter */
+	if ((rargs->flags & BTRFS_RESTRIPE_ARGS_VRANGE) &&
+	    chunk_vrange_filter(leaf, chunk, chunk_offset, rargs)) {
+		return 0;
+	}
+
 	/* soft profile changing mode */
 	if ((rargs->flags & BTRFS_RESTRIPE_ARGS_SOFT) &&
 	    chunk_soft_convert_filter(chunk_type, rargs)) {
diff --git a/fs/btrfs/volumes.h b/fs/btrfs/volumes.h
index 8d4bbcb..9726180 100644
--- a/fs/btrfs/volumes.h
+++ b/fs/btrfs/volumes.h
@@ -188,7 +188,8 @@ struct map_lookup {
 #define BTRFS_RESTRIPE_ARGS_PROFILES	(1ULL << 0)
 #define BTRFS_RESTRIPE_ARGS_USAGE	(1ULL << 1)
 #define BTRFS_RESTRIPE_ARGS_DEVID	(1ULL << 2)
-#define BTRFS_RESTRIPE_ARGS_DRANGE (1ULL << 3)
+#define BTRFS_RESTRIPE_ARGS_DRANGE	(1ULL << 3)
+#define BTRFS_RESTRIPE_ARGS_VRANGE	(1ULL << 4)
 
 /*
  * Profile changing flags.  When SOFT is set we won't relocate chunk if
-- 
1.7.5.4
Introduce a new btree objectid for storing the restripe item.  The
reason is to be able to resume restriper after a crash with the same
parameters.  The restripe item has a very high objectid and goes into
the tree of tree roots.

The key for the new item is as follows:

	[ BTRFS_RESTRIPE_OBJECTID ; 0 ; 0 ]

Older kernels simply ignore it, so it's safe to mount with an older
kernel and then go back to the newer one.

Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
---
 fs/btrfs/ctree.h   |  127 +++++++++++++++++++++++++++++++++++++++++++++++++++-
 fs/btrfs/volumes.c |  105 ++++++++++++++++++++++++++++++++++++++++++-
 2 files changed, 228 insertions(+), 4 deletions(-)

diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
index 65d7562..b524034 100644
--- a/fs/btrfs/ctree.h
+++ b/fs/btrfs/ctree.h
@@ -85,6 +85,9 @@ struct btrfs_ordered_sum;
 /* holds checksums of all the data extents */
 #define BTRFS_CSUM_TREE_OBJECTID 7ULL
 
+/* for storing restripe params in the root tree */
+#define BTRFS_RESTRIPE_OBJECTID -4ULL
+
 /* orhpan objectid for tracking unlinked/truncated files */
 #define BTRFS_ORPHAN_OBJECTID -5ULL
 
@@ -649,6 +652,47 @@ struct btrfs_root_ref {
 	__le16 name_len;
 } __attribute__ ((__packed__));
 
+/*
+ * Restriper stuff
+ */
+struct btrfs_disk_restripe_args {
+	/* profiles to touch, in-memory format */
+	__le64 profiles;
+
+	/* usage filter */
+	__le64 usage;
+
+	/* devid filter */
+	__le64 devid;
+
+	/* devid subset filter [pstart..pend) */
+	__le64 pstart;
+	__le64 pend;
+
+	/* btrfs virtual address space subset filter [vstart..vend) */
+	__le64 vstart;
+	__le64 vend;
+
+	/* profile to convert to, in-memory format */
+	__le64 target;
+
+	/* BTRFS_RESTRIPE_ARGS_* */
+	__le64 flags;
+
+	__le64 unused[8];
+} __attribute__ ((__packed__));
+
+struct btrfs_restripe_item {
+	/* BTRFS_RESTRIPE_* */
+	__le64 flags;
+
+	struct btrfs_disk_restripe_args data;
+	struct btrfs_disk_restripe_args sys;
+	struct btrfs_disk_restripe_args meta;
+
+	__le64 unused[4];
+} __attribute__ ((__packed__));
+
 #define BTRFS_FILE_EXTENT_INLINE 0
 #define BTRFS_FILE_EXTENT_REG 1
 #define BTRFS_FILE_EXTENT_PREALLOC 2
@@ -727,7 +771,8 @@ struct btrfs_csum_item {
 					 BTRFS_BLOCK_GROUP_RAID10)
 /*
  * We need a bit for restriper to be able to tell when chunks of type
- * SINGLE are available.  It is used in avail_*_alloc_bits.
+ * SINGLE are available.  It is used in avail_*_alloc_bits and restripe
+ * item fields.
  */
 #define BTRFS_AVAIL_ALLOC_BIT_SINGLE (1 << 7)
 
@@ -2000,8 +2045,86 @@ static inline bool btrfs_root_readonly(struct btrfs_root *root)
 	return root->root_item.flags & BTRFS_ROOT_SUBVOL_RDONLY;
 }
 
-/* struct btrfs_super_block */
+/* struct btrfs_restripe_item */
+BTRFS_SETGET_FUNCS(restripe_flags, struct btrfs_restripe_item, flags, 64);
+
+static inline void btrfs_restripe_data(struct extent_buffer *eb,
+				       struct btrfs_restripe_item *ri,
+				       struct btrfs_disk_restripe_args *ra)
+{
+	read_eb_member(eb, ri, struct btrfs_restripe_item, data, ra);
+}
 
+static inline void btrfs_set_restripe_data(struct extent_buffer *eb,
+					   struct btrfs_restripe_item *ri,
+					   struct btrfs_disk_restripe_args *ra)
+{
+	write_eb_member(eb, ri, struct btrfs_restripe_item, data, ra);
+}
+
+static inline void btrfs_restripe_meta(struct extent_buffer *eb,
+				       struct btrfs_restripe_item *ri,
+				       struct btrfs_disk_restripe_args *ra)
+{
+	read_eb_member(eb, ri, struct btrfs_restripe_item, meta, ra);
+}
+
+static inline void btrfs_set_restripe_meta(struct extent_buffer *eb,
+					   struct btrfs_restripe_item *ri,
+					   struct btrfs_disk_restripe_args *ra)
+{
+	write_eb_member(eb, ri, struct btrfs_restripe_item, meta, ra);
+}
+
+static inline void btrfs_restripe_sys(struct extent_buffer *eb,
+				      struct btrfs_restripe_item *ri,
+				      struct btrfs_disk_restripe_args *ra)
+{
+	read_eb_member(eb, ri, struct btrfs_restripe_item, sys, ra);
+}
+
+static inline void btrfs_set_restripe_sys(struct extent_buffer *eb,
+					  struct btrfs_restripe_item *ri,
+					  struct btrfs_disk_restripe_args *ra)
+{
+	write_eb_member(eb, ri, struct btrfs_restripe_item, sys, ra);
+}
+
+static inline void
+btrfs_disk_restripe_args_to_cpu(struct btrfs_restripe_args *cpu,
+				struct btrfs_disk_restripe_args *disk)
+{
+	memset(cpu, 0, sizeof(*cpu));
+
+	cpu->profiles = le64_to_cpu(disk->profiles);
+	cpu->usage = le64_to_cpu(disk->usage);
+	cpu->devid = le64_to_cpu(disk->devid);
+	cpu->pstart = le64_to_cpu(disk->pstart);
+	cpu->pend = le64_to_cpu(disk->pend);
+	cpu->vstart = le64_to_cpu(disk->vstart);
+	cpu->vend = le64_to_cpu(disk->vend);
+	cpu->target = le64_to_cpu(disk->target);
+	cpu->flags = le64_to_cpu(disk->flags);
+}
+
+static inline void
+btrfs_cpu_restripe_args_to_disk(struct btrfs_disk_restripe_args *disk,
+				struct btrfs_restripe_args *cpu)
+{
+	memset(disk, 0, sizeof(*disk));
+
+	disk->profiles = cpu_to_le64(cpu->profiles);
+	disk->usage = cpu_to_le64(cpu->usage);
+	disk->devid = cpu_to_le64(cpu->devid);
+	disk->pstart = cpu_to_le64(cpu->pstart);
+	disk->pend = cpu_to_le64(cpu->pend);
+	disk->vstart = cpu_to_le64(cpu->vstart);
+	disk->vend = cpu_to_le64(cpu->vend);
+	disk->target = cpu_to_le64(cpu->target);
+	disk->flags = cpu_to_le64(cpu->flags);
+}
+
+/* struct btrfs_super_block */
 BTRFS_SETGET_STACK_FUNCS(super_bytenr, struct btrfs_super_block, bytenr, 64);
 BTRFS_SETGET_STACK_FUNCS(super_flags, struct btrfs_super_block, flags, 64);
 BTRFS_SETGET_STACK_FUNCS(super_generation, struct btrfs_super_block,
diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index eccd458..1057ad3 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -2150,6 +2150,97 @@ error:
 	return ret;
 }
 
+static int insert_restripe_item(struct btrfs_root *root,
+				struct restripe_control *rctl)
+{
+	struct btrfs_trans_handle *trans;
+	struct btrfs_restripe_item *item;
+	struct btrfs_disk_restripe_args disk_rargs;
+	struct btrfs_path *path;
+	struct extent_buffer *leaf;
+	struct btrfs_key key;
+	int ret, err;
+
+	path = btrfs_alloc_path();
+	if (!path)
+		return -ENOMEM;
+
+	trans = btrfs_start_transaction(root, 0);
+	if (IS_ERR(trans)) {
+		btrfs_free_path(path);
+		return PTR_ERR(trans);
+	}
+
+	key.objectid = BTRFS_RESTRIPE_OBJECTID;
+	key.type = 0;
+	key.offset = 0;
+
+	ret = btrfs_insert_empty_item(trans, root, path, &key,
+				      sizeof(*item));
+	if (ret)
+		goto out;
+
+	leaf = path->nodes[0];
+	item = btrfs_item_ptr(leaf, path->slots[0], struct btrfs_restripe_item);
+
+	memset_extent_buffer(leaf, 0, (unsigned long)item, sizeof(*item));
+
+	btrfs_cpu_restripe_args_to_disk(&disk_rargs, &rctl->data);
+	btrfs_set_restripe_data(leaf, item, &disk_rargs);
+	btrfs_cpu_restripe_args_to_disk(&disk_rargs, &rctl->meta);
+	btrfs_set_restripe_meta(leaf, item, &disk_rargs);
+	btrfs_cpu_restripe_args_to_disk(&disk_rargs, &rctl->sys);
+	btrfs_set_restripe_sys(leaf, item, &disk_rargs);
+
+	btrfs_set_restripe_flags(leaf, item, rctl->flags);
+
+	btrfs_mark_buffer_dirty(leaf);
+out:
+	btrfs_free_path(path);
+	err = btrfs_commit_transaction(trans, root);
+	if (err && !ret)
+		ret = err;
+	return ret;
+}
+
+static int del_restripe_item(struct btrfs_root *root)
+{
+	struct btrfs_trans_handle *trans;
+	struct btrfs_path *path;
+	struct btrfs_key key;
+	int ret, err;
+
+	path = btrfs_alloc_path();
+	if (!path)
+		return -ENOMEM;
+
+	trans = btrfs_start_transaction(root, 0);
+	if (IS_ERR(trans)) {
+		btrfs_free_path(path);
+		return PTR_ERR(trans);
+	}
+
+	key.objectid = BTRFS_RESTRIPE_OBJECTID;
+	key.type = 0;
+	key.offset = 0;
+
+	ret = btrfs_search_slot(trans, root, &key, path, -1, 1);
+	if (ret < 0)
+		goto out;
+	if (ret > 0) {
+		ret = -ENOENT;
+		goto out;
+	}
+
+	ret = btrfs_del_item(trans, root, path);
+out:
+	btrfs_free_path(path);
+	err = btrfs_commit_transaction(trans, root);
+	if (err && !ret)
+		ret = err;
+	return ret;
+}
+
 /*
  * Should be called with both restripe and volume mutexes held to
  * serialize other volume operations (add_dev/rm_dev/resize) wrt
@@ -2485,6 +2576,7 @@ int btrfs_restripe(struct restripe_control *rctl)
 {
 	struct btrfs_fs_info *fs_info = rctl->fs_info;
 	u64 allowed;
+	int err;
 	int ret;
 
 	mutex_lock(&fs_info->volume_mutex);
@@ -2572,16 +2664,25 @@ int btrfs_restripe(struct restripe_control *rctl)
 	}
 
 do_restripe:
+	ret = insert_restripe_item(fs_info->tree_root, rctl);
+	if (ret && ret != -EEXIST)
+		goto out;
+	BUG_ON(ret == -EEXIST);
+
 	set_restripe_control(rctl);
 	mutex_unlock(&fs_info->volume_mutex);
 
-	ret = __btrfs_restripe(fs_info->dev_root);
+	err = __btrfs_restripe(fs_info->dev_root);
 
 	mutex_lock(&fs_info->volume_mutex);
+
 	unset_restripe_control(fs_info);
+	ret = del_restripe_item(fs_info->tree_root);
+	BUG_ON(ret);
+
 	mutex_unlock(&fs_info->volume_mutex);
 
-	return ret;
+	return err;
 
 out:
 	mutex_unlock(&fs_info->volume_mutex);
-- 
1.7.5.4
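The on-disk item is a packed sequence of little-endian 64-bit fields: 9 used plus 8 unused per args block (17 * 8 = 136 bytes), and 1 + 3 * 17 + 4 = 56 fields, or 448 bytes, for the whole item. The userspace sketch below checks those sizes and shows an endian-independent way to store and load one such field. struct disk_args, struct disk_item, put_le64() and get_le64() are hypothetical names that only mirror the layout in this patch; the kernel itself uses cpu_to_le64() and le64_to_cpu(). The packed attribute is a GCC/Clang extension.

```c
#include <assert.h>
#include <stdint.h>

/* Mirrors the field order of the on-disk args block; hypothetical names. */
struct disk_args {
	uint64_t profiles, usage, devid;
	uint64_t pstart, pend, vstart, vend;
	uint64_t target, flags;
	uint64_t unused[8];
} __attribute__((__packed__));

struct disk_item {
	uint64_t flags;
	struct disk_args data, sys, meta;
	uint64_t unused[4];
} __attribute__((__packed__));

_Static_assert(sizeof(struct disk_args) == 136, "17 fields of 8 bytes");
_Static_assert(sizeof(struct disk_item) == 448, "56 fields of 8 bytes");

/* Store v at dst in little-endian byte order on any host. */
void put_le64(uint8_t *dst, uint64_t v)
{
	int i;

	for (i = 0; i < 8; i++)
		dst[i] = (uint8_t)(v >> (8 * i));
}

/* Load a little-endian 64-bit value from src on any host. */
uint64_t get_le64(const uint8_t *src)
{
	uint64_t v = 0;
	int i;

	for (i = 0; i < 8; i++)
		v |= (uint64_t)src[i] << (8 * i);

	return v;
}
```

The unused tails leave room to grow the format later without changing the item size.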
On mount, if a restripe item is found, resume the restripe in a separate
kernel thread. Try to be smart and continue roughly where the previous
balance (or convert) was interrupted. For chunk types that were being
converted to some profile we turn on soft convert; in case of a simple
balance we turn on the usage filter and relocate only
less-than-90%-full chunks of that type. These are just heuristics, but
they help quite a bit and can be improved in future.

Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
---
 fs/btrfs/disk-io.c |    3 +
 fs/btrfs/ioctl.c   |    2 +-
 fs/btrfs/volumes.c |  125 ++++++++++++++++++++++++++++++++++++++++++++++++++--
 fs/btrfs/volumes.h |    3 +-
 4 files changed, 127 insertions(+), 6 deletions(-)

diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index fa2301b..b3950f2 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -2103,6 +2103,9 @@ struct btrfs_root *open_ctree(struct super_block *sb,
 	if (!err)
 		err = btrfs_orphan_cleanup(fs_info->tree_root);
 	up_read(&fs_info->cleanup_work_sem);
+
+	err = btrfs_recover_restripe(fs_info->tree_root);
+
 	if (err) {
 		close_ctree(tree_root);
 		return ERR_PTR(err);
diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c
index 9dfc686..f371edd 100644
--- a/fs/btrfs/ioctl.c
+++ b/fs/btrfs/ioctl.c
@@ -2899,7 +2899,7 @@ static long btrfs_ioctl_restripe(struct btrfs_root *root, void __user *arg)
 	memcpy(&rctl->meta, &rargs->meta, sizeof(rctl->meta));
 	memcpy(&rctl->sys, &rargs->sys, sizeof(rctl->sys));
 
-	ret = btrfs_restripe(rctl);
+	ret = btrfs_restripe(rctl, 0);
 
 	/* rctl freed in unset_restripe_control */
 	kfree(rargs);
diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index 1057ad3..4490124 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -23,6 +23,7 @@
 #include <linux/random.h>
 #include <linux/iocontext.h>
 #include <linux/capability.h>
+#include <linux/kthread.h>
 #include <asm/div64.h>
 #include "compat.h"
 #include "ctree.h"
@@ -2242,16 +2243,58 @@ out:
 }
 
 /*
+ * This is a heuristic used to reduce the number of chunks restriped on
+ * resume after balance was interrupted.
+ */
+static void update_restripe_args(struct restripe_control *rctl)
+{
+	/*
+	 * Turn on soft mode for chunk types that were being converted.
+	 */
+	if (rctl->data.flags & BTRFS_RESTRIPE_ARGS_CONVERT)
+		rctl->data.flags |= BTRFS_RESTRIPE_ARGS_SOFT;
+	if (rctl->sys.flags & BTRFS_RESTRIPE_ARGS_CONVERT)
+		rctl->sys.flags |= BTRFS_RESTRIPE_ARGS_SOFT;
+	if (rctl->meta.flags & BTRFS_RESTRIPE_ARGS_CONVERT)
+		rctl->meta.flags |= BTRFS_RESTRIPE_ARGS_SOFT;
+
+	/*
+	 * Turn on usage filter if is not already used. The idea is
+	 * that chunks that we have already balanced should be
+	 * reasonably full. Don't do it for chunks that are being
+	 * converted - that will keep us from relocating unconverted
+	 * (albeit full) chunks.
+	 */
+	if (!(rctl->data.flags & BTRFS_RESTRIPE_ARGS_USAGE) &&
+	    !(rctl->data.flags & BTRFS_RESTRIPE_ARGS_CONVERT)) {
+		rctl->data.flags |= BTRFS_RESTRIPE_ARGS_USAGE;
+		rctl->data.usage = 90;
+	}
+	if (!(rctl->sys.flags & BTRFS_RESTRIPE_ARGS_USAGE) &&
+	    !(rctl->sys.flags & BTRFS_RESTRIPE_ARGS_CONVERT)) {
+		rctl->sys.flags |= BTRFS_RESTRIPE_ARGS_USAGE;
+		rctl->sys.usage = 90;
+	}
+	if (!(rctl->meta.flags & BTRFS_RESTRIPE_ARGS_USAGE) &&
+	    !(rctl->meta.flags & BTRFS_RESTRIPE_ARGS_CONVERT)) {
+		rctl->meta.flags |= BTRFS_RESTRIPE_ARGS_USAGE;
+		rctl->meta.usage = 90;
+	}
+}
+
+/*
  * Should be called with both restripe and volume mutexes held to
  * serialize other volume operations (add_dev/rm_dev/resize) wrt
  * restriper. Same goes for unset_restripe_control().
  */
-static void set_restripe_control(struct restripe_control *rctl)
+static void set_restripe_control(struct restripe_control *rctl, int update)
 {
 	struct btrfs_fs_info *fs_info = rctl->fs_info;
 
 	spin_lock(&fs_info->restripe_lock);
 	fs_info->restripe_ctl = rctl;
+	if (update)
+		update_restripe_args(rctl);
 	spin_unlock(&fs_info->restripe_lock);
 }
 
@@ -2572,7 +2615,7 @@ error:
 /*
  * Should be called with restripe_mutex held
  */
-int btrfs_restripe(struct restripe_control *rctl)
+int btrfs_restripe(struct restripe_control *rctl, int resume)
 {
 	struct btrfs_fs_info *fs_info = rctl->fs_info;
 	u64 allowed;
@@ -2667,9 +2710,9 @@ do_restripe:
 	ret = insert_restripe_item(fs_info->tree_root, rctl);
 	if (ret && ret != -EEXIST)
 		goto out;
-	BUG_ON(ret == -EEXIST);
+	BUG_ON(ret == -EEXIST && !resume);
 
-	set_restripe_control(rctl);
+	set_restripe_control(rctl, resume);
 	mutex_unlock(&fs_info->volume_mutex);
 
 	err = __btrfs_restripe(fs_info->dev_root);
@@ -2690,6 +2733,80 @@ out:
 	return ret;
 }
 
+static int restriper_kthread(void *data)
+{
+	struct restripe_control *rctl = (struct restripe_control *)data;
+	struct btrfs_fs_info *fs_info = rctl->fs_info;
+	int ret;
+
+	mutex_lock(&fs_info->restripe_mutex);
+
+	printk(KERN_INFO "btrfs: continuing restripe\n");
+	ret = btrfs_restripe(rctl, 1);
+
+	mutex_unlock(&fs_info->restripe_mutex);
+	return ret;
+}
+
+int btrfs_recover_restripe(struct btrfs_root *tree_root)
+{
+	struct task_struct *tsk;
+	struct restripe_control *rctl;
+	struct btrfs_restripe_item *item;
+	struct btrfs_disk_restripe_args disk_rargs;
+	struct btrfs_path *path;
+	struct extent_buffer *leaf;
+	struct btrfs_key key;
+	int ret;
+
+	path = btrfs_alloc_path();
+	if (!path)
+		return -ENOMEM;
+
+	rctl = kzalloc(sizeof(*rctl), GFP_NOFS);
+	if (!rctl) {
+		ret = -ENOMEM;
+		goto out;
+	}
+
+	key.objectid = BTRFS_RESTRIPE_OBJECTID;
+	key.type = 0;
+	key.offset = 0;
+
+	ret = btrfs_search_slot(NULL, tree_root, &key, path, 0, 0);
+	if (ret < 0)
+		goto out_free;
+	if (ret > 0) { /* ret = -ENOENT; */
+		ret = 0;
+		goto out_free;
+	}
+
+	leaf = path->nodes[0];
+	item = btrfs_item_ptr(leaf, path->slots[0], struct btrfs_restripe_item);
+
+	rctl->fs_info = tree_root->fs_info;
+	rctl->flags = btrfs_restripe_flags(leaf, item);
+
+	btrfs_restripe_data(leaf, item, &disk_rargs);
+	btrfs_disk_restripe_args_to_cpu(&rctl->data, &disk_rargs);
+	btrfs_restripe_meta(leaf, item, &disk_rargs);
+	btrfs_disk_restripe_args_to_cpu(&rctl->meta, &disk_rargs);
+	btrfs_restripe_sys(leaf, item, &disk_rargs);
+	btrfs_disk_restripe_args_to_cpu(&rctl->sys, &disk_rargs);
+
+	tsk = kthread_run(restriper_kthread, rctl, "btrfs-restriper");
+	if (IS_ERR(tsk))
+		ret = PTR_ERR(tsk);
+	else
+		goto out;
+
+out_free:
+	kfree(rctl);
+out:
+	btrfs_free_path(path);
+	return ret;
+}
+
 /*
  * shrinking a device means finding all of the device extents past
  * the new size, and then following the back refs to the chunks.
diff --git a/fs/btrfs/volumes.h b/fs/btrfs/volumes.h
index 9726180..6fcb4a5 100644
--- a/fs/btrfs/volumes.h
+++ b/fs/btrfs/volumes.h
@@ -252,7 +252,8 @@ struct btrfs_device *btrfs_find_device(struct btrfs_root *root, u64 devid,
 int btrfs_shrink_device(struct btrfs_device *device, u64 new_size);
 int btrfs_init_new_device(struct btrfs_root *root, char *path);
 int btrfs_balance(struct btrfs_root *dev_root);
-int btrfs_restripe(struct restripe_control *rctl);
+int btrfs_restripe(struct restripe_control *rctl, int resume);
+int btrfs_recover_restripe(struct btrfs_root *tree_root);
 int btrfs_chunk_readonly(struct btrfs_root *root, u64 chunk_offset);
 int find_free_dev_extent(struct btrfs_trans_handle *trans,
 			 struct btrfs_device *device, u64 num_bytes,
-- 
1.7.5.4
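The resume heuristic in update_restripe_args() can be checked outside the
kernel. What follows is only a user-space sketch under stated assumptions:
the struct and function names are made up for illustration, the CONVERT and
SOFT bit values are the ones visible in the volumes.h context elsewhere in
this series, and the USAGE bit position is an assumption because its
definition is not part of the hunks shown here.

```c
#include <assert.h>
#include <stdint.h>

/* CONVERT and SOFT mirror the series; the USAGE bit position is assumed. */
#define ARGS_USAGE   (1ULL << 1)	/* assumption, not taken from the patch */
#define ARGS_CONVERT (1ULL << 8)
#define ARGS_SOFT    (1ULL << 9)

/* Hypothetical stand-in for one per-chunk-type btrfs_restripe_args. */
struct args_model {
	uint64_t flags;
	uint64_t usage;
};

/* Apply the resume heuristic to one chunk type (data, sys or meta). */
static void resume_heuristic(struct args_model *a)
{
	assert(a != 0);

	/* A conversion in progress is resumed in soft mode, untouched otherwise. */
	if (a->flags & ARGS_CONVERT) {
		a->flags |= ARGS_SOFT;
		return;
	}

	/* A plain balance gets a 90% usage filter unless one was already set. */
	if (!(a->flags & ARGS_USAGE)) {
		a->flags |= ARGS_USAGE;
		a->usage = 90;
	}
}
```

The early return is equivalent to the two separate checks in the patch: a
chunk type with CONVERT set never gets the usage filter, so a user-supplied
usage value or a conversion target is never overridden.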
Implement an ioctl for cancelling restriper. Currently we wait until
relocation of the current block group is finished; in future this can be
done by triggering a commit. The restripe item is deleted and no record
of the interrupted restripe is kept.

Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
---
 fs/btrfs/ctree.h   |    2 +
 fs/btrfs/disk-io.c |    2 +
 fs/btrfs/ioctl.c   |   20 +++++++++++++++++
 fs/btrfs/ioctl.h   |    3 ++
 fs/btrfs/volumes.c |   61 +++++++++++++++++++++++++++++++++++++++++++++++----
 fs/btrfs/volumes.h |    7 ++++++
 6 files changed, 90 insertions(+), 5 deletions(-)

diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
index b524034..8e764d9 100644
--- a/fs/btrfs/ctree.h
+++ b/fs/btrfs/ctree.h
@@ -1165,6 +1165,8 @@ struct btrfs_fs_info {
 	spinlock_t restripe_lock;
 	struct mutex restripe_mutex;
 	struct restripe_control *restripe_ctl;
+	unsigned long restripe_state;
+	wait_queue_head_t restripe_wait;
 
 	unsigned data_chunk_allocations;
 	unsigned metadata_ratio;
diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index b3950f2..662a6e6 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -1703,6 +1703,8 @@ struct btrfs_root *open_ctree(struct super_block *sb,
 	spin_lock_init(&fs_info->restripe_lock);
 	mutex_init(&fs_info->restripe_mutex);
 	fs_info->restripe_ctl = NULL;
+	fs_info->restripe_state = 0;
+	init_waitqueue_head(&fs_info->restripe_wait);
 
 	sb->s_blocksize = 4096;
 	sb->s_blocksize_bits = blksize_bits(4096);
diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c
index f371edd..d8bdb67 100644
--- a/fs/btrfs/ioctl.c
+++ b/fs/btrfs/ioctl.c
@@ -2878,6 +2878,10 @@ static long btrfs_ioctl_restripe(struct btrfs_root *root, void __user *arg)
 		return -EROFS;
 
 	mutex_lock(&fs_info->restripe_mutex);
+	if (fs_info->restripe_ctl) {
+		ret = -EINPROGRESS;
+		goto out;
+	}
 
 	rargs = memdup_user(arg, sizeof(*rargs));
 	if (IS_ERR(rargs)) {
@@ -2908,6 +2912,20 @@ out:
 	return ret;
 }
 
+static long btrfs_ioctl_restripe_ctl(struct btrfs_root *root,
+				     int cmd)
+{
+	if (!capable(CAP_SYS_ADMIN))
+		return -EPERM;
+
+	switch (cmd) {
+	case BTRFS_RESTRIPE_CTL_CANCEL:
+		return btrfs_cancel_restripe(root->fs_info);
+	}
+
+	return -EINVAL;
+}
+
 long btrfs_ioctl(struct file *file, unsigned int
 		cmd, unsigned long arg)
 {
@@ -2982,6 +3000,8 @@ long btrfs_ioctl(struct file *file, unsigned int
 		return btrfs_ioctl_scrub_progress(root, argp);
 	case BTRFS_IOC_RESTRIPE:
 		return btrfs_ioctl_restripe(root, argp);
+	case BTRFS_IOC_RESTRIPE_CTL:
+		return btrfs_ioctl_restripe_ctl(root, arg);
 	}
 
 	return -ENOTTY;
diff --git a/fs/btrfs/ioctl.h b/fs/btrfs/ioctl.h
index 798f1d4..4f6ead5 100644
--- a/fs/btrfs/ioctl.h
+++ b/fs/btrfs/ioctl.h
@@ -109,6 +109,8 @@ struct btrfs_ioctl_fs_info_args {
 	__u64 reserved[124];			/* pad to 1k */
 };
 
+#define BTRFS_RESTRIPE_CTL_CANCEL	1
+
 struct btrfs_restripe_args {
 	__u64 profiles;
 	__u64 usage;
@@ -285,4 +287,5 @@ struct btrfs_ioctl_space_args {
 				 struct btrfs_ioctl_fs_info_args)
 #define BTRFS_IOC_RESTRIPE _IOW(BTRFS_IOCTL_MAGIC, 32, \
 				struct btrfs_ioctl_restripe_args)
+#define BTRFS_IOC_RESTRIPE_CTL _IOW(BTRFS_IOCTL_MAGIC, 33, int)
 #endif
diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index 4490124..cd43368 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -2553,6 +2553,13 @@ static int __btrfs_restripe(struct btrfs_root *dev_root)
 	key.type = BTRFS_CHUNK_ITEM_KEY;
 
 	while (1) {
+		struct btrfs_fs_info *fs_info = dev_root->fs_info;
+
+		if (test_bit(RESTRIPE_CANCEL_REQ, &fs_info->restripe_state)) {
+			ret = -ECANCELED;
+			goto error;
+		}
+
 		ret = btrfs_search_slot(NULL, chunk_root, &key, path, 0, 0);
 		if (ret < 0)
 			goto error;
@@ -2715,16 +2722,25 @@ do_restripe:
 	set_restripe_control(rctl, resume);
 	mutex_unlock(&fs_info->volume_mutex);
 
+	set_bit(RESTRIPE_RUNNING, &fs_info->restripe_state);
+	mutex_unlock(&fs_info->restripe_mutex);
+
 	err = __btrfs_restripe(fs_info->dev_root);
 
-	mutex_lock(&fs_info->volume_mutex);
+	mutex_lock(&fs_info->restripe_mutex);
+	clear_bit(RESTRIPE_RUNNING, &fs_info->restripe_state);
 
-	unset_restripe_control(fs_info);
-	ret = del_restripe_item(fs_info->tree_root);
-	BUG_ON(ret);
+	if (test_bit(RESTRIPE_CANCEL_REQ, &fs_info->restripe_state)) {
+		mutex_lock(&fs_info->volume_mutex);
 
-	mutex_unlock(&fs_info->volume_mutex);
+		unset_restripe_control(fs_info);
+		ret = del_restripe_item(fs_info->tree_root);
+		BUG_ON(ret);
+
+		mutex_unlock(&fs_info->volume_mutex);
+	}
 
+	wake_up(&fs_info->restripe_wait);
 	return err;
 
 out:
@@ -2807,6 +2823,41 @@ out:
 	return ret;
 }
 
+int btrfs_cancel_restripe(struct btrfs_fs_info *fs_info)
+{
+	int ret = 0;
+
+	mutex_lock(&fs_info->restripe_mutex);
+	if (!fs_info->restripe_ctl) {
+		ret = -ENOTCONN;
+		goto out;
+	}
+
+	if (test_bit(RESTRIPE_RUNNING, &fs_info->restripe_state)) {
+		set_bit(RESTRIPE_CANCEL_REQ, &fs_info->restripe_state);
+		while (test_bit(RESTRIPE_RUNNING, &fs_info->restripe_state)) {
+			mutex_unlock(&fs_info->restripe_mutex);
+			wait_event(fs_info->restripe_wait,
+				   !test_bit(RESTRIPE_RUNNING,
+					     &fs_info->restripe_state));
+			mutex_lock(&fs_info->restripe_mutex);
+		}
+		clear_bit(RESTRIPE_CANCEL_REQ, &fs_info->restripe_state);
+	} else {
+		mutex_lock(&fs_info->volume_mutex);
+
+		unset_restripe_control(fs_info);
+		ret = del_restripe_item(fs_info->tree_root);
+		BUG_ON(ret);
+
+		mutex_unlock(&fs_info->volume_mutex);
+	}
+
+out:
+	mutex_unlock(&fs_info->restripe_mutex);
+	return ret;
+}
+
 /*
  * shrinking a device means finding all of the device extents past
  * the new size, and then following the back refs to the chunks.
diff --git a/fs/btrfs/volumes.h b/fs/btrfs/volumes.h
index 6fcb4a5..dd1fa7f 100644
--- a/fs/btrfs/volumes.h
+++ b/fs/btrfs/volumes.h
@@ -199,6 +199,12 @@ struct map_lookup {
 #define BTRFS_RESTRIPE_ARGS_CONVERT	(1ULL << 8)
 #define BTRFS_RESTRIPE_ARGS_SOFT	(1ULL << 9)
 
+/*
+ * Restripe state bits
+ */
+#define RESTRIPE_RUNNING	0
+#define RESTRIPE_CANCEL_REQ	1
+
 struct btrfs_restripe_args;
 struct restripe_control {
 	struct btrfs_fs_info *fs_info;
@@ -254,6 +260,7 @@ int btrfs_init_new_device(struct btrfs_root *root, char *path);
 int btrfs_balance(struct btrfs_root *dev_root);
 int btrfs_restripe(struct restripe_control *rctl, int resume);
 int btrfs_recover_restripe(struct btrfs_root *tree_root);
+int btrfs_cancel_restripe(struct btrfs_fs_info *fs_info);
 int btrfs_chunk_readonly(struct btrfs_root *root, u64 chunk_offset);
 int find_free_dev_extent(struct btrfs_trans_handle *trans,
 			 struct btrfs_device *device, u64 num_bytes,
-- 
1.7.5.4
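The cancel patch's statement that we wait until relocation of the current
block group is finished follows from where the CANCEL_REQ bit is tested:
only at the top of the chunk loop. A single-threaded user-space sketch of
that behaviour is below. It is not kernel code; the model_* helpers stand in
for the kernel bitops, and the cancel_during parameter is a hypothetical hook
that simulates a cancel request arriving while a given chunk is being
relocated.

```c
#include <assert.h>
#include <errno.h>

/* Bit numbers as defined in the series' volumes.h hunk. */
#define RESTRIPE_RUNNING	0
#define RESTRIPE_CANCEL_REQ	1

/* Hypothetical model of the restripe state, not a kernel structure. */
struct restripe_model {
	unsigned long state;
	int chunks_total;
	int chunks_done;
};

static int model_test_bit(int nr, const unsigned long *addr)
{
	return (int)((*addr >> nr) & 1UL);
}

static void model_set_bit(int nr, unsigned long *addr)
{
	*addr |= 1UL << nr;
}

static void model_clear_bit(int nr, unsigned long *addr)
{
	*addr &= ~(1UL << nr);
}

/*
 * Relocate chunks one by one. A cancel request that arrives while chunk
 * number cancel_during is in flight is only noticed before the next chunk.
 */
static int run_restripe_model(struct restripe_model *m, int cancel_during)
{
	int ret = 0;

	assert(m->chunks_done <= m->chunks_total);

	model_set_bit(RESTRIPE_RUNNING, &m->state);
	while (m->chunks_done < m->chunks_total) {
		if (model_test_bit(RESTRIPE_CANCEL_REQ, &m->state)) {
			ret = -ECANCELED;
			break;
		}
		if (m->chunks_done == cancel_during)
			model_set_bit(RESTRIPE_CANCEL_REQ, &m->state);
		m->chunks_done++;	/* the in-flight chunk always completes */
	}
	model_clear_bit(RESTRIPE_RUNNING, &m->state);
	return ret;
}
```

With five chunks and a cancel arriving during the third, the model finishes
that third chunk and then stops, which is the granularity TODO item 1 in the
cover letter wants to improve on.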
Implement an ioctl for pausing restriper. This pauses the relocation,
but restripe is still considered to be "in progress": the restripe item
is not deleted, other volume operations cannot be started, etc. If
paused in the middle of a profile changing operation we will continue
making allocations with the target profile.

Add a hook to close_ctree() to be able to pause restriper and free its
data structures on unmount. (It's safe to unmount when restriper is in
'paused' state, we will resume with the same parameters on the next
mount)

Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
---
 fs/btrfs/disk-io.c |    3 +++
 fs/btrfs/ioctl.c   |    2 ++
 fs/btrfs/ioctl.h   |    1 +
 fs/btrfs/volumes.c |   44 ++++++++++++++++++++++++++++++++++++++++++--
 fs/btrfs/volumes.h |    2 ++
 5 files changed, 50 insertions(+), 2 deletions(-)

diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index 662a6e6..7db5c50 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -2542,6 +2542,9 @@ int close_ctree(struct btrfs_root *root)
 	fs_info->closing = 1;
 	smp_mb();
 
+	/* pause restriper and free restripe_ctl */
+	btrfs_pause_restripe(root->fs_info, 1);
+
 	btrfs_scrub_cancel(root);
 
 	/* wait for any defraggers to finish */
diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c
index d8bdb67..61978ac 100644
--- a/fs/btrfs/ioctl.c
+++ b/fs/btrfs/ioctl.c
@@ -2921,6 +2921,8 @@ static long btrfs_ioctl_restripe_ctl(struct btrfs_root *root,
 	switch (cmd) {
 	case BTRFS_RESTRIPE_CTL_CANCEL:
 		return btrfs_cancel_restripe(root->fs_info);
+	case BTRFS_RESTRIPE_CTL_PAUSE:
+		return btrfs_pause_restripe(root->fs_info, 0);
 	}
 
 	return -EINVAL;
diff --git a/fs/btrfs/ioctl.h b/fs/btrfs/ioctl.h
index 4f6ead5..e468d5b 100644
--- a/fs/btrfs/ioctl.h
+++ b/fs/btrfs/ioctl.h
@@ -110,6 +110,7 @@ struct btrfs_ioctl_fs_info_args {
 };
 
 #define BTRFS_RESTRIPE_CTL_CANCEL	1
+#define BTRFS_RESTRIPE_CTL_PAUSE	2
 
 struct btrfs_restripe_args {
 	__u64 profiles;
diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index cd43368..65deaa7 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -2555,7 +2555,8 @@ static int __btrfs_restripe(struct btrfs_root *dev_root)
 	while (1) {
 		struct btrfs_fs_info *fs_info = dev_root->fs_info;
 
-		if (test_bit(RESTRIPE_CANCEL_REQ, &fs_info->restripe_state)) {
+		if (test_bit(RESTRIPE_CANCEL_REQ, &fs_info->restripe_state) ||
+		    test_bit(RESTRIPE_PAUSE_REQ, &fs_info->restripe_state)) {
 			ret = -ECANCELED;
 			goto error;
 		}
@@ -2730,7 +2731,9 @@ do_restripe:
 	mutex_lock(&fs_info->restripe_mutex);
 	clear_bit(RESTRIPE_RUNNING, &fs_info->restripe_state);
 
-	if (test_bit(RESTRIPE_CANCEL_REQ, &fs_info->restripe_state)) {
+	if (test_bit(RESTRIPE_CANCEL_REQ, &fs_info->restripe_state) ||
+	    (!test_bit(RESTRIPE_PAUSE_REQ, &fs_info->restripe_state) &&
+	     !test_bit(RESTRIPE_CANCEL_REQ, &fs_info->restripe_state))) {
 		mutex_lock(&fs_info->volume_mutex);
 
 		unset_restripe_control(fs_info);
@@ -2858,6 +2861,43 @@ out:
 	return ret;
 }
 
+int btrfs_pause_restripe(struct btrfs_fs_info *fs_info, int unset)
+{
+	int ret = 0;
+
+	mutex_lock(&fs_info->restripe_mutex);
+	if (!fs_info->restripe_ctl) {
+		ret = -ENOTCONN;
+		goto out;
+	}
+
+	/* only running restripe can be paused */
+	if (!test_bit(RESTRIPE_RUNNING, &fs_info->restripe_state)) {
+		ret = -ENOTCONN;
+		goto out_unset;
+	}
+
+	set_bit(RESTRIPE_PAUSE_REQ, &fs_info->restripe_state);
+	while (test_bit(RESTRIPE_RUNNING, &fs_info->restripe_state)) {
+		mutex_unlock(&fs_info->restripe_mutex);
+		wait_event(fs_info->restripe_wait,
+			   !test_bit(RESTRIPE_RUNNING,
+				     &fs_info->restripe_state));
+		mutex_lock(&fs_info->restripe_mutex);
+	}
+	clear_bit(RESTRIPE_PAUSE_REQ, &fs_info->restripe_state);
+
+out_unset:
+	if (unset) {
+		mutex_lock(&fs_info->volume_mutex);
+		unset_restripe_control(fs_info);
+		mutex_unlock(&fs_info->volume_mutex);
+	}
+out:
+	mutex_unlock(&fs_info->restripe_mutex);
+	return ret;
+}
+
 /*
  * shrinking a device means finding all of the device extents past
  * the new size, and then following the back refs to the chunks.
diff --git a/fs/btrfs/volumes.h b/fs/btrfs/volumes.h
index dd1fa7f..b8c234a 100644
--- a/fs/btrfs/volumes.h
+++ b/fs/btrfs/volumes.h
@@ -204,6 +204,7 @@ struct map_lookup {
  */
 #define RESTRIPE_RUNNING	0
 #define RESTRIPE_CANCEL_REQ	1
+#define RESTRIPE_PAUSE_REQ	2
 
 struct btrfs_restripe_args;
 struct restripe_control {
@@ -261,6 +262,7 @@ int btrfs_balance(struct btrfs_root *dev_root);
 int btrfs_restripe(struct restripe_control *rctl, int resume);
 int btrfs_recover_restripe(struct btrfs_root *tree_root);
 int btrfs_cancel_restripe(struct btrfs_fs_info *fs_info);
+int btrfs_pause_restripe(struct btrfs_fs_info *fs_info, int unset);
 int btrfs_chunk_readonly(struct btrfs_root *root, u64 chunk_offset);
 int find_free_dev_extent(struct btrfs_trans_handle *trans,
 			 struct btrfs_device *device, u64 num_bytes,
-- 
1.7.5.4
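The condition that decides whether the restripe item is deleted once
__btrfs_restripe() returns in the pause patch is a three-term boolean
expression over the CANCEL_REQ and PAUSE_REQ bits. The user-space sketch
below spells out its truth table. The function names are invented for
illustration, and the shorter form is only an observation about boolean
logic, not a claim about how the code should be written.

```c
#include <assert.h>
#include <stdbool.h>

/* The condition as it appears in the patch, with the bits as booleans. */
static bool delete_item_as_written(bool cancel_req, bool pause_req)
{
	return cancel_req || (!pause_req && !cancel_req);
}

/* Logically equivalent shorter form: keep the item only on a pure pause. */
static bool delete_item_simplified(bool cancel_req, bool pause_req)
{
	return cancel_req || !pause_req;
}

/* Walk all four input combinations and check that both forms agree. */
static void check_equivalence(void)
{
	int c, p;

	for (c = 0; c <= 1; c++)
		for (p = 0; p <= 1; p++)
			assert(delete_item_as_written(c, p) ==
			       delete_item_simplified(c, p));
}
```

In words: the item survives only when a pause was requested and no cancel
was; normal completion (neither bit set) and cancellation both remove it.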
Ilya Dryomov
2011-Aug-23 20:01 UTC
[PATCH 18/21] Btrfs: allow for resuming restriper after it was paused
Implement an ioctl for resuming restriper. We use the same heuristics
used when recovering restripe after a crash to try to start where we
left off last time. If needed those parameters can be made configurable
through the userspace "resume" command in future.

Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
---
 fs/btrfs/ioctl.c   |    2 ++
 fs/btrfs/ioctl.h   |    1 +
 fs/btrfs/volumes.c |   25 +++++++++++++++++++++++++
 fs/btrfs/volumes.h |    1 +
 4 files changed, 29 insertions(+), 0 deletions(-)

diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c
index 61978ac..cb2f420 100644
--- a/fs/btrfs/ioctl.c
+++ b/fs/btrfs/ioctl.c
@@ -2923,6 +2923,8 @@ static long btrfs_ioctl_restripe_ctl(struct btrfs_root *root,
 		return btrfs_cancel_restripe(root->fs_info);
 	case BTRFS_RESTRIPE_CTL_PAUSE:
 		return btrfs_pause_restripe(root->fs_info, 0);
+	case BTRFS_RESTRIPE_CTL_RESUME:
+		return btrfs_resume_restripe(root->fs_info);
 	}
 
 	return -EINVAL;
diff --git a/fs/btrfs/ioctl.h b/fs/btrfs/ioctl.h
index e468d5b..365d06c 100644
--- a/fs/btrfs/ioctl.h
+++ b/fs/btrfs/ioctl.h
@@ -111,6 +111,7 @@ struct btrfs_ioctl_fs_info_args {
 
 #define BTRFS_RESTRIPE_CTL_CANCEL	1
 #define BTRFS_RESTRIPE_CTL_PAUSE	2
+#define BTRFS_RESTRIPE_CTL_RESUME	3
 
 struct btrfs_restripe_args {
 	__u64 profiles;
diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index 65deaa7..bfe2b03 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -2898,6 +2898,31 @@ out:
 	return ret;
 }
 
+int btrfs_resume_restripe(struct btrfs_fs_info *fs_info)
+{
+	int ret;
+
+	if (fs_info->sb->s_flags & MS_RDONLY)
+		return -EROFS;
+
+	mutex_lock(&fs_info->restripe_mutex);
+	if (!fs_info->restripe_ctl) {
+		ret = -ENOTCONN;
+		goto out;
+	}
+
+	if (test_bit(RESTRIPE_RUNNING, &fs_info->restripe_state)) {
+		ret = -EINPROGRESS;
+		goto out;
+	}
+
+	ret = btrfs_restripe(fs_info->restripe_ctl, 1);
+
+out:
+	mutex_unlock(&fs_info->restripe_mutex);
+	return ret;
+}
+
 /*
  * shrinking a device means finding all of the device extents past
  * the new size, and then following the back refs to the chunks.
diff --git a/fs/btrfs/volumes.h b/fs/btrfs/volumes.h
index b8c234a..c0652c9 100644
--- a/fs/btrfs/volumes.h
+++ b/fs/btrfs/volumes.h
@@ -263,6 +263,7 @@ int btrfs_restripe(struct restripe_control *rctl, int resume);
 int btrfs_recover_restripe(struct btrfs_root *tree_root);
 int btrfs_cancel_restripe(struct btrfs_fs_info *fs_info);
 int btrfs_pause_restripe(struct btrfs_fs_info *fs_info, int unset);
+int btrfs_resume_restripe(struct btrfs_fs_info *fs_info);
 int btrfs_chunk_readonly(struct btrfs_root *root, u64 chunk_offset);
 int find_free_dev_extent(struct btrfs_trans_handle *trans,
 			 struct btrfs_device *device, u64 num_bytes,
-- 
1.7.5.4
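For anyone writing the userspace side of "resume", the errno values that
btrfs_resume_restripe() can return before it hands over to btrfs_restripe()
are fixed by the order of its checks. A small user-space sketch of just that
ordering follows; the struct and function names are hypothetical, and the
three booleans stand in for the read-only superblock flag, a non-NULL
restripe_ctl and the RESTRIPE_RUNNING bit.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical snapshot of the state the resume path looks at. */
struct resume_model {
	bool read_only;	/* filesystem mounted read-only */
	bool have_ctl;	/* a restripe is recorded (paused or running) */
	bool running;	/* relocation is currently in progress */
};

/* Returns 0 if resume may proceed, otherwise the negative errno. */
static int resume_precheck(const struct resume_model *m)
{
	assert(m != 0);

	if (m->read_only)
		return -EROFS;
	if (!m->have_ctl)
		return -ENOTCONN;	/* nothing to resume */
	if (m->running)
		return -EINPROGRESS;	/* already running */
	return 0;
}
```

A tool built on this could map ENOTCONN to "no restripe to resume" and
EINPROGRESS to "restripe is already running"; those message strings are
suggestions only.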
Since the restriper kthread starts involuntarily on mount and can suck
CPU and memory bandwidth, add a mount option to forcefully skip it. In
that case the restriper hangs around in paused state and can be resumed
from userspace when it's convenient.

Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
---
 fs/btrfs/ctree.h   |    1 +
 fs/btrfs/super.c   |    8 +++++++-
 fs/btrfs/volumes.c |   15 +++++++++++++--
 3 files changed, 21 insertions(+), 3 deletions(-)

diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
index 8e764d9..0eaa08d 100644
--- a/fs/btrfs/ctree.h
+++ b/fs/btrfs/ctree.h
@@ -1432,6 +1432,7 @@ struct btrfs_ioctl_defrag_range_args {
 #define BTRFS_MOUNT_ENOSPC_DEBUG	(1 << 15)
 #define BTRFS_MOUNT_AUTO_DEFRAG		(1 << 16)
 #define BTRFS_MOUNT_INODE_MAP_CACHE	(1 << 17)
+#define BTRFS_MOUNT_SKIP_RESTRIPE	(1 << 18)
 
 #define btrfs_clear_opt(o, opt)		((o) &= ~BTRFS_MOUNT_##opt)
 #define btrfs_set_opt(o, opt)		((o) |= BTRFS_MOUNT_##opt)
diff --git a/fs/btrfs/super.c b/fs/btrfs/super.c
index 15634d4..1ef8c33 100644
--- a/fs/btrfs/super.c
+++ b/fs/btrfs/super.c
@@ -162,7 +162,7 @@ enum {
 	Opt_notreelog, Opt_ratio, Opt_flushoncommit, Opt_discard,
 	Opt_space_cache, Opt_clear_cache, Opt_user_subvol_rm_allowed,
 	Opt_enospc_debug, Opt_subvolrootid, Opt_defrag,
-	Opt_inode_cache, Opt_err,
+	Opt_inode_cache, Opt_skip_restripe, Opt_err,
 };
 
 static match_table_t tokens = {
@@ -195,6 +195,7 @@ static match_table_t tokens = {
 	{Opt_subvolrootid, "subvolrootid=%d"},
 	{Opt_defrag, "autodefrag"},
 	{Opt_inode_cache, "inode_cache"},
+	{Opt_skip_restripe, "skip_restripe"},
 	{Opt_err, NULL},
 };
 
@@ -381,6 +382,9 @@ int btrfs_parse_options(struct btrfs_root *root, char *options)
 			printk(KERN_INFO "btrfs: enabling auto defrag");
 			btrfs_set_opt(info->mount_opt, AUTO_DEFRAG);
 			break;
+		case Opt_skip_restripe:
+			btrfs_set_opt(info->mount_opt, SKIP_RESTRIPE);
+			break;
 		case Opt_err:
 			printk(KERN_INFO "btrfs: unrecognized mount option "
 			       "'%s'\n", p);
@@ -729,6 +733,8 @@ static int btrfs_show_options(struct seq_file *seq, struct vfsmount *vfs)
 		seq_puts(seq, ",autodefrag");
 	if (btrfs_test_opt(root, INODE_MAP_CACHE))
 		seq_puts(seq, ",inode_cache");
+	if (btrfs_test_opt(root, SKIP_RESTRIPE))
+		seq_puts(seq, ",skip_restripe");
 	return 0;
 }
 
diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index bfe2b03..d8958e2 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -2756,13 +2756,24 @@ static int restriper_kthread(void *data)
 {
 	struct restripe_control *rctl = (struct restripe_control *)data;
 	struct btrfs_fs_info *fs_info = rctl->fs_info;
-	int ret;
+	int ret = 0;
 
 	mutex_lock(&fs_info->restripe_mutex);
 
-	printk(KERN_INFO "btrfs: continuing restripe\n");
+	if (btrfs_test_opt(fs_info->tree_root, SKIP_RESTRIPE)) {
+		mutex_lock(&fs_info->volume_mutex);
+		set_restripe_control(rctl, 0);
+		mutex_unlock(&fs_info->volume_mutex);
+
+		printk(KERN_INFO "btrfs: force skipping restripe\n");
+		goto out;
+	} else {
+		printk(KERN_INFO "btrfs: continuing restripe\n");
+	}
+
 	ret = btrfs_restripe(rctl, 1);
 
+out:
 	mutex_unlock(&fs_info->restripe_mutex);
 	return ret;
 }
-- 
1.7.5.4
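The skip_restripe mount option boils down to one more bit in the mount_opt
mask that the restriper kthread tests before doing any work. The sketch
below models that in user space. It is an illustration only: the parser is
a toy strtok() loop rather than the kernel's match_token() table, the
function names are hypothetical, and only the three bit values are copied
from the ctree.h hunk.

```c
#include <assert.h>
#include <string.h>

/* Bit values as in the ctree.h hunk of this patch. */
#define MOUNT_AUTO_DEFRAG	(1 << 16)
#define MOUNT_INODE_MAP_CACHE	(1 << 17)
#define MOUNT_SKIP_RESTRIPE	(1 << 18)

/* Toy parser: turn a comma-separated option string into a bit mask. */
static unsigned long parse_opts_model(const char *options)
{
	char buf[128];
	unsigned long opts = 0;
	char *tok;

	assert(strlen(options) < sizeof(buf));
	strcpy(buf, options);	/* strtok() modifies its input */

	for (tok = strtok(buf, ","); tok; tok = strtok(NULL, ",")) {
		if (!strcmp(tok, "autodefrag"))
			opts |= MOUNT_AUTO_DEFRAG;
		else if (!strcmp(tok, "inode_cache"))
			opts |= MOUNT_INODE_MAP_CACHE;
		else if (!strcmp(tok, "skip_restripe"))
			opts |= MOUNT_SKIP_RESTRIPE;
	}
	return opts;
}

/* The decision the kthread makes: run the restripe or stay paused. */
static int should_start_restripe(unsigned long opts)
{
	return !(opts & MOUNT_SKIP_RESTRIPE);
}
```

Note that in the patch the skipped restripe is still installed via
set_restripe_control(), so the restripe stays recorded as present and other
volume operations are presumably held off until it is resumed or cancelled.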
Ilya Dryomov
2011-Aug-23 20:02 UTC
[PATCH 20/21] Btrfs: get rid of btrfs_balance() function
Remove btrfs_balance(). The old balancing ioctl now uses restriper
infrastructure, just w/o using any filters.

Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
---
 fs/btrfs/ioctl.c   |   38 +++++++++++++++++-
 fs/btrfs/volumes.c |  115 ++++-----------------------------------------------
 fs/btrfs/volumes.h |    1 -
 3 files changed, 46 insertions(+), 108 deletions(-)

diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c
index cb2f420..4f29149 100644
--- a/fs/btrfs/ioctl.c
+++ b/fs/btrfs/ioctl.c
@@ -2864,6 +2864,42 @@ static long btrfs_ioctl_scrub_progress(struct btrfs_root *root,
 	return ret;
 }
 
+static long btrfs_ioctl_balance(struct btrfs_root *root)
+{
+	struct btrfs_fs_info *fs_info = root->fs_info;
+	struct restripe_control *rctl;
+	int ret;
+
+	if (!capable(CAP_SYS_ADMIN))
+		return -EPERM;
+
+	if (fs_info->sb->s_flags & MS_RDONLY)
+		return -EROFS;
+
+	mutex_lock(&fs_info->restripe_mutex);
+	if (fs_info->restripe_ctl) {
+		ret = -EINPROGRESS;
+		goto out;
+	}
+
+	rctl = kzalloc(sizeof(*rctl), GFP_NOFS);
+	if (!rctl) {
+		ret = -ENOMEM;
+		goto out;
+	}
+
+	rctl->fs_info = fs_info;
+	/* relocate everything - no filters */
+	rctl->flags |= BTRFS_RESTRIPE_TYPE_MASK;
+
+	ret = btrfs_restripe(rctl, 0);
+
+	/* rctl freed in unset_restripe_control */
+out:
+	mutex_unlock(&fs_info->restripe_mutex);
+	return ret;
+}
+
 static long btrfs_ioctl_restripe(struct btrfs_root *root, void __user *arg)
 {
 	struct btrfs_ioctl_restripe_args *rargs;
@@ -2974,7 +3010,7 @@ long btrfs_ioctl(struct file *file, unsigned int
 	case BTRFS_IOC_DEV_INFO:
 		return btrfs_ioctl_dev_info(root, argp);
 	case BTRFS_IOC_BALANCE:
-		return btrfs_balance(root->fs_info->dev_root);
+		return btrfs_ioctl_balance(root);
 	case BTRFS_IOC_CLONE:
 		return btrfs_ioctl_clone(file, arg, 0, 0, 0);
 	case BTRFS_IOC_CLONE_RANGE:
diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index d8958e2..ead4996 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -2045,112 +2045,6 @@ error:
 	return ret;
 }
 
-static u64 div_factor(u64 num, int factor)
-{
-	if (factor == 10)
-		return num;
-	num *= factor;
-	do_div(num, 10);
-	return num;
-}
-
-int btrfs_balance(struct btrfs_root *dev_root)
-{
-	int ret;
-	struct list_head *devices = &dev_root->fs_info->fs_devices->devices;
-	struct btrfs_device *device;
-	u64 old_size;
-	u64 size_to_free;
-	struct btrfs_path *path;
-	struct btrfs_key key;
-	struct btrfs_root *chunk_root = dev_root->fs_info->chunk_root;
-	struct btrfs_trans_handle *trans;
-	struct btrfs_key found_key;
-
-	if (dev_root->fs_info->sb->s_flags & MS_RDONLY)
-		return -EROFS;
-
-	if (!capable(CAP_SYS_ADMIN))
-		return -EPERM;
-
-	mutex_lock(&dev_root->fs_info->volume_mutex);
-	dev_root = dev_root->fs_info->dev_root;
-
-	/* step one make some room on all the devices */
-	list_for_each_entry(device, devices, dev_list) {
-		old_size = device->total_bytes;
-		size_to_free = div_factor(old_size, 1);
-		size_to_free = min(size_to_free, (u64)1 * 1024 * 1024);
-		if (!device->writeable ||
-		    device->total_bytes - device->bytes_used > size_to_free)
-			continue;
-
-		ret = btrfs_shrink_device(device, old_size - size_to_free);
-		if (ret == -ENOSPC)
-			break;
-		BUG_ON(ret);
-
-		trans = btrfs_start_transaction(dev_root, 0);
-		BUG_ON(IS_ERR(trans));
-
-		ret = btrfs_grow_device(trans, device, old_size);
-		BUG_ON(ret);
-
-		btrfs_end_transaction(trans, dev_root);
-	}
-
-	/* step two, relocate all the chunks */
-	path = btrfs_alloc_path();
-	if (!path) {
-		ret = -ENOMEM;
-		goto error;
-	}
-	key.objectid = BTRFS_FIRST_CHUNK_TREE_OBJECTID;
-	key.offset = (u64)-1;
-	key.type = BTRFS_CHUNK_ITEM_KEY;
-
-	while (1) {
-		ret = btrfs_search_slot(NULL, chunk_root, &key, path, 0, 0);
-		if (ret < 0)
-			goto error;
-
-		/*
-		 * this shouldn't happen, it means the last relocate
-		 * failed
-		 */
-		if (ret == 0)
-			break;
-
-		ret = btrfs_previous_item(chunk_root, path, 0,
-					  BTRFS_CHUNK_ITEM_KEY);
-		if (ret)
-			break;
-
-		btrfs_item_key_to_cpu(path->nodes[0], &found_key,
-				      path->slots[0]);
-		if (found_key.objectid != key.objectid)
-			break;
-
-		/* chunk zero is special */
-		if (found_key.offset == 0)
-			break;
-
-		btrfs_release_path(path);
-		ret = btrfs_relocate_chunk(chunk_root,
-					   chunk_root->root_key.objectid,
-					   found_key.objectid,
-					   found_key.offset);
-		if (ret && ret != -ENOSPC)
-			goto error;
-		key.offset = found_key.offset - 1;
-	}
-	ret = 0;
-error:
-	btrfs_free_path(path);
-	mutex_unlock(&dev_root->fs_info->volume_mutex);
-	return ret;
-}
-
 static int insert_restripe_item(struct btrfs_root *root,
 				struct restripe_control *rctl)
 {
@@ -2500,6 +2394,15 @@ static int should_restripe_chunk(struct btrfs_root *root,
 	return 1;
 }
 
+static u64 div_factor(u64 num, int factor)
+{
+	if (factor == 10)
+		return num;
+	num *= factor;
+	do_div(num, 10);
+	return num;
+}
+
 static int __btrfs_restripe(struct btrfs_root *dev_root)
 {
 	struct list_head *devices;
diff --git a/fs/btrfs/volumes.h b/fs/btrfs/volumes.h
index c0652c9..20da71f 100644
--- a/fs/btrfs/volumes.h
+++ b/fs/btrfs/volumes.h
@@ -258,7 +258,6 @@ struct btrfs_device *btrfs_find_device(struct btrfs_root *root, u64 devid,
 				       u8 *uuid, u8 *fsid);
 int btrfs_shrink_device(struct btrfs_device *device, u64 new_size);
 int btrfs_init_new_device(struct btrfs_root *root, char *path);
-int btrfs_balance(struct btrfs_root *dev_root);
 int btrfs_restripe(struct restripe_control *rctl, int resume);
 int btrfs_recover_restripe(struct btrfs_root *tree_root);
 int btrfs_cancel_restripe(struct btrfs_fs_info *fs_info);
-- 
1.7.5.4
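The div_factor() helper that this patch moves next to __btrfs_restripe() is
what the removed btrfs_balance() used in its "make some room" step. The
user-space sketch below reproduces the arithmetic with plain 64-bit division
in place of do_div(); the _model suffix marks both functions as
illustrations rather than kernel code.

```c
#include <assert.h>
#include <stdint.h>

/* num * factor / 10, with factor == 10 short-circuited as in the kernel. */
static uint64_t div_factor_model(uint64_t num, int factor)
{
	assert(factor >= 0);

	if (factor == 10)
		return num;
	num *= (uint64_t)factor;
	return num / 10;	/* the kernel uses do_div(num, 10) here */
}

/*
 * Space to free on one device before relocating: a tenth of the device,
 * capped at 1 MiB, as in the removed btrfs_balance().
 */
static uint64_t size_to_free_model(uint64_t total_bytes)
{
	uint64_t want = div_factor_model(total_bytes, 1);
	uint64_t cap = (uint64_t)1 * 1024 * 1024;

	return want < cap ? want : cap;
}
```

For any device larger than 10 MiB the cap wins, so in practice the step
shrinks and regrows each writeable device by 1 MiB when it has less than
that much free.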
Signed-off-by: Ilya Dryomov <idryomov@gmail.com> --- fs/btrfs/ioctl.c | 45 +++++++++++++++++++++++++++++++++++++++++++++ fs/btrfs/ioctl.h | 2 ++ fs/btrfs/volumes.c | 40 ++++++++++++++++++++++++++++++++++------ fs/btrfs/volumes.h | 3 +++ 4 files changed, 84 insertions(+), 6 deletions(-) diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c index 4f29149..a342544 100644 --- a/fs/btrfs/ioctl.c +++ b/fs/btrfs/ioctl.c @@ -2966,6 +2966,49 @@ static long btrfs_ioctl_restripe_ctl(struct btrfs_root *root, return -EINVAL; } +static long btrfs_ioctl_restripe_progress(struct btrfs_root *root, + void __user *arg) +{ + struct btrfs_fs_info *fs_info = root->fs_info; + struct btrfs_ioctl_restripe_args *rargs; + struct restripe_control *rctl; + int ret = 0; + + if (!capable(CAP_SYS_ADMIN)) + return -EPERM; + + mutex_lock(&fs_info->restripe_mutex); + if (!(rctl = fs_info->restripe_ctl)) { + ret = -ENOTCONN; + goto out; + } + + rargs = kzalloc(sizeof(*rargs), GFP_NOFS); + if (!rargs) { + ret = -ENOMEM; + goto out; + } + + rargs->flags = rctl->flags; + rargs->state = fs_info->restripe_state; + + memcpy(&rargs->data, &rctl->data, sizeof(rargs->data)); + memcpy(&rargs->sys, &rctl->sys, sizeof(rargs->sys)); + memcpy(&rargs->meta, &rctl->meta, sizeof(rargs->meta)); + + spin_lock(&fs_info->restripe_lock); + memcpy(&rargs->stat, &rctl->stat, sizeof(rargs->stat)); + spin_unlock(&fs_info->restripe_lock); + + if (copy_to_user(arg, rargs, sizeof(*rargs))) + ret = -EFAULT; + + kfree(rargs); +out: + mutex_unlock(&fs_info->restripe_mutex); + return ret; +} + long btrfs_ioctl(struct file *file, unsigned int cmd, unsigned long arg) { @@ -3042,6 +3085,8 @@ long btrfs_ioctl(struct file *file, unsigned int return btrfs_ioctl_restripe(root, argp); case BTRFS_IOC_RESTRIPE_CTL: return btrfs_ioctl_restripe_ctl(root, arg); + case BTRFS_IOC_RESTRIPE_PROGRESS: + return btrfs_ioctl_restripe_progress(root, argp); } return -ENOTTY; diff --git a/fs/btrfs/ioctl.h b/fs/btrfs/ioctl.h index 365d06c..2154816 100644 --- 
a/fs/btrfs/ioctl.h +++ b/fs/btrfs/ioctl.h @@ -290,4 +290,6 @@ struct btrfs_ioctl_space_args { #define BTRFS_IOC_RESTRIPE _IOW(BTRFS_IOCTL_MAGIC, 32, \ struct btrfs_ioctl_restripe_args) #define BTRFS_IOC_RESTRIPE_CTL _IOW(BTRFS_IOCTL_MAGIC, 33, int) +#define BTRFS_IOC_RESTRIPE_PROGRESS _IOR(BTRFS_IOCTL_MAGIC, 34, \ + struct btrfs_ioctl_restripe_args) #endif diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c index ead4996..9a248b9 100644 --- a/fs/btrfs/volumes.c +++ b/fs/btrfs/volumes.c @@ -2187,8 +2187,10 @@ static void set_restripe_control(struct restripe_control *rctl, int update) spin_lock(&fs_info->restripe_lock); fs_info->restripe_ctl = rctl; - if (update) + if (update) { update_restripe_args(rctl); + memset(&rctl->stat, 0, sizeof(rctl->stat)); + } spin_unlock(&fs_info->restripe_lock); } @@ -2419,6 +2421,7 @@ static int __btrfs_restripe(struct btrfs_root *dev_root) int slot; int ret; int enospc_errors = 0; + bool counting_only = true; /* step one make some room on all the devices */ devices = &dev_root->fs_info->fs_devices->devices; @@ -2451,12 +2454,14 @@ static int __btrfs_restripe(struct btrfs_root *dev_root) goto error; } +again: key.objectid = BTRFS_FIRST_CHUNK_TREE_OBJECTID; key.offset = (u64)-1; key.type = BTRFS_CHUNK_ITEM_KEY; while (1) { struct btrfs_fs_info *fs_info = dev_root->fs_info; + struct restripe_control *rctl = fs_info->restripe_ctl; if (test_bit(RESTRIPE_CANCEL_REQ, &fs_info->restripe_state) || test_bit(RESTRIPE_PAUSE_REQ, &fs_info->restripe_state)) { @@ -2493,25 +2498,48 @@ static int __btrfs_restripe(struct btrfs_root *dev_root) chunk = btrfs_item_ptr(leaf, slot, struct btrfs_chunk); - if (!should_restripe_chunk(chunk_root, leaf, chunk, - found_key.offset)) { - btrfs_release_path(path); - goto loop; + if (!counting_only) { + spin_lock(&fs_info->restripe_lock); + rctl->stat.considered++; + spin_unlock(&fs_info->restripe_lock); } + ret = should_restripe_chunk(chunk_root, leaf, chunk, + found_key.offset); btrfs_release_path(path); + if 
(!ret) + goto loop; + + if (counting_only) { + spin_lock(&fs_info->restripe_lock); + rctl->stat.expected++; + spin_unlock(&fs_info->restripe_lock); + goto loop; + } + ret = btrfs_relocate_chunk(chunk_root, chunk_root->root_key.objectid, found_key.objectid, found_key.offset); if (ret && ret != -ENOSPC) goto error; - if (ret == -ENOSPC) + if (ret == -ENOSPC) { enospc_errors++; + } else { + spin_lock(&fs_info->restripe_lock); + rctl->stat.completed++; + spin_unlock(&fs_info->restripe_lock); + } loop: key.offset = found_key.offset - 1; } + if (counting_only) { + btrfs_release_path(path); + counting_only = false; + goto again; + } + error: btrfs_free_path(path); if (enospc_errors) { diff --git a/fs/btrfs/volumes.h b/fs/btrfs/volumes.h index 20da71f..5ca3b3b 100644 --- a/fs/btrfs/volumes.h +++ b/fs/btrfs/volumes.h @@ -207,6 +207,7 @@ struct map_lookup { #define RESTRIPE_PAUSE_REQ 2 struct btrfs_restripe_args; +struct btrfs_restripe_progress; struct restripe_control { struct btrfs_fs_info *fs_info; u64 flags; @@ -214,6 +215,8 @@ struct restripe_control { struct btrfs_restripe_args data; struct btrfs_restripe_args sys; struct btrfs_restripe_args meta; + + struct btrfs_restripe_progress stat; }; int btrfs_account_dev_extents_size(struct btrfs_device *device, u64 start, -- 1.7.5.4 -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
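The progress accounting in this patch is a two-pass walk over the chunk tree: the first pass only counts the chunks that pass the filters (expected), the second pass relocates them while bumping considered and completed. A simplified user-space model of that control flow follows. The counter names come from the diff; everything prefixed `model_` is invented for illustration, chunks are reduced to an array of "selected" flags, and locking, ENOSPC handling and pause/cancel are left out, so this is a sketch rather than the kernel code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Counters with the same names as rctl->stat in the patch. */
struct model_progress {
	uint64_t expected;   /* chunks that pass the filters (pass 1) */
	uint64_t considered; /* chunks looked at during pass 2 */
	uint64_t completed;  /* chunks relocated during pass 2 */
};

/* Hypothetical stand-in for should_restripe_chunk(). */
static bool model_should_restripe(const bool *selected, size_t i)
{
	return selected[i];
}

static void model_restripe(const bool *selected, size_t nr_chunks,
			   struct model_progress *stat)
{
	bool counting_only = true;
	size_t i;

	assert(stat != NULL);
	stat->expected = stat->considered = stat->completed = 0;
again:
	/* like the kernel loop, walk from the highest offset down */
	for (i = nr_chunks; i-- > 0; ) {
		if (!counting_only)
			stat->considered++;
		if (!model_should_restripe(selected, i))
			continue;
		if (counting_only) {
			stat->expected++;
			continue;
		}
		/* relocation would happen here; the model assumes success */
		stat->completed++;
	}
	if (counting_only) {
		counting_only = false;
		goto again;
	}
}
```

With this split, user space can show completed against expected as a progress ratio, at the cost of walking the chunk tree twice.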
Hi,

I've hit a problem with restriper, but under rather unclear conditions:

[12308.210636] ------------[ cut here ]------------
[12308.214185] kernel BUG at fs/btrfs/relocation.c:2047!
[12308.214185] invalid opcode: 0000 [#1] SMP
[12308.214185] CPU 0
[12308.214185] Modules linked in: loop btrfs aoe
[12308.214185]
[12308.214185] Pid: 31102, comm: btrfs Not tainted 3.1.0-rc7-default+ #32 Intel Corporation Santa Rosa platform/Matanzas
[12308.214185] RIP: 0010:[<ffffffffa0084af5>] [<ffffffffa0084af5>] merge_reloc_root+0x5d5/0x600 [btrfs]
[12308.214185] RSP: 0018:ffff88003e0159f8 EFLAGS: 00010293
[12308.214185] RAX: 00000000ffffffe4 RBX: ffff880051bc1c70 RCX: 0000000000000000
[12308.214185] RDX: 0000000000000000 RSI: 0000000000000001 RDI: ffff880053a9ccb8
[12308.214185] RBP: ffff88003e015ae8 R08: 0000000000000000 R09: 0000000000000000
[12308.214185] R10: 0000000000000000 R11: 0000000000000000 R12: ffff880075041000
[12308.214185] R13: ffff8800585bb198 R14: ffff880000000000 R15: ffff880026e04000
[12308.214185] FS: 00007fda377f3740(0000) GS:ffff88007e400000(0000) knlGS:0000000000000000
[12308.214185] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[12308.214185] CR2: 00007f049bb1c000 CR3: 0000000026c97000 CR4: 00000000000006f0
[12308.214185] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[12308.214185] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[12308.214185] Process btrfs (pid: 31102, threadinfo ffff88003e014000, task ffff880040d549c0)
[12308.214185] Stack:
[12308.214185] 000000003e015a08 ffff880057b98070 ffff880026fc30fc 0000000000000246
[12308.214185] ffff880057b98070 ffff880057b98058 000000000000e000 ffff880026fc3000
[12308.214185] ffff880057b98058 ffff880057b98058 ffff88003e015a68 ffffffff81c2835b
[12308.214185] Call Trace:
[12308.214185] [<ffffffff81c2835b>] ? _raw_spin_unlock+0x2b/0x50
[12308.214185] [<ffffffffa00364ad>] ? btrfs_read_fs_root_no_name+0x1fd/0x310 [btrfs]
[12308.214185] [<ffffffffa0084c44>] merge_reloc_roots+0x124/0x150 [btrfs]
[12308.214185] [<ffffffffa0085258>] relocate_block_group+0x398/0x610 [btrfs]
[12308.214185] [<ffffffffa003bcf7>] ? btrfs_clean_old_snapshots+0x197/0x1c0 [btrfs]
[12308.214185] [<ffffffffa0085680>] btrfs_relocate_block_group+0x1b0/0x2e0 [btrfs]
[12308.214185] [<ffffffffa0060b7b>] btrfs_relocate_chunk+0x8b/0x6c0 [btrfs]
[12308.214185] [<ffffffff810e0e10>] ? trace_hardirqs_on_caller+0x20/0x1d0
[12308.214185] [<ffffffff81089383>] ? __wake_up+0x53/0x70
[12308.214185] [<ffffffffa006ef80>] ? btrfs_tree_read_unlock_blocking+0x40/0x60 [btrfs]
[12308.214185] [<ffffffffa0064ca9>] btrfs_restripe+0x689/0xb00 [btrfs]
[12308.214185] [<ffffffff811858e4>] ? __kmalloc+0x234/0x260
[12308.214185] [<ffffffffa006e871>] btrfs_ioctl+0x14e1/0x1560 [btrfs]
[12308.214185] [<ffffffff81c2c660>] ? do_page_fault+0x2d0/0x580
[12308.214185] [<ffffffff811a4568>] do_vfs_ioctl+0x98/0x560
[12308.214185] [<ffffffff810da369>] ? trace_hardirqs_off_caller+0x29/0xc0
[12308.214185] [<ffffffff81c28bd9>] ? retint_swapgs+0x13/0x1b
[12308.214185] [<ffffffff81192a6b>] ? fget_light+0x17b/0x3c0
[12308.214185] [<ffffffff811a4a7f>] sys_ioctl+0x4f/0x80
[12308.214185] [<ffffffff81c312c2>] system_call_fastpath+0x16/0x1b
[12308.214185] Code: ff ff 41 bd f4 ff ff ff eb b9 48 8d 95 70 ff ff ff 48 8d 75 90 4c 89 ff e8 a9 9f ff ff eb a4 48 89 df e8 cf 28 f9 ff eb 9a 0f 0b <0f> 0b 0f 0b 0f 0b be ef 07 00 00 48 c7 c7 b4 49 09 a0 e8 54 99
[12308.214185] RIP [<ffffffffa0084af5>] merge_reloc_root+0x5d5/0x600 [btrfs]
[12308.214185] RSP <ffff88003e0159f8>
[12308.652440] ---[ end trace a106d7cf9f82a8ff ]---

steps before the crash:
- data: a freshly created raid10, 5 devices with about 4 gigs of data,
  lots of chained snapshots, lots of them deleted (both numbers are on
  the order of 10)
- device remove
- restripe
- device add
- restriper start [blocked]
- restripe cancel [blocked]
- *crash*
- subsequent mount is ok
- rebalance continues, can be started/cancelled without problems

the error is ENOSPC from snapshot cleanup. one thing that was visible
only on the disk activity monitor was a steady several megs of writes
performed by the freespace thread.

I've seen this already, but I'm not able to reproduce it reliably.

the tree is from my experimental integration branch

http://repo.or.cz/w/linux-2.6/btrfs-unstable.git integration/btrfs-next-experimental
(linus+josef+mark+janosch+restriper+hotfixes from mailinglist)

apart from that, basic switching of raids works nicely.

david
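The register dump is consistent with the ENOSPC reading: on x86-64 an `int` return value sits in the low 32 bits of RAX, and the reported `RAX: 00000000ffffffe4` reinterpreted as a signed 32-bit integer is -28, which is -ENOSPC on Linux. A minimal user-space sketch of that decoding, assuming a two's-complement machine; the helper name `decode_errno_from_rax` is made up for illustration:

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical helper: reinterpret the low 32 bits of a dumped register
 * value as the signed int a kernel function would have returned.
 */
static int decode_errno_from_rax(uint64_t rax)
{
	uint32_t low = (uint32_t)rax;
	int32_t val;

	/* memcpy sidesteps the implementation-defined unsigned->signed cast */
	memcpy(&val, &low, sizeof(val));
	return (int)val;
}
```

Feeding the result, negated, to strerror() gives the human-readable name of the error.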
David Sterba
2011-Sep-27 12:51 UTC
Re: [PATCH 01/21] Btrfs: get rid of *_alloc_profile fields
On Tue, Aug 23, 2011 at 11:01:42PM +0300, Ilya Dryomov wrote:
> {data,metadata,system}_alloc_profile fields have been unused for a long
> time now. Get rid of them.

a good cleanup which could be sent separately.

d/
David Sterba
2011-Sep-27 13:02 UTC
Re: [PATCH 07/21] Btrfs: add basic infrastructure for selective balancing
On Tue, Aug 23, 2011 at 11:01:48PM +0300, Ilya Dryomov wrote:
> This allows to have a separate set of filters for each chunk type
> (data,meta,sys). The code however is generic and switch on chunk type
> is only done once.
>
> This commit also adds a type filter: it allows to balance for example
> meta and system chunks w/o touching data ones.
>
> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
> ---
>  fs/btrfs/volumes.c |   67 +++++++++++++++++++++++++++++++++++++++++++++++++--
>  fs/btrfs/volumes.h |   12 +++++++++
>  2 files changed, 76 insertions(+), 3 deletions(-)
>
> diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
> index 0e4a276..95c6310 100644
> --- a/fs/btrfs/volumes.c
> +++ b/fs/btrfs/volumes.c
> @@ -2175,6 +2175,30 @@ static void unset_restripe_control(struct btrfs_fs_info *fs_info)
>  	kfree(rctl);
>  }
>
> +static int should_restripe_chunk(struct btrfs_root *root,
> +				 struct extent_buffer *leaf,
> +				 struct btrfs_chunk *chunk, u64 chunk_offset)
> +{
> +	struct restripe_control *rctl = root->fs_info->restripe_ctl;
> +	u64 chunk_type = btrfs_chunk_type(leaf, chunk);
> +	struct btrfs_restripe_args *rargs = NULL;
> +
> +	/* type filter */
> +	if (!((chunk_type & BTRFS_BLOCK_GROUP_TYPE_MASK) &
> +	      (rctl->flags & BTRFS_RESTRIPE_TYPE_MASK))) {
> +		return 0;
> +	}
> +
> +	if (chunk_type & BTRFS_BLOCK_GROUP_DATA)
> +		rargs = &rctl->data;
> +	else if (chunk_type & BTRFS_BLOCK_GROUP_SYSTEM)
> +		rargs = &rctl->sys;
> +	else if (chunk_type & BTRFS_BLOCK_GROUP_METADATA)
> +		rargs = &rctl->meta;

what's the point of setting local variable 'rargs' without using or
returning it?

> +
> +	return 1;
> +}
> +
>  static int __btrfs_restripe(struct btrfs_root *dev_root)
>  {
>  	struct list_head *devices;
> @@ -2182,10 +2206,13 @@ static int __btrfs_restripe(struct btrfs_root *dev_root)
>  	u64 old_size;
>  	u64 size_to_free;
>  	struct btrfs_root *chunk_root = dev_root->fs_info->chunk_root;
> +	struct btrfs_chunk *chunk;
>  	struct btrfs_path *path;
>  	struct btrfs_key key;
>  	struct
btrfs_key found_key; > struct btrfs_trans_handle *trans; > + struct extent_buffer *leaf; > + int slot; > int ret; > int enospc_errors = 0; > > @@ -2241,8 +2268,10 @@ static int __btrfs_restripe(struct btrfs_root *dev_root) > if (ret) > BUG_ON(1); /* DIS - break ? */ > > - btrfs_item_key_to_cpu(path->nodes[0], &found_key, > - path->slots[0]); > + leaf = path->nodes[0]; > + slot = path->slots[0]; > + btrfs_item_key_to_cpu(leaf, &found_key, slot); > + > if (found_key.objectid != key.objectid) > break; > > @@ -2250,6 +2279,14 @@ static int __btrfs_restripe(struct btrfs_root *dev_root) > if (found_key.offset == 0) > break; > > + chunk = btrfs_item_ptr(leaf, slot, struct btrfs_chunk); > + > + if (!should_restripe_chunk(chunk_root, leaf, chunk, > + found_key.offset)) { > + btrfs_release_path(path); > + goto loop; > + } > + > btrfs_release_path(path); > ret = btrfs_relocate_chunk(chunk_root, > chunk_root->root_key.objectid, > @@ -2259,6 +2296,7 @@ static int __btrfs_restripe(struct btrfs_root *dev_root) > goto error; > if (ret == -ENOSPC) > enospc_errors++; > +loop: > key.offset = found_key.offset - 1; > } > > @@ -2285,8 +2323,30 @@ int btrfs_restripe(struct restripe_control *rctl) > mutex_lock(&fs_info->volume_mutex); > > /* > - * Profile changing sanity checks > + * In case of mixed groups both data and meta should be picked, > + * and identical options should be given for both of them. > */ > + allowed = btrfs_super_incompat_flags(&fs_info->super_copy); > + if ((allowed & BTRFS_FEATURE_INCOMPAT_MIXED_GROUPS) && > + (rctl->flags & (BTRFS_RESTRIPE_DATA | BTRFS_RESTRIPE_METADATA))) { > + if (!(rctl->flags & BTRFS_RESTRIPE_DATA) || > + !(rctl->flags & BTRFS_RESTRIPE_METADATA) || > + memcmp(&rctl->data, &rctl->meta, sizeof(rctl->data))) { > + printk(KERN_ERR "btrfs: with mixed groups data and " > + "metadata restripe options must be the same\n"); > + ret = -EINVAL; > + goto out; > + } > + } > + > + /* > + * Profile changing sanity checks. 
Skip them if a simple > + * balance is requested. > + */ > + if (!((rctl->data.flags | rctl->sys.flags | rctl->meta.flags) & > + BTRFS_RESTRIPE_ARGS_CONVERT)) > + goto do_restripe; > + > allowed = BTRFS_AVAIL_ALLOC_BIT_SINGLE; > if (fs_info->fs_devices->num_devices == 1) > allowed |= BTRFS_BLOCK_GROUP_DUP; > @@ -2344,6 +2404,7 @@ int btrfs_restripe(struct restripe_control *rctl) > } > } > > +do_restripe: > set_restripe_control(rctl); > mutex_unlock(&fs_info->volume_mutex); > > diff --git a/fs/btrfs/volumes.h b/fs/btrfs/volumes.h > index 8804c5c..f40227e 100644 > --- a/fs/btrfs/volumes.h > +++ b/fs/btrfs/volumes.h > @@ -168,6 +168,18 @@ struct map_lookup { > #define map_lookup_size(n) (sizeof(struct map_lookup) + \ > (sizeof(struct btrfs_bio_stripe) * (n))) > > +/* > + * Restriper''s general "type" filter. Shares bits with chunk type for > + * simplicity, RESTRIPE prefix is used to avoid confusion. > + */ > +#define BTRFS_RESTRIPE_DATA (1ULL << 0) > +#define BTRFS_RESTRIPE_SYSTEM (1ULL << 1) > +#define BTRFS_RESTRIPE_METADATA (1ULL << 2) > + > +#define BTRFS_RESTRIPE_TYPE_MASK (BTRFS_RESTRIPE_DATA | \ > + BTRFS_RESTRIPE_SYSTEM | \ > + BTRFS_RESTRIPE_METADATA) > + > #define BTRFS_RESTRIPE_FORCE (1ULL << 3) > > /* > ---- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Tue, Aug 23, 2011 at 11:01:51PM +0300, Ilya Dryomov wrote:
> Select chunks that are less than X percent full.
>
> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
> ---
>  fs/btrfs/volumes.c |   33 +++++++++++++++++++++++++++++++++
>  fs/btrfs/volumes.h |    1 +
>  2 files changed, 34 insertions(+), 0 deletions(-)
>
> diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
> index f045615..b49ecfa 100644
> --- a/fs/btrfs/volumes.c
> +++ b/fs/btrfs/volumes.c
> @@ -2193,6 +2193,33 @@ static int chunk_profiles_filter(u64 chunk_profile,
>  	return 1;
>  }
>
> +static u64 div_factor_fine(u64 num, int factor)
> +{

factor is obtained from userspace via btrfs_restripe_args and should imho
be checked for safety.

> +	if (factor == 100)

something like this (if the type is really 'int')

	if (factor < 0 || factor >= 100)

> +		return num;
> +	num *= factor;
> +	do_div(num, 100);
> +	return num;
> +}
> +
> +static int chunk_usage_filter(struct btrfs_fs_info *fs_info, u64 chunk_offset,
> +			      struct btrfs_restripe_args *rargs)
> +{
> +	struct btrfs_block_group_cache *cache;
> +	u64 chunk_used, user_thresh;
> +	int ret = 1;
> +
> +	cache = btrfs_lookup_block_group(fs_info, chunk_offset);
> +	chunk_used = btrfs_block_group_used(&cache->item);
> +
> +	user_thresh = div_factor_fine(cache->key.offset, rargs->usage);
                                                         ^^^^^^^^^^^^
does not seem right, but AFAICS is harmless, if an overflow occurs

> +	if (chunk_used < user_thresh)
> +		ret = 0;

will result in ret = 1 and code below will not continue restriping

> +
> +	btrfs_put_block_group(cache);
> +	return ret;
> +}
> +
>  static int chunk_soft_convert_filter(u64 chunk_profile,
>  				     struct btrfs_restripe_args *rargs)
>  {
> @@ -2236,6 +2263,12 @@ static int should_restripe_chunk(struct btrfs_root *root,
>  		return 0;
>  	}
>
> +	/* usage filter */
> +	if ((rargs->flags & BTRFS_RESTRIPE_ARGS_USAGE) &&
> +	    chunk_usage_filter(rctl->fs_info, chunk_offset, rargs)) {

^^^ will skip restriping chunk (if the previous holds).

> +		return 0;
> +	}
> +
>  	/* soft profile changing mode */
>  	if ((rargs->flags & BTRFS_RESTRIPE_ARGS_SOFT) &&
>  	    chunk_soft_convert_filter(chunk_type, rargs)) {
> diff --git a/fs/btrfs/volumes.h b/fs/btrfs/volumes.h
> index 9f96ad8..c6baf4b 100644
> --- a/fs/btrfs/volumes.h
> +++ b/fs/btrfs/volumes.h
> @@ -186,6 +186,7 @@ struct map_lookup {
>   * Restripe filters
>   */
>  #define BTRFS_RESTRIPE_ARGS_PROFILES	(1ULL << 0)
> +#define BTRFS_RESTRIPE_ARGS_USAGE	(1ULL << 1)
>
>  /*
>   * Profile changing flags. When SOFT is set we won't relocate chunk if
> --
> 1.7.5.4
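The range check suggested in the review above could look roughly like the following user-space sketch. It assumes, as in the quoted patch, that the percentage reaches div_factor_fine() as an `int`, and it treats any out-of-range value as 100% (one possible policy; rejecting the request with -EINVAL when the ioctl arguments are parsed would be another). `chunk_is_filtered_out` is a made-up stand-in for chunk_usage_filter() that takes plain numbers instead of a block group.

```c
#include <assert.h>
#include <stdint.h>

/*
 * num * factor / 100. Out-of-range factors fall back to "all of num",
 * following the check proposed in the review. The multiplication is
 * assumed not to overflow, which holds for realistic chunk sizes.
 */
static uint64_t div_factor_fine(uint64_t num, int factor)
{
	if (factor < 0 || factor >= 100)
		return num;
	return num * (uint64_t)factor / 100;
}

/* Same sense as the patch: 1 = filter the chunk out, 0 = relocate it. */
static int chunk_is_filtered_out(uint64_t chunk_size, uint64_t chunk_used,
				 int usage_percent)
{
	uint64_t user_thresh = div_factor_fine(chunk_size, usage_percent);

	assert(chunk_used <= chunk_size);
	return chunk_used < user_thresh ? 0 : 1;
}
```

With a 1024-byte chunk and a 50% threshold, a chunk with 100 bytes used is selected for relocation and one with 900 bytes used is skipped.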
David Sterba
2011-Sep-27 13:43 UTC
Re: [PATCH 14/21] Btrfs: save restripe parameters to disk
On Tue, Aug 23, 2011 at 11:01:55PM +0300, Ilya Dryomov wrote:> Introduce a new btree objectid for storing restripe item. The reason is > to be able to resume restriper after a crash with the same parameters. > Restripe item has a very high objectid and goes into tree of tree roots. > > The key for the new item is as follows: > > [ BTRFS_RESTRIPE_OBJECTID ; 0 ; 0 ] > > Older kernels simply ignore it so it''s safe to mount with an older > kernel and then go back to the newer one. > > Signed-off-by: Ilya Dryomov <idryomov@gmail.com> > --- > fs/btrfs/ctree.h | 127 +++++++++++++++++++++++++++++++++++++++++++++++++++- > fs/btrfs/volumes.c | 105 ++++++++++++++++++++++++++++++++++++++++++- > 2 files changed, 228 insertions(+), 4 deletions(-) > > diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h > index 65d7562..b524034 100644 > --- a/fs/btrfs/ctree.h > +++ b/fs/btrfs/ctree.h > @@ -85,6 +85,9 @@ struct btrfs_ordered_sum; > /* holds checksums of all the data extents */ > #define BTRFS_CSUM_TREE_OBJECTID 7ULL > > +/* for storing restripe params in the root tree */ > +#define BTRFS_RESTRIPE_OBJECTID -4ULL > + > /* orhpan objectid for tracking unlinked/truncated files */ > #define BTRFS_ORPHAN_OBJECTID -5ULL > > @@ -649,6 +652,47 @@ struct btrfs_root_ref { > __le16 name_len; > } __attribute__ ((__packed__)); > > +/* > + * Restriper stuff > + */ > +struct btrfs_disk_restripe_args { > + /* profiles to touch, in-memory format */ > + __le64 profiles; > + > + /* usage filter */ > + __le64 usage; > + > + /* devid filter */ > + __le64 devid; > + > + /* devid subset filter [pstart..pend) */ > + __le64 pstart; > + __le64 pend; > + > + /* btrfs virtual address space subset filter [vstart..vend) */ > + __le64 vstart; > + __le64 vend; > + > + /* profile to convert to, in-memory format */ > + __le64 target; > + > + /* BTRFS_RESTRIPE_ARGS_* */ > + __le64 flags; > + > + __le64 unused[8]; > +} __attribute__ ((__packed__)); > + > +struct btrfs_restripe_item { > + /* BTRFS_RESTRIPE_* */ > + 
__le64 flags; > + > + struct btrfs_disk_restripe_args data; > + struct btrfs_disk_restripe_args sys; > + struct btrfs_disk_restripe_args meta; > + > + __le64 unused[4]; > +} __attribute__ ((__packed__)); > + > #define BTRFS_FILE_EXTENT_INLINE 0 > #define BTRFS_FILE_EXTENT_REG 1 > #define BTRFS_FILE_EXTENT_PREALLOC 2 > @@ -727,7 +771,8 @@ struct btrfs_csum_item { > BTRFS_BLOCK_GROUP_RAID10) > /* > * We need a bit for restriper to be able to tell when chunks of type > - * SINGLE are available. It is used in avail_*_alloc_bits. > + * SINGLE are available. It is used in avail_*_alloc_bits and restripe > + * item fields. > */ > #define BTRFS_AVAIL_ALLOC_BIT_SINGLE (1 << 7) > > @@ -2000,8 +2045,86 @@ static inline bool btrfs_root_readonly(struct btrfs_root *root) > return root->root_item.flags & BTRFS_ROOT_SUBVOL_RDONLY; > } > > -/* struct btrfs_super_block */ > +/* struct btrfs_restripe_item */ > +BTRFS_SETGET_FUNCS(restripe_flags, struct btrfs_restripe_item, flags, 64); > + > +static inline void btrfs_restripe_data(struct extent_buffer *eb, > + struct btrfs_restripe_item *ri, > + struct btrfs_disk_restripe_args *ra) > +{ > + read_eb_member(eb, ri, struct btrfs_restripe_item, data, ra); > +} > > +static inline void btrfs_set_restripe_data(struct extent_buffer *eb, > + struct btrfs_restripe_item *ri, > + struct btrfs_disk_restripe_args *ra) > +{ > + write_eb_member(eb, ri, struct btrfs_restripe_item, data, ra); > +} > + > +static inline void btrfs_restripe_meta(struct extent_buffer *eb, > + struct btrfs_restripe_item *ri, > + struct btrfs_disk_restripe_args *ra) > +{ > + read_eb_member(eb, ri, struct btrfs_restripe_item, meta, ra); > +} > + > +static inline void btrfs_set_restripe_meta(struct extent_buffer *eb, > + struct btrfs_restripe_item *ri, > + struct btrfs_disk_restripe_args *ra) > +{ > + write_eb_member(eb, ri, struct btrfs_restripe_item, meta, ra); > +} > + > +static inline void btrfs_restripe_sys(struct extent_buffer *eb, > + struct btrfs_restripe_item *ri, > + 
struct btrfs_disk_restripe_args *ra) > +{ > + read_eb_member(eb, ri, struct btrfs_restripe_item, sys, ra); > +} > + > +static inline void btrfs_set_restripe_sys(struct extent_buffer *eb, > + struct btrfs_restripe_item *ri, > + struct btrfs_disk_restripe_args *ra) > +{ > + write_eb_member(eb, ri, struct btrfs_restripe_item, sys, ra); > +} > + > +static inline void > +btrfs_disk_restripe_args_to_cpu(struct btrfs_restripe_args *cpu, > + struct btrfs_disk_restripe_args *disk) > +{ > + memset(cpu, 0, sizeof(*cpu)); > + > + cpu->profiles = le64_to_cpu(disk->profiles); > + cpu->usage = le64_to_cpu(disk->usage); > + cpu->devid = le64_to_cpu(disk->devid); > + cpu->pstart = le64_to_cpu(disk->pstart); > + cpu->pend = le64_to_cpu(disk->pend); > + cpu->vstart = le64_to_cpu(disk->vstart); > + cpu->vend = le64_to_cpu(disk->vend); > + cpu->target = le64_to_cpu(disk->target); > + cpu->flags = le64_to_cpu(disk->flags); > +} > + > +static inline void > +btrfs_cpu_restripe_args_to_disk(struct btrfs_disk_restripe_args *disk, > + struct btrfs_restripe_args *cpu) > +{ > + memset(disk, 0, sizeof(*disk)); > + > + disk->profiles = cpu_to_le64(cpu->profiles); > + disk->usage = cpu_to_le64(cpu->usage); > + disk->devid = cpu_to_le64(cpu->devid); > + disk->pstart = cpu_to_le64(cpu->pstart); > + disk->pend = cpu_to_le64(cpu->pend); > + disk->vstart = cpu_to_le64(cpu->vstart); > + disk->vend = cpu_to_le64(cpu->vend); > + disk->target = cpu_to_le64(cpu->target); > + disk->flags = cpu_to_le64(cpu->flags); > +} > + > +/* struct btrfs_super_block */ > BTRFS_SETGET_STACK_FUNCS(super_bytenr, struct btrfs_super_block, bytenr, 64); > BTRFS_SETGET_STACK_FUNCS(super_flags, struct btrfs_super_block, flags, 64); > BTRFS_SETGET_STACK_FUNCS(super_generation, struct btrfs_super_block, > diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c > index eccd458..1057ad3 100644 > --- a/fs/btrfs/volumes.c > +++ b/fs/btrfs/volumes.c > @@ -2150,6 +2150,97 @@ error: > return ret; > } > > +static int 
insert_restripe_item(struct btrfs_root *root, > + struct restripe_control *rctl) > +{ > + struct btrfs_trans_handle *trans; > + struct btrfs_restripe_item *item; > + struct btrfs_disk_restripe_args disk_rargs; > + struct btrfs_path *path; > + struct extent_buffer *leaf; > + struct btrfs_key key; > + int ret, err; > + > + path = btrfs_alloc_path(); > + if (!path) > + return -ENOMEM; > + > + trans = btrfs_start_transaction(root, 0); > + if (IS_ERR(trans)) { > + btrfs_free_path(path); > + return PTR_ERR(trans); > + } > + > + key.objectid = BTRFS_RESTRIPE_OBJECTID; > + key.type = 0; > + key.offset = 0; > + > + ret = btrfs_insert_empty_item(trans, root, path, &key, > + sizeof(*item)); > + if (ret) > + goto out; > + > + leaf = path->nodes[0]; > + item = btrfs_item_ptr(leaf, path->slots[0], struct btrfs_restripe_item); > + > + memset_extent_buffer(leaf, 0, (unsigned long)item, sizeof(*item)); > + > + btrfs_cpu_restripe_args_to_disk(&disk_rargs, &rctl->data); > + btrfs_set_restripe_data(leaf, item, &disk_rargs); > + btrfs_cpu_restripe_args_to_disk(&disk_rargs, &rctl->meta); > + btrfs_set_restripe_meta(leaf, item, &disk_rargs); > + btrfs_cpu_restripe_args_to_disk(&disk_rargs, &rctl->sys); > + btrfs_set_restripe_sys(leaf, item, &disk_rargs); > + > + btrfs_set_restripe_flags(leaf, item, rctl->flags); > + > + btrfs_mark_buffer_dirty(leaf); > +out: > + btrfs_free_path(path); > + err = btrfs_commit_transaction(trans, root); > + if (err && !ret) > + ret = err; > + return ret; > +} > + > +static int del_restripe_item(struct btrfs_root *root) > +{ > + struct btrfs_trans_handle *trans; > + struct btrfs_path *path; > + struct btrfs_key key; > + int ret, err; > + > + path = btrfs_alloc_path(); > + if (!path) > + return -ENOMEM; > + > + trans = btrfs_start_transaction(root, 0); > + if (IS_ERR(trans)) { > + btrfs_free_path(path); > + return PTR_ERR(trans); > + } > + > + key.objectid = BTRFS_RESTRIPE_OBJECTID; > + key.type = 0; > + key.offset = 0; > + > + ret = btrfs_search_slot(trans, 
root, &key, path, -1, 1);
> +	if (ret < 0)
> +		goto out;
> +	if (ret > 0) {
> +		ret = -ENOENT;
> +		goto out;
> +	}
> +
> +	ret = btrfs_del_item(trans, root, path);
> +out:
> +	btrfs_free_path(path);
> +	err = btrfs_commit_transaction(trans, root);
> +	if (err && !ret)
> +		ret = err;
> +	return ret;
> +}
> +
>  /*
>   * Should be called with both restripe and volume mutexes held to
>   * serialize other volume operations (add_dev/rm_dev/resize) wrt
> @@ -2485,6 +2576,7 @@ int btrfs_restripe(struct restripe_control *rctl)
>  {
>  	struct btrfs_fs_info *fs_info = rctl->fs_info;
>  	u64 allowed;
> +	int err;
>  	int ret;
>
>  	mutex_lock(&fs_info->volume_mutex);
> @@ -2572,16 +2664,25 @@ int btrfs_restripe(struct restripe_control *rctl)
>  	}
>
>  do_restripe:
> +	ret = insert_restripe_item(fs_info->tree_root, rctl);
> +	if (ret && ret != -EEXIST)
> +		goto out;
> +	BUG_ON(ret == -EEXIST);
> +
>  	set_restripe_control(rctl);
>  	mutex_unlock(&fs_info->volume_mutex);
>
> -	ret = __btrfs_restripe(fs_info->dev_root);
> +	err = __btrfs_restripe(fs_info->dev_root);
>
>  	mutex_lock(&fs_info->volume_mutex);
> +
>  	unset_restripe_control(fs_info);
> +	ret = del_restripe_item(fs_info->tree_root);
> +	BUG_ON(ret);

is it necessary to BUG_ON here? this can fire e.g. during mount. if the
old restriper state is left in place, the return value from
insert_restripe_item above needs to be checked as well.

my idea is some kind of checkpointing of the restriper state, e.g. the
transaction number when the restriper successfully finishes (and then can
clean all restriper states).

> +
>  	mutex_unlock(&fs_info->volume_mutex);
>
> -	return ret;
> +	return err;
>
>  out:
>  	mutex_unlock(&fs_info->volume_mutex);
> --
> 1.7.5.4
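One way to avoid the BUG_ON questioned above is to hand the cleanup error back to the caller while still preferring the error from the restripe run itself, the same "first error wins" idiom that insert_restripe_item() already applies to the transaction commit. A user-space sketch of just that decision; `finish_restripe` and its parameters are hypothetical, and whether returning an error here leaves acceptable on-disk state is exactly the open question in this thread:

```c
#include <assert.h>
#include <errno.h>

/*
 * run_err: result of the restripe run itself.
 * del_err: result of deleting the on-disk restripe item afterwards.
 * Both are 0 or a negative errno. Returns the first non-zero one,
 * so a failed cleanup is reported instead of crashing the kernel.
 */
static int finish_restripe(int run_err, int del_err)
{
	int ret = run_err;

	assert(run_err <= 0 && del_err <= 0);
	if (del_err && !ret)
		ret = del_err;
	return ret;
}
```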
Ilya Dryomov
2011-Sep-27 17:28 UTC
Re: [PATCH 07/21] Btrfs: add basic infrastructure for selective balancing
On Tue, Sep 27, 2011 at 03:02:41PM +0200, David Sterba wrote:
> On Tue, Aug 23, 2011 at 11:01:48PM +0300, Ilya Dryomov wrote:
> > This allows to have a separate set of filters for each chunk type
> > (data,meta,sys). The code however is generic and switch on chunk type
> > is only done once.
> >
> > This commit also adds a type filter: it allows to balance for example
> > meta and system chunks w/o touching data ones.
> >
> > Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
> > ---
> >  fs/btrfs/volumes.c |   67 +++++++++++++++++++++++++++++++++++++++++++++++++--
> >  fs/btrfs/volumes.h |   12 +++++++++
> >  2 files changed, 76 insertions(+), 3 deletions(-)
> >
> > diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
> > index 0e4a276..95c6310 100644
> > --- a/fs/btrfs/volumes.c
> > +++ b/fs/btrfs/volumes.c
> > @@ -2175,6 +2175,30 @@ static void unset_restripe_control(struct btrfs_fs_info *fs_info)
> >  	kfree(rctl);
> >  }
> >
> > +static int should_restripe_chunk(struct btrfs_root *root,
> > +				 struct extent_buffer *leaf,
> > +				 struct btrfs_chunk *chunk, u64 chunk_offset)
> > +{
> > +	struct restripe_control *rctl = root->fs_info->restripe_ctl;
> > +	u64 chunk_type = btrfs_chunk_type(leaf, chunk);
> > +	struct btrfs_restripe_args *rargs = NULL;
> > +
> > +	/* type filter */
> > +	if (!((chunk_type & BTRFS_BLOCK_GROUP_TYPE_MASK) &
> > +	      (rctl->flags & BTRFS_RESTRIPE_TYPE_MASK))) {
> > +		return 0;
> > +	}
> > +
> > +	if (chunk_type & BTRFS_BLOCK_GROUP_DATA)
> > +		rargs = &rctl->data;
> > +	else if (chunk_type & BTRFS_BLOCK_GROUP_SYSTEM)
> > +		rargs = &rctl->sys;
> > +	else if (chunk_type & BTRFS_BLOCK_GROUP_METADATA)
> > +		rargs = &rctl->meta;
>
> what's the point of setting local variable 'rargs' without using or
> returning it?

rargs is being used later in the series, it is passed to every filter
function. It's kind of hard to review, but that way I can break the
thing into logical chunks and describe each of them.
Thanks,

Ilya
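To make the role of rargs concrete, here is a small user-space model of the selection step in should_restripe_chunk(): the type filter first decides whether the chunk is touched at all, then the chunk type picks the one argument set that the later filters in the series would be run against. The bit values mirror the btrfs chunk type bits that the RESTRIPE type flags are said to share; the `model_*` structs and `pick_args` are invented for this sketch and are not part of the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define BLOCK_GROUP_DATA	(1ULL << 0)
#define BLOCK_GROUP_SYSTEM	(1ULL << 1)
#define BLOCK_GROUP_METADATA	(1ULL << 2)
#define BLOCK_GROUP_TYPE_MASK	(BLOCK_GROUP_DATA | BLOCK_GROUP_SYSTEM | \
				 BLOCK_GROUP_METADATA)

/* Hypothetical, stripped-down versions of the restripe structures. */
struct model_args {
	uint64_t flags;
};

struct model_ctl {
	uint64_t flags; /* which chunk types to touch, same bits as above */
	struct model_args data;
	struct model_args sys;
	struct model_args meta;
};

/*
 * Returns the per-type args the filters should use, or NULL when the
 * type filter rejects the chunk.
 */
static const struct model_args *pick_args(const struct model_ctl *ctl,
					  uint64_t chunk_type)
{
	assert(ctl != NULL);
	if (!((chunk_type & BLOCK_GROUP_TYPE_MASK) &
	      (ctl->flags & BLOCK_GROUP_TYPE_MASK)))
		return NULL;

	if (chunk_type & BLOCK_GROUP_DATA)
		return &ctl->data;
	if (chunk_type & BLOCK_GROUP_SYSTEM)
		return &ctl->sys;
	if (chunk_type & BLOCK_GROUP_METADATA)
		return &ctl->meta;
	return NULL;
}
```

Because the switch on chunk type happens once, each filter only needs to look at the single argument set it is handed.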
Arne Jansen
2011-Nov-01 07:56 UTC
Re: [PATCH 03/21] Btrfs: add BTRFS_AVAIL_ALLOC_BIT_SINGLE bit
On 23.08.2011 22:01, Ilya Dryomov wrote:
> Right now on-disk BTRFS_BLOCK_GROUP_* profile bits are used for
> avail_{data,metadata,system}_alloc_bits fields, which are there to tell
> us about available allocation profiles in the fs. When chunk is
> created, it's profile is OR'ed with respective avail_alloc_bits field.
> Since SINGLE is denoted by 0 in the on-disk format, currently there is
> no way to tell when such chunks become avaialble. Restriper needs that
> information, so add a separate bit for SINGLE profile.
>
> This bit is going to be in-memory only, it should never be written out
> to disk, so it's not a disk format change. However to avoid remappings
> in future, reserve corresponding on-disk bit.
>
> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
> ---
>  fs/btrfs/ctree.h       |   12 ++++++++++++
>  fs/btrfs/extent-tree.c |   22 ++++++++++++++--------
>  2 files changed, 26 insertions(+), 8 deletions(-)
>
> diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
> index b882c95..5b00eb8 100644
> --- a/fs/btrfs/ctree.h
> +++ b/fs/btrfs/ctree.h
> @@ -725,6 +725,17 @@ struct btrfs_csum_item {
>  					 BTRFS_BLOCK_GROUP_RAID1 | \
>  					 BTRFS_BLOCK_GROUP_DUP | \
>  					 BTRFS_BLOCK_GROUP_RAID10)
> +/*
> + * We need a bit for restriper to be able to tell when chunks of type
> + * SINGLE are available. It is used in avail_*_alloc_bits.
> + */
> +#define BTRFS_AVAIL_ALLOC_BIT_SINGLE	(1 << 7)
> +
> +/*
> + * To avoid troubles or remappings, reserve on-disk bit.
> + */
> +#define BTRFS_BLOCK_GROUP_RESERVED	(1 << 7)

can you move this define up to where the other BLOCK_GROUPS are defined?
Otherwise it is easy to overlook.

> +
>  struct btrfs_block_group_item {
>  	__le64 used;
>  	__le64 chunk_objectid;
> @@ -1100,6 +1111,7 @@ struct btrfs_fs_info {
>  	spinlock_t ref_cache_lock;
>  	u64 total_ref_cache_size;
>
> +	/* SINGLE has it's own bit for these three */

While this comment is easily understandable in the context of this
patch, it is not enough when just reading the resulting code without the
commit message. It would be good if you could duplicate more of the
commit message into code comments.

>  	u64 avail_data_alloc_bits;
>  	u64 avail_metadata_alloc_bits;
>  	u64 avail_system_alloc_bits;
> diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
> index de4c639..ed35eb5 100644
> --- a/fs/btrfs/extent-tree.c
> +++ b/fs/btrfs/extent-tree.c
> @@ -2945,14 +2945,17 @@ static int update_space_info(struct btrfs_fs_info *info, u64 flags,
>  static void set_avail_alloc_bits(struct btrfs_fs_info *fs_info, u64 flags)
>  {
>  	u64 extra_flags = flags & BTRFS_BLOCK_GROUP_PROFILE_MASK;
> -	if (extra_flags) {
> -		if (flags & BTRFS_BLOCK_GROUP_DATA)
> -			fs_info->avail_data_alloc_bits |= extra_flags;
> -		if (flags & BTRFS_BLOCK_GROUP_METADATA)
> -			fs_info->avail_metadata_alloc_bits |= extra_flags;
> -		if (flags & BTRFS_BLOCK_GROUP_SYSTEM)
> -			fs_info->avail_system_alloc_bits |= extra_flags;
> -	}
> +
> +	/* on-disk -> in-memory */
> +	if (extra_flags == 0)
> +		extra_flags = BTRFS_AVAIL_ALLOC_BIT_SINGLE;
> +
> +	if (flags & BTRFS_BLOCK_GROUP_DATA)
> +		fs_info->avail_data_alloc_bits |= extra_flags;
> +	if (flags & BTRFS_BLOCK_GROUP_METADATA)
> +		fs_info->avail_metadata_alloc_bits |= extra_flags;
> +	if (flags & BTRFS_BLOCK_GROUP_SYSTEM)
> +		fs_info->avail_system_alloc_bits |= extra_flags;
>  }
>
>  u64 btrfs_reduce_alloc_profile(struct btrfs_root *root, u64 flags)
> @@ -2986,6 +2989,9 @@ u64 btrfs_reduce_alloc_profile(struct btrfs_root *root, u64 flags)
>  	     (flags & BTRFS_BLOCK_GROUP_RAID10) |
>  	     (flags & BTRFS_BLOCK_GROUP_DUP)))
>  		flags &= ~BTRFS_BLOCK_GROUP_RAID0;
> +
> +	/* in-memory -> on-disk */
> +	flags &= ~BTRFS_AVAIL_ALLOC_BIT_SINGLE;
>  	return flags;
>  }
>
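The two conversions in the quoted patch can be modelled in user space as a pair of small functions: on the way in, an empty profile (SINGLE on disk) is turned into the dedicated in-memory bit; on the way out, that bit is stripped again so it never reaches the disk. The profile bit positions below are the btrfs on-disk values and bit 7 is the one the patch reserves; the short macro and function names are made up for this sketch.

```c
#include <assert.h>
#include <stdint.h>

#define BG_RAID0	(1ULL << 3)
#define BG_RAID1	(1ULL << 4)
#define BG_DUP		(1ULL << 5)
#define BG_RAID10	(1ULL << 6)
#define BG_PROFILE_MASK	(BG_RAID0 | BG_RAID1 | BG_DUP | BG_RAID10)

/* in-memory only marker for "chunks with the SINGLE profile exist" */
#define AVAIL_ALLOC_BIT_SINGLE	(1ULL << 7)

/* on-disk -> in-memory: what gets OR'ed into avail_*_alloc_bits */
static uint64_t profile_to_avail_bits(uint64_t chunk_flags)
{
	uint64_t extra_flags = chunk_flags & BG_PROFILE_MASK;

	if (extra_flags == 0)
		extra_flags = AVAIL_ALLOC_BIT_SINGLE;
	return extra_flags;
}

/* in-memory -> on-disk: the SINGLE marker must not be written out */
static uint64_t avail_bits_to_disk(uint64_t flags)
{
	uint64_t disk = flags & ~AVAIL_ALLOC_BIT_SINGLE;

	assert((disk & AVAIL_ALLOC_BIT_SINGLE) == 0);
	return disk;
}
```

A data chunk with no profile bits set maps to the SINGLE marker, while a RAID1 chunk maps to its RAID1 bit unchanged.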
Arne Jansen
2011-Nov-01 10:08 UTC
Re: [PATCH 05/21] Btrfs: add basic restriper infrastructure
On 23.08.2011 22:01, Ilya Dryomov wrote:
> Add basic restriper infrastructure: ioctl to start restripe, all
> restripe ioctl data structures, add data structure for tracking
> restriper's state to fs_info. Duplicate balancing code for restriper,
> btrfs_balance() will be removed when restriper is implemented.
>
> Explicitly disallow any volume operations when restriper is running.
> (previously this restriction relied on volume_mutex being held during
> the execution of any volume operation)
>
> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
> ---
>  fs/btrfs/ctree.h   |    5 +
>  fs/btrfs/disk-io.c |    4 +
>  fs/btrfs/ioctl.c   |  107 ++++++++++++++++++++++----
>  fs/btrfs/ioctl.h   |   37 +++++++++
>  fs/btrfs/volumes.c |  219 ++++++++++++++++++++++++++++++++++++++++++++++++++--
>  fs/btrfs/volumes.h |   18 ++++
>  6 files changed, 369 insertions(+), 21 deletions(-)
>
> diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
> index 5b00eb8..65d7562 100644
> --- a/fs/btrfs/ctree.h
> +++ b/fs/btrfs/ctree.h
> @@ -895,6 +895,7 @@ struct btrfs_block_group_cache {
>  };
>
>  struct reloc_control;
> +struct restripe_control;
>  struct btrfs_device;
>  struct btrfs_fs_devices;
>  struct btrfs_delayed_root;
> @@ -1116,6 +1117,10 @@ struct btrfs_fs_info {
>  	u64 avail_metadata_alloc_bits;
>  	u64 avail_system_alloc_bits;
>
> +	spinlock_t restripe_lock;
> +	struct mutex restripe_mutex;
> +	struct restripe_control *restripe_ctl;
> +

Can you please add some comments on the usage of the locks and how to
protect the restripe_ctl pointer and the access to its data structures?

>  	unsigned data_chunk_allocations;
>  	unsigned metadata_ratio;
>
> diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
> index 46d0412..fa2301b 100644
> --- a/fs/btrfs/disk-io.c
> +++ b/fs/btrfs/disk-io.c
> @@ -1700,6 +1700,10 @@ struct btrfs_root *open_ctree(struct super_block *sb,
>  	init_rwsem(&fs_info->scrub_super_lock);
>  	fs_info->scrub_workers_refcnt = 0;
>
> +	spin_lock_init(&fs_info->restripe_lock);
> +
mutex_init(&fs_info->restripe_mutex); > + fs_info->restripe_ctl = NULL; > + > sb->s_blocksize = 4096; > sb->s_blocksize_bits = blksize_bits(4096); > sb->s_bdi = &fs_info->bdi; > diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c > index 970977a..9dfc686 100644 > --- a/fs/btrfs/ioctl.c > +++ b/fs/btrfs/ioctl.c > @@ -1165,13 +1165,21 @@ static noinline int btrfs_ioctl_resize(struct btrfs_root *root, > if (!capable(CAP_SYS_ADMIN)) > return -EPERM; > > + mutex_lock(&root->fs_info->volume_mutex); > + if (root->fs_info->restripe_ctl) { > + printk(KERN_INFO "btrfs: restripe in progress\n"); > + ret = -EINVAL; > + goto out; > + } > + > vol_args = memdup_user(arg, sizeof(*vol_args)); > - if (IS_ERR(vol_args)) > - return PTR_ERR(vol_args); > + if (IS_ERR(vol_args)) { > + ret = PTR_ERR(vol_args); > + goto out; > + } > > vol_args->name[BTRFS_PATH_NAME_MAX] = ''\0''; > > - mutex_lock(&root->fs_info->volume_mutex); > sizestr = vol_args->name; > devstr = strchr(sizestr, '':''); > if (devstr) { > @@ -1188,7 +1196,7 @@ static noinline int btrfs_ioctl_resize(struct btrfs_root *root, > printk(KERN_INFO "resizer unable to find device %llu\n", > (unsigned long long)devid); > ret = -EINVAL; > - goto out_unlock; > + goto out_free; > } > if (!strcmp(sizestr, "max")) > new_size = device->bdev->bd_inode->i_size; > @@ -1203,7 +1211,7 @@ static noinline int btrfs_ioctl_resize(struct btrfs_root *root, > new_size = memparse(sizestr, NULL); > if (new_size == 0) { > ret = -EINVAL; > - goto out_unlock; > + goto out_free; > } > } > > @@ -1212,7 +1220,7 @@ static noinline int btrfs_ioctl_resize(struct btrfs_root *root, > if (mod < 0) { > if (new_size > old_size) { > ret = -EINVAL; > - goto out_unlock; > + goto out_free; > } > new_size = old_size - new_size; > } else if (mod > 0) { > @@ -1221,11 +1229,11 @@ static noinline int btrfs_ioctl_resize(struct btrfs_root *root, > > if (new_size < 256 * 1024 * 1024) { > ret = -EINVAL; > - goto out_unlock; > + goto out_free; > } > if (new_size > 
device->bdev->bd_inode->i_size) { > ret = -EFBIG; > - goto out_unlock; > + goto out_free; > } > > do_div(new_size, root->sectorsize); > @@ -1238,7 +1246,7 @@ static noinline int btrfs_ioctl_resize(struct btrfs_root *root, > trans = btrfs_start_transaction(root, 0); > if (IS_ERR(trans)) { > ret = PTR_ERR(trans); > - goto out_unlock; > + goto out_free; > } > ret = btrfs_grow_device(trans, device, new_size); > btrfs_commit_transaction(trans, root); > @@ -1246,9 +1254,10 @@ static noinline int btrfs_ioctl_resize(struct btrfs_root *root, > ret = btrfs_shrink_device(device, new_size); > } > > -out_unlock: > - mutex_unlock(&root->fs_info->volume_mutex); > +out_free: > kfree(vol_args); > +out: > + mutex_unlock(&root->fs_info->volume_mutex); > return ret; > } > > @@ -2014,14 +2023,25 @@ static long btrfs_ioctl_add_dev(struct btrfs_root *root, void __user *arg) > if (!capable(CAP_SYS_ADMIN)) > return -EPERM; > > + mutex_lock(&root->fs_info->volume_mutex); > + if (root->fs_info->restripe_ctl) { > + printk(KERN_INFO "btrfs: restripe in progress\n"); > + ret = -EINVAL; > + goto out; > + } > + > vol_args = memdup_user(arg, sizeof(*vol_args)); > - if (IS_ERR(vol_args)) > - return PTR_ERR(vol_args); > + if (IS_ERR(vol_args)) { > + ret = PTR_ERR(vol_args); > + goto out; > + } > > vol_args->name[BTRFS_PATH_NAME_MAX] = ''\0''; > ret = btrfs_init_new_device(root, vol_args->name); > > kfree(vol_args); > +out: > + mutex_unlock(&root->fs_info->volume_mutex); > return ret; > } > > @@ -2036,14 +2056,25 @@ static long btrfs_ioctl_rm_dev(struct btrfs_root *root, void __user *arg) > if (root->fs_info->sb->s_flags & MS_RDONLY) > return -EROFS; > > + mutex_lock(&root->fs_info->volume_mutex); > + if (root->fs_info->restripe_ctl) { > + printk(KERN_INFO "btrfs: restripe in progress\n"); > + ret = -EINVAL; > + goto out; > + } > + > vol_args = memdup_user(arg, sizeof(*vol_args)); > - if (IS_ERR(vol_args)) > - return PTR_ERR(vol_args); > + if (IS_ERR(vol_args)) { > + ret = PTR_ERR(vol_args); > + goto 
out; > + } > > vol_args->name[BTRFS_PATH_NAME_MAX] = ''\0''; > ret = btrfs_rm_device(root, vol_args->name); > > kfree(vol_args); > +out: > + mutex_unlock(&root->fs_info->volume_mutex); > return ret; > } > > @@ -2833,6 +2864,50 @@ static long btrfs_ioctl_scrub_progress(struct btrfs_root *root, > return ret; > } > > +static long btrfs_ioctl_restripe(struct btrfs_root *root, void __user *arg) > +{ > + struct btrfs_ioctl_restripe_args *rargs; > + struct btrfs_fs_info *fs_info = root->fs_info; > + struct restripe_control *rctl; > + int ret; > + > + if (!capable(CAP_SYS_ADMIN)) > + return -EPERM; > + > + if (fs_info->sb->s_flags & MS_RDONLY) > + return -EROFS; > + > + mutex_lock(&fs_info->restripe_mutex); > + > + rargs = memdup_user(arg, sizeof(*rargs)); > + if (IS_ERR(rargs)) { > + ret = PTR_ERR(rargs); > + goto out; > + } > + > + rctl = kzalloc(sizeof(*rctl), GFP_NOFS); > + if (!rctl) { > + kfree(rargs); > + ret = -ENOMEM; > + goto out; > + } > + > + rctl->fs_info = fs_info; > + rctl->flags = rargs->flags; > + > + memcpy(&rctl->data, &rargs->data, sizeof(rctl->data)); > + memcpy(&rctl->meta, &rargs->meta, sizeof(rctl->meta)); > + memcpy(&rctl->sys, &rargs->sys, sizeof(rctl->sys)); > + > + ret = btrfs_restripe(rctl); > + > + /* rctl freed in unset_restripe_control */ > + kfree(rargs); > +out: > + mutex_unlock(&fs_info->restripe_mutex); > + return ret; > +} > + > long btrfs_ioctl(struct file *file, unsigned int > cmd, unsigned long arg) > { > @@ -2905,6 +2980,8 @@ long btrfs_ioctl(struct file *file, unsigned int > return btrfs_ioctl_scrub_cancel(root, argp); > case BTRFS_IOC_SCRUB_PROGRESS: > return btrfs_ioctl_scrub_progress(root, argp); > + case BTRFS_IOC_RESTRIPE: > + return btrfs_ioctl_restripe(root, argp); > } > > return -ENOTTY; > diff --git a/fs/btrfs/ioctl.h b/fs/btrfs/ioctl.h > index ad1ea78..798f1d4 100644 > --- a/fs/btrfs/ioctl.h > +++ b/fs/btrfs/ioctl.h > @@ -109,6 +109,41 @@ struct btrfs_ioctl_fs_info_args { > __u64 reserved[124]; /* pad to 1k */ > }; > > 
+struct btrfs_restripe_args { > + __u64 profiles; > + __u64 usage; > + __u64 devid; > + __u64 pstart; > + __u64 pend; > + __u64 vstart; > + __u64 vend; > + > + __u64 target; > + > + __u64 flags; > + > + __u64 unused[8]; > +} __attribute__ ((__packed__)); > + > +struct btrfs_restripe_progress { > + __u64 expected; > + __u64 considered; > + __u64 completed; > +}; > + > +struct btrfs_ioctl_restripe_args { > + __u64 flags; > + __u64 state; > + > + struct btrfs_restripe_args data; > + struct btrfs_restripe_args sys; > + struct btrfs_restripe_args meta; > + > + struct btrfs_restripe_progress stat; > + > + __u64 unused[72]; /* pad to 1k */ > +}; > + > #define BTRFS_INO_LOOKUP_PATH_MAX 4080 > struct btrfs_ioctl_ino_lookup_args { > __u64 treeid; > @@ -248,4 +283,6 @@ struct btrfs_ioctl_space_args { > struct btrfs_ioctl_dev_info_args) > #define BTRFS_IOC_FS_INFO _IOR(BTRFS_IOCTL_MAGIC, 31, \ > struct btrfs_ioctl_fs_info_args) > +#define BTRFS_IOC_RESTRIPE _IOW(BTRFS_IOCTL_MAGIC, 32, \ > + struct btrfs_ioctl_restripe_args) > #endif > diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c > index af4bf56..0e4a276 100644 > --- a/fs/btrfs/volumes.c > +++ b/fs/btrfs/volumes.c > @@ -1262,7 +1262,6 @@ int btrfs_rm_device(struct btrfs_root *root, char *device_path) > bool clear_super = false; > > mutex_lock(&uuid_mutex); > - mutex_lock(&root->fs_info->volume_mutex); > > all_avail = root->fs_info->avail_data_alloc_bits | > root->fs_info->avail_system_alloc_bits | > @@ -1427,7 +1426,6 @@ error_close: > if (bdev) > blkdev_put(bdev, FMODE_READ | FMODE_EXCL); > out: > - mutex_unlock(&root->fs_info->volume_mutex); > mutex_unlock(&uuid_mutex); > return ret; > error_undo: > @@ -1604,7 +1602,6 @@ int btrfs_init_new_device(struct btrfs_root *root, char *device_path) > } > > filemap_write_and_wait(bdev->bd_inode->i_mapping); > - mutex_lock(&root->fs_info->volume_mutex); > > devices = &root->fs_info->fs_devices->devices; > /* > @@ -1728,8 +1725,7 @@ int btrfs_init_new_device(struct btrfs_root 
*root, char *device_path) > ret = btrfs_relocate_sys_chunks(root); > BUG_ON(ret); > } > -out: > - mutex_unlock(&root->fs_info->volume_mutex); > + > return ret; > error: > blkdev_put(bdev, FMODE_EXCL); > @@ -1737,7 +1733,7 @@ error: > mutex_unlock(&uuid_mutex); > up_write(&sb->s_umount); > } > - goto out; > + return ret; > } > > static noinline int btrfs_update_device(struct btrfs_trans_handle *trans, > @@ -2155,6 +2151,217 @@ error: > } > > /* > + * Should be called with both restripe and volume mutexes held to > + * serialize other volume operations (add_dev/rm_dev/resize) wrt > + * restriper. Same goes for unset_restripe_control(). > + */ > +static void set_restripe_control(struct restripe_control *rctl) > +{ > + struct btrfs_fs_info *fs_info = rctl->fs_info; > + > + spin_lock(&fs_info->restripe_lock); > + fs_info->restripe_ctl = rctl; > + spin_unlock(&fs_info->restripe_lock); > +} > + > +static void unset_restripe_control(struct btrfs_fs_info *fs_info) > +{ > + struct restripe_control *rctl = fs_info->restripe_ctl; > + > + spin_lock(&fs_info->restripe_lock); > + fs_info->restripe_ctl = NULL; > + spin_unlock(&fs_info->restripe_lock); > + > + kfree(rctl); > +} > + > +static int __btrfs_restripe(struct btrfs_root *dev_root) > +{ > + struct list_head *devices; > + struct btrfs_device *device; > + u64 old_size; > + u64 size_to_free; > + struct btrfs_root *chunk_root = dev_root->fs_info->chunk_root; > + struct btrfs_path *path; > + struct btrfs_key key; > + struct btrfs_key found_key; > + struct btrfs_trans_handle *trans; > + int ret; > + int enospc_errors = 0; > + > + /* step one make some room on all the devices */ > + devices = &dev_root->fs_info->fs_devices->devices; > + list_for_each_entry(device, devices, dev_list) { > + old_size = device->total_bytes; > + size_to_free = div_factor(old_size, 1); > + size_to_free = min(size_to_free, (u64)1 * 1024 * 1024); > + if (!device->writeable || > + device->total_bytes - device->bytes_used > size_to_free) > + continue; > + 
> + ret = btrfs_shrink_device(device, old_size - size_to_free); > + if (ret == -ENOSPC) > + break; > + BUG_ON(ret); > + > + trans = btrfs_start_transaction(dev_root, 0); > + BUG_ON(IS_ERR(trans)); > + > + ret = btrfs_grow_device(trans, device, old_size); > + BUG_ON(ret); > + > + btrfs_end_transaction(trans, dev_root); > + } > + > + /* step two, relocate all the chunks */ > + path = btrfs_alloc_path(); > + if (!path) { > + ret = -ENOMEM; > + goto error; > + } > + > + key.objectid = BTRFS_FIRST_CHUNK_TREE_OBJECTID; > + key.offset = (u64)-1; > + key.type = BTRFS_CHUNK_ITEM_KEY; > + > + while (1) { > + ret = btrfs_search_slot(NULL, chunk_root, &key, path, 0, 0); > + if (ret < 0) > + goto error; > + > + /* > + * this shouldn''t happen, it means the last relocate > + * failed > + */ > + if (ret == 0) > + BUG_ON(1); /* DIS - break ? */ > + > + ret = btrfs_previous_item(chunk_root, path, 0, > + BTRFS_CHUNK_ITEM_KEY); > + if (ret) > + BUG_ON(1); /* DIS - break ? */ > + > + btrfs_item_key_to_cpu(path->nodes[0], &found_key, > + path->slots[0]); > + if (found_key.objectid != key.objectid) > + break; > + > + /* chunk zero is special */ > + if (found_key.offset == 0) > + break; > + > + btrfs_release_path(path); > + ret = btrfs_relocate_chunk(chunk_root, > + chunk_root->root_key.objectid, > + found_key.objectid, > + found_key.offset); > + if (ret && ret != -ENOSPC) > + goto error; > + if (ret == -ENOSPC) > + enospc_errors++; > + key.offset = found_key.offset - 1; > + } > + > +error: > + btrfs_free_path(path); > + if (enospc_errors) { > + printk(KERN_INFO "btrfs: restripe finished with %d enospc " > + "error(s)\n", enospc_errors); > + ret = -ENOSPC; > + } > + > + return ret; > +} > + > +/* > + * Should be called with restripe_mutex held > + */ > +int btrfs_restripe(struct restripe_control *rctl) > +{ > + struct btrfs_fs_info *fs_info = rctl->fs_info; > + u64 allowed; > + int ret; > + > + mutex_lock(&fs_info->volume_mutex); > + > + /* > + * Profile changing sanity checks > + */ > + 
allowed = BTRFS_AVAIL_ALLOC_BIT_SINGLE; > + if (fs_info->fs_devices->num_devices == 1) > + allowed |= BTRFS_BLOCK_GROUP_DUP; > + else if (fs_info->fs_devices->num_devices < 4) > + allowed |= (BTRFS_BLOCK_GROUP_RAID0 | BTRFS_BLOCK_GROUP_RAID1); > + else > + allowed |= (BTRFS_BLOCK_GROUP_RAID0 | BTRFS_BLOCK_GROUP_RAID1 | > + BTRFS_BLOCK_GROUP_RAID10); > + > + if (rctl->data.target & ~allowed) { > + printk(KERN_ERR "btrfs: unable to start restripe with target " > + "data profile %llu\n", > + (unsigned long long)rctl->data.target); > + ret = -EINVAL; > + goto out; > + } > + if (rctl->sys.target & ~allowed) { > + printk(KERN_ERR "btrfs: unable to start restripe with target " > + "system profile %llu\n", > + (unsigned long long)rctl->sys.target); > + ret = -EINVAL; > + goto out; > + } > + if (rctl->meta.target & ~allowed) { > + printk(KERN_ERR "btrfs: unable to start restripe with target " > + "metadata profile %llu\n", > + (unsigned long long)rctl->meta.target); > + ret = -EINVAL; > + goto out; > + } > + > + if (rctl->data.target & BTRFS_BLOCK_GROUP_DUP) { > + printk(KERN_ERR "btrfs: dup for data is not allowed\n"); > + ret = -EINVAL; > + goto out; > + }It would be good to get these error messages somehow to the user, or at least give the user a hint to look in dmesg.> + > + /* allow to reduce meta or sys integrity only if force set */ > + allowed = BTRFS_BLOCK_GROUP_DUP | BTRFS_BLOCK_GROUP_RAID1 | > + BTRFS_BLOCK_GROUP_RAID10; > + if (((rctl->sys.flags & BTRFS_RESTRIPE_ARGS_CONVERT) && > + (fs_info->avail_system_alloc_bits & allowed) && > + !(rctl->sys.target & allowed)) || > + ((rctl->meta.flags & BTRFS_RESTRIPE_ARGS_CONVERT) && > + (fs_info->avail_metadata_alloc_bits & allowed) && > + !(rctl->meta.target & allowed))) { > + if (rctl->flags & BTRFS_RESTRIPE_FORCE) { > + printk(KERN_INFO "btrfs: force reducing metadata " > + "integrity\n"); > + } else { > + printk(KERN_ERR "btrfs: can''t reduce metadata " > + "integrity\n"); > + ret = -EINVAL; > + goto out; > + } > + } 
> + > + set_restripe_control(rctl); > + mutex_unlock(&fs_info->volume_mutex); > + > + ret = __btrfs_restripe(fs_info->dev_root); > + > + mutex_lock(&fs_info->volume_mutex); > + unset_restripe_control(fs_info); > + mutex_unlock(&fs_info->volume_mutex); > + > + return ret; > + > +out: > + mutex_unlock(&fs_info->volume_mutex); > + kfree(rctl); > + return ret; > +} > + > +/* > * shrinking a device means finding all of the device extents past > * the new size, and then following the back refs to the chunks. > * The chunk relocation code actually frees the device extent > diff --git a/fs/btrfs/volumes.h b/fs/btrfs/volumes.h > index 6d866db..8804c5c 100644 > --- a/fs/btrfs/volumes.h > +++ b/fs/btrfs/volumes.h > @@ -168,6 +168,23 @@ struct map_lookup { > #define map_lookup_size(n) (sizeof(struct map_lookup) + \ > (sizeof(struct btrfs_bio_stripe) * (n))) > > +#define BTRFS_RESTRIPE_FORCE (1ULL << 3) > + > +/* > + * Profile changing flags > + */ > +#define BTRFS_RESTRIPE_ARGS_CONVERT (1ULL << 8) > + > +struct btrfs_restripe_args; > +struct restripe_control { > + struct btrfs_fs_info *fs_info; > + u64 flags; > + > + struct btrfs_restripe_args data; > + struct btrfs_restripe_args sys; > + struct btrfs_restripe_args meta; > +}; > + > int btrfs_account_dev_extents_size(struct btrfs_device *device, u64 start, > u64 end, u64 *length); > > @@ -211,6 +228,7 @@ struct btrfs_device *btrfs_find_device(struct btrfs_root *root, u64 devid, > int btrfs_shrink_device(struct btrfs_device *device, u64 new_size); > int btrfs_init_new_device(struct btrfs_root *root, char *path); > int btrfs_balance(struct btrfs_root *dev_root); > +int btrfs_restripe(struct restripe_control *rctl); > int btrfs_chunk_readonly(struct btrfs_root *root, u64 chunk_offset); > int find_free_dev_extent(struct btrfs_trans_handle *trans, > struct btrfs_device *device, u64 num_bytes,-- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majordomo@vger.kernel.org More majordomo 
info at http://vger.kernel.org/majordomo-info.html
On 23.08.2011 22:01, Ilya Dryomov wrote:
> Select chunks that are less than X percent full.
>
> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
> ---
>  fs/btrfs/volumes.c |   33 +++++++++++++++++++++++++++++++++
>  fs/btrfs/volumes.h |    1 +
>  2 files changed, 34 insertions(+), 0 deletions(-)
>
> diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
> index f045615..b49ecfa 100644
> --- a/fs/btrfs/volumes.c
> +++ b/fs/btrfs/volumes.c
> @@ -2193,6 +2193,33 @@ static int chunk_profiles_filter(u64 chunk_profile,
>  	return 1;
>  }
>
> +static u64 div_factor_fine(u64 num, int factor)
> +{
> +	if (factor == 100)
> +		return num;

You have already changed this to a range check that always returns num
when factor is < 0 or >= 100, but I'd find it more consistent to return
0 when factor < 0.

> +	num *= factor;
> +	do_div(num, 100);
> +	return num;
> +}
> +
> +static int chunk_usage_filter(struct btrfs_fs_info *fs_info, u64 chunk_offset,
> +			      struct btrfs_restripe_args *rargs)
> +{
> +	struct btrfs_block_group_cache *cache;
> +	u64 chunk_used, user_thresh;
> +	int ret = 1;
> +
> +	cache = btrfs_lookup_block_group(fs_info, chunk_offset);
> +	chunk_used = btrfs_block_group_used(&cache->item);
> +
> +	user_thresh = div_factor_fine(cache->key.offset, rargs->usage);
> +	if (chunk_used < user_thresh)
> +		ret = 0;
> +
> +	btrfs_put_block_group(cache);
> +	return ret;
> +}
> +
>  static int chunk_soft_convert_filter(u64 chunk_profile,
>  				     struct btrfs_restripe_args *rargs)
>  {
> @@ -2236,6 +2263,12 @@ static int should_restripe_chunk(struct btrfs_root *root,
>  		return 0;
>  	}
>
> +	/* usage filter */
> +	if ((rargs->flags & BTRFS_RESTRIPE_ARGS_USAGE) &&
> +	    chunk_usage_filter(rctl->fs_info, chunk_offset, rargs)) {
> +		return 0;
> +	}
> +
>  	/* soft profile changing mode */
>  	if ((rargs->flags & BTRFS_RESTRIPE_ARGS_SOFT) &&
>  	    chunk_soft_convert_filter(chunk_type, rargs)) {
> diff --git a/fs/btrfs/volumes.h b/fs/btrfs/volumes.h
> index 9f96ad8..c6baf4b 100644
> --- a/fs/btrfs/volumes.h
> +++ b/fs/btrfs/volumes.h
> @@ -186,6 +186,7 @@ struct map_lookup {
>   * Restripe filters
>   */
>  #define BTRFS_RESTRIPE_ARGS_PROFILES (1ULL << 0)
> +#define BTRFS_RESTRIPE_ARGS_USAGE (1ULL << 1)
>
>  /*
>   * Profile changing flags. When SOFT is set we won't relocate chunk if
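The range check discussed in the review of div_factor_fine() can be sketched in userspace C as follows. This is only an illustration, not the code from the series: it uses uint64_t and plain division instead of u64 and do_div(), and the simplified chunk_usage_filter() signature (sizes passed in directly instead of a block group lookup) is an assumption made to keep the example self-contained.

```c
#include <stdint.h>

/*
 * Userspace sketch of the percentage helper with an explicit range check.
 * Assumption: a negative factor yields 0 and a factor >= 100 yields num
 * unchanged, as suggested in the review.
 */
static uint64_t div_factor_fine(uint64_t num, int factor)
{
	if (factor < 0)
		return 0;
	if (factor >= 100)
		return num;
	/* Multiplying first keeps precision; fine for chunk-sized values. */
	return num * (uint64_t)factor / 100;
}

/*
 * Simplified usage filter: returns 1 when the chunk should be skipped,
 * 0 when it is less than `usage` percent full and should be relocated.
 */
static int chunk_usage_filter(uint64_t chunk_size, uint64_t chunk_used,
			      int usage)
{
	return chunk_used < div_factor_fine(chunk_size, usage) ? 0 : 1;
}
```

With this variant a negative usage value makes the threshold 0, so no chunk passes the filter, which seems to be the more predictable outcome of the two options mentioned in the review.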
Arne Jansen
2011-Nov-01 10:29 UTC
Re: [PATCH 14/21] Btrfs: save restripe parameters to disk
On 23.08.2011 22:01, Ilya Dryomov wrote:
> Introduce a new btree objectid for storing restripe item. The reason is
> to be able to resume restriper after a crash with the same parameters.
> Restripe item has a very high objectid and goes into tree of tree roots.
>
> The key for the new item is as follows:
>
> 	[ BTRFS_RESTRIPE_OBJECTID ; 0 ; 0 ]
>
> Older kernels simply ignore it so it's safe to mount with an older
> kernel and then go back to the newer one.
>
> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
> ---
>  fs/btrfs/ctree.h   |  127 +++++++++++++++++++++++++++++++++++++++++++++++++++-
>  fs/btrfs/volumes.c |  105 ++++++++++++++++++++++++++++++++++++++++++-
>  2 files changed, 228 insertions(+), 4 deletions(-)
>
> diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
> index 65d7562..b524034 100644
> --- a/fs/btrfs/ctree.h
> +++ b/fs/btrfs/ctree.h
> @@ -85,6 +85,9 @@ struct btrfs_ordered_sum;
>  /* holds checksums of all the data extents */
>  #define BTRFS_CSUM_TREE_OBJECTID 7ULL
>
> +/* for storing restripe params in the root tree */
> +#define BTRFS_RESTRIPE_OBJECTID -4ULL
> +
>  /* orhpan objectid for tracking unlinked/truncated files */
>  #define BTRFS_ORPHAN_OBJECTID -5ULL
>
> @@ -649,6 +652,47 @@ struct btrfs_root_ref {
>  	__le16 name_len;
>  } __attribute__ ((__packed__));
>
> +/*
> + * Restriper stuff
> + */
> +struct btrfs_disk_restripe_args {
> +	/* profiles to touch, in-memory format */
> +	__le64 profiles;
> +
> +	/* usage filter */
> +	__le64 usage;
> +
> +	/* devid filter */
> +	__le64 devid;
> +
> +	/* devid subset filter [pstart..pend) */
> +	__le64 pstart;
> +	__le64 pend;
> +
> +	/* btrfs virtual address space subset filter [vstart..vend) */
> +	__le64 vstart;
> +	__le64 vend;
> +
> +	/* profile to convert to, in-memory format */
> +	__le64 target;
> +
> +	/* BTRFS_RESTRIPE_ARGS_* */
> +	__le64 flags;
> +
> +	__le64 unused[8];
> +} __attribute__ ((__packed__));
> +
> +struct btrfs_restripe_item {
> +	/* BTRFS_RESTRIPE_* */
> +	__le64 flags;
> +
> +	struct btrfs_disk_restripe_args data;
> +	struct btrfs_disk_restripe_args sys;
> +	struct btrfs_disk_restripe_args meta;
> +
> +	__le64 unused[4];
> +} __attribute__ ((__packed__));

What are those unused fields for? As I understand it, the restripe_item
is only temporary and gets removed after the restripe has finished, so I
don't see much point in leaving space for future expansions. You have the
size of the struct anyway, or can determine which fields to access
through the flags field.

> +
>  #define BTRFS_FILE_EXTENT_INLINE 0
>  #define BTRFS_FILE_EXTENT_REG 1
>  #define BTRFS_FILE_EXTENT_PREALLOC 2
> @@ -727,7 +771,8 @@ struct btrfs_csum_item {
>  					 BTRFS_BLOCK_GROUP_RAID10)
>  /*
>   * We need a bit for restriper to be able to tell when chunks of type
> - * SINGLE are available. It is used in avail_*_alloc_bits.
> + * SINGLE are available. It is used in avail_*_alloc_bits and restripe
> + * item fields.
>   */
>  #define BTRFS_AVAIL_ALLOC_BIT_SINGLE (1 << 7)
>
> @@ -2000,8 +2045,86 @@ static inline bool btrfs_root_readonly(struct btrfs_root *root)
>  	return root->root_item.flags & BTRFS_ROOT_SUBVOL_RDONLY;
>  }
>
> -/* struct btrfs_super_block */
> +/* struct btrfs_restripe_item */
> +BTRFS_SETGET_FUNCS(restripe_flags, struct btrfs_restripe_item, flags, 64);
> +
> +static inline void btrfs_restripe_data(struct extent_buffer *eb,
> +				       struct btrfs_restripe_item *ri,
> +				       struct btrfs_disk_restripe_args *ra)
> +{
> +	read_eb_member(eb, ri, struct btrfs_restripe_item, data, ra);
> +}
>
> +static inline void btrfs_set_restripe_data(struct extent_buffer *eb,
> +					   struct btrfs_restripe_item *ri,
> +					   struct btrfs_disk_restripe_args *ra)
> +{
> +	write_eb_member(eb, ri, struct btrfs_restripe_item, data, ra);
> +}
> +
> +static inline void btrfs_restripe_meta(struct extent_buffer *eb,
> +				       struct btrfs_restripe_item *ri,
> +				       struct btrfs_disk_restripe_args *ra)
> +{
> +	read_eb_member(eb, ri, struct btrfs_restripe_item, meta, ra);
> +}
> +
> +static inline void btrfs_set_restripe_meta(struct extent_buffer *eb,
> +					   struct btrfs_restripe_item *ri,
> +					   struct btrfs_disk_restripe_args *ra)
> +{
> +	write_eb_member(eb, ri, struct btrfs_restripe_item, meta, ra);
> +}
> +
> +static inline void btrfs_restripe_sys(struct extent_buffer *eb,
> +				      struct btrfs_restripe_item *ri,
> +				      struct btrfs_disk_restripe_args *ra)
> +{
> +	read_eb_member(eb, ri, struct btrfs_restripe_item, sys, ra);
> +}
> +
> +static inline void btrfs_set_restripe_sys(struct extent_buffer *eb,
> +					  struct btrfs_restripe_item *ri,
> +					  struct btrfs_disk_restripe_args *ra)
> +{
> +	write_eb_member(eb, ri, struct btrfs_restripe_item, sys, ra);
> +}
> +
> +static inline void
> +btrfs_disk_restripe_args_to_cpu(struct btrfs_restripe_args *cpu,
> +				struct btrfs_disk_restripe_args *disk)
> +{
> +	memset(cpu, 0, sizeof(*cpu));
> +
> +	cpu->profiles = le64_to_cpu(disk->profiles);
> +	cpu->usage = le64_to_cpu(disk->usage);
> +	cpu->devid = le64_to_cpu(disk->devid);
> +	cpu->pstart = le64_to_cpu(disk->pstart);
> +	cpu->pend = le64_to_cpu(disk->pend);
> +	cpu->vstart = le64_to_cpu(disk->vstart);
> +	cpu->vend = le64_to_cpu(disk->vend);
> +	cpu->target = le64_to_cpu(disk->target);
> +	cpu->flags = le64_to_cpu(disk->flags);
> +}
> +
> +static inline void
> +btrfs_cpu_restripe_args_to_disk(struct btrfs_disk_restripe_args *disk,
> +				struct btrfs_restripe_args *cpu)
> +{
> +	memset(disk, 0, sizeof(*disk));
> +
> +	disk->profiles = cpu_to_le64(cpu->profiles);
> +	disk->usage = cpu_to_le64(cpu->usage);
> +	disk->devid = cpu_to_le64(cpu->devid);
> +	disk->pstart = cpu_to_le64(cpu->pstart);
> +	disk->pend = cpu_to_le64(cpu->pend);
> +	disk->vstart = cpu_to_le64(cpu->vstart);
> +	disk->vend = cpu_to_le64(cpu->vend);
> +	disk->target = cpu_to_le64(cpu->target);
> +	disk->flags = cpu_to_le64(cpu->flags);
> +}
> +
> +/* struct btrfs_super_block */
>  BTRFS_SETGET_STACK_FUNCS(super_bytenr, struct btrfs_super_block, bytenr, 64);
>  BTRFS_SETGET_STACK_FUNCS(super_flags, struct btrfs_super_block, flags, 64);
>  BTRFS_SETGET_STACK_FUNCS(super_generation, struct btrfs_super_block,
> diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
> index eccd458..1057ad3 100644
> --- a/fs/btrfs/volumes.c
> +++ b/fs/btrfs/volumes.c
> @@ -2150,6 +2150,97 @@ error:
>  	return ret;
>  }
>
> +static int insert_restripe_item(struct btrfs_root *root,
> +				struct restripe_control *rctl)
> +{
> +	struct btrfs_trans_handle *trans;
> +	struct btrfs_restripe_item *item;
> +	struct btrfs_disk_restripe_args disk_rargs;
> +	struct btrfs_path *path;
> +	struct extent_buffer *leaf;
> +	struct btrfs_key key;
> +	int ret, err;
> +
> +	path = btrfs_alloc_path();
> +	if (!path)
> +		return -ENOMEM;
> +
> +	trans = btrfs_start_transaction(root, 0);
> +	if (IS_ERR(trans)) {
> +		btrfs_free_path(path);
> +		return PTR_ERR(trans);
> +	}
> +
> +	key.objectid = BTRFS_RESTRIPE_OBJECTID;
> +	key.type = 0;
> +	key.offset = 0;
> +
> +	ret = btrfs_insert_empty_item(trans, root, path, &key,
> +				      sizeof(*item));
> +	if (ret)
> +		goto out;
> +
> +	leaf = path->nodes[0];
> +	item = btrfs_item_ptr(leaf, path->slots[0], struct btrfs_restripe_item);
> +
> +	memset_extent_buffer(leaf, 0, (unsigned long)item, sizeof(*item));
> +
> +	btrfs_cpu_restripe_args_to_disk(&disk_rargs, &rctl->data);
> +	btrfs_set_restripe_data(leaf, item, &disk_rargs);
> +	btrfs_cpu_restripe_args_to_disk(&disk_rargs, &rctl->meta);
> +	btrfs_set_restripe_meta(leaf, item, &disk_rargs);
> +	btrfs_cpu_restripe_args_to_disk(&disk_rargs, &rctl->sys);
> +	btrfs_set_restripe_sys(leaf, item, &disk_rargs);
> +
> +	btrfs_set_restripe_flags(leaf, item, rctl->flags);
> +
> +	btrfs_mark_buffer_dirty(leaf);
> +out:
> +	btrfs_free_path(path);
> +	err = btrfs_commit_transaction(trans, root);
> +	if (err && !ret)
> +		ret = err;
> +	return ret;
> +}
> +
> +static int del_restripe_item(struct btrfs_root *root)
> +{
> +	struct btrfs_trans_handle *trans;
> +	struct btrfs_path *path;
> +	struct btrfs_key key;
> +	int ret, err;
> +
> +	path = btrfs_alloc_path();
> +	if (!path)
> +		return -ENOMEM;
> +
> +	trans = btrfs_start_transaction(root, 0);
> +	if (IS_ERR(trans)) {
> +		btrfs_free_path(path);
> +		return PTR_ERR(trans);
> +	}
> +
> +	key.objectid = BTRFS_RESTRIPE_OBJECTID;
> +	key.type = 0;
> +	key.offset = 0;
> +
> +	ret = btrfs_search_slot(trans, root, &key, path, -1, 1);
> +	if (ret < 0)
> +		goto out;
> +	if (ret > 0) {
> +		ret = -ENOENT;
> +		goto out;
> +	}
> +
> +	ret = btrfs_del_item(trans, root, path);
> +out:
> +	btrfs_free_path(path);
> +	err = btrfs_commit_transaction(trans, root);
> +	if (err && !ret)
> +		ret = err;
> +	return ret;
> +}
> +
>  /*
>   * Should be called with both restripe and volume mutexes held to
>   * serialize other volume operations (add_dev/rm_dev/resize) wrt
> @@ -2485,6 +2576,7 @@ int btrfs_restripe(struct restripe_control *rctl)
>  {
>  	struct btrfs_fs_info *fs_info = rctl->fs_info;
>  	u64 allowed;
> +	int err;
>  	int ret;
>
>  	mutex_lock(&fs_info->volume_mutex);
> @@ -2572,16 +2664,25 @@ int btrfs_restripe(struct restripe_control *rctl)
>  	}
>
>  do_restripe:
> +	ret = insert_restripe_item(fs_info->tree_root, rctl);
> +	if (ret && ret != -EEXIST)
> +		goto out;
> +	BUG_ON(ret == -EEXIST);
> +
>  	set_restripe_control(rctl);
>  	mutex_unlock(&fs_info->volume_mutex);
>
> -	ret = __btrfs_restripe(fs_info->dev_root);
> +	err = __btrfs_restripe(fs_info->dev_root);
>
>  	mutex_lock(&fs_info->volume_mutex);
> +
>  	unset_restripe_control(fs_info);
> +	ret = del_restripe_item(fs_info->tree_root);
> +	BUG_ON(ret);
> +
>  	mutex_unlock(&fs_info->volume_mutex);
>
> -	return ret;
> +	return err;
>
>  out:
>  	mutex_unlock(&fs_info->volume_mutex);
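The remark that the size of the item already tells a reader which fields exist can be illustrated with a small userspace sketch. It is not taken from the series: get_le64(), put_le64() and item_read_u64() are hypothetical helpers, and the convention that a field beyond the stored size reads as 0 is an assumption chosen for the example.

```c
#include <stddef.h>
#include <stdint.h>

/* Decode a little-endian 64-bit value, independent of host endianness. */
static uint64_t get_le64(const unsigned char *p)
{
	uint64_t v = 0;
	int i;

	for (i = 7; i >= 0; i--)
		v = (v << 8) | p[i];
	return v;
}

/* Encode a 64-bit value as little-endian bytes. */
static void put_le64(unsigned char *p, uint64_t v)
{
	int i;

	for (i = 0; i < 8; i++) {
		p[i] = (unsigned char)(v & 0xff);
		v >>= 8;
	}
}

/*
 * Hypothetical reader: return the 64-bit field at byte offset `off` of an
 * item that is `item_size` bytes long. A field that lies beyond the stored
 * size reads as 0, so a shorter item written by an older format stays
 * readable without reserving padding up front.
 */
static uint64_t item_read_u64(const unsigned char *item, size_t item_size,
			      size_t off)
{
	if (item_size < 8 || off > item_size - 8)
		return 0;
	return get_le64(item + off);
}
```

Whether this is preferable to reserved fields depends on how often the item format is expected to change; the sketch only shows that the size-based approach needs no padding.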
On 23.08.2011 22:01, Ilya Dryomov wrote:
> On mount, if restripe item is found, resume restripe in a separate
> kernel thread.
>
> Try to be smart to continue roughly where previous balance (or convert)
> was interrupted. For chunk types that were being converted to some
> profile we turn on soft convert, in case of a simple balance we turn on
> usage filter and relocate only less-than-90%-full chunks of that type.
> These are just heuristics but they help quite a bit, and can be improved
> in future.
>

Instead of trying to find out where you left off, can't you just save a
pointer in your restripe_item every time a chunk has finished?
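The idea of saving a pointer after every finished chunk can be sketched as a small userspace simulation. Everything here is hypothetical: struct restripe_cursor, its next_offset field and restripe_walk() do not exist in the series, and the sketch assumes that chunks are visited from the highest offset downwards (as the relocation loop in patch 05 does) and that newly allocated chunks never land at or below the saved cursor.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical persistent state; it would live in the restripe item. */
struct restripe_cursor {
	uint64_t next_offset;	/* only chunks with offset <= this remain */
};

/*
 * Walk chunk offsets (sorted ascending) from high to low and process at
 * most `budget` chunks. The cursor is advanced after every finished chunk,
 * so an interrupted run can resume exactly where it stopped.
 * Returns the number of chunks processed in this call.
 */
static size_t restripe_walk(const uint64_t *chunks, size_t nr,
			    struct restripe_cursor *cur, size_t budget)
{
	size_t done = 0;
	size_t i = nr;

	while (i > 0 && done < budget) {
		uint64_t off = chunks[--i];

		if (off > cur->next_offset)
			continue;	/* finished before the interruption */
		if (off == 0)
			break;		/* chunk zero is special */
		/* ... relocate the chunk here ... */
		cur->next_offset = off - 1;	/* would be written to disk */
		done++;
	}
	return done;
}
```

Starting with next_offset set to UINT64_MAX corresponds to a fresh run; a resumed run simply loads the stored value and calls the same function again.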
David Sterba
2011-Nov-01 11:07 UTC
Re: [PATCH 05/21] Btrfs: add basic restriper infrastructure
On Tue, Nov 01, 2011 at 11:08:38AM +0100, Arne Jansen wrote:
> > +/*
> > + * Should be called with restripe_mutex held
> > + */
> > +int btrfs_restripe(struct restripe_control *rctl)
> > +{
...
> > +	if (rctl->data.target & BTRFS_BLOCK_GROUP_DUP) {
> > +		printk(KERN_ERR "btrfs: dup for data is not allowed\n");
> > +		ret = -EINVAL;
> > +		goto out;
> > +	}
>
> It would be good to get these error messages somehow to the user,
> or at least give the user a hint to look in dmesg.

The restriper command ends with EINVAL, which in most cases is returned
as the result of the ioctl, and the progs counterpart will print:

	fprintf(stderr, "ERROR: error during restriping '%s' "
		"- %s\n", path, strerror(e));
	return 19;

The hint should go there, imho.


david
Arne Jansen
2011-Nov-01 11:08 UTC
Re: [PATCH 05/21] Btrfs: add basic restriper infrastructure
On 01.11.2011 12:07, David Sterba wrote:
> On Tue, Nov 01, 2011 at 11:08:38AM +0100, Arne Jansen wrote:
>>> +/*
>>> + * Should be called with restripe_mutex held
>>> + */
>>> +int btrfs_restripe(struct restripe_control *rctl)
>>> +{
> ...
>>> +	if (rctl->data.target & BTRFS_BLOCK_GROUP_DUP) {
>>> +		printk(KERN_ERR "btrfs: dup for data is not allowed\n");
>>> +		ret = -EINVAL;
>>> +		goto out;
>>> +	}
>>
>> It would be good to get these error messages somehow to the user,
>> or at least give the user a hint to look in dmesg.
>
> the restriper command ends with EINVAL which is in most cases returned
> as a result of the ioctl and progs counterpart will
>
> 	fprintf(stderr, "ERROR: error during restriping '%s' "
> 		"- %s\n", path, strerror(e));
> 	return 19;
>
> the hint should go there imho.

Though it would still be much nicer to get a proper error message to the
user directly.

>
>
> david
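A minimal sketch of what the dmesg hint in the userspace tool could look like is shown below. format_restripe_error() is a hypothetical helper, not a function from btrfs-progs, and the wording of the hint is an assumption; only the base message mirrors the fprintf() quoted in the thread.

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: format the message for a failed restripe ioctl.
 * For EINVAL the kernel has logged the precise reason via printk(), so
 * the message points the user at the kernel log.
 * Returns the number of characters snprintf() would have written.
 */
static int format_restripe_error(char *buf, size_t len, const char *path,
				 int err)
{
	if (err == EINVAL)
		return snprintf(buf, len,
				"ERROR: error during restriping '%s' - %s "
				"(see dmesg for details)\n",
				path, strerror(err));

	return snprintf(buf, len,
			"ERROR: error during restriping '%s' - %s\n",
			path, strerror(err));
}
```

A caller would print the buffer to stderr after the ioctl fails. This only covers the hint; delivering the exact kernel message to the user directly would need a different mechanism.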
On 23.08.2011 22:01, Ilya Dryomov wrote:
> Implement an ioctl for pausing restriper. This pauses the relocation,
> but restripe is still considered to be "in progress": restriper item is
> not deleted, other volume operations cannot be started, etc. If paused
> in the middle of profile changing operation we will continue making
> allocations with the target profile.
>
> Add a hook to close_ctree() to be able to pause restriper and free it's
> data structures on unmount. (It's safe to unmount when restriper is in
> 'paused' state, we will resume with the same parameters on the next
> mount)
>
> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
> ---
>  fs/btrfs/disk-io.c |    3 +++
>  fs/btrfs/ioctl.c   |    2 ++
>  fs/btrfs/ioctl.h   |    1 +
>  fs/btrfs/volumes.c |   44 ++++++++++++++++++++++++++++++++++++++++++--
>  fs/btrfs/volumes.h |    2 ++
>  5 files changed, 50 insertions(+), 2 deletions(-)
>
> diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
> index 662a6e6..7db5c50 100644
> --- a/fs/btrfs/disk-io.c
> +++ b/fs/btrfs/disk-io.c
> @@ -2542,6 +2542,9 @@ int close_ctree(struct btrfs_root *root)
>  	fs_info->closing = 1;
>  	smp_mb();
>
> +	/* pause restriper and free restripe_ctl */
> +	btrfs_pause_restripe(root->fs_info, 1);
> +
>  	btrfs_scrub_cancel(root);
>
>  	/* wait for any defraggers to finish */
> diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c
> index d8bdb67..61978ac 100644
> --- a/fs/btrfs/ioctl.c
> +++ b/fs/btrfs/ioctl.c
> @@ -2921,6 +2921,8 @@ static long btrfs_ioctl_restripe_ctl(struct btrfs_root *root,
>  	switch (cmd) {
>  	case BTRFS_RESTRIPE_CTL_CANCEL:
>  		return btrfs_cancel_restripe(root->fs_info);
> +	case BTRFS_RESTRIPE_CTL_PAUSE:
> +		return btrfs_pause_restripe(root->fs_info, 0);
>  	}
>
>  	return -EINVAL;
> diff --git a/fs/btrfs/ioctl.h b/fs/btrfs/ioctl.h
> index 4f6ead5..e468d5b 100644
> --- a/fs/btrfs/ioctl.h
> +++ b/fs/btrfs/ioctl.h
> @@ -110,6 +110,7 @@ struct btrfs_ioctl_fs_info_args {
>  };
>
>  #define BTRFS_RESTRIPE_CTL_CANCEL	1
> +#define BTRFS_RESTRIPE_CTL_PAUSE	2
>
>  struct btrfs_restripe_args {
>  	__u64 profiles;
> diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
> index cd43368..65deaa7 100644
> --- a/fs/btrfs/volumes.c
> +++ b/fs/btrfs/volumes.c
> @@ -2555,7 +2555,8 @@ static int __btrfs_restripe(struct btrfs_root *dev_root)
>  	while (1) {
>  		struct btrfs_fs_info *fs_info = dev_root->fs_info;
>
> -		if (test_bit(RESTRIPE_CANCEL_REQ, &fs_info->restripe_state)) {
> +		if (test_bit(RESTRIPE_CANCEL_REQ, &fs_info->restripe_state) ||
> +		    test_bit(RESTRIPE_PAUSE_REQ, &fs_info->restripe_state)) {
>  			ret = -ECANCELED;
>  			goto error;
>  		}
> @@ -2730,7 +2731,9 @@ do_restripe:
>  	mutex_lock(&fs_info->restripe_mutex);
>  	clear_bit(RESTRIPE_RUNNING, &fs_info->restripe_state);
>
> -	if (test_bit(RESTRIPE_CANCEL_REQ, &fs_info->restripe_state)) {
> +	if (test_bit(RESTRIPE_CANCEL_REQ, &fs_info->restripe_state) ||
> +	    (!test_bit(RESTRIPE_PAUSE_REQ, &fs_info->restripe_state) &&
> +	     !test_bit(RESTRIPE_CANCEL_REQ, &fs_info->restripe_state))) {
>  		mutex_lock(&fs_info->volume_mutex);
>
>  		unset_restripe_control(fs_info);
> @@ -2858,6 +2861,43 @@ out:
>  	return ret;
>  }

I don't see a difference in what CANCEL_REQ and PAUSE_REQ do, so it
seems one of them would be enough.

>
> +int btrfs_pause_restripe(struct btrfs_fs_info *fs_info, int unset)
> +{
> +	int ret = 0;
> +
> +	mutex_lock(&fs_info->restripe_mutex);
> +	if (!fs_info->restripe_ctl) {
> +		ret = -ENOTCONN;
> +		goto out;
> +	}
> +
> +	/* only running restripe can be paused */
> +	if (!test_bit(RESTRIPE_RUNNING, &fs_info->restripe_state)) {
> +		ret = -ENOTCONN;
> +		goto out_unset;
> +	}
> +
> +	set_bit(RESTRIPE_PAUSE_REQ, &fs_info->restripe_state);
> +	while (test_bit(RESTRIPE_RUNNING, &fs_info->restripe_state)) {
> +		mutex_unlock(&fs_info->restripe_mutex);
> +		wait_event(fs_info->restripe_wait,
> +			   !test_bit(RESTRIPE_RUNNING,
> +				     &fs_info->restripe_state));
> +		mutex_lock(&fs_info->restripe_mutex);
> +	}
> +	clear_bit(RESTRIPE_PAUSE_REQ, &fs_info->restripe_state);
> +
> +out_unset:
> +	if (unset) {
> +		mutex_lock(&fs_info->volume_mutex);
> +		unset_restripe_control(fs_info);
> +		mutex_unlock(&fs_info->volume_mutex);
> +	}
> +out:
> +	mutex_unlock(&fs_info->restripe_mutex);
> +	return ret;
> +}
> +

This looks very similar to cancel_restripe. It should be easy to merge
them to one function without making a mess out of it.

>  /*
>   * shrinking a device means finding all of the device extents past
>   * the new size, and then following the back refs to the chunks.
> diff --git a/fs/btrfs/volumes.h b/fs/btrfs/volumes.h
> index dd1fa7f..b8c234a 100644
> --- a/fs/btrfs/volumes.h
> +++ b/fs/btrfs/volumes.h
> @@ -204,6 +204,7 @@ struct map_lookup {
>   */
>  #define RESTRIPE_RUNNING	0
>  #define RESTRIPE_CANCEL_REQ	1
> +#define RESTRIPE_PAUSE_REQ	2
>
>  struct btrfs_restripe_args;
>  struct restripe_control {
> @@ -261,6 +262,7 @@ int btrfs_balance(struct btrfs_root *dev_root);
>  int btrfs_restripe(struct restripe_control *rctl, int resume);
>  int btrfs_recover_restripe(struct btrfs_root *tree_root);
>  int btrfs_cancel_restripe(struct btrfs_fs_info *fs_info);
> +int btrfs_pause_restripe(struct btrfs_fs_info *fs_info, int unset);
>  int btrfs_chunk_readonly(struct btrfs_root *root, u64 chunk_offset);
>  int find_free_dev_extent(struct btrfs_trans_handle *trans,
>  			 struct btrfs_device *device, u64 num_bytes,
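The merge suggested in the review could look roughly like the following
userspace model. It is only a sketch under stated assumptions: struct
restripe_model, restripe_stop() and the model_* wrappers are invented names,
pthread primitives stand in for restripe_mutex and restripe_wait, a single
stop_req flag replaces the two request bits, and cancel is assumed to differ
from pause only in dropping the control structure (the cancel patch itself is
not quoted in this mail, so that part is a guess).

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/* Simplified stand-in for the restripe fields of btrfs_fs_info. */
struct restripe_model {
	pthread_mutex_t mutex;	/* plays the role of restripe_mutex */
	pthread_cond_t wait;	/* plays the role of restripe_wait */
	bool has_ctl;		/* models restripe_ctl != NULL */
	bool running;		/* models RESTRIPE_RUNNING */
	bool stop_req;		/* one flag instead of CANCEL_REQ/PAUSE_REQ */
};

/*
 * Shared body for pause and cancel: request a stop, wait until the worker
 * clears 'running', then optionally drop the control structure.
 * Mirrors the control flow of the quoted btrfs_pause_restripe().
 */
int restripe_stop(struct restripe_model *m, bool drop_ctl)
{
	int ret = 0;

	assert(m != NULL);
	pthread_mutex_lock(&m->mutex);
	if (!m->has_ctl) {
		ret = -ENOTCONN;
		goto out;
	}
	if (!m->running) {
		ret = -ENOTCONN;
		goto out_unset;
	}
	m->stop_req = true;
	while (m->running)	/* cond_wait drops and retakes the mutex */
		pthread_cond_wait(&m->wait, &m->mutex);
	m->stop_req = false;
out_unset:
	if (drop_ctl)
		m->has_ctl = false;
out:
	pthread_mutex_unlock(&m->mutex);
	return ret;
}

/* Worker side: called when the relocation loop exits after seeing stop_req. */
void restripe_worker_finish(struct restripe_model *m)
{
	pthread_mutex_lock(&m->mutex);
	m->running = false;
	pthread_cond_broadcast(&m->wait);
	pthread_mutex_unlock(&m->mutex);
}

/* Thin wrappers keep the two entry points used by the ioctl switch. */
int model_pause(struct restripe_model *m, bool unset)
{
	return restripe_stop(m, unset);
}

int model_cancel(struct restripe_model *m)
{
	return restripe_stop(m, true);
}
```

In this model the requester decides whether the control structure goes away,
whereas the quoted patch makes that decision on the worker side based on
which request bit is set. Whether that shift is acceptable in the kernel code
is a design question for the patch author.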
I have a fs that started with the default policy of metadata=dup. I
added a second device and rebalanced, and so the metadata chunks were
converted to raid1. Now I can not remove the second device because
raid1 requires at least two devices.

If I understand this patch series correctly, I can use it to manually
convert those raid1 chunks back to dup, and then remove the second
device. It occurs to me though, that in the restripe process, the
newly created dup chunks can be allocated from either disk still, and
any that are allocated on the second disk will then need to be
relocated in order to remove that disk. This seems inefficient, so I
was wondering if there is a way to make sure that during the restripe,
only the disk I intend to keep is allocated from to create the dup
chunks, and thus avoid the need to relocate when I remove the second disk?
On Mon, Nov 14, 2011 at 06:59:14PM -0500, Phillip Susi wrote:
> I have a fs that started with the default policy of metadata=dup. I
> added a second device and rebalanced, and so the metadata chunks were
> converted to raid1. Now I can not remove the second device because
> raid1 requires at least two devices.
>
> If I understand this patch series correctly, I can use it to manually
> convert those raid1 chunks back to dup, and then remove the second
> device. It occurs to me though, that in the restripe process, the
> newly created dup chunks can be allocated from either disk still, and
> any that are allocated on the second disk will then need to be
> relocated in order to remove that disk. This seems inefficient, so I
> was wondering if there is a way to make sure that during the restripe,
> only the disk I intend to keep is allocated from to create the dup
> chunks, and thus avoid the need to relocate when I remove the second disk?

Restriper won't let you do raid1 -> dup transition because dup is only
allowed for a single-spindle FS, so you'll end up with error "btrfs:
unable to start restripe ...".

There is no way to prioritize disks during restripe. To get dup back
you'll have to convert everything to single, remove the second drive and
then convert metadata from single to dup.

Thanks,

		Ilya
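The two constraints mentioned in this exchange (raid1 needs at least two
devices, dup is only allowed on a single-device FS) and the resulting
three-step sequence can be sketched as a small check. Everything named here
is hypothetical: the PROFILE_* bits are made-up values for the sketch, not
the on-disk BTRFS_BLOCK_GROUP_* flags, and check_target_profile() is not a
function from the patch series.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Illustrative profile bits; values are chosen for this sketch only. */
#define PROFILE_SINGLE	(1ULL << 0)
#define PROFILE_DUP	(1ULL << 1)
#define PROFILE_RAID1	(1ULL << 2)

/*
 * Hypothetical validation: reject a conversion target that the FS cannot
 * hold with its current number of devices.
 */
int check_target_profile(uint64_t target, unsigned int num_devices)
{
	if (num_devices == 0)
		return -EINVAL;
	/* dup is only allowed on a single-device filesystem */
	if ((target & PROFILE_DUP) && num_devices > 1)
		return -EINVAL;
	/* raid1 requires at least two devices */
	if ((target & PROFILE_RAID1) && num_devices < 2)
		return -EINVAL;
	return 0;
}

/* Walk through the sequence described above for a two-disk FS. */
void demo_raid1_to_dup(void)
{
	unsigned int devs = 2;

	/* direct raid1 -> dup is refused while two devices are present */
	assert(check_target_profile(PROFILE_DUP, devs) == -EINVAL);
	/* step 1: convert everything to single */
	assert(check_target_profile(PROFILE_SINGLE, devs) == 0);
	/* step 2: remove the second drive */
	devs--;
	/* step 3: convert metadata from single to dup */
	assert(check_target_profile(PROFILE_DUP, devs) == 0);
}
```

The sketch says nothing about which disk new chunks land on; as noted above,
there is no way to prioritize disks during restripe.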
On 11/15/2011 4:22 AM, Ilya Dryomov wrote:
> Restriper won't let you do raid1 -> dup transition because dup is only
> allowed for a single-spindle FS, so you'll end up with error "btrfs:
> unable to start restripe ...".
>
> There is no way to prioritize disks during restripe. To get dup back
> you'll have to convert everything to single, remove the second drive and
> then convert metadata from single to dup.

So there is no way to put a disk into read only mode and prevent
allocations of new chunks there?

It seems like both of these limitations are highly undesirable when
trying to recover from a failing disk. You don't want any more data
being written to the failing disk while you are trying to remove it,
and you certainly don't want to drop back to a single copy of data
that is then written to the failing disk.
On Tue, Nov 15, 2011 at 09:33:14AM -0500, Phillip Susi wrote:
> On 11/15/2011 4:22 AM, Ilya Dryomov wrote:
> >Restriper won't let you do raid1 -> dup transition because dup is only
> >allowed for a single-spindle FS, so you'll end up with error "btrfs:
> >unable to start restripe ...".
> >
> >There is no way to prioritize disks during restripe. To get dup back
> >you'll have to convert everything to single, remove the second drive and
> >then convert metadata from single to dup.
>
> So there is no way to put a disk into read only mode and prevent
> allocations of new chunks there?
>
> It seems like both of these limitations are highly undesirable when
> trying to recover from a failing disk. You don't want any more data
> being written to the failing disk while you are trying to remove it,
> and you certainly don't want to drop back to a single copy of data
> that is then written to the failing disk.

If you have a failing disk in a raid setup, you don't need to downgrade
your raid: you can add a third drive and remove the failing one. But
that's inconvenient, and most of the time you'll have to do a full
balance.

So another thing I'm working on is drive swap. When it's done it will
take care of the failing disk scenario. If you have a raid setup and
one of the disks has gone bad you'll be able to say

	btrfs device replace FAILED NEW <mountpoint>

and it will put a valid copy onto the fresh drive, basically doing a
raid rebuild.

Thanks,

		Ilya
On 08/23/2011 04:01 PM, Ilya Dryomov wrote:
> Hello,
>
> This patch series adds an initial implementation of restriper (it's
> a clever name for relocation framework that allows to do selective
> profile changing and selective balancing with some goodies like
> pausing/resuming and reporting progress to the user.
>
> Profile changing is global (per-FS) so far, per-subvolume profiles
> require some discussion and can be implemented in future. This is
> a RFC so some features/problems are not yet implemented/resolved.
> The current TODO list is as follows:

I managed to use these patches to convert the raid1 system and
metadata chunks back to single and drop the second disk from a two
disk array. In doing so I noticed that the restriper required a force
switch to downgrade raid1 to single. This seems completely
unnecessary to me. A force switch to btrfs device delete might make
sense since delete may or may not force a downgrade, but with
restripe, the request to convert from raid1 to single is already quite
explicit with no room for ambiguity, so there should be no need for an
additional confirmation switch.