Hidetoshi Seto
2013-Sep-05 06:51 UTC
[PATCH v2 0/3] btrfs-progs: prevent mkfs from aborting with small volume
Here are 3 patches to avoid undesired aborts of mkfs.btrfs.
These are based on top of Chris's btrfs-progs.git:

  git://git.kernel.org/pub/scm/linux/kernel/git/mason/btrfs-progs.git

Thanks,
H.Seto

Hidetoshi Seto (3):
  btrfs-progs: error if device for mkfs is too small
  btrfs-progs: error if device have no space to make primary chunks
  btrfs-progs: calculate available blocks on device properly

 ctree.h   |    8 +++++
 mkfs.c    |   23 +++++++++++++
 volumes.c |  104 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++---
 3 files changed, 129 insertions(+), 6 deletions(-)

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Hidetoshi Seto
2013-Sep-05 06:53 UTC
[PATCH v2 1/3] btrfs-progs: error if device for mkfs is too small
Eric pointed out that mkfs aborts if the specified volume is too small:

  # truncate --size=2m testfile
  # ./mkfs.btrfs testfile
   :
  SMALL VOLUME: forcing mixed metadata/data groups
  mkfs.btrfs: volumes.c:852: btrfs_alloc_chunk: Assertion `!(ret)' failed.
  Aborted (core dumped)

As the first step to fix problems around there, let mkfs report an error
if the size of the target volume is less than the size of the first system
block group, BTRFS_MKFS_SYSTEM_GROUP_SIZE (= 4MB).

Reported-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
---
 mkfs.c |    6 ++++++
 1 files changed, 6 insertions(+), 0 deletions(-)

diff --git a/mkfs.c b/mkfs.c
index b412b7e..a98fe54 100644
--- a/mkfs.c
+++ b/mkfs.c
@@ -1422,6 +1422,12 @@ int main(int ac, char **av)
 		}
 	}

+	/* To create the first block group and chunk 0 in make_btrfs */
+	if (dev_block_count < BTRFS_MKFS_SYSTEM_GROUP_SIZE) {
+		fprintf(stderr, "device is too small to make filesystem\n");
+		exit(1);
+	}
+
 	blocks[0] = BTRFS_SUPER_INFO_OFFSET;
 	for (i = 1; i < 7; i++) {
 		blocks[i] = BTRFS_SUPER_INFO_OFFSET + 1024 * 1024 +
--
1.7.1
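
For illustration only (not part of the patch): a minimal standalone sketch
of the size check described above. It assumes the 4MB value the message
gives for BTRFS_MKFS_SYSTEM_GROUP_SIZE. The names SKETCH_SYSTEM_GROUP_SIZE
and sketch_check_min_size() are hypothetical, and the sketch returns an
error code instead of calling exit(1) so that it can be exercised directly.

```c
#include <stdint.h>
#include <stdio.h>

/* Assumption: 4MB, as stated for BTRFS_MKFS_SYSTEM_GROUP_SIZE in the patch. */
#define SKETCH_SYSTEM_GROUP_SIZE ((uint64_t)4 * 1024 * 1024)

/*
 * Hypothetical helper: returns 0 if the device can hold the first system
 * block group, -1 (after printing the patch's message) if it cannot.
 */
static int sketch_check_min_size(uint64_t dev_block_count)
{
	if (dev_block_count < SKETCH_SYSTEM_GROUP_SIZE) {
		fprintf(stderr, "device is too small to make filesystem\n");
		return -1;
	}
	return 0;
}
```

Under these assumptions the 2MB test file from the report is rejected,
while a 4500K file still passes this check, which is the case the next
patch in the series deals with.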
Hidetoshi Seto
2013-Sep-05 06:55 UTC
[PATCH v2 2/3] btrfs-progs: error if device have no space to make primary chunks
The previous patch works fine if the size of the volume passed to mkfs is
less than 4MB. However, btrfs usually requires more than 4MB to work, and
the minimum preferred size depends on the raid setting etc.

This patch lets mkfs print an error message if it cannot allocate one of
the chunks that should be there at first.

[before]
  # truncate --size=4500K testfile
  # ./mkfs.btrfs -f testfile
   :
  SMALL VOLUME: forcing mixed metadata/data groups
  mkfs.btrfs: mkfs.c:84: make_root_dir: Assertion `!(ret)' failed.
  Aborted (core dumped)

[After]
  # truncate --size=4500K testfile
  # ./mkfs.btrfs -f testfile
   :
  SMALL VOLUME: forcing mixed metadata/data groups
  no space to alloc data/metadata chunk
  failed to setup the root directory

TBD: calculate the minimum size for the given settings and put it in the
error message to let the user know how large a volume is required.

Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
---
 mkfs.c |   17 +++++++++++++++++
 1 files changed, 17 insertions(+), 0 deletions(-)

diff --git a/mkfs.c b/mkfs.c
index a98fe54..bac122f 100644
--- a/mkfs.c
+++ b/mkfs.c
@@ -81,6 +81,11 @@ static int make_root_dir(struct btrfs_root *root, int mixed)
 					&chunk_start, &chunk_size,
 					BTRFS_BLOCK_GROUP_METADATA |
 					BTRFS_BLOCK_GROUP_DATA);
+		if (ret == -ENOSPC) {
+			fprintf(stderr,
+				"no space to alloc data/metadata chunk\n");
+			goto err;
+		}
 		BUG_ON(ret);
 		ret = btrfs_make_block_group(trans, root, 0,
 					     BTRFS_BLOCK_GROUP_METADATA |
@@ -93,6 +98,10 @@ static int make_root_dir(struct btrfs_root *root, int mixed)
 		ret = btrfs_alloc_chunk(trans, root->fs_info->extent_root,
 					&chunk_start, &chunk_size,
 					BTRFS_BLOCK_GROUP_METADATA);
+		if (ret == -ENOSPC) {
+			fprintf(stderr, "no space to alloc metadata chunk\n");
+			goto err;
+		}
 		BUG_ON(ret);
 		ret = btrfs_make_block_group(trans, root, 0,
 					     BTRFS_BLOCK_GROUP_METADATA,
@@ -110,6 +119,10 @@ static int make_root_dir(struct btrfs_root *root, int mixed)
 		ret = btrfs_alloc_chunk(trans, root->fs_info->extent_root,
 					&chunk_start, &chunk_size,
 					BTRFS_BLOCK_GROUP_DATA);
+		if (ret == -ENOSPC) {
+			fprintf(stderr, "no space to alloc data chunk\n");
+			goto err;
+		}
 		BUG_ON(ret);
 		ret = btrfs_make_block_group(trans, root, 0,
 					     BTRFS_BLOCK_GROUP_DATA,
@@ -181,6 +194,10 @@ static int create_one_raid_group(struct btrfs_trans_handle *trans,

 	ret = btrfs_alloc_chunk(trans, root->fs_info->extent_root,
 				&chunk_start, &chunk_size, type);
+	if (ret == -ENOSPC) {
+		fprintf(stderr, "not enough free space\n");
+		exit(1);
+	}
 	BUG_ON(ret);
 	ret = btrfs_make_block_group(trans, root->fs_info->extent_root, 0,
 				     type, BTRFS_FIRST_CHUNK_TREE_OBJECTID,
--
1.7.1
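
For illustration only (not part of the patch): a small standalone sketch of
the pattern the hunks above apply, namely reporting -ENOSPC from the
allocator as an ordinary error before any assertion can fire. Everything
here is hypothetical: struct sketch_dev, sketch_alloc_chunk() and
sketch_make_chunk() are simplified stand-ins that only track a free-byte
counter, and they are not the btrfs-progs functions.

```c
#include <errno.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical device model: only the number of free bytes is tracked. */
struct sketch_dev {
	uint64_t free_bytes;
};

/* Hypothetical allocator: fails with -ENOSPC when the request does not fit. */
static int sketch_alloc_chunk(struct sketch_dev *dev, uint64_t size)
{
	if (size > dev->free_bytes)
		return -ENOSPC;
	dev->free_bytes -= size;
	return 0;
}

/*
 * Caller in the style of the patch: an out-of-space result is reported
 * and returned to the caller instead of tripping an assertion.
 */
static int sketch_make_chunk(struct sketch_dev *dev, uint64_t size,
			     const char *what)
{
	int ret = sketch_alloc_chunk(dev, size);

	if (ret == -ENOSPC)
		fprintf(stderr, "no space to alloc %s chunk\n", what);
	return ret;
}
```

In this sketch a failed request leaves the free-byte counter untouched, so
the caller can print a message and unwind cleanly.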
Hidetoshi Seto
2013-Sep-05 06:57 UTC
[PATCH v2 3/3] btrfs-progs: calculate available blocks on device properly
I found that mkfs.btrfs aborts when the multiple volumes assigned to it
include a small volume:

  # parted /dev/sdf p
  Model: LSI MegaRAID SAS RMB (scsi)
  Disk /dev/sdf: 72.8GB
  Sector size (logical/physical): 512B/512B
  Partition Table: msdos

  Number  Start   End     Size    Type     File system  Flags
   1      32.3kB  72.4GB  72.4GB  primary
   2      72.4GB  72.8GB  461MB   primary

  # ./mkfs.btrfs -f /dev/sdf1 /dev/sdf2
   :
  SMALL VOLUME: forcing mixed metadata/data groups
  adding device /dev/sdf2 id 2
  mkfs.btrfs: volumes.c:852: btrfs_alloc_chunk: Assertion `!(ret)' failed.
  Aborted (core dumped)

This failure of btrfs_alloc_chunk was caused by the following steps:

 1) Since the small device has little space, mkfs tried to allocate a
    chunk as large as the available free space. So mkfs called
    btrfs_alloc_chunk with
    size = device->total_bytes - device->bytes_used.
 2) (According to the comment in the source code, to avoid overwriting
    the superblock,) btrfs_alloc_chunk starts taking chunks at an offset
    of 1MB. This means that the layout of a disk looks like:
     [[1MB at beginning for sb][allocated chunks]* ... free space ... ]
    and you can see that the free space available for allocation is:
     avail = device->total_bytes - device->bytes_used - 1MB.
 3) Therefore the free space is 1MB less than requested. Damn.

From further investigation I also found that this issue is easily
reproduced by using the -A, --alloc-start option:

  # truncate --size=1G testfile
  # ./mkfs.btrfs -A900M -f testfile
   :
  mkfs.btrfs: volumes.c:852: btrfs_alloc_chunk: Assertion `!(ret)' failed.
  Aborted (core dumped)

In this case there is only about 100MB for allocation, but
btrfs_alloc_chunk was going to allocate more than that.

The root cause of both of the above troubles is the same simple bug:
btrfs_alloc_chunk does not calculate the available bytes properly even
though it checks how many devices have enough room for the chunk to be
allocated.

So this patch introduces a new function, btrfs_device_avail_bytes(), which
returns the bytes available for allocation in the specified device.

Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
---
 ctree.h   |    8 +++++
 volumes.c |  104 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++---
 2 files changed, 106 insertions(+), 6 deletions(-)

diff --git a/ctree.h b/ctree.h
index 0b0d701..90be7ab 100644
--- a/ctree.h
+++ b/ctree.h
@@ -811,6 +811,14 @@ struct btrfs_csum_item {
 	u8 csum;
 } __attribute__ ((__packed__));

+/*
+ * We don't want to overwrite 1M at the beginning of device, even though
+ * there is our 1st superblock at 64k. Some possible reasons:
+ * - the first 64k blank is useful for some boot loader/manager
+ * - the first 1M could be scratched by buggy partitioner or somesuch
+ */
+#define BTRFS_BLOCK_RESERVED_1M_FOR_SUPER	((u64)1024 * 1024)
+
 /* tag for the radix tree of block groups in ram */
 #define BTRFS_BLOCK_GROUP_DATA     (1ULL << 0)
 #define BTRFS_BLOCK_GROUP_SYSTEM   (1ULL << 1)
diff --git a/volumes.c b/volumes.c
index 0ff2283..e8d7f25 100644
--- a/volumes.c
+++ b/volumes.c
@@ -268,7 +268,7 @@ static int find_free_dev_extent(struct btrfs_trans_handle *trans,
 	struct btrfs_dev_extent *dev_extent = NULL;
 	u64 hole_size = 0;
 	u64 last_byte = 0;
-	u64 search_start = 0;
+	u64 search_start = root->fs_info->alloc_start;
 	u64 search_end = device->total_bytes;
 	int ret;
 	int slot = 0;
@@ -283,10 +283,12 @@ static int find_free_dev_extent(struct btrfs_trans_handle *trans,
 	/* we don't want to overwrite the superblock on the drive,
 	 * so we make sure to start at an offset of at least 1MB
 	 */
-	search_start = max((u64)1024 * 1024, search_start);
+	search_start = max(BTRFS_BLOCK_RESERVED_1M_FOR_SUPER, search_start);

-	if (root->fs_info->alloc_start + num_bytes <= device->total_bytes)
-		search_start = max(root->fs_info->alloc_start, search_start);
+	if (search_start >= search_end) {
+		ret = -ENOSPC;
+		goto error;
+	}

 	key.objectid = device->devid;
 	key.offset = search_start;
@@ -660,6 +662,94 @@ static u32 find_raid56_stripe_len(u32 data_devices, u32 dev_stripe_target)
 	return 64 * 1024;
 }

+/*
+ * btrfs_device_avail_bytes - count bytes available for alloc_chunk
+ *
+ * It is not equal to "device->total_bytes - device->bytes_used".
+ * We do not allocate any chunk in 1M at beginning of device, and not
+ * allowed to allocate any chunk before alloc_start if it is specified.
+ * So search holes from max(1M, alloc_start) to device->total_bytes.
+ */
+static int btrfs_device_avail_bytes(struct btrfs_trans_handle *trans,
+				    struct btrfs_device *device,
+				    u64 *avail_bytes)
+{
+	struct btrfs_path *path;
+	struct btrfs_root *root = device->dev_root;
+	struct btrfs_key key;
+	struct btrfs_dev_extent *dev_extent = NULL;
+	struct extent_buffer *l;
+	u64 search_start = root->fs_info->alloc_start;
+	u64 search_end = device->total_bytes;
+	u64 extent_end = 0;
+	u64 free_bytes = 0;
+	int ret;
+	int slot = 0;
+
+	search_start = max(BTRFS_BLOCK_RESERVED_1M_FOR_SUPER, search_start);
+
+	path = btrfs_alloc_path();
+	if (!path)
+		return -ENOMEM;
+
+	key.objectid = device->devid;
+	key.offset = root->fs_info->alloc_start;
+	key.type = BTRFS_DEV_EXTENT_KEY;
+
+	path->reada = 2;
+	ret = btrfs_search_slot(trans, root, &key, path, 0, 0);
+	if (ret < 0)
+		goto error;
+	ret = btrfs_previous_item(root, path, 0, key.type);
+	if (ret < 0)
+		goto error;
+
+	while (1) {
+		l = path->nodes[0];
+		slot = path->slots[0];
+		if (slot >= btrfs_header_nritems(l)) {
+			ret = btrfs_next_leaf(root, path);
+			if (ret == 0)
+				continue;
+			if (ret < 0)
+				goto error;
+			break;
+		}
+		btrfs_item_key_to_cpu(l, &key, slot);
+
+		if (key.objectid < device->devid)
+			goto next;
+		if (key.objectid > device->devid)
+			break;
+		if (btrfs_key_type(&key) != BTRFS_DEV_EXTENT_KEY)
+			goto next;
+		if (key.offset > search_end)
+			break;
+		if (key.offset > search_start)
+			free_bytes += key.offset - search_start;
+
+		dev_extent = btrfs_item_ptr(l, slot, struct btrfs_dev_extent);
+		extent_end = key.offset + btrfs_dev_extent_length(l,
+								  dev_extent);
+		if (extent_end > search_start)
+			search_start = extent_end;
+		if (search_start > search_end)
+			break;
+next:
+		path->slots[0]++;
+		cond_resched();
+	}
+
+	if (search_start < search_end)
+		free_bytes += search_end - search_start;
+
+	*avail_bytes = free_bytes;
+	ret = 0;
+error:
+	btrfs_free_path(path);
+	return ret;
+}
+
 int btrfs_alloc_chunk(struct btrfs_trans_handle *trans,
 		      struct btrfs_root *extent_root, u64 *start,
 		      u64 *num_bytes, u64 type)
@@ -678,7 +768,7 @@ int btrfs_alloc_chunk(struct btrfs_trans_handle *trans,
 	u64 calc_size = 8 * 1024 * 1024;
 	u64 min_free;
 	u64 max_chunk_size = 4 * calc_size;
-	u64 avail;
+	u64 avail = 0;
 	u64 max_avail = 0;
 	u64 percent_max;
 	int num_stripes = 1;
@@ -782,7 +872,9 @@ again:
 	/* build a private list of devices we will allocate from */
 	while(index < num_stripes) {
 		device = list_entry(cur, struct btrfs_device, dev_list);
-		avail = device->total_bytes - device->bytes_used;
+		ret = btrfs_device_avail_bytes(trans, device, &avail);
+		if (ret)
+			return ret;
 		cur = cur->next;
 		if (avail >= min_free) {
 			list_move_tail(&device->dev_list, &private_devs);
--
1.7.1
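
For illustration only (not part of the patch): a standalone sketch of the
hole-counting idea behind btrfs_device_avail_bytes(), with the b-tree walk
replaced by a plain array. It assumes the allocated extents are given
sorted by offset. struct sketch_extent, SKETCH_RESERVED_1M and
sketch_avail_bytes() are hypothetical names introduced for this sketch
only.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumption: same 1MB reservation as BTRFS_BLOCK_RESERVED_1M_FOR_SUPER. */
#define SKETCH_RESERVED_1M ((uint64_t)1024 * 1024)

/* Hypothetical stand-in for a device extent item: [offset, offset + length). */
struct sketch_extent {
	uint64_t offset;
	uint64_t length;
};

/*
 * Sum the holes between max(1MB, alloc_start) and total_bytes.
 * The extents must be sorted by offset, as they would be when read in
 * key order.
 */
static uint64_t sketch_avail_bytes(uint64_t total_bytes, uint64_t alloc_start,
				   const struct sketch_extent *ext, size_t n)
{
	uint64_t search_start = alloc_start > SKETCH_RESERVED_1M ?
				alloc_start : SKETCH_RESERVED_1M;
	uint64_t search_end = total_bytes;
	uint64_t free_bytes = 0;
	size_t i;

	for (i = 0; i < n; i++) {
		uint64_t extent_end = ext[i].offset + ext[i].length;

		if (ext[i].offset > search_end)
			break;
		/* The gap before this extent is a hole we can allocate from. */
		if (ext[i].offset > search_start)
			free_bytes += ext[i].offset - search_start;
		if (extent_end > search_start)
			search_start = extent_end;
		if (search_start > search_end)
			break;
	}

	/* Trailing hole after the last extent, if any. */
	if (search_start < search_end)
		free_bytes += search_end - search_start;

	return free_bytes;
}
```

With an empty extent list, the sketch yields total_bytes minus 1MB for a
default alloc_start, and total_bytes minus alloc_start when alloc_start is
beyond 1MB, which matches the two failure cases described in the message.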