Stefan Behrens
2013-Apr-19 15:41 UTC
[PATCH 0/5] Btrfs: introduce a tree for UUID to subvol ID mapping
Mapping UUIDs to subvolume IDs is an operation with a high effort today. Today, the algorithm even has quadratic effort (based on the number of existing subvolumes), which means, that it takes minutes to send/receive a single subvolume if 10,000 subvolumes exist. But even linear effort would be too much since it is a waste. And these data structures to allow mapping UUIDs to subvolume IDs are created every time a btrfs send/receive instance is started. It is much more efficient to maintain a searchable persistent data structure in the filesystem, one that is updated whenever a subvolume/snapshot is created and deleted, and when the received subvolume UUID is set by the btrfs-receive tool. Therefore kernel code is added that is able to maintain data structures in the filesystem that allow to quickly search for a given UUID and to retrieve the subvol ID. Now follows the lengthy justification, why a new tree was added instead of using the existing root tree: The first approach was to not create another tree that holds UUID items. Instead, the items should just go into the top root tree. Unfortunately this confused the algorithm to assign the objectid of subvolumes and snapshots. The reason is that btrfs_find_free_objectid() calls btrfs_find_highest_objectid() for the first created subvol or snapshot after mounting a filesystem, and this function simply searches for the largest used objectid in the root tree keys to pick the next objectid to assign. Of course, the UUID keys have always been the ones with the highest offset value, and the next assigned subvol ID was wastefully huge. To use any other existing tree did not look proper. To apply a workaround such as setting the objectid to zero in the UUID item key and to implement collision handling would either add limitations (in case of a btrfs_extend_item() approach to handle the collisions) or a lot of complexity and source code (in case a key would be looked up that is free of collisions). Adding new code that introduces limitations is not good, and adding code that is complex and lengthy for no good reason is also not good. That''s the justification why a completely new tree was introduced. Stefan Behrens (5): Btrfs: introduce a tree for items that map UUIDs to something Btrfs: support printing UUID tree elements Btrfs: create UUID tree if required Btrfs: maintain subvolume items in the UUID tree Btrfs: fill UUID tree initially fs/btrfs/Makefile | 3 +- fs/btrfs/ctree.h | 54 ++++++ fs/btrfs/disk-io.c | 43 ++++- fs/btrfs/ioctl.c | 65 ++++++- fs/btrfs/print-tree.c | 73 ++++++++ fs/btrfs/transaction.c | 21 ++- fs/btrfs/uuid-tree.c | 497 +++++++++++++++++++++++++++++++++++++++++++++++++ fs/btrfs/volumes.c | 170 +++++++++++++++++ fs/btrfs/volumes.h | 2 + 9 files changed, 915 insertions(+), 13 deletions(-) create mode 100644 fs/btrfs/uuid-tree.c -- 1.8.2.1 -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Stefan Behrens
2013-Apr-19 15:41 UTC
[PATCH 1/5] Btrfs: introduce a tree for items that map UUIDs to something
Mapping UUIDs to subvolume IDs is an operation with a high effort today. Today, the algorithm even has quadratic effort (based on the number of existing subvolumes), which means, that it takes minutes to send/receive a single subvolume if 10,000 subvolumes exist. But even linear effort would be too much since it is a waste. And these data structures to allow mapping UUIDs to subvolume IDs are created every time a btrfs send/receive instance is started. It is much more efficient to maintain a searchable persistent data structure in the filesystem, one that is updated whenever a subvolume/snapshot is created and deleted, and when the received subvolume UUID is set by the btrfs-receive tool. Therefore kernel code is added with this commit that is able to maintain data structures in the filesystem that allow to quickly search for a given UUID and to retrieve data that is assigned to this UUID, like which subvolume ID is related to this UUID. This commit adds a new tree to hold UUID-to-data mapping items. The key of the items is the full UUID plus the key type BTRFS_UUID_KEY. Multiple data blocks can be stored for a given UUID, a type/length/ value scheme is used. Now follows the lengthy justification, why a new tree was added instead of using the existing root tree: The first approach was to not create another tree that holds UUID items. Instead, the items should just go into the top root tree. Unfortunately this confused the algorithm to assign the objectid of subvolumes and snapshots. The reason is that btrfs_find_free_objectid() calls btrfs_find_highest_objectid() for the first created subvol or snapshot after mounting a filesystem, and this function simply searches for the largest used objectid in the root tree keys to pick the next objectid to assign. Of course, the UUID keys have always been the ones with the highest offset value, and the next assigned subvol ID was wastefully huge. To use any other existing tree did not look proper. To apply a workaround such as setting the objectid to zero in the UUID item key and to implement collision handling would either add limitations (in case of a btrfs_extend_item() approach to handle the collisions) or a lot of complexity and source code (in case a key would be looked up that is free of collisions). Adding new code that introduces limitations is not good, and adding code that is complex and lengthy for no good reason is also not good. That''s the justification why a completely new tree was introduced. Signed-off-by: Stefan Behrens <sbehrens@giantdisaster.de> --- fs/btrfs/Makefile | 3 +- fs/btrfs/ctree.h | 50 ++++++ fs/btrfs/uuid-tree.c | 497 +++++++++++++++++++++++++++++++++++++++++++++++++++ 3 files changed, 549 insertions(+), 1 deletion(-) diff --git a/fs/btrfs/Makefile b/fs/btrfs/Makefile index 3932224..a550dfc 100644 --- a/fs/btrfs/Makefile +++ b/fs/btrfs/Makefile @@ -8,7 +8,8 @@ btrfs-y += super.o ctree.o extent-tree.o print-tree.o root-tree.o dir-item.o \ extent_io.o volumes.o async-thread.o ioctl.o locking.o orphan.o \ export.o tree-log.o free-space-cache.o zlib.o lzo.o \ compression.o delayed-ref.o relocation.o delayed-inode.o scrub.o \ - reada.o backref.o ulist.o qgroup.o send.o dev-replace.o raid56.o + reada.o backref.o ulist.o qgroup.o send.o dev-replace.o raid56.o \ + uuid-tree.o btrfs-$(CONFIG_BTRFS_FS_POSIX_ACL) += acl.o btrfs-$(CONFIG_BTRFS_FS_CHECK_INTEGRITY) += check-integrity.o diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h index 84be717..0439570 100644 --- a/fs/btrfs/ctree.h +++ b/fs/btrfs/ctree.h @@ -94,6 +94,9 @@ struct btrfs_ordered_sum; /* holds quota configuration and tracking */ #define BTRFS_QUOTA_TREE_OBJECTID 8ULL +/* for storing items that use the BTRFS_UUID_KEY */ +#define BTRFS_UUID_TREE_OBJECTID 9ULL + /* orhpan objectid for tracking unlinked/truncated files */ #define BTRFS_ORPHAN_OBJECTID -5ULL @@ -953,6 +956,18 @@ struct btrfs_dev_replace_item { __le64 num_uncorrectable_read_errors; } __attribute__ ((__packed__)); +/* for items that use the BTRFS_UUID_KEY */ +#define BTRFS_UUID_ITEM_TYPE_SUBVOL 0 /* for UUIDs assigned to subvols */ +#define BTRFS_UUID_ITEM_TYPE_RECEIVED_SUBVOL 1 /* for UUIDs assigned to + * received subvols */ + +/* a sequence of such items is stored under the BTRFS_UUID_KEY */ +struct btrfs_uuid_item { + __le64 type; /* refer to BTRFS_UUID_ITEM_TYPE* defines above */ + __le64 len; /* number of following 64bit values */ + __le64 subid[0]; /* sequence of subids */ +} __attribute__ ((__packed__)); + /* different types of block groups (and chunks) */ #define BTRFS_BLOCK_GROUP_DATA (1ULL << 0) #define BTRFS_BLOCK_GROUP_SYSTEM (1ULL << 1) @@ -1889,6 +1904,17 @@ struct btrfs_ioctl_defrag_range_args { #define BTRFS_DEV_REPLACE_KEY 250 /* + * Stores items that allow to quickly map UUIDs to something else. + * These items are part of the filesystem UUID tree. + * The key is built like this: + * (UUID_upper_64_bits, BTRFS_UUID_KEY, UUID_lower_64_bits). + */ +#if BTRFS_UUID_SIZE != 16 +#error "UUID items require BTRFS_UUID_SIZE == 16!" +#endif +#define BTRFS_UUID_KEY 251 + +/* * string items are for debugging. They just store a short string of * data in the FS */ @@ -2946,6 +2972,12 @@ BTRFS_SETGET_FUNCS(dev_replace_cursor_left, struct btrfs_dev_replace_item, BTRFS_SETGET_FUNCS(dev_replace_cursor_right, struct btrfs_dev_replace_item, cursor_right, 64); +/* btrfs_uuid_item */ +BTRFS_SETGET_FUNCS(uuid_type, struct btrfs_uuid_item, type, 64); +BTRFS_SETGET_FUNCS(uuid_len, struct btrfs_uuid_item, len, 64); +BTRFS_SETGET_STACK_FUNCS(stack_uuid_type, struct btrfs_uuid_item, type, 64); +BTRFS_SETGET_STACK_FUNCS(stack_uuid_len, struct btrfs_uuid_item, len, 64); + BTRFS_SETGET_STACK_FUNCS(stack_dev_replace_src_devid, struct btrfs_dev_replace_item, src_devid, 64); BTRFS_SETGET_STACK_FUNCS(stack_dev_replace_cont_reading_from_srcdev_mode, @@ -3373,6 +3405,24 @@ void btrfs_check_and_init_root_item(struct btrfs_root_item *item); void btrfs_update_root_times(struct btrfs_trans_handle *trans, struct btrfs_root *root); +/* uuid-tree.c */ +int btrfs_lookup_uuid_subvol_item(struct btrfs_root *uuid_root, u8 *uuid, + u64 *subvol_id); +int btrfs_insert_uuid_subvol_item(struct btrfs_trans_handle *trans, + struct btrfs_root *uuid_root, u8 *uuid, + u64 subvol_id); +int btrfs_del_uuid_subvol_item(struct btrfs_trans_handle *trans, + struct btrfs_root *uuid_root, u8 *uuid, + u64 subvol_id); +int btrfs_lookup_uuid_received_subvol_item(struct btrfs_root *uuid_root, + u8 *uuid, u64 *subvol_id); +int btrfs_insert_uuid_received_subvol_item(struct btrfs_trans_handle *trans, + struct btrfs_root *uuid_root, + u8 *uuid, u64 subvol_id); +int btrfs_del_uuid_received_subvol_item(struct btrfs_trans_handle *trans, + struct btrfs_root *uuid_root, u8 *uuid, + u64 subvol_id); + /* dir-item.c */ int btrfs_check_dir_item_collision(struct btrfs_root *root, u64 dir, const char *name, int name_len); diff --git a/fs/btrfs/uuid-tree.c b/fs/btrfs/uuid-tree.c new file mode 100644 index 0000000..bfc2502 --- /dev/null +++ b/fs/btrfs/uuid-tree.c @@ -0,0 +1,497 @@ +/* + * Copyright (C) STRATO AG 2013. All rights reserved. + * + * This program is free software; you can redistribute it and/or + * modify it under the terms of the GNU General Public + * License v2 as published by the Free Software Foundation. + * + * This program is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU + * General Public License for more details. + * + * You should have received a copy of the GNU General Public + * License along with this program; if not, write to the + * Free Software Foundation, Inc., 59 Temple Place - Suite 330, + * Boston, MA 021110-1307, USA. + */ +#include <linux/uuid.h> +#include "ctree.h" +#include "transaction.h" +#include "disk-io.h" +#include "print-tree.h" + + +/* + * One key is used to store a sequence of btrfs_uuid_item items. + * Each item in the sequence contains a type information and a sequence of + * ids (together with the information about the size of the sequence of ids). + * {[btrfs_uuid_item type0 {id0, id1, ..., idN}], + * ..., + * [btrfs_uuid_item typeZ {id0, id1, ..., idN}]} + * + * It is forbidden to put multiple items with the same type under the same key. + * Instead the sequence of ids is extended and used to store any additional + * ids for the same item type. + */ + +static void btrfs_uuid_16xu8_to_2xu64(u8 *uuid, u64 *first64, u64 *second64) +{ + u64 val; + int i; + int j; + + for (i = 0; i < 2; i++) { + val = 0; + for (j = 7; j >= 0; j--) { + /* in host byte order, resulting disk key is le */ + val |= ((u64)*uuid) << (j * 8); + uuid++; + } + if (!i) + *first64 = val; + else + *second64 = val; + } +} + +static struct btrfs_uuid_item *btrfs_match_uuid_item_type( + struct btrfs_path *path, u64 type) +{ + struct extent_buffer *eb; + int slot; + struct btrfs_uuid_item *ptr; + u32 item_size; + + eb = path->nodes[0]; + slot = path->slots[0]; + ptr = btrfs_item_ptr(eb, slot, struct btrfs_uuid_item); + item_size = btrfs_item_size_nr(eb, slot); + do { + u64 sub_item_type; + u64 sub_item_len; + + if (item_size < sizeof(*ptr)) { + pr_warn("btrfs: uuid item too short (%llu < %d)!\n", + (unsigned long long)item_size, + (int)sizeof(*ptr)); + return NULL; + } + item_size -= sizeof(*ptr); + sub_item_type = btrfs_uuid_type(eb, ptr); + sub_item_len = btrfs_uuid_len(eb, ptr); + if (sub_item_len * 8 > item_size) { + pr_warn("btrfs: uuid item too short (%llu > %llu)!\n", + (unsigned long long)(sub_item_len * 8), + (unsigned long long)item_size); + return NULL; + } + if (sub_item_type == type) + return ptr; + item_size -= sub_item_len * 8; + ptr = 1 + (struct btrfs_uuid_item *) + (((char *)ptr) + (sub_item_len * 8)); + } while (item_size); + + return NULL; +} + +static int btrfs_uuid_tree_lookup_prepare(struct btrfs_root *uuid_root, + u8 *uuid, u64 type, + struct btrfs_path **path, + struct btrfs_uuid_item **ptr) +{ + int ret; + struct btrfs_key key; + + WARN_ON_ONCE(!uuid_root); + if (!uuid_root) { + ret = -ENOENT; + goto out; + } + + key.type = BTRFS_UUID_KEY; + btrfs_uuid_16xu8_to_2xu64(uuid, &key.objectid, &key.offset); + + *path = btrfs_alloc_path(); + if (!*path) { + ret = -ENOMEM; + goto out; + } + + ret = btrfs_search_slot(NULL, uuid_root, &key, *path, 0, 0); + if (ret < 0) + goto out; + if (ret > 0) { + ret = -ENOENT; + goto out; + } + + *ptr = btrfs_match_uuid_item_type(*path, type); + if (!*ptr) { + ret = -ENOENT; + goto out; + } + + ret = 0; + +out: + return ret; +} + +/* return -ENOENT for !found, < 0 for errors, or 0 if an item was found */ +static int btrfs_uuid_tree_lookup(struct btrfs_root *uuid_root, u8 *uuid, + u64 type, u64 subid) +{ + int ret; + struct btrfs_path *path = NULL; + struct extent_buffer *eb; + struct btrfs_uuid_item *ptr; + u64 sub_item_len; + unsigned long offset; + + ret = btrfs_uuid_tree_lookup_prepare(uuid_root, uuid, type, &path, + &ptr); + if (ret < 0) + goto out; + + eb = path->nodes[0]; + sub_item_len = btrfs_uuid_len(eb, ptr); + WARN_ON_ONCE(sub_item_len == 0); + ret = -ENOENT; + ptr++; + offset = (unsigned long)ptr; + while (sub_item_len > 0) { + u64 data; + + read_extent_buffer(eb, &data, offset, sizeof(data)); + if (data == subid) { + ret = 0; + break; + } + offset += sizeof(data); + sub_item_len--; + } + +out: + btrfs_free_path(path); + return ret; +} + +/* return -ENOENT for !found, < 0 for errors, or 0 if an item was found */ +static int btrfs_uuid_tree_lookup_any(struct btrfs_root *uuid_root, u8 *uuid, + u64 type, u64 *subid) +{ + int ret; + struct btrfs_path *path = NULL; + struct extent_buffer *eb; + struct btrfs_uuid_item *ptr; + u64 sub_item_len; + + ret = btrfs_uuid_tree_lookup_prepare(uuid_root, uuid, type, &path, + &ptr); + if (ret < 0) + goto out; + + eb = path->nodes[0]; + sub_item_len = btrfs_uuid_len(eb, ptr); + WARN_ON_ONCE(sub_item_len == 0); + if (sub_item_len > 0) { + /* return first stored id */ + read_extent_buffer(eb, subid, (unsigned long)(ptr + 1), + sizeof(*subid)); + ret = 0; + } else { + ret = -ENOENT; + } + +out: + btrfs_free_path(path); + return ret; +} + +/* it is not checked very the entry to add already exists */ +static int btrfs_uuid_tree_add(struct btrfs_trans_handle *trans, + struct btrfs_root *uuid_root, u8 *uuid, + u64 type, u64 subid) +{ + int ret; + struct btrfs_path *path = NULL; + struct btrfs_key key; + struct extent_buffer *eb; + int slot; + struct btrfs_uuid_item *ptr; + u32 item_size; + + WARN_ON_ONCE(!uuid_root); + if (!uuid_root) { + ret = -EINVAL; + goto out; + } + + key.type = BTRFS_UUID_KEY; + btrfs_uuid_16xu8_to_2xu64(uuid, &key.objectid, &key.offset); + + path = btrfs_alloc_path(); + if (!path) { + ret = -ENOMEM; + goto out; + } + + ret = btrfs_insert_empty_item(trans, uuid_root, path, &key, + sizeof(*ptr) + sizeof(subid)); + if (ret == -EEXIST) { + ptr = btrfs_match_uuid_item_type(path, type); + if (ptr) { + /* + * An item with that type already exists. + * Extend the item and store the subid at the + * location of the first id of this type. + */ + unsigned long start; + unsigned long move_dst; + unsigned long move_src; + unsigned long move_len; + + btrfs_extend_item(uuid_root, path, sizeof(subid)); + ptr = (struct btrfs_uuid_item *) + (((char *)ptr) - sizeof(subid)); + eb = path->nodes[0]; + slot = path->slots[0]; + item_size = btrfs_item_size_nr(eb, slot); + WARN_ON(btrfs_uuid_len(eb, ptr) == 0); + btrfs_set_uuid_len(eb, ptr, + btrfs_uuid_len(eb, ptr) + 1); + ptr++; + start = btrfs_item_ptr_offset(eb, slot); + move_dst = ((unsigned long)ptr) + sizeof(subid); + move_src = (unsigned long)ptr; + move_len = item_size - (move_dst - start); + memmove_extent_buffer(eb, move_dst, move_src, move_len); + + goto write_subid; + } else { + btrfs_extend_item(uuid_root, path, + sizeof(*ptr) + sizeof(subid)); + } + } else if (ret < 0) { + pr_warn("btrfs: insert uuid-subvol item failed %d (0x%016llx, 0x%016llx) type %llu!\n", + ret, (unsigned long long)key.objectid, + (unsigned long long)key.offset, + (unsigned long long)type); + goto out; + } + + /* + * Add an item for the type for the first time. Either add it behind + * items with different types, or write the very first item. + */ + eb = path->nodes[0]; + slot = path->slots[0]; + ptr = btrfs_item_ptr(eb, slot, struct btrfs_uuid_item); + item_size = btrfs_item_size_nr(eb, slot); + ptr = (struct btrfs_uuid_item *) + (((char *)ptr) + item_size - (sizeof(*ptr) + sizeof(subid))); + btrfs_set_uuid_type(eb, ptr, type); + btrfs_set_uuid_len(eb, ptr, 1); + ptr++; + +write_subid: + ret = 0; + write_extent_buffer(eb, &subid, (unsigned long)ptr, sizeof(subid)); + btrfs_mark_buffer_dirty(eb); + +out: + btrfs_free_path(path); + return ret; +} + +static int btrfs_uuid_tree_rem(struct btrfs_trans_handle *trans, + struct btrfs_root *uuid_root, u8 *uuid, + u64 type, u64 subid) +{ + int ret; + struct btrfs_path *path = NULL; + struct btrfs_key key; + struct extent_buffer *eb; + int slot; + struct btrfs_uuid_item *ptr_start; + unsigned long offset; + u32 item_size; + u64 id_index; + unsigned long start; + unsigned long move_dst; + unsigned long move_src; + unsigned long move_len; + + WARN_ON_ONCE(!uuid_root); + if (!uuid_root) { + ret = -EINVAL; + goto out; + } + + key.type = BTRFS_UUID_KEY; + btrfs_uuid_16xu8_to_2xu64(uuid, &key.objectid, &key.offset); + + path = btrfs_alloc_path(); + if (!path) { + ret = -ENOMEM; + goto out; + } + + ret = btrfs_search_slot(trans, uuid_root, &key, path, -1, 1); + if (ret < 0) { + pr_warn("btrfs: error %d while searching for uuid item!\n", + ret); + goto out; + } + if (ret > 0) { + ret = -ENOENT; + goto out; + } + + ptr_start = btrfs_match_uuid_item_type(path, type); + if (!ptr_start) { + /* case 1: no entry for that type is found */ + ret = -ENOENT; + goto out; + } + + eb = path->nodes[0]; + slot = path->slots[0]; + start = btrfs_item_ptr_offset(eb, slot); + item_size = btrfs_item_size_nr(eb, slot); + id_index = btrfs_uuid_len(eb, ptr_start); + offset = (unsigned long)(ptr_start + 1); + while (id_index > 0) { + u64 found_subid; + + read_extent_buffer(eb, &found_subid, offset, + sizeof(found_subid)); + if (found_subid == subid) + break; + offset += sizeof(found_subid); + id_index--; + } + + if (!id_index) { + /* + * case 2: an entry for that type is found, but no match + * for the id + */ + ret = -ENOENT; + goto out; + } + + if (btrfs_uuid_len(eb, ptr_start) == 1) { + if (item_size == sizeof(*ptr_start) + sizeof(subid)) { + /* + * case 3: no other types and ids are stored, + * delete the full item + */ + ret = btrfs_del_item(trans, uuid_root, path); + goto out; + } else { + /* + * case 4: No other ids for the particular type + * are stored, but other types exist. Shrink the + * item by the size of an id plus the size of the + * item for that type. + */ + move_dst = (unsigned long)ptr_start; + } + } else { + /* + * case 5: Other ids for that particular type are stored. + * Shrink the item by the size of an id. + */ + move_dst = offset; + WARN_ON(btrfs_uuid_len(eb, ptr_start) == 0); + btrfs_set_uuid_len(eb, ptr_start, + btrfs_uuid_len(eb, ptr_start) - 1); + } + move_src = offset + sizeof(subid); + move_len = item_size - (move_src - start); + + memmove_extent_buffer(eb, move_dst, move_src, move_len); + btrfs_truncate_item(uuid_root, path, item_size - (move_src - move_dst), + 1); + +out: + btrfs_free_path(path); + return ret; +} + +int btrfs_lookup_uuid_subvol_item(struct btrfs_root *uuid_root, u8 *uuid, + u64 *subvol_id) +{ + int ret; + + ret = btrfs_uuid_tree_lookup_any(uuid_root, uuid, + BTRFS_UUID_ITEM_TYPE_SUBVOL, + subvol_id); + *subvol_id = le64_to_cpu(*subvol_id); + return ret; +} + +int btrfs_insert_uuid_subvol_item(struct btrfs_trans_handle *trans, + struct btrfs_root *uuid_root, u8 *uuid, + u64 subvol_id) +{ + int ret; + + subvol_id = cpu_to_le64(subvol_id); + ret = btrfs_uuid_tree_lookup(uuid_root, uuid, + BTRFS_UUID_ITEM_TYPE_SUBVOL, subvol_id); + if (ret == -ENOENT) + ret = btrfs_uuid_tree_add(trans, uuid_root, uuid, + BTRFS_UUID_ITEM_TYPE_SUBVOL, + subvol_id); + return ret; +} + +int btrfs_del_uuid_subvol_item(struct btrfs_trans_handle *trans, + struct btrfs_root *uuid_root, u8 *uuid, + u64 subvol_id) +{ + return btrfs_uuid_tree_rem(trans, uuid_root, uuid, + BTRFS_UUID_ITEM_TYPE_SUBVOL, subvol_id); +} + +int btrfs_lookup_uuid_received_subvol_item(struct btrfs_root *uuid_root, + u8 *uuid, u64 *subvol_id) +{ + int ret; + + ret = btrfs_uuid_tree_lookup_any(uuid_root, uuid, + BTRFS_UUID_ITEM_TYPE_RECEIVED_SUBVOL, + subvol_id); + *subvol_id = le64_to_cpu(*subvol_id); + return ret; +} + +int btrfs_insert_uuid_received_subvol_item(struct btrfs_trans_handle *trans, + struct btrfs_root *uuid_root, + u8 *uuid, u64 subvol_id) +{ + int ret; + + subvol_id = cpu_to_le64(subvol_id); + ret = btrfs_uuid_tree_lookup(uuid_root, uuid, + BTRFS_UUID_ITEM_TYPE_RECEIVED_SUBVOL, + subvol_id); + if (ret == -ENOENT) + ret = btrfs_uuid_tree_add(trans, uuid_root, uuid, + BTRFS_UUID_ITEM_TYPE_RECEIVED_SUBVOL, + subvol_id); + return ret; +} + +int btrfs_del_uuid_received_subvol_item(struct btrfs_trans_handle *trans, + struct btrfs_root *uuid_root, u8 *uuid, + u64 subvol_id) +{ + return btrfs_uuid_tree_rem(trans, uuid_root, uuid, + BTRFS_UUID_ITEM_TYPE_RECEIVED_SUBVOL, + subvol_id); +} -- 1.8.2.1 -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Stefan Behrens
2013-Apr-19 15:41 UTC
[PATCH 2/5] Btrfs: support printing UUID tree elements
This commit adds support to print UUID tree elements to print-tree.c. Signed-off-by: Stefan Behrens <sbehrens@giantdisaster.de> --- fs/btrfs/print-tree.c | 73 +++++++++++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 73 insertions(+) diff --git a/fs/btrfs/print-tree.c b/fs/btrfs/print-tree.c index dc0024f..6e204ee 100644 --- a/fs/btrfs/print-tree.c +++ b/fs/btrfs/print-tree.c @@ -155,6 +155,73 @@ static void print_extent_ref_v0(struct extent_buffer *eb, int slot) } #endif +static void print_uuid_item(struct extent_buffer *l, + struct btrfs_uuid_item *ptr, + u64 item_size) +{ + do { + u64 sub_item_type; + u64 sub_item_len; + u64 subvol_id; + + if (item_size < sizeof(*ptr)) { + printk(KERN_INFO "btrfs: uuid item too short!\n"); + return; + } + sub_item_type = btrfs_uuid_type(l, ptr); + sub_item_len = btrfs_uuid_len(l, ptr); + ptr++; + item_size -= sizeof(*ptr); + if (sub_item_len * 8 > item_size) { + printk(KERN_INFO "btrfs: uuid item too short (2)!\n"); + return; + } + + item_size -= sub_item_len * 8; + switch (sub_item_type) { + case BTRFS_UUID_ITEM_TYPE_SUBVOL: + while (sub_item_len) { + read_extent_buffer(l, &subvol_id, + (unsigned long)ptr, 8); + printk(KERN_INFO "\t\tsubvol_id %llu\n", + (unsigned long long) + le64_to_cpu(subvol_id)); + sub_item_len--; + ptr = (struct btrfs_uuid_item *) + (((char *)ptr) + 8); + } + break; + case BTRFS_UUID_ITEM_TYPE_RECEIVED_SUBVOL: + while (sub_item_len) { + read_extent_buffer(l, &subvol_id, + (unsigned long)ptr, 8); + printk(KERN_INFO "\t\treceived_subvol_id %llu\n", + (unsigned long long) + le64_to_cpu(subvol_id)); + sub_item_len--; + ptr = (struct btrfs_uuid_item *) + (((char *)ptr) + 8); + } + break; + default: + printk(KERN_INFO "\t\tunknown type=%llu, len=8*%llu\n", + (unsigned long long)sub_item_type, + (unsigned long long)sub_item_len); + while (sub_item_len) { + read_extent_buffer(l, &subvol_id, + (unsigned long)ptr, 8); + printk(KERN_INFO "\t\tid %llu\n", + (unsigned long long) + le64_to_cpu(subvol_id)); + sub_item_len--; + ptr = (struct btrfs_uuid_item *) + (((char *)ptr) + 8); + } + break; + } + } while (item_size); +} + void btrfs_print_leaf(struct btrfs_root *root, struct extent_buffer *l) { int i; @@ -168,6 +235,7 @@ void btrfs_print_leaf(struct btrfs_root *root, struct extent_buffer *l) struct btrfs_extent_data_ref *dref; struct btrfs_shared_data_ref *sref; struct btrfs_dev_extent *dev_extent; + struct btrfs_uuid_item *uuid_item; struct btrfs_key key; struct btrfs_key found_key; @@ -301,6 +369,11 @@ void btrfs_print_leaf(struct btrfs_root *root, struct extent_buffer *l) case BTRFS_DEV_REPLACE_KEY: printk(KERN_INFO "\t\tdev replace\n"); break; + case BTRFS_UUID_KEY: + uuid_item = btrfs_item_ptr(l, i, + struct btrfs_uuid_item); + print_uuid_item(l, uuid_item, btrfs_item_size_nr(l, i)); + break; }; } } -- 1.8.2.1 -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
This tree is not created by mkfs.btrfs. Therefore when a filesystem is mounted writable and the UUID tree does not exist, this tree is created if required. The tree is also added to the fs_info structure and initialized, but this commit does not yet read or write UUID tree elements. Signed-off-by: Stefan Behrens <sbehrens@giantdisaster.de> --- fs/btrfs/ctree.h | 1 + fs/btrfs/disk-io.c | 38 +++++++++++++++++++++++++++++++++++++- fs/btrfs/volumes.c | 23 +++++++++++++++++++++++ fs/btrfs/volumes.h | 2 ++ 4 files changed, 63 insertions(+), 1 deletion(-) diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h index 0439570..c9baf55 100644 --- a/fs/btrfs/ctree.h +++ b/fs/btrfs/ctree.h @@ -1305,6 +1305,7 @@ struct btrfs_fs_info { struct btrfs_root *fs_root; struct btrfs_root *csum_root; struct btrfs_root *quota_root; + struct btrfs_root *uuid_root; /* the log root tree is a directory of all the other log roots */ struct btrfs_root *log_root_tree; diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c index 94d499a..75d196b 100644 --- a/fs/btrfs/disk-io.c +++ b/fs/btrfs/disk-io.c @@ -1563,6 +1563,9 @@ struct btrfs_root *btrfs_read_fs_root_no_name(struct btrfs_fs_info *fs_info, if (location->objectid == BTRFS_QUOTA_TREE_OBJECTID) return fs_info->quota_root ? fs_info->quota_root : ERR_PTR(-ENOENT); + if (location->objectid == BTRFS_UUID_TREE_OBJECTID) + return fs_info->uuid_root ? fs_info->uuid_root : + ERR_PTR(-ENOENT); again: spin_lock(&fs_info->fs_roots_radix_lock); root = radix_tree_lookup(&fs_info->fs_roots_radix, @@ -2018,6 +2021,10 @@ static void free_root_pointers(struct btrfs_fs_info *info, int chunk_root) free_extent_buffer(info->quota_root->node); free_extent_buffer(info->quota_root->commit_root); } + if (info->uuid_root) { + free_extent_buffer(info->uuid_root->node); + free_extent_buffer(info->uuid_root->commit_root); + } info->tree_root->node = NULL; info->tree_root->commit_root = NULL; @@ -2031,6 +2038,10 @@ static void free_root_pointers(struct btrfs_fs_info *info, int chunk_root) info->quota_root->node = NULL; info->quota_root->commit_root = NULL; } + if (info->uuid_root) { + info->uuid_root->node = NULL; + info->uuid_root->commit_root = NULL; + } if (chunk_root) { free_extent_buffer(info->chunk_root->node); @@ -2062,11 +2073,13 @@ int open_ctree(struct super_block *sb, struct btrfs_root *chunk_root; struct btrfs_root *dev_root; struct btrfs_root *quota_root; + struct btrfs_root *uuid_root; struct btrfs_root *log_tree_root; int ret; int err = -EINVAL; int num_backups_tried = 0; int backup_index = 0; + bool need_to_create_uuid_tree = false; tree_root = fs_info->tree_root = btrfs_alloc_root(fs_info); extent_root = fs_info->extent_root = btrfs_alloc_root(fs_info); @@ -2074,9 +2087,10 @@ int open_ctree(struct super_block *sb, chunk_root = fs_info->chunk_root = btrfs_alloc_root(fs_info); dev_root = fs_info->dev_root = btrfs_alloc_root(fs_info); quota_root = fs_info->quota_root = btrfs_alloc_root(fs_info); + uuid_root = fs_info->uuid_root = btrfs_alloc_root(fs_info); if (!tree_root || !extent_root || !csum_root || - !chunk_root || !dev_root || !quota_root) { + !chunk_root || !dev_root || !quota_root || !uuid_root) { err = -ENOMEM; goto fail; } @@ -2651,6 +2665,19 @@ retry_root_backup: fs_info->pending_quota_state = 1; } + ret = find_and_setup_root(tree_root, fs_info, + BTRFS_UUID_TREE_OBJECTID, uuid_root); + if (ret) { + if (ret != -ENOENT) + goto recovery_tree_root; + + kfree(uuid_root); + uuid_root = fs_info->uuid_root = NULL; + need_to_create_uuid_tree = true; + } else { + uuid_root->track_dirty = 1; + } + fs_info->generation = generation; fs_info->last_trans_committed = generation; @@ -2830,6 +2857,15 @@ retry_root_backup: return ret; } + if (need_to_create_uuid_tree) { + ret = btrfs_create_uuid_tree(fs_info, tree_root); + if (ret) { + pr_warn("btrfs: failed to create the UUID tree\n"); + close_ctree(tree_root); + return ret; + } + } + return 0; fail_qgroup: diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c index 76ded9e..5466aae 100644 --- a/fs/btrfs/volumes.c +++ b/fs/btrfs/volumes.c @@ -3426,6 +3426,29 @@ int btrfs_cancel_balance(struct btrfs_fs_info *fs_info) return 0; } +int btrfs_create_uuid_tree(struct btrfs_fs_info *fs_info, + struct btrfs_root *tree_root) +{ + struct btrfs_trans_handle *trans; + struct btrfs_root *uuid_root; + + trans = btrfs_start_transaction(tree_root, 2); + if (IS_ERR(trans)) + return PTR_ERR(trans); + + uuid_root = btrfs_create_tree(trans, fs_info, + BTRFS_UUID_TREE_OBJECTID); + if (IS_ERR(uuid_root)) { + btrfs_abort_transaction(trans, tree_root, + PTR_ERR(uuid_root)); + return PTR_ERR(uuid_root); + } + + fs_info->uuid_root = uuid_root; + uuid_root->track_dirty = 1; + + return btrfs_commit_transaction(trans, tree_root); +} /* * shrinking a device means finding all of the device extents past * the new size, and then following the back refs to the chunks. diff --git a/fs/btrfs/volumes.h b/fs/btrfs/volumes.h index 062d860..81dade3 100644 --- a/fs/btrfs/volumes.h +++ b/fs/btrfs/volumes.h @@ -304,6 +304,8 @@ int btrfs_resume_balance_async(struct btrfs_fs_info *fs_info); int btrfs_recover_balance(struct btrfs_fs_info *fs_info); int btrfs_pause_balance(struct btrfs_fs_info *fs_info); int btrfs_cancel_balance(struct btrfs_fs_info *fs_info); +int btrfs_create_uuid_tree(struct btrfs_fs_info *fs_info, + struct btrfs_root *tree_root); int btrfs_chunk_readonly(struct btrfs_root *root, u64 chunk_offset); int find_free_dev_extent(struct btrfs_device *device, u64 num_bytes, u64 *start, u64 *max_avail); -- 1.8.2.1 -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Stefan Behrens
2013-Apr-19 15:41 UTC
[PATCH 4/5] Btrfs: maintain subvolume items in the UUID tree
When a new subvolume or snapshot is created, a new UUID item is added to the UUID tree. Such items are removed when the subvolume is deleted. The ioctl to set the received subvolume UUID is also touched and will now also add this received UUID into the UUID tree together with the ID of the subvolume. The latter is also done when read-only snapshots are created which inherit all the send/receive information from the parent subvolume. User mode programs use the BTRFS_IOC_TREE_SEARCH ioctl to search and read in the UUID tree. Signed-off-by: Stefan Behrens <sbehrens@giantdisaster.de> --- fs/btrfs/ioctl.c | 65 ++++++++++++++++++++++++++++++++++++++++++-------- fs/btrfs/transaction.c | 21 +++++++++++++++- 2 files changed, 75 insertions(+), 11 deletions(-) diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c index d0af96a..9bd423e 100644 --- a/fs/btrfs/ioctl.c +++ b/fs/btrfs/ioctl.c @@ -57,6 +57,8 @@ #include "send.h" #include "dev-replace.h" +static char empty_uuid[BTRFS_UUID_SIZE] = {0}; + /* Mask out flags that are inappropriate for the given type of inode. */ static inline __u32 btrfs_mask_flags(umode_t mode, __u32 flags) { @@ -396,7 +398,7 @@ static noinline int create_subvol(struct inode *dir, * of create_snapshot(). */ ret = btrfs_subvolume_reserve_metadata(root, &block_rsv, - 7, &qgroup_reserved); + 8, &qgroup_reserved); if (ret) return ret; @@ -518,9 +520,13 @@ static noinline int create_subvol(struct inode *dir, ret = btrfs_add_root_ref(trans, root->fs_info->tree_root, objectid, root->root_key.objectid, btrfs_ino(dir), index, name, namelen); - BUG_ON(ret); + ret = btrfs_insert_uuid_subvol_item(trans, root->fs_info->uuid_root, + root_item.uuid, objectid); + if (ret) + btrfs_abort_transaction(trans, root, ret); + fail: trans->block_rsv = NULL; trans->bytes_reserved = 0; @@ -567,9 +573,10 @@ static int create_snapshot(struct btrfs_root *root, struct inode *dir, * 1 - root item * 2 - root ref/backref * 1 - root of snapshot + * 1 - UUID item */ ret = btrfs_subvolume_reserve_metadata(BTRFS_I(dir)->root, - &pending_snapshot->block_rsv, 7, + &pending_snapshot->block_rsv, 8, &pending_snapshot->qgroup_reserved); if (ret) goto out; @@ -580,7 +587,7 @@ static int create_snapshot(struct btrfs_root *root, struct inode *dir, pending_snapshot->dir = dir; pending_snapshot->inherit = inherit; - trans = btrfs_start_transaction(root, 0); + trans = btrfs_start_transaction(root->fs_info->extent_root, 8); if (IS_ERR(trans)) { ret = PTR_ERR(trans); goto fail; @@ -2207,6 +2214,28 @@ static noinline int btrfs_ioctl_snap_destroy(struct file *file, goto out_end_trans; } } + + ret = btrfs_del_uuid_subvol_item(trans, root->fs_info->uuid_root, + dest->root_item.uuid, + dest->root_key.objectid); + if (ret && ret != -ENOENT) { + btrfs_abort_transaction(trans, root, ret); + err = ret; + goto out_end_trans; + } + if (memcmp(dest->root_item.received_uuid, empty_uuid, + BTRFS_UUID_SIZE)) { + ret = btrfs_del_uuid_received_subvol_item(trans, + root->fs_info->uuid_root, + dest->root_item.received_uuid, + dest->root_key.objectid); + if (ret && ret != -ENOENT) { + btrfs_abort_transaction(trans, root, ret); + err = ret; + goto out_end_trans; + } + } + out_end_trans: trans->block_rsv = NULL; trans->bytes_reserved = 0; @@ -2418,7 +2447,6 @@ static long btrfs_ioctl_dev_info(struct btrfs_root *root, void __user *arg) struct btrfs_fs_devices *fs_devices = root->fs_info->fs_devices; int ret = 0; char *s_uuid = NULL; - char empty_uuid[BTRFS_UUID_SIZE] = {0}; if (!capable(CAP_SYS_ADMIN)) return -EPERM; @@ -3896,6 +3924,7 @@ static long btrfs_ioctl_set_received_subvol(struct file *file, struct btrfs_trans_handle *trans; struct timespec ct = CURRENT_TIME; int ret = 0; + int received_uuid_changed; ret = mnt_want_write_file(file); if (ret < 0) @@ -3925,7 +3954,7 @@ static long btrfs_ioctl_set_received_subvol(struct file *file, goto out; } - trans = btrfs_start_transaction(root, 1); + trans = btrfs_start_transaction(root, 3); if (IS_ERR(trans)) { ret = PTR_ERR(trans); trans = NULL; @@ -3936,6 +3965,14 @@ static long btrfs_ioctl_set_received_subvol(struct file *file, sa->rtime.sec = ct.tv_sec; sa->rtime.nsec = ct.tv_nsec; + received_uuid_changed = memcmp(root_item->received_uuid, sa->uuid, + BTRFS_UUID_SIZE); + if (received_uuid_changed && + memcmp(root_item->received_uuid, empty_uuid, BTRFS_UUID_SIZE)) + btrfs_del_uuid_received_subvol_item(trans, + root->fs_info->uuid_root, + root_item->received_uuid, + root->root_key.objectid); memcpy(root_item->received_uuid, sa->uuid, BTRFS_UUID_SIZE); btrfs_set_root_stransid(root_item, sa->stransid); btrfs_set_root_rtransid(root_item, sa->rtransid); @@ -3948,13 +3985,21 @@ static long btrfs_ioctl_set_received_subvol(struct file *file, &root->root_key, &root->root_item); if (ret < 0) { btrfs_end_transaction(trans, root); - trans = NULL; goto out; - } else { - ret = btrfs_commit_transaction(trans, root); - if (ret < 0) + } + if (received_uuid_changed && + memcmp(sa->uuid, empty_uuid, BTRFS_UUID_SIZE)) { + ret = btrfs_insert_uuid_received_subvol_item( + trans, root->fs_info->uuid_root, sa->uuid, + root->root_key.objectid); + if (ret < 0 && ret != -EEXIST) { + btrfs_abort_transaction(trans, root, ret); goto out; + } } + ret = btrfs_commit_transaction(trans, root); + if (ret < 0) + goto out; ret = copy_to_user(arg, sa, sizeof(*sa)); if (ret) diff --git a/fs/btrfs/transaction.c b/fs/btrfs/transaction.c index 94cbd10..dadbdca 100644 --- a/fs/btrfs/transaction.c +++ b/fs/btrfs/transaction.c @@ -34,6 +34,8 @@ #define BTRFS_ROOT_TRANS_TAG 0 +static char empty_uuid[BTRFS_UUID_SIZE] = {0}; + void put_transaction(struct btrfs_transaction *transaction) { WARN_ON(atomic_read(&transaction->use_count) == 0); @@ -1264,8 +1266,25 @@ static noinline int create_pending_snapshot(struct btrfs_trans_handle *trans, dentry->d_name.len * 2); parent_inode->i_mtime = parent_inode->i_ctime = CURRENT_TIME; ret = btrfs_update_inode_fallback(trans, parent_root, parent_inode); - if (ret) + if (ret) { btrfs_abort_transaction(trans, root, ret); + goto fail; + } + ret = btrfs_insert_uuid_subvol_item(trans, fs_info->uuid_root, + new_uuid.b, objectid); + if (ret) { + btrfs_abort_transaction(trans, root, ret); + goto fail; + } + if (memcmp(new_root_item->received_uuid, empty_uuid, BTRFS_UUID_SIZE)) { + ret = btrfs_insert_uuid_received_subvol_item( + trans, fs_info->uuid_root, new_root_item->received_uuid, + objectid); + if (ret && ret != -EEXIST) { + btrfs_abort_transaction(trans, root, ret); + goto fail; + } + } fail: pending->error = ret; dir_item_existed: -- 1.8.2.1 -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
When the UUID tree is initially created, a task is spawned that walks through the root tree. For each found subvolume root_item, the uuid and received_uuid entries in the UUID tree are added. This is such a quick operation so that in case somebody wants to unmount the filesystem while the task is still running, the unmount is delayed until the UUID tree building task is finished. Signed-off-by: Stefan Behrens <sbehrens@giantdisaster.de> --- fs/btrfs/ctree.h | 3 ++ fs/btrfs/disk-io.c | 5 ++ fs/btrfs/volumes.c | 149 ++++++++++++++++++++++++++++++++++++++++++++++++++++- 3 files changed, 156 insertions(+), 1 deletion(-) diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h index c9baf55..e35268e 100644 --- a/fs/btrfs/ctree.h +++ b/fs/btrfs/ctree.h @@ -23,6 +23,7 @@ #include <linux/highmem.h> #include <linux/fs.h> #include <linux/rwsem.h> +#include <linux/semaphore.h> #include <linux/completion.h> #include <linux/backing-dev.h> #include <linux/wait.h> @@ -1637,6 +1638,8 @@ struct btrfs_fs_info { struct btrfs_dev_replace dev_replace; atomic_t mutually_exclusive_operation_running; + + struct semaphore uuid_scan_sem; }; /* diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c index 75d196b..cbda829 100644 --- a/fs/btrfs/disk-io.c +++ b/fs/btrfs/disk-io.c @@ -31,6 +31,7 @@ #include <linux/migrate.h> #include <linux/ratelimit.h> #include <linux/uuid.h> +#include <linux/semaphore.h> #include <asm/unaligned.h> #include "compat.h" #include "ctree.h" @@ -2263,6 +2264,7 @@ int open_ctree(struct super_block *sb, init_rwsem(&fs_info->extent_commit_sem); init_rwsem(&fs_info->cleanup_work_sem); init_rwsem(&fs_info->subvol_sem); + sema_init(&fs_info->uuid_scan_sem, 1); fs_info->dev_replace.lock_owner = 0; atomic_set(&fs_info->dev_replace.nesting_level, 0); mutex_init(&fs_info->dev_replace.lock_finishing_cancel_unmount); @@ -3505,6 +3507,9 @@ int close_ctree(struct btrfs_root *root) fs_info->closing = 1; smp_mb(); + /* wait for the uuid_scan task to finish */ + down(&fs_info->uuid_scan_sem); + /* pause restriper - we want to resume on mount */ btrfs_pause_balance(fs_info); diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c index 5466aae..0cddd0c 100644 --- a/fs/btrfs/volumes.c +++ b/fs/btrfs/volumes.c @@ -26,6 +26,7 @@ #include <linux/ratelimit.h> #include <linux/kthread.h> #include <linux/raid/pq.h> +#include <linux/semaphore.h> #include <asm/div64.h> #include "compat.h" #include "ctree.h" @@ -50,6 +51,7 @@ static void btrfs_dev_stat_print_on_load(struct btrfs_device *device); static DEFINE_MUTEX(uuid_mutex); static LIST_HEAD(fs_uuids); +static char empty_uuid[BTRFS_UUID_SIZE] = {0}; static void lock_chunks(struct btrfs_root *root) { @@ -3426,11 +3428,136 @@ int btrfs_cancel_balance(struct btrfs_fs_info *fs_info) return 0; } +static int btrfs_uuid_scan_kthread(void *data) +{ + struct btrfs_fs_info *fs_info = data; + struct btrfs_root *root = fs_info->tree_root; + struct btrfs_key key; + struct btrfs_key max_key; + struct btrfs_path *path = NULL; + int ret = 0; + struct extent_buffer *eb; + int slot; + struct btrfs_root_item root_item; + u32 item_size; + struct btrfs_trans_handle *trans; + + path = btrfs_alloc_path(); + if (!path) { + pr_warn("btrfs: UUID scan failed, ENOMEM\n"); + goto out; + } + + key.objectid = 0; + key.type = BTRFS_ROOT_ITEM_KEY; + key.offset = 0; + + max_key.objectid = (u64)-1; + max_key.type = BTRFS_ROOT_ITEM_KEY; + max_key.offset = (u64)-1; + + path->keep_locks = 1; + + while (1) { + ret = btrfs_search_forward(root, &key, &max_key, path, 0); + if (ret) { + if (ret < 0) + pr_warn("btrfs: UUID scan failed, %d\n", ret); + else + ret = 0; + break; + } + + if (key.type != BTRFS_ROOT_ITEM_KEY || + (key.objectid < BTRFS_FIRST_FREE_OBJECTID && + key.objectid != BTRFS_FS_TREE_OBJECTID) || + key.objectid > BTRFS_LAST_FREE_OBJECTID) + goto skip; + + eb = path->nodes[0]; + slot = path->slots[0]; + item_size = btrfs_item_size_nr(eb, slot); + if (item_size < sizeof(root_item)) + goto skip; + + trans = NULL; + read_extent_buffer(eb, &root_item, + btrfs_item_ptr_offset(eb, slot), + (int)sizeof(root_item)); + if (memcmp(root_item.uuid, empty_uuid, BTRFS_UUID_SIZE)) { + trans = btrfs_start_transaction(fs_info->uuid_root, 2); + if (IS_ERR(trans)) { + ret = PTR_ERR(trans); + break; + } + ret = btrfs_insert_uuid_subvol_item(trans, + fs_info->uuid_root, + root_item.uuid, + key.objectid); + if (ret < 0) { + pr_warn("btrfs: insert_uuid_received_subvol_item failed %d\n", + ret); + break; + } + } + + if (memcmp(root_item.received_uuid, empty_uuid, + BTRFS_UUID_SIZE)) { + if (!trans) { + trans = btrfs_start_transaction( + fs_info->uuid_root, 2); + if (IS_ERR(trans)) { + ret = PTR_ERR(trans); + break; + } + } + ret = btrfs_insert_uuid_received_subvol_item( + trans, fs_info->uuid_root, + root_item.received_uuid, key.objectid); + if (ret < 0) { + pr_warn("btrfs: insert_uuid_received_subvol_item failed %d\n", + ret); + break; + } + } + + if (trans) { + ret = btrfs_end_transaction(trans, fs_info->uuid_root); + if (ret) + break; + } + +skip: + btrfs_release_path(path); + if (key.offset < (u64)-1) { + key.offset++; + } else if (key.type < BTRFS_ROOT_ITEM_KEY) { + key.offset = 0; + key.type++; + } else if (key.objectid < (u64)-1) { + key.offset = 0; + key.type = 0; + key.objectid++; + } else { + break; + } + } + +out: + btrfs_free_path(path); + if (ret) + pr_warn("btrfs: start_transaction failed %d\n", ret); + up(&fs_info->uuid_scan_sem); + return 0; +} + int btrfs_create_uuid_tree(struct btrfs_fs_info *fs_info, struct btrfs_root *tree_root) { struct btrfs_trans_handle *trans; struct btrfs_root *uuid_root; + struct task_struct *task; + int ret; trans = btrfs_start_transaction(tree_root, 2); if (IS_ERR(trans)) @@ -3447,8 +3574,28 @@ int btrfs_create_uuid_tree(struct btrfs_fs_info *fs_info, fs_info->uuid_root = uuid_root; uuid_root->track_dirty = 1; - return btrfs_commit_transaction(trans, tree_root); + ret = btrfs_commit_transaction(trans, tree_root); + if (ret) + return ret; + + down(&fs_info->uuid_scan_sem); + task = kthread_run(btrfs_uuid_scan_kthread, fs_info, + "btrfs-uuid"); + if (IS_ERR(task)) { + /* + * It is not implemented to retry the scanning + * on next mount. It''s just not worth the effort + * to implement it, UUID tree entries are not + * mission critical. + */ + pr_warn("btrfs: failed to start uuid_scan task\n"); + up(&fs_info->uuid_scan_sem); + return PTR_ERR(task); + } + + return 0; } + /* * shrinking a device means finding all of the device extents past * the new size, and then following the back refs to the chunks. -- 1.8.2.1 -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
David Sterba
2013-Apr-29 14:31 UTC
Re: [PATCH 2/5] Btrfs: support printing UUID tree elements
On Fri, Apr 19, 2013 at 05:41:03PM +0200, Stefan Behrens wrote:> --- a/fs/btrfs/print-tree.c > +++ b/fs/btrfs/print-tree.c > +static void print_uuid_item(struct extent_buffer *l, > + struct btrfs_uuid_item *ptr, > + u64 item_size) > +{ > + do { > + u64 sub_item_type; > + u64 sub_item_len; > + u64 subvol_id; > + > + if (item_size < sizeof(*ptr)) { > + printk(KERN_INFO "btrfs: uuid item too short!\n");please print the expected and found sizes (also in (2) below)> + return; > + } > + sub_item_type = btrfs_uuid_type(l, ptr); > + sub_item_len = btrfs_uuid_len(l, ptr); > + ptr++; > + item_size -= sizeof(*ptr); > + if (sub_item_len * 8 > item_size) {For documentation purposes, I think using sizeof(u64) instead of 8.> + printk(KERN_INFO "btrfs: uuid item too short (2)!\n"); > + return; > + } > + > + item_size -= sub_item_len * 8; > + switch (sub_item_type) { > + case BTRFS_UUID_ITEM_TYPE_SUBVOL: > + while (sub_item_len) { > + read_extent_buffer(l, &subvol_id, > + (unsigned long)ptr, 8); > + printk(KERN_INFO "\t\tsubvol_id %llu\n", > + (unsigned long long) > + le64_to_cpu(subvol_id)); > + sub_item_len--; > + ptr = (struct btrfs_uuid_item *) > + (((char *)ptr) + 8);and this could be wrapped in a macro or function, it''s repeated several times in that function.> + } > + break; > + case BTRFS_UUID_ITEM_TYPE_RECEIVED_SUBVOL: > + while (sub_item_len) { > + read_extent_buffer(l, &subvol_id, > + (unsigned long)ptr, 8); > + printk(KERN_INFO "\t\treceived_subvol_id %llu\n", > + (unsigned long long) > + le64_to_cpu(subvol_id)); > + sub_item_len--; > + ptr = (struct btrfs_uuid_item *) > + (((char *)ptr) + 8); > + } > + break; > + default: > + printk(KERN_INFO "\t\tunknown type=%llu, len=8*%llu\n", > + (unsigned long long)sub_item_type, > + (unsigned long long)sub_item_len); > + while (sub_item_len) { > + read_extent_buffer(l, &subvol_id, > + (unsigned long)ptr, 8); > + printk(KERN_INFO "\t\tid %llu\n", > + (unsigned long long) > + le64_to_cpu(subvol_id)); > + sub_item_len--; > + ptr = (struct btrfs_uuid_item *) > + (((char *)ptr) + 8); > + } > + break; > + } > + } while (item_size); > +}-- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Fri, Apr 19, 2013 at 05:41:04PM +0200, Stefan Behrens wrote:> @@ -2830,6 +2857,15 @@ retry_root_backup: > return ret; > } > > + if (need_to_create_uuid_tree) { > + ret = btrfs_create_uuid_tree(fs_info, tree_root); > + if (ret) { > + pr_warn("btrfs: failed to create the UUID tree\n");Please print the error code> + close_ctree(tree_root); > + return ret; > + } > + } > + > return 0; > > fail_qgroup:-- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
David Sterba
2013-Apr-29 14:50 UTC
Re: [PATCH 4/5] Btrfs: maintain subvolume items in the UUID tree
On Fri, Apr 19, 2013 at 05:41:05PM +0200, Stefan Behrens wrote:> --- a/fs/btrfs/ioctl.c > +++ b/fs/btrfs/ioctl.c > @@ -57,6 +57,8 @@ > #include "send.h" > #include "dev-replace.h" > > +static char empty_uuid[BTRFS_UUID_SIZE] = {0};+ const> @@ -567,9 +573,10 @@ static int create_snapshot(struct btrfs_root *root, struct inode *dir, > * 1 - root item > * 2 - root ref/backref > * 1 - root of snapshot > + * 1 - UUID item > */ > ret = btrfs_subvolume_reserve_metadata(BTRFS_I(dir)->root, > - &pending_snapshot->block_rsv, 7, > + &pending_snapshot->block_rsv, 8, > &pending_snapshot->qgroup_reserved); > if (ret) > goto out; > @@ -580,7 +587,7 @@ static int create_snapshot(struct btrfs_root *root, struct inode *dir, > pending_snapshot->dir = dir; > pending_snapshot->inherit = inherit; > > - trans = btrfs_start_transaction(root, 0); > + trans = btrfs_start_transaction(root->fs_info->extent_root, 8);This look suspicious in 2 ways: * why is root switched to extent_root * what''s the reason of 8 units being reserved> if (IS_ERR(trans)) { > ret = PTR_ERR(trans); > goto fail; > @@ -3925,7 +3954,7 @@ static long btrfs_ioctl_set_received_subvol(struct file *file, > goto out; > } > > - trans = btrfs_start_transaction(root, 1); > + trans = btrfs_start_transaction(root, 3);Please document what''s being reserved> if (IS_ERR(trans)) { > ret = PTR_ERR(trans); > trans = NULL; > --- a/fs/btrfs/transaction.c > +++ b/fs/btrfs/transaction.c > @@ -34,6 +34,8 @@ > > #define BTRFS_ROOT_TRANS_TAG 0 > > +static char empty_uuid[BTRFS_UUID_SIZE] = {0};second empty_uuid? well, alternatively you could implement a helper "is uuid empty" and compare to zeros directly> + > void put_transaction(struct btrfs_transaction *transaction) > { > WARN_ON(atomic_read(&transaction->use_count) == 0);-- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Fri, Apr 19, 2013 at 05:41:06PM +0200, Stefan Behrens wrote:> When the UUID tree is initially created, a task is spawned that > walks through the root tree. For each found subvolume root_item, > the uuid and received_uuid entries in the UUID tree are added. > This is such a quick operation so that in case somebody wants > to unmount the filesystem while the task is still running, the > unmount is delayed until the UUID tree building task is finished.I think the speed of this operation depends on the internal state of the fs (fragmentation, number of subvols) and this could potentially take long. I''d rather see the rescanning process to be interruptible and restartable, but I take your word for now that it''s quick.> --- a/fs/btrfs/volumes.c > +++ b/fs/btrfs/volumes.c > @@ -26,6 +26,7 @@ > #include <linux/ratelimit.h> > #include <linux/kthread.h> > #include <linux/raid/pq.h> > +#include <linux/semaphore.h> > #include <asm/div64.h> > #include "compat.h" > #include "ctree.h" > @@ -50,6 +51,7 @@ static void btrfs_dev_stat_print_on_load(struct btrfs_device *device); > > static DEFINE_MUTEX(uuid_mutex); > static LIST_HEAD(fs_uuids); > +static char empty_uuid[BTRFS_UUID_SIZE] = {0};third empty_uuid!> +static int btrfs_uuid_scan_kthread(void *data) > +{ > + struct btrfs_fs_info *fs_info = data; > + struct btrfs_root *root = fs_info->tree_root; > + struct btrfs_key key; > + struct btrfs_key max_key; > + struct btrfs_path *path = NULL; > + int ret = 0; > + struct extent_buffer *eb; > + int slot; > + struct btrfs_root_item root_item; > + u32 item_size; > + struct btrfs_trans_handle *trans; > + > + path = btrfs_alloc_path(); > + if (!path) { > + pr_warn("btrfs: UUID scan failed, ENOMEM\n"); > + goto out; > + } > + > + key.objectid = 0; > + key.type = BTRFS_ROOT_ITEM_KEY; > + key.offset = 0; > + > + max_key.objectid = (u64)-1; > + max_key.type = BTRFS_ROOT_ITEM_KEY; > + max_key.offset = (u64)-1; > + > + path->keep_locks = 1; > + > + while (1) {a big loop, add a cond_resched()> + ret = btrfs_search_forward(root, &key, &max_key, path, 0); > + if (ret) { > + if (ret < 0) > + pr_warn("btrfs: UUID scan failed, %d\n", ret); > + else > + ret = 0; > + break; > + } > + > + if (key.type != BTRFS_ROOT_ITEM_KEY || > + (key.objectid < BTRFS_FIRST_FREE_OBJECTID && > + key.objectid != BTRFS_FS_TREE_OBJECTID) || > + key.objectid > BTRFS_LAST_FREE_OBJECTID) > + goto skip; > + > + eb = path->nodes[0]; > + slot = path->slots[0]; > + item_size = btrfs_item_size_nr(eb, slot); > + if (item_size < sizeof(root_item)) > + goto skip; > + > + trans = NULL; > + read_extent_buffer(eb, &root_item, > + btrfs_item_ptr_offset(eb, slot), > + (int)sizeof(root_item)); > + if (memcmp(root_item.uuid, empty_uuid, BTRFS_UUID_SIZE)) { > + trans = btrfs_start_transaction(fs_info->uuid_root, 2); > + if (IS_ERR(trans)) { > + ret = PTR_ERR(trans); > + break; > + } > + ret = btrfs_insert_uuid_subvol_item(trans, > + fs_info->uuid_root, > + root_item.uuid, > + key.objectid); > + if (ret < 0) { > + pr_warn("btrfs: insert_uuid_received_subvol_item failed %d\n", > + ret); > + break; > + } > + } > + > + if (memcmp(root_item.received_uuid, empty_uuid, > + BTRFS_UUID_SIZE)) { > + if (!trans) { > + trans = btrfs_start_transaction( > + fs_info->uuid_root, 2); > + if (IS_ERR(trans)) { > + ret = PTR_ERR(trans); > + break; > + } > + } > + ret = btrfs_insert_uuid_received_subvol_item( > + trans, fs_info->uuid_root, > + root_item.received_uuid, key.objectid); > + if (ret < 0) { > + pr_warn("btrfs: insert_uuid_received_subvol_item failed %d\n", > + ret); > + break; > + } > + } > + > + if (trans) { > + ret = btrfs_end_transaction(trans, fs_info->uuid_root); > + if (ret) > + break; > + } > + > +skip: > + btrfs_release_path(path); > + if (key.offset < (u64)-1) { > + key.offset++; > + } else if (key.type < BTRFS_ROOT_ITEM_KEY) { > + key.offset = 0; > + key.type++; > + } else if (key.objectid < (u64)-1) { > + key.offset = 0; > + key.type = 0; > + key.objectid++; > + } else { > + break; > + } > + } > + > +out: > + btrfs_free_path(path); > + if (ret) > + pr_warn("btrfs: start_transaction failed %d\n", ret); > + up(&fs_info->uuid_scan_sem);Does lockdep need to be instructed that a semaphore is released in a different thread it''s been taken? Not sure if this wasn''t for mutexes.> + return 0; > +}-- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Mon, 29 Apr 2013 17:12:20 +0200, David Sterba wrote:> On Fri, Apr 19, 2013 at 05:41:06PM +0200, Stefan Behrens wrote:[...]>> + up(&fs_info->uuid_scan_sem); > > Does lockdep need to be instructed that a semaphore is released in a > different thread it''s been taken? Not sure if this wasn''t for mutexes.Not for semaphores. I''m also working on handling the case that the filesystem is mounted with an old kernel (what Josef and you have said in #btrfs). Thanks for your review comments to the 4 patches, David! All of them make sense and I will change it according to your comments. -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Josef Bacik
2013-Apr-29 19:27 UTC
Re: [PATCH 1/5] Btrfs: introduce a tree for items that map UUIDs to something
On Fri, Apr 19, 2013 at 09:41:02AM -0600, Stefan Behrens wrote:> Mapping UUIDs to subvolume IDs is an operation with a high effort > today. Today, the algorithm even has quadratic effort (based on the > number of existing subvolumes), which means, that it takes minutes > to send/receive a single subvolume if 10,000 subvolumes exist. But > even linear effort would be too much since it is a waste. And these > data structures to allow mapping UUIDs to subvolume IDs are created > every time a btrfs send/receive instance is started. > > It is much more efficient to maintain a searchable persistent data > structure in the filesystem, one that is updated whenever a > subvolume/snapshot is created and deleted, and when the received > subvolume UUID is set by the btrfs-receive tool. > > Therefore kernel code is added with this commit that is able to > maintain data structures in the filesystem that allow to quickly > search for a given UUID and to retrieve data that is assigned to > this UUID, like which subvolume ID is related to this UUID. > > This commit adds a new tree to hold UUID-to-data mapping items. The > key of the items is the full UUID plus the key type BTRFS_UUID_KEY. > Multiple data blocks can be stored for a given UUID, a type/length/ > value scheme is used. > > Now follows the lengthy justification, why a new tree was added > instead of using the existing root tree: > > The first approach was to not create another tree that holds UUID > items. Instead, the items should just go into the top root tree. > Unfortunately this confused the algorithm to assign the objectid > of subvolumes and snapshots. The reason is that > btrfs_find_free_objectid() calls btrfs_find_highest_objectid() for > the first created subvol or snapshot after mounting a filesystem, > and this function simply searches for the largest used objectid in > the root tree keys to pick the next objectid to assign. Of course, > the UUID keys have always been the ones with the highest offset > value, and the next assigned subvol ID was wastefully huge. > > To use any other existing tree did not look proper. To apply a > workaround such as setting the objectid to zero in the UUID item > key and to implement collision handling would either add > limitations (in case of a btrfs_extend_item() approach to handle > the collisions) or a lot of complexity and source code (in case a > key would be looked up that is free of collisions). Adding new code > that introduces limitations is not good, and adding code that is > complex and lengthy for no good reason is also not good. That''s the > justification why a completely new tree was introduced. > > Signed-off-by: Stefan Behrens <sbehrens@giantdisaster.de> > --- > fs/btrfs/Makefile | 3 +- > fs/btrfs/ctree.h | 50 ++++++ > fs/btrfs/uuid-tree.c | 497 +++++++++++++++++++++++++++++++++++++++++++++++++++ > 3 files changed, 549 insertions(+), 1 deletion(-) > > diff --git a/fs/btrfs/Makefile b/fs/btrfs/Makefile > index 3932224..a550dfc 100644 > --- a/fs/btrfs/Makefile > +++ b/fs/btrfs/Makefile > @@ -8,7 +8,8 @@ btrfs-y += super.o ctree.o extent-tree.o print-tree.o root-tree.o dir-item.o \ > extent_io.o volumes.o async-thread.o ioctl.o locking.o orphan.o \ > export.o tree-log.o free-space-cache.o zlib.o lzo.o \ > compression.o delayed-ref.o relocation.o delayed-inode.o scrub.o \ > - reada.o backref.o ulist.o qgroup.o send.o dev-replace.o raid56.o > + reada.o backref.o ulist.o qgroup.o send.o dev-replace.o raid56.o \ > + uuid-tree.o > > btrfs-$(CONFIG_BTRFS_FS_POSIX_ACL) += acl.o > btrfs-$(CONFIG_BTRFS_FS_CHECK_INTEGRITY) += check-integrity.o > diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h > index 84be717..0439570 100644 > --- a/fs/btrfs/ctree.h > +++ b/fs/btrfs/ctree.h > @@ -94,6 +94,9 @@ struct btrfs_ordered_sum; > /* holds quota configuration and tracking */ > #define BTRFS_QUOTA_TREE_OBJECTID 8ULL > > +/* for storing items that use the BTRFS_UUID_KEY */ > +#define BTRFS_UUID_TREE_OBJECTID 9ULL > + > /* orhpan objectid for tracking unlinked/truncated files */ > #define BTRFS_ORPHAN_OBJECTID -5ULL > > @@ -953,6 +956,18 @@ struct btrfs_dev_replace_item { > __le64 num_uncorrectable_read_errors; > } __attribute__ ((__packed__)); > > +/* for items that use the BTRFS_UUID_KEY */ > +#define BTRFS_UUID_ITEM_TYPE_SUBVOL 0 /* for UUIDs assigned to subvols */ > +#define BTRFS_UUID_ITEM_TYPE_RECEIVED_SUBVOL 1 /* for UUIDs assigned to > + * received subvols */ > + > +/* a sequence of such items is stored under the BTRFS_UUID_KEY */ > +struct btrfs_uuid_item { > + __le64 type; /* refer to BTRFS_UUID_ITEM_TYPE* defines above */ > + __le64 len; /* number of following 64bit values */ > + __le64 subid[0]; /* sequence of subids */ > +} __attribute__ ((__packed__)); > + > /* different types of block groups (and chunks) */ > #define BTRFS_BLOCK_GROUP_DATA (1ULL << 0) > #define BTRFS_BLOCK_GROUP_SYSTEM (1ULL << 1) > @@ -1889,6 +1904,17 @@ struct btrfs_ioctl_defrag_range_args { > #define BTRFS_DEV_REPLACE_KEY 250 > > /* > + * Stores items that allow to quickly map UUIDs to something else. > + * These items are part of the filesystem UUID tree. > + * The key is built like this: > + * (UUID_upper_64_bits, BTRFS_UUID_KEY, UUID_lower_64_bits). > + */ > +#if BTRFS_UUID_SIZE != 16 > +#error "UUID items require BTRFS_UUID_SIZE == 16!" > +#endif > +#define BTRFS_UUID_KEY 251 > + > +/* > * string items are for debugging. They just store a short string of > * data in the FS > */ > @@ -2946,6 +2972,12 @@ BTRFS_SETGET_FUNCS(dev_replace_cursor_left, struct btrfs_dev_replace_item, > BTRFS_SETGET_FUNCS(dev_replace_cursor_right, struct btrfs_dev_replace_item, > cursor_right, 64); > > +/* btrfs_uuid_item */ > +BTRFS_SETGET_FUNCS(uuid_type, struct btrfs_uuid_item, type, 64); > +BTRFS_SETGET_FUNCS(uuid_len, struct btrfs_uuid_item, len, 64); > +BTRFS_SETGET_STACK_FUNCS(stack_uuid_type, struct btrfs_uuid_item, type, 64); > +BTRFS_SETGET_STACK_FUNCS(stack_uuid_len, struct btrfs_uuid_item, len, 64); > + > BTRFS_SETGET_STACK_FUNCS(stack_dev_replace_src_devid, > struct btrfs_dev_replace_item, src_devid, 64); > BTRFS_SETGET_STACK_FUNCS(stack_dev_replace_cont_reading_from_srcdev_mode, > @@ -3373,6 +3405,24 @@ void btrfs_check_and_init_root_item(struct btrfs_root_item *item); > void btrfs_update_root_times(struct btrfs_trans_handle *trans, > struct btrfs_root *root); > > +/* uuid-tree.c */ > +int btrfs_lookup_uuid_subvol_item(struct btrfs_root *uuid_root, u8 *uuid, > + u64 *subvol_id); > +int btrfs_insert_uuid_subvol_item(struct btrfs_trans_handle *trans, > + struct btrfs_root *uuid_root, u8 *uuid, > + u64 subvol_id); > +int btrfs_del_uuid_subvol_item(struct btrfs_trans_handle *trans, > + struct btrfs_root *uuid_root, u8 *uuid, > + u64 subvol_id); > +int btrfs_lookup_uuid_received_subvol_item(struct btrfs_root *uuid_root, > + u8 *uuid, u64 *subvol_id); > +int btrfs_insert_uuid_received_subvol_item(struct btrfs_trans_handle *trans, > + struct btrfs_root *uuid_root, > + u8 *uuid, u64 subvol_id); > +int btrfs_del_uuid_received_subvol_item(struct btrfs_trans_handle *trans, > + struct btrfs_root *uuid_root, u8 *uuid, > + u64 subvol_id); > + > /* dir-item.c */ > int btrfs_check_dir_item_collision(struct btrfs_root *root, u64 dir, > const char *name, int name_len); > diff --git a/fs/btrfs/uuid-tree.c b/fs/btrfs/uuid-tree.c > new file mode 100644 > index 0000000..bfc2502 > --- /dev/null > +++ b/fs/btrfs/uuid-tree.c > @@ -0,0 +1,497 @@ > +/* > + * Copyright (C) STRATO AG 2013. All rights reserved. > + * > + * This program is free software; you can redistribute it and/or > + * modify it under the terms of the GNU General Public > + * License v2 as published by the Free Software Foundation. > + * > + * This program is distributed in the hope that it will be useful, > + * but WITHOUT ANY WARRANTY; without even the implied warranty of > + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU > + * General Public License for more details. > + * > + * You should have received a copy of the GNU General Public > + * License along with this program; if not, write to the > + * Free Software Foundation, Inc., 59 Temple Place - Suite 330, > + * Boston, MA 021110-1307, USA. > + */ > +#include <linux/uuid.h> > +#include "ctree.h" > +#include "transaction.h" > +#include "disk-io.h" > +#include "print-tree.h" > + > + > +/* > + * One key is used to store a sequence of btrfs_uuid_item items. > + * Each item in the sequence contains a type information and a sequence of > + * ids (together with the information about the size of the sequence of ids). > + * {[btrfs_uuid_item type0 {id0, id1, ..., idN}], > + * ..., > + * [btrfs_uuid_item typeZ {id0, id1, ..., idN}]} > + * > + * It is forbidden to put multiple items with the same type under the same key. > + * Instead the sequence of ids is extended and used to store any additional > + * ids for the same item type. > + */ > + > +static void btrfs_uuid_16xu8_to_2xu64(u8 *uuid, u64 *first64, u64 *second64)If this is still called this in the next submission I''m flying to Germany to kick you.> +{ > + u64 val; > + int i; > + int j; > + > + for (i = 0; i < 2; i++) { > + val = 0; > + for (j = 7; j >= 0; j--) { > + /* in host byte order, resulting disk key is le */ > + val |= ((u64)*uuid) << (j * 8); > + uuid++; > + } > + if (!i) > + *first64 = val; > + else > + *second64 = val; > + } > +} > + > +static struct btrfs_uuid_item *btrfs_match_uuid_item_type( > + struct btrfs_path *path, u64 type) > +{ > + struct extent_buffer *eb; > + int slot; > + struct btrfs_uuid_item *ptr; > + u32 item_size; > + > + eb = path->nodes[0]; > + slot = path->slots[0]; > + ptr = btrfs_item_ptr(eb, slot, struct btrfs_uuid_item); > + item_size = btrfs_item_size_nr(eb, slot); > + do { > + u64 sub_item_type; > + u64 sub_item_len; > + > + if (item_size < sizeof(*ptr)) { > + pr_warn("btrfs: uuid item too short (%llu < %d)!\n", > + (unsigned long long)item_size, > + (int)sizeof(*ptr)); > + return NULL; > + } > + item_size -= sizeof(*ptr); > + sub_item_type = btrfs_uuid_type(eb, ptr); > + sub_item_len = btrfs_uuid_len(eb, ptr); > + if (sub_item_len * 8 > item_size) { > + pr_warn("btrfs: uuid item too short (%llu > %llu)!\n", > + (unsigned long long)(sub_item_len * 8), > + (unsigned long long)item_size); > + return NULL; > + } > + if (sub_item_type == type) > + return ptr; > + item_size -= sub_item_len * 8; > + ptr = 1 + (struct btrfs_uuid_item *) > + (((char *)ptr) + (sub_item_len * 8)); > + } while (item_size); > + > + return NULL; > +} > + > +static int btrfs_uuid_tree_lookup_prepare(struct btrfs_root *uuid_root, > + u8 *uuid, u64 type, > + struct btrfs_path **path, > + struct btrfs_uuid_item **ptr) > +{ > + int ret; > + struct btrfs_key key; > + > + WARN_ON_ONCE(!uuid_root); > + if (!uuid_root) {Include the WARN_ON_ONCE() under the if ().> + ret = -ENOENT; > + goto out; > + } > + > + key.type = BTRFS_UUID_KEY; > + btrfs_uuid_16xu8_to_2xu64(uuid, &key.objectid, &key.offset); > + > + *path = btrfs_alloc_path(); > + if (!*path) { > + ret = -ENOMEM; > + goto out; > + }This is just added complexity, have callers allocate their own path and pass it in.> + > + ret = btrfs_search_slot(NULL, uuid_root, &key, *path, 0, 0); > + if (ret < 0) > + goto out; > + if (ret > 0) { > + ret = -ENOENT; > + goto out; > + } > + > + *ptr = btrfs_match_uuid_item_type(*path, type); > + if (!*ptr) { > + ret = -ENOENT; > + goto out; > + } > + > + ret = 0; > + > +out: > + return ret; > +} > + > +/* return -ENOENT for !found, < 0 for errors, or 0 if an item was found */ > +static int btrfs_uuid_tree_lookup(struct btrfs_root *uuid_root, u8 *uuid, > + u64 type, u64 subid) > +{ > + int ret; > + struct btrfs_path *path = NULL; > + struct extent_buffer *eb; > + struct btrfs_uuid_item *ptr; > + u64 sub_item_len; > + unsigned long offset; > + > + ret = btrfs_uuid_tree_lookup_prepare(uuid_root, uuid, type, &path, > + &ptr); > + if (ret < 0) > + goto out; > + > + eb = path->nodes[0]; > + sub_item_len = btrfs_uuid_len(eb, ptr); > + WARN_ON_ONCE(sub_item_len == 0); > + ret = -ENOENT; > + ptr++; > + offset = (unsigned long)ptr; > + while (sub_item_len > 0) { > + u64 data; > + > + read_extent_buffer(eb, &data, offset, sizeof(data)); > + if (data == subid) { > + ret = 0; > + break; > + } > + offset += sizeof(data); > + sub_item_len--; > + } > + > +out: > + btrfs_free_path(path); > + return ret; > +} > + > +/* return -ENOENT for !found, < 0 for errors, or 0 if an item was found */ > +static int btrfs_uuid_tree_lookup_any(struct btrfs_root *uuid_root, u8 *uuid, > + u64 type, u64 *subid) > +{ > + int ret; > + struct btrfs_path *path = NULL; > + struct extent_buffer *eb; > + struct btrfs_uuid_item *ptr; > + u64 sub_item_len; > + > + ret = btrfs_uuid_tree_lookup_prepare(uuid_root, uuid, type, &path, > + &ptr); > + if (ret < 0) > + goto out; > + > + eb = path->nodes[0]; > + sub_item_len = btrfs_uuid_len(eb, ptr); > + WARN_ON_ONCE(sub_item_len == 0); > + if (sub_item_len > 0) { > + /* return first stored id */ > + read_extent_buffer(eb, subid, (unsigned long)(ptr + 1), > + sizeof(*subid)); > + ret = 0; > + } else { > + ret = -ENOENT; > + } > + > +out: > + btrfs_free_path(path); > + return ret; > +} > + > +/* it is not checked very the entry to add already exists */ > +static int btrfs_uuid_tree_add(struct btrfs_trans_handle *trans, > + struct btrfs_root *uuid_root, u8 *uuid, > + u64 type, u64 subid) > +{ > + int ret; > + struct btrfs_path *path = NULL; > + struct btrfs_key key; > + struct extent_buffer *eb; > + int slot; > + struct btrfs_uuid_item *ptr; > + u32 item_size; > + > + WARN_ON_ONCE(!uuid_root); > + if (!uuid_root) {Move WARN_ON_ONCE() under the if statement.> + ret = -EINVAL; > + goto out; > + } > + > + key.type = BTRFS_UUID_KEY; > + btrfs_uuid_16xu8_to_2xu64(uuid, &key.objectid, &key.offset); > + > + path = btrfs_alloc_path(); > + if (!path) { > + ret = -ENOMEM; > + goto out; > + } > + > + ret = btrfs_insert_empty_item(trans, uuid_root, path, &key, > + sizeof(*ptr) + sizeof(subid)); > + if (ret == -EEXIST) { > + ptr = btrfs_match_uuid_item_type(path, type); > + if (ptr) { > + /* > + * An item with that type already exists. > + * Extend the item and store the subid at the > + * location of the first id of this type. > + */ > + unsigned long start; > + unsigned long move_dst; > + unsigned long move_src; > + unsigned long move_len; > + > + btrfs_extend_item(uuid_root, path, sizeof(subid)); > + ptr = (struct btrfs_uuid_item *) > + (((char *)ptr) - sizeof(subid)); > + eb = path->nodes[0]; > + slot = path->slots[0]; > + item_size = btrfs_item_size_nr(eb, slot); > + WARN_ON(btrfs_uuid_len(eb, ptr) == 0); > + btrfs_set_uuid_len(eb, ptr, > + btrfs_uuid_len(eb, ptr) + 1); > + ptr++; > + start = btrfs_item_ptr_offset(eb, slot); > + move_dst = ((unsigned long)ptr) + sizeof(subid); > + move_src = (unsigned long)ptr; > + move_len = item_size - (move_dst - start); > + memmove_extent_buffer(eb, move_dst, move_src, move_len); > + > + goto write_subid; > + } else { > + btrfs_extend_item(uuid_root, path, > + sizeof(*ptr) + sizeof(subid)); > + } > + } else if (ret < 0) { > + pr_warn("btrfs: insert uuid-subvol item failed %d (0x%016llx, 0x%016llx) type %llu!\n", > + ret, (unsigned long long)key.objectid, > + (unsigned long long)key.offset, > + (unsigned long long)type); > + goto out; > + } > + > + /* > + * Add an item for the type for the first time. Either add it behind > + * items with different types, or write the very first item. > + */ > + eb = path->nodes[0]; > + slot = path->slots[0]; > + ptr = btrfs_item_ptr(eb, slot, struct btrfs_uuid_item); > + item_size = btrfs_item_size_nr(eb, slot); > + ptr = (struct btrfs_uuid_item *) > + (((char *)ptr) + item_size - (sizeof(*ptr) + sizeof(subid))); > + btrfs_set_uuid_type(eb, ptr, type); > + btrfs_set_uuid_len(eb, ptr, 1); > + ptr++; > + > +write_subid: > + ret = 0; > + write_extent_buffer(eb, &subid, (unsigned long)ptr, sizeof(subid)); > + btrfs_mark_buffer_dirty(eb); > + > +out: > + btrfs_free_path(path); > + return ret; > +} > + > +static int btrfs_uuid_tree_rem(struct btrfs_trans_handle *trans, > + struct btrfs_root *uuid_root, u8 *uuid, > + u64 type, u64 subid) > +{ > + int ret; > + struct btrfs_path *path = NULL; > + struct btrfs_key key; > + struct extent_buffer *eb; > + int slot; > + struct btrfs_uuid_item *ptr_start; > + unsigned long offset; > + u32 item_size; > + u64 id_index; > + unsigned long start; > + unsigned long move_dst; > + unsigned long move_src; > + unsigned long move_len; > + > + WARN_ON_ONCE(!uuid_root); > + if (!uuid_root) {Here too. Thanks, Josef -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html