Mark Fasheh
2012-Jun-07 23:00 UTC
[PATCH 0/3] btrfs-progs: Support for extended inode refs
Hi, The following three patches add support to btrfs-progs for extended inode refs. The kernel patch set can be found: http://www.mail-archive.com/linux-btrfs@vger.kernel.org/msg16567.html the userspace patches have been tested alongside the kernel patches and seem to be in pretty good order. These patches get us support for mkfs, btrfs-debug-tree and importantly, fsck. The mkfs patch is last so that it can easily be taken in and out of the patch series in case we wish to test these changes without actually enabling the disk feature yet. --Mark ** For reference, I will include my description of the extended inode ref design: Currently btrfs has a limitation on the maximum number of hard links an inode can have. Specifically, links are stored in an array of ref items: struct btrfs_inode_ref { __le64 index; __le16 name_len; /* name goes here */ } __attribute__ ((__packed__)); The ref arrays are found via key triple: (inode objectid, BTRFS_INODE_EXTREF_KEY, parent dir objectid) Since items can not exceed the size of a leaf, the total number of links that can be stored for a given inode / parent dir pair is limited to under 4k. This works fine for the most common case of few to only a handful of links. Once the link count gets higher however, we begin to return EMLINK. The following patches fix this situation by introducing a new ref item: struct btrfs_inode_extref { __le64 parent_objectid; __le64 index; __le16 name_len; __u8 name[0]; /* name goes here */ } __attribute__ ((__packed__)); Extended refs use a different addressing scheme. Extended ref keys look like: (inode objectid, BTRFS_INODE_EXTREF_KEY, hash) Where hash is defined as a function of the parent objectid and link name. This effectively fixes the limitation, though we have a slightly less efficient packing of link data. To keep the best of both worlds then, I implemented the following behavior: Extended refs don''t replace the existing ref array. An inode gets an extended ref for a given link _only_ after the ref array has been filled. So the most common cases shouldn''t actually see any difference in performance or disk usage as they''ll never get to the point where we''re using an extended ref. It''s important while reading the patches however that there''s still the possibility that we can have a set of operations that grow out an inode ref array (adding some extended refs) and then remove only the refs in the array. I don''t really see this being common but it''s a case we always have to consider when coding these changes. Extended refs handle the case of a hash collision by storing items with the same key in an array just like the dir item code. This means we have to search an array on rare occasion. -- Mark Fasheh -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Mark Fasheh
2012-Jun-07 23:00 UTC
[PATCH 1/3] [PATCH] btrfs-progs: Basic support for extended inode refs
From: Mark Fasheh <mfasheh@suse.com> This patch syncs the extended inode ref definitions from kernels ctree.h and adds support in btrfs-debug-tree for visualizing the state of extended refs on disk. Signed-off-by: Mark Fasheh <mfasheh@suse.de> --- ctree.h | 27 ++++++++++++++++++++++++++- print-tree.c | 44 ++++++++++++++++++++++++++++++++++++++++++++ 2 files changed, 70 insertions(+), 1 deletions(-) diff --git a/ctree.h b/ctree.h index 6545c50..ebf38fe 100644 --- a/ctree.h +++ b/ctree.h @@ -115,6 +115,13 @@ struct btrfs_trans_handle; */ #define BTRFS_NAME_LEN 255 +/* + * Theoretical limit is larger, but we keep this down to a sane + * value. That should limit greatly the possibility of collisions on + * inode ref items. + */ +#define BTRFS_LINK_MAX 65535U + /* 32 bytes in various csum fields */ #define BTRFS_CSUM_SIZE 32 @@ -412,6 +419,7 @@ struct btrfs_super_block { #define BTRFS_FEATURE_INCOMPAT_DEFAULT_SUBVOL (1ULL << 1) #define BTRFS_FEATURE_INCOMPAT_MIXED_GROUPS (1ULL << 2) #define BTRFS_FEATURE_INCOMPAT_COMPRESS_LZO (1ULL << 3) + /* * some patches floated around with a second compression method * lets save that incompat here for when they do get in @@ -426,6 +434,7 @@ struct btrfs_super_block { */ #define BTRFS_FEATURE_INCOMPAT_BIG_METADATA (1ULL << 5) +#define BTRFS_FEATURE_INCOMPAT_EXTENDED_IREF (1ULL << 6) #define BTRFS_FEATURE_COMPAT_SUPP 0ULL #define BTRFS_FEATURE_COMPAT_RO_SUPP 0ULL @@ -434,7 +443,8 @@ struct btrfs_super_block { BTRFS_FEATURE_INCOMPAT_DEFAULT_SUBVOL | \ BTRFS_FEATURE_INCOMPAT_COMPRESS_LZO | \ BTRFS_FEATURE_INCOMPAT_BIG_METADATA | \ - BTRFS_FEATURE_INCOMPAT_MIXED_GROUPS) + BTRFS_FEATURE_INCOMPAT_MIXED_GROUPS | \ + BTRFS_FEATURE_INCOMPAT_EXTENDED_IREF) /* * A leaf is full of items. offset and size tell us where to find @@ -573,6 +583,13 @@ struct btrfs_inode_ref { /* name goes here */ } __attribute__ ((__packed__)); +struct btrfs_inode_extref { + __le64 parent_objectid; + __le64 index; + __le16 name_len; + __u8 name[0]; /* name goes here */ +} __attribute__ ((__packed__)); + struct btrfs_timespec { __le64 sec; __le32 nsec; @@ -866,6 +883,7 @@ struct btrfs_root { */ #define BTRFS_INODE_ITEM_KEY 1 #define BTRFS_INODE_REF_KEY 12 +#define BTRFS_INODE_EXTREF_KEY 13 #define BTRFS_XATTR_ITEM_KEY 24 #define BTRFS_ORPHAN_ITEM_KEY 48 @@ -1145,6 +1163,13 @@ BTRFS_SETGET_FUNCS(inode_ref_name_len, struct btrfs_inode_ref, name_len, 16); BTRFS_SETGET_STACK_FUNCS(stack_inode_ref_name_len, struct btrfs_inode_ref, name_len, 16); BTRFS_SETGET_FUNCS(inode_ref_index, struct btrfs_inode_ref, index, 64); +/* struct btrfs_inode_extref */ +BTRFS_SETGET_FUNCS(inode_extref_parent, struct btrfs_inode_extref, + parent_objectid, 64); +BTRFS_SETGET_FUNCS(inode_extref_name_len, struct btrfs_inode_extref, + name_len, 16); +BTRFS_SETGET_FUNCS(inode_extref_index, struct btrfs_inode_extref, index, 64); + /* struct btrfs_inode_item */ BTRFS_SETGET_FUNCS(inode_generation, struct btrfs_inode_item, generation, 64); BTRFS_SETGET_FUNCS(inode_sequence, struct btrfs_inode_item, sequence, 64); diff --git a/print-tree.c b/print-tree.c index fc134c0..6012df8 100644 --- a/print-tree.c +++ b/print-tree.c @@ -55,6 +55,42 @@ static int print_dir_item(struct extent_buffer *eb, struct btrfs_item *item, return 0; } +static int print_inode_extref_item(struct extent_buffer *eb, + struct btrfs_item *item, + struct btrfs_inode_extref *extref) +{ + u32 total; + u32 cur = 0; + u32 len; + u32 name_len = 0; + u64 index = 0; + u64 parent_objid; + char namebuf[BTRFS_NAME_LEN]; + + total = btrfs_item_size(eb, item); + + while (cur < total) { + index = btrfs_inode_extref_index(eb, extref); + name_len = btrfs_inode_extref_name_len(eb, extref); + parent_objid = btrfs_inode_extref_parent(eb, extref); + + len = (name_len <= sizeof(namebuf))? name_len: sizeof(namebuf); + + read_extent_buffer(eb, namebuf, (unsigned long)(extref->name), len); + + printf("\t\tinode extref index %llu parent %llu namelen %u " + "name: %.*s\n", + (unsigned long long)index, + (unsigned long long)parent_objid, + name_len, len, namebuf); + + len = sizeof(*extref) + name_len; + extref = (struct btrfs_inode_extref *)((char *)extref + len); + cur += len; + } + return 0; +} + static int print_inode_ref_item(struct extent_buffer *eb, struct btrfs_item *item, struct btrfs_inode_ref *ref) { @@ -285,6 +321,9 @@ static void print_key_type(u8 type) case BTRFS_INODE_REF_KEY: printf("INODE_REF"); break; + case BTRFS_INODE_EXTREF_KEY: + printf("INODE_EXTREF"); + break; case BTRFS_DIR_ITEM_KEY: printf("DIR_ITEM"); break; @@ -454,6 +493,7 @@ void btrfs_print_leaf(struct btrfs_root *root, struct extent_buffer *l) struct btrfs_extent_data_ref *dref; struct btrfs_shared_data_ref *sref; struct btrfs_inode_ref *iref; + struct btrfs_inode_extref *iref2; struct btrfs_dev_extent *dev_extent; struct btrfs_disk_key disk_key; struct btrfs_root_item root_item; @@ -492,6 +532,10 @@ void btrfs_print_leaf(struct btrfs_root *root, struct extent_buffer *l) iref = btrfs_item_ptr(l, i, struct btrfs_inode_ref); print_inode_ref_item(l, item, iref); break; + case BTRFS_INODE_EXTREF_KEY: + iref2 = btrfs_item_ptr(l, i, struct btrfs_inode_extref); + print_inode_extref_item(l, item, iref2); + break; case BTRFS_DIR_ITEM_KEY: case BTRFS_DIR_INDEX_KEY: case BTRFS_XATTR_ITEM_KEY: -- 1.7.7 -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Mark Fasheh
2012-Jun-07 23:00 UTC
[PATCH 2/3] [PATCH] btrfs-progs: add extended inode ref support to btrfsck
From: Mark Fasheh <mfasheh@suse.com> Add a function, process_inode_extref() to be called from process_one_leaf() when an item type of BTRFS_INODE_EXTREF_KEY is encountered. Similarly to process_inode_ref(), process_inode_extref() walks an extref and adds an inode_backref structure for each reference found within. I modified fsck''s inode_backref to get a type field (ref_type) which helps us internally track the exact type of backrefs found. Of course this field could be overwritten in case of disk corruption (duplicate refs) but duplicate refs themselves are tracked by btrfsck so that should not be an issue as btrfsck is written today. Signed-off-by: Mark Fasheh <mfasheh@suse.de> --- btrfsck.c | 53 +++++++++++++++++++++++++++++++++++++++++++++++++++-- 1 files changed, 51 insertions(+), 2 deletions(-) diff --git a/btrfsck.c b/btrfsck.c index 7aac736..b29c228 100644 --- a/btrfsck.c +++ b/btrfsck.c @@ -96,6 +96,7 @@ struct inode_backref { unsigned int found_inode_ref:1; unsigned int filetype:8; int errors; + unsigned int ref_type; u64 dir; u64 index; u16 namelen; @@ -466,12 +467,14 @@ static int add_inode_backref(struct cache_tree *inode_cache, backref->filetype = filetype; backref->found_dir_item = 1; - } else if (itemtype == BTRFS_INODE_REF_KEY) { + } else if ((itemtype == BTRFS_INODE_REF_KEY) || + (itemtype == BTRFS_INODE_EXTREF_KEY)) { if (backref->found_inode_ref) backref->errors |= REF_ERR_DUP_INODE_REF; if (backref->found_dir_index && backref->index != index) backref->errors |= REF_ERR_INDEX_UNMATCH; + backref->ref_type = itemtype; backref->index = index; backref->found_inode_ref = 1; } else { @@ -507,7 +510,7 @@ static int merge_inode_recs(struct inode_record *src, struct inode_record *dst, add_inode_backref(dst_cache, dst->ino, backref->dir, backref->index, backref->name, backref->namelen, 0, - BTRFS_INODE_REF_KEY, backref->errors); + backref->ref_type, backref->errors); } } @@ -851,6 +854,49 @@ static int process_inode_ref(struct extent_buffer *eb, return 0; } +static int process_inode_extref(struct extent_buffer *eb, + int slot, struct btrfs_key *key, + struct shared_node *active_node) +{ + u32 total; + u32 cur = 0; + u32 len; + u32 name_len; + u64 index; + u64 parent; + int error; + struct cache_tree *inode_cache; + struct btrfs_inode_extref *extref; + char namebuf[BTRFS_NAME_LEN]; + + inode_cache = &active_node->inode_cache; + + extref = btrfs_item_ptr(eb, slot, struct btrfs_inode_extref); + total = btrfs_item_size_nr(eb, slot); + while (cur < total) { + name_len = btrfs_inode_extref_name_len(eb, extref); + index = btrfs_inode_extref_index(eb, extref); + parent = btrfs_inode_extref_parent(eb, extref); + if (name_len <= BTRFS_NAME_LEN) { + len = name_len; + error = 0; + } else { + len = BTRFS_NAME_LEN; + error = REF_ERR_NAME_TOO_LONG; + } + read_extent_buffer(eb, namebuf, + (unsigned long)(extref + 1), len); + add_inode_backref(inode_cache, key->objectid, parent, + index, namebuf, len, 0, key->type, error); + + len = sizeof(*extref) + name_len; + extref = (struct btrfs_inode_extref *)((char *)extref + len); + cur += len; + } + return 0; + +} + static u64 count_csum_range(struct btrfs_root *root, u64 start, u64 len) { struct btrfs_key key; @@ -1033,6 +1079,9 @@ static int process_one_leaf(struct btrfs_root *root, struct extent_buffer *eb, case BTRFS_INODE_REF_KEY: ret = process_inode_ref(eb, i, &key, active_node); break; + case BTRFS_INODE_EXTREF_KEY: + ret = process_inode_extref(eb, i, &key, active_node); + break; case BTRFS_INODE_ITEM_KEY: ret = process_inode_item(eb, i, &key, active_node); break; -- 1.7.7 -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Mark Fasheh
2012-Jun-07 23:00 UTC
[PATCH 3/3] [PATCH] btrfs-progs: mkfs support for extended inode refs
From: Mark Fasheh <mfasheh@suse.com> This patch turns on the BTRFS_FEATURE_INCOMPAT_EXTENDED_IREF superblock flag when creating a new file system in mkfs, enabling extended inode refs. Signed-off-by: Mark Fasheh <mfasheh@suse.de> --- mkfs.c | 13 ++++++++----- 1 files changed, 8 insertions(+), 5 deletions(-) diff --git a/mkfs.c b/mkfs.c index c531ef2..2bf4ee9 100644 --- a/mkfs.c +++ b/mkfs.c @@ -1224,6 +1224,8 @@ int main(int ac, char **av) u64 size_of_data = 0; u64 source_dir_size = 0; char *pretty_buf; + struct btrfs_super_block *super; + u64 flags; while(1) { int c; @@ -1426,13 +1428,14 @@ raid_groups: ret = create_data_reloc_tree(trans, root); BUG_ON(ret); - if (mixed) { - struct btrfs_super_block *super = &root->fs_info->super_copy; - u64 flags = btrfs_super_incompat_flags(super); + super = &root->fs_info->super_copy; + flags = btrfs_super_incompat_flags(super); + flags |= BTRFS_FEATURE_INCOMPAT_EXTENDED_IREF; + if (mixed) flags |= BTRFS_FEATURE_INCOMPAT_MIXED_GROUPS; - btrfs_set_super_incompat_flags(super, flags); - } + + btrfs_set_super_incompat_flags(super, flags); printf("fs created label %s on %s\n\tnodesize %u leafsize %u " "sectorsize %u size %s\n", -- 1.7.7 -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html