thr3ads.net - Btrfs devel - [PATCH 0/3] btrfs: extended inode refs [Apr 2012]

If this information is useful, please help other people find it:
Share via:

Mark Fasheh

2012-Apr-05 20:09 UTC

[PATCH 0/3] btrfs: extended inode refs

Currently btrfs has a limitation on the maximum number of hard links an
inode can have. Specifically, links are stored in an array of ref
items:

struct btrfs_inode_ref {
	__le64 index;
	__le16 name_len;
	/* name goes here */
} __attribute__ ((__packed__));

The ref arrays are found via key triple:

(inode objectid, BTRFS_INODE_EXTREF_KEY, parent dir objectid)

Since items can not exceed the size of a leaf, the total number of links
that can be stored for a given inode / parent dir pair is limited to under
4k. This works fine for the most common case of few to only a handful of
links. Once the link count gets higher however, we begin to return EMLINK.


The following patches fix this situation by introducing a new ref item:

struct btrfs_inode_extref {
	__le64 parent_objectid;
	__le64 index;
	__le16 name_len;
	__u8   name[0];
	/* name goes here */
} __attribute__ ((__packed__));

Extended refs behave differently from ref arrays in several key areas.

Each extended refs is it''s own item so there is no ref array (and
therefore no limit on size).

As a result, we must use a different addressing scheme. Extended ref keys
look like:

(inode objectid, BTRFS_INODE_EXTREF_KEY, hash)

Where hash is defined as a function of the parent objectid and link name.

This effectively fixes the limitation, though we have a slightly less
efficient packing of link data. To keep the best of both worlds then, I
implemented the following behavior:

Extended refs don''t replace the existing ref array. An inode gets an
extended ref for a given link _only_ after the ref array has been filled.  So
the most common cases shouldn''t actually see any difference in
performance
or disk usage as they''ll never get to the point where we''re
using an
extended ref.

It''s important while reading the patches however that there''s
still the
possibility that we can have a set of operations that grow out an inode ref
array (adding some extended refs) and then remove only the refs in the
array.  I don''t really see this being common but it''s a case
we always have
to consider when coding these changes.

Right now there is a limitation for extrefs in that we''re not handling
the
possibility of a hash collision. There are two ways I see we can deal with
this:

We can use a 56-bit hash and keep a generation counter in the lower 8
bits of the offset field.  The cost would be an additional tree search
(between offset <hash>00 and <hash>FF) if we don''t find
exactly the name we
were looking for.

An alternative solution to dealing with collisions could be to emulate the
dir-item insertion code - specifically something like insert_with_overflow()
which will stuff multiple items under one key. I tend to prefer the idea of
simply including a generation in the key offset however since it maintains
the 1:1 relationship of keys to names which turns out to be much nicer to
code for in my honest opinion. Also none of the code which iterates the tree
looking for refs would have to change as the only difference is in the key
offset and not in the actual item structure.


Testing wise, the patches are in an intermediate state. I''ve debugged a
fair
bit but I''m certain there''s gremlins lurking in there.  The
basic namespace
operations work well enough (link, unlink, etc).  I''ve done light
testing of
my changes in backref.c by exercising BTRFS_IOC_INO_PATHS.  The changes in
tree-log.c need the most review and testing - I haven''t really figured
out a
great way to exercise the code in tree-log yet (suggestions would be
great!).


Finally, these patches are based off Linux v3.3.
	--Mark
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs"
in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Mark Fasheh

2012-Apr-05 20:09 UTC

head link

[PATCH 1/3] btrfs: extended inode refs

This patch adds basic support for extended inode refs. This includes support
for link and unlink of the refs, which basically gets us support for rename
as well.

Inode creation does not need changing - extended refs are only added after
the ref array is full.

Signed-off-by: Mark Fasheh <mfasheh@suse.de>
---
 fs/btrfs/ctree.h      |   50 +++++++++--
 fs/btrfs/inode-item.c |  244 +++++++++++++++++++++++++++++++++++++++++++++++--
 fs/btrfs/inode.c      |   20 ++--
 3 files changed, 288 insertions(+), 26 deletions(-)

diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
index 80b6486..5fc77ee 100644
--- a/fs/btrfs/ctree.h
+++ b/fs/btrfs/ctree.h
@@ -143,6 +143,13 @@ struct btrfs_ordered_sum;
  */
 #define BTRFS_NAME_LEN 255
 
+/*
+ * Theoretical limit is larger, but we keep this down to a sane
+ * value. That should limit greatly the possibility of collisions on
+ * inode ref items.
+ */
+#define BTRFS_LINK_MAX 65535U
+
 /* 32 bytes in various csum fields */
 #define BTRFS_CSUM_SIZE 32
 
@@ -462,13 +469,16 @@ struct btrfs_super_block {
 #define BTRFS_FEATURE_INCOMPAT_MIXED_GROUPS	(1ULL << 2)
 #define BTRFS_FEATURE_INCOMPAT_COMPRESS_LZO	(1ULL << 3)
 
+#define BTRFS_FEATURE_INCOMPAT_EXTENDED_IREF	(1ULL << 6)
+
 #define BTRFS_FEATURE_COMPAT_SUPP		0ULL
 #define BTRFS_FEATURE_COMPAT_RO_SUPP		0ULL
 #define BTRFS_FEATURE_INCOMPAT_SUPP			\
 	(BTRFS_FEATURE_INCOMPAT_MIXED_BACKREF |		\
 	 BTRFS_FEATURE_INCOMPAT_DEFAULT_SUBVOL |	\
 	 BTRFS_FEATURE_INCOMPAT_MIXED_GROUPS |		\
-	 BTRFS_FEATURE_INCOMPAT_COMPRESS_LZO)
+	 BTRFS_FEATURE_INCOMPAT_COMPRESS_LZO |		\
+	 BTRFS_FEATURE_INCOMPAT_EXTENDED_IREF)
 
 /*
  * A leaf is full of items. offset and size tell us where to find
@@ -615,6 +625,14 @@ struct btrfs_inode_ref {
 	/* name goes here */
 } __attribute__ ((__packed__));
 
+struct btrfs_inode_extref {
+	__le64 parent_objectid;
+	__le64 index;
+	__le16 name_len;
+	__u8   name[0];
+	/* name goes here */
+} __attribute__ ((__packed__));
+
 struct btrfs_timespec {
 	__le64 sec;
 	__le32 nsec;
@@ -1400,6 +1418,7 @@ struct btrfs_ioctl_defrag_range_args {
  */
 #define BTRFS_INODE_ITEM_KEY		1
 #define BTRFS_INODE_REF_KEY		12
+#define BTRFS_INODE_EXTREF_KEY		13
 #define BTRFS_XATTR_ITEM_KEY		24
 #define BTRFS_ORPHAN_ITEM_KEY		48
 /* reserve 2-15 close to the inode for later flexibility */
@@ -1701,6 +1720,13 @@ BTRFS_SETGET_STACK_FUNCS(block_group_flags,
 BTRFS_SETGET_FUNCS(inode_ref_name_len, struct btrfs_inode_ref, name_len, 16);
 BTRFS_SETGET_FUNCS(inode_ref_index, struct btrfs_inode_ref, index, 64);
 
+/* struct btrfs_inode_extref */
+BTRFS_SETGET_FUNCS(inode_extref_parent, struct btrfs_inode_extref,
+		   parent_objectid, 64);
+BTRFS_SETGET_FUNCS(inode_extref_name_len, struct btrfs_inode_extref,
+		   name_len, 16);
+BTRFS_SETGET_FUNCS(inode_extref_index, struct btrfs_inode_extref, index, 64);
+
 /* struct btrfs_inode_item */
 BTRFS_SETGET_FUNCS(inode_generation, struct btrfs_inode_item, generation, 64);
 BTRFS_SETGET_FUNCS(inode_sequence, struct btrfs_inode_item, sequence, 64);
@@ -2791,12 +2817,12 @@ int btrfs_del_inode_ref(struct btrfs_trans_handle
*trans,
 			   struct btrfs_root *root,
 			   const char *name, int name_len,
 			   u64 inode_objectid, u64 ref_objectid, u64 *index);
-struct btrfs_inode_ref *
-btrfs_lookup_inode_ref(struct btrfs_trans_handle *trans,
-			struct btrfs_root *root,
-			struct btrfs_path *path,
-			const char *name, int name_len,
-			u64 inode_objectid, u64 ref_objectid, int mod);
+int btrfs_get_inode_ref_index(struct btrfs_trans_handle *trans,
+			      struct btrfs_root *root,
+			      struct btrfs_path *path,
+			      const char *name, int name_len,
+			      u64 inode_objectid, u64 ref_objectid, int mod,
+			      u64 *ret_index);
 int btrfs_insert_empty_inode(struct btrfs_trans_handle *trans,
 			     struct btrfs_root *root,
 			     struct btrfs_path *path, u64 objectid);
@@ -2804,6 +2830,16 @@ int btrfs_lookup_inode(struct btrfs_trans_handle *trans,
struct btrfs_root
 		       *root, struct btrfs_path *path,
 		       struct btrfs_key *location, int mod);
 
+struct btrfs_inode_extref *
+btrfs_lookup_inode_extref(struct btrfs_trans_handle *trans,
+			  struct btrfs_root *root,
+			  struct btrfs_path *path,
+			  const char *name, int name_len,
+			  u64 inode_objectid, u64 ref_objectid, int ins_len,
+			  int cow);
+
+u64 btrfs_extref_key_off(u64 parent_objectid, const char *name, int name_len);
+
 /* file-item.c */
 int btrfs_del_csums(struct btrfs_trans_handle *trans,
 		    struct btrfs_root *root, u64 bytenr, u64 len);
diff --git a/fs/btrfs/inode-item.c b/fs/btrfs/inode-item.c
index baa74f3..cc5e854 100644
--- a/fs/btrfs/inode-item.c
+++ b/fs/btrfs/inode-item.c
@@ -16,6 +16,7 @@
  * Boston, MA 021110-1307, USA.
  */
 
+#include <linux/crc32c.h>
 #include "ctree.h"
 #include "disk-io.h"
 #include "transaction.h"
@@ -49,18 +50,56 @@ static int find_name_in_backref(struct btrfs_path *path,
const char *name,
 	return 0;
 }
 
-struct btrfs_inode_ref *
+static int find_name_in_ext_backref(struct btrfs_path *path, const char *name,
+				   int name_len, struct btrfs_inode_extref **ref_ret)
+{
+	struct extent_buffer *leaf;
+	struct btrfs_inode_extref *ref;
+	unsigned long ptr;
+	unsigned long name_ptr;
+	u32 item_size;
+
+	leaf = path->nodes[0];
+	item_size = btrfs_item_size_nr(leaf, path->slots[0]);
+	ptr = btrfs_item_ptr_offset(leaf, path->slots[0]);
+	ref = (struct btrfs_inode_extref *)ptr;
+
+	/*
+	 * There isn''t actually anything to "search" in an extended
+	 * backref - one name is directly attached to each
+	 * item. Instead what we''re really doing here is some sort of
+	 * sanity check - does the name exist where it should have
+	 * been found. If all is well, we''ll return success and the
+	 * inode ref object.
+	 */
+	name_ptr = (unsigned long)(&ref->name);
+	if (btrfs_inode_extref_name_len(leaf, ref) == name_len
+	    && (memcmp_extent_buffer(leaf, name, name_ptr, name_len) == 0)) {
+		*ref_ret = ref;
+		return 1;
+	}
+	return 0;
+}
+
+/*
+ * Figure the key offset of an extended inode ref
+ */
+u64 btrfs_extref_key_off(u64 parent_objectid, const char *name, int name_len)
+{
+	return (u64) crc32c(parent_objectid, name, name_len);
+}
+
+static struct btrfs_inode_ref *
 btrfs_lookup_inode_ref(struct btrfs_trans_handle *trans,
-			struct btrfs_root *root,
-			struct btrfs_path *path,
-			const char *name, int name_len,
-			u64 inode_objectid, u64 ref_objectid, int mod)
+		       struct btrfs_root *root,
+		       struct btrfs_path *path,
+		       const char *name, int name_len,
+		       u64 inode_objectid, u64 ref_objectid, int ins_len,
+		       int cow)
 {
+	int ret;
 	struct btrfs_key key;
 	struct btrfs_inode_ref *ref;
-	int ins_len = mod < 0 ? -1 : 0;
-	int cow = mod != 0;
-	int ret;
 
 	key.objectid = inode_objectid;
 	key.type = BTRFS_INODE_REF_KEY;
@@ -76,20 +115,137 @@ btrfs_lookup_inode_ref(struct btrfs_trans_handle *trans,
 	return ref;
 }
 
-int btrfs_del_inode_ref(struct btrfs_trans_handle *trans,
+struct btrfs_inode_extref *
+btrfs_lookup_inode_extref(struct btrfs_trans_handle *trans,
+			  struct btrfs_root *root,
+			  struct btrfs_path *path,
+			  const char *name, int name_len,
+			  u64 inode_objectid, u64 ref_objectid, int ins_len,
+			  int cow)
+{
+	int ret;
+	struct btrfs_key key;
+	struct btrfs_inode_extref *ref;
+
+	key.objectid = inode_objectid;
+	key.type = BTRFS_INODE_EXTREF_KEY;
+	key.offset = btrfs_extref_key_off(ref_objectid, name, name_len);
+
+	ret = btrfs_search_slot(trans, root, &key, path, ins_len, cow);
+	if (ret < 0)
+		return ERR_PTR(ret);
+	if (ret > 0)
+		return NULL;
+	if (!find_name_in_ext_backref(path, name, name_len, &ref))
+		return NULL;
+	return ref;
+}
+
+int btrfs_get_inode_ref_index(struct btrfs_trans_handle *trans,
+			      struct btrfs_root *root,
+			      struct btrfs_path *path,
+			      const char *name, int name_len,
+			      u64 inode_objectid, u64 ref_objectid, int mod,
+			      u64 *ret_index)
+{
+	struct btrfs_inode_ref *ref1;
+	struct btrfs_inode_extref *ref2;
+	int ins_len = mod < 0 ? -1 : 0;
+	int cow = mod != 0;
+
+	ref1 = btrfs_lookup_inode_ref(trans, root, path, name, name_len,
+				      inode_objectid, ref_objectid, ins_len,
+				      cow);
+	if (IS_ERR(ref1))
+		return PTR_ERR(ref1);
+
+	if (ref1 != NULL) {
+		*ret_index = btrfs_inode_ref_index(path->nodes[0], ref1);
+		return 0;
+	}
+
+	btrfs_release_path(path);
+
+	ref2 = btrfs_lookup_inode_extref(trans, root, path, name,
+					 name_len, inode_objectid,
+					 ref_objectid, ins_len, cow);
+	if (IS_ERR(ref2))
+		return PTR_ERR(ref2);
+
+	if (ref2) {
+		*ret_index = btrfs_inode_extref_index(path->nodes[0], ref2);
+		return 0;
+	}
+
+	return -ENOENT;
+}
+
+int btrfs_del_inode_extref(struct btrfs_trans_handle *trans,
 			   struct btrfs_root *root,
 			   const char *name, int name_len,
 			   u64 inode_objectid, u64 ref_objectid, u64 *index)
 {
 	struct btrfs_path *path;
 	struct btrfs_key key;
+	struct btrfs_inode_extref *ref2;
+	struct extent_buffer *leaf;
+	int ret;
+
+	key.objectid = inode_objectid;
+	btrfs_set_key_type(&key, BTRFS_INODE_EXTREF_KEY);
+	key.offset = btrfs_extref_key_off(ref_objectid, name, name_len);
+
+	path = btrfs_alloc_path();
+	if (!path)
+		return -ENOMEM;
+
+	path->leave_spinning = 1;
+
+	ret = btrfs_search_slot(trans, root, &key, path, -1, 1);
+	if (ret > 0) {
+		ret = -ENOENT;
+		goto out;
+	} else if (ret < 0) {
+		goto out;
+	}
+
+	/*
+	 * Sanity check - did we find the right item for this name?
+	 * This should always succeed so error here will make the FS
+	 * readonly.
+	 */
+	if (!find_name_in_ext_backref(path, name, name_len, &ref2)) {
+		btrfs_std_error(root->fs_info, -ENOENT);
+		ret = -EROFS;
+		goto out;
+	}
+
+	leaf = path->nodes[0];
+	if (index)
+		*index = btrfs_inode_extref_index(leaf, ref2);
+
+	ret = btrfs_del_item(trans, root, path);
+
+out:
+	btrfs_free_path(path);
+
+	return ret;
+}
+
+int btrfs_del_inode_ref(struct btrfs_trans_handle *trans,
+			struct btrfs_root *root,
+			const char *name, int name_len,
+			u64 inode_objectid, u64 ref_objectid, u64 *index)
+{
+	struct btrfs_path *path;
+	struct btrfs_key key;
 	struct btrfs_inode_ref *ref;
 	struct extent_buffer *leaf;
 	unsigned long ptr;
 	unsigned long item_start;
 	u32 item_size;
 	u32 sub_item_len;
-	int ret;
+	int ret, search_ext_refs = 0;
 	int del_len = name_len + sizeof(*ref);
 
 	key.objectid = inode_objectid;
@@ -105,12 +261,14 @@ int btrfs_del_inode_ref(struct btrfs_trans_handle *trans,
 	ret = btrfs_search_slot(trans, root, &key, path, -1, 1);
 	if (ret > 0) {
 		ret = -ENOENT;
+		search_ext_refs = 1;
 		goto out;
 	} else if (ret < 0) {
 		goto out;
 	}
 	if (!find_name_in_backref(path, name, name_len, &ref)) {
 		ret = -ENOENT;
+		search_ext_refs = 1;
 		goto out;
 	}
 	leaf = path->nodes[0];
@@ -132,6 +290,59 @@ int btrfs_del_inode_ref(struct btrfs_trans_handle *trans,
 				  item_size - sub_item_len, 1);
 out:
 	btrfs_free_path(path);
+
+	if (search_ext_refs) {
+		/* 
+		 * No refs were found, or we could not find the
+		 * name in our ref array. Find and remove the extended
+		 * inode ref then.
+		 */
+		return btrfs_del_inode_extref(trans, root, name, name_len,
+					      inode_objectid, ref_objectid, index);
+	}
+
+	return ret;
+}
+
+static int btrfs_insert_inode_extref(struct btrfs_trans_handle *trans,
+				     struct btrfs_root *root,
+				     const char *name, int name_len,
+				     u64 inode_objectid, u64 ref_objectid, u64 index)
+{
+	struct btrfs_path *path;
+	struct btrfs_key key;
+	struct btrfs_inode_extref *ref;
+	unsigned long ptr;
+	int ret;
+	int ins_len = name_len + sizeof(*ref);
+
+	key.objectid = inode_objectid;
+	key.type = BTRFS_INODE_EXTREF_KEY;
+	key.offset = btrfs_extref_key_off(ref_objectid, name, name_len);
+
+	path = btrfs_alloc_path();
+	if (!path)
+		return -ENOMEM;
+
+	path->leave_spinning = 1;
+	ret = btrfs_insert_empty_item(trans, root, path, &key,
+				      ins_len);
+	if (ret < 0)
+		goto out;
+
+	ref = btrfs_item_ptr(path->nodes[0], path->slots[0],
+			     struct btrfs_inode_extref);
+
+	btrfs_set_inode_extref_name_len(path->nodes[0], ref, name_len);
+	btrfs_set_inode_extref_index(path->nodes[0], ref, index);
+	btrfs_set_inode_extref_parent(path->nodes[0], ref, ref_objectid);
+
+	ptr = (unsigned long)&ref->name;
+	write_extent_buffer(path->nodes[0], name, ptr, name_len);
+	btrfs_mark_buffer_dirty(path->nodes[0]);
+
+out:
+	btrfs_free_path(path);
 	return ret;
 }
 
@@ -189,6 +400,19 @@ int btrfs_insert_inode_ref(struct btrfs_trans_handle
*trans,
 
 out:
 	btrfs_free_path(path);
+
+	if (ret == -EMLINK) {
+		struct btrfs_super_block *disk_super = root->fs_info->super_copy;
+		/* We ran out of space in the ref array. Need to
+		 * add an extended ref. */
+		if (btrfs_super_incompat_flags(disk_super)
+		    & BTRFS_FEATURE_INCOMPAT_EXTENDED_IREF)
+			ret = btrfs_insert_inode_extref(trans, root, name,
+							name_len,
+							inode_objectid,
+							ref_objectid, index);
+	}
+
 	return ret;
 }
 
diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index 892b347..0ca525d 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -2689,7 +2689,6 @@ static struct btrfs_trans_handle
*__unlink_start_trans(struct inode *dir,
 	struct btrfs_trans_handle *trans;
 	struct btrfs_root *root = BTRFS_I(dir)->root;
 	struct btrfs_path *path;
-	struct btrfs_inode_ref *ref;
 	struct btrfs_dir_item *di;
 	struct inode *inode = dentry->d_inode;
 	u64 index;
@@ -2803,17 +2802,16 @@ static struct btrfs_trans_handle
*__unlink_start_trans(struct inode *dir,
 	}
 	btrfs_release_path(path);
 
-	ref = btrfs_lookup_inode_ref(trans, root, path,
-				dentry->d_name.name, dentry->d_name.len,
-				ino, dir_ino, 0);
-	if (IS_ERR(ref)) {
-		err = PTR_ERR(ref);
+	ret = btrfs_get_inode_ref_index(trans, root, path, dentry->d_name.name,
+					dentry->d_name.len, ino, dir_ino, 0, &index);
+	if (ret) {
+		err = ret;
 		goto out;
 	}
-	BUG_ON(!ref);
+
 	if (check_path_shared(root, path))
 		goto out;
-	index = btrfs_inode_ref_index(path->nodes[0], ref);
+
 	btrfs_release_path(path);
 
 	/*
@@ -4484,6 +4482,10 @@ static struct inode *btrfs_new_inode(struct
btrfs_trans_handle *trans,
 	btrfs_set_key_type(&key[0], BTRFS_INODE_ITEM_KEY);
 	key[0].offset = 0;
 
+	/*
+	 * Start new inodes with a V1 ref. This is slightly more
+	 * efficient for small numbers of hard links.
+	 */
 	key[1].objectid = objectid;
 	btrfs_set_key_type(&key[1], BTRFS_INODE_REF_KEY);
 	key[1].offset = ref_objectid;
@@ -4777,7 +4779,7 @@ static int btrfs_link(struct dentry *old_dentry, struct
inode *dir,
 	if (root->objectid != BTRFS_I(inode)->root->objectid)
 		return -EXDEV;
 
-	if (inode->i_nlink == ~0U)
+	if (inode->i_nlink >= BTRFS_LINK_MAX)
 		return -EMLINK;
 
 	err = btrfs_set_inode_index(dir, &index);
-- 
1.7.7

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs"
in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Mark Fasheh

2012-Apr-05 20:09 UTC

head link

[PATCH 2/3] btrfs: extended inode refs

Teach tree-log.c about extended inode refs. In particular, we have to adjust
the behavior of inode ref replay as well as log tree recovery to account for
the existence of extended refs.

Signed-off-by: Mark Fasheh <mfasheh@suse.de>
---
 fs/btrfs/tree-log.c |  320 +++++++++++++++++++++++++++++++++++++++++---------
 fs/btrfs/tree-log.h |    4 +
 2 files changed, 266 insertions(+), 58 deletions(-)

diff --git a/fs/btrfs/tree-log.c b/fs/btrfs/tree-log.c
index 966cc74..d69b07a 100644
--- a/fs/btrfs/tree-log.c
+++ b/fs/btrfs/tree-log.c
@@ -23,6 +23,7 @@
 #include "disk-io.h"
 #include "locking.h"
 #include "print-tree.h"
+#include "backref.h"
 #include "compat.h"
 #include "tree-log.h"
 
@@ -748,6 +749,7 @@ static noinline int backref_in_log(struct btrfs_root *log,
 {
 	struct btrfs_path *path;
 	struct btrfs_inode_ref *ref;
+	struct btrfs_inode_extref *extref;
 	unsigned long ptr;
 	unsigned long ptr_end;
 	unsigned long name_ptr;
@@ -764,8 +766,24 @@ static noinline int backref_in_log(struct btrfs_root *log,
 	if (ret != 0)
 		goto out;
 
-	item_size = btrfs_item_size_nr(path->nodes[0], path->slots[0]);
 	ptr = btrfs_item_ptr_offset(path->nodes[0], path->slots[0]);
+
+	if (key->type == BTRFS_INODE_EXTREF_KEY) {
+		extref = (struct btrfs_inode_extref *)ptr;
+
+		found_name_len = btrfs_inode_extref_name_len(path->nodes[0],
+							     extref);
+		if (found_name_len == namelen) {
+			name_ptr = (unsigned long)&extref->name;
+			ret = memcmp_extent_buffer(path->nodes[0], name,
+						   name_ptr, namelen);
+			if (ret == 0)
+				match = 1;
+		}
+		goto out;
+	}
+
+	item_size = btrfs_item_size_nr(path->nodes[0], path->slots[0]);
 	ptr_end = ptr + item_size;
 	while (ptr < ptr_end) {
 		ref = (struct btrfs_inode_ref *)ptr;
@@ -786,7 +804,6 @@ out:
 	return match;
 }
 
-
 /*
  * replay one inode back reference item found in the log tree.
  * eb, slot and key refer to the buffer and key found in the log tree.
@@ -801,15 +818,20 @@ static noinline int add_inode_ref(struct
btrfs_trans_handle *trans,
 				  struct btrfs_key *key)
 {
 	struct btrfs_inode_ref *ref;
+	struct btrfs_inode_extref *extref;
 	struct btrfs_dir_item *di;
+	struct btrfs_key search_key;
 	struct inode *dir;
 	struct inode *inode;
 	unsigned long ref_ptr;
 	unsigned long ref_end;
-	char *name;
-	int namelen;
+	char *name, *victim_name;
+	int namelen, victim_name_len;
 	int ret;
 	int search_done = 0;
+	int log_ref_ver = 0;
+	u64 parent_objectid, inode_objectid, ref_index;
+	struct extent_buffer *leaf;
 
 	/*
 	 * it is possible that we didn''t log all the parent directories
@@ -817,32 +839,56 @@ static noinline int add_inode_ref(struct
btrfs_trans_handle *trans,
 	 * copy the back ref in.  The link count fixup code will take
 	 * care of the rest
 	 */
-	dir = read_one_inode(root, key->offset);
+
+	if (key->type == BTRFS_INODE_EXTREF_KEY) {
+		log_ref_ver = 1;
+
+		ref_ptr = btrfs_item_ptr_offset(eb, slot);
+
+		/* So that we don''t loop back looking for old style log refs. */
+		ref_end = ref_ptr;
+
+		extref = (struct btrfs_inode_extref *) btrfs_item_ptr_offset(eb, slot);
+		namelen = btrfs_inode_extref_name_len(eb, extref);
+		name = kmalloc(namelen, GFP_NOFS);
+
+		read_extent_buffer(eb, name, (unsigned long)&extref->name,
+				   namelen);
+
+		ref_index = btrfs_inode_extref_index(eb, extref);
+		parent_objectid = btrfs_inode_extref_parent(eb, extref);
+	} else {
+		parent_objectid = key->offset;
+
+		ref_ptr = btrfs_item_ptr_offset(eb, slot);
+		ref_end = ref_ptr + btrfs_item_size_nr(eb, slot);
+
+		ref = (struct btrfs_inode_ref *)ref_ptr;
+		namelen = btrfs_inode_ref_name_len(eb, ref);
+		name = kmalloc(namelen, GFP_NOFS);
+		BUG_ON(!name);
+
+		read_extent_buffer(eb, name, (unsigned long)(ref + 1), namelen);
+
+		ref_index = btrfs_inode_ref_index(eb, ref);
+	}
+
+	inode_objectid = key->objectid;
+
+	dir = read_one_inode(root, parent_objectid);
 	if (!dir)
 		return -ENOENT;
 
-	inode = read_one_inode(root, key->objectid);
+	inode = read_one_inode(root, inode_objectid);
 	if (!inode) {
 		iput(dir);
 		return -EIO;
 	}
 
-	ref_ptr = btrfs_item_ptr_offset(eb, slot);
-	ref_end = ref_ptr + btrfs_item_size_nr(eb, slot);
-
 again:
-	ref = (struct btrfs_inode_ref *)ref_ptr;
-
-	namelen = btrfs_inode_ref_name_len(eb, ref);
-	name = kmalloc(namelen, GFP_NOFS);
-	BUG_ON(!name);
-
-	read_extent_buffer(eb, name, (unsigned long)(ref + 1), namelen);
-
 	/* if we already have a perfect match, we''re done */
 	if (inode_in_dir(root, path, btrfs_ino(dir), btrfs_ino(inode),
-			 btrfs_inode_ref_index(eb, ref),
-			 name, namelen)) {
+			 ref_index, name, namelen)) {
 		goto out;
 	}
 
@@ -857,19 +903,23 @@ again:
 	if (search_done)
 		goto insert;
 
-	ret = btrfs_search_slot(NULL, root, key, path, 0, 0);
+	/* Search old style refs */
+	search_key.objectid = inode_objectid;
+	search_key.type = BTRFS_INODE_REF_KEY;
+	search_key.offset = parent_objectid;
+
+	ret = btrfs_search_slot(NULL, root, &search_key, path, 0, 0);
 	if (ret == 0) {
-		char *victim_name;
-		int victim_name_len;
 		struct btrfs_inode_ref *victim_ref;
 		unsigned long ptr;
 		unsigned long ptr_end;
-		struct extent_buffer *leaf = path->nodes[0];
+
+		leaf = path->nodes[0];
 
 		/* are we trying to overwrite a back ref for the root directory
 		 * if so, just jump out, we''re done
 		 */
-		if (key->objectid == key->offset)
+		if (search_key.objectid == search_key.offset)
 			goto out_nowrite;
 
 		/* check all the names in this back reference to see
@@ -889,7 +939,7 @@ again:
 					   (unsigned long)(victim_ref + 1),
 					   victim_name_len);
 
-			if (!backref_in_log(log, key, victim_name,
+			if (!backref_in_log(log, &search_key, victim_name,
 					    victim_name_len)) {
 				btrfs_inc_nlink(inode);
 				btrfs_release_path(path);
@@ -902,19 +952,44 @@ again:
 			ptr = (unsigned long)(victim_ref + 1) + victim_name_len;
 		}
 		BUG_ON(ret);
-
-		/*
-		 * NOTE: we have searched root tree and checked the
-		 * coresponding ref, it does not need to check again.
-		 */
-		search_done = 1;
 	}
 	btrfs_release_path(path);
 
+	/* Same search but for extended refs */
+	extref = btrfs_lookup_inode_extref(NULL, root, path, name, namelen,
+					   inode_objectid, parent_objectid, 0,
+					   0);
+	if (extref && !IS_ERR(extref)) {
+		victim_name_len = btrfs_inode_extref_name_len(eb, extref);
+		victim_name = kmalloc(namelen, GFP_NOFS);
+		leaf = path->nodes[0];
+		read_extent_buffer(eb, name, (unsigned long)&extref->name, namelen);
+
+		search_key.objectid = inode_objectid;
+		search_key.type = BTRFS_INODE_EXTREF_KEY;
+		search_key.offset = btrfs_extref_key_off(parent_objectid, name, namelen);
+		if (!backref_in_log(log, &search_key, victim_name,
+				    victim_name_len)) {
+			btrfs_inc_nlink(inode);
+			btrfs_release_path(path);
+
+			ret = btrfs_unlink_inode(trans, root, dir,
+						 inode, victim_name,
+						 victim_name_len);
+		}
+		kfree(victim_name);
+		BUG_ON(ret);
+	}
+
+	/*
+	 * NOTE: we have searched root tree and checked the
+	 * coresponding refs, it does not need to be checked again.
+	 */
+	search_done = 1;
+
 	/* look for a conflicting sequence number */
 	di = btrfs_lookup_dir_index_item(trans, root, path, btrfs_ino(dir),
-					 btrfs_inode_ref_index(eb, ref),
-					 name, namelen, 0);
+					 ref_index, name, namelen, 0);
 	if (di && !IS_ERR(di)) {
 		ret = drop_one_dir_item(trans, root, path, dir, di);
 		BUG_ON(ret);
@@ -932,17 +1007,25 @@ again:
 
 insert:
 	/* insert our name */
-	ret = btrfs_add_link(trans, dir, inode, name, namelen, 0,
-			     btrfs_inode_ref_index(eb, ref));
+	ret = btrfs_add_link(trans, dir, inode, name, namelen, 0, ref_index);
 	BUG_ON(ret);
 
 	btrfs_update_inode(trans, root, inode);
 
 out:
-	ref_ptr = (unsigned long)(ref + 1) + namelen;
+	ref_ptr = (unsigned long)(ref_ptr + sizeof(struct btrfs_inode_ref)) + namelen;
 	kfree(name);
-	if (ref_ptr < ref_end)
+	if (ref_ptr < ref_end) {
+		ref = (struct btrfs_inode_ref *)ref_ptr;
+		namelen = btrfs_inode_ref_name_len(eb, ref);
+		name = kmalloc(namelen, GFP_NOFS);
+		BUG_ON(!name);
+
+		read_extent_buffer(eb, name, (unsigned long)(ref + 1), namelen);
+
+		ref_index = btrfs_inode_ref_index(eb, ref);
 		goto again;
+	}
 
 	/* finally write the back reference in the inode */
 	ret = overwrite_item(trans, root, path, eb, slot, key);
@@ -965,25 +1048,103 @@ static int insert_orphan_item(struct btrfs_trans_handle
*trans,
 	return ret;
 }
 
+int btrfs_find_one_extref(struct btrfs_root *root, u64 inode_objectid,
+			  u64 start_off, struct btrfs_path *path,
+			  struct btrfs_inode_extref **ret_ref, u64 *found_off)
+{
+	int ret, slot;
+	struct btrfs_key key, found_key;
+	struct btrfs_inode_extref *ref;
+	struct extent_buffer *leaf;
+	struct btrfs_item *item;
+	unsigned long ptr;
 
-/*
- * There are a few corners where the link count of the file can''t
- * be properly maintained during replay.  So, instead of adding
- * lots of complexity to the log code, we just scan the backrefs
- * for any file that has been through replay.
- *
- * The scan will update the link count on the inode to reflect the
- * number of back refs found.  If it goes down to zero, the iput
- * will free the inode.
- */
-static noinline int fixup_inode_link_count(struct btrfs_trans_handle *trans,
-					   struct btrfs_root *root,
-					   struct inode *inode)
+	key.objectid = inode_objectid;
+	btrfs_set_key_type(&key, BTRFS_INODE_EXTREF_KEY);
+	key.offset = start_off;
+
+	ret = btrfs_search_slot(NULL, root, &key, path, 0, 0);
+	if (ret < 0)
+		goto out;
+
+	while (1) {
+		leaf = path->nodes[0];
+		slot = path->slots[0];
+		if (slot >= btrfs_header_nritems(leaf)) {
+			/*
+			 * If the item at offset is not found,
+			 * btrfs_search_slot will point us to the slot
+			 * where it should be inserted. In our case
+			 * that will be the slot directly before the
+			 * next INODE_REF_KEY_V2 item. In the case
+			 * that we''re pointing to the last slot in a
+			 * leaf, we must move one leaf over.
+			 */
+			ret = btrfs_next_leaf(root, path);
+			if (ret) {
+				if (ret >= 1)
+					ret = -ENOENT;
+				break;
+			}
+			continue;
+		}
+
+		item = btrfs_item_nr(leaf, slot);
+		btrfs_item_key_to_cpu(leaf, &found_key, slot);
+
+		/*
+		 * Check that we''re still looking at an extended ref key for
+		 * this particular objectid. If we have different
+		 * objectid or type then there are no more to be found
+		 * in the tree and we can exit.
+		 */
+		ret = -ENOENT;
+		if (found_key.objectid != inode_objectid)
+			break;
+		if (btrfs_key_type(&found_key) != BTRFS_INODE_EXTREF_KEY)
+			break;
+
+		ret = 0;
+		ptr = btrfs_item_ptr_offset(leaf, path->slots[0]);
+		ref = (struct btrfs_inode_extref *)ptr;
+		*ret_ref = ref;
+		if (found_off)
+			*found_off = found_key.offset + 1;
+		break;
+	}
+
+out:
+	return ret;
+}
+
+static int count_inode_extrefs(struct btrfs_root *root,
+			       struct inode *inode, struct btrfs_path *path)
+{
+	int ret;
+	unsigned int nlink = 0;
+	u64 inode_objectid = btrfs_ino(inode);
+	u64 offset = 0;
+	struct btrfs_inode_extref *ref;
+
+	while (1) {
+		ret = btrfs_find_one_extref(root, inode_objectid, offset, path,
+					    &ref, &offset);
+		if (ret)
+			break;
+
+		nlink++;
+		offset++;
+	}
+
+	return nlink;
+}
+
+static int count_inode_refs(struct btrfs_root *root,
+			       struct inode *inode, struct btrfs_path *path)
 {
-	struct btrfs_path *path;
 	int ret;
 	struct btrfs_key key;
-	u64 nlink = 0;
+	unsigned int nlink = 0;
 	unsigned long ptr;
 	unsigned long ptr_end;
 	int name_len;
@@ -993,10 +1154,6 @@ static noinline int fixup_inode_link_count(struct
btrfs_trans_handle *trans,
 	key.type = BTRFS_INODE_REF_KEY;
 	key.offset = (u64)-1;
 
-	path = btrfs_alloc_path();
-	if (!path)
-		return -ENOMEM;
-
 	while (1) {
 		ret = btrfs_search_slot(NULL, root, &key, path, 0, 0);
 		if (ret < 0)
@@ -1030,6 +1187,48 @@ static noinline int fixup_inode_link_count(struct
btrfs_trans_handle *trans,
 		btrfs_release_path(path);
 	}
 	btrfs_release_path(path);
+	btrfs_free_path(path);
+
+	return nlink;
+}
+
+/*
+ * There are a few corners where the link count of the file can''t
+ * be properly maintained during replay.  So, instead of adding
+ * lots of complexity to the log code, we just scan the backrefs
+ * for any file that has been through replay.
+ *
+ * The scan will update the link count on the inode to reflect the
+ * number of back refs found.  If it goes down to zero, the iput
+ * will free the inode.
+ */
+static noinline int fixup_inode_link_count(struct btrfs_trans_handle *trans,
+					   struct btrfs_root *root,
+					   struct inode *inode)
+{
+	struct btrfs_path *path;
+	int ret;
+	u64 nlink = 0;
+	u64 ino = btrfs_ino(inode);
+
+	path = btrfs_alloc_path();
+	if (!path)
+		return -ENOMEM;
+
+	ret = count_inode_refs(root, inode, path);
+	btrfs_release_path(path);
+	if (ret < 0)
+		goto out;
+
+	nlink = ret;
+
+	ret = count_inode_extrefs(root, inode, path);
+	btrfs_release_path(path);
+	if (ret < 0)
+		goto out;
+
+	nlink += ret;
+
 	if (nlink != inode->i_nlink) {
 		set_nlink(inode, nlink);
 		btrfs_update_inode(trans, root, inode);
@@ -1045,9 +1244,10 @@ static noinline int fixup_inode_link_count(struct
btrfs_trans_handle *trans,
 		ret = insert_orphan_item(trans, root, ino);
 		BUG_ON(ret);
 	}
-	btrfs_free_path(path);
 
-	return 0;
+out:
+	btrfs_free_path(path);
+	return ret;
 }
 
 static noinline int fixup_inode_link_counts(struct btrfs_trans_handle *trans,
@@ -1689,6 +1889,10 @@ static int replay_one_buffer(struct btrfs_root *log,
struct extent_buffer *eb,
 			ret = add_inode_ref(wc->trans, root, log, path,
 					    eb, i, &key);
 			BUG_ON(ret && ret != -ENOENT);
+		} else if (key.type == BTRFS_INODE_EXTREF_KEY) {
+			ret = add_inode_ref(wc->trans, root, log, path,
+					    eb, i, &key);
+			BUG_ON(ret && ret != -ENOENT);
 		} else if (key.type == BTRFS_EXTENT_DATA_KEY) {
 			ret = replay_one_extent(wc->trans, root, path,
 						eb, i, &key);
diff --git a/fs/btrfs/tree-log.h b/fs/btrfs/tree-log.h
index 2270ac5..fd40ad5 100644
--- a/fs/btrfs/tree-log.h
+++ b/fs/btrfs/tree-log.h
@@ -49,4 +49,8 @@ void btrfs_record_unlink_dir(struct btrfs_trans_handle *trans,
 int btrfs_log_new_name(struct btrfs_trans_handle *trans,
 			struct inode *inode, struct inode *old_dir,
 			struct dentry *parent);
+int btrfs_find_one_extref(struct btrfs_root *root, u64 inode_objectid,
+			  u64 start_off, struct btrfs_path *path,
+			  struct btrfs_inode_extref **ret_ref, u64 *found_off);
+
 #endif
-- 
1.7.7

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs"
in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Mark Fasheh

2012-Apr-05 20:09 UTC

head link

[PATCH 3/3] btrfs: extended inode refs

The iterate_irefs in backref.c is used to build path components from inode
refs. I had to add a 2nd iterate function callback to handle extended refs.

Both iterate callbacks eventually converge upon iref_to_path() which I was
able to keep as one function with some small code to abstract away
differences in the two disk structures.

Signed-off-by: Mark Fasheh <mfasheh@suse.de>
---
 fs/btrfs/backref.c |  200 ++++++++++++++++++++++++++++++++++++++++++----------
 fs/btrfs/backref.h |    4 +-
 2 files changed, 165 insertions(+), 39 deletions(-)

diff --git a/fs/btrfs/backref.c b/fs/btrfs/backref.c
index 0436c12..f2b8952 100644
--- a/fs/btrfs/backref.c
+++ b/fs/btrfs/backref.c
@@ -22,6 +22,7 @@
 #include "ulist.h"
 #include "transaction.h"
 #include "delayed-ref.h"
+#include "tree-log.h"
 
 /*
  * this structure records all encountered refs on the way up to the root
@@ -858,62 +859,75 @@ static int inode_ref_info(u64 inum, u64 ioff, struct
btrfs_root *fs_root,
 }
 
 /*
- * this iterates to turn a btrfs_inode_ref into a full filesystem path.
elements
- * of the path are separated by ''/'' and the path is
guaranteed to be
- * 0-terminated. the path is only given within the current file system.
- * Therefore, it never starts with a ''/''. the caller is
responsible to provide
- * "size" bytes in "dest". the dest buffer will be filled
backwards. finally,
- * the start point of the resulting string is returned. this pointer is within
- * dest, normally.
- * in case the path buffer would overflow, the pointer is decremented further
- * as if output was written to the buffer, though no more output is actually
- * generated. that way, the caller can determine how much space would be
- * required for the path to fit into the buffer. in that case, the returned
- * value will be smaller than dest. callers must check this!
+ * Given the parent objectid and name/name_len pairs of an inode ref
+ * (any version) this iterates to turn that information into a
+ * full filesystem path. elements of the path are separated by
''/'' and
+ * the path is guaranteed to be 0-terminated. the path is only given
+ * within the current file system.  Therefore, it never starts with a
+ * ''/''. the caller is responsible to provide "size"
bytes in
+ * "dest". the dest buffer will be filled backwards. finally, the
+ * start point of the resulting string is returned. this pointer is
+ * within dest, normally.  in case the path buffer would overflow, the
+ * pointer is decremented further as if output was written to the
+ * buffer, though no more output is actually generated. that way, the
+ * caller can determine how much space would be required for the path
+ * to fit into the buffer. in that case, the returned value will be
+ * smaller than dest. callers must check this!
  */
 static char *iref_to_path(struct btrfs_root *fs_root, struct btrfs_path *path,
-				struct btrfs_inode_ref *iref,
-				struct extent_buffer *eb_in, u64 parent,
-				char *dest, u32 size)
+			  int name_len, unsigned long name_off,
+			  struct extent_buffer *eb_in, u64 parent,
+			  char *dest, u32 size)
 {
-	u32 len;
 	int slot;
 	u64 next_inum;
 	int ret;
 	s64 bytes_left = size - 1;
 	struct extent_buffer *eb = eb_in;
 	struct btrfs_key found_key;
+	struct btrfs_inode_ref *iref;
+	struct btrfs_inode_extref *iref2;
 
 	if (bytes_left >= 0)
 		dest[bytes_left] = ''\0'';
 
 	while (1) {
-		len = btrfs_inode_ref_name_len(eb, iref);
-		bytes_left -= len;
+		bytes_left -= name_len;
 		if (bytes_left >= 0)
 			read_extent_buffer(eb, dest + bytes_left,
-						(unsigned long)(iref + 1), len);
+					   name_off, name_len);
 		if (eb != eb_in)
 			free_extent_buffer(eb);
+
+		/* Ok, we have enough to find any refs to the parent inode. */
 		ret = inode_ref_info(parent, 0, fs_root, path, &found_key);
-		if (ret > 0)
-			ret = -ENOENT;
-		if (ret)
-			break;
 		next_inum = found_key.offset;
+		if (ret == 0) {
+			slot = path->slots[0];
+			eb = path->nodes[0];
+			/* make sure we can use eb after releasing the path */
+			if (eb != eb_in)
+				atomic_inc(&eb->refs);
+			btrfs_release_path(path);
+			iref = btrfs_item_ptr(eb, slot, struct btrfs_inode_ref);
+
+			name_len = btrfs_inode_ref_name_len(eb, iref);
+			name_off = (unsigned long)(iref + 1);
+		} else {
+			ret = btrfs_find_one_extref(fs_root, parent, 0, path,
+						    &iref2, NULL);
+			if (ret)
+				break;
+
+			next_inum = btrfs_inode_extref_parent(eb, iref2);
+			name_off = (unsigned long)&iref2->name;
+			name_len = btrfs_inode_extref_name_len(eb, iref2);
+		}
 
 		/* regular exit ahead */
 		if (parent == next_inum)
 			break;
 
-		slot = path->slots[0];
-		eb = path->nodes[0];
-		/* make sure we can use eb after releasing the path */
-		if (eb != eb_in)
-			atomic_inc(&eb->refs);
-		btrfs_release_path(path);
-
-		iref = btrfs_item_ptr(eb, slot, struct btrfs_inode_ref);
 		parent = next_inum;
 		--bytes_left;
 		if (bytes_left >= 0)
@@ -1226,9 +1240,9 @@ int iterate_inodes_from_logical(u64 logical, struct
btrfs_fs_info *fs_info,
 	return ret;
 }
 
-static int iterate_irefs(u64 inum, struct btrfs_root *fs_root,
-				struct btrfs_path *path,
-				iterate_irefs_t *iterate, void *ctx)
+static int iterate_inode_refs(u64 inum, struct btrfs_root *fs_root,
+			      struct btrfs_path *path,
+			      iterate_irefs_t *iterate, void *ctx)
 {
 	int ret;
 	int slot;
@@ -1244,7 +1258,7 @@ static int iterate_irefs(u64 inum, struct btrfs_root
*fs_root,
 
 	while (1) {
 		ret = inode_ref_info(inum, parent ? parent+1 : 0, fs_root, path,
-					&found_key);
+				     &found_key);
 		if (ret < 0)
 			break;
 		if (ret) {
@@ -1286,6 +1300,76 @@ static int iterate_irefs(u64 inum, struct btrfs_root
*fs_root,
 	return ret;
 }
 
+static int iterate_inode_extrefs(u64 inum, struct btrfs_root *fs_root,
+				 struct btrfs_path *path,
+				 iterate_extrefs_t *iterate, void *ctx)
+{
+	int ret;
+	int slot;
+	u64 offset = 0;
+	u64 parent;
+	int found = 0;
+	struct extent_buffer *eb;
+	struct btrfs_item *item;
+	struct btrfs_inode_extref *iref2;
+
+	while (1) {
+		ret = btrfs_find_one_extref(fs_root, inum, offset, path, &iref2,
+					    &offset);
+		if (ret < 0)
+			break;
+		if (ret) {
+			ret = found ? 0 : -ENOENT;
+			break;
+		}
+		++found;
+
+		slot = path->slots[0];
+		eb = path->nodes[0];
+		/* make sure we can use eb after releasing the path */
+		atomic_inc(&eb->refs);
+		btrfs_release_path(path);
+
+		item = btrfs_item_nr(eb, slot);
+		iref2 = btrfs_item_ptr(eb, slot, struct btrfs_inode_extref);
+
+		parent = btrfs_inode_extref_parent(eb, iref2);
+		ret = iterate(parent, iref2, eb, ctx);
+		if (ret) {
+			free_extent_buffer(eb);
+			break;
+		}
+
+		free_extent_buffer(eb);
+		offset++;
+	}
+
+	btrfs_release_path(path);
+
+	return ret;
+}
+
+static int iterate_irefs(u64 inum, struct btrfs_root *fs_root,
+			 struct btrfs_path *path,
+			 iterate_irefs_t *iterate,
+			 iterate_extrefs_t *iterate2, void *ctx)
+{
+	int ret, found_refs = 0;
+
+	ret = iterate_inode_refs(inum, fs_root, path, iterate, ctx);
+	if (ret && ret != -ENOENT)
+		return ret;
+
+	if (ret != -ENOENT)
+		++found_refs;
+
+	ret = iterate_inode_extrefs(inum, fs_root, path, iterate2, ctx);
+	if (ret == -ENOENT && found_refs)
+		return 0;
+
+	return ret;
+}
+
 /*
  * returns 0 if the path could be dumped (probably truncated)
  * returns <0 in case of an error
@@ -1299,13 +1383,53 @@ static int inode_to_path(u64 inum, struct
btrfs_inode_ref *iref,
 	int i = ipath->fspath->elem_cnt;
 	const int s_ptr = sizeof(char *);
 	u32 bytes_left;
+	u32 name_len = btrfs_inode_ref_name_len(eb, iref);
+
+	bytes_left = ipath->fspath->bytes_left > s_ptr ?
+					ipath->fspath->bytes_left - s_ptr : 0;
+
+	fspath_min = (char *)ipath->fspath->val + (i + 1) * s_ptr;
+	fspath = iref_to_path(ipath->fs_root, ipath->btrfs_path, name_len,
+			      (unsigned long)(iref + 1), eb, inum, fspath_min,
+			      bytes_left);
+	if (IS_ERR(fspath))
+		return PTR_ERR(fspath);
+
+	if (fspath > fspath_min) {
+		ipath->fspath->val[i] = (u64)(unsigned long)fspath;
+		++ipath->fspath->elem_cnt;
+		ipath->fspath->bytes_left = fspath - fspath_min;
+	} else {
+		++ipath->fspath->elem_missed;
+		ipath->fspath->bytes_missing += fspath_min - fspath;
+		ipath->fspath->bytes_left = 0;
+	}
+
+	return 0;
+}
+
+/*
+ * returns 0 if the path could be dumped (probably truncated)
+ * returns <0 in case of an error
+ */
+static int inode_to_path2(u64 inum, struct btrfs_inode_extref *iref2,
+			  struct extent_buffer *eb, void *ctx)
+{
+	struct inode_fs_paths *ipath = ctx;
+	char *fspath;
+	char *fspath_min;
+	int i = ipath->fspath->elem_cnt;
+	const int s_ptr = sizeof(char *);
+	u32 bytes_left;
+	u32 name_len = btrfs_inode_extref_name_len(eb, iref2);
 
 	bytes_left = ipath->fspath->bytes_left > s_ptr ?
 					ipath->fspath->bytes_left - s_ptr : 0;
 
 	fspath_min = (char *)ipath->fspath->val + (i + 1) * s_ptr;
-	fspath = iref_to_path(ipath->fs_root, ipath->btrfs_path, iref, eb,
-				inum, fspath_min, bytes_left);
+	fspath = iref_to_path(ipath->fs_root, ipath->btrfs_path, name_len,
+			      (unsigned long)&iref2->name, eb, inum,
+			      fspath_min, bytes_left);
 	if (IS_ERR(fspath))
 		return PTR_ERR(fspath);
 
@@ -1339,7 +1463,7 @@ static int inode_to_path(u64 inum, struct btrfs_inode_ref
*iref,
 int paths_from_inode(u64 inum, struct inode_fs_paths *ipath)
 {
 	return iterate_irefs(inum, ipath->fs_root, ipath->btrfs_path,
-				inode_to_path, ipath);
+			     inode_to_path, inode_to_path2, ipath);
 }
 
 /*
diff --git a/fs/btrfs/backref.h b/fs/btrfs/backref.h
index d00dfa9..4634ed7 100644
--- a/fs/btrfs/backref.h
+++ b/fs/btrfs/backref.h
@@ -31,7 +31,9 @@ struct inode_fs_paths {
 typedef int (iterate_extent_inodes_t)(u64 inum, u64 offset, u64 root,
 		void *ctx);
 typedef int (iterate_irefs_t)(u64 parent, struct btrfs_inode_ref *iref,
-				struct extent_buffer *eb, void *ctx);
+				 struct extent_buffer *eb, void *ctx);
+typedef int (iterate_extrefs_t)(u64 parent, struct btrfs_inode_extref *iref,
+				 struct extent_buffer *eb, void *ctx);
 
 int inode_item_info(u64 inum, u64 ioff, struct btrfs_root *fs_root,
 			struct btrfs_path *path);
-- 
1.7.7

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs"
in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Jeff Mahoney

2012-Apr-05 21:13 UTC

head link

Re: [PATCH 0/3] btrfs: extended inode refs

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

On 04/05/2012 04:09 PM, Mark Fasheh wrote:> Currently btrfs has a limitation on the maximum number of hard
> links an inode can have. Specifically, links are stored in an array
> of ref items:
> 
> struct btrfs_inode_ref { __le64 index; __le16 name_len; /* name
> goes here */ } __attribute__ ((__packed__));
> 
> The ref arrays are found via key triple:
> 
> (inode objectid, BTRFS_INODE_EXTREF_KEY, parent dir objectid)
> 
> Since items can not exceed the size of a leaf, the total number of
> links that can be stored for a given inode / parent dir pair is
> limited to under 4k. This works fine for the most common case of
> few to only a handful of links. Once the link count gets higher
> however, we begin to return EMLINK.
> 
> 
> The following patches fix this situation by introducing a new ref
> item:
> 
> struct btrfs_inode_extref { __le64 parent_objectid; __le64 index; 
> __le16 name_len; __u8   name[0]; /* name goes here */ }
> __attribute__ ((__packed__));
> 
> Extended refs behave differently from ref arrays in several key
> areas.
Thanks for digging into this. It''s been heating up on the list lately.
> Each extended refs is it''s own item so there is no ref array (and 
> therefore no limit on size).
> 
> As a result, we must use a different addressing scheme. Extended
> ref keys look like:
> 
> (inode objectid, BTRFS_INODE_EXTREF_KEY, hash)
> 
> Where hash is defined as a function of the parent objectid and link
> name.
I think this is effective. It will essentially have the same
properties as a dirent but seeds the hash at objectid instead of ~1.
> This effectively fixes the limitation, though we have a slightly
> less efficient packing of link data. To keep the best of both
> worlds then, I implemented the following behavior:
> 
> Extended refs don''t replace the existing ref array. An inode gets
> an extended ref for a given link _only_ after the ref array has
> been filled.  So the most common cases shouldn''t actually see any
> difference in performance or disk usage as they''ll never get to
the
> point where we''re using an extended ref.
> 
> It''s important while reading the patches however that
there''s still
> the possibility that we can have a set of operations that grow out
> an inode ref array (adding some extended refs) and then remove only
> the refs in the array.  I don''t really see this being common but
> it''s a case we always have to consider when coding these changes.
> 
> Right now there is a limitation for extrefs in that we''re not
> handling the possibility of a hash collision. There are two ways I
> see we can deal with this:
> 
> We can use a 56-bit hash and keep a generation counter in the lower
> 8 bits of the offset field.  The cost would be an additional tree
> search (between offset <hash>00 and <hash>FF) if we
don''t find
> exactly the name we were looking for.
> 
> An alternative solution to dealing with collisions could be to
> emulate the dir-item insertion code - specifically something like
> insert_with_overflow() which will stuff multiple items under one
> key. I tend to prefer the idea of
I vote for this option. The code for insert_with_overflow is already
well tested and anything that will generate a collision in dirent
insertion will generate a collision for the backref insertion. The
dirent structure is larger than the extref structure, so there should
always be space to match the dirent leaf so that a failure occurs
there first.

- -Jeff
- -- 
Jeff Mahoney
SUSE Labs
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2.0.18 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org/

iQIcBAEBAgAGBQJPfgr5AAoJEB57S2MheeWyt6cQAJPHUlCtB3+huJTodA7ow+jy
3WPhrbTPYME6lLpC/JQH8XbogKL1IqLbsvl9M3KzHMZAJ4fRzNJXmMCFgIou4cgu
v2cnNwkU1r5LJF/M3HMk1nhxABCeSONNTFqDbEp/eiTbI7X/UsM6q0vdPpj0vYih
20kWZhazmgx4pUPrtldKU+k91jjsZRoZyn8Bx6lEYPKIx1RQuBDPDH8q3ep5og2d
OQHDfMVNEJJ9Mz9lv+BZDqx/Q2Om8wyaM5GfjhtSocT+XrpTT4tC8FnKiUuOE1Ej
dy1Td43t+cCWvglGRDFj5I6ObfW7x4aUDTizo1hMUuGEQFQVc3vGY3OtDuqPxwoV
XS/4XRR3GU2cmsyjTky4Twa+81RF84cUl05+gM9VMaD7BJX60kIp1SXnE/TpyVUd
oVgZsgkV75R4XmjBp1zDrDRHTzpsgxm+zaVhSTxOKWhnV1bWLZ36SXw/1LRr0IVP
K+0xbnds7WWoRJNMSpJTBgAr7N4A5QEYvIUSQEO4UpgX4EXHYkfWstvvYNx1vUOT
HiHtS3ZKbrfzcmD1ZNSOPAcxold5BsVisuBwoyJnvOyCFeCi+Q+S0W8wA0kI6iS8
57nWYZfGelKLm/ATL8vPkpj6BCkFfhAK8neNdd80v1SpkUJOZr5tMTxaYO1r3QvC
T0e3jAC32pgrJ/2fByT+
=dx+I
-----END PGP SIGNATURE-----
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs"
in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Liu Bo

2012-Apr-06 01:24 UTC

head link

Re: [PATCH 0/3] btrfs: extended inode refs

On 04/06/2012 04:09 AM, Mark Fasheh wrote:> Currently btrfs has a limitation on the maximum number of hard links an
> inode can have. Specifically, links are stored in an array of ref
> items:
> 
> struct btrfs_inode_ref {
> 	__le64 index;
> 	__le16 name_len;
> 	/* name goes here */
> } __attribute__ ((__packed__));
> 
> The ref arrays are found via key triple:
> 
> (inode objectid, BTRFS_INODE_EXTREF_KEY, parent dir objectid)
> 
> Since items can not exceed the size of a leaf, the total number of links
> that can be stored for a given inode / parent dir pair is limited to under
> 4k. This works fine for the most common case of few to only a handful of
> links. Once the link count gets higher however, we begin to return EMLINK.
> 
> 
> The following patches fix this situation by introducing a new ref item:
> 
> struct btrfs_inode_extref {
> 	__le64 parent_objectid;
> 	__le64 index;
> 	__le16 name_len;
> 	__u8   name[0];
> 	/* name goes here */
> } __attribute__ ((__packed__));
> 
> Extended refs behave differently from ref arrays in several key areas.
> 
> Each extended refs is it''s own item so there is no ref array (and
> therefore no limit on size).
> 
> As a result, we must use a different addressing scheme. Extended ref keys
> look like:
> 
> (inode objectid, BTRFS_INODE_EXTREF_KEY, hash)
> 
> Where hash is defined as a function of the parent objectid and link name.
> 
> This effectively fixes the limitation, though we have a slightly less
> efficient packing of link data. To keep the best of both worlds then, I
> implemented the following behavior:
> 
> Extended refs don''t replace the existing ref array. An inode gets
an
> extended ref for a given link _only_ after the ref array has been filled. 
So
> the most common cases shouldn''t actually see any difference in
performance
> or disk usage as they''ll never get to the point where
we''re using an
> extended ref.
> 
> It''s important while reading the patches however that
there''s still the
> possibility that we can have a set of operations that grow out an inode ref
> array (adding some extended refs) and then remove only the refs in the
> array.  I don''t really see this being common but it''s a
case we always have
> to consider when coding these changes.
> 
> Right now there is a limitation for extrefs in that we''re not
handling the
> possibility of a hash collision. There are two ways I see we can deal with
> this:
> 
> We can use a 56-bit hash and keep a generation counter in the lower 8
> bits of the offset field.  The cost would be an additional tree search
> (between offset <hash>00 and <hash>FF) if we don''t
find exactly the name we
> were looking for.
> 
> An alternative solution to dealing with collisions could be to emulate the
> dir-item insertion code - specifically something like
insert_with_overflow()
> which will stuff multiple items under one key. I tend to prefer the idea of
> simply including a generation in the key offset however since it maintains
> the 1:1 relationship of keys to names which turns out to be much nicer to
> code for in my honest opinion. Also none of the code which iterates the
tree
> looking for refs would have to change as the only difference is in the key
> offset and not in the actual item structure.
> 
> 
> Testing wise, the patches are in an intermediate state. I''ve
debugged a fair
> bit but I''m certain there''s gremlins lurking in there. 
The basic namespace
> operations work well enough (link, unlink, etc).  I''ve done light
testing of
> my changes in backref.c by exercising BTRFS_IOC_INO_PATHS.  The changes in
> tree-log.c need the most review and testing - I haven''t really
figured out a
> great way to exercise the code in tree-log yet (suggestions would be
> great!).
> 
For the log recover test, I used to sysrq+b to make sure our log remains on
disk.

Will also test this patchset sooner or later.

thanks,
liubo
> 
> Finally, these patches are based off Linux v3.3.
> 	--Mark
> --
> To unsubscribe from this list: send the line "unsubscribe
linux-btrfs" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> 
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs"
in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Liu Bo

2012-Apr-06 02:12 UTC

head link

Re: [PATCH 0/3] btrfs: extended inode refs

On 04/06/2012 09:24 AM, Liu Bo wrote:> On 04/06/2012 04:09 AM, Mark Fasheh wrote:
>> Currently btrfs has a limitation on the maximum number of hard links an
>> inode can have. Specifically, links are stored in an array of ref
>> items:
>>
>> struct btrfs_inode_ref {
>> 	__le64 index;
>> 	__le16 name_len;
>> 	/* name goes here */
>> } __attribute__ ((__packed__));
>>
>> The ref arrays are found via key triple:
>>
>> (inode objectid, BTRFS_INODE_EXTREF_KEY, parent dir objectid)
>>
>> Since items can not exceed the size of a leaf, the total number of
links
>> that can be stored for a given inode / parent dir pair is limited to
under
>> 4k. This works fine for the most common case of few to only a handful
of
>> links. Once the link count gets higher however, we begin to return
EMLINK.
>>
>>
>> The following patches fix this situation by introducing a new ref item:
>>
>> struct btrfs_inode_extref {
>> 	__le64 parent_objectid;
>> 	__le64 index;
>> 	__le16 name_len;
>> 	__u8   name[0];
>> 	/* name goes here */
>> } __attribute__ ((__packed__));
>>
>> Extended refs behave differently from ref arrays in several key areas.
>>
>> Each extended refs is it''s own item so there is no ref array
(and
>> therefore no limit on size).
>>
>> As a result, we must use a different addressing scheme. Extended ref
keys
>> look like:
>>
>> (inode objectid, BTRFS_INODE_EXTREF_KEY, hash)
>>
>> Where hash is defined as a function of the parent objectid and link
name.
>>
>> This effectively fixes the limitation, though we have a slightly less
>> efficient packing of link data. To keep the best of both worlds then, I
>> implemented the following behavior:
>>
>> Extended refs don''t replace the existing ref array. An inode
gets an
>> extended ref for a given link _only_ after the ref array has been
filled.  So
>> the most common cases shouldn''t actually see any difference in
performance
>> or disk usage as they''ll never get to the point where
we''re using an
>> extended ref.
>>
>> It''s important while reading the patches however that
there''s still the
>> possibility that we can have a set of operations that grow out an inode
ref
>> array (adding some extended refs) and then remove only the refs in the
>> array.  I don''t really see this being common but it''s
a case we always have
>> to consider when coding these changes.
>>
>> Right now there is a limitation for extrefs in that we''re not
handling the
>> possibility of a hash collision. There are two ways I see we can deal
with
>> this:
>>
>> We can use a 56-bit hash and keep a generation counter in the lower 8
>> bits of the offset field.  The cost would be an additional tree search
>> (between offset <hash>00 and <hash>FF) if we don''t
find exactly the name we
>> were looking for.
>>
>> An alternative solution to dealing with collisions could be to emulate
the
>> dir-item insertion code - specifically something like
insert_with_overflow()
>> which will stuff multiple items under one key. I tend to prefer the
idea of
>> simply including a generation in the key offset however since it
maintains
>> the 1:1 relationship of keys to names which turns out to be much nicer
to
>> code for in my honest opinion. Also none of the code which iterates the
tree
>> looking for refs would have to change as the only difference is in the
key
>> offset and not in the actual item structure.
>>
>>
>> Testing wise, the patches are in an intermediate state. I''ve
debugged a fair
>> bit but I''m certain there''s gremlins lurking in
there.  The basic namespace
>> operations work well enough (link, unlink, etc).  I''ve done
light testing of
>> my changes in backref.c by exercising BTRFS_IOC_INO_PATHS.  The changes
in
>> tree-log.c need the most review and testing - I haven''t really
figured out a
>> great way to exercise the code in tree-log yet (suggestions would be
>> great!).
>>
> 
> For the log recover test, I used to sysrq+b to make sure our log remains on
disk.
> 
> Will also test this patchset sooner or later.
> 
It Works fine in normal mode except we need to note people to modify their
btrfs-progs with
that incompat flag at the first step ;)

However, for log recover, I use the following script:

$ touch /mnt/btrfs/foobar; 
$ ./fsync_self /mnt/btrfs/foobar; (fsync_self is a wrapper of fsync() written by
myself)
$ for i in `seq 1 1 300`; do ln /mnt/btrfs/foobar /mnt/btrfs/foobar$i;
./fsync_self /mnt/btrfs/foobar$i; done;
$ echo b > /proc/sysrq-trigger

when we come back,
$ mount disk /mnt/btrfs

and it hits a warning and a hang, the dmesg log shows:

Btrfs loaded
device fsid 85811dec-dd03-44f1-a8e2-005a67c6b7f5 devid 1 transid 5 /dev/sdb7
btrfs: disk space caching is enabled
Btrfs detected SSD devices, enabling SSD mode
------------[ cut here ]------------
WARNING: at fs/btrfs/ctree.c:1677 btrfs_search_slot+0x941/0x960 [btrfs]()
Hardware name: QiTianM7150
Modules linked in: btrfs(O) zlib_deflate libcrc32c ip6table_filter ip6_tables
iptable_filter ebtable_nat ebtables ipt_REJECT ip_tables bridge stp llc nfsd
lockd nfs_acl auth_rpcgss exportfs autofs4 sunrpc cpufreq_ondemand acpi_cpufreq
freq_table mperf be2iscsi iscsi_boot_sysfs bnx2i cnic uio cxgb3i libcxgbi cxgb3
mdio ib_iser rdma_cm ib_cm iw_cm ib_sa ib_mad ib_core ib_addr ipv6 iscsi_tcp
libiscsi_tcp libiscsi scsi_transport_iscsi ext3 jbd dm_mirror dm_region_hash
dm_log dm_mod kvm_intel kvm ppdev sg parport_pc parport coretemp hwmon pcspkr
i2c_i801 iTCO_wdt iTCO_vendor_support sky2 snd_hda_codec_realtek snd_hda_intel
snd_hda_codec snd_hwdep snd_seq snd_seq_device snd_pcm snd_timer snd soundcore
snd_page_alloc ext4 mbcache jbd2 sd_mod crc_t10dif pata_acpi ata_generic
ata_piix i915 drm_kms_
 helper drm i2c_algo_bit i2c_core video [last unloaded: btrfs]
Pid: 2323, comm: mount Tainted: G           O 3.4.0-rc1 #8
Call Trace:
 [<ffffffff8104d59f>] warn_slowpath_common+0x7f/0xc0
 [<ffffffff8104d5fa>] warn_slowpath_null+0x1a/0x20
 [<ffffffffa0715071>] btrfs_search_slot+0x941/0x960 [btrfs]
 [<ffffffffa07264de>] btrfs_lookup_dir_index_item+0x4e/0x90 [btrfs]
 [<ffffffffa076f139>] add_inode_ref+0x4b9/0x880 [btrfs]
 [<ffffffffa0771fc7>] replay_one_buffer+0x2a7/0x3b0 [btrfs]
 [<ffffffffa074700d>] ? btrfs_token_key_generation+0x5d/0xe0 [btrfs]
 [<ffffffffa076c31a>] walk_down_log_tree+0x23a/0x410 [btrfs]
 [<ffffffffa076c825>] walk_log_tree+0xb5/0x210 [btrfs]
 [<ffffffffa0770669>] btrfs_recover_log_trees+0x229/0x3e0 [btrfs]
 [<ffffffffa0771d20>] ? replay_one_dir_item+0xf0/0xf0 [btrfs]
 [<ffffffffa0730b08>] open_ctree+0x1598/0x1ae0 [btrfs]
 [<ffffffffa070bc94>] btrfs_mount+0x474/0x560 [btrfs]
 [<ffffffff811278be>] ? pcpu_next_pop+0x4e/0x70
 [<ffffffff81169ab3>] mount_fs+0x43/0x1a0
 [<ffffffff81129050>] ? __alloc_percpu+0x10/0x20
 [<ffffffff8118406a>] vfs_kern_mount+0x6a/0xf0
 [<ffffffff81184462>] do_kern_mount+0x52/0x110
 [<ffffffff811f3f98>] ? security_capable+0x18/0x20
 [<ffffffff81186355>] do_mount+0x255/0x7c0
 [<ffffffff8112390b>] ? memdup_user+0x4b/0x90
 [<ffffffff811239ab>] ? strndup_user+0x5b/0x80
 [<ffffffff81186950>] sys_mount+0x90/0xe0
 [<ffffffff814fab69>] system_call_fastpath+0x16/0x1b
---[ end trace d5fe92190ef227d6 ]---
SysRq : Show Blocked State
  task                        PC stack   pid father
mount           D ffffffff81610340     0  2323   2254 0x00000080
 ffff880075989408 0000000000000082 ffff880076c454a0 0000000000013440
 ffff880075989fd8 ffff880075988010 0000000000013440 0000000000013440
 ffff880075989fd8 0000000000013440 ffff88007a45cb30 ffff880076c454a0
Call Trace:
 [<ffffffff814f2029>] schedule+0x29/0x70
 [<ffffffffa076af25>] btrfs_tree_lock+0xc5/0x2a0 [btrfs]
 [<ffffffff8106f850>] ? wake_up_bit+0x40/0x40
 [<ffffffffa070e16b>] btrfs_lock_root_node+0x3b/0x50 [btrfs]
 [<ffffffffa0714e88>] btrfs_search_slot+0x758/0x960 [btrfs]
 [<ffffffffa0715b4d>] btrfs_insert_empty_items+0x8d/0xf0 [btrfs]
 [<ffffffffa07268a3>] insert_with_overflow+0x43/0x110 [btrfs]
 [<ffffffffa0726a4a>] btrfs_insert_dir_item+0xda/0x210 [btrfs]
 [<ffffffff8121f02b>] ? chksum_update+0x1b/0x30
 [<ffffffffa0737f74>] btrfs_add_link+0xe4/0x2f0 [btrfs]
 [<ffffffffa0755bb4>] ? free_extent_buffer+0x34/0x80 [btrfs]
 [<ffffffffa076f22d>] add_inode_ref+0x5ad/0x880 [btrfs]
 [<ffffffffa0771fc7>] replay_one_buffer+0x2a7/0x3b0 [btrfs]
 [<ffffffffa074700d>] ? btrfs_token_key_generation+0x5d/0xe0 [btrfs]
 [<ffffffffa076c31a>] walk_down_log_tree+0x23a/0x410 [btrfs]
 [<ffffffffa076c825>] walk_log_tree+0xb5/0x210 [btrfs]
 [<ffffffffa0770669>] btrfs_recover_log_trees+0x229/0x3e0 [btrfs]
 [<ffffffffa0771d20>] ? replay_one_dir_item+0xf0/0xf0 [btrfs]
 [<ffffffffa0730b08>] open_ctree+0x1598/0x1ae0 [btrfs]
 [<ffffffffa070bc94>] btrfs_mount+0x474/0x560 [btrfs]
 [<ffffffff811278be>] ? pcpu_next_pop+0x4e/0x70
 [<ffffffff81169ab3>] mount_fs+0x43/0x1a0
 [<ffffffff81129050>] ? __alloc_percpu+0x10/0x20
 [<ffffffff8118406a>] vfs_kern_mount+0x6a/0xf0
 [<ffffffff81184462>] do_kern_mount+0x52/0x110
 [<ffffffff811f3f98>] ? security_capable+0x18/0x20
 [<ffffffff81186355>] do_mount+0x255/0x7c0
 [<ffffffff8112390b>] ? memdup_user+0x4b/0x90
 [<ffffffff811239ab>] ? strndup_user+0x5b/0x80
 [<ffffffff81186950>] sys_mount+0x90/0xe0
 [<ffffffff814fab69>] system_call_fastpath+0x16/0x1b
btrfs-transacti D ffffffff81610340     0  2338      2 0x00000080
 ffff880079df1ae0 0000000000000046 ffff8800372ca100 0000000000013440
 ffff880079df1fd8 ffff880079df0010 0000000000013440 0000000000013440
 ffff880079df1fd8 0000000000013440 ffffffff81a13020 ffff8800372ca100
Call Trace:
 [<ffffffff814f2029>] schedule+0x29/0x70
 [<ffffffffa076af25>] btrfs_tree_lock+0xc5/0x2a0 [btrfs]
 [<ffffffff8106f850>] ? wake_up_bit+0x40/0x40
 [<ffffffffa070e16b>] btrfs_lock_root_node+0x3b/0x50 [btrfs]
 [<ffffffffa0714e88>] btrfs_search_slot+0x758/0x960 [btrfs]
 [<ffffffffa072884f>] btrfs_lookup_inode+0x2f/0xa0 [btrfs]
 [<ffffffff814f08ce>] ? mutex_lock+0x1e/0x50
 [<ffffffffa0784931>] btrfs_update_delayed_inode+0x71/0x140 [btrfs]
 [<ffffffffa0784e0a>] btrfs_run_delayed_items+0x12a/0x160 [btrfs]
 [<ffffffffa0732aef>] btrfs_commit_transaction+0x36f/0xa70 [btrfs]
 [<ffffffffa0733592>] ? start_transaction+0x92/0x320 [btrfs]
 [<ffffffff8106f850>] ? wake_up_bit+0x40/0x40
 [<ffffffffa072e0fb>] transaction_kthread+0x26b/0x2e0 [btrfs]
 [<ffffffffa072de90>] ? btrfs_destroy_marked_extents.clone.0+0x1f0/0x1f0
[btrfs]
 [<ffffffffa072de90>] ? btrfs_destroy_marked_extents.clone.0+0x1f0/0x1f0
[btrfs]
 [<ffffffff8106f1ae>] kthread+0x9e/0xb0
 [<ffffffff814fbe64>] kernel_thread_helper+0x4/0x10
 [<ffffffff8106f110>] ? kthread_freezable_should_stop+0x70/0x70
 [<ffffffff814fbe60>] ? gs_change+0x13/0x13
> thanks,
> liubo
> 
>> Finally, these patches are based off Linux v3.3.
>> 	--Mark
>> --
>> To unsubscribe from this list: send the line "unsubscribe
linux-btrfs" in
>> the body of a message to majordomo@vger.kernel.org
>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>>
> 
> --
> To unsubscribe from this list: send the line "unsubscribe
linux-btrfs" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> 
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs"
in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Jan Schmidt

2012-Apr-11 13:11 UTC

head link

Re: [PATCH 0/3] btrfs: extended inode refs

Hi Jeff,

On 05.04.2012 23:13, Jeff Mahoney wrote:>> As a result, we must use a different addressing scheme. Extended
>> ref keys look like:
> 
>> (inode objectid, BTRFS_INODE_EXTREF_KEY, hash)
> 
>> Where hash is defined as a function of the parent objectid and link
>> name.
> 
> I think this is effective. It will essentially have the same
> properties as a dirent but seeds the hash at objectid instead of ~1.
The objectid is already part of the key. What''s the point in seeding it
to crc32 instead of ~1?
>> Right now there is a limitation for extrefs in that we''re not
>> handling the possibility of a hash collision. There are two ways I
>> see we can deal with this:
> 
>> We can use a 56-bit hash and keep a generation counter in the lower
>> 8 bits of the offset field.  The cost would be an additional tree
>> search (between offset <hash>00 and <hash>FF) if we
don''t find
>> exactly the name we were looking for.
> 
>> An alternative solution to dealing with collisions could be to
>> emulate the dir-item insertion code - specifically something like
>> insert_with_overflow() which will stuff multiple items under one
>> key. I tend to prefer the idea of
> 
> I vote for this option.
Having more than one way to deal with hash collisions for keys doesn''t
feel right. Let''s find out which works best and use it in both places.
If the current insert_with_overflow style is best, please use that for
extrefs as well. If it isn''t, can we replace it with something that
works for both?
> The code for insert_with_overflow is already
> well tested
I''m looking forward to seeing tests or references to the archives
discussing this. Do you have a link for me?
> and anything that will generate a collision in dirent
> insertion will generate a collision for the backref insertion. The
> dirent structure is larger than the extref structure, so there should
> always be space to match the dirent leaf so that a failure occurs
> there first.
I agree with that bit.

-Jan
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs"
in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Jan Schmidt

2012-Apr-11 13:29 UTC

head link

Re: [PATCH 0/3] btrfs: extended inode refs

On 11.04.2012 15:11, Jan Schmidt wrote:> Hi Jeff,
> 
> On 05.04.2012 23:13, Jeff Mahoney wrote:
>>> As a result, we must use a different addressing scheme. Extended
>>> ref keys look like:
>>
>>> (inode objectid, BTRFS_INODE_EXTREF_KEY, hash)
>>
>>> Where hash is defined as a function of the parent objectid and link
>>> name.
>>
>> I think this is effective. It will essentially have the same
>> properties as a dirent but seeds the hash at objectid instead of ~1.
> 
> The objectid is already part of the key. What''s the point in
seeding it
> to crc32 instead of ~1?
While reading the patch set, I''m just seeing that the first objectid is
the inode''s obectid while the latter is the parent''s objectid.
I''m fine
with that.

Thanks,
-Jan
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs"
in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Jan Schmidt

2012-Apr-12 13:08 UTC

head link

Re: [PATCH 1/3] btrfs: extended inode refs

On 05.04.2012 22:09, Mark Fasheh wrote:> This patch adds basic support for extended inode refs. This includes
support
> for link and unlink of the refs, which basically gets us support for rename
> as well.
> 
> Inode creation does not need changing - extended refs are only added after
> the ref array is full.
> 
> Signed-off-by: Mark Fasheh <mfasheh@suse.de>
> ---
>  fs/btrfs/ctree.h      |   50 +++++++++--
>  fs/btrfs/inode-item.c |  244
+++++++++++++++++++++++++++++++++++++++++++++++--
>  fs/btrfs/inode.c      |   20 ++--
>  3 files changed, 288 insertions(+), 26 deletions(-)
> 
> diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
> index 80b6486..5fc77ee 100644
> --- a/fs/btrfs/ctree.h
> +++ b/fs/btrfs/ctree.h
> @@ -143,6 +143,13 @@ struct btrfs_ordered_sum;
>   */
>  #define BTRFS_NAME_LEN 255
>  
> +/*
> + * Theoretical limit is larger, but we keep this down to a sane
> + * value. That should limit greatly the possibility of collisions on
> + * inode ref items.
> + */
> +#define BTRFS_LINK_MAX 65535U
Do we really want an artificial limit like that?
> +
>  /* 32 bytes in various csum fields */
>  #define BTRFS_CSUM_SIZE 32
>  
> @@ -462,13 +469,16 @@ struct btrfs_super_block {
>  #define BTRFS_FEATURE_INCOMPAT_MIXED_GROUPS	(1ULL << 2)
>  #define BTRFS_FEATURE_INCOMPAT_COMPRESS_LZO	(1ULL << 3)
>  
> +#define BTRFS_FEATURE_INCOMPAT_EXTENDED_IREF	(1ULL << 6)
> +
>  #define BTRFS_FEATURE_COMPAT_SUPP		0ULL
>  #define BTRFS_FEATURE_COMPAT_RO_SUPP		0ULL
>  #define BTRFS_FEATURE_INCOMPAT_SUPP			\
>  	(BTRFS_FEATURE_INCOMPAT_MIXED_BACKREF |		\
>  	 BTRFS_FEATURE_INCOMPAT_DEFAULT_SUBVOL |	\
>  	 BTRFS_FEATURE_INCOMPAT_MIXED_GROUPS |		\
> -	 BTRFS_FEATURE_INCOMPAT_COMPRESS_LZO)
> +	 BTRFS_FEATURE_INCOMPAT_COMPRESS_LZO |		\
> +	 BTRFS_FEATURE_INCOMPAT_EXTENDED_IREF)
>  
>  /*
>   * A leaf is full of items. offset and size tell us where to find
> @@ -615,6 +625,14 @@ struct btrfs_inode_ref {
>  	/* name goes here */
>  } __attribute__ ((__packed__));
>  
> +struct btrfs_inode_extref {
> +	__le64 parent_objectid;
> +	__le64 index;
> +	__le16 name_len;
> +	__u8   name[0];
> +	/* name goes here */
> +} __attribute__ ((__packed__));
> +
>  struct btrfs_timespec {
>  	__le64 sec;
>  	__le32 nsec;
> @@ -1400,6 +1418,7 @@ struct btrfs_ioctl_defrag_range_args {
>   */
>  #define BTRFS_INODE_ITEM_KEY		1
>  #define BTRFS_INODE_REF_KEY		12
> +#define BTRFS_INODE_EXTREF_KEY		13
>  #define BTRFS_XATTR_ITEM_KEY		24
>  #define BTRFS_ORPHAN_ITEM_KEY		48
>  /* reserve 2-15 close to the inode for later flexibility */
> @@ -1701,6 +1720,13 @@ BTRFS_SETGET_STACK_FUNCS(block_group_flags,
>  BTRFS_SETGET_FUNCS(inode_ref_name_len, struct btrfs_inode_ref, name_len,
16);
>  BTRFS_SETGET_FUNCS(inode_ref_index, struct btrfs_inode_ref, index, 64);
>  
> +/* struct btrfs_inode_extref */
> +BTRFS_SETGET_FUNCS(inode_extref_parent, struct btrfs_inode_extref,
> +		   parent_objectid, 64);
> +BTRFS_SETGET_FUNCS(inode_extref_name_len, struct btrfs_inode_extref,
> +		   name_len, 16);
> +BTRFS_SETGET_FUNCS(inode_extref_index, struct btrfs_inode_extref, index,
64);
> +
>  /* struct btrfs_inode_item */
>  BTRFS_SETGET_FUNCS(inode_generation, struct btrfs_inode_item, generation,
64);
>  BTRFS_SETGET_FUNCS(inode_sequence, struct btrfs_inode_item, sequence, 64);
> @@ -2791,12 +2817,12 @@ int btrfs_del_inode_ref(struct btrfs_trans_handle
*trans,
>  			   struct btrfs_root *root,
>  			   const char *name, int name_len,
>  			   u64 inode_objectid, u64 ref_objectid, u64 *index);
> -struct btrfs_inode_ref *
> -btrfs_lookup_inode_ref(struct btrfs_trans_handle *trans,
> -			struct btrfs_root *root,
> -			struct btrfs_path *path,
> -			const char *name, int name_len,
> -			u64 inode_objectid, u64 ref_objectid, int mod);
> +int btrfs_get_inode_ref_index(struct btrfs_trans_handle *trans,
> +			      struct btrfs_root *root,
> +			      struct btrfs_path *path,
> +			      const char *name, int name_len,
> +			      u64 inode_objectid, u64 ref_objectid, int mod,
> +			      u64 *ret_index);
>  int btrfs_insert_empty_inode(struct btrfs_trans_handle *trans,
>  			     struct btrfs_root *root,
>  			     struct btrfs_path *path, u64 objectid);
> @@ -2804,6 +2830,16 @@ int btrfs_lookup_inode(struct btrfs_trans_handle
*trans, struct btrfs_root
>  		       *root, struct btrfs_path *path,
>  		       struct btrfs_key *location, int mod);
>  
> +struct btrfs_inode_extref *
> +btrfs_lookup_inode_extref(struct btrfs_trans_handle *trans,
> +			  struct btrfs_root *root,
> +			  struct btrfs_path *path,
> +			  const char *name, int name_len,
> +			  u64 inode_objectid, u64 ref_objectid, int ins_len,
> +			  int cow);
> +
> +u64 btrfs_extref_key_off(u64 parent_objectid, const char *name, int
name_len);
> +
>  /* file-item.c */
>  int btrfs_del_csums(struct btrfs_trans_handle *trans,
>  		    struct btrfs_root *root, u64 bytenr, u64 len);
> diff --git a/fs/btrfs/inode-item.c b/fs/btrfs/inode-item.c
> index baa74f3..cc5e854 100644
> --- a/fs/btrfs/inode-item.c
> +++ b/fs/btrfs/inode-item.c
> @@ -16,6 +16,7 @@
>   * Boston, MA 021110-1307, USA.
>   */
>  
> +#include <linux/crc32c.h>
>  #include "ctree.h"
>  #include "disk-io.h"
>  #include "transaction.h"
> @@ -49,18 +50,56 @@ static int find_name_in_backref(struct btrfs_path
*path, const char *name,
>  	return 0;
>  }
>  
> -struct btrfs_inode_ref *
> +static int find_name_in_ext_backref(struct btrfs_path *path, const char
*name,
> +				   int name_len, struct btrfs_inode_extref **ref_ret)
> +{
> +	struct extent_buffer *leaf;
> +	struct btrfs_inode_extref *ref;
> +	unsigned long ptr;
> +	unsigned long name_ptr;
> +	u32 item_size;
> +
> +	leaf = path->nodes[0];
> +	item_size = btrfs_item_size_nr(leaf, path->slots[0]);
> +	ptr = btrfs_item_ptr_offset(leaf, path->slots[0]);
> +	ref = (struct btrfs_inode_extref *)ptr;
> +
> +	/*
> +	 * There isn''t actually anything to "search" in an
extended
> +	 * backref - one name is directly attached to each
> +	 * item. Instead what we''re really doing here is some sort of
> +	 * sanity check - does the name exist where it should have
> +	 * been found. If all is well, we''ll return success and the
> +	 * inode ref object.
> +	 */
In that case, the name is not a good choice. However, if we change how
we deal with hash collisions, the name might be right and the comment
might go away as we really look through more than one item. If we leave
it like this, please rename that function.
> +	name_ptr = (unsigned long)(&ref->name);
> +	if (btrfs_inode_extref_name_len(leaf, ref) == name_len
> +	    && (memcmp_extent_buffer(leaf, name, name_ptr, name_len) ==
0)) {
> +		*ref_ret = ref;
> +		return 1;
> +	}
> +	return 0;
> +}
> +
> +/*
> + * Figure the key offset of an extended inode ref
> + */
> +u64 btrfs_extref_key_off(u64 parent_objectid, const char *name, int
name_len)
I''d suggest to call this one btrfs_extref_hash and move it to
fs/btrfs/hash.h. That way, we can also drop the include for
<linux/crc32c.h> above.
> +{
> +	return (u64) crc32c(parent_objectid, name, name_len);
> +}
> +
> +static struct btrfs_inode_ref *
>  btrfs_lookup_inode_ref(struct btrfs_trans_handle *trans,
> -			struct btrfs_root *root,
> -			struct btrfs_path *path,
> -			const char *name, int name_len,
> -			u64 inode_objectid, u64 ref_objectid, int mod)
> +		       struct btrfs_root *root,
> +		       struct btrfs_path *path,
> +		       const char *name, int name_len,
> +		       u64 inode_objectid, u64 ref_objectid, int ins_len,
> +		       int cow)
>  {
> +	int ret;
>  	struct btrfs_key key;
>  	struct btrfs_inode_ref *ref;
> -	int ins_len = mod < 0 ? -1 : 0;
> -	int cow = mod != 0;
> -	int ret;
>  
>  	key.objectid = inode_objectid;
>  	key.type = BTRFS_INODE_REF_KEY;
> @@ -76,20 +115,137 @@ btrfs_lookup_inode_ref(struct btrfs_trans_handle
*trans,
>  	return ref;
>  }
>  
> -int btrfs_del_inode_ref(struct btrfs_trans_handle *trans,
> +struct btrfs_inode_extref *
> +btrfs_lookup_inode_extref(struct btrfs_trans_handle *trans,
> +			  struct btrfs_root *root,
> +			  struct btrfs_path *path,
> +			  const char *name, int name_len,
> +			  u64 inode_objectid, u64 ref_objectid, int ins_len,
> +			  int cow)
> +{
> +	int ret;
> +	struct btrfs_key key;
> +	struct btrfs_inode_extref *ref;                                   ^^^
Please use some other identifier than the one we use for struct
btrfs_inode_ref. Suggestion: extref or eref.
> +
> +	key.objectid = inode_objectid;
> +	key.type = BTRFS_INODE_EXTREF_KEY;
> +	key.offset = btrfs_extref_key_off(ref_objectid, name, name_len);                     ^^^^^^^^^^^^^^^^^^^^
Get''s clearer when that is called btrfs_extref_hash.
> +
> +	ret = btrfs_search_slot(trans, root, &key, path, ins_len, cow);
> +	if (ret < 0)
> +		return ERR_PTR(ret);
> +	if (ret > 0)
> +		return NULL;
> +	if (!find_name_in_ext_backref(path, name, name_len, &ref))
> +		return NULL;
> +	return ref;
> +}
> +
> +int btrfs_get_inode_ref_index(struct btrfs_trans_handle *trans,
> +			      struct btrfs_root *root,
> +			      struct btrfs_path *path,
> +			      const char *name, int name_len,
> +			      u64 inode_objectid, u64 ref_objectid, int mod,
> +			      u64 *ret_index)
> +{
> +	struct btrfs_inode_ref *ref1;
> +	struct btrfs_inode_extref *ref2;                                   ^^^^
Please, use "ref" and ("eref" or "extref").
> +	int ins_len = mod < 0 ? -1 : 0;
> +	int cow = mod != 0;
> +
> +	ref1 = btrfs_lookup_inode_ref(trans, root, path, name, name_len,
> +				      inode_objectid, ref_objectid, ins_len,
> +				      cow);
> +	if (IS_ERR(ref1))
> +		return PTR_ERR(ref1);
> +
> +	if (ref1 != NULL) {
> +		*ret_index = btrfs_inode_ref_index(path->nodes[0], ref1);
> +		return 0;
> +	}
> +
> +	btrfs_release_path(path);
> +
> +	ref2 = btrfs_lookup_inode_extref(trans, root, path, name,
> +					 name_len, inode_objectid,
> +					 ref_objectid, ins_len, cow);
> +	if (IS_ERR(ref2))
> +		return PTR_ERR(ref2);
> +
> +	if (ref2) {
> +		*ret_index = btrfs_inode_extref_index(path->nodes[0], ref2);
> +		return 0;
> +	}
> +
> +	return -ENOENT;
> +}
> +
> +int btrfs_del_inode_extref(struct btrfs_trans_handle *trans,
>  			   struct btrfs_root *root,
>  			   const char *name, int name_len,
>  			   u64 inode_objectid, u64 ref_objectid, u64 *index)
>  {
>  	struct btrfs_path *path;
>  	struct btrfs_key key;
> +	struct btrfs_inode_extref *ref2;                                   ^^^^
> +	struct extent_buffer *leaf;
> +	int ret;
> +
> +	key.objectid = inode_objectid;
> +	btrfs_set_key_type(&key, BTRFS_INODE_EXTREF_KEY);
> +	key.offset = btrfs_extref_key_off(ref_objectid, name, name_len);
> +
> +	path = btrfs_alloc_path();
> +	if (!path)
> +		return -ENOMEM;
> +
> +	path->leave_spinning = 1;
> +
> +	ret = btrfs_search_slot(trans, root, &key, path, -1, 1);
> +	if (ret > 0) {
> +		ret = -ENOENT;
> +		goto out;
> +	} else if (ret < 0) {
> +		goto out;
> +	}
Clearer (to me):

if (ret > 0)
	ret = -ENOENT;
if (ret < 0)
	goto out;
> +
> +	/*
> +	 * Sanity check - did we find the right item for this name?
> +	 * This should always succeed so error here will make the FS
> +	 * readonly.
> +	 */
> +	if (!find_name_in_ext_backref(path, name, name_len, &ref2)) {
> +		btrfs_std_error(root->fs_info, -ENOENT);
> +		ret = -EROFS;
Simply returning -EROFS doesn''t make the fs read-only, does it?
I''d
suggest to return -EIO and drop the comment above.
> +		goto out;
> +	}
> +
> +	leaf = path->nodes[0];
> +	if (index)
> +		*index = btrfs_inode_extref_index(leaf, ref2);
> +
> +	ret = btrfs_del_item(trans, root, path);
> +
> +out:
> +	btrfs_free_path(path);
> +
> +	return ret;
> +}
> +
> +int btrfs_del_inode_ref(struct btrfs_trans_handle *trans,
> +			struct btrfs_root *root,
> +			const char *name, int name_len,
> +			u64 inode_objectid, u64 ref_objectid, u64 *index)
> +{
> +	struct btrfs_path *path;
> +	struct btrfs_key key;
>  	struct btrfs_inode_ref *ref;
>  	struct extent_buffer *leaf;
>  	unsigned long ptr;
>  	unsigned long item_start;
>  	u32 item_size;
>  	u32 sub_item_len;
> -	int ret;
> +	int ret, search_ext_refs = 0;               ^^^^^^^^^^^^^^^^^^^^^
Documentation/CodingStyle:
--
To this end, use just one data declaration per line (no commas for
multiple data declarations).
--

You do this multiple times in your whole patch set.
>  	int del_len = name_len + sizeof(*ref);
>  
>  	key.objectid = inode_objectid;
> @@ -105,12 +261,14 @@ int btrfs_del_inode_ref(struct btrfs_trans_handle
*trans,
>  	ret = btrfs_search_slot(trans, root, &key, path, -1, 1);
>  	if (ret > 0) {
>  		ret = -ENOENT;
> +		search_ext_refs = 1;
>  		goto out;
>  	} else if (ret < 0) {
>  		goto out;
>  	}
>  	if (!find_name_in_backref(path, name, name_len, &ref)) {
>  		ret = -ENOENT;
> +		search_ext_refs = 1;
>  		goto out;
>  	}
>  	leaf = path->nodes[0];
> @@ -132,6 +290,59 @@ int btrfs_del_inode_ref(struct btrfs_trans_handle
*trans,
>  				  item_size - sub_item_len, 1);
>  out:
>  	btrfs_free_path(path);
> +
> +	if (search_ext_refs) {
As far as I see it, you can drop search_ext_refs and simply go this way
on ret == -ENOENT.
> +		/*                   ^
Catched by accident: trailing whitespace.
> +		 * No refs were found, or we could not find the
> +		 * name in our ref array. Find and remove the extended
> +		 * inode ref then.
> +		 */
> +		return btrfs_del_inode_extref(trans, root, name, name_len,
> +					      inode_objectid, ref_objectid, index);
> +	}
> +
> +	return ret;
> +}
> +
> +static int btrfs_insert_inode_extref(struct btrfs_trans_handle *trans,
> +				     struct btrfs_root *root,
> +				     const char *name, int name_len,
> +				     u64 inode_objectid, u64 ref_objectid, u64 index)
Are we missing a check against BTRFS_LINK_MAX in this function?
> +{
> +	struct btrfs_path *path;
> +	struct btrfs_key key;
> +	struct btrfs_inode_extref *ref;extref
> +	unsigned long ptr;
> +	int ret;
> +	int ins_len = name_len + sizeof(*ref);
> +
> +	key.objectid = inode_objectid;
> +	key.type = BTRFS_INODE_EXTREF_KEY;
> +	key.offset = btrfs_extref_key_off(ref_objectid, name, name_len);
> +
> +	path = btrfs_alloc_path();
> +	if (!path)
> +		return -ENOMEM;
> +
> +	path->leave_spinning = 1;
> +	ret = btrfs_insert_empty_item(trans, root, path, &key,
> +				      ins_len);
> +	if (ret < 0)
> +		goto out;
> +
> +	ref = btrfs_item_ptr(path->nodes[0], path->slots[0],
> +			     struct btrfs_inode_extref);
> +
> +	btrfs_set_inode_extref_name_len(path->nodes[0], ref, name_len);
> +	btrfs_set_inode_extref_index(path->nodes[0], ref, index);
> +	btrfs_set_inode_extref_parent(path->nodes[0], ref, ref_objectid);
> +
> +	ptr = (unsigned long)&ref->name;
> +	write_extent_buffer(path->nodes[0], name, ptr, name_len);
> +	btrfs_mark_buffer_dirty(path->nodes[0]);
> +
> +out:
> +	btrfs_free_path(path);
>  	return ret;
>  }
>  
> @@ -189,6 +400,19 @@ int btrfs_insert_inode_ref(struct btrfs_trans_handle
*trans,
>  
>  out:
>  	btrfs_free_path(path);
> +
> +	if (ret == -EMLINK) {
> +		struct btrfs_super_block *disk_super = root->fs_info->super_copy;
> +		/* We ran out of space in the ref array. Need to
> +		 * add an extended ref. */
> +		if (btrfs_super_incompat_flags(disk_super)
> +		    & BTRFS_FEATURE_INCOMPAT_EXTENDED_IREF)
> +			ret = btrfs_insert_inode_extref(trans, root, name,
> +							name_len,
> +							inode_objectid,
> +							ref_objectid, index);
> +	}
> +
>  	return ret;
>  }
>  
> diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
> index 892b347..0ca525d 100644
> --- a/fs/btrfs/inode.c
> +++ b/fs/btrfs/inode.c
> @@ -2689,7 +2689,6 @@ static struct btrfs_trans_handle
*__unlink_start_trans(struct inode *dir,
>  	struct btrfs_trans_handle *trans;
>  	struct btrfs_root *root = BTRFS_I(dir)->root;
>  	struct btrfs_path *path;
> -	struct btrfs_inode_ref *ref;
>  	struct btrfs_dir_item *di;
>  	struct inode *inode = dentry->d_inode;
>  	u64 index;
> @@ -2803,17 +2802,16 @@ static struct btrfs_trans_handle
*__unlink_start_trans(struct inode *dir,
>  	}
>  	btrfs_release_path(path);
>  
> -	ref = btrfs_lookup_inode_ref(trans, root, path,
> -				dentry->d_name.name, dentry->d_name.len,
> -				ino, dir_ino, 0);
> -	if (IS_ERR(ref)) {
> -		err = PTR_ERR(ref);
> +	ret = btrfs_get_inode_ref_index(trans, root, path,
dentry->d_name.name,
> +					dentry->d_name.len, ino, dir_ino, 0, &index);
This line is >80 chars. Please use scripts/checkpatch.pl to catch those.
> +	if (ret) {
> +		err = ret;
>  		goto out;
>  	}
> -	BUG_ON(!ref);
> +
>  	if (check_path_shared(root, path))
>  		goto out;
> -	index = btrfs_inode_ref_index(path->nodes[0], ref);
> +
>  	btrfs_release_path(path);
>  
>  	/*
> @@ -4484,6 +4482,10 @@ static struct inode *btrfs_new_inode(struct
btrfs_trans_handle *trans,
>  	btrfs_set_key_type(&key[0], BTRFS_INODE_ITEM_KEY);
>  	key[0].offset = 0;
>  
> +	/*
> +	 * Start new inodes with a V1 ref. This is slightly more
> +	 * efficient for small numbers of hard links.
> +	 */
I''d drop that comment. extref is no "V2" kind of ref. But you
can leave
it in if you feel it helps anyone later.
>  	key[1].objectid = objectid;
>  	btrfs_set_key_type(&key[1], BTRFS_INODE_REF_KEY);
>  	key[1].offset = ref_objectid;
> @@ -4777,7 +4779,7 @@ static int btrfs_link(struct dentry *old_dentry,
struct inode *dir,
>  	if (root->objectid != BTRFS_I(inode)->root->objectid)
>  		return -EXDEV;
>  
> -	if (inode->i_nlink == ~0U)
> +	if (inode->i_nlink >= BTRFS_LINK_MAX)
>  		return -EMLINK;
>  
>  	err = btrfs_set_inode_index(dir, &index);
Reviewed-by: Jan Schmidt <list.btrfs@jan-o-sch.net>

-Jan
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs"
in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Jan Schmidt

2012-Apr-12 13:08 UTC

head link

Re: [PATCH 2/3] btrfs: extended inode refs

On 05.04.2012 22:09, Mark Fasheh wrote:> Teach tree-log.c about extended inode refs. In particular, we have to
adjust
> the behavior of inode ref replay as well as log tree recovery to account
for
> the existence of extended refs.
> 
> Signed-off-by: Mark Fasheh <mfasheh@suse.de>
> ---
>  fs/btrfs/tree-log.c |  320
+++++++++++++++++++++++++++++++++++++++++---------
>  fs/btrfs/tree-log.h |    4 +
>  2 files changed, 266 insertions(+), 58 deletions(-)
> 
> diff --git a/fs/btrfs/tree-log.c b/fs/btrfs/tree-log.c
> index 966cc74..d69b07a 100644
> --- a/fs/btrfs/tree-log.c
> +++ b/fs/btrfs/tree-log.c
> @@ -23,6 +23,7 @@
>  #include "disk-io.h"
>  #include "locking.h"
>  #include "print-tree.h"
> +#include "backref.h"
So now tree-log.c includes backref.h and backref.c includes tree-log.h,
this is not a problem by itself, but I''d try to avoid such
dependencies.
I''d put find_one_extref over to backref.c to solve this, also because
there it would be closer to inode_ref_info (which does the same for
INODE_REFs.
>  #include "compat.h"
>  #include "tree-log.h"
>  
> @@ -748,6 +749,7 @@ static noinline int backref_in_log(struct btrfs_root
*log,
>  {
>  	struct btrfs_path *path;
>  	struct btrfs_inode_ref *ref;
> +	struct btrfs_inode_extref *extref;                                   ^^^^^^
:-)
>  	unsigned long ptr;
>  	unsigned long ptr_end;
>  	unsigned long name_ptr;
> @@ -764,8 +766,24 @@ static noinline int backref_in_log(struct btrfs_root
*log,
>  	if (ret != 0)
>  		goto out;
>  
> -	item_size = btrfs_item_size_nr(path->nodes[0], path->slots[0]);
>  	ptr = btrfs_item_ptr_offset(path->nodes[0], path->slots[0]);
> +
> +	if (key->type == BTRFS_INODE_EXTREF_KEY) {
> +		extref = (struct btrfs_inode_extref *)ptr;
> +
> +		found_name_len = btrfs_inode_extref_name_len(path->nodes[0],
> +							     extref);
> +		if (found_name_len == namelen) {
> +			name_ptr = (unsigned long)&extref->name;
> +			ret = memcmp_extent_buffer(path->nodes[0], name,
> +						   name_ptr, namelen);
> +			if (ret == 0)
> +				match = 1;
> +		}
> +		goto out;
> +	}
> +
> +	item_size = btrfs_item_size_nr(path->nodes[0], path->slots[0]);
>  	ptr_end = ptr + item_size;
>  	while (ptr < ptr_end) {
>  		ref = (struct btrfs_inode_ref *)ptr;
> @@ -786,7 +804,6 @@ out:
>  	return match;
>  }
>  
> -
>  /*
>   * replay one inode back reference item found in the log tree.
>   * eb, slot and key refer to the buffer and key found in the log tree.
> @@ -801,15 +818,20 @@ static noinline int add_inode_ref(struct
btrfs_trans_handle *trans,
>  				  struct btrfs_key *key)
>  {
>  	struct btrfs_inode_ref *ref;
> +	struct btrfs_inode_extref *extref;
>  	struct btrfs_dir_item *di;
> +	struct btrfs_key search_key;
You don''t need search_key, just use key from the parameter list as the
code did previously.
>  	struct inode *dir;
>  	struct inode *inode;
>  	unsigned long ref_ptr;
>  	unsigned long ref_end;
> -	char *name;
> -	int namelen;
> +	char *name, *victim_name;
> +	int namelen, victim_name_len;
split
>  	int ret;
>  	int search_done = 0;
> +	int log_ref_ver = 0;
> +	u64 parent_objectid, inode_objectid, ref_index;
split
> +	struct extent_buffer *leaf;
>  
>  	/*
>  	 * it is possible that we didn''t log all the parent directories
> @@ -817,32 +839,56 @@ static noinline int add_inode_ref(struct
btrfs_trans_handle *trans,
>  	 * copy the back ref in.  The link count fixup code will take
>  	 * care of the rest
>  	 */
> -	dir = read_one_inode(root, key->offset);
> +
> +	if (key->type == BTRFS_INODE_EXTREF_KEY) {
> +		log_ref_ver = 1;                ^^^^^^^^^^^^^^^
Assigned but never used.
> +
> +		ref_ptr = btrfs_item_ptr_offset(eb, slot);
> +
> +		/* So that we don''t loop back looking for old style log refs.
*/
> +		ref_end = ref_ptr;
> +
> +		extref = (struct btrfs_inode_extref *) btrfs_item_ptr_offset(eb, slot);
> +		namelen = btrfs_inode_extref_name_len(eb, extref);
> +		name = kmalloc(namelen, GFP_NOFS);
kmalloc may fail.
> +
> +		read_extent_buffer(eb, name, (unsigned long)&extref->name,
> +				   namelen);
> +
> +		ref_index = btrfs_inode_extref_index(eb, extref);
> +		parent_objectid = btrfs_inode_extref_parent(eb, extref);
> +	} else {
> +		parent_objectid = key->offset;
> +
> +		ref_ptr = btrfs_item_ptr_offset(eb, slot);
> +		ref_end = ref_ptr + btrfs_item_size_nr(eb, slot);
> +
> +		ref = (struct btrfs_inode_ref *)ref_ptr;
> +		namelen = btrfs_inode_ref_name_len(eb, ref);
> +		name = kmalloc(namelen, GFP_NOFS);
> +		BUG_ON(!name);
> +
> +		read_extent_buffer(eb, name, (unsigned long)(ref + 1), namelen);
> +
> +		ref_index = btrfs_inode_ref_index(eb, ref);
The way you put it, you had to copy all the code from the else block
down before the "goto again" case. Code duplication is error prone.

Maybe you can manage to put everything between "again:" and "goto
again"
into a separate function that can be called multiple times. For the old
INODE_REF items that could be done in a loop then. This would also split
an overly long function into more readable units.
> +	}
> +
> +	inode_objectid = key->objectid;
> +
> +	dir = read_one_inode(root, parent_objectid);
>  	if (!dir)
>  		return -ENOENT;
>  
> -	inode = read_one_inode(root, key->objectid);
> +	inode = read_one_inode(root, inode_objectid);
>  	if (!inode) {
>  		iput(dir);
>  		return -EIO;
>  	}
>  
> -	ref_ptr = btrfs_item_ptr_offset(eb, slot);
> -	ref_end = ref_ptr + btrfs_item_size_nr(eb, slot);
> -
>  again:
> -	ref = (struct btrfs_inode_ref *)ref_ptr;
> -
> -	namelen = btrfs_inode_ref_name_len(eb, ref);
> -	name = kmalloc(namelen, GFP_NOFS);
> -	BUG_ON(!name);
> -
> -	read_extent_buffer(eb, name, (unsigned long)(ref + 1), namelen);
> -
>  	/* if we already have a perfect match, we''re done */
>  	if (inode_in_dir(root, path, btrfs_ino(dir), btrfs_ino(inode),
> -			 btrfs_inode_ref_index(eb, ref),
> -			 name, namelen)) {
> +			 ref_index, name, namelen)) {
>  		goto out;
>  	}
>  
> @@ -857,19 +903,23 @@ again:
>  	if (search_done)
>  		goto insert;
>  
> -	ret = btrfs_search_slot(NULL, root, key, path, 0, 0);
> +	/* Search old style refs */
> +	search_key.objectid = inode_objectid;
> +	search_key.type = BTRFS_INODE_REF_KEY;
> +	search_key.offset = parent_objectid;
> +
> +	ret = btrfs_search_slot(NULL, root, &search_key, path, 0, 0);
>  	if (ret == 0) {
> -		char *victim_name;
> -		int victim_name_len;
>  		struct btrfs_inode_ref *victim_ref;
>  		unsigned long ptr;
>  		unsigned long ptr_end;
> -		struct extent_buffer *leaf = path->nodes[0];
> +
> +		leaf = path->nodes[0];
>  
>  		/* are we trying to overwrite a back ref for the root directory
>  		 * if so, just jump out, we''re done
>  		 */
> -		if (key->objectid == key->offset)
> +		if (search_key.objectid == search_key.offset)
>  			goto out_nowrite;
>  
>  		/* check all the names in this back reference to see
> @@ -889,7 +939,7 @@ again:
>  					   (unsigned long)(victim_ref + 1),
>  					   victim_name_len);
>  
> -			if (!backref_in_log(log, key, victim_name,
> +			if (!backref_in_log(log, &search_key, victim_name,
>  					    victim_name_len)) {
>  				btrfs_inc_nlink(inode);
>  				btrfs_release_path(path);
> @@ -902,19 +952,44 @@ again:
>  			ptr = (unsigned long)(victim_ref + 1) + victim_name_len;
>  		}
>  		BUG_ON(ret);
> -
> -		/*
> -		 * NOTE: we have searched root tree and checked the
> -		 * coresponding ref, it does not need to check again.
> -		 */
> -		search_done = 1;
>  	}
>  	btrfs_release_path(path);
>  
> +	/* Same search but for extended refs */
> +	extref = btrfs_lookup_inode_extref(NULL, root, path, name, namelen,
> +					   inode_objectid, parent_objectid, 0,
> +					   0);
> +	if (extref && !IS_ERR(extref)) {
if (!IS_ERR_OR_NULL(extref))
> +		victim_name_len = btrfs_inode_extref_name_len(eb, extref);
> +		victim_name = kmalloc(namelen, GFP_NOFS);
> +		leaf = path->nodes[0];
> +		read_extent_buffer(eb, name, (unsigned long)&extref->name,
namelen);
> +
> +		search_key.objectid = inode_objectid;
> +		search_key.type = BTRFS_INODE_EXTREF_KEY;
> +		search_key.offset = btrfs_extref_key_off(parent_objectid, name,
namelen);
> +		if (!backref_in_log(log, &search_key, victim_name,
> +				    victim_name_len)) {
> +			btrfs_inc_nlink(inode);
> +			btrfs_release_path(path);
> +
> +			ret = btrfs_unlink_inode(trans, root, dir,
> +						 inode, victim_name,
> +						 victim_name_len);
> +		}
> +		kfree(victim_name);
> +		BUG_ON(ret);
> +	}
> +
> +	/*
> +	 * NOTE: we have searched root tree and checked the
> +	 * coresponding refs, it does not need to be checked again.
> +	 */
> +	search_done = 1;
You moved this comment and assignment out of the "if (ret == 0)" case.
I''m not sure if this is still doing exactly the same now (?).
Previously, we were executing another btrfs_search_slot,
btrfs_lookup_dir_index_item, ... after the "goto again" case, which
would be skipped with this patch.
> +
>  	/* look for a conflicting sequence number */
>  	di = btrfs_lookup_dir_index_item(trans, root, path, btrfs_ino(dir),
> -					 btrfs_inode_ref_index(eb, ref),
> -					 name, namelen, 0);
> +					 ref_index, name, namelen, 0);
>  	if (di && !IS_ERR(di)) {
>  		ret = drop_one_dir_item(trans, root, path, dir, di);
>  		BUG_ON(ret);
> @@ -932,17 +1007,25 @@ again:
>  
>  insert:
>  	/* insert our name */
> -	ret = btrfs_add_link(trans, dir, inode, name, namelen, 0,
> -			     btrfs_inode_ref_index(eb, ref));
> +	ret = btrfs_add_link(trans, dir, inode, name, namelen, 0, ref_index);
>  	BUG_ON(ret);
>  
>  	btrfs_update_inode(trans, root, inode);
>  
>  out:
> -	ref_ptr = (unsigned long)(ref + 1) + namelen;
> +	ref_ptr = (unsigned long)(ref_ptr + sizeof(struct btrfs_inode_ref)) +
namelen;
>  	kfree(name);
> -	if (ref_ptr < ref_end)
> +	if (ref_ptr < ref_end) {
> +		ref = (struct btrfs_inode_ref *)ref_ptr;
> +		namelen = btrfs_inode_ref_name_len(eb, ref);
> +		name = kmalloc(namelen, GFP_NOFS);
> +		BUG_ON(!name);
> +
> +		read_extent_buffer(eb, name, (unsigned long)(ref + 1), namelen);
> +
> +		ref_index = btrfs_inode_ref_index(eb, ref);
This is the duplicated code mentioned above.
>  		goto again;
> +	}
>  
>  	/* finally write the back reference in the inode */
>  	ret = overwrite_item(trans, root, path, eb, slot, key);
> @@ -965,25 +1048,103 @@ static int insert_orphan_item(struct
btrfs_trans_handle *trans,
>  	return ret;
>  }
>  
> +int btrfs_find_one_extref(struct btrfs_root *root, u64 inode_objectid,
> +			  u64 start_off, struct btrfs_path *path,
> +			  struct btrfs_inode_extref **ret_ref, u64 *found_off)
ret_extref
> +{
> +	int ret, slot;
> +	struct btrfs_key key, found_key;
split
> +	struct btrfs_inode_extref *ref;
> +	struct extent_buffer *leaf;
> +	struct btrfs_item *item;
> +	unsigned long ptr;
>  
> -/*
> - * There are a few corners where the link count of the file can''t
> - * be properly maintained during replay.  So, instead of adding
> - * lots of complexity to the log code, we just scan the backrefs
> - * for any file that has been through replay.
> - *
> - * The scan will update the link count on the inode to reflect the
> - * number of back refs found.  If it goes down to zero, the iput
> - * will free the inode.
> - */
> -static noinline int fixup_inode_link_count(struct btrfs_trans_handle
*trans,
> -					   struct btrfs_root *root,
> -					   struct inode *inode)
[side note: this is really strange patch alignment. my git is doing it
the same way.]
> +	key.objectid = inode_objectid;
> +	btrfs_set_key_type(&key, BTRFS_INODE_EXTREF_KEY);
> +	key.offset = start_off;
> +
> +	ret = btrfs_search_slot(NULL, root, &key, path, 0, 0);
> +	if (ret < 0)
> +		goto out;
Just return here and drop the out label.
> +
> +	while (1) {
> +		leaf = path->nodes[0];
> +		slot = path->slots[0];
> +		if (slot >= btrfs_header_nritems(leaf)) {
> +			/*
> +			 * If the item at offset is not found,
> +			 * btrfs_search_slot will point us to the slot
> +			 * where it should be inserted. In our case
> +			 * that will be the slot directly before the
> +			 * next INODE_REF_KEY_V2 item. In the case
> +			 * that we''re pointing to the last slot in a
> +			 * leaf, we must move one leaf over.
> +			 */
> +			ret = btrfs_next_leaf(root, path);
> +			if (ret) {
> +				if (ret >= 1)
> +					ret = -ENOENT;
> +				break;
> +			}
This large block (and its comment) can be replaced by
btrfs_search_slot_for_read with find_higher = 1, return_any = 0. This
also avoids the "continue" and we even git rid of the whole while (1)
loop here.
> +			continue;
> +		}
> +
> +		item = btrfs_item_nr(leaf, slot);
> +		btrfs_item_key_to_cpu(leaf, &found_key, slot);
> +
> +		/*
> +		 * Check that we''re still looking at an extended ref key for
> +		 * this particular objectid. If we have different
> +		 * objectid or type then there are no more to be found
> +		 * in the tree and we can exit.
> +		 */
> +		ret = -ENOENT;
> +		if (found_key.objectid != inode_objectid)
> +			break;
> +		if (btrfs_key_type(&found_key) != BTRFS_INODE_EXTREF_KEY)
> +			break;
> +
> +		ret = 0;
> +		ptr = btrfs_item_ptr_offset(leaf, path->slots[0]);
> +		ref = (struct btrfs_inode_extref *)ptr;
> +		*ret_ref = ref;
> +		if (found_off)
> +			*found_off = found_key.offset + 1;
> +		break;
> +	}
> +
> +out:
> +	return ret;
> +}
> +
> +static int count_inode_extrefs(struct btrfs_root *root,
> +			       struct inode *inode, struct btrfs_path *path)
> +{
> +	int ret;
> +	unsigned int nlink = 0;
> +	u64 inode_objectid = btrfs_ino(inode);
> +	u64 offset = 0;
> +	struct btrfs_inode_extref *ref;
> +
> +	while (1) {
> +		ret = btrfs_find_one_extref(root, inode_objectid, offset, path,
> +					    &ref, &offset);
> +		if (ret)
> +			break;
Assume the first call to btrfs_find_ione_extref returns -EIO. Do we
really want count_inode_extrefs return 0 here? I agree that the previous
code suffers from the same problem, but still: it''s a problem.
> +
> +		nlink++;
> +		offset++;
> +	}
> +
> +	return nlink;
> +}
> +
> +static int count_inode_refs(struct btrfs_root *root,
> +			       struct inode *inode, struct btrfs_path *path)
>  {
> -	struct btrfs_path *path;
>  	int ret;
>  	struct btrfs_key key;
> -	u64 nlink = 0;
> +	unsigned int nlink = 0;
>  	unsigned long ptr;
>  	unsigned long ptr_end;
>  	int name_len;
> @@ -993,10 +1154,6 @@ static noinline int fixup_inode_link_count(struct
btrfs_trans_handle *trans,
>  	key.type = BTRFS_INODE_REF_KEY;
>  	key.offset = (u64)-1;
>  
> -	path = btrfs_alloc_path();
> -	if (!path)
> -		return -ENOMEM;
> -
>  	while (1) {
>  		ret = btrfs_search_slot(NULL, root, &key, path, 0, 0);
>  		if (ret < 0)
> @@ -1030,6 +1187,48 @@ static noinline int fixup_inode_link_count(struct
btrfs_trans_handle *trans,
>  		btrfs_release_path(path);
>  	}
>  	btrfs_release_path(path);
> +	btrfs_free_path(path);
In general, you must not free a path when you don''t allocate it.
> +
> +	return nlink;
> +}
> +
> +/*
> + * There are a few corners where the link count of the file can''t
> + * be properly maintained during replay.  So, instead of adding
> + * lots of complexity to the log code, we just scan the backrefs
> + * for any file that has been through replay.
> + *
> + * The scan will update the link count on the inode to reflect the
> + * number of back refs found.  If it goes down to zero, the iput
> + * will free the inode.
> + */
> +static noinline int fixup_inode_link_count(struct btrfs_trans_handle
*trans,
> +					   struct btrfs_root *root,
> +					   struct inode *inode)
> +{
> +	struct btrfs_path *path;
> +	int ret;
> +	u64 nlink = 0;
> +	u64 ino = btrfs_ino(inode);
> +
> +	path = btrfs_alloc_path();
> +	if (!path)
> +		return -ENOMEM;
> +
> +	ret = count_inode_refs(root, inode, path);
> +	btrfs_release_path(path);
Either count_inode_refs should alloc it''s private path, or it should
return the path in a released state. The caller should not be
responsible to pass a clean path and release it afterwards, as
count_inode_refs does not return anything through the path.
> +	if (ret < 0)
> +		goto out;
> +
> +	nlink = ret;
> +
> +	ret = count_inode_extrefs(root, inode, path);
> +	btrfs_release_path(path);
Same here. I''d just put the btrfs_release_path into
count_inode_(ext)refs.
> +	if (ret < 0)
> +		goto out;
> +
> +	nlink += ret;
> +
>  	if (nlink != inode->i_nlink) {
>  		set_nlink(inode, nlink);
>  		btrfs_update_inode(trans, root, inode);
> @@ -1045,9 +1244,10 @@ static noinline int fixup_inode_link_count(struct
btrfs_trans_handle *trans,
>  		ret = insert_orphan_item(trans, root, ino);
>  		BUG_ON(ret);
>  	}
> -	btrfs_free_path(path);
>  
> -	return 0;
> +out:
> +	btrfs_free_path(path);
> +	return ret;
>  }
>  
>  static noinline int fixup_inode_link_counts(struct btrfs_trans_handle
*trans,
> @@ -1689,6 +1889,10 @@ static int replay_one_buffer(struct btrfs_root *log,
struct extent_buffer *eb,
>  			ret = add_inode_ref(wc->trans, root, log, path,
>  					    eb, i, &key);
>  			BUG_ON(ret && ret != -ENOENT);
> +		} else if (key.type == BTRFS_INODE_EXTREF_KEY) {
> +			ret = add_inode_ref(wc->trans, root, log, path,
> +					    eb, i, &key);
> +			BUG_ON(ret && ret != -ENOENT);
>  		} else if (key.type == BTRFS_EXTENT_DATA_KEY) {
>  			ret = replay_one_extent(wc->trans, root, path,
>  						eb, i, &key);
> diff --git a/fs/btrfs/tree-log.h b/fs/btrfs/tree-log.h
> index 2270ac5..fd40ad5 100644
> --- a/fs/btrfs/tree-log.h
> +++ b/fs/btrfs/tree-log.h
> @@ -49,4 +49,8 @@ void btrfs_record_unlink_dir(struct btrfs_trans_handle
*trans,
>  int btrfs_log_new_name(struct btrfs_trans_handle *trans,
>  			struct inode *inode, struct inode *old_dir,
>  			struct dentry *parent);
> +int btrfs_find_one_extref(struct btrfs_root *root, u64 inode_objectid,
> +			  u64 start_off, struct btrfs_path *path,
> +			  struct btrfs_inode_extref **ret_ref, u64 *found_off);
> +
>  #endif
-Jan
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs"
in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Jan Schmidt

2012-Apr-12 15:53 UTC

head link

Re: [PATCH 2/3] btrfs: extended inode refs

Hi Mark,

While reading 3/3 I stumbled across one more thing in this one:

On 05.04.2012 22:09, Mark Fasheh wrote:> +int btrfs_find_one_extref(struct btrfs_root *root, u64 inode_objectid,
> +			  u64 start_off, struct btrfs_path *path,
> +			  struct btrfs_inode_extref **ret_ref, u64 *found_off)
> +{
> +	int ret, slot;
> +	struct btrfs_key key, found_key;
> +	struct btrfs_inode_extref *ref;
> +	struct extent_buffer *leaf;
> +	struct btrfs_item *item;
> +	unsigned long ptr;
>  
> -/*
> - * There are a few corners where the link count of the file can''t
> - * be properly maintained during replay.  So, instead of adding
> - * lots of complexity to the log code, we just scan the backrefs
> - * for any file that has been through replay.
> - *
> - * The scan will update the link count on the inode to reflect the
> - * number of back refs found.  If it goes down to zero, the iput
> - * will free the inode.
> - */
> -static noinline int fixup_inode_link_count(struct btrfs_trans_handle
*trans,
> -					   struct btrfs_root *root,
> -					   struct inode *inode)
> +	key.objectid = inode_objectid;
> +	btrfs_set_key_type(&key, BTRFS_INODE_EXTREF_KEY);
> +	key.offset = start_off;
> +
> +	ret = btrfs_search_slot(NULL, root, &key, path, 0, 0);
> +	if (ret < 0)
> +		goto out;
> +
> +	while (1) {
> +		leaf = path->nodes[0];
> +		slot = path->slots[0];
> +		if (slot >= btrfs_header_nritems(leaf)) {
> +			/*
> +			 * If the item at offset is not found,
> +			 * btrfs_search_slot will point us to the slot
> +			 * where it should be inserted. In our case
> +			 * that will be the slot directly before the
> +			 * next INODE_REF_KEY_V2 item. In the case
> +			 * that we''re pointing to the last slot in a
> +			 * leaf, we must move one leaf over.
> +			 */
> +			ret = btrfs_next_leaf(root, path);
> +			if (ret) {
> +				if (ret >= 1)
> +					ret = -ENOENT;
> +				break;
> +			}
> +			continue;
> +		}
> +
> +		item = btrfs_item_nr(leaf, slot);
> +		btrfs_item_key_to_cpu(leaf, &found_key, slot);
> +
> +		/*
> +		 * Check that we''re still looking at an extended ref key for
> +		 * this particular objectid. If we have different
> +		 * objectid or type then there are no more to be found
> +		 * in the tree and we can exit.
> +		 */
> +		ret = -ENOENT;
> +		if (found_key.objectid != inode_objectid)
> +			break;
> +		if (btrfs_key_type(&found_key) != BTRFS_INODE_EXTREF_KEY)
> +			break;
> +
> +		ret = 0;
> +		ptr = btrfs_item_ptr_offset(leaf, path->slots[0]);
> +		ref = (struct btrfs_inode_extref *)ptr;
> +		*ret_ref = ref;
> +		if (found_off)
> +			*found_off = found_key.offset + 1;                                                      ^^^
It''s evil to call it "found offset" an then return one larger
than the
offset found. No caller would ever expect this.
> +		break;
> +	}
> +
> +out:
> +	return ret;
> +}
> +
> +static int count_inode_extrefs(struct btrfs_root *root,
> +			       struct inode *inode, struct btrfs_path *path)
> +{
> +	int ret;
> +	unsigned int nlink = 0;
> +	u64 inode_objectid = btrfs_ino(inode);
> +	u64 offset = 0;
> +	struct btrfs_inode_extref *ref;
> +
> +	while (1) {
> +		ret = btrfs_find_one_extref(root, inode_objectid, offset, path,
> +					    &ref, &offset);
> +		if (ret)
> +			break;
> +
> +		nlink++;
> +		offset++;		^^^^^^^^
Huh. See? The caller expected to get the offset found from
btrfs_find_one_extref. As it stands you might be missing the very next key.
> +	}
> +
> +	return nlink;
> +}
-Jan
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs"
in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Chris Mason

2012-Apr-12 16:11 UTC

head link

Re: [PATCH 0/3] btrfs: extended inode refs

On Wed, Apr 11, 2012 at 03:11:46PM +0200, Jan Schmidt
wrote:> Hi Jeff,
> > 
> >> An alternative solution to dealing with collisions could be to
> >> emulate the dir-item insertion code - specifically something like
> >> insert_with_overflow() which will stuff multiple items under one
> >> key. I tend to prefer the idea of
> > 
> > I vote for this option.
[ Big patch series, thanks Mark! ]

I prefer the insert_with_overflow because it makes the deletion case
less complex.  If we handle collisions with bits in the offset, we have
to search around in the tree to find the key that was actually used to
insert the item.

The insert_with_overflow code uses just one key, at the cost of having
to search around inside the item.

-chris

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs"
in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Mark Fasheh

2012-Apr-12 16:19 UTC

head link

Re: [PATCH 0/3] btrfs: extended inode refs

On Thu, Apr 12, 2012 at 12:11:13PM -0400, Chris Mason
wrote:> On Wed, Apr 11, 2012 at 03:11:46PM +0200, Jan Schmidt wrote:
> > Hi Jeff,
> > > 
> > >> An alternative solution to dealing with collisions could be
to
> > >> emulate the dir-item insertion code - specifically something
like
> > >> insert_with_overflow() which will stuff multiple items under
one
> > >> key. I tend to prefer the idea of
> > > 
> > > I vote for this option.
> 
> [ Big patch series, thanks Mark! ]
> 
> I prefer the insert_with_overflow because it makes the deletion case
> less complex.  If we handle collisions with bits in the offset, we have
> to search around in the tree to find the key that was actually used to
> insert the item.
> 
> The insert_with_overflow code uses just one key, at the cost of having
> to search around inside the item.
Yeah this actually turns out to be a bit easier to code as well. I''m
taking
this approach.
	--Mark

--
Mark Fasheh
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs"
in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Jan Schmidt

2012-Apr-12 17:59 UTC

head link

Re: [PATCH 3/3] btrfs: extended inode refs

On 05.04.2012 22:09, Mark Fasheh wrote:> The iterate_irefs in backref.c is used to build path components from inode
> refs. I had to add a 2nd iterate function callback to handle extended refs.
> 
> Both iterate callbacks eventually converge upon iref_to_path() which I was
> able to keep as one function with some small code to abstract away
> differences in the two disk structures.
> 
> Signed-off-by: Mark Fasheh <mfasheh@suse.de>
> ---
>  fs/btrfs/backref.c |  200
++++++++++++++++++++++++++++++++++++++++++----------
>  fs/btrfs/backref.h |    4 +-
>  2 files changed, 165 insertions(+), 39 deletions(-)
> 
> diff --git a/fs/btrfs/backref.c b/fs/btrfs/backref.c
> index 0436c12..f2b8952 100644
> --- a/fs/btrfs/backref.c
> +++ b/fs/btrfs/backref.c
> @@ -22,6 +22,7 @@
>  #include "ulist.h"
>  #include "transaction.h"
>  #include "delayed-ref.h"
> +#include "tree-log.h"
I mentioned that in the 2/3 review mail. I''d rather get rid of this
#include here.
>  
>  /*
>   * this structure records all encountered refs on the way up to the root
> @@ -858,62 +859,75 @@ static int inode_ref_info(u64 inum, u64 ioff, struct
btrfs_root *fs_root,
>  }
>  
>  /*
> - * this iterates to turn a btrfs_inode_ref into a full filesystem path.
elements
> - * of the path are separated by ''/'' and the path is
guaranteed to be
> - * 0-terminated. the path is only given within the current file system.
> - * Therefore, it never starts with a ''/''. the caller is
responsible to provide
> - * "size" bytes in "dest". the dest buffer will be
filled backwards. finally,
> - * the start point of the resulting string is returned. this pointer is
within
> - * dest, normally.
> - * in case the path buffer would overflow, the pointer is decremented
further
> - * as if output was written to the buffer, though no more output is
actually
> - * generated. that way, the caller can determine how much space would be
> - * required for the path to fit into the buffer. in that case, the
returned
> - * value will be smaller than dest. callers must check this!
> + * Given the parent objectid and name/name_len pairs of an inode ref
> + * (any version) this iterates to turn that information into a
> + * full filesystem path. elements of the path are separated by
''/'' and
> + * the path is guaranteed to be 0-terminated. the path is only given
> + * within the current file system.  Therefore, it never starts with a
> + * ''/''. the caller is responsible to provide
"size" bytes in
> + * "dest". the dest buffer will be filled backwards. finally,
the
> + * start point of the resulting string is returned. this pointer is
> + * within dest, normally.  in case the path buffer would overflow, the
> + * pointer is decremented further as if output was written to the
> + * buffer, though no more output is actually generated. that way, the
> + * caller can determine how much space would be required for the path
> + * to fit into the buffer. in that case, the returned value will be
> + * smaller than dest. callers must check this!
It would reduce patch sets if you can extend comments in a compatible
way, you make reviewers happy if you don''t realign text (or, later,
function parameters) where it''s not required.
>   */
>  static char *iref_to_path(struct btrfs_root *fs_root, struct btrfs_path
*path,
> -				struct btrfs_inode_ref *iref,
> -				struct extent_buffer *eb_in, u64 parent,
> -				char *dest, u32 size)
> +			  int name_len, unsigned long name_off,
name_len should be u32
> +			  struct extent_buffer *eb_in, u64 parent,
> +			  char *dest, u32 size)
>  {
> -	u32 len;
>  	int slot;
>  	u64 next_inum;
>  	int ret;
>  	s64 bytes_left = size - 1;
>  	struct extent_buffer *eb = eb_in;
>  	struct btrfs_key found_key;
> +	struct btrfs_inode_ref *iref;
> +	struct btrfs_inode_extref *iref2;
iextref
>  
>  	if (bytes_left >= 0)
>  		dest[bytes_left] = ''\0'';
>  
>  	while (1) {
> -		len = btrfs_inode_ref_name_len(eb, iref);
> -		bytes_left -= len;
> +		bytes_left -= name_len;
>  		if (bytes_left >= 0)
>  			read_extent_buffer(eb, dest + bytes_left,
> -						(unsigned long)(iref + 1), len);
> +					   name_off, name_len);
>  		if (eb != eb_in)
>  			free_extent_buffer(eb);
> +
> +		/* Ok, we have enough to find any refs to the parent inode. */
>  		ret = inode_ref_info(parent, 0, fs_root, path, &found_key);
> -		if (ret > 0)
> -			ret = -ENOENT;
> -		if (ret)
> -			break;
>  		next_inum = found_key.offset;
> +		if (ret == 0) {
> +			slot = path->slots[0];
> +			eb = path->nodes[0];
> +			/* make sure we can use eb after releasing the path */
> +			if (eb != eb_in)
> +				atomic_inc(&eb->refs);
> +			btrfs_release_path(path);
> +			iref = btrfs_item_ptr(eb, slot, struct btrfs_inode_ref);
> +
> +			name_len = btrfs_inode_ref_name_len(eb, iref);
> +			name_off = (unsigned long)(iref + 1);
> +		} else {
> +			ret = btrfs_find_one_extref(fs_root, parent, 0, path,
> +						    &iref2, NULL);
> +			if (ret)
> +				break;
> +
> +			next_inum = btrfs_inode_extref_parent(eb, iref2);
> +			name_off = (unsigned long)&iref2->name;
> +			name_len = btrfs_inode_extref_name_len(eb, iref2);
> +		}
>  
>  		/* regular exit ahead */
>  		if (parent == next_inum)
>  			break;
These regular exit lines must go before the block you inserted.
Otherwise we leak a reference on eb if it''s != eb_in.

Whereas I think we don''t need this if-else-construct at all. We do need
the changes you made as to passing name_len and name_off, I agree.
However, the rest of the function should stay as it was, because the
parent of each object must be a directory and a directory won''t have
hard links. Thus, we''ll never meet INODE_EXTREFs when walking up the
path. Or did I miss something?
>  
> -		slot = path->slots[0];
> -		eb = path->nodes[0];
> -		/* make sure we can use eb after releasing the path */
> -		if (eb != eb_in)
> -			atomic_inc(&eb->refs);
> -		btrfs_release_path(path);
> -
> -		iref = btrfs_item_ptr(eb, slot, struct btrfs_inode_ref);
>  		parent = next_inum;
>  		--bytes_left;
>  		if (bytes_left >= 0)
> @@ -1226,9 +1240,9 @@ int iterate_inodes_from_logical(u64 logical, struct
btrfs_fs_info *fs_info,
>  	return ret;
>  }
>  
> -static int iterate_irefs(u64 inum, struct btrfs_root *fs_root,
> -				struct btrfs_path *path,
> -				iterate_irefs_t *iterate, void *ctx)
> +static int iterate_inode_refs(u64 inum, struct btrfs_root *fs_root,
> +			      struct btrfs_path *path,
> +			      iterate_irefs_t *iterate, void *ctx)
This function must not call free_extent_buffer(eb) in line 1306 after
applying your patch set (immediately before the break). Second, I think
we''d better add a blocking read lock on eb after incrementing
it''s
refcount, because we need the current content to stay as it is. Both
isn''t part of your patches, but it might be easier if you make that
bugfix change as a 3/4 patch within your set and turn this one into 4/4.
If you don''t like that, I''ll send a separate patch for it.
Don''t miss
the unlock if you do it ;-)
>  {
>  	int ret;
>  	int slot;
> @@ -1244,7 +1258,7 @@ static int iterate_irefs(u64 inum, struct btrfs_root
*fs_root,
>  
>  	while (1) {
>  		ret = inode_ref_info(inum, parent ? parent+1 : 0, fs_root, path,
> -					&found_key);
> +				     &found_key);
>  		if (ret < 0)
>  			break;
>  		if (ret) {
> @@ -1286,6 +1300,76 @@ static int iterate_irefs(u64 inum, struct btrfs_root
*fs_root,
>  	return ret;
>  }
>  
> +static int iterate_inode_extrefs(u64 inum, struct btrfs_root *fs_root,
> +				 struct btrfs_path *path,
> +				 iterate_extrefs_t *iterate, void *ctx)
> +{
> +	int ret;
> +	int slot;
> +	u64 offset = 0;
> +	u64 parent;
> +	int found = 0;
> +	struct extent_buffer *eb;
> +	struct btrfs_item *item;
> +	struct btrfs_inode_extref *iref2;
iextref
> +
> +	while (1) {
> +		ret = btrfs_find_one_extref(fs_root, inum, offset, path, &iref2,
> +					    &offset);
> +		if (ret < 0)
> +			break;
> +		if (ret) {
> +			ret = found ? 0 : -ENOENT;
> +			break;
> +		}
> +		++found;
> +
> +		slot = path->slots[0];
> +		eb = path->nodes[0];
> +		/* make sure we can use eb after releasing the path */
> +		atomic_inc(&eb->refs);
You need a blocking read lock here, too. Grab it before releasing the path.
> +		btrfs_release_path(path);
> +
> +		item = btrfs_item_nr(eb, slot);
You don''t need item.
> +		iref2 = btrfs_item_ptr(eb, slot, struct btrfs_inode_extref);
> +
> +		parent = btrfs_inode_extref_parent(eb, iref2);
> +		ret = iterate(parent, iref2, eb, ctx);
The caller shouldn''t have to deal with two different types of
callbacks.
Please just build a dummy struct btrfs_inode_ref object here and pass it
to iterate. Alternatively, you can extract the information the caller
will need, here, and pass that instead of a struct btrfs_inode_ref. This
way, we can use the same type for both iterate() functions.
> +		if (ret) {
> +			free_extent_buffer(eb);
> +			break;
> +		}
> +
> +		free_extent_buffer(eb);
Call free_extent_buffer(eb) before the if (ret), drop it from the if
block, add an unlock before the if block.
> +		offset++;
Another caller not expecting btrfs_find_one_extref to return offset
incremented. The offset++ should stay here and btrfs_find_one_extref
should just return the plain offset.
> +	}
> +
> +	btrfs_release_path(path);
> +
> +	return ret;
> +}
> +
> +static int iterate_irefs(u64 inum, struct btrfs_root *fs_root,
> +			 struct btrfs_path *path,
> +			 iterate_irefs_t *iterate,
> +			 iterate_extrefs_t *iterate2, void *ctx)
As mentioned above, I''d like to see only be a single iterate function
at
this level.
> +{
> +	int ret, found_refs = 0;
split
> +
> +	ret = iterate_inode_refs(inum, fs_root, path, iterate, ctx);
> +	if (ret && ret != -ENOENT)
> +		return ret;
> +
> +	if (ret != -ENOENT)
> +		++found_refs;
I''d make those 2 if statements:

if (!ret)
	++found_refs;
else if (ret != -ENOENT)
	return ret;
> +
> +	ret = iterate_inode_extrefs(inum, fs_root, path, iterate2, ctx);
> +	if (ret == -ENOENT && found_refs)
> +		return 0;
> +
> +	return ret;
> +}
> +
>  /*
>   * returns 0 if the path could be dumped (probably truncated)
>   * returns <0 in case of an error
> @@ -1299,13 +1383,53 @@ static int inode_to_path(u64 inum, struct
btrfs_inode_ref *iref,
>  	int i = ipath->fspath->elem_cnt;
>  	const int s_ptr = sizeof(char *);
>  	u32 bytes_left;
> +	u32 name_len = btrfs_inode_ref_name_len(eb, iref);
> +
> +	bytes_left = ipath->fspath->bytes_left > s_ptr ?
> +					ipath->fspath->bytes_left - s_ptr : 0;
> +
> +	fspath_min = (char *)ipath->fspath->val + (i + 1) * s_ptr;
> +	fspath = iref_to_path(ipath->fs_root, ipath->btrfs_path, name_len,
> +			      (unsigned long)(iref + 1), eb, inum, fspath_min,
> +			      bytes_left);
> +	if (IS_ERR(fspath))
> +		return PTR_ERR(fspath);
> +
> +	if (fspath > fspath_min) {
> +		ipath->fspath->val[i] = (u64)(unsigned long)fspath;
> +		++ipath->fspath->elem_cnt;
> +		ipath->fspath->bytes_left = fspath - fspath_min;
> +	} else {
> +		++ipath->fspath->elem_missed;
> +		ipath->fspath->bytes_missing += fspath_min - fspath;
> +		ipath->fspath->bytes_left = 0;
> +	}
> +
> +	return 0;
> +}
> +
> +/*
> + * returns 0 if the path could be dumped (probably truncated)
> + * returns <0 in case of an error
> + */
> +static int inode_to_path2(u64 inum, struct btrfs_inode_extref *iref2,
> +			  struct extent_buffer *eb, void *ctx)
We''ll get rid of a second inode_to_path, too, if my suggestion works
out.
> +{
> +	struct inode_fs_paths *ipath = ctx;
> +	char *fspath;
> +	char *fspath_min;
> +	int i = ipath->fspath->elem_cnt;
> +	const int s_ptr = sizeof(char *);
> +	u32 bytes_left;
> +	u32 name_len = btrfs_inode_extref_name_len(eb, iref2);
>  
>  	bytes_left = ipath->fspath->bytes_left > s_ptr ?
>  					ipath->fspath->bytes_left - s_ptr : 0;
>  
>  	fspath_min = (char *)ipath->fspath->val + (i + 1) * s_ptr;
> -	fspath = iref_to_path(ipath->fs_root, ipath->btrfs_path, iref, eb,
> -				inum, fspath_min, bytes_left);
> +	fspath = iref_to_path(ipath->fs_root, ipath->btrfs_path, name_len,
> +			      (unsigned long)&iref2->name, eb, inum,
> +			      fspath_min, bytes_left);
>  	if (IS_ERR(fspath))
>  		return PTR_ERR(fspath);
>  
> @@ -1339,7 +1463,7 @@ static int inode_to_path(u64 inum, struct
btrfs_inode_ref *iref,
>  int paths_from_inode(u64 inum, struct inode_fs_paths *ipath)
>  {
>  	return iterate_irefs(inum, ipath->fs_root, ipath->btrfs_path,
> -				inode_to_path, ipath);
> +			     inode_to_path, inode_to_path2, ipath);
>  }
>  
>  /*
> diff --git a/fs/btrfs/backref.h b/fs/btrfs/backref.h
> index d00dfa9..4634ed7 100644
> --- a/fs/btrfs/backref.h
> +++ b/fs/btrfs/backref.h
> @@ -31,7 +31,9 @@ struct inode_fs_paths {
>  typedef int (iterate_extent_inodes_t)(u64 inum, u64 offset, u64 root,
>  		void *ctx);
>  typedef int (iterate_irefs_t)(u64 parent, struct btrfs_inode_ref *iref,
> -				struct extent_buffer *eb, void *ctx);
> +				 struct extent_buffer *eb, void *ctx);
> +typedef int (iterate_extrefs_t)(u64 parent, struct btrfs_inode_extref
*iref,
> +				 struct extent_buffer *eb, void *ctx);
>  
>  int inode_item_info(u64 inum, u64 ioff, struct btrfs_root *fs_root,
>  			struct btrfs_path *path);
Thanks!
-Jan
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs"
in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Jan Schmidt

2012-Apr-12 18:38 UTC

head link

Re: [PATCH 3/3] btrfs: extended inode refs

On 12.04.2012 19:59, Jan Schmidt wrote:>> -static int iterate_irefs(u64 inum, struct btrfs_root *fs_root,
>> -				struct btrfs_path *path,
>> -				iterate_irefs_t *iterate, void *ctx)
>> +static int iterate_inode_refs(u64 inum, struct btrfs_root *fs_root,
>> +			      struct btrfs_path *path,
>> +			      iterate_irefs_t *iterate, void *ctx)
> 
> This function must not call free_extent_buffer(eb) in line 1306 after
> applying your patch set (immediately before the break). Second, I think
> we''d better add a blocking read lock on eb after incrementing
it''s
> refcount, because we need the current content to stay as it is. Both
> isn''t part of your patches, but it might be easier if you make
that
> bugfix change as a 3/4 patch within your set and turn this one into 4/4.
> If you don''t like that, I''ll send a separate patch for
it. Don''t miss
> the unlock if you do it ;-)
FYI: There are more read locks missing in the current version of
backref.c. So I made a bugfix patch myself which I''ll test and send
tomorrow.

-Jan
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs"
in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Mark Fasheh

2012-Apr-24 22:23 UTC

head link

Re: [PATCH 1/3] btrfs: extended inode refs

Jan, firstly thanks very much for the thorough review! I wanted to make sure
I got the collision handling going before addressing the issues you found.
Again thanks, and my notes are below.

On Thu, Apr 12, 2012 at 03:08:27PM +0200, Jan Schmidt
wrote:> > +/*
> > + * Theoretical limit is larger, but we keep this down to a sane
> > + * value. That should limit greatly the possibility of collisions on
> > + * inode ref items.
> > + */
> > +#define BTRFS_LINK_MAX 65535U
> 
> Do we really want an artificial limit like that?
I took a look at other file systems and this seems to be a pretty reasonable
maximum. Ext4 and XFS in particular have similar limits.

Granted, btrfs should be better equipped to deal with large numbers of hard
links - at least in the area of consistency checking.

That said, I think keeping this down to "only" 65 thousand hard links
per
inode should work out fine. Also it''s not a large change to increase or
remove the limit should we have to.

> > -struct btrfs_inode_ref *
> > +static int find_name_in_ext_backref(struct btrfs_path *path, const
char *name,
> > +				   int name_len, struct btrfs_inode_extref **ref_ret)
> > +{
> > +	struct extent_buffer *leaf;
> > +	struct btrfs_inode_extref *ref;
> > +	unsigned long ptr;
> > +	unsigned long name_ptr;
> > +	u32 item_size;
> > +
> > +	leaf = path->nodes[0];
> > +	item_size = btrfs_item_size_nr(leaf, path->slots[0]);
> > +	ptr = btrfs_item_ptr_offset(leaf, path->slots[0]);
> > +	ref = (struct btrfs_inode_extref *)ptr;
> > +
> > +	/*
> > +	 * There isn''t actually anything to "search" in
an extended
> > +	 * backref - one name is directly attached to each
> > +	 * item. Instead what we''re really doing here is some sort
of
> > +	 * sanity check - does the name exist where it should have
> > +	 * been found. If all is well, we''ll return success and the
> > +	 * inode ref object.
> > +	 */
> 
> In that case, the name is not a good choice. However, if we change how
> we deal with hash collisions, the name might be right and the comment
> might go away as we really look through more than one item. If we leave
> it like this, please rename that function.
Yeah, great catch. In the newer version as you''ve noted, it *is*
actually a
''find me the name'' function, so the comment (and resulting
EROFS) have been
removed.

> > +	name_ptr = (unsigned long)(&ref->name);
> > +	if (btrfs_inode_extref_name_len(leaf, ref) == name_len
> > +	    && (memcmp_extent_buffer(leaf, name, name_ptr, name_len)
== 0)) {
> > +		*ref_ret = ref;
> > +		return 1;
> > +	}
> > +	return 0;
> > +}
> > +
> > +/*
> > + * Figure the key offset of an extended inode ref
> > + */
> > +u64 btrfs_extref_key_off(u64 parent_objectid, const char *name, int
name_len)
> 
> I''d suggest to call this one btrfs_extref_hash and move it to
> fs/btrfs/hash.h. That way, we can also drop the include for
> <linux/crc32c.h> above.
Done.

> > +{
> > +	return (u64) crc32c(parent_objectid, name, name_len);
> > +}
> > +
> > +static struct btrfs_inode_ref *
> >  btrfs_lookup_inode_ref(struct btrfs_trans_handle *trans,
> > -			struct btrfs_root *root,
> > -			struct btrfs_path *path,
> > -			const char *name, int name_len,
> > -			u64 inode_objectid, u64 ref_objectid, int mod)
> > +		       struct btrfs_root *root,
> > +		       struct btrfs_path *path,
> > +		       const char *name, int name_len,
> > +		       u64 inode_objectid, u64 ref_objectid, int ins_len,
> > +		       int cow)
> >  {
> > +	int ret;
> >  	struct btrfs_key key;
> >  	struct btrfs_inode_ref *ref;
> > -	int ins_len = mod < 0 ? -1 : 0;
> > -	int cow = mod != 0;
> > -	int ret;
> >  
> >  	key.objectid = inode_objectid;
> >  	key.type = BTRFS_INODE_REF_KEY;
> > @@ -76,20 +115,137 @@ btrfs_lookup_inode_ref(struct btrfs_trans_handle
*trans,
> >  	return ref;
> >  }
> >  
> > -int btrfs_del_inode_ref(struct btrfs_trans_handle *trans,
> > +struct btrfs_inode_extref *
> > +btrfs_lookup_inode_extref(struct btrfs_trans_handle *trans,
> > +			  struct btrfs_root *root,
> > +			  struct btrfs_path *path,
> > +			  const char *name, int name_len,
> > +			  u64 inode_objectid, u64 ref_objectid, int ins_len,
> > +			  int cow)
> > +{
> > +	int ret;
> > +	struct btrfs_key key;
> > +	struct btrfs_inode_extref *ref;
>                                    ^^^
> Please use some other identifier than the one we use for struct
> btrfs_inode_ref. Suggestion: extref or eref.
Yeah I''ll make sure they''re all changed to extref. Makes it
much easier to
read.

> > +
> > +	key.objectid = inode_objectid;
> > +	key.type = BTRFS_INODE_EXTREF_KEY;
> > +	key.offset = btrfs_extref_key_off(ref_objectid, name, name_len);
>                      ^^^^^^^^^^^^^^^^^^^^
> Get''s clearer when that is called btrfs_extref_hash.
> 
> > +
> > +	ret = btrfs_search_slot(trans, root, &key, path, ins_len, cow);
> > +	if (ret < 0)
> > +		return ERR_PTR(ret);
> > +	if (ret > 0)
> > +		return NULL;
> > +	if (!find_name_in_ext_backref(path, name, name_len, &ref))
> > +		return NULL;
> > +	return ref;
> > +}
> > +
> > +int btrfs_get_inode_ref_index(struct btrfs_trans_handle *trans,
> > +			      struct btrfs_root *root,
> > +			      struct btrfs_path *path,
> > +			      const char *name, int name_len,
> > +			      u64 inode_objectid, u64 ref_objectid, int mod,
> > +			      u64 *ret_index)
> > +{
> > +	struct btrfs_inode_ref *ref1;
> > +	struct btrfs_inode_extref *ref2;
>                                    ^^^^
> Please, use "ref" and ("eref" or "extref").
> 
> > +	int ins_len = mod < 0 ? -1 : 0;
> > +	int cow = mod != 0;
> > +
> > +	ref1 = btrfs_lookup_inode_ref(trans, root, path, name, name_len,
> > +				      inode_objectid, ref_objectid, ins_len,
> > +				      cow);
> > +	if (IS_ERR(ref1))
> > +		return PTR_ERR(ref1);
> > +
> > +	if (ref1 != NULL) {
> > +		*ret_index = btrfs_inode_ref_index(path->nodes[0], ref1);
> > +		return 0;
> > +	}
> > +
> > +	btrfs_release_path(path);
> > +
> > +	ref2 = btrfs_lookup_inode_extref(trans, root, path, name,
> > +					 name_len, inode_objectid,
> > +					 ref_objectid, ins_len, cow);
> > +	if (IS_ERR(ref2))
> > +		return PTR_ERR(ref2);
> > +
> > +	if (ref2) {
> > +		*ret_index = btrfs_inode_extref_index(path->nodes[0], ref2);
> > +		return 0;
> > +	}
> > +
> > +	return -ENOENT;
> > +}
> > +
> > +int btrfs_del_inode_extref(struct btrfs_trans_handle *trans,
> >  			   struct btrfs_root *root,
> >  			   const char *name, int name_len,
> >  			   u64 inode_objectid, u64 ref_objectid, u64 *index)
> >  {
> >  	struct btrfs_path *path;
> >  	struct btrfs_key key;
> > +	struct btrfs_inode_extref *ref2;
>                                    ^^^^
> 
> > +	struct extent_buffer *leaf;
> > +	int ret;
> > +
> > +	key.objectid = inode_objectid;
> > +	btrfs_set_key_type(&key, BTRFS_INODE_EXTREF_KEY);
> > +	key.offset = btrfs_extref_key_off(ref_objectid, name, name_len);
> > +
> > +	path = btrfs_alloc_path();
> > +	if (!path)
> > +		return -ENOMEM;
> > +
> > +	path->leave_spinning = 1;
> > +
> > +	ret = btrfs_search_slot(trans, root, &key, path, -1, 1);
> > +	if (ret > 0) {
> > +		ret = -ENOENT;
> > +		goto out;
> > +	} else if (ret < 0) {
> > +		goto out;
> > +	}
> 
> Clearer (to me):
> 
> if (ret > 0)
> 	ret = -ENOENT;
> if (ret < 0)
> 	goto out;
Agreed, changed.
> 
> > +
> > +	/*
> > +	 * Sanity check - did we find the right item for this name?
> > +	 * This should always succeed so error here will make the FS
> > +	 * readonly.
> > +	 */
> > +	if (!find_name_in_ext_backref(path, name, name_len, &ref2)) {
> > +		btrfs_std_error(root->fs_info, -ENOENT);
> > +		ret = -EROFS;
> 
> Simply returning -EROFS doesn''t make the fs read-only, does it?
I''d
> suggest to return -EIO and drop the comment above.
Actually, the btrfs_std_error() line above that makes the FS readonly (in
which case EROFS is a good return value).

Btw, I think that since is this a request to delete the item that not
finding it in the tree (even with the possiblity of collision handling) is
still a possible corruption.

> > +		goto out;
> > +	}
> > +
> > +	leaf = path->nodes[0];
> > +	if (index)
> > +		*index = btrfs_inode_extref_index(leaf, ref2);
> > +
> > +	ret = btrfs_del_item(trans, root, path);
> > +
> > +out:
> > +	btrfs_free_path(path);
> > +
> > +	return ret;
> > +}
> > +
> > +int btrfs_del_inode_ref(struct btrfs_trans_handle *trans,
> > +			struct btrfs_root *root,
> > +			const char *name, int name_len,
> > +			u64 inode_objectid, u64 ref_objectid, u64 *index)
> > +{
> > +	struct btrfs_path *path;
> > +	struct btrfs_key key;
> >  	struct btrfs_inode_ref *ref;
> >  	struct extent_buffer *leaf;
> >  	unsigned long ptr;
> >  	unsigned long item_start;
> >  	u32 item_size;
> >  	u32 sub_item_len;
> > -	int ret;
> > +	int ret, search_ext_refs = 0;
>                ^^^^^^^^^^^^^^^^^^^^^
> Documentation/CodingStyle:
> --
> To this end, use just one data declaration per line (no commas for
> multiple data declarations).
> --
> 
> You do this multiple times in your whole patch set.
Sure, I can change that. There''s lots of kernel code that blatantly
ignores
this but I suppose that''s no excuse...

> >  	int del_len = name_len + sizeof(*ref);
> >  
> >  	key.objectid = inode_objectid;
> > @@ -105,12 +261,14 @@ int btrfs_del_inode_ref(struct
btrfs_trans_handle *trans,
> >  	ret = btrfs_search_slot(trans, root, &key, path, -1, 1);
> >  	if (ret > 0) {
> >  		ret = -ENOENT;
> > +		search_ext_refs = 1;
> >  		goto out;
> >  	} else if (ret < 0) {
> >  		goto out;
> >  	}
> >  	if (!find_name_in_backref(path, name, name_len, &ref)) {
> >  		ret = -ENOENT;
> > +		search_ext_refs = 1;
> >  		goto out;
> >  	}
> >  	leaf = path->nodes[0];
> > @@ -132,6 +290,59 @@ int btrfs_del_inode_ref(struct btrfs_trans_handle
*trans,
> >  				  item_size - sub_item_len, 1);
> >  out:
> >  	btrfs_free_path(path);
> > +
> > +	if (search_ext_refs) {
> 
> As far as I see it, you can drop search_ext_refs and simply go this way
> on ret == -ENOENT.
I actually made it explicit on purpose. If you look at the function
there''s
several places where ret can be set and then bubbled down. Instead of
depending on the called functions never setting -ENOENT (and future
programmers to get that it''s a ''magic'' value), I
prefer to just explicitely
set a single int variable.

> > +		/* 
>                   ^
> Catched by accident: trailing whitespace.
Oh thanks.
> 
> > +		 * No refs were found, or we could not find the
> > +		 * name in our ref array. Find and remove the extended
> > +		 * inode ref then.
> > +		 */
> > +		return btrfs_del_inode_extref(trans, root, name, name_len,
> > +					      inode_objectid, ref_objectid, index);
> > +	}
> > +
> > +	return ret;
> > +}
> > +
> > +static int btrfs_insert_inode_extref(struct btrfs_trans_handle
*trans,
> > +				     struct btrfs_root *root,
> > +				     const char *name, int name_len,
> > +				     u64 inode_objectid, u64 ref_objectid, u64 index)
> 
> Are we missing a check against BTRFS_LINK_MAX in this function?
No, the caller is supposed to check this, in theory before starting a
transaction to change the FS. So btrfs_link() does the checking for us.

> > +{
> > +	struct btrfs_path *path;
> > +	struct btrfs_key key;
> > +	struct btrfs_inode_extref *ref;
> extref
> 
> > +	unsigned long ptr;
> > +	int ret;
> > +	int ins_len = name_len + sizeof(*ref);
> > +
> > +	key.objectid = inode_objectid;
> > +	key.type = BTRFS_INODE_EXTREF_KEY;
> > +	key.offset = btrfs_extref_key_off(ref_objectid, name, name_len);
> > +
> > +	path = btrfs_alloc_path();
> > +	if (!path)
> > +		return -ENOMEM;
> > +
> > +	path->leave_spinning = 1;
> > +	ret = btrfs_insert_empty_item(trans, root, path, &key,
> > +				      ins_len);
> > +	if (ret < 0)
> > +		goto out;
> > +
> > +	ref = btrfs_item_ptr(path->nodes[0], path->slots[0],
> > +			     struct btrfs_inode_extref);
> > +
> > +	btrfs_set_inode_extref_name_len(path->nodes[0], ref, name_len);
> > +	btrfs_set_inode_extref_index(path->nodes[0], ref, index);
> > +	btrfs_set_inode_extref_parent(path->nodes[0], ref, ref_objectid);
> > +
> > +	ptr = (unsigned long)&ref->name;
> > +	write_extent_buffer(path->nodes[0], name, ptr, name_len);
> > +	btrfs_mark_buffer_dirty(path->nodes[0]);
> > +
> > +out:
> > +	btrfs_free_path(path);
> >  	return ret;
> >  }
> >  
> > @@ -189,6 +400,19 @@ int btrfs_insert_inode_ref(struct
btrfs_trans_handle *trans,
> >  
> >  out:
> >  	btrfs_free_path(path);
> > +
> > +	if (ret == -EMLINK) {
> > +		struct btrfs_super_block *disk_super =
root->fs_info->super_copy;
> > +		/* We ran out of space in the ref array. Need to
> > +		 * add an extended ref. */
> > +		if (btrfs_super_incompat_flags(disk_super)
> > +		    & BTRFS_FEATURE_INCOMPAT_EXTENDED_IREF)
> > +			ret = btrfs_insert_inode_extref(trans, root, name,
> > +							name_len,
> > +							inode_objectid,
> > +							ref_objectid, index);
> > +	}
> > +
> >  	return ret;
> >  }
> >  
> > diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
> > index 892b347..0ca525d 100644
> > --- a/fs/btrfs/inode.c
> > +++ b/fs/btrfs/inode.c
> > @@ -2689,7 +2689,6 @@ static struct btrfs_trans_handle
*__unlink_start_trans(struct inode *dir,
> >  	struct btrfs_trans_handle *trans;
> >  	struct btrfs_root *root = BTRFS_I(dir)->root;
> >  	struct btrfs_path *path;
> > -	struct btrfs_inode_ref *ref;
> >  	struct btrfs_dir_item *di;
> >  	struct inode *inode = dentry->d_inode;
> >  	u64 index;
> > @@ -2803,17 +2802,16 @@ static struct btrfs_trans_handle
*__unlink_start_trans(struct inode *dir,
> >  	}
> >  	btrfs_release_path(path);
> >  
> > -	ref = btrfs_lookup_inode_ref(trans, root, path,
> > -				dentry->d_name.name, dentry->d_name.len,
> > -				ino, dir_ino, 0);
> > -	if (IS_ERR(ref)) {
> > -		err = PTR_ERR(ref);
> > +	ret = btrfs_get_inode_ref_index(trans, root, path,
dentry->d_name.name,
> > +					dentry->d_name.len, ino, dir_ino, 0, &index);
> 
> This line is >80 chars. Please use scripts/checkpatch.pl to catch those.
Fixed.

> > +	if (ret) {
> > +		err = ret;
> >  		goto out;
> >  	}
> > -	BUG_ON(!ref);
> > +
> >  	if (check_path_shared(root, path))
> >  		goto out;
> > -	index = btrfs_inode_ref_index(path->nodes[0], ref);
> > +
> >  	btrfs_release_path(path);
> >  
> >  	/*
> > @@ -4484,6 +4482,10 @@ static struct inode *btrfs_new_inode(struct
btrfs_trans_handle *trans,
> >  	btrfs_set_key_type(&key[0], BTRFS_INODE_ITEM_KEY);
> >  	key[0].offset = 0;
> >  
> > +	/*
> > +	 * Start new inodes with a V1 ref. This is slightly more
> > +	 * efficient for small numbers of hard links.
> > +	 */
> 
> I''d drop that comment. extref is no "V2" kind of ref.
But you can leave
> it in if you feel it helps anyone later. 
Yeah at one point they were called ''v2'' due to developer
laziness :)

I''ve updated the comment:

	/*
	 * Start new inodes with an inode_ref. This is slightly more
	 * efficient for small numbers of hard links since they will
	 * be packed into one item. Extended refs will kick in if we
	 * add more hard links than can fit in the ref item.
	 */


It could probably still be improved beyond that. Mostly I want to point out
that we never start with extended refs so that anyone looking at the code
will understand the ''flow'' of btrfs inode references.

> >  	key[1].objectid = objectid;
> >  	btrfs_set_key_type(&key[1], BTRFS_INODE_REF_KEY);
> >  	key[1].offset = ref_objectid;
> > @@ -4777,7 +4779,7 @@ static int btrfs_link(struct dentry *old_dentry,
struct inode *dir,
> >  	if (root->objectid != BTRFS_I(inode)->root->objectid)
> >  		return -EXDEV;
> >  
> > -	if (inode->i_nlink == ~0U)
> > +	if (inode->i_nlink >= BTRFS_LINK_MAX)
> >  		return -EMLINK;
> >  
> >  	err = btrfs_set_inode_index(dir, &index);
> 
> Reviewed-by: Jan Schmidt <list.btrfs@jan-o-sch.net>
Thanks again Jan. I''ll be sure to CC you on my next drop!
	--Mark

--
Mark Fasheh
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs"
in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Jan Schmidt

2012-Apr-25 10:19 UTC

head link

Re: [PATCH 1/3] btrfs: extended inode refs

On 25.04.2012 00:23, Mark Fasheh wrote:> Jan, firstly thanks very much for the thorough review! I wanted to make
sure
> I got the collision handling going before addressing the issues you found.
> Again thanks, and my notes are below.

> On Thu, Apr 12, 2012 at 03:08:27PM +0200, Jan Schmidt wrote:
>>> +/*
>>> + * Theoretical limit is larger, but we keep this down to a sane
>>> + * value. That should limit greatly the possibility of collisions
on
>>> + * inode ref items.
>>> + */
>>> +#define BTRFS_LINK_MAX 65535U
>>
>> Do we really want an artificial limit like that?
> 
> I took a look at other file systems and this seems to be a pretty
reasonable
> maximum. Ext4 and XFS in particular have similar limits.
> 
> Granted, btrfs should be better equipped to deal with large numbers of hard
> links - at least in the area of consistency checking.
> 
> That said, I think keeping this down to "only" 65 thousand hard
links per
> inode should work out fine. Also it''s not a large change to
increase or
> remove the limit should we have to.
Agreed.
>>> -struct btrfs_inode_ref *
>>> +static int find_name_in_ext_backref(struct btrfs_path *path, const
char *name,
>>> +				   int name_len, struct btrfs_inode_extref **ref_ret)
>>> +{
>>> +	struct extent_buffer *leaf;
>>> +	struct btrfs_inode_extref *ref;
>>> +	unsigned long ptr;
>>> +	unsigned long name_ptr;
>>> +	u32 item_size;
>>> +
>>> +	leaf = path->nodes[0];
>>> +	item_size = btrfs_item_size_nr(leaf, path->slots[0]);
>>> +	ptr = btrfs_item_ptr_offset(leaf, path->slots[0]);
>>> +	ref = (struct btrfs_inode_extref *)ptr;
>>> +
>>> +	/*
>>> +	 * There isn''t actually anything to "search"
in an extended
>>> +	 * backref - one name is directly attached to each
>>> +	 * item. Instead what we''re really doing here is some
sort of
>>> +	 * sanity check - does the name exist where it should have
>>> +	 * been found. If all is well, we''ll return success and
the
>>> +	 * inode ref object.
>>> +	 */
>>
>> In that case, the name is not a good choice. However, if we change how
>> we deal with hash collisions, the name might be right and the comment
>> might go away as we really look through more than one item. If we leave
>> it like this, please rename that function.
> 
> Yeah, great catch. In the newer version as you''ve noted, it *is*
actually a
> ''find me the name'' function, so the comment (and
resulting EROFS) have been
> removed.
> 
> 
>>> +	name_ptr = (unsigned long)(&ref->name);
>>> +	if (btrfs_inode_extref_name_len(leaf, ref) == name_len
>>> +	    && (memcmp_extent_buffer(leaf, name, name_ptr,
name_len) == 0)) {
>>> +		*ref_ret = ref;
>>> +		return 1;
>>> +	}
>>> +	return 0;
>>> +}
>>> +
>>> +/*
>>> + * Figure the key offset of an extended inode ref
>>> + */
>>> +u64 btrfs_extref_key_off(u64 parent_objectid, const char *name,
int name_len)
>>
>> I''d suggest to call this one btrfs_extref_hash and move it to
>> fs/btrfs/hash.h. That way, we can also drop the include for
>> <linux/crc32c.h> above.
> 
> Done.
> 
> 
>>> +{
>>> +	return (u64) crc32c(parent_objectid, name, name_len);
>>> +}
>>> +
>>> +static struct btrfs_inode_ref *
>>>  btrfs_lookup_inode_ref(struct btrfs_trans_handle *trans,
>>> -			struct btrfs_root *root,
>>> -			struct btrfs_path *path,
>>> -			const char *name, int name_len,
>>> -			u64 inode_objectid, u64 ref_objectid, int mod)
>>> +		       struct btrfs_root *root,
>>> +		       struct btrfs_path *path,
>>> +		       const char *name, int name_len,
>>> +		       u64 inode_objectid, u64 ref_objectid, int ins_len,
>>> +		       int cow)
>>>  {
>>> +	int ret;
>>>  	struct btrfs_key key;
>>>  	struct btrfs_inode_ref *ref;
>>> -	int ins_len = mod < 0 ? -1 : 0;
>>> -	int cow = mod != 0;
>>> -	int ret;
>>>  
>>>  	key.objectid = inode_objectid;
>>>  	key.type = BTRFS_INODE_REF_KEY;
>>> @@ -76,20 +115,137 @@ btrfs_lookup_inode_ref(struct
btrfs_trans_handle *trans,
>>>  	return ref;
>>>  }
>>>  
>>> -int btrfs_del_inode_ref(struct btrfs_trans_handle *trans,
>>> +struct btrfs_inode_extref *
>>> +btrfs_lookup_inode_extref(struct btrfs_trans_handle *trans,
>>> +			  struct btrfs_root *root,
>>> +			  struct btrfs_path *path,
>>> +			  const char *name, int name_len,
>>> +			  u64 inode_objectid, u64 ref_objectid, int ins_len,
>>> +			  int cow)
>>> +{
>>> +	int ret;
>>> +	struct btrfs_key key;
>>> +	struct btrfs_inode_extref *ref;
>>                                    ^^^
>> Please use some other identifier than the one we use for struct
>> btrfs_inode_ref. Suggestion: extref or eref.
> 
> Yeah I''ll make sure they''re all changed to extref. Makes
it much easier to
> read.
> 
> 
>>> +
>>> +	key.objectid = inode_objectid;
>>> +	key.type = BTRFS_INODE_EXTREF_KEY;
>>> +	key.offset = btrfs_extref_key_off(ref_objectid, name, name_len);
>>                      ^^^^^^^^^^^^^^^^^^^^
>> Get''s clearer when that is called btrfs_extref_hash.
>>
>>> +
>>> +	ret = btrfs_search_slot(trans, root, &key, path, ins_len,
cow);
>>> +	if (ret < 0)
>>> +		return ERR_PTR(ret);
>>> +	if (ret > 0)
>>> +		return NULL;
>>> +	if (!find_name_in_ext_backref(path, name, name_len, &ref))
>>> +		return NULL;
>>> +	return ref;
>>> +}
>>> +
>>> +int btrfs_get_inode_ref_index(struct btrfs_trans_handle *trans,
>>> +			      struct btrfs_root *root,
>>> +			      struct btrfs_path *path,
>>> +			      const char *name, int name_len,
>>> +			      u64 inode_objectid, u64 ref_objectid, int mod,
>>> +			      u64 *ret_index)
>>> +{
>>> +	struct btrfs_inode_ref *ref1;
>>> +	struct btrfs_inode_extref *ref2;
>>                                    ^^^^
>> Please, use "ref" and ("eref" or
"extref").
>>
>>> +	int ins_len = mod < 0 ? -1 : 0;
>>> +	int cow = mod != 0;
>>> +
>>> +	ref1 = btrfs_lookup_inode_ref(trans, root, path, name, name_len,
>>> +				      inode_objectid, ref_objectid, ins_len,
>>> +				      cow);
>>> +	if (IS_ERR(ref1))
>>> +		return PTR_ERR(ref1);
>>> +
>>> +	if (ref1 != NULL) {
>>> +		*ret_index = btrfs_inode_ref_index(path->nodes[0], ref1);
>>> +		return 0;
>>> +	}
>>> +
>>> +	btrfs_release_path(path);
>>> +
>>> +	ref2 = btrfs_lookup_inode_extref(trans, root, path, name,
>>> +					 name_len, inode_objectid,
>>> +					 ref_objectid, ins_len, cow);
>>> +	if (IS_ERR(ref2))
>>> +		return PTR_ERR(ref2);
>>> +
>>> +	if (ref2) {
>>> +		*ret_index = btrfs_inode_extref_index(path->nodes[0], ref2);
>>> +		return 0;
>>> +	}
>>> +
>>> +	return -ENOENT;
>>> +}
>>> +
>>> +int btrfs_del_inode_extref(struct btrfs_trans_handle *trans,
>>>  			   struct btrfs_root *root,
>>>  			   const char *name, int name_len,
>>>  			   u64 inode_objectid, u64 ref_objectid, u64 *index)
>>>  {
>>>  	struct btrfs_path *path;
>>>  	struct btrfs_key key;
>>> +	struct btrfs_inode_extref *ref2;
>>                                    ^^^^
>>
>>> +	struct extent_buffer *leaf;
>>> +	int ret;
>>> +
>>> +	key.objectid = inode_objectid;
>>> +	btrfs_set_key_type(&key, BTRFS_INODE_EXTREF_KEY);
>>> +	key.offset = btrfs_extref_key_off(ref_objectid, name, name_len);
>>> +
>>> +	path = btrfs_alloc_path();
>>> +	if (!path)
>>> +		return -ENOMEM;
>>> +
>>> +	path->leave_spinning = 1;
>>> +
>>> +	ret = btrfs_search_slot(trans, root, &key, path, -1, 1);
>>> +	if (ret > 0) {
>>> +		ret = -ENOENT;
>>> +		goto out;
>>> +	} else if (ret < 0) {
>>> +		goto out;
>>> +	}
>>
>> Clearer (to me):
>>
>> if (ret > 0)
>> 	ret = -ENOENT;
>> if (ret < 0)
>> 	goto out;
> 
> Agreed, changed.
> 
>>
>>> +
>>> +	/*
>>> +	 * Sanity check - did we find the right item for this name?
>>> +	 * This should always succeed so error here will make the FS
>>> +	 * readonly.
>>> +	 */
>>> +	if (!find_name_in_ext_backref(path, name, name_len, &ref2)) {
>>> +		btrfs_std_error(root->fs_info, -ENOENT);
>>> +		ret = -EROFS;
>>
>> Simply returning -EROFS doesn''t make the fs read-only, does
it? I''d
>> suggest to return -EIO and drop the comment above.
> 
> Actually, the btrfs_std_error() line above that makes the FS readonly (in
> which case EROFS is a good return value).
Got it. -EROFS is a good choice, then.
> Btw, I think that since is this a request to delete the item that not
> finding it in the tree (even with the possiblity of collision handling) is
> still a possible corruption.
> 
> 
>>> +		goto out;
>>> +	}
>>> +
>>> +	leaf = path->nodes[0];
>>> +	if (index)
>>> +		*index = btrfs_inode_extref_index(leaf, ref2);
>>> +
>>> +	ret = btrfs_del_item(trans, root, path);
>>> +
>>> +out:
>>> +	btrfs_free_path(path);
>>> +
>>> +	return ret;
>>> +}
>>> +
>>> +int btrfs_del_inode_ref(struct btrfs_trans_handle *trans,
>>> +			struct btrfs_root *root,
>>> +			const char *name, int name_len,
>>> +			u64 inode_objectid, u64 ref_objectid, u64 *index)
>>> +{
>>> +	struct btrfs_path *path;
>>> +	struct btrfs_key key;
>>>  	struct btrfs_inode_ref *ref;
>>>  	struct extent_buffer *leaf;
>>>  	unsigned long ptr;
>>>  	unsigned long item_start;
>>>  	u32 item_size;
>>>  	u32 sub_item_len;
>>> -	int ret;
>>> +	int ret, search_ext_refs = 0;
>>                ^^^^^^^^^^^^^^^^^^^^^
>> Documentation/CodingStyle:
>> --
>> To this end, use just one data declaration per line (no commas for
>> multiple data declarations).
>> --
>>
>> You do this multiple times in your whole patch set.
> 
> Sure, I can change that. There''s lots of kernel code that
blatantly ignores
> this but I suppose that''s no excuse...
> 
> 
>>>  	int del_len = name_len + sizeof(*ref);
>>>  
>>>  	key.objectid = inode_objectid;
>>> @@ -105,12 +261,14 @@ int btrfs_del_inode_ref(struct
btrfs_trans_handle *trans,
>>>  	ret = btrfs_search_slot(trans, root, &key, path, -1, 1);
>>>  	if (ret > 0) {
>>>  		ret = -ENOENT;
>>> +		search_ext_refs = 1;
>>>  		goto out;
>>>  	} else if (ret < 0) {
>>>  		goto out;
>>>  	}
>>>  	if (!find_name_in_backref(path, name, name_len, &ref)) {
>>>  		ret = -ENOENT;
>>> +		search_ext_refs = 1;
>>>  		goto out;
>>>  	}
>>>  	leaf = path->nodes[0];
>>> @@ -132,6 +290,59 @@ int btrfs_del_inode_ref(struct
btrfs_trans_handle *trans,
>>>  				  item_size - sub_item_len, 1);
>>>  out:
>>>  	btrfs_free_path(path);
>>> +
>>> +	if (search_ext_refs) {
>>
>> As far as I see it, you can drop search_ext_refs and simply go this way
>> on ret == -ENOENT.
> 
> I actually made it explicit on purpose. If you look at the function
there''s
> several places where ret can be set and then bubbled down. Instead of
> depending on the called functions never setting -ENOENT (and future
> programmers to get that it''s a ''magic'' value), I
prefer to just explicitely
> set a single int variable.
Okay, sounds reasonable.
>>> +		/* 
>>                   ^
>> Catched by accident: trailing whitespace.
> 
> Oh thanks.
> 
>>
>>> +		 * No refs were found, or we could not find the
>>> +		 * name in our ref array. Find and remove the extended
>>> +		 * inode ref then.
>>> +		 */
>>> +		return btrfs_del_inode_extref(trans, root, name, name_len,
>>> +					      inode_objectid, ref_objectid, index);
>>> +	}
>>> +
>>> +	return ret;
>>> +}
>>> +
>>> +static int btrfs_insert_inode_extref(struct btrfs_trans_handle
*trans,
>>> +				     struct btrfs_root *root,
>>> +				     const char *name, int name_len,
>>> +				     u64 inode_objectid, u64 ref_objectid, u64 index)
>>
>> Are we missing a check against BTRFS_LINK_MAX in this function?
> 
> No, the caller is supposed to check this, in theory before starting a
> transaction to change the FS. So btrfs_link() does the checking for us.
Is that worth a comment? Or is that clear to anybody familiar with such
things anyway (in which case I''d not add that comment)?
>>> +{
>>> +	struct btrfs_path *path;
>>> +	struct btrfs_key key;
>>> +	struct btrfs_inode_extref *ref;
>> extref
>>
>>> +	unsigned long ptr;
>>> +	int ret;
>>> +	int ins_len = name_len + sizeof(*ref);
>>> +
>>> +	key.objectid = inode_objectid;
>>> +	key.type = BTRFS_INODE_EXTREF_KEY;
>>> +	key.offset = btrfs_extref_key_off(ref_objectid, name, name_len);
>>> +
>>> +	path = btrfs_alloc_path();
>>> +	if (!path)
>>> +		return -ENOMEM;
>>> +
>>> +	path->leave_spinning = 1;
>>> +	ret = btrfs_insert_empty_item(trans, root, path, &key,
>>> +				      ins_len);
>>> +	if (ret < 0)
>>> +		goto out;
>>> +
>>> +	ref = btrfs_item_ptr(path->nodes[0], path->slots[0],
>>> +			     struct btrfs_inode_extref);
>>> +
>>> +	btrfs_set_inode_extref_name_len(path->nodes[0], ref,
name_len);
>>> +	btrfs_set_inode_extref_index(path->nodes[0], ref, index);
>>> +	btrfs_set_inode_extref_parent(path->nodes[0], ref,
ref_objectid);
>>> +
>>> +	ptr = (unsigned long)&ref->name;
>>> +	write_extent_buffer(path->nodes[0], name, ptr, name_len);
>>> +	btrfs_mark_buffer_dirty(path->nodes[0]);
>>> +
>>> +out:
>>> +	btrfs_free_path(path);
>>>  	return ret;
>>>  }
>>>  
>>> @@ -189,6 +400,19 @@ int btrfs_insert_inode_ref(struct
btrfs_trans_handle *trans,
>>>  
>>>  out:
>>>  	btrfs_free_path(path);
>>> +
>>> +	if (ret == -EMLINK) {
>>> +		struct btrfs_super_block *disk_super =
root->fs_info->super_copy;
>>> +		/* We ran out of space in the ref array. Need to
>>> +		 * add an extended ref. */
>>> +		if (btrfs_super_incompat_flags(disk_super)
>>> +		    & BTRFS_FEATURE_INCOMPAT_EXTENDED_IREF)
>>> +			ret = btrfs_insert_inode_extref(trans, root, name,
>>> +							name_len,
>>> +							inode_objectid,
>>> +							ref_objectid, index);
>>> +	}
>>> +
>>>  	return ret;
>>>  }
>>>  
>>> diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
>>> index 892b347..0ca525d 100644
>>> --- a/fs/btrfs/inode.c
>>> +++ b/fs/btrfs/inode.c
>>> @@ -2689,7 +2689,6 @@ static struct btrfs_trans_handle
*__unlink_start_trans(struct inode *dir,
>>>  	struct btrfs_trans_handle *trans;
>>>  	struct btrfs_root *root = BTRFS_I(dir)->root;
>>>  	struct btrfs_path *path;
>>> -	struct btrfs_inode_ref *ref;
>>>  	struct btrfs_dir_item *di;
>>>  	struct inode *inode = dentry->d_inode;
>>>  	u64 index;
>>> @@ -2803,17 +2802,16 @@ static struct btrfs_trans_handle
*__unlink_start_trans(struct inode *dir,
>>>  	}
>>>  	btrfs_release_path(path);
>>>  
>>> -	ref = btrfs_lookup_inode_ref(trans, root, path,
>>> -				dentry->d_name.name, dentry->d_name.len,
>>> -				ino, dir_ino, 0);
>>> -	if (IS_ERR(ref)) {
>>> -		err = PTR_ERR(ref);
>>> +	ret = btrfs_get_inode_ref_index(trans, root, path,
dentry->d_name.name,
>>> +					dentry->d_name.len, ino, dir_ino, 0, &index);
>>
>> This line is >80 chars. Please use scripts/checkpatch.pl to catch
those.
> 
> Fixed.
> 
> 
>>> +	if (ret) {
>>> +		err = ret;
>>>  		goto out;
>>>  	}
>>> -	BUG_ON(!ref);
>>> +
>>>  	if (check_path_shared(root, path))
>>>  		goto out;
>>> -	index = btrfs_inode_ref_index(path->nodes[0], ref);
>>> +
>>>  	btrfs_release_path(path);
>>>  
>>>  	/*
>>> @@ -4484,6 +4482,10 @@ static struct inode *btrfs_new_inode(struct
btrfs_trans_handle *trans,
>>>  	btrfs_set_key_type(&key[0], BTRFS_INODE_ITEM_KEY);
>>>  	key[0].offset = 0;
>>>  
>>> +	/*
>>> +	 * Start new inodes with a V1 ref. This is slightly more
>>> +	 * efficient for small numbers of hard links.
>>> +	 */
>>
>> I''d drop that comment. extref is no "V2" kind of
ref. But you can leave
>> it in if you feel it helps anyone later. 
> 
> Yeah at one point they were called ''v2'' due to developer
laziness :)
> 
> I''ve updated the comment:
> 
> 	/*
> 	 * Start new inodes with an inode_ref. This is slightly more
> 	 * efficient for small numbers of hard links since they will
> 	 * be packed into one item. Extended refs will kick in if we
> 	 * add more hard links than can fit in the ref item.
> 	 */
> 
> 
> It could probably still be improved beyond that. Mostly I want to point out
> that we never start with extended refs so that anyone looking at the code
> will understand the ''flow'' of btrfs inode references.
I think its fine like it stands above.

-Jan
>>>  	key[1].objectid = objectid;
>>>  	btrfs_set_key_type(&key[1], BTRFS_INODE_REF_KEY);
>>>  	key[1].offset = ref_objectid;
>>> @@ -4777,7 +4779,7 @@ static int btrfs_link(struct dentry
*old_dentry, struct inode *dir,
>>>  	if (root->objectid != BTRFS_I(inode)->root->objectid)
>>>  		return -EXDEV;
>>>  
>>> -	if (inode->i_nlink == ~0U)
>>> +	if (inode->i_nlink >= BTRFS_LINK_MAX)
>>>  		return -EMLINK;
>>>  
>>>  	err = btrfs_set_inode_index(dir, &index);
>>
>> Reviewed-by: Jan Schmidt <list.btrfs@jan-o-sch.net>
> 
> Thanks again Jan. I''ll be sure to CC you on my next drop!
> 	--Mark
> 
> --
> Mark Fasheh--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs"
in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Mark Fasheh

2012-May-01 18:39 UTC

head link

Re: [PATCH 2/3] btrfs: extended inode refs

On Thu, Apr 12, 2012 at 05:53:15PM +0200, Jan Schmidt
wrote:> Hi Mark,
> 
> While reading 3/3 I stumbled across one more thing in this one:
> 
> On 05.04.2012 22:09, Mark Fasheh wrote:
> > +int btrfs_find_one_extref(struct btrfs_root *root, u64
inode_objectid,
> > +			  u64 start_off, struct btrfs_path *path,
> > +			  struct btrfs_inode_extref **ret_ref, u64 *found_off)
> > +{
> > +	int ret, slot;
> > +	struct btrfs_key key, found_key;
> > +	struct btrfs_inode_extref *ref;
> > +	struct extent_buffer *leaf;
> > +	struct btrfs_item *item;
> > +	unsigned long ptr;
> >  
> > -/*
> > - * There are a few corners where the link count of the file
can''t
> > - * be properly maintained during replay.  So, instead of adding
> > - * lots of complexity to the log code, we just scan the backrefs
> > - * for any file that has been through replay.
> > - *
> > - * The scan will update the link count on the inode to reflect the
> > - * number of back refs found.  If it goes down to zero, the iput
> > - * will free the inode.
> > - */
> > -static noinline int fixup_inode_link_count(struct btrfs_trans_handle
*trans,
> > -					   struct btrfs_root *root,
> > -					   struct inode *inode)
> > +	key.objectid = inode_objectid;
> > +	btrfs_set_key_type(&key, BTRFS_INODE_EXTREF_KEY);
> > +	key.offset = start_off;
> > +
> > +	ret = btrfs_search_slot(NULL, root, &key, path, 0, 0);
> > +	if (ret < 0)
> > +		goto out;
> > +
> > +	while (1) {
> > +		leaf = path->nodes[0];
> > +		slot = path->slots[0];
> > +		if (slot >= btrfs_header_nritems(leaf)) {
> > +			/*
> > +			 * If the item at offset is not found,
> > +			 * btrfs_search_slot will point us to the slot
> > +			 * where it should be inserted. In our case
> > +			 * that will be the slot directly before the
> > +			 * next INODE_REF_KEY_V2 item. In the case
> > +			 * that we''re pointing to the last slot in a
> > +			 * leaf, we must move one leaf over.
> > +			 */
> > +			ret = btrfs_next_leaf(root, path);
> > +			if (ret) {
> > +				if (ret >= 1)
> > +					ret = -ENOENT;
> > +				break;
> > +			}
> > +			continue;
> > +		}
> > +
> > +		item = btrfs_item_nr(leaf, slot);
> > +		btrfs_item_key_to_cpu(leaf, &found_key, slot);
> > +
> > +		/*
> > +		 * Check that we''re still looking at an extended ref key
for
> > +		 * this particular objectid. If we have different
> > +		 * objectid or type then there are no more to be found
> > +		 * in the tree and we can exit.
> > +		 */
> > +		ret = -ENOENT;
> > +		if (found_key.objectid != inode_objectid)
> > +			break;
> > +		if (btrfs_key_type(&found_key) != BTRFS_INODE_EXTREF_KEY)
> > +			break;
> > +
> > +		ret = 0;
> > +		ptr = btrfs_item_ptr_offset(leaf, path->slots[0]);
> > +		ref = (struct btrfs_inode_extref *)ptr;
> > +		*ret_ref = ref;
> > +		if (found_off)
> > +			*found_off = found_key.offset + 1;
>                                                       ^^^
> It''s evil to call it "found offset" an then return one
larger than the
> offset found. No caller would ever expect this.
> 
> > +		break;
> > +	}
> > +
> > +out:
> > +	return ret;
> > +}
> > +
> > +static int count_inode_extrefs(struct btrfs_root *root,
> > +			       struct inode *inode, struct btrfs_path *path)
> > +{
> > +	int ret;
> > +	unsigned int nlink = 0;
> > +	u64 inode_objectid = btrfs_ino(inode);
> > +	u64 offset = 0;
> > +	struct btrfs_inode_extref *ref;
> > +
> > +	while (1) {
> > +		ret = btrfs_find_one_extref(root, inode_objectid, offset, path,
> > +					    &ref, &offset);
> > +		if (ret)
> > +			break;
> > +
> > +		nlink++;
> > +		offset++;
> 		^^^^^^^^
> Huh. See? The caller expected to get the offset found from
> btrfs_find_one_extref. As it stands you might be missing the very next key.
Oh yeah that was totally broken btw. Fixed now, but forgot to mention that
:)

btrfs_find_one_extref() now returns the actual found offset (and the callers
can just increment if they''re doing a search).
	--Mark


--
Mark Fasheh
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs"
in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Mark Fasheh

2012-May-03 23:12 UTC

head link

Re: [PATCH 2/3] btrfs: extended inode refs

On Thu, Apr 12, 2012 at 03:08:35PM +0200, Jan Schmidt
wrote:> On 05.04.2012 22:09, Mark Fasheh wrote:
> > Teach tree-log.c about extended inode refs. In particular, we have to
adjust
> > the behavior of inode ref replay as well as log tree recovery to
account for
> > the existence of extended refs.
> > 
> > Signed-off-by: Mark Fasheh <mfasheh@suse.de>
> > ---
> >  fs/btrfs/tree-log.c |  320
+++++++++++++++++++++++++++++++++++++++++---------
> >  fs/btrfs/tree-log.h |    4 +
> >  2 files changed, 266 insertions(+), 58 deletions(-)
> > 
> > diff --git a/fs/btrfs/tree-log.c b/fs/btrfs/tree-log.c
> > index 966cc74..d69b07a 100644
> > --- a/fs/btrfs/tree-log.c
> > +++ b/fs/btrfs/tree-log.c
> > @@ -23,6 +23,7 @@
> >  #include "disk-io.h"
> >  #include "locking.h"
> >  #include "print-tree.h"
> > +#include "backref.h"
> 
> So now tree-log.c includes backref.h and backref.c includes tree-log.h,
> this is not a problem by itself, but I''d try to avoid such
dependencies.
> I''d put find_one_extref over to backref.c to solve this, also
because
> there it would be closer to inode_ref_info (which does the same for
> INODE_REFs.
Yeah good idea. I went ahead and moved find_one_extref to backref.c as you
suggest.

> > @@ -786,7 +804,6 @@ out:
> >  	return match;
> >  }
> >  
> > -
> >  /*
> >   * replay one inode back reference item found in the log tree.
> >   * eb, slot and key refer to the buffer and key found in the log
tree.
> > @@ -801,15 +818,20 @@ static noinline int add_inode_ref(struct
btrfs_trans_handle *trans,
> >  				  struct btrfs_key *key)
> >  {
> >  	struct btrfs_inode_ref *ref;
> > +	struct btrfs_inode_extref *extref;
> >  	struct btrfs_dir_item *di;
> > +	struct btrfs_key search_key;
> 
> You don''t need search_key, just use key from the parameter list as
the
> code did previously.
I force the value of search key (to look for refs) however while the key
value from the parameter list could be of any kind of inode ref.


> >  	struct inode *dir;
> >  	struct inode *inode;
> >  	unsigned long ref_ptr;
> >  	unsigned long ref_end;
> > -	char *name;
> > -	int namelen;
> > +	char *name, *victim_name;
> > +	int namelen, victim_name_len;
> 
> split
> 
> >  	int ret;
> >  	int search_done = 0;
> > +	int log_ref_ver = 0;
> > +	u64 parent_objectid, inode_objectid, ref_index;
> 
> split
done, thanks.

> > +	struct extent_buffer *leaf;
> >  
> >  	/*
> >  	 * it is possible that we didn''t log all the parent
directories
> > @@ -817,32 +839,56 @@ static noinline int add_inode_ref(struct
btrfs_trans_handle *trans,
> >  	 * copy the back ref in.  The link count fixup code will take
> >  	 * care of the rest
> >  	 */
> > -	dir = read_one_inode(root, key->offset);
> > +
> > +	if (key->type == BTRFS_INODE_EXTREF_KEY) {
> > +		log_ref_ver = 1;
>                 ^^^^^^^^^^^^^^^
> Assigned but never used.
Almost got rid of this but it turns out after some changes, I''m using
it
now :)

> > +
> > +		ref_ptr = btrfs_item_ptr_offset(eb, slot);
> > +
> > +		/* So that we don''t loop back looking for old style log
refs. */
> > +		ref_end = ref_ptr;
> > +
> > +		extref = (struct btrfs_inode_extref *) btrfs_item_ptr_offset(eb,
slot);
> > +		namelen = btrfs_inode_extref_name_len(eb, extref);
> > +		name = kmalloc(namelen, GFP_NOFS);
> 
> kmalloc may fail.
Fixed both instances of this. I''m just testing for null return from
kmalloc
and bubbling the -ENOMEM back up. The callers of add_inode_ref() will wind
up BUGing on us anyway but that''s beyond the scope of this patch.

> > +
> > +		read_extent_buffer(eb, name, (unsigned long)&extref->name,
> > +				   namelen);
> > +
> > +		ref_index = btrfs_inode_extref_index(eb, extref);
> > +		parent_objectid = btrfs_inode_extref_parent(eb, extref);
> > +	} else {
> > +		parent_objectid = key->offset;
> > +
> > +		ref_ptr = btrfs_item_ptr_offset(eb, slot);
> > +		ref_end = ref_ptr + btrfs_item_size_nr(eb, slot);
> > +
> > +		ref = (struct btrfs_inode_ref *)ref_ptr;
> > +		namelen = btrfs_inode_ref_name_len(eb, ref);
> > +		name = kmalloc(namelen, GFP_NOFS);
> > +		BUG_ON(!name);
> > +
> > +		read_extent_buffer(eb, name, (unsigned long)(ref + 1), namelen);
> > +
> > +		ref_index = btrfs_inode_ref_index(eb, ref);
> 
> The way you put it, you had to copy all the code from the else block
> down before the "goto again" case. Code duplication is error
prone
>
> Maybe you can manage to put everything between "again:" and
"goto again"
> into a separate function that can be called multiple times. For the old
> INODE_REF items that could be done in a loop then. This would also split
> an overly long function into more readable units.
Hmm, ok I pushed the variable setting code into a function for each type of
ref. That cleans things up and keeps the code centralized.
> > +	/* Same search but for extended refs */
> > +	extref = btrfs_lookup_inode_extref(NULL, root, path, name, namelen,
> > +					   inode_objectid, parent_objectid, 0,
> > +					   0);
> > +	if (extref && !IS_ERR(extref)) {
> 
> if (!IS_ERR_OR_NULL(extref))
Nice, thanks for the tip.


> > +		victim_name_len = btrfs_inode_extref_name_len(eb, extref);
> > +		victim_name = kmalloc(namelen, GFP_NOFS);
> > +		leaf = path->nodes[0];
> > +		read_extent_buffer(eb, name, (unsigned long)&extref->name,
namelen);
> > +
> > +		search_key.objectid = inode_objectid;
> > +		search_key.type = BTRFS_INODE_EXTREF_KEY;
> > +		search_key.offset = btrfs_extref_key_off(parent_objectid, name,
namelen);
> > +		if (!backref_in_log(log, &search_key, victim_name,
> > +				    victim_name_len)) {
> > +			btrfs_inc_nlink(inode);
> > +			btrfs_release_path(path);
> > +
> > +			ret = btrfs_unlink_inode(trans, root, dir,
> > +						 inode, victim_name,
> > +						 victim_name_len);
> > +		}
> > +		kfree(victim_name);
> > +		BUG_ON(ret);
> > +	}
> > +
> > +	/*
> > +	 * NOTE: we have searched root tree and checked the
> > +	 * coresponding refs, it does not need to be checked again.
> > +	 */
> > +	search_done = 1;
> 
> You moved this comment and assignment out of the "if (ret == 0)"
case.
> I''m not sure if this is still doing exactly the same now (?).
> Previously, we were executing another btrfs_search_slot,
> btrfs_lookup_dir_index_item, ... after the "goto again" case,
which
> would be skipped with this patch.
Hmm, ok you''re definitely right that the search_done line there is
broken.
Come to think of it, I''m not quite sure what the meaning of that tiny
bit of
code was. I''ll come back to this one once I''ve looked closer.

> > +
> >  	/* look for a conflicting sequence number */
> >  	di = btrfs_lookup_dir_index_item(trans, root, path, btrfs_ino(dir),
> > -					 btrfs_inode_ref_index(eb, ref),
> > -					 name, namelen, 0);
> > +					 ref_index, name, namelen, 0);
> >  	if (di && !IS_ERR(di)) {
> >  		ret = drop_one_dir_item(trans, root, path, dir, di);
> >  		BUG_ON(ret);
> > @@ -932,17 +1007,25 @@ again:
> >  
> >  insert:
> >  	/* insert our name */
> > -	ret = btrfs_add_link(trans, dir, inode, name, namelen, 0,
> > -			     btrfs_inode_ref_index(eb, ref));
> > +	ret = btrfs_add_link(trans, dir, inode, name, namelen, 0,
ref_index);
> >  	BUG_ON(ret);
> >  
> >  	btrfs_update_inode(trans, root, inode);
> >  
> >  out:
> > -	ref_ptr = (unsigned long)(ref + 1) + namelen;
> > +	ref_ptr = (unsigned long)(ref_ptr + sizeof(struct btrfs_inode_ref))
+ namelen;
> >  	kfree(name);
> > -	if (ref_ptr < ref_end)
> > +	if (ref_ptr < ref_end) {
> > +		ref = (struct btrfs_inode_ref *)ref_ptr;
> > +		namelen = btrfs_inode_ref_name_len(eb, ref);
> > +		name = kmalloc(namelen, GFP_NOFS);
> > +		BUG_ON(!name);
> > +
> > +		read_extent_buffer(eb, name, (unsigned long)(ref + 1), namelen);
> > +
> > +		ref_index = btrfs_inode_ref_index(eb, ref);
> 
> This is the duplicated code mentioned above.
> 
> >  		goto again;
> > +	}
> >  
> >  	/* finally write the back reference in the inode */
> >  	ret = overwrite_item(trans, root, path, eb, slot, key);
> > @@ -965,25 +1048,103 @@ static int insert_orphan_item(struct
btrfs_trans_handle *trans,
> >  	return ret;
> >  }
> >  
> > +int btrfs_find_one_extref(struct btrfs_root *root, u64
inode_objectid,
> > +			  u64 start_off, struct btrfs_path *path,
> > +			  struct btrfs_inode_extref **ret_ref, u64 *found_off)
> 
> ret_extref
fixed.
> 
> > +{
> > +	int ret, slot;
> > +	struct btrfs_key key, found_key;
> 
> split
fixed.

> > +	struct btrfs_inode_extref *ref;
> > +	struct extent_buffer *leaf;
> > +	struct btrfs_item *item;
> > +	unsigned long ptr;
> >  
> > -/*
> > - * There are a few corners where the link count of the file
can''t
> > - * be properly maintained during replay.  So, instead of adding
> > - * lots of complexity to the log code, we just scan the backrefs
> > - * for any file that has been through replay.
> > - *
> > - * The scan will update the link count on the inode to reflect the
> > - * number of back refs found.  If it goes down to zero, the iput
> > - * will free the inode.
> > - */
> > -static noinline int fixup_inode_link_count(struct btrfs_trans_handle
*trans,
> > -					   struct btrfs_root *root,
> > -					   struct inode *inode)
> 
> [side note: this is really strange patch alignment. my git is doing it
> the same way.]
> 
> > +	key.objectid = inode_objectid;
> > +	btrfs_set_key_type(&key, BTRFS_INODE_EXTREF_KEY);
> > +	key.offset = start_off;
> > +
> > +	ret = btrfs_search_slot(NULL, root, &key, path, 0, 0);
> > +	if (ret < 0)
> > +		goto out;
> 
> Just return here and drop the out label.
Done.
> 
> > +
> > +	while (1) {
> > +		leaf = path->nodes[0];
> > +		slot = path->slots[0];
> > +		if (slot >= btrfs_header_nritems(leaf)) {
> > +			/*
> > +			 * If the item at offset is not found,
> > +			 * btrfs_search_slot will point us to the slot
> > +			 * where it should be inserted. In our case
> > +			 * that will be the slot directly before the
> > +			 * next INODE_REF_KEY_V2 item. In the case
> > +			 * that we''re pointing to the last slot in a
> > +			 * leaf, we must move one leaf over.
> > +			 */
> > +			ret = btrfs_next_leaf(root, path);
> > +			if (ret) {
> > +				if (ret >= 1)
> > +					ret = -ENOENT;
> > +				break;
> > +			}
> 
> This large block (and its comment) can be replaced by
> btrfs_search_slot_for_read with find_higher = 1, return_any = 0. This
> also avoids the "continue" and we even git rid of the whole while
(1)
> loop here.
Ok. I don''t have that in my tree but presumably it''s a very
new function I
just need to pick up with an update...


> > +			continue;
> > +		}
> > +
> > +		item = btrfs_item_nr(leaf, slot);
> > +		btrfs_item_key_to_cpu(leaf, &found_key, slot);
> > +
> > +		/*
> > +		 * Check that we''re still looking at an extended ref key
for
> > +		 * this particular objectid. If we have different
> > +		 * objectid or type then there are no more to be found
> > +		 * in the tree and we can exit.
> > +		 */
> > +		ret = -ENOENT;
> > +		if (found_key.objectid != inode_objectid)
> > +			break;
> > +		if (btrfs_key_type(&found_key) != BTRFS_INODE_EXTREF_KEY)
> > +			break;
> > +
> > +		ret = 0;
> > +		ptr = btrfs_item_ptr_offset(leaf, path->slots[0]);
> > +		ref = (struct btrfs_inode_extref *)ptr;
> > +		*ret_ref = ref;
> > +		if (found_off)
> > +			*found_off = found_key.offset + 1;
> > +		break;
> > +	}
> > +
> > +out:
> > +	return ret;
> > +}
> > +
> > +static int count_inode_extrefs(struct btrfs_root *root,
> > +			       struct inode *inode, struct btrfs_path *path)
> > +{
> > +	int ret;
> > +	unsigned int nlink = 0;
> > +	u64 inode_objectid = btrfs_ino(inode);
> > +	u64 offset = 0;
> > +	struct btrfs_inode_extref *ref;
> > +
> > +	while (1) {
> > +		ret = btrfs_find_one_extref(root, inode_objectid, offset, path,
> > +					    &ref, &offset);
> > +		if (ret)
> > +			break;
> 
> Assume the first call to btrfs_find_ione_extref returns -EIO. Do we
> really want count_inode_extrefs return 0 here? I agree that the previous
> code suffers from the same problem, but still: it''s a problem.
Yeah as you note, I''m just keeping the same behavior as before. This I
think
is probably a question for Chris...

> > +
> > +		nlink++;
> > +		offset++;
> > +	}
> > +
> > +	return nlink;
> > +}
> > +
> > +static int count_inode_refs(struct btrfs_root *root,
> > +			       struct inode *inode, struct btrfs_path *path)
> >  {
> > -	struct btrfs_path *path;
> >  	int ret;
> >  	struct btrfs_key key;
> > -	u64 nlink = 0;
> > +	unsigned int nlink = 0;
> >  	unsigned long ptr;
> >  	unsigned long ptr_end;
> >  	int name_len;
> > @@ -993,10 +1154,6 @@ static noinline int
fixup_inode_link_count(struct btrfs_trans_handle *trans,
> >  	key.type = BTRFS_INODE_REF_KEY;
> >  	key.offset = (u64)-1;
> >  
> > -	path = btrfs_alloc_path();
> > -	if (!path)
> > -		return -ENOMEM;
> > -
> >  	while (1) {
> >  		ret = btrfs_search_slot(NULL, root, &key, path, 0, 0);
> >  		if (ret < 0)
> > @@ -1030,6 +1187,48 @@ static noinline int
fixup_inode_link_count(struct btrfs_trans_handle *trans,
> >  		btrfs_release_path(path);
> >  	}
> >  	btrfs_release_path(path);
> > +	btrfs_free_path(path);
> 
> In general, you must not free a path when you don''t allocate it.
Ok yeah that one must''ve gotten past me. Fixed.

> 
> > +
> > +	return nlink;
> > +}
> > +
> > +/*
> > + * There are a few corners where the link count of the file
can''t
> > + * be properly maintained during replay.  So, instead of adding
> > + * lots of complexity to the log code, we just scan the backrefs
> > + * for any file that has been through replay.
> > + *
> > + * The scan will update the link count on the inode to reflect the
> > + * number of back refs found.  If it goes down to zero, the iput
> > + * will free the inode.
> > + */
> > +static noinline int fixup_inode_link_count(struct btrfs_trans_handle
*trans,
> > +					   struct btrfs_root *root,
> > +					   struct inode *inode)
> > +{
> > +	struct btrfs_path *path;
> > +	int ret;
> > +	u64 nlink = 0;
> > +	u64 ino = btrfs_ino(inode);
> > +
> > +	path = btrfs_alloc_path();
> > +	if (!path)
> > +		return -ENOMEM;
> > +
> > +	ret = count_inode_refs(root, inode, path);
> > +	btrfs_release_path(path);
> 
> Either count_inode_refs should alloc it''s private path, or it
should
> return the path in a released state. The caller should not be
> responsible to pass a clean path and release it afterwards, as
> count_inode_refs does not return anything through the path.
> 
> > +	if (ret < 0)
> > +		goto out;
> > +
> > +	nlink = ret;
> > +
> > +	ret = count_inode_extrefs(root, inode, path);
> > +	btrfs_release_path(path);
> 
> Same here. I''d just put the btrfs_release_path into
count_inode_(ext)refs.
Yeah, fixed those up.

Thanks again Jan!
	--Mark

--
Mark Fasheh
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs"
in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

David Sterba

2012-May-04 11:39 UTC

head link

Re: [PATCH 2/3] btrfs: extended inode refs

On Thu, May 03, 2012 at 04:12:21PM -0700, Mark Fasheh
wrote:> > > +
> > > +		ref_ptr = btrfs_item_ptr_offset(eb, slot);
> > > +
> > > +		/* So that we don''t loop back looking for old style
log refs. */
> > > +		ref_end = ref_ptr;
> > > +
> > > +		extref = (struct btrfs_inode_extref *)
btrfs_item_ptr_offset(eb, slot);
> > > +		namelen = btrfs_inode_extref_name_len(eb, extref);
> > > +		name = kmalloc(namelen, GFP_NOFS);
> > 
> > kmalloc may fail.
> 
> Fixed both instances of this. I''m just testing for null return
from kmalloc
> and bubbling the -ENOMEM back up. The callers of add_inode_ref() will wind
> up BUGing on us anyway but that''s beyond the scope of this patch.
Yes, this is consistent with the rest of no-mem handling. Fixing all
caller paths is not always trivial and one does not want to do it during
a patch development.


david
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs"
in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Mark Fasheh

2012-May-08 22:57 UTC

head link

Re: [PATCH 3/3] btrfs: extended inode refs

Hi Jan, comments inline as usual!

On Thu, Apr 12, 2012 at 07:59:36PM +0200, Jan Schmidt
wrote:> > @@ -858,62 +859,75 @@ static int inode_ref_info(u64 inum, u64 ioff,
struct btrfs_root *fs_root,
> >  }
> >  
> >  /*
> > - * this iterates to turn a btrfs_inode_ref into a full filesystem
path. elements
> > - * of the path are separated by ''/'' and the path is
guaranteed to be
> > - * 0-terminated. the path is only given within the current file
system.
> > - * Therefore, it never starts with a ''/''. the
caller is responsible to provide
> > - * "size" bytes in "dest". the dest buffer will
be filled backwards. finally,
> > - * the start point of the resulting string is returned. this pointer
is within
> > - * dest, normally.
> > - * in case the path buffer would overflow, the pointer is decremented
further
> > - * as if output was written to the buffer, though no more output is
actually
> > - * generated. that way, the caller can determine how much space would
be
> > - * required for the path to fit into the buffer. in that case, the
returned
> > - * value will be smaller than dest. callers must check this!
> > + * Given the parent objectid and name/name_len pairs of an inode ref
> > + * (any version) this iterates to turn that information into a
> > + * full filesystem path. elements of the path are separated by
''/'' and
> > + * the path is guaranteed to be 0-terminated. the path is only given
> > + * within the current file system.  Therefore, it never starts with a
> > + * ''/''. the caller is responsible to provide
"size" bytes in
> > + * "dest". the dest buffer will be filled backwards.
finally, the
> > + * start point of the resulting string is returned. this pointer is
> > + * within dest, normally.  in case the path buffer would overflow,
the
> > + * pointer is decremented further as if output was written to the
> > + * buffer, though no more output is actually generated. that way, the
> > + * caller can determine how much space would be required for the path
> > + * to fit into the buffer. in that case, the returned value will be
> > + * smaller than dest. callers must check this!
> 
> It would reduce patch sets if you can extend comments in a compatible
> way, you make reviewers happy if you don''t realign text (or,
later,
> function parameters) where it''s not required.
Yeah I just reverted the comment change as it''s no longer needed
anyway.

> >   */
> >  static char *iref_to_path(struct btrfs_root *fs_root, struct
btrfs_path *path,
> > -				struct btrfs_inode_ref *iref,
> > -				struct extent_buffer *eb_in, u64 parent,
> > -				char *dest, u32 size)
> > +			  int name_len, unsigned long name_off,
> 
> name_len should be u32
hmm ok that''s fine.

> > +			  struct extent_buffer *eb_in, u64 parent,
> > +			  char *dest, u32 size)
> >  {
> > -	u32 len;
> >  	int slot;
> >  	u64 next_inum;
> >  	int ret;
> >  	s64 bytes_left = size - 1;
> >  	struct extent_buffer *eb = eb_in;
> >  	struct btrfs_key found_key;
> > +	struct btrfs_inode_ref *iref;
> > +	struct btrfs_inode_extref *iref2;
> 
> iextref
done.

> >  	if (bytes_left >= 0)
> >  		dest[bytes_left] = ''\0'';
> >  
> >  	while (1) {
> > -		len = btrfs_inode_ref_name_len(eb, iref);
> > -		bytes_left -= len;
> > +		bytes_left -= name_len;
> >  		if (bytes_left >= 0)
> >  			read_extent_buffer(eb, dest + bytes_left,
> > -						(unsigned long)(iref + 1), len);
> > +					   name_off, name_len);
> >  		if (eb != eb_in)
> >  			free_extent_buffer(eb);
> > +
> > +		/* Ok, we have enough to find any refs to the parent inode. */
> >  		ret = inode_ref_info(parent, 0, fs_root, path, &found_key);
> > -		if (ret > 0)
> > -			ret = -ENOENT;
> > -		if (ret)
> > -			break;
> >  		next_inum = found_key.offset;
> > +		if (ret == 0) {
> > +			slot = path->slots[0];
> > +			eb = path->nodes[0];
> > +			/* make sure we can use eb after releasing the path */
> > +			if (eb != eb_in)
> > +				atomic_inc(&eb->refs);
> > +			btrfs_release_path(path);
> > +			iref = btrfs_item_ptr(eb, slot, struct btrfs_inode_ref);
> > +
> > +			name_len = btrfs_inode_ref_name_len(eb, iref);
> > +			name_off = (unsigned long)(iref + 1);
> > +		} else {
> > +			ret = btrfs_find_one_extref(fs_root, parent, 0, path,
> > +						    &iref2, NULL);
> > +			if (ret)
> > +				break;
> > +
> > +			next_inum = btrfs_inode_extref_parent(eb, iref2);
> > +			name_off = (unsigned long)&iref2->name;
> > +			name_len = btrfs_inode_extref_name_len(eb, iref2);
> > +		}
> >  
> >  		/* regular exit ahead */
> >  		if (parent == next_inum)
> >  			break;
> 
> These regular exit lines must go before the block you inserted.
> Otherwise we leak a reference on eb if it''s != eb_in.
Good catch, fixed by virtue of fixing the below issue :)

> Whereas I think we don''t need this if-else-construct at all. We do
need
> the changes you made as to passing name_len and name_off, I agree.
> However, the rest of the function should stay as it was, because the
> parent of each object must be a directory and a directory won''t
have
> hard links. Thus, we''ll never meet INODE_EXTREFs when walking up
the
> path. Or did I miss something?
You didn''t miss anything, at the time I coded this part I
wasn''t 100% sure
if we would use inode refs on a directory for something *other* than
hardlinking a dir (like, possibly some internal use I wasn''t aware of).
I''m
pretty confident now that this can''t happen so I think the best
approach is
to just kill the code that''s handling this condition and leave a nice
comment.


> >  
> > -		slot = path->slots[0];
> > -		eb = path->nodes[0];
> > -		/* make sure we can use eb after releasing the path */
> > -		if (eb != eb_in)
> > -			atomic_inc(&eb->refs);
> > -		btrfs_release_path(path);
> > -
> > -		iref = btrfs_item_ptr(eb, slot, struct btrfs_inode_ref);
> >  		parent = next_inum;
> >  		--bytes_left;
> >  		if (bytes_left >= 0)
> > @@ -1226,9 +1240,9 @@ int iterate_inodes_from_logical(u64 logical,
struct btrfs_fs_info *fs_info,
> >  	return ret;
> >  }
> >  
> > -static int iterate_irefs(u64 inum, struct btrfs_root *fs_root,
> > -				struct btrfs_path *path,
> > -				iterate_irefs_t *iterate, void *ctx)
> > +static int iterate_inode_refs(u64 inum, struct btrfs_root *fs_root,
> > +			      struct btrfs_path *path,
> > +			      iterate_irefs_t *iterate, void *ctx)
> 
> This function must not call free_extent_buffer(eb) in line 1306 after
> applying your patch set (immediately before the break). Second, I think
> we''d better add a blocking read lock on eb after incrementing
it''s
> refcount, because we need the current content to stay as it is. Both
> isn''t part of your patches, but it might be easier if you make
that
> bugfix change as a 3/4 patch within your set and turn this one into 4/4.
> If you don''t like that, I''ll send a separate patch for
it. Don''t miss
> the unlock if you do it ;-)
Ok, I think I was able to figure out and add the correct locking calls.

Basically I believe I need to wrap access around:

btrfs_tree_read_lock(eb);
btrfs_set_lock_blocking_rw(eb, BTRFS_READ_LOCK);

<read eb contents>

btrfs_tree_read_unlock_blocking(eb);

> >  {
> >  	int ret;
> >  	int slot;
> > @@ -1244,7 +1258,7 @@ static int iterate_irefs(u64 inum, struct
btrfs_root *fs_root,
> >  
> >  	while (1) {
> >  		ret = inode_ref_info(inum, parent ? parent+1 : 0, fs_root, path,
> > -					&found_key);
> > +				     &found_key);
> >  		if (ret < 0)
> >  			break;
> >  		if (ret) {
> > @@ -1286,6 +1300,76 @@ static int iterate_irefs(u64 inum, struct
btrfs_root *fs_root,
> >  	return ret;
> >  }
> >  
> > +static int iterate_inode_extrefs(u64 inum, struct btrfs_root
*fs_root,
> > +				 struct btrfs_path *path,
> > +				 iterate_extrefs_t *iterate, void *ctx)
> > +{
> > +	int ret;
> > +	int slot;
> > +	u64 offset = 0;
> > +	u64 parent;
> > +	int found = 0;
> > +	struct extent_buffer *eb;
> > +	struct btrfs_item *item;
> > +	struct btrfs_inode_extref *iref2;
> 
> iextref
Done.

> > +
> > +	while (1) {
> > +		ret = btrfs_find_one_extref(fs_root, inum, offset, path,
&iref2,
> > +					    &offset);
> > +		if (ret < 0)
> > +			break;
> > +		if (ret) {
> > +			ret = found ? 0 : -ENOENT;
> > +			break;
> > +		}
> > +		++found;
> > +
> > +		slot = path->slots[0];
> > +		eb = path->nodes[0];
> > +		/* make sure we can use eb after releasing the path */
> > +		atomic_inc(&eb->refs);
> 
> You need a blocking read lock here, too. Grab it before releasing the path.
Done.
> 
> > +		btrfs_release_path(path);
> > +
> > +		item = btrfs_item_nr(eb, slot);
> 
> You don''t need item.
Removed.
> 
> > +		iref2 = btrfs_item_ptr(eb, slot, struct btrfs_inode_extref);
> > +
> > +		parent = btrfs_inode_extref_parent(eb, iref2);
> > +		ret = iterate(parent, iref2, eb, ctx);
> 
> The caller shouldn''t have to deal with two different types of
callbacks.
> Please just build a dummy struct btrfs_inode_ref object here and pass it
> to iterate.
> Alternatively, you can extract the information the caller
> will need, here, and pass that instead of a struct btrfs_inode_ref. This
> way, we can use the same type for both iterate() functions.
Ask and you shall receive :)

It turns out to be pretty trivial to just pass in name_len and the name
pointer as that''s all those callbacks needed from the ref.

> > +		if (ret) {
> > +			free_extent_buffer(eb);
> > +			break;
> > +		}
> > +
> > +		free_extent_buffer(eb);
> 
> Call free_extent_buffer(eb) before the if (ret), drop it from the if
> block, add an unlock before the if block.
Fixed that.

> > +		offset++;
> 
> Another caller not expecting btrfs_find_one_extref to return offset
> incremented. The offset++ should stay here and btrfs_find_one_extref
> should just return the plain offset.
> 
> > +	}
> > +
> > +	btrfs_release_path(path);
> > +
> > +	return ret;
> > +}
> > +
> > +static int iterate_irefs(u64 inum, struct btrfs_root *fs_root,
> > +			 struct btrfs_path *path,
> > +			 iterate_irefs_t *iterate,
> > +			 iterate_extrefs_t *iterate2, void *ctx)
> 
> As mentioned above, I''d like to see only be a single iterate
function at
> this level.
> 
> > +{
> > +	int ret, found_refs = 0;
> 
> split
done.

> > +
> > +	ret = iterate_inode_refs(inum, fs_root, path, iterate, ctx);
> > +	if (ret && ret != -ENOENT)
> > +		return ret;
> > +
> > +	if (ret != -ENOENT)
> > +		++found_refs;
> 
> I''d make those 2 if statements:
> 
> if (!ret)
> 	++found_refs;
> else if (ret != -ENOENT)
> 	return ret;
done.

> > +
> > +/*
> > + * returns 0 if the path could be dumped (probably truncated)
> > + * returns <0 in case of an error
> > + */
> > +static int inode_to_path2(u64 inum, struct btrfs_inode_extref *iref2,
> > +			  struct extent_buffer *eb, void *ctx)
> 
> We''ll get rid of a second inode_to_path, too, if my suggestion
works out.
Yep.

Thanks Jan!
	--Mark

--
Mark Fasheh
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs"
in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Chris Mason

2012-May-09 17:02 UTC

head link

Re: [PATCH 3/3] btrfs: extended inode refs

On Tue, May 08, 2012 at 03:57:39PM -0700, Mark Fasheh
wrote:> Hi Jan, comments inline as usual!
> 
> > This function must not call free_extent_buffer(eb) in line 1306 after
> > applying your patch set (immediately before the break). Second, I
think
> > we''d better add a blocking read lock on eb after incrementing
it''s
> > refcount, because we need the current content to stay as it is. Both
> > isn''t part of your patches, but it might be easier if you
make that
> > bugfix change as a 3/4 patch within your set and turn this one into
4/4.
> > If you don''t like that, I''ll send a separate patch
for it. Don''t miss
> > the unlock if you do it ;-)
> 
> Ok, I think I was able to figure out and add the correct locking calls.
> 
> Basically I believe I need to wrap access around:
> 
> btrfs_tree_read_lock(eb);
> btrfs_set_lock_blocking_rw(eb, BTRFS_READ_LOCK);
> 
> <read eb contents>
> 
> btrfs_tree_read_unlock_blocking(eb);
You only need a blocking lock if you''re scheduling.  Otherwise the
spinlock variant is fine.
> > > +
> > > +	while (1) {
> > > +		ret = btrfs_find_one_extref(fs_root, inum, offset, path,
&iref2,
> > > +					    &offset);
> > > +		if (ret < 0)
> > > +			break;
> > > +		if (ret) {
> > > +			ret = found ? 0 : -ENOENT;
> > > +			break;
> > > +		}
> > > +		++found;
> > > +
> > > +		slot = path->slots[0];
> > > +		eb = path->nodes[0];
> > > +		/* make sure we can use eb after releasing the path */
> > > +		atomic_inc(&eb->refs);
> > 
> > You need a blocking read lock here, too. Grab it before releasing the
path.
If you''re calling btrfs_search_slot, it will give you a blocking lock
on the leaf.  If you set path->leave_spinning before the call,
you''ll
have a spinning lock on the leaf.

If you unlock a block that you got from a path (like eb path->nodes[0]), the
path structure has a flag for each level that
indicates if that block was locked or not.  See btrfs_release_path().
So, don''t fiddle the locks without fiddling the paths.

You can switch from spinning to/from blocking without touching the path, it
figures that out.

-chris

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs"
in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Jan Schmidt

2012-May-10 08:23 UTC

head link

Re: [PATCH 3/3] btrfs: extended inode refs

On Wed, May 09, 2012 at 19:02 (+0200), Chris Mason
wrote:>>>> +
>>>> +	while (1) {
>>>> +		ret = btrfs_find_one_extref(fs_root, inum, offset, path,
&iref2,
>>>> +					    &offset);
>>>> +		if (ret < 0)
>>>> +			break;
>>>> +		if (ret) {
>>>> +			ret = found ? 0 : -ENOENT;
>>>> +			break;
>>>> +		}
>>>> +		++found;
>>>> +
>>>> +		slot = path->slots[0];
>>>> +		eb = path->nodes[0];
>>>> +		/* make sure we can use eb after releasing the path */
>>>> +		atomic_inc(&eb->refs);
>>>
>>> You need a blocking read lock here, too. Grab it before releasing
the path.
> 
> If you''re calling btrfs_search_slot, it will give you a blocking
lock
> on the leaf.  If you set path->leave_spinning before the call,
you''ll
> have a spinning lock on the leaf.
> 
> If you unlock a block that you got from a path (like eb >
path->nodes[0]), the path structure has a flag for each level that
> indicates if that block was locked or not.  See btrfs_release_path().
> So, don''t fiddle the locks without fiddling the paths.
> 
> You can switch from spinning to/from blocking without touching the path, it
> figures that out.
Note that we''re releasing the path shortly after. My suggestion was to
grab an *additional* read lock after atomic_inc(&eb->refs) (probably
better extent_buffer_get(eb)) and before btrfs_path_release().

An alternative would be to set path->locks[0] to NULL, which would
btrfs_release_path prevent from unlocking it, kidnapping its lock for
our purpose. But I much prefer the open coded solution.

-Jan
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs"
in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Chris Mason

2012-May-10 13:35 UTC

head link

Re: [PATCH 3/3] btrfs: extended inode refs

On Thu, May 10, 2012 at 10:23:28AM +0200, Jan Schmidt
wrote:> On Wed, May 09, 2012 at 19:02 (+0200), Chris Mason wrote:
> >>>> +
> >>>> +	while (1) {
> >>>> +		ret = btrfs_find_one_extref(fs_root, inum, offset,
path, &iref2,
> >>>> +					    &offset);
> >>>> +		if (ret < 0)
> >>>> +			break;
> >>>> +		if (ret) {
> >>>> +			ret = found ? 0 : -ENOENT;
> >>>> +			break;
> >>>> +		}
> >>>> +		++found;
> >>>> +
> >>>> +		slot = path->slots[0];
> >>>> +		eb = path->nodes[0];
> >>>> +		/* make sure we can use eb after releasing the path */
> >>>> +		atomic_inc(&eb->refs);
> >>>
> >>> You need a blocking read lock here, too. Grab it before
releasing the path.
> > 
> > If you''re calling btrfs_search_slot, it will give you a
blocking lock
> > on the leaf.  If you set path->leave_spinning before the call,
you''ll
> > have a spinning lock on the leaf.
> > 
> > If you unlock a block that you got from a path (like eb > >
path->nodes[0]), the path structure has a flag for each level that
> > indicates if that block was locked or not.  See btrfs_release_path().
> > So, don''t fiddle the locks without fiddling the paths.
> > 
> > You can switch from spinning to/from blocking without touching the
path, it
> > figures that out.
> 
> Note that we''re releasing the path shortly after. My suggestion
was to
> grab an *additional* read lock after atomic_inc(&eb->refs) (probably
> better extent_buffer_get(eb)) and before btrfs_path_release().
> 
> An alternative would be to set path->locks[0] to NULL, which would
> btrfs_release_path prevent from unlocking it, kidnapping its lock for
> our purpose. But I much prefer the open coded solution.
Thanks, I see now.

-chris
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs"
in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Btrfs devel - Apr 2012 - [PATCH 0/3] btrfs: extended inode refs

[PATCH 0/3] btrfs: extended inode refs

[PATCH 1/3] btrfs: extended inode refs

[PATCH 2/3] btrfs: extended inode refs

[PATCH 3/3] btrfs: extended inode refs

Re: [PATCH 0/3] btrfs: extended inode refs

Re: [PATCH 0/3] btrfs: extended inode refs

Re: [PATCH 0/3] btrfs: extended inode refs

Re: [PATCH 0/3] btrfs: extended inode refs

Re: [PATCH 0/3] btrfs: extended inode refs

Re: [PATCH 1/3] btrfs: extended inode refs

Re: [PATCH 2/3] btrfs: extended inode refs

Re: [PATCH 2/3] btrfs: extended inode refs

Re: [PATCH 0/3] btrfs: extended inode refs

Re: [PATCH 0/3] btrfs: extended inode refs

Re: [PATCH 3/3] btrfs: extended inode refs

Re: [PATCH 3/3] btrfs: extended inode refs

Re: [PATCH 1/3] btrfs: extended inode refs

Re: [PATCH 1/3] btrfs: extended inode refs

Re: [PATCH 2/3] btrfs: extended inode refs

Re: [PATCH 2/3] btrfs: extended inode refs

Re: [PATCH 2/3] btrfs: extended inode refs

Re: [PATCH 3/3] btrfs: extended inode refs

Re: [PATCH 3/3] btrfs: extended inode refs

Re: [PATCH 3/3] btrfs: extended inode refs

Re: [PATCH 3/3] btrfs: extended inode refs