Jan Schmidt
2012-May-20 16:06 UTC
[PATCH 00/24] Btrfs: tree modification log and qgroup patch set
This is a combination of three things:

The first commit fixes a false assumption concerning indirect tree block backrefs. That one should definitely go into 3.5; thanks to Alexander Block for finding and testing it.

Commits 2 to 12 provide reliable backref resolving on busy trees. The previous attempts to block certain tree modifications while we're resolving backrefs all ended up in (dead-) locking nightmares. What we do now is record all the changes we make to fs trees while backref resolving is in progress into a tree modification log. During backref resolving, we then merge the current state of the tree with the recorded modifications to get a consistent previous state of the tree. I'd like to see this in 3.5, too.

Commits 13 to 24 finally add quota groups. This is Arne Jansen's patch set sent last October [1], rebased on top of the reliable backref resolver. See his cover letter and [2] for background and usage information on qgroups. It would be nice if that went into 3.5 as well.

All three stages can be pulled from my git repository.

For patch 1, pull:
  git://git.jan-o-sch.net/btrfs-unstable backref-bugfix

For patches 1 to 12, pull:
  git://git.jan-o-sch.net/btrfs-unstable tree-mod-log-done

For patches 1 to 24, pull:
  git://git.jan-o-sch.net/btrfs-unstable tree-mod-log-quota

All these branches are based on the current for-linus branch from Chris' repository.

Checked with xfstests (fails 254, 273 and 275, which I claim have nothing to do with this patch set) and hammered on the filesystem with fs_mark while resolving backrefs in a loop. Qgroup functionality was tested with a private test we might turn into an xfstest soon.
Test it, break it, report it :-)
-Jan

Arne Jansen (12):
  Btrfs: qgroup on-disk format
  Btrfs: add helper for tree enumeration
  Btrfs: check the root passed to btrfs_end_transaction
  Btrfs: added helper to create new trees
  Btrfs: qgroup state and initialization
  Btrfs: Test code to change the order of delayed-ref processing
  Btrfs: qgroup implementation and prototypes
  Btrfs: quota tree support and startup
  Btrfs: hooks for qgroup to record delayed refs
  Btrfs: hooks to reserve qgroup space
  Btrfs: add qgroup ioctls
  Btrfs: add qgroup inheritance

Jan Schmidt (12):
  Btrfs: bugfix: ignore the wrong key for indirect tree block backrefs
  Btrfs: look into the extent during find_all_leafs
  Btrfs: don't set for_cow parameter for tree block functions
  Btrfs: move struct seq_list to ctree.h
  Btrfs: dummy extent buffers for tree mod log
  Btrfs: add tree mod log to fs_info
  Btrfs: add tree modification log functions
  Btrfs: put all modifications into the tree mod log
  Btrfs: add btrfs_search_old_slot
  Btrfs: use the tree modification log for backref resolving
  Btrfs: fs_info variable for join_transaction
  Btrfs: tree mod log sanity checks in join_transaction

 fs/btrfs/Makefile      |    2 +-
 fs/btrfs/backref.c     |  424 +++++++++++---
 fs/btrfs/backref.h     |    3 +-
 fs/btrfs/ctree.c       |  927 +++++++++++++++++++++++++++--
 fs/btrfs/ctree.h       |  231 ++++++++-
 fs/btrfs/delayed-ref.c |   29 +-
 fs/btrfs/delayed-ref.h |    5 -
 fs/btrfs/disk-io.c     |  139 ++++-
 fs/btrfs/disk-io.h     |    5 +
 fs/btrfs/extent-tree.c |   73 +++-
 fs/btrfs/extent_io.c   |   73 +++-
 fs/btrfs/extent_io.h   |    3 +
 fs/btrfs/ioctl.c       |  246 +++++++-
 fs/btrfs/ioctl.h       |   62 ++-
 fs/btrfs/qgroup.c      | 1531 ++++++++++++++++++++++++++++++++++++++++++++++++
 fs/btrfs/transaction.c |   83 ++-
 fs/btrfs/transaction.h |    8 +
 17 files changed, 3620 insertions(+), 224 deletions(-)
 create mode 100644 fs/btrfs/qgroup.c
--
1.7.3.4
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
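[Editor's sketch] The tree modification log described in the cover letter is easiest to see with a toy model. The following is purely illustrative Python, not btrfs code; all names (TreeModLog, record, rewind) are invented here. The idea: writers log the previous value of every slot they change, tagged with a sequence number, and a backref resolver later rewinds the live tree past all entries newer than its snapshot to recover a consistent earlier state.

```python
class TreeModLog:
    """Toy model of a tree modification log (illustrative only)."""

    def __init__(self):
        self.seq = 0
        self.entries = []  # (seq, key, old_value), newest last

    def record(self, key, old_value):
        """Writers call this before modifying `key`.
        old_value is None if the key did not exist yet."""
        self.seq += 1
        self.entries.append((self.seq, key, old_value))

    def rewind(self, live_tree, min_seq):
        """Return the tree state as of sequence number `min_seq`."""
        old = dict(live_tree)
        # undo modifications newer than min_seq, newest first
        for seq, key, old_value in reversed(self.entries):
            if seq <= min_seq:
                break
            if old_value is None:
                old.pop(key, None)   # key did not exist back then
            else:
                old[key] = old_value
        return old

log = TreeModLog()
tree = {"a": 1}
snapshot_seq = log.seq            # a backref walk starts here
log.record("a", 1); tree["a"] = 2 # concurrent writers keep going
log.record("b", None); tree["b"] = 9
assert log.rewind(tree, snapshot_seq) == {"a": 1}
```

In the real patch set the log records btrfs tree operations (key moves, node splits and the like) rather than dictionary slots, and btrfs_search_old_slot performs the rewind during a search.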
Jan Schmidt
2012-May-20 16:06 UTC
[PATCH 01/24] Btrfs: bugfix: ignore the wrong key for indirect tree block backrefs
The key we store with a tree block backref is only a hint. It is set when the ref is created and can remain correct for a long time. As the tree is rebalanced, however, eventually the key no longer points to the correct destination.

With this patch, we change find_parent_nodes to no longer add keys unless it knows for sure they're correct (e.g. because they're for an extent data backref). Then, when we later encounter a backref ref with no parent and no key set, we grab the block and take the first key from the block itself.

Signed-off-by: Jan Schmidt <list.btrfs@jan-o-sch.net>
---
 fs/btrfs/backref.c |  185 ++++++++++++++++++++++++++++++++++++++--------------
 1 files changed, 135 insertions(+), 50 deletions(-)

diff --git a/fs/btrfs/backref.c b/fs/btrfs/backref.c
index bcec067..710029e 100644
--- a/fs/btrfs/backref.c
+++ b/fs/btrfs/backref.c
@@ -30,16 +30,55 @@
 struct __prelim_ref {
 	struct list_head list;
 	u64 root_id;
-	struct btrfs_key key;
+	struct btrfs_key key_for_search;
 	int level;
 	int count;
 	u64 parent;
 	u64 wanted_disk_byte;
 };
 
+/*
+ * the rules for all callers of this function are:
+ * - obtaining the parent is the goal
+ * - if you add a key, you must know that it is a correct key
+ * - if you cannot add the parent or a correct key, then we will look into the
+ *   block later to set a correct key
+ *
+ * delayed refs
+ * ============
+ *       backref type | shared | indirect | shared | indirect
+ *       information  |   tree |     tree |   data |     data
+ * -------------------+--------+----------+--------+----------
+ *     parent logical |    y   |     -    |    -   |     -
+ *     key to resolve |    -   |     y    |    y   |     y
+ * tree block logical |    -   |     -    |    -   |     -
+ * root for resolving |    y   |     y    |    y   |     y
+ *
+ * - column 1:       we've the parent -> done
+ * - column 2, 3, 4: we use the key to find the parent
+ *
+ * on disk refs (inline or keyed)
+ * ==============================
+ *       backref type | shared | indirect | shared | indirect
+ *       information  |   tree |     tree |   data |     data
+ * -------------------+--------+----------+--------+----------
+ *     parent logical |    y   |     -    |    y   |     -
+ *     key to resolve |    -   |     -    |    -   |     y
+ * tree block logical |    y   |     y    |    y   |     y
+ * root for resolving |    -   |     y    |    y   |     y
+ *
+ * - column 1, 3: we've the parent -> done
+ * - column 2:    we take the first key from the block to find the parent
+ *                (see __add_missing_keys)
+ * - column 4:    we use the key to find the parent
+ *
+ * additional information that's available but not required to find the parent
+ * block might help in merging entries to gain some speed.
+ */
+
 static int __add_prelim_ref(struct list_head *head, u64 root_id,
-			    struct btrfs_key *key, int level, u64 parent,
-			    u64 wanted_disk_byte, int count)
+			    struct btrfs_key *key, int level,
+			    u64 parent, u64 wanted_disk_byte, int count)
 {
 	struct __prelim_ref *ref;
 
@@ -50,9 +89,9 @@ static int __add_prelim_ref(struct list_head *head, u64 root_id,
 	ref->root_id = root_id;
 	if (key)
-		ref->key = *key;
+		ref->key_for_search = *key;
 	else
-		memset(&ref->key, 0, sizeof(ref->key));
+		memset(&ref->key_for_search, 0, sizeof(ref->key_for_search));
 
 	ref->level = level;
 	ref->count = count;
@@ -152,12 +191,13 @@ static int __resolve_indirect_ref(struct btrfs_fs_info *fs_info,
 		goto out;
 
 	path->lowest_level = level;
-	ret = btrfs_search_slot(NULL, root, &ref->key, path, 0, 0);
+	ret = btrfs_search_slot(NULL, root, &ref->key_for_search, path, 0, 0);
 	pr_debug("search slot in root %llu (level %d, ref count %d) returned "
 		 "%d for key (%llu %u %llu)\n",
 		 (unsigned long long)ref->root_id, level, ref->count, ret,
-		 (unsigned long long)ref->key.objectid, ref->key.type,
-		 (unsigned long long)ref->key.offset);
+		 (unsigned long long)ref->key_for_search.objectid,
+		 ref->key_for_search.type,
+		 (unsigned long long)ref->key_for_search.offset);
 	if (ret < 0)
 		goto out;
 
@@ -246,10 +286,65 @@ static int __resolve_indirect_refs(struct btrfs_fs_info *fs_info,
 	return ret;
 }
 
+static inline int ref_for_same_block(struct __prelim_ref *ref1,
+				     struct __prelim_ref *ref2)
+{
+	if (ref1->level != ref2->level)
+		return 0;
+	if (ref1->root_id != ref2->root_id)
+		return 0;
+	if (ref1->key_for_search.type != ref2->key_for_search.type)
+		return 0;
+	if (ref1->key_for_search.objectid != ref2->key_for_search.objectid)
+		return 0;
+	if (ref1->key_for_search.offset != ref2->key_for_search.offset)
+		return 0;
+	if (ref1->parent != ref2->parent)
+		return 0;
+
+	return 1;
+}
+
+/*
+ * read tree blocks and add keys where required.
+ */
+static int __add_missing_keys(struct btrfs_fs_info *fs_info,
+			      struct list_head *head)
+{
+	struct list_head *pos;
+	struct extent_buffer *eb;
+
+	list_for_each(pos, head) {
+		struct __prelim_ref *ref;
+		ref = list_entry(pos, struct __prelim_ref, list);
+
+		if (ref->parent)
+			continue;
+		if (ref->key_for_search.type)
+			continue;
+		BUG_ON(!ref->wanted_disk_byte);
+		eb = read_tree_block(fs_info->tree_root, ref->wanted_disk_byte,
+				     fs_info->tree_root->leafsize, 0);
+		BUG_ON(!eb);
+		btrfs_tree_read_lock(eb);
+		if (btrfs_header_level(eb) == 0)
+			btrfs_item_key_to_cpu(eb, &ref->key_for_search, 0);
+		else
+			btrfs_node_key_to_cpu(eb, &ref->key_for_search, 0);
+		btrfs_tree_read_unlock(eb);
+		free_extent_buffer(eb);
+	}
+	return 0;
+}
+
 /*
  * merge two lists of backrefs and adjust counts accordingly
  *
  * mode = 1: merge identical keys, if key is set
+ *	FIXME: if we add more keys in __add_prelim_ref, we can merge more here.
+ *	       additionally, we could even add a key range for the blocks we
+ *	       looked into to merge even more (-> replace unresolved refs by
+ *	       those having a parent).
  * mode = 2: merge identical parents
  */
 static int __merge_refs(struct list_head *head, int mode)
@@ -263,20 +358,21 @@ static int __merge_refs(struct list_head *head, int mode)
 
 		ref1 = list_entry(pos1, struct __prelim_ref, list);
 
-		if (mode == 1 && ref1->key.type == 0)
-			continue;
 		for (pos2 = pos1->next, n2 = pos2->next; pos2 != head;
 		     pos2 = n2, n2 = pos2->next) {
 			struct __prelim_ref *ref2;
+			struct __prelim_ref *xchg;
 
 			ref2 = list_entry(pos2, struct __prelim_ref, list);
 
 			if (mode == 1) {
-				if (memcmp(&ref1->key, &ref2->key,
-					   sizeof(ref1->key)) ||
-				    ref1->level != ref2->level ||
-				    ref1->root_id != ref2->root_id)
+				if (!ref_for_same_block(ref1, ref2))
 					continue;
+				if (!ref1->parent && ref2->parent) {
+					xchg = ref1;
+					ref1 = ref2;
+					ref2 = xchg;
+				}
 				ref1->count += ref2->count;
 			} else {
 				if (ref1->parent != ref2->parent)
@@ -296,16 +392,17 @@ static int __merge_refs(struct list_head *head, int mode)
  * smaller or equal that seq to the list
  */
 static int __add_delayed_refs(struct btrfs_delayed_ref_head *head, u64 seq,
-			      struct btrfs_key *info_key,
 			      struct list_head *prefs)
 {
 	struct btrfs_delayed_extent_op *extent_op = head->extent_op;
 	struct rb_node *n = &head->node.rb_node;
+	struct btrfs_key key;
+	struct btrfs_key op_key = {0};
 	int sgn;
 	int ret = 0;
 
 	if (extent_op && extent_op->update_key)
-		btrfs_disk_key_to_cpu(info_key, &extent_op->key);
+		btrfs_disk_key_to_cpu(&op_key, &extent_op->key);
 
 	while ((n = rb_prev(n))) {
 		struct btrfs_delayed_ref_node *node;
@@ -337,7 +434,7 @@ static int __add_delayed_refs(struct btrfs_delayed_ref_head *head, u64 seq,
 			struct btrfs_delayed_tree_ref *ref;
 
 			ref = btrfs_delayed_node_to_tree_ref(node);
-			ret = __add_prelim_ref(prefs, ref->root, info_key,
+			ret = __add_prelim_ref(prefs, ref->root, &op_key,
 					       ref->level + 1, 0, node->bytenr,
 					       node->ref_mod * sgn);
 			break;
@@ -346,7 +443,7 @@ static int __add_delayed_refs(struct btrfs_delayed_ref_head *head, u64 seq,
 			struct btrfs_delayed_tree_ref *ref;
 
 			ref = btrfs_delayed_node_to_tree_ref(node);
-			ret = __add_prelim_ref(prefs, ref->root, info_key,
+			ret = __add_prelim_ref(prefs, ref->root, NULL,
 					       ref->level + 1, ref->parent,
 					       node->bytenr,
 					       node->ref_mod * sgn);
@@ -354,8 +451,6 @@ static int __add_delayed_refs(struct btrfs_delayed_ref_head *head, u64 seq,
 		}
 		case BTRFS_EXTENT_DATA_REF_KEY: {
 			struct btrfs_delayed_data_ref *ref;
-			struct btrfs_key key;
-
 			ref = btrfs_delayed_node_to_data_ref(node);
 
 			key.objectid = ref->objectid;
@@ -368,7 +463,6 @@ static int __add_delayed_refs(struct btrfs_delayed_ref_head *head, u64 seq,
 		}
 		case BTRFS_SHARED_DATA_REF_KEY: {
 			struct btrfs_delayed_data_ref *ref;
-			struct btrfs_key key;
 
 			ref = btrfs_delayed_node_to_data_ref(node);
@@ -394,8 +488,7 @@ static int __add_delayed_refs(struct btrfs_delayed_ref_head *head, u64 seq,
  */
 static int __add_inline_refs(struct btrfs_fs_info *fs_info,
 			     struct btrfs_path *path, u64 bytenr,
-			     struct btrfs_key *info_key, int *info_level,
-			     struct list_head *prefs)
+			     int *info_level, struct list_head *prefs)
 {
 	int ret = 0;
 	int slot;
@@ -424,12 +517,9 @@ static int __add_inline_refs(struct btrfs_fs_info *fs_info,
 	if (flags & BTRFS_EXTENT_FLAG_TREE_BLOCK) {
 		struct btrfs_tree_block_info *info;
-		struct btrfs_disk_key disk_key;
 
 		info = (struct btrfs_tree_block_info *)ptr;
 		*info_level = btrfs_tree_block_level(leaf, info);
-		btrfs_tree_block_key(leaf, info, &disk_key);
-		btrfs_disk_key_to_cpu(info_key, &disk_key);
 		ptr += sizeof(struct btrfs_tree_block_info);
 		BUG_ON(ptr > end);
 	} else {
@@ -447,7 +537,7 @@ static int __add_inline_refs(struct btrfs_fs_info *fs_info,
 		switch (type) {
 		case BTRFS_SHARED_BLOCK_REF_KEY:
-			ret = __add_prelim_ref(prefs, 0, info_key,
+			ret = __add_prelim_ref(prefs, 0, NULL,
 					       *info_level + 1, offset,
 					       bytenr, 1);
 			break;
@@ -462,8 +552,9 @@ static int __add_inline_refs(struct btrfs_fs_info *fs_info,
 			break;
 		}
 		case BTRFS_TREE_BLOCK_REF_KEY:
-			ret = __add_prelim_ref(prefs, offset, info_key,
-					       *info_level + 1, 0, bytenr, 1);
+			ret = __add_prelim_ref(prefs, offset, NULL,
+					       *info_level + 1, 0,
+					       bytenr, 1);
 			break;
 		case BTRFS_EXTENT_DATA_REF_KEY: {
 			struct btrfs_extent_data_ref *dref;
@@ -477,8 +568,8 @@ static int __add_inline_refs(struct btrfs_fs_info *fs_info,
 			key.type = BTRFS_EXTENT_DATA_KEY;
 			key.offset = btrfs_extent_data_ref_offset(leaf, dref);
 			root = btrfs_extent_data_ref_root(leaf, dref);
-			ret = __add_prelim_ref(prefs, root, &key, 0, 0, bytenr,
-					       count);
+			ret = __add_prelim_ref(prefs, root, &key, 0, 0,
+					       bytenr, count);
 			break;
 		}
 		default:
@@ -496,8 +587,7 @@ static int __add_inline_refs(struct btrfs_fs_info *fs_info,
  */
 static int __add_keyed_refs(struct btrfs_fs_info *fs_info,
 			    struct btrfs_path *path, u64 bytenr,
-			    struct btrfs_key *info_key, int info_level,
-			    struct list_head *prefs)
+			    int info_level, struct list_head *prefs)
 {
 	struct btrfs_root *extent_root = fs_info->extent_root;
 	int ret;
@@ -527,7 +617,7 @@ static int __add_keyed_refs(struct btrfs_fs_info *fs_info,
 		switch (key.type) {
 		case BTRFS_SHARED_BLOCK_REF_KEY:
-			ret = __add_prelim_ref(prefs, 0, info_key,
+			ret = __add_prelim_ref(prefs, 0, NULL,
 					       info_level + 1, key.offset,
 					       bytenr, 1);
 			break;
@@ -543,8 +633,9 @@ static int __add_keyed_refs(struct btrfs_fs_info *fs_info,
 			break;
 		}
 		case BTRFS_TREE_BLOCK_REF_KEY:
-			ret = __add_prelim_ref(prefs, key.offset, info_key,
-					       info_level + 1, 0, bytenr, 1);
+			ret = __add_prelim_ref(prefs, key.offset, NULL,
+					       info_level + 1, 0,
+					       bytenr, 1);
 			break;
 		case BTRFS_EXTENT_DATA_REF_KEY: {
 			struct btrfs_extent_data_ref *dref;
@@ -560,7 +651,7 @@ static int __add_keyed_refs(struct btrfs_fs_info *fs_info,
 			key.offset = btrfs_extent_data_ref_offset(leaf, dref);
 			root = btrfs_extent_data_ref_root(leaf, dref);
 			ret = __add_prelim_ref(prefs, root, &key, 0, 0,
-					        bytenr, count);
+					       bytenr, count);
 			break;
 		}
 		default:
@@ -586,7 +677,6 @@ static int find_parent_nodes(struct btrfs_trans_handle *trans,
 {
 	struct btrfs_key key;
 	struct btrfs_path *path;
-	struct btrfs_key info_key = { 0 };
 	struct btrfs_delayed_ref_root *delayed_refs = NULL;
 	struct btrfs_delayed_ref_head *head;
 	int info_level = 0;
@@ -645,8 +735,7 @@ again:
 			btrfs_put_delayed_ref(&head->node);
 			goto again;
 		}
-		ret = __add_delayed_refs(head, seq, &info_key,
-					 &prefs_delayed);
+		ret = __add_delayed_refs(head, seq, &prefs_delayed);
 		if (ret) {
 			spin_unlock(&delayed_refs->lock);
 			goto out;
@@ -665,10 +754,10 @@ again:
 		if (key.objectid == bytenr &&
 		    key.type == BTRFS_EXTENT_ITEM_KEY) {
 			ret = __add_inline_refs(fs_info, path, bytenr,
-						&info_key, &info_level, &prefs);
+						&info_level, &prefs);
 			if (ret)
 				goto out;
-			ret = __add_keyed_refs(fs_info, path, bytenr, &info_key,
+			ret = __add_keyed_refs(fs_info, path, bytenr,
 					       info_level, &prefs);
 			if (ret)
 				goto out;
@@ -676,16 +765,12 @@ again:
 	}
 	btrfs_release_path(path);
 
-	/*
-	 * when adding the delayed refs above, the info_key might not have
-	 * been known yet. Go over the list and replace the missing keys
-	 */
-	list_for_each_entry(ref, &prefs_delayed, list) {
-		if ((ref->key.offset | ref->key.type | ref->key.objectid) == 0)
-			memcpy(&ref->key, &info_key, sizeof(ref->key));
-	}
 	list_splice_init(&prefs_delayed, &prefs);
 
+	ret = __add_missing_keys(fs_info, &prefs);
+	if (ret)
+		goto out;
+
 	ret = __merge_refs(&prefs, 1);
 	if (ret)
 		goto out;
-- 
1.7.3.4
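[Editor's sketch] The two merge passes this patch reworks (__merge_refs with mode 1 and mode 2) can be modeled in a few lines. This is simplified Python with invented field names, not the kernel's list handling, and it abstracts away the exact ref_for_same_block test: pass 1 merges refs that describe the same block (same root, key hint and level), preferring whichever already has a resolved parent; pass 2, run after resolution, merges refs that ended up with the same parent block.

```python
from dataclasses import dataclass

@dataclass
class PrelimRef:
    """Simplified stand-in for struct __prelim_ref (illustrative only)."""
    root_id: int
    key: tuple        # (objectid, type, offset) hint, (0, 0, 0) if unknown
    level: int
    parent: int = 0   # 0 = not yet resolved
    count: int = 1

def merge_refs(refs, mode):
    """mode 1: merge refs for the same block, keeping a resolved parent
    if either side has one.  mode 2: merge refs with identical parents."""
    merged = []
    for ref in refs:
        for have in merged:
            if mode == 1:
                if (have.root_id, have.key, have.level) != \
                   (ref.root_id, ref.key, ref.level):
                    continue
                if not have.parent and ref.parent:
                    have.parent = ref.parent  # prefer the resolved one
            else:
                if have.parent != ref.parent:
                    continue
            have.count += ref.count
            break
        else:
            merged.append(ref)
    return merged

refs = [PrelimRef(5, (258, 1, 0), 0),
        PrelimRef(5, (258, 1, 0), 0, parent=4096),
        PrelimRef(7, (0, 0, 0), 1, parent=4096)]
once = merge_refs(refs, 1)    # two entries, both now resolved to 4096
twice = merge_refs(once, 2)   # collapsed into a single entry
assert len(twice) == 1 and twice[0].parent == 4096 and twice[0].count == 3
```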
Jan Schmidt
2012-May-20 16:06 UTC
[PATCH 02/24] Btrfs: look into the extent during find_all_leafs
Before this patch, we called find_all_leafs for a data extent, then called find_all_roots, and then looked into the extent to grab the information we were seeking. This was done without holding the leaves locked, to avoid deadlocks. However, this can obviously race with concurrent tree modifications. Instead, we now look into the extent while we're holding the lock during find_all_leafs and store this information together with the leaf list.

Signed-off-by: Jan Schmidt <list.btrfs@jan-o-sch.net>
---
 fs/btrfs/backref.c |  222 +++++++++++++++++++++++++++++++++++++++++++--------
 fs/btrfs/backref.h |    2 +-
 2 files changed, 188 insertions(+), 36 deletions(-)

diff --git a/fs/btrfs/backref.c b/fs/btrfs/backref.c
index 710029e..e6c54d8 100644
--- a/fs/btrfs/backref.c
+++ b/fs/btrfs/backref.c
@@ -24,6 +24,79 @@
 #include "delayed-ref.h"
 #include "locking.h"
 
+struct extent_inode_elem {
+	u64 inum;
+	u64 offset;
+	struct extent_inode_elem *next;
+};
+
+static int check_extent_in_eb(struct btrfs_key *key, struct extent_buffer *eb,
+			      struct btrfs_file_extent_item *fi,
+			      u64 extent_item_pos,
+			      struct extent_inode_elem **eie)
+{
+	u64 data_offset;
+	u64 data_len;
+	struct extent_inode_elem *e;
+
+	data_offset = btrfs_file_extent_offset(eb, fi);
+	data_len = btrfs_file_extent_num_bytes(eb, fi);
+
+	if (extent_item_pos < data_offset ||
+	    extent_item_pos >= data_offset + data_len)
+		return 1;
+
+	e = kmalloc(sizeof(*e), GFP_NOFS);
+	if (!e)
+		return -ENOMEM;
+
+	e->next = *eie;
+	e->inum = key->objectid;
+	e->offset = key->offset + (extent_item_pos - data_offset);
+	*eie = e;
+
+	return 0;
+}
+
+static int find_extent_in_eb(struct extent_buffer *eb, u64 wanted_disk_byte,
+			     u64 extent_item_pos,
+			     struct extent_inode_elem **eie)
+{
+	u64 disk_byte;
+	struct btrfs_key key;
+	struct btrfs_file_extent_item *fi;
+	int slot;
+	int nritems;
+	int extent_type;
+	int ret;
+
+	/*
+	 * from the shared data ref, we only have the leaf but we need
+	 * the key. thus, we must look into all items and see that we
+	 * find one (some) with a reference to our extent item.
+	 */
+	nritems = btrfs_header_nritems(eb);
+	for (slot = 0; slot < nritems; ++slot) {
+		btrfs_item_key_to_cpu(eb, &key, slot);
+		if (key.type != BTRFS_EXTENT_DATA_KEY)
+			continue;
+		fi = btrfs_item_ptr(eb, slot, struct btrfs_file_extent_item);
+		extent_type = btrfs_file_extent_type(eb, fi);
+		if (extent_type == BTRFS_FILE_EXTENT_INLINE)
+			continue;
+		/* don't skip BTRFS_FILE_EXTENT_PREALLOC, we can handle that */
+		disk_byte = btrfs_file_extent_disk_bytenr(eb, fi);
+		if (disk_byte != wanted_disk_byte)
+			continue;
+
+		ret = check_extent_in_eb(&key, eb, fi, extent_item_pos, eie);
+		if (ret < 0)
+			return ret;
+	}
+
+	return 0;
+}
+
 /*
  * this structure records all encountered refs on the way up to the root
  */
@@ -33,6 +106,7 @@ struct __prelim_ref {
 	struct btrfs_key key_for_search;
 	int level;
 	int count;
+	struct extent_inode_elem *inode_list;
 	u64 parent;
 	u64 wanted_disk_byte;
 };
@@ -93,6 +167,7 @@ static int __add_prelim_ref(struct list_head *head, u64 root_id,
 	else
 		memset(&ref->key_for_search, 0, sizeof(ref->key_for_search));
 
+	ref->inode_list = NULL;
 	ref->level = level;
 	ref->count = count;
 	ref->parent = parent;
@@ -103,18 +178,26 @@ static int __add_prelim_ref(struct list_head *head, u64 root_id,
 }
 
 static int add_all_parents(struct btrfs_root *root, struct btrfs_path *path,
-			   struct ulist *parents,
-			   struct extent_buffer *eb, int level,
-			   u64 wanted_objectid, u64 wanted_disk_byte)
+			   struct ulist *parents, int level,
+			   struct btrfs_key *key, u64 wanted_disk_byte,
+			   const u64 *extent_item_pos)
 {
 	int ret;
-	int slot;
+	int slot = path->slots[level];
+	struct extent_buffer *eb = path->nodes[level];
 	struct btrfs_file_extent_item *fi;
-	struct btrfs_key key;
+	struct extent_inode_elem *eie = NULL;
 	u64 disk_byte;
+	u64 wanted_objectid = key->objectid;
 
 add_parent:
-	ret = ulist_add(parents, eb->start, 0, GFP_NOFS);
+	if (level == 0 && extent_item_pos) {
+		fi = btrfs_item_ptr(eb, slot, struct btrfs_file_extent_item);
+		ret = check_extent_in_eb(key, eb, fi, *extent_item_pos, &eie);
+		if (ret < 0)
+			return ret;
+	}
+	ret = ulist_add(parents, eb->start, (unsigned long)eie, GFP_NOFS);
 	if (ret < 0)
 		return ret;
 
@@ -128,6 +211,7 @@ add_parent:
 	 * repeat this until we don't find any additional EXTENT_DATA items.
 	 */
 	while (1) {
+		eie = NULL;
 		ret = btrfs_next_leaf(root, path);
 		if (ret < 0)
 			return ret;
@@ -136,9 +220,9 @@ add_parent:
 		eb = path->nodes[0];
 		for (slot = 0; slot < btrfs_header_nritems(eb); ++slot) {
-			btrfs_item_key_to_cpu(eb, &key, slot);
-			if (key.objectid != wanted_objectid ||
-			    key.type != BTRFS_EXTENT_DATA_KEY)
+			btrfs_item_key_to_cpu(eb, key, slot);
+			if (key->objectid != wanted_objectid ||
+			    key->type != BTRFS_EXTENT_DATA_KEY)
 				return 0;
 			fi = btrfs_item_ptr(eb, slot,
 					    struct btrfs_file_extent_item);
@@ -158,7 +242,8 @@ add_parent:
 static int __resolve_indirect_ref(struct btrfs_fs_info *fs_info,
 				  int search_commit_root,
 				  struct __prelim_ref *ref,
-				  struct ulist *parents)
+				  struct ulist *parents,
+				  const u64 *extent_item_pos)
 {
 	struct btrfs_path *path;
 	struct btrfs_root *root;
@@ -219,9 +304,8 @@ static int __resolve_indirect_ref(struct btrfs_fs_info *fs_info,
 		btrfs_item_key_to_cpu(eb, &key, path->slots[0]);
 	}
 
-	/* the last two parameters will only be used for level == 0 */
-	ret = add_all_parents(root, path, parents, eb, level, key.objectid,
-			      ref->wanted_disk_byte);
+	ret = add_all_parents(root, path, parents, level, &key,
+			      ref->wanted_disk_byte, extent_item_pos);
 out:
 	btrfs_free_path(path);
 	return ret;
@@ -232,7 +316,8 @@ out:
  */
 static int __resolve_indirect_refs(struct btrfs_fs_info *fs_info,
 				   int search_commit_root,
-				   struct list_head *head)
+				   struct list_head *head,
+				   const u64 *extent_item_pos)
 {
 	int err;
 	int ret = 0;
@@ -257,7 +342,7 @@ static int __resolve_indirect_refs(struct btrfs_fs_info *fs_info,
 		if (ref->count == 0)
 			continue;
 		err = __resolve_indirect_ref(fs_info, search_commit_root,
-					     ref, parents);
+					     ref, parents, extent_item_pos);
 		if (err) {
 			if (ret == 0)
 				ret = err;
@@ -267,6 +352,8 @@ static int __resolve_indirect_refs(struct btrfs_fs_info *fs_info,
 		/* we put the first parent into the ref at hand */
 		node = ulist_next(parents, NULL);
 		ref->parent = node ? node->val : 0;
+		ref->inode_list = node ?
+			(struct extent_inode_elem *)node->aux : 0;
 
 		/* additional parents require new refs being added here */
 		while ((node = ulist_next(parents, node))) {
@@ -277,6 +364,8 @@ static int __resolve_indirect_refs(struct btrfs_fs_info *fs_info,
 			}
 			memcpy(new_ref, ref, sizeof(*ref));
 			new_ref->parent = node->val;
+			new_ref->inode_list =
+				(struct extent_inode_elem *)node->aux;
 			list_add(&new_ref->list, &ref->list);
 		}
 		ulist_reinit(parents);
@@ -673,7 +762,8 @@ static int __add_keyed_refs(struct btrfs_fs_info *fs_info,
  */
 static int find_parent_nodes(struct btrfs_trans_handle *trans,
 			     struct btrfs_fs_info *fs_info, u64 bytenr,
-			     u64 seq, struct ulist *refs, struct ulist *roots)
+			     u64 seq, struct ulist *refs, struct ulist *roots,
+			     const u64 *extent_item_pos)
 {
 	struct btrfs_key key;
 	struct btrfs_path *path;
@@ -775,7 +865,8 @@ again:
 	if (ret)
 		goto out;
 
-	ret = __resolve_indirect_refs(fs_info, search_commit_root, &prefs);
+	ret = __resolve_indirect_refs(fs_info, search_commit_root, &prefs,
+				      extent_item_pos);
 	if (ret)
 		goto out;
 
@@ -794,7 +885,23 @@ again:
 			BUG_ON(ret < 0);
 		}
 		if (ref->count && ref->parent) {
-			ret = ulist_add(refs, ref->parent, 0, GFP_NOFS);
+			if (extent_item_pos && !ref->inode_list) {
+				u32 bsz;
+				struct extent_inode_elem *eie = NULL;
+				struct extent_buffer *eb;
+				bsz = btrfs_level_size(fs_info->extent_root,
+						       info_level);
+				eb = btrfs_find_tree_block(fs_info->extent_root,
+							   ref->parent, bsz);
+				BUG_ON(!eb);
+				ret = find_extent_in_eb(eb, bytenr,
+							*extent_item_pos, &eie);
+				ref->inode_list = eie;
+				free_extent_buffer(eb);
+			}
+			ret = ulist_add(refs, ref->parent,
+					(unsigned long)ref->inode_list,
+					GFP_NOFS);
 			BUG_ON(ret < 0);
 		}
 		kfree(ref);
@@ -819,6 +926,26 @@ out:
 	return ret;
 }
 
+static void free_leaf_list(struct ulist *blocks)
+{
+	struct ulist_node *node = NULL;
+	struct extent_inode_elem *eie;
+	struct extent_inode_elem *eie_next;
+
+	while ((node = ulist_next(blocks, node))) {
+		if (!node->aux)
+			continue;
+		eie = (struct extent_inode_elem *)node->aux;
+		for (; eie; eie = eie_next) {
+			eie_next = eie->next;
+			kfree(eie);
+		}
+		node->aux = 0;
+	}
+
+	ulist_free(blocks);
+}
+
 /*
  * Finds all leafs with a reference to the specified combination of bytenr and
  * offset. key_list_head will point to a list of corresponding keys (caller must
@@ -829,7 +956,8 @@ out:
  */
 static int btrfs_find_all_leafs(struct btrfs_trans_handle *trans,
 				struct btrfs_fs_info *fs_info, u64 bytenr,
-				u64 num_bytes, u64 seq, struct ulist **leafs)
+				u64 seq, struct ulist **leafs,
+				const u64 *extent_item_pos)
 {
 	struct ulist *tmp;
 	int ret;
@@ -843,11 +971,12 @@ static int btrfs_find_all_leafs(struct btrfs_trans_handle *trans,
 		return -ENOMEM;
 	}
 
-	ret = find_parent_nodes(trans, fs_info, bytenr, seq, *leafs, tmp);
+	ret = find_parent_nodes(trans, fs_info, bytenr, seq, *leafs, tmp,
+				extent_item_pos);
 	ulist_free(tmp);
 
 	if (ret < 0 && ret != -ENOENT) {
-		ulist_free(*leafs);
+		free_leaf_list(*leafs);
 		return ret;
 	}
 
@@ -869,7 +998,7 @@ static int btrfs_find_all_leafs(struct btrfs_trans_handle *trans,
  */
 int btrfs_find_all_roots(struct btrfs_trans_handle *trans,
 			 struct btrfs_fs_info *fs_info, u64 bytenr,
-			 u64 num_bytes, u64 seq, struct ulist **roots)
+			 u64 seq, struct ulist **roots)
 {
 	struct ulist *tmp;
 	struct ulist_node *node = NULL;
@@ -886,7 +1015,7 @@ int btrfs_find_all_roots(struct btrfs_trans_handle *trans,
 	while (1) {
 		ret = find_parent_nodes(trans, fs_info, bytenr, seq,
-					tmp, *roots);
+					tmp, *roots, NULL);
 		if (ret < 0 && ret != -ENOENT) {
 			ulist_free(tmp);
 			ulist_free(*roots);
@@ -1178,7 +1307,31 @@ int tree_backref_for_extent(unsigned long *ptr, struct extent_buffer *eb,
 	return 0;
 }
 
-static int iterate_leaf_refs(struct btrfs_fs_info *fs_info, u64 logical,
+static int iterate_leaf_refs(struct extent_inode_elem *inode_list,
+			     u64 root, u64 extent_item_objectid,
+			     iterate_extent_inodes_t *iterate, void *ctx)
+{
+	struct extent_inode_elem *eie;
+	struct extent_inode_elem *eie_next;
+	int ret = 0;
+
+	for (eie = inode_list; eie; eie = eie_next) {
+		pr_debug("ref for %llu resolved, key (%llu EXTEND_DATA %llu), "
+			 "root %llu\n", extent_item_objectid,
+			 eie->inum, eie->offset, root);
+		ret = iterate(eie->inum, eie->offset, root, ctx);
+		if (ret) {
+			pr_debug("stopping iteration for %llu due to ret=%d\n",
+				 extent_item_objectid, ret);
+			break;
+		}
+	}
+
+	return ret;
+}
+
+int iterate_leaf_refs_old(struct btrfs_fs_info *fs_info,
+			  struct btrfs_path *path, u64 logical,
 			  u64 orig_extent_item_objectid,
 			  u64 extent_item_pos, u64 root,
 			  iterate_extent_inodes_t *iterate, void *ctx)
@@ -1280,28 +1433,27 @@ int iterate_extent_inodes(struct btrfs_fs_info *fs_info,
 	}
 
 	ret = btrfs_find_all_leafs(trans, fs_info, extent_item_objectid,
-				   extent_item_pos, seq_elem.seq,
-				   &refs);
-
+				   seq_elem.seq, &refs, &extent_item_pos);
 	if (ret)
 		goto out;
 
 	while (!ret && (ref_node = ulist_next(refs, ref_node))) {
-		ret = btrfs_find_all_roots(trans, fs_info, ref_node->val, -1,
+		ret = btrfs_find_all_roots(trans, fs_info, ref_node->val,
 					   seq_elem.seq, &roots);
 		if (ret)
 			break;
 		while (!ret && (root_node = ulist_next(roots, root_node))) {
-			pr_debug("root %llu references leaf %llu\n",
-				 root_node->val, ref_node->val);
-			ret = iterate_leaf_refs(fs_info, ref_node->val,
-						extent_item_objectid,
-						extent_item_pos, root_node->val,
-						iterate, ctx);
+			pr_debug("root %llu references leaf %llu, data list "
				 "%#lx\n", root_node->val, ref_node->val,
+				 ref_node->aux);
+			ret = iterate_leaf_refs(
+				(struct extent_inode_elem *)ref_node->aux,
+				root_node->val, extent_item_objectid,
+				iterate, ctx);
 		}
 	}
 
-	ulist_free(refs);
+	free_leaf_list(refs);
 	ulist_free(roots);
 out:
 	if (!search_commit_root) {
diff --git a/fs/btrfs/backref.h b/fs/btrfs/backref.h
index 57ea2e9..94ba1b2 100644
--- a/fs/btrfs/backref.h
+++ b/fs/btrfs/backref.h
@@ -58,7 +58,7 @@ int paths_from_inode(u64 inum, struct inode_fs_paths *ipath);
 
 int btrfs_find_all_roots(struct btrfs_trans_handle *trans,
 			 struct btrfs_fs_info *fs_info, u64 bytenr,
-			 u64 num_bytes, u64 seq, struct ulist **roots);
+			 u64 seq, struct ulist **roots);
 
 struct btrfs_data_container *init_data_container(u32 total_bytes);
 struct inode_fs_paths *init_ipath(s32 total_bytes, struct btrfs_root *fs_root,
-- 
1.7.3.4
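[Editor's sketch] The core of check_extent_in_eb above is a small range check plus an offset translation: a file extent item at file offset key->offset references the disk extent starting data_offset bytes in, for data_len bytes, and a position inside the disk extent maps back to a file offset. As standalone arithmetic (hypothetical helper name, plain Python, mirroring the patch's formula e->offset = key->offset + (extent_item_pos - data_offset)):

```python
def extent_pos_to_file_offset(key_offset, data_offset, data_len,
                              extent_item_pos):
    """Map a position inside a disk extent to the file offset that
    references it, or None when this extent item covers a different
    slice of the extent (the patch returns 1 in that case)."""
    if not (data_offset <= extent_item_pos < data_offset + data_len):
        return None
    return key_offset + (extent_item_pos - data_offset)

# a file extent at file offset 4096 using bytes [0, 8192) of the extent:
assert extent_pos_to_file_offset(4096, 0, 8192, 1000) == 5096
# an item that only references bytes [8192, 12288) does not match pos 1000:
assert extent_pos_to_file_offset(0, 8192, 4096, 1000) is None
```

The kernel code additionally chains each match into the extent_inode_elem list so one leaf can report several (inum, offset) pairs for the same disk extent.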
Jan Schmidt
2012-May-20 16:06 UTC
[PATCH 03/24] Btrfs: don't set for_cow parameter for tree block functions
Three callers of btrfs_free_tree_block or btrfs_alloc_tree_block passed parameter for_cow = 1. In fact, these two functions should never mark their tree modification operations as for_cow, because they can change the number of blocks referenced by a tree. Hence, we remove the extra for_cow parameter from these functions and make them pass a zero down. Signed-off-by: Jan Schmidt <list.btrfs@jan-o-sch.net> --- fs/btrfs/ctree.c | 22 +++++++++++----------- fs/btrfs/ctree.h | 4 ++-- fs/btrfs/disk-io.c | 2 +- fs/btrfs/extent-tree.c | 10 +++++----- fs/btrfs/ioctl.c | 2 +- 5 files changed, 20 insertions(+), 20 deletions(-) diff --git a/fs/btrfs/ctree.c b/fs/btrfs/ctree.c index 4106264..56485b3 100644 --- a/fs/btrfs/ctree.c +++ b/fs/btrfs/ctree.c @@ -255,7 +255,7 @@ int btrfs_copy_root(struct btrfs_trans_handle *trans, cow = btrfs_alloc_free_block(trans, root, buf->len, 0, new_root_objectid, &disk_key, level, - buf->start, 0, 1); + buf->start, 0); if (IS_ERR(cow)) return PTR_ERR(cow); @@ -467,7 +467,7 @@ static noinline int __btrfs_cow_block(struct btrfs_trans_handle *trans, cow = btrfs_alloc_free_block(trans, root, buf->len, parent_start, root->root_key.objectid, &disk_key, - level, search_start, empty_size, 1); + level, search_start, empty_size); if (IS_ERR(cow)) return PTR_ERR(cow); @@ -509,7 +509,7 @@ static noinline int __btrfs_cow_block(struct btrfs_trans_handle *trans, rcu_assign_pointer(root->node, cow); btrfs_free_tree_block(trans, root, buf, parent_start, - last_ref, 1); + last_ref); free_extent_buffer(buf); add_root_to_dirty_list(root); } else { @@ -525,7 +525,7 @@ static noinline int __btrfs_cow_block(struct btrfs_trans_handle *trans, trans->transid); btrfs_mark_buffer_dirty(parent); btrfs_free_tree_block(trans, root, buf, parent_start, - last_ref, 1); + last_ref); } if (unlock_orig) btrfs_tree_unlock(buf); @@ -987,7 +987,7 @@ static noinline int balance_level(struct btrfs_trans_handle *trans, free_extent_buffer(mid); root_sub_used(root, mid->len); - 
btrfs_free_tree_block(trans, root, mid, 0, 1, 0); + btrfs_free_tree_block(trans, root, mid, 0, 1); /* once for the root ptr */ free_extent_buffer_stale(mid); return 0; @@ -1042,7 +1042,7 @@ static noinline int balance_level(struct btrfs_trans_handle *trans, btrfs_tree_unlock(right); del_ptr(trans, root, path, level + 1, pslot + 1); root_sub_used(root, right->len); - btrfs_free_tree_block(trans, root, right, 0, 1, 0); + btrfs_free_tree_block(trans, root, right, 0, 1); free_extent_buffer_stale(right); right = NULL; } else { @@ -1084,7 +1084,7 @@ static noinline int balance_level(struct btrfs_trans_handle *trans, btrfs_tree_unlock(mid); del_ptr(trans, root, path, level + 1, pslot); root_sub_used(root, mid->len); - btrfs_free_tree_block(trans, root, mid, 0, 1, 0); + btrfs_free_tree_block(trans, root, mid, 0, 1); free_extent_buffer_stale(mid); mid = NULL; } else { @@ -2129,7 +2129,7 @@ static noinline int insert_new_root(struct btrfs_trans_handle *trans, c = btrfs_alloc_free_block(trans, root, root->nodesize, 0, root->root_key.objectid, &lower_key, - level, root->node->start, 0, 0); + level, root->node->start, 0); if (IS_ERR(c)) return PTR_ERR(c); @@ -2252,7 +2252,7 @@ static noinline int split_node(struct btrfs_trans_handle *trans, split = btrfs_alloc_free_block(trans, root, root->nodesize, 0, root->root_key.objectid, - &disk_key, level, c->start, 0, 0); + &disk_key, level, c->start, 0); if (IS_ERR(split)) return PTR_ERR(split); @@ -3004,7 +3004,7 @@ again: right = btrfs_alloc_free_block(trans, root, root->leafsize, 0, root->root_key.objectid, - &disk_key, 0, l->start, 0, 0); + &disk_key, 0, l->start, 0); if (IS_ERR(right)) return PTR_ERR(right); @@ -3804,7 +3804,7 @@ static noinline void btrfs_del_leaf(struct btrfs_trans_handle *trans, root_sub_used(root, leaf->len); extent_buffer_get(leaf); - btrfs_free_tree_block(trans, root, leaf, 0, 1, 0); + btrfs_free_tree_block(trans, root, leaf, 0, 1); free_extent_buffer_stale(leaf); } /* diff --git a/fs/btrfs/ctree.h 
b/fs/btrfs/ctree.h index ec42a24..e863188 100644 --- a/fs/btrfs/ctree.h +++ b/fs/btrfs/ctree.h @@ -2496,11 +2496,11 @@ struct extent_buffer *btrfs_alloc_free_block(struct btrfs_trans_handle *trans, struct btrfs_root *root, u32 blocksize, u64 parent, u64 root_objectid, struct btrfs_disk_key *key, int level, - u64 hint, u64 empty_size, int for_cow); + u64 hint, u64 empty_size); void btrfs_free_tree_block(struct btrfs_trans_handle *trans, struct btrfs_root *root, struct extent_buffer *buf, - u64 parent, int last_ref, int for_cow); + u64 parent, int last_ref); struct extent_buffer *btrfs_init_new_buffer(struct btrfs_trans_handle *trans, struct btrfs_root *root, u64 bytenr, u32 blocksize, diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c index a7ffc88..f433074 100644 --- a/fs/btrfs/disk-io.c +++ b/fs/btrfs/disk-io.c @@ -1252,7 +1252,7 @@ static struct btrfs_root *alloc_log_tree(struct btrfs_trans_handle *trans, leaf = btrfs_alloc_free_block(trans, root, root->leafsize, 0, BTRFS_TREE_LOG_OBJECTID, NULL, - 0, 0, 0, 0); + 0, 0, 0); if (IS_ERR(leaf)) { kfree(root); return ERR_CAST(leaf); diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c index 49fd7b6..b68eb7a 100644 --- a/fs/btrfs/extent-tree.c +++ b/fs/btrfs/extent-tree.c @@ -5217,7 +5217,7 @@ out: void btrfs_free_tree_block(struct btrfs_trans_handle *trans, struct btrfs_root *root, struct extent_buffer *buf, - u64 parent, int last_ref, int for_cow) + u64 parent, int last_ref) { struct btrfs_block_group_cache *cache = NULL; int ret; @@ -5227,7 +5227,7 @@ void btrfs_free_tree_block(struct btrfs_trans_handle *trans, buf->start, buf->len, parent, root->root_key.objectid, btrfs_header_level(buf), - BTRFS_DROP_DELAYED_REF, NULL, for_cow); + BTRFS_DROP_DELAYED_REF, NULL, 0); BUG_ON(ret); /* -ENOMEM */ } @@ -6249,7 +6249,7 @@ struct extent_buffer *btrfs_alloc_free_block(struct btrfs_trans_handle *trans, struct btrfs_root *root, u32 blocksize, u64 parent, u64 root_objectid, struct btrfs_disk_key *key, int level, - 
u64 hint, u64 empty_size, int for_cow) + u64 hint, u64 empty_size) { struct btrfs_key ins; struct btrfs_block_rsv *block_rsv; @@ -6297,7 +6297,7 @@ struct extent_buffer *btrfs_alloc_free_block(struct btrfs_trans_handle *trans, ins.objectid, ins.offset, parent, root_objectid, level, BTRFS_ADD_DELAYED_EXTENT, - extent_op, for_cow); + extent_op, 0); BUG_ON(ret); /* -ENOMEM */ } return buf; @@ -6715,7 +6715,7 @@ static noinline int walk_up_proc(struct btrfs_trans_handle *trans, btrfs_header_owner(path->nodes[level + 1])); } - btrfs_free_tree_block(trans, root, eb, parent, wc->refs[level] == 1, 0); + btrfs_free_tree_block(trans, root, eb, parent, wc->refs[level] == 1); out: wc->refs[level] = 0; wc->flags[level] = 0; diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c index 14f8e1f..7f3a913 100644 --- a/fs/btrfs/ioctl.c +++ b/fs/btrfs/ioctl.c @@ -367,7 +367,7 @@ static noinline int create_subvol(struct btrfs_root *root, return PTR_ERR(trans); leaf = btrfs_alloc_free_block(trans, root, root->leafsize, - 0, objectid, NULL, 0, 0, 0, 0); + 0, objectid, NULL, 0, 0, 0); if (IS_ERR(leaf)) { ret = PTR_ERR(leaf); goto fail; -- 1.7.3.4 -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
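[Editor's note] The rationale for the parameter removal above: allocating or freeing a tree block always changes how many blocks a tree references, so the resulting delayed refs must never be exempted from the accounting that for_cow = 1 implies. A minimal userspace sketch of the refactoring pattern, pinning the constant at the lowest level; the function names here are illustrative stand-ins, not the real btrfs API:

```c
/* Hypothetical stand-in for the delayed-ref machinery: a for_cow ref
 * would be exempt from seq/qgroup accounting, which is never correct
 * for block (de)allocation. */
static int add_delayed_ref(unsigned long long bytenr, int for_cow)
{
    (void)bytenr;
    return for_cow ? -1 : 0;   /* reject the exempted variant */
}

/* After the patch: callers cannot pass for_cow at all; the wrapper
 * pins it to 0 internally, exactly once. */
static int free_tree_block(unsigned long long bytenr)
{
    return add_delayed_ref(bytenr, 0 /* never for_cow */);
}
```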
Signed-off-by: Jan Schmidt <list.btrfs@jan-o-sch.net> --- fs/btrfs/ctree.h | 7 +++++++ fs/btrfs/delayed-ref.h | 5 ----- 2 files changed, 7 insertions(+), 5 deletions(-) diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h index e863188..e0da6db 100644 --- a/fs/btrfs/ctree.h +++ b/fs/btrfs/ctree.h @@ -3098,4 +3098,11 @@ void btrfs_reada_detach(void *handle); int btree_readahead_hook(struct btrfs_root *root, struct extent_buffer *eb, u64 start, int err); +/* delayed seq elem */ +struct seq_list { + struct list_head list; + u64 seq; + u32 flags; +}; + #endif diff --git a/fs/btrfs/delayed-ref.h b/fs/btrfs/delayed-ref.h index d8f244d..fd82446 100644 --- a/fs/btrfs/delayed-ref.h +++ b/fs/btrfs/delayed-ref.h @@ -195,11 +195,6 @@ int btrfs_delayed_ref_lock(struct btrfs_trans_handle *trans, int btrfs_find_ref_cluster(struct btrfs_trans_handle *trans, struct list_head *cluster, u64 search_start); -struct seq_list { - struct list_head list; - u64 seq; -}; - static inline u64 inc_delayed_seq(struct btrfs_delayed_ref_root *delayed_refs) { assert_spin_locked(&delayed_refs->lock); -- 1.7.3.4 -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
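[Editor's note] The seq_list element moved to ctree.h above is what a backref walker holds while it needs a stable view of the tree; the minimum sequence over all held elements later decides how much of the modification log may be pruned (patch 07). A toy userspace sketch of that idea, with a fixed array standing in for the kernel's list_head and all names illustrative:

```c
#include <stdint.h>

/* Each walker registers an element carrying the sequence it started
 * at; log entries at or above the minimum registered sequence must be
 * kept.  MAX_ELEMS and the array are illustrative simplifications. */
#define MAX_ELEMS 16

struct seq_elem { uint64_t seq; int active; };

static struct seq_elem elems[MAX_ELEMS];
static uint64_t tree_mod_seq;

static uint64_t get_seq(int i)          /* ~ btrfs_get_tree_mod_seq */
{
    elems[i].seq = ++tree_mod_seq;
    elems[i].active = 1;
    return elems[i].seq;
}

static void put_seq(int i)              /* ~ btrfs_put_tree_mod_seq */
{
    elems[i].active = 0;
}

/* smallest sequence still blocked; anything older may be pruned */
static uint64_t min_blocked_seq(void)
{
    uint64_t min = UINT64_MAX;
    for (int i = 0; i < MAX_ELEMS; i++)
        if (elems[i].active && elems[i].seq < min)
            min = elems[i].seq;
    return min;
}
```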
Jan Schmidt
2012-May-20 16:06 UTC
[PATCH 05/24] Btrfs: dummy extent buffers for tree mod log
The tree modification log needs two ways to create dummy extent buffers, once by allocating a fresh one (to rebuild an old root) and once by cloning an existing one (to make private rewind modifications) to it. Signed-off-by: Jan Schmidt <list.btrfs@jan-o-sch.net> --- fs/btrfs/extent_io.c | 73 +++++++++++++++++++++++++++++++++++++++++++++---- fs/btrfs/extent_io.h | 3 ++ 2 files changed, 70 insertions(+), 6 deletions(-) diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c index 2fb52c2..4f4bcaa 100644 --- a/fs/btrfs/extent_io.c +++ b/fs/btrfs/extent_io.c @@ -3967,6 +3967,58 @@ static struct extent_buffer *__alloc_extent_buffer(struct extent_io_tree *tree, return eb; } +struct extent_buffer *btrfs_clone_extent_buffer(struct extent_buffer *src) +{ + unsigned long i; + struct page *p; + struct extent_buffer *new; + unsigned long num_pages = num_extent_pages(src->start, src->len); + + new = __alloc_extent_buffer(NULL, src->start, src->len, GFP_ATOMIC); + if (new == NULL) + return NULL; + + for (i = 0; i < num_pages; i++) { + p = alloc_page(GFP_ATOMIC); + BUG_ON(!p); + attach_extent_buffer_page(new, p); + WARN_ON(PageDirty(p)); + SetPageUptodate(p); + new->pages[i] = p; + } + + copy_extent_buffer(new, src, 0, 0, src->len); + set_bit(EXTENT_BUFFER_UPTODATE, &new->bflags); + + return new; +} + +struct extent_buffer *alloc_dummy_extent_buffer(u64 start, unsigned long len) +{ + struct extent_buffer *eb; + unsigned long num_pages = num_extent_pages(0, len); + unsigned long i; + + eb = __alloc_extent_buffer(NULL, start, len, GFP_ATOMIC); + if (!eb) + return NULL; + + for (i = 0; i < num_pages; i++) { + eb->pages[i] = alloc_page(GFP_ATOMIC); + if (!eb->pages[i]) + goto err; + } + set_extent_buffer_uptodate(eb); + btrfs_set_header_nritems(eb, 0); + + return eb; +err: + for (i--; i > 0; i--) + __free_page(eb->pages[i]); + __free_extent_buffer(eb); + return NULL; +} + static int extent_buffer_under_io(struct extent_buffer *eb) { return (atomic_read(&eb->io_pages) || @@ -3978,7 
+4030,8 @@ static int extent_buffer_under_io(struct extent_buffer *eb) * Helper for releasing extent buffer page. */ static void btrfs_release_extent_buffer_page(struct extent_buffer *eb, - unsigned long start_idx) + unsigned long start_idx, + int mapped) { unsigned long index; struct page *page; @@ -3992,7 +4045,7 @@ static void btrfs_release_extent_buffer_page(struct extent_buffer *eb, do { index--; page = extent_buffer_page(eb, index); - if (page) { + if (page && mapped) { spin_lock(&page->mapping->private_lock); /* * We do this since we''ll remove the pages after we''ve @@ -4017,6 +4070,8 @@ static void btrfs_release_extent_buffer_page(struct extent_buffer *eb, } spin_unlock(&page->mapping->private_lock); + } + if (page) { /* One for when we alloced the page */ page_cache_release(page); } @@ -4026,12 +4081,18 @@ static void btrfs_release_extent_buffer_page(struct extent_buffer *eb, /* * Helper for releasing the extent buffer. */ -static inline void btrfs_release_extent_buffer(struct extent_buffer *eb) +static inline void btrfs_release_extent_buffer(struct extent_buffer *eb, + int mapped) { - btrfs_release_extent_buffer_page(eb, 0); + btrfs_release_extent_buffer_page(eb, 0, mapped); __free_extent_buffer(eb); } +void free_cloned_extent_buffer(struct extent_buffer *eb) +{ + btrfs_release_extent_buffer(eb, 0); +} + static void check_buffer_tree_ref(struct extent_buffer *eb) { /* the ref bit is tricky. 
We have to make sure it is set @@ -4201,7 +4262,7 @@ free_eb: } WARN_ON(!atomic_dec_and_test(&eb->refs)); - btrfs_release_extent_buffer(eb); + btrfs_release_extent_buffer(eb, 1); return exists; } @@ -4245,7 +4306,7 @@ static void release_extent_buffer(struct extent_buffer *eb, gfp_t mask) spin_unlock(&tree->buffer_lock); /* Should be safe to release our pages at this point */ - btrfs_release_extent_buffer_page(eb, 0); + btrfs_release_extent_buffer_page(eb, 0, 1); call_rcu(&eb->rcu_head, btrfs_release_extent_buffer_rcu); return; diff --git a/fs/btrfs/extent_io.h b/fs/btrfs/extent_io.h index b516c3b..80dd65a 100644 --- a/fs/btrfs/extent_io.h +++ b/fs/btrfs/extent_io.h @@ -265,6 +265,9 @@ void set_page_extent_mapped(struct page *page); struct extent_buffer *alloc_extent_buffer(struct extent_io_tree *tree, u64 start, unsigned long len); +struct extent_buffer *alloc_dummy_extent_buffer(u64 start, unsigned long len); +struct extent_buffer *btrfs_clone_extent_buffer(struct extent_buffer *src); +void free_cloned_extent_buffer(struct extent_buffer *eb); struct extent_buffer *find_extent_buffer(struct extent_io_tree *tree, u64 start, unsigned long len); void free_extent_buffer(struct extent_buffer *eb); -- 1.7.3.4 -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
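[Editor's note] One possible review nit on alloc_dummy_extent_buffer() above: the error loop `for (i--; i > 0; i--)` stops before index 0 and so appears to leak the first page when a later allocation fails. A self-contained userspace sketch of the clone-with-cleanup pattern whose cleanup does include page 0; PAGE_SZ and all struct/function names are illustrative, not the btrfs API:

```c
#include <stdlib.h>
#include <string.h>

#define PAGE_SZ 4096

struct dummy_buf {
    size_t len;
    size_t num_pages;           /* pages successfully allocated */
    char *pages[8];
};

static void free_dummy_buf(struct dummy_buf *b)
{
    /* frees every allocated page, index 0 included */
    for (size_t i = 0; i < b->num_pages; i++)
        free(b->pages[i]);
    free(b);
}

static struct dummy_buf *clone_dummy_buf(const struct dummy_buf *src)
{
    struct dummy_buf *new = calloc(1, sizeof(*new));
    if (!new)
        return NULL;
    new->len = src->len;
    for (new->num_pages = 0; new->num_pages < src->num_pages;
         new->num_pages++) {
        size_t i = new->num_pages;
        new->pages[i] = malloc(PAGE_SZ);
        if (!new->pages[i]) {
            free_dummy_buf(new);        /* frees pages [0, i) */
            return NULL;
        }
        memcpy(new->pages[i], src->pages[i], PAGE_SZ);
    }
    return new;
}
```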
Signed-off-by: Jan Schmidt <list.btrfs@jan-o-sch.net> --- fs/btrfs/ctree.h | 9 +++++++++ fs/btrfs/disk-io.c | 5 +++++ 2 files changed, 14 insertions(+), 0 deletions(-) diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h index e0da6db..6774821 100644 --- a/fs/btrfs/ctree.h +++ b/fs/btrfs/ctree.h @@ -1129,6 +1129,15 @@ struct btrfs_fs_info { spinlock_t delayed_iput_lock; struct list_head delayed_iputs; + /* this protects tree_mod_seq_list */ + spinlock_t tree_mod_seq_lock; + atomic_t tree_mod_seq; + struct list_head tree_mod_list; + + /* this protects tree_mod_log */ + rwlock_t tree_mod_log_lock; + struct rb_root tree_mod_log; + atomic_t nr_async_submits; atomic_t async_submit_draining; atomic_t nr_async_bios; diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c index f433074..6aec7c6 100644 --- a/fs/btrfs/disk-io.c +++ b/fs/btrfs/disk-io.c @@ -1914,11 +1914,14 @@ int open_ctree(struct super_block *sb, spin_lock_init(&fs_info->delayed_iput_lock); spin_lock_init(&fs_info->defrag_inodes_lock); spin_lock_init(&fs_info->free_chunk_lock); + spin_lock_init(&fs_info->tree_mod_seq_lock); + rwlock_init(&fs_info->tree_mod_log_lock); mutex_init(&fs_info->reloc_mutex); init_completion(&fs_info->kobj_unregister); INIT_LIST_HEAD(&fs_info->dirty_cowonly_roots); INIT_LIST_HEAD(&fs_info->space_info); + INIT_LIST_HEAD(&fs_info->tree_mod_list); btrfs_mapping_init(&fs_info->mapping_tree); btrfs_init_block_rsv(&fs_info->global_block_rsv); btrfs_init_block_rsv(&fs_info->delalloc_block_rsv); @@ -1931,12 +1934,14 @@ int open_ctree(struct super_block *sb, atomic_set(&fs_info->async_submit_draining, 0); atomic_set(&fs_info->nr_async_bios, 0); atomic_set(&fs_info->defrag_running, 0); + atomic_set(&fs_info->tree_mod_seq, 0); fs_info->sb = sb; fs_info->max_inline = 8192 * 1024; fs_info->metadata_ratio = 0; fs_info->defrag_inodes = RB_ROOT; fs_info->trans_no_join = 0; fs_info->free_chunk_space = 0; + fs_info->tree_mod_log = RB_ROOT; /* readahead state */ INIT_RADIX_TREE(&fs_info->reada_tree, 
GFP_NOFS & ~__GFP_WAIT);
Jan Schmidt
2012-May-20 16:06 UTC
[PATCH 07/24] Btrfs: add tree modification log functions
The tree mod log will log modifications made to fs-tree nodes. Most modifications are done by autobalance of the tree. Such changes are recorded as long as a block entry exists. When released, the log is cleaned. With the tree modification log, it's possible to reconstruct a consistent old state of the tree. This is required to do backref walking on a busy file system. Signed-off-by: Jan Schmidt <list.btrfs@jan-o-sch.net> --- fs/btrfs/ctree.c | 409 ++++++++++++++++++++++++++++++++++++++++++++++++++++ fs/btrfs/ctree.h | 7 +- fs/btrfs/disk-io.c | 2 +- 3 files changed, 416 insertions(+), 2 deletions(-) diff --git a/fs/btrfs/ctree.c b/fs/btrfs/ctree.c index 56485b3..6420638 100644 --- a/fs/btrfs/ctree.c +++ b/fs/btrfs/ctree.c @@ -18,6 +18,7 @@ #include <linux/sched.h> #include <linux/slab.h> +#include <linux/rbtree.h> #include "ctree.h" #include "disk-io.h" #include "transaction.h" @@ -288,6 +289,414 @@ int btrfs_copy_root(struct btrfs_trans_handle *trans, return 0; } +enum mod_log_op { + MOD_LOG_KEY_REPLACE, + MOD_LOG_KEY_ADD, + MOD_LOG_KEY_REMOVE, + MOD_LOG_KEY_REMOVE_WHILE_FREEING, + MOD_LOG_MOVE_KEYS, + MOD_LOG_ROOT_REPLACE, +}; + +struct tree_mod_move { + int dst_slot; + int nr_items; +}; + +struct tree_mod_root { + u64 logical; + u8 level; +}; + +struct tree_mod_elem { + struct rb_node node; + u64 index; /* shifted logical */ + struct seq_list elem; + enum mod_log_op op; + + /* this is used for MOD_LOG_KEY_* and MOD_LOG_MOVE_KEYS operations */ + int slot; + + /* this is used for MOD_LOG_KEY* and MOD_LOG_ROOT_REPLACE */ + u64 generation; + + /* those are used for op == MOD_LOG_KEY_{REPLACE,REMOVE} */ + struct btrfs_disk_key key; + u64 blockptr; + + /* this is used for op == MOD_LOG_MOVE_KEYS */ + struct tree_mod_move move; + + /* this is used for op == MOD_LOG_ROOT_REPLACE */ + struct tree_mod_root old_root; +}; + +static inline void +__get_tree_mod_seq(struct btrfs_fs_info *fs_info, struct seq_list *elem) +{ + elem->seq = atomic_inc_return(&fs_info->tree_mod_seq);
+ list_add_tail(&elem->list, &fs_info->tree_mod_seq_list); +} + +void btrfs_get_tree_mod_seq(struct btrfs_fs_info *fs_info, + struct seq_list *elem) +{ + elem->flags = 1; + spin_lock(&fs_info->tree_mod_seq_lock); + __get_tree_mod_seq(fs_info, elem); + spin_unlock(&fs_info->tree_mod_seq_lock); +} + +void btrfs_put_tree_mod_seq(struct btrfs_fs_info *fs_info, + struct seq_list *elem) +{ + struct rb_root *tm_root; + struct rb_node *node; + struct rb_node *next; + struct seq_list *cur_elem; + struct tree_mod_elem *tm; + u64 min_seq = (u64)-1; + u64 seq_putting = elem->seq; + + if (!seq_putting) + return; + + BUG_ON(!(elem->flags & 1)); + spin_lock(&fs_info->tree_mod_seq_lock); + list_del(&elem->list); + + list_for_each_entry(cur_elem, &fs_info->tree_mod_seq_list, list) { + if ((cur_elem->flags & 1) && cur_elem->seq < min_seq) { + if (seq_putting > cur_elem->seq) { + /* + * blocker with lower sequence number exists, we + * cannot remove anything from the log + */ + goto out; + } + min_seq = cur_elem->seq; + } + } + + /* + * anything that''s lower than the lowest existing (read: blocked) + * sequence number can be removed from the tree. + */ + write_lock(&fs_info->tree_mod_log_lock); + tm_root = &fs_info->tree_mod_log; + for (node = rb_first(tm_root); node; node = next) { + next = rb_next(node); + tm = container_of(node, struct tree_mod_elem, node); + if (tm->elem.seq > min_seq) + continue; + rb_erase(node, tm_root); + list_del(&tm->elem.list); + kfree(tm); + } + write_unlock(&fs_info->tree_mod_log_lock); +out: + spin_unlock(&fs_info->tree_mod_seq_lock); +} + +/* + * key order of the log: + * index -> sequence + * + * the index is the shifted logical of the *new* root node for root replace + * operations, or the shifted logical of the affected block for all other + * operations. 
+ */ +static noinline int +__tree_mod_log_insert(struct btrfs_fs_info *fs_info, struct tree_mod_elem *tm) +{ + struct rb_root *tm_root; + struct rb_node **new; + struct rb_node *parent = NULL; + struct tree_mod_elem *cur; + + BUG_ON(!tm || !tm->elem.seq); + + write_lock(&fs_info->tree_mod_log_lock); + tm_root = &fs_info->tree_mod_log; + new = &tm_root->rb_node; + while (*new) { + cur = container_of(*new, struct tree_mod_elem, node); + parent = *new; + if (cur->index < tm->index) + new = &((*new)->rb_left); + else if (cur->index > tm->index) + new = &((*new)->rb_right); + else if (cur->elem.seq < tm->elem.seq) + new = &((*new)->rb_left); + else if (cur->elem.seq > tm->elem.seq) + new = &((*new)->rb_right); + else { + kfree(tm); + return -EEXIST; + } + } + + rb_link_node(&tm->node, parent, new); + rb_insert_color(&tm->node, tm_root); + write_unlock(&fs_info->tree_mod_log_lock); + + return 0; +} + +int tree_mod_alloc(struct btrfs_fs_info *fs_info, gfp_t flags, + struct tree_mod_elem **tm_ret) +{ + struct tree_mod_elem *tm; + u64 seq = 0; + + /* + * we want to avoid a malloc/free cycle if there''s no blocker in the + * list. + * we also want to avoid atomic malloc. so we must drop the spinlock + * before calling kzalloc and recheck afterwards. 
+ */ + spin_lock(&fs_info->tree_mod_seq_lock); + if (list_empty(&fs_info->tree_mod_seq_list)) + goto out; + + spin_unlock(&fs_info->tree_mod_seq_lock); + tm = *tm_ret = kzalloc(sizeof(*tm), flags); + if (!tm) + return -ENOMEM; + + spin_lock(&fs_info->tree_mod_seq_lock); + if (list_empty(&fs_info->tree_mod_seq_list)) { + kfree(tm); + goto out; + } + + __get_tree_mod_seq(fs_info, &tm->elem); + seq = tm->elem.seq; + tm->elem.flags = 0; + +out: + spin_unlock(&fs_info->tree_mod_seq_lock); + return seq; +} + +static noinline int +tree_mod_log_insert_key_mask(struct btrfs_fs_info *fs_info, + struct extent_buffer *eb, int slot, + enum mod_log_op op, gfp_t flags) +{ + struct tree_mod_elem *tm; + int ret; + + ret = tree_mod_alloc(fs_info, flags, &tm); + if (ret <= 0) + return ret; + + tm->index = eb->start >> PAGE_CACHE_SHIFT; + if (op != MOD_LOG_KEY_ADD) { + btrfs_node_key(eb, &tm->key, slot); + tm->blockptr = btrfs_node_blockptr(eb, slot); + } + tm->op = op; + tm->slot = slot; + tm->generation = btrfs_node_ptr_generation(eb, slot); + + return __tree_mod_log_insert(fs_info, tm); +} + +static noinline int +tree_mod_log_insert_key(struct btrfs_fs_info *fs_info, struct extent_buffer *eb, + int slot, enum mod_log_op op) +{ + return tree_mod_log_insert_key_mask(fs_info, eb, slot, op, GFP_NOFS); +} + +static noinline int +tree_mod_log_insert_move(struct btrfs_fs_info *fs_info, + struct extent_buffer *eb, int dst_slot, int src_slot, + int nr_items, gfp_t flags) +{ + struct tree_mod_elem *tm; + int ret; + + ret = tree_mod_alloc(fs_info, flags, &tm); + if (ret <= 0) + return ret; + + tm->index = eb->start >> PAGE_CACHE_SHIFT; + tm->slot = src_slot; + tm->move.dst_slot = dst_slot; + tm->move.nr_items = nr_items; + tm->op = MOD_LOG_MOVE_KEYS; + + return __tree_mod_log_insert(fs_info, tm); +} + +static noinline int +tree_mod_log_insert_root(struct btrfs_fs_info *fs_info, + struct extent_buffer *old_root, + struct extent_buffer *new_root, gfp_t flags) +{ + struct tree_mod_elem *tm; + 
int ret; + + ret = tree_mod_alloc(fs_info, flags, &tm); + if (ret <= 0) + return ret; + + tm->index = new_root->start >> PAGE_CACHE_SHIFT; + tm->old_root.logical = old_root->start; + tm->old_root.level = btrfs_header_level(old_root); + tm->generation = btrfs_header_generation(old_root); + tm->op = MOD_LOG_ROOT_REPLACE; + + return __tree_mod_log_insert(fs_info, tm); +} + +static struct tree_mod_elem * +__tree_mod_log_search(struct btrfs_fs_info *fs_info, u64 start, u64 min_seq, + int smallest) +{ + struct rb_root *tm_root; + struct rb_node *node; + struct tree_mod_elem *cur = NULL; + struct tree_mod_elem *found = NULL; + u64 index = start >> PAGE_CACHE_SHIFT; + + read_lock(&fs_info->tree_mod_log_lock); + tm_root = &fs_info->tree_mod_log; + node = tm_root->rb_node; + while (node) { + cur = container_of(node, struct tree_mod_elem, node); + if (cur->index < index) { + node = node->rb_left; + } else if (cur->index > index) { + node = node->rb_right; + } else if (cur->elem.seq < min_seq) { + node = node->rb_left; + } else if (!smallest) { + /* we want the node with the highest seq */ + if (found) + BUG_ON(found->elem.seq > cur->elem.seq); + found = cur; + node = node->rb_left; + } else if (cur->elem.seq > min_seq) { + /* we want the node with the smallest seq */ + if (found) + BUG_ON(found->elem.seq < cur->elem.seq); + found = cur; + node = node->rb_right; + } else { + return cur; + } + } + read_unlock(&fs_info->tree_mod_log_lock); + + return found; +} + +/* + * this returns the element from the log with the smallest time sequence + * value that''s in the log (the oldest log item). any element with a time + * sequence lower than min_seq will be ignored. + */ +static struct tree_mod_elem * +tree_mod_log_search_oldest(struct btrfs_fs_info *fs_info, u64 start, + u64 min_seq) +{ + return __tree_mod_log_search(fs_info, start, min_seq, 1); +} + +/* + * this returns the element from the log with the largest time sequence + * value that''s in the log (the most recent log item). 
any element with + * a time sequence lower than min_seq will be ignored. + */ +static struct tree_mod_elem * +tree_mod_log_search(struct btrfs_fs_info *fs_info, u64 start, u64 min_seq) +{ + return __tree_mod_log_search(fs_info, start, min_seq, 0); +} + +static inline void +__copy_extent_buffer_log(struct btrfs_fs_info *fs_info, + struct extent_buffer *dst, struct extent_buffer *src, + unsigned long dst_offset, unsigned long src_offset, + int nr_items, size_t item_size) +{ + int ret; + int i; + + /* speed this up by single seq for all operations? */ + for (i = 0; i < nr_items; i++) { + ret = tree_mod_log_insert_key(fs_info, src, i + src_offset, + MOD_LOG_KEY_REMOVE); + BUG_ON(ret < 0); + ret = tree_mod_log_insert_key(fs_info, dst, i + dst_offset, + MOD_LOG_KEY_ADD); + BUG_ON(ret < 0); + } + + copy_extent_buffer(dst, src, btrfs_node_key_ptr_offset(dst_offset), + btrfs_node_key_ptr_offset(src_offset), + nr_items * item_size); +} + +static inline void +__memmove_extent_buffer_log(struct btrfs_fs_info *fs_info, + struct extent_buffer *dst, + int dst_offset, int src_offset, int nr_items, + size_t item_size, int tree_mod_log) +{ + int ret; + if (tree_mod_log) { + ret = tree_mod_log_insert_move(fs_info, dst, dst_offset, + src_offset, nr_items, GFP_NOFS); + BUG_ON(ret < 0); + } + memmove_extent_buffer(dst, btrfs_node_key_ptr_offset(dst_offset), + btrfs_node_key_ptr_offset(src_offset), + nr_items * item_size); +} + +static inline void +__set_node_key_log(struct btrfs_fs_info *fs_info, struct extent_buffer *eb, + struct btrfs_disk_key *disk_key, int nr, int atomic) +{ + int ret; + + ret = tree_mod_log_insert_key_mask(fs_info, eb, nr, MOD_LOG_KEY_REPLACE, + atomic ? 
GFP_ATOMIC : GFP_NOFS); + BUG_ON(ret < 0); + + btrfs_set_node_key(eb, disk_key, nr); +} + +static void __log_cleaning(struct btrfs_fs_info *fs_info, + struct extent_buffer *eb) +{ + int i; + int ret; + u32 nritems; + + nritems = btrfs_header_nritems(eb); + for (i = nritems - 1; i >= 0; i--) { + ret = tree_mod_log_insert_key(fs_info, eb, i, + MOD_LOG_KEY_REMOVE_WHILE_FREEING); + BUG_ON(ret < 0); + } +} + +static inline void +set_root_pointer(struct btrfs_root *root, struct extent_buffer *new_root_node) +{ + int ret; + __log_cleaning(root->fs_info, root->node); + ret = tree_mod_log_insert_root(root->fs_info, root->node, + new_root_node, GFP_NOFS); + BUG_ON(ret < 0); + rcu_assign_pointer(root->node, new_root_node); +} + /* * check if the tree block can be shared by multiple trees */ diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h index 6774821..e53bfb9 100644 --- a/fs/btrfs/ctree.h +++ b/fs/btrfs/ctree.h @@ -1132,7 +1132,7 @@ struct btrfs_fs_info { /* this protects tree_mod_seq_list */ spinlock_t tree_mod_seq_lock; atomic_t tree_mod_seq; - struct list_head tree_mod_list; + struct list_head tree_mod_seq_list; /* this protects tree_mod_log */ rwlock_t tree_mod_log_lock; @@ -3114,4 +3114,9 @@ struct seq_list { u32 flags; }; +void btrfs_get_tree_mod_seq(struct btrfs_fs_info *fs_info, + struct seq_list *elem); +void btrfs_put_tree_mod_seq(struct btrfs_fs_info *fs_info, + struct seq_list *elem); + #endif diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c index 6aec7c6..f51ad84 100644 --- a/fs/btrfs/disk-io.c +++ b/fs/btrfs/disk-io.c @@ -1921,7 +1921,7 @@ int open_ctree(struct super_block *sb, init_completion(&fs_info->kobj_unregister); INIT_LIST_HEAD(&fs_info->dirty_cowonly_roots); INIT_LIST_HEAD(&fs_info->space_info); - INIT_LIST_HEAD(&fs_info->tree_mod_list); + INIT_LIST_HEAD(&fs_info->tree_mod_seq_list); btrfs_mapping_init(&fs_info->mapping_tree); btrfs_init_block_rsv(&fs_info->global_block_rsv); btrfs_init_block_rsv(&fs_info->delalloc_block_rsv); -- 1.7.3.4 -- To 
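[Editor's note] The log in patch 07 is keyed by (index, seq): for one block index, tree_mod_log_search() wants the entry with the largest seq still at or above min_seq, and tree_mod_log_search_oldest() the one with the smallest. A linear-scan userspace sketch of that lookup contract, with a plain array standing in for the kernel rbtree and illustrative names:

```c
#include <stdint.h>
#include <stddef.h>

struct mod_elem { uint64_t index; uint64_t seq; };

/* Return the entry for `index` with seq >= min_seq: the smallest such
 * seq if `smallest` is set (oldest relevant entry), else the largest
 * (most recent).  NULL if no entry qualifies. */
static const struct mod_elem *
log_search(const struct mod_elem *log, size_t n,
           uint64_t index, uint64_t min_seq, int smallest)
{
    const struct mod_elem *found = NULL;

    for (size_t i = 0; i < n; i++) {
        if (log[i].index != index || log[i].seq < min_seq)
            continue;
        if (!found ||
            (smallest ? log[i].seq < found->seq : log[i].seq > found->seq))
            found = &log[i];
    }
    return found;
}
```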
Jan Schmidt
2012-May-20 16:06 UTC
[PATCH 08/24] Btrfs: put all modifications into the tree mod log
When running functions that can make changes to the internal trees (e.g. btrfs_search_slot), we check if somebody may be interested in the block we''re currently modifying. If so, we record our modification to be able to rewind it later on. Signed-off-by: Jan Schmidt <list.btrfs@jan-o-sch.net> --- fs/btrfs/ctree.c | 126 ++++++++++++++++++++++++++++++++++-------------------- 1 files changed, 79 insertions(+), 47 deletions(-) diff --git a/fs/btrfs/ctree.c b/fs/btrfs/ctree.c index 6420638..724eade 100644 --- a/fs/btrfs/ctree.c +++ b/fs/btrfs/ctree.c @@ -38,7 +38,18 @@ static int balance_node_right(struct btrfs_trans_handle *trans, struct extent_buffer *dst_buf, struct extent_buffer *src_buf); static void del_ptr(struct btrfs_trans_handle *trans, struct btrfs_root *root, - struct btrfs_path *path, int level, int slot); + struct btrfs_path *path, int level, int slot, + int tree_mod_log); +static inline void set_root_pointer(struct btrfs_root *root, + struct extent_buffer *new_root_node); +static void __log_cleaning(struct btrfs_fs_info *fs_info, + struct extent_buffer *eb); +struct extent_buffer *read_old_tree_block(struct btrfs_root *root, u64 bytenr, + u32 blocksize, u64 parent_transid, + u64 time_seq); +struct extent_buffer *btrfs_find_old_tree_block(struct btrfs_root *root, + u64 bytenr, u32 blocksize, + u64 time_seq); struct btrfs_path *btrfs_alloc_path(void) { @@ -678,6 +689,9 @@ static void __log_cleaning(struct btrfs_fs_info *fs_info, int ret; u32 nritems; + if (btrfs_header_level(eb) == 0) + return; + nritems = btrfs_header_nritems(eb); for (i = nritems - 1; i >= 0; i--) { ret = tree_mod_log_insert_key(fs_info, eb, i, @@ -818,6 +832,9 @@ static noinline int update_ref_for_cow(struct btrfs_trans_handle *trans, ret = btrfs_dec_ref(trans, root, buf, 1, 1); BUG_ON(ret); /* -ENOMEM */ } + /* the root node will be logged in set_root_pointer later */ + if (buf != root->node && btrfs_header_level(buf) != 0) + __log_cleaning(root->fs_info, buf); clean_tree_block(trans, 
root, buf); *last_ref = 1; } @@ -915,7 +932,7 @@ static noinline int __btrfs_cow_block(struct btrfs_trans_handle *trans, parent_start = 0; extent_buffer_get(cow); - rcu_assign_pointer(root->node, cow); + set_root_pointer(root, cow); btrfs_free_tree_block(trans, root, buf, parent_start, last_ref); @@ -928,6 +945,8 @@ static noinline int __btrfs_cow_block(struct btrfs_trans_handle *trans, parent_start = 0; WARN_ON(trans->transid != btrfs_header_generation(parent)); + tree_mod_log_insert_key(root->fs_info, parent, parent_slot, + MOD_LOG_KEY_REPLACE); btrfs_set_node_blockptr(parent, parent_slot, cow->start); btrfs_set_node_ptr_generation(parent, parent_slot, @@ -1383,7 +1402,9 @@ static noinline int balance_level(struct btrfs_trans_handle *trans, goto enospc; } - rcu_assign_pointer(root->node, child); + ret = tree_mod_log_insert_key(root->fs_info, mid, 0, + MOD_LOG_KEY_REMOVE); + set_root_pointer(root, child); add_root_to_dirty_list(root); btrfs_tree_unlock(child); @@ -1449,7 +1470,7 @@ static noinline int balance_level(struct btrfs_trans_handle *trans, if (btrfs_header_nritems(right) == 0) { clean_tree_block(trans, root, right); btrfs_tree_unlock(right); - del_ptr(trans, root, path, level + 1, pslot + 1); + del_ptr(trans, root, path, level + 1, pslot + 1, 1); root_sub_used(root, right->len); btrfs_free_tree_block(trans, root, right, 0, 1); free_extent_buffer_stale(right); @@ -1457,7 +1478,8 @@ static noinline int balance_level(struct btrfs_trans_handle *trans, } else { struct btrfs_disk_key right_key; btrfs_node_key(right, &right_key, 0); - btrfs_set_node_key(parent, &right_key, pslot + 1); + __set_node_key_log(root->fs_info, parent, + &right_key, pslot + 1, 0); btrfs_mark_buffer_dirty(parent); } } @@ -1491,7 +1513,7 @@ static noinline int balance_level(struct btrfs_trans_handle *trans, if (btrfs_header_nritems(mid) == 0) { clean_tree_block(trans, root, mid); btrfs_tree_unlock(mid); - del_ptr(trans, root, path, level + 1, pslot); + del_ptr(trans, root, path, level + 
1, pslot, 1); root_sub_used(root, mid->len); btrfs_free_tree_block(trans, root, mid, 0, 1); free_extent_buffer_stale(mid); @@ -1500,7 +1522,7 @@ static noinline int balance_level(struct btrfs_trans_handle *trans, /* update the parent key to reflect our changes */ struct btrfs_disk_key mid_key; btrfs_node_key(mid, &mid_key, 0); - btrfs_set_node_key(parent, &mid_key, pslot); + __set_node_key_log(root->fs_info, parent, &mid_key, pslot, 0); btrfs_mark_buffer_dirty(parent); } @@ -1597,7 +1619,8 @@ static noinline int push_nodes_for_insert(struct btrfs_trans_handle *trans, struct btrfs_disk_key disk_key; orig_slot += left_nr; btrfs_node_key(mid, &disk_key, 0); - btrfs_set_node_key(parent, &disk_key, pslot); + __set_node_key_log(root->fs_info, parent, &disk_key, + pslot, 0); btrfs_mark_buffer_dirty(parent); if (btrfs_header_nritems(left) > orig_slot) { path->nodes[level] = left; @@ -1648,7 +1671,8 @@ static noinline int push_nodes_for_insert(struct btrfs_trans_handle *trans, struct btrfs_disk_key disk_key; btrfs_node_key(right, &disk_key, 0); - btrfs_set_node_key(parent, &disk_key, pslot + 1); + __set_node_key_log(root->fs_info, parent, &disk_key, + pslot + 1, 0); btrfs_mark_buffer_dirty(parent); if (btrfs_header_nritems(mid) <= orig_slot) { @@ -2350,7 +2374,7 @@ static void fixup_low_keys(struct btrfs_trans_handle *trans, if (!path->nodes[i]) break; t = path->nodes[i]; - btrfs_set_node_key(t, key, tslot); + __set_node_key_log(root->fs_info, t, key, tslot, 1); btrfs_mark_buffer_dirty(path->nodes[i]); if (tslot != 0) break; @@ -2432,16 +2456,14 @@ static int push_node_left(struct btrfs_trans_handle *trans, } else push_items = min(src_nritems - 8, push_items); - copy_extent_buffer(dst, src, - btrfs_node_key_ptr_offset(dst_nritems), - btrfs_node_key_ptr_offset(0), - push_items * sizeof(struct btrfs_key_ptr)); + + __copy_extent_buffer_log(root->fs_info, dst, src, dst_nritems, 0, + push_items, sizeof(struct btrfs_key_ptr)); if (push_items < src_nritems) { - 
memmove_extent_buffer(src, btrfs_node_key_ptr_offset(0), - btrfs_node_key_ptr_offset(push_items), - (src_nritems - push_items) * - sizeof(struct btrfs_key_ptr)); + __memmove_extent_buffer_log(root->fs_info, src, 0, push_items, + src_nritems - push_items, + sizeof(struct btrfs_key_ptr), 1); } btrfs_set_header_nritems(src, src_nritems - push_items); btrfs_set_header_nritems(dst, dst_nritems + push_items); @@ -2491,15 +2513,13 @@ static int balance_node_right(struct btrfs_trans_handle *trans, if (max_push < push_items) push_items = max_push; - memmove_extent_buffer(dst, btrfs_node_key_ptr_offset(push_items), - btrfs_node_key_ptr_offset(0), - (dst_nritems) * - sizeof(struct btrfs_key_ptr)); + __memmove_extent_buffer_log(root->fs_info, dst, push_items, 0, + dst_nritems, + sizeof(struct btrfs_key_ptr), 1); - copy_extent_buffer(dst, src, - btrfs_node_key_ptr_offset(0), - btrfs_node_key_ptr_offset(src_nritems - push_items), - push_items * sizeof(struct btrfs_key_ptr)); + __copy_extent_buffer_log(root->fs_info, dst, src, 0, + src_nritems - push_items, + push_items, sizeof(struct btrfs_key_ptr)); btrfs_set_header_nritems(src, src_nritems - push_items); btrfs_set_header_nritems(dst, dst_nritems + push_items); @@ -2570,7 +2590,7 @@ static noinline int insert_new_root(struct btrfs_trans_handle *trans, btrfs_mark_buffer_dirty(c); old = root->node; - rcu_assign_pointer(root->node, c); + set_root_pointer(root, c); /* the super has an extra ref to root->node */ free_extent_buffer(old); @@ -2593,10 +2613,11 @@ static noinline int insert_new_root(struct btrfs_trans_handle *trans, static void insert_ptr(struct btrfs_trans_handle *trans, struct btrfs_root *root, struct btrfs_path *path, struct btrfs_disk_key *key, u64 bytenr, - int slot, int level) + int slot, int level, int tree_mod_log) { struct extent_buffer *lower; int nritems; + int ret; BUG_ON(!path->nodes[level]); btrfs_assert_tree_locked(path->nodes[level]); @@ -2605,10 +2626,15 @@ static void insert_ptr(struct 
btrfs_trans_handle *trans, BUG_ON(slot > nritems); BUG_ON(nritems == BTRFS_NODEPTRS_PER_BLOCK(root)); if (slot != nritems) { - memmove_extent_buffer(lower, - btrfs_node_key_ptr_offset(slot + 1), - btrfs_node_key_ptr_offset(slot), - (nritems - slot) * sizeof(struct btrfs_key_ptr)); + __memmove_extent_buffer_log(root->fs_info, lower, slot + 1, + slot, nritems - slot, + sizeof(struct btrfs_key_ptr), + tree_mod_log && level); + } + if (tree_mod_log && level) { + ret = tree_mod_log_insert_key(root->fs_info, lower, slot, + MOD_LOG_KEY_ADD); + BUG_ON(ret < 0); } btrfs_set_node_key(lower, key, slot); btrfs_set_node_blockptr(lower, slot, bytenr); @@ -2681,10 +2707,8 @@ static noinline int split_node(struct btrfs_trans_handle *trans, BTRFS_UUID_SIZE); - copy_extent_buffer(split, c, - btrfs_node_key_ptr_offset(0), - btrfs_node_key_ptr_offset(mid), - (c_nritems - mid) * sizeof(struct btrfs_key_ptr)); + __copy_extent_buffer_log(root->fs_info, split, c, 0, mid, + c_nritems - mid, sizeof(struct btrfs_key_ptr)); btrfs_set_header_nritems(split, c_nritems - mid); btrfs_set_header_nritems(c, mid); ret = 0; @@ -2693,7 +2717,7 @@ static noinline int split_node(struct btrfs_trans_handle *trans, btrfs_mark_buffer_dirty(split); insert_ptr(trans, root, path, &disk_key, split->start, - path->slots[level + 1] + 1, level + 1); + path->slots[level + 1] + 1, level + 1, 1); if (path->slots[level] >= mid) { path->slots[level] -= mid; @@ -3230,7 +3254,7 @@ static noinline void copy_for_split(struct btrfs_trans_handle *trans, btrfs_set_header_nritems(l, mid); btrfs_item_key(right, &disk_key, 0); insert_ptr(trans, root, path, &disk_key, right->start, - path->slots[1] + 1, 1); + path->slots[1] + 1, 1, 0); btrfs_mark_buffer_dirty(right); btrfs_mark_buffer_dirty(l); @@ -3437,7 +3461,7 @@ again: if (mid <= slot) { btrfs_set_header_nritems(right, 0); insert_ptr(trans, root, path, &disk_key, right->start, - path->slots[1] + 1, 1); + path->slots[1] + 1, 1, 0); btrfs_tree_unlock(path->nodes[0]); 
free_extent_buffer(path->nodes[0]); path->nodes[0] = right; @@ -3446,7 +3470,7 @@ again: } else { btrfs_set_header_nritems(right, 0); insert_ptr(trans, root, path, &disk_key, right->start, - path->slots[1], 1); + path->slots[1], 1, 0); btrfs_tree_unlock(path->nodes[0]); free_extent_buffer(path->nodes[0]); path->nodes[0] = right; @@ -4158,19 +4182,27 @@ int btrfs_insert_item(struct btrfs_trans_handle *trans, struct btrfs_root * empty a node. */ static void del_ptr(struct btrfs_trans_handle *trans, struct btrfs_root *root, - struct btrfs_path *path, int level, int slot) + struct btrfs_path *path, int level, int slot, + int tree_mod_log) { struct extent_buffer *parent = path->nodes[level]; u32 nritems; + int ret; nritems = btrfs_header_nritems(parent); if (slot != nritems - 1) { - memmove_extent_buffer(parent, - btrfs_node_key_ptr_offset(slot), - btrfs_node_key_ptr_offset(slot + 1), - sizeof(struct btrfs_key_ptr) * - (nritems - slot - 1)); + __memmove_extent_buffer_log(root->fs_info, parent, slot, + slot + 1, nritems - slot - 1, + sizeof(struct btrfs_key_ptr), + tree_mod_log && level); } + + if (tree_mod_log && level) { + ret = tree_mod_log_insert_key(root->fs_info, parent, slot, + MOD_LOG_KEY_REMOVE); + BUG_ON(ret < 0); + } + nritems--; btrfs_set_header_nritems(parent, nritems); if (nritems == 0 && parent == root->node) { @@ -4202,7 +4234,7 @@ static noinline void btrfs_del_leaf(struct btrfs_trans_handle *trans, struct extent_buffer *leaf) { WARN_ON(btrfs_header_generation(leaf) != trans->transid); - del_ptr(trans, root, path, 1, path->slots[1]); + del_ptr(trans, root, path, 1, path->slots[1], 1); /* * btrfs_free_extent is expensive, we want to make sure we -- 1.7.3.4 -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
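The pattern the hunks above apply throughout ctree.c — record what a modification is about to destroy, then perform it — can be sketched in userspace. All names and types below are simplified stand-ins for illustration, not the kernel's:

```c
#include <assert.h>

/* Minimal sketch of the "log, then modify" wrappers such as
 * __set_node_key_log(): a node is just an array of keys here, and the
 * log remembers the value each change overwrote. */

enum mod_op { MOD_KEY_REPLACE, MOD_KEY_ADD, MOD_KEY_REMOVE };

struct mod_elem {
	enum mod_op op;
	unsigned slot;
	unsigned long long old_key;	/* valid for REPLACE and REMOVE */
};

struct node {
	unsigned long long keys[16];
	unsigned nritems;
};

struct mod_log {
	struct mod_elem elems[64];
	int n;
};

/* Analogue of __set_node_key_log(): log the key being overwritten,
 * then overwrite it, so a concurrent reader can later reconstruct the
 * node's previous state from the log. */
static void set_node_key_log(struct mod_log *log, struct node *node,
			     unsigned long long key, unsigned slot)
{
	log->elems[log->n++] = (struct mod_elem){
		MOD_KEY_REPLACE, slot, node->keys[slot]
	};
	node->keys[slot] = key;
}
```

The real wrappers additionally take the `tree_mod_log` flag seen in `insert_ptr()` and `del_ptr()` above, so logging can be skipped for leaf-level operations.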
The tree modification log together with the current state of the tree gives a consistent, old version of the tree. btrfs_search_old_slot is used to search through this old version and return old (dummy!) extent buffers. Naturally, this function cannot do any tree modifications. Signed-off-by: Jan Schmidt <list.btrfs@jan-o-sch.net> --- fs/btrfs/ctree.c | 298 ++++++++++++++++++++++++++++++++++++++++++++++++++++- fs/btrfs/ctree.h | 2 + 2 files changed, 294 insertions(+), 6 deletions(-) diff --git a/fs/btrfs/ctree.c b/fs/btrfs/ctree.c index 724eade..67d716e 100644 --- a/fs/btrfs/ctree.c +++ b/fs/btrfs/ctree.c @@ -963,6 +963,191 @@ static noinline int __btrfs_cow_block(struct btrfs_trans_handle *trans, return 0; } +/* + * returns the logical address of the oldest predecessor of the given root. + * entries older than time_seq are ignored. + */ +static struct tree_mod_elem * +__tree_mod_log_oldest_root(struct btrfs_fs_info *fs_info, + struct btrfs_root *root, u64 time_seq) +{ + struct tree_mod_elem *tm; + struct tree_mod_elem *found = NULL; + u64 root_logical = root->node->start; + int looped = 0; + + if (!time_seq) + return 0; + + /* + * the very last operation that's logged for a root is the replacement + * operation (if it is replaced at all). this has the index of the *new* + * root, making it the very first operation that's logged for this root. + */ + while (1) { + tm = tree_mod_log_search_oldest(fs_info, root_logical, + time_seq); + if (!looped && !tm) + return 0; + /* + * we must have key remove operations in the log before the + * replace operation. + */ + BUG_ON(!tm); + + if (tm->op != MOD_LOG_ROOT_REPLACE) + break; + + found = tm; + root_logical = tm->old_root.logical; + BUG_ON(root_logical == root->node->start); + looped = 1; + } + + return found; +} + +/* + * tm is a pointer to the first operation to rewind within eb. then, all + * previous operations will be rewound (until we reach something older than + * time_seq).
+ */ +static void +__tree_mod_log_rewind(struct extent_buffer *eb, u64 time_seq, + struct tree_mod_elem *first_tm) +{ + u32 n; + struct rb_node *next; + struct tree_mod_elem *tm = first_tm; + unsigned long o_dst; + unsigned long o_src; + unsigned long p_size = sizeof(struct btrfs_key_ptr); + + n = btrfs_header_nritems(eb); + while (tm && tm->elem.seq >= time_seq) { + /* + * all the operations are recorded with the operator used for + * the modification. as we're going backwards, we do the + * opposite of each operation here. + */ + switch (tm->op) { + case MOD_LOG_KEY_REMOVE_WHILE_FREEING: + case MOD_LOG_KEY_REMOVE: + BUG_ON(tm->slot < n); + btrfs_set_node_key(eb, &tm->key, tm->slot); + btrfs_set_node_blockptr(eb, tm->slot, tm->blockptr); + btrfs_set_node_ptr_generation(eb, tm->slot, + tm->generation); + n++; + break; + case MOD_LOG_KEY_REPLACE: + BUG_ON(tm->slot >= n); + btrfs_set_node_key(eb, &tm->key, tm->slot); + btrfs_set_node_blockptr(eb, tm->slot, tm->blockptr); + btrfs_set_node_ptr_generation(eb, tm->slot, + tm->generation); + break; + case MOD_LOG_KEY_ADD: + if (tm->slot != n - 1) { + o_dst = btrfs_node_key_ptr_offset(tm->slot); + o_src = btrfs_node_key_ptr_offset(tm->slot + 1); + memmove_extent_buffer(eb, o_dst, o_src, p_size); + } + n--; + break; + case MOD_LOG_MOVE_KEYS: + memmove_extent_buffer(eb, tm->slot, tm->move.dst_slot, + tm->move.nr_items * p_size); + break; + case MOD_LOG_ROOT_REPLACE: + /* + * this operation is special. for roots, this must be + * handled explicitly before rewinding. for all other + * nodes, this must not exist.
+ */ + BUG(); + } + next = rb_next(&tm->node); + if (!next) + break; + tm = container_of(next, struct tree_mod_elem, node); + if (tm->index != first_tm->index) + break; + } + btrfs_set_header_nritems(eb, n); +} + +static struct extent_buffer * +tree_mod_log_rewind(struct btrfs_fs_info *fs_info, struct extent_buffer *eb, + u64 time_seq) +{ + struct extent_buffer *eb_rewin; + struct tree_mod_elem *tm; + + if (!time_seq) + return eb; + + if (btrfs_header_level(eb) == 0) + return eb; + + tm = tree_mod_log_search(fs_info, eb->start, time_seq); + if (!tm) + return eb; + + eb_rewin = btrfs_clone_extent_buffer(eb); + BUG_ON(!eb_rewin); + + extent_buffer_get(eb_rewin); + free_extent_buffer(eb); + + __tree_mod_log_rewind(eb_rewin, time_seq, tm); + + return eb_rewin; +} + +static inline struct extent_buffer * +get_old_root(struct btrfs_root *root, u64 time_seq) +{ + struct tree_mod_elem *tm; + struct extent_buffer *eb; + struct tree_mod_root *old_root; + u64 old_generation; + + tm = __tree_mod_log_oldest_root(root->fs_info, root, time_seq); + if (!tm) + return root->node; + + old_root = &tm->old_root; + old_generation = tm->generation; + + tm = tree_mod_log_search(root->fs_info, old_root->logical, time_seq); + /* + * there was an item in the log when __tree_mod_log_oldest_root + * returned. this one must not go away, because the time_seq passed to + * us must be blocking its removal. 
+ */ + BUG_ON(!tm); + + if (old_root->logical == root->node->start) { + /* there are logged operations for the current root */ + eb = btrfs_clone_extent_buffer(root->node); + } else { + /* there's a root replace operation for the current root */ + eb = alloc_dummy_extent_buffer(tm->index << PAGE_CACHE_SHIFT, + root->nodesize); + btrfs_set_header_bytenr(eb, eb->start); + btrfs_set_header_backref_rev(eb, BTRFS_MIXED_BACKREF_REV); + btrfs_set_header_owner(eb, root->root_key.objectid); + } + if (!eb) + return NULL; + btrfs_set_header_level(eb, old_root->level); + btrfs_set_header_generation(eb, old_generation); + __tree_mod_log_rewind(eb, time_seq, tm); + + return eb; +} + static inline int should_cow_block(struct btrfs_trans_handle *trans, struct btrfs_root *root, struct extent_buffer *buf) @@ -1929,7 +2114,7 @@ static int read_block_for_search(struct btrfs_trans_handle *trans, struct btrfs_root *root, struct btrfs_path *p, struct extent_buffer **eb_ret, int level, int slot, - struct btrfs_key *key) + struct btrfs_key *key, u64 time_seq) { u64 blocknr; u64 gen; @@ -1952,7 +2137,8 @@ read_block_for_search(struct btrfs_trans_handle *trans, * sleeping, return * right away */ - *eb_ret = tmp; + *eb_ret = tree_mod_log_rewind(root->fs_info, + tmp, time_seq); return 0; } /* the pages were up to date, but we failed @@ -1967,7 +2153,8 @@ read_block_for_search(struct btrfs_trans_handle *trans, /* now we're allowed to do a blocking uptodate check */ tmp = read_tree_block(root, blocknr, blocksize, gen); if (tmp && btrfs_buffer_uptodate(tmp, gen, 0) > 0) { - *eb_ret = tmp; + *eb_ret = tree_mod_log_rewind(root->fs_info, + tmp, time_seq); return 0; } free_extent_buffer(tmp); @@ -2283,7 +2470,7 @@ cow_done: } err = read_block_for_search(trans, root, p, - &b, level, slot, key); + &b, level, slot, key, 0); if (err == -EAGAIN) goto again; if (err) { @@ -2355,6 +2542,105 @@ done: } /* + * Like btrfs_search_slot, this looks for a key in the given tree.
It uses the + * current state of the tree together with the operations recorded in the tree + * modification log to search for the key in a previous version of this tree, as + * denoted by the time_seq parameter. + * + * Naturally, there is no support for insert, delete or cow operations. + * + * The resulting path and return value will be set up as if we called + * btrfs_search_slot at that point in time with ins_len and cow both set to 0. + */ +int btrfs_search_old_slot(struct btrfs_root *root, struct btrfs_key *key, + struct btrfs_path *p, u64 time_seq) +{ + struct extent_buffer *b; + int slot; + int ret; + int err; + int level; + int lowest_unlock = 1; + u8 lowest_level = 0; + + lowest_level = p->lowest_level; + WARN_ON(p->nodes[0] != NULL); + BUG_ON(p->search_commit_root); + +again: + level = 0; + b = get_old_root(root, time_seq); + extent_buffer_get(b); + level = btrfs_header_level(b); + btrfs_tree_read_lock(b); + p->locks[level] = BTRFS_READ_LOCK; + + while (b) { + level = btrfs_header_level(b); + p->nodes[level] = b; + btrfs_clear_path_blocking(p, NULL, 0); + + /* + * we have a lock on b and as long as we aren't changing + * the tree, there is no way for the items in b to change. + * It is safe to drop the lock on our parent before we + * go through the expensive btree search on b.
+ */ + btrfs_unlock_up_safe(p, level + 1); + + ret = bin_search(b, key, level, &slot); + + if (level != 0) { + int dec = 0; + if (ret && slot > 0) { + dec = 1; + slot -= 1; + } + p->slots[level] = slot; + unlock_up(p, level, lowest_unlock, 0, NULL); + + if (level == lowest_level) { + if (dec) + p->slots[level]++; + goto done; + } + + err = read_block_for_search(NULL, root, p, &b, level, + slot, key, time_seq); + if (err == -EAGAIN) + goto again; + if (err) { + ret = err; + goto done; + } + + level = btrfs_header_level(b); + err = btrfs_try_tree_read_lock(b); + if (!err) { + btrfs_set_path_blocking(p); + btrfs_tree_read_lock(b); + btrfs_clear_path_blocking(p, b, + BTRFS_READ_LOCK); + } + p->locks[level] = BTRFS_READ_LOCK; + p->nodes[level] = b; + } else { + p->slots[level] = slot; + unlock_up(p, level, lowest_unlock, 0, NULL); + goto done; + } + } + ret = 1; +done: + if (!p->leave_spinning) + btrfs_set_path_blocking(p); + if (ret < 0) + btrfs_release_path(p); + + return ret; +} + +/* * adjust the pointers going up the tree, starting at level * making sure the right key of each node points to 'key'.
* This is used after shifting pointers to the left, so it stops @@ -4712,7 +4998,7 @@ again: next = c; next_rw_lock = path->locks[level]; ret = read_block_for_search(NULL, root, path, &next, level, - slot, &key); + slot, &key, 0); if (ret == -EAGAIN) goto again; @@ -4749,7 +5035,7 @@ again: break; ret = read_block_for_search(NULL, root, path, &next, level, - 0, &key); + 0, &key, 0); if (ret == -EAGAIN) goto again; diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h index e53bfb9..6ba21b1 100644 --- a/fs/btrfs/ctree.h +++ b/fs/btrfs/ctree.h @@ -2668,6 +2668,8 @@ int btrfs_duplicate_item(struct btrfs_trans_handle *trans, int btrfs_search_slot(struct btrfs_trans_handle *trans, struct btrfs_root *root, struct btrfs_key *key, struct btrfs_path *p, int ins_len, int cow); +int btrfs_search_old_slot(struct btrfs_root *root, struct btrfs_key *key, + struct btrfs_path *p, u64 time_seq); int btrfs_realloc_node(struct btrfs_trans_handle *trans, struct btrfs_root *root, struct extent_buffer *parent, int start_slot, int cache_only, u64 *last_ret, -- 1.7.3.4
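The rewind logic of `__tree_mod_log_rewind()` above — replay every logged change since `time_seq`, newest first, applying its inverse — can be modeled in a few lines of userspace C. Everything below is a simplified stand-in for illustration (a node is an array of keys, the log a plain array), not the kernel structures:

```c
#include <string.h>

/* One logged change; log arrays are ordered oldest to newest. */
enum mod_op { MOD_KEY_REPLACE, MOD_KEY_ADD, MOD_KEY_REMOVE };

struct mod_elem {
	enum mod_op op;
	unsigned seq;			/* sequence number of the change */
	unsigned slot;
	unsigned long long old_key;	/* key to restore for REPLACE/REMOVE */
};

struct node {
	unsigned long long keys[16];
	unsigned nritems;
};

/* Undo every logged entry with seq >= time_seq, walking backwards so the
 * most recent change is reverted first -- the inverse-operation idea of
 * __tree_mod_log_rewind(). */
static void rewind_node(struct node *node, const struct mod_elem *log,
			int nlog, unsigned time_seq)
{
	int i;

	for (i = nlog - 1; i >= 0; i--) {
		const struct mod_elem *tm = &log[i];

		if (tm->seq < time_seq)
			break;
		switch (tm->op) {
		case MOD_KEY_REMOVE:	/* undo a removal: re-insert the key */
			memmove(&node->keys[tm->slot + 1], &node->keys[tm->slot],
				(node->nritems - tm->slot) * sizeof(node->keys[0]));
			node->keys[tm->slot] = tm->old_key;
			node->nritems++;
			break;
		case MOD_KEY_ADD:	/* undo an insertion: drop the slot */
			memmove(&node->keys[tm->slot], &node->keys[tm->slot + 1],
				(node->nritems - tm->slot - 1) * sizeof(node->keys[0]));
			node->nritems--;
			break;
		case MOD_KEY_REPLACE:	/* undo a replacement: restore old key */
			node->keys[tm->slot] = tm->old_key;
			break;
		}
	}
}
```

The kernel version additionally restores block pointers and generations per slot and handles `MOD_LOG_MOVE_KEYS` and `MOD_LOG_ROOT_REPLACE`, but the rewind principle is the same.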
Jan Schmidt
2012-May-20 16:06 UTC
[PATCH 10/24] Btrfs: use the tree modification log for backref resolving
This enables backref resolving on live trees while they are changing. This is a prerequisite for quota groups and just nice to have for everything else. Signed-off-by: Jan Schmidt <list.btrfs@jan-o-sch.net> --- fs/btrfs/backref.c | 43 +++++++++++++++++++++++++++---------------- fs/btrfs/backref.h | 3 ++- 2 files changed, 29 insertions(+), 17 deletions(-) diff --git a/fs/btrfs/backref.c b/fs/btrfs/backref.c index e6c54d8..b63c2c5 100644 --- a/fs/btrfs/backref.c +++ b/fs/btrfs/backref.c @@ -241,6 +241,7 @@ add_parent: */ static int __resolve_indirect_ref(struct btrfs_fs_info *fs_info, int search_commit_root, + u64 time_seq, struct __prelim_ref *ref, struct ulist *parents, const u64 *extent_item_pos) @@ -276,7 +277,7 @@ static int __resolve_indirect_ref(struct btrfs_fs_info *fs_info, goto out; path->lowest_level = level; - ret = btrfs_search_slot(NULL, root, &ref->key_for_search, path, 0, 0); + ret = btrfs_search_old_slot(root, &ref->key_for_search, path, time_seq); pr_debug("search slot in root %llu (level %d, ref count %d) returned " "%d for key (%llu %u %llu)\n", (unsigned long long)ref->root_id, level, ref->count, ret, @@ -315,7 +316,7 @@ out: * resolve all indirect backrefs from the list */ static int __resolve_indirect_refs(struct btrfs_fs_info *fs_info, - int search_commit_root, + int search_commit_root, u64 time_seq, struct list_head *head, const u64 *extent_item_pos) { @@ -342,7 +343,8 @@ static int __resolve_indirect_refs(struct btrfs_fs_info *fs_info, if (ref->count == 0) continue; err = __resolve_indirect_ref(fs_info, search_commit_root, - ref, parents, extent_item_pos); + time_seq, ref, parents, + extent_item_pos); if (err) { if (ret == 0) ret = err; @@ -762,7 +764,8 @@ static int __add_keyed_refs(struct btrfs_fs_info *fs_info, */ static int find_parent_nodes(struct btrfs_trans_handle *trans, struct btrfs_fs_info *fs_info, u64 bytenr, - u64 seq, struct ulist *refs, struct ulist *roots, + u64 delayed_ref_seq, u64 time_seq, + struct ulist *refs, struct
ulist *roots, const u64 *extent_item_pos) { struct btrfs_key key; @@ -825,7 +828,8 @@ again: btrfs_put_delayed_ref(&head->node); goto again; } - ret = __add_delayed_refs(head, seq, &prefs_delayed); + ret = __add_delayed_refs(head, delayed_ref_seq, + &prefs_delayed); if (ret) { spin_unlock(&delayed_refs->lock); goto out; @@ -865,8 +869,8 @@ again: if (ret) goto out; - ret = __resolve_indirect_refs(fs_info, search_commit_root, &prefs, - extent_item_pos); + ret = __resolve_indirect_refs(fs_info, search_commit_root, time_seq, + &prefs, extent_item_pos); if (ret) goto out; @@ -956,7 +960,8 @@ static void free_leaf_list(struct ulist *blocks) */ static int btrfs_find_all_leafs(struct btrfs_trans_handle *trans, struct btrfs_fs_info *fs_info, u64 bytenr, - u64 seq, struct ulist **leafs, + u64 delayed_ref_seq, u64 time_seq, + struct ulist **leafs, const u64 *extent_item_pos) { struct ulist *tmp; @@ -971,8 +976,8 @@ static int btrfs_find_all_leafs(struct btrfs_trans_handle *trans, return -ENOMEM; } - ret = find_parent_nodes(trans, fs_info, bytenr, seq, *leafs, tmp, - extent_item_pos); + ret = find_parent_nodes(trans, fs_info, bytenr, delayed_ref_seq, + time_seq, *leafs, tmp, extent_item_pos); ulist_free(tmp); if (ret < 0 && ret != -ENOENT) { @@ -998,7 +1003,8 @@ static int btrfs_find_all_leafs(struct btrfs_trans_handle *trans, */ int btrfs_find_all_roots(struct btrfs_trans_handle *trans, struct btrfs_fs_info *fs_info, u64 bytenr, - u64 seq, struct ulist **roots) + u64 delayed_ref_seq, u64 time_seq, + struct ulist **roots) { struct ulist *tmp; struct ulist_node *node = NULL; @@ -1014,8 +1020,8 @@ int btrfs_find_all_roots(struct btrfs_trans_handle *trans, } while (1) { - ret = find_parent_nodes(trans, fs_info, bytenr, seq, - tmp, *roots, NULL); + ret = find_parent_nodes(trans, fs_info, bytenr, delayed_ref_seq, + time_seq, tmp, *roots, NULL); if (ret < 0 && ret != -ENOENT) { ulist_free(tmp); ulist_free(*roots); @@ -1413,7 +1419,8 @@ int iterate_extent_inodes(struct btrfs_fs_info 
*fs_info, struct ulist *roots = NULL; struct ulist_node *ref_node = NULL; struct ulist_node *root_node = NULL; - struct seq_list seq_elem; + struct seq_list seq_elem = {}; + struct seq_list tree_mod_seq_elem = {}; struct btrfs_delayed_ref_root *delayed_refs = NULL; pr_debug("resolving all inodes for extent %llu\n", @@ -1430,16 +1437,19 @@ int iterate_extent_inodes(struct btrfs_fs_info *fs_info, spin_lock(&delayed_refs->lock); btrfs_get_delayed_seq(delayed_refs, &seq_elem); spin_unlock(&delayed_refs->lock); + btrfs_get_tree_mod_seq(fs_info, &tree_mod_seq_elem); } ret = btrfs_find_all_leafs(trans, fs_info, extent_item_objectid, - seq_elem.seq, &refs, &extent_item_pos); + seq_elem.seq, tree_mod_seq_elem.seq, &refs, + &extent_item_pos); if (ret) goto out; while (!ret && (ref_node = ulist_next(refs, ref_node))) { ret = btrfs_find_all_roots(trans, fs_info, ref_node->val, - seq_elem.seq, &roots); + seq_elem.seq, + tree_mod_seq_elem.seq, &roots); if (ret) break; while (!ret && (root_node = ulist_next(roots, root_node))) { @@ -1457,6 +1467,7 @@ int iterate_extent_inodes(struct btrfs_fs_info *fs_info, ulist_free(roots); out: if (!search_commit_root) { + btrfs_put_tree_mod_seq(fs_info, &tree_mod_seq_elem); btrfs_put_delayed_seq(delayed_refs, &seq_elem); btrfs_end_transaction(trans, fs_info->extent_root); } diff --git a/fs/btrfs/backref.h b/fs/btrfs/backref.h index 94ba1b2..c18d8ac 100644 --- a/fs/btrfs/backref.h +++ b/fs/btrfs/backref.h @@ -58,7 +58,8 @@ int paths_from_inode(u64 inum, struct inode_fs_paths *ipath); int btrfs_find_all_roots(struct btrfs_trans_handle *trans, struct btrfs_fs_info *fs_info, u64 bytenr, - u64 seq, struct ulist **roots); + u64 delayed_ref_seq, u64 time_seq, + struct ulist **roots); struct btrfs_data_container *init_data_container(u32 total_bytes); struct inode_fs_paths *init_ipath(s32 total_bytes, struct btrfs_root *fs_root, -- 1.7.3.4
Jan Schmidt
2012-May-20 16:06 UTC
[PATCH 11/24] Btrfs: fs_info variable for join_transaction
Signed-off-by: Jan Schmidt <list.btrfs@jan-o-sch.net> --- fs/btrfs/transaction.c | 37 +++++++++++++++++++------------------ 1 files changed, 19 insertions(+), 18 deletions(-) diff --git a/fs/btrfs/transaction.c b/fs/btrfs/transaction.c index 3642225..eb2bd82 100644 --- a/fs/btrfs/transaction.c +++ b/fs/btrfs/transaction.c @@ -55,48 +55,49 @@ static noinline void switch_commit_root(struct btrfs_root *root) static noinline int join_transaction(struct btrfs_root *root, int nofail) { struct btrfs_transaction *cur_trans; + struct btrfs_fs_info *fs_info = root->fs_info; - spin_lock(&root->fs_info->trans_lock); + spin_lock(&fs_info->trans_lock); loop: /* The file system has been taken offline. No new transactions. */ - if (root->fs_info->fs_state & BTRFS_SUPER_FLAG_ERROR) { - spin_unlock(&root->fs_info->trans_lock); + if (fs_info->fs_state & BTRFS_SUPER_FLAG_ERROR) { + spin_unlock(&fs_info->trans_lock); return -EROFS; } - if (root->fs_info->trans_no_join) { + if (fs_info->trans_no_join) { if (!nofail) { - spin_unlock(&root->fs_info->trans_lock); + spin_unlock(&fs_info->trans_lock); return -EBUSY; } } - cur_trans = root->fs_info->running_transaction; + cur_trans = fs_info->running_transaction; if (cur_trans) { if (cur_trans->aborted) { - spin_unlock(&root->fs_info->trans_lock); + spin_unlock(&fs_info->trans_lock); return cur_trans->aborted; } atomic_inc(&cur_trans->use_count); atomic_inc(&cur_trans->num_writers); cur_trans->num_joined++; - spin_unlock(&root->fs_info->trans_lock); + spin_unlock(&fs_info->trans_lock); return 0; } - spin_unlock(&root->fs_info->trans_lock); + spin_unlock(&fs_info->trans_lock); cur_trans = kmem_cache_alloc(btrfs_transaction_cachep, GFP_NOFS); if (!cur_trans) return -ENOMEM; - spin_lock(&root->fs_info->trans_lock); - if (root->fs_info->running_transaction) { + spin_lock(&fs_info->trans_lock); + if (fs_info->running_transaction) { /* * someone started a transaction after we unlocked. 
Make sure * to redo the trans_no_join checks above */ kmem_cache_free(btrfs_transaction_cachep, cur_trans); - cur_trans = root->fs_info->running_transaction; + cur_trans = fs_info->running_transaction; goto loop; } @@ -127,14 +128,14 @@ loop: INIT_LIST_HEAD(&cur_trans->delayed_refs.seq_head); INIT_LIST_HEAD(&cur_trans->pending_snapshots); - list_add_tail(&cur_trans->list, &root->fs_info->trans_list); + list_add_tail(&cur_trans->list, &fs_info->trans_list); extent_io_tree_init(&cur_trans->dirty_pages, - root->fs_info->btree_inode->i_mapping); - root->fs_info->generation++; - cur_trans->transid = root->fs_info->generation; - root->fs_info->running_transaction = cur_trans; + fs_info->btree_inode->i_mapping); + fs_info->generation++; + cur_trans->transid = fs_info->generation; + fs_info->running_transaction = cur_trans; cur_trans->aborted = 0; - spin_unlock(&root->fs_info->trans_lock); + spin_unlock(&fs_info->trans_lock); return 0; } -- 1.7.3.4
Jan Schmidt
2012-May-20 16:06 UTC
[PATCH 12/24] Btrfs: tree mod log sanity checks in join_transaction
When a fresh transaction begins, the tree mod log must be clean. Users of the tree modification log must ensure they never span across transaction boundaries. We reset the sequence to 0 in this safe situation to make absolutely sure overflow can't happen. Signed-off-by: Jan Schmidt <list.btrfs@jan-o-sch.net> --- fs/btrfs/transaction.c | 9 +++++++++ 1 files changed, 9 insertions(+), 0 deletions(-) diff --git a/fs/btrfs/transaction.c b/fs/btrfs/transaction.c index eb2bd82..3f50cba 100644 --- a/fs/btrfs/transaction.c +++ b/fs/btrfs/transaction.c @@ -122,6 +122,15 @@ loop: cur_trans->delayed_refs.flushing = 0; cur_trans->delayed_refs.run_delayed_start = 0; cur_trans->delayed_refs.seq = 1; + + /* + * although the tree mod log is per file system and not per transaction, + * the log must never go across transaction boundaries. + */ + BUG_ON(!list_empty(&fs_info->tree_mod_seq_list)); + BUG_ON(!RB_EMPTY_ROOT(&fs_info->tree_mod_log)); + atomic_set(&fs_info->tree_mod_seq, 0); + init_waitqueue_head(&cur_trans->delayed_refs.seq_wait); spin_lock_init(&cur_trans->commit_lock); spin_lock_init(&cur_trans->delayed_refs.lock); -- 1.7.3.4
From: Arne Jansen <sensille@gmx.net> Not all features are in use by the current version and thus may change in the future. Signed-off-by: Arne Jansen <sensille@gmx.net> --- fs/btrfs/ctree.h | 136 ++++++++++++++++++++++++++++++++++++++++++++++++++++++ 1 files changed, 136 insertions(+), 0 deletions(-) diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h index 6ba21b1..3961b7e 100644 --- a/fs/btrfs/ctree.h +++ b/fs/btrfs/ctree.h @@ -91,6 +91,9 @@ struct btrfs_ordered_sum; /* for storing balance parameters in the root tree */ #define BTRFS_BALANCE_OBJECTID -4ULL +/* holds quota configuration and tracking */ +#define BTRFS_QUOTA_TREE_OBJECTID 8ULL + /* orphan objectid for tracking unlinked/truncated files */ #define BTRFS_ORPHAN_OBJECTID -5ULL @@ -872,6 +875,72 @@ struct btrfs_block_group_item { __le64 flags; } __attribute__ ((__packed__)); +/* + * is subvolume quota turned on? + */ +#define BTRFS_QGROUP_STATUS_FLAG_ON (1ULL << 0) +/* + * SCANNING is set during the initialization phase + */ +#define BTRFS_QGROUP_STATUS_FLAG_SCANNING (1ULL << 1) +/* + * Some qgroup entries are known to be out of date, + * either because the configuration has changed in a way that + * makes a rescan necessary, or because the fs has been mounted + * with a non-qgroup-aware version. + * Turning quota off and on again makes it inconsistent, too. + */ +#define BTRFS_QGROUP_STATUS_FLAG_INCONSISTENT (1ULL << 2) + +#define BTRFS_QGROUP_STATUS_VERSION 1 + +struct btrfs_qgroup_status_item { + __le64 version; + /* + * the generation is updated during every commit. As older + * versions of btrfs are not aware of qgroups, it will be + * possible to detect inconsistencies by checking the + * generation on mount time + */ + __le64 generation; + + /* flag definitions see above */ + __le64 flags; + + /* + * only used during scanning to record the progress + * of the scan.
It contains a logical address + */ + __le64 scan; +} __attribute__ ((__packed__)); + +struct btrfs_qgroup_info_item { + __le64 generation; + __le64 rfer; + __le64 rfer_cmpr; + __le64 excl; + __le64 excl_cmpr; +} __attribute__ ((__packed__)); + +/* flags definition for qgroup limits */ +#define BTRFS_QGROUP_LIMIT_MAX_RFER (1ULL << 0) +#define BTRFS_QGROUP_LIMIT_MAX_EXCL (1ULL << 1) +#define BTRFS_QGROUP_LIMIT_RSV_RFER (1ULL << 2) +#define BTRFS_QGROUP_LIMIT_RSV_EXCL (1ULL << 3) +#define BTRFS_QGROUP_LIMIT_RFER_CMPR (1ULL << 4) +#define BTRFS_QGROUP_LIMIT_EXCL_CMPR (1ULL << 5) + +struct btrfs_qgroup_limit_item { + /* + * only updated when any of the other values change + */ + __le64 flags; + __le64 max_rfer; + __le64 max_excl; + __le64 rsv_rfer; + __le64 rsv_excl; +} __attribute__ ((__packed__)); + struct btrfs_space_info { u64 flags; @@ -1517,6 +1586,30 @@ struct btrfs_ioctl_defrag_range_args { #define BTRFS_BALANCE_ITEM_KEY 248 /* + * Records the overall state of the qgroups. + * There's only one instance of this key present, + * (0, BTRFS_QGROUP_STATUS_KEY, 0) + */ +#define BTRFS_QGROUP_STATUS_KEY 240 +/* + * Records the currently used space of the qgroup. + * One key per qgroup, (0, BTRFS_QGROUP_INFO_KEY, qgroupid). + */ +#define BTRFS_QGROUP_INFO_KEY 242 +/* + * Contains the user configured limits for the qgroup. + * One key per qgroup, (0, BTRFS_QGROUP_LIMIT_KEY, qgroupid). + */ +#define BTRFS_QGROUP_LIMIT_KEY 244 +/* + * Records the child-parent relationship of qgroups. For + * each relation, 2 keys are present: + * (childid, BTRFS_QGROUP_RELATION_KEY, parentid) + * (parentid, BTRFS_QGROUP_RELATION_KEY, childid) + */ +#define BTRFS_QGROUP_RELATION_KEY 246 + +/* * string items are for debugging.
They just store a short string of * data in the FS */ @@ -2424,6 +2517,49 @@ static inline u32 btrfs_file_extent_inline_item_len(struct extent_buffer *eb, return btrfs_item_size(eb, e) - offset; } +/* btrfs_qgroup_status_item */ +BTRFS_SETGET_FUNCS(qgroup_status_generation, struct btrfs_qgroup_status_item, + generation, 64); +BTRFS_SETGET_FUNCS(qgroup_status_version, struct btrfs_qgroup_status_item, + version, 64); +BTRFS_SETGET_FUNCS(qgroup_status_flags, struct btrfs_qgroup_status_item, + flags, 64); +BTRFS_SETGET_FUNCS(qgroup_status_scan, struct btrfs_qgroup_status_item, + scan, 64); + +/* btrfs_qgroup_info_item */ +BTRFS_SETGET_FUNCS(qgroup_info_generation, struct btrfs_qgroup_info_item, + generation, 64); +BTRFS_SETGET_FUNCS(qgroup_info_rfer, struct btrfs_qgroup_info_item, rfer, 64); +BTRFS_SETGET_FUNCS(qgroup_info_rfer_cmpr, struct btrfs_qgroup_info_item, + rfer_cmpr, 64); +BTRFS_SETGET_FUNCS(qgroup_info_excl, struct btrfs_qgroup_info_item, excl, 64); +BTRFS_SETGET_FUNCS(qgroup_info_excl_cmpr, struct btrfs_qgroup_info_item, + excl_cmpr, 64); + +BTRFS_SETGET_STACK_FUNCS(stack_qgroup_info_generation, + struct btrfs_qgroup_info_item, generation, 64); +BTRFS_SETGET_STACK_FUNCS(stack_qgroup_info_rfer, struct btrfs_qgroup_info_item, + rfer, 64); +BTRFS_SETGET_STACK_FUNCS(stack_qgroup_info_rfer_cmpr, + struct btrfs_qgroup_info_item, rfer_cmpr, 64); +BTRFS_SETGET_STACK_FUNCS(stack_qgroup_info_excl, struct btrfs_qgroup_info_item, + excl, 64); +BTRFS_SETGET_STACK_FUNCS(stack_qgroup_info_excl_cmpr, + struct btrfs_qgroup_info_item, excl_cmpr, 64); + +/* btrfs_qgroup_limit_item */ +BTRFS_SETGET_FUNCS(qgroup_limit_flags, struct btrfs_qgroup_limit_item, + flags, 64); +BTRFS_SETGET_FUNCS(qgroup_limit_max_rfer, struct btrfs_qgroup_limit_item, + max_rfer, 64); +BTRFS_SETGET_FUNCS(qgroup_limit_max_excl, struct btrfs_qgroup_limit_item, + max_excl, 64); +BTRFS_SETGET_FUNCS(qgroup_limit_rsv_rfer, struct btrfs_qgroup_limit_item, + rsv_rfer, 64); 
+BTRFS_SETGET_FUNCS(qgroup_limit_rsv_excl, struct btrfs_qgroup_limit_item, + rsv_excl, 64); + static inline struct btrfs_fs_info *btrfs_sb(struct super_block *sb) { return sb->s_fs_info; -- 1.7.3.4
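The key layout documented in the comments above is easy to illustrate concretely. This userspace sketch builds the two keys that store one qgroup relation; the constants are the ones added by the patch, while `struct key` is a simplified stand-in for `struct btrfs_key`:

```c
#include <assert.h>

/* item types from the patch above */
#define BTRFS_QGROUP_STATUS_KEY   240
#define BTRFS_QGROUP_INFO_KEY     242
#define BTRFS_QGROUP_LIMIT_KEY    244
#define BTRFS_QGROUP_RELATION_KEY 246

struct key {
	unsigned long long objectid;
	unsigned char type;
	unsigned long long offset;
};

/* A child-parent relation is stored twice, once under each qgroup's id,
 * so it can be enumerated starting from either side. */
static void qgroup_relation_keys(unsigned long long childid,
				 unsigned long long parentid,
				 struct key out[2])
{
	out[0] = (struct key){ childid, BTRFS_QGROUP_RELATION_KEY, parentid };
	out[1] = (struct key){ parentid, BTRFS_QGROUP_RELATION_KEY, childid };
}
```

Status, info and limit items use objectid 0 and carry the qgroup id in the key offset instead, as the comments above describe.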
From: Arne Jansen <sensille@gmx.net> Often no exact match is wanted but just the next lower or higher item. There's a lot of duplicated code throughout btrfs to deal with the corner cases. This patch adds a helper function that can facilitate searching. Signed-off-by: Arne Jansen <sensille@gmx.net> --- fs/btrfs/ctree.c | 72 ++++++++++++++++++++++++++++++++++++++++++++++++++++++ fs/btrfs/ctree.h | 3 ++ 2 files changed, 75 insertions(+), 0 deletions(-) diff --git a/fs/btrfs/ctree.c b/fs/btrfs/ctree.c index 67d716e..bf6df35 100644 --- a/fs/btrfs/ctree.c +++ b/fs/btrfs/ctree.c @@ -2641,6 +2641,78 @@ done: } /* + * helper to use instead of search slot if no exact match is needed but + * instead the next or previous item should be returned. + * When find_higher is true, the next higher item is returned, the next lower + * otherwise. + * When return_any and find_higher are both true, and no higher item is found, + * return the next lower instead. + * When return_any is true and find_higher is false, and no lower item is found, + * return the next higher instead. + * It returns 0 if any item is found, 1 if none is found (tree empty), and + * < 0 on error + */ +int btrfs_search_slot_for_read(struct btrfs_root *root, + struct btrfs_key *key, struct btrfs_path *p, + int find_higher, int return_any) +{ + int ret; + struct extent_buffer *leaf; + +again: + ret = btrfs_search_slot(NULL, root, key, p, 0, 0); + if (ret <= 0) + return ret; + /* + * a return value of 1 means the path is at the position where the + * item should be inserted. Normally this is the next bigger item, + * but in case the previous item is the last in a leaf, path points + * to the first free slot in the previous leaf, i.e. at an invalid + * item.
+ */ + leaf = p->nodes[0]; + + if (find_higher) { + if (p->slots[0] >= btrfs_header_nritems(leaf)) { + ret = btrfs_next_leaf(root, p); + if (ret <= 0) + return ret; + if (!return_any) + return 1; + /* + * no higher item found, return the next + * lower instead + */ + return_any = 0; + find_higher = 0; + btrfs_release_path(p); + goto again; + } + } else { + if (p->slots[0] >= btrfs_header_nritems(leaf)) { + /* we're sitting on an invalid slot */ + if (p->slots[0] == 0) { + ret = btrfs_prev_leaf(root, p); + if (ret <= 0) + return ret; + if (!return_any) + return 1; + /* + * no lower item found, return the next + * higher instead + */ + return_any = 0; + find_higher = 1; + btrfs_release_path(p); + goto again; + } + --p->slots[0]; + } + } + return 0; +} + +/* * adjust the pointers going up the tree, starting at level * making sure the right key of each node points to 'key'. * This is used after shifting pointers to the left, so it stops diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h index 3961b7e..283c992 100644 --- a/fs/btrfs/ctree.h +++ b/fs/btrfs/ctree.h @@ -2806,6 +2806,9 @@ int btrfs_search_slot(struct btrfs_trans_handle *trans, struct btrfs_root ins_len, int cow); int btrfs_search_old_slot(struct btrfs_root *root, struct btrfs_key *key, struct btrfs_path *p, u64 time_seq); +int btrfs_search_slot_for_read(struct btrfs_root *root, + struct btrfs_key *key, struct btrfs_path *p, + int find_higher, int return_any); int btrfs_realloc_node(struct btrfs_trans_handle *trans, struct btrfs_root *root, struct extent_buffer *parent, int start_slot, int cache_only, u64 *last_ret, -- 1.7.3.4
Jan Schmidt
2012-May-20 16:06 UTC
[PATCH 15/24] Btrfs: check the root passed to btrfs_end_transaction
From: Arne Jansen <sensille@gmx.net> This patch only adds a consistency check to validate that the same root is passed to start_transaction and end_transaction. Subvolume quota depends on this. Signed-off-by: Arne Jansen <sensille@gmx.net> --- fs/btrfs/transaction.c | 6 ++++++ fs/btrfs/transaction.h | 6 ++++++ 2 files changed, 12 insertions(+), 0 deletions(-) diff --git a/fs/btrfs/transaction.c b/fs/btrfs/transaction.c index 3f50cba..cde906f 100644 --- a/fs/btrfs/transaction.c +++ b/fs/btrfs/transaction.c @@ -335,6 +335,7 @@ again: h->transaction = cur_trans; h->blocks_used = 0; h->bytes_reserved = 0; + h->root = root; h->delayed_ref_updates = 0; h->use_count = 1; h->block_rsv = NULL; @@ -501,6 +502,11 @@ static int __btrfs_end_transaction(struct btrfs_trans_handle *trans, btrfs_trans_release_metadata(trans, root); trans->block_rsv = NULL; + /* + * the same root has to be passed to start_transaction and + * end_transaction. Subvolume quota depends on this. + */ + WARN_ON(trans->root != root); while (count < 2) { unsigned long cur = trans->delayed_ref_updates; trans->delayed_ref_updates = 0; diff --git a/fs/btrfs/transaction.h b/fs/btrfs/transaction.h index fe27379..0107294 100644 --- a/fs/btrfs/transaction.h +++ b/fs/btrfs/transaction.h @@ -57,6 +57,12 @@ struct btrfs_trans_handle { struct btrfs_block_rsv *block_rsv; struct btrfs_block_rsv *orig_rsv; int aborted; + /* + * this root is only needed to validate that the root passed to + * start_transaction is the same as the one passed to end_transaction. + * Subvolume quota depends on this + */ + struct btrfs_root *root; }; struct btrfs_pending_snapshot { -- 1.7.3.4
From: Arne Jansen <sensille@gmx.net> This creates a brand new tree. Will be used to create the quota tree. Signed-off-by: Arne Jansen <sensille@gmx.net> --- fs/btrfs/disk-io.c | 78 +++++++++++++++++++++++++++++++++++++++++++++++++++- fs/btrfs/disk-io.h | 5 +++ 2 files changed, 82 insertions(+), 1 deletions(-) diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c index f51ad84..0c7ac16 100644 --- a/fs/btrfs/disk-io.c +++ b/fs/btrfs/disk-io.c @@ -1224,6 +1224,82 @@ static struct btrfs_root *btrfs_alloc_root(struct btrfs_fs_info *fs_info) return root; } +struct btrfs_root *btrfs_create_tree(struct btrfs_trans_handle *trans, + struct btrfs_fs_info *fs_info, + u64 objectid) +{ + struct extent_buffer *leaf; + struct btrfs_root *tree_root = fs_info->tree_root; + struct btrfs_root *root; + struct btrfs_key key; + int ret = 0; + u64 bytenr; + + root = btrfs_alloc_root(fs_info); + if (!root) + return ERR_PTR(-ENOMEM); + + __setup_root(tree_root->nodesize, tree_root->leafsize, + tree_root->sectorsize, tree_root->stripesize, + root, fs_info, objectid); + root->root_key.objectid = objectid; + root->root_key.type = BTRFS_ROOT_ITEM_KEY; + root->root_key.offset = 0; + + leaf = btrfs_alloc_free_block(trans, root, root->leafsize, + 0, objectid, NULL, 0, 0, 0); + if (IS_ERR(leaf)) { + ret = PTR_ERR(leaf); + goto fail; + } + + bytenr = leaf->start; + memset_extent_buffer(leaf, 0, 0, sizeof(struct btrfs_header)); + btrfs_set_header_bytenr(leaf, leaf->start); + btrfs_set_header_generation(leaf, trans->transid); + btrfs_set_header_backref_rev(leaf, BTRFS_MIXED_BACKREF_REV); + btrfs_set_header_owner(leaf, objectid); + root->node = leaf; + + write_extent_buffer(leaf, fs_info->fsid, + (unsigned long)btrfs_header_fsid(leaf), + BTRFS_FSID_SIZE); + write_extent_buffer(leaf, fs_info->chunk_tree_uuid, + (unsigned long)btrfs_header_chunk_tree_uuid(leaf), + BTRFS_UUID_SIZE); + btrfs_mark_buffer_dirty(leaf); + + root->commit_root = btrfs_root_node(root); + root->track_dirty = 1; + + + 
root->root_item.flags = 0; + root->root_item.byte_limit = 0; + btrfs_set_root_bytenr(&root->root_item, leaf->start); + btrfs_set_root_generation(&root->root_item, trans->transid); + btrfs_set_root_level(&root->root_item, 0); + btrfs_set_root_refs(&root->root_item, 1); + btrfs_set_root_used(&root->root_item, leaf->len); + btrfs_set_root_last_snapshot(&root->root_item, 0); + btrfs_set_root_dirid(&root->root_item, 0); + root->root_item.drop_level = 0; + + key.objectid = objectid; + key.type = BTRFS_ROOT_ITEM_KEY; + key.offset = 0; + ret = btrfs_insert_root(trans, tree_root, &key, &root->root_item); + if (ret) + goto fail; + + btrfs_tree_unlock(leaf); + +fail: + if (ret) + return ERR_PTR(ret); + + return root; +} + static struct btrfs_root *alloc_log_tree(struct btrfs_trans_handle *trans, struct btrfs_fs_info *fs_info) { @@ -3248,7 +3324,7 @@ int btrfs_read_buffer(struct extent_buffer *buf, u64 parent_transid) return btree_read_extent_buffer_pages(root, buf, 0, parent_transid); } -static int btree_lock_page_hook(struct page *page, void *data, +int btree_lock_page_hook(struct page *page, void *data, void (*flush_fn)(void *)) { struct inode *inode = page->mapping->host; diff --git a/fs/btrfs/disk-io.h b/fs/btrfs/disk-io.h index ab1830a..95e147e 100644 --- a/fs/btrfs/disk-io.h +++ b/fs/btrfs/disk-io.h @@ -90,6 +90,11 @@ int btrfs_cleanup_transaction(struct btrfs_root *root); void btrfs_cleanup_one_transaction(struct btrfs_transaction *trans, struct btrfs_root *root); void btrfs_abort_devices(struct btrfs_root *root); +struct btrfs_root *btrfs_create_tree(struct btrfs_trans_handle *trans, + struct btrfs_fs_info *fs_info, + u64 objectid); +int btree_lock_page_hook(struct page *page, void *data, + void (*flush_fn)(void *)); #ifdef CONFIG_DEBUG_LOCK_ALLOC void btrfs_init_lockdep(void);
From: Arne Jansen <sensille@gmx.net> Add state to fs_info. Signed-off-by: Arne Jansen <sensille@gmx.net> --- fs/btrfs/ctree.h | 31 +++++++++++++++++++++++++++++++ fs/btrfs/disk-io.c | 7 +++++++ 2 files changed, 38 insertions(+), 0 deletions(-) diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h index 283c992..2b6f003 100644 --- a/fs/btrfs/ctree.h +++ b/fs/btrfs/ctree.h @@ -1102,6 +1102,7 @@ struct btrfs_fs_info { struct btrfs_root *dev_root; struct btrfs_root *fs_root; struct btrfs_root *csum_root; + struct btrfs_root *quota_root; /* the log root tree is a directory of all the other log roots */ struct btrfs_root *log_root_tree; @@ -1354,6 +1355,29 @@ struct btrfs_fs_info { #ifdef CONFIG_BTRFS_FS_CHECK_INTEGRITY u32 check_integrity_print_mask; #endif + /* + * quota information + */ + unsigned int quota_enabled:1; + + /* + * quota_enabled only changes state after a commit. This holds the + * next state. + */ + unsigned int pending_quota_state:1; + + /* is qgroup tracking in a consistent state? */ + u64 qgroup_flags; + + /* holds configuration and tracking. 
Protected by qgroup_lock */ + struct rb_root qgroup_tree; + spinlock_t qgroup_lock; + + /* list of dirty qgroups to be written at next commit */ + struct list_head dirty_qgroups; + + /* used by btrfs_qgroup_record_ref for an efficient tree traversal */ + u64 qgroup_seq; /* filesystem state */ u64 fs_state; @@ -3260,4 +3284,11 @@ void btrfs_get_tree_mod_seq(struct btrfs_fs_info *fs_info, void btrfs_put_tree_mod_seq(struct btrfs_fs_info *fs_info, struct seq_list *elem); +static inline int is_fstree(u64 rootid) +{ + if (rootid == BTRFS_FS_TREE_OBJECTID || + (s64)rootid >= (s64)BTRFS_FIRST_FREE_OBJECTID) + return 1; + return 0; +} #endif diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c index 0c7ac16..d42ad71 100644 --- a/fs/btrfs/disk-io.c +++ b/fs/btrfs/disk-io.c @@ -2106,6 +2106,13 @@ int open_ctree(struct super_block *sb, init_rwsem(&fs_info->cleanup_work_sem); init_rwsem(&fs_info->subvol_sem); + spin_lock_init(&fs_info->qgroup_lock); + fs_info->qgroup_tree = RB_ROOT; + INIT_LIST_HEAD(&fs_info->dirty_qgroups); + fs_info->qgroup_seq = 1; + fs_info->quota_enabled = 0; + fs_info->pending_quota_state = 0; + btrfs_init_free_cluster(&fs_info->meta_alloc_cluster); btrfs_init_free_cluster(&fs_info->data_alloc_cluster); -- 1.7.3.4
Jan Schmidt
2012-May-20 16:06 UTC
[PATCH 18/24] Btrfs: Test code to change the order of delayed-ref processing
From: Arne Jansen <sensille@gmx.net> Normally delayed refs get processed in ascending bytenr order. This correlates in most cases to the order added. To expose dependencies on this order, we start to process the tree in the middle instead of the beginning. This code is only effective when SCRAMBLE_DELAYED_REFS is defined. Signed-off-by: Arne Jansen <sensille@gmx.net> --- fs/btrfs/extent-tree.c | 51 +++++++++++++++++++++++++++++++++++++++++++++++- 1 files changed, 50 insertions(+), 1 deletions(-) diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c index b68eb7a..a7f980b 100644 --- a/fs/btrfs/extent-tree.c +++ b/fs/btrfs/extent-tree.c @@ -34,6 +34,8 @@ #include "locking.h" #include "free-space-cache.h" +#undef SCRAMBLE_DELAYED_REFS + /* * control flags for do_chunk_alloc's force field * CHUNK_ALLOC_NO_FORCE means to only allocate a chunk @@ -2347,7 +2349,6 @@ next: return count; } - static void wait_for_more_refs(struct btrfs_delayed_ref_root *delayed_refs, unsigned long num_refs) { @@ -2364,6 +2365,49 @@ static void wait_for_more_refs(struct btrfs_delayed_ref_root *delayed_refs, spin_lock(&delayed_refs->lock); } +#ifdef SCRAMBLE_DELAYED_REFS +/* + * Normally delayed refs get processed in ascending bytenr order. This + * correlates in most cases to the order added.
To expose dependencies on this + * order, we start to process the tree in the middle instead of the beginning + */ +static u64 find_middle(struct rb_root *root) +{ + struct rb_node *n = root->rb_node; + struct btrfs_delayed_ref_node *entry; + int alt = 1; + u64 middle; + u64 first = 0, last = 0; + + n = rb_first(root); + if (n) { + entry = rb_entry(n, struct btrfs_delayed_ref_node, rb_node); + first = entry->bytenr; + } + n = rb_last(root); + if (n) { + entry = rb_entry(n, struct btrfs_delayed_ref_node, rb_node); + last = entry->bytenr; + } + n = root->rb_node; + + while (n) { + entry = rb_entry(n, struct btrfs_delayed_ref_node, rb_node); + WARN_ON(!entry->in_tree); + + middle = entry->bytenr; + + if (alt) + n = n->rb_left; + else + n = n->rb_right; + + alt = 1 - alt; + } + return middle; +} +#endif + /* * this starts processing the delayed reference count updates and * extent insertions we have queued up so far. count can be @@ -2404,6 +2448,11 @@ int btrfs_run_delayed_refs(struct btrfs_trans_handle *trans, again: consider_waiting = 0; spin_lock(&delayed_refs->lock); + +#ifdef SCRAMBLE_DELAYED_REFS + delayed_refs->run_delayed_start = find_middle(&delayed_refs->root); +#endif + if (count == 0) { count = delayed_refs->num_entries * 2; run_most = 1; -- 1.7.3.4
Jan Schmidt
2012-May-20 16:06 UTC
[PATCH 19/24] Btrfs: qgroup implementation and prototypes
From: Arne Jansen <sensille@gmx.net> Signed-off-by: Arne Jansen <sensille@gmx.net> Signed-off-by: Jan Schmidt <list.btrfs@jan-o-sch.net> --- fs/btrfs/Makefile | 2 +- fs/btrfs/ctree.h | 33 ++ fs/btrfs/ioctl.h | 24 + fs/btrfs/qgroup.c | 1531 +++++++++++++++++++++++++++++++++++++++++++++++++++++ 4 files changed, 1589 insertions(+), 1 deletions(-) create mode 100644 fs/btrfs/qgroup.c diff --git a/fs/btrfs/Makefile b/fs/btrfs/Makefile index 0c4fa2b..0bc4d3a 100644 --- a/fs/btrfs/Makefile +++ b/fs/btrfs/Makefile @@ -8,7 +8,7 @@ btrfs-y += super.o ctree.o extent-tree.o print-tree.o root-tree.o dir-item.o \ extent_io.o volumes.o async-thread.o ioctl.o locking.o orphan.o \ export.o tree-log.o free-space-cache.o zlib.o lzo.o \ compression.o delayed-ref.o relocation.o delayed-inode.o scrub.o \ - reada.o backref.o ulist.o + reada.o backref.o ulist.o qgroup.o btrfs-$(CONFIG_BTRFS_FS_POSIX_ACL) += acl.o btrfs-$(CONFIG_BTRFS_FS_CHECK_INTEGRITY) += check-integrity.o diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h index 2b6f003..0630412 100644 --- a/fs/btrfs/ctree.h +++ b/fs/btrfs/ctree.h @@ -3284,6 +3284,39 @@ void btrfs_get_tree_mod_seq(struct btrfs_fs_info *fs_info, void btrfs_put_tree_mod_seq(struct btrfs_fs_info *fs_info, struct seq_list *elem); +/* qgroup.c */ +int btrfs_quota_enable(struct btrfs_trans_handle *trans, + struct btrfs_fs_info *fs_info); +int btrfs_quota_disable(struct btrfs_trans_handle *trans, + struct btrfs_fs_info *fs_info); +int btrfs_quota_rescan(struct btrfs_fs_info *fs_info); +int btrfs_add_qgroup_relation(struct btrfs_trans_handle *trans, + struct btrfs_fs_info *fs_info, u64 src, u64 dst); +int btrfs_del_qgroup_relation(struct btrfs_trans_handle *trans, + struct btrfs_fs_info *fs_info, u64 src, u64 dst); +int btrfs_create_qgroup(struct btrfs_trans_handle *trans, + struct btrfs_fs_info *fs_info, u64 qgroupid, + char *name); +int btrfs_remove_qgroup(struct btrfs_trans_handle *trans, + struct btrfs_fs_info *fs_info, u64 qgroupid); +int 
btrfs_limit_qgroup(struct btrfs_trans_handle *trans, + struct btrfs_fs_info *fs_info, u64 qgroupid, + struct btrfs_qgroup_limit *limit); +int btrfs_read_qgroup_config(struct btrfs_fs_info *fs_info); +void btrfs_free_qgroup_config(struct btrfs_fs_info *fs_info); +struct btrfs_delayed_extent_op; +int btrfs_qgroup_record_ref(struct btrfs_trans_handle *trans, + struct btrfs_fs_info *fs_info, + struct btrfs_delayed_ref_node *node, + struct btrfs_delayed_extent_op *extent_op); +int btrfs_run_qgroups(struct btrfs_trans_handle *trans, + struct btrfs_fs_info *fs_info); +int btrfs_qgroup_inherit(struct btrfs_trans_handle *trans, + struct btrfs_fs_info *fs_info, u64 srcid, u64 objectid, + struct btrfs_qgroup_inherit *inherit); +int btrfs_qgroup_reserve(struct btrfs_root *root, u64 num_bytes); +void btrfs_qgroup_free(struct btrfs_root *root, u64 num_bytes); + static inline int is_fstree(u64 rootid) { if (rootid == BTRFS_FS_TREE_OBJECTID || diff --git a/fs/btrfs/ioctl.h b/fs/btrfs/ioctl.h index 086e6bd..44c34a5 100644 --- a/fs/btrfs/ioctl.h +++ b/fs/btrfs/ioctl.h @@ -35,6 +35,30 @@ struct btrfs_ioctl_vol_args { #define BTRFS_FSID_SIZE 16 #define BTRFS_UUID_SIZE 16 +#define BTRFS_QGROUP_INHERIT_SET_LIMITS (1ULL << 0) + +struct btrfs_qgroup_limit { + __u64 flags; + __u64 max_rfer; + __u64 max_excl; + __u64 rsv_rfer; + __u64 rsv_excl; +}; + +struct btrfs_qgroup_inherit { + __u64 flags; + __u64 num_qgroups; + __u64 num_ref_copies; + __u64 num_excl_copies; + struct btrfs_qgroup_limit lim; + __u64 qgroups[0]; +}; + +struct btrfs_ioctl_qgroup_limit_args { + __u64 qgroupid; + struct btrfs_qgroup_limit lim; +}; + #define BTRFS_SUBVOL_NAME_MAX 4039 struct btrfs_ioctl_vol_args_v2 { __s64 fd; diff --git a/fs/btrfs/qgroup.c b/fs/btrfs/qgroup.c new file mode 100644 index 0000000..678fe45 --- /dev/null +++ b/fs/btrfs/qgroup.c @@ -0,0 +1,1531 @@ +/* + * Copyright (C) 2011 STRATO. All rights reserved. 
+ * + * This program is free software; you can redistribute it and/or + * modify it under the terms of the GNU General Public + * License v2 as published by the Free Software Foundation. + * + * This program is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU + * General Public License for more details. + * + * You should have received a copy of the GNU General Public + * License along with this program; if not, write to the + * Free Software Foundation, Inc., 59 Temple Place - Suite 330, + * Boston, MA 021110-1307, USA. + */ + +#include <linux/sched.h> +#include <linux/pagemap.h> +#include <linux/writeback.h> +#include <linux/blkdev.h> +#include <linux/rbtree.h> +#include <linux/slab.h> +#include <linux/workqueue.h> + +#include "ctree.h" +#include "transaction.h" +#include "disk-io.h" +#include "locking.h" +#include "ulist.h" +#include "ioctl.h" +#include "backref.h" + +/* TODO XXX FIXME + * - subvol delete -> delete when ref goes to 0? delete limits also? + * - reorganize keys + * - compressed + * - sync + * - rescan + * - copy also limits on subvol creation + * - limit + * - caches for ulists + * - performance benchmarks + * - check all ioctl parameters + */ + +/* + * one struct for each qgroup, organized in fs_info->qgroup_tree.
+ */ +struct btrfs_qgroup { + u64 qgroupid; + + /* + * state + */ + u64 rfer; /* referenced */ + u64 rfer_cmpr; /* referenced compressed */ + u64 excl; /* exclusive */ + u64 excl_cmpr; /* exclusive compressed */ + + /* + * limits + */ + u64 lim_flags; /* which limits are set */ + u64 max_rfer; + u64 max_excl; + u64 rsv_rfer; + u64 rsv_excl; + + /* + * reservation tracking + */ + u64 reserved; + + /* + * lists + */ + struct list_head groups; /* groups this group is member of */ + struct list_head members; /* groups that are members of this group */ + struct list_head dirty; /* dirty groups */ + struct rb_node node; /* tree of qgroups */ + + /* + * temp variables for accounting operations + */ + u64 tag; + u64 refcnt; +}; + +/* + * glue structure to represent the relations between qgroups. + */ +struct btrfs_qgroup_list { + struct list_head next_group; + struct list_head next_member; + struct btrfs_qgroup *group; + struct btrfs_qgroup *member; +}; + +/* must be called with qgroup_lock held */ +static struct btrfs_qgroup *find_qgroup_rb(struct btrfs_fs_info *fs_info, + u64 qgroupid) +{ + struct rb_node *n = fs_info->qgroup_tree.rb_node; + struct btrfs_qgroup *qgroup; + + while (n) { + qgroup = rb_entry(n, struct btrfs_qgroup, node); + if (qgroup->qgroupid < qgroupid) + n = n->rb_left; + else if (qgroup->qgroupid > qgroupid) + n = n->rb_right; + else + return qgroup; + } + return NULL; +} + +/* must be called with qgroup_lock held */ +static struct btrfs_qgroup *add_qgroup_rb(struct btrfs_fs_info *fs_info, + u64 qgroupid) +{ + struct rb_node **p = &fs_info->qgroup_tree.rb_node; + struct rb_node *parent = NULL; + struct btrfs_qgroup *qgroup; + + while (*p) { + parent = *p; + qgroup = rb_entry(parent, struct btrfs_qgroup, node); + + if (qgroup->qgroupid < qgroupid) + p = &(*p)->rb_left; + else if (qgroup->qgroupid > qgroupid) + p = &(*p)->rb_right; + else + return qgroup; + } + + qgroup = kzalloc(sizeof(*qgroup), GFP_ATOMIC); + if (!qgroup) + return ERR_PTR(-ENOMEM); + + 
qgroup->qgroupid = qgroupid; + INIT_LIST_HEAD(&qgroup->groups); + INIT_LIST_HEAD(&qgroup->members); + INIT_LIST_HEAD(&qgroup->dirty); + + rb_link_node(&qgroup->node, parent, p); + rb_insert_color(&qgroup->node, &fs_info->qgroup_tree); + + return qgroup; +} + +/* must be called with qgroup_lock held */ +static int del_qgroup_rb(struct btrfs_fs_info *fs_info, u64 qgroupid) +{ + struct btrfs_qgroup *qgroup = find_qgroup_rb(fs_info, qgroupid); + struct btrfs_qgroup_list *list; + + if (!qgroup) + return -ENOENT; + + rb_erase(&qgroup->node, &fs_info->qgroup_tree); + list_del(&qgroup->dirty); + + while (!list_empty(&qgroup->groups)) { + list = list_first_entry(&qgroup->groups, + struct btrfs_qgroup_list, next_group); + list_del(&list->next_group); + list_del(&list->next_member); + kfree(list); + } + + while (!list_empty(&qgroup->members)) { + list = list_first_entry(&qgroup->members, + struct btrfs_qgroup_list, next_member); + list_del(&list->next_group); + list_del(&list->next_member); + kfree(list); + } + kfree(qgroup); + + return 0; +} + +/* must be called with qgroup_lock held */ +static int add_relation_rb(struct btrfs_fs_info *fs_info, + u64 memberid, u64 parentid) +{ + struct btrfs_qgroup *member; + struct btrfs_qgroup *parent; + struct btrfs_qgroup_list *list; + + member = find_qgroup_rb(fs_info, memberid); + parent = find_qgroup_rb(fs_info, parentid); + if (!member || !parent) + return -ENOENT; + + list = kzalloc(sizeof(*list), GFP_ATOMIC); + if (!list) + return -ENOMEM; + + list->group = parent; + list->member = member; + list_add_tail(&list->next_group, &member->groups); + list_add_tail(&list->next_member, &parent->members); + + return 0; +} + +/* must be called with qgroup_lock held */ +static int del_relation_rb(struct btrfs_fs_info *fs_info, + u64 memberid, u64 parentid) +{ + struct btrfs_qgroup *member; + struct btrfs_qgroup *parent; + struct btrfs_qgroup_list *list; + + member = find_qgroup_rb(fs_info, memberid); + parent = find_qgroup_rb(fs_info, 
parentid); + if (!member || !parent) + return -ENOENT; + + list_for_each_entry(list, &member->groups, next_group) { + if (list->group == parent) { + list_del(&list->next_group); + list_del(&list->next_member); + kfree(list); + return 0; + } + } + return -ENOENT; +} + +/* + * The full config is read in one go, only called from open_ctree() + * It doesn't use any locking, as at this point we're still single-threaded + */ +int btrfs_read_qgroup_config(struct btrfs_fs_info *fs_info) +{ + struct btrfs_key key; + struct btrfs_key found_key; + struct btrfs_root *quota_root = fs_info->quota_root; + struct btrfs_path *path = NULL; + struct extent_buffer *l; + int slot; + int ret = 0; + u64 flags = 0; + + if (!fs_info->quota_enabled) + return 0; + + path = btrfs_alloc_path(); + if (!path) { + ret = -ENOMEM; + goto out; + } + + /* default this to quota off, in case no status key is found */ + fs_info->qgroup_flags = 0; + + /* + * pass 1: read status, all qgroup infos and limits + */ + key.objectid = 0; + key.type = 0; + key.offset = 0; + ret = btrfs_search_slot_for_read(quota_root, &key, path, 1, 1); + if (ret) + goto out; + + while (1) { + struct btrfs_qgroup *qgroup; + + slot = path->slots[0]; + l = path->nodes[0]; + btrfs_item_key_to_cpu(l, &found_key, slot); + + if (found_key.type == BTRFS_QGROUP_STATUS_KEY) { + struct btrfs_qgroup_status_item *ptr; + + ptr = btrfs_item_ptr(l, slot, + struct btrfs_qgroup_status_item); + + if (btrfs_qgroup_status_version(l, ptr) != + BTRFS_QGROUP_STATUS_VERSION) { + printk(KERN_ERR + "btrfs: old qgroup version, quota disabled\n"); + goto out; + } + if (btrfs_qgroup_status_generation(l, ptr) != + fs_info->generation) { + flags |= BTRFS_QGROUP_STATUS_FLAG_INCONSISTENT; + printk(KERN_ERR + "btrfs: qgroup generation mismatch, " + "marked as inconsistent\n"); + } + fs_info->qgroup_flags = btrfs_qgroup_status_flags(l, + ptr); + /* FIXME read scan element */ + goto next1; + } + + if (found_key.type != BTRFS_QGROUP_INFO_KEY && + found_key.type !=
BTRFS_QGROUP_LIMIT_KEY) + goto next1; + + qgroup = find_qgroup_rb(fs_info, found_key.offset); + if ((qgroup && found_key.type == BTRFS_QGROUP_INFO_KEY) || + (!qgroup && found_key.type == BTRFS_QGROUP_LIMIT_KEY)) { + printk(KERN_ERR "btrfs: inconsistent qgroup config\n"); + flags |= BTRFS_QGROUP_STATUS_FLAG_INCONSISTENT; + } + if (!qgroup) { + qgroup = add_qgroup_rb(fs_info, found_key.offset); + if (IS_ERR(qgroup)) { + ret = PTR_ERR(qgroup); + goto out; + } + } + switch (found_key.type) { + case BTRFS_QGROUP_INFO_KEY: { + struct btrfs_qgroup_info_item *ptr; + + ptr = btrfs_item_ptr(l, slot, + struct btrfs_qgroup_info_item); + qgroup->rfer = btrfs_qgroup_info_rfer(l, ptr); + qgroup->rfer_cmpr = btrfs_qgroup_info_rfer_cmpr(l, ptr); + qgroup->excl = btrfs_qgroup_info_excl(l, ptr); + qgroup->excl_cmpr = btrfs_qgroup_info_excl_cmpr(l, ptr); + /* generation currently unused */ + break; + } + case BTRFS_QGROUP_LIMIT_KEY: { + struct btrfs_qgroup_limit_item *ptr; + + ptr = btrfs_item_ptr(l, slot, + struct btrfs_qgroup_limit_item); + qgroup->lim_flags = btrfs_qgroup_limit_flags(l, ptr); + qgroup->max_rfer = btrfs_qgroup_limit_max_rfer(l, ptr); + qgroup->max_excl = btrfs_qgroup_limit_max_excl(l, ptr); + qgroup->rsv_rfer = btrfs_qgroup_limit_rsv_rfer(l, ptr); + qgroup->rsv_excl = btrfs_qgroup_limit_rsv_excl(l, ptr); + break; + } + } +next1: + ret = btrfs_next_item(quota_root, path); + if (ret < 0) + goto out; + if (ret) + break; + } + btrfs_release_path(path); + + /* + * pass 2: read all qgroup relations + */ + key.objectid = 0; + key.type = BTRFS_QGROUP_RELATION_KEY; + key.offset = 0; + ret = btrfs_search_slot_for_read(quota_root, &key, path, 1, 0); + if (ret) + goto out; + while (1) { + slot = path->slots[0]; + l = path->nodes[0]; + btrfs_item_key_to_cpu(l, &found_key, slot); + + if (found_key.type != BTRFS_QGROUP_RELATION_KEY) + goto next2; + + if (found_key.objectid > found_key.offset) { + /* parent <- member, not needed to build config */ + /* FIXME should we omit the key
completely? */ + goto next2; + } + + ret = add_relation_rb(fs_info, found_key.objectid, + found_key.offset); + if (ret) + goto out; +next2: + ret = btrfs_next_item(quota_root, path); + if (ret < 0) + goto out; + if (ret) + break; + } +out: + fs_info->qgroup_flags |= flags; + if (!(fs_info->qgroup_flags & BTRFS_QGROUP_STATUS_FLAG_ON)) { + fs_info->quota_enabled = 0; + fs_info->pending_quota_state = 0; + } + btrfs_free_path(path); + + return ret < 0 ? ret : 0; +} + +/* + * This is only called from close_ctree() or open_ctree(), both in single- + * threaded paths. Clean up the in-memory structures. No locking needed. + */ +void btrfs_free_qgroup_config(struct btrfs_fs_info *fs_info) +{ + struct rb_node *n; + struct btrfs_qgroup *qgroup; + struct btrfs_qgroup_list *list; + + while ((n = rb_first(&fs_info->qgroup_tree))) { + qgroup = rb_entry(n, struct btrfs_qgroup, node); + rb_erase(n, &fs_info->qgroup_tree); + + WARN_ON(!list_empty(&qgroup->dirty)); + + while (!list_empty(&qgroup->groups)) { + list = list_first_entry(&qgroup->groups, + struct btrfs_qgroup_list, + next_group); + list_del(&list->next_group); + list_del(&list->next_member); + kfree(list); + } + + while (!list_empty(&qgroup->members)) { + list = list_first_entry(&qgroup->members, + struct btrfs_qgroup_list, + next_member); + list_del(&list->next_group); + list_del(&list->next_member); + kfree(list); + } + kfree(qgroup); + } +} + +static int add_qgroup_relation_item(struct btrfs_trans_handle *trans, + struct btrfs_root *quota_root, + u64 src, u64 dst) +{ + int ret; + struct btrfs_path *path; + struct btrfs_key key; + + path = btrfs_alloc_path(); + if (!path) + return -ENOMEM; + + key.objectid = src; + key.type = BTRFS_QGROUP_RELATION_KEY; + key.offset = dst; + + ret = btrfs_insert_empty_item(trans, quota_root, path, &key, 0); + + btrfs_mark_buffer_dirty(path->nodes[0]); + + btrfs_free_path(path); + return ret; +} + +static int del_qgroup_relation_item(struct btrfs_trans_handle *trans, + struct btrfs_root
*quota_root,
+				      u64 src, u64 dst)
+{
+	int ret;
+	struct btrfs_path *path;
+	struct btrfs_key key;
+
+	path = btrfs_alloc_path();
+	if (!path)
+		return -ENOMEM;
+
+	key.objectid = src;
+	key.type = BTRFS_QGROUP_RELATION_KEY;
+	key.offset = dst;
+
+	ret = btrfs_search_slot(trans, quota_root, &key, path, -1, 1);
+	if (ret < 0)
+		goto out;
+
+	if (ret > 0) {
+		ret = -ENOENT;
+		goto out;
+	}
+
+	ret = btrfs_del_item(trans, quota_root, path);
+out:
+	btrfs_free_path(path);
+	return ret;
+}
+
+static int add_qgroup_item(struct btrfs_trans_handle *trans,
+			   struct btrfs_root *quota_root, u64 qgroupid)
+{
+	int ret;
+	struct btrfs_path *path;
+	struct btrfs_qgroup_info_item *qgroup_info;
+	struct btrfs_qgroup_limit_item *qgroup_limit;
+	struct extent_buffer *leaf;
+	struct btrfs_key key;
+
+	path = btrfs_alloc_path();
+	if (!path)
+		return -ENOMEM;
+
+	key.objectid = 0;
+	key.type = BTRFS_QGROUP_INFO_KEY;
+	key.offset = qgroupid;
+
+	ret = btrfs_insert_empty_item(trans, quota_root, path, &key,
+				      sizeof(*qgroup_info));
+	if (ret)
+		goto out;
+
+	leaf = path->nodes[0];
+	qgroup_info = btrfs_item_ptr(leaf, path->slots[0],
+				     struct btrfs_qgroup_info_item);
+	btrfs_set_qgroup_info_generation(leaf, qgroup_info, trans->transid);
+	btrfs_set_qgroup_info_rfer(leaf, qgroup_info, 0);
+	btrfs_set_qgroup_info_rfer_cmpr(leaf, qgroup_info, 0);
+	btrfs_set_qgroup_info_excl(leaf, qgroup_info, 0);
+	btrfs_set_qgroup_info_excl_cmpr(leaf, qgroup_info, 0);
+
+	btrfs_mark_buffer_dirty(leaf);
+
+	btrfs_release_path(path);
+
+	key.type = BTRFS_QGROUP_LIMIT_KEY;
+	ret = btrfs_insert_empty_item(trans, quota_root, path, &key,
+				      sizeof(*qgroup_limit));
+	if (ret)
+		goto out;
+
+	leaf = path->nodes[0];
+	qgroup_limit = btrfs_item_ptr(leaf, path->slots[0],
+				      struct btrfs_qgroup_limit_item);
+	btrfs_set_qgroup_limit_flags(leaf, qgroup_limit, 0);
+	btrfs_set_qgroup_limit_max_rfer(leaf, qgroup_limit, 0);
+	btrfs_set_qgroup_limit_max_excl(leaf, qgroup_limit, 0);
+	btrfs_set_qgroup_limit_rsv_rfer(leaf, qgroup_limit, 0);
+	btrfs_set_qgroup_limit_rsv_excl(leaf, qgroup_limit, 0);
+
+	btrfs_mark_buffer_dirty(leaf);
+
+	ret = 0;
+out:
+	btrfs_free_path(path);
+	return ret;
+}
+
+static int del_qgroup_item(struct btrfs_trans_handle *trans,
+			   struct btrfs_root *quota_root, u64 qgroupid)
+{
+	int ret;
+	struct btrfs_path *path;
+	struct btrfs_key key;
+
+	path = btrfs_alloc_path();
+	if (!path)
+		return -ENOMEM;
+
+	key.objectid = 0;
+	key.type = BTRFS_QGROUP_INFO_KEY;
+	key.offset = qgroupid;
+	ret = btrfs_search_slot(trans, quota_root, &key, path, -1, 1);
+	if (ret < 0)
+		goto out;
+
+	if (ret > 0) {
+		ret = -ENOENT;
+		goto out;
+	}
+
+	ret = btrfs_del_item(trans, quota_root, path);
+	if (ret)
+		goto out;
+
+	btrfs_release_path(path);
+
+	key.type = BTRFS_QGROUP_LIMIT_KEY;
+	ret = btrfs_search_slot(trans, quota_root, &key, path, -1, 1);
+	if (ret < 0)
+		goto out;
+
+	if (ret > 0) {
+		ret = -ENOENT;
+		goto out;
+	}
+
+	ret = btrfs_del_item(trans, quota_root, path);
+
+out:
+	btrfs_free_path(path);
+	return ret;
+}
+
+static int update_qgroup_limit_item(struct btrfs_trans_handle *trans,
+				    struct btrfs_root *root, u64 qgroupid,
+				    u64 flags, u64 max_rfer, u64 max_excl,
+				    u64 rsv_rfer, u64 rsv_excl)
+{
+	struct btrfs_path *path;
+	struct btrfs_key key;
+	struct extent_buffer *l;
+	struct btrfs_qgroup_limit_item *qgroup_limit;
+	int ret;
+	int slot;
+
+	key.objectid = 0;
+	key.type = BTRFS_QGROUP_LIMIT_KEY;
+	key.offset = qgroupid;
+
+	path = btrfs_alloc_path();
+	BUG_ON(!path);
+	ret = btrfs_search_slot(trans, root, &key, path, 0, 1);
+	if (ret > 0)
+		ret = -ENOENT;
+
+	if (ret)
+		goto out;
+
+	l = path->nodes[0];
+	slot = path->slots[0];
+	qgroup_limit = btrfs_item_ptr(l, path->slots[0],
+				      struct btrfs_qgroup_limit_item);
+	btrfs_set_qgroup_limit_flags(l, qgroup_limit, flags);
+	btrfs_set_qgroup_limit_max_rfer(l, qgroup_limit, max_rfer);
+	btrfs_set_qgroup_limit_max_excl(l, qgroup_limit, max_excl);
+	btrfs_set_qgroup_limit_rsv_rfer(l, qgroup_limit, rsv_rfer);
+	btrfs_set_qgroup_limit_rsv_excl(l, qgroup_limit, rsv_excl);
+
+	btrfs_mark_buffer_dirty(l);
+
+out:
+	btrfs_free_path(path);
+	return ret;
+}
+
+static int update_qgroup_info_item(struct btrfs_trans_handle *trans,
+				   struct btrfs_root *root,
+				   struct btrfs_qgroup *qgroup)
+{
+	struct btrfs_path *path;
+	struct btrfs_key key;
+	struct extent_buffer *l;
+	struct btrfs_qgroup_info_item *qgroup_info;
+	int ret;
+	int slot;
+
+	key.objectid = 0;
+	key.type = BTRFS_QGROUP_INFO_KEY;
+	key.offset = qgroup->qgroupid;
+
+	path = btrfs_alloc_path();
+	BUG_ON(!path);
+	ret = btrfs_search_slot(trans, root, &key, path, 0, 1);
+	if (ret > 0)
+		ret = -ENOENT;
+
+	if (ret)
+		goto out;
+
+	l = path->nodes[0];
+	slot = path->slots[0];
+	qgroup_info = btrfs_item_ptr(l, path->slots[0],
+				     struct btrfs_qgroup_info_item);
+	btrfs_set_qgroup_info_generation(l, qgroup_info, trans->transid);
+	btrfs_set_qgroup_info_rfer(l, qgroup_info, qgroup->rfer);
+	btrfs_set_qgroup_info_rfer_cmpr(l, qgroup_info, qgroup->rfer_cmpr);
+	btrfs_set_qgroup_info_excl(l, qgroup_info, qgroup->excl);
+	btrfs_set_qgroup_info_excl_cmpr(l, qgroup_info, qgroup->excl_cmpr);
+
+	btrfs_mark_buffer_dirty(l);
+
+out:
+	btrfs_free_path(path);
+	return ret;
+}
+
+static int update_qgroup_status_item(struct btrfs_trans_handle *trans,
+				     struct btrfs_fs_info *fs_info,
+				     struct btrfs_root *root)
+{
+	struct btrfs_path *path;
+	struct btrfs_key key;
+	struct extent_buffer *l;
+	struct btrfs_qgroup_status_item *ptr;
+	int ret;
+	int slot;
+
+	key.objectid = 0;
+	key.type = BTRFS_QGROUP_STATUS_KEY;
+	key.offset = 0;
+
+	path = btrfs_alloc_path();
+	BUG_ON(!path);
+	ret = btrfs_search_slot(trans, root, &key, path, 0, 1);
+	if (ret > 0)
+		ret = -ENOENT;
+
+	if (ret)
+		goto out;
+
+	l = path->nodes[0];
+	slot = path->slots[0];
+	ptr = btrfs_item_ptr(l, slot, struct btrfs_qgroup_status_item);
+	btrfs_set_qgroup_status_flags(l, ptr, fs_info->qgroup_flags);
+	btrfs_set_qgroup_status_generation(l, ptr, trans->transid);
+	/* XXX scan */
+
+	btrfs_mark_buffer_dirty(l);
+
+out:
+	btrfs_free_path(path);
+	return ret;
+}
+
+/*
+ * called with qgroup_lock held
+ */
+static int btrfs_clean_quota_tree(struct btrfs_trans_handle *trans,
+				  struct btrfs_root *root)
+{
+	struct btrfs_path *path;
+	struct btrfs_key key;
+	int ret;
+
+	if (!root)
+		return -EINVAL;
+
+	path = btrfs_alloc_path();
+	if (!path)
+		return -ENOMEM;
+
+	while (1) {
+		key.objectid = 0;
+		key.offset = 0;
+		key.type = 0;
+
+		path->leave_spinning = 1;
+		ret = btrfs_search_slot(trans, root, &key, path, -1, 1);
+		if (ret > 0) {
+			if (path->slots[0] == 0)
+				break;
+			path->slots[0]--;
+		} else if (ret < 0) {
+			break;
+		}
+
+		ret = btrfs_del_item(trans, root, path);
+		if (ret)
+			goto out;
+		btrfs_release_path(path);
+	}
+	ret = 0;
+out:
+	root->fs_info->pending_quota_state = 0;
+	btrfs_free_path(path);
+	return ret;
+}
+
+int btrfs_quota_enable(struct btrfs_trans_handle *trans,
+		       struct btrfs_fs_info *fs_info)
+{
+	struct btrfs_root *quota_root;
+	struct btrfs_path *path = NULL;
+	struct btrfs_qgroup_status_item *ptr;
+	struct extent_buffer *leaf;
+	struct btrfs_key key;
+	int ret = 0;
+
+	spin_lock(&fs_info->qgroup_lock);
+	if (fs_info->quota_root) {
+		fs_info->pending_quota_state = 1;
+		spin_unlock(&fs_info->qgroup_lock);
+		goto out;
+	}
+	spin_unlock(&fs_info->qgroup_lock);
+
+	/*
+	 * initially create the quota tree
+	 */
+	quota_root = btrfs_create_tree(trans, fs_info,
+				       BTRFS_QUOTA_TREE_OBJECTID);
+	if (IS_ERR(quota_root)) {
+		ret = PTR_ERR(quota_root);
+		goto out;
+	}
+
+	path = btrfs_alloc_path();
+	if (!path)
+		return -ENOMEM;
+
+	key.objectid = 0;
+	key.type = BTRFS_QGROUP_STATUS_KEY;
+	key.offset = 0;
+
+	ret = btrfs_insert_empty_item(trans, quota_root, path, &key,
+				      sizeof(*ptr));
+	if (ret)
+		goto out;
+
+	leaf = path->nodes[0];
+	ptr = btrfs_item_ptr(leaf, path->slots[0],
+			     struct btrfs_qgroup_status_item);
+	btrfs_set_qgroup_status_generation(leaf, ptr, trans->transid);
+	btrfs_set_qgroup_status_version(leaf, ptr,
+					BTRFS_QGROUP_STATUS_VERSION);
+	fs_info->qgroup_flags = BTRFS_QGROUP_STATUS_FLAG_ON |
+				BTRFS_QGROUP_STATUS_FLAG_INCONSISTENT;
+	btrfs_set_qgroup_status_flags(leaf, ptr, fs_info->qgroup_flags);
+	btrfs_set_qgroup_status_scan(leaf, ptr, 0);
+
+	btrfs_mark_buffer_dirty(leaf);
+
+	spin_lock(&fs_info->qgroup_lock);
+	fs_info->quota_root = quota_root;
+	fs_info->pending_quota_state = 1;
+	spin_unlock(&fs_info->qgroup_lock);
+out:
+	btrfs_free_path(path);
+	return ret;
+}
+
+int btrfs_quota_disable(struct btrfs_trans_handle *trans,
+			struct btrfs_fs_info *fs_info)
+{
+	struct btrfs_root *tree_root = fs_info->tree_root;
+	struct btrfs_root *quota_root;
+	int ret = 0;
+
+	spin_lock(&fs_info->qgroup_lock);
+	fs_info->pending_quota_state = 0;
+	quota_root = fs_info->quota_root;
+	fs_info->quota_root = NULL;
+	btrfs_free_qgroup_config(fs_info);
+	spin_unlock(&fs_info->qgroup_lock);
+
+	if (!quota_root)
+		return -EINVAL;
+
+	ret = btrfs_clean_quota_tree(trans, quota_root);
+	if (ret)
+		goto out;
+
+	ret = btrfs_del_root(trans, tree_root, &quota_root->root_key);
+	if (ret)
+		goto out;
+
+	list_del(&quota_root->dirty_list);
+
+	btrfs_tree_lock(quota_root->node);
+	clean_tree_block(trans, tree_root, quota_root->node);
+	btrfs_tree_unlock(quota_root->node);
+	btrfs_free_tree_block(trans, quota_root, quota_root->node, 0, 1);
+
+	free_extent_buffer(quota_root->node);
+	free_extent_buffer(quota_root->commit_root);
+	kfree(quota_root);
+out:
+	return ret;
+}
+
+int btrfs_quota_rescan(struct btrfs_fs_info *fs_info)
+{
+	/* FIXME */
+	return 0;
+}
+
+int btrfs_add_qgroup_relation(struct btrfs_trans_handle *trans,
+			      struct btrfs_fs_info *fs_info, u64 src, u64 dst)
+{
+	struct btrfs_root *quota_root;
+	int ret = 0;
+
+	quota_root = fs_info->quota_root;
+	if (!quota_root)
+		return -EINVAL;
+
+	ret = add_qgroup_relation_item(trans, quota_root, src, dst);
+	if (ret)
+		return ret;
+
+	ret = add_qgroup_relation_item(trans, quota_root, dst, src);
+	if (ret) {
+		del_qgroup_relation_item(trans, quota_root, src, dst);
+		return ret;
+	}
+
+	spin_lock(&fs_info->qgroup_lock);
+	ret = add_relation_rb(quota_root->fs_info, src, dst);
+	spin_unlock(&fs_info->qgroup_lock);
+
+	return ret;
+}
+
+int btrfs_del_qgroup_relation(struct btrfs_trans_handle *trans,
+			      struct btrfs_fs_info *fs_info, u64 src, u64 dst)
+{
+	struct btrfs_root *quota_root;
+	int ret = 0;
+	int err;
+
+	quota_root = fs_info->quota_root;
+	if (!quota_root)
+		return -EINVAL;
+
+	ret = del_qgroup_relation_item(trans, quota_root, src, dst);
+	err = del_qgroup_relation_item(trans, quota_root, dst, src);
+	if (err && !ret)
+		ret = err;
+
+	spin_lock(&fs_info->qgroup_lock);
+	del_relation_rb(fs_info, src, dst);
+
+	spin_unlock(&fs_info->qgroup_lock);
+
+	return ret;
+}
+
+int btrfs_create_qgroup(struct btrfs_trans_handle *trans,
+			struct btrfs_fs_info *fs_info, u64 qgroupid, char *name)
+{
+	struct btrfs_root *quota_root;
+	struct btrfs_qgroup *qgroup;
+	int ret = 0;
+
+	quota_root = fs_info->quota_root;
+	if (!quota_root)
+		return -EINVAL;
+
+	ret = add_qgroup_item(trans, quota_root, qgroupid);
+
+	spin_lock(&fs_info->qgroup_lock);
+	qgroup = add_qgroup_rb(fs_info, qgroupid);
+	spin_unlock(&fs_info->qgroup_lock);
+
+	if (IS_ERR(qgroup))
+		ret = PTR_ERR(qgroup);
+
+	return ret;
+}
+
+int btrfs_remove_qgroup(struct btrfs_trans_handle *trans,
+			struct btrfs_fs_info *fs_info, u64 qgroupid)
+{
+	struct btrfs_root *quota_root;
+	int ret = 0;
+
+	quota_root = fs_info->quota_root;
+	if (!quota_root)
+		return -EINVAL;
+
+	ret = del_qgroup_item(trans, quota_root, qgroupid);
+
+	spin_lock(&fs_info->qgroup_lock);
+	del_qgroup_rb(quota_root->fs_info, qgroupid);
+
+	spin_unlock(&fs_info->qgroup_lock);
+
+	return ret;
+}
+
+int btrfs_limit_qgroup(struct btrfs_trans_handle *trans,
+		       struct btrfs_fs_info *fs_info, u64 qgroupid,
+		       struct btrfs_qgroup_limit *limit)
+{
+	struct btrfs_root *quota_root = fs_info->quota_root;
+	struct btrfs_qgroup *qgroup;
+	int ret = 0;
+
+	if (!quota_root)
+		return -EINVAL;
+
+	ret = update_qgroup_limit_item(trans, quota_root, qgroupid,
+				       limit->flags, limit->max_rfer,
+				       limit->max_excl, limit->rsv_rfer,
+				       limit->rsv_excl);
+	if (ret) {
+		fs_info->qgroup_flags |= BTRFS_QGROUP_STATUS_FLAG_INCONSISTENT;
+		printk(KERN_INFO "unable to update quota limit for %llu\n",
+		       (unsigned long long)qgroupid);
+	}
+
+	spin_lock(&fs_info->qgroup_lock);
+
+	qgroup = find_qgroup_rb(fs_info, qgroupid);
+	if (!qgroup) {
+		ret = -ENOENT;
+		goto unlock;
+	}
+	qgroup->lim_flags = limit->flags;
+	qgroup->max_rfer = limit->max_rfer;
+	qgroup->max_excl = limit->max_excl;
+	qgroup->rsv_rfer = limit->rsv_rfer;
+	qgroup->rsv_excl = limit->rsv_excl;
+
+unlock:
+	spin_unlock(&fs_info->qgroup_lock);
+
+	return ret;
+}
+
+static void qgroup_dirty(struct btrfs_fs_info *fs_info,
+			 struct btrfs_qgroup *qgroup)
+{
+	if (list_empty(&qgroup->dirty))
+		list_add(&qgroup->dirty, &fs_info->dirty_qgroups);
+}
+
+/*
+ * btrfs_qgroup_record_ref is called for every ref that is added to or deleted
+ * from the fs. First, all roots referencing the extent are searched, and
+ * then the space is accounted accordingly to the different roots. The
+ * accounting algorithm works in 3 steps documented inline.
+ */
+int btrfs_qgroup_record_ref(struct btrfs_trans_handle *trans,
+			    struct btrfs_fs_info *fs_info,
+			    struct btrfs_delayed_ref_node *node,
+			    struct btrfs_delayed_extent_op *extent_op)
+{
+	struct btrfs_key ins;
+	struct btrfs_root *quota_root;
+	u64 ref_root;
+	struct btrfs_qgroup *qgroup;
+	struct ulist_node *unode;
+	struct ulist *roots = NULL;
+	struct ulist *tmp = NULL;
+	u64 seq;
+	int ret = 0;
+	int sgn;
+
+	if (!fs_info->quota_enabled)
+		return 0;
+
+	BUG_ON(!fs_info->quota_root);
+
+	ins.objectid = node->bytenr;
+	ins.offset = node->num_bytes;
+	ins.type = BTRFS_EXTENT_ITEM_KEY;
+
+	if (node->type == BTRFS_TREE_BLOCK_REF_KEY ||
+	    node->type == BTRFS_SHARED_BLOCK_REF_KEY) {
+		struct btrfs_delayed_tree_ref *ref;
+		ref = btrfs_delayed_node_to_tree_ref(node);
+		ref_root = ref->root;
+	} else if (node->type == BTRFS_EXTENT_DATA_REF_KEY ||
+		   node->type == BTRFS_SHARED_DATA_REF_KEY) {
+		struct btrfs_delayed_data_ref *ref;
+		ref = btrfs_delayed_node_to_data_ref(node);
+		ref_root = ref->root;
+	} else {
+		BUG();
+	}
+
+	if (!is_fstree(ref_root)) {
+		/*
+		 * non-fs-trees are not being accounted
+		 */
+		return 0;
+	}
+
+	switch (node->action) {
+	case BTRFS_ADD_DELAYED_REF:
+	case BTRFS_ADD_DELAYED_EXTENT:
+		sgn = 1;
+		break;
+	case BTRFS_DROP_DELAYED_REF:
+		sgn = -1;
+		break;
+	case BTRFS_UPDATE_DELAYED_HEAD:
+		return 0;
+	default:
+		BUG();
+	}
+
+	ret = btrfs_find_all_roots(trans, fs_info, node->bytenr,
+				   node->num_bytes,
+				   sgn > 0 ? node->seq - 1 : node->seq, &roots);
+	if (IS_ERR(roots)) {
+		ret = PTR_ERR(roots);
+		goto out;
+	}
+
+	spin_lock(&fs_info->qgroup_lock);
+	quota_root = fs_info->quota_root;
+	if (!quota_root)
+		goto out;
+
+	qgroup = find_qgroup_rb(fs_info, ref_root);
+	if (!qgroup)
+		goto out;
+
+	/*
+	 * step 1: for each old ref, visit all nodes once and inc refcnt
+	 */
+	unode = NULL;
+	tmp = ulist_alloc(GFP_ATOMIC);
+	if (!tmp) {
+		ret = -ENOMEM;
+		goto out;
+	}
+	seq = fs_info->qgroup_seq;
+	fs_info->qgroup_seq += roots->nnodes + 1; /* max refcnt */
+
+	while ((unode = ulist_next(roots, unode))) {
+		struct ulist_node *tmp_unode;
+		struct btrfs_qgroup *qg;
+
+		qg = find_qgroup_rb(fs_info, unode->val);
+		if (!qg)
+			continue;
+
+		ulist_reinit(tmp);
+		/* XXX id not needed */
+		ulist_add(tmp, qg->qgroupid, (unsigned long)qg, GFP_ATOMIC);
+		tmp_unode = NULL;
+		while ((tmp_unode = ulist_next(tmp, tmp_unode))) {
+			struct btrfs_qgroup_list *glist;
+
+			qg = (struct btrfs_qgroup *)tmp_unode->aux;
+			if (qg->refcnt < seq)
+				qg->refcnt = seq + 1;
+			else
+				++qg->refcnt;
+
+			list_for_each_entry(glist, &qg->groups, next_group) {
+				ulist_add(tmp, glist->group->qgroupid,
+					  (unsigned long)glist->group,
+					  GFP_ATOMIC);
+			}
+		}
+	}
+
+	/*
+	 * step 2: walk from the new root
+	 */
+	ulist_reinit(tmp);
+	ulist_add(tmp, qgroup->qgroupid, (unsigned long)qgroup, GFP_ATOMIC);
+	unode = NULL;
+	while ((unode = ulist_next(tmp, unode))) {
+		struct btrfs_qgroup *qg;
+		struct btrfs_qgroup_list *glist;
+
+		qg = (struct btrfs_qgroup *)unode->aux;
+		if (qg->refcnt < seq) {
+			/* not visited by step 1 */
+			qg->rfer += sgn * node->num_bytes;
+			qg->rfer_cmpr += sgn * node->num_bytes;
+			if (roots->nnodes == 0) {
+				qg->excl += sgn * node->num_bytes;
+				qg->excl_cmpr += sgn * node->num_bytes;
+			}
+			qgroup_dirty(fs_info, qg);
+		}
+		WARN_ON(qg->tag >= seq);
+		qg->tag = seq;
+
+		list_for_each_entry(glist, &qg->groups, next_group) {
+			ulist_add(tmp, glist->group->qgroupid,
+				  (unsigned long)glist->group, GFP_ATOMIC);
+		}
+	}
+
+	/*
+	 * step 3: walk again from old refs
+	 */
+	while ((unode = ulist_next(roots, unode))) {
+		struct btrfs_qgroup *qg;
+		struct ulist_node *tmp_unode;
+
+		qg = find_qgroup_rb(fs_info, unode->val);
+		if (!qg)
+			continue;
+
+		ulist_reinit(tmp);
+		ulist_add(tmp, qg->qgroupid, (unsigned long)qg, GFP_ATOMIC);
+		tmp_unode = NULL;
+		while ((tmp_unode = ulist_next(tmp, tmp_unode))) {
+			struct btrfs_qgroup_list *glist;
+
+			qg = (struct btrfs_qgroup *)tmp_unode->aux;
+			if (qg->tag == seq)
+				continue;
+
+			if (qg->refcnt - seq == roots->nnodes) {
+				qg->excl -= sgn * node->num_bytes;
+				qg->excl_cmpr -= sgn * node->num_bytes;
+				qgroup_dirty(fs_info, qg);
+			}
+
+			list_for_each_entry(glist, &qg->groups, next_group) {
+				ulist_add(tmp, glist->group->qgroupid,
+					  (unsigned long)glist->group,
+					  GFP_ATOMIC);
+			}
+		}
+	}
+	ret = 0;
+out:
+	spin_unlock(&fs_info->qgroup_lock);
+	ulist_free(roots);
+	ulist_free(tmp);
+
+	return ret;
+}
+
+/*
+ * called from commit_transaction. Writes all changed qgroups to disk.
+ */
+int btrfs_run_qgroups(struct btrfs_trans_handle *trans,
+		      struct btrfs_fs_info *fs_info)
+{
+	struct btrfs_root *quota_root = fs_info->quota_root;
+	int ret = 0;
+
+	if (!quota_root)
+		goto out;
+
+	fs_info->quota_enabled = fs_info->pending_quota_state;
+
+	spin_lock(&fs_info->qgroup_lock);
+	while (!list_empty(&fs_info->dirty_qgroups)) {
+		struct btrfs_qgroup *qgroup;
+		qgroup = list_first_entry(&fs_info->dirty_qgroups,
+					  struct btrfs_qgroup, dirty);
+		list_del_init(&qgroup->dirty);
+		spin_unlock(&fs_info->qgroup_lock);
+		ret = update_qgroup_info_item(trans, quota_root, qgroup);
+		if (ret)
+			fs_info->qgroup_flags |=
+					BTRFS_QGROUP_STATUS_FLAG_INCONSISTENT;
+		spin_lock(&fs_info->qgroup_lock);
+	}
+	if (fs_info->quota_enabled)
+		fs_info->qgroup_flags |= BTRFS_QGROUP_STATUS_FLAG_ON;
+	else
+		fs_info->qgroup_flags &= ~BTRFS_QGROUP_STATUS_FLAG_ON;
+	spin_unlock(&fs_info->qgroup_lock);
+
+	ret = update_qgroup_status_item(trans, fs_info, quota_root);
+	if (ret)
+		fs_info->qgroup_flags |= BTRFS_QGROUP_STATUS_FLAG_INCONSISTENT;
+
+out:
+
+	return ret;
+}
+
+/*
+ * copy the accounting information between qgroups. This is necessary when a
+ * snapshot or a subvolume is created
+ */
+int btrfs_qgroup_inherit(struct btrfs_trans_handle *trans,
+			 struct btrfs_fs_info *fs_info, u64 srcid, u64 objectid,
+			 struct btrfs_qgroup_inherit *inherit)
+{
+	int ret = 0;
+	int i;
+	u64 *i_qgroups;
+	struct btrfs_root *quota_root = fs_info->quota_root;
+	struct btrfs_qgroup *srcgroup;
+	struct btrfs_qgroup *dstgroup;
+	u32 level_size = 0;
+
+	if (!fs_info->quota_enabled)
+		return 0;
+
+	if (!quota_root)
+		ret = -EINVAL;
+
+	/*
+	 * create a tracking group for the subvol itself
+	 */
+	ret = add_qgroup_item(trans, quota_root, objectid);
+	if (ret)
+		goto out;
+
+	if (inherit && inherit->flags & BTRFS_QGROUP_INHERIT_SET_LIMITS) {
+		ret = update_qgroup_limit_item(trans, quota_root, objectid,
+					       inherit->lim.flags,
+					       inherit->lim.max_rfer,
+					       inherit->lim.max_excl,
+					       inherit->lim.rsv_rfer,
+					       inherit->lim.rsv_excl);
+		if (ret)
+			goto out;
+	}
+
+	if (srcid) {
+		struct btrfs_root *srcroot;
+		struct btrfs_key srckey;
+		int srcroot_level;
+
+		srckey.objectid = srcid;
+		srckey.type = BTRFS_ROOT_ITEM_KEY;
+		srckey.offset = (u64)-1;
+		srcroot = btrfs_read_fs_root_no_name(fs_info, &srckey);
+		if (IS_ERR(srcroot)) {
+			ret = PTR_ERR(srcroot);
+			goto out;
+		}
+
+		rcu_read_lock();
+		srcroot_level = btrfs_header_level(srcroot->node);
+		level_size = btrfs_level_size(srcroot, srcroot_level);
+		rcu_read_unlock();
+	}
+
+	/*
+	 * add qgroup to all inherited groups
+	 */
+	if (inherit) {
+		i_qgroups = (u64 *)(inherit + 1);
+		for (i = 0; i < inherit->num_qgroups; ++i) {
+			ret = add_qgroup_relation_item(trans, quota_root,
+						       objectid, *i_qgroups);
+			if (ret)
+				goto out;
+			ret = add_qgroup_relation_item(trans, quota_root,
+						       *i_qgroups, objectid);
+			if (ret)
+				goto out;
+			++i_qgroups;
+		}
+	}
+
+
+	spin_lock(&fs_info->qgroup_lock);
+
+	dstgroup = add_qgroup_rb(fs_info, objectid);
+	if (!dstgroup)
+		goto unlock;
+
+	if (srcid) {
+		srcgroup = find_qgroup_rb(fs_info, srcid);
+		if (!srcgroup)
+			goto unlock;
+		dstgroup->rfer = srcgroup->rfer - level_size;
+		dstgroup->rfer_cmpr = srcgroup->rfer_cmpr - level_size;
+		srcgroup->excl = level_size;
+		srcgroup->excl_cmpr = level_size;
+		qgroup_dirty(fs_info, dstgroup);
+		qgroup_dirty(fs_info, srcgroup);
+	}
+
+	if (!inherit)
+		goto unlock;
+
+	i_qgroups = (u64 *)(inherit + 1);
+	for (i = 0; i < inherit->num_qgroups; ++i) {
+		ret = add_relation_rb(quota_root->fs_info, objectid,
+				      *i_qgroups);
+		if (ret)
+			goto unlock;
+		++i_qgroups;
+	}
+
+	for (i = 0; i < inherit->num_ref_copies; ++i) {
+		struct btrfs_qgroup *src;
+		struct btrfs_qgroup *dst;
+
+		src = find_qgroup_rb(fs_info, i_qgroups[0]);
+		dst = find_qgroup_rb(fs_info, i_qgroups[1]);
+
+		if (!src || !dst) {
+			ret = -EINVAL;
+			goto unlock;
+		}
+
+		dst->rfer = src->rfer - level_size;
+		dst->rfer_cmpr = src->rfer_cmpr - level_size;
+		i_qgroups += 2;
+	}
+	for (i = 0; i < inherit->num_excl_copies; ++i) {
+		struct btrfs_qgroup *src;
+		struct btrfs_qgroup *dst;
+
+		src = find_qgroup_rb(fs_info, i_qgroups[0]);
+		dst = find_qgroup_rb(fs_info, i_qgroups[1]);
+
+		if (!src || !dst) {
+			ret = -EINVAL;
+			goto unlock;
+		}
+
+		dst->excl = src->excl + level_size;
+		dst->excl_cmpr = src->excl_cmpr + level_size;
+		i_qgroups += 2;
+	}
+
+unlock:
+	spin_unlock(&fs_info->qgroup_lock);
+out:
+	return 0;
+}
+
+/*
+ * reserve some space for a qgroup and all its parents. The reservation takes
+ * place with start_transaction or dealloc_reserve, similar to ENOSPC
+ * accounting. If not enough space is available, EDQUOT is returned.
+ * We assume that the requested space is new for all qgroups.
+ */
+int btrfs_qgroup_reserve(struct btrfs_root *root, u64 num_bytes)
+{
+	struct btrfs_root *quota_root;
+	struct btrfs_qgroup *qgroup;
+	struct btrfs_fs_info *fs_info = root->fs_info;
+	u64 ref_root = root->root_key.objectid;
+	int ret = 0;
+	struct ulist *ulist = NULL;
+	struct ulist_node *unode;
+
+	if (!is_fstree(ref_root))
+		return 0;
+
+	if (num_bytes == 0)
+		return 0;
+
+	spin_lock(&fs_info->qgroup_lock);
+	quota_root = fs_info->quota_root;
+	if (!quota_root)
+		goto out;
+
+	qgroup = find_qgroup_rb(fs_info, ref_root);
+	if (!qgroup)
+		goto out;
+
+	/*
+	 * in a first step, we check all affected qgroups if any limits would
+	 * be exceeded
+	 */
+	ulist = ulist_alloc(GFP_ATOMIC);
+	ulist_add(ulist, qgroup->qgroupid, (unsigned long)qgroup, GFP_ATOMIC);
+	unode = NULL;
+	while ((unode = ulist_next(ulist, unode))) {
+		struct btrfs_qgroup *qg;
+		struct btrfs_qgroup_list *glist;
+
+		qg = (struct btrfs_qgroup *)unode->aux;
+
+		if ((qg->lim_flags & BTRFS_QGROUP_LIMIT_MAX_RFER) &&
+		    qg->reserved + qg->rfer + num_bytes >
+		    qg->max_rfer)
+			ret = -EDQUOT;
+
+		if ((qg->lim_flags & BTRFS_QGROUP_LIMIT_MAX_EXCL) &&
+		    qg->reserved + qg->excl + num_bytes >
+		    qg->max_excl)
+			ret = -EDQUOT;
+
+		list_for_each_entry(glist, &qg->groups, next_group) {
+			ulist_add(ulist, glist->group->qgroupid,
+				  (unsigned long)glist->group, GFP_ATOMIC);
+		}
+	}
+	if (ret)
+		goto out;
+
+	/*
+	 * no limits exceeded, now record the reservation into all qgroups
+	 */
+	unode = NULL;
+	while ((unode = ulist_next(ulist, unode))) {
+		struct btrfs_qgroup *qg;
+
+		qg = (struct btrfs_qgroup *)unode->aux;
+
+		qg->reserved += num_bytes;
+#if 0
+		qgroup_dirty(fs_info, qg); /* XXX not necessary */
+#endif
+	}
+
+out:
+	spin_unlock(&fs_info->qgroup_lock);
+	ulist_free(ulist);
+
+	return ret;
+}
+
+void btrfs_qgroup_free(struct btrfs_root *root, u64 num_bytes)
+{
+	struct btrfs_root *quota_root;
+	struct btrfs_qgroup *qgroup;
+	struct btrfs_fs_info *fs_info = root->fs_info;
+	struct ulist *ulist = NULL;
+	struct ulist_node *unode;
+	u64 ref_root = root->root_key.objectid;
+
+	if (!is_fstree(ref_root))
+		return;
+
+	if (num_bytes == 0)
+		return;
+
+	spin_lock(&fs_info->qgroup_lock);
+
+	quota_root = fs_info->quota_root;
+	if (!quota_root)
+		goto out;
+
+	qgroup = find_qgroup_rb(fs_info, ref_root);
+	if (!qgroup)
+		goto out;
+
+	ulist = ulist_alloc(GFP_ATOMIC);
+	ulist_add(ulist, qgroup->qgroupid, (unsigned long)qgroup, GFP_ATOMIC);
+	unode = NULL;
+	while ((unode = ulist_next(ulist, unode))) {
+		struct btrfs_qgroup *qg;
+		struct btrfs_qgroup_list *glist;
+
+		qg = (struct btrfs_qgroup *)unode->aux;
+
+		qg->reserved -= num_bytes;
+#if 0
+		qgroup_dirty(fs_info, qg);
+#endif
+
+		list_for_each_entry(glist, &qg->groups, next_group) {
+			ulist_add(ulist, glist->group->qgroupid,
+				  (unsigned long)glist->group, GFP_ATOMIC);
+		}
+	}
+
+out:
+	spin_unlock(&fs_info->qgroup_lock);
+	ulist_free(ulist);
+}
-- 
1.7.3.4
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
From: Arne Jansen <sensille@gmx.net>

Init the quota tree along with the others on open_ctree and close_ctree.
Add the quota tree to the list of well known trees in
btrfs_read_fs_root_no_name.

Signed-off-by: Arne Jansen <sensille@gmx.net>
---
 fs/btrfs/ctree.h   |    1 +
 fs/btrfs/disk-io.c |   47 +++++++++++++++++++++++++++++++++++++++++------
 fs/btrfs/qgroup.c  |    4 ++--
 3 files changed, 44 insertions(+), 8 deletions(-)

diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
index 0630412..f70ddb8 100644
--- a/fs/btrfs/ctree.h
+++ b/fs/btrfs/ctree.h
@@ -2908,6 +2908,7 @@ static inline void free_fs_info(struct btrfs_fs_info *fs_info)
 	kfree(fs_info->chunk_root);
 	kfree(fs_info->dev_root);
 	kfree(fs_info->csum_root);
+	kfree(fs_info->quota_root);
 	kfree(fs_info->super_copy);
 	kfree(fs_info->super_for_commit);
 	kfree(fs_info);
diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index d42ad71..df0fde8 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -1471,6 +1471,9 @@ struct btrfs_root *btrfs_read_fs_root_no_name(struct btrfs_fs_info *fs_info,
 		return fs_info->dev_root;
 	if (location->objectid == BTRFS_CSUM_TREE_OBJECTID)
 		return fs_info->csum_root;
+	if (location->objectid == BTRFS_QUOTA_TREE_OBJECTID)
+		return fs_info->quota_root ? fs_info->quota_root :
+					     ERR_PTR(-ENOENT);
 again:
 	spin_lock(&fs_info->fs_roots_radix_lock);
 	root = radix_tree_lookup(&fs_info->fs_roots_radix,
@@ -1898,6 +1901,10 @@ static void free_root_pointers(struct btrfs_fs_info *info, int chunk_root)
 	free_extent_buffer(info->extent_root->commit_root);
 	free_extent_buffer(info->csum_root->node);
 	free_extent_buffer(info->csum_root->commit_root);
+	if (info->quota_root) {
+		free_extent_buffer(info->quota_root->node);
+		free_extent_buffer(info->quota_root->commit_root);
+	}
 
 	info->tree_root->node = NULL;
 	info->tree_root->commit_root = NULL;
@@ -1907,6 +1914,10 @@ static void free_root_pointers(struct btrfs_fs_info *info, int chunk_root)
 	info->extent_root->commit_root = NULL;
 	info->csum_root->node = NULL;
 	info->csum_root->commit_root = NULL;
+	if (info->quota_root) {
+		info->quota_root->node = NULL;
+		info->quota_root->commit_root = NULL;
+	}
 
 	if (chunk_root) {
 		free_extent_buffer(info->chunk_root->node);
@@ -1937,6 +1948,7 @@ int open_ctree(struct super_block *sb,
 	struct btrfs_root *csum_root;
 	struct btrfs_root *chunk_root;
 	struct btrfs_root *dev_root;
+	struct btrfs_root *quota_root;
 	struct btrfs_root *log_tree_root;
 	int ret;
 	int err = -EINVAL;
@@ -1948,9 +1960,10 @@ int open_ctree(struct super_block *sb,
 	csum_root = fs_info->csum_root = btrfs_alloc_root(fs_info);
 	chunk_root = fs_info->chunk_root = btrfs_alloc_root(fs_info);
 	dev_root = fs_info->dev_root = btrfs_alloc_root(fs_info);
+	quota_root = fs_info->quota_root = btrfs_alloc_root(fs_info);
 
 	if (!tree_root || !extent_root || !csum_root ||
-	    !chunk_root || !dev_root) {
+	    !chunk_root || !dev_root || !quota_root) {
 		err = -ENOMEM;
 		goto fail;
 	}
@@ -2438,6 +2451,17 @@ retry_root_backup:
 
 	csum_root->track_dirty = 1;
 
+	ret = find_and_setup_root(tree_root, fs_info,
+				  BTRFS_QUOTA_TREE_OBJECTID, quota_root);
+	if (ret) {
+		kfree(quota_root);
+		quota_root = fs_info->quota_root = NULL;
+	} else {
+		quota_root->track_dirty = 1;
+		fs_info->quota_enabled = 1;
+		fs_info->pending_quota_state = 1;
+	}
+
 	fs_info->generation = generation;
 	fs_info->last_trans_committed = generation;
@@ -2484,6 +2508,9 @@ retry_root_backup:
 		       " integrity check module %s\n", sb->s_id);
 	}
 #endif
+	ret = btrfs_read_qgroup_config(fs_info);
+	if (ret)
+		goto fail_trans_kthread;
 
 	/* do not make disk changes in broken FS */
 	if (btrfs_super_log_root(disk_super) != 0 &&
@@ -2494,7 +2521,7 @@ retry_root_backup:
 			printk(KERN_WARNING "Btrfs log replay required "
 			       "on RO media\n");
 			err = -EIO;
-			goto fail_trans_kthread;
+			goto fail_qgroup;
 		}
 		blocksize =
 			btrfs_level_size(tree_root,
@@ -2503,7 +2530,7 @@ retry_root_backup:
 		log_tree_root = btrfs_alloc_root(fs_info);
 		if (!log_tree_root) {
 			err = -ENOMEM;
-			goto fail_trans_kthread;
+			goto fail_qgroup;
 		}
@@ -2543,7 +2570,7 @@ retry_root_backup:
 			printk(KERN_WARNING
 			       "btrfs: failed to recover relocation\n");
 			err = -EINVAL;
-			goto fail_trans_kthread;
+			goto fail_qgroup;
 		}
 	}
@@ -2553,10 +2580,10 @@ retry_root_backup:
 
 	fs_info->fs_root = btrfs_read_fs_root_no_name(fs_info, &location);
 	if (!fs_info->fs_root)
-		goto fail_trans_kthread;
+		goto fail_qgroup;
 	if (IS_ERR(fs_info->fs_root)) {
 		err = PTR_ERR(fs_info->fs_root);
-		goto fail_trans_kthread;
+		goto fail_qgroup;
 	}
 
 	if (!(sb->s_flags & MS_RDONLY)) {
@@ -2577,6 +2604,8 @@ retry_root_backup:
 
 	return 0;
 
+fail_qgroup:
+	btrfs_free_qgroup_config(fs_info);
 fail_trans_kthread:
 	kthread_stop(fs_info->transaction_kthread);
 fail_cleaner:
@@ -3182,6 +3211,8 @@ int close_ctree(struct btrfs_root *root)
 	fs_info->closing = 2;
 	smp_mb();
 
+	btrfs_free_qgroup_config(root->fs_info);
+
 	if (fs_info->delalloc_bytes) {
 		printk(KERN_INFO "btrfs: at unmount delalloc count %llu\n",
 		       (unsigned long long)fs_info->delalloc_bytes);
@@ -3201,6 +3232,10 @@ int close_ctree(struct btrfs_root *root)
 	free_extent_buffer(fs_info->dev_root->commit_root);
 	free_extent_buffer(fs_info->csum_root->node);
 	free_extent_buffer(fs_info->csum_root->commit_root);
+	if (fs_info->quota_root) {
+		free_extent_buffer(fs_info->quota_root->node);
+		free_extent_buffer(fs_info->quota_root->commit_root);
+	}
 
 	btrfs_free_block_groups(fs_info);
diff --git a/fs/btrfs/qgroup.c b/fs/btrfs/qgroup.c
index 678fe45..014ee8a 100644
--- a/fs/btrfs/qgroup.c
+++ b/fs/btrfs/qgroup.c
@@ -1076,8 +1076,8 @@ int btrfs_qgroup_record_ref(struct btrfs_trans_handle *trans,
 	}
 
 	ret = btrfs_find_all_roots(trans, fs_info, node->bytenr,
-				   node->num_bytes,
-				   sgn > 0 ? node->seq - 1 : node->seq, &roots);
+				   sgn > 0 ? node->seq - 1 : node->seq, 0,
+				   &roots);
 	if (IS_ERR(roots)) {
 		ret = PTR_ERR(roots);
 		goto out;
-- 
1.7.3.4
Jan Schmidt
2012-May-20 16:06 UTC
[PATCH 21/24] Btrfs: hooks for qgroup to record delayed refs
From: Arne Jansen <sensille@gmx.net>

Hooks into qgroup code to record refs and into transaction commit. This
is the main entry point for qgroup. Basically every change in extent
backrefs gets accounted to the appropriate qgroups.

Signed-off-by: Arne Jansen <sensille@gmx.net>
---
 fs/btrfs/delayed-ref.c |   29 +++++++++++++++++++++++------
 fs/btrfs/transaction.c |    7 +++++++
 2 files changed, 30 insertions(+), 6 deletions(-)

diff --git a/fs/btrfs/delayed-ref.c b/fs/btrfs/delayed-ref.c
index 69f22e3..fe07753 100644
--- a/fs/btrfs/delayed-ref.c
+++ b/fs/btrfs/delayed-ref.c
@@ -504,7 +504,7 @@ static noinline void add_delayed_tree_ref(struct btrfs_fs_info *fs_info,
 					  struct btrfs_delayed_ref_node *ref,
 					  u64 bytenr, u64 num_bytes, u64 parent,
 					  u64 ref_root, int level, int action,
-					  int for_cow)
+					  int for_cow, struct seq_list *seq_elem)
 {
 	struct btrfs_delayed_ref_node *existing;
 	struct btrfs_delayed_tree_ref *full_ref;
@@ -525,8 +525,10 @@ static noinline void add_delayed_tree_ref(struct btrfs_fs_info *fs_info,
 	ref->is_head = 0;
 	ref->in_tree = 1;
 
-	if (need_ref_seq(for_cow, ref_root))
+	if (need_ref_seq(for_cow, ref_root)) {
+		btrfs_get_delayed_seq(delayed_refs, seq_elem);
 		seq = inc_delayed_seq(delayed_refs);
+	}
 	ref->seq = seq;
 
 	full_ref = btrfs_delayed_node_to_tree_ref(ref);
@@ -563,7 +565,8 @@ static noinline void add_delayed_data_ref(struct btrfs_fs_info *fs_info,
 					  struct btrfs_delayed_ref_node *ref,
 					  u64 bytenr, u64 num_bytes, u64 parent,
 					  u64 ref_root, u64 owner, u64 offset,
-					  int action, int for_cow)
+					  int action, int for_cow,
+					  struct seq_list *seq_elem)
 {
 	struct btrfs_delayed_ref_node *existing;
 	struct btrfs_delayed_data_ref *full_ref;
@@ -584,8 +587,10 @@ static noinline void add_delayed_data_ref(struct btrfs_fs_info *fs_info,
 	ref->is_head = 0;
 	ref->in_tree = 1;
 
-	if (need_ref_seq(for_cow, ref_root))
+	if (need_ref_seq(for_cow, ref_root)) {
+		btrfs_get_delayed_seq(delayed_refs, seq_elem);
 		seq = inc_delayed_seq(delayed_refs);
+	}
 	ref->seq = seq;
 
 	full_ref = btrfs_delayed_node_to_data_ref(ref);
@@ -631,6 +636,7 @@ int btrfs_add_delayed_tree_ref(struct btrfs_fs_info *fs_info,
 	struct btrfs_delayed_tree_ref *ref;
 	struct btrfs_delayed_ref_head *head_ref;
 	struct btrfs_delayed_ref_root *delayed_refs;
+	struct seq_list seq_elem;
 
 	BUG_ON(extent_op && extent_op->is_data);
 	ref = kmalloc(sizeof(*ref), GFP_NOFS);
@@ -657,11 +663,16 @@ int btrfs_add_delayed_tree_ref(struct btrfs_fs_info *fs_info,
 
 	add_delayed_tree_ref(fs_info, trans, &ref->node, bytenr,
 			     num_bytes, parent, ref_root, level, action,
-			     for_cow);
+			     for_cow, &seq_elem);
 	if (!need_ref_seq(for_cow, ref_root) &&
 	    waitqueue_active(&delayed_refs->seq_wait))
 		wake_up(&delayed_refs->seq_wait);
 	spin_unlock(&delayed_refs->lock);
+	if (need_ref_seq(for_cow, ref_root)) {
+		btrfs_qgroup_record_ref(trans, fs_info, &ref->node, extent_op);
+		btrfs_put_delayed_seq(delayed_refs, &seq_elem);
+	}
+
 	return 0;
 }
 
@@ -679,6 +690,7 @@ int btrfs_add_delayed_data_ref(struct btrfs_fs_info *fs_info,
 	struct btrfs_delayed_data_ref *ref;
 	struct btrfs_delayed_ref_head *head_ref;
 	struct btrfs_delayed_ref_root *delayed_refs;
+	struct seq_list seq_elem;
 
 	BUG_ON(extent_op && !extent_op->is_data);
 	ref = kmalloc(sizeof(*ref), GFP_NOFS);
@@ -705,11 +717,16 @@ int btrfs_add_delayed_data_ref(struct btrfs_fs_info *fs_info,
 
 	add_delayed_data_ref(fs_info, trans, &ref->node, bytenr,
 			     num_bytes, parent, ref_root, owner, offset,
-			     action, for_cow);
+			     action, for_cow, &seq_elem);
 	if (!need_ref_seq(for_cow, ref_root) &&
 	    waitqueue_active(&delayed_refs->seq_wait))
 		wake_up(&delayed_refs->seq_wait);
 	spin_unlock(&delayed_refs->lock);
+	if (need_ref_seq(for_cow, ref_root)) {
+		btrfs_qgroup_record_ref(trans, fs_info, &ref->node, extent_op);
+		btrfs_put_delayed_seq(delayed_refs, &seq_elem);
+	}
+
	return 0;
 }
 
diff --git a/fs/btrfs/transaction.c b/fs/btrfs/transaction.c
index cde906f..2a5e75c 100644
--- a/fs/btrfs/transaction.c
+++ b/fs/btrfs/transaction.c
@@ -774,6 +774,13 @@ static noinline int commit_cowonly_roots(struct btrfs_trans_handle *trans,
 	if (ret)
 		return ret;
 
+	ret = btrfs_run_qgroups(trans, root->fs_info);
+	BUG_ON(ret);
+
+	/* run_qgroups might have added some more refs */
+	ret = btrfs_run_delayed_refs(trans, root, (unsigned long)-1);
+	BUG_ON(ret);
+
 	while (!list_empty(&fs_info->dirty_cowonly_roots)) {
 		next = fs_info->dirty_cowonly_roots.next;
 		list_del_init(next);
-- 
1.7.3.4
From: Arne Jansen <sensille@gmx.net> Like block reserves, reserve a small piece of space on each transaction start and for delalloc. These are the hooks that can actually return EDQUOT to the user. The amount of space reserved is tracked in the transaction handle. Signed-off-by: Arne Jansen <sensille@gmx.net> --- fs/btrfs/extent-tree.c | 12 ++++++++++++ fs/btrfs/transaction.c | 16 ++++++++++++++++ fs/btrfs/transaction.h | 1 + 3 files changed, 29 insertions(+), 0 deletions(-) diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c index a7f980b..855a3ff 100644 --- a/fs/btrfs/extent-tree.c +++ b/fs/btrfs/extent-tree.c @@ -4524,6 +4524,13 @@ int btrfs_delalloc_reserve_metadata(struct inode *inode, u64 num_bytes) csum_bytes = BTRFS_I(inode)->csum_bytes; spin_unlock(&BTRFS_I(inode)->lock); + if (root->fs_info->quota_enabled) { + ret = btrfs_qgroup_reserve(root, num_bytes + + nr_extents * root->leafsize); + if (ret) + return ret; + } + ret = reserve_metadata_bytes(root, block_rsv, to_reserve, flush); if (ret) { u64 to_free = 0; @@ -4601,6 +4608,11 @@ void btrfs_delalloc_release_metadata(struct inode *inode, u64 num_bytes) trace_btrfs_space_reservation(root->fs_info, "delalloc", btrfs_ino(inode), to_free, 0); + if (root->fs_info->quota_enabled) { + btrfs_qgroup_free(root, num_bytes + + dropped * root->leafsize); + } + btrfs_block_rsv_release(root, &root->fs_info->delalloc_block_rsv, to_free); } diff --git a/fs/btrfs/transaction.c b/fs/btrfs/transaction.c index 2a5e75c..69b52bd 100644 --- a/fs/btrfs/transaction.c +++ b/fs/btrfs/transaction.c @@ -285,6 +285,7 @@ static struct btrfs_trans_handle *start_transaction(struct btrfs_root *root, struct btrfs_transaction *cur_trans; u64 num_bytes = 0; int ret; + u64 qgroup_reserved = 0; if (root->fs_info->fs_state & BTRFS_SUPER_FLAG_ERROR) return ERR_PTR(-EROFS); @@ -303,6 +304,14 @@ static struct btrfs_trans_handle *start_transaction(struct btrfs_root *root, * the appropriate flushing if need be. 
*/ if (num_items > 0 && root != root->fs_info->chunk_root) { + if (root->fs_info->quota_enabled && + is_fstree(root->root_key.objectid)) { + qgroup_reserved = num_items * root->leafsize; + ret = btrfs_qgroup_reserve(root, qgroup_reserved); + if (ret) + return ERR_PTR(ret); + } + num_bytes = btrfs_calc_trans_metadata_size(root, num_items); ret = btrfs_block_rsv_add(root, &root->fs_info->trans_block_rsv, @@ -341,6 +350,7 @@ again: h->block_rsv = NULL; h->orig_rsv = NULL; h->aborted = 0; + h->qgroup_reserved = qgroup_reserved; smp_mb(); if (cur_trans->blocked && may_wait_transaction(root, type)) { @@ -507,6 +517,12 @@ static int __btrfs_end_transaction(struct btrfs_trans_handle *trans, * end_transaction. Subvolume quota depends on this. */ WARN_ON(trans->root != root); + + if (trans->qgroup_reserved) { + btrfs_qgroup_free(root, trans->qgroup_reserved); + trans->qgroup_reserved = 0; + } + while (count < 2) { unsigned long cur = trans->delayed_ref_updates; trans->delayed_ref_updates = 0; diff --git a/fs/btrfs/transaction.h b/fs/btrfs/transaction.h index 0107294..addfa2d 100644 --- a/fs/btrfs/transaction.h +++ b/fs/btrfs/transaction.h @@ -49,6 +49,7 @@ struct btrfs_transaction { struct btrfs_trans_handle { u64 transid; u64 bytes_reserved; + u64 qgroup_reserved; unsigned long use_count; unsigned long blocks_reserved; unsigned long blocks_used; -- 1.7.3.4 -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
From: Arne Jansen <sensille@gmx.net> Ioctls to control the qgroup feature like adding and removing qgroups and assigning qgroups. Signed-off-by: Arne Jansen <sensille@gmx.net> --- fs/btrfs/ioctl.c | 185 ++++++++++++++++++++++++++++++++++++++++++++++++++++++ fs/btrfs/ioctl.h | 27 ++++++++ 2 files changed, 212 insertions(+), 0 deletions(-) diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c index 7f3a913..869c335 100644 --- a/fs/btrfs/ioctl.c +++ b/fs/btrfs/ioctl.c @@ -3342,6 +3342,183 @@ out: return ret; } +static long btrfs_ioctl_quota_ctl(struct btrfs_root *root, void __user *arg) +{ + struct btrfs_ioctl_quota_ctl_args *sa; + struct btrfs_trans_handle *trans = NULL; + int ret; + int err; + + if (!capable(CAP_SYS_ADMIN)) + return -EPERM; + + if (root->fs_info->sb->s_flags & MS_RDONLY) + return -EROFS; + + sa = memdup_user(arg, sizeof(*sa)); + if (IS_ERR(sa)) + return PTR_ERR(sa); + + if (sa->cmd != BTRFS_QUOTA_CTL_RESCAN) { + trans = btrfs_start_transaction(root, 2); + if (IS_ERR(trans)) { + ret = PTR_ERR(trans); + goto out; + } + } + + switch (sa->cmd) { + case BTRFS_QUOTA_CTL_ENABLE: + ret = btrfs_quota_enable(trans, root->fs_info); + break; + case BTRFS_QUOTA_CTL_DISABLE: + ret = btrfs_quota_disable(trans, root->fs_info); + break; + case BTRFS_QUOTA_CTL_RESCAN: + ret = btrfs_quota_rescan(root->fs_info); + break; + default: + ret = -EINVAL; + break; + } + + if (copy_to_user(arg, sa, sizeof(*sa))) + ret = -EFAULT; + + if (trans) { + err = btrfs_commit_transaction(trans, root); + if (err && !ret) + ret = err; + } + +out: + kfree(sa); + return ret; +} + +static long btrfs_ioctl_qgroup_assign(struct btrfs_root *root, void __user *arg) +{ + struct btrfs_ioctl_qgroup_assign_args *sa; + struct btrfs_trans_handle *trans; + int ret; + int err; + + if (!capable(CAP_SYS_ADMIN)) + return -EPERM; + + if (root->fs_info->sb->s_flags & MS_RDONLY) + return -EROFS; + + sa = memdup_user(arg, sizeof(*sa)); + if (IS_ERR(sa)) + return PTR_ERR(sa); + + trans = 
btrfs_join_transaction(root); + if (IS_ERR(trans)) { + ret = PTR_ERR(trans); + goto out; + } + + /* FIXME: check if the IDs really exist */ + if (sa->assign) { + ret = btrfs_add_qgroup_relation(trans, root->fs_info, + sa->src, sa->dst); + } else { + ret = btrfs_del_qgroup_relation(trans, root->fs_info, + sa->src, sa->dst); + } + + err = btrfs_end_transaction(trans, root); + if (err && !ret) + ret = err; + +out: + kfree(sa); + return ret; +} + +static long btrfs_ioctl_qgroup_create(struct btrfs_root *root, void __user *arg) +{ + struct btrfs_ioctl_qgroup_create_args *sa; + struct btrfs_trans_handle *trans; + int ret; + int err; + + if (!capable(CAP_SYS_ADMIN)) + return -EPERM; + + if (root->fs_info->sb->s_flags & MS_RDONLY) + return -EROFS; + + sa = memdup_user(arg, sizeof(*sa)); + if (IS_ERR(sa)) + return PTR_ERR(sa); + + trans = btrfs_join_transaction(root); + if (IS_ERR(trans)) { + ret = PTR_ERR(trans); + goto out; + } + + /* FIXME: check if the IDs really exist */ + if (sa->create) { + ret = btrfs_create_qgroup(trans, root->fs_info, sa->qgroupid, + NULL); + } else { + ret = btrfs_remove_qgroup(trans, root->fs_info, sa->qgroupid); + } + + err = btrfs_end_transaction(trans, root); + if (err && !ret) + ret = err; + +out: + kfree(sa); + return ret; +} + +static long btrfs_ioctl_qgroup_limit(struct btrfs_root *root, void __user *arg) +{ + struct btrfs_ioctl_qgroup_limit_args *sa; + struct btrfs_trans_handle *trans; + int ret; + int err; + u64 qgroupid; + + if (!capable(CAP_SYS_ADMIN)) + return -EPERM; + + if (root->fs_info->sb->s_flags & MS_RDONLY) + return -EROFS; + + sa = memdup_user(arg, sizeof(*sa)); + if (IS_ERR(sa)) + return PTR_ERR(sa); + + trans = btrfs_join_transaction(root); + if (IS_ERR(trans)) { + ret = PTR_ERR(trans); + goto out; + } + + qgroupid = sa->qgroupid; + if (!qgroupid) { + /* take the current subvol as qgroup */ + qgroupid = root->root_key.objectid; + } + + /* FIXME: check if the IDs really exist */ + ret = btrfs_limit_qgroup(trans, 
root->fs_info, qgroupid, &sa->lim); + + err = btrfs_end_transaction(trans, root); + if (err && !ret) + ret = err; + +out: + kfree(sa); + return ret; +} + long btrfs_ioctl(struct file *file, unsigned int cmd, unsigned long arg) { @@ -3424,6 +3601,14 @@ long btrfs_ioctl(struct file *file, unsigned int return btrfs_ioctl_balance_ctl(root, arg); case BTRFS_IOC_BALANCE_PROGRESS: return btrfs_ioctl_balance_progress(root, argp); + case BTRFS_IOC_QUOTA_CTL: + return btrfs_ioctl_quota_ctl(root, argp); + case BTRFS_IOC_QGROUP_ASSIGN: + return btrfs_ioctl_qgroup_assign(root, argp); + case BTRFS_IOC_QGROUP_CREATE: + return btrfs_ioctl_qgroup_create(root, argp); + case BTRFS_IOC_QGROUP_LIMIT: + return btrfs_ioctl_qgroup_limit(root, argp); } return -ENOTTY; diff --git a/fs/btrfs/ioctl.h b/fs/btrfs/ioctl.h index 44c34a5..f7002c8 100644 --- a/fs/btrfs/ioctl.h +++ b/fs/btrfs/ioctl.h @@ -290,6 +290,25 @@ struct btrfs_ioctl_logical_ino_args { __u64 inodes; }; +#define BTRFS_QUOTA_CTL_ENABLE 1 +#define BTRFS_QUOTA_CTL_DISABLE 2 +#define BTRFS_QUOTA_CTL_RESCAN 3 +struct btrfs_ioctl_quota_ctl_args { + __u64 cmd; + __u64 status; +}; + +struct btrfs_ioctl_qgroup_assign_args { + __u64 assign; + __u64 src; + __u64 dst; +}; + +struct btrfs_ioctl_qgroup_create_args { + __u64 create; + __u64 qgroupid; +}; + #define BTRFS_IOC_SNAP_CREATE _IOW(BTRFS_IOCTL_MAGIC, 1, \ struct btrfs_ioctl_vol_args) #define BTRFS_IOC_DEFRAG _IOW(BTRFS_IOCTL_MAGIC, 2, \ @@ -355,4 +374,12 @@ struct btrfs_ioctl_logical_ino_args { #define BTRFS_IOC_LOGICAL_INO _IOWR(BTRFS_IOCTL_MAGIC, 36, \ struct btrfs_ioctl_ino_path_args) +#define BTRFS_IOC_QUOTA_CTL _IOWR(BTRFS_IOCTL_MAGIC, 40, \ + struct btrfs_ioctl_quota_ctl_args) +#define BTRFS_IOC_QGROUP_ASSIGN _IOW(BTRFS_IOCTL_MAGIC, 41, \ + struct btrfs_ioctl_qgroup_assign_args) +#define BTRFS_IOC_QGROUP_CREATE _IOW(BTRFS_IOCTL_MAGIC, 42, \ + struct btrfs_ioctl_qgroup_create_args) +#define BTRFS_IOC_QGROUP_LIMIT _IOR(BTRFS_IOCTL_MAGIC, 43, \ + struct 
btrfs_ioctl_qgroup_limit_args) #endif -- 1.7.3.4 -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
From: Arne Jansen <sensille@gmx.net> When creating a subvolume or snapshot, it is necessary to initialize the qgroup account with a copy of some other (tracking) qgroup. This patch adds parameters to the ioctls to pass the information from which qgroup to inherit. Signed-off-by: Arne Jansen <sensille@gmx.net> --- fs/btrfs/ioctl.c | 59 ++++++++++++++++++++++++++++++++++------------- fs/btrfs/ioctl.h | 11 ++++++++- fs/btrfs/transaction.c | 8 ++++++ fs/btrfs/transaction.h | 1 + 4 files changed, 61 insertions(+), 18 deletions(-) diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c index 869c335..c46ebdc 100644 --- a/fs/btrfs/ioctl.c +++ b/fs/btrfs/ioctl.c @@ -334,7 +334,8 @@ static noinline int btrfs_ioctl_fitrim(struct file *file, void __user *arg) static noinline int create_subvol(struct btrfs_root *root, struct dentry *dentry, char *name, int namelen, - u64 *async_transid) + u64 *async_transid, + struct btrfs_qgroup_inherit **inherit) { struct btrfs_trans_handle *trans; struct btrfs_key key; @@ -366,6 +367,11 @@ static noinline int create_subvol(struct btrfs_root *root, if (IS_ERR(trans)) return PTR_ERR(trans); + ret = btrfs_qgroup_inherit(trans, root->fs_info, 0, objectid, + inherit ? 
*inherit : NULL); + if (ret) + goto fail; + leaf = btrfs_alloc_free_block(trans, root, root->leafsize, 0, objectid, NULL, 0, 0, 0); if (IS_ERR(leaf)) { @@ -482,7 +488,7 @@ fail: static int create_snapshot(struct btrfs_root *root, struct dentry *dentry, char *name, int namelen, u64 *async_transid, - bool readonly) + bool readonly, struct btrfs_qgroup_inherit **inherit) { struct inode *inode; struct btrfs_pending_snapshot *pending_snapshot; @@ -500,6 +506,10 @@ static int create_snapshot(struct btrfs_root *root, struct dentry *dentry, pending_snapshot->dentry = dentry; pending_snapshot->root = root; pending_snapshot->readonly = readonly; + if (inherit) { + pending_snapshot->inherit = *inherit; + *inherit = NULL; /* take responsibility to free it */ + } trans = btrfs_start_transaction(root->fs_info->extent_root, 5); if (IS_ERR(trans)) { @@ -633,7 +643,8 @@ static inline int btrfs_may_create(struct inode *dir, struct dentry *child) static noinline int btrfs_mksubvol(struct path *parent, char *name, int namelen, struct btrfs_root *snap_src, - u64 *async_transid, bool readonly) + u64 *async_transid, bool readonly, + struct btrfs_qgroup_inherit **inherit) { struct inode *dir = parent->dentry->d_inode; struct dentry *dentry; @@ -664,11 +675,11 @@ static noinline int btrfs_mksubvol(struct path *parent, goto out_up_read; if (snap_src) { - error = create_snapshot(snap_src, dentry, - name, namelen, async_transid, readonly); + error = create_snapshot(snap_src, dentry, name, namelen, + async_transid, readonly, inherit); } else { error = create_subvol(BTRFS_I(dir)->root, dentry, - name, namelen, async_transid); + name, namelen, async_transid, inherit); } if (!error) fsnotify_mkdir(dir, dentry); @@ -1367,11 +1378,9 @@ out: } static noinline int btrfs_ioctl_snap_create_transid(struct file *file, - char *name, - unsigned long fd, - int subvol, - u64 *transid, - bool readonly) + char *name, unsigned long fd, int subvol, + u64 *transid, bool readonly, + struct btrfs_qgroup_inherit 
**inherit) { struct btrfs_root *root = BTRFS_I(fdentry(file)->d_inode)->root; struct file *src_file; @@ -1395,7 +1404,7 @@ static noinline int btrfs_ioctl_snap_create_transid(struct file *file, if (subvol) { ret = btrfs_mksubvol(&file->f_path, name, namelen, - NULL, transid, readonly); + NULL, transid, readonly, inherit); } else { struct inode *src_inode; src_file = fget(fd); @@ -1414,7 +1423,7 @@ static noinline int btrfs_ioctl_snap_create_transid(struct file *file, } ret = btrfs_mksubvol(&file->f_path, name, namelen, BTRFS_I(src_inode)->root, - transid, readonly); + transid, readonly, inherit); fput(src_file); } out: @@ -1434,7 +1443,7 @@ static noinline int btrfs_ioctl_snap_create(struct file *file, ret = btrfs_ioctl_snap_create_transid(file, vol_args->name, vol_args->fd, subvol, - NULL, false); + NULL, false, NULL); kfree(vol_args); return ret; @@ -1448,6 +1457,7 @@ static noinline int btrfs_ioctl_snap_create_v2(struct file *file, u64 transid = 0; u64 *ptr = NULL; bool readonly = false; + struct btrfs_qgroup_inherit *inherit = NULL; vol_args = memdup_user(arg, sizeof(*vol_args)); if (IS_ERR(vol_args)) @@ -1455,7 +1465,8 @@ static noinline int btrfs_ioctl_snap_create_v2(struct file *file, vol_args->name[BTRFS_SUBVOL_NAME_MAX] = ''\0''; if (vol_args->flags & - ~(BTRFS_SUBVOL_CREATE_ASYNC | BTRFS_SUBVOL_RDONLY)) { + ~(BTRFS_SUBVOL_CREATE_ASYNC | BTRFS_SUBVOL_RDONLY | + BTRFS_SUBVOL_QGROUP_INHERIT)) { ret = -EOPNOTSUPP; goto out; } @@ -1464,10 +1475,21 @@ static noinline int btrfs_ioctl_snap_create_v2(struct file *file, ptr = &transid; if (vol_args->flags & BTRFS_SUBVOL_RDONLY) readonly = true; + if (vol_args->flags & BTRFS_SUBVOL_QGROUP_INHERIT) { + if (vol_args->size > PAGE_CACHE_SIZE) { + ret = -EINVAL; + goto out; + } + inherit = memdup_user(vol_args->qgroup_inherit, vol_args->size); + if (IS_ERR(inherit)) { + ret = PTR_ERR(inherit); + goto out; + } + } ret = btrfs_ioctl_snap_create_transid(file, vol_args->name, - vol_args->fd, subvol, - ptr, readonly); + 
vol_args->fd, subvol, ptr, + readonly, &inherit); if (ret == 0 && ptr && copy_to_user(arg + @@ -1476,6 +1498,7 @@ static noinline int btrfs_ioctl_snap_create_v2(struct file *file, ret = -EFAULT; out: kfree(vol_args); + kfree(inherit); return ret; } @@ -3540,6 +3563,8 @@ long btrfs_ioctl(struct file *file, unsigned int return btrfs_ioctl_snap_create_v2(file, argp, 0); case BTRFS_IOC_SUBVOL_CREATE: return btrfs_ioctl_snap_create(file, argp, 1); + case BTRFS_IOC_SUBVOL_CREATE_V2: + return btrfs_ioctl_snap_create_v2(file, argp, 1); case BTRFS_IOC_SNAP_DESTROY: return btrfs_ioctl_snap_destroy(file, argp); case BTRFS_IOC_SUBVOL_GETFLAGS: diff --git a/fs/btrfs/ioctl.h b/fs/btrfs/ioctl.h index f7002c8..2b54e7a 100644 --- a/fs/btrfs/ioctl.h +++ b/fs/btrfs/ioctl.h @@ -32,6 +32,7 @@ struct btrfs_ioctl_vol_args { #define BTRFS_SUBVOL_CREATE_ASYNC (1ULL << 0) #define BTRFS_SUBVOL_RDONLY (1ULL << 1) +#define BTRFS_SUBVOL_QGROUP_INHERIT (1ULL << 2) #define BTRFS_FSID_SIZE 16 #define BTRFS_UUID_SIZE 16 @@ -64,7 +65,13 @@ struct btrfs_ioctl_vol_args_v2 { __s64 fd; __u64 transid; __u64 flags; - __u64 unused[4]; + union { + struct { + __u64 size; + struct btrfs_qgroup_inherit __user *qgroup_inherit; + }; + __u64 unused[4]; + }; char name[BTRFS_SUBVOL_NAME_MAX + 1]; }; @@ -353,6 +360,8 @@ struct btrfs_ioctl_qgroup_create_args { #define BTRFS_IOC_WAIT_SYNC _IOW(BTRFS_IOCTL_MAGIC, 22, __u64) #define BTRFS_IOC_SNAP_CREATE_V2 _IOW(BTRFS_IOCTL_MAGIC, 23, \ struct btrfs_ioctl_vol_args_v2) +#define BTRFS_IOC_SUBVOL_CREATE_V2 _IOW(BTRFS_IOCTL_MAGIC, 24, \ + struct btrfs_ioctl_vol_args_v2) #define BTRFS_IOC_SUBVOL_GETFLAGS _IOW(BTRFS_IOCTL_MAGIC, 25, __u64) #define BTRFS_IOC_SUBVOL_SETFLAGS _IOW(BTRFS_IOCTL_MAGIC, 26, __u64) #define BTRFS_IOC_SCRUB _IOWR(BTRFS_IOCTL_MAGIC, 27, \ diff --git a/fs/btrfs/transaction.c b/fs/btrfs/transaction.c index 69b52bd..ce93f92 100644 --- a/fs/btrfs/transaction.c +++ b/fs/btrfs/transaction.c @@ -969,6 +969,14 @@ static noinline int 
create_pending_snapshot(struct btrfs_trans_handle *trans, } } + ret = btrfs_qgroup_inherit(trans, fs_info, root->root_key.objectid, + objectid, pending->inherit); + kfree(pending->inherit); + if (ret) { + pending->error = ret; + goto fail; + } + key.objectid = objectid; key.offset = (u64)-1; key.type = BTRFS_ROOT_ITEM_KEY; diff --git a/fs/btrfs/transaction.h b/fs/btrfs/transaction.h index addfa2d..37c3d9d 100644 --- a/fs/btrfs/transaction.h +++ b/fs/btrfs/transaction.h @@ -70,6 +70,7 @@ struct btrfs_pending_snapshot { struct dentry *dentry; struct btrfs_root *root; struct btrfs_root *snap; + struct btrfs_qgroup_inherit *inherit; /* block reservation for the operation */ struct btrfs_block_rsv block_rsv; /* extra metadata reseration for relocation */ -- 1.7.3.4 -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Tsutomu Itoh
2012-May-20 23:44 UTC
Re: [PATCH 07/24] Btrfs: add tree modification log functions
Hi Jan, (2012/05/21 1:06), Jan Schmidt wrote:> The tree mod log will log modifications made fs-tree nodes. Most > modifications are done by autobalance of the tree. Such changes are recorded > as long as a block entry exists. When released, the log is cleaned. > > With the tree modification log, it''s possible to reconstruct a consistent > old state of the tree. This is required to do backref walking on a busy > file system. > > Signed-off-by: Jan Schmidt<list.btrfs@jan-o-sch.net> > --- > fs/btrfs/ctree.c | 409 ++++++++++++++++++++++++++++++++++++++++++++++++++++ > fs/btrfs/ctree.h | 7 +- > fs/btrfs/disk-io.c | 2 +- > 3 files changed, 416 insertions(+), 2 deletions(-) > > diff --git a/fs/btrfs/ctree.c b/fs/btrfs/ctree.c > index 56485b3..6420638 100644 > --- a/fs/btrfs/ctree.c > +++ b/fs/btrfs/ctree.c > @@ -18,6 +18,7 @@ > > #include<linux/sched.h> > #include<linux/slab.h> > +#include<linux/rbtree.h> > #include "ctree.h" > #include "disk-io.h" > #include "transaction.h" > @@ -288,6 +289,414 @@ int btrfs_copy_root(struct btrfs_trans_handle *trans, > return 0; > } > > +enum mod_log_op { > + MOD_LOG_KEY_REPLACE, > + MOD_LOG_KEY_ADD, > + MOD_LOG_KEY_REMOVE, > + MOD_LOG_KEY_REMOVE_WHILE_FREEING, > + MOD_LOG_MOVE_KEYS, > + MOD_LOG_ROOT_REPLACE, > +}; > + > +struct tree_mod_move { > + int dst_slot; > + int nr_items; > +}; > + > +struct tree_mod_root { > + u64 logical; > + u8 level; > +}; > + > +struct tree_mod_elem { > + struct rb_node node; > + u64 index; /* shifted logical */ > + struct seq_list elem; > + enum mod_log_op op; > + > + /* this is used for MOD_LOG_KEY_* and MOD_LOG_MOVE_KEYS operations */ > + int slot; > + > + /* this is used for MOD_LOG_KEY* and MOD_LOG_ROOT_REPLACE */ > + u64 generation; > + > + /* those are used for op == MOD_LOG_KEY_{REPLACE,REMOVE} */ > + struct btrfs_disk_key key; > + u64 blockptr; > + > + /* this is used for op == MOD_LOG_MOVE_KEYS */ > + struct tree_mod_move move; > + > + /* this is used for op == MOD_LOG_ROOT_REPLACE */ > + struct 
tree_mod_root old_root; > +}; > + > +static inline void > +__get_tree_mod_seq(struct btrfs_fs_info *fs_info, struct seq_list *elem) > +{ > + elem->seq = atomic_inc_return(&fs_info->tree_mod_seq); > + list_add_tail(&elem->list,&fs_info->tree_mod_seq_list); > +} > + > +void btrfs_get_tree_mod_seq(struct btrfs_fs_info *fs_info, > + struct seq_list *elem) > +{ > + elem->flags = 1; > + spin_lock(&fs_info->tree_mod_seq_lock); > + __get_tree_mod_seq(fs_info, elem); > + spin_unlock(&fs_info->tree_mod_seq_lock); > +} > + > +void btrfs_put_tree_mod_seq(struct btrfs_fs_info *fs_info, > + struct seq_list *elem) > +{ > + struct rb_root *tm_root; > + struct rb_node *node; > + struct rb_node *next; > + struct seq_list *cur_elem; > + struct tree_mod_elem *tm; > + u64 min_seq = (u64)-1; > + u64 seq_putting = elem->seq; > + > + if (!seq_putting) > + return; > + > + BUG_ON(!(elem->flags& 1)); > + spin_lock(&fs_info->tree_mod_seq_lock); > + list_del(&elem->list); > + > + list_for_each_entry(cur_elem,&fs_info->tree_mod_seq_list, list) { > + if ((cur_elem->flags& 1)&& cur_elem->seq< min_seq) { > + if (seq_putting> cur_elem->seq) { > + /* > + * blocker with lower sequence number exists, we > + * cannot remove anything from the log > + */ > + goto out; > + } > + min_seq = cur_elem->seq; > + } > + } > + > + /* > + * anything that''s lower than the lowest existing (read: blocked) > + * sequence number can be removed from the tree. 
> + */ > + write_lock(&fs_info->tree_mod_log_lock); > + tm_root =&fs_info->tree_mod_log; > + for (node = rb_first(tm_root); node; node = next) { > + next = rb_next(node); > + tm = container_of(node, struct tree_mod_elem, node); > + if (tm->elem.seq> min_seq) > + continue; > + rb_erase(node, tm_root); > + list_del(&tm->elem.list); > + kfree(tm); > + } > + write_unlock(&fs_info->tree_mod_log_lock); > +out: > + spin_unlock(&fs_info->tree_mod_seq_lock); > +} > + > +/* > + * key order of the log: > + * index -> sequence > + * > + * the index is the shifted logical of the *new* root node for root replace > + * operations, or the shifted logical of the affected block for all other > + * operations. > + */ > +static noinline int > +__tree_mod_log_insert(struct btrfs_fs_info *fs_info, struct tree_mod_elem *tm) > +{ > + struct rb_root *tm_root; > + struct rb_node **new; > + struct rb_node *parent = NULL; > + struct tree_mod_elem *cur; > + > + BUG_ON(!tm || !tm->elem.seq); > + > + write_lock(&fs_info->tree_mod_log_lock); > + tm_root =&fs_info->tree_mod_log; > + new =&tm_root->rb_node; > + while (*new) { > + cur = container_of(*new, struct tree_mod_elem, node); > + parent = *new; > + if (cur->index< tm->index) > + new =&((*new)->rb_left); > + else if (cur->index> tm->index) > + new =&((*new)->rb_right); > + else if (cur->elem.seq< tm->elem.seq) > + new =&((*new)->rb_left); > + else if (cur->elem.seq> tm->elem.seq) > + new =&((*new)->rb_right); > + else { > + kfree(tm); > + return -EEXIST;I think that write_unlock() is necessary for here.> + } > + } > + > + rb_link_node(&tm->node, parent, new); > + rb_insert_color(&tm->node, tm_root); > + write_unlock(&fs_info->tree_mod_log_lock); > + > + return 0; > +} > + > +int tree_mod_alloc(struct btrfs_fs_info *fs_info, gfp_t flags, > + struct tree_mod_elem **tm_ret) > +{ > + struct tree_mod_elem *tm; > + u64 seq = 0; > + > + /* > + * we want to avoid a malloc/free cycle if there''s no blocker in the > + * list. 
> + * we also want to avoid atomic malloc. so we must drop the spinlock > + * before calling kzalloc and recheck afterwards. > + */ > + spin_lock(&fs_info->tree_mod_seq_lock); > + if (list_empty(&fs_info->tree_mod_seq_list)) > + goto out; > + > + spin_unlock(&fs_info->tree_mod_seq_lock); > + tm = *tm_ret = kzalloc(sizeof(*tm), flags); > + if (!tm) > + return -ENOMEM; > + > + spin_lock(&fs_info->tree_mod_seq_lock); > + if (list_empty(&fs_info->tree_mod_seq_list)) { > + kfree(tm); > + goto out; > + } > + > + __get_tree_mod_seq(fs_info,&tm->elem); > + seq = tm->elem.seq; > + tm->elem.flags = 0; > + > +out: > + spin_unlock(&fs_info->tree_mod_seq_lock); > + return seq; > +} > + > +static noinline int > +tree_mod_log_insert_key_mask(struct btrfs_fs_info *fs_info, > + struct extent_buffer *eb, int slot, > + enum mod_log_op op, gfp_t flags) > +{ > + struct tree_mod_elem *tm; > + int ret; > + > + ret = tree_mod_alloc(fs_info, flags,&tm); > + if (ret<= 0) > + return ret; > + > + tm->index = eb->start>> PAGE_CACHE_SHIFT; > + if (op != MOD_LOG_KEY_ADD) { > + btrfs_node_key(eb,&tm->key, slot); > + tm->blockptr = btrfs_node_blockptr(eb, slot); > + } > + tm->op = op; > + tm->slot = slot; > + tm->generation = btrfs_node_ptr_generation(eb, slot); > + > + return __tree_mod_log_insert(fs_info, tm); > +} > + > +static noinline int > +tree_mod_log_insert_key(struct btrfs_fs_info *fs_info, struct extent_buffer *eb, > + int slot, enum mod_log_op op) > +{ > + return tree_mod_log_insert_key_mask(fs_info, eb, slot, op, GFP_NOFS); > +} > + > +static noinline int > +tree_mod_log_insert_move(struct btrfs_fs_info *fs_info, > + struct extent_buffer *eb, int dst_slot, int src_slot, > + int nr_items, gfp_t flags) > +{ > + struct tree_mod_elem *tm; > + int ret; > + > + ret = tree_mod_alloc(fs_info, flags,&tm); > + if (ret<= 0) > + return ret; > + > + tm->index = eb->start>> PAGE_CACHE_SHIFT; > + tm->slot = src_slot; > + tm->move.dst_slot = dst_slot; > + tm->move.nr_items = nr_items; > + tm->op = 
MOD_LOG_MOVE_KEYS; > + > + return __tree_mod_log_insert(fs_info, tm); > +} > + > +static noinline int > +tree_mod_log_insert_root(struct btrfs_fs_info *fs_info, > + struct extent_buffer *old_root, > + struct extent_buffer *new_root, gfp_t flags) > +{ > + struct tree_mod_elem *tm; > + int ret; > + > + ret = tree_mod_alloc(fs_info, flags,&tm); > + if (ret<= 0) > + return ret; > + > + tm->index = new_root->start>> PAGE_CACHE_SHIFT; > + tm->old_root.logical = old_root->start; > + tm->old_root.level = btrfs_header_level(old_root); > + tm->generation = btrfs_header_generation(old_root); > + tm->op = MOD_LOG_ROOT_REPLACE; > + > + return __tree_mod_log_insert(fs_info, tm); > +} > + > +static struct tree_mod_elem * > +__tree_mod_log_search(struct btrfs_fs_info *fs_info, u64 start, u64 min_seq, > + int smallest) > +{ > + struct rb_root *tm_root; > + struct rb_node *node; > + struct tree_mod_elem *cur = NULL; > + struct tree_mod_elem *found = NULL; > + u64 index = start>> PAGE_CACHE_SHIFT; > + > + read_lock(&fs_info->tree_mod_log_lock); > + tm_root =&fs_info->tree_mod_log; > + node = tm_root->rb_node; > + while (node) { > + cur = container_of(node, struct tree_mod_elem, node); > + if (cur->index< index) { > + node = node->rb_left; > + } else if (cur->index> index) { > + node = node->rb_right; > + } else if (cur->elem.seq< min_seq) { > + node = node->rb_left; > + } else if (!smallest) { > + /* we want the node with the highest seq */ > + if (found) > + BUG_ON(found->elem.seq> cur->elem.seq); > + found = cur; > + node = node->rb_left; > + } else if (cur->elem.seq> min_seq) { > + /* we want the node with the smallest seq */ > + if (found) > + BUG_ON(found->elem.seq< cur->elem.seq); > + found = cur; > + node = node->rb_right; > + } else {I think read_unlock() is necessary for here. 
Thanks, Tsutomu> + return cur; > + } > + } > + read_unlock(&fs_info->tree_mod_log_lock); > + > + return found; > +} > + > +/* > + * this returns the element from the log with the smallest time sequence > + * value that''s in the log (the oldest log item). any element with a time > + * sequence lower than min_seq will be ignored. > + */ > +static struct tree_mod_elem * > +tree_mod_log_search_oldest(struct btrfs_fs_info *fs_info, u64 start, > + u64 min_seq) > +{ > + return __tree_mod_log_search(fs_info, start, min_seq, 1); > +} > + > +/* > + * this returns the element from the log with the largest time sequence > + * value that''s in the log (the most recent log item). any element with > + * a time sequence lower than min_seq will be ignored. > + */ > +static struct tree_mod_elem * > +tree_mod_log_search(struct btrfs_fs_info *fs_info, u64 start, u64 min_seq) > +{ > + return __tree_mod_log_search(fs_info, start, min_seq, 0); > +} > + > +static inline void > +__copy_extent_buffer_log(struct btrfs_fs_info *fs_info, > + struct extent_buffer *dst, struct extent_buffer *src, > + unsigned long dst_offset, unsigned long src_offset, > + int nr_items, size_t item_size) > +{ > + int ret; > + int i; > + > + /* speed this up by single seq for all operations? 
*/ > + for (i = 0; i< nr_items; i++) { > + ret = tree_mod_log_insert_key(fs_info, src, i + src_offset, > + MOD_LOG_KEY_REMOVE); > + BUG_ON(ret< 0); > + ret = tree_mod_log_insert_key(fs_info, dst, i + dst_offset, > + MOD_LOG_KEY_ADD); > + BUG_ON(ret< 0); > + } > + > + copy_extent_buffer(dst, src, btrfs_node_key_ptr_offset(dst_offset), > + btrfs_node_key_ptr_offset(src_offset), > + nr_items * item_size); > +} > + > +static inline void > +__memmove_extent_buffer_log(struct btrfs_fs_info *fs_info, > + struct extent_buffer *dst, > + int dst_offset, int src_offset, int nr_items, > + size_t item_size, int tree_mod_log) > +{ > + int ret; > + if (tree_mod_log) { > + ret = tree_mod_log_insert_move(fs_info, dst, dst_offset, > + src_offset, nr_items, GFP_NOFS); > + BUG_ON(ret< 0); > + } > + memmove_extent_buffer(dst, btrfs_node_key_ptr_offset(dst_offset), > + btrfs_node_key_ptr_offset(src_offset), > + nr_items * item_size); > +} > + > +static inline void > +__set_node_key_log(struct btrfs_fs_info *fs_info, struct extent_buffer *eb, > + struct btrfs_disk_key *disk_key, int nr, int atomic) > +{ > + int ret; > + > + ret = tree_mod_log_insert_key_mask(fs_info, eb, nr, MOD_LOG_KEY_REPLACE, > + atomic ? 
GFP_ATOMIC : GFP_NOFS); > + BUG_ON(ret< 0); > + > + btrfs_set_node_key(eb, disk_key, nr); > +} > + > +static void __log_cleaning(struct btrfs_fs_info *fs_info, > + struct extent_buffer *eb) > +{ > + int i; > + int ret; > + u32 nritems; > + > + nritems = btrfs_header_nritems(eb); > + for (i = nritems - 1; i>= 0; i--) { > + ret = tree_mod_log_insert_key(fs_info, eb, i, > + MOD_LOG_KEY_REMOVE_WHILE_FREEING); > + BUG_ON(ret< 0); > + } > +} > + > +static inline void > +set_root_pointer(struct btrfs_root *root, struct extent_buffer *new_root_node) > +{ > + int ret; > + __log_cleaning(root->fs_info, root->node); > + ret = tree_mod_log_insert_root(root->fs_info, root->node, > + new_root_node, GFP_NOFS); > + BUG_ON(ret< 0); > + rcu_assign_pointer(root->node, new_root_node); > +} > + > /* > * check if the tree block can be shared by multiple trees > */ > diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h > index 6774821..e53bfb9 100644 > --- a/fs/btrfs/ctree.h > +++ b/fs/btrfs/ctree.h > @@ -1132,7 +1132,7 @@ struct btrfs_fs_info { > /* this protects tree_mod_seq_list */ > spinlock_t tree_mod_seq_lock; > atomic_t tree_mod_seq; > - struct list_head tree_mod_list; > + struct list_head tree_mod_seq_list; > > /* this protects tree_mod_log */ > rwlock_t tree_mod_log_lock; > @@ -3114,4 +3114,9 @@ struct seq_list { > u32 flags; > }; > > +void btrfs_get_tree_mod_seq(struct btrfs_fs_info *fs_info, > + struct seq_list *elem); > +void btrfs_put_tree_mod_seq(struct btrfs_fs_info *fs_info, > + struct seq_list *elem); > + > #endif > diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c > index 6aec7c6..f51ad84 100644 > --- a/fs/btrfs/disk-io.c > +++ b/fs/btrfs/disk-io.c > @@ -1921,7 +1921,7 @@ int open_ctree(struct super_block *sb, > init_completion(&fs_info->kobj_unregister); > INIT_LIST_HEAD(&fs_info->dirty_cowonly_roots); > INIT_LIST_HEAD(&fs_info->space_info); > - INIT_LIST_HEAD(&fs_info->tree_mod_list); > + INIT_LIST_HEAD(&fs_info->tree_mod_seq_list); > 
btrfs_mapping_init(&fs_info->mapping_tree); > btrfs_init_block_rsv(&fs_info->global_block_rsv); > btrfs_init_block_rsv(&fs_info->delalloc_block_rsv);-- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Tsutomu Itoh
2012-May-21 00:42 UTC
Re: [PATCH 19/24] Btrfs: qgroup implementation and prototypes
Hi Jan,

(2012/05/21 1:06), Jan Schmidt wrote:
> From: Arne Jansen <sensille@gmx.net>
>
> Signed-off-by: Arne Jansen <sensille@gmx.net>
> Signed-off-by: Jan Schmidt <list.btrfs@jan-o-sch.net>
> ---
>  fs/btrfs/Makefile |    2 +-
>  fs/btrfs/ctree.h  |   33 ++
>  fs/btrfs/ioctl.h  |   24 +
>  fs/btrfs/qgroup.c | 1531 +++++++++++++++++++++++++++++++++++++++++++++++++++++
>  4 files changed, 1589 insertions(+), 1 deletions(-)
>  create mode 100644 fs/btrfs/qgroup.c
>
> diff --git a/fs/btrfs/Makefile b/fs/btrfs/Makefile
> index 0c4fa2b..0bc4d3a 100644
> --- a/fs/btrfs/Makefile
> +++ b/fs/btrfs/Makefile
> @@ -8,7 +8,7 @@ btrfs-y += super.o ctree.o extent-tree.o print-tree.o root-tree.o dir-item.o \
>  	   extent_io.o volumes.o async-thread.o ioctl.o locking.o orphan.o \
>  	   export.o tree-log.o free-space-cache.o zlib.o lzo.o \
>  	   compression.o delayed-ref.o relocation.o delayed-inode.o scrub.o \
> -	   reada.o backref.o ulist.o
> +	   reada.o backref.o ulist.o qgroup.o
>  
>  btrfs-$(CONFIG_BTRFS_FS_POSIX_ACL) += acl.o
>  btrfs-$(CONFIG_BTRFS_FS_CHECK_INTEGRITY) += check-integrity.o
> diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
> index 2b6f003..0630412 100644
> --- a/fs/btrfs/ctree.h
> +++ b/fs/btrfs/ctree.h
> @@ -3284,6 +3284,39 @@ void btrfs_get_tree_mod_seq(struct btrfs_fs_info *fs_info,
>  void btrfs_put_tree_mod_seq(struct btrfs_fs_info *fs_info,
>  			    struct seq_list *elem);
>  
> +/* qgroup.c */
> +int btrfs_quota_enable(struct btrfs_trans_handle *trans,
> +		       struct btrfs_fs_info *fs_info);
> +int btrfs_quota_disable(struct btrfs_trans_handle *trans,
> +			struct btrfs_fs_info *fs_info);
> +int btrfs_quota_rescan(struct btrfs_fs_info *fs_info);
> +int btrfs_add_qgroup_relation(struct btrfs_trans_handle *trans,
> +			      struct btrfs_fs_info *fs_info, u64 src, u64 dst);
> +int btrfs_del_qgroup_relation(struct btrfs_trans_handle *trans,
> +			      struct btrfs_fs_info *fs_info, u64 src, u64 dst);
> +int btrfs_create_qgroup(struct btrfs_trans_handle *trans,
> +			struct btrfs_fs_info *fs_info, u64 qgroupid,
> +			char *name);
> +int btrfs_remove_qgroup(struct btrfs_trans_handle *trans,
> +			struct btrfs_fs_info *fs_info, u64 qgroupid);
> +int btrfs_limit_qgroup(struct btrfs_trans_handle *trans,
> +		       struct btrfs_fs_info *fs_info, u64 qgroupid,
> +		       struct btrfs_qgroup_limit *limit);
> +int btrfs_read_qgroup_config(struct btrfs_fs_info *fs_info);
> +void btrfs_free_qgroup_config(struct btrfs_fs_info *fs_info);
> +struct btrfs_delayed_extent_op;
> +int btrfs_qgroup_record_ref(struct btrfs_trans_handle *trans,
> +			    struct btrfs_fs_info *fs_info,
> +			    struct btrfs_delayed_ref_node *node,
> +			    struct btrfs_delayed_extent_op *extent_op);
> +int btrfs_run_qgroups(struct btrfs_trans_handle *trans,
> +		      struct btrfs_fs_info *fs_info);
> +int btrfs_qgroup_inherit(struct btrfs_trans_handle *trans,
> +			 struct btrfs_fs_info *fs_info, u64 srcid, u64 objectid,
> +			 struct btrfs_qgroup_inherit *inherit);
> +int btrfs_qgroup_reserve(struct btrfs_root *root, u64 num_bytes);
> +void btrfs_qgroup_free(struct btrfs_root *root, u64 num_bytes);
> +
>  static inline int is_fstree(u64 rootid)
>  {
>  	if (rootid == BTRFS_FS_TREE_OBJECTID ||
> diff --git a/fs/btrfs/ioctl.h b/fs/btrfs/ioctl.h
> index 086e6bd..44c34a5 100644
> --- a/fs/btrfs/ioctl.h
> +++ b/fs/btrfs/ioctl.h
> @@ -35,6 +35,30 @@ struct btrfs_ioctl_vol_args {
>  #define BTRFS_FSID_SIZE 16
>  #define BTRFS_UUID_SIZE 16
>  
> +#define BTRFS_QGROUP_INHERIT_SET_LIMITS	(1ULL << 0)
> +
> +struct btrfs_qgroup_limit {
> +	__u64	flags;
> +	__u64	max_rfer;
> +	__u64	max_excl;
> +	__u64	rsv_rfer;
> +	__u64	rsv_excl;
> +};
> +
> +struct btrfs_qgroup_inherit {
> +	__u64	flags;
> +	__u64	num_qgroups;
> +	__u64	num_ref_copies;
> +	__u64	num_excl_copies;
> +	struct btrfs_qgroup_limit lim;
> +	__u64	qgroups[0];
> +};
> +
> +struct btrfs_ioctl_qgroup_limit_args {
> +	__u64	qgroupid;
> +	struct btrfs_qgroup_limit lim;
> +};
> +
>  #define BTRFS_SUBVOL_NAME_MAX 4039
>  struct btrfs_ioctl_vol_args_v2 {
>  	__s64 fd;
> diff --git a/fs/btrfs/qgroup.c b/fs/btrfs/qgroup.c
> new file mode 100644
> index 0000000..678fe45
> --- /dev/null
> +++ b/fs/btrfs/qgroup.c
> @@ -0,0 +1,1531 @@
> +/*
> + * Copyright (C) 2011 STRATO.  All rights reserved.
> + *
> + * This program is free software; you can redistribute it and/or
> + * modify it under the terms of the GNU General Public
> + * License v2 as published by the Free Software Foundation.
> + *
> + * This program is distributed in the hope that it will be useful,
> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> + * General Public License for more details.
> + *
> + * You should have received a copy of the GNU General Public
> + * License along with this program; if not, write to the
> + * Free Software Foundation, Inc., 59 Temple Place - Suite 330,
> + * Boston, MA 021110-1307, USA.
> + */
> +
> +#include <linux/sched.h>
> +#include <linux/pagemap.h>
> +#include <linux/writeback.h>
> +#include <linux/blkdev.h>
> +#include <linux/rbtree.h>
> +#include <linux/slab.h>
> +#include <linux/workqueue.h>
> +
> +#include "ctree.h"
> +#include "transaction.h"
> +#include "disk-io.h"
> +#include "locking.h"
> +#include "ulist.h"
> +#include "ioctl.h"
> +#include "backref.h"
> +
> +/* TODO XXX FIXME
> + *  - subvol delete -> delete when ref goes to 0? delete limits also?
> + *  - reorganize keys
> + *  - compressed
> + *  - sync
> + *  - rescan
> + *  - copy also limits on subvol creation
> + *  - limit
> + *  - caches for ulists
> + *  - performance benchmarks
> + *  - check all ioctl parameters
> + */
> +
> +/*
> + * one struct for each qgroup, organized in fs_info->qgroup_tree.
> + */
> +struct btrfs_qgroup {
> +	u64 qgroupid;
> +
> +	/*
> +	 * state
> +	 */
> +	u64 rfer;	/* referenced */
> +	u64 rfer_cmpr;	/* referenced compressed */
> +	u64 excl;	/* exclusive */
> +	u64 excl_cmpr;	/* exclusive compressed */
> +
> +	/*
> +	 * limits
> +	 */
> +	u64 lim_flags;	/* which limits are set */
> +	u64 max_rfer;
> +	u64 max_excl;
> +	u64 rsv_rfer;
> +	u64 rsv_excl;
> +
> +	/*
> +	 * reservation tracking
> +	 */
> +	u64 reserved;
> +
> +	/*
> +	 * lists
> +	 */
> +	struct list_head groups;  /* groups this group is member of */
> +	struct list_head members; /* groups that are members of this group */
> +	struct list_head dirty;   /* dirty groups */
> +	struct rb_node node;	  /* tree of qgroups */
> +
> +	/*
> +	 * temp variables for accounting operations
> +	 */
> +	u64 tag;
> +	u64 refcnt;
> +};
> +
> +/*
> + * glue structure to represent the relations between qgroups.
> + */
> +struct btrfs_qgroup_list {
> +	struct list_head next_group;
> +	struct list_head next_member;
> +	struct btrfs_qgroup *group;
> +	struct btrfs_qgroup *member;
> +};
> +
> +/* must be called with qgroup_lock held */
> +static struct btrfs_qgroup *find_qgroup_rb(struct btrfs_fs_info *fs_info,
> +					   u64 qgroupid)
> +{
> +	struct rb_node *n = fs_info->qgroup_tree.rb_node;
> +	struct btrfs_qgroup *qgroup;
> +
> +	while (n) {
> +		qgroup = rb_entry(n, struct btrfs_qgroup, node);
> +		if (qgroup->qgroupid < qgroupid)
> +			n = n->rb_left;
> +		else if (qgroup->qgroupid > qgroupid)
> +			n = n->rb_right;
> +		else
> +			return qgroup;
> +	}
> +	return NULL;
> +}
> +
> +/* must be called with qgroup_lock held */
> +static struct btrfs_qgroup *add_qgroup_rb(struct btrfs_fs_info *fs_info,
> +					  u64 qgroupid)
> +{
> +	struct rb_node **p = &fs_info->qgroup_tree.rb_node;
> +	struct rb_node *parent = NULL;
> +	struct btrfs_qgroup *qgroup;
> +
> +	while (*p) {
> +		parent = *p;
> +		qgroup = rb_entry(parent, struct btrfs_qgroup, node);
> +
> +		if (qgroup->qgroupid < qgroupid)
> +			p = &(*p)->rb_left;
> +		else if (qgroup->qgroupid > qgroupid)
> +			p = &(*p)->rb_right;
> +		else
> +			return qgroup;
> +	}
> +
> +	qgroup = kzalloc(sizeof(*qgroup), GFP_ATOMIC);
> +	if (!qgroup)
> +		return ERR_PTR(-ENOMEM);
> +
> +	qgroup->qgroupid = qgroupid;
> +	INIT_LIST_HEAD(&qgroup->groups);
> +	INIT_LIST_HEAD(&qgroup->members);
> +	INIT_LIST_HEAD(&qgroup->dirty);
> +
> +	rb_link_node(&qgroup->node, parent, p);
> +	rb_insert_color(&qgroup->node, &fs_info->qgroup_tree);
> +
> +	return qgroup;
> +}
> +
> +/* must be called with qgroup_lock held */
> +static int del_qgroup_rb(struct btrfs_fs_info *fs_info, u64 qgroupid)
> +{
> +	struct btrfs_qgroup *qgroup = find_qgroup_rb(fs_info, qgroupid);
> +	struct btrfs_qgroup_list *list;
> +
> +	if (!qgroup)
> +		return -ENOENT;
> +
> +	rb_erase(&qgroup->node, &fs_info->qgroup_tree);
> +	list_del(&qgroup->dirty);
> +
> +	while (!list_empty(&qgroup->groups)) {
> +		list = list_first_entry(&qgroup->groups,
> +					struct btrfs_qgroup_list, next_group);
> +		list_del(&list->next_group);
> +		list_del(&list->next_member);
> +		kfree(list);
> +	}
> +
> +	while (!list_empty(&qgroup->members)) {
> +		list = list_first_entry(&qgroup->members,
> +					struct btrfs_qgroup_list, next_member);
> +		list_del(&list->next_group);
> +		list_del(&list->next_member);
> +		kfree(list);
> +	}
> +	kfree(qgroup);
> +
> +	return 0;
> +}
> +
> +/* must be called with qgroup_lock held */
> +static int add_relation_rb(struct btrfs_fs_info *fs_info,
> +			   u64 memberid, u64 parentid)
> +{
> +	struct btrfs_qgroup *member;
> +	struct btrfs_qgroup *parent;
> +	struct btrfs_qgroup_list *list;
> +
> +	member = find_qgroup_rb(fs_info, memberid);
> +	parent = find_qgroup_rb(fs_info, parentid);
> +	if (!member || !parent)
> +		return -ENOENT;
> +
> +	list = kzalloc(sizeof(*list), GFP_ATOMIC);
> +	if (!list)
> +		return -ENOMEM;
> +
> +	list->group = parent;
> +	list->member = member;
> +	list_add_tail(&list->next_group, &member->groups);
> +	list_add_tail(&list->next_member, &parent->members);
> +
> +	return 0;
> +}
> +
> +/* must be called with qgroup_lock held */
> +static int del_relation_rb(struct btrfs_fs_info *fs_info,
> +			   u64 memberid, u64 parentid)
> +{
> +	struct btrfs_qgroup *member;
> +	struct btrfs_qgroup *parent;
> +	struct btrfs_qgroup_list *list;
> +
> +	member = find_qgroup_rb(fs_info, memberid);
> +	parent = find_qgroup_rb(fs_info, parentid);
> +	if (!member || !parent)
> +		return -ENOENT;
> +
> +	list_for_each_entry(list, &member->groups, next_group) {
> +		if (list->group == parent) {
> +			list_del(&list->next_group);
> +			list_del(&list->next_member);
> +			kfree(list);
> +			return 0;
> +		}
> +	}
> +	return -ENOENT;
> +}
> +
> +/*
> + * The full config is read in one go, only called from open_ctree()
> + * It doesn't use any locking, as at this point we're still single-threaded
> + */
> +int btrfs_read_qgroup_config(struct btrfs_fs_info *fs_info)
> +{
> +	struct btrfs_key key;
> +	struct btrfs_key found_key;
> +	struct btrfs_root *quota_root = fs_info->quota_root;
> +	struct btrfs_path *path = NULL;
> +	struct extent_buffer *l;
> +	int slot;
> +	int ret = 0;
> +	u64 flags = 0;
> +
> +	if (!fs_info->quota_enabled)
> +		return 0;
> +
> +	path = btrfs_alloc_path();
> +	if (!path) {
> +		ret = -ENOMEM;
> +		goto out;
> +	}
> +
> +	/* default this to quota off, in case no status key is found */
> +	fs_info->qgroup_flags = 0;
> +
> +	/*
> +	 * pass 1: read status, all qgroup infos and limits
> +	 */
> +	key.objectid = 0;
> +	key.type = 0;
> +	key.offset = 0;
> +	ret = btrfs_search_slot_for_read(quota_root, &key, path, 1, 1);
> +	if (ret)
> +		goto out;
> +
> +	while (1) {
> +		struct btrfs_qgroup *qgroup;
> +
> +		slot = path->slots[0];
> +		l = path->nodes[0];
> +		btrfs_item_key_to_cpu(l, &found_key, slot);
> +
> +		if (found_key.type == BTRFS_QGROUP_STATUS_KEY) {
> +			struct btrfs_qgroup_status_item *ptr;
> +
> +			ptr = btrfs_item_ptr(l, slot,
> +					     struct btrfs_qgroup_status_item);
> +
> +			if (btrfs_qgroup_status_version(l, ptr) !=
> +			    BTRFS_QGROUP_STATUS_VERSION) {
> +				printk(KERN_ERR
> +				 "btrfs: old qgroup version, quota disabled\n");
> +				goto out;
> +			}
> +			if (btrfs_qgroup_status_generation(l, ptr) !=
> +			    fs_info->generation) {
> +				flags |= BTRFS_QGROUP_STATUS_FLAG_INCONSISTENT;
> +				printk(KERN_ERR
> +					"btrfs: qgroup generation mismatch, "
> +					"marked as inconsistent\n");
> +			}
> +			fs_info->qgroup_flags = btrfs_qgroup_status_flags(l,
> +									  ptr);
> +			/* FIXME read scan element */
> +			goto next1;
> +		}
> +
> +		if (found_key.type != BTRFS_QGROUP_INFO_KEY &&
> +		    found_key.type != BTRFS_QGROUP_LIMIT_KEY)
> +			goto next1;
> +
> +		qgroup = find_qgroup_rb(fs_info, found_key.offset);
> +		if ((qgroup && found_key.type == BTRFS_QGROUP_INFO_KEY) ||
> +		    (!qgroup && found_key.type == BTRFS_QGROUP_LIMIT_KEY)) {
> +			printk(KERN_ERR "btrfs: inconsistent qgroup config\n");
> +			flags |= BTRFS_QGROUP_STATUS_FLAG_INCONSISTENT;
> +		}
> +		if (!qgroup) {
> +			qgroup = add_qgroup_rb(fs_info, found_key.offset);
> +			if (IS_ERR(qgroup)) {
> +				ret = PTR_ERR(qgroup);
> +				goto out;
> +			}
> +		}
> +		switch (found_key.type) {
> +		case BTRFS_QGROUP_INFO_KEY: {
> +			struct btrfs_qgroup_info_item *ptr;
> +
> +			ptr = btrfs_item_ptr(l, slot,
> +					     struct btrfs_qgroup_info_item);
> +			qgroup->rfer = btrfs_qgroup_info_rfer(l, ptr);
> +			qgroup->rfer_cmpr = btrfs_qgroup_info_rfer_cmpr(l, ptr);
> +			qgroup->excl = btrfs_qgroup_info_excl(l, ptr);
> +			qgroup->excl_cmpr = btrfs_qgroup_info_excl_cmpr(l, ptr);
> +			/* generation currently unused */
> +			break;
> +		}
> +		case BTRFS_QGROUP_LIMIT_KEY: {
> +			struct btrfs_qgroup_limit_item *ptr;
> +
> +			ptr = btrfs_item_ptr(l, slot,
> +					     struct btrfs_qgroup_limit_item);
> +			qgroup->lim_flags = btrfs_qgroup_limit_flags(l, ptr);
> +			qgroup->max_rfer = btrfs_qgroup_limit_max_rfer(l, ptr);
> +			qgroup->max_excl = btrfs_qgroup_limit_max_excl(l, ptr);
> +			qgroup->rsv_rfer = btrfs_qgroup_limit_rsv_rfer(l, ptr);
> +			qgroup->rsv_excl = btrfs_qgroup_limit_rsv_excl(l, ptr);
> +			break;
> +		}
> +		}
> +next1:
> +		ret = btrfs_next_item(quota_root, path);
> +		if (ret < 0)
> +			goto out;
> +		if (ret)
> +			break;
> +	}
> +	btrfs_release_path(path);
> +
> +	/*
> +	 * pass 2: read all qgroup relations
> +	 */
> +	key.objectid = 0;
> +	key.type = BTRFS_QGROUP_RELATION_KEY;
> +	key.offset = 0;
> +	ret = btrfs_search_slot_for_read(quota_root, &key, path, 1, 0);
> +	if (ret)
> +		goto out;
> +	while (1) {
> +		slot = path->slots[0];
> +		l = path->nodes[0];
> +		btrfs_item_key_to_cpu(l, &found_key, slot);
> +
> +		if (found_key.type != BTRFS_QGROUP_RELATION_KEY)
> +			goto next2;
> +
> +		if (found_key.objectid > found_key.offset) {
> +			/* parent <- member, not needed to build config */
> +			/* FIXME should we omit the key completely? */
> +			goto next2;
> +		}
> +
> +		ret = add_relation_rb(fs_info, found_key.objectid,
> +				      found_key.offset);
> +		if (ret)
> +			goto out;
> +next2:
> +		ret = btrfs_next_item(quota_root, path);
> +		if (ret < 0)
> +			goto out;
> +		if (ret)
> +			break;
> +	}
> +out:
> +	fs_info->qgroup_flags |= flags;
> +	if (!(fs_info->qgroup_flags & BTRFS_QGROUP_STATUS_FLAG_ON)) {
> +		fs_info->quota_enabled = 0;
> +		fs_info->pending_quota_state = 0;
> +	}
> +	btrfs_free_path(path);
> +
> +	return ret < 0 ? ret : 0;
> +}
> +
> +/*
> + * This is only called from close_ctree() or open_ctree(), both in single-
> + * threaded paths. Clean up the in-memory structures. No locking needed.
> + */
> +void btrfs_free_qgroup_config(struct btrfs_fs_info *fs_info)
> +{
> +	struct rb_node *n;
> +	struct btrfs_qgroup *qgroup;
> +	struct btrfs_qgroup_list *list;
> +
> +	while ((n = rb_first(&fs_info->qgroup_tree))) {
> +		qgroup = rb_entry(n, struct btrfs_qgroup, node);
> +		rb_erase(n, &fs_info->qgroup_tree);
> +
> +		WARN_ON(!list_empty(&qgroup->dirty));
> +
> +		while (!list_empty(&qgroup->groups)) {
> +			list = list_first_entry(&qgroup->groups,
> +						struct btrfs_qgroup_list,
> +						next_group);
> +			list_del(&list->next_group);
> +			list_del(&list->next_member);
> +			kfree(list);
> +		}
> +
> +		while (!list_empty(&qgroup->members)) {
> +			list = list_first_entry(&qgroup->members,
> +						struct btrfs_qgroup_list,
> +						next_member);
> +			list_del(&list->next_group);
> +			list_del(&list->next_member);
> +			kfree(list);
> +		}
> +		kfree(qgroup);
> +	}
> +}
> +
> +static int add_qgroup_relation_item(struct btrfs_trans_handle *trans,
> +				    struct btrfs_root *quota_root,
> +				    u64 src, u64 dst)
> +{
> +	int ret;
> +	struct btrfs_path *path;
> +	struct btrfs_key key;
> +
> +	path = btrfs_alloc_path();
> +	if (!path)
> +		return -ENOMEM;
> +
> +	key.objectid = src;
> +	key.type = BTRFS_QGROUP_RELATION_KEY;
> +	key.offset = dst;
> +
> +	ret = btrfs_insert_empty_item(trans, quota_root, path, &key, 0);
> +
> +	btrfs_mark_buffer_dirty(path->nodes[0]);
> +
> +	btrfs_free_path(path);
> +	return ret;
> +}
> +
> +static int del_qgroup_relation_item(struct btrfs_trans_handle *trans,
> +				    struct btrfs_root *quota_root,
> +				    u64 src, u64 dst)
> +{
> +	int ret;
> +	struct btrfs_path *path;
> +	struct btrfs_key key;
> +
> +	path = btrfs_alloc_path();
> +	if (!path)
> +		return -ENOMEM;
> +
> +	key.objectid = src;
> +	key.type = BTRFS_QGROUP_RELATION_KEY;
> +	key.offset = dst;
> +
> +	ret = btrfs_search_slot(trans, quota_root, &key, path, -1, 1);
> +	if (ret < 0)
> +		goto out;
> +
> +	if (ret > 0) {
> +		ret = -ENOENT;
> +		goto out;
> +	}
> +
> +	ret = btrfs_del_item(trans, quota_root, path);
> +out:
> +	btrfs_free_path(path);
> +	return ret;
> +}
> +
> +static int add_qgroup_item(struct btrfs_trans_handle *trans,
> +			   struct btrfs_root *quota_root, u64 qgroupid)
> +{
> +	int ret;
> +	struct btrfs_path *path;
> +	struct btrfs_qgroup_info_item *qgroup_info;
> +	struct btrfs_qgroup_limit_item *qgroup_limit;
> +	struct extent_buffer *leaf;
> +	struct btrfs_key key;
> +
> +	path = btrfs_alloc_path();
> +	if (!path)
> +		return -ENOMEM;
> +
> +	key.objectid = 0;
> +	key.type = BTRFS_QGROUP_INFO_KEY;
> +	key.offset = qgroupid;
> +
> +	ret = btrfs_insert_empty_item(trans, quota_root, path, &key,
> +				      sizeof(*qgroup_info));
> +	if (ret)
> +		goto out;
> +
> +	leaf = path->nodes[0];
> +	qgroup_info = btrfs_item_ptr(leaf, path->slots[0],
> +				     struct btrfs_qgroup_info_item);
> +	btrfs_set_qgroup_info_generation(leaf, qgroup_info, trans->transid);
> +	btrfs_set_qgroup_info_rfer(leaf, qgroup_info, 0);
> +	btrfs_set_qgroup_info_rfer_cmpr(leaf, qgroup_info, 0);
> +	btrfs_set_qgroup_info_excl(leaf, qgroup_info, 0);
> +	btrfs_set_qgroup_info_excl_cmpr(leaf, qgroup_info, 0);
> +
> +	btrfs_mark_buffer_dirty(leaf);
> +
> +	btrfs_release_path(path);
> +
> +	key.type = BTRFS_QGROUP_LIMIT_KEY;
> +	ret = btrfs_insert_empty_item(trans, quota_root, path, &key,
> +				      sizeof(*qgroup_limit));
> +	if (ret)
> +		goto out;
> +
> +	leaf = path->nodes[0];
> +	qgroup_limit = btrfs_item_ptr(leaf, path->slots[0],
> +				      struct btrfs_qgroup_limit_item);
> +	btrfs_set_qgroup_limit_flags(leaf, qgroup_limit, 0);
> +	btrfs_set_qgroup_limit_max_rfer(leaf, qgroup_limit, 0);
> +	btrfs_set_qgroup_limit_max_excl(leaf, qgroup_limit, 0);
> +	btrfs_set_qgroup_limit_rsv_rfer(leaf, qgroup_limit, 0);
> +	btrfs_set_qgroup_limit_rsv_excl(leaf, qgroup_limit, 0);
> +
> +	btrfs_mark_buffer_dirty(leaf);
> +
> +	ret = 0;
> +out:
> +	btrfs_free_path(path);
> +	return ret;
> +}
> +
> +static int del_qgroup_item(struct btrfs_trans_handle *trans,
> +			   struct btrfs_root *quota_root, u64 qgroupid)
> +{
> +	int ret;
> +	struct btrfs_path *path;
> +	struct btrfs_key key;
> +
> +	path = btrfs_alloc_path();
> +	if (!path)
> +		return -ENOMEM;
> +
> +	key.objectid = 0;
> +	key.type = BTRFS_QGROUP_INFO_KEY;
> +	key.offset = qgroupid;
> +	ret = btrfs_search_slot(trans, quota_root, &key, path, -1, 1);
> +	if (ret < 0)
> +		goto out;
> +
> +	if (ret > 0) {
> +		ret = -ENOENT;
> +		goto out;
> +	}
> +
> +	ret = btrfs_del_item(trans, quota_root, path);
> +	if (ret)
> +		goto out;
> +
> +	btrfs_release_path(path);
> +
> +	key.type = BTRFS_QGROUP_LIMIT_KEY;
> +	ret = btrfs_search_slot(trans, quota_root, &key, path, -1, 1);
> +	if (ret < 0)
> +		goto out;
> +
> +	if (ret > 0) {
> +		ret = -ENOENT;
> +		goto out;
> +	}
> +
> +	ret = btrfs_del_item(trans, quota_root, path);
> +
> +out:
> +	btrfs_free_path(path);
> +	return ret;
> +}
> +
> +static int update_qgroup_limit_item(struct btrfs_trans_handle *trans,
> +				    struct btrfs_root *root, u64 qgroupid,
> +				    u64 flags, u64 max_rfer, u64 max_excl,
> +				    u64 rsv_rfer, u64 rsv_excl)
> +{
> +	struct btrfs_path *path;
> +	struct btrfs_key key;
> +	struct extent_buffer *l;
> +	struct btrfs_qgroup_limit_item *qgroup_limit;
> +	int ret;
> +	int slot;
> +
> +	key.objectid = 0;
> +	key.type = BTRFS_QGROUP_LIMIT_KEY;
> +	key.offset = qgroupid;
> +
> +	path = btrfs_alloc_path();
> +	BUG_ON(!path);
> +	ret = btrfs_search_slot(trans, root, &key, path, 0, 1);
> +	if (ret > 0)
> +		ret = -ENOENT;
> +
> +	if (ret)
> +		goto out;
> +
> +	l = path->nodes[0];
> +	slot = path->slots[0];
> +	qgroup_limit = btrfs_item_ptr(l, path->slots[0],
> +				      struct btrfs_qgroup_limit_item);
> +	btrfs_set_qgroup_limit_flags(l, qgroup_limit, flags);
> +	btrfs_set_qgroup_limit_max_rfer(l, qgroup_limit, max_rfer);
> +	btrfs_set_qgroup_limit_max_excl(l, qgroup_limit, max_excl);
> +	btrfs_set_qgroup_limit_rsv_rfer(l, qgroup_limit, rsv_rfer);
> +	btrfs_set_qgroup_limit_rsv_excl(l, qgroup_limit, rsv_excl);
> +
> +	btrfs_mark_buffer_dirty(l);
> +
> +out:
> +	btrfs_free_path(path);
> +	return ret;
> +}
> +
> +static int update_qgroup_info_item(struct btrfs_trans_handle *trans,
> +				   struct btrfs_root *root,
> +				   struct btrfs_qgroup *qgroup)
> +{
> +	struct btrfs_path *path;
> +	struct btrfs_key key;
> +	struct extent_buffer *l;
> +	struct btrfs_qgroup_info_item *qgroup_info;
> +	int ret;
> +	int slot;
> +
> +	key.objectid = 0;
> +	key.type = BTRFS_QGROUP_INFO_KEY;
> +	key.offset = qgroup->qgroupid;
> +
> +	path = btrfs_alloc_path();
> +	BUG_ON(!path);
> +	ret = btrfs_search_slot(trans, root, &key, path, 0, 1);
> +	if (ret > 0)
> +		ret = -ENOENT;
> +
> +	if (ret)
> +		goto out;
> +
> +	l = path->nodes[0];
> +	slot = path->slots[0];
> +	qgroup_info = btrfs_item_ptr(l, path->slots[0],
> +				     struct btrfs_qgroup_info_item);
> +	btrfs_set_qgroup_info_generation(l, qgroup_info, trans->transid);
> +	btrfs_set_qgroup_info_rfer(l, qgroup_info, qgroup->rfer);
> +	btrfs_set_qgroup_info_rfer_cmpr(l, qgroup_info, qgroup->rfer_cmpr);
> +	btrfs_set_qgroup_info_excl(l, qgroup_info, qgroup->excl);
> +	btrfs_set_qgroup_info_excl_cmpr(l, qgroup_info, qgroup->excl_cmpr);
> +
> +	btrfs_mark_buffer_dirty(l);
> +
> +out:
> +	btrfs_free_path(path);
> +	return ret;
> +}
> +
> +static int update_qgroup_status_item(struct btrfs_trans_handle *trans,
> +				     struct btrfs_fs_info *fs_info,
> +				     struct btrfs_root *root)
> +{
> +	struct btrfs_path *path;
> +	struct btrfs_key key;
> +	struct extent_buffer *l;
> +	struct btrfs_qgroup_status_item *ptr;
> +	int ret;
> +	int slot;
> +
> +	key.objectid = 0;
> +	key.type = BTRFS_QGROUP_STATUS_KEY;
> +	key.offset = 0;
> +
> +	path = btrfs_alloc_path();
> +	BUG_ON(!path);
> +	ret = btrfs_search_slot(trans, root, &key, path, 0, 1);
> +	if (ret > 0)
> +		ret = -ENOENT;
> +
> +	if (ret)
> +		goto out;
> +
> +	l = path->nodes[0];
> +	slot = path->slots[0];
> +	ptr = btrfs_item_ptr(l, slot, struct btrfs_qgroup_status_item);
> +	btrfs_set_qgroup_status_flags(l, ptr, fs_info->qgroup_flags);
> +	btrfs_set_qgroup_status_generation(l, ptr, trans->transid);
> +	/* XXX scan */
> +
> +	btrfs_mark_buffer_dirty(l);
> +
> +out:
> +	btrfs_free_path(path);
> +	return ret;
> +}
> +
> +/*
> + * called with qgroup_lock held
> + */
> +static int btrfs_clean_quota_tree(struct btrfs_trans_handle *trans,
> +				  struct btrfs_root *root)
> +{
> +	struct btrfs_path *path;
> +	struct btrfs_key key;
> +	int ret;
> +
> +	if (!root)
> +		return -EINVAL;
> +
> +	path = btrfs_alloc_path();
> +	if (!path)
> +		return -ENOMEM;
> +
> +	while (1) {
> +		key.objectid = 0;
> +		key.offset = 0;
> +		key.type = 0;
> +
> +		path->leave_spinning = 1;
> +		ret = btrfs_search_slot(trans, root, &key, path, -1, 1);
> +		if (ret > 0) {
> +			if (path->slots[0] == 0)
> +				break;
> +			path->slots[0]--;
> +		} else if (ret < 0) {
> +			break;
> +		}
> +
> +		ret = btrfs_del_item(trans, root, path);
> +		if (ret)
> +			goto out;
> +		btrfs_release_path(path);
> +	}
> +	ret = 0;
> +out:
> +	root->fs_info->pending_quota_state = 0;
> +	btrfs_free_path(path);
> +	return ret;
> +}
> +
> +int btrfs_quota_enable(struct btrfs_trans_handle *trans,
> +		       struct btrfs_fs_info *fs_info)
> +{
> +	struct btrfs_root *quota_root;
> +	struct btrfs_path *path = NULL;
> +	struct btrfs_qgroup_status_item *ptr;
> +	struct extent_buffer *leaf;
> +	struct btrfs_key key;
> +	int ret = 0;
> +
> +	spin_lock(&fs_info->qgroup_lock);
> +	if (fs_info->quota_root) {
> +		fs_info->pending_quota_state = 1;
> +		spin_unlock(&fs_info->qgroup_lock);
> +		goto out;
> +	}
> +	spin_unlock(&fs_info->qgroup_lock);
> +
> +	/*
> +	 * initially create the quota tree
> +	 */
> +	quota_root = btrfs_create_tree(trans, fs_info,
> +				       BTRFS_QUOTA_TREE_OBJECTID);
> +	if (IS_ERR(quota_root)) {
> +		ret = PTR_ERR(quota_root);
> +		goto out;
> +	}
> +
> +	path = btrfs_alloc_path();
> +	if (!path)
> +		return -ENOMEM;
> +
> +	key.objectid = 0;
> +	key.type = BTRFS_QGROUP_STATUS_KEY;
> +	key.offset = 0;
> +
> +	ret = btrfs_insert_empty_item(trans, quota_root, path, &key,
> +				      sizeof(*ptr));
> +	if (ret)
> +		goto out;
> +
> +	leaf = path->nodes[0];
> +	ptr = btrfs_item_ptr(leaf, path->slots[0],
> +			     struct btrfs_qgroup_status_item);
> +	btrfs_set_qgroup_status_generation(leaf, ptr, trans->transid);
> +	btrfs_set_qgroup_status_version(leaf, ptr, BTRFS_QGROUP_STATUS_VERSION);
> +	fs_info->qgroup_flags = BTRFS_QGROUP_STATUS_FLAG_ON |
> +				BTRFS_QGROUP_STATUS_FLAG_INCONSISTENT;
> +	btrfs_set_qgroup_status_flags(leaf, ptr, fs_info->qgroup_flags);
> +	btrfs_set_qgroup_status_scan(leaf, ptr, 0);
> +
> +	btrfs_mark_buffer_dirty(leaf);
> +
> +	spin_lock(&fs_info->qgroup_lock);
> +	fs_info->quota_root = quota_root;
> +	fs_info->pending_quota_state = 1;
> +	spin_unlock(&fs_info->qgroup_lock);
> +out:
> +	btrfs_free_path(path);
> +	return ret;
> +}
> +
> +int btrfs_quota_disable(struct btrfs_trans_handle *trans,
> +			struct btrfs_fs_info *fs_info)
> +{
> +	struct btrfs_root *tree_root = fs_info->tree_root;
> +	struct btrfs_root *quota_root;
> +	int ret = 0;
> +
> +	spin_lock(&fs_info->qgroup_lock);
> +	fs_info->pending_quota_state = 0;
> +	quota_root = fs_info->quota_root;
> +	fs_info->quota_root = NULL;
> +	btrfs_free_qgroup_config(fs_info);
> +	spin_unlock(&fs_info->qgroup_lock);
> +
> +	if (!quota_root)
> +		return -EINVAL;
> +
> +	ret = btrfs_clean_quota_tree(trans, quota_root);
> +	if (ret)
> +		goto out;
> +
> +	ret = btrfs_del_root(trans, tree_root, &quota_root->root_key);
> +	if (ret)
> +		goto out;
> +
> +	list_del(&quota_root->dirty_list);
> +
> +	btrfs_tree_lock(quota_root->node);
> +	clean_tree_block(trans, tree_root, quota_root->node);
> +	btrfs_tree_unlock(quota_root->node);
> +	btrfs_free_tree_block(trans, quota_root, quota_root->node, 0, 1);
> +
> +	free_extent_buffer(quota_root->node);
> +	free_extent_buffer(quota_root->commit_root);
> +	kfree(quota_root);
> +out:
> +	return ret;
> +}
> +
> +int btrfs_quota_rescan(struct btrfs_fs_info *fs_info)
> +{
> +	/* FIXME */
> +	return 0;
> +}
> +
> +int btrfs_add_qgroup_relation(struct btrfs_trans_handle *trans,
> +			      struct btrfs_fs_info *fs_info, u64 src, u64 dst)
> +{
> +	struct btrfs_root *quota_root;
int ret = 0; > + > + quota_root = fs_info->quota_root; > + if (!quota_root) > + return -EINVAL; > + > + ret = add_qgroup_relation_item(trans, quota_root, src, dst); > + if (ret) > + return ret; > + > + ret = add_qgroup_relation_item(trans, quota_root, dst, src); > + if (ret) { > + del_qgroup_relation_item(trans, quota_root, src, dst); > + return ret; > + } > + > + spin_lock(&fs_info->qgroup_lock); > + ret = add_relation_rb(quota_root->fs_info, src, dst); > + spin_unlock(&fs_info->qgroup_lock); > + > + return ret; > +} > + > +int btrfs_del_qgroup_relation(struct btrfs_trans_handle *trans, > + struct btrfs_fs_info *fs_info, u64 src, u64 dst) > +{ > + struct btrfs_root *quota_root; > + int ret = 0; > + int err; > + > + quota_root = fs_info->quota_root; > + if (!quota_root) > + return -EINVAL; > + > + ret = del_qgroup_relation_item(trans, quota_root, src, dst); > + err = del_qgroup_relation_item(trans, quota_root, dst, src); > + if (err&& !ret) > + ret = err; > + > + spin_lock(&fs_info->qgroup_lock); > + del_relation_rb(fs_info, src, dst); > + > + spin_unlock(&fs_info->qgroup_lock); > + > + return ret; > +} > + > +int btrfs_create_qgroup(struct btrfs_trans_handle *trans, > + struct btrfs_fs_info *fs_info, u64 qgroupid, char *name) > +{ > + struct btrfs_root *quota_root; > + struct btrfs_qgroup *qgroup; > + int ret = 0; > + > + quota_root = fs_info->quota_root; > + if (!quota_root) > + return -EINVAL; > + > + ret = add_qgroup_item(trans, quota_root, qgroupid); > + > + spin_lock(&fs_info->qgroup_lock); > + qgroup = add_qgroup_rb(fs_info, qgroupid); > + spin_unlock(&fs_info->qgroup_lock); > + > + if (IS_ERR(qgroup)) > + ret = PTR_ERR(qgroup); > + > + return ret; > +} > + > +int btrfs_remove_qgroup(struct btrfs_trans_handle *trans, > + struct btrfs_fs_info *fs_info, u64 qgroupid) > +{ > + struct btrfs_root *quota_root; > + int ret = 0; > + > + quota_root = fs_info->quota_root; > + if (!quota_root) > + return -EINVAL; > + > + ret = del_qgroup_item(trans, quota_root, 
qgroupid); > + > + spin_lock(&fs_info->qgroup_lock); > + del_qgroup_rb(quota_root->fs_info, qgroupid); > + > + spin_unlock(&fs_info->qgroup_lock); > + > + return ret; > +} > + > +int btrfs_limit_qgroup(struct btrfs_trans_handle *trans, > + struct btrfs_fs_info *fs_info, u64 qgroupid, > + struct btrfs_qgroup_limit *limit) > +{ > + struct btrfs_root *quota_root = fs_info->quota_root; > + struct btrfs_qgroup *qgroup; > + int ret = 0; > + > + if (!quota_root) > + return -EINVAL; > + > + ret = update_qgroup_limit_item(trans, quota_root, qgroupid, > + limit->flags, limit->max_rfer, > + limit->max_excl, limit->rsv_rfer, > + limit->rsv_excl); > + if (ret) { > + fs_info->qgroup_flags |= BTRFS_QGROUP_STATUS_FLAG_INCONSISTENT; > + printk(KERN_INFO "unable to update quota limit for %llu\n", > + (unsigned long long)qgroupid); > + } > + > + spin_lock(&fs_info->qgroup_lock); > + > + qgroup = find_qgroup_rb(fs_info, qgroupid); > + if (!qgroup) { > + ret = -ENOENT; > + goto unlock; > + } > + qgroup->lim_flags = limit->flags; > + qgroup->max_rfer = limit->max_rfer; > + qgroup->max_excl = limit->max_excl; > + qgroup->rsv_rfer = limit->rsv_rfer; > + qgroup->rsv_excl = limit->rsv_excl; > + > +unlock: > + spin_unlock(&fs_info->qgroup_lock); > + > + return ret; > +} > + > +static void qgroup_dirty(struct btrfs_fs_info *fs_info, > + struct btrfs_qgroup *qgroup) > +{ > + if (list_empty(&qgroup->dirty)) > + list_add(&qgroup->dirty,&fs_info->dirty_qgroups); > +} > + > +/* > + * btrfs_qgroup_record_ref is called for every ref that is added to or deleted > + * from the fs. First, all roots referencing the extent are searched, and > + * then the space is accounted accordingly to the different roots. The > + * accounting algorithm works in 3 steps documented inline. 
> + */ > +int btrfs_qgroup_record_ref(struct btrfs_trans_handle *trans, > + struct btrfs_fs_info *fs_info, > + struct btrfs_delayed_ref_node *node, > + struct btrfs_delayed_extent_op *extent_op) > +{ > + struct btrfs_key ins; > + struct btrfs_root *quota_root; > + u64 ref_root; > + struct btrfs_qgroup *qgroup; > + struct ulist_node *unode; > + struct ulist *roots = NULL; > + struct ulist *tmp = NULL; > + u64 seq; > + int ret = 0; > + int sgn; > + > + if (!fs_info->quota_enabled) > + return 0; > + > + BUG_ON(!fs_info->quota_root); > + > + ins.objectid = node->bytenr; > + ins.offset = node->num_bytes; > + ins.type = BTRFS_EXTENT_ITEM_KEY; > + > + if (node->type == BTRFS_TREE_BLOCK_REF_KEY || > + node->type == BTRFS_SHARED_BLOCK_REF_KEY) { > + struct btrfs_delayed_tree_ref *ref; > + ref = btrfs_delayed_node_to_tree_ref(node); > + ref_root = ref->root; > + } else if (node->type == BTRFS_EXTENT_DATA_REF_KEY || > + node->type == BTRFS_SHARED_DATA_REF_KEY) { > + struct btrfs_delayed_data_ref *ref; > + ref = btrfs_delayed_node_to_data_ref(node); > + ref_root = ref->root; > + } else { > + BUG(); > + } > + > + if (!is_fstree(ref_root)) { > + /* > + * non-fs-trees are not being accounted > + */ > + return 0; > + } > + > + switch (node->action) { > + case BTRFS_ADD_DELAYED_REF: > + case BTRFS_ADD_DELAYED_EXTENT: > + sgn = 1; > + break; > + case BTRFS_DROP_DELAYED_REF: > + sgn = -1; > + break; > + case BTRFS_UPDATE_DELAYED_HEAD: > + return 0; > + default: > + BUG(); > + } > + > + ret = btrfs_find_all_roots(trans, fs_info, node->bytenr, > + node->num_bytes, > + sgn> 0 ? 
> +				   node->seq - 1 : node->seq, &roots);
> +	if (IS_ERR(roots)) {
> +		ret = PTR_ERR(roots);
> +		goto out;
> +	}
> +
> +	spin_lock(&fs_info->qgroup_lock);
> +	quota_root = fs_info->quota_root;
> +	if (!quota_root)
> +		goto out;
> +
> +	qgroup = find_qgroup_rb(fs_info, ref_root);
> +	if (!qgroup)
> +		goto out;
> +
> +	/*
> +	 * step 1: for each old ref, visit all nodes once and inc refcnt
> +	 */
> +	unode = NULL;
> +	tmp = ulist_alloc(GFP_ATOMIC);
> +	if (!tmp) {
> +		ret = -ENOMEM;
> +		goto out;
> +	}
> +	seq = fs_info->qgroup_seq;
> +	fs_info->qgroup_seq += roots->nnodes + 1; /* max refcnt */
> +
> +	while ((unode = ulist_next(roots, unode))) {
> +		struct ulist_node *tmp_unode;
> +		struct btrfs_qgroup *qg;
> +
> +		qg = find_qgroup_rb(fs_info, unode->val);
> +		if (!qg)
> +			continue;
> +
> +		ulist_reinit(tmp);
> +		/* XXX id not needed */
> +		ulist_add(tmp, qg->qgroupid, (unsigned long)qg, GFP_ATOMIC);
> +		tmp_unode = NULL;
> +		while ((tmp_unode = ulist_next(tmp, tmp_unode))) {
> +			struct btrfs_qgroup_list *glist;
> +
> +			qg = (struct btrfs_qgroup *)tmp_unode->aux;
> +			if (qg->refcnt < seq)
> +				qg->refcnt = seq + 1;
> +			else
> +				++qg->refcnt;
> +
> +			list_for_each_entry(glist, &qg->groups, next_group) {
> +				ulist_add(tmp, glist->group->qgroupid,
> +					  (unsigned long)glist->group,
> +					  GFP_ATOMIC);
> +			}
> +		}
> +	}
> +
> +	/*
> +	 * step 2: walk from the new root
> +	 */
> +	ulist_reinit(tmp);
> +	ulist_add(tmp, qgroup->qgroupid, (unsigned long)qgroup, GFP_ATOMIC);
> +	unode = NULL;
> +	while ((unode = ulist_next(tmp, unode))) {
> +		struct btrfs_qgroup *qg;
> +		struct btrfs_qgroup_list *glist;
> +
> +		qg = (struct btrfs_qgroup *)unode->aux;
> +		if (qg->refcnt < seq) {
> +			/* not visited by step 1 */
> +			qg->rfer += sgn * node->num_bytes;
> +			qg->rfer_cmpr += sgn * node->num_bytes;
> +			if (roots->nnodes == 0) {
> +				qg->excl += sgn * node->num_bytes;
> +				qg->excl_cmpr += sgn * node->num_bytes;
> +			}
> +			qgroup_dirty(fs_info, qg);
> +		}
> +		WARN_ON(qg->tag >= seq);
> +		qg->tag = seq;
> +
> +		list_for_each_entry(glist, &qg->groups, next_group) {
> +			ulist_add(tmp, glist->group->qgroupid,
> +				  (unsigned long)glist->group, GFP_ATOMIC);
> +		}
> +	}
> +
> +	/*
> +	 * step 3: walk again from old refs
> +	 */
> +	while ((unode = ulist_next(roots, unode))) {
> +		struct btrfs_qgroup *qg;
> +		struct ulist_node *tmp_unode;
> +
> +		qg = find_qgroup_rb(fs_info, unode->val);
> +		if (!qg)
> +			continue;
> +
> +		ulist_reinit(tmp);
> +		ulist_add(tmp, qg->qgroupid, (unsigned long)qg, GFP_ATOMIC);
> +		tmp_unode = NULL;
> +		while ((tmp_unode = ulist_next(tmp, tmp_unode))) {
> +			struct btrfs_qgroup_list *glist;
> +
> +			qg = (struct btrfs_qgroup *)tmp_unode->aux;
> +			if (qg->tag == seq)
> +				continue;
> +
> +			if (qg->refcnt - seq == roots->nnodes) {
> +				qg->excl -= sgn * node->num_bytes;
> +				qg->excl_cmpr -= sgn * node->num_bytes;
> +				qgroup_dirty(fs_info, qg);
> +			}
> +
> +			list_for_each_entry(glist, &qg->groups, next_group) {
> +				ulist_add(tmp, glist->group->qgroupid,
> +					  (unsigned long)glist->group,
> +					  GFP_ATOMIC);
> +			}
> +		}
> +	}
> +	ret = 0;
> +out:
> +	spin_unlock(&fs_info->qgroup_lock);
> +	ulist_free(roots);
> +	ulist_free(tmp);
> +
> +	return ret;
> +}
> +
> +/*
> + * called from commit_transaction. Writes all changed qgroups to disk.
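The three accounting steps quoted above can be condensed into a small user-space model. Everything here is hypothetical (toy names, flat qgroups without parent groups, no ulists or locking), but the seq/refcnt/tag bookkeeping follows the patch: step 1 counts how often each old root's qgroup is reached, step 2 gives the new root referenced (and possibly exclusive) space, and step 3 strips exclusivity from qgroups reached via every old root.

```c
#include <assert.h>

/* toy model: one qgroup per root, no inheritance hierarchy */
struct toy_qgroup {
	unsigned long long rfer;	/* referenced bytes */
	unsigned long long excl;	/* exclusive bytes */
	unsigned long long refcnt;	/* step 1 visit counter */
	unsigned long long tag;		/* step 2 marker */
};

/*
 * account one *added* ref (sgn = +1) of `bytes` for the qgroup `newg`;
 * old[0..n_old-1] are the qgroups of the roots that referenced the
 * extent before this change
 */
static void toy_record_added_ref(struct toy_qgroup **old, int n_old,
				 struct toy_qgroup *newg,
				 unsigned long long bytes,
				 unsigned long long *seqp)
{
	unsigned long long seq = *seqp;
	int i;

	*seqp += n_old + 1;	/* reserve the refcnt range (max refcnt) */

	/* step 1: for each old ref, visit all nodes once and inc refcnt */
	for (i = 0; i < n_old; i++) {
		if (old[i]->refcnt < seq)
			old[i]->refcnt = seq + 1;
		else
			++old[i]->refcnt;
	}

	/* step 2: walk from the new root; it gains referenced space, and
	 * exclusive space only if nobody referenced the extent before */
	if (newg->refcnt < seq) {
		newg->rfer += bytes;
		if (n_old == 0)
			newg->excl += bytes;
	}
	newg->tag = seq;

	/* step 3: qgroups reached from *every* old root lose exclusivity,
	 * because the extent is now shared with the new root */
	for (i = 0; i < n_old; i++) {
		if (old[i]->tag == seq)
			continue;
		if (old[i]->refcnt - seq == (unsigned long long)n_old)
			old[i]->excl -= bytes;
	}
}
```

With one old root A owning a 4096-byte extent exclusively, adding a ref from root B leaves both qgroups with 4096 referenced bytes and zero exclusive bytes, which is exactly the exclusive-to-shared transition the kernel code performs.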
> + */
> +int btrfs_run_qgroups(struct btrfs_trans_handle *trans,
> +		      struct btrfs_fs_info *fs_info)
> +{
> +	struct btrfs_root *quota_root = fs_info->quota_root;
> +	int ret = 0;
> +
> +	if (!quota_root)
> +		goto out;
> +
> +	fs_info->quota_enabled = fs_info->pending_quota_state;
> +
> +	spin_lock(&fs_info->qgroup_lock);
> +	while (!list_empty(&fs_info->dirty_qgroups)) {
> +		struct btrfs_qgroup *qgroup;
> +		qgroup = list_first_entry(&fs_info->dirty_qgroups,
> +					  struct btrfs_qgroup, dirty);
> +		list_del_init(&qgroup->dirty);
> +		spin_unlock(&fs_info->qgroup_lock);
> +		ret = update_qgroup_info_item(trans, quota_root, qgroup);
> +		if (ret)
> +			fs_info->qgroup_flags |=
> +					BTRFS_QGROUP_STATUS_FLAG_INCONSISTENT;
> +		spin_lock(&fs_info->qgroup_lock);
> +	}
> +	if (fs_info->quota_enabled)
> +		fs_info->qgroup_flags |= BTRFS_QGROUP_STATUS_FLAG_ON;
> +	else
> +		fs_info->qgroup_flags &= ~BTRFS_QGROUP_STATUS_FLAG_ON;
> +	spin_unlock(&fs_info->qgroup_lock);
> +
> +	ret = update_qgroup_status_item(trans, fs_info, quota_root);
> +	if (ret)
> +		fs_info->qgroup_flags |= BTRFS_QGROUP_STATUS_FLAG_INCONSISTENT;
> +
> +out:
> +
> +	return ret;
> +}
> +
> +/*
> + * copy the accounting information between qgroups.
> + * This is necessary when a snapshot or a subvolume is created
> + */
> +int btrfs_qgroup_inherit(struct btrfs_trans_handle *trans,
> +			 struct btrfs_fs_info *fs_info, u64 srcid,
> +			 u64 objectid, struct btrfs_qgroup_inherit *inherit)
> +{
> +	int ret = 0;
> +	int i;
> +	u64 *i_qgroups;
> +	struct btrfs_root *quota_root = fs_info->quota_root;
> +	struct btrfs_qgroup *srcgroup;
> +	struct btrfs_qgroup *dstgroup;
> +	u32 level_size = 0;
> +
> +	if (!fs_info->quota_enabled)
> +		return 0;
> +
> +	if (!quota_root)
> +		ret = -EINVAL;

Is this "return -EINVAL" ?

> +
> +	/*
> +	 * create a tracking group for the subvol itself
> +	 */
> +	ret = add_qgroup_item(trans, quota_root, objectid);
> +	if (ret)
> +		goto out;
> +
> +	if (inherit && inherit->flags & BTRFS_QGROUP_INHERIT_SET_LIMITS) {
> +		ret = update_qgroup_limit_item(trans, quota_root, objectid,
> +					       inherit->lim.flags,
> +					       inherit->lim.max_rfer,
> +					       inherit->lim.max_excl,
> +					       inherit->lim.rsv_rfer,
> +					       inherit->lim.rsv_excl);
> +		if (ret)
> +			goto out;
> +	}
> +
> +	if (srcid) {
> +		struct btrfs_root *srcroot;
> +		struct btrfs_key srckey;
> +		int srcroot_level;
> +
> +		srckey.objectid = srcid;
> +		srckey.type = BTRFS_ROOT_ITEM_KEY;
> +		srckey.offset = (u64)-1;
> +		srcroot = btrfs_read_fs_root_no_name(fs_info, &srckey);
> +		if (IS_ERR(srcroot)) {
> +			ret = PTR_ERR(srcroot);
> +			goto out;
> +		}
> +
> +		rcu_read_lock();
> +		srcroot_level = btrfs_header_level(srcroot->node);
> +		level_size = btrfs_level_size(srcroot, srcroot_level);
> +		rcu_read_unlock();
> +	}
> +
> +	/*
> +	 * add qgroup to all inherited groups
> +	 */
> +	if (inherit) {
> +		i_qgroups = (u64 *)(inherit + 1);
> +		for (i = 0; i < inherit->num_qgroups; ++i) {
> +			ret = add_qgroup_relation_item(trans, quota_root,
> +						       objectid, *i_qgroups);
> +			if (ret)
> +				goto out;
> +			ret = add_qgroup_relation_item(trans, quota_root,
> +						       *i_qgroups, objectid);
> +			if (ret)
> +				goto out;
> +			++i_qgroups;
> +		}
> +	}
> +
> +
> +	spin_lock(&fs_info->qgroup_lock);
> +
> +	dstgroup =
> +		add_qgroup_rb(fs_info, objectid);
> +	if (!dstgroup)
> +		goto unlock;
> +
> +	if (srcid) {
> +		srcgroup = find_qgroup_rb(fs_info, srcid);
> +		if (!srcgroup)
> +			goto unlock;
> +		dstgroup->rfer = srcgroup->rfer - level_size;
> +		dstgroup->rfer_cmpr = srcgroup->rfer_cmpr - level_size;
> +		srcgroup->excl = level_size;
> +		srcgroup->excl_cmpr = level_size;
> +		qgroup_dirty(fs_info, dstgroup);
> +		qgroup_dirty(fs_info, srcgroup);
> +	}
> +
> +	if (!inherit)
> +		goto unlock;
> +
> +	i_qgroups = (u64 *)(inherit + 1);
> +	for (i = 0; i < inherit->num_qgroups; ++i) {
> +		ret = add_relation_rb(quota_root->fs_info, objectid,
> +				      *i_qgroups);
> +		if (ret)
> +			goto unlock;
> +		++i_qgroups;
> +	}
> +
> +	for (i = 0; i < inherit->num_ref_copies; ++i) {
> +		struct btrfs_qgroup *src;
> +		struct btrfs_qgroup *dst;
> +
> +		src = find_qgroup_rb(fs_info, i_qgroups[0]);
> +		dst = find_qgroup_rb(fs_info, i_qgroups[1]);
> +
> +		if (!src || !dst) {
> +			ret = -EINVAL;
> +			goto unlock;
> +		}
> +
> +		dst->rfer = src->rfer - level_size;
> +		dst->rfer_cmpr = src->rfer_cmpr - level_size;
> +		i_qgroups += 2;
> +	}
> +	for (i = 0; i < inherit->num_excl_copies; ++i) {
> +		struct btrfs_qgroup *src;
> +		struct btrfs_qgroup *dst;
> +
> +		src = find_qgroup_rb(fs_info, i_qgroups[0]);
> +		dst = find_qgroup_rb(fs_info, i_qgroups[1]);
> +
> +		if (!src || !dst) {
> +			ret = -EINVAL;
> +			goto unlock;
> +		}
> +
> +		dst->excl = src->excl + level_size;
> +		dst->excl_cmpr = src->excl_cmpr + level_size;
> +		i_qgroups += 2;
> +	}
> +
> +unlock:
> +	spin_unlock(&fs_info->qgroup_lock);
> +out:
> +	return 0;

return ret; ?

Thanks,
Tsutomu

> +}
> +
> +/*
> + * reserve some space for a qgroup and all its parents. The reservation takes
> + * place with start_transaction or dealloc_reserve, similar to ENOSPC
> + * accounting. If not enough space is available, EDQUOT is returned.
> + * We assume that the requested space is new for all qgroups.
> + */
> +int btrfs_qgroup_reserve(struct btrfs_root *root, u64 num_bytes)
> +{
> +	struct btrfs_root *quota_root;
> +	struct btrfs_qgroup *qgroup;
> +	struct btrfs_fs_info *fs_info = root->fs_info;
> +	u64 ref_root = root->root_key.objectid;
> +	int ret = 0;
> +	struct ulist *ulist = NULL;
> +	struct ulist_node *unode;
> +
> +	if (!is_fstree(ref_root))
> +		return 0;
> +
> +	if (num_bytes == 0)
> +		return 0;
> +
> +	spin_lock(&fs_info->qgroup_lock);
> +	quota_root = fs_info->quota_root;
> +	if (!quota_root)
> +		goto out;
> +
> +	qgroup = find_qgroup_rb(fs_info, ref_root);
> +	if (!qgroup)
> +		goto out;
> +
> +	/*
> +	 * in a first step, we check all affected qgroups if any limits would
> +	 * be exceeded
> +	 */
> +	ulist = ulist_alloc(GFP_ATOMIC);
> +	ulist_add(ulist, qgroup->qgroupid, (unsigned long)qgroup, GFP_ATOMIC);
> +	unode = NULL;
> +	while ((unode = ulist_next(ulist, unode))) {
> +		struct btrfs_qgroup *qg;
> +		struct btrfs_qgroup_list *glist;
> +
> +		qg = (struct btrfs_qgroup *)unode->aux;
> +
> +		if ((qg->lim_flags & BTRFS_QGROUP_LIMIT_MAX_RFER) &&
> +		    qg->reserved + qg->rfer + num_bytes >
> +		    qg->max_rfer)
> +			ret = -EDQUOT;
> +
> +		if ((qg->lim_flags & BTRFS_QGROUP_LIMIT_MAX_EXCL) &&
> +		    qg->reserved + qg->excl + num_bytes >
> +		    qg->max_excl)
> +			ret = -EDQUOT;
> +
> +		list_for_each_entry(glist, &qg->groups, next_group) {
> +			ulist_add(ulist, glist->group->qgroupid,
> +				  (unsigned long)glist->group, GFP_ATOMIC);
> +		}
> +	}
> +	if (ret)
> +		goto out;
> +
> +	/*
> +	 * no limits exceeded, now record the reservation into all qgroups
> +	 */
> +	unode = NULL;
> +	while ((unode = ulist_next(ulist, unode))) {
> +		struct btrfs_qgroup *qg;
> +
> +		qg = (struct btrfs_qgroup *)unode->aux;
> +
> +		qg->reserved += num_bytes;
> +#if 0
> +		qgroup_dirty(fs_info, qg);	/* XXX not necessary */
> +#endif
> +	}
> +
> +out:
> +	spin_unlock(&fs_info->qgroup_lock);
> +	ulist_free(ulist);
> +
> +	return ret;
> +}
> +
> +void btrfs_qgroup_free(struct btrfs_root *root, u64 num_bytes)
> +{
> +	struct btrfs_root *quota_root;
> +	struct btrfs_qgroup *qgroup;
> +	struct btrfs_fs_info *fs_info = root->fs_info;
> +	struct ulist *ulist = NULL;
> +	struct ulist_node *unode;
> +	u64 ref_root = root->root_key.objectid;
> +
> +	if (!is_fstree(ref_root))
> +		return;
> +
> +	if (num_bytes == 0)
> +		return;
> +
> +	spin_lock(&fs_info->qgroup_lock);
> +
> +	quota_root = fs_info->quota_root;
> +	if (!quota_root)
> +		goto out;
> +
> +	qgroup = find_qgroup_rb(fs_info, ref_root);
> +	if (!qgroup)
> +		goto out;
> +
> +	ulist = ulist_alloc(GFP_ATOMIC);
> +	ulist_add(ulist, qgroup->qgroupid, (unsigned long)qgroup, GFP_ATOMIC);
> +	unode = NULL;
> +	while ((unode = ulist_next(ulist, unode))) {
> +		struct btrfs_qgroup *qg;
> +		struct btrfs_qgroup_list *glist;
> +
> +		qg = (struct btrfs_qgroup *)unode->aux;
> +
> +		qg->reserved -= num_bytes;
> +#if 0
> +		qgroup_dirty(fs_info, qg);
> +#endif
> +
> +		list_for_each_entry(glist, &qg->groups, next_group) {
> +			ulist_add(ulist, glist->group->qgroupid,
> +				  (unsigned long)glist->group, GFP_ATOMIC);
> +		}
> +	}
> +
> +out:
> +	spin_unlock(&fs_info->qgroup_lock);
> +	ulist_free(ulist);
> +}

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
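The reservation path quoted above works in two phases: check every affected qgroup against its limits first, and only if nothing would be exceeded, record the reservation in all of them. A minimal user-space sketch (hypothetical names, a flat array instead of the ulist walk over parent groups, referenced-limit only):

```c
#include <assert.h>
#include <errno.h>

/* toy qgroup with an optional referenced-bytes limit */
struct toy_limit_qgroup {
	unsigned long long rfer;	/* referenced bytes */
	unsigned long long reserved;	/* outstanding reservations */
	unsigned long long max_rfer;	/* limit, valid if limited != 0 */
	int limited;
};

static int toy_qgroup_reserve(struct toy_limit_qgroup *qg, int n,
			      unsigned long long bytes)
{
	int i;

	/* phase 1: nothing is modified until every check has passed */
	for (i = 0; i < n; i++) {
		if (qg[i].limited &&
		    qg[i].reserved + qg[i].rfer + bytes > qg[i].max_rfer)
			return -EDQUOT;
	}

	/* phase 2: no limit exceeded, record the reservation everywhere */
	for (i = 0; i < n; i++)
		qg[i].reserved += bytes;
	return 0;
}
```

The two-phase shape matters: a failed reservation must leave all qgroups untouched, since a partial update would leak reserved space that btrfs_qgroup_free never returns.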
Jan Schmidt
2012-May-21 06:06 UTC
Re: [PATCH 07/24] Btrfs: add tree modification log functions
Hi Tsutomu,

On Mon, May 21, 2012 at 01:44 (+0200), Tsutomu Itoh wrote:
>> +static noinline int
>> +__tree_mod_log_insert(struct btrfs_fs_info *fs_info, struct tree_mod_elem *tm)
>> +{
>> +	struct rb_root *tm_root;
>> +	struct rb_node **new;
>> +	struct rb_node *parent = NULL;
>> +	struct tree_mod_elem *cur;
>> +
>> +	BUG_ON(!tm || !tm->elem.seq);
>> +
>> +	write_lock(&fs_info->tree_mod_log_lock);
>> +	tm_root = &fs_info->tree_mod_log;
>> +	new = &tm_root->rb_node;
>> +	while (*new) {
>> +		cur = container_of(*new, struct tree_mod_elem, node);
>> +		parent = *new;
>> +		if (cur->index < tm->index)
>> +			new = &((*new)->rb_left);
>> +		else if (cur->index > tm->index)
>> +			new = &((*new)->rb_right);
>> +		else if (cur->elem.seq < tm->elem.seq)
>> +			new = &((*new)->rb_left);
>> +		else if (cur->elem.seq > tm->elem.seq)
>> +			new = &((*new)->rb_right);
>> +		else {
>> +			kfree(tm);
>> +			return -EEXIST;
>
> I think that write_unlock() is necessary for here.

I thought about calling write_unlock() here and decided against, because we
cannot handle EEXIST anyway. If it ever occurs, there's a bug in the code and
we hit a BUG_ON immediately after. To make that more explicit, I'll change it
to either call BUG() here directly or do the write_unlock nevertheless.

>> +		}
>> +	}
>> +
>> +	rb_link_node(&tm->node, parent, new);
>> +	rb_insert_color(&tm->node, tm_root);
>> +	write_unlock(&fs_info->tree_mod_log_lock);
>> +
>> +	return 0;
>> +}
>> +
>> +int tree_mod_alloc(struct btrfs_fs_info *fs_info, gfp_t flags,
>> +		   struct tree_mod_elem **tm_ret)
>> +{
>> +	struct tree_mod_elem *tm;
>> +	u64 seq = 0;
>> +
>> +	/*
>> +	 * we want to avoid a malloc/free cycle if there's no blocker in the
>> +	 * list.
>> +	 * we also want to avoid atomic malloc. so we must drop the spinlock
>> +	 * before calling kzalloc and recheck afterwards.
>> +	 */
>> +	spin_lock(&fs_info->tree_mod_seq_lock);
>> +	if (list_empty(&fs_info->tree_mod_seq_list))
>> +		goto out;
>> +
>> +	spin_unlock(&fs_info->tree_mod_seq_lock);
>> +	tm = *tm_ret = kzalloc(sizeof(*tm), flags);
>> +	if (!tm)
>> +		return -ENOMEM;
>> +
>> +	spin_lock(&fs_info->tree_mod_seq_lock);
>> +	if (list_empty(&fs_info->tree_mod_seq_list)) {
>> +		kfree(tm);
>> +		goto out;
>> +	}
>> +
>> +	__get_tree_mod_seq(fs_info, &tm->elem);
>> +	seq = tm->elem.seq;
>> +	tm->elem.flags = 0;
>> +
>> +out:
>> +	spin_unlock(&fs_info->tree_mod_seq_lock);
>> +	return seq;
>> +}
>> +
>> +static noinline int
>> +tree_mod_log_insert_key_mask(struct btrfs_fs_info *fs_info,
>> +			     struct extent_buffer *eb, int slot,
>> +			     enum mod_log_op op, gfp_t flags)
>> +{
>> +	struct tree_mod_elem *tm;
>> +	int ret;
>> +
>> +	ret = tree_mod_alloc(fs_info, flags, &tm);
>> +	if (ret <= 0)
>> +		return ret;
>> +
>> +	tm->index = eb->start >> PAGE_CACHE_SHIFT;
>> +	if (op != MOD_LOG_KEY_ADD) {
>> +		btrfs_node_key(eb, &tm->key, slot);
>> +		tm->blockptr = btrfs_node_blockptr(eb, slot);
>> +	}
>> +	tm->op = op;
>> +	tm->slot = slot;
>> +	tm->generation = btrfs_node_ptr_generation(eb, slot);
>> +
>> +	return __tree_mod_log_insert(fs_info, tm);
>> +}
>> +
>> +static noinline int
>> +tree_mod_log_insert_key(struct btrfs_fs_info *fs_info, struct extent_buffer *eb,
>> +			int slot, enum mod_log_op op)
>> +{
>> +	return tree_mod_log_insert_key_mask(fs_info, eb, slot, op, GFP_NOFS);
>> +}
>> +
>> +static noinline int
>> +tree_mod_log_insert_move(struct btrfs_fs_info *fs_info,
>> +			 struct extent_buffer *eb, int dst_slot, int src_slot,
>> +			 int nr_items, gfp_t flags)
>> +{
>> +	struct tree_mod_elem *tm;
>> +	int ret;
>> +
>> +	ret = tree_mod_alloc(fs_info, flags, &tm);
>> +	if (ret <= 0)
>> +		return ret;
>> +
>> +	tm->index = eb->start >> PAGE_CACHE_SHIFT;
>> +	tm->slot = src_slot;
>> +	tm->move.dst_slot = dst_slot;
>> +	tm->move.nr_items = nr_items;
>> +	tm->op = MOD_LOG_MOVE_KEYS;
>> +
>> +	return
>> +		__tree_mod_log_insert(fs_info, tm);
>> +}
>> +
>> +static noinline int
>> +tree_mod_log_insert_root(struct btrfs_fs_info *fs_info,
>> +			 struct extent_buffer *old_root,
>> +			 struct extent_buffer *new_root, gfp_t flags)
>> +{
>> +	struct tree_mod_elem *tm;
>> +	int ret;
>> +
>> +	ret = tree_mod_alloc(fs_info, flags, &tm);
>> +	if (ret <= 0)
>> +		return ret;
>> +
>> +	tm->index = new_root->start >> PAGE_CACHE_SHIFT;
>> +	tm->old_root.logical = old_root->start;
>> +	tm->old_root.level = btrfs_header_level(old_root);
>> +	tm->generation = btrfs_header_generation(old_root);
>> +	tm->op = MOD_LOG_ROOT_REPLACE;
>> +
>> +	return __tree_mod_log_insert(fs_info, tm);
>> +}
>> +
>> +static struct tree_mod_elem *
>> +__tree_mod_log_search(struct btrfs_fs_info *fs_info, u64 start, u64 min_seq,
>> +		      int smallest)
>> +{
>> +	struct rb_root *tm_root;
>> +	struct rb_node *node;
>> +	struct tree_mod_elem *cur = NULL;
>> +	struct tree_mod_elem *found = NULL;
>> +	u64 index = start >> PAGE_CACHE_SHIFT;
>> +
>> +	read_lock(&fs_info->tree_mod_log_lock);
>> +	tm_root = &fs_info->tree_mod_log;
>> +	node = tm_root->rb_node;
>> +	while (node) {
>> +		cur = container_of(node, struct tree_mod_elem, node);
>> +		if (cur->index < index) {
>> +			node = node->rb_left;
>> +		} else if (cur->index > index) {
>> +			node = node->rb_right;
>> +		} else if (cur->elem.seq < min_seq) {
>> +			node = node->rb_left;
>> +		} else if (!smallest) {
>> +			/* we want the node with the highest seq */
>> +			if (found)
>> +				BUG_ON(found->elem.seq > cur->elem.seq);
>> +			found = cur;
>> +			node = node->rb_left;
>> +		} else if (cur->elem.seq > min_seq) {
>> +			/* we want the node with the smallest seq */
>> +			if (found)
>> +				BUG_ON(found->elem.seq < cur->elem.seq);
>> +			found = cur;
>> +			node = node->rb_right;
>> +		} else {
>
> I think read_unlock() is necessary for here.

Right, I'll add that. Strange lockdep didn't catch this one.

Thanks for looking!
-Jan
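The lookup Tsutomu and Jan are discussing keys the tree mod log by (block index, sequence number). Stripped of the rb-tree and the locking, the semantics of `__tree_mod_log_search` with `smallest == 0` can be sketched in user space (hypothetical names, a plain array in place of the rb-tree):

```c
#include <assert.h>
#include <stddef.h>

/* toy log element: the kernel keys on eb->start >> PAGE_CACHE_SHIFT */
struct toy_tm {
	unsigned long long index;
	unsigned long long seq;
};

/*
 * find the log entry for `index` with the highest seq that is still
 * >= min_seq, or NULL if the block has no such entry
 */
static struct toy_tm *toy_tm_search(struct toy_tm *log, int n,
				    unsigned long long index,
				    unsigned long long min_seq)
{
	struct toy_tm *found = NULL;
	int i;

	for (i = 0; i < n; i++) {
		if (log[i].index != index || log[i].seq < min_seq)
			continue;
		/* like smallest == 0 above: keep the highest seq */
		if (!found || log[i].seq > found->seq)
			found = &log[i];
	}
	return found;
}
```

The rb-tree in the patch gives the same answer in O(log n) by descending on (index, seq) instead of scanning, which is why `__tree_mod_log_insert` sorts on exactly those two fields.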
[cc to the list]

On Sun, May 20, 2012 at 19:00 (+0200), Andrei Popa wrote:
> For testing qgroup, the patches for btrfs-progs sent last year are ok ?
> http://www.mail-archive.com/linux-btrfs@vger.kernel.org/msg12724.html

Yes, that's the link to the patches required. Sorry for missing it in my
original post.

-Jan