thr3ads.net - Btrfs devel - [PATCH v1 0/6] Btrfs: scrub: print path to corrupted files and trigger nodatasum fixup [Jun 2011]

If this information is useful, please help other people find it:
Share via:

Jan Schmidt

2011-Jun-13 19:10 UTC

[PATCH v1 0/6] Btrfs: scrub: print path to corrupted files and trigger nodatasum fixup

This patch set introduces two new features for scrub. They share the backref
iteration code which is the reason they made it into the same patch set.

The first feature adds printk statements in case scrub finds an error which list
all affected files. You will need patch 1, 2 and 3 for that.

The second feature adds the trigger which enables us to correct i/o errors in
case the affected extent does not have a checksum (nodatasum), eventually. You
will need patch 1, 4, 5 and 6 for that.

I tried to apply all patches to the current cmason/for-linus branch and to
Arne''s current for-chris branch. They do apply with no errors (some
offsets
possible).

Please review.

Next I''m starting to make up my mind how to implement on-the-fly error
correction correctly. This will enable us to rewrite good data whenever we
encounter a bad copy. I have some preliminary patches already, the stress in the
first sentence is on "correctly". The second feature mentioned in this
patch
series will then automatically use that code, too.

-Jan

Jan Schmidt (6):
  added helper functions to iterate backrefs
  scrub: added unverified_errors
  scrub: print paths of corrupted files
  scrub: bugfix: mirror_num off by one
  add mirror_num to extent_read_full_page
  scrub: add fixup code for errors on nodatasum files

 fs/btrfs/Makefile    |    3 +-
 fs/btrfs/backref.c   |  461 ++++++++++++++++++++++++++++++++++++++++++++++++++
 fs/btrfs/backref.h   |   59 +++++++
 fs/btrfs/disk-io.c   |    2 +-
 fs/btrfs/extent_io.c |    6 +-
 fs/btrfs/extent_io.h |    3 +-
 fs/btrfs/inode.c     |    2 +-
 fs/btrfs/scrub.c     |  361 ++++++++++++++++++++++++++++++++++++---
 8 files changed, 866 insertions(+), 31 deletions(-)
 create mode 100644 fs/btrfs/backref.c
 create mode 100644 fs/btrfs/backref.h

-- 
1.7.3.4

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs"
in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Jan Schmidt

2011-Jun-13 19:10 UTC

head link

[PATCH v1 1/6] added helper functions to iterate backrefs

These helper functions iterate back references and call a function for each
backref. There is also a function to resolve an inode to a path in the
file system.

Signed-off-by: Jan Schmidt <list.btrfs@jan-o-sch.net>
---
 fs/btrfs/Makefile  |    3 +-
 fs/btrfs/backref.c |  461 ++++++++++++++++++++++++++++++++++++++++++++++++++++
 fs/btrfs/backref.h |   59 +++++++
 3 files changed, 522 insertions(+), 1 deletions(-)

diff --git a/fs/btrfs/Makefile b/fs/btrfs/Makefile
index 9b72dcf..c63f649 100644
--- a/fs/btrfs/Makefile
+++ b/fs/btrfs/Makefile
@@ -7,4 +7,5 @@ btrfs-y += super.o ctree.o extent-tree.o print-tree.o
root-tree.o dir-item.o \
 	   extent_map.o sysfs.o struct-funcs.o xattr.o ordered-data.o \
 	   extent_io.o volumes.o async-thread.o ioctl.o locking.o orphan.o \
 	   export.o tree-log.o acl.o free-space-cache.o zlib.o lzo.o \
-	   compression.o delayed-ref.o relocation.o delayed-inode.o scrub.o
+	   compression.o delayed-ref.o relocation.o delayed-inode.o backref.o \
+	   scrub.o
diff --git a/fs/btrfs/backref.c b/fs/btrfs/backref.c
new file mode 100644
index 0000000..307dfb5
--- /dev/null
+++ b/fs/btrfs/backref.c
@@ -0,0 +1,461 @@
+/*
+ * Copyright (C) 2011 STRATO.  All rights reserved.
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public
+ * License v2 as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+ * General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public
+ * License along with this program; if not, write to the
+ * Free Software Foundation, Inc., 59 Temple Place - Suite 330,
+ * Boston, MA 021110-1307, USA.
+ */
+
+#include "ctree.h"
+#include "backref.h"
+
+/*
+ * this makes the path point to (inum INODE_ITEM ioff)
+ */
+int inode_item_info(u64 inum, u64 ioff, struct btrfs_root *fs_root,
+			struct btrfs_path *path)
+{
+	int ret;
+	struct btrfs_key key;
+	struct extent_buffer *eb;
+	struct btrfs_key found_key;
+
+	key.type = BTRFS_INODE_ITEM_KEY;
+	key.objectid = inum;
+	key.offset = ioff;
+
+	ret = btrfs_search_slot(NULL, fs_root, &key, path, 0, 0);
+	if (ret < 0)
+		return ret;
+
+	eb = path->nodes[0];
+	if (ret && path->slots[0] >= btrfs_header_nritems(eb)) {
+		ret = btrfs_next_leaf(fs_root, path);
+		if (ret)
+			return ret;
+		eb = path->nodes[0];
+	}
+
+	btrfs_item_key_to_cpu(eb, &found_key, path->slots[0]);
+	if (found_key.type != key.type || found_key.objectid != key.objectid)
+		return 1;
+
+	return 0;
+}
+
+static int inode_ref_info(u64 inum, u64 ioff, struct btrfs_root *fs_root,
+				struct btrfs_path *path, int strict,
+				u64 *dest_parent_inum,
+				struct extent_buffer **dest_iref_eb,
+				int *dest_slot)
+{
+	int ret;
+	struct btrfs_key key;
+	struct extent_buffer *eb;
+	struct btrfs_key found_key;
+
+	key.type = BTRFS_INODE_REF_KEY;
+	key.objectid = inum;
+	key.offset = ioff;
+
+	ret = btrfs_search_slot(NULL, fs_root, &key, path, 0, 0);
+	if (ret < 0)
+		goto out;
+
+	eb = path->nodes[0];
+	if (ret && path->slots[0] >= btrfs_header_nritems(eb)) {
+		ret = btrfs_next_leaf(fs_root, path);
+		if (ret)
+			goto out;
+		eb = path->nodes[0];
+	}
+
+	btrfs_item_key_to_cpu(eb, &found_key, path->slots[0]);
+	if (found_key.type != key.type || found_key.objectid != key.objectid) {
+		ret = 1;
+		goto out;
+	}
+
+	ret = 0;
+	if (dest_parent_inum)
+		*dest_parent_inum = found_key.offset;
+	if (dest_iref_eb)
+		*dest_iref_eb = eb;
+	if (dest_slot)
+		*dest_slot = path->slots[0];
+
+out:
+	btrfs_release_path(path);
+	return ret;
+}
+
+/*
+ * this iterates to turn a btrfs_inode_ref into a full filesystem path.
elements
+ * of the path are separated by ''/'' and the path is
guaranteed to be
+ * 0-terminated. the path is only given within the current file system.
+ * Therefore, it never starts with a ''/''. the caller is
responsible to provide
+ * "size" bytes in "dest". the dest buffer will be filled
backwards! the idea is
+ * that in case of an overflow, the lower part in the hierarchie is most
+ * important to the user. finally, the start point of resulting the string is
+ * returned. in case the path would overflow, "..." is added at the
front of
+ * the string and iteration stops regularly.
+ */
+static char *iref_to_path(struct btrfs_root *fs_root, struct btrfs_path *path,
+				struct btrfs_inode_ref *iref,
+				struct extent_buffer *eb, u64 parent,
+				char *dest, u32 size)
+{
+	u32 len;
+	int slot;
+	u64 inum;
+	int ret;
+	u32 left = size - 1;
+
+	dest[left] = ''\0'';
+
+	while (1) {
+		len = btrfs_inode_ref_name_len(eb, iref);
+		if (len > left) {
+			if (size < 4)
+				return dest+left;
+			if (left > 3)
+				dest += left - 3;
+			memcpy(dest, "...", 3);
+			return dest;
+		}
+		left -= len;
+		read_extent_buffer(eb, dest+left,
+					(unsigned long)(iref + 1), len);
+
+		ret = inode_item_info(parent, 0, fs_root, path);
+		if (ret)
+			return ERR_PTR(ret);
+		eb = path->nodes[0];
+		btrfs_release_path(path);
+
+		ret = inode_ref_info(parent, 0, fs_root, path, 0,
+					&inum, NULL, &slot);
+		if (ret)
+			return ERR_PTR(ret);
+
+		if (parent == inum)   /* regular exit ahead */ {
+			return dest+left;
+		}
+
+		iref = btrfs_item_ptr(eb, slot, struct btrfs_inode_ref);
+		parent = inum;
+		if (left > 0) {
+			dest[left-1] = ''/'';
+			--left;
+		}
+	}
+}
+
+/*
+ * this makes the path point to (logical EXTENT_ITEM *)
+ * returns 0 for data blocks, 1 for tree blocks and <0 on error
+ */
+int data_extent_from_logical(struct btrfs_root *root, u64 logical,
+				struct btrfs_path *path,
+				struct btrfs_key *found_key)
+{
+	int ret;
+	u64 flags;
+	u32 item_size;
+	struct extent_buffer *eb;
+	struct btrfs_extent_item *ei;
+	struct btrfs_key key;
+	struct btrfs_key f;
+	struct btrfs_key *fk = found_key ? found_key : &f;
+
+	key.type = BTRFS_EXTENT_ITEM_KEY;
+	key.objectid = logical;
+	key.offset = (u64)-1;
+
+	ret = btrfs_search_slot(NULL, root, &key, path, 0, 0);
+	if (ret < 0)
+		return ret;
+	ret = btrfs_previous_item(root->fs_info->extent_root, path,
+					0, BTRFS_EXTENT_ITEM_KEY);
+	if (ret < 0)
+		return ret;
+	btrfs_item_key_to_cpu(path->nodes[0], fk, path->slots[0]);
+	if (fk->type != BTRFS_EXTENT_ITEM_KEY || fk->objectid > logical ||
+	    fk->objectid + fk->offset <= logical)
+		return -1;
+
+	eb = path->nodes[0];
+	item_size = btrfs_item_size_nr(eb, path->slots[0]);
+	BUG_ON(item_size < sizeof(*ei));
+
+	ei = btrfs_item_ptr(eb, path->slots[0], struct btrfs_extent_item);
+	flags = btrfs_extent_flags(eb, ei);
+
+	if (flags & BTRFS_EXTENT_FLAG_TREE_BLOCK)
+		return 1;
+
+	return 0;
+}
+
+/*
+ * helper function to iterate extent backrefs. ptr must point to a 0 value for
+ * the first call and may be modified. it is used to track state.
+ * if more backrefs exist, 0 is returned and the next call to _get_extent_ref
+ * must pass the modified ptr parameter to get to the next backref.
+ * after the last backref was processed, 1 is returned.
+ * returns <0 on error
+ */
+static int _get_extent_ref(int flags_wanted, int type_wanted,
+				unsigned long *ptr, struct extent_buffer *eb,
+				struct btrfs_extent_item *ei, u32 item_size,
+				struct btrfs_extent_inline_ref **eiref)
+{
+	int type;
+	int again = 0;
+	unsigned long end;
+	u64 flags;
+	struct btrfs_tree_block_info *info;
+
+	if (!*ptr) {
+		/* first call */
+		flags = btrfs_extent_flags(eb, ei);
+		if (!(flags & flags_wanted))
+			return -1;
+		if (flags & BTRFS_EXTENT_FLAG_TREE_BLOCK) {
+			info = (struct btrfs_tree_block_info *)(ei + 1);
+			*eiref = (struct btrfs_extent_inline_ref *)(info + 1);
+		} else {
+			*eiref = (struct btrfs_extent_inline_ref *)(ei + 1);
+		}
+		*ptr = (unsigned long)*eiref;
+	}
+
+	end = (unsigned long)ei + item_size;
+
+again:
+	again = 0;
+
+	*eiref = (struct btrfs_extent_inline_ref *)*ptr;
+	type = btrfs_extent_inline_ref_type(eb, *eiref);
+
+	if (type != type_wanted)
+		again = 1;
+
+	*ptr += btrfs_extent_inline_ref_size(type);
+
+	WARN_ON(*ptr > end);
+	if (*ptr == end)
+		return 1; /* last */
+
+	if (again)
+		goto again;
+
+	return 0;
+}
+
+/*
+ * reads the tree block backref for an extent. tree level and root are returned
+ * through dest_level and dest_root. ptr must point to a 0 value for the first
+ * call and may be modified (see _get_extent_ref comment).
+ * returns 0 on success, <0 on error. note: in contrast to _get_extent_ref
this
+ * one never returns 1!
+ */
+int tree_backref_for_extent(unsigned long *ptr, struct extent_buffer *eb,
+				struct btrfs_extent_item *ei, u32 item_size,
+				u64 *dest_root, u64 *dest_level)
+{
+	int ret;
+	struct btrfs_tree_block_info *info;
+	struct btrfs_extent_inline_ref *eiref;
+
+	ret = _get_extent_ref(BTRFS_EXTENT_FLAG_TREE_BLOCK,
+				BTRFS_TREE_BLOCK_REF_KEY, ptr, eb, ei,
+				item_size, &eiref);
+	if (ret < 0)
+		return ret;
+
+	WARN_ON(!ret); /* multiple tree backrefs for this extent */
+
+	info = (struct btrfs_tree_block_info *)(ei + 1);
+	*dest_root = btrfs_extent_inline_ref_offset(eb, eiref);
+	*dest_level = btrfs_tree_block_level(eb, info);
+
+	return 0;
+}
+
+static int data_inode_for_extent(unsigned long *ptr, struct extent_buffer *eb,
+					struct btrfs_extent_item *ei,
+					u32 item_size, u64 *dest_inum,
+					u64 *dest_ioff)
+{
+	int ret;
+	struct btrfs_extent_inline_ref *eiref;
+	struct btrfs_extent_data_ref *dref;
+
+	ret = _get_extent_ref(BTRFS_EXTENT_FLAG_DATA, BTRFS_EXTENT_DATA_REF_KEY,
+				ptr, eb, ei, item_size, &eiref);
+	if (ret < 0)
+		return ret;
+
+	dref = (struct btrfs_extent_data_ref *)(&eiref->offset);
+	if (btrfs_extent_data_ref_root(eb, dref) != BTRFS_FS_TREE_OBJECTID) {
+		WARN_ON(1);
+		return -1;
+	}
+
+	*dest_inum = btrfs_extent_data_ref_objectid(eb, dref);
+	*dest_ioff = btrfs_extent_data_ref_offset(eb, dref);
+
+	return ret;
+}
+
+/*
+ * calls iterate() for every inode that references the extent identified by
+ * the given parameters.
+ * when the iterator function returns a non-zero value, iteration stops.
+ */
+int iterate_extent_inodes(struct extent_buffer *eb,
+				struct btrfs_extent_item *ei,
+				loff_t extent_item_offset, u32 item_size,
+				iterate_extent_inodes_t *iterate, void *ctx)
+{
+	int last;
+	u64 inum;
+	unsigned long ptr = 0;
+	loff_t extent_data_item_offset;
+	int ret;
+
+	do {
+		last = data_inode_for_extent(&ptr, eb, ei, item_size, &inum,
+						&extent_data_item_offset);
+		if (last < 0)
+			return last;
+
+		ret = iterate(inum, extent_item_offset+extent_data_item_offset,
+				ctx);
+		if (ret)
+			return ret;
+
+	} while (!last);
+
+	return 0;
+}
+
+/*
+ * just a convenience function combining data_extent_from_logical and
+ * iterate_extent_inodes.
+ */
+int iterate_inodes_from_logical(u64 logical, struct btrfs_fs_info *fs_info,
+				struct btrfs_path *path,
+				iterate_extent_inodes_t *iterate, void *ctx)
+{
+	int ret;
+	u32 item_size;
+	struct extent_buffer *l;
+	struct btrfs_extent_item *extent;
+	loff_t offset;
+	struct btrfs_key found_key;
+
+	ret = data_extent_from_logical(fs_info->extent_root, logical, path,
+					&found_key);
+	if (ret)
+		return ret;
+
+	offset = logical - found_key.objectid;
+	l = path->nodes[0];
+	extent = btrfs_item_ptr(l, path->slots[0], struct btrfs_extent_item);
+	item_size = btrfs_item_size_nr(l, path->slots[0]);
+	btrfs_release_path(path);
+
+	ret = iterate_extent_inodes(l, extent, offset, item_size, iterate, ctx);
+
+	return ret;
+}
+
+static int iterate_irefs(u64 inum, struct extent_buffer *eb_i,
+				struct btrfs_root *fs_root,
+				struct btrfs_path *path,
+				iterate_irefs_t *iterate, void *ctx)
+{
+	int ret;
+	int slot;
+	u32 cur;
+	u32 len;
+	u32 name_len;
+	u64 parent = 0;
+	struct extent_buffer *eb_ir;
+	struct btrfs_item *item;
+	struct btrfs_inode_ref *iref;
+
+	while (1) {
+		ret = inode_ref_info(inum, parent ? parent+1 : 0, fs_root, path,
+					1, &parent, &eb_ir, &slot);
+		if (ret < 0)
+			return ret;
+		if (ret)
+			break;
+
+		item = btrfs_item_nr(eb_i, slot);
+		iref = btrfs_item_ptr(eb_i, slot, struct btrfs_inode_ref);
+
+		for (cur = 0; cur < btrfs_item_size(eb_i, item); cur += len) {
+			name_len = btrfs_inode_ref_name_len(eb_i, iref);
+			ret = iterate(parent, iref, eb_ir, slot, ctx);
+			if (ret)
+				return ret;
+			len = sizeof(*iref) + name_len;
+			iref = (struct btrfs_inode_ref *)((char *)iref + len);
+		}
+	}
+
+	return 0;
+}
+
+/*
+ * returns 0 if the path could be dumped (probably truncated)
+ * returns <0 in case of an error
+ */
+static int inode_to_path(u64 inum, struct btrfs_inode_ref *iref,
+				struct extent_buffer *eb_ir, int slot,
+				void *ctx)
+{
+	struct inode_fs_paths *ipath = ctx;
+	struct extent_buffer *eb_i = ipath->eb_i;
+	u32 path_len;
+	char *fs_path;
+
+	if (ipath->left < 2)
+		return -EOVERFLOW;
+
+	*ipath->dest++ = '' '';
+	--ipath->left;
+
+	fs_path = iref_to_path(ipath->fs_root, ipath->path, iref, eb_i,
+				inum, ipath->scratch_buf, ipath->left);
+	if (!fs_path || IS_ERR(fs_path))
+		return PTR_ERR(fs_path);
+	path_len = ipath->scratch_buf + ipath->left - fs_path - 1;
+	if (path_len+1 > ipath->left)
+		return -EOVERFLOW;
+	memcpy(ipath->dest, fs_path, path_len+1);
+	ipath->left -= path_len;
+	ipath->dest += path_len;
+
+	return 0;
+}
+
+int paths_from_inode(u64 inum, struct inode_fs_paths *ipath)
+{
+	return iterate_irefs(inum, ipath->eb_i, ipath->fs_root, ipath->path,
+				inode_to_path, ipath);
+}
diff --git a/fs/btrfs/backref.h b/fs/btrfs/backref.h
new file mode 100644
index 0000000..6b7e856
--- /dev/null
+++ b/fs/btrfs/backref.h
@@ -0,0 +1,59 @@
+/*
+ * Copyright (C) 2011 STRATO.  All rights reserved.
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public
+ * License v2 as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+ * General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public
+ * License along with this program; if not, write to the
+ * Free Software Foundation, Inc., 59 Temple Place - Suite 330,
+ * Boston, MA 021110-1307, USA.
+ */
+
+#ifndef __BTRFS_BACKREF__
+#define __BTRFS_BACKREF__
+
+struct inode_fs_paths {
+	int			left;
+	char			*dest;
+	struct btrfs_path	*path;
+	char			*scratch_buf;
+	struct btrfs_root	*fs_root;
+	int			scratch_bufsize;
+	struct extent_buffer	*eb_i;
+};
+
+typedef int (iterate_extent_inodes_t)(u64 inum, loff_t offset, void *ctx);
+typedef int (iterate_irefs_t)(u64 parent, struct btrfs_inode_ref *iref,
+				struct extent_buffer *eb_ir,
+				int slot, void *ctx);
+
+int inode_item_info(u64 inum, u64 ioff, struct btrfs_root *fs_root,
+			struct btrfs_path *path);
+
+int data_extent_from_logical(struct btrfs_root *root, u64 logical,
+				struct btrfs_path *path,
+				struct btrfs_key *found_key);
+
+int tree_backref_for_extent(unsigned long *ptr, struct extent_buffer *eb,
+				struct btrfs_extent_item *ei, u32 item_size,
+				u64 *dest_root, u64 *dest_level);
+
+int iterate_extent_inodes(struct extent_buffer *eb,
+				struct btrfs_extent_item *ei,
+				loff_t extent_item_offset, u32 item_size,
+				iterate_extent_inodes_t *iterate, void *ctx);
+
+int iterate_inodes_from_logical(u64 logical, struct btrfs_fs_info *fs_info,
+				struct btrfs_path *path,
+				iterate_extent_inodes_t *iterate, void *ctx);
+
+int paths_from_inode(u64 inum, struct inode_fs_paths *ipath);
+
+#endif
-- 
1.7.3.4

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs"
in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Jan Schmidt

2011-Jun-13 19:10 UTC

head link

[PATCH v1 2/6] scrub: added unverified_errors

In normal operation, scrub is reading data sequentially in large portions.
In case of an i/o error, we try to find the corrupted area(s) by issuing
page sized read requests. With this commit we increment the
unverified_errors counter if all of the small size requests succeed.

Userland patches carrying such conspicous events to the administrator should
already be around.

Signed-off-by: Jan Schmidt <list.btrfs@jan-o-sch.net>
---
 fs/btrfs/scrub.c |   37 ++++++++++++++++++++++++++-----------
 1 files changed, 26 insertions(+), 11 deletions(-)

diff --git a/fs/btrfs/scrub.c b/fs/btrfs/scrub.c
index d5a4108..00e4e58 100644
--- a/fs/btrfs/scrub.c
+++ b/fs/btrfs/scrub.c
@@ -207,18 +207,25 @@ nomem:
  * recheck_error gets called for every page in the bio, even though only
  * one may be bad
  */
-static void scrub_recheck_error(struct scrub_bio *sbio, int ix)
+static int scrub_recheck_error(struct scrub_bio *sbio, int ix)
 {
+	struct scrub_dev *sdev = sbio->sdev;
+	u64 sector = (sbio->physical + ix * PAGE_SIZE) >> 9;
+
 	if (sbio->err) {
-		if (scrub_fixup_io(READ, sbio->sdev->dev->bdev,
-				   (sbio->physical + ix * PAGE_SIZE) >> 9,
+		if (scrub_fixup_io(READ, sbio->sdev->dev->bdev, sector,
 				   sbio->bio->bi_io_vec[ix].bv_page) == 0) {
 			if (scrub_fixup_check(sbio, ix) == 0)
-				return;
+				return 0;
 		}
 	}
 
+	spin_lock(&sdev->stat_lock);
+	++sdev->stat.read_errors;
+	spin_unlock(&sdev->stat_lock);
+
 	scrub_fixup(sbio, ix);
+	return 1;
 }
 
 static int scrub_fixup_check(struct scrub_bio *sbio, int ix)
@@ -388,8 +395,14 @@ static void scrub_checksum(struct btrfs_work *work)
 	int ret;
 
 	if (sbio->err) {
+		ret = 0;
 		for (i = 0; i < sbio->count; ++i)
-			scrub_recheck_error(sbio, i);
+			ret |= scrub_recheck_error(sbio, i);
+		if (!ret) {
+			spin_lock(&sdev->stat_lock);
+			++sdev->stat.unverified_errors;
+			spin_unlock(&sdev->stat_lock);
+		}
 
 		sbio->bio->bi_flags &= ~(BIO_POOL_MASK - 1);
 		sbio->bio->bi_flags |= 1 << BIO_UPTODATE;
@@ -402,10 +415,6 @@ static void scrub_checksum(struct btrfs_work *work)
 			bi->bv_offset = 0;
 			bi->bv_len = PAGE_SIZE;
 		}
-
-		spin_lock(&sdev->stat_lock);
-		++sdev->stat.read_errors;
-		spin_unlock(&sdev->stat_lock);
 		goto out;
 	}
 	for (i = 0; i < sbio->count; ++i) {
@@ -426,8 +435,14 @@ static void scrub_checksum(struct btrfs_work *work)
 			WARN_ON(1);
 		}
 		kunmap_atomic(buffer, KM_USER0);
-		if (ret)
-			scrub_recheck_error(sbio, i);
+		if (ret) {
+			ret = scrub_recheck_error(sbio, i);
+			if (!ret) {
+				spin_lock(&sdev->stat_lock);
+				++sdev->stat.unverified_errors;
+				spin_unlock(&sdev->stat_lock);
+			}
+		}
 	}
 
 out:
-- 
1.7.3.4

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs"
in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Jan Schmidt

2011-Jun-13 19:10 UTC

head link

[PATCH v1 3/6] scrub: print paths of corrupted files

While scrubbing, we may encounter various errors. Previously, a logical
address was printed to the log only. Now, all paths belonging to that
address are resolved and printed separately. That should work for hardlinks
as well as reflinks.

Signed-off-by: Jan Schmidt <list.btrfs@jan-o-sch.net>
---
 fs/btrfs/scrub.c |  140 ++++++++++++++++++++++++++++++++++++++++++++++++++++-
 1 files changed, 137 insertions(+), 3 deletions(-)

diff --git a/fs/btrfs/scrub.c b/fs/btrfs/scrub.c
index 00e4e58..e294d76 100644
--- a/fs/btrfs/scrub.c
+++ b/fs/btrfs/scrub.c
@@ -27,6 +27,7 @@
 #include "volumes.h"
 #include "disk-io.h"
 #include "ordered-data.h"
+#include "backref.h"
 
 /*
  * This is only the first step towards a full-features scrub. It reads all
@@ -106,6 +107,19 @@ struct scrub_dev {
 	spinlock_t		stat_lock;
 };
 
+struct scrub_warning {
+	struct btrfs_path	*path;
+	u64			extent_item_size;
+	char			*scratch_buf;
+	char			*msg_buf;
+	const char		*errstr;
+	u64			sector;
+	u64			logical;
+	struct btrfs_device	*dev;
+	int			msg_bufsize;
+	int			scratch_bufsize;
+};
+
 static void scrub_free_csums(struct scrub_dev *sdev)
 {
 	while (!list_empty(&sdev->csum_list)) {
@@ -201,6 +215,121 @@ nomem:
 	return ERR_PTR(-ENOMEM);
 }
 
+static int scrub_print_warning_inode(u64 inum, loff_t offset, void *ctx)
+{
+	u64 isize;
+	int ret;
+	struct extent_buffer *eb_i;
+	struct btrfs_inode_item *ii;
+	struct scrub_warning *swarn = ctx;
+	struct btrfs_fs_info *fs_info = swarn->dev->dev_root->fs_info;
+	struct inode_fs_paths ipath;
+
+	ret = inode_item_info(inum, 0, fs_info->fs_root, swarn->path);
+	if (ret) {
+err:
+		printk(KERN_WARNING "btrfs: %s at logical %llu on dev "
+			"%s, sector %llu, inode %llu, offset %llu: path "
+			"resolving failed with ret=%d\n", swarn->errstr,
+			swarn->logical, swarn->dev->name, swarn->sector, inum,
+			offset, ret);
+		return 0;
+	}
+	eb_i = swarn->path->nodes[0];
+	ii = btrfs_item_ptr(eb_i, swarn->path->slots[0],
+				struct btrfs_inode_item);
+	btrfs_release_path(swarn->path);
+
+	isize = btrfs_inode_size(eb_i, ii);
+
+	ipath.left = swarn->msg_bufsize - 1;
+	ipath.dest = swarn->msg_buf;
+	ipath.path = swarn->path;
+	ipath.scratch_buf = swarn->scratch_buf;
+	ipath.scratch_bufsize = swarn->scratch_bufsize;
+	ipath.fs_root = fs_info->fs_root;
+	ipath.eb_i = eb_i;
+
+	ret = paths_from_inode(inum, &ipath);
+	if (ret < 0)
+		goto err;
+
+	printk(KERN_WARNING "btrfs: %s at logical %llu on dev "
+		"%s, sector %llu, inode %llu, offset %llu, "
+		"length %llu, links %u (path:%s)\n", swarn->errstr,
+		swarn->logical, swarn->dev->name, swarn->sector, inum, offset,
+		min(isize - offset, (u64)PAGE_SIZE),
+		btrfs_inode_nlink(eb_i, ii), swarn->msg_buf);
+
+	return 0;
+}
+
+static void scrub_print_warning(const char *errstr, struct scrub_bio *sbio,
+				int ix)
+{
+	struct btrfs_device *dev = sbio->sdev->dev;
+	struct btrfs_fs_info *fs_info = dev->dev_root->fs_info;
+	struct btrfs_path *path;
+	struct btrfs_key found_key;
+	struct extent_buffer *eb;
+	struct btrfs_extent_item *ei;
+	struct scrub_warning swarn;
+	u32 item_size;
+	int ret;
+	u64 ref_root;
+	u64 ref_level;
+	unsigned long ptr = 0;
+	static const int bufsize = 4096;
+	loff_t extent_item_offset;
+
+	path = btrfs_alloc_path();
+
+	swarn.scratch_buf = kmalloc(bufsize, GFP_NOFS);
+	swarn.msg_buf = kmalloc(bufsize, GFP_NOFS);
+	swarn.sector = (sbio->physical + ix * PAGE_SIZE) >> 9;
+	swarn.logical = sbio->logical + ix * PAGE_SIZE;
+	swarn.errstr = errstr;
+	swarn.dev = dev;
+	swarn.msg_bufsize = bufsize;
+	swarn.scratch_bufsize = bufsize;
+
+	if (!path || !swarn.scratch_buf || !swarn.msg_buf)
+		goto out;
+
+	ret = data_extent_from_logical(fs_info->extent_root,
+					swarn.logical, path, &found_key);
+	if (ret < 0)
+		goto out;
+
+	extent_item_offset = swarn.logical - found_key.objectid;
+	swarn.extent_item_size = found_key.offset;
+
+	eb = path->nodes[0];
+	ei = btrfs_item_ptr(eb, path->slots[0], struct btrfs_extent_item);
+	item_size = btrfs_item_size_nr(eb, path->slots[0]);
+
+	btrfs_release_path(path);
+
+	if (ret) {
+		ret = tree_backref_for_extent(&ptr, eb, ei, item_size,
+						&ref_root, &ref_level);
+		printk(KERN_WARNING "%s at logical %llu on dev %s, "
+			"sector %llu: metadata %s (level %llu) in tree %llu\n",
+			errstr, swarn.logical, dev->name, swarn.sector,
+			ref_level ? "node" : "leaf", ret < 0 ? -1 :
ref_level,
+			ret < 0 ? -1 : ref_root);
+	} else {
+		swarn.path = path;
+		iterate_extent_inodes(eb, ei, extent_item_offset, item_size,
+					scrub_print_warning_inode, &swarn);
+	}
+
+out:
+	btrfs_free_path(path);
+	kfree(swarn.scratch_buf);
+	kfree(swarn.msg_buf);
+}
+
 /*
  * scrub_recheck_error gets called when either verification of the page
  * failed or the bio failed to read, e.g. with EIO. In the latter case,
@@ -218,6 +347,11 @@ static int scrub_recheck_error(struct scrub_bio *sbio, int
ix)
 			if (scrub_fixup_check(sbio, ix) == 0)
 				return 0;
 		}
+		if (printk_ratelimit())
+			scrub_print_warning("i/o error", sbio, ix);
+	} else {
+		if (printk_ratelimit())
+			scrub_print_warning("checksum error", sbio, ix);
 	}
 
 	spin_lock(&sdev->stat_lock);
@@ -333,7 +467,7 @@ static void scrub_fixup(struct scrub_bio *sbio, int ix)
 	spin_unlock(&sdev->stat_lock);
 
 	if (printk_ratelimit())
-		printk(KERN_ERR "btrfs: fixed up at %llu\n",
+		printk(KERN_ERR "btrfs: fixed up error at logical %llu\n",
 		       (unsigned long long)logical);
 	return;
 
@@ -344,8 +478,8 @@ uncorrectable:
 	spin_unlock(&sdev->stat_lock);
 
 	if (printk_ratelimit())
-		printk(KERN_ERR "btrfs: unable to fixup at %llu\n",
-			 (unsigned long long)logical);
+		printk(KERN_ERR "btrfs: unable to fixup (regular) error at "
+			"logical %llu\n", (unsigned long long)logical);
 }
 
 static int scrub_fixup_io(int rw, struct block_device *bdev, sector_t sector,
-- 
1.7.3.4

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs"
in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Jan Schmidt

2011-Jun-13 19:10 UTC

head link

[PATCH v1 4/6] scrub: bugfix: mirror_num off by one

Fix the mirror_num determination in scrub_stripe. The rest of the scrub code
did not use mirror_num for anything important and that error went unnoticed.
The nodatasum fixup patch of this set depends on a correct mirror_num.

Signed-off-by: Jan Schmidt <list.btrfs@jan-o-sch.net>
---
 fs/btrfs/scrub.c |   10 +++++-----
 1 files changed, 5 insertions(+), 5 deletions(-)

diff --git a/fs/btrfs/scrub.c b/fs/btrfs/scrub.c
index e294d76..ec29ce8 100644
--- a/fs/btrfs/scrub.c
+++ b/fs/btrfs/scrub.c
@@ -913,21 +913,21 @@ static noinline_for_stack int scrub_stripe(struct
scrub_dev *sdev,
 	if (map->type & BTRFS_BLOCK_GROUP_RAID0) {
 		offset = map->stripe_len * num;
 		increment = map->stripe_len * map->num_stripes;
-		mirror_num = 0;
+		mirror_num = 1;
 	} else if (map->type & BTRFS_BLOCK_GROUP_RAID10) {
 		int factor = map->num_stripes / map->sub_stripes;
 		offset = map->stripe_len * (num / map->sub_stripes);
 		increment = map->stripe_len * factor;
-		mirror_num = num % map->sub_stripes;
+		mirror_num = num % map->sub_stripes + 1;
 	} else if (map->type & BTRFS_BLOCK_GROUP_RAID1) {
 		increment = map->stripe_len;
-		mirror_num = num % map->num_stripes;
+		mirror_num = num % map->num_stripes + 1;
 	} else if (map->type & BTRFS_BLOCK_GROUP_DUP) {
 		increment = map->stripe_len;
-		mirror_num = num % map->num_stripes;
+		mirror_num = num % map->num_stripes + 1;
 	} else {
 		increment = map->stripe_len;
-		mirror_num = 0;
+		mirror_num = 1;
 	}
 
 	path = btrfs_alloc_path();
-- 
1.7.3.4

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs"
in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Jan Schmidt

2011-Jun-13 19:10 UTC

head link

[PATCH v1 5/6] add mirror_num to extent_read_full_page

Currently, extent_read_full_page always assumes we are trying to read mirror
0, which generally is the best we can do. To add flexibility, pass it as a
parameter. This will be needed by scrub fixup code.

Signed-off-by: Jan Schmidt <list.btrfs@jan-o-sch.net>
---
 fs/btrfs/disk-io.c   |    2 +-
 fs/btrfs/extent_io.c |    6 +++---
 fs/btrfs/extent_io.h |    2 +-
 fs/btrfs/inode.c     |    2 +-
 4 files changed, 6 insertions(+), 6 deletions(-)

diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index a203d36..daf1e47 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -874,7 +874,7 @@ static int btree_readpage(struct file *file, struct page
*page)
 {
 	struct extent_io_tree *tree;
 	tree = &BTRFS_I(page->mapping->host)->io_tree;
-	return extent_read_full_page(tree, page, btree_get_extent);
+	return extent_read_full_page(tree, page, btree_get_extent, 0);
 }
 
 static int btree_releasepage(struct page *page, gfp_t gfp_flags)
diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
index b181a94..b78f665 100644
--- a/fs/btrfs/extent_io.c
+++ b/fs/btrfs/extent_io.c
@@ -2111,16 +2111,16 @@ static int __extent_read_full_page(struct extent_io_tree
*tree,
 }
 
 int extent_read_full_page(struct extent_io_tree *tree, struct page *page,
-			    get_extent_t *get_extent)
+			    get_extent_t *get_extent, int mirror_num)
 {
 	struct bio *bio = NULL;
 	unsigned long bio_flags = 0;
 	int ret;
 
-	ret = __extent_read_full_page(tree, page, get_extent, &bio, 0,
+	ret = __extent_read_full_page(tree, page, get_extent, &bio, mirror_num,
 				      &bio_flags);
 	if (bio)
-		ret = submit_one_bio(READ, bio, 0, bio_flags);
+		ret = submit_one_bio(READ, bio, mirror_num, bio_flags);
 	return ret;
 }
 
diff --git a/fs/btrfs/extent_io.h b/fs/btrfs/extent_io.h
index 4e8445a..2fef77f 100644
--- a/fs/btrfs/extent_io.h
+++ b/fs/btrfs/extent_io.h
@@ -177,7 +177,7 @@ int unlock_extent_cached(struct extent_io_tree *tree, u64
start, u64 end,
 int try_lock_extent(struct extent_io_tree *tree, u64 start, u64 end,
 		    gfp_t mask);
 int extent_read_full_page(struct extent_io_tree *tree, struct page *page,
-			  get_extent_t *get_extent);
+			  get_extent_t *get_extent, int mirror_num);
 int __init extent_io_init(void);
 void extent_io_exit(void);
 
diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index 113913a..938fc10 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -6238,7 +6238,7 @@ int btrfs_readpage(struct file *file, struct page *page)
 {
 	struct extent_io_tree *tree;
 	tree = &BTRFS_I(page->mapping->host)->io_tree;
-	return extent_read_full_page(tree, page, btrfs_get_extent);
+	return extent_read_full_page(tree, page, btrfs_get_extent, 0);
 }
 
 static int btrfs_writepage(struct page *page, struct writeback_control *wbc)
-- 
1.7.3.4

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs"
in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Jan Schmidt

2011-Jun-13 19:10 UTC

head link

[PATCH v1 6/6] scrub: add fixup code for errors on nodatasum files

This removes a FIXME comment and introduces the first part of nodatasum
fixup: It gets the corresponding inode for a logical address and triggers a
regular readpage for the corrupted sector.

Once we have on-the-fly error correction our error will be automatically
corrected. The correction code is expected to clear the newly introduced
EXTENT_DAMAGED flag, making scrub report that error as "corrected"
instead
of "uncorrectable" eventually.

Signed-off-by: Jan Schmidt <list.btrfs@jan-o-sch.net>
---
 fs/btrfs/extent_io.h |    1 +
 fs/btrfs/scrub.c     |  174 ++++++++++++++++++++++++++++++++++++++++++++++++--
 2 files changed, 170 insertions(+), 5 deletions(-)

diff --git a/fs/btrfs/extent_io.h b/fs/btrfs/extent_io.h
index 2fef77f..906ea42 100644
--- a/fs/btrfs/extent_io.h
+++ b/fs/btrfs/extent_io.h
@@ -17,6 +17,7 @@
 #define EXTENT_NODATASUM (1 << 10)
 #define EXTENT_DO_ACCOUNTING (1 << 11)
 #define EXTENT_FIRST_DELALLOC (1 << 12)
+#define EXTENT_DAMAGED (1 << 13)
 #define EXTENT_IOBITS (EXTENT_LOCKED | EXTENT_WRITEBACK)
 #define EXTENT_CTLBITS (EXTENT_DO_ACCOUNTING | EXTENT_FIRST_DELALLOC)
 
diff --git a/fs/btrfs/scrub.c b/fs/btrfs/scrub.c
index ec29ce8..a2aa47a 100644
--- a/fs/btrfs/scrub.c
+++ b/fs/btrfs/scrub.c
@@ -27,6 +27,7 @@
 #include "volumes.h"
 #include "disk-io.h"
 #include "ordered-data.h"
+#include "transaction.h"
 #include "backref.h"
 
 /*
@@ -94,6 +95,7 @@ struct scrub_dev {
 	int			first_free;
 	int			curr;
 	atomic_t		in_flight;
+	atomic_t		fixup;
 	spinlock_t		list_lock;
 	wait_queue_head_t	list_wait;
 	u16			csum_size;
@@ -107,6 +109,14 @@ struct scrub_dev {
 	spinlock_t		stat_lock;
 };
 
+struct scrub_fixup_nodatasum {
+	struct scrub_dev	*sdev;
+	u64			logical;
+	struct btrfs_root	*root;
+	struct btrfs_work	work;
+	u64			mirror_num;
+};
+
 struct scrub_warning {
 	struct btrfs_path	*path;
 	u64			extent_item_size;
@@ -195,12 +205,13 @@ struct scrub_dev *scrub_setup_dev(struct btrfs_device
*dev)
 
 		if (i != SCRUB_BIOS_PER_DEV-1)
 			sdev->bios[i]->next_free = i + 1;
-		 else
+		else
 			sdev->bios[i]->next_free = -1;
 	}
 	sdev->first_free = 0;
 	sdev->curr = -1;
 	atomic_set(&sdev->in_flight, 0);
+	atomic_set(&sdev->fixup, 0);
 	atomic_set(&sdev->cancel_req, 0);
 	sdev->csum_size = btrfs_super_csum_size(&fs_info->super_copy);
 	INIT_LIST_HEAD(&sdev->csum_list);
@@ -330,6 +341,141 @@ out:
 	kfree(swarn.msg_buf);
 }
 
+static int scrub_fixup_readpage(u64 inum, loff_t offset, void *ctx)
+{
+	struct page *page;
+	unsigned long index;
+	struct scrub_fixup_nodatasum *fixup = ctx;
+	int ret;
+	int corrected;
+	struct btrfs_key key;
+	struct inode *inode;
+	int end = offset + PAGE_SIZE - 1;
+
+	key.type = BTRFS_INODE_ITEM_KEY;
+	key.objectid = inum;
+	key.offset = 0;
+	inode = btrfs_iget(fixup->root->fs_info->sb, &key,
+				fixup->root->fs_info->fs_root, NULL);
+	if (IS_ERR(inode))
+		return -1;
+
+	ret = set_extent_bit(&BTRFS_I(inode)->io_tree, offset, end,
+				EXTENT_DAMAGED, 0, NULL, NULL, GFP_NOFS);
+	if (ret)
+		return ret < 0 ? ret : -1;
+
+	index = offset >> PAGE_CACHE_SHIFT;
+
+	page = find_or_create_page(inode->i_mapping, index, GFP_NOFS);
+	if (!page)
+		return -1;
+
+	ret = extent_read_full_page(&BTRFS_I(inode)->io_tree, page,
+					btrfs_get_extent, fixup->mirror_num);
+	wait_on_page_locked(page);
+	corrected = !test_range_bit(&BTRFS_I(inode)->io_tree, offset, end,
+					EXTENT_DAMAGED, 0, NULL);
+
+	if (corrected)
+		WARN_ON(!PageUptodate(page));
+	else
+		clear_extent_bit(&BTRFS_I(inode)->io_tree, offset, end,
+					EXTENT_DAMAGED, 0, 0, NULL, GFP_NOFS);
+
+	put_page(page);
+	iput(inode);
+
+	if (ret < 0)
+		return ret;
+
+	if (ret == 0 && corrected) {
+		/*
+		 * we only need to call readpage for one of the inodes belonging
+		 * to this extent. so make iterate_extent_inodes stop
+		 */
+		return 1;
+	}
+
+	return -1;
+}
+
+static void scrub_fixup_nodatasum(struct btrfs_work *work)
+{
+	int ret;
+	struct scrub_fixup_nodatasum *fixup;
+	struct scrub_dev *sdev;
+	struct btrfs_trans_handle *trans = NULL;
+	struct btrfs_fs_info *fs_info;
+	struct btrfs_path *path;
+	int uncorrectable = 0;
+
+	fixup = container_of(work, struct scrub_fixup_nodatasum, work);
+	sdev = fixup->sdev;
+	fs_info = fixup->root->fs_info;
+
+	path = btrfs_alloc_path();
+	if (!path) {
+		spin_lock(&sdev->stat_lock);
+		++sdev->stat.malloc_errors;
+		spin_unlock(&sdev->stat_lock);
+		uncorrectable = 1;
+		goto out;
+	}
+
+	trans = btrfs_join_transaction(fixup->root);
+	if (IS_ERR(trans)) {
+		uncorrectable = 1;
+		goto out;
+	}
+
+	/*
+	 * the idea is to trigger a regular read through the standard path. we
+	 * read a page from the (failed) logical address by specifying the
+	 * corresponding copynum of the failed sector. thus, that readpage is
+	 * expected to fail.
+	 * that is the point where on-the-fly error correction will kick in
+	 * (once it''s finished) and rewrite the failed sector if a good copy
+	 * can be found.
+	 */
+	ret = iterate_inodes_from_logical(fixup->logical,
fixup->root->fs_info,
+						path, scrub_fixup_readpage,
+						fixup);
+	if (ret < 0) {
+		uncorrectable = 1;
+		goto out;
+	}
+	WARN_ON(ret != 1);
+
+	spin_lock(&sdev->stat_lock);
+	++sdev->stat.corrected_errors;
+	spin_unlock(&sdev->stat_lock);
+
+out:
+	if (trans && !IS_ERR(trans))
+		btrfs_end_transaction(trans, fixup->root);
+	if (uncorrectable) {
+		spin_lock(&sdev->stat_lock);
+		++sdev->stat.uncorrectable_errors;
+		spin_unlock(&sdev->stat_lock);
+		if (printk_ratelimit())
+			printk(KERN_ERR "btrfs: unable to fixup (nodatasum) "
+				"error at logical %llu\n", fixup->logical);
+	}
+
+	btrfs_free_path(path);
+	kfree(fixup);
+
+	/* see caller why we''re pretending to be paused in the scrub counters
*/
+	mutex_lock(&fs_info->scrub_lock);
+	atomic_dec(&fs_info->scrubs_running);
+	atomic_dec(&fs_info->scrubs_paused);
+	mutex_unlock(&fs_info->scrub_lock);
+	atomic_dec(&sdev->fixup);
+	wake_up(&fs_info->scrub_pause_wait);
+	wake_up(&sdev->list_wait);
+}
+
 /*
  * scrub_recheck_error gets called when either verification of the page
  * failed or the bio failed to read, e.g. with EIO. In the latter case,
@@ -398,6 +544,7 @@ static void scrub_fixup(struct scrub_bio *sbio, int ix)
 	struct btrfs_fs_info *fs_info = sdev->dev->dev_root->fs_info;
 	struct btrfs_mapping_tree *map_tree = &fs_info->mapping_tree;
 	struct btrfs_multi_bio *multi = NULL;
+	struct scrub_fixup_nodatasum *fixup;
 	u64 logical = sbio->logical + ix * PAGE_SIZE;
 	u64 length;
 	int i;
@@ -406,12 +553,28 @@ static void scrub_fixup(struct scrub_bio *sbio, int ix)
 
 	if ((sbio->spag[ix].flags & BTRFS_EXTENT_FLAG_DATA) &&
 	    (sbio->spag[ix].have_csum == 0)) {
+		fixup = kzalloc(sizeof(*fixup), GFP_NOFS);
+		fixup->sdev = sdev;
+		fixup->logical = logical;
+		fixup->root = fs_info->extent_root;
+		fixup->mirror_num = sbio->spag[ix].mirror_num;
 		/*
-		 * nodatasum, don''t try to fix anything
-		 * FIXME: we can do better, open the inode and trigger a
-		 * writeback
+		 * increment scrubs_running to prevent cancel requests from
+		 * completing as long as a fixup worker is running. we must also
+		 * increment scrubs_paused to prevent deadlocking on pause
+		 * requests used for transactions commits (as the worker uses a
+		 * transaction context). it is safe to regard the fixup worker
+		 * as paused for all matters practical. effectively, we only
+		 * avoid cancellation requests from completing.
 		 */
-		goto uncorrectable;
+		mutex_lock(&fs_info->scrub_lock);
+		atomic_inc(&fs_info->scrubs_running);
+		atomic_inc(&fs_info->scrubs_paused);
+		mutex_unlock(&fs_info->scrub_lock);
+		atomic_inc(&sdev->fixup);
+		fixup->work.func = scrub_fixup_nodatasum;
+		btrfs_queue_worker(&fs_info->scrub_workers, &fixup->work);
+		return;
 	}
 
 	length = PAGE_SIZE;
@@ -1404,6 +1567,7 @@ int btrfs_scrub_dev(struct btrfs_root *root, u64 devid,
u64 start, u64 end,
 		ret = scrub_enumerate_chunks(sdev, start, end);
 
 	wait_event(sdev->list_wait, atomic_read(&sdev->in_flight) == 0);
+	wait_event(sdev->list_wait, atomic_read(&sdev->fixup) == 0);
 
 	atomic_dec(&fs_info->scrubs_running);
 	wake_up(&fs_info->scrub_pause_wait);
-- 
1.7.3.4

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs"
in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

David Sterba

2011-Jun-16 21:24 UTC

head link

Re: [PATCH v1 1/6] added helper functions to iterate backrefs

On Mon, Jun 13, 2011 at 09:10:34PM +0200, Jan Schmidt
wrote:> These helper functions iterate back references and call a function for each
> backref. There is also a function to resolve an inode to a path in the
> file system.
> 
> Signed-off-by: Jan Schmidt <list.btrfs@jan-o-sch.net>
> ---
>  fs/btrfs/Makefile  |    3 +-
>  fs/btrfs/backref.c |  461
++++++++++++++++++++++++++++++++++++++++++++++++++++
>  fs/btrfs/backref.h |   59 +++++++
>  3 files changed, 522 insertions(+), 1 deletions(-)
> 
> diff --git a/fs/btrfs/Makefile b/fs/btrfs/Makefile
> index 9b72dcf..c63f649 100644
> --- a/fs/btrfs/Makefile
> > +++ b/fs/btrfs/Makefile
> @@ -7,4 +7,5 @@ btrfs-y += super.o ctree.o extent-tree.o print-tree.o
root-tree.o dir-item.o \
>  	   extent_map.o sysfs.o struct-funcs.o xattr.o ordered-data.o \
>  	   extent_io.o volumes.o async-thread.o ioctl.o locking.o orphan.o \
>  	   export.o tree-log.o acl.o free-space-cache.o zlib.o lzo.o \
> -	   compression.o delayed-ref.o relocation.o delayed-inode.o scrub.o
> +	   compression.o delayed-ref.o relocation.o delayed-inode.o backref.o \
> +	   scrub.o
> diff --git a/fs/btrfs/backref.c b/fs/btrfs/backref.c
> new file mode 100644
> index 0000000..307dfb5
> --- /dev/null
> +++ b/fs/btrfs/backref.c
> @@ -0,0 +1,461 @@
> +/*
> + * Copyright (C) 2011 STRATO.  All rights reserved.
> + *
> + * This program is free software; you can redistribute it and/or
> + * modify it under the terms of the GNU General Public
> + * License v2 as published by the Free Software Foundation.
> + *
> + * This program is distributed in the hope that it will be useful,
> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> + * General Public License for more details.
> + *
> + * You should have received a copy of the GNU General Public
> + * License along with this program; if not, write to the
> + * Free Software Foundation, Inc., 59 Temple Place - Suite 330,
> + * Boston, MA 021110-1307, USA.
> + */
> +
> +#include "ctree.h"
> +#include "backref.h"
> +
> +/*
> + * this makes the path point to (inum INODE_ITEM ioff)
> + */
> +int inode_item_info(u64 inum, u64 ioff, struct btrfs_root *fs_root,
> +			struct btrfs_path *path)
> +{
this func ...
> +	int ret;
> +	struct btrfs_key key;
> +	struct extent_buffer *eb;
> +	struct btrfs_key found_key;
> +
> +	key.type = BTRFS_INODE_ITEM_KEY;
> +	key.objectid = inum;
> +	key.offset = ioff;
> +
> +	ret = btrfs_search_slot(NULL, fs_root, &key, path, 0, 0);
> +	if (ret < 0)
> +		return ret;
> +
> +	eb = path->nodes[0];
> +	if (ret && path->slots[0] >= btrfs_header_nritems(eb)) {
> +		ret = btrfs_next_leaf(fs_root, path);
> +		if (ret)
> +			return ret;
> +		eb = path->nodes[0];
> +	}
> +
> +	btrfs_item_key_to_cpu(eb, &found_key, path->slots[0]);
> +	if (found_key.type != key.type || found_key.objectid != key.objectid)
> +		return 1;
> +
> +	return 0;
> +}
> +
> +static int inode_ref_info(u64 inum, u64 ioff, struct btrfs_root *fs_root,
> +				struct btrfs_path *path, int strict,
> +				u64 *dest_parent_inum,
> +				struct extent_buffer **dest_iref_eb,
> +				int *dest_slot)
> +{
and this one share a fair amount of code, do a common helper instead
> +	int ret;
> +	struct btrfs_key key;
> +	struct extent_buffer *eb;
> +	struct btrfs_key found_key;
> +
> +	key.type = BTRFS_INODE_REF_KEY;
> +	key.objectid = inum;
> +	key.offset = ioff;
> +
> +	ret = btrfs_search_slot(NULL, fs_root, &key, path, 0, 0);
> +	if (ret < 0)
> +		goto out;
> +
> +	eb = path->nodes[0];
> +	if (ret && path->slots[0] >= btrfs_header_nritems(eb)) {
> +		ret = btrfs_next_leaf(fs_root, path);
> +		if (ret)
> +			goto out;
> +		eb = path->nodes[0];
> +	}
> +
> +	btrfs_item_key_to_cpu(eb, &found_key, path->slots[0]);
> +	if (found_key.type != key.type || found_key.objectid != key.objectid) {
> +		ret = 1;
> +		goto out;
> +	}
> +
> +	ret = 0;
> +	if (dest_parent_inum)
> +		*dest_parent_inum = found_key.offset;
> +	if (dest_iref_eb)
> +		*dest_iref_eb = eb;
> +	if (dest_slot)
> +		*dest_slot = path->slots[0];
> +
> +out:
> +	btrfs_release_path(path);
> +	return ret;
> +}
> +
> +/*
> + * this iterates to turn a btrfs_inode_ref into a full filesystem path.
elements
> + * of the path are separated by ''/'' and the path is
guaranteed to be
> + * 0-terminated. the path is only given within the current file system.
> + * Therefore, it never starts with a ''/''. the caller is
responsible to provide
> + * "size" bytes in "dest". the dest buffer will be
filled backwards! the idea is
> + * that in case of an overflow, the lower part in the hierarchie is most
> + * important to the user. finally, the start point of resulting the string
is                                                                
^^^> + * returned. in case the path would overflow, "..." is added at
the front of
> + * the string and iteration stops regularly.
> + */
> +static char *iref_to_path(struct btrfs_root *fs_root, struct btrfs_path
*path,
> +				struct btrfs_inode_ref *iref,
> +				struct extent_buffer *eb, u64 parent,
> +				char *dest, u32 size)
> +{
> +	u32 len;
> +	int slot;
> +	u64 inum;
> +	int ret;
> +	u32 left = size - 1;
this confused me first, ''left'' may be ambiguous, something
like ''bytes_left''
would be better
> +
> +	dest[left] = ''\0'';
> +
> +	while (1) {
> +		len = btrfs_inode_ref_name_len(eb, iref);
> +		if (len > left) {
> +			if (size < 4)
> +				return dest+left;
                                break;
> +			if (left > 3)
> +				dest += left - 3;
> +			memcpy(dest, "...", 3);
> +			return dest;
i''d rather see this consistent with other return points,

if (left > 3)
	left -= 3;
else
	left = 0;
memcpy(dest + left, "...", 3);
break;
> +		}
> +		left -= len;
> +		read_extent_buffer(eb, dest+left,
> +					(unsigned long)(iref + 1), len);
> +
> +		ret = inode_item_info(parent, 0, fs_root, path);
> +		if (ret)
> +			return ERR_PTR(ret);
> +		eb = path->nodes[0];
> +		btrfs_release_path(path);
> +
> +		ret = inode_ref_info(parent, 0, fs_root, path, 0,
> +					&inum, NULL, &slot);
> +		if (ret)
> +			return ERR_PTR(ret);
> +
> +		if (parent == inum)   /* regular exit ahead */ {
> +			return dest+left;                        break;

and please put the comment before the if and remove { }
> +		}
> +
> +		iref = btrfs_item_ptr(eb, slot, struct btrfs_inode_ref);
> +		parent = inum;
> +		if (left > 0) {
> +			dest[left-1] = ''/'';
> +			--left;
swap the lines and remove "-1" :)
> +		}
> +	}       
        return dest + left;> +}
> +
> +/*
> + * this makes the path point to (logical EXTENT_ITEM *)
> + * returns 0 for data blocks, 1 for tree blocks and <0 on error
> + */
> +int data_extent_from_logical(struct btrfs_root *root, u64 logical,
> +				struct btrfs_path *path,
> +				struct btrfs_key *found_key)
> +{
> +	int ret;
> +	u64 flags;
> +	u32 item_size;
> +	struct extent_buffer *eb;
> +	struct btrfs_extent_item *ei;
> +	struct btrfs_key key;
> +	struct btrfs_key f;
                         tmpkey;
> +	struct btrfs_key *fk = found_key ? found_key : &f;
transform this to

        if (!found_key)
		found_key = &tmpkey;

and use found_key along
	
still, will you ever pass NULL found_key?
> +
> +	key.type = BTRFS_EXTENT_ITEM_KEY;
> +	key.objectid = logical;
> +	key.offset = (u64)-1;
> +
> +	ret = btrfs_search_slot(NULL, root, &key, path, 0, 0);
> +	if (ret < 0)
> +		return ret;
> +	ret = btrfs_previous_item(root->fs_info->extent_root, path,
> +					0, BTRFS_EXTENT_ITEM_KEY);
> +	if (ret < 0)
> +		return ret;
> +	btrfs_item_key_to_cpu(path->nodes[0], fk, path->slots[0]);
> +	if (fk->type != BTRFS_EXTENT_ITEM_KEY || fk->objectid > logical
||
> +	    fk->objectid + fk->offset <= logical)
> +		return -1;
this is -EPERM, confusing. eg btrfs_previous_item a few lines above may return
variety of error codes from the whole function, EPERM just does not fit.  does
this check mean some serious error? inconsistency or corruption?
> +
> +	eb = path->nodes[0];
> +	item_size = btrfs_item_size_nr(eb, path->slots[0]);
> +	BUG_ON(item_size < sizeof(*ei));
> +
> +	ei = btrfs_item_ptr(eb, path->slots[0], struct btrfs_extent_item);
> +	flags = btrfs_extent_flags(eb, ei);
> +
> +	if (flags & BTRFS_EXTENT_FLAG_TREE_BLOCK)
> +		return 1;
> +
> +	return 0;
> +}
> +
> +/*
> + * helper function to iterate extent backrefs. ptr must point to a 0 value
for
> + * the first call and may be modified. it is used to track state.
> + * if more backrefs exist, 0 is returned and the next call to
_get_extent_ref
> + * must pass the modified ptr parameter to get to the next backref.
> + * after the last backref was processed, 1 is returned.
> + * returns <0 on error
> + */
> +static int _get_extent_ref(int flags_wanted, int type_wanted,           __get_extent_ref
single score function prefixes are not used

flags_wanted is compared to extent_flags of u64 type
> +				unsigned long *ptr, struct extent_buffer *eb,
> +				struct btrfs_extent_item *ei, u32 item_size,
> +				struct btrfs_extent_inline_ref **eiref)
> +{
> +	int type;
> +	int again = 0;
> +	unsigned long end;
> +	u64 flags;
> +	struct btrfs_tree_block_info *info;
> +
> +	if (!*ptr) {
> +		/* first call */
> +		flags = btrfs_extent_flags(eb, ei);
> +		if (!(flags & flags_wanted))
> +			return -1;
again, better errorcode, EINVAL seems to be the right choice
> +		if (flags & BTRFS_EXTENT_FLAG_TREE_BLOCK) {
> +			info = (struct btrfs_tree_block_info *)(ei + 1);
> +			*eiref = (struct btrfs_extent_inline_ref *)(info + 1);
> +		} else {
> +			*eiref = (struct btrfs_extent_inline_ref *)(ei + 1);
style: remove { } around single if/else statement
> +		}
> +		*ptr = (unsigned long)*eiref;
> +	}
> +
> +	end = (unsigned long)ei + item_size;
> +
> +again:
> +	again = 0;
excessive use of goto detected:

	do {> +
> +	*eiref = (struct btrfs_extent_inline_ref *)*ptr;
> +	type = btrfs_extent_inline_ref_type(eb, *eiref);
> +
> +	if (type != type_wanted)
> +		again = 1;
> +
> +	*ptr += btrfs_extent_inline_ref_size(type);
> +
> +	WARN_ON(*ptr > end);
> +	if (*ptr == end)
> +		return 1; /* last */
> +
> +	if (again)
> +		goto again;	
	} while (type != type_wanted);
> +
> +	return 0;
> +}
> +
> +/*
> + * reads the tree block backref for an extent. tree level and root are
returned
> + * through dest_level and dest_root. ptr must point to a 0 value for the
first
> + * call and may be modified (see _get_extent_ref comment).
> + * returns 0 on success, <0 on error. note: in contrast to
_get_extent_ref this
> + * one never returns 1!
> + */
> +int tree_backref_for_extent(unsigned long *ptr, struct extent_buffer *eb,
> +				struct btrfs_extent_item *ei, u32 item_size,
> +				u64 *dest_root, u64 *dest_level)
there''s not a formal "rule" or something, but i''ve
seen outgoing parameters
named with ''out_''

dest_level can be just an int, levels fit in one byte anyway
> +{
> +	int ret;
> +	struct btrfs_tree_block_info *info;
> +	struct btrfs_extent_inline_ref *eiref;
> +
> +	ret = _get_extent_ref(BTRFS_EXTENT_FLAG_TREE_BLOCK,
> +				BTRFS_TREE_BLOCK_REF_KEY, ptr, eb, ei,
> +				item_size, &eiref);
> +	if (ret < 0)
> +		return ret;
> +
> +	WARN_ON(!ret); /* multiple tree backrefs for this extent */
as we will probably have CONFIG_DEBUG, the comment makes a good candidate for a
printk message
> +
> +	info = (struct btrfs_tree_block_info *)(ei + 1);
> +	*dest_root = btrfs_extent_inline_ref_offset(eb, eiref);
> +	*dest_level = btrfs_tree_block_level(eb, info);
> +
> +	return 0;
> +}
> +
> +static int data_inode_for_extent(unsigned long *ptr, struct extent_buffer
*eb,
> +					struct btrfs_extent_item *ei,
> +					u32 item_size, u64 *dest_inum,
> +					u64 *dest_ioff)
> +{
> +	int ret;
> +	struct btrfs_extent_inline_ref *eiref;
> +	struct btrfs_extent_data_ref *dref;
> +
> +	ret = _get_extent_ref(BTRFS_EXTENT_FLAG_DATA, BTRFS_EXTENT_DATA_REF_KEY,
> +				ptr, eb, ei, item_size, &eiref);
> +	if (ret < 0)
> +		return ret;
> +
> +	dref = (struct btrfs_extent_data_ref *)(&eiref->offset);
> +	if (btrfs_extent_data_ref_root(eb, dref) != BTRFS_FS_TREE_OBJECTID) {
> +		WARN_ON(1);
> +		return -1;
needs better error code
> +	}
> +
> +	*dest_inum = btrfs_extent_data_ref_objectid(eb, dref);
> +	*dest_ioff = btrfs_extent_data_ref_offset(eb, dref);
> +
> +	return ret;
> +}
> +
> +/*
> + * calls iterate() for every inode that references the extent identified
by
> + * the given parameters.
> + * when the iterator function returns a non-zero value, iteration stops.
> + */
> +int iterate_extent_inodes(struct extent_buffer *eb,
> +				struct btrfs_extent_item *ei,
> +				loff_t extent_item_offset, u32 item_size,
> +				iterate_extent_inodes_t *iterate, void *ctx)
> +{
> +	int last;
> +	u64 inum;
> +	unsigned long ptr = 0;
> +	loff_t extent_data_item_offset;
> +	int ret;
> +
> +	do {
> +		last = data_inode_for_extent(&ptr, eb, ei, item_size, &inum,
> +						&extent_data_item_offset);
> +		if (last < 0)
> +			return last;
> +
> +		ret = iterate(inum, extent_item_offset+extent_data_item_offset,
> +				ctx);
> +		if (ret)
> +			return ret;
> +
> +	} while (!last);
> +
> +	return 0;
> +}
> +
> +/*
> + * just a convenience function combining data_extent_from_logical and
> + * iterate_extent_inodes.
> + */
a better comment please or none
> +int iterate_inodes_from_logical(u64 logical, struct btrfs_fs_info
*fs_info,
> +				struct btrfs_path *path,
> +				iterate_extent_inodes_t *iterate, void *ctx)
> +{
> +	int ret;
> +	u32 item_size;
> +	struct extent_buffer *l;
> +	struct btrfs_extent_item *extent;
> +	loff_t offset;
> +	struct btrfs_key found_key;
> +
> +	ret = data_extent_from_logical(fs_info->extent_root, logical, path,
> +					&found_key);
> +	if (ret)
> +		return ret;
this can ret < 0 in case of error, and 1 for the tree block. not sure if this
shouldn''t be handled separately
> +
> +	offset = logical - found_key.objectid;
> +	l = path->nodes[0];
> +	extent = btrfs_item_ptr(l, path->slots[0], struct btrfs_extent_item);
> +	item_size = btrfs_item_size_nr(l, path->slots[0]);
> +	btrfs_release_path(path);
> +
> +	ret = iterate_extent_inodes(l, extent, offset, item_size, iterate, ctx);
> +
> +	return ret;
> +}
> +
> +static int iterate_irefs(u64 inum, struct extent_buffer *eb_i,
> +				struct btrfs_root *fs_root,
> +				struct btrfs_path *path,
> +				iterate_irefs_t *iterate, void *ctx)
> +{
> +	int ret;
> +	int slot;
> +	u32 cur;
> +	u32 len;
> +	u32 name_len;
> +	u64 parent = 0;
> +	struct extent_buffer *eb_ir;
> +	struct btrfs_item *item;
> +	struct btrfs_inode_ref *iref;
> +
> +	while (1) {
> +		ret = inode_ref_info(inum, parent ? parent+1 : 0, fs_root, path,
> +					1, &parent, &eb_ir, &slot);
> +		if (ret < 0)
> +			return ret;
> +		if (ret)
> +			break;
> +
> +		item = btrfs_item_nr(eb_i, slot);
> +		iref = btrfs_item_ptr(eb_i, slot, struct btrfs_inode_ref);
> +
> +		for (cur = 0; cur < btrfs_item_size(eb_i, item); cur += len) {
> +			name_len = btrfs_inode_ref_name_len(eb_i, iref);
> +			ret = iterate(parent, iref, eb_ir, slot, ctx);
> +			if (ret)
> +				return ret;
> +			len = sizeof(*iref) + name_len;
> +			iref = (struct btrfs_inode_ref *)((char *)iref + len);
> +		}
> +	}
can you estimate how long this loop may take? (eg. number of iterations per
call) there may be a need for cond_resched()
> +
> +	return 0;
> +}
> +
> +/*
> + * returns 0 if the path could be dumped (probably truncated)
> + * returns <0 in case of an error
> + */
> +static int inode_to_path(u64 inum, struct btrfs_inode_ref *iref,
> +				struct extent_buffer *eb_ir, int slot,
> +				void *ctx)
> +{
> +	struct inode_fs_paths *ipath = ctx;
> +	struct extent_buffer *eb_i = ipath->eb_i;
> +	u32 path_len;
> +	char *fs_path;
> +
> +	if (ipath->left < 2)
> +		return -EOVERFLOW;
> +
> +	*ipath->dest++ = '' '';
> +	--ipath->left;
> +
> +	fs_path = iref_to_path(ipath->fs_root, ipath->path, iref, eb_i,
> +				inum, ipath->scratch_buf, ipath->left);
fs_path will be never NULL, either an IS_ERR or ipath->scratch_buf + some
offset
> +	if (!fs_path || IS_ERR(fs_path))
> +		return PTR_ERR(fs_path);
this could return PTR_ERR(0), confusing with ''return 0'' for
the calle
> +	path_len = ipath->scratch_buf + ipath->left - fs_path - 1;
> +	if (path_len+1 > ipath->left)
> +		return -EOVERFLOW;
> +	memcpy(ipath->dest, fs_path, path_len+1);
> +	ipath->left -= path_len;
> +	ipath->dest += path_len;
> +
> +	return 0;
> +}
> +
> +int paths_from_inode(u64 inum, struct inode_fs_paths *ipath)
> +{
> +	return iterate_irefs(inum, ipath->eb_i, ipath->fs_root,
ipath->path,
> +				inode_to_path, ipath);
> +}
> diff --git a/fs/btrfs/backref.h b/fs/btrfs/backref.h
> new file mode 100644
> index 0000000..6b7e856
> --- /dev/null
> +++ b/fs/btrfs/backref.h
> @@ -0,0 +1,59 @@
> +/*
> + * Copyright (C) 2011 STRATO.  All rights reserved.
> + *
> + * This program is free software; you can redistribute it and/or
> + * modify it under the terms of the GNU General Public
> + * License v2 as published by the Free Software Foundation.
> + *
> + * This program is distributed in the hope that it will be useful,
> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> + * General Public License for more details.
> + *
> + * You should have received a copy of the GNU General Public
> + * License along with this program; if not, write to the
> + * Free Software Foundation, Inc., 59 Temple Place - Suite 330,
> + * Boston, MA 021110-1307, USA.
> + */
> +
> +#ifndef __BTRFS_BACKREF__
> +#define __BTRFS_BACKREF__
> +
> +struct inode_fs_paths {
> +	int			left;
as suggested above, rename this
> +	char			*dest;
> +	struct btrfs_path	*path;
> +	char			*scratch_buf;
> +	struct btrfs_root	*fs_root;
> +	int			scratch_bufsize;
> +	struct extent_buffer	*eb_i;
> +};
> +
> +typedef int (iterate_extent_inodes_t)(u64 inum, loff_t offset, void *ctx);
> +typedef int (iterate_irefs_t)(u64 parent, struct btrfs_inode_ref *iref,
> +				struct extent_buffer *eb_ir,
> +				int slot, void *ctx);
> +
> +int inode_item_info(u64 inum, u64 ioff, struct btrfs_root *fs_root,
> +			struct btrfs_path *path);
> +
> +int data_extent_from_logical(struct btrfs_root *root, u64 logical,
> +				struct btrfs_path *path,
> +				struct btrfs_key *found_key);
> +
> +int tree_backref_for_extent(unsigned long *ptr, struct extent_buffer *eb,
> +				struct btrfs_extent_item *ei, u32 item_size,
> +				u64 *dest_root, u64 *dest_level);
> +
> +int iterate_extent_inodes(struct extent_buffer *eb,
> +				struct btrfs_extent_item *ei,
> +				loff_t extent_item_offset, u32 item_size,
> +				iterate_extent_inodes_t *iterate, void *ctx);
> +
> +int iterate_inodes_from_logical(u64 logical, struct btrfs_fs_info
*fs_info,
> +				struct btrfs_path *path,
> +				iterate_extent_inodes_t *iterate, void *ctx);
> +
> +int paths_from_inode(u64 inum, struct inode_fs_paths *ipath);
every function from the .c file not listed here should be static
> +
> +#endif--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs"
in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

David Sterba

2011-Jun-16 21:25 UTC

head link

Re: [PATCH v1 2/6] scrub: added unverified_errors

On Mon, Jun 13, 2011 at 09:10:35PM +0200, Jan Schmidt
wrote:> In normal operation, scrub is reading data sequentially in large portions.
> In case of an i/o error, we try to find the corrupted area(s) by issuing
> page sized read requests. With this commit we increment the
> unverified_errors counter if all of the small size requests succeed.
> 
> Userland patches carrying such conspicous events to the administrator
should
> already be around.
> 
> Signed-off-by: Jan Schmidt <list.btrfs@jan-o-sch.net>
> ---
>  fs/btrfs/scrub.c |   37 ++++++++++++++++++++++++++-----------
>  1 files changed, 26 insertions(+), 11 deletions(-)
> 
> diff --git a/fs/btrfs/scrub.c b/fs/btrfs/scrub.c
> index d5a4108..00e4e58 100644
> --- a/fs/btrfs/scrub.c
> +++ b/fs/btrfs/scrub.c
> @@ -207,18 +207,25 @@ nomem:
>   * recheck_error gets called for every page in the bio, even though only
>   * one may be bad
>   */
> -static void scrub_recheck_error(struct scrub_bio *sbio, int ix)
> +static int scrub_recheck_error(struct scrub_bio *sbio, int ix)
>  {
> +	struct scrub_dev *sdev = sbio->sdev;
> +	u64 sector = (sbio->physical + ix * PAGE_SIZE) >> 9;
> +
>  	if (sbio->err) {
> -		if (scrub_fixup_io(READ, sbio->sdev->dev->bdev,
> -				   (sbio->physical + ix * PAGE_SIZE) >> 9,
> +		if (scrub_fixup_io(READ, sbio->sdev->dev->bdev, sector,
>  				   sbio->bio->bi_io_vec[ix].bv_page) == 0) {
>  			if (scrub_fixup_check(sbio, ix) == 0)
> -				return;
> +				return 0;
>  		}
>  	}
>  
> +	spin_lock(&sdev->stat_lock);
> +	++sdev->stat.read_errors;
> +	spin_unlock(&sdev->stat_lock);
> +
>  	scrub_fixup(sbio, ix);
> +	return 1;
>  }
>  
>  static int scrub_fixup_check(struct scrub_bio *sbio, int ix)
> @@ -388,8 +395,14 @@ static void scrub_checksum(struct btrfs_work *work)
>  	int ret;
>  
>  	if (sbio->err) {
> +		ret = 0;
>  		for (i = 0; i < sbio->count; ++i)
> -			scrub_recheck_error(sbio, i);
> +			ret |= scrub_recheck_error(sbio, i);
> +		if (!ret) {
> +			spin_lock(&sdev->stat_lock);
> +			++sdev->stat.unverified_errors;
> +			spin_unlock(&sdev->stat_lock);
> +		}
>  
>  		sbio->bio->bi_flags &= ~(BIO_POOL_MASK - 1);
>  		sbio->bio->bi_flags |= 1 << BIO_UPTODATE;
> @@ -402,10 +415,6 @@ static void scrub_checksum(struct btrfs_work *work)
>  			bi->bv_offset = 0;
>  			bi->bv_len = PAGE_SIZE;
>  		}
> -
> -		spin_lock(&sdev->stat_lock);
> -		++sdev->stat.read_errors;
> -		spin_unlock(&sdev->stat_lock);
>  		goto out;
>  	}
>  	for (i = 0; i < sbio->count; ++i) {
> @@ -426,8 +435,14 @@ static void scrub_checksum(struct btrfs_work *work)
>  			WARN_ON(1);
>  		}
>  		kunmap_atomic(buffer, KM_USER0);
> -		if (ret)
> -			scrub_recheck_error(sbio, i);
> +		if (ret) {
> +			ret = scrub_recheck_error(sbio, i);
> +			if (!ret) {
> +				spin_lock(&sdev->stat_lock);
> +				++sdev->stat.unverified_errors;
> +				spin_unlock(&sdev->stat_lock);
> +			}
> +		}
>  	}
>  
>  out:--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs"
in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

David Sterba

2011-Jun-16 21:26 UTC

head link

Re: [PATCH v1 3/6] scrub: print paths of corrupted files

On Mon, Jun 13, 2011 at 09:10:36PM +0200, Jan Schmidt
wrote:> While scrubbing, we may encounter various errors. Previously, a logical
> address was printed to the log only. Now, all paths belonging to that
> address are resolved and printed separately. That should work for hardlinks
> as well as reflinks.
> 
> Signed-off-by: Jan Schmidt <list.btrfs@jan-o-sch.net>
> ---
>  fs/btrfs/scrub.c |  140
++++++++++++++++++++++++++++++++++++++++++++++++++++-
>  1 files changed, 137 insertions(+), 3 deletions(-)
> 
> diff --git a/fs/btrfs/scrub.c b/fs/btrfs/scrub.c
> index 00e4e58..e294d76 100644
> --- a/fs/btrfs/scrub.c
> +++ b/fs/btrfs/scrub.c
> @@ -27,6 +27,7 @@
>  #include "volumes.h"
>  #include "disk-io.h"
>  #include "ordered-data.h"
> +#include "backref.h"
>  
>  /*
>   * This is only the first step towards a full-features scrub. It reads all
> @@ -106,6 +107,19 @@ struct scrub_dev {
>  	spinlock_t		stat_lock;
>  };
>  
> +struct scrub_warning {
> +	struct btrfs_path	*path;
> +	u64			extent_item_size;
> +	char			*scratch_buf;
> +	char			*msg_buf;
> +	const char		*errstr;
> +	u64			sector;
better named ''physical'', like in scrub_bio
> +	u64			logical;
> +	struct btrfs_device	*dev;
> +	int			msg_bufsize;
> +	int			scratch_bufsize;
> +};
> +
>  static void scrub_free_csums(struct scrub_dev *sdev)
>  {
>  	while (!list_empty(&sdev->csum_list)) {
> @@ -201,6 +215,121 @@ nomem:
>  	return ERR_PTR(-ENOMEM);
>  }
>  
> +static int scrub_print_warning_inode(u64 inum, loff_t offset, void *ctx)
> +{
> +	u64 isize;
> +	int ret;
> +	struct extent_buffer *eb_i;
''eb'' for consistency with others
> +	struct btrfs_inode_item *ii;
inode_item is a commonly used name
> +	struct scrub_warning *swarn = ctx;
> +	struct btrfs_fs_info *fs_info = swarn->dev->dev_root->fs_info;
> +	struct inode_fs_paths ipath;
> +
> +	ret = inode_item_info(inum, 0, fs_info->fs_root, swarn->path);
> +	if (ret) {
> +err:
> +		printk(KERN_WARNING "btrfs: %s at logical %llu on dev "
> +			"%s, sector %llu, inode %llu, offset %llu: path "
> +			"resolving failed with ret=%d\n", swarn->errstr,
> +			swarn->logical, swarn->dev->name, swarn->sector, inum,
> +			offset, ret);
> +		return 0;
please move it to the end after the other printk
> +	}
> +	eb_i = swarn->path->nodes[0];
> +	ii = btrfs_item_ptr(eb_i, swarn->path->slots[0],
> +				struct btrfs_inode_item);
> +	btrfs_release_path(swarn->path);
> +
> +	isize = btrfs_inode_size(eb_i, ii);
> +
> +	ipath.left = swarn->msg_bufsize - 1;
> +	ipath.dest = swarn->msg_buf;
> +	ipath.path = swarn->path;
> +	ipath.scratch_buf = swarn->scratch_buf;
> +	ipath.scratch_bufsize = swarn->scratch_bufsize;
> +	ipath.fs_root = fs_info->fs_root;
> +	ipath.eb_i = eb_i;
> +
> +	ret = paths_from_inode(inum, &ipath);
> +	if (ret < 0)
> +		goto err;
> +
> +	printk(KERN_WARNING "btrfs: %s at logical %llu on dev "
> +		"%s, sector %llu, inode %llu, offset %llu, "
> +		"length %llu, links %u (path:%s)\n", swarn->errstr,
> +		swarn->logical, swarn->dev->name, swarn->sector, inum,
offset,
> +		min(isize - offset, (u64)PAGE_SIZE),
> +		btrfs_inode_nlink(eb_i, ii), swarn->msg_buf);
> +
> +	return 0;
> +}
> +
> +static void scrub_print_warning(const char *errstr, struct scrub_bio
*sbio,
> +				int ix)
please rename ''ix'' to eg ''idx'', more
readable
> +{
> +	struct btrfs_device *dev = sbio->sdev->dev;
> +	struct btrfs_fs_info *fs_info = dev->dev_root->fs_info;
> +	struct btrfs_path *path;
> +	struct btrfs_key found_key;
> +	struct extent_buffer *eb;
> +	struct btrfs_extent_item *ei;
> +	struct scrub_warning swarn;
> +	u32 item_size;
> +	int ret;
> +	u64 ref_root;
i''m not able to understand what a u64 value describing a root means,
seeing it
here or in the printk message. it''s returned as 

	*dest_root = btrfs_extent_inline_ref_offset(eb, eiref);

can you please explain it to me?
> +	u64 ref_level;
int is enough, as noted elsewhere
> +	unsigned long ptr = 0;
> +	static const int bufsize = 4096;
drop static, this would allocate a memory cell to hold the value 4096, while
without it, it''s just a named constant and this will make the compiler
happy
> +	loff_t extent_item_offset;
loff_t is a signed long long, do you really need it?
> +
> +	path = btrfs_alloc_path();
> +
> +	swarn.scratch_buf = kmalloc(bufsize, GFP_NOFS);
> +	swarn.msg_buf = kmalloc(bufsize, GFP_NOFS);
> +	swarn.sector = (sbio->physical + ix * PAGE_SIZE) >> 9;
> +	swarn.logical = sbio->logical + ix * PAGE_SIZE;
> +	swarn.errstr = errstr;
> +	swarn.dev = dev;
> +	swarn.msg_bufsize = bufsize;
> +	swarn.scratch_bufsize = bufsize;
> +
> +	if (!path || !swarn.scratch_buf || !swarn.msg_buf)
> +		goto out;
> +
> +	ret = data_extent_from_logical(fs_info->extent_root,
> +					swarn.logical, path, &found_key);
> +	if (ret < 0)
> +		goto out;
> +
> +	extent_item_offset = swarn.logical - found_key.objectid;
> +	swarn.extent_item_size = found_key.offset;
> +
> +	eb = path->nodes[0];
> +	ei = btrfs_item_ptr(eb, path->slots[0], struct btrfs_extent_item);
> +	item_size = btrfs_item_size_nr(eb, path->slots[0]);
> +
> +	btrfs_release_path(path);
is it safe to release the path here? it''s used in the else branch below
> +
> +	if (ret) {
> +		ret = tree_backref_for_extent(&ptr, eb, ei, item_size,
> +						&ref_root, &ref_level);
> +		printk(KERN_WARNING "%s at logical %llu on dev %s, "
> +			"sector %llu: metadata %s (level %llu) in tree %llu\n",
> +			errstr, swarn.logical, dev->name, swarn.sector,
> +			ref_level ? "node" : "leaf", ret < 0 ? -1 :
ref_level,
> +			ret < 0 ? -1 : ref_root);
> +	} else {
> +		swarn.path = path;
> +		iterate_extent_inodes(eb, ei, extent_item_offset, item_size,
> +					scrub_print_warning_inode, &swarn);
> +	}
> +
> +out:
> +	btrfs_free_path(path);
> +	kfree(swarn.scratch_buf);
> +	kfree(swarn.msg_buf);
> +}
> +
>  /*
>   * scrub_recheck_error gets called when either verification of the page
>   * failed or the bio failed to read, e.g. with EIO. In the latter case,
> @@ -218,6 +347,11 @@ static int scrub_recheck_error(struct scrub_bio *sbio,
int ix)
>  			if (scrub_fixup_check(sbio, ix) == 0)
>  				return 0;
>  		}
> +		if (printk_ratelimit())
please don''t use printk_ratelimit; as a simple printk_ratelimited is
not
applicable here, we have to use __ratelimit: (include/linux/printk.h)

static DEFINE_RATELIMIT_STATE(_rs, DEFAULT_RATELIMIT_INTERVAL,
	DEFAULT_RATELIMIT_BURST);

if (__ratelimit(&_rs))
> +			scrub_print_warning("i/o error", sbio, ix);
> +	} else {
> +		if (printk_ratelimit())
and the ratelimit context can be even shared
> +			scrub_print_warning("checksum error", sbio, ix);
>  	}
>  
>  	spin_lock(&sdev->stat_lock);
> @@ -333,7 +467,7 @@ static void scrub_fixup(struct scrub_bio *sbio, int ix)
>  	spin_unlock(&sdev->stat_lock);
>  
>  	if (printk_ratelimit())
(drop)
> -		printk(KERN_ERR "btrfs: fixed up at %llu\n",
> +		printk(KERN_ERR "btrfs: fixed up error at logical %llu\n",
>  		       (unsigned long long)logical);
                printk_ratelimited

seems that I forgot to convert this (and the one below) some time ago. although
changes should not be mixed within a patch, I guess this one is rather simple
and a good thing to do. 
>  	return;
>  
> @@ -344,8 +478,8 @@ uncorrectable:
>  	spin_unlock(&sdev->stat_lock);
>  
>  	if (printk_ratelimit())
(drop)
> -		printk(KERN_ERR "btrfs: unable to fixup at %llu\n",
> -			 (unsigned long long)logical);
> +		printk(KERN_ERR "btrfs: unable to fixup (regular) error at "
> +			"logical %llu\n", (unsigned long long)logical);
                printk_ratelimited
>  }
>  
>  static int scrub_fixup_io(int rw, struct block_device *bdev, sector_t
sector,--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs"
in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

David Sterba

2011-Jun-16 21:27 UTC

head link

Re: [PATCH v1 6/6] scrub: add fixup code for errors on nodatasum files

On Mon, Jun 13, 2011 at 09:10:39PM +0200, Jan Schmidt
wrote:> This removes a FIXME comment and introduces the first part of nodatasum
> fixup: It gets the corresponding inode for a logical address and triggers a
> regular readpage for the corrupted sector.
> 
> Once we have on-the-fly error correction our error will be automatically
> corrected. The correction code is expected to clear the newly introduced
> EXTENT_DAMAGED flag, making scrub report that error as
"corrected" instead
> of "uncorrectable" eventually.
> 
> Signed-off-by: Jan Schmidt <list.btrfs@jan-o-sch.net>
> ---
>  fs/btrfs/extent_io.h |    1 +
>  fs/btrfs/scrub.c     |  174
++++++++++++++++++++++++++++++++++++++++++++++++--
>  2 files changed, 170 insertions(+), 5 deletions(-)
> 
> diff --git a/fs/btrfs/extent_io.h b/fs/btrfs/extent_io.h
> index 2fef77f..906ea42 100644
> --- a/fs/btrfs/extent_io.h
> +++ b/fs/btrfs/extent_io.h
> @@ -17,6 +17,7 @@
>  #define EXTENT_NODATASUM (1 << 10)
>  #define EXTENT_DO_ACCOUNTING (1 << 11)
>  #define EXTENT_FIRST_DELALLOC (1 << 12)
> +#define EXTENT_DAMAGED (1 << 13)
>  #define EXTENT_IOBITS (EXTENT_LOCKED | EXTENT_WRITEBACK)
>  #define EXTENT_CTLBITS (EXTENT_DO_ACCOUNTING | EXTENT_FIRST_DELALLOC)
>  
> diff --git a/fs/btrfs/scrub.c b/fs/btrfs/scrub.c
> index ec29ce8..a2aa47a 100644
> --- a/fs/btrfs/scrub.c
> +++ b/fs/btrfs/scrub.c
> @@ -27,6 +27,7 @@
>  #include "volumes.h"
>  #include "disk-io.h"
>  #include "ordered-data.h"
> +#include "transaction.h"
>  #include "backref.h"
>  
>  /*
> @@ -94,6 +95,7 @@ struct scrub_dev {
>  	int			first_free;
>  	int			curr;
>  	atomic_t		in_flight;
> +	atomic_t		fixup;
a nondescriptive name; a counter?
>  	spinlock_t		list_lock;
>  	wait_queue_head_t	list_wait;
>  	u16			csum_size;
> @@ -107,6 +109,14 @@ struct scrub_dev {
>  	spinlock_t		stat_lock;
>  };
>  
> +struct scrub_fixup_nodatasum {
> +	struct scrub_dev	*sdev;
> +	u64			logical;
> +	struct btrfs_root	*root;
> +	struct btrfs_work	work;
> +	u64			mirror_num;
int should be enough and is used elsewhere but scrub_page, uses u64 as well;
that''s a bit too much IMO
> +};
> +
>  struct scrub_warning {
>  	struct btrfs_path	*path;
>  	u64			extent_item_size;
> @@ -195,12 +205,13 @@ struct scrub_dev *scrub_setup_dev(struct btrfs_device
*dev)
>  
>  		if (i != SCRUB_BIOS_PER_DEV-1)
>  			sdev->bios[i]->next_free = i + 1;
> -		 else
> +		else
>  			sdev->bios[i]->next_free = -1;
>  	}
>  	sdev->first_free = 0;
>  	sdev->curr = -1;
>  	atomic_set(&sdev->in_flight, 0);
> +	atomic_set(&sdev->fixup, 0);
>  	atomic_set(&sdev->cancel_req, 0);
>  	sdev->csum_size = btrfs_super_csum_size(&fs_info->super_copy);
>  	INIT_LIST_HEAD(&sdev->csum_list);
> @@ -330,6 +341,141 @@ out:
>  	kfree(swarn.msg_buf);
>  }
>  
> +static int scrub_fixup_readpage(u64 inum, loff_t offset, void *ctx)
> +{
> +	struct page *page;
> +	unsigned long index;
> +	struct scrub_fixup_nodatasum *fixup = ctx;
> +	int ret;
> +	int corrected;
> +	struct btrfs_key key;
> +	struct inode *inode;
> +	int end = offset + PAGE_SIZE - 1;
loff_t to int?
> +
> +	key.type = BTRFS_INODE_ITEM_KEY;
> +	key.objectid = inum;
> +	key.offset = 0;
> +	inode = btrfs_iget(fixup->root->fs_info->sb, &key,
> +				fixup->root->fs_info->fs_root, NULL);
> +	if (IS_ERR(inode))
> +		return -1;
needs better error code
> +
> +	ret = set_extent_bit(&BTRFS_I(inode)->io_tree, offset, end,
> +				EXTENT_DAMAGED, 0, NULL, NULL, GFP_NOFS);
> +	if (ret)
> +		return ret < 0 ? ret : -1;
needs better error code
> +
> +	index = offset >> PAGE_CACHE_SHIFT;
> +
> +	page = find_or_create_page(inode->i_mapping, index, GFP_NOFS);
> +	if (!page)
> +		return -1;
needs better error code
> +
> +	ret = extent_read_full_page(&BTRFS_I(inode)->io_tree, page,
> +					btrfs_get_extent, fixup->mirror_num);
> +	wait_on_page_locked(page);
> +	corrected = !test_range_bit(&BTRFS_I(inode)->io_tree, offset, end,
> +					EXTENT_DAMAGED, 0, NULL);
> +
> +	if (corrected)
> +		WARN_ON(!PageUptodate(page));
> +	else
> +		clear_extent_bit(&BTRFS_I(inode)->io_tree, offset, end,
> +					EXTENT_DAMAGED, 0, 0, NULL, GFP_NOFS);
> +
> +	put_page(page);
> +	iput(inode);
> +
> +	if (ret < 0)
> +		return ret;
> +
> +	if (ret == 0 && corrected) {
> +		/*
> +		 * we only need to call readpage for one of the inodes belonging
> +		 * to this extent. so make iterate_extent_inodes stop
> +		 */
> +		return 1;
> +	}
> +
> +	return -1;
> +}
> +
> +static void scrub_fixup_nodatasum(struct btrfs_work *work)
> +{
> +	int ret;
> +	struct scrub_fixup_nodatasum *fixup;
> +	struct scrub_dev *sdev;
> +	struct btrfs_trans_handle *trans = NULL;
> +	struct btrfs_fs_info *fs_info;
> +	struct btrfs_path *path;
> +	int uncorrectable = 0;
> +
> +	fixup = container_of(work, struct scrub_fixup_nodatasum, work);
> +	sdev = fixup->sdev;
> +	fs_info = fixup->root->fs_info;
> +
> +	path = btrfs_alloc_path();
> +	if (!path) {
> +		spin_lock(&sdev->stat_lock);
> +		++sdev->stat.malloc_errors;
> +		spin_unlock(&sdev->stat_lock);
> +		uncorrectable = 1;
> +		goto out;
> +	}
> +
> +	trans = btrfs_join_transaction(fixup->root);
> +	if (IS_ERR(trans)) {
> +		uncorrectable = 1;
> +		goto out;
> +	}
> +
> +	/*
> +	 * the idea is to trigger a regular read through the standard path. we
> +	 * read a page from the (failed) logical address by specifying the
> +	 * corresponding copynum of the failed sector. thus, that readpage is
> +	 * expected to fail.
> +	 * that is the point where on-the-fly error correction will kick in
> +	 * (once it''s finished) and rewrite the failed sector if a good
copy
> +	 * can be found.
> +	 */
> +	ret = iterate_inodes_from_logical(fixup->logical,
fixup->root->fs_info,
> +						path, scrub_fixup_readpage,
> +						fixup);
> +	if (ret < 0) {
> +		uncorrectable = 1;
> +		goto out;
> +	}
> +	WARN_ON(ret != 1);
> +
> +	spin_lock(&sdev->stat_lock);
> +	++sdev->stat.corrected_errors;
> +	spin_unlock(&sdev->stat_lock);
> +
> +out:
> +	if (trans && !IS_ERR(trans))
> +		btrfs_end_transaction(trans, fixup->root);
> +	if (uncorrectable) {
> +		spin_lock(&sdev->stat_lock);
> +		++sdev->stat.uncorrectable_errors;
> +		spin_unlock(&sdev->stat_lock);
> +		if (printk_ratelimit())
                printk_ratelimited
> +			printk(KERN_ERR "btrfs: unable to fixup (nodatasum) "
> +				"error at logical %llu\n", fixup->logical);
> +	}
> +
> +	btrfs_free_path(path);
> +	kfree(fixup);
> +
> +	/* see caller why we''re pretending to be paused in the scrub
counters */
> +	mutex_lock(&fs_info->scrub_lock);
> +	atomic_dec(&fs_info->scrubs_running);
> +	atomic_dec(&fs_info->scrubs_paused);
> +	mutex_unlock(&fs_info->scrub_lock);
> +	atomic_dec(&sdev->fixup);
> +	wake_up(&fs_info->scrub_pause_wait);
> +	wake_up(&sdev->list_wait);
> +}
> +
>  /*
>   * scrub_recheck_error gets called when either verification of the page
>   * failed or the bio failed to read, e.g. with EIO. In the latter case,
> @@ -398,6 +544,7 @@ static void scrub_fixup(struct scrub_bio *sbio, int ix)
>  	struct btrfs_fs_info *fs_info = sdev->dev->dev_root->fs_info;
>  	struct btrfs_mapping_tree *map_tree = &fs_info->mapping_tree;
>  	struct btrfs_multi_bio *multi = NULL;
> +	struct scrub_fixup_nodatasum *fixup;
>  	u64 logical = sbio->logical + ix * PAGE_SIZE;
>  	u64 length;
>  	int i;
> @@ -406,12 +553,28 @@ static void scrub_fixup(struct scrub_bio *sbio, int
ix)
>  
>  	if ((sbio->spag[ix].flags & BTRFS_EXTENT_FLAG_DATA) &&
>  	    (sbio->spag[ix].have_csum == 0)) {
> +		fixup = kzalloc(sizeof(*fixup), GFP_NOFS);
what happens if the allocation fails?
> +		fixup->sdev = sdev;
> +		fixup->logical = logical;
> +		fixup->root = fs_info->extent_root;
> +		fixup->mirror_num = sbio->spag[ix].mirror_num;
>  		/*
> -		 * nodatasum, don''t try to fix anything
> -		 * FIXME: we can do better, open the inode and trigger a
> -		 * writeback
> +		 * increment scrubs_running to prevent cancel requests from
> +		 * completing as long as a fixup worker is running. we must also
> +		 * increment scrubs_paused to prevent deadlocking on pause
> +		 * requests used for transactions commits (as the worker uses a
> +		 * transaction context). it is safe to regard the fixup worker
> +		 * as paused for all matters practical. effectively, we only
> +		 * avoid cancellation requests from completing.
from a rather brief look, this works as advertised
>  		 */
> -		goto uncorrectable;
> +		mutex_lock(&fs_info->scrub_lock);
> +		atomic_inc(&fs_info->scrubs_running);
> +		atomic_inc(&fs_info->scrubs_paused);
> +		mutex_unlock(&fs_info->scrub_lock);
> +		atomic_inc(&sdev->fixup);
> +		fixup->work.func = scrub_fixup_nodatasum;
> +		btrfs_queue_worker(&fs_info->scrub_workers, &fixup->work);
> +		return;
>  	}
>  
>  	length = PAGE_SIZE;
> @@ -1404,6 +1567,7 @@ int btrfs_scrub_dev(struct btrfs_root *root, u64
devid, u64 start, u64 end,
>  		ret = scrub_enumerate_chunks(sdev, start, end);
>  
>  	wait_event(sdev->list_wait, atomic_read(&sdev->in_flight) ==
0);
> +	wait_event(sdev->list_wait, atomic_read(&sdev->fixup) == 0);
what about joining those into one wait?
>  
>  	atomic_dec(&fs_info->scrubs_running);
>  	wake_up(&fs_info->scrub_pause_wait);--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs"
in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

David Sterba

2011-Jun-17 08:16 UTC

head link

Re: [PATCH v1 2/6] scrub: added unverified_errors

Hi,

sorry for the noise, this has no comments from me,

david

On Thu, Jun 16, 2011 at 11:25:21PM +0200, David Sterba
wrote:> On Mon, Jun 13, 2011 at 09:10:35PM +0200, Jan Schmidt wrote:
> > In normal operation, scrub is reading data sequentially in large
portions.
> > In case of an i/o error, we try to find the corrupted area(s) by
issuing
> > page sized read requests. With this commit we increment the
> > unverified_errors counter if all of the small size requests succeed.
> > 
> > Userland patches carrying such conspicous events to the administrator
should
> > already be around.
> > 
> > Signed-off-by: Jan Schmidt <list.btrfs@jan-o-sch.net>
> > ---
> >  fs/btrfs/scrub.c |   37 ++++++++++++++++++++++++++-----------
> >  1 files changed, 26 insertions(+), 11 deletions(-)
> > 
> > diff --git a/fs/btrfs/scrub.c b/fs/btrfs/scrub.c
> > index d5a4108..00e4e58 100644
> > --- a/fs/btrfs/scrub.c
> > +++ b/fs/btrfs/scrub.c
> > @@ -207,18 +207,25 @@ nomem:
> >   * recheck_error gets called for every page in the bio, even though
only
> >   * one may be bad
> >   */
> > -static void scrub_recheck_error(struct scrub_bio *sbio, int ix)
> > +static int scrub_recheck_error(struct scrub_bio *sbio, int ix)
> >  {
> > +	struct scrub_dev *sdev = sbio->sdev;
> > +	u64 sector = (sbio->physical + ix * PAGE_SIZE) >> 9;
> > +
> >  	if (sbio->err) {
> > -		if (scrub_fixup_io(READ, sbio->sdev->dev->bdev,
> > -				   (sbio->physical + ix * PAGE_SIZE) >> 9,
> > +		if (scrub_fixup_io(READ, sbio->sdev->dev->bdev, sector,
> >  				   sbio->bio->bi_io_vec[ix].bv_page) == 0) {
> >  			if (scrub_fixup_check(sbio, ix) == 0)
> > -				return;
> > +				return 0;
> >  		}
> >  	}
> >  
> > +	spin_lock(&sdev->stat_lock);
> > +	++sdev->stat.read_errors;
> > +	spin_unlock(&sdev->stat_lock);
> > +
> >  	scrub_fixup(sbio, ix);
> > +	return 1;
> >  }
> >  
> >  static int scrub_fixup_check(struct scrub_bio *sbio, int ix)
> > @@ -388,8 +395,14 @@ static void scrub_checksum(struct btrfs_work
*work)
> >  	int ret;
> >  
> >  	if (sbio->err) {
> > +		ret = 0;
> >  		for (i = 0; i < sbio->count; ++i)
> > -			scrub_recheck_error(sbio, i);
> > +			ret |= scrub_recheck_error(sbio, i);
> > +		if (!ret) {
> > +			spin_lock(&sdev->stat_lock);
> > +			++sdev->stat.unverified_errors;
> > +			spin_unlock(&sdev->stat_lock);
> > +		}
> >  
> >  		sbio->bio->bi_flags &= ~(BIO_POOL_MASK - 1);
> >  		sbio->bio->bi_flags |= 1 << BIO_UPTODATE;
> > @@ -402,10 +415,6 @@ static void scrub_checksum(struct btrfs_work
*work)
> >  			bi->bv_offset = 0;
> >  			bi->bv_len = PAGE_SIZE;
> >  		}
> > -
> > -		spin_lock(&sdev->stat_lock);
> > -		++sdev->stat.read_errors;
> > -		spin_unlock(&sdev->stat_lock);
> >  		goto out;
> >  	}
> >  	for (i = 0; i < sbio->count; ++i) {
> > @@ -426,8 +435,14 @@ static void scrub_checksum(struct btrfs_work
*work)
> >  			WARN_ON(1);
> >  		}
> >  		kunmap_atomic(buffer, KM_USER0);
> > -		if (ret)
> > -			scrub_recheck_error(sbio, i);
> > +		if (ret) {
> > +			ret = scrub_recheck_error(sbio, i);
> > +			if (!ret) {
> > +				spin_lock(&sdev->stat_lock);
> > +				++sdev->stat.unverified_errors;
> > +				spin_unlock(&sdev->stat_lock);
> > +			}
> > +		}
> >  	}
> >  
> >  out:
> --
> To unsubscribe from this list: send the line "unsubscribe
linux-btrfs" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs"
in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Jan Schmidt

2011-Jun-18 11:25 UTC

head link

Re: [PATCH v1 1/6] added helper functions to iterate backrefs

Lots of quotation removed. All removed comments accepted.

On 16.06.2011 23:24, David Sterba wrote:> On Mon, Jun 13, 2011 at 09:10:34PM +0200, Jan Schmidt wrote:
>> +/*
>> + * this iterates to turn a btrfs_inode_ref into a full filesystem
path. elements
>> + * of the path are separated by ''/'' and the path is
guaranteed to be
>> + * 0-terminated. the path is only given within the current file
system.
>> + * Therefore, it never starts with a ''/''. the caller
is responsible to provide
>> + * "size" bytes in "dest". the dest buffer will be
filled backwards! the idea is
>> + * that in case of an overflow, the lower part in the hierarchie is
most
>> + * important to the user. finally, the start point of resulting the
string is
>                                                                  ^^^
>> + * returned. in case the path would overflow, "..." is added
at the front of
>> + * the string and iteration stops regularly.
>> + */
>> +static char *iref_to_path(struct btrfs_root *fs_root, struct
btrfs_path *path,
>> +				struct btrfs_inode_ref *iref,
>> +				struct extent_buffer *eb, u64 parent,
>> +				char *dest, u32 size)
>> +{
>> +	u32 len;
>> +	int slot;
>> +	u64 inum;
>> +	int ret;
>> +	u32 left = size - 1;
> 
> this confused me first, ''left'' may be ambiguous,
something like ''bytes_left''
> would be better
> 
>> +
>> +	dest[left] = ''\0'';
>> +
>> +	while (1) {
>> +		len = btrfs_inode_ref_name_len(eb, iref);
>> +		if (len > left) {
>> +			if (size < 4)
>> +				return dest+left;
> 
>                                 break;
> 
>> +			if (left > 3)
>> +				dest += left - 3;
>> +			memcpy(dest, "...", 3);
>> +			return dest;
> 
> i''d rather see this consistent with other return points,
> 
> if (left > 3)
> 	left -= 3;
> else
> 	left = 0;
> memcpy(dest + left, "...", 3);
> break;
> 
>> +		}
>> +		left -= len;
>> +		read_extent_buffer(eb, dest+left,
>> +					(unsigned long)(iref + 1), len);
>> +
>> +		ret = inode_item_info(parent, 0, fs_root, path);
>> +		if (ret)
>> +			return ERR_PTR(ret);
>> +		eb = path->nodes[0];
>> +		btrfs_release_path(path);
>> +
>> +		ret = inode_ref_info(parent, 0, fs_root, path, 0,
>> +					&inum, NULL, &slot);
>> +		if (ret)
>> +			return ERR_PTR(ret);
>> +
>> +		if (parent == inum)   /* regular exit ahead */ {
>> +			return dest+left;
>                         break;
> 
> and please put the comment before the if and remove { }
Damn. Braces where a debugging leftover. I wonder if I missed something
from checkpatch.pl.
>> +		}
>> +
>> +		iref = btrfs_item_ptr(eb, slot, struct btrfs_inode_ref);
>> +		parent = inum;
>> +		if (left > 0) {
>> +			dest[left-1] = ''/'';
>> +			--left;
> 
> swap the lines and remove "-1" :)
> 
>> +		}
>> +	}
>        
>         return dest + left;
Took suggested modifications and it looks prettier with a return at the end.
>> +}
>> +
>> +/*
>> + * this makes the path point to (logical EXTENT_ITEM *)
>> + * returns 0 for data blocks, 1 for tree blocks and <0 on error
>> + */
>> +int data_extent_from_logical(struct btrfs_root *root, u64 logical,
>> +				struct btrfs_path *path,
>> +				struct btrfs_key *found_key)
>> +{
>> +	int ret;
>> +	u64 flags;
>> +	u32 item_size;
>> +	struct extent_buffer *eb;
>> +	struct btrfs_extent_item *ei;
>> +	struct btrfs_key key;
>> +	struct btrfs_key f;
> 
>                          tmpkey;
> 
>> +	struct btrfs_key *fk = found_key ? found_key : &f;
> 
> transform this to
> 
>         if (!found_key)
> 		found_key = &tmpkey;
> 
> and use found_key along
> 	
> still, will you ever pass NULL found_key?
You are right, I probably won''t. I''ll take out those lines
completely.
>> +
>> +	key.type = BTRFS_EXTENT_ITEM_KEY;
>> +	key.objectid = logical;
>> +	key.offset = (u64)-1;
>> +
>> +	ret = btrfs_search_slot(NULL, root, &key, path, 0, 0);
>> +	if (ret < 0)
>> +		return ret;
>> +	ret = btrfs_previous_item(root->fs_info->extent_root, path,
>> +					0, BTRFS_EXTENT_ITEM_KEY);
>> +	if (ret < 0)
>> +		return ret;
>> +	btrfs_item_key_to_cpu(path->nodes[0], fk, path->slots[0]);
>> +	if (fk->type != BTRFS_EXTENT_ITEM_KEY || fk->objectid >
logical ||
>> +	    fk->objectid + fk->offset <= logical)
>> +		return -1;
> 
> this is -EPERM, confusing. eg btrfs_previous_item a few lines above may
return
> variety of error codes from the whole function, EPERM just does not fit. 
does
> this check mean some serious error? inconsistency or corruption?
It doesn''t. -ENOENT seems appropriate.
>> +
>> +	eb = path->nodes[0];
>> +	item_size = btrfs_item_size_nr(eb, path->slots[0]);
>> +	BUG_ON(item_size < sizeof(*ei));
>> +
>> +	ei = btrfs_item_ptr(eb, path->slots[0], struct btrfs_extent_item);
>> +	flags = btrfs_extent_flags(eb, ei);
>> +
>> +	if (flags & BTRFS_EXTENT_FLAG_TREE_BLOCK)
>> +		return 1;
>> +
>> +	return 0;
>> +}
>> +
>> +/*
>> + * helper function to iterate extent backrefs. ptr must point to a 0
value for
>> + * the first call and may be modified. it is used to track state.
>> + * if more backrefs exist, 0 is returned and the next call to
_get_extent_ref
>> + * must pass the modified ptr parameter to get to the next backref.
>> + * after the last backref was processed, 1 is returned.
>> + * returns <0 on error
>> + */
>> +static int _get_extent_ref(int flags_wanted, int type_wanted,
>            __get_extent_ref
> single score function prefixes are not used
> 
> flags_wanted is compared to extent_flags of u64 type
Alongside, I changed type_wanted from int to u8.
>> +				unsigned long *ptr, struct extent_buffer *eb,
>> +				struct btrfs_extent_item *ei, u32 item_size,
>> +				struct btrfs_extent_inline_ref **eiref)
>> +{
>> +	int type;
>> +	int again = 0;
>> +	unsigned long end;
>> +	u64 flags;
>> +	struct btrfs_tree_block_info *info;
>> +
>> +	if (!*ptr) {
>> +		/* first call */
>> +		flags = btrfs_extent_flags(eb, ei);
>> +		if (!(flags & flags_wanted))
>> +			return -1;
> 
> again, better errorcode, EINVAL seems to be the right choice
> 
>> +		if (flags & BTRFS_EXTENT_FLAG_TREE_BLOCK) {
>> +			info = (struct btrfs_tree_block_info *)(ei + 1);
>> +			*eiref = (struct btrfs_extent_inline_ref *)(info + 1);
>> +		} else {
>> +			*eiref = (struct btrfs_extent_inline_ref *)(ei + 1);
> 
> style: remove { } around single if/else statement
That would look horribly unbalanced because the if part needs braces.
>> +		}
> 
>> +		*ptr = (unsigned long)*eiref;
>> +	}
>> +
>> +	end = (unsigned long)ei + item_size;
>> +
>> +again:
>> +	again = 0;
> 
> excessive use of goto detected:
> 
> 	do {
>> +
>> +	*eiref = (struct btrfs_extent_inline_ref *)*ptr;
>> +	type = btrfs_extent_inline_ref_type(eb, *eiref);
>> +
>> +	if (type != type_wanted)
>> +		again = 1;
>> +
>> +	*ptr += btrfs_extent_inline_ref_size(type);
>> +
>> +	WARN_ON(*ptr > end);
>> +	if (*ptr == end)
>> +		return 1; /* last */
>> +
>> +	if (again)
>> +		goto again;
> 	
> 	} while (type != type_wanted);
> 
>> +
>> +	return 0;
>> +}
>> +
>> +/*
>> + * reads the tree block backref for an extent. tree level and root are
returned
>> + * through dest_level and dest_root. ptr must point to a 0 value for
the first
>> + * call and may be modified (see _get_extent_ref comment).
>> + * returns 0 on success, <0 on error. note: in contrast to
_get_extent_ref this
>> + * one never returns 1!
>> + */
>> +int tree_backref_for_extent(unsigned long *ptr, struct extent_buffer
*eb,
>> +				struct btrfs_extent_item *ei, u32 item_size,
>> +				u64 *dest_root, u64 *dest_level)
> 
> there''s not a formal "rule" or something, but
i''ve seen outgoing parameters
> named with ''out_''
> 
> dest_level can be just an int, levels fit in one byte anyway
> 
>> +{
>> +	int ret;
>> +	struct btrfs_tree_block_info *info;
>> +	struct btrfs_extent_inline_ref *eiref;
>> +
>> +	ret = _get_extent_ref(BTRFS_EXTENT_FLAG_TREE_BLOCK,
>> +				BTRFS_TREE_BLOCK_REF_KEY, ptr, eb, ei,
>> +				item_size, &eiref);
>> +	if (ret < 0)
>> +		return ret;
>> +
>> +	WARN_ON(!ret); /* multiple tree backrefs for this extent */
> 
> as we will probably have CONFIG_DEBUG, the comment makes a good candidate
for a
> printk message
I''ll prepare that with a if (!ret) and WARN_ON(1) with printk.
>> +
>> +	info = (struct btrfs_tree_block_info *)(ei + 1);
>> +	*dest_root = btrfs_extent_inline_ref_offset(eb, eiref);
>> +	*dest_level = btrfs_tree_block_level(eb, info);
>> +
>> +	return 0;
>> +}
>> +
>> +static int data_inode_for_extent(unsigned long *ptr, struct
extent_buffer *eb,
>> +					struct btrfs_extent_item *ei,
>> +					u32 item_size, u64 *dest_inum,
>> +					u64 *dest_ioff)
>> +{
>> +	int ret;
>> +	struct btrfs_extent_inline_ref *eiref;
>> +	struct btrfs_extent_data_ref *dref;
>> +
>> +	ret = _get_extent_ref(BTRFS_EXTENT_FLAG_DATA,
BTRFS_EXTENT_DATA_REF_KEY,
>> +				ptr, eb, ei, item_size, &eiref);
>> +	if (ret < 0)
>> +		return ret;
>> +
>> +	dref = (struct btrfs_extent_data_ref *)(&eiref->offset);
>> +	if (btrfs_extent_data_ref_root(eb, dref) != BTRFS_FS_TREE_OBJECTID) {
>> +		WARN_ON(1);
>> +		return -1;
> 
> needs better error code
> 
>> +	}
>> +
>> +	*dest_inum = btrfs_extent_data_ref_objectid(eb, dref);
>> +	*dest_ioff = btrfs_extent_data_ref_offset(eb, dref);
>> +
>> +	return ret;
>> +}
>> +
>> +/*
>> + * calls iterate() for every inode that references the extent
identified by
>> + * the given parameters.
>> + * when the iterator function returns a non-zero value, iteration
stops.
>> + */
>> +int iterate_extent_inodes(struct extent_buffer *eb,
>> +				struct btrfs_extent_item *ei,
>> +				loff_t extent_item_offset, u32 item_size,
>> +				iterate_extent_inodes_t *iterate, void *ctx)
>> +{
>> +	int last;
>> +	u64 inum;
>> +	unsigned long ptr = 0;
>> +	loff_t extent_data_item_offset;
>> +	int ret;
>> +
>> +	do {
>> +		last = data_inode_for_extent(&ptr, eb, ei, item_size, &inum,
>> +						&extent_data_item_offset);
>> +		if (last < 0)
>> +			return last;
>> +
>> +		ret = iterate(inum, extent_item_offset+extent_data_item_offset,
>> +				ctx);
>> +		if (ret)
>> +			return ret;
>> +
>> +	} while (!last);
>> +
>> +	return 0;
>> +}
>> +
>> +/*
>> + * just a convenience function combining data_extent_from_logical and
>> + * iterate_extent_inodes.
>> + */
> 
> a better comment please or none
Opt for the latter. Didn''t feel that would need a comment, anyway.
>> +int iterate_inodes_from_logical(u64 logical, struct btrfs_fs_info
*fs_info,
>> +				struct btrfs_path *path,
>> +				iterate_extent_inodes_t *iterate, void *ctx)
>> +{
>> +	int ret;
>> +	u32 item_size;
>> +	struct extent_buffer *l;
>> +	struct btrfs_extent_item *extent;
>> +	loff_t offset;
>> +	struct btrfs_key found_key;
>> +
>> +	ret = data_extent_from_logical(fs_info->extent_root, logical,
path,
>> +					&found_key);
>> +	if (ret)
>> +		return ret;
> 
> this can ret < 0 in case of error, and 1 for the tree block. not sure if
this
> shouldn''t be handled separately
Well, that''s fine I think. We can''t proceed either way here.
>> +
>> +	offset = logical - found_key.objectid;
>> +	l = path->nodes[0];
>> +	extent = btrfs_item_ptr(l, path->slots[0], struct
btrfs_extent_item);
>> +	item_size = btrfs_item_size_nr(l, path->slots[0]);
>> +	btrfs_release_path(path);
>> +
>> +	ret = iterate_extent_inodes(l, extent, offset, item_size, iterate,
ctx);
>> +
>> +	return ret;
>> +}
>> +
>> +static int iterate_irefs(u64 inum, struct extent_buffer *eb_i,
>> +				struct btrfs_root *fs_root,
>> +				struct btrfs_path *path,
>> +				iterate_irefs_t *iterate, void *ctx)
>> +{
>> +	int ret;
>> +	int slot;
>> +	u32 cur;
>> +	u32 len;
>> +	u32 name_len;
>> +	u64 parent = 0;
>> +	struct extent_buffer *eb_ir;
>> +	struct btrfs_item *item;
>> +	struct btrfs_inode_ref *iref;
>> +
>> +	while (1) {
>> +		ret = inode_ref_info(inum, parent ? parent+1 : 0, fs_root, path,
>> +					1, &parent, &eb_ir, &slot);
>> +		if (ret < 0)
>> +			return ret;
>> +		if (ret)
>> +			break;
>> +
>> +		item = btrfs_item_nr(eb_i, slot);
>> +		iref = btrfs_item_ptr(eb_i, slot, struct btrfs_inode_ref);
>> +
>> +		for (cur = 0; cur < btrfs_item_size(eb_i, item); cur += len) {
>> +			name_len = btrfs_inode_ref_name_len(eb_i, iref);
>> +			ret = iterate(parent, iref, eb_ir, slot, ctx);
>> +			if (ret)
>> +				return ret;
>> +			len = sizeof(*iref) + name_len;
>> +			iref = (struct btrfs_inode_ref *)((char *)iref + len);
>> +		}
>> +	}
> 
> can you estimate how long this loop may take? (eg. number of iterations per
> call) there may be a need for cond_resched()
Number of iterations is limited by the number of backrefs an inode can
have, which is limited by the number of backrefs that fit into one leaf.
As leaves are only 4kb (maybe 8kb soon), I don''t think that requires a
cond_resched(). More input on this appreciated :-)
>> +
>> +	return 0;
>> +}
>> +
>> +/*
>> + * returns 0 if the path could be dumped (probably truncated)
>> + * returns <0 in case of an error
>> + */
>> +static int inode_to_path(u64 inum, struct btrfs_inode_ref *iref,
>> +				struct extent_buffer *eb_ir, int slot,
>> +				void *ctx)
>> +{
>> +	struct inode_fs_paths *ipath = ctx;
>> +	struct extent_buffer *eb_i = ipath->eb_i;
>> +	u32 path_len;
>> +	char *fs_path;
>> +
>> +	if (ipath->left < 2)
>> +		return -EOVERFLOW;
>> +
>> +	*ipath->dest++ = '' '';
>> +	--ipath->left;
>> +
>> +	fs_path = iref_to_path(ipath->fs_root, ipath->path, iref, eb_i,
>> +				inum, ipath->scratch_buf, ipath->left);
> 
> fs_path will be never NULL, either an IS_ERR or ipath->scratch_buf +
some
> offset
You are right, that changed while developing, I''ll remove the check. In
this implementation ...
>> +	if (!fs_path || IS_ERR(fs_path))
>> +		return PTR_ERR(fs_path);
> 
> this could return PTR_ERR(0), confusing with ''return 0''
for the calle
... that comment is void then.
>> +	path_len = ipath->scratch_buf + ipath->left - fs_path - 1;
>> +	if (path_len+1 > ipath->left)
>> +		return -EOVERFLOW;
>> +	memcpy(ipath->dest, fs_path, path_len+1);
>> +	ipath->left -= path_len;
>> +	ipath->dest += path_len;
>> +
>> +	return 0;
>> +}
>> +
>> +int paths_from_inode(u64 inum, struct inode_fs_paths *ipath)
>> +{
>> +	return iterate_irefs(inum, ipath->eb_i, ipath->fs_root,
ipath->path,
>> +				inode_to_path, ipath);
>> +}
>> diff --git a/fs/btrfs/backref.h b/fs/btrfs/backref.h
>> new file mode 100644
>> index 0000000..6b7e856
>> --- /dev/null
>> +++ b/fs/btrfs/backref.h
>> @@ -0,0 +1,59 @@
>> +/*
>> + * Copyright (C) 2011 STRATO.  All rights reserved.
>> + *
>> + * This program is free software; you can redistribute it and/or
>> + * modify it under the terms of the GNU General Public
>> + * License v2 as published by the Free Software Foundation.
>> + *
>> + * This program is distributed in the hope that it will be useful,
>> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
>> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
>> + * General Public License for more details.
>> + *
>> + * You should have received a copy of the GNU General Public
>> + * License along with this program; if not, write to the
>> + * Free Software Foundation, Inc., 59 Temple Place - Suite 330,
>> + * Boston, MA 021110-1307, USA.
>> + */
>> +
>> +#ifndef __BTRFS_BACKREF__
>> +#define __BTRFS_BACKREF__
>> +
>> +struct inode_fs_paths {
>> +	int			left;
> 
> as suggested above, rename this
> 
>> +	char			*dest;
>> +	struct btrfs_path	*path;
>> +	char			*scratch_buf;
>> +	struct btrfs_root	*fs_root;
>> +	int			scratch_bufsize;
>> +	struct extent_buffer	*eb_i;
>> +};
>> +
>> +typedef int (iterate_extent_inodes_t)(u64 inum, loff_t offset, void
*ctx);
>> +typedef int (iterate_irefs_t)(u64 parent, struct btrfs_inode_ref
*iref,
>> +				struct extent_buffer *eb_ir,
>> +				int slot, void *ctx);
>> +
>> +int inode_item_info(u64 inum, u64 ioff, struct btrfs_root *fs_root,
>> +			struct btrfs_path *path);
>> +
>> +int data_extent_from_logical(struct btrfs_root *root, u64 logical,
>> +				struct btrfs_path *path,
>> +				struct btrfs_key *found_key);
>> +
>> +int tree_backref_for_extent(unsigned long *ptr, struct extent_buffer
*eb,
>> +				struct btrfs_extent_item *ei, u32 item_size,
>> +				u64 *dest_root, u64 *dest_level);
>> +
>> +int iterate_extent_inodes(struct extent_buffer *eb,
>> +				struct btrfs_extent_item *ei,
>> +				loff_t extent_item_offset, u32 item_size,
>> +				iterate_extent_inodes_t *iterate, void *ctx);
>> +
>> +int iterate_inodes_from_logical(u64 logical, struct btrfs_fs_info
*fs_info,
>> +				struct btrfs_path *path,
>> +				iterate_extent_inodes_t *iterate, void *ctx);
>> +
>> +int paths_from_inode(u64 inum, struct inode_fs_paths *ipath);
> 
> every function from the .c file not listed here should be static
I agree on that. Double checked and didn''t find any that
isn''t.

Thanks!
Jan
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs"
in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Jan Schmidt

2011-Jun-18 11:26 UTC

head link

Re: [PATCH v1 3/6] scrub: print paths of corrupted files

Lots of quotation removed. All removed comments accepted.

On 16.06.2011 23:26, David Sterba wrote:> On Mon, Jun 13, 2011 at 09:10:36PM +0200, Jan Schmidt wrote:
>> +struct scrub_warning {
>> +	struct btrfs_path	*path;
>> +	u64			extent_item_size;
>> +	char			*scratch_buf;
>> +	char			*msg_buf;
>> +	const char		*errstr;
>> +	u64			sector;
> 
> better named ''physical'', like in scrub_bio
As I save a sector there and not a physical address, I still prefer
"sector". But as a compromise, I''ll change the type to
sector_t instead.
>> +static void scrub_print_warning(const char *errstr, struct scrub_bio
*sbio,
>> +				int ix)
> 
> please rename ''ix'' to eg ''idx'', more
readable
I''d like to stay consistent with the rest of the file, using ix at
similar opportunities. And besides: I like ix more than idx :-)
>> +{
>> +	struct btrfs_device *dev = sbio->sdev->dev;
>> +	struct btrfs_fs_info *fs_info = dev->dev_root->fs_info;
>> +	struct btrfs_path *path;
>> +	struct btrfs_key found_key;
>> +	struct extent_buffer *eb;
>> +	struct btrfs_extent_item *ei;
>> +	struct scrub_warning swarn;
>> +	u32 item_size;
>> +	int ret;
>> +	u64 ref_root;
> 
> i''m not able to understand what a u64 value describing a root
means, seeing it
> here or in the printk message. it''s returned as 
> 
> 	*dest_root = btrfs_extent_inline_ref_offset(eb, eiref);
> 
> can you please explain it to me?
Have a look at the BTRFS_*_TREE_OBJECTID #defines in ctree.h.
btrfs-debug-tree prints them as
                tree block backref root 2
>> +	u64 ref_level;
> 
> int is enough, as noted elsewhere
> 
>> +	unsigned long ptr = 0;
>> +	static const int bufsize = 4096;
> 
> drop static, this would allocate a memory cell to hold the value 4096,
while
> without it, it''s just a named constant and this will make the
compiler happy
> 
>> +	loff_t extent_item_offset;
> 
> loff_t is a signed long long, do you really need it?
Seems I don''t. I was looking around how I came to use loff_t in the
first place, but couldn''t find a good reason. I''m replacing it
with u64,
which makes more sense anyway, since the loff_t parameter I''m passing
around is the result of a subtraction of two u64s.
>> +
>> +	path = btrfs_alloc_path();
>> +
>> +	swarn.scratch_buf = kmalloc(bufsize, GFP_NOFS);
>> +	swarn.msg_buf = kmalloc(bufsize, GFP_NOFS);
>> +	swarn.sector = (sbio->physical + ix * PAGE_SIZE) >> 9;
>> +	swarn.logical = sbio->logical + ix * PAGE_SIZE;
>> +	swarn.errstr = errstr;
>> +	swarn.dev = dev;
>> +	swarn.msg_bufsize = bufsize;
>> +	swarn.scratch_bufsize = bufsize;
>> +
>> +	if (!path || !swarn.scratch_buf || !swarn.msg_buf)
>> +		goto out;
>> +
>> +	ret = data_extent_from_logical(fs_info->extent_root,
>> +					swarn.logical, path, &found_key);
>> +	if (ret < 0)
>> +		goto out;
>> +
>> +	extent_item_offset = swarn.logical - found_key.objectid;
>> +	swarn.extent_item_size = found_key.offset;
>> +
>> +	eb = path->nodes[0];
>> +	ei = btrfs_item_ptr(eb, path->slots[0], struct btrfs_extent_item);
>> +	item_size = btrfs_item_size_nr(eb, path->slots[0]);
>> +
>> +	btrfs_release_path(path);
> 
> is it safe to release the path here? it''s used in the else branch
below
Yes, and it''s even required, since iterate_extent_inodes() wants a
clean, usable path for itself. To make that more clear, I''ll move the
call to btrfs_release_path inside the else block.
>> +
>> +	if (ret) {
>> +		ret = tree_backref_for_extent(&ptr, eb, ei, item_size,
>> +						&ref_root, &ref_level);
>> +		printk(KERN_WARNING "%s at logical %llu on dev %s, "
>> +			"sector %llu: metadata %s (level %llu) in tree %llu\n",
>> +			errstr, swarn.logical, dev->name, swarn.sector,
>> +			ref_level ? "node" : "leaf", ret < 0 ? -1 :
ref_level,
>> +			ret < 0 ? -1 : ref_root);
>> +	} else {
>> +		swarn.path = path;
>> +		iterate_extent_inodes(eb, ei, extent_item_offset, item_size,
>> +					scrub_print_warning_inode, &swarn);
>> +	}
>> +
>> +out:
>> +	btrfs_free_path(path);
>> +	kfree(swarn.scratch_buf);
>> +	kfree(swarn.msg_buf);
>> +}
>> +
>>  /*
>>   * scrub_recheck_error gets called when either verification of the
page
>>   * failed or the bio failed to read, e.g. with EIO. In the latter
case,
>> @@ -218,6 +347,11 @@ static int scrub_recheck_error(struct scrub_bio
*sbio, int ix)
>>  			if (scrub_fixup_check(sbio, ix) == 0)
>>  				return 0;
>>  		}
>> +		if (printk_ratelimit())
> 
> please don''t use printk_ratelimit; as a simple printk_ratelimited
is not
> applicable here, we have to use __ratelimit: (include/linux/printk.h)
> 
> static DEFINE_RATELIMIT_STATE(_rs, DEFAULT_RATELIMIT_INTERVAL,
> 	DEFAULT_RATELIMIT_BURST);
> 
> if (__ratelimit(&_rs))
Modified as you suggested, I only included linux/ratelimit.h instead.

Thanks!
Jan
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs"
in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Jan Schmidt

2011-Jun-18 11:26 UTC

head link

Re: [PATCH v1 6/6] scrub: add fixup code for errors on nodatasum files

Lots of quotation removed. All removed comments accepted.

On 16.06.2011 23:27, David Sterba wrote:> On Mon, Jun 13, 2011 at 09:10:39PM +0200, Jan Schmidt wrote:
>> +struct scrub_fixup_nodatasum {
>> +	struct scrub_dev	*sdev;
>> +	u64			logical;
>> +	struct btrfs_root	*root;
>> +	struct btrfs_work	work;
>> +	u64			mirror_num;
> 
> int should be enough and is used elsewhere but scrub_page, uses u64 as
well;
> that''s a bit too much IMO
You are right. scrub.c should be changed to use int mirror_num all over.
I''ll integrate a cleanup patch in my next patch series as patch 6/7 and
this one can become a clean 7/7 then.
>> +};
>> +
>>  struct scrub_warning {
>>  	struct btrfs_path	*path;
>>  	u64			extent_item_size;
>> @@ -195,12 +205,13 @@ struct scrub_dev *scrub_setup_dev(struct
btrfs_device *dev)
>>  
>>  		if (i != SCRUB_BIOS_PER_DEV-1)
>>  			sdev->bios[i]->next_free = i + 1;
>> -		 else
>> +		else
>>  			sdev->bios[i]->next_free = -1;
>>  	}
>>  	sdev->first_free = 0;
>>  	sdev->curr = -1;
>>  	atomic_set(&sdev->in_flight, 0);
>> +	atomic_set(&sdev->fixup, 0);
>>  	atomic_set(&sdev->cancel_req, 0);
>>  	sdev->csum_size =
btrfs_super_csum_size(&fs_info->super_copy);
>>  	INIT_LIST_HEAD(&sdev->csum_list);
>> @@ -330,6 +341,141 @@ out:
>>  	kfree(swarn.msg_buf);
>>  }
>>  
>> +static int scrub_fixup_readpage(u64 inum, loff_t offset, void *ctx)
>> +{
>> +	struct page *page;
>> +	unsigned long index;
>> +	struct scrub_fixup_nodatasum *fixup = ctx;
>> +	int ret;
>> +	int corrected;
>> +	struct btrfs_key key;
>> +	struct inode *inode;
>> +	int end = offset + PAGE_SIZE - 1;
> 
> loff_t to int?
> 
>> +
>> +	key.type = BTRFS_INODE_ITEM_KEY;
>> +	key.objectid = inum;
>> +	key.offset = 0;
>> +	inode = btrfs_iget(fixup->root->fs_info->sb, &key,
>> +				fixup->root->fs_info->fs_root, NULL);
>> +	if (IS_ERR(inode))
>> +		return -1;
> 
> needs better error code
> 
>> +
>> +	ret = set_extent_bit(&BTRFS_I(inode)->io_tree, offset, end,
>> +				EXTENT_DAMAGED, 0, NULL, NULL, GFP_NOFS);
>> +	if (ret)
>> +		return ret < 0 ? ret : -1;
> 
> needs better error code
Hum. That one is tricky. set_extent_bit can in theory return anything,
as there is at least one hook that can be called. If it returns
non-zero, that will become the return value of set_extent_bit. As far as
I see, that hook will always return 0 at the moment. In case that hook
decides to return something >0, I still want iterate_extent_inodes() to
terminate iteration. I think adding a WARN_ON for that and returning
-EFAULT is an option.
>>  		/*
>> -		 * nodatasum, don''t try to fix anything
>> -		 * FIXME: we can do better, open the inode and trigger a
>> -		 * writeback
>> +		 * increment scrubs_running to prevent cancel requests from
>> +		 * completing as long as a fixup worker is running. we must also
>> +		 * increment scrubs_paused to prevent deadlocking on pause
>> +		 * requests used for transactions commits (as the worker uses a
>> +		 * transaction context). it is safe to regard the fixup worker
>> +		 * as paused for all matters practical. effectively, we only
>> +		 * avoid cancellation requests from completing.
> 
> from a rather brief look, this works as advertised
> 
>>  		 */
Bonus points for thinking about that :-)

Thanks!
Jan
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs"
in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Btrfs devel - Jun 2011 - [PATCH v1 0/6] Btrfs: scrub: print path to corrupted files and trigger nodatasum fixup

[PATCH v1 0/6] Btrfs: scrub: print path to corrupted files and trigger nodatasum fixup

[PATCH v1 1/6] added helper functions to iterate backrefs

[PATCH v1 2/6] scrub: added unverified_errors

[PATCH v1 3/6] scrub: print paths of corrupted files

[PATCH v1 4/6] scrub: bugfix: mirror_num off by one

[PATCH v1 5/6] add mirror_num to extent_read_full_page

[PATCH v1 6/6] scrub: add fixup code for errors on nodatasum files

Re: [PATCH v1 1/6] added helper functions to iterate backrefs

Re: [PATCH v1 2/6] scrub: added unverified_errors

Re: [PATCH v1 3/6] scrub: print paths of corrupted files

Re: [PATCH v1 6/6] scrub: add fixup code for errors on nodatasum files

Re: [PATCH v1 2/6] scrub: added unverified_errors

Re: [PATCH v1 1/6] added helper functions to iterate backrefs

Re: [PATCH v1 3/6] scrub: print paths of corrupted files

Re: [PATCH v1 6/6] scrub: add fixup code for errors on nodatasum files