Jan Schmidt
2011-Jun-13 19:10 UTC
[PATCH v1 0/6] Btrfs: scrub: print path to corrupted files and trigger nodatasum fixup
This patch set introduces two new features for scrub. They share the backref iteration code which is the reason they made it into the same patch set. The first feature adds printk statements in case scrub finds an error which list all affected files. You will need patch 1, 2 and 3 for that. The second feature adds the trigger which enables us to correct i/o errors in case the affected extent does not have a checksum (nodatasum), eventually. You will need patch 1, 4, 5 and 6 for that. I tried to apply all patches to the current cmason/for-linus branch and to Arne''s current for-chris branch. They do apply with no errors (some offsets possible). Please review. Next I''m starting to make up my mind how to implement on-the-fly error correction correctly. This will enable us to rewrite good data whenever we encounter a bad copy. I have some preliminary patches already, the stress in the first sentence is on "correctly". The second feature mentioned in this patch series will then automatically use that code, too. -Jan Jan Schmidt (6): added helper functions to iterate backrefs scrub: added unverified_errors scrub: print paths of corrupted files scrub: bugfix: mirror_num off by one add mirror_num to extent_read_full_page scrub: add fixup code for errors on nodatasum files fs/btrfs/Makefile | 3 +- fs/btrfs/backref.c | 461 ++++++++++++++++++++++++++++++++++++++++++++++++++ fs/btrfs/backref.h | 59 +++++++ fs/btrfs/disk-io.c | 2 +- fs/btrfs/extent_io.c | 6 +- fs/btrfs/extent_io.h | 3 +- fs/btrfs/inode.c | 2 +- fs/btrfs/scrub.c | 361 ++++++++++++++++++++++++++++++++++++--- 8 files changed, 866 insertions(+), 31 deletions(-) create mode 100644 fs/btrfs/backref.c create mode 100644 fs/btrfs/backref.h -- 1.7.3.4 -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Jan Schmidt
2011-Jun-13 19:10 UTC
[PATCH v1 1/6] added helper functions to iterate backrefs
These helper functions iterate back references and call a function for each backref. There is also a function to resolve an inode to a path in the file system. Signed-off-by: Jan Schmidt <list.btrfs@jan-o-sch.net> --- fs/btrfs/Makefile | 3 +- fs/btrfs/backref.c | 461 ++++++++++++++++++++++++++++++++++++++++++++++++++++ fs/btrfs/backref.h | 59 +++++++ 3 files changed, 522 insertions(+), 1 deletions(-) diff --git a/fs/btrfs/Makefile b/fs/btrfs/Makefile index 9b72dcf..c63f649 100644 --- a/fs/btrfs/Makefile +++ b/fs/btrfs/Makefile @@ -7,4 +7,5 @@ btrfs-y += super.o ctree.o extent-tree.o print-tree.o root-tree.o dir-item.o \ extent_map.o sysfs.o struct-funcs.o xattr.o ordered-data.o \ extent_io.o volumes.o async-thread.o ioctl.o locking.o orphan.o \ export.o tree-log.o acl.o free-space-cache.o zlib.o lzo.o \ - compression.o delayed-ref.o relocation.o delayed-inode.o scrub.o + compression.o delayed-ref.o relocation.o delayed-inode.o backref.o \ + scrub.o diff --git a/fs/btrfs/backref.c b/fs/btrfs/backref.c new file mode 100644 index 0000000..307dfb5 --- /dev/null +++ b/fs/btrfs/backref.c @@ -0,0 +1,461 @@ +/* + * Copyright (C) 2011 STRATO. All rights reserved. + * + * This program is free software; you can redistribute it and/or + * modify it under the terms of the GNU General Public + * License v2 as published by the Free Software Foundation. + * + * This program is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU + * General Public License for more details. + * + * You should have received a copy of the GNU General Public + * License along with this program; if not, write to the + * Free Software Foundation, Inc., 59 Temple Place - Suite 330, + * Boston, MA 021110-1307, USA. + */ + +#include "ctree.h" +#include "backref.h" + +/* + * this makes the path point to (inum INODE_ITEM ioff) + */ +int inode_item_info(u64 inum, u64 ioff, struct btrfs_root *fs_root, + struct btrfs_path *path) +{ + int ret; + struct btrfs_key key; + struct extent_buffer *eb; + struct btrfs_key found_key; + + key.type = BTRFS_INODE_ITEM_KEY; + key.objectid = inum; + key.offset = ioff; + + ret = btrfs_search_slot(NULL, fs_root, &key, path, 0, 0); + if (ret < 0) + return ret; + + eb = path->nodes[0]; + if (ret && path->slots[0] >= btrfs_header_nritems(eb)) { + ret = btrfs_next_leaf(fs_root, path); + if (ret) + return ret; + eb = path->nodes[0]; + } + + btrfs_item_key_to_cpu(eb, &found_key, path->slots[0]); + if (found_key.type != key.type || found_key.objectid != key.objectid) + return 1; + + return 0; +} + +static int inode_ref_info(u64 inum, u64 ioff, struct btrfs_root *fs_root, + struct btrfs_path *path, int strict, + u64 *dest_parent_inum, + struct extent_buffer **dest_iref_eb, + int *dest_slot) +{ + int ret; + struct btrfs_key key; + struct extent_buffer *eb; + struct btrfs_key found_key; + + key.type = BTRFS_INODE_REF_KEY; + key.objectid = inum; + key.offset = ioff; + + ret = btrfs_search_slot(NULL, fs_root, &key, path, 0, 0); + if (ret < 0) + goto out; + + eb = path->nodes[0]; + if (ret && path->slots[0] >= btrfs_header_nritems(eb)) { + ret = btrfs_next_leaf(fs_root, path); + if (ret) + goto out; + eb = path->nodes[0]; + } + + btrfs_item_key_to_cpu(eb, &found_key, path->slots[0]); + if (found_key.type != key.type || found_key.objectid != key.objectid) { + ret = 1; + goto out; + } + + ret = 0; + if (dest_parent_inum) + *dest_parent_inum = found_key.offset; + if (dest_iref_eb) + *dest_iref_eb = eb; + if (dest_slot) + *dest_slot = path->slots[0]; + +out: + btrfs_release_path(path); + return ret; +} + +/* + * this iterates to turn a btrfs_inode_ref into a full filesystem path. elements + * of the path are separated by ''/'' and the path is guaranteed to be + * 0-terminated. the path is only given within the current file system. + * Therefore, it never starts with a ''/''. the caller is responsible to provide + * "size" bytes in "dest". the dest buffer will be filled backwards! the idea is + * that in case of an overflow, the lower part in the hierarchie is most + * important to the user. finally, the start point of resulting the string is + * returned. in case the path would overflow, "..." is added at the front of + * the string and iteration stops regularly. + */ +static char *iref_to_path(struct btrfs_root *fs_root, struct btrfs_path *path, + struct btrfs_inode_ref *iref, + struct extent_buffer *eb, u64 parent, + char *dest, u32 size) +{ + u32 len; + int slot; + u64 inum; + int ret; + u32 left = size - 1; + + dest[left] = ''\0''; + + while (1) { + len = btrfs_inode_ref_name_len(eb, iref); + if (len > left) { + if (size < 4) + return dest+left; + if (left > 3) + dest += left - 3; + memcpy(dest, "...", 3); + return dest; + } + left -= len; + read_extent_buffer(eb, dest+left, + (unsigned long)(iref + 1), len); + + ret = inode_item_info(parent, 0, fs_root, path); + if (ret) + return ERR_PTR(ret); + eb = path->nodes[0]; + btrfs_release_path(path); + + ret = inode_ref_info(parent, 0, fs_root, path, 0, + &inum, NULL, &slot); + if (ret) + return ERR_PTR(ret); + + if (parent == inum) /* regular exit ahead */ { + return dest+left; + } + + iref = btrfs_item_ptr(eb, slot, struct btrfs_inode_ref); + parent = inum; + if (left > 0) { + dest[left-1] = ''/''; + --left; + } + } +} + +/* + * this makes the path point to (logical EXTENT_ITEM *) + * returns 0 for data blocks, 1 for tree blocks and <0 on error + */ +int data_extent_from_logical(struct btrfs_root *root, u64 logical, + struct btrfs_path *path, + struct btrfs_key *found_key) +{ + int ret; + u64 flags; + u32 item_size; + struct extent_buffer *eb; + struct btrfs_extent_item *ei; + struct btrfs_key key; + struct btrfs_key f; + struct btrfs_key *fk = found_key ? found_key : &f; + + key.type = BTRFS_EXTENT_ITEM_KEY; + key.objectid = logical; + key.offset = (u64)-1; + + ret = btrfs_search_slot(NULL, root, &key, path, 0, 0); + if (ret < 0) + return ret; + ret = btrfs_previous_item(root->fs_info->extent_root, path, + 0, BTRFS_EXTENT_ITEM_KEY); + if (ret < 0) + return ret; + btrfs_item_key_to_cpu(path->nodes[0], fk, path->slots[0]); + if (fk->type != BTRFS_EXTENT_ITEM_KEY || fk->objectid > logical || + fk->objectid + fk->offset <= logical) + return -1; + + eb = path->nodes[0]; + item_size = btrfs_item_size_nr(eb, path->slots[0]); + BUG_ON(item_size < sizeof(*ei)); + + ei = btrfs_item_ptr(eb, path->slots[0], struct btrfs_extent_item); + flags = btrfs_extent_flags(eb, ei); + + if (flags & BTRFS_EXTENT_FLAG_TREE_BLOCK) + return 1; + + return 0; +} + +/* + * helper function to iterate extent backrefs. ptr must point to a 0 value for + * the first call and may be modified. it is used to track state. + * if more backrefs exist, 0 is returned and the next call to _get_extent_ref + * must pass the modified ptr parameter to get to the next backref. + * after the last backref was processed, 1 is returned. + * returns <0 on error + */ +static int _get_extent_ref(int flags_wanted, int type_wanted, + unsigned long *ptr, struct extent_buffer *eb, + struct btrfs_extent_item *ei, u32 item_size, + struct btrfs_extent_inline_ref **eiref) +{ + int type; + int again = 0; + unsigned long end; + u64 flags; + struct btrfs_tree_block_info *info; + + if (!*ptr) { + /* first call */ + flags = btrfs_extent_flags(eb, ei); + if (!(flags & flags_wanted)) + return -1; + if (flags & BTRFS_EXTENT_FLAG_TREE_BLOCK) { + info = (struct btrfs_tree_block_info *)(ei + 1); + *eiref = (struct btrfs_extent_inline_ref *)(info + 1); + } else { + *eiref = (struct btrfs_extent_inline_ref *)(ei + 1); + } + *ptr = (unsigned long)*eiref; + } + + end = (unsigned long)ei + item_size; + +again: + again = 0; + + *eiref = (struct btrfs_extent_inline_ref *)*ptr; + type = btrfs_extent_inline_ref_type(eb, *eiref); + + if (type != type_wanted) + again = 1; + + *ptr += btrfs_extent_inline_ref_size(type); + + WARN_ON(*ptr > end); + if (*ptr == end) + return 1; /* last */ + + if (again) + goto again; + + return 0; +} + +/* + * reads the tree block backref for an extent. tree level and root are returned + * through dest_level and dest_root. ptr must point to a 0 value for the first + * call and may be modified (see _get_extent_ref comment). + * returns 0 on success, <0 on error. note: in contrast to _get_extent_ref this + * one never returns 1! + */ +int tree_backref_for_extent(unsigned long *ptr, struct extent_buffer *eb, + struct btrfs_extent_item *ei, u32 item_size, + u64 *dest_root, u64 *dest_level) +{ + int ret; + struct btrfs_tree_block_info *info; + struct btrfs_extent_inline_ref *eiref; + + ret = _get_extent_ref(BTRFS_EXTENT_FLAG_TREE_BLOCK, + BTRFS_TREE_BLOCK_REF_KEY, ptr, eb, ei, + item_size, &eiref); + if (ret < 0) + return ret; + + WARN_ON(!ret); /* multiple tree backrefs for this extent */ + + info = (struct btrfs_tree_block_info *)(ei + 1); + *dest_root = btrfs_extent_inline_ref_offset(eb, eiref); + *dest_level = btrfs_tree_block_level(eb, info); + + return 0; +} + +static int data_inode_for_extent(unsigned long *ptr, struct extent_buffer *eb, + struct btrfs_extent_item *ei, + u32 item_size, u64 *dest_inum, + u64 *dest_ioff) +{ + int ret; + struct btrfs_extent_inline_ref *eiref; + struct btrfs_extent_data_ref *dref; + + ret = _get_extent_ref(BTRFS_EXTENT_FLAG_DATA, BTRFS_EXTENT_DATA_REF_KEY, + ptr, eb, ei, item_size, &eiref); + if (ret < 0) + return ret; + + dref = (struct btrfs_extent_data_ref *)(&eiref->offset); + if (btrfs_extent_data_ref_root(eb, dref) != BTRFS_FS_TREE_OBJECTID) { + WARN_ON(1); + return -1; + } + + *dest_inum = btrfs_extent_data_ref_objectid(eb, dref); + *dest_ioff = btrfs_extent_data_ref_offset(eb, dref); + + return ret; +} + +/* + * calls iterate() for every inode that references the extent identified by + * the given parameters. + * when the iterator function returns a non-zero value, iteration stops. + */ +int iterate_extent_inodes(struct extent_buffer *eb, + struct btrfs_extent_item *ei, + loff_t extent_item_offset, u32 item_size, + iterate_extent_inodes_t *iterate, void *ctx) +{ + int last; + u64 inum; + unsigned long ptr = 0; + loff_t extent_data_item_offset; + int ret; + + do { + last = data_inode_for_extent(&ptr, eb, ei, item_size, &inum, + &extent_data_item_offset); + if (last < 0) + return last; + + ret = iterate(inum, extent_item_offset+extent_data_item_offset, + ctx); + if (ret) + return ret; + + } while (!last); + + return 0; +} + +/* + * just a convenience function combining data_extent_from_logical and + * iterate_extent_inodes. + */ +int iterate_inodes_from_logical(u64 logical, struct btrfs_fs_info *fs_info, + struct btrfs_path *path, + iterate_extent_inodes_t *iterate, void *ctx) +{ + int ret; + u32 item_size; + struct extent_buffer *l; + struct btrfs_extent_item *extent; + loff_t offset; + struct btrfs_key found_key; + + ret = data_extent_from_logical(fs_info->extent_root, logical, path, + &found_key); + if (ret) + return ret; + + offset = logical - found_key.objectid; + l = path->nodes[0]; + extent = btrfs_item_ptr(l, path->slots[0], struct btrfs_extent_item); + item_size = btrfs_item_size_nr(l, path->slots[0]); + btrfs_release_path(path); + + ret = iterate_extent_inodes(l, extent, offset, item_size, iterate, ctx); + + return ret; +} + +static int iterate_irefs(u64 inum, struct extent_buffer *eb_i, + struct btrfs_root *fs_root, + struct btrfs_path *path, + iterate_irefs_t *iterate, void *ctx) +{ + int ret; + int slot; + u32 cur; + u32 len; + u32 name_len; + u64 parent = 0; + struct extent_buffer *eb_ir; + struct btrfs_item *item; + struct btrfs_inode_ref *iref; + + while (1) { + ret = inode_ref_info(inum, parent ? parent+1 : 0, fs_root, path, + 1, &parent, &eb_ir, &slot); + if (ret < 0) + return ret; + if (ret) + break; + + item = btrfs_item_nr(eb_i, slot); + iref = btrfs_item_ptr(eb_i, slot, struct btrfs_inode_ref); + + for (cur = 0; cur < btrfs_item_size(eb_i, item); cur += len) { + name_len = btrfs_inode_ref_name_len(eb_i, iref); + ret = iterate(parent, iref, eb_ir, slot, ctx); + if (ret) + return ret; + len = sizeof(*iref) + name_len; + iref = (struct btrfs_inode_ref *)((char *)iref + len); + } + } + + return 0; +} + +/* + * returns 0 if the path could be dumped (probably truncated) + * returns <0 in case of an error + */ +static int inode_to_path(u64 inum, struct btrfs_inode_ref *iref, + struct extent_buffer *eb_ir, int slot, + void *ctx) +{ + struct inode_fs_paths *ipath = ctx; + struct extent_buffer *eb_i = ipath->eb_i; + u32 path_len; + char *fs_path; + + if (ipath->left < 2) + return -EOVERFLOW; + + *ipath->dest++ = '' ''; + --ipath->left; + + fs_path = iref_to_path(ipath->fs_root, ipath->path, iref, eb_i, + inum, ipath->scratch_buf, ipath->left); + if (!fs_path || IS_ERR(fs_path)) + return PTR_ERR(fs_path); + path_len = ipath->scratch_buf + ipath->left - fs_path - 1; + if (path_len+1 > ipath->left) + return -EOVERFLOW; + memcpy(ipath->dest, fs_path, path_len+1); + ipath->left -= path_len; + ipath->dest += path_len; + + return 0; +} + +int paths_from_inode(u64 inum, struct inode_fs_paths *ipath) +{ + return iterate_irefs(inum, ipath->eb_i, ipath->fs_root, ipath->path, + inode_to_path, ipath); +} diff --git a/fs/btrfs/backref.h b/fs/btrfs/backref.h new file mode 100644 index 0000000..6b7e856 --- /dev/null +++ b/fs/btrfs/backref.h @@ -0,0 +1,59 @@ +/* + * Copyright (C) 2011 STRATO. All rights reserved. + * + * This program is free software; you can redistribute it and/or + * modify it under the terms of the GNU General Public + * License v2 as published by the Free Software Foundation. + * + * This program is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU + * General Public License for more details. + * + * You should have received a copy of the GNU General Public + * License along with this program; if not, write to the + * Free Software Foundation, Inc., 59 Temple Place - Suite 330, + * Boston, MA 021110-1307, USA. + */ + +#ifndef __BTRFS_BACKREF__ +#define __BTRFS_BACKREF__ + +struct inode_fs_paths { + int left; + char *dest; + struct btrfs_path *path; + char *scratch_buf; + struct btrfs_root *fs_root; + int scratch_bufsize; + struct extent_buffer *eb_i; +}; + +typedef int (iterate_extent_inodes_t)(u64 inum, loff_t offset, void *ctx); +typedef int (iterate_irefs_t)(u64 parent, struct btrfs_inode_ref *iref, + struct extent_buffer *eb_ir, + int slot, void *ctx); + +int inode_item_info(u64 inum, u64 ioff, struct btrfs_root *fs_root, + struct btrfs_path *path); + +int data_extent_from_logical(struct btrfs_root *root, u64 logical, + struct btrfs_path *path, + struct btrfs_key *found_key); + +int tree_backref_for_extent(unsigned long *ptr, struct extent_buffer *eb, + struct btrfs_extent_item *ei, u32 item_size, + u64 *dest_root, u64 *dest_level); + +int iterate_extent_inodes(struct extent_buffer *eb, + struct btrfs_extent_item *ei, + loff_t extent_item_offset, u32 item_size, + iterate_extent_inodes_t *iterate, void *ctx); + +int iterate_inodes_from_logical(u64 logical, struct btrfs_fs_info *fs_info, + struct btrfs_path *path, + iterate_extent_inodes_t *iterate, void *ctx); + +int paths_from_inode(u64 inum, struct inode_fs_paths *ipath); + +#endif -- 1.7.3.4 -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
In normal operation, scrub is reading data sequentially in large portions. In case of an i/o error, we try to find the corrupted area(s) by issuing page sized read requests. With this commit we increment the unverified_errors counter if all of the small size requests succeed. Userland patches carrying such conspicous events to the administrator should already be around. Signed-off-by: Jan Schmidt <list.btrfs@jan-o-sch.net> --- fs/btrfs/scrub.c | 37 ++++++++++++++++++++++++++----------- 1 files changed, 26 insertions(+), 11 deletions(-) diff --git a/fs/btrfs/scrub.c b/fs/btrfs/scrub.c index d5a4108..00e4e58 100644 --- a/fs/btrfs/scrub.c +++ b/fs/btrfs/scrub.c @@ -207,18 +207,25 @@ nomem: * recheck_error gets called for every page in the bio, even though only * one may be bad */ -static void scrub_recheck_error(struct scrub_bio *sbio, int ix) +static int scrub_recheck_error(struct scrub_bio *sbio, int ix) { + struct scrub_dev *sdev = sbio->sdev; + u64 sector = (sbio->physical + ix * PAGE_SIZE) >> 9; + if (sbio->err) { - if (scrub_fixup_io(READ, sbio->sdev->dev->bdev, - (sbio->physical + ix * PAGE_SIZE) >> 9, + if (scrub_fixup_io(READ, sbio->sdev->dev->bdev, sector, sbio->bio->bi_io_vec[ix].bv_page) == 0) { if (scrub_fixup_check(sbio, ix) == 0) - return; + return 0; } } + spin_lock(&sdev->stat_lock); + ++sdev->stat.read_errors; + spin_unlock(&sdev->stat_lock); + scrub_fixup(sbio, ix); + return 1; } static int scrub_fixup_check(struct scrub_bio *sbio, int ix) @@ -388,8 +395,14 @@ static void scrub_checksum(struct btrfs_work *work) int ret; if (sbio->err) { + ret = 0; for (i = 0; i < sbio->count; ++i) - scrub_recheck_error(sbio, i); + ret |= scrub_recheck_error(sbio, i); + if (!ret) { + spin_lock(&sdev->stat_lock); + ++sdev->stat.unverified_errors; + spin_unlock(&sdev->stat_lock); + } sbio->bio->bi_flags &= ~(BIO_POOL_MASK - 1); sbio->bio->bi_flags |= 1 << BIO_UPTODATE; @@ -402,10 +415,6 @@ static void scrub_checksum(struct btrfs_work *work) bi->bv_offset = 0; bi->bv_len = PAGE_SIZE; } - - spin_lock(&sdev->stat_lock); - ++sdev->stat.read_errors; - spin_unlock(&sdev->stat_lock); goto out; } for (i = 0; i < sbio->count; ++i) { @@ -426,8 +435,14 @@ static void scrub_checksum(struct btrfs_work *work) WARN_ON(1); } kunmap_atomic(buffer, KM_USER0); - if (ret) - scrub_recheck_error(sbio, i); + if (ret) { + ret = scrub_recheck_error(sbio, i); + if (!ret) { + spin_lock(&sdev->stat_lock); + ++sdev->stat.unverified_errors; + spin_unlock(&sdev->stat_lock); + } + } } out: -- 1.7.3.4 -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
While scrubbing, we may encounter various errors. Previously, a logical address was printed to the log only. Now, all paths belonging to that address are resolved and printed separately. That should work for hardlinks as well as reflinks. Signed-off-by: Jan Schmidt <list.btrfs@jan-o-sch.net> --- fs/btrfs/scrub.c | 140 ++++++++++++++++++++++++++++++++++++++++++++++++++++- 1 files changed, 137 insertions(+), 3 deletions(-) diff --git a/fs/btrfs/scrub.c b/fs/btrfs/scrub.c index 00e4e58..e294d76 100644 --- a/fs/btrfs/scrub.c +++ b/fs/btrfs/scrub.c @@ -27,6 +27,7 @@ #include "volumes.h" #include "disk-io.h" #include "ordered-data.h" +#include "backref.h" /* * This is only the first step towards a full-features scrub. It reads all @@ -106,6 +107,19 @@ struct scrub_dev { spinlock_t stat_lock; }; +struct scrub_warning { + struct btrfs_path *path; + u64 extent_item_size; + char *scratch_buf; + char *msg_buf; + const char *errstr; + u64 sector; + u64 logical; + struct btrfs_device *dev; + int msg_bufsize; + int scratch_bufsize; +}; + static void scrub_free_csums(struct scrub_dev *sdev) { while (!list_empty(&sdev->csum_list)) { @@ -201,6 +215,121 @@ nomem: return ERR_PTR(-ENOMEM); } +static int scrub_print_warning_inode(u64 inum, loff_t offset, void *ctx) +{ + u64 isize; + int ret; + struct extent_buffer *eb_i; + struct btrfs_inode_item *ii; + struct scrub_warning *swarn = ctx; + struct btrfs_fs_info *fs_info = swarn->dev->dev_root->fs_info; + struct inode_fs_paths ipath; + + ret = inode_item_info(inum, 0, fs_info->fs_root, swarn->path); + if (ret) { +err: + printk(KERN_WARNING "btrfs: %s at logical %llu on dev " + "%s, sector %llu, inode %llu, offset %llu: path " + "resolving failed with ret=%d\n", swarn->errstr, + swarn->logical, swarn->dev->name, swarn->sector, inum, + offset, ret); + return 0; + } + eb_i = swarn->path->nodes[0]; + ii = btrfs_item_ptr(eb_i, swarn->path->slots[0], + struct btrfs_inode_item); + btrfs_release_path(swarn->path); + + isize = btrfs_inode_size(eb_i, ii); + + ipath.left = swarn->msg_bufsize - 1; + ipath.dest = swarn->msg_buf; + ipath.path = swarn->path; + ipath.scratch_buf = swarn->scratch_buf; + ipath.scratch_bufsize = swarn->scratch_bufsize; + ipath.fs_root = fs_info->fs_root; + ipath.eb_i = eb_i; + + ret = paths_from_inode(inum, &ipath); + if (ret < 0) + goto err; + + printk(KERN_WARNING "btrfs: %s at logical %llu on dev " + "%s, sector %llu, inode %llu, offset %llu, " + "length %llu, links %u (path:%s)\n", swarn->errstr, + swarn->logical, swarn->dev->name, swarn->sector, inum, offset, + min(isize - offset, (u64)PAGE_SIZE), + btrfs_inode_nlink(eb_i, ii), swarn->msg_buf); + + return 0; +} + +static void scrub_print_warning(const char *errstr, struct scrub_bio *sbio, + int ix) +{ + struct btrfs_device *dev = sbio->sdev->dev; + struct btrfs_fs_info *fs_info = dev->dev_root->fs_info; + struct btrfs_path *path; + struct btrfs_key found_key; + struct extent_buffer *eb; + struct btrfs_extent_item *ei; + struct scrub_warning swarn; + u32 item_size; + int ret; + u64 ref_root; + u64 ref_level; + unsigned long ptr = 0; + static const int bufsize = 4096; + loff_t extent_item_offset; + + path = btrfs_alloc_path(); + + swarn.scratch_buf = kmalloc(bufsize, GFP_NOFS); + swarn.msg_buf = kmalloc(bufsize, GFP_NOFS); + swarn.sector = (sbio->physical + ix * PAGE_SIZE) >> 9; + swarn.logical = sbio->logical + ix * PAGE_SIZE; + swarn.errstr = errstr; + swarn.dev = dev; + swarn.msg_bufsize = bufsize; + swarn.scratch_bufsize = bufsize; + + if (!path || !swarn.scratch_buf || !swarn.msg_buf) + goto out; + + ret = data_extent_from_logical(fs_info->extent_root, + swarn.logical, path, &found_key); + if (ret < 0) + goto out; + + extent_item_offset = swarn.logical - found_key.objectid; + swarn.extent_item_size = found_key.offset; + + eb = path->nodes[0]; + ei = btrfs_item_ptr(eb, path->slots[0], struct btrfs_extent_item); + item_size = btrfs_item_size_nr(eb, path->slots[0]); + + btrfs_release_path(path); + + if (ret) { + ret = tree_backref_for_extent(&ptr, eb, ei, item_size, + &ref_root, &ref_level); + printk(KERN_WARNING "%s at logical %llu on dev %s, " + "sector %llu: metadata %s (level %llu) in tree %llu\n", + errstr, swarn.logical, dev->name, swarn.sector, + ref_level ? "node" : "leaf", ret < 0 ? -1 : ref_level, + ret < 0 ? -1 : ref_root); + } else { + swarn.path = path; + iterate_extent_inodes(eb, ei, extent_item_offset, item_size, + scrub_print_warning_inode, &swarn); + } + +out: + btrfs_free_path(path); + kfree(swarn.scratch_buf); + kfree(swarn.msg_buf); +} + /* * scrub_recheck_error gets called when either verification of the page * failed or the bio failed to read, e.g. with EIO. In the latter case, @@ -218,6 +347,11 @@ static int scrub_recheck_error(struct scrub_bio *sbio, int ix) if (scrub_fixup_check(sbio, ix) == 0) return 0; } + if (printk_ratelimit()) + scrub_print_warning("i/o error", sbio, ix); + } else { + if (printk_ratelimit()) + scrub_print_warning("checksum error", sbio, ix); } spin_lock(&sdev->stat_lock); @@ -333,7 +467,7 @@ static void scrub_fixup(struct scrub_bio *sbio, int ix) spin_unlock(&sdev->stat_lock); if (printk_ratelimit()) - printk(KERN_ERR "btrfs: fixed up at %llu\n", + printk(KERN_ERR "btrfs: fixed up error at logical %llu\n", (unsigned long long)logical); return; @@ -344,8 +478,8 @@ uncorrectable: spin_unlock(&sdev->stat_lock); if (printk_ratelimit()) - printk(KERN_ERR "btrfs: unable to fixup at %llu\n", - (unsigned long long)logical); + printk(KERN_ERR "btrfs: unable to fixup (regular) error at " + "logical %llu\n", (unsigned long long)logical); } static int scrub_fixup_io(int rw, struct block_device *bdev, sector_t sector, -- 1.7.3.4 -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Fix the mirror_num determination in scrub_stripe. The rest of the scrub code did not use mirror_num for anything important and that error went unnoticed. The nodatasum fixup patch of this set depends on a correct mirror_num. Signed-off-by: Jan Schmidt <list.btrfs@jan-o-sch.net> --- fs/btrfs/scrub.c | 10 +++++----- 1 files changed, 5 insertions(+), 5 deletions(-) diff --git a/fs/btrfs/scrub.c b/fs/btrfs/scrub.c index e294d76..ec29ce8 100644 --- a/fs/btrfs/scrub.c +++ b/fs/btrfs/scrub.c @@ -913,21 +913,21 @@ static noinline_for_stack int scrub_stripe(struct scrub_dev *sdev, if (map->type & BTRFS_BLOCK_GROUP_RAID0) { offset = map->stripe_len * num; increment = map->stripe_len * map->num_stripes; - mirror_num = 0; + mirror_num = 1; } else if (map->type & BTRFS_BLOCK_GROUP_RAID10) { int factor = map->num_stripes / map->sub_stripes; offset = map->stripe_len * (num / map->sub_stripes); increment = map->stripe_len * factor; - mirror_num = num % map->sub_stripes; + mirror_num = num % map->sub_stripes + 1; } else if (map->type & BTRFS_BLOCK_GROUP_RAID1) { increment = map->stripe_len; - mirror_num = num % map->num_stripes; + mirror_num = num % map->num_stripes + 1; } else if (map->type & BTRFS_BLOCK_GROUP_DUP) { increment = map->stripe_len; - mirror_num = num % map->num_stripes; + mirror_num = num % map->num_stripes + 1; } else { increment = map->stripe_len; - mirror_num = 0; + mirror_num = 1; } path = btrfs_alloc_path(); -- 1.7.3.4 -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Currently, extent_read_full_page always assumes we are trying to read mirror 0, which generally is the best we can do. To add flexibility, pass it as a parameter. This will be needed by scrub fixup code. Signed-off-by: Jan Schmidt <list.btrfs@jan-o-sch.net> --- fs/btrfs/disk-io.c | 2 +- fs/btrfs/extent_io.c | 6 +++--- fs/btrfs/extent_io.h | 2 +- fs/btrfs/inode.c | 2 +- 4 files changed, 6 insertions(+), 6 deletions(-) diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c index a203d36..daf1e47 100644 --- a/fs/btrfs/disk-io.c +++ b/fs/btrfs/disk-io.c @@ -874,7 +874,7 @@ static int btree_readpage(struct file *file, struct page *page) { struct extent_io_tree *tree; tree = &BTRFS_I(page->mapping->host)->io_tree; - return extent_read_full_page(tree, page, btree_get_extent); + return extent_read_full_page(tree, page, btree_get_extent, 0); } static int btree_releasepage(struct page *page, gfp_t gfp_flags) diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c index b181a94..b78f665 100644 --- a/fs/btrfs/extent_io.c +++ b/fs/btrfs/extent_io.c @@ -2111,16 +2111,16 @@ static int __extent_read_full_page(struct extent_io_tree *tree, } int extent_read_full_page(struct extent_io_tree *tree, struct page *page, - get_extent_t *get_extent) + get_extent_t *get_extent, int mirror_num) { struct bio *bio = NULL; unsigned long bio_flags = 0; int ret; - ret = __extent_read_full_page(tree, page, get_extent, &bio, 0, + ret = __extent_read_full_page(tree, page, get_extent, &bio, mirror_num, &bio_flags); if (bio) - ret = submit_one_bio(READ, bio, 0, bio_flags); + ret = submit_one_bio(READ, bio, mirror_num, bio_flags); return ret; } diff --git a/fs/btrfs/extent_io.h b/fs/btrfs/extent_io.h index 4e8445a..2fef77f 100644 --- a/fs/btrfs/extent_io.h +++ b/fs/btrfs/extent_io.h @@ -177,7 +177,7 @@ int unlock_extent_cached(struct extent_io_tree *tree, u64 start, u64 end, int try_lock_extent(struct extent_io_tree *tree, u64 start, u64 end, gfp_t mask); int extent_read_full_page(struct extent_io_tree *tree, struct page *page, - get_extent_t *get_extent); + get_extent_t *get_extent, int mirror_num); int __init extent_io_init(void); void extent_io_exit(void); diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c index 113913a..938fc10 100644 --- a/fs/btrfs/inode.c +++ b/fs/btrfs/inode.c @@ -6238,7 +6238,7 @@ int btrfs_readpage(struct file *file, struct page *page) { struct extent_io_tree *tree; tree = &BTRFS_I(page->mapping->host)->io_tree; - return extent_read_full_page(tree, page, btrfs_get_extent); + return extent_read_full_page(tree, page, btrfs_get_extent, 0); } static int btrfs_writepage(struct page *page, struct writeback_control *wbc) -- 1.7.3.4 -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Jan Schmidt
2011-Jun-13 19:10 UTC
[PATCH v1 6/6] scrub: add fixup code for errors on nodatasum files
This removes a FIXME comment and introduces the first part of nodatasum fixup: It gets the corresponding inode for a logical address and triggers a regular readpage for the corrupted sector. Once we have on-the-fly error correction our error will be automatically corrected. The correction code is expected to clear the newly introduced EXTENT_DAMAGED flag, making scrub report that error as "corrected" instead of "uncorrectable" eventually. Signed-off-by: Jan Schmidt <list.btrfs@jan-o-sch.net> --- fs/btrfs/extent_io.h | 1 + fs/btrfs/scrub.c | 174 ++++++++++++++++++++++++++++++++++++++++++++++++-- 2 files changed, 170 insertions(+), 5 deletions(-) diff --git a/fs/btrfs/extent_io.h b/fs/btrfs/extent_io.h index 2fef77f..906ea42 100644 --- a/fs/btrfs/extent_io.h +++ b/fs/btrfs/extent_io.h @@ -17,6 +17,7 @@ #define EXTENT_NODATASUM (1 << 10) #define EXTENT_DO_ACCOUNTING (1 << 11) #define EXTENT_FIRST_DELALLOC (1 << 12) +#define EXTENT_DAMAGED (1 << 13) #define EXTENT_IOBITS (EXTENT_LOCKED | EXTENT_WRITEBACK) #define EXTENT_CTLBITS (EXTENT_DO_ACCOUNTING | EXTENT_FIRST_DELALLOC) diff --git a/fs/btrfs/scrub.c b/fs/btrfs/scrub.c index ec29ce8..a2aa47a 100644 --- a/fs/btrfs/scrub.c +++ b/fs/btrfs/scrub.c @@ -27,6 +27,7 @@ #include "volumes.h" #include "disk-io.h" #include "ordered-data.h" +#include "transaction.h" #include "backref.h" /* @@ -94,6 +95,7 @@ struct scrub_dev { int first_free; int curr; atomic_t in_flight; + atomic_t fixup; spinlock_t list_lock; wait_queue_head_t list_wait; u16 csum_size; @@ -107,6 +109,14 @@ struct scrub_dev { spinlock_t stat_lock; }; +struct scrub_fixup_nodatasum { + struct scrub_dev *sdev; + u64 logical; + struct btrfs_root *root; + struct btrfs_work work; + u64 mirror_num; +}; + struct scrub_warning { struct btrfs_path *path; u64 extent_item_size; @@ -195,12 +205,13 @@ struct scrub_dev *scrub_setup_dev(struct btrfs_device *dev) if (i != SCRUB_BIOS_PER_DEV-1) sdev->bios[i]->next_free = i + 1; - else + else sdev->bios[i]->next_free = -1; } sdev->first_free = 0; sdev->curr = -1; atomic_set(&sdev->in_flight, 0); + atomic_set(&sdev->fixup, 0); atomic_set(&sdev->cancel_req, 0); sdev->csum_size = btrfs_super_csum_size(&fs_info->super_copy); INIT_LIST_HEAD(&sdev->csum_list); @@ -330,6 +341,141 @@ out: kfree(swarn.msg_buf); } +static int scrub_fixup_readpage(u64 inum, loff_t offset, void *ctx) +{ + struct page *page; + unsigned long index; + struct scrub_fixup_nodatasum *fixup = ctx; + int ret; + int corrected; + struct btrfs_key key; + struct inode *inode; + int end = offset + PAGE_SIZE - 1; + + key.type = BTRFS_INODE_ITEM_KEY; + key.objectid = inum; + key.offset = 0; + inode = btrfs_iget(fixup->root->fs_info->sb, &key, + fixup->root->fs_info->fs_root, NULL); + if (IS_ERR(inode)) + return -1; + + ret = set_extent_bit(&BTRFS_I(inode)->io_tree, offset, end, + EXTENT_DAMAGED, 0, NULL, NULL, GFP_NOFS); + if (ret) + return ret < 0 ? ret : -1; + + index = offset >> PAGE_CACHE_SHIFT; + + page = find_or_create_page(inode->i_mapping, index, GFP_NOFS); + if (!page) + return -1; + + ret = extent_read_full_page(&BTRFS_I(inode)->io_tree, page, + btrfs_get_extent, fixup->mirror_num); + wait_on_page_locked(page); + corrected = !test_range_bit(&BTRFS_I(inode)->io_tree, offset, end, + EXTENT_DAMAGED, 0, NULL); + + if (corrected) + WARN_ON(!PageUptodate(page)); + else + clear_extent_bit(&BTRFS_I(inode)->io_tree, offset, end, + EXTENT_DAMAGED, 0, 0, NULL, GFP_NOFS); + + put_page(page); + iput(inode); + + if (ret < 0) + return ret; + + if (ret == 0 && corrected) { + /* + * we only need to call readpage for one of the inodes belonging + * to this extent. so make iterate_extent_inodes stop + */ + return 1; + } + + return -1; +} + +static void scrub_fixup_nodatasum(struct btrfs_work *work) +{ + int ret; + struct scrub_fixup_nodatasum *fixup; + struct scrub_dev *sdev; + struct btrfs_trans_handle *trans = NULL; + struct btrfs_fs_info *fs_info; + struct btrfs_path *path; + int uncorrectable = 0; + + fixup = container_of(work, struct scrub_fixup_nodatasum, work); + sdev = fixup->sdev; + fs_info = fixup->root->fs_info; + + path = btrfs_alloc_path(); + if (!path) { + spin_lock(&sdev->stat_lock); + ++sdev->stat.malloc_errors; + spin_unlock(&sdev->stat_lock); + uncorrectable = 1; + goto out; + } + + trans = btrfs_join_transaction(fixup->root); + if (IS_ERR(trans)) { + uncorrectable = 1; + goto out; + } + + /* + * the idea is to trigger a regular read through the standard path. we + * read a page from the (failed) logical address by specifying the + * corresponding copynum of the failed sector. thus, that readpage is + * expected to fail. + * that is the point where on-the-fly error correction will kick in + * (once it''s finished) and rewrite the failed sector if a good copy + * can be found. + */ + ret = iterate_inodes_from_logical(fixup->logical, fixup->root->fs_info, + path, scrub_fixup_readpage, + fixup); + if (ret < 0) { + uncorrectable = 1; + goto out; + } + WARN_ON(ret != 1); + + spin_lock(&sdev->stat_lock); + ++sdev->stat.corrected_errors; + spin_unlock(&sdev->stat_lock); + +out: + if (trans && !IS_ERR(trans)) + btrfs_end_transaction(trans, fixup->root); + if (uncorrectable) { + spin_lock(&sdev->stat_lock); + ++sdev->stat.uncorrectable_errors; + spin_unlock(&sdev->stat_lock); + if (printk_ratelimit()) + printk(KERN_ERR "btrfs: unable to fixup (nodatasum) " + "error at logical %llu\n", fixup->logical); + } + + btrfs_free_path(path); + kfree(fixup); + + /* see caller why we''re pretending to be paused in the scrub counters */ + mutex_lock(&fs_info->scrub_lock); + atomic_dec(&fs_info->scrubs_running); + atomic_dec(&fs_info->scrubs_paused); + mutex_unlock(&fs_info->scrub_lock); + atomic_dec(&sdev->fixup); + wake_up(&fs_info->scrub_pause_wait); + wake_up(&sdev->list_wait); +} + /* * scrub_recheck_error gets called when either verification of the page * failed or the bio failed to read, e.g. with EIO. In the latter case, @@ -398,6 +544,7 @@ static void scrub_fixup(struct scrub_bio *sbio, int ix) struct btrfs_fs_info *fs_info = sdev->dev->dev_root->fs_info; struct btrfs_mapping_tree *map_tree = &fs_info->mapping_tree; struct btrfs_multi_bio *multi = NULL; + struct scrub_fixup_nodatasum *fixup; u64 logical = sbio->logical + ix * PAGE_SIZE; u64 length; int i; @@ -406,12 +553,28 @@ static void scrub_fixup(struct scrub_bio *sbio, int ix) if ((sbio->spag[ix].flags & BTRFS_EXTENT_FLAG_DATA) && (sbio->spag[ix].have_csum == 0)) { + fixup = kzalloc(sizeof(*fixup), GFP_NOFS); + fixup->sdev = sdev; + fixup->logical = logical; + fixup->root = fs_info->extent_root; + fixup->mirror_num = sbio->spag[ix].mirror_num; /* - * nodatasum, don''t try to fix anything - * FIXME: we can do better, open the inode and trigger a - * writeback + * increment scrubs_running to prevent cancel requests from + * completing as long as a fixup worker is running. we must also + * increment scrubs_paused to prevent deadlocking on pause + * requests used for transactions commits (as the worker uses a + * transaction context). it is safe to regard the fixup worker + * as paused for all matters practical. effectively, we only + * avoid cancellation requests from completing. */ - goto uncorrectable; + mutex_lock(&fs_info->scrub_lock); + atomic_inc(&fs_info->scrubs_running); + atomic_inc(&fs_info->scrubs_paused); + mutex_unlock(&fs_info->scrub_lock); + atomic_inc(&sdev->fixup); + fixup->work.func = scrub_fixup_nodatasum; + btrfs_queue_worker(&fs_info->scrub_workers, &fixup->work); + return; } length = PAGE_SIZE; @@ -1404,6 +1567,7 @@ int btrfs_scrub_dev(struct btrfs_root *root, u64 devid, u64 start, u64 end, ret = scrub_enumerate_chunks(sdev, start, end); wait_event(sdev->list_wait, atomic_read(&sdev->in_flight) == 0); + wait_event(sdev->list_wait, atomic_read(&sdev->fixup) == 0); atomic_dec(&fs_info->scrubs_running); wake_up(&fs_info->scrub_pause_wait); -- 1.7.3.4 -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
David Sterba
2011-Jun-16 21:24 UTC
Re: [PATCH v1 1/6] added helper functions to iterate backrefs
On Mon, Jun 13, 2011 at 09:10:34PM +0200, Jan Schmidt wrote:> These helper functions iterate back references and call a function for each > backref. There is also a function to resolve an inode to a path in the > file system. > > Signed-off-by: Jan Schmidt <list.btrfs@jan-o-sch.net> > --- > fs/btrfs/Makefile | 3 +- > fs/btrfs/backref.c | 461 ++++++++++++++++++++++++++++++++++++++++++++++++++++ > fs/btrfs/backref.h | 59 +++++++ > 3 files changed, 522 insertions(+), 1 deletions(-) > > diff --git a/fs/btrfs/Makefile b/fs/btrfs/Makefile > index 9b72dcf..c63f649 100644 > --- a/fs/btrfs/Makefile > > +++ b/fs/btrfs/Makefile > @@ -7,4 +7,5 @@ btrfs-y += super.o ctree.o extent-tree.o print-tree.o root-tree.o dir-item.o \ > extent_map.o sysfs.o struct-funcs.o xattr.o ordered-data.o \ > extent_io.o volumes.o async-thread.o ioctl.o locking.o orphan.o \ > export.o tree-log.o acl.o free-space-cache.o zlib.o lzo.o \ > - compression.o delayed-ref.o relocation.o delayed-inode.o scrub.o > + compression.o delayed-ref.o relocation.o delayed-inode.o backref.o \ > + scrub.o > diff --git a/fs/btrfs/backref.c b/fs/btrfs/backref.c > new file mode 100644 > index 0000000..307dfb5 > --- /dev/null > +++ b/fs/btrfs/backref.c > @@ -0,0 +1,461 @@ > +/* > + * Copyright (C) 2011 STRATO. All rights reserved. > + * > + * This program is free software; you can redistribute it and/or > + * modify it under the terms of the GNU General Public > + * License v2 as published by the Free Software Foundation. > + * > + * This program is distributed in the hope that it will be useful, > + * but WITHOUT ANY WARRANTY; without even the implied warranty of > + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU > + * General Public License for more details. > + * > + * You should have received a copy of the GNU General Public > + * License along with this program; if not, write to the > + * Free Software Foundation, Inc., 59 Temple Place - Suite 330, > + * Boston, MA 021110-1307, USA. > + */ > + > +#include "ctree.h" > +#include "backref.h" > + > +/* > + * this makes the path point to (inum INODE_ITEM ioff) > + */ > +int inode_item_info(u64 inum, u64 ioff, struct btrfs_root *fs_root, > + struct btrfs_path *path) > +{this func ...> + int ret; > + struct btrfs_key key; > + struct extent_buffer *eb; > + struct btrfs_key found_key; > + > + key.type = BTRFS_INODE_ITEM_KEY; > + key.objectid = inum; > + key.offset = ioff; > + > + ret = btrfs_search_slot(NULL, fs_root, &key, path, 0, 0); > + if (ret < 0) > + return ret; > + > + eb = path->nodes[0]; > + if (ret && path->slots[0] >= btrfs_header_nritems(eb)) { > + ret = btrfs_next_leaf(fs_root, path); > + if (ret) > + return ret; > + eb = path->nodes[0]; > + } > + > + btrfs_item_key_to_cpu(eb, &found_key, path->slots[0]); > + if (found_key.type != key.type || found_key.objectid != key.objectid) > + return 1; > + > + return 0; > +} > + > +static int inode_ref_info(u64 inum, u64 ioff, struct btrfs_root *fs_root, > + struct btrfs_path *path, int strict, > + u64 *dest_parent_inum, > + struct extent_buffer **dest_iref_eb, > + int *dest_slot) > +{and this one share a fair amount of code, do a common helper instead> + int ret; > + struct btrfs_key key; > + struct extent_buffer *eb; > + struct btrfs_key found_key; > + > + key.type = BTRFS_INODE_REF_KEY; > + key.objectid = inum; > + key.offset = ioff; > + > + ret = btrfs_search_slot(NULL, fs_root, &key, path, 0, 0); > + if (ret < 0) > + goto out; > + > + eb = path->nodes[0]; > + if (ret && path->slots[0] >= btrfs_header_nritems(eb)) { > + ret = btrfs_next_leaf(fs_root, path); > + if (ret) > + goto out; > + eb = path->nodes[0]; > + } > + > + btrfs_item_key_to_cpu(eb, &found_key, path->slots[0]); > + if (found_key.type != key.type || found_key.objectid != key.objectid) { > + ret = 1; > + goto out; > + } > + > + ret = 0; > + if (dest_parent_inum) > + *dest_parent_inum = found_key.offset; > + if (dest_iref_eb) > + *dest_iref_eb = eb; > + if (dest_slot) > + *dest_slot = path->slots[0]; > + > +out: > + btrfs_release_path(path); > + return ret; > +} > + > +/* > + * this iterates to turn a btrfs_inode_ref into a full filesystem path. elements > + * of the path are separated by ''/'' and the path is guaranteed to be > + * 0-terminated. the path is only given within the current file system. > + * Therefore, it never starts with a ''/''. the caller is responsible to provide > + * "size" bytes in "dest". the dest buffer will be filled backwards! the idea is > + * that in case of an overflow, the lower part in the hierarchie is most > + * important to the user. finally, the start point of resulting the string is^^^> + * returned. in case the path would overflow, "..." is added at the front of > + * the string and iteration stops regularly. > + */ > +static char *iref_to_path(struct btrfs_root *fs_root, struct btrfs_path *path, > + struct btrfs_inode_ref *iref, > + struct extent_buffer *eb, u64 parent, > + char *dest, u32 size) > +{ > + u32 len; > + int slot; > + u64 inum; > + int ret; > + u32 left = size - 1;this confused me first, ''left'' may be ambiguous, something like ''bytes_left'' would be better> + > + dest[left] = ''\0''; > + > + while (1) { > + len = btrfs_inode_ref_name_len(eb, iref); > + if (len > left) { > + if (size < 4) > + return dest+left;break;> + if (left > 3) > + dest += left - 3; > + memcpy(dest, "...", 3); > + return dest;i''d rather see this consistent with other return points, if (left > 3) left -= 3; else left = 0; memcpy(dest + left, "...", 3); break;> + } > + left -= len; > + read_extent_buffer(eb, dest+left, > + (unsigned long)(iref + 1), len); > + > + ret = inode_item_info(parent, 0, fs_root, path); > + if (ret) > + return ERR_PTR(ret); > + eb = path->nodes[0]; > + btrfs_release_path(path); > + > + ret = inode_ref_info(parent, 0, fs_root, path, 0, > + &inum, NULL, &slot); > + if (ret) > + return ERR_PTR(ret); > + > + if (parent == inum) /* regular exit ahead */ { > + return dest+left;break; and please put the comment before the if and remove { }> + } > + > + iref = btrfs_item_ptr(eb, slot, struct btrfs_inode_ref); > + parent = inum; > + if (left > 0) { > + dest[left-1] = ''/''; > + --left;swap the lines and remove "-1" :)> + } > + }return dest + left;> +} > + > +/* > + * this makes the path point to (logical EXTENT_ITEM *) > + * returns 0 for data blocks, 1 for tree blocks and <0 on error > + */ > +int data_extent_from_logical(struct btrfs_root *root, u64 logical, > + struct btrfs_path *path, > + struct btrfs_key *found_key) > +{ > + int ret; > + u64 flags; > + u32 item_size; > + struct extent_buffer *eb; > + struct btrfs_extent_item *ei; > + struct btrfs_key key; > + struct btrfs_key f;tmpkey;> + struct btrfs_key *fk = found_key ? found_key : &f;transform this to if (!found_key) found_key = &tmpkey; and use found_key along still, will you ever pass NULL found_key?> + > + key.type = BTRFS_EXTENT_ITEM_KEY; > + key.objectid = logical; > + key.offset = (u64)-1; > + > + ret = btrfs_search_slot(NULL, root, &key, path, 0, 0); > + if (ret < 0) > + return ret; > + ret = btrfs_previous_item(root->fs_info->extent_root, path, > + 0, BTRFS_EXTENT_ITEM_KEY); > + if (ret < 0) > + return ret; > + btrfs_item_key_to_cpu(path->nodes[0], fk, path->slots[0]); > + if (fk->type != BTRFS_EXTENT_ITEM_KEY || fk->objectid > logical || > + fk->objectid + fk->offset <= logical) > + return -1;this is -EPERM, confusing. eg btrfs_previous_item a few lines above may return variety of error codes from the whole function, EPERM just does not fit. does this check mean some serious error? inconsistency or corruption?> + > + eb = path->nodes[0]; > + item_size = btrfs_item_size_nr(eb, path->slots[0]); > + BUG_ON(item_size < sizeof(*ei)); > + > + ei = btrfs_item_ptr(eb, path->slots[0], struct btrfs_extent_item); > + flags = btrfs_extent_flags(eb, ei); > + > + if (flags & BTRFS_EXTENT_FLAG_TREE_BLOCK) > + return 1; > + > + return 0; > +} > + > +/* > + * helper function to iterate extent backrefs. ptr must point to a 0 value for > + * the first call and may be modified. it is used to track state. > + * if more backrefs exist, 0 is returned and the next call to _get_extent_ref > + * must pass the modified ptr parameter to get to the next backref. > + * after the last backref was processed, 1 is returned. > + * returns <0 on error > + */ > +static int _get_extent_ref(int flags_wanted, int type_wanted,__get_extent_ref single score function prefixes are not used flags_wanted is compared to extent_flags of u64 type> + unsigned long *ptr, struct extent_buffer *eb, > + struct btrfs_extent_item *ei, u32 item_size, > + struct btrfs_extent_inline_ref **eiref) > +{ > + int type; > + int again = 0; > + unsigned long end; > + u64 flags; > + struct btrfs_tree_block_info *info; > + > + if (!*ptr) { > + /* first call */ > + flags = btrfs_extent_flags(eb, ei); > + if (!(flags & flags_wanted)) > + return -1;again, better errorcode, EINVAL seems to be the right choice> + if (flags & BTRFS_EXTENT_FLAG_TREE_BLOCK) { > + info = (struct btrfs_tree_block_info *)(ei + 1); > + *eiref = (struct btrfs_extent_inline_ref *)(info + 1); > + } else { > + *eiref = (struct btrfs_extent_inline_ref *)(ei + 1);style: remove { } around single if/else statement> + }> + *ptr = (unsigned long)*eiref; > + } > + > + end = (unsigned long)ei + item_size; > + > +again: > + again = 0;excessive use of goto detected: do {> + > + *eiref = (struct btrfs_extent_inline_ref *)*ptr; > + type = btrfs_extent_inline_ref_type(eb, *eiref); > + > + if (type != type_wanted) > + again = 1; > + > + *ptr += btrfs_extent_inline_ref_size(type); > + > + WARN_ON(*ptr > end); > + if (*ptr == end) > + return 1; /* last */ > + > + if (again) > + goto again;} while (type != type_wanted);> + > + return 0; > +} > + > +/* > + * reads the tree block backref for an extent. tree level and root are returned > + * through dest_level and dest_root. ptr must point to a 0 value for the first > + * call and may be modified (see _get_extent_ref comment). > + * returns 0 on success, <0 on error. note: in contrast to _get_extent_ref this > + * one never returns 1! > + */ > +int tree_backref_for_extent(unsigned long *ptr, struct extent_buffer *eb, > + struct btrfs_extent_item *ei, u32 item_size, > + u64 *dest_root, u64 *dest_level)there''s not a formal "rule" or something, but i''ve seen outgoing parameters named with ''out_'' dest_level can be just an int, levels fit in one byte anyway> +{ > + int ret; > + struct btrfs_tree_block_info *info; > + struct btrfs_extent_inline_ref *eiref; > + > + ret = _get_extent_ref(BTRFS_EXTENT_FLAG_TREE_BLOCK, > + BTRFS_TREE_BLOCK_REF_KEY, ptr, eb, ei, > + item_size, &eiref); > + if (ret < 0) > + return ret; > + > + WARN_ON(!ret); /* multiple tree backrefs for this extent */as we will probably have CONFIG_DEBUG, the comment makes a good candidate for a printk message> + > + info = (struct btrfs_tree_block_info *)(ei + 1); > + *dest_root = btrfs_extent_inline_ref_offset(eb, eiref); > + *dest_level = btrfs_tree_block_level(eb, info); > + > + return 0; > +} > + > +static int data_inode_for_extent(unsigned long *ptr, struct extent_buffer *eb, > + struct btrfs_extent_item *ei, > + u32 item_size, u64 *dest_inum, > + u64 *dest_ioff) > +{ > + int ret; > + struct btrfs_extent_inline_ref *eiref; > + struct btrfs_extent_data_ref *dref; > + > + ret = _get_extent_ref(BTRFS_EXTENT_FLAG_DATA, BTRFS_EXTENT_DATA_REF_KEY, > + ptr, eb, ei, item_size, &eiref); > + if (ret < 0) > + return ret; > + > + dref = (struct btrfs_extent_data_ref *)(&eiref->offset); > + if (btrfs_extent_data_ref_root(eb, dref) != BTRFS_FS_TREE_OBJECTID) { > + WARN_ON(1); > + return -1;needs better error code> + } > + > + *dest_inum = btrfs_extent_data_ref_objectid(eb, dref); > + *dest_ioff = btrfs_extent_data_ref_offset(eb, dref); > + > + return ret; > +} > + > +/* > + * calls iterate() for every inode that references the extent identified by > + * the given parameters. > + * when the iterator function returns a non-zero value, iteration stops. > + */ > +int iterate_extent_inodes(struct extent_buffer *eb, > + struct btrfs_extent_item *ei, > + loff_t extent_item_offset, u32 item_size, > + iterate_extent_inodes_t *iterate, void *ctx) > +{ > + int last; > + u64 inum; > + unsigned long ptr = 0; > + loff_t extent_data_item_offset; > + int ret; > + > + do { > + last = data_inode_for_extent(&ptr, eb, ei, item_size, &inum, > + &extent_data_item_offset); > + if (last < 0) > + return last; > + > + ret = iterate(inum, extent_item_offset+extent_data_item_offset, > + ctx); > + if (ret) > + return ret; > + > + } while (!last); > + > + return 0; > +} > + > +/* > + * just a convenience function combining data_extent_from_logical and > + * iterate_extent_inodes. > + */a better comment please or none> +int iterate_inodes_from_logical(u64 logical, struct btrfs_fs_info *fs_info, > + struct btrfs_path *path, > + iterate_extent_inodes_t *iterate, void *ctx) > +{ > + int ret; > + u32 item_size; > + struct extent_buffer *l; > + struct btrfs_extent_item *extent; > + loff_t offset; > + struct btrfs_key found_key; > + > + ret = data_extent_from_logical(fs_info->extent_root, logical, path, > + &found_key); > + if (ret) > + return ret;this can ret < 0 in case of error, and 1 for the tree block. not sure if this shouldn''t be handled separately> + > + offset = logical - found_key.objectid; > + l = path->nodes[0]; > + extent = btrfs_item_ptr(l, path->slots[0], struct btrfs_extent_item); > + item_size = btrfs_item_size_nr(l, path->slots[0]); > + btrfs_release_path(path); > + > + ret = iterate_extent_inodes(l, extent, offset, item_size, iterate, ctx); > + > + return ret; > +} > + > +static int iterate_irefs(u64 inum, struct extent_buffer *eb_i, > + struct btrfs_root *fs_root, > + struct btrfs_path *path, > + iterate_irefs_t *iterate, void *ctx) > +{ > + int ret; > + int slot; > + u32 cur; > + u32 len; > + u32 name_len; > + u64 parent = 0; > + struct extent_buffer *eb_ir; > + struct btrfs_item *item; > + struct btrfs_inode_ref *iref; > + > + while (1) { > + ret = inode_ref_info(inum, parent ? parent+1 : 0, fs_root, path, > + 1, &parent, &eb_ir, &slot); > + if (ret < 0) > + return ret; > + if (ret) > + break; > + > + item = btrfs_item_nr(eb_i, slot); > + iref = btrfs_item_ptr(eb_i, slot, struct btrfs_inode_ref); > + > + for (cur = 0; cur < btrfs_item_size(eb_i, item); cur += len) { > + name_len = btrfs_inode_ref_name_len(eb_i, iref); > + ret = iterate(parent, iref, eb_ir, slot, ctx); > + if (ret) > + return ret; > + len = sizeof(*iref) + name_len; > + iref = (struct btrfs_inode_ref *)((char *)iref + len); > + } > + }can you estimate how long this loop may take? (eg. number of iterations per call) there may be a need for cond_resched()> + > + return 0; > +} > + > +/* > + * returns 0 if the path could be dumped (probably truncated) > + * returns <0 in case of an error > + */ > +static int inode_to_path(u64 inum, struct btrfs_inode_ref *iref, > + struct extent_buffer *eb_ir, int slot, > + void *ctx) > +{ > + struct inode_fs_paths *ipath = ctx; > + struct extent_buffer *eb_i = ipath->eb_i; > + u32 path_len; > + char *fs_path; > + > + if (ipath->left < 2) > + return -EOVERFLOW; > + > + *ipath->dest++ = '' ''; > + --ipath->left; > + > + fs_path = iref_to_path(ipath->fs_root, ipath->path, iref, eb_i, > + inum, ipath->scratch_buf, ipath->left);fs_path will be never NULL, either an IS_ERR or ipath->scratch_buf + some offset> + if (!fs_path || IS_ERR(fs_path)) > + return PTR_ERR(fs_path);this could return PTR_ERR(0), confusing with ''return 0'' for the calle> + path_len = ipath->scratch_buf + ipath->left - fs_path - 1; > + if (path_len+1 > ipath->left) > + return -EOVERFLOW; > + memcpy(ipath->dest, fs_path, path_len+1); > + ipath->left -= path_len; > + ipath->dest += path_len; > + > + return 0; > +} > + > +int paths_from_inode(u64 inum, struct inode_fs_paths *ipath) > +{ > + return iterate_irefs(inum, ipath->eb_i, ipath->fs_root, ipath->path, > + inode_to_path, ipath); > +} > diff --git a/fs/btrfs/backref.h b/fs/btrfs/backref.h > new file mode 100644 > index 0000000..6b7e856 > --- /dev/null > +++ b/fs/btrfs/backref.h > @@ -0,0 +1,59 @@ > +/* > + * Copyright (C) 2011 STRATO. All rights reserved. > + * > + * This program is free software; you can redistribute it and/or > + * modify it under the terms of the GNU General Public > + * License v2 as published by the Free Software Foundation. > + * > + * This program is distributed in the hope that it will be useful, > + * but WITHOUT ANY WARRANTY; without even the implied warranty of > + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU > + * General Public License for more details. > + * > + * You should have received a copy of the GNU General Public > + * License along with this program; if not, write to the > + * Free Software Foundation, Inc., 59 Temple Place - Suite 330, > + * Boston, MA 021110-1307, USA. > + */ > + > +#ifndef __BTRFS_BACKREF__ > +#define __BTRFS_BACKREF__ > + > +struct inode_fs_paths { > + int left;as suggested above, rename this> + char *dest; > + struct btrfs_path *path; > + char *scratch_buf; > + struct btrfs_root *fs_root; > + int scratch_bufsize; > + struct extent_buffer *eb_i; > +}; > + > +typedef int (iterate_extent_inodes_t)(u64 inum, loff_t offset, void *ctx); > +typedef int (iterate_irefs_t)(u64 parent, struct btrfs_inode_ref *iref, > + struct extent_buffer *eb_ir, > + int slot, void *ctx); > + > +int inode_item_info(u64 inum, u64 ioff, struct btrfs_root *fs_root, > + struct btrfs_path *path); > + > +int data_extent_from_logical(struct btrfs_root *root, u64 logical, > + struct btrfs_path *path, > + struct btrfs_key *found_key); > + > +int tree_backref_for_extent(unsigned long *ptr, struct extent_buffer *eb, > + struct btrfs_extent_item *ei, u32 item_size, > + u64 *dest_root, u64 *dest_level); > + > +int iterate_extent_inodes(struct extent_buffer *eb, > + struct btrfs_extent_item *ei, > + loff_t extent_item_offset, u32 item_size, > + iterate_extent_inodes_t *iterate, void *ctx); > + > +int iterate_inodes_from_logical(u64 logical, struct btrfs_fs_info *fs_info, > + struct btrfs_path *path, > + iterate_extent_inodes_t *iterate, void *ctx); > + > +int paths_from_inode(u64 inum, struct inode_fs_paths *ipath);every function from the .c file not listed here should be static> + > +#endif-- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Mon, Jun 13, 2011 at 09:10:35PM +0200, Jan Schmidt wrote:> In normal operation, scrub is reading data sequentially in large portions. > In case of an i/o error, we try to find the corrupted area(s) by issuing > page sized read requests. With this commit we increment the > unverified_errors counter if all of the small size requests succeed. > > Userland patches carrying such conspicous events to the administrator should > already be around. > > Signed-off-by: Jan Schmidt <list.btrfs@jan-o-sch.net> > --- > fs/btrfs/scrub.c | 37 ++++++++++++++++++++++++++----------- > 1 files changed, 26 insertions(+), 11 deletions(-) > > diff --git a/fs/btrfs/scrub.c b/fs/btrfs/scrub.c > index d5a4108..00e4e58 100644 > --- a/fs/btrfs/scrub.c > +++ b/fs/btrfs/scrub.c > @@ -207,18 +207,25 @@ nomem: > * recheck_error gets called for every page in the bio, even though only > * one may be bad > */ > -static void scrub_recheck_error(struct scrub_bio *sbio, int ix) > +static int scrub_recheck_error(struct scrub_bio *sbio, int ix) > { > + struct scrub_dev *sdev = sbio->sdev; > + u64 sector = (sbio->physical + ix * PAGE_SIZE) >> 9; > + > if (sbio->err) { > - if (scrub_fixup_io(READ, sbio->sdev->dev->bdev, > - (sbio->physical + ix * PAGE_SIZE) >> 9, > + if (scrub_fixup_io(READ, sbio->sdev->dev->bdev, sector, > sbio->bio->bi_io_vec[ix].bv_page) == 0) { > if (scrub_fixup_check(sbio, ix) == 0) > - return; > + return 0; > } > } > > + spin_lock(&sdev->stat_lock); > + ++sdev->stat.read_errors; > + spin_unlock(&sdev->stat_lock); > + > scrub_fixup(sbio, ix); > + return 1; > } > > static int scrub_fixup_check(struct scrub_bio *sbio, int ix) > @@ -388,8 +395,14 @@ static void scrub_checksum(struct btrfs_work *work) > int ret; > > if (sbio->err) { > + ret = 0; > for (i = 0; i < sbio->count; ++i) > - scrub_recheck_error(sbio, i); > + ret |= scrub_recheck_error(sbio, i); > + if (!ret) { > + spin_lock(&sdev->stat_lock); > + ++sdev->stat.unverified_errors; > + spin_unlock(&sdev->stat_lock); > + } > > sbio->bio->bi_flags &= ~(BIO_POOL_MASK - 1); > sbio->bio->bi_flags |= 1 << BIO_UPTODATE; > @@ -402,10 +415,6 @@ static void scrub_checksum(struct btrfs_work *work) > bi->bv_offset = 0; > bi->bv_len = PAGE_SIZE; > } > - > - spin_lock(&sdev->stat_lock); > - ++sdev->stat.read_errors; > - spin_unlock(&sdev->stat_lock); > goto out; > } > for (i = 0; i < sbio->count; ++i) { > @@ -426,8 +435,14 @@ static void scrub_checksum(struct btrfs_work *work) > WARN_ON(1); > } > kunmap_atomic(buffer, KM_USER0); > - if (ret) > - scrub_recheck_error(sbio, i); > + if (ret) { > + ret = scrub_recheck_error(sbio, i); > + if (!ret) { > + spin_lock(&sdev->stat_lock); > + ++sdev->stat.unverified_errors; > + spin_unlock(&sdev->stat_lock); > + } > + } > } > > out:-- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
David Sterba
2011-Jun-16 21:26 UTC
Re: [PATCH v1 3/6] scrub: print paths of corrupted files
On Mon, Jun 13, 2011 at 09:10:36PM +0200, Jan Schmidt wrote:> While scrubbing, we may encounter various errors. Previously, a logical > address was printed to the log only. Now, all paths belonging to that > address are resolved and printed separately. That should work for hardlinks > as well as reflinks. > > Signed-off-by: Jan Schmidt <list.btrfs@jan-o-sch.net> > --- > fs/btrfs/scrub.c | 140 ++++++++++++++++++++++++++++++++++++++++++++++++++++- > 1 files changed, 137 insertions(+), 3 deletions(-) > > diff --git a/fs/btrfs/scrub.c b/fs/btrfs/scrub.c > index 00e4e58..e294d76 100644 > --- a/fs/btrfs/scrub.c > +++ b/fs/btrfs/scrub.c > @@ -27,6 +27,7 @@ > #include "volumes.h" > #include "disk-io.h" > #include "ordered-data.h" > +#include "backref.h" > > /* > * This is only the first step towards a full-features scrub. It reads all > @@ -106,6 +107,19 @@ struct scrub_dev { > spinlock_t stat_lock; > }; > > +struct scrub_warning { > + struct btrfs_path *path; > + u64 extent_item_size; > + char *scratch_buf; > + char *msg_buf; > + const char *errstr; > + u64 sector;better named ''physical'', like in scrub_bio> + u64 logical; > + struct btrfs_device *dev; > + int msg_bufsize; > + int scratch_bufsize; > +}; > + > static void scrub_free_csums(struct scrub_dev *sdev) > { > while (!list_empty(&sdev->csum_list)) { > @@ -201,6 +215,121 @@ nomem: > return ERR_PTR(-ENOMEM); > } > > +static int scrub_print_warning_inode(u64 inum, loff_t offset, void *ctx) > +{ > + u64 isize; > + int ret; > + struct extent_buffer *eb_i;''eb'' for consistency with others> + struct btrfs_inode_item *ii;inode_item is a commonly used name> + struct scrub_warning *swarn = ctx; > + struct btrfs_fs_info *fs_info = swarn->dev->dev_root->fs_info; > + struct inode_fs_paths ipath; > + > + ret = inode_item_info(inum, 0, fs_info->fs_root, swarn->path); > + if (ret) { > +err: > + printk(KERN_WARNING "btrfs: %s at logical %llu on dev " > + "%s, sector %llu, inode %llu, offset %llu: path " > + "resolving failed with ret=%d\n", swarn->errstr, > + swarn->logical, swarn->dev->name, swarn->sector, inum, > + offset, ret); > + return 0;please move it to the end after the other printk> + } > + eb_i = swarn->path->nodes[0]; > + ii = btrfs_item_ptr(eb_i, swarn->path->slots[0], > + struct btrfs_inode_item); > + btrfs_release_path(swarn->path); > + > + isize = btrfs_inode_size(eb_i, ii); > + > + ipath.left = swarn->msg_bufsize - 1; > + ipath.dest = swarn->msg_buf; > + ipath.path = swarn->path; > + ipath.scratch_buf = swarn->scratch_buf; > + ipath.scratch_bufsize = swarn->scratch_bufsize; > + ipath.fs_root = fs_info->fs_root; > + ipath.eb_i = eb_i; > + > + ret = paths_from_inode(inum, &ipath); > + if (ret < 0) > + goto err; > + > + printk(KERN_WARNING "btrfs: %s at logical %llu on dev " > + "%s, sector %llu, inode %llu, offset %llu, " > + "length %llu, links %u (path:%s)\n", swarn->errstr, > + swarn->logical, swarn->dev->name, swarn->sector, inum, offset, > + min(isize - offset, (u64)PAGE_SIZE), > + btrfs_inode_nlink(eb_i, ii), swarn->msg_buf); > + > + return 0; > +} > + > +static void scrub_print_warning(const char *errstr, struct scrub_bio *sbio, > + int ix)please rename ''ix'' to eg ''idx'', more readable> +{ > + struct btrfs_device *dev = sbio->sdev->dev; > + struct btrfs_fs_info *fs_info = dev->dev_root->fs_info; > + struct btrfs_path *path; > + struct btrfs_key found_key; > + struct extent_buffer *eb; > + struct btrfs_extent_item *ei; > + struct scrub_warning swarn; > + u32 item_size; > + int ret; > + u64 ref_root;i''m not able to understand what a u64 value describing a root means, seeing it here or in the printk message. it''s returned as *dest_root = btrfs_extent_inline_ref_offset(eb, eiref); can you please explain it to me?> + u64 ref_level;int is enough, as noted elsewhere> + unsigned long ptr = 0; > + static const int bufsize = 4096;drop static, this would allocate a memory cell to hold the value 4096, while without it, it''s just a named constant and this will make the compiler happy> + loff_t extent_item_offset;loff_t is a signed long long, do you really need it?> + > + path = btrfs_alloc_path(); > + > + swarn.scratch_buf = kmalloc(bufsize, GFP_NOFS); > + swarn.msg_buf = kmalloc(bufsize, GFP_NOFS); > + swarn.sector = (sbio->physical + ix * PAGE_SIZE) >> 9; > + swarn.logical = sbio->logical + ix * PAGE_SIZE; > + swarn.errstr = errstr; > + swarn.dev = dev; > + swarn.msg_bufsize = bufsize; > + swarn.scratch_bufsize = bufsize; > + > + if (!path || !swarn.scratch_buf || !swarn.msg_buf) > + goto out; > + > + ret = data_extent_from_logical(fs_info->extent_root, > + swarn.logical, path, &found_key); > + if (ret < 0) > + goto out; > + > + extent_item_offset = swarn.logical - found_key.objectid; > + swarn.extent_item_size = found_key.offset; > + > + eb = path->nodes[0]; > + ei = btrfs_item_ptr(eb, path->slots[0], struct btrfs_extent_item); > + item_size = btrfs_item_size_nr(eb, path->slots[0]); > + > + btrfs_release_path(path);is it safe to release the path here? it''s used in the else branch below> + > + if (ret) { > + ret = tree_backref_for_extent(&ptr, eb, ei, item_size, > + &ref_root, &ref_level); > + printk(KERN_WARNING "%s at logical %llu on dev %s, " > + "sector %llu: metadata %s (level %llu) in tree %llu\n", > + errstr, swarn.logical, dev->name, swarn.sector, > + ref_level ? "node" : "leaf", ret < 0 ? -1 : ref_level, > + ret < 0 ? -1 : ref_root); > + } else { > + swarn.path = path; > + iterate_extent_inodes(eb, ei, extent_item_offset, item_size, > + scrub_print_warning_inode, &swarn); > + } > + > +out: > + btrfs_free_path(path); > + kfree(swarn.scratch_buf); > + kfree(swarn.msg_buf); > +} > + > /* > * scrub_recheck_error gets called when either verification of the page > * failed or the bio failed to read, e.g. with EIO. In the latter case, > @@ -218,6 +347,11 @@ static int scrub_recheck_error(struct scrub_bio *sbio, int ix) > if (scrub_fixup_check(sbio, ix) == 0) > return 0; > } > + if (printk_ratelimit())please don''t use printk_ratelimit; as a simple printk_ratelimited is not applicable here, we have to use __ratelimit: (include/linux/printk.h) static DEFINE_RATELIMIT_STATE(_rs, DEFAULT_RATELIMIT_INTERVAL, DEFAULT_RATELIMIT_BURST); if (__ratelimit(&_rs))> + scrub_print_warning("i/o error", sbio, ix); > + } else { > + if (printk_ratelimit())and the ratelimit context can be even shared> + scrub_print_warning("checksum error", sbio, ix); > } > > spin_lock(&sdev->stat_lock); > @@ -333,7 +467,7 @@ static void scrub_fixup(struct scrub_bio *sbio, int ix) > spin_unlock(&sdev->stat_lock); > > if (printk_ratelimit())(drop)> - printk(KERN_ERR "btrfs: fixed up at %llu\n", > + printk(KERN_ERR "btrfs: fixed up error at logical %llu\n", > (unsigned long long)logical);printk_ratelimited seems that I forgot to convert this (and the one below) some time ago. although changes should not be mixed within a patch, I guess this one is rather simple and a good thing to do.> return; > > @@ -344,8 +478,8 @@ uncorrectable: > spin_unlock(&sdev->stat_lock); > > if (printk_ratelimit())(drop)> - printk(KERN_ERR "btrfs: unable to fixup at %llu\n", > - (unsigned long long)logical); > + printk(KERN_ERR "btrfs: unable to fixup (regular) error at " > + "logical %llu\n", (unsigned long long)logical);printk_ratelimited> } > > static int scrub_fixup_io(int rw, struct block_device *bdev, sector_t sector,-- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
David Sterba
2011-Jun-16 21:27 UTC
Re: [PATCH v1 6/6] scrub: add fixup code for errors on nodatasum files
On Mon, Jun 13, 2011 at 09:10:39PM +0200, Jan Schmidt wrote:> This removes a FIXME comment and introduces the first part of nodatasum > fixup: It gets the corresponding inode for a logical address and triggers a > regular readpage for the corrupted sector. > > Once we have on-the-fly error correction our error will be automatically > corrected. The correction code is expected to clear the newly introduced > EXTENT_DAMAGED flag, making scrub report that error as "corrected" instead > of "uncorrectable" eventually. > > Signed-off-by: Jan Schmidt <list.btrfs@jan-o-sch.net> > --- > fs/btrfs/extent_io.h | 1 + > fs/btrfs/scrub.c | 174 ++++++++++++++++++++++++++++++++++++++++++++++++-- > 2 files changed, 170 insertions(+), 5 deletions(-) > > diff --git a/fs/btrfs/extent_io.h b/fs/btrfs/extent_io.h > index 2fef77f..906ea42 100644 > --- a/fs/btrfs/extent_io.h > +++ b/fs/btrfs/extent_io.h > @@ -17,6 +17,7 @@ > #define EXTENT_NODATASUM (1 << 10) > #define EXTENT_DO_ACCOUNTING (1 << 11) > #define EXTENT_FIRST_DELALLOC (1 << 12) > +#define EXTENT_DAMAGED (1 << 13) > #define EXTENT_IOBITS (EXTENT_LOCKED | EXTENT_WRITEBACK) > #define EXTENT_CTLBITS (EXTENT_DO_ACCOUNTING | EXTENT_FIRST_DELALLOC) > > diff --git a/fs/btrfs/scrub.c b/fs/btrfs/scrub.c > index ec29ce8..a2aa47a 100644 > --- a/fs/btrfs/scrub.c > +++ b/fs/btrfs/scrub.c > @@ -27,6 +27,7 @@ > #include "volumes.h" > #include "disk-io.h" > #include "ordered-data.h" > +#include "transaction.h" > #include "backref.h" > > /* > @@ -94,6 +95,7 @@ struct scrub_dev { > int first_free; > int curr; > atomic_t in_flight; > + atomic_t fixup;a nondescriptive name; a counter?> spinlock_t list_lock; > wait_queue_head_t list_wait; > u16 csum_size; > @@ -107,6 +109,14 @@ struct scrub_dev { > spinlock_t stat_lock; > }; > > +struct scrub_fixup_nodatasum { > + struct scrub_dev *sdev; > + u64 logical; > + struct btrfs_root *root; > + struct btrfs_work work; > + u64 mirror_num;int should be enough and is used elsewhere but scrub_page, uses u64 as well; that''s a bit too much IMO> +}; > + > struct scrub_warning { > struct btrfs_path *path; > u64 extent_item_size; > @@ -195,12 +205,13 @@ struct scrub_dev *scrub_setup_dev(struct btrfs_device *dev) > > if (i != SCRUB_BIOS_PER_DEV-1) > sdev->bios[i]->next_free = i + 1; > - else > + else > sdev->bios[i]->next_free = -1; > } > sdev->first_free = 0; > sdev->curr = -1; > atomic_set(&sdev->in_flight, 0); > + atomic_set(&sdev->fixup, 0); > atomic_set(&sdev->cancel_req, 0); > sdev->csum_size = btrfs_super_csum_size(&fs_info->super_copy); > INIT_LIST_HEAD(&sdev->csum_list); > @@ -330,6 +341,141 @@ out: > kfree(swarn.msg_buf); > } > > +static int scrub_fixup_readpage(u64 inum, loff_t offset, void *ctx) > +{ > + struct page *page; > + unsigned long index; > + struct scrub_fixup_nodatasum *fixup = ctx; > + int ret; > + int corrected; > + struct btrfs_key key; > + struct inode *inode; > + int end = offset + PAGE_SIZE - 1;loff_t to int?> + > + key.type = BTRFS_INODE_ITEM_KEY; > + key.objectid = inum; > + key.offset = 0; > + inode = btrfs_iget(fixup->root->fs_info->sb, &key, > + fixup->root->fs_info->fs_root, NULL); > + if (IS_ERR(inode)) > + return -1;needs better error code> + > + ret = set_extent_bit(&BTRFS_I(inode)->io_tree, offset, end, > + EXTENT_DAMAGED, 0, NULL, NULL, GFP_NOFS); > + if (ret) > + return ret < 0 ? ret : -1;needs better error code> + > + index = offset >> PAGE_CACHE_SHIFT; > + > + page = find_or_create_page(inode->i_mapping, index, GFP_NOFS); > + if (!page) > + return -1;needs better error code> + > + ret = extent_read_full_page(&BTRFS_I(inode)->io_tree, page, > + btrfs_get_extent, fixup->mirror_num); > + wait_on_page_locked(page); > + corrected = !test_range_bit(&BTRFS_I(inode)->io_tree, offset, end, > + EXTENT_DAMAGED, 0, NULL); > + > + if (corrected) > + WARN_ON(!PageUptodate(page)); > + else > + clear_extent_bit(&BTRFS_I(inode)->io_tree, offset, end, > + EXTENT_DAMAGED, 0, 0, NULL, GFP_NOFS); > + > + put_page(page); > + iput(inode); > + > + if (ret < 0) > + return ret; > + > + if (ret == 0 && corrected) { > + /* > + * we only need to call readpage for one of the inodes belonging > + * to this extent. so make iterate_extent_inodes stop > + */ > + return 1; > + } > + > + return -1; > +} > + > +static void scrub_fixup_nodatasum(struct btrfs_work *work) > +{ > + int ret; > + struct scrub_fixup_nodatasum *fixup; > + struct scrub_dev *sdev; > + struct btrfs_trans_handle *trans = NULL; > + struct btrfs_fs_info *fs_info; > + struct btrfs_path *path; > + int uncorrectable = 0; > + > + fixup = container_of(work, struct scrub_fixup_nodatasum, work); > + sdev = fixup->sdev; > + fs_info = fixup->root->fs_info; > + > + path = btrfs_alloc_path(); > + if (!path) { > + spin_lock(&sdev->stat_lock); > + ++sdev->stat.malloc_errors; > + spin_unlock(&sdev->stat_lock); > + uncorrectable = 1; > + goto out; > + } > + > + trans = btrfs_join_transaction(fixup->root); > + if (IS_ERR(trans)) { > + uncorrectable = 1; > + goto out; > + } > + > + /* > + * the idea is to trigger a regular read through the standard path. we > + * read a page from the (failed) logical address by specifying the > + * corresponding copynum of the failed sector. thus, that readpage is > + * expected to fail. > + * that is the point where on-the-fly error correction will kick in > + * (once it''s finished) and rewrite the failed sector if a good copy > + * can be found. > + */ > + ret = iterate_inodes_from_logical(fixup->logical, fixup->root->fs_info, > + path, scrub_fixup_readpage, > + fixup); > + if (ret < 0) { > + uncorrectable = 1; > + goto out; > + } > + WARN_ON(ret != 1); > + > + spin_lock(&sdev->stat_lock); > + ++sdev->stat.corrected_errors; > + spin_unlock(&sdev->stat_lock); > + > +out: > + if (trans && !IS_ERR(trans)) > + btrfs_end_transaction(trans, fixup->root); > + if (uncorrectable) { > + spin_lock(&sdev->stat_lock); > + ++sdev->stat.uncorrectable_errors; > + spin_unlock(&sdev->stat_lock); > + if (printk_ratelimit())printk_ratelimited> + printk(KERN_ERR "btrfs: unable to fixup (nodatasum) " > + "error at logical %llu\n", fixup->logical); > + } > + > + btrfs_free_path(path); > + kfree(fixup); > + > + /* see caller why we''re pretending to be paused in the scrub counters */ > + mutex_lock(&fs_info->scrub_lock); > + atomic_dec(&fs_info->scrubs_running); > + atomic_dec(&fs_info->scrubs_paused); > + mutex_unlock(&fs_info->scrub_lock); > + atomic_dec(&sdev->fixup); > + wake_up(&fs_info->scrub_pause_wait); > + wake_up(&sdev->list_wait); > +} > + > /* > * scrub_recheck_error gets called when either verification of the page > * failed or the bio failed to read, e.g. with EIO. In the latter case, > @@ -398,6 +544,7 @@ static void scrub_fixup(struct scrub_bio *sbio, int ix) > struct btrfs_fs_info *fs_info = sdev->dev->dev_root->fs_info; > struct btrfs_mapping_tree *map_tree = &fs_info->mapping_tree; > struct btrfs_multi_bio *multi = NULL; > + struct scrub_fixup_nodatasum *fixup; > u64 logical = sbio->logical + ix * PAGE_SIZE; > u64 length; > int i; > @@ -406,12 +553,28 @@ static void scrub_fixup(struct scrub_bio *sbio, int ix) > > if ((sbio->spag[ix].flags & BTRFS_EXTENT_FLAG_DATA) && > (sbio->spag[ix].have_csum == 0)) { > + fixup = kzalloc(sizeof(*fixup), GFP_NOFS);what happens if the allocation fails?> + fixup->sdev = sdev; > + fixup->logical = logical; > + fixup->root = fs_info->extent_root; > + fixup->mirror_num = sbio->spag[ix].mirror_num; > /* > - * nodatasum, don''t try to fix anything > - * FIXME: we can do better, open the inode and trigger a > - * writeback > + * increment scrubs_running to prevent cancel requests from > + * completing as long as a fixup worker is running. we must also > + * increment scrubs_paused to prevent deadlocking on pause > + * requests used for transactions commits (as the worker uses a > + * transaction context). it is safe to regard the fixup worker > + * as paused for all matters practical. effectively, we only > + * avoid cancellation requests from completing.from a rather brief look, this works as advertised> */ > - goto uncorrectable; > + mutex_lock(&fs_info->scrub_lock); > + atomic_inc(&fs_info->scrubs_running); > + atomic_inc(&fs_info->scrubs_paused); > + mutex_unlock(&fs_info->scrub_lock); > + atomic_inc(&sdev->fixup); > + fixup->work.func = scrub_fixup_nodatasum; > + btrfs_queue_worker(&fs_info->scrub_workers, &fixup->work); > + return; > } > > length = PAGE_SIZE; > @@ -1404,6 +1567,7 @@ int btrfs_scrub_dev(struct btrfs_root *root, u64 devid, u64 start, u64 end, > ret = scrub_enumerate_chunks(sdev, start, end); > > wait_event(sdev->list_wait, atomic_read(&sdev->in_flight) == 0); > + wait_event(sdev->list_wait, atomic_read(&sdev->fixup) == 0);what about joining those into one wait?> > atomic_dec(&fs_info->scrubs_running); > wake_up(&fs_info->scrub_pause_wait);-- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Hi, sorry for the noise, this has no comments from me, david On Thu, Jun 16, 2011 at 11:25:21PM +0200, David Sterba wrote:> On Mon, Jun 13, 2011 at 09:10:35PM +0200, Jan Schmidt wrote: > > In normal operation, scrub is reading data sequentially in large portions. > > In case of an i/o error, we try to find the corrupted area(s) by issuing > > page sized read requests. With this commit we increment the > > unverified_errors counter if all of the small size requests succeed. > > > > Userland patches carrying such conspicous events to the administrator should > > already be around. > > > > Signed-off-by: Jan Schmidt <list.btrfs@jan-o-sch.net> > > --- > > fs/btrfs/scrub.c | 37 ++++++++++++++++++++++++++----------- > > 1 files changed, 26 insertions(+), 11 deletions(-) > > > > diff --git a/fs/btrfs/scrub.c b/fs/btrfs/scrub.c > > index d5a4108..00e4e58 100644 > > --- a/fs/btrfs/scrub.c > > +++ b/fs/btrfs/scrub.c > > @@ -207,18 +207,25 @@ nomem: > > * recheck_error gets called for every page in the bio, even though only > > * one may be bad > > */ > > -static void scrub_recheck_error(struct scrub_bio *sbio, int ix) > > +static int scrub_recheck_error(struct scrub_bio *sbio, int ix) > > { > > + struct scrub_dev *sdev = sbio->sdev; > > + u64 sector = (sbio->physical + ix * PAGE_SIZE) >> 9; > > + > > if (sbio->err) { > > - if (scrub_fixup_io(READ, sbio->sdev->dev->bdev, > > - (sbio->physical + ix * PAGE_SIZE) >> 9, > > + if (scrub_fixup_io(READ, sbio->sdev->dev->bdev, sector, > > sbio->bio->bi_io_vec[ix].bv_page) == 0) { > > if (scrub_fixup_check(sbio, ix) == 0) > > - return; > > + return 0; > > } > > } > > > > + spin_lock(&sdev->stat_lock); > > + ++sdev->stat.read_errors; > > + spin_unlock(&sdev->stat_lock); > > + > > scrub_fixup(sbio, ix); > > + return 1; > > } > > > > static int scrub_fixup_check(struct scrub_bio *sbio, int ix) > > @@ -388,8 +395,14 @@ static void scrub_checksum(struct btrfs_work *work) > > int ret; > > > > if (sbio->err) { > > + ret = 0; > > for (i = 0; i < sbio->count; ++i) > > - scrub_recheck_error(sbio, i); > > + ret |= scrub_recheck_error(sbio, i); > > + if (!ret) { > > + spin_lock(&sdev->stat_lock); > > + ++sdev->stat.unverified_errors; > > + spin_unlock(&sdev->stat_lock); > > + } > > > > sbio->bio->bi_flags &= ~(BIO_POOL_MASK - 1); > > sbio->bio->bi_flags |= 1 << BIO_UPTODATE; > > @@ -402,10 +415,6 @@ static void scrub_checksum(struct btrfs_work *work) > > bi->bv_offset = 0; > > bi->bv_len = PAGE_SIZE; > > } > > - > > - spin_lock(&sdev->stat_lock); > > - ++sdev->stat.read_errors; > > - spin_unlock(&sdev->stat_lock); > > goto out; > > } > > for (i = 0; i < sbio->count; ++i) { > > @@ -426,8 +435,14 @@ static void scrub_checksum(struct btrfs_work *work) > > WARN_ON(1); > > } > > kunmap_atomic(buffer, KM_USER0); > > - if (ret) > > - scrub_recheck_error(sbio, i); > > + if (ret) { > > + ret = scrub_recheck_error(sbio, i); > > + if (!ret) { > > + spin_lock(&sdev->stat_lock); > > + ++sdev->stat.unverified_errors; > > + spin_unlock(&sdev->stat_lock); > > + } > > + } > > } > > > > out: > -- > To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in > the body of a message to majordomo@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html-- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Jan Schmidt
2011-Jun-18 11:25 UTC
Re: [PATCH v1 1/6] added helper functions to iterate backrefs
Lots of quotation removed. All removed comments accepted. On 16.06.2011 23:24, David Sterba wrote:> On Mon, Jun 13, 2011 at 09:10:34PM +0200, Jan Schmidt wrote: >> +/* >> + * this iterates to turn a btrfs_inode_ref into a full filesystem path. elements >> + * of the path are separated by ''/'' and the path is guaranteed to be >> + * 0-terminated. the path is only given within the current file system. >> + * Therefore, it never starts with a ''/''. the caller is responsible to provide >> + * "size" bytes in "dest". the dest buffer will be filled backwards! the idea is >> + * that in case of an overflow, the lower part in the hierarchie is most >> + * important to the user. finally, the start point of resulting the string is > ^^^ >> + * returned. in case the path would overflow, "..." is added at the front of >> + * the string and iteration stops regularly. >> + */ >> +static char *iref_to_path(struct btrfs_root *fs_root, struct btrfs_path *path, >> + struct btrfs_inode_ref *iref, >> + struct extent_buffer *eb, u64 parent, >> + char *dest, u32 size) >> +{ >> + u32 len; >> + int slot; >> + u64 inum; >> + int ret; >> + u32 left = size - 1; > > this confused me first, ''left'' may be ambiguous, something like ''bytes_left'' > would be better > >> + >> + dest[left] = ''\0''; >> + >> + while (1) { >> + len = btrfs_inode_ref_name_len(eb, iref); >> + if (len > left) { >> + if (size < 4) >> + return dest+left; > > break; > >> + if (left > 3) >> + dest += left - 3; >> + memcpy(dest, "...", 3); >> + return dest; > > i''d rather see this consistent with other return points, > > if (left > 3) > left -= 3; > else > left = 0; > memcpy(dest + left, "...", 3); > break; > >> + } >> + left -= len; >> + read_extent_buffer(eb, dest+left, >> + (unsigned long)(iref + 1), len); >> + >> + ret = inode_item_info(parent, 0, fs_root, path); >> + if (ret) >> + return ERR_PTR(ret); >> + eb = path->nodes[0]; >> + btrfs_release_path(path); >> + >> + ret = inode_ref_info(parent, 0, fs_root, path, 0, >> + &inum, NULL, &slot); >> + if (ret) >> + return ERR_PTR(ret); >> + >> + if (parent == inum) /* regular exit ahead */ { >> + return dest+left; > break; > > and please put the comment before the if and remove { }Damn. Braces where a debugging leftover. I wonder if I missed something from checkpatch.pl.>> + } >> + >> + iref = btrfs_item_ptr(eb, slot, struct btrfs_inode_ref); >> + parent = inum; >> + if (left > 0) { >> + dest[left-1] = ''/''; >> + --left; > > swap the lines and remove "-1" :) > >> + } >> + } > > return dest + left;Took suggested modifications and it looks prettier with a return at the end.>> +} >> + >> +/* >> + * this makes the path point to (logical EXTENT_ITEM *) >> + * returns 0 for data blocks, 1 for tree blocks and <0 on error >> + */ >> +int data_extent_from_logical(struct btrfs_root *root, u64 logical, >> + struct btrfs_path *path, >> + struct btrfs_key *found_key) >> +{ >> + int ret; >> + u64 flags; >> + u32 item_size; >> + struct extent_buffer *eb; >> + struct btrfs_extent_item *ei; >> + struct btrfs_key key; >> + struct btrfs_key f; > > tmpkey; > >> + struct btrfs_key *fk = found_key ? found_key : &f; > > transform this to > > if (!found_key) > found_key = &tmpkey; > > and use found_key along > > still, will you ever pass NULL found_key?You are right, I probably won''t. I''ll take out those lines completely.>> + >> + key.type = BTRFS_EXTENT_ITEM_KEY; >> + key.objectid = logical; >> + key.offset = (u64)-1; >> + >> + ret = btrfs_search_slot(NULL, root, &key, path, 0, 0); >> + if (ret < 0) >> + return ret; >> + ret = btrfs_previous_item(root->fs_info->extent_root, path, >> + 0, BTRFS_EXTENT_ITEM_KEY); >> + if (ret < 0) >> + return ret; >> + btrfs_item_key_to_cpu(path->nodes[0], fk, path->slots[0]); >> + if (fk->type != BTRFS_EXTENT_ITEM_KEY || fk->objectid > logical || >> + fk->objectid + fk->offset <= logical) >> + return -1; > > this is -EPERM, confusing. eg btrfs_previous_item a few lines above may return > variety of error codes from the whole function, EPERM just does not fit. does > this check mean some serious error? inconsistency or corruption?It doesn''t. -ENOENT seems appropriate.>> + >> + eb = path->nodes[0]; >> + item_size = btrfs_item_size_nr(eb, path->slots[0]); >> + BUG_ON(item_size < sizeof(*ei)); >> + >> + ei = btrfs_item_ptr(eb, path->slots[0], struct btrfs_extent_item); >> + flags = btrfs_extent_flags(eb, ei); >> + >> + if (flags & BTRFS_EXTENT_FLAG_TREE_BLOCK) >> + return 1; >> + >> + return 0; >> +} >> + >> +/* >> + * helper function to iterate extent backrefs. ptr must point to a 0 value for >> + * the first call and may be modified. it is used to track state. >> + * if more backrefs exist, 0 is returned and the next call to _get_extent_ref >> + * must pass the modified ptr parameter to get to the next backref. >> + * after the last backref was processed, 1 is returned. >> + * returns <0 on error >> + */ >> +static int _get_extent_ref(int flags_wanted, int type_wanted, > __get_extent_ref > single score function prefixes are not used > > flags_wanted is compared to extent_flags of u64 typeAlongside, I changed type_wanted from int to u8.>> + unsigned long *ptr, struct extent_buffer *eb, >> + struct btrfs_extent_item *ei, u32 item_size, >> + struct btrfs_extent_inline_ref **eiref) >> +{ >> + int type; >> + int again = 0; >> + unsigned long end; >> + u64 flags; >> + struct btrfs_tree_block_info *info; >> + >> + if (!*ptr) { >> + /* first call */ >> + flags = btrfs_extent_flags(eb, ei); >> + if (!(flags & flags_wanted)) >> + return -1; > > again, better errorcode, EINVAL seems to be the right choice > >> + if (flags & BTRFS_EXTENT_FLAG_TREE_BLOCK) { >> + info = (struct btrfs_tree_block_info *)(ei + 1); >> + *eiref = (struct btrfs_extent_inline_ref *)(info + 1); >> + } else { >> + *eiref = (struct btrfs_extent_inline_ref *)(ei + 1); > > style: remove { } around single if/else statementThat would look horribly unbalanced because the if part needs braces.>> + } > >> + *ptr = (unsigned long)*eiref; >> + } >> + >> + end = (unsigned long)ei + item_size; >> + >> +again: >> + again = 0; > > excessive use of goto detected: > > do { >> + >> + *eiref = (struct btrfs_extent_inline_ref *)*ptr; >> + type = btrfs_extent_inline_ref_type(eb, *eiref); >> + >> + if (type != type_wanted) >> + again = 1; >> + >> + *ptr += btrfs_extent_inline_ref_size(type); >> + >> + WARN_ON(*ptr > end); >> + if (*ptr == end) >> + return 1; /* last */ >> + >> + if (again) >> + goto again; > > } while (type != type_wanted); > >> + >> + return 0; >> +} >> + >> +/* >> + * reads the tree block backref for an extent. tree level and root are returned >> + * through dest_level and dest_root. ptr must point to a 0 value for the first >> + * call and may be modified (see _get_extent_ref comment). >> + * returns 0 on success, <0 on error. note: in contrast to _get_extent_ref this >> + * one never returns 1! >> + */ >> +int tree_backref_for_extent(unsigned long *ptr, struct extent_buffer *eb, >> + struct btrfs_extent_item *ei, u32 item_size, >> + u64 *dest_root, u64 *dest_level) > > there''s not a formal "rule" or something, but i''ve seen outgoing parameters > named with ''out_'' > > dest_level can be just an int, levels fit in one byte anyway > >> +{ >> + int ret; >> + struct btrfs_tree_block_info *info; >> + struct btrfs_extent_inline_ref *eiref; >> + >> + ret = _get_extent_ref(BTRFS_EXTENT_FLAG_TREE_BLOCK, >> + BTRFS_TREE_BLOCK_REF_KEY, ptr, eb, ei, >> + item_size, &eiref); >> + if (ret < 0) >> + return ret; >> + >> + WARN_ON(!ret); /* multiple tree backrefs for this extent */ > > as we will probably have CONFIG_DEBUG, the comment makes a good candidate for a > printk messageI''ll prepare that with a if (!ret) and WARN_ON(1) with printk.>> + >> + info = (struct btrfs_tree_block_info *)(ei + 1); >> + *dest_root = btrfs_extent_inline_ref_offset(eb, eiref); >> + *dest_level = btrfs_tree_block_level(eb, info); >> + >> + return 0; >> +} >> + >> +static int data_inode_for_extent(unsigned long *ptr, struct extent_buffer *eb, >> + struct btrfs_extent_item *ei, >> + u32 item_size, u64 *dest_inum, >> + u64 *dest_ioff) >> +{ >> + int ret; >> + struct btrfs_extent_inline_ref *eiref; >> + struct btrfs_extent_data_ref *dref; >> + >> + ret = _get_extent_ref(BTRFS_EXTENT_FLAG_DATA, BTRFS_EXTENT_DATA_REF_KEY, >> + ptr, eb, ei, item_size, &eiref); >> + if (ret < 0) >> + return ret; >> + >> + dref = (struct btrfs_extent_data_ref *)(&eiref->offset); >> + if (btrfs_extent_data_ref_root(eb, dref) != BTRFS_FS_TREE_OBJECTID) { >> + WARN_ON(1); >> + return -1; > > needs better error code > >> + } >> + >> + *dest_inum = btrfs_extent_data_ref_objectid(eb, dref); >> + *dest_ioff = btrfs_extent_data_ref_offset(eb, dref); >> + >> + return ret; >> +} >> + >> +/* >> + * calls iterate() for every inode that references the extent identified by >> + * the given parameters. >> + * when the iterator function returns a non-zero value, iteration stops. >> + */ >> +int iterate_extent_inodes(struct extent_buffer *eb, >> + struct btrfs_extent_item *ei, >> + loff_t extent_item_offset, u32 item_size, >> + iterate_extent_inodes_t *iterate, void *ctx) >> +{ >> + int last; >> + u64 inum; >> + unsigned long ptr = 0; >> + loff_t extent_data_item_offset; >> + int ret; >> + >> + do { >> + last = data_inode_for_extent(&ptr, eb, ei, item_size, &inum, >> + &extent_data_item_offset); >> + if (last < 0) >> + return last; >> + >> + ret = iterate(inum, extent_item_offset+extent_data_item_offset, >> + ctx); >> + if (ret) >> + return ret; >> + >> + } while (!last); >> + >> + return 0; >> +} >> + >> +/* >> + * just a convenience function combining data_extent_from_logical and >> + * iterate_extent_inodes. >> + */ > > a better comment please or noneOpt for the latter. Didn''t feel that would need a comment, anyway.>> +int iterate_inodes_from_logical(u64 logical, struct btrfs_fs_info *fs_info, >> + struct btrfs_path *path, >> + iterate_extent_inodes_t *iterate, void *ctx) >> +{ >> + int ret; >> + u32 item_size; >> + struct extent_buffer *l; >> + struct btrfs_extent_item *extent; >> + loff_t offset; >> + struct btrfs_key found_key; >> + >> + ret = data_extent_from_logical(fs_info->extent_root, logical, path, >> + &found_key); >> + if (ret) >> + return ret; > > this can ret < 0 in case of error, and 1 for the tree block. not sure if this > shouldn''t be handled separatelyWell, that''s fine I think. We can''t proceed either way here.>> + >> + offset = logical - found_key.objectid; >> + l = path->nodes[0]; >> + extent = btrfs_item_ptr(l, path->slots[0], struct btrfs_extent_item); >> + item_size = btrfs_item_size_nr(l, path->slots[0]); >> + btrfs_release_path(path); >> + >> + ret = iterate_extent_inodes(l, extent, offset, item_size, iterate, ctx); >> + >> + return ret; >> +} >> + >> +static int iterate_irefs(u64 inum, struct extent_buffer *eb_i, >> + struct btrfs_root *fs_root, >> + struct btrfs_path *path, >> + iterate_irefs_t *iterate, void *ctx) >> +{ >> + int ret; >> + int slot; >> + u32 cur; >> + u32 len; >> + u32 name_len; >> + u64 parent = 0; >> + struct extent_buffer *eb_ir; >> + struct btrfs_item *item; >> + struct btrfs_inode_ref *iref; >> + >> + while (1) { >> + ret = inode_ref_info(inum, parent ? parent+1 : 0, fs_root, path, >> + 1, &parent, &eb_ir, &slot); >> + if (ret < 0) >> + return ret; >> + if (ret) >> + break; >> + >> + item = btrfs_item_nr(eb_i, slot); >> + iref = btrfs_item_ptr(eb_i, slot, struct btrfs_inode_ref); >> + >> + for (cur = 0; cur < btrfs_item_size(eb_i, item); cur += len) { >> + name_len = btrfs_inode_ref_name_len(eb_i, iref); >> + ret = iterate(parent, iref, eb_ir, slot, ctx); >> + if (ret) >> + return ret; >> + len = sizeof(*iref) + name_len; >> + iref = (struct btrfs_inode_ref *)((char *)iref + len); >> + } >> + } > > can you estimate how long this loop may take? (eg. number of iterations per > call) there may be a need for cond_resched()Number of iterations is limited by the number of backrefs an inode can have, which is limited by the number of backrefs that fit into one leaf. As leaves are only 4kb (maybe 8kb soon), I don''t think that requires a cond_resched(). More input on this appreciated :-)>> + >> + return 0; >> +} >> + >> +/* >> + * returns 0 if the path could be dumped (probably truncated) >> + * returns <0 in case of an error >> + */ >> +static int inode_to_path(u64 inum, struct btrfs_inode_ref *iref, >> + struct extent_buffer *eb_ir, int slot, >> + void *ctx) >> +{ >> + struct inode_fs_paths *ipath = ctx; >> + struct extent_buffer *eb_i = ipath->eb_i; >> + u32 path_len; >> + char *fs_path; >> + >> + if (ipath->left < 2) >> + return -EOVERFLOW; >> + >> + *ipath->dest++ = '' ''; >> + --ipath->left; >> + >> + fs_path = iref_to_path(ipath->fs_root, ipath->path, iref, eb_i, >> + inum, ipath->scratch_buf, ipath->left); > > fs_path will be never NULL, either an IS_ERR or ipath->scratch_buf + some > offsetYou are right, that changed while developing, I''ll remove the check. In this implementation ...>> + if (!fs_path || IS_ERR(fs_path)) >> + return PTR_ERR(fs_path); > > this could return PTR_ERR(0), confusing with ''return 0'' for the calle... that comment is void then.>> + path_len = ipath->scratch_buf + ipath->left - fs_path - 1; >> + if (path_len+1 > ipath->left) >> + return -EOVERFLOW; >> + memcpy(ipath->dest, fs_path, path_len+1); >> + ipath->left -= path_len; >> + ipath->dest += path_len; >> + >> + return 0; >> +} >> + >> +int paths_from_inode(u64 inum, struct inode_fs_paths *ipath) >> +{ >> + return iterate_irefs(inum, ipath->eb_i, ipath->fs_root, ipath->path, >> + inode_to_path, ipath); >> +} >> diff --git a/fs/btrfs/backref.h b/fs/btrfs/backref.h >> new file mode 100644 >> index 0000000..6b7e856 >> --- /dev/null >> +++ b/fs/btrfs/backref.h >> @@ -0,0 +1,59 @@ >> +/* >> + * Copyright (C) 2011 STRATO. All rights reserved. >> + * >> + * This program is free software; you can redistribute it and/or >> + * modify it under the terms of the GNU General Public >> + * License v2 as published by the Free Software Foundation. >> + * >> + * This program is distributed in the hope that it will be useful, >> + * but WITHOUT ANY WARRANTY; without even the implied warranty of >> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU >> + * General Public License for more details. >> + * >> + * You should have received a copy of the GNU General Public >> + * License along with this program; if not, write to the >> + * Free Software Foundation, Inc., 59 Temple Place - Suite 330, >> + * Boston, MA 021110-1307, USA. >> + */ >> + >> +#ifndef __BTRFS_BACKREF__ >> +#define __BTRFS_BACKREF__ >> + >> +struct inode_fs_paths { >> + int left; > > as suggested above, rename this > >> + char *dest; >> + struct btrfs_path *path; >> + char *scratch_buf; >> + struct btrfs_root *fs_root; >> + int scratch_bufsize; >> + struct extent_buffer *eb_i; >> +}; >> + >> +typedef int (iterate_extent_inodes_t)(u64 inum, loff_t offset, void *ctx); >> +typedef int (iterate_irefs_t)(u64 parent, struct btrfs_inode_ref *iref, >> + struct extent_buffer *eb_ir, >> + int slot, void *ctx); >> + >> +int inode_item_info(u64 inum, u64 ioff, struct btrfs_root *fs_root, >> + struct btrfs_path *path); >> + >> +int data_extent_from_logical(struct btrfs_root *root, u64 logical, >> + struct btrfs_path *path, >> + struct btrfs_key *found_key); >> + >> +int tree_backref_for_extent(unsigned long *ptr, struct extent_buffer *eb, >> + struct btrfs_extent_item *ei, u32 item_size, >> + u64 *dest_root, u64 *dest_level); >> + >> +int iterate_extent_inodes(struct extent_buffer *eb, >> + struct btrfs_extent_item *ei, >> + loff_t extent_item_offset, u32 item_size, >> + iterate_extent_inodes_t *iterate, void *ctx); >> + >> +int iterate_inodes_from_logical(u64 logical, struct btrfs_fs_info *fs_info, >> + struct btrfs_path *path, >> + iterate_extent_inodes_t *iterate, void *ctx); >> + >> +int paths_from_inode(u64 inum, struct inode_fs_paths *ipath); > > every function from the .c file not listed here should be staticI agree on that. Double checked and didn''t find any that isn''t. Thanks! Jan -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Jan Schmidt
2011-Jun-18 11:26 UTC
Re: [PATCH v1 3/6] scrub: print paths of corrupted files
Lots of quotation removed. All removed comments accepted. On 16.06.2011 23:26, David Sterba wrote:> On Mon, Jun 13, 2011 at 09:10:36PM +0200, Jan Schmidt wrote: >> +struct scrub_warning { >> + struct btrfs_path *path; >> + u64 extent_item_size; >> + char *scratch_buf; >> + char *msg_buf; >> + const char *errstr; >> + u64 sector; > > better named ''physical'', like in scrub_bioAs I save a sector there and not a physical address, I still prefer "sector". But as a compromise, I''ll change the type to sector_t instead.>> +static void scrub_print_warning(const char *errstr, struct scrub_bio *sbio, >> + int ix) > > please rename ''ix'' to eg ''idx'', more readableI''d like to stay consistent with the rest of the file, using ix at similar opportunities. And besides: I like ix more than idx :-)>> +{ >> + struct btrfs_device *dev = sbio->sdev->dev; >> + struct btrfs_fs_info *fs_info = dev->dev_root->fs_info; >> + struct btrfs_path *path; >> + struct btrfs_key found_key; >> + struct extent_buffer *eb; >> + struct btrfs_extent_item *ei; >> + struct scrub_warning swarn; >> + u32 item_size; >> + int ret; >> + u64 ref_root; > > i''m not able to understand what a u64 value describing a root means, seeing it > here or in the printk message. it''s returned as > > *dest_root = btrfs_extent_inline_ref_offset(eb, eiref); > > can you please explain it to me?Have a look at the BTRFS_*_TREE_OBJECTID #defines in ctree.h. btrfs-debug-tree prints them as tree block backref root 2>> + u64 ref_level; > > int is enough, as noted elsewhere > >> + unsigned long ptr = 0; >> + static const int bufsize = 4096; > > drop static, this would allocate a memory cell to hold the value 4096, while > without it, it''s just a named constant and this will make the compiler happy > >> + loff_t extent_item_offset; > > loff_t is a signed long long, do you really need it?Seems I don''t. I was looking around how I came to use loff_t in the first place, but couldn''t find a good reason. I''m replacing it with u64, which makes more sense anyway, since the loff_t parameter I''m passing around is the result of a subtraction of two u64s.>> + >> + path = btrfs_alloc_path(); >> + >> + swarn.scratch_buf = kmalloc(bufsize, GFP_NOFS); >> + swarn.msg_buf = kmalloc(bufsize, GFP_NOFS); >> + swarn.sector = (sbio->physical + ix * PAGE_SIZE) >> 9; >> + swarn.logical = sbio->logical + ix * PAGE_SIZE; >> + swarn.errstr = errstr; >> + swarn.dev = dev; >> + swarn.msg_bufsize = bufsize; >> + swarn.scratch_bufsize = bufsize; >> + >> + if (!path || !swarn.scratch_buf || !swarn.msg_buf) >> + goto out; >> + >> + ret = data_extent_from_logical(fs_info->extent_root, >> + swarn.logical, path, &found_key); >> + if (ret < 0) >> + goto out; >> + >> + extent_item_offset = swarn.logical - found_key.objectid; >> + swarn.extent_item_size = found_key.offset; >> + >> + eb = path->nodes[0]; >> + ei = btrfs_item_ptr(eb, path->slots[0], struct btrfs_extent_item); >> + item_size = btrfs_item_size_nr(eb, path->slots[0]); >> + >> + btrfs_release_path(path); > > is it safe to release the path here? it''s used in the else branch belowYes, and it''s even required, since iterate_extent_inodes() wants a clean, usable path for itself. To make that more clear, I''ll move the call to btrfs_release_path inside the else block.>> + >> + if (ret) { >> + ret = tree_backref_for_extent(&ptr, eb, ei, item_size, >> + &ref_root, &ref_level); >> + printk(KERN_WARNING "%s at logical %llu on dev %s, " >> + "sector %llu: metadata %s (level %llu) in tree %llu\n", >> + errstr, swarn.logical, dev->name, swarn.sector, >> + ref_level ? "node" : "leaf", ret < 0 ? -1 : ref_level, >> + ret < 0 ? -1 : ref_root); >> + } else { >> + swarn.path = path; >> + iterate_extent_inodes(eb, ei, extent_item_offset, item_size, >> + scrub_print_warning_inode, &swarn); >> + } >> + >> +out: >> + btrfs_free_path(path); >> + kfree(swarn.scratch_buf); >> + kfree(swarn.msg_buf); >> +} >> + >> /* >> * scrub_recheck_error gets called when either verification of the page >> * failed or the bio failed to read, e.g. with EIO. In the latter case, >> @@ -218,6 +347,11 @@ static int scrub_recheck_error(struct scrub_bio *sbio, int ix) >> if (scrub_fixup_check(sbio, ix) == 0) >> return 0; >> } >> + if (printk_ratelimit()) > > please don''t use printk_ratelimit; as a simple printk_ratelimited is not > applicable here, we have to use __ratelimit: (include/linux/printk.h) > > static DEFINE_RATELIMIT_STATE(_rs, DEFAULT_RATELIMIT_INTERVAL, > DEFAULT_RATELIMIT_BURST); > > if (__ratelimit(&_rs))Modified as you suggested, I only included linux/ratelimit.h instead. Thanks! Jan -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Jan Schmidt
2011-Jun-18 11:26 UTC
Re: [PATCH v1 6/6] scrub: add fixup code for errors on nodatasum files
Lots of quotation removed. All removed comments accepted. On 16.06.2011 23:27, David Sterba wrote:> On Mon, Jun 13, 2011 at 09:10:39PM +0200, Jan Schmidt wrote: >> +struct scrub_fixup_nodatasum { >> + struct scrub_dev *sdev; >> + u64 logical; >> + struct btrfs_root *root; >> + struct btrfs_work work; >> + u64 mirror_num; > > int should be enough and is used elsewhere but scrub_page, uses u64 as well; > that''s a bit too much IMOYou are right. scrub.c should be changed to use int mirror_num all over. I''ll integrate a cleanup patch in my next patch series as patch 6/7 and this one can become a clean 7/7 then.>> +}; >> + >> struct scrub_warning { >> struct btrfs_path *path; >> u64 extent_item_size; >> @@ -195,12 +205,13 @@ struct scrub_dev *scrub_setup_dev(struct btrfs_device *dev) >> >> if (i != SCRUB_BIOS_PER_DEV-1) >> sdev->bios[i]->next_free = i + 1; >> - else >> + else >> sdev->bios[i]->next_free = -1; >> } >> sdev->first_free = 0; >> sdev->curr = -1; >> atomic_set(&sdev->in_flight, 0); >> + atomic_set(&sdev->fixup, 0); >> atomic_set(&sdev->cancel_req, 0); >> sdev->csum_size = btrfs_super_csum_size(&fs_info->super_copy); >> INIT_LIST_HEAD(&sdev->csum_list); >> @@ -330,6 +341,141 @@ out: >> kfree(swarn.msg_buf); >> } >> >> +static int scrub_fixup_readpage(u64 inum, loff_t offset, void *ctx) >> +{ >> + struct page *page; >> + unsigned long index; >> + struct scrub_fixup_nodatasum *fixup = ctx; >> + int ret; >> + int corrected; >> + struct btrfs_key key; >> + struct inode *inode; >> + int end = offset + PAGE_SIZE - 1; > > loff_t to int? > >> + >> + key.type = BTRFS_INODE_ITEM_KEY; >> + key.objectid = inum; >> + key.offset = 0; >> + inode = btrfs_iget(fixup->root->fs_info->sb, &key, >> + fixup->root->fs_info->fs_root, NULL); >> + if (IS_ERR(inode)) >> + return -1; > > needs better error code > >> + >> + ret = set_extent_bit(&BTRFS_I(inode)->io_tree, offset, end, >> + EXTENT_DAMAGED, 0, NULL, NULL, GFP_NOFS); >> + if (ret) >> + return ret < 0 ? ret : -1; > > needs better error codeHum. That one is tricky. set_extent_bit can in theory return anything, as there is at least one hook that can be called. If it returns non-zero, that will become the return value of set_extent_bit. As far as I see, that hook will always return 0 at the moment. In case that hook decides to return something >0, I still want iterate_extent_inodes() to terminate iteration. I think adding a WARN_ON for that and returning -EFAULT is an option.>> /* >> - * nodatasum, don''t try to fix anything >> - * FIXME: we can do better, open the inode and trigger a >> - * writeback >> + * increment scrubs_running to prevent cancel requests from >> + * completing as long as a fixup worker is running. we must also >> + * increment scrubs_paused to prevent deadlocking on pause >> + * requests used for transactions commits (as the worker uses a >> + * transaction context). it is safe to regard the fixup worker >> + * as paused for all matters practical. effectively, we only >> + * avoid cancellation requests from completing. > > from a rather brief look, this works as advertised > >> */Bonus points for thinking about that :-) Thanks! Jan -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html