Changes v1-v2: - Remove restriction that BTRFS_IOC_GET_DEVICE_STATS is a privileged operation - Cast u64 to unsigned long long for printf() Changes v2-v3: - Rebased on Chris'' current master Changes v3-v4: - Add padding at end of ioctl structure Changes v4-v5: - The statistic members in the ioctl are now organized as an array of 64 bit values. Symbolic names for the array indexes are defined in an enum, which also defines the max value. This change makes it easier to add new statistic members in the future - Give ins_len = -1 to btrfs_search_slot() when an item might get deleted - Introduce a helper function for the repeated sequence stat_int() + dirty = 1 + stat_print() - Introduce a helper function for the code that shares the bio bi_private member for two pieces of information The goal is to detect when drives start to get an increased error rate, when drives should be replaced soon. Therefore statistic counters are added that count IO errors (read, write and flush). Additionally, the software detected errors like checksum errors and corrupted blocks are counted. An ioctl interface is added to get the device statistic counters. A second ioctl is added to atomically get and reset these counters. The device statistics are written into the device tree with each transaction commit. Only modified statistics are written. When a filesystem is mounted, the device statistics for each involved device are read from the device tree and used to initialize the counters. A patch for the btrfs-progs world will also be sent. Stefan Behrens (3): Btrfs: add device counters for detected IO and checksum errors Btrfs: add ioctl to get and reset the device stats Btrfs: read device stats on mount, write modified ones during commit fs/btrfs/ctree.h | 38 ++++++ fs/btrfs/disk-io.c | 20 +++- fs/btrfs/extent_io.c | 18 ++- fs/btrfs/ioctl.c | 26 +++++ fs/btrfs/ioctl.h | 33 ++++++ fs/btrfs/print-tree.c | 3 + fs/btrfs/scrub.c | 65 ++++++++--- fs/btrfs/transaction.c | 4 + fs/btrfs/volumes.c | 304 +++++++++++++++++++++++++++++++++++++++++++++++- fs/btrfs/volumes.h | 52 +++++++++ 10 files changed, 539 insertions(+), 24 deletions(-) -- 1.7.10.2 -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Stefan Behrens
2012-May-25 14:06 UTC
[PATCH v5 1/3] Btrfs: add device counters for detected IO and checksum errors
The goal is to detect when drives start to get an increased error rate, when drives should be replaced soon. Therefore statistic counters are added that count IO errors (read, write and flush). Additionally, the software detected errors like checksum errors and corrupted blocks are counted. Signed-off-by: Stefan Behrens <sbehrens@giantdisaster.de> --- fs/btrfs/disk-io.c | 13 ++++--- fs/btrfs/extent_io.c | 18 ++++++++-- fs/btrfs/ioctl.h | 19 ++++++++++ fs/btrfs/scrub.c | 65 +++++++++++++++++++++++++--------- fs/btrfs/volumes.c | 94 ++++++++++++++++++++++++++++++++++++++++++++++++-- fs/btrfs/volumes.h | 45 ++++++++++++++++++++++++ 6 files changed, 230 insertions(+), 24 deletions(-) diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c index a7ffc88..da4d33f 100644 --- a/fs/btrfs/disk-io.c +++ b/fs/btrfs/disk-io.c @@ -2556,18 +2556,19 @@ recovery_tree_root: static void btrfs_end_buffer_write_sync(struct buffer_head *bh, int uptodate) { - char b[BDEVNAME_SIZE]; - if (uptodate) { set_buffer_uptodate(bh); } else { + struct btrfs_device *device = (struct btrfs_device *) + bh->b_private; + printk_ratelimited(KERN_WARNING "lost page write due to " - "I/O error on %s\n", - bdevname(bh->b_bdev, b)); + "I/O error on %s\n", device->name); /* note, we dont'' set_buffer_write_io_error because we have * our own ways of dealing with the IO errors */ clear_buffer_uptodate(bh); + btrfs_dev_stat_inc_and_print(device, BTRFS_DEV_STAT_WRITE_ERRS); } unlock_buffer(bh); put_bh(bh); @@ -2682,6 +2683,7 @@ static int write_dev_supers(struct btrfs_device *device, set_buffer_uptodate(bh); lock_buffer(bh); bh->b_end_io = btrfs_end_buffer_write_sync; + bh->b_private = device; } /* @@ -2740,6 +2742,9 @@ static int write_dev_flush(struct btrfs_device *device, int wait) } if (!bio_flagged(bio, BIO_UPTODATE)) { ret = -EIO; + if (!bio_flagged(bio, BIO_EOPNOTSUPP)) + btrfs_dev_stat_inc_and_print(device, + BTRFS_DEV_STAT_FLUSH_ERRS); } /* drop the reference from the wait == 0 run */ diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c index 2fb52c2..55f72f8 100644 --- a/fs/btrfs/extent_io.c +++ b/fs/btrfs/extent_io.c @@ -1923,6 +1923,7 @@ int repair_io_failure(struct btrfs_mapping_tree *map_tree, u64 start, if (!test_bit(BIO_UPTODATE, &bio->bi_flags)) { /* try to remap that extent elsewhere? */ bio_put(bio); + btrfs_dev_stat_inc_and_print(dev, BTRFS_DEV_STAT_WRITE_ERRS); return -EIO; } @@ -2347,10 +2348,23 @@ static void end_bio_extent_readpage(struct bio *bio, int err) if (uptodate && tree->ops && tree->ops->readpage_end_io_hook) { ret = tree->ops->readpage_end_io_hook(page, start, end, state, mirror); - if (ret) + if (ret) { + /* no IO indicated but software detected errors + * in the block, either checksum errors or + * issues with the contents */ + struct btrfs_root *root + BTRFS_I(page->mapping->host)->root; + struct btrfs_device *device; + uptodate = 0; - else + device = btrfs_find_device_for_logical( + root, start, mirror); + if (device) + btrfs_dev_stat_inc_and_print(device, + BTRFS_DEV_STAT_CORRUPTION_ERRS); + } else { clean_io_failure(start, page); + } } if (!uptodate && tree->ops && tree->ops->readpage_io_failed_hook) { diff --git a/fs/btrfs/ioctl.h b/fs/btrfs/ioctl.h index 086e6bd..5bf05e2 100644 --- a/fs/btrfs/ioctl.h +++ b/fs/btrfs/ioctl.h @@ -266,6 +266,25 @@ struct btrfs_ioctl_logical_ino_args { __u64 inodes; }; +enum btrfs_dev_stat_values { + /* disk I/O failure stats */ + BTRFS_DEV_STAT_WRITE_ERRS, /* EIO or EREMOTEIO from lower layers */ + BTRFS_DEV_STAT_READ_ERRS, /* EIO or EREMOTEIO from lower layers */ + BTRFS_DEV_STAT_FLUSH_ERRS, /* EIO or EREMOTEIO from lower layers */ + + /* stats for indirect indications for I/O failures */ + BTRFS_DEV_STAT_CORRUPTION_ERRS, /* checksum error, bytenr error or + * contents is illegal: this is an + * indication that the block was damaged + * during read or write, or written to + * wrong location or read from wrong + * location */ + BTRFS_DEV_STAT_GENERATION_ERRS, /* an indication that blocks have not + * been written */ + + BTRFS_DEV_STAT_VALUES_MAX +}; + #define BTRFS_IOC_SNAP_CREATE _IOW(BTRFS_IOCTL_MAGIC, 1, \ struct btrfs_ioctl_vol_args) #define BTRFS_IOC_DEFRAG _IOW(BTRFS_IOCTL_MAGIC, 2, \ diff --git a/fs/btrfs/scrub.c b/fs/btrfs/scrub.c index 7e487be..5e58416 100644 --- a/fs/btrfs/scrub.c +++ b/fs/btrfs/scrub.c @@ -50,7 +50,7 @@ struct scrub_dev; struct scrub_page { struct scrub_block *sblock; struct page *page; - struct block_device *bdev; + struct btrfs_device *dev; u64 flags; /* extent flags */ u64 generation; u64 logical; @@ -86,6 +86,7 @@ struct scrub_block { unsigned int header_error:1; unsigned int checksum_error:1; unsigned int no_io_error_seen:1; + unsigned int generation_error:1; /* also sets header_error */ }; }; @@ -675,6 +676,8 @@ static int scrub_handle_errored_block(struct scrub_block *sblock_to_check) sdev->stat.read_errors++; sdev->stat.uncorrectable_errors++; spin_unlock(&sdev->stat_lock); + btrfs_dev_stat_inc_and_print(sdev->dev, + BTRFS_DEV_STAT_READ_ERRS); goto out; } @@ -686,6 +689,8 @@ static int scrub_handle_errored_block(struct scrub_block *sblock_to_check) sdev->stat.read_errors++; sdev->stat.uncorrectable_errors++; spin_unlock(&sdev->stat_lock); + btrfs_dev_stat_inc_and_print(sdev->dev, + BTRFS_DEV_STAT_READ_ERRS); goto out; } BUG_ON(failed_mirror_index >= BTRFS_MAX_MIRRORS); @@ -699,6 +704,8 @@ static int scrub_handle_errored_block(struct scrub_block *sblock_to_check) sdev->stat.read_errors++; sdev->stat.uncorrectable_errors++; spin_unlock(&sdev->stat_lock); + btrfs_dev_stat_inc_and_print(sdev->dev, + BTRFS_DEV_STAT_READ_ERRS); goto out; } @@ -725,12 +732,16 @@ static int scrub_handle_errored_block(struct scrub_block *sblock_to_check) spin_unlock(&sdev->stat_lock); if (__ratelimit(&_rs)) scrub_print_warning("i/o error", sblock_to_check); + btrfs_dev_stat_inc_and_print(sdev->dev, + BTRFS_DEV_STAT_READ_ERRS); } else if (sblock_bad->checksum_error) { spin_lock(&sdev->stat_lock); sdev->stat.csum_errors++; spin_unlock(&sdev->stat_lock); if (__ratelimit(&_rs)) scrub_print_warning("checksum error", sblock_to_check); + btrfs_dev_stat_inc_and_print(sdev->dev, + BTRFS_DEV_STAT_CORRUPTION_ERRS); } else if (sblock_bad->header_error) { spin_lock(&sdev->stat_lock); sdev->stat.verify_errors++; @@ -738,6 +749,12 @@ static int scrub_handle_errored_block(struct scrub_block *sblock_to_check) if (__ratelimit(&_rs)) scrub_print_warning("checksum/header error", sblock_to_check); + if (sblock_bad->generation_error) + btrfs_dev_stat_inc_and_print(sdev->dev, + BTRFS_DEV_STAT_GENERATION_ERRS); + else + btrfs_dev_stat_inc_and_print(sdev->dev, + BTRFS_DEV_STAT_CORRUPTION_ERRS); } if (sdev->readonly) @@ -998,8 +1015,8 @@ static int scrub_setup_recheck_block(struct scrub_dev *sdev, page = sblock->pagev + page_index; page->logical = logical; page->physical = bbio->stripes[mirror_index].physical; - /* for missing devices, bdev is NULL */ - page->bdev = bbio->stripes[mirror_index].dev->bdev; + /* for missing devices, dev->bdev is NULL */ + page->dev = bbio->stripes[mirror_index].dev; page->mirror_num = mirror_index + 1; page->page = alloc_page(GFP_NOFS); if (!page->page) { @@ -1043,7 +1060,7 @@ static int scrub_recheck_block(struct btrfs_fs_info *fs_info, struct scrub_page *page = sblock->pagev + page_num; DECLARE_COMPLETION_ONSTACK(complete); - if (page->bdev == NULL) { + if (page->dev->bdev == NULL) { page->io_error = 1; sblock->no_io_error_seen = 0; continue; @@ -1053,7 +1070,7 @@ static int scrub_recheck_block(struct btrfs_fs_info *fs_info, bio = bio_alloc(GFP_NOFS, 1); if (!bio) return -EIO; - bio->bi_bdev = page->bdev; + bio->bi_bdev = page->dev->bdev; bio->bi_sector = page->physical >> 9; bio->bi_end_io = scrub_complete_bio_end_io; bio->bi_private = &complete; @@ -1102,11 +1119,14 @@ static void scrub_recheck_block_checksum(struct btrfs_fs_info *fs_info, h = (struct btrfs_header *)mapped_buffer; if (sblock->pagev[0].logical != le64_to_cpu(h->bytenr) || - generation != le64_to_cpu(h->generation) || memcmp(h->fsid, fs_info->fsid, BTRFS_UUID_SIZE) || memcmp(h->chunk_tree_uuid, fs_info->chunk_tree_uuid, - BTRFS_UUID_SIZE)) + BTRFS_UUID_SIZE)) { sblock->header_error = 1; + } else if (generation != le64_to_cpu(h->generation)) { + sblock->header_error = 1; + sblock->generation_error = 1; + } csum = h->csum; } else { if (!have_csum) @@ -1183,7 +1203,7 @@ static int scrub_repair_page_from_good_copy(struct scrub_block *sblock_bad, bio = bio_alloc(GFP_NOFS, 1); if (!bio) return -EIO; - bio->bi_bdev = page_bad->bdev; + bio->bi_bdev = page_bad->dev->bdev; bio->bi_sector = page_bad->physical >> 9; bio->bi_end_io = scrub_complete_bio_end_io; bio->bi_private = &complete; @@ -1197,6 +1217,12 @@ static int scrub_repair_page_from_good_copy(struct scrub_block *sblock_bad, /* this will also unplug the queue */ wait_for_completion(&complete); + if (!bio_flagged(bio, BIO_UPTODATE)) { + btrfs_dev_stat_inc_and_print(page_bad->dev, + BTRFS_DEV_STAT_WRITE_ERRS); + bio_put(bio); + return -EIO; + } bio_put(bio); } @@ -1353,7 +1379,8 @@ static int scrub_checksum_super(struct scrub_block *sblock) u64 mapped_size; void *p; u32 crc = ~(u32)0; - int fail = 0; + int fail_gen = 0; + int fail_cor = 0; u64 len; int index; @@ -1364,13 +1391,13 @@ static int scrub_checksum_super(struct scrub_block *sblock) memcpy(on_disk_csum, s->csum, sdev->csum_size); if (sblock->pagev[0].logical != le64_to_cpu(s->bytenr)) - ++fail; + ++fail_cor; if (sblock->pagev[0].generation != le64_to_cpu(s->generation)) - ++fail; + ++fail_gen; if (memcmp(s->fsid, fs_info->fsid, BTRFS_UUID_SIZE)) - ++fail; + ++fail_cor; len = BTRFS_SUPER_INFO_SIZE - BTRFS_CSUM_SIZE; mapped_size = PAGE_SIZE - BTRFS_CSUM_SIZE; @@ -1395,9 +1422,9 @@ static int scrub_checksum_super(struct scrub_block *sblock) btrfs_csum_final(crc, calculated_csum); if (memcmp(calculated_csum, on_disk_csum, sdev->csum_size)) - ++fail; + ++fail_cor; - if (fail) { + if (fail_cor + fail_gen) { /* * if we find an error in a super block, we just report it. * They will get written with the next transaction commit @@ -1406,9 +1433,15 @@ static int scrub_checksum_super(struct scrub_block *sblock) spin_lock(&sdev->stat_lock); ++sdev->stat.super_errors; spin_unlock(&sdev->stat_lock); + if (fail_cor) + btrfs_dev_stat_inc_and_print(sdev->dev, + BTRFS_DEV_STAT_CORRUPTION_ERRS); + else + btrfs_dev_stat_inc_and_print(sdev->dev, + BTRFS_DEV_STAT_GENERATION_ERRS); } - return fail; + return fail_cor + fail_gen; } static void scrub_block_get(struct scrub_block *sblock) @@ -1552,7 +1585,7 @@ static int scrub_pages(struct scrub_dev *sdev, u64 logical, u64 len, return -ENOMEM; } spage->sblock = sblock; - spage->bdev = sdev->dev->bdev; + spage->dev = sdev->dev; spage->flags = flags; spage->generation = gen; spage->logical = logical; diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c index 1411b99..05e05fb 100644 --- a/fs/btrfs/volumes.c +++ b/fs/btrfs/volumes.c @@ -23,6 +23,7 @@ #include <linux/random.h> #include <linux/iocontext.h> #include <linux/capability.h> +#include <linux/ratelimit.h> #include <linux/kthread.h> #include <asm/div64.h> #include "compat.h" @@ -4001,13 +4002,58 @@ int btrfs_rmap_block(struct btrfs_mapping_tree *map_tree, return 0; } +static void *merge_stripe_index_into_bio_private(void *bi_private, + unsigned int stripe_index) +{ + /* + * with single, dup, RAID0, RAID1 and RAID10, stripe_index is + * at most 1. + * The alternative solution (instead of stealing bits from the + * pointer) would be to allocate an intermediate structure + * that contains the old private pointer plus the stripe_index. + */ + BUG_ON((((uintptr_t)bi_private) & 3) != 0); + BUG_ON(stripe_index > 3); + return (void *)(((uintptr_t)bi_private) | stripe_index); +} + +static struct btrfs_bio *extract_bbio_from_bio_private(void *bi_private) +{ + return (struct btrfs_bio *)(((uintptr_t)bi_private) & ~((uintptr_t)3)); +} + +static unsigned int extract_stripe_index_from_bio_private(void *bi_private) +{ + return (unsigned int)((uintptr_t)bi_private) & 3; +} + static void btrfs_end_bio(struct bio *bio, int err) { - struct btrfs_bio *bbio = bio->bi_private; + struct btrfs_bio *bbio = extract_bbio_from_bio_private(bio->bi_private); int is_orig_bio = 0; - if (err) + if (err) { atomic_inc(&bbio->error); + if (err == -EIO || err == -EREMOTEIO) { + unsigned int stripe_index + extract_stripe_index_from_bio_private( + bio->bi_private); + struct btrfs_device *dev; + + BUG_ON(stripe_index >= bbio->num_stripes); + dev = bbio->stripes[stripe_index].dev; + if (bio->bi_rw & WRITE) + btrfs_dev_stat_inc(dev, + BTRFS_DEV_STAT_WRITE_ERRS); + else + btrfs_dev_stat_inc(dev, + BTRFS_DEV_STAT_READ_ERRS); + if ((bio->bi_rw & WRITE_FLUSH) == WRITE_FLUSH) + btrfs_dev_stat_inc(dev, + BTRFS_DEV_STAT_FLUSH_ERRS); + btrfs_dev_stat_print_on_error(dev); + } + } if (bio == bbio->orig_bio) is_orig_bio = 1; @@ -4149,6 +4195,8 @@ int btrfs_map_bio(struct btrfs_root *root, int rw, struct bio *bio, bio = first_bio; } bio->bi_private = bbio; + bio->bi_private = merge_stripe_index_into_bio_private( + bio->bi_private, (unsigned int)dev_nr); bio->bi_end_io = btrfs_end_bio; bio->bi_sector = bbio->stripes[dev_nr].physical >> 9; dev = bbio->stripes[dev_nr].dev; @@ -4509,6 +4557,28 @@ int btrfs_read_sys_array(struct btrfs_root *root) return ret; } +struct btrfs_device *btrfs_find_device_for_logical(struct btrfs_root *root, + u64 logical, int mirror_num) +{ + struct btrfs_mapping_tree *map_tree = &root->fs_info->mapping_tree; + int ret; + u64 map_length = 0; + struct btrfs_bio *bbio = NULL; + struct btrfs_device *device; + + BUG_ON(mirror_num == 0); + ret = btrfs_map_block(map_tree, WRITE, logical, &map_length, &bbio, + mirror_num); + if (ret) { + BUG_ON(bbio != NULL); + return NULL; + } + BUG_ON(mirror_num != bbio->mirror_num); + device = bbio->stripes[mirror_num - 1].dev; + kfree(bbio); + return device; +} + int btrfs_read_chunk_tree(struct btrfs_root *root) { struct btrfs_path *path; @@ -4583,3 +4653,23 @@ error: btrfs_free_path(path); return ret; } + +void btrfs_dev_stat_inc_and_print(struct btrfs_device *dev, int index) +{ + btrfs_dev_stat_inc(dev, index); + btrfs_dev_stat_print_on_error(dev); +} + +void btrfs_dev_stat_print_on_error(struct btrfs_device *dev) +{ + printk_ratelimited(KERN_ERR + "btrfs: bdev %s errs: wr %u, rd %u, flush %u, corrupt %u, gen %u\n", + dev->name, + btrfs_dev_stat_read(dev, BTRFS_DEV_STAT_WRITE_ERRS), + btrfs_dev_stat_read(dev, BTRFS_DEV_STAT_READ_ERRS), + btrfs_dev_stat_read(dev, BTRFS_DEV_STAT_FLUSH_ERRS), + btrfs_dev_stat_read(dev, + BTRFS_DEV_STAT_CORRUPTION_ERRS), + btrfs_dev_stat_read(dev, + BTRFS_DEV_STAT_GENERATION_ERRS)); +} diff --git a/fs/btrfs/volumes.h b/fs/btrfs/volumes.h index bb6b03f..193b283 100644 --- a/fs/btrfs/volumes.h +++ b/fs/btrfs/volumes.h @@ -22,6 +22,7 @@ #include <linux/bio.h> #include <linux/sort.h> #include "async-thread.h" +#include "ioctl.h" #define BTRFS_STRIPE_LEN (64 * 1024) @@ -106,6 +107,10 @@ struct btrfs_device { struct completion flush_wait; int nobarriers; + /* disk I/O failure stats. For detailed description refer to + * enum btrfs_dev_stat_values in ioctl.h */ + int dev_stats_dirty; /* counters need to be written to disk */ + atomic_t dev_stat_values[BTRFS_DEV_STAT_VALUES_MAX]; }; struct btrfs_fs_devices { @@ -281,4 +286,44 @@ int btrfs_cancel_balance(struct btrfs_fs_info *fs_info); int btrfs_chunk_readonly(struct btrfs_root *root, u64 chunk_offset); int find_free_dev_extent(struct btrfs_device *device, u64 num_bytes, u64 *start, u64 *max_avail); +struct btrfs_device *btrfs_find_device_for_logical(struct btrfs_root *root, + u64 logical, int mirror_num); +void btrfs_dev_stat_print_on_error(struct btrfs_device *device); +void btrfs_dev_stat_inc_and_print(struct btrfs_device *dev, int index); + +static inline void btrfs_dev_stat_inc(struct btrfs_device *dev, + int index) +{ + atomic_inc(dev->dev_stat_values + index); + dev->dev_stats_dirty = 1; +} + +static inline int btrfs_dev_stat_read(struct btrfs_device *dev, + int index) +{ + return atomic_read(dev->dev_stat_values + index); +} + +static inline int btrfs_dev_stat_read_and_reset(struct btrfs_device *dev, + int index) +{ + int ret; + + ret = atomic_xchg(dev->dev_stat_values + index, 0); + dev->dev_stats_dirty = 1; + return ret; +} + +static inline void btrfs_dev_stat_set(struct btrfs_device *dev, + int index, unsigned long val) +{ + atomic_set(dev->dev_stat_values + index, val); + dev->dev_stats_dirty = 1; +} + +static inline void btrfs_dev_stat_reset(struct btrfs_device *dev, + int index) +{ + btrfs_dev_stat_set(dev, index, 0); +} #endif -- 1.7.10.2 -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Stefan Behrens
2012-May-25 14:06 UTC
[PATCH v5 2/3] Btrfs: add ioctl to get and reset the device stats
An ioctl interface is added to get the device statistic counters. A second ioctl is added to atomically get and reset these counters. Signed-off-by: Stefan Behrens <sbehrens@giantdisaster.de> --- fs/btrfs/ioctl.c | 26 ++++++++++++++++++++++++++ fs/btrfs/ioctl.h | 14 ++++++++++++++ fs/btrfs/volumes.c | 34 ++++++++++++++++++++++++++++++++++ fs/btrfs/volumes.h | 3 +++ 4 files changed, 77 insertions(+) diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c index 14f8e1f..2f5072d 100644 --- a/fs/btrfs/ioctl.c +++ b/fs/btrfs/ioctl.c @@ -3042,6 +3042,28 @@ static long btrfs_ioctl_scrub_progress(struct btrfs_root *root, return ret; } +static long btrfs_ioctl_get_dev_stats(struct btrfs_root *root, + void __user *arg, int reset_after_read) +{ + struct btrfs_ioctl_get_dev_stats *sa; + int ret; + + if (reset_after_read && !capable(CAP_SYS_ADMIN)) + return -EPERM; + + sa = memdup_user(arg, sizeof(*sa)); + if (IS_ERR(sa)) + return PTR_ERR(sa); + + ret = btrfs_get_dev_stats(root, sa, reset_after_read); + + if (copy_to_user(arg, sa, sizeof(*sa))) + ret = -EFAULT; + + kfree(sa); + return ret; +} + static long btrfs_ioctl_ino_to_path(struct btrfs_root *root, void __user *arg) { int ret = 0; @@ -3424,6 +3446,10 @@ long btrfs_ioctl(struct file *file, unsigned int return btrfs_ioctl_balance_ctl(root, arg); case BTRFS_IOC_BALANCE_PROGRESS: return btrfs_ioctl_balance_progress(root, argp); + case BTRFS_IOC_GET_DEV_STATS: + return btrfs_ioctl_get_dev_stats(root, argp, 0); + case BTRFS_IOC_GET_AND_RESET_DEV_STATS: + return btrfs_ioctl_get_dev_stats(root, argp, 1); } return -ENOTTY; diff --git a/fs/btrfs/ioctl.h b/fs/btrfs/ioctl.h index 5bf05e2..497c530 100644 --- a/fs/btrfs/ioctl.h +++ b/fs/btrfs/ioctl.h @@ -285,6 +285,16 @@ enum btrfs_dev_stat_values { BTRFS_DEV_STAT_VALUES_MAX }; +struct btrfs_ioctl_get_dev_stats { + __u64 devid; /* in */ + __u64 nr_items; /* in/out */ + + /* out values: */ + __u64 values[BTRFS_DEV_STAT_VALUES_MAX]; + + __u64 unused[128 - 2 - BTRFS_DEV_STAT_VALUES_MAX]; /* pad to 1k */ +}; + #define BTRFS_IOC_SNAP_CREATE _IOW(BTRFS_IOCTL_MAGIC, 1, \ struct btrfs_ioctl_vol_args) #define BTRFS_IOC_DEFRAG _IOW(BTRFS_IOCTL_MAGIC, 2, \ @@ -349,5 +359,9 @@ enum btrfs_dev_stat_values { struct btrfs_ioctl_ino_path_args) #define BTRFS_IOC_LOGICAL_INO _IOWR(BTRFS_IOCTL_MAGIC, 36, \ struct btrfs_ioctl_ino_path_args) +#define BTRFS_IOC_GET_DEV_STATS _IOWR(BTRFS_IOCTL_MAGIC, 52, \ + struct btrfs_ioctl_get_dev_stats) +#define BTRFS_IOC_GET_AND_RESET_DEV_STATS _IOWR(BTRFS_IOCTL_MAGIC, 53, \ + struct btrfs_ioctl_get_dev_stats) #endif diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c index 05e05fb..b9138b7 100644 --- a/fs/btrfs/volumes.c +++ b/fs/btrfs/volumes.c @@ -4673,3 +4673,37 @@ void btrfs_dev_stat_print_on_error(struct btrfs_device *dev) btrfs_dev_stat_read(dev, BTRFS_DEV_STAT_GENERATION_ERRS)); } + +int btrfs_get_dev_stats(struct btrfs_root *root, + struct btrfs_ioctl_get_dev_stats *stats, + int reset_after_read) +{ + struct btrfs_device *dev; + struct btrfs_fs_devices *fs_devices = root->fs_info->fs_devices; + int i; + + mutex_lock(&fs_devices->device_list_mutex); + dev = btrfs_find_device(root, stats->devid, NULL, NULL); + mutex_unlock(&fs_devices->device_list_mutex); + + if (!dev) { + printk(KERN_WARNING + "btrfs: get dev_stats failed, device not found\n"); + return -ENODEV; + } else if (reset_after_read) { + for (i = 0; i < BTRFS_DEV_STAT_VALUES_MAX; i++) { + if (stats->nr_items > i) + stats->values[i] + btrfs_dev_stat_read_and_reset(dev, i); + else + btrfs_dev_stat_reset(dev, i); + } + } else { + for (i = 0; i < BTRFS_DEV_STAT_VALUES_MAX; i++) + if (stats->nr_items > i) + stats->values[i] = btrfs_dev_stat_read(dev, i); + } + if (stats->nr_items > BTRFS_DEV_STAT_VALUES_MAX) + stats->nr_items = BTRFS_DEV_STAT_VALUES_MAX; + return 0; +} diff --git a/fs/btrfs/volumes.h b/fs/btrfs/volumes.h index 193b283..6798f86 100644 --- a/fs/btrfs/volumes.h +++ b/fs/btrfs/volumes.h @@ -290,6 +290,9 @@ struct btrfs_device *btrfs_find_device_for_logical(struct btrfs_root *root, u64 logical, int mirror_num); void btrfs_dev_stat_print_on_error(struct btrfs_device *device); void btrfs_dev_stat_inc_and_print(struct btrfs_device *dev, int index); +int btrfs_get_dev_stats(struct btrfs_root *root, + struct btrfs_ioctl_get_dev_stats *stats, + int reset_after_read); static inline void btrfs_dev_stat_inc(struct btrfs_device *dev, int index) -- 1.7.10.2 -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Stefan Behrens
2012-May-25 14:06 UTC
[PATCH v5 3/3] Btrfs: read device stats on mount, write modified ones during commit
The device statistics are written into the device tree with each transaction commit. Only modified statistics are written. When a filesystem is mounted, the device statistics for each involved device are read from the device tree and used to initialize the counters. Signed-off-by: Stefan Behrens <sbehrens@giantdisaster.de> --- fs/btrfs/ctree.h | 38 +++++++++++ fs/btrfs/disk-io.c | 7 ++ fs/btrfs/print-tree.c | 3 + fs/btrfs/transaction.c | 4 ++ fs/btrfs/volumes.c | 176 ++++++++++++++++++++++++++++++++++++++++++++++++ fs/btrfs/volumes.h | 4 ++ 6 files changed, 232 insertions(+) diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h index ec42a24..3e5ab89 100644 --- a/fs/btrfs/ctree.h +++ b/fs/btrfs/ctree.h @@ -823,6 +823,14 @@ struct btrfs_csum_item { u8 csum; } __attribute__ ((__packed__)); +struct btrfs_dev_stats_item { + /* + * grow this item struct at the end for future enhancements and keep + * the existing values unchanged + */ + __le64 values[BTRFS_DEV_STAT_VALUES_MAX]; +} __attribute__ ((__packed__)); + /* different types of block groups (and chunks) */ #define BTRFS_BLOCK_GROUP_DATA (1ULL << 0) #define BTRFS_BLOCK_GROUP_SYSTEM (1ULL << 1) @@ -1508,6 +1516,12 @@ struct btrfs_ioctl_defrag_range_args { #define BTRFS_BALANCE_ITEM_KEY 248 /* + * Persistantly stores the io stats in the device tree. + * One key for all stats, (0, BTRFS_DEV_STATS_KEY, devid). + */ +#define BTRFS_DEV_STATS_KEY 249 + +/* * string items are for debugging. They just store a short string of * data in the FS */ @@ -2415,6 +2429,30 @@ static inline u32 btrfs_file_extent_inline_item_len(struct extent_buffer *eb, return btrfs_item_size(eb, e) - offset; } +/* btrfs_dev_stats_item */ +static inline u64 btrfs_dev_stats_value(struct extent_buffer *eb, + struct btrfs_dev_stats_item *ptr, + int index) +{ + u64 val; + + read_extent_buffer(eb, &val, + offsetof(struct btrfs_dev_stats_item, values) + + ((unsigned long)ptr) + (index * sizeof(u64)), + sizeof(val)); + return val; +} + +static inline void btrfs_set_dev_stats_value(struct extent_buffer *eb, + struct btrfs_dev_stats_item *ptr, + int index, u64 val) +{ + write_extent_buffer(eb, &val, + offsetof(struct btrfs_dev_stats_item, values) + + ((unsigned long)ptr) + (index * sizeof(u64)), + sizeof(val)); +} + static inline struct btrfs_fs_info *btrfs_sb(struct super_block *sb) { return sb->s_fs_info; diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c index da4d33f..00f541a 100644 --- a/fs/btrfs/disk-io.c +++ b/fs/btrfs/disk-io.c @@ -2353,6 +2353,13 @@ retry_root_backup: fs_info->generation = generation; fs_info->last_trans_committed = generation; + ret = btrfs_init_dev_stats(fs_info); + if (ret) { + printk(KERN_ERR "btrfs: failed to init dev_stats: %d\n", + ret); + goto fail_block_groups; + } + ret = btrfs_init_space_info(fs_info); if (ret) { printk(KERN_ERR "Failed to initial space info: %d\n", ret); diff --git a/fs/btrfs/print-tree.c b/fs/btrfs/print-tree.c index f38e452..5e23684 100644 --- a/fs/btrfs/print-tree.c +++ b/fs/btrfs/print-tree.c @@ -294,6 +294,9 @@ void btrfs_print_leaf(struct btrfs_root *root, struct extent_buffer *l) btrfs_dev_extent_chunk_offset(l, dev_extent), (unsigned long long) btrfs_dev_extent_length(l, dev_extent)); + case BTRFS_DEV_STATS_KEY: + printk(KERN_INFO "\t\tdevice stats\n"); + break; }; } } diff --git a/fs/btrfs/transaction.c b/fs/btrfs/transaction.c index 3642225..82b03af 100644 --- a/fs/btrfs/transaction.c +++ b/fs/btrfs/transaction.c @@ -28,6 +28,7 @@ #include "locking.h" #include "tree-log.h" #include "inode-map.h" +#include "volumes.h" #define BTRFS_ROOT_TRANS_TAG 0 @@ -758,6 +759,9 @@ static noinline int commit_cowonly_roots(struct btrfs_trans_handle *trans, if (ret) return ret; + ret = btrfs_run_dev_stats(trans, root->fs_info); + BUG_ON(ret); + while (!list_empty(&fs_info->dirty_cowonly_roots)) { next = fs_info->dirty_cowonly_roots.next; list_del_init(next); diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c index b9138b7..e7a1d2a 100644 --- a/fs/btrfs/volumes.c +++ b/fs/btrfs/volumes.c @@ -40,6 +40,8 @@ static int init_first_rw_device(struct btrfs_trans_handle *trans, struct btrfs_root *root, struct btrfs_device *device); static int btrfs_relocate_sys_chunks(struct btrfs_root *root); +static void __btrfs_reset_dev_stats(struct btrfs_device *dev); +static void btrfs_dev_stat_print_on_load(struct btrfs_device *device); static DEFINE_MUTEX(uuid_mutex); static LIST_HEAD(fs_uuids); @@ -362,6 +364,7 @@ static noinline int device_list_add(const char *path, return -ENOMEM; } device->devid = devid; + device->dev_stats_valid = 0; device->work.func = pending_bios_fn; memcpy(device->uuid, disk_super->dev_item.uuid, BTRFS_UUID_SIZE); @@ -4654,6 +4657,162 @@ error: return ret; } +static void __btrfs_reset_dev_stats(struct btrfs_device *dev) +{ + int i; + + for (i = 0; i < BTRFS_DEV_STAT_VALUES_MAX; i++) + btrfs_dev_stat_reset(dev, i); +} + +int btrfs_init_dev_stats(struct btrfs_fs_info *fs_info) +{ + struct btrfs_key key; + struct btrfs_key found_key; + struct btrfs_root *dev_root = fs_info->dev_root; + struct btrfs_fs_devices *fs_devices = fs_info->fs_devices; + struct extent_buffer *eb; + int slot; + int ret = 0; + struct btrfs_device *device; + struct btrfs_path *path = NULL; + int i; + + path = btrfs_alloc_path(); + if (!path) { + ret = -ENOMEM; + goto out; + } + + mutex_lock(&fs_devices->device_list_mutex); + list_for_each_entry(device, &fs_devices->devices, dev_list) { + int item_size; + struct btrfs_dev_stats_item *ptr; + + key.objectid = 0; + key.type = BTRFS_DEV_STATS_KEY; + key.offset = device->devid; + ret = btrfs_search_slot(NULL, dev_root, &key, path, 0, 0); + if (ret) { + printk(KERN_WARNING "btrfs: no dev_stats entry found for device %s (devid %llu) (OK on first mount after mkfs)\n", + device->name, (unsigned long long)device->devid); + __btrfs_reset_dev_stats(device); + device->dev_stats_valid = 1; + btrfs_release_path(path); + continue; + } + slot = path->slots[0]; + eb = path->nodes[0]; + btrfs_item_key_to_cpu(eb, &found_key, slot); + item_size = btrfs_item_size_nr(eb, slot); + + ptr = btrfs_item_ptr(eb, slot, + struct btrfs_dev_stats_item); + + for (i = 0; i < BTRFS_DEV_STAT_VALUES_MAX; i++) { + if (item_size >= (1 + i) * sizeof(__le64)) + btrfs_dev_stat_set(device, i, + btrfs_dev_stats_value(eb, ptr, i)); + else + btrfs_dev_stat_reset(device, i); + } + + device->dev_stats_valid = 1; + btrfs_dev_stat_print_on_load(device); + btrfs_release_path(path); + } + mutex_unlock(&fs_devices->device_list_mutex); + +out: + btrfs_free_path(path); + return ret < 0 ? ret : 0; +} + +static int update_dev_stat_item(struct btrfs_trans_handle *trans, + struct btrfs_root *dev_root, + struct btrfs_device *device) +{ + struct btrfs_path *path; + struct btrfs_key key; + struct extent_buffer *eb; + struct btrfs_dev_stats_item *ptr; + int ret; + int i; + + key.objectid = 0; + key.type = BTRFS_DEV_STATS_KEY; + key.offset = device->devid; + + path = btrfs_alloc_path(); + BUG_ON(!path); + ret = btrfs_search_slot(trans, dev_root, &key, path, -1, 1); + if (ret < 0) { + printk(KERN_WARNING "btrfs: error %d while searching for dev_stats item for device %s!\n", + ret, device->name); + goto out; + } + + if (ret == 0 && + btrfs_item_size_nr(path->nodes[0], path->slots[0]) < sizeof(*ptr)) { + /* need to delete old one and insert a new one */ + ret = btrfs_del_item(trans, dev_root, path); + if (ret != 0) { + printk(KERN_WARNING "btrfs: delete too small dev_stats item for device %s failed %d!\n", + device->name, ret); + goto out; + } + ret = 1; + } + + if (ret == 1) { + /* need to insert a new item */ + btrfs_release_path(path); + ret = btrfs_insert_empty_item(trans, dev_root, path, + &key, sizeof(*ptr)); + if (ret < 0) { + printk(KERN_WARNING "btrfs: insert dev_stats item for device %s failed %d!\n", + device->name, ret); + goto out; + } + } + + eb = path->nodes[0]; + ptr = btrfs_item_ptr(eb, path->slots[0], struct btrfs_dev_stats_item); + for (i = 0; i < BTRFS_DEV_STAT_VALUES_MAX; i++) + btrfs_set_dev_stats_value(eb, ptr, i, + btrfs_dev_stat_read(device, i)); + btrfs_mark_buffer_dirty(eb); + +out: + btrfs_free_path(path); + return ret; +} + +/* + * called from commit_transaction. Writes all changed device stats to disk. + */ +int btrfs_run_dev_stats(struct btrfs_trans_handle *trans, + struct btrfs_fs_info *fs_info) +{ + struct btrfs_root *dev_root = fs_info->dev_root; + struct btrfs_fs_devices *fs_devices = fs_info->fs_devices; + struct btrfs_device *device; + int ret = 0; + + mutex_lock(&fs_devices->device_list_mutex); + list_for_each_entry(device, &fs_devices->devices, dev_list) { + if (!device->dev_stats_valid || !device->dev_stats_dirty) + continue; + + ret = update_dev_stat_item(trans, dev_root, device); + if (!ret) + device->dev_stats_dirty = 0; + } + mutex_unlock(&fs_devices->device_list_mutex); + + return ret; +} + void btrfs_dev_stat_inc_and_print(struct btrfs_device *dev, int index) { btrfs_dev_stat_inc(dev, index); @@ -4662,6 +4821,8 @@ void btrfs_dev_stat_inc_and_print(struct btrfs_device *dev, int index) void btrfs_dev_stat_print_on_error(struct btrfs_device *dev) { + if (!dev->dev_stats_valid) + return; printk_ratelimited(KERN_ERR "btrfs: bdev %s errs: wr %u, rd %u, flush %u, corrupt %u, gen %u\n", dev->name, @@ -4674,6 +4835,17 @@ void btrfs_dev_stat_print_on_error(struct btrfs_device *dev) BTRFS_DEV_STAT_GENERATION_ERRS)); } +static void btrfs_dev_stat_print_on_load(struct btrfs_device *dev) +{ + printk(KERN_INFO "btrfs: bdev %s errs: wr %u, rd %u, flush %u, corrupt %u, gen %u\n", + dev->name, + btrfs_dev_stat_read(dev, BTRFS_DEV_STAT_WRITE_ERRS), + btrfs_dev_stat_read(dev, BTRFS_DEV_STAT_READ_ERRS), + btrfs_dev_stat_read(dev, BTRFS_DEV_STAT_FLUSH_ERRS), + btrfs_dev_stat_read(dev, BTRFS_DEV_STAT_CORRUPTION_ERRS), + btrfs_dev_stat_read(dev, BTRFS_DEV_STAT_GENERATION_ERRS)); +} + int btrfs_get_dev_stats(struct btrfs_root *root, struct btrfs_ioctl_get_dev_stats *stats, int reset_after_read) @@ -4690,6 +4862,10 @@ int btrfs_get_dev_stats(struct btrfs_root *root, printk(KERN_WARNING "btrfs: get dev_stats failed, device not found\n"); return -ENODEV; + } else if (!dev->dev_stats_valid) { + printk(KERN_WARNING + "btrfs: get dev_stats failed, not yet valid\n"); + return -ENODEV; } else if (reset_after_read) { for (i = 0; i < BTRFS_DEV_STAT_VALUES_MAX; i++) { if (stats->nr_items > i) diff --git a/fs/btrfs/volumes.h b/fs/btrfs/volumes.h index 6798f86..3406a88 100644 --- a/fs/btrfs/volumes.h +++ b/fs/btrfs/volumes.h @@ -109,6 +109,7 @@ struct btrfs_device { /* disk I/O failure stats. For detailed description refer to * enum btrfs_dev_stat_values in ioctl.h */ + int dev_stats_valid; int dev_stats_dirty; /* counters need to be written to disk */ atomic_t dev_stat_values[BTRFS_DEV_STAT_VALUES_MAX]; }; @@ -293,6 +294,9 @@ void btrfs_dev_stat_inc_and_print(struct btrfs_device *dev, int index); int btrfs_get_dev_stats(struct btrfs_root *root, struct btrfs_ioctl_get_dev_stats *stats, int reset_after_read); +int btrfs_init_dev_stats(struct btrfs_fs_info *fs_info); +int btrfs_run_dev_stats(struct btrfs_trans_handle *trans, + struct btrfs_fs_info *fs_info); static inline void btrfs_dev_stat_inc(struct btrfs_device *dev, int index) -- 1.7.10.2 -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Christoph Hellwig
2012-May-25 15:18 UTC
Re: [PATCH v5 0/3] Btrfs: add IO error device stats
Can you explain why the device error counters should be in a filesystem instead of generic block layer code? On Fri, May 25, 2012 at 04:06:07PM +0200, Stefan Behrens wrote:> Changes v1-v2: > - Remove restriction that BTRFS_IOC_GET_DEVICE_STATS is a privileged > operation > - Cast u64 to unsigned long long for printf() > > Changes v2-v3: > - Rebased on Chris'' current master > > Changes v3-v4: > - Add padding at end of ioctl structure > > Changes v4-v5: > - The statistic members in the ioctl are now organized as an array of > 64 bit values. Symbolic names for the array indexes are defined in > an enum, which also defines the max value. This change makes it > easier to add new statistic members in the future > - Give ins_len = -1 to btrfs_search_slot() when an item might get > deleted > - Introduce a helper function for the repeated sequence stat_int() + > dirty = 1 + stat_print() > - Introduce a helper function for the code that shares the bio > bi_private member for two pieces of information > > The goal is to detect when drives start to get an increased error rate, > when drives should be replaced soon. Therefore statistic counters are > added that count IO errors (read, write and flush). Additionally, the > software detected errors like checksum errors and corrupted blocks are > counted. > > An ioctl interface is added to get the device statistic counters. > A second ioctl is added to atomically get and reset these counters. > > The device statistics are written into the device tree with each > transaction commit. Only modified statistics are written. > When a filesystem is mounted, the device statistics for each involved > device are read from the device tree and used to initialize the > counters. > > A patch for the btrfs-progs world will also be sent. > > Stefan Behrens (3): > Btrfs: add device counters for detected IO and checksum errors > Btrfs: add ioctl to get and reset the device stats > Btrfs: read device stats on mount, write modified ones during commit > > fs/btrfs/ctree.h | 38 ++++++ > fs/btrfs/disk-io.c | 20 +++- > fs/btrfs/extent_io.c | 18 ++- > fs/btrfs/ioctl.c | 26 +++++ > fs/btrfs/ioctl.h | 33 ++++++ > fs/btrfs/print-tree.c | 3 + > fs/btrfs/scrub.c | 65 ++++++++--- > fs/btrfs/transaction.c | 4 + > fs/btrfs/volumes.c | 304 +++++++++++++++++++++++++++++++++++++++++++++++- > fs/btrfs/volumes.h | 52 +++++++++ > 10 files changed, 539 insertions(+), 24 deletions(-) > > -- > 1.7.10.2 > > -- > To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in > the body of a message to majordomo@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html---end quoted text---
It would be helpful if already the generic block layer would offer device error counters. Then btrfs could read them, add own counters for its checksum detected errors, and store everything persistently in the filesystem. The goal is to replace disks that have an increased error rate with spare disks, and the goal is to repair this degenerated RAID state quickly. On 05/25/2012 17:18, Christoph Hellwig wrote:> Can you explain why the device error counters should be in a filesystem > instead of generic block layer code? > > On Fri, May 25, 2012 at 04:06:07PM +0200, Stefan Behrens wrote:[...]>> The goal is to detect when drives start to get an increased error rate, >> when drives should be replaced soon. Therefore statistic counters are >> added that count IO errors (read, write and flush). Additionally, the >> software detected errors like checksum errors and corrupted blocks are >> counted. >> >> An ioctl interface is added to get the device statistic counters. >> A second ioctl is added to atomically get and reset these counters. >> >> The device statistics are written into the device tree with each >> transaction commit. Only modified statistics are written. >> When a filesystem is mounted, the device statistics for each involved >> device are read from the device tree and used to initialize the >> counters. >> >> A patch for the btrfs-progs world will also be sent. >> >> Stefan Behrens (3): >> Btrfs: add device counters for detected IO and checksum errors >> Btrfs: add ioctl to get and reset the device stats >> Btrfs: read device stats on mount, write modified ones during commit >> >> fs/btrfs/ctree.h | 38 ++++++ >> fs/btrfs/disk-io.c | 20 +++- >> fs/btrfs/extent_io.c | 18 ++- >> fs/btrfs/ioctl.c | 26 +++++ >> fs/btrfs/ioctl.h | 33 ++++++ >> fs/btrfs/print-tree.c | 3 + >> fs/btrfs/scrub.c | 65 ++++++++--- >> fs/btrfs/transaction.c | 4 + >> fs/btrfs/volumes.c | 304 +++++++++++++++++++++++++++++++++++++++++++++++- >> fs/btrfs/volumes.h | 52 +++++++++ >> 10 files changed, 539 insertions(+), 24 deletions(-) >> >> -- >> 1.7.10.2-- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On 05/25/12 19:49, Stefan Behrens wrote:> It would be helpful if already the generic block layer would offer > device error counters. Then btrfs could read them, add own counters for > its checksum detected errors, and store everything persistently in the > filesystem. >I take it that you not only count I/O-errors, but also corrupted blocks and errors generated by misdirected writes. These are informations that are not available to the block layer.> The goal is to replace disks that have an increased error rate with > spare disks, and the goal is to repair this degenerated RAID state quickly. > > > On 05/25/2012 17:18, Christoph Hellwig wrote: >> Can you explain why the device error counters should be in a filesystem >> instead of generic block layer code? >> >> On Fri, May 25, 2012 at 04:06:07PM +0200, Stefan Behrens wrote: > [...] >>> The goal is to detect when drives start to get an increased error rate, >>> when drives should be replaced soon. Therefore statistic counters are >>> added that count IO errors (read, write and flush). Additionally, the >>> software detected errors like checksum errors and corrupted blocks are >>> counted. >>> >>> An ioctl interface is added to get the device statistic counters. >>> A second ioctl is added to atomically get and reset these counters. >>> >>> The device statistics are written into the device tree with each >>> transaction commit. Only modified statistics are written. >>> When a filesystem is mounted, the device statistics for each involved >>> device are read from the device tree and used to initialize the >>> counters. >>> >>> A patch for the btrfs-progs world will also be sent. >>> >>> Stefan Behrens (3): >>> Btrfs: add device counters for detected IO and checksum errors >>> Btrfs: add ioctl to get and reset the device stats >>> Btrfs: read device stats on mount, write modified ones during commit >>> >>> fs/btrfs/ctree.h | 38 ++++++ >>> fs/btrfs/disk-io.c | 20 +++- >>> fs/btrfs/extent_io.c | 18 ++- >>> fs/btrfs/ioctl.c | 26 +++++ >>> fs/btrfs/ioctl.h | 33 ++++++ >>> fs/btrfs/print-tree.c | 3 + >>> fs/btrfs/scrub.c | 65 ++++++++--- >>> fs/btrfs/transaction.c | 4 + >>> fs/btrfs/volumes.c | 304 >>> +++++++++++++++++++++++++++++++++++++++++++++++- >>> fs/btrfs/volumes.h | 52 +++++++++ >>> 10 files changed, 539 insertions(+), 24 deletions(-) >>> >>> -- >>> 1.7.10.2 > -- > To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in > the body of a message to majordomo@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html-- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Maybe Matching Threads
- [PATCH 0/3] Btrfs: add IO error device stats
- [PATCH] Btrfs: fix crash in scrub repair code when device is missing
- [PATCH 00/11] Btrfs: some patches for 3.3
- [LLVMdev] Generating superblocks (SEME regions w/o loops and calls) in LLVM
- [PATCH] btrfs-progs: avoid memory leak in btrfs_close_devices