Miao Xie
2013-Jul-04 10:37 UTC
[PATCH] Btrfs: fix wrong write offset when replacing a device
The filesystem was corrupted after we did a device replace. Steps to reproduce: # mkfs.btrfs -f -m single -d raid10 <device0>..<device3> # mount <device0> <mnt> # btrfs replace start -rfB 1 <device4> <mnt> # umount <mnt> # btrfsck <device4> The reason is that we changed the write offset by mistake. When we replace the a device, we read the data from the source device at first, and then write the data into the corresponding place of the new device. But when we read the data, we will try to spread the read request among the mirrors. It is a good idea that can speed up the read request, but should not change the write offset because it is corresponding to the source device, not the mirror device, if we change it by the offset of the mirror device, we would get the wrong offset. Fix it. Signed-off-by: Miao Xie <miaox@cn.fujitsu.com> --- fs/btrfs/scrub.c | 6 +----- 1 file changed, 1 insertion(+), 5 deletions(-) diff --git a/fs/btrfs/scrub.c b/fs/btrfs/scrub.c index 929093d..092bec4 100644 --- a/fs/btrfs/scrub.c +++ b/fs/btrfs/scrub.c @@ -228,7 +228,6 @@ static void scrub_bio_end_io_worker(struct btrfs_work *work); static void scrub_block_complete(struct scrub_block *sblock); static void scrub_remap_extent(struct btrfs_fs_info *fs_info, u64 extent_logical, u64 extent_len, - u64 *extent_physical, struct btrfs_device **extent_dev, int *extent_mirror_num); static int scrub_setup_wr_ctx(struct scrub_ctx *sctx, @@ -2482,8 +2481,7 @@ again: extent_mirror_num = mirror_num; if (is_dev_replace) scrub_remap_extent(fs_info, extent_logical, - extent_len, &extent_physical, - &extent_dev, + extent_len, &extent_dev, &extent_mirror_num); ret = btrfs_lookup_csums_range(csum_root, logical, @@ -3057,7 +3055,6 @@ int btrfs_scrub_progress(struct btrfs_root *root, u64 devid, static void scrub_remap_extent(struct btrfs_fs_info *fs_info, u64 extent_logical, u64 extent_len, - u64 *extent_physical, struct btrfs_device **extent_dev, int *extent_mirror_num) { @@ -3074,7 +3071,6 @@ static void scrub_remap_extent(struct btrfs_fs_info *fs_info, return; } - *extent_physical = bbio->stripes[0].physical; *extent_mirror_num = bbio->mirror_num; *extent_dev = bbio->stripes[0].dev; kfree(bbio); -- 1.8.1.4 -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Stefan Behrens
2013-Jul-04 12:20 UTC
Re: [PATCH] Btrfs: fix wrong write offset when replacing a device
On Thu, 4 Jul 2013 18:37:21 +0800, Miao Xie wrote:> The filesystem was corrupted after we did a device replace. > > Steps to reproduce: > # mkfs.btrfs -f -m single -d raid10 <device0>..<device3> > # mount <device0> <mnt> > # btrfs replace start -rfB 1 <device4> <mnt> > # umount <mnt> > # btrfsck <device4> > > The reason is that we changed the write offset by mistake. When we > replace the a device, we read the data from the source device at first, > and then write the data into the corresponding place of the new device. > But when we read the data, we will try to spread the read request among > the mirrors. It is a good idea that can speed up the read request, but > should not change the write offset because it is corresponding to the source > device, not the mirror device, if we change it by the offset of the mirror > device, we would get the wrong offset. Fix it.The main reason for the remapping is not to try to spread the read request, it is done in order to implement the "-r" option ("-r only read from srcdev if no other zero-defect mirror exists", this handles the case where the srcdev is very slow due to lots of read errors). Commit 625f1c8dc changed the write address. It was the right address before, the unmapped one. The physical disk address that is used to read from disk needs to be the mapped one (refer to scrub_add_page_to_rd_bio()), the address to write to disk needs to be the unmapped one. With the current patch, you use the right one for writing but the wrong one for reading. The following code should fix the wrong write address, it reverts the change to the write address from the aforementioned commit. I''ll prepare and send a proper commit (after testing all RAID/single/dup cases). --- a/fs/btrfs/scrub.c +++ b/fs/btrfs/scrub.c @@ -2495,7 +2495,8 @@ again: ret = scrub_extent(sctx, extent_logical, extent_len, extent_physical, extent_dev, flags, generation, extent_mirror_num, - extent_physical); + extent_logical - logical + + physical); if (ret) goto out;> > Signed-off-by: Miao Xie <miaox@cn.fujitsu.com> > --- > fs/btrfs/scrub.c | 6 +----- > 1 file changed, 1 insertion(+), 5 deletions(-) > > diff --git a/fs/btrfs/scrub.c b/fs/btrfs/scrub.c > index 929093d..092bec4 100644 > --- a/fs/btrfs/scrub.c > +++ b/fs/btrfs/scrub.c > @@ -228,7 +228,6 @@ static void scrub_bio_end_io_worker(struct btrfs_work *work); > static void scrub_block_complete(struct scrub_block *sblock); > static void scrub_remap_extent(struct btrfs_fs_info *fs_info, > u64 extent_logical, u64 extent_len, > - u64 *extent_physical, > struct btrfs_device **extent_dev, > int *extent_mirror_num); > static int scrub_setup_wr_ctx(struct scrub_ctx *sctx, > @@ -2482,8 +2481,7 @@ again: > extent_mirror_num = mirror_num; > if (is_dev_replace) > scrub_remap_extent(fs_info, extent_logical, > - extent_len, &extent_physical, > - &extent_dev, > + extent_len, &extent_dev, > &extent_mirror_num); > > ret = btrfs_lookup_csums_range(csum_root, logical, > @@ -3057,7 +3055,6 @@ int btrfs_scrub_progress(struct btrfs_root *root, u64 devid, > > static void scrub_remap_extent(struct btrfs_fs_info *fs_info, > u64 extent_logical, u64 extent_len, > - u64 *extent_physical, > struct btrfs_device **extent_dev, > int *extent_mirror_num) > { > @@ -3074,7 +3071,6 @@ static void scrub_remap_extent(struct btrfs_fs_info *fs_info, > return; > } > > - *extent_physical = bbio->stripes[0].physical; > *extent_mirror_num = bbio->mirror_num; > *extent_dev = bbio->stripes[0].dev; > kfree(bbio); >-- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Stefan Behrens
2013-Jul-04 13:50 UTC
[PATCH] Btrfs: fix wrong write offset when replacing a device
Miao Xie reported the following issue: The filesystem was corrupted after we did a device replace. Steps to reproduce: # mkfs.btrfs -f -m single -d raid10 <device0>..<device3> # mount <device0> <mnt> # btrfs replace start -rfB 1 <device4> <mnt> # umount <mnt> # btrfsck <device4> The reason for the issue is that we changed the write offset by mistake, introduced by commit 625f1c8dc. We read the data from the source device at first, and then write the data into the corresponding place of the new device. In order to implement the "-r" option, the source location is remapped using btrfs_map_block(). The read takes place on the mapped location, and the write needs to take place on the unmapped location. Currently the write is using the mapped location, and this commit changes it back by undoing the change to the write address that the aforementioned commit added by mistake. Reported-by: Miao Xie <miaox@cn.fujitsu.com> Signed-off-by: Stefan Behrens <sbehrens@giantdisaster.de> --- fs/btrfs/scrub.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/fs/btrfs/scrub.c b/fs/btrfs/scrub.c index 4ba2a69..64a157b 100644 --- a/fs/btrfs/scrub.c +++ b/fs/btrfs/scrub.c @@ -2495,7 +2495,7 @@ again: ret = scrub_extent(sctx, extent_logical, extent_len, extent_physical, extent_dev, flags, generation, extent_mirror_num, - extent_physical); + extent_logical - logical + physical); if (ret) goto out; -- 1.8.3.2 -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html