NeilBrown
2012-Jul-03 02:47 UTC
Re: raid10 make_request failure during iozone benchmark upon btrfs
On Tue, 03 Jul 2012 03:13:33 +0100 Kerin Millar <kerframil@gmail.com> wrote:

> Hi,
>
> On 03/07/2012 02:39, NeilBrown wrote:
>
> [snip]
>
> >>> Could you please double check that you are running a kernel with
> >>>
> >>> commit aba336bd1d46d6b0404b06f6915ed76150739057
> >>> Author: NeilBrown <neilb@suse.de>
> >>> Date:   Thu May 31 15:39:11 2012 +1000
> >>>
> >>>     md: raid1/raid10: fix problem with merge_bvec_fn
> >>>
> >>> in it?
> >>
> >> I am indeed. I searched the list beforehand and noticed the patch in
> >> question. Not sure which -rc it landed in but I checked my source tree
> >> and it's definitely in there.
> >>
> >> Cheers,
> >>
> >> --Kerin
> >
> > Thanks.
> > Looking at it again, I see that it is definitely a different bug; that
> > patch wouldn't affect it.
> >
> > But I cannot see what could possibly be causing the problem.
> > You have a 256K chunk size, so requests should be limited to 512 sectors
> > aligned at a 512-sector boundary.
> > However, all the requests that are causing errors are 512 sectors long
> > but aligned on a 256-sector boundary (which is not also a 512-sector
> > boundary). This is wrong.
>
> I see.
>
> > It could be that btrfs is submitting bad requests, but I think it always
> > uses bio_add_page, and bio_add_page appears to do the right thing.
> > It could be that dm-linear is causing the problem, but it seems to
> > correctly ask the underlying device for alignment, and reports that
> > alignment to bio_add_page.
> > It could be that md/raid10 is the problem, but I cannot find any fault
> > in raid10_mergeable_bvec - it performs much the same tests that the
> > raid10 make_request function does.
> >
> > So it is a mystery.
> >
> > Is this failure repeatable?
>
> Yes, it's reproducible with 100% consistency. Furthermore, I tried to
> use the btrfs volume as a store for the package manager, so as to try
> with a 'realistic' workload. Many of these errors were triggered
> immediately upon invoking the package manager. In case it matters, the
> package manager is portage (in Gentoo Linux) and the directory structure
> entails a shallow directory depth with a large number of distributed
> small files. I haven't been able to reproduce with xfs, ext4 or reiserfs.
>
> > If so, could you please insert
> >     WARN_ON_ONCE(1);
> > in drivers/md/raid10.c where it prints out the message: just after the
> > "bad_map:" label.
> >
> > Also, in raid10_mergeable_bvec, insert
> >     WARN_ON_ONCE(max < 0);
> > just before
> >     if (max < 0)
> >         /* bio_add cannot handle a negative return */
> >         max = 0;
> >
> > and then see if either of those generates a warning, and post the full
> > stack trace if they do.
>
> OK. I ran iozone again on a fresh filesystem, mounted with the default
> options. Here's the trace that appears, just before the first
> "make_request bug" message:
>
> WARNING: at drivers/md/raid10.c:1094 make_request+0xda5/0xe20()
> Hardware name: ProLiant MicroServer
> Modules linked in: btrfs zlib_deflate lzo_compress kvm_amd kvm sp5100_tco i2c_piix4
> Pid: 1031, comm: btrfs-submit-1 Not tainted 3.5.0-rc5 #3
> Call Trace:
> [<ffffffff81031987>] ? warn_slowpath_common+0x67/0xa0
> [<ffffffff81442b45>] ? make_request+0xda5/0xe20
> [<ffffffff81460b34>] ? __split_and_process_bio+0x2d4/0x600
> [<ffffffff81063429>] ? set_next_entity+0x29/0x60
> [<ffffffff810652c3>] ? pick_next_task_fair+0x63/0x140
> [<ffffffff81450b7f>] ? md_make_request+0xbf/0x1e0
> [<ffffffff8123d12f>] ? generic_make_request+0xaf/0xe0
> [<ffffffff8123d1c3>] ? submit_bio+0x63/0xe0
> [<ffffffff81040abd>] ? try_to_del_timer_sync+0x7d/0x120
> [<ffffffffa016839a>] ? run_scheduled_bios+0x23a/0x520 [btrfs]
> [<ffffffffa0170e40>] ? worker_loop+0x120/0x520 [btrfs]
> [<ffffffffa0170d20>] ? btrfs_queue_worker+0x2e0/0x2e0 [btrfs]
> [<ffffffff810520c5>] ? kthread+0x85/0xa0
> [<ffffffff815441f4>] ? kernel_thread_helper+0x4/0x10
> [<ffffffff81052040>] ? kthread_freezable_should_stop+0x60/0x60
> [<ffffffff815441f0>] ? gs_change+0xb/0xb
>
> Cheers,
>
> --Kerin

Thanks. Looks like it is a btrfs bug - so a big "hello" to linux-btrfs :-)

The symptom is that iozone on btrfs on md/raid10 can result in

[ 919.893454] md/raid10:md0: make_request bug: can't convert block across chunks or bigger than 256k 6653500160 256
[ 919.893465] btrfs: bdev /dev/mapper/vg0-test errs: wr 1, rd 0, flush 0, corrupt 0, gen 0

i.e. RAID10 has a 256K chunk size but is getting 256K requests which overlap
two chunks - the last half of one chunk and the first half of the next.
That isn't allowed, and raid10_mergeable_bvec, called by bio_add_page,
should prevent it.

However, btrfs_map_bio() sets ->bi_sector to a new value without verifying
that the resulting bio is still acceptable - which it isn't.

The core problem is that you cannot build a bio for one location and then
use it freely at another location.

md/raid1 handles this by checking each addition to a bio against all the
possible locations that it might be read from or written to. Maybe btrfs
could do the same.

Alternately, we could work with Kent Overstreet (of bcache fame) to remove
the restriction that the fs must make the bio compatible with the device -
instead requiring the device to split bios when needed, and making it easy
to do that (currently it is not easy).

And there are probably other alternatives.

Thanks,
NeilBrown
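For anyone following along, the arithmetic behind that "bad_map" message can
be sketched in a few lines of userspace C. This is only a model of the rule
described above, not the actual drivers/md/raid10.c code; CHUNK_SECTORS and
fits_in_chunk() are names invented for the example:

    #include <stdbool.h>
    #include <stdint.h>
    #include <stdio.h>

    #define CHUNK_SECTORS 512ULL  /* 256K chunk / 512-byte sectors */

    /* Does [sector, sector + nr_sectors) stay inside a single chunk? */
    static bool fits_in_chunk(uint64_t sector, unsigned int nr_sectors)
    {
            uint64_t offset = sector & (CHUNK_SECTORS - 1);

            return offset + nr_sectors <= CHUNK_SECTORS;
    }

    int main(void)
    {
            /* The failing request from the log: 256K (512 sectors)
             * starting at sector 6653500160. */
            uint64_t sector = 6653500160ULL;
            unsigned int nr = 512;

            printf("offset in chunk: %llu sectors\n",
                   (unsigned long long)(sector & (CHUNK_SECTORS - 1)));
            printf("fits in one chunk: %s\n",
                   fits_in_chunk(sector, nr) ? "yes" : "no");
            return 0;
    }

Fed the failing request, this reports an offset of 256 sectors into the
chunk and "fits in one chunk: no" - precisely the misalignment described
above: a 512-sector request starting halfway through a 512-sector chunk.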
Chris Mason
2012-Jul-03 15:08 UTC
Re: raid10 make_request failure during iozone benchmark upon btrfs
On Mon, Jul 02, 2012 at 08:47:27PM -0600, NeilBrown wrote:
> Thanks. Looks like it is a btrfs bug - so a big "hello" to linux-btrfs :-)
>
> The symptom is that iozone on btrfs on md/raid10 can result in
>
> [ 919.893454] md/raid10:md0: make_request bug: can't convert block across chunks or bigger than 256k 6653500160 256
> [ 919.893465] btrfs: bdev /dev/mapper/vg0-test errs: wr 1, rd 0, flush 0, corrupt 0, gen 0
>
> i.e. RAID10 has a 256K chunk size but is getting 256K requests which
> overlap two chunks - the last half of one chunk and the first half of the
> next. That isn't allowed, and raid10_mergeable_bvec, called by
> bio_add_page, should prevent it.
>
> However, btrfs_map_bio() sets ->bi_sector to a new value without verifying
> that the resulting bio is still acceptable - which it isn't.
>
> The core problem is that you cannot build a bio for one location and then
> use it freely at another location.
> md/raid1 handles this by checking each addition to a bio against all the
> possible locations that it might be read from or written to. Maybe btrfs
> could do the same.
> Alternately, we could work with Kent Overstreet (of bcache fame) to remove
> the restriction that the fs must make the bio compatible with the device -
> instead requiring the device to split bios when needed, and making it easy
> to do that (currently it is not easy).
> And there are probably other alternatives.

In this case btrfs should really break the bio down to smaller chunks and
hand-feed the lower layers. There are corners where we think the device can
handle a certain size and then later on figure out we were just too
optimistic. So we should deal with it by breaking the bio up and then
lowering our max.

-chris
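To make that splitting concrete, here is a minimal userspace sketch of the
idea, assuming the 256K chunk size from the report. The names submit_piece()
and split_and_submit() are invented for illustration; an in-kernel version
would clone and resubmit partial bios rather than print:

    #include <stdint.h>
    #include <stdio.h>

    #define CHUNK_SECTORS 512ULL  /* 256K chunk / 512-byte sectors */

    /* Stand-in for cloning part of a bio and resubmitting it downstream. */
    static void submit_piece(uint64_t sector, unsigned int nr_sectors)
    {
            printf("submit: sector %llu, %u sectors\n",
                   (unsigned long long)sector, nr_sectors);
    }

    /* Split [sector, sector + nr_sectors) at every chunk boundary, so
     * each piece handed to the lower layer stays within one chunk. */
    static void split_and_submit(uint64_t sector, unsigned int nr_sectors)
    {
            while (nr_sectors > 0) {
                    unsigned int room =
                            CHUNK_SECTORS - (sector & (CHUNK_SECTORS - 1));
                    unsigned int len =
                            nr_sectors < room ? nr_sectors : room;

                    submit_piece(sector, len);
                    sector += len;
                    nr_sectors -= len;
            }
    }

    int main(void)
    {
            /* The failing 256K request from the thread. */
            split_and_submit(6653500160ULL, 512);
            return 0;
    }

For the failing request this produces two 256-sector submissions, one ending
at the chunk boundary and one starting on it - each acceptable to raid10 on
its own.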