Write bios are submitted from the submit_worker. The worker pumps down bios into the block layer until it signals a congestion. At least this is the theory. In pratice submit_bio just blocks before any signalling happens. As the bios are queued per device, this can lead to a situation where only one device is served until all bios are submitted, and only then the next device is served. This is obviously suboptimal. This patch just throws out the congestion detection and reschedules the worker every 8 requests. This way, all devices can be kept busy. This is only a temporary fix until the block layer provides a non-blocking submit_bio. Then the whole submit_worker mechanism can be killed. Signed-off-by: Arne Jansen <sensille@gmx.net> --- fs/btrfs/volumes.c | 30 +----------------------------- 1 files changed, 1 insertions(+), 29 deletions(-) diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c index c37433d..5b01742 100644 --- a/fs/btrfs/volumes.c +++ b/fs/btrfs/volumes.c @@ -257,36 +257,8 @@ loop_lock: * is now congested. Back off and let other work structs * run instead */ - if (pending && bdi_write_congested(bdi) && batch_run > 8 && + if (pending && batch_run > 8 && fs_info->fs_devices->open_devices > 1) { - struct io_context *ioc; - - ioc = current->io_context; - - /* - * the main goal here is that we don''t want to - * block if we''re going to be able to submit - * more requests without blocking. - * - * This code does two great things, it pokes into - * the elevator code from a filesystem _and_ - * it makes assumptions about how batching works. - */ - if (ioc && ioc->nr_batch_requests > 0 && - time_before(jiffies, ioc->last_waited + HZ/50UL) && - (last_waited == 0 || - ioc->last_waited == last_waited)) { - /* - * we want to go through our batch of - * requests and stop. So, we copy out - * the ioc->last_waited time and test - * against it before looping - */ - last_waited = ioc->last_waited; - if (need_resched()) - cond_resched(); - continue; - } spin_lock(&device->io_lock); requeue_list(pending_bios, pending, tail); device->running_pending = 1; -- 1.7.3.4 -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Tue, Nov 29, 2011 at 09:40:56PM +0100, Arne Jansen wrote:> Write bios are submitted from the submit_worker. The worker pumps down > bios into the block layer until it signals a congestion. At least this > is the theory. In pratice submit_bio just blocks before any signalling > happens. As the bios are queued per device, this can lead to a situation > where only one device is served until all bios are submitted, and only > then the next device is served. This is obviously suboptimal. > This patch just throws out the congestion detection and reschedules the > worker every 8 requests. This way, all devices can be kept busy. > This is only a temporary fix until the block layer provides a non-blocking > submit_bio. Then the whole submit_worker mechanism can be killed.The problem with the every 8 requests logic is that we''ve still got a pretty good chance of getting stuck behind get_request_wait. The way the elevator batching works is that it should give us a batch of requests, and once that batch is done we wait. If we jump around every 8 requests, we''ve turned this: [ dev A bio 1-8, dev A bio 8-16, dev A bio 16-32, dev B bio 1-8, dev B ... ] into: [ dev A bio 1-8, dev B bio 1-8, dev A bio 8-16, dev B bio 8-16 ] They look like the same IO, but if we wait for a request when we do (dev B bio 1-8) then our dev A bio 1-8 bio is likely to dispatch without all the other dev A bios we had queued. As you said in IRC, we''d be better off with one thread per device or (my preference) with a real non-blocking submit_bio. What kind of results did you get with your test from bumping the nr_requests? -chris -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On 29.11.2011 22:47, Chris Mason wrote:> On Tue, Nov 29, 2011 at 09:40:56PM +0100, Arne Jansen wrote: >> Write bios are submitted from the submit_worker. The worker pumps down >> bios into the block layer until it signals a congestion. At least this >> is the theory. In pratice submit_bio just blocks before any signalling >> happens. As the bios are queued per device, this can lead to a situation >> where only one device is served until all bios are submitted, and only >> then the next device is served. This is obviously suboptimal. >> This patch just throws out the congestion detection and reschedules the >> worker every 8 requests. This way, all devices can be kept busy. >> This is only a temporary fix until the block layer provides a non-blocking >> submit_bio. Then the whole submit_worker mechanism can be killed. > > The problem with the every 8 requests logic is that we''ve still got a > pretty good chance of getting stuck behind get_request_wait. The way > the elevator batching works is that it should give us a batch of > requests, and once that batch is done we wait. > > If we jump around every 8 requests, we''ve turned this: > > [ dev A bio 1-8, dev A bio 8-16, dev A bio 16-32, dev B bio 1-8, dev B ... ]currently, it''s more like [ dev A bio 1 - 5000, dev B bio 1-5000 ]> > into: > > [ dev A bio 1-8, dev B bio 1-8, dev A bio 8-16, dev B bio 8-16 ]so this is a great improvement :)> > They look like the same IO, but if we wait for a request when we do > (dev B bio 1-8) then our dev A bio 1-8 bio is likely to dispatch without > all the other dev A bios we had queued. > > As you said in IRC, we''d be better off with one thread per device or (my > preference) with a real non-blocking submit_bio. What kind of results > did you get with your test from bumping the nr_requests?what nr_requests do you mean? btrfs_async_submit_limit? Arne> > -chris > ---- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Wed, Nov 30, 2011 at 11:30:01AM +0100, Arne Jansen wrote:> On 29.11.2011 22:47, Chris Mason wrote: > > On Tue, Nov 29, 2011 at 09:40:56PM +0100, Arne Jansen wrote: > >> Write bios are submitted from the submit_worker. The worker pumps down > >> bios into the block layer until it signals a congestion. At least this > >> is the theory. In pratice submit_bio just blocks before any signalling > >> happens. As the bios are queued per device, this can lead to a situation > >> where only one device is served until all bios are submitted, and only > >> then the next device is served. This is obviously suboptimal. > >> This patch just throws out the congestion detection and reschedules the > >> worker every 8 requests. This way, all devices can be kept busy. > >> This is only a temporary fix until the block layer provides a non-blocking > >> submit_bio. Then the whole submit_worker mechanism can be killed. > > > > The problem with the every 8 requests logic is that we''ve still got a > > pretty good chance of getting stuck behind get_request_wait. The way > > the elevator batching works is that it should give us a batch of > > requests, and once that batch is done we wait. > > > > If we jump around every 8 requests, we''ve turned this: > > > > [ dev A bio 1-8, dev A bio 8-16, dev A bio 16-32, dev B bio 1-8, dev B ... ] > > currently, it''s more like > [ dev A bio 1 - 5000, dev B bio 1-5000 ] > > > > > into: > > > > [ dev A bio 1-8, dev B bio 1-8, dev A bio 8-16, dev B bio 8-16 ] > > so this is a great improvement :);)> > > > > They look like the same IO, but if we wait for a request when we do > > (dev B bio 1-8) then our dev A bio 1-8 bio is likely to dispatch without > > all the other dev A bios we had queued. > > > > As you said in IRC, we''d be better off with one thread per device or (my > > preference) with a real non-blocking submit_bio. What kind of results > > did you get with your test from bumping the nr_requests? > > what nr_requests do you mean? btrfs_async_submit_limit?/sys/block/xxx/queue/nr_requests -chris -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Tue, Nov 29, 2011 at 09:40:56PM +0100, Arne Jansen wrote:> Write bios are submitted from the submit_worker. The worker pumps down > bios into the block layer until it signals a congestion. At least this > is the theory. In pratice submit_bio just blocks before any signalling > happens. As the bios are queued per device, this can lead to a situation > where only one device is served until all bios are submitted, and only > then the next device is served. This is obviously suboptimal. > This patch just throws out the congestion detection and reschedules the > worker every 8 requests. This way, all devices can be kept busy. > This is only a temporary fix until the block layer provides a non-blocking > submit_bio. Then the whole submit_worker mechanism can be killed.Could you please try the top commit in for-linus instead? It seems to be helping here, although I wasn''t able to trigger the same problems you reported. -chris -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html