Shaohua Li
2010-Feb-03 07:45 UTC
[patch]btrfs: finish read pages in the order they are submitted
The endio is done in reverse order of the bio vectors. That means for a
sequential read, the page submitted first will finish last in a bio.
Considering we do a checksum (making the cache hot) for every page, this
introduces a delay (and a chance to squeeze out cache that will be used
soon) for pages submitted at the beginning. I don't observe an obvious
performance difference with the patch below in my simple test, but it
seems more natural to finish reads in the order they are submitted.

Signed-off-by: Shaohua Li <shaohua.li@intel.com>

diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
index 96577e8..4df0c56 100644
--- a/fs/btrfs/extent_io.c
+++ b/fs/btrfs/extent_io.c
@@ -1750,7 +1750,8 @@ static void end_bio_extent_writepage(struct bio *bio, int err)
 static void end_bio_extent_readpage(struct bio *bio, int err)
 {
 	int uptodate = test_bit(BIO_UPTODATE, &bio->bi_flags);
-	struct bio_vec *bvec = bio->bi_io_vec + bio->bi_vcnt - 1;
+	struct bio_vec *bvec_end = bio->bi_io_vec + bio->bi_vcnt - 1;
+	struct bio_vec *bvec = bio->bi_io_vec;
 	struct extent_io_tree *tree;
 	u64 start;
 	u64 end;
@@ -1773,7 +1774,7 @@ static void end_bio_extent_readpage(struct bio *bio, int err)
 		else
 			whole_page = 0;
 
-		if (--bvec >= bio->bi_io_vec)
+		if (++bvec <= bvec_end)
 			prefetchw(&bvec->bv_page->flags);
 
 		if (uptodate && tree->ops && tree->ops->readpage_end_io_hook) {
@@ -1818,7 +1819,7 @@ static void end_bio_extent_readpage(struct bio *bio, int err)
 			}
 			check_page_locked(tree, page);
 		}
-	} while (bvec >= bio->bi_io_vec);
+	} while (bvec <= bvec_end);
 
 	bio_put(bio);
 }
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
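The ordering change described in this posting can be illustrated outside the kernel. What follows is only a minimal userspace sketch, not the btrfs code: `struct fake_vec`, `struct fake_bio`, `complete_reverse` and `complete_forward` are hypothetical names made up for the illustration, and "completing" a page is reduced to recording its index. The reverse walk uses an array index rather than decrementing a pointer past the start of the array, which standard C does not allow in userspace code.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for struct bio_vec and struct bio (not kernel types). */
struct fake_vec {
	int page_index;           /* position of the page in submission order */
};

struct fake_bio {
	struct fake_vec *io_vec;  /* first vector, plays the role of bi_io_vec */
	size_t vcnt;              /* number of vectors, plays the role of bi_vcnt */
};

/* Old order: the last submitted page is completed first. */
size_t complete_reverse(const struct fake_bio *bio, int *order)
{
	size_t n = 0;

	assert(bio != NULL && order != NULL);
	for (size_t i = bio->vcnt; i-- > 0; )
		order[n++] = bio->io_vec[i].page_index;
	return n;
}

/* New order: pages are completed in the order they were submitted. */
size_t complete_forward(const struct fake_bio *bio, int *order)
{
	const struct fake_vec *bvec;
	const struct fake_vec *bvec_end;
	size_t n = 0;

	assert(bio != NULL && order != NULL);
	bvec = bio->io_vec;
	bvec_end = bio->io_vec + bio->vcnt;   /* one past the last vector */
	for (; bvec < bvec_end; bvec++)
		order[n++] = bvec->page_index;
	return n;
}
```

For a bio holding pages 0, 1, 2, the reverse walk records 2, 1, 0 and the forward walk records 0, 1, 2. The sketch only shows the iteration order; it says nothing about the cache effects discussed in the thread.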
Chris Mason
2010-Feb-03 18:18 UTC
Re: [patch]btrfs: finish read pages in the order they are submitted
On Wed, Feb 03, 2010 at 03:45:11PM +0800, Shaohua Li wrote:
> The endio is done in reverse order of the bio vectors. That means for a
> sequential read, the page submitted first will finish last in a bio.
> Considering we do a checksum (making the cache hot) for every page, this
> introduces a delay (and a chance to squeeze out cache that will be used
> soon) for pages submitted at the beginning. I don't observe an obvious
> performance difference with the patch below in my simple test, but it
> seems more natural to finish reads in the order they are submitted.

Interesting, I wonder if we'd be able to see this on a higher throughput
system. Jens, care to give it a shot (patch below)?

-chris

Signed-off-by: Shaohua Li <shaohua.li@intel.com>

diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
index 96577e8..4df0c56 100644
--- a/fs/btrfs/extent_io.c
+++ b/fs/btrfs/extent_io.c
@@ -1750,7 +1750,8 @@ static void end_bio_extent_writepage(struct bio *bio, int err)
 static void end_bio_extent_readpage(struct bio *bio, int err)
 {
 	int uptodate = test_bit(BIO_UPTODATE, &bio->bi_flags);
-	struct bio_vec *bvec = bio->bi_io_vec + bio->bi_vcnt - 1;
+	struct bio_vec *bvec_end = bio->bi_io_vec + bio->bi_vcnt - 1;
+	struct bio_vec *bvec = bio->bi_io_vec;
 	struct extent_io_tree *tree;
 	u64 start;
 	u64 end;
@@ -1773,7 +1774,7 @@ static void end_bio_extent_readpage(struct bio *bio, int err)
 		else
 			whole_page = 0;
 
-		if (--bvec >= bio->bi_io_vec)
+		if (++bvec <= bvec_end)
 			prefetchw(&bvec->bv_page->flags);
 
 		if (uptodate && tree->ops && tree->ops->readpage_end_io_hook) {
@@ -1818,7 +1819,7 @@ static void end_bio_extent_readpage(struct bio *bio, int err)
 			}
 			check_page_locked(tree, page);
 		}
-	} while (bvec >= bio->bi_io_vec);
+	} while (bvec <= bvec_end);
 
 	bio_put(bio);
 }
Jens Axboe
2010-Feb-08 10:59 UTC
Re: [patch]btrfs: finish read pages in the order they are submitted
On Wed, Feb 03 2010, Chris Mason wrote:
> On Wed, Feb 03, 2010 at 03:45:11PM +0800, Shaohua Li wrote:
> > The endio is done in reverse order of the bio vectors. That means for a
> > sequential read, the page submitted first will finish last in a bio.
> > Considering we do a checksum (making the cache hot) for every page, this
> > introduces a delay (and a chance to squeeze out cache that will be used
> > soon) for pages submitted at the beginning. I don't observe an obvious
> > performance difference with the patch below in my simple test, but it
> > seems more natural to finish reads in the order they are submitted.
>
> Interesting, I wonder if we'd be able to see this on a higher throughput
> system. Jens, care to give it a shot (patch below)?

Sure, I gave it a spin. Baseline is current -git (-rc7'ish), and the
workload is just stream reading 8 16GB files. I used large streaming
reads as the bigger ios would hopefully help show the effect of doing
the reverse completions. The run takes ~1 minute, and the results are
averaged over 3 runs.

Throughput:

Kernel          Slowest         Fastest         Average
-------------------------------------------------------
baseline        2041MB/sec      2229MB/sec      2155MB/sec
patched         2052MB/sec      2071MB/sec      2062MB/sec

Completion latency average (msecs):

Kernel          Best            Worst           Average
-------------------------------------------------------
baseline        1.72            1.89            1.79
patched         1.83            1.89            1.85

It would probably need a LOT more runs to get a statistically
significant number here. It would be nice if O_DIRECT worked (hint,
hint!), which usually makes these things easier to test.

If I look at the throughput of the runs, the baseline usually starts a
little slower (1.8GB/sec or so) and gets faster, while the patched run
starts much higher (close to 3.0GB/sec) and drops to 2.0GB/sec after
that for the rest of the run.

So I did some perf stat checks too, to see if we see an improvement in
cache utilization. Results below.

Cache stats (millions)

Kernel          References      Misses
----------------------------------------------
baseline        3547            2387
patched         3822            2351

These numbers are very stable. The above were also averaged over 3 runs,
but variability was very low.

My feeling is that the patch should be included. Cache misses are
provably down and the patch makes a lot of sense just logically. The
patched runs seemed more stable, and my gut tells me that the unpatched
runs may have been a bit flukey (one fast run, should probably be
excluded).

Let me know if you want more tests.

--
Jens Axboe
Jens Axboe
2010-Feb-08 11:44 UTC
Re: [patch]btrfs: finish read pages in the order they are submitted
On Mon, Feb 08 2010, Jens Axboe wrote:
> Cache stats (millions)
>
> Kernel          References      Misses
> ----------------------------------------------
> baseline        3547            2387
> patched         3822            2351
>
> These numbers are very stable, the above were also averaged over 3 runs,
> but variability was very low.

Update on this. I set up the storage system for more stable runs and
repeated the above test. It runs a bit faster as well, and completes the
workload at 2.5GB/sec average.

Cache stats (millions)

Kernel          References      Misses
----------------------------------------------
baseline        3384            2318
baseline        3417            2313
baseline        3382            2323
baseline avg    3394            2318

patched         3518            2258
patched         3428            2201
patched         3536            2274
patched avg     3494            2244

So for those runs, ~3% more references and ~3% fewer misses. Even with
the variability here, that looks like a win in my book.

--
Jens Axboe
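The "~3%" figures in this message can be checked from the two average rows of the table. The helper below is only an illustrative sketch of that arithmetic; `percent_change` is a hypothetical name and is not part of perf or of the patch.

```c
#include <assert.h>

/*
 * Relative change of `patched` against `baseline`, in percent.
 * Positive means the patched value is larger, negative means smaller.
 */
double percent_change(double baseline, double patched)
{
	assert(baseline != 0.0);   /* avoid dividing by zero */
	return (patched - baseline) / baseline * 100.0;
}
```

With the averages above, percent_change(3394, 3494) is roughly +2.9 (references) and percent_change(2318, 2244) is roughly -3.2 (misses), which matches the rounded figures quoted in the message.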