thr3ads.net - Btrfs devel - btrfs: why default 4M readahead size? [Mar 2010]

Shaohua Li

2010-Mar-18 01:42 UTC

btrfs: why default 4M readahead size?

Btrfs uses the following expression to calculate ra_pages:
	fs_info->bdi.ra_pages = max(fs_info->bdi.ra_pages,
				    4 * 1024 * 1024 / PAGE_CACHE_SIZE);
Is the max() a typo for min()? As written, it makes the readahead size 4M by
default, which is too big.
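
For concreteness, here is a minimal standalone sketch (not the kernel code) of what the two forms evaluate to. It assumes 4 KiB pages and a backing-device default of 32 pages (128K); the macro and function names are hypothetical and exist only for this illustration.

```c
#include <stdio.h>

/* Assumption: 4 KiB pages; PAGE_CACHE_SIZE depends on the architecture. */
#define SKETCH_PAGE_BYTES 4096UL
/* The 4M constant from the expression above, converted to pages. */
#define SKETCH_BTRFS_RA_PAGES (4UL * 1024 * 1024 / SKETCH_PAGE_BYTES)

/* max() form, as in the quoted code: 4M acts as a lower bound. */
unsigned long ra_pages_with_max(unsigned long bdi_ra_pages)
{
	return bdi_ra_pages > SKETCH_BTRFS_RA_PAGES ? bdi_ra_pages
						     : SKETCH_BTRFS_RA_PAGES;
}

/* min() form, the alternative being asked about: 4M acts as an upper bound. */
unsigned long ra_pages_with_min(unsigned long bdi_ra_pages)
{
	return bdi_ra_pages < SKETCH_BTRFS_RA_PAGES ? bdi_ra_pages
						     : SKETCH_BTRFS_RA_PAGES;
}

/* Print both results for a given backing-device default, in KiB. */
void print_ra_choice(unsigned long bdi_ra_pages)
{
	unsigned long kib_per_page = SKETCH_PAGE_BYTES / 1024;

	printf("bdi default %lu pages: max() -> %lu KiB, min() -> %lu KiB\n",
	       bdi_ra_pages,
	       ra_pages_with_max(bdi_ra_pages) * kib_per_page,
	       ra_pages_with_min(bdi_ra_pages) * kib_per_page);
}
```

Under these assumptions, print_ra_choice(32) reports 4096 KiB for the max() form and 128 KiB for the min() form, which is why the default ends up at 4M.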
I have a system with 16 CPUs, 6G of memory and 12 SATA disks. I created a
btrfs on each disk, so this isn't a RAID setup. The test is fio, which has
12 tasks to access 12 files for each disk. The fio test is an mmap
sequential read. I measured the performance with different readahead sizes:
ra size		io throughput
4M		268288 k/s
2M		367616 k/s
1M		431104 k/s
512K		474112 k/s
256K		512000 k/s
128K		538624 k/s
The 4M default readahead size has poor performance.
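
To put the gap in relative terms, the following sketch computes how far each row falls below the 128K result. The figures are copied from the table above; the struct and function names are hypothetical.

```c
#include <stddef.h>
#include <stdio.h>

struct ra_result {
	const char *ra_size;
	double kps;	/* throughput as reported above, in k/s */
};

/* Figures copied from the table above. */
static const struct ra_result results[] = {
	{ "4M", 268288 }, { "2M", 367616 }, { "1M", 431104 },
	{ "512K", 474112 }, { "256K", 512000 }, { "128K", 538624 },
};

/* Throughput loss of `value` relative to `baseline`, in percent. */
double percent_drop(double baseline, double value)
{
	return (baseline - value) / baseline * 100.0;
}

/* Print each row's loss relative to the 128K row (the last entry). */
void print_drops(void)
{
	size_t n = sizeof(results) / sizeof(results[0]);
	double baseline = results[n - 1].kps;

	for (size_t i = 0; i < n; i++)
		printf("%-5s %8.0f k/s  %5.1f%% below 128K\n",
		       results[i].ra_size, results[i].kps,
		       percent_drop(baseline, results[i].kps));
}
```

By this measure the 4M run loses roughly half of the throughput seen at 128K.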
I also ran a sync sequential read test; the difference isn't that big there,
but the 4M case still shows about a 10% drop compared to the 512K case.

One might ask about the case where memory isn't tight. I tried a one-disk
setup with only one task, and 4M readahead makes almost no difference
compared to 128K. I guess the 128K default readahead size for the backing
device was carefully chosen to work with popular disks.
So my question is: why do we have a default 4M readahead size even in the
non-RAID case?
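
As a rough way to reason about memory pressure, the sketch below estimates how much page cache is tied up if every reader has one full readahead window in flight. This is a simplification, since the kernel may keep more or less than one window per stream outstanding. The stream count is an assumption (144 if the setup means 12 tasks on each of 12 disks, 12 if it means one task per disk), and the function names are hypothetical.

```c
#include <stdio.h>

/* Bytes of page cache in use if every stream has one full window in flight. */
unsigned long long ra_footprint_bytes(unsigned int streams,
				      unsigned long long ra_bytes)
{
	return (unsigned long long)streams * ra_bytes;
}

/* Print the estimate for one combination of stream count and window size. */
void print_footprint(unsigned int streams, unsigned long long ra_bytes)
{
	printf("%u streams x %llu KiB = %llu MiB\n", streams, ra_bytes >> 10,
	       ra_footprint_bytes(streams, ra_bytes) >> 20);
}
```

With 144 streams this gives 576 MiB at 4M versus 18 MiB at 128K, compared with the 6G of memory on the test machine.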

Thanks,
Shaohua
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Chris Mason

2010-Mar-18 12:53 UTC

Re: btrfs: why default 4M readahead size?

On Thu, Mar 18, 2010 at 09:42:57AM +0800, Shaohua Li wrote:
> Btrfs uses the following expression to calculate ra_pages:
> 	fs_info->bdi.ra_pages = max(fs_info->bdi.ra_pages,
> 				    4 * 1024 * 1024 / PAGE_CACHE_SIZE);
> Is the max() a typo for min()? As written, it makes the readahead size 4M by
> default, which is too big.

Looks like things have changed since I tuned that number.  Fengguang has
been busy ;)

> I have a system with 16 CPUs, 6G of memory and 12 SATA disks. I created a
> btrfs on each disk, so this isn't a RAID setup. The test is fio, which has
> 12 tasks to access 12 files for each disk. The fio test is an mmap
> sequential read. I measured the performance with different readahead sizes:
> ra size		io throughput
> 4M		268288 k/s
> 2M		367616 k/s
> 1M		431104 k/s
> 512K		474112 k/s
> 256K		512000 k/s
> 128K		538624 k/s
> The 4M default readahead size has poor performance.
> I also ran a sync sequential read test; the difference isn't that big there,
> but the 4M case still shows about a 10% drop compared to the 512K case.

I'm surprised the 4M is so much slower.  At any rate, the larger size
was selected because btrfs checksumming means we need a bigger buffer to
keep the disks saturated.  Were you on a fancy intel box with hardware
crc32c enabled?
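
For readers unfamiliar with the checksum in question: crc32c is the Castagnoli variant of CRC-32, which CPUs with SSE4.2 can compute with a dedicated instruction. The sketch below is a plain bitwise software version, included only to show the per-byte work that hardware support removes. It is not btrfs's implementation, and the function name is hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Bitwise CRC-32C (Castagnoli), using the reflected polynomial 0x82F63B78. */
uint32_t crc32c_sw(const void *buf, size_t len)
{
	const uint8_t *p = buf;
	uint32_t crc = 0xFFFFFFFFu;	/* conventional initial value */

	while (len--) {
		crc ^= *p++;
		/* Process one bit at a time; the mask is all ones if the low bit is set. */
		for (int bit = 0; bit < 8; bit++)
			crc = (crc >> 1) ^ (0x82F63B78u & (0u - (crc & 1u)));
	}
	return ~crc;	/* conventional final inversion */
}
```

As a sanity check, crc32c_sw("123456789", 9) yields 0xE3069283, the standard check value for CRC-32C. Eight shift-and-xor steps per byte is the kind of cost that motivates keeping a larger buffer of data in flight when no hardware instruction is available.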
>
> One might ask about the case where memory isn't tight. I tried a one-disk
> setup with only one task, and 4M readahead makes almost no difference
> compared to 128K. I guess the 128K default readahead size for the backing
> device was carefully chosen to work with popular disks.
> So my question is: why do we have a default 4M readahead size even in the
> non-RAID case?

I'm happy to tune it down if lower numbers are more appropriate now,
thanks for trying this!

-chris

Shaohua Li

2010-Mar-19 00:59 UTC

Re: btrfs: why default 4M readahead size?

On Thu, Mar 18, 2010 at 08:53:13PM +0800, Chris Mason wrote:
> I'm surprised the 4M is so much slower.  At any rate, the larger size
> was selected because btrfs checksumming means we need a bigger buffer to
> keep the disks saturated.  Were you on a fancy intel box with hardware
> crc32c enabled?

Yes, this machine supports the SSE4.2 instructions. Let me check the result
with checksumming disabled.

Thanks,
Shaohua

Shaohua Li

2010-Mar-19 02:56 UTC

Re: btrfs: why default 4M readahead size?

On Fri, Mar 19, 2010 at 08:59:48AM +0800, Shaohua Li wrote:
> On Thu, Mar 18, 2010 at 08:53:13PM +0800, Chris Mason wrote:
> > I'm surprised the 4M is so much slower.  At any rate, the larger size
> > was selected because btrfs checksumming means we need a bigger buffer to
> > keep the disks saturated.  Were you on a fancy intel box with hardware
> > crc32c enabled?
> Yes, this machine supports the SSE4.2 instructions. Let me check the result
> with checksumming disabled.

It looks like there is no big difference with checksumming disabled. I
reformatted the disks and reran the test:
128k ra: 539648 k/s
4m ra: 285696 k/s

Thanks,
Shaohua

Jens Axboe

2010-Mar-19 08:22 UTC

Re: btrfs: why default 4M readahead size?

On Fri, Mar 19 2010, Shaohua Li wrote:
> It looks like there is no big difference with checksumming disabled. I
> reformatted the disks and reran the test:
> 128k ra: 539648 k/s
> 4m ra: 285696 k/s

4MB is definitely a huge read-ahead size, but I do wonder why it would
perform that much worse than a 128KB window. If you narrow your test
down to a single disk (or something simpler, at least), how does 4MB
compare to 128KB? With 6GB of memory, you should not run into read-ahead
memory thrashing.

-- 
Jens Axboe

Shaohua Li

2010-Mar-19 09:29 UTC

Re: btrfs: why default 4M readahead size?

On Fri, Mar 19, 2010 at 04:22:11PM +0800, Jens Axboe wrote:
> 4MB is definitely a huge read-ahead size, but I do wonder why it would
> perform that much worse than a 128KB window. If you narrow your test
> down to a single disk (or something simpler, at least), how does 4MB
> compare to 128KB? With 6GB of memory, you should not run into read-ahead
> memory thrashing.

Test data for a single disk (just one run so far):
128k ra: 88513 k/s
4m ra: 87630 k/s
So no big difference.

Thanks,
Shaohua

Jens Axboe

2010-Mar-19 12:57 UTC

Re: btrfs: why default 4M readahead size?

On Fri, Mar 19 2010, Shaohua Li wrote:
> Test data for a single disk (just one run so far):
> 128k ra: 88513 k/s
> 4m ra: 87630 k/s
> So no big difference.

That looks pretty much as expected. Unless you hit some sort of memory
thrashing, a huge read-ahead window should not cause a performance
degradation, at least not of the magnitude you are seeing. I would expect
performance to reach a stable threshold once you have requests that are
large enough to utilize the full device bandwidth on their own, and then
remain at that plateau.
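
The plateau described above can be illustrated with a toy model in which each request pays a fixed overhead and then transfers at full device bandwidth. This is an assumption made purely for illustration, not a description of how the block layer or any particular disk behaves; the function name and the numbers in the note below are hypothetical.

```c
/*
 * Toy model (an assumption, not measured behavior): each request pays a
 * fixed per-request overhead and then transfers at full device bandwidth.
 * Returns the modeled throughput in bytes per second.
 */
double modeled_throughput(double request_bytes, double overhead_s,
			  double bandwidth_bytes_per_s)
{
	return request_bytes /
	       (overhead_s + request_bytes / bandwidth_bytes_per_s);
}
```

In this model throughput only rises with request size and flattens out just below the device bandwidth, so a 4MB window should never come out slower than a 128KB one. For example, with an assumed 1 ms overhead and 90 MB/s of bandwidth, the 4MB case is higher than the 128KB case and both stay under 90 MB/s. A measured drop of the size reported in this thread therefore points at something outside the model, such as memory thrashing or a changed IO pattern.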

Any chance you could capture blktrace data for a run with 128KB and one
with 4MB so we could inspect the disk IO pattern?

-- 
Jens Axboe
