I've just started to work with btrfs, so I started with a benchmark. On four
identical servers (2 dual-core CPUs, single local disk), I built filesystems:
ext3, ext4, nilfs2, and btrfs. I checked out a sizable code tree and timed a
build. The build is parallelized to use 4 threads when possible.

I'm seeing similar build times on ext[34] and nilfs2, but I'm seeing almost
double the times for btrfs using default options. And I'm having trouble
reconciling this performance cost with the benchmarks I'm seeing around the
net.

Is this a common result? Is there a trick to getting ext4-competitive
performance out of btrfs? Is my application a poor choice for btrfs? Am I
missing something obvious here?

--rich
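For reference, a benchmark of this shape can be reproduced with something
along these lines; the device name, mount point, and source tree below are
only placeholders, since the original post does not name them:

  mkfs.btrfs /dev/sdb                           # default options, as in the test
  mount /dev/sdb /mnt/test
  cd /mnt/test
  git clone /path/to/sizable/tree src           # placeholder for the code tree
  cd src
  sync && echo 3 > /proc/sys/vm/drop_caches     # start each run from cold caches
  time make -j4                                 # 4-way parallel build, timed

Repeating the same sequence on ext3, ext4, and nilfs2 partitions on identical
hardware gives the comparison being discussed.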
On Mon, May 24, 2010 at 2:08 PM, K. Richard Pixley <rich@noir.com> wrote:
> I've just started to work with btrfs so I started with a benchmark. On four
> identical servers, (2 dual core cpus, single local disk), I built
> filesystems - ext3, ext4, nilfs2, and btrfs. I checked out a sizable code
> tree and timed a build. The build is parallelized to use 4 threads when
> possible.
>
> I'm seeing similar build times on ext[34] and nilfs2 but I'm seeing almost
> double the times for btrfs using default options. And I'm having trouble
> reconciling this performance cost with the benchmarks I'm seeing around the
> net.
>
> Is this a common result? Is there a trick to getting ext4 competitive
> performance out of btrfs? Is my application a poor choice for btrfs? Am I
> missing something obvious here?

Please make sure you're testing with the latest btrfs from git or Linus'
latest kernel.
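As a quick sketch of what "latest" means in practice (the URL below is the
mainline tree of the time; the btrfs development tree of the day may carry
newer code still):

  uname -r      # confirm which kernel the test box is actually running
  git clone git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux-2.6.git

btrfs was changing quickly at this point, so results from an older
distribution kernel may not reflect current behaviour.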
Just as a followup, my problem appears to be hardware related. It's not clear
yet whether it's a strange failure mode or a configuration snafu, disk or
controller, but elsewhere I'm seeing a btrfs single-disk performance penalty
more like 2% over ext[34], which seems completely reasonable.

Sorry for the panic.

--rich
Once again I'm stumped by some performance numbers and hoping for some
insight.

Using an 8-core server, building in parallel, I'm building some code. Using
ext2 over a 5-way (5 disk) lvm partition, I can build that code in 35
minutes. Tests with dd on the raw disk and lvm partitions show me that I'm
getting near-linear improvement from the raw stripe, even with dd runs
exceeding 10G, so I think that convinces me that my disks and controller
subsystem are capable of operating in parallel and in concert. hdparm -t
numbers seem to support what I'm seeing from dd.

Running the same build, same parallelism, over a btrfs (defaults) partition
on a single drive, I'm seeing very consistent build times around an hour,
which is reasonable. I get a little under an hour on ext4 single disk,
again, very consistently.

However, if I build a btrfs file system across the 5 disks, my build times
climb to around 1.5 - 2 hrs, although there's about a 30 min variation
between different runs.

If I build a btrfs file system across the 5-way lvm stripe, I get even worse
performance at around 2.5 hrs per build, with about a 45 min variation
between runs.

I can't explain these last two results. Any theories?

--rich
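For reference, raw-throughput checks of the kind described above usually look
something like this; the disk and volume-group names are placeholders:

  hdparm -t /dev/sda                                                  # buffered read timing, one raw disk
  dd if=/dev/sda of=/dev/null bs=1M count=10240 iflag=direct          # ~10G sequential read, raw disk
  dd if=/dev/vg0/stripe of=/dev/null bs=1M count=10240 iflag=direct   # same read over the lvm stripe

Near-linear scaling from one disk to the 5-way stripe on these sequential
reads is what the post describes; a parallel compile generates a much more
random I/O pattern, which is where the later discussion picks up.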
K. Richard Pixley wrote:
> Once again I'm stumped by some performance numbers and hoping for some
> insight.
>
> Using an 8-core server, building in parallel, I'm building some code.
> Using ext2 over a 5-way, (5 disk), lvm partition, I can build that code
> in 35 minutes. Tests with dd on the raw disk and lvm partitions show me
> that I'm getting near linear improvement from the raw stripe, even with
> dd runs exceeding 10G, so I think that convinces me that my disks and
> controller subsystem are capable of operating in parallel and in
> concert. hdparm -t numbers seem to support what I'm seeing from dd.
>
> Running the same build, same parallelism, over a btrfs (defaults)
> partition on a single drive, I'm seeing very consistent build times
> around an hour, which is reasonable. I get a little under an hour on
> ext4 single disk, again, very consistently.
>
> However, if I build a btrfs file system across the 5 disks, my build
> times climb to around 1.5 - 2hrs, although there's about a 30min
> variation between different runs.
>
> If I build a btrfs file system across the 5-way lvm stripe, I get even
> worse performance at around 2.5hrs per build, with about a 45min
> variation between runs.
>
> I can't explain these last two results. Any theories?

If you just want theory, I can try. :-)

Theory of striping follows (numbers invented). If you have a stripe size of
8 sectors, 40 successive sectors are divided into 5 groups of 8 sectors,
with each group on a different disk.

Suppose you want to read 40 sectors; with one disk and no striping you need:

  time_to_place_disk_head_and_rotational_latency (10ms) + time to read 40 sectors (around 0ms)

Suppose you want to read 40 sectors with your 5-disk striped volume; you need:

  time_to_place_disk_head_and_rotational_latency (10ms) + time to read 8 sectors (around 0ms)
  time_to_place_disk_head_and_rotational_latency (10ms) + time to read 8 sectors (around 0ms)
  time_to_place_disk_head_and_rotational_latency (10ms) + time to read 8 sectors (around 0ms)
  time_to_place_disk_head_and_rotational_latency (10ms) + time to read 8 sectors (around 0ms)
  time_to_place_disk_head_and_rotational_latency (10ms) + time to read 8 sectors (around 0ms)

so you are 5 times slower.

Now, it could be that you submit the 5 requests together; in that case you
do not pay a 5-times penalty, but a 2-times penalty. Why? Because of
rotational latency: if a disk takes 10ms to do one rotation, your data will
be ready after a random time uniformly distributed between 0 and 10ms
(average 5ms). If you submit 5 commands to 5 disks, each of them will have
an (independent!) flat distribution between 0 and 10ms; since you need all
5 pieces, you have to wait for the unluckiest of the disks, so your average
will be near 10ms.

So in general striping costs you a 2x to 5x speed penalty.

If your build is really parallel, when one process is waiting for data,
another one will make requests. But remember that all the disks are busy
because of the first process, so it is not unreasonable for the additional
processes to gain no speed at all. In reality, the first 5 requests and the
second 5 could be evaluated at the same time to give precedence to whichever
of the two is easier for the drive (maybe the second one is lucky from a
rotational point of view, so it is better to do it before the first).
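A one-line check of the "near 10ms" claim, under the same idealised
assumptions (n independent latencies, each uniform on [0, T]): the expected
maximum is E[max] = n/(n+1) * T, so for n = 5 and T = 10ms that is about
8.3ms, versus 5ms expected for a single request on a single disk - roughly
the 2x penalty described above.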
In this case the disks are better utilized, but the net effect on the overall
build is not so easy to establish, because when you give precedence to the
second request you are delaying the first, so the entire first 40-sector read
could have worse timing than 0-10ms_almost_surely_10, and can get
0-20ms_maybe_15.

There is a lot of maths you can study (queueing theory and scheduling
algorithms) and a lot of factors can be important (disk queue size, NCQ,
caching) at various levels (O.S., controller, disk).

In my opinion, the basic rule in these cases should be:

  === use a stripe size bigger than the sizes of your random reads ===

In case of seeky load I would personally use a stripe size of 64MiB, for
example. One read should only involve one disk. Stripe size is often
configured with very small values (such as 4KiB), because it produces very
big numbers when you read sequentially (as you are really using all the
disks together). But latency sucks.

In your case, the build is probably very seeky, and the "seekiness" could be
exacerbated by having many writes (and things become even worse when the
filesystems involve a journal...).

(sorry for the long mail. you asked for a theory :-) )

-- 
Roberto Ragusa
mail at robertoragusa.it
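As a hedged illustration of that suggestion (volume-group, volume, and size
below are placeholders, and the maximum stripe size the tools accept depends
on the LVM version and metadata format in use):

  # 5-way LVM stripe with a 1MiB stripe size rather than the usual small default;
  # larger values, where the tools allow them, are requested the same way via -I (in KiB)
  lvcreate -n buildvol -L 200G -i 5 -I 1024 vg0

The trade-off is exactly the one described: a large stripe keeps a single
random read on one disk (better latency), while a small stripe spreads even
small reads across all spindles (better sequential throughput numbers).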
<snip a lot of fancy math that missed the point>

That's all well and good, but you missed the part where he said ext2 on a
5-way LVM stripeset is many times faster than btrfs on a 5-way btrfs
stripeset.

IOW, same 5-way stripeset, different filesystems and volume managers, and
very different performance.

And he's wondering why the btrfs method used for striping is so much slower
than the lvm method used for striping.

-- 
Freddie Cash
fjwcash@gmail.com
Freddie Cash wrote:
> <snip a lot of fancy math that missed the point>
>
> That's all well and good, but you missed the part where he said ext2
> on a 5-way LVM stripeset is many times faster than btrfs on a 5-way
> btrfs stripeset.
>
> IOW, same 5-way stripeset, different filesystems and volume managers,
> and very different performance.
>
> And he's wondering why the btrfs method used for striping is so much
> slower than the lvm method used for striping.

Sorry, I missed the first line where ext2 on 5-disk lvm is said to be fast.
I was commenting as if the last two results were ext2 on 5 disks and btrfs
on 5 disks.

I'd say the great variation between successive runs is important and seems
to point to some bigger problem.

-- 
Roberto Ragusa
mail at robertoragusa.it
On Wed, Jun 16, 2010 at 7:08 PM, K. Richard Pixley <rich@noir.com> wrote:
> Once again I'm stumped by some performance numbers and hoping for some
> insight.
>
> Using an 8-core server, building in parallel, I'm building some code. Using
> ext2 over a 5-way, (5 disk), lvm partition, I can build that code in 35
> minutes. Tests with dd on the raw disk and lvm partitions show me that I'm
> getting near linear improvement from the raw stripe, even with dd runs
> exceeding 10G, so I think that convinces me that my disks and controller
> subsystem are capable of operating in parallel and in concert. hdparm -t
> numbers seem to support what I'm seeing from dd.
>
> Running the same build, same parallelism, over a btrfs (defaults) partition
> on a single drive, I'm seeing very consistent build times around an hour,
> which is reasonable. I get a little under an hour on ext4 single disk,
> again, very consistently.
>
> However, if I build a btrfs file system across the 5 disks, my build times
> climb to around 1.5 - 2hrs, although there's about a 30min variation
> between different runs.
>
> If I build a btrfs file system across the 5-way lvm stripe, I get even worse
> performance at around 2.5hrs per build, with about a 45min variation between
> runs.
>
> I can't explain these last two results. Any theories?

Try mounting the BTRFS filesystem with 'nobarrier', since this may be an
obvious difference. Also, for metadata-write-intensive workloads, when
creating the filesystem try 'mkfs.btrfs -m single'.

Of course, all this doesn't explain the variance. I'd say it's worth
employing 'blktrace' to see what's happening at a lower level, and even e.g.
varying between the deadline and CFQ I/O schedulers.

Daniel
-- 
Daniel J Blueman
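A minimal sketch of those suggestions, assuming a throwaway test disk
/dev/sdb (the device name and mount point are placeholders):

  mkfs.btrfs -m single /dev/sdb                     # store metadata singly instead of duplicated
  mount -o nobarrier /dev/sdb /mnt/test             # disable write barriers for the test
  cat /sys/block/sdb/queue/scheduler                # see which I/O scheduler is active
  echo deadline > /sys/block/sdb/queue/scheduler    # try deadline instead of cfq

nobarrier trades crash safety for speed, so it only makes sense as a
diagnostic here, not as a production setting.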
On 16/06/2010 21:35, Freddie Cash wrote:
> <snip a lot of fancy math that missed the point>
>
> That's all well and good, but you missed the part where he said ext2
> on a 5-way LVM stripeset is many times faster than btrfs on a 5-way
> btrfs stripeset.
>
> IOW, same 5-way stripeset, different filesystems and volume managers,
> and very different performance.
>
> And he's wondering why the btrfs method used for striping is so much
> slower than the lvm method used for striping.

This could easily be explained by Roberto's theory and maths - if the lvm
stripe set used large stripe sizes so that the random reads were mostly read
from a single disk, it would be fast. If the btrfs stripes were small, then
it would be slow due to all the extra seeks.

Do we know anything about the stripe sizes used?
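For what it's worth, the LVM side of that question can be answered with
something like the following (volume names are placeholders; reporting field
names vary a little between LVM versions):

  lvs --segments -o lv_name,stripes,stripesize vg0
  lvdisplay -m /dev/vg0/buildvol        # the -m/--maps output lists stripes and stripe size

The btrfs side has no user-set stripe size in the same sense: the allocator
spreads data across the devices itself, so the comparison is not simply one
tunable against another.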
On Wed, Jun 16, 2010 at 11:08:48AM -0700, K. Richard Pixley wrote:
> Once again I'm stumped by some performance numbers and hoping for
> some insight.
>
> Using an 8-core server, building in parallel, I'm building some
> code. Using ext2 over a 5-way, (5 disk), lvm partition, I can build
> that code in 35 minutes. Tests with dd on the raw disk and lvm
> partitions show me that I'm getting near linear improvement from the
> raw stripe, even with dd runs exceeding 10G, so I think that
> convinces me that my disks and controller subsystem are capable of
> operating in parallel and in concert. hdparm -t numbers seem to
> support what I'm seeing from dd.
>
> Running the same build, same parallelism, over a btrfs (defaults)
> partition on a single drive, I'm seeing very consistent build times
> around an hour, which is reasonable. I get a little under an hour
> on ext4 single disk, again, very consistently.
>
> However, if I build a btrfs file system across the 5 disks, my build
> times climb to around 1.5 - 2hrs, although there's about a 30min
> variation between different runs.
>
> If I build a btrfs file system across the 5-way lvm stripe, I get
> even worse performance at around 2.5hrs per build, with about a
> 45min variation between runs.
>
> I can't explain these last two results. Any theories?

I suspect they come down to different raid levels done by btrfs, and maybe
barriers. By default btrfs will duplicate metadata, so ext2 is doing much
less metadata IO than btrfs does.

Try:

  mkfs.btrfs -m raid0 -d raid0 /dev/xxx /dev/xxy ...

Then try:

  mount -o nobarrier /dev/xxx /mnt

Someone else mentioned blktrace, it would help explain things if you're
interested in tracking this down.

-chris
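Since blktrace comes up twice in the thread, a minimal example of running it
during one of the slow builds (the device name is a placeholder; the trace
can be watched live or captured to files for later analysis):

  blktrace -d /dev/sdc -o - | blkparse -i -     # live trace piped straight into blkparse
  # or capture for ten minutes and parse afterwards:
  blktrace -d /dev/sdc -w 600 -o buildtrace
  blkparse -i buildtrace

Comparing traces from the single-disk and 5-disk btrfs runs should show
whether the extra time is going into barrier flushes, duplicated metadata
writes, or plain seek traffic.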