Hi,

we are running a Ceph cluster with btrfs as its base filesystem (kernel 3.0). At the beginning everything worked very well, but after a few days (2-3) things get very slow.

When I look at the object store servers I see heavy disk I/O on the btrfs filesystems (disk utilization is between 60% and 100%). I also did some tracing on the Ceph object store daemon, but I'm quite certain that the majority of the disk I/O is not caused by Ceph or any other userland process.

When I reboot the system(s) the problems go away for another 2-3 days, but after that it starts again. I'm not sure if the problem is related to the kernel warning I reported last week; at least there is no temporal relationship between the warning and the slowdown.

Any hints on how to trace this would be welcome.

Thanks,
Christian
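A quick way to capture per-device utilization figures like the ones quoted above, assuming the sysstat package is installed, is iostat in extended mode; the %util column shows how busy each device was during the sample interval:

    # extended per-device statistics every 5 seconds;
    # note the first report covers the time since boot
    iostat -x 5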
Christian Brunner wrote:
> we are running a ceph cluster with btrfs as its base filesystem
> (kernel 3.0). At the beginning everything worked very well, but after
> a few days (2-3) things are getting very slow.

We get quite a slowdown over time as well, doing rsyncs to different snapshots. Btrfs seems to go from using several threads in parallel (btrfs-endio-0,1,2, as shown in top) to just using a single thread (btrfs-delalloc).

Jeremy
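To get a point-in-time view of which btrfs worker threads exist, without watching top interactively, listing the kernel thread names works; they are the same names top shows (btrfs-endio-*, btrfs-delalloc, etc.):

    # count btrfs kernel worker threads by name
    ps -eo comm | grep '^btrfs' | sort | uniq -c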
Excerpts from Christian Brunner's message of 2011-07-25 03:54:47 -0400:
> Hi,
>
> we are running a ceph cluster with btrfs as its base filesystem
> (kernel 3.0). At the beginning everything worked very well, but after
> a few days (2-3) things are getting very slow.
>
> When I look at the object store servers I see heavy disk I/O on the
> btrfs filesystems (disk utilization is between 60% and 100%). I also
> did some tracing on the Ceph object store daemon, but I'm quite
> certain that the majority of the disk I/O is not caused by ceph or
> any other userland process.
>
> When I reboot the system(s) the problems go away for another 2-3 days,
> but after that it starts again. I'm not sure if the problem is
> related to the kernel warning I reported last week. At least there
> is no temporal relationship between the warning and the slowdown.
>
> Any hints on how to trace this would be welcome.

The easiest way to trace this is with latencytop.

Apply this patch:

http://oss.oracle.com/~mason/latencytop.patch

And then use latencytop -c for a few minutes while the system is slow. Send the output here and hopefully we'll be able to figure it out.

-chris
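A rough sketch of that workflow, assuming the kernel is built from source with CONFIG_LATENCYTOP enabled and the latencytop userspace tool is installed (the source path and the -p strip level of the patch are assumptions):

    cd /usr/src/linux
    wget http://oss.oracle.com/~mason/latencytop.patch
    patch -p1 < latencytop.patch            # apply the tracing patch
    # rebuild and boot the patched kernel, then, while the system is slow:
    latencytop -c > latencytop-output.txt   # collect a few minutes of traces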
2011/7/25 Chris Mason <chris.mason@oracle.com>:
> Excerpts from Christian Brunner's message of 2011-07-25 03:54:47 -0400:
>> Hi,
>>
>> we are running a ceph cluster with btrfs as its base filesystem
>> (kernel 3.0). At the beginning everything worked very well, but after
>> a few days (2-3) things are getting very slow.
>>
>> When I look at the object store servers I see heavy disk I/O on the
>> btrfs filesystems (disk utilization is between 60% and 100%). I also
>> did some tracing on the Ceph object store daemon, but I'm quite
>> certain that the majority of the disk I/O is not caused by ceph or
>> any other userland process.
>>
>> When I reboot the system(s) the problems go away for another 2-3 days,
>> but after that it starts again. I'm not sure if the problem is
>> related to the kernel warning I reported last week. At least there
>> is no temporal relationship between the warning and the slowdown.
>>
>> Any hints on how to trace this would be welcome.
>
> The easiest way to trace this is with latencytop.
>
> Apply this patch:
>
> http://oss.oracle.com/~mason/latencytop.patch
>
> And then use latencytop -c for a few minutes while the system is slow.
> Send the output here and hopefully we'll be able to figure it out.

I've now installed latencytop. Attached are two output files: the first is from yesterday and was created approximately half an hour after the boot. The second one is from today; uptime is 19h. The load on the system is already rising. Disk utilization is approximately at 50%.

Thanks for your help.

Christian
Christian,

Have you checked up on the disks themselves and the hardware? High utilization can mean that the I/O load has increased, but it can also mean that the I/O capacity has decreased. Your traces seem to indicate that a good portion of the time is being spent on commits, which could be waiting on disk. That "wait_for_commit" looks to basically just spin waiting for the commit to complete, and at least one thing that calls it raises a BUG_ON; I'm not sure if it's one you've seen even on 2.6.38.

There could be all sorts of performance-related reasons that aren't specific to btrfs or ceph. On our various systems we've seen things like the raid card module being upgraded in newer kernels and suddenly our disks starting to go into sleep mode after a bit, dirty_ratio causing multiple gigs of memory to sync because it's not optimized for the workload, external SAS enclosures that stop communicating a few days after reboot (but the disks keep working, with sporadic issues), things like patrol read hitting a bad sector on a disk, causing it to go into enhanced error recovery and stop responding, etc.

Maybe you have already tried these things; it's where I would start anyway. Looking at /proc/meminfo, dirty, writeback, swap, etc. both while the system is functioning desirably and when it's misbehaving. Looking at anything else that might be in D state. Looking at not just disk util, but the workload causing it (e.g. was I doing 300 iops previously with an average size of 64k, and now I'm only managing 50 iops at 64k before the disk util reports 100%?). Testing the system in a filesystem-agnostic manner: for example, when performance is bad through btrfs, is performance the same as you got on a fresh boot when testing iops on /dev/sdb or whatever? You're not by chance swapping, after a bit of uptime, on any volume that's shared with the underlying disks that make up your osd, obfuscated by a hardware raid? I didn't see the kernel warning you're referring to, just the ixgbe malloc failure you mentioned the other day.

I do not mean to presume that you have not looked at these things already. I am not very knowledgeable in btrfs specifically, but I would expect any degradation in performance over time to be due to what's on disk (lots of small files, fragmentation, etc.). This is obviously not the case in this situation, since a reboot recovers the performance. I suppose it could also be a memory leak or something similar, but you should be able to detect something like that by monitoring your memory situation, /proc/slabinfo etc.

Just my thoughts, good luck on this. I am currently running 2.6.39.3 (btrfs) on the 7 node cluster I put together, but I just built it and am comparing between various configs. It will be awhile before it is under load for several days straight.

On Wed, Jul 27, 2011 at 2:41 AM, Christian Brunner <chb@muc.de> wrote:
> 2011/7/25 Chris Mason <chris.mason@oracle.com>:
>> Excerpts from Christian Brunner's message of 2011-07-25 03:54:47 -0400:
>>> Hi,
>>>
>>> we are running a ceph cluster with btrfs as its base filesystem
>>> (kernel 3.0). At the beginning everything worked very well, but after
>>> a few days (2-3) things are getting very slow.
>>>
>>> When I look at the object store servers I see heavy disk I/O on the
>>> btrfs filesystems (disk utilization is between 60% and 100%). I also
>>> did some tracing on the Ceph object store daemon, but I'm quite
>>> certain that the majority of the disk I/O is not caused by ceph or
>>> any other userland process.
>>>
>>> When I reboot the system(s) the problems go away for another 2-3 days,
>>> but after that it starts again. I'm not sure if the problem is
>>> related to the kernel warning I reported last week. At least there
>>> is no temporal relationship between the warning and the slowdown.
>>>
>>> Any hints on how to trace this would be welcome.
>>
>> The easiest way to trace this is with latencytop.
>>
>> Apply this patch:
>>
>> http://oss.oracle.com/~mason/latencytop.patch
>>
>> And then use latencytop -c for a few minutes while the system is slow.
>> Send the output here and hopefully we'll be able to figure it out.
>
> I've now installed latencytop. Attached are two output files: The
> first is from yesterday and was created approximately half an hour after
> the boot. The second one is from today, uptime is 19h. The load on the
> system is already rising. Disk utilization is approximately at 50%.
>
> Thanks for your help.
>
> Christian
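A minimal set of commands for the checks suggested above, assuming procps and sysstat are installed; compare the output on a freshly booted node against one that has already degraded:

    # dirty/writeback pages and swap usage
    grep -E 'Dirty|Writeback|Swap' /proc/meminfo
    # anything stuck in uninterruptible sleep (D state)
    ps -eo state,pid,comm | awk '$1 == "D"'
    # iops, average request size and %util per device, every 5 seconds
    iostat -x 5
    # kernel slab usage, largest caches first (or: cat /proc/slabinfo)
    slabtop -o | head -20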
2011/7/28 Marcus Sorensen <shadowsor@gmail.com>:
> Christian,
>
> Have you checked up on the disks themselves and hardware? High
> utilization can mean that the i/o load has increased, but it can also
> mean that the i/o capacity has decreased. Your traces seem to
> indicate that a good portion of the time is being spent on commits,
> that could be waiting on disk. That "wait_for_commit" looks to
> basically just spin waiting for the commit to complete, and at least
> one thing that calls it raises a BUG_ON, not sure if it's one you've
> seen even on 2.6.38.
>
> There could be all sorts of performance related reasons that aren't
> specific to btrfs or ceph, on our various systems we've seen things
> like the raid card module being upgraded in newer kernels and suddenly
> our disks start to go into sleep mode after a bit, dirty_ratio causing
> multiple gigs of memory to sync because its not optimized for the
> workload, external SAS enclosures stop communicating a few days after
> reboot (but the disks keep working with sporadic issues), things like
> patrol read hitting a bad sector on a disk, causing it to go into
> enhanced error recovery and stop responding, etc.

I'm fairly confident that the hardware is OK. We see the problem on four machines. It could be a problem with the hpsa driver/firmware, but we haven't seen this behavior with 2.6.38, and the changes in the hpsa driver are not that big.

> Maybe you have already tried these things. It's where I would start
> anyway. Looking at /proc/meminfo, dirty, writeback, swap, etc both
> while the system is functioning desirably and when it's misbehaving.
> Looking at anything else that might be in D state. Looking at not just
> disk util, but the workload causing it (e.g. Was I doing 300 iops
> previously with an average size of 64k, and now I'm only managing 50
> iops at 64k before the disk util reports 100%?) Testing the system in
> a filesystem-agnostic manner, for example when performance is bad
> through btrfs, is performance the same as you got on fresh boot when
> testing iops on /dev/sdb or whatever? You're not by chance swapping
> after a bit of uptime on any volume that's shared with the underlying
> disks that make up your osd, obfuscated by a hardware raid? I didn't
> see the kernel warning you're referring to, just the ixgbe malloc
> failure you mentioned the other day.

I've looked at most of this. What makes me point to btrfs is that the problem goes away when I reboot one server in our cluster, but persists on the other systems. So it can't be related to the number of requests that come in.

> I do not mean to presume that you have not looked at these things
> already. I am not very knowledgeable in btrfs specifically, but I
> would expect any degradation in performance over time to be due to
> what's on disk (lots of small files, fragmented, etc). This is
> obviously not the case in this situation since a reboot recovers the
> performance. I suppose it could also be a memory leak or something
> similar, but you should be able to detect something like that by
> monitoring your memory situation, /proc/slabinfo etc.

It could be related to a memory leak. The machine has a lot of RAM (24 GB), but we have seen page allocation failures in the ixgbe driver when we are using jumbo frames.

> Just my thoughts, good luck on this. I am currently running 2.6.39.3
> (btrfs) on the 7 node cluster I put together, but I just built it and
> am comparing between various configs. It will be awhile before it is
> under load for several days straight.

Thanks!

When I look at the latencytop results, there is a high latency when calling "btrfs_commit_transaction_async". Isn't "async" supposed to return immediately?

Regards,
Christian
On Thu, 28 Jul 2011, Christian Brunner wrote:
> When I look at the latencytop results, there is a high latency when
> calling "btrfs_commit_transaction_async". Isn't "async" supposed to
> return immediately?

It depends. That function has to block until the commit has started before returning in the case where it creates a new btrfs root (i.e., snapshot creation). Otherwise a subsequent operation (after the ioctl returns) could sneak in before the snapshot is taken. (IIRC there was also another problem with keeping internal structures consistent, though I'm forgetting the details.) And there are a bunch of things btrfs_commit_transaction() does before setting blocked = 1 that can be slow.

There is a fair bit of transaction commit optimization work that should eventually be done here that we sadly haven't had the resources to look at yet.

sage
I can confirm this as well (64-bit, Core i7, single-disk).

> The issue seems to be gone in 3.0.0.

After a few hours of working, 3.0.0 slows down on me too. The performance becomes unusable and a reboot is a must. Certain applications (particularly evolution and firefox) are next to permanently greyed out.

I have had a couple of corrupted tree logs recently and had to use btrfs-zero-log (mentioned in an earlier thread). Otherwise returning to 2.6.38 is the workaround.

~mck

--
"A mind that has been stretched will never return to its original dimension." Albert Einstein
| www.semb.wever.org | www.sesat.no | http://tech.finn.no | http://xss-http-filter.sf.net
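For reference, btrfs-zero-log from btrfs-progs discards the filesystem's tree log; it operates on the raw device and the filesystem must be unmounted first. A sketch with /dev/sdb1 and /mnt/btrfs as placeholder names:

    umount /mnt/btrfs              # filesystem must not be mounted
    btrfs-zero-log /dev/sdb1       # throw away the (possibly corrupt) tree log
    mount /dev/sdb1 /mnt/btrfs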
Hi Christian,

Are you still seeing this slowness?

sage

On Wed, 27 Jul 2011, Christian Brunner wrote:
> 2011/7/25 Chris Mason <chris.mason@oracle.com>:
> > Excerpts from Christian Brunner's message of 2011-07-25 03:54:47 -0400:
> >> Hi,
> >>
> >> we are running a ceph cluster with btrfs as its base filesystem
> >> (kernel 3.0). At the beginning everything worked very well, but after
> >> a few days (2-3) things are getting very slow.
> >>
> >> When I look at the object store servers I see heavy disk I/O on the
> >> btrfs filesystems (disk utilization is between 60% and 100%). I also
> >> did some tracing on the Ceph object store daemon, but I'm quite
> >> certain that the majority of the disk I/O is not caused by ceph or
> >> any other userland process.
> >>
> >> When I reboot the system(s) the problems go away for another 2-3 days,
> >> but after that it starts again. I'm not sure if the problem is
> >> related to the kernel warning I reported last week. At least there
> >> is no temporal relationship between the warning and the slowdown.
> >>
> >> Any hints on how to trace this would be welcome.
> >
> > The easiest way to trace this is with latencytop.
> >
> > Apply this patch:
> >
> > http://oss.oracle.com/~mason/latencytop.patch
> >
> > And then use latencytop -c for a few minutes while the system is slow.
> > Send the output here and hopefully we'll be able to figure it out.
>
> I've now installed latencytop. Attached are two output files: The
> first is from yesterday and was created approximately half an hour after
> the boot. The second one is from today, uptime is 19h. The load on the
> system is already rising. Disk utilization is approximately at 50%.
>
> Thanks for your help.
>
> Christian
Hi Sage,

I did some testing with btrfs-unstable yesterday. With the recent commit from Chris it looks quite good:

"Btrfs: force unplugs when switching from high to regular priority bios"

However I can't test it extensively, because our main environment is on ext4 at the moment.

Regards,
Christian

2011/8/8 Sage Weil <sage@newdream.net>:
> Hi Christian,
>
> Are you still seeing this slowness?
>
> sage
>
> On Wed, 27 Jul 2011, Christian Brunner wrote:
>> 2011/7/25 Chris Mason <chris.mason@oracle.com>:
>> > Excerpts from Christian Brunner's message of 2011-07-25 03:54:47 -0400:
>> >> Hi,
>> >>
>> >> we are running a ceph cluster with btrfs as its base filesystem
>> >> (kernel 3.0). At the beginning everything worked very well, but after
>> >> a few days (2-3) things are getting very slow.
>> >>
>> >> When I look at the object store servers I see heavy disk I/O on the
>> >> btrfs filesystems (disk utilization is between 60% and 100%). I also
>> >> did some tracing on the Ceph object store daemon, but I'm quite
>> >> certain that the majority of the disk I/O is not caused by ceph or
>> >> any other userland process.
>> >>
>> >> When I reboot the system(s) the problems go away for another 2-3 days,
>> >> but after that it starts again. I'm not sure if the problem is
>> >> related to the kernel warning I reported last week. At least there
>> >> is no temporal relationship between the warning and the slowdown.
>> >>
>> >> Any hints on how to trace this would be welcome.
>> >
>> > The easiest way to trace this is with latencytop.
>> >
>> > Apply this patch:
>> >
>> > http://oss.oracle.com/~mason/latencytop.patch
>> >
>> > And then use latencytop -c for a few minutes while the system is slow.
>> > Send the output here and hopefully we'll be able to figure it out.
>>
>> I've now installed latencytop. Attached are two output files: The
>> first is from yesterday and was created approximately half an hour after
>> the boot. The second one is from today, uptime is 19h. The load on the
>> system is already rising. Disk utilization is approximately at 50%.
>>
>> Thanks for your help.
>>
>> Christian
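To check whether a given kernel tree already contains that fix, searching for the commit subject quoted above works (this assumes a local checkout of the btrfs-unstable or mainline git tree):

    git log --oneline --grep='force unplugs when switching from high to regular priority bios'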