Hello,

I've migrated my system to btrfs (raid1) a few months ago. Since then the performance has been pretty bad, but recently it's gotten unbearable: a simple sync called while the system is idle can take 20 up to 60 seconds. Creating or deleting files often has several seconds latency, too.

One curious - but maybe unrelated - observation is that even though I'm using a raid1 btrfs setup, the hdds are often being written to sequentially. One hard-drive sees some write activity and after it subsides, the other drive sees some activity. (See attached sequential-writes.txt.)

- 64bit gentoo with vanilla 2.6.39 kernel
- lzo compression enabled
- 2x WD1000FYPS (1TB WD hdds)
- Athlon x2 2.2GHz with 8GB RAM
- space_cache was enabled, but it seemed to make the problem worse. It's no longer in the mount options.

Any help is appreciated.

Thanks,
Henning

server ~ # sync; time sync

real    0m28.869s
user    0m0.000s
sys     0m5.750s

server ~ # uname -a
Linux server 2.6.39 #3 SMP Sat May 28 17:25:31 CEST 2011 x86_64 AMD Athlon(tm) 64 X2 Dual Core Processor 4200+ AuthenticAMD GNU/Linux

server ~ # mount | grep btrfs
/dev/sdb2 on / type btrfs (rw,noatime,compress=lzo,noacl)
/dev/sda2 on /mnt/pool type btrfs (rw,noatime,subvolid=0,compress=lzo)
/dev/sda2 on /usr/portage type btrfs (rw,noatime,subvol=newportage,compress=lzo)
/dev/sda2 on /home type btrfs (rw,noatime,subvol=home,compress=lzo)
/dev/sda2 on /home/mythtv type btrfs (rw,noatime,subvol=mythtv,compress=lzo)

server ~ # btrfs fi show
Label: none  uuid: 7676eb78-e411-4505-ac51-ccd12aa5a6b6
        Total devices 2 FS bytes used 281.58GB
        devid    1 size 931.28GB used 898.26GB path /dev/sda2
        devid    3 size 931.27GB used 898.26GB path /dev/sdb2

Btrfs v0.19-35-g1b444cd-dirty

server ~ # btrfs fi df /
Data, RAID1: total=875.00GB, used=279.30GB
System, RAID1: total=8.00MB, used=140.00KB
System: total=4.00MB, used=0.00
Metadata, RAID1: total=23.25GB, used=2.28GB

bonnie++
Version  1.96       ------Sequential Output------ --Sequential Input- --Random-
Concurrency   1     -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks--
Machine        Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP  /sec %CP
server          16G   147  90 76321  18 31787  16  1370  71 64812  14  27.0  66
Latency             66485us    7581ms    4455ms   25011us     695ms     959ms
Version  1.96       ------Sequential Create------ --------Random Create--------
server              -Create-- --Read--- -Delete-- -Create-- --Read--- -Delete--
              files  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP
                 16   238  51 +++++ +++   219  51   284  52 +++++ +++   390  57
Latency              1914ms     524us    3461ms    1141ms      39us    1308ms
1.96,1.96,server,1,1308618030,16G,,147,90,76321,18,31787,16,1370,71,64812,14,27.0,66,16,,,,,238,51,+++++,+++,219,51,284,52,+++++,+++,390,57,66485us,7581ms,4455ms,25011us,695ms,959ms,1914ms,524us,3461ms,1141ms,39us,1308ms
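The alternating write pattern described above (one drive busy, then the other) could be checked by sampling the per-device write counters in /proc/diskstats. The following is only a rough sketch, not something from the original report: the helper name `sectors_written` is made up for illustration, and it relies on the documented diskstats layout in which the tenth whitespace-separated column is the number of sectors written.

```shell
# Hypothetical helper: read /proc/diskstats-formatted text on stdin and
# print "<device> <sectors_written>" for each device named as an argument.
# Column 3 is the device name, column 10 the sectors-written counter.
sectors_written() {
    awk -v devs="$*" '
        BEGIN {
            n = split(devs, d, " ")
            for (i = 1; i <= n; i++) want[d[i]] = 1
        }
        ($3 in want) { print $3, $10 }
    '
}

# Example: sample both raid1 members once per second; the differences
# between consecutive samples show which drive is being written to.
#   while sleep 1; do sectors_written sda sdb < /proc/diskstats; done
```

If the counters for sda and sdb grow in turns rather than together, that would match the behaviour in sequential-writes.txt.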
On 06/20/2011 05:51 PM, Henning Rohlfs wrote:
> Hello,
>
> I've migrated my system to btrfs (raid1) a few months ago. Since then
> the performance has been pretty bad, but recently it's gotten
> unbearable: a simple sync called while the system is idle can take 20 up
> to 60 seconds. Creating or deleting files often has several seconds
> latency, too.
>
> One curious - but maybe unrelated - observation is that even though I'm
> using a raid1 btrfs setup, the hdds are often being written to
> sequentially. One hard-drive sees some write activity and after it
> subsides, the other drive sees some activity. (See attached
> sequential-writes.txt.)
>

Can you do sysrq+w while this is happening so we can see who is doing the writing? Thanks,

Josef
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
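For readers without a keyboard attached to the machine, the sysrq+w request can also be issued by writing the letter `w` to /proc/sysrq-trigger as root, which makes the kernel log the tasks that are currently blocked. The snippet below is a minimal sketch under that assumption; the function name `dump_blocked_tasks` and its optional path argument are illustrative only and were not part of the thread.

```shell
# Ask the kernel to log all blocked (uninterruptible) tasks, which is
# what the sysrq+w key combination does. Needs root on a real system.
# The optional argument exists only so the function can be tried against
# a scratch file; by default it writes to the real trigger.
dump_blocked_tasks() {
    trigger=${1:-/proc/sysrq-trigger}
    echo w > "$trigger"
}

# Possible use while a stall is in progress:
#   sync &                        # provoke the stall
#   sleep 5; dump_blocked_tasks   # log who is blocked right now
#   dmesg | tail -n 200 > sysrq-w.txt
```

The resulting kernel log excerpt is the kind of output that can then be attached to a reply.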
On Mon, 20 Jun 2011 20:12:16 -0400, Josef Bacik wrote:
> On 06/20/2011 05:51 PM, Henning Rohlfs wrote:
>> Hello,
>>
>> I've migrated my system to btrfs (raid1) a few months ago. Since then
>> the performance has been pretty bad, but recently it's gotten
>> unbearable: a simple sync called while the system is idle can take 20 up
>> to 60 seconds. Creating or deleting files often has several seconds
>> latency, too.
>>
>> One curious - but maybe unrelated - observation is that even though I'm
>> using a raid1 btrfs setup, the hdds are often being written to
>> sequentially. One hard-drive sees some write activity and after it
>> subsides, the other drive sees some activity. (See attached
>> sequential-writes.txt.)
>>
>
> Can you do sysrq+w while this is happening so we can see who is doing
> the writing? Thanks,
>
> Josef

When I call sync, it starts with several seconds of 100% (one core) cpu usage by sync itself. Afterwards btrfs-submit-0 and sync are blocked. sysrq+w output is attached.
Henning Rohlfs wrote (ao):
> - space_cache was enabled, but it seemed to make the problem worse.
> It's no longer in the mount options.

space_cache is a one-time mount option: mounting with it once enables the space cache permanently. Not supplying it anymore as a mount option has no effect (dmesg | grep btrfs).

Sander

--
Humilis IT Services and Solutions
http://www.humilis.net
On Tue, 21 Jun 2011 10:00:59 +0200, Sander wrote:
> Henning Rohlfs wrote (ao):
>> - space_cache was enabled, but it seemed to make the problem worse.
>> It's no longer in the mount options.
>
> space_cache is a one time mount option which enabled space_cache. Not
> supplying it anymore as a mount option has no effect (dmesg | grep
> btrfs).

I'm sure that after the first reboot after removing the flag from the mount options, the system was faster for a while. That must have been a coincidence (or just an error on my part).

Anyway, I rebooted with clear_cache as mount option and there was no improvement either.

Thanks for pointing this out,
Henning
On 06/21/2011 05:26 AM, Henning Rohlfs wrote:
> On Tue, 21 Jun 2011 10:00:59 +0200, Sander wrote:
>> Henning Rohlfs wrote (ao):
>>> - space_cache was enabled, but it seemed to make the problem worse.
>>> It's no longer in the mount options.
>>
>> space_cache is a one time mount option which enabled space_cache. Not
>> supplying it anymore as a mount option has no effect (dmesg | grep
>> btrfs).
>
> I'm sure that after the first reboot after removing the flag from the
> mount options, the system was faster for a while. That must have been a
> coincidence (or just an error on my part).
>

No, the space cache will make your system faster _after_ having been enabled once. The reason is that we have to build the cache the slow way at first, and after that we can do it the fast way. What is probably happening is that your box is slowing down while trying to build this cache. Don't mount with clear_cache unless there is a bug in your cache. Let it do its thing and stuff will get faster. Thanks,

Josef
On Mon, 2011-06-20 at 23:51 +0200, Henning Rohlfs wrote:
> Hello,
>
> I've migrated my system to btrfs (raid1) a few months ago. Since then
> the performance has been pretty bad, but recently it's gotten
> unbearable: a simple sync called while the system is idle can take 20 up
> to 60 seconds. Creating or deleting files often has several seconds
> latency, too.

I think I’ve been seeing a fairly similar, or possibly the same? issue as well. It looks like it’s actually a regression introduced in 2.6.39 - if I switch back to a 2.6.38 kernel, my latency issues magically go away! (I'm curious: does using the older 2.6.38.x kernel help with anyone else that's seeing the issue?)

Some hardware/configuration details:
btrfs on a single disc (Seagate Momentus XT hybrid), lzo compression and space cache enabled. Some snapshots in use.

I notice that in latencytop I'm seeing a lot of lines with (cropped) traces like

    sleep_on_page wait_on_page_bit read_extent_buffer_      13.3 msec    0.5 %

showing up that I didn't see with the 2.6.38 kernel. I occasionally see latencies as bad as 20-30 seconds on operations like fsync or synchronous writes.

I think I can reproduce the issue well enough to bisect it, so I might give that a try. It'll be slow going, though.

--
Calvin Walton <calvin.walton@kepstin.ca>
On Tue, 21 Jun 2011 11:18:30 -0400, Josef Bacik wrote:
> On 06/21/2011 05:26 AM, Henning Rohlfs wrote:
>> On Tue, 21 Jun 2011 10:00:59 +0200, Sander wrote:
>>> Henning Rohlfs wrote (ao):
>>>> - space_cache was enabled, but it seemed to make the problem worse.
>>>> It's no longer in the mount options.
>>>
>>> space_cache is a one time mount option which enabled space_cache. Not
>>> supplying it anymore as a mount option has no effect (dmesg | grep
>>> btrfs).
>>
>> I'm sure that after the first reboot after removing the flag from the
>> mount options, the system was faster for a while. That must have been a
>> coincidence (or just an error on my part).
>>
>
> No, the space cache will make your system faster _after_ having been
> enabled once. The reason for this is because we have to build the cache
> the slow way at first, and then after that we can do it the fast way.
> What is probably happening is your box is slowing down trying to build
> this cache. Don't mount with clear_cache unless there is a bug in your
> cache. Let it do its thing and stuff will get faster.

I'm just reporting what I experienced. I had space_cache in the mount options while the problem developed and removed it when the system got too slow. After the next reboot the system was responsive for a short time (an hour maybe - which, from what you described, seems to have been unrelated to the mount option). Now there's no difference whatsoever between no options, space_cache and clear_cache.

To sum it up: I only played with the clear_cache option because the system got too slow in the first place. I don't see how the problem can be related to this option if changing it makes no difference.

Thanks,
Henning
On Tue, 21 Jun 2011 11:24:11 -0400, Calvin Walton wrote:
> On Mon, 2011-06-20 at 23:51 +0200, Henning Rohlfs wrote:
>> Hello,
>>
>> I've migrated my system to btrfs (raid1) a few months ago. Since then
>> the performance has been pretty bad, but recently it's gotten
>> unbearable: a simple sync called while the system is idle can take 20 up
>> to 60 seconds. Creating or deleting files often has several seconds
>> latency, too.
>
> I think I’ve been seeing a fairly similar, or possibly the same? issue
> as well. It looks like it’s actually a regression introduced in 2.6.39 -
> if I switch back to a 2.6.38 kernel, my latency issues magically go
> away! (I'm curious: does using the older 2.6.38.x kernel help with
> anyone else that's seeing the issue?)
>
> Some hardware/configuration details:
> btrfs on a single disc (Seagate Momentus XT hybrid), lzo compression and
> space cache enabled. Some snapshots in use.
>
> I notice that in latencytop I'm seeing a lot of lines with (cropped)
> traces like
>
> sleep_on_page wait_on_page_bit read_extent_buffer_ 13.3 msec 0.5 %
>
> showing up that I didn't see with the 2.6.38 kernel. I occasionally see
> latencies as bad as 20-30 seconds on operations like fsync or
> synchronous writes.
>
> I think I can reproduce the issue well enough to bisect it, so I might
> give that a try. It'll be slow going, though.

You are right. This seems to be a regression in the .39 kernel. I tested with 2.6.38.2 just now and the performance is back to normal.

Thanks,
Henning

server ~ # uname -a
Linux server 2.6.38.2 #1 SMP Thu Apr 14 13:05:35 CEST 2011 x86_64 AMD Athlon(tm) 64 X2 Dual Core Processor 4200+ AuthenticAMD GNU/Linux

server ~ # sync; time sync

real    0m0.144s
user    0m0.000s
sys     0m0.020s

server ~ # bonnie++ -d tmp -u 0:0
Version  1.96       ------Sequential Output------ --Sequential Input- --Random-
Concurrency   1     -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks--
Machine        Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP  /sec %CP
server          16G   147  97 279933  56 73245  34  1258  78 102379  23 177.3  50
Latency               423ms     103ms     645ms     163ms     404ms     264ms
Version  1.96       ------Sequential Create------ --------Random Create--------
server              -Create-- --Read--- -Delete-- -Create-- --Read--- -Delete--
              files  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP
                 16  3784  28 +++++ +++  8519  60 13694  59 +++++ +++ 11710  76
Latency               127ms    1024us   18718us   15958us     119us    2459us
1.96,1.96,server,1,1308745595,16G,,147,97,279933,56,73245,34,1258,78,102379,23,177.3,50,16,,,,,3784,28,+++++,+++,8519,60,13694,59,+++++,+++,11710,76,423ms,103ms,645ms,163ms,404ms,264ms,127ms,1024us,18718us,15958us,119us,2459us
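To put the two sets of measurements side by side, the `time` and bonnie++ figures posted for 2.6.39 and 2.6.38.2 can be turned into ratios. This is only a sketch of the arithmetic; the helper names `to_seconds` and `ratio` are made up for illustration, and the input numbers are copied from the two posts in this thread.

```shell
# Convert a `time`-style value such as "0m28.869s" into seconds.
to_seconds() {
    printf '%s\n' "$1" | awk -F'[ms]' '{ printf "%.3f\n", $1 * 60 + $2 }'
}

# Print a / b rounded to one decimal place.
ratio() {
    awk -v a="$1" -v b="$2" 'BEGIN { printf "%.1f\n", a / b }'
}

# sync wall-clock time, 2.6.39 vs 2.6.38.2
ratio "$(to_seconds 0m28.869s)" "$(to_seconds 0m0.144s)"   # prints 200.5
# bonnie++ sequential block write, K/sec (2.6.38.2 vs 2.6.39)
ratio 279933 76321                                          # prints 3.7
# bonnie++ random seeks per second (2.6.38.2 vs 2.6.39)
ratio 177.3 27.0                                            # prints 6.6
# bonnie++ sequential file creates per second (2.6.38.2 vs 2.6.39)
ratio 3784 238                                              # prints 15.9
```

On these numbers the idle sync is roughly 200 times slower on 2.6.39, while block write throughput differs by a factor of about 3.7.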
On 06/22/2011 10:15 AM, Henning Rohlfs wrote:
> On Tue, 21 Jun 2011 11:24:11 -0400, Calvin Walton wrote:
>> On Mon, 2011-06-20 at 23:51 +0200, Henning Rohlfs wrote:
>>> Hello,
>>>
>>> I've migrated my system to btrfs (raid1) a few months ago. Since then
>>> the performance has been pretty bad, but recently it's gotten
>>> unbearable: a simple sync called while the system is idle can take
>>> 20 up
>>> to 60 seconds. Creating or deleting files often has several seconds
>>> latency, too.
>>
>> I think I’ve been seeing a fairly similar, or possibly the same? issue
>> as well. It looks like it’s actually a regression introduced in 2.6.39 -
>> if I switch back to a 2.6.38 kernel, my latency issues magically go
>> away! (I'm curious: does using the older 2.6.38.x kernel help with
>> anyone else that's seeing the issue?)
>>
>> Some hardware/configuration details:
>> btrfs on a single disc (Seagate Momentus XT hybrid), lzo compression and
>> space cache enabled. Some snapshots in use.
>>
>> I notice that in latencytop I'm seeing a lot of lines with (cropped)
>> traces like
>>
>> sleep_on_page wait_on_page_bit read_extent_buffer_ 13.3 msec
>> 0.5 %
>>
>> showing up that I didn't see with the 2.6.38 kernel. I occasionally see
>> latencies as bad as 20-30 seconds on operations like fsync or
>> synchronous writes.
>>
>> I think I can reproduce the issue well enough to bisect it, so I might
>> give that a try. It'll be slow going, though.
>
> You are right. This seems to be a regression in the .39 kernel. I tested
> with 2.6.38.2 just now and the performance is back to normal.

Would you mind bisecting? You can make it go faster by doing

    git bisect start fs/

so that only changes in fs/ are considered. Thanks,

Josef
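The bisect suggested here could look roughly like the following when run inside a kernel git checkout. This is a hedged sketch rather than anyone's actual procedure: the helper `bisect_plan` is hypothetical and only prints the commands for review, and the tag names v2.6.38 and v2.6.39 are assumed to be the good and bad endpoints based on the reports above.

```shell
# Hypothetical helper: print (not run) the git commands that start a
# bisect between a known-good and a known-bad revision. An optional third
# argument limits the bisect to commits touching that path (e.g. fs/).
bisect_plan() {
    good=$1
    bad=$2
    path=${3:-}
    if [ -n "$path" ]; then
        echo "git bisect start -- $path"
    else
        echo "git bisect start"
    fi
    echo "git bisect bad $bad"
    echo "git bisect good $good"
}

# Restricted to filesystem changes, as suggested in the thread:
bisect_plan v2.6.38 v2.6.39 fs/
# Whole tree, in case the regression is outside fs/:
bisect_plan v2.6.38 v2.6.39
```

After each step git checks out a candidate commit; one would build and boot that kernel, repeat the `sync; time sync` test, and then mark the result with `git bisect good` or `git bisect bad` until git names the first bad commit.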
On Wed, 2011-06-22 at 11:39 -0400, Josef Bacik wrote:
> On 06/22/2011 10:15 AM, Henning Rohlfs wrote:
> > On Tue, 21 Jun 2011 11:24:11 -0400, Calvin Walton wrote:
> >> On Mon, 2011-06-20 at 23:51 +0200, Henning Rohlfs wrote:
> >>> Hello,
> >>>
> >>> I've migrated my system to btrfs (raid1) a few months ago. Since then
> >>> the performance has been pretty bad, but recently it's gotten
> >>> unbearable: a simple sync called while the system is idle can take
> >>> 20 up
> >>> to 60 seconds. Creating or deleting files often has several seconds
> >>> latency, too.
> >>
> >> I think I’ve been seeing a fairly similar, or possibly the same? issue
> >> as well. It looks like it’s actually a regression introduced in 2.6.39 -
> >> if I switch back to a 2.6.38 kernel, my latency issues magically go
> >> away! (I'm curious: does using the older 2.6.38.x kernel help with
> >> anyone else that's seeing the issue?)

> >> I think I can reproduce the issue well enough to bisect it, so I might
> >> give that a try. It'll be slow going, though.
> >
> > You are right. This seems to be a regression in the .39 kernel. I tested
> > with 2.6.38.2 just now and the performance is back to normal.
>
> Would you mind bisecting?

Just before I was going to try bisecting, I tried the 3.0-rc4 kernel out of curiosity. And it seems to be quite a bit better; at the very least, I’m not seeing gui applications stalling for ~10 seconds when doing things like opening or writing files. Latencytop is reporting fsync() latencies staying pretty steady in the range of under 300ms, with occasional outliers at up to 2s, and it's not getting worse with time.

I'll still look into doing a bisect between 2.6.38 and 2.6.39, I'm curious what went wrong.

--
Calvin Walton <calvin.walton@kepstin.ca>
On 06/22/2011 11:57 AM, Calvin Walton wrote:
> On Wed, 2011-06-22 at 11:39 -0400, Josef Bacik wrote:
>> On 06/22/2011 10:15 AM, Henning Rohlfs wrote:
>>> On Tue, 21 Jun 2011 11:24:11 -0400, Calvin Walton wrote:
>>>> On Mon, 2011-06-20 at 23:51 +0200, Henning Rohlfs wrote:
>>>>> Hello,
>>>>>
>>>>> I've migrated my system to btrfs (raid1) a few months ago. Since then
>>>>> the performance has been pretty bad, but recently it's gotten
>>>>> unbearable: a simple sync called while the system is idle can take
>>>>> 20 up
>>>>> to 60 seconds. Creating or deleting files often has several seconds
>>>>> latency, too.
>>>>
>>>> I think I’ve been seeing a fairly similar, or possibly the same? issue
>>>> as well. It looks like it’s actually a regression introduced in 2.6.39 -
>>>> if I switch back to a 2.6.38 kernel, my latency issues magically go
>>>> away! (I'm curious: does using the older 2.6.38.x kernel help with
>>>> anyone else that's seeing the issue?)
>
>>>> I think I can reproduce the issue well enough to bisect it, so I might
>>>> give that a try. It'll be slow going, though.
>>>
>>> You are right. This seems to be a regression in the .39 kernel. I tested
>>> with 2.6.38.2 just now and the performance is back to normal.
>>
>> Would you mind bisecting?
>
> Just before I was going to try bisecting, I tried the 3.0-rc4 kernel out
> of curiosity. And it seems to be quite a bit better; at the very least,
> I’m not seeing gui applications stalling for ~10 seconds when doing
> things like opening or writing files. Latencytop is reporting fsync()
> latencies staying pretty steady in the range of under 300ms, with
> occasional outliers at up to 2s, and it's not getting worse with time.
>
> I'll still look into doing a bisect between 2.6.38 and 2.6.39, I'm
> curious what went wrong.
>

Yeah that makes two of us :). There were some other plugging changes that went into 38, so maybe bisect all of the kernel, not just fs/, just in case it was those and not us. Thanks,

Josef