Hello everyone,

I've rebased the experimental branch to include most of the
optimizations I've been working on.

The two major changes are doing all extent tree operations in delayed
processing queues and removing many of the blocking points with btree
locks held (a rough illustration of the queuing idea follows this
note).

In addition to smoothing out IO performance, these changes really cut
down on the amount of stack btrfs is using, which is especially
important for kernels with 4k stacks enabled (fedora).

-chris
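A tiny Python sketch of the general "delayed processing" pattern, purely as
an analogy and not btrfs's actual extent tree code: expensive updates are
recorded cheaply and applied later by a worker, so callers never sit on the
slow work with a lock held.

    import queue
    import threading

    class DelayedRefQueue:
        def __init__(self):
            self._pending = queue.Queue()
            self._worker = threading.Thread(target=self._run, daemon=True)
            self._worker.start()

        def add_ref(self, extent, delta):
            # Cheap: just record the intent; no tree walk happens here.
            self._pending.put((extent, delta))

        def _run(self):
            while True:
                extent, delta = self._pending.get()
                self._apply(extent, delta)   # the expensive part, done out of line

        def _apply(self, extent, delta):
            pass  # placeholder for the real reference-count update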
Chris Mason wrote:
> I've rebased the experimental branch to include most of the
> optimizations I've been working on.
>
> The two major changes are doing all extent tree operations in delayed
> processing queues and removing many of the blocking points with btree
> locks held.
>
> In addition to smoothing out IO performance, these changes really cut
> down on the amount of stack btrfs is using, which is especially
> important for kernels with 4k stacks enabled (fedora).

Well, no drastic changes. On RAID, creates got better, but random write
got worse. The mail server workload was mixed. For single disk it is
pretty much the same story, although the CPU savings on writes is
noticeable, at the expense of some throughput.

RAID graphs and links:
http://btrfs.boxacle.net/repository/raid/history/History.html

Single disk graphs and links:
http://btrfs.boxacle.net/repository/single-disk/History/History.html

Steve
On Fri, 2009-03-13 at 17:52 -0500, Steven Pratt wrote:
> Well, no drastic changes. On RAID, creates got better, but random write
> got worse. The mail server workload was mixed. For single disk it is
> pretty much the same story, although the CPU savings on writes is
> noticeable, at the expense of some throughput.

Thanks for running this, but the main performance fixes for your test
are still in testing locally. One thing that makes a huge difference on
the random write run is to mount -o ssd.

-chris
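A minimal sketch of trying the option Chris mentions, assuming a scratch
btrfs device and mount point (both placeholders) and root privileges:

    import subprocess

    DEV = "/dev/sdb1"     # placeholder device
    MNT = "/mnt/btrfs"    # placeholder mount point

    subprocess.run(["umount", MNT], check=False)   # ignore if not mounted
    subprocess.run(["mount", "-t", "btrfs", "-o", "ssd", DEV, MNT], check=True)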
Chris Mason wrote:
> Thanks for running this, but the main performance fixes for your test
> are still in testing locally. One thing that makes a huge difference on
> the random write run is to mount -o ssd.

Tried a run with -o ssd on the RAID system. It made some minor
improvements in random write performance. It helps more with O_DIRECT,
but mainly at the 16-thread count; at 1 and 128 threads it doesn't make
much difference.

Results are syncing now to the boxacle history page:
http://btrfs.boxacle.net/repository/raid/history/History.html

Steve
Dear All,

I was wondering whether the experimental branch is something different,
or whether it is the same as
git://git.kernel.org/pub/scm/linux/kernel/git/mason/btrfs-unstable-standalone.git

That repository does not seem to have any branches other than HEAD and
master.

Thank you,

Best regards,

Grigory Makarevich

Chris Mason wrote:
> I've rebased the experimental branch to include most of the
> optimizations I've been working on.
On Sun, 2009-03-15 at 15:13 -0400, Grigory Makarevich wrote:
> I was wondering whether the experimental branch is something different,
> or whether it is the same as
> git://git.kernel.org/pub/scm/linux/kernel/git/mason/btrfs-unstable-standalone.git
>
> That repository does not seem to have any branches other than HEAD and
> master.

Unfortunately I still need to update the standalone tree. All of these
changes are only in the btrfs-unstable.git tree.

-chris
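A rough sketch of fetching the experimental branch from the full tree,
assuming it lives alongside the standalone repository on kernel.org; the
exact repository path is an inference, not quoted from this thread:

    import subprocess

    # Assumed path, by analogy with the standalone URL above.
    REPO = "git://git.kernel.org/pub/scm/linux/kernel/git/mason/btrfs-unstable.git"

    subprocess.run(["git", "clone", REPO, "btrfs-unstable"], check=True)
    subprocess.run(["git", "checkout", "experimental"], cwd="btrfs-unstable", check=True)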
On Sun, 2009-03-15 at 09:38 -0500, Steven Pratt wrote:
> Tried a run with -o ssd on the RAID system. It made some minor
> improvements in random write performance. It helps more with O_DIRECT,
> but mainly at the 16-thread count; at 1 and 128 threads it doesn't make
> much difference.
>
> Results are syncing now to the boxacle history page:
> http://btrfs.boxacle.net/repository/raid/history/History.html

Well, still completely different from my test rig ;) For the random
write run, yours runs at 580 trans/sec for btrfs and mine is going along
at 8000 trans/sec.

The part that confuses me is that you seem to have some big gaps where
just a single CPU is stuck in IO wait, and not much CPU time is in use.

Do you happen to have the blktrace logs for any of the btrfs runs? I'd
be interested in a script that did sysrq-w every 5s and captured the
output.

-chris
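A rough sketch of the kind of capture loop Chris asks for, assuming root,
/proc/sysrq-trigger, and a kernel with magic sysrq enabled; the interval,
log name, and tail size are arbitrary placeholders:

    import subprocess
    import time

    LOG = "sysrq-w.log"
    INTERVAL = 5

    with open(LOG, "a") as log:
        while True:
            with open("/proc/sysrq-trigger", "w") as trig:
                trig.write("w")      # dump tasks stuck in uninterruptible sleep
            # The dump lands in the kernel ring buffer; grab the tail of dmesg.
            out = subprocess.run(["dmesg"], capture_output=True, text=True).stdout
            log.write(out[-20000:] + "\n" + "=" * 60 + "\n")
            log.flush()
            time.sleep(INTERVAL)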
Chris Mason wrote:
> Well, still completely different from my test rig ;) For the random
> write run, yours runs at 580 trans/sec for btrfs and mine is going along
> at 8000 trans/sec.

That is odd. However, I think I have found one factor. In rerunning with
blktrace and sysrq, an interesting thing happened: the results got a lot
faster. What I did was just run the 128-thread O_DIRECT random write
test. Instead of 2.8MB/sec, I got 17MB/sec. Still far below the 100+ of
ext4 and JFS, but one heck of a difference.

Here is what I think is going on. We make use of a flag in FFSB to reuse
the existing fileset if it meets the setup criteria exactly. For the
test I am running, that is 1024 100MB files. Since all of the random
write tests do overwrites within the files, the file sizes do not change
and the fileset is therefore valid for reuse. For most filesystems this
is fine, but with btrfs COW it results in a very different file layout
at the start of each variation of the random write test. The latest
128-thread run was on a newly formatted filesystem.

So I will do two new runs tonight. First, I will re-mkfs before each
random write test and otherwise run as usual (a rough sketch of that
wrapper is at the end of this note). Second, I plan on running the
128-thread test multiple times (5-minute runs each) to see if it really
does degrade over time. What worries me is that in the case described
above we only have about 25 minutes of aging on the filesystem by the
time we execute the last random write test, which is not a whole lot.

> The part that confuses me is that you seem to have some big gaps where
> just a single CPU is stuck in IO wait, and not much CPU time is in use.

Yes, I have noticed that.

> Do you happen to have the blktrace logs for any of the btrfs runs? I'd
> be interested in a script that did sysrq-w every 5s and captured the
> output.

No, but as I mentioned above, I ran this today. Had a bug collecting the
sysrq, so I'll re-run tonight and post as soon as I can.

Steve
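A minimal sketch of the re-mkfs-before-each-run idea, assuming root, a
scratch device and mount point, and one FFSB profile per thread count; all
names below are placeholders:

    import subprocess

    DEV = "/dev/sdb1"     # placeholder device
    MNT = "/mnt/btrfs"    # placeholder mount point
    PROFILES = ["rand-write-1.ffsb", "rand-write-16.ffsb", "rand-write-128.ffsb"]

    for profile in PROFILES:
        subprocess.run(["umount", MNT], check=False)      # ok if not mounted
        subprocess.run(["mkfs.btrfs", DEV], check=True)   # fresh filesystem per run
        subprocess.run(["mount", "-t", "btrfs", DEV, MNT], check=True)
        subprocess.run(["ffsb", profile], check=True)     # recreate fileset and run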
On Tue, 2009-03-17 at 15:57 -0500, Steven Pratt wrote:
> That is odd. However, I think I have found one factor. In rerunning with
> blktrace and sysrq, an interesting thing happened: the results got a lot
> faster. What I did was just run the 128-thread O_DIRECT random write
> test. Instead of 2.8MB/sec, I got 17MB/sec. Still far below the 100+ of
> ext4 and JFS, but one heck of a difference.
>
> Here is what I think is going on. We make use of a flag in FFSB to reuse
> the existing fileset if it meets the setup criteria exactly. For the
> test I am running, that is 1024 100MB files. Since all of the random
> write tests do overwrites within the files, the file sizes do not change
> and the fileset is therefore valid for reuse.

Oh! In that case you're stuck waiting to cache the extents already used
in a block group. At least I hope that's what sysrq-w will show us. The
first mods to a block group after a mount are slow while we read in the
free extents.

-chris
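One possible way to push that first-touch cost outside the measured window
is a small warm-up write pass after mounting, before the timed run. This is
only a sketch of that idea: the fileset path is a placeholder, and it
assumes the benchmark later allocates from the same block groups the
warm-up writes touched, which is not guaranteed.

    import os
    import random

    FILESET = "/mnt/btrfs/data"   # placeholder path to the 1024 FFSB files
    BLOCK = 4096

    for name in os.listdir(FILESET):
        path = os.path.join(FILESET, name)
        if not os.path.isfile(path):
            continue
        size = os.path.getsize(path)
        with open(path, "r+b") as f:
            f.seek(random.randrange(0, max(size - BLOCK, 1)))
            f.write(b"\0" * BLOCK)     # COW forces a fresh allocation
            f.flush()
            os.fsync(f.fileno())       # make the allocation happen now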