Here is the latest set of performance runs from the 2.6.35-rc5 tree. Included is a refresh of all the other filesystems, with some changes for barriers on and off, since this has been somewhat of a hot topic recently.

New data is linked in to the history graphs here:
http://btrfs.boxacle.net/repository/raid/history/History.html

From a BTRFS performance perspective, we took a major regression on write-heavy workloads. As much as a 10x hit! The problem seems to be due to this changeset:
http://git.kernel.org/?p=linux/kernel/git/mason/btrfs-unstable.git;a=commit;h=5da9d01b66458b180a6bee0e637a1d0a3effc622

    Btrfs: Shrink delay allocated space in a synchronized

    Shrink delayed allocation space in a synchronized manner is more
    controllable than flushing all delay allocated space in an async
    thread.

This changeset introduced "btrfs_start_one_delalloc_inode" in
http://git.kernel.org/?p=linux/kernel/git/mason/btrfs-unstable.git;a=commitdiff;h=5da9d01b66458b180a6bee0e637a1d0a3effc622

In heavy write workloads this new function is now dominating the profiles:

samples   %        app name                      symbol name
8914973   65.1261  btrfs.ko                      btrfs_start_one_delalloc_inode
1024841    7.4867  vmlinux-2.6.35-rc5-autokern1  rb_get_reader_page
 716046    5.2309  vmlinux-2.6.35-rc5-autokern1  ring_buffer_consume
 315354    2.3037  oprofile.ko                   add_event_entry
 202484    1.4792  vmlinux-2.6.35-rc5-autokern1  write_inode_now
 195018    1.4247  btrfs.ko                      btrfs_tree_lock

There appears to be major contention on the spin lock, as this gets worse with more threads. This needs to be redone.

Steve
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
On Fri, Aug 06, 2010 at 01:44:11PM -0500, Steven Pratt wrote:
> From a BTRFS performance perspective, we took a major regression on
> write heavy workloads. As much as a 10x hit! The problems seems to
> be due to this changeset:
> http://git.kernel.org/?p=linux/kernel/git/mason/btrfs-unstable.git;a=commit;h=5da9d01b66458b180a6bee0e637a1d0a3effc622

Ouch! The problem is we're not being aggressive enough about allocating chunks for data, which makes the flusher come in and start data IO.

Thanks a lot for finding the regression, my machine definitely didn't show this.

I'll reproduce and fix it up.

-chris
> On Fri, Aug 06, 2010 at 01:44:11PM -0500, Steven Pratt wrote:
> > From a BTRFS performance perspective, we took a major regression on
> > write heavy workloads. As much as a 10x hit! The problems seems to
> > be due to this changeset:
> > http://git.kernel.org/?p=linux/kernel/git/mason/btrfs-unstable.git;a=commit;h=5da9d01b66458b180a6bee0e637a1d0a3effc622
>
> Ouch! The problem is we're not being aggressive enough about
> allocating chunks for data, which makes the flusher come in and start
> data IO.
>
> Thanks a lot for finding the regression, my machine definitely didn't
> show this.
>
> I'll reproduce and fix it up.

The guys testing this in Ubuntu's Maverick alphas have noticed this too. It happens only in some configurations: for example, I tested in a VM and did not see any issue, but when installing for real on the same hardware, performance fell through the floor.

https://bugs.launchpad.net/ubuntu/+source/linux-meta/+bug/601299?comments=all
On Fri, Aug 06, 2010 at 01:44:11PM -0500, Steven Pratt wrote:
> In heavy write workloads this new function is now dominating the profiles:
>
> samples   %        app name  symbol name
> 8914973   65.1261  btrfs.ko  btrfs_start_one_delalloc_inode

Hi Steve,

I think I know why this is a problem and how to fix it, but I'm having trouble reproducing this exact setup. Which of your tests was this oprofile from?

-chris
Chris Mason wrote:
> I think I know why this is a problem and how to fix it, but I'm having a
> trouble reproducing this exact setup. Which of your tests was this
> oprofile from?

128 thread random write. With or without nocow option.

Steve
On Mon, Aug 16, 2010 at 04:51:12PM -0500, Steven Pratt wrote:
> Chris Mason wrote:
> > I think I know why this is a problem and how to fix it, but I'm having a
> > trouble reproducing this exact setup. Which of your tests was this
> > oprofile from?
>
> 128 thread random write. With or without nocow option.

Ok, I haven't managed to reproduce your problem exactly, but this is faster for me here. Could you please give it a try:

From 8e965331de749c39f3781d581b55d2c207de060f Mon Sep 17 00:00:00 2001
From: Chris Mason <chris.mason@oracle.com>
Date: Wed, 18 Aug 2010 13:31:27 -0400
Subject: [PATCH] Btrfs: don't trigger delayed allocation throttling as often

We reserve metadata space based on the number of delayed allocation
extents that are currently pending. As we run out of space, we start
forcing writeback to turn those reservations into physical extents.

The reservations are based on some worst case math, so the sooner we
turn them into real blocks, the better off we are.

But, the writeback is being forced too soon and too often. This fixes
things to be less aggressive.

Signed-off-by: Chris Mason <chris.mason@oracle.com>
---
 fs/btrfs/extent-tree.c |    7 ++++++-
 1 files changed, 6 insertions(+), 1 deletions(-)

diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
index 32d0940..55e1ee0 100644
--- a/fs/btrfs/extent-tree.c
+++ b/fs/btrfs/extent-tree.c
@@ -3681,6 +3681,7 @@ int btrfs_delalloc_reserve_metadata(struct inode *inode, u64 num_bytes)
         struct btrfs_root *root = BTRFS_I(inode)->root;
         struct btrfs_block_rsv *block_rsv = &root->fs_info->delalloc_block_rsv;
         u64 to_reserve;
+        u64 max_reserve;
         int nr_extents;
         int retries = 0;
         int ret;
@@ -3717,7 +3718,11 @@ again:
 
         block_rsv_add_bytes(block_rsv, to_reserve, 1);
 
-        if (block_rsv->size > 512 * 1024 * 1024)
+        /* 10% or 2GB */
+        max_reserve = min_t(u64, 2ULL * 1024 * 1024 * 1024,
+                div_factor(root->fs_info->fs_devices->total_rw_bytes, 1));
+
+        if (block_rsv->size > max_reserve)
                 shrink_delalloc(NULL, root, to_reserve);
 
         return 0;
-- 
1.7.1.1
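[Editorial sketch, not part of the original thread.] The threshold change in the patch above can be tried out in userspace. This is only an approximation under stated assumptions: div_factor_sketch() is a hypothetical stand-in that assumes the kernel's div_factor(num, factor) computes num * factor / 10 (consistent with the patch's "10% or 2GB" comment), and max_delalloc_reserve() and should_shrink() are illustrative names that do not exist in btrfs.

```c
#include <stdint.h>

/* Old behaviour, per the removed diff line: a fixed 512 MiB threshold. */
#define OLD_DELALLOC_LIMIT (512ULL * 1024 * 1024)

/* Upper bound used by the patch: 2 GiB. */
#define MAX_DELALLOC_LIMIT (2ULL * 1024 * 1024 * 1024)

/*
 * Hypothetical userspace stand-in for the kernel's div_factor().
 * Assumption: it returns num * factor / 10, so factor 1 means 10%.
 */
uint64_t div_factor_sketch(uint64_t num, int factor)
{
    return num * (uint64_t)factor / 10;
}

/* New threshold: the smaller of 2 GiB and 10% of the writable bytes. */
uint64_t max_delalloc_reserve(uint64_t total_rw_bytes)
{
    uint64_t ten_percent = div_factor_sketch(total_rw_bytes, 1);

    return ten_percent < MAX_DELALLOC_LIMIT ? ten_percent : MAX_DELALLOC_LIMIT;
}

/* Returns 1 when a reservation of rsv_size would trigger shrink_delalloc(). */
int should_shrink(uint64_t rsv_size, uint64_t total_rw_bytes)
{
    return rsv_size > max_delalloc_reserve(total_rw_bytes);
}
```

Under these assumptions, a 10 GiB filesystem gets a 1 GiB threshold and a 100 GiB filesystem hits the 2 GiB cap, both higher than the old fixed 512 MiB. Below 5 GiB of writable space, 10% is less than 512 MiB, so the new threshold would actually be lower than the old one there.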
Chris Mason wrote:
> Ok, I haven't managed to reproduce your problem exactly, but this is
> faster for me here. Could you please give it a try:

Was out on vacation. Test is running now. Should have results uploaded by Monday.

Steve
Steven Pratt wrote:
> Chris Mason wrote:
>> Ok, I haven't managed to reproduce your problem exactly, but this is
>> faster for me here. Could you please give it a try:
>
> Was out on vacation. Test is running now. Should have results by
> uploaded by Monday.

This did not seem to help; in fact, we regressed more with COW enabled. One thing to note: the last 2 sets of runs in the history graphs were actually run by Keith, and he used stock kernel trees. For my recreate, I pulled the latest btrfs-unstable, which is based on a 2.6.34 tree. Should I retest this on stock 2.6.35? The high time in btrfs_start_one_delalloc_inode still exists.

Full results can be found here:
http://btrfs.boxacle.net/repository/raid/perftest/perfpatch/perfpatch.html

128 thread random write test that shows the problem:
http://btrfs.boxacle.net/repository/raid/perftest/perfpatch/perfpatch_Large_file_random_writes._num_threads=128.html

Steve
On Mon, Aug 23, 2010 at 02:13:53PM -0500, Steven Pratt wrote:
> This did not seem to help, in fact we regressed more with COW
> enabled.. One thing to note, the last 2 sets of runs in the history
> graphs were actually run by Keith and he used stock kernel trees.
> For my recreate, I pulled the latest btrfs-unstable which is based
> on a 2.6.34 tree. Should I retest this on stock 2.6.35? The high
> time in btrfs_start_one_delalloc_inode still exists.

btrfs-unstable or .35 are both fine.

Is this a fresh mkfs or are you reusing an existing tree?

> Full results can be found here:
> http://btrfs.boxacle.net/repository/raid/perftest/perfpatch/perfpatch.html
>
> 128 thread random write test that shows the problem:
>
> http://btrfs.boxacle.net/repository/raid/perftest/perfpatch/perfpatch_Large_file_random_writes._num_threads=128.html

Ok, thanks, I'll try again.

-chris
Chris Mason wrote:
> btrfs-unstable or .35 are both fine.

Ok.

> Is this a fresh mkfs or are you reusing an existing tree?

In between. There is a new mkfs before the benchmark run; the individual tests are then all run with unmounting and remounting in between, but no new mkfs. The random write is preceded by sequential reads and random reads.

> Ok, thanks, I'll try again.

Ok, will probably just run the 128 thread random write next time, since I am not seeing much difference on anything else.

Steve