liubo
2011-Apr-21 07:58 UTC
[RFC PATCH] Btrfs: do not flush csum items of unchanged file data during treelog
The current code relogs the entire inode every time during fsync log,
and it is much better suited to small files rather than large ones.

During my performance test, the fsync performance of large files sucks,
and we can ascribe this to the tremendous amount of csum infos of the
large ones, because we have to flush all of these csum infos into log trees
even when there is only _one_ change in the whole file data. Apparently,
to optimize fsync, we need to create a filter to skip the unnecessary csum
items, that is, those whose corresponding file data has remained unchanged
since before this fsync.

Here I have some test results to show, I use sysbench to do "random write + fsync".

Sysbench args:
  - Number of threads: 1
  - Extra file open flags: 0
  - 2 files, 4Gb each
  - Block size 4Kb
  - Number of random requests for random IO: 10000
  - Read/Write ratio for combined random IO test: 1.50
  - Periodic FSYNC enabled, calling fsync() each 100 requests.
  - Calling fsync() at the end of test, Enabled.
  - Using synchronous I/O mode
  - Doing random write test

Sysbench results:
==
   Operations performed:  0 Read, 10000 Write, 200 Other = 10200 Total
   Read 0b  Written 39.062Mb  Total transferred 39.062Mb
==
a) without patch:  (*SPEED* : 451.01Kb/sec)
   112.75 Requests/sec executed

b) with patch:  (*SPEED* : 5.1537Mb/sec)
   1319.34 Requests/sec executed

Signed-off-by: Liu Bo <liubo2009@cn.fujitsu.com>
---
 fs/btrfs/ctree.h    |   14 ++++++++++++--
 fs/btrfs/inode.c    |    1 +
 fs/btrfs/tree-log.c |   31 +++++++++++++++++++++++++------
 3 files changed, 38 insertions(+), 8 deletions(-)

diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
index 2e61fe1..300bea0 100644
--- a/fs/btrfs/ctree.h
+++ b/fs/btrfs/ctree.h
@@ -642,6 +642,12 @@ struct btrfs_root_ref {
 #define BTRFS_FILE_EXTENT_REG 1
 #define BTRFS_FILE_EXTENT_PREALLOC 2

+/*
+ * used to indicate that this file extent has just been changed and
+ * its csums need to be updated when fsync tries to log this inode.
+ */
+#define BTRFS_FILE_EXTENT_CSUM_UPTODATE	(1 << 0)
+
 struct btrfs_file_extent_item {
 	/*
 	 * transaction id that created this extent
@@ -665,7 +671,9 @@ struct btrfs_file_extent_item {
 	 */
 	u8 compression;
 	u8 encryption;
-	__le16 other_encoding; /* spare for later use */
+	u8 other_encoding; /* spare for later use */
+
+	u8 flag;

 	/* are we inline data or a real extent? */
 	u8 type;
@@ -2026,7 +2034,9 @@ BTRFS_SETGET_FUNCS(file_extent_compression, struct btrfs_file_extent_item,
 BTRFS_SETGET_FUNCS(file_extent_encryption, struct btrfs_file_extent_item,
 		   encryption, 8);
 BTRFS_SETGET_FUNCS(file_extent_other_encoding, struct btrfs_file_extent_item,
-		   other_encoding, 16);
+		   other_encoding, 8);
+BTRFS_SETGET_FUNCS(file_extent_flag, struct btrfs_file_extent_item,
+		   flag, 8);

 /* this returns the number of file bytes represented by the inline item.
  * If an item is compressed, this is the uncompressed size
diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index a4157cf..ed4e318 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -1660,6 +1660,7 @@ static int insert_reserved_file_extent(struct btrfs_trans_handle *trans,
 	btrfs_set_file_extent_compression(leaf, fi, compression);
 	btrfs_set_file_extent_encryption(leaf, fi, encryption);
 	btrfs_set_file_extent_other_encoding(leaf, fi, other_encoding);
+	btrfs_set_file_extent_flag(leaf, fi, BTRFS_FILE_EXTENT_CSUM_UPTODATE);

 	btrfs_unlock_up_safe(path, 1);
 	btrfs_set_lock_blocking(leaf);
diff --git a/fs/btrfs/tree-log.c b/fs/btrfs/tree-log.c
index c50271a..baa4a0a 100644
--- a/fs/btrfs/tree-log.c
+++ b/fs/btrfs/tree-log.c
@@ -2591,11 +2591,24 @@ static int drop_objectid_items(struct btrfs_trans_handle *trans,
 	return ret;
 }

+static inline int need_csum(struct extent_buffer *src,
+			    struct btrfs_file_extent_item *fi,
+			    u64 gen, int csum)
+{
+	if (csum &&
+	    (btrfs_file_extent_generation(src, fi) == gen) &&
+	    (btrfs_file_extent_flag(src, fi) & BTRFS_FILE_EXTENT_CSUM_UPTODATE))
+		return 1;
+
+	return 0;
+}
+
+
 static noinline int copy_items(struct btrfs_trans_handle *trans,
 			       struct btrfs_root *log,
 			       struct btrfs_path *dst_path,
 			       struct extent_buffer *src,
-			       int start_slot, int nr, int inode_only)
+			       int start_slot, int nr, int inode_only, int csum)
 {
 	unsigned long src_offset;
 	unsigned long dst_offset;
@@ -2653,6 +2666,7 @@ static noinline int copy_items(struct btrfs_trans_handle *trans,
 			btrfs_set_inode_generation(dst_path->nodes[0],
 						   inode_item, 0);
 		}
+
 		/* take a reference on file data extents so that truncates
 		 * or deletes of this inode don't have to relog the inode
 		 * again
@@ -2663,8 +2677,9 @@ static noinline int copy_items(struct btrfs_trans_handle *trans,
 						struct btrfs_file_extent_item);

 			found_type = btrfs_file_extent_type(src, extent);
-			if (found_type == BTRFS_FILE_EXTENT_REG ||
-			    found_type == BTRFS_FILE_EXTENT_PREALLOC) {
+			if ((found_type == BTRFS_FILE_EXTENT_REG ||
+			    found_type == BTRFS_FILE_EXTENT_PREALLOC) &&
+			    need_csum(src, extent, trans->transid, csum)) {
 				u64 ds, dl, cs, cl;
 				ds = btrfs_file_extent_disk_bytenr(src,
 								extent);
@@ -2688,6 +2703,9 @@ static noinline int copy_items(struct btrfs_trans_handle *trans,
 						ds + cs, ds + cs + cl - 1,
 						&ordered_sums);
 				BUG_ON(ret);
+
+				btrfs_set_file_extent_flag(src, extent, 0);
+				btrfs_mark_buffer_dirty(src);
 			}
 		}
 	}
@@ -2742,6 +2760,7 @@ static int btrfs_log_inode(struct btrfs_trans_handle *trans,
 	int nritems;
 	int ins_start_slot = 0;
 	int ins_nr;
+	int csum = (BTRFS_I(inode)->flags & BTRFS_INODE_NODATASUM) ? 0 : 1;

 	log = root->log_root;

@@ -2816,7 +2835,7 @@ again:
 		}

 		ret = copy_items(trans, log, dst_path, src, ins_start_slot,
-				 ins_nr, inode_only);
+				 ins_nr, inode_only, csum);
 		if (ret) {
 			err = ret;
 			goto out_unlock;
@@ -2835,7 +2854,7 @@ next_slot:
 	if (ins_nr) {
 		ret = copy_items(trans, log, dst_path, src,
 				 ins_start_slot,
-				 ins_nr, inode_only);
+				 ins_nr, inode_only, csum);
 		if (ret) {
 			err = ret;
 			goto out_unlock;
@@ -2856,7 +2875,7 @@ next_slot:
 	if (ins_nr) {
 		ret = copy_items(trans, log, dst_path, src,
 				 ins_start_slot,
-				 ins_nr, inode_only);
+				 ins_nr, inode_only, csum);
 		if (ret) {
 			err = ret;
 			goto out_unlock;
--
1.6.5.2

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
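The filter in this first patch can be modelled outside the kernel. The sketch below is only an illustration under stated assumptions: `struct toy_extent`, `toy_need_csum()` and `toy_log_extents()` are hypothetical userspace stand-ins, not kernel API, and they copy only the three conditions that need_csum() tests plus the flag reset that copy_items() performs afterwards.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for BTRFS_FILE_EXTENT_CSUM_UPTODATE. */
#define TOY_EXTENT_CSUM_UPTODATE (1u << 0)

/* Hypothetical stand-in for the two file extent fields the patch reads. */
struct toy_extent {
	uint64_t generation;	/* transaction id that wrote the extent */
	uint8_t flag;		/* TOY_EXTENT_CSUM_UPTODATE while csums are pending */
};

/* Same three conditions as need_csum() in the patch. */
int toy_need_csum(const struct toy_extent *e, uint64_t transid,
		  int inode_has_csums)
{
	return inode_has_csums &&
	       e->generation == transid &&
	       (e->flag & TOY_EXTENT_CSUM_UPTODATE);
}

/*
 * Model one fsync: count the extents whose csums would be copied into
 * the log, and reset their flag so that a second fsync skips them.
 */
int toy_log_extents(struct toy_extent *ext, int nr, uint64_t transid,
		    int inode_has_csums)
{
	int logged = 0;

	assert(ext != NULL || nr == 0);
	for (int i = 0; i < nr; i++) {
		if (!toy_need_csum(&ext[i], transid, inode_has_csums))
			continue;
		logged++;
		ext[i].flag = 0;	/* like the patch, this writes the whole field */
	}
	return logged;
}
```

In this model a second call with the same transid returns 0, which is the effect the patch is after: repeated fsyncs within one transaction stop re-copying csums for extents that were already logged.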
Chris Mason
2011-Apr-21 13:16 UTC
Re: [RFC PATCH] Btrfs: do not flush csum items of unchanged file data during treelog
Excerpts from liubo's message of 2011-04-21 03:58:21 -0400:
> [...]
> Sysbench results:
> a) without patch:  (*SPEED* : 451.01Kb/sec)
>    112.75 Requests/sec executed
>
> b) with patch:  (*SPEED* : 5.1537Mb/sec)
>    1319.34 Requests/sec executed

Really nice results!  Especially considering the small size of the patch.

But, I'd really like to look at using sub transaction ids for this, and
then logging just the part of the inode that had changed since the last
log commit.  It's more complex, but will also help reduce tree searches
for the file items.

-chris
Li Zefan
2011-Apr-22 00:55 UTC
Re: [RFC PATCH] Btrfs: do not flush csum items of unchanged file data during treelog
Chris Mason wrote:
> Excerpts from liubo's message of 2011-04-21 03:58:21 -0400:
>> [...]
>
> Really nice results!  Especially considering the small size of the patch.
>
> But, I'd really like to look at using sub transaction ids for this, and
> then logging just the part of the inode that had changed since the last
> log commit.  It's more complex, but will also help reduce tree searches
> for the file items.
>

And this patch forgot to mention that it has a compatibility issue.
Chris Mason
2011-Apr-22 01:28 UTC
Re: [RFC PATCH] Btrfs: do not flush csum items of unchanged file data during treelog
Excerpts from Li Zefan's message of 2011-04-21 20:55:40 -0400:
> Chris Mason wrote:
> > [...]
> > But, I'd really like to look at using sub transaction ids for this, and
> > then logging just the part of the inode that had changed since the last
> > log commit.  It's more complex, but will also help reduce tree searches
> > for the file items.
> >
>
> And this patch forgot to mention that it has a compatibility issue.

Right, at the very least we want to just use one bit of that field
instead of all 8.  But keeping a sub-transid and putting that in the
generation field of the file extent instead can get us the same benefits
without stealing the bits.

As we push the sub transid into the btree blocks as well, we'll get much
faster tree walks too.  The penalty is in complexity in the logging
code, since it will have to deal with finding extents in the log tree
and merging in the new extents from the file.

-chris
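One reading of the "one bit of that field instead of all 8" remark is that the flag should be set and cleared with a mask, so the other seven bits of the byte stay free for later use. The sketch below shows that under stated assumptions: `TOY_FLAG_CSUM_PENDING` and the three helpers are hypothetical names for illustration and do not appear in the patch or in the kernel.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical single-bit flag inside an 8-bit on-disk field. */
#define TOY_FLAG_CSUM_PENDING (1u << 0)

/* Set the bit without touching the other seven bits of the field. */
uint8_t toy_flag_set(uint8_t field)
{
	/* sanity check: the mask really is exactly one bit */
	assert((TOY_FLAG_CSUM_PENDING & (TOY_FLAG_CSUM_PENDING - 1)) == 0);
	return (uint8_t)(field | TOY_FLAG_CSUM_PENDING);
}

/* Clear the bit, again preserving the other seven bits. */
uint8_t toy_flag_clear(uint8_t field)
{
	return (uint8_t)(field & ~TOY_FLAG_CSUM_PENDING);
}

/* Test only the bit of interest. */
int toy_flag_test(uint8_t field)
{
	return (field & TOY_FLAG_CSUM_PENDING) != 0;
}
```

By contrast, writing 0 or a constant to the whole field, as the posted patch does, overwrites whatever the remaining bits might hold in the future.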
liubo
2011-Apr-25 09:58 UTC
Re: [RFC PATCH] Btrfs: do not flush csum items of unchanged file data during treelog
On 04/22/2011 09:28 AM, Chris Mason wrote:
> Excerpts from Li Zefan's message of 2011-04-21 20:55:40 -0400:
>> [...]
>> And this patch forgot to mention that it has a compatibility issue.
>
> Right, at the very least we want to just use one bit of that field
> instead of all 8.  But keeping a sub-transid and putting that in the
> generation field of the file extent instead can get us the same benefits
> without stealing the bits.
>

Nice.  This is the first step of my plan.

> As we push the sub transid into the btree blocks as well, we'll get much
> faster tree walks too.  The penalty is in complexity in the logging
> code, since it will have to deal with finding extents in the log tree
> and merging in the new extents from the file.

I've been thinking of this extent buffer with sub transid stuff for a while,
and will give it a try. :)

thanks,
liubo.

>
> -chris
>
liubo
2011-May-06 02:36 UTC
[RFC PATCH] Btrfs: do not flush csum items of unchanged file data during treelog
The current code relogs the entire inode every time during fsync log,
and it is much better suited to small files rather than large ones.

During my performance test, the fsync performance of large files sucks,
and we can ascribe this to the tremendous amount of csum infos of the
large ones, because we have to flush all of these csum infos into log trees
even when there is only _one_ change in the whole file data. Apparently,
to optimize fsync, we need to create a filter to skip the unnecessary csum
items, that is, those whose corresponding file data has remained unchanged
since before this fsync.

Here I have some test results to show, I use sysbench to do "random write + fsync".

==
sysbench --test=fileio --num-threads=1 --file-num=2 --file-block-size=4K --file-total-size=8G --file-test-mode=rndwr --file-io-mode=sync --file-extra-flags= [prepare, run]
==

Sysbench args:
  - Number of threads: 1
  - Extra file open flags: 0
  - 2 files, 4Gb each
  - Block size 4Kb
  - Number of random requests for random IO: 10000
  - Read/Write ratio for combined random IO test: 1.50
  - Periodic FSYNC enabled, calling fsync() each 100 requests.
  - Calling fsync() at the end of test, Enabled.
  - Using synchronous I/O mode
  - Doing random write test

Sysbench results:
==
   Operations performed:  0 Read, 10000 Write, 200 Other = 10200 Total
   Read 0b  Written 39.062Mb  Total transferred 39.062Mb
==
a) without patch:  (*SPEED* : 451.01Kb/sec)
   112.75 Requests/sec executed

b) with patch:  (*SPEED* : 4.7533Mb/sec)
   1216.84 Requests/sec executed

PS: I've made a _sub transid_ stuff patch, but it does not perform as effectively as this patch,
and I'm wondering where the problem is and trying to improve it more.

Signed-off-by: Liu Bo <liubo2009@cn.fujitsu.com>
---
 fs/btrfs/tree-log.c |    3 +++
 1 files changed, 3 insertions(+), 0 deletions(-)

diff --git a/fs/btrfs/tree-log.c b/fs/btrfs/tree-log.c
index c50271a..b934a36 100644
--- a/fs/btrfs/tree-log.c
+++ b/fs/btrfs/tree-log.c
@@ -2662,6 +2662,9 @@ static noinline int copy_items(struct btrfs_trans_handle *trans,
 			extent = btrfs_item_ptr(src, start_slot + i,
 						struct btrfs_file_extent_item);

+			if (btrfs_file_extent_generation(src, extent) < trans->transid)
+				continue;
+
 			found_type = btrfs_file_extent_type(src, extent);
 			if (found_type == BTRFS_FILE_EXTENT_REG ||
 			    found_type == BTRFS_FILE_EXTENT_PREALLOC) {
--
1.6.5.2
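The three added lines skip the csum lookup for any file extent whose generation is older than the running transaction; the reasoning appears to be that such extents were written in an earlier transaction, so their csums do not need to be copied into the log again. A minimal userspace sketch of that counting follows. `toy_count_csum_lookups()` and its `filter` switch are hypothetical and exist only to compare the behaviour with and without the check.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical model: extent_gen[i] is the generation of the i-th file
 * extent item of one inode. Returns how many of them would go on to the
 * csum lookup during one fsync.
 */
size_t toy_count_csum_lookups(const uint64_t *extent_gen, size_t nr,
			      uint64_t transid, int filter)
{
	size_t lookups = 0;

	assert(extent_gen != NULL || nr == 0);
	for (size_t i = 0; i < nr; i++) {
		/* the check added by the patch: older extents are skipped */
		if (filter && extent_gen[i] < transid)
			continue;
		lookups++;
	}
	return lookups;
}
```

For a large file where only a handful of extents were rewritten in the current transaction, the unfiltered count grows with the file size while the filtered count grows only with the number of recent writes, which matches the direction of the sysbench numbers above.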
Josef Bacik
2011-May-06 12:51 UTC
Re: [RFC PATCH] Btrfs: do not flush csum items of unchanged file data during treelog
On 05/05/2011 10:36 PM, liubo wrote:
> [...]
> diff --git a/fs/btrfs/tree-log.c b/fs/btrfs/tree-log.c
> index c50271a..b934a36 100644
> --- a/fs/btrfs/tree-log.c
> +++ b/fs/btrfs/tree-log.c
> @@ -2662,6 +2662,9 @@ static noinline int copy_items(struct btrfs_trans_handle *trans,
>  			extent = btrfs_item_ptr(src, start_slot + i,
>  						struct btrfs_file_extent_item);
>
> +			if (btrfs_file_extent_generation(src, extent) < trans->transid)
> +				continue;
> +
>  			found_type = btrfs_file_extent_type(src, extent);
>  			if (found_type == BTRFS_FILE_EXTENT_REG ||
>  			    found_type == BTRFS_FILE_EXTENT_PREALLOC) {

Seems reasonable to me,

Reviewed-by: Josef Bacik <josef@redhat.com>

Thanks,

Josef
Chris Mason
2011-May-06 14:59 UTC
Re: [RFC PATCH] Btrfs: do not flush csum items of unchanged file data during treelog
Excerpts from liubo's message of 2011-05-05 22:36:09 -0400:
> [...]
> a) without patch:  (*SPEED* : 451.01Kb/sec)
>    112.75 Requests/sec executed
>
> b) with patch:  (*SPEED* : 4.7533Mb/sec)
>    1216.84 Requests/sec executed
>
> [...]
> @@ -2662,6 +2662,9 @@ static noinline int copy_items(struct btrfs_trans_handle *trans,
>  			extent = btrfs_item_ptr(src, start_slot + i,
>  						struct btrfs_file_extent_item);
>
> +			if (btrfs_file_extent_generation(src, extent) < trans->transid)
> +				continue;
> +

Some rough math shows you get 368 requests/sec per line added by this
patch.  Just think about how much better the metric would be without the
whitespace!

Really though, nicely done.  You're still copying the extent items into
the log tree even though they are from older transactions.  If you push
the check into btrfs_log_inode, you can avoid even more work.

-chris
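The "rough math" can be reproduced from the two sysbench figures and the three added lines reported in the diffstat. The helpers below are hypothetical names used only for this arithmetic.

```c
#include <assert.h>

/* Requests/sec gained per added patch line (hypothetical helper). */
double toy_gain_per_line(double before, double after, int lines_added)
{
	assert(lines_added > 0);
	return (after - before) / lines_added;
}

/* Overall throughput ratio between the patched and unpatched runs. */
double toy_speedup(double before, double after)
{
	assert(before > 0.0);
	return after / before;
}
```

With before = 112.75 and after = 1216.84 requests/sec, the gain is (1216.84 - 112.75) / 3 = 368.03 requests/sec per line, and the overall speedup is roughly 10.8 times.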
Myroslav Opyr
2011-Oct-25 23:18 UTC
Re: [RFC PATCH] Btrfs: do not flush csum items of unchanged file data during treelog
liubo <liubo2009 <at> cn.fujitsu.com> writes:
>
> On 04/22/2011 09:28 AM, Chris Mason wrote:
> > Right, at the very least we want to just use one bit of that field
> > instead of all 8.  But keeping a sub-transid and putting that in the
> > generation field of the file extent instead can get us the same benefits
> > without stealing the bits.
> >
>
> Nice.  This is the first step of my plan.
>
> > As we push the sub transid into the btree blocks as well, we'll get much
> > faster tree walks too.  The penalty is in complexity in the logging
> > code, since it will have to deal with finding extents in the log tree
> > and merging in the new extents from the file.
>
> I've been thinking of this extent buffer with sub transid stuff for a while,
> and will give it a try. :)

Hi,

Any progress on this patch? We started experimenting with btrfs and were
immediately hit by the large file fsync issue.

Each fsync on a 3.3Gb file that has had a few dozen bytes appended shows up
as a 55-58 sec freeze of all filesystem operations, together with 99.9% CPU
utilization in the process that called fsync. Removing the fsync calls made
a difference of several orders of magnitude in operation speed.

FYI, we are running 2.6.38.8-32.fc15.i686.PAE with Btrfs v0.19 as a Xen DomU,
and btrfs considers the virtual xvd device (backed by an HDD file on Dom0) to
be an SSD (according to dmesg), if that matters.

Regards,

m.
Liu Bo
2011-Oct-26 01:12 UTC
Re: [RFC PATCH] Btrfs: do not flush csum items of unchanged file data during treelog
On 10/26/2011 07:18 AM, Myroslav Opyr wrote:
> [...]
> Hi,
>
> any progress upon this patch? We started experimenting with btrfs and were
> immediately hit by the large file fsync issue.
>

The patchset has been done and queued for merge, and you can try it with the
newest version:

http://marc.info/?l=linux-btrfs&m=131262353801288&w=2
http://marc.info/?l=linux-btrfs&m=131262353701285&w=2
http://marc.info/?l=linux-btrfs&m=131262357201359&w=2
http://marc.info/?l=linux-btrfs&m=131262353301276&w=2
http://marc.info/?l=linux-btrfs&m=131262356501328&w=2
http://marc.info/?l=linux-btrfs&m=131262352901267&w=2
http://marc.info/?l=linux-btrfs&m=131262355001313&w=2
http://marc.info/?l=linux-btrfs&m=131262357201356&w=2
http://marc.info/?l=linux-btrfs&m=131262354201304&w=2
http://marc.info/?l=linux-btrfs&m=131262355801321&w=2
http://marc.info/?l=linux-btrfs&m=131262355001310&w=2
http://marc.info/?l=linux-btrfs&m=131262354001293&w=2
http://marc.info/?l=linux-btrfs&m=131262353301279&w=2

thanks,
liubo