This idea is from ext4. By this patch, we can make the dio write parallel, and improve the performance. We needn''t worry about the race between dio write and truncate, because the truncate need wait untill all the dio write end. And we also needn''t worry about the race between dio write and punch hole, because we have extent lock to protect our operation. I ran fio to test the performance of this feature. == Hardware =CPU: Intel(R) Core(TM)2 Duo CPU E7500 @ 2.93GHz Mem: 2GB SSD: Intel X25-M 120GB (Test Partition: 60GB) == config file =[global] ioengine=psync direct=1 bs=4k size=32G runtime=60 directory=/mnt/btrfs/ filename=testfile group_reporting thread [file1] numjobs=1 # 2 4 rw=randwrite == result (KBps) =write 1 2 4 lock 24936 24738 24726 nolock 24962 30866 32101 == result (iops) =write 1 2 4 lock 6234 6184 6181 nolock 6240 7716 8025 Signed-off-by: Miao Xie <miaox@cn.fujitsu.com> --- fs/btrfs/inode.c | 24 +++++++++++++----------- 1 file changed, 13 insertions(+), 11 deletions(-) diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c index d17a04b..091593a 100644 --- a/fs/btrfs/inode.c +++ b/fs/btrfs/inode.c @@ -6589,31 +6589,33 @@ static ssize_t btrfs_direct_IO(int rw, struct kiocb *iocb, struct file *file = iocb->ki_filp; struct inode *inode = file->f_mapping->host; int flags = 0; - bool wakeup = false; + bool wakeup = true; int ret; if (check_direct_IO(BTRFS_I(inode)->root, rw, iocb, iov, offset, nr_segs)) return 0; - if (rw == READ) { - atomic_inc(&inode->i_dio_count); - smp_mb__after_atomic_inc(); - if (unlikely(test_bit(BTRFS_INODE_READDIO_NEED_LOCK, - &BTRFS_I(inode)->runtime_flags))) { - inode_dio_done(inode); - flags = DIO_LOCKING | DIO_SKIP_HOLES; - } else { - wakeup = true; - } + atomic_inc(&inode->i_dio_count); + smp_mb__after_atomic_inc(); + if (rw == WRITE) { + mutex_unlock(&inode->i_mutex); + } else if (unlikely(test_bit(BTRFS_INODE_READDIO_NEED_LOCK, + &BTRFS_I(inode)->runtime_flags))) { + inode_dio_done(inode); + flags = DIO_LOCKING | DIO_SKIP_HOLES; + wakeup = false; } ret = __blockdev_direct_IO(rw, iocb, inode, BTRFS_I(inode)->root->fs_info->fs_devices->latest_bdev, iov, offset, nr_segs, btrfs_get_blocks_direct, NULL, btrfs_submit_direct, flags); + if (wakeup) inode_dio_done(inode); + if (rw == WRITE) + mutex_lock(&inode->i_mutex); return ret; } -- 1.7.11.7 -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Josef Bacik
2013-Jan-31 16:42 UTC
Re: [RFC][PATCH 2/2] Btrfs: implement unlocked dio write
On Thu, Jan 31, 2013 at 02:39:03AM -0700, Miao Xie wrote:> This idea is from ext4. By this patch, we can make the dio write parallel, > and improve the performance. > > We needn''t worry about the race between dio write and truncate, because the > truncate need wait untill all the dio write end. > > And we also needn''t worry about the race between dio write and punch hole, > because we have extent lock to protect our operation. > > I ran fio to test the performance of this feature. > > == Hardware => CPU: Intel(R) Core(TM)2 Duo CPU E7500 @ 2.93GHz > Mem: 2GB > SSD: Intel X25-M 120GB (Test Partition: 60GB) > > == config file => [global] > ioengine=psync > direct=1 > bs=4k > size=32G > runtime=60 > directory=/mnt/btrfs/ > filename=testfile > group_reporting > thread > > [file1] > numjobs=1 # 2 4 > rw=randwrite > > == result (KBps) => write 1 2 4 > lock 24936 24738 24726 > nolock 24962 30866 32101 > > == result (iops) => write 1 2 4 > lock 6234 6184 6181 > nolock 6240 7716 8025So the one thing I worry about is interactions with fsync. I''ve been depending on the mutex to keep us from getting screwed by writers coming in while I''m trying to write out the changed extents. Could you test this with fsync and make sure it doesn''t break anything? Thanks, Josef -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Thu, Jan 31, 2013 at 05:39:03PM +0800, Miao Xie wrote:> This idea is from ext4. By this patch, we can make the dio write parallel, > and improve the performance.Interesting, AFAIK, ext4 can only do nolock dio write on some conditions(should be a overwrite, file size remains unchanged, no aligned/buffer io in flight), btrfs is ok without any conditions? thanks, liubo> > We needn''t worry about the race between dio write and truncate, because the > truncate need wait untill all the dio write end. > > And we also needn''t worry about the race between dio write and punch hole, > because we have extent lock to protect our operation. > > I ran fio to test the performance of this feature. > > == Hardware => CPU: Intel(R) Core(TM)2 Duo CPU E7500 @ 2.93GHz > Mem: 2GB > SSD: Intel X25-M 120GB (Test Partition: 60GB) > > == config file => [global] > ioengine=psync > direct=1 > bs=4k > size=32G > runtime=60 > directory=/mnt/btrfs/ > filename=testfile > group_reporting > thread > > [file1] > numjobs=1 # 2 4 > rw=randwrite > > == result (KBps) => write 1 2 4 > lock 24936 24738 24726 > nolock 24962 30866 32101 > > == result (iops) => write 1 2 4 > lock 6234 6184 6181 > nolock 6240 7716 8025 > > Signed-off-by: Miao Xie <miaox@cn.fujitsu.com> > --- > fs/btrfs/inode.c | 24 +++++++++++++----------- > 1 file changed, 13 insertions(+), 11 deletions(-) > > diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c > index d17a04b..091593a 100644 > --- a/fs/btrfs/inode.c > +++ b/fs/btrfs/inode.c > @@ -6589,31 +6589,33 @@ static ssize_t btrfs_direct_IO(int rw, struct kiocb *iocb, > struct file *file = iocb->ki_filp; > struct inode *inode = file->f_mapping->host; > int flags = 0; > - bool wakeup = false; > + bool wakeup = true; > int ret; > > if (check_direct_IO(BTRFS_I(inode)->root, rw, iocb, iov, > offset, nr_segs)) > return 0; > > - if (rw == READ) { > - atomic_inc(&inode->i_dio_count); > - smp_mb__after_atomic_inc(); > - if (unlikely(test_bit(BTRFS_INODE_READDIO_NEED_LOCK, > - &BTRFS_I(inode)->runtime_flags))) { > - inode_dio_done(inode); > - flags = DIO_LOCKING | DIO_SKIP_HOLES; > - } else { > - wakeup = true; > - } > + atomic_inc(&inode->i_dio_count); > + smp_mb__after_atomic_inc(); > + if (rw == WRITE) { > + mutex_unlock(&inode->i_mutex); > + } else if (unlikely(test_bit(BTRFS_INODE_READDIO_NEED_LOCK, > + &BTRFS_I(inode)->runtime_flags))) { > + inode_dio_done(inode); > + flags = DIO_LOCKING | DIO_SKIP_HOLES; > + wakeup = false; > } > > ret = __blockdev_direct_IO(rw, iocb, inode, > BTRFS_I(inode)->root->fs_info->fs_devices->latest_bdev, > iov, offset, nr_segs, btrfs_get_blocks_direct, NULL, > btrfs_submit_direct, flags); > + > if (wakeup) > inode_dio_done(inode); > + if (rw == WRITE) > + mutex_lock(&inode->i_mutex); > return ret; > } > > -- > 1.7.11.7 > -- > To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in > the body of a message to majordomo@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html-- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On fri, 1 Feb 2013 10:53:30 +0800, Liu Bo wrote:> On Thu, Jan 31, 2013 at 05:39:03PM +0800, Miao Xie wrote: >> This idea is from ext4. By this patch, we can make the dio write parallel, >> and improve the performance. > > Interesting, AFAIK, ext4 can only do nolock dio write on some > conditions(should be a overwrite, file size remains unchanged, > no aligned/buffer io in flight), btrfs is ok without any conditions?ext4 don''t have extent lock, it can not avoid 2 AIO threads are at work on the same unwritten block, so it can not use unlocked dio write for unaligned dio/aio. But btrfs has extent lock, it can avoid this problem. And ext4 need take write lock of ->i_data_sem, when it allocate the free space, but in order to avoid truncation and hole punch during dio, it need take the read lock of ->i_data_sem before it release ->i_mutex, that is if it isn''t a overwrite, deadlock will happen, so the unlocked dio of ext4 should be a overwrite. But btrfs doesn''t have such limitation. Thanks Miao> > thanks, > liubo > >> >> We needn''t worry about the race between dio write and truncate, because the >> truncate need wait untill all the dio write end. >> >> And we also needn''t worry about the race between dio write and punch hole, >> because we have extent lock to protect our operation. >> >> I ran fio to test the performance of this feature. >> >> == Hardware =>> CPU: Intel(R) Core(TM)2 Duo CPU E7500 @ 2.93GHz >> Mem: 2GB >> SSD: Intel X25-M 120GB (Test Partition: 60GB) >> >> == config file =>> [global] >> ioengine=psync >> direct=1 >> bs=4k >> size=32G >> runtime=60 >> directory=/mnt/btrfs/ >> filename=testfile >> group_reporting >> thread >> >> [file1] >> numjobs=1 # 2 4 >> rw=randwrite >> >> == result (KBps) =>> write 1 2 4 >> lock 24936 24738 24726 >> nolock 24962 30866 32101 >> >> == result (iops) =>> write 1 2 4 >> lock 6234 6184 6181 >> nolock 6240 7716 8025 >> >> Signed-off-by: Miao Xie <miaox@cn.fujitsu.com> >> --- >> fs/btrfs/inode.c | 24 +++++++++++++----------- >> 1 file changed, 13 insertions(+), 11 deletions(-) >> >> diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c >> index d17a04b..091593a 100644 >> --- a/fs/btrfs/inode.c >> +++ b/fs/btrfs/inode.c >> @@ -6589,31 +6589,33 @@ static ssize_t btrfs_direct_IO(int rw, struct kiocb *iocb, >> struct file *file = iocb->ki_filp; >> struct inode *inode = file->f_mapping->host; >> int flags = 0; >> - bool wakeup = false; >> + bool wakeup = true; >> int ret; >> >> if (check_direct_IO(BTRFS_I(inode)->root, rw, iocb, iov, >> offset, nr_segs)) >> return 0; >> >> - if (rw == READ) { >> - atomic_inc(&inode->i_dio_count); >> - smp_mb__after_atomic_inc(); >> - if (unlikely(test_bit(BTRFS_INODE_READDIO_NEED_LOCK, >> - &BTRFS_I(inode)->runtime_flags))) { >> - inode_dio_done(inode); >> - flags = DIO_LOCKING | DIO_SKIP_HOLES; >> - } else { >> - wakeup = true; >> - } >> + atomic_inc(&inode->i_dio_count); >> + smp_mb__after_atomic_inc(); >> + if (rw == WRITE) { >> + mutex_unlock(&inode->i_mutex); >> + } else if (unlikely(test_bit(BTRFS_INODE_READDIO_NEED_LOCK, >> + &BTRFS_I(inode)->runtime_flags))) { >> + inode_dio_done(inode); >> + flags = DIO_LOCKING | DIO_SKIP_HOLES; >> + wakeup = false; >> } >> >> ret = __blockdev_direct_IO(rw, iocb, inode, >> BTRFS_I(inode)->root->fs_info->fs_devices->latest_bdev, >> iov, offset, nr_segs, btrfs_get_blocks_direct, NULL, >> btrfs_submit_direct, flags); >> + >> if (wakeup) >> inode_dio_done(inode); >> + if (rw == WRITE) >> + mutex_lock(&inode->i_mutex); >> return ret; >> } >> >> -- >> 1.7.11.7 >> -- >> To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in >> the body of a message to majordomo@vger.kernel.org >> More majordomo info at http://vger.kernel.org/majordomo-info.html > -- > To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in > the body of a message to majordomo@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html >-- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On fri, 01 Feb 2013 12:08:25 +0800, Miao Xie wrote:> On fri, 1 Feb 2013 10:53:30 +0800, Liu Bo wrote: >> On Thu, Jan 31, 2013 at 05:39:03PM +0800, Miao Xie wrote: >>> This idea is from ext4. By this patch, we can make the dio write parallel, >>> and improve the performance. >> >> Interesting, AFAIK, ext4 can only do nolock dio write on some >> conditions(should be a overwrite, file size remains unchanged, >> no aligned/buffer io in flight), btrfs is ok without any conditions? > > ext4 don''t have extent lock, it can not avoid 2 AIO threads are at work on the same > unwritten block, so it can not use unlocked dio write for unaligned dio/aio. But btrfs > has extent lock, it can avoid this problem.Besides that, btrfs doesn''t allow doing a unaligned dio/aio. I read the code again, found there is a race that several tasks may update i_size at the same time. There are two methods to fix this problem: 1. just like ext4, don''t do unlocked write dio if it is beyond the end of the file 2. use a spin lock to protect i_size update I want to choose the 2nd one. Thanks Miao> > And ext4 need take write lock of ->i_data_sem, when it allocate the free space, > but in order to avoid truncation and hole punch during dio, it need take the read > lock of ->i_data_sem before it release ->i_mutex, that is if it isn''t a overwrite, > deadlock will happen, so the unlocked dio of ext4 should be a overwrite. But btrfs > doesn''t have such limitation. > > Thanks > Miao > >> >> thanks, >> liubo >> >>> >>> We needn''t worry about the race between dio write and truncate, because the >>> truncate need wait untill all the dio write end. >>> >>> And we also needn''t worry about the race between dio write and punch hole, >>> because we have extent lock to protect our operation. >>> >>> I ran fio to test the performance of this feature. >>> >>> == Hardware =>>> CPU: Intel(R) Core(TM)2 Duo CPU E7500 @ 2.93GHz >>> Mem: 2GB >>> SSD: Intel X25-M 120GB (Test Partition: 60GB) >>> >>> == config file =>>> [global] >>> ioengine=psync >>> direct=1 >>> bs=4k >>> size=32G >>> runtime=60 >>> directory=/mnt/btrfs/ >>> filename=testfile >>> group_reporting >>> thread >>> >>> [file1] >>> numjobs=1 # 2 4 >>> rw=randwrite >>> >>> == result (KBps) =>>> write 1 2 4 >>> lock 24936 24738 24726 >>> nolock 24962 30866 32101 >>> >>> == result (iops) =>>> write 1 2 4 >>> lock 6234 6184 6181 >>> nolock 6240 7716 8025 >>> >>> Signed-off-by: Miao Xie <miaox@cn.fujitsu.com> >>> --- >>> fs/btrfs/inode.c | 24 +++++++++++++----------- >>> 1 file changed, 13 insertions(+), 11 deletions(-) >>> >>> diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c >>> index d17a04b..091593a 100644 >>> --- a/fs/btrfs/inode.c >>> +++ b/fs/btrfs/inode.c >>> @@ -6589,31 +6589,33 @@ static ssize_t btrfs_direct_IO(int rw, struct kiocb *iocb, >>> struct file *file = iocb->ki_filp; >>> struct inode *inode = file->f_mapping->host; >>> int flags = 0; >>> - bool wakeup = false; >>> + bool wakeup = true; >>> int ret; >>> >>> if (check_direct_IO(BTRFS_I(inode)->root, rw, iocb, iov, >>> offset, nr_segs)) >>> return 0; >>> >>> - if (rw == READ) { >>> - atomic_inc(&inode->i_dio_count); >>> - smp_mb__after_atomic_inc(); >>> - if (unlikely(test_bit(BTRFS_INODE_READDIO_NEED_LOCK, >>> - &BTRFS_I(inode)->runtime_flags))) { >>> - inode_dio_done(inode); >>> - flags = DIO_LOCKING | DIO_SKIP_HOLES; >>> - } else { >>> - wakeup = true; >>> - } >>> + atomic_inc(&inode->i_dio_count); >>> + smp_mb__after_atomic_inc(); >>> + if (rw == WRITE) { >>> + mutex_unlock(&inode->i_mutex); >>> + } else if (unlikely(test_bit(BTRFS_INODE_READDIO_NEED_LOCK, >>> + &BTRFS_I(inode)->runtime_flags))) { >>> + inode_dio_done(inode); >>> + flags = DIO_LOCKING | DIO_SKIP_HOLES; >>> + wakeup = false; >>> } >>> >>> ret = __blockdev_direct_IO(rw, iocb, inode, >>> BTRFS_I(inode)->root->fs_info->fs_devices->latest_bdev, >>> iov, offset, nr_segs, btrfs_get_blocks_direct, NULL, >>> btrfs_submit_direct, flags); >>> + >>> if (wakeup) >>> inode_dio_done(inode); >>> + if (rw == WRITE) >>> + mutex_lock(&inode->i_mutex); >>> return ret; >>> } >>> >>> -- >>> 1.7.11.7 >>> -- >>> To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in >>> the body of a message to majordomo@vger.kernel.org >>> More majordomo info at http://vger.kernel.org/majordomo-info.html >> -- >> To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in >> the body of a message to majordomo@vger.kernel.org >> More majordomo info at http://vger.kernel.org/majordomo-info.html >> > > -- > To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in > the body of a message to majordomo@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html >-- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html