Filipe David Borba Manana
2013-Sep-22 20:55 UTC
[PATCH] Btrfs: fix sync fs to actually wait for all data to be persisted
Currently the fs sync function (super.c:btrfs_sync_fs()) doesn't
wait for delayed work to finish before returning success to the
caller. This change fixes this, ensuring that there's no data loss
if a power failure happens right after fs sync returns success to
the caller and before the next commit happens.

Steps to reproduce the data loss issue:

$ mkfs.btrfs -f /dev/sdb3
$ mount /dev/sdb3 /mnt/btrfs
$ perl -e '$d = ("\x41" x 6001); open($f,">","/mnt/btrfs/foobar"); print $f $d; close($f);' && btrfs fi sync /mnt/btrfs

Right after the btrfs fi sync command (a second or 2 for example), power
off the machine and reboot it. The file will be empty, as it can be verified
after mounting the filesystem and through btrfs-debug-tree:

$ btrfs-debug-tree /dev/sdb3 | egrep '\(257 INODE_ITEM 0\) itemoff' -B 3 -A 8
    item 3 key (256 DIR_INDEX 2) itemoff 3751 itemsize 36
        location key (257 INODE_ITEM 0) type FILE
        namelen 6 datalen 0 name: foobar
    item 4 key (257 INODE_ITEM 0) itemoff 3591 itemsize 160
        inode generation 7 transid 7 size 0 block group 0 mode 100644 links 1
    item 5 key (257 INODE_REF 256) itemoff 3575 itemsize 16
        inode ref index 2 namelen 6 name: foobar
checksum tree key (CSUM_TREE ROOT_ITEM 0)
leaf 29429760 items 0 free space 3995 generation 7 owner 7
fs uuid 6192815c-af2a-4b75-b3db-a959ffb6166e
chunk uuid b529c44b-938c-4d3d-910a-013b4700bcae
uuid tree key (UUID_TREE ROOT_ITEM 0)

After this patch, the data loss no longer happens after a power failure and
btrfs-debug-tree shows:

$ btrfs-debug-tree /dev/sdb3 | egrep '\(257 INODE_ITEM 0\) itemoff' -B 3 -A 8
    item 3 key (256 DIR_INDEX 2) itemoff 3751 itemsize 36
        location key (257 INODE_ITEM 0) type FILE
        namelen 6 datalen 0 name: foobar
    item 4 key (257 INODE_ITEM 0) itemoff 3591 itemsize 160
        inode generation 6 transid 6 size 6001 block group 0 mode 100644 links 1
    item 5 key (257 INODE_REF 256) itemoff 3575 itemsize 16
        inode ref index 2 namelen 6 name: foobar
    item 6 key (257 EXTENT_DATA 0) itemoff 3522 itemsize 53
        extent data disk byte 12845056 nr 8192
        extent data offset 0 nr 8192 ram 8192
        extent compression 0
checksum tree key (CSUM_TREE ROOT_ITEM 0)

Signed-off-by: Filipe David Borba Manana <fdmanana@gmail.com>
---
 fs/btrfs/super.c | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/fs/btrfs/super.c b/fs/btrfs/super.c
index 6ab0df5..557e38f 100644
--- a/fs/btrfs/super.c
+++ b/fs/btrfs/super.c
@@ -913,6 +913,7 @@ int btrfs_sync_fs(struct super_block *sb, int wait)
 	struct btrfs_trans_handle *trans;
 	struct btrfs_fs_info *fs_info = btrfs_sb(sb);
 	struct btrfs_root *root = fs_info->tree_root;
+	int ret;
 
 	trace_btrfs_sync_fs(wait);
 
@@ -921,6 +922,10 @@ int btrfs_sync_fs(struct super_block *sb, int wait)
 		return 0;
 	}
 
+	ret = btrfs_start_all_delalloc_inodes(fs_info, 0);
+	if (ret)
+		return ret;
+
 	btrfs_wait_all_ordered_extents(fs_info);
 
 	trans = btrfs_attach_transaction_barrier(root);
-- 
1.7.9.5

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
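The write-then-sync step of the reproducer can also be driven from C instead of perl. What follows is a minimal sketch and not part of the patch. It assumes that "btrfs fi sync" boils down to issuing the BTRFS_IOC_SYNC ioctl on the mount point, and it repeats the ioctl number locally so that it builds without the btrfs headers installed. The helper names write_payload() and btrfs_sync_mount() are hypothetical.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdlib.h>
#include <string.h>
#include <sys/ioctl.h>
#include <unistd.h>

/* Same values as the kernel's btrfs ioctl header; duplicated here only so
 * this sketch compiles on machines without that header (an assumption). */
#ifndef BTRFS_IOC_SYNC
#define BTRFS_IOCTL_MAGIC 0x94
#define BTRFS_IOC_SYNC _IO(BTRFS_IOCTL_MAGIC, 8)
#endif

/* Write `len` bytes of 'A' (0x41) to `path`, truncating an existing file.
 * Returns the number of bytes written, or -1 on error. */
long write_payload(const char *path, size_t len)
{
	assert(path != NULL);

	char *buf = malloc(len ? len : 1);
	if (!buf)
		return -1;
	memset(buf, 'A', len);

	int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
	if (fd < 0) {
		free(buf);
		return -1;
	}

	size_t done = 0;
	while (done < len) {
		ssize_t n = write(fd, buf + done, len - done);
		if (n < 0) {
			close(fd);
			free(buf);
			return -1;
		}
		done += (size_t)n;
	}

	free(buf);
	if (close(fd) < 0)
		return -1;
	return (long)done;
}

/* Issue BTRFS_IOC_SYNC on the directory `mnt`.
 * Returns 0 on success, -1 with errno set otherwise
 * (for example ENOTTY when `mnt` is not on btrfs). */
int btrfs_sync_mount(const char *mnt)
{
	assert(mnt != NULL);

	int fd = open(mnt, O_RDONLY);
	if (fd < 0)
		return -1;

	int ret = ioctl(fd, BTRFS_IOC_SYNC);
	int saved_errno = errno;
	close(fd);
	errno = saved_errno;
	return ret < 0 ? -1 : 0;
}
```

Calling write_payload("/mnt/btrfs/foobar", 6001) followed by btrfs_sync_mount("/mnt/btrfs"), then cutting power, corresponds to the steps in the commit message. The data is deliberately written without fsync(), so whether it survives depends only on what the sync ioctl guarantees.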
Miao Xie
2013-Sep-23 01:30 UTC
Re: [PATCH] Btrfs: fix sync fs to actually wait for all data to be persisted
On Sun, 22 Sep 2013 21:55:53 +0100, Filipe David Borba Manana wrote:
> Currently the fs sync function (super.c:btrfs_sync_fs()) doesn't
> wait for delayed work to finish before returning success to the
> caller. This change fixes this, ensuring that there's no data loss
> if a power failure happens right after fs sync returns success to
> the caller and before the next commit happens.
>
> [...]
>
> @@ -921,6 +922,10 @@ int btrfs_sync_fs(struct super_block *sb, int wait)
>  		return 0;
>  	}
>
> +	ret = btrfs_start_all_delalloc_inodes(fs_info, 0);
> +	if (ret)
> +		return ret;
> +

I don't think we should call btrfs_start_all_delalloc_inodes(), because this function is also
called by do_sync(), but do_sync() syncs the whole fs before calling it, so if we add
btrfs_start_all_delalloc_inodes() here, we will sync the fs twice, and the second one is unnecessary.

Calling writeback_inodes_sb() before btrfs_sync_fs() is a better way to fix this problem.

Thanks
Miao

>  	btrfs_wait_all_ordered_extents(fs_info);
>
>  	trans = btrfs_attach_transaction_barrier(root);
>
Filipe David Manana
2013-Sep-23 09:11 UTC
Re: [PATCH] Btrfs: fix sync fs to actually wait for all data to be persisted
On Mon, Sep 23, 2013 at 2:30 AM, Miao Xie <miaox@cn.fujitsu.com> wrote:
> On Sun, 22 Sep 2013 21:55:53 +0100, Filipe David Borba Manana wrote:
> > [...]
> > @@ -921,6 +922,10 @@ int btrfs_sync_fs(struct super_block *sb, int wait)
> >  		return 0;
> >  	}
> >
> > +	ret = btrfs_start_all_delalloc_inodes(fs_info, 0);
> > +	if (ret)
> > +		return ret;
> > +
>
> I don't think we should call btrfs_start_all_delalloc_inodes(), because this function is also
> called by do_sync(), but do_sync() syncs the whole fs before calling it, so if we add
> btrfs_start_all_delalloc_inodes() here, we will sync the fs twice, and the second one is unnecessary.

Where is that do_sync() function exactly? I'm not finding any with
that exact name in fs/btrfs/* nor fs/*

I used this approach because (besides working) it's what is done in
btrfs_commit_transaction() (via btrfs_start_delalloc_flush and
btrfs_wait_delalloc_flush). Why can it be like that in the transaction
commit and not in btrfs_sync_fs()?

> Calling writeback_inodes_sb() before btrfs_sync_fs() is better way to fix this problem.

Just tested it, and it works that way too. Uploading a new patch.

Thanks for the feedback/review Miao.

> Thanks
> Miao
>
> >  	btrfs_wait_all_ordered_extents(fs_info);
> >
> >  	trans = btrfs_attach_transaction_barrier(root);

-- 
Filipe David Manana,

"Reasonable men adapt themselves to the world.
 Unreasonable men adapt the world to themselves.
 That's why all progress depends on unreasonable men."
Filipe David Borba Manana
2013-Sep-23 09:23 UTC
[PATCH v2] Btrfs: fix sync fs to actually wait for all data to be persisted
Currently the fs sync function (super.c:btrfs_sync_fs()) doesn't
wait for delayed work to finish before returning success to the
caller. This change fixes this, ensuring that there's no data loss
if a power failure happens right after fs sync returns success to
the caller and before the next commit happens.

Steps to reproduce the data loss issue:

$ mkfs.btrfs -f /dev/sdb3
$ mount /dev/sdb3 /mnt/btrfs
$ perl -e '$d = ("\x41" x 6001); open($f,">","/mnt/btrfs/foobar"); print $f $d; close($f);' && btrfs fi sync /mnt/btrfs

Right after the btrfs fi sync command (a second or 2 for example), power
off the machine and reboot it. The file will be empty, as it can be verified
after mounting the filesystem and through btrfs-debug-tree:

$ btrfs-debug-tree /dev/sdb3 | egrep '\(257 INODE_ITEM 0\) itemoff' -B 3 -A 8
    item 3 key (256 DIR_INDEX 2) itemoff 3751 itemsize 36
        location key (257 INODE_ITEM 0) type FILE
        namelen 6 datalen 0 name: foobar
    item 4 key (257 INODE_ITEM 0) itemoff 3591 itemsize 160
        inode generation 7 transid 7 size 0 block group 0 mode 100644 links 1
    item 5 key (257 INODE_REF 256) itemoff 3575 itemsize 16
        inode ref index 2 namelen 6 name: foobar
checksum tree key (CSUM_TREE ROOT_ITEM 0)
leaf 29429760 items 0 free space 3995 generation 7 owner 7
fs uuid 6192815c-af2a-4b75-b3db-a959ffb6166e
chunk uuid b529c44b-938c-4d3d-910a-013b4700bcae
uuid tree key (UUID_TREE ROOT_ITEM 0)

After this patch, the data loss no longer happens after a power failure and
btrfs-debug-tree shows:

$ btrfs-debug-tree /dev/sdb3 | egrep '\(257 INODE_ITEM 0\) itemoff' -B 3 -A 8
    item 3 key (256 DIR_INDEX 2) itemoff 3751 itemsize 36
        location key (257 INODE_ITEM 0) type FILE
        namelen 6 datalen 0 name: foobar
    item 4 key (257 INODE_ITEM 0) itemoff 3591 itemsize 160
        inode generation 6 transid 6 size 6001 block group 0 mode 100644 links 1
    item 5 key (257 INODE_REF 256) itemoff 3575 itemsize 16
        inode ref index 2 namelen 6 name: foobar
    item 6 key (257 EXTENT_DATA 0) itemoff 3522 itemsize 53
        extent data disk byte 12845056 nr 8192
        extent data offset 0 nr 8192 ram 8192
        extent compression 0
checksum tree key (CSUM_TREE ROOT_ITEM 0)

Signed-off-by: Filipe David Borba Manana <fdmanana@gmail.com>
---

V2: Use writeback_inodes_sb() instead of btrfs_start_all_delalloc_inodes(), as
    suggested by Miao Xie.

 fs/btrfs/super.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/fs/btrfs/super.c b/fs/btrfs/super.c
index 6ab0df5..38b4392 100644
--- a/fs/btrfs/super.c
+++ b/fs/btrfs/super.c
@@ -921,6 +921,7 @@ int btrfs_sync_fs(struct super_block *sb, int wait)
 		return 0;
 	}
 
+	writeback_inodes_sb(sb, WB_REASON_SYNC);
 	btrfs_wait_all_ordered_extents(fs_info);
 
 	trans = btrfs_attach_transaction_barrier(root);
-- 
1.7.9.5
Liu Bo
2013-Sep-23 09:51 UTC
Re: [PATCH] Btrfs: fix sync fs to actually wait for all data to be persisted
On Mon, Sep 23, 2013 at 10:11:42AM +0100, Filipe David Manana wrote:
> On Mon, Sep 23, 2013 at 2:30 AM, Miao Xie <miaox@cn.fujitsu.com> wrote:
> > [...]
> > I don't think we should call btrfs_start_all_delalloc_inodes(), because this function is also
> > called by do_sync(), but do_sync() syncs the whole fs before calling it, so if we add
> > btrfs_start_all_delalloc_inodes() here, we will sync the fs twice, and the second one is unnecessary.
>
> Where is that do_sync() function exactly? I'm not finding any with
> that exact name in fs/btrfs/* nor fs/*

I think it should refer to sync_filesystem() :)

-liubo
Filipe David Manana
2013-Sep-23 09:53 UTC
Re: [PATCH v2] Btrfs: fix sync fs to actually wait for all data to be persisted
On Mon, Sep 23, 2013 at 10:23 AM, Filipe David Borba Manana
<fdmanana@gmail.com> wrote:
> [...]
>
> V2: Use writeback_inodes_sb() instead of btrfs_start_all_delalloc_inodes(), as
>     suggested by Miao Xie.
>
>  fs/btrfs/super.c | 1 +
>  1 file changed, 1 insertion(+)
>
> diff --git a/fs/btrfs/super.c b/fs/btrfs/super.c
> index 6ab0df5..38b4392 100644
> --- a/fs/btrfs/super.c
> +++ b/fs/btrfs/super.c
> @@ -921,6 +921,7 @@ int btrfs_sync_fs(struct super_block *sb, int wait)
>  		return 0;
>  	}
>
> +	writeback_inodes_sb(sb, WB_REASON_SYNC);
>  	btrfs_wait_all_ordered_extents(fs_info);

Ignore this 2nd patch version please, for 2 reasons:

1) It triggers a WARN_ON because writeback_inodes_sb() requires the
sb->s_umount semaphore to be acquired before, which is not always the
case (it is when called through btrfs_kill_super, otherwise it isn't)

2) It doesn't guarantee that inodes are actually written (see comment
of writeback_inodes_sb()), so we can return 0 (success) when the
writes actually didn't happen/succeed. Because of this,
btrfs_start_all_delalloc_inodes() is more honest.

>
>  	trans = btrfs_attach_transaction_barrier(root);
> --
> 1.7.9.5

-- 
Filipe David Manana
Liu Bo
2013-Sep-23 09:59 UTC
Re: [PATCH v2] Btrfs: fix sync fs to actually wait for all data to be persisted
On Mon, Sep 23, 2013 at 10:53:20AM +0100, Filipe David Manana wrote:
> On Mon, Sep 23, 2013 at 10:23 AM, Filipe David Borba Manana
> <fdmanana@gmail.com> wrote:
> > [...]
> > +	writeback_inodes_sb(sb, WB_REASON_SYNC);
> >  	btrfs_wait_all_ordered_extents(fs_info);
>
> Ignore this 2nd patch version please, for 2 reasons:
>
> 1) It triggers a WARN_ON because writeback_inodes_sb() requires the
> sb->s_umount semaphore to be acquired before, which is not always the
> case (it is when called through btrfs_kill_super, otherwise it isn't)
>
> 2) It doesn't guarantee that inodes are actually written (see comment
> of writeback_inodes_sb()), so we can return 0 (success) when the
> writes actually didn't happen/succeed. Because of this,
> btrfs_start_all_delalloc_inodes() is more honest.

What about

	case BTRFS_IOC_SYNC:
		btrfs_start_all_delalloc_inodes();
		btrfs_sync_fs(file->f_dentry->d_sb, 1);
		return 0;

This way, there is no impact on calling sync(1).

-liubo
Filipe David Manana
2013-Sep-23 10:06 UTC
Re: [PATCH v2] Btrfs: fix sync fs to actually wait for all data to be persisted
On Mon, Sep 23, 2013 at 10:59 AM, Liu Bo <bo.li.liu@oracle.com> wrote:
> On Mon, Sep 23, 2013 at 10:53:20AM +0100, Filipe David Manana wrote:
>> [...]
>> Ignore this 2nd patch version please, for 2 reasons:
>> [...]
>
> What about
>
> 	case BTRFS_IOC_SYNC:
> 		btrfs_start_all_delalloc_inodes();
> 		btrfs_sync_fs(file->f_dentry->d_sb, 1);
> 		return 0;
>
> This way, there is no impact on calling sync(1).

Sounds ok. Will try it, returning error if
btrfs_start_all_delalloc_inodes() returns an error.

Thanks for the suggestion and pointing me to sync_filesystem() :)

>
> -liubo

-- 
Filipe David Manana
Filipe David Borba Manana
2013-Sep-23 10:28 UTC
[PATCH v3] Btrfs: fix sync fs to actually wait for all data to be persisted
Currently the fs sync function (super.c:btrfs_sync_fs()) doesn't
wait for delayed work to finish before returning success to the
caller. This change fixes this, ensuring that there's no data loss
if a power failure happens right after fs sync returns success to
the caller and before the next commit happens.

Steps to reproduce the data loss issue:

    $ mkfs.btrfs -f /dev/sdb3
    $ mount /dev/sdb3 /mnt/btrfs
    $ perl -e '$d = ("\x41" x 6001); open($f,">","/mnt/btrfs/foobar"); print $f $d; close($f);' && btrfs fi sync /mnt/btrfs

Right after the btrfs fi sync command (a second or 2 for example), power
off the machine and reboot it. The file will be empty, as it can be verified
after mounting the filesystem and through btrfs-debug-tree:

    $ btrfs-debug-tree /dev/sdb3 | egrep '\(257 INODE_ITEM 0\) itemoff' -B 3 -A 8
        item 3 key (256 DIR_INDEX 2) itemoff 3751 itemsize 36
                location key (257 INODE_ITEM 0) type FILE
                namelen 6 datalen 0 name: foobar
        item 4 key (257 INODE_ITEM 0) itemoff 3591 itemsize 160
                inode generation 7 transid 7 size 0 block group 0 mode 100644 links 1
        item 5 key (257 INODE_REF 256) itemoff 3575 itemsize 16
                inode ref index 2 namelen 6 name: foobar
    checksum tree key (CSUM_TREE ROOT_ITEM 0)
    leaf 29429760 items 0 free space 3995 generation 7 owner 7
    fs uuid 6192815c-af2a-4b75-b3db-a959ffb6166e
    chunk uuid b529c44b-938c-4d3d-910a-013b4700bcae
    uuid tree key (UUID_TREE ROOT_ITEM 0)

After this patch, the data loss no longer happens after a power failure and
btrfs-debug-tree shows:

    $ btrfs-debug-tree /dev/sdb3 | egrep '\(257 INODE_ITEM 0\) itemoff' -B 3 -A 8
        item 3 key (256 DIR_INDEX 2) itemoff 3751 itemsize 36
                location key (257 INODE_ITEM 0) type FILE
                namelen 6 datalen 0 name: foobar
        item 4 key (257 INODE_ITEM 0) itemoff 3591 itemsize 160
                inode generation 6 transid 6 size 6001 block group 0 mode 100644 links 1
        item 5 key (257 INODE_REF 256) itemoff 3575 itemsize 16
                inode ref index 2 namelen 6 name: foobar
        item 6 key (257 EXTENT_DATA 0) itemoff 3522 itemsize 53
                extent data disk byte 12845056 nr 8192
                extent data offset 0 nr 8192 ram 8192
                extent compression 0
    checksum tree key (CSUM_TREE ROOT_ITEM 0)

Signed-off-by: Filipe David Borba Manana <fdmanana@gmail.com>
---

V2: Use writeback_inodes_sb() instead of btrfs_start_all_delalloc_inodes(), as
    suggested by Miao Xie.
V3: Use btrfs_start_all_delalloc_inodes() instead but outside btrfs_sync_fs(),
    in the sync IOCTL handler. Using writeback_inodes_sb() is not very honest
    because it doesn't guarantee inode data is persisted and we have no way
    to know if persistence really happened or not, returning 0 (success) always.
    Thanks Liu Bo for the suggestion.

 fs/btrfs/ioctl.c | 8 +++++++-
 1 file changed, 7 insertions(+), 1 deletion(-)

diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c
index 9d46f60..8792fc8 100644
--- a/fs/btrfs/ioctl.c
+++ b/fs/btrfs/ioctl.c
@@ -4557,9 +4557,15 @@ long btrfs_ioctl(struct file *file, unsigned int
 		return btrfs_ioctl_logical_to_ino(root, argp);
 	case BTRFS_IOC_SPACE_INFO:
 		return btrfs_ioctl_space_info(root, argp);
-	case BTRFS_IOC_SYNC:
+	case BTRFS_IOC_SYNC: {
+		int ret;
+
+		ret = btrfs_start_all_delalloc_inodes(root->fs_info, 0);
+		if (ret)
+			return ret;
 		btrfs_sync_fs(file->f_dentry->d_sb, 1);
 		return 0;
+	}
 	case BTRFS_IOC_START_SYNC:
 		return btrfs_ioctl_start_sync(root, argp);
 	case BTRFS_IOC_WAIT_SYNC:
--
1.7.9.5

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
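For readers who want to drive the reproducer without perl or btrfs-progs, here is a minimal userspace sketch in C of the same two steps: write 6001 bytes of 0x41 to a file, then issue the BTRFS_IOC_SYNC ioctl on the mount point, which is the ioctl the patch changes. The helper names write_pattern() and btrfs_sync_path() are made up for this illustration and are not part of the patch; the ioctl number is copied from <linux/btrfs.h> so the sketch builds without that header. It is a sketch under those assumptions, not a replacement for the commands above.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdlib.h>
#include <string.h>
#include <sys/ioctl.h>
#include <unistd.h>

/* Values mirror <linux/btrfs.h>; repeated here so the sketch builds
 * even when the btrfs UAPI header is not installed. */
#ifndef BTRFS_IOC_SYNC
#define BTRFS_IOCTL_MAGIC 0x94
#define BTRFS_IOC_SYNC _IO(BTRFS_IOCTL_MAGIC, 8)
#endif

/* Hypothetical helper: write `len` bytes of 'A' (0x41) to `path`,
 * like the perl one-liner. Returns 0 or a negative errno value. */
int write_pattern(const char *path, size_t len)
{
    assert(path != NULL);

    char *buf = malloc(len ? len : 1);
    if (!buf)
        return -ENOMEM;
    memset(buf, 0x41, len);

    int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (fd < 0) {
        int err = errno;
        free(buf);
        return -err;
    }

    int ret = 0;
    size_t done = 0;
    while (done < len) {
        ssize_t n = write(fd, buf + done, len - done);
        if (n < 0) {
            if (errno == EINTR)
                continue;       /* interrupted, retry the write */
            ret = -errno;
            break;
        }
        done += (size_t)n;
    }

    free(buf);
    if (close(fd) < 0 && ret == 0)
        ret = -errno;
    return ret;
}

/* Hypothetical helper: issue BTRFS_IOC_SYNC on a mount point, which is
 * what "btrfs fi sync" does. Returns 0 or a negative errno value. */
int btrfs_sync_path(const char *mnt)
{
    assert(mnt != NULL);

    int fd = open(mnt, O_RDONLY);
    if (fd < 0)
        return -errno;

    int ret = ioctl(fd, BTRFS_IOC_SYNC) < 0 ? -errno : 0;
    close(fd);
    return ret;
}
```

Calling write_pattern("/mnt/btrfs/foobar", 6001) followed by btrfs_sync_path("/mnt/btrfs") corresponds to the command line in the reproduction steps. On a filesystem that is not btrfs the ioctl is expected to fail rather than sync anything.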
Filipe David Borba Manana
2013-Sep-23 10:35 UTC
[PATCH v4] Btrfs: fix sync fs to actually wait for all data to be persisted
Currently the fs sync function (super.c:btrfs_sync_fs()) doesn't
wait for delayed work to finish before returning success to the
caller. This change fixes this, ensuring that there's no data loss
if a power failure happens right after fs sync returns success to
the caller and before the next commit happens.

Steps to reproduce the data loss issue:

    $ mkfs.btrfs -f /dev/sdb3
    $ mount /dev/sdb3 /mnt/btrfs
    $ perl -e '$d = ("\x41" x 6001); open($f,">","/mnt/btrfs/foobar"); print $f $d; close($f);' && btrfs fi sync /mnt/btrfs

Right after the btrfs fi sync command (a second or 2 for example), power
off the machine and reboot it. The file will be empty, as it can be verified
after mounting the filesystem and through btrfs-debug-tree:

    $ btrfs-debug-tree /dev/sdb3 | egrep '\(257 INODE_ITEM 0\) itemoff' -B 3 -A 8
        item 3 key (256 DIR_INDEX 2) itemoff 3751 itemsize 36
                location key (257 INODE_ITEM 0) type FILE
                namelen 6 datalen 0 name: foobar
        item 4 key (257 INODE_ITEM 0) itemoff 3591 itemsize 160
                inode generation 7 transid 7 size 0 block group 0 mode 100644 links 1
        item 5 key (257 INODE_REF 256) itemoff 3575 itemsize 16
                inode ref index 2 namelen 6 name: foobar
    checksum tree key (CSUM_TREE ROOT_ITEM 0)
    leaf 29429760 items 0 free space 3995 generation 7 owner 7
    fs uuid 6192815c-af2a-4b75-b3db-a959ffb6166e
    chunk uuid b529c44b-938c-4d3d-910a-013b4700bcae
    uuid tree key (UUID_TREE ROOT_ITEM 0)

After this patch, the data loss no longer happens after a power failure and
btrfs-debug-tree shows:

    $ btrfs-debug-tree /dev/sdb3 | egrep '\(257 INODE_ITEM 0\) itemoff' -B 3 -A 8
        item 3 key (256 DIR_INDEX 2) itemoff 3751 itemsize 36
                location key (257 INODE_ITEM 0) type FILE
                namelen 6 datalen 0 name: foobar
        item 4 key (257 INODE_ITEM 0) itemoff 3591 itemsize 160
                inode generation 6 transid 6 size 6001 block group 0 mode 100644 links 1
        item 5 key (257 INODE_REF 256) itemoff 3575 itemsize 16
                inode ref index 2 namelen 6 name: foobar
        item 6 key (257 EXTENT_DATA 0) itemoff 3522 itemsize 53
                extent data disk byte 12845056 nr 8192
                extent data offset 0 nr 8192 ram 8192
                extent compression 0
    checksum tree key (CSUM_TREE ROOT_ITEM 0)

Signed-off-by: Filipe David Borba Manana <fdmanana@gmail.com>
---

V2: Use writeback_inodes_sb() instead of btrfs_start_all_delalloc_inodes(), as
    suggested by Miao Xie.
V3: Use btrfs_start_all_delalloc_inodes() instead but outside btrfs_sync_fs(),
    in the sync IOCTL handler. Using writeback_inodes_sb() is not very honest
    because it doesn't guarantee inode data is persisted and we have no way
    to know if persistence really happened or not, returning 0 (success) always.
    Thanks Liu Bo for the suggestion.
V4: Be even more honest in the sync IOCTL handler - don't always return success
    regardless of the result of the btrfs_sync_fs() call.

 fs/btrfs/ioctl.c | 12 +++++++++---
 1 file changed, 9 insertions(+), 3 deletions(-)

diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c
index 9d46f60..385c58f 100644
--- a/fs/btrfs/ioctl.c
+++ b/fs/btrfs/ioctl.c
@@ -4557,9 +4557,15 @@ long btrfs_ioctl(struct file *file, unsigned int
 		return btrfs_ioctl_logical_to_ino(root, argp);
 	case BTRFS_IOC_SPACE_INFO:
 		return btrfs_ioctl_space_info(root, argp);
-	case BTRFS_IOC_SYNC:
-		btrfs_sync_fs(file->f_dentry->d_sb, 1);
-		return 0;
+	case BTRFS_IOC_SYNC: {
+		int ret;
+
+		ret = btrfs_start_all_delalloc_inodes(root->fs_info, 0);
+		if (ret)
+			return ret;
+		ret = btrfs_sync_fs(file->f_dentry->d_sb, 1);
+		return ret;
+	}
 	case BTRFS_IOC_START_SYNC:
 		return btrfs_ioctl_start_sync(root, argp);
 	case BTRFS_IOC_WAIT_SYNC:
--
1.7.9.5
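To make the difference between the handler versions concrete, the following is a small userspace model in C of the BTRFS_IOC_SYNC case before the patch, as in V3, and as in V4. Everything prefixed with fake_ and the struct fake_fs type are hypothetical stand-ins invented for this illustration; they only return preconfigured error codes and do not reproduce what the kernel functions actually do. The point is limited to which errors reach the caller.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-in for the filesystem state: each field is the
 * error code the matching step should report (0 means success). */
struct fake_fs {
    int delalloc_err;   /* result of flushing delayed-allocation data */
    int sync_err;       /* result of the filesystem sync itself */
};

/* Stand-in for btrfs_start_all_delalloc_inodes(); assumption only. */
int fake_start_all_delalloc_inodes(const struct fake_fs *fs)
{
    assert(fs != NULL);
    return fs->delalloc_err;
}

/* Stand-in for btrfs_sync_fs(); assumption only. */
int fake_sync_fs(const struct fake_fs *fs)
{
    assert(fs != NULL);
    return fs->sync_err;
}

/* Before the patch: no delalloc flush, sync result discarded. */
int ioctl_sync_before(const struct fake_fs *fs)
{
    (void)fake_sync_fs(fs);
    return 0;
}

/* V3: flush delalloc first and report its failure, but still
 * return success no matter what the sync step reported. */
int ioctl_sync_v3(const struct fake_fs *fs)
{
    int ret = fake_start_all_delalloc_inodes(fs);
    if (ret)
        return ret;
    (void)fake_sync_fs(fs);
    return 0;
}

/* V4: same as V3, but the sync result is passed to the caller too. */
int ioctl_sync_v4(const struct fake_fs *fs)
{
    int ret = fake_start_all_delalloc_inodes(fs);
    if (ret)
        return ret;
    return fake_sync_fs(fs);
}
```

With delalloc_err = 0 and sync_err = -EIO, only ioctl_sync_v4() reports the failure; with delalloc_err = -ENOSPC, both the V3 and V4 variants report it while the pre-patch variant still returns 0.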
Miao Xie
2013-Sep-24 01:31 UTC
Re: [PATCH v4] Btrfs: fix sync fs to actually wait for all data to be persisted
On Mon, 23 Sep 2013 11:35:11 +0100, Filipe David Borba Manana wrote:
> Currently the fs sync function (super.c:btrfs_sync_fs()) doesn't
> wait for delayed work to finish before returning success to the
> caller. This change fixes this, ensuring that there's no data loss
> if a power failure happens right after fs sync returns success to
> the caller and before the next commit happens.
>
> Steps to reproduce the data loss issue:
>
>     $ mkfs.btrfs -f /dev/sdb3
>     $ mount /dev/sdb3 /mnt/btrfs
>     $ perl -e '$d = ("\x41" x 6001); open($f,">","/mnt/btrfs/foobar"); print $f $d; close($f);' && btrfs fi sync /mnt/btrfs
>
> Right after the btrfs fi sync command (a second or 2 for example), power
> off the machine and reboot it. The file will be empty, as it can be verified
> after mounting the filesystem and through btrfs-debug-tree:
>
>     $ btrfs-debug-tree /dev/sdb3 | egrep '\(257 INODE_ITEM 0\) itemoff' -B 3 -A 8
>         item 3 key (256 DIR_INDEX 2) itemoff 3751 itemsize 36
>                 location key (257 INODE_ITEM 0) type FILE
>                 namelen 6 datalen 0 name: foobar
>         item 4 key (257 INODE_ITEM 0) itemoff 3591 itemsize 160
>                 inode generation 7 transid 7 size 0 block group 0 mode 100644 links 1
>         item 5 key (257 INODE_REF 256) itemoff 3575 itemsize 16
>                 inode ref index 2 namelen 6 name: foobar
>     checksum tree key (CSUM_TREE ROOT_ITEM 0)
>     leaf 29429760 items 0 free space 3995 generation 7 owner 7
>     fs uuid 6192815c-af2a-4b75-b3db-a959ffb6166e
>     chunk uuid b529c44b-938c-4d3d-910a-013b4700bcae
>     uuid tree key (UUID_TREE ROOT_ITEM 0)
>
> After this patch, the data loss no longer happens after a power failure and
> btrfs-debug-tree shows:
>
>     $ btrfs-debug-tree /dev/sdb3 | egrep '\(257 INODE_ITEM 0\) itemoff' -B 3 -A 8
>         item 3 key (256 DIR_INDEX 2) itemoff 3751 itemsize 36
>                 location key (257 INODE_ITEM 0) type FILE
>                 namelen 6 datalen 0 name: foobar
>         item 4 key (257 INODE_ITEM 0) itemoff 3591 itemsize 160
>                 inode generation 6 transid 6 size 6001 block group 0 mode 100644 links 1
>         item 5 key (257 INODE_REF 256) itemoff 3575 itemsize 16
>                 inode ref index 2 namelen 6 name: foobar
>         item 6 key (257 EXTENT_DATA 0) itemoff 3522 itemsize 53
>                 extent data disk byte 12845056 nr 8192
>                 extent data offset 0 nr 8192 ram 8192
>                 extent compression 0
>     checksum tree key (CSUM_TREE ROOT_ITEM 0)
>
> Signed-off-by: Filipe David Borba Manana <fdmanana@gmail.com>

Reviewed-by: Miao Xie <miaox@cn.fujitsu.com>

> ---
>
> V2: Use writeback_inodes_sb() instead of btrfs_start_all_delalloc_inodes(), as
>     suggested by Miao Xie.
> V3: Use btrfs_start_all_delalloc_inodes() instead but outside btrfs_sync_fs(),
>     in the sync IOCTL handler. Using writeback_inodes_sb() is not very honest
>     because it doesn't guarantee inode data is persisted and we have no way
>     to know if persistence really happened or not, returning 0 (success) always.
>     Thanks Liu Bo for the suggestion.
> V4: Be even more honest in the sync IOCTL handler - don't always return success
>     regardless of the result of the btrfs_sync_fs() call.
>
>  fs/btrfs/ioctl.c | 12 +++++++++---
>  1 file changed, 9 insertions(+), 3 deletions(-)
>
> diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c
> index 9d46f60..385c58f 100644
> --- a/fs/btrfs/ioctl.c
> +++ b/fs/btrfs/ioctl.c
> @@ -4557,9 +4557,15 @@ long btrfs_ioctl(struct file *file, unsigned int
>  		return btrfs_ioctl_logical_to_ino(root, argp);
>  	case BTRFS_IOC_SPACE_INFO:
>  		return btrfs_ioctl_space_info(root, argp);
> -	case BTRFS_IOC_SYNC:
> -		btrfs_sync_fs(file->f_dentry->d_sb, 1);
> -		return 0;
> +	case BTRFS_IOC_SYNC: {
> +		int ret;
> +
> +		ret = btrfs_start_all_delalloc_inodes(root->fs_info, 0);
> +		if (ret)
> +			return ret;
> +		ret = btrfs_sync_fs(file->f_dentry->d_sb, 1);
> +		return ret;
> +	}
>  	case BTRFS_IOC_START_SYNC:
>  		return btrfs_ioctl_start_sync(root, argp);
>  	case BTRFS_IOC_WAIT_SYNC:
>