Hisashi Hifumi
2009-Feb-02 11:00 UTC
[PATCH] btrfs: call mark_inode_dirty when i_size is updated
Hi Chris. I think it is needed to call mark_inode_dirty() when file size expands in order to flush metadata updates to HDD through sync() syscall or background_writeout(). Thanks. Signed-off-by: Hisashi Hifumi <hifumi.hisashi@oss.ntt.co.jp> diff -Nrup linux-2.6.29-rc3.org/fs/btrfs/file.c linux-2.6.29-rc3/fs/btrfs/file.c --- linux-2.6.29-rc3.org/fs/btrfs/file.c 2009-02-02 16:53:18.000000000 +0900 +++ linux-2.6.29-rc3/fs/btrfs/file.c 2009-02-02 18:21:00.000000000 +0900 @@ -152,7 +152,7 @@ static noinline int dirty_and_release_pa } if (end_pos > isize) { i_size_write(inode, end_pos); - btrfs_update_inode(trans, root, inode); + mark_inode_dirty(inode); } err = btrfs_end_transaction(trans, root); out_unlock: -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Chris Mason
2009-Feb-02 14:12 UTC
Re: [PATCH] btrfs: call mark_inode_dirty when i_size is updated
On Mon, 2009-02-02 at 20:00 +0900, Hisashi Hifumi wrote:> Hi Chris. > > I think it is needed to call mark_inode_dirty() when file size expands > in order to flush metadata updates to HDD through sync() syscall or > background_writeout(). >Thanks for reading through this code and sending the patch. I find the I_DIRTY flags one of the more confusing parts of the generic fs writeback cdoe. But, I think what happens is the btrfs_set_page_dirty function calls __set_page_dirty_nobuffers() which does: if (mapping->host) { /* !PageAnon && !swapper_space */ __mark_inode_dirty(mapping->host, I_DIRTY_PAGES); } This should be enough to make sure the btrfs inodes are processed by background writeout and sync(). Please let me know if I''m misreading things. -chris -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Hisashi Hifumi
2009-Feb-03 00:36 UTC
Re: [PATCH] btrfs: call mark_inode_dirty when i_size is updated
At 23:12 09/02/02, Chris Mason wrote:>On Mon, 2009-02-02 at 20:00 +0900, Hisashi Hifumi wrote: >> Hi Chris. >> >> I think it is needed to call mark_inode_dirty() when file size expands >> in order to flush metadata updates to HDD through sync() syscall or >> background_writeout(). >> > >Thanks for reading through this code and sending the patch. > >I find the I_DIRTY flags one of the more confusing parts of the generic >fs writeback cdoe. But, I think what happens is the >btrfs_set_page_dirty function calls __set_page_dirty_nobuffers() which >does: > >if (mapping->host) { > /* !PageAnon && !swapper_space */ > __mark_inode_dirty(mapping->host, I_DIRTY_PAGES); >} > >This should be enough to make sure the btrfs inodes are processed by >background writeout and sync(). Please let me know if I''m misreading >things.Surely, as you pointed out, btrfs_set_page_dirty calls if (mapping->host) { /* !PageAnon && !swapper_space */ __mark_inode_dirty(mapping->host, I_DIRTY_PAGES); } through _set_page_dirty_nobuffers. But I_DIRTY_PAGES is not sufficient. To flush metadata update to HDD through sync(), I_DIRTY_SYNC or I_DIRTY_DATASYNC flag is needed. see __sync_single_inode. Thanks. -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Chris Mason
2009-Feb-03 01:04 UTC
Re: [PATCH] btrfs: call mark_inode_dirty when i_size is updated
On Tue, 2009-02-03 at 09:36 +0900, Hisashi Hifumi wrote:> At 23:12 09/02/02, Chris Mason wrote: > >On Mon, 2009-02-02 at 20:00 +0900, Hisashi Hifumi wrote: > >> Hi Chris. > >> > >> I think it is needed to call mark_inode_dirty() when file size expands > >> in order to flush metadata updates to HDD through sync() syscall or > >> background_writeout(). > >> > > > >Thanks for reading through this code and sending the patch. > > > >I find the I_DIRTY flags one of the more confusing parts of the generic > >fs writeback cdoe. But, I think what happens is the > >btrfs_set_page_dirty function calls __set_page_dirty_nobuffers() which > >does: > > > >if (mapping->host) { > > /* !PageAnon && !swapper_space */ > > __mark_inode_dirty(mapping->host, I_DIRTY_PAGES); > >} > > > >This should be enough to make sure the btrfs inodes are processed by > >background writeout and sync(). Please let me know if I''m misreading > >things. > > Surely, as you pointed out, btrfs_set_page_dirty calls > if (mapping->host) { > /* !PageAnon && !swapper_space */ > __mark_inode_dirty(mapping->host, I_DIRTY_PAGES); > } > through _set_page_dirty_nobuffers. > But I_DIRTY_PAGES is not sufficient. > To flush metadata update to HDD through sync(), I_DIRTY_SYNC or > I_DIRTY_DATASYNC flag is needed. see __sync_single_inode.Since btrfs uses a dirty_inode callback, our inodes are never really dirty. The btree metadata always has the same information as the in-core inode does. The extra transaction commit steps taken at sync time are enough to get all the relevant metadata on disk. So, I think what happens is that I_DIRTY_PAGES is enough to get the data pages on disk and the transaction commit gets the metadata on disk. -chris -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Hisashi Hifumi
2009-Feb-03 02:43 UTC
Re: [PATCH] btrfs: call mark_inode_dirty when i_size is updated
At 10:04 09/02/03, Chris Mason wrote:>On Tue, 2009-02-03 at 09:36 +0900, Hisashi Hifumi wrote: >> At 23:12 09/02/02, Chris Mason wrote: >> >On Mon, 2009-02-02 at 20:00 +0900, Hisashi Hifumi wrote: >> >> Hi Chris. >> >> >> >> I think it is needed to call mark_inode_dirty() when file size expands >> >> in order to flush metadata updates to HDD through sync() syscall or >> >> background_writeout(). >> >> >> > >> >Thanks for reading through this code and sending the patch. >> > >> >I find the I_DIRTY flags one of the more confusing parts of the generic >> >fs writeback cdoe. But, I think what happens is the >> >btrfs_set_page_dirty function calls __set_page_dirty_nobuffers() which >> >does: >> > >> >if (mapping->host) { >> > /* !PageAnon && !swapper_space */ >> > __mark_inode_dirty(mapping->host, I_DIRTY_PAGES); >> >} >> > >> >This should be enough to make sure the btrfs inodes are processed by >> >background writeout and sync(). Please let me know if I''m misreading >> >things. >> >> Surely, as you pointed out, btrfs_set_page_dirty calls >> if (mapping->host) { >> /* !PageAnon && !swapper_space */ >> __mark_inode_dirty(mapping->host, I_DIRTY_PAGES); >> } >> through _set_page_dirty_nobuffers. >> But I_DIRTY_PAGES is not sufficient. >> To flush metadata update to HDD through sync(), I_DIRTY_SYNC or >> I_DIRTY_DATASYNC flag is needed. see __sync_single_inode. > >Since btrfs uses a dirty_inode callback, our inodes are never really >dirty. The btree metadata always has the same information as the in-core >inode does. > >The extra transaction commit steps taken at sync time are enough to get >all the relevant metadata on disk. > >So, I think what happens is that I_DIRTY_PAGES is enough to get the data >pages on disk and the transaction commit gets the metadata on disk.metadata update transaction is made through dirty_inode call back, but to run dirty_inode callback I_DIRTY_SYNC or I_DIRTY_DATASYNC flag is needed. (see __mark_inode_dirty). Also, to commit transaction to disk through write_inode callback, I_DIRTY_SYNC or I_DIRTY_DATASYNC flag is needed.(see _sync_single_inode) So I think my patch fixes this issue. -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Chris Mason
2009-Feb-03 16:22 UTC
Re: [PATCH] btrfs: call mark_inode_dirty when i_size is updated
On Tue, 2009-02-03 at 11:43 +0900, Hisashi Hifumi wrote:> At 10:04 09/02/03, Chris Mason wrote: > >On Tue, 2009-02-03 at 09:36 +0900, Hisashi Hifumi wrote: > >> At 23:12 09/02/02, Chris Mason wrote: > >> >On Mon, 2009-02-02 at 20:00 +0900, Hisashi Hifumi wrote: > >> >> Hi Chris. > >> >> > >> >> I think it is needed to call mark_inode_dirty() when file size expands > >> >> in order to flush metadata updates to HDD through sync() syscall or > >> >> background_writeout(). > >> >> > >> > > >> >Thanks for reading through this code and sending the patch. > >> > > >> >I find the I_DIRTY flags one of the more confusing parts of the generic > >> >fs writeback cdoe. But, I think what happens is the > >> >btrfs_set_page_dirty function calls __set_page_dirty_nobuffers() which > >> >does: > >> > > >> >if (mapping->host) { > >> > /* !PageAnon && !swapper_space */ > >> > __mark_inode_dirty(mapping->host, I_DIRTY_PAGES); > >> >} > >> > > >> >This should be enough to make sure the btrfs inodes are processed by > >> >background writeout and sync(). Please let me know if I''m misreading > >> >things. > >> > >> Surely, as you pointed out, btrfs_set_page_dirty calls > >> if (mapping->host) { > >> /* !PageAnon && !swapper_space */ > >> __mark_inode_dirty(mapping->host, I_DIRTY_PAGES); > >> } > >> through _set_page_dirty_nobuffers. > >> But I_DIRTY_PAGES is not sufficient. > >> To flush metadata update to HDD through sync(), I_DIRTY_SYNC or > >> I_DIRTY_DATASYNC flag is needed. see __sync_single_inode. > > > >Since btrfs uses a dirty_inode callback, our inodes are never really > >dirty. The btree metadata always has the same information as the in-core > >inode does. > > > >The extra transaction commit steps taken at sync time are enough to get > >all the relevant metadata on disk. > > > >So, I think what happens is that I_DIRTY_PAGES is enough to get the data > >pages on disk and the transaction commit gets the metadata on disk. > > > metadata update transaction is made through dirty_inode call back, but > to run dirty_inode callback I_DIRTY_SYNC or I_DIRTY_DATASYNC flag is needed. > (see __mark_inode_dirty).Btrfs is using btrfs_update_inode to get inode updates into the btree when it changes things. The only places that trigger the dirty_inode callback are the generic code that do atime/mtime/ctime updates and a few other generic VFS operations.> > Also, to commit transaction to disk through write_inode callback, I_DIRTY_SYNC > or I_DIRTY_DATASYNC flag is needed.(see _sync_single_inode) > > So I think my patch fixes this issue.I think I don''t fully understand which caller of write_inode (or __sync_single_inode) we''re missing. For O_DIRECT and O_SYNC, btrfs_file_write is responsible for getting the metadata on disk. For sync(), the btrfs super operations do a full transaction commit. In all of the above cases, the btree is always up to date with the contents of the vfs inode. We just need to make sure a transaction commit (or log operation) is done for the file. -chris -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html