Christoph Hellwig
2023-Jan-18 17:30 UTC
[Ocfs2-devel] remove most callers of write_one_page v3
Hi all, this series removes most users of the write_one_page API. These helpers internally call ->writepage which we are gradually removing from the kernel. Changes since v2: - more minix error handling fixes Changes since v1: - drop the btrfs changes (queue up in the btrfs tree) - drop the finaly move to jfs (can't be done without the btrfs patches) - fix the existing minix code to properly propagate errors Diffstat: minix/dir.c | 62 +++++++++++++++++++++++++++------------------------ minix/minix.h | 3 +- minix/namei.c | 27 ++++++++++++---------- ocfs2/refcounttree.c | 9 ++++--- sysv/dir.c | 30 +++++++++++++++--------- ufs/dir.c | 29 +++++++++++++++-------- 6 files changed, 94 insertions(+), 66 deletions(-)
Christoph Hellwig
2023-Jan-18 17:30 UTC
[Ocfs2-devel] [PATCH 1/7] minix: move releasing pages into unlink and rename
Instead of consuming the page reference and kmap in the low-level minix_delete_entry and minix_set_link helpers, do it in the callers where that code can be shared with the error cleanup path. Signed-off-by: Christoph Hellwig <hch at lst.de> --- fs/minix/dir.c | 2 -- fs/minix/namei.c | 19 ++++++++++--------- 2 files changed, 10 insertions(+), 11 deletions(-) diff --git a/fs/minix/dir.c b/fs/minix/dir.c index dcfe5b25378b54..ec462330e749af 100644 --- a/fs/minix/dir.c +++ b/fs/minix/dir.c @@ -306,7 +306,6 @@ int minix_delete_entry(struct minix_dir_entry *de, struct page *page) } else { unlock_page(page); } - dir_put_page(page); inode->i_ctime = inode->i_mtime = current_time(inode); mark_inode_dirty(inode); return err; @@ -430,7 +429,6 @@ void minix_set_link(struct minix_dir_entry *de, struct page *page, } else { unlock_page(page); } - dir_put_page(page); dir->i_mtime = dir->i_ctime = current_time(dir); mark_inode_dirty(dir); } diff --git a/fs/minix/namei.c b/fs/minix/namei.c index 8afdc408ca4fd5..5fc696e032c543 100644 --- a/fs/minix/namei.c +++ b/fs/minix/namei.c @@ -150,23 +150,23 @@ static int minix_mkdir(struct user_namespace *mnt_userns, struct inode *dir, static int minix_unlink(struct inode * dir, struct dentry *dentry) { - int err = -ENOENT; struct inode * inode = d_inode(dentry); struct page * page; struct minix_dir_entry * de; + int err; de = minix_find_entry(dentry, &page); if (!de) - goto end_unlink; - + return -ENOENT; err = minix_delete_entry(de, page); - if (err) - goto end_unlink; + kunmap(page); + put_page(page); + if (err) + return err; inode->i_ctime = dir->i_ctime; inode_dec_link_count(inode); -end_unlink: - return err; + return 0; } static int minix_rmdir(struct inode * dir, struct dentry *dentry) @@ -223,7 +223,10 @@ static int minix_rename(struct user_namespace *mnt_userns, new_de = minix_find_entry(new_dentry, &new_page); if (!new_de) goto out_dir; + err = 0; minix_set_link(new_de, new_page, old_inode); + kunmap(new_page); + put_page(new_page); new_inode->i_ctime = current_time(new_inode); if (dir_de) drop_nlink(new_inode); @@ -243,8 +246,6 @@ static int minix_rename(struct user_namespace *mnt_userns, minix_set_link(dir_de, dir_page, new_dir); inode_dec_link_count(old_dir); } - return 0; - out_dir: if (dir_de) { kunmap(dir_page); -- 2.39.0
Christoph Hellwig
2023-Jan-18 17:30 UTC
[Ocfs2-devel] [PATCH 2/7] minix: fix error handling in minix_delete_entry
If minix_prepare_chunk fails, updating c/mtime and marking the dir inode dirty is wrong, as the inode hasn't been modified. Signed-off-by: Christoph Hellwig <hch at lst.de> --- fs/minix/dir.c | 18 ++++++++++-------- 1 file changed, 10 insertions(+), 8 deletions(-) diff --git a/fs/minix/dir.c b/fs/minix/dir.c index ec462330e749af..242e179aa1fbeb 100644 --- a/fs/minix/dir.c +++ b/fs/minix/dir.c @@ -297,18 +297,20 @@ int minix_delete_entry(struct minix_dir_entry *de, struct page *page) lock_page(page); err = minix_prepare_chunk(page, pos, len); - if (err == 0) { - if (sbi->s_version == MINIX_V3) - ((minix3_dirent *) de)->inode = 0; - else - de->inode = 0; - err = dir_commit_chunk(page, pos, len); - } else { + if (err) { unlock_page(page); + return err; } + if (sbi->s_version == MINIX_V3) + ((minix3_dirent *)de)->inode = 0; + else + de->inode = 0; + err = dir_commit_chunk(page, pos, len); + if (err) + return err; inode->i_ctime = inode->i_mtime = current_time(inode); mark_inode_dirty(inode); - return err; + return 0; } int minix_make_empty(struct inode *inode, struct inode *dir) -- 2.39.0
Christoph Hellwig
2023-Jan-18 17:30 UTC
[Ocfs2-devel] [PATCH 3/7] minix: fix error handling in minix_set_link
If minix_prepare_chunk fails, updating c/mtime and marking the dir inode dirty is wrong, as the inode hasn't been modified. Also propagate the error to the caller. Note that this moves the dir_put_page call later, but that matches other uses of this helper in the directory code. Signed-off-by: Christoph Hellwig <hch at lst.de> --- fs/minix/dir.c | 22 ++++++++++++---------- fs/minix/minix.h | 3 ++- fs/minix/namei.c | 10 ++++++---- 3 files changed, 20 insertions(+), 15 deletions(-) diff --git a/fs/minix/dir.c b/fs/minix/dir.c index 242e179aa1fbeb..34c1cdb5dc7d47 100644 --- a/fs/minix/dir.c +++ b/fs/minix/dir.c @@ -410,8 +410,8 @@ int minix_empty_dir(struct inode * inode) } /* Releases the page */ -void minix_set_link(struct minix_dir_entry *de, struct page *page, - struct inode *inode) +int minix_set_link(struct minix_dir_entry *de, struct page *page, + struct inode *inode) { struct inode *dir = page->mapping->host; struct minix_sb_info *sbi = minix_sb(dir->i_sb); @@ -420,19 +420,21 @@ void minix_set_link(struct minix_dir_entry *de, struct page *page, int err; lock_page(page); - err = minix_prepare_chunk(page, pos, sbi->s_dirsize); - if (err == 0) { - if (sbi->s_version == MINIX_V3) - ((minix3_dirent *) de)->inode = inode->i_ino; - else - de->inode = inode->i_ino; - err = dir_commit_chunk(page, pos, sbi->s_dirsize); - } else { + if (err) { unlock_page(page); + return err; } + if (sbi->s_version == MINIX_V3) + ((minix3_dirent *)de)->inode = inode->i_ino; + else + de->inode = inode->i_ino; + err = dir_commit_chunk(page, pos, sbi->s_dirsize); + if (err) + return err; dir->i_mtime = dir->i_ctime = current_time(dir); mark_inode_dirty(dir); + return 0; } struct minix_dir_entry * minix_dotdot (struct inode *dir, struct page **p) diff --git a/fs/minix/minix.h b/fs/minix/minix.h index 20217336802570..8f7a636bd1b241 100644 --- a/fs/minix/minix.h +++ b/fs/minix/minix.h @@ -69,7 +69,8 @@ extern int minix_add_link(struct dentry*, struct inode*); extern int minix_delete_entry(struct minix_dir_entry*, struct page*); extern int minix_make_empty(struct inode*, struct inode*); extern int minix_empty_dir(struct inode*); -extern void minix_set_link(struct minix_dir_entry*, struct page*, struct inode*); +int minix_set_link(struct minix_dir_entry *de, struct page *page, + struct inode *inode); extern struct minix_dir_entry *minix_dotdot(struct inode*, struct page**); extern ino_t minix_inode_by_name(struct dentry*); diff --git a/fs/minix/namei.c b/fs/minix/namei.c index 5fc696e032c543..bfbadd85d01032 100644 --- a/fs/minix/namei.c +++ b/fs/minix/namei.c @@ -223,10 +223,11 @@ static int minix_rename(struct user_namespace *mnt_userns, new_de = minix_find_entry(new_dentry, &new_page); if (!new_de) goto out_dir; - err = 0; - minix_set_link(new_de, new_page, old_inode); + err = minix_set_link(new_de, new_page, old_inode); kunmap(new_page); put_page(new_page); + if (err) + goto out_dir; new_inode->i_ctime = current_time(new_inode); if (dir_de) drop_nlink(new_inode); @@ -243,8 +244,9 @@ static int minix_rename(struct user_namespace *mnt_userns, mark_inode_dirty(old_inode); if (dir_de) { - minix_set_link(dir_de, dir_page, new_dir); - inode_dec_link_count(old_dir); + err = minix_set_link(dir_de, dir_page, new_dir); + if (!err) + inode_dec_link_count(old_dir); } out_dir: if (dir_de) { -- 2.39.0
Christoph Hellwig
2023-Jan-18 17:30 UTC
[Ocfs2-devel] [PATCH 4/7] minix: don't flush page immediately for DIRSYNC directories
We do not need to writeout modified directory blocks immediately when modifying them while the page is locked. It is enough to do the flush somewhat later which has the added benefit that inode times can be flushed as well. It also allows us to stop depending on write_one_page() function. Ported from an ext2 patch by Jan Kara. Signed-off-by: Christoph Hellwig <hch at lst.de> --- fs/minix/dir.c | 36 ++++++++++++++++++++---------------- 1 file changed, 20 insertions(+), 16 deletions(-) diff --git a/fs/minix/dir.c b/fs/minix/dir.c index 34c1cdb5dc7d47..bf9858f76b6ae1 100644 --- a/fs/minix/dir.c +++ b/fs/minix/dir.c @@ -46,21 +46,27 @@ minix_last_byte(struct inode *inode, unsigned long page_nr) return last_byte; } -static int dir_commit_chunk(struct page *page, loff_t pos, unsigned len) +static void dir_commit_chunk(struct page *page, loff_t pos, unsigned len) { struct address_space *mapping = page->mapping; struct inode *dir = mapping->host; - int err = 0; + block_write_end(NULL, mapping, pos, len, len, page, NULL); if (pos+len > dir->i_size) { i_size_write(dir, pos+len); mark_inode_dirty(dir); } - if (IS_DIRSYNC(dir)) - err = write_one_page(page); - else - unlock_page(page); + unlock_page(page); +} + +static int minix_handle_dirsync(struct inode *dir) +{ + int err; + + err = filemap_write_and_wait(dir->i_mapping); + if (!err) + err = sync_inode_metadata(dir, 1); return err; } @@ -274,9 +280,10 @@ int minix_add_link(struct dentry *dentry, struct inode *inode) memset (namx + namelen, 0, sbi->s_dirsize - namelen - 2); de->inode = inode->i_ino; } - err = dir_commit_chunk(page, pos, sbi->s_dirsize); + dir_commit_chunk(page, pos, sbi->s_dirsize); dir->i_mtime = dir->i_ctime = current_time(dir); mark_inode_dirty(dir); + err = minix_handle_dirsync(dir); out_put: dir_put_page(page); out: @@ -305,12 +312,10 @@ int minix_delete_entry(struct minix_dir_entry *de, struct page *page) ((minix3_dirent *)de)->inode = 0; else de->inode = 0; - err = dir_commit_chunk(page, pos, len); - if (err) - return err; + dir_commit_chunk(page, pos, len); inode->i_ctime = inode->i_mtime = current_time(inode); mark_inode_dirty(inode); - return 0; + return minix_handle_dirsync(inode); } int minix_make_empty(struct inode *inode, struct inode *dir) @@ -350,7 +355,8 @@ int minix_make_empty(struct inode *inode, struct inode *dir) } kunmap_atomic(kaddr); - err = dir_commit_chunk(page, 0, 2 * sbi->s_dirsize); + dir_commit_chunk(page, 0, 2 * sbi->s_dirsize); + err = minix_handle_dirsync(inode); fail: put_page(page); return err; @@ -429,12 +435,10 @@ int minix_set_link(struct minix_dir_entry *de, struct page *page, ((minix3_dirent *)de)->inode = inode->i_ino; else de->inode = inode->i_ino; - err = dir_commit_chunk(page, pos, sbi->s_dirsize); - if (err) - return err; + dir_commit_chunk(page, pos, sbi->s_dirsize); dir->i_mtime = dir->i_ctime = current_time(dir); mark_inode_dirty(dir); - return 0; + return minix_handle_dirsync(dir); } struct minix_dir_entry * minix_dotdot (struct inode *dir, struct page **p) -- 2.39.0
Christoph Hellwig
2023-Jan-18 17:30 UTC
[Ocfs2-devel] [PATCH 5/7] sysv: don't flush page immediately for DIRSYNC directories
We do not need to writeout modified directory blocks immediately when modifying them while the page is locked. It is enough to do the flush somewhat later which has the added benefit that inode times can be flushed as well. It also allows us to stop depending on write_one_page() function. Ported from an ext2 patch by Jan Kara. Signed-off-by: Christoph Hellwig <hch at lst.de> --- fs/sysv/dir.c | 30 +++++++++++++++++++----------- 1 file changed, 19 insertions(+), 11 deletions(-) diff --git a/fs/sysv/dir.c b/fs/sysv/dir.c index 88e38cd8f5c9ae..16730795a62161 100644 --- a/fs/sysv/dir.c +++ b/fs/sysv/dir.c @@ -34,21 +34,26 @@ static inline void dir_put_page(struct page *page) put_page(page); } -static int dir_commit_chunk(struct page *page, loff_t pos, unsigned len) +static void dir_commit_chunk(struct page *page, loff_t pos, unsigned len) { struct address_space *mapping = page->mapping; struct inode *dir = mapping->host; - int err = 0; block_write_end(NULL, mapping, pos, len, len, page, NULL); if (pos+len > dir->i_size) { i_size_write(dir, pos+len); mark_inode_dirty(dir); } - if (IS_DIRSYNC(dir)) - err = write_one_page(page); - else - unlock_page(page); + unlock_page(page); +} + +static int sysv_handle_dirsync(struct inode *dir) +{ + int err; + + err = filemap_write_and_wait(dir->i_mapping); + if (!err) + err = sync_inode_metadata(dir, 1); return err; } @@ -215,9 +220,10 @@ int sysv_add_link(struct dentry *dentry, struct inode *inode) memcpy (de->name, name, namelen); memset (de->name + namelen, 0, SYSV_DIRSIZE - namelen - 2); de->inode = cpu_to_fs16(SYSV_SB(inode->i_sb), inode->i_ino); - err = dir_commit_chunk(page, pos, SYSV_DIRSIZE); + dir_commit_chunk(page, pos, SYSV_DIRSIZE); dir->i_mtime = dir->i_ctime = current_time(dir); mark_inode_dirty(dir); + err = sysv_handle_dirsync(dir); out_page: dir_put_page(page); out: @@ -238,11 +244,11 @@ int sysv_delete_entry(struct sysv_dir_entry *de, struct page *page) err = sysv_prepare_chunk(page, pos, SYSV_DIRSIZE); BUG_ON(err); de->inode = 0; - err = dir_commit_chunk(page, pos, SYSV_DIRSIZE); + dir_commit_chunk(page, pos, SYSV_DIRSIZE); dir_put_page(page); inode->i_ctime = inode->i_mtime = current_time(inode); mark_inode_dirty(inode); - return err; + return sysv_handle_dirsync(inode); } int sysv_make_empty(struct inode *inode, struct inode *dir) @@ -272,7 +278,8 @@ int sysv_make_empty(struct inode *inode, struct inode *dir) strcpy(de->name,".."); kunmap(page); - err = dir_commit_chunk(page, 0, 2 * SYSV_DIRSIZE); + dir_commit_chunk(page, 0, 2 * SYSV_DIRSIZE); + err = sysv_handle_dirsync(inode); fail: put_page(page); return err; @@ -336,10 +343,11 @@ void sysv_set_link(struct sysv_dir_entry *de, struct page *page, err = sysv_prepare_chunk(page, pos, SYSV_DIRSIZE); BUG_ON(err); de->inode = cpu_to_fs16(SYSV_SB(inode->i_sb), inode->i_ino); - err = dir_commit_chunk(page, pos, SYSV_DIRSIZE); + dir_commit_chunk(page, pos, SYSV_DIRSIZE); dir_put_page(page); dir->i_mtime = dir->i_ctime = current_time(dir); mark_inode_dirty(dir); + sysv_handle_dirsync(inode); } struct sysv_dir_entry * sysv_dotdot (struct inode *dir, struct page **p) -- 2.39.0
Christoph Hellwig
2023-Jan-18 17:30 UTC
[Ocfs2-devel] [PATCH 6/7] ufs: don't flush page immediately for DIRSYNC directories
We do not need to writeout modified directory blocks immediately when modifying them while the page is locked. It is enough to do the flush somewhat later which has the added benefit that inode times can be flushed as well. It also allows us to stop depending on write_one_page() function. Ported from an ext2 patch by Jan Kara. Signed-off-by: Christoph Hellwig <hch at lst.de> --- fs/ufs/dir.c | 29 +++++++++++++++++++---------- 1 file changed, 19 insertions(+), 10 deletions(-) diff --git a/fs/ufs/dir.c b/fs/ufs/dir.c index 391efaf1d52897..379d75796a5ce3 100644 --- a/fs/ufs/dir.c +++ b/fs/ufs/dir.c @@ -42,11 +42,10 @@ static inline int ufs_match(struct super_block *sb, int len, return !memcmp(name, de->d_name, len); } -static int ufs_commit_chunk(struct page *page, loff_t pos, unsigned len) +static void ufs_commit_chunk(struct page *page, loff_t pos, unsigned len) { struct address_space *mapping = page->mapping; struct inode *dir = mapping->host; - int err = 0; inode_inc_iversion(dir); block_write_end(NULL, mapping, pos, len, len, page, NULL); @@ -54,10 +53,16 @@ static int ufs_commit_chunk(struct page *page, loff_t pos, unsigned len) i_size_write(dir, pos+len); mark_inode_dirty(dir); } - if (IS_DIRSYNC(dir)) - err = write_one_page(page); - else - unlock_page(page); + unlock_page(page); +} + +static int ufs_handle_dirsync(struct inode *dir) +{ + int err; + + err = filemap_write_and_wait(dir->i_mapping); + if (!err) + err = sync_inode_metadata(dir, 1); return err; } @@ -99,11 +104,12 @@ void ufs_set_link(struct inode *dir, struct ufs_dir_entry *de, de->d_ino = cpu_to_fs32(dir->i_sb, inode->i_ino); ufs_set_de_type(dir->i_sb, de, inode->i_mode); - err = ufs_commit_chunk(page, pos, len); + ufs_commit_chunk(page, pos, len); ufs_put_page(page); if (update_times) dir->i_mtime = dir->i_ctime = current_time(dir); mark_inode_dirty(dir); + ufs_handle_dirsync(dir); } @@ -390,10 +396,11 @@ int ufs_add_link(struct dentry *dentry, struct inode *inode) de->d_ino = cpu_to_fs32(sb, inode->i_ino); ufs_set_de_type(sb, de, inode->i_mode); - err = ufs_commit_chunk(page, pos, rec_len); + ufs_commit_chunk(page, pos, rec_len); dir->i_mtime = dir->i_ctime = current_time(dir); mark_inode_dirty(dir); + err = ufs_handle_dirsync(dir); /* OFFSET_CACHE */ out_put: ufs_put_page(page); @@ -531,9 +538,10 @@ int ufs_delete_entry(struct inode *inode, struct ufs_dir_entry *dir, if (pde) pde->d_reclen = cpu_to_fs16(sb, to - from); dir->d_ino = 0; - err = ufs_commit_chunk(page, pos, to - from); + ufs_commit_chunk(page, pos, to - from); inode->i_ctime = inode->i_mtime = current_time(inode); mark_inode_dirty(inode); + err = ufs_handle_dirsync(inode); out: ufs_put_page(page); UFSD("EXIT\n"); @@ -579,7 +587,8 @@ int ufs_make_empty(struct inode * inode, struct inode *dir) strcpy (de->d_name, ".."); kunmap(page); - err = ufs_commit_chunk(page, 0, chunk_size); + ufs_commit_chunk(page, 0, chunk_size); + err = ufs_handle_dirsync(inode); fail: put_page(page); return err; -- 2.39.0
Christoph Hellwig
2023-Jan-18 17:30 UTC
[Ocfs2-devel] [PATCH 7/7] ocfs2: don't use write_one_page in ocfs2_duplicate_clusters_by_page
Use filemap_write_and_wait_range to write back the range of the dirty page instead of write_one_page in preparation of removing write_one_page and eventually ->writepage. Signed-off-by: Christoph Hellwig <hch at lst.de> Reviewed-by: Jan Kara <jack at suse.cz> Reviewed-by: Joseph Qi <joseph.qi at linux.alibaba.com> --- fs/ocfs2/refcounttree.c | 9 +++++---- 1 file changed, 5 insertions(+), 4 deletions(-) diff --git a/fs/ocfs2/refcounttree.c b/fs/ocfs2/refcounttree.c index 623db358b1efa8..4a73405962ec4f 100644 --- a/fs/ocfs2/refcounttree.c +++ b/fs/ocfs2/refcounttree.c @@ -2952,10 +2952,11 @@ int ocfs2_duplicate_clusters_by_page(handle_t *handle, */ if (PAGE_SIZE <= OCFS2_SB(sb)->s_clustersize) { if (PageDirty(page)) { - /* - * write_on_page will unlock the page on return - */ - ret = write_one_page(page); + unlock_page(page); + put_page(page); + + ret = filemap_write_and_wait_range(mapping, + offset, map_end - 1); goto retry; } } -- 2.39.0
Andrew Morton
2023-Jan-18 21:09 UTC
[Ocfs2-devel] remove most callers of write_one_page v3
On Wed, 18 Jan 2023 18:30:20 +0100 Christoph Hellwig <hch at lst.de> wrote:> this series removes most users of the write_one_page API. These helpers > internally call ->writepage which we are gradually removing from the > kernel.I grabbed these. If Stephen's duplicated patch check tells me that they're also in the VFS tree, I'll drop them again.
On Wed, Jan 18, 2023 at 06:30:20PM +0100, Christoph Hellwig wrote:> Hi all, > > this series removes most users of the write_one_page API. These helpers > internally call ->writepage which we are gradually removing from the > kernel. > > Changes since v2: > - more minix error handling fixes > > Changes since v1: > - drop the btrfs changes (queue up in the btrfs tree) > - drop the finaly move to jfs (can't be done without the btrfs patches) > - fix the existing minix code to properly propagate errorsGrabbed. I'll split that into per-fs branches (and synchronize wrt fixes), fold the kmap_local stuff into that and push out.
Reasonably Related Threads
- remove most callers of write_one_page v4
- + ufs-dont-flush-page-immediately-for-dirsync-directories.patch added to mm-unstable branch
- [PATCH] btrfs: call mark_inode_dirty when i_size is updated
- [git patches] ocfs2 fixes
- [PATCH] fs: push file_update_time into ->page_mkwrite