On Wed, Feb 19, 2020 at 01:00:39PM -0800, Matthew Wilcox wrote:> From: "Matthew Wilcox (Oracle)" <willy at infradead.org> > > This series adds a readahead address_space operation to eventually > replace the readpages operation. The key difference is that > pages are added to the page cache as they are allocated (and > then looked up by the filesystem) instead of passing them on a > list to the readpages operation and having the filesystem add > them to the page cache. It's a net reduction in code for each > implementation, more efficient than walking a list, and solves > the direct-write vs buffered-read problem reported by yu kuai at > https://urldefense.com/v3/__https://lore.kernel.org/linux-fsdevel/20200116063601.39201-1-yukuai3 at huawei.com/__;!!GqivPVa7Brio!JjIyuowjiK6CuG-UXuzAOn71T4G5dvcCU6C8-tHRKBx6sebTRXJpyR_7FTTgyI5duGVQPw$ > > The only unconverted filesystems are those which use fscache. > Their conversion is pending Dave Howells' rewrite which will make the > conversion substantially easier. > > I want to thank the reviewers; Dave Chinner, John Hubbard and Christoph > Hellwig have done a marvellous job of providing constructive criticism. > Eric Biggers pointed out how I'd broken ext4 (which led to a substantial > change). I've tried to take it all on board, but I may have missed > something simply because you've done such a thorough job. > > This series can also be found at > https://urldefense.com/v3/__http://git.infradead.org/users/willy/linux-dax.git/shortlog/refs/tags/readahead_v7__;!!GqivPVa7Brio!JjIyuowjiK6CuG-UXuzAOn71T4G5dvcCU6C8-tHRKBx6sebTRXJpyR_7FTTgyI77Y0Obcg$ > (I also pushed the readahead_v6 tag there in case anyone wants to diff, and > they're both based on 5.6-rc2 so they're easy to diff) > > v7: > - Now passes an xfstests run on ext4!On btrfs it still chokes on the first test btrfs/001, with the following warning, the test is stuck there. [ 21.100922] WARNING: suspicious RCU usage [ 21.103107] 5.6.0-rc2-default+ #996 Not tainted [ 21.105133] ----------------------------- [ 21.106864] include/linux/xarray.h:1164 suspicious rcu_dereference_check() usage! [ 21.109948] [ 21.109948] other info that might help us debug this: [ 21.109948] [ 21.113373] [ 21.113373] rcu_scheduler_active = 2, debug_locks = 1 [ 21.115801] 4 locks held by umount/793: [ 21.117135] #0: ffff964a736890e8 (&type->s_umount_key#26){+.+.}, at: deactivate_super+0x2f/0x40 [ 21.120188] #1: ffff964a7347ba68 (&delayed_node->mutex){+.+.}, at: __btrfs_commit_inode_delayed_items+0x44c/0x4e0 [btrfs] [ 21.123042] #2: ffff964a612fe5c8 (&space_info->groups_sem){++++}, at: find_free_extent+0x27d/0xf00 [btrfs] [ 21.126068] #3: ffff964a60b93280 (&caching_ctl->mutex){+.+.}, at: btrfs_cache_block_group+0x1f0/0x500 [btrfs] [ 21.129655] [ 21.129655] stack backtrace: [ 21.131943] CPU: 1 PID: 793 Comm: umount Not tainted 5.6.0-rc2-default+ #996 [ 21.134164] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.12.0-59-gc9ba527-rebuilt.opensuse.org 04/01/2014 [ 21.138076] Call Trace: [ 21.139441] dump_stack+0x71/0xa0 [ 21.140954] xas_start+0x1a4/0x240 [ 21.142473] xas_load+0xa/0x50 [ 21.143874] xas_find+0x226/0x280 [ 21.145298] extent_readahead+0xcb/0x4f0 [btrfs] [ 21.146934] ? mem_cgroup_commit_charge+0x56/0x400 [ 21.148654] ? rcu_read_lock_sched_held+0x5d/0x90 [ 21.150382] ? __add_to_page_cache_locked+0x327/0x380 [ 21.152155] read_pages+0x80/0x1f0 [ 21.153531] page_cache_readahead_unbounded+0x1b7/0x210 [ 21.155196] __load_free_space_cache+0x1c1/0x730 [btrfs] [ 21.157014] load_free_space_cache+0xb9/0x190 [btrfs] [ 21.158222] btrfs_cache_block_group+0x1f8/0x500 [btrfs] [ 21.159717] ? finish_wait+0x90/0x90 [ 21.160723] find_free_extent+0xa17/0xf00 [btrfs] [ 21.161798] ? kvm_sched_clock_read+0x14/0x30 [ 21.163022] ? sched_clock_cpu+0x10/0x120 [ 21.164361] btrfs_reserve_extent+0x9b/0x180 [btrfs] [ 21.165952] btrfs_alloc_tree_block+0xc1/0x350 [btrfs] [ 21.167680] ? __lock_acquire+0x272/0x1320 [ 21.169353] alloc_tree_block_no_bg_flush+0x4a/0x60 [btrfs] [ 21.171313] __btrfs_cow_block+0x143/0x7a0 [btrfs] [ 21.173080] btrfs_cow_block+0x15f/0x310 [btrfs] [ 21.174487] btrfs_search_slot+0x93b/0xf70 [btrfs] [ 21.175940] btrfs_lookup_inode+0x3a/0xc0 [btrfs] [ 21.177419] ? __btrfs_commit_inode_delayed_items+0x417/0x4e0 [btrfs] [ 21.179032] ? __btrfs_commit_inode_delayed_items+0x44c/0x4e0 [btrfs] [ 21.180787] __btrfs_update_delayed_inode+0x73/0x260 [btrfs] [ 21.182174] __btrfs_commit_inode_delayed_items+0x46c/0x4e0 [btrfs] [ 21.183907] ? btrfs_first_delayed_node+0x4c/0x90 [btrfs] [ 21.185204] __btrfs_run_delayed_items+0x8e/0x140 [btrfs] [ 21.186521] btrfs_commit_transaction+0x312/0xae0 [btrfs] [ 21.188142] ? btrfs_attach_transaction_barrier+0x1f/0x50 [btrfs] [ 21.189684] sync_filesystem+0x6e/0x90 [ 21.190878] generic_shutdown_super+0x22/0x100 [ 21.192693] kill_anon_super+0x14/0x30 [ 21.194389] btrfs_kill_super+0x12/0x20 [btrfs] [ 21.196078] deactivate_locked_super+0x2c/0x70 [ 21.197732] cleanup_mnt+0x100/0x160 [ 21.199033] task_work_run+0x90/0xc0 [ 21.200331] exit_to_usermode_loop+0x96/0xa0 [ 21.201744] do_syscall_64+0x1df/0x210 [ 21.203187] entry_SYSCALL_64_after_hwframe+0x49/0xbe
On Thu, Feb 20, 2020 at 06:54:00PM +0100, David Sterba wrote:> On Wed, Feb 19, 2020 at 01:00:39PM -0800, Matthew Wilcox wrote: > > From: "Matthew Wilcox (Oracle)" <willy at infradead.org> > > > > This series adds a readahead address_space operation to eventually > > replace the readpages operation. The key difference is that > > pages are added to the page cache as they are allocated (and > > then looked up by the filesystem) instead of passing them on a > > list to the readpages operation and having the filesystem add > > them to the page cache. It's a net reduction in code for each > > implementation, more efficient than walking a list, and solves > > the direct-write vs buffered-read problem reported by yu kuai at > > https://urldefense.com/v3/__https://lore.kernel.org/linux-fsdevel/20200116063601.39201-1-yukuai3 at huawei.com/__;!!GqivPVa7Brio!PSofOvlJ7CW1rg5JjImjIgIXeIgapJVME9s7Q5n9tSU2lQzOyy6wg4-tkac0Lhm2mHQg1Q$ > > > > The only unconverted filesystems are those which use fscache. > > Their conversion is pending Dave Howells' rewrite which will make the > > conversion substantially easier. > > > > I want to thank the reviewers; Dave Chinner, John Hubbard and Christoph > > Hellwig have done a marvellous job of providing constructive criticism. > > Eric Biggers pointed out how I'd broken ext4 (which led to a substantial > > change). I've tried to take it all on board, but I may have missed > > something simply because you've done such a thorough job. > > > > This series can also be found at > > https://urldefense.com/v3/__http://git.infradead.org/users/willy/linux-dax.git/shortlog/refs/tags/readahead_v7__;!!GqivPVa7Brio!PSofOvlJ7CW1rg5JjImjIgIXeIgapJVME9s7Q5n9tSU2lQzOyy6wg4-tkac0Lhk1B9Ks6g$ > > (I also pushed the readahead_v6 tag there in case anyone wants to diff, and > > they're both based on 5.6-rc2 so they're easy to diff) > > > > v7: > > - Now passes an xfstests run on ext4! > > On btrfs it still chokes on the first test btrfs/001, with the following > warning, the test is stuck there.Thanks. The warning actually wasn't the problem, but it did need to be addressed. I got a test system up & running with btrfs, and it's currently on generic/027 with the following patch: diff --git a/include/linux/pagemap.h b/include/linux/pagemap.h index 9c782c15f7f7..d23a224d2ad2 100644 --- a/include/linux/pagemap.h +++ b/include/linux/pagemap.h @@ -676,7 +676,7 @@ static inline unsigned int __readahead_batch(struct readahead_control *rac, struct page **array, unsigned int array_sz) { unsigned int i = 0; - XA_STATE(xas, &rac->mapping->i_pages, rac->_index); + XA_STATE(xas, &rac->mapping->i_pages, 0); struct page *page; BUG_ON(rac->_batch_count > rac->_nr_pages); @@ -684,6 +684,8 @@ static inline unsigned int __readahead_batch(struct readahead_control *rac, rac->_index += rac->_batch_count; rac->_batch_count = 0; + xas_set(&xas, rac->_index); + rcu_read_lock(); xas_for_each(&xas, page, rac->_index + rac->_nr_pages - 1) { VM_BUG_ON_PAGE(!PageLocked(page), page); VM_BUG_ON_PAGE(PageTail(page), page); @@ -702,6 +704,7 @@ static inline unsigned int __readahead_batch(struct readahead_control *rac, if (i == array_sz) break; } + rcu_read_unlock(); return i; }