Vlastimil Babka
2016-Apr-01 21:29 UTC
[PATCH v3 02/16] mm/compaction: support non-lru movable page migration
Might have been better as a separate migration patch and then a
compaction patch. It's prefixed mm/compaction, but most changes are
in mm/migrate.c

On 03/30/2016 09:12 AM, Minchan Kim wrote:
> We have allowed migration for only LRU pages until now and it was
> enough to make high-order pages. But recently, embedded systems (e.g.,
> webOS, Android) use lots of non-movable pages (e.g., zram, GPU memory)
> so we have seen several reports about troubles with small high-order
> allocations. For fixing the problem, there were several efforts
> (e.g., enhancing the compaction algorithm, SLUB fallback to 0-order pages,
> reserved memory, vmalloc and so on) but if there are lots of
> non-movable pages in the system, their solutions are void in the long run.
>
> So, this patch is to support a facility to change non-movable pages
> into movable ones. For the feature, this patch introduces functions
> related to migration to address_space_operations as well as some page
> flags.
>
> Basically, this patch supports two page flags and two functions related
> to page migration. The flag and page->mapping stability are protected
> by PG_lock.
>
>     PG_movable
>     PG_isolated
>
>     bool (*isolate_page) (struct page *, isolate_mode_t);
>     void (*putback_page) (struct page *);
>
> The duties of a subsystem that wants to make its pages migratable are
> as follows:
>
> 1. It should register an address_space to page->mapping then mark
> the page as PG_movable via __SetPageMovable.
>
> 2. It should mark the page as PG_isolated via SetPageIsolated
> if isolation is successful and return true.

Ah another thing to document (especially in the comments/Doc) is that
the subsystem must not expect anything to survive in page.lru (or
fields that union it) after having isolated successfully.

> 3. If migration is successful, it should clear PG_isolated and
> PG_movable of the page for free preparation then release the
> reference of the page to free.
>
> 4. If migration fails, the putback function of the subsystem should
> clear PG_isolated via ClearPageIsolated.
>
> 5. If a subsystem wants to release an isolated page, it should
> clear PG_isolated but not PG_movable. Instead, the VM will do it.

Under lock? Or just with ClearPageIsolated?

> Cc: Vlastimil Babka <vbabka at suse.cz>
> Cc: Mel Gorman <mgorman at suse.de>
> Cc: Hugh Dickins <hughd at google.com>
> Cc: dri-devel at lists.freedesktop.org
> Cc: virtualization at lists.linux-foundation.org
> Signed-off-by: Gioh Kim <gurugio at hanmail.net>
> Signed-off-by: Minchan Kim <minchan at kernel.org>
> ---
>  Documentation/filesystems/Locking      |   4 +
>  Documentation/filesystems/vfs.txt      |   5 +
>  fs/proc/page.c                         |   3 +
>  include/linux/fs.h                     |   2 +
>  include/linux/migrate.h                |   2 +
>  include/linux/page-flags.h             |  31 ++++++
>  include/uapi/linux/kernel-page-flags.h |   1 +
>  mm/compaction.c                        |  14 ++-
>  mm/migrate.c                           | 174 +++++++++++++++++++++++++++++----
>  9 files changed, 217 insertions(+), 19 deletions(-)
>
> diff --git a/Documentation/filesystems/Locking b/Documentation/filesystems/Locking
> index 619af9bfdcb3..0bb79560abb3 100644
> --- a/Documentation/filesystems/Locking
> +++ b/Documentation/filesystems/Locking
> @@ -195,7 +195,9 @@ unlocks and drops the reference.
>      int (*releasepage) (struct page *, int);
>      void (*freepage)(struct page *);
>      int (*direct_IO)(struct kiocb *, struct iov_iter *iter, loff_t offset);
> +    bool (*isolate_page) (struct page *, isolate_mode_t);
>      int (*migratepage)(struct address_space *, struct page *, struct page *);
> +    void (*putback_page) (struct page *);
>      int (*launder_page)(struct page *);
>      int (*is_partially_uptodate)(struct page *, unsigned long, unsigned long);
>      int (*error_remove_page)(struct address_space *, struct page *);
> @@ -219,7 +221,9 @@ invalidatepage:     yes
>  releasepage:        yes
>  freepage:           yes
>  direct_IO:
> +isolate_page:       yes
>  migratepage:        yes (both)
> +putback_page:       yes
>  launder_page:       yes
>  is_partially_uptodate:  yes
>  error_remove_page:  yes
> diff --git a/Documentation/filesystems/vfs.txt b/Documentation/filesystems/vfs.txt
> index b02a7d598258..4c1b6c3b4bc8 100644
> --- a/Documentation/filesystems/vfs.txt
> +++ b/Documentation/filesystems/vfs.txt
> @@ -592,9 +592,14 @@ struct address_space_operations {
>      int (*releasepage) (struct page *, int);
>      void (*freepage)(struct page *);
>      ssize_t (*direct_IO)(struct kiocb *, struct iov_iter *iter, loff_t offset);
> +    /* isolate a page for migration */
> +    bool (*isolate_page) (struct page *, isolate_mode_t);
>      /* migrate the contents of a page to the specified target */
>      int (*migratepage) (struct page *, struct page *);
> +    /* put the page back to right list */

... "after a failed migration" ?

> +    void (*putback_page) (struct page *);
>      int (*launder_page) (struct page *);
> +
>      int (*is_partially_uptodate) (struct page *, unsigned long,
>                      unsigned long);
>      void (*is_dirty_writeback) (struct page *, bool *, bool *);
> diff --git a/fs/proc/page.c b/fs/proc/page.c
> index 3ecd445e830d..ce3d08a4ad8d 100644
> --- a/fs/proc/page.c
> +++ b/fs/proc/page.c
> @@ -157,6 +157,9 @@ u64 stable_page_flags(struct page *page)
>      if (page_is_idle(page))
>          u |= 1 << KPF_IDLE;
>
> +    if (PageMovable(page))
> +        u |= 1 << KPF_MOVABLE;
> +
>      u |= kpf_copy_bit(k, KPF_LOCKED, PG_locked);
>
>      u |= kpf_copy_bit(k, KPF_SLAB, PG_slab);
> diff --git a/include/linux/fs.h b/include/linux/fs.h
> index da9e67d937e5..36f2d610e7a8 100644
> --- a/include/linux/fs.h
> +++ b/include/linux/fs.h
> @@ -401,6 +401,8 @@ struct address_space_operations {
>       */
>      int (*migratepage) (struct address_space *,
>              struct page *, struct page *, enum migrate_mode);
> +    bool (*isolate_page)(struct page *, isolate_mode_t);
> +    void (*putback_page)(struct page *);
>      int (*launder_page) (struct page *);
>      int (*is_partially_uptodate) (struct page *, unsigned long,
>                      unsigned long);
> diff --git a/include/linux/migrate.h b/include/linux/migrate.h
> index 9b50325e4ddf..404fbfefeb33 100644
> --- a/include/linux/migrate.h
> +++ b/include/linux/migrate.h
> @@ -37,6 +37,8 @@ extern int migrate_page(struct address_space *,
>              struct page *, struct page *, enum migrate_mode);
>  extern int migrate_pages(struct list_head *l, new_page_t new, free_page_t free,
>          unsigned long private, enum migrate_mode mode, int reason);
> +extern bool isolate_movable_page(struct page *page, isolate_mode_t mode);
> +extern void putback_movable_page(struct page *page);
>
>  extern int migrate_prep(void);
>  extern int migrate_prep_local(void);
> diff --git a/include/linux/page-flags.h b/include/linux/page-flags.h
> index f4ed4f1b0c77..77ebf8fdbc6e 100644
> --- a/include/linux/page-flags.h
> +++ b/include/linux/page-flags.h
> @@ -129,6 +129,10 @@ enum pageflags {
>
>      /* Compound pages. Stored in first tail page's flags */
>      PG_double_map = PG_private_2,
> +
> +    /* non-lru movable pages */
> +    PG_movable = PG_reclaim,
> +    PG_isolated = PG_owner_priv_1,

Documentation should probably state that these fields alias and
subsystems supporting the movable pages shouldn't use them elsewhere.

Also I'm a bit uncomfortable how isolate_movable_page() blindly expects
that page->mapping->a_ops->isolate_page exists for PageMovable() pages.
What if it's a false positive on a PG_reclaim page? Can we rely on
PG_reclaim always (and without races) implying PageLRU() so that we
don't even attempt isolate_movable_page()?

>  };
>
>  #ifndef __GENERATING_BOUNDS_H
> @@ -614,6 +618,33 @@ static inline void __ClearPageBalloon(struct page *page)
>      atomic_set(&page->_mapcount, -1);
>  }
>
> +#define PAGE_MOVABLE_MAPCOUNT_VALUE (-255)

IIRC this was what Gioh's previous attempts used instead of PG_movable?
Is it still needed? Doesn't it prevent a driver providing movable *and*
mapped pages?

If it's to distinguish the PG_reclaim alias that I mention above, it
seems like an overkill to me. Why would we need both a special mapcount
value and a flag? Checking that page->mapping->a_ops->isolate_page
exists before calling it should be enough to resolve the ambiguity?

> +
> +static inline int PageMovable(struct page *page)
> +{
> +    return ((test_bit(PG_movable, &(page)->flags) &&
> +        atomic_read(&page->_mapcount) == PAGE_MOVABLE_MAPCOUNT_VALUE)
> +        || PageBalloon(page));
> +}
> +
> +/* Caller should hold a PG_lock */
> +static inline void __SetPageMovable(struct page *page,
> +                struct address_space *mapping)
> +{
> +    page->mapping = mapping;
> +    __set_bit(PG_movable, &page->flags);
> +    atomic_set(&page->_mapcount, PAGE_MOVABLE_MAPCOUNT_VALUE);
> +}
> +
> +static inline void __ClearPageMovable(struct page *page)
> +{
> +    atomic_set(&page->_mapcount, -1);
> +    __clear_bit(PG_movable, &(page)->flags);
> +    page->mapping = NULL;
> +}
> +
> +PAGEFLAG(Isolated, isolated, PF_ANY);
> +
>  /*
>   * If network-based swap is enabled, sl*b must keep track of whether pages
>   * were allocated from pfmemalloc reserves.
> diff --git a/include/uapi/linux/kernel-page-flags.h b/include/uapi/linux/kernel-page-flags.h
> index 5da5f8751ce7..a184fd2434fa 100644
> --- a/include/uapi/linux/kernel-page-flags.h
> +++ b/include/uapi/linux/kernel-page-flags.h
> @@ -34,6 +34,7 @@
>  #define KPF_BALLOON     23
>  #define KPF_ZERO_PAGE   24
>  #define KPF_IDLE        25
> +#define KPF_MOVABLE     26
>
>
>  #endif /* _UAPILINUX_KERNEL_PAGE_FLAGS_H */
> diff --git a/mm/compaction.c b/mm/compaction.c
> index ccf97b02b85f..7557aedddaee 100644
> --- a/mm/compaction.c
> +++ b/mm/compaction.c
> @@ -703,7 +703,7 @@ isolate_migratepages_block(struct compact_control *cc, unsigned long low_pfn,
>
>          /*
>           * Check may be lockless but that's ok as we recheck later.
> -         * It's possible to migrate LRU pages and balloon pages
> +         * It's possible to migrate LRU and movable kernel pages.
>           * Skip any other type of page
>           */
>          is_lru = PageLRU(page);
> @@ -714,6 +714,18 @@ isolate_migratepages_block(struct compact_control *cc, unsigned long low_pfn,
>                  goto isolate_success;
>              }
>          }
> +
> +        if (unlikely(PageMovable(page)) &&
> +                    !PageIsolated(page)) {
> +            if (locked) {
> +                spin_unlock_irqrestore(&zone->lru_lock,
> +                            flags);
> +                locked = false;
> +            }
> +
> +            if (isolate_movable_page(page, isolate_mode))
> +                goto isolate_success;
> +        }
>      }
>
>      /*
> diff --git a/mm/migrate.c b/mm/migrate.c
> index 53529c805752..b56bf2b3fe8c 100644
> --- a/mm/migrate.c
> +++ b/mm/migrate.c
> @@ -73,6 +73,85 @@ int migrate_prep_local(void)
>      return 0;
>  }
>
> +bool isolate_movable_page(struct page *page, isolate_mode_t mode)
> +{
> +    bool ret = false;

Maintaining "ret" seems useless here. All the "goto out*" statements
are executed only when ret is false, and ret == true is returned by a
different return.

> +
> +    /*
> +     * Avoid burning cycles with pages that are yet under __free_pages(),
> +     * or just got freed under us.
> +     *
> +     * In case we 'win' a race for a movable page being freed under us and
> +     * raise its refcount preventing __free_pages() from doing its job
> +     * the put_page() at the end of this block will take care of
> +     * release this page, thus avoiding a nasty leakage.
> +     */
> +    if (unlikely(!get_page_unless_zero(page)))
> +        goto out;
> +
> +    /*
> +     * Check PG_movable before holding a PG_lock because page's owner
> +     * assumes anybody doesn't touch PG_lock of newly allocated page.
> +     */
> +    if (unlikely(!PageMovable(page)))
> +        goto out_putpage;
> +    /*
> +     * As movable pages are not isolated from LRU lists, concurrent
> +     * compaction threads can race against page migration functions
> +     * as well as race against the releasing a page.
> +     *
> +     * In order to avoid having an already isolated movable page
> +     * being (wrongly) re-isolated while it is under migration,
> +     * or to avoid attempting to isolate pages being released,
> +     * lets be sure we have the page lock
> +     * before proceeding with the movable page isolation steps.
> +     */
> +    if (unlikely(!trylock_page(page)))
> +        goto out_putpage;
> +
> +    if (!PageMovable(page) || PageIsolated(page))
> +        goto out_no_isolated;
> +
> +    ret = page->mapping->a_ops->isolate_page(page, mode);
> +    if (!ret)
> +        goto out_no_isolated;
> +
> +    WARN_ON_ONCE(!PageIsolated(page));
> +    unlock_page(page);
> +    return ret;
> +
> +out_no_isolated:
> +    unlock_page(page);
> +out_putpage:
> +    put_page(page);
> +out:
> +    return ret;
> +}
> +
> +/* It should be called on page which is PG_movable */
> +void putback_movable_page(struct page *page)
> +{
> +    /*
> +     * 'lock_page()' stabilizes the page and prevents races against
> +     * concurrent isolation threads attempting to re-isolate it.
> +     */
> +    VM_BUG_ON_PAGE(!PageMovable(page), page);
> +
> +    lock_page(page);
> +    if (PageIsolated(page)) {
> +        struct address_space *mapping;
> +
> +        mapping = page_mapping(page);
> +        mapping->a_ops->putback_page(page);
> +        WARN_ON_ONCE(PageIsolated(page));
> +    } else {
> +        __ClearPageMovable(page);
> +    }
> +    unlock_page(page);
> +    /* drop the extra ref count taken for movable page isolation */
> +    put_page(page);
> +}
> +
>  /*
>   * Put previously isolated pages back onto the appropriate lists
>   * from where they were once taken off for compaction/migration.
> @@ -94,10 +173,18 @@ void putback_movable_pages(struct list_head *l)
>          list_del(&page->lru);
>          dec_zone_page_state(page, NR_ISOLATED_ANON +
>                  page_is_file_cache(page));
> -        if (unlikely(isolated_balloon_page(page)))
> +        if (unlikely(isolated_balloon_page(page))) {
>              balloon_page_putback(page);
> -        else
> +        } else if (unlikely(PageMovable(page))) {
> +            if (PageIsolated(page)) {
> +                putback_movable_page(page);
> +            } else {
> +                __ClearPageMovable(page);

We don't do lock_page() here, so what prevents parallel compaction
isolating the same page?

> +                put_page(page);
> +            }
> +        } else {
>              putback_lru_page(page);
> +        }
>      }
>  }
>
> @@ -592,7 +679,7 @@ void migrate_page_copy(struct page *newpage, struct page *page)
>   ***********************************************************/
>
>  /*
> - * Common logic to directly migrate a single page suitable for
> + * Common logic to directly migrate a single LRU page suitable for
>   * pages that do not use PagePrivate/PagePrivate2.
>   *
>   * Pages are locked upon entry and exit.
> @@ -755,24 +842,54 @@ static int move_to_new_page(struct page *newpage, struct page *page,
>                  enum migrate_mode mode)
>  {
>      struct address_space *mapping;
> -    int rc;
> +    int rc = -EAGAIN;
> +    bool lru_movable = true;
>
>      VM_BUG_ON_PAGE(!PageLocked(page), page);
>      VM_BUG_ON_PAGE(!PageLocked(newpage), newpage);
>
>      mapping = page_mapping(page);
> -    if (!mapping)
> -        rc = migrate_page(mapping, newpage, page, mode);
> -    else if (mapping->a_ops->migratepage)
> -        /*
> -         * Most pages have a mapping and most filesystems provide a
> -         * migratepage callback. Anonymous pages are part of swap
> -         * space which also has its own migratepage callback. This
> -         * is the most common path for page migration.
> -         */
> -        rc = mapping->a_ops->migratepage(mapping, newpage, page, mode);
> -    else
> -        rc = fallback_migrate_page(mapping, newpage, page, mode);
> +    /*
> +     * In case of non-lru page, it could be released after
> +     * isolation step. In that case, we shouldn't try
> +     * fallback migration which was designed for LRU pages.
> +     *
> +     * The rule for such case is that subsystem should clear
> +     * PG_isolated but remains PG_movable so VM should catch
> +     * it and clear PG_movable for it.
> +     */
> +    if (unlikely(PageMovable(page))) {

Can false positive from PG_reclaim occur here?

> +        lru_movable = false;
> +        VM_BUG_ON_PAGE(!mapping, page);
> +        if (!PageIsolated(page)) {
> +            rc = MIGRATEPAGE_SUCCESS;
> +            __ClearPageMovable(page);
> +            goto out;
> +        }
> +    }
> +
> +    if (likely(lru_movable)) {
> +        if (!mapping)
> +            rc = migrate_page(mapping, newpage, page, mode);
> +        else if (mapping->a_ops->migratepage)
> +            /*
> +             * Most pages have a mapping and most filesystems
> +             * provide a migratepage callback. Anonymous pages
> +             * are part of swap space which also has its own
> +             * migratepage callback. This is the most common path
> +             * for page migration.
> +             */
> +            rc = mapping->a_ops->migratepage(mapping, newpage,
> +                            page, mode);
> +        else
> +            rc = fallback_migrate_page(mapping, newpage,
> +                            page, mode);
> +    } else {
> +        rc = mapping->a_ops->migratepage(mapping, newpage,
> +                        page, mode);
> +        WARN_ON_ONCE(rc == MIGRATEPAGE_SUCCESS &&
> +                PageIsolated(page));
> +    }
>
>      /*
>       * When successful, old pagecache page->mapping must be cleared before
> @@ -782,6 +899,7 @@ static int move_to_new_page(struct page *newpage, struct page *page,
>          if (!PageAnon(page))
>              page->mapping = NULL;
>      }
> +out:
>      return rc;
>  }
>
> @@ -960,6 +1078,8 @@ static ICE_noinline int unmap_and_move(new_page_t get_new_page,
>              put_new_page(newpage, private);
>          else
>              put_page(newpage);
> +        if (PageMovable(page))
> +            __ClearPageMovable(page);
>          goto out;
>      }
>
> @@ -1000,8 +1120,26 @@ static ICE_noinline int unmap_and_move(new_page_t get_new_page,
>              num_poisoned_pages_inc();
>          }
>      } else {
> -        if (rc != -EAGAIN)
> -            putback_lru_page(page);
> +        if (rc != -EAGAIN) {
> +            /*
> +             * subsystem couldn't remove PG_movable since page is
> +             * isolated so PageMovable check is not racy in here.
> +             * But PageIsolated check can be racy but it's okay
> +             * because putback_movable_page checks it under PG_lock
> +             * again.
> +             */
> +            if (unlikely(PageMovable(page))) {
> +                if (PageIsolated(page))
> +                    putback_movable_page(page);
> +                else {
> +                    __ClearPageMovable(page);

Again, we don't do lock_page() here, so what prevents parallel
compaction isolating the same page?

Sorry for so many questions, hope they all have good answers and this
series is a success :) Thanks for picking it up.

> +                    put_page(page);
> +                }
> +            } else {
> +                putback_lru_page(page);
> +            }
> +        }
> +
>          if (put_new_page)
>              put_new_page(newpage, private);
>          else
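To make the five duties listed in the commit message concrete, here is a
minimal sketch of what a subsystem might register. All mydrv_* names are
hypothetical, error handling is elided, and which reference duty 3's
put_page() drops is this sketch's assumption; it illustrates the rules
as worded above and is not code from the series.

static bool mydrv_isolate_page(struct page *page, isolate_mode_t mode)
{
    /* Called by isolate_movable_page() with PG_lock held. */
    mydrv_del_from_list(page);    /* hypothetical: VM now owns page.lru */
    /* Duty 2: mark the page isolated and report success. */
    SetPageIsolated(page);
    return true;
}

static void mydrv_putback_page(struct page *page)
{
    /*
     * Duty 4: migration failed; re-insert the page into the driver's
     * private list (nothing in page.lru survived isolation) and
     * clear PG_isolated.
     */
    mydrv_add_to_list(page);    /* hypothetical */
    ClearPageIsolated(page);
}

static int mydrv_migratepage(struct address_space *mapping,
        struct page *newpage, struct page *page, enum migrate_mode mode)
{
    /* Copy contents and driver metadata to the new page. */
    mydrv_copy(newpage, page);    /* hypothetical */
    __SetPageMovable(newpage, mapping);    /* newpage is locked here */
    /* Duty 3: prepare the old page for freeing. */
    ClearPageIsolated(page);
    __ClearPageMovable(page);
    put_page(page);    /* assumption: drops the driver's own reference */
    return MIGRATEPAGE_SUCCESS;
}

static const struct address_space_operations mydrv_aops = {
    .isolate_page = mydrv_isolate_page,
    .migratepage  = mydrv_migratepage,
    .putback_page = mydrv_putback_page,
};

/* Duty 1: at allocation time, under lock_page(page): */
/*    __SetPageMovable(page, &mydrv_mapping);    mydrv_mapping: hypothetical */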
Minchan Kim
2016-Apr-04 05:12 UTC
[PATCH v3 02/16] mm/compaction: support non-lru movable page migration
On Fri, Apr 01, 2016 at 11:29:14PM +0200, Vlastimil Babka wrote:
> Might have been better as a separate migration patch and then a
> compaction patch. It's prefixed mm/compaction, but most changes are
> in mm/migrate.c

Indeed. The title is rather misleading but I am not sure it's a good
idea to separate the compaction and migration parts. I will just resend
to change the title from "mm/compaction" to "mm/migration".

> On 03/30/2016 09:12 AM, Minchan Kim wrote:
> > We have allowed migration for only LRU pages until now and it was
> > enough to make high-order pages.
[...]
> > 2. It should mark the page as PG_isolated via SetPageIsolated
> > if isolation is successful and return true.
>
> Ah another thing to document (especially in the comments/Doc) is
> that the subsystem must not expect anything to survive in page.lru
> (or fields that union it) after having isolated successfully.

Indeed. I am surprised I missed it because I wrote it down somewhere
but it might have been lost during rebase. I will fix it.

> > 3. If migration is successful, it should clear PG_isolated and
> > PG_movable of the page for free preparation then release the
> > reference of the page to free.
> >
> > 4. If migration fails, the putback function of the subsystem should
> > clear PG_isolated via ClearPageIsolated.
> >
> > 5. If a subsystem wants to release an isolated page, it should
> > clear PG_isolated but not PG_movable. Instead, the VM will do it.
>
> Under lock? Or just with ClearPageIsolated?

Both: ClearPageIsolated under PG_lock.
Yes, it's better to change ClearPageIsolated to __ClearPageIsolated.

[...]

> > +    /* put the page back to right list */
>
> ... "after a failed migration" ?

Better.

[...]

> > +    /* non-lru movable pages */
> > +    PG_movable = PG_reclaim,
> > +    PG_isolated = PG_owner_priv_1,
>
> Documentation should probably state that these fields alias and
> subsystems supporting the movable pages shouldn't use them elsewhere.

Yeb.

> Also I'm a bit uncomfortable how isolate_movable_page() blindly expects
> that page->mapping->a_ops->isolate_page exists for PageMovable() pages.
> What if it's a false positive on a PG_reclaim page? Can we rely on
> PG_reclaim always (and without races) implying PageLRU() so that we
> don't even attempt isolate_movable_page()?

For now, we shouldn't have such a false positive because PageMovable
checks page->_mapcount == PAGE_MOVABLE_MAPCOUNT_VALUE as well as
PG_movable under PG_lock.

But I read your question about user-mapped driver pages, so we cannot
use _mapcount anymore and I will find another thing. One option is this:

static inline int PageMovable(struct page *page)
{
    int ret = 0;
    struct address_space *mapping;
    const struct address_space_operations *a_ops;

    if (!test_bit(PG_movable, &page->flags))
        goto out;

    mapping = page->mapping;
    if (!mapping)
        goto out;

    a_ops = mapping->a_ops;
    if (!a_ops)
        goto out;
    if (a_ops->isolate_page)
        ret = 1;
out:
    return ret;
}

It works under PG_lock but with this, we need trylock_page to peek
whether it's a movable non-LRU page or not when scanning pfns.
For avoiding that, we need another function to peek which just checks
the PG_movable bit instead of all the things:

/*
 * If @page_locked is false, we cannot guarantee page->mapping's
 * stability so the function just checks PG_movable, which could be a
 * false positive, so the caller should check it again under PG_lock
 * to check a_ops->isolate_page.
 */
static inline int PageMovable(struct page *page, bool page_locked)
{
    int ret = 0;
    struct address_space *mapping;
    const struct address_space_operations *a_ops;

    if (!test_bit(PG_movable, &page->flags))
        goto out;

    if (!page_locked) {
        ret = 1;
        goto out;
    }

    mapping = page->mapping;
    if (!mapping)
        goto out;

    a_ops = mapping->a_ops;
    if (!a_ops)
        goto out;
    if (a_ops->isolate_page)
        ret = 1;
out:
    return ret;
}

[...]

> > +#define PAGE_MOVABLE_MAPCOUNT_VALUE (-255)
>
> IIRC this was what Gioh's previous attempts used instead of
> PG_movable? Is it still needed? Doesn't it prevent a driver

It is needed to avoid false positives, as I said.

> providing movable *and* mapped pages?

Absolutely true. I will rethink about it.

> If it's to distinguish the PG_reclaim alias that I mention above, it
> seems like an overkill to me. Why would we need both a special mapcount
> value and a flag? Checking that page->mapping->a_ops->isolate_page
> exists before calling it should be enough to resolve the ambiguity?

As I mentioned, using a_ops->isolate_page needs to be done under
PG_lock. And the idea I suggested above should work, I guess. I will
try it.

[...]

> > +bool isolate_movable_page(struct page *page, isolate_mode_t mode)
> > +{
> > +    bool ret = false;
>
> Maintaining "ret" seems useless here. All the "goto out*" statements
> are executed only when ret is false, and ret == true is returned by a
> different return.

Yeb. Will change.

[...]

> > +        } else if (unlikely(PageMovable(page))) {
> > +            if (PageIsolated(page)) {
> > +                putback_movable_page(page);
> > +            } else {
> > +                __ClearPageMovable(page);
>
> We don't do lock_page() here, so what prevents parallel compaction
> isolating the same page?

Need PG_lock.

[...]

> > +    if (unlikely(PageMovable(page))) {
>
> Can false positive from PG_reclaim occur here?

PageMovable has the _mapcount == PAGE_MOVABLE_MAPCOUNT_VALUE check.

[...]

> > +                if (PageIsolated(page))
> > +                    putback_movable_page(page);
> > +                else {
> > +                    __ClearPageMovable(page);
>
> Again, we don't do lock_page() here, so what prevents parallel
> compaction isolating the same page?

It seems we need PG_lock there, too. Thanks for catching it.

> Sorry for so many questions, hope they all have good answers and this
> series is a success :) Thanks for picking it up.

No problem at all. Many questions mean the code and/or doc are not
clear and still need to be improved.

Thanks for the detailed review, Vlastimil!
I will resend new versions after my vacation this week.
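Both sides of the exchange above conclude that the putback paths calling
__ClearPageMovable() without the page lock are racy. As a hedged sketch
(the editor's reading of the agreed fix, not code from a posted
revision), the putback_movable_pages() branch might become:

    } else if (unlikely(PageMovable(page))) {
        /*
         * Take PG_lock so a parallel compaction thread cannot
         * re-isolate the page between the PageIsolated() check
         * and __ClearPageMovable().
         */
        lock_page(page);
        if (PageIsolated(page)) {
            struct address_space *mapping = page_mapping(page);

            mapping->a_ops->putback_page(page);
        } else {
            __ClearPageMovable(page);
        }
        unlock_page(page);
        /* drop the extra ref taken at isolation */
        put_page(page);
    } else {
        putback_lru_page(page);
    }

Note the putback_page() call is inlined under the single lock here:
calling putback_movable_page() with the lock already held would
deadlock, since that helper takes PG_lock itself.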
Vlastimil Babka
2016-Apr-04 13:24 UTC
[PATCH v3 02/16] mm/compaction: support non-lru movable page migration
On 04/04/2016 07:12 AM, Minchan Kim wrote:
> On Fri, Apr 01, 2016 at 11:29:14PM +0200, Vlastimil Babka wrote:
>> Might have been better as a separate migration patch and then a
>> compaction patch. It's prefixed mm/compaction, but most changes are
>> in mm/migrate.c
>
> Indeed. The title is rather misleading but I am not sure it's a good
> idea to separate the compaction and migration parts.

Guess it's better to see the new functions together with their user
after all, OK.

> I will just resend to change the title from "mm/compaction" to
> "mm/migration".

OK!

>> Also I'm a bit uncomfortable how isolate_movable_page() blindly expects
>> that page->mapping->a_ops->isolate_page exists for PageMovable() pages.
>> What if it's a false positive on a PG_reclaim page? Can we rely on
>> PG_reclaim always (and without races) implying PageLRU() so that we
>> don't even attempt isolate_movable_page()?
>
> For now, we shouldn't have such a false positive because PageMovable
> checks page->_mapcount == PAGE_MOVABLE_MAPCOUNT_VALUE as well as
> PG_movable under PG_lock.
>
> But I read your question about user-mapped driver pages, so we cannot
> use _mapcount anymore and I will find another thing. One option is this:
>
> static inline int PageMovable(struct page *page)
> {
[...]
> }
>
> It works under PG_lock but with this, we need trylock_page to peek
> whether it's a movable non-LRU page or not when scanning pfns.

Hm, I hoped that with READ_ONCE() we could do the peek safely without
trylock_page, if we use it only as a heuristic. But I guess it would
require at least RCU-level protection of the
page->mapping->a_ops->isolate_page chain.

> For avoiding that, we need another function to peek which just checks
> the PG_movable bit instead of all the things:
>
> static inline int PageMovable(struct page *page, bool page_locked)
> {
[...]
> }

I wouldn't put everything into a single function, but create something
like __PageMovable() just for the unlocked peek. Unlike the
zone->lru_lock, we don't keep page_lock() across iterations in
isolate_migratepages_block(), as obviously each page has a different
lock. So the page_locked parameter would always be passed as a constant,
and at that point it's better to have separate functions.

So I guess the question is how many false positives from the overlap
with PG_reclaim the scanner will hit if we give up on
PAGE_MOVABLE_MAPCOUNT_VALUE, as that will increase the number of page
locks taken just to realize that it's not an actual PageMovable() page...

> Thanks for the detailed review, Vlastimil!
> I will resend new versions after my vacation this week.

You're welcome, great!
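For reference, a minimal sketch of the two-function split suggested
above, assuming page->mapping and its a_ops are only trusted under
PG_lock. The names follow the __PageMovable() suggestion, but the bodies
are the editor's guess at the intent, not code from the thread:

/*
 * Unlocked peek for the pfn scanner: may be a false positive because
 * PG_movable aliases PG_reclaim; callers must recheck with
 * PageMovable() under PG_lock before trusting it.
 */
static inline int __PageMovable(struct page *page)
{
    return test_bit(PG_movable, &page->flags);
}

/* Full check; caller holds PG_lock so page->mapping is stable. */
static inline int PageMovable(struct page *page)
{
    struct address_space *mapping;

    if (!test_bit(PG_movable, &page->flags))
        return 0;

    mapping = page->mapping;
    return mapping && mapping->a_ops && mapping->a_ops->isolate_page;
}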