David Hildenbrand
2020-Jul-30 09:34 UTC
[PATCH v2 0/6] mm / virtio-mem: support ZONE_MOVABLE
@Andrew, @Mst, I suggest the whole series (including the virtio-mem change) goes via the -mm tree. Currently, virtio-mem does not really support ZONE_MOVABLE. While it allows to online fully plugged memory blocks to ZONE_MOVABLE, it does not allow to online partially-plugged memory blocks to ZONE_MOVABLE and will never consider such memory blocks when unplugging memory. This might be surprising for users (especially, if onlining suddenly fails). Let's support partially plugged memory blocks in ZONE_MOVABLE, allowing partially plugged memory blocks to be online to ZONE_MOVABLE and also unplugging from such memory blocks. This is especially helpful for testing, but also paves the way for virtio-mem optimizations, allowing more memory to get reliably unplugged. Cleanup has_unmovable_pages() and set_migratetype_isolate(), providing better documentation of how ZONE_MOVABLE interacts with different kind of unmovable pages (memory offlining vs. alloc_contig_range()). v1 -> v2: - "mm/page_isolation: don't dump_page(NULL) in set_migratetype_isolate()" -- Move to position 1, add Fixes: tag -- Drop unused "out:" label - "mm/page_isolation: drop WARN_ON_ONCE() in set_migratetype_isolate()" -- Keep curly braces on "else" case - Replace "[PATCH v1 5/6] mm/page_alloc: restrict ZONE_MOVABLE optimization in has_unmovable_pages() to memory offlining" by "mm: document semantics of ZONE_MOVABLE" -- Brain dump of what I know about ZONE_MOVABLE :) David Hildenbrand (6): mm/page_isolation: don't dump_page(NULL) in set_migratetype_isolate() mm/page_alloc: tweak comments in has_unmovable_pages() mm/page_isolation: drop WARN_ON_ONCE() in set_migratetype_isolate() mm/page_isolation: cleanup set_migratetype_isolate() virtio-mem: don't special-case ZONE_MOVABLE mm: document semantics of ZONE_MOVABLE drivers/virtio/virtio_mem.c | 47 +++++++------------------------------ include/linux/mmzone.h | 34 +++++++++++++++++++++++++++ mm/page_alloc.c | 22 +++++------------ mm/page_isolation.c | 39 ++++++++++++++---------------- 4 files changed, 65 insertions(+), 77 deletions(-) -- 2.26.2
David Hildenbrand
2020-Jul-30 09:34 UTC
[PATCH v2 1/6] mm/page_isolation: don't dump_page(NULL) in set_migratetype_isolate()
Right now, if we have two isolations racing, we might trigger the WARN_ON_ONCE() and to dump_page(NULL), dereferencing NULL. Let's just return directly. In the future, we might want to report -EAGAIN to the caller instead, as this could indicate a temporary isolation failure only. Reviewed-by: Baoquan He <bhe at redhat.com> Reviewed-by: Pankaj Gupta <pankaj.gupta.linux at gmail.com> Acked-by: Mike Kravetz <mike.kravetz at oracle.com> Fixes: 4a55c0474a92 ("mm/hotplug: silence a lockdep splat with printk()") Cc: Andrew Morton <akpm at linux-foundation.org> Cc: Michal Hocko <mhocko at suse.com> Cc: Michael S. Tsirkin <mst at redhat.com> Cc: Qian Cai <cai at lca.pw> Signed-off-by: David Hildenbrand <david at redhat.com> --- mm/page_isolation.c | 9 +++++---- 1 file changed, 5 insertions(+), 4 deletions(-) diff --git a/mm/page_isolation.c b/mm/page_isolation.c index f6d07c5f0d34d..7d7d263ce7f4b 100644 --- a/mm/page_isolation.c +++ b/mm/page_isolation.c @@ -29,10 +29,12 @@ static int set_migratetype_isolate(struct page *page, int migratetype, int isol_ /* * We assume the caller intended to SET migrate type to isolate. * If it is already set, then someone else must have raced and - * set it before us. Return -EBUSY + * set it before us. */ - if (is_migrate_isolate_page(page)) - goto out; + if (is_migrate_isolate_page(page)) { + spin_unlock_irqrestore(&zone->lock, flags); + return -EBUSY; + } /* * FIXME: Now, memory hotplug doesn't call shrink_slab() by itself. @@ -52,7 +54,6 @@ static int set_migratetype_isolate(struct page *page, int migratetype, int isol_ ret = 0; } -out: spin_unlock_irqrestore(&zone->lock, flags); if (!ret) { drain_all_pages(zone); -- 2.26.2
David Hildenbrand
2020-Jul-30 09:34 UTC
[PATCH v2 2/6] mm/page_alloc: tweak comments in has_unmovable_pages()
Let's move the split comment regarding bootmem allocations and memory holes, especially in the context of ZONE_MOVABLE, to the PageReserved() check. Reviewed-by: Baoquan He <bhe at redhat.com> Cc: Andrew Morton <akpm at linux-foundation.org> Cc: Michal Hocko <mhocko at suse.com> Cc: Michael S. Tsirkin <mst at redhat.com> Cc: Mike Kravetz <mike.kravetz at oracle.com> Cc: Pankaj Gupta <pankaj.gupta.linux at gmail.com> Signed-off-by: David Hildenbrand <david at redhat.com> --- mm/page_alloc.c | 22 ++++++---------------- 1 file changed, 6 insertions(+), 16 deletions(-) diff --git a/mm/page_alloc.c b/mm/page_alloc.c index e028b87ce2942..042ba09d70c5d 100644 --- a/mm/page_alloc.c +++ b/mm/page_alloc.c @@ -8207,14 +8207,6 @@ struct page *has_unmovable_pages(struct zone *zone, struct page *page, unsigned long iter = 0; unsigned long pfn = page_to_pfn(page); - /* - * TODO we could make this much more efficient by not checking every - * page in the range if we know all of them are in MOVABLE_ZONE and - * that the movable zone guarantees that pages are migratable but - * the later is not the case right now unfortunatelly. E.g. movablecore - * can still lead to having bootmem allocations in zone_movable. - */ - if (is_migrate_cma_page(page)) { /* * CMA allocations (alloc_contig_range) really need to mark @@ -8233,6 +8225,12 @@ struct page *has_unmovable_pages(struct zone *zone, struct page *page, page = pfn_to_page(pfn + iter); + /* + * Both, bootmem allocations and memory holes are marked + * PG_reserved and are unmovable. We can even have unmovable + * allocations inside ZONE_MOVABLE, for example when + * specifying "movablecore". + */ if (PageReserved(page)) return page; @@ -8306,14 +8304,6 @@ struct page *has_unmovable_pages(struct zone *zone, struct page *page, * it. But now, memory offline itself doesn't call * shrink_node_slabs() and it still to be fixed. */ - /* - * If the page is not RAM, page_count()should be 0. - * we don't need more check. This is an _used_ not-movable page. - * - * The problematic thing here is PG_reserved pages. PG_reserved - * is set to both of a memory hole page and a _used_ kernel - * page at boot. - */ return page; } return NULL; -- 2.26.2
David Hildenbrand
2020-Jul-30 09:34 UTC
[PATCH v2 3/6] mm/page_isolation: drop WARN_ON_ONCE() in set_migratetype_isolate()
Inside has_unmovable_pages(), we have a comment describing how unmovable data could end up in ZONE_MOVABLE - via "movable_core". Also, besides checking if the first page in the pageblock is reserved, we don't perform any further checks in case of ZONE_MOVABLE. In case of memory offlining, we set REPORT_FAILURE, properly dump_page() the page and handle the error gracefully. alloc_contig_pages() users currently never allocate from ZONE_MOVABLE. E.g., hugetlb uses alloc_contig_pages() for the allocation of gigantic pages only, which will never end up on the MOVABLE zone (see htlb_alloc_mask()). Reviewed-by: Baoquan He <bhe at redhat.com> Cc: Andrew Morton <akpm at linux-foundation.org> Cc: Michal Hocko <mhocko at suse.com> Cc: Michael S. Tsirkin <mst at redhat.com> Cc: Mike Kravetz <mike.kravetz at oracle.com> Cc: Pankaj Gupta <pankaj.gupta.linux at gmail.com> Signed-off-by: David Hildenbrand <david at redhat.com> --- mm/page_isolation.c | 15 ++++++--------- 1 file changed, 6 insertions(+), 9 deletions(-) diff --git a/mm/page_isolation.c b/mm/page_isolation.c index 7d7d263ce7f4b..d099aac479601 100644 --- a/mm/page_isolation.c +++ b/mm/page_isolation.c @@ -57,15 +57,12 @@ static int set_migratetype_isolate(struct page *page, int migratetype, int isol_ spin_unlock_irqrestore(&zone->lock, flags); if (!ret) { drain_all_pages(zone); - } else { - WARN_ON_ONCE(zone_idx(zone) == ZONE_MOVABLE); - - if ((isol_flags & REPORT_FAILURE) && unmovable) - /* - * printk() with zone->lock held will likely trigger a - * lockdep splat, so defer it here. - */ - dump_page(unmovable, "unmovable page"); + } else if ((isol_flags & REPORT_FAILURE) && unmovable) { + /* + * printk() with zone->lock held will likely trigger a + * lockdep splat, so defer it here. + */ + dump_page(unmovable, "unmovable page"); } return ret; -- 2.26.2
David Hildenbrand
2020-Jul-30 09:34 UTC
[PATCH v2 4/6] mm/page_isolation: cleanup set_migratetype_isolate()
Let's clean it up a bit, simplifying error handling and getting rid of the label. Reviewed-by: Baoquan He <bhe at redhat.com> Reviewed-by: Pankaj Gupta <pankaj.gupta.linux at gmail.com> Cc: Andrew Morton <akpm at linux-foundation.org> Cc: Michal Hocko <mhocko at suse.com> Cc: Michael S. Tsirkin <mst at redhat.com> Cc: Mike Kravetz <mike.kravetz at oracle.com> Signed-off-by: David Hildenbrand <david at redhat.com> --- mm/page_isolation.c | 17 +++++++---------- 1 file changed, 7 insertions(+), 10 deletions(-) diff --git a/mm/page_isolation.c b/mm/page_isolation.c index d099aac479601..e65fe5d770849 100644 --- a/mm/page_isolation.c +++ b/mm/page_isolation.c @@ -17,12 +17,9 @@ static int set_migratetype_isolate(struct page *page, int migratetype, int isol_flags) { - struct page *unmovable = NULL; - struct zone *zone; + struct zone *zone = page_zone(page); + struct page *unmovable; unsigned long flags; - int ret = -EBUSY; - - zone = page_zone(page); spin_lock_irqsave(&zone->lock, flags); @@ -51,13 +48,13 @@ static int set_migratetype_isolate(struct page *page, int migratetype, int isol_ NULL); __mod_zone_freepage_state(zone, -nr_pages, mt); - ret = 0; + spin_unlock_irqrestore(&zone->lock, flags); + drain_all_pages(zone); + return 0; } spin_unlock_irqrestore(&zone->lock, flags); - if (!ret) { - drain_all_pages(zone); - } else if ((isol_flags & REPORT_FAILURE) && unmovable) { + if (isol_flags & REPORT_FAILURE) { /* * printk() with zone->lock held will likely trigger a * lockdep splat, so defer it here. @@ -65,7 +62,7 @@ static int set_migratetype_isolate(struct page *page, int migratetype, int isol_ dump_page(unmovable, "unmovable page"); } - return ret; + return -EBUSY; } static void unset_migratetype_isolate(struct page *page, unsigned migratetype) -- 2.26.2
David Hildenbrand
2020-Jul-30 09:34 UTC
[PATCH v2 5/6] virtio-mem: don't special-case ZONE_MOVABLE
Let's allow to online partially plugged memory blocks to ZONE_MOVABLE and also consider memory blocks that were onlined to ZONE_MOVABLE when unplugging memory. While unplugged memory blocks are, in general, unmovable, they can be skipped when offlining memory. virtio-mem only unplugs fairly big chunks (in the megabyte range) and rather tries to shrink the memory region than randomly choosing memory. In theory, if all other pages in the movable zone would be movable, virtio-mem would only shrink that zone and not create any kind of fragmentation. In the future, we might want to remember the zone again and use the information when (un)plugging memory. For now, let's keep it simple. Note: Support for defragmentation is planned, to deal with fragmentation after unplug due to memory chunks within memory blocks that could not get unplugged before (e.g., somebody pinning pages within ZONE_MOVABLE for a longer time). Cc: Andrew Morton <akpm at linux-foundation.org> Cc: Michal Hocko <mhocko at suse.com> Cc: Michael S. Tsirkin <mst at redhat.com> Cc: Jason Wang <jasowang at redhat.com> Cc: Mike Kravetz <mike.kravetz at oracle.com> Cc: Pankaj Gupta <pankaj.gupta.linux at gmail.com> Cc: Baoquan He <bhe at redhat.com> Signed-off-by: David Hildenbrand <david at redhat.com> --- drivers/virtio/virtio_mem.c | 47 +++++++------------------------------ 1 file changed, 8 insertions(+), 39 deletions(-) diff --git a/drivers/virtio/virtio_mem.c b/drivers/virtio/virtio_mem.c index f26f5f64ae822..2ddfc4a0e2ee0 100644 --- a/drivers/virtio/virtio_mem.c +++ b/drivers/virtio/virtio_mem.c @@ -36,18 +36,10 @@ enum virtio_mem_mb_state { VIRTIO_MEM_MB_STATE_OFFLINE, /* Partially plugged, fully added to Linux, offline. */ VIRTIO_MEM_MB_STATE_OFFLINE_PARTIAL, - /* Fully plugged, fully added to Linux, online (!ZONE_MOVABLE). */ + /* Fully plugged, fully added to Linux, online. */ VIRTIO_MEM_MB_STATE_ONLINE, - /* Partially plugged, fully added to Linux, online (!ZONE_MOVABLE). */ + /* Partially plugged, fully added to Linux, online. */ VIRTIO_MEM_MB_STATE_ONLINE_PARTIAL, - /* - * Fully plugged, fully added to Linux, online (ZONE_MOVABLE). - * We are not allowed to allocate (unplug) parts of this block that - * are not movable (similar to gigantic pages). We will never allow - * to online OFFLINE_PARTIAL to ZONE_MOVABLE (as they would contain - * unmovable parts). - */ - VIRTIO_MEM_MB_STATE_ONLINE_MOVABLE, VIRTIO_MEM_MB_STATE_COUNT }; @@ -526,21 +518,10 @@ static bool virtio_mem_owned_mb(struct virtio_mem *vm, unsigned long mb_id) } static int virtio_mem_notify_going_online(struct virtio_mem *vm, - unsigned long mb_id, - enum zone_type zone) + unsigned long mb_id) { switch (virtio_mem_mb_get_state(vm, mb_id)) { case VIRTIO_MEM_MB_STATE_OFFLINE_PARTIAL: - /* - * We won't allow to online a partially plugged memory block - * to the MOVABLE zone - it would contain unmovable parts. - */ - if (zone == ZONE_MOVABLE) { - dev_warn_ratelimited(&vm->vdev->dev, - "memory block has holes, MOVABLE not supported\n"); - return NOTIFY_BAD; - } - return NOTIFY_OK; case VIRTIO_MEM_MB_STATE_OFFLINE: return NOTIFY_OK; default: @@ -560,7 +541,6 @@ static void virtio_mem_notify_offline(struct virtio_mem *vm, VIRTIO_MEM_MB_STATE_OFFLINE_PARTIAL); break; case VIRTIO_MEM_MB_STATE_ONLINE: - case VIRTIO_MEM_MB_STATE_ONLINE_MOVABLE: virtio_mem_mb_set_state(vm, mb_id, VIRTIO_MEM_MB_STATE_OFFLINE); break; @@ -579,24 +559,17 @@ static void virtio_mem_notify_offline(struct virtio_mem *vm, virtio_mem_retry(vm); } -static void virtio_mem_notify_online(struct virtio_mem *vm, unsigned long mb_id, - enum zone_type zone) +static void virtio_mem_notify_online(struct virtio_mem *vm, unsigned long mb_id) { unsigned long nb_offline; switch (virtio_mem_mb_get_state(vm, mb_id)) { case VIRTIO_MEM_MB_STATE_OFFLINE_PARTIAL: - BUG_ON(zone == ZONE_MOVABLE); virtio_mem_mb_set_state(vm, mb_id, VIRTIO_MEM_MB_STATE_ONLINE_PARTIAL); break; case VIRTIO_MEM_MB_STATE_OFFLINE: - if (zone == ZONE_MOVABLE) - virtio_mem_mb_set_state(vm, mb_id, - VIRTIO_MEM_MB_STATE_ONLINE_MOVABLE); - else - virtio_mem_mb_set_state(vm, mb_id, - VIRTIO_MEM_MB_STATE_ONLINE); + virtio_mem_mb_set_state(vm, mb_id, VIRTIO_MEM_MB_STATE_ONLINE); break; default: BUG(); @@ -675,7 +648,6 @@ static int virtio_mem_memory_notifier_cb(struct notifier_block *nb, const unsigned long start = PFN_PHYS(mhp->start_pfn); const unsigned long size = PFN_PHYS(mhp->nr_pages); const unsigned long mb_id = virtio_mem_phys_to_mb_id(start); - enum zone_type zone; int rc = NOTIFY_OK; if (!virtio_mem_overlaps_range(vm, start, size)) @@ -717,8 +689,7 @@ static int virtio_mem_memory_notifier_cb(struct notifier_block *nb, break; } vm->hotplug_active = true; - zone = page_zonenum(pfn_to_page(mhp->start_pfn)); - rc = virtio_mem_notify_going_online(vm, mb_id, zone); + rc = virtio_mem_notify_going_online(vm, mb_id); break; case MEM_OFFLINE: virtio_mem_notify_offline(vm, mb_id); @@ -726,8 +697,7 @@ static int virtio_mem_memory_notifier_cb(struct notifier_block *nb, mutex_unlock(&vm->hotplug_mutex); break; case MEM_ONLINE: - zone = page_zonenum(pfn_to_page(mhp->start_pfn)); - virtio_mem_notify_online(vm, mb_id, zone); + virtio_mem_notify_online(vm, mb_id); vm->hotplug_active = false; mutex_unlock(&vm->hotplug_mutex); break; @@ -1906,8 +1876,7 @@ static void virtio_mem_remove(struct virtio_device *vdev) if (vm->nb_mb_state[VIRTIO_MEM_MB_STATE_OFFLINE] || vm->nb_mb_state[VIRTIO_MEM_MB_STATE_OFFLINE_PARTIAL] || vm->nb_mb_state[VIRTIO_MEM_MB_STATE_ONLINE] || - vm->nb_mb_state[VIRTIO_MEM_MB_STATE_ONLINE_PARTIAL] || - vm->nb_mb_state[VIRTIO_MEM_MB_STATE_ONLINE_MOVABLE]) { + vm->nb_mb_state[VIRTIO_MEM_MB_STATE_ONLINE_PARTIAL]) { dev_warn(&vdev->dev, "device still has system memory added\n"); } else { virtio_mem_delete_resource(vm); -- 2.26.2
David Hildenbrand
2020-Jul-30 09:34 UTC
[PATCH v2 6/6] mm: document semantics of ZONE_MOVABLE
Let's document what ZONE_MOVABLE means, how it's used, and which special cases we have regarding unmovable pages (memory offlining vs. migration / allocations). Cc: Andrew Morton <akpm at linux-foundation.org> Cc: Michal Hocko <mhocko at suse.com> Cc: Michael S. Tsirkin <mst at redhat.com> Cc: Mike Kravetz <mike.kravetz at oracle.com> Cc: Mike Rapoport <rppt at kernel.org> Cc: Pankaj Gupta <pankaj.gupta.linux at gmail.com> Cc: Baoquan He <bhe at redhat.com> Signed-off-by: David Hildenbrand <david at redhat.com> --- include/linux/mmzone.h | 34 ++++++++++++++++++++++++++++++++++ 1 file changed, 34 insertions(+) diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h index f6f884970511d..b8c49b2aff684 100644 --- a/include/linux/mmzone.h +++ b/include/linux/mmzone.h @@ -372,6 +372,40 @@ enum zone_type { */ ZONE_HIGHMEM, #endif + /* + * ZONE_MOVABLE is similar to ZONE_NORMAL, except that it *primarily* + * only contains movable pages. Main use cases are to make memory + * offlining more likely to succeed, and to locally limit unmovable + * allocations - e.g., to increase the number of THP/huge pages. + * Notable special cases: + * + * 1. Pinned pages: (Long-term) pinning of movable pages might + * essentially turn such pages unmovable. Memory offlining might + * retry a long time. + * 2. memblock allocations: kernelcore/movablecore setups might create + * situations where ZONE_MOVABLE contains unmovable allocations + * after boot. Memory offlining and allocations fail early. + * 3. Memory holes: Such pages cannot be allocated. Applies only to + * boot memory, not hotplugged memory. Memory offlining and + * allocations fail early. + * 4. PG_hwpoison pages: While poisoned pages can be skipped during + * memory offlining, such pages cannot be allocated. + * 5. Unmovable PG_offline pages: In paravirtualized environments, + * hotplugged memory blocks might only partially be managed by the + * buddy (e.g., via XEN-balloon, Hyper-V balloon, virtio-mem). The + * parts not manged by the buddy are unmovable PG_offline pages. In + * some cases (virtio-mem), such pages can be skipped during + * memory offlining, however, cannot be moved/allcoated. These + * techniques might use alloc_contig_range() to hide previously + * exposed pages from the buddy again (e.g., to implement some sort + * of memory unplug in virtio-mem). + * + * In general, no unmovable allocations that degrade memory offlining + * should end up in ZONE_MOVABLE. Allocators (like alloc_contig_range()) + * have to expect that migrating pages in ZONE_MOVABLE can fail (even + * if has_unmovable_pages() states that there are no unmovable pages, + * there can be false negatives). + */ ZONE_MOVABLE, #ifdef CONFIG_ZONE_DEVICE ZONE_DEVICE, -- 2.26.2
Vlastimil Babka
2020-Aug-06 13:35 UTC
[PATCH v2 4/6] mm/page_isolation: cleanup set_migratetype_isolate()
On 7/30/20 11:34 AM, David Hildenbrand wrote:> Let's clean it up a bit, simplifying error handling and getting rid of > the label.Nit: the label was already removed by patch 1/6?> Reviewed-by: Baoquan He <bhe at redhat.com> > Reviewed-by: Pankaj Gupta <pankaj.gupta.linux at gmail.com> > Cc: Andrew Morton <akpm at linux-foundation.org> > Cc: Michal Hocko <mhocko at suse.com> > Cc: Michael S. Tsirkin <mst at redhat.com> > Cc: Mike Kravetz <mike.kravetz at oracle.com> > Signed-off-by: David Hildenbrand <david at redhat.com> > --- > mm/page_isolation.c | 17 +++++++---------- > 1 file changed, 7 insertions(+), 10 deletions(-) > > diff --git a/mm/page_isolation.c b/mm/page_isolation.c > index d099aac479601..e65fe5d770849 100644 > --- a/mm/page_isolation.c > +++ b/mm/page_isolation.c > @@ -17,12 +17,9 @@ > > static int set_migratetype_isolate(struct page *page, int migratetype, int isol_flags) > { > - struct page *unmovable = NULL; > - struct zone *zone; > + struct zone *zone = page_zone(page); > + struct page *unmovable; > unsigned long flags; > - int ret = -EBUSY; > - > - zone = page_zone(page); > > spin_lock_irqsave(&zone->lock, flags); > > @@ -51,13 +48,13 @@ static int set_migratetype_isolate(struct page *page, int migratetype, int isol_ > NULL); > > __mod_zone_freepage_state(zone, -nr_pages, mt); > - ret = 0; > + spin_unlock_irqrestore(&zone->lock, flags); > + drain_all_pages(zone); > + return 0; > } > > spin_unlock_irqrestore(&zone->lock, flags); > - if (!ret) { > - drain_all_pages(zone); > - } else if ((isol_flags & REPORT_FAILURE) && unmovable) { > + if (isol_flags & REPORT_FAILURE) { > /* > * printk() with zone->lock held will likely trigger a > * lockdep splat, so defer it here. > @@ -65,7 +62,7 @@ static int set_migratetype_isolate(struct page *page, int migratetype, int isol_ > dump_page(unmovable, "unmovable page"); > } > > - return ret; > + return -EBUSY; > } > > static void unset_migratetype_isolate(struct page *page, unsigned migratetype) >