David Hildenbrand
2020-Jun-30 14:26 UTC
[PATCH v1 0/6] mm / virtio-mem: support ZONE_MOVABLE
Currently, virtio-mem does not really support ZONE_MOVABLE. While it allows
to online fully plugged memory blocks to ZONE_MOVABLE, it does not allow
to online partially-plugged memory blocks to ZONE_MOVABLE and will never
consider such memory blocks when unplugging memory. This might be
surprising for users (especially, if onlining suddenly fails).

Let's support partially plugged memory blocks in ZONE_MOVABLE, allowing
partially plugged memory blocks to be onlined to ZONE_MOVABLE and also
unplugging from such memory blocks.

This is especially helpful for testing, but also paves the way for
virtio-mem optimizations, allowing more memory to get reliably unplugged.

Cleanup has_unmovable_pages() and set_migratetype_isolate(), providing
better documentation of how ZONE_MOVABLE interacts with different kinds of
unmovable pages (memory offlining vs. alloc_contig_range()).

David Hildenbrand (6):
  mm/page_alloc: tweak comments in has_unmovable_pages()
  mm/page_isolation: don't dump_page(NULL) in set_migratetype_isolate()
  mm/page_isolation: drop WARN_ON_ONCE() in set_migratetype_isolate()
  mm/page_isolation: cleanup set_migratetype_isolate()
  mm/page_alloc: restrict ZONE_MOVABLE optimization in
    has_unmovable_pages() to memory offlining
  virtio-mem: don't special-case ZONE_MOVABLE

 drivers/virtio/virtio_mem.c | 47 +++++++------------------------------
 mm/page_alloc.c             | 29 +++++++++--------------
 mm/page_isolation.c         | 40 ++++++++++++++----------------
 3 files changed, 36 insertions(+), 80 deletions(-)

--
2.26.2
David Hildenbrand
2020-Jun-30 14:26 UTC
[PATCH v1 1/6] mm/page_alloc: tweak comments in has_unmovable_pages()
Let's move the split comment regarding bootmem allocations and memory
holes, especially in the context of ZONE_MOVABLE, to the PageReserved()
check.

Cc: Andrew Morton <akpm at linux-foundation.org>
Cc: Michal Hocko <mhocko at suse.com>
Cc: Michael S. Tsirkin <mst at redhat.com>
Signed-off-by: David Hildenbrand <david at redhat.com>
---
 mm/page_alloc.c | 22 ++++++----------------
 1 file changed, 6 insertions(+), 16 deletions(-)

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 48eb0f1410d47..bd3ebf08f09b9 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -8207,14 +8207,6 @@ struct page *has_unmovable_pages(struct zone *zone, struct page *page,
 	unsigned long iter = 0;
 	unsigned long pfn = page_to_pfn(page);

-	/*
-	 * TODO we could make this much more efficient by not checking every
-	 * page in the range if we know all of them are in MOVABLE_ZONE and
-	 * that the movable zone guarantees that pages are migratable but
-	 * the later is not the case right now unfortunatelly. E.g. movablecore
-	 * can still lead to having bootmem allocations in zone_movable.
-	 */
-
 	if (is_migrate_cma_page(page)) {
 		/*
 		 * CMA allocations (alloc_contig_range) really need to mark
@@ -8233,6 +8225,12 @@ struct page *has_unmovable_pages(struct zone *zone, struct page *page,

 		page = pfn_to_page(pfn + iter);

+		/*
+		 * Both, bootmem allocations and memory holes are marked
+		 * PG_reserved and are unmovable. We can even have unmovable
+		 * allocations inside ZONE_MOVABLE, for example when
+		 * specifying "movable_core".
+		 */
 		if (PageReserved(page))
 			return page;

@@ -8306,14 +8304,6 @@ struct page *has_unmovable_pages(struct zone *zone, struct page *page,
 		 * it. But now, memory offline itself doesn't call
 		 * shrink_node_slabs() and it still to be fixed.
 		 */
-		/*
-		 * If the page is not RAM, page_count()should be 0.
-		 * we don't need more check. This is an _used_ not-movable page.
-		 *
-		 * The problematic thing here is PG_reserved pages. PG_reserved
-		 * is set to both of a memory hole page and a _used_ kernel
-		 * page at boot.
-		 */
 		return page;
 	}
 	return NULL;
--
2.26.2
David Hildenbrand
2020-Jun-30 14:26 UTC
[PATCH v1 2/6] mm/page_isolation: don't dump_page(NULL) in set_migratetype_isolate()
Right now, if we have two isolations racing, we might trigger the
WARN_ON_ONCE() and do dump_page(NULL), dereferencing NULL. Let's just
return directly.

In the future, we might want to report -EAGAIN to the caller instead, as
this could indicate a temporary isolation failure only.

Cc: Andrew Morton <akpm at linux-foundation.org>
Cc: Michal Hocko <mhocko at suse.com>
Cc: Michael S. Tsirkin <mst at redhat.com>
Signed-off-by: David Hildenbrand <david at redhat.com>
---
 mm/page_isolation.c | 8 +++++---
 1 file changed, 5 insertions(+), 3 deletions(-)

diff --git a/mm/page_isolation.c b/mm/page_isolation.c
index f6d07c5f0d34d..553b49a34cf71 100644
--- a/mm/page_isolation.c
+++ b/mm/page_isolation.c
@@ -29,10 +29,12 @@ static int set_migratetype_isolate(struct page *page, int migratetype, int isol_
 	/*
 	 * We assume the caller intended to SET migrate type to isolate.
 	 * If it is already set, then someone else must have raced and
-	 * set it before us. Return -EBUSY
+	 * set it before us.
 	 */
-	if (is_migrate_isolate_page(page))
-		goto out;
+	if (is_migrate_isolate_page(page)) {
+		spin_unlock_irqrestore(&zone->lock, flags);
+		return -EBUSY;
+	}

 	/*
 	 * FIXME: Now, memory hotplug doesn't call shrink_slab() by itself.
--
2.26.2
David Hildenbrand
2020-Jun-30 14:26 UTC
[PATCH v1 3/6] mm/page_isolation: drop WARN_ON_ONCE() in set_migratetype_isolate()
Inside has_unmovable_pages(), we have a comment describing how unmovable
data could end up in ZONE_MOVABLE - via "movable_core". Also, besides
checking if the first page in the pageblock is reserved, we don't
perform any further checks in case of ZONE_MOVABLE.

In case of memory offlining, we set REPORT_FAILURE, properly
dump_page() the page and handle the error gracefully.
alloc_contig_pages() users currently never allocate from ZONE_MOVABLE.
E.g., hugetlb uses alloc_contig_pages() for the allocation of gigantic
pages only, which will never end up on the MOVABLE zone
(see htlb_alloc_mask()).

Cc: Andrew Morton <akpm at linux-foundation.org>
Cc: Michal Hocko <mhocko at suse.com>
Cc: Michael S. Tsirkin <mst at redhat.com>
Signed-off-by: David Hildenbrand <david at redhat.com>
---
 mm/page_isolation.c | 16 ++++++----------
 1 file changed, 6 insertions(+), 10 deletions(-)

diff --git a/mm/page_isolation.c b/mm/page_isolation.c
index 553b49a34cf71..02a01bff6b219 100644
--- a/mm/page_isolation.c
+++ b/mm/page_isolation.c
@@ -58,16 +58,12 @@ static int set_migratetype_isolate(struct page *page, int migratetype, int isol_
 	spin_unlock_irqrestore(&zone->lock, flags);
 	if (!ret) {
 		drain_all_pages(zone);
-	} else {
-		WARN_ON_ONCE(zone_idx(zone) == ZONE_MOVABLE);
-
-		if ((isol_flags & REPORT_FAILURE) && unmovable)
-			/*
-			 * printk() with zone->lock held will likely trigger a
-			 * lockdep splat, so defer it here.
-			 */
-			dump_page(unmovable, "unmovable page");
-	}
+	} else if ((isol_flags & REPORT_FAILURE) && unmovable)
+		/*
+		 * printk() with zone->lock held will likely trigger a
+		 * lockdep splat, so defer it here.
+		 */
+		dump_page(unmovable, "unmovable page");

 	return ret;
 }
--
2.26.2
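For reference, the hugetlb gating mentioned above ("see htlb_alloc_mask()")
looks roughly like this - a sketch reconstructed from the 5.8-era
include/linux/hugetlb.h, not quoted verbatim. Gigantic pages never get
__GFP_MOVABLE, so allocating them via alloc_contig_pages() never targets
ZONE_MOVABLE:

static inline bool hugepage_movable_supported(struct hstate *h)
{
	if (!hugepage_migration_supported(h))
		return false;
	/* Gigantic pages are not considered movable at runtime. */
	if (hstate_is_gigantic(h))
		return false;
	return true;
}

static inline gfp_t htlb_alloc_mask(struct hstate *h)
{
	/* Only movable-supported huge pages may come from ZONE_MOVABLE. */
	if (hugepage_movable_supported(h))
		return GFP_HIGHUSER_MOVABLE;
	return GFP_HIGHUSER;
}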
David Hildenbrand
2020-Jun-30 14:26 UTC
[PATCH v1 4/6] mm/page_isolation: cleanup set_migratetype_isolate()
Let's clean it up a bit, simplifying the error handling and getting rid
of the label.

Cc: Andrew Morton <akpm at linux-foundation.org>
Cc: Michal Hocko <mhocko at suse.com>
Cc: Michael S. Tsirkin <mst at redhat.com>
Signed-off-by: David Hildenbrand <david at redhat.com>
---
 mm/page_isolation.c | 18 +++++++-----------
 1 file changed, 7 insertions(+), 11 deletions(-)

diff --git a/mm/page_isolation.c b/mm/page_isolation.c
index 02a01bff6b219..5f869bef23fa4 100644
--- a/mm/page_isolation.c
+++ b/mm/page_isolation.c
@@ -17,12 +17,9 @@
 static int set_migratetype_isolate(struct page *page, int migratetype, int isol_flags)
 {
-	struct page *unmovable = NULL;
-	struct zone *zone;
+	struct zone *zone = page_zone(page);
+	struct page *unmovable;
 	unsigned long flags;
-	int ret = -EBUSY;
-
-	zone = page_zone(page);

 	spin_lock_irqsave(&zone->lock, flags);

@@ -51,21 +48,20 @@ static int set_migratetype_isolate(struct page *page, int migratetype, int isol_
 						NULL);

 		__mod_zone_freepage_state(zone, -nr_pages, mt);
-		ret = 0;
+		spin_unlock_irqrestore(&zone->lock, flags);
+		drain_all_pages(zone);
+		return 0;
 	}

-out:
 	spin_unlock_irqrestore(&zone->lock, flags);
-	if (!ret) {
-		drain_all_pages(zone);
-	} else if ((isol_flags & REPORT_FAILURE) && unmovable)
+	if (isol_flags & REPORT_FAILURE)
 		/*
 		 * printk() with zone->lock held will likely trigger a
 		 * lockdep splat, so defer it here.
 		 */
 		dump_page(unmovable, "unmovable page");

-	return ret;
+	return -EBUSY;
 }

 static void unset_migratetype_isolate(struct page *page, unsigned migratetype)
--
2.26.2
David Hildenbrand
2020-Jun-30 14:26 UTC
[PATCH v1 5/6] mm/page_alloc: restrict ZONE_MOVABLE optimization in has_unmovable_pages() to memory offlining
We can already have pages that can be offlined but not allocated in
ZONE_MOVABLE - PageHWPoison pages. While these pages can be skipped when
offlining ("moving them to /dev/null"), we cannot move them when
allocating.

virtio-mem managed memory is similar. The logical memory holes
corresponding to unplugged memory ranges can be skipped when offlining,
however, the pages cannot be moved. Currently, virtio-mem special-cases
ZONE_MOVABLE, such that:
- partially plugged memory blocks it added to Linux cannot be onlined to
  ZONE_MOVABLE
- when unplugging memory, it will never consider memory blocks that were
  onlined to ZONE_MOVABLE

We also want to support ZONE_MOVABLE in virtio-mem for both cases. Note
that virtio-mem does not blindly try to unplug random pages within its
managed memory region. It always plugs memory left-to-right and tries to
unplug memory right-to-left - in roughly MAX_ORDER - 1 granularity. In
theory, the movable ZONE part would only shrink when unplugging memory
from ZONE_MOVABLE.

Let's perform the ZONE_MOVABLE optimization only for memory offlining,
such that we reduce the number of false positives from
has_unmovable_pages() in case of alloc_contig_range() on ZONE_MOVABLE.

Note: We currently don't seem to have any user of alloc_contig_range()
that actually uses ZONE_MOVABLE. This change is mostly valuable for the
documentation.

Cc: Andrew Morton <akpm at linux-foundation.org>
Cc: Michal Hocko <mhocko at suse.com>
Cc: Michael S. Tsirkin <mst at redhat.com>
Signed-off-by: David Hildenbrand <david at redhat.com>
---
 mm/page_alloc.c | 7 +++++--
 1 file changed, 5 insertions(+), 2 deletions(-)

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index bd3ebf08f09b9..45077d74d975d 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -8237,9 +8237,12 @@ struct page *has_unmovable_pages(struct zone *zone, struct page *page,
 		/*
 		 * If the zone is movable and we have ruled out all reserved
 		 * pages then it should be reasonably safe to assume the rest
-		 * is movable.
+		 * is movable. As we can have some pages in the movable zone
+		 * that are only considered movable for memory offlining (esp.,
+		 * PageHWPoison and PageOffline that will be skipped), we
+		 * perform this optimization only for memory offlining.
 		 */
-		if (zone_idx(zone) == ZONE_MOVABLE)
+		if ((flags & MEMORY_OFFLINE) && zone_idx(zone) == ZONE_MOVABLE)
 			continue;

 		/*
--
2.26.2
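For context, only the memory offlining path passes MEMORY_OFFLINE down to
has_unmovable_pages(), which is why the shortcut above no longer fires for
alloc_contig_range(). The two call sites look roughly like this - a sketch
from memory of the 5.8-era sources, not part of this patch:

/* mm/memory_hotplug.c: __offline_pages() - may skip HWPoison/Offline
 * pages, so it is allowed to take the ZONE_MOVABLE shortcut. */
ret = start_isolate_page_range(start_pfn, end_pfn, MIGRATE_MOVABLE,
			       MEMORY_OFFLINE | REPORT_FAILURE);

/* mm/page_alloc.c: alloc_contig_range() - must actually migrate every
 * page, so no MEMORY_OFFLINE; every page gets checked, even inside
 * ZONE_MOVABLE. */
ret = start_isolate_page_range(pfn_max_align_down(start),
			       pfn_max_align_up(end), migratetype, 0);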
David Hildenbrand
2020-Jun-30 14:26 UTC
[PATCH v1 6/6] virtio-mem: don't special-case ZONE_MOVABLE
Let's allow to online partially plugged memory blocks to ZONE_MOVABLE
and also consider memory blocks that were onlined to ZONE_MOVABLE when
unplugging memory. While unplugged memory blocks are, in general,
unmovable, they can be skipped when offlining memory.

virtio-mem only unplugs fairly big chunks (in the megabyte range) and
rather tries to shrink the memory region than randomly choosing memory.
In theory, if all other pages in the movable zone would be movable,
virtio-mem would only shrink that zone and not create any kind of
fragmentation.

Note: Support for defragmentation is planned, to deal with fragmentation
after unplug due to memory chunks within memory blocks that could not get
unplugged before (e.g., somebody pinning pages within ZONE_MOVABLE for a
longer time).

Cc: Andrew Morton <akpm at linux-foundation.org>
Cc: Michal Hocko <mhocko at suse.com>
Cc: Michael S. Tsirkin <mst at redhat.com>
Cc: Jason Wang <jasowang at redhat.com>
Signed-off-by: David Hildenbrand <david at redhat.com>
---
 drivers/virtio/virtio_mem.c | 47 +++++++------------------------------
 1 file changed, 8 insertions(+), 39 deletions(-)

diff --git a/drivers/virtio/virtio_mem.c b/drivers/virtio/virtio_mem.c
index f26f5f64ae822..2ddfc4a0e2ee0 100644
--- a/drivers/virtio/virtio_mem.c
+++ b/drivers/virtio/virtio_mem.c
@@ -36,18 +36,10 @@ enum virtio_mem_mb_state {
 	VIRTIO_MEM_MB_STATE_OFFLINE,
 	/* Partially plugged, fully added to Linux, offline. */
 	VIRTIO_MEM_MB_STATE_OFFLINE_PARTIAL,
-	/* Fully plugged, fully added to Linux, online (!ZONE_MOVABLE). */
+	/* Fully plugged, fully added to Linux, online. */
 	VIRTIO_MEM_MB_STATE_ONLINE,
-	/* Partially plugged, fully added to Linux, online (!ZONE_MOVABLE). */
+	/* Partially plugged, fully added to Linux, online. */
 	VIRTIO_MEM_MB_STATE_ONLINE_PARTIAL,
-	/*
-	 * Fully plugged, fully added to Linux, online (ZONE_MOVABLE).
-	 * We are not allowed to allocate (unplug) parts of this block that
-	 * are not movable (similar to gigantic pages). We will never allow
-	 * to online OFFLINE_PARTIAL to ZONE_MOVABLE (as they would contain
-	 * unmovable parts).
-	 */
-	VIRTIO_MEM_MB_STATE_ONLINE_MOVABLE,
 	VIRTIO_MEM_MB_STATE_COUNT
 };

@@ -526,21 +518,10 @@ static bool virtio_mem_owned_mb(struct virtio_mem *vm, unsigned long mb_id)
 }

 static int virtio_mem_notify_going_online(struct virtio_mem *vm,
-					  unsigned long mb_id,
-					  enum zone_type zone)
+					  unsigned long mb_id)
 {
 	switch (virtio_mem_mb_get_state(vm, mb_id)) {
 	case VIRTIO_MEM_MB_STATE_OFFLINE_PARTIAL:
-		/*
-		 * We won't allow to online a partially plugged memory block
-		 * to the MOVABLE zone - it would contain unmovable parts.
-		 */
-		if (zone == ZONE_MOVABLE) {
-			dev_warn_ratelimited(&vm->vdev->dev,
-					     "memory block has holes, MOVABLE not supported\n");
-			return NOTIFY_BAD;
-		}
-
-		return NOTIFY_OK;
 	case VIRTIO_MEM_MB_STATE_OFFLINE:
 		return NOTIFY_OK;
 	default:
@@ -560,7 +541,6 @@ static void virtio_mem_notify_offline(struct virtio_mem *vm,
 					    VIRTIO_MEM_MB_STATE_OFFLINE_PARTIAL);
 		break;
 	case VIRTIO_MEM_MB_STATE_ONLINE:
-	case VIRTIO_MEM_MB_STATE_ONLINE_MOVABLE:
 		virtio_mem_mb_set_state(vm, mb_id,
 					VIRTIO_MEM_MB_STATE_OFFLINE);
 		break;
@@ -579,24 +559,17 @@ static void virtio_mem_notify_offline(struct virtio_mem *vm,
 	virtio_mem_retry(vm);
 }

-static void virtio_mem_notify_online(struct virtio_mem *vm, unsigned long mb_id,
-				     enum zone_type zone)
+static void virtio_mem_notify_online(struct virtio_mem *vm, unsigned long mb_id)
 {
 	unsigned long nb_offline;

 	switch (virtio_mem_mb_get_state(vm, mb_id)) {
 	case VIRTIO_MEM_MB_STATE_OFFLINE_PARTIAL:
-		BUG_ON(zone == ZONE_MOVABLE);
 		virtio_mem_mb_set_state(vm, mb_id,
 					VIRTIO_MEM_MB_STATE_ONLINE_PARTIAL);
 		break;
 	case VIRTIO_MEM_MB_STATE_OFFLINE:
-		if (zone == ZONE_MOVABLE)
-			virtio_mem_mb_set_state(vm, mb_id,
-					    VIRTIO_MEM_MB_STATE_ONLINE_MOVABLE);
-		else
-			virtio_mem_mb_set_state(vm, mb_id,
-						VIRTIO_MEM_MB_STATE_ONLINE);
+		virtio_mem_mb_set_state(vm, mb_id, VIRTIO_MEM_MB_STATE_ONLINE);
 		break;
 	default:
 		BUG();
@@ -675,7 +648,6 @@ static int virtio_mem_memory_notifier_cb(struct notifier_block *nb,
 	const unsigned long start = PFN_PHYS(mhp->start_pfn);
 	const unsigned long size = PFN_PHYS(mhp->nr_pages);
 	const unsigned long mb_id = virtio_mem_phys_to_mb_id(start);
-	enum zone_type zone;
 	int rc = NOTIFY_OK;

 	if (!virtio_mem_overlaps_range(vm, start, size))
@@ -717,8 +689,7 @@ static int virtio_mem_memory_notifier_cb(struct notifier_block *nb,
 			break;
 		}
 		vm->hotplug_active = true;
-		zone = page_zonenum(pfn_to_page(mhp->start_pfn));
-		rc = virtio_mem_notify_going_online(vm, mb_id, zone);
+		rc = virtio_mem_notify_going_online(vm, mb_id);
 		break;
 	case MEM_OFFLINE:
 		virtio_mem_notify_offline(vm, mb_id);
@@ -726,8 +697,7 @@ static int virtio_mem_memory_notifier_cb(struct notifier_block *nb,
 		mutex_unlock(&vm->hotplug_mutex);
 		break;
 	case MEM_ONLINE:
-		zone = page_zonenum(pfn_to_page(mhp->start_pfn));
-		virtio_mem_notify_online(vm, mb_id, zone);
+		virtio_mem_notify_online(vm, mb_id);
 		vm->hotplug_active = false;
 		mutex_unlock(&vm->hotplug_mutex);
 		break;
@@ -1906,8 +1876,7 @@ static void virtio_mem_remove(struct virtio_device *vdev)
 	if (vm->nb_mb_state[VIRTIO_MEM_MB_STATE_OFFLINE] ||
 	    vm->nb_mb_state[VIRTIO_MEM_MB_STATE_OFFLINE_PARTIAL] ||
 	    vm->nb_mb_state[VIRTIO_MEM_MB_STATE_ONLINE] ||
-	    vm->nb_mb_state[VIRTIO_MEM_MB_STATE_ONLINE_PARTIAL] ||
-	    vm->nb_mb_state[VIRTIO_MEM_MB_STATE_ONLINE_MOVABLE]) {
+	    vm->nb_mb_state[VIRTIO_MEM_MB_STATE_ONLINE_PARTIAL]) {
 		dev_warn(&vdev->dev, "device still has system memory added\n");
 	} else {
 		virtio_mem_delete_resource(vm);
--
2.26.2
David Hildenbrand
2020-Jul-21 09:59 UTC
[PATCH v1 0/6] mm / virtio-mem: support ZONE_MOVABLE
On 30.06.20 16:26, David Hildenbrand wrote:
> Currently, virtio-mem does not really support ZONE_MOVABLE. While it allows
> to online fully plugged memory blocks to ZONE_MOVABLE, it does not allow
> to online partially-plugged memory blocks to ZONE_MOVABLE and will never
> consider such memory blocks when unplugging memory. This might be
> surprising for users (especially, if onlining suddenly fails).
> 
> Let's support partially plugged memory blocks in ZONE_MOVABLE, allowing
> partially plugged memory blocks to be onlined to ZONE_MOVABLE and also
> unplugging from such memory blocks.
> 
> This is especially helpful for testing, but also paves the way for
> virtio-mem optimizations, allowing more memory to get reliably unplugged.
> 
> Cleanup has_unmovable_pages() and set_migratetype_isolate(), providing
> better documentation of how ZONE_MOVABLE interacts with different kinds of
> unmovable pages (memory offlining vs. alloc_contig_range()).
> 
> David Hildenbrand (6):
>   mm/page_alloc: tweak comments in has_unmovable_pages()
>   mm/page_isolation: don't dump_page(NULL) in set_migratetype_isolate()
>   mm/page_isolation: drop WARN_ON_ONCE() in set_migratetype_isolate()
>   mm/page_isolation: cleanup set_migratetype_isolate()
>   mm/page_alloc: restrict ZONE_MOVABLE optimization in
>     has_unmovable_pages() to memory offlining
>   virtio-mem: don't special-case ZONE_MOVABLE
> 
>  drivers/virtio/virtio_mem.c | 47 +++++++------------------------------
>  mm/page_alloc.c             | 29 +++++++++--------------
>  mm/page_isolation.c         | 40 ++++++++++++++----------------
>  3 files changed, 36 insertions(+), 80 deletions(-)
> 

Gentle ping, patches #1-#5 should be easy to review.

--
Thanks,

David / dhildenb
David Hildenbrand
2020-Jul-27 12:23 UTC
[PATCH v1 5/6] mm/page_alloc: restrict ZONE_MOVABLE optimization in has_unmovable_pages() to memory offlining
On 30.06.20 16:26, David Hildenbrand wrote:
> We can already have pages that can be offlined but not allocated in
> ZONE_MOVABLE - PageHWPoison pages. While these pages can be skipped when
> offlining ("moving them to /dev/null"), we cannot move them when
> allocating.
> 
> virtio-mem managed memory is similar. The logical memory holes
> corresponding to unplugged memory ranges can be skipped when offlining,
> however, the pages cannot be moved. Currently, virtio-mem special-cases
> ZONE_MOVABLE, such that:
> - partially plugged memory blocks it added to Linux cannot be onlined to
>   ZONE_MOVABLE
> - when unplugging memory, it will never consider memory blocks that were
>   onlined to ZONE_MOVABLE
> 
> We also want to support ZONE_MOVABLE in virtio-mem for both cases. Note
> that virtio-mem does not blindly try to unplug random pages within its
> managed memory region. It always plugs memory left-to-right and tries to
> unplug memory right-to-left - in roughly MAX_ORDER - 1 granularity. In
> theory, the movable ZONE part would only shrink when unplugging memory
> from ZONE_MOVABLE.
> 
> Let's perform the ZONE_MOVABLE optimization only for memory offlining,
> such that we reduce the number of false positives from
> has_unmovable_pages() in case of alloc_contig_range() on ZONE_MOVABLE.
> 
> Note: We currently don't seem to have any user of alloc_contig_range()
> that actually uses ZONE_MOVABLE. This change is mostly valuable for the
> documentation.
> 
> Cc: Andrew Morton <akpm at linux-foundation.org>
> Cc: Michal Hocko <mhocko at suse.com>
> Cc: Michael S. Tsirkin <mst at redhat.com>
> Signed-off-by: David Hildenbrand <david at redhat.com>
> ---
>  mm/page_alloc.c | 7 +++++--
>  1 file changed, 5 insertions(+), 2 deletions(-)
> 
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index bd3ebf08f09b9..45077d74d975d 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -8237,9 +8237,12 @@ struct page *has_unmovable_pages(struct zone *zone, struct page *page,
>  		/*
>  		 * If the zone is movable and we have ruled out all reserved
>  		 * pages then it should be reasonably safe to assume the rest
> -		 * is movable.
> +		 * is movable. As we can have some pages in the movable zone
> +		 * that are only considered movable for memory offlining (esp.,
> +		 * PageHWPoison and PageOffline that will be skipped), we
> +		 * perform this optimization only for memory offlining.
>  		 */
> -		if (zone_idx(zone) == ZONE_MOVABLE)
> +		if ((flags & MEMORY_OFFLINE) && zone_idx(zone) == ZONE_MOVABLE)
>  			continue;
> 
>  		/*
> 

So, as we don't have any alloc_contig_range() users that use ZONE_MOVABLE
for now, and virtio-mem will be the only one for now (which accounts for
50% of the special cases - PG_offline), I think we might drop this patch.

Worst thing is that if we ever have other alloc_contig_range() users, we
return "false" from has_unmovable_pages() and fail later when trying to
migrate/isolate all pages. This should, however, only happen in rare
cases (and there are already other cases where we have basically
unmovable data - long-term pinnings).

On the plus side, keeping the ZONE_MOVABLE optimizations here also allows
virtio-mem to better tolerate unstable page flags when trying to
alloc_contig_range().

Thoughts?

--
Thanks,

David / dhildenb
David Hildenbrand
2020-Jul-28 14:07 UTC
[PATCH v1 1/6] mm/page_alloc: tweak comments in has_unmovable_pages()
On 28.07.20 15:48, Baoquan He wrote:
> On 06/30/20 at 04:26pm, David Hildenbrand wrote:
>> Let's move the split comment regarding bootmem allocations and memory
>> holes, especially in the context of ZONE_MOVABLE, to the PageReserved()
>> check.
>>
>> Cc: Andrew Morton <akpm at linux-foundation.org>
>> Cc: Michal Hocko <mhocko at suse.com>
>> Cc: Michael S. Tsirkin <mst at redhat.com>
>> Signed-off-by: David Hildenbrand <david at redhat.com>
>> ---
>>  mm/page_alloc.c | 22 ++++++----------------
>>  1 file changed, 6 insertions(+), 16 deletions(-)
>>
>> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
>> index 48eb0f1410d47..bd3ebf08f09b9 100644
>> --- a/mm/page_alloc.c
>> +++ b/mm/page_alloc.c
>> @@ -8207,14 +8207,6 @@ struct page *has_unmovable_pages(struct zone *zone, struct page *page,
>>  	unsigned long iter = 0;
>>  	unsigned long pfn = page_to_pfn(page);
>>
>> -	/*
>> -	 * TODO we could make this much more efficient by not checking every
>> -	 * page in the range if we know all of them are in MOVABLE_ZONE and
>> -	 * that the movable zone guarantees that pages are migratable but
>> -	 * the later is not the case right now unfortunatelly. E.g. movablecore
>> -	 * can still lead to having bootmem allocations in zone_movable.
>> -	 */
>> -
>>  	if (is_migrate_cma_page(page)) {
>>  		/*
>>  		 * CMA allocations (alloc_contig_range) really need to mark
>> @@ -8233,6 +8225,12 @@ struct page *has_unmovable_pages(struct zone *zone, struct page *page,
>>
>>  		page = pfn_to_page(pfn + iter);
>>
>> +		/*
>> +		 * Both, bootmem allocations and memory holes are marked
>> +		 * PG_reserved and are unmovable. We can even have unmovable
>> +		 * allocations inside ZONE_MOVABLE, for example when
>> +		 * specifying "movable_core".
> 				~~~~ should be 'movablecore', we don't
> have kernel parameter 'movable_core'.

Agreed!

> 
> Otherwise, this looks good to me. Esp the code comment at below had been
> added very long time ago and is obsolete.
> 
> Reviewed-by: Baoquan He <bhe at redhat.com>
> 
> By the way, David, do you know what is the situation of having unmovable
> allocations inside ZONE_MOVABLE when specifying 'movablecore'? I quickly
> went through find_zone_movable_pfns_for_nodes(), but didn't get why.
> Could you tell a little more detail about it?

As far as I understand, it can happen that we have memblock allocations
during boot that fall into an area the kernel later configures to span
the movable zone (via movable_core).

> 
> Thanks
> Baoquan

--
Thanks,

David / dhildenb
David Hildenbrand
2020-Jul-29 12:29 UTC
[PATCH v1 1/6] mm/page_alloc: tweak comments in has_unmovable_pages()
On 29.07.20 12:47, Baoquan He wrote:
> On 07/28/20 at 04:07pm, David Hildenbrand wrote:
>> On 28.07.20 15:48, Baoquan He wrote:
>>> On 06/30/20 at 04:26pm, David Hildenbrand wrote:
>>>> Let's move the split comment regarding bootmem allocations and memory
>>>> holes, especially in the context of ZONE_MOVABLE, to the PageReserved()
>>>> check.
>>>>
>>>> Cc: Andrew Morton <akpm at linux-foundation.org>
>>>> Cc: Michal Hocko <mhocko at suse.com>
>>>> Cc: Michael S. Tsirkin <mst at redhat.com>
>>>> Signed-off-by: David Hildenbrand <david at redhat.com>
>>>> ---
>>>>  mm/page_alloc.c | 22 ++++++----------------
>>>>  1 file changed, 6 insertions(+), 16 deletions(-)
>>>>
>>>> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
>>>> index 48eb0f1410d47..bd3ebf08f09b9 100644
>>>> --- a/mm/page_alloc.c
>>>> +++ b/mm/page_alloc.c
>>>> @@ -8207,14 +8207,6 @@ struct page *has_unmovable_pages(struct zone *zone, struct page *page,
>>>>  	unsigned long iter = 0;
>>>>  	unsigned long pfn = page_to_pfn(page);
>>>>
>>>> -	/*
>>>> -	 * TODO we could make this much more efficient by not checking every
>>>> -	 * page in the range if we know all of them are in MOVABLE_ZONE and
>>>> -	 * that the movable zone guarantees that pages are migratable but
>>>> -	 * the later is not the case right now unfortunatelly. E.g. movablecore
>>>> -	 * can still lead to having bootmem allocations in zone_movable.
>>>> -	 */
>>>> -
>>>>  	if (is_migrate_cma_page(page)) {
>>>>  		/*
>>>>  		 * CMA allocations (alloc_contig_range) really need to mark
>>>> @@ -8233,6 +8225,12 @@ struct page *has_unmovable_pages(struct zone *zone, struct page *page,
>>>>
>>>>  		page = pfn_to_page(pfn + iter);
>>>>
>>>> +		/*
>>>> +		 * Both, bootmem allocations and memory holes are marked
>>>> +		 * PG_reserved and are unmovable. We can even have unmovable
>>>> +		 * allocations inside ZONE_MOVABLE, for example when
>>>> +		 * specifying "movable_core".
>>> 				~~~~ should be 'movablecore', we don't
>>> have kernel parameter 'movable_core'.
>>
>> Agreed!
>>
>>> Otherwise, this looks good to me. Esp the code comment at below had been
>>> added very long time ago and is obsolete.
>>>
>>> Reviewed-by: Baoquan He <bhe at redhat.com>
>>>
>>> By the way, David, do you know what is the situation of having unmovable
>>> allocations inside ZONE_MOVABLE when specifying 'movablecore'? I quickly
>>> went through find_zone_movable_pfns_for_nodes(), but didn't get why.
>>> Could you tell a little more detail about it?
>>
>> As far as I understand, it can happen that we have memblock allocations
>> during boot that fall into an area the kernel later configures to span
>> the movable zone (via movable_core).
> 
> Seems yes, thanks a lot. Wondering who is still using
> movablecore|kernelcore in what use case.

AFAIK, it's the only (guaranteed) way to get ZONE_MOVABLE without
involving memory hotplug (e.g., booting with "movablecore=4G" sets aside
4 GiB for the movable zone). As I learned, the movable zone is also
interesting beyond memory hotunplug. It limits unmovable fragmentation
and, e.g., makes THP/huge pages more likely/easier to allocate.

--
Thanks,

David / dhildenb
David Hildenbrand
2020-Jul-29 13:37 UTC
[PATCH v1 3/6] mm/page_isolation: drop WARN_ON_ONCE() in set_migratetype_isolate()
On 29.07.20 15:24, Baoquan He wrote:
> On 06/30/20 at 04:26pm, David Hildenbrand wrote:
>> Inside has_unmovable_pages(), we have a comment describing how unmovable
>> data could end up in ZONE_MOVABLE - via "movable_core". Also, besides
> 					       ~~~ 'movablecore'
>> checking if the first page in the pageblock is reserved, we don't
>> perform any further checks in case of ZONE_MOVABLE.
>>
>> In case of memory offlining, we set REPORT_FAILURE, properly
>> dump_page() the page and handle the error gracefully.
>> alloc_contig_pages() users currently never allocate from ZONE_MOVABLE.
>> E.g., hugetlb uses alloc_contig_pages() for the allocation of gigantic
>> pages only, which will never end up on the MOVABLE zone
>> (see htlb_alloc_mask()).
>>
>> Cc: Andrew Morton <akpm at linux-foundation.org>
>> Cc: Michal Hocko <mhocko at suse.com>
>> Cc: Michael S. Tsirkin <mst at redhat.com>
>> Signed-off-by: David Hildenbrand <david at redhat.com>
>> ---
>>  mm/page_isolation.c | 16 ++++++----------
>>  1 file changed, 6 insertions(+), 10 deletions(-)
>>
>> diff --git a/mm/page_isolation.c b/mm/page_isolation.c
>> index 553b49a34cf71..02a01bff6b219 100644
>> --- a/mm/page_isolation.c
>> +++ b/mm/page_isolation.c
>> @@ -58,16 +58,12 @@ static int set_migratetype_isolate(struct page *page, int migratetype, int isol_
>>  	spin_unlock_irqrestore(&zone->lock, flags);
>>  	if (!ret) {
>>  		drain_all_pages(zone);
>> -	} else {
>> -		WARN_ON_ONCE(zone_idx(zone) == ZONE_MOVABLE);
>> -
>> -		if ((isol_flags & REPORT_FAILURE) && unmovable)
>> -			/*
>> -			 * printk() with zone->lock held will likely trigger a
>> -			 * lockdep splat, so defer it here.
>> -			 */
>> -			dump_page(unmovable, "unmovable page");
>> -	}
>> +	} else if ((isol_flags & REPORT_FAILURE) && unmovable)
> 
> This else if branch should be enclosed in braces?

Not necessarily. And it will be gone in the next patch in this series :)

Thanks!

--
Thanks,

David / dhildenb
David Hildenbrand
2020-Jul-29 18:08 UTC
[PATCH v1 2/6] mm/page_isolation: don't dump_page(NULL) in set_migratetype_isolate()
On 29.07.20 19:31, Mike Kravetz wrote:
> On 6/30/20 7:26 AM, David Hildenbrand wrote:
>> Right now, if we have two isolations racing, we might trigger the
>> WARN_ON_ONCE() and do dump_page(NULL), dereferencing NULL. Let's just
>> return directly.
> 
> Just curious, what call path has the WARN_ON_ONCE()/dump_page(NULL)?

See below, two set_migratetype_isolate() callers racing.

> 
>> In the future, we might want to report -EAGAIN to the caller instead, as
>> this could indicate a temporary isolation failure only.
>>
>> Cc: Andrew Morton <akpm at linux-foundation.org>
>> Cc: Michal Hocko <mhocko at suse.com>
>> Cc: Michael S. Tsirkin <mst at redhat.com>
>> Signed-off-by: David Hildenbrand <david at redhat.com>
> 
> Hi David,
> 
> That 'return -EAGAIN' was added as a sort of synchronization mechanism.
> See commit message for 2c7452a075d4d. Before adding the 'return -EAGAIN',
> I could create races which would abandon isolated pageblocks. Repeating
> those races over and over would result in a good chunk of system memory
> being isolated and unusable.

It's actually -EBUSY; it should maybe later be changed to -EAGAIN (see
comment), so callers can decide to retry immediately. But that's a
different discussion.

> 
> Admittedly, these races are rare and I had to work really hard to produce
> them. I'll try to find my testing mechanism. My concern is reintroducing
> this abandoning of pageblocks. I have not looked further in your series
> to see if this is potentially addressed later. If not, then we should not
> remove the return code.

Memory offlining could race with alloc_contig_range(), e.g., called when
allocating gigantic pages, or when virtio-mem tries to unplug memory. The
latter two could also race. We are getting more alloc_contig_range()
users, which is why these races will become more relevant.

I have no clue what you mean with "reintroducing this abandoning of
pageblocks". All this patch is changing is not doing the dump_page() - or
am I missing something important?

--
Thanks,

David / dhildenb
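To make the race concrete: with more alloc_contig_range() users, two
callers can try to isolate overlapping pageblocks concurrently, and the
loser finds the pageblock already isolated. A minimal sketch of how such
a caller could treat that as a transient failure - illustrative only, not
from this series, assuming -EBUSY may mean "raced with another isolation"
(the -EAGAIN idea from the commit message):

/* Illustrative retry loop: try to grab one contiguous chunk, treating
 * -EBUSY as a possibly transient isolation race. */
int retries = 5, rc;

do {
	rc = alloc_contig_range(start_pfn, end_pfn, MIGRATE_MOVABLE,
				GFP_KERNEL);
} while (rc == -EBUSY && retries--);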