David Hildenbrand
2022-Apr-12 15:06 UTC
[PATCH v10 2/5] mm: page_isolation: check specified range for unmovable pages
On 12.04.22 17:01, Zi Yan wrote:
> On 12 Apr 2022, at 10:49, David Hildenbrand wrote:
>
>> On 12.04.22 16:07, Zi Yan wrote:
>>> On 12 Apr 2022, at 9:10, David Hildenbrand wrote:
>>>
>>>> On 06.04.22 17:18, Zi Yan wrote:
>>>>> From: Zi Yan <ziy at nvidia.com>
>>>>>
>>>>> Enable set_migratetype_isolate() to check a specified sub-range for
>>>>> unmovable pages during isolation. Page isolation is done at
>>>>> MAX_ORDER_NR_PAGES granularity, but not all pages within that
>>>>> granularity are intended to be isolated. For example,
>>>>> alloc_contig_range(), which uses page isolation, allows ranges without
>>>>> alignment. This commit makes the unmovable page check look only at
>>>>> the pages of interest, so that page isolation can succeed for any
>>>>> non-overlapping ranges.
>>>>>
>>>>> Signed-off-by: Zi Yan <ziy at nvidia.com>
>>>>> ---
>>>>
>>>> [...]
>>>>
>>>>>  /*
>>>>> - * This function checks whether pageblock includes unmovable pages or not.
>>>>> + * This function checks whether the range [start_pfn, end_pfn) includes
>>>>> + * unmovable pages or not. The range must fall into a single pageblock and
>>>>> + * consequently belong to a single zone.
>>>>>   *
>>>>>   * PageLRU check without isolation or lru_lock could race so that
>>>>>   * MIGRATE_MOVABLE block might include unmovable pages. And __PageMovable
>>>>> @@ -28,12 +30,14 @@
>>>>>   * cannot get removed (e.g., via memory unplug) concurrently.
>>>>>   *
>>>>>   */
>>>>> -static struct page *has_unmovable_pages(struct zone *zone, struct page *page,
>>>>> -					 int migratetype, int flags)
>>>>> +static struct page *has_unmovable_pages(unsigned long start_pfn, unsigned long end_pfn,
>>>>> +					 int migratetype, int flags)
>>>>>  {
>>>>> -	unsigned long iter = 0;
>>>>> -	unsigned long pfn = page_to_pfn(page);
>>>>> -	unsigned long offset = pfn % pageblock_nr_pages;
>>>>> +	unsigned long pfn = start_pfn;
>>>>> +	struct page *page = pfn_to_page(pfn);
>>>>
>>>>
>>>> Just do
>>>>
>>>> struct page *page = pfn_to_page(start_pfn);
>>>> struct zone *zone = page_zone(page);
>>>>
>>>> here. No need to look up the zone again in the loop because, as you
>>>> document, the range "must ... belong to a single zone.".
>>>>
>>>> Then, there is also no need to initialize "pfn" here; doing it in the
>>>> loop header is sufficient.
>>>>
>>>
>>> Sure.
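
[With David's suggestion applied, the opening of has_unmovable_pages()
from the patch would look roughly like the sketch below. This is an
illustration against the diff above, not merged kernel code; the CMA
handling and the per-page checks are elided.]

static struct page *has_unmovable_pages(unsigned long start_pfn, unsigned long end_pfn,
					int migratetype, int flags)
{
	/* The whole range lies in one pageblock, hence in one zone. */
	struct page *page = pfn_to_page(start_pfn);
	struct zone *zone = page_zone(page);
	unsigned long pfn;

	VM_BUG_ON(ALIGN_DOWN(start_pfn, pageblock_nr_pages) !=
		  ALIGN_DOWN(end_pfn - 1, pageblock_nr_pages));

	if (is_migrate_cma_page(page)) {
		/* ... CMA handling as in the patch ... */
	}

	for (pfn = start_pfn; pfn < end_pfn; pfn++) {
		page = pfn_to_page(pfn);
		/* ... per-page unmovability checks as in the patch ... */
	}
	return NULL;
}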
>>>
>>>>> +
>>>>> +	VM_BUG_ON(ALIGN_DOWN(start_pfn, pageblock_nr_pages) !=
>>>>> +		  ALIGN_DOWN(end_pfn - 1, pageblock_nr_pages));
>>>>>
>>>>>  	if (is_migrate_cma_page(page)) {
>>>>>  		/*
>>>>> @@ -47,8 +51,11 @@ static struct page *has_unmovable_pages(struct zone *zone, struct page *page,
>>>>>  		return page;
>>>>>  	}
>>>>>
>>>>> -	for (; iter < pageblock_nr_pages - offset; iter++) {
>>>>> -		page = pfn_to_page(pfn + iter);
>>>>> +	for (pfn = start_pfn; pfn < end_pfn; pfn++) {
>>>>> +		struct zone *zone;
>>>>> +
>>>>> +		page = pfn_to_page(pfn);
>>>>> +		zone = page_zone(page);
>>>>>
>>>>>  		/*
>>>>>  		 * Both, bootmem allocations and memory holes are marked
>>>>> @@ -85,7 +92,7 @@ static struct page *has_unmovable_pages(struct zone *zone, struct page *page,
>>>>>  		}
>>>>>
>>>>>  		skip_pages = compound_nr(head) - (page - head);
>>>>> -		iter += skip_pages - 1;
>>>>> +		pfn += skip_pages - 1;
>>>>>  		continue;
>>>>>  	}
>>>>>
>>>>> @@ -97,7 +104,7 @@ static struct page *has_unmovable_pages(struct zone *zone, struct page *page,
>>>>>  	 */
>>>>>  	if (!page_ref_count(page)) {
>>>>>  		if (PageBuddy(page))
>>>>> -			iter += (1 << buddy_order(page)) - 1;
>>>>> +			pfn += (1 << buddy_order(page)) - 1;
>>>>>  		continue;
>>>>>  	}
>>>>>
>>>>> @@ -134,11 +141,18 @@ static struct page *has_unmovable_pages(struct zone *zone, struct page *page,
>>>>>  	return NULL;
>>>>>  }
>>>>>
>>>>> -static int set_migratetype_isolate(struct page *page, int migratetype, int isol_flags)
>>>>> +/*
>>>>> + * This function sets the pageblock migratetype to isolate if no unmovable
>>>>> + * page is present in [start_pfn, end_pfn). The pageblock must intersect
>>>>> + * with [start_pfn, end_pfn).
>>>>> + */
>>>>> +static int set_migratetype_isolate(struct page *page, int migratetype, int isol_flags,
>>>>> +				   unsigned long start_pfn, unsigned long end_pfn)
>>>>
>>>> I think we might be able to do better, eventually not passing start_pfn
>>>> at all. Hmm.
>>>
>>> IMHO, having start_pfn and end_pfn in the parameter list would make the
>>> interface easier to understand. Otherwise, if we remove start_pfn,
>>> the caller needs to adjust @page to be within the range of [start_pfn,
>>> end_pfn).
>>>
>>>> I think we want to pull out the
>>>> start_isolate_page_range()/undo_isolate_page_range() interface change
>>>> into a separate patch.
>>>
>>> You mean a patch just adding
>>>
>>> unsigned long isolate_start = pfn_max_align_down(start_pfn);
>>> unsigned long isolate_end = pfn_max_align_up(end_pfn);
>>>
>>> in start_isolate_page_range()/undo_isolate_page_range()?
>>>
>>> Yes, I can do that.
>>
>> I think we have to be careful with memory onlining/offlining. There are
>> corner cases where we get called with only pageblock alignment and
>> must not adjust the range.
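
[For context, the pfn_max_align_down()/pfn_max_align_up() helpers Zi Yan
refers to round to the larger of MAX_ORDER_NR_PAGES and
pageblock_nr_pages. A sketch of their shape, paraphrased from
mm/page_alloc.c of that era rather than copied verbatim:]

static unsigned long pfn_max_align_down(unsigned long pfn)
{
	/* Round down to the larger of MAX_ORDER and pageblock granularity. */
	return ALIGN_DOWN(pfn, max_t(unsigned long, MAX_ORDER_NR_PAGES,
				     pageblock_nr_pages));
}

static unsigned long pfn_max_align_up(unsigned long pfn)
{
	/* Round up likewise, so the aligned range covers [start_pfn, end_pfn). */
	return ALIGN(pfn, max_t(unsigned long, MAX_ORDER_NR_PAGES,
				pageblock_nr_pages));
}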
> In the patch below, you added a new set of start_isolate_pageblocks()
> and undo_isolate_pageblocks(), where start_isolate_pageblocks() still
> calls set_migratetype_isolate() and noted their range should not be
> adjusted. But in my patch, set_migratetype_isolate() adjusts
> the range for the has_unmovable_pages() check. Do you mean

Right, that's broken in your patch. Memory onlining/offlining behavior
changed recently when "vmemmap on memory" was added. The start range
might only be aligned to pageblocks but not to MAX_ORDER - 1 -- and we
must not align u..

The core thing is that there are two types of users: memory offlining,
which knows what it's doing when it aligns to less than MAX_ORDER - 1,
and range allocators, which just pass in the range of interest.

> start_isolate_pageblocks() should call a different version of
> set_migratetype_isolate() without range adjustment? That can be done
> with an additional parameter in start_isolate_page_range(), like
> bool strict, right?

Random boolean flags are in general frowned upon ;)

-- 
Thanks,

David / dhildenb
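
[To illustrate the direction David's draft takes instead of a boolean
parameter, the interface split for the two types of users might look
roughly like the sketch below. The shared __start_isolate_page_range()
worker is hypothetical; only start_isolate_pageblocks() and
start_isolate_page_range() come from the discussion.]

/* Hypothetical shared worker: isolates exactly [start_pfn, end_pfn). */
static int __start_isolate_page_range(unsigned long start_pfn,
				      unsigned long end_pfn,
				      int migratetype, int flags);

/*
 * Memory offlining: the caller guarantees pageblock alignment, and the
 * range must not be widened (it may be smaller than MAX_ORDER - 1).
 */
int start_isolate_pageblocks(unsigned long start_pfn, unsigned long end_pfn,
			     int migratetype, int flags)
{
	return __start_isolate_page_range(start_pfn, end_pfn,
					  migratetype, flags);
}

/*
 * Range allocators (e.g., alloc_contig_range()): callers pass the range
 * of interest; the range actually isolated is widened internally.
 */
int start_isolate_page_range(unsigned long start_pfn, unsigned long end_pfn,
			     int migratetype, int flags)
{
	return __start_isolate_page_range(pfn_max_align_down(start_pfn),
					  pfn_max_align_up(end_pfn),
					  migratetype, flags);
}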