Michal Hocko
2019-Oct-16 11:43 UTC
[PATCH RFC v3 6/9] mm: Allow to offline PageOffline() pages with a reference count of 0
On Thu 19-09-19 16:22:25, David Hildenbrand wrote:> virtio-mem wants to allow to offline memory blocks of which some parts > were unplugged, especially, to later offline and remove completely > unplugged memory blocks. The important part is that PageOffline() has > to remain set until the section is offline, so these pages will never > get accessed (e.g., when dumping). The pages should not be handed > back to the buddy (which would require clearing PageOffline() and > result in issues if offlining fails and the pages are suddenly in the > buddy). > > Let's use "PageOffline() + reference count = 0" as a sign to > memory offlining code that these pages can simply be skipped when > offlining, similar to free or HWPoison pages. > > Pass flags to test_pages_isolated(), similar as already done for > has_unmovable_pages(). Use a new flag to indicate the > requirement of memory offlining to skip over these special pages. > > In has_unmovable_pages(), make sure the pages won't be detected as > movable. This is not strictly necessary, however makes e.g., > alloc_contig_range() stop early, trying to isolate such page blocks - > compared to failing later when testing if all pages were isolated. > > Also, make sure that when a reference to a PageOffline() page is > dropped, that the page will not be returned to the buddy. > > memory devices (like virtio-mem) that want to make use of this > functionality have to make sure to synchronize against memory offlining, > using the memory hotplug notifier. > > Alternative: Allow to offline with a reference count of 1 > and use some other sign in the struct page that offlining is permitted.Few questions. I do not see onlining code to take care of this special case. What should happen when offline && online? Should we allow to try_remove_memory to succeed with these pages? Do we really have hook into __put_page? Why do we even care about the reference count of those pages? Wouldn't it be just more consistent to elevate the reference count (I guess this is what you suggest in the last paragraph) and the virtio driver would return that page to the buddy by regular put_page. This is also related to the above question about the physical memory remove. [...]> diff --git a/mm/page_alloc.c b/mm/page_alloc.c > index d5d7944954b3..fef74720d8b4 100644 > --- a/mm/page_alloc.c > +++ b/mm/page_alloc.c > @@ -8221,6 +8221,15 @@ bool has_unmovable_pages(struct zone *zone, struct page *page, int count, > if (!page_ref_count(page)) { > if (PageBuddy(page)) > iter += (1 << page_order(page)) - 1; > + /* > + * Memory devices allow to offline a page if it is > + * marked PG_offline and has a reference count of 0. > + * However, the pages are not movable as it would be > + * required e.g., for alloc_contig_range(). > + */ > + if (PageOffline(page) && !(flags & SKIP_OFFLINE)) > + if (++found > count) > + goto unmovable; > continue; > }Do we really need to distinguish offline and hwpoison pages? They are both unmovable for allocator purposes and offlineable for the hotplug, right? Should we just hide them behind a helper and use it rather than an explicit SKIP_$FOO? -- Michal Hocko SUSE Labs
David Hildenbrand
2019-Oct-16 12:50 UTC
[PATCH RFC v3 6/9] mm: Allow to offline PageOffline() pages with a reference count of 0
On 16.10.19 13:43, Michal Hocko wrote:> On Thu 19-09-19 16:22:25, David Hildenbrand wrote: >> virtio-mem wants to allow to offline memory blocks of which some parts >> were unplugged, especially, to later offline and remove completely >> unplugged memory blocks. The important part is that PageOffline() has >> to remain set until the section is offline, so these pages will never >> get accessed (e.g., when dumping). The pages should not be handed >> back to the buddy (which would require clearing PageOffline() and >> result in issues if offlining fails and the pages are suddenly in the >> buddy). >> >> Let's use "PageOffline() + reference count = 0" as a sign to >> memory offlining code that these pages can simply be skipped when >> offlining, similar to free or HWPoison pages. >> >> Pass flags to test_pages_isolated(), similar as already done for >> has_unmovable_pages(). Use a new flag to indicate the >> requirement of memory offlining to skip over these special pages. >> >> In has_unmovable_pages(), make sure the pages won't be detected as >> movable. This is not strictly necessary, however makes e.g., >> alloc_contig_range() stop early, trying to isolate such page blocks - >> compared to failing later when testing if all pages were isolated. >> >> Also, make sure that when a reference to a PageOffline() page is >> dropped, that the page will not be returned to the buddy. >> >> memory devices (like virtio-mem) that want to make use of this >> functionality have to make sure to synchronize against memory offlining, >> using the memory hotplug notifier. >> >> Alternative: Allow to offline with a reference count of 1 >> and use some other sign in the struct page that offlining is permitted. > > Few questions. I do not see onlining code to take care of this special > case. What should happen when offline && online?Once offline, the memmap is garbage. When onlining again: a) memmap will be re-initialized b) online_page_callback_t will be called for every page in the section. The driver can mark them offline again and not give them to the buddy. c) section will be marked online. The driver that marked these pages to be skipped when offlining is responsible for registering the online_page_callback_t callback where these pages will get excluded. This is exactly the same as when onling a memory block that is partially populated (e.g., HpyerV balloon right now). So it's effectively "re-initializing the memmap using the driver knowledge" when onlining.> Should we allow to try_remove_memory to succeed with these pages?I think we should first properly offline them (mark sections offline and memory blocks, fixup numbers, shrink zones ...). The we can cleanly remove the memory. (see [PATCH RFC v3 8/9] mm/memory_hotplug: Introduce offline_and_remove_memory()) Once offline, the memmap is irrelevant and try_remove_memory() can do its job.> Do we really have hook into __put_page? Why do we even care about the > reference count of those pages? Wouldn't it be just more consistent to > elevate the reference count (I guess this is what you suggest in the > last paragraph) and the virtio driver would return that page to the > buddy by regular put_page. This is also related to the above question > about the physical memory remove.Returning them to the buddy is problematic for various reasons. Let's have a look at __offline_pages(): 1) start_isolate_page_range() -> offline pages with a reference count of one will be detected as unmovable -> BAD, we abort right away. We could hack around that. 2) memory_notify(MEM_GOING_OFFLINE, &arg); -> Here, we could release all pages to the buddy, clearing PG_offline -> BAD, PF_offline must not be cleared so dumping tools will not touch these pages. I don't see a way to hack around that. 3) scan_movable_pages() ... 4a) memory_notify(MEM_OFFLINE, &arg); Perfect, it worked. Sections are offline. 4b) undo_isolate_page_range(start_pfn, end_pfn, MIGRATE_MOVABLE); memory_notify(MEM_CANCEL_OFFLINE, &arg); -> Offlining failed for whatever reason. -> Pages are in the buddy, but we already un-isolated them. BAD. By not going via the buddy we avoid these issues and can leave PG_offline set until the section is fully offline. Something that is very desirable for virtio-mem (and as far as I can tell also HyperV in the future).> > [...] >> diff --git a/mm/page_alloc.c b/mm/page_alloc.c >> index d5d7944954b3..fef74720d8b4 100644 >> --- a/mm/page_alloc.c >> +++ b/mm/page_alloc.c >> @@ -8221,6 +8221,15 @@ bool has_unmovable_pages(struct zone *zone, struct page *page, int count, >> if (!page_ref_count(page)) { >> if (PageBuddy(page)) >> iter += (1 << page_order(page)) - 1; >> + /* >> + * Memory devices allow to offline a page if it is >> + * marked PG_offline and has a reference count of 0. >> + * However, the pages are not movable as it would be >> + * required e.g., for alloc_contig_range(). >> + */ >> + if (PageOffline(page) && !(flags & SKIP_OFFLINE)) >> + if (++found > count) >> + goto unmovable; >> continue; >> } > > Do we really need to distinguish offline and hwpoison pages? They are > both unmovable for allocator purposes and offlineable for the hotplug, > right? Should we just hide them behind a helper and use it rather than > an explicit SKIP_$FOO?Makes sense. It really boils down to "offline" vs. "allocate" use cases. So maybe instead of "SKIP_FOO" something like "MEMORY_OFFLINE". ? -- Thanks, David / dhildenb
David Hildenbrand
2019-Oct-16 13:45 UTC
[PATCH RFC v3 6/9] mm: Allow to offline PageOffline() pages with a reference count of 0
On 16.10.19 13:43, Michal Hocko wrote:> On Thu 19-09-19 16:22:25, David Hildenbrand wrote: >> virtio-mem wants to allow to offline memory blocks of which some parts >> were unplugged, especially, to later offline and remove completely >> unplugged memory blocks. The important part is that PageOffline() has >> to remain set until the section is offline, so these pages will never >> get accessed (e.g., when dumping). The pages should not be handed >> back to the buddy (which would require clearing PageOffline() and >> result in issues if offlining fails and the pages are suddenly in the >> buddy). >> >> Let's use "PageOffline() + reference count = 0" as a sign to >> memory offlining code that these pages can simply be skipped when >> offlining, similar to free or HWPoison pages. >> >> Pass flags to test_pages_isolated(), similar as already done for >> has_unmovable_pages(). Use a new flag to indicate the >> requirement of memory offlining to skip over these special pages. >> >> In has_unmovable_pages(), make sure the pages won't be detected as >> movable. This is not strictly necessary, however makes e.g., >> alloc_contig_range() stop early, trying to isolate such page blocks - >> compared to failing later when testing if all pages were isolated. >> >> Also, make sure that when a reference to a PageOffline() page is >> dropped, that the page will not be returned to the buddy. >> >> memory devices (like virtio-mem) that want to make use of this >> functionality have to make sure to synchronize against memory offlining, >> using the memory hotplug notifier. >> >> Alternative: Allow to offline with a reference count of 1 >> and use some other sign in the struct page that offlining is permitted. > > Few questions. I do not see onlining code to take care of this special > case. What should happen when offline && online? > Should we allow to try_remove_memory to succeed with these pages? > Do we really have hook into __put_page? Why do we even care about the > reference count of those pages?Oh, I forgot to answer this questions. The __put_page() change is necessary for the following race I identified: Page has a refcount of 1 (e.g., allocated by virtio-mem using alloc_contig_range()). a) kernel: get_page_unless_zero(page): refcount = 2 b) virtio-mem: set page PG_offline, reduce refcount): refocunt = 1 c) kernel: put_page(page): refcount = 0 The page would suddenly be given to the buddy. which is bad. -- Thanks, David / dhildenb
Michal Hocko
2019-Oct-16 13:45 UTC
[PATCH RFC v3 6/9] mm: Allow to offline PageOffline() pages with a reference count of 0
On Wed 16-10-19 14:50:30, David Hildenbrand wrote:> On 16.10.19 13:43, Michal Hocko wrote: > > On Thu 19-09-19 16:22:25, David Hildenbrand wrote: > > > virtio-mem wants to allow to offline memory blocks of which some parts > > > were unplugged, especially, to later offline and remove completely > > > unplugged memory blocks. The important part is that PageOffline() has > > > to remain set until the section is offline, so these pages will never > > > get accessed (e.g., when dumping). The pages should not be handed > > > back to the buddy (which would require clearing PageOffline() and > > > result in issues if offlining fails and the pages are suddenly in the > > > buddy). > > > > > > Let's use "PageOffline() + reference count = 0" as a sign to > > > memory offlining code that these pages can simply be skipped when > > > offlining, similar to free or HWPoison pages. > > > > > > Pass flags to test_pages_isolated(), similar as already done for > > > has_unmovable_pages(). Use a new flag to indicate the > > > requirement of memory offlining to skip over these special pages. > > > > > > In has_unmovable_pages(), make sure the pages won't be detected as > > > movable. This is not strictly necessary, however makes e.g., > > > alloc_contig_range() stop early, trying to isolate such page blocks - > > > compared to failing later when testing if all pages were isolated. > > > > > > Also, make sure that when a reference to a PageOffline() page is > > > dropped, that the page will not be returned to the buddy. > > > > > > memory devices (like virtio-mem) that want to make use of this > > > functionality have to make sure to synchronize against memory offlining, > > > using the memory hotplug notifier. > > > > > > Alternative: Allow to offline with a reference count of 1 > > > and use some other sign in the struct page that offlining is permitted. > > > > Few questions. I do not see onlining code to take care of this special > > case. What should happen when offline && online? > > Once offline, the memmap is garbage. When onlining again: > > a) memmap will be re-initialized > b) online_page_callback_t will be called for every page in the section. The > driver can mark them offline again and not give them to the buddy. > c) section will be marked online.But we can skip those pages when onlining and keep them in the offline state right? We do not poison offlined pages. There is state stored in the struct page. In other words this shouldn't be really different from HWPoison pages. I cannot find the code that is doing that and maybe we don't handle that. But we cannot simply online hwpoisoned page. Offlining the range will not make a broken memory OK all of the sudden. And your usecase sounds similar to me.> The driver that marked these pages to be skipped when offlining is > responsible for registering the online_page_callback_t callback where these > pages will get excluded. > > This is exactly the same as when onling a memory block that is partially > populated (e.g., HpyerV balloon right now). > > So it's effectively "re-initializing the memmap using the driver knowledge" > when onlining.I am not sure I follow. So you exclude those pages when onlining?> > Should we allow to try_remove_memory to succeed with these pages? > > I think we should first properly offline them (mark sections offline and > memory blocks, fixup numbers, shrink zones ...). The we can cleanly remove > the memory. (see [PATCH RFC v3 8/9] mm/memory_hotplug: Introduce > offline_and_remove_memory())I will have a look, but just to quick question. try_remove_memory would fail if the range is offline (via user interface) but there are still some pages in the driver Offline state?> Once offline, the memmap is irrelevant and try_remove_memory() can do its > job. > > > Do we really have hook into __put_page? Why do we even care about the > > reference count of those pages? Wouldn't it be just more consistent to > > elevate the reference count (I guess this is what you suggest in the > > last paragraph) and the virtio driver would return that page to the > > buddy by regular put_page. This is also related to the above question > > about the physical memory remove. > > Returning them to the buddy is problematic for various reasons. Let's have a > look at __offline_pages(): > > 1) start_isolate_page_range() > -> offline pages with a reference count of one will be detected as unmovable > -> BAD, we abort right away. We could hack around that. > > 2) memory_notify(MEM_GOING_OFFLINE, &arg); > -> Here, we could release all pages to the buddy, clearing PG_offline > -> BAD, PF_offline must not be cleared so dumping tools will not touch > these pages. I don't see a way to hack around that. > > 3) scan_movable_pages() ... > > 4a) memory_notify(MEM_OFFLINE, &arg); > > Perfect, it worked. Sections are offline. > > 4b) undo_isolate_page_range(start_pfn, end_pfn, MIGRATE_MOVABLE); > memory_notify(MEM_CANCEL_OFFLINE, &arg); > > -> Offlining failed for whatever reason. > -> Pages are in the buddy, but we already un-isolated them. BAD. > > By not going via the buddy we avoid these issues and can leave PG_offline > set until the section is fully offline. Something that is very desirable for > virtio-mem (and as far as I can tell also HyperV in the future).I am not sure I follow. Maybe my original question was confusing. Let me ask again. Why do we need to hook into __put_page?> > [...] > > > diff --git a/mm/page_alloc.c b/mm/page_alloc.c > > > index d5d7944954b3..fef74720d8b4 100644 > > > --- a/mm/page_alloc.c > > > +++ b/mm/page_alloc.c > > > @@ -8221,6 +8221,15 @@ bool has_unmovable_pages(struct zone *zone, struct page *page, int count, > > > if (!page_ref_count(page)) { > > > if (PageBuddy(page)) > > > iter += (1 << page_order(page)) - 1; > > > + /* > > > + * Memory devices allow to offline a page if it is > > > + * marked PG_offline and has a reference count of 0. > > > + * However, the pages are not movable as it would be > > > + * required e.g., for alloc_contig_range(). > > > + */ > > > + if (PageOffline(page) && !(flags & SKIP_OFFLINE)) > > > + if (++found > count) > > > + goto unmovable; > > > continue; > > > } > > > > Do we really need to distinguish offline and hwpoison pages? They are > > both unmovable for allocator purposes and offlineable for the hotplug, > > right? Should we just hide them behind a helper and use it rather than > > an explicit SKIP_$FOO? > > Makes sense. It really boils down to "offline" vs. "allocate" use cases. > > So maybe instead of "SKIP_FOO" something like "MEMORY_OFFLINE". ?Yes, that would be a better fit. -- Michal Hocko SUSE Labs
Michal Hocko
2019-Oct-16 14:03 UTC
[PATCH RFC v3 6/9] mm: Allow to offline PageOffline() pages with a reference count of 0
On Wed 16-10-19 15:45:06, David Hildenbrand wrote:> On 16.10.19 13:43, Michal Hocko wrote: > > On Thu 19-09-19 16:22:25, David Hildenbrand wrote: > > > virtio-mem wants to allow to offline memory blocks of which some parts > > > were unplugged, especially, to later offline and remove completely > > > unplugged memory blocks. The important part is that PageOffline() has > > > to remain set until the section is offline, so these pages will never > > > get accessed (e.g., when dumping). The pages should not be handed > > > back to the buddy (which would require clearing PageOffline() and > > > result in issues if offlining fails and the pages are suddenly in the > > > buddy). > > > > > > Let's use "PageOffline() + reference count = 0" as a sign to > > > memory offlining code that these pages can simply be skipped when > > > offlining, similar to free or HWPoison pages. > > > > > > Pass flags to test_pages_isolated(), similar as already done for > > > has_unmovable_pages(). Use a new flag to indicate the > > > requirement of memory offlining to skip over these special pages. > > > > > > In has_unmovable_pages(), make sure the pages won't be detected as > > > movable. This is not strictly necessary, however makes e.g., > > > alloc_contig_range() stop early, trying to isolate such page blocks - > > > compared to failing later when testing if all pages were isolated. > > > > > > Also, make sure that when a reference to a PageOffline() page is > > > dropped, that the page will not be returned to the buddy. > > > > > > memory devices (like virtio-mem) that want to make use of this > > > functionality have to make sure to synchronize against memory offlining, > > > using the memory hotplug notifier. > > > > > > Alternative: Allow to offline with a reference count of 1 > > > and use some other sign in the struct page that offlining is permitted. > > > > Few questions. I do not see onlining code to take care of this special > > case. What should happen when offline && online? > > Should we allow to try_remove_memory to succeed with these pages? > > Do we really have hook into __put_page? Why do we even care about the > > reference count of those pages? > > Oh, I forgot to answer this questions. The __put_page() change is necessary > for the following race I identified: > > Page has a refcount of 1 (e.g., allocated by virtio-mem using > alloc_contig_range()). > > a) kernel: get_page_unless_zero(page): refcount = 2 > b) virtio-mem: set page PG_offline, reduce refcount): refocunt = 1 > c) kernel: put_page(page): refcount = 0 > > The page would suddenly be given to the buddy. which is bad.But why cannot you keep the reference count at 1 (do get_page when offlining the page)? In other words as long as the driver knows the page has been returned to the host then it has ref count at 1. Once the page is returned to the guest for whatever reason it can free it to the system by clearing the offline state and put_page. An elevated ref count could help to detect that the memory hotremove is not safe until the driver removes all potential metadata it might still hold. You also know that memory online should skip such a page. All in all your page is still in use by the driver and the life cycle is controlled by that driver. Or am I am missing something? -- Michal Hocko SUSE Labs
Reasonably Related Threads
- [PATCH RFC v3 6/9] mm: Allow to offline PageOffline() pages with a reference count of 0
- [PATCH RFC v3 6/9] mm: Allow to offline PageOffline() pages with a reference count of 0
- [PATCH RFC v3 6/9] mm: Allow to offline PageOffline() pages with a reference count of 0
- [PATCH RFC v3 6/9] mm: Allow to offline PageOffline() pages with a reference count of 0
- [PATCH RFC v3 6/9] mm: Allow to offline PageOffline() pages with a reference count of 0