thr3ads.net - Virtualization - [PATCH v12 6/8] mm: support reporting free page blocks [Jul 2017]


Michal Hocko

2017-Jul-25 11:25 UTC

[PATCH v12 6/8] mm: support reporting free page blocks

On Tue 25-07-17 17:32:00, Wei Wang wrote:
> On 07/24/2017 05:00 PM, Michal Hocko wrote:
> >On Wed 19-07-17 20:01:18, Wei Wang wrote:
> >>On 07/19/2017 04:13 PM, Michal Hocko wrote:
> >[...]
> >>>All you should need is the check for the page reference count, no?  I
> >>>assume you do some sort of pfn walk and so you should be able to get an
> >>>access to the struct page.
> >>Not necessarily - the guest struct page is not seen by the hypervisor. The
> >>hypervisor only gets those guest pfns which are hinted as unused. From the
> >>hypervisor (host) point of view, a guest physical address corresponds to a
> >>virtual address of a host process. So, once the hypervisor knows a guest
> >>physical page is unused, it knows that the corresponding virtual memory of
> >>the process doesn't need to be transferred in the 1st round.
> >I am sorry, but I do not understand. Why cannot _guest_ simply check the
> >struct page ref count and send them to the hypervisor?
> 
> Were you suggesting the following?
> 1) get a free page block from the page list using the API;

No. Use a pfn walk, check the reference count and skip those pages which
have 0 ref count. I suspected that you need to do some sort of the pfn
walk anyway because you somehow have to evaluate a memory to migrate,
right?

> 2) if page->ref_count == 0, send it to the hypervisor

yes

> Btw, ref_count may also change at any time.
> 
> >Is there any
> >documentation which describes the workflow or code which would use your
> >new API?
> >
> 
> It's used in the balloon driver (patch 8). We don't have any docs yet, but
> I think the high level workflow is the two steps above.

I will have a look.
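
The pfn walk with a reference count check discussed in this message can be modeled in userspace roughly as follows. This is only a sketch under stated assumptions: `struct toy_page`, `hint_unused_pages` and `record_hint` are hypothetical names for illustration, not kernel APIs, and the model ignores the point raised above that the reference count may change at any time.

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-in for struct page: only the reference count is modeled. */
struct toy_page {
    int ref_count;
};

/*
 * Walk pfns [0, nr_pages) and pass every page whose reference count
 * is 0 to report(), which stands for the hint sent to the hypervisor.
 * Returns the number of pages hinted.
 */
size_t hint_unused_pages(const struct toy_page *pages, size_t nr_pages,
                         void (*report)(unsigned long pfn))
{
    size_t hinted = 0;

    for (size_t pfn = 0; pfn < nr_pages; pfn++) {
        if (pages[pfn].ref_count != 0)
            continue;   /* page is in use: no hint, it gets migrated */
        report((unsigned long)pfn);
        hinted++;
    }
    return hinted;
}

/* Hypothetical callback that just remembers the last hinted pfn. */
unsigned long last_hinted_pfn;

void record_hint(unsigned long pfn)
{
    last_hinted_pfn = pfn;
}
```

The cost of this approach is one visit per pfn regardless of how much memory is free, which is the trade-off the rest of the thread discusses.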
-- 
Michal Hocko
SUSE Labs

Wei Wang

2017-Jul-25 11:56 UTC


[PATCH v12 6/8] mm: support reporting free page blocks

On 07/25/2017 07:25 PM, Michal Hocko wrote:
> On Tue 25-07-17 17:32:00, Wei Wang wrote:
>> On 07/24/2017 05:00 PM, Michal Hocko wrote:
>>> On Wed 19-07-17 20:01:18, Wei Wang wrote:
>>>> On 07/19/2017 04:13 PM, Michal Hocko wrote:
>>> [...]
>>>>> All you should need is the check for the page reference count, no?  I
>>>>> assume you do some sort of pfn walk and so you should be able to get an
>>>>> access to the struct page.
>>>> Not necessarily - the guest struct page is not seen by the hypervisor. The
>>>> hypervisor only gets those guest pfns which are hinted as unused. From the
>>>> hypervisor (host) point of view, a guest physical address corresponds to a
>>>> virtual address of a host process. So, once the hypervisor knows a guest
>>>> physical page is unused, it knows that the corresponding virtual memory of
>>>> the process doesn't need to be transferred in the 1st round.
>>> I am sorry, but I do not understand. Why cannot _guest_ simply check the
>>> struct page ref count and send them to the hypervisor?
>> Were you suggesting the following?
>> 1) get a free page block from the page list using the API;
> No. Use a pfn walk, check the reference count and skip those pages which
> have 0 ref count.

"pfn walk" - do you mean start from the first pfn, and scan all the pfns
that the VM has?

> I suspected that you need to do some sort of the pfn
> walk anyway because you somehow have to evaluate a memory to migrate,
> right?

We don't need to do the pfn walk in the guest kernel. When the API
reports, for example, a 2MB free page block, the API caller offers the
base address of the page block, and size=2MB, to the hypervisor.

The hypervisor maintains a bitmap of all the guest physical memory (a
bit corresponds to a guest pfn). When migrating memory, only the pfns
that are set in the bitmap are transferred to the destination machine.
So, when the hypervisor receives a 2MB free page block, the
corresponding bits in the bitmap are cleared.
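
The bitmap handling described above can be sketched in userspace C as below. This is a simplified model, not the hypervisor's actual code: the 4 KiB page size, the 16 MiB guest size, and the names `migrate_bitmap`, `bitmap_fill`, `report_free_block`, `pfn_needs_transfer` and `pages_to_transfer` are all assumptions made for illustration.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>

#define PAGE_SHIFT    12                      /* assumed 4 KiB guest pages */
#define GUEST_PAGES   4096UL                  /* hypothetical 16 MiB guest */
#define BITS_PER_WORD (sizeof(unsigned long) * CHAR_BIT)

/* One bit per guest pfn; a set bit means "transfer this page". */
static unsigned long migrate_bitmap[GUEST_PAGES / BITS_PER_WORD];

/* Start of migration: every guest page is marked for transfer. */
void bitmap_fill(void)
{
    for (size_t i = 0; i < GUEST_PAGES / BITS_PER_WORD; i++)
        migrate_bitmap[i] = ~0UL;
}

bool pfn_needs_transfer(unsigned long pfn)
{
    assert(pfn < GUEST_PAGES);
    return (migrate_bitmap[pfn / BITS_PER_WORD] >> (pfn % BITS_PER_WORD)) & 1UL;
}

/* Handle one hint: guest physical base address plus size in bytes. */
void report_free_block(unsigned long base_addr, unsigned long size)
{
    unsigned long first = base_addr >> PAGE_SHIFT;
    unsigned long count = size >> PAGE_SHIFT;

    assert(first + count <= GUEST_PAGES);
    for (unsigned long pfn = first; pfn < first + count; pfn++)
        migrate_bitmap[pfn / BITS_PER_WORD] &= ~(1UL << (pfn % BITS_PER_WORD));
}

/* Count the pages that would still be sent to the destination. */
unsigned long pages_to_transfer(void)
{
    unsigned long n = 0;

    for (unsigned long pfn = 0; pfn < GUEST_PAGES; pfn++)
        n += pfn_needs_transfer(pfn);
    return n;
}
```

With these assumptions, calling `bitmap_fill()` and then `report_free_block(0x200000UL, 0x200000UL)` clears the 512 bits for pfns 512 to 1023, so one 2MB hint removes 512 pages from the first transfer round.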

Best,
Wei

Michal Hocko

2017-Jul-25 12:41 UTC


[PATCH v12 6/8] mm: support reporting free page blocks

On Tue 25-07-17 19:56:24, Wei Wang wrote:
> On 07/25/2017 07:25 PM, Michal Hocko wrote:
> >On Tue 25-07-17 17:32:00, Wei Wang wrote:
> >>On 07/24/2017 05:00 PM, Michal Hocko wrote:
> >>>On Wed 19-07-17 20:01:18, Wei Wang wrote:
> >>>>On 07/19/2017 04:13 PM, Michal Hocko wrote:
> >>>[...]
> >>>>>All you should need is the check for the page reference count, no?  I
> >>>>>assume you do some sort of pfn walk and so you should be able to get an
> >>>>>access to the struct page.
> >>>>Not necessarily - the guest struct page is not seen by the hypervisor. The
> >>>>hypervisor only gets those guest pfns which are hinted as unused. From the
> >>>>hypervisor (host) point of view, a guest physical address corresponds to a
> >>>>virtual address of a host process. So, once the hypervisor knows a guest
> >>>>physical page is unused, it knows that the corresponding virtual memory of
> >>>>the process doesn't need to be transferred in the 1st round.
> >>>I am sorry, but I do not understand. Why cannot _guest_ simply check the
> >>>struct page ref count and send them to the hypervisor?
> >>Were you suggesting the following?
> >>1) get a free page block from the page list using the API;
> >No. Use a pfn walk, check the reference count and skip those pages which
> >have 0 ref count.
>
> "pfn walk" - do you mean start from the first pfn, and scan all the pfns
> that the VM has?

yes

> >I suspected that you need to do some sort of the pfn
> >walk anyway because you somehow have to evaluate a memory to migrate,
> >right?
> 
> We don't need to do the pfn walk in the guest kernel. When the API
> reports, for example, a 2MB free page block, the API caller offers to
> the hypervisor the base address of the page block, and size=2MB, to
> the hypervisor.

So you want to skip pfn walks by regularly calling into the page
allocator to update your bitmap. If that is the case then would an API
that would allow you to update your bitmap via a callback be
sufficient? Something like

	void walk_free_mem(int node, int min_order,
			void (*visit)(unsigned long pfn, unsigned long nr_pages));

The function will call the given callback for each free memory block on
the given node starting from the given min_order. The callback will run
in a strictly atomic context and must be very light. You can update your
bitmap from there.

This would address my main concern that the allocator internals would
get outside of the allocator proper. A nasty callback which would be too
expensive could still stall other allocations and cause latencies but
the locking will be under mm core control at least.
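
To make the proposed callback interface concrete, here is a userspace toy model of it. Only the `walk_free_mem` signature comes from this mail; everything else (`struct toy_free_area`, `toy_node`, `toy_add_free_block`, `count_free`, the fixed capacities, and the assumption of orders 0 to 10) is hypothetical. The model has a single node and no locking, whereas the real function would presumably take the allocator's lock around the walk.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_ORDER  11   /* assumed: block orders 0..10 */
#define MAX_BLOCKS 8    /* capacity of this toy model only */

/* Toy stand-in for one order's free list: start pfns of free blocks. */
struct toy_free_area {
    unsigned long start_pfn[MAX_BLOCKS];
    size_t nr_free;
};

static struct toy_free_area toy_node[MAX_ORDER];

/* Record a free block of 2^order pages in the toy model. */
void toy_add_free_block(int order, unsigned long pfn)
{
    struct toy_free_area *area;

    assert(order >= 0 && order < MAX_ORDER);
    area = &toy_node[order];
    assert(area->nr_free < MAX_BLOCKS);
    area->start_pfn[area->nr_free++] = pfn;
}

/* Call visit() for each free block of order >= min_order. */
void walk_free_mem(int node, int min_order,
                   void (*visit)(unsigned long pfn, unsigned long nr_pages))
{
    (void)node;         /* single-node model: the node id is ignored */
    if (min_order < 0)
        min_order = 0;
    for (int order = min_order; order < MAX_ORDER; order++) {
        const struct toy_free_area *area = &toy_node[order];

        for (size_t i = 0; i < area->nr_free; i++)
            visit(area->start_pfn[i], 1UL << order);
    }
}

/* Example callback: cheap bookkeeping only, no sleeping, no allocation. */
unsigned long visited_blocks, visited_pages;

void count_free(unsigned long pfn, unsigned long nr_pages)
{
    (void)pfn;
    visited_blocks++;
    visited_pages += nr_pages;
}
```

A caller such as the balloon driver would pass a callback that records each (pfn, nr_pages) pair for the hypervisor instead of `count_free`; keeping the free lists private to `walk_free_mem` is what keeps the allocator internals out of the caller.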

Does that sound useful?
-- 
Michal Hocko
SUSE Labs
