thr3ads.net - Virtualization - [PATCH v12 6/8] mm: support reporting free page blocks [Jul 2017]

If this information is useful, please help other people find it:
Share via:

Michal Hocko

2017-Jul-25 14:53 UTC

[PATCH v12 6/8] mm: support reporting free page blocks

On Tue 25-07-17 14:47:16, Wang, Wei W wrote:> On Tuesday, July 25, 2017 8:42 PM, hal Hocko wrote:
> > On Tue 25-07-17 19:56:24, Wei Wang wrote:
> > > On 07/25/2017 07:25 PM, Michal Hocko wrote:
> > > >On Tue 25-07-17 17:32:00, Wei Wang wrote:
> > > >>On 07/24/2017 05:00 PM, Michal Hocko wrote:
> > > >>>On Wed 19-07-17 20:01:18, Wei Wang wrote:
> > > >>>>On 07/19/2017 04:13 PM, Michal Hocko wrote:
> > > >>>[...
> > > We don't need to do the pfn walk in the guest kernel. When
the API
> > > reports, for example, a 2MB free page block, the API caller
offers to
> > > the hypervisor the base address of the page block, and size=2MB,
to
> > > the hypervisor.
> > 
> > So you want to skip pfn walks by regularly calling into the page
allocator to
> > update your bitmap. If that is the case then would an API that would
allow you
> > to update your bitmap via a callback be s sufficient? Something like
> > 	void walk_free_mem(int node, int min_order,
> > 			void (*visit)(unsigned long pfn, unsigned long nr_pages))
> > 
> > The function will call the given callback for each free memory block
on the given
> > node starting from the given min_order. The callback will be strictly
an atomic
> > and very light context. You can update your bitmap from there.
> 
> I would need to introduce more about the background here:
> The hypervisor and the guest live in their own address space. The
hypervisor's bitmap
> isn't seen by the guest. I think we also wouldn't be able to give a
callback function
> from the hypervisor to the guest in this case.
How did you plan to use your original API which export struct page array
then?
> > This would address my main concern that the allocator internals would
get
> > outside of the allocator proper. 
> 
> What issue would it have to expose the internal, for_each_zone()?
zone is a MM internal concept. No code outside of the MM proper should
really care about zones. 
-- 
Michal Hocko
SUSE Labs

Wei Wang

2017-Jul-26 02:22 UTC

head link

[PATCH v12 6/8] mm: support reporting free page blocks

On 07/25/2017 10:53 PM, Michal Hocko wrote:> On Tue 25-07-17 14:47:16, Wang, Wei W wrote:
>> On Tuesday, July 25, 2017 8:42 PM, hal Hocko wrote:
>>> On Tue 25-07-17 19:56:24, Wei Wang wrote:
>>>> On 07/25/2017 07:25 PM, Michal Hocko wrote:
>>>>> On Tue 25-07-17 17:32:00, Wei Wang wrote:
>>>>>> On 07/24/2017 05:00 PM, Michal Hocko wrote:
>>>>>>> On Wed 19-07-17 20:01:18, Wei Wang wrote:
>>>>>>>> On 07/19/2017 04:13 PM, Michal Hocko wrote:
>>>>>>> [...
>>>> We don't need to do the pfn walk in the guest kernel. When
the API
>>>> reports, for example, a 2MB free page block, the API caller
offers to
>>>> the hypervisor the base address of the page block, and
size=2MB, to
>>>> the hypervisor.
>>> So you want to skip pfn walks by regularly calling into the page
allocator to
>>> update your bitmap. If that is the case then would an API that
would allow you
>>> to update your bitmap via a callback be s sufficient? Something
like
>>> 	void walk_free_mem(int node, int min_order,
>>> 			void (*visit)(unsigned long pfn, unsigned long nr_pages))
>>>
>>> The function will call the given callback for each free memory
block on the given
>>> node starting from the given min_order. The callback will be
strictly an atomic
>>> and very light context. You can update your bitmap from there.
>> I would need to introduce more about the background here:
>> The hypervisor and the guest live in their own address space. The
hypervisor's bitmap
>> isn't seen by the guest. I think we also wouldn't be able to
give a callback function
>> from the hypervisor to the guest in this case.
> How did you plan to use your original API which export struct page array
> then?

That's where the virtio-balloon driver comes in. It uses a shared ring 
mechanism to
send the guest memory info to the hypervisor.

We didn't expose the struct page array from the guest to the hypervisor. 
For example, when
a 2MB free page block is reported from the free page list, the info put 
on the ring is just
(base address of the 2MB continuous memory, size=2M).

>
>>> This would address my main concern that the allocator internals
would get
>>> outside of the allocator proper.
>> What issue would it have to expose the internal, for_each_zone()?
> zone is a MM internal concept. No code outside of the MM proper should
> really care about zones.
I think this is also what Andrew suggested in the previous discussion:
https://lkml.org/lkml/2017/3/16/951

Move the code to virtio-balloon and a little layering violation seems 
acceptable.


Best,
Wei

Michal Hocko

2017-Jul-26 10:24 UTC

head link

[PATCH v12 6/8] mm: support reporting free page blocks

On Wed 26-07-17 10:22:23, Wei Wang wrote:> On 07/25/2017 10:53 PM, Michal Hocko wrote:
> >On Tue 25-07-17 14:47:16, Wang, Wei W wrote:
> >>On Tuesday, July 25, 2017 8:42 PM, hal Hocko wrote:
> >>>On Tue 25-07-17 19:56:24, Wei Wang wrote:
> >>>>On 07/25/2017 07:25 PM, Michal Hocko wrote:
> >>>>>On Tue 25-07-17 17:32:00, Wei Wang wrote:
> >>>>>>On 07/24/2017 05:00 PM, Michal Hocko wrote:
> >>>>>>>On Wed 19-07-17 20:01:18, Wei Wang wrote:
> >>>>>>>>On 07/19/2017 04:13 PM, Michal Hocko wrote:
> >>>>>>>[...
> >>>>We don't need to do the pfn walk in the guest kernel.
When the API
> >>>>reports, for example, a 2MB free page block, the API caller
offers to
> >>>>the hypervisor the base address of the page block, and
size=2MB, to
> >>>>the hypervisor.
> >>>So you want to skip pfn walks by regularly calling into the
page allocator to
> >>>update your bitmap. If that is the case then would an API that
would allow you
> >>>to update your bitmap via a callback be s sufficient? Something
like
> >>>	void walk_free_mem(int node, int min_order,
> >>>			void (*visit)(unsigned long pfn, unsigned long nr_pages))
> >>>
> >>>The function will call the given callback for each free memory
block on the given
> >>>node starting from the given min_order. The callback will be
strictly an atomic
> >>>and very light context. You can update your bitmap from there.
> >>I would need to introduce more about the background here:
> >>The hypervisor and the guest live in their own address space. The
hypervisor's bitmap
> >>isn't seen by the guest. I think we also wouldn't be able
to give a callback function
> >>from the hypervisor to the guest in this case.
> >How did you plan to use your original API which export struct page
array
> >then?
> 
> 
> That's where the virtio-balloon driver comes in. It uses a shared ring
> mechanism to
> send the guest memory info to the hypervisor.
> 
> We didn't expose the struct page array from the guest to the
hypervisor. For
> example, when
> a 2MB free page block is reported from the free page list, the info put on
> the ring is just
> (base address of the 2MB continuous memory, size=2M).
So what exactly prevents virtio-balloon from using the above proposed
callback mechanism and export what is needed to the hypervisor?
 > >>>This would address my main concern that the allocator internals
would get
> >>>outside of the allocator proper.
> >>What issue would it have to expose the internal, for_each_zone()?
> >zone is a MM internal concept. No code outside of the MM proper should
> >really care about zones.
> 
> I think this is also what Andrew suggested in the previous discussion:
> https://lkml.org/lkml/2017/3/16/951
> 
> Move the code to virtio-balloon and a little layering violation seems
> acceptable.
Andrew didn't suggest to expose zones to modules. If I read his words
properly he said that a functionality which "provides a snapshot of the
present system unused pages" is just too specific for virtio usecase
to be a generic function and as such it should be in virtio. I tend
to agree. Even the proposed callback API is a layer violation. We
should just make sure that the API is not inherently dangerous. That
is why I have chosen to give only pfn and nr_pages to the caller. Sure
somebody could argue that the caller could do pfn_to_page and do nasty
things. That would be a fair argument but nothing really prevents
anybody to do th pfn walk already so there shouldn't inherently more
risk. We can document what we expect from the API user and that would be
much easier to describe than struct page API IMHO.
-- 
Michal Hocko
SUSE Labs

Seemingly Similar Threads

Search for more possibly parallel threads

Virtualization - Jul 2017 - [PATCH v12 6/8] mm: support reporting free page blocks

[PATCH v12 6/8] mm: support reporting free page blocks

[PATCH v12 6/8] mm: support reporting free page blocks

[PATCH v12 6/8] mm: support reporting free page blocks

Seemingly Similar Threads