thr3ads.net - Virtualization - VIRTIO_BALLOON_F_FREE_PAGE

Wei Wang

2019-Sep-16 01:41 UTC

VIRTIO_BALLOON_F_FREE_PAGE_HINT

On 09/14/2019 02:36 AM, Tyler Sanderson wrote:
> Hello, I'm curious about the intent of VIRTIO_BALLOON_F_FREE_PAGE_HINT
> (commit
> <https://github.com/torvalds/linux/commit/86a559787e6f5cf662c081363f64a20cad654195#diff-fd202acf694d9eba19c8c64da3e480c9>).
>
> My understanding is that this mechanism works similarly to the 
> existing inflate/deflate queues. Pages are allocated by the guest and 
> then reported on VQ_FREE_PAGE.
>
> Question: Is there a limit to how many pages will be allocated? What 
> controls the amount of memory pressure applied?
No control for the limit currently. The implementation reports all the
guest free pages to host.
The main usage for this feature so far is to have guest skip sending
those guest free pages (the more, the better) during live migration.
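
As a rough illustration of the live-migration use described above, here is a minimal Python sketch. It is a toy model, not QEMU or kernel code: `pages_to_send` and both page sets are hypothetical names invented for this example. It only shows the idea that pages hinted as free can be dropped from the set that would otherwise be transferred.

```python
def pages_to_send(dirty_pfns, free_hint_pfns):
    """Return the page frame numbers (PFNs) that still have to be transferred.

    dirty_pfns:     PFNs currently marked for sending (initially all guest pages).
    free_hint_pfns: PFNs the guest reported as free (hypothetical hint set).
    """
    return sorted(set(dirty_pfns) - set(free_hint_pfns))


# Toy guest with 8 pages, 5 of which the guest reports as free.
all_pages = range(8)
hinted_free = {1, 2, 4, 6, 7}
print(pages_to_send(all_pages, hinted_free))  # [0, 3, 5]
```

The more pages are hinted, the shorter the list gets. The sketch ignores one thing a real implementation would have to handle: a hinted page that the guest starts using again afterwards would need to be sent after all.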

>
> In my experience with virtio balloon there are problems with the 
> mechanisms that are supposed to deflate the balloon in response to 
> memory pressure (e.g. OOM notifier).
What problem did you see? We've also changed balloon to use memory shrinker,
did you see the problem with shrinker as well?
>
> It seems an ideal balloon interface would allow the guest to round 
> robin through free guest physical pages, allowing the host to unback 
> them, but never having more than a few pages allocated to the balloon 
> at any one time. For example:
> 1. Guest allocates 1 page and notifies balloon device of this page's 
> address.
> 2. Host debacks the received page.
> 3. Guest frees the page.
> 4. Repeat at #1, but ensure that different pages are allocated each time.
Probably you need a mechanism to "ensure" different pages to be allocated.
The current implementation (having balloon hold the allocated pages) could
be thought of as one mechanism (it is simple).
>
> This way the "balloon size" is never more than a few pages and does
> not create memory pressure. However the difficulty is in ensuring each 
> set of sent pages is disjoint from previously sent pages. Is there a 
> mechanism to round-robin allocations through all of guest physical 
> memory? Does VIRTIO_BALLOON_F_FREE_PAGE_HINT enable this?
>
AFAIK, no such round-robin allocation so far. This may need the page
allocation to have states recording the allocation history.

Best,
Wei
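
The difficulty discussed in this message (steps 1 to 4 of the proposal, and the need for "states recording the allocation history") can be sketched with a toy model. Everything here is an assumption made for illustration: `ToyAllocator` is a hypothetical LIFO free list, not the Linux page allocator, and `seen` stands in for whatever history state a real design would keep.

```python
class ToyAllocator:
    """Hypothetical LIFO free list: the most recently freed page is reused first."""

    def __init__(self, npages):
        self.free = list(range(npages))

    def alloc(self):
        return self.free.pop()

    def release(self, pfn):
        self.free.append(pfn)


def naive_round(allocator, rounds):
    """Steps 1-3 of the proposal with no history: allocate, report, free."""
    reported = []
    for _ in range(rounds):
        pfn = allocator.alloc()
        reported.append(pfn)      # stands in for "notify the balloon device"
        allocator.release(pfn)    # page goes straight back on top of the list
    return reported


def history_round(allocator, rounds):
    """Same loop, but remember which pages were already reported."""
    seen = set()
    reported = []
    for _ in range(rounds):
        held = []                 # already-reported pages held back temporarily
        pfn = None
        while allocator.free:
            candidate = allocator.alloc()
            if candidate not in seen:
                pfn = candidate
                break
            held.append(candidate)
        if pfn is not None:
            seen.add(pfn)
            reported.append(pfn)
            allocator.release(pfn)
        for h in held:
            allocator.release(h)
        if pfn is None:           # every free page has been reported once
            break
    return reported


print(naive_round(ToyAllocator(4), 3))    # [3, 3, 3]
print(history_round(ToyAllocator(4), 3))  # [3, 2, 1]
```

Without history the toy allocator hands out the same page every time. With history the loop makes progress, but only by temporarily holding the pages it has already seen, which is a small version of the balloon holding its allocated pages.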

David Hildenbrand

2019-Sep-16 07:26 UTC

On 16.09.19 03:41, Wei Wang wrote:
> On 09/14/2019 02:36 AM, Tyler Sanderson wrote:
>> Hello, I'm curious about the intent of VIRTIO_BALLOON_F_FREE_PAGE_HINT
>> (commit
>> <https://github.com/torvalds/linux/commit/86a559787e6f5cf662c081363f64a20cad654195#diff-fd202acf694d9eba19c8c64da3e480c9>).
>>
>> My understanding is that this mechanism works similarly to the 
>> existing inflate/deflate queues. Pages are allocated by the guest and 
>> then reported on VQ_FREE_PAGE.
>>
>> Question: Is there a limit to how many pages will be allocated? What 
>> controls the amount of memory pressure applied?
> 
> No control for the limit currently. The implementation reports all the 
> guest free pages to host.
> The main usage for this feature so far is to have guest skip sending 
> those guest free pages
> (the more, the better) during live migration.
> 
> 
>>
>> In my experience with virtio balloon there are problems with the 
>> mechanisms that are supposed to deflate the balloon in response to 
>> memory pressure (e.g. OOM notifier).
> 
> What problem did you see? We've also changed balloon to use memory shrinker,
> did you see the problem with shrinker as well?
> 
>>
>> It seems an ideal balloon interface would allow the guest to round 
>> robin through free guest physical pages, allowing the host to unback 
>> them, but never having more than a few pages allocated to the balloon 
>> at any one time. For example:
>> 1. Guest allocates 1 page and notifies balloon device of this page's
>> address.
>> 2. Host debacks the received page.
>> 3. Guest frees the page.
>> 4. Repeat at #1, but ensure that different pages are allocated each time.
>
> Probably you need a mechanism to "ensure" different pages to be allocated.
> The current implementation (having balloon hold the allocated pages) could
> be thought of as one mechanism (it is simple).
> 
>>
>> This way the "balloon size" is never more than a few pages and does
>> not create memory pressure. However the difficulty is in ensuring each
>> set of sent pages is disjoint from previously sent pages. Is there a
>> mechanism to round-robin allocations through all of guest physical
>> memory? Does VIRTIO_BALLOON_F_FREE_PAGE_HINT enable this?

There are use cases where you really want memory pressure (page cache is
the prime example). Anyhow, I think the use case you want the
"round-robin allocations" for is better tackled by "free page reporting"
(used to be called "free page hinting") currently discussed on various
lists.

"allowing the host to unback them, but never having more than a few
pages allocated to the balloon at any one time." is similar to what
"free page reporting" does. We decided to only report bigger pages
(avoid splitting up THP in the hypervisor, overhead) and only
temporarily pull out a fixed amount of pages (16) from the page
allocator to avoid false-OOM. Guaranteeing forward progress (similar to
what you describe) is one important key concept.

-- 

Thanks,

David / dhildenb
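
The batching idea in this message (report only bigger blocks, pull out a fixed number at a time, keep making progress) can be sketched as a toy loop. This is not the free page reporting code under discussion: the dictionary layout, the `report` callback and `MIN_ORDER = 9` are assumptions for illustration. Only the batch size of 16 comes from the thread.

```python
BATCH = 16      # fixed number of blocks pulled out at once (figure from the thread)
MIN_ORDER = 9   # assumption: only blocks of at least this order are reported


def report_free_blocks(free_blocks, report):
    """Report all sufficiently large free blocks, at most BATCH at a time.

    free_blocks: dict mapping block id -> {"order": int, "reported": bool}.
    report:      callback taking a list of block ids (hypothetical host notification).
    Returns the number of batches sent.
    """
    batches = 0
    while True:
        batch = [b for b, info in free_blocks.items()
                 if info["order"] >= MIN_ORDER and not info["reported"]][:BATCH]
        if not batch:
            return batches
        # Blocks are unavailable to the allocator only while being reported.
        pulled = {b: free_blocks.pop(b) for b in batch}
        report(batch)
        for b, info in pulled.items():
            info["reported"] = True   # the history that guarantees forward progress
            free_blocks[b] = info
        batches += 1


blocks = {i: {"order": 9, "reported": False} for i in range(40)}
blocks.update({100 + i: {"order": 0, "reported": False} for i in range(10)})
sizes = []
n = report_free_blocks(blocks, lambda batch: sizes.append(len(batch)))
print(n, sizes)  # 3 [16, 16, 8]
```

At no point are more than 16 blocks missing from the free pool, and the `reported` flag ensures each pass works on blocks that have not been handled yet, so the loop terminates.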

Michael S. Tsirkin

2019-Oct-03 18:31 UTC

On Thu, Oct 03, 2019 at 11:27:46AM -0700, Tyler Sanderson wrote:
> Sorry for the slow reply, I did some verification on my end. See responses
> inline.
>
> On Mon, Sep 16, 2019 at 12:26 AM David Hildenbrand <david at redhat.com> wrote:
> 
>     On 16.09.19 03:41, Wei Wang wrote:
>     > On 09/14/2019 02:36 AM, Tyler Sanderson wrote:
>     >> Hello, I'm curious about the intent of VIRTIO_BALLOON_F_FREE_PAGE_HINT
>     >> (commit
>     >> <https://github.com/torvalds/linux/commit/86a559787e6f5cf662c081363f64a20cad654195#diff-fd202acf694d9eba19c8c64da3e480c9>).
>     >>
>     >> My understanding is that this mechanism works similarly to the
>     >> existing inflate/deflate queues. Pages are allocated by the guest and
>     >> then reported on VQ_FREE_PAGE.
>     >>
>     >> Question: Is there a limit to how many pages will be allocated? What
>     >> controls the amount of memory pressure applied?
>     >
>     > No control for the limit currently. The implementation reports all the
>     > guest free pages to host.
>     > The main usage for this feature so far is to have guest skip sending
>     > those guest free pages
>     > (the more, the better) during live migration.
> 
> How does this differ from the regular inflate/deflate queue?
> Also, couldn't you simply skip sending pages that do not have host pages
> backing them (assuming pages added to the balloon are unbacked to reclaim the
> memory)?

Yes but putting most guest memory into the balloon would
slow the guest down significantly.

> 
>     >
>     >
>     >>
>     >> In my experience with virtio balloon there are problems with the
>     >> mechanisms that are supposed to deflate the balloon in response to
>     >> memory pressure (e.g. OOM notifier).
>     >
>     > What problem did you see? We've also changed balloon to use memory shrinker,
>     > did you see the problem with shrinker as well?
> 
> Yes, I've observed problems both before and after the shrinker change (although
> different problems).
> Before the shrinker change, the overcommit accounting feature gets in the way
> and prevents allocations, even when the balloon could be deflated. The OOM
> notifier is never invoked so the balloon driver's hook into the OOM notifier is
> useless.
> After the shrinker change the overcommit accounting problem is fixed, but I
> have still found that forcibly deflating the balloon under memory pressure is
> slow enough that random allocations can still fail (is there a timeout for
> allocations?).
> For example, I've seen:
> tysand@vm ~ $ fallocate -l 5G d/foo    // d is tmpfs mount. This command causes
> balloon to require deflation.
> tysand@vm grep Mem /proc/meminfo
> MemTotal:        8172852 kB
> MemFree:          138932 kB
> MemAvailable:      83428 kB
> tysand@vm ~ $ grep Mem /proc/meminfo
> free(): invalid pointer
> -bash: wait_for: No record of process 5415
> free(): invalid pointer
>
> Or similarly, I've seen SSH terminate with:
> tysand@vm:~$ grep Mem /proc/meminfo
> *** stack smashing detected ***: <unknown> terminated
>
> Presumably the stack smashing and "free(): invalid pointer" are caused by
> malloc returning null in those programs and the programs not handling it
> correctly.
>
> Notably I don't see the fallocate command fail. Usually only other processes.
> 
>     >
>     >>
>     >> It seems an ideal balloon interface would allow the guest to round
>     >> robin through free guest physical pages, allowing the host to unback
>     >> them, but never having more than a few pages allocated to the balloon
>     >> at any one time. For example:
>     >> 1. Guest allocates 1 page and notifies balloon device of this page's
>     >> address.
>     >> 2. Host debacks the received page.
>     >> 3. Guest frees the page.
>     >> 4. Repeat at #1, but ensure that different pages are allocated each time.
>     >
>     > Probably you need a mechanism to "ensure" different pages to be allocated.
>     > The current implementation (having balloon hold the allocated pages) could
>     > be thought of as one mechanism (it is simple).
>     >
>     >>
>     >> This way the "balloon size" is never more than a few pages and does
>     >> not create memory pressure. However the difficulty is in ensuring each
>     >> set of sent pages is disjoint from previously sent pages. Is there a
>     >> mechanism to round-robin allocations through all of guest physical
>     >> memory? Does VIRTIO_BALLOON_F_FREE_PAGE_HINT enable this?
> 
>     There are use cases where you really want memory pressure (page cache is
>     the prime example). Anyhow, I think the use case you want the
>     "round-robin allocations" for is better tackled by "free page reporting"
>     (used to be called "free page hinting") currently discussed on various
>     lists.
>
>     "allowing the host to unback them, but never having more than a few
>     pages allocated to the balloon at any one time." is similar to what
>     "free page reporting" does. We decided to only report bigger pages
>     (avoid splitting up THP in the hypervisor, overhead) and only
>     temporarily pull out a fixed amount of pages (16) from the page
>     allocator to avoid false-OOM. Guaranteeing forward progress (similar to
>     what you describe) is one important key concept.
> 
> 
> I'm really excited to see this being pursued! It looks like things are actively
> moving forward.
> 
> 
> 
>     --
> 
>     Thanks,
> 
>     David / dhildenb
>

David Hildenbrand

2019-Oct-04 08:06 UTC

On 04.10.19 01:15, Tyler Sanderson wrote:
> I was mistaken, the problem with overcommit accounting is not fixed by
> the change to shrinker interface.
> This means that large allocations are stopped even if they could succeed
> by deflating the balloon.

Please note that some people use the balloon for actual memory unplug -
so initiating to deflate the balloon under any circumstances is
undesired. It's different with "VIRTIO_BALLOON_F_DEFLATE_ON_OOM" being
set - however that is barely the case (at least in the setups I know :) ).

So yes, free page reporting is a different thing, because it really is
used to "hint" and not to "agree to unplug" in any scenario.

-- 

Thanks,

David / dhildenb

Michael S. Tsirkin

2019-Oct-04 08:35 UTC

On Fri, Oct 04, 2019 at 10:06:03AM +0200, David Hildenbrand wrote:
> On 04.10.19 01:15, Tyler Sanderson wrote:
> > I was mistaken, the problem with overcommit accounting is not fixed by
> > the change to shrinker interface.
> > This means that large allocations are stopped even if they could succeed
> > by deflating the balloon.
>
> Please note that some people use the balloon for actual memory unplug -
> so initiating to deflate the balloon under any circumstances is
> undesired. It's different with "VIRTIO_BALLOON_F_DEFLATE_ON_OOM" being
> set - however that is barely the case (at least in the setups I know :) ).
>
> So yes, free page reporting is a different thing, because it really is
> used to "hint" and not to "agree to unplug" in any scenario.
> 
> -- 
> 
> Thanks,
> 

VIRTIO_BALLOON_F_DEFLATE_ON_OOM isn't really well thought through
at the spec level either. For example, when will we inflate again?
Current code does this at the next interrupt, which requires
host to somehow know it's time to inflate.

-- 
MST
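
The re-inflation question raised here can be made concrete with a small toy model. It is only a sketch of the behaviour as described in this message (re-inflation happens at the next interrupt), not the driver's actual logic. `ToyBalloon` and its methods are hypothetical names.

```python
class ToyBalloon:
    """Toy guest balloon that only re-reads its target when interrupted."""

    def __init__(self):
        self.target = 0   # pages the host asked the guest to give up
        self.actual = 0   # pages currently held in the balloon

    def config_interrupt(self, target):
        """Host sets a target and interrupts the guest, which adjusts to match."""
        self.target = target
        self.actual = target

    def oom(self, needed):
        """Deflate-on-OOM: hand back up to `needed` pages to the guest."""
        freed = min(needed, self.actual)
        self.actual -= freed
        return freed


b = ToyBalloon()
b.config_interrupt(1000)
b.oom(300)
print(b.actual, b.target)    # 700 1000
b.config_interrupt(b.target)  # host has to decide to poke the guest again
print(b.actual, b.target)    # 1000 1000
```

In this model nothing on the guest side closes the gap between `actual` and `target` after the OOM event. The balloon stays at 700 pages until the host sends another interrupt, which is the point about the host needing to know when it is time to inflate.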
