On 09/14/2019 02:36 AM, Tyler Sanderson wrote:
> Hello, I'm curious about the intent of VIRTIO_BALLOON_F_FREE_PAGE_HINT
> (commit
> <https://github.com/torvalds/linux/commit/86a559787e6f5cf662c081363f64a20cad654195#diff-fd202acf694d9eba19c8c64da3e480c9>).
>
> My understanding is that this mechanism works similarly to the
> existing inflate/deflate queues. Pages are allocated by the guest and
> then reported on VQ_FREE_PAGE.
>
> Question: Is there a limit to how many pages will be allocated? What
> controls the amount of memory pressure applied?

No control for the limit currently. The implementation reports all the
guest free pages to host. The main usage for this feature so far is to
have guest skip sending those guest free pages (the more, the better)
during live migration.

> In my experience with virtio balloon there are problems with the
> mechanisms that are supposed to deflate the balloon in response to
> memory pressure (e.g. OOM notifier).

What problem did you see? We've also changed balloon to use memory
shrinker, did you see the problem with shrinker as well?

> It seems an ideal balloon interface would allow the guest to round
> robin through free guest physical pages, allowing the host to unback
> them, but never having more than a few pages allocated to the balloon
> at any one time. For example:
> 1. Guest allocates 1 page and notifies balloon device of this page's
> address.
> 2. Host debacks the received page.
> 3. Guest frees the page.
> 4. Repeat at #1, but ensure that different pages are allocated each time.

Probably you need a mechanism to "ensure" different pages to be
allocated. The current implementation (having balloon hold the allocated
pages) could be thought of as one mechanism (it is simple).

> This way the "balloon size" is never more than a few pages and does
> not create memory pressure. However the difficulty is in ensuring each
> set of sent pages is disjoint from previously sent pages. Is there a
> mechanism to round-robin allocations through all of guest physical
> memory? Does VIRTIO_BALLOON_F_FREE_PAGE_HINT enable this?

AFAIK, no such round-robin allocation so far. This may need the page
allocation to have states recording the allocation history.

Best,
Wei
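For illustration only, the four-step loop discussed above, together with the
"states recording the allocation history" that Wei mentions, can be modelled in
a few lines of Python. This is a toy sketch and not kernel or virtio code: the
`Guest` class, the `hinted` set and `round_robin_hint` are hypothetical names,
and the model assumes no other allocations or frees happen while the loop runs.

```python
from dataclasses import dataclass, field


@dataclass
class Guest:
    """Toy guest: `free` holds the page frame numbers that are currently free."""
    free: set
    # Hypothetical allocation history: pages already sent to the host once.
    hinted: set = field(default_factory=set)

    def alloc_unhinted(self, count):
        """Take up to `count` free pages that have never been hinted before."""
        picked = sorted(self.free - self.hinted)[:count]
        self.free.difference_update(picked)
        return picked

    def release(self, pages):
        """Give pages back to the allocator and remember they were hinted."""
        self.free.update(pages)
        self.hinted.update(pages)


def round_robin_hint(guest, batch=1):
    """Return (pages the host unbacked, peak number of pages held at once)."""
    unbacked, peak = [], 0
    while True:
        held = guest.alloc_unhinted(batch)  # step 1: allocate and notify
        if not held:                        # every free page has been hinted
            break
        peak = max(peak, len(held))
        unbacked.extend(held)               # step 2: host unbacks the pages
        guest.release(held)                 # step 3: guest frees the pages
    return unbacked, peak                   # step 4 is the loop itself


guest = Guest(free={2, 3, 5, 8, 13, 21})
unbacked, peak = round_robin_hint(guest, batch=2)
print(unbacked, peak)
```

The history set is what keeps each batch disjoint from earlier ones; without it
the allocator would be free to hand back the same pages on every iteration.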
On 16.09.19 03:41, Wei Wang wrote:
> On 09/14/2019 02:36 AM, Tyler Sanderson wrote:
>> Hello, I'm curious about the intent of VIRTIO_BALLOON_F_FREE_PAGE_HINT
>> (commit
>> <https://github.com/torvalds/linux/commit/86a559787e6f5cf662c081363f64a20cad654195#diff-fd202acf694d9eba19c8c64da3e480c9>).
>>
>> My understanding is that this mechanism works similarly to the
>> existing inflate/deflate queues. Pages are allocated by the guest and
>> then reported on VQ_FREE_PAGE.
>>
>> Question: Is there a limit to how many pages will be allocated? What
>> controls the amount of memory pressure applied?
>
> No control for the limit currently. The implementation reports all the
> guest free pages to host.
> The main usage for this feature so far is to have guest skip sending
> those guest free pages
> (the more, the better) during live migration.
>
>> In my experience with virtio balloon there are problems with the
>> mechanisms that are supposed to deflate the balloon in response to
>> memory pressure (e.g. OOM notifier).
>
> What problem did you see? We've also changed balloon to use memory shrinker,
> did you see the problem with shrinker as well?
>
>> It seems an ideal balloon interface would allow the guest to round
>> robin through free guest physical pages, allowing the host to unback
>> them, but never having more than a few pages allocated to the balloon
>> at any one time. For example:
>> 1. Guest allocates 1 page and notifies balloon device of this page's
>> address.
>> 2. Host debacks the received page.
>> 3. Guest frees the page.
>> 4. Repeat at #1, but ensure that different pages are allocated each time.
>
> Probably you need a mechanism to "ensure" different pages to be allocated.
> The current implementation (having balloon hold the allocated pages) could
> be thought of as one mechanism (it is simple).
>
>> This way the "balloon size" is never more than a few pages and does
>> not create memory pressure. However the difficulty is in ensuring each
>> set of sent pages is disjoint from previously sent pages. Is there a
>> mechanism to round-robin allocations through all of guest physical
>> memory? Does VIRTIO_BALLOON_F_FREE_PAGE_HINT enable this?

There are use cases where you really want memory pressure (page cache is
the prime example). Anyhow, I think the use case you want the
"round-robin allocations" for is better tackled by "free page reporting"
(used to be called "free page hinting") currently discussed on various
lists.

"allowing the host to unback them, but never having more than a few
pages allocated to the balloon at any one time." is similar to what
"free page reporting" does. We decided to only report bigger pages
(avoid splitting up THP in the hypervisor, overhead) and only
temporarily pull out a fixed amount of pages (16) from the page
allocator to avoid false-OOM. Guaranteeing forward progress (similar to
what you describe) is one important key concept.

-- 

Thanks,

David / dhildenb
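As a rough illustration of the scheme described above (report only bigger
blocks, isolate at most 16 at a time, and mark what has been reported so the
scan makes forward progress), here is a toy Python model. It is a sketch under
stated assumptions, not the free page reporting patches themselves: the dict
layout, `report_free_blocks` and `report_to_host` are hypothetical, and
`MIN_ORDER = 9` assumes 2 MiB blocks built from 4 KiB base pages. Only the
batch size of 16 comes from the thread.

```python
REPORT_CAPACITY = 16  # from the thread: fixed number of pages pulled out at once
MIN_ORDER = 9         # assumption: 2**9 * 4 KiB = 2 MiB, the THP size on x86-64


def report_free_blocks(free_blocks, report_to_host):
    """Report every large free block once, in batches of REPORT_CAPACITY.

    free_blocks: list of dicts {"pfn": int, "order": int, "reported": bool},
                 with unique pfn values (hypothetical layout).
    report_to_host: callable taking a list of (pfn, order) tuples.
    Returns the largest number of blocks isolated at any one time.
    """
    max_isolated = 0
    while True:
        # Pick large blocks that have not been reported yet.
        batch = [b for b in free_blocks
                 if b["order"] >= MIN_ORDER and not b["reported"]]
        batch = batch[:REPORT_CAPACITY]
        if not batch:
            return max_isolated
        for block in batch:              # temporarily pull them out
            free_blocks.remove(block)
        max_isolated = max(max_isolated, len(batch))
        report_to_host([(b["pfn"], b["order"]) for b in batch])
        for block in batch:              # return them, flagged as reported
            block["reported"] = True
            free_blocks.append(block)


calls = []
blocks = [{"pfn": i * 512, "order": 9, "reported": False} for i in range(40)]
blocks += [{"pfn": 100000 + i, "order": 0, "reported": False} for i in range(10)]
peak = report_free_blocks(blocks, calls.append)
print(peak, [len(c) for c in calls])
```

Because reported blocks go straight back to the allocator, the guest never has
more than one batch unavailable, which is the property that avoids false OOM.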
On Thu, Oct 03, 2019 at 11:27:46AM -0700, Tyler Sanderson wrote:
> Sorry for the slow reply, I did some verification on my end. See responses
> inline.
>
> On Mon, Sep 16, 2019 at 12:26 AM David Hildenbrand <david at redhat.com> wrote:
>
>     On 16.09.19 03:41, Wei Wang wrote:
>     > On 09/14/2019 02:36 AM, Tyler Sanderson wrote:
>     >> Hello, I'm curious about the intent of VIRTIO_BALLOON_F_FREE_PAGE_HINT
>     >> (commit
>     >> <https://github.com/torvalds/linux/commit/86a559787e6f5cf662c081363f64a20cad654195#diff-fd202acf694d9eba19c8c64da3e480c9>).
>     >>
>     >> My understanding is that this mechanism works similarly to the
>     >> existing inflate/deflate queues. Pages are allocated by the guest and
>     >> then reported on VQ_FREE_PAGE.
>     >>
>     >> Question: Is there a limit to how many pages will be allocated? What
>     >> controls the amount of memory pressure applied?
>     >
>     > No control for the limit currently. The implementation reports all the
>     > guest free pages to host.
>     > The main usage for this feature so far is to have guest skip sending
>     > those guest free pages
>     > (the more, the better) during live migration.
>
> How does this differ from the regular inflate/deflate queue?
> Also, couldn't you simply skip sending pages that do not have host pages
> backing them (assuming pages added to the balloon are unbacked to reclaim the
> memory)?

Yes but putting most guest memory into the balloon would slow the guest
down significantly.

>     >> In my experience with virtio balloon there are problems with the
>     >> mechanisms that are supposed to deflate the balloon in response to
>     >> memory pressure (e.g. OOM notifier).
>     >
>     > What problem did you see? We've also changed balloon to use memory shrinker,
>     > did you see the problem with shrinker as well?
>
> Yes, I've observed problems both before and after the shrinker change (although
> different problems).
> Before the shrinker change, the overcommit accounting feature gets in the way
> and prevents allocations, even when the balloon could be deflated. The OOM
> notifier is never invoked so the balloon driver's hook into the OOM notifier is
> useless.
> After the shrinker change the overcommit accounting problem is fixed, but I
> have still found that forcibly deflating the balloon under memory pressure is
> slow enough that random allocations can still fail (is there a timeout for
> allocations?).
> For example, I've seen:
> tysand@vm ~ $ fallocate -l 5G d/foo    // d is tmpfs mount. This command causes
> balloon to require deflation.
> tysand@vm ~ $ grep Mem /proc/meminfo
> MemTotal:        8172852 kB
> MemFree:          138932 kB
> MemAvailable:      83428 kB
> tysand@vm ~ $ grep Mem /proc/meminfo
> free(): invalid pointer
> -bash: wait_for: No record of process 5415
> free(): invalid pointer
>
> Or similarly, I've seen SSH terminate with:
> tysand@vm:~$ grep Mem /proc/meminfo
> *** stack smashing detected ***: <unknown> terminated
>
> Presumably the stack smashing and "free(): invalid pointer" are caused by
> malloc returning null in those programs and the programs not handling it
> correctly.
>
> Notably I don't see the fallocate command fail. Usually only other processes.
>
>     >> It seems an ideal balloon interface would allow the guest to round
>     >> robin through free guest physical pages, allowing the host to unback
>     >> them, but never having more than a few pages allocated to the balloon
>     >> at any one time. For example:
>     >> 1. Guest allocates 1 page and notifies balloon device of this page's
>     >> address.
>     >> 2. Host debacks the received page.
>     >> 3. Guest frees the page.
>     >> 4. Repeat at #1, but ensure that different pages are allocated each time.
>     >
>     > Probably you need a mechanism to "ensure" different pages to be allocated.
>     > The current implementation (having balloon hold the allocated pages) could
>     > be thought of as one mechanism (it is simple).
>     >
>     >> This way the "balloon size" is never more than a few pages and does
>     >> not create memory pressure. However the difficulty is in ensuring each
>     >> set of sent pages is disjoint from previously sent pages. Is there a
>     >> mechanism to round-robin allocations through all of guest physical
>     >> memory? Does VIRTIO_BALLOON_F_FREE_PAGE_HINT enable this?
>
>     There are use cases where you really want memory pressure (page cache is
>     the prime example). Anyhow, I think the use case you want the
>     "round-robin allocations" for is better tackled by "free page reporting"
>     (used to be called "free page hinting") currently discussed on various
>     lists.
>
>     "allowing the host to unback them, but never having more than a few
>     pages allocated to the balloon at any one time." is similar to what
>     "free page reporting" does. We decided to only report bigger pages
>     (avoid splitting up THP in the hypervisor, overhead) and only
>     temporarily pull out a fixed amount of pages (16) from the page
>     allocator to avoid false-OOM. Guaranteeing forward progress (similar to
>     what you describe) is one important key concept.
>
> I'm really excited to see this being pursued! It looks like things are actively
> moving forward.
>
>     --
>
>     Thanks,
>
>     David / dhildenb
>
On 04.10.19 01:15, Tyler Sanderson wrote:
> I was mistaken, the problem with overcommit accounting is not fixed by
> the change to shrinker interface.
> This means that large allocations are stopped even if they could succeed
> by deflating the balloon.

Please note that some people use the balloon for actual memory unplug -
so initiating to deflate the balloon under any circumstances is
undesired. It's different with "VIRTIO_BALLOON_F_DEFLATE_ON_OOM" being
set - however that is barely the case (at least in the setups I know :) ).

So yes, free page reporting is a different thing, because it really is
used to "hint" and not to "agree to unplug" in any scenario.

-- 

Thanks,

David / dhildenb
On Fri, Oct 04, 2019 at 10:06:03AM +0200, David Hildenbrand wrote:
> On 04.10.19 01:15, Tyler Sanderson wrote:
> > I was mistaken, the problem with overcommit accounting is not fixed by
> > the change to shrinker interface.
> > This means that large allocations are stopped even if they could succeed
> > by deflating the balloon.
>
> Please note that some people use the balloon for actual memory unplug -
> so initiating to deflate the balloon under any circumstances is
> undesired. It's different with "VIRTIO_BALLOON_F_DEFLATE_ON_OOM" being
> set - however that is barely the case (at least in the setups I know :) ).
>
> So yes, free page reporting is a different thing, because it really is
> used to "hint" and not to "agree to unplug" in any scenario.
>
> --
>
> Thanks,
>

VIRTIO_BALLOON_F_DEFLATE_ON_OOM isn't really well thought through
at the spec level either. For example, when will we inflate again?
Current code does this at the next interrupt, which requires host to
somehow know it's time to inflate.

-- 
MST