thr3ads.net - Linux Virtualization - [RFC] virtio-mem: paravirtualized memory [Jun 2017]

David Hildenbrand

2017-Jun-16 15:59 UTC

[RFC] virtio-mem: paravirtualized memory

On 16.06.2017 17:04, Michael S. Tsirkin wrote:
> On Fri, Jun 16, 2017 at 04:20:02PM +0200, David Hildenbrand wrote:
>> Hi,
>>
>> this is an idea that is based on Andrea Arcangeli's original idea to
>> host enforce guest access to memory given up using virtio-balloon using
>> userfaultfd in the hypervisor. While looking into the details, I
>> realized that host-enforcing virtio-balloon would result in way too many
>> problems (mainly backwards compatibility) and would also have some
>> conceptual restrictions that I want to avoid. So I developed the idea of
>> virtio-mem - "paravirtualized memory".
> 
> Thanks! I went over this quickly, will read some more in the
> coming days. I would like to ask for some clarifications
> on one part meanwhile:
Thanks for looking into it that fast! :)

In general, what this section is all about: Why to not simply host
enforce virtio-balloon.
> 
>> Q: Why not reuse virtio-balloon?
>>
>> A: virtio-balloon is for cooperative memory management. It has a fixed
>>    page size
> 
> We are fixing that with VIRTIO_BALLOON_F_PAGE_CHUNKS btw.
> I would appreciate you looking into that patchset.
Will do, thanks. Problem is that there is no "enforcement" on the page
size. VIRTIO_BALLOON_F_PAGE_CHUNKS simply allows to send bigger chunks.
Nobody hinders the guest (especially legacy virtio-balloon drivers) from
sending 4k pages.

So this doesn't really fix the issue (we have here), it just allows to
speed up transfer. Which is a good thing, but does not help for
enforcement at all. So, yes support for page sizes > 4k, but no way to
enforce it.
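
To make the granularity point concrete, here is a minimal Python sketch. It is a toy model, not QEMU or kernel code: the function name, the 2 MiB block size and the idea that the host can only enforce access on whole blocks are assumptions made for illustration.

```python
PAGE_4K = 4096
HOST_BLOCK = 2 * 1024 * 1024  # assumed host enforcement granularity (2 MiB)


def enforceable_blocks(inflated_ranges, block_size=HOST_BLOCK):
    """Return start addresses of host blocks fully covered by inflated ranges.

    inflated_ranges is a list of (start, length) tuples in bytes, as a guest
    might report them. Ranges are assumed not to overlap.
    """
    covered = {}
    for start, length in inflated_ranges:
        addr = start
        end = start + length
        while addr < end:
            block = addr - addr % block_size
            chunk_end = min(end, block + block_size)
            # Count how many bytes of this block the guest has given up.
            covered[block] = covered.get(block, 0) + (chunk_end - addr)
            addr = chunk_end
    # Only completely inflated blocks could be protected by the host.
    return sorted(b for b, n in covered.items() if n == block_size)
```

A guest that sends one 2 MiB chunk yields one enforceable block. A guest that sends scattered 4k pages, which nothing forbids, yields none, even though the same feature bit may have been negotiated.
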
> 
>> and will deflate in certain situations.
> 
> What does this refer to?
A Linux guest will deflate the balloon (all or some pages) in the
following scenarios:
a) page migration
b) unload virtio-balloon kernel module
c) hibernate/suspension
d) (DEFLATE_ON_OOM)

A Linux guest will touch memory without deflating:
a) During a kexec() dump
b) On reboots (regular, after kexec(), system_reset)
> 
>> Any change we
>>    introduce will break backwards compatibility.
> 
> Why does this have to be the case?
If we suddenly enforce the existing virtio-balloon, we will break legacy
guests.

Simple example:
Guest with inflated virtio-balloon reboots. Touches inflated memory.
Gets killed at some random point.
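
The reboot example can be sketched as a toy model in Python. Everything here is hypothetical (class and function names, the use of MemoryError to stand in for the guest being killed). It only illustrates that the host's view of inflated pages survives a guest reboot while the guest's view does not.

```python
class HostEnforcedBalloon:
    """Toy host that forbids guest access to inflated pages."""

    def __init__(self, num_pages):
        self.num_pages = num_pages
        self.inflated = set()  # host-side state, survives guest reboots

    def inflate(self, pfn):
        self.inflated.add(pfn)

    def deflate(self, pfn):
        self.inflated.discard(pfn)

    def guest_access(self, pfn):
        # With enforcement, touching an inflated page is fatal for the guest.
        if pfn in self.inflated:
            raise MemoryError(f"guest touched inflated page {pfn}")
        return True


def reboot_legacy_guest(host):
    """A rebooted legacy guest knows nothing about the balloon and
    eventually touches all of its RAM."""
    for pfn in range(host.num_pages):
        host.guest_access(pfn)
```

With a single page inflated before the reboot, reboot_legacy_guest() raises as soon as the guest reaches that page. With nothing inflated, it runs through.
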

Of course, another discussion would be "can't we move virtio-mem
functionality into virtio-balloon instead of changing virtio-balloon".
With the current concept this is also not possible (one region per
device vs. one virtio-balloon device). And I think while similar, these
are two different concepts.
> 
>> virtio-balloon was not
>>    designed to give guarantees. Nobody can hinder the guest from
>>    deflating/reusing inflated memory.
> 
> Reusing without deflate is forbidden with TELL_HOST, right?
TELL_HOST just means "please inform me". There is no way to NACK a
request. It is not a permission to do so, just a "friendly
notification". And this is exactly not what we want when host enforcing
memory access.
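
The difference between a notification and a request that can be refused can be sketched like this. It is a toy model in Python: the class, notified_deflate() and request_plug() are made-up names for illustration and are not the virtio-balloon or virtio-mem interface.

```python
class ToyHost:
    """Toy host that wants the guest to stay within a page limit."""

    def __init__(self, limit_pages):
        self.limit = limit_pages
        self.in_use = limit_pages  # guest currently sits exactly at the limit
        self.log = []

    def notified_deflate(self, pfn):
        # Notification style: the host is informed, but there is no return
        # value the guest would have to honour. The page is reused anyway.
        self.log.append(pfn)
        self.in_use += 1

    def request_plug(self, pfn):
        # Request style (hypothetical): the host may refuse, and the guest
        # only gets the page on success.
        if self.in_use >= self.limit:
            return False  # NACK
        self.in_use += 1
        return True
```

After notified_deflate() the guest is over the limit and the host can only take note of it. With request_plug() the host stays in control because the guest has to wait for the answer.
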

> 
>>    In addition, it might make perfect
>>    sense to have both, virtio-balloon and virtio-mem at the same time,
>>    especially looking at the DEFLATE_ON_OOM or STATS features of
>>    virtio-balloon. While virtio-mem is all about guarantees, virtio-
>>    balloon is about cooperation.
> 
> Thanks, and I intend to look more into this next week.
> 
I know that it is tempting to force this concept into virtio-balloon. I
spent quite some time thinking about this (and possible other techniques
like implicit memory deflation on reboots) and decided not to do it. We
just end up trying to hack around all possible things that could go
wrong, while still not being able to handle all requirements properly.

-- 

Thanks,

David

Michael S. Tsirkin

2017-Jun-16 20:19 UTC

[RFC] virtio-mem: paravirtualized memory

On Fri, Jun 16, 2017 at 05:59:07PM +0200, David Hildenbrand wrote:
> On 16.06.2017 17:04, Michael S. Tsirkin wrote:
> > On Fri, Jun 16, 2017 at 04:20:02PM +0200, David Hildenbrand wrote:
> >> Hi,
> >>
> >> this is an idea that is based on Andrea Arcangeli's original idea to
> >> host enforce guest access to memory given up using virtio-balloon using
> >> userfaultfd in the hypervisor. While looking into the details, I
> >> realized that host-enforcing virtio-balloon would result in way too many
> >> problems (mainly backwards compatibility) and would also have some
> >> conceptual restrictions that I want to avoid. So I developed the idea of
> >> virtio-mem - "paravirtualized memory".
> > 
> > Thanks! I went over this quickly, will read some more in the
> > coming days. I would like to ask for some clarifications
> > on one part meanwhile:
> 
> Thanks for looking into it that fast! :)
> 
> In general, what this section is all about: Why to not simply host
> enforce virtio-balloon.
> > 
> >> Q: Why not reuse virtio-balloon?
> >>
> >> A: virtio-balloon is for cooperative memory management. It has a fixed
> >>    page size
> > 
> > We are fixing that with VIRTIO_BALLOON_F_PAGE_CHUNKS btw.
> > I would appreciate you looking into that patchset.
> 
> Will do, thanks. Problem is that there is no "enforcement" on the page
> size. VIRTIO_BALLOON_F_PAGE_CHUNKS simply allows to send bigger chunks.
> Nobody hinders the guest (especially legacy virtio-balloon drivers) from
> sending 4k pages.
> 
> So this doesn't really fix the issue (we have here), it just allows to
> speed up transfer. Which is a good thing, but does not help for
> enforcement at all. So, yes support for page sizes > 4k, but no way to
> enforce it.
> 
> > 
> >> and will deflate in certain situations.
> > 
> > What does this refer to?
> 
> A Linux guest will deflate the balloon (all or some pages) in the
> following scenarios:
> a) page migration
It inflates it first, doesn't it?
> b) unload virtio-balloon kernel module
> c) hibernate/suspension
> d) (DEFLATE_ON_OOM)
You need to set a flag in the balloon to allow this, right?
> A Linux guest will touch memory without deflating:
> a) During a kexec() dump
> b) On reboots (regular, after kexec(), system_reset)
> > 
> >> Any change we
> >>    introduce will break backwards compatibility.
> > 
> > Why does this have to be the case?
> If we suddenly enforce the existing virtio-balloon, we will break legacy
> guests.
Can't we do it with a feature flag?
> Simple example:
> Guest with inflated virtio-balloon reboots. Touches inflated memory.
> Gets killed at some random point.
> 
> Of course, another discussion would be "can't we move virtio-mem
> functionality into virtio-balloon instead of changing virtio-balloon".
> With the current concept this is also not possible (one region per
> device vs. one virtio-balloon device). And I think while similar, these
> are two different concepts.
> 
> > 
> >> virtio-balloon was not
> >>    designed to give guarantees. Nobody can hinder the guest from
> >>    deflating/reusing inflated memory.
> > 
> > Reusing without deflate is forbidden with TELL_HOST, right?
> 
> TELL_HOST just means "please inform me". There is no way to NACK a
> request. It is not a permission to do so, just a "friendly
> notification". And this is exactly not what we want when host enforcing
> memory access.
> 
> 
> > 
> >>    In addition, it might make perfect
> >>    sense to have both, virtio-balloon and virtio-mem at the same time,
> >>    especially looking at the DEFLATE_ON_OOM or STATS features of
> >>    virtio-balloon. While virtio-mem is all about guarantees,
> >>    virtio-balloon is about cooperation.
> > 
> > Thanks, and I intend to look more into this next week.
> > 
> 
> I know that it is tempting to force this concept into virtio-balloon. I
> spent quite some time thinking about this (and possible other techniques
> like implicit memory deflation on reboots) and decided not to do it. We
> just end up trying to hack around all possible things that could go
> wrong, while still not being able to handle all requirements properly.
I agree there's a large # of requirements here not addressed by the balloon.

One other thing that would be helpful here is pointing out the
similarities between virtio-mem and the balloon. I'll ponder it
over the weekend.

The biggest worry for me is inability to support DMA into this memory.
Is this hard to fix?


Thanks!


> -- 
> 
> Thanks,
> 
> David

David Hildenbrand

2017-Jun-18 10:17 UTC

[RFC] virtio-mem: paravirtualized memory

>> A Linux guest will deflate the balloon (all or some pages) in the
>> following scenarios:
>> a) page migration
> 
> It inflates it first, doesn't it?
Yes, that is true. I was just listing all scenarios.
> 
>> b) unload virtio-balloon kernel module
>> c) hibernate/suspension
>> d) (DEFLATE_ON_OOM)
> 
> You need to set a flag in the balloon to allow this, right?
Yes, has to be enabled in QEMU and will propagate to the guest. It is
used in various setups and you could either go for DEFLATE_ON_OOM
(cooperative memory management) or memory unplug, not both.
> 
>> A Linux guest will touch memory without deflating:
>> a) During a kexec() dump
>> b) On reboots (regular, after kexec(), system_reset)
>>>
>>>> Any change we
>>>>    introduce will break backwards compatibility.
>>>
>>> Why does this have to be the case?
>> If we suddenly enforce the existing virtio-balloon, we will break legacy
>> guests.
> 
> Can't we do it with a feature flag?
I haven't found an easy way to do that, without turning all existing
virtio-balloon implementations useless. But honestly, whatever you do,
you will be confronted with the very basic problems of this approach:

Random memory holes on a reboot and the chance that the guest that comes up

a) contains a legacy virtio-balloon
b) contains no virtio-balloon at all
c) starts up virtio-balloon too late to fill the holes

Now, there are various possible approaches that require their own hacks
and only solve a subset of these problems. Just a very short version of
it all:

1) very early virtio-balloon that queries a bitmap of inflated memory
via some interface. This is just a giant hack (e.g. what about Windows?)
and even the bios might already touch inflated memory. Still breaks at
least b) and c). No good.

2) Do "implicit" balloon inflation on a reboot. Any page the guest
touches is marked as inflated. This requires a lot of quirks in the host
and still breaks at least b) and c). Basically no good for us.

You can read more about the involved problems at
https://blog.xenproject.org/2014/02/14/ballooning-rebooting-and-the-feature-youve-never-heard-of/

3) Try to mark inflated pages as reserved in the e820 map and make
the balloon hotplug these. Well, this is x86 specific and has some other
problems (e.g. what to do with ACPI hotplugged memory?). Also, how to
handle this on Windows? And the e820 map would explode in size. No good.

4) Try to resize the guest main memory, to compensate unplugged memory.
While this sounds promising, there are elementary problems to solve: How
to deal with ACPI hotplugged memory? What to resize? And there has to be
ACPI hotplug, otherwise you cannot add more memory to a guest. While we
could solve some x86 specific problems here, migration on the QEMU side
will also be "fun". virtio-mem heavily simplifies that all by only
working on its own memory.

But again, these are all hacks, and at least I don't want to create a
giant hack and call it virtio-*, that is restricted to some very
specific use cases and/or architectures. Let's just do it in a clean way
if possible.

[...]
> I agree there's a large # of requirements here not addressed by the
> balloon.
Exactly, and it tries to solve the basic problem of rebooting into a
guest that does not contain a fitting guest driver.
>
> One other thing that would be helpful here is pointing out the
> similarities between virtio-mem and the balloon. I'll ponder it
> over the weekend.
There is much more difference here than similarity. The only thing they
share is allocating/freeing memory and tell the host about it. But
already how/from where memory is allocated is different. I think even
the general use case is different. Again, I think both concepts make
sense to coexist.
> 
> The biggest worry for me is inability to support DMA into this memory.
> Is this hard to fix?
As a short term solution: Always give your (x86) guest at least 3.x G of
base memory. And I mean that is the exact same thing you have with
ordinary ACPI based memory hotplug right now. That will also never
become DMA memory. So it is not worse compared to what we have right now.

Long term solution: I think this was never a use case. Usually, all
memory you "add", you theoretically want to be able to "remove" again.
So from that point, it does not make sense to mark it as DMA and feed it
to some driver that will not let go of it. I haven't had a deep look at
it, but I at least think it could be done with some effort. Not sure
about Windows.

Thanks!

-- 

Thanks,

David
