David Hildenbrand
2020-Apr-20 13:28 UTC
[virtio-dev] Re: [PATCH] virtio-balloon: Disable free page hinting/reporting if page poison is disabled
>>> Now we are talking about what's safe to do with the page. >>> >>> If POISON flag is set by hypervisor but clear by guest, >>> or poison_val is 0, then it's clear it's safe to blow >>> away the page if we can figure out it's unused. >>> >>> Otherwise, it's much less clear :) >> >> Hah! Agreed :D > > That isn't quite true. The problem is in the case of hinting it isn't > setting the page to 0. It is simply not migrating it. So if there is > data from an earlier pass it is stuck at that value. So the balloon > will see the poison/init on some pages cleared, however I suppose the > balloon doesn't care about the contents of the page. For the pages > that are leaked back out via the shrinker they will be dirtied so they > will end up being migrated with the correct value eventually.Right, I think current Linux guests are fine. The critical piece we are talking about is 1) Guest balloon allocates and hints a page 2) Hypervisor does not process hinting request 3) Guest frees the page and reuses it (especially, writes it). 4) Hypervisor processes the hinting request. AFAIU, as soon as the guest writes the page (e.g., zeroes it/poisons it in the buddy, or somebody who allocated it), the page *will* get migrated, even if 4) happens after 3). That's guaranteed by the 2-bitmap magic. Now, assume the following happens (in some future Linux version) (due to your "simply not migrating it" comment): 1) Guest balloon allocates and hints a page. Assume the page is zero due to want_init_on_free(). 2) Hypervisor processes the hinting request. 3) Guest frees the page. Assume we are implementing some magic to "skip" zeroing, as we assume it is still zero. Due to 2), the page won't get migrated. In 3) we expect the page to be 0. QEMU would have to make sure that we always get either the original, or a zero page on the destination. Otherwise, this smells like data corruption.> >>> I'll have to come back and re-read the rest next week, this >>> is complex stuff and I'm too rushed with other things today. >> >> Yeah, I'm also loaded with other stuff. Maybe Alex has time to >> understand the details as well. > > So after looking it over again it makes a bit more sense why this > hasn't caused any issues so far, and I can see why the poison enabled > setup and hinting can work. The problem is I am left wondering what > assumptions we are allowed to leave in place. > > 1. Can we assume that we don't care about the contents in the pages in > the balloon changing?I think, we should define valid ways for the hypervisor to change it. "Pages hinted via VIRTIO_BALLOON_F_FREE_PAGE_HINT might get replaced by a zero page. However, as soon as the page is written by the guest (even before the hinting request was processed by the host), the modified page will stay - whereby the unwritten parts might either be from the old, or from the zero page." I think the debatable part is "whereby the unwritten parts might either be from the old, or from the zero page". AFAIU, you think it could happen in QEMU, that we have neither the old, nor the zero page, but instead some previous content. The question is if that's valid, or if that's a BUG in QEMU. If it's valid, we can do no optimizations in the guest (e.g., skip zeroing in the buddy). And I agree that this smells like "data corruption" as Michael said.> 2. Can we assume that the guest will always rewrite the page after the > deflate in the case of init_on_free or poison?Depends on what we think is the right way to do - IOW if we think "some other content" as mentioned above is a BUG or not.> 3. Can we assume that free page hinting will always function as a > balloon setup, so no moving it over to a page reporting type setup?I think we have to define the valid semantics. That implies what would be valid to do with it. Unfortunately, we have to reverse-engineer here.> > If we assume the above 3 items then there isn't any point in worrying > about poison when it comes to free page hinting. It doesn't make sense > to since they essentially negate it. As such I could go through this > patch and the QEMU patches and clean up any associations since the to > are not really tied together in any way.The big question is, if we want to support VIRTIO_BALLOON_F_PAGE_POISON with free page hinting. e.g.,: "Pages hinted via VIRTIO_BALLOON_F_FREE_PAGE_HINT might get replaced by a page full of X. However, as soon as the page is written by the guest (even before the hinting request was processed by the host), the modified page will stay - whereby the unwritten parts might either be from the old, or from a page filled with X. Without VIRTIO_BALLOON_F_PAGE_POISON, X is zero, otherwise it is poison_val." The current QEMU implementation would be to simply migrate all hinted pages. In the future we could optimize. Not sure if it's worth the trouble. -- Thanks, David / dhildenb
Alexander Duyck
2020-Apr-20 18:32 UTC
[virtio-dev] Re: [PATCH] virtio-balloon: Disable free page hinting/reporting if page poison is disabled
On Mon, Apr 20, 2020 at 6:28 AM David Hildenbrand <david at redhat.com> wrote:> > >>> Now we are talking about what's safe to do with the page. > >>> > >>> If POISON flag is set by hypervisor but clear by guest, > >>> or poison_val is 0, then it's clear it's safe to blow > >>> away the page if we can figure out it's unused. > >>> > >>> Otherwise, it's much less clear :) > >> > >> Hah! Agreed :D > > > > That isn't quite true. The problem is in the case of hinting it isn't > > setting the page to 0. It is simply not migrating it. So if there is > > data from an earlier pass it is stuck at that value. So the balloon > > will see the poison/init on some pages cleared, however I suppose the > > balloon doesn't care about the contents of the page. For the pages > > that are leaked back out via the shrinker they will be dirtied so they > > will end up being migrated with the correct value eventually. > > Right, I think current Linux guests are fine. The critical piece we are > talking about is > > 1) Guest balloon allocates and hints a page > 2) Hypervisor does not process hinting request > 3) Guest frees the page and reuses it (especially, writes it). > 4) Hypervisor processes the hinting request. > > AFAIU, as soon as the guest writes the page (e.g., zeroes it/poisons it > in the buddy, or somebody who allocated it), the page *will* get > migrated, even if 4) happens after 3). That's guaranteed by the 2-bitmap > magic.Yes. Basically the page will be flagged as dirty as soon as it is written to, and that will be dealt with after the current batch of hints are processed so we should be able to guarantee that the page will have a coherent value stored in it.> Now, assume the following happens (in some future Linux version) (due to > your "simply not migrating it" comment): > > 1) Guest balloon allocates and hints a page. Assume the page is zero due > to want_init_on_free(). > 2) Hypervisor processes the hinting request. > 3) Guest frees the page. Assume we are implementing some magic to "skip" > zeroing, as we assume it is still zero. > > Due to 2), the page won't get migrated. In 3) we expect the page to be > 0. QEMU would have to make sure that we always get either the original, > or a zero page on the destination. Otherwise, this smells like data > corruption.I agree and that is my concern. From the time the page is hinted until it is written to again it is in an indeterminate state. However with the current Linux guest implementation it is being written to so technically there isn't an issue. We would just need to make sure it stays that way. In addition it is already an exposed interface and this is the way it works. I think if anything we are likely forced to document it as-is and guarantee that the behavior is not changed.> > > >>> I'll have to come back and re-read the rest next week, this > >>> is complex stuff and I'm too rushed with other things today. > >> > >> Yeah, I'm also loaded with other stuff. Maybe Alex has time to > >> understand the details as well. > > > > So after looking it over again it makes a bit more sense why this > > hasn't caused any issues so far, and I can see why the poison enabled > > setup and hinting can work. The problem is I am left wondering what > > assumptions we are allowed to leave in place. > > > > 1. Can we assume that we don't care about the contents in the pages in > > the balloon changing? > > I think, we should define valid ways for the hypervisor to change it. > > "Pages hinted via VIRTIO_BALLOON_F_FREE_PAGE_HINT might get replaced by > a zero page. However, as soon as the page is written by the guest (even > before the hinting request was processed by the host), the modified page > will stay - whereby the unwritten parts might either be from the old, or > from the zero page."Actually I don't think the zero page is accurate. A page that is hinted basically implies we don't care about the content. I would think that we could treat a hinted page as uninitialized memory.> I think the debatable part is "whereby the unwritten parts might either > be from the old, or from the zero page". AFAIU, you think it could > happen in QEMU, that we have neither the old, nor the zero page, but > instead some previous content. The question is if that's valid, or if > that's a BUG in QEMU. If it's valid, we can do no optimizations in the > guest (e.g., skip zeroing in the buddy). And I agree that this smells > like "data corruption" as Michael said.So if any part of the page is written to after it is hinted you will be getting the modified page since it goes back to being "dirty". All the hinting is doing is skipping the migration of dirty pages that are "hinted" as long as they are not written to again. So the pages are valid up until we migrate to the new system. At that point all of the pages that are in the page hinting balloon will be stale data as we skipped migrating them. Assuming something like page poisoning or init_on_free are enabled they will be poisoned again when they are transferred from the balloon back into the page allocator.> > 2. Can we assume that the guest will always rewrite the page after the > > deflate in the case of init_on_free or poison? > > Depends on what we think is the right way to do - IOW if we think "some > other content" as mentioned above is a BUG or not.So I wouldn't consider it a but as the zero page probably doesn't apply. We are basically just indicating we don't care about the contents, we aren't giving it a value. At least that is how I see it working based on how it was implemented.> > 3. Can we assume that free page hinting will always function as a > > balloon setup, so no moving it over to a page reporting type setup? > > I think we have to define the valid semantics. That implies what would > be valid to do with it. Unfortunately, we have to reverse-engineer here.So at this point a "hinted" page is a page that can essentially transition to uninitialized while it is sitting in the balloon. I suspect that is how we are going to have to treat it since that is the behavior that it has right now.> > > > If we assume the above 3 items then there isn't any point in worrying > > about poison when it comes to free page hinting. It doesn't make sense > > to since they essentially negate it. As such I could go through this > > patch and the QEMU patches and clean up any associations since the to > > are not really tied together in any way. > > The big question is, if we want to support VIRTIO_BALLOON_F_PAGE_POISON > with free page hinting. e.g.,: > > "Pages hinted via VIRTIO_BALLOON_F_FREE_PAGE_HINT might get replaced by > a page full of X. However, as soon as the page is written by the guest > (even before the hinting request was processed by the host), the > modified page will stay - whereby the unwritten parts might either be > from the old, or from a page filled with X. Without > VIRTIO_BALLOON_F_PAGE_POISON, X is zero, otherwise it is poison_val."So the problem is, as implemented, none of the above is correct. Basically what we get with VIRTIO_BALLOON_F_FREE_PAGE_HINT is either no migration, or migration of some old stale state if the page made it into the balloon. There is a chance X could be 0 in the non-migration case as I believe that is going to be the default starting point for memory, however it is just as likely that the page will have stale data if it is in use during any of the bitmap sync passes. The problem is the original code didn't take the poison flag or value into account, so I think we are stuck treating a hinted page as uninitialized memory as long as it is in the balloon. With the VIRTIO_BALLOON_F_PAGE_POISON we could make it so that when the page comes out of the balloon it is either unmodified or it is poison_val. Without the VIRTIO_BALLOON_F_PAGE_POISON feature present you cannot make that guarantee and are stuck with the page being treated as either unmodified or uninitialized memory.> The current QEMU implementation would be to simply migrate all hinted > pages. In the future we could optimize. Not sure if it's worth the trouble.So the trick for me is how best to go about sorting this all out. There are two ways I see of doing it. The first approach would be to just assume that hinting should be disassociated from poisoning. If I go that route that would more closely match up with what happened in QEMU. The downside is that it locks in the unmodified/uninitialized behavior and would require pages to be rewritten when they come out of the balloon. However this is the behavior we have now so it would only require specification documentation changes. The second approach is to work on defining it such that VIRTIO_BALLOON_F_PAGE_POISON switches us to the unmodified/poison_val definition. However the side effect of that is that for now having the flag set means that essentially page hinting is disabled until we could come up with a way of guaranteeing the poison_val effect which would be additional work. In addition it has to rely on the fact that the existing guest solutions reinitialize the pages when they come out of the balloon since the pre-5.7 driver was broken and only took poison into account and didn't check on init_on_free. It would likely be the much more error prone approach and create significantly more work.
David Hildenbrand
2020-Apr-21 07:29 UTC
[virtio-dev] Re: [PATCH] virtio-balloon: Disable free page hinting/reporting if page poison is disabled
>>> 2. Can we assume that the guest will always rewrite the page after the >>> deflate in the case of init_on_free or poison? >> >> Depends on what we think is the right way to do - IOW if we think "some >> other content" as mentioned above is a BUG or not. > > So I wouldn't consider it a but as the zero page probably doesn't > apply. We are basically just indicating we don't care about the > contents, we aren't giving it a value. At least that is how I see it > working based on how it was implemented. > >>> 3. Can we assume that free page hinting will always function as a >>> balloon setup, so no moving it over to a page reporting type setup? >> >> I think we have to define the valid semantics. That implies what would >> be valid to do with it. Unfortunately, we have to reverse-engineer here. > > So at this point a "hinted" page is a page that can essentially > transition to uninitialized while it is sitting in the balloon. I > suspect that is how we are going to have to treat it since that is the > behavior that it has right now.At least it's not what Michael initially thought should be done - IIRC. "We need to remember that the whole HINT thing is not a mandate for host to corrupt memory. It's just some info about page use guest gives host. If host corrupts memory it's broken ..."). So the question remains: Does QEMU have a BUG or do we actually allow to corrupt guest memory. That leaves us with either 1. "Pages hinted via VIRTIO_BALLOON_F_FREE_PAGE_HINT might get replaced by zero page. However, as soon as the page is written by the guest (even before the hinting request was processed by the host), the modified page will stay - whereby the unwritten parts might either be from the old, or from the zero page." - a QEMU BUG. 2. "Pages hinted via VIRTIO_BALLOON_F_FREE_PAGE_HINT are considered unused and will contain an undefined/uninitialized content once hinted. As soon as the page is written by the guest (even before the hinting request was processed by the host), the modified page will stay. The guest should reinitialized the full page in case it cares about the actual content (e.g., page poisoning)." I tend to favor 2 - although it basically leaves no room for future improvement regarding skipping re-initialization in the guest after migration.>>> >>> If we assume the above 3 items then there isn't any point in worrying >>> about poison when it comes to free page hinting. It doesn't make sense >>> to since they essentially negate it. As such I could go through this >>> patch and the QEMU patches and clean up any associations since the to >>> are not really tied together in any way. >> >> The big question is, if we want to support VIRTIO_BALLOON_F_PAGE_POISON >> with free page hinting. e.g.,: >> >> "Pages hinted via VIRTIO_BALLOON_F_FREE_PAGE_HINT might get replaced by >> a page full of X. However, as soon as the page is written by the guest >> (even before the hinting request was processed by the host), the >> modified page will stay - whereby the unwritten parts might either be >> from the old, or from a page filled with X. Without >> VIRTIO_BALLOON_F_PAGE_POISON, X is zero, otherwise it is poison_val."[...]> > With the VIRTIO_BALLOON_F_PAGE_POISON we could make it so that when > the page comes out of the balloon it is either unmodified or it is > poison_val. Without the VIRTIO_BALLOON_F_PAGE_POISON feature present > you cannot make that guarantee and are stuck with the page being > treated as either unmodified or uninitialized memory.While it would be possible, I doubt it will be easy to implement, and I still wonder if we should really care about optimizing an undocumented feature that takes three people to figure out the semantics.> >> The current QEMU implementation would be to simply migrate all hinted >> pages. In the future we could optimize. Not sure if it's worth the trouble. > > So the trick for me is how best to go about sorting this all out. > There are two ways I see of doing it. > > The first approach would be to just assume that hinting should be > disassociated from poisoning. If I go that route that would more > closely match up with what happened in QEMU. The downside is that it > locks in the unmodified/uninitialized behavior and would require pages > to be rewritten when they come out of the balloon. However this is the > behavior we have now so it would only require specification > documentation changes.Right now, for simplicity, I prefer this and define VIRTIO_BALLOON_F_PAGE_POISON only for explicit deflation (balloon deflation via the deflate queue) and implicit deflation (VIRTIO_BALLOON_F_REPORTING). Let's see if Michael has another opinion. -- Thanks, David / dhildenb