David Hildenbrand
2020-Apr-17 10:09 UTC
[virtio-dev] Re: [PATCH] virtio-balloon: Disable free page hinting/reporting if page poison is disabled
> What do you call "hinting ends" though? The fact we put > a page in the VQ is not a guarantee that it's been consumed > by the hypervisor. >I'd say hinting ends once the hypervisor sets FREE_PAGE_REPORT_S_DONE.> > I think a strict definition is this: > - hint includes a command ID > - hint implies "page was unused at some point after guest reading command ID" > > > Hypervisor can use dirty tracking tricks to get from that to > "page is unused at the moment". > >> Whereby X is >> currently assumed to be 0, correct? > > > > Now we are talking about what's safe to do with the page. > > If POISON flag is set by hypervisor but clear by guest, > or poison_val is 0, then it's clear it's safe to blow > away the page if we can figure out it's unused. > > Otherwise, it's much less clear :)Hah! Agreed :D> > > > I'll have to come back and re-read the rest next week, this > is complex stuff and I'm too rushed with other things today.Yeah, I'm also loaded with other stuff. Maybe Alex has time to understand the details as well. -- Thanks, David / dhildenb
Michael S. Tsirkin
2020-Apr-17 10:19 UTC
[virtio-dev] Re: [PATCH] virtio-balloon: Disable free page hinting/reporting if page poison is disabled
On Fri, Apr 17, 2020 at 12:09:38PM +0200, David Hildenbrand wrote:> > What do you call "hinting ends" though? The fact we put > > a page in the VQ is not a guarantee that it's been consumed > > by the hypervisor. > > > > I'd say hinting ends once the hypervisor sets FREE_PAGE_REPORT_S_DONE.Can't find that one anywhere. what did I miss?> > > > I think a strict definition is this: > > - hint includes a command ID > > - hint implies "page was unused at some point after guest reading command ID" > > > > > > Hypervisor can use dirty tracking tricks to get from that to > > "page is unused at the moment". > > > >> Whereby X is > >> currently assumed to be 0, correct? > > > > > > > > Now we are talking about what's safe to do with the page. > > > > If POISON flag is set by hypervisor but clear by guest, > > or poison_val is 0, then it's clear it's safe to blow > > away the page if we can figure out it's unused. > > > > Otherwise, it's much less clear :) > > Hah! Agreed :D > > > > > > > > > I'll have to come back and re-read the rest next week, this > > is complex stuff and I'm too rushed with other things today. > > Yeah, I'm also loaded with other stuff. Maybe Alex has time to > understand the details as well. > > -- > Thanks, > > David / dhildenb
David Hildenbrand
2020-Apr-17 10:26 UTC
[virtio-dev] Re: [PATCH] virtio-balloon: Disable free page hinting/reporting if page poison is disabled
On 17.04.20 12:19, Michael S. Tsirkin wrote:> On Fri, Apr 17, 2020 at 12:09:38PM +0200, David Hildenbrand wrote: >> > What do you call "hinting ends" though? The fact we put >>> a page in the VQ is not a guarantee that it's been consumed >>> by the hypervisor. >>> >> >> I'd say hinting ends once the hypervisor sets FREE_PAGE_REPORT_S_DONE. > > Can't find that one anywhere. what did I miss?Sorry, the QEMU implementation is confusing. FREE_PAGE_REPORT_S_DONE is translated to VIRTIO_BALLOON_CMD_ID_DONE QEMU: hw/virtio/virtio-balloon.c:virtio_balloon_free_page_report_notify() -> virtio_balloon_free_page_done(dev) -> s->free_page_report_status = FREE_PAGE_REPORT_S_DONE; virtio_notify_config(vdev); When the guest reads the config hw/virtio/virtio-balloon.c:virtio_balloon_get_config() -> if (dev->free_page_report_status == FREE_PAGE_REPORT_S_DONE) -> config.free_page_report_cmd_id = ... VIRTIO_BALLOON_CMD_ID_DONE Linux: drivers/virtio/virtio_balloon.c:report_free_page_func() -> if (cmd_id_received == VIRTIO_BALLOON_CMD_ID_DONE) { -> return_free_pages_to_mm() So it's VIRTIO_BALLOON_CMD_ID_DONE. -- Thanks, David / dhildenb
Alexander Duyck
2020-Apr-17 16:22 UTC
[virtio-dev] Re: [PATCH] virtio-balloon: Disable free page hinting/reporting if page poison is disabled
On Fri, Apr 17, 2020 at 3:09 AM David Hildenbrand <david at redhat.com> wrote:> > > What do you call "hinting ends" though? The fact we put > > a page in the VQ is not a guarantee that it's been consumed > > by the hypervisor. > > > > I'd say hinting ends once the hypervisor sets FREE_PAGE_REPORT_S_DONE.The key bit to this is that there are 4 states, and quasi unlimited command IDs, although I believe the first 2 are matched up to the states. So the VIRTIO_BALLOON_CMD_ID_DONE is matched up with FREE_PAGE_REPORT_S_DONE, and CMD_ID_STOP with S_STOP, but really all it means is that we are done with the current epoch so we need to flush the memory and move on. The state is more important to the hypervisor as it will switch to "STOP" while it is synching the dirty bits, "REQUESTED" once that has been completed and it will increment the command ID, "START" once the first hint is received with the matching command ID, and "DONE" once the migration is complete. As long as it is in the "START" state and the command ID in the hint matches it will use the information to clear the dirty bits as it runs in parallel with the migration task. The piece I think I was missing was that the balloon is staying (mostly) inflated until the migration is complete. If that is the case then I suppose we could leave this enabled at least on the guest side, assuming the balloon doesn't give back too many pages when it hits the shrinker.> > > > I think a strict definition is this: > > - hint includes a command ID > > - hint implies "page was unused at some point after guest reading command ID" > > > > > > Hypervisor can use dirty tracking tricks to get from that to > > "page is unused at the moment". > > > >> Whereby X is > >> currently assumed to be 0, correct? > > > > > > > > Now we are talking about what's safe to do with the page. > > > > If POISON flag is set by hypervisor but clear by guest, > > or poison_val is 0, then it's clear it's safe to blow > > away the page if we can figure out it's unused. > > > > Otherwise, it's much less clear :) > > Hah! Agreed :DThat isn't quite true. The problem is in the case of hinting it isn't setting the page to 0. It is simply not migrating it. So if there is data from an earlier pass it is stuck at that value. So the balloon will see the poison/init on some pages cleared, however I suppose the balloon doesn't care about the contents of the page. For the pages that are leaked back out via the shrinker they will be dirtied so they will end up being migrated with the correct value eventually.> > I'll have to come back and re-read the rest next week, this > > is complex stuff and I'm too rushed with other things today. > > Yeah, I'm also loaded with other stuff. Maybe Alex has time to > understand the details as well.So after looking it over again it makes a bit more sense why this hasn't caused any issues so far, and I can see why the poison enabled setup and hinting can work. The problem is I am left wondering what assumptions we are allowed to leave in place. 1. Can we assume that we don't care about the contents in the pages in the balloon changing? 2. Can we assume that the guest will always rewrite the page after the deflate in the case of init_on_free or poison? 3. Can we assume that free page hinting will always function as a balloon setup, so no moving it over to a page reporting type setup? If we assume the above 3 items then there isn't any point in worrying about poison when it comes to free page hinting. It doesn't make sense to since they essentially negate it. As such I could go through this patch and the QEMU patches and clean up any associations since the to are not really tied together in any way.
David Hildenbrand
2020-Apr-20 13:28 UTC
[virtio-dev] Re: [PATCH] virtio-balloon: Disable free page hinting/reporting if page poison is disabled
>>> Now we are talking about what's safe to do with the page. >>> >>> If POISON flag is set by hypervisor but clear by guest, >>> or poison_val is 0, then it's clear it's safe to blow >>> away the page if we can figure out it's unused. >>> >>> Otherwise, it's much less clear :) >> >> Hah! Agreed :D > > That isn't quite true. The problem is in the case of hinting it isn't > setting the page to 0. It is simply not migrating it. So if there is > data from an earlier pass it is stuck at that value. So the balloon > will see the poison/init on some pages cleared, however I suppose the > balloon doesn't care about the contents of the page. For the pages > that are leaked back out via the shrinker they will be dirtied so they > will end up being migrated with the correct value eventually.Right, I think current Linux guests are fine. The critical piece we are talking about is 1) Guest balloon allocates and hints a page 2) Hypervisor does not process hinting request 3) Guest frees the page and reuses it (especially, writes it). 4) Hypervisor processes the hinting request. AFAIU, as soon as the guest writes the page (e.g., zeroes it/poisons it in the buddy, or somebody who allocated it), the page *will* get migrated, even if 4) happens after 3). That's guaranteed by the 2-bitmap magic. Now, assume the following happens (in some future Linux version) (due to your "simply not migrating it" comment): 1) Guest balloon allocates and hints a page. Assume the page is zero due to want_init_on_free(). 2) Hypervisor processes the hinting request. 3) Guest frees the page. Assume we are implementing some magic to "skip" zeroing, as we assume it is still zero. Due to 2), the page won't get migrated. In 3) we expect the page to be 0. QEMU would have to make sure that we always get either the original, or a zero page on the destination. Otherwise, this smells like data corruption.> >>> I'll have to come back and re-read the rest next week, this >>> is complex stuff and I'm too rushed with other things today. >> >> Yeah, I'm also loaded with other stuff. Maybe Alex has time to >> understand the details as well. > > So after looking it over again it makes a bit more sense why this > hasn't caused any issues so far, and I can see why the poison enabled > setup and hinting can work. The problem is I am left wondering what > assumptions we are allowed to leave in place. > > 1. Can we assume that we don't care about the contents in the pages in > the balloon changing?I think, we should define valid ways for the hypervisor to change it. "Pages hinted via VIRTIO_BALLOON_F_FREE_PAGE_HINT might get replaced by a zero page. However, as soon as the page is written by the guest (even before the hinting request was processed by the host), the modified page will stay - whereby the unwritten parts might either be from the old, or from the zero page." I think the debatable part is "whereby the unwritten parts might either be from the old, or from the zero page". AFAIU, you think it could happen in QEMU, that we have neither the old, nor the zero page, but instead some previous content. The question is if that's valid, or if that's a BUG in QEMU. If it's valid, we can do no optimizations in the guest (e.g., skip zeroing in the buddy). And I agree that this smells like "data corruption" as Michael said.> 2. Can we assume that the guest will always rewrite the page after the > deflate in the case of init_on_free or poison?Depends on what we think is the right way to do - IOW if we think "some other content" as mentioned above is a BUG or not.> 3. Can we assume that free page hinting will always function as a > balloon setup, so no moving it over to a page reporting type setup?I think we have to define the valid semantics. That implies what would be valid to do with it. Unfortunately, we have to reverse-engineer here.> > If we assume the above 3 items then there isn't any point in worrying > about poison when it comes to free page hinting. It doesn't make sense > to since they essentially negate it. As such I could go through this > patch and the QEMU patches and clean up any associations since the to > are not really tied together in any way.The big question is, if we want to support VIRTIO_BALLOON_F_PAGE_POISON with free page hinting. e.g.,: "Pages hinted via VIRTIO_BALLOON_F_FREE_PAGE_HINT might get replaced by a page full of X. However, as soon as the page is written by the guest (even before the hinting request was processed by the host), the modified page will stay - whereby the unwritten parts might either be from the old, or from a page filled with X. Without VIRTIO_BALLOON_F_PAGE_POISON, X is zero, otherwise it is poison_val." The current QEMU implementation would be to simply migrate all hinted pages. In the future we could optimize. Not sure if it's worth the trouble. -- Thanks, David / dhildenb