Michael S. Tsirkin
2017-Jul-30 04:22 UTC
[PATCH v12 5/8] virtio-balloon: VIRTIO_BALLOON_F_SG
On Sat, Jul 29, 2017 at 08:47:08PM +0800, Wei Wang wrote:> On 07/29/2017 07:08 AM, Michael S. Tsirkin wrote: > > On Thu, Jul 27, 2017 at 10:50:11AM +0800, Wei Wang wrote: > > > > > > OK I thought this over. While we might need these new APIs in > > > > > > the future, I think that at the moment, there's a way to implement > > > > > > this feature that is significantly simpler. Just add each s/g > > > > > > as a separate input buffer. > > > > > Should it be an output buffer? > > > > Hypervisor overwrites these pages with zeroes. Therefore it is > > > > writeable by device: DMA_FROM_DEVICE. > > > Why would the hypervisor need to zero the buffer? > > The page is supplied to hypervisor and can lose the value that > > is there. That is the definition of writeable by device. > > I think for the free pages, it should be clear that they will be added as > output buffer to the device, because (as we discussed) they are just hints, > and some of them may be used by the guest after the report_ API is invoked. > The device/hypervisor should not use or discard them.Discarding contents is exactly what you propose doing if migration is going on, isn't it?> For the balloon pages, I kind of agree with the existing implementation > (e.g. inside tell_host()), which uses virtqueue_add_outbuf (instead of > _add_inbuf()).This is because it does not pass SGs, it passes weirdly formatted PA within the buffer.> I think inbuf should be a buffer which will be written by the device and > read by the > driver.If hypervisor can change it, it's an inbuf. Should not matter whether driver reads it.> The cmd buffer put on the vq for the device to send commands can be > an > inbuf, I think. > > > > > > > I think it may only > > > need to read out the info(base,size). > > And then do what? > > > For the free pages, the info will be used to clear the corresponding "1" in > the dirty bitmap. > For balloon pages, they will be made DONTNEED and given to other host > processes to > use (the device won't write them, so no need to set "write" when > virtqueue_map_desc() in > the device). > > > > > > > I think it should be like this: > > > the cmd hdr buffer: input, because the hypervisor would write it to > > > send a cmd to the guest > > > the payload buffer: output, for the hypervisor to read the info > > These should be split. > > > > We have: > > > > 1. request that hypervisor sends to guest, includes ID: input > > 2. synchronisation header with ID sent by guest: output > > 3. list of pages: input > > > > 2 and 3 must be on the same VQ. 1 can come on any VQ - reusing stats VQ > > might make sense. > > I would prefer to make the above item 1 come on the the free page vq, > because the existing stat_vq doesn't support the cmd hdr. > Now, I think it is also not necessary to move the existing stat_vq > implementation to > a new implementation under the cmd hdr, because > that new support doesn't make a difference for stats, it will still use its > stat_vq (the free > page vq will be used for reporting free pages only) > > What do you think? > > > Best, > Wei
On Sunday, July 30, 2017 12:23 PM, Michael S. Tsirkin wrote:> On Sat, Jul 29, 2017 at 08:47:08PM +0800, Wei Wang wrote: > > On 07/29/2017 07:08 AM, Michael S. Tsirkin wrote: > > > On Thu, Jul 27, 2017 at 10:50:11AM +0800, Wei Wang wrote: > > > > > > > OK I thought this over. While we might need these new APIs > > > > > > > in the future, I think that at the moment, there's a way to > > > > > > > implement this feature that is significantly simpler. Just > > > > > > > add each s/g as a separate input buffer. > > > > > > Should it be an output buffer? > > > > > Hypervisor overwrites these pages with zeroes. Therefore it is > > > > > writeable by device: DMA_FROM_DEVICE. > > > > Why would the hypervisor need to zero the buffer? > > > The page is supplied to hypervisor and can lose the value that is > > > there. That is the definition of writeable by device. > > > > I think for the free pages, it should be clear that they will be added > > as output buffer to the device, because (as we discussed) they are > > just hints, and some of them may be used by the guest after the report_ API is > invoked. > > The device/hypervisor should not use or discard them. > > Discarding contents is exactly what you propose doing if migration is going on, > isn't it?That's actually a different concept. Please let me explain it with this example: The hypervisor receives the hint saying the guest PageX is a free page, but as we know, after that report_ API exits, the guest kernel may take PageX to use, so PageX is not free page any more. At this time, if the hypervisor writes to the page, that would crash the guest. So, I think the cornerstone of this work is that the hypervisor should not touch the reported pages. Best, Wei
Michael S. Tsirkin
2017-Jul-30 16:18 UTC
[PATCH v12 5/8] virtio-balloon: VIRTIO_BALLOON_F_SG
On Sun, Jul 30, 2017 at 05:59:17AM +0000, Wang, Wei W wrote:> On Sunday, July 30, 2017 12:23 PM, Michael S. Tsirkin wrote: > > On Sat, Jul 29, 2017 at 08:47:08PM +0800, Wei Wang wrote: > > > On 07/29/2017 07:08 AM, Michael S. Tsirkin wrote: > > > > On Thu, Jul 27, 2017 at 10:50:11AM +0800, Wei Wang wrote: > > > > > > > > OK I thought this over. While we might need these new APIs > > > > > > > > in the future, I think that at the moment, there's a way to > > > > > > > > implement this feature that is significantly simpler. Just > > > > > > > > add each s/g as a separate input buffer. > > > > > > > Should it be an output buffer? > > > > > > Hypervisor overwrites these pages with zeroes. Therefore it is > > > > > > writeable by device: DMA_FROM_DEVICE. > > > > > Why would the hypervisor need to zero the buffer? > > > > The page is supplied to hypervisor and can lose the value that is > > > > there. That is the definition of writeable by device. > > > > > > I think for the free pages, it should be clear that they will be added > > > as output buffer to the device, because (as we discussed) they are > > > just hints, and some of them may be used by the guest after the report_ API is > > invoked. > > > The device/hypervisor should not use or discard them. > > > > Discarding contents is exactly what you propose doing if migration is going on, > > isn't it? > > That's actually a different concept. Please let me explain it with this example: > > The hypervisor receives the hint saying the guest PageX is a free page, but as we know, > after that report_ API exits, the guest kernel may take PageX to use, so PageX is not free > page any more. At this time, if the hypervisor writes to the page, that would crash the guest. > So, I think the cornerstone of this work is that the hypervisor should not touch the > reported pages. > > Best, > WeiThat's a hypervisor implementation detail. From guest point of view, discarding contents can not be distinguished from writing old contents.