Christian Borntraeger
2018-Dec-27 11:31 UTC
[PATCH v37 0/3] Virtio-balloon: support free page reporting
This patch triggers random crashes in the guest kernel on s390 early during boot.
No migration and no setting of the balloon is involved.

On 27.08.2018 03:32, Wei Wang wrote:
> The new feature, VIRTIO_BALLOON_F_FREE_PAGE_HINT, implemented by this
> series enables the virtio-balloon driver to report hints of guest free
> pages to host. It can be used to accelerate virtual machine (VM) live
> migration. Here is an introduction of this usage:
>
> Live migration needs to transfer the VM's memory from the source machine
> to the destination round by round. For the 1st round, all the VM's memory
> is transferred. From the 2nd round, only the pieces of memory that were
> written by the guest (after the 1st round) are transferred. One method
> that is popularly used by the hypervisor to track which part of memory is
> written is to have the hypervisor write-protect all the guest memory.
>
> This feature enables the optimization by skipping the transfer of guest
> free pages during VM live migration. It is not concerned that the memory
> pages are used after they are given to the hypervisor as a hint of the
> free pages, because they will be tracked by the hypervisor and transferred
> in the subsequent round if they are used and written.
>
> * Tests
> 1 Test Environment
>     Host: Intel(R) Xeon(R) CPU E5-2699 v4 @ 2.20GHz
>     Migration setup: migrate_set_speed 100G, migrate_set_downtime 400ms
>
> 2 Test Results (results are averaged over several repeated runs)
>     2.1 Guest setup: 8G RAM, 4 vCPU
>         2.1.1 Idle guest live migration time
>             Optimization v.s. Legacy = 620ms vs 2970ms
>             --> ~79% reduction
>         2.1.2 Guest live migration with Linux compilation workload
>           (i.e. make bzImage -j4) running
>           1) Live Migration Time:
>              Optimization v.s. Legacy = 2273ms v.s. 4502ms
>              --> ~50% reduction
>           2) Linux Compilation Time:
>              Optimization v.s. Legacy = 8min42s v.s. 8min43s
>              --> no obvious difference
>
>     2.2 Guest setup: 128G RAM, 4 vCPU
>         2.2.1 Idle guest live migration time
>             Optimization v.s. Legacy = 5294ms vs 41651ms
>             --> ~87% reduction
>         2.2.2 Guest live migration with Linux compilation workload
>           1) Live Migration Time:
>             Optimization v.s. Legacy = 8816ms v.s. 54201ms
>             --> 84% reduction
>           2) Linux Compilation Time:
>              Optimization v.s. Legacy = 8min30s v.s. 8min36s
>              --> no obvious difference
>
> ChangeLog:
> v36->v37:
>     - free the reported pages to mm when receives a DONE cmd from host.
>       Please see patch 1's commit log for reasons. Please see patch 1's
>       commit for detailed explanations.
>
> For ChangeLogs from v22 to v36, please reference
> https://lkml.org/lkml/2018/7/20/199
>
> For ChangeLogs before v21, please reference
> https://lwn.net/Articles/743660/
>
> Wei Wang (3):
>   virtio-balloon: VIRTIO_BALLOON_F_FREE_PAGE_HINT
>   mm/page_poison: expose page_poisoning_enabled to kernel modules
>   virtio-balloon: VIRTIO_BALLOON_F_PAGE_POISON
>
>  drivers/virtio/virtio_balloon.c     | 374 ++++++++++++++++++++++++++++++++----
>  include/uapi/linux/virtio_balloon.h |   8 +
>  mm/page_poison.c                    |   6 +
>  3 files changed, 355 insertions(+), 33 deletions(-)
>
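The round-by-round transfer described in the quoted cover letter can be sketched as a toy model. Every name here (struct migration, NPAGES, free_page_hint, guest_write, migration_round) is hypothetical; this is neither QEMU nor kernel code. It only illustrates why it is harmless for the guest to reuse a page after hinting it as free: a later write marks the page for transfer again.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define NPAGES 16 /* toy guest size, hypothetical */

/* Toy model of the hypervisor's per-page "still needs transfer" state. */
struct migration {
	bool dirty[NPAGES];
};

/* Before round 1, every page of guest memory is pending transfer. */
static void migration_start(struct migration *m)
{
	for (size_t i = 0; i < NPAGES; i++)
		m->dirty[i] = true;
}

/* The guest hints that pages [first, first + count) are free: skip them. */
static void free_page_hint(struct migration *m, size_t first, size_t count)
{
	assert(first <= NPAGES && count <= NPAGES - first);
	for (size_t i = first; i < first + count; i++)
		m->dirty[i] = false;
}

/* Write tracking: a page written by the guest has to be sent (again). */
static void guest_write(struct migration *m, size_t page)
{
	assert(page < NPAGES);
	m->dirty[page] = true;
}

/* One round: "send" every pending page and return how many were sent. */
static size_t migration_round(struct migration *m)
{
	size_t sent = 0;

	for (size_t i = 0; i < NPAGES; i++) {
		if (m->dirty[i]) {
			m->dirty[i] = false;
			sent++;
		}
	}
	return sent;
}
```

In this model, hinting 8 of the 16 pages as free and then writing to one of the hinted pages before the first round results in 9 pages being sent instead of 16, and the reused page is not lost.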
Christian Borntraeger
2018-Dec-27 11:59 UTC
[PATCH v37 0/3] Virtio-balloon: support free page reporting
On 27.12.2018 12:31, Christian Borntraeger wrote:
> This patch triggers random crashes in the guest kernel on s390 early during boot.
> No migration and no setting of the balloon is involved.
>

Adding Conny and Halil,

As QEMU provides no PAGE_HINT feature yet, this quick hack makes the
guest boot fine again:


diff --git a/drivers/virtio/virtio_balloon.c b/drivers/virtio/virtio_balloon.c
index 728ecd1eea305..aa2e1864c5736 100644
--- a/drivers/virtio/virtio_balloon.c
+++ b/drivers/virtio/virtio_balloon.c
@@ -492,7 +492,7 @@ static int init_vqs(struct virtio_balloon *vb)
 		callbacks[VIRTIO_BALLOON_VQ_FREE_PAGE] = NULL;
 	}
 
-	err = vb->vdev->config->find_vqs(vb->vdev, VIRTIO_BALLOON_VQ_MAX,
+	err = vb->vdev->config->find_vqs(vb->vdev, 3, //VIRTIO_BALLOON_VQ_MAX,
 					 vqs, callbacks, names, NULL, NULL);
 	if (err)
 		return err;


To me it looks like virtio_ccw_find_vqs will abort if any of the virtqueues
that it has been asked for does not exist (including the earlier ones).

Christian
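The suspected failure mode can be sketched in userspace. This is a model under stated assumptions, not the real virtio_ccw_find_vqs: find_vqs_strict, find_vqs_skip_unnamed, balloon_vq_names and the device bitmask are all hypothetical, and the shortened enum and feature names only mirror the balloon driver's virtqueue indices and feature bits. The strict variant fails the whole lookup when any requested index is missing, which is the behaviour suspected above; the second variant shows one possible alternative, skipping queues whose name is NULL.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Virtqueue indices, mirroring the balloon driver's enum. */
enum balloon_vq { VQ_INFLATE, VQ_DEFLATE, VQ_STATS, VQ_FREE_PAGE, VQ_MAX };

/* Feature bit numbers (stats queue and free page hint). */
#define F_STATS_VQ       1
#define F_FREE_PAGE_HINT 3

/* Name only the queues whose feature was negotiated; NULL means unused. */
static void balloon_vq_names(uint64_t features, const char *names[VQ_MAX])
{
	names[VQ_INFLATE] = "inflate";
	names[VQ_DEFLATE] = "deflate";
	names[VQ_STATS] =
		(features >> F_STATS_VQ) & 1 ? "stats" : NULL;
	names[VQ_FREE_PAGE] =
		(features >> F_FREE_PAGE_HINT) & 1 ? "free_page_vq" : NULL;
}

/*
 * Hypothetical strict transport: tries every index below nvqs and fails
 * the whole call on the first queue the device does not offer, even if
 * the driver left names[i] as NULL.
 */
static int find_vqs_strict(uint32_t device_vq_mask, unsigned int nvqs,
			   const char *const *names)
{
	(void)names; /* ignored on purpose: this is the suspected problem */
	assert(nvqs <= 32);
	for (unsigned int i = 0; i < nvqs; i++)
		if (!(device_vq_mask & (1u << i)))
			return -1;
	return 0;
}

/* Hypothetical lenient transport: unnamed queues are never looked up. */
static int find_vqs_skip_unnamed(uint32_t device_vq_mask, unsigned int nvqs,
				 const char *const *names)
{
	assert(nvqs <= 32);
	for (unsigned int i = 0; i < nvqs; i++) {
		if (!names[i])
			continue;
		if (!(device_vq_mask & (1u << i)))
			return -1;
	}
	return 0;
}
```

With a device that offers only queues 0 to 2 (mask 0x7) and no free page hint feature, the strict variant fails for nvqs == VQ_MAX and succeeds for nvqs == 3, which matches the effect of the quick hack. The lenient variant succeeds with VQ_MAX because the free page queue has no name.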
Christian Borntraeger
2018-Dec-27 12:17 UTC
[PATCH v37 0/3] Virtio-balloon: support free page reporting
On 27.12.2018 12:59, Christian Borntraeger wrote:
> On 27.12.2018 12:31, Christian Borntraeger wrote:
>> This patch triggers random crashes in the guest kernel on s390 early during boot.
>> No migration and no setting of the balloon is involved.
>>
>
> Adding Conny and Halil,
>
> As QEMU provides no PAGE_HINT feature yet, this quick hack makes the
> guest boot fine again:
>
>
> diff --git a/drivers/virtio/virtio_balloon.c b/drivers/virtio/virtio_balloon.c
> index 728ecd1eea305..aa2e1864c5736 100644
> --- a/drivers/virtio/virtio_balloon.c
> +++ b/drivers/virtio/virtio_balloon.c
> @@ -492,7 +492,7 @@ static int init_vqs(struct virtio_balloon *vb)
>  		callbacks[VIRTIO_BALLOON_VQ_FREE_PAGE] = NULL;
>  	}
>  
> -	err = vb->vdev->config->find_vqs(vb->vdev, VIRTIO_BALLOON_VQ_MAX,
> +	err = vb->vdev->config->find_vqs(vb->vdev, 3, //VIRTIO_BALLOON_VQ_MAX,
>  					 vqs, callbacks, names, NULL, NULL);
>  	if (err)
>  		return err;
>
>
> To me it looks like virtio_ccw_find_vqs will abort if any of the virtqueues
> that it has been asked for does not exist (including the earlier ones).
>

This "hack" makes the random crashes go away, but the balloon interface itself
does not work (setting the value to anything will hang the guest).
As patch 1 also modifies the main path, there seem to be additional issues,
maybe endianness.

Looking at things like

+	vb->cmd_id_received = VIRTIO_BALLOON_CMD_ID_STOP;
+	vb->cmd_id_active = cpu_to_virtio32(vb->vdev,
+					  VIRTIO_BALLOON_CMD_ID_STOP);
+	vb->cmd_id_stop = cpu_to_virtio32(vb->vdev,
+					  VIRTIO_BALLOON_CMD_ID_STOP);

Why is cmd_id_received not using cpu_to_virtio32?
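To make the endianness concern concrete, here is a minimal userspace sketch of what a CPU-to-device conversion does. It is a model, not the kernel helper: model_cpu_to_virtio32 and bswap32 are hypothetical names, the host byte order is passed in explicitly so that both cases can be exercised on any machine, and the command id value 1 is only an example, since the thread does not give the numeric value of VIRTIO_BALLOON_CMD_ID_STOP.

```c
#include <assert.h> /* only needed by callers that check the results */
#include <stdbool.h>
#include <stdint.h>

/* Reverse the byte order of a 32-bit value. */
static uint32_t bswap32(uint32_t v)
{
	return (v >> 24) | ((v >> 8) & 0x0000ff00u) |
	       ((v << 8) & 0x00ff0000u) | (v << 24);
}

/*
 * Model of a CPU-order to device-order conversion. If host and device
 * agree on byte order the value is unchanged, otherwise it is swapped.
 * A big-endian host such as s390 talking to a little-endian device is
 * the (host_le = false, device_le = true) case.
 */
static uint32_t model_cpu_to_virtio32(bool host_le, bool device_le,
				      uint32_t v)
{
	return host_le == device_le ? v : bswap32(v);
}
```

In this model a field stored in CPU order and a field stored in device order hold different bit patterns for the same nonzero id on a big-endian host: 1 versus 0x01000000. Comparing or copying between the two without a conversion would therefore go wrong there, while a value of 0 looks the same in both orders and would hide the inconsistency.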