Michael pointed out that the virtio-vsock draft specification does not
address live migration and in fact currently precludes migration.

Migration is fundamental so the device specification at least mustn't
preclude it. Having brainstormed migration with Matthew Benjamin and
Michael Tsirkin, I am now summarizing the approach that I want to
include in the next draft specification.

Feedback and comments welcome! In the meantime I will implement this in
code and update the draft specification.

1. Requirements

Virtio-vsock is a new AF_VSOCK transport. As such, it should provide at
least the same guarantees as the existing AF_VSOCK VMCI transport. This
is for consistency and to allow code reuse across any AF_VSOCK
transport.

Virtio-vsock aims to replace virtio-serial by providing the same
guest/host communication ability but with sockets API semantics that are
more popular and convenient for application developers. Therefore
virtio-vsock migration should provide at least the same level of
migration functionality as virtio-serial.

Ideally it should be possible to migrate applications using AF_VSOCK
together with the virtual machine so that guest<->host communication is
uninterrupted. Neither AF_VSOCK VMCI nor virtio-serial support this
today.

2. Basic disruptive migration flow

When the virtual machine migrates from the source host to the
destination host, the guest's CID may change. The CID namespace is
host-wide so other hosts may have CID collisions and allocate a new CID
for incoming migration VMs.

The device notifies the guest that the CID has changed. Guest sockets
are affected as follows:

 * Established connections are reset (ECONNRESET) and the guest
   application will have to reconnect.

 * Listen sockets remain open. The only thing to note is that
   connections from the host are now made to the new CID. This means
   the local address of the listen socket is automatically updated to
   the new CID.

 * Sockets in other states are unchanged.

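The three rules for guest sockets can be modeled in a few lines. This is only a sketch of the behaviour described in the list, not driver code: the State enum, the GuestSocket record and on_guest_cid_changed() are hypothetical names invented for the illustration.

```python
import errno
from dataclasses import dataclass
from enum import Enum, auto


class State(Enum):
    LISTEN = auto()
    CONNECTING = auto()
    ESTABLISHED = auto()
    CLOSED = auto()


@dataclass
class GuestSocket:
    """Hypothetical stand-in for a guest-side AF_VSOCK socket."""
    state: State
    local_cid: int
    local_port: int
    pending_error: int = 0  # errno the application would see next


def on_guest_cid_changed(sockets, new_cid):
    """Apply the CID update notification to every guest socket."""
    for sock in sockets:
        if sock.state is State.ESTABLISHED:
            # Established connections are reset.
            sock.state = State.CLOSED
            sock.pending_error = errno.ECONNRESET
        elif sock.state is State.LISTEN:
            # Listen sockets stay open; their local address follows the CID.
            sock.local_cid = new_cid
        # Sockets in any other state are left untouched.


sockets = [
    GuestSocket(State.ESTABLISHED, local_cid=3, local_port=1234),
    GuestSocket(State.LISTEN, local_cid=3, local_port=9999),
    GuestSocket(State.CONNECTING, local_cid=3, local_port=1235),
]
on_guest_cid_changed(sockets, new_cid=7)
```

After the call, the established socket is closed with ECONNRESET pending, the listen socket is bound to CID 7, and the connecting socket is exactly as it was.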
Applications must handle disruptive migration by reconnecting if
necessary after ECONNRESET.

3. Checkpoint/restore for seamless migration

Applications that wish to communicate across live migration can do so
but this requires extra application-specific checkpoint/restore code.

This is similar to the approach taken by the CRIU project where
getsockopt()/setsockopt() is used to migrate socket state. The
difference is that the application process is not automatically migrated
from the source host to the destination host. Therefore, the
application needs to migrate its own state somehow.

The flow is as follows:

The application on the source host must quiesce (stop sending/receiving)
and use getsockopt() to extract socket state information from the host
kernel.

A new instance of the application is started on the destination host and
given the state so it can restore the connection. The setsockopt()
syscall is used to restore socket state information.

The guest is given a list of <host_old_cid, host_new_cid, host_port,
guest_port> tuples for established connections that must not be reset
when the guest CID update notification is received. These connections
will carry on as if nothing changed.

Note that the connection's remote address is updated from host_old_cid
to host_new_cid. This allows remapping of CIDs (if necessary).
Typically this will be unused because the host always has well-known CID
2. In a guest<->guest scenario it may be used to remap CIDs.


For the time being I am focussing on the basic disruptive migration flow
only. Checkpoint/restore can be added with a feature bit in the future.
It is a lot more complex and I'm not sure whether there will be any
users yet.

Stefan
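To make the tuple list from section 3 concrete, here is a small sketch of how a guest might split its established connections into "kept" and "reset" when the CID update arrives. KeepEntry, Connection and apply_keep_list() are hypothetical names. The getsockopt()/setsockopt() options themselves are not modeled because the proposal does not define them.

```python
from dataclasses import dataclass, replace
from typing import NamedTuple


class KeepEntry(NamedTuple):
    """One <host_old_cid, host_new_cid, host_port, guest_port> tuple."""
    host_old_cid: int
    host_new_cid: int
    host_port: int
    guest_port: int


@dataclass(frozen=True)
class Connection:
    """Hypothetical guest-side view of an established connection."""
    remote_cid: int
    remote_port: int
    local_port: int


def apply_keep_list(connections, keep_list):
    """Return (kept, reset): kept connections get their remote CID remapped."""
    remap = {
        (e.host_old_cid, e.host_port, e.guest_port): e.host_new_cid
        for e in keep_list
    }
    kept, reset = [], []
    for conn in connections:
        key = (conn.remote_cid, conn.remote_port, conn.local_port)
        if key in remap:
            # Listed connection: carry on, with the remote address updated.
            kept.append(replace(conn, remote_cid=remap[key]))
        else:
            # Not listed: falls back to the disruptive flow (ECONNRESET).
            reset.append(conn)
    return kept, reset


connections = [
    Connection(remote_cid=2, remote_port=1024, local_port=5000),  # host peer
    Connection(remote_cid=5, remote_port=80, local_port=6000),    # guest peer
    Connection(remote_cid=2, remote_port=2048, local_port=7000),  # not listed
]
keep_list = [
    KeepEntry(2, 2, 1024, 5000),  # host keeps well-known CID 2
    KeepEntry(5, 9, 80, 6000),    # guest<->guest peer remapped from 5 to 9
]
kept, reset = apply_keep_list(connections, keep_list)
```

The first entry shows the typical case where the remapping is a no-op because the host stays at CID 2. The second shows the guest<->guest case where the peer's CID changes.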
On Thu, Mar 03, 2016 at 03:37:37PM +0000, Stefan Hajnoczi wrote:
> Ideally it should be possible to migrate applications using AF_VSOCK
> together with the virtual machine so that guest<->host communication is
> uninterrupted. Neither AF_VSOCK VMCI nor virtio-serial support this
> today.

I'm not sure why you say this about virtio-serial.
It appears that if the host pre-connected to the destination
qemu before migration, the backend reconnects transparently
on the destination.

> 2. Basic disruptive migration flow
>
> When the virtual machine migrates from the source host to the
> destination host, the guest's CID may change. The CID namespace is
> host-wide

BTW, I think CIDs would have to become per network namespace.

> so other hosts may have CID collisions and allocate a new CID
> for incoming migration VMs.

I guess all this is so that the guest can retrieve its CID and
send it to the host using some side-channel?

> The device notifies the guest that the CID has changed. Guest sockets
> are affected as follows:
>
>  * Established connections are reset (ECONNRESET) and the guest
>    application will have to reconnect.
>
>  * Listen sockets remain open. The only thing to note is that
>    connections from the host are now made to the new CID. This means
>    the local address of the listen socket is automatically updated to
>    the new CID.
>
>  * Sockets in other states are unchanged.
>
> Applications must handle disruptive migration by reconnecting if
> necessary after ECONNRESET.
>
> For the time being I am focussing on the basic disruptive migration flow
> only. Checkpoint/restore can be added with a feature bit in the future.
> It is a lot more complex and I'm not sure whether there will be any
> users yet.
>
> Stefan

This makes some things harder. For example, imagine a guest
reboot mixed with migration. We don't know why the connection
died, so we'll retry connections until - when?

Could you please describe some user of vsock and show how
it recovers from destructive migration?

-- 
MST
On Thu, Mar 03, 2016 at 03:37:37PM +0000, Stefan Hajnoczi wrote:
> Michael pointed out that the virtio-vsock draft specification does not
> address live migration and in fact currently precludes migration.
>
> Migration is fundamental so the device specification at least mustn't
> preclude it. Having brainstormed migration with Matthew Benjamin and
> Michael Tsirkin, I am now summarizing the approach that I want to
> include in the next draft specification.
>
> Feedback and comments welcome! In the meantime I will implement this in
> code and update the draft specification.

Most of the issue seems to be a consequence of using a 4 byte CID.

I think the right thing to do is just to teach guests
about 64 bit CIDs.

For now, can we drop guest CID from guest to host communication completely,
making CID only host-visible? Maybe leave the space
in the packet so we can add CID there later.
It seems that in theory this will allow changing CID
during migration, transparently to the guest.

Guest visible CID is required for guest to guest communication -
but IIUC that is not currently supported.
Maybe that can be made conditional on 64 bit addressing.
Alternatively, it seems much easier to accept that these channels get broken
across migration.
On Fri, Mar 11, 2016 at 01:56:05AM +0200, Michael S. Tsirkin wrote:
> On Thu, Mar 03, 2016 at 03:37:37PM +0000, Stefan Hajnoczi wrote:
> > Ideally it should be possible to migrate applications using AF_VSOCK
> > together with the virtual machine so that guest<->host communication is
> > uninterrupted. Neither AF_VSOCK VMCI nor virtio-serial support this
> > today.
>
> I'm not sure why you say this about virtio-serial.
> It appears that if the host pre-connected to the destination
> qemu before migration, the backend reconnects transparently
> on the destination.

You are right, virtio-serial supports keeping active ports open across
migration (as well as closing active ports across migration).

In virtio-vsock the equivalent would be setsockopt() CRIU-style socket
migration which is not implemented today.

> > 2. Basic disruptive migration flow
> >
> > When the virtual machine migrates from the source host to the
> > destination host, the guest's CID may change. The CID namespace is
> > host-wide
>
> BTW, I think CIDs would have to become per network namespace.

Yes, I agree.

> > so other hosts may have CID collisions and allocate a new CID
> > for incoming migration VMs.
>
> I guess all this is so that the guest can retrieve its CID and
> send it to the host using some side-channel?

Yes.

> > For the time being I am focussing on the basic disruptive migration flow
> > only. Checkpoint/restore can be added with a feature bit in the future.
> > It is a lot more complex and I'm not sure whether there will be any
> > users yet.
> >
> > Stefan
>
> This makes some things harder. For example, imagine a guest
> reboot mixed with migration. We don't know why the connection
> died, so we'll retry connections until - when?
>
> Could you please describe some user of vsock and show how
> it recovers from destructive migration?

qemu-guest-agent runs inside the guest with an AF_VSOCK listen socket.
libvirt arbitrates the qemu-guest-agent connection and provides an API
for applications to send commands.

When an application sends a command, libvirt checks if the connection to
qemu-guest-agent is established. If there is no connection libvirt will
attempt to connect.

The command is sent to qemu-guest-agent and the response is handed back
to the application. libvirt arbitrates access so commands from multiple
applications are serialized.

Live migration resets the established connection between
qemu-guest-agent and the source host's libvirt daemon.

When an application issues the next qemu-guest-agent command the libvirt
daemon on the destination host notices there is no established
connection yet and starts a new one.

Libvirt refuses to send qemu-guest-agent commands while live migration
is in progress.

Stefan
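The recovery pattern described for libvirt and qemu-guest-agent (connect lazily, serialize commands, drop the connection when it is reset, refuse commands during migration) can be sketched as follows. This is not libvirt code. AgentArbiter, FakeAgentConnection and the roundtrip() method are hypothetical names, and the fake connection stands in for a real AF_VSOCK stream so the example runs anywhere.

```python
import threading


class AgentArbiter:
    """Serializes agent commands and connects on demand (illustrative model)."""

    def __init__(self, connect):
        self._connect = connect        # callable that opens a new connection
        self._conn = None              # None means "no established connection"
        self._lock = threading.Lock()  # one command in flight at a time
        self.migrating = False

    def command(self, request):
        with self._lock:
            if self.migrating:
                raise RuntimeError("agent commands are refused during migration")
            if self._conn is None:
                self._conn = self._connect()
            try:
                return self._conn.roundtrip(request)
            except ConnectionResetError:
                # The connection was reset, e.g. by migration. Forget it so
                # that the next command starts a new one.
                self._conn = None
                raise


class FakeAgentConnection:
    """Test double for a stream connection to the guest agent."""

    def __init__(self):
        self.reset = False

    def roundtrip(self, request):
        if self.reset:
            raise ConnectionResetError("connection reset by migration")
        return "ok:" + request


opened = []


def connect():
    conn = FakeAgentConnection()
    opened.append(conn)
    return conn


arbiter = AgentArbiter(connect)
first_reply = arbiter.command("guest-ping")
```

In a real deployment the connect callable would presumably open an AF_VSOCK socket to the guest's CID and agent port. The caller that sees ConnectionResetError decides whether to reissue the command, which matches the "reconnect on the next command" behaviour described above.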
On Mon, Mar 14, 2016 at 01:13:24PM +0200, Michael S. Tsirkin wrote:
> Most of the issue seems to be a consequence of using a 4 byte CID.
>
> I think the right thing to do is just to teach guests
> about 64 bit CIDs.
>
> For now, can we drop guest CID from guest to host communication completely,
> making CID only host-visible? Maybe leave the space
> in the packet so we can add CID there later.
> It seems that in theory this will allow changing CID
> during migration, transparently to the guest.
>
> Guest visible CID is required for guest to guest communication -
> but IIUC that is not currently supported.
> Maybe that can be made conditional on 64 bit addressing.
> Alternatively, it seems much easier to accept that these channels get broken
> across migration.

I reached the conclusion that channels break across migration because:

1. 32-bit CIDs are in sockaddr_vm and we'd break AF_VSOCK ABI by
   changing it to 64-bit. Application code would be specific to
   virtio-vsock and wouldn't work with other AF_VSOCK transports that
   use the 32-bit sockaddr_vm struct.

2. Dropping guest CIDs from the protocol breaks network protocols that
   send addresses. NFS and netperf are the first two protocols I looked
   at and both transmit address information across the connection...
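As a rough illustration of point 1, the sketch below packs an address the way a 32-bit CID field constrains it. The layout is an assumption modeled on Linux's struct sockaddr_vm (16-bit family, 16-bit reserved field, 32-bit port, 32-bit CID, zero padding up to 16 bytes). pack_sockaddr_vm() and fits_in_sockaddr_vm() are hypothetical helpers, not part of any API.

```python
import struct

AF_VSOCK = 40  # address family number on Linux

# Assumed layout: family, reserved, port, cid, then 4 bytes of zero padding.
SOCKADDR_VM = struct.Struct("=HHII4x")


def pack_sockaddr_vm(cid, port):
    """Serialize an address using the assumed 32-bit CID layout."""
    return SOCKADDR_VM.pack(AF_VSOCK, 0, port, cid)


def fits_in_sockaddr_vm(cid):
    """Report whether a CID can be represented in the 32-bit field."""
    try:
        pack_sockaddr_vm(cid, 0)
        return True
    except struct.error:
        return False


host_addr = pack_sockaddr_vm(cid=2, port=1234)
wide_cid_ok = fits_in_sockaddr_vm(2**32)
```

Here host_addr is a 16-byte buffer and wide_cid_ok is False: a CID wider than 32 bits has no representation in this layout, so supporting one would mean a different address structure from the one existing AF_VSOCK applications are built against.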