On Tue, Mar 15, 2016 at 06:12:55PM +0200, Michael S. Tsirkin wrote:
> On Tue, Mar 15, 2016 at 03:15:29PM +0000, Stefan Hajnoczi wrote:
> > On Mon, Mar 14, 2016 at 01:13:24PM +0200, Michael S. Tsirkin wrote:
> > > On Thu, Mar 03, 2016 at 03:37:37PM +0000, Stefan Hajnoczi wrote:
> > > > Michael pointed out that the virtio-vsock draft specification does not
> > > > address live migration and in fact currently precludes migration.
> > > >
> > > > Migration is fundamental, so the device specification at least mustn't
> > > > preclude it.  Having brainstormed migration with Matthew Benjamin and
> > > > Michael Tsirkin, I am now summarizing the approach that I want to
> > > > include in the next draft specification.
> > > >
> > > > Feedback and comments welcome!  In the meantime I will implement this
> > > > in code and update the draft specification.
> > >
> > > Most of the issue seems to be a consequence of using a 4-byte CID.
> > >
> > > I think the right thing to do is just to teach guests
> > > about 64-bit CIDs.
> > >
> > > For now, can we drop the guest CID from guest-to-host communication
> > > completely, making the CID only host-visible?  Maybe leave the space
> > > in the packet so we can add the CID there later.
> > > It seems that in theory this will allow changing the CID
> > > during migration, transparently to the guest.
> > >
> > > A guest-visible CID is required for guest-to-guest communication -
> > > but IIUC that is not currently supported.
> > > Maybe that can be made conditional on 64-bit addressing.
> > > Alternatively, it seems much easier to accept that these channels get
> > > broken across migration.
> >
> > I reached the conclusion that channels break across migration because:
> >
> > 1. 32-bit CIDs are in sockaddr_vm and we'd break the AF_VSOCK ABI by
> >    changing it to 64-bit.  Application code would be specific to
> >    virtio-vsock and wouldn't work with other AF_VSOCK transports that
> >    use the 32-bit sockaddr_vm struct.
>
> You don't have to repeat the IPv6 mistake.  Make all 32-bit CIDs
> 64-bit CIDs by padding with 0s; then 64-bit apps can use
> any CID.
>
> Old 32-bit CID applications will not be able to use the extended
> addresses, but hardcoding bugs does not seem sane.

A mixed 32-bit and 64-bit CID world is complex.  The host doesn't know
in advance whether all applications (especially inside the guest) will
support 64-bit CIDs or not.  32-bit CID applications won't work if a
64-bit CID has been assigned.

It also opens up the question of how unique CIDs are allocated across
hosts.

Given that AF_VSOCK in Linux already exists in the 32-bit CID version,
I'd prefer to make virtio-vsock compatible with that for the time being.
Extensions can be added in the future, but just implementing existing
AF_VSOCK semantics will already allow the applications to run.

> > 2. Dropping guest CIDs from the protocol breaks network protocols that
> >    send addresses.
>
> Stick it in config space if you really have to.
> But why do you need it on each packet?

If packets are implicitly guest<->host, then adding guest<->guest
communication requires a virtio spec change.  If packets contain
source/destination CIDs, then allowing/forbidding guest<->host or
guest<->guest communication is purely a host policy decision.  I think
it's worth keeping that in from the start.

> > NFS and netperf are the first two protocols I looked
> > at and both transmit address information across the connection...
>
> Does netperf really attempt to get the local IP
> and then send that inline within the connection?

Yes, netperf has separate control and data sockets.  I think part of the
reason for this split is that the control connection can communicate the
address details for the data connection over a different protocol (TCP +
RDMA?), but I'm not sure.

Stefan
Hi,

----- Original Message -----
> From: "Stefan Hajnoczi" <stefanha at redhat.com>
> To: "Michael S. Tsirkin" <mst at redhat.com>

> > > > I think the right thing to do is just to teach guests
> > > > about 64-bit CIDs.
> > > >
> > > > For now, can we drop the guest CID from guest-to-host communication
> > > > completely, making the CID only host-visible?  Maybe leave the space
> > > > in the packet so we can add the CID there later.
> > > > It seems that in theory this will allow changing the CID
> > > > during migration, transparently to the guest.
> > > >
> > > > A guest-visible CID is required for guest-to-guest communication -
> > > > but IIUC that is not currently supported.
> > > > Maybe that can be made conditional on 64-bit addressing.
> > > > Alternatively, it seems much easier to accept that these channels
> > > > get broken across migration.
> > >
> > > I reached the conclusion that channels break across migration because:
> > >
> > > 1. 32-bit CIDs are in sockaddr_vm and we'd break the AF_VSOCK ABI by
> > >    changing it to 64-bit.  Application code would be specific to
> > >    virtio-vsock and wouldn't work with other AF_VSOCK transports that
> > >    use the 32-bit sockaddr_vm struct.
> >
> > You don't have to repeat the IPv6 mistake.  Make all 32-bit CIDs
> > 64-bit CIDs by padding with 0s; then 64-bit apps can use
> > any CID.
> >
> > Old 32-bit CID applications will not be able to use the extended
> > addresses, but hardcoding bugs does not seem sane.
>
> A mixed 32-bit and 64-bit CID world is complex.  The host doesn't know
> in advance whether all applications (especially inside the guest) will
> support 64-bit CIDs or not.  32-bit CID applications won't work if a
> 64-bit CID has been assigned.
>
> It also opens up the question of how unique CIDs are allocated across
> hosts.
>
> Given that AF_VSOCK in Linux already exists in the 32-bit CID version,
> I'd prefer to make virtio-vsock compatible with that for the time being.
> Extensions can be added in the future, but just implementing existing
> AF_VSOCK semantics will already allow the applications to run.
>
> > > 2. Dropping guest CIDs from the protocol breaks network protocols that
> > >    send addresses.
> >
> > Stick it in config space if you really have to.
> > But why do you need it on each packet?
>
> If packets are implicitly guest<->host, then adding guest<->guest
> communication requires a virtio spec change.  If packets contain
> source/destination CIDs, then allowing/forbidding guest<->host or
> guest<->guest communication is purely a host policy decision.  I think
> it's worth keeping that in from the start.

I'm just the downstream consumer of vsock, but this was my intuition as
well.

Matt

> > > NFS and netperf are the first two protocols I looked
> > > at and both transmit address information across the connection...
>
> Stefan

-- 
Matt Benjamin
Red Hat, Inc.
315 West Huron Street, Suite 140A
Ann Arbor, Michigan 48103

http://www.redhat.com/en/technologies/storage

tel.  734-707-0660
fax.  734-769-8938
cel.  734-216-5309
On Wed, Mar 16, 2016 at 02:32:00PM +0000, Stefan Hajnoczi wrote:
> On Tue, Mar 15, 2016 at 06:12:55PM +0200, Michael S. Tsirkin wrote:
> > On Tue, Mar 15, 2016 at 03:15:29PM +0000, Stefan Hajnoczi wrote:
> > > On Mon, Mar 14, 2016 at 01:13:24PM +0200, Michael S. Tsirkin wrote:
> > > > On Thu, Mar 03, 2016 at 03:37:37PM +0000, Stefan Hajnoczi wrote:
> > > > > Michael pointed out that the virtio-vsock draft specification does
> > > > > not address live migration and in fact currently precludes
> > > > > migration.
> > > > >
> > > > > Migration is fundamental, so the device specification at least
> > > > > mustn't preclude it.  Having brainstormed migration with Matthew
> > > > > Benjamin and Michael Tsirkin, I am now summarizing the approach
> > > > > that I want to include in the next draft specification.
> > > > >
> > > > > Feedback and comments welcome!  In the meantime I will implement
> > > > > this in code and update the draft specification.
> > > >
> > > > Most of the issue seems to be a consequence of using a 4-byte CID.
> > > >
> > > > I think the right thing to do is just to teach guests
> > > > about 64-bit CIDs.
> > > >
> > > > For now, can we drop the guest CID from guest-to-host communication
> > > > completely, making the CID only host-visible?  Maybe leave the space
> > > > in the packet so we can add the CID there later.
> > > > It seems that in theory this will allow changing the CID
> > > > during migration, transparently to the guest.
> > > >
> > > > A guest-visible CID is required for guest-to-guest communication -
> > > > but IIUC that is not currently supported.
> > > > Maybe that can be made conditional on 64-bit addressing.
> > > > Alternatively, it seems much easier to accept that these channels
> > > > get broken across migration.
> > >
> > > I reached the conclusion that channels break across migration because:
> > >
> > > 1. 32-bit CIDs are in sockaddr_vm and we'd break the AF_VSOCK ABI by
> > >    changing it to 64-bit.  Application code would be specific to
> > >    virtio-vsock and wouldn't work with other AF_VSOCK transports that
> > >    use the 32-bit sockaddr_vm struct.
> >
> > You don't have to repeat the IPv6 mistake.  Make all 32-bit CIDs
> > 64-bit CIDs by padding with 0s; then 64-bit apps can use
> > any CID.
> >
> > Old 32-bit CID applications will not be able to use the extended
> > addresses, but hardcoding bugs does not seem sane.
>
> A mixed 32-bit and 64-bit CID world is complex.  The host doesn't know
> in advance whether all applications (especially inside the guest) will
> support 64-bit CIDs or not.  32-bit CID applications won't work if a
> 64-bit CID has been assigned.

Only for guest-to-guest communication, correct?  The host can do dual
addressing as well.  Applications that do not want connections to be
broken will use 64-bit addresses.  Old applications will keep running
until you migrate.

> It also opens up the question of how unique CIDs are allocated across
> hosts.

I think it's actually a good idea to define this, rather than leave
things in the air.  For example, EUI-64 can be used.

> Given that AF_VSOCK in Linux already exists in the 32-bit CID version,
> I'd prefer to make virtio-vsock compatible with that for the time being.

Yes, so we cut corners in order to ship it quickly, but that is
implementation.  Linux can be extended.  Why limit the protocol to
follow current implementation bugs?

> Extensions can be added in the future, but just implementing existing
> AF_VSOCK semantics will already allow the applications to run.

It's an important goal.  At the spec level, I do not think it is a good
idea to put this limitation in, but users can just use a subset of the
available address space in order to make existing apps work.

> > > 2. Dropping guest CIDs from the protocol breaks network protocols that
> > >    send addresses.
> >
> > Stick it in config space if you really have to.
> > But why do you need it on each packet?
>
> If packets are implicitly guest<->host, then adding guest<->guest
> communication requires a virtio spec change.  If packets contain
> source/destination CIDs, then allowing/forbidding guest<->host or
> guest<->guest communication is purely a host policy decision.  I think
> it's worth keeping that in from the start.

OK.

> > > NFS and netperf are the first two protocols I looked
> > > at and both transmit address information across the connection...
> >
> > Does netperf really attempt to get the local IP
> > and then send that inline within the connection?
>
> Yes, netperf has separate control and data sockets.  I think part of the
> reason for this split is that the control connection can communicate the
> address details for the data connection over a different protocol (TCP +
> RDMA?), but I'm not sure.
>
> Stefan

Thinking about it, netperf does not survive disconnects,
so the current protocol would be useless for it.
I am not sure about NFS, but from (long past) experience it did not
attempt to re-resolve the name to an address, so changing an
address would break it as well.

So I think these applications would have to use a 64-bit CID.

Why, then, do we care about one aspect of these applications
(creating connections) and not another (not breaking them)?

-- 
MST
On Wed, Mar 16, 2016 at 05:05:19PM +0200, Michael S. Tsirkin wrote:
> > > > NFS and netperf are the first two protocols I looked
> > > > at and both transmit address information across the connection...
> > >
> > > Does netperf really attempt to get the local IP
> > > and then send that inline within the connection?
> >
> > Yes, netperf has separate control and data sockets.  I think part of
> > the reason for this split is that the control connection can
> > communicate the address details for the data connection over a
> > different protocol (TCP + RDMA?), but I'm not sure.
> >
> > Stefan
>
> Thinking about it, netperf does not survive disconnects,
> so the current protocol would be useless for it.
> I am not sure about NFS, but from (long past) experience it did not
> attempt to re-resolve the name to an address, so changing an
> address would break it as well.
>
> So I think these applications would have to use a 64-bit CID.
>
> Why, then, do we care about one aspect of these applications
> (creating connections) and not another (not breaking them)?

I care about mapping the semantics of AF_VSOCK to virtio-vsock.
AF_VSOCK was implemented with the ability to plug in additional
transports (like virtio).  This allows guest agents and other
applications to compile once and run on any transport.

If we change virtio-vsock to rely on unique addresses across migration,
then we lose zero-configuration.  AF_VSOCK applications use the
VMADDR_CID_HOST (2) constant to communicate with the host.  After live
migration this well-known CID refers to the new host.  Applications
would need to know a unique host CID in order to work correctly across
live migration.

Although I appreciate your drive to make the device as flexible as
possible, if we want to do this we are totally beyond AF_VSOCK semantics
and would be better served by a separate effort that avoids confusion
between classic AF_VSOCK semantics and virtio socket semantics.

Can we please treat AF_VSOCK semantics as the requirements we're trying
to implement?  It supports qemu-guest-agent and, as I described in a
previous mail, could also support transparent connection migration a la
CRIU sockets.

Stefan