Michael S. Tsirkin
2021-Apr-14 07:20 UTC
[RFC v2] virtio-vsock: add description for datagram type
On Wed, Apr 14, 2021 at 08:57:06AM +0200, Stefano Garzarella wrote:> On Tue, Apr 13, 2021 at 03:58:34PM -0400, Michael S. Tsirkin wrote: > > On Tue, Apr 13, 2021 at 04:03:51PM +0200, Stefano Garzarella wrote: > > > On Tue, Apr 13, 2021 at 09:50:45AM -0400, Michael S. Tsirkin wrote: > > > > On Tue, Apr 13, 2021 at 03:38:52PM +0200, Stefano Garzarella wrote: > > > > > On Tue, Apr 13, 2021 at 09:16:50AM -0400, Michael S. Tsirkin wrote: > > > > > > On Tue, Apr 13, 2021 at 02:58:53PM +0200, Stefano Garzarella wrote: > > > > > > > On Mon, Apr 12, 2021 at 03:42:23PM -0700, Jiang Wang . wrote: > > > > > > > > On Mon, Apr 12, 2021 at 7:21 AM Stefano Garzarella <sgarzare at redhat.com> wrote: > > > > > > > > > > > > > > > > > > On Mon, Apr 12, 2021 at 02:50:17PM +0100, Stefan Hajnoczi wrote: > > > > > > > > > >On Thu, Apr 01, 2021 at 04:36:02AM +0000, jiang.wang wrote: > > > > > > > > > >> Add supports for datagram type for virtio-vsock. Datagram > > > > > > > > > >> sockets are connectionless and unreliable. To avoid contention > > > > > > > > > >> with stream and other sockets, add two more virtqueues and > > > > > > > > > >> a new feature bit to identify if those two new queues exist or not. > > > > > > > > > >> > > > > > > > > > >> Also add descriptions for resource management of datagram, which > > > > > > > > > >> does not use the existing credit update mechanism associated with > > > > > > > > > >> stream sockets. > > > > > > > > > >> > > > > > > > > > >> Signed-off-by: Jiang Wang <jiang.wang at bytedance.com> > > > > > > > > > >> --- > > > > > > > > > >> V2 addressed the comments for the previous version. > > > > > > > > > >> > > > > > > > > > >> virtio-vsock.tex | 62 +++++++++++++++++++++++++++++++++++++++++++++++--------- > > > > > > > > > >> 1 file changed, 52 insertions(+), 10 deletions(-) > > > > > > > > > >> > > > > > > > > > >> diff --git a/virtio-vsock.tex b/virtio-vsock.tex > > > > > > > > > >> index da7e641..62c12e0 100644 > > > > > > > > > >> --- a/virtio-vsock.tex > > > > > > > > > >> +++ b/virtio-vsock.tex > > > > > > > > > >> @@ -11,12 +11,25 @@ \subsection{Virtqueues}\label{sec:Device Types / Socket Device / Virtqueues} > > > > > > > > > >> \begin{description} > > > > > > > > > >> \item[0] rx > > > > > > > > > >> \item[1] tx > > > > > > > > > >> +\item[2] datagram rx > > > > > > > > > >> +\item[3] datagram tx > > > > > > > > > >> +\item[4] event > > > > > > > > > >> +\end{description} > > > > > > > > > >> +The virtio socket device uses 5 queues if feature bit VIRTIO_VSOCK_F_DRGAM is set. Otherwise, it > > > > > > > > > >> +only uses 3 queues, as the following. Rx and tx queues are always used for stream sockets. > > > > > > > > > >> + > > > > > > > > > >> +\begin{description} > > > > > > > > > >> +\item[0] rx > > > > > > > > > >> +\item[1] tx > > > > > > > > > >> \item[2] event > > > > > > > > > >> \end{description} > > > > > > > > > >> > > > > > > > > > > > > > > > > > > > >I suggest renaming "rx" and "tx" to "stream rx" and "stream tx" > > > > > > > > > >virtqueues and also adding this: > > > > > > > > > > > > > > > > > > > > When behavior differs between stream and datagram rx/tx virtqueues > > > > > > > > > > their full names are used. Common behavior is simply described in > > > > > > > > > > terms of rx/tx virtqueues and applies to both stream and datagram > > > > > > > > > > virtqueues. > > > > > > > > > > > > > > > > > > > >This way you won't need to duplicate portions of the spec that deal with > > > > > > > > > >populating the virtqueues, for example. > > > > > > > > > > > > > > > > > > > >It's also clearer to use a full name for stream rx/tx virtqueues instead > > > > > > > > > >of calling them rx/tx virtqueues now that we have datagram rx/tx > > > > > > > > > >virtqueues. > > > > > > > > > > > > > > > > > > > >> + > > > > > > > > > >> \subsection{Feature bits}\label{sec:Device Types / Socket Device / Feature bits} > > > > > > > > > >> > > > > > > > > > >> -There are currently no feature bits defined for this device. > > > > > > > > > >> +\begin{description} > > > > > > > > > >> +\item[VIRTIO_VSOCK_F_DGRAM (0)] Device has support for datagram socket type. > > > > > > > > > >> +\end{description} > > > > > > > > > >> > > > > > > > > > >> \subsection{Device configuration layout}\label{sec:Device Types / Socket Device / Device configuration layout} > > > > > > > > > >> > > > > > > > > > >> @@ -107,6 +120,9 @@ \subsection{Device Operation}\label{sec:Device Types / Socket Device / Device Op > > > > > > > > > >> > > > > > > > > > >> \subsubsection{Virtqueue Flow Control}\label{sec:Device Types / Socket Device / Device Operation / Virtqueue Flow Control} > > > > > > > > > >> > > > > > > > > > >> +Flow control applies to stream sockets; datagram sockets do not have > > > > > > > > > >> +flow control. > > > > > > > > > >> + > > > > > > > > > >> The tx virtqueue carries packets initiated by applications and replies to > > > > > > > > > >> received packets. The rx virtqueue carries packets initiated by the device and > > > > > > > > > >> replies to previously transmitted packets. > > > > > > > > > >> @@ -140,12 +156,15 @@ \subsubsection{Addressing}\label{sec:Device Types / Socket Device / Device Opera > > > > > > > > > >> consists of a (cid, port number) tuple. The header fields used for this are > > > > > > > > > >> \field{src_cid}, \field{src_port}, \field{dst_cid}, and \field{dst_port}. > > > > > > > > > >> > > > > > > > > > >> -Currently only stream sockets are supported. \field{type} is 1 for stream > > > > > > > > > >> -socket types. > > > > > > > > > >> +Currently stream and datagram (dgram) sockets are supported. \field{type} is 1 for stream > > > > > > > > > >> +socket types. \field{type} is 3 for dgram socket types. > > > > > > > > > >> > > > > > > > > > >> Stream sockets provide in-order, guaranteed, connection-oriented delivery > > > > > > > > > >> without message boundaries. > > > > > > > > > >> > > > > > > > > > >> +Datagram sockets provide connectionless unreliable messages of > > > > > > > > > >> +a fixed maximum length. > > > > > > > > > > > > > > > > > > > >Plus unordered (?) and with message boundaries. In other words: > > > > > > > > > > > > > > > > > > > > Datagram sockets provide unordered, unreliable, connectionless message > > > > > > > > > > with message boundaries and a fixed maximum length. > > > > > > > > > > > > > > > > > > > >I didn't think of the fixed maximum length aspect before. I guess the > > > > > > > > > >intention is that the rx buffer size is the message size limit? That's > > > > > > > > > >different from UDP messages, which can be fragmented into multiple IP > > > > > > > > > >packets and can be larger than 64KiB: > > > > > > > > > >https://en.wikipedia.org/wiki/User_Datagram_Protocol#UDP_datagram_structure > > > > > > > > > > > > > > > > > > > >Is it possible to support large datagram messages in vsock? I'm a little > > > > > > > > > >concerned that applications that run successfully over UDP will not be > > > > > > > > > >portable if vsock has this limitation because it would impose extra > > > > > > > > > >message boundaries that the application protocol might not tolerate. > > > > > > > > > > > > > > > > > > Maybe we can reuse the same approach Arseny is using for SEQPACKET. > > > > > > > > > Fragment the packets according to the buffers in the virtqueue and set > > > > > > > > > the EOR flag to indicate the last packet in the message. > > > > > > > > > > > > > > > > > Agree. Another option is to use the ones for skb since we may need to > > > > > > > > use skbs for multiple transport support anyway. > > > > > > > > > > > > > > > > > > > > > > The important thing I think is to have a single flag in virtio-vsock that > > > > > > > identifies pretty much the same thing: this is the last fragment of a series > > > > > > > to rebuild a packet. > > > > > > > > > > > > > > We should reuse the same flag for DGRAM and SEQPACKET. > > > > > > > > > > > > > > Thanks, > > > > > > > Stefano > > > > > > > > > > > > Well DGRAM can drop data so I wonder whether it can work ... > > > > > > > > > > > > > > > > Yep, this is true, but the channel should not be losing packets, so if the > > > > > receiver discards packets, it knows that it must then discard all of them > > > > > until the EOR. > > > > > > > > That is not so easy - they can come mixed up from multiple sources. > > > > > > I think we can prevent mixing because virtuqueue is point to point and its > > > use is not thread safe, so the access (in the same peer) is already > > > serialized. > > > In the end the packet would be fragmented only before copying it to the > > > virtuqueue. > > > > > > But maybe I missed something... > > > > Well I ask what's the point of fragmenting then. I assume it's so we > > can pass huge messages around so you can't keep locks ... > > > > Maybe I'm wrong, but isn't this similar to what we do in virtio-net with > mergeable buffers?The point of mergeable buffers is to use less memory: both for each packet and for a full receive vq.> Also in this case I think the fragmentation will happen only in the device, > since the driver can enqueue the entire buffer. > > Maybe we can reuse mergeable buffers for virtio-vsock if the EOR flag is not > suitable.That sounds very reasonable.> IIUC in the vsock device the fragmentation for DGRAM will happen just before > to queue it in the virtqueue, and the device can check how many buffers are > available in the queue and it can decide whether to queue them all up or > throw them away. > > > > > > Sure linux net core does this but with fragmentation added in, > > > > I start wondering whether you are beginning to reinvent the net stack > > > > ... > > > > > > No, I hope not :-), in the end our advantage is that we have a channel that > > > doesn't lose packets, so I guess we can make assumptions that the network > > > stack can't. > > > > > > Thanks, > > > Stefano > > > > I still don't know how will credit accounting work for datagram, > > but proposals I saw seem to actually lose packets ... > > > > I still don't know too, but I think it's not an issue in the RX side, > since if it doesn't have space, can drop all the fragment. > > Another option to avoid fragmentation could be to allocate 64K buffers for > the new DGRAM virtqueue.That's a lot of buffers ...> In this way we will have at most 64K packets, which is similar to UDP/IP, > without extra work for the fragmentation.IIRC default MTU is 1280 not 64K ...> > Thanks, > Stefano
Stefano Garzarella
2021-Apr-14 09:38 UTC
[RFC v2] virtio-vsock: add description for datagram type
On Wed, Apr 14, 2021 at 03:20:07AM -0400, Michael S. Tsirkin wrote:>On Wed, Apr 14, 2021 at 08:57:06AM +0200, Stefano Garzarella wrote: >> On Tue, Apr 13, 2021 at 03:58:34PM -0400, Michael S. Tsirkin wrote: >> > On Tue, Apr 13, 2021 at 04:03:51PM +0200, Stefano Garzarella wrote: >> > > On Tue, Apr 13, 2021 at 09:50:45AM -0400, Michael S. Tsirkin wrote: >> > > > On Tue, Apr 13, 2021 at 03:38:52PM +0200, Stefano Garzarella wrote: >> > > > > On Tue, Apr 13, 2021 at 09:16:50AM -0400, Michael S. Tsirkin wrote: >> > > > > > On Tue, Apr 13, 2021 at 02:58:53PM +0200, Stefano Garzarella wrote: >> > > > > > > On Mon, Apr 12, 2021 at 03:42:23PM -0700, Jiang Wang . wrote: >> > > > > > > > On Mon, Apr 12, 2021 at 7:21 AM Stefano Garzarella <sgarzare at redhat.com> wrote: >> > > > > > > > > >> > > > > > > > > On Mon, Apr 12, 2021 at 02:50:17PM +0100, Stefan Hajnoczi wrote: >> > > > > > > > > >On Thu, Apr 01, 2021 at 04:36:02AM +0000, jiang.wang >> > > > > > > > > >wrote:[...]>> > > > > > > > > >> >> > > > > > > > > >> +Datagram sockets provide connectionless unreliable messages of >> > > > > > > > > >> +a fixed maximum length. >> > > > > > > > > > >> > > > > > > > > >Plus unordered (?) and with message boundaries. In other words: >> > > > > > > > > > >> > > > > > > > > > Datagram sockets provide unordered, unreliable, connectionless message >> > > > > > > > > > with message boundaries and a fixed maximum length. >> > > > > > > > > > >> > > > > > > > > >I didn't think of the fixed maximum length aspect before. I guess the >> > > > > > > > > >intention is that the rx buffer size is the message size limit? That's >> > > > > > > > > >different from UDP messages, which can be fragmented into multiple IP >> > > > > > > > > >packets and can be larger than 64KiB: >> > > > > > > > > >https://en.wikipedia.org/wiki/User_Datagram_Protocol#UDP_datagram_structure >> > > > > > > > > > >> > > > > > > > > >Is it possible to support large datagram messages in vsock? I'm a little >> > > > > > > > > >concerned that applications that run successfully over UDP will not be >> > > > > > > > > >portable if vsock has this limitation because it would impose extra >> > > > > > > > > >message boundaries that the application protocol might not tolerate. >> > > > > > > > > >> > > > > > > > > Maybe we can reuse the same approach Arseny is using for SEQPACKET. >> > > > > > > > > Fragment the packets according to the buffers in the virtqueue and set >> > > > > > > > > the EOR flag to indicate the last packet in the message. >> > > > > > > > > >> > > > > > > > Agree. Another option is to use the ones for skb since we may need to >> > > > > > > > use skbs for multiple transport support anyway. >> > > > > > > > >> > > > > > > >> > > > > > > The important thing I think is to have a single flag in virtio-vsock that >> > > > > > > identifies pretty much the same thing: this is the last fragment of a series >> > > > > > > to rebuild a packet. >> > > > > > > >> > > > > > > We should reuse the same flag for DGRAM and SEQPACKET. >> > > > > > > >> > > > > > > Thanks, >> > > > > > > Stefano >> > > > > > >> > > > > > Well DGRAM can drop data so I wonder whether it can work ... >> > > > > > >> > > > > >> > > > > Yep, this is true, but the channel should not be losing packets, so if the >> > > > > receiver discards packets, it knows that it must then discard all of them >> > > > > until the EOR. >> > > > >> > > > That is not so easy - they can come mixed up from multiple sources. >> > > >> > > I think we can prevent mixing because virtuqueue is point to point and its >> > > use is not thread safe, so the access (in the same peer) is already >> > > serialized. >> > > In the end the packet would be fragmented only before copying it to the >> > > virtuqueue. >> > > >> > > But maybe I missed something... >> > >> > Well I ask what's the point of fragmenting then. I assume it's so we >> > can pass huge messages around so you can't keep locks ... >> > >> >> Maybe I'm wrong, but isn't this similar to what we do in virtio-net with >> mergeable buffers? > >The point of mergeable buffers is to use less memory: both for each >packet and for a full receive vq. > >> Also in this case I think the fragmentation will happen only in the device, >> since the driver can enqueue the entire buffer. >> >> Maybe we can reuse mergeable buffers for virtio-vsock if the EOR flag is not >> suitable. > >That sounds very reasonable.It should also allow us to save the header for each fragment. @Jiang Do you want to explore this? I'm talking about VIRTIO_NET_F_MRG_RXBUF feature.> >> IIUC in the vsock device the fragmentation for DGRAM will happen just >> before >> to queue it in the virtqueue, and the device can check how many buffers are >> available in the queue and it can decide whether to queue them all up or >> throw them away. >> > >> > > > Sure linux net core does this but with fragmentation added in, >> > > > I start wondering whether you are beginning to reinvent the net stack >> > > > ... >> > > >> > > No, I hope not :-), in the end our advantage is that we have a channel that >> > > doesn't lose packets, so I guess we can make assumptions that the network >> > > stack can't. >> > > >> > > Thanks, >> > > Stefano >> > >> > I still don't know how will credit accounting work for datagram, >> > but proposals I saw seem to actually lose packets ... >> > >> >> I still don't know too, but I think it's not an issue in the RX side, >> since if it doesn't have space, can drop all the fragment. >> >> Another option to avoid fragmentation could be to allocate 64K buffers for >> the new DGRAM virtqueue. > >That's a lot of buffers ...Yep I see, and they would often be mostly unused...> >> In this way we will have at most 64K packets, which is similar to >> UDP/IP, >> without extra work for the fragmentation. > >IIRC default MTU is 1280 not 64K ...I was thinking that UDP at most can support 64K messages that IP should fragment according to MTU. Thanks, Stefano