Stefano Garzarella
2023-Jun-29 12:30 UTC
[PATCH RFC net-next v4 6/8] virtio/vsock: support dgrams
On Tue, Jun 27, 2023 at 01:19:43AM +0000, Bobby Eshleman wrote:>On Mon, Jun 26, 2023 at 05:03:15PM +0200, Stefano Garzarella wrote: >> On Fri, Jun 23, 2023 at 04:37:55AM +0000, Bobby Eshleman wrote: >> > On Thu, Jun 22, 2023 at 06:09:12PM +0200, Stefano Garzarella wrote: >> > > On Sun, Jun 11, 2023 at 11:49:02PM +0300, Arseniy Krasnov wrote: >> > > > Hello Bobby! >> > > > >> > > > On 10.06.2023 03:58, Bobby Eshleman wrote: >> > > > > This commit adds support for datagrams over virtio/vsock. >> > > > > >> > > > > Message boundaries are preserved on a per-skb and per-vq entry basis. >> > > > >> > > > I'm a little bit confused about the following case: let vhost sends 4097 bytes >> > > > datagram to the guest. Guest uses 4096 RX buffers in it's virtio queue, each >> > > > buffer has attached empty skb to it. Vhost places first 4096 bytes to the first >> > > > buffer of guests RX queue, and 1 last byte to the second buffer. Now IIUC guest >> > > > has two skb in it rx queue, and user in guest wants to read data - does it read >> > > > 4097 bytes, while guest has two skb - 4096 bytes and 1 bytes? In seqpacket there is >> > > > special marker in header which shows where message ends, and how it works here? >> > > >> > > I think the main difference is that DGRAM is not connection-oriented, so >> > > we don't have a stream and we can't split the packet into 2 (maybe we >> > > could, but we have no guarantee that the second one for example will be >> > > not discarded because there is no space). >> > > >> > > So I think it is acceptable as a restriction to keep it simple. >> > > >> > > My only doubt is, should we make the RX buffer size configurable, >> > > instead of always using 4k? >> > > >> > I think that is a really good idea. What mechanism do you imagine? >> >> Some parameter in sysfs? >> > >I comment more on this below. > >> > >> > For sendmsg() with buflen > VQ_BUF_SIZE, I think I'd like -ENOBUFS >> >> For the guest it should be easy since it allocates the buffers, but for >> the host? >> >> Maybe we should add a field in the configuration space that reports some >> sort of MTU. >> >> Something in addition to what Laura had proposed here: >> https://markmail.org/message/ymhz7wllutdxji3e >> > >That sounds good to me. > >IIUC vhost exposes the limit via the configuration space, and the guest >can configure the RX buffer size up to that limit via sysfs? > >> > returned even though it is uncharacteristic of Linux sockets. >> > Alternatively, silently dropping is okay... but seems needlessly >> > unhelpful. >> >> UDP takes advantage of IP fragmentation, right? >> But what happens if a fragment is lost? >> >> We should try to behave in a similar way. >> > >AFAICT in UDP the sending socket will see EHOSTUNREACH on its error >queue and the packet will be dropped. > >For more details: >- the IP defragmenter will emit an ICMP_TIME_EXCEEDED from ip_expire() > if the fragment queue is not completed within time. >- Upon seeing ICMP_TIME_EXCEEDED, the sending stack will then add > EHOSTUNREACH to the socket's error queue, as seen in __udp4_lib_err(). > >Given some updated man pages I think enqueuing EHOSTUNREACH is okay for >vsock too. This also reserves ENOBUFS/ENOMEM only for shortage on local >buffers / mem. > >What do you think?Yep, makes sense to me! Stefano