Stefano Garzarella
2023-May-03 12:13 UTC
[PATCH RFC net-next v2 0/4] virtio/vsock: support datagrams
On Sat, Apr 15, 2023 at 03:55:05PM +0000, Bobby Eshleman wrote:
>On Fri, Apr 28, 2023 at 12:43:09PM +0200, Stefano Garzarella wrote:
>> On Sat, Apr 15, 2023 at 07:13:47AM +0000, Bobby Eshleman wrote:
>> > CC'ing virtio-dev at lists.oasis-open.org because this thread is starting
>> > to touch the spec.
>> >
>> > On Wed, Apr 19, 2023 at 12:00:17PM +0200, Stefano Garzarella wrote:
>> > > Hi Bobby,
>> > >
>> > > On Fri, Apr 14, 2023 at 11:18:40AM +0000, Bobby Eshleman wrote:
>> > > > CC'ing Cong.
>> > > >
>> > > > On Fri, Apr 14, 2023 at 12:25:56AM +0000, Bobby Eshleman wrote:
>> > > > > Hey all!
>> > > > >
>> > > > > This series introduces support for datagrams to virtio/vsock.
>> > >
>> > > Great! Thanks for restarting this work!
>> > >
>> >
>> > No problem!
>> >
>> > > > >
>> > > > > It is a spin-off (and smaller version) of this series from the summer:
>> > > > > https://lore.kernel.org/all/cover.1660362668.git.bobby.eshleman at bytedance.com/
>> > > > >
>> > > > > Please note that this is an RFC and should not be merged until
>> > > > > associated changes are made to the virtio specification, which will
>> > > > > follow after discussion from this series.
>> > > > >
>> > > > > This series first supports datagrams in a basic form for virtio, and
>> > > > > then optimizes the sendpath for all transports.
>> > > > >
>> > > > > The result is a very fast datagram communication protocol that
>> > > > > outperforms even UDP on multi-queue virtio-net w/ vhost on a variety
>> > > > > of multi-threaded workload samples.
>> > > > >
>> > > > > For those that are curious, some summary data comparing UDP and VSOCK
>> > > > > DGRAM (N=5):
>> > > > >
>> > > > > vCPUS: 16
>> > > > > virtio-net queues: 16
>> > > > > payload size: 4KB
>> > > > > Setup: bare metal + vm (non-nested)
>> > > > >
>> > > > > UDP: 287.59 MB/s
>> > > > > VSOCK DGRAM: 509.2 MB/s
>> > > > >
>> > > > > Some notes about the implementation...
>> > > > >
>> > > > > This datagram implementation forces datagrams to self-throttle according
>> > > > > to the threshold set by sk_sndbuf. It behaves similarly to the credits
>> > > > > used by streams in its effect on throughput and memory consumption, but
>> > > > > it is not influenced by the receiving socket as credits are.
>> > >
>> > > So, sk_sndbuf influences the sender and sk_rcvbuf the receiver, right?
>> > >
>> >
>> > Correct.
>> >
>> > > We should check if VMCI behaves the same.
>> > >
>> > > > >
>> > > > > The device drops packets silently. There is room for improvement by
>> > > > > building into the device and driver some intelligence around how to
>> > > > > reduce frequency of kicking the virtqueue when packet loss is high. I
>> > > > > think there is a good discussion to be had on this.
>> > >
>> > > Can you elaborate a bit here?
>> > >
>> > > Do you mean some mechanism to report to the sender that a destination
>> > > (cid, port) is full so the packet will be dropped?
>> > >
>> >
>> > Correct. There is also the case of there being no receiver at all for
>> > this address since this case isn't rejected upon connect(). Ideally,
>> > such a socket (which will have 100% packet loss) will be throttled
>> > aggressively.
>> >
>> > Before we go down too far on this path, I also want to clarify that
>> > using UDP over vhost/virtio-net also has this property... this can be
>> > observed by using tcpdump to dump the UDP packets on the bridge network
>> > your VM is using.
>> > UDP packets sent to a garbage address can be seen on
>> > the host bridge (this is the nature of UDP, how is the host supposed to
>> > know the address eventually goes nowhere). I mention the above because I
>> > think it is possible for vsock to avoid this cost, given that it
>> > benefits from being point-to-point and g2h/h2g.
>> >
>> > If we're okay with vsock being on par, then the current series does
>> > that. I propose something below that can be added later and maybe
>> > negotiated as a feature bit too.
>>
>> I see, and I agree on that, let's do it step by step.
>> If we can do it in the first phase, great, but I think it's fine to add
>> this feature later.
>>
>> >
>> > > Can we adapt the credit mechanism?
>> > >
>> >
>> > I've thought about this a lot because the attraction of the approach for
>> > me would be that we could get the wait/buffer-limiting logic for free
>> > and without big changes to the protocol, but the problem is that the
>> > unreliable nature of datagrams means that the source's free-running
>> > tx_cnt will become out-of-sync with the destination's fwd_cnt upon
>> > packet loss.
>>
>> We need to understand where the packet can be lost.
>> If the packet always reaches the destination (vsock driver or device),
>> we can discard it, but also update the counters.
>>
>> >
>> > Imagine a source that initializes and starts sending packets before a
>> > destination socket even is created, the source's self-throttling will be
>> > dysfunctional because its tx_cnt will always far exceed the
>> > destination's fwd_cnt.
>>
>> Right, the other problem I see is that the sockets aren't connected, so
>> we have a 1-N relationship.
>>
>
>Oh yeah, good point.
>
>> >
>> > We could play tricks with the meaning of the CREDIT_UPDATE message and
>> > fwd_cnt/buf_alloc fields, but I don't think we want to go down that
>> > path.
>> >
>> > I think that the best and simplest approach introduces a congestion
>> > notification (VIRTIO_VSOCK_OP_CN?). When a packet is dropped, the
>> > destination sends this notification. At a given repeated time period T,
>> > the source can check if it has received any notifications in the last T.
>> > If so, it halves its buffer allocation. If not, it doubles its buffer
>> > allocation unless it is already at its max or original value.
>> >
>> > An "invalid" socket which never has any receiver will converge towards a
>> > rate limit of one packet per time T * log2(average pkt size). That is, a
>> > socket with 100% packet loss will only be able to send 16 bytes every
>> > 4T. A default send buffer of MAX_UINT32 and T=5ms would hit zero within
>> > 160ms given at least one packet sent per 5ms. I have no idea if that is
>> > a reasonable default T for vsock, I just pulled it out of a hat for the
>> > sake of the example.
>> >
>> > "Normal" sockets will be responsive to high loss and rebalance during
>> > low loss. The source is trying to guess and converge on the actual
>> > buffer state of the destination.
>> >
>> > This would reuse the already-existing throttling mechanisms that
>> > throttle based upon buffer allocation. The usage of sk_sndbuf would have
>> > to be re-worked. The application using sendmsg() will see EAGAIN when
>> > throttled, or just sleep if !MSG_DONTWAIT.
>>
>> I see, it looks interesting, but I think we need to share that
>> information between multiple sockets, since the same destination
>> (cid, port) can be reached by multiple sockets.
>>
>
>Good point, that is true.
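
For illustration, the period-T adjustment sketched in the quoted proposal
could look roughly like the plain-C sketch below. Everything here is made up
for the example (the struct, the field names, the floor/ceiling values and
the tick callback); it is not code from the posted series.

    #include <stdbool.h>
    #include <stdint.h>

    /* Illustrative per-destination sender state for the period-T scheme. */
    struct dgram_cc {
            uint32_t buf_alloc;      /* current self-imposed send budget */
            uint32_t buf_alloc_min;  /* floor, e.g. one max-sized packet */
            uint32_t buf_alloc_max;  /* original/default budget (e.g. sk_sndbuf) */
            bool     cn_seen;        /* congestion notification seen in the last T */
    };

    /* Called once per period T, e.g. from a timer. */
    static void dgram_cc_tick(struct dgram_cc *cc)
    {
            if (cc->cn_seen) {
                    /* Congestion seen: halve the budget, but keep a floor. */
                    cc->buf_alloc /= 2;
                    if (cc->buf_alloc < cc->buf_alloc_min)
                            cc->buf_alloc = cc->buf_alloc_min;
            } else {
                    /* No congestion: double, capped at the original value. */
                    if (cc->buf_alloc > cc->buf_alloc_max / 2)
                            cc->buf_alloc = cc->buf_alloc_max;
                    else
                            cc->buf_alloc *= 2;
            }

            cc->cn_seen = false;
    }

With that shape, a destination that drops every packet keeps cn_seen set in
essentially every period, so buf_alloc decays toward the floor, which matches
the convergence behaviour described in the proposal above.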
>
>> Another approach could be to have both congestion notification and
>> decongestion, but maybe it produces double traffic.
>>
>
>I think this could simplify things and could reduce noise. It is also
>probably sufficient for the source to simply halt upon congestion
>notification and resume upon decongestion notification, instead of
>scaling up and down like I suggested above. It also avoids the
>burstiness that would occur with a "congestion notification"-only
>approach where the source guesses when to resume and guesses wrong.
>
>The congestion notification may want to have an expiration period after
>which the sender can resume without receiving a decongestion
>notification? If it receives congestion again, then it can halt again.

Yep, I agree.

Anyway, the congestion/decongestion messages should just be a hint,
because the other peer has to keep state, and a malicious host/guest
could use it for DoS, so the peer could discard these packets if it has
no more space to save the state.

Thanks,
Stefano
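
For reference, the halt/resume-with-expiry variant discussed above could be
shaped roughly like the following plain-C sketch. All names and the expiry
value are illustrative only; nothing here is defined by the series or the
spec.

    #include <stdbool.h>
    #include <stdint.h>

    #define CN_EXPIRY_NS (100 * 1000 * 1000ULL)  /* illustrative expiry: 100 ms */

    /* Illustrative sender-side view of the congestion hint. */
    struct dgram_cn_state {
            bool     halted;     /* true after a congestion notification */
            uint64_t resume_ns;  /* when to resume if no decongestion arrives */
    };

    static void dgram_on_congestion(struct dgram_cn_state *st, uint64_t now_ns)
    {
            st->halted = true;
            st->resume_ns = now_ns + CN_EXPIRY_NS;
    }

    static void dgram_on_decongestion(struct dgram_cn_state *st)
    {
            st->halted = false;
    }

    /* Checked on each send; return EAGAIN or sleep while this is false. */
    static bool dgram_may_send(struct dgram_cn_state *st, uint64_t now_ns)
    {
            if (st->halted && now_ns >= st->resume_ns)
                    st->halted = false;  /* hint expired, resume optimistically */

            return !st->halted;
    }

Since the notifications are only hints, a peer that has no room to keep this
state can simply drop them, and the expiry path above ensures the sender
eventually resumes anyway.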