Stefano Garzarella
2021-May-18 13:02 UTC
[RFC v2] virtio-vsock: add description for datagram type
On Mon, May 17, 2021 at 11:33:06PM -0700, Jiang Wang . wrote:
>On Mon, May 17, 2021 at 4:02 AM Stefano Garzarella <sgarzare at redhat.com> wrote:
>>
>> On Fri, May 14, 2021 at 11:55:29AM -0700, Jiang Wang . wrote:
>> >On Fri, May 14, 2021 at 8:17 AM Stefano Garzarella <sgarzare at redhat.com> wrote:
>> >> On Thu, May 13, 2021 at 04:26:03PM -0700, Jiang Wang . wrote:
>>
>> [...]
>>
>> >> >I see. I will add some limit to dgram packets. Also, when the
>> >> >virtqueues are shared between stream and dgram, both of them need
>> >> >to grab a lock before using the virtqueue, so one will not
>> >> >completely block the other.
>> >>
>> >> I'm not worried about the concurrent access that we definitely need
>> >> to handle with a lock, but more about the uncontrolled packet
>> >> sending that dgram might have, flooding the queues and preventing
>> >> others from communicating.
>> >
>> >That is a valid concern. Let me explain how I would handle that if we
>> >don't add two new virtqueues. For dgram, I also add a
>> >dgram_send_pkt_list, which is similar to send_pkt_list for stream
>> >(and seqpacket), with one difference: dgram_send_pkt_list has a
>> >maximum size and keeps track of how many packets are in the list. The
>> >counter (dgram_send_pkt_list_size) is increased when a packet is
>> >added to the list and decreased when a packet is removed from the
>> >list and added to the virtqueue. In virtio_transport_send_pkt, if the
>> >current dgram_send_pkt_list_size equals the maximum (let's say 128),
>> >the packet is not added to dgram_send_pkt_list and an error is
>> >returned to the application.
>>
>> For stream sockets, we have the send_pkt_list and the send worker
>> because the virtqueue can be full and the transmitter needs to wait
>> for available slots, since we can't discard packets.
>>
>> For dgram I think we don't need this, so we can avoid the
>> dgram_send_pkt_list and directly enqueue packets in the virtqueue.
>>
>> If there are no slots available, we can simply drop the packet.
>> In this way we don't have to put limits other than the available space
>> in the virtqueue.
>
>I considered this approach before. One question not clear to me is
>whether we should call virtqueue_kick() for every dgram packet. If I
>am not wrong, each virtqueue_kick() will lead to a vm_exit to the host.
>If we call virtqueue_kick() for each dgram packet, the performance
>might be bad when there are lots of packets.

Not every virtqueue_kick() will lead to a vm_exit. If the host (see
vhost_transport_do_send_pkt()) is handling the virtqueue, it disables
the notification through vhost_disable_notify().

>One idea is to set a threshold and a timer for calling
>virtqueue_kick(). When there are lots of packets, we wait until the
>threshold is reached. When there are few packets, the timer will make
>sure those packets are delivered without too much delay.

I think it's better to keep the host polling for a bit if we want to
avoid some notifications (see vhost_net_busy_poll()).

>Any other ideas for virtqueue_kick()? If the threshold plus timer is
>fine, I can go in this direction too.

I think it's better to follow the vhost_net_busy_poll() approach.

>> >In this way, the number of pending dgram pkts to be sent is limited.
>> >Then both stream and dgram sockets will compete to hold a lock for
>> >the tx virtqueue. Depending on the Linux scheduler, this competition
>> >will be somewhat fair. As a result, dgram will not block stream
>> >completely. It will compete and share the virtqueue with stream, but
>> >stream can still send some pkts.
>> >
>> >Basically, the virtqueue becomes a first-come, first-served resource
>> >for stream and dgram. When both stream and dgram applications have
>> >lots of data to send, dgram_send_pkt_list and send_pkt_list will
>> >still be of limited size, and each will have some chance to send out
>> >the data via the virtqueue. Does this address your concern?
>> >
>> >
>> >> So having 2 dedicated queues could avoid a credit mechanism at all
>> >> for connection-less sockets, and simply the receiver discards
>> >> packets that it can't handle.
>> >
>> >With 2 dedicated queues, we still need some kind of accounting for
>> >dgram, like the dgram_send_pkt_list_size I mentioned above.
>> >Otherwise, it will cause OOM. It is not a real credit mechanism, but
>> >the code is similar with or without 2 dedicated queues in my current
>> >prototype.
>>
>> I think in both cases we don't need accounting, but we can drop
>> packets if the virtqueue is full.
>>
>> It's still not clear to me where OOM may occur. Can you elaborate on
>> this?
>
>OOM depends on the implementation details. If we keep
>dgram_send_pkt_list and put no limit on it, it may grow indefinitely
>and cause OOM. As long as we have some limit or drop packets quickly,
>then we should be fine.

Sure, with the extra list we need some limits.

>> >For the receiver discarding packets part, could you explain more? I
>> >think the receiver's packet-discarding code is also the same with or
>> >without 2 dedicated queues.
>>
>> Yep.
>>
>> >Both will use can_enqueue = virtio_transport_inc_rx_pkt(vvs, pkt);
>> >to check if a packet should be discarded or not.
>>
>> I don't think we can use virtio_transport_inc_rx_pkt() for dgram.
>> This is based on the credit mechanism (e.g. buf_alloc) that works
>> when you have a connection-oriented socket.
>>
>> With dgram you can have multiple transmitters for a single socket,
>> so how can we exchange that information?
>
>In my view, we don't need to exchange that information. buf_alloc for
>dgram is local to the receiving socket. The sending sockets do not
>know about it. The receiving socket just checks whether there is still
>buffer space available to decide whether to drop a packet. If multiple
>transmitters send to a single socket, they share the same buf_alloc,
>and packets will be dropped randomly across those transmitters.

I see, but if we reuse skbuff I think we don't need this.

>> If we reuse the VMCI implementation with skbuff, sk_receive_skb()
>> already checks if sk_rcvbuf is full and discards the packet in this
>> case, so we don't need an internal rx_queue.
>
>OK.
>
>> Maybe someday we should convert stream to skbuff too; it would make
>> our lives easier.
>
>Yeah. I remember the original virtio vsock patch did use skb, but
>somehow it did not use skb anymore when merged upstream; not sure why.

Really? I didn't know, then I'll take a look. If you have any links,
please send :-)

Thanks,
Stefano
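[Editor's note: the bounded dgram_send_pkt_list scheme discussed above can be sketched as a small user-space model. This is an illustrative sketch only, not the kernel code; the struct, function names, and the limit of 128 are assumptions taken from the discussion.]

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model: the counter is incremented when a packet is
 * queued on dgram_send_pkt_list, decremented when the TX worker moves
 * the packet into the virtqueue, and enqueueing fails once the limit
 * is reached, so the list can never grow without bound (no OOM). */
#define DGRAM_SEND_PKT_LIST_MAX 128

struct dgram_tx {
    size_t send_pkt_list_size;   /* packets currently on the list */
};

/* Returns 0 on success, -1 when the list is full (the real code
 * would return an error such as -ENOBUFS to the application). */
int dgram_queue_pkt(struct dgram_tx *tx)
{
    if (tx->send_pkt_list_size >= DGRAM_SEND_PKT_LIST_MAX)
        return -1;               /* limit reached: reject the packet */
    tx->send_pkt_list_size++;    /* packet added to the list */
    return 0;
}

/* Called by the TX worker after a packet is moved to the virtqueue. */
void dgram_dequeue_pkt(struct dgram_tx *tx)
{
    if (tx->send_pkt_list_size > 0)
        tx->send_pkt_list_size--;
}
```

The alternative Stefano proposes (no extra list, enqueue directly and drop on a full virtqueue) removes even this counter: the virtqueue's own capacity becomes the only limit.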
Jiang Wang .
2021-May-19 04:59 UTC
[RFC v2] virtio-vsock: add description for datagram type
On Tue, May 18, 2021 at 6:02 AM Stefano Garzarella <sgarzare at redhat.com> wrote:
>
> On Mon, May 17, 2021 at 11:33:06PM -0700, Jiang Wang . wrote:
> >On Mon, May 17, 2021 at 4:02 AM Stefano Garzarella <sgarzare at redhat.com> wrote:
> >>
> >> On Fri, May 14, 2021 at 11:55:29AM -0700, Jiang Wang . wrote:
> >> >On Fri, May 14, 2021 at 8:17 AM Stefano Garzarella <sgarzare at redhat.com> wrote:
> >> >> On Thu, May 13, 2021 at 04:26:03PM -0700, Jiang Wang . wrote:
> >>
> >> [...]
> >>
> >> >> >I see. I will add some limit to dgram packets. Also, when the
> >> >> >virtqueues are shared between stream and dgram, both of them
> >> >> >need to grab a lock before using the virtqueue, so one will not
> >> >> >completely block the other.
> >> >>
> >> >> I'm not worried about the concurrent access that we definitely
> >> >> need to handle with a lock, but more about the uncontrolled
> >> >> packet sending that dgram might have, flooding the queues and
> >> >> preventing others from communicating.
> >> >
> >> >That is a valid concern. Let me explain how I would handle that if
> >> >we don't add two new virtqueues. For dgram, I also add a
> >> >dgram_send_pkt_list, which is similar to send_pkt_list for stream
> >> >(and seqpacket), with one difference: dgram_send_pkt_list has a
> >> >maximum size and keeps track of how many packets are in the list.
> >> >The counter (dgram_send_pkt_list_size) is increased when a packet
> >> >is added to the list and decreased when a packet is removed from
> >> >the list and added to the virtqueue. In virtio_transport_send_pkt,
> >> >if the current dgram_send_pkt_list_size equals the maximum (let's
> >> >say 128), the packet is not added to dgram_send_pkt_list and an
> >> >error is returned to the application.
> >>
> >> For stream sockets, we have the send_pkt_list and the send worker
> >> because the virtqueue can be full and the transmitter needs to wait
> >> for available slots, since we can't discard packets.
> >>
> >> For dgram I think we don't need this, so we can avoid the
> >> dgram_send_pkt_list and directly enqueue packets in the virtqueue.
> >>
> >> If there are no slots available, we can simply drop the packet.
> >> In this way we don't have to put limits other than the available
> >> space in the virtqueue.
> >
> >I considered this approach before. One question not clear to me is
> >whether we should call virtqueue_kick() for every dgram packet. If I
> >am not wrong, each virtqueue_kick() will lead to a vm_exit to the
> >host. If we call virtqueue_kick() for each dgram packet, the
> >performance might be bad when there are lots of packets.
>
> Not every virtqueue_kick() will lead to a vm_exit. If the host (see
> vhost_transport_do_send_pkt()) is handling the virtqueue, it disables
> the notification through vhost_disable_notify().

Got it.

> >One idea is to set a threshold and a timer for calling
> >virtqueue_kick(). When there are lots of packets, we wait until the
> >threshold is reached. When there are few packets, the timer will make
> >sure those packets are delivered without too much delay.
>
> I think it's better to keep the host polling for a bit if we want to
> avoid some notifications (see vhost_net_busy_poll()).

Sure.

> >Any other ideas for virtqueue_kick()? If the threshold plus timer is
> >fine, I can go in this direction too.
>
> I think it's better to follow the vhost_net_busy_poll() approach.

OK. I will follow vhost_net_busy_poll().

> >> >In this way, the number of pending dgram pkts to be sent is
> >> >limited. Then both stream and dgram sockets will compete to hold a
> >> >lock for the tx virtqueue. Depending on the Linux scheduler, this
> >> >competition will be somewhat fair. As a result, dgram will not
> >> >block stream completely. It will compete and share the virtqueue
> >> >with stream, but stream can still send some pkts.
> >> >
> >> >Basically, the virtqueue becomes a first-come, first-served
> >> >resource for stream and dgram. When both stream and dgram
> >> >applications have lots of data to send, dgram_send_pkt_list and
> >> >send_pkt_list will still be of limited size, and each will have
> >> >some chance to send out the data via the virtqueue. Does this
> >> >address your concern?
> >> >
> >> >
> >> >> So having 2 dedicated queues could avoid a credit mechanism at
> >> >> all for connection-less sockets, and simply the receiver
> >> >> discards packets that it can't handle.
> >> >
> >> >With 2 dedicated queues, we still need some kind of accounting for
> >> >dgram, like the dgram_send_pkt_list_size I mentioned above.
> >> >Otherwise, it will cause OOM. It is not a real credit mechanism,
> >> >but the code is similar with or without 2 dedicated queues in my
> >> >current prototype.
> >>
> >> I think in both cases we don't need accounting, but we can drop
> >> packets if the virtqueue is full.
> >>
> >> It's still not clear to me where OOM may occur. Can you elaborate
> >> on this?
> >
> >OOM depends on the implementation details. If we keep
> >dgram_send_pkt_list and put no limit on it, it may grow indefinitely
> >and cause OOM. As long as we have some limit or drop packets quickly,
> >then we should be fine.
>
> Sure, with the extra list we need some limits.
>
> >> >For the receiver discarding packets part, could you explain more?
> >> >I think the receiver's packet-discarding code is also the same
> >> >with or without 2 dedicated queues.
> >>
> >> Yep.
> >>
> >> >Both will use can_enqueue = virtio_transport_inc_rx_pkt(vvs, pkt);
> >> >to check if a packet should be discarded or not.
> >>
> >> I don't think we can use virtio_transport_inc_rx_pkt() for dgram.
> >> This is based on the credit mechanism (e.g. buf_alloc) that works
> >> when you have a connection-oriented socket.
> >>
> >> With dgram you can have multiple transmitters for a single socket,
> >> so how can we exchange that information?
> >
> >In my view, we don't need to exchange that information. buf_alloc for
> >dgram is local to the receiving socket. The sending sockets do not
> >know about it. The receiving socket just checks whether there is
> >still buffer space available to decide whether to drop a packet. If
> >multiple transmitters send to a single socket, they share the same
> >buf_alloc, and packets will be dropped randomly across those
> >transmitters.
>
> I see, but if we reuse skbuff I think we don't need this.
>
> >> If we reuse the VMCI implementation with skbuff, sk_receive_skb()
> >> already checks if sk_rcvbuf is full and discards the packet in this
> >> case, so we don't need an internal rx_queue.
> >
> >OK.
> >
> >> Maybe someday we should convert stream to skbuff too; it would make
> >> our lives easier.
> >
> >Yeah. I remember the original virtio vsock patch did use skb, but
> >somehow it did not use skb anymore when merged upstream; not sure
> >why.
>
> Really? I didn't know, then I'll take a look. If you have any links,
> please send :-)

No, I take it back. The skb is only used for dgram, not for stream
sockets. Sorry for the confusion. The code I found is not in an
easy-to-read format. Anyway, here is the link if you are interested:
https://kvm.vger.kernel.narkive.com/BMvH9eEr/rfc-v2-0-7-introduce-vm-sockets-virtio-transport#post1

btw: as major parts of the spec are kind of settled, I will send my
current POC code to the mailing list just to have a reference and see
if other major changes are needed. I will add to-dos in the cover
letter and will update the spec too.

> Thanks,
> Stefano
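[Editor's note: the receive-side behavior discussed above, where sk_receive_skb() drops a datagram when the socket receive buffer is full, can be illustrated with a user-space model. This is a sketch of the idea only; the struct and field names mimic (but are not) the kernel's sk_rcvbuf/sk_rmem_alloc accounting.]

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model of receive-buffer-based dropping: the limit is
 * local to the receiving socket, senders know nothing about it, and
 * all transmitters to the same socket share the same budget, so no
 * credit exchange is needed for connection-less dgram. */
struct sock_model {
    size_t rcvbuf;      /* plays the role of sk_rcvbuf */
    size_t rmem_used;   /* plays the role of sk_rmem_alloc */
};

/* Returns 1 if the packet was queued, 0 if it was dropped because
 * queuing it would exceed the socket's receive buffer. */
int model_receive(struct sock_model *sk, size_t pkt_len)
{
    if (sk->rmem_used + pkt_len > sk->rcvbuf)
        return 0;                /* buffer full: silently drop */
    sk->rmem_used += pkt_len;    /* charge the packet to the socket */
    return 1;
}
```

This is why no internal rx_queue or virtio_transport_inc_rx_pkt()-style credit check is needed on this path: the drop decision is entirely local to the receiver.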