thr3ads.net - Linux Virtualization - Virtio BoF minutes from KVM Forum 2017 [Nov 2017]

Jens Freimann

2017-Oct-29 12:52 UTC

Virtio BoF minutes from KVM Forum 2017

Virtio BoF minutes KVM Forum 2017

Attendees: Amnon Ilan, Maxime Coquelin, Vlad Yasevich, Malcolm Crossley,
	   David Vrabel, Ilya Lesokhin, Cunming Liang, Jens Freimann

Topics: packed ring layout with respect to hardware implementations

References:
https://lists.oasis-open.org/archives/virtio-dev/201702/msg00010.html
https://lists.oasis-open.org/archives/virtio-dev/201709/msg00013.html

Malcolm Crossley, David Vrabel:
- keep in mind not to optimize only for networking with small frame sizes.
  Storage has much larger sizes
- is there really no cache line ping-pong, given that we are overwriting the
  same cache line? With 4 descriptors in one line, accessing two of them at the
  same time will cause cache coherency messages, no?
- interesting quirk: we flip a bit, but Intel doesn't support writing single
  bytes, it will always be a full dword. Will that be a problem?
- it would be interesting to look into NVMe protocols, which seem to solve some
  of the same problems hardware-wise
- VMware vmxnet3 has a separate data ring for larger amounts of data. Not
  something to copy, but still interesting
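
The cache line question above can be made concrete with a small sketch. It
assumes a 16-byte descriptor (the size mentioned later in these minutes) and a
64-byte cache line, which is typical on x86 but not universal. The struct, its
field names and the helper functions are hypothetical and are not taken from
the proposal.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumption: 64-byte cache lines (typical on x86, not guaranteed). */
#define CACHE_LINE_SIZE 64u

/* Hypothetical 16-byte descriptor; field names are illustrative only. */
struct sketch_desc {
    uint64_t addr;   /* buffer address */
    uint32_t len;    /* buffer length in bytes */
    uint16_t id;     /* buffer id */
    uint16_t flags;  /* ownership and chaining flags */
};

static_assert(sizeof(struct sketch_desc) == 16, "descriptor must be 16 bytes");

/* Index of the cache line holding descriptor i of a line-aligned ring. */
static inline size_t desc_cache_line(size_t i)
{
    return (i * sizeof(struct sketch_desc)) / CACHE_LINE_SIZE;
}

/* True when two descriptors live in the same cache line, i.e. when a driver
 * write to one and a device write to the other touch the same line. */
static inline int descs_share_line(size_t a, size_t b)
{
    return desc_cache_line(a) == desc_cache_line(b);
}
```

Under these assumptions descriptors 0 to 3 share the first cache line and
descriptor 4 starts the next one, which is the "4 descs in one line" case
raised above.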

Steve:
- is the _MORE flag from the packed ring layout proposal still in use? What is
  its meaning?

Ilya:
- you might have more completions than descriptors available
- partial descriptor chains are a problem for hardware because you might have
  to read a bunch of descriptors twice
- how would you deal with a big buffer that contains a large number of
  small packets with respect to completions?
- is one bit for completion enough? Right now it means the descriptor was
  actually used. How do we signal when it was completed?
- concerned about not being able to do scatter/gather with the ring layout.
  Network drivers make heavy use of indirect buffers.
- for a hardware implementation a completion ring is a very convenient form for
  some use cases, so we want an efficient implementation for them. If we had an
  inline descriptor then a completion ring is just a normal ring and we won't
  need another ring type.
- doesn't like the fact that we need to do a linear scan to find the length of
  a descriptor chain. It would be nice if we could have the length of the chain
  in the first descriptor (i.e. the number of chained descriptors, not the
  number of posted descriptors, which can be deduced from the id field)
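
To illustrate the linear-scan concern from the last point, here is a minimal
sketch. It assumes that every descriptor of a chain except the last one carries
a "chain continues" flag. The flag value, the struct layout and the function
name are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_F_NEXT 0x1u  /* hypothetical "chain continues" flag */

/* Hypothetical 16-byte descriptor; field names are illustrative only. */
struct sketch_desc {
    uint64_t addr;
    uint32_t len;
    uint16_t id;
    uint16_t flags;
};

/* Count the descriptors in the chain starting at `head` by walking the ring
 * until a descriptor without SKETCH_F_NEXT is found. The cost is one
 * descriptor read per chain element, which is the overhead raised above. */
static size_t chain_length(const struct sketch_desc *ring, size_t ring_size,
                           size_t head)
{
    size_t n = 1;
    size_t i = head;

    assert(ring_size > 0 && head < ring_size);
    while ((ring[i].flags & SKETCH_F_NEXT) && n < ring_size) {
        i = (i + 1) % ring_size;  /* chains may wrap around the ring */
        n++;
    }
    return n;
}
```

If the first descriptor carried the chain length, as suggested, a device would
know up front how many descriptors to fetch instead of discovering the end of
the chain one descriptor at a time.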


Vlad:
- there were discussions about having a bigger descriptor. Then we would
  have more space to put things like a vnet header into the descriptor. It would
  also mean fewer conflicts when accessing the same cache line. (descriptors
  already grew to 16 bytes, do we need more?)
- was playing around with the idea of different ring types for different
  devices, e.g. scsi, net: starting with generic information, followed by
  protocol-specific data. Ilya agrees. The length of a descriptor would be made
  flexible by adding a descriptor length field.

How to continue / TODOs:
 - do benchmarking with bigger frame sizes on fast enough NICs
 - turn prototype code into an RFC series (work in progress)
 - more people interested to join monthly meetings

Open questions:
- Do we need an (optional) completion ring?
- Is there a situation where 4 descriptors in a cache line is a problem because
  we access the same cache line, causing cache ping-pong?
- Interrupt suppression requires device to do a memory read after writing out
  descriptors?  Will that be too costly? Let driver write out index?


regards
Jens

Michael S. Tsirkin

2017-Nov-01 14:59 UTC

On Sun, Oct 29, 2017 at 01:52:25PM +0100, Jens Freimann wrote:
> Ilya:
> - you might have more completions than descriptors available
> - partial descriptor chains are a problem for hardware because you might have
>   to read a bunch of descriptors twice
> - how would you deal with a big buffer that contains a large number of
>   small packets with respect to completions?
> - is one bit for completion enough? right now it means descriptor was actually
>   used. how do we signal when it was completed?
I am not sure I understand the difference. Under virtio, driver makes a
descriptor available, then device reads/writes memory depending on
descriptor type, then marks it as used.

What does completed mean?
> - concerned about not being able to do scatter/gather with the ring layout.
>   Network drivers heavily using indirect buffers.
> - for a hardware implementation a completion ring is a very convenient form for
>   some use cases, so we want an efficient implementation for them. If we had an
>   inline descriptor then a completion ring is just a normal ring and we won't
>   need another ring type.
> - doesn't like the fact that we need to do a linear scan to find the length of
>   a descriptor chain. It would be nice if we could have the length of the chain
>   in the first descriptor (i.e. the number of chained descriptors, not the
>   number of posted descriptors which can be deduced from the id field)
Not responding to rest of points since I don't understand the basic
assumption above yet.

-- 
MST

Ilya Lesokhin

2017-Nov-01 15:52 UTC

On Wednesday, November 01, 2017 4:59 PM, Michael S. Tsirkin wrote:
> On Sun, Oct 29, 2017 at 01:52:25PM +0100, Jens Freimann wrote:
> > Ilya:
> > - you might have more completions than descriptors available
> > - partial descriptor chains are a problem for hardware because you
> >   might have to read a bunch of descriptors twice
> > - how would you deal with a big buffer that contains a large number of
> >   small packets with respect to completions?
> > - is one bit for completion enough? right now it means descriptor was
> >   actually used. how do we signal when it was completed?
>
> I am not sure I understand the difference. Under virtio, driver makes a
> descriptor available, then device reads/writes memory depending on
> descriptor type, then marks it as used.
>
> What does completed mean?

During the BoF, someone raised the point that there is no indication that the
HW has read the descriptor. I think after some discussion we've agreed that
it's not a useful indication.

My issues with the current completion or used notifications are as follows:
1. There is no room for extra metadata such as a checksum or flow tag.
You could put that in the descriptor payload but it's somewhat inconvenient.
You have to either use an additional descriptor for metadata per chain,
or put it in one of the buffers, which forces the lifetime of the metadata and
data to be the same.

2. The current format assumes a 1-1 correspondence between descriptors and
completions.
You did offer a skipping optimization for many descriptors -> 1 completion,
but it is somewhat inefficient.
And you didn't offer a solution for 1 descriptor -> multiple completions.
Mellanox has a feature called striding RQ where you post a large buffer and
the NIC fills it with multiple back-to-back packets with padding.
Each packet generates its own completion.

3. There is a usage model where you have multiple producer rings
and a single completion ring.
You could implement the completion ring using an additional virtio ring, but
the current model will require an extra indirection, as it forces you to write
into the buffers that the descriptors in the completion ring point to, rather
than writing the completion into the ring itself.
Additionally, the device is still required to write to the original producer
ring in addition to the completion ring.
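
The "1 descriptor -> multiple completions" case from point 2 can be sketched as
follows. This is only an illustration of the idea, not a description of how
any particular NIC implements it: the completion record, the stride-based
padding rule and all names are assumptions made for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical completion record: several may refer to the same buffer. */
struct sketch_completion {
    uint16_t buf_id;  /* id of the posted buffer (descriptor) */
    uint32_t offset;  /* where the packet starts inside that buffer */
    uint32_t len;     /* packet length, excluding padding */
};

/* Place packets back to back in one posted buffer, padding each packet so
 * that the next one starts on a multiple of `stride`, and emit one completion
 * per packet. Returns how many packets fit; the remaining packets would need
 * another buffer. `out` must have room for n_pkts entries. */
static size_t fill_buffer(uint16_t buf_id, uint32_t buf_len, uint32_t stride,
                          const uint32_t *pkt_lens, size_t n_pkts,
                          struct sketch_completion *out)
{
    uint32_t offset = 0;
    size_t i;

    assert(stride > 0);
    for (i = 0; i < n_pkts; i++) {
        if (pkt_lens[i] > buf_len - offset)
            break;  /* packet does not fit in the remaining space */
        out[i].buf_id = buf_id;
        out[i].offset = offset;
        out[i].len = pkt_lens[i];
        /* Advance to the next stride boundary, clamped to the buffer end. */
        offset += (pkt_lens[i] + stride - 1) / stride * stride;
        if (offset > buf_len)
            offset = buf_len;
    }
    return i;
}
```

Every completion carries the same `buf_id` but a different `offset`, so the
driver can only recycle the buffer once it has seen all completions for it.
A format with exactly one used entry per descriptor cannot express this.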

I think the best and most flexible design is to have variable-size descriptors
that start with a dword header.
The dword header will include an ownership bit, an opcode and the descriptor
length.
The opcode and the "length" dwords following the header will be
device-specific.

The meaning of the owner bit changes on each ring wrap-around, so the device
doesn't need to update it.
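
A minimal sketch of such a header dword and of an owner bit whose meaning flips
on every wrap-around. The bit positions, field widths and function names are
hypothetical, since nothing in this thread fixes a layout. The sketch also
assumes a zero-filled ring, a power-of-two ring size and free-running 32-bit
indices.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical bit layout of the header dword. */
#define HDR_OWNER_BIT    0x00000001u
#define HDR_OPCODE_SHIFT 8
#define HDR_OPCODE_MASK  0xffu
#define HDR_LEN_SHIFT    16
#define HDR_LEN_MASK     0xffffu

static uint32_t hdr_pack(unsigned owner, uint8_t opcode, uint16_t len_dwords)
{
    assert(owner <= 1);
    return (owner ? HDR_OWNER_BIT : 0u) |
           ((uint32_t)opcode << HDR_OPCODE_SHIFT) |
           ((uint32_t)len_dwords << HDR_LEN_SHIFT);
}

static uint8_t hdr_opcode(uint32_t hdr)
{
    return (uint8_t)((hdr >> HDR_OPCODE_SHIFT) & HDR_OPCODE_MASK);
}

static uint16_t hdr_len_dwords(uint32_t hdr)
{
    return (uint16_t)((hdr >> HDR_LEN_SHIFT) & HDR_LEN_MASK);
}

/* Owner value that marks a slot as valid on the pass that index `idx` belongs
 * to: 1 on the first pass over the ring, 0 on the second, 1 on the third... */
static unsigned owner_for_pass(uint32_t idx, uint32_t ring_size)
{
    return ((idx / ring_size) & 1u) ^ 1u;
}

/* Consumer side: a slot is ready when its owner bit matches the value
 * expected for the consumer's current pass. Stale entries from the previous
 * pass carry the opposite value, so nobody has to clear the bit. */
static int slot_is_ready(uint32_t hdr, uint32_t consumed, uint32_t ring_size)
{
    return (hdr & HDR_OWNER_BIT) == owner_for_pass(consumed, ring_size);
}
```

Because the expected owner value alternates per pass, the consumer never has to
write the header back to hand the slot over.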

Each device (or device class) can choose whether completions are reported
directly inside the descriptors in that ring or in a separate completion ring.

Completion rings can be implemented in an efficient manner with this design.
The driver will initialize a dedicated completion ring with empty,
completion-sized descriptors,
and the device will write the completions directly into the ring.
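
As a rough sketch of that completion ring: the driver zeroes a ring of
fixed-size entries, the device writes completions straight into the slots, and
the driver polls the owner bit. Entry layout, ring size and function names are
all hypothetical. Driver and device state live in one struct only to keep the
example short. Real code would also need memory barriers and some way for the
device to learn the consumer index.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define CQ_SIZE   4u    /* hypothetical ring size */
#define CQE_OWNER 0x1u  /* hypothetical owner bit in the header dword */

static_assert((CQ_SIZE & (CQ_SIZE - 1)) == 0, "CQ_SIZE must be a power of two");

/* Hypothetical fixed-size completion entry. */
struct sketch_cqe {
    uint32_t hdr;       /* owner bit; a real header would carry more */
    uint32_t buf_id;    /* which posted buffer this completion refers to */
    uint32_t len;       /* bytes written by the device */
    uint32_t flow_tag;  /* example of extra metadata */
};

struct sketch_cq {
    struct sketch_cqe ring[CQ_SIZE];
    uint32_t dev_idx;   /* free-running count of completions written */
    uint32_t drv_idx;   /* free-running count of completions consumed */
};

/* Owner value that marks an entry valid on the pass `idx` belongs to. */
static uint32_t owner_for(uint32_t idx)
{
    return ((idx / CQ_SIZE) & 1u) ^ 1u;
}

/* Driver side: start with empty (zeroed) completion-sized entries. */
static void cq_init(struct sketch_cq *cq)
{
    memset(cq, 0, sizeof(*cq));
}

/* Device side: write the completion directly into the ring slot.
 * Returns 0 on success, -1 if the ring is full. */
static int cq_device_post(struct sketch_cq *cq, uint32_t buf_id, uint32_t len,
                          uint32_t flow_tag)
{
    struct sketch_cqe *e;

    if (cq->dev_idx - cq->drv_idx >= CQ_SIZE)
        return -1;
    e = &cq->ring[cq->dev_idx % CQ_SIZE];
    e->buf_id = buf_id;
    e->len = len;
    e->flow_tag = flow_tag;
    e->hdr = owner_for(cq->dev_idx);  /* header last; real code needs a barrier */
    cq->dev_idx++;
    return 0;
}

/* Driver side: returns 1 and copies the entry if one is ready, else 0. */
static int cq_driver_poll(struct sketch_cq *cq, struct sketch_cqe *out)
{
    const struct sketch_cqe *e = &cq->ring[cq->drv_idx % CQ_SIZE];

    if ((e->hdr & CQE_OWNER) != owner_for(cq->drv_idx))
        return 0;
    *out = *e;
    cq->drv_idx++;
    return 1;
}
```

The completion is written into the ring entry itself, so there is no extra
buffer to dereference, and the entry has room for metadata such as a flow tag.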
