Yuri Benditovich
2021-Jan-18 09:09 UTC
[RFC PATCH 0/7] Support for virtio-net hash reporting
On Mon, Jan 18, 2021 at 4:46 AM Jason Wang <jasowang at redhat.com> wrote:
>
> On 2021/1/17 3:57, Yuri Benditovich wrote:
> > On Thu, Jan 14, 2021 at 5:39 AM Jason Wang <jasowang at redhat.com> wrote:
> >>
> >> On 2021/1/13 10:33, Willem de Bruijn wrote:
> >>> On Tue, Jan 12, 2021 at 11:11 PM Jason Wang <jasowang at redhat.com> wrote:
> >>>> On 2021/1/13 7:47, Willem de Bruijn wrote:
> >>>>> On Tue, Jan 12, 2021 at 3:29 PM Yuri Benditovich
> >>>>> <yuri.benditovich at daynix.com> wrote:
> >>>>>> On Tue, Jan 12, 2021 at 9:49 PM Yuri Benditovich
> >>>>>> <yuri.benditovich at daynix.com> wrote:
> >>>>>>> On Tue, Jan 12, 2021 at 9:41 PM Yuri Benditovich
> >>>>>>> <yuri.benditovich at daynix.com> wrote:
> >>>>>>>> The existing TUN module is able to use a provided "steering eBPF"
> >>>>>>>> to calculate a per-packet hash and derive the destination queue to
> >>>>>>>> place the packet in. The eBPF uses mapped configuration data
> >>>>>>>> containing a key for hash calculation and an indirection table
> >>>>>>>> with an array of queue indices.
> >>>>>>>>
> >>>>>>>> This series of patches adds support for the virtio-net hash
> >>>>>>>> reporting feature as defined in the virtio specification. It
> >>>>>>>> extends the TUN module and the "steering eBPF" as follows:
> >>>>>>>>
> >>>>>>>> The extended steering eBPF calculates the hash value and hash
> >>>>>>>> type, keeps the hash value in skb->hash, and returns the index of
> >>>>>>>> the destination virtqueue and the type of the hash. The TUN module
> >>>>>>>> keeps the returned hash type in a (currently unused) field of the
> >>>>>>>> skb. skb->__unused is renamed to 'hash_report_type'.
> >>>>>>>>
> >>>>>>>> When the TUN module is later called to allocate and fill the
> >>>>>>>> virtio-net header and push it to the destination virtqueue, it
> >>>>>>>> populates the hash and the hash type into the virtio-net header.
> >>>>>>>>
> >>>>>>>> The VHOST driver is made aware of the respective virtio-net
> >>>>>>>> feature that extends the virtio-net header to report the hash
> >>>>>>>> value and hash report type.
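For reference, the extended virtio-net header that the series populates
looks roughly as follows. This is a sketch following the virtio 1.1 spec
(the VIRTIO_NET_F_HASH_REPORT feature), not the literal patch code:

/* Extended virtio-net header when VIRTIO_NET_F_HASH_REPORT is
 * negotiated; layout per the virtio 1.1 spec. Types come from
 * <linux/types.h> and <uapi/linux/virtio_net.h>. */
struct virtio_net_hdr_v1_hash {
	struct virtio_net_hdr_v1 hdr;	/* flags, GSO and checksum fields */
	__le32 hash_value;		/* hash over the L3/L4 tuple */
	__le16 hash_report;		/* one of VIRTIO_NET_HASH_REPORT_* */
	__le16 padding;
};

/* hash_report values defined by the spec (the IPv6 extension-header
 * variants are omitted here for brevity) */
#define VIRTIO_NET_HASH_REPORT_NONE	0
#define VIRTIO_NET_HASH_REPORT_IPv4	1
#define VIRTIO_NET_HASH_REPORT_TCPv4	2
#define VIRTIO_NET_HASH_REPORT_UDPv4	3
#define VIRTIO_NET_HASH_REPORT_IPv6	4
#define VIRTIO_NET_HASH_REPORT_TCPv6	5
#define VIRTIO_NET_HASH_REPORT_UDPv6	6

The "hash type" debated throughout the thread is the hash_report field
of this header.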
> >>>>>>> Comment from Willem de Bruijn:
> >>>>>>>
> >>>>>>> Skbuff fields are in short supply. I don't think we need to add one
> >>>>>>> just for this narrow path entirely internal to the tun device.
> >>>>>>>
> >>>>>> We understand that and try to minimize the impact by using an
> >>>>>> already existing unused field of the skb.
> >>>>> Not anymore. It was repurposed as a flags field very recently.
> >>>>>
> >>>>> This use case is also very narrow in scope. And a very short path
> >>>>> from data producer to consumer. So I don't think it needs to claim
> >>>>> scarce bits in the skb.
> >>>>>
> >>>>> tun_ebpf_select_queue stores the field, tun_put_user reads it and
> >>>>> converts it to the virtio_net_hdr in the descriptor.
> >>>>>
> >>>>> tun_ebpf_select_queue is called from .ndo_select_queue. Storing the
> >>>>> field in skb->cb is fragile, as in theory some code could overwrite
> >>>>> that field between ndo_select_queue and ndo_start_xmit/tun_net_xmit,
> >>>>> from which point it is fully under tun control again. But in
> >>>>> practice, I don't believe anything does.
> >>>>>
> >>>>> Alternatively an existing skb field that is used only on disjoint
> >>>>> datapaths, such as ingress-only, could be viable.
> >>>> A question here. We had metadata support in XDP for cooperation
> >>>> between eBPF programs. Do we have something similar in the skb?
> >>>>
> >>>> E.g. in RSS, if we want to pass some metadata information between the
> >>>> eBPF program and the logic that generates the vnet header (either
> >>>> hard logic in the kernel or another eBPF program), is there any way
> >>>> to avoid possible conflicts with qdiscs?
> >>> Not that I am aware of. The closest thing is cb[].
> >>>
> >>> It'll have to alias a field like that, one that is known to be unused
> >>> for the given path.
> >>
> >> Right, we need to make sure cb is not used by anything else. I'm not
> >> sure how hard that is to achieve, considering QEMU installs the eBPF
> >> program but doesn't deal with networking configurations.
> >>
> >>
> >>> One other approach that has been used within linear call stacks is
> >>> out of band. Like the percpu variables softnet_data.xmit.more and
> >>> mirred_rec_level. But that is perhaps a bit overwrought for this use
> >>> case.
> >>
> >> Yes, and if we go that way then eBPF turns out to be a burden since we
> >> need to invent helpers to access those auxiliary data structures. It
> >> would be better then to hard-code the RSS in the kernel.
> >>
> >>
> >>>>>>> Instead, you could just run the flow_dissector in tun_put_user if
> >>>>>>> the feature is negotiated. Indeed, the flow dissector seems more
> >>>>>>> apt to me than BPF here. Note that the flow dissector internally
> >>>>>>> can be overridden by a BPF program if the admin so chooses.
> >>>>>>>
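To make the flow-dissector suggestion concrete: a minimal sketch of how
tun_put_user could derive the hash and report type without eBPF. The
flow-dissector calls (skb_flow_dissect_flow_keys, skb_get_hash) are real
kernel APIs; the helper itself and its use of the VIRTIO_NET_HASH_REPORT_*
values above are hypothetical, not part of the actual series:

#include <linux/skbuff.h>
#include <linux/in.h>
#include <net/flow_dissector.h>

/* Hypothetical: classify the packet and map the result to a
 * VIRTIO_NET_HASH_REPORT_* type; the caller would store *hash and the
 * returned type into the virtio-net header. */
static u16 tun_vnet_hash_report(struct sk_buff *skb, u32 *hash)
{
	struct flow_keys keys;

	if (!skb_flow_dissect_flow_keys(skb, &keys, 0))
		return VIRTIO_NET_HASH_REPORT_NONE;

	*hash = skb_get_hash(skb);

	switch (keys.control.addr_type) {
	case FLOW_DISSECTOR_KEY_IPV4_ADDRS:
		if (keys.basic.ip_proto == IPPROTO_TCP)
			return VIRTIO_NET_HASH_REPORT_TCPv4;
		if (keys.basic.ip_proto == IPPROTO_UDP)
			return VIRTIO_NET_HASH_REPORT_UDPv4;
		return VIRTIO_NET_HASH_REPORT_IPv4;
	case FLOW_DISSECTOR_KEY_IPV6_ADDRS:
		if (keys.basic.ip_proto == IPPROTO_TCP)
			return VIRTIO_NET_HASH_REPORT_TCPv6;
		if (keys.basic.ip_proto == IPPROTO_UDP)
			return VIRTIO_NET_HASH_REPORT_UDPv6;
		return VIRTIO_NET_HASH_REPORT_IPv6;
	}
	return VIRTIO_NET_HASH_REPORT_NONE;
}

As the thread goes on to note, this covers plain hash reporting but not
RSS with a guest-supplied key: skb_get_hash uses the kernel's (or the
NIC's) hash, not a Toeplitz hash over the guest's key.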
> >>>>>> While this set of patches is related to hash delivery in the
> >>>>>> virtio-net packet in general, it was prepared in the context of the
> >>>>>> RSS feature implementation as defined in the virtio spec [1].
> >>>>>> In the case of RSS it is not enough to run the flow_dissector in
> >>>>>> tun_put_user: in tun_ebpf_select_queue the TUN calls eBPF to
> >>>>>> calculate the hash, hash type and queue index according to the
> >>>>>> (mapped) parameters (key, hash types, indirection table) received
> >>>>>> from the guest.
> >>>>> TUNSETSTEERINGEBPF was added to support more diverse queue selection
> >>>>> than the default in the case of multiqueue tun. Not sure what the
> >>>>> exact use cases are.
> >>>>>
> >>>>> But RSS is exactly the purpose of the flow dissector. It is used for
> >>>>> that purpose in the software variant RPS. The flow dissector
> >>>>> implements a superset of the RSS spec, and certainly computes a
> >>>>> four-tuple for TCP/IPv6. In the case of RPS, it is skipped if the
> >>>>> NIC has already computed a 4-tuple hash.
> >>>>>
> >>>>> What it does not give is a type indication, such as
> >>>>> VIRTIO_NET_HASH_TYPE_TCPv6. I don't understand how this would be
> >>>>> used. In datapaths where the NIC has already computed the four-tuple
> >>>>> hash and stored it in skb->hash -- the common case for servers --
> >>>>> that type field is the only reason to have to compute again.
> >>>> The problem is there's no guarantee that the packet comes from the
> >>>> NIC; it could be a simple VM2VM or host2VM packet.
> >>>>
> >>>> And even if the packet is coming from a NIC that calculates the hash,
> >>>> there's no guarantee that it's the hash the guest wants (the guest
> >>>> may use different RSS keys).
> >>> Ah yes, of course.
> >>>
> >>> I would still revisit the need to store a detailed hash_type along
> >>> with the hash, as far as I can tell that conveys no actionable
> >>> information to the guest.
> >>
> >> Yes, we need to figure out its usage. According to [1], it only says
> >> that storing the hash type is the driver's job. Maybe Yuri can answer
> >> this.
> >>
> > For the case of a Windows VM we can't know how exactly the network
> > stack uses the provided hash data (including the hash type). But:
> > different releases of Windows enable different hash types (for
> > example, the UDP hash is enabled only on Server 2016 and up).
> >
> > Indeed, Windows requires a little more from the network adapter/driver
> > than Linux does.
> >
> > The addition of RSS support to the virtio specification takes into
> > account the widest set of requirements (i.e. the Windows ones); our
> > initial impression is that this should also be enough for Linux.
> >
> > The NDIS specification is _mandatory_ in the part concerning RSS, and
> > there are certification tests that check that the driver provides the
> > hash data as expected. All the high-performance network adapters have
> > such RSS functionality in the hardware.
> > With pre-RSS QEMU (i.e. where the virtio-net device does not indicate
> > RSS support) the virtio-net driver for Windows does all the work
> > related to RSS:
> > - hash calculation
> > - hash/hash_type delivery
> > - reporting each packet on the correct CPU according to the RSS settings
> >
> > With RSS support in QEMU all the packets always arrive on the proper
> > CPU and the driver never needs to reschedule them. The driver still
> > needs to calculate the hash and report it to Windows. In this case we
> > do the same job twice: the device (QEMU or eBPF) calculates the hash
> > and picks the proper queue/CPU to deliver the packet to. But the hash
> > is not delivered by the device, so the driver needs to recalculate it
> > and report it to Windows.
> >
> > If we add HASH_REPORT support (the current set of patches) and the
> > device indicates this feature, we can avoid the hash recalculation in
> > the driver, assuming we receive the correct hash value and hash type.
> > Otherwise the driver can't know which hash exactly the device has
> > calculated.
> >
> > Please let me know if I did not answer the question.
>
> I think I get you. The hash type is also a kind of classification (e.g.
> TCP or UDP). Any possibility that it can be deduced by the driver? (Or
> would it be too expensive to do that?)
>

The driver does it today (when the device does not offer any features)
and of course can continue doing it. IMO, if the device can't report the
data according to the spec, it should not indicate support for the
respective feature (or fall back to vhost=off).

Again, IMO, if Linux does not need the exact hash_type we can use (for
Linux) the approach that Willem de Bruijn suggested in his patchset:
- just add VIRTIO_NET_HASH_REPORT_L4 to the spec
- Linux can use MQ + hash delivery (and use VIRTIO_NET_HASH_REPORT_L4)
- Linux can use (if it makes sense) RSS with VIRTIO_NET_HASH_REPORT_L4
  and eBPF
- Windows gets what it needs + eBPF
So everyone has what they need, at the respective cost.

Regarding the use of skb->cb for the hash type:
Currently, if I'm not mistaken, there are 2 bytes free at the end of
skb->cb: skb->cb is a 48-byte array, and skb_gso_cb (14 bytes) sits at
offset SKB_GSO_CB_OFFSET (32).
Is it possible to use one of these 2 bytes for the hash_type?
If yes, shall we extend skb_gso_cb and place the 1-byte hash_type in it,
or just emit a compilation error if skb_gso_cb grows beyond 15 bytes?

> Thanks
>
>
> >> Thanks
> >>
> >> [1]
> >> https://docs.microsoft.com/en-us/windows-hardware/drivers/network/indicating-rss-receive-data
> >>
> >>
>
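As a footnote to Yuri's skb->cb question: a minimal sketch of the
compile-time guard he describes, assuming the bytes after skb_gso_cb
really are free on this path. SKB_GSO_CB_OFFSET, struct skb_gso_cb,
sizeof_field and BUILD_BUG_ON are real kernel definitions; the tun_cb_*
helpers are hypothetical:

#include <linux/skbuff.h>
#include <linux/build_bug.h>

/* Hypothetical: stash the 1-byte hash type in the tail of skb->cb,
 * right after skb_gso_cb, and break the build if skb_gso_cb ever
 * grows into that byte. */
#define TUN_HASH_CB_OFFSET \
	(SKB_GSO_CB_OFFSET + sizeof(struct skb_gso_cb))

static inline void tun_cb_set_hash_type(struct sk_buff *skb, u8 type)
{
	BUILD_BUG_ON(TUN_HASH_CB_OFFSET + sizeof(u8) >
		     sizeof_field(struct sk_buff, cb));
	skb->cb[TUN_HASH_CB_OFFSET] = type;
}

static inline u8 tun_cb_get_hash_type(const struct sk_buff *skb)
{
	return skb->cb[TUN_HASH_CB_OFFSET];
}

Willem's caveat above still applies: anything between ndo_select_queue
and tun_net_xmit that writes skb->cb could clobber this byte.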