Vlad Yasevich
2017-Apr-20 15:34 UTC
[PATCH RFC (resend) net-next 0/6] virtio-net: Add support for virtio-net header extensions
On 04/17/2017 11:01 PM, Jason Wang wrote:
> On 2017-04-16 00:38, Vladislav Yasevich wrote:
>> Currently the virtio net header is a fixed size and adding things to it is
>> rather difficult to do. This series attempts to add the infrastructure as
>> well as some extensions that try to resolve some deficiencies we currently
>> have.
>>
>> First, the vnet header only has space for 16 flags. This may not be enough
>> in the future. The extensions will provide space for 32 possible extension
>> flags and 32 possible extensions. These flags will be carried in the
>> first pseudo extension header, the presence of which will be determined by
>> a flag in the virtio net header.
>>
>> The extensions themselves will immediately follow the extension header.
>> They will be added to the packet in the same order as they appear in the
>> extension flags. No padding is placed between the extensions, and any
>> extensions negotiated but not needed by a given packet will convert to
>> trailing padding.
>
> Do we need an explicit padding (e.g. an extension) which could be
> controlled by each side?

I don't think so. The size of the vnet header is set based on the extensions
negotiated. The one part I am not crazy about is that in the case of a packet
not using any extensions, the data is still placed after the entire vnet
header, which essentially adds a lot of padding. However, that's really no
different than if we simply grew the vnet header.

The other thing I've tried before is putting extensions into their own sg
buffer, but that made it slower.

>> For example:
>> | vnet mrg hdr | ext hdr | ext 1 | ext 2 | ext 5 | .. pad .. | packet data |
>
> Just some rough thoughts:
>
> - Is it better to use TLV instead of a bitmap here? One advantage of TLV is
>   that the length is not limited by the length of the bitmap.

But the disadvantage is that we add at least 4 bytes per extension of just TL
data. That makes this thing even longer.

> - For 1.1, do we really want something like the vnet header? AFAIK, it is
>   not used by modern NICs. Is it better to pack all metadata into the
>   descriptor itself? This may need some changes in tun/macvtap, but looks
>   more PCIe friendly.

That would really be ideal and I've looked at this. There are small issues of
exposing the 'net metadata' of the descriptor to taps so they can be filled
in. The alternative is to use a different control structure for the
tap->qemu|vhost channel (that can be implementation specific) and have
qemu|vhost populate the 'net metadata' of the descriptor.

Thanks
-vlad

> Thanks
>
>> Extensions proposed in this series are:
>> - IPv6 fragment id extension
>>   * Currently, the guest generated fragment id is discarded and the host
>>     generates an IPv6 fragment id if the packet has to be fragmented. The
>>     code attempts to add time based perturbation to id generation to make
>>     it harder to guess the next fragment id to be used. However, doing this
>>     on the host may result in less perturbation (due to different timing)
>>     and might make id guessing easier. Ideally, the ids generated by the
>>     guest should be used. One could also argue that we are "violating" the
>>     IPv6 protocol under a _strict_ interpretation of the spec.
>>
>> - VLAN header acceleration
>>   * Currently virtio does not do vlan header acceleration and instead
>>     uses software tagging. One of the first things that the host will do is
>>     strip the vlan header out. When passing the packet to a guest the
>>     vlan header is re-inserted into the packet. We can skip all that work
>>     if we can pass the vlan data in accelerated format. Then the host will
>>     not do any extra work. However, so far, this yielded a very small
>>     perf bump (only ~1%). I am still looking into this.
>>
>> - UDP tunnel offload
>>   * Similar to vlan acceleration, with this extension we can pass additional
>>     data to the host to support GSO with UDP tunnels and possibly other
>>     encapsulations. This yields a significant performance improvement
>>     (still testing remote checksum code).
>>
>> An additional extension that is unfinished (due to still testing for any
>> side-effects) is checksum passthrough to support drivers that set
>> CHECKSUM_COMPLETE. This would eliminate the need for guests to compute
>> the software checksum.
>>
>> This series only takes care of virtio net. I have additional patches for the
>> host side (vhost and tap/macvtap as well as qemu), but wanted to get feedback
>> on the general approach first.
>>
>> Vladislav Yasevich (6):
>>   virtio-net: Remove the use the padded vnet_header structure
>>   virtio-net: make header length handling uniform
>>   virtio_net: Add basic skeleton for handling vnet header extensions.
>>   virtio-net: Add support for IPv6 fragment id vnet header extension.
>>   virtio-net: Add support for vlan acceleration vnet header extension.
>>   virtio-net: Add support for UDP tunnel offload and extension.
>>
>>  drivers/net/virtio_net.c        | 132 +++++++++++++++++++++++++++++++++-------
>>  include/linux/skbuff.h          |   5 ++
>>  include/linux/virtio_net.h      |  91 ++++++++++++++++++++++++++-
>>  include/uapi/linux/virtio_net.h |  38 ++++++++++++
>>  4 files changed, 242 insertions(+), 24 deletions(-)
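The layout rules described in the cover letter (a flags word in the first pseudo extension header, extensions packed in flag order with no padding between them, and negotiated-but-unused extensions turning into trailing padding) can be sketched roughly as below. This is only an illustration: the 4-byte extension header, the per-extension sizes, the bit numbers and all function names are assumptions made for the example, not values taken from the patches.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical sizes in bytes; the real series defines its own. */
#define EXT_HDR_SIZE 4u                 /* assumed: one 32-bit flags word */
static const uint8_t ext_size[32] = {
    [1] = 4,                            /* "ext 1" in the example layout */
    [2] = 4,                            /* "ext 2" */
    [5] = 8,                            /* "ext 5" */
};

/* Total size of every extension whose bit is set in mask. */
static size_t ext_bytes(uint32_t mask)
{
    size_t total = 0;

    for (unsigned bit = 0; bit < 32; bit++)
        if (mask & (UINT32_C(1) << bit))
            total += ext_size[bit];
    return total;
}

/* Space reserved after the vnet header; depends only on what was negotiated. */
static size_t ext_area_len(uint32_t negotiated)
{
    return EXT_HDR_SIZE + ext_bytes(negotiated);
}

/* Offset of extension `bit`, from the start of the extension header, in a
 * packet that carries the extensions in `present`. Extensions are packed in
 * flag order, so only lower-numbered present extensions come before it. */
static size_t ext_offset(uint32_t present, unsigned bit)
{
    assert(bit < 32 && (present & (UINT32_C(1) << bit)));
    uint32_t lower = present & ((UINT32_C(1) << bit) - 1);
    return EXT_HDR_SIZE + ext_bytes(lower);
}

/* Trailing padding: negotiated extensions this packet does not use. */
static size_t ext_padding(uint32_t negotiated, uint32_t present)
{
    assert((present & ~negotiated) == 0);
    return ext_bytes(negotiated) - ext_bytes(present);
}
```

With extensions 1, 2 and 5 negotiated (mask 0x26) and a packet that uses only 1 and 5 (mask 0x22), the reserved area stays at 20 bytes under these assumed sizes, extension 5 starts at offset 8, and 4 bytes become padding. That is the cost Vlad mentions: the packet data always sits after the full negotiated header.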
Jason Wang
2017-Apr-21 04:05 UTC
[PATCH RFC (resend) net-next 0/6] virtio-net: Add support for virtio-net header extensions
On 2017-04-20 23:34, Vlad Yasevich wrote:
> On 04/17/2017 11:01 PM, Jason Wang wrote:
>> Do we need an explicit padding (e.g. an extension) which could be
>> controlled by each side?
>
> I don't think so. The size of the vnet header is set based on the
> extensions negotiated. The one part I am not crazy about is that in the
> case of a packet not using any extensions, the data is still placed after
> the entire vnet header, which essentially adds a lot of padding. However,
> that's really no different than if we simply grew the vnet header.
>
> The other thing I've tried before is putting extensions into their own sg
> buffer, but that made it slower.

Yes.

>> Just some rough thoughts:
>>
>> - Is it better to use TLV instead of a bitmap here? One advantage of TLV
>>   is that the length is not limited by the length of the bitmap.
>
> But the disadvantage is that we add at least 4 bytes per extension of just
> TL data. That makes this thing even longer.

Yes, and it looks like the length is still limited by e.g. the length of T.

>> - For 1.1, do we really want something like the vnet header? AFAIK, it is
>>   not used by modern NICs. Is it better to pack all metadata into the
>>   descriptor itself? This may need some changes in tun/macvtap, but looks
>>   more PCIe friendly.
>
> That would really be ideal and I've looked at this. There are small issues
> of exposing the 'net metadata' of the descriptor to taps so they can be
> filled in. The alternative is to use a different control structure for the
> tap->qemu|vhost channel (that can be implementation specific) and have
> qemu|vhost populate the 'net metadata' of the descriptor.

Yes, this needs some thought. For vhost, things look a little bit easier; we
can probably use msg_control.

Thanks
Vlad Yasevich
2017-Apr-21 13:08 UTC
[PATCH RFC (resend) net-next 0/6] virtio-net: Add support for virtio-net header extensions
On 04/21/2017 12:05 AM, Jason Wang wrote:
> On 2017-04-20 23:34, Vlad Yasevich wrote:
>> On 04/17/2017 11:01 PM, Jason Wang wrote:
>>> - Is it better to use TLV instead of a bitmap here? One advantage of TLV
>>>   is that the length is not limited by the length of the bitmap.
>>
>> But the disadvantage is that we add at least 4 bytes per extension of just
>> TL data. That makes this thing even longer.
>
> Yes, and it looks like the length is still limited by e.g. the length of T.

Not only that, but it is also limited by skb->cb as a whole. So putting
extensions into a TLV style means we have fewer extensions for now, until we
get rid of the skb->cb usage.

>>> - For 1.1, do we really want something like the vnet header? AFAIK, it is
>>>   not used by modern NICs. Is it better to pack all metadata into the
>>>   descriptor itself? This may need some changes in tun/macvtap, but looks
>>>   more PCIe friendly.
>>
>> That would really be ideal and I've looked at this. There are small issues
>> of exposing the 'net metadata' of the descriptor to taps so they can be
>> filled in. The alternative is to use a different control structure for the
>> tap->qemu|vhost channel (that can be implementation specific) and have
>> qemu|vhost populate the 'net metadata' of the descriptor.
>
> Yes, this needs some thought. For vhost, things look a little bit easier;
> we can probably use msg_control.

We can use msg_control in qemu as well, can't we? It really is a question of
who is doing the work and the number of copies.

I can take a closer look at how it would look if we extend the descriptor
with type specific data. I don't know if other users of virtio would benefit
from it?

-vlad
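To make the bitmap versus TLV trade-off from this exchange concrete, here is a rough back-of-the-envelope sketch. It assumes the whole of skb->cb (48 bytes in the Linux sk_buff) is available for extension data and that every extension has the same payload size; neither is claimed by the thread, and the function names are made up for the example. The 4-byte per-extension TL cost is the minimum quoted above, and the 4-byte bitmap is an assumed single 32-bit flags word.

```c
#include <stddef.h>

#define SKB_CB_BYTES 48u   /* size of the cb[] scratch area in struct sk_buff */
#define BITMAP_BYTES 4u    /* assumed: one 32-bit flags word for all extensions */
#define TL_BYTES     4u    /* per-extension type+length cost quoted in the thread */

/* How many extensions of payload_bytes each fit in budget when all of them
 * share one bitmap. */
static unsigned fit_bitmap(size_t budget, size_t payload_bytes)
{
    if (budget < BITMAP_BYTES || payload_bytes == 0)
        return 0;
    return (unsigned)((budget - BITMAP_BYTES) / payload_bytes);
}

/* How many fit when each extension carries its own type and length fields. */
static unsigned fit_tlv(size_t budget, size_t payload_bytes)
{
    return (unsigned)(budget / (TL_BYTES + payload_bytes));
}
```

Under these assumptions, 4-byte extensions fit 11 at a time with a bitmap but only 6 with TLV, which is the "fewer extensions for now" effect Vlad describes. The gap narrows as the payloads get larger.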
Michael S. Tsirkin
2017-Apr-24 17:04 UTC
[PATCH RFC (resend) net-next 0/6] virtio-net: Add support for virtio-net header extensions
On Thu, Apr 20, 2017 at 11:34:57AM -0400, Vlad Yasevich wrote:
> > - For 1.1, do we really want something like the vnet header? AFAIK, it is
> >   not used by modern NICs. Is it better to pack all metadata into the
> >   descriptor itself? This may need some changes in tun/macvtap, but looks
> >   more PCIe friendly.
>
> That would really be ideal and I've looked at this.

We already have at least 16 unused bits in the used ring (the head is 16 bits,
but we are using 32 for it).

--
MST
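The spare bits mentioned here come from the used ring element storing the descriptor head index in a 32-bit `id` field while the index itself needs only 16 bits. A minimal sketch of how the upper half could carry per-packet metadata follows. The packing scheme and the helper names are purely hypothetical; the virtio specification does not define any such use, and endianness conversion is ignored here.

```c
#include <stdint.h>

/* Same shape as the used ring element: a 32-bit id and a 32-bit length. */
struct used_elem {
    uint32_t id;    /* head index of the used descriptor chain */
    uint32_t len;   /* bytes written into the buffer */
};

/* Hypothetical: keep the 16-bit head in the low half, metadata in the high half. */
static uint32_t used_id_pack(uint16_t head, uint16_t meta)
{
    return ((uint32_t)meta << 16) | head;
}

static uint16_t used_id_head(uint32_t id)
{
    return (uint16_t)(id & 0xffffu);
}

static uint16_t used_id_meta(uint32_t id)
{
    return (uint16_t)(id >> 16);
}
```

A side that does not know about the metadata would see a nonzero upper half as an invalid head index, so a scheme like this would presumably need to be negotiated as a feature first.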