On Thu, Apr 20, 2017 at 2:27 AM, Jason Wang <jasowang at redhat.com> wrote:
>
> On 2017-04-19 04:21, Willem de Bruijn wrote:
>>
>> +static void virtnet_napi_tx_enable(struct virtnet_info *vi,
>> +                                   struct virtqueue *vq,
>> +                                   struct napi_struct *napi)
>> +{
>> +       if (!napi->weight)
>> +               return;
>> +
>> +       if (!vi->affinity_hint_set) {
>> +               napi->weight = 0;
>> +               return;
>> +       }
>> +
>> +       return virtnet_napi_enable(vq, napi);
>> +}
>> +
>>  static void refill_work(struct work_struct *work)
>
> Maybe I was wrong, but according to Michael's comment it looks like he
> wants to check affinity_hint_set just for speculative tx polling on rx
> napi instead of disabling it entirely.
>
> And I'm not convinced this is really needed: the driver only provides an
> affinity hint instead of affinity, so it's not guaranteed that tx and rx
> interrupts are on the same vcpus.

You're right. I made the restriction broader than the request, to really err
on the side of caution for the initial merge of napi tx. And enabling
the optimization is always a win over keeping it off, even without irq
affinity.

The cycle cost is significant without affinity regardless of whether the
optimization is used. Though this is not limited to napi-tx, it is more
pronounced in that mode than without napi.

1x TCP_RR for affinity configuration {process, rx_irq, tx_irq}:

upstream:

1,1,1: 28985 Mbps, 278 Gcyc
1,0,2: 30067 Mbps, 402 Gcyc

napi tx:

1,1,1: 34492 Mbps, 269 Gcyc
1,0,2: 36527 Mbps, 537 Gcyc (!)
1,0,1: 36269 Mbps, 394 Gcyc
1,0,0: 34674 Mbps, 402 Gcyc

This is a particularly strong example. It is also representative
of most RR tests. It is less pronounced in other streaming tests.
10x TCP_RR, for instance:

upstream:

1,1,1: 42267 Mbps, 301 Gcyc
1,0,2: 40663 Mbps, 445 Gcyc

napi tx:

1,1,1: 42420 Mbps, 303 Gcyc
1,0,2: 42267 Mbps, 431 Gcyc

These numbers were obtained with the virtqueue_enable_cb_delayed
optimization after xmit_skb, btw. It turns out that moving it before
xmit_skb increases 1x TCP_RR further to ~39 Gbps, at the cost of reducing
100x TCP_RR a bit.
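[For context, a minimal sketch of the ordering the numbers above were
measured with: re-arm the delayed tx completion callback only after the new
packet has been queued. The helper name xmit_one_and_rearm is hypothetical;
xmit_skb() and struct send_queue are the driver-internal names from
virtio_net.c. Illustrative only, not code from the posted series.]

static int xmit_one_and_rearm(struct send_queue *sq, struct sk_buff *skb)
{
	int err;

	/* queue the new packet first ... */
	err = xmit_skb(sq, skb);
	if (unlikely(err))
		return err;

	/* ... then re-arm the delayed tx completion callback. Moving this
	 * call before xmit_skb() is the variant said to help 1x TCP_RR at
	 * some cost to 100x TCP_RR.
	 */
	virtqueue_enable_cb_delayed(sq->vq);

	return 0;
}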
On 2017-04-20 21:58, Willem de Bruijn wrote:
> On Thu, Apr 20, 2017 at 2:27 AM, Jason Wang <jasowang at redhat.com> wrote:
>>
>> On 2017-04-19 04:21, Willem de Bruijn wrote:
>>> +static void virtnet_napi_tx_enable(struct virtnet_info *vi,
>>> +                                   struct virtqueue *vq,
>>> +                                   struct napi_struct *napi)
>>> +{
>>> +       if (!napi->weight)
>>> +               return;
>>> +
>>> +       if (!vi->affinity_hint_set) {
>>> +               napi->weight = 0;
>>> +               return;
>>> +       }
>>> +
>>> +       return virtnet_napi_enable(vq, napi);
>>> +}
>>> +
>>>  static void refill_work(struct work_struct *work)
>>
>> Maybe I was wrong, but according to Michael's comment it looks like he
>> wants to check affinity_hint_set just for speculative tx polling on rx
>> napi instead of disabling it entirely.
>>
>> And I'm not convinced this is really needed: the driver only provides an
>> affinity hint instead of affinity, so it's not guaranteed that tx and rx
>> interrupts are on the same vcpus.
>
> You're right. I made the restriction broader than the request, to really err
> on the side of caution for the initial merge of napi tx. And enabling
> the optimization is always a win over keeping it off, even without irq
> affinity.
>
> The cycle cost is significant without affinity regardless of whether the
> optimization is used.

Yes, I noticed this in the past too.

> Though this is not limited to napi-tx, it is more
> pronounced in that mode than without napi.
>
> 1x TCP_RR for affinity configuration {process, rx_irq, tx_irq}:
>
> upstream:
>
> 1,1,1: 28985 Mbps, 278 Gcyc
> 1,0,2: 30067 Mbps, 402 Gcyc
>
> napi tx:
>
> 1,1,1: 34492 Mbps, 269 Gcyc
> 1,0,2: 36527 Mbps, 537 Gcyc (!)
> 1,0,1: 36269 Mbps, 394 Gcyc
> 1,0,0: 34674 Mbps, 402 Gcyc
>
> This is a particularly strong example. It is also representative
> of most RR tests. It is less pronounced in other streaming tests.
> 10x TCP_RR, for instance:
>
> upstream:
>
> 1,1,1: 42267 Mbps, 301 Gcyc
> 1,0,2: 40663 Mbps, 445 Gcyc
>
> napi tx:
>
> 1,1,1: 42420 Mbps, 303 Gcyc
> 1,0,2: 42267 Mbps, 431 Gcyc
>
> These numbers were obtained with the virtqueue_enable_cb_delayed
> optimization after xmit_skb, btw. It turns out that moving it before
> xmit_skb increases 1x TCP_RR further to ~39 Gbps, at the cost of reducing
> 100x TCP_RR a bit.

I see, so I think we can leave the affinity hint optimization/check for
future investigation:

- to avoid endless optimization (e.g. we may want to share a single
  vector/napi for tx/rx queue pairs in the future) for this series.
- tx napi is disabled by default, which means we can do the optimization
  on top.

Thanks
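[If the narrower check Michael suggested is revisited later, it might look
roughly like the sketch below: gate only the speculative tx cleaning done
from the rx napi handler on the affinity hint, rather than disabling tx
napi as a whole. The helper name and exact shape are hypothetical and not
taken from the posted patches; sq->napi, free_old_xmit_skbs() and
__netif_tx_trylock() are assumed from the napi-tx series and the core
netdev API.]

static void virtnet_rx_cleantx_sketch(struct virtnet_info *vi,
				      struct send_queue *sq,
				      struct netdev_queue *txq)
{
	/* tx napi not enabled on this queue */
	if (!sq->napi.weight)
		return;

	/* Speculatively cleaning the tx ring from rx napi only pays off
	 * when tx and rx interrupts are likely to hit the same cpu, which
	 * the affinity hint merely suggests, not guarantees.
	 */
	if (!vi->affinity_hint_set)
		return;

	if (__netif_tx_trylock(txq)) {
		free_old_xmit_skbs(sq);
		__netif_tx_unlock(txq);
	}
}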
>>> Maybe I was wrong, but according to Michael's comment it looks like he
>>> wants to check affinity_hint_set just for speculative tx polling on rx
>>> napi instead of disabling it entirely.
>>>
>>> And I'm not convinced this is really needed: the driver only provides an
>>> affinity hint instead of affinity, so it's not guaranteed that tx and rx
>>> interrupts are on the same vcpus.
>>
>> You're right. I made the restriction broader than the request, to really
>> err on the side of caution for the initial merge of napi tx. And enabling
>> the optimization is always a win over keeping it off, even without irq
>> affinity.
>>
>> The cycle cost is significant without affinity regardless of whether the
>> optimization is used.
>
> Yes, I noticed this in the past too.
>
>> Though this is not limited to napi-tx, it is more
>> pronounced in that mode than without napi.
>>
>> 1x TCP_RR for affinity configuration {process, rx_irq, tx_irq}:
>>
>> upstream:
>>
>> 1,1,1: 28985 Mbps, 278 Gcyc
>> 1,0,2: 30067 Mbps, 402 Gcyc
>>
>> napi tx:
>>
>> 1,1,1: 34492 Mbps, 269 Gcyc
>> 1,0,2: 36527 Mbps, 537 Gcyc (!)
>> 1,0,1: 36269 Mbps, 394 Gcyc
>> 1,0,0: 34674 Mbps, 402 Gcyc
>>
>> This is a particularly strong example. It is also representative
>> of most RR tests. It is less pronounced in other streaming tests.
>> 10x TCP_RR, for instance:
>>
>> upstream:
>>
>> 1,1,1: 42267 Mbps, 301 Gcyc
>> 1,0,2: 40663 Mbps, 445 Gcyc
>>
>> napi tx:
>>
>> 1,1,1: 42420 Mbps, 303 Gcyc
>> 1,0,2: 42267 Mbps, 431 Gcyc
>>
>> These numbers were obtained with the virtqueue_enable_cb_delayed
>> optimization after xmit_skb, btw. It turns out that moving it before
>> xmit_skb increases 1x TCP_RR further to ~39 Gbps, at the cost of reducing
>> 100x TCP_RR a bit.
>
> I see, so I think we can leave the affinity hint optimization/check for
> future investigation:
>
> - to avoid endless optimization (e.g. we may want to share a single
>   vector/napi for tx/rx queue pairs in the future) for this series.
> - tx napi is disabled by default, which means we can do the optimization
>   on top.

Okay. I'll drop the vi->affinity_hint_set check from the patch set for now.
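[With the check dropped, the enable helper from the quoted hunk would
presumably reduce to something like the following. Illustrative sketch
only; the next revision of the patch may differ.]

static void virtnet_napi_tx_enable(struct virtnet_info *vi,
				   struct virtqueue *vq,
				   struct napi_struct *napi)
{
	/* napi->weight == 0 means tx napi is disabled */
	if (!napi->weight)
		return;

	virtnet_napi_enable(vq, napi);
}

[Whether the now-unused vi parameter stays would depend on the rest of the
series and its call sites.]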