thr3ads.net - Linux Virtualization - [PATCH vhost] vhost_net: Fix too many vring kick on busypoll [Jul 2018]

If this information is useful, please help other people find it:
Share via:

Toshiaki Makita

2018-Jul-02 02:45 UTC

[PATCH vhost] vhost_net: Fix too many vring kick on busypoll

Hi Jason,

On 2018/06/29 18:30, Jason Wang wrote:> On 2018?06?29? 16:09, Toshiaki Makita wrote:
...>> To fix this, poll the work instead of enabling notification when
>> busypoll is interrupted by something. IMHO signal_pending() and
>> vhost_has_work() are kind of interruptions rather than signals to
>> completely cancel the busypoll, so let's run busypoll after the
>> necessary work is done. In order to avoid too long busyloop due to
>> interruption, save the endtime in vq field and use it when reentering
>> the work function.
> 
> I think we don't care long busyloop unless e.g tx can starve rx?
I just want to keep it user-controllable. Unless memorizing it busypoll
can run unexpectedly long.
>> I observed this problem makes tx performance poor but does not rx. I
>> guess it is because rx notification from socket is not that heavy so
>> did not impact the performance even when we failed to mask the
>> notification.
> 
> I think the reason is:
> 
> For tx, we may only process used ring under handle_tx(). Busy polling
> code can not recognize this case.
> For rx, we call peek_head_len() after exiting busy loop, so if the work
> is for rx, it can be processed in handle_rx() immediately.
Sorry but I don't see the difference. Tx busypoll calls
vhost_get_vq_desc() immediately after busypoll exits in
vhost_net_tx_get_vq_desc().

The scenario I described above (in 2nd paragraph) is a bit simplified.
To be clearer what I think is happening is:

1. handle_tx() runs busypoll with notification enabled (due to zerocopy
   callback or something).
2. Guest produces a packet in vring.
3. handle_tx() busypoll detects the produced packet and get it.
4. While vhost is processing the packet, guest kicks vring because it
   produced the packet. Vring notification is disabled automatically by
   event_idx and tx work is queued.
5. After processing the packet vhost enters busyloop again, but detects
   tx work and exits busyloop immediately. Then handle_tx() exits after
   enabling the notification.
6. Because the queued work was tx, handle_tx() is called immediately
   again. handle_tx() runs busypoll with notification enabled.
7. Repeat 2-6.
> 
>> Anyway for consistency I fixed rx routine as well as tx.
>>
>> Performance numbers:
>>
>> - Bulk transfer from guest to external physical server.
>> ???? [Guest]->vhost_net->tap--(XDP_REDIRECT)-->i40e
--(wire)--> [Server]
> 
> Just to confirm in this case since zerocopy is enabled, we are in fact
> use the generic XDP datapath?
For some reason zerocopy was not applied for most packets, so in most
cases driver XDP was used. I was going to dig into it but not yet.
> 
>> - Set 10us busypoll.
>> - Guest disables checksum and TSO because of host XDP.
>> - Measured single flow Mbps by netperf, and kicks by perf kvm stat
>> ?? (EPT_MISCONFIG event).
>>
>> ???????????????????????????? Before????????????? After
>> ?????????????????????????? Mbps? kicks/s????? Mbps? kicks/s
>> UDP_STREAM 1472byte????????????? 247758???????????????? 27
>> ???????????????? Send?? 3645.37??????????? 6958.10
>> ???????????????? Recv?? 3588.56??????????? 6958.10
>> ?????????????? 1byte??????????????? 9865???????????????? 37
>> ???????????????? Send????? 4.34?????????????? 5.43
>> ???????????????? Recv????? 4.17?????????????? 5.26
>> TCP_STREAM???????????? 8801.03??? 45794?? 9592.77???? 2884
> 
> Impressive numbers.
> 
>>
>> Signed-off-by: Toshiaki Makita <makita.toshiaki at lab.ntt.co.jp>
>> ---
...>> +static bool vhost_busy_poll_interrupted(struct vhost_dev *dev)
>> +{
>> +??? return unlikely(signal_pending(current)) || vhost_has_work(dev);
>> ? }
>> ? ? static void vhost_net_disable_vq(struct vhost_net *n,
>> @@ -437,10 +438,21 @@ static int vhost_net_tx_get_vq_desc(struct
>> vhost_net *net,
>> ? ????? if (r == vq->num && vq->busyloop_timeout) {
>> ????????? preempt_disable();
>> -??????? endtime = busy_clock() + vq->busyloop_timeout;
>> -??????? while (vhost_can_busy_poll(vq->dev, endtime) &&
>> -?????????????? vhost_vq_avail_empty(vq->dev, vq))
>> +??????? if (vq->busyloop_endtime) {
>> +??????????? endtime = vq->busyloop_endtime;
>> +??????????? vq->busyloop_endtime = 0;
> 
> Looks like endtime may be before current time. So we skip busy loop in
> this case.
Sure, I'll add a condition.
> 
>> +??????? } else {
>> +??????????? endtime = busy_clock() + vq->busyloop_timeout;
>> +??????? }
>> +??????? while (vhost_can_busy_poll(endtime)) {
>> +??????????? if (vhost_busy_poll_interrupted(vq->dev)) {
>> +??????????????? vq->busyloop_endtime = endtime;
>> +??????????????? break;
>> +??????????? }
>> +??????????? if (!vhost_vq_avail_empty(vq->dev, vq))
>> +??????????????? break;
>> ????????????? cpu_relax();
>> +??????? }
>> ????????? preempt_enable();
>> ????????? r = vhost_get_vq_desc(vq, vq->iov, ARRAY_SIZE(vq->iov),
>> ??????????????????????? out_num, in_num, NULL, NULL);
...>> @@ -642,39 +658,51 @@ static void vhost_rx_signal_used(struct
>> vhost_net_virtqueue *nvq)
>> ? ? static int vhost_net_rx_peek_head_len(struct vhost_net *net,
>> struct sock *sk)
>> ? {
>> -??? struct vhost_net_virtqueue *rvq =
&net->vqs[VHOST_NET_VQ_RX];
>> -??? struct vhost_net_virtqueue *nvq =
&net->vqs[VHOST_NET_VQ_TX];
>> -??? struct vhost_virtqueue *vq = &nvq->vq;
>> +??? struct vhost_net_virtqueue *rnvq =
&net->vqs[VHOST_NET_VQ_RX];
>> +??? struct vhost_net_virtqueue *tnvq =
&net->vqs[VHOST_NET_VQ_TX];
>> +??? struct vhost_virtqueue *rvq = &rnvq->vq;
>> +??? struct vhost_virtqueue *tvq = &tnvq->vq;
> 
> NIt: I admit the variable name is confusing. It would be better to do
> the renaming in a separate patch to ease review.
OK.
> 
>> ????? unsigned long uninitialized_var(endtime);
>> -??? int len = peek_head_len(rvq, sk);
>> +??? int len = peek_head_len(rnvq, sk);
>> ? -??? if (!len && vq->busyloop_timeout) {
>> +??? if (!len && tvq->busyloop_timeout) {
>> ????????? /* Flush batched heads first */
>> -??????? vhost_rx_signal_used(rvq);
>> +??????? vhost_rx_signal_used(rnvq);
>> ????????? /* Both tx vq and rx socket were polled here */
>> -??????? mutex_lock_nested(&vq->mutex, 1);
>> -??????? vhost_disable_notify(&net->dev, vq);
>> +??????? mutex_lock_nested(&tvq->mutex, 1);
>> +??????? vhost_disable_notify(&net->dev, tvq);
>> ? ????????? preempt_disable();
>> -??????? endtime = busy_clock() + vq->busyloop_timeout;
>> +??????? if (rvq->busyloop_endtime) {
>> +??????????? endtime = rvq->busyloop_endtime;
>> +??????????? rvq->busyloop_endtime = 0;
>> +??????? } else {
>> +??????????? endtime = busy_clock() + tvq->busyloop_timeout;
>> +??????? }
>> ? -??????? while (vhost_can_busy_poll(&net->dev, endtime)
&&
>> -?????????????? !sk_has_rx_data(sk) &&
>> -?????????????? vhost_vq_avail_empty(&net->dev, vq))
>> +??????? while (vhost_can_busy_poll(endtime)) {
>> +??????????? if (vhost_busy_poll_interrupted(&net->dev)) {
>> +??????????????? rvq->busyloop_endtime = endtime;
>> +??????????????? break;
>> +??????????? }
>> +??????????? if (sk_has_rx_data(sk) ||
>> +??????????????? !vhost_vq_avail_empty(&net->dev, tvq))
>> +??????????????? break;
>> ????????????? cpu_relax();
> 
> Actually, I plan to poll guest rx refilling as well here to avoid rx
> kicks. Want to draft another patch to do it as well? It's as simple as
> add a vhost_vq_avail_empty for rvq above.
OK. will try it.
> 
>> +??????? }
>> ? ????????? preempt_enable();
>> ? -??????? if (!vhost_vq_avail_empty(&net->dev, vq))
>> -??????????? vhost_poll_queue(&vq->poll);
>> -??????? else if (unlikely(vhost_enable_notify(&net->dev, vq)))
{
>> -??????????? vhost_disable_notify(&net->dev, vq);
>> -??????????? vhost_poll_queue(&vq->poll);
>> +??????? if (!vhost_vq_avail_empty(&net->dev, tvq)) {
>> +??????????? vhost_poll_queue(&tvq->poll);
>> +??????? } else if (unlikely(vhost_enable_notify(&net->dev,
tvq))) {
>> +??????????? vhost_disable_notify(&net->dev, tvq);
>> +??????????? vhost_poll_queue(&tvq->poll);
>> ????????? }
>> ? -??????? mutex_unlock(&vq->mutex);
>> +??????? mutex_unlock(&tvq->mutex);
>> ? -??????? len = peek_head_len(rvq, sk);
>> +??????? len = peek_head_len(rnvq, sk);
>> ????? }
>> ? ????? return len;
...>> @@ -889,7 +918,12 @@ static void handle_rx(struct vhost_net *net)
>> ????????????? goto out;
>> ????????? }
>> ????? }
>> -??? vhost_net_enable_vq(net, vq);
>> +??? if (unlikely(vq->busyloop_endtime)) {
>> +??????? /* Busy poll is interrupted. */
>> +??????? vhost_poll_queue(&vq->poll);
>> +??? } else {
>> +??????? vhost_net_enable_vq(net, vq);
>> +??? }
> 
> This is to reduce the rx wake ups. Better with another patch.
> 
> So I suggest to split the patches into:
> 
> 1 better naming of variable of vhost_net_rx_peek_head_len().
> 2 avoid tx kicks (most of what this patch did)
> 3 avoid rx wakeups (exactly the above 6 lines did)
> 4 avoid rx kicks
OK, I'll reorganize the patch.
Thank you for your feedback!
> 
> Btw, tonghao is doing some refactoring of busy polling as well. Depends
> on the order of being merged, one of you may need rebasing.
Thanks for sharing. I'll take a look.

-- 
Toshiaki Makita

Jason Wang

2018-Jul-02 02:54 UTC

head link

[PATCH vhost] vhost_net: Fix too many vring kick on busypoll

On 2018?07?02? 10:45, Toshiaki Makita wrote:> Hi Jason,
>
> On 2018/06/29 18:30, Jason Wang wrote:
>> On 2018?06?29? 16:09, Toshiaki Makita wrote:
> ...
>>> To fix this, poll the work instead of enabling notification when
>>> busypoll is interrupted by something. IMHO signal_pending() and
>>> vhost_has_work() are kind of interruptions rather than signals to
>>> completely cancel the busypoll, so let's run busypoll after the
>>> necessary work is done. In order to avoid too long busyloop due to
>>> interruption, save the endtime in vq field and use it when
reentering
>>> the work function.
>> I think we don't care long busyloop unless e.g tx can starve rx?
> I just want to keep it user-controllable. Unless memorizing it busypoll
> can run unexpectedly long.
I think the total amount of time for busy polling is bounded. If I was 
wrong, it should be a bug somewhere.
>
>>> I observed this problem makes tx performance poor but does not rx.
I
>>> guess it is because rx notification from socket is not that heavy
so
>>> did not impact the performance even when we failed to mask the
>>> notification.
>> I think the reason is:
>>
>> For tx, we may only process used ring under handle_tx(). Busy polling
>> code can not recognize this case.
>> For rx, we call peek_head_len() after exiting busy loop, so if the work
>> is for rx, it can be processed in handle_rx() immediately.
> Sorry but I don't see the difference. Tx busypoll calls
> vhost_get_vq_desc() immediately after busypoll exits in
> vhost_net_tx_get_vq_desc().
Yes, so for the problem of zerocopy callback issue is we don't poll used 
(done_idx != upend_idx). Maybe we can try to add this and avoid the 
check of vhost_has_worker().
>
> The scenario I described above (in 2nd paragraph) is a bit simplified.
> To be clearer what I think is happening is:
>
> 1. handle_tx() runs busypoll with notification enabled (due to zerocopy
>     callback or something).
I think it was due to we enable notification when we exit handle_tx().
> 2. Guest produces a packet in vring.
> 3. handle_tx() busypoll detects the produced packet and get it.
> 4. While vhost is processing the packet, guest kicks vring because it
>     produced the packet. Vring notification is disabled automatically by
>     event_idx and tx work is queued.
> 5. After processing the packet vhost enters busyloop again, but detects
>     tx work and exits busyloop immediately. Then handle_tx() exits after
>     enabling the notification.
> 6. Because the queued work was tx, handle_tx() is called immediately
>     again. handle_tx() runs busypoll with notification enabled.
> 7. Repeat 2-6.
Exactly what I understand.
>
>>> Anyway for consistency I fixed rx routine as well as tx.
>>>
>>> Performance numbers:
>>>
>>> - Bulk transfer from guest to external physical server.
>>>  ???? [Guest]->vhost_net->tap--(XDP_REDIRECT)-->i40e
--(wire)--> [Server]
>> Just to confirm in this case since zerocopy is enabled, we are in fact
>> use the generic XDP datapath?
> For some reason zerocopy was not applied for most packets, so in most
> cases driver XDP was used. I was going to dig into it but not yet.
Right, just to confirm this. This is expected.

In tuntap, we do native XDP only for small and non zerocopy packets. See 
tun_can_build_skb(). The reason is XDP may adjust packet header which is 
not supported by zercopy. We can only use XDP generic for zerocopy in 
this case.
>
>>> - Set 10us busypoll.
>>> - Guest disables checksum and TSO because of host XDP.
>>> - Measured single flow Mbps by netperf, and kicks by perf kvm stat
>>>  ?? (EPT_MISCONFIG event).
>>>
>>>  ???????????????????????????? Before????????????? After
>>>  ?????????????????????????? Mbps? kicks/s????? Mbps? kicks/s
>>> UDP_STREAM 1472byte????????????? 247758???????????????? 27
>>>  ???????????????? Send?? 3645.37??????????? 6958.10
>>>  ???????????????? Recv?? 3588.56??????????? 6958.10
>>>  ?????????????? 1byte??????????????? 9865???????????????? 37
>>>  ???????????????? Send????? 4.34?????????????? 5.43
>>>  ???????????????? Recv????? 4.17?????????????? 5.26
>>> TCP_STREAM???????????? 8801.03??? 45794?? 9592.77???? 2884
>> Impressive numbers.
>>
>>> Signed-off-by: Toshiaki Makita <makita.toshiaki at
lab.ntt.co.jp>
>>> ---
> ...
>>> +static bool vhost_busy_poll_interrupted(struct vhost_dev *dev)
>>> +{
>>> +??? return unlikely(signal_pending(current)) ||
vhost_has_work(dev);
>>>  ? }
>>>  ? ? static void vhost_net_disable_vq(struct vhost_net *n,
>>> @@ -437,10 +438,21 @@ static int vhost_net_tx_get_vq_desc(struct
>>> vhost_net *net,
>>>  ? ????? if (r == vq->num && vq->busyloop_timeout) {
>>>  ????????? preempt_disable();
>>> -??????? endtime = busy_clock() + vq->busyloop_timeout;
>>> -??????? while (vhost_can_busy_poll(vq->dev, endtime) &&
>>> -?????????????? vhost_vq_avail_empty(vq->dev, vq))
>>> +??????? if (vq->busyloop_endtime) {
>>> +??????????? endtime = vq->busyloop_endtime;
>>> +??????????? vq->busyloop_endtime = 0;
>> Looks like endtime may be before current time. So we skip busy loop in
>> this case.
> Sure, I'll add a condition.
>
>>> +??????? } else {
>>> +??????????? endtime = busy_clock() + vq->busyloop_timeout;
>>> +??????? }
>>> +??????? while (vhost_can_busy_poll(endtime)) {
>>> +??????????? if (vhost_busy_poll_interrupted(vq->dev)) {
>>> +??????????????? vq->busyloop_endtime = endtime;
>>> +??????????????? break;
>>> +??????????? }
>>> +??????????? if (!vhost_vq_avail_empty(vq->dev, vq))
>>> +??????????????? break;
>>>  ????????????? cpu_relax();
>>> +??????? }
>>>  ????????? preempt_enable();
>>>  ????????? r = vhost_get_vq_desc(vq, vq->iov,
ARRAY_SIZE(vq->iov),
>>>  ??????????????????????? out_num, in_num, NULL, NULL);
> ...
>>> @@ -642,39 +658,51 @@ static void vhost_rx_signal_used(struct
>>> vhost_net_virtqueue *nvq)
>>>  ? ? static int vhost_net_rx_peek_head_len(struct vhost_net *net,
>>> struct sock *sk)
>>>  ? {
>>> -??? struct vhost_net_virtqueue *rvq =
&net->vqs[VHOST_NET_VQ_RX];
>>> -??? struct vhost_net_virtqueue *nvq =
&net->vqs[VHOST_NET_VQ_TX];
>>> -??? struct vhost_virtqueue *vq = &nvq->vq;
>>> +??? struct vhost_net_virtqueue *rnvq =
&net->vqs[VHOST_NET_VQ_RX];
>>> +??? struct vhost_net_virtqueue *tnvq =
&net->vqs[VHOST_NET_VQ_TX];
>>> +??? struct vhost_virtqueue *rvq = &rnvq->vq;
>>> +??? struct vhost_virtqueue *tvq = &tnvq->vq;
>> NIt: I admit the variable name is confusing. It would be better to do
>> the renaming in a separate patch to ease review.
> OK.
>
>>>  ????? unsigned long uninitialized_var(endtime);
>>> -??? int len = peek_head_len(rvq, sk);
>>> +??? int len = peek_head_len(rnvq, sk);
>>>  ? -??? if (!len && vq->busyloop_timeout) {
>>> +??? if (!len && tvq->busyloop_timeout) {
>>>  ????????? /* Flush batched heads first */
>>> -??????? vhost_rx_signal_used(rvq);
>>> +??????? vhost_rx_signal_used(rnvq);
>>>  ????????? /* Both tx vq and rx socket were polled here */
>>> -??????? mutex_lock_nested(&vq->mutex, 1);
>>> -??????? vhost_disable_notify(&net->dev, vq);
>>> +??????? mutex_lock_nested(&tvq->mutex, 1);
>>> +??????? vhost_disable_notify(&net->dev, tvq);
>>>  ? ????????? preempt_disable();
>>> -??????? endtime = busy_clock() + vq->busyloop_timeout;
>>> +??????? if (rvq->busyloop_endtime) {
>>> +??????????? endtime = rvq->busyloop_endtime;
>>> +??????????? rvq->busyloop_endtime = 0;
>>> +??????? } else {
>>> +??????????? endtime = busy_clock() + tvq->busyloop_timeout;
>>> +??????? }
>>>  ? -??????? while (vhost_can_busy_poll(&net->dev, endtime)
&&
>>> -?????????????? !sk_has_rx_data(sk) &&
>>> -?????????????? vhost_vq_avail_empty(&net->dev, vq))
>>> +??????? while (vhost_can_busy_poll(endtime)) {
>>> +??????????? if (vhost_busy_poll_interrupted(&net->dev)) {
>>> +??????????????? rvq->busyloop_endtime = endtime;
>>> +??????????????? break;
>>> +??????????? }
>>> +??????????? if (sk_has_rx_data(sk) ||
>>> +??????????????? !vhost_vq_avail_empty(&net->dev, tvq))
>>> +??????????????? break;
>>>  ????????????? cpu_relax();
>> Actually, I plan to poll guest rx refilling as well here to avoid rx
>> kicks. Want to draft another patch to do it as well? It's as simple
as
>> add a vhost_vq_avail_empty for rvq above.
> OK. will try it.
>
>>> +??????? }
>>>  ? ????????? preempt_enable();
>>>  ? -??????? if (!vhost_vq_avail_empty(&net->dev, vq))
>>> -??????????? vhost_poll_queue(&vq->poll);
>>> -??????? else if (unlikely(vhost_enable_notify(&net->dev,
vq))) {
>>> -??????????? vhost_disable_notify(&net->dev, vq);
>>> -??????????? vhost_poll_queue(&vq->poll);
>>> +??????? if (!vhost_vq_avail_empty(&net->dev, tvq)) {
>>> +??????????? vhost_poll_queue(&tvq->poll);
>>> +??????? } else if (unlikely(vhost_enable_notify(&net->dev,
tvq))) {
>>> +??????????? vhost_disable_notify(&net->dev, tvq);
>>> +??????????? vhost_poll_queue(&tvq->poll);
>>>  ????????? }
>>>  ? -??????? mutex_unlock(&vq->mutex);
>>> +??????? mutex_unlock(&tvq->mutex);
>>>  ? -??????? len = peek_head_len(rvq, sk);
>>> +??????? len = peek_head_len(rnvq, sk);
>>>  ????? }
>>>  ? ????? return len;
> ...
>>> @@ -889,7 +918,12 @@ static void handle_rx(struct vhost_net *net)
>>>  ????????????? goto out;
>>>  ????????? }
>>>  ????? }
>>> -??? vhost_net_enable_vq(net, vq);
>>> +??? if (unlikely(vq->busyloop_endtime)) {
>>> +??????? /* Busy poll is interrupted. */
>>> +??????? vhost_poll_queue(&vq->poll);
>>> +??? } else {
>>> +??????? vhost_net_enable_vq(net, vq);
>>> +??? }
>> This is to reduce the rx wake ups. Better with another patch.
>>
>> So I suggest to split the patches into:
>>
>> 1 better naming of variable of vhost_net_rx_peek_head_len().
>> 2 avoid tx kicks (most of what this patch did)
>> 3 avoid rx wakeups (exactly the above 6 lines did)
>> 4 avoid rx kicks
> OK, I'll reorganize the patch.
> Thank you for your feedback!
>
>> Btw, tonghao is doing some refactoring of busy polling as well. Depends
>> on the order of being merged, one of you may need rebasing.
> Thanks for sharing. I'll take a look.
>
Thanks.

Toshiaki Makita

2018-Jul-02 04:37 UTC

head link

[PATCH vhost] vhost_net: Fix too many vring kick on busypoll

On 2018/07/02 11:54, Jason Wang wrote:> On 2018?07?02? 10:45, Toshiaki Makita wrote:
>> Hi Jason,
>>
>> On 2018/06/29 18:30, Jason Wang wrote:
>>> On 2018?06?29? 16:09, Toshiaki Makita wrote:
>> ...
>>>> To fix this, poll the work instead of enabling notification
when
>>>> busypoll is interrupted by something. IMHO signal_pending() and
>>>> vhost_has_work() are kind of interruptions rather than signals
to
>>>> completely cancel the busypoll, so let's run busypoll after
the
>>>> necessary work is done. In order to avoid too long busyloop due
to
>>>> interruption, save the endtime in vq field and use it when
reentering
>>>> the work function.
>>> I think we don't care long busyloop unless e.g tx can starve
rx?
>> I just want to keep it user-controllable. Unless memorizing it busypoll
>> can run unexpectedly long.
> 
> I think the total amount of time for busy polling is bounded. If I was
> wrong, it should be a bug somewhere.
Consider this kind of scenario:
0. Set 100us busypoll for example.
1. handle_tx() runs busypoll.
2. Something like zerocopy queues tx_work within 100us.
3. busypoll exits and call handle_tx() again.
4. Repeat 1-3.

In this case handle_tx() does not process packets but busypoll
essentially runs beyond 100us without endtime memorized. This may be
just a theoretical problem, but I was worried that more code to poll tx
queue can be added in the future and it becomes realistic.
>>>> Performance numbers:
>>>>
>>>> - Bulk transfer from guest to external physical server.
>>>> ????? [Guest]->vhost_net->tap--(XDP_REDIRECT)-->i40e
--(wire)-->
>>>> [Server]
>>> Just to confirm in this case since zerocopy is enabled, we are in
fact
>>> use the generic XDP datapath?
>> For some reason zerocopy was not applied for most packets, so in most
>> cases driver XDP was used. I was going to dig into it but not yet.
> 
> Right, just to confirm this. This is expected.
> 
> In tuntap, we do native XDP only for small and non zerocopy packets. See
> tun_can_build_skb(). The reason is XDP may adjust packet header which is
> not supported by zercopy. We can only use XDP generic for zerocopy in
> this case.
I think I understand when driver XDP can be used. What I'm not sure and
was going to narrow down is why zerocopy is mostly not applied.

-- 
Toshiaki Makita

Reasonably Related Threads

Search for more reasonably related threads

Linux Virtualization - Jul 2018 - [PATCH vhost] vhost_net: Fix too many vring kick on busypoll

[PATCH vhost] vhost_net: Fix too many vring kick on busypoll

[PATCH vhost] vhost_net: Fix too many vring kick on busypoll

[PATCH vhost] vhost_net: Fix too many vring kick on busypoll

Reasonably Related Threads