Toshiaki Makita
2018-Jul-02 04:37 UTC
[PATCH vhost] vhost_net: Fix too many vring kick on busypoll
On 2018/07/02 11:54, Jason Wang wrote:> On 2018?07?02? 10:45, Toshiaki Makita wrote: >> Hi Jason, >> >> On 2018/06/29 18:30, Jason Wang wrote: >>> On 2018?06?29? 16:09, Toshiaki Makita wrote: >> ... >>>> To fix this, poll the work instead of enabling notification when >>>> busypoll is interrupted by something. IMHO signal_pending() and >>>> vhost_has_work() are kind of interruptions rather than signals to >>>> completely cancel the busypoll, so let's run busypoll after the >>>> necessary work is done. In order to avoid too long busyloop due to >>>> interruption, save the endtime in vq field and use it when reentering >>>> the work function. >>> I think we don't care long busyloop unless e.g tx can starve rx? >> I just want to keep it user-controllable. Unless memorizing it busypoll >> can run unexpectedly long. > > I think the total amount of time for busy polling is bounded. If I was > wrong, it should be a bug somewhere.Consider this kind of scenario: 0. Set 100us busypoll for example. 1. handle_tx() runs busypoll. 2. Something like zerocopy queues tx_work within 100us. 3. busypoll exits and call handle_tx() again. 4. Repeat 1-3. In this case handle_tx() does not process packets but busypoll essentially runs beyond 100us without endtime memorized. This may be just a theoretical problem, but I was worried that more code to poll tx queue can be added in the future and it becomes realistic.>>>> Performance numbers: >>>> >>>> - Bulk transfer from guest to external physical server. >>>> ????? [Guest]->vhost_net->tap--(XDP_REDIRECT)-->i40e --(wire)--> >>>> [Server] >>> Just to confirm in this case since zerocopy is enabled, we are in fact >>> use the generic XDP datapath? >> For some reason zerocopy was not applied for most packets, so in most >> cases driver XDP was used. I was going to dig into it but not yet. > > Right, just to confirm this. This is expected. > > In tuntap, we do native XDP only for small and non zerocopy packets. See > tun_can_build_skb(). The reason is XDP may adjust packet header which is > not supported by zercopy. We can only use XDP generic for zerocopy in > this case.I think I understand when driver XDP can be used. What I'm not sure and was going to narrow down is why zerocopy is mostly not applied. -- Toshiaki Makita
Jason Wang
2018-Jul-02 06:17 UTC
[PATCH vhost] vhost_net: Fix too many vring kick on busypoll
On 2018?07?02? 12:37, Toshiaki Makita wrote:> On 2018/07/02 11:54, Jason Wang wrote: >> On 2018?07?02? 10:45, Toshiaki Makita wrote: >>> Hi Jason, >>> >>> On 2018/06/29 18:30, Jason Wang wrote: >>>> On 2018?06?29? 16:09, Toshiaki Makita wrote: >>> ... >>>>> To fix this, poll the work instead of enabling notification when >>>>> busypoll is interrupted by something. IMHO signal_pending() and >>>>> vhost_has_work() are kind of interruptions rather than signals to >>>>> completely cancel the busypoll, so let's run busypoll after the >>>>> necessary work is done. In order to avoid too long busyloop due to >>>>> interruption, save the endtime in vq field and use it when reentering >>>>> the work function. >>>> I think we don't care long busyloop unless e.g tx can starve rx? >>> I just want to keep it user-controllable. Unless memorizing it busypoll >>> can run unexpectedly long. >> I think the total amount of time for busy polling is bounded. If I was >> wrong, it should be a bug somewhere. > Consider this kind of scenario: > 0. Set 100us busypoll for example. > 1. handle_tx() runs busypoll. > 2. Something like zerocopy queues tx_work within 100us. > 3. busypoll exits and call handle_tx() again. > 4. Repeat 1-3. > > In this case handle_tx() does not process packets but busypoll > essentially runs beyond 100us without endtime memorized. This may be > just a theoretical problem, but I was worried that more code to poll tx > queue can be added in the future and it becomes realistic.Yes, but consider zerocopy tends to batch 16 used packets and we will finally finish all processing of packets. The above won't be endless, so it was probably tolerable.> >>>>> Performance numbers: >>>>> >>>>> - Bulk transfer from guest to external physical server. >>>>> ????? [Guest]->vhost_net->tap--(XDP_REDIRECT)-->i40e --(wire)--> >>>>> [Server] >>>> Just to confirm in this case since zerocopy is enabled, we are in fact >>>> use the generic XDP datapath? >>> For some reason zerocopy was not applied for most packets, so in most >>> cases driver XDP was used. I was going to dig into it but not yet. >> Right, just to confirm this. This is expected. >> >> In tuntap, we do native XDP only for small and non zerocopy packets. See >> tun_can_build_skb(). The reason is XDP may adjust packet header which is >> not supported by zercopy. We can only use XDP generic for zerocopy in >> this case. > I think I understand when driver XDP can be used. What I'm not sure and > was going to narrow down is why zerocopy is mostly not applied. >I see, any touch to the zerocopy packet (clone, header expansion or segmentation) that lead a userspace copy will increase the error counter in vhost_net. Then vhost_net_tx_select_zcopy() may choose not to use zerocopy. So it was probably something in your setup or a bug somewhere. Thanks
Toshiaki Makita
2018-Jul-02 07:11 UTC
[PATCH vhost] vhost_net: Fix too many vring kick on busypoll
On 2018/07/02 15:17, Jason Wang wrote:> On 2018?07?02? 12:37, Toshiaki Makita wrote: >> On 2018/07/02 11:54, Jason Wang wrote: >>> On 2018?07?02? 10:45, Toshiaki Makita wrote: >>>> Hi Jason, >>>> >>>> On 2018/06/29 18:30, Jason Wang wrote: >>>>> On 2018?06?29? 16:09, Toshiaki Makita wrote: >>>> ... >>>>>> To fix this, poll the work instead of enabling notification when >>>>>> busypoll is interrupted by something. IMHO signal_pending() and >>>>>> vhost_has_work() are kind of interruptions rather than signals to >>>>>> completely cancel the busypoll, so let's run busypoll after the >>>>>> necessary work is done. In order to avoid too long busyloop due to >>>>>> interruption, save the endtime in vq field and use it when reentering >>>>>> the work function. >>>>> I think we don't care long busyloop unless e.g tx can starve rx? >>>> I just want to keep it user-controllable. Unless memorizing it busypoll >>>> can run unexpectedly long. >>> I think the total amount of time for busy polling is bounded. If I was >>> wrong, it should be a bug somewhere. >> Consider this kind of scenario: >> 0. Set 100us busypoll for example. >> 1. handle_tx() runs busypoll. >> 2. Something like zerocopy queues tx_work within 100us. >> 3. busypoll exits and call handle_tx() again. >> 4. Repeat 1-3. >> >> In this case handle_tx() does not process packets but busypoll >> essentially runs beyond 100us without endtime memorized. This may be >> just a theoretical problem, but I was worried that more code to poll tx >> queue can be added in the future and it becomes realistic. > > Yes, but consider zerocopy tends to batch 16 used packets and we will > finally finish all processing of packets. The above won't be endless, so > it was probably tolerable.Right. So endtime memorization is more like a future-proof thing. Would you like to keep it or change something?>>>>>> Performance numbers: >>>>>> >>>>>> - Bulk transfer from guest to external physical server. >>>>>> ?????? [Guest]->vhost_net->tap--(XDP_REDIRECT)-->i40e --(wire)--> >>>>>> [Server] >>>>> Just to confirm in this case since zerocopy is enabled, we are in fact >>>>> use the generic XDP datapath? >>>> For some reason zerocopy was not applied for most packets, so in most >>>> cases driver XDP was used. I was going to dig into it but not yet. >>> Right, just to confirm this. This is expected. >>> >>> In tuntap, we do native XDP only for small and non zerocopy packets. See >>> tun_can_build_skb(). The reason is XDP may adjust packet header which is >>> not supported by zercopy. We can only use XDP generic for zerocopy in >>> this case. >> I think I understand when driver XDP can be used. What I'm not sure and >> was going to narrow down is why zerocopy is mostly not applied. >> > > I see, any touch to the zerocopy packet (clone, header expansion or > segmentation) that lead a userspace copy will increase the error counter > in vhost_net. Then vhost_net_tx_select_zcopy() may choose not to use > zerocopy. So it was probably something in your setup or a bug somewhere.Thanks for the hint! -- Toshiaki Makita
Possibly Parallel Threads
- [PATCH vhost] vhost_net: Fix too many vring kick on busypoll
- [PATCH vhost] vhost_net: Fix too many vring kick on busypoll
- [PATCH vhost] vhost_net: Fix too many vring kick on busypoll
- [PATCH vhost] vhost_net: Fix too many vring kick on busypoll
- [PATCH vhost] vhost_net: Fix too many vring kick on busypoll