Toshiaki Makita
2018-Jul-02 07:11 UTC
[PATCH vhost] vhost_net: Fix too many vring kick on busypoll
On 2018/07/02 15:17, Jason Wang wrote:> On 2018?07?02? 12:37, Toshiaki Makita wrote: >> On 2018/07/02 11:54, Jason Wang wrote: >>> On 2018?07?02? 10:45, Toshiaki Makita wrote: >>>> Hi Jason, >>>> >>>> On 2018/06/29 18:30, Jason Wang wrote: >>>>> On 2018?06?29? 16:09, Toshiaki Makita wrote: >>>> ... >>>>>> To fix this, poll the work instead of enabling notification when >>>>>> busypoll is interrupted by something. IMHO signal_pending() and >>>>>> vhost_has_work() are kind of interruptions rather than signals to >>>>>> completely cancel the busypoll, so let's run busypoll after the >>>>>> necessary work is done. In order to avoid too long busyloop due to >>>>>> interruption, save the endtime in vq field and use it when reentering >>>>>> the work function. >>>>> I think we don't care long busyloop unless e.g tx can starve rx? >>>> I just want to keep it user-controllable. Unless memorizing it busypoll >>>> can run unexpectedly long. >>> I think the total amount of time for busy polling is bounded. If I was >>> wrong, it should be a bug somewhere. >> Consider this kind of scenario: >> 0. Set 100us busypoll for example. >> 1. handle_tx() runs busypoll. >> 2. Something like zerocopy queues tx_work within 100us. >> 3. busypoll exits and call handle_tx() again. >> 4. Repeat 1-3. >> >> In this case handle_tx() does not process packets but busypoll >> essentially runs beyond 100us without endtime memorized. This may be >> just a theoretical problem, but I was worried that more code to poll tx >> queue can be added in the future and it becomes realistic. > > Yes, but consider zerocopy tends to batch 16 used packets and we will > finally finish all processing of packets. The above won't be endless, so > it was probably tolerable.Right. So endtime memorization is more like a future-proof thing. Would you like to keep it or change something?>>>>>> Performance numbers: >>>>>> >>>>>> - Bulk transfer from guest to external physical server. >>>>>> ?????? [Guest]->vhost_net->tap--(XDP_REDIRECT)-->i40e --(wire)--> >>>>>> [Server] >>>>> Just to confirm in this case since zerocopy is enabled, we are in fact >>>>> use the generic XDP datapath? >>>> For some reason zerocopy was not applied for most packets, so in most >>>> cases driver XDP was used. I was going to dig into it but not yet. >>> Right, just to confirm this. This is expected. >>> >>> In tuntap, we do native XDP only for small and non zerocopy packets. See >>> tun_can_build_skb(). The reason is XDP may adjust packet header which is >>> not supported by zercopy. We can only use XDP generic for zerocopy in >>> this case. >> I think I understand when driver XDP can be used. What I'm not sure and >> was going to narrow down is why zerocopy is mostly not applied. >> > > I see, any touch to the zerocopy packet (clone, header expansion or > segmentation) that lead a userspace copy will increase the error counter > in vhost_net. Then vhost_net_tx_select_zcopy() may choose not to use > zerocopy. So it was probably something in your setup or a bug somewhere.Thanks for the hint! -- Toshiaki Makita
Jason Wang
2018-Jul-02 07:52 UTC
[PATCH vhost] vhost_net: Fix too many vring kick on busypoll
On 2018?07?02? 15:11, Toshiaki Makita wrote:> On 2018/07/02 15:17, Jason Wang wrote: >> On 2018?07?02? 12:37, Toshiaki Makita wrote: >>> On 2018/07/02 11:54, Jason Wang wrote: >>>> On 2018?07?02? 10:45, Toshiaki Makita wrote: >>>>> Hi Jason, >>>>> >>>>> On 2018/06/29 18:30, Jason Wang wrote: >>>>>> On 2018?06?29? 16:09, Toshiaki Makita wrote: >>>>> ... >>>>>>> To fix this, poll the work instead of enabling notification when >>>>>>> busypoll is interrupted by something. IMHO signal_pending() and >>>>>>> vhost_has_work() are kind of interruptions rather than signals to >>>>>>> completely cancel the busypoll, so let's run busypoll after the >>>>>>> necessary work is done. In order to avoid too long busyloop due to >>>>>>> interruption, save the endtime in vq field and use it when reentering >>>>>>> the work function. >>>>>> I think we don't care long busyloop unless e.g tx can starve rx? >>>>> I just want to keep it user-controllable. Unless memorizing it busypoll >>>>> can run unexpectedly long. >>>> I think the total amount of time for busy polling is bounded. If I was >>>> wrong, it should be a bug somewhere. >>> Consider this kind of scenario: >>> 0. Set 100us busypoll for example. >>> 1. handle_tx() runs busypoll. >>> 2. Something like zerocopy queues tx_work within 100us. >>> 3. busypoll exits and call handle_tx() again. >>> 4. Repeat 1-3. >>> >>> In this case handle_tx() does not process packets but busypoll >>> essentially runs beyond 100us without endtime memorized. This may be >>> just a theoretical problem, but I was worried that more code to poll tx >>> queue can be added in the future and it becomes realistic. >> Yes, but consider zerocopy tends to batch 16 used packets and we will >> finally finish all processing of packets. The above won't be endless, so >> it was probably tolerable. > Right. So endtime memorization is more like a future-proof thing. > Would you like to keep it or change something?I think we'd better introduce it only when we meet real bugs. Thanks> >>>>>>> Performance numbers: >>>>>>> >>>>>>> - Bulk transfer from guest to external physical server. >>>>>>> ?????? [Guest]->vhost_net->tap--(XDP_REDIRECT)-->i40e --(wire)--> >>>>>>> [Server] >>>>>> Just to confirm in this case since zerocopy is enabled, we are in fact >>>>>> use the generic XDP datapath? >>>>> For some reason zerocopy was not applied for most packets, so in most >>>>> cases driver XDP was used. I was going to dig into it but not yet. >>>> Right, just to confirm this. This is expected. >>>> >>>> In tuntap, we do native XDP only for small and non zerocopy packets. See >>>> tun_can_build_skb(). The reason is XDP may adjust packet header which is >>>> not supported by zercopy. We can only use XDP generic for zerocopy in >>>> this case. >>> I think I understand when driver XDP can be used. What I'm not sure and >>> was going to narrow down is why zerocopy is mostly not applied. >>> >> I see, any touch to the zerocopy packet (clone, header expansion or >> segmentation) that lead a userspace copy will increase the error counter >> in vhost_net. Then vhost_net_tx_select_zcopy() may choose not to use >> zerocopy. So it was probably something in your setup or a bug somewhere. > Thanks for the hint! >
Toshiaki Makita
2018-Jul-02 08:05 UTC
[PATCH vhost] vhost_net: Fix too many vring kick on busypoll
On 2018/07/02 16:52, Jason Wang wrote:> On 2018?07?02? 15:11, Toshiaki Makita wrote: >> On 2018/07/02 15:17, Jason Wang wrote: >>> On 2018?07?02? 12:37, Toshiaki Makita wrote: >>>> On 2018/07/02 11:54, Jason Wang wrote: >>>>> On 2018?07?02? 10:45, Toshiaki Makita wrote: >>>>>> Hi Jason, >>>>>> >>>>>> On 2018/06/29 18:30, Jason Wang wrote: >>>>>>> On 2018?06?29? 16:09, Toshiaki Makita wrote: >>>>>> ... >>>>>>>> To fix this, poll the work instead of enabling notification when >>>>>>>> busypoll is interrupted by something. IMHO signal_pending() and >>>>>>>> vhost_has_work() are kind of interruptions rather than signals to >>>>>>>> completely cancel the busypoll, so let's run busypoll after the >>>>>>>> necessary work is done. In order to avoid too long busyloop due to >>>>>>>> interruption, save the endtime in vq field and use it when >>>>>>>> reentering >>>>>>>> the work function. >>>>>>> I think we don't care long busyloop unless e.g tx can starve rx? >>>>>> I just want to keep it user-controllable. Unless memorizing it >>>>>> busypoll >>>>>> can run unexpectedly long. >>>>> I think the total amount of time for busy polling is bounded. If I was >>>>> wrong, it should be a bug somewhere. >>>> Consider this kind of scenario: >>>> 0. Set 100us busypoll for example. >>>> 1. handle_tx() runs busypoll. >>>> 2. Something like zerocopy queues tx_work within 100us. >>>> 3. busypoll exits and call handle_tx() again. >>>> 4. Repeat 1-3. >>>> >>>> In this case handle_tx() does not process packets but busypoll >>>> essentially runs beyond 100us without endtime memorized. This may be >>>> just a theoretical problem, but I was worried that more code to poll tx >>>> queue can be added in the future and it becomes realistic. >>> Yes, but consider zerocopy tends to batch 16 used packets and we will >>> finally finish all processing of packets. The above won't be endless, so >>> it was probably tolerable. >> Right. So endtime memorization is more like a future-proof thing. >> Would you like to keep it or change something? > > I think we'd better introduce it only when we meet real bugs.I'll change it to a flag to indicate the previous busypoll is interrupted.>>>>>>>> Performance numbers: >>>>>>>> >>>>>>>> - Bulk transfer from guest to external physical server. >>>>>>>> ??????? [Guest]->vhost_net->tap--(XDP_REDIRECT)-->i40e --(wire)--> >>>>>>>> [Server] >>>>>>> Just to confirm in this case since zerocopy is enabled, we are in >>>>>>> fact >>>>>>> use the generic XDP datapath? >>>>>> For some reason zerocopy was not applied for most packets, so in most >>>>>> cases driver XDP was used. I was going to dig into it but not yet. >>>>> Right, just to confirm this. This is expected. >>>>> >>>>> In tuntap, we do native XDP only for small and non zerocopy >>>>> packets. See >>>>> tun_can_build_skb(). The reason is XDP may adjust packet header >>>>> which is >>>>> not supported by zercopy. We can only use XDP generic for zerocopy in >>>>> this case. >>>> I think I understand when driver XDP can be used. What I'm not sure and >>>> was going to narrow down is why zerocopy is mostly not applied. >>>> >>> I see, any touch to the zerocopy packet (clone, header expansion or >>> segmentation) that lead a userspace copy will increase the error counter >>> in vhost_net. Then vhost_net_tx_select_zcopy() may choose not to use >>> zerocopy. So it was probably something in your setup or a bug somewhere. >> Thanks for the hint!Seems zerocopy packets are always nonlinear and netif_receive_generic_xdp() calls skb_linearize() in which __pskb_pull_tail() calls skb_zcopy_clear(). Looks like tx_zcopy_err is always counted when zerocopy is used with XDP in my env. -- Toshiaki Makita