On 2019/1/7 11:17, Michael S. Tsirkin wrote:
> On Mon, Jan 07, 2019 at 10:14:37AM +0800, Jason Wang wrote:
>> On 2019/1/2 9:59, Michael S. Tsirkin wrote:
>>> On Wed, Jan 02, 2019 at 11:28:43AM +0800, Jason Wang wrote:
>>>> On 2018/12/31 2:45, Michael S. Tsirkin wrote:
>>>>> On Thu, Dec 27, 2018 at 06:00:36PM +0800, Jason Wang wrote:
>>>>>> On 2018/12/26 11:19, Michael S. Tsirkin wrote:
>>>>>>> On Thu, Dec 06, 2018 at 04:17:36PM +0800, Jason Wang wrote:
>>>>>>>> On 2018/12/6 6:54, Michael S. Tsirkin wrote:
>>>>>>>>> When use_napi is set, let's enable BQLs. Note: some of the issues are
>>>>>>>>> similar to wifi. It's worth considering whether something similar to
>>>>>>>>> commit 36148c2bbfbe ("mac80211: Adjust TSQ pacing shift") might be
>>>>>>>>> beneficial.
>>>>>>>> I played with a similar patch several days ago. The tricky part is the mode
>>>>>>>> switching between napi and no napi. We should make sure that when a packet is
>>>>>>>> sent and tracked by BQL, it is consumed by BQL as well. I did it by
>>>>>>>> tracking it through skb->cb, and dealt with the freeze by resetting the BQL
>>>>>>>> status. Patch attached.
>>>>>>>>
>>>>>>>> But when testing with vhost-net, I don't see very stable performance,
>>>>>>> So how about increasing TSQ pacing shift then?
>>>>>> I can test this. But changing the default TCP value is much more than a
>>>>>> virtio-net specific thing.
>>>>> Well same logic as wifi applies. Unpredictable latencies related
>>>>> to radio in one case, to host scheduler in the other.
>>>>>
>>>>>>>> it was
>>>>>>>> probably because we batch the used ring updating so tx interrupt may come
>>>>>>>> randomly. We probably need to implement a time bounded coalescing mechanism
>>>>>>>> which could be configured from userspace.
>>>>>>> I don't think it's reasonable to expect userspace to be that smart ...
>>>>>>> Why do we need time bounded? used ring is always updated when ring
>>>>>>> becomes empty.
>>>>>> We don't add used, which means BQL may not see the consumed packet in time.
>>>>>> And the delay varies based on the workload since we count packets, not bytes
>>>>>> or time, before doing the batched updating.
>>>>>>
>>>>>> Thanks
>>>>> Sorry I still don't get it.
>>>>> When nothing is outstanding then we do update the used.
>>>>> So if BQL stops userspace from sending packets then
>>>>> we get an interrupt and packets start flowing again.
>>>> Yes, but how about the case of multiple flows? That's where I see unstable
>>>> results.
>>>>
>>>>> It might be suboptimal, we might need to tune it but I doubt running
>>>>> timers is a solution, timer interrupts cause VM exits.
>>>> Probably not a timer but a time counter (or even a byte counter) in vhost to
>>>> add used and signal guest if it exceeds a value instead of waiting for the
>>>> number of packets.
>>>>
>>>> Thanks
>>> Well we already have VHOST_NET_WEIGHT - is it too big then?
>>
>> I'm not sure, it might be too big.
>>
>>> And maybe we should expose the "MORE" flag in the descriptor -
>>> do you think that will help?
>>>
>> I don't know. But how can a "more" flag help here?
>>
>> Thanks
> It sounds like we should be a bit more aggressive in updating used ring.
> But if we just do it naively we will harm performance for sure as that
> is how we are doing batching right now.

I agree but the problem is to balance the PPS and throughput. More batching
helps for PPS but may damage TCP throughput.

> Instead we could make guest
> control batching using the more flag - if that's not set we write out
> the used ring.

It's under the control of guest, so I'm afraid we still need some more guard
(e.g. time/bytes counters) on host.

Thanks

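To make the host-side guard suggested above concrete, here is a minimal userspace sketch of a "time counter" style bound, as opposed to a timer: the elapsed time is only checked when a packet is processed, so no timer interrupt is involved. All names here (struct used_batch, used_batch_add, the thresholds) are hypothetical and do not exist in vhost; this is an illustration of the idea under those assumptions, not a patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical limits; none of these names exist in vhost-net. */
struct used_batch_limits {
	uint32_t max_pkts;     /* packet bound, what the thread says is counted today */
	uint32_t max_bytes;    /* byte bound */
	uint64_t max_delay_ns; /* age bound for the oldest unflushed packet */
};

/* Hypothetical per-virtqueue batching state. */
struct used_batch {
	uint32_t pkts;
	uint32_t bytes;
	uint64_t first_ns; /* time the oldest unflushed packet was recorded */
};

/*
 * Account one transmitted packet. Returns true when the caller should
 * add the batch to the used ring and signal the guest, because any of
 * the packet, byte or age bounds has been reached.
 */
static bool used_batch_add(struct used_batch *b,
			   const struct used_batch_limits *lim,
			   uint32_t len, uint64_t now_ns)
{
	assert(lim->max_pkts > 0);

	if (b->pkts == 0)
		b->first_ns = now_ns;
	b->pkts++;
	b->bytes += len;

	return b->pkts >= lim->max_pkts ||
	       b->bytes >= lim->max_bytes ||
	       now_ns - b->first_ns >= lim->max_delay_ns;
}

/* Clear the state after the caller has flushed the batch. */
static void used_batch_reset(struct used_batch *b)
{
	b->pkts = 0;
	b->bytes = 0;
	b->first_ns = 0;
}
```

Because the age is only evaluated on the next packet, a batch can still sit unflushed while the ring is idle; the thread already notes that the used ring is updated when the ring becomes empty, which would cover that case.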
On Mon, Jan 07, 2019 at 11:51:55AM +0800, Jason Wang wrote:
> On 2019/1/7 11:17, Michael S. Tsirkin wrote:
> > It sounds like we should be a bit more aggressive in updating used ring.
> > But if we just do it naively we will harm performance for sure as that
> > is how we are doing batching right now.
>
> I agree but the problem is to balance the PPS and throughput. More batching
> helps for PPS but may damage TCP throughput.

That is what the more flag is supposed to be, I think - it is only set if
there's a socket that actually needs the skb freed in order to go on.

> > Instead we could make guest
> > control batching using the more flag - if that's not set we write out
> > the used ring.
>
> It's under the control of guest, so I'm afraid we still need some more guard
> (e.g. time/bytes counters) on host.
>
> Thanks

Point is if guest does not care about the skb being freed, then there is no
rush host side to mark buffer used.

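As a rough illustration of guest-controlled batching combined with a host-side guard, the sketch below counts how often the used ring would be written for a sequence of transmitted descriptors. It assumes the polarity from the "if that's not set we write out the used ring" wording: a clear flag asks for an immediate write-out. The struct tx_desc layout, its more field, and max_pending are hypothetical names invented for this example, not part of virtio or vhost.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical descriptor view: only what the sketch needs. */
struct tx_desc {
	uint32_t len;
	bool more; /* assumed hint: guest is fine with delaying the used update */
};

/*
 * Count used ring write-outs for n descriptors processed in order.
 * A write-out happens when:
 *  - the descriptor does not carry the more hint (guest wants it now),
 *  - max_pending descriptors have accumulated (host-side guard), or
 *  - the ring runs empty with completions still pending.
 */
static unsigned int count_used_writes(const struct tx_desc *d, size_t n,
				      unsigned int max_pending)
{
	unsigned int writes = 0;
	unsigned int pending = 0;

	assert(max_pending > 0);

	for (size_t i = 0; i < n; i++) {
		pending++;
		if (!d[i].more || pending >= max_pending) {
			writes++;
			pending = 0;
		}
	}

	/* Ring is empty: flush whatever is left. */
	if (pending > 0)
		writes++;

	return writes;
}
```

The guard is a plain packet count here to keep the example short; a byte or time bound could take its place. The point of keeping a guard at all is that the hint is under guest control, so the host cannot rely on it alone.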
On 2019/1/7 12:01, Michael S. Tsirkin wrote:
> On Mon, Jan 07, 2019 at 11:51:55AM +0800, Jason Wang wrote:
>> On 2019/1/7 11:17, Michael S. Tsirkin wrote:
>>> It sounds like we should be a bit more aggressive in updating used ring.
>>> But if we just do it naively we will harm performance for sure as that
>>> is how we are doing batching right now.
>>
>> I agree but the problem is to balance the PPS and throughput. More batching
>> helps for PPS but may damage TCP throughput.
> That is what more flag is supposed to be I think - it is only set if
> there's a socket that actually needs the skb freed in order to go on.

I'm not quite sure I get it, but is this something similar to what you want?

https://lists.linuxfoundation.org/pipermail/virtualization/2014-October/027667.html

Which enables tx interrupt for TCP packets, and you want to add used more
aggressively for those sockets?

Thanks

>>> Instead we could make guest
>>> control batching using the more flag - if that's not set we write out
>>> the used ring.
>>
>> It's under the control of guest, so I'm afraid we still need some more guard
>> (e.g. time/bytes counters) on host.
>>
>> Thanks
> Point is if guest does not care about the skb being freed, then there is no
> rush host side to mark buffer used.