Michael S. Tsirkin
2015-Oct-22  09:33 UTC
[PATCH net-next RFC 2/2] vhost_net: basic polling support
On Thu, Oct 22, 2015 at 01:27:29AM -0400, Jason Wang wrote:
> This patch tries to poll for new added tx buffer for a while at the
> end of tx processing. The maximum time spent on polling were limited
> through a module parameter. To avoid block rx, the loop will end it
> there's new other works queued on vhost so in fact socket receive
> queue is also be polled.
>
> busyloop_timeout = 50 gives us following improvement on TCP_RR test:
>
> size/session/+thu%/+normalize%
>     1/      1/   +5%/  -20%
>     1/     50/  +17%/   +3%

Is there a measureable increase in cpu utilization
with busyloop_timeout = 0?

> Signed-off-by: Jason Wang <jasowang at redhat.com>

We might be able to shave off the minor regression
by careful use of likely/unlikely, or maybe
deferring

> ---
>  drivers/vhost/net.c | 19 +++++++++++++++++++
>  1 file changed, 19 insertions(+)
>
> diff --git a/drivers/vhost/net.c b/drivers/vhost/net.c
> index 9eda69e..bbb522a 100644
> --- a/drivers/vhost/net.c
> +++ b/drivers/vhost/net.c
> @@ -31,7 +31,9 @@
>  #include "vhost.h"
>
>  static int experimental_zcopytx = 1;
> +static int busyloop_timeout = 50;
>  module_param(experimental_zcopytx, int, 0444);
> +module_param(busyloop_timeout, int, 0444);

Pls add a description, including the units and the special
value 0.

>  MODULE_PARM_DESC(experimental_zcopytx, "Enable Zero Copy TX;"
>  		                       " 1 -Enable; 0 - Disable");
>
> @@ -287,12 +289,23 @@ static void vhost_zerocopy_callback(struct ubuf_info *ubuf, bool success)
>  	rcu_read_unlock_bh();
>  }
>
> +static bool tx_can_busy_poll(struct vhost_dev *dev,
> +			     unsigned long endtime)
> +{
> +	unsigned long now = local_clock() >> 10;

local_clock might go backwards if we jump between CPUs.
One way to fix would be to record the CPU id and break
out of loop if that changes.

Also - defer this until we actually know we need it?

> +
> +	return busyloop_timeout && !need_resched() &&
> +	       !time_after(now, endtime) && !vhost_has_work(dev) &&
> +	       single_task_running();

signal pending as well?

> +}
> +
>  /* Expects to be always run from workqueue - which acts as
>   * read-size critical section for our kind of RCU. */
>  static void handle_tx(struct vhost_net *net)
>  {
>  	struct vhost_net_virtqueue *nvq = &net->vqs[VHOST_NET_VQ_TX];
>  	struct vhost_virtqueue *vq = &nvq->vq;
> +	unsigned long endtime;
>  	unsigned out, in;
>  	int head;
>  	struct msghdr msg = {
> @@ -331,6 +344,8 @@ static void handle_tx(struct vhost_net *net)
>  			      % UIO_MAXIOV == nvq->done_idx))
>  			break;
>
> +		endtime = (local_clock() >> 10) + busyloop_timeout;
> +again:
>  		head = vhost_get_vq_desc(vq, vq->iov,
>  					 ARRAY_SIZE(vq->iov),
>  					 &out, &in,
> @@ -340,6 +355,10 @@ static void handle_tx(struct vhost_net *net)
>  			break;
>  		/* Nothing new? Wait for eventfd to tell us they refilled. */
>  		if (head == vq->num) {
> +			if (tx_can_busy_poll(vq->dev, endtime)) {
> +				cpu_relax();
> +				goto again;
> +			}
>  			if (unlikely(vhost_enable_notify(&net->dev, vq))) {
>  				vhost_disable_notify(&net->dev, vq);
>  				continue;
> --
> 1.8.3.1
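
The request to document the units of busyloop_timeout is easier to follow with the arithmetic spelled out. The following is a minimal user-space sketch, not kernel code: it assumes local_clock() returns nanoseconds, so that the patch's ">> 10" divides by 1024 and yields approximate microseconds, and it restates the wraparound-safe comparison that time_after() performs. The names ns_to_approx_us, time_after_ul and busy_poll_units_demo are hypothetical and exist only for this illustration.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: convert nanoseconds to "approximate microseconds"
 * the way the patch does, with a shift by 10 (divide by 1024, not 1000). */
static inline unsigned long ns_to_approx_us(uint64_t ns)
{
	return (unsigned long)(ns >> 10);
}

/* User-space restatement of the comparison behind time_after(a, b):
 * true when a is later than b, even if the counter wrapped in between. */
static inline bool time_after_ul(unsigned long a, unsigned long b)
{
	return (long)(b - a) < 0;
}

/* Small self-check showing what a timeout of 50 means in these units. */
static inline void busy_poll_units_demo(void)
{
	/* 50 units of 1024 ns each: roughly 51.2 us of polling budget. */
	assert(ns_to_approx_us(50 * 1024) == 50);
	/* A plain "now > endtime" test would fail across wraparound. */
	assert(time_after_ul(5, ULONG_MAX - 5));
}
```

Under these assumptions the default of 50 corresponds to a budget of about 51 microseconds, and 0 disables polling because the busyloop_timeout test in tx_can_busy_poll() short-circuits.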
Rick Jones
2015-Oct-22  15:46 UTC
[PATCH net-next RFC 2/2] vhost_net: basic polling support
On 10/22/2015 02:33 AM, Michael S. Tsirkin wrote:
> On Thu, Oct 22, 2015 at 01:27:29AM -0400, Jason Wang wrote:
>> This patch tries to poll for new added tx buffer for a while at the
>> end of tx processing. The maximum time spent on polling were limited
>> through a module parameter. To avoid block rx, the loop will end it
>> there's new other works queued on vhost so in fact socket receive
>> queue is also be polled.
>>
>> busyloop_timeout = 50 gives us following improvement on TCP_RR test:
>>
>> size/session/+thu%/+normalize%
>>     1/      1/   +5%/  -20%
>>     1/     50/  +17%/   +3%
>
> Is there a measureable increase in cpu utilization
> with busyloop_timeout = 0?

And since a netperf TCP_RR test is involved, be careful about what netperf
reports for CPU util if that increase isn't in the context of the guest OS.

For completeness, looking at the effect on TCP_STREAM and TCP_MAERTS,
aggregate _RR and even aggregate _RR/packets per second for many VMs on the
same system would be in order.

happy benchmarking,

rick jones
Michael S. Tsirkin
2015-Oct-22  16:16 UTC
[PATCH net-next RFC 2/2] vhost_net: basic polling support
On Thu, Oct 22, 2015 at 08:46:33AM -0700, Rick Jones wrote:
> On 10/22/2015 02:33 AM, Michael S. Tsirkin wrote:
> >On Thu, Oct 22, 2015 at 01:27:29AM -0400, Jason Wang wrote:
> >>This patch tries to poll for new added tx buffer for a while at the
> >>end of tx processing. The maximum time spent on polling were limited
> >>through a module parameter. To avoid block rx, the loop will end it
> >>there's new other works queued on vhost so in fact socket receive
> >>queue is also be polled.
> >>
> >>busyloop_timeout = 50 gives us following improvement on TCP_RR test:
> >>
> >>size/session/+thu%/+normalize%
> >>     1/      1/   +5%/  -20%
> >>     1/     50/  +17%/   +3%
> >
> >Is there a measureable increase in cpu utilization
> >with busyloop_timeout = 0?
>
> And since a netperf TCP_RR test is involved, be careful about what netperf
> reports for CPU util if that increase isn't in the context of the guest OS.
>
> For completeness, looking at the effect on TCP_STREAM and TCP_MAERTS,
> aggregate _RR and even aggregate _RR/packets per second for many VMs on the
> same system would be in order.
>
> happy benchmarking,
>
> rick jones

Absolutely, merging a new kernel API just for a specific
benchmark doesn't make sense.
I'm guessing this is just an early RFC, a fuller submission
will probably include more numbers.

--
MST
Jason Wang
2015-Oct-23  07:13 UTC
[PATCH net-next RFC 2/2] vhost_net: basic polling support
On 10/22/2015 05:33 PM, Michael S. Tsirkin wrote:
> On Thu, Oct 22, 2015 at 01:27:29AM -0400, Jason Wang wrote:
>> This patch tries to poll for new added tx buffer for a while at the
>> end of tx processing. The maximum time spent on polling were limited
>> through a module parameter. To avoid block rx, the loop will end it
>> there's new other works queued on vhost so in fact socket receive
>> queue is also be polled.
>>
>> busyloop_timeout = 50 gives us following improvement on TCP_RR test:
>>
>> size/session/+thu%/+normalize%
>>     1/      1/   +5%/  -20%
>>     1/     50/  +17%/   +3%
> Is there a measureable increase in cpu utilization
> with busyloop_timeout = 0?

Just run TCP_RR, no increasing. Will run a complete test on next version.

>
>> Signed-off-by: Jason Wang <jasowang at redhat.com>
> We might be able to shave off the minor regression
> by careful use of likely/unlikely, or maybe
> deferring

Yes, but what did "deferring" mean here?

>
>> ---
>>  drivers/vhost/net.c | 19 +++++++++++++++++++
>>  1 file changed, 19 insertions(+)
>>
>> diff --git a/drivers/vhost/net.c b/drivers/vhost/net.c
>> index 9eda69e..bbb522a 100644
>> --- a/drivers/vhost/net.c
>> +++ b/drivers/vhost/net.c
>> @@ -31,7 +31,9 @@
>>  #include "vhost.h"
>>
>>  static int experimental_zcopytx = 1;
>> +static int busyloop_timeout = 50;
>>  module_param(experimental_zcopytx, int, 0444);
>> +module_param(busyloop_timeout, int, 0444);
> Pls add a description, including the units and the special
> value 0.

Ok.

>
>>  MODULE_PARM_DESC(experimental_zcopytx, "Enable Zero Copy TX;"
>>  		                       " 1 -Enable; 0 - Disable");
>>
>> @@ -287,12 +289,23 @@ static void vhost_zerocopy_callback(struct ubuf_info *ubuf, bool success)
>>  	rcu_read_unlock_bh();
>>  }
>>
>> +static bool tx_can_busy_poll(struct vhost_dev *dev,
>> +			     unsigned long endtime)
>> +{
>> +	unsigned long now = local_clock() >> 10;
> local_clock might go backwards if we jump between CPUs.
> One way to fix would be to record the CPU id and break
> out of loop if that changes.

Right, or maybe disable preemption in this case?

>
> Also - defer this until we actually know we need it?

Right.

>
>> +
>> +	return busyloop_timeout && !need_resched() &&
>> +	       !time_after(now, endtime) && !vhost_has_work(dev) &&
>> +	       single_task_running();
> signal pending as well?

Yes.

>> +}
>> +
>>  /* Expects to be always run from workqueue - which acts as
>>   * read-size critical section for our kind of RCU. */
>>  static void handle_tx(struct vhost_net *net)
>>  {
>>  	struct vhost_net_virtqueue *nvq = &net->vqs[VHOST_NET_VQ_TX];
>>  	struct vhost_virtqueue *vq = &nvq->vq;
>> +	unsigned long endtime;
>>  	unsigned out, in;
>>  	int head;
>>  	struct msghdr msg = {
>> @@ -331,6 +344,8 @@ static void handle_tx(struct vhost_net *net)
>>  			      % UIO_MAXIOV == nvq->done_idx))
>>  			break;
>>
>> +		endtime = (local_clock() >> 10) + busyloop_timeout;
>> +again:
>>  		head = vhost_get_vq_desc(vq, vq->iov,
>>  					 ARRAY_SIZE(vq->iov),
>>  					 &out, &in,
>> @@ -340,6 +355,10 @@ static void handle_tx(struct vhost_net *net)
>>  			break;
>>  		/* Nothing new? Wait for eventfd to tell us they refilled. */
>>  		if (head == vq->num) {
>> +			if (tx_can_busy_poll(vq->dev, endtime)) {
>> +				cpu_relax();
>> +				goto again;
>> +			}
>>  			if (unlikely(vhost_enable_notify(&net->dev, vq))) {
>>  				vhost_disable_notify(&net->dev, vq);
>>  				continue;
>> --
>> 1.8.3.1
> --
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to majordomo at vger.kernel.org
> More majordomo info at http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at http://www.tux.org/lkml/
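
The "record the CPU id and break out of loop if that changes" idea can be written as a pure predicate, which makes the intended behaviour easy to check. This is only a user-space sketch under stated assumptions: struct poll_ctx, can_keep_polling and poll_ctx_demo are hypothetical names, the current CPU id and timestamp are passed in by the caller rather than read from the kernel, and the other conditions from tx_can_busy_poll() are collapsed into a single other_work_pending flag.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical state captured when busy polling starts. */
struct poll_ctx {
	int start_cpu;          /* CPU on which the start time was sampled */
	unsigned long endtime;  /* deadline, in the same units as "now" */
};

/* Decide whether another polling iteration is allowed.
 * cur_cpu and now are supplied by the caller so the logic stays testable. */
static inline bool can_keep_polling(const struct poll_ctx *ctx, int cur_cpu,
				    unsigned long now, bool other_work_pending)
{
	/* A per-CPU clock is not comparable across CPUs: stop on migration. */
	if (cur_cpu != ctx->start_cpu)
		return false;
	/* Wraparound-safe "now is after endtime" check. */
	if ((long)(ctx->endtime - now) < 0)
		return false;
	/* Yield as soon as something else needs the worker. */
	return !other_work_pending;
}

/* Small self-check of the migration case. */
static inline void poll_ctx_demo(void)
{
	struct poll_ctx ctx = { .start_cpu = 0, .endtime = 50 };

	assert(can_keep_polling(&ctx, 0, 10, false));
	assert(!can_keep_polling(&ctx, 1, 10, false));
}
```

Disabling preemption for the duration of the loop, as suggested in the reply above, would make the CPU comparison unnecessary because the task could not migrate; the trade-off is that the polling window then runs with preemption off.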
Michael S. Tsirkin
2015-Oct-23  13:39 UTC
[PATCH net-next RFC 2/2] vhost_net: basic polling support
On Fri, Oct 23, 2015 at 03:13:07PM +0800, Jason Wang wrote:
>
>
> On 10/22/2015 05:33 PM, Michael S. Tsirkin wrote:
> > On Thu, Oct 22, 2015 at 01:27:29AM -0400, Jason Wang wrote:
> >> This patch tries to poll for new added tx buffer for a while at the
> >> end of tx processing. The maximum time spent on polling were limited
> >> through a module parameter. To avoid block rx, the loop will end it
> >> there's new other works queued on vhost so in fact socket receive
> >> queue is also be polled.
> >>
> >> busyloop_timeout = 50 gives us following improvement on TCP_RR test:
> >>
> >> size/session/+thu%/+normalize%
> >>     1/      1/   +5%/  -20%
> >>     1/     50/  +17%/   +3%
> > Is there a measureable increase in cpu utilization
> > with busyloop_timeout = 0?
>
> Just run TCP_RR, no increasing. Will run a complete test on next version.
>
> >
> >> Signed-off-by: Jason Wang <jasowang at redhat.com>
> > We might be able to shave off the minor regression
> > by careful use of likely/unlikely, or maybe
> > deferring
>
> Yes, but what did "deferring" mean here?

Don't call local_clock until we know we'll need it.

> >
> >> ---
> >>  drivers/vhost/net.c | 19 +++++++++++++++++++
> >>  1 file changed, 19 insertions(+)
> >>
> >> diff --git a/drivers/vhost/net.c b/drivers/vhost/net.c
> >> index 9eda69e..bbb522a 100644
> >> --- a/drivers/vhost/net.c
> >> +++ b/drivers/vhost/net.c
> >> @@ -31,7 +31,9 @@
> >>  #include "vhost.h"
> >>
> >>  static int experimental_zcopytx = 1;
> >> +static int busyloop_timeout = 50;
> >>  module_param(experimental_zcopytx, int, 0444);
> >> +module_param(busyloop_timeout, int, 0444);
> > Pls add a description, including the units and the special
> > value 0.
>
> Ok.
>
> >
> >>  MODULE_PARM_DESC(experimental_zcopytx, "Enable Zero Copy TX;"
> >>  		                       " 1 -Enable; 0 - Disable");
> >>
> >> @@ -287,12 +289,23 @@ static void vhost_zerocopy_callback(struct ubuf_info *ubuf, bool success)
> >>  	rcu_read_unlock_bh();
> >>  }
> >>
> >> +static bool tx_can_busy_poll(struct vhost_dev *dev,
> >> +			     unsigned long endtime)
> >> +{
> >> +	unsigned long now = local_clock() >> 10;
> > local_clock might go backwards if we jump between CPUs.
> > One way to fix would be to record the CPU id and break
> > out of loop if that changes.
>
> Right, or maybe disable preemption in this case?
>
> >
> > Also - defer this until we actually know we need it?
>
> Right.
>
> >
> >> +
> >> +	return busyloop_timeout && !need_resched() &&
> >> +	       !time_after(now, endtime) && !vhost_has_work(dev) &&
> >> +	       single_task_running();
> > signal pending as well?
>
> Yes.
>
> >> +}
> >> +
> >>  /* Expects to be always run from workqueue - which acts as
> >>   * read-size critical section for our kind of RCU. */
> >>  static void handle_tx(struct vhost_net *net)
> >>  {
> >>  	struct vhost_net_virtqueue *nvq = &net->vqs[VHOST_NET_VQ_TX];
> >>  	struct vhost_virtqueue *vq = &nvq->vq;
> >> +	unsigned long endtime;
> >>  	unsigned out, in;
> >>  	int head;
> >>  	struct msghdr msg = {
> >> @@ -331,6 +344,8 @@ static void handle_tx(struct vhost_net *net)
> >>  			      % UIO_MAXIOV == nvq->done_idx))
> >>  			break;
> >>
> >> +		endtime = (local_clock() >> 10) + busyloop_timeout;
> >> +again:
> >>  		head = vhost_get_vq_desc(vq, vq->iov,
> >>  					 ARRAY_SIZE(vq->iov),
> >>  					 &out, &in,
> >> @@ -340,6 +355,10 @@ static void handle_tx(struct vhost_net *net)
> >>  			break;
> >>  		/* Nothing new? Wait for eventfd to tell us they refilled. */
> >>  		if (head == vq->num) {
> >> +			if (tx_can_busy_poll(vq->dev, endtime)) {
> >> +				cpu_relax();
> >> +				goto again;
> >> +			}
> >>  			if (unlikely(vhost_enable_notify(&net->dev, vq))) {
> >>  				vhost_disable_notify(&net->dev, vq);
> >>  				continue;
> >> --
> >> 1.8.3.1
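
"Don't call local_clock until we know we'll need it" amounts to sampling the clock only after the first empty poll of the ring, so the common case where a buffer is already waiting pays nothing. The sketch below simulates that in user space and counts clock reads. Everything in it is an assumption made for illustration: struct fake_env, fake_clock, fake_queue_has_buffer and poll_tx_deferred are hypothetical stand-ins, not vhost functions, and the fake clock simply advances by one unit per read.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical test environment standing in for the clock and the ring. */
struct fake_env {
	unsigned long clock;         /* advances by one on every read */
	unsigned int clock_reads;    /* how many times the clock was sampled */
	unsigned int polls_until_buffer; /* empty polls before a buffer shows up */
};

static inline unsigned long fake_clock(struct fake_env *e)
{
	e->clock_reads++;
	return e->clock++;
}

static inline bool fake_queue_has_buffer(struct fake_env *e)
{
	if (e->polls_until_buffer == 0)
		return true;
	e->polls_until_buffer--;
	return false;
}

/* Poll for a buffer; returns true if one arrived within the budget.
 * The deadline is computed lazily, on the first empty poll only. */
static inline bool poll_tx_deferred(struct fake_env *e, unsigned long timeout)
{
	unsigned long endtime = 0;
	bool started = false;

	for (;;) {
		if (fake_queue_has_buffer(e))
			return true;
		if (!timeout)
			return false;   /* 0 disables busy polling */
		if (!started) {
			endtime = fake_clock(e) + timeout;
			started = true;
			continue;
		}
		if ((long)(endtime - fake_clock(e)) < 0)
			return false;   /* budget exhausted */
	}
}

/* Small self-check: a ready buffer costs no clock read at all. */
static inline void deferred_clock_demo(void)
{
	struct fake_env e = { 0, 0, 0 };

	assert(poll_tx_deferred(&e, 50));
	assert(e.clock_reads == 0);
}
```

In this simulation the fast path (buffer already present) and the disabled case (timeout of 0) both return without touching the clock, which is the saving the comment above is after.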