thr3ads.net - Linux Virtualization - [PATCH RFC v4 net-next 0/5] virtio

Michael S. Tsirkin

2014-Dec-02 09:55 UTC

[PATCH RFC v4 net-next 0/5] virtio_net: enabling tx interrupts

On Tue, Dec 02, 2014 at 09:59:48AM +0008, Jason Wang wrote:
>
>
> On Tue, Dec 2, 2014 at 5:43 PM, Michael S. Tsirkin <mst at redhat.com> wrote:
> >On Tue, Dec 02, 2014 at 08:15:02AM +0008, Jason Wang wrote:
> >> On Tue, Dec 2, 2014 at 11:15 AM, Jason Wang <jasowang at redhat.com> wrote:
> >> >
> >> >
> >> >On Mon, Dec 1, 2014 at 6:42 PM, Michael S. Tsirkin <mst at redhat.com> wrote:
> >> >>On Mon, Dec 01, 2014 at 06:17:03PM +0800, Jason Wang wrote:
> >> >>> Hello:
> >> >>>
> >> >>> We used to orphan packets before transmission for virtio-net. This
> >> >>> breaks socket accounting and can cause several functions to stop
> >> >>> working, e.g.:
> >> >>>
> >> >>> - Byte Queue Limit depends on tx completion notification to work.
> >> >>> - Packet Generator depends on tx completion notification for the
> >> >>>   last transmitted packet to complete.
> >> >>> - TCP Small Queue depends on proper accounting of sk_wmem_alloc to
> >> >>>   work.
> >> >>>
> >> >>> This series tries to solve the issue by enabling tx interrupts. To
> >> >>> minimize the performance impact of this, several optimizations were
> >> >>> used:
> >> >>>
> >> >>> - On the guest side, virtqueue_enable_cb_delayed() was used to delay
> >> >>>   the tx interrupt until 3/4 of the pending packets were sent.
> >> >>> - On the host side, interrupt coalescing was used to reduce tx
> >> >>>   interrupts.
> >> >>>
> >> >>> Performance test results[1] (tx-frames 16 tx-usecs 16) show:
> >> >>>
> >> >>> - For guest receiving: no obvious regression in throughput was
> >> >>>   noticed. More cpu utilization was noticed in a few cases.
> >> >>> - For guest transmission: a very large improvement in throughput for
> >> >>>   small packet transmission was noticed. This is expected since TSQ
> >> >>>   and other optimizations for small packet transmission work after
> >> >>>   tx interrupt. But it will use more cpu for large packets.
> >> >>> - For TCP_RR, a regression (10% on transaction rate and cpu
> >> >>>   utilization) was found. Tx interrupt won't help but causes
> >> >>>   overhead in this case. Using more aggressive coalescing parameters
> >> >>>   may help to reduce the regression.
> >> >>
> >> >>OK, you have posted coalescing patches - do they help any?
> >> >
> >> >Helps a lot.
> >> >
> >> >For RX, it saves about 5% - 10% cpu (reduces tx intrs by 60%-90%).
> >> >For small packet TX, it increases throughput by 33% - 245% (reduces
> >> >intrs by about 60%).
> >> >For TCP_RR, it increases the trans. rate by 3%-10% (reduces tx intrs
> >> >by 40%-80%).
> >> >
> >> >>
> >> >>I'm not sure the regression is due to interrupts.
> >> >>It would make sense for CPU but why would it
> >> >>hurt transaction rate?
> >> >
> >> >Anyway, the guest needs to take some cycles to handle tx interrupts.
> >> >And the transaction rate does increase if we coalesce more tx
> >> >interrupts.
> >> >>
> >> >>
> >> >>It's possible that we are deferring kicks too much due to BQL.
> >> >>
> >> >>As an experiment: do we get any of it back if we do
> >> >>-        if (kick || netif_xmit_stopped(txq))
> >> >>-                virtqueue_kick(sq->vq);
> >> >>+        virtqueue_kick(sq->vq);
> >> >>?
> >> >
> >> >
> >> >I will try, but during TCP_RR, at most 1 packet was pending,
> >> >so I doubt BQL can help in this case.
> >> Looks like this helps a lot in multiple sessions of TCP_RR.
> >
> >so what's faster
> >	BQL + kick each packet
> >	no BQL
> >?
> 
> Quick manual tests (TCP_RR 64, TCP_STREAM 512) do not show obvious
> differences.
> 
> May need a complete benchmark to see.
Okay so going forward something like BQL + kick each packet
might be a good solution.
The advantage of BQL is that it works without GSO.
For example, now that we don't do UFO, you might
see significant gains with UDP.

> >
> >
> >> How about moving the BQL patch out of this series?
> >> Let's first converge on tx interrupts and then introduce it?
> >> (e.g. with kicking after queuing X bytes?)
> >
> >Sounds good.

Pankaj Gupta

2014-Dec-02 10:08 UTC

[PATCH RFC v4 net-next 0/5] virtio_net: enabling tx interrupts

> 
> Okay so going forward something like BQL + kick each packet
> might be a good solution.
> The advantage of BQL is that it works without GSO.
> For example, now that we don't do UFO, you might
> see significant gains with UDP.
If I understand correctly, it can also help with the small-packet
regression in the multiqueue scenario? It would be nice to see the perf
numbers with multiqueue for small-packet streams.
>
> 
> > >
> > >
> > >> How about moving the BQL patch out of this series?
> > >> Let's first converge on tx interrupts and then introduce it?
> > >> (e.g. with kicking after queuing X bytes?)
> > >
> > >Sounds good.

Michael S. Tsirkin

2014-Dec-02 10:11 UTC

[PATCH RFC v4 net-next 0/5] virtio_net: enabling tx interrupts

On Tue, Dec 02, 2014 at 05:08:35AM -0500, Pankaj Gupta wrote:
>
> > 
> > Okay so going forward something like BQL + kick each packet
> > might be a good solution.
> > The advantage of BQL is that it works without GSO.
> > For example, now that we don't do UFO, you might
> > see significant gains with UDP.
> 
> If I understand correctly, it can also help with the small-packet
> regression in the multiqueue scenario?
Well, BQL generally should only be active for 1:1 mappings.
> It would be nice to see the perf numbers
> with multiqueue for small-packet streams.
> > 
> > 
> > > >
> > > >
> > > >> How about moving the BQL patch out of this series?
> > > >> Let's first converge on tx interrupts and then introduce it?
> > > >> (e.g. with kicking after queuing X bytes?)
> > > >
> > > >Sounds good.
