Michael S. Tsirkin

2014-Dec-02 09:43 UTC

[PATCH RFC v4 net-next 0/5] virtio_net: enabling tx interrupts

On Tue, Dec 02, 2014 at 08:15:02AM +0008, Jason Wang wrote:
> On Tue, Dec 2, 2014 at 11:15 AM, Jason Wang <jasowang at redhat.com> wrote:
> > On Mon, Dec 1, 2014 at 6:42 PM, Michael S. Tsirkin <mst at redhat.com> wrote:
> > > On Mon, Dec 01, 2014 at 06:17:03PM +0800, Jason Wang wrote:
> > > > Hello:
> > > >
> > > > We used to orphan packets before transmission for virtio-net. This
> > > > breaks socket accounting and can lead to several functions not
> > > > working, e.g.:
> > > >
> > > > - Byte Queue Limit depends on tx completion notification to work.
> > > > - Packet Generator depends on tx completion notification for the
> > > >   last transmitted packet to complete.
> > > > - TCP Small Queue depends on proper accounting of sk_wmem_alloc to
> > > >   work.
> > > >
> > > > This series tries to solve the issue by enabling tx interrupts. To
> > > > minimize the performance impact of this, several optimizations were
> > > > used:
> > > >
> > > > - On the guest side, virtqueue_enable_cb_delayed() was used to delay
> > > >   the tx interrupt until 3/4 of the pending packets were sent.
> > > > - On the host side, interrupt coalescing was used to reduce tx
> > > >   interrupts.
> > > >
> > > > Performance test results[1] (tx-frames 16 tx-usecs 16) show:
> > > >
> > > > - For guest receiving: no obvious regression in throughput was
> > > >   noticed. More cpu utilization was noticed in a few cases.
> > > > - For guest transmission: a very large improvement in throughput for
> > > >   small packet transmission was noticed. This is expected since TSQ
> > > >   and other optimizations for small packet transmission work after
> > > >   tx interrupt. But it will use more cpu for large packets.
> > > > - For TCP_RR: a regression (10% on transaction rate and cpu
> > > >   utilization) was found. Tx interrupt won't help but causes
> > > >   overhead in this case. Using more aggressive coalescing parameters
> > > >   may help to reduce the regression.
> > >
> > > OK, you do have posted coalescing patches - does it help any?
> >
> > Helps a lot.
> >
> > For RX, it saves about 5% - 10% cpu. (reduces 60%-90% tx intrs)
> > For small packet TX, it increases throughput by 33% - 245%. (reduces
> > about 60% intrs)
> > For TCP_RR, it increases the trans.rate by 3%-10%. (reduces 40%-80% tx
> > intrs)
> >
> > > I'm not sure the regression is due to interrupts.
> > > It would make sense for CPU but why would it
> > > hurt transaction rate?
> >
> > Anyway, the guest needs to take some cycles to handle tx interrupts.
> > And the transaction rate does increase if we coalesce more tx interrupts.
> >
> > > It's possible that we are deferring kicks too much due to BQL.
> > >
> > > As an experiment: do we get any of it back if we do
> > > -        if (kick || netif_xmit_stopped(txq))
> > > -                virtqueue_kick(sq->vq);
> > > +        virtqueue_kick(sq->vq);
> > > ?
> >
> > I will try, but during TCP_RR, at most 1 packet was pending,
> > so I doubt that BQL can help in this case.
>
> Looks like this helps a lot in multiple sessions of TCP_RR.

so what's faster
	BQL + kick each packet
	no BQL
?

> How about moving the BQL patch out of this series?
>
> Let's first converge on tx interrupts and then introduce it?
> (e.g. with kicking after queuing X bytes?)

Sounds good.

Jason Wang

2014-Dec-02 09:51 UTC

[PATCH RFC v4 net-next 0/5] virtio_net: enabling tx interrupts

On Tue, Dec 2, 2014 at 5:43 PM, Michael S. Tsirkin <mst at redhat.com> wrote:
> On Tue, Dec 02, 2014 at 08:15:02AM +0008, Jason Wang wrote:
>> Looks like this helps a lot in multiple sessions of TCP_RR.
>
> so what's faster
> 	BQL + kick each packet
> 	no BQL
> ?

Quick manual tests (TCP_RR 64, TCP_STREAM 512) do not
show obvious differences.

May need a complete benchmark to see.

>> How about moving the BQL patch out of this series?
>>
>> Let's first converge on tx interrupts and then introduce it?
>> (e.g. with kicking after queuing X bytes?)
>
> Sounds good.

Michael S. Tsirkin

2014-Dec-02 09:55 UTC

[PATCH RFC v4 net-next 0/5] virtio_net: enabling tx interrupts

On Tue, Dec 02, 2014 at 09:59:48AM +0008, Jason Wang wrote:
> On Tue, Dec 2, 2014 at 5:43 PM, Michael S. Tsirkin <mst at redhat.com> wrote:
> >On Tue, Dec 02, 2014 at 08:15:02AM +0008, Jason Wang wrote:
> >> Looks like this helps a lot in multiple sessions of TCP_RR.
> >
> >so what's faster
> >	BQL + kick each packet
> >	no BQL
> >?
> 
> Quick manual tests (TCP_RR 64, TCP_STREAM 512) do not show obvious
> differences.
>
> May need a complete benchmark to see.

Okay so going forward something like BQL + kick each packet
might be a good solution.
The advantage of BQL is that it works without GSO.
For example, now that we don't do UFO, you might
see significant gains with UDP.

> >> How about moving the BQL patch out of this series?
> >> Let's first converge on tx interrupts and then introduce it?
> >> (e.g. with kicking after queuing X bytes?)
> >
> >Sounds good.
