Displaying 20 results from an estimated 21 matches for "vring_bench".
2015 Nov 17
0
[PATCH] virtio_ring: Shadow available ring flags & index
...of testing
>> of my own, thanks!
> Thanks!
Venkatesh:
Is it that your patch only applies to CPUs w/ exclusive caches? Do you
have perf data on Intel CPUs?
For the perf metric you provide, why not L1-dcache-load-misses which is
more meaningful?
>
>>> In a concurrent version of vring_bench, the time required for
>>> 10,000,000 buffer checkout/returns was reduced by ~2% (average
>>> across many runs) on an AMD Piledriver (15h) CPU:
>>>
>>> (w/o shadowing):
>>> Performance counter stats for './vring_bench':
>>> 5,451,0...
2015 Nov 11
2
[PATCH] virtio_ring: Shadow available ring flags & index
...ites to avail->flags on the producer.
This change shadows the flags and index fields in producer memory;
the vring code now reads from the shadows and only ever writes to
avail->flags and avail->idx, allowing the cacheline to transfer
core -> core optimally.
In a concurrent version of vring_bench, the time required for
10,000,000 buffer checkout/returns was reduced by ~2% (average
across many runs) on an AMD Piledriver (15h) CPU:
(w/o shadowing):
Performance counter stats for './vring_bench':
5,451,082,016 L1-dcache-loads
...
2.221477739 seconds time elapsed...
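The shadowing idea described in this snippet can be sketched in plain C. This is a minimal illustration under stated assumptions, not the patch itself: `struct avail_hdr`, `struct producer`, and the three helper functions are hypothetical names, and the memory barriers and endianness conversion that a real driver needs are left out.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for the shared available-ring header.
 * The real structure also carries a ring[] array. */
struct avail_hdr {
    uint16_t flags;
    uint16_t idx;
};

/* Producer-private state: the shadows live in memory that only the
 * producer touches, so reading them never pulls the shared cacheline. */
struct producer {
    struct avail_hdr *avail;  /* shared with the consumer */
    uint16_t flags_shadow;    /* private copy of avail->flags */
    uint16_t idx_shadow;      /* private copy of avail->idx */
};

static void producer_init(struct producer *p, struct avail_hdr *shared)
{
    assert(p && shared);
    p->avail = shared;
    p->flags_shadow = 0;
    p->idx_shadow = 0;
    shared->flags = 0;
    shared->idx = 0;
}

/* Publish one buffer: the current index is read from the shadow,
 * and the shared field is only ever written. */
static void producer_publish(struct producer *p)
{
    p->idx_shadow++;
    p->avail->idx = p->idx_shadow;
}

/* Set a flag bit: the test is done against the shadow, so the shared
 * flags word is written only when its value actually changes. */
static void producer_set_flag(struct producer *p, uint16_t flag)
{
    if (!(p->flags_shadow & flag)) {
        p->flags_shadow |= flag;
        p->avail->flags = p->flags_shadow;
    }
}
```

Because the producer never loads from `avail`, the shared cacheline only has to move toward the consumer, which is the core -> core transfer pattern the patch description refers to.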
2015 Nov 13
2
[PATCH] virtio_ring: Shadow available ring flags & index
...from the shadows and only ever writes to
> > avail->flags and avail->idx, allowing the cacheline to transfer
> > core -> core optimally.
>
> Sounds logical, I'll apply this after a bit of testing
> of my own, thanks!
Thanks!
> > In a concurrent version of vring_bench, the time required for
> > 10,000,000 buffer checkout/returns was reduced by ~2% (average
> > across many runs) on an AMD Piledriver (15h) CPU:
> >
> > (w/o shadowing):
> > Performance counter stats for './vring_bench':
> > 5,451,082,016 L1-dca...
2014 Sep 03
8
[PATCH 0/3] virtio: simplify virtio_ring.
I resurrected these patches after prompting from Andy Lutomirski's
recent patches. I put them on the back-burner because vring_bench
had a 15% slowdown on my laptop: pktgen testing revealed a speedup,
if anything, so I've cleaned them up.
Rusty Russell (3):
virtio_net: pass well-formed sgs to virtqueue_add_*()
virtio_ring: assume sgs are always well-formed.
virtio_ring: unify direct/indirect code paths.
drivers/net/...
2014 Sep 01
2
[PATCH 1/3] virtio_ring: Remove sg_next indirection
...at KS.)
Unfortunately the reason we dance through so many hoops here is that
it has a measurable performance impact :( Those indirect calls get inlined.
There's only one place which actually uses a weirdly-formed sg now,
and that's virtio_net. It's pretty trivial to fix.
However, vring_bench drops 15% when we do this. There's a larger
question as to how much difference that makes in Real Life, of course.
I'll measure that today.
Here are my two patches, back-to-back (it came out of an earlier
concern about reducing stack usage, hence the stack measurements).
Cheers,
Rusty....
2014 Sep 03
0
[PATCH 3/3] virtio_ring: unify direct/indirect code paths.
...) did the allocation and the simple
linear layout. We replace that with alloc_indirect() which allocates
the indirect table then chains it like the normal descriptor table so
we can reuse the core logic.
This slows down pktgen by less than 1/2 a percent (which uses direct
descriptors), as well as vring_bench, but it's far neater.
vring_bench before:
1061485790-1104800648(1.08254e+09+/-6.6e+06)ns
vring_bench after:
1125610268-1183528965(1.14172e+09+/-8e+06)ns
pktgen before:
787781-796334(793165+/-2.4e+03)pps 365-369(367.5+/-1.2)Mb/sec (365530384-369498976(3.68028e+08+/-1.1e+06)bps) errors: 0...
2014 Sep 03
0
[PATCH 2/3] virtio_ring: assume sgs are always well-formed.
We used to have several callers which just used arrays. They're
gone, so we can use sg_next() everywhere, simplifying the code.
On my laptop, this slowed down vring_bench by 15%:
vring_bench before:
936153354-967745359(9.44739e+08+/-6.1e+06)ns
vring_bench after:
1061485790-1104800648(1.08254e+09+/-6.6e+06)ns
However, a more realistic test using pktgen on an AMD FX(tm)-8320 saw
a few percent improvement:
pktgen before:
767390-792966(785159+/-6.5e+03)pps 356-367...
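The 15% figure can be checked against the mean times quoted in the `min-max(mean+/-stddev)ns` lines above. The helper below is a small sketch for that arithmetic; `percent_change` is a hypothetical name, not part of vring_bench.

```c
#include <assert.h>

/* Relative change of `after` versus `before`, in percent.
 * A positive result means the run took longer (a slowdown). */
static double percent_change(double before, double after)
{
    assert(before > 0.0);
    return (after - before) / before * 100.0;
}
```

With the means from this snippet, `percent_change(9.44739e+08, 1.08254e+09)` comes to roughly 14.6, which is consistent with the rounded 15% slowdown stated in the message.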
2015 Nov 11
0
[PATCH] virtio_ring: Shadow available ring flags & index
...memory;
> the vring code now reads from the shadows and only ever writes to
> avail->flags and avail->idx, allowing the cacheline to transfer
> core -> core optimally.
Sounds logical, I'll apply this after a bit of testing
of my own, thanks!
> In a concurrent version of vring_bench, the time required for
> 10,000,000 buffer checkout/returns was reduced by ~2% (average
> across many runs) on an AMD Piledriver (15h) CPU:
>
> (w/o shadowing):
> Performance counter stats for './vring_bench':
> 5,451,082,016 L1-dcache-loads
> ...
>...
2014 Sep 01
1
[PATCH 1/3] virtio_ring: Remove sg_next indirection
...ow,
> > and that's virtio_net. It's pretty trivial to fix.
This path in virtio net is also unused on modern hypervisors, so we probably
don't care how well it performs: why not apply it anyway?
It's the virtio_ring changes that we need to worry about.
> > However, vring_bench drops 15% when we do this. There's a larger
> > question as to how much difference that makes in Real Life, of course.
> > I'll measure that today.
>
> Weird. sg_next shouldn't be nearly that slow. Weird.
I think that's down to the fact that it's out of li...
2015 Dec 04
0
[PATCH] tools/virtio: fix byteswap logic
...Nov 30, 2015 at 10:33:34AM +0200, Michael S. Tsirkin wrote:
> commit cf561f0d2eb74574ad9985a2feab134267a9d298 ("virtio: introduce
> virtio_is_little_endian() helper") changed byteswap logic to
> skip feature bit checks for LE platforms, but didn't
> update tools/virtio, so vring_bench started failing.
>
> Update the copy under tools/virtio/ (TODO: find a way to avoid this code
> duplication).
>
> Cc: Greg Kurz <gkurz at linux.vnet.ibm.com>
> Signed-off-by: Michael S. Tsirkin <mst at redhat.com>
> ---
> tools/virtio/linux/virtio_config.h | 2...
2014 Sep 01
0
[PATCH 1/3] virtio_ring: Remove sg_next indirection
...erformance impact :( Those indirect calls get inlined.
gcc inlines that? That must nearly double the size of the object file. :-/
>
> There's only one place which actually uses a weirdly-formed sg now,
> and that's virtio_net. It's pretty trivial to fix.
>
> However, vring_bench drops 15% when we do this. There's a larger
> question as to how much difference that makes in Real Life, of course.
> I'll measure that today.
Weird. sg_next shouldn't be nearly that slow. Weird.
>
> Here are my two patches, back-to-back (it came out of an earlier
>...
2016 Aug 31
0
[PATCH v2] virtio_ring: Make interrupt suppression spec compliant
According to the spec, if the VIRTIO_RING_F_EVENT_IDX feature bit is
negotiated the driver MUST set flags to 0. Not dirtying the available
ring in virtqueue_disable_cb also has a minor positive performance
impact, improving L1 dcache load misses by ~0.5% in vring_bench.
Writes to the used event field (vring_used_event) are still unconditional.
Cc: Michael S. Tsirkin <mst at redhat.com>
Cc: <stable at vger.kernel.org> # f277ec4 virtio_ring: shadow available
Cc: <stable at vger.kernel.org>
Signed-off-by: Ladi Prosek <lprosek at redhat.com>...
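The behaviour described here can be illustrated with a short C sketch. It is an assumption-laden model, not the patch: `struct vq_sketch`, `disable_cb`, and the `shared_writes` counter are hypothetical, and `NO_INTERRUPT_FLAG` only mirrors the value 1 that the virtio specification assigns to VRING_AVAIL_F_NO_INTERRUPT.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define NO_INTERRUPT_FLAG 1u  /* same value as VRING_AVAIL_F_NO_INTERRUPT */

/* Hypothetical, simplified virtqueue state. */
struct vq_sketch {
    bool event_idx;          /* was VIRTIO_RING_F_EVENT_IDX negotiated? */
    uint16_t flags_shadow;   /* driver-private copy of the flags */
    uint16_t shared_flags;   /* stand-in for avail->flags in shared memory */
    unsigned shared_writes;  /* counts writes to shared memory (illustration only) */
};

/* Ask the device to stop sending interrupts. With EVENT_IDX negotiated,
 * the shared flags word stays 0 and is never dirtied; only the private
 * shadow records the request. */
static void disable_cb(struct vq_sketch *vq)
{
    assert(vq);
    if (vq->flags_shadow & NO_INTERRUPT_FLAG)
        return;  /* already disabled, nothing to write */
    vq->flags_shadow |= NO_INTERRUPT_FLAG;
    if (!vq->event_idx) {
        vq->shared_flags = vq->flags_shadow;
        vq->shared_writes++;
    }
}
```

In this model a queue with `event_idx` set never writes `shared_flags`, which matches both the spec requirement quoted above and the cache-miss reduction the message reports from not dirtying the available ring.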
2014 Aug 26
10
[PATCH 0/3] virtio: Clean up scatterlists and use the DMA API
This fixes virtio on Xen guests as well as on any other platform on
which physical addresses don't match bus addresses.
This can be tested with:
virtme-run --xen xen --kimg arch/x86/boot/bzImage --console
using virtme from here:
https://git.kernel.org/cgit/utils/kernel/virtme/virtme.git
Without these patches, the guest hangs forever. With these patches,
everything works.
There