thr3ads.net - search: "vring

[PATCH] virtio_ring: Shadow available ring flags & index

2015 Nov 17

0

[PATCH] virtio_ring: Shadow available ring flags & index

...of testing >> of my own, thanks! > Thanks! Venkatesh: Is it that your patch only applies to CPUs w/ exclusive caches? Do you have perf data on Intel CPUs? For the perf metric you provide, why not L1-dcache-load-misses, which is more meaningful? > >>> In a concurrent version of vring_bench, the time required for >>> 10,000,000 buffer checkout/returns was reduced by ~2% (average >>> across many runs) on an AMD Piledriver (15h) CPU: >>> >>> (w/o shadowing): >>> Performance counter stats for './vring_bench': >>> 5,451,0...

[PATCH] virtio_ring: Shadow available ring flags & index

2015 Nov 11

2

[PATCH] virtio_ring: Shadow available ring flags & index

...ites to avail->flags on the producer. This change shadows the flags and index fields in producer memory; the vring code now reads from the shadows and only ever writes to avail->flags and avail->idx, allowing the cacheline to transfer core -> core optimally. In a concurrent version of vring_bench, the time required for 10,000,000 buffer checkout/returns was reduced by ~2% (average across many runs) on an AMD Piledriver (15h) CPU: (w/o shadowing): Performance counter stats for './vring_bench': 5,451,082,016 L1-dcache-loads ... 2.221477739 seconds time elapsed...
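
The shadowing idea in this snippet can be sketched in a few lines of C. This is an illustration under assumptions, not the patch itself: every name below (`sketch_avail`, `sketch_producer`, `sketch_publish`, `sketch_disable_cb`) is hypothetical, the ring is reduced to its two header fields, and memory barriers are left out. `SKETCH_NO_INTERRUPT` is a stand-in for the flag bit a driver sets to suppress interrupts.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for the "do not interrupt me" flag bit. */
#define SKETCH_NO_INTERRUPT 1u

/* Shared header of the available ring; the consumer core reads these fields. */
struct sketch_avail {
    uint16_t flags;
    uint16_t idx;
};

/* Producer-private state. The shadows live in memory the consumer never touches. */
struct sketch_producer {
    struct sketch_avail *avail;
    uint16_t flags_shadow;
    uint16_t idx_shadow;
};

void sketch_init(struct sketch_producer *p, struct sketch_avail *shared)
{
    assert(p && shared);
    shared->flags = 0;
    shared->idx = 0;
    p->avail = shared;
    p->flags_shadow = 0;
    p->idx_shadow = 0;
}

/* Publish one buffer: the index is read from the shadow and only ever
 * written to the shared ring, so the producer never loads the shared line. */
uint16_t sketch_publish(struct sketch_producer *p)
{
    p->idx_shadow++;
    /* A real driver would issue a write barrier before this store. */
    p->avail->idx = p->idx_shadow;
    return p->idx_shadow;
}

/* Suppress interrupts: test the shadow, write the shared flags only on change. */
void sketch_disable_cb(struct sketch_producer *p)
{
    if (!(p->flags_shadow & SKETCH_NO_INTERRUPT)) {
        p->flags_shadow |= SKETCH_NO_INTERRUPT;
        p->avail->flags = p->flags_shadow;
    }
}
```

In this sketch the producer only stores to `avail->flags` and `avail->idx` and never loads them, which is the access pattern the snippet credits with letting the cacheline move core to core cleanly.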

[PATCH] virtio_ring: Shadow available ring flags & index

2015 Nov 13

2

[PATCH] virtio_ring: Shadow available ring flags & index

...from the shadows and only ever writes to > > avail->flags and avail->idx, allowing the cacheline to transfer > > core -> core optimally. > > Sounds logical, I'll apply this after a bit of testing > of my own, thanks! Thanks! > > In a concurrent version of vring_bench, the time required for > > 10,000,000 buffer checkout/returns was reduced by ~2% (average > > across many runs) on an AMD Piledriver (15h) CPU: > > > > (w/o shadowing): > > Performance counter stats for './vring_bench': > > 5,451,082,016 L1-dca...

[PATCH 0/3] virtio: simplify virtio_ring.

2014 Sep 03

8

[PATCH 0/3] virtio: simplify virtio_ring.

I resurrected these patches after prompting from Andy Lutomirski's recent patches. I put them on the back-burner because vring_bench had a 15% slowdown on my laptop: pktgen testing revealed a speedup, if anything, so I've cleaned them up. Rusty Russell (3): virtio_net: pass well-formed sgs to virtqueue_add_*() virtio_ring: assume sgs are always well-formed. virtio_ring: unify direct/indirect code paths. drivers/net/...

[PATCH 1/3] virtio_ring: Remove sg_next indirection

2014 Sep 01

2

[PATCH 1/3] virtio_ring: Remove sg_next indirection

...at KS.) Unfortunately the reason we dance through so many hoops here is that it has a measurable performance impact :( Those indirect calls get inlined. There's only one place which actually uses a weirdly-formed sg now, and that's virtio_net. It's pretty trivial to fix. However, vring_bench drops 15% when we do this. There's a larger question as to how much difference that makes in Real Life, of course. I'll measure that today. Here are my two patches, back-to-back (it came out of an earlier concern about reducing stack usage, hence the stack measurements). Cheers, Rusty....

[PATCH 3/3] virtio_ring: unify direct/indirect code paths.

2014 Sep 03

0

[PATCH 3/3] virtio_ring: unify direct/indirect code paths.

...) did the allocation and the simple linear layout. We replace that with alloc_indirect() which allocates the indirect table then chains it like the normal descriptor table so we can reuse the core logic. This slows down pktgen by less than 1/2 a percent (which uses direct descriptors), as well as vring_bench, but it's far neater. vring_bench before: 1061485790-1104800648(1.08254e+09+/-6.6e+06)ns vring_bench after: 1125610268-1183528965(1.14172e+09+/-8e+06)ns pktgen before: 787781-796334(793165+/-2.4e+03)pps 365-369(367.5+/-1.2)Mb/sec (365530384-369498976(3.68028e+08+/-1.1e+06)bps) errors: 0...

[PATCH 2/3] virtio_ring: assume sgs are always well-formed.

2014 Sep 03

0

[PATCH 2/3] virtio_ring: assume sgs are always well-formed.

We used to have several callers which just used arrays. They're gone, so we can use sg_next() everywhere, simplifying the code. On my laptop, this slowed down vring_bench by 15%: vring_bench before: 936153354-967745359(9.44739e+08+/-6.1e+06)ns vring_bench after: 1061485790-1104800648(1.08254e+09+/-6.6e+06)ns However, a more realistic test using pktgen on an AMD FX(tm)-8320 saw a few percent improvement: pktgen before: 767390-792966(785159+/-6.5e+03)pps 356-367...
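
The benchmark figures in this snippet appear to follow a `min-max(mean+/-stddev)` layout. That reading is an assumption, since the snippet does not define the fields. Under it, a small C helper can parse two measurements and compare their means. The names `bench_stat`, `bench_parse` and `bench_relative_change` are hypothetical.

```c
#include <assert.h>
#include <stdio.h>

/* One measurement, assuming the layout "min-max(mean+/-stddev)unit". */
struct bench_stat {
    double min, max, mean, stddev;
};

/* Returns 1 if all four numbers were parsed, 0 otherwise. The unit suffix is ignored. */
int bench_parse(const char *text, struct bench_stat *out)
{
    assert(text && out);
    return sscanf(text, "%lf-%lf(%lf+/-%lf)",
                  &out->min, &out->max, &out->mean, &out->stddev) == 4;
}

/* Relative change of the mean; 0.10 means "after" is 10% larger than "before". */
double bench_relative_change(const struct bench_stat *before,
                             const struct bench_stat *after)
{
    return (after->mean - before->mean) / before->mean;
}
```

Fed the two vring_bench figures quoted above, the relative change of the means comes out at roughly 0.146, which is consistent with the 15% slowdown the snippet reports.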

[PATCH] virtio_ring: Shadow available ring flags & index

2015 Nov 11

0

[PATCH] virtio_ring: Shadow available ring flags & index

...memory; > the vring code now reads from the shadows and only ever writes to > avail->flags and avail->idx, allowing the cacheline to transfer > core -> core optimally. Sounds logical, I'll apply this after a bit of testing of my own, thanks! > In a concurrent version of vring_bench, the time required for > 10,000,000 buffer checkout/returns was reduced by ~2% (average > across many runs) on an AMD Piledriver (15h) CPU: > > (w/o shadowing): > Performance counter stats for './vring_bench': > 5,451,082,016 L1-dcache-loads > ... >...

[PATCH 1/3] virtio_ring: Remove sg_next indirection

2014 Sep 01

1

[PATCH 1/3] virtio_ring: Remove sg_next indirection

...ow, > > and that's virtio_net. It's pretty trivial to fix. This path in virtio net is also unused on modern hypervisors, so we probably don't care how well it performs: why not apply it anyway? It's the virtio_ring changes that we need to worry about. > > However, vring_bench drops 15% when we do this. There's a larger > > question as to how much difference that makes in Real Life, of course. > > I'll measure that today. > > Weird. sg_next shouldn't be nearly that slow. Weird. I think that's down to the fact that it's out of li...

[PATCH] tools/virtio: fix byteswap logic

2015 Dec 04

0

[PATCH] tools/virtio: fix byteswap logic

...Nov 30, 2015 at 10:33:34AM +0200, Michael S. Tsirkin wrote: > commit cf561f0d2eb74574ad9985a2feab134267a9d298 ("virtio: introduce > virtio_is_little_endian() helper") changed byteswap logic to > skip feature bit checks for LE platforms, but didn't > update tools/virtio, so vring_bench started failing. > > Update the copy under tools/virtio/ (TODO: find a way to avoid this code > duplication). > > Cc: Greg Kurz <gkurz at linux.vnet.ibm.com> > Signed-off-by: Michael S. Tsirkin <mst at redhat.com> > --- > tools/virtio/linux/virtio_config.h | 2...

[PATCH 1/3] virtio_ring: Remove sg_next indirection

2014 Sep 01

0

[PATCH 1/3] virtio_ring: Remove sg_next indirection

...erformance impact :( Those indirect calls get inlined. gcc inlines that? That must nearly double the size of the object file. :-/ > > There's only one place which actually uses a weirdly-formed sg now, > and that's virtio_net. It's pretty trivial to fix. > > However, vring_bench drops 15% when we do this. There's a larger > question as to how much difference that makes in Real Life, of course. > I'll measure that today. Weird. sg_next shouldn't be nearly that slow. Weird. > > Here are my two patches, back-to-back (it came out of an earlier >...

[PATCH v2] virtio_ring: Make interrupt suppression spec compliant

2016 Aug 31

0

[PATCH v2] virtio_ring: Make interrupt suppression spec compliant

According to the spec, if the VIRTIO_RING_F_EVENT_IDX feature bit is negotiated the driver MUST set flags to 0. Not dirtying the available ring in virtqueue_disable_cb also has a minor positive performance impact, improving L1 dcache load misses by ~0.5% in vring_bench. Writes to the used event field (vring_used_event) are still unconditional. Cc: Michael S. Tsirkin <mst at redhat.com> Cc: <stable at vger.kernel.org> # f277ec4 virtio_ring: shadow available Cc: <stable at vger.kernel.org> Signed-off-by: Ladi Prosek <lprosek at redhat.com>...
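
The behaviour this commit message describes can be sketched as follows. This is a simplified model under assumptions, not the kernel code: `evq`, `evq_disable_cb` and `EVQ_NO_INTERRUPT` are hypothetical names, the shared `avail->flags` field is modelled as a plain struct member, and a write counter is added only to make the effect visible.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the interrupt-suppression flag bit. */
#define EVQ_NO_INTERRUPT 1u

struct evq {
    bool event_idx;         /* true if VIRTIO_RING_F_EVENT_IDX was negotiated */
    uint16_t flags_shadow;  /* driver-private copy of the flags */
    uint16_t shared_flags;  /* models avail->flags as seen by the device */
    unsigned shared_writes; /* counts stores to the shared field */
};

/* Disable callbacks. With EVENT_IDX negotiated, the shared flags stay 0
 * and the available ring is not dirtied; only the shadow records the state. */
void evq_disable_cb(struct evq *vq)
{
    assert(vq);
    if (vq->flags_shadow & EVQ_NO_INTERRUPT)
        return;
    vq->flags_shadow |= EVQ_NO_INTERRUPT;
    if (!vq->event_idx) {
        vq->shared_flags = vq->flags_shadow;
        vq->shared_writes++;
    }
}
```

Keeping the state in the shadow means repeated calls stay cheap in both modes, while the shared cacheline is written only when the device actually consults the flags.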

[PATCH 0/3] virtio: Clean up scatterlists and use the DMA API

2014 Aug 26

10

[PATCH 0/3] virtio: Clean up scatterlists and use the DMA API

This fixes virtio on Xen guests as well as on any other platform on which physical addresses don't match bus addresses. This can be tested with: virtme-run --xen xen --kimg arch/x86/boot/bzImage --console using virtme from here: https://git.kernel.org/cgit/utils/kernel/virtme/virtme.git Without these patches, the guest hangs forever. With these patches, everything works. There

search for: vring_bench