search for: piledriver

Displaying 13 results from an estimated 13 matches for "piledriver".

2013 Nov 22
2
[LLVMdev] SchedMachineModel clarifications
If you haven't found it yet, the last public AMD Software Optimization Guide for Family 15h is here: http://developer.amd.com/wordpress/media/2012/03/47414_15h_sw_opt_guide.pdf This one describes both Bulldozer and Piledriver processors. Chapter 2 will give an overview of the microarchitecture, and Appendix B gives some additional details on which pipes are used where. I haven't yet looked in detail at your patch to check your model, but at minimum the comments, references and naming still all refer to Sandy B...
2015 Nov 18
2
[PATCH] virtio_ring: Shadow available ring flags & index
...useful numbers for this case -- even with core turbo disabled, runtime variance is very high (10 - 30% run-to-run). > For the perf metric you provide, why not L1-dcache-load-misses, which is > more meaningful? L1-dcache-load-misses is a better metric, you're right; for the original AMD Piledriver run I posted: Performance counter stats for './vring_bench_noshadow': 5,451,082,016 L1-dcache-loads 31,690,398 L1-dcache-load-misses 60,288,052 L1-dcache-stores 60,517,840 LLC-loads 9,726 LLC-load-misses 2.221477739...
2015 Nov 13
2
[PATCH] virtio_ring: Shadow available ring flags & index
...e optimally. > > Sounds logical, I'll apply this after a bit of testing > of my own, thanks! Thanks! > > In a concurrent version of vring_bench, the time required for > > 10,000,000 buffer checkout/returns was reduced by ~2% (average > > across many runs) on an AMD Piledriver (15h) CPU: > > > > (w/o shadowing): > > Performance counter stats for './vring_bench': > > 5,451,082,016 L1-dcache-loads > > ... > > 2.221477739 seconds time elapsed > > > > (w/ shadowing): > > Performance counter...
2015 Nov 19
1
[PATCH] virtio_ring: Shadow available ring flags & index
...core turbo disabled, runtime variance is very high (10 - 30% run-to-run). >> >>> For the perf metric you provide, why not L1-dcache-load-misses, which is >>> more meaningful? >> L1-dcache-load-misses is a better metric, you're right; for the original >> AMD Piledriver run I posted: >> >> Performance counter stats for './vring_bench_noshadow': >> 5,451,082,016 L1-dcache-loads >> 31,690,398 L1-dcache-load-misses >> 60,288,052 L1-dcache-stores >> 60,517,840 LLC-loads >...
2015 Nov 18
0
[PATCH] virtio_ring: Shadow available ring flags & index
...n with > core turbo disabled, runtime variance is very high (10 - 30% run-to-run). > > > For the perf metric you provide, why not L1-dcache-load-misses, which is > > more meaningful? > > L1-dcache-load-misses is a better metric, you're right; for the original > AMD Piledriver run I posted: > > Performance counter stats for './vring_bench_noshadow': > 5,451,082,016 L1-dcache-loads > 31,690,398 L1-dcache-load-misses > 60,288,052 L1-dcache-stores > 60,517,840 LLC-loads > 9,726...
2015 Nov 11
2
[PATCH] virtio_ring: Shadow available ring flags & index
...w reads from the shadows and only ever writes to avail->flags and avail->idx, allowing the cacheline to transfer core -> core optimally. In a concurrent version of vring_bench, the time required for 10,000,000 buffer checkout/returns was reduced by ~2% (average across many runs) on an AMD Piledriver (15h) CPU: (w/o shadowing): Performance counter stats for './vring_bench': 5,451,082,016 L1-dcache-loads ... 2.221477739 seconds time elapsed (w/ shadowing): Performance counter stats for './vring_bench': 5,405,701,361 L1-dcache-loads ......
2013 Nov 22
0
[LLVMdev] SchedMachineModel clarifications
...Mike Vermeulen <mevermeulen at gmail.com> wrote: > If you haven't found it yet, the last public AMD Software Optimization > Guide for Family 15h is here: > http://developer.amd.com/wordpress/media/2012/03/47414_15h_sw_opt_guide.pdf > > This one describes both Bulldozer and Piledriver processors. Chapter 2 > will give an overview of the microarchitecture, and Appendix B gives some > additional details on which pipes are used where. > > I haven't yet looked in detail at your patch to check your model, but at > minimum the comments, references and naming st...
2015 Nov 17
0
[PATCH] virtio_ring: Shadow available ring flags & index
...Intel CPUs? For the perf metric you provide, why not L1-dcache-load-misses, which is more meaningful? > >>> In a concurrent version of vring_bench, the time required for >>> 10,000,000 buffer checkout/returns was reduced by ~2% (average >>> across many runs) on an AMD Piledriver (15h) CPU: >>> >>> (w/o shadowing): >>> Performance counter stats for './vring_bench': >>> 5,451,082,016 L1-dcache-loads >>> ... >>> 2.221477739 seconds time elapsed >>> >>> (w/ shadowing): >...
2015 Nov 11
0
[PATCH] virtio_ring: Shadow available ring flags & index
...heline to transfer > core -> core optimally. Sounds logical, I'll apply this after a bit of testing of my own, thanks! > In a concurrent version of vring_bench, the time required for > 10,000,000 buffer checkout/returns was reduced by ~2% (average > across many runs) on an AMD Piledriver (15h) CPU: > > (w/o shadowing): > Performance counter stats for './vring_bench': > 5,451,082,016 L1-dcache-loads > ... > 2.221477739 seconds time elapsed > > (w/ shadowing): > Performance counter stats for './vring_bench': >...
2012 Feb 15
4
question on unused directories in /usr/lib and /usr/lib64
I was working on archiving an old virtual server today and was reminded of how much space is wasted by some of the default installations on CentOS. I think this was a 5.x box. Anyway, in /usr/lib64 (and probably /usr/lib on non-64 systems), there were a lot of directories which have no bearing on a basic server. I saw firefox, openoffice and many, many other directories -- replete with enough