Displaying 13 results from an estimated 13 matches for "piledriver".
2013 Nov 22
2
[LLVMdev] SchedMachineModel clarifications
If you haven't found it yet, the last public AMD Software Optimization
Guide for Family 15h is here:
http://developer.amd.com/wordpress/media/2012/03/47414_15h_sw_opt_guide.pdf
This one describes both Bulldozer and Piledriver processors. Chapter 2
will give an overview of the microarchitecture, and Appendix B gives some
additional details on which pipes are used where.
I haven't yet looked in detail at your patch to check your model, but at
minimum the comments, references and naming still all refer to Sandy B...
2015 Nov 18
2
[PATCH] virtio_ring: Shadow available ring flags & index
...useful numbers for this case -- even with
core turbo disabled, runtime variance is very high (10 - 30% run-to-run).
> For the perf metric you provide, why not L1-dcache-load-misses, which is
> more meaningful?
L1-dcache-load-misses is a better metric, you're right; for the original
AMD Piledriver run I posted:
Performance counter stats for './vring_bench_noshadow':
5,451,082,016 L1-dcache-loads
31,690,398 L1-dcache-load-misses
60,288,052 L1-dcache-stores
60,517,840 LLC-loads
9,726 LLC-load-misses
2.221477739...
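As a rough aid to reading the counters quoted above, the miss rates follow from dividing misses by loads. This is a minimal sketch that uses only the four values visible in the snippet; the variable and function names are illustrative and do not come from the thread.

```python
# Counter values copied from the perf output quoted in the snippet above.
l1_loads = 5_451_082_016
l1_load_misses = 31_690_398
llc_loads = 60_517_840
llc_load_misses = 9_726


def miss_rate(misses, accesses):
    """Return misses as a fraction of accesses."""
    return misses / accesses


l1_rate = miss_rate(l1_load_misses, l1_loads)
llc_rate = miss_rate(llc_load_misses, llc_loads)

# Print both rates as percentages for a quick comparison.
print(f"L1-dcache load miss rate: {l1_rate:.2%}")
print(f"LLC load miss rate:       {llc_rate:.3%}")
```

Both rates come out well under 1%, which is worth keeping in mind next to the 10 - 30% run-to-run variance mentioned in the same message.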
2015 Nov 13
2
[PATCH] virtio_ring: Shadow available ring flags & index
...e optimally.
>
> Sounds logical, I'll apply this after a bit of testing
> of my own, thanks!
Thanks!
> > In a concurrent version of vring_bench, the time required for
> > 10,000,000 buffer checkout/returns was reduced by ~2% (average
> > across many runs) on an AMD Piledriver (15h) CPU:
> >
> > (w/o shadowing):
> > Performance counter stats for './vring_bench':
> > 5,451,082,016 L1-dcache-loads
> > ...
> > 2.221477739 seconds time elapsed
> >
> > (w/ shadowing):
> > Performance counter...
2015 Nov 19
1
[PATCH] virtio_ring: Shadow available ring flags & index
...core turbo disabled, runtime variance is very high (10 - 30% run-to-run).
>>
>>> For the perf metric you provide, why not L1-dcache-load-misses, which is
>>> more meaningful?
>> L1-dcache-load-misses is a better metric, you're right; for the original
>> AMD Piledriver run I posted:
>>
>> Performance counter stats for './vring_bench_noshadow':
>> 5,451,082,016 L1-dcache-loads
>> 31,690,398 L1-dcache-load-misses
>> 60,288,052 L1-dcache-stores
>> 60,517,840 LLC-loads
>>...
2015 Nov 18
0
[PATCH] virtio_ring: Shadow available ring flags & index
...n with
> core turbo disabled, runtime variance is very high (10 - 30% run-to-run).
>
> > For the perf metric you provide, why not L1-dcache-load-misses, which is
> > more meaningful?
>
> L1-dcache-load-misses is a better metric, you're right; for the original
> AMD Piledriver run I posted:
>
> Performance counter stats for './vring_bench_noshadow':
> 5,451,082,016 L1-dcache-loads
> 31,690,398 L1-dcache-load-misses
> 60,288,052 L1-dcache-stores
> 60,517,840 LLC-loads
> 9,726...
2015 Nov 11
2
[PATCH] virtio_ring: Shadow available ring flags & index
...w reads from the shadows and only ever writes to
avail->flags and avail->idx, allowing the cacheline to transfer
core -> core optimally.
In a concurrent version of vring_bench, the time required for
10,000,000 buffer checkout/returns was reduced by ~2% (average
across many runs) on an AMD Piledriver (15h) CPU:
(w/o shadowing):
Performance counter stats for './vring_bench':
5,451,082,016 L1-dcache-loads
...
2.221477739 seconds time elapsed
(w/ shadowing):
Performance counter stats for './vring_bench':
5,405,701,361 L1-dcache-loads
......
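The snippet above is truncated, so the following is only a toy model of the idea it describes, not the kernel patch itself. It assumes a producer that keeps a private shadow copy of the available index and therefore never has to read the shared field back before updating it. `SharedAvail`, `UnshadowedProducer`, `ShadowedProducer` and `run` are hypothetical names, the 16-bit wrap of the index is an assumption of the sketch, and the `reads` counter merely stands in for accesses to the shared cache line, which Python cannot model.

```python
class SharedAvail:
    """Hypothetical stand-in for the shared avail->flags / avail->idx fields."""

    def __init__(self):
        self.flags = 0
        self.idx = 0
        self.reads = 0  # counts read-backs of the shared index

    def read_idx(self):
        self.reads += 1
        return self.idx

    def write_idx(self, value):
        self.idx = value & 0xFFFF  # assumed 16-bit index


class UnshadowedProducer:
    """Reads the shared index back every time before bumping it."""

    def __init__(self, shared):
        self.shared = shared

    def publish(self):
        self.shared.write_idx(self.shared.read_idx() + 1)


class ShadowedProducer:
    """Keeps a private shadow of the index and only ever writes the shared copy."""

    def __init__(self, shared):
        self.shared = shared
        self.idx_shadow = shared.idx  # one-time setup, not counted as a read-back

    def publish(self):
        self.idx_shadow = (self.idx_shadow + 1) & 0xFFFF
        self.shared.write_idx(self.idx_shadow)


def run(producer_cls, n):
    """Publish n buffers with the given producer and return the shared state."""
    shared = SharedAvail()
    producer = producer_cls(shared)
    for _ in range(n):
        producer.publish()
    return shared


plain = run(UnshadowedProducer, 1000)
shadowed = run(ShadowedProducer, 1000)
print("read-backs without shadowing:", plain.reads)
print("read-backs with shadowing:   ", shadowed.reads)
```

Both variants leave the shared index in the same state; the difference is only in how often the producer touches the shared field for reading, which is the access pattern the quoted message attributes the improvement to.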
2013 Nov 22
0
[LLVMdev] SchedMachineModel clarifications
...Mike Vermeulen <mevermeulen at gmail.com> wrote:
> If you haven't found it yet, the last public AMD Software Optimization
> Guide for Family 15h is here:
> http://developer.amd.com/wordpress/media/2012/03/47414_15h_sw_opt_guide.pdf
>
> This one describes both Bulldozer and Piledriver processors. Chapter 2
> will give an overview of the microarchitecture, and Appendix B gives some
> additional details on which pipes are used where.
>
> I haven't yet looked in detail at your patch to check your model, but at
> minimum the comments, references and naming st...
2015 Nov 17
0
[PATCH] virtio_ring: Shadow available ring flags & index
...Intel CPUs?
For the perf metric you provide, why not L1-dcache-load-misses, which is
more meaningful?
>
>>> In a concurrent version of vring_bench, the time required for
>>> 10,000,000 buffer checkout/returns was reduced by ~2% (average
>>> across many runs) on an AMD Piledriver (15h) CPU:
>>>
>>> (w/o shadowing):
>>> Performance counter stats for './vring_bench':
>>> 5,451,082,016 L1-dcache-loads
>>> ...
>>> 2.221477739 seconds time elapsed
>>>
>>> (w/ shadowing):
>>...
2015 Nov 11
0
[PATCH] virtio_ring: Shadow available ring flags & index
...heline to transfer
> core -> core optimally.
Sounds logical, I'll apply this after a bit of testing
of my own, thanks!
> In a concurrent version of vring_bench, the time required for
> 10,000,000 buffer checkout/returns was reduced by ~2% (average
> across many runs) on an AMD Piledriver (15h) CPU:
>
> (w/o shadowing):
> Performance counter stats for './vring_bench':
> 5,451,082,016 L1-dcache-loads
> ...
> 2.221477739 seconds time elapsed
>
> (w/ shadowing):
> Performance counter stats for './vring_bench':
>...
2012 Feb 15
4
question on unused directories in /usr/lib and /usr/lib64
I was working on archiving an old virtual server today and was reminded of how much space is wasted by some of the default installations on CentOS. I think this was a 5.x box.
Anyway, in /usr/lib64 (and probably /usr/lib on non-64-bit systems), there were a lot of directories which have no bearing on a basic server. I saw firefox, openoffice and many, many other directories -- replete with enough