Displaying 20 results from an estimated 156 matches for "microarchitectures".
2017 Feb 13
2
(RFC) Adjusting default loop fully unroll threshold
...<
llvm-dev at lists.llvm.org> wrote:
> For unrolling specifically I agree with Hal that the hooks should be
> target specific. Actually, I go further and think they should be uArch
> specific.
>
They already are, it is just that no one has contributed a patch to use
this on x86 microarchitectures.
Until someone shows up with data showing that we need different tunings for
different microarchitectures, it doesn't make sense for us to just make up
numbers there.
On the (very limited) microarchitectures we have and can test on, we're not
seeing a need for microarchitectural tuning. B...
2006 Oct 02
0
[PATCH][XENOPROFILE] add support for Intel CORE microarchitecture
This adds support for core and core2 chips. Tested on Woodcrest
processors. Requires Oprofile 0.9.2.
-Andrew
Signed-off-by: Andrew Theurer <habanero@us.ibm.com>
2016 Sep 29
2
Load combine pass
On 29 Sep 2016, at 01:25, Sanjoy Das <sanjoy at playingwithpointers.com> wrote:
>
> Hi David,
>
> David Chisnall via llvm-dev wrote:
> > On 28 Sep 2016, at 16:50, Philip Reames via llvm-dev <llvm-dev at lists.llvm.org> wrote:
> >> At this point, my general view is that widening transformations of any kind should be done very late. Ideally, this is
2015 Nov 11
3
[AArch64] Address computation folding
Hi,
I was looking at some AArch64 benchmarks and noticed some simple cases
where addresses are being folded into the address mode computations
and was curious as to why. In particular, consider the following
simple example:
void f2(unsigned long *x, unsigned long c)
{
    x[c] *= 2;
}
This generates:
    lsl x8, x1, #3
    ldr x9, [x0, x8]
    lsl x9, x9, #1
    str x9, [x0, x8]
Given the two
2018 Mar 15
5
[RFC] llvm-exegesis: Automatic Measurement of Instruction Latency/Uops
[You can find an easier to read and more complete version of this RFC here
<https://docs.google.com/document/d/1QidaJMJUyQdRrFKD66vE1_N55whe0coQ3h1GpFzz27M/edit?ts=5aaa84ee#>
.]
Knowing instruction scheduling properties (latency, uops) is the basis for
all scheduling work done by LLVM.
Unfortunately, vendors usually release only partial (and sometimes
incorrect) information. Updating the
2013 Nov 13
2
[LLVMdev] SchedMachineModel clarifications
Dear Andrew and the Group,
I’m trying to come up with a SchedMachineModel for the AMD Bulldozer
http://en.wikipedia.org/wiki/Bulldozer_(microarchitecture).
No such model exists for it yet. Please correct me if I am wrong here.
I was going through your reference @
https://llvm.org/svn/llvm-project/llvm/trunk/include/llvm/Target/TargetSchedule.td
.
But I couldn’t model some of the
2008 Jul 31
0
[LLVMdev] Generating movq2dq using IRBuilder
...ors, MMX
instructions are often emitted even when SSE3 is available. Is this
really the intent or is it just that SSE versions of certain patterns
have not been added, and therefore it falls back to MMX versions? It's
not really encouraged to use MMX (or x87 for that matter) on modern
microarchitectures if you can get away with SSE.
--
Stefanus Du Toit <stefanus.dutoit at rapidmind.com>
RapidMind Inc.
phone: +1 519 885 5455 x116 -- fax: +1 519 885 1463
2015 Nov 11
2
[AArch64] Address computation folding
Hi,
Indeed, the complex add is more expensive on all Cortex cores I know of.
However there is an important point here that the code sequence we generate
requires two registers live instead of one. In high register-pressure loops,
we're probably losing performance.
James
On Wed, 11 Nov 2015 at 21:09, Tim Northover via llvm-dev <
llvm-dev at lists.llvm.org> wrote:
> On 11 November 2015 at
2018 Aug 14
4
Why did Intel change his static branch prediction mechanism during these years?
( I don't know if it's allowed to ask such question, if not, please remind me. )
I know Intel has implemented several static branch prediction mechanisms
over the years:
* 80486 era: Always not taken
* Pentium 4 era: Backwards Taken/Forwards Not-Taken
* PM, Core2: No static prediction; the outcome depends on whatever
happens to be in the corresponding BTB entry, according to Agner's
2017 Nov 01
5
RFC: [X86] Introducing command line options to prefer narrower vector instructions even when wider instructions are available
Hello all,
I would like to propose adding the -mprefer-avx256 and -mprefer-avx128
command line flags supported by latest GCC to clang. These flags will be
used to limit the vector register size presented by TTI to the vectorizers.
The backend will still be able to use wider registers for code written
using the intrinsics in x86intrin.h. And the backend will still be able to
use AVX512VL
2016 Sep 08
2
Pattern transformation between scalar and vector on IR.
Hi All,
I'm trying to use RSQRT instructions in the following case on ARM
(sqrt is what is used now):
1.0 / sqrt(x)
The RSQRT instructions (VRSQRTE/VRSQRTS) are vector instructions,
but the operation above is scalar, so a transformation must be
done (transform the sqrt pattern to rsqrt).
I have completed a patch for this, but I made the transformation in the
backend, which leads to additional
2018 Mar 15
0
[RFC] llvm-exegesis: Automatic Measurement of Instruction Latency/Uops
On 03/15/2018 10:04 AM, Guillaume Chatelet via llvm-dev wrote:
> [You can find an easier to read and more complete version of this RFC
> here
> <https://docs.google.com/document/d/1QidaJMJUyQdRrFKD66vE1_N55whe0coQ3h1GpFzz27M/edit?ts=5aaa84ee#>.]
>
> Knowing instruction scheduling properties (latency, uops) is the basis
> for all scheduling work done by LLVM.
>
>
>
2018 Mar 15
3
[RFC] llvm-exegesis: Automatic Measurement of Instruction Latency/Uops
On Thu, Mar 15, 2018 at 4:41 PM, Hal Finkel via llvm-dev <
llvm-dev at lists.llvm.org> wrote:
>
> On 03/15/2018 10:04 AM, Guillaume Chatelet via llvm-dev wrote:
>
> [You can find an easier to read and more complete version of this RFC here
> <https://docs.google.com/document/d/1QidaJMJUyQdRrFKD66vE1_N55whe0coQ3h1GpFzz27M/edit?ts=5aaa84ee#>
> .]
>
> Knowing
2008 Sep 03
0
[LLVMdev] Instruction MVT::ValueTypes
On Sep 3, 2008, at 1:14 PM, David Greene wrote:
> On Tuesday 02 September 2008 16:47, Evan Cheng wrote:
>> On Sep 2, 2008, at 10:42 AM, David Greene wrote:
>>> Is there an easy way to get the MVT::ValueType of a
>>> MachineInstruction
>>> MachineOperand? For example, the register operand of an x86 MOVAPD
>>> should
>>> have an
2014 Oct 17
3
[LLVMdev] oprofile support?
I've been trying to get oprofile results for jitted code without success. I
built LLVM 3.5.0 with oprofile enabled, and tested it with lli on a
small test case. I built the latest oprofile from the git repository.
While debugging, I can see that lli is registering the listener and making the
oprofile calls to the libopagent API to specify the names and address
ranges of jit'd routines, and
2013 Nov 21
0
[LLVMdev] SchedMachineModel clarifications
Dear All,
The attached files contain the changes made to add the SchedModel for an
AMD Bulldozer target.
Please note that the model is incomplete but has some of the valuable
features implemented.
I request comments on the implementation from the group or from someone
at AMD.
Thanks
~umesh
On Wed, Nov 13, 2013 at 8:14 PM, Umesh Kalappa <umesh.kalappa0 at gmail.com> wrote:
>
2018 Mar 15
0
[RFC] llvm-exegesis: Automatic Measurement of Instruction Latency/Uops
Sounds like a very useful tool. Thank you for contributing.
Taking a step back and looking at the big picture, combining this with
the recently contributed llvm-mca dramatically improves our scheduling
and performance analysis story. Being able to take a snippet of code on
a particular machine, measure latency/throughput/ports for each
instruction (this tool), and then analyze the entire
2010 Nov 02
2
[LLVMdev] Static Profiling Algorithms in LLVM
Hello Kapil,
I have implemented a static profiler for LLVM as a google summer of
code project in 2009. I wrote it for the 2.4 branch, but the
implementation never made into the tree. I have recently ported it to
LLVM 2.8, but I haven't tested it. You can take a look at the code
from: http://homepages.dcc.ufmg.br/~rimsa/tools/stprof-llvm.patch
The implementation is based on Wu's
2016 Dec 29
1
[compiler-rt] Improve atomic locking?
Hey,
I am wondering if there wouldn't be more room for improving the
locking of a pointer when an atomic operation is being made since I've
noticed that one could increase the SPINLOCK_COUNT in
lib/builtins/atomic.c to (1 << 13) which is an 8x increase of available
locks if we also change the type of the atomic lock which currently is
uintptr_t to a single byte (uint8_t) which I
2020 Apr 22
3
_ExtInt, LLVM integers and constant time
> On Apr 22, 2020, at 12:24 AM, Roman Lebedev via llvm-dev <llvm-dev at lists.llvm.org> wrote:
>
> On Wed, Apr 22, 2020 at 9:35 AM Adrien Guinet via llvm-dev
> <llvm-dev at lists.llvm.org> wrote:
>>
>> Hello everyone,
>>
>> After reading the nice blog post about _ExtInt, I was wondering whether
>>