thr3ads.net - search: "microarchitectural"

Displaying 20 results from an estimated 156 matches for "microarchitectural".

(RFC) Adjusting default loop fully unroll threshold

2017 Feb 13

...is on x86 microarchitectures. Until someone shows up with data showing that we need different tunings for different microarchitectures, it doesn't make sense for us to just make up numbers there. On the (very limited) microarchitectures we have and can test on, we're not seeing a need for microarchitectural tuning. But if others have different data, that would of course be welcome. That's part of what we're looking for in this thread. > I have no data or proof but would not be surprised to see a wider variety > of numbers when the thresholds are tested on a wide range of x86 machines....

[PATCH][XENOPROFILE] add support for Intel CORE microarchitecture

2006 Oct 02

This adds support for Core and Core 2 chips. Tested on Woodcrest processors. Requires OProfile 0.9.2. -Andrew Signed-off-by: Andrew Theurer <habanero@us.ibm.com>

Load combine pass

2016 Sep 29

On 29 Sep 2016, at 01:25, Sanjoy Das <sanjoy at playingwithpointers.com> wrote: > > Hi David, > > David Chisnall via llvm-dev wrote: > > On 28 Sep 2016, at 16:50, Philip Reames via llvm-dev<llvm-dev at lists.llvm.org> wrote: > >> At this point, my general view is that widening transformations of any kind should be done very late. Ideally, this is

[AArch64] Address computation folding

2015 Nov 11

Hi, I was looking at some AArch64 benchmarks and noticed some simple cases where addresses are being folded into the address mode computations and was curious as to why. In particular, consider the following simple example:

    void f2(unsigned long *x, unsigned long c) { x[c] *= 2; }

This generates:

    lsl x8, x1, #3
    ldr x9, [x0, x8]
    lsl x9, x9, #1
    str x9, [x0, x8]

Given the two

[RFC] llvm-exegesis: Automatic Measurement of Instruction Latency/Uops

2018 Mar 15

[You can find an easier to read and more complete version of this RFC here <https://docs.google.com/document/d/1QidaJMJUyQdRrFKD66vE1_N55whe0coQ3h1GpFzz27M/edit?ts=5aaa84ee#> .] Knowing instruction scheduling properties (latency, uops) is the basis for all scheduling work done by LLVM. Unfortunately, vendors usually release only partial (and sometimes incorrect) information. Updating the

[LLVMdev] SchedMachineModel clarifications

2013 Nov 13

Dear Andrew and the Group, I’m trying to come up with a SchedMachineModel for the AMD Bulldozer (http://en.wikipedia.org/wiki/Bulldozer_(microarchitecture)). No such model exists yet; please correct me if I am wrong here. I was going through your reference at https://llvm.org/svn/llvm-project/llvm/trunk/include/llvm/Target/TargetSchedule.td . But I couldn’t model some of the

[LLVMdev] Generating movq2dq using IRBuilder

2008 Jul 31

On 31-Jul-08, at 2:38 PM, Dan Gohman wrote: > On Jul 31, 2008, at 7:22 AM, Nicolas Capens wrote: >> In the same breath I’d also like to kindly ask if someone could have >> a look at the reverse operations, namely trunk from 128 to 64 bit >> using movdq2q, and 128 to 32 and 64 to 32 using movd. This also >> seems related to Bug 2585. Thanks again. > > The operations

[AArch64] Address computation folding

2015 Nov 11

Hi, Indeed, the complex add is more expensive on all Cortex cores I know of. However, there is an important point here: the code sequence we generate requires two registers live instead of one. In loops with high register pressure, we're probably losing performance. James On Wed, 11 Nov 2015 at 21:09, Tim Northover via llvm-dev < llvm-dev at lists.llvm.org> wrote: > On 11 November 2015 at

Why did Intel change his static branch prediction mechanism during these years?

2018 Aug 14

(I don't know if it's allowed to ask such a question; if not, please let me know.) I know Intel has implemented several static branch prediction mechanisms over the years:
* 80486 era: always not taken
* Pentium 4 era: backwards taken / forwards not taken
* PM, Core2: no static prediction; the outcome depends on whatever happens to be in the corresponding BTB entry, according to agner's
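
For readers unfamiliar with the first two policies named in this snippet, here is a minimal sketch of how each would decide a branch with no history. The function name and the policy strings are hypothetical, and this only illustrates the textbook heuristics, not a description of any specific Intel implementation.

```python
def predict_static(branch_pc: int, target_pc: int, policy: str) -> bool:
    """Return True if the branch is statically predicted taken.

    `policy` is a hypothetical label for the heuristics named in the snippet.
    """
    if policy == "always_not_taken":
        # Fall through regardless of branch direction.
        return False
    if policy == "btfnt":
        # Backwards taken / forwards not taken: a target at a lower
        # address is usually a loop back-edge, so predict it taken.
        return target_pc < branch_pc
    raise ValueError(f"unknown policy: {policy}")
```

Under "btfnt", a loop-closing branch at 0x1000 that jumps back to 0x0F00 is predicted taken, while a forward skip to 0x1100 is predicted not taken.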

RFC: [X86] Introducing command line options to prefer narrower vector instructions even when wider instructions are available

2017 Nov 01

Hello all, I would like to propose adding the -mprefer-avx256 and -mprefer-avx128 command line flags supported by the latest GCC to clang. These flags will be used to limit the vector register size presented by TTI to the vectorizers. The backend will still be able to use wider registers for code written using the intrinsics in x86intrin.h. And the backend will still be able to use AVX512VL

Pattern transformation between scalar and vector on IR.

2016 Sep 08

Hi All, I'm trying to use RSQRT instructions for the following case on ARM (currently sqrt is used): 1.0 / sqrt(x) The RSQRT instructions (VRSQRTE/VRSQRTS) are vector instructions, but the operation above is scalar, so a transformation must be done (transform the sqrt pattern to rsqrt). I have completed a patch for this, but I made the transformation in the backend, which will lead to additional
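
As background for the estimate/step pairing mentioned in this snippet, the following is a rough scalar sketch of refining a reciprocal square root estimate with Newton-Raphson. `rsqrt_estimate` is a hypothetical stand-in for a hardware estimate (it simply rounds the true value to two significant digits), and `rsqrt_step` models the (3 - a*b) / 2 value that VRSQRTS computes per lane; none of this is taken from the patch under discussion.

```python
import math


def rsqrt_estimate(d: float) -> float:
    """Hypothetical low-precision estimate of 1/sqrt(d) (about 2 digits)."""
    return float(f"{1.0 / math.sqrt(d):.2g}")


def rsqrt_step(a: float, b: float) -> float:
    """Newton-Raphson step factor (3 - a*b) / 2."""
    return (3.0 - a * b) / 2.0


def rsqrt(d: float, steps: int = 2) -> float:
    """Refine the estimate: x <- x * (3 - d*x*x) / 2, repeated `steps` times."""
    x = rsqrt_estimate(d)
    for _ in range(steps):
        x = x * rsqrt_step(d * x, x)
    return x
```

Each step roughly squares the relative error, which is why a coarse estimate plus one or two steps is usually enough for single precision.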

[RFC] llvm-exegesis: Automatic Measurement of Instruction Latency/Uops

2018 Mar 15

On 03/15/2018 10:04 AM, Guillaume Chatelet via llvm-dev wrote: > [You can find an easier to read and more complete version of this RFC > here > <https://docs.google.com/document/d/1QidaJMJUyQdRrFKD66vE1_N55whe0coQ3h1GpFzz27M/edit?ts=5aaa84ee#>.] > > Knowing instruction scheduling properties (latency, uops) is the basis > for all scheduling work done by LLVM. > > >

[RFC] llvm-exegesis: Automatic Measurement of Instruction Latency/Uops

2018 Mar 15

On Thu, Mar 15, 2018 at 4:41 PM, Hal Finkel via llvm-dev < llvm-dev at lists.llvm.org> wrote: > > On 03/15/2018 10:04 AM, Guillaume Chatelet via llvm-dev wrote: > > [You can find an easier to read and more complete version of this RFC here > <https://docs.google.com/document/d/1QidaJMJUyQdRrFKD66vE1_N55whe0coQ3h1GpFzz27M/edit?ts=5aaa84ee#> > .] > > Knowing

[LLVMdev] Instruction MVT::ValueTypes

2008 Sep 03

On Sep 3, 2008, at 1:14 PM, David Greene wrote: > On Tuesday 02 September 2008 16:47, Evan Cheng wrote: >> On Sep 2, 2008, at 10:42 AM, David Greene wrote: >>> Is there an easy way to get the MVT::ValueType of a >>> MachineInstruction >>> MachineOperand? For example, the register operand of an x86 MOVAPD >>> should >>> have an

[LLVMdev] oprofile support?

2014 Oct 17

I've been trying to get oprofile results for jitted code without success. I built LLVM 3.5.0 with oprofile enabled, and tested it with lli on a small test case. I built the latest oprofile from the git repository. While debugging, I can see that lli is registering the listener and making the oprofile calls to the libopagent API to specify the names and address ranges of jit'd routines, and

[LLVMdev] SchedMachineModel clarifications

2013 Nov 21

Dear All, The attached files contain the changes made to add the SchedModel for an AMD Bulldozer target. Please note that the model is incomplete but has some of the valuable features implemented. I would appreciate comments on the implementation from the group or from someone at AMD. Thanks ~umesh On Wed, Nov 13, 2013 at 8:14 PM, Umesh Kalappa <umesh.kalappa0 at gmail.com>wrote: >

[RFC] llvm-exegesis: Automatic Measurement of Instruction Latency/Uops

2018 Mar 15

Sounds like a very useful tool. Thank you for contributing. Taking a step back and looking at the big picture, combining this with the recently contributed llvm-mca dramatically improves our scheduling and performance analysis story. Being able to take a snippet of code on a particular machine, measure latency/throughput/ports for each instruction (this tool), and then analyze the entire

[LLVMdev] Static Profiling Algorithms in LLVM

2010 Nov 02

Hello Kapil, I implemented a static profiler for LLVM as a Google Summer of Code project in 2009. I wrote it for the 2.4 branch, but the implementation never made it into the tree. I have recently ported it to LLVM 2.8, but I haven't tested it. You can take a look at the code at: http://homepages.dcc.ufmg.br/~rimsa/tools/stprof-llvm.patch The implementation is based on Wu's

[compiler-rt] Improve atomic locking?

2016 Dec 29

Hey, I am wondering whether there is room for improving the locking of a pointer when an atomic operation is performed. I've noticed that one could increase SPINLOCK_COUNT in lib/builtins/atomic.c to (1 << 13), which is an 8x increase in available locks, if we also change the type of the atomic lock, which currently is uintptr_t, to a single byte (uint8_t), which I
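
The arithmetic behind this proposal can be sketched as follows, assuming a 64-bit target where uintptr_t is 8 bytes (an assumption not stated in the snippet). The old lock count is derived from the stated "8x increase", and `lock_index` is a hypothetical address-to-slot mapping for illustration only; it is not claimed to match the hash used in atomic.c.

```python
NEW_COUNT = 1 << 13          # proposed SPINLOCK_COUNT from the snippet
OLD_COUNT = NEW_COUNT // 8   # implied by the stated "8x increase"
UINTPTR_BYTES = 8            # assumption: 64-bit target
UINT8_BYTES = 1


def table_bytes(count: int, lock_bytes: int) -> int:
    """Total memory used by a table of `count` locks of `lock_bytes` each."""
    return count * lock_bytes


def lock_index(addr: int, count: int) -> int:
    """Hypothetical mapping: drop the low 4 bits, then mask to the table size."""
    return (addr >> 4) & (count - 1)
```

Under these assumptions both layouts occupy the same 8192 bytes, so the extra locks would come from shrinking each lock rather than from growing the table.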

_ExtInt, LLVM integers and constant time

2020 Apr 22

> On Apr 22, 2020, at 12:24 AM, Roman Lebedev via llvm-dev <llvm-dev at lists.llvm.org> wrote: > > On Wed, Apr 22, 2020 at 9:35 AM Adrien Guinet via llvm-dev > <llvm-dev at lists.llvm.org <mailto:llvm-dev at lists.llvm.org>> wrote: >> >> Hello everyone, >> >> After reading the nice blog post about _ExtInt, I was wondering whether >>