thr3ads.net - similar to: "[LLVMdev] Macro-op fusion experiment"

Displaying 20 results from an estimated 7000 matches similar to: "[LLVMdev] Macro-op fusion experiment"

2011 Apr 08

[LLVMdev] Macro-op fusion experiment

On Apr 8, 2011, at 9:56 AM, NAKAMURA Takumi wrote: >>> 8B C3 mov eax, ebx >>> 03 C1 add eax, ecx >>> becomes >>> 8B C3 03 C1 add eax, ebx, ecx > > In my understanding, twoaddr pass tends to emit such a sequence. Yes, it always does, and the coalescer tries very hard to eliminate the copy. > Though I

[LLVMdev] Macro-op fusion experiment

2011 Apr 08

[LLVMdev] Macro-op fusion experiment

>> 8B C3 mov eax, ebx >> 03 C1 add eax, ecx >> becomes >> 8B C3 03 C1 add eax, ebx, ecx In my understanding, twoaddr pass tends to emit such a sequence. Though I don't have sandybridge, I have not measured. Prior processors(intel and amd) might spend 1 ALU to execute "mov", then mov - add must have dependency.

[LLVMdev] Macro-op fusion experiment

2011 Apr 17

[LLVMdev] Macro-op fusion experiment

Hi Jacob, As far as I know, an x86 'mov' instruction always uses an ALU resource. According to Agner Fog's documents (http://www.agner.org/optimize/), it can execute on port 0, 1 or 5 on recent architectures though. So it's not that likely to be resource limited. But it still occupies an instruction slot throughout the entire pipeline, costing power and potentially limiting other

[LLVMdev] Macro-op fusion experiment

2011 Apr 08

[LLVMdev] Macro-op fusion experiment

Hi all, x86 processors use macro-op fusion to merge together two instructions and execute them as one. So it's beneficial for the compiler to emit them as a pair. Currently only compare and jump instructions get fused though. And I was wondering whether it also makes sense to fuse move and arithmetic instructions together, to form non-destructive instructions (which x86 lacks for

[LLVMdev] Macro-op fusion experiment

2011 Apr 17

[LLVMdev] Macro-op fusion experiment

On Apr 17, 2011, at 9:59 AM, Nicolas Capens wrote: > My immediate concern is getting a reasonable estimate for how often this macro-op fusion could be performed. This could then be used to evaluate whether it's worth the added decoder complexity. In that case, just look at the generated code. I don't think any pass is inserting instructions between 'mov' and two-address

[LLVMdev] Need advice on writing scheduling pass

2011 May 24

[LLVMdev] Need advice on writing scheduling pass

Hi (Jakob), in reference to the prior message below, I have the following follow-up questions, as I also need a scheduling pass prior to regalloc. I need to do this in order to set VLIW-flags, so that the RA is aware of several MI's per cycle with a redefined LiveRange::overlap-function. On a multiple-issue cycle, a register that gets killed can be reused by another MI - these live ranges do

[LLVMdev] Need advice on writing scheduling pass

2011 May 24

[LLVMdev] Need advice on writing scheduling pass

On May 24, 2011, at 8:22 AM, Jonas Paulsson wrote: > Hi (Jakob), > > in reference to the prior message below, I have the following follow-up questions, as I also need a scheduling pass > prior to regalloc. I need to do this in order to set VLIW-flags, so that the RA is aware of several MI's > per cycle with a redefined LiveRange::overlap-function. On a multiple-issue cycle, a

[RFC] llvm-exegesis: Automatic Measurement of Instruction Latency/Uops

2018 Mar 15

[RFC] llvm-exegesis: Automatic Measurement of Instruction Latency/Uops

[You can find an easier to read and more complete version of this RFC here <https://docs.google.com/document/d/1QidaJMJUyQdRrFKD66vE1_N55whe0coQ3h1GpFzz27M/edit?ts=5aaa84ee#> .] Knowing instruction scheduling properties (latency, uops) is the basis for all scheduling work done by LLVM. Unfortunately, vendors usually release only partial (and sometimes incorrect) information. Updating the

[LLVMdev] New machine model questions

2014 Jan 28

[LLVMdev] New machine model questions

From: Andrew Trick [mailto:atrick at apple.com] Sent: 24 January 2014 21:52 To: Daniel Sanders Cc: LLVM Developers Mailing List (llvmdev at cs.uiuc.edu) Subject: Re: New machine model questions On Jan 24, 2014, at 2:21 AM, Daniel Sanders <Daniel.Sanders at imgtec.com<mailto:Daniel.Sanders at imgtec.com>> wrote: Hi Andrew, I seem to be making good progress on the P5600 scheduler

[LLVMdev] [PATCH] Detect Haswell subarchitecture (i.e. using -march=native)

2013 Sep 12

[LLVMdev] [PATCH] Detect Haswell subarchitecture (i.e. using -march=native)

Hi Adam, > OK. I know the reason you cannot reproduce it, before posting > the patch I've decided to check for AVX before checking AVX2, > just not to cpuid AVX2 when we don't have AVX1 anyway. I suspect it was also incompetence on my part. Given the differences I'm seeing now I can't believe there'd be *no* difference in my tests if I'd done them properly.

[LLVMdev] [PATCH] Detect Haswell subarchitecture (i.e. using -march=native)

2013 Nov 23

[LLVMdev] [PATCH] Detect Haswell subarchitecture (i.e. using -march=native)

I agree with Tim, you need to implement a GetCpuIDAndInfoEx function in Host.cpp and pass the correct value to ecx. Also you need to verify that 7 is a valid leaf because an invalid leaf is defined to return the highest supported leaf on that processor. So if a processor supports say leaf 6 and not leaf 7, then an access leaf 7 will return the data from leaf 6 causing unrelated bits to be

Implement Loop Fusion Pass

2016 Feb 18

Implement Loop Fusion Pass

Hi all, I have created a patch (up for review at: http://reviews.llvm.org/D17386) that does Loop Fusion implementation. Approach: Legality: Currently it can fuse two adjacent loops whose iteration spaces are same and are at same depth. Dependence legality: Currently, dependence legality cannot be checked across loops. Hence the loops are cloned along a versioned path, unconditionally fused

TwoAddressInstructionPass::isProfitableToConv3Addr()

2015 Sep 29

TwoAddressInstructionPass::isProfitableToConv3Addr()

Hi, I have cases of instruction pairs, where one is cheaper 2-address, and the other 3-address. I would like to select the 2-addr instruction during isel, but use the 3-addr instruction to avoid a copy if possible. I find that TwoAddressInstructionPass::isProfitableToConv3Addr() is only checking for the case of a physreg copy, and so leaves the majority of cases as they are (2-address). I

[RFC] llvm-exegesis: Automatic Measurement of Instruction Latency/Uops

2018 Mar 15

[RFC] llvm-exegesis: Automatic Measurement of Instruction Latency/Uops

On Thu, Mar 15, 2018 at 4:41 PM, Hal Finkel via llvm-dev < llvm-dev at lists.llvm.org> wrote: > > On 03/15/2018 10:04 AM, Guillaume Chatelet via llvm-dev wrote: > > [You can find an easier to read and more complete version of this RFC here > <https://docs.google.com/document/d/1QidaJMJUyQdRrFKD66vE1_N55whe0coQ3h1GpFzz27M/edit?ts=5aaa84ee#> > .] > > Knowing

[LLVMdev] [PATCH] Detect Haswell subarchitecture (i.e. using -march=native)

2013 Sep 12

[LLVMdev] [PATCH] Detect Haswell subarchitecture (i.e. using -march=native)

> That's far more worrying to me than not being able to detect Haswell. > I can't reproduce the problem here at the moment: both debug and > release builds give identical assembly for Host.cpp. OK. I know the reason you cannot reproduce it, before posting the patch I've decided to check for AVX before checking AVX2, just not to cpuid AVX2 when we don't have AVX1 anyway.

AArch64 fmul/fadd fusion

2015 Sep 19

AArch64 fmul/fadd fusion

Hi All, Recently I was doing some AArch64 work and noticed some cases where fmuls were not getting fused with fadds. Is there any particular reason that the AArch64 machine combiner doesn't do this like it does for add/mul? I am happy to work up a patch for this, but I wanted to make sure that there wasn't a good reason for it not already being there. FWIW, I see where GCC is doing

Implement Loop Fusion Pass

2016 Feb 19

Implement Loop Fusion Pass

Hi, Thanks for the reply. Few thoughts inlined. On Fri, Feb 19, 2016 at 8:00 AM, Adam Nemet <anemet at apple.com> wrote: > Hi Vikram, > > On Feb 18, 2016, at 9:21 AM, Vikram TV via llvm-dev < > llvm-dev at lists.llvm.org> wrote: > > Hi all, > > I have created a patch (up for review at: http://reviews.llvm.org/D17386) > that does Loop Fusion implementation.

[RFC] llvm-exegesis: Automatic Measurement of Instruction Latency/Uops

2018 Mar 15

[RFC] llvm-exegesis: Automatic Measurement of Instruction Latency/Uops

On 03/15/2018 10:04 AM, Guillaume Chatelet via llvm-dev wrote: > [You can find an easier to read and more complete version of this RFC > here > <https://docs.google.com/document/d/1QidaJMJUyQdRrFKD66vE1_N55whe0coQ3h1GpFzz27M/edit?ts=5aaa84ee#>.] > > Knowing instruction scheduling properties (latency, uops) is the basis > for all scheduling work done by LLVM. > > >

[RFC] llvm-exegesis: Automatic Measurement of Instruction Latency/Uops

2018 Mar 15

[RFC] llvm-exegesis: Automatic Measurement of Instruction Latency/Uops

Sounds like a very useful tool. Thank you for contributing. Taking a step back and looking at the big picture, combining this with the recently contributed llvm-mca dramatically improves our scheduling and performance analysis story. Being able to take a snippet of code on a particular machine, measure latency/throughput/ports for each instruction (this tool), and then analyze the entire

Search list of elements for a specific pattern

2012 Jun 22

Search list of elements for a specific pattern

Hi, I have a list of mutations, called "mutList", of the form: > head(mutList) Alu 1 AluJ 2 AluJ/F(R)AM 3 AluJ/FLAM 4 AluJ/FRAM 5 AluJ/monomer 6 AluJb It contains about 500 elements and not all of them contain the sequence "Alu". I tried using this code: Alu<-mutList[which(grep("Alu",mutList)==1)] But that simply returned

similar to: [LLVMdev] Macro-op fusion experiment