similar to: [LLVMdev] Macro-op fusion experiment

Displaying 20 results from an estimated 7000 matches similar to: "[LLVMdev] Macro-op fusion experiment"

2011 Apr 08
3
[LLVMdev] Macro-op fusion experiment
On Apr 8, 2011, at 9:56 AM, NAKAMURA Takumi wrote: >>> 8B C3 mov eax, ebx >>> 03 C1 add eax, ecx >>> becomes >>> 8B C3 03 C1 add eax, ebx, ecx > > In my understanding, twoaddr pass tends to emit such a sequence. Yes, it always does, and the coalescer tries very hard to eliminate the copy. > Though I
2011 Apr 08
0
[LLVMdev] Macro-op fusion experiment
>>                 8B C3 mov eax, ebx >>                 03 C1 add eax, ecx >> becomes >>                 8B C3 03 C1 add eax, ebx, ecx In my understanding, twoaddr pass tends to emit such a sequence. Though I don't have sandybridge, I have not measured. Prior processors(intel and amd) might spend 1 ALU to execute "mov", then mov - add must have dependency.
2011 Apr 17
0
[LLVMdev] Macro-op fusion experiment
Hi Jacob, As far as I know, an x86 'mov' instruction always uses an ALU resource. According to Agner Fog's documents (http://www.agner.org/optimize/), it can execute on port 0, 1 or 5 on recent architectures though. So it's not that likely to be resource limited. But it still occupies an instruction slot throughout the entire pipeline, costing power and potentially limiting other
2011 Apr 08
0
[LLVMdev] Macro-op fusion experiment
Hi all, x86 processors use macro-op fusion to merge together two instructions and execute them as one. So it's beneficial for the compiler to emit them as a pair. Currently only compare and jump instructions get fused though. And I was wondering whether it also makes sense to fuse move and arithmetic instructions together, to form non-destructive instructions (which x86 lacks for
2011 Apr 17
1
[LLVMdev] Macro-op fusion experiment
On Apr 17, 2011, at 9:59 AM, Nicolas Capens wrote: > My immediate concern is getting a reasonable estimate for how often this macro-op fusion could be performed. This could then be used to evaluate whether it's worth the added decoder complexity. In that case, just look at the generated code. I don't think any pass is inserting instructions between 'mov' and two-address
2011 May 24
4
[LLVMdev] Need advice on writing scheduling pass
Hi (Jakob), in reference to the prior message below, I have the following follow-up questions, as I also need a scheduling pass prior to regalloc. I need to do this in order to set VLIW-flags, so that the RA is aware of several MI's per cycle with a redefined LiveRange::overlap-function. On a multiple-issue cycle, a register that gets killed can be reused by another MI - these live ranges do
2011 May 24
0
[LLVMdev] Need advice on writing scheduling pass
On May 24, 2011, at 8:22 AM, Jonas Paulsson wrote: > Hi (Jakob), > > in reference to the prior message below, I have the following follow-up questions, as I also need a scheduling pass > prior to regalloc. I need to do this in order to set VLIW-flags, so that the RA is aware of several MI's > per cycle with a redefined LiveRange::overlap-function. On a multiple-issue cycle, a
2018 Mar 15
5
[RFC] llvm-exegesis: Automatic Measurement of Instruction Latency/Uops
[You can find an easier to read and more complete version of this RFC here <https://docs.google.com/document/d/1QidaJMJUyQdRrFKD66vE1_N55whe0coQ3h1GpFzz27M/edit?ts=5aaa84ee#> .] Knowing instruction scheduling properties (latency, uops) is the basis for all scheduling work done by LLVM. Unfortunately, vendors usually release only partial (and sometimes incorrect) information. Updating the
2014 Jan 28
3
[LLVMdev] New machine model questions
From: Andrew Trick [mailto:atrick at apple.com] Sent: 24 January 2014 21:52 To: Daniel Sanders Cc: LLVM Developers Mailing List (llvmdev at cs.uiuc.edu) Subject: Re: New machine model questions On Jan 24, 2014, at 2:21 AM, Daniel Sanders <Daniel.Sanders at imgtec.com<mailto:Daniel.Sanders at imgtec.com>> wrote: Hi Andrew, I seem to be making good progress on the P5600 scheduler
2013 Sep 12
0
[LLVMdev] [PATCH] Detect Haswell subarchitecture (i.e. using -march=native)
Hi Adam, > OK. I know the reason you cannot reproduce it, before posting > the patch I've decided to check for AVX before checking AVX2, > just not to cpuid AVX2 when we don't have AVX1 anyway. I suspect it was also incompetence on my part. Given the differences I'm seeing now I can't believe there'd be *no* difference in my tests if I'd done them properly.
2013 Nov 23
2
[LLVMdev] [PATCH] Detect Haswell subarchitecture (i.e. using -march=native)
I agree with Tim, you need to implement a GetCpuIDAndInfoEx function in Host.cpp and pass the correct value to ecx. Also you need to verify that 7 is a valid leaf because an invalid leaf is defined to return the highest supported leaf on that processor. So if a processor supports say leaf 6 and not leaf 7, then an access leaf 7 will return the data from leaf 6 causing unrelated bits to be
2016 Feb 18
2
Implement Loop Fusion Pass
Hi all, I have created a patch (up for review at: http://reviews.llvm.org/D17386) that does Loop Fusion implementation. Approach: Legality: Currently it can fuse two adjacent loops whose iteration spaces are same and are at same depth. Dependence legality: Currently, dependence legality cannot be checked across loops. Hence the loops are cloned along a versioned path, unconditionally fused
2015 Sep 29
4
TwoAddressInstructionPass::isProfitableToConv3Addr()
Hi, I have cases of instruction pairs, where one is cheaper 2-address, and the other 3-address. I would like to select the 2-addr instruction during isel, but use the 3-addr instruction to avoid a copy if possible. I find that TwoAddressInstructionPass::isProfitableToConv3Addr() is only checking for the case of a physreg copy, and so leaves the majority of cases as they are (2-address). I
2018 Mar 15
3
[RFC] llvm-exegesis: Automatic Measurement of Instruction Latency/Uops
On Thu, Mar 15, 2018 at 4:41 PM, Hal Finkel via llvm-dev < llvm-dev at lists.llvm.org> wrote: > > On 03/15/2018 10:04 AM, Guillaume Chatelet via llvm-dev wrote: > > [You can find an easier to read and more complete version of this RFC here > <https://docs.google.com/document/d/1QidaJMJUyQdRrFKD66vE1_N55whe0coQ3h1GpFzz27M/edit?ts=5aaa84ee#> > .] > > Knowing
2013 Sep 12
3
[LLVMdev] [PATCH] Detect Haswell subarchitecture (i.e. using -march=native)
> That's far more worrying to me than not being able to detect Haswell. > I can't reproduce the problem here at the moment: both debug and > release builds give identical assembly for Host.cpp. OK. I know the reason you cannot reproduce it, before posting the patch I've decided to check for AVX before checking AVX2, just not to cpuid AVX2 when we don't have AVX1 anyway.
2015 Sep 19
2
AArch64 fmul/fadd fusion
Hi All, Recently I was doing some AArch64 work and noticed some cases where fmuls were not getting fused with fadds. Is there any particular reason that the AArch64 machine combiner doesn't do this like it does for add/mul? I am happy to work up a patch for this, but I wanted to make sure that there wasn't a good reason for it not already being there. FWIW, I see where GCC is doing
2016 Feb 19
3
Implement Loop Fusion Pass
Hi, Thanks for the reply. Few thoughts inlined. On Fri, Feb 19, 2016 at 8:00 AM, Adam Nemet <anemet at apple.com> wrote: > Hi Vikram, > > On Feb 18, 2016, at 9:21 AM, Vikram TV via llvm-dev < > llvm-dev at lists.llvm.org> wrote: > > Hi all, > > I have created a patch (up for review at: http://reviews.llvm.org/D17386) > that does Loop Fusion implementation.
2018 Mar 15
0
[RFC] llvm-exegesis: Automatic Measurement of Instruction Latency/Uops
On 03/15/2018 10:04 AM, Guillaume Chatelet via llvm-dev wrote: > [You can find an easier to read and more complete version of this RFC > here > <https://docs.google.com/document/d/1QidaJMJUyQdRrFKD66vE1_N55whe0coQ3h1GpFzz27M/edit?ts=5aaa84ee#>.] > > Knowing instruction scheduling properties (latency, uops) is the basis > for all scheduling work done by LLVM. > > >
2018 Mar 15
0
[RFC] llvm-exegesis: Automatic Measurement of Instruction Latency/Uops
Sounds like a very useful tool.  Thank you for contributing. Taking a step back and looking at the big picture, combining this with the recently contributed llvm-mca dramatically improves our scheduling and performance analysis story.  Being able to take a snippet of code on a particular machine, measure latency/throughput/ports for each instruction (this tool), and then analyze the entire
2012 Jun 22
4
Search list of elements for a specific pattern
Hi, I have a list of mutations, called "mutList", of the form: > head(mutList) Alu 1 AluJ 2 AluJ/F(R)AM 3 AluJ/FLAM 4 AluJ/FRAM 5 AluJ/monomer 6 AluJb It contains about 500 elements and not all of them contain the sequence "Alu". I tried using this code: Alu<-mutList[which(grep("Alu",mutList)==1)] But that simply returned