thr3ads.net - similar to: "[LLVMdev] [llvm] r187267 - SLP Vectorier: Don't vectorize really short chains because they are already handled by the SelectionDAG store-vectorizer, which does a better job in deciding when to vectorize."

[LLVMdev] [llvm] r187267 - SLP Vectorier: Don't vectorize really short chains because they are already handled by the SelectionDAG store-vectorizer, which does a better job in deciding when to vectorize.

2013 Jul 27

Hi Daniel,
Maybe my commit message was not clear. The idea is that the SelectionDAG store vectorizer can only handle pairs. So, the number three means "more than a pair".
Thanks,
Nadav
Sent from my iPhone

> On Jul 26, 2013, at 17:48, Daniel Berlin <dberlin at dberlin.org> wrote:
>
> Hey Nadav,
> I'd humbly suggest that rather than use 3 directly, you should

2013 Jul 27

Hi Nadav,
Okay.
1. The comment doesn't make this clear. I would suggest, at a minimum, updating it to mention pairs specifically, to avoid the issue in #2.
2. If the day comes when the SelectionDAG store vectorizer handles more than pairs, and does so better, is anyone really going to remember that this random 3 exists in the other vectorizer? I would posit, based on experience, the answer is
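Daniel's suggestion can be sketched in a few lines. This is a hypothetical Python illustration, not code from SLPVectorizer.cpp: the names DAG_STORE_VECTORIZER_MAX_CHAIN, MIN_SLP_CHAIN_LENGTH and should_slp_vectorize are invented here. The point is that the threshold is derived from the pair limit it depends on, so the intent ("more than a pair") is visible in the code and there is one obvious place to update if the SelectionDAG side ever changes.

```python
# Hypothetical names for illustration only; these are not LLVM identifiers.

# Per the thread, the SelectionDAG store vectorizer only combines pairs.
DAG_STORE_VECTORIZER_MAX_CHAIN = 2

# "More than a pair": derived from the value above instead of a bare 3,
# so changing the DAG-side capability also moves the SLP-side threshold.
MIN_SLP_CHAIN_LENGTH = DAG_STORE_VECTORIZER_MAX_CHAIN + 1


def should_slp_vectorize(chain_length: int) -> bool:
    """Leave chains short enough for the SelectionDAG store vectorizer to it."""
    return chain_length >= MIN_SLP_CHAIN_LENGTH


if __name__ == "__main__":
    for n in (2, 3, 4):
        print(n, should_slp_vectorize(n))
```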

Extending SLP Vectorizer to deal with aggregates?

2015 Oct 14

I'm looking for a sanity check on extending SLP Vectorizer to deal with aggregates. I'd like to vectorize Julia tuple operations. The Julia compiler lowers tuples to LLVM arrays, not LLVM vectors. I've tried making Julia lower tuples to LLVM vectors, but that hurt performance when SLP Vectorizer was not applicable, because of extraction/insertion overhead. I.e., the Julia lowering

[LLVMdev] Vectorizing alloca instructions

2013 Oct 24

Hi,
I've been playing around with the SLPVectorizer, trying to get it to vectorize this simple program:

    define void @vector(i32 addrspace(1)* %out, i32 %index) {
    entry:
      %0 = alloca [4 x i32]
      %x = getelementptr [4 x i32]* %0, i32 0, i32 0
      %y = getelementptr [4 x i32]* %0, i32 0, i32 1
      %z = getelementptr [4 x i32]* %0, i32 0, i32 2
      %w = getelementptr [4 x i32]* %0, i32 0, i32 3

[LLVMdev] Vectorizing alloca instructions

2013 Oct 24

Hi Tom,
Thanks for working on this. The SLP vectorizer thinks that %x, %y, %z and %w alias, so it tries to perform 4 scalar store operations (which is a bad idea). We need to figure out why AA thinks that %x and %y may alias. Maybe there is a problem with the code that uses AA.
Thanks,
Nadav

On Oct 24, 2013, at 2:04 PM, Tom Stellard <tom at stellard.net> wrote:
> Hi,
>
>
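Nadav's question is why AA reports that %x and %y may alias. In the example, all four pointers are constant-index GEPs off the same alloca %0, so (assuming a 4-byte i32) they address byte offsets 0, 4, 8 and 12, and 4-byte accesses at those offsets cannot overlap. The toy model below encodes that reasoning. It is a hypothetical sketch, far simpler than LLVM's actual alias analysis, and Location, alias and AliasResult here are invented names, not LLVM's API.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class AliasResult(Enum):
    NO_ALIAS = "NoAlias"
    MAY_ALIAS = "MayAlias"
    MUST_ALIAS = "MustAlias"


@dataclass(frozen=True)
class Location:
    base: str              # underlying object, e.g. the alloca "%0"
    offset: Optional[int]  # constant byte offset from base; None if dynamic
    size: int              # number of bytes accessed


def alias(a: Location, b: Location) -> AliasResult:
    """Toy alias query over (base, constant offset, size) locations."""
    if a.base != b.base or a.offset is None or b.offset is None:
        # This toy model does not reason about distinct objects or
        # variable indices, so it stays conservative.
        return AliasResult.MAY_ALIAS
    if a.offset == b.offset and a.size == b.size:
        return AliasResult.MUST_ALIAS
    # Same base and constant offsets: disjoint byte ranges cannot overlap.
    if a.offset + a.size <= b.offset or b.offset + b.size <= a.offset:
        return AliasResult.NO_ALIAS
    # Partial overlap; treated as "may alias" in this simplified model.
    return AliasResult.MAY_ALIAS


# %x, %y, %z, %w from the example: GEPs into [4 x i32] at indices 0..3.
x, y, z, w = (Location("%0", 4 * i, 4) for i in range(4))
# A GEP using the runtime %index has no constant offset.
dynamic = Location("%0", None, 4)
```

With constant indices every pair among %x, %y, %z and %w comes back NO_ALIAS, which is the answer the vectorizer would need in order to treat the four stores as one consecutive group; only the dynamically indexed access stays MAY_ALIAS.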

[LLVMdev] Modifications to SLP

2015 Jul 07

Hi all!
It takes the current SLP vectorizer too long to vectorize my scalar code. I am talking here about functions that have a single, huge basic block with O(10^6) instructions. Here's an example:

    %0 = getelementptr float* %arg1, i32 49
    %1 = load float* %0
    %2 = getelementptr float* %arg1, i32 4145
    %3 = load float* %2
    %4 = getelementptr float* %arg2, i32 49
    %5 = load
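One reason block size matters for this kind of search: checking every pair of memory accesses for adjacency grows quadratically, which is impractical at O(10^6) instructions. The hypothetical sketch below shows a cheaper way to find candidate chains: bucket (base, index) accesses by base pointer and sort each bucket, which costs O(n log n) overall. It only illustrates the idea; it does not describe how the SLP vectorizer actually collects its seeds, nor the changes proposed in this thread.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def consecutive_runs(
    accesses: List[Tuple[str, int]], min_len: int = 2
) -> Dict[str, List[List[int]]]:
    """Group (base, index) accesses by base pointer and return, per base,
    the runs of consecutive indices with at least min_len elements.
    Sorting each bucket avoids comparing every pair of accesses."""
    by_base: Dict[str, List[int]] = defaultdict(list)
    for base, index in accesses:
        by_base[base].append(index)

    runs: Dict[str, List[List[int]]] = {}
    for base, indices in by_base.items():
        ordered = sorted(set(indices))
        current = [ordered[0]]
        found: List[List[int]] = []
        for idx in ordered[1:]:
            if idx == current[-1] + 1:
                current.append(idx)  # extends the current consecutive run
            else:
                if len(current) >= min_len:
                    found.append(current)
                current = [idx]      # gap: start a new run
        if len(current) >= min_len:
            found.append(current)
        if found:
            runs[base] = found
    return runs


if __name__ == "__main__":
    # Indices 49 and 4145 come from the example above; 50 and 51 are
    # made-up extra accesses so that some consecutive runs exist.
    sample = [("%arg1", 49), ("%arg1", 4145), ("%arg2", 49),
              ("%arg1", 50), ("%arg2", 50), ("%arg2", 51)]
    print(consecutive_runs(sample))
```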

[LLVMdev] Vectorizing alloca instructions

2013 Oct 24

On Thu, Oct 24, 2013 at 2:04 PM, Tom Stellard <tom at stellard.net> wrote:
> Hi,
>
> I've been playing around with the SLPVectorizer trying to get it to
> vectorize this simple program:
>
> define void @vector(i32 addrspace(1)* %out, i32 %index) {
> entry:
> %0 = alloca [4 x i32]
> %x = getelementptr [4 x i32]* %0, i32 0, i32 0
> %y = getelementptr [4

[LLVMdev] How to broaden the SLP vectorizer's search

2014 Aug 07

The BB vectorizer has an option 'bb-vectorizer-search-limit'. Is there a similar option for the SLP vectorizer? Or perhaps an analysis pass whose scope can be widened? I have large basic blocks with instructions that should be merged into packed versions. However, the blocks are optimized independently from each other. Now, if the instructions to be merged aren't too far apart the

[LLVMdev] Extend SLPVectorizer to struct operations that are isomorphic to vector operations?

2014 Apr 17

While playing with SLPVectorizer, I noticed that it will happily vectorize cases involving extractelement/insertelement, but won't vectorize isomorphic cases involving extractvalue/insertvalue (such as the attached example). Is that something that would be straightforward to add to SLPVectorizer, or is there some hard issue? In particular, the transformation would seem to require casts of

GlobalsAA from GVN

2015 Dec 04

> You could, in the LTO pipeline, reinsert GlobalsAA after the SLPVectorizer (not saying you should).
I didn't realise that adding GlobalsAA *after* SLPVectorizer could help. Thanks for this tip.
> There is something fishy. Do you have a test case that reproduces with llvm-lto?
I'm currently looking at a proprietary benchmark. I'll try to extract a simple test case and send it.

GlobalsAA from GVN

2015 Dec 03

Hi Mehdi,
Thank you for the response. I'm actually on an LTO setup and was referring to PassManagerBuilder::addLTOOptimizationPasses. Here, GlobalsAA is scheduled to run well ahead of SLPVectorizer. However, since GlobalsAA is a module pass, it runs once, while a bunch of passes, including SLPVectorizer, are run for each function. When one of them invalidates the analysis, the rest of the functions do

[LLVMdev] SLP vectorizer turned on in commit r190916 which says nothing about it - how to turn it off?

2013 Nov 07

Revision 190916
Commit message: "Lift alignment restrictions for load/store folding on VINSERTF128/VEXTRACTF128. Fixes PR17268."
Actual contents of the commit include:

    Index: tools/opt/opt.cpp
    ===================================================================
    --- tools/opt/opt.cpp (revision 190915)
    +++ tools/opt/opt.cpp (revision 190916)
    @@ -462,6 +462,7 @@

llvm-lit: 2>&1 and FileCheck

2017 Feb 23

Hi all, quite a few tests use the pattern "2>&1 | FileCheck %s". AFAIK how stdout and stderr are merged into a single character stream is undefined and depends e.g. on whether stdout is buffered. I think we are often saved by the fact that standard output is written only at the end of the program and stderr is unbuffered, i.e. always written before stdout. A lot of tests disable
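The ordering problem described above is easy to reproduce. This Python sketch assumes Python 3.9 or later, where a redirected stderr is line-buffered while a redirected stdout is block-buffered. It runs the same child program twice with stderr merged into stdout, the equivalent of `2>&1 |`, and only the buffering mode differs between the runs. The names CHILD and merged_output are illustrative, not part of lit or FileCheck.

```python
import os
import subprocess
import sys

# A tiny child program: one line to stdout, then one line to stderr.
CHILD = "import sys; sys.stdout.write('out\\n'); sys.stderr.write('err\\n')"


def merged_output(unbuffered: bool) -> bytes:
    """Run CHILD with stderr redirected into stdout (like `2>&1 |`)
    and return the merged byte stream."""
    env = dict(os.environ)
    env.pop("PYTHONUNBUFFERED", None)  # let the -u flag alone decide
    cmd = [sys.executable] + (["-u"] if unbuffered else []) + ["-c", CHILD]
    result = subprocess.run(
        cmd, stdout=subprocess.PIPE, stderr=subprocess.STDOUT, env=env, check=True
    )
    # Normalize Windows line endings so the comparison is portable.
    return result.stdout.replace(b"\r\n", b"\n")


if __name__ == "__main__":
    print(merged_output(unbuffered=True))
    print(merged_output(unbuffered=False))
```

With -u both writes reach the pipe in program order, so "out" comes first. Without it, "out" sits in the stdout buffer until the process exits, while the line-buffered stderr write goes out immediately, so "err" comes first. A FileCheck pattern written against one ordering can therefore break when only the buffering changes.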

Vectorization in LLVM x86 backend

2017 Aug 21

Hi all,
Recently I compiled the attached .c file using Clang with the flags "-mavx2 -mfma -m32 -O3". First I used -emit-llvm and inspected the LLVM IR: there are no vector instructions in it. Then I got the assembly output of the file, and in it I can clearly see vector instructions. However, neither the SLPVectorizer nor the LoopVectorizer is doing any vectorization (also

[LLVMdev] Improving SLPVectorizer for Julia

2014 Mar 17

I'm working on some small improvements to SLPVectorizer.cpp so that it can deal with some tuple operations arising from Julia code. Being fairly new to LLVM, I could use some advice, particularly from those familiar with the internals of SLPVectorizer. The motivation can be found in the Julia discussion https://github.com/JuliaLang/julia/issues/5857. Here is an example of the kind of LLVM

Vector evolution?

2020 Sep 01

On Tue, Sep 1, 2020 at 5:10 PM Florian Hahn <florian_hahn at apple.com> wrote:
> The loop vectorizer does not really handle loops that already operate on vectors, so that is why the loop using v4f32 does not get widened.
>
> Arguably the user explicitly asked for 4xfloat vectors in the v4f32 version, so that is what gets generated.
In my case I have tons of legacy code written for

RFC phantom memory intrinsic

2017 Sep 12

Hi,
For the PR21780 solution, I plan to add new functionality to restore memory operations that were once deleted. In this particular case, these are the load operations that were deleted by InstCombine. Please note that once a load has been removed, there is no way to restore it, and that prevents us from vectorizing the shuffle operation. There are probably more similar issues where this approach could

Vectorization in LLVM x86 backend

2017 Aug 21

I isolated the LLVM IR and the X86 instructions emitted for the function; they are attached herewith, and the backend is clearly emitting vector instructions. I am having a hard time figuring out where the vector instructions are formed. SLP and the Loop vectorizer are definitely not doing anything.

On Mon, Aug 21, 2017 at 11:56 AM, Craig Topper <craig.topper at gmail.com> wrote:
> The X86 backend

[RFC][SLP] Let's turn -slp-vectorize-hor on by default

2015 Nov 09

I've done compile-time experiments for AArch64 over SPEC{2000,2006} and, of course, the test-suite. I measure no significant compile-time impact from enabling this feature by default. I also ran the test-suite on an X86-64 machine. Having tested both AArch64 and X86-64, I can't imagine any other target being uniquely affected in terms of compile time by turning this on. I also timed running

[LLVMdev] Vectorizing alloca instructions

2013 Oct 24

On Oct 24, 2013, at 3:00 PM, Chandler Carruth <chandlerc at google.com> wrote:
> Just a note, I don't think you should or need to vectorize the actual alloca stuff. If you can simply transform the dynamically indexed load:
>
> Then running SROA and InstCombine will mop up the rest. So it's mostly about getting the SLPVectorizer to handle the dynamic GEP. As soon as it does
