similar to: Loop Unrolling Fail in Simple Vectorized loop

Displaying 20 results from an estimated 900 matches similar to: "Loop Unrolling Fail in Simple Vectorized loop"

2016 Oct 13
2
Loop Unrolling Fail in Simple Vectorized loop
Thanks for the explanation. But I am a little confused with the following fact. Can't LLVM keep vectorizable_elements as a symbolic value and convert the loop to say; for(unsigned i = 0; i < vectorizable_elements ; i += 2){ //main loop } for(unsigned i=0 ; i < vectorizable_elements % 2; i++){ //fix up } Why does it have to reason about the range of vectorizable_elements? Even
2016 Oct 13
2
Loop Unrolling Fail in Simple Vectorized loop
If count > MAX_UINT-4 your loop loops indefinitely with an increment of 4, I think. On Thu, Oct 13, 2016 at 4:42 PM, Charith Mendis via llvm-dev < llvm-dev at lists.llvm.org> wrote: > So, I tried unrolling the following simple loop. > > int unroll(unsigned * a, unsigned * b, unsigned *c, unsigned count){ > > for(unsigned i=0; i<count; i++){ > > a[i] =
2018 Aug 09
3
Legacy Loop Pass Manager question
Hi, If we add multiple loop passes to the pass manager in PassManagerBuilder.cpp consecutively without any func/module pass in between, I used to think they would belong to the same loop pass manager. But it does not seem to be the case. For example for this code snippet PM.add(createIndVarSimplifyPass()); // Canonicalize indvars MPM.add(createLoopIdiomPass()); //
2016 Sep 03
2
llc error
I updated to the latest revision and now llvm does not build and quits cmake with CMake Error at cmake/modules/LLVMProcessSources.cmake:83 (message): Found unknown source file ../llvm-revec/lib/CodeGen/MachineFunctionAnalysis.cpp Please update ../llvm-revec/lib/CodeGen/CMakeLists.txt Thanks On Sat, Sep 3, 2016 at 2:09 AM, Craig Topper <craig.topper at gmail.com> wrote: >
2017 Aug 21
2
Vectorization in LLVM x86 backend
I isolated the LLVM IR and the X86 instructions emitted for the function and are attached herewith and it is clearly emitting vector instructions. I am having a hard time figuring out where the vector instructions are formulated. For sure SLP and Loop vectorizer is not doing anything. On Mon, Aug 21, 2017 at 11:56 AM, Craig Topper <craig.topper at gmail.com> wrote: > The X86 backend
2016 Sep 03
4
llc error
Hi all, The attached LLVM assembly file fails to generate x86 code when compiled using llc. compilation command - ../llvm-build/bin/llc -filetype=asm -march=x86-64 -mcpu=core-avx2 ex4.ll The error message is, LLVM ERROR: Cannot select: t95: v8f32 = X86ISD::SUBV_BROADCAST t17 t17: v4f32,ch = load<LD16[%scevgep](tbaa=<0x4dbcd98>)> t0, t16, undef:i64 t16: i64 = add t2,
2017 Oct 03
2
Changing Alignment of global variables in LLVM
If I know for sure I am accessing 32 byte chunks at a time, how can I go about changing the alignment of @u? Should I use DataLayout's reset method? I couldn't find a method to change alignment of one global variable. Thanks On Tue, Oct 3, 2017 at 6:34 PM, Matthias Braun <mbraun at apple.com> wrote: > The effective alignment is part of the load and store operations. Updating
2017 Oct 03
2
Changing Alignment of global variables in LLVM
What is the best way to change the alignment of global variables and allocated structures in LLVM during one of its optimization passes? For example, I want to change, @u = internal unnamed_addr global [5 x [65 x [65 x [65 x double]]]] zeroinitializer, align 16 to align to 32 bytes. How can this be accomplished so that all other references in the code accessing this structure are also
2017 Aug 21
2
Vectorization in LLVM x86 backend
Hi all, Recently I compiled the attached .c file using Clang with "-mavx2 -mfma -m32 -O3" optimization flags. First I used -emit-llvm and inspected the LLVM IR and there are no vector instructions. Then I got the assembly output of the file in it I can clearly see vector instructions in it. Neither the SLPVectorizer or the LoopVectorizer is however doing any vectorization (also
2016 Oct 04
2
Getting the symbolic expression for an address calculation
How do you generate a SCEVAddRecExpr from a SCEV? It tried dyn_casting and it seems like that the SCEV returned by getSCEV is not a SCEVAddRecExpr. Thanks On Fri, Sep 30, 2016 at 4:16 PM, Friedman, Eli <efriedma at codeaurora.org> wrote: > On 9/30/2016 12:16 PM, Charith Mendis via llvm-dev wrote: > >> >> Hi all, >> >> What is the best way to get the symbolic
2016 Sep 30
2
Getting the symbolic expression for an address calculation
Hi all, What is the best way to get the symbolic expression for an address calculation in llvm specially when memory addresses are calculated within a loop. Use case: I want to know what loop induction variables are used for a particular address calculation and in what symbolic context. Thereby, I want to identify which stores and loads will be contiguous in memory if I unroll each of the
2016 Oct 28
1
Vector Shuffle chain lowering to X86 instructions simplification inconsistencies
Hi all, Attached herewith is a fairly simple LLVM file (shuffle.ll) with lots of vector shuffles. When I use llc with -O3 -mcpu=core-avx2 the first shuffle sequence containing types of 128 wide gets reduced a single shuffle, where as the second shuffle sequence containing types of 256 wide doesn't get reduced to a single shuffle instruction in the resulting X86 code (Shuffle.s attached).
2017 Dec 21
2
Pass ordering - GVN vs. loop optimizations
Hi, This is Ariel from the Rust team again. I am having another pass ordering issue. Looking at the pass manager at https://github.com/llvm-mirror/llvm/blob/7034870f30320d6fbc74effff539d946018cd00a/lib/Transforms/IPO/PassManagerBuilder.cpp (the early SimplifyCfg now doesn't sink stores anymore! I can't wait until I can get to use that in rustc!) I find that the loop optimization group
2013 Apr 17
0
[LLVMdev] [polly] pass ordering
On 04/17/2013 09:04 PM, Sebastian Pop wrote: > Tobias Grosser wrote: >> As said before, we could probably add it in between those two passes: >> >> MPM.add(createReassociatePass()); // Reassociate expressions >> + addExtensionsToPM(EP_LoopOptimizerStart, MPM); >> MPM.add(createLoopRotatePass()); // Rotate Loop > > As this is in the middle of other
2013 Sep 25
0
[LLVMdev] [Polly] Move Polly's execution later
Here is an update about moving Polly later. 1. Why does Polly generate incorrect code when we move Polly immediately after the loop rotating pass? It is mainly caused by a wrong polly merge block. When Polly detects a valid loop for Polyhedral transformations, it usually introduces a new basic block "polly.merge_new_and_old" after the original loop exit block. This new basic block
2013 Jul 18
3
[LLVMdev] IR Passes and TargetTransformInfo: Straw Man
Andy and I briefly discussed this the other day, we have not yet got chance to list a detailed pass order for the pre- and post- IPO scalar optimizations. This is wish-list in our mind: pre-IPO: based on the ordering he propose, get rid of the inlining (or just inline tiny func), get rid of all loop xforms... post-IPO: get rid of inlining, or maybe we still need it, only
2013 Sep 25
3
[LLVMdev] [Polly] Move Polly's execution later
On 09/25/2013 04:55 AM, Star Tan wrote: > Here is an update about moving Polly later. Hi star tan, thanks for your report. > > 1. Why does Polly generate incorrect code when we move Polly immediately after the loop rotating pass? > > It is mainly caused by a wrong polly merge block. When Polly detects a valid loop for Polyhedral transformations, it usually introduces a new basic
2013 Apr 17
2
[LLVMdev] [polly] pass ordering
Tobias Grosser wrote: > As said before, we could probably add it in between those two passes: > > MPM.add(createReassociatePass()); // Reassociate expressions > + addExtensionsToPM(EP_LoopOptimizerStart, MPM); > MPM.add(createLoopRotatePass()); // Rotate Loop As this is in the middle of other LNO passes, can you please rename s/EP_LoopOptimizerStart/EP_Polly_LNO/ or
2015 Dec 20
3
How to run InternalizePass
I'm working on a whole program optimizer that uses LLVM as a library, and one of the things I want to do is eliminate dead global functions and variables even when they are not local to a module. (This understandably doesn't happen by default because the optimizer has to assume it could be compiling a library rather than a program.) I've actually written a function to do this, but
2013 Jan 31
1
The way Puppet installs things fail
Basically, the way puppet installs things things with /usr/bin/apt-get -q -y -o DPkg::Options::=--force-confold install <package_name> fails due to authentication WARNING: The following packages cannot be authenticated But if I do it from an ssh console normally, using apt-get install <package_name> it works fine without issues. Is there a way to change how puppet uses the