thr3ads.net - similar to: "[LLVMdev] [Loop Vectorize] Question on -O3"

[LLVMdev] [Loop Vectorize] Question on -O3

2013 Jul 02

Hi maxs,

On 02/07/13 09:49, maxs wrote:
> Hi,
> When I use "-loop-vectorize" to vectorize a loop as below:
> //====================================
> void bar(float *A, float* B, float K, int start, int end) {
>   for (int i = start; i < end; ++i)
>     A[i] *= B[i] + K;
> }
> //====================================
> First, I use "clang -O0

[LLVMdev] [Loop Vectorize] Question on -O3

2013 Jul 02

On 2 July 2013 12:00, Duncan Sands <baldrick at free.fr> wrote: > all of the advanced LLVM optimizations assume that the IR has already been > cleaned up already by the less advanced optimizers. Try running something > like -sroa -instcombine -simplifycfg first. > There could be a warning on more advanced optimizations if the pre-requisites haven't run (as per individual

[LLVMdev] [Loop Vectorize] Question on -O3

2013 Jul 02

Hi Renato, On 02/07/13 13:12, Renato Golin wrote: > On 2 July 2013 12:00, Duncan Sands <baldrick at free.fr> > wrote: > > all of the advanced LLVM optimizations assume that the IR has already been > cleaned up already by the less advanced optimizers. Try running something > like -sroa -instcombine -simplifycfg first. >

opt - replicating multiple passes from -O3 -debug-pass=Executions

2020 Aug 11

Hello, I am trying to replicate the output of opt -O3 foo.bc -o foo.opt.bc by specifying the individual passes instead of the -O3 flag. Looking at the passes from opt -O3 foo.bc -o foo.bc -debug-pass=Executions, it seems there are two passes being run. When I run the flags listed for the two passes under 'Pass Arguments:' as two sequential opt processes or a single opt

[LLVMdev] loop multiversioning

2015 Jan 17

Does LLVM have loop multiversioning? It seems it does not, judging from clang++ -O3 -mllvm -debug-pass=Arguments program.c -c:

bash-4.1$ clang++ -O3 -mllvm -debug-pass=Arguments fast_algorithms.c -c
clang-3.6: warning: treating 'c' input as 'c++' when in C++ mode, this behavior is deprecated
Pass Arguments: -datalayout -notti -basictti -x86tti -targetlibinfo -no-aa -tbaa -scoped-noalias
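Loop multiversioning means emitting more than one version of a loop and choosing between them at run time with a cheap check. The C below is a hand-written illustration of the idea, not a description of any LLVM pass: it reuses the loop body from the first thread (`A[i] *= B[i] + K`), and the names `scale_add`, `scale_add_noalias`, `scale_add_scalar` and `ranges_disjoint` are hypothetical. The overlap test compares addresses as integers, which is an assumption that holds on common flat-memory targets.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Scalar version: correct even when A and B overlap. */
static void scale_add_scalar(float *A, const float *B, float K, size_t n) {
    for (size_t i = 0; i < n; ++i)
        A[i] *= B[i] + K;
}

/* Version that promises no aliasing via restrict, so a compiler may vectorize it. */
static void scale_add_noalias(float *restrict A, const float *restrict B,
                              float K, size_t n) {
    for (size_t i = 0; i < n; ++i)
        A[i] *= B[i] + K;
}

/* Nonzero if [A, A+n) and [B, B+n) do not overlap (integer address comparison). */
static int ranges_disjoint(const float *A, const float *B, size_t n) {
    uintptr_t a = (uintptr_t)A;
    uintptr_t b = (uintptr_t)B;
    uintptr_t bytes = (uintptr_t)(n * sizeof(float));
    return a + bytes <= b || b + bytes <= a;
}

/* Multiversioned entry point: the run-time check picks which loop version runs. */
void scale_add(float *A, const float *B, float K, size_t n) {
    assert(n == 0 || (A != NULL && B != NULL));
    if (ranges_disjoint(A, B, n))
        scale_add_noalias(A, B, K, n);
    else
        scale_add_scalar(A, B, K, n);
}
```

The `restrict` qualifiers are what let a compiler treat `A` and `B` as independent in the fast version; the scalar fallback keeps `scale_add` correct when the ranges do overlap.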

Some questions about phase ordering in OPT and LLC

2016 May 09

Hi, I'm a PhD student working on phase ordering as part of my PhD topic, and I would like to ask some questions about LLVM. I executed the following command to see which passes OPT runs when targeting a SPARC V8 processor:

/opt/clang+llvm-3.7.1-x86_64-linux-gnu-ubuntu-15.10/bin/llvm-as < /dev/null | /opt/clang+llvm-3.7.1-x86_64-linux-gnu-ubuntu-15.10/bin/opt -O3 -march=sparc -mcpu=v8

[LLVMdev] about MemoryDependenceAnalysis usage

2015 May 11

Add -basicaa to your command line :)

On Mon, May 11, 2015 at 7:15 AM, Willy WOLFF <willy.mh.wolff at gmail.com> wrote:
> I played a bit more with MemoryDependenceAnalysis by wrapping my pass and
> calling BasicAliasAnalysis explicitly. It's still using No Alias Analysis.
>
> How can I make MemoryDependenceAnalysis use BasicAliasAnalysis?
>
> Please find my pass attached.

Some questions about phase ordering in OPT and LLC

2016 May 09

On Mon, May 09, 2016 at 01:07:07PM -0700, Mehdi Amini via llvm-dev wrote: > > > On May 9, 2016, at 10:43 AM, Ricardo Nobre via llvm-dev <llvm-dev at lists.llvm.org> wrote: > > > > Hi, > > > > I'm a PhD student doing phase ordering as part of my PhD topic and I would like to ask some questions about LLVM. > > > > Executing the following

[LLVMdev] LLVM Loop Vectorizer puzzle

2013 May 23

Hi, The TinyTripCountVectorThreshold only applies to loops with a known (constant) trip count. If a loop has a trip count below this value, we don’t attempt to vectorize the loop. The loop below has an unknown trip count. Once we decide to vectorize a loop, we emit code to check whether we can execute at least one iteration of the vectorized body. This is the code quoted below. On May 22, 2013, at 10:23
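A rough C model of the shape described here, assuming a vectorization factor of 4 and no interleaving (both are assumptions, not values from the thread): a guard checks that at least one full vector iteration fits, the vector body covers the largest multiple of 4, and a scalar remainder loop handles the rest. The loop body `a[i] += i` is taken from the original question in this thread; the name `test_vectorized_model` and the `VF` macro are hypothetical, and the inner `lane` loop stands in for a single vector instruction.

```c
#include <assert.h>
#include <stddef.h>

#define VF 4 /* assumed vectorization factor, not taken from LLVM */

/* Hypothetical model of a vectorized  for (i = 0; i < n; i++) a[i] += i;  */
int test_vectorized_model(int *a, int n) {
    assert(n <= 0 || a != NULL);
    int i = 0;
    /* Guard: enter the vector body only if one full VF-wide iteration fits. */
    if (n >= VF) {
        int n_vec = n - n % VF; /* largest multiple of VF not exceeding n */
        for (; i < n_vec; i += VF) {
            /* One "vector" iteration: the VF lanes stand for a single vector add. */
            for (int lane = 0; lane < VF; ++lane)
                a[i + lane] += i + lane;
        }
    }
    /* Scalar remainder: the last n % VF iterations, or all n if the guard failed. */
    for (; i < n; ++i)
        a[i] += i;
    return 0;
}
```

With an unknown `n`, the guard is the only thing standing between a short input and an out-of-bounds vector access, which is why it is emitted even though a constant-trip-count threshold does not apply.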

[LLVMdev] LLVM Loop Vectorizer puzzle

2013 May 23

Hi, just out of personal interest: is there a canonical way in IR+metadata to express "This small constant trip-count loop is desired to be converted into a sequence of vector operations directly"? I.e., mapping a 4-element i32 loop into a linear sequence of <4 x i32> operations. Obviously this may not always be a win, but I'm just wondering if there's a way to communicate
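The snippet leaves the metadata question open. Purely as an illustration of the intended result, the C below writes the same 4-element i32 addition twice: once as a constant trip-count loop and once with the GCC/Clang `vector_size` extension, whose 16-byte `int32_t` vector type corresponds to `<4 x i32>` in LLVM IR. This is a source-level compiler extension, not IR metadata, and the names `add4_loop`, `add4_vector` and `v4i32` are hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* GCC/Clang vector extension: four 32-bit lanes, i.e. <4 x i32> in LLVM IR. */
typedef int32_t v4i32 __attribute__((vector_size(16)));
_Static_assert(sizeof(v4i32) == 4 * sizeof(int32_t), "v4i32 holds four int32_t lanes");

/* Scalar form: a loop with a small constant trip count of 4. */
void add4_loop(int32_t *dst, const int32_t *x, const int32_t *y) {
    for (int i = 0; i < 4; ++i)
        dst[i] = x[i] + y[i];
}

/* The same computation written directly as one <4 x i32> add. */
void add4_vector(int32_t *dst, const int32_t *x, const int32_t *y) {
    v4i32 vx, vy;
    memcpy(&vx, x, sizeof vx);   /* load without assuming 16-byte alignment */
    memcpy(&vy, y, sizeof vy);
    v4i32 vr = vx + vy;          /* element-wise add of all four lanes */
    memcpy(dst, &vr, sizeof vr); /* store the result */
}
```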

[LLVMdev] LLVM Loop Vectorizer puzzle

2013 May 23

Hi, I used the LLVM loop vectorizer to compile the following sample:

//=================
int test(int *a, int n) {
  for (int i = 0; i < n; i++) {
    a[i] += i;
  }
  return 0;
}
//================

The corresponding .ll file has a loop preheader:

//================
for.body.lr.ph:  ; preds = %entry

[LLVMdev] IR Passes and TargetTransformInfo: Straw Man

2013 Jul 17

Since introducing the new TargetTransformInfo analysis, there has been some confusion over the role of target heuristics in IR passes. A few patches have led to interesting discussions. To centralize the discussion, until we get some documentation and better APIs in place, let me throw out an oversimplified Straw Man for a new pass pipeline. It serves two purposes: (1) an overdue reorganization of

[LLVMdev] about MemoryDependenceAnalysis usage

2015 May 09

Hi, I am trying to use MemoryDependenceAnalysis in a pass to analyse a simple function:

void fct (int *restrict*restrict M, int *restrict*restrict L) {
S1: M[1][1] = 1;
S2: L[2][2] = 2;
}

When I iterate over MemoryDependenceAnalysis on the S2 statement, I get the load instruction for the first level of the array, which is expected. But I also get the load and store for the S1 statement. I assume the

[LLVMdev] Question about shouldMergeGEPs in InstructionCombining

2015 Mar 12

I think it would make sense for (1) and (2). I am not sure if (3) is feasible in instcombine (I am not too familiar with LoopInfo). For Octasic's Opus platform, I modified shouldMergeGEPs in our fork to:

if (GEP.hasAllZeroIndices() && !Src.hasAllZeroIndices() && !Src.hasOneUse())
  return false;
return Src.hasAllConstantIndices(); // was return false;

Any indispensable passes?

2017 Mar 01

Hi everyone, I am currently testing a combination of IR->IR passes with opt to benchmark how they affect performance. The source code works fine if I simply use clang (-O0/-O3) to compile directly to object files and link them. However, when I use opt with a selected set of passes and then use llc to compile the result to a binary, the compiled binary is wrong. That makes me wonder if there are

Any indispensable passes?

2017 Mar 01

On Wed, Mar 1, 2017 at 12:53 PM, John Criswell via llvm-dev < llvm-dev at lists.llvm.org> wrote: > On 3/1/17 2:54 PM, Peizhao Ou via llvm-dev wrote: > > Hi everyone, > > I am currently testing out a combination of IR->IR passes with opt to > benchmark how they affect performance. The source code works fine if simply > use the clang (-O0/-O3) to directly compile to

CFLAA

2016 Aug 25

(and sys::cas_flag that STATISTIC uses is a uint32 ...) On Thu, Aug 25, 2016 at 9:54 AM, Daniel Berlin <dberlin at dberlin.org> wrote: > Okay, dumb question: > Are you really getting negative numbers in the second column? > > 526,766 -136 mem2reg # PHI nodes inserted > > http://llvm.org/docs/doxygen/html/PromoteMemoryToRegister_8cpp_source.html >

[LLVMdev] Modifying LoopUnrollingPass

2015 May 02

Hi Zhoulai, I am trying to modify "LoopUnrollPass" in LLVM, which produces as many copies of the loop body as the loop unroll factor. Currently, on a multicore architecture with, say, 3 cores, if there are 9 loop iterations, the execution goes like this:

core  instruction
1     0, 3, 6
2     1, 4, 7
3     2, 5, 8

But I want to
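The current schedule described above is a cyclic (round-robin) distribution: iteration i runs on core (i mod 3) + 1, so core 1 gets 0, 3, 6. A small C sketch of that mapping follows, with hypothetical helper names; the scheme the poster actually wants is cut off in the snippet, so only the existing one is modelled.

```c
#include <assert.h>

/* Hypothetical helper: the 1-based core that runs iteration `iter`
   under a cyclic (round-robin) distribution over `ncores` cores. */
int cyclic_core(int iter, int ncores) {
    assert(ncores > 0 && iter >= 0);
    return iter % ncores + 1;
}

/* Hypothetical helper: write the iterations assigned to 1-based `core`
   into out[] (caller provides enough room) and return how many there are. */
int cyclic_iterations(int core, int ncores, int niters, int *out) {
    assert(ncores > 0 && core >= 1 && core <= ncores);
    int count = 0;
    for (int i = core - 1; i < niters; i += ncores)
        out[count++] = i;
    return count;
}
```

For 9 iterations on 3 cores, `cyclic_iterations(2, 3, 9, buf)` fills `buf` with 1, 4, 7, matching the second row of the table.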

[LLVMdev] Question about shouldMergeGEPs in InstructionCombining

2015 Mar 12

Hi Mark, It is not clear to me at all that preventing the merging is the right solution. There are a large number of analyses, including alias analysis, and optimizations that use GetUnderlyingObject and related routines to search back through GEPs. They only do this up to some small finite depth (six, IIRC). So reducing the GEP depth is likely the right solution for InstCombine (which has the
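A toy C model of the depth limit mentioned here: follow the pointer operand back through a chain of GEPs, but stop after a fixed number of steps. The `toy_value` struct and `toy_underlying_object` function are hypothetical stand-ins, not LLVM's `Value` or `GetUnderlyingObject` API; the limit is a parameter, and 6 matches the figure recalled in the message. When the chain is deeper than the limit, the walk stops at an intermediate GEP instead of the base object, which is why shallower GEP chains help these analyses.

```c
#include <assert.h>
#include <stddef.h>

/* Toy IR value: either a GEP with a pointer operand, or a base object.
   Hypothetical model, not LLVM's Value/GEPOperator classes. */
struct toy_value {
    int is_gep;
    const struct toy_value *ptr_operand; /* meaningful only when is_gep != 0 */
};

/* Strip GEPs one at a time, giving up after max_lookup steps. */
const struct toy_value *toy_underlying_object(const struct toy_value *v,
                                              unsigned max_lookup) {
    assert(v != NULL);
    for (unsigned count = 0; count < max_lookup && v->is_gep; ++count)
        v = v->ptr_operand;
    return v;
}
```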

CFLAA

2016 Aug 25

I gathered aggregate statistics reported by “-stats” over the ~400 test files. The following table summarizes the impact. The first column is the sum when the new analysis is enabled; the second column is the delta from the baseline, where no CFL alias analysis is performed. I am not experienced enough to know which of these are “good” or “bad” indicators.

-david

72,250   685   SLP
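Reading the table takes one step of arithmetic: a delta of -136 next to a sum of 526,766 (the mem2reg "# PHI nodes inserted" row quoted in the reply above) means the baseline sum was 526,902. The C sketch below shows one way such a table could be computed, assuming 32-bit unsigned per-file counters (the reply above notes that `sys::cas_flag` is a uint32): totals are accumulated in 64 bits and the delta is a signed difference. The helper names are hypothetical, and the last function shows what happens if the subtraction is done in uint32 arithmetic instead.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Sum one counter across all test files in 64 bits, so the total
   cannot wrap even if it exceeds UINT32_MAX. */
uint64_t stat_sum(const uint32_t *per_file, size_t nfiles) {
    uint64_t total = 0;
    for (size_t i = 0; i < nfiles; ++i)
        total += per_file[i];
    return total;
}

/* Delta column: new-run total minus baseline total, as a signed value. */
int64_t stat_delta(uint64_t sum_new, uint64_t sum_base) {
    assert(sum_new <= INT64_MAX && sum_base <= INT64_MAX);
    return (int64_t)sum_new - (int64_t)sum_base;
}

/* The same subtraction in uint32 arithmetic: a decrease wraps around
   to a large positive number instead of becoming negative. */
uint32_t stat_delta_u32(uint32_t sum_new, uint32_t sum_base) {
    return sum_new - sum_base;
}
```

So a genuinely negative second column is plausible when the delta is computed as a signed difference, while a uint32 subtraction would have shown 4294967160 rather than -136.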
