similar to: [LLVMdev] [Loop Vectorize] Question on -O3

2013 Jul 02
0
[LLVMdev] [Loop Vectorize] Question on -O3
Hi maxs, On 02/07/13 09:49, maxs wrote: > Hi, > When I use "-loop-vectorize" to vectorize a loop as below: > //==================================== > void bar(float *A, float* B, float K, int start, int end) { > for (int i = start; i < end; ++i) > A[i] *= B[i] + K; > } > //==================================== > First, I use "*clang -O0
2013 Jul 02
2
[LLVMdev] [Loop Vectorize] Question on -O3
On 2 July 2013 12:00, Duncan Sands <baldrick at free.fr> wrote: > all of the advanced LLVM optimizations assume that the IR has already been > cleaned up already by the less advanced optimizers. Try running something > like -sroa -instcombine -simplifycfg first. > There could be a warning on more advanced optimizations if the pre-requisites haven't run (as per individual
2013 Jul 02
0
[LLVMdev] [Loop Vectorize] Question on -O3
Hi Renato, On 02/07/13 13:12, Renato Golin wrote: > On 2 July 2013 12:00, Duncan Sands <baldrick at free.fr> > wrote: > > all of the advanced LLVM optimizations assume that the IR has already been > cleaned up already by the less advanced optimizers. Try running something > like -sroa -instcombine -simplifycfg first. >
2020 Aug 11
2
opt - replicating multiple passes from -O3 -debug-pass=Executions
Hello, I am trying to replicate the output from opt -O3 foo.bc -o foo.opt.bc by specifying the individual passes instead of the -O3 flag. Looking at the passes from opt -O3 foo.bc -o foo.bc -debug-pass=Executions it seems there are two passes being run. When I run the flags indicated for the two passes specified in the 'Pass Arguments:' as two sequential opt processes or a single opt
2015 Jan 17
3
[LLVMdev] loop multiversioning
Does LLVM have loop multiversioning ? it seems it does not with clang++ -O3 -mllvm -debug-pass=Arguments program.c -c bash-4.1$ clang++ -O3 -mllvm -debug-pass=Arguments fast_algorithms.c -c clang-3.6: warning: treating 'c' input as 'c++' when in C++ mode, this behavior is deprecated Pass Arguments: -datalayout -notti -basictti -x86tti -targetlibinfo -no-aa -tbaa -scoped-noalias
2016 May 09
4
Some questions about phase ordering in OPT and LLC
Hi, I'm a PhD student doing phase ordering as part of my PhD topic and I would like to ask some questions about LLVM. Executing the following command to see what passes does OPT execute when targeting a SPARC V8 processor: /opt/clang+llvm-3.7.1-x86_64-linux-gnu-ubuntu-15.10/bin/llvm-as < /dev/null | /opt/clang+llvm-3.7.1-x86_64-linux-gnu-ubuntu-15.10/bin/opt -O3 -march=sparc -mcpu=v8
2015 May 11
2
[LLVMdev] about MemoryDependenceAnalysis usage
add -basicaa to your command line :) On Mon, May 11, 2015 at 7:15 AM, Willy WOLFF <willy.mh.wolff at gmail.com> wrote: > I play a bit more with MemoryDependenceAnalysis by wrapping my pass, and > call explicitely BasicAliasAnalysis. Its still using No Alias Analysis. > > How can I let MemoryDependenceAnalysis use BasicAliasAnalysis? > > Please, find attached my pass. >
2016 May 09
2
Some questions about phase ordering in OPT and LLC
On Mon, May 09, 2016 at 01:07:07PM -0700, Mehdi Amini via llvm-dev wrote: > > > On May 9, 2016, at 10:43 AM, Ricardo Nobre via llvm-dev <llvm-dev at lists.llvm.org> wrote: > > > > Hi, > > > > I'm a PhD student doing phase ordering as part of my PhD topic and I would like to ask some questions about LLVM. > > > > Executing the following
2013 May 23
0
[LLVMdev] LLVM Loop Vectorizer puzzle
Hi, The TinyTripCountVectorThreshold only applies to loops with a known (constant) trip count. If a loop has a trip count below this value we don’t attempt to vectorize the loop. The loop below has an unknown trip count. Once we decide to vectorize a loop, we emit code to check whether we can execute one iteration of the vectorized body. This is the code quoted below. On May 22, 2013, at 10:23
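The runtime check described above can be pictured with a hand-written C sketch. This is only an illustration under stated assumptions, not the code the vectorizer actually emits: the vectorization factor VF = 4 and the function names are hypothetical, and the inner "lane" loop merely stands in for a single vector operation. The loop body is the a[i] += i example from the same thread.

```c
/* Hypothetical vectorization factor; the real value is chosen by the
 * vectorizer's cost model and is an assumption here. */
#define VF 4

/* Scalar reference loop: a[i] += i for i in [0, n). */
void add_index_scalar(int *a, int n) {
    for (int i = 0; i < n; i++)
        a[i] += i;
}

/* Same computation, structured the way a vectorized loop with an
 * unknown trip count is typically laid out: a runtime guard, a
 * blocked main loop, and a scalar remainder loop. */
void add_index_guarded(int *a, int n) {
    int i = 0;

    /* Runtime check: is there at least one full iteration of the
     * vectorized body? If not, skip straight to the scalar loop. */
    if (n >= VF) {
        int vec_end = n - (n % VF); /* largest multiple of VF <= n */
        for (; i < vec_end; i += VF) {
            /* Stand-in for one <VF x i32> operation. */
            for (int lane = 0; lane < VF; lane++)
                a[i + lane] += i + lane;
        }
    }

    /* Scalar remainder: handles n < VF and the leftover iterations. */
    for (; i < n; i++)
        a[i] += i;
}
```

Calling add_index_guarded with n = 3 takes only the scalar path, while n = 11 runs two blocked iterations followed by three remainder iterations; both should match add_index_scalar.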
2013 May 23
2
[LLVMdev] LLVM Loop Vectorizer puzzle
Hi, Just from personal interest, is there a canonical way in IR+metadata to express "This small constant trip-count loop is desired to be converted into a sequence of vector operations directly"? Ie, mapping a 4 element i32 loop into a linear sequence of <4 x i32> operations. Obviously this may not always be a win, but I'm just wondering if there's a way to communicate
2013 May 23
2
[LLVMdev] LLVM Loop Vectorizer puzzle
Hi, I used the LLVM loop vectorizer to compile the following sample: //================= int test(int *a, int n) { for(int i = 0; i < n; i++) { a[i] += i; } return 0; } //================ The corresponding .ll file has a loop preheader: //================ for.body.lr.ph: ; preds = %entry
2013 Jul 17
5
[LLVMdev] IR Passes and TargetTransformInfo: Straw Man
Since introducing the new TargetTransformInfo analysis, there has been some confusion over the role of target heuristics in IR passes. A few patches have led to interesting discussions. To centralize the discussion, until we get some documentation and better APIs in place, let me throw out an oversimplified Straw Man for a new pass pipeline. It serves two purposes: (1) an overdue reorganization of
2015 May 09
2
[LLVMdev] about MemoryDependenceAnalysis usage
Hi, I am trying to use MemoryDependenceAnalysis in a pass to analyse a simple function: void fct (int *restrict*restrict M, int *restrict*restrict L) { S1: M[1][1] = 1; S2: L[2][2] = 2; } When I iterate over MemoryDependenceAnalysis on the S2 statement, I get the load instruction for the first depth of the array, that’s ok. But I also get the load and store for the S1 statement. I assume the
2015 Mar 12
3
[LLVMdev] Question about shouldMergeGEPs in InstructionCombining
I think it would make sense for (1) and (2). I am not sure if (3) is feasible in instcombine. (I am not too familiar with LoopInfo) For the Octasic's Opus platform, I modified shouldMergeGEPs in our fork to: if (GEP.hasAllZeroIndices() && !Src.hasAllZeroIndices() && !Src.hasOneUse()) return false; return Src.hasAllConstantIndices(); // was return false;
2017 Mar 01
5
Any indispensable passes?
Hi everyone, I am currently testing out a combination of IR->IR passes with opt to benchmark how they affect performance. The source code works fine if I simply use clang (-O0/-O3) to compile directly to object files and link them. However, when I use opt with a select set of passes and then use llc to compile them to binary, the compiled binary is wrong. That makes me wonder if there are
2017 Mar 01
2
Any indispensable passes?
On Wed, Mar 1, 2017 at 12:53 PM, John Criswell via llvm-dev < llvm-dev at lists.llvm.org> wrote: > On 3/1/17 2:54 PM, Peizhao Ou via llvm-dev wrote: > > Hi everyone, > > I am currently testing out a combination of IR->IR passes with opt to > benchmark how they affect performance. The source code works fine if simply > use the clang (-O0/-O3) to directly compile to
2016 Aug 25
4
CFLAA
(and sys::cas_flag that STATISTIC uses is a uint32 ...) On Thu, Aug 25, 2016 at 9:54 AM, Daniel Berlin <dberlin at dberlin.org> wrote: > Okay, dumb question: > Are you really getting negative numbers in the second column? > > 526,766 -136 mem2reg # PHI nodes inserted > > http://llvm.org/docs/doxygen/html/PromoteMemoryToRegister_8cpp_source.html >
2015 May 02
5
[LLVMdev] Modifying LoopUnrollingPass
Hi Zhoulai, I am trying to modify "LoopUnrollPass" in LLVM, which produces a number of copies of the loop equal to the loop unroll factor. Currently, on a multicore architecture with, say, 3 cores, the execution goes like this: for 3 cores, if there are 9 iterations of the loop: core instruction: 1: 0,3,6; 2: 1,4,7; 3: 2,5,8. But I want to
2015 Mar 12
2
[LLVMdev] Question about shouldMergeGEPs in InstructionCombining
Hi Mark, It is not clear to me at all that preventing the merging is the right solution. There are a large number of analyses, including alias analysis, and optimizations that use GetUnderlyingObject and related routines to search back through GEPs. They only do this up to some small finite depth (six, IIRC). So reducing the GEP depth is likely the right solution for InstCombine (which has the
2016 Aug 25
2
CFLAA
I gathered aggregate statistics reported by “-stats” over the ~400 test files. The following table summarizes the impact. The first column is the sum where the new analysis is enabled, the second column is the delta from baseline where no CFL alias analysis is performed. I am not experienced enough to know which of these are “good” or “bad” indicators. —david 72,250 685 SLP