thr3ads.net - similar to: "[LLVMdev] How to unroll reduction loop with caching accumulator on register?"

Displaying 20 results from an estimated 100 matches similar to: "[LLVMdev] How to unroll reduction loop with caching accumulator on register?"

[LLVMdev] How to unroll reduction loop with caching accumulator on register?

2013 Mar 11

I tried to manually assign each of 3 arrays a unique TBAA node. But it does not seem to help: alias analysis still considers arrays as may-alias, which most likely prevents the desired optimization. Below is the sample code with TBAA metadata inserted. Could you please suggest what might be wrong with it? Many thanks, - D. marcusmae at M17xR4:~/forge/llvm$ opt -time-passes -enable-tbaa -tbaa
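The thread's actual kernel is not shown in this excerpt, so the following is only a hypothetical C sketch of the kind of optimization the subject line asks about. The function names and the use of `restrict` are assumptions made for illustration. If the compiler cannot prove that `out` does not alias `a` or `b`, it may have to load and store `*out` on every iteration; with a local accumulator the running sum can stay in a register.

```c
#include <assert.h>
#include <stddef.h>

/* Accumulates directly through the output pointer. If *out may alias
   a[] or b[], the compiler may need a load and a store per iteration. */
void dot_in_memory(const float *a, const float *b, float *out, size_t n) {
    assert(a && b && out);
    *out = 0.0f;
    for (size_t i = 0; i < n; i++)
        *out += a[i] * b[i];
}

/* Keeps the running sum in a local variable and writes it back once.
   restrict is the caller's promise that the three buffers do not overlap. */
void dot_in_register(const float *restrict a, const float *restrict b,
                     float *restrict out, size_t n) {
    assert(a && b && out);
    float acc = 0.0f;
    for (size_t i = 0; i < n; i++)
        acc += a[i] * b[i];
    *out = acc;
}
```

Both functions compute the same result when the buffers really are distinct; the difference is only in what the optimizer is allowed to assume.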

[LLVMdev] Interesting post increment situation in DAG combiner

2013 Mar 01

Hal, (and everyone who might care about post increment generation)... I have an interesting question/observation. Consider this vector loop. void vec_add_const(unsigned N, short __attribute__ ((aligned (16))) *A, short __attribute__ ((aligned (16))) val) { unsigned i,j; for (i=0; i<N; i++) { for (j=0; j<N; j++) { A[i*N+j] += val; } } } The

[LLVMdev] why LoopUnswitch pass does not constant fold conditional branch and merge blocks

2015 Jul 16

Hi, I have a general question on LoopUnswitch pass. Consider the following IR snippet: define i32 @test(i1 %cond) { br label %loop_begin loop_begin: br i1 %cond, label %loop_body, label %loop_exit loop_body: br label %do_something do_something: call void @some_func() noreturn nounwind br label %loop_begin loop_exit: ret i32 0 } declare void @some_func() noreturn After running

[LLVMdev] Interesting post increment situation in DAG combiner

2013 Mar 01

----- Original Message ----- > From: "Sergei Larin" <slarin at codeaurora.org> > To: "Hal Finkel" <hfinkel at anl.gov> > Cc: llvmdev at cs.uiuc.edu > Sent: Friday, March 1, 2013 10:24:39 AM > Subject: Interesting post increment situation in DAG combiner > > Hal, (and everyone who might care about post increment generation)... Sergei, Perhaps

[LLVMdev] Interesting post increment situation in DAG combiner

2013 Mar 01

Hal, Here is my patch for the post inc case. I think it is symmetrically applicable to the pre-inc, but I have not tested it for that. I think you can clearly see my intent here - I simply select the "latest" candidate when multiple are available. Who else might be interested in this? Sergei --- Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, hosted by The

[LLVMdev] parallel loop metadata simplification

2013 Mar 01

----- Original Message ----- > From: "Paul Redmond" <paul.redmond at intel.com> > To: "llvmdev at cs.uiuc.edu Dev" <llvmdev at cs.uiuc.edu> > Sent: Thursday, February 28, 2013 1:30:57 PM > Subject: [LLVMdev] parallel loop metadata simplification > > Hi, > > I've been working on clang codegen for #pragma ivdep and creating the >

[LLVMdev] parallel loop metadata simplification

2013 Feb 28

Hi, I've been working on clang codegen for #pragma ivdep and creating the llvm.mem.parallel_loop_access metadata seems quite difficult. The main problem is that there are so many places where loads and stores are created and all of them need to be changed when emitting a parallel loop. Note that creating llvm.loop.parallel is not a problem. One option is to modify IRBuilder to enable

[LLVMdev] Unnatural loops with O0

2008 Jun 11

On Thursday 08 May 2008 18:33:48 Adrian Prantl wrote: > we noticed that llvmgcc4.2-2.2 sometimes generates non-natural loops > when compiling to bytecode without any optimizations. Apparently what > happens is that the loop header is duplicated, which results in two > entry points for the loop. This is actually a problem with the tail duplication pass of LLVM. It does not consider

[LLVMdev] Unnatural loops with O0

2008 May 08

Hello everybody, we noticed that llvmgcc4.2-2.2 sometimes generates non-natural loops when compiling to bytecode without any optimizations. Apparently what happens is that the loop header is duplicated, which results in two entry points for the loop. Since this could obstruct subsequent loop optimizations, it might be interesting to further investigate this behavior. To show the problem, I have

Information about the number of indices in memory accesses

2020 Sep 23

Hi all, For loads and stores I want to extract information about the number of indices accessed. For instance: struct S {int X, int *Y}; __global__ void kernel(int *A, int **B, struct S) { int x = A[..][..]; // -> L: A[..][..] int y = *B[2]; // -> L: B[0][2] int z = S.y[..]; // -> L: S.1[..] // etc.. } I am performing some preprocessing on IR to: 1. Move constant

Legal names for Functions and other Identifiers

2017 Jun 22

Thanks for the heads up, Philip! I did come across a strange case where LLVM allowed "%" to be a part of a function's name. This was in the context of my patch https://reviews.llvm.org/D33985, where I prefix the name of the source function and the Scop (a special kind of Region that Polly can optimize; the name of the Scop is the name of the Region) to the name of the PTX kernel

Information about the number of indices in memory accesses

2020 Oct 03

Hi Ees, SCEV Delinearization is the closest I know. But it has its problems. Well, for one, your expression should be SCEVable. But more importantly, SCEV Delinearization is trying to deduce something that is high-level (actually source-level) from a low-level IR in which a lot of this info has been lost. So, since there's not a 1-1 mapping from high-level code to LLVM IR, going backwards will

[LLVMdev] Unnatural loops with O0

2008 Jun 21

On Jun 11, 2008, at 6:27 AM, Florian Brandner wrote: > On Thursday 08 May 2008 18:33:48 Adrian Prantl wrote: >> we noticed that llvmgcc4.2-2.2 sometimes generates non-natural loops >> when compiling to bytecode without any optimizations. Apparently what >> happens is that the loop header is duplicated, which results in two >> entry points for the loop. > > this is

[LLVMdev] how create a pointer to FILE*

2014 Jun 27

Hi all, I want to create a function in LLVM IR whose type is: void _to_prof( FILE* ptrF); I do it within LLVM, but I don't know how to build the parameter type for FILE* ptrF. Best Regards. Eric Lu

Information about the number of indices in memory accesses

2020 Oct 03

Michael makes a great point about aliasing here and different indexing that accesses the same element! Another note: x = A[0][2] is fundamentally different depending on the type of `A`. If e.g. A was declared: int A[10][20], there's only _one_ load. A is (and is treated as) a linear buffer, and GEPs only pinpoint the specific position of A[0][2] in this buffer (i.e. 0*20 + 2). But if A was
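A minimal C sketch of the distinction drawn in this snippet, under stated assumptions: the names `linear_offset`, `load_linear`, and `load_from_pointer_table` are hypothetical, and the pointer-table case is borrowed from the `int **B` parameter in the original question, since the sentence above is cut off. For `int A[10][20]` in row-major layout, element `A[i][j]` sits at linear offset `i*20 + j`, so one load suffices; with an array of row pointers, the row pointer has to be loaded first.

```c
#include <assert.h>
#include <stddef.h>

enum { ROWS = 10, COLS = 20 };

/* Linear element offset of A[i][j] for `int A[ROWS][COLS]` (row-major). */
size_t linear_offset(size_t i, size_t j) {
    assert(i < ROWS && j < COLS);
    return i * COLS + j;
}

/* One load: the storage is a single contiguous buffer of ROWS*COLS ints. */
int load_linear(const int *buf, size_t i, size_t j) {
    return buf[linear_offset(i, j)];
}

/* Two loads: first the row pointer B[i], then the element row[j]. */
int load_from_pointer_table(int **B, size_t i, size_t j) {
    int *row = B[i];
    return row[j];
}
```

Both accessors are written `X[i][j]` at the source level, which is why the number of indices alone does not say how many memory accesses the IR performs.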

[LLVMdev] [polly] Polly Loop info and LoopSimplify functionality

2013 May 15

Tobias, I am working on one very well hidden issue with Polly loop structure. Here is a brief description. In polly::createLoop() we create something like this (topology is important): polly.start: ; preds = %polly.split_new_and_old ... <some code> br label %polly.loop_header polly.loop_after: ; preds =

[LLVMdev] Combining Branch Statements - Missing Optimization Pass?

2010 May 28

I have some LLVM IR after the optimization passes defined in createStandardModulePasses with the optimization level set to 3. It contains what appears to me to be an easily optimizable branch statement. In particular, note in the code below that at the end of the "loop" BasicBlock there is a conditional branch where in the false case, it branches to the label

[LLVMdev] Combining Branch Statements - Missing Optimization Pass?

2010 May 28

The thread here should help. http://lists.cs.uiuc.edu/pipermail/llvmdev/2010-May/031624.html On May 28, 2010, at 6:35 AM PDT, Curtis Faith wrote: > I have some LLVM IR after the optimization passes defined in createStandardModulePasses with the optimization level set to 3. It contains what appears to me to be an easily optimizable branch statement. > > In particular, note in the code

RFC: Extending loop metadata

2018 May 31

Hi llvm-dev, I recently posted an RFC about extending #pragma clang loop to the cfe-dev mailing list [1]. It proposes adding more loop transformations to Clang, defines an execution order if multiple transformations are specified, and allows programmers to assign names to loops. This email is about the LLVM part of the proposal. I am happy for any feedback, especially about whether the community

[LLVMdev] Unnatural loops with O0

2009 Feb 11

I am reviving this thread because I am seeing the same thing (unnatural loops produced by llvm-gcc), but it is not limited to -O0 -- I am seeing it for -O2 and -O3 as well. Some of my research work is relying on LoopInfo to provide loop information for all loops, but it is missing these loops. Is there any work in the pipeline that aims to fix this? Many thanks, Marc On Sat, Jun 21, 2008 at
