thr3ads.net - similar to: "Optimization of successive constant stores"

Optimization of successive constant stores

2015 Dec 11

Hmm... found an interesting issue: Given:

    %2 = getelementptr inbounds %UodStructType* %0, i32 0, i32 0
    store i8 1, i8* %2, align 8
    %3 = getelementptr inbounds %UodStructType* %0, i32 0, i32 1
    store i8 2, i8* %3, align 1
    %4 = getelementptr inbounds %UodStructType* %0, i32 0, i32 2
    store i8 3, i8* %4, align 2
    %5 = getelementptr inbounds %UodStructType* %0, i32 0, i32 3
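As a rough illustration of what merging such stores would amount to (a sketch only, not LLVM's implementation): on a little-endian target, four adjacent byte-wide stores of constants leave memory in the same state as one 32-bit store of the combined constant. The snippet is cut off after the fourth getelementptr, so the fourth value used here (4) is an assumption, and the function names are hypothetical.

```python
import struct


def store_bytes_individually(values):
    """Model successive i8 stores into adjacent one-byte fields."""
    buf = bytearray(len(values))
    for offset, value in enumerate(values):
        buf[offset] = value  # one byte-wide store per field
    return bytes(buf)


def store_merged_le32(values):
    """Model a single merged 32-bit little-endian store of four bytes."""
    if len(values) != 4:
        raise ValueError("this sketch merges exactly four byte stores")
    combined = 0
    for offset, value in enumerate(values):
        combined |= value << (8 * offset)  # byte 0 is the least significant
    return combined, struct.pack("<I", combined)


if __name__ == "__main__":
    vals = [1, 2, 3, 4]  # the 4 is assumed; the snippet ends before that store
    combined, merged = store_merged_le32(vals)
    print(hex(combined))
    print(merged == store_bytes_individually(vals))
```

On a big-endian target the byte order of the combined constant would be reversed, which is one reason a real optimization has to consult the target's data layout.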

Suboptimal code generated by clang+llc in quite a common scenario (?)

2019 Aug 08

I found something that I don't quite understand when compiling a common piece of code with the -Os flag. I found it while testing my own backend, but then I dug deeper and found that at least x86 is affected as well. This is the code in question:

    char pp[3];
    char *scscx = pp;

    int tst( char i, char j, char k )
    {
        scscx[0] = i;
        scscx[1] = j;
        scscx[2] = k;
        return 0;
    }

The above gets

Byte-wide stores aren't coalesced if interspersed with other stores

2018 Sep 11

Andres: FWIW, codegen will do the merge if you turn on global alias analysis for it with "-combiner-global-alias-analysis". That said, we should be able to do this merging earlier.

-Nirav

On Mon, Sep 10, 2018 at 8:33 PM, Andres Freund via llvm-dev <llvm-dev at lists.llvm.org> wrote:
> Hi,
>
> On 2018-09-10 13:42:21 -0700, Andres Freund wrote:
> > I have, in postgres,

[LLVMdev] Area for improvement

2005 Feb 22

When I increased COLS to the point where the loop could no longer be unrolled, the selection dag code generator generated effectively the same code as the default X86 code generator. Lots of redundant imul/movl/addl sequences. It can't clean it up either. Only unrolling all nested loops permits it to be optimized away, regardless of code generator. Jeff Cohen wrote: > I noticed

[LLVMdev] Area for improvement

2005 Feb 22

I noticed that fourinarow is one of the programs in which LLVM is much slower than GCC, so I decided to take a look and see why that is so. The program has many loops that look like this:

    #define ROWS 6
    #define COLS 7

    void init_board(char b[COLS][ROWS+1]) {
        int i,j;
        for (i=0;i<COLS;i++)
            for (j=0;j<ROWS;j++)
                b[i][j]='.';

[LLVMdev] Optimizing out redundant alloca involving byval params

2015 Mar 08

errata: I am on 3.6 full stop. I *thought* there was a 3.7 available, based on the title of http://llvm.org/docs/ ("LLVM 3.7 documentation"). I suppose the docs are ahead of the release schedule? On Sun, Mar 8, 2015 at 9:44 AM Mircea Trofin <mtrofin at google.com> wrote: > Sorry, that phase is part of the PNaCl toolchain. This would be LLVM 3.6, > would your comments still

[LLVMdev] i1 types in MergeConsecutiveStores

2015 May 12

Hello LLVM, In DAGCombiner.cpp, MergeConsecutiveStores uses

    int64_t ElementSizeBytes = MemVT.getSizeInBits()/8;

https://github.com/llvm-mirror/llvm/blob/master/lib/CodeGen/SelectionDAG/DAGCombiner.cpp#L10669

which is broken for i1 types where getSizeInBits() == 1. My out-of-tree target hits this case and eventually LLVM asserts in Type.cpp. Is there some reason MergeConsecutiveStores should
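The arithmetic behind this report can be checked in a few lines: integer division of a 1-bit size by 8 truncates to zero bytes. The sketch below models only that expression; the round-up variant is a common idiom shown for comparison and is an assumption, not a fix proposed in the message, and both function names are hypothetical.

```python
def size_in_bytes_truncating(size_in_bits):
    """Mirror the quoted expression: size in bits divided by 8, truncated."""
    return size_in_bits // 8


def size_in_bytes_rounded_up(size_in_bits):
    """Round up to whole bytes (shown for comparison only)."""
    return (size_in_bits + 7) // 8


if __name__ == "__main__":
    for bits in (1, 8, 16, 32):
        print(bits, size_in_bytes_truncating(bits), size_in_bytes_rounded_up(bits))
```

For byte-multiple widths the two functions agree; they differ only for widths such as 1 that are not a multiple of 8.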

[LLVMdev] GVN Infinite loop

2011 May 03

Hi, GVN seems to be running in an infinite loop on my example. I have attached the output of one iteration. I can't seem to reduce the testcase either. Any pointers on how to reduce the test case? Thanks, Arushi

GVN iteration: 8
GVN WIDENED LOAD: %0 = load i8* getelementptr inbounds (%struct.CHESS_POSITION* @search, i64 0, i32 23), align 2, !dbg !875
TO: %1 = load i16* bitcast (i8*

RFC: a more detailed design for ThinLTO + vcall CFI

2016 Oct 26

Hi all, As promised, here is a brain dump on how I see CFI for vcalls working under ThinLTO. Most of this has been prototyped, so the design does appear to be sound. For context on how CFI currently works under regular LTO, please read: http://llvm.org/docs/TypeMetadata.html http://clang.llvm.org/docs/ControlFlowIntegrityDesign.html http://clang.llvm.org/docs/LTOVisibility.html ==== Summary

Is it ok to allocate > half of address space?

2017 Nov 08

Hi, I was looking into the semantics of GEP inbounds and some BasicAA rules and I'm wondering if it's valid in LLVM IR to allocate more than half of the address space with a global variable or an alloca. If that's a scenario we want to consider, then we have problems :) Consider this C code (32 bits):

    #include <string.h>

    char obj[0x80000008];

    char f() {
        char *p = obj +
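The snippet ends before it says where the problems arise. One piece of background arithmetic, offered as an assumption about why such sizes are awkward rather than as the author's argument: an object of 0x80000008 bytes is larger than the biggest value a signed 32-bit integer can hold, so an offset spanning the whole object wraps to a negative number when reinterpreted as signed. The helper name below is hypothetical.

```python
def as_signed(value, bits=32):
    """Reinterpret an unsigned value as a two's-complement signed integer."""
    value &= (1 << bits) - 1
    if value >> (bits - 1):  # sign bit set
        return value - (1 << bits)
    return value


OBJ_SIZE = 0x80000008   # size of obj in the quoted C code
INT32_MAX = 2**31 - 1   # largest signed 32-bit value

if __name__ == "__main__":
    print(OBJ_SIZE > INT32_MAX)
    print(as_signed(OBJ_SIZE))
```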

[LLVMdev] Semantics of an Inbounds GetElementPtr

2015 May 03

Hi - I've got a question about what optimizations the "inbounds" keyword of "getelementptr" allows you to use. In the code below, %five is loaded from an inbounds offset of either a null pointer or %mem:

    target datalayout = "e-i64:64-f80:128-n8:16:32:64-S128"

    define i8 @func(i8* %mem) {
      %test = icmp eq i8* %mem, null
      br i1 %test, label %done, label

[LLVMdev] GVN Infinite loop

2011 May 04

On May 3, 2011, at 3:25 PM, Arushi Aggarwal wrote:
> Hi,
>
> GVN seems to be running in an infinite loop on my example. I have attached the output of one iteration. I cant seem to reduce the testcase either.
>
> Any pointers to how to reduce the test case.

Bugpoint can reduce testcases that cause infinite loops (it has a -timeout flag), I'd try it. Even if this doesn't

[LLVMdev] A very basic doubt about LLVM Alias Analysis

2010 Feb 14

Hi Ambika,

> Oh m sorry for that mistake as I had points to in mind.
> But still what about the following prog:
>
> int main()
> {
> int *i,*j,k;
> i=&k;
> j=&k;
> k=4;
> printf("%d,%d,%d",*i,*j,k);
> return 0;
> }
>
>
> here too i dont get <i,j> alias each other.

how are you

[LLVMdev] What happened to "malloc" in LLVM 3.0 IR?

2012 Jan 19

Hello folks, I have a compiler written with LLVM 2.6 by a student that produces .ll files. It behaved fine at the time. Trying to take the work over using version 3.0, I run into the problem that "malloc" in the IR is no longer valid:

    semac1 menu > llvm-as Carre.ll
    llvm-as: Carre.ll:68:14: error: expected instruction opcode
    %_malloc = malloc i8, i32 %2 ;

[LLVMdev] [DragonEgg] Mysterious FRAME coming from gimple to LLVM

2012 Jul 31

Hi Duncan, A DragonEgg/GCC-related question: do you know where these strange FRAME tokens originate from (e.g. %struct.FRAME.matmul)? Compiling simple Fortran code with DragonEgg:

    > cat matmul.f90
    subroutine matmul(nx, ny, nz)
    implicit none
    integer :: nx, ny, nz
    real, dimension(nx, ny) :: A
    real, dimension(ny, nz) :: B
    real, dimension(nx, nz) :: C
    integer :: i, j, k
    real,

[LLVMdev] [llvm-commits] [PATCH] BasicBlock Autovectorization Pass

2012 Jan 26

On Thu, Jan 26, 2012 at 3:41 PM, Hal Finkel <hfinkel at anl.gov> wrote:
> On Thu, 2012-01-26 at 15:36 -0600, Sebastian Pop wrote:
>> arm-none-linux-gnueabi
>
> Indeed, adding -ccc-host-triple arm-none-linux-gnueabi I also get

Minor remark: please use -target instead of -ccc-host-triple, which is now deprecated. Thanks for looking at this testcase.

Sebastian
--
Qualcomm

[LLVMdev] [llvm-commits] [PATCH] BasicBlock Autovectorization Pass

2012 Jan 26

On Thu, 2012-01-26 at 15:36 -0600, Sebastian Pop wrote:
> arm-none-linux-gnueabi

Indeed, adding -ccc-host-triple arm-none-linux-gnueabi I also get vectorization (even though I don't get vectorization when targeting x86_64). I'll let you know what I find.

-Hal

--
Hal Finkel
Postdoctoral Appointee
Leadership Computing Facility
Argonne National Laboratory

My own codegen is 2.5x slower than llc?

2018 May 29

My back-end code generator uses LLVM 5.0.1 to optimize and generate code for x86_64. If I run it on a given sample of IR, it takes almost 5 minutes to generate object code. 95%+ of this time is spent in MergeConsecutiveStores(). (One function has a basic block with 14000 instructions, which is a pathological case for MergeConsecutiveStores.) If, instead, I dump out the LLVM IR, and manually

[LLVMdev] problem with sgt's on Sparc machine

2011 Sep 15

Hi guys, Thanks for the input. However, it seems that the code still produces the wrong output on a Sparc machine. My current llvm_print.bc code is:

--------------------------------------------------------------------------------------------------
; ModuleID = '<stdin>'
target datalayout =

[LLVMdev] problem with sgt's on Sparc machine

2011 Sep 15

Hi, I compiled the following code on a Sparc machine; it produces different results than on an x86 machine.

-------------------------------------------------------------------------------------------------------------------
; ModuleID = '<stdin>'
target datalayout =