thr3ads.net - similar to: "[LLVMdev] Post-inc combining"

Displaying 20 results from an estimated 6000 matches similar to: "[LLVMdev] Post-inc combining"

2011 Jan 28

[LLVMdev] Post-inc combining

On Jan 27, 2011, at 11:13 PM, Jonas Paulsson wrote: > Hi, > > I would like to transform a LLVM function containing a load and an add of the base address inside a loop to a post-incremented load. In DAGCombiner.cpp::CombineToPostIndexedLoadStore(), it says it cannot fold the add for instance if it is a predecessor/successor of the load. I find this odd, as this > is exactly what I

[LLVMdev] Post-inc combining

2011 Feb 07

[LLVMdev] Post-inc combining

When I compile the following program (for ARM): for(i=0;i<n2;i+=n3) { s+=a[i]; } , with GCC, I get the following loop body, with a post-modify load: .L4: add r1, r1, r3 ldr r4, [ip], r6 rsb r5, r3, r1 cmp r2, r5 add r0, r0, r4 bgt .L4 With LLVM, however, I get: .LBB0_3: @

[ARM] Should Use Load and Store with Register Offset

2020 Jul 21

[ARM] Should Use Load and Store with Register Offset

Hello Sjoerd, Thank you for your response! I was not aware that -Oz is a closer equivalent to GCC's -Os. I tried -Oz when compiling with clang and confirmed that the Clang's generated assembly is equivalent to GCC for the code snippet I posted above. clang --target=armv6m-none-eabi -Oz -fomit-frame-pointer memcpy_alt1: push {r4, lr} movs r3, #0 .LBB0_1: cmp

Aarch64: unaligned access despite -mstrict-align

2020 Jun 01

Aarch64: unaligned access despite -mstrict-align

Hi, I experienced a crash in code compiled with Clang 10.0.0 due to a misaligned 64-bit data access. The (ARMv8) CPU is configured with SCTL.A == 1 (alignment check enable). With SCTLR.A == 0 the code runs as expected. After some investigation I came up with the following reproducer: ---8<-------8<-------8<-------8<-------8<-------8<-------8<------- $ cat test.c extern char

[ARM] Should Use Load and Store with Register Offset

2020 Jul 20

[ARM] Should Use Load and Store with Register Offset

Hello LLVM Community (specifically anyone working with ARM Cortex-M), While trying to compile the Newlib C library I found that Clang10 was generating slightly larger binaries than the libc from the prebuilt gcc-arm-none-eabi toolchain. I looked at a few specific functions (memcpy, strcpy, etc.) and noticed that LLVM does not tend to generate load/store instructions with a register offset (e.g.

Comparing Clang and GCC: only clang stores updated value in each iteration.

2018 Sep 20

Comparing Clang and GCC: only clang stores updated value in each iteration.

Hi, I have a benchmark (mcf) that is currently slower when compiled with clang compared to gcc 8 (~10%). It seems that a hot loop has a few differences, where one interesting one is that while clang stores an incremented value in each iteration, gcc waits and just stores the final value just once after the loop. The value is a global variable. I wonder if this is something clang does not do

[LLVMdev] EFLAGS and MVT::Glue

2011 Feb 18

[LLVMdev] EFLAGS and MVT::Glue

The log message for revision 122213 says: > Change the X86 backend to stop using the evil ADDC/ADDE/SUBC/SUBE nodes (which > their carry depenedencies with MVT::Flag operands) and use clean and beautiful > EFLAGS dependences instead. (MVT::Flag has since been renamed to MVT::Glue.) That revision made bug 8404 go away. Am I right in thinking that one of the problems with MVT::Glue is

Handling post-inc users in LSR

2016 May 27

Handling post-inc users in LSR

Hello, For a very simple loop where all IV users are post-inc users, I observed redundant add instructions in AArch64. From LSR debug, I can see initial formula for icmp is the one that transformed to a post-inc form in OptimizeLoopTermCond() and later expanded in post-inc mode. Based on the observation that the icmp is already a post-inc user, I hacked LSR to prevent the icmp from being

[hexagon][PowerPC] code regression (sub-optimal code) on LLVM 9 when generating hardware loops, and the "llvm.uadd" intrinsic.

2019 Jun 30

[hexagon][PowerPC] code regression (sub-optimal code) on LLVM 9 when generating hardware loops, and the "llvm.uadd" intrinsic.

Hi All, The following code : void hexagon2( int *a, int *res ) { int i = 100; while ( i-- ) { *res++ = *a++; } } gets compiled as a sub-optimal Software loop on LLVM 9.0 instead of a Hardware loop, whereas it was compiled as a Hardware Loop in LLVM 7.0. This is the final assembly code generated by LLVM 9.0 : .text .file "main.c" .globl hexagon2 // --

Handling post-inc users in LSR

2016 May 27

Handling post-inc users in LSR

> On May 27, 2016, at 2:50 PM, via llvm-dev <llvm-dev at lists.llvm.org> wrote: > > Hello, > > For a very simple loop where all IV users are post-inc users, I observed redundant add instructions in AArch64. > > From LSR debug, I can see initial formula for icmp is the one that transformed to a post-inc form in OptimizeLoopTermCond() and later expanded in post-inc

[X86][AVX512] RFC: make i1 illegal in the Codegen

2017 Jan 24

[X86][AVX512] RFC: make i1 illegal in the Codegen

Hi All, AVX-512 introduced the K mask registers and masked operations which make a natural choice for legalizing vectors of i1's. For example, define <8 x i32> @foo(<8 x i32>%a, <8 x i32*> %p) { %r = call <8 x i32> @llvm.masked.gather.v8i32(<8 x i32*> %p, i32 4, <8 x i1> <i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true>,

A code layout related side-effect introduced by rL318299

2017 Dec 19

A code layout related side-effect introduced by rL318299

Hi, Recently 10% performance regression on an important benchmark showed up after we integrated https://reviews.llvm.org/rL318299. The analysis showed that rL318299 triggered loop rotation on an multi exits loop, and the loop rotation introduced code layout issue. The performance regression is a side-effect of rL318299. I got two testcases a.ll and b.ll attached to illustrate the problem. a.ll

[LLVMdev] Loop simplification

2011 Feb 01

[LLVMdev] Loop simplification

I have a (non-entry) basic block that contains only PHI nodes and an unconditional branch (that does not branch to itself). Is it always possible to merge this block with it's successor and produce a semantically equivalent program? I'm trying to undo some of the loop optimizations that LLVM has applied to my program to reduce a pair of nested loops to a single loop.

A code layout related side-effect introduced by rL318299

2017 Dec 19

A code layout related side-effect introduced by rL318299

On Mon, Dec 18, 2017 at 5:46 PM Xinliang David Li <davidxl at google.com> wrote: > The introduction of cleanup.cond block in b.ll without loop-rotation > already makes the layout worse than a.ll. > > > Without introducing cleanup.cond block, the layout out is > > entry->while.cond -> while.body->ret > > All the arrows are hot fall through edges which is

[LLVMdev] Loop simplification

2011 Feb 01

[LLVMdev] Loop simplification

On Feb 1, 2011, at 1:34 PM, Andrew Trick wrote: > On Feb 1, 2011, at 1:08 PM, Andrew Clinton wrote: > >> I have a (non-entry) basic block that contains only PHI nodes and an >> unconditional branch (that does not branch to itself). Is it always >> possible to merge this block with it's successor and produce a >> semantically equivalent program? I'm

[LLVMdev] Loop simplification

2011 Feb 01

[LLVMdev] Loop simplification

On Feb 1, 2011, at 1:08 PM, Andrew Clinton wrote: > I have a (non-entry) basic block that contains only PHI nodes and an > unconditional branch (that does not branch to itself). Is it always > possible to merge this block with it's successor and produce a > semantically equivalent program? I'm trying to undo some of the loop > optimizations that LLVM has applied to my

Propagation of debug information for variable into basic blocks.

2016 Sep 21

Propagation of debug information for variable into basic blocks.

Adrian, I am currently investigating issues where variables that one would expect to be available in a debugger are not in code that is compiled at optimisations other than -O0 The main problem appears to be with the LiveDebugValues::join() method because it does not allow variables to be propagated into blocks unless all predecessor blocks have an Outgoing Location for that variable. As a

[LLVMdev] the clang 3.5 loop optimizer seems to jump in unintentional for simple loops

2014 Jul 23

[LLVMdev] the clang 3.5 loop optimizer seems to jump in unintentional for simple loops

the clang 3.5 loop optimizer seems to jump in unintentional for simple loops the very simple example ---- const int SIZE = 3; int the_func(int* p_array) { int dummy = 0; #if defined(ITER) for(int* p = &p_array[0]; p < &p_array[SIZE]; ++p) dummy += *p; #else for(int i = 0; i < SIZE; ++i) dummy += p_array[i]; #endif return dummy; } int main(int argc, char** argv) {

Asan code size overhead

2016 Oct 26

Asan code size overhead

Hi Kcc, I'm trying enabling the Asan in my firmware, but I find the asan instrumentation code size impact is too big for me. I just implement necessary firmware version runtime library functions (e.g. __asan_report_load8) with blank body firstly to pass the asan enabled build, but I find the new binary code size is already ~2.5 times as original one with asan disabled in GCC. I know Linux

Aarch64: unaligned access despite -mstrict-align

2020 Jun 01

Aarch64: unaligned access despite -mstrict-align

Sorry, quick message to ignore what I wrote before, I got myself confused (probably you too), With a recent trunk build I get this: f: adrp x8, g ldr x8, [x8, :lo12:g] mov w2, #16 mov x1, x0 mov x0, x8 b memcmp This looks more correct, and I need to look a bit more into this (and how clang 10.0.0 behaves).

similar to: [LLVMdev] Post-inc combining