Displaying 20 results from an estimated 3000 matches similar to: "[ARM] Should Use Load and Store with Register Offset"
2020 Jul 21
2
[ARM] Should Use Load and Store with Register Offset
Hello Sjoerd,
Thank you for your response! I was not aware that -Oz is a closer
equivalent to GCC's -Os. I tried -Oz when compiling with clang and
confirmed that Clang's generated assembly is equivalent to GCC's for the
code snippet I posted above.
clang --target=armv6m-none-eabi -Oz -fomit-frame-pointer
memcpy_alt1:
push {r4, lr}
movs r3, #0
.LBB0_1:
cmp
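The C source for memcpy_alt1 is not shown in this excerpt; a byte-copy loop along the following lines (an assumed shape, not the poster's exact code) is the kind of function where load/store with register offset applies:

void memcpy_alt1(char *dst, const char *src, unsigned n)
{
    /* Indexed accesses dst[i] / src[i] are the pattern that maps onto the
       Thumb register-offset forms "ldrb Rt, [Rn, Rm]" / "strb Rt, [Rn, Rm]". */
    for (unsigned i = 0; i < n; i++)
        dst[i] = src[i];
}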
2018 Apr 27
2
[DbgInfo] Potential bug in location list address ranges
Hi all,
Consider this ARM assembly code of a C function:
00008124 <foo>:
8124: push {r4, r6, r7, lr}
8126: add r7, sp, #8
8128: mov r4, r0
812a: ldrsb.w r0, [r2]
812e: cmp r0, #1
8130: itt lt
8132: movlt r0, #85 ;
2018 Apr 27
2
[DbgInfo] Potential bug in location list address ranges
As Adrian said, we'd need to see the source of foo() to assess what the location-list for bar ought to be.
Without actually going to look, I would guess that 'poplt' is considered a conditional move, therefore r4's contents are not guaranteed after it executes (i.e. it is a clobber). If one operand of 'poplt' is 'pc' then of course it is also a conditional indirect
2018 May 07
2
[DbgInfo] Potential bug in location list address ranges
Hello,
Has anyone taken a look at this bug? I really want to fix this, but as Paul
pointed out, this requires a lot of care...
Thank you for your help
Son Tuan Vu
On Fri, Apr 27, 2018 at 7:29 PM, Son Tuan VU <sontuan.vu119 at gmail.com>
wrote:
> Thank you all for taking a look at this. I pasted the C source then
> deleted it because I was afraid that it was too long to read...
2018 Apr 27
0
[DbgInfo] Potential bug in location list address ranges
> On Apr 27, 2018, at 7:48 AM, Son Tuan VU <sontuan.vu119 at gmail.com> wrote:
>
> Hi all,
>
> Consider this ARM assembly code of a C function:
>
> 00008124 <foo>:
> 8124: push {r4, r6, r7, lr}
> 8126: add r7, sp, #8
> 8128: mov r4, r0
> 812a: ldrsb.w
2018 Apr 27
0
[DbgInfo] Potential bug in location list address ranges
Thank you all for taking a look at this. I pasted the C source then
deleted it because I was afraid that it was too long to read...
Here's the code of foo. Its real name is verifyPIN. The variable bar
is userPin.
int verifyPIN(char *userPin, char *cardPin, int *cpt)
{
    int i;
    int status;
    int diff;
    if (*cpt > 0) {
        status = 0x55;
        diff = 0x55;
        for (i = 0; i
2018 May 07
0
[DbgInfo] Potential bug in location list address ranges
Could you file a bug report about this (bugs.llvm.org)? If you don't have an account on Bugzilla, I'd be happy to file one for you. Please provide exact instructions to reproduce the issue, including any compilation flags.
thanks,
vedant
> On May 7, 2018, at 9:16 AM, Son Tuan VU <sontuan.vu119 at gmail.com> wrote:
>
> Hello,
>
> Has
2019 Jun 30
6
[hexagon][PowerPC] code regression (sub-optimal code) on LLVM 9 when generating hardware loops, and the "llvm.uadd" intrinsic.
Hi All,
The following code :
void hexagon2( int *a, int *res )
{
    int i = 100;
    while ( i-- ) {
        *res++ = *a++;
    }
}
gets compiled as a sub-optimal software loop by LLVM 9.0 instead of a hardware loop, whereas it was compiled as a hardware loop by LLVM 7.0.
This is the final assembly code generated by LLVM 9.0 :
.text
.file "main.c"
.globl hexagon2 // --
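For context, the trip count of the loop (100) is known before the loop is entered, which is the property a Hexagon hardware loop (loop0/endloop0) relies on; a counted rewrite of the same copy makes that explicit (an illustrative sketch, not code from the thread):

void hexagon2_counted(int *a, int *res)
{
    /* Same copy as hexagon2 above; the compile-time-known trip count is
       what lets the backend emit loop0/endloop0 instead of an explicit
       decrement-and-branch software loop. */
    for (int i = 0; i < 100; i++)
        res[i] = a[i];
}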
2011 Feb 07
1
[LLVMdev] Post-inc combining
When I compile the following program (for ARM):
for(i=0;i<n2;i+=n3)
{
    s+=a[i];
}
, with GCC, I get the following loop body, with a post-modify load:
.L4:
add r1, r1, r3
ldr r4, [ip], r6
rsb r5, r3, r1
cmp r2, r5
add r0, r0, r4
bgt .L4
With LLVM, however, I get:
.LBB0_3: @
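The GCC body above uses a post-modify load, ldr r4, [ip], r6: the value is loaded from [ip] and ip is then advanced by r6 in the same instruction. In source terms that corresponds to walking a pointer by the loop stride, roughly as follows (an illustrative sketch, not the poster's exact source):

int sum_strided(const int *a, int n2, int n3)
{
    int s = 0;
    const int *p = a;
    for (int i = 0; i < n2; i += n3) {
        s += *p;  /* load from the current address...                    */
        p += n3;  /* ...then advance it by the stride; this pair is what
                     a single post-indexed load performs.                */
    }
    return s;
}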
2018 Apr 12
3
[RFC] __builtin_constant_p() Improvements
Hello again!
I took a stab at PR4898[1]. The attached patch improves Clang's
__builtin_constant_p support so that the Linux kernel is happy. With this
improvement, Clang can determine if __builtin_constant_p is true or false
after inlining.
As an example:
static __attribute__((always_inline)) int foo(int x) {
    if (__builtin_constant_p(x))
        return 1;
    return 0;
}
static
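The preview cuts off before the second function; a typical pair of callers illustrating the "true or false after inlining" behavior would look like this (an assumed example in the spirit of PR4898, not the patch's actual test case):

static __attribute__((always_inline)) int foo(int x) {
    if (__builtin_constant_p(x))
        return 1;
    return 0;
}

int known(void)    { return foo(42); } /* after inlining, x is the literal 42,
                                          so __builtin_constant_p(x) folds to 1 */
int unknown(int y) { return foo(y); }  /* x stays a runtime value, so it folds to 0 */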
2019 Jan 26
2
Different SelectionDAGs for same CPU
Hi Tim,
>That C++ function is probably what looks for an FrameIndex node and
>has been taught that it can be folded into the load.
How do you teach a function that a node can be folded into an instruction?
________________________________
From: Tim Northover <t.p.northover at gmail.com>
Sent: Monday, January 21, 2019 11:52 PM
To: Josh Sharp
Cc: via llvm-dev
Subject: Re: [llvm-dev]
2015 Aug 31
2
[RFC] New pass: LoopExitValues
Hello LLVM,
This is a proposal for a new pass that improves performance and code
size in some nested loop situations. The pass is target independent.
From the description in the file header:
This optimization finds loop exit values reevaluated after the loop
execution and replaces them by the corresponding exit values if they
are available. Such sequences can arise after the
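A minimal nested-loop sketch of the pattern that description targets (my illustration, not code from the RFC):

void matrix_add_const(int N, int *out, int val)
{
    for (int row = 0; row < N; ++row) {
        for (int col = 0; col < N; ++col)
            out[row * N + col] += val;
        /* When the inner loop exits, its index has reached N, so the
           offset row * N + N equals (row + 1) * N, the next row's base;
           reusing that exit value instead of recomputing the offset is
           the kind of rewrite described above. */
    }
}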
2016 May 27
2
Handling post-inc users in LSR
Hello,
For a very simple loop where all IV users are post-inc users, I observed
redundant add instructions in AArch64.
From the LSR debug output, I can see that the initial formula for the icmp
is the one transformed to a post-inc form in OptimizeLoopTermCond() and
later expanded in post-inc mode. Based on the observation that the icmp is
already a post-inc user, I hacked LSR to prevent the icmp from being
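A loop of the shape being described, where the induction variable feeds only the element address and the loop-terminating compare (an illustrative sketch, not the poster's test case):

void scale(float *a, long n)
{
    /* After LSR's OptimizeLoopTermCond() the exit compare is rewritten to
       use the incremented IV, i.e. it becomes a post-inc user, so a single
       add per iteration should be enough; the redundant extra add is what
       the message reports on AArch64. */
    for (long i = 0; i < n; ++i)
        a[i] *= 2.0f;
}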
2015 Sep 01
2
[RFC] New pass: LoopExitValues
On Mon, Aug 31, 2015 at 5:52 PM, Jake VanAdrighem
<jvanadrighem at gmail.com> wrote:
> Do you have some specific performance measurements?
Averaging 4 runs of 10000 iterations each of Coremark on my X86_64
desktop showed:
-O2 performance: +2.9% faster with the L.E.V. pass
-Os size: 1.5% smaller with the L.E.V. pass
In the case of Coremark, the benefit comes mainly from the matrix
2011 Jan 28
0
[LLVMdev] Post-inc combining
On Jan 27, 2011, at 11:13 PM, Jonas Paulsson wrote:
> Hi,
>
> I would like to transform an LLVM function containing a load and an add of the base address inside a loop to a post-incremented load. In DAGCombiner.cpp::CombineToPostIndexedLoadStore(), it says it cannot fold the add for instance if it is a predecessor/successor of the load. I find this odd, as this
> is exactly what I
2018 Nov 06
4
Rather poor code optimisation of current clang/LLVM targeting Intel x86 (both -64 and -32)
Hi @ll,
while clang/LLVM recognizes common bit-twiddling idioms/expressions
like
unsigned int rotate(unsigned int x, unsigned int n)
{
    return (x << n) | (x >> (32 - n));
}
and typically generates "rotate" machine instructions for this
expression, it fails to recognize other, equally common bit-twiddling
idioms/expressions.
The standard IEEE CRC-32 for "big
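The excerpt cuts off before the CRC code itself; the standard bit-at-a-time IEEE CRC-32 loop it most likely refers to looks roughly like this (a reconstruction for context, not the poster's exact code):

unsigned int crc32_bitwise(const unsigned char *data, unsigned long len)
{
    unsigned int crc = 0xFFFFFFFFu;
    while (len--) {
        crc ^= *data++;
        for (int k = 0; k < 8; k++)
            /* Branchless conditional xor with the reflected CRC-32
               polynomial; one of the bit-twiddling idioms the message
               says clang/LLVM does not recognize. */
            crc = (crc >> 1) ^ (0xEDB88320u & (0u - (crc & 1u)));
    }
    return ~crc;
}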
2011 Jan 28
3
[LLVMdev] Post-inc combining
Hi,
I would like to transform an LLVM function containing a load and an add of the base address inside a loop to a post-incremented load. In DAGCombiner.cpp::CombineToPostIndexedLoadStore(), it says it cannot fold the add for instance if it is a predecessor/successor of the load. I find this odd, as this
is exactly what I would like to handle: a simple loop with an address that is incremented in
2018 Apr 13
0
[RFC] __builtin_constant_p() Improvements
I actually was working on an updated patch for the LLVM-side of this, also.
:) I was just working on some test cases; I'll post it soon. It's somewhat
different than yours.
I haven't touched the clang side yet, but I think it needs to be more
complex than what you have there. I think it actually needs to be able to
evaluate the intrinsic as a constant _false_ in the front-end in some
2013 Jun 19
2
[LLVMdev] ARM struct byval size > 64 triggers failure
I missed that the test case is returning a struct.
You are right about VARegSaveSize.
For callee:
sub sp, sp, #16
push {r11, lr}
mov r11, sp
sub sp, sp, #8
str r3, [r11, #20]
str r2, [r11, #16]
str r1, [r11, #12]
ldr r1, [r11, #76]
The beginning of the input struct is at sp_at_entry - 16 - 8 + 12 = sp_at_entry - 12
Number of leftover bytes: 67 - 12 = 55
r11+76 is @ sp_at_entry - 24 + 76 = sp_at_entry
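The offset arithmetic above can be checked by walking the quoted prologue with sp_at_entry taken as 0 (my own sanity check, assuming the callee code shown above; not part of the thread):

#include <stdio.h>

int main(void)
{
    long sp = 0;        /* sp at function entry */
    sp -= 16;           /* sub sp, sp, #16      */
    sp -= 8;            /* push {r11, lr}       */
    long r11 = sp;      /* mov r11, sp          */
    sp -= 8;            /* sub sp, sp, #8       */

    printf("str r1, [r11, #12] writes sp_at_entry%+ld\n", r11 + 12);
    printf("leftover byval bytes: %d\n", 67 - 12);
    printf("ldr r1, [r11, #76] reads sp_at_entry%+ld\n", r11 + 76);
    return 0;
}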
2016 May 27
0
Handling post-inc users in LSR
> On May 27, 2016, at 2:50 PM, via llvm-dev <llvm-dev at lists.llvm.org> wrote:
>
> Hello,
>
> For a very simple loop where all IV users are post-inc users, I observed redundant add instructions in AArch64.
>
> From LSR debug, I can see initial formula for icmp is the one that transformed to a post-inc form in OptimizeLoopTermCond() and later expanded in post-inc