Displaying 20 results from an estimated 3000 matches similar to: "[ARM] Should Use Load and Store with Register Offset"
2020 Jul 21
2
[ARM] Should Use Load and Store with Register Offset
Hello Sjoerd,
Thank you for your response! I was not aware that -Oz is a closer
equivalent to GCC's -Os. I tried -Oz when compiling with clang and
confirmed that Clang's generated assembly is equivalent to GCC's for the
code snippet I posted above.
clang --target=armv6m-none-eabi -Oz -fomit-frame-pointer
memcpy_alt1:
push {r4, lr}
movs r3, #0
.LBB0_1:
cmp
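The C source for memcpy_alt1 is not shown in this excerpt; a byte-copy loop along the following lines (an assumed shape, not the poster's exact code) is the kind of function where load/store with register offset applies:

void memcpy_alt1(char *dst, const char *src, unsigned n)
{
    /* Indexed accesses dst[i] / src[i] are the pattern that maps onto the
       Thumb register-offset forms "ldrb Rt, [Rn, Rm]" / "strb Rt, [Rn, Rm]". */
    for (unsigned i = 0; i < n; i++)
        dst[i] = src[i];
}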
2018 Apr 27
2
[DbgInfo] Potential bug in location list address ranges
Hi all,
Consider this ARM assembly code of a C function:
00008124 <foo>:
8124: push {r4, r6, r7, lr}
8126: add r7, sp, #8
8128: mov r4, r0
812a: ldrsb.w r0, [r2]
812e: cmp r0, #1
8130: itt lt
8132: movlt r0, #85 ;
2018 Apr 27
2
[DbgInfo] Potential bug in location list address ranges
As Adrian said, we'd need to see the source of foo() to assess what the location-list for bar ought to be.
Without actually going to look, I would guess that 'poplt' is considered a conditional move, therefore r4's contents are not guaranteed after it executes (i.e. it is a clobber). If one operand of 'poplt' is 'pc' then of course it is also a conditional indirect
2018 May 07
2
[DbgInfo] Potential bug in location list address ranges
Hello,
Has anyone taken a look at this bug? I really want to fix this, but as Paul
pointed out, this requires a lot of care...
Thank you for your help
Son Tuan Vu
On Fri, Apr 27, 2018 at 7:29 PM, Son Tuan VU <sontuan.vu119 at gmail.com>
wrote:
> Thank you all for taking a look at this. I pasted the C source then
> deleted it because I was afraid that it was too long to read...
2018 Apr 27
0
[DbgInfo] Potential bug in location list address ranges
> On Apr 27, 2018, at 7:48 AM, Son Tuan VU <sontuan.vu119 at gmail.com> wrote:
>
> Hi all,
>
> Consider this ARM assembly code of a C function:
>
> 00008124 <foo>:
> 8124: push {r4, r6, r7, lr}
> 8126: add r7, sp, #8
> 8128: mov r4, r0
> 812a: ldrsb.w
2018 Apr 27
0
[DbgInfo] Potential bug in location list address ranges
Thank you all for taking a look at this. I pasted the C source then
deleted it because I was afraid that it was too long to read...
Here's the code of foo. Its real name is verifyPIN. The variable bar
is userPin.
int verifyPIN(char *userPin, char *cardPin, int *cpt)
{
    int i;
    int status;
    int diff;
    if (*cpt > 0) {
        status = 0x55;
        diff = 0x55;
        for (i = 0; i
2018 May 07
0
[DbgInfo] Potential bug in location list address ranges
Could you file a bug report about this (bugs.llvm.org)? If you don't have an account on Bugzilla, I'd be happy to file one for you. Please provide exact instructions to reproduce the issue, including any compilation flags.
thanks,
vedant
> On May 7, 2018, at 9:16 AM, Son Tuan VU <sontuan.vu119 at gmail.com> wrote:
>
> Hello,
>
> Has
2019 Jun 30
6
[hexagon][PowerPC] code regression (sub-optimal code) on LLVM 9 when generating hardware loops, and the "llvm.uadd" intrinsic.
Hi All,
The following code :
void hexagon2( int *a, int *res )
{
    int i = 100;
    while ( i-- ) {
        *res++ = *a++;
    }
}
gets compiled as a sub-optimal software loop by LLVM 9.0 instead of a hardware loop, whereas it was compiled as a hardware loop by LLVM 7.0.
This is the final assembly code generated by LLVM 9.0 :
.text
.file "main.c"
.globl hexagon2 // --
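For context, the trip count of the loop (100) is known before the loop is entered, which is the property a Hexagon hardware loop (loop0/endloop0) relies on; a counted rewrite of the same copy makes that explicit (an illustrative sketch, not code from the thread):

void hexagon2_counted(int *a, int *res)
{
    /* Same copy as hexagon2 above; the compile-time-known trip count is
       what lets the backend emit loop0/endloop0 instead of an explicit
       decrement-and-branch software loop. */
    for (int i = 0; i < 100; i++)
        res[i] = a[i];
}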
2011 Feb 07
1
[LLVMdev] Post-inc combining
When I compile the following program (for ARM):
for(i=0;i<n2;i+=n3)
{
    s+=a[i];
}
, with GCC, I get the following loop body, with a post-modify load:
.L4:
add r1, r1, r3
ldr r4, [ip], r6
rsb r5, r3, r1
cmp r2, r5
add r0, r0, r4
bgt .L4
With LLVM, however, I get:
.LBB0_3: @
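The GCC body above uses a post-modify load, ldr r4, [ip], r6: the value is loaded from [ip] and ip is then advanced by r6 in the same instruction. In source terms that corresponds to walking a pointer by the loop stride, roughly as follows (an illustrative sketch, not the poster's exact source):

int sum_strided(const int *a, int n2, int n3)
{
    int s = 0;
    const int *p = a;
    for (int i = 0; i < n2; i += n3) {
        s += *p;  /* load from the current address...                    */
        p += n3;  /* ...then advance it by the stride; this pair is what
                     a single post-indexed load performs.                */
    }
    return s;
}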
2018 Apr 12
3
[RFC] __builtin_constant_p() Improvements
Hello again!
I took a stab at PR4898[1]. The attached patch improves Clang's
__builtin_constant_p support so that the Linux kernel is happy. With this
improvement, Clang can determine if __builtin_constant_p is true or false
after inlining.
As an example:
static __attribute__((always_inline)) int foo(int x) {
    if (__builtin_constant_p(x))
        return 1;
    return 0;
}
static
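The preview cuts off before the second function; a typical pair of callers illustrating the "true or false after inlining" behavior would look like this (an assumed example in the spirit of PR4898, not the patch's actual test case):

static __attribute__((always_inline)) int foo(int x) {
    if (__builtin_constant_p(x))
        return 1;
    return 0;
}

int known(void)    { return foo(42); } /* after inlining, x is the literal 42,
                                          so __builtin_constant_p(x) folds to 1 */
int unknown(int y) { return foo(y); }  /* x stays a runtime value, so it folds to 0 */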
2019 Jan 26
2
Different SelectionDAGs for same CPU
Hi Tim,
>That C++ function is probably what looks for an FrameIndex node and
>has been taught that it can be folded into the load.
How do you teach a function that a node can be folded into an instruction?
________________________________
From: Tim Northover <t.p.northover at gmail.com>
Sent: Monday, January 21, 2019 11:52 PM
To: Josh Sharp
Cc: via llvm-dev
Subject: Re: [llvm-dev]
2015 Aug 31
2
[RFC] New pass: LoopExitValues
Hello LLVM,
This is a proposal for a new pass that improves performance and code
size in some nested loop situations. The pass is target independent.
From the description in the file header:
This optimization finds loop exit values reevaluated after the loop
execution and replaces them by the corresponding exit values if they
are available. Such sequences can arise after the
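A minimal nested-loop sketch of the pattern that description targets (my illustration, not code from the RFC):

void matrix_add_const(int N, int *out, int val)
{
    for (int row = 0; row < N; ++row) {
        for (int col = 0; col < N; ++col)
            out[row * N + col] += val;
        /* When the inner loop exits, its index has reached N, so the
           offset row * N + N equals (row + 1) * N, the next row's base;
           reusing that exit value instead of recomputing the offset is
           the kind of rewrite described above. */
    }
}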
2016 May 27
2
Handling post-inc users in LSR
Hello,
For a very simple loop where all IV users are post-inc users, I observed
redundant add instructions in AArch64.
From the LSR debug output, I can see that the initial formula for the icmp
is the one transformed to a post-inc form in OptimizeLoopTermCond() and
later expanded in post-inc mode. Based on the observation that the icmp is
already a post-inc user, I hacked LSR to prevent the icmp from being
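A loop of the shape being described, where the induction variable feeds only the element address and the loop-terminating compare (an illustrative sketch, not the poster's test case):

void scale(float *a, long n)
{
    /* After LSR's OptimizeLoopTermCond() the exit compare is rewritten to
       use the incremented IV, i.e. it becomes a post-inc user, so a single
       add per iteration should be enough; the redundant extra add is what
       the message reports on AArch64. */
    for (long i = 0; i < n; ++i)
        a[i] *= 2.0f;
}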
2015 Sep 01
2
[RFC] New pass: LoopExitValues
On Mon, Aug 31, 2015 at 5:52 PM, Jake VanAdrighem
<jvanadrighem at gmail.com> wrote:
> Do you have some specific performance measurements?
Averaging 4 runs of 10000 iterations each of Coremark on my X86_64
desktop showed:
-O2 performance: +2.9% faster with the L.E.V. pass
-Os size: 1.5% smaller with the L.E.V. pass
In the case of Coremark, the benefit comes mainly from the matrix
2011 Jan 28
0
[LLVMdev] Post-inc combining
On Jan 27, 2011, at 11:13 PM, Jonas Paulsson wrote:
> Hi,
>
> I would like to transform an LLVM function containing a load and an add of the base address inside a loop to a post-incremented load. In DAGCombiner.cpp::CombineToPostIndexedLoadStore(), it says it cannot fold the add for instance if it is a predecessor/successor of the load. I find this odd, as this
> is exactly what I
2018 Nov 06
4
Rather poor code optimisation of current clang/LLVM targeting Intel x86 (both -64 and -32)
Hi @ll,
while clang/LLVM recognizes common bit-twiddling idioms/expressions
like
unsigned int rotate(unsigned int x, unsigned int n)
{
    return (x << n) | (x >> (32 - n));
}
and typically generates "rotate" machine instructions for this
expression, it fails to recognize other, equally common bit-twiddling
idioms/expressions.
The standard IEEE CRC-32 for "big
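The excerpt cuts off before the CRC code itself; the standard bit-at-a-time IEEE CRC-32 loop it most likely refers to looks roughly like this (a reconstruction for context, not the poster's exact code):

unsigned int crc32_bitwise(const unsigned char *data, unsigned long len)
{
    unsigned int crc = 0xFFFFFFFFu;
    while (len--) {
        crc ^= *data++;
        for (int k = 0; k < 8; k++)
            /* Branchless conditional xor with the reflected CRC-32
               polynomial; one of the bit-twiddling idioms the message
               says clang/LLVM does not recognize. */
            crc = (crc >> 1) ^ (0xEDB88320u & (0u - (crc & 1u)));
    }
    return ~crc;
}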
2011 Jan 28
3
[LLVMdev] Post-inc combining
Hi,
I would like to transform an LLVM function containing a load and an add of the base address inside a loop to a post-incremented load. In DAGCombiner.cpp::CombineToPostIndexedLoadStore(), it says it cannot fold the add for instance if it is a predecessor/successor of the load. I find this odd, as this
is exactly what I would like to handle: a simple loop with an address that is incremented in
2018 Apr 13
0
[RFC] __builtin_constant_p() Improvements
I actually was working on an updated patch for the LLVM-side of this, also.
:) I was just working on some test cases; I'll post it soon. It's somewhat
different than yours.
I haven't touched the clang side yet, but I think it needs to be more
complex than what you have there. I think it actually needs to be able to
evaluate the intrinsic as a constant _false_ in the front-end in some
2013 Jun 19
2
[LLVMdev] ARM struct byval size > 64 triggers failure
I missed that the test case is returning a struct.
You are right about VARegSaveSize.
For callee:
sub sp, sp, #16
push {r11, lr}
mov r11, sp
sub sp, sp, #8
str r3, [r11, #20]
str r2, [r11, #16]
str r1, [r11, #12]
ldr r1, [r11, #76]
The beginning of the input struct is at sp_at_entry - 16 - 8 + 12 = sp_at_entry - 12
Number of leftover bytes: 67 - 12 = 55
r11+76 is @ sp_at_entry - 24 + 76 = sp_at_entry
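The offset arithmetic above can be checked by walking the quoted prologue with sp_at_entry taken as 0 (my own sanity check, assuming the callee code shown above; not part of the thread):

#include <stdio.h>

int main(void)
{
    long sp = 0;        /* sp at function entry */
    sp -= 16;           /* sub sp, sp, #16      */
    sp -= 8;            /* push {r11, lr}       */
    long r11 = sp;      /* mov r11, sp          */
    sp -= 8;            /* sub sp, sp, #8       */

    printf("str r1, [r11, #12] writes sp_at_entry%+ld\n", r11 + 12);
    printf("leftover byval bytes: %d\n", 67 - 12);
    printf("ldr r1, [r11, #76] reads sp_at_entry%+ld\n", r11 + 76);
    return 0;
}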
2016 May 27
0
Handling post-inc users in LSR
> On May 27, 2016, at 2:50 PM, via llvm-dev <llvm-dev at lists.llvm.org> wrote:
>
> Hello,
>
> For a very simple loop where all IV users are post-inc users, I observed redundant add instructions in AArch64.
>
> From LSR debug, I can see initial formula for icmp is the one that transformed to a post-inc form in OptimizeLoopTermCond() and later expanded in post-inc