Displaying 20 results from an estimated 49 matches for "lcpi0_0".
2020 Apr 15
2
[ARM] Register pressure with -mthumb forces register reload before each call
...mov r4, r0
> bl "<90>w\n "
> mov r1, r2
> mov r2, r5
> bl "<90>w\n "
> mov r0, r5
> mov r1, r4
> mov r2, r6
> ldr r6, .LCPI0_0
> blx r6
> mov r0, r5
> mov r1, r5
> mov r2, r4
> blx r6
>
> regalloc dump (attached) shows:
> Inline spilling tGPR:%9 [80r,152r:0) 0 at 80r weight:3.209746e-03
> From original %3
> also spill snippet...
2014 Mar 14
3
[LLVMdev] [ARM] [PIC] optimizing the loading of hidden global variable
>> Any thoughts?
>
> I'm now struggling to see how GCC justifies it. What if a different
> translation-unit declared those variables in a different order? I also
> can't get the same behaviour here, do you have a more complete
> command-line?
Ah, I see; the translation-unit that does the optimisation needs to
have them as a definition (i.e. "= {0}") rather
2012 Aug 07
2
[LLVMdev] ARM eabi calling convention
...[2 x i32] %0)
nounwindt
It doesn't seem that ARM backend can figure out that "[2 x i32] %0" was
originally a structure consisting of a single double field. When I run llc,
it looks like "%0" is being passed in register r1 and r2.
$ llc vararg1-main.ll -o -
ldr r0, .LCPI0_0
ldm r0, {r1, r2}
.LCPI0_0:
.long .Lmain.s0
...
.Lmain.s0:
.long 0 @ double 2.000000e+00
I am running tests to see if llc targeting mips can correctly compile a
bitcode file generated by clang-arm.
One of the tests is failing, and I was wondering whe...
2017 Jul 01
2
KNL Assembly Code for Matrix Multiplication
Thank You,
It means
vmovdqa64 zmm22, zmmword ptr [rip + .LCPI0_0] # zmm22 = [8,9,10,11,12,13,14,15]
zmm22 will contain 64-bit constant values, which are the indexes here (zmm22 = 8, 9, 10, 11, 12, 13, 14, 15), not the values loaded from these locations, and zmm2 contains the constant 4000. So,
vpmuludq zmm14, zmm10, zmm2
will multiply the index values by 4000, as for arr...
2011 Mar 18
2
[LLVMdev] LLVM ERROR: No such instruction: `vmovsd ...' ?
...00
define i32 @main() {
entry:
%0 = load double* @g
%1 = fmul double 1.000000e+06, %0
store double %1, double* @g
ret i32 0
}
in test.ll and I run
> llc test.ll
> gcc test.s
I get:
test.s:12:no such instruction: `vmovsd _g(%rip), %xmm0'
test.s:13:no such instruction: `vmulsd LCPI0_0(%rip), %xmm0,%xmm0'
test.s:14:no such instruction: `vmovsd %xmm0, _g(%rip)'
I'm completely puzzled. Help?
Thanks!
N
2012 Aug 07
0
[LLVMdev] ARM eabi calling convention
...It doesn't seem that ARM backend can figure out that "[2 x i32] %0" was
> originally a structure consisting of a single double field. When I run llc,
> it looks like "%0" is being passed in register r1 and r2.
>
> $ llc vararg1-main.ll -o -
>
> ldr r0, .LCPI0_0
> ldm r0, {r1, r2}
>
> .LCPI0_0:
> .long .Lmain.s0
> ...
> .Lmain.s0:
> .long 0 @ double 2.000000e+00
>
>
> I am running tests to see if llc targeting mips can correctly compile a
> bitcode file generated by clang-arm.
>...
2014 Oct 07
4
[LLVMdev] Stange behavior in fp arithmetics on x86 (bug possibly)
...And I expected that the minimal positive denormalized double times -0.5 equals
-0.0, so the correct exit code is 1.
llvm-3.4.2 on x86 linux target produced the following assembly:
.file "fpfail.ll"
.section .rodata.cst8,"aM",@progbits,8
.align 8
.LCPI0_0:
.quad -4620693217682128896 # double -0.5
.LCPI0_1:
.quad -9223372036854775808 # double -0
.text
.globl main
.align 16, 0x90
.type main,@function
main: # @main
.cfi_startproc
# BB#0:
vmov...
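The expectation and the two constant-pool entries in this snippet can be checked independently of LLVM. Below is a minimal Python sketch, assuming IEEE 754 binary64 arithmetic with the default round-to-nearest-even mode. The helper name `bits_to_double` is hypothetical and does not come from the thread.

```python
import math
import struct

def bits_to_double(signed_quad: int) -> float:
    """Reinterpret a signed 64-bit integer (as printed by .quad) as a double."""
    return struct.unpack("<d", struct.pack("<q", signed_quad))[0]

# The two .quad values from the constant pool in the snippet.
lcpi0_0 = bits_to_double(-4620693217682128896)  # annotated "double -0.5"
lcpi0_1 = bits_to_double(-9223372036854775808)  # annotated "double -0"

# Smallest positive denormalized double (bit pattern 0x0000000000000001).
min_denormal = bits_to_double(1)

product = min_denormal * lcpi0_0
# The exact result, -2**-1075, lies halfway between -0.0 and -min_denormal;
# round-to-nearest-even picks the neighbour with an even mantissa, i.e. -0.0.
is_negative_zero = product == 0.0 and math.copysign(1.0, product) == -1.0
print(lcpi0_0, lcpi0_1, product, is_negative_zero)
```

The sign has to be read with `math.copysign`, because `-0.0 == 0.0` compares equal in Python.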
2013 Jun 25
2
[LLVMdev] Contants generation - proposal
...Problem description:
On the X86_64 target, the Code Model is "Large". This means that addresses are 64-bit and an IP-relative memory operand can't be used in this case
(because the displacement in an IP-relative memory operand is 32-bit).
To load a constant, we use 2 instructions:
movabsq $.LCPI0_0, %rcx
vmulpd (%rcx), %ymm0, %ymm0
This happens because .LCPI0_0 is in the .rodata section and the instruction itself is in .text.
If I put the constant in .text, the code will look much better:
vmulpd .LCPI0_0(%rip), %ymm0, %ymm0
(2) Proposal
Define one more Code Model, let's say "LargeNearCons...
2011 Oct 17
0
[LLVMdev] LLVM Build Bot failure on llmv-x86_64-ubuntu
...llvm/Debug+Asserts/bin/llc < /home/jabbey/src/osuosl/buildbot/sandbox/llvm-x86_64-ubuntu/llvm-x86_64-ubuntu/llvm/test/CodeGen/X86/mmx-pinsrw.ll -mtriple=x86_64-linux -mattr=+mmx,+sse2
produces:
.file "<stdin>"
.section .rodata.cst16,"aM",@progbits,16
.align 16
.LCPI0_0:
.byte 0 # 0x0
.byte 1 # 0x1
.byte 4 # 0x4
.byte 5 # 0x5
.byte 8 # 0x8
.byte 9 # 0x9
.byte 12 # 0xc
.byte 13 # 0xd
....
2014 Oct 13
2
[LLVMdev] Unexpected spilling of vector register during lane extraction on some x86_64 targets
...ast-math
-march=native -mtune=native -DSPILLING_ENSUES=0 /* no spilling */
$ objdump -dC --no-show-raw-insn ./a.out
...
00000000004005c0 <main>:
4005c0: movdqa 0x1a58(%rip),%xmm0 # 402020 <x>
4005c8: psrld $0x17,%xmm0
4005cd: paddd 0x12b(%rip),%xmm0 # 400700 <.LCPI0_0>
4005d5: cvtdq2ps %xmm0,%xmm1
4005d8: divps 0x131(%rip),%xmm1 # 400710 <.LCPI0_1>
4005df: cvttps2dq %xmm1,%xmm1
4005e3: pmullw 0x135(%rip),%xmm1 # 400720 <.LCPI0_2>
4005eb: psubd %xmm1,%xmm0
4005ef: movq %xmm0,%rax
4005f4: movslq %eax,%rcx
4005f7: sa...
2020 Mar 31
2
[ARM] Register pressure with -mthumb forces register reload before each call
Hi,
Compiling the attached test case, which is a reduced version of
uECC_shared_secret from the tinycrypt library [1], with
--target=arm-linux-gnueabi -march=armv6-m -Oz -S
results in reloading of the register holding the function's address before
every call to blx:
ldr r3, .LCPI0_0
blx r3
mov r0, r6
mov r1, r5
mov r2, r4
ldr r3, .LCPI0_0
blx r3
ldr r3, .LCPI0_0
mov r0, r6
mov r1, r5
mov r2, r4
blx r3
.LCPI0_0:
.long foo
>Fro...
2012 Jan 24
0
[LLVMdev] Use of 'ldrd' instructions with unaligned addresses on armv7 (Major bug in LLVM optimizer?)
On Tue, Jan 24, 2012 at 08:36:17AM -0800, Esperanza de Escobar wrote:
> No one is arguing that there aren't ABI specs or LLVM design
> guidelines that say that unaligned accesses "should not", "could not"
> or "aren't guaranteed to" work, because it's besides the point.
No, it is the core of the issue. Standard C gives the compiler certain
2016 Dec 08
2
visitShiftByConstant of DAGCombiner
...of it:
For code as below:
unsigned array[4];
unsigned foo(unsigned long x) {
return array[(x>>2)&3ul];
}
sequence before this canonicalisation (ARM):
foo:
.fnstart
@ BB#0: @ %entry
lsrs r0, r0, #2
movs r1, #3
ands r1, r0
lsls r0, r1, #2
ldr r1, .LCPI0_0
ldr r0, [r1, r0]
bx lr
.p2align 2
sequence after this canonicalisation:
foo:
.fnstart
@ BB#0: @ %entry
movs r1, #12
ands r1, r0
ldr r0, .LCPI0_0
ldr r0, [r0, r1]
bx lr
.p2align 2
This canonicalisation makes shift folding possible. But I wonder if only
shift contex...
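The two instruction sequences in this snippet compute the same byte offset into `array`. Shifting right by 2, masking with 3, and shifting left by 2 again (scaling by the element size, assumed here to be 4 bytes) is equivalent to masking with 12. Below is a minimal Python sketch that models both sequences on 32-bit values. The function names are hypothetical and only illustrate the equivalence.

```python
MASK32 = 0xFFFFFFFF  # model 32-bit ARM registers

def offset_before(x: int) -> int:
    """Models: lsrs r0, r0, #2; movs r1, #3; ands r1, r0; lsls r0, r1, #2"""
    r0 = (x & MASK32) >> 2
    r1 = 3 & r0
    return (r1 << 2) & MASK32

def offset_after(x: int) -> int:
    """Models: movs r1, #12; ands r1, r0"""
    return 12 & (x & MASK32)

# Only bits 2 and 3 of x affect the result, so a small range plus a few
# wide values covers every distinct case.
samples = list(range(64)) + [0x7FFFFFFF, 0x80000000, 0xFFFFFFFF, 0xDEADBEEF]
all_equal = all(offset_before(x) == offset_after(x) for x in samples)
print(all_equal)
```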
2020 Apr 15
4
[ARM] Register pressure with -mthumb forces register reload before each call
...mov r6, r2
mov r5, r1
mov r4, r0
bl "<90>w\n "
mov r1, r2
mov r2, r5
bl "<90>w\n "
mov r0, r5
mov r1, r4
mov r2, r6
ldr r6, .LCPI0_0
blx r6
mov r0, r5
mov r1, r5
mov r2, r4
blx r6
regalloc dump (attached) shows:
Inline spilling tGPR:%9 [80r,152r:0) 0 at 80r weight:3.209746e-03
From original %3
also spill snippet %8 [152r,232r:0) 0 at 152r weight:2.104167e...
2020 Apr 07
2
[ARM] Register pressure with -mthumb forces register reload before each call
...ling attached test-case, which is reduced version of
> uECC_shared_secret from tinycrypt library [1], with
> --target=arm-linux-gnueabi -march=armv6-m -Oz -S
> results in reloading of register holding function's address before
> every call to blx:
>
> ldr r3, .LCPI0_0
> blx r3
> mov r0, r6
> mov r1, r5
> mov r2, r4
> ldr r3, .LCPI0_0
> blx r3
> ldr r3, .LCPI0_0
> mov r0, r6
> mov r1, r5
> mov r2, r4
>...
2012 Aug 07
0
[LLVMdev] ARM eabi calling convention
On Aug 6, 2012, at 3:21 PM, Akira Hatanaka <ahatanak at gmail.com> wrote:
> When I compile this program
>
> $ cat vararg1-main.c
>
> typedef struct {
> double d;
> } S0;
>
> S0 g1;
>
> void foo0(int a, ...);
>
> int main(int argc, char **argv) {
> S0 s0 = { 2.0 };
>
> foo0(1, s0);
>
> printf("%f\n", g1.d);
>
2010 Nov 25
2
[LLVMdev] ARM Intruction Constraint DestReg!=SrcReg patch?
...------------- next part --------------
.syntax unified
.cpu arm10tdmi
.eabi_attribute 10, 2
.eabi_attribute 20, 1
.eabi_attribute 21, 1
.eabi_attribute 23, 3
.eabi_attribute 24, 1
.eabi_attribute 25, 1
.file "foo.c"
.text
.globl foo
.align 2
.type foo,%function
foo:
ldr r1, .LCPI0_0
ldr r0, [r1]
mov r2, #123
mul r0, r0, r2
mov r2, #15
orr r2, r2, #15, 24
add r0, r0, #114, 30
and r0, r0, r2
str r0, [r1]
bx lr
.align 2
.LCPI0_0:
.long bar
.Ltmp0:
.size foo, .Ltmp0-foo
.type bar,%object
.comm bar,4,4
.ident "GCC: (GNU) 4.2.1 (Based on Apple Inc. build 5658)...
2012 Aug 07
2
[LLVMdev] ARM eabi calling convention
...RM backend can figure out that "[2 x i32] %0" was
>> originally a structure consisting of a single double field. When I run llc,
>> it looks like "%0" is being passed in register r1 and r2.
>>
>> $ llc vararg1-main.ll -o -
>>
>> ldr r0, .LCPI0_0
>> ldm r0, {r1, r2}
>>
>> .LCPI0_0:
>> .long .Lmain.s0
>> ...
>> .Lmain.s0:
>> .long 0 @ double 2.000000e+00
>>
>>
>> I am running tests to see if llc targeting mips can correctly compile a
>...
2012 Aug 06
2
[LLVMdev] ARM eabi calling convention
When I compile this program
$ cat vararg1-main.c
typedef struct {
double d;
} S0;
S0 g1;
void foo0(int a, ...);
int main(int argc, char **argv) {
S0 s0 = { 2.0 };
foo0(1, s0);
printf("%f\n", g1.d);
return 0;
}
with this command,
$ clang -target arm-none-linux-gnueabi-gcc -ccc-clang-archs armv7
-emit-llvm vararg1-main.c -S -o vararg1-main.ll -O3
I get this
2014 Mar 14
2
[LLVMdev] [ARM] [PIC] optimizing the loading of hidden global variable
...tmp3-.L0$pb), %ebx
leal g0@GOTOFF(%ebx), %eax
movl %eax, (%esp)
calll bar@PLT
leal g1@GOTOFF(%ebx), %eax
movl %eax, (%esp)
calll bar@PLT
Which is OK, since the add of ebx is folded and the constant is an immediate in x86.
On ARM, that is not the case. We produce
ldr r0, .LCPI0_0
add r4, pc, r0 // r4 is the equivalent of ebx in the x86 case.
ldr r0, .LCPI0_1 // r0 is the constant that is an immediate in x86.
add r0, r0, r4 // that is the add that is folded in x86
...
.LCPI0_0:
.long _GLOBAL_OFFSET_TABLE_-(.LPC0_0+8)
.LCPI0_1:...