Displaying 20 results from an estimated 67 matches for "vmov".
2013 Jul 01
3
[LLVMdev] Advices Required: Best practice to share logic between DAG combine and target lowering?
...tcasts survive until instruction selection. In that case, they incur moves between integer unit and floating point unit that may result in inefficient code.
Attached motivating_example.ll shows such a case:
llc -O3 -mtriple thumbv7-apple-ios3 motivating_example.ll -o -
ldr r0, [r1]
ldr r1, [r2]
vmov s1, r1
vmov s0, r0
Here each ldr/vmov sequence could have been replaced by a simple vld1.32.
** Proposed Solution **
Lower to more vector-friendly code (using a sequence of insert_vector_elt) when bitcasts will not be free.
The attached patch demonstrates that, but is missing the proper check...
2011 Nov 12
2
[LLVMdev] Thumb-2 code generation error in Apple LLVM at all optimization levels
...pon closer examination I think it may
actually be initialized from a program counter-relative 32-bit .long
constant immediately following my method's code.
.loc 1 388 3
ldr r0, [r5]
ldr r1, [r4, r0]
adds r1, #1
str r1, [r4, r0]
.loc 1 390 64
mov r0, r4
ldr r1, [r6]
blx _objc_msgSend
vmov s0, r0
vmul.f32 d0, d0, d8
vcvt.u32.f32 d0, d0
vmov r0, s0
Ltmp272:
.loc 1 392 9
cmp.w r0, #4000
Ltmp273:
.loc 1 393 13
it hs
blxhs _usleep
cmp.w *looks* like a 16-bit comparison with an immediate constant, but
in reality the constant is twelve bits. The ARM and Thumb instruction
sets hav...
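The twelve-bit field the poster refers to is Thumb-2's "modified immediate" encoding. It is an 8-bit value that is either replicated across the word (0x000000XY, 0x00XY00XY, 0xXY00XY00, 0xXYXYXYXY) or has bit 7 set and is rotated right by 8 to 31 bits. The checker below is a sketch based on that description, which reflects my reading of the ARMv7 encoding rules and is not code taken from LLVM. The names is_t2_modified_imm and rol32 are hypothetical. By this check, #4000 (0xFA0, i.e. 0xFA rotated right by 28) is encodable, which is why cmp.w can take it directly.

```c
#include <stdbool.h>
#include <stdint.h>

/* Rotate a 32-bit value left by r bits (r taken modulo 32). */
static inline uint32_t rol32(uint32_t v, unsigned r)
{
    r &= 31u;
    return r ? (v << r) | (v >> (32u - r)) : v;
}

/* Hypothetical helper: can v be encoded as a Thumb-2 modified immediate? */
static inline bool is_t2_modified_imm(uint32_t v)
{
    uint32_t lo = v & 0xFFu;          /* byte at bits 0-7  */
    uint32_t hi = (v >> 8) & 0xFFu;   /* byte at bits 8-15 */

    if (v <= 0xFFu)                    return true; /* 0x000000XY */
    if (v == (lo | (lo << 16)))        return true; /* 0x00XY00XY */
    if (v == ((hi << 8) | (hi << 24))) return true; /* 0xXY00XY00 */
    if (v == lo * 0x01010101u)         return true; /* 0xXYXYXYXY */

    /* Otherwise v must be a byte in 0x80..0xFF rotated right by 8..31,
       i.e. rotating v left by the same amount yields 0x80..0xFF. */
    for (unsigned r = 8; r < 32; ++r) {
        uint32_t u = rol32(v, r);
        if (u >= 0x80u && u <= 0xFFu)
            return true;
    }
    return false;
}
```

The rotation loop starts at 8 because smaller rotations of a byte with bit 7 set either fall into the plain 0x000000XY case or are not representable at all.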
2015 Dec 04
4
[RFC] [ARM] Execute only support
...passes said attribute to
LLVM.
If execute only is enabled:
- Instead of using integer literal pools, use movw/movt to
construct the literals. This means this feature is only available for
sub-targets that support these instructions.
- For floating point literals, use movw/movt/vmov instead of a
literal pool.
- Move jump tables to data sections.
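The movw/movt scheme above splits a 32-bit constant into two 16-bit halves. movw writes the low half and clears the top half, and movt then overwrites only the top half. For a floating-point literal, the same pair builds the IEEE-754 bit pattern in a core register, and a vmov copies those bits unchanged into an S register.

The C sketch below models those three steps on any host. The helper names are hypothetical, and memcpy stands in for the bit-for-bit vmov transfer. It assumes a 32-bit IEEE-754 float.

```c
#include <stdint.h>
#include <string.h>

/* :lower16: and :upper16: halves of a 32-bit constant. */
static inline uint16_t lower16(uint32_t v) { return (uint16_t)(v & 0xFFFFu); }
static inline uint16_t upper16(uint32_t v) { return (uint16_t)(v >> 16); }

/* movw rd, #imm16: rd = imm16, upper half cleared. */
static inline uint32_t model_movw(uint16_t imm) { return imm; }

/* movt rd, #imm16: replace bits 16-31, keep bits 0-15. */
static inline uint32_t model_movt(uint32_t rd, uint16_t imm)
{
    return (rd & 0xFFFFu) | ((uint32_t)imm << 16);
}

/* vmov sN, rM: copy the 32 bits unchanged into a single-precision register. */
static inline float model_vmov_to_s(uint32_t bits)
{
    float f;
    memcpy(&f, &bits, sizeof f);   /* assumes 32-bit IEEE-754 float */
    return f;
}

/* Materialize a float literal without a literal pool: movw + movt + vmov. */
static inline float materialize_float(float literal)
{
    uint32_t bits;
    memcpy(&bits, &literal, sizeof bits);
    uint32_t r = model_movw(lower16(bits));
    r = model_movt(r, upper16(bits));
    return model_vmov_to_s(r);
}
```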
This is basically a re-implementation of a feature that is found in the ARM
Compiler (http://infocenter.arm.com/help/topic/com.arm.doc.dui0471l/chr1368698593511.html).
Would such a feature be accepted upstream?
T...
2010 Nov 12
2
[LLVMdev] Simple NEON optimization
...optimization in a NEON case I've seen
these days, most as a matter of exercise, but it also simplifies (just
a bit) the code generated.
The case is simple:
uint32x2_t x, res;
res = vceq_u32(x, vcreate_u32(0));
This will generate the following code:
; zero d16
vmov.i32 d16, #0x0
; load a into d17
movw r0, :lower16:a
movt r0, :upper16:a
vld1.32 {d17}, [r0]
; compare two registers
vceq.i32 d17, d17, d16
But, because the vector is zero, and there is a NEON instruction to
compare against an imme...
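Lane by lane, vceq_u32 sets a lane to all ones when the two inputs are equal and to zero otherwise. Comparing against a zero vector therefore gives the same result as comparing each lane with #0. The NEON immediate form (vceq.i32 d17, d17, #0) does exactly that, without first zeroing d16.

The scalar model below runs on any host. It uses a hypothetical u32x2 struct in place of uint32x2_t, and both model_* function names are illustrative only.

```c
#include <stdint.h>

/* Hypothetical portable stand-in for NEON's uint32x2_t. */
typedef struct { uint32_t lane[2]; } u32x2;

/* Model of vceq_u32(a, b): each lane is 0xFFFFFFFF if equal, else 0. */
static inline u32x2 model_vceq_u32(u32x2 a, u32x2 b)
{
    u32x2 r;
    for (int i = 0; i < 2; ++i)
        r.lane[i] = (a.lane[i] == b.lane[i]) ? UINT32_MAX : 0u;
    return r;
}

/* Model of the compare-with-#0 form: no zeroed register is needed. */
static inline u32x2 model_vceqz_u32(u32x2 a)
{
    u32x2 r;
    for (int i = 0; i < 2; ++i)
        r.lane[i] = (a.lane[i] == 0u) ? UINT32_MAX : 0u;
    return r;
}
```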
2011 Sep 01
0
[PATCH 5/5] resample: Add NEON optimized inner_product_single for floating point
..._to_16bit(int32_t a) {
+ int32_t ret;
+ asm ("ssat %[ret], #16, %[a]"
+ : [ret] "=&r" (ret)
+ : [a] "r" (a)
+ : );
+ return ret;
+}
+#else
+static inline int32_t saturate_32bit_to_16bit(int32_t a) {
+ int32_t ret;
+ asm ("vmov.s32 d0[0], %[a]\n"
+ "vqmovn.s32 d0, q0\n"
+ "vmov.s16 %[ret], d0[0]\n"
+ : [ret] "=&r" (ret)
+ : [a] "r" (a)
+ : "q0");
+ return ret;
+}
+#endif
+#undef WORD2INT
+#define WORD2INT(x) (saturate_32b...
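Both asm variants in the patch compute the same thing: they clamp a 32-bit value to the signed 16-bit range [-32768, 32767]. ssat #16 does this directly. vqmovn.s32 does it as a saturating narrow, and vmov.s16 then sign-extends lane 0 back into a core register.

A portable C reference, handy for checking either asm version on a host, might look like the sketch below. The _ref name is hypothetical.

```c
#include <stdint.h>

/* Portable reference for saturate_32bit_to_16bit: clamp to the int16_t range. */
static inline int32_t saturate_32bit_to_16bit_ref(int32_t a)
{
    if (a > INT16_MAX) return INT16_MAX;
    if (a < INT16_MIN) return INT16_MIN;
    return a;
}
```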
2012 Jul 05
2
[LLVMdev] RE : Vector argument passing abi for ARM ?
....1 generated code contains a misaligned load:
bar: @ @bar
@ BB#0: @ %L.entry
push {r11, lr}
add r0, r1, #2
vldr s0, [r1]
vldr s2, [r0] # <= here load is misaligned
vmovl.u8 q8, d0
vmovl.u8 q9, d1
vmovl.u16 q8, d16
vmovl.u16 q9, d18
vmov r0, r1, d16
vmov r2, r3, d18
bl zzz(PLT)
pop {r11, pc}
with LLVM trunk, assembly looks like:
bar: @ @bar
@ BB#0: @ %L...
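The flagged load is misaligned because r0 = r1 + 2: if r1 is word-aligned, r1 + 2 cannot be. VLDR expects a word-aligned address; this is my understanding of the ARMv7 alignment rules and should be treated as an assumption.

The C sketch below shows the alignment test and a byte-wise load that is safe at any address. Both helper names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* True if addr is a multiple of align (align must be a power of two). */
static inline bool is_aligned(uintptr_t addr, uintptr_t align)
{
    return (addr & (align - 1u)) == 0u;
}

/* Load two bytes as a little-endian 16-bit value; safe at any alignment. */
static inline uint16_t load_le16(const uint8_t *p)
{
    return (uint16_t)(p[0] | ((uint16_t)p[1] << 8));
}
```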
2011 May 26
2
[LLVMdev] LLVM CodeGen Engineer job opening with Apple's compiler team
Hi all,
The LLVM CodeGen and Tools team at Apple is looking for exceptional compiler engineers. This is a great opportunity to work with many of the leaders in the LLVM community.
If you are interested in this position, please send your resume / CV and relevant information to evan.cheng at apple.com
Thanks,
Evan
Job description
The Apple compiler team is seeking an engineer who is strongly
2011 May 27
1
[LLVMdev] Question about ARM/vfp/NEON code generation
...t
.align 2
_FloatingPointTest: @ @FloatingPointTest
@ BB#0: @ %entry
sub sp, sp, #8
str lr, [sp, #4]
str r7, [sp]
mov r7, sp
sub sp, sp, #36
str r0, [r7, #-4]
vmov s0, r0
str r1, [r7, #-8]
vmov s1, r1
str r2, [r7, #-12]
vmov s2, r2
vldr.32 s3, [r7, #-4]
vldr.32 s4, [r7, #-8]
vmul.f32 s3, s3, s4
vstr.32 s3, [r7, #-16]
vldr.32 s4, [r7, #-12]
vcmpe.f32 s...
2016 Jul 30
2
Cannot compile speexdsp 1.2rc3 on ARM64
...ot;fmov %w[ret], s0\n"
> : [ret] "=&r" (ret)
> : [a] "w" (a)
> : "v0" );
> return ret;
> }
> #elif defined(__ARM_NEON__)
> static inline int32_t saturate_32bit_to_16bit(int32_t a) {
> int32_t ret;
> asm volatile ("vmov.s32 d24[0], %[a]\n"
> "vqmovn.s32 d24, q12\n"
> "vmov.s16 %[ret], d24[0]\n"
> : [ret] "=&r" (ret)
> : [a] "r" (a)
> : "q12", "d24", "d25" );
> return ret;
> }
> #else
>...
2010 Nov 12
2
[LLVMdev] Simple NEON optimization
...ovember 2010 17:52, Bob Wilson <bob.wilson at apple.com> wrote:
> I recommend implementing this as a target-specific DAG combine optimization. We already have target-specific DAG nodes for the relevant NEON comparison operations (ARMISD::VCEQ, etc. -- see ARMISelLowering.h) as well as the vmov (ARMISD::VMOVIMM). You just need to teach the DAG combiner how to fold them together. Here's what you need to do (all of this code is in ARMISelLowering.cpp):
Hi Bob,
I thought so... I'll get cracking and see if I can generate some simple tests.
Thank you very much for the detailed expl...
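Bob Wilson's suggestion boils down to a pattern rewrite. When one operand of a VCEQ node is a VMOVIMM that materializes zero, the pair is replaced with a single compare-against-zero node.

The toy C model below shows only the shape of that fold. It is not LLVM's SelectionDAG API: the Node struct and the OP_* opcodes are hypothetical, with names chosen to echo ARMISD::VCEQ and ARMISD::VMOVIMM. In the toy, imm holds the splatted value itself rather than an encoded NEON immediate.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical toy opcodes; not LLVM's data structures. */
typedef enum { OP_REG, OP_VMOVIMM, OP_VCEQ, OP_VCEQZ } Op;

typedef struct Node {
    Op op;
    uint64_t imm;              /* splatted value for OP_VMOVIMM */
    struct Node *lhs, *rhs;    /* operands, NULL when unused */
} Node;

static inline int is_zero_splat(const Node *n)
{
    return n != NULL && n->op == OP_VMOVIMM && n->imm == 0;
}

/* Rewrite VCEQ(x, 0) or VCEQ(0, x) into VCEQZ(x) in place; 1 if folded. */
static inline int combine_vceq(Node *n)
{
    if (n->op != OP_VCEQ)
        return 0;
    Node *other = NULL;
    if (is_zero_splat(n->rhs))
        other = n->lhs;
    else if (is_zero_splat(n->lhs))
        other = n->rhs;          /* equality is commutative */
    if (other == NULL)
        return 0;
    n->op = OP_VCEQZ;
    n->lhs = other;
    n->rhs = NULL;
    return 1;
}
```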
2014 Mar 10
4
[LLVMdev] neon registers llvm using
Hi, Everyone:
Can anyone let me know which NEON registers llvm uses by default on armv7 devices?
For example, are d10 and d11 treated as zero by default? I am using Xcode5 + llvm and I got a case where the compiler generates neon code
" vst.8 {d10, d11}, [r1] "
from C code:
"int aMV[4];
......
aMV[0] = aMV[1] = aMV[2] = aMV[3] = 0; "
and I
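AArch32 NEON has no register that always reads as zero, so d10 and d11 are not zero by default. A store like the quoted one is only correct if the compiler zeroed that register pair earlier, for example with a vmov.i32 #0 in the elided part of the listing.

With 4-byte int, as on armv7, the four chained assignments are simply a 16-byte clear of aMV. That is why a single 128-bit store can implement them. The C check below only demonstrates this equivalence; it says nothing about the exact code Xcode emits, and both function names are hypothetical.

```c
#include <string.h>

/* The chained assignment from the question, wrapped in a function. */
static inline void clear_chained(int aMV[4])
{
    aMV[0] = aMV[1] = aMV[2] = aMV[3] = 0;
}

/* The same effect as one contiguous clear of 4 * sizeof(int) bytes. */
static inline void clear_memset(int aMV[4])
{
    memset(aMV, 0, 4 * sizeof aMV[0]);
}
```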
2018 Jul 30
2
how to build NE10 Project using llvm compiler
.../A72), and I want to use the Ne10 Project library
(https://projectne10.github.io/Ne10/) with llvm compiler 3.8.1.1.
When compiling the project file I get the following errors :
./NE10_abs.asm.s:59:9: error: unrecognized instruction mnemonic
vmov s2, r3
^
../NE10_abs.asm.s:62:9: error: unrecognized instruction mnemonic
vldr s1, [r1]
^
../NE10_abs.asm.s:63:13: error: invalid operand for instruction
add r1, r1, #4
Can you advise how to handle it?
Re,
Yehuda Marko
Yehuda.Marko at scaleil.com...
2017 Mar 21
3
clang 4.0.0: Invalid code for builtin floating point function with -mfloat-abi=hard -ffast-math (ARM)
Hello,
clang/llvm 4.0.0 generates invalid calls for builtin functions with
-mfloat-abi=hard -ffast-math.
Small example fail.c:
// clang -O2 -target armv7a-none-none-eabi -mfloat-abi=hard -ffast-math -S fail.c -o -
extern float sinf (float x);
float sin1 (float x) {return (sinf (x));}
generates code that passes the parameter in r0 and expects the result in r0.
The same code without
2011 Sep 01
6
[PATCH 0/5] ARM NEON optimization for samplerate converter
From: Jyri Sarha <jsarha at ti.com>
I optimized Speex resampler for NEON capable ARM CPUs. The first patch
should speed up resampling on any platform that can spare the
increased memory usage. It would be nice to have these merged to the
master branch. Please let me know if there is anything I can do to
help the merge. The patches have been rebased on top of master
branch in
2012 Jul 05
0
[LLVMdev] Vector argument passing abi for ARM ?
Hi Sebastien,
> Thanks for the quick answer, how do I know which type is legal/illegal with respect to calling convention ?
the code generators are supposed to produce working code no matter what the
parameter type is. The fact that the ARM ABI doesn't specify how <2 x i8>
is passed just means that the code generators can pass it using whatever
technique it feels like (since it