thr3ads.net - search: "vmov"

[LLVMdev] Advices Required: Best practice to share logic between DAG combine and target lowering?

2013 Jul 01

3

...tcasts survive until instruction selection. In that case, they incur moves between the integer unit and the floating-point unit that may result in inefficient code.

Attached motivating_example.ll shows such a case:

    llc -O3 -mtriple thumbv7-apple-ios3 motivating_example.ll -o -
        ldr r0, [r1]
        ldr r1, [r2]
        vmov s1, r1
        vmov s0, r0

Here each ldr, vmov sequence could have been replaced by a simple vld1.32.

** Proposed Solution **
Lower to more vector-friendly code (using a sequence of insert_vector_elt) when bit casts will not be free. The attached patch demonstrates that, but is missing the proper check...

[LLVMdev] Thumb-2 code generation error in Apple LLVM at all optimization levels

2011 Nov 12

2

...pon closer examination I think it may actually be initialized from a program counter-relative 32-bit .long constant immediately following my method's code.

    .loc 1 388 3
    ldr r0, [r5]
    ldr r1, [r4, r0]
    adds r1, #1
    str r1, [r4, r0]
    .loc 1 390 64
    mov r0, r4
    ldr r1, [r6]
    blx _objc_msgSend
    vmov s0, r0
    vmul.f32 d0, d0, d8
    vcvt.u32.f32 d0, d0
    vmov r0, s0
Ltmp272:
    .loc 1 392 9
    cmp.w r0, #4000
Ltmp273:
    .loc 1 393 13
    it hs
    blxhs _usleep

cmp.w *looks* like a 16-bit comparison with an immediate constant, but in reality the constant is twelve bits. The ARM and Thumb instruction sets hav...
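The remark about cmp.w refers to how 32-bit Thumb-2 data-processing instructions carry their immediate: twelve encoding bits are expanded into a 32-bit value rather than stored directly. Below is a minimal sketch in C of that expansion rule, as I understand it from the ThumbExpandImm pseudocode in the ARMv7 architecture manual. The helper name is hypothetical and nothing here is taken from the thread itself.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper (not from the thread): returns true if v can be
 * expressed as a Thumb-2 "modified immediate", the 12-bit encoded field
 * used by instructions such as cmp.w. */
static bool is_thumb2_modified_imm(uint32_t v)
{
    /* Plain 8-bit value: 0x000000XY. */
    if (v <= 0xFFu)
        return true;

    /* Byte-replicated patterns: 0x00XY00XY, 0xXY00XY00, 0xXYXYXYXY. */
    uint32_t lo = v & 0xFFu;
    uint32_t hi = (v >> 8) & 0xFFu;
    if (v == (lo | (lo << 16)))
        return true;
    if (v == ((hi << 8) | (hi << 24)))
        return true;
    if (v == lo * 0x01010101u)
        return true;

    /* Otherwise: an 8-bit value with its top bit set, rotated right by
     * 8..31. Since v > 0xFF here, that is equivalent to all set bits
     * fitting in an 8-bit window that ends at the highest set bit. */
    int top = 31;
    while (((v >> top) & 1u) == 0)
        top--;
    uint32_t below_window = (1u << (top - 7)) - 1u;
    return (v & below_window) == 0;
}
```

Under this rule 4000 (0xFA0) is 0xFA shifted left by four bits, so it fits the encoding, whereas a value such as 0xFFFF does not.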

[LLVMdev] Advices Required: Best practice to share logic between DAG combine and target lowering?

2013 Jul 01

0

...> selection. In that case, they incur moves between integer unit and floating
> point unit that may result in inefficient code.
>
> Attached motivating_example.ll shows such a case:
> llc -O3 -mtriple thumbv7-apple-ios3 motivating_example.ll -o -
> ldr r0, [r1]
> ldr r1, [r2]
> vmov s1, r1
> vmov s0, r0
> Here each ldr, vmov sequences could have been replaced by a simple vld1.32.
>
> ** Proposed Solution **
> Lower to more vector friendly code (using a sequence of
> insert_vector_elt), when bit casts will not be free.
> The attached patch demonstrates that...

[LLVMdev] Advices Required: Best practice to share logic between DAG combine and target lowering?

2013 Jul 01

3

...on selection. In that case, they incur moves between integer unit and floating point unit that may result in inefficient code.
>
> Attached motivating_example.ll shows such a case:
> llc -O3 -mtriple thumbv7-apple-ios3 motivating_example.ll -o -
> ldr r0, [r1]
> ldr r1, [r2]
> vmov s1, r1
> vmov s0, r0
> Here each ldr, vmov sequences could have been replaced by a simple vld1.32.
>
> ** Proposed Solution **
> Lower to more vector friendly code (using a sequence of insert_vector_elt), when bit casts will not be free.
> The attached patch demonstrates that, b...

[LLVMdev] Advices Required: Best practice to share logic between DAG combine and target lowering?

2013 Jul 01

0

...hey incur moves between integer unit and floating
>> point unit that may result in inefficient code.
>>
>> Attached motivating_example.ll shows such a case:
>> llc -O3 -mtriple thumbv7-apple-ios3 motivating_example.ll -o -
>> ldr r0, [r1]
>> ldr r1, [r2]
>> vmov s1, r1
>> vmov s0, r0
>> Here each ldr, vmov sequences could have been replaced by a simple
>> vld1.32.
>>
>> ** Proposed Solution **
>> Lower to more vector friendly code (using a sequence of
>> insert_vector_elt), when bit casts will not be free.
>>...

[RFC] [ARM] Execute only support

2015 Dec 04

4

...passes said attribute to LLVM.

If execute only is enabled:
- Instead of using integer literal pools, use movw/movt to construct the literals. This means this feature is only available for sub-targets that support these instructions.
- For floating point literals, use movw/movt/vmov instead of a literal pool.
- Move jump tables to data sections.

This is basically a re-implementation of a feature that is found in the ARM Compiler (http://infocenter.arm.com/help/topic/com.arm.doc.dui0471l/chr1368698593511.html).

Would such a feature be accepted upstream? T...
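To make the movw/movt idea concrete, here is a small sketch in C of how a 32-bit literal splits into the two 16-bit halves that such an instruction pair would carry. It also covers the float case by taking the value's bit pattern first. The struct and function names are hypothetical, the float path assumes a 32-bit IEEE-754 float, and none of this is taken from the RFC's patch.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical illustration: the two 16-bit immediates a movw/movt pair
 * would need in order to build a 32-bit constant without a literal pool. */
struct imm_halves {
    uint16_t lower16; /* operand for movw (sets the low half, clears the high half) */
    uint16_t upper16; /* operand for movt (sets the high half, keeps the low half) */
};

static struct imm_halves split_u32(uint32_t value)
{
    struct imm_halves h;
    h.lower16 = (uint16_t)(value & 0xFFFFu);
    h.upper16 = (uint16_t)(value >> 16);
    return h;
}

/* Float literal: take the raw bit pattern, then split it the same way.
 * Assumes float is a 32-bit IEEE-754 type. The integer register built by
 * movw/movt would then be moved to an FP register with vmov. */
static struct imm_halves split_f32(float value)
{
    uint32_t bits;
    memcpy(&bits, &value, sizeof bits);
    return split_u32(bits);
}

/* Recombine the halves, mirroring what the register holds after movt. */
static uint32_t join_halves(struct imm_halves h)
{
    return ((uint32_t)h.upper16 << 16) | h.lower16;
}
```

For example, 1.0f has the bit pattern 0x3F800000, so the pair would be movw with 0x0000 and movt with 0x3F80, followed by a vmov into an s register.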

[LLVMdev] Simple NEON optimization

2010 Nov 12

2

...optimization in a NEON case I've seen these days, most as a matter of exercise, but it also simplifies (just a bit) the code generated.

The case is simple:

    uint32x2_t x, res;
    res = vceq_u32(x, vcreate_u32(0));

This will generate the following code:

    ; zero d16
    vmov.i32 d16, #0x0
    ; load a into d17
    movw r0, :lower16:a
    movt r0, :upper16:a
    vld1.32 {d17}, [r0]
    ; compare two registers
    vceq.i32 d17, d17, d16

But, because the vector is zero, and there is a NEON instruction to compare against an imme...

[PATCH 5/5] resample: Add NEON optimized inner_product_single for floating point

2011 Sep 01

0

..._to_16bit(int32_t a) {
+    int32_t ret;
+    asm ("ssat %[ret], #16, %[a]"
+         : [ret] "=&r" (ret)
+         : [a] "r" (a)
+         : );
+    return ret;
+}
+#else
+static inline int32_t saturate_32bit_to_16bit(int32_t a) {
+    int32_t ret;
+    asm ("vmov.s32 d0[0], %[a]\n"
+         "vqmovn.s32 d0, q0\n"
+         "vmov.s16 %[ret], d0[0]\n"
+         : [ret] "=&r" (ret)
+         : [a] "r" (a)
+         : "q0");
+    return ret;
+}
+#endif
+#undef WORD2INT
+#define WORD2INT(x) (saturate_32b...
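For readers without an ARM toolchain at hand, the operation both inline-assembly variants appear to implement (clamping a 32-bit value to the signed 16-bit range) can be sketched in portable C. The `_ref` name is hypothetical and this fallback is not part of the patch; it is only a reference for what ssat #16 or the vqmovn.s32 sequence should return.

```c
#include <stdint.h>

/* Hypothetical portable reference (not from the patch): clamp a 32-bit
 * signed value to the signed 16-bit range [-32768, 32767]. */
static inline int32_t saturate_32bit_to_16bit_ref(int32_t a)
{
    if (a > INT16_MAX)
        return INT16_MAX;
    if (a < INT16_MIN)
        return INT16_MIN;
    return a;
}
```

A function like this can also serve as a check when testing the assembly versions on target hardware, by comparing results over a range of inputs.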

[LLVMdev] RE : Vector argument passing abi for ARM ?

2012 Jul 05

2

....1 generated code contains a misaligned load:

bar:    @ @bar
@ BB#0:    @ %L.entry
    push {r11, lr}
    add r0, r1, #2
    vldr s0, [r1]
    vldr s2, [r0]    # <= here load is misaligned
    vmovl.u8 q8, d0
    vmovl.u8 q9, d1
    vmovl.u16 q8, d16
    vmovl.u16 q9, d18
    vmov r0, r1, d16
    vmov r2, r3, d18
    bl zzz(PLT)
    pop {r11, pc}

with LLVM trunk, assembly looks like:

bar:    @ @bar
@ BB#0:    @ %L...

[LLVMdev] LLVM CodeGen Engineer job opening with Apple's compiler team

2011 May 26

2

Hi all,

LLVM CodeGen and Tools team at Apple is looking for exceptional compiler engineers. This is a great opportunity to work with many of the leaders in the LLVM community. If you are interested in this position, please send your resume / CV and relevant information to evan.cheng at apple.com

Thanks,
Evan

Job description
The Apple compiler team is seeking an engineer who is strongly

[LLVMdev] Question about ARM/vfp/NEON code generation

2011 May 27

1

...t
    .align 2
_FloatingPointTest:    @ @FloatingPointTest
@ BB#0:    @ %entry
    sub sp, sp, #8
    str lr, [sp, #4]
    str r7, [sp]
    mov r7, sp
    sub sp, sp, #36
    str r0, [r7, #-4]
    vmov s0, r0
    str r1, [r7, #-8]
    vmov s1, r1
    str r2, [r7, #-12]
    vmov s2, r2
    vldr.32 s3, [r7, #-4]
    vldr.32 s4, [r7, #-8]
    vmul.f32 s3, s3, s4
    vstr.32 s3, [r7, #-16]
    vldr.32 s4, [r7, #-12]
    vcmpe.f32 s...

[LLVMdev] Simple NEON optimization

2010 Nov 12

0

...s, most as a matter of exercise, but it also simplifies (just
> a bit) the code generated.
>
> The case is simple:
>
> uint32x2_t x, res;
> res = vceq_u32(x, vcreate_u32(0));
>
> This will generate the following code:
>
> ; zero d16
> vmov.i32 d16, #0x0
> ; load a into d17
> movw r0, :lower16:a
> movt r0, :upper16:a
> vld1.32 {d17}, [r0]
> ; compare two registers
> vceq.i32 d17, d17, d16
>
> But, because the vector is zero, and there is a NEON inst...

[LLVMdev] RE : Vector argument passing abi for ARM ?

2012 Jul 05

0

...misaligned load:
>
> bar: @ @bar
> @ BB#0: @ %L.entry
> push {r11, lr}
> add r0, r1, #2
> vldr s0, [r1]
> vldr s2, [r0] # <= here load is misaligned
> vmovl.u8 q8, d0
> vmovl.u8 q9, d1
> vmovl.u16 q8, d16
> vmovl.u16 q9, d18
> vmov r0, r1, d16
> vmov r2, r3, d18
> bl zzz(PLT)
> pop {r11, pc}
>
> with LLVM trunk, assembly looks like:
>
> bar:...

Cannot compile speexdsp 1.2rc3 on ARM64

2016 Jul 30

2

..."fmov %w[ret], s0\n"
> : [ret] "=&r" (ret)
> : [a] "w" (a)
> : "v0" );
> return ret;
> }
> #elif defined(__ARM_NEON__)
> static inline int32_t saturate_32bit_to_16bit(int32_t a) {
> int32_t ret;
> asm volatile ("vmov.s32 d24[0], %[a]\n"
> "vqmovn.s32 d24, q12\n"
> "vmov.s16 %[ret], d24[0]\n"
> : [ret] "=&r" (ret)
> : [a] "r" (a)
> : "q12", "d24", "d25" );
> return ret;
> }
> #else
>...

[LLVMdev] Simple NEON optimization

2010 Nov 12

2

...ovember 2010 17:52, Bob Wilson <bob.wilson at apple.com> wrote:
> I recommend implementing this as a target-specific DAG combine optimization. We already have target-specific DAG nodes for the relevant NEON comparison operations (ARMISD::VCEQ, etc. -- see ARMISelLowering.h) as well as the vmov (ARMISD::VMOVIMM). You just need to teach the DAG combiner how to fold them together. Here's what you need to do (all of this code is in ARMISelLowering.cpp):

Hi Bob, I thought so... I'll get cracked and see if I can generate some simple tests. Thank you very much for the detailed expl...

[LLVMdev] neon registers llvm using

2014 Mar 10

4

Hi everyone, can anyone tell me which NEON registers llvm uses by default on armv7 devices? For example, are d10 and d11 treated as zero by default? I am using Xcode5 + llvm and I have a case where the compiler generates the NEON code "vst.8 {d10, d11}, [r1]" from the C code "int aMV[4]; ...... aMV[0] = aMV[1] = aMV[2] = aMV[3] = 0;" and I

how to build NE10 Project using llvm compiler

2018 Jul 30

2

.../A72), and I want to use the NE10 Project library and llvm compiler 3.8.1.1 (https://projectne10.github.io/Ne10/).

When compiling the project file I get the following errors:

./NE10_abs.asm.s:59:9: error: unrecognized instruction mnemonic
vmov s2, r3
^
../NE10_abs.asm.s:62:9: error: unrecognized instruction mnemonic
vldr s1, [r1]
^
../NE10_abs.asm.s:63:13: error: invalid operand for instruction
add r1, r1, #4

Can you advise how to handle it?

Re, Yehuda Marko
Yehuda.Marko at scaleil.com...

clang 4.0.0: Invalid code for builtin floating point function with -mfloat-abi=hard -ffast-math (ARM)

2017 Mar 21

3

Hello,

clang/llvm 4.0.0 generates invalid calls for builtin functions with -mfloat-abi=hard -ffast-math.

Small example fail.c:

    // clang -O2 -target armv7a-none-none-eabi -mfloat-abi=hard -ffast-math -S fail.c -o -
    extern float sinf (float x);
    float sin1 (float x) {return (sinf (x));}

generates code to pass the parameter in r0 and expect the result in r0. The same code without

[PATCH 0/5] ARM NEON optimization for samplerate converter

2011 Sep 01

6

From: Jyri Sarha <jsarha at ti.com>

I optimized Speex resampler for NEON capable ARM CPUs. The first patch should speed up resampling on any platform that can spare the increased memory usage. It would be nice to have these merged to the master branch. Please let me know if there is anything I can do to help the merge. The patches have been rebased on top of master branch in

[LLVMdev] Vector argument passing abi for ARM ?

2012 Jul 05

0

Hi Sebastien,

> Thanks for the quick answer, how do I know which type is legal/illegal with respect to calling convention ?

the code generators are supposed to produce working code no matter what the parameter type is. The fact that the ARM ABI doesn't specify how <2 x i8> is passed just means that the code generators can pass it using whatever technique it feels like (since it
