thr3ads.net - search: "vcvt"

[LLVMdev] LLVM 3.0 release notes ARM Target

2011 Nov 16

what do you mean by "more optimal instructions" ? -omer

On Wed, Nov 16, 2011 at 1:28 AM, Joe Abbey <jabbey at arxan.com> wrote:
> I've done a first pass over the past 6 months of changes and some notable
> things stood out:
>
> * The ARM backend has reworked Set Jump Long Jump EH Lowering.
> * The ARM backend includes improved support for Cortex-M
> *

[LLVMdev] LLVM 3.0 release notes ARM Target

2011 Nov 16

I've done a first pass over the past 6 months of changes and some notable things stood out:

* The ARM backend has reworked Set Jump Long Jump EH Lowering.
* The ARM backend includes improved support for Cortex-M
* The ARM backend adds parsing and encoding ARM/Thumb/Thumb2 assembly

There are also many many code generation improvements which select more optimal instructions. Those seemed

[LLVMdev] Thumb-2 code generation error in Apple LLVM at all optimization levels

2011 Nov 12

...may actually be initialized from a program counter-relative 32-bit .long constant immediately following my method's code.

    .loc 1 388 3
    ldr r0, [r5]
    ldr r1, [r4, r0]
    adds r1, #1
    str r1, [r4, r0]
    .loc 1 390 64
    mov r0, r4
    ldr r1, [r6]
    blx _objc_msgSend
    vmov s0, r0
    vmul.f32 d0, d0, d8
    vcvt.u32.f32 d0, d0
    vmov r0, s0
Ltmp272:
    .loc 1 392 9
    cmp.w r0, #4000
Ltmp273:
    .loc 1 393 13
    it hs
    blxhs _usleep

cmp.w *looks* like a 16-bit comparison with an immediate constant, but in reality the constant is twelve bits. The ARM and Thumb instruction sets have quite severe restrictions on the...

Why does LLVM keep some loads in the loops even after applying the O3 optimization?

2019 Mar 28

...is created by applying O3 optimization. Here it is:

.LBB4_19:    @ %for.body.91
             @ =>This Inner Loop Header: Depth=1
    ldr r0, [r5]
    mov r1, r8
    add r0, r0, r7
    vldr s0, [r0]
    mov r0, r6
    vcvt.f64.f32 d0, s0
    vmov r2, r3, d0
    bl fprintf
    cmp r0, #0
    blt .LBB4_25
@ BB#20:     @ %for.cond.89
             @ in Loop: Header=BB4_19 Depth=1
    ldr r0, .LCPI4_2
    add r4, r4, #1
    add r7, r7, #4...

Why does LLVM keep some loads in the loops even after applying the O3 optimization?

2019 Mar 28

...
> .LBB4_19:    @ %for.body.91
>              @ =>This Inner Loop Header: Depth=1
>     ldr r0, [r5]
>     mov r1, r8
>     add r0, r0, r7
>     vldr s0, [r0]
>     mov r0, r6
>     vcvt.f64.f32 d0, s0
>     vmov r2, r3, d0
>     bl fprintf
>     cmp r0, #0
>     blt .LBB4_25
> @ BB#20:     @ %for.cond.89
>              @ in Loop: Header=BB4_19 Depth=1
>     ldr r0, .LCPI4...

[LLVMdev] LLVM CodeGen Engineer job opening with Apple's compiler team

2011 May 26

Hi all,

LLVM CodeGen and Tools team at Apple is looking for exceptional compiler engineers. This is a great opportunity to work with many of the leaders in the LLVM community. If you are interested in this position, please send your resume / CV and relevant information to evan.cheng at apple.com

Thanks,
Evan

Job description

The Apple compiler team is seeking an engineer who is strongly

[LLVMdev] Question about ARM/vfp/NEON code generation

2011 May 27

    ...s4
    vmrs apsr_nzcv, fpscr
    vstr.32 s0, [sp, #16]
    vstr.32 s2, [sp, #12]
    vstr.32 s1, [sp, #8]
    ble LBB20_2
@ BB#1: @ %bb
    vldr.32 s0, [r7, #-16]
    ldr r0, LCPI20_0
LPC20_0:
    add r0, pc, r0
    vcvt.f64.f32 d1, s0
    vmov r1, r2, d1
    bl _printf
    str r0, [sp, #4]
    b LBB20_3
LBB20_2: @ %bb1
    vldr.32 s0, [r7, #-12]
    ldr r0, LCPI20_1
LPC20_1:
    add r0, pc, r0
    vcvt.f64.f32 d1, s0...

Safe fptoui/fptosi casts

2018 Nov 05

...of this issue for the Rust language can be found at https://github.com/rust-lang/rust/issues/10184 . Unfortunately, implementing this behavior in an efficient manner is not easy right now, because depending on the target architecture different instruction sequences need to be generated. On ARM the vcvt instruction directly exposes the desired saturation behavior. On X86 good instruction sequences vary depending on the size of the floating point number, and the size and signedness of the target integer type. I think there are broadly three ways in which the current situation can be improved: 1....
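
A rough sketch in C of what a "safe" float-to-integer cast could look like when written out by hand. It assumes the desired behavior is to clamp out-of-range inputs to the nearest representable integer and to map NaN to 0; the snippet above does not spell that out, so treat it as an assumption. The helper names sat_f32_to_i32 and sat_f32_to_u32 are made up for illustration.

```c
#include <stdint.h>

/* Hypothetical helper: saturating float -> int32 conversion.
   Assumed semantics: NaN -> 0, out-of-range values clamp to the limits. */
static int32_t sat_f32_to_i32(float x)
{
    if (x != x)                 /* NaN compares unequal to itself */
        return 0;
    if (x >= 2147483648.0f)     /* 2^31, first value above INT32_MAX */
        return INT32_MAX;
    if (x <= -2147483648.0f)    /* -2^31, exactly INT32_MIN */
        return INT32_MIN;
    return (int32_t)x;          /* in range: the cast truncates toward zero */
}

/* Hypothetical helper: saturating float -> uint32 conversion,
   same assumed semantics. */
static uint32_t sat_f32_to_u32(float x)
{
    if (x != x)
        return 0;
    if (x <= 0.0f)
        return 0;
    if (x >= 4294967296.0f)     /* 2^32, first value above UINT32_MAX */
        return UINT32_MAX;
    return (uint32_t)x;
}
```

The explicit comparisons are the part a target might be able to drop when its conversion instruction already saturates, which appears to be the point the message makes about ARM's vcvt.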

[LLVMdev] folding x * 0 = 0

2010 Mar 03

On Wednesday 03 March 2010 15:38:06 Chris Lattner wrote:
> > Signalling NaN is one case. I'm sure there are others.
>
> The only other thing I could imagine that it is useful for is for rounding
> mode control.

Yep.

> IMO rounding mode should be explicitly marked on the
> instruction as well.

That would also be useful for some GPUs where each instruction can specify

[LLVMdev] folding x * 0 = 0

2010 Mar 03

On Mar 3, 2010, at 1:53 PM, David Greene wrote:
>>
>> IMO rounding mode should be explicitly marked on the
>> instruction as well.
>
> That would also be useful for some GPUs where each instruction can specify
> its own rounding mode.

SSE4 also has this for at least one conversion instruction.

-Chris

[LLVMdev] folding x * 0 = 0

2010 Mar 04

...ng mode should be explicitly marked on the
>>> instruction as well.
>>
>> That would also be useful for some GPUs where each instruction can specify
>> its own rounding mode.
>
> SSE4 also has this for at least one conversion instruction.

As does ARM NEON. Sorta. vcvt normally uses round-to-zero, but it can optionally use the mode specified by the control register instead.

[PATCH 5/5] resample: Add NEON optimized inner_product_single for floating point

2011 Sep 01

...t len)
@@ -97,4 +121,81 @@ static inline int32_t inner_product_single(const int16_t *a, const int16_t *b, u
     return ret;
 }
+#elif defined(FLOATING_POINT)
+
+static inline int32_t saturate_float_to_16bit(float a) {
+    int32_t ret;
+    asm ("vmov.f32 d0[0], %[a]\n"
+         "vcvt.s32.f32 d0, d0, #15\n"
+         "vqrshrn.s32 d0, q0, #15\n"
+         "vmov.s16 %[ret], d0[0]\n"
+         : [ret] "=&r" (ret)
+         : [a] "r" (a)
+         : "q0");
+    return ret;
+}
+#undef WORD2INT
+#define WORD2INT(x) (saturate_f...
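
For readers without a NEON reference at hand, here is a scalar C model of what the inline assembly appears to compute. It is a reading of the two arithmetic instructions, not code from the patch: vcvt.s32.f32 with #15 converts to fixed point with 15 fraction bits (truncating, saturating to 32 bits), and vqrshrn.s32 with #15 shifts back with rounding and saturates to 16 bits. The function name and the NaN-to-zero handling are assumptions.

```c
#include <stdint.h>

/* Hypothetical portable model of the NEON sequence; not part of the patch. */
static int32_t saturate_float_to_16bit_model(float a)
{
    /* Models vcvt.s32.f32 ..., #15: scale by 2^15, truncate toward zero,
       saturate to the int32 range. NaN is mapped to 0 (assumed). */
    float scaled = a * 32768.0f;
    int32_t fixed;
    if (scaled != scaled)
        fixed = 0;
    else if (scaled >= 2147483648.0f)
        fixed = INT32_MAX;
    else if (scaled <= -2147483648.0f)
        fixed = INT32_MIN;
    else
        fixed = (int32_t)scaled;

    /* Models vqrshrn.s32 ..., #15: add the rounding constant 2^14, then
       floor-divide by 2^15 (written without shifting a negative value). */
    int64_t biased = (int64_t)fixed + 16384;
    int64_t q = biased / 32768;
    if (biased % 32768 < 0)
        q -= 1;

    /* Saturate to the int16 range, as the narrowing step does. */
    if (q > INT16_MAX)
        q = INT16_MAX;
    if (q < INT16_MIN)
        q = INT16_MIN;
    return (int32_t)q;
}
```

Going through fixed point and back gives round-to-nearest plus saturation in two instructions, which is presumably why the patch uses this pair instead of a plain conversion followed by explicit clamping.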

Safe fptoui/fptosi casts

2018 Nov 05

...ust language can be found
> at https://github.com/rust-lang/rust/issues/10184.
>
> Unfortunately, implementing this behavior in an efficient manner is not
> easy right now, because depending on the target architecture different
> instruction sequences need to be generated. On ARM the vcvt instruction
> directly exposes the desired saturation behavior. On X86 good instruction
> sequences vary depending on the size of the floating point number, and the
> size and signedness of the target integer type.
>
> I think there are broadly three ways in which the current situati...

[PATCH 0/5] ARM NEON optimization for samplerate converter

2011 Sep 01

From: Jyri Sarha <jsarha at ti.com>

I optimized Speex resampler for NEON capable ARM CPUs. The first patch should speed up resampling on any platform that can spare the increased memory usage. It would be nice to have these merged to the master branch. Please let me know if there is anything I can do to help the merge. The patches have been rebased on top of master branch in
