Displaying 6 results from an estimated 6 matches for "vpadd".
2013 May 21
0
[PATCH] 02-
..."vld1.16 d0[0], [%0]!;\n" //Load x
+ "vshll.s16 q7, d0, %[SIGSHIFT];\n" //Initialize sum
+ "vmull.s16 q8, d4, d11;\n" //Multiply-accumulate
+ "vadd.s32 d16, d16, d17;\n" //Three next instructions reduce the sum
+ "vpadd.s32 d16, d16;\n"
+ "vadd.s16 d14, d14, d16;\n"
+ "vrshrn.s32 d6, q7, %[SIGSHIFT];\n" //Scale result to 16 bits
+ "vmov.s16 d11, d0;\n" //Move previous
+ "vst1.16 d6[0], [%2]!;\n" //Store result
+
+ "...
2013 May 21
2
[PATCH] 02-Add CELT filter optimizations
..."vld1.16 d0[0], [%0]!;\n" //Load x
+ "vshll.s16 q7, d0, %[SIGSHIFT];\n" //Initialize sum
+ "vmull.s16 q8, d4, d11;\n" //Multiply-accumulate
+ "vadd.s32 d16, d16, d17;\n" //Three next instructions reduce the sum
+ "vpadd.s32 d16, d16;\n"
+ "vadd.s16 d14, d14, d16;\n"
+ "vrshrn.s32 d6, q7, %[SIGSHIFT];\n" //Scale result to 16 bits
+ "vmov.s16 d11, d0;\n" //Move previous
+ "vst1.16 d6[0], [%2]!;\n" //Store result
+
+ "...
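The patch comment "Three next instructions reduce the sum" refers to a horizontal reduction of the four 32-bit products that `vmull.s16` leaves in q8 (d16 = lanes 0-1, d17 = lanes 2-3). The following is a minimal scalar sketch of the first two steps, not code from the patch. `reduce_q8` is a hypothetical name, and the model assumes the two-operand `vpadd.s32 d16, d16` behaves as `vpadd d16, d16, d16`.

```c
#include <stdint.h>

/* Hypothetical scalar model of the q8 reduction in the CELT patch.
 * q8[0..1] stand for d16, q8[2..3] stand for d17. */
static int32_t reduce_q8(const int32_t q8[4])
{
    /* vadd.s32 d16, d16, d17: fold the high half onto the low half */
    int32_t d16[2] = { q8[0] + q8[2], q8[1] + q8[3] };

    /* vpadd d16, d16, d16 (assumed expansion of the two-operand form):
     * pairwise add of adjacent lanes leaves the full sum in lane 0 */
    return d16[0] + d16[1];
}
```

For example, `reduce_q8` applied to the products `{10, -3, 7, 1}` returns 15. The third instruction in the snippet then adds this reduced value into the running sum held in d14.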
2017 Feb 01
2
RFC: Generic IR reductions
On 1 February 2017 at 08:27, Renato Golin <renato.golin at linaro.org> wrote:
> Sorry, I meant min/max + reduce, just like above.
>
> %sum = add <N x float>, <N x float> %a, <N x float> %b
> %min = @llvm.minnum(<N x float> %sum)
> %red = @llvm.reduce(%min, float %acc)
No, this is wrong. I actually meant overriding the max/min intrinsics
to take
2011 Sep 01
0
[PATCH 5/5] resample: Add NEON optimized inner_product_single for floating point
..."4:"
+ " vld1.32 {q6}, [%[b]]!\n"
+ " vld1.32 {q10}, [%[a]]!\n"
+ " subs %[remainder], %[remainder], #4\n"
+ " vmla.f32 q0, q6, q10\n"
+ " bne 4b\n"
+ "5:"
+ " vadd.f32 d0, d0, d1\n"
+ " vpadd.f32 d0, d0, d0\n"
+ " vmov.f32 %[ret], d0[0]\n"
+ : [ret] "=&r" (ret), [a] "+r" (a), [b] "+r" (b),
+ [len] "+l" (len), [remainder] "+l" (remainder)
+ :
+ : "cc", "q0", "q1", "q2...
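The resampler snippet accumulates products four floats at a time into q0 (`vmla.f32 q0, q6, q10`) and then collapses the four lanes with `vadd.f32 d0, d0, d1` followed by `vpadd.f32 d0, d0, d0`. Below is a minimal scalar sketch of that structure, assuming `len` is a multiple of 4. The patch's remainder handling is cut off in the snippet and is not modeled, and `inner_product_4lane` is a hypothetical name.

```c
#include <stddef.h>

/* Hypothetical scalar model of the NEON inner product loop.
 * Assumes len is a multiple of 4; any tail elements are ignored. */
static float inner_product_4lane(const float *a, const float *b, size_t len)
{
    float q0[4] = { 0.0f, 0.0f, 0.0f, 0.0f };   /* four accumulator lanes */

    for (size_t i = 0; i + 4 <= len; i += 4) {
        for (int lane = 0; lane < 4; lane++)
            q0[lane] += a[i + lane] * b[i + lane];   /* vmla.f32 q0, q6, q10 */
    }

    /* vadd.f32 d0, d0, d1: lanes {0,1} += lanes {2,3} */
    float d0[2] = { q0[0] + q0[2], q0[1] + q0[3] };

    /* vpadd.f32 d0, d0, d0: lane 0 = d0[0] + d0[1] */
    return d0[0] + d0[1];
}
```

Because the lanes are summed in a different order than a plain sequential loop, results can differ from a scalar reference in the last bits for general float inputs.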
2011 Sep 01
6
[PATCH 0/5] ARM NEON optimization for samplerate converter
From: Jyri Sarha <jsarha at ti.com>
I optimized Speex resampler for NEON capable ARM CPUs. The first patch
should speed up resampling on any platform that can spare the
increased memory usage. It would be nice to have these merged to the
master branch. Please let me know if there is anything I can do to
help the merge. The patches have been rebased on top of master
branch in
2014 Nov 11
4
[LLVMdev] supporting SAD in loop vectorizer
----- Original Message -----
> From: "James Molloy" <james at jamesmolloy.co.uk>
> To: "Hal Finkel" <hfinkel at anl.gov>
> Cc: "Dibyendu Das" <Dibyendu.Das at amd.com>, llvmdev at cs.uiuc.edu
> Sent: Tuesday, November 11, 2014 8:21:37 AM
> Subject: Re: [LLVMdev] supporting SAD in loop vectorizer
>
>
> If you'd like to
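For reference, the scalar sum-of-absolute-differences (SAD) idiom that this thread's subject refers to typically looks like the loop below: bytes are widened, subtracted, made absolute, and accumulated into a wider integer. This is an illustrative sketch and is not taken from the thread. `sad_u8` is a hypothetical name. On x86 the same computation maps to the `psadbw` instruction.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical example of the SAD reduction pattern. */
static uint32_t sad_u8(const uint8_t *a, const uint8_t *b, size_t n)
{
    uint32_t sum = 0;
    for (size_t i = 0; i < n; i++)
        sum += (uint32_t)abs((int)a[i] - (int)b[i]);   /* widen, subtract, abs, accumulate */
    return sum;
}
```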