thr3ads.net - search: "vpadd"

2013 May 21

0

[PATCH] 02-

..."vld1.16 d0[0], [%0]!;\n" //Load x + "vshll.s16 q7, d0, %[SIGSHIFT];\n" //Initialize sum + "vmull.s16 q8, d4, d11;\n" //Multiply-accumulate + "vadd.s32 d16, d16, d17;\n" //Three next instructions reduce the sum + "vpadd.s32 d16, d16;\n" + "vadd.s16 d14, d14, d16;\n" + "vrshrn.s32 d6, q7, %[SIGSHIFT];\n" //Scale result to 16 bits + "vmov.s16 d11, d0;\n" //Move previous + "vst1.16 d6[0], [%2]!;\n" //Store result + + "...

[PATCH] 02-Add CELT filter optimizations

2013 May 21

2

[PATCH] 02-Add CELT filter optimizations

..."vld1.16 d0[0], [%0]!;\n" //Load x + "vshll.s16 q7, d0, %[SIGSHIFT];\n" //Initialize sum + "vmull.s16 q8, d4, d11;\n" //Multiply-accumulate + "vadd.s32 d16, d16, d17;\n" //Three next instructions reduce the sum + "vpadd.s32 d16, d16;\n" + "vadd.s16 d14, d14, d16;\n" + "vrshrn.s32 d6, q7, %[SIGSHIFT];\n" //Scale result to 16 bits + "vmov.s16 d11, d0;\n" //Move previous + "vst1.16 d6[0], [%2]!;\n" //Store result + + "...

RFC: Generic IR reductions

2017 Feb 01

2

RFC: Generic IR reductions

On 1 February 2017 at 08:27, Renato Golin <renato.golin at linaro.org> wrote: > Sorry, I meant min/max + reduce, just like above. > > %sum = add <N x float>, <N x float> %a, <N x float> %b > %min = @llvm.minnum(<N x float> %sum) > %red = @llvm.reduce(%min, float %acc) No, this is wrong. I actually meant overriding the max/min intrinsics to take

[PATCH 5/5] resample: Add NEON optimized inner_product_single for floating point

2011 Sep 01

0

[PATCH 5/5] resample: Add NEON optimized inner_product_single for floating point

..."4:" + " vld1.32 {q6}, [%[b]]!\n" + " vld1.32 {q10}, [%[a]]!\n" + " subs %[remainder], %[remainder], #4\n" + " vmla.f32 q0, q6, q10\n" + " bne 4b\n" + "5:" + " vadd.f32 d0, d0, d1\n" + " vpadd.f32 d0, d0, d0\n" + " vmov.f32 %[ret], d0[0]\n" + : [ret] "=&r" (ret), [a] "+r" (a), [b] "+r" (b), + [len] "+l" (len), [remainder] "+l" (remainder) + : + : "cc", "q0", "q1", "q2...
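
The tail of the snippet above (label 5) folds the four float lanes of q0 into one scalar: `vadd.f32 d0, d0, d1` adds the high half of q0 to the low half, and `vpadd.f32 d0, d0, d0` then adds the two remaining lanes pairwise. A minimal portable C sketch of those steps follows. It is not code from the patch: the `f32x4` struct and the names `mla_f32x4`, `reduce_f32x4`, and `inner_product_sketch` are hypothetical, it emulates the registers with plain arrays rather than NEON, and it assumes `len` is a multiple of 4.

```c
#include <stddef.h>

/* Hypothetical stand-in for one 128-bit NEON q register: four float lanes.
 * lane[0..1] plays the role of d0, lane[2..3] the role of d1. */
typedef struct { float lane[4]; } f32x4;

/* Emulates "vmla.f32 q0, q6, q10": acc[i] += a[i] * b[i] in each lane. */
static void mla_f32x4(f32x4 *acc, const float *a, const float *b)
{
    for (int i = 0; i < 4; i++)
        acc->lane[i] += a[i] * b[i];
}

/* Emulates the two-step horizontal reduction from the snippet. */
static float reduce_f32x4(const f32x4 *q)
{
    /* vadd.f32 d0, d0, d1: add the high half onto the low half. */
    float d0_0 = q->lane[0] + q->lane[2];
    float d0_1 = q->lane[1] + q->lane[3];
    /* vpadd.f32 d0, d0, d0: pairwise add; lane 0 now holds the total. */
    return d0_0 + d0_1;
}

/* Inner product over len floats; assumes len is a multiple of 4. */
float inner_product_sketch(const float *a, const float *b, size_t len)
{
    f32x4 acc = { { 0.0f, 0.0f, 0.0f, 0.0f } };
    for (size_t i = 0; i + 4 <= len; i += 4)
        mla_f32x4(&acc, a + i, b + i);
    return reduce_f32x4(&acc);
}
```

Note that the per-lane accumulation sums the products in a different order than a plain scalar loop, so floating-point results can differ slightly from a scalar implementation.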

[PATCH 0/5] ARM NEON optimization for samplerate converter

2011 Sep 01

6

[PATCH 0/5] ARM NEON optimization for samplerate converter

From: Jyri Sarha <jsarha at ti.com> I optimized Speex resampler for NEON capable ARM CPUs. The first patch should speed up resampling on any platform that can spare the increased memory usage. It would be nice to have these merged to the master branch. Please let me know if there is anything I can do to help the merge. The patches have been rebased on top of master branch in

[LLVMdev] supporting SAD in loop vectorizer

2014 Nov 11

4

[LLVMdev] supporting SAD in loop vectorizer

----- Original Message ----- > From: "James Molloy" <james at jamesmolloy.co.uk> > To: "Hal Finkel" <hfinkel at anl.gov> > Cc: "Dibyendu Das" <Dibyendu.Das at amd.com>, llvmdev at cs.uiuc.edu > Sent: Tuesday, November 11, 2014 8:21:37 AM > Subject: Re: [LLVMdev] supporting SAD in loop vectorizer > > > If you'd like to
