search for: vmlal

Displaying 4 results from an estimated 4 matches for "vmlal".

2013 May 21
0
[PATCH] 02-
...2, %[SIGSHIFT];\n" + "vshll.s16 q10, d3, %[SIGSHIFT];\n" + + /* Make previous samples vector for MAC in q5, q6 lanes */ + "vext.16 q5, q5, q0, #7;\n" + "vext.16 q6, q0, q1, #7;\n" + + /* Doing 16 samples filtering at a time */ + "vmlal.s16 q7, d8, d10;\n" + "vmlal.s16 q8, d8, d11;\n" + "vmlal.s16 q9, d8, d12;\n" + "vmlal.s16 q10, d8, d13;\n" + + /* Reduce filter sum to 16 bits for y output */ + "vrshrn.s32 d4, q7, %[SIGSHIFT];\n" + "vrshrn.s32 d...
2013 May 21
2
[PATCH] 02-Add CELT filter optimizations
...2, %[SIGSHIFT];\n" + "vshll.s16 q10, d3, %[SIGSHIFT];\n" + + /* Make previous samples vector for MAC in q5, q6 lanes */ + "vext.16 q5, q5, q0, #7;\n" + "vext.16 q6, q0, q1, #7;\n" + + /* Doing 16 samples filtering at a time */ + "vmlal.s16 q7, d8, d10;\n" + "vmlal.s16 q8, d8, d11;\n" + "vmlal.s16 q9, d8, d12;\n" + "vmlal.s16 q10, d8, d13;\n" + + /* Reduce filter sum to 16 bits for y output */ + "vrshrn.s32 d4, q7, %[SIGSHIFT];\n" + "vrshrn.s32 d...
2011 Sep 01
0
[PATCH 3/5] resample: Add NEON optimized inner_product_single for fixed point
...+ " beq 5f\n" + " b 4f\n" + "1:" + " vld1.16 {d16, d17, d18, d19}, [%[b]]!\n" + " vld1.16 {d20, d21, d22, d23}, [%[a]]!\n" + " subs %[len], %[len], #16\n" + " vmull.s16 q0, d16, d20\n" + " vmlal.s16 q0, d17, d21\n" + " vmlal.s16 q0, d18, d22\n" + " vmlal.s16 q0, d19, d23\n" + " beq 3f\n" + "2:" + " vld1.16 {d16, d17, d18, d19}, [%[b]]!\n" + " vld1.16 {d20, d21, d22, d23}, [%[a]]!\n" + " subs %[...
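The vmull.s16/vmlal.s16 sequence in the excerpt above multiplies four pairs of signed 16-bit samples at a time and accumulates the widened 32-bit products in the four lanes of q0. A minimal scalar sketch of that arithmetic might look like the following. The function name inner_product_s16 is hypothetical, the final lane summation is an assumption (the excerpt is cut off before any reduction step), and this is not the code from the patch.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical scalar reference for a widening multiply-accumulate
 * inner product; illustrative only, not taken from the patch.
 * Inputs are assumed small enough that the 32-bit sums do not overflow. */
int32_t inner_product_s16(const int16_t *a, const int16_t *b, size_t len)
{
    /* Four 32-bit accumulators, mirroring the four lanes of q0. */
    int32_t lanes[4] = {0, 0, 0, 0};
    size_t i;

    for (i = 0; i < len; i++) {
        /* Widen both 16-bit operands to 32 bits before multiplying,
         * as vmull.s16 / vmlal.s16 do for each lane. */
        lanes[i % 4] += (int32_t)a[i] * (int32_t)b[i];
    }

    /* Assumed horizontal reduction of the four lanes to one result. */
    return lanes[0] + lanes[1] + lanes[2] + lanes[3];
}
```

A scalar reference of this kind can be useful for checking a NEON routine against known inputs before trusting it in the resampler.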
2011 Sep 01
6
[PATCH 0/5] ARM NEON optimization for samplerate converter
From: Jyri Sarha <jsarha at ti.com> I optimized the Speex resampler for NEON-capable ARM CPUs. The first patch should speed up resampling on any platform that can spare the increased memory usage. It would be nice to have these merged to the master branch. Please let me know if there is anything I can do to help with the merge. The patches have been rebased on top of master branch in