search for: state_qs_s32x4

Displaying 4 results from an estimated 4 matches for "state_qs_s32x4".

2017 Apr 11
2
[PATCH] Optimize silk_warped_autocorrelation_FIX() for ARM NEON
> ...taps" at the same time. If that's causing a slowdown, then it should be possible to do the processing in "sections". By that, I mean that you can implement (e.g.) an order-8 "kernel" that computes the correlations and also outputs the last element of state_QS_s32x4[0][0] back to input_QS[], so that it can be used to compute a new section.
Done. The speed is almost identical (slightly slower); however, the extra bonus is the code size saving.
> 4) It's a minor detail, but the last element of corr_QC[] that's not currently vectorized could simp...
2017 Apr 13
0
[PATCH] Optimize silk_warped_autocorrelation_FIX() for ARM NEON
>> ...e same time. If that's causing a slowdown, then it should be possible to do the processing in "sections". By that, I mean that you can implement (e.g.) an order-8 "kernel" that computes the correlations and also outputs the last element of state_QS_s32x4[0][0] back to input_QS[], so that it can be used to compute a new section.
> Done. The speed is almost identical (slightly slower); however, the extra bonus is the code size saving.
>> 4) It's a minor detail, but the last element of corr_QC[] that's not...
2017 Apr 06
0
[PATCH] Optimize silk_warped_autocorrelation_FIX() for ARM NEON
...sing up to 24 "taps" at the same time. If that's causing a slowdown, then it should be possible to do the processing in "sections". By that, I mean that you can implement (e.g.) an order-8 "kernel" that computes the correlations and also outputs the last element of state_QS_s32x4[0][0] back to input_QS[], so that it can be used to compute a new section. 4) It's a minor detail, but the last element of corr_QC[] that's not currently vectorized could simply be vectorized independently outside the loop (and it's the same for all orders). Cheers, Jean-Marc On 05/0...
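To make the "sections" idea in the snippet above concrete, here is a minimal scalar, floating-point sketch. It is not the NEON patch under discussion: the function names (`warped_autocorr`, `section_kernel`, `warped_autocorr_sectioned`) are hypothetical, and the allpass recurrence is assumed to follow the usual floating-point form of a warped autocorrelation. Each kernel call handles a fixed number of taps and writes the output of its last allpass stage to a buffer, which then serves as the input for the next section.

```python
def warped_autocorr(x, warping, order):
    """Reference: push every sample through the whole allpass chain."""
    state = [0.0] * (order + 1)
    corr = [0.0] * (order + 1)
    for x0 in x:
        tmp = x0
        for i in range(order):
            nxt = state[i] + warping * (state[i + 1] - tmp)  # allpass output
            state[i] = tmp
            corr[i] += x0 * tmp
            tmp = nxt
        state[order] = tmp
        corr[order] += x0 * tmp
    return corr


def section_kernel(chain_in, x, warping, taps):
    """Hypothetical fixed-size kernel: `taps` allpass stages over the signal.

    chain_in[n] is the value entering this section at sample n; x[n] is the
    original input, which every lag is correlated against. Returns the
    correlations for these taps and the per-sample output of the last stage.
    """
    state = [0.0] * (taps + 1)
    corr = [0.0] * taps
    chain_out = []
    for cin, x0 in zip(chain_in, x):
        tmp = cin
        for i in range(taps):
            nxt = state[i] + warping * (state[i + 1] - tmp)
            state[i] = tmp
            corr[i] += x0 * tmp
            tmp = nxt
        state[taps] = tmp          # remembered as "next stage" state
        chain_out.append(tmp)      # fed back as input to the next section
    return corr, chain_out


def warped_autocorr_sectioned(x, warping, order, taps=8):
    """Same result as warped_autocorr, computed one section at a time."""
    assert order % taps == 0, "sketch assumes order is a multiple of taps"
    corr = []
    chain = list(x)
    for _ in range(order // taps):
        c, chain = section_kernel(chain, x, warping, taps)
        corr.extend(c)
    last = 0.0
    for x0, y in zip(x, chain):    # final lag, computed outside the kernel
        last += x0 * y
    corr.append(last)
    return corr
```

In this sketch the final lag is accumulated in a separate loop after the sections, which mirrors the remark that the last element of corr_QC[] can be handled independently outside the main loop. The trade-off is an extra pass over the signal per section in exchange for one small, reusable kernel.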
2017 Apr 05
4
[PATCH] Optimize silk_warped_autocorrelation_FIX() for ARM NEON
Thanks, Jean-Marc! The speedup percentages are all relative to the entire encoder. Compared to master, this optimization patch speeds up the fixed-point SILK encoder on NEON as follows:
Complexity 5: 6.1%
Complexity 6: 5.8%
Complexity 8: 5.5%
Complexity 10: 4.0%
when tested on an Acer Chromebook (ARMv7 Processor rev 3 (v7l), CPU max MHz: 2116.5).
Thanks, Linfeng
On Wed, Apr 5, 2017 at 11:02 AM,