Displaying 4 results from an estimated 4 matches for "state_qs_s32x4".
2017 Apr 11
2
[PATCH] Optimize silk_warped_autocorrelation_FIX() for ARM NEON
...taps" at the same time. If that's causing a
> slowdown, then it should be possible to do the processing in "sections".
> By that, I mean that you can implement (e.g.) an order-8 "kernel" that
> computes the correlations and also outputs the last element of
> state_QS_s32x4[0][0] back to input_QS[], so that it can be used to
> compute a new section.
>
Done. The speed is almost identical (slightly slower), but the bonus is a
smaller code size.
> 4) It's a minor detail, but the last element of corr_QC[] that's not
> currently vectorized could simp...
2017 Apr 13
0
[PATCH] Optimize silk_warped_autocorrelation_FIX() for ARM NEON
...e same time. If that's causing a
>> slowdown, then it should be possible to do the processing in "sections".
>> By that, I mean that you can implement (e.g.) an order-8 "kernel" that
>> computes the correlations and also outputs the last element of
>> state_QS_s32x4[0][0] back to input_QS[], so that it can be used to
>> compute a new section.
>>
>
> Done. The speed is almost identical (slightly slower), but the bonus is a
> smaller code size.
>
> 4) It's a minor detail, but the last element of corr_QC[] that's not
>>...
2017 Apr 06
0
[PATCH] Optimize silk_warped_autocorrelation_FIX() for ARM NEON
...sing up to 24 "taps" at the same time. If that's causing a
slowdown, then it should be possible to do the processing in "sections".
By that, I mean that you can implement (e.g.) an order-8 "kernel" that
computes the correlations and also outputs the last element of
state_QS_s32x4[0][0] back to input_QS[], so that it can be used to
compute a new section.
4) It's a minor detail, but the last element of corr_QC[] that's not
currently vectorized could simply be vectorized independently outside
the loop (and it's the same for all orders).
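To illustrate the "sections" idea and point 4, here is a minimal scalar sketch in plain C. It is not code from the patch. It assumes floating-point arithmetic in place of the Q-format fixed-point math, no NEON, and an order that is a multiple of the section size. The names MAX_ORDER, SECTION_ORDER, warped_autocorr_full(), section_kernel() and warped_autocorr_sectioned() are hypothetical. Each section consumes the signal entering it, accumulates its correlations against the original input, and writes the signal leaving it back into the same buffer so the next section can use it. The last correlation is then computed in a separate pass outside the section loop.

```c
#include <stddef.h>

#define MAX_ORDER     24  /* hypothetical upper bound on the order */
#define SECTION_ORDER 8   /* hypothetical kernel size ("order-8" example) */

/* Reference: run the whole allpass cascade for every input sample.
 * corr must hold order + 1 values; order must not exceed MAX_ORDER. */
void warped_autocorr_full(double *corr, const double *x, double warping,
                          size_t length, int order)
{
    double state[MAX_ORDER + 1] = { 0.0 };
    size_t n;
    int i;

    for (i = 0; i <= order; i++) corr[i] = 0.0;
    for (n = 0; n < length; n++) {
        double tmp1 = x[n];
        for (i = 0; i < order; i++) {
            /* Output of allpass section i (uses last sample's state[i + 1]) */
            double tmp2 = state[i] + warping * (state[i + 1] - tmp1);
            state[i] = tmp1;
            corr[i] += tmp1 * x[n];
            tmp1 = tmp2;
        }
        state[order] = tmp1;
        corr[order] += tmp1 * x[n];
    }
}

/* One section: reads sig[] (the signal entering this section), accumulates
 * SECTION_ORDER correlations against the original input x[], and overwrites
 * sig[] with the signal leaving the section. */
static void section_kernel(double *corr, double *sig, const double *x,
                           double warping, size_t length)
{
    double state[SECTION_ORDER + 1] = { 0.0 };
    size_t n;
    int i;

    for (n = 0; n < length; n++) {
        double tmp1 = sig[n];
        for (i = 0; i < SECTION_ORDER; i++) {
            double tmp2 = state[i] + warping * (state[i + 1] - tmp1);
            state[i] = tmp1;
            corr[i] += tmp1 * x[n];
            tmp1 = tmp2;
        }
        state[SECTION_ORDER] = tmp1;
        sig[n] = tmp1;  /* becomes the input of the next section */
    }
}

/* Sectioned variant. order must be a multiple of SECTION_ORDER and
 * scratch must hold length values. */
void warped_autocorr_sectioned(double *corr, const double *x, double *scratch,
                               double warping, size_t length, int order)
{
    size_t n;
    int i, s;

    for (i = 0; i <= order; i++) corr[i] = 0.0;
    for (n = 0; n < length; n++) scratch[n] = x[n];

    for (s = 0; s < order; s += SECTION_ORDER)
        section_kernel(corr + s, scratch, x, warping, length);

    /* Last correlation, handled on its own outside the section loop. */
    for (n = 0; n < length; n++) corr[order] += scratch[n] * x[n];
}
```

Under these assumptions both functions perform the same operations in the same order per tap, so they should agree for any order that is a multiple of SECTION_ORDER. The trade-off is one extra pass over the input buffer per section.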
Cheers,
Jean-Marc
On 05/0...
2017 Apr 05
4
[PATCH] Optimize silk_warped_autocorrelation_FIX() for ARM NEON
Thanks, Jean-Marc!
The speedup percentages are all relative to the entire encoder.
Compared to master, this optimization patch speeds up the fixed-point SILK
encoder on NEON as follows:
  Complexity 5:  6.1%
  Complexity 6:  5.8%
  Complexity 8:  5.5%
  Complexity 10: 4.0%
These numbers were measured on an Acer Chromebook, ARMv7 Processor rev 3
(v7l), CPU max MHz: 2116.5.
Thanks,
Linfeng
On Wed, Apr 5, 2017 at 11:02 AM,