Displaying 6 results from an estimated 6 matches for "input_qs".
2016 Jul 01
1
silk_warped_autocorrelation_FIX() NEON optimization
Hi all,
I'm sending the patch "Optimize silk_warped_autocorrelation_FIX() for ARM NEON" in a separate email.
It is based on Tim’s aarch64v8 branch https://git.xiph.org/?p=users/tterribe/opus.git;a=shortlog;h=refs/heads/aarch64v8
Thanks for your comments.
Linfeng
2016 Jul 14
6
Several patches of ARM NEON optimization
I rebased my previous 3 patches to the current master with minor changes.
Patches 1 to 3 replace all my previously submitted patches.
Patches 4 and 5 are new.
Thanks,
Linfeng Zhang
2017 Apr 11
2
[PATCH] Optimize silk_warped_autocorrelation_FIX() for ARM NEON
...If that's causing a
> slowdown, then it should be possible to do the processing in "sections".
> By that, I mean that you can implement (e.g.) an order-8 "kernel" that
> computes the correlations and also outputs the last element of
> state_QS_s32x4[0][0] back to input_QS[], so that it can be used to
> compute a new section.
>
Done. The speed is almost identical (slightly slower), however the extra
bonus is code size saving.
4) It's a minor detail, but the last element of corr_QC[] that's not
> currently vectorized could simply be vectorized indepe...
2017 Apr 13
0
[PATCH] Optimize silk_warped_autocorrelation_FIX() for ARM NEON
...using a
>> slowdown, then it should be possible to do the processing in "sections".
>> By that, I mean that you can implement (e.g.) an order-8 "kernel" that
>> computes the correlations and also outputs the last element of
>> state_QS_s32x4[0][0] back to input_QS[], so that it can be used to
>> compute a new section.
>>
>
> Done. The speed is almost identical (slightly slower), however the extra
> bonus is code size saving.
>
> 4) It's a minor detail, but the last element of corr_QC[] that's not
>> currently vectorize...
2017 Apr 06
0
[PATCH] Optimize silk_warped_autocorrelation_FIX() for ARM NEON
...; at the same time. If that's causing a
slowdown, then it should be possible to do the processing in "sections".
By that, I mean that you can implement (e.g.) an order-8 "kernel" that
computes the correlations and also outputs the last element of
state_QS_s32x4[0][0] back to input_QS[], so that it can be used to
compute a new section.
4) It's a minor detail, but the last element of corr_QC[] that's not
currently vectorized could simply be vectorized independently outside
the loop (and it's the same for all orders).
Cheers,
Jean-Marc
On 05/04/17 02:13 PM, Linfeng...
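The "sections" idea in the message above can be sketched in plain scalar C. This is only an illustration under stated assumptions, not the NEON patch itself: it uses double arithmetic instead of the fixed-point QS/QC formats, and the names warped_autocorr_ref, warped_autocorr_kernel, warped_autocorr_sectioned, SECTION_ORDER, MAX_ORDER and MAX_LENGTH are all hypothetical. The kernel handles 8 allpass stages at a time and writes the signal leaving its last stage back into a working buffer, which then feeds the next section. The final correlation term is computed separately after the sections, in the spirit of point 4.

```c
#include <assert.h>
#include <string.h>

#define SECTION_ORDER 8    /* hypothetical kernel size, after the "order-8" example */
#define MAX_ORDER     32   /* hypothetical bound for this sketch */
#define MAX_LENGTH    512  /* hypothetical bound for this sketch */

/* Scalar reference: all allpass stages are run for each input sample. */
void warped_autocorr_ref(double *corr, const double *x, double warping,
                         int length, int order)
{
    double state[MAX_ORDER + 1] = { 0 };
    assert(order <= MAX_ORDER);
    memset(corr, 0, (size_t)(order + 1) * sizeof(double));
    for (int n = 0; n < length; n++) {
        double tmp = x[n];
        for (int i = 0; i < order; i++) {
            /* Output of allpass stage i */
            double out = state[i] + warping * (state[i + 1] - tmp);
            state[i] = tmp;
            corr[i] += x[n] * tmp;
            tmp = out;
        }
        state[order] = tmp;
        corr[order] += x[n] * tmp;
    }
}

/* One section: SECTION_ORDER stages over the whole signal.
 * sig[] holds the signal entering this section on entry and the
 * signal leaving it on exit, ready for the next section. */
static void warped_autocorr_kernel(double *corr, const double *x, double *sig,
                                   double warping, int length)
{
    double state[SECTION_ORDER + 1] = { 0 };
    for (int n = 0; n < length; n++) {
        double tmp = sig[n];
        for (int i = 0; i < SECTION_ORDER; i++) {
            double out = state[i] + warping * (state[i + 1] - tmp);
            state[i] = tmp;
            corr[i] += x[n] * tmp;  /* always correlate against the original input */
            tmp = out;
        }
        state[SECTION_ORDER] = tmp; /* previous output, read by the last stage */
        sig[n] = tmp;               /* write back for the next section */
    }
}

/* Sectioned variant: order must be a multiple of SECTION_ORDER. */
void warped_autocorr_sectioned(double *corr, const double *x, double warping,
                               int length, int order)
{
    double sig[MAX_LENGTH];
    assert(length <= MAX_LENGTH && order % SECTION_ORDER == 0);
    memset(corr, 0, (size_t)(order + 1) * sizeof(double));
    memcpy(sig, x, (size_t)length * sizeof(double));
    for (int s = 0; s < order; s += SECTION_ORDER) {
        warped_autocorr_kernel(corr + s, x, sig, warping, length);
    }
    /* Last correlation term, computed on its own outside the stage loop. */
    for (int n = 0; n < length; n++) {
        corr[order] += x[n] * sig[n];
    }
}
```

Calling both functions with the same input, warping value and an order of 16 should give matching corr[] arrays, since the sectioned version performs the same operations in a different loop order. A real NEON version would replace the inner stage loop with vector code; that part is not shown here.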
2017 Apr 05
4
[PATCH] Optimize silk_warped_autocorrelation_FIX() for ARM NEON
Thanks, Jean-Marc!
The speedup percentages are all relative to the entire encoder.
Compared to master, this optimization patch speeds up the fixed-point SILK
encoder on NEON as follows:
Complexity 5: 6.1%
Complexity 6: 5.8%
Complexity 8: 5.5%
Complexity 10: 4.0%
These were measured on an Acer Chromebook, ARMv7 Processor rev 3 (v7l), CPU max
MHz: 2116.5
Thanks,
Linfeng
On Wed, Apr 5, 2017 at 11:02 AM,