thr3ads.net - search: "input_qs"

Displaying 6 results from an estimated 6 matches for "input_qs".

silk_warped_autocorrelation_FIX() NEON optimization

2016 Jul 01

Hi all, I'm sending the patch "Optimize silk_warped_autocorrelation_FIX() for ARM NEON" in a separate email. It is based on Tim’s aarch64v8 branch https://git.xiph.org/?p=users/tterribe/opus.git;a=shortlog;h=refs/heads/aarch64v8 Thanks for your comments. Linfeng

Several patches of ARM NEON optimization

2016 Jul 14

I rebased my previous 3 patches onto the current master with minor changes. Patches 1 to 3 replace all my previously submitted patches. Patches 4 and 5 are new. Thanks, Linfeng Zhang

[PATCH] Optimize silk_warped_autocorrelation_FIX() for ARM NEON

2017 Apr 11

> ...If that's causing a slowdown, then it should be possible to do the processing in "sections".
> By that, I mean that you can implement (e.g.) an order-8 "kernel" that computes the correlations and also outputs the last element of state_QS_s32x4[0][0] back to input_QS[], so that it can be used to compute a new section.

Done. The speed is almost identical (slightly slower); however, the extra bonus is a code size saving.

> 4) It's a minor detail, but the last element of corr_QC[] that's not currently vectorized could simply be vectorized indepe...

[PATCH] Optimize silk_warped_autocorrelation_FIX() for ARM NEON

2017 Apr 13

>> ...using a slowdown, then it should be possible to do the processing in "sections".
>> By that, I mean that you can implement (e.g.) an order-8 "kernel" that computes the correlations and also outputs the last element of state_QS_s32x4[0][0] back to input_QS[], so that it can be used to compute a new section.
>
> Done. The speed is almost identical (slightly slower); however, the extra bonus is a code size saving.
>
>> 4) It's a minor detail, but the last element of corr_QC[] that's not currently vectorize...

[PATCH] Optimize silk_warped_autocorrelation_FIX() for ARM NEON

2017 Apr 06

...; at the same time. If that's causing a slowdown, then it should be possible to do the processing in "sections". By that, I mean that you can implement (e.g.) an order-8 "kernel" that computes the correlations and also outputs the last element of state_QS_s32x4[0][0] back to input_QS[], so that it can be used to compute a new section.

4) It's a minor detail, but the last element of corr_QC[] that's not currently vectorized could simply be vectorized independently outside the loop (and it's the same for all orders).

Cheers,
Jean-Marc

On 05/04/17 02:13 PM, Linfeng...
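The "sections" idea above can be illustrated with a plain-C sketch (scalar, no NEON intrinsics). It is modeled loosely on the structure of the scalar warped autocorrelation loop but simplified: the final scaling step is omitted, the values QS = 14 and QC = 10 are illustrative assumptions (check the Opus source for the real constants), and the names smlawb, warped_corr_ref, warped_section, warped_corr_sections and MAX_ORDER are hypothetical. Each section runs up to 8 first-order allpass taps over the whole signal and writes the output of its last tap back into the input buffer, which the next section then consumes; the correlations always use the original input. The last coefficient, corr_qc[order], is computed afterwards as a plain dot product, the same code for every order, as point 4 suggests.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Assumed Q-domains for this sketch (illustrative, not taken from the patch). */
#define QS 14
#define QC 10
#define MAX_ORDER 24 /* hypothetical cap on the correlation order */

/* a + ((b * (int16_t)c) >> 16), mirroring the arithmetic of silk_SMLAWB. */
static int32_t smlawb(int32_t a, int32_t b, int32_t c) {
    return a + (int32_t)(((int64_t)b * (int16_t)c) >> 16);
}

/* Scalar reference: all `order` allpass taps are run for each input sample. */
static void warped_corr_ref(int64_t *corr_qc, const int16_t *input,
                            int len, int warping_q16, int order) {
    int32_t state[MAX_ORDER + 1] = {0};
    for (int n = 0; n < len; n++) {
        int32_t tmp = (int32_t)input[n] * (1 << QS);
        for (int i = 0; i < order; i++) {
            int32_t out = smlawb(state[i], state[i + 1] - tmp, warping_q16);
            state[i] = tmp; /* state[0] now holds the current input sample */
            corr_qc[i] += ((int64_t)tmp * state[0]) >> (2 * QS - QC);
            tmp = out;
        }
        state[order] = tmp;
        corr_qc[order] += ((int64_t)tmp * state[0]) >> (2 * QS - QC);
    }
}

/* One section of up to 8 taps, run over the whole signal. in_qs[n] feeds the
 * first tap and is overwritten with the last tap's output for the next section. */
static void warped_section(int64_t *corr_qc, int32_t *in_qs, const int32_t *x_qs,
                           int len, int warping_q16, int taps) {
    int32_t s[9] = {0}; /* s[j] = input of tap j at the previous sample */
    assert(taps >= 1 && taps <= 8);
    for (int n = 0; n < len; n++) {
        int32_t tmp = in_qs[n];
        for (int j = 0; j < taps; j++) {
            int32_t out = smlawb(s[j], s[j + 1] - tmp, warping_q16);
            s[j] = tmp;
            corr_qc[j] += ((int64_t)tmp * x_qs[n]) >> (2 * QS - QC);
            tmp = out;
        }
        s[taps] = tmp;
        in_qs[n] = tmp; /* hand the last tap's output to the next section */
    }
}

/* Sectioned driver. corr_qc[0..order] must be zeroed by the caller. */
static int warped_corr_sections(int64_t *corr_qc, const int16_t *input,
                                int len, int warping_q16, int order) {
    int32_t *x_qs = malloc((size_t)len * sizeof *x_qs);
    int32_t *in_qs = malloc((size_t)len * sizeof *in_qs);
    if (!x_qs || !in_qs) { free(x_qs); free(in_qs); return -1; }
    for (int n = 0; n < len; n++)
        x_qs[n] = in_qs[n] = (int32_t)input[n] * (1 << QS);
    for (int k = 0; k < order; k += 8)
        warped_section(corr_qc + k, in_qs, x_qs, len, warping_q16,
                       order - k < 8 ? order - k : 8);
    /* Last coefficient: a plain dot product, identical for every order. */
    for (int n = 0; n < len; n++)
        corr_qc[order] += ((int64_t)in_qs[n] * x_qs[n]) >> (2 * QS - QC);
    free(x_qs);
    free(in_qs);
    return 0;
}
```

To use it, zero an int64_t array of order + 1 entries and call warped_corr_sections(corr, input, len, warping_q16, order); with the same integer arithmetic per tap, the result should match warped_corr_ref bit for bit. One kernel serves every order, so only that kernel would need a vectorized version, which is plausibly where a code size saving comes from; the cost is that the signal buffer is read and rewritten once per section.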

[PATCH] Optimize silk_warped_autocorrelation_FIX() for ARM NEON

2017 Apr 05

Thanks, Jean-Marc! The speedup percentages are all relative to the entire encoder. Compared to master, this optimization patch speeds up the fixed-point SILK encoder on NEON as follows:

Complexity 5: 6.1%
Complexity 6: 5.8%
Complexity 8: 5.5%
Complexity 10: 4.0%

Tested on an Acer Chromebook, ARMv7 Processor rev 3 (v7l), CPU max MHz: 2116.5.

Thanks,
Linfeng

On Wed, Apr 5, 2017 at 11:02 AM,