thr3ads.net - search: "calc

Displaying 4 results from an estimated 4 matches for "calc_state".

[PATCH] Optimize silk_warped_autocorrelation_FIX() for ARM NEON

2017 Apr 11

...n with complexity 8 on my Acer Chromebook. Here is my configure:

./configure --build x86_64-unknown-linux-gnu --host arm-linux-gnueabihf --disable-assertions --enable-fixed-point --enable-intrinsics CFLAGS=-O3 --disable-shared

The testing speech file may also change the speed results.

> 1) In calc_state(), rather than splitting the multiply in two
> instructions, you may be able to simply shift the warping left 16 bits,
> then use the Neon instruction that does a*b>>32 (i.e. the one that
> computes the top bits of a 32x32 multiply)

Done.

> 2) If the problem is with the mo...
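
The calc_state() suggestion rests on a simple identity: if the 16-bit warping coefficient w is pre-shifted left by 16 bits, the high 32 bits of the 32x32 product a * (w << 16) equal (a * w) >> 16, which is exactly what a two-multiply split into 16-bit halves computes. The scalar C sketch below checks this. It assumes the two-instruction form mirrors SILK's silk_SMULWB() style of splitting a into halves; the function names are hypothetical. As we understand it, the Neon instruction in question (VQDMULH, intrinsic vqdmulhq_s32) is a saturating *doubling* multiply returning the high half, so with it the coefficient would be pre-shifted by 15 bits rather than 16; qdmulh_model() is only a scalar model of that behaviour, not the intrinsic itself.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Two-multiply form (hypothetical name), in the style of silk_SMULWB():
 * multiply the high and low 16-bit halves of a by w separately.
 * Result is floor((a * w) / 2^16). Assumes arithmetic right shift of
 * negative values, as SILK's own macros do. */
static int32_t mulw_split(int32_t a, int16_t w)
{
    int32_t hi = (a >> 16) * (int32_t)w;            /* high half, already at the >>16 scale */
    int32_t lo = ((a & 0xFFFF) * (int32_t)w) >> 16; /* low half (0..65535) scaled down by 2^16 */
    return hi + lo;
}

/* One-multiply form: w was pre-shifted into the top 16 bits once,
 * so keeping the high 32 bits of the 64-bit product gives
 * (a * w * 2^16) >> 32 == (a * w) >> 16. */
static int32_t mulw_high(int32_t a, int32_t w_hi)
{
    return (int32_t)(((int64_t)a * w_hi) >> 32);
}

/* Scalar model of a saturating doubling multiply returning the high half:
 * sat((2 * a * b) >> 32). Because of the doubling, w is pre-shifted by 15. */
static int32_t qdmulh_model(int32_t a, int32_t b)
{
    if (a == INT32_MIN && b == INT32_MIN)
        return INT32_MAX;                           /* the only overflowing input pair */
    return (int32_t)((2 * (int64_t)a * b) >> 32);
}

/* Returns 1 if the split, multiply-high and doubling forms all agree. */
static int forms_agree(int32_t a, int16_t w)
{
    int32_t ref    = mulw_split(a, w);
    int32_t w_hi16 = (int32_t)w * 65536;            /* w << 16 without signed-shift UB */
    int32_t w_hi15 = (int32_t)w * 32768;            /* w << 15 for the doubling form */
    return mulw_high(a, w_hi16) == ref && qdmulh_model(a, w_hi15) == ref;
}

/* Spot-check the identity over some edge-case inputs. */
static void check_forms(void)
{
    const int32_t as[] = { 0, 1, -1, 12345, -12345, INT32_MAX, INT32_MIN,
                           0x0001FFFF, -0x00010001 };
    const int16_t ws[] = { 0, 1, -1, 20000, -20000, INT16_MAX, INT16_MIN };
    for (size_t i = 0; i < sizeof as / sizeof as[0]; i++)
        for (size_t j = 0; j < sizeof ws / sizeof ws[0]; j++)
            assert(forms_agree(as[i], ws[j]));
}
```

Since both forms compute the same floor division by 2^16, the pre-shift can presumably be hoisted out of the sample loop, leaving one multiply per state update instead of two.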

[PATCH] Optimize silk_warped_autocorrelation_FIX() for ARM NEON

2017 Apr 13

...
> Here is my configure:
> ./configure --build x86_64-unknown-linux-gnu --host arm-linux-gnueabihf
> --disable-assertions --enable-fixed-point --enable-intrinsics CFLAGS=-O3
> --disable-shared
>
> The testing speech file may also change the speed results.
>
>
>> 1) In calc_state(), rather than splitting the multiply in two
>> instructions, you may be able to simply shift the warping left 16 bits,
>> then use the Neon instruction that does a*b>>32 (i.e. the one that
>> computes the top bits of a 32x32 multiply)
>>
>
> Done.
>
>
>...

[PATCH] Optimize silk_warped_autocorrelation_FIX() for ARM NEON

2017 Apr 06

...ity 8. It appears that the warped autocorrelation function itself is only faster by a factor of about 1.35. That's a bit surprising, considering I see nothing obviously wrong with the code. I'm not sure what the problem is, but here are a few things that may be worth considering:

1) In calc_state(), rather than splitting the multiply into two instructions, you may be able to simply shift the warping left 16 bits, then use the Neon instruction that does a*b>>32 (i.e. the one that computes the top bits of a 32x32 multiply).

2) If the problem is with the movs at the end of each iteration, t...
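
For context on what is being vectorized: a warped autocorrelation passes the input through a chain of first-order allpass sections and correlates the current sample with each section's output. The sketch below is a floating-point reference, loosely modeled on the structure of SILK's silk_warped_autocorrelation_FLP() but written with a simple (non-unrolled) inner loop; warped_autocorr_ref and MAX_ORDER are hypothetical names. The state update `state[i] + warping * (...)` is presumably the multiply that the fixed-point code, and the calc_state() comment, is concerned with.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_ORDER 24 /* hypothetical upper bound on the correlation order */

/* Floating-point warped autocorrelation (reference sketch).
 * corr[k] accumulates x[n] times the output of the k-th allpass section.
 * With warping == 0 every section is a one-sample delay, so the result
 * reduces to the ordinary autocorrelation sum x[n] * x[n - k]. */
static void warped_autocorr_ref(double *corr, const double *x, size_t length,
                                double warping, int order)
{
    double state[MAX_ORDER + 1] = { 0 };
    double C[MAX_ORDER + 1] = { 0 };
    assert(order >= 0 && order <= MAX_ORDER);

    for (size_t n = 0; n < length; n++) {
        double tmp1 = x[n];
        for (int i = 0; i < order; i++) {
            /* Output of allpass section i: delayed value plus warped correction */
            double tmp2 = state[i] + warping * (state[i + 1] - tmp1);
            state[i] = tmp1;
            C[i] += state[0] * tmp1; /* state[0] now holds x[n] */
            tmp1 = tmp2;
        }
        state[order] = tmp1;
        C[order] += state[0] * tmp1;
    }
    for (int i = 0; i <= order; i++)
        corr[i] = C[i];
}
```

For example, with x = {1, 2, 3, 4}, order 2 and warping 0, the sketch yields corr = {30, 20, 11}, the plain autocorrelation at lags 0, 1 and 2.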

[PATCH] Optimize silk_warped_autocorrelation_FIX() for ARM NEON

2017 Apr 05

Thanks, Jean-Marc! The speedup percentages are all relative to the entire encoder. Compared to master, this optimization patch speeds up the fixed-point SILK encoder on NEON as follows:

Complexity 5: 6.1%
Complexity 6: 5.8%
Complexity 8: 5.5%
Complexity 10: 4.0%

when testing on an Acer Chromebook, ARMv7 Processor rev 3 (v7l), CPU max MHz: 2116.5.

Thanks,
Linfeng

On Wed, Apr 5, 2017 at 11:02 AM,
