Displaying 4 results from an estimated 4 matches for "calc_stat".
2017 Apr 11
2
[PATCH] Optimize silk_warped_autocorrelation_FIX() for ARM NEON
...n with
complexity 8 on my Acer Chromebook.
Here is my configure:
./configure --build x86_64-unknown-linux-gnu --host arm-linux-gnueabihf
--disable-assertions --enable-fixed-point --enable-intrinsics CFLAGS=-O3
--disable-shared
The testing speech file may also change the speed results.
> 1) In calc_state(), rather than splitting the multiply in two
> instructions, you may be able to simply shift the warping left 16 bits,
> then use the Neon instruction that does a*b>>32 (i.e. the one that
> computes the top bits of a 32x32 multiply)
>
Done.
> 2) If the problem is with the m...
2017 Apr 13
0
[PATCH] Optimize silk_warped_autocorrelation_FIX() for ARM NEON
....
> Here is my configure:
> ./configure --build x86_64-unknown-linux-gnu --host arm-linux-gnueabihf
> --disable-assertions --enable-fixed-point --enable-intrinsics CFLAGS=-O3
> --disable-shared
>
> The testing speech file may also change the speed results.
>
>
>> 1) In calc_state(), rather than splitting the multiply in two
>> instructions, you may be able to simply shift the warping left 16 bits,
>> then use the Neon instruction that does a*b>>32 (i.e. the one that
>> computes the top bits of a 32x32 multiply)
>>
>
> Done.
>
>
>...
2017 Apr 06
0
[PATCH] Optimize silk_warped_autocorrelation_FIX() for ARM NEON
...ity 8. It
appears that the warped autocorrelation function itself is only faster
by a factor of about 1.35. That's a bit surprising considering I see
nothing obviously wrong with the code.
I'm not sure what the problem is, but here are a few things that may be
worth considering:
1) In calc_state(), rather than splitting the multiply in two
instructions, you may be able to simply shift the warping left 16 bits,
then use the Neon instruction that does a*b>>32 (i.e. the one that
computes the top bits of a 32x32 multiply)
2) If the problem is with the movs at the end of each iteration,...
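
Suggestion 1) above can be checked in plain C before touching any intrinsics. The sketch below is not taken from the patch: the function names are hypothetical, and it assumes the warping coefficient fits in 16 bits and that signed right shifts are arithmetic (true for mainstream compilers, but implementation-defined in C). It compares the "multiply, then shift by 16" form against the "pre-shift the coefficient left by 16 bits, then keep the top 32 bits of the 32x32 product" form.

```c
#include <stdint.h>

/* Two-step form: 32-bit value times 16-bit coefficient, product >> 16.
   Assumes arithmetic right shift for negative values. */
static int32_t mul_q16_split(int32_t a, int16_t w)
{
    return (int32_t)(((int64_t)a * w) >> 16);
}

/* One-step form: move the coefficient into the upper 16 bits once,
   then keep only the top 32 bits of the 64-bit product. */
static int32_t mul_q16_high(int32_t a, int16_t w)
{
    /* Multiply instead of shifting so negative w is well defined. */
    int32_t w_shifted = (int32_t)w * 65536;
    return (int32_t)(((int64_t)a * w_shifted) >> 32);
}

/* Counts disagreements between the two forms over a coarse sample
   of the input range; 0 means they matched everywhere sampled. */
static int count_mismatches(void)
{
    int mismatches = 0;
    for (int64_t a = INT32_MIN; a <= INT32_MAX; a += 9999991) {
        for (int32_t w = -32768; w <= 32767; w += 257) {
            if (mul_q16_split((int32_t)a, (int16_t)w) !=
                mul_q16_high((int32_t)a, (int16_t)w)) {
                mismatches++;
            }
        }
    }
    return mismatches;
}
```

The two forms agree because (a * w * 2^16) >> 32 is the same floor division as (a * w) >> 16. On NEON, the closest intrinsic I am aware of is vqdmulhq_s32, which returns the high half of the doubled product with saturation, so the pre-shift amount would need adjusting for it; that mapping is my assumption, not something stated in the thread.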
2017 Apr 05
4
[PATCH] Optimize silk_warped_autocorrelation_FIX() for ARM NEON
Thanks, Jean-Marc!
The speedup percentages are all relative to the entire encoder.
Compared to master, this optimization patch speeds up the fixed-point SILK
encoder on NEON as follows:
Complexity 5: 6.1%
Complexity 6: 5.8%
Complexity 8: 5.5%
Complexity 10: 4.0%
when testing on an Acer Chromebook, ARMv7 Processor rev 3 (v7l), CPU max
MHz: 2116.5
Thanks,
Linfeng
On Wed, Apr 5, 2017 at 11:02 AM,