similar to: ASM runtime detection and optimizations

Displaying 20 results from an estimated 600 matches similar to: "ASM runtime detection and optimizations"

2010 Nov 16
1
Possible bug in "pitch_downsample"
Hi Jean-Marc, I could be way off base here, but it seems to me that line 115 in pitch.c in the function "pitch_downsample": x_lp[i] = SHR32(HALF32(HALF32(x[1][(2*i-1)]+x[1][(2*i+1)])+x[1][2*i]), SIG_SHIFT+2); should actually be: x_lp[i] += SHR32(HALF32(HALF32(x[1][(2*i-1)]+x[1][(2*i+1)])+x[1][2*i]), SIG_SHIFT+2); Sorry if I'm totally misreading things and just
2016 Sep 13
4
[PATCH 12/15] Replace call of celt_inner_prod_c() (step 1)
Should call celt_inner_prod(). --- celt/bands.c | 7 ++++--- celt/bands.h | 2 +- celt/celt_encoder.c | 6 +++--- celt/pitch.c | 2 +- src/opus_multistream_encoder.c | 2 +- 5 files changed, 10 insertions(+), 9 deletions(-) diff --git a/celt/bands.c b/celt/bands.c index bbe8a4c..1ab24aa 100644 --- a/celt/bands.c +++ b/celt/bands.c
2016 Jun 17
5
ARM NEON optimization -- celt_fir()
Hi all, This is Linfeng Zhang from Google. I'll work on ARM NEON optimization in the next few months. I'm submitting 2 patches in the following couple of emails, which have the new created celt_fir_neon(). I revised celt_fir_c() to not pass in argument "mem" in Patch 1. If there are concerns to this change, please let me know. Many thanks to your comments. Linfeng Zhang
2015 Nov 05
0
AVX Optimizations
Velea, Radu wrote: > Yes, > > Thank you. I'll follow up with the AVX code and tests for pitch code. Actually, I lied. Because you update opus_select_arch(), you can now return a value for arch (4) that is larger than the maximum we currently support (3). This doesn't actually cause failures, because we mask with OPUS_ARCHMASK, but it does mean that a CPU with AVX will invoke
2016 Jul 14
0
[PATCH 2/5] Optimize fixed-point celt_fir_c() for ARM NEON
Create the fixed-point intrinsics optimization celt_fir_neon() for ARM NEON. Create test tests/test_unit_optimization to unit test the optimization. --- .gitignore | 1 + Makefile.am | 39 ++++- celt/arm/arm_celt_map.c | 17 +++ celt/arm/celt_lpc_arm.h | 65 ++++++++ celt/arm/celt_lpc_neon_intr.c
2013 May 21
0
[PATCH] 02-
- Use MAC16_16 macros instead of (sum += a*b) and unroll a loop by 2. It increase performance when using optimized macros (ex: ARMv5E). A possible side effect of loop unroll is that i don't check for odd length here. - Add NEON version of FIR filter and autocorr -- Aur?lien Zanelli Parrot SA 174, quai de Jemmapes 75010 Paris France -------------- next part -------------- diff --git
2015 Nov 05
2
AVX Optimizations
Sorry. I missed that. Good observation. Please go ahead and correct the patch. Thanks, Radu -----Original Message----- From: opus-bounces at xiph.org [mailto:opus-bounces at xiph.org] On Behalf Of Timothy B. Terriberry Sent: Thursday, November 5, 2015 11:08 AM To: opus at xiph.org Subject: Re: [opus] AVX Optimizations Velea, Radu wrote: > Yes, > > Thank you. I'll follow up with
2013 May 21
2
[PATCH] 02-Add CELT filter optimizations
Please ignore my previous mail and patch, there is a new version :). Patch changes are: - Use MAC16_16 macros instead of (sum += a*b) and unroll a loop by 2. It increase performance when using optimized macros (ex: ARMv5E). A possible side effect of loop unroll is that i don't check for odd length here. - Add NEON version of FIR filter and autocorr - Add a section in autoconf in order to
2015 Mar 13
1
[RFC PATCH v3] Intrinsics/RTCD related fixes. Mostly x86.
From: Jonathan Lennox <jonathan at vidyo.com> * Makes ?enable-intrinsics work with clang and other non-GCC compilers * Enables RTCD for the floating-point-mode SSE code in Celt. * Disables use of RTCD in cases where the compiler targets an instruction set by default. * Enables the SSE4.1 Silk optimizations that apply to the common parts of Silk when Opus is built in floating-point mode, not
2015 Mar 12
1
[RFC PATCHv2] Intrinsics/RTCD related fixes. Mostly x86.
From: Jonathan Lennox <jonathan at vidyo.com> * Makes ?enable-intrinsics work with clang and other non-GCC compilers * Enables RTCD for the floating-point-mode SSE code in Celt. * Disables use of RTCD in cases where the compiler targets an instruction set by default. * Enables the SSE4.1 Silk optimizations that apply to the common parts of Silk when Opus is built in floating-point mode, not
2015 Nov 05
2
AVX Optimizations
Yes, Thank you. I'll follow up with the AVX code and tests for pitch code. Radu -----Original Message----- From: opus-bounces at xiph.org [mailto:opus-bounces at xiph.org] On Behalf Of Timothy B. Terriberry Sent: Thursday, November 5, 2015 10:31 AM To: opus at xiph.org Subject: Re: [opus] AVX Optimizations Velea, Radu wrote: > I've created a pull request[1] to enable configuration
2016 Jul 14
6
Several patches of ARM NEON optimization
I rebased my previous 3 patches to the current master with minor changes. Patches 1 to 3 replace all my previous submitted patches. Patches 4 and 5 are new. Thanks, Linfeng Zhang
2015 Nov 21
0
[Aarch64 v2 08/18] Add Neon fixed-point implementation of xcorr_kernel.
Used for celt_pitch_xcorr on aarch64, and celt_fir and celt_iir on both armv7 and aarch64. --- celt/arm/arm_celt_map.c | 17 +++++++++++++ celt/arm/celt_neon_intr.c | 61 ++++++++++++++++++++++++++++++++++++++++++++++- celt/arm/pitch_arm.h | 31 +++++++++++++++++++++++- 3 files changed, 107 insertions(+), 2 deletions(-) diff --git a/celt/arm/arm_celt_map.c b/celt/arm/arm_celt_map.c index
2013 Jun 07
2
Bug fix in celt_lpc.c and some xcorr_kernel optimizations
Hi JM, At line 221 in celt_lpc.c (the celt_iir function) I think you really want the RESTORE_STACK statement to be before the #endif instead of after it. Also, I couldn't help notice that your SSE code for xcorr_kernel reads more than "len" elements of "_x". I don't know if that's really a problem when running the codec, but a tool like valgrind will have a
2013 Jun 07
0
Bug fix in celt_lpc.c and some xcorr_kernel optimizations
Hi John, Thanks for the two fixes. They're in git now. Your SSE version seems to also be slightly faster than mine -- probably due the the partial sums. As for the NEON code, it would be good to compare the performance with the code Aur?lien Zanelli posted at http://darkosphere.fr/public/0002-Add-optimized-NEON-version-of-celt_fir-celt_iir-and-.patch Cheers, Jean-Marc On 06/06/2013 08:07
2013 Jun 07
1
Bug fix in celt_lpc.c and some xcorr_kernel optimizations
Unfortunately I don't have a setup that lets me easily profile ARM code, so I really can't tell which method is faster (though I suspect Mr. Zanelli's code is). Let me offer up another intrinsic version of the NEON xcorr_kernel that is almost identical to the SSE version, and more in line with Mr. Zanelli's code: static inline void xcorr_kernel_neon(const opus_val16 *x, const
2015 Aug 05
0
[PATCH 2/8] Reorganize pitch_arm.h, so RTCD works for intrinsics functions as well.
--- celt/arm/arm_celt_map.c | 24 +++++++++++- celt/arm/pitch_arm.h | 97 +++++++++++++++++++++++++++++++++---------------- 2 files changed, 88 insertions(+), 33 deletions(-) diff --git a/celt/arm/arm_celt_map.c b/celt/arm/arm_celt_map.c index 0c9acff..cc6b706 100644 --- a/celt/arm/arm_celt_map.c +++ b/celt/arm/arm_celt_map.c @@ -94,9 +94,14 @@ void (*const
2015 Mar 02
13
Patch cleaning up Opus x86 intrinsics configury
The attached patch cleans up Opus's x86 intrinsics configury. It: * Makes ?enable-intrinsics work with clang and other non-GCC compilers * Enables RTCD for the floating-point-mode SSE code in Celt. * Disables use of RTCD in cases where the compiler targets an instruction set by default. * Enables the SSE4.1 Silk optimizations that apply to the common parts of Silk when Opus is built in
2015 Aug 05
0
[PATCH 1/8] Move ARM-specific macro overrides to arm-specific file.
--- celt/arm/pitch_arm.h | 19 +++++++++++++++++++ celt/pitch.h | 19 ------------------- 2 files changed, 19 insertions(+), 19 deletions(-) diff --git a/celt/arm/pitch_arm.h b/celt/arm/pitch_arm.h index d5c9408..fe76f8d 100644 --- a/celt/arm/pitch_arm.h +++ b/celt/arm/pitch_arm.h @@ -75,4 +75,23 @@ void celt_pitch_xcorr_float_neon(const opus_val16 *_x, const opus_val16 *_y, #endif
2015 Dec 08
2
[Aarch64 v2 02/18] Reorganize ARM CPU #ifdefs.
Jonathan Lennox wrote: > -# if defined(FIXED_POINT) > +# if defined(FIXED_POINT) && \ > + ((defined(OPUS_ARM_MAY_HAVE_NEON) && !defined(OPUS_ARM_PRESUME_NEON)) || \ > + (defined(OPUS_ARM_MAY_HAVE_MEDIA) && !defined(OPUS_ARM_PRESUME_MEDIA)) || \ > + (defined(OPUS_ARM_MAY_HAVE_EDSP) && !defined(OPUS_ARM_PRESUME_EDSP))) > opus_val32 (*const