Displaying 20 results from an estimated 600 matches similar to: "ASM runtime detection and optimizations"
2010 Nov 16
1
Possible bug in "pitch_downsample"
Hi Jean-Marc,
I could be way off base here, but it seems to me that line 115 in
pitch.c in the function "pitch_downsample":
x_lp[i] =
SHR32(HALF32(HALF32(x[1][(2*i-1)]+x[1][(2*i+1)])+x[1][2*i]), SIG_SHIFT+2);
should actually be:
x_lp[i] +=
SHR32(HALF32(HALF32(x[1][(2*i-1)]+x[1][(2*i+1)])+x[1][2*i]), SIG_SHIFT+2);
Sorry if I'm totally misreading things and just
2016 Sep 13
4
[PATCH 12/15] Replace call of celt_inner_prod_c() (step 1)
Should call celt_inner_prod().
---
celt/bands.c | 7 ++++---
celt/bands.h | 2 +-
celt/celt_encoder.c | 6 +++---
celt/pitch.c | 2 +-
src/opus_multistream_encoder.c | 2 +-
5 files changed, 10 insertions(+), 9 deletions(-)
diff --git a/celt/bands.c b/celt/bands.c
index bbe8a4c..1ab24aa 100644
--- a/celt/bands.c
+++ b/celt/bands.c
2016 Jun 17
5
ARM NEON optimization -- celt_fir()
Hi all,
This is Linfeng Zhang from Google. I'll work on ARM NEON optimization in the
next few months.
I'm submitting 2 patches in the following couple of emails, which have the new
created celt_fir_neon().
I revised celt_fir_c() to not pass in argument "mem" in Patch 1. If there are
concerns to this change, please let me know.
Many thanks to your comments.
Linfeng Zhang
2015 Nov 05
0
AVX Optimizations
Velea, Radu wrote:
> Yes,
>
> Thank you. I'll follow up with the AVX code and tests for pitch code.
Actually, I lied. Because you update opus_select_arch(), you can now
return a value for arch (4) that is larger than the maximum we currently
support (3). This doesn't actually cause failures, because we mask with
OPUS_ARCHMASK, but it does mean that a CPU with AVX will invoke
2016 Jul 14
0
[PATCH 2/5] Optimize fixed-point celt_fir_c() for ARM NEON
Create the fixed-point intrinsics optimization celt_fir_neon() for ARM NEON.
Create test tests/test_unit_optimization to unit test the optimization.
---
.gitignore | 1 +
Makefile.am | 39 ++++-
celt/arm/arm_celt_map.c | 17 +++
celt/arm/celt_lpc_arm.h | 65 ++++++++
celt/arm/celt_lpc_neon_intr.c
2013 May 21
0
[PATCH] 02-
- Use MAC16_16 macros instead of (sum += a*b) and unroll a loop by 2. It
increase performance when using optimized macros (ex: ARMv5E). A
possible side effect of loop unroll is that i don't check for odd length
here.
- Add NEON version of FIR filter and autocorr
--
Aur?lien Zanelli
Parrot SA
174, quai de Jemmapes
75010 Paris
France
-------------- next part --------------
diff --git
2015 Nov 05
2
AVX Optimizations
Sorry. I missed that. Good observation.
Please go ahead and correct the patch.
Thanks,
Radu
-----Original Message-----
From: opus-bounces at xiph.org [mailto:opus-bounces at xiph.org] On Behalf Of Timothy B. Terriberry
Sent: Thursday, November 5, 2015 11:08 AM
To: opus at xiph.org
Subject: Re: [opus] AVX Optimizations
Velea, Radu wrote:
> Yes,
>
> Thank you. I'll follow up with
2013 May 21
2
[PATCH] 02-Add CELT filter optimizations
Please ignore my previous mail and patch, there is a new version :).
Patch changes are:
- Use MAC16_16 macros instead of (sum += a*b) and unroll a loop by 2. It
increase performance when using optimized macros (ex: ARMv5E). A
possible side effect of loop unroll is that i don't check for odd length
here.
- Add NEON version of FIR filter and autocorr
- Add a section in autoconf in order to
2015 Mar 13
1
[RFC PATCH v3] Intrinsics/RTCD related fixes. Mostly x86.
From: Jonathan Lennox <jonathan at vidyo.com>
* Makes ?enable-intrinsics work with clang and other non-GCC compilers
* Enables RTCD for the floating-point-mode SSE code in Celt.
* Disables use of RTCD in cases where the compiler targets an instruction set by default.
* Enables the SSE4.1 Silk optimizations that apply to the common parts of Silk when Opus is built in floating-point mode, not
2015 Mar 12
1
[RFC PATCHv2] Intrinsics/RTCD related fixes. Mostly x86.
From: Jonathan Lennox <jonathan at vidyo.com>
* Makes ?enable-intrinsics work with clang and other non-GCC compilers
* Enables RTCD for the floating-point-mode SSE code in Celt.
* Disables use of RTCD in cases where the compiler targets an instruction set by default.
* Enables the SSE4.1 Silk optimizations that apply to the common parts of Silk when Opus is built in floating-point mode, not
2015 Nov 05
2
AVX Optimizations
Yes,
Thank you. I'll follow up with the AVX code and tests for pitch code.
Radu
-----Original Message-----
From: opus-bounces at xiph.org [mailto:opus-bounces at xiph.org] On Behalf Of Timothy B. Terriberry
Sent: Thursday, November 5, 2015 10:31 AM
To: opus at xiph.org
Subject: Re: [opus] AVX Optimizations
Velea, Radu wrote:
> I've created a pull request[1] to enable configuration
2016 Jul 14
6
Several patches of ARM NEON optimization
I rebased my previous 3 patches to the current master with minor changes.
Patches 1 to 3 replace all my previous submitted patches.
Patches 4 and 5 are new.
Thanks,
Linfeng Zhang
2015 Nov 21
0
[Aarch64 v2 08/18] Add Neon fixed-point implementation of xcorr_kernel.
Used for celt_pitch_xcorr on aarch64, and celt_fir and celt_iir on both armv7 and aarch64.
---
celt/arm/arm_celt_map.c | 17 +++++++++++++
celt/arm/celt_neon_intr.c | 61 ++++++++++++++++++++++++++++++++++++++++++++++-
celt/arm/pitch_arm.h | 31 +++++++++++++++++++++++-
3 files changed, 107 insertions(+), 2 deletions(-)
diff --git a/celt/arm/arm_celt_map.c b/celt/arm/arm_celt_map.c
index
2013 Jun 07
2
Bug fix in celt_lpc.c and some xcorr_kernel optimizations
Hi JM,
At line 221 in celt_lpc.c (the celt_iir function) I think you really
want the RESTORE_STACK statement to be before the #endif instead of
after it. Also, I couldn't help notice that your SSE code for
xcorr_kernel reads more than "len" elements of "_x". I don't know if
that's really a problem when running the codec, but a tool like valgrind
will have a
2013 Jun 07
0
Bug fix in celt_lpc.c and some xcorr_kernel optimizations
Hi John,
Thanks for the two fixes. They're in git now. Your SSE version seems to
also be slightly faster than mine -- probably due the the partial sums.
As for the NEON code, it would be good to compare the performance with
the code Aur?lien Zanelli posted at
http://darkosphere.fr/public/0002-Add-optimized-NEON-version-of-celt_fir-celt_iir-and-.patch
Cheers,
Jean-Marc
On 06/06/2013 08:07
2013 Jun 07
1
Bug fix in celt_lpc.c and some xcorr_kernel optimizations
Unfortunately I don't have a setup that lets me easily profile ARM code,
so I really can't tell which method is faster (though I suspect Mr.
Zanelli's code is). Let me offer up another intrinsic version of the
NEON xcorr_kernel that is almost identical to the SSE version, and more
in line with Mr. Zanelli's code:
static inline void xcorr_kernel_neon(const opus_val16 *x, const
2015 Aug 05
0
[PATCH 2/8] Reorganize pitch_arm.h, so RTCD works for intrinsics functions as well.
---
celt/arm/arm_celt_map.c | 24 +++++++++++-
celt/arm/pitch_arm.h | 97 +++++++++++++++++++++++++++++++++----------------
2 files changed, 88 insertions(+), 33 deletions(-)
diff --git a/celt/arm/arm_celt_map.c b/celt/arm/arm_celt_map.c
index 0c9acff..cc6b706 100644
--- a/celt/arm/arm_celt_map.c
+++ b/celt/arm/arm_celt_map.c
@@ -94,9 +94,14 @@ void (*const
2015 Mar 02
13
Patch cleaning up Opus x86 intrinsics configury
The attached patch cleans up Opus's x86 intrinsics configury.
It:
* Makes ?enable-intrinsics work with clang and other non-GCC compilers
* Enables RTCD for the floating-point-mode SSE code in Celt.
* Disables use of RTCD in cases where the compiler targets an instruction set by default.
* Enables the SSE4.1 Silk optimizations that apply to the common parts of Silk when Opus is built in
2015 Aug 05
0
[PATCH 1/8] Move ARM-specific macro overrides to arm-specific file.
---
celt/arm/pitch_arm.h | 19 +++++++++++++++++++
celt/pitch.h | 19 -------------------
2 files changed, 19 insertions(+), 19 deletions(-)
diff --git a/celt/arm/pitch_arm.h b/celt/arm/pitch_arm.h
index d5c9408..fe76f8d 100644
--- a/celt/arm/pitch_arm.h
+++ b/celt/arm/pitch_arm.h
@@ -75,4 +75,23 @@ void celt_pitch_xcorr_float_neon(const opus_val16 *_x, const opus_val16 *_y,
#endif
2015 Dec 08
2
[Aarch64 v2 02/18] Reorganize ARM CPU #ifdefs.
Jonathan Lennox wrote:
> -# if defined(FIXED_POINT)
> +# if defined(FIXED_POINT) && \
> + ((defined(OPUS_ARM_MAY_HAVE_NEON) && !defined(OPUS_ARM_PRESUME_NEON)) || \
> + (defined(OPUS_ARM_MAY_HAVE_MEDIA) && !defined(OPUS_ARM_PRESUME_MEDIA)) || \
> + (defined(OPUS_ARM_MAY_HAVE_EDSP) && !defined(OPUS_ARM_PRESUME_EDSP)))
> opus_val32 (*const