thr3ads.net - similar to: "[PATCH] Add Aarch64 intrinsic for SIG2WORD16."

Displaying 20 results from an estimated 400 matches similar to: "[PATCH] Add Aarch64 intrinsic for SIG2WORD16."

[PATCH 3/3] Add Aarch64 intrinsic for SIG2WORD16.

2015 Nov 19


--- celt/arch.h | 4 +++- celt/arm/fixed_arm64.h | 35 +++++++++++++++++++++++++++++++++++ celt_headers.mk | 1 + 3 files changed, 39 insertions(+), 1 deletion(-) create mode 100644 celt/arm/fixed_arm64.h diff --git a/celt/arch.h b/celt/arch.h index 670527b..9a06359 100644 --- a/celt/arch.h +++ b/celt/arch.h @@ -123,7 +123,9 @@ static OPUS_INLINE opus_int16 SAT16(opus_int32

[PATCH 4/8] Arm64 assembly for Celt fixed-point math.

2015 Aug 05


--- celt/arch.h | 2 ++ celt/arm/fixed_arm64.h | 75 ++++++++++++++++++++++++++++++++++++++++++++++++++ celt_headers.mk | 1 + 3 files changed, 78 insertions(+) create mode 100644 celt/arm/fixed_arm64.h diff --git a/celt/arch.h b/celt/arch.h index 9f74ddd..219569b 100644 --- a/celt/arch.h +++ b/celt/arch.h @@ -122,6 +122,8 @@ static OPUS_INLINE opus_int16 SAT16(opus_int32 x)

[Aarch64 06/11] Add aarch64 assembly for Celt fixed-point math.

2015 Nov 07


[PATCH 1/3] Add configure check for Aarch64-specific Neon intrinsics.

2015 Nov 19


--- configure.ac | 20 ++++++++++++++++++++ 1 file changed, 20 insertions(+) diff --git a/configure.ac b/configure.ac index 90a06c8..adcb969 100644 --- a/configure.ac +++ b/configure.ac @@ -503,6 +503,26 @@ AS_IF([test x"$enable_intrinsics" = x"yes"],[ [rtcd_support="$rtcd_support (NE10)"]) ]) + OPUS_CHECK_INTRINSICS( +

[Aarch64 v2 10/18] Clean up some intrinsics-related wording in configure.

2015 Nov 21


--- configure.ac | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/configure.ac b/configure.ac index f52d2c2..e1a6e9b 100644 --- a/configure.ac +++ b/configure.ac @@ -190,7 +190,7 @@ AC_ARG_ENABLE([rtcd], [enable_rtcd=yes]) AC_ARG_ENABLE([intrinsics], - [AS_HELP_STRING([--disable-intrinsics], [Disable intrinsics optimizations for ARM(float) X86(fixed)])],, +

[RFC PATCHv1] armv7: celt_pitch_xcorr: Introduce ARM neon intrinsics

2014 Nov 21


Optimize celt_pitch_xcorr function (for floating point) using ARM NEON intrinsics for SoCs that have NEON VFP unit. As initial step, targeting ARMv7 NEON (VFP3+) based SoCs. To enable this optimization, use --enable-arm-neon-intrinsics configure option. This flag is not enabled by default. Compile time and runtime checks are also supported to make sure this optimization is only enabled when the

[PATCH 2/5] Optimize fixed-point celt_fir_c() for ARM NEON

2016 Jul 14


Create the fixed-point intrinsics optimization celt_fir_neon() for ARM NEON. Create test tests/test_unit_optimization to unit test the optimization. --- .gitignore | 1 + Makefile.am | 39 ++++- celt/arm/arm_celt_map.c | 17 +++ celt/arm/celt_lpc_arm.h | 65 ++++++++ celt/arm/celt_lpc_neon_intr.c

[PATCH 0/8] Patches for arm64 (aarch64) support

2015 Aug 05


This sequence of patches provides arm64 support for Opus. Tested on iOS, Android, and Ubuntu 14.04. The patch sequence was written on top of Viswanath Puttagunta's Ne10 patches, but all but the second ("Reorganize pitch_arm.h") should, I think, apply independently of it. It does depend on my previous intrinsics configury reorganization, however. Comments welcome. With this and

[Aarch64 00/11] Patches to enable Aarch64 (arm64) optimizations, rebased to current master.

2015 Nov 07


Here are my aarch64 patches rebased to the current tip of Opus master. They're largely the same as my previous patch set, with the addition of the final one (the Neon fixed-point implementation of xcorr_kernel). This replaces Viswanath's Neon fixed-point celt_pitch_xcorr, since xcorr_kernel is used in celt_fir and celt_iir as well. These have been tested for correctness under qemu

[PATCH] Add Aarch64 intrinsic for SIG2WORD16.

2015 Nov 21


Jonathan Lennox wrote: > Fixed definition of SIG2WORD16, thanks to John Ridges. To answer your earlier question from IRC, yes, resending the whole series would probably be helpful (I'm assuming this replaces one of the previous patches).

[Aarch64 00/11] Patches to enable Aarch64

2015 Nov 19


> On Nov 16, 2015, at 4:42 PM, Jonathan Lennox <jonathan at vidyo.com> wrote: > > I haven't yet tried replacing SIG2WORD16 (or silk_ADD_SAT32/silk_SUB_SAT32) with Neon intrinsics. That's an obvious next step. This doesn't show any appreciable speed difference in my tests, but the code is obviously better by inspection (all three of these map directly to a single Aarch64
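The saturating add and subtract mentioned in this snippet can be illustrated with a minimal sketch. It is not the code from the patch series: the function names add_sat32, sub_sat32 and their _c variants are hypothetical, and the only library calls assumed are the AArch64 scalar intrinsics vqadds_s32 and vqsubs_s32 from arm_neon.h. On other targets the sketch falls back to plain C that widens to 64 bits and clamps.

```c
#include <stdint.h>

#if defined(__aarch64__)
#include <arm_neon.h>
#endif

/* Portable reference: widen to 64 bits, then clamp to the int32 range. */
static int32_t add_sat32_c(int32_t a, int32_t b)
{
    int64_t sum = (int64_t)a + (int64_t)b;
    if (sum > INT32_MAX) return INT32_MAX;
    if (sum < INT32_MIN) return INT32_MIN;
    return (int32_t)sum;
}

static int32_t sub_sat32_c(int32_t a, int32_t b)
{
    int64_t diff = (int64_t)a - (int64_t)b;
    if (diff > INT32_MAX) return INT32_MAX;
    if (diff < INT32_MIN) return INT32_MIN;
    return (int32_t)diff;
}

/* Hypothetical wrappers: use the scalar saturating intrinsics on AArch64,
   and the portable C versions everywhere else. */
static int32_t add_sat32(int32_t a, int32_t b)
{
#if defined(__aarch64__)
    return vqadds_s32(a, b);
#else
    return add_sat32_c(a, b);
#endif
}

static int32_t sub_sat32(int32_t a, int32_t b)
{
#if defined(__aarch64__)
    return vqsubs_s32(a, b);
#else
    return sub_sat32_c(a, b);
#endif
}
```

Both paths should agree on every input, so the C versions can serve as a reference when checking the intrinsic path under an emulator.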

[RFC V3 7/8] armv7, armv8: Optimize fixed point fft using NE10 library

2015 May 15


Uses NEON optimized fixed point fft routines in NE10 library Signed-off-by: Viswanath Puttagunta <viswanath.puttagunta at linaro.org> Signed-off-by: Jonathan Lennox <jonathan at vidyo.com> --- Makefile.am | 12 +- celt/arm/arm_celt_map.c | 46 ++-- celt/arm/celt_ne10_fft.c | 98 +++++---- celt/arm/fft_arm.h |

[RFC PATCH v1] arm: kf_bfly4: Introduce ARM neon intrinsics

2014 Nov 09


Optimize kf_bfly4 function using ARM NEON intrinsics for SoCs that have NEON VFP unit. As an initial step, only targeting ARMv7-VFP based SoCs. To enable this optimization, use --enable-armv7-neon-float when running configure command. This is disabled by default. --- Makefile.am | 16 ++++ celt/_kiss_fft_guts.h | 13 +++ celt/arm/kiss_fft_neon.c | 211

[Aarch64 00/11] Patches to enable Aarch64

2015 Nov 13


Thanks, I look forward to seeing what you find out. BTW, I was wondering if you tried replacing the SIG2WORD16 macro using the vqmovns_s32 intrinsic? I'm sure it would be faster than the C code, but in the grand scheme of things it might not make much difference. On 11/13/2015 12:15 PM, Jonathan Lennox wrote: >> On Nov 13, 2015, at 1:51 PM, John Ridges <jridges at masque.com>

[RFC PATCH v1 2/2] armv7(float): Optimize encode usecase using NE10 library

2015 Jan 20


Optimize opus encode (float only) usecase using ARM NE10 library. Mainly affects opus_fft and clt_mdct_forward and related functions. This optimization can be used for ARM CPUs that have NEON VFP unit. This patch only enables optimizations for ARMv7. Official ARM NE10 library page available at http://projectne10.github.io/Ne10/ To enable this optimization, use --enable-intrinsics

[RFC PATCH v1 2/2] armv7(float): Optimize encode usecase using NE10 library

2015 Jan 29


Hi Timothy, Appreciate the comprehensive code review. The biggest issue I see is the peak stack usage.... rest looks like fairly straightforward cleanup. Is the peak stack usage a complete blocker in current form? If it is indeed a blocker, would it be acceptable if we can reduce additional buffer requirement from 2 buffers (current) to 1, possibly by moving scaling inside

[RFC PATCH v1 2/2] armv7(float): Optimize encode usecase using NE10 library

2015 Jan 29


Viswanath Puttagunta wrote: > if OPUS_ARM_NEON_INTR > CELT_ARM_NEON_INTR_OBJ = $(CELT_SOURCES_ARM_NEON_INTR:.c=.lo) \ > - %test_unit_rotation.o %test_unit_mathops.o > -$(CELT_ARM_NEON_INTR_OBJ): CFLAGS += $(OPUS_ARM_NEON_INTR_CPPFLAGS) > + $(CELT_SOURCES_ARM_NE10:.c=.lo) \ > + %test_unit_rotation.o %test_unit_mathops.o \ > +

[Aarch64 00/11] Patches to enable Aarch64

2015 Nov 19


Any speedup from the intrinsics may just be swamped by the rest of the encode/decode process. But I think you really want SIG2WORD16 to be (vqmovns_s32(PSHR32((x), SIG_SHIFT))) On 11/19/2015 2:52 PM, Jonathan Lennox wrote: >> On Nov 16, 2015, at 4:42 PM, Jonathan Lennox <jonathan at vidyo.com> wrote: >> >> I haven't yet tried replacing SIG2WORD16 (or
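The definition suggested in this message, a rounding shift followed by a saturating narrow, can be sketched roughly as follows. This is an illustration under stated assumptions, not the patch itself: the helper names pshr32, sig2word16_c and sig2word16 are hypothetical stand-ins for the Opus macros, SIG_SHIFT is assumed to be 12, and the portable path clamps to the 16-bit range by hand. Only vqmovns_s32 (from arm_neon.h on AArch64) is taken from the message.

```c
#include <stdint.h>

#if defined(__aarch64__)
#include <arm_neon.h>
#endif

/* Assumption: the signal shift is 12 bits. */
#define SIG_SHIFT 12

/* Rounding right shift, standing in for PSHR32: add half of the step,
   then shift. Relies on arithmetic right shift of negative values, which
   is implementation-defined in C but is what common compilers provide. */
static int32_t pshr32(int32_t x, int shift)
{
    return (x + (((int32_t)1 << shift) >> 1)) >> shift;
}

/* Portable version: shift with rounding, then clamp to [-32768, 32767]. */
static int16_t sig2word16_c(int32_t x)
{
    x = pshr32(x, SIG_SHIFT);
    if (x < -32768) x = -32768;
    if (x > 32767) x = 32767;
    return (int16_t)x;
}

/* On AArch64 the clamp and narrow collapse into one saturating narrow. */
static int16_t sig2word16(int32_t x)
{
#if defined(__aarch64__)
    return vqmovns_s32(pshr32(x, SIG_SHIFT));
#else
    return sig2word16_c(x);
#endif
}
```

The order matters: the shift has to happen before the saturating narrow, otherwise almost every input would saturate before being scaled down.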

[Aarch64 v2 00/18] Patches to enable Aarch64 (version 2)

2015 Nov 21


As promised, here's a re-send of all my Aarch64 patches, following comments by John Ridges. Note that they actually affect more than just Aarch64 -- other than the ones specifically guarded by AARCH64_NEON defines, the Neon intrinsics all also apply on armv7; and the OPUS_FAST_INT64 patches apply on any 64-bit machine. The patches should largely be independent and independently useful, other

[Aarch64 00/11] Patches to enable Aarch64

2015 Nov 20


> On Nov 19, 2015, at 5:47 PM, John Ridges <jridges at masque.com> wrote: > > Any speedup from the intrinsics may just be swamped by the rest of the encode/decode process. But I think you really want SIG2WORD16 to be (vqmovns_s32(PSHR32((x), SIG_SHIFT))) Yes, you're right. I forgot to run the vectors under qemu with my previous version (oh, the embarrassment!) Fixed forthcoming
