thr3ads.net - similar to: "[Fast Int64 1/4] Move OPUS_FAST

Displaying 20 results from an estimated 1000 matches similar to: "[Fast Int64 1/4] Move OPUS_FAST_INT64 definition to celt/arch.h."

[Aarch64 v2 10/18] Clean up some intrinsics-related wording in configure.

2015 Nov 21

[Aarch64 v2 10/18] Clean up some intrinsics-related wording in configure.

--- configure.ac | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/configure.ac b/configure.ac index f52d2c2..e1a6e9b 100644 --- a/configure.ac +++ b/configure.ac @@ -190,7 +190,7 @@ AC_ARG_ENABLE([rtcd], [enable_rtcd=yes]) AC_ARG_ENABLE([intrinsics], - [AS_HELP_STRING([--disable-intrinsics], [Disable intrinsics optimizations for ARM(float) X86(fixed)])],, +

[PATCH 0/8] Patches for arm64 (aarch64) support

2015 Aug 05

[PATCH 0/8] Patches for arm64 (aarch64) support

This sequence of patches provides arm64 support for Opus. Tested on iOS, Android, and Ubuntu 14.04. The patch sequence was written on top of Viswanath Puttagunta's Ne10 patches, but all but the second ("Reorganize pitch_arm.h") should, I think, apply independently of it. It does depends on my previous intrinsics configury reorganization, however. Comments welcome. With this and

[Aarch64 00/11] Patches to enable Aarch64 (arm64) optimizations, rebased to current master.

2015 Nov 07

[Aarch64 00/11] Patches to enable Aarch64 (arm64) optimizations, rebased to current master.

Here are my aarch64 patches rebased to the current tip of Opus master. They're largely the same as my previous patch set, with the addition of the final one (the Neon fixed-point implementation of xcorr_kernel). This replaces Viswanath's Neon fixed-point celt_pitch_xcorr, since xcorr_kernel is used in celt_fir and celt_iir as well. These have been tested for correctness under qemu

[Fast Int64 3/4] Explicitly cast results of silk OPUS_FAST_INT64 macros back to opus_int32.

2015 Nov 16

[Fast Int64 3/4] Explicitly cast results of silk OPUS_FAST_INT64 macros back to opus_int32.

--- silk/macros.h | 10 +++++----- 1 file changed, 5 insertions(+), 5 deletions(-) diff --git a/silk/macros.h b/silk/macros.h index 1ba614a..e1e05b9 100644 --- a/silk/macros.h +++ b/silk/macros.h @@ -48,14 +48,14 @@ POSSIBILITY OF SUCH DAMAGE. /* (a32 * (opus_int32)((opus_int16)(b32))) >> 16 output have to be 32bit int */ #if OPUS_FAST_INT64 -#define silk_SMULWB(a32, b32)

[Fast Int64 4/4] Add OPUS_FAST_INT64 definition of silk_SMULWT.

2015 Nov 16

[Fast Int64 4/4] Add OPUS_FAST_INT64 definition of silk_SMULWT.

--- silk/macros.h | 4 ++++ 1 file changed, 4 insertions(+) diff --git a/silk/macros.h b/silk/macros.h index e1e05b9..7cefedc 100644 --- a/silk/macros.h +++ b/silk/macros.h @@ -61,7 +61,11 @@ POSSIBILITY OF SUCH DAMAGE. #endif /* (a32 * (b32 >> 16)) >> 16 */ +#if OPUS_FAST_INT64 +#define silk_SMULWT(a32, b32) ((opus_int32)(((a32) * (opus_int64)((b32) >> 16))

2017 Apr 26

2 patches related to silk_biquad_alt() optimization

On Tue, Apr 25, 2017 at 10:31 PM, Jean-Marc Valin <jmvalin at jmvalin.ca> wrote: > > > A_Q28 is split to 2 14-bit (or 16-bit, whatever) integers, to make the > > multiplication operation within 32-bits. NEON can do 32-bit x 32-bit = > > 64-bit using 'int64x2_t vmull_s32(int32x2_t a, int32x2_t b)', and it > > could possibly be faster and less

2017 May 15

2 patches related to silk_biquad_alt() optimization

Hi Linfeng, Sorry for the delay -- I was actually trying to think of the best option here. For now, my preference would be to keep things bit-exact, but should there be more similar optimizations relying on 64-bit multiplication results, then we could consider having a special option to enable those (even in C). Cheers, Jean-Marc On 08/05/17 12:12 PM, Linfeng Zhang wrote: > Ping for

[Fast Int64 2/4] Add OPUS_FAST_INT64 flavors of celt/fixed_generic.h macros.

2015 Nov 16

[Fast Int64 2/4] Add OPUS_FAST_INT64 flavors of celt/fixed_generic.h macros.

--- celt/fixed_generic.h | 16 ++++++++++++++++ 1 file changed, 16 insertions(+) diff --git a/celt/fixed_generic.h b/celt/fixed_generic.h index ac67d37..1cfd6d6 100644 --- a/celt/fixed_generic.h +++ b/celt/fixed_generic.h @@ -37,16 +37,32 @@ #define MULT16_16SU(a,b) ((opus_val32)(opus_val16)(a)*(opus_val32)(opus_uint16)(b)) /** 16x32 multiplication, followed by a 16-bit shift right. Results

[PATCH] Create OPUS_FAST_INT64 macro, to abstract conditions where opus_int64 should be used.

2015 Aug 04

[PATCH] Create OPUS_FAST_INT64 macro, to abstract conditions where opus_int64 should be used.

This patch adds a macro abstracting the condition under which the silk math macros use opus_int64-based calculations rather than opus_int32. No substantive change, but will make it easier to adjust if additional such platforms are found in the future. --- silk/macros.h | 13 ++++++++----- 1 file changed, 8 insertions(+), 5 deletions(-) diff --git a/silk/macros.h b/silk/macros.h index

2017 Apr 25

2 patches related to silk_biquad_alt() optimization

On Mon, Apr 24, 2017 at 5:52 PM, Jean-Marc Valin <jmvalin at jmvalin.ca> wrote: > On 24/04/17 08:03 PM, Linfeng Zhang wrote: > > Tested on my chromebook, when stride (channel) == 1, the optimization > > has no gain compared with C function. > > You mean that the Neon code is the same speed as the C code for > stride==1? This is not terribly surprising for an IIRC

[PATCH 7/8] Update NSQ_LPC_BUF_LENGTH macro.

2016 Aug 23

[PATCH 7/8] Update NSQ_LPC_BUF_LENGTH macro.

NSQ_LPC_BUF_LENGTH is independent of DECISION_DELAY. --- silk/define.h | 4 ---- 1 file changed, 4 deletions(-) diff --git a/silk/define.h b/silk/define.h index 781cfdc..1286048 100644 --- a/silk/define.h +++ b/silk/define.h @@ -173,11 +173,7 @@ extern "C" #define MAX_MATRIX_SIZE MAX_LPC_ORDER /* Max of LPC Order and LTP order */ -#if( MAX_LPC_ORDER >

[Aarch64 00/11] Patches to enable Aarch64

2015 Nov 13

[Aarch64 00/11] Patches to enable Aarch64

Hi Jonathan, I'm sorry to bring this up again, and I don't want to beat a dead horse, but I was very surprised by your benchmarks so I took a little closer look. I think what's happening is that it's a little unfair to compare the ARM64 inline assembly to the C code, because looking at the C macros in "fixed_generic.h" for MULT16_32_Q16 and MULT16_32_Q15 you find

Several patches of ARM NEON optimization

2016 Jul 14

Several patches of ARM NEON optimization

I rebased my previous 3 patches to the current master with minor changes. Patches 1 to 3 replace all my previous submitted patches. Patches 4 and 5 are new. Thanks, Linfeng Zhang

[Aarch64 v2 00/18] Patches to enable Aarch64 (version 2)

2015 Nov 21

[Aarch64 v2 00/18] Patches to enable Aarch64 (version 2)

As promised, here's a re-send of all my Aarch64 patches, following comments by John Ridges. Note that they actually affect more than just Aarch64 -- other than the ones specifically guarded by AARCH64_NEON defines, the Neon intrinsics all also apply on armv7; and the OPUS_FAST_INT64 patches apply on any 64-bit machine. The patches should largely be independent and independently useful, other

[Aarch64 00/11] Patches to enable Aarch64

2015 Nov 13

[Aarch64 00/11] Patches to enable Aarch64

Thanks, I look forward to seeing what you find out. BTW, I was wondering if you tried replacing the SIG2WORD16 macro using the vqmovns_s32 intrinsic? I'm sure it would be faster than the C code, but in the grand scheme of things it might not make much difference. On 11/13/2015 12:15 PM, Jonathan Lennox wrote: >> On Nov 13, 2015, at 1:51 PM, John Ridges <jridges at masque.com>

[Aarch64 00/11] Patches to enable Aarch64

2015 Nov 12

[Aarch64 00/11] Patches to enable Aarch64

One other minor thing: I notice that in the inline assembly the result (rd) is constrained as an earlyclobber operand. What was the reason for that?

[AArch64 neon intrinsics v4 0/5] Rework Neon intrinsic code for Aarch64 patchset

2015 Dec 23

[AArch64 neon intrinsics v4 0/5] Rework Neon intrinsic code for Aarch64 patchset

Following Tim's comments, here are my reworked patches for the Neon intrinsic function patches of of my Aarch64 patchset, i.e. replacing patches 5-8 of the v2 series. Patches 1-4 and 9-18 of the old series still apply unmodified. The one new (as opposed to changed) patch is the first one in this series, to add named constants for the ARM architecture variants. There are also some minor code

silk_warped_autocorrelation_FIX() NEON optimization

2016 Jul 01

silk_warped_autocorrelation_FIX() NEON optimization

Hi all, I'm sending patch "Optimize silk_warped_autocorrelation_FIX() for ARM NEON" in an separate email. It is based on Tim’s aarch64v8 branch https://git.xiph.org/?p=users/tterribe/opus.git;a=shortlog;h=refs/heads/aarch64v8 Thanks for your comments. Linfeng

Patch cleaning up Opus x86 intrinsics configury

2015 Mar 02

Patch cleaning up Opus x86 intrinsics configury

The attached patch cleans up Opus's x86 intrinsics configury. It: * Makes ?enable-intrinsics work with clang and other non-GCC compilers * Enables RTCD for the floating-point-mode SSE code in Celt. * Disables use of RTCD in cases where the compiler targets an instruction set by default. * Enables the SSE4.1 Silk optimizations that apply to the common parts of Silk when Opus is built in

[RFC PATCHv1] cover: celt_pitch_xcorr: Introduce ARM neon intrinsics

2014 Nov 21

[RFC PATCHv1] cover: celt_pitch_xcorr: Introduce ARM neon intrinsics

Hello, I received feedback from engineers working on NE10 [1] that it would be better to use NE10 [1] for FFT optimizations for opus use cases. However, these FFT patches are currently in review and haven't been integrated into NE10 yet. While the FFT functions in NE10 are getting baked, I wanted to optimize the celt_pitch_xcorr (floating point only) and use it to introduce ARM NEON

similar to: [Fast Int64 1/4] Move OPUS_FAST_INT64 definition to celt/arch.h.