thr3ads.net - search: "opus_fast

Displaying 16 results from an estimated 16 matches for "opus_fast_int64".

[Fast Int64 1/4] Move OPUS_FAST_INT64 definition to celt/arch.h.

2015 Nov 16

...const char *file, int line)
 #define UADD32(a,b) ((a)+(b))
 #define USUB32(a,b) ((a)-(b))
+/* Set this if opus_int64 is a native type of the CPU. */
+/* Assume that all LP64 architectures have fast 64-bit types; also x86_64 (which can be ILP32 for x32)
+ and Win64 (which is LLP64). */
+#define OPUS_FAST_INT64 (defined(__LP64__) || defined(__x86_64__) || defined(_WIN64))
+
 #define PRINT_MIPS(file)
 #ifdef FIXED_POINT
diff --git a/silk/macros.h b/silk/macros.h
index bc30303..1ba614a 100644
--- a/silk/macros.h
+++ b/silk/macros.h
@@ -34,6 +34,7 @@ POSSIBILITY OF SUCH DAMAGE.
 #include "opus_types...
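A note on the definition above: it places `defined(...)` inside the replacement list of `OPUS_FAST_INT64`, so `#if OPUS_FAST_INT64` produces the `defined` operator through macro expansion. The C standard (§6.10.1, Conditional inclusion) leaves that case undefined; GCC and Clang happen to evaluate it as intended and can flag it with `-Wexpansion-to-defined`. A minimal sketch of a portable alternative, assuming the same three platform tests, evaluates them directly and defines the flag as a plain 0 or 1 (`int64_flavor` is a hypothetical name used only to show a consumer):

```c
/* Evaluate the platform tests directly in #if, so that no `defined`
   operator is ever produced by macro replacement. */
#if defined(__LP64__) || defined(__x86_64__) || defined(_WIN64)
#define OPUS_FAST_INT64 1
#else
#define OPUS_FAST_INT64 0
#endif

/* Code that depends on the flag can then use a plain #if.
   int64_flavor is hypothetical, for illustration only. */
#if OPUS_FAST_INT64
static const char int64_flavor[] = "native 64-bit multiply";
#else
static const char int64_flavor[] = "split 16-bit multiplies";
#endif
```

Because the flag always expands to an integer literal, `#if OPUS_FAST_INT64` behaves the same on every conforming preprocessor.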

[PATCH] Create OPUS_FAST_INT64 macro, to abstract conditions where opus_int64 should be used.

2015 Aug 04

...ns(+), 5 deletions(-)
diff --git a/silk/macros.h b/silk/macros.h
index 2f24950..bc30303 100644
--- a/silk/macros.h
+++ b/silk/macros.h
@@ -43,17 +43,20 @@ POSSIBILITY OF SUCH DAMAGE.
 #define opus_unlikely(x) (!!(x))
 #endif
+/* Set this if opus_int64 is a native type of the CPU. */
+#define OPUS_FAST_INT64 (defined(__x86_64__) || defined(__LP64__) || defined(_WIN64))
+
 /* This is an OPUS_INLINE header file for general platform. */
 /* (a32 * (opus_int32)((opus_int16)(b32))) >> 16 output have to be 32bit int */
-#if defined(__x86_64__) || defined(__LP64__) || defined(_WIN64)
+#if OPUS_FAST_IN...
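What the macro abstracts is a choice between two ways of computing the same value. A minimal sketch for the silk_SMULWB case, with hypothetical names `SMULWB_FAST` and `SMULWB_SPLIT` (the split form is written out here for illustration, not copied from the truncated snippet), `<stdint.h>` stand-ins for the Opus typedefs, and assuming `>>` on negative values is an arithmetic shift:

```c
#include <stdint.h>

/* Stand-ins for the Opus typedefs, assumed here so the sketch compiles alone. */
typedef int16_t opus_int16;
typedef int32_t opus_int32;
typedef int64_t opus_int64;

/* One 64-bit multiply: (a32 * (opus_int16)b32) >> 16. */
#define SMULWB_FAST(a32, b32) \
    ((opus_int32)(((a32) * (opus_int64)((opus_int16)(b32))) >> 16))

/* Two 32-bit multiplies: the high 16 bits of a32 times b, plus the low
   16 bits of a32 times b shifted down. Both partial products fit in 32 bits. */
#define SMULWB_SPLIT(a32, b32) \
    ((((a32) >> 16) * (opus_int32)((opus_int16)(b32))) + \
     ((((a32) & 0x0000FFFF) * (opus_int32)((opus_int16)(b32))) >> 16))
```

In the library, either body would sit behind `#if OPUS_FAST_INT64`; on a 64-bit CPU the first form replaces two multiplies and an add with one widening multiply.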

[Fast Int64 3/4] Explicitly cast results of silk OPUS_FAST_INT64 macros back to opus_int32.

2015 Nov 16

...le changed, 5 insertions(+), 5 deletions(-)
diff --git a/silk/macros.h b/silk/macros.h
index 1ba614a..e1e05b9 100644
--- a/silk/macros.h
+++ b/silk/macros.h
@@ -48,14 +48,14 @@ POSSIBILITY OF SUCH DAMAGE.
 /* (a32 * (opus_int32)((opus_int16)(b32))) >> 16 output have to be 32bit int */
 #if OPUS_FAST_INT64
-#define silk_SMULWB(a32, b32) (((a32) * (opus_int64)((opus_int16)(b32))) >> 16)
+#define silk_SMULWB(a32, b32) ((opus_int32)(((a32) * (opus_int64)((opus_int16)(b32))) >> 16))
 #else
 #define silk_SMULWB(a32, b32) ((((a32) >> 16) * (opus_int32)((op...
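Why the cast matters can be seen from the type of the expression: without it, the 64-bit flavor evaluates to `opus_int64` while the 32-bit flavor evaluates to `opus_int32`, so a macro documented as producing a 32-bit value changes type on 64-bit builds (passing it to `printf` with `%d`, for instance, would then be a format mismatch). A minimal sketch, with `SMULWB_UNCAST` and `SMULWB_CAST` as hypothetical names and `<stdint.h>` stand-ins for the Opus typedefs:

```c
#include <stdint.h>

/* Stand-ins for the Opus typedefs, assumed for a self-contained sketch. */
typedef int16_t opus_int16;
typedef int32_t opus_int32;
typedef int64_t opus_int64;

/* Before the patch (64-bit flavor): the whole expression is opus_int64. */
#define SMULWB_UNCAST(a32, b32) \
    (((a32) * (opus_int64)((opus_int16)(b32))) >> 16)

/* After the patch: same value, explicitly narrowed back to opus_int32,
   matching the type of the 32-bit fallback. */
#define SMULWB_CAST(a32, b32) \
    ((opus_int32)(((a32) * (opus_int64)((opus_int16)(b32))) >> 16))
```

The values agree whenever the result fits in 32 bits; only the static type differs, which `sizeof` makes visible.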

[Fast Int64 2/4] Add OPUS_FAST_INT64 flavors of celt/fixed_generic.h macros.

2015 Nov 16

...fixed_generic.h
index ac67d37..1cfd6d6 100644
--- a/celt/fixed_generic.h
+++ b/celt/fixed_generic.h
@@ -37,16 +37,32 @@
 #define MULT16_16SU(a,b) ((opus_val32)(opus_val16)(a)*(opus_val32)(opus_uint16)(b))
 /** 16x32 multiplication, followed by a 16-bit shift right. Results fits in 32 bits */
+#if OPUS_FAST_INT64
+#define MULT16_32_Q16(a,b) ((opus_val32)SHR((opus_int64)((opus_val16)(a))*(b),16))
+#else
 #define MULT16_32_Q16(a,b) ADD32(MULT16_16((a),SHR((b),16)), SHR(MULT16_16SU((a),((b)&0x0000ffff)),16))
+#endif
 /** 16x32 multiplication, followed by a 16-bit shift right (round-to-nearest). Results f...
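The generic MULT16_32_Q16 flavor is exact rather than an approximation. Assuming `SHR` is an arithmetic right shift (floor division by a power of two), write the 32-bit operand as $b = 2^{16} b_h + b_l$ with $b_h = \lfloor b / 2^{16} \rfloor$ (`SHR(b,16)`) and $b_l = b \bmod 2^{16} \in [0, 2^{16})$ (`b & 0x0000ffff`). Then, for the 16-bit operand $a$:

$$
\left\lfloor \frac{a\,b}{2^{16}} \right\rfloor
= \left\lfloor \frac{2^{16}\,a\,b_h + a\,b_l}{2^{16}} \right\rfloor
= a\,b_h + \left\lfloor \frac{a\,b_l}{2^{16}} \right\rfloor ,
$$

because $a\,b_h$ is an integer. The first term is `MULT16_16(a, SHR(b,16))` and the second is `SHR(MULT16_16SU(a, b & 0xffff), 16)`; since $|a\,b_h| \le 2^{30}$ and $|a\,b_l| \le 2^{15}(2^{16}-1) < 2^{31}$, both partial products fit in 32 bits. The OPUS_FAST_INT64 flavor computes the left-hand side directly with a single 64-bit multiply.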

[Fast Int64 4/4] Add OPUS_FAST_INT64 definition of silk_SMULWT.

2015 Nov 16

---
 silk/macros.h | 4 ++++
 1 file changed, 4 insertions(+)
diff --git a/silk/macros.h b/silk/macros.h
index e1e05b9..7cefedc 100644
--- a/silk/macros.h
+++ b/silk/macros.h
@@ -61,7 +61,11 @@ POSSIBILITY OF SUCH DAMAGE.
 #endif
 /* (a32 * (b32 >> 16)) >> 16 */
+#if OPUS_FAST_INT64
+#define silk_SMULWT(a32, b32) ((opus_int32)(((a32) * (opus_int64)((b32) >> 16)) >> 16))
+#else
 #define silk_SMULWT(a32, b32) (((a32) >> 16) * ((b32) >> 16) + ((((a32) & 0x0000FFFF) * ((b32) >> 16)) >> 16))
+#endif
 /* a32 + (b32 * (c...
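The two silk_SMULWT flavors above should agree for every pair of 32-bit inputs, assuming `>>` on negative values is an arithmetic shift (implementation-defined in C, but what GCC, Clang and MSVC do). A quick randomized differential check, sketched with the two macro bodies copied from the patch under hypothetical names, `<stdint.h>` stand-ins for the Opus typedefs, and a small xorshift generator:

```c
#include <stdint.h>

/* Stand-ins for the Opus typedefs (assumption for a standalone sketch). */
typedef int32_t opus_int32;
typedef int64_t opus_int64;

/* 64-bit flavor from the patch: (a32 * (b32 >> 16)) >> 16. */
#define SMULWT_FAST(a32, b32) \
    ((opus_int32)(((a32) * (opus_int64)((b32) >> 16)) >> 16))

/* 32-bit flavor from the patch: two multiplies and an add. */
#define SMULWT_SPLIT(a32, b32) \
    (((a32) >> 16) * ((b32) >> 16) + ((((a32) & 0x0000FFFF) * ((b32) >> 16)) >> 16))

/* Marsaglia xorshift32; any decent generator would do. */
static uint32_t xorshift32(uint32_t *state)
{
    uint32_t x = *state;
    x ^= x << 13;
    x ^= x >> 17;
    x ^= x << 5;
    *state = x;
    return x;
}

/* Hypothetical helper: counts inputs on which the two flavors disagree. */
static long smulwt_mismatches(long iterations, uint32_t seed)
{
    uint32_t state = seed ? seed : 1u; /* xorshift state must be nonzero */
    long mismatches = 0;
    for (long i = 0; i < iterations; i++) {
        opus_int32 a = (opus_int32)xorshift32(&state);
        opus_int32 b = (opus_int32)xorshift32(&state);
        if (SMULWT_FAST(a, b) != SMULWT_SPLIT(a, b))
            mismatches++;
    }
    return mismatches;
}
```

A zero count over many random inputs is evidence, not proof; the same decomposition argument as for MULT16_32_Q16 (splitting `a32` into high and low halves) shows the identity holds in general.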

[Aarch64 00/11] Patches to enable Aarch64

2015 Nov 13

...believe that your inline assembly is faster than that mess. But on a 64-bit machine, there's no reason to go through all that when a simple 64-bit multiply will do. The SILK macros in "macros.h" already make this distinction but you don't see it in your benchmarks because the OPUS_FAST_INT64 macro only looks for 64-bit x86, and doesn't check for ARM64. I don't think it's a coincidence that the macros you didn't replace only performed one multiply while the ones you did replace performed two. I think it would be very interesting to try the benchmarks again after addi...

[Aarch64 00/11] Patches to enable Aarch64

2015 Nov 13

...e to believe that your inline assembly is faster than that mess. But on a 64-bit machine, there's no reason to go through all that when a simple 64-bit multiply will do. The SILK macros in "macros.h" already make this distinction but you don't see it in your benchmarks because the OPUS_FAST_INT64 macro only looks for 64-bit x86, and doesn't check for ARM64.

> No, __LP64__ is set on arm64, both on iOS and on Linux. (64-bit long and pointer.) This is included in the OPUS_FAST_INT64 test.
>
> The tests for __x86_64__ and for _WIN64 are in OPUS_FAST_INT64 because those are the tw...
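The exchange above turns on C data models. GCC and Clang predefine `__LP64__` when `long` and pointers are both 64 bits, which covers arm64 on Linux and iOS; 64-bit Windows is LLP64 (32-bit `long`), so it needs the separate `_WIN64` test; and the x32 ABI runs on x86-64 hardware with 32-bit pointers (ILP32), which is why `__x86_64__` is tested as well. A small sketch that names the data model from type sizes at run time (both helper names are hypothetical):

```c
/* Hypothetical helper: names the C data model from type sizes. */
static const char *data_model(void)
{
    if (sizeof(long) == 8 && sizeof(void *) == 8)
        return "LP64";   /* e.g. Linux, macOS or iOS on x86-64 or arm64 */
    if (sizeof(long) == 4 && sizeof(void *) == 8)
        return "LLP64";  /* e.g. 64-bit Windows */
    if (sizeof(long) == 4 && sizeof(void *) == 4)
        return "ILP32";  /* e.g. 32-bit ARM or x86, or the x86-64 x32 ABI */
    return "other";
}

/* Hypothetical helper: 1 if any of the three predefined macros that the
   OPUS_FAST_INT64 test checks is defined by this compiler, else 0. */
static int fast_int64_macros_present(void)
{
#if defined(__LP64__) || defined(__x86_64__) || defined(_WIN64)
    return 1;
#else
    return 0;
#endif
}
```

On an LP64 target built with GCC or Clang, `fast_int64_macros_present()` would be expected to return 1 whatever the CPU family, which is the point made in the reply.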

[Aarch64 v2 10/18] Clean up some intrinsics-related wording in configure.

2015 Nov 21

---
 configure.ac | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)
diff --git a/configure.ac b/configure.ac
index f52d2c2..e1a6e9b 100644
--- a/configure.ac
+++ b/configure.ac
@@ -190,7 +190,7 @@ AC_ARG_ENABLE([rtcd],
 [enable_rtcd=yes])
 AC_ARG_ENABLE([intrinsics],
- [AS_HELP_STRING([--disable-intrinsics], [Disable intrinsics optimizations for ARM(float) X86(fixed)])],,
+

[Aarch64 00/11] Patches to enable Aarch64

2015 Nov 16

I've tried adding support for OPUS_FAST_INT64 to celt/arch.h, and I've found that this is indeed comparable in speed, if not a touch faster, than my inline assembly. I'll submit patches for this. The inline assembly parts of my aarch64 patch set can thus be considered withdrawn. I haven't yet tried replacing SIG2WORD16 (or silk_ADD_SAT32/si...

[Aarch64 00/11] Patches to enable Aarch64

2015 Nov 13

...e to believe that your inline assembly is faster than that mess. But on a 64-bit machine, there's no reason to go through all that when a simple 64-bit multiply will do. The SILK macros in "macros.h" already make this distinction but you don't see it in your benchmarks because the OPUS_FAST_INT64 macro only looks for 64-bit x86, and doesn't check for ARM64.

No, __LP64__ is set on arm64, both on iOS and on Linux. (64-bit long and pointer.) This is included in the OPUS_FAST_INT64 test.

The tests for __x86_64__ and for _WIN64 are in OPUS_FAST_INT64 because those are the two common plat...

[Aarch64 00/11] Patches to enable Aarch64

2015 Nov 12

One other minor thing: I notice that in the inline assembly the result (rd) is constrained as an earlyclobber operand. What was the reason for that?
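For context on the earlyclobber question: in GCC extended asm, the `&` modifier (`=&r`) declares that an output is written before the asm has finished reading its inputs, which forbids the compiler from giving that output the same register as any input. A hedged sketch of a hypothetical two-instruction AArch64 SMULWB in which the first instruction consumes both inputs before anything is written, so a plain `=r` output appears sufficient; on other targets it falls back to the portable 64-bit C expression (the function name and asm sequence are illustrative, not taken from the patch set):

```c
#include <stdint.h>

/* Hypothetical helper: (a32 * (int16)b32) >> 16 computed in 64 bits. */
static inline int32_t smulwb_sketch(int32_t a32, int32_t b32)
{
#if defined(__aarch64__) && defined(__GNUC__)
    int64_t rd;
    /* smull reads both inputs before writing rd, and asr reads only rd,
       so rd may safely share a register with an input: "=r", not "=&r". */
    __asm__("smull %x0, %w1, %w2\n\t"
            "asr %x0, %x0, #16"
            : "=r"(rd)
            : "r"(a32), "r"((int32_t)(int16_t)b32));
    return (int32_t)rd;
#else
    /* Portable fallback: the same value with a single 64-bit multiply. */
    return (int32_t)(((int64_t)a32 * (int16_t)b32) >> 16);
#endif
}
```

If a sequence instead wrote the output register and then read an input again, `=&r` would be required to keep the compiler from overlapping them.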

[Aarch64 00/11] Patches to enable Aarch64

2015 Nov 10

Since you're already set up for benchmarks, I would ask if you could benchmark the difference between using and not using the ARM64 inline assembly. I believe the original justification on ARMv7 for the assembly was the processor's panoply of multiply instructions and their long cycle times. It seems to me that the ARM64 processor is much more like an x86 one, where using a
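A minimal timing harness in the spirit of this request, comparing the two C flavors of MULT16_32_Q16 rather than the assembly. The function names, input pattern and iteration counts are assumptions for illustration; calls through a function pointer may not be inlined, so codec-level benchmarks remain the better evidence:

```c
#include <stdint.h>
#include <time.h>

/* Stand-ins for the CELT fixed-point types (assumption). */
typedef int16_t opus_val16;
typedef int32_t opus_val32;

/* 64-bit flavor: one widening multiply, then an arithmetic shift. */
static opus_val32 mult16_32_q16_fast(opus_val16 a, opus_val32 b)
{
    return (opus_val32)(((int64_t)a * b) >> 16);
}

/* Generic flavor: MULT16_16 and MULT16_16SU expanded by hand. */
static opus_val32 mult16_32_q16_split(opus_val16 a, opus_val32 b)
{
    return a * (b >> 16) + ((a * (opus_val32)(uint16_t)(b & 0x0000ffff)) >> 16);
}

/* Times `iters` calls of one flavor; the checksum keeps the compiler
   from discarding the work and lets the two flavors be compared. */
static double time_flavor(opus_val32 (*f)(opus_val16, opus_val32),
                          long iters, uint32_t *checksum)
{
    uint32_t acc = 0;
    clock_t start = clock();
    for (long i = 0; i < iters; i++) {
        opus_val16 a = (opus_val16)((uint32_t)i * 7919u);
        opus_val32 b = (opus_val32)((uint32_t)i * 2654435761u);
        acc += (uint32_t)f(a, b);
    }
    clock_t end = clock();
    *checksum = acc;
    return (double)(end - start) / CLOCKS_PER_SEC;
}
```

Build with optimization (e.g. `-O2`) and print the two times; results depend on the CPU and compiler, so none are quoted here. Matching checksums confirm that both flavors computed the same values.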

[Aarch64 00/11] Patches to enable Aarch64

2015 Nov 10

...has many *fewer* inline assembly snippets for ARM64 than the ARMv7 code does. The guy here at Vidyo who actually did this optimization work (Johnny Lee, whose work I'm just massaging into submittable form) found that many of the multiplies were indeed better as C, especially with (what's now) the OPUS_FAST_INT64 test.

[Aarch64 00/11] Patches to enable Aarch64

2015 Nov 10

[Aarch64 v2 00/18] Patches to enable Aarch64 (version 2)

2015 Nov 21

As promised, here's a re-send of all my Aarch64 patches, following comments by John Ridges. Note that they actually affect more than just Aarch64 -- other than the ones specifically guarded by AARCH64_NEON defines, the Neon intrinsics all also apply on armv7; and the OPUS_FAST_INT64 patches apply on any 64-bit machine. The patches should largely be independent and independently useful, other than obvious infrastructure setups.

Jonathan Lennox (18):
  Move ARM-specific macro overrides to arm-specific file.
  Reorganize ARM CPU #ifdefs.
  Rename OPUS_ARM_NEON_INTR AM_CONDITION...

[AArch64 neon intrinsics v4 0/5] Rework Neon intrinsic code for Aarch64 patchset

2015 Dec 23

Following Tim's comments, here are my reworked patches for the Neon intrinsic function patches of my Aarch64 patchset, i.e. replacing patches 5-8 of the v2 series. Patches 1-4 and 9-18 of the old series still apply unmodified. The one new (as opposed to changed) patch is the first one in this series, to add named constants for the ARM architecture variants. There are also some minor code
