thr3ads.net - search: "vqmovns

Displaying 10 results from an estimated 10 matches for "vqmovns_s32".

Did you mean: vqmovn_s32

[PATCH 1/3] Add configure check for Aarch64-specific Neon intrinsics.

2015 Nov 19

[PATCH 1/3] Add configure check for Aarch64-specific Neon intrinsics.

...[$ARM_NEON_INTR_CFLAGS], + [OPUS_ARM_MAY_HAVE_AARCH64_NEON_INTR], + [OPUS_ARM_PRESUME_AARCH64_NEON_INTR], + [[#include <arm_neon.h> + ]], + [[ + static int32_t IN; + static int16_t OUT; + OUT = vqmovns_s32(IN); + ]] + ) + + AS_IF([test x"$OPUS_ARM_PRESUME_AARCH64_NEON_INTR" = x"1"], + [ + AC_DEFINE([OPUS_ARM_PRESUME_AARCH64_NEON_INTR], 1, [Define if binary requires Aarch64 Neon Intrinsics]) + intrinsics_support="$intrin...

[Aarch64 00/11] Patches to enable Aarch64

2015 Nov 19

[Aarch64 00/11] Patches to enable Aarch64

> On Nov 16, 2015, at 4:42 PM, Jonathan Lennox <jonathan at vidyo.com> wrote: > > I haven?t yet tried replacing SIG2WORD16 (or silk_ADD_SAT32/silk_SUB_SAT32) with Neon intrinsics. That?s an obvious next step. This doesn?t show any appreciable speed difference in my tests, but the code is obviously better by inspection (all three of these map directly to a single Aarch64

[Aarch64 00/11] Patches to enable Aarch64

2015 Nov 13

[Aarch64 00/11] Patches to enable Aarch64

Thanks, I look forward to seeing what you find out. BTW, I was wondering if you tried replacing the SIG2WORD16 macro using the vqmovns_s32 intrinsic? I'm sure it would be faster than the C code, but in the grand scheme of things it might not make much difference. On 11/13/2015 12:15 PM, Jonathan Lennox wrote: >> On Nov 13, 2015, at 1:51 PM, John Ridges <jridges at masque.com> wrote: >> >> Hi Jonathan, &...

[PATCH 3/3] Add Aarch64 intrinsic for SIG2WORD16.

2015 Nov 19

[PATCH 3/3] Add Aarch64 intrinsic for SIG2WORD16.

...TRICT LIABILITY, OR TORT (INCLUDING + NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS + SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. +*/ + +#ifndef FIXED_ARM64_H +#define FIXED_ARM64_H + +#include <arm_neon.h> + +#undef SIG2WORD16 +#define SIG2WORD16(x) (vqmovns_s32((x))) + +#endif diff --git a/celt_headers.mk b/celt_headers.mk index 0eca6e6..c9df94b 100644 --- a/celt_headers.mk +++ b/celt_headers.mk @@ -36,6 +36,7 @@ celt/static_modes_fixed_arm_ne10.h \ celt/arm/armcpu.h \ celt/arm/fixed_armv4.h \ celt/arm/fixed_armv5e.h \ +celt/arm/fixed_arm64.h \ celt/a...

[Aarch64 00/11] Patches to enable Aarch64

2015 Nov 19

[Aarch64 00/11] Patches to enable Aarch64

Any speedup from the intrinsics may just be swamped by the rest of the encode/decode process. But I think you really want SIG2WORD16 to be (vqmovns_s32(PSHR32((x), SIG_SHIFT))) On 11/19/2015 2:52 PM, Jonathan Lennox wrote: >> On Nov 16, 2015, at 4:42 PM, Jonathan Lennox <jonathan at vidyo.com> wrote: >> >> I haven?t yet tried replacing SIG2WORD16 (or silk_ADD_SAT32/silk_SUB_SAT32) with Neon intrinsics. That?s an obvious...

[PATCH] Add Aarch64 intrinsic for SIG2WORD16.

2015 Nov 20

[PATCH] Add Aarch64 intrinsic for SIG2WORD16.

...TRICT LIABILITY, OR TORT (INCLUDING + NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS + SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. +*/ + +#ifndef FIXED_ARM64_H +#define FIXED_ARM64_H + +#include <arm_neon.h> + +#undef SIG2WORD16 +#define SIG2WORD16(x) (vqmovns_s32(PSHR32((x), SIG_SHIFT))) + +#endif diff --git a/celt_headers.mk b/celt_headers.mk index 0eca6e6..c9df94b 100644 --- a/celt_headers.mk +++ b/celt_headers.mk @@ -36,6 +36,7 @@ celt/static_modes_fixed_arm_ne10.h \ celt/arm/armcpu.h \ celt/arm/fixed_armv4.h \ celt/arm/fixed_armv5e.h \ +celt/arm/fixe...

[Aarch64 v2 10/18] Clean up some intrinsics-related wording in configure.

2015 Nov 21

[Aarch64 v2 10/18] Clean up some intrinsics-related wording in configure.

--- configure.ac | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/configure.ac b/configure.ac index f52d2c2..e1a6e9b 100644 --- a/configure.ac +++ b/configure.ac @@ -190,7 +190,7 @@ AC_ARG_ENABLE([rtcd], [enable_rtcd=yes]) AC_ARG_ENABLE([intrinsics], - [AS_HELP_STRING([--disable-intrinsics], [Disable intrinsics optimizations for ARM(float) X86(fixed)])],, +

[Aarch64 00/11] Patches to enable Aarch64

2015 Nov 20

[Aarch64 00/11] Patches to enable Aarch64

> On Nov 19, 2015, at 5:47 PM, John Ridges <jridges at masque.com> wrote: > > Any speedup from the intrinsics may just be swamped by the rest of the encode/decode process. But I think you really want SIG2WORD16 to be (vqmovns_s32(PSHR32((x), SIG_SHIFT))) Yes, you?re right. I forgot to run the vectors under qemu with my previous version (oh, the embarrassment!) Fixed forthcoming once the tests actually run. > On 11/19/2015 2:52 PM, Jonathan Lennox wrote: >>> On Nov 16, 2015, at 4:42 PM, Jonathan Lennox <jon...

[Aarch64 00/11] Patches to enable Aarch64

2015 Nov 16

[Aarch64 00/11] Patches to enable Aarch64

...T32) with Neon intrinsics. That?s an obvious next step. On Nov 13, 2015, at 2:47 PM, John Ridges <jridges at masque.com<mailto:jridges at masque.com>> wrote: Thanks, I look forward to seeing what you find out. BTW, I was wondering if you tried replacing the SIG2WORD16 macro using the vqmovns_s32 intrinsic? I'm sure it would be faster than the C code, but in the grand scheme of things it might not make much difference. On 11/13/2015 12:15 PM, Jonathan Lennox wrote: On Nov 13, 2015, at 1:51 PM, John Ridges <jridges at masque.com><mailto:jridges at masque.com> wrote: Hi Jo...

[Aarch64 00/11] Patches to enable Aarch64

2015 Nov 13

[Aarch64 00/11] Patches to enable Aarch64

Hi Jonathan, I'm sorry to bring this up again, and I don't want to beat a dead horse, but I was very surprised by your benchmarks so I took a little closer look. I think what's happening is that it's a little unfair to compare the ARM64 inline assembly to the C code, because looking at the C macros in "fixed_generic.h" for MULT16_32_Q16 and MULT16_32_Q15 you find

search for: vqmovns_s32