thr3ads.net - similar to: "[LLVMdev] ARM NEON intrinsics in clang"

Displaying 20 results from an estimated 6000 matches similar to: "[LLVMdev] ARM NEON intrinsics in clang"

2013 Sep 26

[LLVMdev] ARM NEON intrinsics in clang

Hello Renato,

It turned out I just didn't do the cross-compilation correctly, and Tim Northover had already pointed me to a guide you have written on it (http://clang.llvm.org/docs/CrossCompilation.html), so I will read that before continuing with my efforts. To answer your question, I am currently testing on a PandaBoard, which has an ARM Cortex-A9 processor, which I think is 64-bit. I am much
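
Since the Cortex-A9 is a 32-bit ARMv7-A core, a file using NEON intrinsics for it is built for an ARMv7 target. The following is a minimal sketch, assuming a Linux hard-float cross toolchain. The `armv7a-linux-gnueabihf` triple, the flags in the comment and the helper name `add_f32` are assumptions for illustration; sysroot setup is covered by the cross-compilation guide linked in the message. The `__ARM_NEON`/`__ARM_NEON__` guard keeps the file buildable on hosts without NEON.

```c
/* Minimal NEON sketch for a 32-bit ARMv7 target such as Cortex-A9.
 * Assumed cross-compile command (adjust triple and sysroot to your toolchain):
 *   clang --target=armv7a-linux-gnueabihf -mcpu=cortex-a9 -mfpu=neon \
 *         -mfloat-abi=hard -O2 -c neon_add.c
 * Without NEON, only the scalar loop is compiled. */
#include <stddef.h>

#if defined(__ARM_NEON) || defined(__ARM_NEON__)
#include <arm_neon.h>
#endif

/* Hypothetical helper: out[i] = a[i] + b[i] for 0 <= i < n. */
void add_f32(const float *a, const float *b, float *out, size_t n)
{
    size_t i = 0;
#if defined(__ARM_NEON) || defined(__ARM_NEON__)
    /* Four floats per iteration in one 128-bit Q register. */
    for (; i + 4 <= n; i += 4) {
        float32x4_t va = vld1q_f32(a + i);
        float32x4_t vb = vld1q_f32(b + i);
        vst1q_f32(out + i, vaddq_f32(va, vb));
    }
#endif
    /* Scalar tail, or the whole array when NEON is unavailable. */
    for (; i < n; i++)
        out[i] = a[i] + b[i];
}
```

The scalar loop doubles as the tail handler, so `n` does not need to be a multiple of 4.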

[RFC PATCH v1] arm: kf_bfly4: Introduce ARM neon intrinsics

2014 Nov 09

Optimize kf_bfly4 function using ARM NEON intrinsics for SoCs that have a NEON VFP unit. As an initial step, only targeting ARMv7-VFP based SoCs. To enable this optimization, use --enable-armv7-neon-float when running the configure command. This is disabled by default.

---
 Makefile.am              |  16 ++++
 celt/_kiss_fft_guts.h    |  13 +++
 celt/arm/kiss_fft_neon.c | 211
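
`kf_bfly4` is the radix-4 butterfly of the KISS FFT used in CELT. The scalar sketch below shows the arithmetic of a single forward radix-4 butterfly, which is what a NEON version would vectorize across several butterflies at once. It is a hypothetical illustration (`cpx`, `cmul` and `bfly4` are made-up names), not the libopus code, and it ignores the fixed-point build and any scaling the real routine performs.

```c
/* Hypothetical complex type; libopus defines its own. */
typedef struct { float r, i; } cpx;

/* Complex multiply, used to apply twiddle factors. */
static cpx cmul(cpx a, cpx b)
{
    cpx c = { a.r * b.r - a.i * b.i, a.r * b.i + a.i * b.r };
    return c;
}

/* One forward radix-4 butterfly on x[0..3], in place.
 * tw[0..2] are the twiddles applied to x[1], x[2], x[3]
 * (all {1, 0} for the first butterfly of a stage). */
void bfly4(cpx x[4], const cpx tw[3])
{
    cpx a = x[0];
    cpx b = cmul(x[1], tw[0]);
    cpx c = cmul(x[2], tw[1]);
    cpx d = cmul(x[3], tw[2]);

    cpx s0 = { a.r + c.r, a.i + c.i };   /* x0 + x2 */
    cpx s1 = { a.r - c.r, a.i - c.i };   /* x0 - x2 */
    cpx s2 = { b.r + d.r, b.i + d.i };   /* x1 + x3 */
    cpx s3 = { b.r - d.r, b.i - d.i };   /* x1 - x3 */

    x[0] = (cpx){ s0.r + s2.r, s0.i + s2.i };  /* X0 = s0 + s2   */
    x[2] = (cpx){ s0.r - s2.r, s0.i - s2.i };  /* X2 = s0 - s2   */
    x[1] = (cpx){ s1.r + s3.i, s1.i - s3.r };  /* X1 = s1 - i*s3 */
    x[3] = (cpx){ s1.r - s3.i, s1.i + s3.r };  /* X3 = s1 + i*s3 */
}
```

With unit twiddles this is exactly a 4-point forward DFT; for input {1, 2, 3, 4} it produces {10, -2+2i, -2, -2-2i}.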

[RFC PATCHv1] armv7: celt_pitch_xcorr: Introduce ARM neon intrinsics

2014 Nov 21

Optimize celt_pitch_xcorr function (for floating point) using ARM NEON intrinsics for SoCs that have a NEON VFP unit. As an initial step, targeting ARMv7 NEON (VFP3+) based SoCs. To enable this optimization, use the --enable-arm-neon-intrinsics configure option. This flag is not enabled by default. Compile time and runtime checks are also supported to make sure this optimization is only enabled when the
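
Compile-time and run-time checks can be arranged in several ways. One minimal sketch, not the libopus mechanism, pairs a preprocessor guard with a run-time query through glibc's `getauxval(AT_HWCAP)` and the kernel's `HWCAP_NEON` bit. That query is a swapped-in technique, assumed available on 32-bit ARM Linux with glibc 2.16 or later; `dot_c`, `dot_neon`, `cpu_has_neon` and `select_dot` are hypothetical names.

```c
/* Compile time: the NEON path exists only if the compiler targets NEON.
 * Run time (32-bit ARM Linux): ask the kernel via getauxval(AT_HWCAP). */
#if defined(__linux__) && defined(__arm__)
#include <sys/auxv.h>   /* getauxval, AT_HWCAP */
#include <asm/hwcap.h>  /* HWCAP_NEON */
#endif
#if defined(__ARM_NEON) || defined(__ARM_NEON__)
#include <arm_neon.h>
#endif

typedef float (*dot_fn)(const float *x, const float *y, int n);

/* Portable scalar version. */
static float dot_c(const float *x, const float *y, int n)
{
    float sum = 0.0f;
    for (int i = 0; i < n; i++)
        sum += x[i] * y[i];
    return sum;
}

#if defined(__ARM_NEON) || defined(__ARM_NEON__)
/* NEON version: four multiply-accumulates per step, then a horizontal add. */
static float dot_neon(const float *x, const float *y, int n)
{
    float32x4_t acc = vdupq_n_f32(0.0f);
    int i = 0;
    for (; i + 4 <= n; i += 4)
        acc = vmlaq_f32(acc, vld1q_f32(x + i), vld1q_f32(y + i));
    float32x2_t half = vadd_f32(vget_low_f32(acc), vget_high_f32(acc));
    float sum = vget_lane_f32(vpadd_f32(half, half), 0);
    for (; i < n; i++)  /* scalar tail */
        sum += x[i] * y[i];
    return sum;
}
#endif

/* Returns nonzero if the running CPU reports NEON support. */
int cpu_has_neon(void)
{
#if defined(__linux__) && defined(__arm__)
    return (getauxval(AT_HWCAP) & HWCAP_NEON) != 0;
#elif defined(__aarch64__)
    return 1;  /* assumed always present on AArch64 Linux */
#else
    return 0;
#endif
}

/* Pick an implementation once, e.g. at codec initialization. */
dot_fn select_dot(void)
{
#if defined(__ARM_NEON) || defined(__ARM_NEON__)
    if (cpu_has_neon())
        return dot_neon;
#endif
    return dot_c;
}
```

One caveat with keeping everything in one file: if it is compiled with `-mfpu=neon`, the compiler may also emit NEON instructions inside `dot_c`, so builds that want a genuine run-time fallback usually compile the NEON code as a separate source set with its own flags.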

[LLVMdev] ARM NEON intrinsics in clang

2013 Sep 26

Hello LLVM Devs, I am starting my PhD on Automatic Parallelization for DSP and want to play with some ARM NEON intrinsics for a start. I spent the last three days trying to compile a version of LLVM that would allow me to compile sources that contain these intrinsics, but with no success. In the process I found out that clang doesn't support NEON (as per

[RFC PATCHv1] armv7: celt_pitch_xcorr: Introduce ARM neon intrinsics

2014 Dec 01

Hello Timothy,

Appreciate the thorough review. Have a few questions before I re-spin the patch in-line.

On 28 November 2014 at 15:52, Timothy B. Terriberry <tterribe at xiph.org> wrote:
> Review comments inline.
>
>> +if OPUS_ARM_NEON_INTR
>> +noinst_LTLIBRARIES = libarmneon.la
>> +libarmneon_la_SOURCES = $(CELT_SOURCES_ARM_NEON_INTR)
>>

[RFC PATCHv1] armv7: celt_pitch_xcorr: Introduce ARM neon intrinsics

2014 Nov 28

Review comments inline.

> +if OPUS_ARM_NEON_INTR
> +noinst_LTLIBRARIES = libarmneon.la
> +libarmneon_la_SOURCES = $(CELT_SOURCES_ARM_NEON_INTR)
> +libarmneon_la_CPPFLAGS = $(OPUS_ARM_NEON_INTR_CPPFLAGS) -I$(top_srcdir)/include
> +endif

I don't think these should be in a separate library. It brings with it lots of complications (to name one: wouldn't the .pc files need to

[PATCH v1] armv7: celt_pitch_xcorr: Introduce ARM neon intrinsics

2014 Dec 19

Optimize celt_pitch_xcorr function (for floating point) using ARM NEON intrinsics for SoCs that have a NEON VFP unit. To enable this optimization, use the --enable-intrinsics configure option. Compile time and runtime checks are also supported to make sure this optimization is only enabled when the compiler supports NEON intrinsics.

---
 Makefile.am | 12 ++
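
The float `celt_pitch_xcorr` computes the cross-correlation of `x` against `y` at `max_pitch` lags, each lag being an inner product of length `len`. As a sketch of the scalar baseline that the NEON patch accelerates (the name `pitch_xcorr_ref` is hypothetical, and the fixed-point build of the real routine is ignored):

```c
/* Hypothetical scalar reference for a float pitch cross-correlation:
 *   xcorr[i] = sum_{j=0}^{len-1} x[j] * y[i + j],  0 <= i < max_pitch.
 * y must hold at least len + max_pitch - 1 samples. */
void pitch_xcorr_ref(const float *x, const float *y, float *xcorr,
                     int len, int max_pitch)
{
    for (int i = 0; i < max_pitch; i++) {
        float sum = 0.0f;
        for (int j = 0; j < len; j++)
            sum += x[j] * y[i + j];
        xcorr[i] = sum;
    }
}
```

For `x = {1, 2}` and `y = {1, 2, 3, 4}` with three lags, this gives `{5, 8, 11}`.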

[RFC PATCH v1] arm: kf_bfly4: Introduce ARM neon intrinsics

2014 Nov 09

Hello,

This patch introduces ARM NEON intrinsics to optimize the kf_bfly4 routine in the CELT part of libopus. Using the NEON-optimized kf_bfly4(_neon) routine improved the performance of the opus_fft_impl function by about 21.4%. The end use case was decoding a music Opus Ogg file, which saw a performance improvement of about 4.47%. This patch has 2 components: i. Actual neon code to improve

[RFC PATCH v3] armv7: celt_pitch_xcorr: Introduce ARM neon intrinsics

2014 Dec 10

[RFC PATCH v2] armv7: celt_pitch_xcorr: Introduce ARM neon intrinsics

2014 Dec 07

[PATCH v1] armv7: celt_pitch_xcorr: Introduce ARM neon intrinsics

2014 Dec 19

On 19 December 2014 at 17:25, Viswanath Puttagunta <viswanath.puttagunta at linaro.org> wrote:
> Optimize celt_pitch_xcorr function (for floating point)
> using ARM NEON intrinsics for SoCs that have NEON VFP unit.
>
> To enable this optimization, use --enable-intrinsics
> configure option.
>
> Compile time and runtime checks are also supported to make sure
> this

[RFC PATCH v2] armv7: celt_pitch_xcorr: Introduce ARM neon intrinsics

2014 Dec 09

Viswanath Puttagunta wrote:
> +	SUMM = vdupq_n_f32(0);

It kills me that there's no intrinsic for VMOV.F32 d0, #0 (or at least I couldn't find one), so this takes two instructions instead of one.

> +	/* Consume 4 elements in x vector and 8 elements in y
> +	 * vector. However, the 8'th element in y never really gets
> +	 * touched in this loop. So, if len == 4,
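
The loop under review computes four lags at once: each step loads 8 values of `y`, builds the shifted windows with `vextq_f32`, and never uses the 8th value. A sketch of that pattern follows. It is not the patch's code (which is truncated in the excerpt), the name `xcorr4_kernel` is hypothetical, and it assumes `len` is a multiple of 4 and that `y` is padded so the extra load stays in bounds.

```c
#if defined(__ARM_NEON) || defined(__ARM_NEON__)
#include <arm_neon.h>
#endif

/* Hypothetical kernel: sum[k] = sum_{j<len} x[j] * y[j + k], k = 0..3.
 * Assumes len % 4 == 0. On the NEON path y must have len + 4 readable
 * floats: each step loads y[j..j+7] but y[j+7] is never used. */
void xcorr4_kernel(const float *x, const float *y, float sum[4], int len)
{
#if defined(__ARM_NEON) || defined(__ARM_NEON__)
    float32x4_t acc = vdupq_n_f32(0.0f);
    for (int j = 0; j < len; j += 4) {
        float32x4_t xx = vld1q_f32(x + j);
        float32x4_t y0 = vld1q_f32(y + j);      /* y[j..j+3]   */
        float32x4_t y4 = vld1q_f32(y + j + 4);  /* y[j+4..j+7] */
        float32x2_t xlo = vget_low_f32(xx), xhi = vget_high_f32(xx);
        /* Lane k of each term holds x[j+m] * y[j+m+k]. */
        acc = vmlaq_lane_f32(acc, y0, xlo, 0);                    /* m = 0 */
        acc = vmlaq_lane_f32(acc, vextq_f32(y0, y4, 1), xlo, 1);  /* m = 1 */
        acc = vmlaq_lane_f32(acc, vextq_f32(y0, y4, 2), xhi, 0);  /* m = 2 */
        acc = vmlaq_lane_f32(acc, vextq_f32(y0, y4, 3), xhi, 1);  /* m = 3 */
    }
    vst1q_f32(sum, acc);
#else
    /* Scalar fallback computing the same four lags. */
    for (int k = 0; k < 4; k++) {
        float s = 0.0f;
        for (int j = 0; j < len; j++)
            s += x[j] * y[j + k];
        sum[k] = s;
    }
#endif
}
```

ARMv7 NEON always flushes denormals to zero, so the NEON path is not guaranteed to match the scalar path bit for bit.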

Vectorization with fast-math on irregular ISA sub-sets

2016 Feb 11

----- Original Message -----
> From: "Renato Golin" <renato.golin at linaro.org>
> To: "Hal Finkel" <hfinkel at anl.gov>
> Cc: "James Molloy" <James.Molloy at arm.com>, "Nadav Rotem" <nrotem at apple.com>, "Arnold Schwaighofer"
> <aschwaighofer at apple.com>, "LLVM Dev" <llvm-dev at

[LLVMdev] NEON intrinsics preventing redundant load optimization?

2014 Dec 10

On 9 Dec 2014, at 02:20, Jim Grosbach <grosbach at apple.com> wrote:
>> On Dec 8, 2014, at 1:05 AM, Simon Taylor <simontaylor1 at ntlworld.com> wrote:
>>
>> On 8 Dec 2014, at 00:13, Renato Golin <renato.golin at linaro.org> wrote:
>>
>>> On 7 December 2014 at 19:15, Simon Taylor <simontaylor1 at ntlworld.com> wrote:
>>>> Is

[LLVMdev] ARM NEON intrinsics in clang

2013 Sep 26

> To answer your question I am testing on a pandaboard currently, which has
>> an arm cortex-a9 processor, which I think is 64-bit.
>>
>
> Cortex-A9 is still 32-bits, so you'll have all support you need. ;)
>

Ah, Okay, embarrassing... however it doesn't if I remove the -ffreestanding flag. I need to figure
>> this out next.
>>
>
> Can you at

[PATCH] Use NEON intrinsics detection that fails with gcc 4.8.

2017 Mar 23

gcc 4.8's NEON intrinsics have bugs that prevent opus's NEON intrinsics from compiling.
---
 configure.ac | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/configure.ac b/configure.ac
index fca746f..f945b9a 100644
--- a/configure.ac
+++ b/configure.ac
@@ -471,7 +471,7 @@ AS_IF([test x"$enable_intrinsics" = x"yes"],[
 ]], [[

[RFC PATCHv1] cover: celt_pitch_xcorr: Introduce ARM neon intrinsics

2014 Nov 21

Hello, I received feedback from engineers working on NE10 [1] that it would be better to use NE10 [1] for FFT optimizations for opus use cases. However, these FFT patches are currently in review and haven't been integrated into NE10 yet. While the FFT functions in NE10 are getting baked, I wanted to optimize the celt_pitch_xcorr (floating point only) and use it to introduce ARM NEON

[LLVMdev] NEON intrinsics preventing redundant load optimization?

2014 Dec 07

Hi all, I’m not sure if this is the right list, so apologies if not. Doing some profiling I noticed some of my hand-tuned matrix multiply code with NEON intrinsics was much slower through a C++ template wrapper vs calling the intrinsics function directly. It turned out clang/LLVM was unable to eliminate a temporary even though the case seemed quite straightforward. Unfortunately any loads

[LLVMdev] NEON intrinsics preventing redundant load optimization?

2015 Jan 05

On 4 Jan 2015, at 21:06, Tim Northover <t.p.northover at gmail.com> wrote:
>>> I’ve managed to replace the load/store intrinsics with pointer dereferences (along with a typedef to get the alignment correct). This generates 100% the same IR + asm as the auto-vectorized C version (both using -O3), and works with the toolchain in the latest XCode. Are there any concerns around doing
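
The message describes replacing load/store intrinsics with plain pointer dereferences plus a typedef that sets the alignment, but does not show the code. The sketch below illustrates the idea with GCC/Clang generic vector extensions instead of NEON's `float32x4_t` (a swapped-in technique, assumed to compile on both compilers). The `v4f_u` typedef is an assumed way to say "four floats, only float-aligned, allowed to alias `float`", and `mat4_mul` is a hypothetical column-major 4x4 multiply.

```c
/* Four floats, 4-byte aligned, may alias float: safe to dereference
 * from any float pointer (GCC/Clang vector extension attributes). */
typedef float v4f_u __attribute__((vector_size(16), may_alias, aligned(4)));

/* Hypothetical column-major 4x4 multiply: out = a * b.
 * out must not overlap a or b. Loads and stores are plain dereferences. */
void mat4_mul(const float *a, const float *b, float *out)
{
    for (int j = 0; j < 4; j++) {
        v4f_u acc = {0.0f, 0.0f, 0.0f, 0.0f};
        for (int k = 0; k < 4; k++) {
            float bkj = b[4 * j + k];           /* element (k, j) of b */
            v4f_u s = {bkj, bkj, bkj, bkj};     /* broadcast it */
            acc += *(const v4f_u *)(a + 4 * k) * s;  /* column k of a */
        }
        *(v4f_u *)(out + 4 * j) = acc;          /* column j of out */
    }
}
```

Plain loads and stores are visible to the optimizer's ordinary memory analyses, which is what the thread relied on to get the redundant loads removed.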

[LLVMdev] ARM NEON intrinsics in clang

2013 Sep 26

On 26 September 2013 17:52, Stanislav Manilov <stanislav.manilov at gmail.com> wrote:
> To answer your question I am testing on a pandaboard currently, which has
> an arm cortex-a9 processor, which I think is 64-bit.
>

Cortex-A9 is still 32-bits, so you'll have all support you need. ;)

however it doesn't if I remove the -ffreestanding flag. I need to figure
> this out
