thr3ads.net - similar to: "[LLVMdev] ARM NEON intrinsics in clang"

Displaying 20 results from an estimated 5000 matches similar to: "[LLVMdev] ARM NEON intrinsics in clang"

2013 Sep 26

[LLVMdev] ARM NEON intrinsics in clang

Hello Renato, It turned out I just didn't do the cross-compilation correctly, and Tim Northover already pointed me to a guide you have written on it (http://clang.llvm.org/docs/CrossCompilation.html), so I will read that before continuing with my efforts. To answer your question, I am currently testing on a PandaBoard, which has an ARM Cortex-A9 processor, which I think is 64-bit. I am much

[RFC PATCH v2] cover: armv7: celt_pitch_xcorr: Introduce ARM neon intrinsics

2014 Dec 07

Hi,
Optimizes celt_pitch_xcorr for floating point.
Changes from RFCv1:
- Rebased on top of commit aad281878: Fix celt_pitch_xcorr_c signature. which got rid of ugly code around CELT_PITCH_XCORR_IMPL passing of "arch" parameter.
- Unified with --enable-intrinsics used by x86
- Modified algorithm to be more in-line with algorithm in celt_pitch_xcorr_arm.s
Viswanath Puttagunta
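For readers unfamiliar with the routine these patches target, the following is a minimal scalar sketch of a pitch cross-correlation, assuming it is the usual sliding inner product of x against successive offsets of y. The name pitch_xcorr_ref and its signature are hypothetical and chosen for illustration; this is not the libopus source and not the NEON code from the patch.

```c
#include <stddef.h>

/* Hypothetical scalar reference (illustration only, not libopus code).
 * For each lag i in [0, max_pitch), computes
 *     xcorr[i] = sum over j in [0, len) of x[j] * y[i + j].
 * The caller must provide at least len + max_pitch - 1 elements in y. */
void pitch_xcorr_ref(const float *x, const float *y, float *xcorr,
                     size_t len, size_t max_pitch)
{
    for (size_t i = 0; i < max_pitch; i++) {
        float sum = 0.0f;
        for (size_t j = 0; j < len; j++) {
            sum += x[j] * y[i + j];
        }
        xcorr[i] = sum;
    }
}
```

A vectorized version would typically process several x elements per iteration and keep a simpler scalar loop like this one for leftover elements.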

[RFC PATCH v2] armv7: celt_pitch_xcorr: Introduce ARM neon intrinsics

2014 Dec 09

Viswanath Puttagunta wrote:
> + SUMM = vdupq_n_f32(0);
It kills me that there's no intrinsic for VMOV.F32 d0, #0 (or at least I couldn't find one), so this takes two instructions instead of one.
> + /* Consume 4 elements in x vector and 8 elements in y
> + * vector. However, the 8'th element in y never really gets
> + * touched in this loop. So, if len == 4,

[LLVMdev] ARM NEON intrinsics in clang

2013 Sep 26

On 26 September 2013 17:52, Stanislav Manilov <stanislav.manilov at gmail.com> wrote:
> To answer your question I am testing on a pandaboard currently, which has an arm cortex-a9 processor, which I think is 64-bit.
Cortex-A9 is still 32-bits, so you'll have all support you need. ;)
> however it doesn't if I remove the -ffreestanding flag. I need to figure this out

[PATCH v1] cover: armv7: celt_pitch_xcorr: Introduce ARM neon intrinsics

2014 Dec 19

Hi,
Optimizes celt_pitch_xcorr for ARM NEON floating point.
Changes from RFCv3:
- celt_neon_intr.c
  - removed warnings due to not having constant pointers
  - Put simpler loop to take care of corner cases. Unrolling using intrinsics was not really mapping well to what was done in celt_pitch_xcorr_arm.s
- Makefile.am
  - Removed explicit -O3 optimization
- test_unit_mathops.c,

[RFC PATCH v1] arm: kf_bfly4: Introduce ARM neon intrinsics

2014 Nov 09

Hello, This patch introduces ARM NEON Intrinsics to optimize kf_bfly4 routine in celt part of libopus. Using NEON optimized kf_bfly4(_neon) routine helped improve performance of opus_fft_impl function by about 21.4%. The end use case was decoding a music opus ogg file. The end use case saw performance improvement of about 4.47%. This patch has 2 components i. Actual neon code to improve

[RFC PATCH v3] cover: armv7: celt_pitch_xcorr: Introduce ARM neon intrinsics

2014 Dec 10

Hi,
Optimizes celt_pitch_xcorr for floating point.
Changes from RFCv2:
- Changes recommended by Timothy for celt_neon_intr.c everything except, left the unrolled loop still unrolled
- configure.ac
- use AC_LINK_IFELSE instead of AC_COMPILE_IFELSE
- Moved compile flags into Makefile.am
- OPUS_ARM_NEON_INR --> typo --> OPUS_ARM_NEON_INTR
Viswanath Puttagunta (1): armv7:

[RFC PATCH v2] cover: armv7: celt_pitch_xcorr: Introduce ARM neon intrinsics

2014 Dec 07

From: Viswanath Puttagunta <viswanath.puttagunta at linaro.org>
Hi,
Optimizes celt_pitch_xcorr for floating point.
Changes from RFCv1:
- Rebased on top of commit aad281878: Fix celt_pitch_xcorr_c signature. which got rid of ugly code around CELT_PITCH_XCORR_IMPL passing of "arch" parameter.
- Unified with --enable-intrinsics used by x86
- Modified algorithm to be more

[RFC PATCHv1] cover: celt_pitch_xcorr: Introduce ARM neon intrinsics

2014 Nov 21

Hello, I received feedback from engineers working on NE10 [1] that it would be better to use NE10 [1] for FFT optimizations for opus use cases. However, these FFT patches are currently in review and haven't been integrated into NE10 yet. While the FFT functions in NE10 are getting baked, I wanted to optimize the celt_pitch_xcorr (floating point only) and use it to introduce ARM NEON

[LLVMdev] Implementing the ARM NEON Intrinsics for PowerPC

2013 Oct 02

On 2 October 2013 12:17, Renato Golin <renato.golin at linaro.org> wrote:
> On 2 October 2013 10:12, Steven Newbury <steve at snewbury.org.uk> wrote:
>
>> How does this make any sense?
>
> I have to agree with you that this doesn't make much sense, but there is a case where you would want something like that: when the original source uses NEON

[LLVMdev] Implementing the ARM NEON Intrinsics for PowerPC

2013 Oct 02

On 2 October 2013 10:12, Steven Newbury <steve at snewbury.org.uk> wrote:
> How does this make any sense?
I have to agree with you that this doesn't make much sense, but there is a case where you would want something like that: when the original source uses NEON intrinsics, and there is no alternative in AltiVec, AVX or even plain C. We encourage people to use NEON intrinsics,

[LLVMdev] ARM NEON intrinsics in clang

2013 Sep 26

Hello Tim,
>> I spent the last three days trying to compile a version of LLVM that would allow me to compile sources that contain these intrinsics, but with no success.
>
> Ok. This we can probably help with. Did you manage to build a version of Clang (preferably from git/subversion)?
Yes, I managed to build the latest (r191291) svn revision of LLVM + clang. If

[LLVMdev] ARM NEON intrinsics in clang

2013 Sep 26

>> To answer your question I am testing on a pandaboard currently, which has an arm cortex-a9 processor, which I think is 64-bit.
>
> Cortex-A9 is still 32-bits, so you'll have all support you need. ;)
Ah, Okay, embarrassing...
>> however it doesn't if I remove the -ffreestanding flag. I need to figure this out next.
>
> Can you at

[LLVMdev] Implementing the ARM NEON Intrinsics for PowerPC

2013 Oct 02

Hello Hal, I am not very familiar with the DSP capabilities of PowerPC, but I imagine there will be instructions for simple vector operations like vector addition, multiplication, etc. so for these I imagine the implementation would consist of just outputting the correct instruction. However, for NEON instructions like the reciprocal step (see

[LLVMdev] Implementing the ARM NEON Intrinsics for PowerPC

2013 Oct 01

Hello LLVM Devs, Thanks for helping me previously to cross-compile for ARM, I managed to get a working toolchain and am currently having fun compiling different toy problems and running them on a pandaboard. As part of my research I am trying to implement the ARM NEON Intrinsics in the PowerPC LLVM backend. I am still at the beginning of my efforts and am not yet familiar with either the ARM or

[RFC PATCH v1 0/3] Introducing ARM SIMD Support

2014 Sep 10

libvorbis does not currently have any SIMD/vectorization. The following patches add a generic framework for SIMD/vectorization and, on top of it, ARM NEON SIMD vectorization using intrinsics. I was able to get over 34% performance improvement on my BeagleBone Black, which has a single Cortex-A8 based CPU. You can find more information on the metrics and the procedure I used to measure at

ARM vectorized fp16 support

2019 Sep 05

Hi, I'm trying to compile a half-precision program for ARM, but it seems LLVM fails to automatically generate fused multiply-add instructions for c += a * b. I'm wondering whether I did something wrong; if not, is it a missing feature that will be supported later? (I know there are fp16 FMLA intrinsics, though.) Test programs and outputs: $ clang -O3 -march=armv8.2-a+fp16fml

[LLVMdev] NEON intrinsics preventing redundant load optimization?

2014 Dec 10

On 9 Dec 2014, at 02:20, Jim Grosbach <grosbach at apple.com> wrote:
>> On Dec 8, 2014, at 1:05 AM, Simon Taylor <simontaylor1 at ntlworld.com> wrote:
>>
>> On 8 Dec 2014, at 00:13, Renato Golin <renato.golin at linaro.org> wrote:
>>
>>> On 7 December 2014 at 19:15, Simon Taylor <simontaylor1 at ntlworld.com> wrote:
>>>> Is

[LLVMdev] arm neon intrinsics cross compile error on windows system

2011 Nov 23

Dear all. I built LLVM 3.0 rc4 with the Clang front end in a Windows environment (also with the -DLLVM_TARGETS_TO_BUILD=all option). To test ARM NEON intrinsics, I tried to compile some code that includes a few NEON intrinsics. Although I got well-formed bitcode on an Ubuntu build PC, compiling the same code on Windows shows some errors. Could you let me know why the errors occurred? is this just a

[PATCH 1/3] Add configure check for Aarch64-specific Neon intrinsics.

2015 Nov 19

---
 configure.ac | 20 ++++++++++++++++++++
 1 file changed, 20 insertions(+)

diff --git a/configure.ac b/configure.ac
index 90a06c8..adcb969 100644
--- a/configure.ac
+++ b/configure.ac
@@ -503,6 +503,26 @@ AS_IF([test x"$enable_intrinsics" = x"yes"],[
 [rtcd_support="$rtcd_support (NE10)"])
 ])
+ OPUS_CHECK_INTRINSICS(
+
