Displaying 20 results from an estimated 5000 matches similar to: "[LLVMdev] ARM NEON intrinsics in clang"
2013 Sep 26
2
[LLVMdev] ARM NEON intrinsics in clang
Hello Renato,
It turned out I just didn't do the cross-compilation correctly, and Tim
Northover already pointed me to a guide you have written on it (
http://clang.llvm.org/docs/CrossCompilation.html), so I will read that
before continuing with my efforts.
To answer your question I am testing on a pandaboard currently, which has
an arm cortex-a9 processor, which I think is 64-bit.
I am much
2014 Dec 07
2
[RFC PATCH v2] cover: armv7: celt_pitch_xcorr: Introduce ARM neon intrinsics
Hi,
Optimizes celt_pitch_xcorr for floating point.
Changes from RFCv1:
- Rebased on top of commit
aad281878: Fix celt_pitch_xcorr_c signature.
which got rid of ugly code around CELT_PITCH_XCORR_IMPL
passing of "arch" parameter.
- Unified with --enable-intrinsics used by x86
- Modified algorithm to be more in-line with algorithm in
celt_pitch_xcorr_arm.s
Viswanath Puttagunta
2014 Dec 09
1
[RFC PATCH v2] armv7: celt_pitch_xcorr: Introduce ARM neon intrinsics
Viswanath Puttagunta wrote:
> + SUMM = vdupq_n_f32(0);
It kills me that there's no intrinsic for VMOV.F32 d0, #0 (or at least I
couldn't find one), so this takes two instructions instead of one.
> + /* Consume 4 elements in x vector and 8 elements in y
> + * vector. However, the 8'th element in y never really gets
> + * touched in this loop. So, if len == 4,
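For readers unfamiliar with the routine under review, here is a minimal sketch of a floating-point pitch cross-correlation. It is not the code from the patch: the names `inner_prod_sketch` and `pitch_xcorr_sketch` are hypothetical, and the loop is deliberately simpler than the unrolling quoted above that consumes 4 elements of x and 8 of y. The NEON path is compiled only when the compiler defines `__ARM_NEON`; on other targets the scalar loop does all the work. The caller is assumed to provide `len + max_pitch - 1` readable elements in `y`.

```c
#include <assert.h>
#if defined(__ARM_NEON)
#include <arm_neon.h>
#endif

/* Hypothetical helper: inner product of x[0..len) and y[0..len). */
static float inner_prod_sketch(const float *x, const float *y, int len)
{
    float sum = 0.0f;
    int j = 0;

    assert(len >= 0);
#if defined(__ARM_NEON)
    /* Accumulate four products per iteration in a 128-bit register. */
    float32x4_t acc = vdupq_n_f32(0.0f);
    for (; j + 4 <= len; j += 4) {
        acc = vmlaq_f32(acc, vld1q_f32(x + j), vld1q_f32(y + j));
    }
    /* Horizontal add of the four lanes. */
    float32x2_t half = vadd_f32(vget_low_f32(acc), vget_high_f32(acc));
    half = vpadd_f32(half, half);
    sum = vget_lane_f32(half, 0);
#endif
    /* Scalar tail (or the whole loop when NEON is unavailable). */
    for (; j < len; j++) {
        sum += x[j] * y[j];
    }
    return sum;
}

/* Hypothetical driver: xcorr[i] = sum over j of x[j] * y[i + j]. */
static void pitch_xcorr_sketch(const float *x, const float *y,
                               float *xcorr, int len, int max_pitch)
{
    for (int i = 0; i < max_pitch; i++) {
        xcorr[i] = inner_prod_sketch(x, y + i, len);
    }
}
```

Handling the remainder in a plain scalar loop keeps the vector loop from reading past the end of either buffer, which is the kind of corner case the quoted comment about `len == 4` is concerned with.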
2013 Sep 26
0
[LLVMdev] ARM NEON intrinsics in clang
On 26 September 2013 17:52, Stanislav Manilov
<stanislav.manilov at gmail.com> wrote:
> To answer your question I am testing on a pandaboard currently, which has
> an arm cortex-a9 processor, which I think is 64-bit.
>
Cortex-A9 is still 32-bits, so you'll have all support you need. ;)
> however it doesn't if I remove the -ffreestanding flag. I need to figure
> this out
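One way to settle the 32-bit versus 64-bit question for a given toolchain is to ask the compiler which execution state it targets. The sketch below relies on the predefined macros `__aarch64__`, `__arm__`, and `__ARM_NEON` (with the older spelling `__ARM_NEON__`); the function names are hypothetical and appear nowhere in the thread.

```c
#include <stddef.h>

/* Hypothetical helper: name the ARM execution state the compiler targets. */
static const char *target_arch_sketch(void)
{
#if defined(__aarch64__)
    return "AArch64 (64-bit)";
#elif defined(__arm__)
    return "AArch32 (32-bit)";
#else
    return "not ARM";
#endif
}

/* Hypothetical helper: 1 if NEON intrinsics are enabled for this build. */
static int target_has_neon_sketch(void)
{
#if defined(__ARM_NEON) || defined(__ARM_NEON__)
    return 1;
#else
    return 0;
#endif
}

/* Pointer width in bits, assuming 8-bit bytes. */
static int target_pointer_bits_sketch(void)
{
    return (int)(sizeof(void *) * 8);
}
```

When cross-compiling for a Cortex-A9, a correctly configured build would be expected to report the 32-bit state. A host result here usually means the target options never reached the compiler.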
2014 Dec 19
2
[PATCH v1] cover: armv7: celt_pitch_xcorr: Introduce ARM neon intrinsics
Hi,
Optimizes celt_pitch_xcorr for ARM NEON floating point.
Changes from RFCv3:
- celt_neon_intr.c
- removed warnings due to not having constant pointers
- Put simpler loop to take care of corner cases. Unrolling using
intrinsics was not really mapping well to what was done
in celt_pitch_xcorr_arm.s
- Makefile.am
Removed explicit -O3 optimization
- test_unit_mathops.c,
2014 Nov 09
3
[RFC PATCH v1] arm: kf_bfly4: Introduce ARM neon intrinsics
Hello,
This patch introduces ARM NEON intrinsics to optimize the
kf_bfly4 routine in the CELT part of libopus.
Using the NEON-optimized kf_bfly4 (_neon) routine improved the
performance of the opus_fft_impl function by about 21.4%. The
end use case, decoding a music Opus Ogg file, saw a
performance improvement of about 4.47%.
This patch has 2 components
i. Actual neon code to improve
2014 Dec 10
2
[RFC PATCH v3] cover: armv7: celt_pitch_xcorr: Introduce ARM neon intrinsics
Hi,
Optimizes celt_pitch_xcorr for floating point.
Changes from RFCv2:
- Changes recommended by Timothy for celt_neon_intr.c
everything except, left the unrolled loop still unrolled
- configure.ac
- use AC_LINK_IFELSE instead of AC_COMPILE_IFELSE
- Moved compile flags into Makefile.am
- OPUS_ARM_NEON_INR --> typo --> OPUS_ARM_NEON_INTR
Viswanath Puttagunta (1):
armv7:
2014 Dec 07
3
[RFC PATCH v2] cover: armv7: celt_pitch_xcorr: Introduce ARM neon intrinsics
From: Viswanath Puttagunta <viswanath.puttagunta at linaro.org>
Hi,
Optimizes celt_pitch_xcorr for floating point.
Changes from RFCv1:
- Rebased on top of commit
aad281878: Fix celt_pitch_xcorr_c signature.
which got rid of ugly code around CELT_PITCH_XCORR_IMPL
passing of "arch" parameter.
- Unified with --enable-intrinsics used by x86
- Modified algorithm to be more
2014 Nov 21
4
[RFC PATCHv1] cover: celt_pitch_xcorr: Introduce ARM neon intrinsics
Hello,
I received feedback from engineers working on NE10 [1] that
it would be better to use NE10 [1] for FFT optimizations for
opus use cases. However, these FFT patches are currently in review
and haven't been integrated into NE10 yet.
While the FFT functions in NE10 are getting baked, I wanted
to optimize the celt_pitch_xcorr (floating point only) and use
it to introduce ARM NEON
2013 Oct 02
0
[LLVMdev] Implementing the ARM NEON Intrinsics for PowerPC
On 2 October 2013 12:17, Renato Golin <renato.golin at linaro.org> wrote:
> On 2 October 2013 10:12, Steven Newbury <steve at snewbury.org.uk> wrote:
>
>> How does this make any sense?
>>
>
> I have to agree with you that this doesn't make much sense, but there is a
> case where you would want something like that: when the original source
> uses NEON
2013 Oct 02
3
[LLVMdev] Implementing the ARM NEON Intrinsics for PowerPC
On 2 October 2013 10:12, Steven Newbury <steve at snewbury.org.uk> wrote:
> How does this make any sense?
>
I have to agree with you that this doesn't make much sense, but there is a
case where you would want something like that: when the original source
uses NEON intrinsics, and there is no alternative in AltiVec, AVX or even
plain C.
We encourage people to use NEON intrinsics,
2013 Sep 26
0
[LLVMdev] ARM NEON intrinsics in clang
Hello Tim,
> > I spent the last three days trying to compile a version of LLVM that would
> > allow me to compile sources that contain these intrinsics, but with no
> > success.
>
> Ok. This we can probably help with. Did you manage to build a version
> of Clang (preferably from git/subversion)?
>
Yes, I managed to build the latest (r191291) svn revision of LLVM + clang.
If
2013 Sep 26
1
[LLVMdev] ARM NEON intrinsics in clang
>> To answer your question I am testing on a pandaboard currently, which has
>> an arm cortex-a9 processor, which I think is 64-bit.
>>
>
> Cortex-A9 is still 32-bits, so you'll have all support you need. ;)
>
Ah, Okay, embarrassing...
>> however it doesn't if I remove the -ffreestanding flag. I need to figure
>> this out next.
>>
>
> Can you at
2013 Oct 02
5
[LLVMdev] Implementing the ARM NEON Intrinsics for PowerPC
Hello Hal,
I am not very familiar with the DSP capabilities of PowerPC, but I imagine
there will be instructions for simple vector operations like vector
addition, multiplication, etc. so for these I imagine the implementation
would consist of just outputting the correct instruction. However, for NEON
instructions like the reciprocal step (see
2013 Oct 01
3
[LLVMdev] Implementing the ARM NEON Intrinsics for PowerPC
Hello LLVM Devs,
Thanks for helping me previously to cross-compile for ARM. I managed to get
a working toolchain and am currently having fun compiling different toy
problems and running them on a pandaboard.
As part of my research I am trying to implement the ARM NEON Intrinsics in
the PowerPC LLVM backend. I am still at the beginning of my efforts and am
not yet familiar with either the ARM or
2014 Sep 10
4
[RFC PATCH v1 0/3] Introducing ARM SIMD Support
libvorbis does not currently have any SIMD/vectorization.
The following patches add a generic framework for SIMD/vectorization
and, on top of it, add ARM NEON SIMD vectorization using intrinsics.
I was able to get over 34% performance improvement on my
Beaglebone Black, which has a single Cortex-A8 based CPU.
You can find more information on the metrics and the procedure I used
to measure at
2019 Sep 05
2
ARM vectorized fp16 support
Hi,
I'm trying to compile a half-precision program for ARM, but it seems
LLVM fails to automatically generate fused multiply-add instructions
for c += a * b. I'm wondering whether I did something wrong. If not,
is it a missing feature that will be supported later? (I know there are
fp16 FMLA intrinsics, though.)
Test programs and outputs,
$ clang -O3 -march=armv8.2-a+fp16fml
2014 Dec 10
2
[LLVMdev] NEON intrinsics preventing redundant load optimization?
On 9 Dec 2014, at 02:20, Jim Grosbach <grosbach at apple.com> wrote:
>> On Dec 8, 2014, at 1:05 AM, Simon Taylor <simontaylor1 at ntlworld.com> wrote:
>>
>> On 8 Dec 2014, at 00:13, Renato Golin <renato.golin at linaro.org> wrote:
>>
>>> On 7 December 2014 at 19:15, Simon Taylor <simontaylor1 at ntlworld.com> wrote:
>>>> Is
2011 Nov 23
4
[LLVMdev] arm neon intrinsics cross compile error on windows system
Dear all,
I built LLVM 3.0 rc4 with the Clang front end in a Windows environment (also with
the -DLLVM_TARGETS_TO_BUILD=all option).
To test ARM NEON intrinsics, I tried to compile some code that includes a few
NEON intrinsics.
Although I got well-formed bitcode on an Ubuntu build PC, compiling the same code
on Windows produces some errors.
Could you let me know why the errors occurred? Is this just a
2015 Nov 19
3
[PATCH 1/3] Add configure check for Aarch64-specific Neon intrinsics.
---
configure.ac | 20 ++++++++++++++++++++
1 file changed, 20 insertions(+)
diff --git a/configure.ac b/configure.ac
index 90a06c8..adcb969 100644
--- a/configure.ac
+++ b/configure.ac
@@ -503,6 +503,26 @@ AS_IF([test x"$enable_intrinsics" = x"yes"],[
[rtcd_support="$rtcd_support (NE10)"])
])
+ OPUS_CHECK_INTRINSICS(
+