Displaying 16 results from an estimated 16 matches for "vecsum".
2019 Oct 13
2
Replicate Individual O3 optimizations
Hello,
I want to study the individual O3 optimizations. For this I am using
following commands, but unable to replicate O3 behavior.
1. Documents/clang+llvm-9.0.0-x86_64-linux-gnu-ubuntu-18.04/bin/clang -O1
-Xclang -disable-llvm-passes -emit-llvm -S vecsum.c -o vecsum-noopt.ll
2. Documents/clang+llvm-9.0.0-x86_64-linux-gnu-ubuntu-18.04/bin/clang -O3
-mllvm -debug-pass=Arguments -emit-llvm -S vecsum.c
3. Documents/clang+llvm-9.0.0-x86_64-linux-gnu-ubuntu-18.04/bin/opt
<optimization sequence obtained in step 2> -S vecsum-noopt.ll -S -o
o3-chk.l...
2019 Oct 19
3
Replicate Individual O3 optimizations
...Hello,
> > I want to study the individual O3 optimizations. For this I am using
> > following commands, but unable to replicate O3 behavior.
> >
> > 1. Documents/clang+llvm-9.0.0-x86_64-linux-gnu-ubuntu-18.04/bin/clang -O1
> > -Xclang -disable-llvm-passes -emit-llvm -S vecsum.c -o vecsum-noopt.ll
> >
> > 2. Documents/clang+llvm-9.0.0-x86_64-linux-gnu-ubuntu-18.04/bin/clang -O3
> > -mllvm -debug-pass=Arguments -emit-llvm -S vecsum.c
> >
> > 3. Documents/clang+llvm-9.0.0-x86_64-linux-gnu-ubuntu-18.04/bin/opt
> > <optimization sequenc...
2019 Oct 24
2
Replicate Individual O3 optimizations
...o study the individual O3 optimizations. For this I am using
>> > following commands, but unable to replicate O3 behavior.
>> >
>> > 1. Documents/clang+llvm-9.0.0-x86_64-linux-gnu-ubuntu-18.04/bin/clang
>> -O1
>> > -Xclang -disable-llvm-passes -emit-llvm -S vecsum.c -o vecsum-noopt.ll
>> >
>> > 2. Documents/clang+llvm-9.0.0-x86_64-linux-gnu-ubuntu-18.04/bin/clang
>> -O3
>> > -mllvm -debug-pass=Arguments -emit-llvm -S vecsum.c
>> >
>> > 3. Documents/clang+llvm-9.0.0-x86_64-linux-gnu-ubuntu-18.04/bin/opt
>&...
2019 Jan 16
3
Issues with using scalar evolution with newer versions of LLVM IR
Thank You..
I used following command to generate .bc or .ll
/Documents/clang+llvm-4.0.0-x86_64-linux-gnu-ubuntu-16.04/bin/clang -O0
-emit-llvm -S -o vec4.ll vecsum.c
/Documents/clang+llvm-7.0.0-x86_64-linux-gnu-ubuntu-16.04/bin/clang -O0
-emit-llvm -S -o vec7.ll vecsum.c
On Wed, Jan 16, 2019 at 6:49 AM Sanjoy Das <sanjoy at playingwithpointers.com>
wrote:
> It is hard to tell what's going on from the information you have
> provided. How ar...
2018 Jan 29
1
Polly loop offloading to Accelerator
...lows;
#include <stdio.h>
int a[2048], b[2048], c[2048];
foo () {
int i;
for (i=0; i<2048; i++) {
a[i]=b[5] + c[i];
}
}
i executed following commands;
$clang -S -emit-llvm vec-sum.cpp -march=native -O3 -mllvm
-disable-llvm-optzns -o vec-sum.s
$opt -S -polly-canonicalize vec-sum.s > vecsum.preopt.ll
$opt -polly-ast -polly-ast-detect-parallel -analyze -q vecsum.preopt.ll
-polly-process-unprofitable
the output is;
:: isl ast :: foo :: %1---%8
if (1)
#pragma simd
#pragma known-parallel
for (int c0 = 0; c0 <= 2047; c0 += 1)
Stmt0(c0);
else
{ /* original cod...
2016 Jun 17
5
ARM NEON optimization -- celt_fir()
Hi all,
This is Linfeng Zhang from Google. I'll work on ARM NEON optimization in the
next few months.
I'm submitting 2 patches in the following couple of emails, which have the new
created celt_fir_neon().
I revised celt_fir_c() to not pass in argument "mem" in Patch 1. If there are
concerns to this change, please let me know.
Many thanks to your comments.
Linfeng Zhang
2015 Mar 13
1
[RFC PATCH v3] Intrinsics/RTCD related fixes. Mostly x86.
...4 ], int len)
+
+void dual_inner_prod_sse(const opus_val16 *x, const opus_val16 *y01, const opus_val16 *y02,
+ int N, opus_val32 *xy1, opus_val32 *xy2)
{
- int j;
-
- __m128i vecX, vecX0, vecX1, vecX2, vecX3;
- __m128i vecY0, vecY1, vecY2, vecY3;
- __m128i sum0, sum1, sum2, sum3, vecSum;
- __m128i initSum;
-
- celt_assert(len >= 3);
-
- sum0 = _mm_setzero_si128();
- sum1 = _mm_setzero_si128();
- sum2 = _mm_setzero_si128();
- sum3 = _mm_setzero_si128();
-
- for (j=0;j<(len-7);j+=8)
- {
- vecX = _mm_loadu_si128((__m128i *)(&x[j + 0]));
-...
2015 Mar 12
1
[RFC PATCHv2] Intrinsics/RTCD related fixes. Mostly x86.
...4 ], int len)
+
+void dual_inner_prod_sse(const opus_val16 *x, const opus_val16 *y01, const opus_val16 *y02,
+ int N, opus_val32 *xy1, opus_val32 *xy2)
{
- int j;
-
- __m128i vecX, vecX0, vecX1, vecX2, vecX3;
- __m128i vecY0, vecY1, vecY2, vecY3;
- __m128i sum0, sum1, sum2, sum3, vecSum;
- __m128i initSum;
-
- celt_assert(len >= 3);
-
- sum0 = _mm_setzero_si128();
- sum1 = _mm_setzero_si128();
- sum2 = _mm_setzero_si128();
- sum3 = _mm_setzero_si128();
-
- for (j=0;j<(len-7);j+=8)
- {
- vecX = _mm_loadu_si128((__m128i *)(&x[j + 0]));
-...
2019 Jan 15
2
Issues with using scalar evolution with newer versions of LLVM IR
Hello,
I am trying to use scalar evolution pass using following command;
opt -analyze -mem2reg -indvars -loop-simplify -scalar-evolution < vec.bc
when vec.bc is generated using newer version of LLVM i.e LLVM 6 and 7 i get
following message in the end;
Determining loop execution counts for: @main
Loop %8: Unpredictable backedge-taken count.
Loop %8: Unpredictable max backedge-taken count.
Loop
2015 Mar 02
13
Patch cleaning up Opus x86 intrinsics configury
The attached patch cleans up Opus's x86 intrinsics configury.
It:
* Makes ?enable-intrinsics work with clang and other non-GCC compilers
* Enables RTCD for the floating-point-mode SSE code in Celt.
* Disables use of RTCD in cases where the compiler targets an instruction set by default.
* Enables the SSE4.1 Silk optimizations that apply to the common parts of Silk when Opus is built in
2015 Mar 18
5
[RFC PATCH v1 0/4] Enable aarch64 intrinsics/Ne10
Hi All,
Since I continue to base my work on top of Jonathan's patch,
and my previous Ne10 fft/ifft/mdct_forward/backward patches,
I thought it would be better to just post all new patches
as a patch series. Please let me know if anyone disagrees
with this approach.
You can see wip branch of all latest patches at
https://git.linaro.org/people/viswanath.puttagunta/opus.git
Branch:
2016 Jul 14
6
Several patches of ARM NEON optimization
I rebased my previous 3 patches to the current master with minor changes.
Patches 1 to 3 replace all my previous submitted patches.
Patches 4 and 5 are new.
Thanks,
Linfeng Zhang
2015 Mar 31
6
[RFC PATCH v1 0/5] aarch64: celt_pitch_xcorr: Fixed point series
Hi Timothy,
As I mentioned earlier [1], I now fixed compile issues
with fixed point and resubmitting the patch.
I also have new patch that does intrinsics optimizations
for celt_pitch_xcorr targetting aarch64.
You can find my latest work-in-progress branch at [2]
For reference, you can use the Ne10 pre-built libraries
at [3]
Note that I am working with Phil at ARM to get my patch at [4]
2015 May 08
8
[RFC PATCH v2]: Ne10 fft fixed and previous 0/8]
Hi All,
As per Timothy's suggestion, disabling mdct_forward
for fixed point. Only effects
armv7,armv8: Extend fixed fft NE10 optimizations to mdct
Rest of patches are same as in [1]
For reference, latest wip code for opus is at [2]
Still working with NE10 team at ARM to get corner cases of
mdct_forward. Will update with another patch
when issue in NE10 gets fixed.
Regards,
Vish
[1]:
2015 May 15
11
[RFC V3 0/8] Ne10 fft fixed and previous
Hi All,
Changes from RFC v2 [1]
armv7,armv8: Extend fixed fft NE10 optimizations to mdct
- Overflow issue fixed by Phil at ARM. Ne10 wip at [2]. Should be upstream soon.
- So, re-enabled using fixed fft for mdct_forward which was disabled in RFCv2
armv7,armv8: Optimize fixed point fft using NE10 library
- Thanks to Jonathan Lennox, fixed some build fixes on iOS and some copy-paste errors
Rest
2015 Apr 28
10
[RFC PATCH v1 0/8] Ne10 fft fixed and previous
Hello Timothy / Jean-Marc / opus-dev,
This patch series is follow up on work I posted on [1].
In addition to what was posted on [1], this patch series mainly
integrates Fixed point FFT implementations in NE10 library into opus.
You can view my opus wip code at [2].
Note that while I found some issues both with the NE10 library(fixed fft)
and with Linaro toolchain (armv8 intrinsics), the work