search for: mavx2

Displaying 20 results from an estimated 27 matches for "mavx2".

2016 Aug 20
2
LLVM flags for Vectorization
Hi, I have been analyzing the LLVM vectorizer by running some benchmarks. For vectorization, I have used the following flags: -O3 -ffast-math -mavx2. Am I missing any other flags that would improve vectorizer performance? Thanks, Santanu Das, IIT Hyd
2017 Aug 21
2
Vectorization in LLVM x86 backend
Hi all, Recently I compiled the attached .c file using Clang with the "-mavx2 -mfma -m32 -O3" optimization flags. First I used -emit-llvm and inspected the LLVM IR, and there are no vector instructions. Then I got the assembly output of the file, and in it I can clearly see vector instructions. Neither the SLPVectorizer nor the LoopVectorizer is however doing any vecto...
2016 Oct 27
2
Bug with auto-vectorization of logf
...t array x of size n and output float array f(x), where f is either fabsf or logf. The LLVM 3.9 auto-vectorization docs claim that both functions will be vectorized: http://llvm.org/releases/3.9.0/docs/Vectorizers.html#vectorization-of-function-calls When running with "clang -O3 -march=x86-64 -mavx2 -ffast-math test.c -S -emit-llvm", the function calling fabsf is vectorized while the function calling logf is not. This is with clang 3.9, but I've also confirmed the bug exists back to at least clang 3.7. I've also observed that logf calls break vectorization of more complex loops,...
2017 Aug 21
2
Vectorization in LLVM x86 backend
...don't think the X86 backend will create any. > > ~Craig > > On Mon, Aug 21, 2017 at 8:49 AM, Charith Mendis via llvm-dev < > llvm-dev at lists.llvm.org> wrote: > >> >> Hi all, >> >> Recently I compiled the attached .c file using Clang with "-mavx2 -mfma >> -m32 -O3" optimization flags. >> >> First I used -emit-llvm and inspected the LLVM IR and there are no vector >> instructions. Then I got the assembly output of the file in it I can >> clearly see vector instructions in it. >> >> Neither the S...
2019 Feb 27
2
Problem with compiling OpenBLAS to work with R
...'s great instructions. However, when I run make, things go well to a certain point, and then go bad: make [snip] touch cygopenblas_haswellp-r0.3.5.a make -j 1 -C test all make[1]: Entering directory '/home/erinm/OPB_HOME/xianyi-OpenBLAS-eebc189/test' gfortran -O2 -Wall -frecursive -m64 -mavx2 -o sblat1 sblat1.o ../cygopenblas_haswellp-r0.3.5.a -L/usr/lib/gcc/x86_64-pc-msys/7.3.0 -L/usr/lib/gcc/x86_64-pc-msys/7.3.0/../../../../x86_64-pc-msys/lib/../lib -L/usr/lib/../lib -L/usr/lib/gcc/x86_64-pc-msys/7.3.0/../../../../x86_64-pc-msys/lib -L/usr/lib/w32api -lmsys-2.0 D:/msys64/usr/lib/....
2018 Feb 12
1
Pattern not recognized as reduction
...separate loop [-Rpass-analysis=loop-vectorize] for(int i=100;i<1000;i++) ^ ------------------------------------------------------------ ------------------------------------------------------------------- The command used for capturing the remarks from the above code is clang -O3 -ffast-math -mavx2 -Rpass=loop-vectorize -Rpass-analysis=loop-vectorize file.c On the other hand, if we replace sum[0] with x in the same code and make very slight changes, it gets vectorized and prints the following remarks: CODE_2 ------------------------------------------------------------ ------------...
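The contrast this snippet describes (accumulating into sum[0] versus into a scalar x) can be sketched roughly as below. The original code is not visible in the snippet, so the function names, the float element type, and the loop bodies are assumptions made for illustration; only the loop bounds and the sum[0]/x distinction come from the post. Whether a given clang version vectorizes either loop is something to check with the -Rpass flags quoted above.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical shape 1: the accumulator lives in memory (sum[0]),
   so every iteration reads and writes through a pointer. */
void reduce_into_array(const float *a, float *sum) {
    assert(a != NULL && sum != NULL);
    for (int i = 100; i < 1000; i++)
        sum[0] += a[i];
}

/* Hypothetical shape 2: the accumulator is a local scalar (x),
   which is the form the post reports as being vectorized. */
float reduce_into_scalar(const float *a) {
    assert(a != NULL);
    float x = 0.0f;
    for (int i = 100; i < 1000; i++)
        x += a[i];
    return x;
}
```

Both functions compute the same sum over a[100..999]; they differ only in where the running total is kept.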
2016 Oct 27
0
Bug with auto-vectorization of logf
...torization docs claim that both functions will be vectorized: http://llvm.org/releases/3.9.0/docs/Vectorizers.html#vectorization-of-function-calls <http://llvm.org/releases/3.9.0/docs/Vectorizers.html#vectorization-of-function-calls> > > When running with "clang -O3 -march=x86-64 -mavx2 -ffast-math test.c -S -emit-llvm", the function calling fabsf is vectorized while the function calling logf is not. This is with clang 3.9, but I've also confirmed the bug exists back to at least clang 3.7. I've also observed that logf calls break vectorization of more complex loops,...
2019 Feb 28
3
Problem with compiling OpenBLAS to work with R
...d then go bad: > > > > make > > [snip] > > > > touch cygopenblas_haswellp-r0.3.5.a > > make -j 1 -C test all > > make[1]: Entering directory > > '/home/erinm/OPB_HOME/xianyi-OpenBLAS-eebc189/test' > > gfortran -O2 -Wall -frecursive -m64 -mavx2 -o sblat1 sblat1.o > > ../cygopenblas_haswellp-r0.3.5.a -L/usr/lib/gcc/x86_64-pc-msys/7.3.0 > > -L/usr/lib/gcc/x86_64-pc-msys/7.3.0/../../../../x86_64-pc-msys/lib/../lib > > -L/usr/lib/../lib > > -L/usr/lib/gcc/x86_64-pc-msys/7.3.0/../../../../x86_64-pc-msys/lib > >...
2018 Dec 01
2
Restrict global constructors to base ISA
...urce file to the ISA as needed. Then, we guard the higher ISAs at runtime to avoid SIGILLs. It worked well until we added AVX2. For AVX2 we see this as expected: $ CXX=/opt/local/bin/clang++-mp-5.0 make /opt/local/bin/clang++-mp-5.0 ... -c chacha.cpp /opt/local/bin/clang++-mp-5.0 ... -mavx2 -c chacha_avx.cpp /opt/local/bin/clang++-mp-5.0 ... -msse2 -c chacha_simd.cpp ... At runtime we catch a SIGILL due to chacha_avx.cpp as shown below. It looks like global constructors are using instructions from AVX (vxorps), which is beyond what the machine supports. How do we tell Clang...
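The runtime guard mentioned in this snippet can be sketched as follows, assuming GCC or Clang on x86, where the `__builtin_cpu_supports` builtin is available. The kernel names are hypothetical stand-ins; in the layout the post describes, the AVX2 kernel would live in its own source file compiled with -mavx2. The sketch covers only the call-site guard and does not address the global-constructor problem the post reports.

```c
#include <assert.h>
#include <stddef.h>

/* Baseline kernel: would be compiled for the base ISA. (Hypothetical name.) */
static void xor_base(unsigned char *dst, const unsigned char *src, size_t n) {
    for (size_t i = 0; i < n; i++)
        dst[i] ^= src[i];
}

/* Stand-in for a kernel that would live in a separate file built with
   -mavx2. Here it is plain C so the sketch runs on any machine. */
static void xor_avx2(unsigned char *dst, const unsigned char *src, size_t n) {
    for (size_t i = 0; i < n; i++)
        dst[i] ^= src[i];
}

/* Returns nonzero only if the running CPU reports AVX2 support. */
static int has_avx2(void) {
#if (defined(__x86_64__) || defined(__i386__)) && defined(__GNUC__)
    return __builtin_cpu_supports("avx2") != 0;
#else
    return 0; /* assumption: treat unknown targets as lacking AVX2 */
#endif
}

/* Guarded entry point: the AVX2 path is only reached when supported. */
void xor_dispatch(unsigned char *dst, const unsigned char *src, size_t n) {
    assert(dst != NULL && src != NULL);
    if (has_avx2())
        xor_avx2(dst, src, n);
    else
        xor_base(dst, src, n);
}
```

Callers use only `xor_dispatch`, so no AVX2 code runs unless the check passes. Code that executes before any such check, such as static initializers in the -mavx2 file, is outside what this guard can protect.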
2016 Oct 28
1
Bug with auto-vectorization of logf
...on docs claim that both functions will be vectorized: http://llvm.org/releases/3.9.0/docs/Vectorizers.html#vectorization-of-function-calls <http://llvm.org/releases/3.9.0/docs/Vectorizers.html#vectorization-of-function-calls> >> >> When running with "clang -O3 -march=x86-64 -mavx2 -ffast-math test.c -S -emit-llvm", the function calling fabsf is vectorized while the function calling logf is not. This is with clang 3.9, but I've also confirmed the bug exists back to at least clang 3.7. I've also observed that logf calls break vectorization of more complex loops,...
2015 Jan 27
7
[LLVMdev] Embedding cpu and feature strings into IR and enabling switching subtarget on a per function basis
...target lookup to override the function attributes if the corresponding options were specified on the command line. - Fix clang to embed "-target-cpu" and "-target-feature" attributes in the IR. I've tested the changes I made and confirmed that target options such as "-mavx2" don't get dropped during LTO and are honored by backend codegen passes. This is my plan for the remaining tasks: 1. Fix other in-tree targets and other code-gen passes that are still using TargetMachine's subtarget where the per-function subtarget should be used. 2. Fix TargetTrans...
2019 Nov 28
2
SLP example not being vectorized
Hi, I am new to LLVM with a particular interest in the optimization area, especially SLP. While working through the tutorial, I ran this example [1] hoping to see SLP vectorization in action, but for some reason I do not see it in the LLVM assembly, as seen below. Is there anything I am missing? I am using Clearlinux as the build machine, and it has clang version 9.0.0.
2013 Dec 12
0
[LLVMdev] AVX code gen
It probably does not pick the right processor architecture. You could try “clang -mavx” or “clang -march=corei7-avx” for ivy-bridge and “clang -march=core-avx2” or “clang -mavx2” for haswell. $ clang -march=core-avx2 -O3 -S -o - test.c .section __TEXT,__text,regular,pure_instructions .globl _f .align 4, 0x90 _f: ## @f .cfi_startproc ## BB#0: ## %entry...
2015 May 04
3
[LLVMdev] AVX2 Cost Table in X86TargetTransformInfo
Thanks, Nadav, for the info. It answers my query :) Yes, it's an integer ADD, and since AVX2 supports 256-bit integer arithmetic, its cost is less than with AVX1. One query though: shouldn't the cost of integer ADD/SUB/MUL (which would be 1) then be explicitly specified in the AVX2 cost table? Right now this entry is missing, and the cost of these operations is taken from BaseTTI (which is
2013 Dec 11
2
[LLVMdev] AVX code gen
Hello - I found this post on the llvm blog: http://blog.llvm.org/2012/12/new-loop-vectorizer.html which makes me think that clang / llvm are capable of generating AVX with packed instructions as well as utilizing the full width of the YMM registers… I have an environment where icc generates these instructions (vmulps %ymm1, %ymm3, %ymm2 for example) but I can not get clang/llvm to generate such
2019 Mar 04
1
Problem with compiling OpenBLAS to work with R
...] >>> > >>> > touch cygopenblas_haswellp-r0.3.5.a > make -j 1 -C >>> test all > make[1]: Entering directory > >>> '/home/erinm/OPB_HOME/xianyi-OpenBLAS-eebc189/test' > >>> gfortran -O2 -Wall -frecursive -m64 -mavx2 -o sblat1 >>> sblat1.o > ../cygopenblas_haswellp-r0.3.5.a >>> -L/usr/lib/gcc/x86_64-pc-msys/7.3.0 >>> > >>> -L/usr/lib/gcc/x86_64-pc-msys/7.3.0/../../../../x86_64-pc-msys/lib/../lib >>> > -L/usr/lib/../lib > >>>...
2019 Feb 28
0
Problem with compiling OpenBLAS to work with R
...e, things go well to a certain point, > and then go bad: > > make > [snip] > > touch cygopenblas_haswellp-r0.3.5.a > make -j 1 -C test all > make[1]: Entering directory > '/home/erinm/OPB_HOME/xianyi-OpenBLAS-eebc189/test' > gfortran -O2 -Wall -frecursive -m64 -mavx2 -o sblat1 sblat1.o > ../cygopenblas_haswellp-r0.3.5.a -L/usr/lib/gcc/x86_64-pc-msys/7.3.0 > -L/usr/lib/gcc/x86_64-pc-msys/7.3.0/../../../../x86_64-pc-msys/lib/../lib > -L/usr/lib/../lib > -L/usr/lib/gcc/x86_64-pc-msys/7.3.0/../../../../x86_64-pc-msys/lib > -L/usr/lib/w32api -lmsy...
2017 Jan 09
5
The most efficient way to implement an integer based power function pow in LLVM
Hi, I want an efficient way to implement the function pow in LLVM instead of invoking the pow() math built-in. The algorithm itself is clear to me, but I am not sure which parts of LLVM I should change to replace the built-in pow with a more efficient implementation. Any comments and feedback are appreciated. Thanks! -- Wei Ding
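The post does not say which algorithm the author has in mind. One common choice for integer exponents is exponentiation by squaring, sketched below in plain C as an illustration only; the name `ipow` and the unsigned 64-bit types are assumptions, and nothing here reflects how LLVM itself lowers pow.

```c
#include <assert.h>
#include <stdint.h>

/* Exponentiation by squaring: O(log exp) multiplications.
   Arithmetic is unsigned, so overflow wraps modulo 2^64. */
uint64_t ipow(uint64_t base, int exp) {
    assert(exp >= 0); /* negative exponents have no integer result */
    uint64_t result = 1;
    while (exp > 0) {
        if (exp & 1)        /* current low bit set: fold base into result */
            result *= base;
        base *= base;       /* square for the next bit */
        exp >>= 1;
    }
    return result;
}
```

For example, `ipow(2, 10)` takes four loop iterations rather than ten multiplications in a naive loop.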
2019 Mar 01
0
Problem with compiling OpenBLAS to work with R
...; > make >> > [snip] >> > >> > touch cygopenblas_haswellp-r0.3.5.a >> > make -j 1 -C test all >> > make[1]: Entering directory >> > '/home/erinm/OPB_HOME/xianyi-OpenBLAS-eebc189/test' >> > gfortran -O2 -Wall -frecursive -m64 -mavx2 -o sblat1 sblat1.o >> > ../cygopenblas_haswellp-r0.3.5.a -L/usr/lib/gcc/x86_64-pc-msys/7.3.0 >> > >> -L/usr/lib/gcc/x86_64-pc-msys/7.3.0/../../../../x86_64-pc-msys/lib/../lib >> > -L/usr/lib/../lib >> > -L/usr/lib/gcc/x86_64-pc-msys/7.3.0/../../../../x86...
2020 May 18
2
Use Galois field New Instructions (GFNI) to combine affine instructions
...this one: https://github.com/aguinet/llvm-project/commit/9ed424cbac0fe3566f801167e2190fad5ad07507#diff-21dd247f3b8aa49860ae8122fe3ea698R22 This gets even more interesting with vectorized code, with an example here: * original C code: https://pastebin.com/4JjF7DPu * LLVM IR after clang -O2 -mgfni -mavx2: https://pastebin.com/Ti0Vm2gj [3] * LLVM IR after ACE (using opt -aggressive-instcombine -S): https://pastebin.com/2zFU7J6g (interesting things happened at line 67) If, like me, you don't have a GFNI-enabled CPU, you can use Intel SDE [4] to run the compiled code. The code of the pass is ava...