thr3ads.net - search: "mavx2"

Displaying 20 results from an estimated 27 matches for "mavx2".

2016 Aug 20

LLVM flags for Vectorization

Hi, I have been analyzing the LLVM vectorizer by running some benchmarks. For vectorization, I have used the following flags: -O3 -ffast-math -mavx2. Am I missing any other flags that would improve vectorizer performance? Thanks, Santanu Das IIT Hyd
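A minimal sketch of the kind of loop where -ffast-math matters, assuming a plain float dot product (the function name `dot` is hypothetical, not from the thread). The running sum is a floating-point reduction, and the loop vectorizer typically only vectorizes such a reduction when reassociation is allowed, because summing in separate SIMD lanes changes the rounding order.

```c
#include <stddef.h>

/* Hypothetical benchmark kernel: a float dot product.
   'sum' is a floating-point reduction; computing it in SIMD lanes
   reorders the additions, so the vectorizer generally needs
   reassociation (e.g. -ffast-math) before it will vectorize it. */
float dot(const float *a, const float *b, size_t n) {
    float sum = 0.0f;
    for (size_t i = 0; i < n; i++)
        sum += a[i] * b[i];
    return sum;
}
```

Compiling with `clang -O3 -ffast-math -mavx2 -Rpass=loop-vectorize -c dot.c` reports when the loop is vectorized; adding `-Rpass-missed=loop-vectorize` and dropping `-ffast-math` is a quick way to see the reduction rejected.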

Vectorization in LLVM x86 backend

2017 Aug 21

Hi all, Recently I compiled the attached .c file using Clang with the "-mavx2 -mfma -m32 -O3" optimization flags. First I used -emit-llvm and inspected the LLVM IR, and there are no vector instructions. Then I generated the assembly output for the file, and in it I can clearly see vector instructions. Neither the SLPVectorizer nor the LoopVectorizer is, however, doing any vecto...

Bug with auto-vectorization of logf

2016 Oct 27

...t array x of size n and output float array f(x), where f is either fabsf or logf. The LLVM 3.9 auto-vectorization docs claim that both functions will be vectorized: http://llvm.org/releases/3.9.0/docs/Vectorizers.html#vectorization-of-function-calls When running with "clang -O3 -march=x86-64 -mavx2 -ffast-math test.c -S -emit-llvm", the function calling fabsf is vectorized while the function calling logf is not. This is with clang 3.9, but I've also confirmed the bug exists back to at least clang 3.7. I've also observed that logf calls break vectorization of more complex loops,...

Vectorization in LLVM x86 backend

2017 Aug 21

...don't think the X86 backend will create any.
~Craig
On Mon, Aug 21, 2017 at 8:49 AM, Charith Mendis via llvm-dev <llvm-dev at lists.llvm.org> wrote:
Hi all, Recently I compiled the attached .c file using Clang with "-mavx2 -mfma -m32 -O3" optimization flags. First I used -emit-llvm and inspected the LLVM IR and there are no vector instructions. Then I got the assembly output of the file in it I can clearly see vector instructions in it. Neither the S...

Problem with compiling OpenBLAS to work with R

2019 Feb 27

...'s great instructions. However, when I run make, things go well to a certain point, and then go bad:
make
[snip]
touch cygopenblas_haswellp-r0.3.5.a
make -j 1 -C test all
make[1]: Entering directory '/home/erinm/OPB_HOME/xianyi-OpenBLAS-eebc189/test'
gfortran -O2 -Wall -frecursive -m64 -mavx2 -o sblat1 sblat1.o ../cygopenblas_haswellp-r0.3.5.a -L/usr/lib/gcc/x86_64-pc-msys/7.3.0 -L/usr/lib/gcc/x86_64-pc-msys/7.3.0/../../../../x86_64-pc-msys/lib/../lib -L/usr/lib/../lib -L/usr/lib/gcc/x86_64-pc-msys/7.3.0/../../../../x86_64-pc-msys/lib -L/usr/lib/w32api -lmsys-2.0 D:/msys64/usr/lib/....

Pattern not recognized as reduction

2018 Feb 12

...separate loop [-Rpass-analysis=loop-vectorize]
for(int i=100;i<1000;i++)
^
The command used to capture the remarks for the code above is: clang -O3 -ffast-math -mavx2 -Rpass=loop-vectorize -Rpass-analysis=loop-vectorize file.c
On the other hand, if we replace sum[0] with x in the same code and make very slight changes, it gets vectorized and prints the following remarks: CODE_2 ...
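The CODE_1 and CODE_2 bodies are elided in this snippet, so the following is only a hypothetical reconstruction of the pattern described: accumulating into `sum[0]` versus into a scalar `x`, over the same `i = 100 .. 999` range. A plausible reason for the difference is that `sum[0]` lives in memory the compiler may not be able to prove is distinct from `a[]`, so it cannot keep the accumulator in a register and recognize a reduction; a local scalar removes that question. Function names are illustrative assumptions.

```c
/* Hypothetical CODE_1 shape: the accumulator lives in memory.
   'sum' might point into 'a', so the compiler cannot simply keep
   sum[0] in a register across iterations. */
void reduce_into_array(const float *a, float *sum) {
    for (int i = 100; i < 1000; i++)
        sum[0] += a[i];
}

/* Hypothetical CODE_2 shape: a local scalar accumulator, which is
   the form the loop vectorizer recognizes as a reduction. */
float reduce_into_scalar(const float *a) {
    float x = 0.0f;
    for (int i = 100; i < 1000; i++)
        x += a[i];
    return x;
}
```

Compiling this with the flags from the post (plus `-c`, since there is no `main`) lets the remarks for the two functions be compared side by side.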

Bug with auto-vectorization of logf

2016 Oct 27

...torization docs claim that both functions will be vectorized: http://llvm.org/releases/3.9.0/docs/Vectorizers.html#vectorization-of-function-calls When running with "clang -O3 -march=x86-64 -mavx2 -ffast-math test.c -S -emit-llvm", the function calling fabsf is vectorized while the function calling logf is not. This is with clang 3.9, but I've also confirmed the bug exists back to at least clang 3.7. I've also observed that logf calls break vectorization of more complex loops,...

Problem with compiling OpenBLAS to work with R

2019 Feb 28

...d then go bad:
make
[snip]
touch cygopenblas_haswellp-r0.3.5.a
make -j 1 -C test all
make[1]: Entering directory '/home/erinm/OPB_HOME/xianyi-OpenBLAS-eebc189/test'
gfortran -O2 -Wall -frecursive -m64 -mavx2 -o sblat1 sblat1.o ../cygopenblas_haswellp-r0.3.5.a -L/usr/lib/gcc/x86_64-pc-msys/7.3.0 -L/usr/lib/gcc/x86_64-pc-msys/7.3.0/../../../../x86_64-pc-msys/lib/../lib -L/usr/lib/../lib -L/usr/lib/gcc/x86_64-pc-msys/7.3.0/../../../../x86_64-pc-msys/lib ...

Restrict global constructors to base ISA

2018 Dec 01

...urce file to the ISA as needed. Then, we guard the higher ISAs at runtime to avoid SIGILLs. It worked well until we added AVX2. For AVX2 we see this as expected: $ CXX=/opt/local/bin/clang++-mp-5.0 make /opt/local/bin/clang++-mp-5.0 ... -c chacha.cpp /opt/local/bin/clang++-mp-5.0 ... -mavx2 -c chacha_avx.cpp /opt/local/bin/clang++-mp-5.0 ... -msse2 -c chacha_simd.cpp ... At runtime we catch a SIGILL due to chacha_avx.cpp as shown below. It looks like global constructors are using instructions from AVX (vxorps), which is beyond what the machine supports. How do we tell Clang...
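One common way to avoid this class of problem, sketched here under the assumption of a GCC- or Clang-compatible compiler on x86 (not necessarily what the thread settled on): instead of building a whole file with `-mavx2`, which lets the compiler use AVX anywhere in that file, including its global constructors, mark only the kernel with `__attribute__((target("avx2")))` and select it at runtime with `__builtin_cpu_supports`. The XOR kernel and all function names are hypothetical placeholders.

```c
#include <stddef.h>

/* Portable fallback, compiled for the file's base ISA. */
static void xor_bytes_base(unsigned char *dst, const unsigned char *src, size_t n) {
    for (size_t i = 0; i < n; i++)
        dst[i] ^= src[i];
}

#if defined(__x86_64__) || defined(__i386__)
/* Only this function may use AVX2 instructions; everything else in
   the file is compiled for the base ISA given on the command line. */
__attribute__((target("avx2")))
static void xor_bytes_avx2(unsigned char *dst, const unsigned char *src, size_t n) {
    for (size_t i = 0; i < n; i++)
        dst[i] ^= src[i];
}
#endif

/* Hypothetical entry point: picks an implementation at runtime, so
   AVX2 code only runs on CPUs that report AVX2 support. */
void xor_bytes(unsigned char *dst, const unsigned char *src, size_t n) {
#if defined(__x86_64__) || defined(__i386__)
    if (__builtin_cpu_supports("avx2")) {
        xor_bytes_avx2(dst, src, n);
        return;
    }
#endif
    xor_bytes_base(dst, src, n);
}
```

The design choice is to move the ISA decision from the build system (per-file flags) to the function level, so nothing outside the guarded function can pick up the higher ISA.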

Bug with auto-vectorization of logf

2016 Oct 28

...on docs claim that both functions will be vectorized: http://llvm.org/releases/3.9.0/docs/Vectorizers.html#vectorization-of-function-calls When running with "clang -O3 -march=x86-64 -mavx2 -ffast-math test.c -S -emit-llvm", the function calling fabsf is vectorized while the function calling logf is not. This is with clang 3.9, but I've also confirmed the bug exists back to at least clang 3.7. I've also observed that logf calls break vectorization of more complex loops,...

[LLVMdev] Embedding cpu and feature strings into IR and enabling switching subtarget on a per function basis

2015 Jan 27

...target lookup to override the function attributes if the corresponding options were specified on the command line. - Fix clang to embed "-target-cpu" and "-target-feature" attributes in the IR. I've tested the changes I made and confirmed that target options such as "-mavx2" don't get dropped during LTO and are honored by backend codegen passes. This is my plan for the remaining tasks: 1. Fix other in-tree targets and other code-gen passes that are still using TargetMachine's subtarget where the per-function subtarget should be used. 2. Fix TargetTrans...

SLP example not being vectorized

2019 Nov 28

Hi, I am new to LLVM, with a particular interest in the optimization area, especially SLP. While working through the tutorial, I ran this example [1] hoping to see SLP vectorization in action, but for some reason I do not see it in the LLVM assembly, as shown below. Is there anything I am missing? I am using Clear Linux as the build machine, with clang version 9.0.0.
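Since the tutorial example [1] is not reproduced here, the following is only a hypothetical stand-in showing the shape of code the SLP vectorizer targets: independent, isomorphic scalar operations on adjacent memory that can be packed into a single vector operation. The function name is an assumption; `restrict` tells the compiler the arrays do not overlap, which removes one obstacle to vectorization.

```c
/* Four independent, identical adds on consecutive elements: a
   candidate for packing into one 4-wide vector add by the SLP
   vectorizer (no loop is involved). */
void add4(float *restrict dst, const float *restrict a, const float *restrict b) {
    dst[0] = a[0] + b[0];
    dst[1] = a[1] + b[1];
    dst[2] = a[2] + b[2];
    dst[3] = a[3] + b[3];
}
```

One thing worth checking in a case like this is that optimization is actually enabled: emitting IR with `-emit-llvm` at the default `-O0` runs no vectorizer at all. With `clang -O3 -Rpass=slp-vectorizer -S add4.c`, clang should print a remark when the SLP vectorizer fires.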

[LLVMdev] AVX code gen

2013 Dec 12

It probably does not pick the right processor architecture. You could try "clang -mavx" or "clang -march=corei7-avx" for Ivy Bridge, and "clang -march=core-avx2" or "clang -mavx2" for Haswell.
$ clang -march=core-avx2 -O3 -S -o - test.c
    .section __TEXT,__text,regular,pure_instructions
    .globl _f
    .align 4, 0x90
_f:                          ## @f
    .cfi_startproc
## BB#0:                     ## %entry
...

[LLVMdev] AVX2 Cost Table in X86TargetTransformInfo

2015 May 04

Thanks, Nadav, for the info. It answers my question :) Yes, it's an integer ADD, and since AVX2 supports 256-bit integer arithmetic, its cost is lower than with AVX1. One question, though: shouldn't the cost of integer ADD/SUB/MUL (which would be 1) then be specified explicitly in the AVX2 cost table? Right now this entry is missing, and the cost of these operations is taken from BaseTTI (which is

[LLVMdev] AVX code gen

2013 Dec 11

Hello - I found this post on the llvm blog: http://blog.llvm.org/2012/12/new-loop-vectorizer.html which makes me think that clang / llvm are capable of generating AVX with packed instructions as well as utilizing the full width of the YMM registers… I have an environment where icc generates these instructions (vmulps %ymm1, %ymm3, %ymm2 for example) but I can not get clang/llvm to generate such
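A minimal loop of the kind that can produce packed `vmulps` on YMM registers, assuming the target is told AVX is available (for example via `-mavx` or `-march=core-avx2`); the function name is hypothetical. Unlike a sum reduction, an element-wise product needs no `-ffast-math`, because each lane's result is computed independently and the rounding of every element is unchanged.

```c
/* Element-wise product: every iteration is independent, so on an AVX
   target the loop vectorizer can use 8-wide float (YMM) multiplies. */
void mul_arrays(float *restrict c, const float *restrict a,
                const float *restrict b, int n) {
    for (int i = 0; i < n; i++)
        c[i] = a[i] * b[i];
}
```

Inspecting the output of `clang -O3 -mavx -S -o - mul.c` should show whether YMM registers are being used.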

Problem with compiling OpenBLAS to work with R

2019 Mar 04

...]
touch cygopenblas_haswellp-r0.3.5.a
make -j 1 -C test all
make[1]: Entering directory '/home/erinm/OPB_HOME/xianyi-OpenBLAS-eebc189/test'
gfortran -O2 -Wall -frecursive -m64 -mavx2 -o sblat1 sblat1.o ../cygopenblas_haswellp-r0.3.5.a -L/usr/lib/gcc/x86_64-pc-msys/7.3.0 -L/usr/lib/gcc/x86_64-pc-msys/7.3.0/../../../../x86_64-pc-msys/lib/../lib -L/usr/lib/../lib ...

Problem with compiling OpenBLAS to work with R

2019 Feb 28

...e, things go well to a certain point, and then go bad:
make
[snip]
touch cygopenblas_haswellp-r0.3.5.a
make -j 1 -C test all
make[1]: Entering directory '/home/erinm/OPB_HOME/xianyi-OpenBLAS-eebc189/test'
gfortran -O2 -Wall -frecursive -m64 -mavx2 -o sblat1 sblat1.o ../cygopenblas_haswellp-r0.3.5.a -L/usr/lib/gcc/x86_64-pc-msys/7.3.0 -L/usr/lib/gcc/x86_64-pc-msys/7.3.0/../../../../x86_64-pc-msys/lib/../lib -L/usr/lib/../lib -L/usr/lib/gcc/x86_64-pc-msys/7.3.0/../../../../x86_64-pc-msys/lib -L/usr/lib/w32api -lmsy...

The most efficient way to implement an integer based power function pow in LLVM

2017 Jan 09

Hi, I want an efficient way to implement an integer pow function in LLVM instead of invoking the pow() math built-in. The algorithm itself is clear to me, but I am not sure which parts of LLVM I should change to replace the built-in pow with a more efficient implementation. Any comments and feedback are appreciated. Thanks! -- Wei Ding
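For reference, the usual algorithm for an integer power is exponentiation by squaring, which needs O(log e) multiplications instead of e - 1. The sketch below assumes unsigned 64-bit arithmetic, so overflow wraps modulo 2^64; the name `ipow` is hypothetical, and the sketch says nothing about where in LLVM such a replacement would be hooked in.

```c
#include <stdint.h>

/* Exponentiation by squaring: walks the exponent's bits from least to
   most significant, multiplying in base^(2^k) whenever bit k is set.
   Unsigned arithmetic, so overflow wraps modulo 2^64. */
uint64_t ipow(uint64_t base, uint32_t exp) {
    uint64_t result = 1;
    while (exp) {
        if (exp & 1u)
            result *= base;
        exp >>= 1;
        base *= base;
    }
    return result;
}
```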

Problem with compiling OpenBLAS to work with R

2019 Mar 01

...make
[snip]
touch cygopenblas_haswellp-r0.3.5.a
make -j 1 -C test all
make[1]: Entering directory '/home/erinm/OPB_HOME/xianyi-OpenBLAS-eebc189/test'
gfortran -O2 -Wall -frecursive -m64 -mavx2 -o sblat1 sblat1.o ../cygopenblas_haswellp-r0.3.5.a -L/usr/lib/gcc/x86_64-pc-msys/7.3.0 -L/usr/lib/gcc/x86_64-pc-msys/7.3.0/../../../../x86_64-pc-msys/lib/../lib -L/usr/lib/../lib -L/usr/lib/gcc/x86_64-pc-msys/7.3.0/../../../../x86...

Use Galois field New Instructions (GFNI) to combine affine instructions

2020 May 18

...this one: https://github.com/aguinet/llvm-project/commit/9ed424cbac0fe3566f801167e2190fad5ad07507#diff-21dd247f3b8aa49860ae8122fe3ea698R22 This gets even more interesting with vectorized code, with an example here: * original C code: https://pastebin.com/4JjF7DPu * LLVM IR after clang -O2 -mgfni -mavx2: https://pastebin.com/Ti0Vm2gj [3] * LLVM IR after ACE (using opt -aggressive-instcombine -S): https://pastebin.com/2zFU7J6g (interesting things happened at line 67) If, like me, you don't have a GFNI-enabled CPU, you can use Intel SDE [4] to run the compiled code. The code of the pass is ava...
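To make the term concrete: an affine transform over GF(2) maps a byte x to A·x ⊕ c, where A is an 8×8 bit matrix and c a constant byte. Compositions of shifts, rotates and XORs by constants stay within that form, which is what makes combining a chain of them into one instruction possible. The scalar reference sketch below uses its own hypothetical convention (`rows[i]` selects the input bits feeding output bit i) and deliberately does not reproduce the exact matrix byte ordering of the GFNI instruction.

```c
#include <stdint.h>

/* Parity (XOR of all bits) of a byte: the GF(2) dot-product step. */
static uint8_t parity8(uint8_t v) {
    v ^= v >> 4;
    v ^= v >> 2;
    v ^= v >> 1;
    return v & 1u;
}

/* y = A*x XOR c over GF(2). Hypothetical convention: rows[i] is the
   mask of input bits that are XORed together to form output bit i. */
uint8_t gf2_affine(const uint8_t rows[8], uint8_t c, uint8_t x) {
    uint8_t y = 0;
    for (int i = 0; i < 8; i++)
        y |= (uint8_t)(parity8(rows[i] & x) << i);
    return y ^ c;
}
```

For example, `x ^ (x << 1)` corresponds to rows[0] = 1 and rows[i] = (1 << i) | (1 << (i - 1)) for i ≥ 1, so a shift and an XOR collapse into a single matrix.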
