Displaying 20 results from an estimated 44 matches for "mavx".
2013 Dec 24
2
[LLVMdev] [cfe-dev] [Proposal] function attribute to reduce emission of vzeroupper instructions
...putting the attributes on the function
decls is the simplest and should cover most of the cases, I think we
can probably start with that and revisit if we still see too many
vzeroupper being inserted. What do you think?
>>
>> Function definitions in a translation unit compiled with -mavx architecture
>> will
>>
>> implicitly have this attribute.
>
> Can you safely do that? What about code that uses inline assembly
> to use legacy SSE instructions in a TU compiled with -mavx, for
> instance?
I think it would take a performance penalty, but I don...
2016 Sep 01
2
enabling interleaved access loop vectorization
...the backend to generate good code for the vector types that produces, specifically, in this case, <12 x i8>. The details are in PR29025.
The upshot of this is that for the original program (with an outer loop around it):
$ bin/clang -m32 -O2 -o ~/llvm/temp/rgb2yik.exe ~/llvm/temp/rgb2yik.c -mavx && time ~/llvm/temp/rgb2yik.exe
real 0m2.229s
user 0m2.224s
$ bin/clang -m32 -O2 -o ~/llvm/temp/rgb2yik.exe ~/llvm/temp/rgb2yik.c -mavx -mllvm -enable-interleaved-mem-accesses && time ~/llvm/temp/rgb2yik.exe
real 0m2.590s
user 0m2.584s
This indicates that we do...
2013 Dec 19
4
[LLVMdev] [Proposal] function attribute to reduce emission of vzeroupper instructions
...efinition does not use any legacy SSE instructions, e.g.,
declare x86_avxcc i32 @foo()
2) (Clang part) Add a function attribute to the clang front-end which specifies
this calling convention, e.g.,
extern int foo() __attribute__((avx));
Function definitions in a translation unit compiled with -mavx architecture will
implicitly have this attribute.
Benefits:
No vzeroupper is needed before calling a function with this avx attribute, e.g.,
extern int foo() __attribute__((avx));
void bar() {
...
// some AVX instruction
...
// no vzeroupper is needed before the call instructi...
2012 Sep 06
1
[LLVMdev] Error running spec benchmark with FMA4 on X86
...FMA3). I
have used -ffp-contract=fast to turn on this option. (Compilation options
and targets pasted below).
>>>>>>>>
clang version 3.2 (trunk 163295:163308) (llvm/trunk 163295)
Target: x86_64-unknown-linux-gnu
Thread model: posix
(Options to clang)
-O3 -march=bdver2 -mavx -mno-fma -mfma4 -ffp-contract=fast -save-temps
<<<<<<<
Note that BDVER2 supports both FMA3 and FMA4. Also the benchmark was
run *successfully* when FMA3 was enabled. Reducing the testcase might take
more time but has anyone noticed this issue?
For those interested, miscompar...
2016 Aug 17
2
enabling interleaved access loop vectorization
Thanks Ayal!
On Wed, Aug 17, 2016 at 2:14 PM, Zaks, Ayal <ayal.zaks at intel.com> wrote:
> Hi Michael,
>
>
>
> Don’t quite have a full reproducer for you yet. You’re welcome to try and
> see what’s happening in 32 bit mode when enabling interleaving for the
> following, based on “https://en.wikipedia.org/wiki/YIQ#From_RGB_to_YIQ”:
>
>
>
> void rgb2yik
2012 Apr 30
0
[LLVMdev] [cfe-dev] [RFC] Encoding Compile Flags into the IR
...cation. Any change to the IR has that property,
so it is better that it stays a somewhat formal process, involving a
discussion of each change and documentation on the language
reference..
A simple example of a problem it would be nice to handle in LTO: A
single file in a project is compiled with -mavx and the project uses
cpuid to decide if it should use that function or not. With LTO
currently we would miss the information that functions from one file
could use AVX.
Faced with this problem and with the above scheme implemented, it is
very likely I would jump to recording -mavx in the IL. A mor...
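The scenario described in this snippet (one piece of code built for AVX and selected at run time after a CPU check) can be sketched in a single file. This is only a minimal sketch under stated assumptions, not the code from the thread: it uses the GCC/Clang target("avx") function attribute in place of a separate translation unit compiled with -mavx, and __builtin_cpu_supports in place of a hand-written cpuid query. The names scale, scale_scalar, scale_avx and cpu_has_avx are hypothetical.

```c
#include <stddef.h>

/* Baseline kernel: safe to run on any CPU. */
static void scale_scalar(float *dst, const float *src, float k, size_t n) {
    for (size_t i = 0; i < n; i++)
        dst[i] = src[i] * k;
}

#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
/* Same loop, but the compiler is allowed to emit AVX instructions
   inside this one function only (assumption: GCC or Clang on x86). */
__attribute__((target("avx")))
static void scale_avx(float *dst, const float *src, float k, size_t n) {
    for (size_t i = 0; i < n; i++)
        dst[i] = src[i] * k;
}

/* Run-time check standing in for a hand-written cpuid query. */
static int cpu_has_avx(void) {
    __builtin_cpu_init();
    return __builtin_cpu_supports("avx");
}
#else
/* Other compilers or targets: always fall back to the baseline kernel. */
#define scale_avx scale_scalar
static int cpu_has_avx(void) { return 0; }
#endif

/* Dispatcher: the AVX version is only entered when the CPU reports AVX. */
void scale(float *dst, const float *src, float k, size_t n) {
    if (cpu_has_avx())
        scale_avx(dst, src, k, n);
    else
        scale_scalar(dst, src, k, n);
}
```

Both kernels compute the same element-wise product, so the result does not depend on which path is taken. The point of the LTO concern above is that this per-function information has to survive into the IR, otherwise the optimizer cannot tell which functions may use AVX.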
2013 Oct 14
2
[LLVMdev] [RFC] CodeGen Context
...be overridden by options
>> specified in the OptionContext. These are used as IR options by the middle
>> end. A suitable API will be set up to make this transparent to the middle end
>> *waves hands wildly*.
>
> I’m not sure I understand. If I do this:
>
> $ clang -mavx -flto -c file1.c
> $ clang -mno-avx file1.o file2.c
>
> that the -mno-avx should win and that file1.c will be compiled without AVX support?
>
> Maybe I missed some earlier discussion, but that seems really wrong to me. We need the front-end settings to be consistent with the code-...
2015 Aug 22
2
Build optimized R : openblas, MKL, ATLAS
I want to build R optimized, with either MKL, OpenBLAS or ATLAS.
My OS: Fedora 22
Hardware: CPU op-mode(s): 32-bit, 64-bit Byte Order: Little Endian
CPU(s): 8 Thread(s) per core: 2 Vendor ID: GenuineIntel Model name:
Intel(R) Core(TM) i7-2600K CPU @ 3.40GHz
I am a little confused when it comes to choosing a method and would like
to hear your experiences. If I am right, I have 3 possibilities:
-
2015 Sep 08
2
Build rpm package for R-MKL
...icc compiler. Build is
going well, but at one point make complains:
make[2]: Entering directory '/home/poisonivy/rpmbuild/BUILD/R-3.2.1/src/unix'
icc -std=c99 -I. -I../../src/include -I../../src/include
-I/usr/local/include -DHAVE_CONFIG_H -fpic -ip -O3
-opt-mem-layout-trans=3 -xHost -mavx -fp-model precise -wd188
-DMKL_ILP64 -qopenmp -parallel
-I/opt/intel/compilers_and_libraries_2016.0.109/linux/mkl/include
-Wl,-z,relro -DR_HOME='"/usr/lib64/R"' \
-o Rscript ./Rscript.c
make[2]: icc: Command not found
Weird, as I had the same line many times before with no complai...
2013 Oct 14
0
[LLVMdev] [RFC] CodeGen Context
...>>> specified in the OptionContext. These are used as IR options by the middle
>>> end. A suitable API will be set up to make this transparent to the middle end
>>> *waves hands wildly*.
>>
>> I’m not sure I understand. If I do this:
>>
>> $ clang -mavx -flto -c file1.c
>> $ clang -mno-avx file1.o file2.c
>>
>> that the -mno-avx should win and that file1.c will be compiled without AVX support?
>>
>> Maybe I missed some earlier discussion, but that seems really wrong to me. We need the front-end settings to be consi...
2015 Sep 07
2
Build rpm package for R-MKL
I want to create a clean .rpm package for R built with MKL and ICC. I
follow Fedora instructions[0] to create the package. As a base, I use
the R-3.2.2.src.rpm.
I am left with this error:
------------------------------------------
installing R info pages ...
updating '/usr/share/info/dir' ...
make[1]: Leaving directory '/home/poisonivy/rpmbuild/BUILD/R-3.2.2/doc/manual'
+ mv
2013 Dec 12
0
[LLVMdev] AVX code gen
It probably does not pick the right processor architecture.
You could try “clang -mavx” or “clang -march=corei7-avx” for ivy-bridge and “clang -march=core-avx2” or “clang -mavx2” for haswell.
$ clang -march=core-avx2 -O3 -S -o - test.c
.section __TEXT,__text,regular,pure_instructions
.globl _f
.align 4, 0x90
_f:...
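One quick way to check whether flags like the ones suggested above took effect is to look at the compiler's predefined macros: GCC and Clang define __AVX__ when AVX code generation is enabled and __AVX2__ when AVX2 is enabled. The helper below is a minimal sketch, and its name compiled_simd_level is hypothetical.

```c
/* Hypothetical helper: reports which AVX level this translation unit
   was compiled for, based on the compiler's predefined macros. */
const char *compiled_simd_level(void) {
#if defined(__AVX2__)
    return "avx2";      /* e.g. built with -mavx2 or -march=core-avx2 */
#elif defined(__AVX__)
    return "avx";       /* e.g. built with -mavx or -march=corei7-avx */
#else
    return "baseline";  /* no AVX code generation enabled */
#endif
}
```

This only reports what the compiler was asked to target. Whether a given loop is actually vectorized with YMM registers still has to be checked in the generated assembly, as in the -S output above.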
2013 Oct 13
0
[LLVMdev] [RFC] CodeGen Context
...fied by function attributes may be overridden by options
> specified in the OptionContext. These are used as IR options by the middle
> end. A suitable API will be set up to make this transparent to the middle end
> *waves hands wildly*.
I’m not sure I understand. If I do this:
$ clang -mavx -flto -c file1.c
$ clang -mno-avx file1.o file2.c
that the -mno-avx should win and that file1.c will be compiled without AVX support?
Maybe I missed some earlier discussion, but that seems really wrong to me. We need the front-end settings to be consistent with the code-gen options. Whatever op...
2012 Apr 30
2
[LLVMdev] [cfe-dev] [RFC] Encoding Compile Flags into the IR
On Apr 29, 2012, at 5:39 PM, Rafael Espíndola wrote:
> On 29 April 2012 18:44, Bill Wendling <wendling at apple.com> wrote:
>> Hi,
>>
>> Link-Time Optimization has a problem. We need to preserve some of the flags with which the modules were compiled so that the semantics of the resulting program are correct. For example, a module compiled with `-msoft-float' should
2019 Dec 18
3
CMake patches
Hi all,
With some downtime it's time for some CMake fixes.
Most critical are the SSE fixes to avoid the crashes described in 154 and 132 on GitHub. Patch 5 should address this and also adds the APPROX-FLOAT option.
Hopefully this can give some gains for those of us running on Windows servers.
I went through the pull request and picked out a few that will ease up integration for
2018 Jun 29
2
[RFC][VECLIB] how should we legalize VECLIB calls?
Illustrative Example:
clang -fveclib=SVML -O3 svml.c -mavx
#include <math.h>
void foo(double *a, int N){
  int i;
#pragma clang loop vectorize_width(8)
  for (i=0;i<N;i++){
    a[i] = sin(i);
  }
}
Currently, this results in a call to <8 x double> __svml_sin8(<8 x double>) after the vectorizer.
This is 8-element SVML sin() called wit...
2013 Dec 11
2
[LLVMdev] AVX code gen
Hello -
I found this post on the llvm blog: http://blog.llvm.org/2012/12/new-loop-vectorizer.html which makes me think that clang / llvm are capable of generating AVX with packed instructions as well as utilizing the full width of the YMM registers… I have an environment where icc generates these instructions (vmulps %ymm1, %ymm3, %ymm2 for example) but I can not get clang/llvm to generate such
2015 Sep 04
0
Build R with MKL and ICC
...orrect, I would appreciate some advice and hints.
This is how we build R with the Intel compilers and MKL on CentOS 6.x,
with different versions of R (latest version: 3.2.1) and Intel compilers
(latest version: 2015.3) on Intel SandyBridge CPUs:
fast="-ip -O3 -opt-mem-layout-trans=3 -xHost -mavx"
export CC="icc"
export CFLAGS="$fast -wd188 -fp-model precise"
export F77="ifort"
export FFLAGS="$fast -fp-model precise"
export CXX="icpc"
export CXXFLAGS="$fast -fp-model precise"
export FC="ifort"
export FCFLAGS="$f...
2012 Oct 19
0
--enable-R-shlib and external BLAS/LAPACK libraries
...ure ignore any specified external LAPACK library and use the
internal one instead. I asked why, and was told it was intentional.
Now, with R 2.15.1, I see that it at least appears that this is no
longer the case. I've run configure like this:
fast="-ip -O3 -opt-mem-layout-trans=3 -xHost -mavx"
export CC="icc"
export CFLAGS="$fast -wd188 -fp-model precise"
export F77="ifort"
export FFLAGS="$fast -fp-model precise"
export CXX="icpc"
export CXXFLAGS="$fast -fp-model precise"
export FC="ifort"
export FCFLAGS="$f...