thr3ads.net - search: "avx512f"

Displaying 20 results from an estimated 33 matches for "avx512f".

r267690 - [Clang][BuiltIn][AVX512]Adding intrinsics for vmovntdqa vmovntpd vmovntps instruction set

2016 May 01

...: http://llvm.org/viewvc/llvm-project?rev=267690&view=rev Log: [Clang][BuiltIn][AVX512]Adding intrinsics for vmovntdqa vmovntpd vmovntps instruction set Differential Revision: http://reviews.llvm.org/D19529 Modified: cfe/trunk/include/clang/Basic/BuiltinsX86.def cfe/trunk/lib/Headers/avx512fintrin.h cfe/trunk/test/CodeGen/avx512f-builtins.c Modified: cfe/trunk/include/clang/Basic/BuiltinsX86.def URL: http://llvm.org/viewvc/llvm-project/cfe/trunk/include/clang/Basic/BuiltinsX86.def?rev=267690&r1=267689&r2=267690&view=diff ================================================...

r267690 - [Clang][BuiltIn][AVX512]Adding intrinsics for vmovntdqa vmovntpd vmovntps instruction set

2016 May 15

RFC: code size reduction in X86 by replacing EVEX with VEX encoding

2016 Nov 23

Hi All. This is an RFC for a proposed target specific X86 optimization for reducing code size in the encoding of AVX-512 instructions when possible. When the AVX512F instruction set was introduced in X86 it included additional 32 registers of 512bit size each ZMM0 - ZMM31, as well as additional 16 XMM registers XMM16-XMM31 and 16 YMM registers YMM16-YMM31. In order to encode the new registers of 16-31 and the additional instructions, a new encoding prefix calle...

Building LLVM 3.8 and later with 2016 Intel C++ compiler

2016 May 09

..."declval" iterator_range<decltype(begin(std::declval<T>()))> drop_begin(T &&t, int n) { An isolated std::declval example shows that this compiler doesn't support it. I can't really roll back to an earlier LLVM version since I need the latest LLVM backends (avx512f etc). I also can't really use GCC to build LLVM since I need to build other parts of the project with Intel and linking would then be a problem. Does this std::declval piece of code happen to be in a place which can be turned on/off with a configure/cmake option? Thanks, Frank

RFC: code size reduction in X86 by replacing EVEX with VEX encoding

2016 Nov 23

...: code size reduction in X86 by replacing EVEX > with VEX encoding > > > > Hi All. > > > > This is an RFC for a proposed target specific X86 optimization for > reducing code size in the encoding of AVX-512 instructions when possible. > > > > When the AVX512F instruction set was introduced in X86 it included > additional 32 registers of 512bit size each ZMM0 - ZMM31, as well as > additional 16 XMM registers XMM16-XMM31 and 16 YMM registers YMM16-YMM31. > > In order to encode the new registers of 16-31 and the additional > instructions, a...

LLVM and Xeon Skylake v5

2017 May 08

Thank you. I'm letting it auto detect by setting the target using getProcessTarget. I disabled avx512 support by passing -avx512f (and the other variants) to setMAttrs on EngineBuilder. I can see refs to avx512 in X86.td. It's the exact same executable running on Kabylake. What does the Cannot select: specifically mean? Is there some table that doesn't have a definition for a key in it that I would need to patch up?...

RFC: code size reduction in X86 by replacing EVEX with VEX encoding

2016 Nov 24

...nt: Wednesday, November 23, 2016 5:50:42 AM Subject: [llvm-dev] RFC: code size reduction in X86 by replacing EVEX with VEX encoding Hi All. This is an RFC for a proposed target specific X86 optimization for reducing code size in the encoding of AVX-512 instructions when possible. When the AVX512F instruction set was introduced in X86 it included additional 32 registers of 512bit size each ZMM0 - ZMM31, as well as additional 16 XMM registers XMM16-XMM31 and 16 YMM registers YMM16-YMM31. In order to encode the new registers of 16-31 and the additional instructions, a new encoding prefix calle...

invalid code generated on Windows x86_64 using skylake-specific features

2017 Sep 30

...strdup(features.getString().c_str()); } On this windows laptop that I am testing on, I get these values: target_specific_cpu_args: skylake target_specific_features: +sse2,+cx16,-tbm,-avx512ifma,-avx512dq,-fma4,+prfchw,+bmi2,+xsavec,+fsgsbase,+popcnt,+aes,+xsaves,-avx512er,-avx512vpopcntdq,-clwb,-avx512f,-clzero,-pku,+mmx,-lwp,-xop,+rdseed,-sse4a,-avx512bw,+clflushopt,+xsave,-avx512vl,-avx512cd,+avx,-rtm,+fma,+bmi,+rdrnd,-mwaitx,+sse4.1,+sse4.2,+avx2,+sse,+lzcnt,+pclmul,-prefetchwt1,+f16c,+ssse3,+sgx,+cmov,-avx512vbmi,+movbe,+xsaveopt,-sha,+adx,-avx512pf,+sse3 It successfully creates a binary, bu...

RFC: code size reduction in X86 by replacing EVEX with VEX encoding

2016 Nov 28

...ent: Wednesday, November 23, 2016 5:50:42 AM Subject: [llvm-dev] RFC: code size reduction in X86 by replacing EVEX with VEX encoding Hi All. This is an RFC for a proposed target specific X86 optimization for reducing code size in the encoding of AVX-512 instructions when possible. When the AVX512F instruction set was introduced in X86 it included additional 32 registers of 512bit size each ZMM0 - ZMM31, as well as additional 16 XMM registers XMM16-XMM31 and 16 YMM registers YMM16-YMM31. In order to encode the new registers of 16-31 and the additional instructions, a new encoding prefix calle...

LLVM and Xeon Skylake v5

2017 May 08

...rection: getProcessTriple not getProcessTarget. > > On 8 May 2017, at 17:55, Andy Schneider via llvm-dev < > llvm-dev at lists.llvm.org> wrote: > > Thank you. I'm letting it auto detect by setting the target using > getProcessTarget. I disabled avx512 support by passing -avx512f (and the > other variants) to setMAttrs on EngineBuilder. I can see refs to avx512 in > X86.td. It's the exact same executable running on Kabylake. > > What does the Cannot select: specifically mean? Is there some table that > doesn't have a definition for a key in it that I...

invalid code generated on Windows x86_64 using skylake-specific features

2017 Oct 01

...> > On this windows laptop that I am testing on, I get these values: > > target_specific_cpu_args: skylake > > target_specific_features: +sse2,+cx16,-tbm,-avx512ifma,- > avx512dq,-fma4,+prfchw,+bmi2,+xsavec,+fsgsbase,+popcnt,+aes, > +xsaves,-avx512er,-avx512vpopcntdq,-clwb,-avx512f,-clzero,-pku,+mmx,- > lwp,-xop,+rdseed,-sse4a,-avx512bw,+clflushopt,+xsave,- > avx512vl,-avx512cd,+avx,-rtm,+fma,+bmi,+rdrnd,-mwaitx,+sse4. > 1,+sse4.2,+avx2,+sse,+lzcnt,+pclmul,-prefetchwt1,+f16c,+ > ssse3,+sgx,+cmov,-avx512vbmi,+movbe,+xsaveopt,-sha,+adx,-avx512pf,+sse3 > > >...

avx512 JIT backend generates wrong code on <4 x float>

2016 Jun 29

...is wrong. When I execute the exploit program on an Intel KNL the following output is produced: CPU name = knl -sse4a,-avx512bw,cx16,-tbm,xsave,-fma4,-avx512vl,prfchw,bmi2,adx,-xsavec,fsgsbase,avx,avx512cd,avx512pf,-rtm,popcnt,fma,bmi,aes,rdrnd,-xsaves,sse4.1,sse4.2,avx2,avx512er,sse,lzcnt,pclmul,avx512f,f16c,ssse3,mmx,-pku,cmov,-xop,rdseed,movbe,-hle,xsaveopt,-sha,sse2,sse3,-avx512dq, Assembly: .text .file "module_KFxOBX_i4_after.ll" .globl adjmul .align 16, 0x90 .type adjmul,@function adjmul: .cfi_startproc leaq (%rdi,%r8), %rdx...

unable to emit vectorized code in LLVM IR

2017 Aug 17

...-math"="false" "no-signed-zeros-fp-math"="false" "no-trapping-math"="false" "stack-protector-buffer-size"="8" "target-cpu"="knl" "target-features"="+adx,+aes,+avx,+avx2,+avx512cd,+avx512er,+avx512f,+avx512pf,+bmi,+bmi2,+cx16,+f16c,+fma,+fsgsbase,+fxsr,+lzcnt,+mmx,+movbe,+pclmul,+popcnt,+prefetchwt1,+rdrnd,+rdseed,+rtm,+sse,+sse2,+sse3,+sse4.1,+sse4.2,+ssse3,+x87,+xsave,+xsaveopt" "unsafe-fp-math"="false" "use-soft-float"="false" } !llvm.ident = !{...

avx512 JIT backend generates wrong code on <4 x float>

2016 Jun 29

...e the exploit program on an Intel KNL the following > output > is produced: > > CPU name = knl > -sse4a,-avx512bw,cx16,-tbm,xsave,-fma4,-avx512vl,prfchw,bmi2,adx,-xsavec,fsgsbase,avx,avx512cd,avx512pf,-rtm,popcnt,fma,bmi,aes,rdrnd,-xsaves,sse4.1,sse4.2,avx2,avx512er,sse,lzcnt,pclmul,avx512f,f16c,ssse3,mmx,-pku,cmov,-xop,rdseed,movbe,-hle,xsaveopt,-sha,sse2,sse3,-avx512dq, > Assembly: > .text > .file "module_KFxOBX_i4_after.ll" > .globl adjmul > .align 16, 0x90 > .type adjmul,@function > adjmul: > .cfi_start...

AVX 512 Assembly Code Generation issues

2017 Jun 21

...a[i] = b[i] + c[i]; > } > } > > i first generated its .ll file via clang > > clang -S -emit-llvm test.c -o test.ll > > then i optimized it; > > opt -S -O3 test.ll -o test_o3.ll > > then i used llc for code generation > > llc -mcpu=skylake-avx512 -mattr=+avx512f test_o3.ll -o test_o3.s > > llc -mcpu=knl -mattr=+avx512f test_o3.ll -o test_o3.s > > > here is my generated code; > > > > .text > .file "filer_o3.ll" > .globl foo > .p2align 4, 0x90 > .type foo,@function > foo:...

avx512 JIT backend generates wrong code on <4 x float>

2016 Jun 30

...am on an Intel KNL the following >> output >> is produced: >> >> CPU name = knl >> -sse4a,-avx512bw,cx16,-tbm,xsave,-fma4,-avx512vl,prfchw,bmi2,adx,-xsavec,fsgsbase,avx,avx512cd,avx512pf,-rtm,popcnt,fma,bmi,aes,rdrnd,-xsaves,sse4.1,sse4.2,avx2,avx512er,sse,lzcnt,pclmul,avx512f,f16c,ssse3,mmx,-pku,cmov,-xop,rdseed,movbe,-hle,xsaveopt,-sha,sse2,sse3,-avx512dq, >> Assembly: >> .text >> .file "module_KFxOBX_i4_after.ll" >> .globl adjmul >> .align 16, 0x90 >> .type adjmul,@function >...

64 bit mask in x86vshuffle instruction

2018 Apr 10

...uffleAsRotate(DL, MVT::v64i32, V1, V2, Mask, Subtarget, DAG)) return Rotate; // Assume that a single SHUFPS is faster than using a permv shuffle. // If some CPU is harmed by the domain switch, we can fix it in a later pass. // If we have AVX512F support, we can use VEXPAND. if (SDValue V = lowerVectorShuffleToEXPAND(DL, MVT::v64i32, Zeroable, Mask, V1, V2, DAG, Subtarget)) return V; return lowerVectorShuffleWithPERMV(DL, MVT::v64i32, Mask, V1, V2, DAG); } static SDValue lowerV32I64Vec...

unable to emit vectorized code in LLVM IR

2017 Aug 17

..."no-signed-zeros-fp-math"="false" "no-trapping-math"="false" >> "stack-protector-buffer-size"="8" "target-cpu"="knl" >> "target-features"="+adx,+aes,+avx,+avx2,+avx512cd,+avx512er, >> +avx512f,+avx512pf,+bmi,+bmi2,+cx16,+f16c,+fma,+fsgsbase,+fx >> sr,+lzcnt,+mmx,+movbe,+pclmul,+popcnt,+prefetchwt1,+rdrnd,+ >> rdseed,+rtm,+sse,+sse2,+sse3,+sse4.1,+sse4.2,+ssse3,+x87,+xsave,+xsaveopt" >> "unsafe-fp-math"="false" "use-soft-float"="fa...

AVX512 instruction generated when JIT compiling for an avx2 architecture

2016 Jun 23

...section ".note.GNU-stack","",@progbits end assembly! I am not sure what instruction is the offending one, but the 'vmovdqa32' looks avx512. I wasn't able to reproduce this with 'opt' - it generates avx2 instructions. And when I force it to use e.g. avx512f it rejects the CPU type. Any ideas? Frank

invalid code generated on Windows x86_64 using skylake-specific features

2017 Oct 03

...ing on, I get these values: >>> >>> target_specific_cpu_args: skylake >>> >>> target_specific_features: +sse2,+cx16,-tbm,-avx512ifma,- >>> avx512dq,-fma4,+prfchw,+bmi2,+xsavec,+fsgsbase,+popcnt,+aes, >>> +xsaves,-avx512er,-avx512vpopcntdq,-clwb,-avx512f,-clzero,-p >>> ku,+mmx,-lwp,-xop,+rdseed,-sse4a,-avx512bw,+clflushopt,+xsav >>> e,-avx512vl,-avx512cd,+avx,-rtm,+fma,+bmi,+rdrnd,-mwaitx,+ >>> sse4.1,+sse4.2,+avx2,+sse,+lzcnt,+pclmul,-prefetchwt1,+ >>> f16c,+ssse3,+sgx,+cmov,-avx512vbmi,+movbe,+xsaveopt,- >...