search for: avx512

Displaying 20 results from an estimated 167 matches for "avx512".

2016 May 15
2
r267690 - [Clang][BuiltIn][AVX512]Adding intrinsics for vmovntdqa vmovntpd vmovntps instruction set
...Michael Zuckerman From: Eric Christopher [mailto:echristo at gmail.com] Sent: Sunday, May 01, 2016 19:54 To: Zuckerman, Michael <michael.zuckerman at intel.com>; Craig Topper <craig.topper at gmail.com> Cc: llvm-dev at lists.llvm.org Subject: Re: [llvm-dev] r267690 - [Clang][BuiltIn][AVX512]Adding intrinsics for vmovntdqa vmovntpd vmovntps instruction set Why? On Sun, May 1, 2016, 6:04 AM Zuckerman, Michael via llvm-dev <llvm-dev at lists.llvm.org> wrote: Hi, For now no. But I will add these three builtins to CGBuiltin.cpp. If you want, y...
2016 May 01
2
r267690 - [Clang][BuiltIn][AVX512]Adding intrinsics for vmovntdqa vmovntpd vmovntps instruction set
...uiltins to CGBuiltin.cpp. If you want, you can be a reviewer of this change. Regards Michael Zuckerman From: Craig Topper [mailto:craig.topper at gmail.com] Sent: Thursday, April 28, 2016 04:53 To: Zuckerman, Michael <michael.zuckerman at intel.com> Subject: Re: r267690 - [Clang][BuiltIn][AVX512]Adding intrinsics for vmovntdqa vmovntpd vmovntps instruction set Can we use native IR for the stores the way the 128-bit and 256-bit equivalents do? On Wed, Apr 27, 2016 at 3:44 AM, Michael Zuckerman via cfe-commits <cfe-commits at lists.llvm.org>...
2016 Jun 23
2
AVX512 instruction generated when JIT compiling for an avx2 architecture
...what value "getHostCPUName" returned? getHostCPUName() = skylake > > On Thu, Jun 23, 2016 at 9:53 AM, Frank Winter via llvm-dev > <llvm-dev at lists.llvm.org> wrote: > > With LLVM 3.8 the JIT compiler engine generates an AVX512 > instruction although I target an 'avx2' CPU (intel Core I7). > I just downloaded the most recent 3.8 and still it happens. > > It happens with this input module: > > > target datalayout = "e-m:e-i64:64-f80:128-n8:16:32:64-S128" > >...
2013 Oct 30
0
[LLVMdev] [AVX512] Inconsistent mask types for intrinsics?
Hey guys, There seems to be an inconsistency between mask operand types for the AVX512 intrinsics. The mask instruction intrinsics expect a v16i1 for the mask operands: > def int_x86_kadd_v16i1 : GCCBuiltin<"__builtin_ia32_kaddw">, > Intrinsic<[llvm_v16i1_ty], [llvm_v16i1_ty, llvm_v16i1_ty], > [IntrNoMem]>; But o...
2017 May 08
2
LLVM and Xeon Skylake v5
Thank you. I'm letting it auto detect by setting the target using getProcessTarget. I disabled avx512 support by passing -avx512f (and the other variants) to setMAttrs on EngineBuilder. I can see refs to avx512 in X86.td. It's the exact same executable running on Kabylake. What does the Cannot select: specifically mean? Is there some table that doesn't have a definition for a key in it th...
2016 Nov 10
2
X86 backend code ownership
Thanks for the support Nadav, Zvi, Chandler, Renato, and anyone else I missed. Quetin, to maybe address your concerns. My focus lately has been fixing inconsistency in instruction selection behavior between the older AVX instruction encodings and the new AVX512 encodings. I've also been trying to fix cases where concepts haven't been extended to wider vectors yet. For instance, the instcombine handling of x86 shift intrinsics. I've also been trying to remove AVX512 intrinsics for things that can be represented with native IR or where we can us...
2016 Jun 23
2
AVX512 instruction generated when JIT compiling for an avx2 architecture
With LLVM 3.8 the JIT compiler engine generates an AVX512 instruction although I target an 'avx2' CPU (intel Core I7). I just downloaded the most recent 3.8 and still it happens. It happens with this input module: target datalayout = "e-m:e-i64:64-f80:128-n8:16:32:64-S128" define void @module_cFFEMJ(i64 %lo, i64 %hi, i64 %myId, i1 %...
2016 Jun 30
1
avx512 JIT backend generates wrong code on <4 x float>
...ently. > > -Hal > > ----- Original Message ----- >> From: "Frank Winter via llvm-dev" <llvm-dev at lists.llvm.org> >> To: "LLVM Dev" <llvm-dev at lists.llvm.org> >> Sent: Wednesday, June 29, 2016 2:41:39 PM >> Subject: [llvm-dev] avx512 JIT backend generates wrong code on <4 x float> >> >> Hi! >> >> When compiling the attached module with the JIT engine on an Intel >> KNL I >> see wrong code getting emitted. I attach a complete exploit program >> which shows the bug in LLVM 3.8. It l...
2020 Sep 05
2
Possible AVX512 codegen bug in LLVM 10.0.1?
Hey LLVMDev, Perhaps I'm missing something, but I think I've stumbled across a codegen bug in LLVM 10.0.1 related to AVX512. I've attached a small LLVM IR testcase and generated x86_64 assembly file that shows the bug. The test case is small, but not quite minimal, mostly because of driver code included in the test case so one can compile and run the program. The program does a simple vectorizable computation two...
2016 Jun 29
0
avx512 JIT backend generates wrong code on <4 x float>
...evelopment has been very active recently. -Hal ----- Original Message ----- > From: "Frank Winter via llvm-dev" <llvm-dev at lists.llvm.org> > To: "LLVM Dev" <llvm-dev at lists.llvm.org> > Sent: Wednesday, June 29, 2016 2:41:39 PM > Subject: [llvm-dev] avx512 JIT backend generates wrong code on <4 x float> > > Hi! > > When compiling the attached module with the JIT engine on an Intel > KNL I > see wrong code getting emitted. I attach a complete exploit program > which shows the bug in LLVM 3.8. It loads and JIT compiles the...
2017 May 08
2
LLVM and Xeon Skylake v5
...ts.com> wrote: > Correction: getProcessTriple not getProcessTarget. > > On 8 May 2017, at 17:55, Andy Schneider via llvm-dev < > llvm-dev at lists.llvm.org> wrote: > > Thank you. I'm letting it auto detect by setting the target using > getProcessTarget. I disabled avx512 support by passing -avx512f (and the > other variants) to setMAttrs on EngineBuilder. I can see refs to avx512 in > X86.td. It's the exact same executable running on Kabylake. > > What does the Cannot select: specifically mean? Is there some table that > doesn't have a defini...
2016 Nov 10
2
X86 backend code ownership
...Thanks for the support Nadav, Zvi, Chandler, Renato, and anyone else I >> missed. >> >> Quetin, to maybe address your concerns. My focus lately has been fixing >> inconsistency in instruction selection behavior between the older AVX >> instruction encodings and the new AVX512 encodings. I've also been trying >> to fix cases where concepts haven't been extended to wider vectors yet. For >> instance, the instcombine handling of x86 shift intrinsics. I've also been >> trying to remove AVX512 intrinsics for things that can be represented with...
2017 Nov 14
2
RFC: [X86] Introducing command line options to prefer narrower vector instructions even when wider instructions are available
...>>>> If skylake is that bad at AVX2 >>>> >>>> >>>> I don't think this says anything negative about AVX2, but AVX-512. >>>> >>> >> Right. I think we're at AVX/AVX2 is "bad" on Haswell/Broadwell and AVX512 >> is "bad" on Skylake. At least in the "random autovectorization spread out" >> aspect. >> >> >>> >>>> >>>> it belongs in -mcpu / -march IMO. >>>> >>>> >>>> No. We'd still want...
2017 May 08
2
LLVM and Xeon Skylake v5
Hi, I have a JIT compiler using the legacy JIT on LLVM 3.5 that, when run on the Xeon v5 Skylakes produces "Cannot select: intrinsic %llvm.x86.sse41.round.sd". Note, this does not occur on i7 Kabylakes. To get this far I had to disable AVX512 code gen. Upgrading the system I am looking at from 3.5 to a later version is a big job that I'd prefer not to have on my critical path. Does anyone have any tips on where I would look to debug this sort of issue? I'm new to LLVM. Thanks Andy
2017 Jun 21
2
AVX 512 Assembly Code Generation issues
When I generate code with 72 loop iterations, the compiler generates code using avx512 zmm operations 4 times (16x4=64), and the remaining 8 iterations are handled by routine mov operations with the EAX register. Wouldn't it be better if it used ymm for the remaining 8 iterations, as it does when the iteration count is between 8 and 15? Same for xmm and so on. Please correct me if I am wrong....
2017 Nov 13
3
RFC: [X86] Introducing command line options to prefer narrower vector instructions even when wider instructions are available
...anything negative about AVX2, but AVX-512. > > it belongs in -mcpu / -march IMO. > > > No. We'd still want to enable the architectural features for vector > intrinsics and the like. > I took this to mean that the feature should be enabled by default for -march=skylake-avx512. > > > Based on the current performance data we're seeing, we think we need to > ultimately default skylake-avx512 to -mprefer-vector-width=256. > > > Craig, is this for both integer and floating-point code? > I believe so, but I'll try to get confirmation from t...
2017 Nov 13
2
RFC: [X86] Introducing command line options to prefer narrower vector instructions even when wider instructions are available
.../11/2017 09:52 PM, UE US via llvm-dev wrote: >> If skylake is that bad at AVX2 > > I don't think this says anything negative about AVX2, but AVX-512. > > > Right. I think we're at AVX/AVX2 is "bad" on Haswell/Broadwell and > AVX512 is "bad" on Skylake. At least in the "random autovectorization > spread out" aspect. > > > >> it belongs in -mcpu / -march IMO. > > No. We'd still want to enable the architectural features for > vector intrinsics and th...
2013 Dec 13
2
[LLVMdev] broken LLVM-MC?
Hi, It seems LLVM-MC is broken with Avx512? $ echo "vinserti32x4 \$1, %xmm21, %zmm5, %zmm17"|./Release+Asserts/bin/llvm-mc -assemble -arch=x86-64 -show-encoding -x86-asm-syntax=att .text vinserti32x4 $1, %xmm21, %zmm5, %zmm17 # encoding: [0x62,0xa3,0x55,0x48,0x38,0xcd,0x01] $ echo "0x62,0xa3,0x55,0x4...
2019 May 02
2
llvm is illegally vectorizing with a recurrence on skylake
...=digits-1;++d) { int *t; one(in,in+n,shift[d],indicies[d],dst); t=in,in=dst,dst=t; } #ifndef NO_TWO two(in,in+n,shift[d],indicies[d],idx); #endif } /*****************************************************************/ clang -S -O2 -Rpass=loop-vectorize small.c -march=skylake-avx512 small.c:6:3: remark: vectorized loop (vectorization width: 16, interleaved count: 1) [-Rpass=loop-vectorize] do { ^ I believe the problem to be an issue with dependency information getting destroyed because if you remove the two() function (or compile one() on its own, or prevent inlining of on...
2016 Jun 29
2
avx512 JIT backend generates wrong code on <4 x float>
...prints the assembler. I stumbled on this since the result of an actual calculation was wrong. So, it's not only the text version of the assembler also the machine assembler is wrong. When I execute the exploit program on an Intel KNL the following output is produced: CPU name = knl -sse4a,-avx512bw,cx16,-tbm,xsave,-fma4,-avx512vl,prfchw,bmi2,adx,-xsavec,fsgsbase,avx,avx512cd,avx512pf,-rtm,popcnt,fma,bmi,aes,rdrnd,-xsaves,sse4.1,sse4.2,avx2,avx512er,sse,lzcnt,pclmul,avx512f,f16c,ssse3,mmx,-pku,cmov,-xop,rdseed,movbe,-hle,xsaveopt,-sha,sse2,sse3,-avx512dq, Assembly: .text .file ...