similar to: Possible AVX512 codegen bug in LLVM 10.0.1?

Displaying 20 results from an estimated 4000 matches similar to: "Possible AVX512 codegen bug in LLVM 10.0.1?"

2017 Jun 21
2
AVX 512 Assembly Code Generation issues
When I generate code for a loop with 72 iterations, the compiler uses AVX-512 zmm operations 4 times (16x4=64) and handles the remaining 8 iterations with ordinary mov operations through the EAX register. Wouldn't it be better to use ymm for the remaining 8 iterations, as it does when the iteration count is between 8 and 15? The same goes for xmm, and so on. Please correct me if I am wrong. Thank
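The arithmetic in the question above can be sketched as follows. This is only an illustration of the poster's suggestion, not how LLVM's vectorizer actually chooses epilogue widths. The function name `split_trip_count` is hypothetical, and the lane counts assume 32-bit elements (16 per zmm, 8 per ymm, 4 per xmm), which is inferred from the "16x4=64" in the message.

```python
def split_trip_count(trip_count, lane_counts=(16, 8, 4)):
    """Greedily split a loop trip count into vector chunks, widest first.

    lane_counts assumes 32-bit elements: 16 lanes per zmm, 8 per ymm,
    4 per xmm (an assumption for illustration, not taken from LLVM).
    Returns (list of (lanes, chunk_count) pairs, scalar remainder).
    """
    plan = []
    remaining = trip_count
    for lanes in lane_counts:
        # How many full vectors of this width fit in what is left?
        chunks, remaining = divmod(remaining, lanes)
        if chunks:
            plan.append((lanes, chunks))
    return plan, remaining


# Behaviour described in the message: zmm only, 8 iterations left for scalar code.
zmm_only = split_trip_count(72, lane_counts=(16,))

# Behaviour the poster asks for: the 8 leftover iterations go to one ymm operation.
with_narrower_tail = split_trip_count(72)
```

Here `zmm_only` leaves a scalar remainder of 8, while `with_narrower_tail` covers all 72 iterations with four 16-lane chunks plus one 8-lane chunk.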
2016 Jun 13
2
Loop vectorizer Queires
Hello, I have a few issues vectorizing loops using Clang 3.8. Would it be OK if I share some of my findings and queries here? Meanwhile, could I please know whether LLVM supports autovectorized MIC instructions for Xeon Phi? If so, could you please tell me which flags to use? I am Jumana, a master's student in Embedded Systems working as a graduate research intern with Intel. For my thesis, I am working
2016 May 01
2
r267690 - [Clang][BuiltIn][AVX512]Adding intrinsics for vmovntdqa vmovntpd vmovntps instruction set
Hi, For now, no. But I will add these three builtins to CGBuiltin.cpp. If you want, you can be a reviewer of this change. Regards Michael Zuckerman From: Craig Topper [mailto:craig.topper at gmail.com] Sent: Thursday, April 28, 2016 04:53 To: Zuckerman, Michael <michael.zuckerman at intel.com> Subject: Re: r267690 - [Clang][BuiltIn][AVX512]Adding intrinsics for vmovntdqa vmovntpd vmovntps
2016 May 15
2
r267690 - [Clang][BuiltIn][AVX512]Adding intrinsics for vmovntdqa vmovntpd vmovntps instruction set
Hi, In the future, we will address this issue. Regards Michael Zuckerman From: Eric Christopher [mailto:echristo at gmail.com] Sent: Sunday, May 01, 2016 19:54 To: Zuckerman, Michael <michael.zuckerman at intel.com>; Craig Topper <craig.topper at gmail.com> Cc: llvm-dev at lists.llvm.org Subject: Re: [llvm-dev] r267690 - [Clang][BuiltIn][AVX512]Adding intrinsics for vmovntdqa
2016 Jun 23
2
AVX512 instruction generated when JIT compiling for an avx2 architecture
On 06/23/2016 12:56 PM, Craig Topper wrote: > Can you check what value "getHostCPUName" returned? getHostCPUName() = skylake > > On Thu, Jun 23, 2016 at 9:53 AM, Frank Winter via llvm-dev > <llvm-dev at lists.llvm.org <mailto:llvm-dev at lists.llvm.org>> wrote: > > With LLVM 3.8 the JIT compiler engine generates an AVX512 > instruction although I
2017 Jan 24
7
[X86][AVX512] RFC: make i1 illegal in the Codegen
Hi All, AVX-512 introduced the K mask registers and masked operations which make a natural choice for legalizing vectors of i1's. For example, define <8 x i32> @foo(<8 x i32>%a, <8 x i32*> %p) { %r = call <8 x i32> @llvm.masked.gather.v8i32(<8 x i32*> %p, i32 4, <8 x i1> <i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true>,
2016 Jun 30
1
avx512 JIT backend generates wrong code on <4 x float>
Hi Hal! Thanks, but unfortunately it didn't help. The exact same assembler instructions are generated for both 3.8 (yesterday) and trunk (from today). So, this really looks like a bug. Best, Frank On 06/29/2016 03:48 PM, Hal Finkel wrote: > Hi Frank, > > I recommend trying trunk LLVM. AVX-512 development has been very active recently. > > -Hal > > ----- Original
2016 Jun 23
2
AVX512 instruction generated when JIT compiling for an avx2 architecture
With LLVM 3.8 the JIT compiler engine generates an AVX512 instruction although I target an 'avx2' CPU (Intel Core i7). I just downloaded the most recent 3.8 and it still happens. It happens with this input module: target datalayout = "e-m:e-i64:64-f80:128-n8:16:32:64-S128" define void @module_cFFEMJ(i64 %lo, i64 %hi, i64 %myId, i1 %ordered, i64 %start, i32* noalias align 32
2016 Jun 29
0
avx512 JIT backend generates wrong code on <4 x float>
Hi Frank, I recommend trying trunk LLVM. AVX-512 development has been very active recently. -Hal ----- Original Message ----- > From: "Frank Winter via llvm-dev" <llvm-dev at lists.llvm.org> > To: "LLVM Dev" <llvm-dev at lists.llvm.org> > Sent: Wednesday, June 29, 2016 2:41:39 PM > Subject: [llvm-dev] avx512 JIT backend generates wrong code on <4
2016 Jun 29
2
avx512 JIT backend generates wrong code on <4 x float>
Hi! When compiling the attached module with the JIT engine on an Intel KNL I see wrong code getting emitted. I attach a complete exploit program which shows the bug in LLVM 3.8. It loads and JIT compiles the module and prints the assembler. I stumbled on this since the result of an actual calculation was wrong. So, it's not only the text version of the assembler also the machine
2005 May 11
4
Should shadow_lock be spin_lock_recursive?
During our testing, we found this code path where xen attempts to grab the shadow_lock, while holding it - leading to a deadlock. >> free_dom_mem-> >> shadow_sync_and_drop_references-> >> shadow_lock -> ..................... first lock >> shadow_remove_all_access-> >> remove_all_access_in_page-> >> put_page-> >>
2016 Nov 09
3
Vectorizers code ownership
Hi Quentin, Thank you for bringing this up. I planned to finish the discussion on the vectorizer before starting the discussion on the X86 backend code ownership, but now is a good time. Simon, Sanjay, Craig, Elena, Bruno, Michael, Andrea, Chandler have made significant contributions to the X86 backend in the last few years. I think that Craig Topper would be a great code owner, assuming he wants
2015 Jul 08
7
[LLVMdev] LLVM loop vectorizer
Hello. I am trying to vectorize a CSR SpMV (sparse matrix vector multiplication) procedure but the LLVM loop vectorizer is not able to handle such code. I am using clang and llvm version 3.4 (on Ubuntu 12.10). I use the -fvectorize option with clang and -loop-vectorize with opt-3.4. The CSR SpMV function is inspired from
2017 Nov 13
3
RFC: [X86] Introducing command line options to prefer narrower vector instructions even when wider instructions are available
On Sat, Nov 11, 2017 at 8:52 PM, Hal Finkel via llvm-dev < llvm-dev at lists.llvm.org> wrote: > > On 11/11/2017 09:52 PM, UE US via llvm-dev wrote: > > If skylake is that bad at AVX2 > > > I don't think this says anything negative about AVX2, but AVX-512. > > it belongs in -mcpu / -march IMO. > > > No. We'd still want to enable the architectural
2017 Nov 14
2
RFC: [X86] Introducing command line options to prefer narrower vector instructions even when wider instructions are available
I haven't looked into actually implementing revectorization, so we may just want to ignore that possibility for now. But I imagined that revectorization could hit the same problem that we're trying to avoid here: if the cost models say that wider vectors are legal and cheaper, but the reality is that perf will suffer when using those wider vectors, then we want to avoid using the wider
2017 Nov 13
2
RFC: [X86] Introducing command line options to prefer narrower vector instructions even when wider instructions are available
On 11/13/2017 05:49 PM, Eric Christopher wrote: > > > On Mon, Nov 13, 2017 at 2:15 PM Craig Topper via llvm-dev > <llvm-dev at lists.llvm.org <mailto:llvm-dev at lists.llvm.org>> wrote: > > On Sat, Nov 11, 2017 at 8:52 PM, Hal Finkel via llvm-dev > <llvm-dev at lists.llvm.org <mailto:llvm-dev at lists.llvm.org>> wrote: > > > On
2017 Nov 12
2
RFC: [X86] Introducing command line options to prefer narrower vector instructions even when wider instructions are available
If skylake is that bad at AVX2 it belongs in -mcpu / -march IMO. Most people will build for the standard x86_64-pc-linux or whatever anyway, and completely ignore the change. This will mainly affect those who build their own software and optimize for their system, and lots there have probably caught on to this already. I always thought that's what -march was made for, really. GNOMETOYS
2017 Nov 01
5
RFC: [X86] Introducing command line options to prefer narrower vector instructions even when wider instructions are available
Hello all, I would like to propose adding the -mprefer-avx256 and -mprefer-avx128 command line flags supported by the latest GCC to clang. These flags will be used to limit the vector register size presented by TTI to the vectorizers. The backend will still be able to use wider registers for code written using the intrinsics in x86intrin.h. And the backend will still be able to use AVX512VL
2017 Nov 11
2
RFC: [X86] Introducing command line options to prefer narrower vector instructions even when wider instructions are available
Are you referring to the X86TargetLowering::isFsqrtCheap hook? ~Craig On Fri, Nov 10, 2017 at 7:39 AM, Sanjay Patel <spatel at rotateright.com> wrote: > We can tie a user preference / override to a CPU model. We do something > like that for square root estimates already (although it does use a > SubtargetFeature currently for x86; ideally, we'd key that off of something >
2020 May 18
2
Use Galois field New Instructions (GFNI) to combine affine instructions
On 5/18/20 8:24 PM, Craig Topper wrote: > I can tell you that your avx512 issue is that v64i8 gfni instructions also > require avx512bw to be enabled to make v64i8 a supported type. The C > intrinsics handling in the front end know this rule. But since you > generated your own intrinsics you bypassed that. Indeed that's the issue... I was stick with what Intel announces here