Frank Winter via llvm-dev
2016-Jun-23 17:00 UTC
[llvm-dev] AVX512 instruction generated when JIT compiling for an avx2 architecture
On 06/23/2016 12:56 PM, Craig Topper wrote:
> Can you check what value "getHostCPUName" returned?

getHostCPUName() = skylake

> On Thu, Jun 23, 2016 at 9:53 AM, Frank Winter via llvm-dev
> <llvm-dev at lists.llvm.org> wrote:
>
>> With LLVM 3.8 the JIT compiler engine generates an AVX512
>> instruction although I target an 'avx2' CPU (Intel Core i7).
>> I just downloaded the most recent 3.8 and it still happens.
>>
>> It happens with this input module:
>>
>> target datalayout = "e-m:e-i64:64-f80:128-n8:16:32:64-S128"
>>
>> define void @module_cFFEMJ(i64 %lo, i64 %hi, i64 %myId, i1 %ordered, i64 %start, i32* noalias align 32 %arg0, i32* noalias align 32 %arg1) {
>> entrypoint:
>>   %0 = add nsw i64 %lo, %start
>>   %1 = add nsw i64 %hi, %start
>>   %2 = select i1 %ordered, i64 %0, i64 %lo
>>   %3 = select i1 %ordered, i64 %1, i64 %hi
>>   %4 = sdiv i64 %2, 4
>>   %5 = sdiv i64 %3, 4
>>   %6 = bitcast i32* %arg1 to i64*
>>   %7 = load i64, i64* %6, align 32
>>   %8 = trunc i64 %7 to i32
>>   %9 = getelementptr i32, i32* %arg1, i64 1
>>   %10 = lshr i64 %7, 32
>>   %11 = trunc i64 %10 to i32
>>   %12 = getelementptr i32, i32* %arg1, i64 2
>>   %13 = bitcast i32* %12 to i64*
>>   %14 = load i64, i64* %13, align 8
>>   %15 = trunc i64 %14 to i32
>>   %16 = getelementptr i32, i32* %arg1, i64 3
>>   %17 = lshr i64 %14, 32
>>   %18 = trunc i64 %17 to i32
>>   br label %L5
>>
>> L5:                                     ; preds = %L5, %entrypoint
>>   %19 = phi i64 [ %32, %L5 ], [ %4, %entrypoint ]
>>   %20 = shl i64 %19, 4
>>   %21 = or i64 %20, 4
>>   %22 = or i64 %20, 8
>>   %23 = or i64 %20, 12
>>   %broadcast.splatinsert9 = insertelement <4 x i32> undef, i32 %8, i32 0
>>   %broadcast.splat10 = shufflevector <4 x i32> %broadcast.splatinsert9, <4 x i32> undef, <4 x i32> zeroinitializer
>>   %broadcast.splatinsert11 = insertelement <4 x i32> undef, i32 %11, i32 0
>>   %broadcast.splat12 = shufflevector <4 x i32> %broadcast.splatinsert11, <4 x i32> undef, <4 x i32> zeroinitializer
>>   %broadcast.splatinsert13 = insertelement <4 x i32> undef, i32 %15, i32 0
>>   %broadcast.splat14 = shufflevector <4 x i32> %broadcast.splatinsert13, <4 x i32> undef, <4 x i32> zeroinitializer
>>   %broadcast.splatinsert15 = insertelement <4 x i32> undef, i32 %18, i32 0
>>   %broadcast.splat16 = shufflevector <4 x i32> %broadcast.splatinsert15, <4 x i32> undef, <4 x i32> zeroinitializer
>>   %24 = getelementptr i32, i32* %arg0, i64 %20
>>   %25 = bitcast i32* %24 to <4 x i32>*
>>   store <4 x i32> %broadcast.splat10, <4 x i32>* %25, align 16
>>   %26 = getelementptr i32, i32* %arg0, i64 %21
>>   %27 = bitcast i32* %26 to <4 x i32>*
>>   store <4 x i32> %broadcast.splat12, <4 x i32>* %27, align 16
>>   %28 = getelementptr i32, i32* %arg0, i64 %22
>>   %29 = bitcast i32* %28 to <4 x i32>*
>>   store <4 x i32> %broadcast.splat14, <4 x i32>* %29, align 16
>>   %30 = getelementptr i32, i32* %arg0, i64 %23
>>   %31 = bitcast i32* %30 to <4 x i32>*
>>   store <4 x i32> %broadcast.splat16, <4 x i32>* %31, align 16
>>   %32 = add nsw i64 %19, 1
>>   %33 = icmp slt i64 %32, %5
>>   br i1 %33, label %L5, label %L6
>>
>> L6:                                     ; preds = %L5
>>   ret void
>> }
>>
>> The following code lines show how I call the JIT compiler ('Mod'
>> points to the module):
>>
>>   llvm::EngineBuilder engineBuilder(std::move(std::unique_ptr<llvm::Module>(Mod)));
>>   engineBuilder.setMCPU(llvm::sys::getHostCPUName());
>>   engineBuilder.setEngineKind(llvm::EngineKind::JIT);
>>   engineBuilder.setOptLevel(llvm::CodeGenOpt::Aggressive);
>>   engineBuilder.setErrorStr(&mcjit_error);
>>
>>   llvm::TargetOptions targetOptions;
>>   targetOptions.AllowFPOpFusion = llvm::FPOpFusion::Fast;
>>   engineBuilder.setTargetOptions( targetOptions );
>>
>>   TheExecutionEngine = engineBuilder.create();
>>
>>   targetMachine = engineBuilder.selectTarget();
>>   Mod->setDataLayout( targetMachine->createDataLayout() );
>>
>>   TheExecutionEngine->finalizeObject(); // MCJIT
>>   fptr_mainFunc_extern = TheExecutionEngine->getPointerToFunction( mainFunc_extern );
>>
>> When calling the function an 'illegal instruction' is raised.
>> Looking at the assembler output reveals an AVX512 instruction which
>> shouldn't be there.
>>
>> Assembly:
>>         .text
>>         .file   "module"
>>         .globl  main
>>         .align  16, 0x90
>>         .type   main,@function
>> main:
>>         .cfi_startproc
>>         movq    8(%rsp), %r10
>>         leaq    (%rdi,%r8), %rdx
>>         addq    %rsi, %r8
>>         testb   $1, %cl
>>         cmoveq  %rdi, %rdx
>>         cmoveq  %rsi, %r8
>>         movq    %rdx, %rax
>>         sarq    $63, %rax
>>         shrq    $62, %rax
>>         addq    %rdx, %rax
>>         sarq    $2, %rax
>>         movq    %r8, %rcx
>>         sarq    $63, %rcx
>>         shrq    $62, %rcx
>>         addq    %r8, %rcx
>>         sarq    $2, %rcx
>>         movq    (%r10), %r8
>>         movq    8(%r10), %r10
>>         movq    %r8, %rdi
>>         shrq    $32, %rdi
>>         movq    %r10, %rsi
>>         shrq    $32, %rsi
>>         movq    %rax, %rdx
>>         shlq    $6, %rdx
>>         leaq    48(%rdx,%r9), %rdx
>>         .align  16, 0x90
>> .LBB0_1:
>>         vmovd   %r8d, %xmm0
>>         vpbroadcastd    %xmm0, %xmm0
>>         vmovd   %edi, %xmm1
>>         vpbroadcastd    %xmm1, %xmm1
>>         vmovd   %r10d, %xmm2
>>         vpbroadcastd    %xmm2, %xmm2
>>         vmovd   %esi, %xmm3
>>         vpbroadcastd    %xmm3, %xmm3
>>         vmovdqa32       %xmm0, -48(%rdx)
>>         vmovdqa32       %xmm1, -32(%rdx)
>>         vmovdqa32       %xmm2, -16(%rdx)
>>         vmovdqa32       %xmm3, (%rdx)
>>         addq    $1, %rax
>>         addq    $64, %rdx
>>         cmpq    %rcx, %rax
>>         jl      .LBB0_1
>>         retq
>> .Lfunc_end0:
>>         .size   main, .Lfunc_end0-main
>>         .cfi_endproc
>>
>>         .section        ".note.GNU-stack","",@progbits
>>
>> end assembly!
>>
>> I am not sure which instruction is the offending one, but
>> 'vmovdqa32' looks like AVX512.
>>
>> I wasn't able to reproduce this with 'opt' - it generates avx2
>> instructions. And when I force it to use e.g. avx512f it rejects
>> the CPU type.
>>
>> Any ideas?
>>
>> Frank
Keno Fischer via llvm-dev
2016-Jun-23 17:07 UTC
[llvm-dev] AVX512 instruction generated when JIT compiling for an avx2 architecture
You likely haven't set the cpu features correctly. See llvm::sys::getHostCPUFeatures. E.g. this is what we're doing in julia: https://github.com/JuliaLang/julia/blob/59b253031af87f62e7d70a7d8848cdfd4a84288b/src/codegen.cpp#L5627
Craig Topper via llvm-dev
2016-Jun-23 17:09 UTC
[llvm-dev] AVX512 instruction generated when JIT compiling for an avx2 architecture
I think there's a bug in 3.8 where "skylake" as the CPU name implies AVX512. I think a skylake-avx512 (or something similar) CPU name was added later, and the AVX512 feature was removed from the skylake CPU name.

The right way to fix this is to call getHostCPUFeatures as well. This will return a StringMap of true and false values for each feature. Iterate through that and build a string of "+feature,-feature" entries based on each feature name and its true/false value. Then pass that string to engineBuilder.setMAttrs. This will also protect against other issues, such as low-end versions of Sandy Bridge, Haswell, and Skylake processors not supporting AVX at all.

-- 
~Craig
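The feature-string step suggested in this thread (query the host features, then emit "+name" for enabled and "-name" for disabled ones) can be sketched roughly as below. This is only an illustration under stated assumptions: `FeatureMap`, `buildFeatureAttrs`, and `joinAttrs` are hypothetical names, and a `std::map` stands in for `llvm::StringMap<bool>` so the sketch compiles without LLVM. The LLVM calls are mentioned in comments only; how the result is best handed to `setMAttrs` should be checked against the headers of the LLVM version in use.

```cpp
#include <map>
#include <string>
#include <vector>

// Hypothetical stand-in for llvm::StringMap<bool>, so this compiles without LLVM.
// In the JIT setup from the thread, the map would presumably be filled by
// llvm::sys::getHostCPUFeatures instead of being written by hand.
using FeatureMap = std::map<std::string, bool>;

// Turn {feature name -> enabled?} into attribute strings: "+avx2", "-avx512f", ...
std::vector<std::string> buildFeatureAttrs(const FeatureMap &features) {
    std::vector<std::string> attrs;
    attrs.reserve(features.size());
    for (const auto &entry : features) {
        // Enabled features get a '+' prefix, disabled ones a '-' prefix.
        attrs.push_back((entry.second ? "+" : "-") + entry.first);
    }
    return attrs;
}

// Join the attributes into a single "+a,-b,+c" string.
std::string joinAttrs(const std::vector<std::string> &attrs) {
    std::string joined;
    for (const auto &attr : attrs) {
        if (!joined.empty()) {
            joined += ',';
        }
        joined += attr;
    }
    return joined;
}

// Assumed placement (not verified here): the attributes would be passed to
// engineBuilder.setMAttrs(...) next to the existing setMCPU(...) call,
// before engineBuilder.create() is invoked.
```

With explicit "-avx512f"-style entries in place, the code generator should no longer rely solely on what the CPU name "skylake" implies, which is the failure mode discussed above.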