I have some trouble making the SIMD vector length visible to the passes. My application is basically on the level of 'opt'. What I did in version 3.6 was

    functionPassManager->add(new llvm::TargetLibraryInfo(llvm::Triple(Mod->getTargetTriple())));
    functionPassManager->add(new llvm::DataLayoutPass());

and then -basicaa and -loop-vectorize were able to vectorize the input IR for AVX.

Now, with 3.8, that doesn't compile. What I do instead is just set the data layout on the Module (I got that from the Kaleidoscope example):

    Mod->setDataLayout( targetMachine->createDataLayout() );

I don't add anything to the pass manager anymore, right? In particular, I don't set the target triple..?!

However, the SIMD size doesn't shine through. The debug output of the loop vectorizer says:

    LV: Checking a loop in "main" from module
    LV: Loop hints: force=? width=0 unroll=0
    LV: Found a loop: L3
    LV: Found an induction variable.
    LV: We can vectorize this loop!
    LV: Found trip count: 8
    LV: The Smallest and Widest types: 32 / 32 bits.
    LV: The Widest register is: 32 bits.
    LV: Found an estimated cost of 0 for VF 1 For instruction: %6 = phi i64 [ %14, %L3 ], [ 0, %L5 ]
    LV: Found an estimated cost of 1 for VF 1 For instruction: %7 = add nsw i64 %19, %6
    LV: Found an estimated cost of 0 for VF 1 For instruction: %8 = getelementptr float, float* %arg1, i64 %7
    LV: Found an estimated cost of 1 for VF 1 For instruction: %9 = load float, float* %8
    LV: Found an estimated cost of 0 for VF 1 For instruction: %10 = getelementptr float, float* %arg2, i64 %7
    LV: Found an estimated cost of 1 for VF 1 For instruction: %11 = load float, float* %10
    LV: Found an estimated cost of 1 for VF 1 For instruction: %12 = fadd float %11, %9
    LV: Found an estimated cost of 0 for VF 1 For instruction: %13 = getelementptr float, float* %arg0, i64 %7
    LV: Found an estimated cost of 1 for VF 1 For instruction: store float %12, float* %13
    LV: Found an estimated cost of 1 for VF 1 For instruction: %14 = add nsw i64 %6, 1
    LV: Found an estimated cost of 1 for VF 1 For instruction: %15 = icmp sge i64 %14, 8
    LV: Found an estimated cost of 1 for VF 1 For instruction: br i1 %15, label %L4, label %L3
    LV: Scalar loop costs: 8.
    LV: Selecting VF: 1.
    LV: Vectorization is possible but not beneficial.
    LV: Interleaving is not beneficial.

The problematic line is:

    LV: The Widest register is: 32 bits.

Before, with 3.6 on the same hardware, it showed 256 bits (which is correct).

Something is amiss here. I know there were some changes to the target triple, but I didn't follow them too closely. Does anyone know how this is done now?

Thanks,
Frank
Do you have the TTI in your pass manager? Something like:

    // Add the TTI (required to inform the vectorizer about register size for
    // instance)
    PM.add(createTargetTransformInfoWrapperPass(TM->getTargetIRAnalysis()));

Also, are you populating the pass manager using the PassManagerBuilder? You still need the TLI:

    // Populate the PassManager
    PassManagerBuilder PMB;
    PMB.LibraryInfo = new TargetLibraryInfoImpl(TM->getTargetTriple());
    ....

Or, without the PassManagerBuilder, something like:

    PM.add(new TargetLibraryInfoWrapperPass(TargetLibraryInfoImpl(TM->getTargetTriple())));

-- 
Mehdi

> On Feb 19, 2016, at 12:08 PM, Frank Winter via llvm-dev <llvm-dev at lists.llvm.org> wrote:
>
> I have some trouble making the SIMD vector length visible to the passes. [...]
I added your suggestion and am now using this:

    llvm::legacy::FunctionPassManager *functionPassManager = new llvm::legacy::FunctionPassManager(Mod);

    llvm::PassRegistry &registry = *llvm::PassRegistry::getPassRegistry();
    initializeScalarOpts(registry);

    functionPassManager->add( new llvm::TargetLibraryInfoWrapperPass(llvm::TargetLibraryInfoImpl(targetMachine->getTargetTriple())) );

Still:

    LV: The Widest register is: 32 bits.

so, unfortunately, no change.

If I dump the Module, it starts with:

    ; ModuleID = 'module'
    target datalayout = "e-m:e-i64:64-f80:128-n8:16:32:64-S128"

Does this datalayout look good for x86-64 AVX?

Frank

On 02/19/2016 03:14 PM, Mehdi Amini wrote:
> Do you have the TTI in your pass manager? [...]