martin krastev
2015-Feb-26 08:54 UTC
[LLVMdev] MCJIT generating loads of just-stored constants
Hello,

I end up with the following IR, which exhibits an apparent missed optimisation opportunity, namely loads of just-stored constants:

    ...
    %5 = getelementptr inbounds %class.A* %self, i32 0, i32 9, i32 0
    store i32 1, i32* %5, align 4
    %6 = getelementptr inbounds %class.A* %self, i32 0, i32 9, i32 1
    store i32 1, i32* %6, align 4
    %7 = getelementptr inbounds %class.A* %self, i32 0, i32 9, i32 2
    store i32 0, i32* %7, align 4
    %8 = getelementptr inbounds %class.A* %self, i32 0, i32 9, i32 6
    store i32 2, i32* %8, align 4
    %9 = getelementptr inbounds %class.A* %self, i32 0, i32 9, i32 8
    store i32 2, i32* %9, align 4
    %10 = getelementptr inbounds %class.A* %self, i32 0, i32 9, i32 10
    store i32 16, i32* %10, align 4
    %11 = getelementptr inbounds %class.A* %self, i32 0, i32 9, i32 11
    store i32 16, i32* %11, align 4
    %12 = getelementptr inbounds %class.A* %self, i32 0, i32 9, i32 12
    store i32 0, i32* %12, align 4
    %13 = getelementptr inbounds %class.A* %self, i32 0, i32 9, i32 13
    store i32 0, i32* %13, align 4
    %14 = getelementptr inbounds %class.A* %self, i32 0, i32 9, i32 15
    store i32 8, i32* %14, align 4
    %15 = getelementptr inbounds %class.A* %self, i32 0, i32 9, i32 17
    store i32 0, i32* %15, align 4
    %16 = getelementptr inbounds %class.A* %self, i64 0, i32 9, i32 0
    %17 = load i32* %16, align 4
    %18 = getelementptr inbounds %class.A* %self, i64 0, i32 9, i32 3
    %19 = load float* %18, align 4
    %20 = getelementptr inbounds %class.A* %self, i64 0, i32 9, i32 4
    %21 = load float* %20, align 4
    %22 = getelementptr inbounds %class.A* %self, i64 0, i32 9, i32 5
    %23 = load float* %22, align 4
    %24 = getelementptr inbounds %class.A* %self, i64 0, i32 9, i32 6
    %25 = load i32* %24, align 4
    %26 = getelementptr inbounds %class.A* %self, i64 0, i32 9, i32 7
    %27 = load float* %26, align 4
    %28 = getelementptr inbounds %class.A* %self, i64 0, i32 9, i32 8
    %29 = load i32* %28, align 4
    %30 = getelementptr inbounds %class.A* %self, i64 0, i32 9, i32 9
    %31 = load float* %30, align 4
    %32 = getelementptr inbounds %class.A* %self, i64 0, i32 9, i32 10
    %33 = load i32* %32, align 4
    %34 = getelementptr inbounds %class.A* %self, i64 0, i32 9, i32 11
    %35 = load i32* %34, align 4
    %36 = getelementptr inbounds %class.A* %self, i64 0, i32 9, i32 13
    %37 = load i32* %36, align 4
    %38 = getelementptr inbounds %class.A* %self, i64 0, i32 9, i32 14
    %39 = load float* %38, align 4
    %40 = getelementptr inbounds %class.A* %self, i64 0, i32 9, i32 15
    %41 = load i32* %40, align 4
    %42 = getelementptr inbounds %class.A* %self, i64 0, i32 9, i32 16
    %43 = load float* %42, align 4
    ...

The above happens after a callee gets inlined: all the stores are from the caller, and the loads are from the inlined callee. Please note the partial overlap between stored and loaded fields.

The general steps leading to the above:

1. Load a module containing a function A::foo(), which starts by reading fields from an object of class A.
2. Add to the module a wrapper function bar() which takes an object of class A as an argument, stores literals to (most of) the fields of the object, then calls A::foo() with the same object.
3. Update the original A::foo() with an AlwaysInline attribute.
4. Pass the module to MCJIT from clang 3.4.2, set up as:

    ...
    llvm::PassRegistry &registry = *llvm::PassRegistry::getPassRegistry();
    llvm::initializeCore(registry);
    llvm::initializeScalarOpts(registry);
    llvm::initializeObjCARCOpts(registry);
    llvm::initializeVectorization(registry);
    llvm::initializeIPO(registry);
    llvm::initializeAnalysis(registry);
    llvm::initializeIPA(registry);
    llvm::initializeTransformUtils(registry);
    llvm::initializeInstCombine(registry);
    llvm::initializeTarget(registry);
    llvm::initializeCodeGen(registry);
    llvm::initializeLoopStrengthReducePass(registry);
    llvm::initializeLowerIntrinsicsPass(registry);
    llvm::initializeUnreachableBlockElimPass(registry);

    llvm::TargetOptions opt;
    opt.PositionIndependentExecutable = false;

    const std::string& triple = llvm::sys::getProcessTriple();
    const std::string& hostcpu = llvm::sys::getHostCPUName();
    const std::string& features = "";

    std::string error;
    const llvm::Target *const target = llvm::TargetRegistry::lookupTarget(triple, error);

    llvm::TargetMachine *const tm = target->createTargetMachine(
        triple, hostcpu, features, opt,
        llvm::Reloc::Default,
        llvm::CodeModel::JITDefault,
        llvm::CodeGenOpt::Aggressive);

    // Set up IR pass management
    llvm::FunctionPassManager fpm(module);
    llvm::PassManager pm;

    tm->addAnalysisPasses(pm);
    tm->addAnalysisPasses(fpm);

    // Use a pass manager builder for C-style optimisations
    llvm::PassManagerBuilder passBuilder;
    passBuilder.OptLevel = 3;
    passBuilder.SizeLevel = 0;
    passBuilder.Inliner = llvm::createAlwaysInlinerPass(false); // suppress llvm.lifetime.* intrinsics
    passBuilder.BBVectorize = true;
    passBuilder.SLPVectorize = true;
    passBuilder.LoopVectorize = true;
    passBuilder.LateVectorize = true;

    passBuilder.populateFunctionPassManager(fpm);
    passBuilder.populateModulePassManager(pm);

    fpm.doInitialization();
    for (llvm::Module::iterator it = module->begin(), endit = module->end(); it != endit; ++it) {
        fpm.run(*it);
    }
    fpm.doFinalization();

    pm.run(*module);

    execEngine = llvm::EngineBuilder(module).setEngineKind(llvm::EngineKind::JIT).setUseMCJIT(true).create(tm);
    execEngine->finalizeObject();
    ...

I guess I'm missing something obvious in the MCJIT setup that leads to these results. Any hints are greatly appreciated.

Regards,
Martin