Teresa Johnson
2015-Jun-05 03:14 UTC
[LLVMdev] Removing AvailableExternal values in GlobalDCE (was Re: RFC: ThinLTO Impementation Plan)
On Thu, Jun 4, 2015 at 5:33 PM, Reid Kleckner <rnk at google.com> wrote:> On Thu, Jun 4, 2015 at 5:17 PM, Teresa Johnson <tejohnson at google.com> wrote: >> >> Agreed. Although I assume you mean invoke the new pass under a >> ThinLTO-only option so that avail extern are not dropped in the >> compile pass before the LTO link? > > > No, this pass would actually be an improvement to the standard -O2 pipeline. > The special case is the regular LTO pass pipeline, which wants to run > GlobalDCE but doesn't want to drop available_externally function definitions > until after linker-stage inlining.Ok. It looks to me like the LTO compile step with -O2 -flto -c uses this same -O2 optimization pipeline as without LTO. Clang communicates the fact that we are doing an LTO compile to llvm via the -emit-llvm-bc flag, which just tells it to emit bitcode and exit before codegen. So I would either need to key off of that or setup a new flag to indicate to llvm that we are doing an LTO -c compile. Or is there some other way that I am missing? Incidentally, we'll also want to skip this new pass and keep any referenced avail extern functions in the ThinLTO -c compile step for the same reasons (and there are no imported functions yet at that point). Teresa> > Consider this test case: > > declare void @blah() > define i32 @main() { > call void @foo() > ret i32 0 > } > define available_externally void @foo() noinline { > call void @bar() > ret void > } > define linkonce_odr void @bar() noinline { > call void @blah() > ret void > } > > If I run opt -O2 on this and feed it to llc, it actually generates code for > bar, even though there are no actual references to bar in the final code: > > main: # @main > pushq %rax > callq foo > xorl %eax, %eax > popq %rdx > retq > > bar: # @bar > jmp blah # TAILCALL > > This corner case happens to come up organically with dllimported classes, > which is why I happened to think of it. :) I'm happy with a flag to > globaldce for LTO and the original patch, honestly.-- Teresa Johnson | Software Engineer | tejohnson at google.com | 408-460-2413
Eric Christopher
2015-Jun-05 04:20 UTC
[LLVMdev] Removing AvailableExternal values in GlobalDCE (was Re: RFC: ThinLTO Impementation Plan)
On Thu, Jun 4, 2015 at 8:17 PM Teresa Johnson <tejohnson at google.com> wrote:> On Thu, Jun 4, 2015 at 5:33 PM, Reid Kleckner <rnk at google.com> wrote: > > On Thu, Jun 4, 2015 at 5:17 PM, Teresa Johnson <tejohnson at google.com> > wrote: > >> > >> Agreed. Although I assume you mean invoke the new pass under a > >> ThinLTO-only option so that avail extern are not dropped in the > >> compile pass before the LTO link? > > > > > > No, this pass would actually be an improvement to the standard -O2 > pipeline. > > The special case is the regular LTO pass pipeline, which wants to run > > GlobalDCE but doesn't want to drop available_externally function > definitions > > until after linker-stage inlining. > > Ok. It looks to me like the LTO compile step with -O2 -flto -c uses > this same -O2 optimization pipeline as without LTO. Clang communicates > the fact that we are doing an LTO compile to llvm via the > -emit-llvm-bc flag, which just tells it to emit bitcode and exit > before codegen. So I would either need to key off of that or setup a > new flag to indicate to llvm that we are doing an LTO -c compile. Or > is there some other way that I am missing? > > Incidentally, we'll also want to skip this new pass and keep any > referenced avail extern functions in the ThinLTO -c compile step for > the same reasons (and there are no imported functions yet at that > point). > >Ultimately for any planned LTO build we're going to want to do a different pass pipeline, it's probably worth considering what should be done before and during LTO. -eric> Teresa > > > > > Consider this test case: > > > > declare void @blah() > > define i32 @main() { > > call void @foo() > > ret i32 0 > > } > > define available_externally void @foo() noinline { > > call void @bar() > > ret void > > } > > define linkonce_odr void @bar() noinline { > > call void @blah() > > ret void > > } > > > > If I run opt -O2 on this and feed it to llc, it actually generates code > for > > bar, even though there are no actual references to bar in the final code: > > > > main: # @main > > pushq %rax > > callq foo > > xorl %eax, %eax > > popq %rdx > > retq > > > > bar: # @bar > > jmp blah # TAILCALL > > > > This corner case happens to come up organically with dllimported classes, > > which is why I happened to think of it. :) I'm happy with a flag to > > globaldce for LTO and the original patch, honestly. > > > > -- > Teresa Johnson | Software Engineer | tejohnson at google.com | 408-460-2413 > _______________________________________________ > LLVM Developers mailing list > LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu > http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev >-------------- next part -------------- An HTML attachment was scrubbed... URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20150605/2b9255cb/attachment.html>
Teresa Johnson
2015-Jun-08 16:03 UTC
[LLVMdev] Removing AvailableExternal values in GlobalDCE (was Re: RFC: ThinLTO Impementation Plan)
Talked to Eric Fri and he suggested that this might be the first of several places where we want behavior of LTO compiles to diverge from normal -O2 compiles. So for now I have implemented this such that we pass down -flto to the -cc1 job, and that gets propagated as a code gen option and into the PassManagerBuilder. I have left the current logic translating -flto to the -emit-llvm-bc option, since IMO that is cleaner for controlling the output type. Let me know what you think of saving the overall LTO setting in CodeGenOpts/PassManagerBuilder, or if is it better to record just the individual behaviors we want (i.e. change the new field in CodeGenOpts/PassManagerBuilder from "LTO" to "ElimAvailExtern" or something like that). Patches attached. I modified an existing available external clang test to check for the new O2 (LTO vs not) behaviors. Thanks, Teresa On Thu, Jun 4, 2015 at 9:20 PM, Eric Christopher <echristo at gmail.com> wrote:> > > On Thu, Jun 4, 2015 at 8:17 PM Teresa Johnson <tejohnson at google.com> wrote: >> >> On Thu, Jun 4, 2015 at 5:33 PM, Reid Kleckner <rnk at google.com> wrote: >> > On Thu, Jun 4, 2015 at 5:17 PM, Teresa Johnson <tejohnson at google.com> >> > wrote: >> >> >> >> Agreed. Although I assume you mean invoke the new pass under a >> >> ThinLTO-only option so that avail extern are not dropped in the >> >> compile pass before the LTO link? >> > >> > >> > No, this pass would actually be an improvement to the standard -O2 >> > pipeline. >> > The special case is the regular LTO pass pipeline, which wants to run >> > GlobalDCE but doesn't want to drop available_externally function >> > definitions >> > until after linker-stage inlining. >> >> Ok. It looks to me like the LTO compile step with -O2 -flto -c uses >> this same -O2 optimization pipeline as without LTO. Clang communicates >> the fact that we are doing an LTO compile to llvm via the >> -emit-llvm-bc flag, which just tells it to emit bitcode and exit >> before codegen. So I would either need to key off of that or setup a >> new flag to indicate to llvm that we are doing an LTO -c compile. Or >> is there some other way that I am missing? >> >> Incidentally, we'll also want to skip this new pass and keep any >> referenced avail extern functions in the ThinLTO -c compile step for >> the same reasons (and there are no imported functions yet at that >> point). >> > > Ultimately for any planned LTO build we're going to want to do a different > pass pipeline, it's probably worth considering what should be done before > and during LTO. > > -eric > >> >> Teresa >> >> > >> > Consider this test case: >> > >> > declare void @blah() >> > define i32 @main() { >> > call void @foo() >> > ret i32 0 >> > } >> > define available_externally void @foo() noinline { >> > call void @bar() >> > ret void >> > } >> > define linkonce_odr void @bar() noinline { >> > call void @blah() >> > ret void >> > } >> > >> > If I run opt -O2 on this and feed it to llc, it actually generates code >> > for >> > bar, even though there are no actual references to bar in the final >> > code: >> > >> > main: # @main >> > pushq %rax >> > callq foo >> > xorl %eax, %eax >> > popq %rdx >> > retq >> > >> > bar: # @bar >> > jmp blah # TAILCALL >> > >> > This corner case happens to come up organically with dllimported >> > classes, >> > which is why I happened to think of it. :) I'm happy with a flag to >> > globaldce for LTO and the original patch, honestly. >> >> >> >> -- >> Teresa Johnson | Software Engineer | tejohnson at google.com | 408-460-2413 >> _______________________________________________ >> LLVM Developers mailing list >> LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu >> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev-- Teresa Johnson | Software Engineer | tejohnson at google.com | 408-460-2413 -------------- next part -------------- Index: include/llvm/InitializePasses.h ==================================================================--- include/llvm/InitializePasses.h (revision 237590) +++ include/llvm/InitializePasses.h (working copy) @@ -130,6 +130,7 @@ void initializeSanitizerCoverageModulePass(PassReg void initializeDataFlowSanitizerPass(PassRegistry&); void initializeScalarizerPass(PassRegistry&); void initializeEarlyCSELegacyPassPass(PassRegistry &); +void initializeElimAvailExternPass(PassRegistry&); void initializeExpandISelPseudosPass(PassRegistry&); void initializeFunctionAttrsPass(PassRegistry&); void initializeGCMachineCodeAnalysisPass(PassRegistry&); Index: include/llvm/Transforms/IPO.h ==================================================================--- include/llvm/Transforms/IPO.h (revision 237590) +++ include/llvm/Transforms/IPO.h (working copy) @@ -71,6 +71,12 @@ ModulePass *createGlobalOptimizerPass(); ModulePass *createGlobalDCEPass(); //===----------------------------------------------------------------------===// +/// createElimAvailExternPass - This transform is designed to eliminate +/// available external globals (functions or global variables) +/// +ModulePass *createElimAvailExternPass(); + +//===----------------------------------------------------------------------===// /// createGVExtractionPass - If deleteFn is true, this pass deletes /// the specified global values. Otherwise, it deletes as much of the module as /// possible, except for the global values specified. Index: include/llvm/Transforms/IPO/PassManagerBuilder.h ==================================================================--- include/llvm/Transforms/IPO/PassManagerBuilder.h (revision 237590) +++ include/llvm/Transforms/IPO/PassManagerBuilder.h (working copy) @@ -121,6 +121,7 @@ class PassManagerBuilder { bool VerifyInput; bool VerifyOutput; bool MergeFunctions; + bool LTO; private: /// ExtensionList - This is list of all of the extensions that are registered. Index: lib/Transforms/IPO/CMakeLists.txt ==================================================================--- lib/Transforms/IPO/CMakeLists.txt (revision 237590) +++ lib/Transforms/IPO/CMakeLists.txt (working copy) @@ -3,6 +3,7 @@ add_llvm_library(LLVMipo BarrierNoopPass.cpp ConstantMerge.cpp DeadArgumentElimination.cpp + ElimAvailExtern.cpp ExtractGV.cpp FunctionAttrs.cpp GlobalDCE.cpp Index: lib/Transforms/IPO/ElimAvailExtern.cpp ==================================================================--- lib/Transforms/IPO/ElimAvailExtern.cpp (revision 0) +++ lib/Transforms/IPO/ElimAvailExtern.cpp (working copy) @@ -0,0 +1,90 @@ +//===-- ElimAvailExtern.cpp - DCE unreachable internal functions ----------------===// +// +// The LLVM Compiler Infrastructure +// +// This file is distributed under the University of Illinois Open Source +// License. See LICENSE.TXT for details. +// +//===----------------------------------------------------------------------===// +// +// This transform is designed to eliminate available external global +// definitions from the program, turning them into declarations. +// +//===----------------------------------------------------------------------===// + +#include "llvm/Transforms/IPO.h" +#include "llvm/ADT/Statistic.h" +#include "llvm/IR/Constants.h" +#include "llvm/IR/Instructions.h" +#include "llvm/IR/Module.h" +#include "llvm/Transforms/Utils/CtorUtils.h" +#include "llvm/Transforms/Utils/GlobalStatus.h" +#include "llvm/Pass.h" +using namespace llvm; + +#define DEBUG_TYPE "elimavailextern" + +STATISTIC(NumAliases , "Number of global aliases removed"); +STATISTIC(NumFunctions, "Number of functions removed"); +STATISTIC(NumVariables, "Number of global variables removed"); + +namespace { + struct ElimAvailExtern : public ModulePass { + static char ID; // Pass identification, replacement for typeid + ElimAvailExtern() : ModulePass(ID) { + initializeElimAvailExternPass(*PassRegistry::getPassRegistry()); + } + + // run - Do the ElimAvailExtern pass on the specified module, optionally + // updating the specified callgraph to reflect the changes. + // + bool runOnModule(Module &M) override; + }; +} + +char ElimAvailExtern::ID = 0; +INITIALIZE_PASS(ElimAvailExtern, "elimavailextern", + "Eliminate Available External Globals", false, false) + +ModulePass *llvm::createElimAvailExternPass() { return new ElimAvailExtern(); } + +bool ElimAvailExtern::runOnModule(Module &M) { + bool Changed = false; + + // Drop initializers of available externally global variables. + for (Module::global_iterator I = M.global_begin(), E = M.global_end(); + I != E; ++I) { + if (!I->hasAvailableExternallyLinkage()) + continue; + if (I->hasInitializer()) { + Constant *Init = I->getInitializer(); + I->setInitializer(nullptr); + if (isSafeToDestroyConstant(Init)) + Init->destroyConstant(); + } + I->removeDeadConstantUsers(); + NumVariables++; + } + + // Drop the bodies of available externally functions. + for (Module::iterator I = M.begin(), E = M.end(); I != E; ++I) { + if (!I->hasAvailableExternallyLinkage()) + continue; + if (!I->isDeclaration()) + I->deleteBody(); + I->removeDeadConstantUsers(); + NumFunctions++; + } + + // Drop targets of available externally aliases. + for (Module::alias_iterator I = M.alias_begin(), E = M.alias_end(); I != E; + ++I) { + if (!I->hasAvailableExternallyLinkage()) + continue; + I->setAliasee(nullptr); + I->removeDeadConstantUsers(); + NumAliases++; + } + + return Changed; +} Index: lib/Transforms/IPO/PassManagerBuilder.cpp ==================================================================--- lib/Transforms/IPO/PassManagerBuilder.cpp (revision 237590) +++ lib/Transforms/IPO/PassManagerBuilder.cpp (working copy) @@ -106,6 +106,7 @@ PassManagerBuilder::PassManagerBuilder() { VerifyInput = false; VerifyOutput = false; MergeFunctions = false; + LTO = false; } PassManagerBuilder::~PassManagerBuilder() { @@ -407,6 +408,16 @@ void PassManagerBuilder::populateModulePassManager // GlobalOpt already deletes dead functions and globals, at -O2 try a // late pass of GlobalDCE. It is capable of deleting dead cycles. if (OptLevel > 1) { + if (!LTO) { + // Remove avail extern fns and globals definitions if we aren't + // doing an -flto compilation. For LTO we will preserve these + // so they are eligible for inlining at link-time. Note if they + // are unreferenced they will be removed by GlobalDCE below, so + // this only impacts referenced available externally globals. + // Eventually they will be suppressed during codegen, but eliminating + // here is a compile-time savings. + MPM.add(createElimAvailExternPass()); + } MPM.add(createGlobalDCEPass()); // Remove dead fns and globals. MPM.add(createConstantMergePass()); // Merge dup global constants } -------------- next part -------------- Index: include/clang/Driver/Options.td ==================================================================--- include/clang/Driver/Options.td (revision 237590) +++ include/clang/Driver/Options.td (working copy) @@ -636,7 +636,7 @@ def flat__namespace : Flag<["-"], "flat_namespace" def flax_vector_conversions : Flag<["-"], "flax-vector-conversions">, Group<f_Group>; def flimited_precision_EQ : Joined<["-"], "flimited-precision=">, Group<f_Group>; def flto_EQ : Joined<["-"], "flto=">, Group<clang_ignored_gcc_optimization_f_Group>; -def flto : Flag<["-"], "flto">, Group<f_Group>; +def flto : Flag<["-"], "flto">, Flags<[CC1Option]>, Group<f_Group>; def fno_lto : Flag<["-"], "fno-lto">, Group<f_Group>; def fmacro_backtrace_limit_EQ : Joined<["-"], "fmacro-backtrace-limit=">, Group<f_Group>, Flags<[DriverOption, CoreOption]>; Index: include/clang/Frontend/CodeGenOptions.def ==================================================================--- include/clang/Frontend/CodeGenOptions.def (revision 237590) +++ include/clang/Frontend/CodeGenOptions.def (working copy) @@ -67,6 +67,7 @@ CODEGENOPT(InstrumentFunctions , 1, 0) ///< Set wh CODEGENOPT(InstrumentForProfiling , 1, 0) ///< Set when -pg is enabled. CODEGENOPT(LessPreciseFPMAD , 1, 0) ///< Enable less precise MAD instructions to ///< be generated. +CODEGENOPT(LTO , 1, 0) ///< Set when -flto is enabled. CODEGENOPT(MergeAllConstants , 1, 1) ///< Merge identical constants. CODEGENOPT(MergeFunctions , 1, 0) ///< Set when -fmerge-functions is enabled. CODEGENOPT(MSVolatile , 1, 0) ///< Set when /volatile:ms is enabled. Index: lib/CodeGen/BackendUtil.cpp ==================================================================--- lib/CodeGen/BackendUtil.cpp (revision 237590) +++ lib/CodeGen/BackendUtil.cpp (working copy) @@ -287,6 +287,7 @@ void EmitAssemblyHelper::CreatePasses() { PMBuilder.DisableUnitAtATime = !CodeGenOpts.UnitAtATime; PMBuilder.DisableUnrollLoops = !CodeGenOpts.UnrollLoops; PMBuilder.MergeFunctions = CodeGenOpts.MergeFunctions; + PMBuilder.LTO = CodeGenOpts.LTO; PMBuilder.RerollLoops = CodeGenOpts.RerollLoops; PMBuilder.addExtension(PassManagerBuilder::EP_EarlyAsPossible, Index: lib/Driver/Tools.cpp ==================================================================--- lib/Driver/Tools.cpp (revision 237590) +++ lib/Driver/Tools.cpp (working copy) @@ -2676,6 +2676,10 @@ void Clang::ConstructJob(Compilation &C, const Job assert((isa<CompileJobAction>(JA) || isa<BackendJobAction>(JA)) && "Invalid action for clang tool."); + if (JA.getType() == types::TY_LTO_IR || + JA.getType() == types::TY_LTO_BC) { + CmdArgs.push_back("-flto"); + } if (JA.getType() == types::TY_Nothing) { CmdArgs.push_back("-fsyntax-only"); } else if (JA.getType() == types::TY_LLVM_IR || Index: lib/Frontend/CompilerInvocation.cpp ==================================================================--- lib/Frontend/CompilerInvocation.cpp (revision 237590) +++ lib/Frontend/CompilerInvocation.cpp (working copy) @@ -489,6 +489,8 @@ static bool ParseCodeGenArgs(CodeGenOptions &Opts, Opts.MergeFunctions = Args.hasArg(OPT_fmerge_functions); + Opts.LTO = Args.hasArg(OPT_flto); + Opts.MSVolatile = Args.hasArg(OPT_fms_volatile); Opts.VectorizeBB = Args.hasArg(OPT_vectorize_slp_aggressive); Index: test/CodeGen/available-externally-suppress.c ==================================================================--- test/CodeGen/available-externally-suppress.c (revision 237590) +++ test/CodeGen/available-externally-suppress.c (working copy) @@ -1,6 +1,10 @@ // RUN: %clang_cc1 -emit-llvm -o - -triple x86_64-apple-darwin10 %s | FileCheck %s +// RUN: %clang_cc1 -O2 -fno-inline -emit-llvm -o - -triple x86_64-apple-darwin10 %s | FileCheck %s +// RUN: %clang_cc1 -flto -O2 -fno-inline -emit-llvm -o - -triple x86_64-apple-darwin10 %s | FileCheck %s -check-prefix=LTO // Ensure that we don't emit available_externally functions at -O0. +// Also should not emit them at -O2, unless -flto is present in which case +// we should preserve them for link-time inlining decisions. int x; inline void f0(int y) { x = y; } @@ -7,6 +11,8 @@ inline void f0(int y) { x = y; } // CHECK-LABEL: define void @test() // CHECK: declare void @f0(i32) +// LTO-LABEL: define void @test() +// LTO: define available_externally void @f0 void test() { f0(17); } @@ -19,9 +25,13 @@ inline int __attribute__((always_inline)) f1(int x } // CHECK: @test1 +// LTO: @test1 int test1(int x) { // CHECK: br i1 // CHECK-NOT: call {{.*}} @f1 // CHECK: ret i32 + // LTO: br i1 + // LTO-NOT: call {{.*}} @f1 + // LTO: ret i32 return f1(x); }