Teresa Johnson
2015-Jun-03 16:56 UTC
[LLVMdev] Removing AvailableExternal values in GlobalDCE (was Re: RFC: ThinLTO Impementation Plan)
On Tue, May 19, 2015 at 1:18 PM, David Blaikie <dblaikie at gmail.com> wrote:> > > On Tue, May 19, 2015 at 1:05 PM, Teresa Johnson <tejohnson at google.com> > wrote: >> >> On Tue, May 19, 2015 at 12:41 PM, David Blaikie <dblaikie at gmail.com> >> wrote: >> > >> > >> > On Mon, May 18, 2015 at 9:09 PM, Teresa Johnson <tejohnson at google.com> >> > wrote: >> >> >> >> On Fri, May 15, 2015 at 11:20 PM, Teresa Johnson <tejohnson at google.com> >> >> wrote: >> >> > On Fri, May 15, 2015 at 2:52 PM, David Blaikie <dblaikie at gmail.com> >> >> > wrote: >> >> >> >> >> >> >> >> >> On Fri, May 15, 2015 at 1:15 PM, Teresa Johnson >> >> >> <tejohnson at google.com> >> >> >> wrote: >> >> >>> >> >> >>> On Fri, May 15, 2015 at 10:04 AM, David Blaikie >> >> >>> <dblaikie at gmail.com> >> >> >>> wrote: >> >> >>> > >> >> >>> > >> >> >>> > On Fri, May 15, 2015 at 9:53 AM, Teresa Johnson >> >> >>> > <tejohnson at google.com> >> >> >>> > wrote: >> >> >>> >> >> >> >>> >> Looking back through my GlobalDCE changes, it looks like one of >> >> >>> >> the >> >> >>> >> places I had changed (where we mark defined globals in >> >> >>> >> runOnModule) >> >> >>> >> already has a guard for !hasAvailableExternallyLinkage and >> >> >>> >> !isDiscardableIfUnused, so my additional guard against marking >> >> >>> >> imported functions is unnecessary. But the other place I had to >> >> >>> >> change >> >> >>> >> was in GlobalIsNeeded where it walks through the function and >> >> >>> >> recursively marks any referenced global as needed. Here there >> >> >>> >> was >> >> >>> >> no >> >> >>> >> guard against marking a global that is available externally as >> >> >>> >> needed >> >> >>> >> if it is referenced. I had added a check here to not mark >> >> >>> >> imported >> >> >>> >> functions as needed on reference unless they were discardable >> >> >>> >> (i.e. >> >> >>> >> link once). Is this a bug - should this have a guard against >> >> >>> >> marking >> >> >>> >> available externally function refs as needed? >> >> >>> > >> >> >>> > >> >> >>> > Duncan's probably got a better idea of what the right answer is >> >> >>> > here. I >> >> >>> > suspect "yes". >> >> >>> > >> >> >>> > The trick with available_externally is to ensure we keep these >> >> >>> > around >> >> >>> > for >> >> >>> > long enough that their definitions are useful (for inlining, >> >> >>> > constant >> >> >>> > prop, >> >> >>> > all that good stuff) but remove them before we actually do too >> >> >>> > much >> >> >>> > work >> >> >>> > on >> >> >>> > them, or at least before we emit them into the final object file. >> >> >>> >> >> >>> Yep, and that is exactly how long I want my imported functions to >> >> >>> stick around. Currently I have the ThinLTO import pass added at the >> >> >>> start of addLTOOptimizationPasses, just after the AA passes. So the >> >> >>> imported functions stick around through global opt, constant merge, >> >> >>> combining, and inlining. The call to GlobalDCE is just after >> >> >>> inlining >> >> >>> and a second globalopt. >> >> >>> >> >> >>> > >> >> >>> > I imagine if GlobalDCE isn't removing available_externally >> >> >>> > functions >> >> >>> > it's >> >> >>> > because they're still useful in the optimization pipeline and >> >> >>> > something >> >> >>> > further down the pipe removes them (because they do, ultimately, >> >> >>> > get >> >> >>> > removed). >> >> >>> >> >> >>> That would be good to know, since if they are useful later on in >> >> >>> the >> >> >>> pipe then presumably my imported functions could be as well. But I >> >> >>> wonder who is removing them since GlobalDCE isn't (if there are >> >> >>> refs)? >> >> >> >> >> >> >> >> >> Yep - seems no one removes them (well, probably if there are no >> >> >> calls >> >> >> to the >> >> >> function at all they get removed as usual (eg: if all the calls are >> >> >> inlined >> >> >> we probably drop the function as we would for a linkonce_odr >> >> >> function)). If >> >> >> they're still around come codegen time, we just skip them: >> >> >> >> >> >> lib/CodeGen/MachineFunctionPass.cpp, >> >> >> MachineFunctionPass::runOnFunction: >> >> >> >> >> >> 33| bool MachineFunctionPass::runOnFunction(Function &F) { >> >> >> 34| // Do not codegen any 'available_externally' functions at all, >> >> >> they >> >> >> have >> >> >> 35| // definitions outside the translation unit. >> >> >> 36| if (F.hasAvailableExternallyLinkage()) >> >> >> 37| return false; >> >> >> 38| >> >> >> 39| MachineFunction &MF >> >> >> getAnalysis<MachineFunctionAnalysis>().getMF(); >> >> >> 40| return runOnMachineFunction(MF); >> >> >> 41| } >> >> > >> >> > Ok, thanks for digging this up. I guess that would work for the >> >> > imported functions as well, but it seems like a waste of compile time >> >> > if they are not useful after inlining/global opt. Is there any reason >> >> > to keep them past GlobalDCE or should I try changing GlobalDCE so >> >> > that >> >> > it removes them? >> >> > >> >> >> >> I found allowing GlobalDCE to remove referenced >> >> AvailableExternallyLinkage values earlier passes all the clang/llvm >> >> tests (see patch below). >> > >> > >> > The LLVM regression tests might not catch the sort of regression this >> > change >> > could cause. Usually we test each pass in isolation and tend towards >> > white >> > box testing (so we test the cases that are "interesting" according to >> > the >> > algorithm) - in this case, you're adding a condition (adding an >> > "interesting" case that wasn't interesting before - so didn't need to be >> > tested explicitly) that changes the behavior of GlobalDCE. This behavior >> > may've been depended upon by further down-stream optimizations for >> > overall >> > performance. >> > >> >> >> >> AFAICT, it shouldn't be necessary to keep >> >> these functions around past the first call to GlobalDCE (which is >> >> after inlining), but let me know if I missed something. >> > >> > >> > It's possible they're still useful for optimizations other than >> > inlining. We >> > can still do constant propagation through such functions (hey, look, >> > this >> > function always returns 3, etc). >> >> I thought this was done by passes such as IPSCCPPass and GlobalOpt >> which happen earlier. > > > You might well be right - I know next to nothing about these things. Having > someone a double-check/review from someone more familiar would be nice. > > I'll leave the rest of the review to any such someone who might be > sufficiently informed about the pass pipeline to decide whether this is the > right call. (and/or perf numbers that demonstrate it) > > Hopefully my naive review has at least covered the basics.I just did some SPEC cpu2006 performance testing with this change, at -O2 both with and without LTO. There were no measurable performance differences. In fact, the final binaries were largely identical (always the same size before and after, generated code looks identical, slight differences in label naming are the only differences in the nm and objdump output for a couple benchmarks - most are totally identical). Can someone take a look at the patch and approve if possible? While currently this doesn't affect large numbers of functions, with ThinLTO we want to avoid keeping around the larger numbers of available-externally imported objects unnecessarily. This change avoids the need of special casing here for ThinLTO imported functions/variables. Thanks, Teresa> > - David > >> >> >> > >> > It might be necessary/worth actually running perf numbers on such a >> > change >> > (or manually (possibly by asking those who know - not me, unfortunately) >> > checking the passes that come after GlobalDCE to see if any would do >> > cross >> > function constant prop, etc. Perhaps someone else knows this >> > off-the-cuff. >> >> I can collect some perf data. >> >> > >> >> >> >> Although they >> >> will be removed eventually by MachineFunctionPass as David pointed >> >> out, >> > >> > >> > Interestingly they're not exactly /removed/, they're just not emitted. I >> > don't know whether that's important. >> >> Yes, true, suppressed not removed. >> >> > >> >> >> >> so this is more of a compile-time improvement than anything. It >> >> will affect ThinLTO more significantly in compile time because there >> >> will be more functions with this linkage type, depending on how many >> >> static functions >> > >> > >> > Hmm - why static functions in particular? I imagine this attribute would >> > be >> > used for non-static functions too in ThinLTO. >> >> Arg, yes, I made the same mistake in some earlier email - all "global" >> functions (originally so and those static promoted) will be >> available_externally. So this is a significant amount for ThinLTO. >> >> > >> >> >> >> are imported that we have to promote. >> >> >> >> >> >> Also, what about available_externally variable >> >> definitions/initializations? I'm not sure what source construct >> >> currently can generate those, >> > >> > >> > I believe we emit C++ vtables as available_externally when the type has >> > a >> > key function. This allows devirtualization opportunities without >> > bloating >> > object file size with duplicate vtable data. >> > >> >> >> >> but are these suppressed/turned into >> >> declarations at some point? >> > >> > >> > Maybe? Again, I would guess (and you can check some of this with >> > -print-after-all to see how the IR changes as it goes through the >> > optimizations) they might just be omitted from the generated code, not >> > necessarily removed from the IR at any point. >> >> Ok, I will investigate. >> >> Thanks, >> Teresa >> >> > >> >> >> >> Note that ThinLTO imported file static >> >> variables always need to be promoted, and will be marked >> >> available_external. >> >> >> >> Index: lib/Transforms/IPO/GlobalDCE.cpp >> >> ==================================================================>> >> --- lib/Transforms/IPO/GlobalDCE.cpp (revision 237590) >> >> +++ lib/Transforms/IPO/GlobalDCE.cpp (working copy) >> >> @@ -162,7 +162,10 @@ bool GlobalDCE::runOnModule(Module &M) { >> >> // themselves. >> >> for (unsigned i = 0, e = DeadFunctions.size(); i != e; ++i) { >> >> RemoveUnusedGlobalValue(*DeadFunctions[i]); >> >> - M.getFunctionList().erase(DeadFunctions[i]); >> >> + // Might have deleted the body of an available externally >> >> function >> >> that >> >> + // is still referenced. Leave the declaration. >> >> + if (DeadFunctions[i]->use_empty()) >> >> + M.getFunctionList().erase(DeadFunctions[i]); >> >> } >> >> NumFunctions += DeadFunctions.size(); >> >> Changed = true; >> >> @@ -171,7 +174,10 @@ bool GlobalDCE::runOnModule(Module &M) { >> >> if (!DeadGlobalVars.empty()) { >> >> for (unsigned i = 0, e = DeadGlobalVars.size(); i != e; ++i) { >> >> RemoveUnusedGlobalValue(*DeadGlobalVars[i]); >> >> - M.getGlobalList().erase(DeadGlobalVars[i]); >> >> + // Might have deleted the definition of an available externally >> >> function >> >> + // that is still referenced. Leave the declaration. >> >> + if (DeadGlobalVars[i]->use_empty()) >> >> + M.getGlobalList().erase(DeadGlobalVars[i]); >> >> } >> >> NumVariables += DeadGlobalVars.size(); >> >> Changed = true; >> >> @@ -231,8 +237,10 @@ void GlobalDCE::GlobalIsNeeded(GlobalValue *G) { >> >> for (Function::iterator BB = F->begin(), E = F->end(); BB != E; >> >> ++BB) >> >> for (BasicBlock::iterator I = BB->begin(), E = BB->end(); I !>> >> E; >> >> ++I) >> >> for (User::op_iterator U = I->op_begin(), E = I->op_end(); U >> >> !>> >> E; ++U) >> >> - if (GlobalValue *GV = dyn_cast<GlobalValue>(*U)) >> >> - GlobalIsNeeded(GV); >> >> + if (GlobalValue *GV = dyn_cast<GlobalValue>(*U)) { >> >> + if (!GV->hasAvailableExternallyLinkage()) >> >> + GlobalIsNeeded(GV); >> >> + } >> >> else if (Constant *C = dyn_cast<Constant>(*U)) >> >> MarkUsedGlobalsAsNeeded(C); >> >> } >> >> >> >> >> >> >> >> -- >> >> Teresa Johnson | Software Engineer | tejohnson at google.com | >> >> 408-460-2413 >> > >> > >> >> >> >> -- >> Teresa Johnson | Software Engineer | tejohnson at google.com | 408-460-2413 > >-- Teresa Johnson | Software Engineer | tejohnson at google.com | 408-460-2413
Duncan P. N. Exon Smith
2015-Jun-03 21:07 UTC
[LLVMdev] Removing AvailableExternal values in GlobalDCE (was Re: RFC: ThinLTO Impementation Plan)
> On 2015-Jun-03, at 09:56, Teresa Johnson <tejohnson at google.com> wrote: > > On Tue, May 19, 2015 at 1:18 PM, David Blaikie <dblaikie at gmail.com> wrote: >> >> >> On Tue, May 19, 2015 at 1:05 PM, Teresa Johnson <tejohnson at google.com> >> wrote: >>> >>> On Tue, May 19, 2015 at 12:41 PM, David Blaikie <dblaikie at gmail.com> >>> wrote: >>>> >>>> >>>> On Mon, May 18, 2015 at 9:09 PM, Teresa Johnson <tejohnson at google.com> >>>> wrote: >>>>> >>>>> On Fri, May 15, 2015 at 11:20 PM, Teresa Johnson <tejohnson at google.com> >>>>> wrote: >>>>>> On Fri, May 15, 2015 at 2:52 PM, David Blaikie <dblaikie at gmail.com> >>>>>> wrote: >>>>>>> >>>>>>> >>>>>>> On Fri, May 15, 2015 at 1:15 PM, Teresa Johnson >>>>>>> <tejohnson at google.com> >>>>>>> wrote: >>>>>>>> >>>>>>>> On Fri, May 15, 2015 at 10:04 AM, David Blaikie >>>>>>>> <dblaikie at gmail.com> >>>>>>>> wrote: >>>>>>>>> >>>>>>>>> >>>>>>>>> On Fri, May 15, 2015 at 9:53 AM, Teresa Johnson >>>>>>>>> <tejohnson at google.com> >>>>>>>>> wrote: >>>>>>>>>> >>>>>>>>>> Looking back through my GlobalDCE changes, it looks like one of >>>>>>>>>> the >>>>>>>>>> places I had changed (where we mark defined globals in >>>>>>>>>> runOnModule) >>>>>>>>>> already has a guard for !hasAvailableExternallyLinkage and >>>>>>>>>> !isDiscardableIfUnused, so my additional guard against marking >>>>>>>>>> imported functions is unnecessary. But the other place I had to >>>>>>>>>> change >>>>>>>>>> was in GlobalIsNeeded where it walks through the function and >>>>>>>>>> recursively marks any referenced global as needed. Here there >>>>>>>>>> was >>>>>>>>>> no >>>>>>>>>> guard against marking a global that is available externally as >>>>>>>>>> needed >>>>>>>>>> if it is referenced. I had added a check here to not mark >>>>>>>>>> imported >>>>>>>>>> functions as needed on reference unless they were discardable >>>>>>>>>> (i.e. >>>>>>>>>> link once). Is this a bug - should this have a guard against >>>>>>>>>> marking >>>>>>>>>> available externally function refs as needed? >>>>>>>>> >>>>>>>>> >>>>>>>>> Duncan's probably got a better idea of what the right answer is >>>>>>>>> here. I >>>>>>>>> suspect "yes". >>>>>>>>> >>>>>>>>> The trick with available_externally is to ensure we keep these >>>>>>>>> around >>>>>>>>> for >>>>>>>>> long enough that their definitions are useful (for inlining, >>>>>>>>> constant >>>>>>>>> prop, >>>>>>>>> all that good stuff) but remove them before we actually do too >>>>>>>>> much >>>>>>>>> work >>>>>>>>> on >>>>>>>>> them, or at least before we emit them into the final object file. >>>>>>>> >>>>>>>> Yep, and that is exactly how long I want my imported functions to >>>>>>>> stick around. Currently I have the ThinLTO import pass added at the >>>>>>>> start of addLTOOptimizationPasses, just after the AA passes. So the >>>>>>>> imported functions stick around through global opt, constant merge, >>>>>>>> combining, and inlining. The call to GlobalDCE is just after >>>>>>>> inlining >>>>>>>> and a second globalopt. >>>>>>>> >>>>>>>>> >>>>>>>>> I imagine if GlobalDCE isn't removing available_externally >>>>>>>>> functions >>>>>>>>> it's >>>>>>>>> because they're still useful in the optimization pipeline and >>>>>>>>> something >>>>>>>>> further down the pipe removes them (because they do, ultimately, >>>>>>>>> get >>>>>>>>> removed). >>>>>>>> >>>>>>>> That would be good to know, since if they are useful later on in >>>>>>>> the >>>>>>>> pipe then presumably my imported functions could be as well. But I >>>>>>>> wonder who is removing them since GlobalDCE isn't (if there are >>>>>>>> refs)? >>>>>>> >>>>>>> >>>>>>> Yep - seems no one removes them (well, probably if there are no >>>>>>> calls >>>>>>> to the >>>>>>> function at all they get removed as usual (eg: if all the calls are >>>>>>> inlined >>>>>>> we probably drop the function as we would for a linkonce_odr >>>>>>> function)). If >>>>>>> they're still around come codegen time, we just skip them: >>>>>>> >>>>>>> lib/CodeGen/MachineFunctionPass.cpp, >>>>>>> MachineFunctionPass::runOnFunction: >>>>>>> >>>>>>> 33| bool MachineFunctionPass::runOnFunction(Function &F) { >>>>>>> 34| // Do not codegen any 'available_externally' functions at all, >>>>>>> they >>>>>>> have >>>>>>> 35| // definitions outside the translation unit. >>>>>>> 36| if (F.hasAvailableExternallyLinkage()) >>>>>>> 37| return false; >>>>>>> 38| >>>>>>> 39| MachineFunction &MF >>>>>>> getAnalysis<MachineFunctionAnalysis>().getMF(); >>>>>>> 40| return runOnMachineFunction(MF); >>>>>>> 41| } >>>>>> >>>>>> Ok, thanks for digging this up. I guess that would work for the >>>>>> imported functions as well, but it seems like a waste of compile time >>>>>> if they are not useful after inlining/global opt. Is there any reason >>>>>> to keep them past GlobalDCE or should I try changing GlobalDCE so >>>>>> that >>>>>> it removes them? >>>>>> >>>>> >>>>> I found allowing GlobalDCE to remove referenced >>>>> AvailableExternallyLinkage values earlier passes all the clang/llvm >>>>> tests (see patch below). >>>> >>>> >>>> The LLVM regression tests might not catch the sort of regression this >>>> change >>>> could cause. Usually we test each pass in isolation and tend towards >>>> white >>>> box testing (so we test the cases that are "interesting" according to >>>> the >>>> algorithm) - in this case, you're adding a condition (adding an >>>> "interesting" case that wasn't interesting before - so didn't need to be >>>> tested explicitly) that changes the behavior of GlobalDCE. This behavior >>>> may've been depended upon by further down-stream optimizations for >>>> overall >>>> performance. >>>> >>>>> >>>>> AFAICT, it shouldn't be necessary to keep >>>>> these functions around past the first call to GlobalDCE (which is >>>>> after inlining), but let me know if I missed something. >>>> >>>> >>>> It's possible they're still useful for optimizations other than >>>> inlining. We >>>> can still do constant propagation through such functions (hey, look, >>>> this >>>> function always returns 3, etc). >>> >>> I thought this was done by passes such as IPSCCPPass and GlobalOpt >>> which happen earlier. >> >> >> You might well be right - I know next to nothing about these things. Having >> someone a double-check/review from someone more familiar would be nice. >> >> I'll leave the rest of the review to any such someone who might be >> sufficiently informed about the pass pipeline to decide whether this is the >> right call. (and/or perf numbers that demonstrate it) >> >> Hopefully my naive review has at least covered the basics. > > I just did some SPEC cpu2006 performance testing with this change, at > -O2 both with and without LTO. There were no measurable performance > differences. In fact, the final binaries were largely identical > (always the same size before and after, generated code looks > identical, slight differences in label naming are the only differences > in the nm and objdump output for a couple benchmarks - most are > totally identical). > > Can someone take a look at the patch and approve if possible? While > currently this doesn't affect large numbers of functions, with ThinLTO > we want to avoid keeping around the larger numbers of > available-externally imported objects unnecessarily. This change > avoids the need of special casing here for ThinLTO imported > functions/variables.I'd be interested in performance of a clang bootstrap or of Chromium (or some other large C++ program built with -flto). Moreover, I suspect this will cause a problem for full LTO, where a lot of inlining takes place at link time. There have even been proposals (but no one has collected any data to share yet) to run just the always-inliner at compile time, deferring almost all inlining until later. But if we can have different pass pipelines for "normal" vs. "LTO" compiles, then "ThinLTO" can have its own pass pipeline as well. Seems like it already needs one. Why not add a late pass there to drop available_externally functions?
Teresa Johnson
2015-Jun-04 14:10 UTC
[LLVMdev] Removing AvailableExternal values in GlobalDCE (was Re: RFC: ThinLTO Impementation Plan)
On Wed, Jun 3, 2015 at 2:07 PM, Duncan P. N. Exon Smith <dexonsmith at apple.com> wrote:> >> On 2015-Jun-03, at 09:56, Teresa Johnson <tejohnson at google.com> wrote: >> >> On Tue, May 19, 2015 at 1:18 PM, David Blaikie <dblaikie at gmail.com> wrote: >>> >>> >>> On Tue, May 19, 2015 at 1:05 PM, Teresa Johnson <tejohnson at google.com> >>> wrote: >>>> >>>> On Tue, May 19, 2015 at 12:41 PM, David Blaikie <dblaikie at gmail.com> >>>> wrote: >>>>> >>>>> >>>>> On Mon, May 18, 2015 at 9:09 PM, Teresa Johnson <tejohnson at google.com> >>>>> wrote: >>>>>> >>>>>> On Fri, May 15, 2015 at 11:20 PM, Teresa Johnson <tejohnson at google.com> >>>>>> wrote: >>>>>>> On Fri, May 15, 2015 at 2:52 PM, David Blaikie <dblaikie at gmail.com> >>>>>>> wrote: >>>>>>>> >>>>>>>> >>>>>>>> On Fri, May 15, 2015 at 1:15 PM, Teresa Johnson >>>>>>>> <tejohnson at google.com> >>>>>>>> wrote: >>>>>>>>> >>>>>>>>> On Fri, May 15, 2015 at 10:04 AM, David Blaikie >>>>>>>>> <dblaikie at gmail.com> >>>>>>>>> wrote: >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> On Fri, May 15, 2015 at 9:53 AM, Teresa Johnson >>>>>>>>>> <tejohnson at google.com> >>>>>>>>>> wrote: >>>>>>>>>>> >>>>>>>>>>> Looking back through my GlobalDCE changes, it looks like one of >>>>>>>>>>> the >>>>>>>>>>> places I had changed (where we mark defined globals in >>>>>>>>>>> runOnModule) >>>>>>>>>>> already has a guard for !hasAvailableExternallyLinkage and >>>>>>>>>>> !isDiscardableIfUnused, so my additional guard against marking >>>>>>>>>>> imported functions is unnecessary. But the other place I had to >>>>>>>>>>> change >>>>>>>>>>> was in GlobalIsNeeded where it walks through the function and >>>>>>>>>>> recursively marks any referenced global as needed. Here there >>>>>>>>>>> was >>>>>>>>>>> no >>>>>>>>>>> guard against marking a global that is available externally as >>>>>>>>>>> needed >>>>>>>>>>> if it is referenced. I had added a check here to not mark >>>>>>>>>>> imported >>>>>>>>>>> functions as needed on reference unless they were discardable >>>>>>>>>>> (i.e. >>>>>>>>>>> link once). Is this a bug - should this have a guard against >>>>>>>>>>> marking >>>>>>>>>>> available externally function refs as needed? >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> Duncan's probably got a better idea of what the right answer is >>>>>>>>>> here. I >>>>>>>>>> suspect "yes". >>>>>>>>>> >>>>>>>>>> The trick with available_externally is to ensure we keep these >>>>>>>>>> around >>>>>>>>>> for >>>>>>>>>> long enough that their definitions are useful (for inlining, >>>>>>>>>> constant >>>>>>>>>> prop, >>>>>>>>>> all that good stuff) but remove them before we actually do too >>>>>>>>>> much >>>>>>>>>> work >>>>>>>>>> on >>>>>>>>>> them, or at least before we emit them into the final object file. >>>>>>>>> >>>>>>>>> Yep, and that is exactly how long I want my imported functions to >>>>>>>>> stick around. Currently I have the ThinLTO import pass added at the >>>>>>>>> start of addLTOOptimizationPasses, just after the AA passes. So the >>>>>>>>> imported functions stick around through global opt, constant merge, >>>>>>>>> combining, and inlining. The call to GlobalDCE is just after >>>>>>>>> inlining >>>>>>>>> and a second globalopt. >>>>>>>>> >>>>>>>>>> >>>>>>>>>> I imagine if GlobalDCE isn't removing available_externally >>>>>>>>>> functions >>>>>>>>>> it's >>>>>>>>>> because they're still useful in the optimization pipeline and >>>>>>>>>> something >>>>>>>>>> further down the pipe removes them (because they do, ultimately, >>>>>>>>>> get >>>>>>>>>> removed). >>>>>>>>> >>>>>>>>> That would be good to know, since if they are useful later on in >>>>>>>>> the >>>>>>>>> pipe then presumably my imported functions could be as well. But I >>>>>>>>> wonder who is removing them since GlobalDCE isn't (if there are >>>>>>>>> refs)? >>>>>>>> >>>>>>>> >>>>>>>> Yep - seems no one removes them (well, probably if there are no >>>>>>>> calls >>>>>>>> to the >>>>>>>> function at all they get removed as usual (eg: if all the calls are >>>>>>>> inlined >>>>>>>> we probably drop the function as we would for a linkonce_odr >>>>>>>> function)). If >>>>>>>> they're still around come codegen time, we just skip them: >>>>>>>> >>>>>>>> lib/CodeGen/MachineFunctionPass.cpp, >>>>>>>> MachineFunctionPass::runOnFunction: >>>>>>>> >>>>>>>> 33| bool MachineFunctionPass::runOnFunction(Function &F) { >>>>>>>> 34| // Do not codegen any 'available_externally' functions at all, >>>>>>>> they >>>>>>>> have >>>>>>>> 35| // definitions outside the translation unit. >>>>>>>> 36| if (F.hasAvailableExternallyLinkage()) >>>>>>>> 37| return false; >>>>>>>> 38| >>>>>>>> 39| MachineFunction &MF >>>>>>>> getAnalysis<MachineFunctionAnalysis>().getMF(); >>>>>>>> 40| return runOnMachineFunction(MF); >>>>>>>> 41| } >>>>>>> >>>>>>> Ok, thanks for digging this up. I guess that would work for the >>>>>>> imported functions as well, but it seems like a waste of compile time >>>>>>> if they are not useful after inlining/global opt. Is there any reason >>>>>>> to keep them past GlobalDCE or should I try changing GlobalDCE so >>>>>>> that >>>>>>> it removes them? >>>>>>> >>>>>> >>>>>> I found allowing GlobalDCE to remove referenced >>>>>> AvailableExternallyLinkage values earlier passes all the clang/llvm >>>>>> tests (see patch below). >>>>> >>>>> >>>>> The LLVM regression tests might not catch the sort of regression this >>>>> change >>>>> could cause. Usually we test each pass in isolation and tend towards >>>>> white >>>>> box testing (so we test the cases that are "interesting" according to >>>>> the >>>>> algorithm) - in this case, you're adding a condition (adding an >>>>> "interesting" case that wasn't interesting before - so didn't need to be >>>>> tested explicitly) that changes the behavior of GlobalDCE. This behavior >>>>> may've been depended upon by further down-stream optimizations for >>>>> overall >>>>> performance. >>>>> >>>>>> >>>>>> AFAICT, it shouldn't be necessary to keep >>>>>> these functions around past the first call to GlobalDCE (which is >>>>>> after inlining), but let me know if I missed something. >>>>> >>>>> >>>>> It's possible they're still useful for optimizations other than >>>>> inlining. We >>>>> can still do constant propagation through such functions (hey, look, >>>>> this >>>>> function always returns 3, etc). >>>> >>>> I thought this was done by passes such as IPSCCPPass and GlobalOpt >>>> which happen earlier. >>> >>> >>> You might well be right - I know next to nothing about these things. Having >>> someone a double-check/review from someone more familiar would be nice. >>> >>> I'll leave the rest of the review to any such someone who might be >>> sufficiently informed about the pass pipeline to decide whether this is the >>> right call. (and/or perf numbers that demonstrate it) >>> >>> Hopefully my naive review has at least covered the basics. >> >> I just did some SPEC cpu2006 performance testing with this change, at >> -O2 both with and without LTO. There were no measurable performance >> differences. In fact, the final binaries were largely identical >> (always the same size before and after, generated code looks >> identical, slight differences in label naming are the only differences >> in the nm and objdump output for a couple benchmarks - most are >> totally identical). >> >> Can someone take a look at the patch and approve if possible? While >> currently this doesn't affect large numbers of functions, with ThinLTO >> we want to avoid keeping around the larger numbers of >> available-externally imported objects unnecessarily. This change >> avoids the need of special casing here for ThinLTO imported >> functions/variables. > > I'd be interested in performance of a clang bootstrap or of Chromium (or > some other large C++ program built with -flto). > > Moreover, I suspect this will cause a problem for full LTO, where a lot > of inlining takes place at link time. There have even been proposals > (but no one has collected any data to share yet) to run just the > always-inliner at compile time, deferring almost all inlining until > later.Note that one of the sets of SPEC cpu2006 data I collected was with full LTO. I'm trying to understand where this would be an issue for full LTO, since GlobalDCE is running after inlining in the link time LTO pipeline. Just to be clear, are you concerned about the following: 1) GlobalDCE run after inlining in the "-flto -O2 -c" build removes an unreferenced available externally function 2) LTO link step would have inlined that available externally function somewhere (must be a different module since it was unreferenced in step 1) but now can't I'm not sure when 2) can happen since it had to be unreferenced in its module to be removed with my changes. For an available externally function, any reference from a different module would have needed its own copy to start with (at least that is the case with C inline functions). It looks like available externally C inline functions are already removed when unreferenced by the compile step (e.g. when inlined into that module's references during the "-flto -O2 -c" compile).> > But if we can have different pass pipelines for "normal" vs. "LTO" > compiles, then "ThinLTO" can have its own pass pipeline as well. Seems > like it already needs one. Why not add a late pass there to drop > available_externally functions? >Sure that's possible, but I am trying to understand what circumstance there is where a full LTO would need to keep around available externally functions after GlobalDCE where ThinLTO wouldn't. Presumably if they are useful for optimization after GlobalDCE then we would want to keep them around longer for ThinLTO as well. I just haven't identified any need to do so. Thanks, Teresa -- Teresa Johnson | Software Engineer | tejohnson at google.com | 408-460-2413