Teresa Johnson
2015-Jun-04 17:13 UTC
[LLVMdev] Removing AvailableExternal values in GlobalDCE (was Re: RFC: ThinLTO Impementation Plan)
On Thu, Jun 4, 2015 at 9:51 AM, Duncan P. N. Exon Smith <dexonsmith at apple.com> wrote:> >> On 2015-Jun-04, at 07:10, Teresa Johnson <tejohnson at google.com> wrote: >> >> On Wed, Jun 3, 2015 at 2:07 PM, Duncan P. N. Exon Smith >> <dexonsmith at apple.com> wrote: >>> >>>> On 2015-Jun-03, at 09:56, Teresa Johnson <tejohnson at google.com> wrote: >>>> >>>> On Tue, May 19, 2015 at 1:18 PM, David Blaikie <dblaikie at gmail.com> wrote: >>>> >>>>> You might well be right - I know next to nothing about these things. Having >>>>> someone a double-check/review from someone more familiar would be nice. >>>>> >>>>> I'll leave the rest of the review to any such someone who might be >>>>> sufficiently informed about the pass pipeline to decide whether this is the >>>>> right call. (and/or perf numbers that demonstrate it) >>>>> >>>>> Hopefully my naive review has at least covered the basics. >>>> >>>> I just did some SPEC cpu2006 performance testing with this change, at >>>> -O2 both with and without LTO. There were no measurable performance >>>> differences. In fact, the final binaries were largely identical >>>> (always the same size before and after, generated code looks >>>> identical, slight differences in label naming are the only differences >>>> in the nm and objdump output for a couple benchmarks - most are >>>> totally identical). >>>> >>>> Can someone take a look at the patch and approve if possible? While >>>> currently this doesn't affect large numbers of functions, with ThinLTO >>>> we want to avoid keeping around the larger numbers of >>>> available-externally imported objects unnecessarily. This change >>>> avoids the need of special casing here for ThinLTO imported >>>> functions/variables. >>> >>> I'd be interested in performance of a clang bootstrap or of Chromium (or >>> some other large C++ program built with -flto). >>> >>> Moreover, I suspect this will cause a problem for full LTO, where a lot >>> of inlining takes place at link time. There have even been proposals >>> (but no one has collected any data to share yet) to run just the >>> always-inliner at compile time, deferring almost all inlining until >>> later. >> >> Note that one of the sets of SPEC cpu2006 data I collected was with >> full LTO. I'm trying to understand where this would be an issue for >> full LTO, since GlobalDCE is running after inlining in the link time >> LTO pipeline. >> >> Just to be clear, are you concerned about the following: >> >> 1) GlobalDCE run after inlining in the "-flto -O2 -c" build removes an >> unreferenced available externally function >> 2) LTO link step would have inlined that available externally function >> somewhere (must be a different module since it was unreferenced in >> step 1) but now can't >> >> I'm not sure when 2) can happen since it had to be unreferenced in its >> module to be removed with my changes. > > That's not what it looks like to me. Here's what I think the relevant > part of your patch is: > > On 2015-May-18, at 21:09, Teresa Johnson <tejohnson at google.com> wrote: >> >> @@ -231,8 +237,10 @@ void GlobalDCE::GlobalIsNeeded(GlobalValue *G) { >> for (Function::iterator BB = F->begin(), E = F->end(); BB != E; ++BB) >> for (BasicBlock::iterator I = BB->begin(), E = BB->end(); I != E; ++I) >> for (User::op_iterator U = I->op_begin(), E = I->op_end(); U != E; ++U) >> - if (GlobalValue *GV = dyn_cast<GlobalValue>(*U)) >> - GlobalIsNeeded(GV); >> + if (GlobalValue *GV = dyn_cast<GlobalValue>(*U)) { >> + if (!GV->hasAvailableExternallyLinkage()) >> + GlobalIsNeeded(GV); >> + } >> else if (Constant *C = dyn_cast<Constant>(*U)) >> MarkUsedGlobalsAsNeeded(C); >> } > > > IIUC, this changes the logic from "if it's referenced, keep it" to > "if it's referenced and not available_externally, keep it".Sorry, you're right, I conflated a couple of different things when I looked at this again this morning. For ThinLTO I don't in fact need this change for anything that is fully inlined after importing.> > In particular, I'm worried about GlobalDCE removing a *referenced* > available_externally function, particularly if/when we stop inlining at > the compile step when -flto.Ok, that makes sense. In that case, I agree that doing this for ThinLTO imported functions is the right way to go. My initial approach was to mark the functions that were ThinLTO-imported and check that here. I think we will likely want to mark the imported functions as such regardless, since we may want to apply different optimization thresholds for imported functions, or at least use for debugging and tracing, so we could consider using that approach here. Another possibility as you mentioned was to drop in a new pass in the ThinLTO backend optimization pipeline just for this. Or perhaps the GlobalDCE pass invocation could take a flag indicating that it should remove the avail extern functions, that we can set in the ThinLTO backend case. It seems like most of the code would be shared between the current GlobalDCE and any new ThinLTO-specific version that setting GlobalDCE up to do this optionally (either per-imported function or with a configuration flag on the pass) made sense. Teresa> >> For an available externally >> function, any reference from a different module would have needed its >> own copy to start with (at least that is the case with C inline >> functions). It looks like available externally C inline functions are >> already removed when unreferenced by the compile step (e.g. when >> inlined into that module's references during the "-flto -O2 -c" >> compile). >> >>> >>> But if we can have different pass pipelines for "normal" vs. "LTO" >>> compiles, then "ThinLTO" can have its own pass pipeline as well. Seems >>> like it already needs one. Why not add a late pass there to drop >>> available_externally functions? >>> >> >> Sure that's possible, but I am trying to understand what circumstance >> there is where a full LTO would need to keep around available >> externally functions after GlobalDCE where ThinLTO wouldn't. >> Presumably if they are useful for optimization after GlobalDCE then we >> would want to keep them around longer for ThinLTO as well. I just >> haven't identified any need to do so. >> >> Thanks, >> Teresa >> >> >> -- >> Teresa Johnson | Software Engineer | tejohnson at google.com | 408-460-2413 >-- Teresa Johnson | Software Engineer | tejohnson at google.com | 408-460-2413
Duncan P. N. Exon Smith
2015-Jun-04 18:27 UTC
[LLVMdev] Removing AvailableExternal values in GlobalDCE (was Re: RFC: ThinLTO Impementation Plan)
> On 2015-Jun-04, at 10:13, Teresa Johnson <tejohnson at google.com> wrote: > > On Thu, Jun 4, 2015 at 9:51 AM, Duncan P. N. Exon Smith > <dexonsmith at apple.com> wrote: >> >>> On 2015-Jun-04, at 07:10, Teresa Johnson <tejohnson at google.com> wrote: >>> >>> On Wed, Jun 3, 2015 at 2:07 PM, Duncan P. N. Exon Smith >>> <dexonsmith at apple.com> wrote: >>>> >>>>> On 2015-Jun-03, at 09:56, Teresa Johnson <tejohnson at google.com> wrote: >>>>> >>>>> On Tue, May 19, 2015 at 1:18 PM, David Blaikie <dblaikie at gmail.com> wrote: >>>>> >>>>>> You might well be right - I know next to nothing about these things. Having >>>>>> someone a double-check/review from someone more familiar would be nice. >>>>>> >>>>>> I'll leave the rest of the review to any such someone who might be >>>>>> sufficiently informed about the pass pipeline to decide whether this is the >>>>>> right call. (and/or perf numbers that demonstrate it) >>>>>> >>>>>> Hopefully my naive review has at least covered the basics. >>>>> >>>>> I just did some SPEC cpu2006 performance testing with this change, at >>>>> -O2 both with and without LTO. There were no measurable performance >>>>> differences. In fact, the final binaries were largely identical >>>>> (always the same size before and after, generated code looks >>>>> identical, slight differences in label naming are the only differences >>>>> in the nm and objdump output for a couple benchmarks - most are >>>>> totally identical). >>>>> >>>>> Can someone take a look at the patch and approve if possible? While >>>>> currently this doesn't affect large numbers of functions, with ThinLTO >>>>> we want to avoid keeping around the larger numbers of >>>>> available-externally imported objects unnecessarily. This change >>>>> avoids the need of special casing here for ThinLTO imported >>>>> functions/variables. >>>> >>>> I'd be interested in performance of a clang bootstrap or of Chromium (or >>>> some other large C++ program built with -flto). >>>> >>>> Moreover, I suspect this will cause a problem for full LTO, where a lot >>>> of inlining takes place at link time. There have even been proposals >>>> (but no one has collected any data to share yet) to run just the >>>> always-inliner at compile time, deferring almost all inlining until >>>> later. >>> >>> Note that one of the sets of SPEC cpu2006 data I collected was with >>> full LTO. I'm trying to understand where this would be an issue for >>> full LTO, since GlobalDCE is running after inlining in the link time >>> LTO pipeline. >>> >>> Just to be clear, are you concerned about the following: >>> >>> 1) GlobalDCE run after inlining in the "-flto -O2 -c" build removes an >>> unreferenced available externally function >>> 2) LTO link step would have inlined that available externally function >>> somewhere (must be a different module since it was unreferenced in >>> step 1) but now can't >>> >>> I'm not sure when 2) can happen since it had to be unreferenced in its >>> module to be removed with my changes. >> >> That's not what it looks like to me. Here's what I think the relevant >> part of your patch is: >> >> On 2015-May-18, at 21:09, Teresa Johnson <tejohnson at google.com> wrote: >>> >>> @@ -231,8 +237,10 @@ void GlobalDCE::GlobalIsNeeded(GlobalValue *G) { >>> for (Function::iterator BB = F->begin(), E = F->end(); BB != E; ++BB) >>> for (BasicBlock::iterator I = BB->begin(), E = BB->end(); I != E; ++I) >>> for (User::op_iterator U = I->op_begin(), E = I->op_end(); U != E; ++U) >>> - if (GlobalValue *GV = dyn_cast<GlobalValue>(*U)) >>> - GlobalIsNeeded(GV); >>> + if (GlobalValue *GV = dyn_cast<GlobalValue>(*U)) { >>> + if (!GV->hasAvailableExternallyLinkage()) >>> + GlobalIsNeeded(GV); >>> + } >>> else if (Constant *C = dyn_cast<Constant>(*U)) >>> MarkUsedGlobalsAsNeeded(C); >>> } >> >> >> IIUC, this changes the logic from "if it's referenced, keep it" to >> "if it's referenced and not available_externally, keep it". > > Sorry, you're right, I conflated a couple of different things when I > looked at this again this morning. For ThinLTO I don't in fact need > this change for anything that is fully inlined after importing. > >> >> In particular, I'm worried about GlobalDCE removing a *referenced* >> available_externally function, particularly if/when we stop inlining at >> the compile step when -flto. > > Ok, that makes sense. In that case, I agree that doing this for > ThinLTO imported functions is the right way to go. My initial approach > was to mark the functions that were ThinLTO-imported and check that > here. I think we will likely want to mark the imported functions as > such regardless, since we may want to apply different optimization > thresholds for imported functions, or at least use for debugging and > tracing, so we could consider using that approach here. > > Another possibility as you mentioned was to drop in a new pass in the > ThinLTO backend optimization pipeline just for this. Or perhaps the > GlobalDCE pass invocation could take a flag indicating that it should > remove the avail extern functions, that we can set in the ThinLTO > backend case. It seems like most of the code would be shared between > the current GlobalDCE and any new ThinLTO-specific version that > setting GlobalDCE up to do this optionally (either per-imported > function or with a configuration flag on the pass) made sense.Since the compiler is always free to delete available_externally functions, I think you could just add a pass to the -flto=thin pipeline that deletes all of them (referenced or not) -- it's just a single loop through all the functions deleting the bodies of those with the right linkage. I imagine there are other pass pipelines that might want to do something similar. I don't really like having GlobalDCE delete them (even behind a flag) because they aren't actually dead, and I think a separate pass makes it easier to test and all that. (I haven't actually worked much with pass pipelines, though, so there's a chance I'm on my own here?) You make an interesting point about applying different thresholds to imported functions, but is there any reason not to change all available_externally functions in the some way? Moreover, it feels like thin-LTO's imported functions are a new source of functions with available_externally linkage, but otherwise they aren't interestingly different.
Teresa Johnson
2015-Jun-04 20:13 UTC
[LLVMdev] Removing AvailableExternal values in GlobalDCE (was Re: RFC: ThinLTO Impementation Plan)
On Thu, Jun 4, 2015 at 11:27 AM, Duncan P. N. Exon Smith <dexonsmith at apple.com> wrote:> >> On 2015-Jun-04, at 10:13, Teresa Johnson <tejohnson at google.com> wrote: >> >> On Thu, Jun 4, 2015 at 9:51 AM, Duncan P. N. Exon Smith >> <dexonsmith at apple.com> wrote: >>> >>> In particular, I'm worried about GlobalDCE removing a *referenced* >>> available_externally function, particularly if/when we stop inlining at >>> the compile step when -flto. >> >> Ok, that makes sense. In that case, I agree that doing this for >> ThinLTO imported functions is the right way to go. My initial approach >> was to mark the functions that were ThinLTO-imported and check that >> here. I think we will likely want to mark the imported functions as >> such regardless, since we may want to apply different optimization >> thresholds for imported functions, or at least use for debugging and >> tracing, so we could consider using that approach here. >> >> Another possibility as you mentioned was to drop in a new pass in the >> ThinLTO backend optimization pipeline just for this. Or perhaps the >> GlobalDCE pass invocation could take a flag indicating that it should >> remove the avail extern functions, that we can set in the ThinLTO >> backend case. It seems like most of the code would be shared between >> the current GlobalDCE and any new ThinLTO-specific version that >> setting GlobalDCE up to do this optionally (either per-imported >> function or with a configuration flag on the pass) made sense. > > Since the compiler is always free to delete available_externally > functions, I think you could just add a pass to the -flto=thin pipeline > that deletes all of them (referenced or not) -- it's just a single loop > through all the functions deleting the bodies of those with the right > linkage. I imagine there are other pass pipelines that might want to do > something similar. I don't really like having GlobalDCE delete them > (even behind a flag) because they aren't actually dead, and I think a > separate pass makes it easier to test and all that. (I haven't actually > worked much with pass pipelines, though, so there's a chance I'm on my > own here?)Ok, sounds reasonable to do as a separate ThinLTO specific pass.> > You make an interesting point about applying different thresholds to > imported functions, but is there any reason not to change all > available_externally functions in the some way? Moreover, it feels like > thin-LTO's imported functions are a new source of functions with > available_externally linkage, but otherwise they aren't interestingly > different. >I suspect those cases are going to be due to summary information recorded for the imported functions anyway, which we haven't defined the format of yet anyway. So I think we can separate that from this aspect of imported function handling, which can be based on the linkage type alone, and defer that discussion until later. Thanks for your help, Teresa -- Teresa Johnson | Software Engineer | tejohnson at google.com | 408-460-2413
Reid Kleckner
2015-Jun-04 21:37 UTC
[LLVMdev] Removing AvailableExternal values in GlobalDCE (was Re: RFC: ThinLTO Impementation Plan)
On Thu, Jun 4, 2015 at 11:27 AM, Duncan P. N. Exon Smith < dexonsmith at apple.com> wrote:> Since the compiler is always free to delete available_externally > functions, I think you could just add a pass to the -flto=thin pipeline > that deletes all of them (referenced or not) -- it's just a single loop > through all the functions deleting the bodies of those with the right > linkage. I imagine there are other pass pipelines that might want to do > something similar. I don't really like having GlobalDCE delete them > (even behind a flag) because they aren't actually dead, and I think a > separate pass makes it easier to test and all that. (I haven't actually > worked much with pass pipelines, though, so there's a chance I'm on my > own here?) >It's possible to get into situations where available_externally functions or globals (dllimported vftable) reference linkonce_odr functions (virtual destructor thunks), so a single pass to drop available_externally function bodies isn't as strong as adding a flag to GlobalDCE. Personally, I think the right approach is to add a bool to createGlobalDCEPass defaulting to true named something like IsAfterInlining. In most standard pass pipelines, GlobalDCE runs after inlining for obvious reasons, so the default makes sense. The special case is the LTO pass pipeline, where the code will be reloaded and reoptimized later. IMO it's natural to put the customization there. You make an interesting point about applying different thresholds to> imported functions, but is there any reason not to change all > available_externally functions in the some way? Moreover, it feels like > thin-LTO's imported functions are a new source of functions with > available_externally linkage, but otherwise they aren't interestingly > different.Right, ThinLTO functions and C99 inline functions aren't interestingly different. Having GlobalDCE drop bodies of available_externally functions in the standard -O2 pipeline avoids wasting time running IR passes like CodeGenPrepare on them. -------------- next part -------------- An HTML attachment was scrubbed... URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20150604/36d4c5f2/attachment.html>
Possibly Parallel Threads
- [LLVMdev] Removing AvailableExternal values in GlobalDCE (was Re: RFC: ThinLTO Impementation Plan)
- [LLVMdev] Removing AvailableExternal values in GlobalDCE (was Re: RFC: ThinLTO Impementation Plan)
- [LLVMdev] Removing AvailableExternal values in GlobalDCE (was Re: RFC: ThinLTO Impementation Plan)
- [LLVMdev] Removing AvailableExternal values in GlobalDCE (was Re: RFC: ThinLTO Impementation Plan)
- [LLVMdev] Removing AvailableExternal values in GlobalDCE (was Re: RFC: ThinLTO Impementation Plan)