Eric Christopher
2015-Feb-27 21:42 UTC
[LLVMdev] [RFC] AArch64: Should we disable GlobalMerge?
On Fri, Feb 27, 2015 at 1:38 PM Ahmed Bougacha <ahmed.bougacha at gmail.com> wrote:

> On Thu, Feb 26, 2015 at 2:33 AM, Kristof Beyls <kristof.beyls at arm.com>
> wrote:
> >
> > Hi Ahmed,
> >
> > Did you run these experiments on a platform with a linker that makes
> > use of the AArch64CollectLOH-pass-produced information?
>
> As Jim says, I'm on iOS, so yes. However, I'm mostly running tests
> with the pass disabled.
>
> >
> > I'm guessing that the AArch64CollectLOH-pass information and a linker
> > that makes use of that information could affect the profitability of
> > the GlobalMerge pass?
>
> It could, and does, from what I've seen (beware anecdata):
> - reusing the adrp base prevents optimizing it (the various
>   Adrp*{ldr,str} LOHs).
> - reusing the adrp+add MergedGlobal pointer, with indexed addressing,
>   doesn't prevent the AdrpAdd optimization.
>
> All in all, whether GlobalMerge is profitable or not (by increasing
> register pressure, or adding another indirection), whenever the LOH
> optimizations fire, they reduce its usefulness.
>
> AFAICT, the only case where LOHs help GlobalMerge is when the
> MergedGlobal base is closer to the adrp sequence than the actual
> global. Given that we only merge 4k of globals, on a 1MB range this
> doesn't happen very often.
>
> Which brings us to my fallback proposal: what about disabling the
> pass on darwin only? Various darwin-enabled features (e.g., LOHs)
> help mitigate the adrp problem, and global usage is usually frowned
> upon in those circles (except for singletons, class-/function-statics
> and whatnot, which I'm trying to address in an upcoming patch).

Before making the disabling darwin only I'd like to see some analysis of the
regressions/improvements. Has anyone looked at the code for those yet?

> As for other targets, as a first step, making the pass run under -O3
> rather than -O1 is hopefully agreeable to everyone? After all, it is
> "aggressive", and isn't always profitable. That's pretty much the
> description of -O3.
> We can still run into problematic cases under LTO, though.

Seems reasonable to me, but probably want to see what happens with the above
questions first.

-eric

> -Ahmed
>
> > Thanks,
> >
> > Kristof
> >
> > > -----Original Message-----
> > > From: llvmdev-bounces at cs.uiuc.edu [mailto:llvmdev-bounces at cs.uiuc.edu]
> > > On Behalf Of Ahmed Bougacha
> > > Sent: 26 February 2015 01:13
> > > To: LLVM Dev
> > > Subject: Re: [LLVMdev] [RFC] AArch64: Should we disable GlobalMerge?
> > >
> > > With the numbers!
> > > -Ahmed
> > >
> > > On Wed, Feb 25, 2015 at 4:57 PM, Ahmed Bougacha
> > > <ahmed.bougacha at gmail.com> wrote:
> > > > Hi all,
> > > >
> > > > I've started looking at the GlobalMerge pass, enabled by default on
> > > > ARM and AArch64. I think we should reconsider that, at least for
> > > > AArch64.
> > > >
> > > > As is, the pass just merges all globals together, in groups of 4KB
> > > > (AArch64, 128B on ARM).
> > > >
> > > > At the time it was enabled, the general thinking was "it's almost
> > > > free, it doesn't affect performance much, we might as well use it".
> > > > Now, it's preventing some link-time optimizations (as acknowledged in
> > > > one of the FIXMEs).
> > > >
> > > > -- Performance impact
> > > > Overall, it isn't that profitable on the test-suite, and actually
> > > > degrades performance on a lot of other - "non-benchmark" - projects I
> > > > tried (where the main reason to use a global is file- or
> > > > function-static variables, only accessed through a single getter
> > > > function).
> > > >
> > > > Across several runs on the entire test-suite, when disabling the pass,
> > > > I measured:
> > > > - without LTO, a -0.19% geomean improvement
> > > > - with LTO, a +0.11% geomean regression.
> > > >
> > > > As for just SPEC2006, there are two big regressions: 400.perlbench
> > > > (10.6% w/ LTO, 2.7% w/o) and 471.omnetpp (2.3% w/, 3.9% w/o).
> > > >
> > > > Numbers are attached.
> > > >
> > > > -- A way forward
> > > > One obvious way to improve it is: look at uses of globals, and try to
> > > > form sets of globals commonly used together. The tricky part is to
> > > > define heuristics for "commonly". Also, the pass then becomes much
> > > > more expensive. I'm currently looking into improving it, and will
> > > > report if I come up with a good solution. But this shouldn't stop us
> > > > from disabling it, for now.
> > > >
> > > > Also, the pass seems like a good candidate for
> > > > -O3/CodeGenOpt::Aggressive. However, the latter is implied by LTO,
> > > > which IMO shouldn't include these not-always-profitable optimizations.
> > > > That's another problem though.
> > > >
> > > > Right now, I think we should disable the pass by default, until it's
> > > > deemed profitable enough.
> > > >
> > > > -Ahmed
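The "form sets of globals commonly used together" idea from the original RFC can be illustrated with a small sketch. This is not the GlobalMerge implementation and not a heuristic anyone in the thread proposed: the function name `co_use_groups`, the `min_shared` threshold, and the sample global names are all hypothetical, chosen only to show one possible reading of "commonly" (two globals are grouped when at least `min_shared` functions reference both).

```python
from collections import defaultdict
from itertools import combinations


def co_use_groups(uses_by_function, min_shared=2):
    """Group globals referenced together by at least `min_shared` functions.

    `uses_by_function` maps a function name to the globals it references.
    Both the input shape and the threshold are assumptions for illustration.
    """
    # Count, for every pair of globals, how many functions reference both.
    pair_counts = defaultdict(int)
    for used in uses_by_function.values():
        for a, b in combinations(sorted(set(used)), 2):
            pair_counts[(a, b)] += 1

    # Union-find over globals: link every pair that clears the threshold.
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for used in uses_by_function.values():
        for g in used:
            find(g)  # register every global, even ones never linked
    for (a, b), count in pair_counts.items():
        if count >= min_shared:
            parent[find(a)] = find(b)

    groups = defaultdict(set)
    for g in parent:
        groups[find(g)].add(g)
    # Only sets of two or more globals are candidates for merging.
    return sorted(sorted(s) for s in groups.values() if len(s) > 1)


# Hypothetical usage data: which functions touch which globals.
uses = {
    "parse": ["g_buf", "g_len", "g_err"],
    "emit": ["g_buf", "g_len"],
    "log": ["g_err"],
    "get_singleton": ["g_instance"],
}
groups = co_use_groups(uses)
print(groups)
```

With this data, only `g_buf` and `g_len` are grouped, while the getter-only `g_instance` is left alone, which is the case the RFC describes as hurt by unconditional merging. Counting pairs is quadratic in the number of globals per function, which hints at why the RFC expects a use-aware pass to be more expensive.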
Ahmed Bougacha
2015-Feb-27 22:13 UTC
[LLVMdev] [RFC] AArch64: Should we disable GlobalMerge?
On Fri, Feb 27, 2015 at 1:42 PM, Eric Christopher <echristo at gmail.com> wrote:
>
> On Fri, Feb 27, 2015 at 1:38 PM Ahmed Bougacha <ahmed.bougacha at gmail.com>
> wrote:
>>
>> On Thu, Feb 26, 2015 at 2:33 AM, Kristof Beyls <kristof.beyls at arm.com>
>> wrote:
>> >
>> > Hi Ahmed,
>> >
>> > Did you run these experiments on a platform with a linker that makes
>> > use of the AArch64CollectLOH-pass-produced information?
>>
>> As Jim says, I'm on iOS, so yes. However, I'm mostly running tests
>> with the pass disabled.
>>
>> >
>> > I'm guessing that the AArch64CollectLOH-pass information and a linker
>> > that makes use of that information could affect the profitability of
>> > the GlobalMerge pass?
>>
>> It could, and does, from what I've seen (beware anecdata):
>> - reusing the adrp base prevents optimizing it (the various
>>   Adrp*{ldr,str} LOHs).
>> - reusing the adrp+add MergedGlobal pointer, with indexed addressing,
>>   doesn't prevent the AdrpAdd optimization.
>>
>> All in all, whether GlobalMerge is profitable or not (by increasing
>> register pressure, or adding another indirection), whenever the LOH
>> optimizations fire, they reduce its usefulness.
>>
>> AFAICT, the only case where LOHs help GlobalMerge is when the
>> MergedGlobal base is closer to the adrp sequence than the actual
>> global. Given that we only merge 4k of globals, on a 1MB range this
>> doesn't happen very often.
>>
>> Which brings us to my fallback proposal: what about disabling the
>> pass on darwin only? Various darwin-enabled features (e.g., LOHs)
>> help mitigate the adrp problem, and global usage is usually frowned
>> upon in those circles (except for singletons, class-/function-statics
>> and whatnot, which I'm trying to address in an upcoming patch).
>>
>
> Before making the disabling darwin only I'd like to see some analysis of the
> regressions/improvements. Has anyone looked at the code for those yet?

Yep, I put a quick analysis in my other reply.

>>
>> As for other targets, as a first step, making the pass run under -O3
>> rather than -O1 is hopefully agreeable to everyone? After all, it is
>> "aggressive", and isn't always profitable. That's pretty much the
>> description of -O3.
>> We can still run into problematic cases under LTO, though.
>>
>
> Seems reasonable to me, but probably want to see what happens with the above
> questions first.

Fair enough. Bottom line is:
- disabling it without LTO is a slight win on the test-suite, a solid
  win everywhere else I've looked.
- disabling it with LTO regresses quite a few SPEC benchmarks, and is
  overall a slight regression on the test-suite.

-Ahmed

> -eric
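The test-suite results in this thread are summarized as geomean changes. As a reminder of how such a figure is typically derived, here is a minimal sketch. The ratios below are made up for illustration and are not the attached numbers, and the convention that a ratio is "time with the pass disabled divided by time with it enabled" is an assumption, since the thread does not restate how its percentages were computed.

```python
import math


def geomean_change(ratios):
    """Geometric mean of per-benchmark ratios, as a percentage change.

    Each ratio is assumed to be new_time / baseline_time, so a negative
    result means the new configuration is faster on (geometric) average.
    """
    if not ratios or any(r <= 0 for r in ratios):
        raise ValueError("ratios must be non-empty and strictly positive")
    # Average in log space so a 2x slowdown and a 2x speedup cancel out.
    log_mean = sum(math.log(r) for r in ratios) / len(ratios)
    return (math.exp(log_mean) - 1.0) * 100.0


# Hypothetical per-benchmark ratios, not data from the thread.
sample_ratios = [0.98, 1.00, 1.03, 0.99]
print(f"{geomean_change(sample_ratios):+.2f}%")
```

A geomean near zero, like the figures quoted in the thread, can still hide large per-benchmark swings, which is why the individual SPEC regressions are called out separately.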
Eric Christopher
2015-Feb-27 22:21 UTC
[LLVMdev] [RFC] AArch64: Should we disable GlobalMerge?
On Fri, Feb 27, 2015 at 2:13 PM Ahmed Bougacha <ahmed.bougacha at gmail.com> wrote:

> On Fri, Feb 27, 2015 at 1:42 PM, Eric Christopher <echristo at gmail.com>
> wrote:
> >
> > On Fri, Feb 27, 2015 at 1:38 PM Ahmed Bougacha <ahmed.bougacha at gmail.com>
> > wrote:
> >>
> >> On Thu, Feb 26, 2015 at 2:33 AM, Kristof Beyls <kristof.beyls at arm.com>
> >> wrote:
> >> >
> >> > Hi Ahmed,
> >> >
> >> > Did you run these experiments on a platform with a linker that makes
> >> > use of the AArch64CollectLOH-pass-produced information?
> >>
> >> As Jim says, I'm on iOS, so yes. However, I'm mostly running tests
> >> with the pass disabled.
> >>
> >> >
> >> > I'm guessing that the AArch64CollectLOH-pass information and a linker
> >> > that makes use of that information could affect the profitability of
> >> > the GlobalMerge pass?
> >>
> >> It could, and does, from what I've seen (beware anecdata):
> >> - reusing the adrp base prevents optimizing it (the various
> >>   Adrp*{ldr,str} LOHs).
> >> - reusing the adrp+add MergedGlobal pointer, with indexed addressing,
> >>   doesn't prevent the AdrpAdd optimization.
> >>
> >> All in all, whether GlobalMerge is profitable or not (by increasing
> >> register pressure, or adding another indirection), whenever the LOH
> >> optimizations fire, they reduce its usefulness.
> >>
> >> AFAICT, the only case where LOHs help GlobalMerge is when the
> >> MergedGlobal base is closer to the adrp sequence than the actual
> >> global. Given that we only merge 4k of globals, on a 1MB range this
> >> doesn't happen very often.
> >>
> >> Which brings us to my fallback proposal: what about disabling the
> >> pass on darwin only? Various darwin-enabled features (e.g., LOHs)
> >> help mitigate the adrp problem, and global usage is usually frowned
> >> upon in those circles (except for singletons, class-/function-statics
> >> and whatnot, which I'm trying to address in an upcoming patch).
> >>
> >
> > Before making the disabling darwin only I'd like to see some analysis of
> > the regressions/improvements. Has anyone looked at the code for those yet?
>
> Yep, I put a quick analysis in my other reply.

The LOH/ADRP bit?

> >>
> >> As for other targets, as a first step, making the pass run under -O3
> >> rather than -O1 is hopefully agreeable to everyone? After all, it is
> >> "aggressive", and isn't always profitable. That's pretty much the
> >> description of -O3.
> >> We can still run into problematic cases under LTO, though.
> >>
> >
> > Seems reasonable to me, but probably want to see what happens with the
> > above questions first.
>
> Fair enough. Bottom line is:
> - disabling it without LTO is a slight win on the test-suite, a solid
>   win everywhere else I've looked.
> - disabling it with LTO regresses quite a few SPEC benchmarks, and is
>   overall a slight regression on the test-suite.

Ah, I meant an analysis of the code, not just the numbers. I think the
ADRP/LOH commentary really helps. It might only be a decent LTOish
optimization, but I'm still curious how it's helping there over other
optimizations.

Anyhow, FWIW I'm in favor of pulling it out of the non-LTO pipeline
universally.

-eric

> -Ahmed
>
> > -eric