Kristof Beyls
2015-Feb-26 10:33 UTC
[LLVMdev] [RFC] AArch64: Should we disable GlobalMerge?
Hi Ahmed,

Did you run these experiments on a platform with a linker that makes
use of the AArch64CollectLOH-pass-produced information?
I'm guessing that the AArch64CollectLOH-pass information and a linker
that makes use of that information could affect the profitability of
the GlobalMerge pass?

Thanks,

Kristof

> -----Original Message-----
> From: llvmdev-bounces at cs.uiuc.edu [mailto:llvmdev-bounces at cs.uiuc.edu]
> On Behalf Of Ahmed Bougacha
> Sent: 26 February 2015 01:13
> To: LLVM Dev
> Subject: Re: [LLVMdev] [RFC] AArch64: Should we disable GlobalMerge?
>
> With the numbers!
> -Ahmed
>
>
> On Wed, Feb 25, 2015 at 4:57 PM, Ahmed Bougacha
> <ahmed.bougacha at gmail.com> wrote:
> > Hi all,
> >
> > I've started looking at the GlobalMerge pass, enabled by default on
> > ARM and AArch64. I think we should reconsider that, at least for
> > AArch64.
> >
> > As is, the pass just merges all globals together, in groups of 4KB
> > (AArch64, 128B on ARM).
> >
> > At the time it was enabled, the general thinking was "it's almost
> > free, it doesn't affect performance much, we might as well use it".
> > Now, it's preventing some link-time optimizations (as acknowledged in
> > one of the FIXMEs).
> >
> >
> > -- Performance impact
> > Overall, it isn't that profitable on the test-suite, and actually
> > degrades performance on a lot of other - "non-benchmark" - projects I
> > tried (where the main reason to use a global is file- or
> > function-static variables, only accessed through a single getter
> > function).
> >
> > Across several runs on the entire test-suite, when disabling the pass,
> > I measured:
> >   without LTO, a -0.19% geomean improvement
> >   with LTO, a +0.11% geomean regression.
> >
> > As for just SPEC2006, there are two big regressions: 400.perlbench
> > (10.6% w/ LTO, 2.7% w/o) and 471.omnetpp (2.3% w/, 3.9% w/o).
> >
> > Numbers are attached.
> >
> >
> > -- A way forward
> > One obvious way to improve it is: look at uses of globals, and try to
> > form sets of globals commonly used together. The tricky part is to
> > define heuristics for "commonly". Also, the pass then becomes much
> > more expensive. I'm currently looking into improving it, and will
> > report if I come up with a good solution. But this shouldn't stop us
> > from disabling it, for now.
> >
> > Also, the pass seems like a good candidate for
> > -O3/CodeGenOpt::Aggressive. However, the latter is implied by LTO,
> > which IMO shouldn't include these not-always-profitable optimizations.
> > That's another problem though.
> >
> >
> > Right now, I think we should disable the pass by default, until it's
> > deemed profitable enough.
> >
> > -Ahmed
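As a rough source-level illustration of the transformation being discussed (the real pass operates on LLVM IR, and the names `MergedGlobalsTy`, `MergedGlobals`, `counter`, `limit`, and `flags` are hypothetical), merging turns separate internal globals into fields of one aggregate, so that a single base address serves several accesses at constant offsets:

```cpp
#include <cassert>
#include <cstddef>

// Before: three independent file-static globals. Each access needs the
// address of that particular symbol.
static int counter = 1;
static int limit = 2;
static int flags = 3;

int sumSeparate() { return counter + limit + flags; }

// After (conceptually): the globals become fields of one internal
// aggregate. One base address is computed once; fields sit at fixed offsets.
struct MergedGlobalsTy {
  int counter;
  int limit;
  int flags;
};
static MergedGlobalsTy MergedGlobals = {1, 2, 3};

// The third field lives two ints past the shared base.
static_assert(offsetof(MergedGlobalsTy, flags) == 2 * sizeof(int),
              "fields are addressed as base + constant offset");

int sumMerged() {
  const MergedGlobalsTy *Base = &MergedGlobals; // single address computation
  return Base->counter + Base->limit + Base->flags;
}

// Both forms compute the same value; only the addressing differs.
void checkEquivalent() { assert(sumSeparate() == sumMerged()); }
```

The trade-off raised in the proposal follows from this shape: the shared base is only a win when several merged globals are used near each other, and the original symbols are no longer individually visible to link-time optimizations.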
Jim Grosbach
2015-Feb-27 20:42 UTC
[LLVMdev] [RFC] AArch64: Should we disable GlobalMerge?
Hi Kristof,

Our tests are on iOS, which definitely uses the LOH optimizations for
ARM64.

-Jim
Ahmed Bougacha
2015-Feb-27 21:26 UTC
[LLVMdev] [RFC] AArch64: Should we disable GlobalMerge?
On Thu, Feb 26, 2015 at 2:33 AM, Kristof Beyls <kristof.beyls at arm.com> wrote:
> Hi Ahmed,
>
> Did you run these experiments on a platform with a linker that makes
> use of the AArch64CollectLOH-pass-produced information?

As Jim says, I'm on iOS, so yes. However, I'm mostly running tests
with the pass disabled.

> I'm guessing that the AArch64CollectLOH-pass information and a linker
> that makes use of that information could affect the profitability of
> the GlobalMerge pass?

It could, and does, from what I've seen (beware anecdata):
- reusing the adrp base prevents optimizing it (the various
  Adrp*{ldr,str} LOHs).
- reusing the adrp+add MergedGlobal pointer, with indexed addressing,
  doesn't prevent the AdrpAdd optimization.

All in all, whether GlobalMerge is profitable or not (by increasing
register pressure, or adding another indirection), whenever the LOH
optimizations fire, they reduce its usefulness.

AFAICT, the only case where LOHs help GlobalMerge is when the
MergedGlobal base is closer to the adrp sequence than the actual
global. Given that we only merge 4k of globals, on a 1MB range this
doesn't happen very often.

Which brings us to my fallback proposal: what about disabling the
pass on darwin only? Various darwin-enabled features (e.g., LOHs)
help mitigate the adrp problem, and global usage is usually frowned
upon in those circles (except for singletons, class-/function-statics
and whatnot, which I'm trying to address in an upcoming patch).

As for other targets, as a first step, making the pass run under -O3
rather than -O1 is hopefully agreeable to everyone? After all, it is
"aggressive", and isn't always profitable. That's pretty much the
description of -O3.
We can still run into problematic cases under LTO, though.

-Ahmed
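The "1MB range" remark in Ahmed's reply can be made concrete with a small sketch. It assumes the standard A64 encodings (adr takes a signed 21-bit byte offset, adrp a signed 21-bit offset counted in 4KB pages); the function names, including `mergeHelpsLoh`, are hypothetical and are not taken from LLVM or any linker:

```cpp
#include <cstdint>

// ADR encodes a signed 21-bit byte offset from the PC, so it reaches
// targets in [PC - 1MB, PC + 1MB).
constexpr bool fitsAdr(std::uint64_t PC, std::uint64_t Target) {
  const std::int64_t Delta = static_cast<std::int64_t>(Target - PC);
  return Delta >= -(std::int64_t{1} << 20) && Delta < (std::int64_t{1} << 20);
}

// ADRP encodes a signed 21-bit offset counted in 4KB pages, relative to
// the page containing the PC (roughly +/-4GB).
constexpr bool fitsAdrp(std::uint64_t PC, std::uint64_t Target) {
  const std::int64_t PageDelta =
      static_cast<std::int64_t>((Target >> 12) - (PC >> 12));
  return PageDelta >= -(std::int64_t{1} << 20) &&
         PageDelta < (std::int64_t{1} << 20);
}

// Hypothetical helper for the one case described in the reply where merging
// helps the LOHs: the merged base is reachable with a single adr while the
// original global is not.
constexpr bool mergeHelpsLoh(std::uint64_t PC, std::uint64_t Global,
                             std::uint64_t MergedBase) {
  return fitsAdr(PC, MergedBase) && !fitsAdr(PC, Global);
}

static_assert(fitsAdr(0x200000, 0x200000 + 0xFFFFF), "just inside +1MB");
static_assert(!fitsAdr(0x200000, 0x200000 + 0x100000), "exactly +1MB is out");
```

Since a merged group spans at most 4KB, the merged base and the original global are usually on the same side of the 1MB boundary, which is consistent with the observation that this case is rare.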
Eric Christopher
2015-Feb-27 21:42 UTC
[LLVMdev] [RFC] AArch64: Should we disable GlobalMerge?
On Fri, Feb 27, 2015 at 1:38 PM Ahmed Bougacha <ahmed.bougacha at gmail.com> wrote:
> Which brings us to my fallback proposal: what about disabling the
> pass on darwin only? Various darwin-enabled features (e.g., LOHs)
> help mitigate the adrp problem, and global usage is usually frowned
> upon in those circles (except for singletons, class-/function-statics
> and whatnot, which I'm trying to address in an upcoming patch).

Before disabling the pass on darwin only, I'd like to see some
analysis of the regressions/improvements. Has anyone looked at the
code for those yet?

> As for other targets, as a first step, making the pass run under -O3
> rather than -O1 is hopefully agreeable to everyone? After all, it is
> "aggressive", and isn't always profitable. That's pretty much the
> description of -O3.
> We can still run into problematic cases under LTO, though.

Seems reasonable to me, but we probably want to see what happens with
the above questions first.

-eric
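For anyone reproducing the analysis asked for here, the geomean figures quoted in the thread are summaries of per-benchmark ratios. The sketch below shows one common way to compute such a figure; the function name `geomeanChangePercent` and the convention of dividing the new time by the baseline time are assumptions, not details taken from the attached numbers:

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Geometric mean of per-benchmark ratios (new time / baseline time),
// reported as a percentage change. Positive means slower overall.
double geomeanChangePercent(const std::vector<double> &Ratios) {
  if (Ratios.empty())
    return 0.0;
  double LogSum = 0.0;
  for (double R : Ratios) {
    assert(R > 0.0 && "ratios must be positive");
    LogSum += std::log(R); // summing logs avoids overflow in the product
  }
  const double Mean = std::exp(LogSum / static_cast<double>(Ratios.size()));
  return (Mean - 1.0) * 100.0;
}
```

With invented inputs, ratios of 2.0 and 0.5 give a 0% geomean change even though each benchmark moved by a factor of two. This is why a near-zero geomean can coexist with large individual swings such as the perlbench one, and why looking at the individual regressions is still worthwhile.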
Renato Golin
2015-Feb-27 22:01 UTC
[LLVMdev] [RFC] AArch64: Should we disable GlobalMerge?
On 27 February 2015 at 21:26, Ahmed Bougacha <ahmed.bougacha at gmail.com> wrote:
> Which brings us to my fallback proposal: what about disabling the
> pass on darwin only?

That's a decision for Jim/Evan. I'm ok if they are.

> As for other targets, as a first step, making the pass run under -O3
> rather than -O1 is hopefully agreeable to everyone?

Sounds reasonable. Even though it conflicts with LTO, that's what O3
means, as you said: instability. People at O3 might want to fiddle with
the passes (on/off) to get the best performance for their own
code/workload.

cheers,
--renato