Quentin Colombet
2015-May-27 22:14 UTC
[LLVMdev] [Shrink-Wrapping] Request For Benchmarking: X86 and AArch64
Hi,

Shrink-wrapping capabilities, i.e., better placement of prologue and epilogue sequences, landed in r236507 but are not yet enabled by default.

Since r236507, AArch64 is shrink-wrapping ready, meaning we can turn the pass on for this target.
I’ve done the same for X86 in r238293.

Now, I need your help to test and benchmark how shrink-wrapping performs on those targets.

The goal is to decide whether or not the support is good enough to be enabled by default.

** How Can I Test/Benchmark It? **

Add (-mllvm) -enable-shrink-wrap on your command line or patch the XXXConfigPass to set EnableShrinkWrap to true.
Note that -enable-shrink-wrap=<bool> takes precedence over whatever is set for EnableShrinkWrap.

Please report any problem specific to this optimization being turned on. A PR with a small IR reproducer is appreciated.

Note: I’ve seen up to 4% runtime improvements on the LLVM test-suite + specs for Os and O3.

Thanks in advance for your help,
-Quentin
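For anyone scripting an A/B run, the instructions above can be sketched roughly as follows. This is only an illustration, not part of the patch: the helper names (shrink_wrap_enabled, clang_command) and the source file name bench.c are made up for the example, and the precedence function is a plain model of the rule described above, not the actual LLVM code.

```python
from typing import List, Optional


def shrink_wrap_enabled(cli_flag: Optional[bool], target_default: bool) -> bool:
    """Model of the precedence rule (hypothetical helper, not LLVM code):
    an explicit -enable-shrink-wrap=<bool> wins; otherwise the target's
    EnableShrinkWrap setting applies."""
    return target_default if cli_flag is None else cli_flag


def clang_command(source: str, opt_level: str, shrink_wrap: Optional[bool]) -> List[str]:
    """Build a clang invocation; shrink_wrap=None leaves the flag off entirely."""
    cmd = ["clang", f"-{opt_level}", "-c", source]
    if shrink_wrap is not None:
        value = "true" if shrink_wrap else "false"
        # -mllvm forwards the option that follows it to the LLVM backend.
        cmd += ["-mllvm", f"-enable-shrink-wrap={value}"]
    return cmd


if __name__ == "__main__":
    # bench.c is a placeholder file name.
    for level in ("Os", "O3"):
        print(" ".join(clang_command("bench.c", level, None)))  # baseline
        print(" ".join(clang_command("bench.c", level, True)))  # shrink-wrapping on
```

Building each benchmark once per command line and comparing run times and code size gives the kind of data requested here.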
Chandler Carruth
2015-Jul-14 22:58 UTC
[LLVMdev] [Shrink-Wrapping] Request For Benchmarking: X86 and AArch64
I've run it across a wide variety of server benchmarks we care about. Looks like all the changes are in the noise across sandybridge and ivybridge architectures. No interesting performance changes (in either direction sadly).

I saw some very minor size fluctuations and dug into it. Turns out there was a missed easy size optimization in it that Quentin has already implemented based on our conversation on IRC.

As far as I can see, this is pure goodness. Let's turn it on.
Renato Golin
2015-Jul-15 09:21 UTC
[LLVMdev] [Shrink-Wrapping] Request For Benchmarking: X86 and AArch64
Hi Quentin,

This is interesting, I was meaning to look at that at some point. Glad you did it. :)

On 27 May 2015 at 23:14, Quentin Colombet <qcolombet at apple.com> wrote:
> Note: I’ve seen up to 4% runtime improvements on the LLVM test-suite + specs for Os and O3.

Which target?

cheers,
--renato
Renato Golin
2015-Jul-15 12:12 UTC
[LLVMdev] [Shrink-Wrapping] Request For Benchmarking: X86 and AArch64
On 27 May 2015 at 23:14, Quentin Colombet <qcolombet at apple.com> wrote:
> Note: I’ve seen up to 4% runtime improvements on the LLVM test-suite + specs for Os and O3.

I just did a quick test-suite run on AArch64 and I'm getting 1% worse overall in the benchmark set. There were cases over 50% worse (TSVC/Equivalencing-dbl, TSVC/Symbolics-dbl, Polybench/medley/reg_detect) and the best case was only 30% better (ASC_Sequoia/IRSmk).

Geomean difference of compile time is within noise range.
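As a rough illustration of how a few large swings can still add up to a small overall figure, here is a minimal sketch of a geometric-mean comparison. The function name geomean_ratio and every number below are hypothetical; none of them are measurements from this thread, and the test-suite's own reporting may aggregate differently.

```python
import math
from typing import Dict


def geomean_ratio(baseline: Dict[str, float], candidate: Dict[str, float]) -> float:
    """Geometric mean of candidate/baseline run times over the shared benchmarks.
    A value above 1.0 means the candidate is slower overall."""
    names = sorted(baseline.keys() & candidate.keys())
    if not names:
        raise ValueError("no benchmarks in common")
    log_sum = sum(math.log(candidate[n] / baseline[n]) for n in names)
    return math.exp(log_sum / len(names))


if __name__ == "__main__":
    # Hypothetical run times in seconds, NOT data from this thread.
    base = {"a": 10.0, "b": 4.0, "c": 2.0, "d": 8.0}
    with_shrink_wrap = {"a": 15.0, "b": 4.0, "c": 2.0, "d": 5.6}
    print(f"geomean ratio: {geomean_ratio(base, with_shrink_wrap):.3f}")
```

In this made-up data set one benchmark is 50% slower and another 30% faster, yet the geomean moves by only about a percent, so the per-benchmark diffs are worth inspecting individually.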
Chad Rosier
2015-Jul-15 14:54 UTC
[LLVMdev] [Shrink-Wrapping] Request For Benchmarking: X86 and AArch64
On an a57 device for Spec2000 with -O3 I'm seeing no significant performance changes across the board. With that being said, I think this is pretty cool stuff. Thanks for working on this Quentin.

Chad
Quentin Colombet
2015-Jul-15 16:56 UTC
[LLVMdev] [Shrink-Wrapping] Request For Benchmarking: X86 and AArch64
Hi Renato,

> On Jul 15, 2015, at 2:21 AM, Renato Golin <renato.golin at linaro.org> wrote:
>
> Hi Quentin,
>
> This is interesting, I was meaning to look at that at some point. Glad
> you did it. :)

You’re welcome :).

> On 27 May 2015 at 23:14, Quentin Colombet <qcolombet at apple.com> wrote:
>> Note: I’ve seen up to 4% runtime improvements on the LLVM test-suite + specs for Os and O3.
>
> Which target?

That was AArch64.

Cheers,
Q.
Quentin Colombet
2015-Jul-15 17:01 UTC
[LLVMdev] [Shrink-Wrapping] Request For Benchmarking: X86 and AArch64
> On Jul 15, 2015, at 5:12 AM, Renato Golin <renato.golin at linaro.org> wrote:
>
> I just did a quick test-suite run on AArch64 and I'm getting 1% worse
> overall in the benchmark set. There were cases over 50% worse
> (TSVC/Equivalencing-dbl, TSVC/Symbolics-dbl,
> Polybench/medley/reg_detect) and the best case was only 30% better
> (ASC_Sequoia/IRSmk).

Strange. I haven’t seen swings that big, and the overall performance difference was in the noise but better.

Would you mind looking more closely at the diff and filing a PR, or giving me the command line to reproduce?

Thanks,
-Quentin