On CINT2006 ARM64/ref input/lto+pgo I measure practically no performance difference for the 7 benchmarks that compile. This includes bzip2 (although a different source base than in CINT2000), mcf, hmmer, sjeng, h264ref, astar, xalancbmk.

On Sep 15, 2014, at 11:59 AM, Hal Finkel <hfinkel at anl.gov> wrote:

> ----- Original Message -----
>> From: "Gerolf Hoflehner" <ghoflehner at apple.com>
>> To: "Jiangning Liu" <liujiangning1 at gmail.com>, "George Burgess IV" <george.burgess.iv at gmail.com>, "Hal Finkel" <hfinkel at anl.gov>
>> Cc: "LLVM Dev" <llvmdev at cs.uiuc.edu>
>> Sent: Sunday, September 14, 2014 12:15:02 AM
>> Subject: Re: [LLVMdev] Testing the new CFL alias analysis
>>
>> In lto+pgo, some of the CINT2006 benchmarks don't compile (5 out of 12, with usual suspects like perlbench and gcc among them, using -flto -Wl,-mllvm,-use-cfl-aa -Wl,-mllvm,-use-cfl-aa-in-codegen).
>
> On what platform? Could you bugpoint it and file a report?

Ok, I'll see that I can get a small test case.

>> Has the implementation been tested with lto?
>
> I've not.
>
>> If not, please stress the implementation more.
>> Do we know reasons for gains? Where did you expect the biggest gains?
>
> I don't want to make a global statement here. My expectation is that we'll see wins from increasing register pressure ;) -- hoisting more loads out of loops (there are certainly cases involving multiple levels of dereferencing and insert/extract instructions where CFL can provide a NoAlias answer where BasicAA gives up). Obviously, we'll also have problems if we increase pressure too much.

Maybe. But I'd prefer the OoO HW to handle hoisting; it is hard to tune in the compiler.
I'm also curious about the impact on loop transformations.

>> Some of the losses will likely boil down to increased register pressure.
>
> Agreed.
>>
>> Looks like the current performance numbers pose a good challenge for gaining new and refreshing insights into our heuristics (and for smoothing out the implementation along the way).
>
> It certainly seems that way.
>
> Thanks again,
> Hal
>
>> Cheers
>> Gerolf
>>
>> On Sep 12, 2014, at 1:27 AM, Jiangning Liu <liujiangning1 at gmail.com> wrote:
>>
>> Hi Hal,
>>
>> I ran SPEC2000 on cortex-a57 (AArch64) and got the following results
>> (these measure run-time change; negative is better performance):
>>
>> spec.cpu2000.ref.183_equake    33.77%
>> spec.cpu2000.ref.179_art       13.44%
>> spec.cpu2000.ref.256_bzip2      7.80%
>> spec.cpu2000.ref.186_crafty     3.69%
>> spec.cpu2000.ref.175_vpr        2.96%
>> spec.cpu2000.ref.176_gcc        1.77%
>> spec.cpu2000.ref.252_eon        1.77%
>> spec.cpu2000.ref.254_gap        1.19%
>> spec.cpu2000.ref.197_parser     1.15%
>> spec.cpu2000.ref.253_perlbmk    1.11%
>> spec.cpu2000.ref.300_twolf     -1.04%
>>
>> So we can see almost all got worse performance.
>>
>> The command line options I'm using are "-O3 -std=gnu89 -ffast-math -fslp-vectorize -fvectorize -mcpu=cortex-a57 -mllvm -use-cfl-aa -mllvm -use-cfl-aa-in-codegen"
>>
>> I didn't try compile time, and I think your test on a POWER7 native build should already mean something for other hosts. Also, I don't have a good benchmark suite for compile-time testing. My past experience showed that both the llvm-test-suite (single/multiple source) and the SPEC benchmarks are not good benchmarks for compile-time testing.
>>
>> Thanks,
>> -Jiangning
>>
>> 2014-09-04 1:11 GMT+08:00 Hal Finkel <hfinkel at anl.gov>:
>>
>> Hello everyone,
>>
>> One of Google's summer interns, George Burgess IV, created an implementation of the CFL pointer-aliasing analysis algorithm, and this has now been added to LLVM trunk. Now we should determine whether it is worthwhile adding this to the default optimization pipeline.
>> For ease of testing, I've added the command line option -use-cfl-aa, which will cause the CFL analysis to be added to the optimization pipeline. This can be used with the opt program, and also via Clang by passing: -mllvm -use-cfl-aa.
>>
>> For the purpose of testing with those targets that make use of aliasing analysis during code generation, there is also a corresponding -use-cfl-aa-in-codegen option.
>>
>> Running the test suite on one of our IBM POWER7 systems (comparing -O3 -mcpu=native to -O3 -mcpu=native -mllvm -use-cfl-aa -mllvm -use-cfl-aa-in-codegen [testing without use in code generation was essentially the same]), I see no significant compile-time changes, and the following performance results:
>>
>> speedup:
>> MultiSource/Benchmarks/mafft/pairlocalalign: -11.5862% +/- 5.9257%
>>
>> slowdown:
>> MultiSource/Benchmarks/FreeBench/neural/neural: 158.679% +/- 22.3212%
>> MultiSource/Benchmarks/MiBench/consumer-typeset/consumer-typeset: 0.627176% +/- 0.290698%
>> MultiSource/Benchmarks/Ptrdist/ks/ks: 57.5457% +/- 21.8869%
>>
>> I ran the test suite 20 times in each configuration, using make -j48 each time, so I'll only pick up large changes. I've not yet investigated the cause of the slowdowns (or the speedup), and I really need people to try this on x86, ARM, etc. It appears, however, that the better aliasing analysis results might have some negative unintended consequences, and we'll need to look at those closely.
>>
>> Please let me know how this fares on your systems!
>>
>> Thanks again,
>> Hal
>>
>> --
>> Hal Finkel
>> Assistant Computational Scientist
>> Leadership Computing Facility
>> Argonne National Laboratory
>> _______________________________________________
>> LLVM Developers mailing list
>> LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu
>> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
>
> --
> Hal Finkel
> Assistant Computational Scientist
> Leadership Computing Facility
> Argonne National Laboratory
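Hal's remark above about hoisting loads out of loops when the analysis can prove NoAlias can be illustrated with a small hypothetical C fragment. The struct names and function here are invented for illustration and do not come from the thread; they merely show the shape of code with multiple levels of dereferencing where a conservative analysis must assume the store could modify the pointed-to data:

```c
#include <assert.h>
#include <stddef.h>

/* Illustrative only: with two levels of dereferencing, a conservative
 * alias analysis may assume the store through 'dst' can modify
 * p->data->scale, so both loads must be repeated on every iteration.
 * If the analysis can prove NoAlias, the compiler may hoist the loads
 * out of the loop -- keeping the value live in a register for the
 * whole loop, which is where the register-pressure trade-off
 * discussed above comes from. */
struct inner { long scale; };
struct outer { struct inner *data; };

long scale_and_sum(const struct outer *p, long *dst,
                   const long *src, size_t n) {
    long sum = 0;
    for (size_t i = 0; i < n; ++i) {
        /* p->data->scale is loop-invariant only if dst[i] can be
         * proven not to alias p->data or p->data->scale */
        dst[i] = src[i] * p->data->scale;
        sum += dst[i];
    }
    return sum;
}
```

Whether hoisting pays off then depends on exactly the trade-off discussed in the thread: fewer loads per iteration versus one more value held live across the loop body.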
I filed bugzilla pr20954.

-Gerolf

On Sep 15, 2014, at 2:56 PM, Gerolf Hoflehner <ghoflehner at apple.com> wrote:

> On CINT2006 ARM64/ref input/lto+pgo I practically measure no performance difference for the 7 benchmarks that compile. This includes bzip2 (although different source base than in CINT2000), mcf, hmmer, sjeng, h264ref, astar, xalancbmk
> [snip]
----- Original Message -----
> From: "Gerolf Hoflehner" <ghoflehner at apple.com>
> To: "Hal Finkel" <hfinkel at anl.gov>
> Cc: "LLVM Dev" <llvmdev at cs.uiuc.edu>, "Jiangning Liu" <liujiangning1 at gmail.com>, "George Burgess IV" <george.burgess.iv at gmail.com>
> Sent: Monday, September 15, 2014 7:58:59 PM
> Subject: Re: [LLVMdev] Testing the new CFL alias analysis
>
> I filed bugzilla pr20954.

Thanks!

 -Hal

> [snip]

--
Hal Finkel
Assistant Computational Scientist
Leadership Computing Facility
Argonne National Laboratory
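For reference, the flag combinations mentioned throughout the thread can be collected into a few shell invocations. This is a sketch assuming a Clang/LLVM build from that era (the only flags used are the ones quoted in the thread); `input.ll` and `foo.c` are placeholder file names:

```shell
# IR-level testing with the opt tool:
opt -O3 -use-cfl-aa input.ll -S -o output.ll

# Via Clang, middle-end only:
clang -O3 -mllvm -use-cfl-aa foo.c -o foo

# Also using CFL-AA during code generation:
clang -O3 -mllvm -use-cfl-aa -mllvm -use-cfl-aa-in-codegen foo.c -o foo

# With LTO, the -mllvm flags must be forwarded through the linker,
# since optimization happens at link time:
clang -flto -O3 -Wl,-mllvm,-use-cfl-aa -Wl,-mllvm,-use-cfl-aa-in-codegen foo.c -o foo
```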