Hi, Sean:
I'm sorry I lied. I didn't mean to. I did try to avoid making a *BIG*
change to the IPO pass-ordering for now. However, when I made a minor
change to populateLTOPassManager() by separating module passes from
non-module passes, I saw quite a few performance differences, most of
them degradations. Attacking these degradations one by one in a
piecemeal manner is a waste of time. We might as well define the
pass-ordering for the Pre-IPO, IPO and Post-IPO phases at this time,
and hopefully once and for all.
To repair my image after lying, I am posting some preliminary results
on this cozy Saturday afternoon, which I normally devote to
daydreaming :-)
So far I have only measured the MultiSource benchmarks on my iMac
(late 2012 model), and the command to run the benchmarks is
"make TEST=simple report OPTFLAGS='-O3 -flto'".
In terms of execution time, some benchmarks degrade, but more improve,
and a few of the improvements are quite substantial. User time is used
for comparison. I measured the results twice, and they are basically
very stable. As far as I can tell from the results, the proposed
pass-ordering is basically a change for the better.
Interestingly enough, if I combine populatePreIPOPassMgr() as the
pre-IPO phase (see the patch) with the original populateLTOPassManager()
for both IPO and post-IPO, I see a significant improvement in
"Benchmarks/Trimaran/netbench-crc/netbench-crc"
(about 94%, 0.5665s (was) vs 0.0295s). As I write this mail, I have not
yet had a chance to figure out why this combination improves this
benchmark this much.
In terms of compile time, the results report that my change improves
the compile time by about 2x, which is nonsense. I guess the test script
doesn't count link time.
The new pass orderings for Pre-IPO, IPO, and Post-IPO are defined by
populatePreIPOPassMgr(), populateIPOPassManager() and
populatePostIPOPM() respectively (see the patch).
I will discuss this with Andy next Monday in order to be consistent with
the pass-ordering design he is envisioning, measure more benchmarks, and
then post the patch and results to the community for discussion and
approval.
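
To make the split concrete, here is a rough sketch of how the three
phases compose. It is a toy model only: ToyPassManager, the three
populate functions and the pass-name strings below are hypothetical
stand-ins that record names in order, and each list is a heavily
abbreviated subset of what the patch adds. It does not use the real
LLVM API.

```cpp
#include <string>
#include <vector>

// Toy stand-in for a pass manager: it only records pass names in order.
// Hypothetical type, not LLVM's PassManagerBase.
struct ToyPassManager {
  std::vector<std::string> passes;
  void add(const std::string &name) { passes.push_back(name); }
};

// Pre-IPO (run per translation unit at compile time): inline only tiny
// functions, then scalar cleanup. No loop vectorization or unrolling here.
void populatePreIPO(ToyPassManager &pm) {
  pm.add("tiny-func-inline");
  pm.add("sroa");
  pm.add("early-cse");
  pm.add("gvn");
}

// IPO (link time): whole-program passes, including the main inliner.
void populateIPO(ToyPassManager &pm) {
  pm.add("dead-arg-elim");
  pm.add("inline");
  pm.add("arg-promotion");
}

// Post-IPO (link time, after IPO): tiny-function inlining again, then the
// scalar and loop transformations that were held back in pre-IPO.
void populatePostIPO(ToyPassManager &pm) {
  pm.add("tiny-func-inline");
  pm.add("licm");
  pm.add("loop-vectorize");
  pm.add("loop-unroll");
}

// The link step runs IPO followed immediately by post-IPO on one manager.
std::vector<std::string> linkTimePipeline() {
  ToyPassManager pm;
  populateIPO(pm);
  populatePostIPO(pm);
  return pm.passes;
}
```

In this model linkTimePipeline() yields the IPO passes followed by the
post-IPO passes, while populatePreIPO() would be applied separately to
each translation unit before linking.
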
Thanks
Shuxin
On 7/17/13 7:09 PM, Shuxin Yang wrote:
> Andy and I briefly discussed this the other day; we have not yet had a
> chance to list a detailed pass order
> for the pre- and post-IPO scalar optimizations.
>
> This is the wish-list in our minds:
>
> pre-IPO: based on the ordering he proposes, get rid of the inlining
> (or just inline tiny funcs), get rid of
> all loop xforms...
>
> post-IPO: get rid of inlining, or maybe we still need it, but only
> perform the inlining of callees which have now become tiny.
> Enable the loop xforms.
>
> The SCC pass manager seems to be important to inlining; no
> matter what the inlining looks like in the future,
> I think the pass manager is still useful for scalar
> opt. It enables us to achieve cheap inter-procedural
> opt hands down, in the sense that we can optimize the
> callee, analyze it, and feed the detailed
> info back to the caller (say, info like "the callee
> already returns constant 5" or "the callee's return value is in
> 5-10"). Such info is difficult to obtain at the IPO stage, as
> it cannot afford to take such a close look.
>
> I think it is too early to discuss the pre-IPO and post-IPO thing, let
> us focus on what Andy is proposing.
>
>
> On 7/17/13 6:04 PM, Sean Silva wrote:
>> There seems to be a lot of interest recently in LTO. How do you see
>> the situation of splitting the IR passes between per-TU processing
>> and multi-TU ("link time") processing?
>>
>> -- Sean Silva
>
-------------- next part --------------
name exec_was(s) exec_is(s) exec_diff(%)
------------------------------------------- ---------- ---------- ----------------
Benchmarks/TSVC/Symbolics-flt/Symbolics-flt 1.4634 0.684 -53.259532595326
Benchmarks/MiBench/security-sha/security-sh 0.0199 0.0128 -35.678391959799
Benchmarks/mediabench/adpcm/rawcaudio/rawca 0.0034 0.0025 -26.470588235294
Benchmarks/Prolangs-C/agrep/agrep 0.0032 0.0025 -21.875
Benchmarks/mediabench/jpeg/jpeg-6a/cjpeg 0.0032 0.0025 -21.875
Benchmarks/Olden/perimeter/perimeter 0.1747 0.1422 -18.603319977103
Benchmarks/mediabench/adpcm/rawdaudio/rawda 0.0022 0.0018 -18.181818181818
Benchmarks/FreeBench/fourinarow/fourinarow 0.2457 0.2018 -17.867317867317
Benchmarks/Prolangs-C++/family/family 0.0006 0.0005 -16.666666666666
Applications/ALAC/encode/alacconvert-encode 0.0314 0.0264 -15.923566878980
Benchmarks/MiBench/security-rijndael/securi 0.0243 0.0207 -14.814814814814
Benchmarks/mediabench/gsm/toast/toast 0.0174 0.0149 -14.367816091954
Benchmarks/Prolangs-C++/shapes/shapes 0.0007 0.0006 -14.285714285714
Benchmarks/Prolangs-C/bison/mybison 0.0021 0.0018 -14.285714285714
Benchmarks/TSVC/Symbolics-dbl/Symbolics-dbl 2.1248 1.8634 -12.302334337349
Benchmarks/McCat/03-testtrie/testtrie 0.0092 0.0081 -11.956521739130
Applications/treecc/treecc 0.0009 0.0008 -11.111111111111
Benchmarks/Prolangs-C/cdecl/cdecl 0.0009 0.0008 -11.111111111111
Benchmarks/TSVC/NodeSplitting-flt/NodeSplit 2.3019 2.0529 -10.817151049133
Benchmarks/MiBench/network-patricia/network 0.0647 0.0581 -10.200927357032
Benchmarks/McCat/09-vor/vor 0.0816 0.0735 -9.9264705882353
Benchmarks/MallocBench/gs/gs 0.029 0.0262 -9.6551724137931
Benchmarks/MiBench/telecomm-CRC32/telecomm- 0.1227 0.1122 -8.5574572127139
Benchmarks/TSVC/ControlLoops-flt/ControlLoo 1.5978 1.4648 -8.3239454249593
Applications/hexxagon/hexxagon 4.9682 4.566 -8.0954872992230
Benchmarks/Prolangs-C++/simul/simul 0.0043 0.004 -6.9767441860465
Benchmarks/TSVC/Reductions-dbl/Reductions-d 2.3107 2.1611 -6.4742285887393
Benchmarks/TSVC/LinearDependence-dbl/Linear 2.5083 2.3536 -6.1675238209145
Benchmarks/TSVC/LinearDependence-flt/Linear 2.0396 1.9215 -5.7903510492253
Benchmarks/TSVC/ControlLoops-dbl/ControlLoo 2.1258 2.0077 -5.5555555555555
Benchmarks/MiBench/consumer-lame/consumer-l 0.1355 0.1285 -5.1660516605166
Benchmarks/Trimaran/enc-rc4/enc-rc4 0.6262 0.5967 -4.7109549664643
Applications/oggenc/oggenc 0.077 0.0735 -4.5454545454545
Benchmarks/BitBench/uuencode/uuencode 0.0119 0.0114 -4.2016806722689
Benchmarks/Prolangs-C/unix-smail/unix-smail 0.0024 0.0023 -4.1666666666666
Benchmarks/TSVC/InductionVariable-dbl/Induc 2.9528 2.8362 -3.9487943646708
Benchmarks/TSVC/NodeSplitting-dbl/NodeSplit 2.7203 2.6209 -3.6540087490350
Applications/d/make_dparser 0.0174 0.0168 -3.4482758620689
Applications/lambda-0.1.3/lambda 2.6777 2.5864 -3.4096426037270
Applications/viterbi/viterbi 1.8383 1.777 -3.3346026219877
Benchmarks/MiBench/telecomm-gsm/telecomm-gs 0.1172 0.1134 -3.2423208191126
Benchmarks/McCat/18-imp/imp 0.0415 0.0402 -3.1325301204819
Benchmarks/MiBench/automotive-bitcount/auto 0.0518 0.0502 -3.0888030888030
Benchmarks/FreeBench/analyzer/analyzer 0.0333 0.0323 -3.0030030030030
Benchmarks/Prolangs-C++/city/city 0.0036 0.0035 -2.7777777777777
Benchmarks/TSVC/Reductions-flt/Reductions-f 4.4121 4.2942 -2.6721969130346
Benchmarks/Olden/tsp/tsp 0.5126 0.5011 -2.2434646898166
Benchmarks/Trimaran/enc-pc1/enc-pc1 0.1574 0.154 -2.1601016518424
Benchmarks/TSVC/ControlFlow-flt/ControlFlow 2.351 2.3012 -2.1182475542322
Benchmarks/MiBench/network-dijkstra/network 0.0296 0.029 -2.0270270270270
Benchmarks/Ptrdist/bc/bc 0.4764 0.4674 -1.8891687657430
Benchmarks/Prolangs-C/gnugo/gnugo 0.028 0.0275 -1.7857142857142
Benchmarks/VersaBench/dbms/dbms 0.8088 0.7949 -1.7185954500494
Benchmarks/ASC_Sequoia/CrystalMk/CrystalMk 3.7015 3.6379 -1.7182223422936
Benchmarks/Olden/health/health 0.1787 0.1757 -1.6787912702854
Benchmarks/VersaBench/bmm/bmm 1.4694 1.4455 -1.6265142234925
Benchmarks/McCat/01-qbsort/qbsort 0.0876 0.0862 -1.5981735159817
Applications/ClamAV/clamscan 0.094 0.0925 -1.5957446808510
Benchmarks/McCat/17-bintr/bintr 0.0666 0.0658 -1.2012012012012
Benchmarks/MiBench/automotive-susan/automot 0.0312 0.0309 -0.9615384615384
Benchmarks/TSVC/LoopRerolling-dbl/LoopRerol 2.7783 2.7524 -0.9322247417485
Benchmarks/SciMark2-C/scimark2 22.2684 22.0824 -0.8352643207414
Benchmarks/mediabench/g721/g721encode/encod 0.0403 0.04 -0.7444168734491
Benchmarks/ASC_Sequoia/AMGmk/AMGmk 5.0381 5.0033 -0.6907365872054
Benchmarks/TSVC/GlobalDataFlow-dbl/GlobalDa 2.3246 2.3089 -0.6753850124752
Applications/sgefa/sgefa 0.0962 0.0956 -0.6237006237006
Applications/minisat/minisat 4.021 4.0023 -0.4650584431733
Benchmarks/llubenchmark/llu 2.8277 2.8147 -0.4597375959260
Benchmarks/TSVC/Expansion-flt/Expansion-flt 1.8036 1.7961 -0.4158349966733
Applications/aha/aha 1.1345 1.1299 -0.4054649625385
Benchmarks/TSVC/Expansion-dbl/Expansion-dbl 2.5986 2.5886 -0.3848225967828
Benchmarks/PAQ8p/paq8p 33.6364 33.5149 -0.3612158257126
Benchmarks/FreeBench/neural/neural 0.1771 0.1765 -0.3387916431394
Benchmarks/Ptrdist/ft/ft 0.6569 0.6549 -0.3044603440401
Benchmarks/Trimaran/enc-3des/enc-3des 1.3386 1.3354 -0.2390557298670
Benchmarks/VersaBench/ecbdes/ecbdes 1.5638 1.5623 -0.0959201943982
Benchmarks/TSVC/Recurrences-dbl/Recurrences 2.8128 2.8102 -0.0924345847554
Benchmarks/Trimaran/netbench-crc/netbench-c 0.5665 0.566 -0.0882612533098
Benchmarks/Prolangs-C++/life/life 1.826 1.8244 -0.0876232201533
Benchmarks/TSVC/ControlFlow-dbl/ControlFlow 2.6993 2.6973 -0.0740932834438
Benchmarks/TSVC/Packing-flt/Packing-flt 2.6722 2.6716 -0.0224534091759
Benchmarks/TSVC/Searching-flt/Searching-flt 3.3246 3.324 -0.0180472838837
Benchmarks/TSVC/Searching-dbl/Searching-dbl 3.3563 3.3558 -0.0148973572088
Benchmarks/TSVC/Equivalencing-flt/Equivalen 0.9735 0.9734 -0.0102722136620
Applications/Burg/burg 0.0008 0.0008 0.0
Applications/hbd/hbd 0.0018 0.0018 0.0
Benchmarks/BitBench/uudecode/uudecode 0.0243 0.0243 0.0
Benchmarks/McCat/04-bisect/bisect 0.0696 0.0696 0.0
Benchmarks/McCat/05-eks/eks 0.0021 0.0021 0.0
Benchmarks/McCat/15-trie/trie 0.0008 0.0008 0.0
Benchmarks/MiBench/consumer-jpeg/consumer-j 0.0028 0.0028 0.0
Benchmarks/MiBench/office-ispell/office-isp 0.0006 0.0006 0.0
Benchmarks/MiBench/security-blowfish/securi 0.0007 0.0007 0.0
Benchmarks/MiBench/telecomm-adpcm/telecomm- 0.0006 0.0006 0.0
Benchmarks/Prolangs-C++/NP/np 0.0006 0.0006 0.0
Benchmarks/Prolangs-C++/deriv1/deriv1 0.0006 0.0006 0.0
Benchmarks/Prolangs-C++/deriv2/deriv2 0.0006 0.0006 0.0
Benchmarks/Prolangs-C++/employ/employ 0.0038 0.0038 0.0
Benchmarks/Prolangs-C++/fsm/fsm 0.0005 0.0005 0.0
Benchmarks/Prolangs-C++/garage/garage 0.0006 0.0006 0.0
Benchmarks/Prolangs-C++/ocean/ocean 0.042 0.042 0.0
Benchmarks/Prolangs-C++/office/office 0.0006 0.0006 0.0
Benchmarks/Prolangs-C++/trees/trees 0.0006 0.0006 0.0
Benchmarks/Prolangs-C++/vcirc/vcirc 0.0005 0.0005 0.0
Benchmarks/Prolangs-C/allroots/allroots 0.0006 0.0006 0.0
Benchmarks/Prolangs-C/compiler/compiler 0.0006 0.0006 0.0
Benchmarks/Prolangs-C/fixoutput/fixoutput 0.0006 0.0006 0.0
Benchmarks/Prolangs-C/football/football 0.0005 0.0005 0.0
Benchmarks/Prolangs-C/loader/loader 0.0006 0.0006 0.0
Benchmarks/Prolangs-C/simulator/simulator 0.0006 0.0006 0.0
Benchmarks/Prolangs-C/unix-tbl/unix-tbl 0.0006 0.0006 0.0
Benchmarks/TSVC/Recurrences-flt/Recurrences 2.7172 2.7173 0.00368025909023
Benchmarks/TSVC/StatementReordering-dbl/Sta 2.5547 2.555 0.01174306180765
Benchmarks/Trimaran/enc-md5/enc-md5 1.2119 1.2126 0.05776054129878
Benchmarks/MiBench/automotive-basicmath/aut 0.1698 0.1699 0.05889281507655
Benchmarks/ASC_Sequoia/IRSmk/IRSmk 2.6607 2.6626 0.07140977938136
Benchmarks/Fhourstones-3.1/fhourstones3.1 0.7427 0.7433 0.08078632018310
Benchmarks/TSVC/LoopRestructuring-dbl/LoopR 2.9857 2.9883 0.08708175637204
Benchmarks/Olden/em3d/em3d 2.0241 2.0262 0.10374981473247
Benchmarks/TSVC/LoopRerolling-flt/LoopRerol 2.0889 2.0914 0.11968021446694
Benchmarks/TSVC/Packing-dbl/Packing-dbl 2.8154 2.8196 0.14917951268025
Benchmarks/BitBench/five11/five11 4.038 4.0448 0.16840019811788
Benchmarks/Olden/treeadd/treeadd 0.1588 0.1591 0.18891687657430
Benchmarks/TSVC/IndirectAddressing-flt/Indi 2.1573 2.1615 0.19468780419969
Benchmarks/Ptrdist/anagram/anagram 0.6629 0.6644 0.22627847337455
Benchmarks/TSVC/StatementReordering-flt/Sta 1.8867 1.892 0.28091376477446
Benchmarks/TSVC/IndirectAddressing-dbl/Indi 2.6113 2.6189 0.29104277562899
Benchmarks/FreeBench/pifft/pifft 0.0636 0.0638 0.31446540880501
Benchmarks/Prolangs-C++/primes/primes 0.1916 0.1923 0.36534446764092
Benchmarks/TSVC/GlobalDataFlow-flt/GlobalDa 1.2514 1.2567 0.42352565127056
Benchmarks/Olden/power/power 0.7097 0.7129 0.45089474425813
Benchmarks/ASCI_Purple/SMG2000/smg2000 1.4904 1.4972 0.45625335480408
Applications/lemon/lemon 0.6774 0.6805 0.45763212282255
Benchmarks/MiBench/telecomm-FFT/telecomm-ff 0.0209 0.021 0.47846889952154
Benchmarks/7zip/7zip-benchmark 5.9521 5.9811 0.48722299692545
Benchmarks/TSVC/CrossingThresholds-dbl/Cros 2.6449 2.6578 0.48773110514575
Applications/SPASS/SPASS 5.9442 5.9748 0.51478752397294
Benchmarks/MallocBench/cfrac/cfrac 1.2635 1.2704 0.54610209734862
Benchmarks/Ptrdist/ks/ks 0.7054 0.7117 0.89311029203288
Benchmarks/MallocBench/espresso/espresso 0.3836 0.3871 0.91240875912408
Applications/JM/lencod/lencod 3.7442 3.7859 1.11372255755568
Benchmarks/TSVC/Equivalencing-dbl/Equivalen 1.3717 1.3881 1.19559670481884
Benchmarks/Olden/bh/bh 0.6255 0.633 1.1990407673861
Benchmarks/VersaBench/8b10b/8b10b 2.8968 2.9416 1.5465341066004
Benchmarks/BitBench/drop3/drop3 0.174 0.1768 1.60919540229886
Benchmarks/McCat/12-IOtest/iotest 0.1223 0.1243 1.63532297628781
Applications/spiff/spiff 1.629 1.6558 1.6451810926949
Benchmarks/TSVC/CrossingThresholds-flt/Cros 2.0682 2.1028 1.67295232569383
Benchmarks/Olden/voronoi/voronoi 0.1569 0.1596 1.72084130019119
Applications/lua/lua 14.0101 14.2671 1.83439090370518
Benchmarks/nbench/nbench 5.4638 5.568 1.90709762436399
Applications/sqlite3/sqlite3 2.3871 2.4339 1.960537891165
Applications/ALAC/decode/alacconvert-decode 0.0152 0.0155 1.97368421052632
Benchmarks/Trimaran/netbench-url/netbench-u 2.7548 2.8112 2.04733555975025
Benchmarks/Olden/bisort/bisort 0.3265 0.3332 2.05206738131699
Benchmarks/Fhourstones/fhourstones 0.6284 0.6419 2.14831317632083
Applications/JM/ldecod/ldecod 0.0543 0.0556 2.39410681399631
Benchmarks/TSVC/LoopRestructuring-flt/LoopR 2.2302 2.2848 2.4482109227872
Benchmarks/FreeBench/mason/mason 0.1085 0.1113 2.58064516129032
Benchmarks/Bullet/bullet 3.0174 3.0968 2.63140452044807
Applications/SIBsim4/SIBsim4 1.8364 1.8853 2.66281855804835
Benchmarks/McCat/08-main/main 0.0138 0.0142 2.89855072463769
Applications/siod/siod 1.8991 1.9696 3.71228476646833
Benchmarks/FreeBench/distray/distray 0.0793 0.0829 4.53972257250947
Benchmarks/NPB-serial/is/is 4.6101 4.8299 4.76779245569511
Applications/kimwitu++/kc 0.0266 0.0279 4.88721804511279
Benchmarks/Olden/mst/mst 0.0551 0.0589 6.89655172413793
Benchmarks/Ptrdist/yacr2/yacr2 0.5277 0.5663 7.31476217547851
Benchmarks/VersaBench/beamformer/beamformer 0.6497 0.7015 7.97291057411112
Benchmarks/sim/sim 2.6061 2.8147 8.00429760945475
Benchmarks/FreeBench/pcompress2/pcompress2 0.101 0.1097 8.61386138613861
Benchmarks/mafft/pairlocalalign 16.7374 18.4048 9.9621207594967
Benchmarks/MiBench/office-stringsearch/offi 0.001 0.0011 10.0
Benchmarks/TSVC/InductionVariable-flt/Induc 2.0788 2.2966 10.4771983836829
Benchmarks/mediabench/mpeg2/mpeg2dec/mpeg2d 0.0076 0.0084 10.5263157894737
Benchmarks/MiBench/consumer-typeset/consume 0.0943 0.1053 11.6648992576882
Benchmarks/tramp3d-v4/tramp3d-v4 0.1849 0.208 12.493239588967
Benchmarks/Prolangs-C++/objects/objects 0.0005 0.0006 20.0
Benchmarks/Prolangs-C/TimberWolfMC/timberwo 0.0005 0.0006 20.0
Benchmarks/Prolangs-C/assembler/assembler 0.0005 0.0006 20.0
-------------- next part --------------
Index: include/llvm/Transforms/IPO/PassManagerBuilder.h
===================================================================
--- include/llvm/Transforms/IPO/PassManagerBuilder.h (revision 187135)
+++ include/llvm/Transforms/IPO/PassManagerBuilder.h (working copy)
@@ -132,8 +132,14 @@
/// populateModulePassManager - This sets up the primary pass manager.
void populateModulePassManager(PassManagerBase &MPM);
- void populateLTOPassManager(PassManagerBase &PM, bool Internalize,
- bool RunInliner, bool DisableGVNLoadPRE = false);
+
+ /// setup passes for Pre-IPO phase
+ void populatePreIPOPassMgr(PassManagerBase &MPM);
+
+ void populateIPOPassManager(PassManagerBase &PM, bool Internalize,
+ bool RunInliner);
+
+ void populatePostIPOPM(PassManagerBase &PM);
};
/// Registers a function for adding a standard set of passes. This should be
Index: include/llvm/Transforms/IPO.h
===================================================================
--- include/llvm/Transforms/IPO.h (revision 187135)
+++ include/llvm/Transforms/IPO.h (working copy)
@@ -89,6 +89,7 @@
/// threshold given here.
Pass *createFunctionInliningPass();
Pass *createFunctionInliningPass(int Threshold);
+Pass *createTinyFuncInliningPass();
//===----------------------------------------------------------------------===//
/// createAlwaysInlinerPass - Return a new pass object that inlines only
Index: tools/lto/LTOCodeGenerator.cpp
===================================================================
--- tools/lto/LTOCodeGenerator.cpp (revision 187135)
+++ tools/lto/LTOCodeGenerator.cpp (working copy)
@@ -412,11 +412,12 @@
// Enabling internalize here would use its AllButMain variant. It
// keeps only main if it exists and does nothing for libraries. Instead
// we create the pass ourselves with the symbol list provided by the linker.
- if (!DisableOpt)
- PassManagerBuilder().populateLTOPassManager(passes,
- /*Internalize=*/false,
- !DisableInline,
- DisableGVNLoadPRE);
+ if (!DisableOpt) {
+ PassManagerBuilder().populateIPOPassManager(passes,
+ /*Internalize=*/false,
+ !DisableInline);
+ PassManagerBuilder().populatePostIPOPM(passes);
+ }
// Make sure everything is still good.
passes.add(createVerifierPass());
Index: tools/opt/opt.cpp
===================================================================
--- tools/opt/opt.cpp (revision 187135)
+++ tools/opt/opt.cpp (working copy)
@@ -104,6 +104,11 @@
cl::desc("Include the standard compile time optimizations"));
static cl::opt<bool>
+StandardPreIPOOpts("std-preipo-opts",
+ cl::desc("Include the standard pre-IPO optimizations"));
+
+
+static cl::opt<bool>
StandardLinkOpts("std-link-opts",
cl::desc("Include the standard link time optimizations"));
@@ -470,6 +475,23 @@
Builder.populateModulePassManager(PM);
}
+static void AddPreIPOCompilePasses(PassManagerBase &PM) {
+ PM.add(createVerifierPass()); // Verify that input is correct
+
+ // If the -strip-debug command line option was specified, do it.
+ if (StripDebug)
+ addPass(PM, createStripSymbolsPass(true));
+
+ if (DisableOptimizations) return;
+
+ // -std-preipo-opts adds the same module passes as -O3.
+ PassManagerBuilder Builder;
+ if (!DisableInline)
+ Builder.Inliner = createTinyFuncInliningPass();
+ Builder.OptLevel = 3;
+ Builder.populatePreIPOPassMgr(PM);
+}
+
static void AddStandardLinkPasses(PassManagerBase &PM) {
PM.add(createVerifierPass()); // Verify that input is correct
@@ -480,8 +502,9 @@
if (DisableOptimizations) return;
PassManagerBuilder Builder;
- Builder.populateLTOPassManager(PM, /*Internalize=*/ !DisableInternalize,
+ Builder.populateIPOPassManager(PM, /*Internalize=*/ !DisableInternalize,
/*RunInliner=*/ !DisableInline);
+ Builder.populatePostIPOPM(PM);
}
//===----------------------------------------------------------------------===//
@@ -778,6 +801,12 @@
StandardCompileOpts = false;
}
+ // If -std-preipo-opts was specified at the end of the pass list, add them.
+ if (StandardPreIPOOpts) {
+ AddPreIPOCompilePasses(Passes);
+ StandardPreIPOOpts = false;
+ }
+
if (StandardLinkOpts) {
AddStandardLinkPasses(Passes);
StandardLinkOpts = false;
Index: tools/bugpoint/bugpoint.cpp
===================================================================
--- tools/bugpoint/bugpoint.cpp (revision 187135)
+++ tools/bugpoint/bugpoint.cpp (working copy)
@@ -169,8 +169,9 @@
if (StandardLinkOpts) {
PassManagerBuilder Builder;
- Builder.populateLTOPassManager(PM, /*Internalize=*/true,
+ Builder.populateIPOPassManager(PM, /*Internalize=*/true,
/*RunInliner=*/true);
+ Builder.populatePostIPOPM(PM);
}
if (OptLevelO1 || OptLevelO2 || OptLevelO3) {
Index: lib/Transforms/IPO/PassManagerBuilder.cpp
===================================================================
--- lib/Transforms/IPO/PassManagerBuilder.cpp (revision 187135)
+++ lib/Transforms/IPO/PassManagerBuilder.cpp (working copy)
@@ -294,10 +294,78 @@
addExtensionsToPM(EP_OptimizerLast, MPM);
}
-void PassManagerBuilder::populateLTOPassManager(PassManagerBase &PM,
+void PassManagerBuilder::populatePreIPOPassMgr(PassManagerBase &MPM) {
+ // If all optimizations are disabled, just run the always-inline pass.
+ if (OptLevel == 0) {
+ if (Inliner) {
+ MPM.add(Inliner);
+ Inliner = 0;
+ }
+ return;
+ }
+
+ bool EnableLightWeightIPO = (OptLevel > 1);
+
+ // Add LibraryInfo if we have some.
+ if (LibraryInfo) MPM.add(new TargetLibraryInfo(*LibraryInfo));
+ addInitialAliasAnalysisPasses(MPM);
+
+ // Start of CallGraph SCC passes.
+ {
+ if (EnableLightWeightIPO) {
+ MPM.add(createPruneEHPass()); // Remove dead EH info
+ if (Inliner) {
+ MPM.add(Inliner);
+ Inliner = 0;
+ }
+ MPM.add(createArgumentPromotionPass()); // Scalarize uninlined fn args
+ }
+
+ // Start of function pass.
+ {
+ if (UseNewSROA)
+ MPM.add(createSROAPass(/*RequiresDomTree*/ false));
+ else
+ MPM.add(createScalarReplAggregatesPass(-1, false));
+
+ MPM.add(createEarlyCSEPass()); // Catch trivial redundancies
+ MPM.add(createJumpThreadingPass()); // Thread jumps.
+ MPM.add(createCorrelatedValuePropagationPass());// Propagate conditionals
+ MPM.add(createCFGSimplificationPass()); // Merge & remove BBs
+ MPM.add(createInstructionCombiningPass()); // Combine silly seq's
+ MPM.add(createReassociatePass()); // Reassociate expressions
+ MPM.add(createLoopRotatePass()); // Rotate Loop
+ MPM.add(createLICMPass()); // Hoist loop invariants
+ MPM.add(createIndVarSimplifyPass()); // Canonicalize indvars
+ MPM.add(createLoopIdiomPass()); // Recognize idioms like memset.
+ MPM.add(createLoopDeletionPass()); // Delete dead loops
+
+ MPM.add(createGVNPass()); // Remove redundancies
+ MPM.add(createMemCpyOptPass()); // Remove memcpy / form memset
+ MPM.add(createSCCPPass()); // Constant prop with SCCP
+
+ MPM.add(createDeadStoreEliminationPass()); // Delete dead stores
+ MPM.add(createAggressiveDCEPass()); // Delete dead instructions
+ MPM.add(createFunctionAttrsPass()); // Set readonly/readnone attrs
+
+ MPM.add(createTailCallEliminationPass()); // Eliminate tail calls
+ }
+
+ // End of CallGraph SCC passes.
+ }
+
+ if (EnableLightWeightIPO) {
+ MPM.add(createGlobalOptimizerPass()); // Optimize out global vars
+ MPM.add(createIPSCCPPass()); // IP SCCP
+ MPM.add(createDeadArgEliminationPass()); // Dead argument elimination
+ MPM.add(createGlobalDCEPass()); // Remove dead fns and globals.
+ MPM.add(createConstantMergePass()); // Merge dup global constants
+ }
+}
+
+void PassManagerBuilder::populateIPOPassManager(PassManagerBase &PM,
bool Internalize,
- bool RunInliner,
- bool DisableGVNLoadPRE) {
+ bool RunInliner) {
// Provide AliasAnalysis services for optimizations.
addInitialAliasAnalysisPasses(PM);
@@ -325,15 +393,9 @@
// Remove unused arguments from functions.
PM.add(createDeadArgEliminationPass());
- // Reduce the code after globalopt and ipsccp. Both can open up significant
- // simplification opportunities, and both can propagate functions through
- // function pointers. When this happens, we often have to resolve varargs
- // calls, etc, so let instcombine do this.
- PM.add(createInstructionCombiningPass());
-
// Inline small functions
if (RunInliner)
- PM.add(createFunctionInliningPass());
+ PM.add(createFunctionInliningPass(255));
PM.add(createPruneEHPass()); // Remove dead EH info.
@@ -346,35 +408,98 @@
// transform it to pass arguments by value instead of by reference.
PM.add(createArgumentPromotionPass());
- // The IPO passes may leave cruft around. Clean up after them.
- PM.add(createInstructionCombiningPass());
- PM.add(createJumpThreadingPass());
- // Break up allocas
- if (UseNewSROA)
- PM.add(createSROAPass());
- else
- PM.add(createScalarReplAggregatesPass());
-
// Run a few AA driven optimizations here and now, to cleanup the code.
PM.add(createFunctionAttrsPass()); // Add nocapture.
PM.add(createGlobalsModRefPass()); // IP alias analysis.
+}
- PM.add(createLICMPass()); // Hoist loop invariants.
- PM.add(createGVNPass(DisableGVNLoadPRE)); // Remove redundancies.
- PM.add(createMemCpyOptPass()); // Remove dead memcpys.
- // Nuke dead stores.
- PM.add(createDeadStoreEliminationPass());
+void PassManagerBuilder::populatePostIPOPM(PassManagerBase &PM) {
+ // In PostIPO phase, the choice for inlining is simple: either no inlining at
+ // all or just run the inliner which only inline tiny functions. This function
+ // has freedom to pick up which choice is more appropriate.
+ //
+ assert(Inliner == 0 && "Don't specify inliner");
+ if (OptLevel == 0)
+ return;
- // Cleanup and simplify the code after the scalar optimizations.
- PM.add(createInstructionCombiningPass());
+ bool EnableLightWeightIPO = (OptLevel > 1);
- PM.add(createJumpThreadingPass());
+ // Add LibraryInfo if we have some.
+ if (LibraryInfo) PM.add(new TargetLibraryInfo(*LibraryInfo));
- // Delete basic blocks, which optimization passes may have killed.
- PM.add(createCFGSimplificationPass());
+ addInitialAliasAnalysisPasses(PM);
- // Now that we have optimized the program, discard unreachable functions.
- PM.add(createGlobalDCEPass());
+ // Start of CallGraph SCC passes.
+ {
+ if (EnableLightWeightIPO) {
+ PM.add(createTinyFuncInliningPass());
+ PM.add(createFunctionAttrsPass()); // Set readonly/readnone attrs
+ }
+
+ // Start of function pass.
+ {
+ PM.add(createMemCpyOptPass()); // Remove memcpy / form memset
+ if (UseNewSROA)
+ PM.add(createSROAPass(/*RequiresDomTree*/ false));
+ else
+ PM.add(createScalarReplAggregatesPass(-1, false));
+ PM.add(createEarlyCSEPass()); // Catch trivial redundancies
+ PM.add(createSCCPPass()); // Constant prop with SCCP
+ PM.add(createJumpThreadingPass()); // Thread jumps
+ PM.add(createCorrelatedValuePropagationPass()); // Propagate conditionals
+ PM.add(createCFGSimplificationPass()); // Merge & remove BBs
+ PM.add(createReassociatePass()); // Reassociate expressions
+ PM.add(createLoopRotatePass()); // Rotate Loop
+ PM.add(createLICMPass()); // Hoist loop invariants
+ PM.add(createLoopUnswitchPass(SizeLevel || OptLevel < 3));
+ PM.add(createIndVarSimplifyPass()); // Canonicalize indvars
+ PM.add(createLoopIdiomPass()); // Recognize idioms like memset.
+ PM.add(createLoopDeletionPass()); // Delete dead loops
+
+ if (/*LoopVectorize &&*/ OptLevel > 1 && SizeLevel < 2)
+ PM.add(createLoopVectorizePass());
+
+ if (!DisableUnrollLoops)
+ PM.add(createLoopUnrollPass()); // Unroll small loops
+
+ addExtensionsToPM(EP_LoopOptimizerEnd, PM);
+
+ if (OptLevel > 1)
+ PM.add(createGVNPass()); // Remove redundancies
+
+ PM.add(createInstructionCombiningPass());
+ PM.add(createDeadStoreEliminationPass()); // Delete dead stores
+ PM.add(createAggressiveDCEPass()); // Delete dead instructions
+ if (UseNewSROA)
+ PM.add(createSROAPass(/*RequiresDomTree*/ false));
+ else
+ PM.add(createScalarReplAggregatesPass(-1, false));
+
+ addExtensionsToPM(EP_ScalarOptimizerLate, PM);
+
+
+ // Add the various vectorization passes and relevant cleanup passes for
+ // them since we are no longer in the middle of the main scalar pipeline.
+ if (/*LoopVectorize && */OptLevel > 1 && SizeLevel < 2)
+ PM.add(createLoopVectorizePass());
+
+ #if 1
+ if (!DisableUnrollLoops)
+ PM.add(createLoopUnrollPass()); // Unroll small loops
+ #endif
+
+ PM.add(createInstructionCombiningPass());
+
+ if (SLPVectorize)
+ PM.add(createSLPVectorizerPass()); // Vectorize parallel scalar chains.
+ }
+ }
+
+ if (EnableLightWeightIPO) {
+ PM.add(createGlobalDCEPass()); // Remove dead fns and globals.
+ PM.add(createConstantMergePass()); // Merge dup global constants
+ }
+ addExtensionsToPM(EP_OptimizerLast, PM);
}
inline PassManagerBuilder *unwrap(LLVMPassManagerBuilderRef P) {
@@ -458,5 +583,6 @@
LLVMBool RunInliner) {
PassManagerBuilder *Builder = unwrap(PMB);
PassManagerBase *LPM = unwrap(PM);
- Builder->populateLTOPassManager(*LPM, Internalize != 0, RunInliner != 0);
+ Builder->populateIPOPassManager(*LPM, Internalize != 0, RunInliner != 0);
+ Builder->populatePostIPOPM(*LPM);
}
Index: lib/Transforms/IPO/InlineSimple.cpp
===================================================================
--- lib/Transforms/IPO/InlineSimple.cpp (revision 187135)
+++ lib/Transforms/IPO/InlineSimple.cpp (working copy)
@@ -72,6 +72,10 @@
return new SimpleInliner(Threshold);
}
+Pass *llvm::createTinyFuncInliningPass() {
+ return new SimpleInliner(40);
+}
+
bool SimpleInliner::runOnSCC(CallGraphSCC &SCC) {
ICA = &getAnalysis<InlineCostAnalysis>();
return Inliner::runOnSCC(SCC);
-------------- next part --------------
Index: include/clang/Frontend/CodeGenOptions.def
===================================================================
--- include/clang/Frontend/CodeGenOptions.def (revision 187135)
+++ include/clang/Frontend/CodeGenOptions.def (working copy)
@@ -112,6 +112,7 @@
CODEGENOPT(VectorizeBB , 1, 0) ///< Run basic block vectorizer.
CODEGENOPT(VectorizeLoop , 1, 0) ///< Run loop vectorizer.
CODEGENOPT(VectorizeSLP , 1, 0) ///< Run SLP vectorizer.
+CODEGENOPT(IsPreIPO , 1, 0) ///< Indicate in pre-IPO phase
/// Attempt to use register sized accesses to bit-fields in structures, when
/// possible.
Index: include/clang/Driver/CC1Options.td
===================================================================
--- include/clang/Driver/CC1Options.td (revision 187135)
+++ include/clang/Driver/CC1Options.td (working copy)
@@ -210,6 +210,8 @@
HelpText<"Run the SLP vectorization passes">;
def vectorize_slp_aggressive : Flag<["-"], "vectorize-slp-aggressive">,
HelpText<"Run the BB vectorization passes">;
+def preipo : Flag<["-"], "preipo">,
+ HelpText<"Run the pre-IPO passes">;
//===----------------------------------------------------------------------===//
// Dependency Output Options
Index: lib/Frontend/CompilerInvocation.cpp
===================================================================
--- lib/Frontend/CompilerInvocation.cpp (revision 187135)
+++ lib/Frontend/CompilerInvocation.cpp (working copy)
@@ -402,6 +402,7 @@
Opts.VectorizeBB = Args.hasArg(OPT_vectorize_slp_aggressive);
Opts.VectorizeLoop = Args.hasArg(OPT_vectorize_loops);
Opts.VectorizeSLP = Args.hasArg(OPT_vectorize_slp);
+ Opts.IsPreIPO = Args.hasArg(OPT_preipo);
Opts.MainFileName = Args.getLastArgValue(OPT_main_file_name);
Opts.VerifyModule = !Args.hasArg(OPT_disable_llvm_verifier);
Index: lib/Driver/Tools.cpp
===================================================================
--- lib/Driver/Tools.cpp (revision 187135)
+++ lib/Driver/Tools.cpp (working copy)
@@ -2014,7 +2014,8 @@
CmdArgs.push_back("-emit-pth");
} else {
assert(isa<CompileJobAction>(JA) && "Invalid action for clang tool.");
-
+ if (D.IsUsingLTO(Args))
+ CmdArgs.push_back("-preipo");
if (JA.getType() == types::TY_Nothing) {
CmdArgs.push_back("-fsyntax-only");
} else if (JA.getType() == types::TY_LLVM_IR ||
Index: lib/CodeGen/BackendUtil.cpp
===================================================================
--- lib/CodeGen/BackendUtil.cpp (revision 187135)
+++ lib/CodeGen/BackendUtil.cpp (working copy)
@@ -274,6 +274,10 @@
switch (Inlining) {
case CodeGenOptions::NoInlining: break;
case CodeGenOptions::NormalInlining: {
+ if (CodeGenOpts.IsPreIPO) {
+ PMBuilder.Inliner = createTinyFuncInliningPass();
+ break;
+ }
// FIXME: Derive these constants in a principled fashion.
unsigned Threshold = 225;
if (CodeGenOpts.OptimizeSize == 1) // -Os
@@ -321,7 +325,10 @@
MPM->add(createStripSymbolsPass(true));
}
- PMBuilder.populateModulePassManager(*MPM);
+ if (!CodeGenOpts.IsPreIPO)
+ PMBuilder.populateModulePassManager(*MPM);
+ else
+ PMBuilder.populatePreIPOPassMgr(*MPM);
}
TargetMachine *EmitAssemblyHelper::CreateTargetMachine(bool MustCreateTM) {