Chad Verbowski via llvm-dev
2020-Jul-03  02:50 UTC
[llvm-dev] flags to reproduce clang -O3 with opt -O3
Awesome, thanks!

I'd like to have the last step (llc in your example) not perform additional optimization passes, such as O3, and simply use the O3 pass from opt in the previous line.

Do you happen to know if I should use 'llc -O0 foo_o.bc -o foo.exe' instead to achieve this?

On Thu, Jul 2, 2020 at 6:35 PM Mehdi AMINI <joker.eph at gmail.com> wrote:
> On Thu, Jul 2, 2020 at 2:28 PM Chad Verbowski via llvm-dev <llvm-dev at lists.llvm.org> wrote:
>> Hello,
>>
>> I've been trying to figure out how to reproduce the results of a single
>> clang -O3 compilation to a binary with a multi-step process using opt.
>>
>> Specifically I have:
>>
>>     clang -O3 foo.c -o foo.exe
>>
>> which I want to replicate with:
>>
>>     clang -O0 -c -emit-llvm foo.c
>
> Using O0 will mark every function in the IR with "optnone" which prevents
> `opt` from optimizing it. I'd try `clang -O3 -Xclang -disable-llvm-passes
> -c -emit-llvm foo.c`
>
>>     opt -O3 foo.bc -o foo_o.bc
>>     clang foo_o.bc -o foo.exe
>
> This last step won't enable optimizations in the backend, you likely
> should try `llc -O3` instead.
>
> Best,
>
> --
> Mehdi
>
>> Any hints / suggestions on what additional flags I need to produce the
>> same binary are greatly appreciated!
>>
>> *What I've tried:*
>> I've been reading the archives, and found this
>> <http://lists.llvm.org/pipermail/llvm-dev/2017-September/117144.html>,
>> which suggests dumping the pass arguments using:
>>
>>     clang -mllvm -debug-pass=Structure -O3 foo.c -o foo.exe
>>
>> and comparing with:
>>
>>     clang -mllvm -debug-pass=Structure -O0 -c -emit-llvm foo.c
>>     opt -debug-pass=Structure -O3 foo.bc -o foo_o.bc
>>     clang -mllvm -debug-pass=Structure foo_o.bc -o foo.exe
>>
>> The first has 30 "Pass Argument" statements though only these 5 are
>> distinct. Across these 5 there are 190 distinct flags. The multi-step
>> compilation has only 140 distinct flags. Comparing the flags, 18 from the
>> multi-step are missing in the 1pass, and 67 from 1pass are missing in the
>> multistep.
>>
>> These appear to be opt flags, since they cause an error when trying to
>> use them with clang (e.g. -x86-fixup-LEAs) and when used with opt causes
>> a crash with stack dump and request to submit a bug report. Others like
>> -attributor appear to work with opt.
>>
>> I'm currently blindly trying to add the 67 different flags to the opt
>> step to see which work, and hopefully that subset will produce the same
>> result as clang -O3.
>>
>> It seems like there must be an easier / more exact way of getting the opt
>> -O3 multi-step to match the clang -O3 result.
>>
>> Any thoughts or insights are appreciated. Below is a sorted list of the
>> flags missing from each for completeness.
>>
>> not contained in 1pass O3 (count=18)
>>
>>     -aa-scalar-evolution
>>     -always-inline
>>     -callsite-splitting
>>     -inject-tli-mappings
>>     -ipsccp
>>     -jump-threading-correlated-propagation
>>     -livedebugvalues
>>     -loops-loop-simplify
>>     -memdep-lazy-branch-prob
>>     -openmpopt
>>     -opt-remark-emitter-instcombine
>>     -regallocfast
>>     -speculative-execution
>>     -stackmap-liveness
>>     -tbaa-scoped-noalias
>>     -vector-combine
>>     -verify
>>     -write-bitcode
>>
>> not contained in multi O3 (count=67)
>>
>>     -attributor
>>     -block-freq-loop-simplify
>>     -branch-folder
>>     -break-false-deps
>>     -callsite-splitting-ipsccp
>>     -codegenprepare
>>     -consthoist
>>     -dead-mi-elimination
>>     -detect-dead-lanes
>>     -early-ifcvt
>>     -early-machinelicm
>>     -early-tailduplication
>>     -expandmemcmp
>>     -greedy
>>     -interleaved-access
>>     -iv-users
>>     -lazy-block-freq-opt-remark-emitter
>>     -livedebugvars
>>     -liveintervals
>>     -liveregmatrix
>>     -livestacks
>>     -livevars
>>     -loop-reduce
>>     -loop-simplify-lcssa-verification
>>     -lrshrink
>>     -machine-block-freq
>>     -machine-combiner
>>     -machine-cp
>>     -machine-cse
>>     -machinedomtree-machine-loops
>>     -machinelicm
>>     -machine-loops
>>     -machinepostdomtree
>>     -machinepostdomtree-block-placement
>>     -machine-scheduler
>>     -machine-sink
>>     -machine-trace-metrics
>>     -mergeicmps
>>     -objc-arc-contract
>>     -opt-phis
>>     -partially-inline-libcalls
>>     -peephole-opt
>>     -postra-machine-sink
>>     -post-RA-sched
>>     -processimpdefs
>>     -reaching-deps-analysis
>>     -rename-independent-subregs
>>     -shrink-wrap
>>     -simple-register-coalescing
>>     -slotindexes
>>     -spill-code-placement
>>     -stack-coloring
>>     -stackmap-liveness-livedebugvalues
>>     -stack-slot-coloring
>>     -tailduplication
>>     -unreachable-mbb-elimination
>>     -virtregmap
>>     -virtregrewriter
>>     -x86-avoid-SFB
>>     -x86-cf-opt
>>     -x86-cmov-conversion
>>     -x86-domain-reassignment
>>     -x86-evex-to-vex-compress
>>     -x86-execution-domain-fix
>>     -x86-fixup-bw-insts
>>     -x86-fixup-LEAs
>>     -x86-optimize-LEAs
>>
>> _______________________________________________
>> LLVM Developers mailing list
>> llvm-dev at lists.llvm.org
>> https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
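The flag comparison described in the quoted message (collect every flag from the "Pass Arguments" lines of each dump, then take the set difference in both directions) can be sketched in a few lines of Python. This is only an illustration: it assumes each dump has been saved as text (the `-debug-pass=Structure` output normally goes to stderr) and that the relevant lines start with `Pass Arguments:`. The function names `pass_flags` and `compare` are made up for this sketch.

```python
def pass_flags(dump):
    """Collect every flag that appears on a 'Pass Arguments:' line of a dump."""
    flags = set()
    for line in dump.splitlines():
        if line.startswith("Pass Arguments:"):
            # Everything after the colon is a whitespace-separated flag list.
            flags.update(line.split(":", 1)[1].split())
    return flags


def compare(single_step_dump, multi_step_dump):
    """Return (flags only in the multi-step dump, flags only in the single-step dump)."""
    one = pass_flags(single_step_dump)
    multi = pass_flags(multi_step_dump)
    return sorted(multi - one), sorted(one - multi)
```

Splitting on whitespace keeps each flag separate. Names such as `-callsite-splitting-ipsccp` in the list above may be two flags that were joined during extraction, so it could be worth re-checking the counts with a split like this one.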
Mehdi AMINI via llvm-dev
2020-Jul-03  02:55 UTC
[llvm-dev] flags to reproduce clang -O3 with opt -O3
On Thu, Jul 2, 2020 at 7:50 PM Chad Verbowski <chad at verbowski.com> wrote:
> Awesome, thanks!
>
> I'd like to have the last step (llc in your example) not perform
> additional optimization passes, such as O3, and simply use the O3 pass from
> opt in the previous line.
>
> Do you happen to know if I should use 'llc -O0 foo_o.bc -o foo.exe'
> instead to achieve this?

No, you should use `llc -O3`: this controls only the backend (code generation) part of the pipeline.
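Putting the suggestions in this thread together, the multi-step build might look roughly like the sketch below. The first three commands follow the advice above. The `-filetype=obj` option and the final link step are additions of this sketch, needed because llc writes assembly by default rather than an executable. The helper names `pipeline` and `run` are hypothetical.

```python
import subprocess


def pipeline(src="foo.c", exe="foo.exe"):
    """Build the argv lists for a clang -> opt -> llc -> link sequence."""
    stem = src.rsplit(".", 1)[0]
    return [
        # Front end at -O3 without running LLVM passes, so the IR is not marked optnone.
        ["clang", "-O3", "-Xclang", "-disable-llvm-passes",
         "-c", "-emit-llvm", src, "-o", f"{stem}.bc"],
        # Middle-end optimizations.
        ["opt", "-O3", f"{stem}.bc", "-o", f"{stem}_o.bc"],
        # Backend optimizations; -filetype=obj is an assumption of this sketch.
        ["llc", "-O3", "-filetype=obj", f"{stem}_o.bc", "-o", f"{stem}.o"],
        # Link step, also an assumption of this sketch.
        ["clang", f"{stem}.o", "-o", exe],
    ]


def run(commands):
    """Run each command in order, stopping at the first failure."""
    for cmd in commands:
        subprocess.run(cmd, check=True)
```

Calling `run(pipeline())` would execute the four steps. Even then the result is not guaranteed to be byte-identical to `clang -O3 foo.c -o foo.exe`, since clang may pass target options to the backend that llc does not pick by default.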
Chad Verbowski via llvm-dev
2020-Jul-03  03:00 UTC
[llvm-dev] flags to reproduce clang -O3 with opt -O3
Thanks. My intent is to reduce the overall compile time by eliminating unused optimizations.

Do you happen to know if there is a list somewhere (or a way to dump / extract them) of the individual flags which llc uses as part of -O3, so I can perhaps experiment with removing the unnecessary ones for my code?

On Thu, Jul 2, 2020 at 7:55 PM Mehdi AMINI <joker.eph at gmail.com> wrote:
> On Thu, Jul 2, 2020 at 7:50 PM Chad Verbowski <chad at verbowski.com> wrote:
>> Do you happen to know if I should use 'llc -O0 foo_o.bc -o foo.exe'
>> instead to achieve this?
>
> No you should use `llc -O3`: this is controlling only the backend part of
> the pipeline.
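One possible way to get such a list is to reuse the `-debug-pass=Structure` option mentioned earlier in the thread, this time on llc itself, and read the pass names from its stderr. The sketch below assumes the legacy pass manager output format with `Pass Arguments:` lines, and a Unix-like system where `/dev/null` exists. All function names are hypothetical.

```python
import subprocess


def llc_dump_command(bitcode="foo_o.bc", level="-O3"):
    """argv for an llc run whose only purpose is the pass dump on stderr."""
    return ["llc", level, "-debug-pass=Structure", bitcode, "-o", "/dev/null"]


def ordered_passes(stderr_text):
    """Flags from all 'Pass Arguments:' lines, first occurrence order, no repeats."""
    seen, order = set(), []
    for line in stderr_text.splitlines():
        if not line.startswith("Pass Arguments:"):
            continue
        for flag in line.split(":", 1)[1].split():
            if flag not in seen:
                seen.add(flag)
                order.append(flag)
    return order


def dump_llc_passes(bitcode="foo_o.bc", level="-O3"):
    """Run llc and return the pass flags it reports (requires llc on PATH)."""
    result = subprocess.run(llc_dump_command(bitcode, level),
                            capture_output=True, text=True, check=True)
    return ordered_passes(result.stderr)
```

Comparing `dump_llc_passes(level="-O3")` with `dump_llc_passes(level="-O0")` would show which backend passes -O3 adds. Whether any one of them can be switched off individually is a separate question, since the names in the dump are not guaranteed to be accepted as command-line options.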