Mehdi AMINI via llvm-dev
2019-Oct-19 17:48 UTC
[llvm-dev] Replicate Individual O3 optimizations
On Thu, Oct 17, 2019 at 11:22 AM David Greene via llvm-dev <llvm-dev at lists.llvm.org> wrote:

> hameeza ahmed via llvm-dev <llvm-dev at lists.llvm.org> writes:
>
> > Hello,
> > I want to study the individual O3 optimizations. For this I am using
> > following commands, but unable to replicate O3 behavior.
> >
> > 1. Documents/clang+llvm-9.0.0-x86_64-linux-gnu-ubuntu-18.04/bin/clang -O1
> > -Xclang -disable-llvm-passes -emit-llvm -S vecsum.c -o vecsum-noopt.ll
> >
> > 2. Documents/clang+llvm-9.0.0-x86_64-linux-gnu-ubuntu-18.04/bin/clang -O3
> > -mllvm -debug-pass=Arguments -emit-llvm -S vecsum.c
> >
> > 3. Documents/clang+llvm-9.0.0-x86_64-linux-gnu-ubuntu-18.04/bin/opt
> > <optimization sequence obtained in step 2> -S vecsum-noopt.ll -S -o
> > o3-chk.ll
> >
> > Why the IR obtained by above step i.e individual O3 sequences, is not same
> > when O3 is passed?
> >
> > Where I am doing mistake?

If you could provide the full reproducer, it could help to debug this.

> I think you need to turn off LLVM optimizations when doing the
> -emit-llvm dump. Something like this:
>
> Documents/clang+llvm-9.0.0-x86_64-linux-gnu-ubuntu-18.04/bin/clang -O3 \
> -mllvm -debug-pass=Arguments -Xclang -disable-llvm-optzns -emit-llvm \
> -S vecsum.c
>
> Otherwise you are effectively running the O3 pipeline twice, as clang
> will emit LLVM IR after optimization, not before (this confused me too
> when I first tried it).

This is the common pitfall indeed!
I think they are doing it correctly in step 1 though by including:
`-Xclang -disable-llvm-passes`.

> That said, I'm not sure you will get the same IR out of opt as with
> clang -O3 even with the above. For example, clang sets
> TargetTransformInfo for the pass pipeline and the detailed information
> it uses may or may not be transmitted via the IR it dumps out. I have
> not personally tried to do this kind of thing in a while.

I struggled as well to setup TTI and TLI the same way clang does :(
It'd be nice to revisit our PassManagerBuilder setup and the opt
integration to provide reproducibility (maybe could be a starter
project for someone?).

--
Mehdi
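The two-step workflow discussed above can be sketched as a few shell functions. This is a minimal sketch, not a guaranteed reproduction of clang -O3: `CLANG` and `OPT` are hypothetical override variables introduced here for illustration, the function names are made up, and using -O3 (rather than -O1) in the first step is an assumption. As noted above, opt may still set up TTI and TLI differently from clang, so the final diff can be non-empty even when every command is right.

```shell
#!/bin/sh
# Sketch only: CLANG and OPT are hypothetical override variables that
# default to the tools found on PATH.

# Emit IR before any LLVM pass has run. -disable-llvm-passes skips the
# optimization pipeline; -O3 is an assumption, chosen to match the
# pipeline that opt runs afterwards.
emit_unoptimized_ir() {
    "${CLANG:-clang}" -O3 -Xclang -disable-llvm-passes -S -emit-llvm "$1" -o "$2"
}

# Run the default O3 pipeline with opt on the unoptimized IR.
run_opt_o3() {
    "${OPT:-opt}" -O3 -S "$1" -o "$2"
}

# Reference output: let clang run the O3 pipeline itself.
emit_clang_o3_ir() {
    "${CLANG:-clang}" -O3 -S -emit-llvm "$1" -o "$2"
}
```

With the file names from the original question, usage would look like this:

    emit_unoptimized_ir vecsum.c vecsum-noopt.ll
    run_opt_o3 vecsum-noopt.ll o3-chk.ll
    emit_clang_o3_ir vecsum.c vecsum-o3.ll
    diff o3-chk.ll vecsum-o3.ll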
Neil Nelson via llvm-dev
2019-Oct-21 03:24 UTC
[llvm-dev] Replicate Individual O3 optimizations
is_sorted.cpp

    bool is_sorted(int *a, int n) {
      for (int i = 0; i < n - 1; i++)
        if (a[i] > a[i + 1])
          return false;
      return true;
    }

https://blog.regehr.org/archives/1605 How Clang Compiles a Function
https://blog.regehr.org/archives/1603 How LLVM Optimizes a Function

clang version 10.0.0, Xubuntu 19.04

    clang is_sorted.cpp -S -emit-llvm -o is_sorted_.ll
    clang is_sorted.cpp -O0 -S -emit-llvm -o is_sorted_O0.ll
    clang is_sorted.cpp -O0 -Xclang -disable-llvm-passes -S -emit-llvm -o is_sorted_disable.ll

No difference in the prior three ll files.

    clang is_sorted.cpp -O1 -S -emit-llvm -o is_sorted_O1.ll

Many differences between is_sorted_O1.ll and is_sorted_.ll.

    opt -O3 -S is_sorted_.ll -o is_sorted_optO3.ll

    clang is_sorted.cpp -mllvm -debug-pass=Arguments -O3 -S -emit-llvm -o is_sorted_O3arg.ll
    opt <optimization sequence obtained in prior step> -S is_sorted_.ll -o is_sorted_opt_parms.ll

No difference between is_sorted_optO3.ll and is_sorted_opt_parms.ll, the last two opt runs.
Many differences between is_sorted_O3arg.ll and is_sorted_opt_parms.ll, the last two runs,
clang and opt.

Conclusions:

Given my current understanding, the ll files from the first three clang runs
are before any optimizations. Those ll files are from the front-end phase (CFE).
But this is a simple program, and for a more complex program
the ll files could be different.

Whether we use a -O3 optimization or use the parameters provided by clang for a
-O3 optimization, we obtain the same result.

The difference in question is why an opt run using the CFE ll before optimization
obtains a different ll than a CFE run that includes optimization. That is, for this case,
it is not the expansion of the -O3 parameters that is the difference.

Initially, it would be interesting to have an ll listing before optimization from the
clang run that includes optimization to compare with the ll from the clang run without
optimization.
Neil Nelson
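The "no difference" and "many differences" checks above can be scripted. The helper below is a minimal sketch with a hypothetical name (`same_ir`): it compares two textual IR files while ignoring the `; ModuleID` comment line, which records the name of the input file and so typically differs between IR written by clang and IR written by opt even when the rest is identical.

```shell
# Sketch only: same_ir is a hypothetical helper, not an LLVM tool.
# It succeeds (exit status 0) when the two files match once the
# "; ModuleID" comment line has been dropped from each.
same_ir() {
    a=$(grep -v '^; ModuleID' "$1")
    b=$(grep -v '^; ModuleID' "$2")
    [ "$a" = "$b" ]
}
```

For example, `same_ir is_sorted_optO3.ll is_sorted_opt_parms.ll && echo identical` would report whether the two opt runs agree.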
hameeza ahmed via llvm-dev
2019-Oct-24 11:04 UTC
[llvm-dev] Replicate Individual O3 optimizations
I ran a matrix multiplication code with both approaches: -O3 applied by clang and -O3 applied by opt. The clang -O3 version is about 2.97x faster than the opt -O3 version.
David Greene via llvm-dev
2019-Oct-25 19:14 UTC
[llvm-dev] Replicate Individual O3 optimizations
Neil Nelson <nnelson at infowest.com> writes:

> clang is_sorted.cpp -S -emit-llvm -o is_sorted_.ll
> clang is_sorted.cpp -O0 -S -emit-llvm -o is_sorted_O0.ll
> clang is_sorted.cpp -O0 -Xclang -disable-llvm-passes -S -emit-llvm -o is_sorted_disable.ll
>
> No difference in the prior three ll files.
>
> clang is_sorted.cpp -O1 -S -emit-llvm -o is_sorted_O1.ll
>
> Many differences between is_sorted_O1.ll and is_sorted_.ll.

Sure. One is optimized and the other is not.

> opt -O3 -S is_sorted_.ll -o is_sorted_optO3.ll
>
> clang is_sorted.cpp -mllvm -debug-pass=Arguments -O3 -S -emit-llvm -o is_sorted_O3arg.ll
> opt <optimization sequence obtained in prior step> -S is_sorted_.ll -o is_sorted_opt_parms.ll
>
> No difference between is_sorted_optO3.ll and is_sorted_opt_parms.ll, the last two opt runs.

Ok. This isn't surprising to me.

> Many differences between is_sorted_O3arg.ll and is_sorted_opt_parms.ll, the last two runs,
> clang and opt.

I think the problem is that without an optimization argument (-O1, -O3,
etc.) clang sets the "optnone" attribute on all functions and opt will
refuse to optimize. I think this is very unfortunate behavior.

> Conclusions:
>
> Given my current understanding, the ll files from the first three clang runs
> are before any optimizations. Those ll files are from the front-end phase (CFE).
> But this is a simple program and it may be that for a more complex program that
> the ll files could be different.
>
> Whether or not we use a -O3 optimization or use the parameters provided by clang for a
> -O3 optimization, we obtain the same result.

Yep, in both cases opt is not doing any optimization. :)

> The difference in question is why an opt run using the CFE ll before optimization
> obtains a different ll than a CFE run that includes optimization. That is, for this case,
> it is not the expansion of the -O3 parameters that is the difference.

I think it's the optnone attribute set by clang in the first three runs.

> Initially, it would be interesting to have an ll listing before optimization from the
> clang run that includes optimization to compare with the ll from the clang run without
> optimization.

Unfortunately I don't know of a great way to do that. -mllvm
-print-before-all might be close but it will also dump out a ton of
stuff and not all dumps are complete (e.g. only dumps a Function rather
than the whole Module).

-David
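The optnone explanation above is easy to check on an IR file, and the -print-before-all dump can be captured in a file for inspection. The functions below are a minimal sketch: the function names and the `CLANG` override variable are hypothetical, and `-Xclang -disable-O0-optnone` is a clang cc1 option that is not mentioned in this thread, so treat it as an assumption to verify against your clang version. The alternative already shown earlier in the thread is to pass an optimization level together with `-Xclang -disable-llvm-passes`.

```shell
# Sketch only: CLANG is a hypothetical override variable that defaults
# to the clang found on PATH.

# Exit status 0 means the IR file mentions the optnone attribute.
has_optnone() {
    grep -q 'optnone' "$1"
}

# Emit -O0 IR without optnone, so that opt is allowed to optimize it.
# Assumption: -disable-O0-optnone is not named in the thread.
emit_ir_without_optnone() {
    "${CLANG:-clang}" -O0 -Xclang -disable-O0-optnone -S -emit-llvm "$1" -o "$2"
}

# Save the -print-before-all dump, which LLVM writes to stderr.
# As noted above, the dump is large and individual entries may cover
# only one function rather than the whole module.
dump_ir_before_passes() {
    "${CLANG:-clang}" -O3 -mllvm -print-before-all -c "$1" -o /dev/null 2> "$2"
}
```

For example, `has_optnone is_sorted_.ll && echo "opt will skip these functions"` would confirm whether the first three clang runs produced optnone functions.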