thr3ads.net - llvm dev - [llvm-dev] Replicate Individual O3 optimizations [Oct 2019]

hameeza ahmed via llvm-dev

2019-Oct-24 11:04 UTC

[llvm-dev] Replicate Individual O3 optimizations

I ran matrix multiplication code with both approaches: -O3 in clang and
-O3 in opt. clang -O3 is about 2.97x faster than opt -O3.
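The matrix multiplication code itself was not posted. Anyone wanting to repeat the comparison could use a minimal, hypothetical stand-in like the one below. The name `matmul`, the use of `std::vector`, and the row-major layout are assumptions, not the original code. It is a naive triple loop, the kind of kernel whose speed depends heavily on which optimizations actually run.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical stand-in for the unposted matrix multiplication code.
// Computes c = a * b for n x n matrices stored row-major.
// The caller must size c to at least n * n elements.
void matmul(const std::vector<double>& a, const std::vector<double>& b,
            std::vector<double>& c, std::size_t n) {
    for (std::size_t i = 0; i < n; ++i) {
        for (std::size_t j = 0; j < n; ++j) {
            double sum = 0.0;
            // Dot product of row i of a with column j of b.
            for (std::size_t k = 0; k < n; ++k)
                sum += a[i * n + k] * b[k * n + j];
            c[i * n + j] = sum;
        }
    }
}
```

To compare the two routes on such a file, build it once with `clang++ -O3`. Build it a second time from IR emitted by `clang++ -O3 -Xclang -disable-llvm-passes -S -emit-llvm` and then run through `opt -O3`. Time both builds with the same driver. Given the replies below, the two builds should not be expected to match exactly.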



On Mon, Oct 21, 2019 at 8:24 AM Neil Nelson <nnelson at infowest.com>
wrote:
> is_sorted.cpp
>
> bool is_sorted(int *a, int n) {
>   for (int i = 0; i < n - 1; i++)
>     if (a[i] > a[i + 1])
>       return false;
>   return true;
> }
>
> https://blog.regehr.org/archives/1605 How Clang Compiles a Function
> https://blog.regehr.org/archives/1603 How LLVM Optimizes a Function
> clang version 10.0.0, Xubuntu 19.04
>
> clang is_sorted.cpp -S -emit-llvm -o is_sorted_.ll
> clang is_sorted.cpp -O0 -S -emit-llvm -o is_sorted_O0.ll
> clang is_sorted.cpp -O0 -Xclang -disable-llvm-passes -S -emit-llvm -o is_sorted_disable.ll
>
> No difference in the prior three ll files.
>
> clang is_sorted.cpp -O1 -S -emit-llvm -o is_sorted_O1.ll
>
> Many differences between is_sorted_O1.ll and is_sorted_.ll.
>
> opt -O3 -S is_sorted_.ll -o is_sorted_optO3.ll
>
> clang is_sorted.cpp -mllvm -debug-pass=Arguments -O3 -S -emit-llvm -o is_sorted_O3arg.ll
> opt <optimization sequence obtained in prior step> -S is_sorted_.ll -o is_sorted_opt_parms.ll
>
> No difference between is_sorted_optO3.ll and is_sorted_opt_parms.ll, the
> last two opt runs.
> Many differences between is_sorted_O3arg.ll and is_sorted_opt_parms.ll, the
> last two runs, clang and opt.
>
> Conclusions:
>
> Given my current understanding, the ll files from the first three clang runs
> come from before any optimizations, that is, from the front-end phase (CFE).
> But this is a simple program, and for a more complex program the ll files
> could differ.
>
> Whether we pass -O3 to opt or pass the pass list that clang reports for
> -O3, we obtain the same result.
>
> The question is why an opt run on the CFE ll from before optimization
> produces a different ll than a clang run that includes optimization. That
> is, in this case the expansion of the -O3 parameters is not what makes the
> difference.
>
> As a first step, it would be interesting to have an ll listing from before
> optimization in the clang run that includes optimization, to compare with
> the ll from the clang run without optimization.
>
> Neil Nelson
>
> On 10/19/19 11:48 AM, Mehdi AMINI via llvm-dev wrote:
>
>
>
> On Thu, Oct 17, 2019 at 11:22 AM David Greene via llvm-dev <
> llvm-dev at lists.llvm.org> wrote:
>
>> hameeza ahmed via llvm-dev <llvm-dev at lists.llvm.org> writes:
>>
>> > Hello,
>> > I want to study the individual O3 optimizations. For this I am using the
>> > following commands, but I am unable to replicate the O3 behavior.
>> >
>> > 1. Documents/clang+llvm-9.0.0-x86_64-linux-gnu-ubuntu-18.04/bin/clang -O1
>> > -Xclang -disable-llvm-passes -emit-llvm -S vecsum.c -o vecsum-noopt.ll
>> >
>> > 2. Documents/clang+llvm-9.0.0-x86_64-linux-gnu-ubuntu-18.04/bin/clang -O3
>> > -mllvm -debug-pass=Arguments -emit-llvm -S vecsum.c
>> >
>> > 3. Documents/clang+llvm-9.0.0-x86_64-linux-gnu-ubuntu-18.04/bin/opt
>> > <optimization sequence obtained in step 2> -S vecsum-noopt.ll -S -o
>> > o3-chk.ll
>> >
>> > Why is the IR obtained by the above steps, i.e. the individual O3
>> > sequence, not the same as when -O3 is passed?
>> >
>> > Where am I making a mistake?
>>
>
> If you could provide the full reproducer, it could help to debug this.
>
>
>>
>> I think you need to turn off LLVM optimizations when doing the
>> -emit-llvm dump.  Something like this:
>>
>> Documents/clang+llvm-9.0.0-x86_64-linux-gnu-ubuntu-18.04/bin/clang -O3 \
>>   -mllvm -debug-pass=Arguments -Xclang -disable-llvm-optzns -emit-llvm \
>>   -S vecsum.c
>>
>> Otherwise you are effectively running the O3 pipeline twice, as clang
>> will emit LLVM IR after optimization, not before (this confused me too
>> when I first tried it).
>>
>
> This is the common pitfall indeed!
> I think they are doing it correctly in step 1 though by including:
> `-Xclang -disable-llvm-passes`.
>
>
>> That said, I'm not sure you will get the same IR out of opt as with
>> clang -O3 even with the above.  For example, clang sets
>> TargetTransformInfo for the pass pipeline and the detailed information
>> it uses may or may not be transmitted via the IR it dumps out.  I have
>> not personally tried to do this kind of thing in a while.
>
>
> I struggled as well to setup TTI and TLI the same way clang does :(
> It'd be nice to revisit our PassManagerBuilder setup and the opt
> integration to provide reproducibility (maybe could be a starter project
> for someone?).
>
> --
> Mehdi
>
>
> _______________________________________________
> LLVM Developers mailing list
> llvm-dev at lists.llvm.org
> https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev

-------------- next part --------------
An HTML attachment was scrubbed...
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20191024/2a1f30f4/attachment.html>
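The two routes produce different IR and, per the timing above, different speed. It may therefore also be worth checking that both builds compute the same results. Below is a minimal sketch of such a differential check, using Neil's `is_sorted` as the function under test. `check_is_sorted` is a hypothetical helper, not part of any LLVM tooling. It compares `is_sorted` with `std::is_sorted` on pseudo-random inputs and counts the disagreements.

```cpp
#include <algorithm>
#include <random>
#include <vector>

// Neil's test case: true if a[0..n) is in non-decreasing order.
bool is_sorted(int *a, int n) {
    for (int i = 0; i < n - 1; i++)
        if (a[i] > a[i + 1])
            return false;
    return true;
}

// Hypothetical differential check: runs `trials` pseudo-random arrays
// (length 0..8, values 0..3) through is_sorted and std::is_sorted and
// returns how many times they disagree (0 expected for a correct build).
int check_is_sorted(unsigned seed, int trials) {
    std::mt19937 rng(seed);
    std::uniform_int_distribution<int> len(0, 8), val(0, 3);
    int mismatches = 0;
    for (int t = 0; t < trials; ++t) {
        std::vector<int> v(len(rng));
        for (int &x : v) x = val(rng);
        // Sort every other input so both outcomes are exercised.
        if (t % 2 == 0) std::sort(v.begin(), v.end());
        bool expected = std::is_sorted(v.begin(), v.end());
        if (is_sorted(v.data(), static_cast<int>(v.size())) != expected)
            ++mismatches;
    }
    return mismatches;
}
```

Compile this once per pipeline and call it with the same seed in each build. Both builds should report zero mismatches. A nonzero count from only one of them would be worth investigating as a possible miscompile, rather than a mere performance difference.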

Neil Nelson via llvm-dev

2019-Oct-25 04:21 UTC

[llvm-dev] Replicate Individual O3 optimizations

Yes, this is another indication that the clang -O3 compile involves some
processing, or bridge, that is not evident when clang is used only to emit
IR before the optimization passes.

This may be explained on a documentation page I have not yet found, or it
may be a point the well informed have so far overlooked.

An issue noted here but not well addressed is this: a well-stated design
principle of LLVM is that front ends compile to unoptimized IR, which LLVM
then optimizes and prepares for the various back ends. Yet with clang -O3,
given this evidence, it is not easy to see where the division between the
clang front end and LLVM lies, even though the assumed design suggests it
should be quite clear.

We should be able to compile with clang to the IR before optimization, then
apply the LLVM optimizations separately, and obtain the same final IR as a
clang -O3 compile that does both steps. But we are not seeing that.

This also bears on the e2e thread. The assumed division implies that
separate clang and LLVM debugging sequences can be highly reliable, since
the intermediate IR passed between the two is not expected to be very error
prone. Errors are expected to lie mainly either in clang, in producing
correct IR, or in opt (LLVM), in optimizing that IR for the back end. But
since we cannot identify the IR passed between the two under clang -O3, it
is unclear which debugging sequence would cover what we cannot identify.

Neil Nelson

On 10/24/19 5:04 AM, hameeza ahmed wrote:
> I ran matrix multiplication code with both approaches: -O3 in clang and
> -O3 in opt. clang -O3 is about 2.97x faster than opt -O3.

David Blaikie via llvm-dev

2019-Oct-25 04:29 UTC

[llvm-dev] Replicate Individual O3 optimizations

It's 'known' (by some number of LLVM developers) that opt -O3 isn't the
same as clang -O3. It'd be nice if they were closer (patches welcome, etc.),
but it hasn't been a priority for anyone. opt -O3 is rarely used; usually
opt is used for testing specific optimizations.

Clang's IR output differs between -O0 and -O3 even before any LLVM
optimizations run. Lifetime intrinsics, for instance, are emitted only when
optimizations are enabled.

If you want to reproduce clang's -O3, it's best to use clang -O3, either
with source code or with LLVM IR generated by clang -O3 (so that it has the
lifetime intrinsics, etc.).
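A small hypothetical file like the one below can show two differences of this kind. The function name `sum_prefix` and the file name lifetime.cpp are illustrative only. The expectations here come from how recent clang versions behave and were not tested in this thread.

- `clang++ -O3 -Xclang -disable-llvm-passes -S -emit-llvm lifetime.cpp` should show `llvm.lifetime.start`/`llvm.lifetime.end` calls around `scratch`.
- Plain `clang++ -O0 -S -emit-llvm lifetime.cpp` should not show them. It should also mark the function `optnone` (and `noinline`), which makes `opt -O3` skip most optimization of it. That is a plausible reason why the is_sorted_.ll experiment above saw opt -O3 diverge so far from clang -O3. Clang's `-Xclang -disable-O0-optnone` drops that attribute.

```cpp
// Hypothetical example: `scratch` is a local array scoped to the if-block.
// With optimization enabled, clang is expected to bracket it with
// llvm.lifetime.start / llvm.lifetime.end in the emitted IR; at -O0 it is
// expected not to. Returns the sum of the first min(n, 16) elements.
int sum_prefix(const int *data, int n) {
    int total = 0;
    if (n > 0) {
        int scratch[16];  // lifetime limited to this block
        int m = n < 16 ? n : 16;
        for (int i = 0; i < m; ++i)
            scratch[i] = data[i];
        for (int i = 0; i < m; ++i)
            total += scratch[i];
    }
    return total;
}
```

Following David Blaikie's suggestion, IR emitted at -O3 with LLVM passes disabled is the closer starting point for `opt -O3`. Even so, as David Greene and Mehdi note above, target details such as the TTI/TLI setup may still keep the result from matching clang -O3 exactly.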
