thr3ads.net - llvm dev - [llvm-dev] Replication -O3 optimizations manually [Aug 2018]


cszide via llvm-dev

2018-Aug-17 01:55 UTC

[llvm-dev] Replication -O3 optimizations manually

Hi Stefano,
I have the same problem that Emanuele describes. You say that clang schedules
both target-independent and target-dependent passes.
However, when I use lli to execute bitcode generated by opt with -O3, or with
the same optimization passes that -O3 runs, the performance is still different.
So, does the -O3 option perform some special operations? I read the source code
of opt, but I could not find the reason.

Best regards
Zide

At 2018-08-16 22:13:14, "Stefano Cherubin via llvm-dev" <llvm-dev
at lists.llvm.org> wrote:

Hello Emanuele,


When you pass the optimization level -O3 to the clang driver, it does not
simply schedule a sequence of passes to be run on the intermediate
representation.
It schedules both target-independent and target-dependent passes.
Moreover, IIRC, the optimization level is also used in the later stages of
code generation to apply target-dependent optimizations (e.g. vectorizer).


The most common workflow when someone wants to test their own pass/work within
the LLVM toolchain is the following:

- use clang to generate an LLVM-IR file
- use opt to run your desired pass / pass sequence and output another LLVM-IR
file
- use clang -O3 to compile to executable machine code


However, with this approach you will run the passes on the LLVM-IR twice.
There are use cases where this could invalidate your results.
As opt stops at the LLVM-IR level, I would suggest that you also use other LLVM
tools (such as llc / llvm-mc) to run individually the backend stages / sequences
of passes which cannot be run by opt.
An extensive list of tools/commands you can use is available at [0].
For your specific case, I would suggest that you have a look at this restricted
schema [1].
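
The two flows described above could be sketched roughly as follows. This is
only an illustration under stated assumptions: the file names (main.c, main.ll,
main.opt.bc) and the MY_PASSES pass list are placeholders that do not come from
this thread, and the script only prints the commands unless DRY_RUN=0 is set.

```shell
#!/bin/sh
# Sketch of the "opt then clang -O3" flow and of an llc-based alternative.
# main.c and MY_PASSES are placeholders (assumptions), not taken from the thread.
DRY_RUN="${DRY_RUN:-1}"                          # 1 = print only, 0 = execute
MY_PASSES="${MY_PASSES:--mem2reg -instcombine}"  # hypothetical pass list

# Print a command and, unless in dry-run mode, execute it.
run() {
    echo "+ $*"
    [ "$DRY_RUN" = "1" ] || "$@"
}

# Flow 1: the IR goes through opt and then through clang -O3,
# so IR-level optimizations are applied a second time.
common_flow() {
    run clang -S -emit-llvm main.c -o main.ll
    run opt $MY_PASSES main.ll -o main.opt.bc
    run clang -O3 main.opt.bc -o exe
}

# Flow 2: the back end is driven separately with llc, so the
# IR optimization pipeline is not scheduled again after opt.
split_flow() {
    run clang -S -emit-llvm main.c -o main.ll
    run opt $MY_PASSES main.ll -o main.opt.bc
    run llc -O3 -filetype=obj main.opt.bc -o main.o
    run clang main.o -o exe
}

common_flow
split_flow
```

Keeping the back end in a separate llc step makes it easier to see which stage
a performance difference comes from.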


There is yet another way to get into even finer-grained detail.
You can check which clang driver actions are run for a given command line.
See [2].
From there you can rebuild the exact sequence of commands that the
clang driver triggers.
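
As a hedged illustration of inspecting the driver, the sketch below prints two
clang invocations: one with -ccc-print-phases, which lists the driver phases,
and one with -###, which prints the sub-commands without running them (clang
writes that listing to stderr). The file name main.c is a placeholder, and the
commands are only executed when DRY_RUN=0 is set.

```shell
#!/bin/sh
# Sketch: ask the clang driver what it would do for an -O3 compilation.
# main.c is a placeholder file name (assumption).
DRY_RUN="${DRY_RUN:-1}"   # 1 = print only, 0 = execute

# Print a command and, unless in dry-run mode, execute it.
run() {
    echo "+ $*"
    [ "$DRY_RUN" = "1" ] || "$@"
}

show_driver_plan() {
    # List the phases (preprocess, compile, backend, assemble, link).
    run clang -O3 -ccc-print-phases "$1"
    # Print the cc1 and linker command lines without executing them.
    run clang -O3 '-###' "$1"
}

show_driver_plan main.c
```

The cc1 line printed by -### is where flags such as -vectorize-loops show up
when -O3 is given.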


If you can provide more details about your use case (measuring performance,
pass development and testing, flag selection, phase ordering), we can suggest
the most suitable approach.


Kind regards,


Stefano Cherubin


[0] http://llvm.org/docs/CommandGuide/
[1] https://github.com/skeru/LLVM-intro/blob/master/img/03/toolchain.pdf
[2] https://clang.llvm.org/docs/DriverInternals.html#driver-stages









On Thursday, 16 August 2018, 12:46:04 CEST, Emanuele Del Sozzo via llvm-dev
<llvm-dev at lists.llvm.org> wrote:





Hello llvm-dev,

my name is Emanuele and I am an intern at ARM. As part of the project I am doing
here, I would like to manually replicate the optimizations that LLVM applies
when I type -O3. In other words, I would like to know which compilation
flags/passes -O3 triggers.

I noticed that GCC reports, on its website, all the flags that are enabled by
-O3 (https://gcc.gnu.org/onlinedocs/gcc/Optimize-Options.html), but I wasn't
able to find something similar in the LLVM documentation. On the other hand, I
found that this command displays all the optimization passes applied by opt when
the -O3 flag is on:

llvm-as < /dev/null | opt -O3 -disable-output -debug-pass=Arguments

I tried to apply the same optimization passes through opt but, even though the
performance is similar, the resulting binary is slower than the one generated
using -O3 (the binaries also differ, of course).

Again, I found this other command that does something similar (it lists the
sequence of optimization passes applied):

clang -O3 -mllvm -debug-pass=Arguments file.c 

In this case, the performance is still different, and some of the optimization
passes listed in the last block of passes (e.g. -machinemoduleinfo,
-stack-protector, etc.) are unknown to opt.
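
To see which passes belong to which block, the output of -debug-pass=Arguments
can be split per "Pass Arguments:" line. The sketch below is a small helper for
that; the sample text is a shortened, hypothetical stand-in for real tool
output, and the function name extract_pass_blocks is made up for this
illustration.

```shell
#!/bin/sh
# Sketch: print each "Pass Arguments:" line as a separate numbered block.
# Reads tool output on stdin and ignores every other line.
extract_pass_blocks() {
    grep '^Pass Arguments:' \
        | sed 's/^Pass Arguments: *//' \
        | awk '{ print "block " NR ": " $0 }'
}

# Hypothetical, shortened sample (an assumption, not real output).
sample='Pass Arguments:  -tti -tbaa -simplifycfg
Pass Arguments:  -targetlibinfo -tti -instcombine -loop-vectorize'

printf '%s\n' "$sample" | extract_pass_blocks
```

With the real tools, the input might come from something like
`clang -O3 -mllvm -debug-pass=Arguments file.c 2>&1 | extract_pass_blocks`,
on the assumption that the pass list is written to stderr. Passes such as
-machinemoduleinfo and -stack-protector belong to the code generator, which is
why opt does not accept them.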




That said, my question is: how can I find out which optimization passes/flags
-O3 enables, so that I can manually apply the same optimizations and, hopefully,
obtain the same binary and performance?




I am currently using LLVM version 5.0.2.




Thank you for both your help and your time!




Best regards

Emanuele




IMPORTANT NOTICE: The contents of this email and any attachments are
confidential and may also be privileged. If you are not the intended recipient,
please notify the sender immediately and do not disclose the contents to any
other person, use it for any purpose, or store or copy the information in any
medium. Thank you.
_______________________________________________
LLVM Developers mailing list
llvm-dev at lists.llvm.org
http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev

Emanuele Del Sozzo via llvm-dev

2018-Aug-17 09:12 UTC

[llvm-dev] Replication -O3 optimizations manually

Hi Stefano,

first of all, thank you for your reply!


Here are the compilation steps I am currently applying:

1) clang main.c -Xclang -disable-O0-optnone -fomit-frame-pointer -Xclang -vectorize-loops -Xclang -vectorize-slp -momit-leaf-frame-pointer -S -emit-llvm -o main.ll
2) opt main.ll $myPasses -o main.bc
3) llc main.bc -o main.s
4) clang -c main.s
5) clang main.o -lm -o exe -mno-relax-all


With respect to each step:

1) I use -disable-O0-optnone because I noticed that, when I do not use any
-Olevel flag (I leave the default -O0), opt ignores most of the optimization
passes I provide.

Moreover, using the following command:

clang file.c -xc -O3 -o /dev/null -###

I noticed that clang activates some optimization flags (e.g.
-vectorize-loops, -vectorize-slp) that are otherwise not enabled at -O0.
2) $myPasses contains all the optimization passes extracted using this command:

llvm-as < /dev/null | opt -O3 -disable-output -debug-pass=Arguments

I also tried using the passes provided by this command:

clang -O3 -mllvm -debug-pass=Arguments file.c

but, as I said before, some of the optimization passes generate an error

5) -mno-relax-all is there because -mrelax-all is enabled at -O0, while it is
not enabled at -O3.
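
Regarding step 1, one way to check whether the IR would still make opt skip
most passes is to look for the optnone attribute in the textual IR. The sketch
below counts attribute groups that contain optnone; the helper name
count_optnone_groups and the tiny IR sample are made up for this illustration.

```shell
#!/bin/sh
# Sketch: count attribute groups carrying "optnone" in a textual IR file.
# Attribute groups look like: attributes #0 = { noinline nounwind optnone }
count_optnone_groups() {
    grep -c '^attributes #[0-9]* = {.*optnone' "$1"
}

# Hypothetical minimal IR resembling what clang might emit at -O0.
tmp="$(mktemp)"
cat > "$tmp" <<'EOF'
define i32 @main() #0 {
  ret i32 0
}
attributes #0 = { noinline nounwind optnone }
EOF

echo "attribute groups with optnone: $(count_optnone_groups "$tmp")"
rm -f "$tmp"
```

A count of zero on main.ll would suggest that -disable-O0-optnone (or compiling
with an -O level) took effect.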


I am currently measuring performance in terms of execution time, using a
benchmark suite that automatically reports the execution time of each
benchmark. In some cases my execution times are close to the ones obtained
with -O3, while in other cases my results are definitely worse.


I also tried to compare the IR generated by step 2) against the one generated
by the following command:

clang main.c -S -emit-llvm

and, of course, they are different. I noticed that the latter also contains more
metadata.


Am I missing any optimizations?


Thank you for your help!


Best regards

Emanuele Del Sozzo

Stefano Cherubin via llvm-dev

2018-Aug-17 10:44 UTC

[llvm-dev] Replication -O3 optimizations manually

Hi Zide,

the scope of opt is limited to the LLVM-IR, which is meant to be always target
independent.
In order to apply backend optimizations you need to lower the representation to
something closer to the machine level.
I would suggest that you measure performance on machine code, not on LLVM-IR.
To this end, please refer to the setup Emanuele is using.

However, I may not have properly understood your test.
lli is the LLVM-IR interpreter and it is meant more for functional testing
than for performance testing.
Are you comparing the performance of machine code generated by clang -O3 against
the performance of lli optimized_IR.bc?

Best regards,

Stefano Cherubin

Stefano Cherubin via llvm-dev

2018-Aug-17 10:58 UTC

[llvm-dev] Replication -O3 optimizations manually

Hi Emanuele,

The first thing I would highlight in your compilation flow is the absence of any
optimization level in compilation step 3.
If your goal is to compare code and performance against clang -O3, you probably
need to add -O3 to the llc command as well.

I honestly don't know which other steps the clang optimization level may
affect.
If the difference is still not negligible after the aforementioned fix, I
suggest replacing steps 3, 4 and 5 with a simpler

clang -O3 main.bc -o exe -lm

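The two suggestions in this message could be sketched as follows, reusing the
file names from the step list quoted in this thread (main.bc, main.s, main.o,
exe). This is only a sketch: the commands are printed, and executed only when
DRY_RUN=0 is set.

```shell
#!/bin/sh
# Sketch of the two back-end variants suggested in this message.
DRY_RUN="${DRY_RUN:-1}"   # 1 = print only, 0 = execute

# Print a command and, unless in dry-run mode, execute it.
run() {
    echo "+ $*"
    [ "$DRY_RUN" = "1" ] || "$@"
}

# Variant A: keep steps 3 to 5, but pass -O3 to llc as well.
backend_with_llc() {
    run llc -O3 main.bc -o main.s
    run clang -c main.s -o main.o
    run clang main.o -lm -o exe
}

# Variant B: let the clang driver handle code generation and linking.
# Note that clang -O3 on bitcode also schedules IR-level passes again.
backend_with_clang() {
    run clang -O3 main.bc -o exe -lm
}

backend_with_llc
backend_with_clang
```
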
Best regards,
Stefano Cherubin

Emanuele Del Sozzo via llvm-dev

2018-Aug-17 15:49 UTC

[llvm-dev] Replication -O3 optimizations manually

Hi Zide,

I think I found the right way to reach my goal.

I used the following command:

clang -O3 -Xclang -disable-llvm-optzns main.c -S -emit-llvm -o main.ll

to generate an IR file enriched with all the metadata that would otherwise not
be generated at -O0. Moreover, the -disable-llvm-optzns flag ensures that none
of the optimization passes has been applied to the IR yet.

In this way, I can replicate the -O3 result by applying the optimization passes
using opt. Apparently, that metadata is necessary to fully optimize the code.
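
Put together with the rest of the thread, the whole recipe might look like the
sketch below. It is not a verified reproduction of clang -O3: the llc -O3 step
follows the suggestion made earlier in the thread, MY_PASSES is a placeholder
that falls back to opt -O3 when unset, and commands are only printed unless
DRY_RUN=0 is set. In newer clang releases -disable-llvm-optzns is an alias of
-disable-llvm-passes.

```shell
#!/bin/sh
# Sketch: unoptimized -O3 IR from clang, then opt, llc, assemble and link.
# MY_PASSES is a placeholder (assumption); when empty, opt -O3 is used.
DRY_RUN="${DRY_RUN:-1}"   # 1 = print only, 0 = execute

# Print a command and, unless in dry-run mode, execute it.
run() {
    echo "+ $*"
    [ "$DRY_RUN" = "1" ] || "$@"
}

replicate_o3() {
    # Emit IR with the -O3 front-end settings but without running LLVM passes.
    run clang -O3 -Xclang -disable-llvm-optzns main.c -S -emit-llvm -o main.ll
    # Apply the IR-level passes manually.
    run opt ${MY_PASSES:--O3} main.ll -o main.bc
    # Generate assembly with the back end at the same optimization level.
    run llc -O3 main.bc -o main.s
    # Assemble and link.
    run clang -c main.s -o main.o
    run clang main.o -lm -o exe
}

replicate_o3
```

Comparing the resulting binary against a plain `clang -O3 main.c -lm -o exe`
build is then a way to check how close the manual pipeline gets.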


I hope that this may help you too.


Best regards

Emanuele Del Sozzo
