thr3ads.net - llvm dev - [LLVMdev] [cfe-dev] Controlling the LTO optimization level [Mar 2015]

Peter Collingbourne

2015-Mar-18 23:27 UTC

[LLVMdev] Controlling the LTO optimization level

Hi all,

I wanted to start a thread to discuss ways to control the optimization
level when using LTO. We have found that there are use cases for the LTO
mechanism beyond whole-program optimization, in which full optimization
is not always needed or desired. We started that discussion over in
http://lists.cs.uiuc.edu/pipermail/llvm-commits/Week-of-Mon-20150316/266560.html
and I thought I'd summarize the problem and possible solutions here:

Problem
-------

As currently implemented, the control flow integrity checks in Clang rely on
a so-called bit set lowering pass to implement its checks efficiently. The
current implementation of the bit set lowering pass requires whole-program
visibility. The full details of why are described in the design document at:
http://clang.llvm.org/docs/ControlFlowIntegrityDesign.html

We currently achieve whole-program visibility using LTO. The trouble with LTO
is that it comes with a significant compile time cost -- on large programs
such as Chrome, compiling with link-time optimization can be over 7x slower
(over 3 hours has been measured) than compiling without.

So I would like there to be a way for users to choose whether to apply
optimizations, and how much optimization to apply.

Achieving this requires a design for how users should specify the level of
optimization to apply, as well as a design for changes to the clang driver
and the various LTO plugins so that the plugin knows whether optimizations
are required.

Solutions
---------

1) Controlled at compile time

Strawman proposal for command line syntax:

-flto-level=X means optimize at level X. At link time, the LTO plugin will
take the maximum of all -flto-level flags and optimize at that level.

-flto-level is inferred from other flags if not specified:

-flto implies -flto-level=2.
If -flto not specified, -O >= 1 implies -flto-level=1.
Otherwise, default to -flto-level=0.
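
For illustration only, the inference rules and the link-time "take the maximum" rule above could be sketched as follows. The function names are hypothetical and not part of any patch, and returning 0 when there are no inputs is an assumption the proposal does not spell out.

```python
from typing import Iterable, Optional


def infer_lto_level(flto: bool, opt_level: int,
                    flto_level: Optional[int] = None) -> int:
    """Per-translation-unit LTO level under the strawman rules."""
    if flto_level is not None:
        # An explicit -flto-level=X always wins.
        return flto_level
    if flto:
        # -flto implies -flto-level=2.
        return 2
    if opt_level >= 1:
        # No -flto, but -O >= 1 implies -flto-level=1.
        return 1
    # Otherwise, default to -flto-level=0.
    return 0


def link_time_level(levels: Iterable[int]) -> int:
    """Level the plugin would use: the maximum over all inputs.

    Returning 0 for an empty input list is an assumption.
    """
    return max(levels, default=0)
```

With these rules, linking one object built with -flto and another built with plain -O2 would optimize at level 2, because the plugin takes the maximum.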

This is probably easier to implement in a supported way. We can pass the
LTO level to the linker via module flags as shown in the patches attached to
http://lists.cs.uiuc.edu/pipermail/llvm-commits/Week-of-Mon-20150316/266778.html

2) Controlled at link time

-flto-level has the same semantics as in the previous sub-section, except it is
instead passed at link time.

This is to a certain extent possible to implement with libLTO by passing
-mllvm flags to the linker, or with gold by passing -plugin-opt flags.

According to Duncan, passing flags to libLTO this way is unsupported --
if we did want to accept flags at link time, and we absolutely don't want
to pass flags to the linker that way, I suppose we could do something like
have the clang driver synthesize a module containing the module flags we want.

Optimization Levels
-------------------

We need to decide what the various optimization levels mean. The thing that
works best for the CFI use case is for -flto-level=2 to mean what -flto
currently means, for -flto-level=1 to mean "run only the globaldce and
simplifycfg passes", and for -flto-level=0 to mean "run no passes", but this
may not be the correct thing to do in every situation where we only want a
few passes to run at link time. We may want to make -flto-level a cc1-level
flag until we've had more experience and found more use cases.
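
As a rough sketch of the level-to-pipeline mapping described above (the table and helper names are hypothetical; only the pass names globaldce and simplifycfg come from the proposal, and the full pipeline is represented by a placeholder rather than an actual pass list):

```python
from typing import List, Union

# Placeholder for "whatever -flto runs today"; the real pass list is not
# enumerated in this thread, so it is deliberately left opaque here.
FULL_LTO_PIPELINE = "<full LTO pipeline>"

# Hypothetical table mirroring the levels proposed for the CFI use case.
LTO_LEVEL_PASSES = {
    0: [],                            # run no passes
    1: ["globaldce", "simplifycfg"],  # minimal cleanup only
    2: FULL_LTO_PIPELINE,             # what -flto currently means
}


def passes_for_level(level: int) -> Union[List[str], str]:
    """Return the passes (or the full-pipeline placeholder) for a level."""
    if level not in LTO_LEVEL_PASSES:
        raise ValueError("unknown -flto-level: %r" % (level,))
    return LTO_LEVEL_PASSES[level]
```

Keeping the mapping in one table makes it easy to revisit what each level means once more use cases turn up.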

Thanks,
-- 
Peter

Sean Silva

2015-Mar-19 01:58 UTC

[LLVMdev] [cfe-dev] Controlling the LTO optimization level

How much of the LTO time is actually spent in the optimization passes?

-- Sean Silva


Bob Wilson

2015-Mar-19 02:00 UTC

[LLVMdev] [cfe-dev] Controlling the LTO optimization level

> On Mar 18, 2015, at 4:27 PM, Peter Collingbourne <peter at pcc.me.uk> wrote:
> 
> Hi all,
> 
> I wanted to start a thread to discuss ways to control the optimization
> level when using LTO. We have found that there are use cases for the LTO
> mechanism beyond whole-program optimization, in which full optimization
> is not always needed or desired. We started that discussion over in
> http://lists.cs.uiuc.edu/pipermail/llvm-commits/Week-of-Mon-20150316/266560.html
> and I thought I'd summarize the problem and possible solutions here:
> 
> Problem
> -------
> 
> As currently implemented, the control flow integrity checks in Clang rely on
> a so-called bit set lowering pass to implement its checks efficiently. The
> current implementation of the bit set lowering pass requires whole-program
> visibility. The full details of why are described in the design document at:
> http://clang.llvm.org/docs/ControlFlowIntegrityDesign.html
> 
> We currently achieve whole-program visibility using LTO. The trouble with LTO
> is that it comes with a significant compile time cost -- on large programs
> such as Chrome, compiling with link-time optimization can be over 7x slower
> (over 3 hours has been measured) than compiling without.

We’ve had some recent improvements that speed things up considerably, and
hopefully things will continue to get faster, but I’m sure there will always be
cases where LTO is slower.
> 
> So I would like there to be a way for users to choose whether to apply
> optimizations, and how much optimization to apply.
> 
> Achieving this requires a design for how users should specify the level of
> optimization to apply, as well as a design for changes to the clang driver
> and the various LTO plugins so that the plugin knows whether optimizations
> are required.
> 
> Solutions
> ---------
> 
> 1) Controlled at compile time
> 
> Strawman proposal for command line syntax:
> 
> -flto-level=X means optimize at level X. At link time, the LTO plugin will
> take the maximum of all -flto-level flags and optimize at that level.
> 
> -flto-level is inferred from other flags if not specified:
> 
> -flto implies -flto-level=2.
> If -flto not specified, -O >= 1 implies -flto-level=1.
> Otherwise, default to -flto-level=0.
> 
> This is probably easier to implement in a supported way. We can pass the
> LTO level to the linker via module flags as shown in the patches attached to
> http://lists.cs.uiuc.edu/pipermail/llvm-commits/Week-of-Mon-20150316/266778.html
> 
> 2) Controlled at link time
> 
> -flto-level has the same semantics as in the previous sub-section, except it is
> instead passed at link time.
> 
> This is to a certain extent possible to implement with libLTO by passing
> -mllvm flags to the linker, or with gold by passing -plugin-opt flags.
> 
> According to Duncan, passing flags to libLTO this way is unsupported --
> if we did want to accept flags at link time, and we absolutely don't want
> to pass flags to the linker that way, I suppose we could do something like
> have the clang driver synthesize a module containing the module flags we want.

Option (2) makes more sense to me, but I don’t like the idea of introducing a
new command line option. At least for now, this seems like a fairly
special-purpose request for CFI. I haven’t heard anyone else asking for LTO with
minimal optimization. How about if you just pass the “-mllvm” options yourself
when using CFI?

If it turns out that there are lots of people who want this feature, I could
imagine that we might someday repurpose the existing -O optimization options to
pass something to the linker to control LTO optimization. The downside of that
is the clang driver doesn’t know whether the link will involve LTO or not, so it
would have to pass those flags to the linker all the time. That’s not a real
problem, but it’s just extra complexity that doesn’t seem justified unless it
benefits more people.

> Optimization Levels
> -------------------
> 
> We need to decide what the various optimization levels mean. The thing that
> works best for the CFI use case is for -flto-level=2 to mean what -flto
> currently means, for -flto-level=1 to mean "run only the globaldce and
> simplifycfg passes", and for -flto-level=0 to mean "run no passes", but this
> may not be the correct thing to do in every situation where we only want a
> few passes to run at link time. We may want to make -flto-level a cc1-level
> flag until we've had more experience and found more use cases.
> 
> Thanks,
> -- 
> Peter

Peter Collingbourne

2015-Mar-19 03:12 UTC

[LLVMdev] [cfe-dev] Controlling the LTO optimization level

On Wed, Mar 18, 2015 at 07:00:25PM -0700, Bob Wilson wrote:
> > On Mar 18, 2015, at 4:27 PM, Peter Collingbourne <peter at pcc.me.uk> wrote:
> >
> > Hi all,
> >
> > I wanted to start a thread to discuss ways to control the optimization
> > level when using LTO. We have found that there are use cases for the LTO
> > mechanism beyond whole-program optimization, in which full optimization
> > is not always needed or desired. We started that discussion over in
> > http://lists.cs.uiuc.edu/pipermail/llvm-commits/Week-of-Mon-20150316/266560.html
> > and I thought I'd summarize the problem and possible solutions here:
> >
> > Problem
> > -------
> >
> > As currently implemented, the control flow integrity checks in Clang rely on
> > a so-called bit set lowering pass to implement its checks efficiently. The
> > current implementation of the bit set lowering pass requires whole-program
> > visibility. The full details of why are described in the design document at:
> > http://clang.llvm.org/docs/ControlFlowIntegrityDesign.html
> >
> > We currently achieve whole-program visibility using LTO. The trouble with LTO
> > is that it comes with a significant compile time cost -- on large programs
> > such as Chrome, compiling with link-time optimization can be over 7x slower
> > (over 3 hours has been measured) than compiling without.
>
> We’ve had some recent improvements that speed things up considerably, and
> hopefully things will continue to get faster, but I’m sure there will always be
> cases where LTO is slower.

Today I found http://reviews.llvm.org/D8431 which seems to fix one of the
big performance issues I was suffering from. Binary size is still an issue
though and I've found opt-level=1 makes a significant improvement there.

> > So I would like there to be a way for users to choose whether to apply
> > optimizations, and how much optimization to apply.
> > 
> > Achieving this requires a design for how users should specify the level of
> > optimization to apply, as well as a design for changes to the clang driver
> > and the various LTO plugins so that the plugin knows whether optimizations
> > are required.
> >
> > Solutions
> > ---------
> >
> > 1) Controlled at compile time
> >
> > Strawman proposal for command line syntax:
> >
> > -flto-level=X means optimize at level X. At link time, the LTO plugin will
> > take the maximum of all -flto-level flags and optimize at that level.
> >
> > -flto-level is inferred from other flags if not specified:
> >
> > -flto implies -flto-level=2.
> > If -flto not specified, -O >= 1 implies -flto-level=1.
> > Otherwise, default to -flto-level=0.
> >
> > This is probably easier to implement in a supported way. We can pass the
> > LTO level to the linker via module flags as shown in the patches attached to
> > http://lists.cs.uiuc.edu/pipermail/llvm-commits/Week-of-Mon-20150316/266778.html
> >
> > 2) Controlled at link time
> >
> > -flto-level has the same semantics as in the previous sub-section, except it is
> > instead passed at link time.
> >
> > This is to a certain extent possible to implement with libLTO by passing
> > -mllvm flags to the linker, or with gold by passing -plugin-opt flags.
> >
> > According to Duncan, passing flags to libLTO this way is unsupported --
> > if we did want to accept flags at link time, and we absolutely don't want
> > to pass flags to the linker that way, I suppose we could do something like
> > have the clang driver synthesize a module containing the module flags we want.
> 
> Option (2) makes more sense to me, but I don’t like the idea of introducing
> a new command line option.

I assume you mean a driver command line option.

> At least for now, this seems like a fairly special-purpose request for CFI.
> I haven’t heard anyone else asking for LTO with minimal optimization. How about
> if you just pass the “-mllvm” options yourself when using CFI?

That would work, if we had something like a -opt-level flag that the gold and
libLTO plugins understand. (We already have disable-opt on the libLTO side,
but I'd still like a way of saying opt-level=1).

> If it turns out that there are lots of people who want this feature, I
> could imagine that we might someday repurpose the existing -O optimization
> options to pass something to the linker to control LTO optimization. The
> downside of that is the clang driver doesn’t know whether the link will involve
> LTO or not, so it would have to pass those flags to the linker all the time.
> That’s not a real problem, but it’s just extra complexity that doesn’t seem
> justified unless it benefits more people.

Seems reasonable.

Thanks,
-- 
Peter

Rafael Espíndola

2015-Mar-19 18:09 UTC

[LLVMdev] [cfe-dev] Controlling the LTO optimization level

Having an analogue of -O0/-O1/-O2/-O3 for the LTO pipeline makes
sense, I think.

I agree that something along the lines of option 2 is probably best.
Some questions:

* Should "clang -O3 foo.o -o foo" use LTO with -O3?
* Should "clang foo.o -o foo" use LTO with -O0? That would be a fairly
big change. Maybe we could make the LTO default be 3?
* Should we just add a --ltoO to the clang driver that is independent of -O?
* Some linkers already take a -O(1,2,3) option. Should we try to
forward that or should we differentiate LTO optimizations and general
linker optimizations?

If we want to differentiate linker and LTO optimizations, adding a -O
plugin option to the gold plugin should be fine. As Bob points out,
for ld64 for now we would just use -mllvm.
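
To make the two forwarding routes concrete, here is a small sketch of how a driver-side helper might pick the linker arguments. Everything in it is an assumption for illustration: the helper name is made up, and the `O<n>` plugin option and the `-O<n>` spelling passed through -mllvm are only what this thread proposes, not a confirmed interface of the gold plugin or libLTO.

```python
from typing import List


def lto_linker_args(linker: str, lto_opt_level: int) -> List[str]:
    """Build clang driver arguments that forward an LTO level to the linker.

    The option spellings are hypothetical; they follow the proposal in this
    thread rather than any documented linker or plugin interface.
    """
    if lto_opt_level not in range(4):
        raise ValueError("LTO optimization level must be 0..3")
    if linker == "gold":
        # Proposed: a dedicated -O plugin option for the gold plugin.
        return ["-Wl,-plugin-opt=O%d" % lto_opt_level]
    if linker == "ld64":
        # Proposed interim route: hand the level to libLTO through -mllvm.
        return ["-Wl,-mllvm,-O%d" % lto_opt_level]
    raise ValueError("no LTO forwarding known for linker %r" % (linker,))
```

Keeping the LTO level separate from any -O option the linker itself understands corresponds to the "differentiate" choice in the last question above.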

Cheers,
Rafael

Eric Christopher

2015-Mar-19 18:32 UTC

[LLVMdev] [cfe-dev] Controlling the LTO optimization level

On Thu, Mar 19, 2015 at 11:12 AM Rafael Espíndola
<rafael.espindola at gmail.com> wrote:
> Having an analogue of -O0/-O1/-O2/-O3 for the LTO pipeline makes
> sense, I think.
>
> I agree that something along the lines of option 2 is probably best.
> Some questions:
>
> * Should "clang -O3 foo.o -o foo" use LTO with -O3?
> * Should "clang foo.o -o foo" use LTO with -O0? That would be a fairly
> big change. Maybe we could make the LTO default be 3?
> * Should we just add a --ltoO to the clang driver that is independent of -O?
> * Some linkers already take a -O(1,2,3) option. Should we try to
> forward that or should we differentiate LTO optimizations and general
> linker optimizations?

The linker taking -O1,2,3 as a start is fine for sure. I'd rather this go
from a clang-driving-everything perspective than a linker-driving-everything
one, but that ship may have sailed.

> If we want to differentiate linker and LTO optimizations, adding a -O
> plugin option to the gold plugin should be fine. As Bob points out,
> for ld64 for now we would just use -mllvm.

Sure. A better command line interface similar to the one that we already
have in clang to deal with enabling/disabling passes (or, perhaps, one
that's even better - we're not very good at that at the moment) would
ultimately be a good place to be. Otherwise the interface is just going to be
some sort of special case hell for what everyone wants to do at the LTO
level.

-eric