via llvm-dev
2019-Mar-29 13:40 UTC
[llvm-dev] Proposal for O1/Og Optimization and Code Generation Pipeline
Awesome start.

Back when I did a similar project at HP/NonStop, the class of optimizations we turned off for our O1 (Og equivalent) tended to be those that reordered code or otherwise messed with the CFG. In fact one of our metrics was:

- The set of breakpoint locations available at Og should be the same as those available at O0.

This is pretty easy to measure. It can mean either turning off optimizations or doing a better job with the line table; either way you get the preferred user experience. Not saying *Clang* has to use the "must be the same" criterion, but being able to measure this will be extremely helpful. Comparing the metric with/without a given pass will give us a good idea of how much that pass damages the single-stepping experience, and gives us hard data to decide whether certain passes should stay or go.

I don't remember whether HP/NonStop turned off constant/value propagation, but I *think* we did, because that can have a really bad effect on availability of variables. Now, if we're more industrious about generating DIExpressions to recover values that get optimized away, that's probably good enough, as usually you want to be looking at things and not so much modifying things during a debugging session.

As for Sony's users in particular, working in a real-time environment does constrain how much performance we can give away for other benefits like good debugging. I think we'll have to see how that falls out.

--paulr

From: llvm-dev [mailto:llvm-dev-bounces at lists.llvm.org] On Behalf Of Greg Bedwell via llvm-dev
Sent: Friday, March 29, 2019 8:25 AM
To: Eric Christopher
Cc: llvm-dev; Ahmed Bougacha; Petr Hosek
Subject: Re: [llvm-dev] Proposal for O1/Og Optimization and Code Generation Pipeline

Thanks for posting this.
I'm absolutely of the opinion that current -O1 is almost a "worst of all worlds" optimization level, where the performance of the generated code isn't good enough to be particularly useful (for our users at least) but the debug experience is already getting close to being as bad as -O2/3, so I'm personally very happy with your direction of redefining -O1 (especially as that could then open up the way to future enhancements like using PGO data to let us compile everything at -O1 for the build time performance win, except for the critical hot functions that get the full -O2/3 pipeline for the run time performance win).

How will this optimization level interact with LTO (specifically ThinLTO)? Would -O1 -flto=thin run through a different, faster LTO pipeline, or are we expecting that any everyday development build configuration won't include LTO?

I'm a little bit more on the fence with what this would mean for -Og, as I'd really like to try and come to some sort of community consensus on exactly what -Og should mean and what its aims should be. If you happen to be at EuroLLVM this year then that would be absolutely perfect timing, as I'd already submitted a round table topic to try and start just that process [ http://llvm.org/devmtg/2019-04/#rounds ]. My team's main focus right now is in trying to fix as many -O2 debug experience issues as possible, with the hope that we could consider using an -Og mode to mop up what's left, but we've been surveying our users for a few years now about what they'd find useful in such an optimization level.

The general consensus is that performance must not be significantly worse than -O2. We've heard a few numbers thrown around like 5-10% runtime slowdown compared to -O2 being the absolute maximum acceptable level of intrusion for them to consider using such a mode.
I'm not really sure how realistic that is and I'm inclined to think that we could probably stretch that limit a little bit here and there if the debugging experience really was that much better, but I think it gives a good indication of at least what our users are looking for. Essentially -O2 but with as few changes as we can get away with making to make the debugging experience better. I know that this is somewhat woolly, so it might be that your proposed pipeline is the closest we can get that matches such an aim, but once we've decided what -Og should mean, I'd like to try and justify any changes with some real data. I'm willing for my team to contribute as much data as we can. We've also been using dexter [ http://llvm.org/devmtg/2018-04/slides/Bedwell-Measuring_the_User_Debugging_Experience.pdf ] to target our -O2 debugging improvement work, but hopefully it will be useful to provide another datapoint for the effects on the debugging experience of disabling specific passes.

In my mind, -Og probably would incorporate a few things:

* Tweak certain pass behaviors in order to be more favorable towards debugging [ https://reviews.llvm.org/D59431#1437716 ]
* Enable features favorable to debugging [ http://llvm.org/devmtg/2017-10/#lightning8 ]
* Disable whole passes that are known to fundamentally harm the debugging experience if there is no other alternative approach (this proposal?)
* Still give a decent debug experience when used in conjunction with LTO.

Thanks again for writing up your proposal. I'm really happy to see movement in this area!

-Greg

On Fri, 29 Mar 2019 at 02:09, Eric Christopher via llvm-dev <llvm-dev at lists.llvm.org> wrote:

Hi All,

I’ve been thinking about both O1 and Og optimization levels and have a proposal for an improved O1 that I think overlaps in functionality with our desires for Og.
The design goal is to rewrite the O1 optimization and code generation pipeline to include the set of optimizations that minimizes build and test time while retaining our ability to debug. This isn’t to minimize efforts around optimized debugging or negate O0 builds, but rather to provide a compromise mode that encompasses some of the benefits of both. In effect to create a “build mode for everyday development”.

This proposal is a first approximation guess on direction. I’ll be exploring different options and combinations, but I think this is a good place to start for discussion. Unless there are serious objections to the general direction I’d like to get started so we can explore and look at the code as it comes through review.

Optimization and Code Generation Pipeline

The optimization passes chosen fall into a few main categories, redundancy elimination and basic optimization/abstraction elimination. The idea is that these are going to be the optimizations that a programmer would expect to happen without affecting debugging. This means not eliminating redundant calls or non-redundant loads as those could fail in different ways and locations while executing. These optimizations will also reduce the overall amount of code going to the code generator helping both linker input size and code generation speed.
Dead code elimination

- Dead code elimination (ADCE, BDCE)
- Dead store elimination
- Parts of CFG Simplification - removing branches and dead code paths, but not including commoning and speculation

Basic Scalar Optimizations

- Constant propagation including SCCP and IPCP
- Constant merging
- Instruction Combining
- Inlining: always_inline and normal inlining passes
- Memory to register promotion
- CSE of “unobservable” operations
- Reassociation of expressions
- Global optimizations - try to fold globals to constants

Loop Optimizations

Loop optimizations have some problems around debuggability and observability, but a suggested set of passes would include optimizations that remove abstractions and not ones that necessarily optimize for performance.

- Induction Variable Simplification
- LICM but not promotion
- Trivial Unswitching
- Loop rotation
- Full loop unrolling
- Loop deletion

Pass Structure

Overall pass ordering will look similar to the existing pass layout in llvm with passes added or subtracted for O1 rather than a new pass ordering. The motivation here is to make the overall proposal easier to understand initially upstream while also maintaining existing pass pipeline synergies between passes.

Instruction selection

We will use the fast instruction selector (where it exists) for three reasons:
- Significantly faster code generation than llvm’s dag based instruction selection
- Better debuggability than selection dag - fewer instructions moved around
- Fast instruction selection has been optimized somewhat and shouldn’t be an outrageous penalty on most architectures

Register allocation

The fast register allocator should be used for compilation speed.

Thoughts?

Thanks!

-eric
_______________________________________________
LLVM Developers mailing list
llvm-dev at lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
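For concreteness, the pass selection in the proposal could be sketched as an explicit pipeline string for opt. The pass names below are best-guess new-pass-manager spellings, and the flat ordering ignores the module/function/loop nesting a real pipeline needs, so treat this purely as an illustration of the proposed contents, not a vetted pipeline:

```python
# Sketch only: express the proposed O1 pass selection as a hypothetical
# `opt -passes=...` argument. Names are approximate and unverified.

DEAD_CODE = ["adce", "bdce", "dse", "simplifycfg"]
SCALAR = ["sccp", "ipsccp", "constmerge", "instcombine", "inline",
          "mem2reg", "early-cse", "reassociate", "globalopt"]
LOOPS = ["indvars", "licm", "simple-loop-unswitch", "loop-rotate",
         "loop-unroll", "loop-deletion"]

def pipeline_string():
    """Join the pass lists into a single comma-separated -passes= value."""
    return ",".join(DEAD_CODE + SCALAR + LOOPS)

# What an invocation might look like:
print(f"opt -passes='{pipeline_string()}' input.ll -S -o output.ll")
```

The code generation half of the proposal would then correspond to something like `llc -fast-isel -regalloc=fast`, again subject to the exact flag spellings per target.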
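Paul's breakpoint-location metric lends itself to a small script. A sketch of the comparison step, assuming line tables captured with something like `llvm-dwarfdump --debug-line` (the exact column layout varies between versions, so the row parsing here is illustrative only):

```python
# Sketch: compare recommended breakpoint locations between an -O0 and an
# -Og build, using rows of the form produced by llvm-dwarfdump
# --debug-line, e.g.:
#   0x0000000000001130  12  5  1  0  0  is_stmt
# Only rows flagged is_stmt count as breakpoint locations.

def breakpoint_locations(dump_text):
    locs = set()
    for row in dump_text.splitlines():
        parts = row.split()
        if parts and parts[0].startswith("0x") and "is_stmt" in parts:
            # Fields after the address are assumed to be line, column.
            locs.add((int(parts[1]), int(parts[2])))
    return locs

def compare(o0_dump, og_dump):
    """Locations available at -O0 but missing at -Og: the metric's gap."""
    return breakpoint_locations(o0_dump) - breakpoint_locations(og_dump)
```

Per Paul's criterion, a pass is debugging-friendly if `compare()` stays empty with that pass enabled; running it with and without a given pass gives the per-pass damage figure he describes.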
David Blaikie via llvm-dev
2019-Mar-29 18:12 UTC
[llvm-dev] Proposal for O1/Og Optimization and Code Generation Pipeline
Nice to have metrics - so thanks for mentioning that, even if it doesn't
end up being suitable, it's certainly worth looking at.
Did you do anything similar for the values of variables? I could imagine
"printing the value of a variable" (not necessarily being able to modify
it) at all those locations should render the same value (not undefined).
& to me, that's actually where I would've guessed -Og (which might
be a
better discussion for a separate thread, to be honest - as much as it was
brought up in the subject of this thread) would diverge from -O1. Doing
things like "leaking the value of any variable at the end of its
scope" to
avoid dead store/unused value elimination ("oh, we saw the last use of this
variable half way through the function, so we reused its register for
something else later on") - and that's a case where that behavior
can't
really (that I can think of) be justified to be unconditional at -O1
(because it pessimizes the code in a way that /only/ gives improvements to
a debugger, really) - though I'm happy to be wrong/hear other opinions on
that.
So my model is more "-Og would be an even more pessimized -O1" (or
potentially -Og isn't really an optimization level, but an orthogonal
setting to optimization that does things like actively pessimize certain
features to make them more debuggable somewhat independently of what
optimizations are used - sort of like the sanitizers) but perhaps that's
inconsistent with what other folks have in mind.
- Dave
On Fri, Mar 29, 2019 at 6:41 AM via llvm-dev <llvm-dev at lists.llvm.org>
wrote:
> Awesome start.
>
>
>
> Back when I did a similar project at HP/NonStop, the class of
> optimizations we turned off for our O1 (Og equivalent) tended to be those
> that reordered code or otherwise messed with the CFG. In fact one of our
> metrics was:
>
> - The set of breakpoint locations available at Og should be the
> same as those available at O0.
>
> This is pretty easy to measure. It can mean either turning off
> optimizations or doing a better job with the line table; either way you get
> the preferred user experience. Not saying *Clang* has to use the "must
be
> the same" criterion, but being able to measure this will be extremely
> helpful. Comparing the metric with/without a given pass will give us a
> good idea of how much that pass damages the single-stepping experience, and
> gives us hard data to decide whether certain passes should stay or go.
>
>
>
> I don't remember whether HP/NonStop turned off constant/value propagation,
> but I *think* we did, because that can have a really bad effect on
> availability of variables. Now, if we're more industrious about generating
> DIExpressions to recover values that get optimized away, that's probably
> good enough, as usually you want to be looking at things and not so much
> modifying things during a debugging session.
>
>
>
> As for Sony's users in particular, working in a real-time environment does
> constrain how much performance we can give away for other benefits like
> good debugging. I think we'll have to see how that falls out.
>
>
>
> --paulr
>
>
>
>
>
> *From:* llvm-dev [mailto:llvm-dev-bounces at lists.llvm.org] *On Behalf Of*
> Greg Bedwell via llvm-dev
> *Sent:* Friday, March 29, 2019 8:25 AM
> *To:* Eric Christopher
> *Cc:* llvm-dev; Ahmed Bougacha; Petr Hosek
> *Subject:* Re: [llvm-dev] Proposal for O1/Og Optimization and Code
> Generation Pipeline
>
>
>
> Thanks for posting this. I'm absolutely of the opinion that current -O1
> is almost a "worst of all worlds" optimization level, where the performance
> of the generated code isn't good enough to be particularly useful (for our
> users at least) but the debug experience is already getting close to being
> as bad as -O2/3, so I'm personally very happy with your direction of
> redefining -O1 (especially as that could then open up the way to future
> enhancements like using PGO data to let us compile everything at -O1 for
> the build time performance win, except for the critical hot functions that
> get the full -O2/3 pipeline for the run time performance win).
>
>
>
> How will this optimization level interact with LTO (specifically
> ThinLTO)? Would -O1 -flto=thin run through a different, faster LTO
> pipeline or are we expecting that any everyday development build
> configuration won't include LTO?
>
>
>
> I'm a little bit more on the fence with what this would mean for -Og, as
> I'd really like to try and come to some sort of community consensus on
> exactly what -Og should mean and what its aims should be. If you happen to
> be at EuroLLVM this year then that would be absolutely perfect timing as
> I'd already submitted a round table topic to try and start just that
> process [ http://llvm.org/devmtg/2019-04/#rounds ]. My team's main focus
> right now is in trying to fix as many -O2 debug experience issues as
> possible, with the hope that we could consider using an -Og mode to mop up
> what's left, but we've been surveying our users for a few years now about
> what they'd find useful in such an optimization level.
>
>
>
> The general consensus is that performance must not be significantly worse
> than -O2. We've heard a few numbers thrown around like 5-10% runtime
> slowdown compared to -O2 being the absolute maximum acceptable level of
> intrusion for them to consider using such a mode. I'm not really sure how
> realistic that is and I'm inclined to think that we could probably stretch
> that limit a little bit here and there if the debugging experience really
> was that much better, but I think it gives a good indication of at least
> what our users are looking for. Essentially -O2 but with as few changes as
> we can get away with making to make the debugging experience better. I
> know that this is somewhat woolly, so it might be that your proposed
> pipeline is the closest we can get that matches such an aim, but once we've
> decided what -Og should mean, I'd like to try and justify any changes with
> some real data. I'm willing for my team to contribute as much data as we
> can. We've also been using dexter [
> http://llvm.org/devmtg/2018-04/slides/Bedwell-Measuring_the_User_Debugging_Experience.pdf ]
> to target our -O2 debugging improvement work, but hopefully it will be
> useful to provide another datapoint for the effects on the debugging
> experience of disabling specific passes.
>
>
>
> In my mind, -Og probably would incorporate a few things:
>
> * Tweak certain pass behaviors in order to be more favorable towards
> debugging [ https://reviews.llvm.org/D59431#1437716 ]
>
> * Enable features favorable to debugging [
> http://llvm.org/devmtg/2017-10/#lightning8 ]
>
> * Disable whole passes that are known to fundamentally harm the debugging
> experience if there is no other alternative approach (this proposal?)
>
> * Still give a decent debug experience when used in conjunction with LTO.
>
>
>
> Thanks again for writing up your proposal. I'm really happy to see
> movement in this area!
>
>
>
> -Greg
>
>
>
>
>
>
>
> On Fri, 29 Mar 2019 at 02:09, Eric Christopher via llvm-dev <
> llvm-dev at lists.llvm.org> wrote:
>
> Hi All,
>
> I’ve been thinking about both O1 and Og optimization levels and have a
> proposal for an improved O1 that I think overlaps in functionality
> with our desires for Og. The design goal is to rewrite the O1
> optimization and code generation pipeline to include the set of
> optimizations that minimizes build and test time while retaining our
> ability to debug.
>
> This isn’t to minimize efforts around optimized debugging or negate O0
> builds, but rather to provide a compromise mode that encompasses some
> of the benefits of both. In effect to create a “build mode for
> everyday development”.
>
> This proposal is a first approximation guess on direction. I’ll be
> exploring different options and combinations, but I think this is a
> good place to start for discussion. Unless there are serious
> objections to the general direction I’d like to get started so we can
> explore and look at the code as it comes through review.
>
>
> Optimization and Code Generation Pipeline
>
> The optimization passes chosen fall into a few main categories,
> redundancy elimination and basic optimization/abstraction elimination.
> The idea is that these are going to be the optimizations that a
> programmer would expect to happen without affecting debugging. This
> means not eliminating redundant calls or non-redundant loads as those
> could fail in different ways and locations while executing. These
> optimizations will also reduce the overall amount of code going to the
> code generator helping both linker input size and code generation
> speed.
>
> Dead code elimination
>
> - Dead code elimination (ADCE, BDCE)
> - Dead store elimination
> - Parts of CFG Simplification
> - Removing branches and dead code paths and not including commoning
> and speculation
>
> Basic Scalar Optimizations
>
> - Constant propagation including SCCP and IPCP
> - Constant merging
> - Instruction Combining
> - Inlining: always_inline and normal inlining passes
> - Memory to register promotion
> - CSE of “unobservable” operations
> - Reassociation of expressions
> - Global optimizations - try to fold globals to constants
>
> Loop Optimizations
>
> Loop optimizations have some problems around debuggability and
> observability, but a suggested set of passes would include
> optimizations that remove abstractions and not ones that necessarily
> optimize for performance.
>
> - Induction Variable Simplification
> - LICM but not promotion
> - Trivial Unswitching
> - Loop rotation
> - Full loop unrolling
> - Loop deletion
>
> Pass Structure
>
> Overall pass ordering will look similar to the existing pass layout in
> llvm with passes added or subtracted for O1 rather than a new pass
> ordering. The motivation here is to make the overall proposal easier
> to understand initially upstream while also maintaining existing pass
> pipeline synergies between passes.
>
> Instruction selection
>
> We will use the fast instruction selector (where it exists) for three
> reasons:
> - Significantly faster code generation than llvm’s dag based
> instruction selection
> - Better debuggability than selection dag - fewer instructions moved around
> - Fast instruction selection has been optimized somewhat and
> shouldn’t be an outrageous penalty on most architectures
>
> Register allocation
>
> The fast register allocator should be used for compilation speed.
>
> Thoughts?
>
> Thanks!
>
> -eric
> _______________________________________________
> LLVM Developers mailing list
> llvm-dev at lists.llvm.org
> https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
>
via llvm-dev
2019-Mar-29 18:55 UTC
[llvm-dev] Proposal for O1/Og Optimization and Code Generation Pipeline
Did you do anything similar for the values of variables? I could imagine
"printing the value of a variable" (not necessarily being able to
modify it) at all those locations should render the same value (not undefined).
Oh yes! We also had a criterion that the set of available variables at each
breakpoint would be the same. (I don't think we did a runtime analysis to
verify the actual values were all the same, the tool I remember was a dumper
sort of thing that read the binaries.) This one was mildly tricky, as -O0 tends
to report locals using single-locations for a stack slot and not use ranges; I
don't remember what we did about that. Possibly looked at disassembly, and
identified the first assignment to each variable? Thus constraining the
"true" -O0 available range. Sorry for being fuzzy on this, it was over
a decade ago and I didn't write the tool myself.
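The variable-availability check Paul describes could be approximated statically from the debug info, in the same spirit as the dumper he remembers. A sketch, assuming a simplified `llvm-dwarfdump --debug-info` layout; real DWARF walking would track DIE nesting properly, so this is purely illustrative:

```python
# Sketch: collect the named variables per function from a (simplified)
# debug-info dump, then diff two builds. Variables are keyed to the most
# recently seen subprogram name rather than by real DIE nesting.

def variables_by_function(dump_lines):
    table, current, pending_tag = {}, None, None
    for line in dump_lines:
        line = line.strip()
        if line.startswith("DW_TAG_subprogram"):
            pending_tag = "func"
        elif line.startswith(("DW_TAG_variable", "DW_TAG_formal_parameter")):
            pending_tag = "var"
        elif line.startswith("DW_AT_name") and pending_tag:
            name = line.split('"')[1]  # e.g. DW_AT_name ("foo")
            if pending_tag == "func":
                current = name
                table.setdefault(current, set())
            elif current is not None:
                table[current].add(name)
            pending_tag = None
    return table

def missing_variables(o0_dump, og_dump):
    """Variables that have a DIE at -O0 but none at all in the -Og build."""
    o0, og = variables_by_function(o0_dump), variables_by_function(og_dump)
    return {f: names - og.get(f, set()) for f, names in o0.items()
            if names - og.get(f, set())}
```

As Paul notes, this only checks that the variable is described at all; verifying the reported values, or the single-location versus range issue he mentions, would need a runtime or location-list analysis on top.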
So my model is more "-Og would be an even more pessimized -O1" (or
potentially -Og isn't really an optimization level, but an orthogonal
setting to optimization that does things like actively pessimize certain
features to make them more debuggable somewhat independently of what
optimizations are used - sort of like the sanitizers) but perhaps that's
inconsistent with what other folks have in mind.
Distinguishing -O1 from -Og does enable that sort of thing, although you can
also have pessimizations under separate flags. For example the "fake
use" pessimization; Wolfgang Pieb did a lightning talk at the US 2017 dev
meeting on this, his slides say 5-7% performance hit. Our users have come to
like it.
--paulr
From: David Blaikie [mailto:dblaikie at gmail.com]
Sent: Friday, March 29, 2019 2:12 PM
To: Robinson, Paul
Cc: gregbedwell at gmail.com; echristo at gmail.com; llvm-dev at lists.llvm.org;
abougacha at apple.com; phosek at google.com
Subject: Re: [llvm-dev] Proposal for O1/Og Optimization and Code Generation
Pipeline
Nice to have metrics - so thanks for mentioning that, even if it doesn't end
up being suitable, it's certainly worth looking at.
Did you do anything similar for the values of variables? I could imagine
"printing the value of a variable" (not necessarily being able to
modify it) at all those locations should render the same value (not undefined).
& to me, that's actually where I would've guessed -Og (which might
be a better discussion for a separate thread, to be honest - as much as it was
brought up in the subject of this thread) would diverge from -O1. Doing things
like "leaking the value of any variable at the end of its scope" to
avoid dead store/unused value elimination ("oh, we saw the last use of this
variable half way through the function, so we reused its register for something
else later on") - and that's a case where that behavior can't
really (that I can think of) be justified to be unconditional at -O1 (because it
pessimizes the code in a way that /only/ gives improvements to a debugger,
really) - though I'm happy to be wrong/hear other opinions on that.
So my model is more "-Og would be an even more pessimized -O1" (or
potentially -Og isn't really an optimization level, but an orthogonal
setting to optimization that does things like actively pessimize certain
features to make them more debuggable somewhat independently of what
optimizations are used - sort of like the sanitizers) but perhaps that's
inconsistent with what other folks have in mind.
- Dave
On Fri, Mar 29, 2019 at 6:41 AM via llvm-dev <llvm-dev at
lists.llvm.org<mailto:llvm-dev at lists.llvm.org>> wrote:
Awesome start.
Back when I did a similar project at HP/NonStop, the class of optimizations we
turned off for our O1 (Og equivalent) tended to be those that reordered code or
otherwise messed with the CFG. In fact one of our metrics was:
- The set of breakpoint locations available at Og should be the same as
those available at O0.
This is pretty easy to measure. It can mean either turning off optimizations or
doing a better job with the line table; either way you get the preferred user
experience. Not saying *Clang* has to use the "must be the same"
criterion, but being able to measure this will be extremely helpful. Comparing
the metric with/without a given pass will give us a good idea of how much that
pass damages the single-stepping experience, and gives us hard data to decide
whether certain passes should stay or go.
I don't remember whether HP/NonStop turned off constant/value propagation,
but I *think* we did, because that can have a really bad effect on availability
of variables. Now, if we're more industrious about generating DIExpressions
to recover values that get optimized away, that's probably good enough, as
usually you want to be looking at things and not so much modifying things during
a debugging session.
As for Sony's users in particular, working in a real-time environment does
constrain how much performance we can give away for other benefits like good
debugging. I think we'll have to see how that falls out.
--paulr
From: llvm-dev [mailto:llvm-dev-bounces at
lists.llvm.org<mailto:llvm-dev-bounces at lists.llvm.org>] On Behalf Of
Greg Bedwell via llvm-dev
Sent: Friday, March 29, 2019 8:25 AM
To: Eric Christopher
Cc: llvm-dev; Ahmed Bougacha; Petr Hosek
Subject: Re: [llvm-dev] Proposal for O1/Og Optimization and Code Generation
Pipeline
Thanks for posting this. I'm absolutely of the opinion that current -O1 is
almost a "worst of all worlds" optimization level, where the
performance of the generated code isn't good enough to be particularly
useful (for our users at least) but the debug experience is already getting
close to being as bad as -O2/3, so I'm personally very happy with your
direction of redefining -O1 (especially as that could then open up the way to
future enhancements like using PGO data to let us compile everything at -O1 for
the build time performance win, except for the critical hot functions that get
the full -O2/3 pipeline for the run time performance win).
How will this optimization level interact with LTO (specifically ThinLTO)?
Would -O1 -flto=thin run through a different, faster LTO pipeline or are we
expecting that any everyday development build configuration won't include
LTO?
I'm a little bit more on the fence with what this would mean for -Og, as
I'd really like to try and come to some sort of community consensus on
exactly what -Og should mean and what its aims should be. If you happen to be
at EuroLLVM this year then that would be absolutely perfect timing as I'd
already submitted a round table topic to try and start just that process [
http://llvm.org/devmtg/2019-04/#rounds ]. My team's main focus right now is
in trying to fix as many -O2 debug experience issues as possible, with the hope
that we could consider using an -Og mode to mop up what's left, but
we've been surveying our users for a few years now about what they'd
find useful in such an optimization level.
The general consensus is that performance must not be significantly worse than
-O2. We've heard a few numbers thrown around like 5-10% runtime slowdown
compared to -O2 being the absolute maximum acceptable level of intrusion for
them to consider using such a mode. I'm not really sure how realistic that
is and I'm inclined to think that we could probably stretch that limit a
little bit here and there if the debugging experience really was that much
better, but I think it gives a good indication of at least what our users are
looking for. Essentially -O2 but with as few changes as we can get away with
making to make the debugging experience better. I know that this is somewhat
woolly, so it might be that your proposed pipeline is the closest we can get
that matches such an aim, but once we've decided what -Og should mean,
I'd like to try and justify any changes with some real data. I'm
willing for my team to contribute as much data as we can. We've also been
using dexter [
http://llvm.org/devmtg/2018-04/slides/Bedwell-Measuring_the_User_Debugging_Experience.pdf
] to target our -O2 debugging improvement work, but hopefully it will be useful
to provide another datapoint for the effects on the debugging experience of
disabling specific passes.
In my mind, -Og probably would incorporate a few things:
* Tweak certain pass behaviors in order to be more favorable towards debugging [
https://reviews.llvm.org/D59431#1437716 ]
* Enable features favorable to debugging [
http://llvm.org/devmtg/2017-10/#lightning8 ]
* Disable whole passes that are known to fundamentally harm the debugging
experience if there is no other alternative approach (this proposal?)
* Still give a decent debug experience when used in conjunction with LTO.
Thanks again for writing up your proposal. I'm really happy to see movement
in this area!
-Greg
On Fri, 29 Mar 2019 at 02:09, Eric Christopher via llvm-dev <llvm-dev at
lists.llvm.org<mailto:llvm-dev at lists.llvm.org>> wrote:
Hi All,
I’ve been thinking about both O1 and Og optimization levels and have a
proposal for an improved O1 that I think overlaps in functionality
with our desires for Og. The design goal is to rewrite the O1
optimization and code generation pipeline to include the set of
optimizations that minimizes build and test time while retaining our
ability to debug.
This isn’t to minimize efforts around optimized debugging or negate O0
builds, but rather to provide a compromise mode that encompasses some
of the benefits of both. In effect to create a “build mode for
everyday development”.
This proposal is a first approximation guess on direction. I’ll be
exploring different options and combinations, but I think this is a
good place to start for discussion. Unless there are serious
objections to the general direction I’d like to get started so we can
explore and look at the code as it comes through review.
Optimization and Code Generation Pipeline
The optimization passes chosen fall into a few main categories,
redundancy elimination and basic optimization/abstraction elimination.
The idea is that these are going to be the optimizations that a
programmer would expect to happen without affecting debugging. This
means not eliminating redundant calls or non-redundant loads as those
could fail in different ways and locations while executing. These
optimizations will also reduce the overall amount of code going to the
code generator helping both linker input size and code generation
speed.
Dead code elimination
- Dead code elimination (ADCE, BDCE)
- Dead store elimination
- Parts of CFG Simplification
- Removing branches and dead code paths and not including commoning
and speculation
Basic Scalar Optimizations
- Constant propagation including SCCP and IPCP
- Constant merging
- Instruction Combining
- Inlining: always_inline and normal inlining passes
- Memory to register promotion
- CSE of “unobservable” operations
- Reassociation of expressions
- Global optimizations - try to fold globals to constants
Loop Optimizations
Loop optimizations have some problems around debuggability and
observability, but a suggested set of passes would include
optimizations that remove abstractions and not ones that necessarily
optimize for performance.
- Induction Variable Simplification
- LICM but not promotion
- Trivial Unswitching
- Loop rotation
- Full loop unrolling
- Loop deletion
Pass Structure
Overall pass ordering will look similar to the existing pass layout in
llvm with passes added or subtracted for O1 rather than a new pass
ordering. The motivation here is to make the overall proposal easier
to understand initially upstream while also maintaining existing pass
pipeline synergies between passes.
Instruction selection
We will use the fast instruction selector (where it exists) for three reasons:
- Significantly faster code generation than LLVM’s SelectionDAG-based
instruction selection
- Better debuggability than SelectionDAG: fewer instructions are moved around
- Fast instruction selection has been optimized somewhat and
shouldn’t be an outrageous penalty on most architectures
Register allocation
The fast register allocator should be used for compilation speed.
Thoughts?
Thanks!
-eric
_______________________________________________
LLVM Developers mailing list
llvm-dev at lists.llvm.org<mailto:llvm-dev at lists.llvm.org>
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
Eric Christopher via llvm-dev
2019-Apr-02 02:51 UTC
[llvm-dev] Proposal for O1/Og Optimization and Code Generation Pipeline
On Fri, Mar 29, 2019 at 6:40 AM <paul.robinson at sony.com> wrote:
>
> Awesome start.
>
> Back when I did a similar project at HP/NonStop, the class of
> optimizations we turned off for our O1 (Og equivalent) tended to be
> those that reordered code or otherwise messed with the CFG. In fact
> one of our metrics was:
>
> - The set of breakpoint locations available at Og should be the same
> as those available at O0.

That's a very interesting metric and yes, should be fairly
straightforward to measure.

> This is pretty easy to measure. It can mean either turning off
> optimizations or doing a better job with the line table; either way
> you get the preferred user experience. Not saying *Clang* has to use
> the "must be the same" criterion, but being able to measure this
> will be extremely helpful. Comparing the metric with/without a given
> pass will give us a good idea of how much that pass damages the
> single-stepping experience, and gives us hard data to decide whether
> certain passes should stay or go.
>
> I don't remember whether HP/NonStop turned off constant/value
> propagation, but I *think* we did, because that can have a really
> bad effect on availability of variables. Now, if we're more
> industrious about generating DIExpressions to recover values that
> get optimized away, that's probably good enough, as usually you want
> to be looking at things and not so much modifying things during a
> debugging session.

That's the idea yes. :)

> As for Sony's users in particular, working in a real-time
> environment does constrain how much performance we can give away for
> other benefits like good debugging. I think we'll have to see how
> that falls out.

Thanks! It's definitely going to be a bit of a collaborative effort.
-eric

Greg Bedwell wrote:
> The general consensus is that performance must not be significantly
> worse than -O2. We've heard a few numbers thrown around like 5-10%
> runtime slowdown compared to -O2 being the absolute maximum
> acceptable level of intrusion for them to consider using such a
> mode. I'm not really sure how realistic that is and I'm inclined to
> think that we could probably stretch that limit a little bit here
> and there if the debugging experience really was that much better,
> but I think it gives a good indication of at least what our users
> are looking for. Essentially -O2 but with as few changes as we can
> get away with making to make the debugging experience better. I know
> that this is somewhat woolly, so it might be that your proposed
> pipeline is the closest we can get that matches such an aim, but
> once we've decided what -Og should mean, I'd like to try and justify
> any changes with some real data. I'm willing for my team to
> contribute as much data as we can. We've also been using dexter [
> http://llvm.org/devmtg/2018-04/slides/Bedwell-Measuring_the_User_Debugging_Experience.pdf
> ] to target our -O2 debugging improvement work, but hopefully it
> will be useful to provide another datapoint for the effects on the
> debugging experience of disabling specific passes.
>
> In my mind, -Og probably would incorporate a few things:
> * Tweak certain pass behaviors in order to be more favorable towards
> debugging [ https://reviews.llvm.org/D59431#1437716 ]
> * Enable features favorable to debugging [
> http://llvm.org/devmtg/2017-10/#lightning8 ]
> * Disable whole passes that are known to fundamentally harm the
> debugging experience if there is no other alternative approach (this
> proposal?)
> * Still give a decent debug experience when used in conjunction with
> LTO.
>
> Thanks again for writing up your proposal. I'm really happy to see
> movement in this area!
> -Greg