thr3ads.net - llvm dev - [llvm-dev] [RFC] Add IR level interprocedural outliner for code size. [Jul 2017]

Sanjoy Das via llvm-dev

2017-Jul-26 20:41 UTC

[llvm-dev] [RFC] Add IR level interprocedural outliner for code size.

Hi,

On Wed, Jul 26, 2017 at 12:54 PM, Sean Silva <chisophugis at gmail.com> wrote:
> The way I interpret Quentin's statement is something like:
>
> - Inlining turns an interprocedural problem into an intraprocedural problem
> - Outlining turns an intraprocedural problem into an interprocedural problem
>
> Insofar as our intraprocedural analyses and transformations are strictly
> more powerful than interprocedural, then there is a precise sense in which
> inlining exposes optimization opportunities while outlining does not.

While I think our intra-proc optimizations are *generally* more
powerful, I don't think they are *always* more powerful.  For
instance, LICM (today) won't hoist full regions but it can hoist
single function calls.  If we can extract out a region into a
readnone+nounwind function call then LICM will hoist it to the
preheader if the safety checks pass.
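
A minimal C sketch of the LICM point above. It is hand-written, not taken from River's patch: `region_value` is a hypothetical stand-in for an outlined region, and the GCC/Clang `const` function attribute is used here as a rough source-level analogue of IR `readnone`. `sum_in_loop` is the code as written; `sum_hoisted` is the form a compiler is allowed to produce by moving the call to the preheader.

```c
#include <stddef.h>

/* Hypothetical outlined region. The `const` attribute (GCC/Clang) promises
 * that the result depends only on the arguments and that the function does
 * not read or write memory, roughly what `readnone` expresses in LLVM IR. */
static int region_value(int a, int b) __attribute__((const));

static int region_value(int a, int b) {
    return a * 3 + b * 5 + 7;
}

/* As written: the loop-invariant call sits inside the loop body. */
int sum_in_loop(const int *v, size_t n, int a, int b) {
    int sum = 0;
    for (size_t i = 0; i < n; ++i)
        sum += v[i] + region_value(a, b);
    return sum;
}

/* Hand-hoisted equivalent: a single call in the loop preheader. Calling it
 * even when n == 0 is harmless because the call has no side effects. */
int sum_hoisted(const int *v, size_t n, int a, int b) {
    int sum = 0;
    int r = region_value(a, b);
    for (size_t i = 0; i < n; ++i)
        sum += v[i] + r;
    return sum;
}
```

Both functions return the same value for any input; the difference is only how many times the region is evaluated.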

> Actually, for his internship last summer River wrote a profile-guided
> outliner / partial inliner (it didn't try to do deduplication; so it was
> more like PartialInliner.cpp). IIRC he found that LLVM's interprocedural
> analyses were so bad that there were pretty adverse effects from many of the
> outlining decisions. E.g. if you outline from the left side of a diamond,
> that side basically becomes a black box to most LLVM analyses and forces
> downstream dataflow meet points to give an overly conservative result, even
> though our standard intraprocedural analyses would have happily dug through
> the left side of the diamond if the code had not been outlined.
>
> Also, River's patch (the one in this thread) does parameterized outlining.
> For example, two sequences containing stores can be outlined even if the
> corresponding stores have different pointers. The pointer to be stored to
> is passed as a parameter to the outlined function. In that sense, the
> outlined function's behavior becomes a conservative approximation of both
> which in principle loses precision.

Can we outline only once we've already done all of these optimizations
that outlining would block?
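
To make the parameterized outlining described in the quoted text concrete, here is a hand-written C sketch. It is not output of River's patch, and every name in it is hypothetical. The two `*_before` functions contain the same store sequence except for the destination pointer, so the shared body becomes `outlined_stores`, with that pointer passed in as a parameter.

```c
/* Hypothetical example data: two distinct global buffers. */
int buf_a[3];
int buf_b[3];

/* Before outlining: identical store sequences that differ only in the
 * pointer being stored to. */
void init_a_before(int x) {
    buf_a[0] = x;
    buf_a[1] = x + 1;
    buf_a[2] = x * 2;
}

void init_b_before(int x) {
    buf_b[0] = x;
    buf_b[1] = x + 1;
    buf_b[2] = x * 2;
}

/* After outlining: one shared body, with the differing pointer passed as a
 * parameter. Inside this function the compiler no longer knows which buffer
 * `p` refers to, which is the loss of precision mentioned in the quote. */
static void outlined_stores(int *p, int x) {
    p[0] = x;
    p[1] = x + 1;
    p[2] = x * 2;
}

void init_a_after(int x) { outlined_stores(buf_a, x); }
void init_b_after(int x) { outlined_stores(buf_b, x); }
```

The `*_after` functions store exactly the same values as their `*_before` counterparts; only the call sites stay duplicated.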

> I like your EarlyCSE example and it is interesting that combined with
> functionattrs it can make a "cheap" pass get a transformation that would
> otherwise need an "expensive" pass. Are there any cases where we only have
> the "cheap" pass and thus the outlining would be essential for our
> optimization pipeline to get the optimization right?
>
> The case that comes to mind for me is cases where we have some cutoff of
> search depth. Reducing a sequence to a single call (+ functionattr
> inference) can essentially summarize the sequence and effectively increase
> search depth, which might give more results. That seems like a bit of a weak
> example though.

I don't know if River's patch outlines entire control flow regions at
a time, but if it does then we could use cheap basic block scanning
analyses for things that would normally require CFG-level analysis.

-- Sanjoy
>
> -- Sean Silva
>
> On Wed, Jul 26, 2017 at 12:07 PM, Sanjoy Das via llvm-dev
> <llvm-dev at lists.llvm.org> wrote:
>>
>> Hi,
>>
>> On Wed, Jul 26, 2017 at 10:10 AM, Quentin Colombet via llvm-dev
>> <llvm-dev at lists.llvm.org> wrote:
>> > No, I mean in terms of enabling other optimizations in the pipeline like
>> > vectorizer. Outliner does not expose any of that.
>>
>> I have not made a lot of effort to understand the full discussion here (so
>> what I say below may be off-base), but I think there are some cases where
>> outlining (especially working with function-attrs) can make optimization
>> easier.
>>
>> It can help transforms that duplicate code (like loop unrolling and
>> inlining) be more profitable -- I'm thinking of cases where
>> unrolling/inlining would have to duplicate a lot of code, but after
>> outlining would require duplicating only a few call instructions.
>>
>>
>> It can help EarlyCSE do things that require GVN today:
>>
>> void foo() {
>>   ... complex computation that computes func()
>>   ... complex computation that computes func()
>> }
>>
>> outlining=>
>>
>> int func() { ... }
>>
>> void foo() {
>>   int x = func();
>>   int y = func();
>> }
>>
>> functionattrs=>
>>
>> int func() readonly { ... }
>>
>> void foo() {
>>   int x = func();
>>   int y = func();
>> }
>>
>> earlycse=>
>>
>> int func() readonly { ... }
>>
>> void foo() {
>>   int x = func();
>>   int y = x;
>> }
>>
>> GVN will catch this, but EarlyCSE is (at least supposed to be!) cheaper.
>>
>>
>> Once we have an analysis that can prove that certain functions can't trap,
>> outlining can allow LICM etc. to speculate entire outlined regions out of
>> loops.
>>
>>
>> Generally, I think outlining exposes information that certain regions of
>> the program are doing identical things.  We should expect to get some
>> mileage out of this information.
>>
>> -- Sanjoy

River Riddle via llvm-dev

2017-Jul-26 20:52 UTC

[llvm-dev] [RFC] Add IR level interprocedural outliner for code size.

Hey Sanjoy,

On Wed, Jul 26, 2017 at 1:41 PM, Sanjoy Das via llvm-dev
<llvm-dev at lists.llvm.org> wrote:
> Hi,
>
> On Wed, Jul 26, 2017 at 12:54 PM, Sean Silva <chisophugis at gmail.com>
> wrote:
> > The way I interpret Quentin's statement is something like:
> >
> > - Inlining turns an interprocedural problem into an intraprocedural
> >   problem
> > - Outlining turns an intraprocedural problem into an interprocedural
> >   problem
> >
> > Insofar as our intraprocedural analyses and transformations are strictly
> > more powerful than interprocedural, then there is a precise sense in
> > which inlining exposes optimization opportunities while outlining does
> > not.
>
> While I think our intra-proc optimizations are *generally* more
> powerful, I don't think they are *always* more powerful.  For
> instance, LICM (today) won't hoist full regions but it can hoist
> single function calls.  If we can extract out a region into a
> readnone+nounwind function call then LICM will hoist it to the
> preheader if the safety checks pass.
>
> > Actually, for his internship last summer River wrote a profile-guided
> > outliner / partial inliner (it didn't try to do deduplication; so it was
> > more like PartialInliner.cpp). IIRC he found that LLVM's interprocedural
> > analyses were so bad that there were pretty adverse effects from many of
> > the outlining decisions. E.g. if you outline from the left side of a
> > diamond, that side basically becomes a black box to most LLVM analyses
> > and forces downstream dataflow meet points to give an overly conservative
> > result, even though our standard intraprocedural analyses would have
> > happily dug through the left side of the diamond if the code had not been
> > outlined.
> >
> > Also, River's patch (the one in this thread) does parameterized
> > outlining. For example, two sequences containing stores can be outlined
> > even if the corresponding stores have different pointers. The pointer to
> > be stored to is passed as a parameter to the outlined function. In that
> > sense, the outlined function's behavior becomes a conservative
> > approximation of both which in principle loses precision.
>
> Can we outline only once we've already done all of these optimizations
> that outlining would block?

The outliner is able to run at any point in the interprocedural pipeline.
There are currently two locations: early outlining (pre-inliner) and late
outlining (practically the last pass to run). It is configured to run either
Early+Late or just Late.

> > I like your EarlyCSE example and it is interesting that combined with
> > functionattrs it can make a "cheap" pass get a transformation that would
> > otherwise need an "expensive" pass. Are there any cases where we only
> > have the "cheap" pass and thus the outlining would be essential for our
> > optimization pipeline to get the optimization right?
> >
> > The case that comes to mind for me is cases where we have some cutoff of
> > search depth. Reducing a sequence to a single call (+ functionattr
> > inference) can essentially summarize the sequence and effectively
> > increase search depth, which might give more results. That seems like a
> > bit of a weak example though.
>
> I don't know if River's patch outlines entire control flow regions at
> a time, but if it does then we could use cheap basic block scanning
> analyses for things that would normally require CFG-level analysis.

The patch currently supports outlining only from within a single block.
Although I had a working prototype for region-based outlining, I kept it out
of this patch for simplicity. So it's entirely possible to add that kind of
functionality, because I've already tried it.

Thanks,
  River Riddle

Chris Bieneman via llvm-dev

2017-Jul-29 04:58 UTC

[llvm-dev] [RFC] Add IR level interprocedural outliner for code size.

Apologies for delayed joining of this discussion, but I had a few notes from
this thread that I really wanted to chime in about.

River,

I don't mean to put you on the spot, but I do want to start on a semantic
issue. In several places in the thread you used the words "we" and "our" to
imply that you're not alone in writing this (which is totally fine), but your
initial thread presented this as entirely your own work. So, when you said
things like "we feel there's an advantage to being at the IR level", can you
please clarify who is "we"?

Given that there are a number of disagreements and opinions floating around I
think it benefits us all to speak clearly about who is taking what stances.

One particular disagreement that I think very much needs to be revisited in
this thread was Jessica's proposal of a pipeline of:

  IR outline
  Inline
  MIR outline

In your response to that proposal you dismissed it out of hand with "feelings"
but not data. Given that the proposal came from Jessica (a community member
with significant relevant experience in outlining), and it was also recognized
as interesting by Eric Christopher (a long-time member of the community with
wide reaching expertise), I think dismissing it may have been a little
premature.

I also want to visit a few procedural notes.

Mehdi commented on the thread that it wouldn't be fair to ask for a
comparative study because the MIR outliner didn't have one. While I don't
think anyone is asking for a comparative study, I want to point out that I
think it is completely fair. If a new contributor approached the community
with a new SROA pass and wanted to land it in-tree it would be appropriate to
ask for a comparative analysis against the existing pass. How is this
different?

Adding a new IR outliner is a different situation from when the MIR one was
added. When the MIR outliner was introduced there was no in-tree analog. When
someone comes to the community with something that has no existing in-tree
analog it isn't necessarily fair to ask them to implement it multiple
different ways to prove their solution is the best. However, as a community,
we do still exercise the right to reject contributions we disagree with, and
we frequently request changes to the implementation (as is shown every time
someone tries to add SPIR-V support).

In the LLVM community we have a long history of approaching large contributions
(especially ones from new contributors) with scrutiny and discussion. It would
be a disservice to the project to forget that.

River, as a last note: I see that you've started uploading patches to
Phabricator, and I know you're relatively new to the community. When uploading
patches it helps to include appropriate reviewers so that the right people see
the patches as they come in. To that end can you please include Jessica as a
reviewer? Given her relevant domain experience I think her feedback on the
patches will be very valuable.

Thank you,
-Chris
