Chris Bieneman via llvm-dev
2017-Jul-29 04:58 UTC
[llvm-dev] [RFC] Add IR level interprocedural outliner for code size.
Apologies for the delayed joining of this discussion, but I had a few notes from this thread that I really wanted to chime in about.

River,

I don't mean to put you on the spot, but I do want to start on a semantic issue. In several places in the thread you used the words "we" and "our" to imply that you're not alone in writing this (which is totally fine), but your initial thread presented this as entirely your own work. So, when you said things like "we feel there's an advantage to being at the IR level", can you please clarify who is "we"? Given that there are a number of disagreements and opinions floating around, I think it benefits us all to speak clearly about who is taking what stances.

One particular disagreement that I think very much needs to be revisited in this thread was Jessica's proposal of a pipeline of:

1. IR outline
2. Inline
3. MIR outline

In your response to that proposal you dismissed it out of hand with "feelings" but not data. Given that the proposal came from Jessica (a community member with significant relevant experience in outlining), and that it was also recognized as interesting by Eric Christopher (a long-time member of the community with wide-reaching expertise), I think dismissing it may have been a little premature.

I also want to visit a few procedural notes.

Mehdi commented on the thread that it wouldn't be fair to ask for a comparative study because the MIR outliner didn't have one. While I don't think anyone is asking for a comparative study, I want to point out that I think it is completely fair. If a new contributor approached the community with a new SROA pass and wanted to land it in-tree, it would be appropriate to ask for a comparative analysis against the existing pass. How is this different?

Adding a new IR outliner is a different situation from when the MIR one was added. When the MIR outliner was introduced there was no in-tree analog.
When someone comes to the community with something that has no existing in-tree analog, it isn't necessarily fair to ask them to implement it multiple different ways to prove their solution is the best. However, as a community, we do still exercise the right to reject contributions we disagree with, and we frequently request changes to the implementation (as is shown every time someone tries to add SPIR-V support).

In the LLVM community we have a long history of approaching large contributions (especially ones from new contributors) with scrutiny and discussion. It would be a disservice to the project to forget that.

River, as a last note: I see that you've started uploading patches to Phabricator, and I know you're relatively new to the community. When uploading patches it helps to include appropriate reviewers so that the right people see the patches as they come in. To that end, can you please include Jessica as a reviewer? Given her relevant domain experience, I think her feedback on the patches will be very valuable.

Thank you,
-Chris

> On Jul 26, 2017, at 1:52 PM, River Riddle via llvm-dev <llvm-dev at lists.llvm.org> wrote:
>
> Hey Sanjoy,
>
> On Wed, Jul 26, 2017 at 1:41 PM, Sanjoy Das via llvm-dev <llvm-dev at lists.llvm.org> wrote:
> Hi,
>
> On Wed, Jul 26, 2017 at 12:54 PM, Sean Silva <chisophugis at gmail.com> wrote:
> > The way I interpret Quentin's statement is something like:
> >
> > - Inlining turns an interprocedural problem into an intraprocedural problem
> > - Outlining turns an intraprocedural problem into an interprocedural problem
> >
> > Insofar as our intraprocedural analyses and transformations are strictly
> > more powerful than interprocedural, then there is a precise sense in which
> > inlining exposes optimization opportunities while outlining does not.
>
> While I think our intra-proc optimizations are *generally* more powerful, I
> don't think they are *always* more powerful. For instance, LICM (today) won't
> hoist full regions but it can hoist single function calls. If we can extract
> out a region into a readnone+nounwind function call then LICM will hoist it
> to the preheader if the safety checks pass.
>
> > Actually, for his internship last summer River wrote a profile-guided
> > outliner / partial inliner (it didn't try to do deduplication; so it was
> > more like PartialInliner.cpp). IIRC he found that LLVM's interprocedural
> > analyses were so bad that there were pretty adverse effects from many of the
> > outlining decisions. E.g. if you outline from the left side of a diamond,
> > that side basically becomes a black box to most LLVM analyses and forces
> > downstream dataflow meet points to give an overly conservative result, even
> > though our standard intraprocedural analyses would have happily dug through
> > the left side of the diamond if the code had not been outlined.
> >
> > Also, River's patch (the one in this thread) does parameterized outlining.
> > For example, two sequences containing stores can be outlined even if the
> > corresponding stores have different pointers. The pointer to be loaded from
> > is passed as a parameter to the outlined function. In that sense, the
> > outlined function's behavior becomes a conservative approximation of both,
> > which in principle loses precision.
>
> Can we outline only once we've already done all of these optimizations
> that outlining would block?
>
> The outliner is able to run at any point in the interprocedural pipeline.
> There are currently two locations: early outlining (pre-inliner) and late
> outlining (practically the last pass to run). It is configured to run either
> Early+Late, or just Late.
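[Editor's note: the parameterized outlining described in the quote above can be sketched in C. This is a purely hypothetical illustration — all names are invented, and the actual pass operates on LLVM IR, not C source. Two store sequences that differ only in the pointer they write through cannot be merged by an exact-equivalence outliner, but a parameterized outliner can fold them into one function that takes the differing pointer as an argument.]

```c
/* Hypothetical sketch of parameterized outlining (names invented).
 *
 * Before outlining, foo conceptually contains two near-identical
 * store sequences that differ only in the base pointer:
 *
 *   a[0] = v; a[1] = v + 1; a[2] = v + 2;   // writes through a
 *   b[0] = v; b[1] = v + 1; b[2] = v + 2;   // writes through b
 */

/* After outlining: one shared function; the differing pointer is
 * promoted to a parameter, so both sequences map to the same body. */
static void outlined_stores(int *p, int v) {
  p[0] = v;
  p[1] = v + 1;
  p[2] = v + 2;
}

void foo(int *a, int *b, int v) {
  outlined_stores(a, v); /* replaces the first sequence */
  outlined_stores(b, v); /* replaces the second sequence */
}
```

Note how `foo` now calls the same callee for both sequences: analyses see one function whose behavior covers both call sites, which is exactly the "conservative approximation" Sean mentions.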
>
> > I like your EarlyCSE example, and it is interesting that combined with
> > functionattrs it can make a "cheap" pass get a transformation that an
> > "expensive" pass would otherwise be needed for. Are there any cases where we
> > only have the "cheap" pass, and thus the outlining would be essential for
> > our optimization pipeline to get the optimization right?
> >
> > The case that comes to mind for me is cases where we have some cutoff of
> > search depth. Reducing a sequence to a single call (+ functionattr
> > inference) can essentially summarize the sequence and effectively increase
> > search depth, which might give more results. That seems like a bit of a
> > weak example though.
>
> I don't know if River's patch outlines entire control flow regions at
> a time, but if it does then we could use cheap basic-block scanning
> analyses for things that would normally require CFG-level analysis.
>
> The current patch just supports outlining from within a single block.
> Although I had a working prototype for region-based outlining, I kept it out
> of this patch for simplicity. So it's entirely possible to add that kind of
> functionality, because I've already tried.
>
> Thanks,
> River Riddle
>
> -- Sanjoy
>
> > -- Sean Silva
> >
> > On Wed, Jul 26, 2017 at 12:07 PM, Sanjoy Das via llvm-dev
> > <llvm-dev at lists.llvm.org> wrote:
> >>
> >> Hi,
> >>
> >> On Wed, Jul 26, 2017 at 10:10 AM, Quentin Colombet via llvm-dev
> >> <llvm-dev at lists.llvm.org> wrote:
> >> > No, I mean in terms of enabling other optimizations in the pipeline like
> >> > the vectorizer. The outliner does not expose any of that.
> >>
> >> I have not made a lot of effort to understand the full discussion here (so
> >> what I say below may be off-base), but I think there are some cases where
> >> outlining (especially working with function-attrs) can make optimization
> >> easier.
> >>
> >> It can help transforms that duplicate code (like loop unrolling and
> >> inlining) be more profitable -- I'm thinking of cases where
> >> unrolling/inlining would have to duplicate a lot of code, but after
> >> outlining would require duplicating only a few call instructions.
> >>
> >> It can help EarlyCSE do things that require GVN today:
> >>
> >>   void foo() {
> >>     ... complex computation that computes func()
> >>     ... complex computation that computes func()
> >>   }
> >>
> >> outlining =>
> >>
> >>   int func() { ... }
> >>
> >>   void foo() {
> >>     int x = func();
> >>     int y = func();
> >>   }
> >>
> >> functionattrs =>
> >>
> >>   int func() readonly { ... }
> >>
> >>   void foo(int a, int b) {
> >>     int x = func();
> >>     int y = func();
> >>   }
> >>
> >> earlycse =>
> >>
> >>   int func(int t) readnone { ... }
> >>
> >>   void foo(int a, int b) {
> >>     int x = func(a);
> >>     int y = x;
> >>   }
> >>
> >> GVN will catch this, but EarlyCSE is (at least supposed to be!) cheaper.
> >>
> >> Once we have an analysis that can prove that certain functions can't trap,
> >> outlining can allow LICM etc. to speculate entire outlined regions out of
> >> loops.
> >>
> >> Generally, I think outlining exposes information that certain regions of
> >> the program are doing identical things. We should expect to get some
> >> mileage out of this information.
> >>
> >> -- Sanjoy
> >> _______________________________________________
> >> LLVM Developers mailing list
> >> llvm-dev at lists.llvm.org
> >> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
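[Editor's note: Sanjoy's LICM point in the quoted message can be made concrete with a small sketch. This is a hypothetical C analogue (function names invented; in LLVM the `readnone`/`nounwind` attributes and the hoist happen on IR, not C source): once a region is reduced to a single call that provably reads no memory and cannot trap or unwind, LICM may treat the call as safe to speculate and hoist it out of the loop.]

```c
/* Stands in for an outlined function that functionattrs could mark
 * readnone + nounwind: it touches no memory and cannot trap. */
static int compute(int k) { return k * k + 1; }

/* Before LICM: the loop-invariant call sits inside the loop body,
 * so it executes once per iteration. */
int sum_before(const int *v, int n, int k) {
  int s = 0;
  for (int i = 0; i < n; ++i)
    s += v[i] + compute(k);
  return s;
}

/* After LICM: because the call reads no memory and cannot trap or
 * unwind, it is safe to hoist it to the preheader and call it once. */
int sum_after(const int *v, int n, int k) {
  int c = compute(k); /* hoisted: same value every iteration */
  int s = 0;
  for (int i = 0; i < n; ++i)
    s += v[i] + c;
  return s;
}
```

The key observation is that LICM does not need to reason about the whole region's body; a single call with the right attributes is enough, which is what outlining plus attribute inference buys.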
River Riddle via llvm-dev
2017-Jul-29 05:33 UTC
[llvm-dev] [RFC] Add IR level interprocedural outliner for code size.
Hi Chris,

It's okay to put this on the spot, because posting the patches was meant to help further the discussion that kind of stalled previously.

On Fri, Jul 28, 2017 at 9:58 PM, Chris Bieneman <beanz at apple.com> wrote:

> Apologies for the delayed joining of this discussion, but I had a few notes
> from this thread that I really wanted to chime in about.
>
> River,
>
> I don't mean to put you on the spot, but I do want to start on a semantic
> issue. In several places in the thread you used the words "we" and "our" to
> imply that you're not alone in writing this (which is totally fine), but
> your initial thread presented this as entirely your own work. So, when you
> said things like "we feel there's an advantage to being at the IR level",
> can you please clarify who is "we"?

In regards to the words "we" and "our", I am referring to myself. My writing style tends to shift between the usage of those words. I'll avoid any kind of confusion in the future.

> Given that there are a number of disagreements and opinions floating
> around I think it benefits us all to speak clearly about who is taking what
> stances.
>
> One particular disagreement that I think very much needs to be revisited
> in this thread was Jessica's proposal of a pipeline of:
>
> 1. IR outline
> 2. Inline
> 3. MIR outline
>
> In your response to that proposal you dismissed it out of hand with
> "feelings" but not data. Given that the proposal came from Jessica (a
> community member with significant relevant experience in outlining), and it
> was also recognized as interesting by Eric Christopher (a long-time member
> of the community with wide-reaching expertise), I think dismissing it may
> have been a little premature.

I dismissed the idea of an outliner at the machine level being able to catch bad inlining decisions.
Given the loss of information between the two, I felt it was a little optimistic to rely on a very late pass being able to reverse those decisions, especially coupled with the fact that the current machine outliner requires exact equivalence. I don't disagree with the proposal of an outline, inline, outline pipeline, but the idea of being able to catch bad inlining decisions given the circumstances seemed optimistic to me. From there I went ahead and implemented a generic interface for outlining that can be shared between the IR/Machine levels so that such a pipeline could be more feasible.

> I also want to visit a few procedural notes.
>
> Mehdi commented on the thread that it wouldn't be fair to ask for a
> comparative study because the MIR outliner didn't have one. While I don't
> think anyone is asking for a comparative study, I want to point out that I
> think it is completely fair. If a new contributor approached the community
> with a new SROA pass and wanted to land it in-tree it would be appropriate
> to ask for a comparative analysis against the existing pass. How is this
> different?

The real question comes from what exactly you want to define as a "comparative analysis". When posting the patch I included additional performance data (found here: goo.gl/5k6wsP) that includes benchmarking and comparisons between the outliner that I am proposing and the machine outliner on a wide variety of benchmarks. The proposed outliner performs quite favorably in comparison.

As for feature comparison, the proposed outliner has many features currently missing from the machine outliner:

- parameterization
- outputs
- relaxed equivalence (the machine outliner requires exact equivalence)
- usage of profile data
- support for opt remarks

The machine outliner currently only supports X86 and AArch64; the IR outliner can/should support all targets immediately, without the requirement of ABI restrictions (-mno-red-zone is required for the machine outliner).
At the IR level we have much more opportunity to find congruent instructions than at the machine level, given the possible variation at that level: register allocation, instruction selection, instruction scheduling, etc. I am more than willing to do a comparative analysis, but I'm not quite sure what the expectation for one is.

> Adding a new IR outliner is a different situation from when the MIR one
> was added. When the MIR outliner was introduced there was no in-tree
> analog. When someone comes to the community with something that has no
> existing in-tree analog it isn't fair to necessarily ask them to implement
> it multiple different ways to prove their solution is the best. However, as
> a community, we do still exercise the right to reject contributions we
> disagree with, and we frequently request changes to the implementation (as
> is shown every time someone tries to add SPIR-V support).

I perfectly agree :)

> In the LLVM community we have a long history of approaching large
> contributions (especially ones from new contributors) with scrutiny and
> discussion. It would be a disservice to the project to forget that.
>
> River, as a last note. I see that you've started uploading patches to
> Phabricator, and I know you're relatively new to the community. When
> uploading patches it helps to include appropriate reviewers so that the
> right people see the patches as they come in. To that end can you please
> include Jessica as a reviewer? Given her relevant domain experience I think
> her feedback on the patches will be very valuable.

I accidentally posted without any reviewers at first; I've been going back through and adding people I missed.

> Thank you,
> -Chris

I appreciate the feedback and welcome all critical discussion about the right way to move forward.
Thanks,
River Riddle
Evgeny Astigeevich via llvm-dev
2017-Jul-31 15:46 UTC
[llvm-dev] [RFC] Add IR level interprocedural outliner for code size.
Hi Chris,> One particular disagreement that I think very much needs to be revisited in this thread was Jessica's proposal of a pipeline of: > 1. IR outline > 2. Inline > 3. MIR outlineIMHO, there is no need to restrict a place of the Outliner in the pipeline at the moment. I hope people representing different architectures will try different configurations and the best will be chosen. I’d like to try the pipeline configuration: 1. Inline 2. IR optimizations 3. IR outline 4. MIR optimizations 5. MIR outline I think this configuration allows to apply as many IR optimizations, especially those which reduce code size, as possible and then extract commonly used code into functions. I am also interested in some kind of Oz LTO with the IR Outliner enabled. Evgeny Astigeevich Senior Compiler Engineer Compilation Tools ARM From: llvm-dev [mailto:llvm-dev-bounces at lists.llvm.org] On Behalf Of River Riddle via llvm-dev Sent: Saturday, July 29, 2017 6:33 AM To: Chris Bieneman Cc: llvm-dev Subject: Re: [llvm-dev] [RFC] Add IR level interprocedural outliner for code size. Hi Chris, It's okay to put this on the spot because posting the patches was meant to help further the discussion that kind of stalled previously. On Fri, Jul 28, 2017 at 9:58 PM, Chris Bieneman <beanz at apple.com<mailto:beanz at apple.com>> wrote: Apologies for delayed joining of this discussion, but I had a few notes from this thread that I really wanted to chime in about. River, I don't mean to put you on the spot, but I do want to start on a semantic issue. In several places in the thread you used the words "we" and "our" to imply that you're not alone in writing this (which is totally fine), but your initial thread presented this as entirely your own work. So, when you said things like "we feel there's an advantage to being at the IR level", can you please clarify who is "we"? In regards to the words "we" and "our", I am referring to myself. 
My writing style tends to shift between the usage of those words. I'll avoid any kind of confusion in the future. Given that there are a number of disagreements and opinions floating around I think it benefits us all to speak clearly about who is taking what stances. One particular disagreement that I think very much needs to be revisited in this thread was Jessica's proposal of a pipeline of: 1. IR outline 2. Inline 3. MIR outline In your response to that proposal you dismissed it out of hand with "feelings" but not data. Given that the proposal came from Jessica (a community member with significant relevant experience in outlining), and it was also recognized as interesting by Eric Christopher (a long-time member of the community with wide reaching expertise), I think dismissing it may have been a little premature. I dismissed the idea of an outliner at the machine level being able to catch bad inlining decisions. Given the loss of information between the two I felt it was a little optimistic to rely on a very late pass being able to reverse those decisions, especially coupled with the fact that the current machine outliner requires exact equivalence. I don't disagree with the proposal of an example : outline, inline, outline: pipeline, but the idea of being able to catch inlining decisions given the circumstances seemed optimistic to me. From there I went ahead and implemented a generic interface for outlining that can be shared between IR/Machine level so that such a pipeline could be more feasible. I also want to visit a few procedural notes. Mehdi commented on the thread that it wouldn't be fair to ask for a comparative study because the MIR outliner didn't have one. While I don't think anyone is asking for a comparative study, I want to point out that I think it is completely fair. 
If a new contributor approached the community with a new SROA pass and wanted to land it in-tree it would be appropriate to ask for a comparative analysis against the existing pass. How is this different? The real question comes from what exactly you want to define as a "comparative analysis". When posting the patch I included additional performance data( found here goo.gl/5k6wsP<http://goo.gl/5k6wsP>) that includes benchmarking and comparisons between the outliner that I am proposing and the machine outliner on a wide variety of benchmarks. The proposed outliner performs quite favorable in comparison. As for feature comparison, the proposed outliner has many features currently missing from the machine outliner: - parameterization - outputs - relaxed equivalence(machine outliner requires exact) - usage of profile data - support for opt remarks The machine outliner currently only supports X86 and AArch64, the IR outliner can/should support all targets immediately without the requirement of ABI restrictions(mno-red-zone is required for the machine outliner). At the IR level we have much more opportunity to find congruent instructions than at the machine level given the possible variation at that level: RA, instruction selection, instruction scheduling, etc. I am more than willing to do a comparative analysis but I'm not quite sure what the expectation for one is. Adding a new IR outliner is a different situation from when the MIR one was added. When the MIR outliner was introduced there was no in-tree analog. When someone comes to the community with something that has no existing in-tree analog it isn't fair to necessarily ask them to implement it multiple different ways to prove their solution is the best. However, as a community, we do still exercise the right to reject contributions we disagree with, and we frequently request changes to the implementation (as is shown every time someone tries to add SPIR-V support). 
I perfectly agree :) In the LLVM community we have a long history of approaching large contributions (especially ones from new contributors) with scrutiny and discussion. It would be a disservice to the project to forget that. River, as a last note. I see that you've started uploading patches to Phabricator, and I know you're relatively new to the community. When uploading patches it helps to include appropriate reviewers so that the right people see the patches as they come in. To that end can you please include Jessica as a reviewer? Given her relevant domain experience I think her feedback on the patches will be very valuable. I accidentally posted without any reviewers at first, I've been going back through and adding people I missed. Thank you, -Chris I appreciate the feedback and welcome all critical discussion about the right way to move forward. Thanks, River Riddle On Jul 26, 2017, at 1:52 PM, River Riddle via llvm-dev <llvm-dev at lists.llvm.org<mailto:llvm-dev at lists.llvm.org>> wrote: Hey Sanjoy, On Wed, Jul 26, 2017 at 1:41 PM, Sanjoy Das via llvm-dev <llvm-dev at lists.llvm.org<mailto:llvm-dev at lists.llvm.org>> wrote: Hi, On Wed, Jul 26, 2017 at 12:54 PM, Sean Silva <chisophugis at gmail.com<mailto:chisophugis at gmail.com>> wrote:> The way I interpret Quentin's statement is something like: > > - Inlining turns an interprocedural problem into an intraprocedural problem > - Outlining turns an intraprocedural problem into an interprocedural problem > > Insofar as our intraprocedural analyses and transformations are strictly > more powerful than interprocedural, then there is a precise sense in which > inlining exposes optimization opportunities while outlining does not.While I think our intra-proc optimizations are *generally* more powerful, I don't think they are *always* more powerful. For instance, LICM (today) won't hoist full regions but it can hoist single function calls. 
If we can extract out a region into a readnone+nounwind function call then LICM will hoist it to the preheader if the safety checks pass.> Actually, for his internship last summer River wrote a profile-guided > outliner / partial inliner (it didn't try to do deduplication; so it was > more like PartialInliner.cpp). IIRC he found that LLVM's interprocedural > analyses were so bad that there were pretty adverse effects from many of the > outlining decisions. E.g. if you outline from the left side of a diamond, > that side basically becomes a black box to most LLVM analyses and forces > downstream dataflow meet points to give an overly conservative result, even > though our standard intraprocedural analyses would have happily dug through > the left side of the diamond if the code had not been outlined. > > Also, River's patch (the one in this thread) does parameterized outlining. > For example, two sequences containing stores can be outlined even if the > corresponding stores have different pointers. The pointer to be loaded from > is passed as a parameter to the outlined function. In that sense, the > outlined function's behavior becomes a conservative approximation of both > which in principle loses precision.Can we outline only once we've already done all of these optimizations that outlining would block? The outliner is able to run at any point in the interprocedural pipeline. There are currently two locations: Early outlining(pre inliner) and late outlining(practically the last pass to run). It is configured to run either Early+Late, or just Late.> I like your EarlyCSE example and it is interesting that combined with > functionattrs it can make a "cheap" pass get a transformation that an > "expensive" pass would otherwise be needed. Are there any cases where we > only have the "cheap" pass and thus the outlining would be essential for our > optimization pipeline to get the optimization right? 
> The case that comes to mind for me is cases where we have some cutoff of
> search depth. Reducing a sequence to a single call (+ functionattr
> inference) can essentially summarize the sequence and effectively increase
> search depth, which might give more results. That seems like a bit of a weak
> example though.

I don't know if River's patch outlines entire control flow regions at a time, but if it does then we could use cheap basic block scanning analyses for things that would normally require CFG-level analysis.

The current patch just supports outlining from within a single block. Although I had a working prototype for region-based outlining, I kept it out of this patch for simplicity. So it's entirely possible to add that kind of functionality; I've already tried.
Thanks,
River Riddle

-- Sanjoy

> -- Sean Silva
>
> On Wed, Jul 26, 2017 at 12:07 PM, Sanjoy Das via llvm-dev
> <llvm-dev at lists.llvm.org> wrote:
>>
>> Hi,
>>
>> On Wed, Jul 26, 2017 at 10:10 AM, Quentin Colombet via llvm-dev
>> <llvm-dev at lists.llvm.org> wrote:
>> > No, I mean in terms of enabling other optimizations in the pipeline like
>> > vectorizer. Outliner does not expose any of that.
>>
>> I have not made a lot of effort to understand the full discussion here (so
>> what I say below may be off-base), but I think there are some cases where
>> outlining (especially working with function-attrs) can make optimization
>> easier.
>>
>> It can help transforms that duplicate code (like loop unrolling and
>> inlining) be more profitable -- I'm thinking of cases where
>> unrolling/inlining would have to duplicate a lot of code, but after
>> outlining would require duplicating only a few call instructions.
>>
>> It can help EarlyCSE do things that require GVN today:
>>
>>   void foo() {
>>     ... complex computation that computes func()
>>     ... complex computation that computes func()
>>   }
>>
>> outlining =>
>>
>>   int func() { ... }
>>
>>   void foo() {
>>     int x = func();
>>     int y = func();
>>   }
>>
>> functionattrs =>
>>
>>   int func() readonly { ... }
>>
>>   void foo(int a, int b) {
>>     int x = func();
>>     int y = func();
>>   }
>>
>> earlycse =>
>>
>>   int func(int t) readnone { ... }
>>
>>   void foo(int a, int b) {
>>     int x = func(a);
>>     int y = x;
>>   }
>>
>> GVN will catch this, but EarlyCSE is (at least supposed to be!) cheaper.
>>
>> Once we have an analysis that can prove that certain functions can't trap,
>> outlining can allow LICM etc. to speculate entire outlined regions out of
>> loops.
>>
>> Generally, I think outlining exposes information that certain regions of
>> the program are doing identical things. We should expect to get some
>> mileage out of this information.
>>
>> -- Sanjoy
>> _______________________________________________
>> LLVM Developers mailing list
>> llvm-dev at lists.llvm.org
>> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev

_______________________________________________
LLVM Developers mailing list
llvm-dev at lists.llvm.org
http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
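Sanjoy's EarlyCSE example hinges on one fact: a local CSE may only reuse a previous call result once the callee is known to be side-effect-free. The following is a toy Python sketch of that idea (not LLVM code; all names and the instruction encoding are invented for illustration, and the purity flag stands in for LLVM's readnone/readonly attributes):

```python
# Toy local CSE over straight-line "instructions" (hypothetical IR, not LLVM).
# Each instruction is (dest, op, callee, args); a set of known-pure callees
# stands in for LLVM's readnone/readonly function attributes.

def local_cse(block, pure_callees):
    """Replace repeated calls to known-pure callees with the first result."""
    available = {}  # (callee, args) -> dest holding the earlier result
    out = []
    for dest, op, callee, args in block:
        key = (callee, tuple(args))
        if op == "call" and callee in pure_callees and key in available:
            # Reuse the earlier value instead of re-calling (the CSE step).
            out.append((dest, "copy", available[key], ()))
        else:
            if op == "call" and callee in pure_callees:
                available[key] = dest
            out.append((dest, op, callee, args))
    return out

block = [
    ("x", "call", "func", ["a"]),
    ("y", "call", "func", ["a"]),  # same callee, same args
]

# Without the purity fact, nothing can be removed.
assert local_cse(block, pure_callees=set()) == block
# With func known pure (readnone-like), the second call folds to a copy.
print(local_cse(block, pure_callees={"func"}))
# -> [('x', 'call', 'func', ['a']), ('y', 'copy', 'x', ())]
```

The point of the sketch is only the ordering argument from the email: outlining creates the repeated calls, functionattrs supplies the purity fact, and only then can a cheap single-pass CSE fire.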
Eric Christopher via llvm-dev
2017-Jul-31 16:55 UTC
[llvm-dev] [RFC] Add IR level interprocedural outliner for code size.
Hi River,

>> Given that there are a number of disagreements and opinions floating
>> around I think it benefits us all to speak clearly about who is taking what
>> stances.
>>
>> One particular disagreement that I think very much needs to be revisited
>> in this thread was Jessica's proposal of a pipeline of:
>>
>> 1. IR outline
>> 2. Inline
>> 3. MIR outline
>>
>> In your response to that proposal you dismissed it out of hand with
>> "feelings" but not data. Given that the proposal came from Jessica (a
>> community member with significant relevant experience in outlining), and it
>> was also recognized as interesting by Eric Christopher (a long-time member
>> of the community with wide reaching expertise), I think dismissing it may
>> have been a little premature.

> I dismissed the idea of an outliner at the machine level being able to
> catch bad inlining decisions. Given the loss of information between the two
> levels, I felt it was a little optimistic to rely on a very late pass being
> able to reverse those decisions, especially coupled with the fact that the
> current machine outliner requires exact equivalence. I don't disagree with
> the proposal of an outline, inline, outline pipeline, but the idea of being
> able to catch bad inlining decisions given the circumstances seemed
> optimistic to me. From there I went ahead and implemented a generic
> interface for outlining that can be shared between the IR/Machine levels so
> that such a pipeline could be more feasible.

Honestly, given that the owner of the outlining code was suggesting this path, I don't think that without a concrete reason you should unilaterally make this decision.

>> I also want to visit a few procedural notes.
>>
>> Mehdi commented on the thread that it wouldn't be fair to ask for a
>> comparative study because the MIR outliner didn't have one. While I don't
>> think anyone is asking for a comparative study, I want to point out that I
>> think it is completely fair.
>> If a new contributor approached the community
>> with a new SROA pass and wanted to land it in-tree it would be appropriate
>> to ask for a comparative analysis against the existing pass. How is this
>> different?

> The real question comes from what exactly you want to define as a
> "comparative analysis". When posting the patch I included additional
> performance data (found here: goo.gl/5k6wsP) that includes benchmarking
> and comparisons between the outliner that I am proposing and the machine
> outliner on a wide variety of benchmarks. The proposed outliner performs
> quite favorably in comparison. As for feature comparison, the proposed
> outliner has many features currently missing from the machine outliner:
> - parameterization
> - outputs
> - relaxed equivalence (the machine outliner requires exact equivalence)
> - usage of profile data
> - support for opt remarks
>
> The machine outliner currently only supports X86 and AArch64; the IR
> outliner can/should support all targets immediately without the requirement
> of ABI restrictions (-mno-red-zone is required for the machine outliner).
> At the IR level we have much more opportunity to find congruent
> instructions than at the machine level, given the possible variation at
> that level: register allocation, instruction selection, instruction
> scheduling, etc.

These are all theoretical advantages, and quite compelling; however, numbers are important and I think we should see some.

>> In the LLVM community we have a long history of approaching large
>> contributions (especially ones from new contributors) with scrutiny and
>> discussion. It would be a disservice to the project to forget that.
>>
>> River, as a last note. I see that you've started uploading patches to
>> Phabricator, and I know you're relatively new to the community. When
>> uploading patches it helps to include appropriate reviewers so that the
>> right people see the patches as they come in. To that end can you please
>> include Jessica as a reviewer?
>> Given her relevant domain experience I think
>> her feedback on the patches will be very valuable.

> I accidentally posted without any reviewers at first, I've been going back
> through and adding people I missed.

Last I checked you had still not added Jessica here. I think for design and future decisions she should be added and be considered one of the prime reviewers of this effort.

-eric
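River's feature list above leads with "parameterization": sequences that match except for some operands can still be outlined, with the differing operands turned into arguments of the outlined function. A toy Python sketch of that mechanism (invented encoding, not River's actual pass; a real implementation would also unify repeated operands into a single parameter rather than emitting one per difference):

```python
# Toy illustration of parameterized outlining: two instruction sequences that
# are identical except for some operands are merged into one outlined
# "function" template, with each differing operand becoming a parameter.

def outline_parameterized(seq_a, seq_b):
    """If the sequences match opcode-for-opcode, return
    (template, args_a, args_b); otherwise return None."""
    if len(seq_a) != len(seq_b):
        return None
    template, args_a, args_b = [], [], []
    for (op_a, *ops_a), (op_b, *ops_b) in zip(seq_a, seq_b):
        if op_a != op_b or len(ops_a) != len(ops_b):
            return None  # opcodes must match exactly
        new_ops = []
        for x, y in zip(ops_a, ops_b):
            if x == y:
                new_ops.append(x)  # common operand stays in the template
            else:
                # Differing operand becomes a formal parameter. (A smarter
                # outliner would reuse one parameter for repeated operands.)
                new_ops.append(f"%arg{len(args_a)}")
                args_a.append(x)
                args_b.append(y)
        template.append((op_a, *new_ops))
    return template, args_a, args_b

# Same shape, different pointers -- exact equivalence would reject this.
seq_a = [("load", "p"), ("add", "t", "1"), ("store", "t", "p")]
seq_b = [("load", "q"), ("add", "t", "1"), ("store", "t", "q")]
print(outline_parameterized(seq_a, seq_b))
```

This is what distinguishes the proposed pass from exact-equivalence outlining: the two call sites each pass their own pointer, so both sequences collapse into one body.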
Mehdi AMINI via llvm-dev
2017-Aug-01 05:38 UTC
[llvm-dev] [RFC] Add IR level interprocedural outliner for code size.
2017-07-28 21:58 GMT-07:00 Chris Bieneman via llvm-dev <llvm-dev at lists.llvm.org>:

> Apologies for delayed joining of this discussion, but I had a few notes
> from this thread that I really wanted to chime in about.
>
> River,
>
> I don't mean to put you on the spot, but I do want to start on a semantic
> issue. In several places in the thread you used the words "we" and "our" to
> imply that you're not alone in writing this (which is totally fine), but
> your initial thread presented this as entirely your own work. So, when you
> said things like "we feel there's an advantage to being at the IR level",
> can you please clarify who is "we"?
>
> Given that there are a number of disagreements and opinions floating
> around I think it benefits us all to speak clearly about who is taking what
> stances.
>
> One particular disagreement that I think very much needs to be revisited
> in this thread was Jessica's proposal of a pipeline of:
>
> 1. IR outline
> 2. Inline
> 3. MIR outline
>
> In your response to that proposal you dismissed it out of hand with
> "feelings" but not data. Given that the proposal came from Jessica (a
> community member with significant relevant experience in outlining), and it
> was also recognized as interesting by Eric Christopher (a long-time member
> of the community with wide reaching expertise), I think dismissing it may
> have been a little premature.

It isn't clear to me how much the *exact* pipeline and ordering of passes is relevant to consider if "having an outliner at the IR level" is a good idea.

> I also want to visit a few procedural notes.
>
> Mehdi commented on the thread that it wouldn't be fair to ask for a
> comparative study because the MIR outliner didn't have one. While I don't
> think anyone is asking for a comparative study, I want to point out that I
> think it is completely fair.
> If a new contributor approached the community with a new SROA pass and
> wanted to land it in-tree it would be appropriate to ask for a comparative
> analysis against the existing pass. How is this different?

It seems quite different to me because there is no outliner at the IR level, and so they don't provide the same functionality. The "Why at the IR level" section of the original email combined with the performance numbers seems largely enough to me to explain why it isn't redundant to the Machine-level outliner. I'd consider this work for inclusion upstream purely on its technical merit at this point. Discussing inclusion as part of any of the default pipelines is a different story.

Similarly, last year the IR-level PGO was also implemented even though we already had a PGO implementation, because 1) it provided a generic solution for other frontends (just like here it could be said that it provides a generic solution for targets) and 2) it supported cases that FE-PGO didn't (especially around better counter context using pre-inlining and such).

> Adding a new IR outliner is a different situation from when the MIR one
> was added. When the MIR outliner was introduced there was no in-tree
> analog.

We still usually discuss design extensively. Skipping the IR-level option didn't seem obvious to me, to say the least. And it wasn't really much discussed/considered extensively upstream. If the idea is that implementing a concept at the machine level may preclude a future implementation at the IR level, it means we should be *a lot* more picky before accepting such contributions.
In this case, if I had anticipated any push-back on an IR-level implementation based only on the fact that we now have a Machine-level one, I'd likely have pushed back on the machine-level one.

> When someone comes to the community with something that has no existing
> in-tree analog it isn't fair to necessarily ask them to implement it
> multiple different ways to prove their solution is the best.

It may or may not be fair, but there is a tradeoff in how much effort we would require of them to convince the community that this is *the* right way to go, depending on what it implies for future approaches.

-- Mehdi

> However, as a community, we do still exercise the right to reject
> contributions we disagree with, and we frequently request changes to the
> implementation (as is shown every time someone tries to add SPIR-V support).
>
> In the LLVM community we have a long history of approaching large
> contributions (especially ones from new contributors) with scrutiny and
> discussion. It would be a disservice to the project to forget that.
>
> River, as a last note. I see that you've started uploading patches to
> Phabricator, and I know you're relatively new to the community. When
> uploading patches it helps to include appropriate reviewers so that the
> right people see the patches as they come in. To that end can you please
> include Jessica as a reviewer? Given her relevant domain experience I think
> her feedback on the patches will be very valuable.
Sean Silva via llvm-dev
2017-Aug-01 08:02 UTC
[llvm-dev] [RFC] Add IR level interprocedural outliner for code size.
On Jul 31, 2017 10:38 PM, "Mehdi AMINI via llvm-dev" <llvm-dev at lists.llvm.org> wrote:

> It seems quite different to me because there is no outliner at the IR
> level and so they don't provide the same functionality. The "Why at the IR
> level" section of the original email combined with the performance numbers
> seems largely enough to me to explain why it isn't redundant to the
> Machine-level outliner.
>
> We still usually discuss design extensively. Skipping the IR-level option
> didn't seem obvious to me, to say the least. And it wasn't really much
> discussed/considered extensively upstream.

I think Quentin described it pretty well in a reply to the original RFC:

"The other part is that at the LLVM IR level, or before register allocation, identifying similar code sequences is much harder, at least with a suffix tree like algorithm. Basically the problem is how do we name our instructions such that we can match them. Let me take an example.

  foo() {
    /* bunch of code */
    a = add b, c;
    d = add e, f;
  }

  bar() {
    d = add e, g;
    f = add c, w;
  }

With proper renaming, we can outline both adds in one function.
The difficulty is to recognize that they are semantically equivalent to give them the same identifier in the suffix tree. I won’t get into the details but it gets tricky quickly. We were thinking of reusing GVN to have such identifier if we wanted to do the outlining at IR level but solving this problem is hard. " The pass in this RFC solves this problem to allow using a suffix tree/array type algorithm (string algorithm) on a dataflow graph like IR or pre-RA MIR. It doesn't do it by producing value numbers based on an exact congruence relation to feed into the string algorithms though (and I think that is provably impossible except post-RA; I can elaborate if anyone is interested). Instead it uses a relaxed congruence relation for the suffix tree/array to find potential repeated sequences (that may not in fact be exactly congruent). Then further steps perform exact congruence checks on the found sequences along with parameterizing parameterizable differences. Admittedly, I don't think this has come across well in River's posts. I've been working offline with him to help him rework his approach to this RFC and how to work with the community more idiomatically. I'm hoping he'll be able to successfully reboot this RFC as I think this is a very neat algorithm. Also as a side note, I think in the original MachineOutliner RFC thread there was some confusion as to whether it was possible to solve the code folding outlining problem exactly as a graph problem on SSA using standard value numbering algorithms in polynomial time. I can elaborate further, but 1. it is easy to see that you can map an arbitrary dag to an isomorphic data flow graph in an SSA IR e.g. in LLVM IR or pre-RA MIR 2. Given two dags, you can create their respective isomorphic data flow graphs (say, put them into two separate functions) 3. 
An exact graph based code folding outliner would be able to discover if the two dataflow graphs are isomorphic (that is basically what I mean by exact) and outline them. 4. Thus, graph isomorphism on dags can be solved with such an algorithm and thus the outlining problem is GI-hard and a polynomial time solution would be a big breakthrough in CS. 5. The actual problem the outliner is trying to solve is actually more like finding subgraphs that are isomorphic, making the problem even harder (something like "given dags A and B does there exist a subgraph of A that is isomorphic to a subgraph of B") So some sort of compromise is needed. The reduction of the problem from a graph problem to a string problem is a way to work around this. We sacrifice some code folding opportunities due to the particular order in which the instructions were linearized into a string. Or to put it another way, commuting instructions could reveal better code folding opportunities to such string algorithms, but finding the optimal order to commute them into to reveal such opportunities is GI-hard. (and I think it is interesting future work to see if heuristically reordering instructions can expose more opportunities to string-based code folding outliners. For example, one can imagine a pass that tries to canonicalize prologue or call-setup sequences to promote code folding by our post-RA MachineOutliner) -- Sean Silva If the idea is that implementing a concept at the machine level may preclude a future implementation at the IR level, it means we should be *a lot* more picky before accepting such contribution. 
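The compromise Sean describes, a relaxed congruence fingerprint feeding the string search, followed by an exact check that parameterizes the differences, can be pictured with a toy sketch. This is hypothetical illustrative Python, not code from River's patch; the instruction encoding and all function names here are invented for the example.

```python
# Toy sketch (hypothetical, not from the actual patch): a *relaxed*
# congruence fingerprint lets a string algorithm find candidate repeats,
# and a later exact pass verifies the candidates and parameterizes the
# operands that differ across occurrences.
from collections import defaultdict

def relaxed_fingerprint(instr):
    """Hash only the opcode and operand count, ignoring which values the
    operands are: 'a = add b, c' and 'd = add e, g' get the same id,
    which is what lets the suffix-tree stage see them as a repeat."""
    opcode, operands = instr
    return (opcode, len(operands))

def candidate_repeats(instrs, window=2):
    """Stand-in for the suffix tree/array walk: collect starting offsets
    of repeated runs of fingerprints."""
    fps = [relaxed_fingerprint(i) for i in instrs]
    occurrences = defaultdict(list)
    for start in range(len(fps) - window + 1):
        occurrences[tuple(fps[start:start + window])].append(start)
    return {run: starts for run, starts in occurrences.items()
            if len(starts) > 1}

def parameterize(instrs, starts, length):
    """Exact check on one candidate: any operand slot whose value differs
    across the occurrences becomes a parameter of the outlined function."""
    params = []
    for pos in range(length):
        operand_lists = [instrs[s + pos][1] for s in starts]
        for slot in range(len(operand_lists[0])):
            if len({ops[slot] for ops in operand_lists}) > 1:
                params.append((pos, slot))
    return params

# Quentin's foo/bar example, linearized: the two adds differ only in
# their operands, so the relaxed fingerprints still match and the
# differing operands become parameters of the outlined function.
seq = [("add", ("b", "c")), ("store", ("p", "q")),
       ("add", ("e", "g")), ("store", ("p", "q"))]
reps = candidate_repeats(seq)
starts = reps[(("add", 2), ("store", 2))]   # occurrences at offsets 0 and 2
print(parameterize(seq, starts, 2))         # [(0, 0), (0, 1)]
```

Note how this also illustrates the linearization caveat from the thread: the fingerprints are compared in the order the instructions appear, so commuting or reordering instructions could expose repeats this sketch would miss.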
In this case, if I had anticipated any push-back on an IR-level implementation only based on the fact that we have now a Machine-level one, I'd likely have pushed back on the machine-level one.> When someone comes to the community with something that has no existing > in-tree analog it isn't fair to necessarily ask them to implement it > multiple different ways to prove their solution is the best. >It may or may not be fair, but there is a tradeoff in how much effort we would require them to convince the community that this is *the* right way to go, depending on what it implies for future approaches. -- Mehdi> However, as a community, we do still exercise the right to reject > contributions we disagree with, and we frequently request changes to the > implementation (as is shown every time someone tries to add SPIR-V support). > > In the LLVM community we have a long history of approaching large > contributions (especially ones from new contributors) with scrutiny and > discussion. It would be a disservice to the project to forget that. > > River, as a last note. I see that you've started uploading patches to > Phabricator, and I know you're relatively new to the community. When > uploading patches it helps to include appropriate reviewers so that the > right people see the patches as they come in. To that end can you please > include Jessica as a reviewer? Given her relevant domain experience I think > her feedback on the patches will be very valuable. 
> > Thank you, > -Chris > > On Jul 26, 2017, at 1:52 PM, River Riddle via llvm-dev < > llvm-dev at lists.llvm.org> wrote: > > Hey Sanjoy, > > On Wed, Jul 26, 2017 at 1:41 PM, Sanjoy Das via llvm-dev < > llvm-dev at lists.llvm.org> wrote: > >> Hi, >> >> On Wed, Jul 26, 2017 at 12:54 PM, Sean Silva <chisophugis at gmail.com> >> wrote: >> > The way I interpret Quentin's statement is something like: >> > >> > - Inlining turns an interprocedural problem into an intraprocedural >> problem >> > - Outlining turns an intraprocedural problem into an interprocedural >> problem >> > >> > Insofar as our intraprocedural analyses and transformations are strictly >> > more powerful than interprocedural, then there is a precise sense in >> which >> > inlining exposes optimization opportunities while outlining does not. >> >> While I think our intra-proc optimizations are *generally* more >> powerful, I don't think they are *always* more powerful. For >> instance, LICM (today) won't hoist full regions but it can hoist >> single function calls. If we can extract out a region into a >> readnone+nounwind function call then LICM will hoist it to the >> preheader if the safety checks pass. >> >> > Actually, for his internship last summer River wrote a profile-guided >> > outliner / partial inliner (it didn't try to do deduplication; so it was >> > more like PartialInliner.cpp). IIRC he found that LLVM's interprocedural >> > analyses were so bad that there were pretty adverse effects from many >> of the >> > outlining decisions. E.g. if you outline from the left side of a >> diamond, >> > that side basically becomes a black box to most LLVM analyses and forces >> > downstream dataflow meet points to give an overly conservative result, >> even >> > though our standard intraprocedural analyses would have happily dug >> through >> > the left side of the diamond if the code had not been outlined. >> > >> > Also, River's patch (the one in this thread) does parameterized >> outlining. 
>> > For example, two sequences containing stores can be outlined even if the >> > corresponding stores have different pointers. The pointer to be loaded >> from >> > is passed as a parameter to the outlined function. In that sense, the >> > outlined function's behavior becomes a conservative approximation of >> both >> > which in principle loses precision. >> >> Can we outline only once we've already done all of these optimizations >> that outlining would block? >> > > The outliner is able to run at any point in the interprocedural > pipeline. There are currently two locations: Early outlining(pre inliner) > and late outlining(practically the last pass to run). It is configured to > run either Early+Late, or just Late. > > >> > I like your EarlyCSE example and it is interesting that combined with >> > functionattrs it can make a "cheap" pass get a transformation that an >> > "expensive" pass would otherwise be needed. Are there any cases where we >> > only have the "cheap" pass and thus the outlining would be essential >> for our >> > optimization pipeline to get the optimization right? >> > >> > The case that comes to mind for me is cases where we have some cutoff of >> > search depth. Reducing a sequence to a single call (+ functionattr >> > inference) can essentially summarize the sequence and effectively >> increase >> > search depth, which might give more results. That seems like a bit of a >> weak >> > example though. >> >> I don't know if River's patch outlines entire control flow regions at >> a time, but if it does then we could use cheap basic block scanning >> analyses for things that would normally require CFG-level analysis. >> > > The current patch currently just supports outlining from within a single > block. Although, I had a working prototype for Region based outlining, I > kept it from this patch for simplicity. So its entirely possible to add > that kind of functionality because I've already tried. 
> Thanks, > River Riddle > > >> >> -- Sanjoy >> >> > >> > -- Sean Silva >> > >> > On Wed, Jul 26, 2017 at 12:07 PM, Sanjoy Das via llvm-dev >> > <llvm-dev at lists.llvm.org> wrote: >> >> >> >> Hi, >> >> >> >> On Wed, Jul 26, 2017 at 10:10 AM, Quentin Colombet via llvm-dev >> >> <llvm-dev at lists.llvm.org> wrote: >> >> > No, I mean in terms of enabling other optimizations in the pipeline >> like >> >> > vectorizer. Outliner does not expose any of that. >> >> >> >> I have not made a lot of effort to understand the full discussion here >> (so >> >> what >> >> I say below may be off-base), but I think there are some cases where >> >> outlining >> >> (especially working with function-attrs) can make optimization easier. >> >> >> >> It can help transforms that duplicate code (like loop unrolling and >> >> inlining) be >> >> more profitable -- I'm thinking of cases where unrolling/inlining would >> >> have to >> >> duplicate a lot of code, but after outlining would require duplicating >> >> only a >> >> few call instructions. >> >> >> >> >> >> It can help EarlyCSE do things that require GVN today: >> >> >> >> void foo() { >> >> ... complex computation that computes func() >> >> ... complex computation that computes func() >> >> } >> >> >> >> outlining=> >> >> >> >> int func() { ... } >> >> >> >> void foo() { >> >> int x = func(); >> >> int y = func(); >> >> } >> >> >> >> functionattrs=> >> >> >> >> int func() readonly { ... } >> >> >> >> void foo(int a, int b) { >> >> int x = func(); >> >> int y = func(); >> >> } >> >> >> >> earlycse=> >> >> >> >> int func(int t) readnone { ... } >> >> >> >> void foo(int a, int b) { >> >> int x = func(a); >> >> int y = x; >> >> } >> >> >> >> GVN will catch this, but EarlyCSE is (at least supposed to be!) >> cheaper. >> >> >> >> >> >> Once we have an analysis that can prove that certain functions can't >> trap, >> >> outlining can allow LICM etc. to speculate entire outlined regions out >> of >> >> loops. 
>> >> >> >> >> >> Generally, I think outlining exposes information that certain regions >> of >> >> the >> >> program are doing identical things. We should expect to get some >> mileage >> >> out of >> >> this information. >> >> >> >> -- Sanjoy >> >> _______________________________________________ >> >> LLVM Developers mailing list >> >> llvm-dev at lists.llvm.org >> >> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev >> > >> > >> _______________________________________________ >> LLVM Developers mailing list >> llvm-dev at lists.llvm.org >> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev >> > > _______________________________________________ > LLVM Developers mailing list > llvm-dev at lists.llvm.org > http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev > > > > _______________________________________________ > LLVM Developers mailing list > llvm-dev at lists.llvm.org > http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev > >_______________________________________________ LLVM Developers mailing list llvm-dev at lists.llvm.org http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev -------------- next part -------------- An HTML attachment was scrubbed... URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20170801/3ae5bc25/attachment-0001.html>
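Sanjoy's outlining, then functionattrs, then EarlyCSE progression can be made concrete with a small simulation. This is hypothetical illustrative Python, not LLVM code; the purity set stands in for an inferred readnone attribute on the outlined function.

```python
# Toy rendition (hypothetical, not LLVM's pipeline) of Sanjoy's point:
# once a repeated region is outlined into a call and the callee is
# proven side-effect free ("readnone"), a cheap single-pass local CSE
# can merge the duplicate calls.

def early_cse(instrs, pure_funcs):
    """One forward pass over straight-line (dest, callee, args) call
    instructions: a repeated call to a pure function with identical
    arguments is replaced by a copy of the earlier result."""
    available = {}  # (callee, args) -> dest of first occurrence
    out = []
    for dest, callee, args in instrs:
        key = (callee, args)
        if callee in pure_funcs and key in available:
            out.append((dest, "copy", (available[key],)))  # reuse result
        else:
            if callee in pure_funcs:
                available[key] = dest
            out.append((dest, callee, args))
    return out

# After outlining, foo() is two calls to func(); once func is marked
# pure, the second call folds to a copy of x. Without the purity fact,
# nothing can be merged.
body = [("x", "func", ("a",)), ("y", "func", ("a",))]
print(early_cse(body, pure_funcs={"func"}))
print(early_cse(body, pure_funcs=set()))
```

The design point being illustrated: the scan is linear and purely local, which is why a "cheap" pass suffices here where a more expensive value-numbering pass like GVN would otherwise be needed.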
Andrey Bokhanko via llvm-dev
2017-Aug-01 08:07 UTC
[llvm-dev] [RFC] Add IR level interprocedural outliner for code size.
All, +1 to what Mehdi said. It's a fair concern to question whether we need yet another Outlining pass. I believe this concern has been cleared by River -- both with theoretical arguments and practical data (benchmark numbers). Jessica's pipeline proposal is completely orthogonal. It's not fair to request River to implement / fit into what she suggested. Sure, it's a valid topic to discuss -- but a completely orthogonal one. If anything, accepting River's implementation would enable us to do experiments / developments like pipeline changes of this ilk! Yours, Andrey ==Compiler Architect NXP On Tue, Aug 1, 2017 at 7:38 AM, Mehdi AMINI via llvm-dev < llvm-dev at lists.llvm.org> wrote:> > > 2017-07-28 21:58 GMT-07:00 Chris Bieneman via llvm-dev < > llvm-dev at lists.llvm.org>: > >> Apologies for delayed joining of this discussion, but I had a few notes >> from this thread that I really wanted to chime in about. >> >> River, >> >> I don't mean to put you on the spot, but I do want to start on a semantic >> issue. In several places in the thread you used the words "we" and "our" to >> imply that you're not alone in writing this (which is totally fine), but >> your initial thread presented this as entirely your own work. So, when you >> said things like "we feel there's an advantage to being at the IR level", >> can you please clarify who is "we"? >> >> Given that there are a number of disagreements and opinions floating >> around I think it benefits us all to speak clearly about who is taking what >> stances. >> >> One particular disagreement that I think very much needs to be revisited >> in this thread was Jessica's proposal of a pipeline of: >> >> 1. IR outline >> 2. Inline >> 3. MIR outline >> >> In your response to that proposal you dismissed it out of hand with >> "feelings" but not data. 
Given that the proposal came from Jessica (a >> community member with significant relevant experience in outlining), and it >> was also recognized as interesting by Eric Christopher (a long-time member >> of the community with wide reaching expertise), I think dismissing it may >> have been a little premature. >> > > It isn't clear to me how much the *exact* pipeline and ordering of passes > is relevant to consider if "having an outliner at the IR level" is a good > idea. > > > >> I also want to visit a few procedural notes. >> >> Mehdi commented on the thread that it wouldn't be fair to ask for a >> comparative study because the MIR outliner didn't have one. While I don't >> think anyone is asking for a comparative study, I want to point out that I >> think it is completely fair. >> > If a new contributor approached the community with a new SROA pass and >> wanted to land it in-tree it would be appropriate to ask for a comparative >> analysis against the existing pass. How is this different? >> > > It seems quite different to me because there is no outliner at the IR > level and so they don't provide the same functionality. The "Why at the IR > level" section of the original email combined with the performance numbers > seems largely enough to me to explain why it isn't redundant to the > Machine-level outliner. > I'd consider this work for inclusion upstream purely on its technical > merit at this point. > Discussing inclusion as part of any of the default pipeline is a different > story. > > Similarly last year, the IR-level PGO was also implemented even though we > already had a PGO implementation, because 1) it provided a generic > solutions for other frontend (just like here it could be said that it > provides a generic solution for targets) and 2) it supported cases that > FE-PGO didn't (especially around better counter-context using pre-inlining > and such). > > > >> >> Adding a new IR outliner is a different situation from when the MIR one >> was added. 
When the MIR outliner was introduced there was no in-tree >> analog. >> > > We still usually discuss design extensively. Skipping the IR-level option > didn't seem obvious to me, to say the least. And it wasn't really much > discussed/considered extensively upstream. > If the idea is that implementing a concept at the machine level may > preclude a future implementation at the IR level, it means we should be *a > lot* more picky before accepting such contribution. > In this case, if I had anticipated any push-back on an IR-level > implementation only based on the fact that we have now a Machine-level one, > I'd likely have pushed back on the machine-level one. > > > >> When someone comes to the community with something that has no existing >> in-tree analog it isn't fair to necessarily ask them to implement it >> multiple different ways to prove their solution is the best. >> > > It may or may not be fair, but there is a tradeoff in how much effort we > would require them to convince the community that this is *the* right way > to go, depending on what it implies for future approaches. > > -- > Mehdi > > >> However, as a community, we do still exercise the right to reject >> contributions we disagree with, and we frequently request changes to the >> implementation (as is shown every time someone tries to add SPIR-V support). >> >> In the LLVM community we have a long history of approaching large >> contributions (especially ones from new contributors) with scrutiny and >> discussion. It would be a disservice to the project to forget that. >> >> River, as a last note. I see that you've started uploading patches to >> Phabricator, and I know you're relatively new to the community. When >> uploading patches it helps to include appropriate reviewers so that the >> right people see the patches as they come in. To that end can you please >> include Jessica as a reviewer? Given her relevant domain experience I think >> her feedback on the patches will be very valuable. 
>> >> Thank you, >> -Chris >> >> On Jul 26, 2017, at 1:52 PM, River Riddle via llvm-dev < >> llvm-dev at lists.llvm.org> wrote: >> >> Hey Sanjoy, >> >> On Wed, Jul 26, 2017 at 1:41 PM, Sanjoy Das via llvm-dev < >> llvm-dev at lists.llvm.org> wrote: >> >>> Hi, >>> >>> On Wed, Jul 26, 2017 at 12:54 PM, Sean Silva <chisophugis at gmail.com> >>> wrote: >>> > The way I interpret Quentin's statement is something like: >>> > >>> > - Inlining turns an interprocedural problem into an intraprocedural >>> problem >>> > - Outlining turns an intraprocedural problem into an interprocedural >>> problem >>> > >>> > Insofar as our intraprocedural analyses and transformations are >>> strictly >>> > more powerful than interprocedural, then there is a precise sense in >>> which >>> > inlining exposes optimization opportunities while outlining does not. >>> >>> While I think our intra-proc optimizations are *generally* more >>> powerful, I don't think they are *always* more powerful. For >>> instance, LICM (today) won't hoist full regions but it can hoist >>> single function calls. If we can extract out a region into a >>> readnone+nounwind function call then LICM will hoist it to the >>> preheader if the safety checks pass. >>> >>> > Actually, for his internship last summer River wrote a profile-guided >>> > outliner / partial inliner (it didn't try to do deduplication; so it >>> was >>> > more like PartialInliner.cpp). IIRC he found that LLVM's >>> interprocedural >>> > analyses were so bad that there were pretty adverse effects from many >>> of the >>> > outlining decisions. E.g. if you outline from the left side of a >>> diamond, >>> > that side basically becomes a black box to most LLVM analyses and >>> forces >>> > downstream dataflow meet points to give an overly conservative result, >>> even >>> > though our standard intraprocedural analyses would have happily dug >>> through >>> > the left side of the diamond if the code had not been outlined. 
>>> > >>> > Also, River's patch (the one in this thread) does parameterized >>> outlining. >>> > For example, two sequences containing stores can be outlined even if >>> the >>> > corresponding stores have different pointers. The pointer to be loaded >>> from >>> > is passed as a parameter to the outlined function. In that sense, the >>> > outlined function's behavior becomes a conservative approximation of >>> both >>> > which in principle loses precision. >>> >>> Can we outline only once we've already done all of these optimizations >>> that outlining would block? >>> >> >> The outliner is able to run at any point in the interprocedural >> pipeline. There are currently two locations: Early outlining(pre inliner) >> and late outlining(practically the last pass to run). It is configured to >> run either Early+Late, or just Late. >> >> >>> > I like your EarlyCSE example and it is interesting that combined with >>> > functionattrs it can make a "cheap" pass get a transformation that an >>> > "expensive" pass would otherwise be needed. Are there any cases where >>> we >>> > only have the "cheap" pass and thus the outlining would be essential >>> for our >>> > optimization pipeline to get the optimization right? >>> > >>> > The case that comes to mind for me is cases where we have some cutoff >>> of >>> > search depth. Reducing a sequence to a single call (+ functionattr >>> > inference) can essentially summarize the sequence and effectively >>> increase >>> > search depth, which might give more results. That seems like a bit of >>> a weak >>> > example though. >>> >>> I don't know if River's patch outlines entire control flow regions at >>> a time, but if it does then we could use cheap basic block scanning >>> analyses for things that would normally require CFG-level analysis. >>> >> >> The current patch currently just supports outlining from within a >> single block. 
Although, I had a working prototype for Region based >> outlining, I kept it from this patch for simplicity. So its entirely >> possible to add that kind of functionality because I've already tried. >> Thanks, >> River Riddle >> >> >>> >>> -- Sanjoy >>> >>> > >>> > -- Sean Silva >>> > >>> > On Wed, Jul 26, 2017 at 12:07 PM, Sanjoy Das via llvm-dev >>> > <llvm-dev at lists.llvm.org> wrote: >>> >> >>> >> Hi, >>> >> >>> >> On Wed, Jul 26, 2017 at 10:10 AM, Quentin Colombet via llvm-dev >>> >> <llvm-dev at lists.llvm.org> wrote: >>> >> > No, I mean in terms of enabling other optimizations in the pipeline >>> like >>> >> > vectorizer. Outliner does not expose any of that. >>> >> >>> >> I have not made a lot of effort to understand the full discussion >>> here (so >>> >> what >>> >> I say below may be off-base), but I think there are some cases where >>> >> outlining >>> >> (especially working with function-attrs) can make optimization easier. >>> >> >>> >> It can help transforms that duplicate code (like loop unrolling and >>> >> inlining) be >>> >> more profitable -- I'm thinking of cases where unrolling/inlining >>> would >>> >> have to >>> >> duplicate a lot of code, but after outlining would require duplicating >>> >> only a >>> >> few call instructions. >>> >> >>> >> >>> >> It can help EarlyCSE do things that require GVN today: >>> >> >>> >> void foo() { >>> >> ... complex computation that computes func() >>> >> ... complex computation that computes func() >>> >> } >>> >> >>> >> outlining=> >>> >> >>> >> int func() { ... } >>> >> >>> >> void foo() { >>> >> int x = func(); >>> >> int y = func(); >>> >> } >>> >> >>> >> functionattrs=> >>> >> >>> >> int func() readonly { ... } >>> >> >>> >> void foo(int a, int b) { >>> >> int x = func(); >>> >> int y = func(); >>> >> } >>> >> >>> >> earlycse=> >>> >> >>> >> int func(int t) readnone { ... 
} >>> >> >>> >> void foo(int a, int b) { >>> >> int x = func(a); >>> >> int y = x; >>> >> } >>> >> >>> >> GVN will catch this, but EarlyCSE is (at least supposed to be!) >>> cheaper. >>> >> >>> >> >>> >> Once we have an analysis that can prove that certain functions can't >>> trap, >>> >> outlining can allow LICM etc. to speculate entire outlined regions >>> out of >>> >> loops. >>> >> >>> >> >>> >> Generally, I think outlining exposes information that certain regions >>> of >>> >> the >>> >> program are doing identical things. We should expect to get some >>> mileage >>> >> out of >>> >> this information. >>> >> >>> >> -- Sanjoy >>> >> _______________________________________________ >>> >> LLVM Developers mailing list >>> >> llvm-dev at lists.llvm.org >>> >> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev >>> > >>> > >>> _______________________________________________ >>> LLVM Developers mailing list >>> llvm-dev at lists.llvm.org >>> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev >>> >> >> _______________________________________________ >> LLVM Developers mailing list >> llvm-dev at lists.llvm.org >> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev >> >> >> >> _______________________________________________ >> LLVM Developers mailing list >> llvm-dev at lists.llvm.org >> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev >> >> > > _______________________________________________ > LLVM Developers mailing list > llvm-dev at lists.llvm.org > http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev > >-------------- next part -------------- An HTML attachment was scrubbed... URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20170801/b17d3db4/attachment.html>
Chris Bieneman via llvm-dev
2017-Aug-01 18:03 UTC
[llvm-dev] [RFC] Add IR level interprocedural outliner for code size.
> On Jul 31, 2017, at 10:38 PM, Mehdi AMINI <joker.eph at gmail.com> wrote: > > > > 2017-07-28 21:58 GMT-07:00 Chris Bieneman via llvm-dev <llvm-dev at lists.llvm.org <mailto:llvm-dev at lists.llvm.org>>: > Apologies for delayed joining of this discussion, but I had a few notes from this thread that I really wanted to chime in about. > > River, > > I don't mean to put you on the spot, but I do want to start on a semantic issue. In several places in the thread you used the words "we" and "our" to imply that you're not alone in writing this (which is totally fine), but your initial thread presented this as entirely your own work. So, when you said things like "we feel there's an advantage to being at the IR level", can you please clarify who is "we"? > > Given that there are a number of disagreements and opinions floating around I think it benefits us all to speak clearly about who is taking what stances. > > One particular disagreement that I think very much needs to be revisited in this thread was Jessica's proposal of a pipeline of: > IR outline > Inline > MIR outline > In your response to that proposal you dismissed it out of hand with "feelings" but not data. Given that the proposal came from Jessica (a community member with significant relevant experience in outlining), and it was also recognized as interesting by Eric Christopher (a long-time member of the community with wide reaching expertise), I think dismissing it may have been a little premature. > > It isn't clear to me how much the *exact* pipeline and ordering of passes is relevant to consider if "having an outliner at the IR level" is a good idea.I think it is particularly relevant because based on the limited performance numbers we've seen it looks like the MIR and IR outliners have different benefits. 
Figuring out a pipeline where one doesn't prevent the other from performing good optimizations seems like a reasonable precondition to accepting these patches.> > > I also want to visit a few procedural notes. > > Mehdi commented on the thread that it wouldn't be fair to ask for a comparative study because the MIR outliner didn't have one. While I don't think anyone is asking for a comparative study, I want to point out that I think it is completely fair. > If a new contributor approached the community with a new SROA pass and wanted to land it in-tree it would be appropriate to ask for a comparative analysis against the existing pass. How is this different? > > It seems quite different to me because there is no outliner at the IR level and so they don't provide the same functionality. The "Why at the IR level" section of the original email combined with the performance numbers seems largely enough to me to explain why it isn't redundant to the Machine-level outliner. > I'd consider this work for inclusion upstream purely on its technical merit at this point.I believe the technical merit has not been shown clearly enough. The only data we've seen has been cherry-picked and there are outstanding technical questions about the approach.> Discussing inclusion as part of any of the default pipeline is a different story.The patches that were sent out *do* include it in default pass pipelines.> > Similarly last year, the IR-level PGO was also implemented even though we already had a PGO implementation, because 1) it provided a generic solutions for other frontend (just like here it could be said that it provides a generic solution for targets) and 2) it supported cases that FE-PGO didn't (especially around better counter-context using pre-inlining and such). > > > > Adding a new IR outliner is a different situation from when the MIR one was added. When the MIR outliner was introduced there was no in-tree analog. > > We still usually discuss design extensively. 
Skipping the IR-level option didn't seem obvious to me, to say the least. And it wasn't really much discussed/considered extensively upstream.The reasoning for this was covered in the discussions and in Jessica's LLVM dev meeting talk. It may not have been widely discussed because it was widely agreed on.> If the idea is that implementing a concept at the machine level may preclude a future implementation at the IR level, it means we should be *a lot* more picky before accepting such contribution.Nobody is precluding an IR implementation. We are merely holding the IR implementation to the same high standards of justification that we held the MIR one to. You may not recall this, but the MIR one took *months* to go from RFC to landing in-tree.> In this case, if I had anticipated any push-back on an IR-level implementation only based on the fact that we have now a Machine-level one, I'd likely have pushed back on the machine-level one.There is no pushback based solely on the presence of the MIR outliner. One source of inquiry about the merits of the IR outliner is its comparison to the MIR outliner, and whether or not the two can play well together. This seems like a reasonable line of inquiry to me.> > > When someone comes to the community with something that has no existing in-tree analog it isn't fair to necessarily ask them to implement it multiple different ways to prove their solution is the best. > > It may or may not be fair, but there is a tradeoff in how much effort we would require them to convince the community that this is *the* right way to go, depending on what it implies for future approaches.Sure, and several of us are trying to have a conversation with River about how the IR outliner will best fit into LLVM and what technical considerations have to be made. 
You arguing that we should just accept the patches as they are is counterproductive to us being able to ensure that the IR outliner is at an appropriate quality and has sufficient technical merit. -Chris> > -- > Mehdi > > However, as a community, we do still exercise the right to reject contributions we disagree with, and we frequently request changes to the implementation (as is shown every time someone tries to add SPIR-V support). > > In the LLVM community we have a long history of approaching large contributions (especially ones from new contributors) with scrutiny and discussion. It would be a disservice to the project to forget that. > > River, as a last note. I see that you've started uploading patches to Phabricator, and I know you're relatively new to the community. When uploading patches it helps to include appropriate reviewers so that the right people see the patches as they come in. To that end can you please include Jessica as a reviewer? Given her relevant domain experience I think her feedback on the patches will be very valuable. > > Thank you, > -Chris > >> On Jul 26, 2017, at 1:52 PM, River Riddle via llvm-dev <llvm-dev at lists.llvm.org <mailto:llvm-dev at lists.llvm.org>> wrote: >> >> Hey Sanjoy, >> >> On Wed, Jul 26, 2017 at 1:41 PM, Sanjoy Das via llvm-dev <llvm-dev at lists.llvm.org <mailto:llvm-dev at lists.llvm.org>> wrote: >> Hi, >> >> On Wed, Jul 26, 2017 at 12:54 PM, Sean Silva <chisophugis at gmail.com <mailto:chisophugis at gmail.com>> wrote: >> > The way I interpret Quentin's statement is something like: >> > >> > - Inlining turns an interprocedural problem into an intraprocedural problem >> > - Outlining turns an intraprocedural problem into an interprocedural problem >> > >> > Insofar as our intraprocedural analyses and transformations are strictly >> > more powerful than interprocedural, then there is a precise sense in which >> > inlining exposes optimization opportunities while outlining does not. 
>> >> While I think our intra-proc optimizations are *generally* more >> powerful, I don't think they are *always* more powerful. For >> instance, LICM (today) won't hoist full regions but it can hoist >> single function calls. If we can extract out a region into a >> readnone+nounwind function call then LICM will hoist it to the >> preheader if the safety checks pass. >> >> > Actually, for his internship last summer River wrote a profile-guided >> > outliner / partial inliner (it didn't try to do deduplication; so it was >> > more like PartialInliner.cpp). IIRC he found that LLVM's interprocedural >> > analyses were so bad that there were pretty adverse effects from many of the >> > outlining decisions. E.g. if you outline from the left side of a diamond, >> > that side basically becomes a black box to most LLVM analyses and forces >> > downstream dataflow meet points to give an overly conservative result, even >> > though our standard intraprocedural analyses would have happily dug through >> > the left side of the diamond if the code had not been outlined. >> > >> > Also, River's patch (the one in this thread) does parameterized outlining. >> > For example, two sequences containing stores can be outlined even if the >> > corresponding stores have different pointers. The pointer to be loaded from >> > is passed as a parameter to the outlined function. In that sense, the >> > outlined function's behavior becomes a conservative approximation of both >> > which in principle loses precision. >> >> Can we outline only once we've already done all of these optimizations >> that outlining would block? >> >> The outliner is able to run at any point in the interprocedural pipeline. There are currently two locations: Early outlining(pre inliner) and late outlining(practically the last pass to run). It is configured to run either Early+Late, or just Late. 
>>
>>
>> > I like your EarlyCSE example, and it is interesting that, combined with functionattrs, it can let a "cheap" pass get a transformation that would otherwise need an "expensive" pass. Are there any cases where we only have the "cheap" pass, and thus the outlining would be essential for our optimization pipeline to get the optimization right?
>> >
>> > The case that comes to mind for me is where we have some cutoff of search depth. Reducing a sequence to a single call (+ functionattr inference) can essentially summarize the sequence and effectively increase search depth, which might give more results. That seems like a bit of a weak example though.
>>
>> I don't know if River's patch outlines entire control-flow regions at a time, but if it does then we could use cheap basic-block-scanning analyses for things that would normally require CFG-level analysis.
>>
>> The current patch just supports outlining from within a single block. Although I had a working prototype for region-based outlining, I kept it out of this patch for simplicity. So it's entirely possible to add that kind of functionality; I've already tried it.
>>
>> Thanks,
>> River Riddle
>>
>>
>> -- Sanjoy
>>
>> >
>> > -- Sean Silva
>> >
>> > On Wed, Jul 26, 2017 at 12:07 PM, Sanjoy Das via llvm-dev <llvm-dev at lists.llvm.org> wrote:
>> >>
>> >> Hi,
>> >>
>> >> On Wed, Jul 26, 2017 at 10:10 AM, Quentin Colombet via llvm-dev <llvm-dev at lists.llvm.org> wrote:
>> >> > No, I mean in terms of enabling other optimizations in the pipeline, like the vectorizer. The outliner does not expose any of that.
>> >>
>> >> I have not made a lot of effort to understand the full discussion here (so what I say below may be off-base), but I think there are some cases where outlining (especially working with functionattrs) can make optimization easier.
>> >>
>> >> It can help transforms that duplicate code (like loop unrolling and inlining) be more profitable -- I'm thinking of cases where unrolling/inlining would have to duplicate a lot of code, but after outlining would require duplicating only a few call instructions.
>> >>
>> >> It can help EarlyCSE do things that require GVN today:
>> >>
>> >>   void foo(int a) {
>> >>     ... complex computation that computes func(a)
>> >>     ... the same complex computation that computes func(a)
>> >>   }
>> >>
>> >> outlining =>
>> >>
>> >>   int func(int t) { ... }
>> >>
>> >>   void foo(int a) {
>> >>     int x = func(a);
>> >>     int y = func(a);
>> >>   }
>> >>
>> >> functionattrs =>
>> >>
>> >>   int func(int t) readnone { ... }
>> >>
>> >>   void foo(int a) {
>> >>     int x = func(a);
>> >>     int y = func(a);
>> >>   }
>> >>
>> >> earlycse =>
>> >>
>> >>   void foo(int a) {
>> >>     int x = func(a);
>> >>     int y = x;
>> >>   }
>> >>
>> >> GVN will catch this, but EarlyCSE is (at least supposed to be!) cheaper.
>> >>
>> >> Once we have an analysis that can prove that certain functions can't trap, outlining can allow LICM etc. to speculate entire outlined regions out of loops.
>> >>
>> >> Generally, I think outlining exposes the information that certain regions of the program are doing identical things. We should expect to get some mileage out of this information.
>> >>
>> >> -- Sanjoy
>> >> _______________________________________________
>> >> LLVM Developers mailing list
>> >> llvm-dev at lists.llvm.org
>> >> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev