On Fri, May 11, 2018 at 2:37 PM, Hiroshi Yamauchi <yamauchi at google.com> wrote:

> Hiroshi Yamauchi:
> (I came across this issue in the context of D46336 <https://reviews.llvm.org/D46336>. Thanks, Sanjay, for starting this discussion.)
>
> If we move reassociation out of instcombine, or keep additional reassociation folds out of it, the open questions for me would be:
>
> 1. Since -reassociate isn't a fixed-point pass,
>
> Daniel Berlin:
> This is fixable, fwiw, without fixpointing it.
>
> Hiroshi Yamauchi:
> How?
>
> Daniel Berlin:
> Depends on specifically which part you would like to know about ;)
>
> Hiroshi Yamauchi:
> Maybe I misunderstood what you meant by "This is fixable". Did you mean that we won't somehow need to fixpoint between instcombine and reassociate, or that the specific motivating examples from the above differentials are foldable without fixpointing?
>
> Daniel Berlin:
> If by fixpointing you mean "fixpointing reassociate and instcombine", then yes, that is fixable without fixpointing reassociate and instcombine, but it would require rewriting instcombine :)
>
> Hiroshi Yamauchi:
> If the latter, that may be the case. The concern was that we may encounter examples that need many more iterations if we don't fixpoint. As long as it's feasible to fixpoint between instcombine and reassociate, it seems to work, but I guess that would probably need some pass-management change.
>
> [continuing point 1 from the original mail:]
> we might need to repeat "-instcombine -reassociate" multiple times to fold down to what we want (relating to my comment here <https://reviews.llvm.org/D46336#1087082>). I assumed this isn't what we want to do? My impression is we don't do a fixed point with passes?
>
> Daniel Berlin:
> Well, i mean there is no practical difference between passes that we fixpoint externally and fixpoint internally.
>
> Hiroshi Yamauchi:
> I had the following in mind: Does the pass manager support fixpointing externally? Is there any performance difference? Are people okay with that in general?
>
> But if there is no practical difference, I don't see any problem with that :)
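[Illustration: "fixpointing externally" amounts to re-running the two passes from outside the pass manager until the IR stops changing. This is only a minimal sketch, assuming the legacy pass-name flags that opt accepted at the time and hypothetical file names; nobody in the thread is proposing this exact script:]

  # Sketch only: iterate -reassociate/-instcombine from outside the pass
  # manager until the printed IR reaches a fixed point.
  cp input.ll cur.ll
  while :; do
    opt -S -reassociate -instcombine cur.ll -o next.ll
    if cmp -s cur.ll next.ll; then
      break            # no further changes: fixed point reached
    fi
    mv next.ll cur.ll
  done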
> Hiroshi Yamauchi:
> 2. Since -reassociate needs to come up with one operand order (at least currently, as the only reassociate pass), would there exist a single, unique operand order that would enable all the reassociative/commutative foldings that we want?
>
> Daniel Berlin:
> In what way? Are you asking whether there is a single reassociation order that makes all foldings occur in the same operation, or something? I don't feel like i understand what you are asking.
>
> Hiroshi Yamauchi:
> Does this rephrase help: with the motivating examples (like and-of-shifts or bit check patterns) from the above differentials in mind, can we come up with a single reassociation order that solves all of those and all the others that may come up in the future? Would we need different reassociation orders to fold different patterns?
>
> Daniel Berlin:
> It doesn't quite help. When stated that generally, there can be no such ordering at all; that's easy to prove. It is a statically undecidable problem.
>
> There are, however, a few related questions that maybe you are really asking:
>
> 1. Is there a way to determine and apply a maximal or nearly-maximal set of folds/graph transforms that could be applied to a given set of code, in a sane and principled way? -> yes
> (see, e.g., http://www.cs.cornell.edu/~ross/publications/eqsat/)
>
> 2. Is there a way to determine all expressions in the program, as it exists, that are equivalent or equivalent under constant-time constant folding/reassociation, in a reasonable time bound? -> yes
> (not a single easy link, happy to talk about it)
>
> Your original question is basically equivalent to: is there a way to determine all expressions in the program, as it exists, that are equivalent or could be made equivalent through any type of folding that one can think up? The answer to that is "no"; it's provable that this is not statically decidable, so the time bound doesn't matter :)
>
> You have to limit the possible folding/evaluation you apply in various ways to make this decidable, and then further limit it to make the time bound reasonable.
>
> This all quickly devolves into Herbrand equivalence and its variations.
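[Illustration: the "bit check patterns" mentioned above are roughly of the following shape; this is a made-up example, not the exact tests from D46336 or D45842. Two single-bit tests of %x are separated by an unrelated condition, so they only collapse into a single masked compare once the or-chain is reassociated to make them operands of the same or:]

  ; Illustrative only: the two single-bit tests of %x should fold into a
  ; single masked compare, (%x & 5) != 0, or'd with %other, but they are
  ; not adjacent as written.
  define i1 @bit_checks(i32 %x, i1 %other) {
    %b0 = and i32 %x, 1
    %c0 = icmp ne i32 %b0, 0
    %t  = or i1 %c0, %other
    %b2 = and i32 %x, 4
    %c2 = icmp ne i32 %b2, 0
    %r  = or i1 %t, %c2
    ret i1 %r
  }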
> Hiroshi Yamauchi:
> Let me try one more time :) Might we need multiple reassociate passes to fold different reassociative patterns?
>
> A longer version: if Sanjay wants a particular reassociative pattern to be folded (D45842), Omer wants another particular reassociative pattern to be folded (D41574), and I want yet another particular reassociative pattern to be folded (D46336), would we potentially need three different reassociate passes, each combined with instcombine, rather than just one that may be able to somehow handle those cases in one shot (assuming we don't want to put those in instcombine)?
>
> And it sounds like the answer is yes?
>
> Daniel Berlin:
> If you take the current instcombine as a base, then yes, that is correct.
> If you are willing to rearchitect instcombine, the answer is no; it's possible to do this all in a single pass in a relatively sane way.
>
> Hiroshi Yamauchi:
> I assume by rearchitect, you mean a major rewrite as per this comment: "Is there a way to determine all expressions in the program as it exists that are equivalent or equivalent under constant-time constant folding/reassociation, in a reasonable time bound -> yes". Any pointer, or time to chat?
>
> Daniel Berlin:
> I'm happy to do both.
>
> Hiroshi Yamauchi:
> I think that an approach like D46336 / D46595 has merit: it would add a bit of complexity, but would not require:
>
> 1. a major rewrite of instcombine,
> 2. writing multiple (potentially many) reassociate passes and figuring out how to fixpoint them with instcombine, or
> 3. writing a self-contained folding pass for a specific pattern.
>
> If you look at the diffs in the existing .ll files in D46336, it helps fold some previously-unfolded reassociation patterns beyond the bit check patterns that it originally targeted.

Sure, and it does so by adding another O(N) cost to evaluation in each case. Instcombine doesn't even do lazy reevaluation through tracking dependencies, so it'll do so a lot of times as well.

To me, that's not a good tradeoff, especially given how slow instcombine is *already*. The code it produces is "good enough" to stop for a while and do something else and not suffer horribly in performance.[1]

Let me ask a different question:

At what point would anyone here be willing to stop adding things to instcombine and start doing something else instead, instead of waiting for someone else to do it?
As far as i can tell, the answer is: "never", which makes most of these discussions just pointless rehashes as we slowly repeat the same disaster that became gcc's instruction combiner :)

If the answer is "something", great, i'll set a mail filter and ignore these threads until that something happens :)

Personally, in my experience people will never do more here unless pushed somewhat, or the thing becomes such a complete disaster no one wants to touch it.

[1] Last year i computed the "improvement in performance on applications" due to instcombine for a bunch of google apps and open source apps that had easy-to-use benchmarks (IE I isolated about two years of instcombine changes and made them to a current compiler piece by piece while measuring performance). I also computed the compile-time increase in single instcombine passes over the same time period.

This was on x86, but the numbers basically said we were gaining nearly nothing for high cost. IE our drive for better-looking output does not appear to translate into any real gains that i can find. Either improvements to other opts hid them, or they simply didn't matter on the processors i tested on.

Certainly, apps/workloads/architectures may vary here, and my goal is not to claim it's all worthless. My actual goal in all of this was to get a sense of whether my perspective on instcombine was still "reasonable", not to do a true scientific exploration :) I didn't have time/energy/etc to run it elsewhere, and again, my goal was not to give certainty/try to give exact percentages.

--Dan
On 05/11/2018 08:40 PM, Daniel Berlin via llvm-dev wrote:
> [...]
> At what point would anyone here be willing to stop adding things to instcombine and start doing something else instead, instead of waiting for someone else to do it?
> [...]
> Personally, in my experience people will never do more here unless pushed somewhat, or the thing becomes such a complete disaster no one wants to touch it.

I've said this before, but I think a major impediment to forward progress here is coming up with an agreement on what the "something else" should be. Some of us have talked for years about having some TableGen-driven replacement, or maybe we want something with a syntax more like what is used by the Alive tool, but regardless, in order to gain in efficiency I suspect we need a model that is more restrictive than more-or-less arbitrary C++ code, and so we should pick a model and figure out how things might work.
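[Illustration: Alive, mentioned above, describes InstCombine-style transforms declaratively as a source pattern, an optional precondition, and a target, and then verifies the rewrite. The rule below is made up for this thread (it does not ship with the tool, and the DSL details may differ slightly); it expresses the kind of bit-check fold discussed earlier:]

  Name: hypothetical two-bit-check fold
  %a1 = and %x, C1
  %c1 = icmp ne %a1, 0
  %a2 = and %x, C2
  %c2 = icmp ne %a2, 0
  %r  = or %c1, %c2
    =>
  %m = and %x, C1 | C2
  %r = icmp ne %m, 0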
> [1] Last year i computed the "improvement in performance on applications" due to instcombine for a bunch of google apps and open source apps that had easy-to-use benchmarks [...] This was on x86, but the numbers basically said we were gaining nearly nothing for high cost.

This also matches my experience, but I draw a somewhat different lesson. I often tell application developers that *this* is why they must file compiler bug reports. Waiting and assuming that someone else will hit the same problem, and file the report, is a bad strategy. I think that this is due to two things:

1. As far as things go, the tail of the distribution is often really long, and the probability that the particular thing hampering one piece of hot code is the same thing hampering another piece of hot code is often small.

2. We tend to add special cases instead of adding more-general algorithms. The more-general work is often hard because figuring out the cost modeling is often highly non-trivial. Also, when it's finally done, the chances that the old special cases are removed are also small (so we'll still accumulate cruft without specific effort).

 -Hal

--
Hal Finkel
Lead, Compiler Technology and Programming Languages
Leadership Computing Facility
Argonne National Laboratory
On Fri, May 11, 2018 at 7:20 PM Hal Finkel <hfinkel at anl.gov> wrote:
> [...]
> This also matches my experience, but I draw a somewhat different lesson. I often tell application developers that *this* is why they must file compiler bug reports. [...]
> 2. We tend to add special cases instead of adding more-general algorithms. [...]

I don't have better or larger study results or data, but for fixing some smaller yet important enough performance degradations (often in microbenchmarks, occasionally in larger settings), small compiler improvements (not necessarily in instcombine, but at about that level) do seem to matter, at least in individual contexts. It's not clear how much they actually mattered in the grander scheme of things, though.

It sounds like we don't want to add to instcombine because of the added cost/complexity, rearchitecting it would be hard, and it's not clear that incremental changes to instcombine did much relative to their cost, which would make it harder to justify rearchitecting it...

What do people think of approaches like D41574 and D45842?
> On May 11, 2018, at 7:20 PM, Hal Finkel via llvm-dev <llvm-dev at lists.llvm.org> wrote:
> [...]
> 2. We tend to add special cases instead of adding more-general algorithms. [...] Also, when it's finally done, the chances that the old special cases are removed are also small (so we'll still accumulate cruft without specific effort).

My experience is also exactly like this. We keep adding new patterns to instcombine, but nothing has been done yet to modularize or somehow partition it to reduce the costs we pay for these new patterns. Instcombine is scheduled to run 8 times in the current O2/Os pipelines and 9 times in O3. In most applications the new patterns will probably never even trigger, but we will check for them at least 8 times. That is more or less acceptable for O3, where we're supposed to try to squeeze out as many optimizations as possible, but for O2 and Os it doesn't sound reasonable to me.

The AggressiveInstCombine pass was added recently, and some of the patterns could probably go there to slightly reduce the costs, but a bigger rethink might be needed in this area anyway. I'd be glad to help with this if we agree on a general direction we want to go in.

Michael
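[Illustration: the run counts quoted above can be checked against a particular LLVM build. A rough sketch, assuming the legacy pass manager's -debug-pass=Structure flag and instcombine's "Combine redundant instructions" display name; the exact figures vary by version and pipeline:]

  # Print the -O2 pipeline and count how many times instcombine is scheduled.
  opt -O2 -disable-output -debug-pass=Structure input.ll 2>&1 \
    | grep -c "Combine redundant instructions"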