thr3ads.net - llvm dev - [LLVMdev] Register pressure mechanism in PRE or Smarter rematerialization/split/spiller/coalescing ? [Jul 2015]

If this information is useful, please help other people find it:
Share via:

James Molloy

2015-Jul-15 20:47 UTC

[LLVMdev] Register pressure mechanism in PRE or Smarter rematerialization/split/spiller/coalescing ?

> Given what you are saying, you are also suggesting we are notrematerializing addressing computations where it is cheaper to do so.
That seems pretty critical to good RA :P

Yep, about 5 months ago I had a conversation about this too... it may even
be the one you're referencing. Our remat is really conservative - it only
rematerializes values that have zero input operands (move immediate only,
essentially).

James

On Wed, 15 Jul 2015 at 21:28 Daniel Berlin <dberlin at dberlin.org> wrote:
> IMHO, This doesn't make a lot of sense to turn off this part on
it's own.
> I would just use the enable-pre flag to turn off scalar PRE, as it
> will cause the same issue in other cases as well.
> Is there some reason you aren't just doing that?
> I suspect if this is a performance win, that would be as well.
>
> Also note that you will have the same problem as GVN/EarlyCSE/etc
> becomes smarter, as these are full redundancies being eliminated (IE
> there is no insertion happening). It just happens that PRE notices
> them and GVN doesn't, because GVN is dominator based and PRE is not.
> A slightly smarter GVN/EarlyCSE would do the same thing.
>
>
> Given what you are saying, you are also suggesting we are not
> rematerializing addressing computations where it is cheaper to do so.
> That seems pretty critical to good RA :P
>
>
>
> On Wed, Jul 15, 2015 at 11:51 AM, Lawrence <lawrence at
codeaurora.org>
> wrote:
> > Hi, Daniel:
> >
> > Thanks a lot for detailed background information, we are willing to
> provide the right fix, however it will take time, do you mind if you
> forward me the discussion you had 5 months ago?  I may not be able to
> access it since I only joined ellvmdev list this week.
> >
> > I did some performance measurement last night, some of our critical
> benchmark degraded up to 30% with your patch, so we have to turn it off for
> now at least.
> >
> > I posted patch to add a debug option (off by default), so we could
turn
> it off with that option, could you please review it and commit it for me if
> possible?  I don't have commit right yet, will ask soon.
> > http://reviews.llvm.org/D11234
> >
> > Thanks again.
> >
> > Lawrence Hu
> >
> >
> > -----Original Message-----
> > From: Daniel Berlin [mailto:dberlin at dberlin.org]
> > Sent: Wednesday, July 15, 2015 7:48 AM
> > To: Lawrence
> > Cc: LLVM Developers Mailing List
> > Subject: Re: Register pressure mechanism in PRE or Smarter
> rematerialization/split/spiller/coalescing ?
> >
> > On Tue, Jul 14, 2015 at 11:43 PM, Lawrence <lawrence at
codeaurora.org>
> wrote:
> >> I thought about a little bit more, I think adding Register
pressure
> control in your patch or PRE may be the only choice.
> >>
> >> Because at least for this case I am looking at,  what your patch
did is
> created more relatively complex long live range, rematerialization is not
> smart enough to undo your change or at least without a lot of work,
> coalescing only create even longer live range not shorter, Spiller
can't
> help since it's the Spiller created Spill/Reloads due to high register
> pressure, Splitting can shorten the live ranges, but I don't think it
can
> handle your case without a lot of work.
> >>
> >
> > 1. As I mentioned, it simply fixes a bug in implementation of one of
the
> two PRE's LLVM has.  It does not  change the PRE algorithm or add
> > anything to it.  The code had a bug. I fixed the bug :P.    PRE is
> > *not even adding more code in this case*.   The code is already there.
> >   All it is doing is inserting a phi node.  If you transformed your
code
> to use memory, and reverted my patch, you'd get the same result,
because
> Load PRE will do the same thing. It's what PRE does.
> >
> > 2. GCC and other compilers have PRE's literally the same thing my
patch
> does (you are welcome to verify, i wrote GCC's :P), and apparently are
> smart enough to handle this in RA.  So i'm going to suggest that it is,
in
> fact, possible to do so, and i'm going to further suggest that if we
want
> to match their performance, we need to be able to do the same.  You
can't
> simply "turn down" any optimization that RA may have to deal
with.  It
> usually doesn't work in practice.
> > This is one of the reasons good RA is so hard.
> >
> > 3. As I also mentioned, register pressure heuristics in PRE simply do
> not work.  They've been tried.  By many.  With little to no success.
> >
> > PRE is too high in the stack of optimizations to estimate register
> > pressure in any sane fashion.   It's pretty much a fools errand. 
You
> > can never tune it to do what you want.  *Many* have tried.
> >
> > Your base level problem here is that all modern PRE algorithms (except
> for min-cut PRE, as I mentioned), are based on a notion of lifetime
> optimality. That is, they extend lifetimes as minimally as possible to
> still eliminate a given redundancy. Ours does the same.
> >
> > However, this is not an entirely useful metric.  Optimizing for some
> other metric is what something like min-cut PRE lets you do.
> > But even then,  register pressure heuristics are almost certainly not
> the answer.
> >
> > 4. This was actually already discussed when the patch was submitted,
and
> the consensus was "we should just fix RA".  Feel free to look at
the
> discussion 5 months ago.
> >
> > I would suggest, if you want to fix this, you take the approach that
was
> discussed then.
> >
>
> _______________________________________________
> LLVM Developers mailing list
> LLVMdev at cs.uiuc.edu         http://llvm.cs.uiuc.edu
> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
>-------------- next part --------------
An HTML attachment was scrubbed...
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20150715/f3c5706b/attachment.html>

Philip Reames

2015-Jul-15 21:43 UTC

head link

[LLVMdev] Register pressure mechanism in PRE or Smarter rematerialization/split/spiller/coalescing ?

On 07/15/2015 01:47 PM, James Molloy wrote:> > Given what you are saying, you are also suggesting we are not
> rematerializing addressing computations where it is cheaper to do so.
> That seems pretty critical to good RA :P
>
> Yep, about 5 months ago I had a conversation about this too... it may 
> even be the one you're referencing. Our remat is really conservative - 
> it only rematerializes values that have zero input operands (move 
> immediate only, essentially).I thought we could remat explicitly invariant loads as
well?>
> James
>
> On Wed, 15 Jul 2015 at 21:28 Daniel Berlin <dberlin at dberlin.org 
> <mailto:dberlin at dberlin.org>> wrote:
>
>     IMHO, This doesn't make a lot of sense to turn off this part on
>     it's own.
>     I would just use the enable-pre flag to turn off scalar PRE, as it
>     will cause the same issue in other cases as well.
>     Is there some reason you aren't just doing that?
>     I suspect if this is a performance win, that would be as well.
>
>     Also note that you will have the same problem as GVN/EarlyCSE/etc
>     becomes smarter, as these are full redundancies being eliminated (IE
>     there is no insertion happening). It just happens that PRE notices
>     them and GVN doesn't, because GVN is dominator based and PRE is
not.
>     A slightly smarter GVN/EarlyCSE would do the same thing.
>
>
>     Given what you are saying, you are also suggesting we are not
>     rematerializing addressing computations where it is cheaper to do so.
>     That seems pretty critical to good RA :P
>
>
>
>     On Wed, Jul 15, 2015 at 11:51 AM, Lawrence
>     <lawrence at codeaurora.org <mailto:lawrence at
codeaurora.org>> wrote:
>     > Hi, Daniel:
>     >
>     > Thanks a lot for detailed background information, we are willing
>     to provide the right fix, however it will take time, do you mind
>     if you forward me the discussion you had 5 months ago?  I may not
>     be able to access it since I only joined ellvmdev list this week.
>     >
>     > I did some performance measurement last night, some of our
>     critical benchmark degraded up to 30% with your patch, so we have
>     to turn it off for now at least.
>     >
>     > I posted patch to add a debug option (off by default), so we
>     could turn it off with that option, could you please review it and
>     commit it for me if possible?  I don't have commit right yet, will
>     ask soon.
>     > http://reviews.llvm.org/D11234
>    
<https://urldefense.proofpoint.com/v2/url?u=http-3A__reviews.llvm.org_D11234&d=AwMFaQ&c=8hUWFZcy2Z-Za5rBPlktOQ&r=Mfk2qtn1LTDThVkh6-oGglNfMADXfJdty4_bhmuhMHA&m=zIdYS4bXbG63-LsNf28NXGKkrEdKPhY_mdzZrzA1eQE&s=6zQDYowITv9KrC3Glp-DVdhOmWxIQRFN5V8VIAdvnJQ&e=>
>     >
>     > Thanks again.
>     >
>     > Lawrence Hu
>     >
>     >
>     > -----Original Message-----
>     > From: Daniel Berlin [mailto:dberlin at dberlin.org
>     <mailto:dberlin at dberlin.org>]
>     > Sent: Wednesday, July 15, 2015 7:48 AM
>     > To: Lawrence
>     > Cc: LLVM Developers Mailing List
>     > Subject: Re: Register pressure mechanism in PRE or Smarter
>     rematerialization/split/spiller/coalescing ?
>     >
>     > On Tue, Jul 14, 2015 at 11:43 PM, Lawrence
>     <lawrence at codeaurora.org <mailto:lawrence at
codeaurora.org>> wrote:
>     >> I thought about a little bit more, I think adding Register
>     pressure control in your patch or PRE may be the only choice.
>     >>
>     >> Because at least for this case I am looking at,  what your
>     patch did is created more relatively complex long live range,
>     rematerialization is not smart enough to undo your change or at
>     least without a lot of work, coalescing only create even longer
>     live range not shorter, Spiller can't help since it's the
Spiller
>     created Spill/Reloads due to high register pressure, Splitting can
>     shorten the live ranges, but I don't think it can handle your case
>     without a lot of work.
>     >>
>     >
>     > 1. As I mentioned, it simply fixes a bug in implementation of
>     one of the two PRE's LLVM has.  It does not change the PRE
>     algorithm or add
>     > anything to it.  The code had a bug. I fixed the bug :P.   PRE is
>     > *not even adding more code in this case*.   The code is already
>     there.
>     >   All it is doing is inserting a phi node.  If you transformed
>     your code to use memory, and reverted my patch, you'd get the same
>     result, because Load PRE will do the same thing. It's what PRE
does.
>     >
>     > 2. GCC and other compilers have PRE's literally the same thing
>     my patch does (you are welcome to verify, i wrote GCC's :P), and
>     apparently are smart enough to handle this in RA.  So i'm going to
>     suggest that it is, in fact, possible to do so, and i'm going to
>     further suggest that if we want to match their performance, we
>     need to be able to do the same.  You can't simply "turn
down" any
>     optimization that RA may have to deal with.  It usually doesn't
>     work in practice.
>     > This is one of the reasons good RA is so hard.
>     >
>     > 3. As I also mentioned, register pressure heuristics in PRE
>     simply do not work.  They've been tried.  By many.  With little to
>     no success.
>     >
>     > PRE is too high in the stack of optimizations to estimate register
>     > pressure in any sane fashion.   It's pretty much a fools
>     errand.  You
>     > can never tune it to do what you want.  *Many* have tried.
>     >
>     > Your base level problem here is that all modern PRE algorithms
>     (except for min-cut PRE, as I mentioned), are based on a notion of
>     lifetime optimality. That is, they extend lifetimes as minimally
>     as possible to still eliminate a given redundancy. Ours does the same.
>     >
>     > However, this is not an entirely useful metric. Optimizing for
>     some other metric is what something like min-cut PRE lets you do.
>     > But even then,  register pressure heuristics are almost
>     certainly not the answer.
>     >
>     > 4. This was actually already discussed when the patch was
>     submitted, and the consensus was "we should just fix RA".
Feel
>     free to look at the discussion 5 months ago.
>     >
>     > I would suggest, if you want to fix this, you take the approach
>     that was discussed then.
>     >
>
>     _______________________________________________
>     LLVM Developers mailing list
>     LLVMdev at cs.uiuc.edu <mailto:LLVMdev at cs.uiuc.edu>
>     http://llvm.cs.uiuc.edu
>     http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
>
>
>
> _______________________________________________
> LLVM Developers mailing list
> LLVMdev at cs.uiuc.edu         http://llvm.cs.uiuc.edu
> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
-------------- next part --------------
An HTML attachment was scrubbed...
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20150715/58477794/attachment.html>

Hal Finkel

2015-Jul-16 04:44 UTC

head link

[LLVMdev] Register pressure mechanism in PRE or Smarter rematerialization/split/spiller/coalescing ?

----- Original Message -----
> From: "Philip Reames" <listmail at philipreames.com>
> To: "James Molloy" <james at jamesmolloy.co.uk>,
"Daniel Berlin"
> <dberlin at dberlin.org>, "Lawrence" <lawrence at
codeaurora.org>
> Cc: "LLVM Developers Mailing List" <llvmdev at cs.uiuc.edu>
> Sent: Wednesday, July 15, 2015 4:43:51 PM
> Subject: Re: [LLVMdev] Register pressure mechanism in PRE or Smarter
> rematerialization/split/spiller/coalescing ?
> On 07/15/2015 01:47 PM, James Molloy wrote:
> > > Given what you are saying, you are also suggesting we are not
> 
> > rematerializing addressing computations where it is cheaper to do
> > so.
> 
> > That seems pretty critical to good RA :P
> 
> > Yep, about 5 months ago I had a conversation about this too... it
> > may
> > even be the one you're referencing. Our remat is really
> > conservative
> > - it only rematerializes values that have zero input operands (move
> > immediate only, essentially).
> 
> I thought we could remat explicitly invariant loads as well?
We can. See TargetInstrInfo::isReallyTriviallyReMaterializableGeneric in
lib/CodeGen/TargetInstrInfo.cpp -- What we can't do, for example, it
rematerialize small sequences of instructions, such as those that form constants
out of smaller immediates.

-Hal 
> > James
> 
> > On Wed, 15 Jul 2015 at 21:28 Daniel Berlin < dberlin at dberlin.org
>
> > wrote:
> 
> > > IMHO, This doesn't make a lot of sense to turn off this part
on
> > > it's
> > > own.
> > 
> 
> > > I would just use the enable-pre flag to turn off scalar PRE, as
> > > it
> > 
> 
> > > will cause the same issue in other cases as well.
> > 
> 
> > > Is there some reason you aren't just doing that?
> > 
> 
> > > I suspect if this is a performance win, that would be as well.
> > 
> 
> > > Also note that you will have the same problem as GVN/EarlyCSE/etc
> > 
> 
> > > becomes smarter, as these are full redundancies being eliminated
> > > (IE
> > 
> 
> > > there is no insertion happening). It just happens that PRE
> > > notices
> > 
> 
> > > them and GVN doesn't, because GVN is dominator based and PRE
is
> > > not.
> > 
> 
> > > A slightly smarter GVN/EarlyCSE would do the same thing.
> > 
> 
> > > Given what you are saying, you are also suggesting we are not
> > 
> 
> > > rematerializing addressing computations where it is cheaper to do
> > > so.
> > 
> 
> > > That seems pretty critical to good RA :P
> > 
> 
> > > On Wed, Jul 15, 2015 at 11:51 AM, Lawrence <
> > > lawrence at codeaurora.org
> > > > wrote:
> > 
> 
> > > > Hi, Daniel:
> > 
> 
> > > >
> > 
> 
> > > > Thanks a lot for detailed background information, we are
> > > > willing
> > > > to
> > > > provide the right fix, however it will take time, do you
mind
> > > > if
> > > > you forward me the discussion you had 5 months ago? I may
not
> > > > be
> > > > able to access it since I only joined ellvmdev list this
week.
> > 
> 
> > > >
> > 
> 
> > > > I did some performance measurement last night, some of our
> > > > critical
> > > > benchmark degraded up to 30% with your patch, so we have to
> > > > turn
> > > > it off for now at least.
> > 
> 
> > > >
> > 
> 
> > > > I posted patch to add a debug option (off by default), so we
> > > > could
> > > > turn it off with that option, could you please review it and
> > > > commit it for me if possible? I don't have commit right
yet,
> > > > will
> > > > ask soon.
> > 
> 
> > > > http://reviews.llvm.org/D11234
> > 
> 
> > > >
> > 
> 
> > > > Thanks again.
> > 
> 
> > > >
> > 
> 
> > > > Lawrence Hu
> > 
> 
> > > >
> > 
> 
> > > >
> > 
> 
> > > > -----Original Message-----
> > 
> 
> > > > From: Daniel Berlin [mailto: dberlin at dberlin.org ]
> > 
> 
> > > > Sent: Wednesday, July 15, 2015 7:48 AM
> > 
> 
> > > > To: Lawrence
> > 
> 
> > > > Cc: LLVM Developers Mailing List
> > 
> 
> > > > Subject: Re: Register pressure mechanism in PRE or Smarter
> > > > rematerialization/split/spiller/coalescing ?
> > 
> 
> > > >
> > 
> 
> > > > On Tue, Jul 14, 2015 at 11:43 PM, Lawrence <
> > > > lawrence at codeaurora.org > wrote:
> > 
> 
> > > >> I thought about a little bit more, I think adding
Register
> > > >> pressure control in your patch or PRE may be the only
choice.
> > 
> 
> > > >>
> > 
> 
> > > >> Because at least for this case I am looking at, what
your
> > > >> patch
> > > >> did is created more relatively complex long live range,
> > > >> rematerialization is not smart enough to undo your
change or
> > > >> at
> > > >> least without a lot of work, coalescing only create even
> > > >> longer
> > > >> live range not shorter, Spiller can't help since
it's the
> > > >> Spiller
> > > >> created Spill/Reloads due to high register pressure,
Splitting
> > > >> can shorten the live ranges, but I don't think it
can handle
> > > >> your
> > > >> case without a lot of work.
> > 
> 
> > > >>
> > 
> 
> > > >
> > 
> 
> > > > 1. As I mentioned, it simply fixes a bug in implementation
of
> > > > one
> > > > of the two PRE's LLVM has. It does not change the PRE
algorithm
> > > > or
> > > > add
> > 
> 
> > > > anything to it. The code had a bug. I fixed the bug :P. PRE
is
> > 
> 
> > > > *not even adding more code in this case*. The code is
already
> > > > there.
> > 
> 
> > > > All it is doing is inserting a phi node. If you transformed
> > > > your
> > > > code to use memory, and reverted my patch, you'd get the
same
> > > > result, because Load PRE will do the same thing. It's
what PRE
> > > > does.
> > 
> 
> > > >
> > 
> 
> > > > 2. GCC and other compilers have PRE's literally the same
thing
> > > > my
> > > > patch does (you are welcome to verify, i wrote GCC's
:P), and
> > > > apparently are smart enough to handle this in RA. So i'm
going
> > > > to
> > > > suggest that it is, in fact, possible to do so, and i'm
going
> > > > to
> > > > further suggest that if we want to match their performance,
we
> > > > need to be able to do the same. You can't simply
"turn down"
> > > > any
> > > > optimization that RA may have to deal with. It usually
doesn't
> > > > work in practice.
> > 
> 
> > > > This is one of the reasons good RA is so hard.
> > 
> 
> > > >
> > 
> 
> > > > 3. As I also mentioned, register pressure heuristics in PRE
> > > > simply
> > > > do not work. They've been tried. By many. With little to
no
> > > > success.
> > 
> 
> > > >
> > 
> 
> > > > PRE is too high in the stack of optimizations to estimate
> > > > register
> > 
> 
> > > > pressure in any sane fashion. It's pretty much a fools
errand.
> > > > You
> > 
> 
> > > > can never tune it to do what you want. *Many* have tried.
> > 
> 
> > > >
> > 
> 
> > > > Your base level problem here is that all modern PRE
algorithms
> > > > (except for min-cut PRE, as I mentioned), are based on a
notion
> > > > of
> > > > lifetime optimality. That is, they extend lifetimes as
> > > > minimally
> > > > as possible to still eliminate a given redundancy. Ours does
> > > > the
> > > > same.
> > 
> 
> > > >
> > 
> 
> > > > However, this is not an entirely useful metric. Optimizing
for
> > > > some
> > > > other metric is what something like min-cut PRE lets you do.
> > 
> 
> > > > But even then, register pressure heuristics are almost
> > > > certainly
> > > > not the answer.
> > 
> 
> > > >
> > 
> 
> > > > 4. This was actually already discussed when the patch was
> > > > submitted, and the consensus was "we should just fix
RA". Feel
> > > > free to look at the discussion 5 months ago.
> > 
> 
> > > >
> > 
> 
> > > > I would suggest, if you want to fix this, you take the
approach
> > > > that was discussed then.
> > 
> 
> > > >
> > 
> 
> > > _______________________________________________
> > 
> 
> > > LLVM Developers mailing list
> > 
> 
> > > LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu
> > 
> 
> > > http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
> > 
> 
> > _______________________________________________
> 
> > LLVM Developers mailing list
> 
> > LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu
> 
> > http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
> 
> _______________________________________________
> LLVM Developers mailing list
> LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu
> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
-- 

Hal Finkel 
Assistant Computational Scientist 
Leadership Computing Facility 
Argonne National Laboratory 
-------------- next part --------------
An HTML attachment was scrubbed...
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20150715/c8674a98/attachment.html>

llvm dev - Jul 2015 - [LLVMdev] Register pressure mechanism in PRE or Smarter rematerialization/split/spiller/coalescing ?

[LLVMdev] Register pressure mechanism in PRE or Smarter rematerialization/split/spiller/coalescing ?

[LLVMdev] Register pressure mechanism in PRE or Smarter rematerialization/split/spiller/coalescing ?

[LLVMdev] Register pressure mechanism in PRE or Smarter rematerialization/split/spiller/coalescing ?