thr3ads.net - llvm dev - [LLVMdev] Register pressure mechanism in PRE or Smarter rematerialization/split/spiller/coalescing ? [Jul 2015]

Lawrence

2015-Jul-15 18:51 UTC

[LLVMdev] Register pressure mechanism in PRE or Smarter rematerialization/split/spiller/coalescing ?

Hi, Daniel:

Thanks a lot for the detailed background information. We are willing to provide
the right fix, but it will take time. Would you mind forwarding me the
discussion you had 5 months ago?  I may not be able to access it, since I only
joined the llvmdev list this week.

I did some performance measurements last night. Some of our critical benchmarks
degraded by up to 30% with your patch, so we have to turn it off, at least for now.

I posted a patch to add a debug option (off by default) so that we can turn it off
with that option. Could you please review it and commit it for me if possible?
I don't have commit rights yet; I will ask for them soon.
http://reviews.llvm.org/D11234

Thanks again.

Lawrence Hu

-----Original Message-----
From: Daniel Berlin [mailto:dberlin at dberlin.org] 
Sent: Wednesday, July 15, 2015 7:48 AM
To: Lawrence
Cc: LLVM Developers Mailing List
Subject: Re: Register pressure mechanism in PRE or Smarter
rematerialization/split/spiller/coalescing ?

On Tue, Jul 14, 2015 at 11:43 PM, Lawrence <lawrence at codeaurora.org> wrote:
> I thought about it a little bit more, and I think adding register pressure
> control in your patch or in PRE may be the only choice.
>
> Because, at least for the case I am looking at, what your patch did was
> create more relatively complex, long live ranges. Rematerialization is not
> smart enough to undo your change, at least not without a lot of work.
> Coalescing only creates even longer live ranges, not shorter ones. The
> spiller can't help, since it is the spiller that created the spills/reloads
> due to high register pressure. Splitting can shorten the live ranges, but I
> don't think it can handle your case without a lot of work.
>
1. As I mentioned, it simply fixes a bug in the implementation of one of the two
PREs LLVM has.  It does not change the PRE algorithm or add
anything to it.  The code had a bug. I fixed the bug :P.  PRE is
*not even adding more code in this case*.  The code is already there.
All it is doing is inserting a phi node.  If you transformed your code to use
memory and reverted my patch, you'd get the same result, because Load PRE
will do the same thing. It's what PRE does.

2. GCC and other compilers have PREs that do literally the same thing my patch
does (you are welcome to verify, I wrote GCC's :P), and apparently are smart
enough to handle this in RA.  So I'm going to suggest that it is, in fact,
possible to do so, and I'm going to further suggest that if we want to match
their performance, we need to be able to do the same.  You can't simply
"turn down" any optimization that RA may have to deal with.  It
usually doesn't work in practice.
This is one of the reasons good RA is so hard.

3. As I also mentioned, register pressure heuristics in PRE simply do not work.
They've been tried.  By many.  With little to no success.

PRE is too high in the stack of optimizations to estimate register
pressure in any sane fashion.  It's pretty much a fool's errand.  You
can never tune it to do what you want.  *Many* have tried.

Your base-level problem here is that all modern PRE algorithms (except for
min-cut PRE, as I mentioned) are based on a notion of lifetime optimality. That
is, they extend lifetimes as little as possible while still eliminating a given
redundancy. Ours does the same.

However, this is not an entirely useful metric.  Optimizing for some other
metric is what something like min-cut PRE lets you do.
But even then,  register pressure heuristics are almost certainly not the
answer.

4. This was actually already discussed when the patch was submitted, and the
consensus was "we should just fix RA".  Feel free to look at the
discussion 5 months ago.

I would suggest, if you want to fix this, you take the approach that was
discussed then.
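
To make the live-range concern in this exchange concrete, here is a minimal sketch, not taken from LLVM, that counts how many values are live at the same time. The `LiveRange` type, the `maxPressure` function, and the slot numbers are all hypothetical, and real register allocators model liveness far more precisely than this.

```cpp
#include <algorithm>
#include <cassert>
#include <utility>
#include <vector>

// Hypothetical model: a value is live over instruction slots [Start, End).
struct LiveRange {
  int Start;
  int End;
};

// Returns the largest number of ranges that are live at the same point.
int maxPressure(const std::vector<LiveRange> &Ranges) {
  std::vector<std::pair<int, int>> Events;  // (slot, +1 on start / -1 on end)
  for (const LiveRange &R : Ranges) {
    assert(R.Start <= R.End && "malformed live range");
    Events.push_back({R.Start, +1});
    Events.push_back({R.End, -1});
  }
  // Pairs sort by slot first, then by delta, so at equal slots an end (-1)
  // is processed before a start (+1): touching ranges do not overlap.
  std::sort(Events.begin(), Events.end());
  int Live = 0;
  int Max = 0;
  for (const auto &E : Events) {
    Live += E.second;
    Max = std::max(Max, Live);
  }
  return Max;
}
```

With two other values live over [2,8) and [3,7), a value that is recomputed and therefore live only over [0,1) and [9,10) gives a peak of 2. Keeping a single copy of that value live over [0,10) instead gives a peak of 3, which is the kind of increase a register allocator then has to absorb.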

Daniel Berlin

2015-Jul-15 20:10 UTC


IMHO, it doesn't make a lot of sense to turn off this part on its
own.
I would just use the enable-pre flag to turn off scalar PRE, as it
will cause the same issue in other cases as well.
Is there some reason you aren't just doing that?
I suspect that if this is a performance win, that would be as well.

Also note that you will have the same problem as GVN/EarlyCSE/etc.
become smarter, as these are full redundancies being eliminated (i.e.,
there is no insertion happening). It just happens that PRE notices
them and GVN doesn't, because GVN is dominator based and PRE is not.
A slightly smarter GVN/EarlyCSE would do the same thing.
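
As a rough source-level illustration of a full redundancy that no single computation dominates, consider the sketch below. The function names are hypothetical, and the second function only approximates what a phi node at the join would express in IR; it is not output from any LLVM pass.

```cpp
// Before: A + B is computed in both arms and again after the join.
// Neither arm dominates the join, so a purely dominator-based lookup
// finds no earlier computation to reuse there.
constexpr int joinRecomputes(bool Cond, int A, int B) {
  int X = 0;
  if (Cond)
    X = (A + B) * 2;
  else
    X = (A + B) * 3;
  return X + (A + B);  // already available on every incoming path
}

// After: each arm keeps its sum and the join reuses whichever one was
// computed. Sum stands in for the phi node; no new addition is inserted.
constexpr int joinReuses(bool Cond, int A, int B) {
  int Sum = 0;
  int X = 0;
  if (Cond) {
    Sum = A + B;
    X = Sum * 2;
  } else {
    Sum = A + B;
    X = Sum * 3;
  }
  return X + Sum;
}
```

Both functions return the same result, but in the second one Sum has to stay live from the arm to the join, so the saved addition is paid for with a longer live range.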


Given what you are saying, you are also suggesting we are not
rematerializing addressing computations where it is cheaper to do so.
That seems pretty critical to good RA :P




Daniel Berlin

2015-Jul-15 20:36 UTC


On Wed, Jul 15, 2015 at 1:10 PM, Daniel Berlin <dberlin at dberlin.org> wrote:
> IMHO, it doesn't make a lot of sense to turn off this part on its own.
> I would just use the enable-pre flag to turn off scalar PRE, as it
> will cause the same issue in other cases as well.
> Is there some reason you aren't just doing that?
> I suspect that if this is a performance win, that would be as well.
>
Ugh, actually, it should be a win with the following change:


diff --git a/lib/Transforms/Scalar/GVN.cpp b/lib/Transforms/Scalar/GVN.cpp
index 2c47a8a..a3387e3 100644
--- a/lib/Transforms/Scalar/GVN.cpp
+++ b/lib/Transforms/Scalar/GVN.cpp
@@ -1767,7 +1767,7 @@ bool GVN::processNonLocalLoad(LoadInst *LI) {
   }

   // Step 4: Eliminate partial redundancy.
-  if (!EnablePRE || !EnableLoadPRE)
+  if (!EnableLoadPRE)
     return false;

   return PerformLoadPRE(LI, ValuesPerBlock, UnavailableBlocks);




With this change, turning off the enable-pre flag will disable scalar PRE
without disabling load PRE.


(note, again, however, that load PRE can create exactly the same GEP
situation you are referring to)
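
The effect of the change on the two flags can be sketched as a small truth table. The `PREFlags` struct and both function names are hypothetical; only the two conditions are taken from the diff, and the rest of `processNonLocalLoad` is not modeled.

```cpp
// Hypothetical stand-ins for the two flags tested in the diff.
struct PREFlags {
  bool EnablePRE;      // the enable-pre flag
  bool EnableLoadPRE;  // the separate load PRE flag
};

// Old condition: the function returns early if either flag is off,
// so load PRE runs only when both are on.
constexpr bool loadPRERunsBefore(PREFlags F) {
  return !(!F.EnablePRE || !F.EnableLoadPRE);
}

// New condition: only EnableLoadPRE is consulted.
constexpr bool loadPRERunsAfter(PREFlags F) {
  return !(!F.EnableLoadPRE);
}
```

The only combination whose outcome changes is EnablePRE off with EnableLoadPRE on: before the change load PRE is skipped, after it load PRE still runs.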

James Molloy

2015-Jul-15 20:47 UTC


> Given what you are saying, you are also suggesting we are not
> rematerializing addressing computations where it is cheaper to do so.
> That seems pretty critical to good RA :P

Yep, about 5 months ago I had a conversation about this too... it may even
be the one you're referencing. Our remat is really conservative - it only
rematerializes values that have zero input operands (move immediate only,
essentially).

James
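
The conservative rule described in this message can be sketched with a toy model. The `ToyInst` type, the function name, and the two sample instructions are hypothetical and are not LLVM's data structures; reading "zero input operands" as "reads no registers" is also an assumption.

```cpp
// Toy instruction model: only the number of register inputs is tracked.
struct ToyInst {
  int NumRegisterInputs;
};

// Conservative rule: recompute a value at its use only when the defining
// instruction reads no registers, so nothing else must be kept live for it.
constexpr bool isConservativelyRematerializable(ToyInst I) {
  return I.NumRegisterInputs == 0;
}

constexpr ToyInst MoveImmediate{0};  // e.g. loading a constant
constexpr ToyInst AddressAdd{2};     // e.g. base + index address computation
```

Under this rule the move-immediate qualifies and the address computation does not, even in cases where recomputing the address would be cheaper than a spill and reload.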

