James Molloy
2015-Jul-15 20:47 UTC
[LLVMdev] Register pressure mechanism in PRE or Smarter rematerialization/split/spiller/coalescing ?
> Given what you are saying, you are also suggesting we are notrematerializing addressing computations where it is cheaper to do so. That seems pretty critical to good RA :P Yep, about 5 months ago I had a conversation about this too... it may even be the one you're referencing. Our remat is really conservative - it only rematerializes values that have zero input operands (move immediate only, essentially). James On Wed, 15 Jul 2015 at 21:28 Daniel Berlin <dberlin at dberlin.org> wrote:> IMHO, This doesn't make a lot of sense to turn off this part on it's own. > I would just use the enable-pre flag to turn off scalar PRE, as it > will cause the same issue in other cases as well. > Is there some reason you aren't just doing that? > I suspect if this is a performance win, that would be as well. > > Also note that you will have the same problem as GVN/EarlyCSE/etc > becomes smarter, as these are full redundancies being eliminated (IE > there is no insertion happening). It just happens that PRE notices > them and GVN doesn't, because GVN is dominator based and PRE is not. > A slightly smarter GVN/EarlyCSE would do the same thing. > > > Given what you are saying, you are also suggesting we are not > rematerializing addressing computations where it is cheaper to do so. > That seems pretty critical to good RA :P > > > > On Wed, Jul 15, 2015 at 11:51 AM, Lawrence <lawrence at codeaurora.org> > wrote: > > Hi, Daniel: > > > > Thanks a lot for detailed background information, we are willing to > provide the right fix, however it will take time, do you mind if you > forward me the discussion you had 5 months ago? I may not be able to > access it since I only joined ellvmdev list this week. > > > > I did some performance measurement last night, some of our critical > benchmark degraded up to 30% with your patch, so we have to turn it off for > now at least. > > > > I posted patch to add a debug option (off by default), so we could turn > it off with that option, could you please review it and commit it for me if > possible? I don't have commit right yet, will ask soon. > > http://reviews.llvm.org/D11234 > > > > Thanks again. > > > > Lawrence Hu > > > > > > -----Original Message----- > > From: Daniel Berlin [mailto:dberlin at dberlin.org] > > Sent: Wednesday, July 15, 2015 7:48 AM > > To: Lawrence > > Cc: LLVM Developers Mailing List > > Subject: Re: Register pressure mechanism in PRE or Smarter > rematerialization/split/spiller/coalescing ? > > > > On Tue, Jul 14, 2015 at 11:43 PM, Lawrence <lawrence at codeaurora.org> > wrote: > >> I thought about a little bit more, I think adding Register pressure > control in your patch or PRE may be the only choice. > >> > >> Because at least for this case I am looking at, what your patch did is > created more relatively complex long live range, rematerialization is not > smart enough to undo your change or at least without a lot of work, > coalescing only create even longer live range not shorter, Spiller can't > help since it's the Spiller created Spill/Reloads due to high register > pressure, Splitting can shorten the live ranges, but I don't think it can > handle your case without a lot of work. > >> > > > > 1. As I mentioned, it simply fixes a bug in implementation of one of the > two PRE's LLVM has. It does not change the PRE algorithm or add > > anything to it. The code had a bug. I fixed the bug :P. PRE is > > *not even adding more code in this case*. The code is already there. > > All it is doing is inserting a phi node. If you transformed your code > to use memory, and reverted my patch, you'd get the same result, because > Load PRE will do the same thing. It's what PRE does. > > > > 2. GCC and other compilers have PRE's literally the same thing my patch > does (you are welcome to verify, i wrote GCC's :P), and apparently are > smart enough to handle this in RA. So i'm going to suggest that it is, in > fact, possible to do so, and i'm going to further suggest that if we want > to match their performance, we need to be able to do the same. You can't > simply "turn down" any optimization that RA may have to deal with. It > usually doesn't work in practice. > > This is one of the reasons good RA is so hard. > > > > 3. As I also mentioned, register pressure heuristics in PRE simply do > not work. They've been tried. By many. With little to no success. > > > > PRE is too high in the stack of optimizations to estimate register > > pressure in any sane fashion. It's pretty much a fools errand. You > > can never tune it to do what you want. *Many* have tried. > > > > Your base level problem here is that all modern PRE algorithms (except > for min-cut PRE, as I mentioned), are based on a notion of lifetime > optimality. That is, they extend lifetimes as minimally as possible to > still eliminate a given redundancy. Ours does the same. > > > > However, this is not an entirely useful metric. Optimizing for some > other metric is what something like min-cut PRE lets you do. > > But even then, register pressure heuristics are almost certainly not > the answer. > > > > 4. This was actually already discussed when the patch was submitted, and > the consensus was "we should just fix RA". Feel free to look at the > discussion 5 months ago. > > > > I would suggest, if you want to fix this, you take the approach that was > discussed then. > > > > _______________________________________________ > LLVM Developers mailing list > LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu > http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev >-------------- next part -------------- An HTML attachment was scrubbed... URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20150715/f3c5706b/attachment.html>
Philip Reames
2015-Jul-15 21:43 UTC
[LLVMdev] Register pressure mechanism in PRE or Smarter rematerialization/split/spiller/coalescing ?
On 07/15/2015 01:47 PM, James Molloy wrote:> > Given what you are saying, you are also suggesting we are not > rematerializing addressing computations where it is cheaper to do so. > That seems pretty critical to good RA :P > > Yep, about 5 months ago I had a conversation about this too... it may > even be the one you're referencing. Our remat is really conservative - > it only rematerializes values that have zero input operands (move > immediate only, essentially).I thought we could remat explicitly invariant loads as well?> > James > > On Wed, 15 Jul 2015 at 21:28 Daniel Berlin <dberlin at dberlin.org > <mailto:dberlin at dberlin.org>> wrote: > > IMHO, This doesn't make a lot of sense to turn off this part on > it's own. > I would just use the enable-pre flag to turn off scalar PRE, as it > will cause the same issue in other cases as well. > Is there some reason you aren't just doing that? > I suspect if this is a performance win, that would be as well. > > Also note that you will have the same problem as GVN/EarlyCSE/etc > becomes smarter, as these are full redundancies being eliminated (IE > there is no insertion happening). It just happens that PRE notices > them and GVN doesn't, because GVN is dominator based and PRE is not. > A slightly smarter GVN/EarlyCSE would do the same thing. > > > Given what you are saying, you are also suggesting we are not > rematerializing addressing computations where it is cheaper to do so. > That seems pretty critical to good RA :P > > > > On Wed, Jul 15, 2015 at 11:51 AM, Lawrence > <lawrence at codeaurora.org <mailto:lawrence at codeaurora.org>> wrote: > > Hi, Daniel: > > > > Thanks a lot for detailed background information, we are willing > to provide the right fix, however it will take time, do you mind > if you forward me the discussion you had 5 months ago? I may not > be able to access it since I only joined ellvmdev list this week. > > > > I did some performance measurement last night, some of our > critical benchmark degraded up to 30% with your patch, so we have > to turn it off for now at least. > > > > I posted patch to add a debug option (off by default), so we > could turn it off with that option, could you please review it and > commit it for me if possible? I don't have commit right yet, will > ask soon. > > http://reviews.llvm.org/D11234 > <https://urldefense.proofpoint.com/v2/url?u=http-3A__reviews.llvm.org_D11234&d=AwMFaQ&c=8hUWFZcy2Z-Za5rBPlktOQ&r=Mfk2qtn1LTDThVkh6-oGglNfMADXfJdty4_bhmuhMHA&m=zIdYS4bXbG63-LsNf28NXGKkrEdKPhY_mdzZrzA1eQE&s=6zQDYowITv9KrC3Glp-DVdhOmWxIQRFN5V8VIAdvnJQ&e=> > > > > Thanks again. > > > > Lawrence Hu > > > > > > -----Original Message----- > > From: Daniel Berlin [mailto:dberlin at dberlin.org > <mailto:dberlin at dberlin.org>] > > Sent: Wednesday, July 15, 2015 7:48 AM > > To: Lawrence > > Cc: LLVM Developers Mailing List > > Subject: Re: Register pressure mechanism in PRE or Smarter > rematerialization/split/spiller/coalescing ? > > > > On Tue, Jul 14, 2015 at 11:43 PM, Lawrence > <lawrence at codeaurora.org <mailto:lawrence at codeaurora.org>> wrote: > >> I thought about a little bit more, I think adding Register > pressure control in your patch or PRE may be the only choice. > >> > >> Because at least for this case I am looking at, what your > patch did is created more relatively complex long live range, > rematerialization is not smart enough to undo your change or at > least without a lot of work, coalescing only create even longer > live range not shorter, Spiller can't help since it's the Spiller > created Spill/Reloads due to high register pressure, Splitting can > shorten the live ranges, but I don't think it can handle your case > without a lot of work. > >> > > > > 1. As I mentioned, it simply fixes a bug in implementation of > one of the two PRE's LLVM has. It does not change the PRE > algorithm or add > > anything to it. The code had a bug. I fixed the bug :P. PRE is > > *not even adding more code in this case*. The code is already > there. > > All it is doing is inserting a phi node. If you transformed > your code to use memory, and reverted my patch, you'd get the same > result, because Load PRE will do the same thing. It's what PRE does. > > > > 2. GCC and other compilers have PRE's literally the same thing > my patch does (you are welcome to verify, i wrote GCC's :P), and > apparently are smart enough to handle this in RA. So i'm going to > suggest that it is, in fact, possible to do so, and i'm going to > further suggest that if we want to match their performance, we > need to be able to do the same. You can't simply "turn down" any > optimization that RA may have to deal with. It usually doesn't > work in practice. > > This is one of the reasons good RA is so hard. > > > > 3. As I also mentioned, register pressure heuristics in PRE > simply do not work. They've been tried. By many. With little to > no success. > > > > PRE is too high in the stack of optimizations to estimate register > > pressure in any sane fashion. It's pretty much a fools > errand. You > > can never tune it to do what you want. *Many* have tried. > > > > Your base level problem here is that all modern PRE algorithms > (except for min-cut PRE, as I mentioned), are based on a notion of > lifetime optimality. That is, they extend lifetimes as minimally > as possible to still eliminate a given redundancy. Ours does the same. > > > > However, this is not an entirely useful metric. Optimizing for > some other metric is what something like min-cut PRE lets you do. > > But even then, register pressure heuristics are almost > certainly not the answer. > > > > 4. This was actually already discussed when the patch was > submitted, and the consensus was "we should just fix RA". Feel > free to look at the discussion 5 months ago. > > > > I would suggest, if you want to fix this, you take the approach > that was discussed then. > > > > _______________________________________________ > LLVM Developers mailing list > LLVMdev at cs.uiuc.edu <mailto:LLVMdev at cs.uiuc.edu> > http://llvm.cs.uiuc.edu > http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev > > > > _______________________________________________ > LLVM Developers mailing list > LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu > http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev-------------- next part -------------- An HTML attachment was scrubbed... URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20150715/58477794/attachment.html>
Hal Finkel
2015-Jul-16 04:44 UTC
[LLVMdev] Register pressure mechanism in PRE or Smarter rematerialization/split/spiller/coalescing ?
----- Original Message -----> From: "Philip Reames" <listmail at philipreames.com> > To: "James Molloy" <james at jamesmolloy.co.uk>, "Daniel Berlin" > <dberlin at dberlin.org>, "Lawrence" <lawrence at codeaurora.org> > Cc: "LLVM Developers Mailing List" <llvmdev at cs.uiuc.edu> > Sent: Wednesday, July 15, 2015 4:43:51 PM > Subject: Re: [LLVMdev] Register pressure mechanism in PRE or Smarter > rematerialization/split/spiller/coalescing ?> On 07/15/2015 01:47 PM, James Molloy wrote:> > > Given what you are saying, you are also suggesting we are not > > > rematerializing addressing computations where it is cheaper to do > > so. > > > That seems pretty critical to good RA :P >> > Yep, about 5 months ago I had a conversation about this too... it > > may > > even be the one you're referencing. Our remat is really > > conservative > > - it only rematerializes values that have zero input operands (move > > immediate only, essentially). > > I thought we could remat explicitly invariant loads as well?We can. See TargetInstrInfo::isReallyTriviallyReMaterializableGeneric in lib/CodeGen/TargetInstrInfo.cpp -- What we can't do, for example, it rematerialize small sequences of instructions, such as those that form constants out of smaller immediates. -Hal> > James >> > On Wed, 15 Jul 2015 at 21:28 Daniel Berlin < dberlin at dberlin.org > > > wrote: >> > > IMHO, This doesn't make a lot of sense to turn off this part on > > > it's > > > own. > > > > > > I would just use the enable-pre flag to turn off scalar PRE, as > > > it > > > > > > will cause the same issue in other cases as well. > > > > > > Is there some reason you aren't just doing that? > > > > > > I suspect if this is a performance win, that would be as well. > > >> > > Also note that you will have the same problem as GVN/EarlyCSE/etc > > > > > > becomes smarter, as these are full redundancies being eliminated > > > (IE > > > > > > there is no insertion happening). It just happens that PRE > > > notices > > > > > > them and GVN doesn't, because GVN is dominator based and PRE is > > > not. > > > > > > A slightly smarter GVN/EarlyCSE would do the same thing. > > >> > > Given what you are saying, you are also suggesting we are not > > > > > > rematerializing addressing computations where it is cheaper to do > > > so. > > > > > > That seems pretty critical to good RA :P > > >> > > On Wed, Jul 15, 2015 at 11:51 AM, Lawrence < > > > lawrence at codeaurora.org > > > > wrote: > > > > > > > Hi, Daniel: > > > > > > > > > > > > > > Thanks a lot for detailed background information, we are > > > > willing > > > > to > > > > provide the right fix, however it will take time, do you mind > > > > if > > > > you forward me the discussion you had 5 months ago? I may not > > > > be > > > > able to access it since I only joined ellvmdev list this week. > > > > > > > > > > > > > > I did some performance measurement last night, some of our > > > > critical > > > > benchmark degraded up to 30% with your patch, so we have to > > > > turn > > > > it off for now at least. > > > > > > > > > > > > > > I posted patch to add a debug option (off by default), so we > > > > could > > > > turn it off with that option, could you please review it and > > > > commit it for me if possible? I don't have commit right yet, > > > > will > > > > ask soon. > > > > > > > http://reviews.llvm.org/D11234 > > > > > > > > > > > > > > Thanks again. > > > > > > > > > > > > > > Lawrence Hu > > > > > > > > > > > > > > > > > > > > > -----Original Message----- > > > > > > > From: Daniel Berlin [mailto: dberlin at dberlin.org ] > > > > > > > Sent: Wednesday, July 15, 2015 7:48 AM > > > > > > > To: Lawrence > > > > > > > Cc: LLVM Developers Mailing List > > > > > > > Subject: Re: Register pressure mechanism in PRE or Smarter > > > > rematerialization/split/spiller/coalescing ? > > > > > > > > > > > > > > On Tue, Jul 14, 2015 at 11:43 PM, Lawrence < > > > > lawrence at codeaurora.org > wrote: > > > > > > >> I thought about a little bit more, I think adding Register > > > >> pressure control in your patch or PRE may be the only choice. > > > > > > >> > > > > > > >> Because at least for this case I am looking at, what your > > > >> patch > > > >> did is created more relatively complex long live range, > > > >> rematerialization is not smart enough to undo your change or > > > >> at > > > >> least without a lot of work, coalescing only create even > > > >> longer > > > >> live range not shorter, Spiller can't help since it's the > > > >> Spiller > > > >> created Spill/Reloads due to high register pressure, Splitting > > > >> can shorten the live ranges, but I don't think it can handle > > > >> your > > > >> case without a lot of work. > > > > > > >> > > > > > > > > > > > > > > 1. As I mentioned, it simply fixes a bug in implementation of > > > > one > > > > of the two PRE's LLVM has. It does not change the PRE algorithm > > > > or > > > > add > > > > > > > anything to it. The code had a bug. I fixed the bug :P. PRE is > > > > > > > *not even adding more code in this case*. The code is already > > > > there. > > > > > > > All it is doing is inserting a phi node. If you transformed > > > > your > > > > code to use memory, and reverted my patch, you'd get the same > > > > result, because Load PRE will do the same thing. It's what PRE > > > > does. > > > > > > > > > > > > > > 2. GCC and other compilers have PRE's literally the same thing > > > > my > > > > patch does (you are welcome to verify, i wrote GCC's :P), and > > > > apparently are smart enough to handle this in RA. So i'm going > > > > to > > > > suggest that it is, in fact, possible to do so, and i'm going > > > > to > > > > further suggest that if we want to match their performance, we > > > > need to be able to do the same. You can't simply "turn down" > > > > any > > > > optimization that RA may have to deal with. It usually doesn't > > > > work in practice. > > > > > > > This is one of the reasons good RA is so hard. > > > > > > > > > > > > > > 3. As I also mentioned, register pressure heuristics in PRE > > > > simply > > > > do not work. They've been tried. By many. With little to no > > > > success. > > > > > > > > > > > > > > PRE is too high in the stack of optimizations to estimate > > > > register > > > > > > > pressure in any sane fashion. It's pretty much a fools errand. > > > > You > > > > > > > can never tune it to do what you want. *Many* have tried. > > > > > > > > > > > > > > Your base level problem here is that all modern PRE algorithms > > > > (except for min-cut PRE, as I mentioned), are based on a notion > > > > of > > > > lifetime optimality. That is, they extend lifetimes as > > > > minimally > > > > as possible to still eliminate a given redundancy. Ours does > > > > the > > > > same. > > > > > > > > > > > > > > However, this is not an entirely useful metric. Optimizing for > > > > some > > > > other metric is what something like min-cut PRE lets you do. > > > > > > > But even then, register pressure heuristics are almost > > > > certainly > > > > not the answer. > > > > > > > > > > > > > > 4. This was actually already discussed when the patch was > > > > submitted, and the consensus was "we should just fix RA". Feel > > > > free to look at the discussion 5 months ago. > > > > > > > > > > > > > > I would suggest, if you want to fix this, you take the approach > > > > that was discussed then. > > > > > > > > > >> > > _______________________________________________ > > > > > > LLVM Developers mailing list > > > > > > LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu > > > > > > http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev > > >> > _______________________________________________ > > > LLVM Developers mailing list > > > LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu > > > http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev >> _______________________________________________ > LLVM Developers mailing list > LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu > http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev-- Hal Finkel Assistant Computational Scientist Leadership Computing Facility Argonne National Laboratory -------------- next part -------------- An HTML attachment was scrubbed... URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20150715/c8674a98/attachment.html>