Lawrence
2015-Jul-15 06:43 UTC
[LLVMdev] Register pressure mechanism in PRE or Smarter rematerialization/split/spiller/coalescing ?
I thought about a little bit more, I think adding Register pressure control in your patch or PRE may be the only choice. Because at least for this case I am looking at, what your patch did is created more relatively complex long live range, rematerialization is not smart enough to undo your change or at least without a lot of work, coalescing only create even longer live range not shorter, Spiller can't help since it's the Spiller created Spill/Reloads due to high register pressure, Splitting can shorten the live ranges, but I don't think it can handle your case without a lot of work. Suggestions are welcome. -----Original Message----- From: Daniel Berlin [mailto:dberlin at dberlin.org] Sent: Tuesday, July 14, 2015 6:54 PM To: Lawrence Cc: LLVM Developers Mailing List Subject: Re: [LLVMdev] FW: Insight of 403050abcc091260be2e8f58484e7a39c0782b47? Simple register pressure heuristics for PRE rarely work. If we wanted to do something better for PRE, the real answer is something like min-cut PRE. On Tue, Jul 14, 2015 at 5:42 PM, Lawrence <lawrence at codeaurora.org> wrote:> +llvmdev > > > > From: Lawrence [mailto:lawrence at codeaurora.org] > Sent: Tuesday, July 14, 2015 5:42 PM > To: 'dberlin at dberlin.org' > Cc: 'Ana Pazos'; 'Sanjin Sijaric'; 'zinob at codeaurora.org' > Subject: RE: Insight of 403050abcc091260be2e8f58484e7a39c0782b47? > > > > Thanks Daniel for your prompt response. > > > > I understand the problem is a combination of multiple things, I have > opened at least one RA bug for it. > > > > One thing I am not sure is should we add some register pressure > mechanism to your patch (or even PRE) or fix later optimization such > as > rematerialization/split/spiller/coalescing(?) to handle that. > > > > +llvmdev for more discussion > > > > Will attached two log more detail to individual if anyone ask. > > > > Regards. > > > > Lawrence Hu > > > > > > From: Lawrence [mailto:lawrence at codeaurora.org] > Sent: Tuesday, July 14, 2015 3:16 PM > To: 'dberlin at dberlin.org' > Cc: 'Ana Pazos'; 'Sanjin Sijaric'; 'zinob at codeaurora.org' > Subject: Insight of 403050abcc091260be2e8f58484e7a39c0782b47? > > > > Hi, Daniel: > > > > I am investigating the performance of a version of open source > bzip2/decompress, I found out that your patch > > > > commit 403050abcc091260be2e8f58484e7a39c0782b47 > > Author: Daniel Berlin <dberlin at dberlin.org> > > Date: Tue Feb 3 20:37:08 2015 +0000 > > > > Allow PRE to insert no-cost phi nodes > > > > changed some getelementptr to PHIs, which introduced a lot more long > live ranges, that leaded to almost 3 times more Spill/Reloads from > register allocation, as a result, it degraded the performance of > bzip2/decompress by 16%. > > > > Attached are two log files, log.good without your change, log.bad with > your change. > > > > > > I couldn’t understand your patch well just base on the description and > comments, could you please give me some insight about the intention of > the patch and possible way of limiting it to avoid that performance degradation? > > > > Your help is highly appreciated. > > > > Regards > > > > Lawrence Hu. > > > > > > > > > _______________________________________________ > LLVM Developers mailing list > LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu > http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev >
Daniel Berlin
2015-Jul-15 14:48 UTC
[LLVMdev] Register pressure mechanism in PRE or Smarter rematerialization/split/spiller/coalescing ?
On Tue, Jul 14, 2015 at 11:43 PM, Lawrence <lawrence at codeaurora.org> wrote:> I thought about a little bit more, I think adding Register pressure control in your patch or PRE may be the only choice. > > Because at least for this case I am looking at, what your patch did is created more relatively complex long live range, rematerialization is not smart enough to undo your change or at least without a lot of work, coalescing only create even longer live range not shorter, Spiller can't help since it's the Spiller created Spill/Reloads due to high register pressure, Splitting can shorten the live ranges, but I don't think it can handle your case without a lot of work. >1. As I mentioned, it simply fixes a bug in implementation of one of the two PRE's LLVM has. It does not change the PRE algorithm or add anything to it. The code had a bug. I fixed the bug :P. PRE is *not even adding more code in this case*. The code is already there. All it is doing is inserting a phi node. If you transformed your code to use memory, and reverted my patch, you'd get the same result, because Load PRE will do the same thing. It's what PRE does. 2. GCC and other compilers have PRE's literally the same thing my patch does (you are welcome to verify, i wrote GCC's :P), and apparently are smart enough to handle this in RA. So i'm going to suggest that it is, in fact, possible to do so, and i'm going to further suggest that if we want to match their performance, we need to be able to do the same. You can't simply "turn down" any optimization that RA may have to deal with. It usually doesn't work in practice. This is one of the reasons good RA is so hard. 3. As I also mentioned, register pressure heuristics in PRE simply do not work. They've been tried. By many. With little to no success. PRE is too high in the stack of optimizations to estimate register pressure in any sane fashion. It's pretty much a fools errand. You can never tune it to do what you want. *Many* have tried. Your base level problem here is that all modern PRE algorithms (except for min-cut PRE, as I mentioned), are based on a notion of lifetime optimality. That is, they extend lifetimes as minimally as possible to still eliminate a given redundancy. Ours does the same. However, this is not an entirely useful metric. Optimizing for some other metric is what something like min-cut PRE lets you do. But even then, register pressure heuristics are almost certainly not the answer. 4. This was actually already discussed when the patch was submitted, and the consensus was "we should just fix RA". Feel free to look at the discussion 5 months ago. I would suggest, if you want to fix this, you take the approach that was discussed then.
Philip Reames
2015-Jul-15 16:57 UTC
[LLVMdev] Register pressure mechanism in PRE or Smarter rematerialization/split/spiller/coalescing ?
On 07/15/2015 07:48 AM, Daniel Berlin wrote:> On Tue, Jul 14, 2015 at 11:43 PM, Lawrence <lawrence at codeaurora.org> wrote: >> I thought about a little bit more, I think adding Register pressure control in your patch or PRE may be the only choice. >> >> Because at least for this case I am looking at, what your patch did is created more relatively complex long live range, rematerialization is not smart enough to undo your change or at least without a lot of work, coalescing only create even longer live range not shorter, Spiller can't help since it's the Spiller created Spill/Reloads due to high register pressure, Splitting can shorten the live ranges, but I don't think it can handle your case without a lot of work. >> > 1. As I mentioned, it simply fixes a bug in implementation of one of > the two PRE's LLVM has. It does not change the PRE algorithm or add > anything to it. The code had a bug. I fixed the bug :P. PRE is > *not even adding more code in this case*. The code is already there. > All it is doing is inserting a phi node. If you transformed your > code to use memory, and reverted my patch, you'd get the same result, > because Load PRE will do the same thing. It's what PRE does. > > 2. GCC and other compilers have PRE's literally the same thing my > patch does (you are welcome to verify, i wrote GCC's :P), and > apparently are smart enough to handle this in RA. So i'm going to > suggest that it is, in fact, possible to do so, and i'm going to > further suggest that if we want to match their performance, we need to > be able to do the same. You can't simply "turn down" any optimization > that RA may have to deal with. It usually doesn't work in practice. > This is one of the reasons good RA is so hard. > > 3. As I also mentioned, register pressure heuristics in PRE simply do > not work. They've been tried. By many. With little to no success. > > PRE is too high in the stack of optimizations to estimate register > pressure in any sane fashion. It's pretty much a fools errand. You > can never tune it to do what you want. *Many* have tried. > > Your base level problem here is that all modern PRE algorithms (except > for min-cut PRE, as I mentioned), are based on a notion of lifetime > optimality. That is, they extend lifetimes as minimally as possible to > still eliminate a given redundancy. Ours does the same. > > However, this is not an entirely useful metric. Optimizing for some > other metric is what something like min-cut PRE lets you do. > But even then, register pressure heuristics are almost certainly not > the answer. > > 4. This was actually already discussed when the patch was submitted, > and the consensus was "we should just fix RA". Feel free to look at > the discussion 5 months ago.+1. If you can show a common pattern created by PRE which our regalloc doesn't handle well, please file a bug.> > I would suggest, if you want to fix this, you take the approach that > was discussed then. > > _______________________________________________ > LLVM Developers mailing list > LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu > http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
Lawrence
2015-Jul-15 18:51 UTC
[LLVMdev] Register pressure mechanism in PRE or Smarter rematerialization/split/spiller/coalescing ?
Hi, Daniel: Thanks a lot for detailed background information, we are willing to provide the right fix, however it will take time, do you mind if you forward me the discussion you had 5 months ago? I may not be able to access it since I only joined ellvmdev list this week. I did some performance measurement last night, some of our critical benchmark degraded up to 30% with your patch, so we have to turn it off for now at least. I posted patch to add a debug option (off by default), so we could turn it off with that option, could you please review it and commit it for me if possible? I don't have commit right yet, will ask soon. http://reviews.llvm.org/D11234 Thanks again. Lawrence Hu -----Original Message----- From: Daniel Berlin [mailto:dberlin at dberlin.org] Sent: Wednesday, July 15, 2015 7:48 AM To: Lawrence Cc: LLVM Developers Mailing List Subject: Re: Register pressure mechanism in PRE or Smarter rematerialization/split/spiller/coalescing ? On Tue, Jul 14, 2015 at 11:43 PM, Lawrence <lawrence at codeaurora.org> wrote:> I thought about a little bit more, I think adding Register pressure control in your patch or PRE may be the only choice. > > Because at least for this case I am looking at, what your patch did is created more relatively complex long live range, rematerialization is not smart enough to undo your change or at least without a lot of work, coalescing only create even longer live range not shorter, Spiller can't help since it's the Spiller created Spill/Reloads due to high register pressure, Splitting can shorten the live ranges, but I don't think it can handle your case without a lot of work. >1. As I mentioned, it simply fixes a bug in implementation of one of the two PRE's LLVM has. It does not change the PRE algorithm or add anything to it. The code had a bug. I fixed the bug :P. PRE is *not even adding more code in this case*. The code is already there. All it is doing is inserting a phi node. If you transformed your code to use memory, and reverted my patch, you'd get the same result, because Load PRE will do the same thing. It's what PRE does. 2. GCC and other compilers have PRE's literally the same thing my patch does (you are welcome to verify, i wrote GCC's :P), and apparently are smart enough to handle this in RA. So i'm going to suggest that it is, in fact, possible to do so, and i'm going to further suggest that if we want to match their performance, we need to be able to do the same. You can't simply "turn down" any optimization that RA may have to deal with. It usually doesn't work in practice. This is one of the reasons good RA is so hard. 3. As I also mentioned, register pressure heuristics in PRE simply do not work. They've been tried. By many. With little to no success. PRE is too high in the stack of optimizations to estimate register pressure in any sane fashion. It's pretty much a fools errand. You can never tune it to do what you want. *Many* have tried. Your base level problem here is that all modern PRE algorithms (except for min-cut PRE, as I mentioned), are based on a notion of lifetime optimality. That is, they extend lifetimes as minimally as possible to still eliminate a given redundancy. Ours does the same. However, this is not an entirely useful metric. Optimizing for some other metric is what something like min-cut PRE lets you do. But even then, register pressure heuristics are almost certainly not the answer. 4. This was actually already discussed when the patch was submitted, and the consensus was "we should just fix RA". Feel free to look at the discussion 5 months ago. I would suggest, if you want to fix this, you take the approach that was discussed then.