Jim Grosbach
2014-Jan-07 19:16 UTC
[LLVMdev] A question about everyone's favorite constructs: NSW and NUW
On Jan 6, 2014, at 9:41 PM, Andrew Trick <atrick at apple.com> wrote:

> On Jan 4, 2014, at 5:29 PM, Chandler Carruth <chandlerc at gmail.com> wrote:
>
>> So, I know there are a lot of mixed feelings about NSW and NUW, but I'm running into a common pattern where they really help:
>>
>>   void f(char *array, int i) {
>>     // do some stuff...
>>     g(array[i++]);
>>
>>     // do some more stuff...
>>     g(array[i++]);
>>
>>     // more of the same
>>   }
>>
>> So, this kind of code comes up pretty frequently: unrolled loops with an int IV[1], encoding, decoding, compression, etc. What we'd like to do is take the sequence of N increments and turn them into N addressing modes with a constant offset, then add a small constant at the end and test the flags if we're in a loop, etc. This is *much faster* than clobbering registers to do LEA or inc or add for each one (and yea, we do LEA and use lots of registers currently).
>>
>> The problem is that this is being compiled in 64-bit mode, so the index is 32 bits and gets sign extended prior to the GEP. We end up with a lot of repetitions of:
>>
>>   %a = add nsw i32 %i, N                   ; for some immediate small integer N
>>   %b = sext i32 %a to i64
>>   %ptr = gep inbounds i8* %base, i64 %b
>>
>> When we try to match this pattern in the backend to an x86 addressing mode, we fail because we can't look through the sext. This is particularly frustrating because we *should* be able to look through the sext due to the NSW bit on the add, but we don't preserve NSW constraints into the DAG.
>>
>> How can we fix this? I see the following options:
>>
>> 1) Canonicalize the other way in the IR. This could take a few different forms:
>>    1a) Widen math (where NSW or NUW allows) which is only used by an *ext feeding a GEP index.
>>    1b) Widen all math which is only used by an *ext that targets a legal integer size.
>>
>> 2) Add the NSW (and NUW) bits to the DAG, and query them in the address-mode matching to match across (sext (op nsw ...)) and (zext (op nuw ...)).
>>
>> 3) Add a complete hack to CodeGenPrepare (or similar) which does #1 as a non-canonical step in the IR.
>
> Here are some of my current perspectives that are *loosely* related to the issue. All FWIW. There's no need to debate them in this thread other than as they apply to the problem at hand.
>
> - I agree that promotion to 64-bit loses information and should only be done when target information tells us that it's good for codegen.
>
> - I personally have no problem using "indvars" as a codegen prepare pass for any noncanonical target-specific transforms that require SCEV. The only canonicalization that it usually accomplishes is replacement of loop exit values. The sign/zero extend elimination (widening) and LFTR are really codegen optimizations that should be deferred until the last round of loop opts. I don't see it as being any different in this respect than LSR and vectorization. Unfortunately we rerun canonical InstCombine after indvars, which can undo any target-specific goodness, but when that becomes a serious problem we should just fix that too. If we really need to replace loop exit values earlier, before other canonical passes, we can split up indvars.
>
> - I know we have considered adding NSW to SelectionDAG in the past and I'm not opposed to it in principle, but it was difficult to justify adding the complexity. I don't remember all the arguments against it now. There are some obvious issues: (1) it's not clear how to cleanly implement it; (2) I would be concerned about relying on preservation of such a brittle attribute so late -- it seems easy for target-specific DAG combines to either drop or misuse the NSW flag; (3) determining where to widen values may require global analysis.
>
> - I know that the problem of determining when values should be promoted to a wider type, and which uses to replace, doesn't have a general solution that is practical to implement. We can easily end up with more register pressure and extra copies that the coalescer is not smart about removing or optimizing. It seems wise to tackle a narrow subset of the problem (e.g., codegen of specific patterns that match addressing modes). Quentin probably has more insight here.
>
>> I dislike #3 intensely -- we really want to be able to combine on this stuff in a canonical way, so I want the pattern to either always be (gep ... (sext (op nsw ...))) or (gep ... (op nsw (sext ...))). Also, it feels like quite a hack around the lack of fidelity in the backend.
>
> I understand why this sort of hack is so terrible early in the IR pass pipeline (especially before inlining). But I think it is already the norm for anyone seriously optimizing for a specific target to manipulate the IR (here I do welcome others to confirm or refute this point). Since we can't do anything to change that, let's just declare that those codegen IR transforms are part of the "backend" and be done. Naturally, the more a target customizes codegen preparation, the less DAGCombine can be reused.

Yes. That's CodeGenPrepare in a nutshell, really. I don't think anyone claims CGP is a good thing, merely that with our existing infrastructure, it's a necessary and effective thing.

-Jim

> That said, if NSW can be added to SelectionDAG without too many issues, then it may be a moot point.
>
> -Andy
>
>> I'm perfectly fine with either #1 or #2. Doing #1 would require a lot of benchmarking to make sure the different canonicalization doesn't regress lots of things, and probably a few fixes where it does regress things. These are likely to be easy to find by looking for patterns that match sext.
>>
>> Doing #2 requires plumbing these flags through the DAG. I don't yet know how hard that would be, but it doesn't sound too bad. The current approach I would probably take is to add an SDNodeProperty for NSW and NUW, and nodes for operations with those flags. But that means generalizing everything that currently only looks at ADD. There is probably a better way to express this in the DAG, so if this is the direction folks think LLVM should go, point me at any ideas you have about how to best implement it.
>>
>> Currently, I would naively go after #1 just because I know exactly how to do that. But it doesn't feel quite right, and I wonder if we should really invest the time in making sure the backend is aware of the NSW and NUW flags.
>>
>> -Chandler
>>
>> [1]: Note that we could fix this inside loop bodies when it's an induction variable, by widening the induction variable. But that doesn't fix the general case.
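[Editorial sketch, not part of the original message: to make the payoff concrete, here is what two consecutive accesses from Chandler's example might look like before and after the widening that option #1 proposes. The value names are invented for this illustration and the "after" form is a sketch under the stated NSW assumption, not output from any actual pass.]

    ; Today: each add nsw gets its own sext, so the backend cannot fold the
    ; constant increment into the address of the second access.
    %a1 = add nsw i32 %i, 1
    %s0 = sext i32 %i to i64
    %p0 = getelementptr inbounds i8* %base, i64 %s0     ; array[i]
    %s1 = sext i32 %a1 to i64
    %p1 = getelementptr inbounds i8* %base, i64 %s1     ; array[i+1]

    ; Widened form: one extension of %i, with the constant folded into i64
    ; arithmetic.  Because the i32 add carried nsw (no signed overflow),
    ; sext(%i + 1) == sext(%i) + 1, so the i64 add may keep nsw, and the
    ; constant displacement can match an x86 addressing mode.
    %idx = sext i32 %i to i64
    %p0w = getelementptr inbounds i8* %base, i64 %idx   ; array[i]
    %off = add nsw i64 %idx, 1
    %p1w = getelementptr inbounds i8* %base, i64 %off   ; array[i+1]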
Quentin Colombet
2014-Jan-07 19:24 UTC
[LLVMdev] A question about everyone's favorite constructs: NSW and NUW
Hi,

On Jan 7, 2014, at 11:16 AM, Jim Grosbach <grosbach at apple.com> wrote:

> On Jan 6, 2014, at 9:41 PM, Andrew Trick <atrick at apple.com> wrote:
>
>> I understand why this sort of hack is so terrible early in the IR pass pipeline (especially before inlining). But I think it is already the norm for anyone seriously optimizing for a specific target to manipulate the IR (here I do welcome others to confirm or refute this point). Since we can't do anything to change that, let's just declare that those codegen IR transforms are part of the "backend" and be done. Naturally, the more a target customizes codegen preparation, the less DAGCombine can be reused.
>
> Yes. That's CodeGenPrepare in a nutshell, really. I don't think anyone claims CGP is a good thing, merely that with our existing infrastructure, it's a necessary and effective thing.

I agree with Andy and Jim.

Actually, I have prototyped a compiler that does exactly this kind of promotion in CodeGenPrepare. Basically, I have updated the addressing-mode matcher so that it moves a sext that is in the way of an addressing mode: it promotes the operand of the sext (let us call this operand "def"), if it is legal to do so, and sign extends the operands of def. When the matcher does not manage to absorb more computation after promoting def, it can revert the promotion.

I am currently benchmarking this solution and I'll update this thread with the results.

Cheers,
-Quentin
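[Editorial sketch, not part of the original message: a rough illustration of the promotion Quentin describes, showing only the idea rather than his actual patch; the value names are hypothetical.]

    ; Before: the sext of the add ("def" is the add) blocks the
    ; addressing-mode matcher from folding the constant.
    %a = add nsw i32 %i, 4
    %b = sext i32 %a to i64
    %p = getelementptr inbounds i8* %base, i64 %b

    ; After promoting def: its operands are sign extended and the add is
    ; rebuilt at i64, so the GEP index is arithmetic the matcher can fold
    ; into an x86 addressing mode.  If no extra computation ends up being
    ; absorbed, the matcher reverts this rewrite.
    %i.64 = sext i32 %i to i64
    %a.64 = add nsw i64 %i.64, 4
    %p2   = getelementptr inbounds i8* %base, i64 %a.64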
Chandler Carruth
2014-Jan-07 19:40 UTC
[LLVMdev] A question about everyone's favorite constructs: NSW and NUW
On Tue, Jan 7, 2014 at 2:24 PM, Quentin Colombet <qcolombet at apple.com> wrote:

> I agree with Andy and Jim.
>
> Actually, I have prototyped a compiler that does exactly this kind of promotion in CodeGenPrepare. Basically, I have updated the addressing-mode matcher so that it moves a sext that is in the way of an addressing mode (i.e., it promotes the operand of the sext, let us call this operand def, if it is legal to do so, and sign extends the operands of def). When the matcher does not manage to absorb more computation after promoting def, it can revert the promotion.
>
> I am currently benchmarking this solution and I'll update this thread with the results.

Very cool. Could you share the patch? I can also run some benchmarks.