Chawla, Pankaj via llvm-dev
2017-Apr-13 00:21 UTC
[llvm-dev] Question on induction variable simplification pass
Hi all, It looks like the induction variable simplification pass prefers doing a zero-extension to compute the wider trip count of loops when extending the IV. This can sometimes result in loss of information making ScalarEvolution's analysis conservative which can lead to missed performance opportunities. For example, consider this loopnest- int i, j; for(i=0; i< 40; i++) for(j=0; j< i-1; j++) A[i+j] = j; We are mainly interested in the backedge taken count of the inner loop. Before indvars, the backedge information computed by ScalarEvolution is as follows- Outer loop- backedge-taken count is 39 max backedge-taken count is 39 Inner loop- backedge-taken count is {-2,+,1}<nsw><%for.cond1.preheader> max backedge-taken count is 37 After indvars, the backedge information computed by ScalarEvolution is as follows- Outer loop- backedge-taken count is 39 max backedge-taken count is 39 Inner loop- backedge-taken count is (-1 + (zext i32 {-1,+,1}<nsw><%for.cond1.preheader> to i64))<nsw> max backedge-taken count is -1 If we generate sext instead of zext, the information computed by ScalarEvolution is as follows- Outer loop- backedge-taken count is 39 max backedge-taken count is 39 Inner loop- backedge-taken count is {-2,+,1}<nw><%for.cond1.preheader> max backedge-taken count is -2 We now have a simplified backedge taken count for the inner loop which is almost as precise as before indvars, except for the <nw> flag instead of <nsw> flag. I think ScalarEvolution should be able to precisely deduce wrap flags in the presence of sext but this may require a separate fix. The reason for the conservative max backedge taken count may be the same. Thanks, Pankaj -------------- next part -------------- An HTML attachment was scrubbed... URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20170413/61343235/attachment.html>
Hal Finkel via llvm-dev
2017-Apr-13 00:38 UTC
[llvm-dev] Question on induction variable simplification pass
[+Sanjoy] The fact that we lose information by widening during induction-variable simplification is certainly a known problem. I don't recall if we've ever really decided on a path forward. I personally suspect that, as an information-destroying transformation, the widening should be moved to the lowering phase (i.e. near where we do vectorization, etc.). Unless the widening itself enables other transformations, I don't see why we should do it early. The one exception I can think of is where it might enable us to collapse redundant PHIs, as is: int i = 0; long j = 0; for (; i < n; ++i, ++j) { ... using i and j ... } but that seems like a special case we could handle separately. -Hal On 04/12/2017 07:21 PM, Chawla, Pankaj via llvm-dev wrote:> > Hi all, > > It looks like the induction variable simplification pass prefers doing > a zero-extension to compute the wider trip count of loops when > extending the IV. This can sometimes result in loss of information > making ScalarEvolution’s analysis conservative which can lead to > missed performance opportunities. > > For example, consider this loopnest- > > int i, j; > > for(i=0; i< 40; i++) > > for(j=0; j< i-1; j++) > > A[i+j] = j; > > We are mainly interested in the backedge taken count of the inner loop. > > Before indvars, the backedge information computed by ScalarEvolution > is as follows- > > Outer loop- > > backedge-taken count is 39 > > max backedge-taken count is 39 > > Inner loop- > > backedge-taken count is {-2,+,1}<nsw><%for.cond1.preheader> > > max backedge-taken count is 37 > > After indvars, the backedge information computed by ScalarEvolution is > as follows- > > Outer loop- > > backedge-taken count is 39 > > max backedge-taken count is 39 > > Inner loop- > > backedge-taken count is (-1 + (zext i32 > {-1,+,1}<nsw><%for.cond1.preheader> to i64))<nsw> > > max backedge-taken count is -1 > > If we generate sext instead of zext, the information computed by > ScalarEvolution is as follows- > > Outer loop- > > backedge-taken count is 39 > > max backedge-taken count is 39 > > Inner loop- > > backedge-taken count is {-2,+,1}<nw><%for.cond1.preheader> > > max backedge-taken count is -2 > > We now have a simplified backedge taken count for the inner loop which > is almost as precise as before indvars, except for the <nw> flag > instead of <nsw> flag. I think ScalarEvolution should be able to > precisely deduce wrap flags in the presence of sext but this may > require a separate fix. The reason for the conservative max backedge > taken count may be the same. > > Thanks, > > Pankaj > > > > _______________________________________________ > LLVM Developers mailing list > llvm-dev at lists.llvm.org > http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev-- Hal Finkel Lead, Compiler Technology and Programming Languages Leadership Computing Facility Argonne National Laboratory -------------- next part -------------- An HTML attachment was scrubbed... URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20170412/68ec1432/attachment.html>
Sanjoy Das via llvm-dev
2017-Apr-14 03:34 UTC
[llvm-dev] Question on induction variable simplification pass
Hi Pankaj, On April 12, 2017 at 5:22:45 PM, Chawla, Pankaj via llvm-dev (llvm-dev at lists.llvm.org) wrote:> It looks like the induction variable simplification pass prefers doing a zero-extension > to compute the wider trip count of loops when extending the IV. This can sometimes result > in loss of information making ScalarEvolution's analysis conservative which can lead > to missed performance opportunities. > > For example, consider this loopnest- > > int i, j; > for(i=0; i< 40; i++) > for(j=0; j< i-1; j++) > A[i+j] = j; > > > We are mainly interested in the backedge taken count of the inner loop. > > Before indvars, the backedge information computed by ScalarEvolution is as follows- > > Outer loop- > backedge-taken count is 39 > max backedge-taken count is 39 > > Inner loop- > backedge-taken count is {-2,+,1}<%for.cond1.preheader> > max backedge-taken count is 37 > > > After indvars, the backedge information computed by ScalarEvolution is as follows- > > Outer loop- > backedge-taken count is 39 > max backedge-taken count is 39 > > Inner loop- > backedge-taken count is (-1 + (zext i32 {-1,+,1}<%for.cond1.preheader> to i64)) > max backedge-taken count is -1One approach is to use the facts: - The inner loop will not be entered in the 0th iteration of <%for.cond1.preheader> - {-1,+,1}<%for.cond1.preheader> is s< 40 to simplify the above to {-2,+,1}<%for.cond1.preheader> (in i64). The original expression was not -2 in the 0th iteration of <%for.cond1.preheader>, but we don't care about that iteration of <%for.cond1.preheader> since we won't enter the inner loop. The other option is to widen of IVs late, as a "lowering" transformation, like Hal said. That's a more invasive change, but if you have time and resources, it would be nice to at least give it a shot, measure and see what falls over.> If we generate sext instead of zext, the information computed by ScalarEvolution is > as follows- > > Outer loop- > backedge-taken count is 39 > max backedge-taken count is 39 > > Inner loop- > backedge-taken count is {-2,+,1}<%for.cond1.preheader> > max backedge-taken count is -2 > > We now have a simplified backedge taken count for the inner loop which is almost as precise > as before indvars, except for the flag instead of flag. I think ScalarEvolution(JFYI: My mail client's compose ate the <nsw> and <nw>) Can you please share the IR that you piped through SCEV? My guess is that SCEV did not "try to" infer a more aggressive no-wrap flag for {-2,+,1} -- most of the no-wrap inferring logic kicks in when you try to sign/zero extend an add recurrence. One suspicious bit here is the "max backedge-taken count is -2" bit. I expected SCEV to have inferred the max trip count to be 37 in this second case. -- Sanjoy
Chawla, Pankaj via llvm-dev
2017-Apr-14 23:55 UTC
[llvm-dev] Question on induction variable simplification pass
Hi Sanjoy, I have attached the IR I got by compiling with -O2. This is just before we widen the IV. To get the backedge taken count info I ran indvars on it and then replaced zext with sext. I think regardless of where we decide to add this transformation in the pipeline, it should try to preserve as much information as it can. This means that we should generate sext for signed IVs and vice-versa. I believe this is a better approach as it preserves the information directly in the IR as opposed to relying on ScalarEvolution to deduce it. Moving it to a different location can be done separately. Do you agree? Thanks, Pankaj -----Original Message----- From: Sanjoy Das [mailto:sanjoy at playingwithpointers.com] Sent: Thursday, April 13, 2017 8:35 PM To: llvm-dev at lists.llvm.org; Chawla, Pankaj Subject: Re: [llvm-dev] Question on induction variable simplification pass Hi Pankaj, On April 12, 2017 at 5:22:45 PM, Chawla, Pankaj via llvm-dev (llvm-dev at lists.llvm.org) wrote:> It looks like the induction variable simplification pass prefers doing > a zero-extension to compute the wider trip count of loops when > extending the IV. This can sometimes result in loss of information > making ScalarEvolution's analysis conservative which can lead to missed performance opportunities. > > For example, consider this loopnest- > > int i, j; > for(i=0; i< 40; i++) > for(j=0; j< i-1; j++) > A[i+j] = j; > > > We are mainly interested in the backedge taken count of the inner loop. > > Before indvars, the backedge information computed by ScalarEvolution > is as follows- > > Outer loop- > backedge-taken count is 39 > max backedge-taken count is 39 > > Inner loop- > backedge-taken count is {-2,+,1}<%for.cond1.preheader> max > backedge-taken count is 37 > > > After indvars, the backedge information computed by ScalarEvolution is > as follows- > > Outer loop- > backedge-taken count is 39 > max backedge-taken count is 39 > > Inner loop- > backedge-taken count is (-1 + (zext i32 {-1,+,1}<%for.cond1.preheader> > to i64)) max backedge-taken count is -1One approach is to use the facts: - The inner loop will not be entered in the 0th iteration of <%for.cond1.preheader> - {-1,+,1}<%for.cond1.preheader> is s< 40 to simplify the above to {-2,+,1}<%for.cond1.preheader> (in i64). The original expression was not -2 in the 0th iteration of <%for.cond1.preheader>, but we don't care about that iteration of <%for.cond1.preheader> since we won't enter the inner loop. The other option is to widen of IVs late, as a "lowering" transformation, like Hal said. That's a more invasive change, but if you have time and resources, it would be nice to at least give it a shot, measure and see what falls over.> If we generate sext instead of zext, the information computed by > ScalarEvolution is as follows- > > Outer loop- > backedge-taken count is 39 > max backedge-taken count is 39 > > Inner loop- > backedge-taken count is {-2,+,1}<%for.cond1.preheader> max > backedge-taken count is -2 > > We now have a simplified backedge taken count for the inner loop which > is almost as precise as before indvars, except for the flag instead of > flag. I think ScalarEvolution(JFYI: My mail client's compose ate the <nsw> and <nw>) Can you please share the IR that you piped through SCEV? My guess is that SCEV did not "try to" infer a more aggressive no-wrap flag for {-2,+,1} -- most of the no-wrap inferring logic kicks in when you try to sign/zero extend an add recurrence. One suspicious bit here is the "max backedge-taken count is -2" bit. I expected SCEV to have inferred the max trip count to be 37 in this second case. -- Sanjoy -------------- next part -------------- A non-text attachment was scrubbed... Name: indvars.ll Type: application/octet-stream Size: 1435 bytes Desc: indvars.ll URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20170414/9273ec1d/attachment.obj>