On Jul 29, 2013, at 4:08 PM, Guo, Xiaoyi <Xiaoyi.Guo at amd.com> wrote:

> Hi,
>
> We have a benchmark with 128 MAD computations in a loop. (See the attached IR.) Creating SCEVs for these expressions takes a long time, making the compile time too long. E.g., running opt with the "indvars" pass alone takes 45 seconds.
>
> It seems that the majority of the time is spent comparing the complexity of the expression operands in order to sort them.
>
> I realize that the expressions grow to be really large towards the end of the loop.
>
> I don't know all the uses of the built SCEV, but I imagine it won't be very useful for such complex expressions. Yet it's making the compile time much longer.
>
> So I'm wondering whether it would make sense to abort the creation of a SCEV when the expression gets really complex and large. Or is there any way to further optimize the performance of SCEV building for such cases?
>
> Thanks in advance for any response.

Nice test case. I tried printing the SCEV... oops. I haven't seen a case this bad, but I know you're not the first to run into this problem.

There are two steps in GroupByComplexity: sorting (std::sort) and grouping (N^2).

The sort calls SCEVComplexityCompare::compare(), which can make multiple recursive calls for nodes with multiple operands. This looks like it could be a disaster for expressions that are not strictly trees--exponential in the size of the DAG.

If you just have a very large tree with many similar-looking subexpressions, then I'm not sure what to do except cut it into reasonable subtrees.

AFAICT, it's not just sorting that's a problem but also grouping? Also, I think the sheer depth of the createSCEV recursion is itself a problem.

I don't see any reason not to limit the size of SCEV expressions, but I also don't have a brilliant idea for how to do it at the moment (other than the obvious depth cutoff).

-Andy
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20130730/8c20dfb6/attachment.html>
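To make the "exponential in the size of the DAG" point concrete, here is a small hypothetical model (plain Python, not the LLVM code): a naive structural comparator like the one described recurses into both operands of every node without memoization, so a shared subexpression is revisited once per path leading to it.

```python
# Toy model of a deep structural comparator with no memoization.
class Node:
    def __init__(self, left=None, right=None):
        self.left, self.right = left, right

calls = 0

def compare(a, b):
    """Recurse into both operands; equal subtrees force visiting the right side too."""
    global calls
    calls += 1
    if a.left is None and b.left is None:
        return 0                      # both leaves: equal
    if (a.left is None) != (b.left is None):
        return -1 if a.left is None else 1
    c = compare(a.left, b.left)
    if c != 0:
        return c
    return compare(a.right, b.right)  # left sides tied, so compare right sides

def chain(depth):
    """x, x+x, (x+x)+(x+x), ...: a DAG of depth+1 nodes with 2^depth paths."""
    n = Node()
    for _ in range(depth):
        n = Node(n, n)                # both operands share the same subnode
    return n

compare(chain(20), chain(20))
print(calls)  # 2097151 = 2^21 - 1 calls for a 21-node DAG
```

A DAG built with sharing (here, 128 chained MADs would be far worse) stays small in memory, but the number of comparator calls doubles at every level, which matches the 45-second `indvars` run on the attached IR.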
Hi Andy,

Thanks very much for looking into the problem.

In this particular test case, it seems most of the time is spent in the sorting, not the grouping.

Later, I realized that in this test case most of the expressions being compared have different lengths. I tried the following change in compare() for the case when the LHS's and RHS's types are the same:

==================================================================
--- lib/Analysis/ScalarEvolution.cpp	(revision 187379)
+++ lib/Analysis/ScalarEvolution.cpp	(working copy)
@@ -585,6 +585,9 @@
   case scAddExpr:
   case scMulExpr:
   case scSMaxExpr:
   case scUMaxExpr: {
     const SCEVNAryExpr *LC = cast<SCEVNAryExpr>(LHS);
     const SCEVNAryExpr *RC = cast<SCEVNAryExpr>(RHS);

     // Lexicographically compare n-ary expressions.
     unsigned LNumOps = LC->getNumOperands(), RNumOps = RC->getNumOperands();
+    if (LNumOps != RNumOps) {
+      return (int)LNumOps - (int)RNumOps;
+    }
     for (unsigned i = 0; i != LNumOps; ++i) {
       if (i >= RNumOps)
         return 1;

And the compile time is cut down from 45s to 1s.

This will give a different sorting result than the original algorithm. However, it looks like that shouldn't be a problem, according to this comment before the switch statement in compare():

  // Aside from the getSCEVType() ordering, the particular ordering
  // isn't very important except that it's beneficial to be consistent,
  // so that (a + b) and (b + a) don't end up as different expressions.

Does this solution seem OK?

If so, that solves the problem for cases where the expressions being compared usually have different lengths. However, the problem still exists if the expressions being compared are large, similar, and have the same length. Maybe I'll leave that until there's a test case for such situations?
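The essence of the patch can be modeled in a few lines (a sketch, not the LLVM source): ordering n-ary expressions by operand count first means two expressions of different lengths are decided in O(1), and only same-length expressions pay for the element-by-element walk. Any such ordering is acceptable here because, per the comment quoted above, the comparator only needs to be consistent.

```python
# Toy model of the comparator change: length check first, then lexicographic walk.
def compare_nary(lhs_ops, rhs_ops):
    # Cheap O(1) short-circuit -- the essence of the patch.
    if len(lhs_ops) != len(rhs_ops):
        return len(lhs_ops) - len(rhs_ops)
    # Only same-length expressions fall through to the expensive walk.
    for a, b in zip(lhs_ops, rhs_ops):
        if a != b:
            return -1 if a < b else 1
    return 0

print(compare_nary([1, 2, 3], [1, 2]))  # 1: longer expression sorts later
print(compare_nary([1, 2], [1, 3]))     # -1: same length, decided lexicographically
print(compare_nary([1, 2], [1, 2]))     # 0: equal
```

This is why the fix helps exactly when operand counts usually differ, and why the same-length case Xiaoyi mentions remains open: identical lengths still trigger the full walk.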
Thanks,
Xiaoyi

From: Andrew Trick [mailto:atrick at apple.com]
Sent: Tuesday, July 30, 2013 2:20 PM
To: Guo, Xiaoyi
Cc: LLVMdev at cs.uiuc.edu; Dan Gohman
Subject: Re: [LLVMdev] creating SCEV taking too long
On Tue, Jul 30, 2013 at 2:20 PM, Andrew Trick <atrick at apple.com> wrote:
> The sort calls SCEVComplexityCompare::compare() which can make multiple
> recursive calls for nodes with multiple operands. This looks like it could
> be a disaster for expressions that are not strictly trees--exponential in
> the size of the DAG.

Yes, this is why I suggested computing some deterministic "complexity hash" along the way and caching it in the SCEV. It would not help if the expressions were all the same, but if they differed only at the end, you wouldn't end up comparing every operand to get there.

If they are all the same, you are right that a cut-off is the only reasonable answer.
Or calculate "complexity" in a way that does not require operand-by-operand comparison (i.e., compute a "complexity" number instead of a hash as you build the SCEV). It's just trying to get a canonical sort here.

This would at least make the sort fast; grouping can't be made linear unless you are willing to trust the hash :)
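One way Daniel's suggestion could look (hypothetical sketch; the `Expr` class and its fields are illustrative, not LLVM's): score each node once at construction time from its children's already-computed scores, then sort by the cached score so the sort never walks operands.

```python
# Sketch: cache a deterministic "complexity number" per node as it is built.
class Expr:
    def __init__(self, kind, operands=()):
        self.kind = kind
        self.operands = operands
        # O(1) per node: children were scored when they were constructed.
        # (kind, operand count, total subtree size) gives a deterministic key.
        self.complexity = (kind, len(operands),
                           sum(op.complexity[2] + 1 for op in operands))

    def sort_key(self):
        return self.complexity

x = Expr(kind=1)
y = Expr(kind=1)
add = Expr(kind=2, operands=(x, y))
mul = Expr(kind=3, operands=(add, add))

exprs = sorted([mul, add, x], key=Expr.sort_key)
print([e.complexity for e in exprs])  # [(1, 0, 0), (2, 2, 2), (3, 2, 6)]
```

As Daniel notes, this makes the sort itself fast, but two structurally different expressions can share a score, so the N^2 grouping still needs a real comparison unless you trust the number as a hash.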
On Jul 30, 2013, at 4:10 PM, Guo, Xiaoyi <Xiaoyi.Guo at amd.com> wrote:
> In this particular test case, it seems most of the time is spent in the sorting, not the grouping.
>
> [...] I tried the following change in compare() when the LHS's and RHS's types are the same:
>
> +    if (LNumOps != RNumOps) {
> +      return (int)LNumOps - (int)RNumOps;
> +    }
>
> And the compile time is cut down from 45s to 1s.

Committed r187475. Thanks!

-Andy

> This will give a different sorting result than the original algorithm. However, it looks like that shouldn't be a problem, according to this comment before the switch statement in compare():
>
>   // Aside from the getSCEVType() ordering, the particular ordering
>   // isn't very important except that it's beneficial to be consistent,
>   // so that (a + b) and (b + a) don't end up as different expressions.
>
> Does this solution seem OK?
>
> If so, that solves the problem for cases where the expressions being compared usually have different lengths. However, the problem still exists if the expressions being compared are large, similar, and have the same length. Maybe I'll leave that until there's a test case for such situations?
On Jul 30, 2013, at 6:46 PM, Daniel Berlin <dberlin at dberlin.org> wrote:
> Yes, this is why I suggested computing some deterministic "complexity hash"
> along the way and caching it in the SCEV. It would not help if they were all
> the same, but if they were different only at the end, you wouldn't end up
> comparing every operand to get there.
> If they are all the same, you are right that a cut-off is the only
> reasonable answer.
>
> Or calculate "complexity" in a way that does not require operand-by-operand
> comparison (i.e., compute a "complexity" number instead of a hash as you
> build the SCEV). It's just trying to get a canonical sort here.
>
> This would at least make the sort fast; grouping can't be made linear
> unless you are willing to trust the hash :)

That would be ideal. I wish it were obvious to me how to implement it. Xiaoyi's simple fix handled this scenario and was totally consistent with the current implementation. It seems we're not yet running into the issue of visiting all paths in a DAG.

I don't want to discourage a more general fix/redesign, though. We could probably recover more compile time in many cases.

-Andy
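For completeness, the "obvious depth cutoff" mentioned earlier in the thread could be sketched like this (hypothetical model, not an LLVM patch; the tuple encoding and `MAX_DEPTH` value are invented for illustration): stop recursing past a fixed depth and return an opaque "couldn't analyze" node, which bounds expression size at the cost of precision.

```python
# Sketch of a depth cutoff during expression-tree construction.
MAX_DEPTH = 8

def build_scev(value, depth=0):
    if depth > MAX_DEPTH:
        return ("unknown", value)      # give up: treat the rest as opaque
    if isinstance(value, tuple) and value[0] == "add":
        _, lhs, rhs = value
        return ("add", build_scev(lhs, depth + 1), build_scev(rhs, depth + 1))
    return ("leaf", value)

def node_depth(t):
    if t[0] == "add":
        return 1 + max(node_depth(t[1]), node_depth(t[2]))
    return 1

# A 20-deep chain of adds gets truncated rather than fully expanded.
v = "x"
for _ in range(20):
    v = ("add", v, v)
s = build_scev(v)
print(node_depth(s))  # 10: nine "add" levels plus one "unknown" leaf
```

The trade-off is exactly the one debated above: the cutoff caps both build time and comparator work, but every expression hidden behind an "unknown" node becomes invisible to later SCEV-based optimizations.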