thr3ads.net - llvm dev - [llvm-dev] [SCEV] getMulExpr could be extremely slow when creating SCEVs for a long chain of add/mul instructions [Aug 2016]

If this information is useful, please help other people find it:
Share via:

Huang, Li1 via llvm-dev

2016-Aug-03 18:46 UTC

[llvm-dev] [SCEV] getMulExpr could be extremely slow when creating SCEVs for a long chain of add/mul instructions

Hi,

I'm working on a slow-compile problem caused by SCEV (PR28830), and I need
your suggestions on how to fix it. The loop below causes
ScalarEvolution::getMulExpr to hang.

int get(unsigned n)
{
    unsigned i, j, mult = 1;
    for (i = 0; i < 1; i++) {
        for (j = 0; j < 30; j++) {
            mult *= n++;
        }
    }
    return mult;
}

the inner loop is completed unrolled (by 30) to a long chain of "add"
and "mult" instructions, then indvars follows loop-unroll and triggers
SCEV creations for this long instruction chain. When it comes to getMulExpr, it
tries to fold multiplications of AddRecs of the same induction variable
recursively, this could be very slow if the expression tree is deep.

Code responsible for this problem starts here (in ScalarEvolution.cpp):

   // Okay, if there weren't any loop invariants to be folded, check to see
if
    // there are multiple AddRec's with the same loop induction variable
being
    // multiplied together.  If so, we can fold them.

    // {A1,+,A2,+,...,+,An}<L> * {B1,+,B2,+,...,+,Bn}<L>
    // = {x=1 in [ sum y=x..2x [ sum z=max(y-x, y-n)..min(x,n) [
    //       choose(x, 2x)*choose(2x-y, x-z)*A_{y-z}*B_z
    //   ]]],+,...up to x=2n}.
    // Note that the arguments to choose() are always integers with values
    // known at compile time, never SCEV objects.
    //
    // The implementation avoids pointless extra computations when the two
    // addrec's are of different length (mathematically, it's equivalent
to
    // an infinite stream of zeros on the right).
    bool OpsModified = false;
    for (unsigned OtherIdx = Idx+1;
         OtherIdx != Ops.size() &&
isa<SCEVAddRecExpr>(Ops[OtherIdx]);
...
...

One way I thought about to fix this is to have a threshold for the number of
terms in each AddRec when folding them. For example, we only fold AddRec1 *
AddRec2 when  (#terms in AddRec1 + #terms in AddRec2) < threshold. I tried
setting the threshold to 8, and it stops early for this case. Another way is to
limit the depth of this folding, but this might require change of API.
-------------- next part --------------
An HTML attachment was scrubbed...
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20160803/cb2b69d9/attachment.html>

Kevin Choi via llvm-dev

2016-Aug-03 19:16 UTC

head link

[llvm-dev] [SCEV] getMulExpr could be extremely slow when creating SCEVs for a long chain of add/mul instructions

Correct me if I'm wrong, is the code trying to expand geometric series into
polynomial expression? (that doesn't sound like a very good thing to do, it
would spend so much time computing coefficients)

Wouldn't this be better to transform as below:
mult *= n++   // 30 times

; into
mult = mult (n)...(n+29)
       = mult 1...(n+29)/((1)...(n-1))
       = mult * geo_sum(n+29) / geo_sum(n-1)  ; is this more expensive than
leaving as geo series?

Regards,
Kevin

On 3 August 2016 at 11:46, Huang, Li1 via llvm-dev <llvm-dev at
lists.llvm.org>
wrote:
> Hi,
>
>
>
> I’m working on a slow-compile problem caused by SCEV (PR28830), and I need
> your suggestions on how to fix it. The loop below causes
> ScalarEvolution::getMulExpr to hang.
>
>
>
> int get(unsigned n)
>
> {
>
>     unsigned i, j, mult = 1;
>
>     for (i = 0; i < 1; i++) {
>
>         for (j = 0; j < 30; j++) {
>
>             mult *= n++;
>
>         }
>
>     }
>
>     return mult;
>
> }
>
>
>
> the inner loop is completed unrolled (by 30) to a long chain of “add” and
> “mult” instructions, then indvars follows loop-unroll and triggers SCEV
> creations for this long instruction chain. When it comes to getMulExpr, it
> tries to fold multiplications of AddRecs of the same induction variable
> recursively, this could be very slow if the expression tree is deep.
>
>
>
> Code responsible for this problem starts here (in ScalarEvolution.cpp):
>
>
>
>    // Okay, if there weren't any loop invariants to be folded, check to
> see if
>
>     // there are multiple AddRec's with the same loop induction
variable
> being
>
>     // multiplied together.  If so, we can fold them.
>
>
>
>     // {A1,+,A2,+,...,+,An}<L> * {B1,+,B2,+,...,+,Bn}<L>
>
>     // = {x=1 in [ sum y=x..2x [ sum z=max(y-x, y-n)..min(x,n) [
>
>     //       choose(x, 2x)*choose(2x-y, x-z)*A_{y-z}*B_z
>
>     //   ]]],+,...up to x=2n}.
>
>     // Note that the arguments to choose() are always integers with values
>
>     // known at compile time, never SCEV objects.
>
>     //
>
>     // The implementation avoids pointless extra computations when the two
>
>     // addrec's are of different length (mathematically, it's
equivalent to
>
>     // an infinite stream of zeros on the right).
>
>     bool OpsModified = false;
>
>     for (unsigned OtherIdx = Idx+1;
>
>          OtherIdx != Ops.size() &&
isa<SCEVAddRecExpr>(Ops[OtherIdx]);
>
> ...
>
> ...
>
>
>
> One way I thought about to fix this is to have a threshold for the number
> of terms in each AddRec when folding them. For example, we only fold
> AddRec1 * AddRec2 when  (#terms in AddRec1 + #terms in AddRec2) <
> threshold. I tried setting the threshold to 8, and it stops early for this
> case. Another way is to limit the depth of this folding, but this might
> require change of API.
>
> _______________________________________________
> LLVM Developers mailing list
> llvm-dev at lists.llvm.org
> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
>
>-------------- next part --------------
An HTML attachment was scrubbed...
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20160803/37c56333/attachment.html>

Mehdi Amini via llvm-dev

2016-Aug-03 19:16 UTC

head link

[llvm-dev] [SCEV] getMulExpr could be extremely slow when creating SCEVs for a long chain of add/mul instructions

> On Aug 3, 2016, at 11:46 AM, Huang, Li1 via llvm-dev <llvm-dev at
lists.llvm.org> wrote:
> 
> Hi,
>  
> I’m working on a slow-compile problem caused by SCEV (PR28830), and I need
your suggestions on how to fix it. The loop below causes
ScalarEvolution::getMulExpr to hang.
I believe this is a dup of https://llvm.org/bugs/show_bug.cgi?id=18606

See also this patch: https://reviews.llvm.org/D3127

— 
Mehdi

>  
> int get(unsigned n)
> {
>     unsigned i, j, mult = 1;
>     for (i = 0; i < 1; i++) {
>         for (j = 0; j < 30; j++) {
>             mult *= n++;
>         }
>     }
>     return mult;
> }
>  
> the inner loop is completed unrolled (by 30) to a long chain of “add” and
“mult” instructions, then indvars follows loop-unroll and triggers SCEV
creations for this long instruction chain. When it comes to getMulExpr, it tries
to fold multiplications of AddRecs of the same induction variable recursively,
this could be very slow if the expression tree is deep.
>  
> Code responsible for this problem starts here (in ScalarEvolution.cpp):
>  
>    // Okay, if there weren't any loop invariants to be folded, check to
see if
>     // there are multiple AddRec's with the same loop induction
variable being
>     // multiplied together.  If so, we can fold them.
>  
>     // {A1,+,A2,+,...,+,An}<L> * {B1,+,B2,+,...,+,Bn}<L>
>     // = {x=1 in [ sum y=x..2x [ sum z=max(y-x, y-n)..min(x,n) [
>     //       choose(x, 2x)*choose(2x-y, x-z)*A_{y-z}*B_z
>     //   ]]],+,...up to x=2n}.
>     // Note that the arguments to choose() are always integers with values
>     // known at compile time, never SCEV objects.
>     //
>     // The implementation avoids pointless extra computations when the two
>     // addrec's are of different length (mathematically, it's
equivalent to
>     // an infinite stream of zeros on the right).
>     bool OpsModified = false;
>     for (unsigned OtherIdx = Idx+1;
>          OtherIdx != Ops.size() &&
isa<SCEVAddRecExpr>(Ops[OtherIdx]);
> ...
> ...
>  
> One way I thought about to fix this is to have a threshold for the number
of terms in each AddRec when folding them. For example, we only fold AddRec1 *
AddRec2 when  (#terms in AddRec1 + #terms in AddRec2) < threshold. I tried
setting the threshold to 8, and it stops early for this case. Another way is to
limit the depth of this folding, but this might require change of API.
> _______________________________________________
> LLVM Developers mailing list
> llvm-dev at lists.llvm.org <mailto:llvm-dev at lists.llvm.org>
> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
<http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev>-------------- next part --------------
An HTML attachment was scrubbed...
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20160803/04bfe259/attachment.html>

Sanjoy Das via llvm-dev

2016-Aug-03 19:23 UTC

head link

[llvm-dev] [SCEV] getMulExpr could be extremely slow when creating SCEVs for a long chain of add/mul instructions

Hi,

I generally don't like cutoffs / "max depths", but I can't
think of a
better solution here (other than perhaps not doing this simplification
at all, but that will affect things that depend on this
simplification).  I'd be okay with adding a "SimplificationDepth"
type
cl::opt to SCEV that controls the point where we bail out of
simplifying further.

-- Sanjoy

Sanjoy Das via llvm-dev

2016-Aug-03 19:24 UTC

head link

[llvm-dev] [SCEV] getMulExpr could be extremely slow when creating SCEVs for a long chain of add/mul instructions

Hi Kevin,

Kevin Choi via llvm-dev wrote:
 > Correct me if I'm wrong, is the code trying to expand geometric series
 > into polynomial expression? (that doesn't sound like a very good thing
 > to do, it would spend so much time computing coefficients)
 >
 > Wouldn't this be better to transform as below:
 > mult *= n++   // 30 times
 >
 > ; into
 > mult = mult (n)...(n+29)
 >         = mult 1...(n+29)/((1)...(n-1))
 >         = mult * geo_sum(n+29) / geo_sum(n-1)  ; is this more expensive
 > than leaving as geo series?

Not sure what you mean by geo_sum -- is geo_sum(n) == 1 * 2 * ... * n?
If so, SCEV does not have a node for that.

-- Sanjoy

Huang, Li1 via llvm-dev

2016-Aug-03 19:53 UTC

head link

[llvm-dev] [SCEV] getMulExpr could be extremely slow when creating SCEVs for a long chain of add/mul instructions

Hi Mehdi,

Thank you for finding this dup, I did some search but wasn’t able to locate a
dup.

-Li Huang

From: mehdi.amini at apple.com [mailto:mehdi.amini at apple.com]
Sent: Wednesday, August 3, 2016 12:17 PM
To: Huang, Li1 <li1.huang at intel.com>
Cc: llvm-dev at lists.llvm.org
Subject: Re: [llvm-dev] [SCEV] getMulExpr could be extremely slow when creating
SCEVs for a long chain of add/mul instructions

On Aug 3, 2016, at 11:46 AM, Huang, Li1 via llvm-dev <llvm-dev at
lists.llvm.org<mailto:llvm-dev at lists.llvm.org>> wrote:

Hi,

I’m working on a slow-compile problem caused by SCEV (PR28830), and I need your
suggestions on how to fix it. The loop below causes ScalarEvolution::getMulExpr
to hang.

I believe this is a dup of https://llvm.org/bugs/show_bug.cgi?id=18606

See also this patch: https://reviews.llvm.org/D3127

—
Mehdi

int get(unsigned n)
{
    unsigned i, j, mult = 1;
    for (i = 0; i < 1; i++) {
        for (j = 0; j < 30; j++) {
            mult *= n++;
        }
    }
    return mult;
}

the inner loop is completed unrolled (by 30) to a long chain of “add” and “mult”
instructions, then indvars follows loop-unroll and triggers SCEV creations for
this long instruction chain. When it comes to getMulExpr, it tries to fold
multiplications of AddRecs of the same induction variable recursively, this
could be very slow if the expression tree is deep.

Code responsible for this problem starts here (in ScalarEvolution.cpp):

   // Okay, if there weren't any loop invariants to be folded, check to see
if
    // there are multiple AddRec's with the same loop induction variable
being
    // multiplied together.  If so, we can fold them.

    // {A1,+,A2,+,...,+,An}<L> * {B1,+,B2,+,...,+,Bn}<L>
    // = {x=1 in [ sum y=x..2x [ sum z=max(y-x, y-n)..min(x,n) [
    //       choose(x, 2x)*choose(2x-y, x-z)*A_{y-z}*B_z
    //   ]]],+,...up to x=2n}.
    // Note that the arguments to choose() are always integers with values
    // known at compile time, never SCEV objects.
    //
    // The implementation avoids pointless extra computations when the two
    // addrec's are of different length (mathematically, it's equivalent
to
    // an infinite stream of zeros on the right).
    bool OpsModified = false;
    for (unsigned OtherIdx = Idx+1;
         OtherIdx != Ops.size() &&
isa<SCEVAddRecExpr>(Ops[OtherIdx]);
...
...

One way I thought about to fix this is to have a threshold for the number of
terms in each AddRec when folding them. For example, we only fold AddRec1 *
AddRec2 when  (#terms in AddRec1 + #terms in AddRec2) < threshold. I tried
setting the threshold to 8, and it stops early for this case. Another way is to
limit the depth of this folding, but this might require change of API.
_______________________________________________
LLVM Developers mailing list
llvm-dev at lists.llvm.org<mailto:llvm-dev at lists.llvm.org>
http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev

-------------- next part --------------
An HTML attachment was scrubbed...
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20160803/b29c4d60/attachment-0001.html>

Huang, Li1 via llvm-dev

2016-Aug-03 20:00 UTC

head link

[llvm-dev] [SCEV] getMulExpr could be extremely slow when creating SCEVs for a long chain of add/mul instructions

Hi Sanjoy,

Thank you for your suggestion. Looks like someone is already working on a
solution in this direction, as Mehdi pointed out:
https://reviews.llvm.org/D3127. But this patch is old and was not reviewed until
recently.

- Li Huang

-----Original Message-----
From: Sanjoy Das [mailto:sanjoy at playingwithpointers.com] 
Sent: Wednesday, August 3, 2016 12:23 PM
To: Huang, Li1 <li1.huang at intel.com>
Cc: llvm-dev at lists.llvm.org
Subject: Re: [llvm-dev] [SCEV] getMulExpr could be extremely slow when creating
SCEVs for a long chain of add/mul instructions

Hi,

I generally don't like cutoffs / "max depths", but I can't
think of a better solution here (other than perhaps not doing this
simplification at all, but that will affect things that depend on this
simplification).  I'd be okay with adding a "SimplificationDepth"
type cl::opt to SCEV that controls the point where we bail out of simplifying
further.

-- Sanjoy

Apparently Analagous Threads

Search for more maybe matching threads

llvm dev - Aug 2016 - [SCEV] getMulExpr could be extremely slow when creating SCEVs for a long chain of add/mul instructions

[llvm-dev] [SCEV] getMulExpr could be extremely slow when creating SCEVs for a long chain of add/mul instructions

[llvm-dev] [SCEV] getMulExpr could be extremely slow when creating SCEVs for a long chain of add/mul instructions

[llvm-dev] [SCEV] getMulExpr could be extremely slow when creating SCEVs for a long chain of add/mul instructions

[llvm-dev] [SCEV] getMulExpr could be extremely slow when creating SCEVs for a long chain of add/mul instructions

[llvm-dev] [SCEV] getMulExpr could be extremely slow when creating SCEVs for a long chain of add/mul instructions

[llvm-dev] [SCEV] getMulExpr could be extremely slow when creating SCEVs for a long chain of add/mul instructions

[llvm-dev] [SCEV] getMulExpr could be extremely slow when creating SCEVs for a long chain of add/mul instructions

Apparently Analagous Threads