thr3ads.net - llvm dev - [LLVMdev] Hoisting elements of array argument into registers [Nov 2010]

If this information is useful, please help other people find it:
Share via:

Max Bolingbroke

2010-Nov-05 12:56 UTC

[LLVMdev] Hoisting elements of array argument into registers

Hi,

The Glasgow Haskell Compiler LLVM backend is generating a lot of code which
appears to be optimised  by LLVM quite poorly. The problem is demonstrated by
this C source code:

int wf(int sp[]) {
    if (sp[0] == 0) {
        return sp[2] + sp[3];
    } else {
        sp[3] = sp[3] + (sp[1] * 5);
        sp[2] = (sp[2] + sp[0]) + 1;
        sp[1] = sp[1] - 1;
        sp[0] = sp[0] - 1;
        return wf(sp);
    }
}

int g(int a) {
    int sp[] = { a, a, 0, 0 };
    return wf(sp);
}

int main(void) {
    printf("%d\n", g(5));
    return 0;
}

As you can see, wf takes its arguments via a pointer. Nonetheless, we would do
like to loop  optimisations on wf -- in particular, we would like to eliminate
the duplicate computations in sp[0] and sp[1] (GHC's own optimiser does not
do
any classical loop optimisations like this).

Unfortunately, LLVM (opt -O3: I tried v2.7 and HEAD) doesn't seem to get
very
far. It eliminates the tail call for wf() correctly, but the arguments are
carried between each iteration of the loop by storing intothe sp array. It
would be better if the arguments were kept in registers during the loop and
only stored  into the array upon exit. (These stores could then be eliminated
easily since there is no later load from sp).

If I manually rewrite the code to replace the array argument with its composite
scalars in the following manner LLVM obviously does the right thing:

int wf(int a, int b, int c, int d) {
    if (a == 0) {
        return c + d;
    } else {
        return wf(a - 1, b - 1, (c + a) + 1, d + (b * 5));
    }
}

int g(int a) {
    return wf(a, a, 0, 0);
}

int main(void) {
    printf("%d\n", g(5));
    return 0;
}

The question is, is there any way to get LLVM to do this rewriting for us? It
seems like this would be a generally useful optimisation.

I guess you could also achieve this by sinking the stores to sp into next loop
iteration/the loop exit,  where they will meet the corresponding loads and so
good things will happen. The current instruction sinking pass doesn't seem
to
get this case, though.

Thanks in advance for any help!
Max

p.s. Our real code is a little more complicated because sp is really a pointer
to the top of the Haskell runtimes stack (which is of unknown length), not a
4-element array, which might make the optimisation a bit harder. Perhaps LLVM
would have to promote just those elements of the input array which were
accessed in the loop into scalars.

Duncan Sands

2010-Nov-05 14:51 UTC

head link

[LLVMdev] Hoisting elements of array argument into registers

Hi Max,
> The Glasgow Haskell Compiler LLVM backend is generating a lot of code which
> appears to be optimised  by LLVM quite poorly. The problem is demonstrated
by
> this C source code:
if I run this at -O3 under llvm-gcc from top-of-tree on x86-64 linux then
(1) it computes that g(5) is equal to 95, and main is transformed to just print
95.
(2) g is transformed into code with no loop, i.e. the result is computed
directly from the value of "a" in straight line code (there is a
special
case when a is zero).
(3) wf is not transformed as well: values are loaded from memory and stored
back on each loop.  And there is still a loop.

I see the same with clang.  I'm not sure why the optimizers do so much
better
when they can see that sp is a local array (the special initial values don't
matter).

Ciao,

Duncan.

>
> int wf(int sp[]) {
>      if (sp[0] == 0) {
>          return sp[2] + sp[3];
>      } else {
>          sp[3] = sp[3] + (sp[1] * 5);
>          sp[2] = (sp[2] + sp[0]) + 1;
>          sp[1] = sp[1] - 1;
>          sp[0] = sp[0] - 1;
>          return wf(sp);
>      }
> }
>
> int g(int a) {
>      int sp[] = { a, a, 0, 0 };
>      return wf(sp);
> }
>
> int main(void) {
>      printf("%d\n", g(5));
>      return 0;
> }
>
> As you can see, wf takes its arguments via a pointer. Nonetheless, we would
do
> like to loop  optimisations on wf -- in particular, we would like to
eliminate
> the duplicate computations in sp[0] and sp[1] (GHC's own optimiser does
not do
> any classical loop optimisations like this).
>
> Unfortunately, LLVM (opt -O3: I tried v2.7 and HEAD) doesn't seem to
get very
> far. It eliminates the tail call for wf() correctly, but the arguments are
> carried between each iteration of the loop by storing intothe sp array. It
> would be better if the arguments were kept in registers during the loop and
> only stored  into the array upon exit. (These stores could then be
eliminated
> easily since there is no later load from sp).
>
> If I manually rewrite the code to replace the array argument with its
composite
> scalars in the following manner LLVM obviously does the right thing:
>
> int wf(int a, int b, int c, int d) {
>      if (a == 0) {
>          return c + d;
>      } else {
>          return wf(a - 1, b - 1, (c + a) + 1, d + (b * 5));
>      }
> }
>
> int g(int a) {
>      return wf(a, a, 0, 0);
> }
>
> int main(void) {
>      printf("%d\n", g(5));
>      return 0;
> }
>
> The question is, is there any way to get LLVM to do this rewriting for us?
It
> seems like this would be a generally useful optimisation.
>
> I guess you could also achieve this by sinking the stores to sp into next
loop
> iteration/the loop exit,  where they will meet the corresponding loads and
so
> good things will happen. The current instruction sinking pass doesn't
seem to
> get this case, though.
>
> Thanks in advance for any help!
> Max
>
> p.s. Our real code is a little more complicated because sp is really a
pointer
> to the top of the Haskell runtimes stack (which is of unknown length), not
a
> 4-element array, which might make the optimisation a bit harder. Perhaps
LLVM
> would have to promote just those elements of the input array which were
> accessed in the loop into scalars.
>
> _______________________________________________
> LLVM Developers mailing list
> LLVMdev at cs.uiuc.edu         http://llvm.cs.uiuc.edu
> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev

Duncan Sands

2010-Nov-05 16:51 UTC

head link

[LLVMdev] Hoisting elements of array argument into registers

> I see the same with clang.  I'm not sure why the optimizers do so much
better
> when they can see that sp is a local array (the special initial values
don't
> matter).
It is the scalar replacement of aggregates pass that puts everything into
registers when sp is a local array.  What happens is: the tail recursion in
wf is eliminated.  wf is inlined into g.  scalarrepl turns the (local) array
accesses in g into registers.  Once everything is in registers, later passes
clean everything up.

Ciao,

Duncan.

Possibly Parallel Threads

Search for more possibly parallel threads

llvm dev - Nov 2010 - [LLVMdev] Hoisting elements of array argument into registers

[LLVMdev] Hoisting elements of array argument into registers

[LLVMdev] Hoisting elements of array argument into registers

[LLVMdev] Hoisting elements of array argument into registers

Possibly Parallel Threads