thr3ads.net - llvm dev - [llvm-dev] [Polly] Reduced code analyzability moving from LLVM 3.9.0 to 5.0.1 [Mar 2018]

Björn Ruytenberg via llvm-dev

2018-Mar-08 16:37 UTC

[llvm-dev] [Polly] Reduced code analyzability moving from LLVM 3.9.0 to 5.0.1

Hi,

Recently I was looking at the potential of optimizing through Polly. The
code that I am trying to optimize [1] adjusts a picture's colors to get
an Instagram-like effect.

To improve code analyzability on LLVM 3.9.0, I made the following changes:
- Improve SCoP detection through -polly-process-unprofitable
- Enable outer loop vectorization through -polly-vectorizer=stripmine,
disabling timeouts with -polly-dependences-computeout=0
- Avoid sign extensions by replacing all 32-bit ints with longs, as
Polly seems to model using 64-bit loop counters
- Avoid interrupting control flow through -ffast-math and moving mallocs
to the top of the code
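
The two source-level changes in the list above (long loop counters and hoisted allocations) can be sketched roughly as follows. This is only an illustration: `brighten` is a hypothetical helper, not a function from localcolorcorrection.c, and it assumes an LP64 target where long is 64 bits wide.

```c
#include <stdlib.h>

/* Hypothetical helper, not taken from localcolorcorrection.c. */
float *brighten(const float *src, long n, float gain)
{
    /* The allocation happens once, before the loop, so the loop body
       contains no calls and no early exits. */
    float *dst = malloc((size_t)n * sizeof *dst);
    if (dst == NULL)
        return NULL;

    /* long counter: on an LP64 target it already has pointer width, so
       indexing needs no sign extension from 32 to 64 bits. */
    for (long i = 0; i < n; i++)
        dst[i] = src[i] * gain;

    return dst;
}
```

The caller owns the returned buffer and is expected to free() it.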

So to compile, we have:
    clang -I. -O3 -g3 -Wall -Wextra -std=c99 -D_POSIX_C_SOURCE=200000L \
        -ffast-math -mllvm -polly -mllvm -polly-dot \
        -mllvm -polly-process-unprofitable -mllvm -polly-vectorizer=stripmine \
        -mllvm -polly-dependences-computeout=0 \
        -c -o localcolorcorrection.o localcolorcorrection.c

Unfortunately, LLVM 5.0.1 analyzes the CFG differently from LLVM 3.9.0.
Version 3.9.0 analyzes most of the CFG [2], but 5.0.1 leaves large parts
of the hot paths untouched due to "non affine access functions" [3].

What I have tried:
    - Moving Polly to different positions in the LLVM pass pipeline
(-polly-position=early vs. -polly-position=before-vectorizer). The
latter option adds one large basic block, but otherwise doesn't seem to
analyze the hot paths.
    - Setting -polly-delicm-compute-known=true and
-polly-delicm-overapproximate-writes=true. This doesn't seem to have any
effect on the hot paths.

Can anyone give me some pointers on how to fix this? Or could this be a
regression in Polly?

Thanks!

[1] https://nautilus.bjornweb.nl/files/localcolorcorrection.c
[2] https://nautilus.bjornweb.nl/files/polly390-cfg.pdf
[3] https://nautilus.bjornweb.nl/files/polly501-cfg.pdf

-- 
Kind regards,
  Björn Ruytenberg
  https://bjornweb.nl

Alexandre Isoard via llvm-dev

2018-Mar-08 20:32 UTC

[llvm-dev] [Polly] Reduced code analyzability moving from LLVM 3.9.0 to 5.0.1

Hi,

Polly can only analyze (multidimensional) affine memory accesses. Polynomial
memory accesses are not handled well, and I see your code has some linearized
arrays (which lead to polynomial subscripts).
Luckily, Polly has a delinearizer that tries to recover multidimensional
accesses from linearized ones, but it does not always succeed (especially
if earlier transformations "optimize" the index expression).
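
To make the distinction concrete, here is a minimal sketch of the two access styles. The function names and the `gain` parameter are made up for illustration and do not come from localcolorcorrection.c. In the first function the subscript multiplies a loop counter by a runtime parameter, which is what makes it polynomial; the second states the dimensions explicitly through a C99 variable-length array parameter. The compiler still lowers the second form to a flat address computation, so I would not assume it changes what Polly detects; it only shows the shape the delinearizer is trying to recover.

```c
#include <stddef.h>

/* Linearized: one flat buffer, index computed by hand. The subscript
   i * width + j contains the product of a loop counter and a parameter,
   so it is polynomial rather than affine. */
void scale_linearized(long height, long width, float *img, float gain)
{
    for (long i = 0; i < height; i++)
        for (long j = 0; j < width; j++)
            img[i * width + j] *= gain;
}

/* Multidimensional: the same memory viewed as img[height][width] through
   a C99 variable-length array parameter. Each subscript (i and j) is
   affine on its own. */
void scale_multidim(long height, long width, float img[height][width],
                    float gain)
{
    for (long i = 0; i < height; i++)
        for (long j = 0; j < width; j++)
            img[i][j] *= gain;
}
```

Both functions touch exactly the same elements in the same order; only the way the subscript is written differs.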

That might be the problem here. You could check whether the SCEVs of the
memory accesses look "nice".
I don't know how robust delinearization is in general. That is, does it
survive most LLVM transformations?



-- 
*Alexandre Isoard*
