Björn Ruytenberg via llvm-dev
2018-Mar-08 16:37 UTC
[llvm-dev] [Polly] Reduced code analyzability moving from LLVM 3.9.0 to 5.0.1
Hi,

Recently I was looking at the potential of optimizing through Polly. The code that I am trying to optimize [1] adjusts a picture's colors to get an Instagram-like effect.

To improve code analyzability on LLVM 3.9.0, I made the following changes:
- Improve SCoP detection through -polly-process-unprofitable
- Enable outer loop vectorization through -polly-vectorizer=stripmine, disabling timeouts with -polly-dependences-computeout=0
- Avoid sign extensions by replacing all 32-bit ints with longs, as Polly seems to model using 64-bit loop counters
- Avoid interrupting control flow through -ffast-math and moving mallocs to the top of the code

So to compile, we have:

    clang -I. -O3 -g3 -Wall -Wextra -std=c99 -D_POSIX_C_SOURCE=200000L
      -ffast-math -mllvm -polly -mllvm -polly-dot
      -mllvm -polly-process-unprofitable -mllvm -polly-vectorizer=stripmine
      -mllvm -polly-dependences-computeout=0
      -c -o localcolorcorrection.o localcolorcorrection.c

Unfortunately, LLVM 5.0.1 generates different results in analyzing the CFG compared to LLVM 3.9.0. The latter version analyzes most of the CFG [2], but 5.0.1 leaves large parts of the hot paths untouched due to "non affine access functions" [3].

What I have tried:
- Moving Polly to different positions in the LLVM pass pipeline (-polly-position=early vs. -polly-position=before-vectorizer). The latter option adds one large basic block, but otherwise doesn't seem to analyze the hot paths.
- Setting -polly-delicm-compute-known=true and -polly-delicm-overapproximate-writes=true. This doesn't seem to have any effect on the hot paths.

Can anyone give me some pointers on how to fix this? Or could this be a regression in Polly?

Thanks!

[1] https://nautilus.bjornweb.nl/files/localcolorcorrection.c
[2] https://nautilus.bjornweb.nl/files/polly390-cfg.pdf
[3] https://nautilus.bjornweb.nl/files/polly501-cfg.pdf

--
Kind regards,
Björn Ruytenberg
https://bjornweb.nl
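The int-to-long change mentioned in the message above can be illustrated with a minimal sketch. The function names and the brighten operation are hypothetical and are not taken from localcolorcorrection.c. The assumption is a typical 64-bit (LP64) target, where a 32-bit int index has to be widened to pointer width before it is used in address arithmetic, while a long index already has that width. Whether the widening actually survives to the point where Polly runs depends on the earlier passes, so this only shows the source-level difference.

```c
/* Hypothetical example: 32-bit loop counter.
 * On an LP64 target the index i is 32 bits wide and must be
 * sign-extended to 64 bits before it can index px. */
void brighten_int(float *px, int n, float delta) {
    for (int i = 0; i < n; i++)
        px[i] += delta;
}

/* Same loop with a long counter (64 bits on LP64), so the index
 * already has pointer width and needs no extension. */
void brighten_long(float *px, long n, float delta) {
    for (long i = 0; i < n; i++)
        px[i] += delta;
}
```

Both functions compute the same result; only the width of the loop counter differs.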
Alexandre Isoard via llvm-dev
2018-Mar-08 20:32 UTC
[llvm-dev] [Polly] Reduced code analyzability moving from LLVM 3.9.0 to 5.0.1
Hi,

Polly can only analyze (multidimensional) affine memory accesses. Polynomial memory accesses are not handled well, and I see your code has some linearized arrays, which lead to polynomials.

Luckily, Polly has a delinearizer that tries to recover multidimensional accesses from linearized ones, but it does not always work (especially if earlier transformations "optimize" the linearized access). That might be the problem here; you could check whether the SCEVs of the memory accesses look "nice".

I don't know how good the delinearization is in general. That is, does it survive most LLVM transformations?

--
Alexandre Isoard
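To make the affine versus polynomial distinction concrete, here is a minimal sketch in C. The function names, the scaling operation, and the FIXED_WIDTH constant are hypothetical and are not taken from localcolorcorrection.c. With a compile-time constant width, the subscript y * FIXED_WIDTH + x is affine in the loop counters. With a runtime width, the subscript y * width + x contains a product of a parameter and a loop counter, which is the polynomial form that has to be delinearized before it can be modeled as a two-dimensional access.

```c
#define FIXED_WIDTH 4 /* hypothetical compile-time row length */

/* Subscript y * FIXED_WIDTH + x: constant coefficients only,
 * so it is affine in y and x. */
void scale_fixed_width(float *img, long height, float gain) {
    for (long y = 0; y < height; y++)
        for (long x = 0; x < FIXED_WIDTH; x++)
            img[y * FIXED_WIDTH + x] *= gain;
}

/* Subscript y * width + x: width is only known at run time, so the
 * term y * width multiplies a parameter by a loop counter. This is
 * the linearized (polynomial) form of a 2D access img[y][x]. */
void scale_linearized(float *img, long height, long width, float gain) {
    for (long y = 0; y < height; y++)
        for (long x = 0; x < width; x++)
            img[y * width + x] *= gain;
}
```

Both functions touch the same elements when width equals FIXED_WIDTH; the difference lies only in how the subscript looks to an analysis that requires affine expressions.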