thr3ads.net - llvm dev - [LLVMdev] [Polly] Update of Polly compile-time performance on LLVM test-suite [Jul 2013]


Star Tan

2013-Jul-30 17:03 UTC

[LLVMdev] [Polly] Update of Polly compile-time performance on LLVM test-suite

Hi Tobias and all Polly developers,


I have re-evaluated the Polly compile-time performance using the newest LLVM/Polly
source code. You can view the results at http://188.40.87.11:8000.


In particular, I evaluated our r187102 patch, which avoids expensive
failure-string operations in normal execution. I evaluated two cases for it:


Polly-NoCodeGen: clang -O3 -load LLVMPolly.so -mllvm -polly-optimizer=none
-mllvm -polly-code-generator=none
http://188.40.87.11:8000/db_default/v4/nts/16?compare_to=9&baseline=9&aggregation_fn=median
Polly-Opt: clang -O3 -load LLVMPolly.so -mllvm -polly
http://188.40.87.11:8000/db_default/v4/nts/18?compare_to=11&baseline=11&aggregation_fn=median


The "Polly-NoCodeGen" case is mainly used to compare the compile-time
performance of the polly-detect pass. As shown in the results, our patch
significantly reduces the compile-time overhead for some benchmarks, such as
tramp3dv4 (24.2%), simple_types_constant_folding (12.6%), oggenc (9.1%), and
loop_unroll (7.8%).


The "Polly-Opt" case is used to compare the overall compile-time
performance of Polly. Since our patch mainly affects the polly-detect pass,
it shows results similar to "Polly-NoCodeGen". As shown in the
results, it reduces the compile-time overhead of some benchmarks, such as
tramp3dv4 (23.7%), simple_types_constant_folding (12.9%), oggenc (8.3%), and
loop_unroll (7.5%).


Finally, I also evaluated the performance of the ScopBottomUp patch, which changes
the top-down scop detection into bottom-up scop detection. The results can be
viewed at:
pNoCodeGen-ScopBottomUp: clang -O3 -load LLVMPolly.so (v.s.
LLVMPolly-ScopBottomUp.so)  -mllvm -polly-optimizer=none -mllvm
-polly-code-generator=none
http://188.40.87.11:8000/db_default/v4/nts/21?compare_to=16&baseline=16&aggregation_fn=median
pOpt-ScopBottomUp: clang -O3 -load LLVMPolly.so (v.s. LLVMPolly-ScopBottomUp.so)
-mllvm -polly
http://188.40.87.11:8000/db_default/v4/nts/19?compare_to=18&baseline=18&aggregation_fn=median
(*Both of these results are based on LLVM r187116, which already includes the
r187102 patch discussed above.)


Please note that this patch causes some failures in the Polly tests, so
the data shown here cannot be regarded as reliable results. For example, this
patch significantly reduces the compile-time overhead of
SingleSource/Benchmarks/Shootout/nestedloop only because it regards the nested
loop as an invalid scop and skips all subsequent transformations and
optimizations. However, I evaluated it here to see its potential performance
impact. Based on the results shown at
http://188.40.87.11:8000/db_default/v4/nts/21?compare_to=16&baseline=16&aggregation_fn=median,
detecting scops bottom-up may further reduce Polly compile-time by
more than 10%.


Best wishes,
Star Tan

Tobias Grosser

2013-Jul-31 14:50 UTC


On 07/30/2013 10:03 AM, Star Tan wrote:
> Hi Tobias and all Polly developers,
>
> I have re-evaluated the Polly compile-time performance using newest
> LLVM/Polly source code.  You can view the results on
> http://188.40.87.11:8000
> <http://188.40.87.11:8000/db_default/v4/nts/16?compare_to=9&baseline=9&aggregation_fn=median>.
>
> Especially, I also evaluated our r187102 patch file that avoids expensive
> failure string operations in normal execution. Specifically, I evaluated
> two cases for it:
>
> Polly-NoCodeGen: clang -O3 -load LLVMPolly.so -mllvm
> -polly-optimizer=none -mllvm -polly-code-generator=none
> http://188.40.87.11:8000/db_default/v4/nts/16?compare_to=9&baseline=9&aggregation_fn=median
> Polly-Opt: clang -O3 -load LLVMPolly.so -mllvm -polly
> http://188.40.87.11:8000/db_default/v4/nts/18?compare_to=11&baseline=11&aggregation_fn=median
>
> The "Polly-NoCodeGen" case is mainly used to compare the compile-time
> performance for the polly-detect pass. As shown in the results, our
> patch file could significantly reduce the compile-time overhead for some
> benchmarks such as tramp3dv4
> <http://188.40.87.11:8000/db_default/v4/nts/16/graph?test.355=2> (24.2%),
> simple_types_constant_folding
> <http://188.40.87.11:8000/db_default/v4/nts/16/graph?test.366=2> (12.6%),
> oggenc
> <http://188.40.87.11:8000/db_default/v4/nts/16/graph?test.331=2> (9.1%),
> loop_unroll
> <http://188.40.87.11:8000/db_default/v4/nts/16/graph?test.235=2> (7.8%)

Very nice!

Though I am surprised to also see performance regressions. They are all
in very short-running kernels, so they may very well be measuring
noise. Is this really the case?

Also, it may be interesting to compare against the non-Polly case to see
how much overhead there still is due to our scop detection.

> The "Polly-opt" case is used to compare the whole compile-time
> performance of Polly. Since our patch file mainly affects the
> Polly-Detect pass, it shows similar performance to "Polly-NoCodeGen". As
> shown in results, it reduces the compile-time overhead of some
> benchmarks such as tramp3dv4
> <http://188.40.87.11:8000/db_default/v4/nts/16/graph?test.355=2> (23.7%),
> simple_types_constant_folding
> <http://188.40.87.11:8000/db_default/v4/nts/16/graph?test.366=2> (12.9%),
> oggenc
> <http://188.40.87.11:8000/db_default/v4/nts/16/graph?test.331=2> (8.3%),
> loop_unroll
> <http://188.40.87.11:8000/db_default/v4/nts/16/graph?test.235=2> (7.5%)
>
> At last, I also evaluated the performance of the ScopBottomUp patch that
> changes the up-down scop detection into bottom-up scop detection.
> Results can be viewed by:
> pNoCodeGen-ScopBottomUp: clang -O3 -load LLVMPolly.so (v.s.
> LLVMPolly-ScopBottomUp.so)  -mllvm -polly-optimizer=none -mllvm
> -polly-code-generator=none
> http://188.40.87.11:8000/db_default/v4/nts/21?compare_to=16&baseline=16&aggregation_fn=median
> pOpt-ScopBottomUp: clang -O3 -load LLVMPolly.so (v.s.
> LLVMPolly-ScopBottomUp.so)  -mllvm -polly
> http://188.40.87.11:8000/db_default/v4/nts/19?compare_to=18&baseline=18&aggregation_fn=median
> (*Both of these results are based on LLVM r187116, which has included
> the r187102 patch file that we discussed above)
>
> Please notice that this patch file will lead to some errors in
> Polly-tests, so the data shown here can not be regards as confident
> results. For example, this patch can significantly reduce the
> compile-time overhead of SingleSource/Benchmarks/Shootout/nestedloop
> <http://188.40.87.11:8000/db_default/v4/nts/19/graph?test.17=2> only
> because it regards the nested loop as an invalid scop and skips all
> following transformations and optimizations. However, I evaluated it
> here to see its potential performance impact.  Based on the results
> shown on
> http://188.40.87.11:8000/db_default/v4/nts/21?compare_to=16&baseline=16&aggregation_fn=median,
> we can see detecting scops bottom-up may further reduce Polly
> compile-time by more than 10%.

Interesting. For some reason it also regresses huffbench quite a bit.
:-( I think an up-to-date non-Polly to Polly comparison would come in
handy here, to see for which benchmarks we still see larger performance
regressions, and whether the bottom-up scop detection actually helps there.
As this is a larger patch, we should really have a need for it before
switching to it.

Cheers,
Tobias

Star Tan

2013-Aug-01 02:28 UTC


Hi all,


I have also evaluated Polly compile-time performance with our patch for the
polly-dependence pass. The results can be viewed at:
http://188.40.87.11:8000/db_default/v4/nts/23?baseline=18&compare_to=18


With this patch, Polly creates only a single parameter for memory
accesses that share the same loop variable but have different base address values. As
a result, it significantly reduces compile-time for some array-intensive
benchmarks, such as lu (reduced by 83.65%) and AMGMK (reduced by 56.24%).


For our standard benchmark, as shown in
http://llvm.org/bugs/show_bug.cgi?id=14240, the total compile-time is reduced
from 154.5389s to 0.0164s. In particular, the compile-time of polly-dependence is
reduced from 148.8800s (96.3% of the total) to 0.0066s (40.5% of the total).
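
To put these numbers in perspective, here is a small sketch that recomputes the overall speedup and the share of polly-dependence from the timings quoted above. The variable names are only illustrative. Note that the rounded timings give a share of about 40.2% after the patch, slightly below the reported 40.5%; presumably the reported value was derived from unrounded timings.

```python
# Timings in seconds, copied from the message above.
total_before, total_after = 154.5389, 0.0164
dep_before, dep_after = 148.8800, 0.0066

# Overall compile-time speedup for the PR14240 test case.
speedup = total_before / total_after

# Share of the total compile time spent in polly-dependence.
share_before = dep_before / total_before
share_after = dep_after / total_after

print(f"total speedup: {speedup:.0f}x")
print(f"polly-dependence share before: {share_before:.1%}")
print(f"polly-dependence share after:  {share_after:.1%}")
```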


Cheers,
Star Tan


Star Tan

2013-Aug-01 04:23 UTC


At 2013-07-31 22:50:57, "Tobias Grosser" <tobias at grosser.es> wrote:
>On 07/30/2013 10:03 AM, Star Tan wrote:
>> Hi Tobias and all Polly developers,
>>
>> I have re-evaluated the Polly compile-time performance using newest
>> LLVM/Polly source code.  You can view the results on
>> http://188.40.87.11:8000
>> <http://188.40.87.11:8000/db_default/v4/nts/16?compare_to=9&baseline=9&aggregation_fn=median>.
>>
>> Especially, I also evaluated our r187102 patch file that avoids expensive
>> failure string operations in normal execution. Specifically, I evaluated
>> two cases for it:
>>
>> Polly-NoCodeGen: clang -O3 -load LLVMPolly.so -mllvm
>> -polly-optimizer=none -mllvm -polly-code-generator=none
>> http://188.40.87.11:8000/db_default/v4/nts/16?compare_to=9&baseline=9&aggregation_fn=median
>> Polly-Opt: clang -O3 -load LLVMPolly.so -mllvm -polly
>> http://188.40.87.11:8000/db_default/v4/nts/18?compare_to=11&baseline=11&aggregation_fn=median
>>
>> The "Polly-NoCodeGen" case is mainly used to compare the compile-time
>> performance for the polly-detect pass. As shown in the results, our
>> patch file could significantly reduce the compile-time overhead for some
>> benchmarks such as tramp3dv4
>> <http://188.40.87.11:8000/db_default/v4/nts/16/graph?test.355=2> (24.2%),
>> simple_types_constant_folding
>> <http://188.40.87.11:8000/db_default/v4/nts/16/graph?test.366=2> (12.6%),
>> oggenc
>> <http://188.40.87.11:8000/db_default/v4/nts/16/graph?test.331=2> (9.1%),
>> loop_unroll
>> <http://188.40.87.11:8000/db_default/v4/nts/16/graph?test.235=2> (7.8%)
>
>Very nice!
>
>Though I am surprised to also see performance regressions. They are all
>in very shortly executing kernels, so they may very well be measuring
>noise. Is this really the case?

Yes, it seems that short-running benchmarks always show large, unexpected
noise, even when we run 10 samples per test.
I have changed the ignore_small abs value from the original 0.01 to 0.05, which
means that benchmarks with a performance delta of less than 0.05s are skipped. In
that case, the results seem to be much more stable.
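
The filtering idea can be sketched as follows. This is only an illustration, not LNT's actual implementation: the function name `significant_changes`, the data layout, and the sample timings are hypothetical; only the 0.05s threshold and the use of the median come from this thread.

```python
from statistics import median

def significant_changes(baseline, current, min_abs_delta=0.05):
    """Return {benchmark: relative change} for benchmarks whose median
    timings differ by at least min_abs_delta seconds."""
    changes = {}
    for name, base_samples in baseline.items():
        if name not in current:
            continue
        base = median(base_samples)
        cur = median(current[name])
        delta = cur - base
        # Deltas below the threshold are treated as noise and skipped.
        if base == 0 or abs(delta) < min_abs_delta:
            continue
        changes[name] = delta / base
    return changes

# Hypothetical compile times in seconds (three samples per benchmark).
baseline = {"tiny_kernel": [0.010, 0.012, 0.011], "big_app": [10.0, 10.2, 10.1]}
current = {"tiny_kernel": [0.020, 0.019, 0.021], "big_app": [7.6, 7.7, 7.8]}
result = significant_changes(baseline, current)
print(result)
```

In this made-up data, tiny_kernel nearly doubles in relative terms, but its absolute delta is only about 0.009s, so it is dropped; big_app changes by more than two seconds and is kept.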
However, I have noticed that there are many other Polly patches between the two
versions r185399 and r187116. They may also affect the compile-time performance.
I will re-evaluate the LLVM test-suite to see the performance improvements caused
only by our

>
>Also, it may be interesting to compare against the non-polly case to see
>how much overhead there is still due to our scop detetion.
>
>> The "Polly-opt" case is used to compare the whole compile-time
>> performance of Polly. Since our patch file mainly affects the
>> Polly-Detect pass, it shows similar performance to "Polly-NoCodeGen". As
>> shown in results, it reduces the compile-time overhead of some
>> benchmarks such as tramp3dv4
>> <http://188.40.87.11:8000/db_default/v4/nts/16/graph?test.355=2> (23.7%),
>> simple_types_constant_folding
>> <http://188.40.87.11:8000/db_default/v4/nts/16/graph?test.366=2> (12.9%),
>> oggenc
>> <http://188.40.87.11:8000/db_default/v4/nts/16/graph?test.331=2> (8.3%),
>> loop_unroll
>> <http://188.40.87.11:8000/db_default/v4/nts/16/graph?test.235=2> (7.5%)
>>
>> At last, I also evaluated the performance of the ScopBottomUp patch that
>> changes the up-down scop detection into bottom-up scop detection.
>> Results can be viewed by:
>> pNoCodeGen-ScopBottomUp: clang -O3 -load LLVMPolly.so (v.s.
>> LLVMPolly-ScopBottomUp.so)  -mllvm -polly-optimizer=none -mllvm
>> -polly-code-generator=none
>> http://188.40.87.11:8000/db_default/v4/nts/21?compare_to=16&baseline=16&aggregation_fn=median
>> pOpt-ScopBottomUp: clang -O3 -load LLVMPolly.so (v.s.
>> LLVMPolly-ScopBottomUp.so)  -mllvm -polly
>> http://188.40.87.11:8000/db_default/v4/nts/19?compare_to=18&baseline=18&aggregation_fn=median
>> (*Both of these results are based on LLVM r187116, which has included
>> the r187102 patch file that we discussed above)
>>
>> Please notice that this patch file will lead to some errors in
>> Polly-tests, so the data shown here can not be regards as confident
>> results. For example, this patch can significantly reduce the
>> compile-time overhead of SingleSource/Benchmarks/Shootout/nestedloop
>> <http://188.40.87.11:8000/db_default/v4/nts/19/graph?test.17=2> only
>> because it regards the nested loop as an invalid scop and skips all
>> following transformations and optimizations. However, I evaluated it
>> here to see its potential performance impact.  Based on the results
>> shown on
>> http://188.40.87.11:8000/db_default/v4/nts/21?compare_to=16&baseline=16&aggregation_fn=median,
>> we can see detecting scops bottom-up may further reduce Polly
>> compile-time by more than 10%.
>
>Interesting. For some reason it also regresses huffbench quite a bit.

This is because the ScopBottomUp patch invalidates the scop detection for
huffbench. The run-times of huffbench with the different options are as
follows:
clang: 19.1680s  (see runid=14)
Polly without the ScopBottomUp patch: 14.8340s (see runid=16)
Polly with the ScopBottomUp patch: 19.2920s (see runid=21)
As you can see, with the ScopBottomUp patch the execution performance is almost
the same as with plain clang. That is because no valid scop is detected with
this patch at all.
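
As a quick check of these run-times, the relative differences against plain clang can be computed like this. The helper name `relative_change` is just for illustration; the timings are the ones listed above.

```python
def relative_change(new, base):
    """Relative change of `new` against `base`; negative means faster."""
    return (new - base) / base

# huffbench run-times in seconds, from the list above.
clang = 19.1680       # runid=14
polly = 14.8340       # runid=16, without the ScopBottomUp patch
bottom_up = 19.2920   # runid=21, with the ScopBottomUp patch

polly_vs_clang = relative_change(polly, clang)
bottom_up_vs_clang = relative_change(bottom_up, clang)

print(f"Polly vs clang:        {polly_vs_clang:+.1%}")
print(f"ScopBottomUp vs clang: {bottom_up_vs_clang:+.1%}")
```

So Polly without the patch runs huffbench roughly 23% faster than clang, while with the patch the difference is well below 1%.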

>:-( I think here an up-to-date non-polly to polly comparision would come 
>handy to see which benchmarks we still see larger performance 
>regressions. And if the bottom-up scop detection actually helps here.
>As this is a larger patch, we should really have a need for it before 
>switching to it.
>

I have evaluated Polly compile-time performance for the following options:
  clang: clang -O3 (runid: 14)
  pBasic: clang -O3 -load LLVMPolly.so (runid: 15)
  pNoGen: pollycc -O3 -mllvm -polly-optimizer=none -mllvm
          -polly-code-generator=none (runid: 16)
  pNoOpt: pollycc -O3 -mllvm -polly-optimizer=none (runid: 17)
  pOpt: pollycc -O3 (runid: 18)
For example, you can view the comparison between "clang" and
"pNoGen" at:
http://188.40.87.11:8000/db_default/v4/nts/16?compare_to=14&baseline=14
It shows that, without the optimizer and code generator, Polly adds less
than 30% extra compile-time overhead.
As for execution performance, it is interesting that pNoGen not only
significantly improves the execution performance of some benchmarks
(nestedloop/huffbench) but also significantly degrades the execution performance
of another set of benchmarks (gcc-loops/lpbench).
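
To compare any two of these runs, the comparison URL can be assembled from the run ids. The following is a small hypothetical helper: the names `RUN_IDS` and `compare_url` are made up here, and the query-parameter layout is simply copied from the URLs used in this thread.

```python
# Base address of the LNT instance and run ids, as listed above.
LNT_BASE = "http://188.40.87.11:8000/db_default/v4/nts"
RUN_IDS = {"clang": 14, "pBasic": 15, "pNoGen": 16, "pNoOpt": 17, "pOpt": 18}

def compare_url(run, baseline, aggregation_fn=None):
    """Build the URL that compares `run` against `baseline`."""
    base_id = RUN_IDS[baseline]
    url = f"{LNT_BASE}/{RUN_IDS[run]}?compare_to={base_id}&baseline={base_id}"
    if aggregation_fn is not None:
        url += f"&aggregation_fn={aggregation_fn}"
    return url

print(compare_url("pNoGen", "clang"))
print(compare_url("pOpt", "clang", aggregation_fn="median"))
```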


Thanks,
Star Tan