thr3ads.net - llvm dev - [llvm-dev] (RFC) Adjusting default loop fully unroll threshold [Jan 2017]

Dehao Chen via llvm-dev

2017-Jan-30 18:49 UTC

[llvm-dev] (RFC) Adjusting default loop fully unroll threshold

Currently, the full loop unroller shares the same default threshold as the
runtime (dynamic) and partial loop unrollers. This seems conservative because,
unlike runtime/partial unrolling, full unrolling does not affect loop stream
detector (LSD) or instruction-cache performance. In
https://reviews.llvm.org/D28368, I proposed doubling the threshold for the
full loop unroller. This changes the codegen of several SPEC CPU2006
benchmarks:

Benchmark     Code size  Compile time  Performance (Intel Sandy Bridge)
447.dealII      0.50%       0.22%        +0.07%
453.povray      0.42%      -0.16%        +1.79%
433.milc        0.20%       0.09%        +1.02%
445.gobmk       0.32%      -2.43%        +0.56%
403.gcc         0.05%       0.06%        -0.16%
464.h264ref     3.62%       3.21%        -0.41%
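To put a single number on "overall positive", the per-benchmark deltas above can be averaged. This is a rough sketch only: it takes the unweighted arithmetic mean of the reported percentages, with signs exactly as reported (performance read as positive = improvement, which is how the RFC appears to use it). It is not the SPEC geometric-mean score, and the dictionary names are illustrative.

```python
# Per-benchmark deltas copied from the table above (percent, signs as reported).
CODE_SIZE = {"447.dealII": 0.50, "453.povray": 0.42, "433.milc": 0.20,
             "445.gobmk": 0.32, "403.gcc": 0.05, "464.h264ref": 3.62}
COMPILE_TIME = {"447.dealII": 0.22, "453.povray": -0.16, "433.milc": 0.09,
                "445.gobmk": -2.43, "403.gcc": 0.06, "464.h264ref": 3.21}
PERFORMANCE = {"447.dealII": 0.07, "453.povray": 1.79, "433.milc": 1.02,
               "445.gobmk": 0.56, "403.gcc": -0.16, "464.h264ref": -0.41}


def mean_delta(deltas: dict[str, float]) -> float:
    """Unweighted arithmetic mean of per-benchmark percentage deltas."""
    return sum(deltas.values()) / len(deltas)


if __name__ == "__main__":
    for name, table in [("code size", CODE_SIZE),
                        ("compile time", COMPILE_TIME),
                        ("performance", PERFORMANCE)]:
        print(f"{name}: {mean_delta(table):+.2f}%")
```

The means work out to 5.11/6 for code size, 0.99/6 for compile time and 2.87/6 for performance (all in percent), so one large outlier (464.h264ref) dominates both overhead columns.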

The change appears to have an overall positive performance impact with very
small code-size and compile-time overhead. The question now is whether to make
it the default at -O2 or leave it at -O3. We would like more input from the
community before making the decision.
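The proposal can be sketched as splitting one shared budget into per-flavor budgets and doubling only the full-unroll one. This is a hypothetical model, not LLVM's code: the names `UnrollThresholds`, `shared_thresholds`, `proposed_thresholds` and `should_fully_unroll` are invented for illustration, the base value of 150 is illustrative, and the size estimate (trip count times body size) ignores the simplifications LLVM's real cost analysis credits after unrolling.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class UnrollThresholds:
    """Hypothetical per-flavor size budgets, in estimated instructions."""
    full: int
    partial: int
    runtime: int


def shared_thresholds(base: int) -> UnrollThresholds:
    # Status quo described in the RFC: one budget for every kind of unrolling.
    return UnrollThresholds(full=base, partial=base, runtime=base)


def proposed_thresholds(base: int) -> UnrollThresholds:
    # The RFC's proposal: double only the full-unroll budget.
    return UnrollThresholds(full=2 * base, partial=base, runtime=base)


def should_fully_unroll(trip_count: Optional[int], body_size: int,
                        limits: UnrollThresholds) -> bool:
    """Simplified check: the trip count must be a compile-time constant and
    the unrolled size (trip count x body size) must fit the full budget."""
    if trip_count is None:
        # Unknown trip count: only partial/runtime unrolling could apply.
        return False
    return trip_count * body_size <= limits.full


if __name__ == "__main__":
    base = 150  # illustrative value only
    # An 8-iteration loop with a 25-instruction body unrolls to ~200 instructions.
    print(should_fully_unroll(8, 25, shared_thresholds(base)))    # False
    print(should_fully_unroll(8, 25, proposed_thresholds(base)))  # True
```

Keeping the partial and runtime budgets unchanged is the point of the design: only the flavor argued not to stress the LSD or I-cache gets the larger budget.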

Thanks,

Dehao

Matthias Braun via llvm-dev

2017-Jan-30 19:28 UTC

> On Jan 30, 2017, at 10:49 AM, Dehao Chen via llvm-dev <llvm-dev at
lists.llvm.org> wrote:
> 
> Looks like the change has overall positive performance impact with very
> small code size/compile time overhead. Now the question is shall we make this
> change default in O2, or shall we leave it in O3. We would like to have more
> input from the community to make the decision.

Intuitively (correct me if I am wrong), I would think loop unrolling is a
riskier operation: it is good in many or most cases but can be detrimental to
performance by blowing up code size and I-cache footprint. So I would rather
put that into -O3.

- Matthias

Dehao Chen via llvm-dev

2017-Jan-30 19:43 UTC

On Mon, Jan 30, 2017 at 11:28 AM, Matthias Braun <mbraun at apple.com>
wrote:
>
> On Jan 30, 2017, at 10:49 AM, Dehao Chen via llvm-dev <
> llvm-dev at lists.llvm.org> wrote:
>
> Looks like the change has overall positive performance impact with very
> small code size/compile time overhead. Now the question is shall we make
> this change default in O2, or shall we leave it in O3. We would like to
> have more input from the community to make the decision.
>
> Intuitively (correct me if I am wrong) I would think loop unrolling is a
> more risky operation that is good in many/most cases but can be detrimental
> to performance (by blowing up code sizes and I-Caches). So I would rather
> put that into -O3.
>

Yes, that applies to runtime/partial loop unrolling. That's why we want a
conservative threshold there, to make sure the unrolled loop does not overflow
the loop stream detector (LSD) or the L1 instruction cache (L1I). Full
unrolling, however, only handles loops whose trip count is small, so the LSD
would never activate for them. Completely flattening the loop turns it into
straight-line code, so the flattened loop has no L1I issue either. That's why
we think the full unroller's threshold should be more aggressive.
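A toy, source-level illustration of "flattening" may help: with a constant trip count, full unrolling emits one copy of the body per iteration with the induction variable replaced by a constant, leaving no compare, increment or back-edge for the LSD to capture. The function `fully_unroll` and its string-template input are hypothetical and stand in for what the real pass does on LLVM IR.

```python
def fully_unroll(trip_count: int, body_template: str) -> list[str]:
    """Toy full unroller: emit one copy of the loop body per iteration,
    with the induction variable {i} replaced by its constant value.

    The result contains no compare, increment or back-edge: it is plain
    straight-line code. The template must use {i} as its only placeholder.
    """
    if trip_count < 0:
        raise ValueError("trip count must be a non-negative constant")
    return [body_template.format(i=i) for i in range(trip_count)]


if __name__ == "__main__":
    # Rolled form: for i in range(4): acc += a[i]
    for line in fully_unroll(4, "acc += a[{i}]"):
        print(line)
```

Run as a script, this prints the four statements `acc += a[0]` through `acc += a[3]`; executing them gives the same result as the rolled loop.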

Dehao

Mehdi Amini via llvm-dev

2017-Jan-30 23:51 UTC

> On Jan 30, 2017, at 10:49 AM, Dehao Chen via llvm-dev <llvm-dev at
lists.llvm.org> wrote:
> 
> Currently, loop fully unroller shares the same default threshold as loop
> dynamic unroller and partial unroller. This seems conservative because unlike
> dynamic/partial unrolling, fully unrolling will not affect LSD/ICache
> performance. In https://reviews.llvm.org/D28368, I proposed to double the
> threshold for loop fully unroller. This will change the codegen of several
> SPECCPU benchmarks:
> 
> Code size:
> 447.dealII 0.50%
> 453.povray 0.42%
> 433.milc 0.20%
> 445.gobmk 0.32%
> 403.gcc 0.05%
> 464.h264ref 3.62%
> 
> Compile Time:
> 447.dealII 0.22%
> 453.povray -0.16%
> 433.milc 0.09%
> 445.gobmk -2.43%
> 403.gcc 0.06%
> 464.h264ref 3.21%
> 
> Performance (on intel sandybridge):
> 447.dealII +0.07%
> 453.povray +1.79%
> 433.milc +1.02%
> 445.gobmk +0.56%
> 403.gcc -0.16%
> 464.h264ref -0.41%
> 
Can you clarify how to read these numbers? (I usually use +xx% to indicate a
slowdown; it seems you’re doing the opposite?)

So considering 464.h264ref, does it mean it is 3.6% slower to compile, gets 3.2%
larger, and 0.4% slower?
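The two readings Mehdi contrasts correspond to two different formulas, sketched below. The function names and the example run times are hypothetical; the thread itself does not confirm which convention the RFC's performance column uses, only that Dehao calls the overall result "positive".

```python
def percent_increase(baseline: float, new: float) -> float:
    """Growth of a cost metric (compile time, code size, run time).
    Positive means the new build is bigger or slower."""
    return (new / baseline - 1.0) * 100.0


def percent_speedup(baseline_time: float, new_time: float) -> float:
    """Performance improvement computed from run times.
    Positive means the new build is faster."""
    return (baseline_time / new_time - 1.0) * 100.0


if __name__ == "__main__":
    # Hypothetical run times in seconds, not taken from the RFC.
    old, new = 100.0, 98.0
    print(f"run-time change: {percent_increase(old, new):+.2f}%")
    print(f"speedup:         {percent_speedup(old, new):+.2f}%")
```

For a run time that drops from 100 s to 98 s, the first formula reports about -2.0% and the second about +2.04%, so a report has to say which one it uses before "+0.41%" can be read as a gain or a loss.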

Another question is about PGO integration: is it already hooked up there?
Should we have a more aggressive threshold in hot functions (assuming we’re
willing to spend some binary size there but not on the cold path)?

Thanks,

— 
Mehdi

> Looks like the change has overall positive performance impact with very
> small code size/compile time overhead. Now the question is shall we make this
> change default in O2, or shall we leave it in O3. We would like to have more
> input from the community to make the decision.
> 
> Thanks,
> 
> Dehao

Chandler Carruth via llvm-dev

2017-Jan-30 23:56 UTC

On Mon, Jan 30, 2017 at 3:51 PM Mehdi Amini via llvm-dev <
llvm-dev at lists.llvm.org> wrote:
> Another question is about PGO integration: is it already hooked there?
> Should we have a more aggressive threshold in a hot function? (Assuming
> we’re willing to spend some binary size there but not on the cold path).
>
I would even wire the *unrolling* the other way: just suppress unrolling in
cold paths to save binary size. Rolled loops seem like a generally good thing
in cold code unless they have some larger impact (i.e., the loop itself is
more expensive than the unrolled form).
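Combining the two profile-guided ideas in this exchange gives a decision like the one below. It is a hypothetical sketch, not an existing LLVM interface: `Hotness`, `should_fully_unroll` and `hot_multiplier` are invented names, sizes are abstract instruction estimates, and the 2x hot-code multiplier is an arbitrary illustrative choice. Cold loops stay rolled unless unrolling does not grow the code (Chandler's suggestion); hot loops get a larger budget (Mehdi's question).

```python
from enum import Enum


class Hotness(Enum):
    COLD = "cold"
    NEUTRAL = "neutral"
    HOT = "hot"


def should_fully_unroll(rolled_size: int, unrolled_size: int,
                        base_budget: int, hotness: Hotness,
                        hot_multiplier: int = 2) -> bool:
    """Hypothetical profile-aware full-unroll decision."""
    if hotness is Hotness.COLD:
        # Keep cold loops rolled unless the unrolled form is no larger
        # than the loop itself.
        return unrolled_size <= rolled_size
    # Hot code may spend more size; everything else uses the base budget.
    budget = base_budget * hot_multiplier if hotness is Hotness.HOT else base_budget
    return unrolled_size <= budget


if __name__ == "__main__":
    print(should_fully_unroll(20, 200, 150, Hotness.NEUTRAL))  # False
    print(should_fully_unroll(20, 200, 150, Hotness.HOT))      # True
    print(should_fully_unroll(20, 18, 150, Hotness.COLD))      # True
```

Making the cold case a size comparison rather than a smaller budget keeps it independent of the tuned threshold, which matches the "suppress in cold paths" framing.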

