thr3ads.net - llvm dev - [llvm-dev] (RFC) Adjusting default loop fully unroll threshold [Feb 2017]

Chandler Carruth via llvm-dev

2017-Feb-16 23:45 UTC

[llvm-dev] (RFC) Adjusting default loop fully unroll threshold

First off, I just want to say wow and thank you. This kind of data is
amazing. =D

On Thu, Feb 16, 2017 at 2:46 AM Kristof Beyls <Kristof.Beyls at arm.com>
wrote:
> The biggest relative code size increases indeed didn't happen for the
> biggest programs, but instead for a few programs weighing in at about
> 100KB.
> I'm assuming the Google benchmark set covers much bigger programs than
> the ones displayed here.
> FWIW, the cluster of programs where code size increases between 60% to 80%
> with a size of about 100KB, all come from MultiSource/Benchmarks/TSVC.
> Interestingly, these programs seem to have float and double variants,  e.g.
> (MultiSource/Benchmarks/TSVC/Searching-flt/Searching-flt and
> MultiSource/Benchmarks/TSVC/Searching-dbl/Searching-dbl), and the code size
> bloat only happens for the double variants.
>
I think we should definitely look at this (as it seems likely to be a bug
somewhere), but I'm also not overly concerned with size regressions in the
TSVC benchmarks, which are unusually loop heavy and small. We've had
several other changes that caused big fluctuations here.


> I think it may still be worthwhile to check if this also happens on other
> architectures, and why it happens only for the double-variants, not the
> float-variants.
>
+1

> The second chart shows relative code size increase (vertical axis) vs
> relative performance improvement (horizontal axis):
> I manually checked the cause of the 3 biggest performance regressions
> (proprietary benchmark1: -13.70%;
> MultiSource/Applications/hexxagon/hexxagon: -10.10%;
> MultiSource/Benchmarks/FreeBench/fourinarow/fourinarow -5.23%).
> For the proprietary benchmark and hexxagon, the code generation didn't
> change for the hottest parts, so probably is caused by micro-architectural
> effects of code layout changes.
>
This is always good to know, even though it is frustrating. =]

> For fourinarow, there seemed to be a lot more spill/fill code, so probably
> due to non-optimality of register allocation.
>
This is something we should probably look at. If you have the output lying
around, maybe file a PR about it?

> The third chart below just zooms in on the above chart to the -5% to 5%
> performance improvement range:
> [image: unroll_codesize_vs_performance_zoom.png]
>
>
> Whether to enable the increase in unroll threshold only at O3 or also at
> O2: I don't have a strong opinion based on the above data.
>
FWIW, this data seems to clearly indicate that we don't get performance
wins with any consistency when the code size goes up (and thus the change
has impact). As a consequence, I pretty strongly suspect that this should
be *just* used at O3 at least for now.

I see two further directions for Dehao that make sense here (at least to
me):
1) I suspect we should investigate *why* the size increases are happening
without helping speed. I can imagine some reasons that this would of course
happen (cold loops getting unrolled), but especially in light of the
oddities you point out above, I suspect there may be issues where more
unrolling is uncovering other problems and if we fix those other problems
the shape of things will be different. We should at least address the
issues you uncovered above.

2) If this turns out to be architecture specific (it seems that way at
least initially, but hard to tell for sure with different benchmark sets)
we might make AArch64 and x86 use different thresholds here. I'm skeptical
about this though. I suspect we should do #1, and we'll either get a
different shape, or just decide that O3 is more appropriate.

> Maybe the compile time impact is what should be driving that discussion
> the most? I'm afraid I don't have compile time numbers.
>
FWIW, I strongly suspect that for *this* change, compile time and code size
will be pretty precisely correlated. Dehao's data shows that to be true in
several cases certainly.

> Ultimately, I guess this boils down to what exactly the difference is in
> intent between O2 and O3, which seems like a never-ending discussion...
>
The definitions I am working from are here:
https://github.com/llvm-project/llvm-project/blob/master/llvm/include/llvm/Passes/PassBuilder.h#L81-L90

I've highlighted the part that makes me think O3 is better here: the code
size increases (and thus compile time increases) don't seem to correspond
to runtime improvements.

>
> Hoping you find this useful,
>
Very. Once again, this kind of data and analysis is awesome. =D
>
> Kristof
>
>
> On Tue, Feb 14, 2017 at 1:06 PM Kristof Beyls via llvm-dev <
> llvm-dev at lists.llvm.org> wrote:
>
> I've run the patch on https://reviews.llvm.org/D28368 on the test-suite
> and other benchmarks, for AArch64 -O3 -fomit-frame-pointer, both for
> Cortex-A53 and Cortex-A57.
>
> The geomean over the few hundred programs in there is roughly the same for
> Cortex-A53 and Cortex-A57: a bit over 1% improvement in execution speed for
> a bit over 5% increase in code size.
> Obviously I wouldn't want this for optimization levels where code size is
> of any concern, like -Os or -Oz, but don't have a problem with this going
> in for other optimization levels where this isn't a concern.
>
> Thanks,
>
> Kristof
>
>
[Attachments scrubbed by the archive: unroll_codesize_absolute_vs_relative.png,
unroll_codesize_vs_performance.png, unroll_codesize_vs_performance_zoom.png]

Xinliang David Li via llvm-dev

2017-Feb-17 00:41 UTC


On Thu, Feb 16, 2017 at 3:45 PM, Chandler Carruth via llvm-dev <
llvm-dev at lists.llvm.org> wrote:
> FWIW, this data seems to clearly indicate that we don't get performance
> wins with any consistency when the code size goes up (and thus the change
> has impact). As a consequence, I pretty strongly suspect that this should
> be *just* used at O3 at least for now.
>
The correlation is there -- when there is a performance improvement, there is a
size increase. The opposite is not true -- but that is expected. If the
speedup is in the cold path, there won't be a visible performance improvement,
only a size increase.

Put another way: if we reduced the threshold, there would be a sizable size
improvement for many benchmarks without regressing performance. Should we
use the reduced threshold for O2 instead?

It is usually tiny programs whose size is sensitive to this change. The
size vs size increase chart confirms that point. There is basically no
large size increase for programs > 1MB (clang release build size is 78M).
In other words, I believe the actual size impact on real world applications
should be negligible. This behavior is very different from the case when
we increase the inline threshold, for instance -- which will have a size impact
across the board. The latter is certainly more limited to higher
optimization levels.
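
One way to check the "only small programs are size-sensitive" claim against
measurements is to bucket programs by baseline size and look at the worst
relative increase in each bucket. The sketch below is only an illustration:
the program names and byte counts are made up, and the 1 MB cut-off is simply
the figure mentioned above.

```python
# Hypothetical (program, baseline_bytes, patched_bytes) samples.
# These are NOT measurements from this thread.
SAMPLES = [
    ("tiny_loop_kernel", 100_000, 170_000),
    ("small_tool", 400_000, 420_000),
    ("mid_app", 2_000_000, 2_020_000),
    ("large_app", 78_000_000, 78_100_000),
]

ONE_MB = 1_000_000  # cut-off taken from the "> 1MB" remark above


def size_bucket(baseline_bytes):
    """Assign a coarse size class to a program."""
    return "<= 1MB" if baseline_bytes <= ONE_MB else "> 1MB"


def max_relative_increase_by_bucket(samples):
    """Return {bucket: largest relative code size increase in that bucket}."""
    worst = {}
    for _name, before, after in samples:
        increase = (after - before) / before
        bucket = size_bucket(before)
        worst[bucket] = max(worst.get(bucket, increase), increase)
    return worst


result = max_relative_increase_by_bucket(SAMPLES)
for bucket, increase in sorted(result.items()):
    print(f"{bucket}: worst increase {increase:.1%}")
```

If the "> 1MB" bucket stays near zero on real data while the small bucket does
not, that would support the argument that the size impact is confined to tiny
programs.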

thanks,

David





Mehdi Amini via llvm-dev

2017-Feb-17 01:43 UTC


> On Feb 16, 2017, at 4:41 PM, Xinliang David Li via llvm-dev <llvm-dev at lists.llvm.org> wrote:
> The correlation is there -- when there is performance improvement, there is
> size increase.

I didn't quite get this impression from the graph; the highest improvement
didn't come with a code size increase:

[image: screenshot, scrubbed from the archive]

And on the other hand, there were many code-size increases without any runtime
improvement.

> The opposite is not true -- but that is expected. If the speedup is in the
> cold path, there won't be visible performance improvement but size increase.
>
> Put it another way. If we reduce the threshold, there will be sizable size
> improvement for many benchmarks without regressing performance, shall we use
> the reduced threshold for O2 instead?

Yes, all the ones here IIUC:

[image: screenshot, scrubbed from the archive]

However, we could likely argue that these "small" benchmarks should use
-Os if they're sensitive to size, and so O2 would be fine with the more
aggressive threshold (as larger programs aren't affected).

With a good heuristic, we'd have every dot forming a straight line
code_size_increase = m * runtime_perf (with m as small as possible). The current
lack of shape (or the exact opposite distribution to the ideal I imagine above)
seems to show that our "profitability" heuristics are pretty bad and that the
current threshold knob is a bad predictor of runtime performance.
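
To put a number on "lack of shape", one could fit the line through the origin
and see how much of the size increase it explains. This is a minimal sketch
under assumptions: the function names are mine, the sample percentages are
invented, and the uncentered R^2 used here is just one reasonable
goodness-of-fit choice for a no-intercept model.

```python
def fit_through_origin(perf_gain, size_increase):
    """Least-squares slope m for size_increase ~= m * perf_gain (no intercept)."""
    sxx = sum(x * x for x in perf_gain)
    if sxx == 0:
        raise ValueError("need at least one non-zero performance change")
    sxy = sum(x * y for x, y in zip(perf_gain, size_increase))
    return sxy / sxx


def uncentered_r2(perf_gain, size_increase, m):
    """Share of sum(y^2) explained by the line; 1.0 means every dot is on it."""
    ss_res = sum((y - m * x) ** 2 for x, y in zip(perf_gain, size_increase))
    ss_tot = sum(y * y for y in size_increase)
    return 1.0 - ss_res / ss_tot if ss_tot else 1.0


# Hypothetical percentages, for illustration only.
ideal_perf, ideal_size = [1.0, 2.0, 4.0], [2.0, 4.0, 8.0]
noisy_perf, noisy_size = [0.0, 0.1, -5.0, 3.0], [60.0, 70.0, 5.0, 0.0]

m_ideal = fit_through_origin(ideal_perf, ideal_size)
r2_ideal = uncentered_r2(ideal_perf, ideal_size, m_ideal)

m_noisy = fit_through_origin(noisy_perf, noisy_size)
r2_noisy = uncentered_r2(noisy_perf, noisy_size, m_noisy)

print(f"ideal: m={m_ideal:.2f}, R^2={r2_ideal:.3f}")
print(f"noisy: m={m_noisy:.2f}, R^2={r2_noisy:.3f}")
```

A heuristic matching the ideal above would give an R^2 close to 1 with a small
positive m; a cloud like the one in the charts would give an R^2 near 0.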

— 
Mehdi


Kristof Beyls via llvm-dev

2017-Feb-17 14:42 UTC


On 17 Feb 2017, at 00:45, Chandler Carruth <chandlerc at gmail.com> wrote:

> First off, I just want to say wow and thank you. This kind of data is amazing.
> =D

Thanks :)

>> For fourinarow, there seemed to be a lot more spill/fill code, so probably due
>> to non-optimality of register allocation.
>
> This is something we should probably look at. If you have the output lying
> around, maybe file a PR about it?

I'll have another, closer look to be more sure about what the root cause is,
and file a PR if it looks like a good case to help improve the register
allocator.

>> The third chart below just zooms in on the above chart to the -5% to 5%
>> performance improvement range:
>> <unroll_codesize_vs_performance_zoom.png>
>>
>> Whether to enable the increase in unroll threshold only at O3 or also at O2: I
>> don't have a strong opinion based on the above data.
>
> FWIW, this data seems to clearly indicate that we don't get performance wins
> with any consistency when the code size goes up (and thus the change has
> impact). As a consequence, I pretty strongly suspect that this should be *just*
> used at O3 at least for now.
>
> I see two further directions for Dehao that make sense here (at least to me):
> 1) I suspect we should investigate *why* the size increases are happening
> without helping speed. I can imagine some reasons that this would of course
> happen (cold loops getting unrolled), but especially in light of the oddities
> you point out above, I suspect there may be issues where more unrolling is
> uncovering other problems and if we fix those other problems the shape of things
> will be different. We should at least address the issues you uncovered above.
>
> 2) If this turns out to be architecture specific (it seems that way at least
> initially, but hard to tell for sure with different benchmark sets) we might
> make AArch64 and x86 use different thresholds here. I'm skeptical about this
> though. I suspect we should do #1, and we'll either get a different shape,
> or just decide that O3 is more appropriate.

Agreed. FWIW, I haven't spotted any results that suggested to me that the
unrolling threshold should be different between different architectures.
From a basic principles perspective, I'd assume that micro-architectural
features such as the size of the instruction cache should mostly define the
unrolling thresholds, rather than instruction set architecture. Which implies
different thresholds per subtarget, rather than per target. Anyway, let's
not try to come up with different thresholds per target/subtarget without clear
data.

>> Maybe the compile time impact is what should be driving that discussion the
>> most? I'm afraid I don't have compile time numbers.
>
> FWIW, I strongly suspect that for *this* change, compile time and code size will
> be pretty precisely correlated. Dehao's data shows that to be true in
> several cases certainly.
>
>> Ultimately, I guess this boils down to what exactly the difference is in intent
>> between O2 and O3, which seems like a never-ending discussion...
>
> The definitions I am working from are here:
> https://github.com/llvm-project/llvm-project/blob/master/llvm/include/llvm/Passes/PassBuilder.h#L81-L90
>
> I've highlighted the part that makes me think O3 is better here: the code
> size increases (and thus compile time increases) don't seem to correspond to
> runtime improvements.

Agreed, only enabling this for O3 seems to be most in line with the definition
you pointed at.

Thanks,

Kristof


Dehao Chen via llvm-dev

2017-Feb-17 21:55 UTC


I've updated https://reviews.llvm.org/D28368 to move the threshold update to
O3.

Chandler, could you help take a look at whether the changes that pass
OptLevel down to LoopUnroller are reasonable?
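
For readers following along, the shape of "raise the threshold only at O3" can
be modeled roughly as below. This is a toy model and not the code in D28368:
the names `UnrollPrefs`, `unroll_prefs_for`, and `should_fully_unroll` are
hypothetical, and the numeric thresholds are placeholders, since the thread
does not state the actual values.

```python
from dataclasses import dataclass

# Placeholder numbers for illustration only; the real values are not
# given anywhere in this thread.
BASE_THRESHOLD = 100
RAISED_THRESHOLD = 200


@dataclass(frozen=True)
class UnrollPrefs:
    """Hypothetical container for the full-unroll cost budget."""
    full_unroll_threshold: int


def unroll_prefs_for(opt_level: int, optimize_for_size: bool = False) -> UnrollPrefs:
    """Pick the threshold from the optimization level handed to the unroller."""
    if optimize_for_size or opt_level < 3:
        # -Os/-Oz and O2 and below keep the existing budget.
        return UnrollPrefs(BASE_THRESHOLD)
    # Only O3 gets the larger budget.
    return UnrollPrefs(RAISED_THRESHOLD)


def should_fully_unroll(unrolled_cost: int, prefs: UnrollPrefs) -> bool:
    """Fully unroll only when the estimated unrolled cost fits the budget."""
    return unrolled_cost <= prefs.full_unroll_threshold


for level in (2, 3):
    prefs = unroll_prefs_for(level)
    print(f"O{level}: threshold={prefs.full_unroll_threshold}, "
          f"unroll cost 150? {should_fully_unroll(150, prefs)}")
```

The point of the model is only that the decision depends on the optimization
level, which is why that level has to reach the unroller in the first place.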

Thanks,
Dehao
