2017 Feb 16
(RFC) Adjusting default loop fully unroll threshold
> ...so probably
> due to non-optimality of register allocation.
>
This is something we should probably look at. If you have the output lying
around, maybe file a PR about it?
> The third chart below just zooms in on the above chart to the -5% to 5%
> performance improvement range:
> [image: unroll_codesize_vs_performance_zoom.png]
>
>
> Whether to enable the increase in unroll threshold only at O3 or also at
> O2: I don't have a strong opinion based on the above data.
>
FWIW, this data seems to clearly indicate that we don't get performance
wins with any consistency when the code size goes up (an...
2017 Feb 15
(RFC) Adjusting default loop fully unroll threshold
Thanks for running these, Kristof!
I'd still like to hear from Apple, and if we can get a few more x86
micro-architectures covered that'd be great, but it looks like -O3 is
uncontroversial, and the question is whether this makes sense at O2...
To me, it would help a lot to know the actual breakdown of benchmarks such
as yours, Kristof (as they seem to have more codesize impact than others
2017 Feb 17
(RFC) Adjusting default loop fully unroll threshold
> ...o probably due to non-optimality of register allocation.
>
> This is something we should probably look at. If you have the output lying around, maybe file a PR about it?
>
> The third chart below just zooms in on the above chart to the -5% to 5% performance improvement range:
> <unroll_codesize_vs_performance_zoom.png>
>
>
> Whether to enable the increase in unroll threshold only at O3 or also at O2: I don't have a strong opinion based on the above data.
>
> FWIW, this data seems to clearly indicate that we don't get performance wins with any consistency when the code size goes u...