Displaying 3 results from an estimated 3 matches for "unroll_codesize_vs_performance_zoom".

2017 Feb 16
4
(RFC) Adjusting default loop fully unroll threshold
...so probably due to non-optimality of register allocation. > This is something we should probably look at. If you have the output lying around, maybe file a PR about it? The third chart below just zooms in on the above chart to the -5% to 5% performance improvement range: > [image: unroll_codesize_vs_performance_zoom.png] > > > Whether to enable the increase in unroll threshold only at O3 or also at O2: I don't have a strong opinion based on the above data. > FWIW, this data seems to clearly indicate that we don't get performance wins with any consistency when the code size goes up (an...
2017 Feb 15
2
(RFC) Adjusting default loop fully unroll threshold
Thanks for running these, Kristof! I'd still like to hear from Apple, and if we can get a few more x86 micro-architectures covered that'd be great, but it looks like -O3 is uncontroversial, and the question is whether this makes sense at O2... To me, it would help a lot to know the actual breakdown of benchmarks such as yours, Kristof (as they seem to have more codesize impact than others
2017 Feb 17
2
(RFC) Adjusting default loop fully unroll threshold
...o probably due to non-optimality of register allocation. > > This is something we should probably look at. If you have the output lying around, maybe file a PR about it? > > The third chart below just zooms in on the above chart to the -5% to 5% performance improvement range: > <unroll_codesize_vs_performance_zoom.png> > > > Whether to enable the increase in unroll threshold only at O3 or also at O2: I don't have a strong opinion based on the above data. > > FWIW, this data seems to clearly indicate that we don't get performance wins with any consistency when the code size goes u...