Chandler Carruth via llvm-dev
2017-Feb-13 22:17 UTC
[llvm-dev] (RFC) Adjusting default loop fully unroll threshold
On Mon, Feb 13, 2017 at 2:06 PM Gerolf Hoflehner via llvm-dev <llvm-dev at lists.llvm.org> wrote:

> For unrolling specifically I agree with Hal that the hooks should be
> target specific. Actually, I go further and think they should be uArch
> specific.

They already are; it is just that no one has contributed a patch to use this on x86 microarchitectures.

Until someone shows up with data showing that we need different tunings for different microarchitectures, it doesn't make sense for us to just make up numbers there.

On the (very limited) microarchitectures we have and can test on, we're not seeing a need for microarchitectural tuning. But if others have different data, that would of course be welcome. That's part of what we're looking for in this thread.

> I have no data or proof but would not be surprised to see a wider variety
> of numbers when the thresholds are tested on a wide range of x86 machines.

Until we have data, I don't see how we can act on this, though.

> My first thought also was along the lines of Matthias: do it at a higher
> opt level e.g. O3 or possibly revisit/start thinking about O4.

Why? What about the data presented means that this isn't appropriate at O2? I'm fine if that's the answer, but I think we need to have a clear and unambiguous rationale behind it. With the current data on this thread, the code size and compile time impact seem *very small* except for very small benchmarks, many of which actually show the performance improvement as well.
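Editorial illustration: the exchange above turns on the idea of a target-independent default threshold that a per-microarchitecture hook overrides only when measurements justify it. The sketch below models that idea in standalone C++. It is not LLVM's actual interface: `UnrollPrefs`, `defaultUnrollPrefs`, `unrollPrefsFor`, `shouldFullyUnroll`, the CPU name `example-wide-core`, and the numbers 100 and 200 are all hypothetical placeholders chosen for the example, not values from this thread.

```cpp
#include <string>

// Hypothetical stand-in for a compiler's unrolling knobs (not an LLVM type).
struct UnrollPrefs {
  unsigned FullUnrollThreshold; // max estimated cost of the fully unrolled loop
};

// Target-independent default. The value 100 is a placeholder, not a tuned number.
UnrollPrefs defaultUnrollPrefs() { return UnrollPrefs{100}; }

// Hypothetical per-microarchitecture hook: start from the default and override
// it only for CPUs where (assumed) measurements justified a different value.
UnrollPrefs unrollPrefsFor(const std::string &CPU) {
  UnrollPrefs P = defaultUnrollPrefs();
  if (CPU == "example-wide-core") // made-up CPU name for illustration
    P.FullUnrollThreshold = 200;  // placeholder override
  return P;
}

// Simplified cost check: fully unroll when trip count times per-iteration cost
// fits under the threshold. Uses 64-bit arithmetic to avoid overflow.
bool shouldFullyUnroll(unsigned TripCount, unsigned BodyCost,
                       const UnrollPrefs &P) {
  unsigned long long Cost =
      static_cast<unsigned long long>(TripCount) * BodyCost;
  return Cost <= P.FullUnrollThreshold;
}
```

With these placeholder numbers, a loop with 16 iterations and a per-iteration cost of 10 is left rolled under the default but fully unrolled for the hypothetical CPU, which is the kind of difference that would need data to justify.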
Gerolf Hoflehner via llvm-dev
2017-Feb-14 20:53 UTC
[llvm-dev] (RFC) Adjusting default loop fully unroll threshold
> On Feb 13, 2017, at 2:17 PM, Chandler Carruth <chandlerc at gmail.com> wrote:
>
> On Mon, Feb 13, 2017 at 2:06 PM Gerolf Hoflehner via llvm-dev <llvm-dev at lists.llvm.org> wrote:
> > For unrolling specifically I agree with Hal that the hooks should be target specific. Actually, I go further and think they should be uArch specific.
>
> They already are, it is just that no one has contributed a patch to use this on x86 microarchitectures.
>
> Until someone shows up with data showing that we need different tunings for different microarchitectures, it doesn't make sense for us to just make up numbers there.
>
> On the (very limited) microarchitectures we have and can test on, we're not seeing a need for microarchitectural tuning. But if others have different data, that would of course be welcome. That's part of what we're looking for in this thread.
>
> > I have no data or proof but would not be surprised to see a wider variety of numbers when the thresholds are tested on a wide range of x86 machines.
>
> Until we have data, I don't see how we can act on this though.
>
> > My first thought also was along the lines of Matthias: do it at a higher opt level e.g. O3 or possibly revisit/start thinking about O4.
>
> Why? What about the data presented means that this isn't appropriate at O2? I'm fine if that's the answer, but I think we need to have a clear and unambiguous rationale behind it. With the current data on this thread, the code size and compile time impact seem *very small* except for very small benchmarks, many of which actually show the performance improvement as well.

If there is clear insight into where the gains are coming from, O2 is fine. IMHO, if all we have is the “better” numbers, we should go for a higher opt level, since not everyone will benefit. Some users will only pay higher compile times.
Dehao Chen via llvm-dev
2017-Feb-15 18:17 UTC
[llvm-dev] (RFC) Adjusting default loop fully unroll threshold
Thanks all for the comments.

On Tue, Feb 14, 2017 at 12:53 PM, Gerolf Hoflehner via llvm-dev <llvm-dev at lists.llvm.org> wrote:

> If there is clear insight where the gains are coming from O2 is fine. IMHO if we just have the “better” numbers we should go for a higher opt level since not everyone will benefit. Some users will only pay higher compile-times.

As in my previous reply, for the Google applications where we see gains from the threshold increase, the benefits simply come from the reduced number of instructions in the unrolled loops.

It looks like we all agree that the threshold increase is OK at O2. If there is no objection by the end of the day today, I will submit the patch.

Thanks,
Dehao
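Editorial illustration: the "reduced instructions from unrolled loops" mentioned above can be seen by comparing a short fixed-trip-count loop with its fully unrolled form. This is a hand-written sketch, not output from any compiler and not code from the applications discussed; the function names `sumLoop` and `sumUnrolled` are made up for the example.

```cpp
#include <array>

// Rolled form: each iteration also pays for the induction-variable increment,
// the comparison against the bound, and the backward branch.
int sumLoop(const std::array<int, 4> &A) {
  int S = 0;
  for (int I = 0; I < 4; ++I)
    S += A[I];
  return S;
}

// What full unrolling conceptually produces for a known trip count of 4:
// the loop control disappears and only the loads and adds remain.
int sumUnrolled(const std::array<int, 4> &A) {
  return A[0] + A[1] + A[2] + A[3];
}
```

Both functions compute the same result; the unrolled form simply has no per-iteration loop overhead, at the price of more code when the trip count or body is large, which is what the threshold bounds.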