I've been trying to get rid of the loop scale limiting problem during BFI. Initially, this was saturating frequencies to the max side of the scale, so a double nested loop would get max frequencies in all the blocks (e.g., llvm/test/CodeGen/X86/lsr-i386.ll). This made the inner loop no hotter than the outer loop, so block placement would not bother aligning them. In convertFloatingToInteger() we are scaling the BFI frequencies so they fit an integer. The function tries to choose a scaling factor and warns about being careful so RA doesn't get confused. It chooses a scaling factor of 1 / Min, which almost always turns up to be 1. This was causing me grief in the double nested loop case because the inner loop had a freq of about 6e20 while the outer blocks had a frequency of 2e19. With a scaling factor of 1, we were saturating everything to UINT64_MAX. I changed it so it uses a scaling factor that puts the frequencies in [1, UINT32_MAX], but only if the Max frequency is outside that range. This is causing two failures in the testsuite, which seem to be caused by RA spilling differently. I believe that in CodeGen/X86/lsr-i386.ll we are hoisting into the wrong loop now, but I'm not sure. The other failure is in CodeGen/Thumb2/v8_IT_5.ll is different block placement. Which is a bit odd. The frequencies given by my changes are certainly different, but the body of the loop is given a disproportionately larger frequency than the others (much like in the original case). Though, I think what's going on here is that my changes are causing the smaller frequencies to be saturated down to 1: Original: float-to-int: min = 0.0000004768367035, max = 2047.994141, factor = 16777232.0 Printing analysis 'Block Frequency Analysis' for function 't': block-frequency-info: t - entry: float = 1.0, int = 16777232 - if.then: float = 0.0000009536743164, int = 16 - if.else: float = 0.9999990463, int = 16777216 - if.then15: float = 0.0000009536734069, int = 16 - if.else18: float = 0.9999980927, int = 16777200 - if.then102: float = 0.0000009536734069, int = 16 - cond.true10.i: float = 0.0000004768367035, int = 8 - t.exit: float = 0.0000009536734069, int = 16 - if.then115: float = 0.4999985695, int = 8388592 - if.else145: float = 0.2499992847, int = 4194296 - if.else163: float = 0.2499992847, int = 4194296 - while.body172: float = 2047.994141, int = 34359672832 - if.else173: float = 0.4999985695, int = 8388592 My patch: float-to-int: min = 0.0000004768367035, max = 9223345648592486401.0, factor = 0.0000000004656626195 block-frequency-info: t - entry: float = 1.0, int = 1 - if.then: float = 0.0000009536743164, int = 1 - if.else: float = 0.9999990463, int = 1 - if.then15: float = 0.0000009536734069, int = 1 - if.else18: float = 0.9999980927, int = 1 - if.then102: float = 0.0000009536734069, int = 1 - cond.true10.i: float = 0.0000004768367035, int = 1 - t.exit: float = 0.0000009536734069, int = 1 - if.then115: float = 0.4999985695, int = 1 - if.else145: float = 0.2499992847, int = 1 - if.else163: float = 0.2499992847, int = 1 - while.body172: float = 9223345648592486401.0, int = 4294967295 - if.else173: float = 0.4999985695, int = 1 The scaling factor is so minuscule that I end up squashing every "low" frequency to 1. I think I need to smooth this better. In the meantime, I wanted to pick your brain. Maybe I'm completely off-base in my analysis. Thanks. Diego. -------------- next part -------------- A non-text attachment was scrubbed... Name: 0001-Remove-4-096-loop-scale-limitation.patch Type: application/octet-stream Size: 11213 bytes Desc: not available URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20150327/29ddc958/attachment.obj>
How about only removing the scaling limit when PGO is on? I don't see the need for this change without PGO. David On Fri, Mar 27, 2015 at 11:52 AM, Diego Novillo <dnovillo at google.com> wrote:> I've been trying to get rid of the loop scale limiting problem during > BFI. Initially, this was saturating frequencies to the max side of the > scale, so a double nested loop would get max frequencies in all the > blocks (e.g., llvm/test/CodeGen/X86/lsr-i386.ll). This made the inner > loop no hotter than the outer loop, so block placement would not > bother aligning them. > > In convertFloatingToInteger() we are scaling the BFI frequencies so > they fit an integer. The function tries to choose a scaling factor and > warns about being careful so RA doesn't get confused. It chooses a > scaling factor of 1 / Min, which almost always turns up to be 1. This > was causing me grief in the double nested loop case because the inner > loop had a freq of about 6e20 while the outer blocks had a frequency > of 2e19. With a scaling factor of 1, we were saturating everything to > UINT64_MAX. > > I changed it so it uses a scaling factor that puts the frequencies in > [1, UINT32_MAX], but only if the Max frequency is outside that range. > This is causing two failures in the testsuite, which seem to be caused > by RA spilling differently. I believe that in CodeGen/X86/lsr-i386.ll > we are hoisting into the wrong loop now, but I'm not sure. > > The other failure is in CodeGen/Thumb2/v8_IT_5.ll is different block > placement. Which is a bit odd. The frequencies given by my changes are > certainly different, but the body of the loop is given a > disproportionately larger frequency than the others (much like in the > original case). Though, I think what's going on here is that my > changes are causing the smaller frequencies to be saturated down to 1: > > Original: > float-to-int: min = 0.0000004768367035, max = 2047.994141, factor > 16777232.0 > > Printing analysis 'Block Frequency Analysis' for function 't': > block-frequency-info: t > - entry: float = 1.0, int = 16777232 > - if.then: float = 0.0000009536743164, int = 16 > - if.else: float = 0.9999990463, int = 16777216 > - if.then15: float = 0.0000009536734069, int = 16 > - if.else18: float = 0.9999980927, int = 16777200 > - if.then102: float = 0.0000009536734069, int = 16 > - cond.true10.i: float = 0.0000004768367035, int = 8 > - t.exit: float = 0.0000009536734069, int = 16 > - if.then115: float = 0.4999985695, int = 8388592 > - if.else145: float = 0.2499992847, int = 4194296 > - if.else163: float = 0.2499992847, int = 4194296 > - while.body172: float = 2047.994141, int = 34359672832 > > > - > if.else173: float = 0.4999985695, int = 8388592 > > > > My patch: > float-to-int: min = 0.0000004768367035, max = 9223345648592486401.0, > factor = 0.0000000004656626195 > > block-frequency-info: t > - entry: float = 1.0, int = 1 > - if.then: float = 0.0000009536743164, int = 1 > - if.else: float = 0.9999990463, int = 1 > - if.then15: float = 0.0000009536734069, int = 1 > - if.else18: float = 0.9999980927, int = 1 > - if.then102: float = 0.0000009536734069, int = 1 > - cond.true10.i: float = 0.0000004768367035, int = 1 > - t.exit: float = 0.0000009536734069, int = 1 > - if.then115: float = 0.4999985695, int = 1 > - if.else145: float = 0.2499992847, int = 1 > - if.else163: float = 0.2499992847, int = 1 > - while.body172: float = 9223345648592486401.0, int = 4294967295 > - if.else173: float = 0.4999985695, int = 1 > > > > The scaling factor is so minuscule that I end up squashing every "low" > frequency to 1. I think I need to smooth this better. In the meantime, > I wanted to pick your brain. Maybe I'm completely off-base in my > analysis. > > > Thanks. Diego. > > _______________________________________________ > LLVM Developers mailing list > LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu > http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev > >-------------- next part -------------- An HTML attachment was scrubbed... URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20150327/6de4066d/attachment.html>
On Fri, Mar 27, 2015 at 4:09 PM, Xinliang David Li <xinliangli at gmail.com> wrote:> How about only removing the scaling limit when PGO is on? I don't see the > need for this change without PGO.This is what my patch does, and it's getting into issues. With the scaling limit gone, the frequencies propagated overflow 64bits, so they need to be scaled down to a 64bit space. To be on the safe side, my patch is mapping them down to a 32bit space, but I am squishing them too much on the lower end. So regions of the CFG that before had distinct temperatures are now showing up with frequency == 1. I need a better smoother for the mapping from the Scale64 floats down to 64bit (or 32bit) integers. Diego.
On Fri, Mar 27, 2015 at 11:52 AM, Diego Novillo <dnovillo at google.com> wrote:> I changed it so it uses a scaling factor that puts the frequencies in > [1, UINT32_MAX], but only if the Max frequency is outside that range. > This is causing two failures in the testsuite, which seem to be caused > by RA spilling differently. I believe that in CodeGen/X86/lsr-i386.ll > we are hoisting into the wrong loop now, but I'm not sure. > > The other failure is in CodeGen/Thumb2/v8_IT_5.ll is different block > placement. Which is a bit odd. The frequencies given by my changes are > certainly different, but the body of the loop is given a > disproportionately larger frequency than the others (much like in the > original case). Though, I think what's going on here is that my > changes are causing the smaller frequencies to be saturated down to 1: > > Original: > float-to-int: min = 0.0000004768367035, max = 2047.994141, factor > 16777232.0 > > Printing analysis 'Block Frequency Analysis' for function 't': > block-frequency-info: t > - entry: float = 1.0, int = 16777232 > - if.then: float = 0.0000009536743164, int = 16 > - if.else: float = 0.9999990463, int = 16777216 > - if.then15: float = 0.0000009536734069, int = 16 > - if.else18: float = 0.9999980927, int = 16777200 > - if.then102: float = 0.0000009536734069, int = 16 > - cond.true10.i: float = 0.0000004768367035, int = 8 > - t.exit: float = 0.0000009536734069, int = 16 > - if.then115: float = 0.4999985695, int = 8388592 > - if.else145: float = 0.2499992847, int = 4194296 > - if.else163: float = 0.2499992847, int = 4194296 > - while.body172: float = 2047.994141, int = 34359672832 > > > - > if.else173: float = 0.4999985695, int = 8388592 > > > > My patch: > float-to-int: min = 0.0000004768367035, max = 9223345648592486401.0, > factor = 0.0000000004656626195 > > block-frequency-info: t > - entry: float = 1.0, int = 1 > - if.then: float = 0.0000009536743164, int = 1 > - if.else: float = 0.9999990463, int = 1 > - if.then15: float = 0.0000009536734069, int = 1 > - if.else18: float = 0.9999980927, int = 1 > - if.then102: float = 0.0000009536734069, int = 1 > - cond.true10.i: float = 0.0000004768367035, int = 1 > - t.exit: float = 0.0000009536734069, int = 1 > - if.then115: float = 0.4999985695, int = 1 > - if.else145: float = 0.2499992847, int = 1 > - if.else163: float = 0.2499992847, int = 1 > - while.body172: float = 9223345648592486401.0, int = 4294967295 > - if.else173: float = 0.4999985695, int = 1 > > > > The scaling factor is so minuscule that I end up squashing every "low" > frequency to 1. I think I need to smooth this better. In the meantime, > I wanted to pick your brain. Maybe I'm completely off-base in my > analysis. >This all makes sense to me. I think we should do at least one thing and maybe two things: 1) I wonder if there is a reason that we *need* to scale down to a 32-bit max? Even if we increase the max though, I suspect we'll still see cases where this doesn't work well. It's interesting that we hit 32-bit limits so quickly when using synthetic loop estimates. 2) I think we should use a non-linear scale once the range exceeds [1, MAX]. We were already doing some of this and I think we just need to do more. I could see either a quadratic or exponential scaling technique working well... Or one of the interpolation scaling techniques. I think we'll have to experiment to identify the best one. In particular, I'm imagining we'll want to play with both quadratic and exponential scales in both directions: compressing the high frequencies and compressing the low frequencies. And if neither is good, we may need an interpolation that compresses both the high and low frequencies. (Naturally, I'm interested in other ideas for how to handle this as well...) -------------- next part -------------- An HTML attachment was scrubbed... URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20150327/545bd8e7/attachment.html>
On Fri, Mar 27, 2015 at 1:33 PM, Chandler Carruth <chandlerc at google.com> wrote:> We were already doing some of this and I think we just need to do more.errr "... and I think we just need to keep doing some or do it differently." I'm bad at words today. -------------- next part -------------- An HTML attachment was scrubbed... URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20150327/92e016eb/attachment.html>
On Fri, Mar 27, 2015 at 11:52 AM, Diego Novillo <dnovillo at google.com> wrote:> I've been trying to get rid of the loop scale limiting problem during > BFI. Initially, this was saturating frequencies to the max side of the > scale, so a double nested loop would get max frequencies in all the > blocks (e.g., llvm/test/CodeGen/X86/lsr-i386.ll). This made the inner > loop no hotter than the outer loop, so block placement would not > bother aligning them. > > In convertFloatingToInteger() we are scaling the BFI frequencies so > they fit an integer. The function tries to choose a scaling factor and > warns about being careful so RA doesn't get confused. It chooses a > scaling factor of 1 / Min, which almost always turns up to be 1.This is not necessarily true. If there are any conditional code in a acyclic region, the scale factor will be greater than 1.> This > was causing me grief in the double nested loop case because the inner > loop had a freq of about 6e20 while the outer blocks had a frequency > of 2e19. With a scaling factor of 1, we were saturating everything to > UINT64_MAX. > > I changed it so it uses a scaling factor that puts the frequencies in > [1, UINT32_MAX], but only if the Max frequency is outside that range. > This is causing two failures in the testsuite, which seem to be caused > by RA spilling differently. I believe that in CodeGen/X86/lsr-i386.ll > we are hoisting into the wrong loop now, but I'm not sure. > > The other failure is in CodeGen/Thumb2/v8_IT_5.ll is different block > placement. Which is a bit odd. The frequencies given by my changes are > certainly different, but the body of the loop is given a > disproportionately larger frequency than the others (much like in the > original case). Though, I think what's going on here is that my > changes are causing the smaller frequencies to be saturated down to 1: >All these problems are related to infinite loops. David> > Original: > float-to-int: min = 0.0000004768367035, max = 2047.994141, factor > 16777232.0 > > Printing analysis 'Block Frequency Analysis' for function 't': > block-frequency-info: t > - entry: float = 1.0, int = 16777232 > - if.then: float = 0.0000009536743164, int = 16 > - if.else: float = 0.9999990463, int = 16777216 > - if.then15: float = 0.0000009536734069, int = 16 > - if.else18: float = 0.9999980927, int = 16777200 > - if.then102: float = 0.0000009536734069, int = 16 > - cond.true10.i: float = 0.0000004768367035, int = 8 > - t.exit: float = 0.0000009536734069, int = 16 > - if.then115: float = 0.4999985695, int = 8388592 > - if.else145: float = 0.2499992847, int = 4194296 > - if.else163: float = 0.2499992847, int = 4194296 > - while.body172: float = 2047.994141, int = 34359672832 > > > - > if.else173: float = 0.4999985695, int = 8388592 > > > > My patch: > float-to-int: min = 0.0000004768367035, max = 9223345648592486401.0, > factor = 0.0000000004656626195 > > block-frequency-info: t > - entry: float = 1.0, int = 1 > - if.then: float = 0.0000009536743164, int = 1 > - if.else: float = 0.9999990463, int = 1 > - if.then15: float = 0.0000009536734069, int = 1 > - if.else18: float = 0.9999980927, int = 1 > - if.then102: float = 0.0000009536734069, int = 1 > - cond.true10.i: float = 0.0000004768367035, int = 1 > - t.exit: float = 0.0000009536734069, int = 1 > - if.then115: float = 0.4999985695, int = 1 > - if.else145: float = 0.2499992847, int = 1 > - if.else163: float = 0.2499992847, int = 1 > - while.body172: float = 9223345648592486401.0, int = 4294967295 > - if.else173: float = 0.4999985695, int = 1 > > > > The scaling factor is so minuscule that I end up squashing every "low" > frequency to 1. I think I need to smooth this better. In the meantime, > I wanted to pick your brain. Maybe I'm completely off-base in my > analysis. > > > Thanks. Diego. > > _______________________________________________ > LLVM Developers mailing list > LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu > http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev > >-------------- next part -------------- An HTML attachment was scrubbed... URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20150327/6a389149/attachment.html>
Duncan P. N. Exon Smith
2015-Mar-29 18:02 UTC
[LLVMdev] fix for loop scale limiting in BFI
> On 2015 Mar 27, at 11:52, Diego Novillo <dnovillo at google.com> wrote: > > I've been trying to get rid of the loop scale limiting problem during > BFI. Initially, this was saturating frequencies to the max side of the > scale, so a double nested loop would get max frequencies in all the > blocks (e.g., llvm/test/CodeGen/X86/lsr-i386.ll). This made the inner > loop no hotter than the outer loop, so block placement would not > bother aligning them. > > In convertFloatingToInteger() we are scaling the BFI frequencies so > they fit an integer. The function tries to choose a scaling factor and > warns about being careful so RA doesn't get confused. It chooses a > scaling factor of 1 / Min, which almost always turns up to be 1. This > was causing me grief in the double nested loop case because the inner > loop had a freq of about 6e20 while the outer blocks had a frequency > of 2e19. With a scaling factor of 1, we were saturating everything to > UINT64_MAX. > > I changed it so it uses a scaling factor that puts the frequencies in > [1, UINT32_MAX], but only if the Max frequency is outside that range. > This is causing two failures in the testsuite, which seem to be caused > by RA spilling differently. I believe that in CodeGen/X86/lsr-i386.ll > we are hoisting into the wrong loop now, but I'm not sure. > > The other failure is in CodeGen/Thumb2/v8_IT_5.ll is different block > placement. Which is a bit odd. The frequencies given by my changes are > certainly different, but the body of the loop is given a > disproportionately larger frequency than the others (much like in the > original case). Though, I think what's going on here is that my > changes are causing the smaller frequencies to be saturated down to 1: > > Original: > float-to-int: min = 0.0000004768367035, max = 2047.994141, factor = 16777232.0 > > Printing analysis 'Block Frequency Analysis' for function 't': > block-frequency-info: t > - entry: float = 1.0, int = 16777232 > - if.then: float = 0.0000009536743164, int = 16 > - if.else: float = 0.9999990463, int = 16777216 > - if.then15: float = 0.0000009536734069, int = 16 > - if.else18: float = 0.9999980927, int = 16777200 > - if.then102: float = 0.0000009536734069, int = 16 > - cond.true10.i: float = 0.0000004768367035, int = 8 > - t.exit: float = 0.0000009536734069, int = 16 > - if.then115: float = 0.4999985695, int = 8388592 > - if.else145: float = 0.2499992847, int = 4194296 > - if.else163: float = 0.2499992847, int = 4194296 > - while.body172: float = 2047.994141, int = 34359672832 > > > - > if.else173: float = 0.4999985695, int = 8388592 > > > > My patch: > float-to-int: min = 0.0000004768367035, max = 9223345648592486401.0, > factor = 0.0000000004656626195 > > block-frequency-info: t > - entry: float = 1.0, int = 1 > - if.then: float = 0.0000009536743164, int = 1 > - if.else: float = 0.9999990463, int = 1 > - if.then15: float = 0.0000009536734069, int = 1 > - if.else18: float = 0.9999980927, int = 1 > - if.then102: float = 0.0000009536734069, int = 1 > - cond.true10.i: float = 0.0000004768367035, int = 1 > - t.exit: float = 0.0000009536734069, int = 1 > - if.then115: float = 0.4999985695, int = 1 > - if.else145: float = 0.2499992847, int = 1 > - if.else163: float = 0.2499992847, int = 1 > - while.body172: float = 9223345648592486401.0, int = 4294967295 > - if.else173: float = 0.4999985695, int = 1 > > > > The scaling factor is so minuscule that I end up squashing every "low" > frequency to 1. I think I need to smooth this better. In the meantime, > I wanted to pick your brain. Maybe I'm completely off-base in my > analysis. > > > Thanks. Diego. > <0001-Remove-4-096-loop-scale-limitation.patch>Your change to `convertFloatingToInteger()` looks basically correct to me. If we need to saturate at one end or the other, it makes more sense to be saturating at the bottom. I think you can pick a higher number than UINT32_MAX. I'd start with something like 2^60 and lower it (or fix backend optimizations to handle bigger numbers) from there if you hit problems. I'm also not sure the logic leftover for shifting the scale up makes sense after your change. Maybe something like this: constexpr unsigned MaxBits = 60; unsigned SpreadBits = (Max / Min).lg(); Scaled64 ScalingFactor; if (SpreadBits <= MaxBits - 3) ScalingFactor = Min.inverse() << 3; // Make the minimum 8. else if (SpreadBits <= MaxBits) ScalingFactor = Min.inverse(); // Make the minimum 1. else ScalingFactor = Scaled64(1, MaxBits) / Max; // Saturate at 1. Infinite loops are a corner case, so I think we should just handle them specially. One way to do this is to change `computeLoopScale()` to give an arbitrary finite scale to infinite loops: // Choose an arbitrary, finite loop scale for infinite loops. constexpr LoopScale Infinite(64, 0); // LoopScale == 1 / ExitMass // ExitMass == HeadMass - BackedgeMass BlockMass ExitMass = BlockMass::getFull() - Loop.BackedgeMass; // Block scale stores the inverse of the scale. Loop.Scale = ExitMass.isEmpty() ? Infinite : ExitMass.toScaled().inverse(); How much does the exact loop scale matter anyway for infinite loops?
Special casing infinite loops sounds reasonable for now. Maybe just stick to the current 4096 scale limit for them. Infinite loops do exist (e.g. in server code), but I don't yet have data to show how important it is to handle them more precisely. thanks, David On Sun, Mar 29, 2015 at 11:02 AM, Duncan P. N. Exon Smith < dexonsmith at apple.com> wrote:> > > On 2015 Mar 27, at 11:52, Diego Novillo <dnovillo at google.com> wrote: > > > > I've been trying to get rid of the loop scale limiting problem during > > BFI. Initially, this was saturating frequencies to the max side of the > > scale, so a double nested loop would get max frequencies in all the > > blocks (e.g., llvm/test/CodeGen/X86/lsr-i386.ll). This made the inner > > loop no hotter than the outer loop, so block placement would not > > bother aligning them. > > > > In convertFloatingToInteger() we are scaling the BFI frequencies so > > they fit an integer. The function tries to choose a scaling factor and > > warns about being careful so RA doesn't get confused. It chooses a > > scaling factor of 1 / Min, which almost always turns up to be 1. This > > was causing me grief in the double nested loop case because the inner > > loop had a freq of about 6e20 while the outer blocks had a frequency > > of 2e19. With a scaling factor of 1, we were saturating everything to > > UINT64_MAX. > > > > I changed it so it uses a scaling factor that puts the frequencies in > > [1, UINT32_MAX], but only if the Max frequency is outside that range. > > This is causing two failures in the testsuite, which seem to be caused > > by RA spilling differently. I believe that in CodeGen/X86/lsr-i386.ll > > we are hoisting into the wrong loop now, but I'm not sure. > > > > The other failure is in CodeGen/Thumb2/v8_IT_5.ll is different block > > placement. Which is a bit odd. The frequencies given by my changes are > > certainly different, but the body of the loop is given a > > disproportionately larger frequency than the others (much like in the > > original case). Though, I think what's going on here is that my > > changes are causing the smaller frequencies to be saturated down to 1: > > > > Original: > > float-to-int: min = 0.0000004768367035, max = 2047.994141, factor > 16777232.0 > > > > Printing analysis 'Block Frequency Analysis' for function 't': > > block-frequency-info: t > > - entry: float = 1.0, int = 16777232 > > - if.then: float = 0.0000009536743164, int = 16 > > - if.else: float = 0.9999990463, int = 16777216 > > - if.then15: float = 0.0000009536734069, int = 16 > > - if.else18: float = 0.9999980927, int = 16777200 > > - if.then102: float = 0.0000009536734069, int = 16 > > - cond.true10.i: float = 0.0000004768367035, int = 8 > > - t.exit: float = 0.0000009536734069, int = 16 > > - if.then115: float = 0.4999985695, int = 8388592 > > - if.else145: float = 0.2499992847, int = 4194296 > > - if.else163: float = 0.2499992847, int = 4194296 > > - while.body172: float = 2047.994141, int = 34359672832 > > > > > > - > > if.else173: float = 0.4999985695, int = 8388592 > > > > > > > > My patch: > > float-to-int: min = 0.0000004768367035, max = 9223345648592486401.0, > > factor = 0.0000000004656626195 > > > > block-frequency-info: t > > - entry: float = 1.0, int = 1 > > - if.then: float = 0.0000009536743164, int = 1 > > - if.else: float = 0.9999990463, int = 1 > > - if.then15: float = 0.0000009536734069, int = 1 > > - if.else18: float = 0.9999980927, int = 1 > > - if.then102: float = 0.0000009536734069, int = 1 > > - cond.true10.i: float = 0.0000004768367035, int = 1 > > - t.exit: float = 0.0000009536734069, int = 1 > > - if.then115: float = 0.4999985695, int = 1 > > - if.else145: float = 0.2499992847, int = 1 > > - if.else163: float = 0.2499992847, int = 1 > > - while.body172: float = 9223345648592486401.0, int = 4294967295 > > - if.else173: float = 0.4999985695, int = 1 > > > > > > > > The scaling factor is so minuscule that I end up squashing every "low" > > frequency to 1. I think I need to smooth this better. In the meantime, > > I wanted to pick your brain. Maybe I'm completely off-base in my > > analysis. > > > > > > Thanks. Diego. > > <0001-Remove-4-096-loop-scale-limitation.patch> > > Your change to `convertFloatingToInteger()` looks basically correct to > me. If we need to saturate at one end or the other, it makes more sense > to be saturating at the bottom. > > I think you can pick a higher number than UINT32_MAX. I'd start with > something like 2^60 and lower it (or fix backend optimizations to > handle bigger numbers) from there if you hit problems. > > I'm also not sure the logic leftover for shifting the scale up makes > sense after your change. Maybe something like this: > > constexpr unsigned MaxBits = 60; > unsigned SpreadBits = (Max / Min).lg(); > Scaled64 ScalingFactor; > if (SpreadBits <= MaxBits - 3) > ScalingFactor = Min.inverse() << 3; // Make the minimum 8. > else if (SpreadBits <= MaxBits) > ScalingFactor = Min.inverse(); // Make the minimum 1. > else > ScalingFactor = Scaled64(1, MaxBits) / Max; // Saturate at 1. > > Infinite loops are a corner case, so I think we should just handle them > specially. One way to do this is to change `computeLoopScale()` to > give an arbitrary finite scale to infinite loops: > > // Choose an arbitrary, finite loop scale for infinite loops. > constexpr LoopScale Infinite(64, 0); > > // LoopScale == 1 / ExitMass > // ExitMass == HeadMass - BackedgeMass > BlockMass ExitMass = BlockMass::getFull() - Loop.BackedgeMass; > > // Block scale stores the inverse of the scale. > Loop.Scale = ExitMass.isEmpty() ? Infinite > : ExitMass.toScaled().inverse(); > > How much does the exact loop scale matter anyway for infinite loops?-------------- next part -------------- An HTML attachment was scrubbed... URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20150329/a86e657d/attachment.html>
On Sun, Mar 29, 2015 at 2:02 PM, Duncan P. N. Exon Smith <dexonsmith at apple.com> wrote:> Your change to `convertFloatingToInteger()` looks basically correct to > me. If we need to saturate at one end or the other, it makes more sense > to be saturating at the bottom.Thanks. Special casing infinite loops was the problem here. Saturating everything else down to 1 in normal circumstances is not a big deal in the cases I tried. I've sent http://reviews.llvm.org/D8718 for review. The change should be performance neutral, but I'm running a batch of SPEC and internal tests to make sure. Thanks. Diego.
On Sun, Mar 29, 2015 at 2:02 PM, Duncan P. N. Exon Smith <dexonsmith at apple.com> wrote:> Your change to `convertFloatingToInteger()` looks basically correct to > me. If we need to saturate at one end or the other, it makes more sense > to be saturating at the bottom.Thanks. Special casing infinite loops was the problem here. Saturating everything else down to 1 in normal circumstances is not a big deal in the cases I tried. I've sent http://reviews.llvm.org/D8718 for review. The change should be performance neutral, but I'm running a batch of SPEC and internal tests to make sure. Thanks. Diego.