llvm dev - Mar 2015 - [LLVMdev] fix for loop scale limiting in BFI

I've been trying to get rid of the loop scale limiting problem during
BFI. Initially, this was saturating frequencies to the max side of the
scale, so a double nested loop would get max frequencies in all the
blocks (e.g., llvm/test/CodeGen/X86/lsr-i386.ll). This made the inner
loop no hotter than the outer loop, so block placement would not
bother aligning them.

In convertFloatingToInteger() we scale the BFI frequencies so they fit
in an integer. The function tries to choose a scaling factor and carries
a warning to be careful so that RA doesn't get confused. It chooses a
scaling factor of 1 / Min, which almost always turns out to be 1. This
was causing me grief in the double nested loop case because the inner
loop had a frequency of about 6e20 while the outer blocks had a
frequency of 2e19. Both values are above UINT64_MAX (about 1.8e19), so
with a scaling factor of 1 we were saturating everything to UINT64_MAX.
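To make the saturation concrete, here is a minimal Python sketch. It
assumes a conversion that multiplies each frequency by the scaling
factor and clamps the result into [1, UINT64_MAX]. The helper
`to_int_saturating` is hypothetical and only models the behaviour
described above; it is not the LLVM code.

```python
# Hypothetical model of the float-to-int step described above;
# this is NOT LLVM's convertFloatingToInteger().
UINT64_MAX = 2**64 - 1


def to_int_saturating(freq: float, factor: float, limit: int = UINT64_MAX) -> int:
    """Scale a floating-point block frequency and clamp it into [1, limit]."""
    scaled = freq * factor
    if scaled >= limit:
        return limit  # saturate at the top of the integer range
    return max(1, round(scaled))  # assumed: never let a block drop to 0


# The double nested loop case from the post, with a scaling factor of 1.
inner = to_int_saturating(6e20, 1.0)
outer = to_int_saturating(2e19, 1.0)
print(inner == outer == UINT64_MAX)  # True: the inner loop is no hotter than the outer
```

Because both inputs exceed UINT64_MAX before any scaling, a factor of 1
cannot keep them apart; only a factor below 1 could.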

I changed it so it uses a scaling factor that puts the frequencies in
[1, UINT32_MAX], but only if the Max frequency is outside that range.
This is causing two failures in the testsuite, which seem to be caused
by RA spilling differently. I believe that in CodeGen/X86/lsr-i386.ll
we are hoisting into the wrong loop now, but I'm not sure.
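The factor selection described above can be sketched roughly as
follows. This is an illustrative Python model, not the attached patch:
`choose_factor` and `to_int` are hypothetical names, the in-range
fallback assumes the 1 / Min factor mentioned earlier, and clamping
small results up to 1 is an assumption inferred from the dumps.

```python
# Hypothetical model of the patch's factor choice; names are illustrative.
UINT32_MAX = 2**32 - 1


def choose_factor(min_freq: float, max_freq: float) -> float:
    """Keep 1 / Min unless Max is outside [1, UINT32_MAX]; then map Max to UINT32_MAX."""
    if 1.0 <= max_freq <= UINT32_MAX:
        return 1.0 / min_freq  # pre-patch choice, as described in the post
    return UINT32_MAX / max_freq  # squeeze the whole range under UINT32_MAX


def to_int(freq: float, factor: float, limit: int = UINT32_MAX) -> int:
    """Scale a frequency and clamp it into [1, limit] (assumed behaviour)."""
    scaled = freq * factor
    if scaled >= limit:
        return limit
    return max(1, round(scaled))


# Min and Max taken from the v8_IT_5.ll "My patch" dump.
factor = choose_factor(4.768367035e-7, 9223345648592486401.0)
print(factor)  # roughly 4.66e-10
print(to_int(9223345648592486401.0, factor), to_int(1.0, factor))
```

With a Max of about 9.2e18, the factor is about 4.66e-10, so Max lands
on UINT32_MAX while a frequency of 1.0 scales to about 4.7e-10 and is
clamped to 1.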

The other failure, in CodeGen/Thumb2/v8_IT_5.ll, is a difference in
block placement, which is a bit odd. The frequencies given by my
changes are certainly different, but the body of the loop is given a
disproportionately larger frequency than the others (much like in the
original case). That said, I think what's going on here is that my
changes are causing the smaller frequencies to be saturated down to 1:

Original:
float-to-int: min = 0.0000004768367035, max = 2047.994141, factor = 16777232.0

Printing analysis 'Block Frequency Analysis' for function 't':
  block-frequency-info: t
   - entry: float = 1.0, int = 16777232
   - if.then: float = 0.0000009536743164, int = 16
   - if.else: float = 0.9999990463, int = 16777216
   - if.then15: float = 0.0000009536734069, int = 16
   - if.else18: float = 0.9999980927, int = 16777200
   - if.then102: float = 0.0000009536734069, int = 16
   - cond.true10.i: float = 0.0000004768367035, int = 8
   - t.exit: float = 0.0000009536734069, int = 16
   - if.then115: float = 0.4999985695, int = 8388592
   - if.else145: float = 0.2499992847, int = 4194296
   - if.else163: float = 0.2499992847, int = 4194296
   - while.body172: float = 2047.994141, int = 34359672832

   - if.else173: float = 0.4999985695, int = 8388592

My patch:
float-to-int: min = 0.0000004768367035, max = 9223345648592486401.0, factor = 0.0000000004656626195

block-frequency-info: t
 - entry: float = 1.0, int = 1
 - if.then: float = 0.0000009536743164, int = 1
 - if.else: float = 0.9999990463, int = 1
 - if.then15: float = 0.0000009536734069, int = 1
 - if.else18: float = 0.9999980927, int = 1
 - if.then102: float = 0.0000009536734069, int = 1
 - cond.true10.i: float = 0.0000004768367035, int = 1
 - t.exit: float = 0.0000009536734069, int = 1
 - if.then115: float = 0.4999985695, int = 1
 - if.else145: float = 0.2499992847, int = 1
 - if.else163: float = 0.2499992847, int = 1
 - while.body172: float = 9223345648592486401.0, int = 4294967295
 - if.else173: float = 0.4999985695, int = 1

The scaling factor is so minuscule that I end up squashing every "low"
frequency to 1. I think I need to smooth this better. In the meantime,
I wanted to pick your brain. Maybe I'm completely off-base in my
analysis.
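A quick back-of-the-envelope check in Python, using only the numbers in
the "My patch" dump, is consistent with the squashing reading. The
spread between Min and Max is about 84 bits, far more than the 32 bits
of the target range, so no single linear factor keeps both ends
distinct. With factor = UINT32_MAX / Max, any block below
Max / UINT32_MAX (about 2.1e9) scales below 1. That such blocks are then
clamped to 1 is an assumption read off the dump, not taken from the
LLVM source.

```python
import math

UINT32_MAX = 2**32 - 1

# Min and Max from the "My patch" dump.
min_freq = 4.768367035e-7
max_freq = 9223345648592486401.0

# Bits of dynamic range a linear scale would need to keep Min and Max apart.
spread_bits = math.log2(max_freq / min_freq)

# With factor = UINT32_MAX / Max, anything below this scales to < 1.
squash_threshold = max_freq / UINT32_MAX

# Floating-point block frequencies copied from the dump.
patch_freqs = {
    "entry": 1.0,
    "if.then": 0.0000009536743164,
    "if.else": 0.9999990463,
    "if.then15": 0.0000009536734069,
    "if.else18": 0.9999980927,
    "if.then102": 0.0000009536734069,
    "cond.true10.i": 0.0000004768367035,
    "t.exit": 0.0000009536734069,
    "if.then115": 0.4999985695,
    "if.else145": 0.2499992847,
    "if.else163": 0.2499992847,
    "while.body172": 9223345648592486401.0,
    "if.else173": 0.4999985695,
}

squashed = [name for name, f in patch_freqs.items() if f < squash_threshold]
print(math.ceil(spread_bits))  # 84
print(len(squashed), "of", len(patch_freqs))  # 12 of 13
```

Every block except while.body172 falls below the threshold, which
matches the column of 1s in the dump.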

Thanks.  Diego.
-------------- next part --------------
A non-text attachment was scrubbed...
Name: 0001-Remove-4-096-loop-scale-limitation.patch
Type: application/octet-stream
Size: 11213 bytes
Desc: not available
URL: <lists.llvm.org/pipermail/llvm-dev/attachments/20150327/29ddc958/attachment.obj>