Quentin Colombet via llvm-dev
2021-Jul-01 17:55 UTC
[llvm-dev] Problems with subreg-liveness and Greedy RA
Hi Nemanja,
Do you have something I could run on my side?
I’d like to see how/when we create %296, and the dump doesn’t have that information:
```
selectOrSplit VSRpRC:%39 [536r,6168B:0) 0@536r L0000000000000002 [536r,6168B:0) 0@536r L0000000000000040 [536r,536d:0) 0@536r weight:7.122666e+05 w=7.122666e+05
RS_Split Cascade 11
Analyze counted 3 instrs in 2 blocks, through 0 blocks.
Cost of isolating all blocks = 2147483648.8
$vsrp0 no positive bundles
$vsrp1 no positive bundles
$vsrp2 no positive bundles
$vsrp3 no positive bundles
$vsrp4 no positive bundles
$vsrp5 no positive bundles
$vsrp6 no positive bundles
$vsrp17 no positive bundles
$vsrp18 no positive bundles
$vsrp16 no positive bundles
$vsrp19 no positive bundles
$vsrp20 no positive bundles
$vsrp21 no positive bundles
$vsrp22 no positive bundles
$vsrp23 no positive bundles
$vsrp24 no positive bundles
$vsrp25 no positive bundles
$vsrp15 no positive bundles
$vsrp14 no positive bundles
$vsrp13 no positive bundles
$vsrp12 no positive bundles
$vsrp11 no positive bundles
$vsrp10 no positive bundles
$vsrp9 no positive bundles
$vsrp8 no positive bundles
$vsrp7 no positive bundles
$vsrp31 no positive bundles
$vsrp30 no positive bundles
$vsrp29 no positive bundles
$vsrp28 no positive bundles
$vsrp27 no positive bundles
$vsrp26 no positive bundles
enterIntvBefore 5296r: valno 0
leaveIntvAfter 5320r: valno 0
useIntv [5292r;5320r): [5292r;5320r):1
Multi-mapped complement 0 at 5316r for parent 0 at 536r hoist to %bb.2 5316r
Direct complement def at 536r
Removing 1 back-copies.
Removing 5316r undef %296.sub_64:vsrprc = COPY %39.sub_64:vsrprc. <---- THIS COPY
blit [536r,6168B:0): [536r;5292r)=0(%296)(recalc) [5292r;5320r)=1(%297)(recalc) [5320r;6168B)=0(%296)(recalc)
rewr %bb.1 536r:0 %296:vsrprc = LXVP 0, %33:g8rc_and_g8rc_nox0 :: (load 32 from constant-pool)
rewr %bb.2 5320B:1 %290:vsrc = contract nofpexcept XVMADDADP %290:vsrc(tied-def 0), %297.sub_vsx0:vsrprc, %31:vsrc, implicit $rm
rewr %bb.2 5296B:1 %289:vsrc = contract nofpexcept XVMADDADP %289:vsrc(tied-def 0), %297.sub_vsx0:vsrprc, %62:vsrc, implicit $rm
rewr %bb.2 5292B:0 undef %297.sub_64:vsrprc = COPY %296.sub_64:vsrprc
queuing new interval: %296 [536r,6168B:0) 0@536r L0000000000000040 [536r,536d:0) 0@536r L0000000000000002 [536r,6168B:1) 0@x 1@536r weight:3.596946e+05
queuing new interval: %297 [5292r,5320r:0) 0@5292r L0000000000000002 [5292r,5320r:0) 0@5292r weight:1.520298e+07
```
Ideally, if you can file a bug, that would make the tracking simpler.
Cheers,
-Quentin
> On Jun 23, 2021, at 12:18 PM, Nemanja Ivanovic <nemanja.i.ibm at gmail.com> wrote:
>
> Sorry, it would appear that the dev list stripped my attachment. I have
> reduced the file using bugpoint and produced the output from that. Attaching
> it here.
>
> Nemanja
>
> On Wed, Jun 23, 2021 at 2:23 PM Nemanja Ivanovic <nemanja.i.ibm at gmail.com> wrote:
> Thank you so much for taking the time to answer, Quentin.
>
> The bad copies are definitely added by live range splitting. The issue
> seems to be the LaneBitmasks for the various subregisters. Honestly, I don't
> really know what the bits of the LaneBitmask values produced by TableGen are
> meant to represent, and I can't make any sense of them. They seem to lead the
> register allocator astray.
> Here are the LaneBitmasks from the register include file:
> static const LaneBitmask SubRegIndexLaneMaskTable[] = {
> LaneBitmask::getAll(),
> LaneBitmask(0x0000000000000001), // sub_32
> LaneBitmask(0x0000000000000002), // sub_64
> LaneBitmask(0x0000000000000004), // sub_eq
> LaneBitmask(0x0000000000000001), // sub_gp8_x0
> LaneBitmask(0x0000000000000200), // sub_gp8_x1
> LaneBitmask(0x0000000000000008), // sub_gt
> LaneBitmask(0x0000000000000010), // sub_lt
> LaneBitmask(0x0000000000000042), // sub_pair0
> LaneBitmask(0x0000000000000180), // sub_pair1
> LaneBitmask(0x0000000000000020), // sub_un
> LaneBitmask(0x0000000000000002), // sub_vsx0
> LaneBitmask(0x0000000000000040), // sub_vsx1
> LaneBitmask(0x0000000000000040), // sub_vsx1_then_sub_64
> LaneBitmask(0x0000000000000080), // sub_pair1_then_sub_64
> LaneBitmask(0x0000000000000080), // sub_pair1_then_sub_vsx0
> LaneBitmask(0x0000000000000100), // sub_pair1_then_sub_vsx1
> LaneBitmask(0x0000000000000100), // sub_pair1_then_sub_vsx1_then_sub_64
> LaneBitmask(0x0000000000000200), // sub_gp8_x1_then_sub_32
> };
>
> For example, what does it mean that the masks for sub_64 and sub_vsx0 are
> the same? The two subregisters certainly do not represent the same lanes in
> their respective registers. The sub_vsx0 subregister is the first VSX register
> in a VSX register pair, and each of the two subregisters of a VSX register
> pair (sub_vsx0, sub_vsx1) has its own scalar subregister (sub_64).
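
One possible reading of the table above, offered as an assumption rather than a statement of how TableGen assigns lanes: the masks behave like sets of lanes, where a composite index is the union of its parts and an index with only one tracked subregister ends up with that subregister's mask. The sketch below checks this reading against the quoted values using plain integers. The helper names `lanesOverlap` and `covers` are hypothetical and not part of LLVM.

```cpp
#include <cassert>
#include <cstdint>

// Lane masks copied from the SubRegIndexLaneMaskTable quoted above.
constexpr std::uint64_t sub_64 = 0x02;
constexpr std::uint64_t sub_vsx0 = 0x02;
constexpr std::uint64_t sub_vsx1 = 0x40;
constexpr std::uint64_t sub_pair0 = 0x42;
constexpr std::uint64_t sub_pair1 = 0x180;
constexpr std::uint64_t sub_pair1_then_sub_vsx0 = 0x80;
constexpr std::uint64_t sub_pair1_then_sub_vsx1 = 0x100;

// Hypothetical helper: two indices touch a common lane when their masks intersect.
constexpr bool lanesOverlap(std::uint64_t A, std::uint64_t B) {
  return (A & B) != 0;
}

// Hypothetical helper: A covers B when every lane set in B is also set in A.
constexpr bool covers(std::uint64_t A, std::uint64_t B) {
  return (A & B) == B;
}

// A pair's mask is the union of the masks of its two halves.
static_assert(sub_pair0 == (sub_vsx0 | sub_vsx1), "pair0 = vsx0 | vsx1");
static_assert(sub_pair1 == (sub_pair1_then_sub_vsx0 | sub_pair1_then_sub_vsx1),
              "pair1 = its vsx0 | its vsx1");

// sub_64 and sub_vsx0 are indistinguishable at the lane-mask level.
static_assert(sub_64 == sub_vsx0, "sub_64 and sub_vsx0 share one lane");

// The two halves of a pair never share a lane.
static_assert(!lanesOverlap(sub_vsx0, sub_vsx1), "vsx0 and vsx1 are disjoint");
```

Under this reading, identical masks for sub_64 and sub_vsx0 would only say that liveness is tracked at the same granularity for both, not that the two cover the same number of bits.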
>
> I have also attached the output of RA, but it is huge :(
> It is the result of specifying options -debug-only=regalloc
> -print-before=greedy -print-after=greedy on the command line.
>
> On Tue, Jun 22, 2021 at 3:21 PM Quentin Colombet <qcolombet at apple.com> wrote:
>
>
>> On Jun 21, 2021, at 10:05 AM, Nemanja Ivanovic via llvm-dev <llvm-dev at lists.llvm.org> wrote:
>>
>> I am having a really difficult time with subregister related issues when I turn
>> on subregister liveness tracking.
>>
>> Before RA:
>> 79760B %2216:vsrc = LXVDSX %5551:g8rc_and_g8rc_nox0, %2215:g8rc :: (load 8 from %ir.scevgep1857.cast, !alias.scope !92, !noalias !93)
>> 79872B %2225:vsrprc = LXVP 352, %661:g8rc_and_g8rc_nox0
>> 84328B %5540:vsrc = contract nofpexcept XVMADDADP %5540:vsrc(tied-def 0), %2225.sub_vsx0:vsrprc, %2216:vsrc, implicit $rm
>>
>> After RA (greedy):
>> 79744B %2214:vsrc = LXVDSX %5551:g8rc_and_g8rc_nox0, %6477:g8rc :: (load 8 from %ir.scevgep1860.cast, !alias.scope !92, !noalias !93)
>> 79872B %7503:vsrprc = LXVP 352, %661:g8rc_and_g8rc_nox0
>> 80248B %7527:vsrprc = COPY %7503:vsrprc
>> 80988B undef %7526.sub_64:vsrprc = COPY %7527.sub_64:vsrprc
>> 84324B undef %7501.sub_64:vsrprc = COPY %7526.sub_64:vsrprc
>> 84328B %5546:vsrc = contract nofpexcept XVMADDADP %5546:vsrc(tied-def 0), %7501.sub_vsx0:vsrprc, %2214:vsrc, implicit $rm
>>
>> Subregister definitions for PPC:
>> def sub_64 : SubRegIndex<64>;
>> def sub_vsx0 : SubRegIndex<128>;
>> def sub_vsx1 : SubRegIndex<128, 128>;
>> def sub_pair0 : SubRegIndex<256>;
>> def sub_pair1 : SubRegIndex<256, 256>;
>>
>> So the instruction at 84328B uses the full register %2216 and the high order
>> 128 bits of (256-bit) register %2225. However, the register allocator splits
>> the live range and introduces a copy of the high order 64 bits of that 256-bit
>> register, then another copy of that copy and rewrites the use in instruction
>> 84328B to that copy. The copy is marked undef so the register allocator
>> assigns just some random register to the use of that copy in 84328B.
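
A minimal sketch of how equal lane masks could lead to the 64-bit copy described above. It rests on an assumption that is a guess and not taken from LLVM's splitting code: that the subregister index for the inserted copy is chosen as the first index whose lane mask exactly equals the lanes that need copying. `SubRegIdx`, `ppcIndices`, and `firstExactMatch` are hypothetical names. Sizes and offsets come from the TableGen definitions above, and masks come from the SubRegIndexLaneMaskTable quoted in this thread.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// One subregister index: size and offset in bits plus its lane mask.
struct SubRegIdx {
  std::string Name;
  unsigned SizeInBits;
  unsigned OffsetInBits;
  std::uint64_t LaneMask;
};

// A subset of the PPC indices, listed in the order of the generated table,
// so sub_64 appears before sub_vsx0.
inline std::vector<SubRegIdx> ppcIndices() {
  return {
      {"sub_64", 64, 0, 0x02},
      {"sub_vsx0", 128, 0, 0x02},
      {"sub_vsx1", 128, 128, 0x40},
  };
}

// ASSUMPTION (hypothetical selection strategy): return the first index whose
// lane mask is exactly the set of lanes that must be copied, or nullptr.
inline const SubRegIdx *firstExactMatch(const std::vector<SubRegIdx> &Table,
                                        std::uint64_t NeededLanes) {
  for (const SubRegIdx &Idx : Table)
    if (Idx.LaneMask == NeededLanes)
      return &Idx;
  return nullptr;
}
```

If the use of sub_vsx0 needs lanes 0x02, this search stops at sub_64, which is only 64 bits wide. That would match the `undef %7501.sub_64:vsrprc = COPY %7526.sub_64:vsrprc` seen after RA, with the remaining 64 bits of sub_vsx0 left uncopied.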
>>
>> Or maybe I am completely misinterpreting the meaning of the debug dumps
>> from the register allocator.
>>
>> This appears to be related to lane masks and dead lane detection although
>> I don't see dead lane detection marking anything unexpected as undef (seems
>> to just be INSERT_SUBREG and PHI).
>
> Are the copies added by dead lane detection or by live-range splitting?
>
> The undef flag on the definition of %7501 is suspicious and depending on
> how you look at it, so is the one on %7526. Essentially, we are losing the full
> copy in this chain of copies and I wonder what is at fault here.
>
> Could you share the debug output of regalloc?
>
>>
>> If anyone has suggestions on what might be the issue and/or how to go
>> about figuring this out and fixing it, I would really appreciate it.
>>
>> Nemanja
>> _______________________________________________
>> LLVM Developers mailing list
>> llvm-dev at lists.llvm.org
>> https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
>
> <ra-before-after-debug.txt>
Nemanja Ivanovic via llvm-dev
2021-Jul-06 23:58 UTC
[llvm-dev] Problems with subreg-liveness and Greedy RA
Hi Quentin,
I am really sorry for the delay. I am working on getting a reproducer for this
and once I have something manageable, I'll open a bug.
Thanks for all your help,
Nemanja