Tim Northover via llvm-dev
2017-Jul-19  21:08 UTC
[llvm-dev] GlobalISel legalization guarantees
Hi all,
I've been thinking about what guarantees the legalization pass needs
to make in GlobalISel and how it should do it. Turns out it's really
quite difficult so I thought I'd send a message out to the list with
the current status and my thoughts for the future.
It got really quite long unfortunately, so the important bit is
probably the second section "High level solutions" on scary changes
that I can't see a way around.
Does anyone else either see issues I've missed with guaranteeing
legalization, or see simpler solutions (please please please)?
Cheers.
Tim.
=========Background
=========
By some means we need the legalizer to be able to completely remove all illegal
types (a nebulous concept, but it should be obvious that you don't want to
try
RegBankSelect or ISel on a s1234). However, each stepwise change of the
legalizer has the input and output VRegs of an instruction as its interface so
it cannot remove these types.
So, the bulk of legalization for larger types is expressed in terms of
G_MERGE_VALUES and G_UNMERGE_VALUES instructions, with the intent that a final
combining pass will be run after the main legalization to remove matching pairs
and leave the MachineFunction legal.
However, it does an incomplete job right now.
In ~630 cases (in the test-suite at -O0) we have a G_UNMERGE_VALUES but it
can't
be matched up with G_MERGE_VALUES and removed:
600 G_UNMERGE hits a PHI
30  G_UNMERGE hits an AArch64 intrinsic (e.g. @llvm.aarch64.ld2)
On the other hand about 8000 G_MERGE_VALUES remain after the combining that
don't match a G_UNMERGE_VALUES:
5300 used by a G_STORE
3090 used by a COPY (to physregs or dead?).
60 used by a G_EXTRACT
15 are dead
There are also more or less theoretical issues that don't arise in the
test-suite but need to be resolved. In particular even when a G_MERGE_VALUES is
consumed by a G_UNMERGE_VALUES the input types of the merge may not be the same
as the outputs of the unmerge. The only way to resolve this conflict is by
inserting explicit extract/insert instructions which may or may not themselves
be legal (at least in principle).
===================High level solutions
===================
Unless anyone comes up with more bright ideas, I think we can see the following
complications before we get to the state where the legalizer has a hope of
replacing SDAG with a straight face:
1. G_MERGE_VALUES and G_UNMERGE_VALUES will not always be eliminated and must be
   mappable+selectable to complex multi-instruction, multi-register
   copy/inserts.
2. We must have a complex enough combiner in the legalizer pass alone to perform
   multi-step legalization intermixed with combines in the hope that we reach a
   fixed point.
Below is a bit of expanded detail on the issues found by the test-suite
above. I'm not quite sure how coherent it is, but it made sense to me so I
can
probably try to reword if anything is unclear.
========================================Expanded description of test-suite
issues
========================================
G_STORE and G_EXTRACT
---------------------
These are instances of a merge feeding legal instructions. I believe this has to
be permitted but it means that selection is going to contend with a wide range
of possible different merge instructions. At best all legal input and output
register classes must be handled.
On AArch64 the likely list is:
  + "mov vD.2d[0], dN; mov vD.2d[1], dM" (and s variants with 4
insts).
  + "mov vD.4s[0], wN, ..." (and d variants).
  + "ubfi xD, xN, #0, #32; ubfi xD, xM, #32, #32"
  + "COPY; COPY; COPY" or multiples of the above when stitching
together a QQQ.
Machine intrinsics
------------------
The versions of this problem I've seen come from code like this::
  {<2 x i32>, <2 x i32>, <2 x i32>} %val = call {...}
@llvm.aarch64.ld3(%ptr)
  %vec0 = extractelement {...} %val, 1
which becomes::
  s192 = G_INTRINSIC_W_SIDE_EFFECTS ...
  s64, s64, s64 = G_UNMERGE_VALUES s192
Because intrinsics have to have sane types (they're not really legalized
currently, even in SDag) this is very much like the issue with G_STORE and
G_EXTRACT above. We probably have to support RegBankSelect and ISel on this kind
of G_UNMERGE_VALUES, which becomes a tractable but complex sequence of copies.
PHIs
----
For example:
::
   s128 = PHI [%v0, BB#0], [%v1, BB#1]
   s64, s64 = G_UNMERGE_VALUES s128
I believe we have to split the PHI into sizes corresponding to the
G_UNMERGE_VALUES. If these correspond to the sizes of the (expected) merges on
the other side that's not too bad::
  s64 = PHI [%v0.0, BB#0], [%v1.0, BB#1]
  s64 = PHI [%v0.1, BB#0], [%v1.1, BB#1]
On the other hand we also have the possibility of mismatching types here, with
the added fun that PHI instructions have to be the first in their block.
I believe we'll have to insert messy extract/insert code into a separate,
fallthrough block just before this one. The main question is whether it should
be done as part of the main legalization loop or in the post-processing pass.
Since we have to deal with illegal extracts & inserts anyway, delaying PHIs
until everything else is decided seems reasonable to me.
Mismatching types in paired merge/unmerge
-----------------------------------------
Friendly example::
  s384 = G_MERGE_VALUES s192, s192
  s128, s128, s128 = G_UNMERGE_VALUES s384
Both of these instructions have to go (s384 is likely horribly illegal), but
they can't simply be combined away. The first thing to try is converting to
extracts & inserts::
  s128 = G_EXTRACT s192, 0
  s64 = G_EXTRACT s192, 128
  s64 = G_EXTRACT s192, 0
  s128 = G_IMPLICIT_DEF
  s128 = G_INSERT s128, s64, 0
  s128 = G_INSERT s128, s64, 64
  s128 = G_EXTRACT s192, 64
The good side of this is that I believe it is capable of eliminating the bad
s384 type. The bad side is that we have no reason to expect the actual extract
and insert operations to be legal. Each one potentially corresponds to some
reasonably complex shifting and masking across multiple registers.
To get around this we could either demand that extract and insert **are** legal
whenever a type has an available register class, or attempt to legalize them in
turn (leading to more merges and a fixed-point at the end if we're lucky).
Used by COPY
------------
Too many to be sure what's going on, but a sampling suggests most are either
ending up in PhysRegs (fine) or dead.
A merge being used by dead COPYs could be a problem.
Dead instructions
-----------------
Not a concern. Delete them if we want, or maybe ignore them.
Quentin Colombet via llvm-dev
2017-Jul-19  21:40 UTC
[llvm-dev] GlobalISel legalization guarantees
Hi Tim,
Thanks for writing that up.
The way I initially envisioned it could be summarized in the following pseudo
algorithm:
List IllegalOps = gatherAllIllegalOps;
List LegalizationArtifacts;
do {
  Assert(LegalizationArtifacts.empty());
  // Do the bulk of legalization.
  While (IllegalOps.empty()) {
    Instr I = IllegalOps.pop_back_val();
    // Legalizing an instruction may create some (less) illegal instructions and
    // LegalizationArtifact like trunc, anyext, merge, unmerge.
    Legalize(I, &IllegalOps, &LegalizationArtifacts);
  }
  // Clean-ups the artifacts.
  While (LegalizationArtifacts.empty()) {
     Instr I = LegalizationArtifacts.pop_back_val();
     // An artifact may not be cleaned up, in that case, it needs to be legalize
     CleanUpArtifact(I, &LegalizationArtifacts, &IllegalOps);
  }
} while(!IllegalOps.empty());
So roughly, legalize, then clean-up (via combine), and legalize again what’s
left.
The clean-up part should move the artifacts (unmerge, anyext, and so on) past
PHIs.
In the end, it should remain only things that are fed via legal instructions or
fed legal instructions. Thus the set of things to legalize (via loads and
stores, if the target wants something fancier, it custom lowers it) is much
narrower and somewhat expected by the target.
> On Jul 19, 2017, at 2:08 PM, Tim Northover via llvm-dev <llvm-dev at
lists.llvm.org> wrote:
> 
> Hi all,
> 
> I've been thinking about what guarantees the legalization pass needs
> to make in GlobalISel and how it should do it. Turns out it's really
> quite difficult so I thought I'd send a message out to the list with
> the current status and my thoughts for the future.
> 
> It got really quite long unfortunately, so the important bit is
> probably the second section "High level solutions" on scary
changes
> that I can't see a way around.
> 
> Does anyone else either see issues I've missed with guaranteeing
> legalization, or see simpler solutions (please please please)?
> 
> Cheers.
> 
> Tim.
> 
> =========> Background
> =========> 
> By some means we need the legalizer to be able to completely remove all
illegal
> types (a nebulous concept, but it should be obvious that you don't want
to try
> RegBankSelect or ISel on a s1234). However, each stepwise change of the
> legalizer has the input and output VRegs of an instruction as its interface
so
> it cannot remove these types.
> 
> So, the bulk of legalization for larger types is expressed in terms of
> G_MERGE_VALUES and G_UNMERGE_VALUES instructions, with the intent that a
final
> combining pass will be run after the main legalization to remove matching
pairs
> and leave the MachineFunction legal.
> 
> However, it does an incomplete job right now.
> 
> In ~630 cases (in the test-suite at -O0) we have a G_UNMERGE_VALUES but it
can't
> be matched up with G_MERGE_VALUES and removed:
> 
> 600 G_UNMERGE hits a PHI
That’s fine, we should push them around.
> 30  G_UNMERGE hits an AArch64 intrinsic (e.g. @llvm.aarch64.ld2)
What does this mean for the lowering?
> 
> On the other hand about 8000 G_MERGE_VALUES remain after the combining that
> don't match a G_UNMERGE_VALUES:
> 
> 5300 used by a G_STORE
> 3090 used by a COPY (to physregs or dead?).
> 60 used by a G_EXTRACT
> 15 are dead
What is the total #instrs?
I’d like to have a sense of how big that is.
Also, how much of that could be avoided by splitting the structures up-front? (I
expect this is what the G_STOREs come from.)
Why can’t we push the unmerge to the G_STORE/COPY by splitting them?
> 
> There are also more or less theoretical issues that don't arise in the
> test-suite but need to be resolved. In particular even when a
G_MERGE_VALUES is
> consumed by a G_UNMERGE_VALUES the input types of the merge may not be the
same
> as the outputs of the unmerge. The only way to resolve this conflict is by
> inserting explicit extract/insert instructions which may or may not
themselves
> be legal (at least in principle).
> 
> ===================> High level solutions
> ===================> 
> Unless anyone comes up with more bright ideas, I think we can see the
following
> complications before we get to the state where the legalizer has a hope of
> replacing SDAG with a straight face:
> 
> 1. G_MERGE_VALUES and G_UNMERGE_VALUES will not always be eliminated and
must be
>   mappable+selectable to complex multi-instruction, multi-register
>   copy/inserts.
That’s expected, and after the combines/clean-ups are done, the generic code
should go with loads and stores.
We need to understand why we get so many though.
> 2. We must have a complex enough combiner in the legalizer pass alone to
perform
>   multi-step legalization intermixed with combines in the hope that we
reach a
>   fixed point.
Yup, that was the plan.
We should already support multi-step legalization, right?
> 
> Below is a bit of expanded detail on the issues found by the test-suite
> above. I'm not quite sure how coherent it is, but it made sense to me
so I can
> probably try to reword if anything is unclear.
> 
> ========================================> Expanded description of
test-suite issues
> ========================================> 
> G_STORE and G_EXTRACT
> ---------------------
> 
> These are instances of a merge feeding legal instructions. I believe this
has to
> be permitted but it means that selection is going to contend with a wide
range
> of possible different merge instructions. At best all legal input and
output
> register classes must be handled.
> 
> On AArch64 the likely list is:
> 
>  + "mov vD.2d[0], dN; mov vD.2d[1], dM" (and s variants with 4
insts).
>  + "mov vD.4s[0], wN, ..." (and d variants).
>  + "ubfi xD, xN, #0, #32; ubfi xD, xM, #32, #32"
>  + "COPY; COPY; COPY" or multiples of the above when stitching
together a QQQ.
> 
> Machine intrinsics
> ------------------
> 
> The versions of this problem I've seen come from code like this::
> 
>  {<2 x i32>, <2 x i32>, <2 x i32>} %val = call {...}
@llvm.aarch64.ld3(%ptr)
>  %vec0 = extractelement {...} %val, 1
> 
> which becomes::
> 
>  s192 = G_INTRINSIC_W_SIDE_EFFECTS …
Isn’t this lower into something like:
Vreg1, vreg2, vreg3 by call lowering.
>  s64, s64, s64 = G_UNMERGE_VALUES s192
> 
> Because intrinsics have to have sane types (they're not really
legalized
> currently, even in SDag) this is very much like the issue with G_STORE and
> G_EXTRACT above. We probably have to support RegBankSelect and ISel on this
kind
> of G_UNMERGE_VALUES, which becomes a tractable but complex sequence of
copies.
If that’s legal, it should indeed by supported by RBS and ISel. If not, then the
legalizer should lower them.
> 
> PHIs
> ----
> 
> For example:
> 
> ::
>   s128 = PHI [%v0, BB#0], [%v1, BB#1]
>   s64, s64 = G_UNMERGE_VALUES s128
> 
> I believe we have to split the PHI into sizes corresponding to the
> G_UNMERGE_VALUES. If these correspond to the sizes of the (expected) merges
on
> the other side that's not too bad::
> 
>  s64 = PHI [%v0.0, BB#0], [%v1.0, BB#1]
>  s64 = PHI [%v0.1, BB#0], [%v1.1, BB#1]
Yep.
> 
> On the other hand we also have the possibility of mismatching types here,
with
> the added fun that PHI instructions have to be the first in their block.
What’s the problem here.
> 
> I believe we'll have to insert messy extract/insert code into a
separate,
> fallthrough block just before this one. The main question is whether it
should
> be done as part of the main legalization loop or in the post-processing
pass.
Clean-up part.
> 
> Since we have to deal with illegal extracts & inserts anyway, delaying
PHIs
> until everything else is decided seems reasonable to me.
> 
> Mismatching types in paired merge/unmerge
> -----------------------------------------
> 
> Friendly example::
> 
>  s384 = G_MERGE_VALUES s192, s192
>  s128, s128, s128 = G_UNMERGE_VALUES s384
> 
> Both of these instructions have to go (s384 is likely horribly illegal),
but
> they can't simply be combined away. The first thing to try is
converting to
> extracts & inserts::
> 
>  s128 = G_EXTRACT s192, 0
> 
>  s64 = G_EXTRACT s192, 128
>  s64 = G_EXTRACT s192, 0
>  s128 = G_IMPLICIT_DEF
>  s128 = G_INSERT s128, s64, 0
>  s128 = G_INSERT s128, s64, 64
> 
>  s128 = G_EXTRACT s192, 64
I would go with just loads and stores:
Store s192
Store s192
s128 = load
s128 = load
s128 = load
> 
> The good side of this is that I believe it is capable of eliminating the
bad
> s384 type. The bad side is that we have no reason to expect the actual
extract
> and insert operations to be legal. Each one potentially corresponds to some
> reasonably complex shifting and masking across multiple registers.
> 
> To get around this we could either demand that extract and insert **are**
legal
> whenever a type has an available register class, or attempt to legalize
them in
> turn (leading to more merges and a fixed-point at the end if we're
lucky).
The only way I can see we end up in this situation is when the input IR has
already weird (struct) bit cast. At this point, I’d say garbage in, garbage out.
Hope this helps.
Cheers,
-Quentin
> 
> Used by COPY
> ------------
> 
> Too many to be sure what's going on, but a sampling suggests most are
either
> ending up in PhysRegs (fine) or dead.
> 
> A merge being used by dead COPYs could be a problem.
> 
> 
> Dead instructions
> -----------------
> 
> Not a concern. Delete them if we want, or maybe ignore them.
> _______________________________________________
> LLVM Developers mailing list
> llvm-dev at lists.llvm.org
> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
Friedman, Eli via llvm-dev
2017-Jul-19  22:12 UTC
[llvm-dev] GlobalISel legalization guarantees
On 7/19/2017 2:08 PM, Tim Northover via llvm-dev wrote:> ========================================> Expanded description of test-suite issues > ========================================> > G_STORE and G_EXTRACT > --------------------- > > These are instances of a merge feeding legal instructions. I believe this has to > be permitted but it means that selection is going to contend with a wide range > of possible different merge instructions. At best all legal input and output > register classes must be handled. > > On AArch64 the likely list is: > > + "mov vD.2d[0], dN; mov vD.2d[1], dM" (and s variants with 4 insts). > + "mov vD.4s[0], wN, ..." (and d variants). > + "ubfi xD, xN, #0, #32; ubfi xD, xM, #32, #32" > + "COPY; COPY; COPY" or multiples of the above when stitching together a QQQ.You can use a stack temporary for all of these, and worry about optimizing it later.> > Machine intrinsics > ------------------ > > The versions of this problem I've seen come from code like this:: > > {<2 x i32>, <2 x i32>, <2 x i32>} %val = call {...} @llvm.aarch64.ld3(%ptr) > %vec0 = extractelement {...} %val, 1 > > which becomes:: > > s192 = G_INTRINSIC_W_SIDE_EFFECTS ... > s64, s64, s64 = G_UNMERGE_VALUES s192Instead of keeping this form through legalization, could we just fix the intrinsic to return its natural type? -Eli -- Employee of Qualcomm Innovation Center, Inc. Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, a Linux Foundation Collaborative Project