thr3ads.net - llvm dev - [llvm-dev] [RFC] Tablegen-erated GlobalISel Combine Rules [Nov 2018]

If this information is useful, please help other people find it:
Share via:

Daniel Sanders via llvm-dev

2018-Nov-12 23:13 UTC

[llvm-dev] [RFC] Tablegen-erated GlobalISel Combine Rules

> On Nov 10, 2018, at 03:28, Nicolai Hähnle <nhaehnle at gmail.com>
wrote:
> 
> Thank you for the detailed reply! There's a lot to digest :)  Let me
try to address most of it.
> 
> 
> [snip]
>>> I also think you should have 'ins' and 'outs'
separately; after all, a predicate may have to do a combined check on two
matched registers / operands, and produce one or more values for later re-use.
>>> 
>>> Once you have separate 'ins' and 'outs', the
"bool" in there seems a bit redundant.
>> I think there's three kinds values involved in the predicates. The
first is the inputs like these values come from other parts of the match and it
makes sense that they belong in 'ins'. The second is values that are
directly related to the truthiness of the predicate such as bool, unsigned
(register number), MachineInstr*, etc.. These are the result of the underlying
function and there can only be one of these per predicate. The type can
potentially be std::pair or std::tuple to give multiple results provided a
suitable conversion to bool is provided. The third is metadata that's just
carrying additional information to the apply. These are currently in the
'ins' section and I agree that they don't really make sense there. I
can pull them out into a separate 'outs' section. This has a flexibility
cost since the user can no longer specify the complete argument order but
that's probably a good thing for readability.
>> Here's an example of a non-bool result:
>>     def reg_with_shift : GIMatchPredicate<
>>          (ins reg:$A), (outs uint64_t:$shift), MachineOperandPtr, [{
>>       // MachineOperand *matchARMShiftedOp2(const MachineOperand &,
uint64_t);
>>       return matchARMShiftedOp2(${A}, ${imm});
>>     }]>;
>>     def : GICombineRule<
>>       (defs root:$D, reg:$S1, reg_with_shift:$S2),
>>       (match [{MIR %D = G_ADD %S1, %S2 }]),
>>       (apply [{MIR %D = ADD %S1, %S2, %S2.shift }])>;
>> (I've just realized there's a gap in the ability to name
metadata when used in this way. I've gone with 'S2.shift' for now)
>> This is equivalent to:
>>     def reg_with_shift : GIMatchPredicate<
>>          (ins reg:$A), (outs ptr_to_reg:$reg, imm:$shift), bool, [{
>>       // bool matchARMShiftedOp2(const MachineOperand &,
MachineOperand*&, uint64_t&);
>>       return matchARMShiftedOp2(${A}, ${reg}, ${imm});
>>     }]>;
>>     def : GICombineRule<
>>       (defs root:$D, reg:$S1, reg_with_shift:$S1),
>>       (match [{MIR %D = G_ADD %S1, %S2.reg }]),
>>       (apply [{MIR %D = ADD %S1, %S2, %S2.shift }])>;
>> but the first definition slightly more efficient for most targets since
there's usually a register or two for return values which in this version
we're spending solely on a redundant bool and requires fewer argument
registers, the second version has to store the MachineOperand* on the stack to
pass it by reference. The difference is fairly small when considered on an
individual rule but should accumulate on large rulesets or large functions.
> 
> Okay, I see now where you're coming from, and the goal makes a lot of
sense to me. I need some help with the syntax and semantics though :)
> 
> First, just to clarify, the intention is for matchARMShiftedOp2 to match
something like "G_SHL %X, ${imm}", right? Why not express that
directly in MIR?
Going by
http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.dui0068b/CIHBEAGE.html
(which admittedly is old docs but it's unlikely to have changed), it needs
to be able to match the following:

%0 = G_CONSTANT iN ${imm} # Where imm is any immediate expressible using an
unsigned 8-bit immediate rotated by an even number of bits

%0 # I.e. just a plain vreg

%0 = G_CONSTANT iN ${imm} # Where imm is 1 to 32
%2 = G_[AL]SHR %1, %0

%0 = G_CONSTANT iN ${imm} # Where imm is 0 to 31
%2 = G_SHL %1, %0

%0 = G_CONSTANT iN ${imm} # Where imm is 1 to 31
%2 = G_ROR %1, %0 # We don't actually have a rotate right at the moment so
this would have to match the equivalent and/shift/or sequence

%0 = G_CONSTANT iN 1
%2 = G_ROR %1, %0 # We don't actually have a rotate right at the moment so
this would have to match the equivalent and/shift/or sequence
%3 = G_SEXT %2

%2 = G_[AL]SHR %0, %1

%2 = G_SHL %0, %1

%2 = G_ROR %0, %1 # We don't actually have a rotate right at the moment so
this would have to match the equivalent and/shift/or sequence

I don't think it's strictly necessary to escape into C++ for this case
(except for checking the immediates have the right value) as there's nothing
here that can't be done with patterns. Historically, I believe the reason
targets have taken this route in SelectionDAG is mainly down to the fact that
SelectionDAG patterns didn't provide a good means to factor out common
sub-patterns. Until earlier this year, PatFrags really only defined
abbreviations for common sub-patterns rather than defining multiple
alternatives. So a target like ARM would have had to duplicate patterns support
for every operation that supported it. In the case of ARM that's 153 (9
sub-patterns * 17 instructions supporting it) patterns. By escaping into C++ via
ComplexPattern, 17 would do the same job with no code duplication.

FWIW, in the long run I would like most cases of this to move to macros as the
tablegen pass ought to be smart enough to factor out the commonality. There will
be some that genuinely need the escape hatch but I'd like that to be kept to
a minimum as the more matching we can delegate to tablegen, the more we can do
interesting things like measure rule coverage, guided fuzzing of rules,
profiling the rules, dropping low frequency/profit rules from -O1/-O2, proving
correctness, etc. Conversely the more C++ there is, the more work is required to
implement those features.
> But for the sake of the argument I'll assume there's a good reason
for the escape to C++. The way I understood your original description of
GIMatchPredicate, its most generic and versatile form of use is as an explicit
entry in the (match) list. Like so:
> 
>  def reg_with_shift : GIMatchPredicate<
>      (ins reg:$A), (outs uint64_t:$shift), MachineOperandPtr, [{
>    // MachineOperand *matchARMShiftedOp2(const MachineOperand &,
>    //                                    uint64_t &);
>    return matchARMShiftedOp2(${A}, ${imm});
>  }]>;
>  def : GICombineRule<
>      (defs root:$D, reg:$S1, reg:$S2),
>      (match [{MIR %D = G_ADD %S1, %tmp }],
>             (reg_with_shift $tmp, $shift):$S2),
>      (apply [{MIR %D = ADD %S1, %S2, %shift }])>;
> 
> I believe that the kind of syntax you showed, in whatever form it
eventually ends up, should just be syntactic sugar. It would be de-sugared to
essentially what I've shown here, right?
Yes, that's right.
> So then the question becomes: what is the sugared syntax, if any?
> 
> I must say that I really don't like your first example, i.e. this one:
> 
>  def : GICombineRule<
>      (defs root:$D, reg:$S1, reg_with_shift:$S2),
>      (match [{MIR %D = G_ADD %S1, %S2 }]),
>      (apply [{MIR %D = ADD %S1, %S2, %S2.shift }])>;
> 
> The syntax very strongly suggests that $S2 will be assigned to the RHS
operand of the G_ADD, but there seems to be non-local magic which causes it to
be assigned something else instead -- unless I've misunderstood the purpose
of the matchARMShiftedOp2?
I see your point here. The matched operand and the result of the predicate
essentially have the same name at the moment. Something like:
 (apply [{MIR %D = ADD %S1, %S2.result, %S2.shift }])
would resolve the ambiguity but then 'S2.result' is a magically
generated name. We'll need to give the result of the predicate a proper name

I suppose we could treat the first 'outs' to mean the return type but I
think we ought to be more explicit than that and keep the return value separate
from the metadata. Something like:
 def reg_with_shift : GIMatchPredicate<
     (ins reg:$A), (outs bool), (metadata_out MachineOperandPtr:$result,
uint64_t:$shift), [{
   // bool matchARMShiftedOp2(const MachineOperand &, MachineOperand *&,
uint64_t &);
   return matchARMShiftedOp2(${A}, ${imm});
 }]>;
 def reg_with_shift : GIMatchPredicate<
     (ins reg:$A), (outs MachineOperandPtr:$result), (metadata_out
uint64_t:$shift), [{
   // MachineOperand *matchARMShiftedOp2(const MachineOperand &, uint64_t
&);
   return matchARMShiftedOp2(${A}, ${imm});
 }]>;
We'd also need something to make using the original %S2 an error as I'm
sure passing the operand being matched instead of the operand returned by the
predicate will be a common mistake otherwise:
	let ApplyCannotUseMatchedOperand = 1;
I'd also be inclined to make it a warning for this to be unset if the result
has a name.
>>>>     }]>;
>>>>     def extending_loads : GICombineRule<
>>>>       (defs root:$root, reg:$A,
extending_load_matchdata:$matchinfo),
>>>>       (match [{MIR %root = G_LOAD %A }],
>>>>              (extending_load_predicate root:$A,
>>>>                                        
extending_load_matchdata:$matchinfo)),
>>>>       (apply (exec [{
Helper.applyCombineExtendingLoads(${root}, ${matchinfo}); }],
>>>>                    reg:$root,
extending_load_matchdata:$matchinfo)>;
>>>> The GIDefMatchData declares a new type of data that can be
passed from the match to the apply. Tablegen is responsible for arranging for
appropriate storage during the Combine algorithm. The GIMatchPredicate declares
a C++ predicate that fills out the PreferredTuple (passed by reference) whenever
it returns true for a successful match. We could have made the predicate return
std::pair<bool, PreferredTuple> instead but that's less efficient (it
would be an sret return on many targets) and would require us to define the
truthiness (no examples of are in this email as I expect it to be a rare thing
to need) in order to act as a predicate. Normally, you'd feed this into a
(create_imm ...) or a (create_operand ...) in the apply section. However, in
this particular case the data being passed determines the entirety of the
replacement so we escape into arbitrary C++ instead and arrange for the
variables to be injected appropriately using the 'exec' operator.
>>> 
>>> Have you considered the possibility of defining custom `create`
actions for the apply, analogous to custom predicates?
>>> 
>>> In your description, it appears there is a fixed list of builtin
actions such as `create_imm`, `create_operand` (not really described anywhere?),
`exec` and the tentative debug info approach.
>>> 
>>> It seems valuable to be able to define one's own operator for
custom operations.
>> I didn't go into the mechanical definitions but create_imm is
really just a specialization of create_operand and is equivalent to
(create_operand [{ MachineOperand::CreateImm( <code> ) }]). The string
concatenation needed for that specialization currently doesn't work on the
code type but that seems easily fixable. I haven't defined one yet, but I
don't see any reason we couldn't define non-operands in a similar way.
> 
> Okay, that sounds good to me.
> 
> I'm fairly sure that concatenation on code types works, but it may be
the case that it currently requires some back-and-forth casts with the string
type. Anyway, like you say, should be easy to fix.
> 
> 
> [snip]
>>> Let's talk about using DAGs. I think I understand where the
temptation comes from to describe the MIR using code blocks, but I'm also
pretty sure that it's a mistake to do so.
>> The main motivation to move away from DAGs came from limitations in
DAGs and the general ugliness of the original DAG-based definitions. Once the
idea of using MIR came up, we liked the fact that it matched the existing
serialization format which makes it easy to turn a specific example from the
compiler output into a combine rule and avoid the need for another
representation for instructions. We also liked the way it dealt with the
difficult cases, MachineMemoryOperands, subregs, etc. and also that it
wasn't a new syntax or parser (although it make require modifications to the
existing one). It also looked like it be convenient for a tool like Alive
(https://www.cs.utah.edu/~regehr/papers/pldi15.pdf) although we didn't
really explore that particular thought further.
>> Ultimately, both representations are DAG's and it comes down to the
convenience of the syntax in specifying the rules we want. MIR is very
convenient because of it's familiarity and flexibility.
>> One of the bigger problems with using tablegen's DAG type is that
it doesn't deal with multi-result instructions very well. Every time this
has been raised on the list w.r.t SelectionDAG the solution has boiled down to
'use C++ instead' and it would be good to fix that so that things like
UADDO are representable. You can write a rule that matches something like divmod
at the top-level using the 'set' operator:
>>   (set $D1, $D2, (divmod $A, $B))
>> but as soon as it's not the top-level, it gets really ugly fast
even using pseudo-nodes:
>>   (set (outs $D1, $D2), (sext (result (G_DIVMOD $A, $B):$T, 0)),
>>                         (sext (result $T, 1)))
>> In this example, outs is a pseudo-node needed to distinguish the
results from the sinks of the DAG, and result is a pseudo-node that selects one
of the multiple results for use by the parent node.  Meanwhile the MIR for is:
>>   %0, %1 = G_DIVMOD %A, %B
>>   %D1 = G_SEXT %0
>>   %D2 = G_SEXT %1
>> which seems much clearer. It becomes a bigger problem when you start
wanting to match target specific instructions as multi-result becomes more
common later in the backend, especially post-isel. It's also a problem for
upside-down matches, inside-out, and multi-root matches since tablegen's dag
type requires a single common root node and can only draw edges in one
direction. If you want to have to root in the middle then you have to distort
the DAG to bring it to the root and mark up the reversed edges somehow
>> Moving the result inside the instruction operator helps with these
problems but requires a list of dags and all the temporaries need to be named:
>>   [(G_DIVMOD $T1, $T2, $A, $B),
>>    (G_SEXT $D1, $T1),
>>    (G_SEXT $D2, $T2)]
>> and at that point, we're very nearly at MIR.
> 
> Yes, that was kind of the point. Expressing MIR, but in a way that is
native to TableGen :)
> 
> I should probably clarify that when I wrote "DAG", I meant that
only in the sense of the TableGen 'dag' type, which should probably just
be called "S-expression" instead of "DAG".
> 
> I absolutely believe that a list-of-instructions approach is better than
what the SDISel currently uses, for all the reasons you mentioned (multiple
defs, multiple roots).
Ah ok. I thought that you were aiming for SDISel style DAG's. :-)

At first glance, I think everything can fit into this style. For example,
matching could be something like:
  (G_DIVMOD $T1, $T2, $A, $B, debug-loc:$DL1)
  (DEBUG_VALUE $T1, debug-expr:$DE1)
  (DEBUG_VALUE $T2, debug-expr:$DE2)
and the corresponding apply:
  (DIVMOD $T1, $T2, $A, $B, debug-loc:$DL1)
  (DEBUG_VALUE $T1, (modify-expr debug-expr:$DE1, ...))
  (DEBUG_VALUE $T2, (modify-expr debug-expr:$DE2, ...))
macros would be the same as instructions, etc. I'll work through the
examples in my notes and see if I can find something reasonable for everything
in this style.

One potential nit, is that it it's a little odd for the defs to be on the
right hand side of the opcode. It's easy enough to change:
  (set $T1, $T2, (G_DIVMOD $A, $B))
but I think that's worse to read. I'd be inclined to go for everything
on the right and just label the defs:
  (G_DIVMOD def:$T1, def:$T2, use:$A, use:$B, debug-loc:$DL1)
> I simply think using the 'dag' type in TableGen is a much better
approach because it composes with other TableGen frontend features (e.g., all
the substitution rules "just work", you can programmatically compose a
dag/sexpr from multiple pieces using !con, etc.).
That makes sense to me.
>> Another issue we didn't like with DAG is that you also end up
needing a lot of pseudo-nodes like EXTRACT_SUBREG whereas these are natural
parts of the MIR syntax. For example:
>> (set $d, (EXTRACT_SUBREG (MYTGT_ADD $A, $B), sub_lo))
>> compared to:
>> %d:sub_lo = MYTGT_ADD %A, %B
>> and:
>> (set $d, (REG_SEQUENCE GPR32, (MYTGT_ADD $A, $B), sub_lo, (MYTGT_ADD
$C, $D), sub_hi))
>> compared to:
>> %d:sub_lo = MYTGT_ADD %A, %B
>> %d:sub_hi = MYTGT_ADD %C, %D
> 
> How about:
> 
>  (MYTGT_ADD (sub_lo $d), $A, $B),
>  (MYTGT_ADD (sub_hi $d), $C, $D),
> 
> That seems acceptable to me.
That looks good to me too.
>> Along the same lines, I also think that the integrated debug-info is
only really practical in MIR. It's possible to shoe-horn DILocation in using
a pseudo-node like so:
>>    (set $d, (add (mul $a, $b):$mul, $c):$add   -> (set
(merged_dilocation (muladd $a, $b, $c), $mul, $add)
>> but there isn't really a good place to modify the DIExpression used
by DEBUG_VALUE.
>>> One reason is that it adds yet another parser, which is more
maintenance burden without buying much.
>> The expectation is that we can make use of the existing MIR parser and
replace it's MachineInstr/MachineOperand/etc. generation code with
pattern-matching generation code. I'm going to be looking into the
implementation of that over the next few weeks.
>>> The more important reason is that DAGs compose better with the rest
of TableGen. Consider combine rules defined in multiclasses, for example. That
is very common all through LLVM. In order to mirror an existing selection rule
in the AMDGPU backend, we'd likely want to have something like this:
>>> 
>>>  multiclass Arithmetic_i16_GIPats<Instruction ginst,
>>>                                   Instruction inst> {
>>>    // This is probably wishful thinking -- would be okay if we had
to
>>>    // split this into two different rules due to the different
types
>>>    // of $dst
>>>    def : GICombineRule<
>>>      (defs root:$dst, operand:$src0, operand:$src1),
>>>      (oneof (ginst i16:$dst, i16:$src0, i16:$src1),
>>>             (match (ginst i16:$tmp, i16:$src0, i16:$src1),
>>>                    (G_ZEXT i32:$dst, $tmp))),
>>>      (inst $dst, $src0, $src1)>;
>>> 
>>>    // A third rule for zext to 64 bits would go here...
>>>  }
>> This is the reason that macros are explicitly imported in the defs
section and the imported instances are renameable. The same thing in the MIR
syntax (and using the changes to variable naming from above) would be:
>>  multiclass Arithmetic_i16_GIPats<Instruction ginst,
>>                                   Instruction inst> {
>>    def : GICombineRule<
>>      # There's a corner case for the double definition of $dst
here. I'll come back to that later in the mixing concerns comment
>>      (defs root:$dst, (import-macro ginst:$ginst, def:$dst, use:$src0,
use:$src1)),
>>      (match [{ %dst:(s16) = ginst %src0:(s16), %src1:(s16) }]), #
I've corrected the i16 to s16 here. i16 isn't a type in GlobalISel
>>      (apply /* I'll come back to this bit */)>;
>>    // Covers s32 and s64
>>    def : GICombineRule<
>>      (defs root:$dst, operand:$src0, operand:$src1, (import-macro
ginst:$ginst)),
>>      (match [{ %0:(s16) = ginst %src0:(s16), %src1:(s16)
>>                %dst = G_ZEXT %0:(s16) }]
>>             (isS32OrS64 type:$dst)),
>>      (apply /* I'll come back to this bit */)>;
>>  }
>> We could potentially fold the has G_ZEXT case and no G_ZEXT cases
together with a macro but it seems unnecessary complexity in this case. It might
be more worth it if there's a similar G_SEXT variant too.
>> The reason for the "I'll come back to this bit" in the
apply section is because you've raised an issue I'd forgotten to deal
with. The opcodes in the apply section should also be parameterizable. My first
thought on that is that there should be a macro-equivalent for apply such as:
>>  def : GIExpansion<
>>    (defs def:$R, use:$S, imm:$IDX),
>>    (apply (select imm:$IDX,
>>                (0 [{MIR %R = FOO %S }]),
>>                (1 [{MIR %R = BAR %S }]),
>>                (2 [{MIR %0 = BAZ %S
>>                         %R = BAR %0 }])))>;
>> or alternatively:
>>   def : GICombineRule<
>>     ...,
>>     ...,
>>     (apply (create_opcode [{ ${IDX} ? MYTGT_A : MYTGT_B }] ...):$name,
>>            [{MIR %D = name %s }])>;
>> The former is more powerful, but the latter is more convenient for
trivial cases
> 
> I admit that you've lost me a little here :)
> 
> Okay, I see how importing an instruction as a macro is possible without
string concatenation, although I have to say that having to express this
explicit import still feels quite a bit more cumbersome than the strawman syntax
I've shown using the 'dag' type.
> 
> Again, the 'dag' type just composes better with the rest of
TableGen.
With your reply above, I'm coming around to the dag type. I'll see how
the examples in my notes end up in that style.
> On the apply-side though, I don't understand where the difficulty comes
from. Each instantiation of the multiclass has a single `inst' which needs
to go into the apply rule. That shouldn't be so difficult to express -- at
least with the 'dag' type it's trivial.
Yeah, I went off on a tangent a bit :-).

For your example: Yes, there's only one inst for each multiclass
instantiation so it's simple.

For the tangent I was on, consider:
	%0 = G_CONSTANT iN %x
	%1 = G_CONSTANT iN %y
	%2 = G_LSHR %a, %x
	%3 = G_LSHR %1, %y
	=>
	%0 = G_CONSTANT iN %z # z = x+y
	%1 = G_LSHR %a, %0
and:
	%0 = G_CONSTANT iN %x
	%1 = G_CONSTANT iN %y
	%2 = G_ASHR %a, %x
	%3 = G_ASHR %1, %y
	=>
	%0 = G_CONSTANT iN %z # z = x+y
	%1 = G_ASHR %a, %0
and:
	%0 = G_CONSTANT iN %x
	%1 = G_CONSTANT iN %y
	%2 = G_SHL %a, %x
	%3 = G_SHL %1, %y
	=>
	%0 = G_CONSTANT iN %z # z = x+y
	%1 = G_SHL %a, %0

These rules are very similar so it would be nice to merge them like so:
	%0 = G_CONSTANT iN %x
	%1 = G_CONSTANT iN %y
	%2 = {G_SHL,G_ASHR,G_LSHR} %a, %x
	%3 = <same-shift-as-%2> %1, %y
	=>
	%0 = G_CONSTANT iN %z # z = x+y
	%1 = <same-shift-as-match> %a, %0
>>> Obviously you can achieve this in your proposal as well using
string concatenation, but that does seem rather backwards.
>>> 
>>> Something else that falls out rather nicely from using DAGs is that
it gives you a natural way to give a name to an _instruction_, to be able to
pass it to a predicate for extended checks (and as a potentially more natural
way of preserving debug locations!).
>> I don't think we need a means to name instructions since
instructions can be trivially found from of their results using
MachineOperand::getParent().
>> As noted above, merging debug locations doesn't fit very well into
DAG's as you have to use a pseudo node to be able to provide both
DILocations on the same result instruction, and there's no real means to
include DEBUG_LOC for temporaries that are eliminated and need to be rebased
from another value.
> 
> Hmm... I feel like we need more concrete examples for debug info to discuss
this. One of the reasons I've brought up naming instructions is precisely
because I thought it could simplify transferring DebugLocs.
> 
> Again, I'm not suggesting the use of a DAG like in SDSel, as I think
that's a mistake. I simply mean the 'dag' type of TableGen, which
should really be called 'sexpr' or something.
I think we can use the same idea you suggested for subregisters:
  (DIVMOD $T1, $T2, $A, $B, (merge_debug_loc $DL1, $DL2))
>>> I have a more explicit example of this later.
>>> 
>>> 
>>>> _'Upside-down' matches (i.e. roots at the top) and
similar_
>>>> This one requires algorithm changes which I'd prefer not to
discuss in this RFC. Assuming the underlying algorithm gains support for this,
this is how the syntax would look:
>>>> def : GICombineRule<
>>>>   (defs root:$root, reg:$A),
>>>>   (match [{MIR %1 = G_LOAD %root
>>>>                %A = G_SEXT %1 }]),
>>>>   (apply [{MIR %A = G_SEXTLOAD %root }])>;
>>>> The only unusual thing about this rule is that the root
isn't at the bottom. Instead of starting at a use and matching towards defs,
we're starting at the def and matching towards uses. This has some
potentially useful properties. The combine algorithm has to chose an insertion
point for the replacement code and the traditional choice has been the root of
the match. Assuming we keep doing the same thing, 'upside-down' matching
like this is able to avoid the checks that the load is safe to move, is
non-volatile, has one use, etc. that would be necessary if we moved the G_LOAD
down to the G_SEXT. Also, assuming we keep the same broadly bottom-up processing
order as the existing Combine algorithm this kind of rule also has relatively
lower priority than conventional rules because the root is seen later. This can
be useful as (algorithm-dependent) the MIR may be more settled by the time it
tries to match.
>>>> Along the same lines, the syntax can potentially support the
root being in the middle of a match. This could be used in a similar way to the
upside-down match above to control the insertion point and priority. For
example:
>>>> def : GICombineRule<
>>>>   (defs root:$root, reg:$A, reg:$B, reg:$C),
>>>>   (match [{MIR %1(s32) = G_TRUNC %A(s64)
>>>>                %2(s32) = G_TRUNC %B(s64)
>>>>                %root = G_ADD %1, %2
>>>>                %C(s64) = G_SEXT %root }]),
>>>>   (emit [{MIR %root = G_ADD %A, %B
>>>>               %C = G_SEXT_INREG %root }])>;
>>>> Unfortunately, I don't have any concrete examples where
this would be useful in comparison to a conventional or upside-down match at the
moment. I'm mostly keeping the door open as I can see potential for benefits
(mostly w.r.t sinking and hoisting safety around an instr that would be
difficult to test for that) given an appropriate rule and also because I'm
inclined to say that the tablegen syntax shouldn't be the reason it's
not possible. It should be up to the Combine algorithm and and tablegen-erate
pass involved in specializing the algorithm for a target.
>>> 
>>> I'm concerned that the concrete syntax proposed here mixes a
bunch of concerns that really should be addressed separately:
>>> 
>>> 1. Defining a mapping between semantically equivalent chunks of
code.
>>> 2. Controlling insertion points as a simple way to keep side
effects under control in many cases.
>>> 3. Controlling additional algorithmic aspects of the combine
algorithm (whether you're matching up or down, say).
>> I think there may be a misunderstanding with #3. There's no real
concept of matching up or down in the rules, there's only the starting point
for the match and from there whether it's following the uses or defs
isn't something the rule needs to be concerned about. The algorithm needs to
be aware of it (and historically hasn't been, DAGCombine broadly runs bottom
up and expects to match towards defs and can fail to re-schedule nodes for
re-visit in many common cases) but the algorithm is expected to figure that out
for itself.
>> 'root' definitions are the only place that #2 and #3 come into
the proposed syntax. At the moment, the only requirement the syntax imposes on
the algorithm is that it has a concept of a current MachineOperand Def* (and by
extension instruction) that will be used as the starting point for the match and
that it can choose a valid insertion point for each instruction produced. The
insertion point is the same position in the current algorithm but that's
algorithm dependent and doesn't have to be the case in the long run. There
may even be more than one for rules that emit multiple instructions in the apply
section. There's currently no means to specify an insertion point, it's
left up to the algorithm to find a place for each instruction.  That said,
I'd be surprised if there was an algorithm where the current MachineOperand
isn't also at least one of the insertion points.
>> * The current upstream code is a bit misleading about that and makes it
seem that it's the MachineInstr but I made that mistake before on ISel so
I'm intending to change that.
>>> Could we keep those separate somehow? E.g., don't distinguish
between 'root' and 'reg' or 'operand' in the (defs ...)
part of the rule; have TableGen figure out the roots automatically based on
walking the graph implied by the matching rule, and default to using the sink(s)
as roots.
>> operand isn't really connected to this, it's just a wildcard
for an MachineOperand that doesn't care whether it's a reg, imm, MCExpr,
or something else.
>> I'm inclined to agree that it would be nice to get 'root'
out of the definition but I don't really see a good way to do it. The sinks
aren't always the right choice and are a particularly bad choice on combines
like the extending loads since that's trying to match all uses
simultaneously and those uses could have any opcode.
> 
> How about having it as a lettable variable? As in (using my favorite
syntax, but it should translate either way):
> 
>  let MatchRoot = "$tmp" in
>  def : GICombineRule<
>      (match (G_LOAD $tmp, $ptr),
>             (G_SEXT $dst, $tmp)),
>      (apply (G_SEXTLOAD $dst, $ptr))>;
> 
> I'm using $tmp instead of $ptr as the root here since you wrote above
that it needs to be a Def.
> 
> Which makes the discussion about the insertion point potentially confusing,
since $tmp no longer appears in the (apply). But maybe that doesn't matter.
The insertion point is up to the algorithm anyway so it makes sense for it to
not be visible.
> If MatchRoot is not set, it'd just default to using a sink -- that
should be by far the most common case.
If we change that to all the sinks then we cover the common multi-root case too.
> Also, it'd be nice to get rid of the (defs) part of the syntax entirely
for brevity. I believe that should be possible.
I'm not very keen to remove the defs section but I'll see whether
it's still needed in this style. Even if it's removable though, having
it is likely to reduce the time spent on the tablegen pass as there's a cost
to having to deduce information. I also think it's also important for error
checking since it provides a means to declare which values you expect to see
both in the match/apply whereas it's expected that some are only matched.
>>> Then add some optional let-able variable in the GICombineRule to
define non-default rules for the combine algorithm.
>>> 
>>> Controlling the insertion point to address loads like in your
example above could be done quite naturally if the MIR is expressed in a DAG:
>>> 
>>>  def : GICombineRule<
>>>    (match (G_LOAD $tmp, $ptr):$load,
>>>           (G_SEXT $dst, $tmp)),
>>>    (apply (insertat $load),
>>>           (G_SEXTLOAD $dst, $ptr))>;
>> This isn't quite the same thing as the extending loads combine in
the example. In that combine, it's matching the load and all uses of its
result and chosing a replacement based on the global usage of that def across
the whole function, rewriting conflicting sext/zext/aext/trunc's to be as
cheap as possible (a lot of the detail of that is inside the function but the
code is upstream if you want to see the details). This rule is matching a single
use and relying on CSE to de-dupe the N G_SEXTLOAD's that come out of it. To
illustrate the difference, consider:
>> %0(s8) = G_LOAD %p
>> %d1(s16) = G_SEXT %0
>> %d2(s32) = G_SEXT %0
>> If I apply the above rule as much as possible to this (twice) with the
roots at the sink, I'll get:
>> %d1(s16) = G_SEXTLOAD %p
>> %d2(s32) = G_SEXTLOAD %p
>> whereas if I apply the extending loads combine to it (once), I'll
get:
>> %d2(s32) = G_SEXTLOAD %p
>> %d1(s16) = G_TRUNC %d2
>> which will normally be cheaper to execute. In combination with other
rules and CSE, the first version could eventually turn into the second version
but it requires more memory churn and processing time to get there.
> 
> You've lost me again, sorry. Where does the G_TRUNC suddenly come from?
Is that the rule that uses `Helper.matchCombineExtendingLoads`?
> 
> In that case, don't worry, I only intended my example to be comparable
to the simpler G_LOAD / G_SEXTLOAD rule without C++ escape that you showed in
your original mail as well.
Yes, it comes from the Helper.applyCombineExtendingLoads(...), which looking
back at the thread is from the macros section and not the upside down matching
section you were commenting on. Sorry about that
> If, on the other hand, you have a plan to get to the more advanced result
of matching multiple G_SEXTs and using G_TRUNC _without_ a C++ escape, I'd
be _very_ interested to learn more about that!
It's easy enough to match multiple uses in the current rule definitions. You
just describe each use in the match section and have a predicate check that the
operands aren't the same. However, C++ is the only way to look at all the
users of a given value and modify them differently at the moment. I've had
some vague thoughts about a (for-each-use ...) in the match and apply section
but I haven't given that much thought yet.
>> One of the things I'm planning to look into in the algorithm is
making it smarter about combines involving multiple-uses in the intermediate
operands. DAGCombine is harmfully greedy in this area and will often duplicate
expensive operations if one use is combinable but another isn't. This has
led to the addition of hasOneUse() checks which solves that but takes things to
the opposite extreme. I think it should be possible to find an algorithm that
makes an intelligent decision for defs with multiple uses.
> 
> Absolutely agreed that we need better solutions here.
> 
> In the AMDGPU backend, we have some rather interesting challenges around
those. This is perhaps a bit of a far future tangent, but one of them is that
floating point operations can optionally negate their operands; however, this
can be at the cost of a larger instruction encoding, and the rules for that are
rather complex. So there's a problem of optimizing *where* we insert the
negations to improve code size.
> 
> In a computation that is a tree with a single sink, an optimal solution can
be found in linear time using dynamic programming (maybe there's an easier
way, but I'm not aware of one). With general DAGs, the problem is of course
much harder.
> 
> The negation is just one example, and it would be pretty awesome if we
could use a declarative database of combines to make progress at the very least
in local search optimizations.
It certainly seems that something like this ought to be possible. A set of
combines could find the local minimums and it's also possible for combine
rules to avoid making changes and record the possible places for the negation
and the associated costs which ought to make it possible to reach the optimum
for the tree.

Along the same lines, it occurs to me that known-bits analysis is very similar
and could be done declaratively with rules like:
 def : GIKnownBitsRule<
     (match reg:$dst, reg:$src))
     (match (G_ZEXT $dst, $src)),
     (apply (exec [{ KnownZero.setBitsFrom(MRI.getType(${src}).getSizeInBits())
}], reg:$src))>;
> Thanks again for your detailed reply!
> Cheers,
> Nicolai
> 
> 
> 
>>> As you can see, this discussion also makes me wonder whether the
(defs) are needed at all; maybe we can rely on type inference almost everywhere?
>>> 
>>> 
>>>> _Multiple roots_
>>>> This one requires algorithm changes which I'd prefer not to
discuss in this RFC. Assuming the underlying algorithm gains support for this,
this is how the syntax would look:
>>>> def : GICombineRule<
>>>>   (defs root:$root1, root:$root2, reg:$A, reg:$B),
>>>>   (match [{MIR %root1 = G_ADD %A, %B }],
>>>>          [{MIR %root2 = G_SUB %A, %B }]),
>>>>   (emit [{MIR %root1, %root2 = BUTTERFLY %A, %B }])>;
>>>> This would match a G_ADD and G_SUB with operands in common and
combine them into a BUTTERFLY operation. You can think of this as two normal
rules, one with %root1 as the root and the other with %root2 as the root.
>>> 
>>> Again, it seems to me that it would be preferable if we could just
express this rule as:
>>> 
>>>  def : GICombineRule<
>>>    (match (G_ADD $dst1, $a, $b),
>>>           (G_SUB $dst2, $a, $b),
>>>           reg:$a, reg:$b),
>>>    (BUTTERFLY $dst1, $dst2, $a, $b)>;
>>> 
>>> or perhaps:
>>> 
>>>  def : GICombineRule<
>>>    (match (G_ADD $dst1, reg:$a, $b),
>>>           (G_SUB $dst2, $a, reg:$b)),
>>>    (BUTTERFLY $dst1, $dst2, $a, $b)>;
>>> 
>>> and similar variations.
>> Both of these would be ok when inferring the roots to be the sinks.
>>> TableGen should be able to figure out the multiple roots by walking
the dependency graph that is implied by the knowledge about which operands of
G_ADD and G_SUB are defs and uses, respectively.
>>> 
>>> To sum up, I really like the proposal overall, except I think it
would be a **major** improvement to use DAGs for expressing the MIR, and sure,
there's the odd detail here and there like about GIMacro.
>>> 
>>> Cheers,
>>> Nicolai
>> Thanks for all your comments!
>>>> _Grouping and Ordering Rules_
>>>> Combine rules are ordered and grouped using definitions of the
form:
>>>>     def trivial_combines : GICombineGroup<[copy_prop]>;
>>>>     def combines_for_extload:
GICombineGroup<[extending_loads]>;
>>>>     def all_combines : GICombineGroup<[trivial_combines,
combines_for_extload]>;
>>>> Essentially, we create a long ordered list of all the combines.
Tablegen is free to re-order rules so long as the resulting ruleset always
behaves as if it were in this specific order. So for example, the list
[sext_trunc_sext, zext_trunc_zext, sext_sext, zext_zexts] is free to re-order to
[sext_trunc_sext, sext_sext, zext_trunc_zext, zext_zexts] because the rules
involved are mutually exclusive due to the root opcode.
>>>> So why not make tablegen figure out the order like ISel does?
ISel attempts to figure out an order using an scoring system called Complexity
along with a hack (AddedComplexity) to allow the user to provide magic numbers
to fix the cases it got it wrong. The simplest way to confuse it is with
patterns like (add s32, complexpattern1) and (add s32, complexpattern2). These
have the same Complexity score but in truth, each one has a (possibly
overlapping) range of complexity depending on what C++ code is inside the
complexpattern's and which path through that C++ is dynamically taken. If it
matches nothing then the score should be 0 but if it matches a dozen nodes it
should be 12 (or possibly higher). We don't know which until we try to match
that specific pattern. Correctly figuring out an order in the presence of
complexpatterns is impossible. Similarly, it's also possible to confuse it
with patterns that differ but overlap and add up to the same complexity due to
quirks of the scoring system.
>>>> _Declaring Combine Passes_
>>>> Combiner passes are defined by subclassing GICombiner like so:
>>>>     def CommonPreLegalizerCombiner: GICombiner<
>>>>       "CommonGenPreLegalizerCombiner",
[all_combines]> {
>>>>       let DisableRuleOption =
"common-prelegalizercombiner-disable-rule";
>>>> let Modifiers = [(disable_rule copy_prop)]
>>>>     }
>>>> This causes tablegen to create a class named
'CommonPreLegalizerCombiner' which you can use to perform combines. This
combiner contains all the combines mentioned in the previous section because it
includes the 'all_combines' group. However, it disables the copy_prop
rule to prevent it from attempting to match. I'll discuss that a bit more
below. It also generates a command-line option for asserts builds only which can
be used to further disable rules at run-time which will be useful for tracking
down bugs or for testing a rule in isolation. I'm hoping that one day tools
like bugpoint will be able to search through the individual rules within a
combine pass when searching for a minimal reproducer.
>>>> The Modifiers field is intended to allow targets to modify an
existing ruleset (particularly a target independent one) with additional target
specific quirks. For example, one particular rule might be doing more harm than
good and should be disabled. Or maybe only a subset of the things it would
normally match are safe in which case an extra predicate should be tested. Being
able to make minor edits to the ruleset without taking on the whole maintenance
burden of the common code or causing code bloat by duplicating tables would be
useful for targets that are generally similar to the targets within LLVM but
have minor quirks.
>>>> Larger scale changes should take an alternate approach to
modifiers. It's expected that even targets that are very different from the
rest of the pack still have features in common. Such targets can declare their
own combiner to replace the common one but still generally make use of the
improvements made by the wider community. This is where GICombinerGroup will
start to shine as such a target can declare a combiner like so:
>>>> def MyTargetPreLegalizerCombiner: GICombiner<
>>>>       "MyTargetGenPreLegalizerCombiner",
>>>> [common_extend_optimizations,
>>>>        common_extending_loads,
>>>>        // common_rsqrt_and_nr_iterations, // This target has a
real sqrt operation
>>>>        mytarget_special_fma,
>>>>        common_fma,
>>>>        common_bswap_idioms,
>>>>        mytarget_bcd_arithmetic
>>>> ]> {
>>>>     }
>>>> As LLVM improves on the common_* groups, the target benefits
from those improvements automatically. However, it doesn't benefit from new
groups being added to common_all_combines because it's no longer using that
group so entirely new categories of combines added to it would not appear in the
combiner. It would still be nice to find out that a new category has appeared
though so that a decision can be made on it. To that end, I'm considering
adding:
>>>> let Verify = [(has_all_of_except all_combines,
common_rsqrt_and_nr_iterations)];
>>>> which would cause a warning if something new appeared in the
original group.
>>>> _Conclusion_
>>>> There is a lot of work that needs to be done to get all this
working and some of it may have to change once it runs into the reality of
implementation :-). However, we think that this will prove to be a very
convenient and powerful syntax with some potential a variety of tools from
profilers, to bug reducers, to correctness checking tools.
>>>> There is a patch at https://reviews.llvm.org/D54286 makes a
start on it to try out the general feel of the syntax but currently lacks the
core feature of generating match and apply code from the MIR. I'm going to
be looking into that over the next few weeks.
>>>> _______________________________________________
>>>> LLVM Developers mailing list
>>>> llvm-dev at lists.llvm.org <mailto:llvm-dev at
lists.llvm.org>
>>>> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
>>> 
>>> --
>>> Lerne, wie die Welt wirklich ist,
>>> Aber vergiss niemals, wie sie sein sollte.
> 
> -- 
> Lerne, wie die Welt wirklich ist,
> Aber vergiss niemals, wie sie sein sollte.
-------------- next part --------------
An HTML attachment was scrubbed...
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20181112/8b14cab0/attachment-0001.html>

Nicolai Hähnle via llvm-dev

2018-Nov-15 18:17 UTC

head link

[llvm-dev] [RFC] Tablegen-erated GlobalISel Combine Rules

On 13.11.18 00:13, Daniel Sanders wrote:>> On Nov 10, 2018, at 03:28, Nicolai Hähnle <nhaehnle at gmail.com 
>> <mailto:nhaehnle at gmail.com>> wrote:
>>
>> Thank you for the detailed reply! There's a lot to digest :)  Let
me
>> try to address most of it.
>>
>>
>> [snip]
>>>> I also think you should have 'ins' and 'outs'
separately; after all,
>>>> a predicate may have to do a combined check on two matched
registers
>>>> / operands, and produce one or more values for later re-use.
>>>>
>>>> Once you have separate 'ins' and 'outs', the
"bool" in there seems a
>>>> bit redundant.
>>> I think there's three kinds values involved in the predicates.
The
>>> first is the inputs like these values come from other parts of the 
>>> match and it makes sense that they belong in 'ins'. The
second is
>>> values that are directly related to the truthiness of the predicate
>>> such as bool, unsigned (register number), MachineInstr*, etc..
These
>>> are the result of the underlying function and there can only be one
>>> of these per predicate. The type can potentially be std::pair or 
>>> std::tuple to give multiple results provided a suitable conversion
to
>>> bool is provided. The third is metadata that's just carrying 
>>> additional information to the apply. These are currently in the
'ins'
>>> section and I agree that they don't really make sense there. I
can
>>> pull them out into a separate 'outs' section. This has a
flexibility
>>> cost since the user can no longer specify the complete argument
order
>>> but that's probably a good thing for readability.
>>> Here's an example of a non-bool result:
>>>     def reg_with_shift : GIMatchPredicate<
>>>          (ins reg:$A), (outs uint64_t:$shift), MachineOperandPtr,
[{
>>>       // MachineOperand *matchARMShiftedOp2(const MachineOperand
&,
>>> uint64_t);
>>>       return matchARMShiftedOp2(${A}, ${imm});
>>>     }]>;
>>>     def : GICombineRule<
>>>       (defs root:$D, reg:$S1, reg_with_shift:$S2),
>>>       (match [{MIR %D = G_ADD %S1, %S2 }]),
>>>       (apply [{MIR %D = ADD %S1, %S2, %S2.shift }])>;
>>> (I've just realized there's a gap in the ability to name
metadata
>>> when used in this way. I've gone with 'S2.shift' for
now)
>>> This is equivalent to:
>>>     def reg_with_shift : GIMatchPredicate<
>>>          (ins reg:$A), (outs ptr_to_reg:$reg, imm:$shift), bool, [{
>>>       // bool matchARMShiftedOp2(const MachineOperand &, 
>>> MachineOperand*&, uint64_t&);
>>>       return matchARMShiftedOp2(${A}, ${reg}, ${imm});
>>>     }]>;
>>>     def : GICombineRule<
>>>       (defs root:$D, reg:$S1, reg_with_shift:$S1),
>>>       (match [{MIR %D = G_ADD %S1, %S2.reg }]),
>>>       (apply [{MIR %D = ADD %S1, %S2, %S2.shift }])>;
>>> but the first definition slightly more efficient for most targets 
>>> since there's usually a register or two for return values which
in
>>> this version we're spending solely on a redundant bool and
requires
>>> fewer argument registers, the second version has to store the 
>>> MachineOperand* on the stack to pass it by reference. The
difference
>>> is fairly small when considered on an individual rule but should 
>>> accumulate on large rulesets or large functions.
>>
>> Okay, I see now where you're coming from, and the goal makes a lot
of
>> sense to me. I need some help with the syntax and semantics though :)
>>
>> First, just to clarify, the intention is for matchARMShiftedOp2 to 
>> match something like "G_SHL %X, ${imm}", right? Why not
express that
>> directly in MIR?
> 
> Going by 
>
http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.dui0068b/CIHBEAGE.html
> (which admittedly is old docs but it's unlikely to have changed), it 
> needs to be able to match the following:
> 
> %0 = G_CONSTANT iN ${imm} # Where imm is any immediate expressible using 
> an unsigned 8-bit immediate rotated by an even number of bits
> 
> %0 # I.e. just a plain vreg
> 
> %0 = G_CONSTANT iN ${imm} # Where imm is 1 to 32
> %2 = G_[AL]SHR %1, %0
> 
> %0 = G_CONSTANT iN ${imm} # Where imm is 0 to 31
> %2 = G_SHL %1, %0
> 
> %0 = G_CONSTANT iN ${imm} # Where imm is 1 to 31
> %2 = G_ROR %1, %0 # We don't actually have a rotate right at the moment
> so this would have to match the equivalent and/shift/or sequence
> 
> %0 = G_CONSTANT iN 1
> %2 = G_ROR %1, %0 # We don't actually have a rotate right at the moment
> so this would have to match the equivalent and/shift/or sequence
> %3 = G_SEXT %2
> 
> %2 = G_[AL]SHR %0, %1
> 
> %2 = G_SHL %0, %1
> 
> %2 = G_ROR %0, %1 # We don't actually have a rotate right at the moment
> so this would have to match the equivalent and/shift/or sequence
> 
> I don't think it's strictly necessary to escape into C++ for this
case
> (except for checking the immediates have the right value) as there's 
> nothing here that can't be done with patterns. Historically, I believe 
> the reason targets have taken this route in SelectionDAG is mainly down 
> to the fact that SelectionDAG patterns didn't provide a good means to 
> factor out common sub-patterns. Until earlier this year, PatFrags really 
> only defined abbreviations for common sub-patterns rather than defining 
> multiple alternatives. So a target like ARM would have had to duplicate 
> patterns support for every operation that supported it. In the case of 
> ARM that's 153 (9 sub-patterns * 17 instructions supporting it) 
> patterns. By escaping into C++ via ComplexPattern, 17 would do the same 
> job with no code duplication.
> 
> FWIW, in the long run I would like most cases of this to move to macros 
> as the tablegen pass ought to be smart enough to factor out the 
> commonality. There will be some that genuinely need the escape hatch but 
> I'd like that to be kept to a minimum as the more matching we can 
> delegate to tablegen, the more we can do interesting things like measure 
> rule coverage, guided fuzzing of rules, profiling the rules, dropping 
> low frequency/profit rules from -O1/-O2, proving correctness, etc. 
> Conversely the more C++ there is, the more work is required to implement 
> those features.
Totally agreed.

>> But for the sake of the argument I'll assume there's a good
reason for
>> the escape to C++. The way I understood your original description of 
>> GIMatchPredicate, its most generic and versatile form of use is as an 
>> explicit entry in the (match) list. Like so:
>>
>>  def reg_with_shift : GIMatchPredicate<
>>      (ins reg:$A), (outs uint64_t:$shift), MachineOperandPtr, [{
>>    // MachineOperand *matchARMShiftedOp2(const MachineOperand &,
>>    //                                    uint64_t &);
>>    return matchARMShiftedOp2(${A}, ${imm});
>>  }]>;
>>  def : GICombineRule<
>>      (defs root:$D, reg:$S1, reg:$S2),
>>      (match [{MIR %D = G_ADD %S1, %tmp }],
>>             (reg_with_shift $tmp, $shift):$S2),
>>      (apply [{MIR %D = ADD %S1, %S2, %shift }])>;
>>
>> I believe that the kind of syntax you showed, in whatever form it 
>> eventually ends up, should just be syntactic sugar. It would be 
>> de-sugared to essentially what I've shown here, right?
> 
> Yes, that's right.
> >> So then the question becomes: what is the sugared syntax, if any?
>>
>> I must say that I really don't like your first example, i.e. this
one:
>>
>>  def : GICombineRule<
>>      (defs root:$D, reg:$S1, reg_with_shift:$S2),
>>      (match [{MIR %D = G_ADD %S1, %S2 }]),
>>      (apply [{MIR %D = ADD %S1, %S2, %S2.shift }])>;
>>
>> The syntax very strongly suggests that $S2 will be assigned to the RHS 
>> operand of the G_ADD, but there seems to be non-local magic which 
>> causes it to be assigned something else instead -- unless I've 
>> misunderstood the purpose of the matchARMShiftedOp2?
> 
> I see your point here. The matched operand and the result of the 
> predicate essentially have the same name at the moment. Something like:
>   (apply [{MIR %D = ADD %S1, %S2.result, %S2.shift }])
> would resolve the ambiguity but then 'S2.result' is a magically 
> generated name. We'll need to give the result of the predicate a proper
name
> 
> I suppose we could treat the first 'outs' to mean the return type
but I
> think we ought to be more explicit than that and keep the return value 
> separate from the metadata. Something like:
>   def reg_with_shift : GIMatchPredicate<
>       (ins reg:$A), (outs bool), (metadata_out 
> MachineOperandPtr:$result, uint64_t:$shift), [{
>     // bool matchARMShiftedOp2(const MachineOperand &, MachineOperand 
> *&, uint64_t &);
>     return matchARMShiftedOp2(${A}, ${imm});
>   }]>;
>   def reg_with_shift : GIMatchPredicate<
>       (ins reg:$A), (outs MachineOperandPtr:$result), 
> (metadata_out uint64_t:$shift), [{
>     // MachineOperand *matchARMShiftedOp2(const MachineOperand 
> &, uint64_t &);
>     return matchARMShiftedOp2(${A}, ${imm});
>   }]>;
> We'd also need something to make using the original %S2 an error as
I'm
> sure passing the operand being matched instead of the operand returned 
> by the predicate will be a common mistake otherwise:
> let ApplyCannotUseMatchedOperand = 1;
> I'd also be inclined to make it a warning for this to be unset if the 
> result has a name.
Is the ability to inline the reg_with_shift into the (defs) worth this 
complexity? I'd prefer to keep things simpler and more explicit.

Especially since doing so seems like it would allow getting rid of 
(defs) entirely. (I know, I keep harping on this point :))


[snip]>>>> Let's talk about using DAGs. I think I understand where the
>>>> temptation comes from to describe the MIR using code blocks,
but I'm
>>>> also pretty sure that it's a mistake to do so.
>>> The main motivation to move away from DAGs came from limitations in
>>> DAGs and the general ugliness of the original DAG-based
definitions.
>>> Once the idea of using MIR came up, we liked the fact that it
matched
>>> the existing serialization format which makes it easy to turn a 
>>> specific example from the compiler output into a combine rule and 
>>> avoid the need for another representation for instructions. We also
>>> liked the way it dealt with the difficult cases, 
>>> MachineMemoryOperands, subregs, etc. and also that it wasn't a
new
>>> syntax or parser (although it make require modifications to the 
>>> existing one). It also looked like it be convenient for a tool like
>>> Alive (https://www.cs.utah.edu/~regehr/papers/pldi15.pdf) although
we
>>> didn't really explore that particular thought further.
>>> Ultimately, both representations are DAG's and it comes down to
the
>>> convenience of the syntax in specifying the rules we want. MIR is 
>>> very convenient because of it's familiarity and flexibility.
>>> One of the bigger problems with using tablegen's DAG type is
that it
>>> doesn't deal with multi-result instructions very well. Every
time
>>> this has been raised on the list w.r.t SelectionDAG the solution
has
>>> boiled down to 'use C++ instead' and it would be good to
fix that so
>>> that things like UADDO are representable. You can write a rule that
>>> matches something like divmod at the top-level using the
'set' operator:
>>>   (set $D1, $D2, (divmod $A, $B))
>>> but as soon as it's not the top-level, it gets really ugly fast
even
>>> using pseudo-nodes:
>>>   (set (outs $D1, $D2), (sext (result (G_DIVMOD $A, $B):$T, 0)),
>>>                         (sext (result $T, 1)))
>>> In this example, outs is a pseudo-node needed to distinguish the 
>>> results from the sinks of the DAG, and result is a pseudo-node that
>>> selects one of the multiple results for use by the parent node. 
>>>  Meanwhile the MIR for is:
>>>   %0, %1 = G_DIVMOD %A, %B
>>>   %D1 = G_SEXT %0
>>>   %D2 = G_SEXT %1
>>> which seems much clearer. It becomes a bigger problem when you
start
>>> wanting to match target specific instructions as multi-result
becomes
>>> more common later in the backend, especially post-isel. It's
also a
>>> problem for upside-down matches, inside-out, and multi-root matches
>>> since tablegen's dag type requires a single common root node
and can
>>> only draw edges in one direction. If you want to have to root in
the
>>> middle then you have to distort the DAG to bring it to the root and
>>> mark up the reversed edges somehow
>>> Moving the result inside the instruction operator helps with these 
>>> problems but requires a list of dags and all the temporaries need
to
>>> be named:
>>>   [(G_DIVMOD $T1, $T2, $A, $B),
>>>    (G_SEXT $D1, $T1),
>>>    (G_SEXT $D2, $T2)]
>>> and at that point, we're very nearly at MIR.
>>
>> Yes, that was kind of the point. Expressing MIR, but in a way that is 
>> native to TableGen :)
>>
>> I should probably clarify that when I wrote "DAG", I meant
that only
>> in the sense of the TableGen 'dag' type, which should probably
just be
>> called "S-expression" instead of "DAG".
>>
>> I absolutely believe that a list-of-instructions approach is better 
>> than what the SDISel currently uses, for all the reasons you mentioned 
>> (multiple defs, multiple roots).
> 
> Ah ok. I thought that you were aiming for SDISel style DAG's. :-)
> 
> At first glance, I think everything can fit into this style. For 
> example, matching could be something like:
>    (G_DIVMOD $T1, $T2, $A, $B, debug-loc:$DL1)
>    (DEBUG_VALUE $T1, debug-expr:$DE1)
>    (DEBUG_VALUE $T2, debug-expr:$DE2)
> and the corresponding apply:
>    (DIVMOD $T1, $T2, $A, $B, debug-loc:$DL1)
>    (DEBUG_VALUE $T1, (modify-expr debug-expr:$DE1, ...))
>    (DEBUG_VALUE $T2, (modify-expr debug-expr:$DE2, ...))
> macros would be the same as instructions, etc. I'll work through the 
> examples in my notes and see if I can find something reasonable for 
> everything in this style.
> 
> One potential nit, is that it it's a little odd for the defs to be on 
> the right hand side of the opcode. It's easy enough to change:
>    (set $T1, $T2, (G_DIVMOD $A, $B))
> but I think that's worse to read. I'd be inclined to go for
everything
> on the right and just label the defs:
>    (G_DIVMOD def:$T1, def:$T2, use:$A, use:$B, debug-loc:$DL1)
I guess I'm so used to looking at textual assembly that I don't find the
defs on the RHS as grating even without the labeling. But that seems 
like a minor point.

>> I simply think using the 'dag' type in TableGen is a much
better
>> approach because it composes with other TableGen frontend features 
>> (e.g., all the substitution rules "just work", you can 
>> programmatically compose a dag/sexpr from multiple pieces using !con, 
>> etc.).
> 
> That makes sense to me.
> 
>>> Another issue we didn't like with DAG is that you also end up
needing
>>> a lot of pseudo-nodes like EXTRACT_SUBREG whereas these are natural
>>> parts of the MIR syntax. For example:
>>> (set $d, (EXTRACT_SUBREG (MYTGT_ADD $A, $B), sub_lo))
>>> compared to:
>>> %d:sub_lo = MYTGT_ADD %A, %B
>>> and:
>>> (set $d, (REG_SEQUENCE GPR32, (MYTGT_ADD $A, $B), sub_lo,
(MYTGT_ADD
>>> $C, $D), sub_hi))
>>> compared to:
>>> %d:sub_lo = MYTGT_ADD %A, %B
>>> %d:sub_hi = MYTGT_ADD %C, %D
>>
>> How about:
>>
>>  (MYTGT_ADD (sub_lo $d), $A, $B),
>>  (MYTGT_ADD (sub_hi $d), $C, $D),
>>
>> That seems acceptable to me.
> 
> That looks good to me too.
> 
>>> Along the same lines, I also think that the integrated debug-info
is
>>> only really practical in MIR. It's possible to shoe-horn
DILocation
>>> in using a pseudo-node like so:
>>>    (set $d, (add (mul $a, $b):$mul, $c):$add   -> (set 
>>> (merged_dilocation (muladd $a, $b, $c), $mul, $add)
>>> but there isn't really a good place to modify the DIExpression
used
>>> by DEBUG_VALUE.
>>>> One reason is that it adds yet another parser, which is more 
>>>> maintenance burden without buying much.
>>> The expectation is that we can make use of the existing MIR parser 
>>> and replace it's MachineInstr/MachineOperand/etc. generation
code
>>> with pattern-matching generation code. I'm going to be looking
into
>>> the implementation of that over the next few weeks.
>>>> The more important reason is that DAGs compose better with the
rest
>>>> of TableGen. Consider combine rules defined in multiclasses,
for
>>>> example. That is very common all through LLVM. In order to
mirror an
>>>> existing selection rule in the AMDGPU backend, we'd likely
want to
>>>> have something like this:
>>>>
>>>>  multiclass Arithmetic_i16_GIPats<Instruction ginst,
>>>>                                   Instruction inst> {
>>>>    // This is probably wishful thinking -- would be okay if we
had to
>>>>    // split this into two different rules due to the different
types
>>>>    // of $dst
>>>>    def : GICombineRule<
>>>>      (defs root:$dst, operand:$src0, operand:$src1),
>>>>      (oneof (ginst i16:$dst, i16:$src0, i16:$src1),
>>>>             (match (ginst i16:$tmp, i16:$src0, i16:$src1),
>>>>                    (G_ZEXT i32:$dst, $tmp))),
>>>>      (inst $dst, $src0, $src1)>;
>>>>
>>>>    // A third rule for zext to 64 bits would go here...
>>>>  }
>>> This is the reason that macros are explicitly imported in the defs 
>>> section and the imported instances are renameable. The same thing
in
>>> the MIR syntax (and using the changes to variable naming from
above)
>>> would be:
>>>  multiclass Arithmetic_i16_GIPats<Instruction ginst,
>>>                                   Instruction inst> {
>>>    def : GICombineRule<
>>>      # There's a corner case for the double definition of $dst
here.
>>> I'll come back to that later in the mixing concerns comment
>>>      (defs root:$dst, (import-macro ginst:$ginst, def:$dst, 
>>> use:$src0, use:$src1)),
>>>      (match [{ %dst:(s16) = ginst %src0:(s16), %src1:(s16) }]), # 
>>> I've corrected the i16 to s16 here. i16 isn't a type in
GlobalISel
>>>      (apply /* I'll come back to this bit */)>;
>>>    // Covers s32 and s64
>>>    def : GICombineRule<
>>>      (defs root:$dst, operand:$src0, operand:$src1, (import-macro 
>>> ginst:$ginst)),
>>>      (match [{ %0:(s16) = ginst %src0:(s16), %src1:(s16)
>>>                %dst = G_ZEXT %0:(s16) }]
>>>             (isS32OrS64 type:$dst)),
>>>      (apply /* I'll come back to this bit */)>;
>>>  }
>>> We could potentially fold the has G_ZEXT case and no G_ZEXT cases 
>>> together with a macro but it seems unnecessary complexity in this 
>>> case. It might be more worth it if there's a similar G_SEXT
variant too.
>>> The reason for the "I'll come back to this bit" in
the apply section
>>> is because you've raised an issue I'd forgotten to deal
with. The
>>> opcodes in the apply section should also be parameterizable. My
first
>>> thought on that is that there should be a macro-equivalent for
apply
>>> such as:
>>>  def : GIExpansion<
>>>    (defs def:$R, use:$S, imm:$IDX),
>>>    (apply (select imm:$IDX,
>>>                (0 [{MIR %R = FOO %S }]),
>>>                (1 [{MIR %R = BAR %S }]),
>>>                (2 [{MIR %0 = BAZ %S
>>>                         %R = BAR %0 }])))>;
>>> or alternatively:
>>>   def : GICombineRule<
>>>     ...,
>>>     ...,
>>>     (apply (create_opcode [{ ${IDX} ? MYTGT_A : MYTGT_B }]
...):$name,
>>>            [{MIR %D = name %s }])>;
>>> The former is more powerful, but the latter is more convenient for 
>>> trivial cases
>>
>> I admit that you've lost me a little here :)
>>
>> Okay, I see how importing an instruction as a macro is possible 
>> without string concatenation, although I have to say that having to 
>> express this explicit import still feels quite a bit more cumbersome 
>> than the strawman syntax I've shown using the 'dag' type.
>>
>> Again, the 'dag' type just composes better with the rest of
TableGen.
> 
> With your reply above, I'm coming around to the dag type. I'll see
how
> the examples in my notes end up in that style.
I'm very happy to read that :)

>> On the apply-side though, I don't understand where the difficulty 
>> comes from. Each instantiation of the multiclass has a single
`inst'
>> which needs to go into the apply rule. That shouldn't be so
difficult
>> to express -- at least with the 'dag' type it's trivial.
> 
> Yeah, I went off on a tangent a bit :-).
> 
> For your example: Yes, there's only one inst for each multiclass 
> instantiation so it's simple.
> 
> For the tangent I was on, consider:
> %0 = G_CONSTANT iN %x
> %1 = G_CONSTANT iN %y
> %2 = G_LSHR %a, %x
> %3 = G_LSHR %1, %y
> =>
> %0 = G_CONSTANT iN %z # z = x+y
> %1 = G_LSHR %a, %0
> and:
> %0 = G_CONSTANT iN %x
> %1 = G_CONSTANT iN %y
> %2 = G_ASHR %a, %x
> %3 = G_ASHR %1, %y
> =>
> %0 = G_CONSTANT iN %z # z = x+y
> %1 = G_ASHR %a, %0
> and:
> %0 = G_CONSTANT iN %x
> %1 = G_CONSTANT iN %y
> %2 = G_SHL %a, %x
> %3 = G_SHL %1, %y
> =>
> %0 = G_CONSTANT iN %z # z = x+y
> %1 = G_SHL %a, %0
> 
> These rules are very similar so it would be nice to merge them like so:
> %0 = G_CONSTANT iN %x
> %1 = G_CONSTANT iN %y
> %2 = {G_SHL,G_ASHR,G_LSHR} %a, %x
> %3 = <same-shift-as-%2> %1, %y
> =>
> %0 = G_CONSTANT iN %z # z = x+y
> %1 = <same-shift-as-match> %a, %0
Ah, I see what you mean.

My first instinct is to say: you could use a TableGen class or even a 
`foreach` as lightweight methods of generating three separate 
records/defs and thus three separate rules for that:

   foreach inst = [G_LSHR, G_ASHR, G_SHL] in
   def : GICombineRule<
     ... simply use inst here in the appropriate place ...
     >;

Then again, maybe it would be useful to write it as a single rule if it 
allows the TableGen backend to generate better matching code by 
explicitly expressing the similarity between the rules.

I'm not sure which is better: writing more but simpler rules, and 
relying on the backend to piece things together again and exploit 
optimization opportunities, or writing simpler but more complex rules.

>>>> Obviously you can achieve this in your proposal as well using
string
>>>> concatenation, but that does seem rather backwards.
>>>>
>>>> Something else that falls out rather nicely from using DAGs is
that
>>>> it gives you a natural way to give a name to an _instruction_,
to be
>>>> able to pass it to a predicate for extended checks (and as a 
>>>> potentially more natural way of preserving debug locations!).
>>> I don't think we need a means to name instructions since
instructions
>>> can be trivially found from of their results using 
>>> MachineOperand::getParent().
>>> As noted above, merging debug locations doesn't fit very well
into
>>> DAG's as you have to use a pseudo node to be able to provide
both
>>> DILocations on the same result instruction, and there's no real
means
>>> to include DEBUG_LOC for temporaries that are eliminated and need
to
>>> be rebased from another value.
>>
>> Hmm... I feel like we need more concrete examples for debug info to 
>> discuss this. One of the reasons I've brought up naming
instructions
>> is precisely because I thought it could simplify transferring
DebugLocs.
>>
>> Again, I'm not suggesting the use of a DAG like in SDSel, as I
think
>> that's a mistake. I simply mean the 'dag' type of TableGen,
which
>> should really be called 'sexpr' or something.
> 
> I think we can use the same idea you suggested for subregisters:
>    (DIVMOD $T1, $T2, $A, $B, (merge_debug_loc $DL1, $DL2))
That sounds good!

>>>> I have a more explicit example of this later.
>>>>
>>>>
>>>>> _'Upside-down' matches (i.e. roots at the top) and
similar_
>>>>> This one requires algorithm changes which I'd prefer
not to discuss
>>>>> in this RFC. Assuming the underlying algorithm gains
support for
>>>>> this, this is how the syntax would look:
>>>>> def : GICombineRule<
>>>>>   (defs root:$root, reg:$A),
>>>>>   (match [{MIR %1 = G_LOAD %root
>>>>>                %A = G_SEXT %1 }]),
>>>>>   (apply [{MIR %A = G_SEXTLOAD %root }])>;
>>>>> The only unusual thing about this rule is that the root
isn't at
>>>>> the bottom. Instead of starting at a use and matching
towards defs,
>>>>> we're starting at the def and matching towards uses.
This has some
>>>>> potentially useful properties. The combine algorithm has to
chose
>>>>> an insertion point for the replacement code and the
traditional
>>>>> choice has been the root of the match. Assuming we keep
doing the
>>>>> same thing, 'upside-down' matching like this is
able to avoid the
>>>>> checks that the load is safe to move, is non-volatile, has
one use,
>>>>> etc. that would be necessary if we moved the G_LOAD down to
the
>>>>> G_SEXT. Also, assuming we keep the same broadly bottom-up 
>>>>> processing order as the existing Combine algorithm this
kind of
>>>>> rule also has relatively lower priority than conventional
rules
>>>>> because the root is seen later. This can be useful as 
>>>>> (algorithm-dependent) the MIR may be more settled by the
time it
>>>>> tries to match.
>>>>> Along the same lines, the syntax can potentially support
the root
>>>>> being in the middle of a match. This could be used in a
similar way
>>>>> to the upside-down match above to control the insertion
point and
>>>>> priority. For example:
>>>>> def : GICombineRule<
>>>>>   (defs root:$root, reg:$A, reg:$B, reg:$C),
>>>>>   (match [{MIR %1(s32) = G_TRUNC %A(s64)
>>>>>                %2(s32) = G_TRUNC %B(s64)
>>>>>                %root = G_ADD %1, %2
>>>>>                %C(s64) = G_SEXT %root }]),
>>>>>   (emit [{MIR %root = G_ADD %A, %B
>>>>>               %C = G_SEXT_INREG %root }])>;
>>>>> Unfortunately, I don't have any concrete examples where
this would
>>>>> be useful in comparison to a conventional or upside-down
match at
>>>>> the moment. I'm mostly keeping the door open as I can
see potential
>>>>> for benefits (mostly w.r.t sinking and hoisting safety
around an
>>>>> instr that would be difficult to test for that) given an 
>>>>> appropriate rule and also because I'm inclined to say
that the
>>>>> tablegen syntax shouldn't be the reason it's not
possible. It
>>>>> should be up to the Combine algorithm and and
tablegen-erate pass
>>>>> involved in specializing the algorithm for a target.
>>>>
>>>> I'm concerned that the concrete syntax proposed here mixes
a bunch
>>>> of concerns that really should be addressed separately:
>>>>
>>>> 1. Defining a mapping between semantically equivalent chunks of
code.
>>>> 2. Controlling insertion points as a simple way to keep side
effects
>>>> under control in many cases.
>>>> 3. Controlling additional algorithmic aspects of the combine 
>>>> algorithm (whether you're matching up or down, say).
>>> I think there may be a misunderstanding with #3. There's no
real
>>> concept of matching up or down in the rules, there's only the 
>>> starting point for the match and from there whether it's
following
>>> the uses or defs isn't something the rule needs to be concerned
>>> about. The algorithm needs to be aware of it (and historically
hasn't
>>> been, DAGCombine broadly runs bottom up and expects to match
towards
>>> defs and can fail to re-schedule nodes for re-visit in many common 
>>> cases) but the algorithm is expected to figure that out for itself.
>>> 'root' definitions are the only place that #2 and #3 come
into the
>>> proposed syntax. At the moment, the only requirement the syntax 
>>> imposes on the algorithm is that it has a concept of a current 
>>> MachineOperand Def* (and by extension instruction) that will be
used
>>> as the starting point for the match and that it can choose a valid 
>>> insertion point for each instruction produced. The insertion point
is
>>> the same position in the current algorithm but that's algorithm
>>> dependent and doesn't have to be the case in the long run.
There may
>>> even be more than one for rules that emit multiple instructions in 
>>> the apply section. There's currently no means to specify an
insertion
>>> point, it's left up to the algorithm to find a place for each 
>>> instruction.  That said, I'd be surprised if there was an
algorithm
>>> where the current MachineOperand isn't also at least one of the
>>> insertion points.
>>> * The current upstream code is a bit misleading about that and
makes
>>> it seem that it's the MachineInstr but I made that mistake
before on
>>> ISel so I'm intending to change that.
>>>> Could we keep those separate somehow? E.g., don't
distinguish
>>>> between 'root' and 'reg' or 'operand'
in the (defs ...) part of the
>>>> rule; have TableGen figure out the roots automatically based on
>>>> walking the graph implied by the matching rule, and default to
using
>>>> the sink(s) as roots.
>>> operand isn't really connected to this, it's just a
wildcard for an
>>> MachineOperand that doesn't care whether it's a reg, imm,
MCExpr, or
>>> something else.
>>> I'm inclined to agree that it would be nice to get
'root' out of the
>>> definition but I don't really see a good way to do it. The
sinks
>>> aren't always the right choice and are a particularly bad
choice on
>>> combines like the extending loads since that's trying to match
all
>>> uses simultaneously and those uses could have any opcode.
>>
>> How about having it as a lettable variable? As in (using my favorite 
>> syntax, but it should translate either way):
>>
>>  let MatchRoot = "$tmp" in
>>  def : GICombineRule<
>>      (match (G_LOAD $tmp, $ptr),
>>             (G_SEXT $dst, $tmp)),
>>      (apply (G_SEXTLOAD $dst, $ptr))>;
>>
>> I'm using $tmp instead of $ptr as the root here since you wrote
above
>> that it needs to be a Def.
>>
>> Which makes the discussion about the insertion point potentially 
>> confusing, since $tmp no longer appears in the (apply). But maybe that 
>> doesn't matter.
> 
> The insertion point is up to the algorithm anyway so it makes sense for 
> it to not be visible.
> 
>> If MatchRoot is not set, it'd just default to using a sink -- that 
>> should be by far the most common case.
> 
> If we change that to all the sinks then we cover the common multi-root 
> case too.
> 
>> Also, it'd be nice to get rid of the (defs) part of the syntax 
>> entirely for brevity. I believe that should be possible.
> 
> I'm not very keen to remove the defs section but I'll see whether
it's
> still needed in this style. Even if it's removable though, having it is
> likely to reduce the time spent on the tablegen pass as there's a cost 
> to having to deduce information. I also think it's also important for 
> error checking since it provides a means to declare which values you 
> expect to see both in the match/apply whereas it's expected that some 
> are only matched.By reducing the time spent you mean the time spent in TableGen? That may 
well be the case :)

If it's only about making error checking easier, maybe it could be an 
optional annotation? Something like:

   def : GICombineRule<
     (match (INST $dst, $src0, $src1),
            reg:$src0, reg:src1),
     (apply ...)>;

Just spit-balling here admittedly.

>>>> Then add some optional let-able variable in the GICombineRule
to
>>>> define non-default rules for the combine algorithm.
>>>>
>>>> Controlling the insertion point to address loads like in your 
>>>> example above could be done quite naturally if the MIR is
expressed
>>>> in a DAG:
>>>>
>>>>  def : GICombineRule<
>>>>    (match (G_LOAD $tmp, $ptr):$load,
>>>>           (G_SEXT $dst, $tmp)),
>>>>    (apply (insertat $load),
>>>>           (G_SEXTLOAD $dst, $ptr))>;
>>> This isn't quite the same thing as the extending loads combine
in the
>>> example. In that combine, it's matching the load and all uses
of its
>>> result and chosing a replacement based on the global usage of that 
>>> def across the whole function, rewriting conflicting 
>>> sext/zext/aext/trunc's to be as cheap as possible (a lot of the
>>> detail of that is inside the function but the code is upstream if
you
>>> want to see the details). This rule is matching a single use and 
>>> relying on CSE to de-dupe the N G_SEXTLOAD's that come out of
it. To
>>> illustrate the difference, consider:
>>> %0(s8) = G_LOAD %p
>>> %d1(s16) = G_SEXT %0
>>> %d2(s32) = G_SEXT %0
>>> If I apply the above rule as much as possible to this (twice) with 
>>> the roots at the sink, I'll get:
>>> %d1(s16) = G_SEXTLOAD %p
>>> %d2(s32) = G_SEXTLOAD %p
>>> whereas if I apply the extending loads combine to it (once),
I'll get:
>>> %d2(s32) = G_SEXTLOAD %p
>>> %d1(s16) = G_TRUNC %d2
>>> which will normally be cheaper to execute. In combination with
other
>>> rules and CSE, the first version could eventually turn into the 
>>> second version but it requires more memory churn and processing
time
>>> to get there.
>>
>> You've lost me again, sorry. Where does the G_TRUNC suddenly come 
>> from? Is that the rule that uses `Helper.matchCombineExtendingLoads`?
>>
>> In that case, don't worry, I only intended my example to be
comparable
>> to the simpler G_LOAD / G_SEXTLOAD rule without C++ escape that you 
>> showed in your original mail as well.
> 
> Yes, it comes from the Helper.applyCombineExtendingLoads(...), which 
> looking back at the thread is from the macros section and not the upside 
> down matching section you were commenting on. Sorry about that
> 
>> If, on the other hand, you have a plan to get to the more advanced 
>> result of matching multiple G_SEXTs and using G_TRUNC _without_ a C++ 
>> escape, I'd be _very_ interested to learn more about that!
> 
> It's easy enough to match multiple uses in the current rule
definitions.
> You just describe each use in the match section and have a predicate 
> check that the operands aren't the same. However, C++ is the only way
to
> look at all the users of a given value and modify them differently at 
> the moment. I've had some vague thoughts about a (for-each-use ...) in 
> the match and apply section but I haven't given that much thought yet.
Okay, makes sense.

>>> One of the things I'm planning to look into in the algorithm is
>>> making it smarter about combines involving multiple-uses in the 
>>> intermediate operands. DAGCombine is harmfully greedy in this area 
>>> and will often duplicate expensive operations if one use is 
>>> combinable but another isn't. This has led to the addition of 
>>> hasOneUse() checks which solves that but takes things to the
opposite
>>> extreme. I think it should be possible to find an algorithm that 
>>> makes an intelligent decision for defs with multiple uses.
>>
>> Absolutely agreed that we need better solutions here.
>>
>> In the AMDGPU backend, we have some rather interesting challenges 
>> around those. This is perhaps a bit of a far future tangent, but one 
>> of them is that floating point operations can optionally negate their 
>> operands; however, this can be at the cost of a larger instruction 
>> encoding, and the rules for that are rather complex. So there's a 
>> problem of optimizing *where* we insert the negations to improve code 
>> size.
>>
>> In a computation that is a tree with a single sink, an optimal 
>> solution can be found in linear time using dynamic programming (maybe 
>> there's an easier way, but I'm not aware of one). With general
DAGs,
>> the problem is of course much harder.
>>
>> The negation is just one example, and it would be pretty awesome if we 
>> could use a declarative database of combines to make progress at the 
>> very least in local search optimizations.
> 
> It certainly seems that something like this ought to be possible. A set 
> of combines could find the local minimums and it's also possible for 
> combine rules to avoid making changes and record the possible places for 
> the negation and the associated costs which ought to make it possible to 
> reach the optimum for the tree.
> 
> Along the same lines, it occurs to me that known-bits analysis is very 
> similar and could be done declaratively with rules like:
>   def : GIKnownBitsRule<
>       (match reg:$dst, reg:$src))
>       (match (G_ZEXT $dst, $src)),
>       (apply (exec [{ 
> KnownZero.setBitsFrom(MRI.getType(${src}).getSizeInBits()) }],
reg:$src))>;
That's a cool idea.

By the way, did I mention that I *really* like the idea of using ${var} 
expansions in the C++ fragments? That's so much nicer than what we have 
for the SDIsel :)

Cheers,
Nicolai

-- 
Lerne, wie die Welt wirklich ist,
Aber vergiss niemals, wie sie sein sollte.

Daniel Sanders via llvm-dev

2018-Nov-27 17:59 UTC

head link

[llvm-dev] [RFC] Tablegen-erated GlobalISel Combine Rules

Hi All,

I've more or less finished updating the examples to the DAG style we were
talking about. Hopefully I haven't forgotten anything, there was a lot to
keep track of :-). Overall, I think there's a couple places where things get
a a little awkward (mainly debug info) but things generally look good to me.

A Simple Example

def : GICombineRule<(defs reg:$D, reg:$S),
                    (match (G_ZEXT s32:$t1, s8:$S),
                           (G_TRUNC s16:$D, s32:$t1)),
                    (apply (G_ZEXT s16:$D, s8:$S))>;

This has been converted to the style that fits within tblgen's DAG type but
isn't (directly) a DAG. The DAG structure is defined by the matching names
instead of tblgen's DAG type. This avoids the restrictions of the DAG type
and is something that's needed anyway to allow one node to be referenced in
multiple places. I've kept the 'match' node at the top of the match
to allow instructions and predicates to be freely combined within this section
(see later examples). Without it, the stricter type checking on list<X>
would require Instruction, GIMatchPredicate, and any future matching extensions
to have a common base class.

The explicit specification of root has also been removed in favour of a default
value and a 'let' assignment to deal with any cases where the
algorithm-chosen default is unsuitable (see the upside-down match section
below). The default would be the set of defs minus the set of uses. This covers
the common single-root case, as well as the multi-root case. The 'let'
would be required for the upside-down and inside-out matches.

One consequence of this style is that there's no means to have anonymous
operands that still define the edges of the DAG being matched as shown by the
need to name $t1 above.

I'll come back to possibility of removing the defs section in a follow-up
email as I suspect I'm over the size limit again. The bit I need to mention
in this email is that the formerly anonymous operands are omitted from the defs
section in these examples.

Generalizing the Simple Example

As before, this is the example above with the concrete types removed.
    def isScalarType : GIMatchPredicate<bool, (ins type:$A), (outs), [{
      return ${A}.isScalar();
    }]>;
    def isLargerType : GIMatchPredicate<bool, (ins type:$A, type:$B), (outs),
[{
      return ${A}.getSizeInBits() > ${B}.getSizeInBits();
    }]>;
    def : GICombineRule<(defs reg:$D, reg:$S),
                        (match (G_ZEXT $t1, $S),
                               (G_TRUNC $D, $t1),
                               (isScalarType type:$D),
                               (isLargerType type:$D, type:$S)),
                        (apply (G_ZEXT $D, $S))>;

There's little different here aside from what's mentioned in the simple
example. I've just changed the argument order of GIMatchPredicate to better
match the order in the underlying function prototype.

Preserving DILocation and other debug info

We have a choice of syntax here and neither seems to be clearly better than the
other when considering just locations:
    def : GICombineRule<(defs reg:$D, reg:$S, instr:$MI0, instr:$MI1),
                        (match (G_ZEXT $t0, $S):$MI0,
                               (G_TRUNC $D, $t0):$MI1),
                               (isScalarType type:$D),
                               (isLargerType type:$D, type:$S)),
                        (apply (G_ZEXT $D, $S, (debug_locations $MI0,
$MI1)))>;

    def : GICombineRule<(defs reg:$D, reg:$S, debug_location:$DL0,
debug_location:$DL0),
                        (match (G_ZEXT $t0, $S, debug_location:$DL0),
                               (G_TRUNC $D, $t0, debug_location:$DL1)),
                               (isScalarType type:$D),
                               (isLargerType type:$D, type:$S)),
                        (apply (G_ZEXT $D, $S, (debug_locations $DL0,
$DL1)))>;

However, the former is more compact when DBG_VALUE is involved since naming the
instruction gives access to three of the four pieces of data we need to pass on
to the apply step:
    def : GICombineRule<(defs reg:$D, reg:$S, instr:$MI0, instr:$MI1,
instr:$MI2, instr:$MI3, debug_expr:$DNewExpr),
                        (match (G_ZEXT $t0, $S):$MI0,
                               (G_TRUNC $D, $t0):$MI1,
                               (DBG_VALUE $t0):$MI2,
                               (DBG_VALUE $D):$MI3,
                               (isScalarType type:$D),
                               (isLargerType type:$D, type:$S)),
                        (apply (createDIExprLLVMFragment debug_expr:$MI2,
type:$D):$DNewExpr,
                               (G_ZEXT $D, $S, (debug_location $MI0, $MI1)),
                               (DBG_VALUE $D, debug_local:$MI2,
debug_expr:$DNewExpr, (debug_location $MI2)),
                               (DBG_VALUE $D, debug_local:$MI3, debug_expr:$MI3,
(debug_location $MI3))))>;

Matching immediates and G_CONSTANT/G_FCONSTANT

There isn't much that's special about the changes in this section.
Rewriting the examples into the DAG style gives:
    def : GICombineRule<(defs reg:$D, reg:$A, imm:$VAL),
                        (match (G_MUL $D, $A, $VAL),
                               (isTwo imm:$VAL)),
                        (apply (MYTGT_DOUBLE $D, $A))>;
Or equivalently:
    def imm_2 : GIPredicatedDefKind<(isTwo imm)>;
    def : GICombineRule<(defs reg:$D, reg:$A, imm_2:$VAL),
                        (match (G_MUL $D, $A, $VAL)),
                        (apply (MYTGT_DOUBLE $D, $A))>;

And here's the example that replaces 2 * (A + B) with 2A + 2B:
    def : GICombineRule<
      (defs reg:$D, reg:$A, imm:$B, imm_2:$C),
      (match (G_ADD $t1, $A, $B),
             (G_MUL $D, $t1, $C)),
      (apply (create_imm [{ 2 * ${B}->getZExtValue() }], apint_value:$B):$NB,
             (G_ADD $t1, $A, $A),
             (G_ADD $D, $t1, $NB))>;

Passing arbitrary data from match to apply

The main change in this section that hasn't already been discussed is that
the result of extending_load_predicate has been moved to the new 'outs'
section of GIMatchPredicate and the code expansion refers to a particular output
of the predicate using 'matchinfo.B' similar to a struct member or
multi-operand ComplexPatterns.

    def extending_load_matchdata :
GIDefMatchData<"PreferredTuple">;
    def extending_load_predicate : GIMatchPredicate<
         bool, (ins reg:$A), (outs extending_load_matchdata:$B), [{
      return Helper.matchCombineExtendingLoads(${A}, ${B});
    }]>;
    def extending_loads : GICombineRule<
      (defs operand:$D, reg:$A, extending_load_matchdata:$matchinfo),
      (match (G_LOAD $D, $A),
             (extending_load_predicate operand:$A,
                                       extending_load_matchdata:$matchinfo)),
      (apply (exec [{ Helper.applyCombineExtendingLoads(${D}, ${matchinfo.B});
}],
                   reg:$D, extending_load_matchdata:$matchinfo)>;

The main problem with this is that this prevents access to the contents of
matchinfo from outside of C++. To overcome this, I'd suggest allowing the
'${foo.bar}' syntax in tablegen's dag type. It should be a fairly
simple change and there's no tblgen-internal reason to not allow dots in the
names. Existing backends would have to check for it though which could be a
tblgen performance issue.

Macros

Aside from changing to the DAG style, the main change here is that the children
of 'oneof' all have 'match' at the top-level. This is for the
same reason 'match' was kept on the simple example: the relaxed type
checking on 'dag' compared to 'list<X>' allows us to
freely mix DAG-style matches and predicates.
    def ANYLOAD : GIMacro<(defs def:$R, use:$S, uint64_t:$IDX),
                          (match (oneof (match (G_LOAD $R, $S)),
                                        (match (G_SEXTLOAD $R, $S)),
                                        (match (G_ZEXTLOAD $R, $S))):$IDX>;
    def extending_loads : GICombineRule<
      (defs reg:$D, reg:$A, extending_load_matchdata:$matchinfo,
ANYLOAD:$ANYLOAD),
      (match (ANYLOAD $D, $A),
             (extending_load_predicate operand:$A,
                                       extending_load_matchdata:$matchinfo)),
      (apply (exec [{ Helper.applyCombineExtendingLoads(${D}, ${matchinfo.B});
}],
                   reg:$D, extending_load_matchdata:$matchinfo)>;

'Upside-down' matches (i.e. roots at the top) and similar

As mentioned back in the simple example, these need to specify the match roots
using a 'let' statement to override the default:
def : GICombineRule<
  (defs reg:$D, reg:$A),
  (match (G_LOAD $t1, $D),
         (G_SEXT $A, $t1)),
  (apply (G_SEXTLOAD $A, $D))> {
  let MatchStartsFrom = (roots $D);
};

def : GICombineRule<
  (defs reg:$D, reg:$A, reg:$B, reg:$C),
  (match (G_TRUNC s32:$t1, s64:$A),
         (G_TRUNC s32:$t2, s64:$B),
         (G_ADD $D, $t1, $t2)
         (G_SEXT s64:$C, $D)),
  (apply (G_ADD $D, $A, $B),
         (G_SEXT_INREG $C, $D))> {
  let MatchStartsFrom = (roots $D);
};

Multiple roots

All the changes in this section have already been discussed above:
def : GICombineRule<
  (defs reg:$D1, reg:$D2, reg:$A, reg:$B),
  (match (G_ADD $D1, $A, $B),
         (G_SUB $D2, $A, $B)),
  (apply (BUTTERFLY $D1, $D2, $A, $B))>;
$D1 and $D2 are both automatically chosen as roots since they are def'd but
not used.

Subregisters

This is a new example based on our discussion about MIR having direct support
for subregisters. We use subregister indexes to specify that it's a
subregister that should be emitted by the apply step.
def : GICombineRule<
  (defs reg:$D, reg:$A, reg:$B),
  (match (G_TRUNC s32:$t1, s64:$A),
         (G_TRUNC s32:$t2, s64:$B),
         (G_ADD $D, $t1, $t2),
  (apply (ADD32 $D, (sub_lo $A), (sub_lo $B)))>;

Matching MachineMemOperands

While re-writing these examples, I also realized I didn't have any examples
for testing properties of the MachineMemOperand, so here's one:
    def mmo_is_load_8 : GIMatchPredicate<
         (ins instr:$A), (outs), [{
      if (!${A}.hasOneMemOperand())
        return false;
      const auto &MMO = *${A}.memoperands_begin();
      return MMO.isLoad() && MMO.getSize() == 1;
    }]>;
    def : GICombineRule<
      (defs operand:$D, operand:$A),
      (match (G_LOAD $D, $A):$MI,
             (mmo_is_load8 instr:$MI)),
      (apply ...)>;
I've made use of the naming instructions here since this was better in the
debug info section. If we choose the alternative in that section then we can
change it to $D or $A instead (either way the generated code would call
getParent()).


-------------- next part --------------
An HTML attachment was scrubbed...
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20181127/637d63ed/attachment.html>

Daniel Sanders via llvm-dev

2018-Nov-27 18:01 UTC

head link

[llvm-dev] [RFC] Tablegen-erated GlobalISel Combine Rules

...Continued from the other email

Removing the defs section

We can potentially infer quite a lot of the defs section but it requires both a
complicated ruleset and that tblgen spends processing time doing the
inferencing. That processing time is potentially significant for large combiners
and for that reason we need to be careful not to let inferencing become a burden
on tblgen. My main worry here is that it may require analyzing both the match
and apply section to fully inference the types and I'd like the tablegen
pass to be able to ignore the apply section until it's about to write it out
to the output file.

There are however, some simple rules we can use to minimize the defs section
with just a scan of the defs section (trivial) and match section (which we have
to do anyway):
Match roots are implicitly declared in the defs section and have type
'reg'.
Names used as argument to an operator that subclasses Instruction are implicitly
declared in the defs section with type 'reg'. This allows leaf-operands
to be used in the apply section
This does however come at additional run-time memory cost
Names used as argument to an operator that subclasses Instruction are assumed to
be 'reg'. This allows us to omit the formerly-anonymous names such as
$t[0-3] in the above examples.
Using an 'operand' typed variable as the def of an instruction does not
change it to 'reg' but does emit the isReg() check that 'reg'
would normally emit.
Names used in an 'outs' position on a GIMatchPredicate are implicitly
declared in the defs section using the type given by the predicate.
Any names not given in the defs section are assumed to be 'instr' if
they are used as a name for an operator that subclasses Instruction. This allows
us to omit the defs for MI's involved in debug info.
Any names used solely in the apply section and defined by a value-constructing
operator (e.g. create*) can be omitted

With these rules, the defs section isn't completely gone but is often empty
and can be omitted with a variant of GICombineRule. The examples would be as
below. I've indicated which rule eliminates which def with color coding and
strike-through:
    // Omitted due to rule 1
    // Omitted due to rule 2
    // We haven't been listing any defs related to rule 3 since these are
the formerly anonymous
    // operands that now need names to fit into the tblgen dag type
    // Omitted due to rule 4
    // Omitted due to rule 5
    // Omitted due to rule 6
    def : GICombineRule<(defs reg:$D, reg:$S),
                        (match (G_ZEXT s32:$t1, s8:$S),
                               (G_TRUNC s16:$D, s32:$t1)),
                        (apply (G_ZEXT s16:$D, s8:$S))>;
    def : GICombineRule<(defs reg:$D, reg:$S),
                        (match (G_ZEXT $t1, $S),
                               (G_TRUNC $D, $t1),
                               (isScalarType type:$D),
                               (isLargerType type:$D, type:$S)),
                        (apply (G_ZEXT $D, $S))>;
    def : GICombineRule<(defs reg:$D, reg:$S, instr:$MI0, instr:$MI1),
                        (match (G_ZEXT $t0, $S):$MI0,
                               (G_TRUNC $D, $t0):$MI1),
                               (isScalarType type:$D),
                               (isLargerType type:$D, type:$S)),
                        (apply (G_ZEXT $D, $S, (debug_locations $MI0,
$MI1)))>;
    def : GICombineRule<(defs reg:$D, reg:$S, instr:$MI0, instr:$MI1,
instr:$MI2, instr:$MI3, debug_expr:$DNewExpr),
                        (match (G_ZEXT $t0, $S):$MI0,
                               (G_TRUNC $D, $t0):$MI1,
                               (DBG_VALUE $t0):$MI2,
                               (DBG_VALUE $D):$MI3,
                               (isScalarType type:$D),
                               (isLargerType type:$D, type:$S)),
                        (apply (createDIExprLLVMFragment debug_expr:$MI2,
type:$D):$DNewExpr,
                               (G_ZEXT $D, $S, (debug_location $MI0, $MI1)),
                               (DBG_VALUE $D, debug_local:$MI2,
debug_expr:$DNewExpr, (debug_location $MI2)),
                               (DBG_VALUE $D, debug_local:$MI3, debug_expr:$MI3,
(debug_location $MI3))))>;
    // $VAL is needed to indicate it's an immediate and provide the
predicate
    def : GICombineRule<(defs reg:$D, reg:$A, imm_2:$VAL),
                        (match (G_MUL $D, $A, $VAL)),
                        (apply (MYTGT_DOUBLE $D, $A))>;
    // $B/$C are needed to indicate they're immediates and provide $C's
predicate
    def : GICombineRule<
      (defs reg:$D, reg:$A, imm:$B, imm_2:$C),
      (match (G_ADD $t1, $A, $B),
             (G_MUL $D, $t1, $C)),
      (apply (create_imm [{ 2 * ${B}->getZExtValue() }], apint_value:$B):$NB,
             (G_ADD $t1, $A, $A),
             (G_ADD $D, $t1, $NB))>;
    // $D is needed because we wanted operand instead of reg. We could rewrite
the predicate to take a reg though.
    def extending_loads : GICombineRule<
      (defs operand:$D, reg:$A, extending_load_matchdata:$matchinfo),
      (match (G_LOAD $D, $A),
             (extending_load_predicate operand:$A,
                                       extending_load_matchdata:$matchinfo)),
      (apply (exec [{ Helper.applyCombineExtendingLoads(${D}, ${matchinfo.B});
}],
                   reg:$D, extending_load_matchdata:$matchinfo)>;
    // I haven't included any rules for omitting defs from GIMacro but I can
look into this if we want.
    def ANYLOAD : GIMacro<(defs def:$R, use:$S, uint64_t:$IDX),
                          (match (oneof (match (G_LOAD $R, $S)),
                                        (match (G_SEXTLOAD $R, $S)),
                                        (match (G_ZEXTLOAD $R, $S))):$IDX>;
    // $ANYLOAD is needed to import the macro and name its instance
    def extending_loads : GICombineRule<
      (defs reg:$D, reg:$A, extending_load_matchdata:$matchinfo,
ANYLOAD:$ANYLOAD),
      (match (ANYLOAD $D, $A),
             (extending_load_predicate operand:$A,
                                       extending_load_matchdata:$matchinfo)),
      (apply (exec [{ Helper.applyCombineExtendingLoads(${D}, ${matchinfo.B});
}],
                   reg:$D, extending_load_matchdata:$matchinfo)>;
    def : GICombineRule<
      (defs reg:$D, reg:$A),
      (match (G_LOAD $t1, $D),
             (G_SEXT $A, $t1)),
      (apply (G_SEXTLOAD $A, $D))> {
      let MatchStartsFrom = (roots $D);
    };
    def : GICombineRule<
      (defs reg:$D, reg:$A, reg:$B, reg:$C),
      (match (G_TRUNC s32:$t1, s64:$A),
             (G_TRUNC s32:$t2, s64:$B),
             (G_ADD $D, $t1, $t2)
             (G_SEXT s64:$C, $D)),
      (apply (G_ADD $D, $A, $B),
             (G_SEXT_INREG $C, $D))> {
      let MatchStartsFrom = (roots $D);
    };
    def : GICombineRule<
      (defs reg:$D1, reg:$D2, reg:$A, reg:$B),
      (match (G_ADD $D1, $A, $B),
             (G_SUB $D2, $A, $B)),
      (apply (BUTTERFLY $D1, $D2, $A, $B))>;
    def : GICombineRule<
      (defs reg:$D, reg:$A, reg:$B),
      (match (G_TRUNC s32:$t1, s64:$A),
             (G_TRUNC s32:$t2, s64:$B),
             (G_ADD $D, $t1, $t2),
      (apply (ADD32 $D, (sub_lo $A), (sub_lo $B)))>;
    // This one keeps the def of $D/$A because we wanted operand instead of reg.
We didn't have
    // anything requiring that though and could trivially change them to reg.
    def : GICombineRule<
      (defs operand:$D, operand:$A),
      (match (G_LOAD $D, $A):$MI,
             (mmo_is_load8 instr:$MI)),
      (apply ...)>;
-------------- next part --------------
An HTML attachment was scrubbed...
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20181127/7aca646b/attachment-0001.html>

Apparently Analagous Threads

Search for more possibly parallel threads

llvm dev - Nov 2018 - [RFC] Tablegen-erated GlobalISel Combine Rules

[llvm-dev] [RFC] Tablegen-erated GlobalISel Combine Rules

[llvm-dev] [RFC] Tablegen-erated GlobalISel Combine Rules

[llvm-dev] [RFC] Tablegen-erated GlobalISel Combine Rules

[llvm-dev] [RFC] Tablegen-erated GlobalISel Combine Rules

Apparently Analagous Threads