thr3ads.net - llvm dev - [llvm-dev] [RFC] Tablegen-erated GlobalISel Combine Rules [Nov 2018]

If this information is useful, please help other people find it:
Share via:

Daniel Sanders via llvm-dev

2018-Nov-10 02:05 UTC

[llvm-dev] [RFC] Tablegen-erated GlobalISel Combine Rules

Thanks David!
> On Nov 9, 2018, at 08:36, David Greene <dag at cray.com> wrote:
> 
> Daniel Sanders via llvm-dev <llvm-dev at lists.llvm.org> writes:
> 
>> I've been working on the GlobalISel combiner recently and I'd
like to
>> share the plan for how Combine Rules will be defined in GlobalISel and
>> solicit feedback on it.
> 
> This is really great stuff!  I agree with pretty much everything Nicolai
> said, particularly the use of DAGs.  That's seems much more natually
> TableGen-y to me.  Specific comments are below.
> 
> But before that, I've been long pained that we have so much duplicated
> code in instcombine and dagcombine.  I know this is way beyond the scope
> of this work but do you think the basic concepts could be applied to
> produce TableGen-generated instcombine passes?  It would be nice to
> re-use many of the rules, for example, except they'd match LLVM IR
> rather than MIR.  As you go about implementation, maybe keep this idea
> in mind?
That's an interesting idea. Certainly tablegenerating InstCombine ought to
be possible and sharing code sounds like it ought to be doable. MIR and IR are
pretty similar especially after IRTranslator (which is a direct translation)
through to the Legalizer (which is the first point target instructions can until
targets make custom passes). From the Legalizer to ISel, there's still
likely to be a fair amount of overlap between the two as a lot of the G_*
opcodes directly correspond to LLVM-IR instructions. The tricky bit will be the
escape hatches into C++ would need to either have Instruction/MachineInstr
versions or would need to accept both.
>> Here's a simple example that eliminates a redundant G_TRUNC.
>> 
>> def : GICombineRule<(defs root:$D, operand:$S),
>> (match [{MIR %1(s32) = G_ZEXT %S(s8)
>> %D(s16) = G_TRUNC %1(s32) }]),
>> (apply [{MIR %D(s16) = G_ZEXT %S(s8) }])>;
> 
> The use of '%' vs. '$' here really threw me for a loop.  I
would have
> expected '$D' and '$S' everywhere.  What's the
significance of '%'
> vs. '$'?  I know MIR uses '%' for names but must that be
the case in
> these match blocks?
In MIR, '%' and '$' have a semantic difference when used on
operands.
'%foo' is a virtual register named foo but '$foo' is the
physical register foo.

The main reason I didn't pick something distinct from either (e.g.
'${foo}') is that I'd like to minimize the need to modify the MIR
parser to support pattern-specific syntax.
> Of course if we go with Nicolai's use of DAGs this issue seems to go
> away.
> 
>> * defs declares the interface for the rule. The definitions in the
>>  defs section are the glue between the Combine algorithm and the
>>  rule, as well as between the match and apply sections. In this case
>>  we only define the root of the match and that we have a register
>>  variable called $S.
> 
> My understanding is that "root" means the sink here, but later in
the
> "upside-down" example, "root" means the source.  It
seems to me the use
> of "root" here is a misnomer/confusing.  I liked Nicolai's
ideas about
> specifying the insert point.  Why do users need to know about
"root" at
> all?  Is there some other special meaning attached to it?
> 
>                               -David
It doesn't correspond with any property of the DAG being matched. It's
the entry point that the combine algorithm uses to begin an attempt to match the
rule. In DAGCombine terms, it's the first node checked for the rule inside
the call to DAGCombine::visit(SDNode *).

David Greene via llvm-dev

2018-Nov-13 16:01 UTC

head link

[llvm-dev] [RFC] Tablegen-erated GlobalISel Combine Rules

Daniel Sanders via llvm-dev <llvm-dev at lists.llvm.org> writes:
> That's an interesting idea. Certainly tablegenerating InstCombine
> ought to be possible and sharing code sounds like it ought to be
> doable. MIR and IR are pretty similar especially after IRTranslator
> (which is a direct translation) through to the Legalizer (which is the
> first point target instructions can until targets make custom
> passes). From the Legalizer to ISel, there's still likely to be a fair
> amount of overlap between the two as a lot of the G_* opcodes directly
> correspond to LLVM-IR instructions. The tricky bit will be the escape
> hatches into C++ would need to either have Instruction/MachineInstr
> versions or would need to accept both.
Yes.  I wonder if templates can help with this.  I'm thinking of
LoopInfo, which is parameterized with BasicBlock/Loop or
MachineBasicBlock/MachineLoop, as a guide.  I don't know if Instruction
and MachineInstr have enough APIs in common, though.  Perhaps a
longer-term effort could be made to develop a common API with the same
spellings for both, at least enough for dagcombine purposes.

Your initial response to Nicolai's suggestion to use TableGen's dag type
triggered something else in my mind.  The TableGen SelectionDAG stuff is
really restrictive in a lot of ways because the TableGen dag type is
used to express actual DAGs and you rightly pointed out that that's too
unwieldy for dagcombine.  It's unwieldy for isel too, which has many of
the same issues (multiple outputs, use of things like EXTRACT_SUBREG and
so on).  It's never handled selecting to multiple instructions very well
either.  I wonder if it would be possible to express isel patterns using
TableGen's dag type in the way that Nicolai suggests for dagcombine.  In
a lot of ways, isel is "just another kind of dagcombine."
>> The use of '%' vs. '$' here really threw me for a loop.
I would have
>> expected '$D' and '$S' everywhere.  What's the
significance of '%'
>> vs. '$'?  I know MIR uses '%' for names but must that
be the case in
>> these match blocks?
>
> In MIR, '%' and '$' have a semantic difference when used on
operands.
> '%foo' is a virtual register named foo but '$foo' is the
physical register foo.
Ok, thanks for the explanation.  I have not worked extensively with MIR
yet.
> The main reason I didn't pick something distinct from either
> (e.g. '${foo}') is that I'd like to minimize the need to modify
the
> MIR parser to support pattern-specific syntax.
Sure, that makes sense.
>> My understanding is that "root" means the sink here, but
later in the
>> "upside-down" example, "root" means the source.  It
seems to me the use
>> of "root" here is a misnomer/confusing.  I liked
Nicolai's ideas about
>> specifying the insert point.  Why do users need to know about
"root" at
>> all?  Is there some other special meaning attached to it?
>
> It doesn't correspond with any property of the DAG being matched.
It's
> the entry point that the combine algorithm uses to begin an attempt to
> match the rule. In DAGCombine terms, it's the first node checked for
> the rule inside the call to DAGCombine::visit(SDNode *).
Got it.  "root" has a specific meeting for a tree/dag and I think I
was
getting hung up on that, but I see how it has meaning as the starting
point of a match.  I was going to say maybe we can find another name for
it, but now I can't think of a better one.  :)

                               -David

Daniel Sanders via llvm-dev

2018-Nov-15 18:00 UTC

head link

[llvm-dev] [RFC] Tablegen-erated GlobalISel Combine Rules

> On Nov 13, 2018, at 08:01, David Greene <dag at cray.com> wrote:
> 
> Daniel Sanders via llvm-dev <llvm-dev at lists.llvm.org> writes:
> 
>> That's an interesting idea. Certainly tablegenerating InstCombine
>> ought to be possible and sharing code sounds like it ought to be
>> doable. MIR and IR are pretty similar especially after IRTranslator
>> (which is a direct translation) through to the Legalizer (which is the
>> first point target instructions can until targets make custom
>> passes). From the Legalizer to ISel, there's still likely to be a
fair
>> amount of overlap between the two as a lot of the G_* opcodes directly
>> correspond to LLVM-IR instructions. The tricky bit will be the escape
>> hatches into C++ would need to either have Instruction/MachineInstr
>> versions or would need to accept both.
> 
> Yes.  I wonder if templates can help with this.  I'm thinking of
> LoopInfo, which is parameterized with BasicBlock/Loop or
> MachineBasicBlock/MachineLoop, as a guide.  I don't know if Instruction
> and MachineInstr have enough APIs in common, though.  Perhaps a
> longer-term effort could be made to develop a common API with the same
> spellings for both, at least enough for dagcombine purposes.
Unfortunately, Instruction is a class-hierachy based design with specialized
getters for each subclass whereas MachineInstr is a single one-size-fits-all
class. It's likely to be difficult to unify them with templates. A common
API sounds feasible to some degree, certainly the G_* opcodes and the
Instruction hierarchy ought to be able to agree on abstraction that can interact
with both, and MachineMemOperand ought to be able to agree with
LoadInst/StoreInst/Atomic*Inst
> Your initial response to Nicolai's suggestion to use TableGen's dag
type
> triggered something else in my mind.  The TableGen SelectionDAG stuff is
> really restrictive in a lot of ways because the TableGen dag type is
> used to express actual DAGs and you rightly pointed out that that's too
> unwieldy for dagcombine.  It's unwieldy for isel too, which has many of
> the same issues (multiple outputs, use of things like EXTRACT_SUBREG and
> so on).  It's never handled selecting to multiple instructions very
well
> either.  I wonder if it would be possible to express isel patterns using
> TableGen's dag type in the way that Nicolai suggests for dagcombine. 
In
> a lot of ways, isel is "just another kind of dagcombine."
I expect it to be able to deal with ISel too. The main difference between the
two is that ISel's apply step is required to constrain operands to a
register class whereas Combine doesn't need to do that but can choose to.
I'm intending to share code between the tablegen passes for Combine and ISel
so that it will be the same underlying code generator.

The syntax could also work for the Legalizer, and RegBankAlloc too. The
legalizer is a degenerate case of matching a single instruction's opcode,
types, and MachineMemOperand so it's probably not worth doing there, but
using it in RegBankAlloc to specify the alternative code for each bank and the
constraints to apply could be useful. The RegBankAlloc pass boils down to lots
of rules of the form:
	%2(s32) = ADD %0(s32), %1(s32) => %2:GPR = ADD %0:GPR, %1:GPR
with some contradictory rules like:
	%1(s32) = FNEG %0(s32) => %1:FPR = FNEG %0: FPR
	%1(s32) = FNEG %0(s32) => %1:GPR = XOR %0:GPR, i32 0x80000000
RegBankAlloc's job is then to resolve the contradictions in the most
globally efficient way. For example, if the input is already in FPR's then
the higher latency of an FNEG might beat the cost of transferring it to GPR and
using the lower-latency XOR. However, if the result needs to be in a GPR anyway
then it's better to copy to GPR before the FNEG and use the lower-latency
XOR instead.
>>> The use of '%' vs. '$' here really threw me for a
loop.  I would have
>>> expected '$D' and '$S' everywhere.  What's the
significance of '%'
>>> vs. '$'?  I know MIR uses '%' for names but must
that be the case in
>>> these match blocks?
>> 
>> In MIR, '%' and '$' have a semantic difference when
used on operands.
>> '%foo' is a virtual register named foo but '$foo' is
the physical register foo.
> 
> Ok, thanks for the explanation.  I have not worked extensively with MIR
> yet.
> 
>> The main reason I didn't pick something distinct from either
>> (e.g. '${foo}') is that I'd like to minimize the need to
modify the
>> MIR parser to support pattern-specific syntax.
> 
> Sure, that makes sense.
> 
>>> My understanding is that "root" means the sink here, but
later in the
>>> "upside-down" example, "root" means the source.
It seems to me the use
>>> of "root" here is a misnomer/confusing.  I liked
Nicolai's ideas about
>>> specifying the insert point.  Why do users need to know about
"root" at
>>> all?  Is there some other special meaning attached to it?
>> 
>> It doesn't correspond with any property of the DAG being matched.
It's
>> the entry point that the combine algorithm uses to begin an attempt to
>> match the rule. In DAGCombine terms, it's the first node checked
for
>> the rule inside the call to DAGCombine::visit(SDNode *).
> 
> Got it.  "root" has a specific meeting for a tree/dag and I think
I was
> getting hung up on that, but I see how it has meaning as the starting
> point of a match.  I was going to say maybe we can find another name for
> it, but now I can't think of a better one.  :)
The best I can think of is 'matchpoint' or 'matchfrom'. As
mentioned on the thread with Nicolai, we might be able to drop it altogether in
favour of a let statement which can a longer name which will help to clarify it.
>                               -David

Apparently Analagous Threads

Search for more reasonably related threads

llvm dev - Nov 2018 - [RFC] Tablegen-erated GlobalISel Combine Rules

[llvm-dev] [RFC] Tablegen-erated GlobalISel Combine Rules

[llvm-dev] [RFC] Tablegen-erated GlobalISel Combine Rules

[llvm-dev] [RFC] Tablegen-erated GlobalISel Combine Rules

Apparently Analagous Threads