Daniel Sanders via llvm-dev
2018-Nov-10 02:05 UTC
[llvm-dev] [RFC] Tablegen-erated GlobalISel Combine Rules
Thanks David!> On Nov 9, 2018, at 08:36, David Greene <dag at cray.com> wrote: > > Daniel Sanders via llvm-dev <llvm-dev at lists.llvm.org> writes: > >> I've been working on the GlobalISel combiner recently and I'd like to >> share the plan for how Combine Rules will be defined in GlobalISel and >> solicit feedback on it. > > This is really great stuff! I agree with pretty much everything Nicolai > said, particularly the use of DAGs. That's seems much more natually > TableGen-y to me. Specific comments are below. > > But before that, I've been long pained that we have so much duplicated > code in instcombine and dagcombine. I know this is way beyond the scope > of this work but do you think the basic concepts could be applied to > produce TableGen-generated instcombine passes? It would be nice to > re-use many of the rules, for example, except they'd match LLVM IR > rather than MIR. As you go about implementation, maybe keep this idea > in mind?That's an interesting idea. Certainly tablegenerating InstCombine ought to be possible and sharing code sounds like it ought to be doable. MIR and IR are pretty similar especially after IRTranslator (which is a direct translation) through to the Legalizer (which is the first point target instructions can until targets make custom passes). From the Legalizer to ISel, there's still likely to be a fair amount of overlap between the two as a lot of the G_* opcodes directly correspond to LLVM-IR instructions. The tricky bit will be the escape hatches into C++ would need to either have Instruction/MachineInstr versions or would need to accept both.>> Here's a simple example that eliminates a redundant G_TRUNC. >> >> def : GICombineRule<(defs root:$D, operand:$S), >> (match [{MIR %1(s32) = G_ZEXT %S(s8) >> %D(s16) = G_TRUNC %1(s32) }]), >> (apply [{MIR %D(s16) = G_ZEXT %S(s8) }])>; > > The use of '%' vs. '$' here really threw me for a loop. I would have > expected '$D' and '$S' everywhere. What's the significance of '%' > vs. '$'? I know MIR uses '%' for names but must that be the case in > these match blocks?In MIR, '%' and '$' have a semantic difference when used on operands. '%foo' is a virtual register named foo but '$foo' is the physical register foo. The main reason I didn't pick something distinct from either (e.g. '${foo}') is that I'd like to minimize the need to modify the MIR parser to support pattern-specific syntax.> Of course if we go with Nicolai's use of DAGs this issue seems to go > away. > >> * defs declares the interface for the rule. The definitions in the >> defs section are the glue between the Combine algorithm and the >> rule, as well as between the match and apply sections. In this case >> we only define the root of the match and that we have a register >> variable called $S. > > My understanding is that "root" means the sink here, but later in the > "upside-down" example, "root" means the source. It seems to me the use > of "root" here is a misnomer/confusing. I liked Nicolai's ideas about > specifying the insert point. Why do users need to know about "root" at > all? Is there some other special meaning attached to it? > > -DavidIt doesn't correspond with any property of the DAG being matched. It's the entry point that the combine algorithm uses to begin an attempt to match the rule. In DAGCombine terms, it's the first node checked for the rule inside the call to DAGCombine::visit(SDNode *).
David Greene via llvm-dev
2018-Nov-13 16:01 UTC
[llvm-dev] [RFC] Tablegen-erated GlobalISel Combine Rules
Daniel Sanders via llvm-dev <llvm-dev at lists.llvm.org> writes:> That's an interesting idea. Certainly tablegenerating InstCombine > ought to be possible and sharing code sounds like it ought to be > doable. MIR and IR are pretty similar especially after IRTranslator > (which is a direct translation) through to the Legalizer (which is the > first point target instructions can until targets make custom > passes). From the Legalizer to ISel, there's still likely to be a fair > amount of overlap between the two as a lot of the G_* opcodes directly > correspond to LLVM-IR instructions. The tricky bit will be the escape > hatches into C++ would need to either have Instruction/MachineInstr > versions or would need to accept both.Yes. I wonder if templates can help with this. I'm thinking of LoopInfo, which is parameterized with BasicBlock/Loop or MachineBasicBlock/MachineLoop, as a guide. I don't know if Instruction and MachineInstr have enough APIs in common, though. Perhaps a longer-term effort could be made to develop a common API with the same spellings for both, at least enough for dagcombine purposes. Your initial response to Nicolai's suggestion to use TableGen's dag type triggered something else in my mind. The TableGen SelectionDAG stuff is really restrictive in a lot of ways because the TableGen dag type is used to express actual DAGs and you rightly pointed out that that's too unwieldy for dagcombine. It's unwieldy for isel too, which has many of the same issues (multiple outputs, use of things like EXTRACT_SUBREG and so on). It's never handled selecting to multiple instructions very well either. I wonder if it would be possible to express isel patterns using TableGen's dag type in the way that Nicolai suggests for dagcombine. In a lot of ways, isel is "just another kind of dagcombine.">> The use of '%' vs. '$' here really threw me for a loop. I would have >> expected '$D' and '$S' everywhere. What's the significance of '%' >> vs. '$'? I know MIR uses '%' for names but must that be the case in >> these match blocks? > > In MIR, '%' and '$' have a semantic difference when used on operands. > '%foo' is a virtual register named foo but '$foo' is the physical register foo.Ok, thanks for the explanation. I have not worked extensively with MIR yet.> The main reason I didn't pick something distinct from either > (e.g. '${foo}') is that I'd like to minimize the need to modify the > MIR parser to support pattern-specific syntax.Sure, that makes sense.>> My understanding is that "root" means the sink here, but later in the >> "upside-down" example, "root" means the source. It seems to me the use >> of "root" here is a misnomer/confusing. I liked Nicolai's ideas about >> specifying the insert point. Why do users need to know about "root" at >> all? Is there some other special meaning attached to it? > > It doesn't correspond with any property of the DAG being matched. It's > the entry point that the combine algorithm uses to begin an attempt to > match the rule. In DAGCombine terms, it's the first node checked for > the rule inside the call to DAGCombine::visit(SDNode *).Got it. "root" has a specific meeting for a tree/dag and I think I was getting hung up on that, but I see how it has meaning as the starting point of a match. I was going to say maybe we can find another name for it, but now I can't think of a better one. :) -David
Daniel Sanders via llvm-dev
2018-Nov-15 18:00 UTC
[llvm-dev] [RFC] Tablegen-erated GlobalISel Combine Rules
> On Nov 13, 2018, at 08:01, David Greene <dag at cray.com> wrote: > > Daniel Sanders via llvm-dev <llvm-dev at lists.llvm.org> writes: > >> That's an interesting idea. Certainly tablegenerating InstCombine >> ought to be possible and sharing code sounds like it ought to be >> doable. MIR and IR are pretty similar especially after IRTranslator >> (which is a direct translation) through to the Legalizer (which is the >> first point target instructions can until targets make custom >> passes). From the Legalizer to ISel, there's still likely to be a fair >> amount of overlap between the two as a lot of the G_* opcodes directly >> correspond to LLVM-IR instructions. The tricky bit will be the escape >> hatches into C++ would need to either have Instruction/MachineInstr >> versions or would need to accept both. > > Yes. I wonder if templates can help with this. I'm thinking of > LoopInfo, which is parameterized with BasicBlock/Loop or > MachineBasicBlock/MachineLoop, as a guide. I don't know if Instruction > and MachineInstr have enough APIs in common, though. Perhaps a > longer-term effort could be made to develop a common API with the same > spellings for both, at least enough for dagcombine purposes.Unfortunately, Instruction is a class-hierachy based design with specialized getters for each subclass whereas MachineInstr is a single one-size-fits-all class. It's likely to be difficult to unify them with templates. A common API sounds feasible to some degree, certainly the G_* opcodes and the Instruction hierarchy ought to be able to agree on abstraction that can interact with both, and MachineMemOperand ought to be able to agree with LoadInst/StoreInst/Atomic*Inst> Your initial response to Nicolai's suggestion to use TableGen's dag type > triggered something else in my mind. The TableGen SelectionDAG stuff is > really restrictive in a lot of ways because the TableGen dag type is > used to express actual DAGs and you rightly pointed out that that's too > unwieldy for dagcombine. It's unwieldy for isel too, which has many of > the same issues (multiple outputs, use of things like EXTRACT_SUBREG and > so on). It's never handled selecting to multiple instructions very well > either. I wonder if it would be possible to express isel patterns using > TableGen's dag type in the way that Nicolai suggests for dagcombine. In > a lot of ways, isel is "just another kind of dagcombine."I expect it to be able to deal with ISel too. The main difference between the two is that ISel's apply step is required to constrain operands to a register class whereas Combine doesn't need to do that but can choose to. I'm intending to share code between the tablegen passes for Combine and ISel so that it will be the same underlying code generator. The syntax could also work for the Legalizer, and RegBankAlloc too. The legalizer is a degenerate case of matching a single instruction's opcode, types, and MachineMemOperand so it's probably not worth doing there, but using it in RegBankAlloc to specify the alternative code for each bank and the constraints to apply could be useful. The RegBankAlloc pass boils down to lots of rules of the form: %2(s32) = ADD %0(s32), %1(s32) => %2:GPR = ADD %0:GPR, %1:GPR with some contradictory rules like: %1(s32) = FNEG %0(s32) => %1:FPR = FNEG %0: FPR %1(s32) = FNEG %0(s32) => %1:GPR = XOR %0:GPR, i32 0x80000000 RegBankAlloc's job is then to resolve the contradictions in the most globally efficient way. For example, if the input is already in FPR's then the higher latency of an FNEG might beat the cost of transferring it to GPR and using the lower-latency XOR. However, if the result needs to be in a GPR anyway then it's better to copy to GPR before the FNEG and use the lower-latency XOR instead.>>> The use of '%' vs. '$' here really threw me for a loop. I would have >>> expected '$D' and '$S' everywhere. What's the significance of '%' >>> vs. '$'? I know MIR uses '%' for names but must that be the case in >>> these match blocks? >> >> In MIR, '%' and '$' have a semantic difference when used on operands. >> '%foo' is a virtual register named foo but '$foo' is the physical register foo. > > Ok, thanks for the explanation. I have not worked extensively with MIR > yet. > >> The main reason I didn't pick something distinct from either >> (e.g. '${foo}') is that I'd like to minimize the need to modify the >> MIR parser to support pattern-specific syntax. > > Sure, that makes sense. > >>> My understanding is that "root" means the sink here, but later in the >>> "upside-down" example, "root" means the source. It seems to me the use >>> of "root" here is a misnomer/confusing. I liked Nicolai's ideas about >>> specifying the insert point. Why do users need to know about "root" at >>> all? Is there some other special meaning attached to it? >> >> It doesn't correspond with any property of the DAG being matched. It's >> the entry point that the combine algorithm uses to begin an attempt to >> match the rule. In DAGCombine terms, it's the first node checked for >> the rule inside the call to DAGCombine::visit(SDNode *). > > Got it. "root" has a specific meeting for a tree/dag and I think I was > getting hung up on that, but I see how it has meaning as the starting > point of a match. I was going to say maybe we can find another name for > it, but now I can't think of a better one. :)The best I can think of is 'matchpoint' or 'matchfrom'. As mentioned on the thread with Nicolai, we might be able to drop it altogether in favour of a let statement which can a longer name which will help to clarify it.> -David