Che-Liang Chiou
2011-Aug-24 08:30 UTC
[LLVMdev] proposal: add macro expansion of for-loop to TableGen
Hi folks, TableGen provides sufficiently rich syntax for expressing target instruction set. Nevertheless, when I wrote the PTX backend, I observed that some redundancy in TableGen can be further eliminated through macro expansion of for-loops. The semantics of a for-loop is expanding the for-loop body, and so it is equivalent to manually unroll the loop (see example #1). I believe the for-loop is not only useful to the PTX backend but also to other backends (see examples below). Generally speaking, a for-loop can be used anywhere when you see a "table filling" pattern --- you are writing repeated identical lines that only differs in a few places (see examples below). An (illustrative, not complete) BNF of for-loop is as follows: ---------------------------------------- FOR_LOOP := for INDICES in BLOCK INDICES := INDEX, INDICES | INDEX INDEX := VARIABLE = RANGE RANGE := [comma separated values, ...] | function(args, ...) BLOCK := { STATEMENTS } | STATEMENT; STATEMENTS := STATEMENT; STATEMENTS | STATEMENT; ---------------------------------------- Notes: * In statements, you may write #var or #func(args, ...) to expand a macro or macro function. * Macro functions are limited to a small set of functions, such as sequence(), lower(), and upper(). ====Example #1=== When defining register files, we repeatedly define every registers. In ARMRegisterInfo.td: ---------------------------------------- def R0 : ARMReg< 0, "r0">, DwarfRegNum<[0]>; def R1 : ARMReg< 1, "r1">, DwarfRegNum<[1]>; def R2 : ARMReg< 2, "r2">, DwarfRegNum<[2]>; def R3 : ARMReg< 3, "r3">, DwarfRegNum<[3]>; def R4 : ARMReg< 4, "r4">, DwarfRegNum<[4]>; def R5 : ARMReg< 5, "r5">, DwarfRegNum<[5]>; def R6 : ARMReg< 6, "r6">, DwarfRegNum<[6]>; def R7 : ARMReg< 7, "r7">, DwarfRegNum<[7]>; ---------------------------------------- I think it could be cleaner and shorter if we could write: ---------------------------------------- for i = sequence(0, 7) in def R#i : ARMReg<#i, "r#i">, DwarfRegNum<[#i]>; ---------------------------------------- The for-loop is expanded into 8 def's, and each #i is substituted with 0 ~ 7. As a matter of fact, we have added "(sequence ...)" to TableGen that makes defining register classes easier. I think add for-loop expansion to TableGen would make defining registers easier. ====Example #2=== As you may see below, each "def load_*" is almost identical. The only difference is the memory space name (global, constant, and etc.). I believe a for-loop can make it much more readable. (defining memory space patterns in PTXInstrInfo.td for each memory space) ---------------------------------------- def load_global : PatFrag<(ops node:$ptr), (load node:$ptr), [{ const Value *Src; const PointerType *PT; if ((Src = cast<LoadSDNode>(N)->getSrcValue()) && (PT = dyn_cast<PointerType>(Src->getType()))) return PT->getAddressSpace() == PTX::GLOBAL; return false; }]>; def load_constant : PatFrag<(ops node:$ptr), (load node:$ptr), [{ const Value *Src; const PointerType *PT; if ((Src = cast<LoadSDNode>(N)->getSrcValue()) && (PT = dyn_cast<PointerType>(Src->getType()))) return PT->getAddressSpace() == PTX::CONSTANT; return false; }]>; def load_local : PatFrag<(ops node:$ptr), (load node:$ptr), [{ const Value *Src; const PointerType *PT; if ((Src = cast<LoadSDNode>(N)->getSrcValue()) && (PT = dyn_cast<PointerType>(Src->getType()))) return PT->getAddressSpace() == PTX::LOCAL; return false; }]>; def load_parameter : PatFrag<(ops node:$ptr), (load node:$ptr), [{ const Value *Src; const PointerType *PT; if ((Src = cast<LoadSDNode>(N)->getSrcValue()) && (PT = dyn_cast<PointerType>(Src->getType()))) return PT->getAddressSpace() == PTX::PARAMETER; return false; }]>; def load_shared : PatFrag<(ops node:$ptr), (load node:$ptr), [{ const Value *Src; const PointerType *PT; if ((Src = cast<LoadSDNode>(N)->getSrcValue()) && (PT = dyn_cast<PointerType>(Src->getType()))) return PT->getAddressSpace() == PTX::SHARED; return false; }]>; ---------------------------------------- It would be much cleaner if we could write: ---------------------------------------- for space = [global, constant, local, parameter, shared] in { def load_#space : PatFrag<(ops node:$ptr), (load node:$ptr), [{ const Value *Src; const PointerType *PT; if ((Src = cast<LoadSDNode>(N)->getSrcValue()) && (PT = dyn_cast<PointerType>(Src->getType()))) return PT->getAddressSpace() == PTX::#upper(space); return false; }]>; } ---------------------------------------- ====Example #3=== The PTX backend makes excessive use of multiclass to "overload" and instruction that supports different types of operands. Each "def" of the multiclass is almost idential. In this case, although using a foo-loop adds a little bit cognitive cost to understand macro expansion, since it removes a lot of redundancy, I think it is actually more readable. (excerpt of PTXInstrInfo.td) ---------------------------------------- multiclass PTX_FLOAT_4OP<string opcstr, SDNode opnode1, SDNode opnode2> { def rrr32 : InstPTX<(outs RegF32:$d), (ins RegF32:$a, RegF32:$b, RegF32:$c), !strconcat(opcstr, ".f32\t$d, $a, $b, $c"), [(set RegF32:$d, (opnode2 (opnode1 RegF32:$a, RegF32:$b), RegF32:$c))]>; def rri32 : InstPTX<(outs RegF32:$d), (ins RegF32:$a, RegF32:$b, f32imm:$c), !strconcat(opcstr, ".f32\t$d, $a, $b, $c"), [(set RegF32:$d, (opnode2 (opnode1 RegF32:$a, RegF32:$b), fpimm:$c))]>; def rrr64 : InstPTX<(outs RegF64:$d), (ins RegF64:$a, RegF64:$b, RegF64:$c), !strconcat(opcstr, ".f64\t$d, $a, $b, $c"), [(set RegF64:$d, (opnode2 (opnode1 RegF64:$a, RegF64:$b), RegF64:$c))]>; def rri64 : InstPTX<(outs RegF64:$d), (ins RegF64:$a, RegF64:$b, f64imm:$c), !strconcat(opcstr, ".f64\t$d, $a, $b, $c"), [(set RegF64:$d, (opnode2 (opnode1 RegF64:$a, RegF64:$b), fpimm:$c))]>; } ---------------------------------------- (Equivalent TableGen code with a for-loop) ---------------------------------------- multiclass PTX_FLOAT_4OP<string opcstr, SDNode opnode1, SDNode opnode2> { for nbit = [32, 32, 64, 64], op_suffix = [r, i, r, i], op_type = [RegF32, f32imm, RegF64, f64imm], op_node_type = [RegF32, fpimm, RegF64, fpimm] in { def rr#op_suffix#nbit : InstPTX<(outs RegF#nbit:$d), (ins RegF#nbit:$a, RegF#nbit:$b, #op_type:$c), !strconcat(opcstr, ".f#nbit\t$d, $a, $b, $c"), [(set RegF#nbit:$d, (opnode2 (opnode1 RegF#nbit:$a, RegF#nbit:$b), #op_node_type:$c))]>; } } ---------------------------------------- === Any comments are welcome. If no one thinks it does not make sense at all, I will write up more details to the mailing list (like a complete BNF). Regards, Che-Liang
Che-Liang Chiou
2011-Aug-31 06:51 UTC
[LLVMdev] proposal: add macro expansion of for-loop to TableGen
Hi folks, I have done studies since then, and I found that TableGen per se is probably not a perfect place of adding macro expansion of for-loops. If in the future someone proposes a similar thing, I hope this is helpful to you. Here are the reasons: Reason #1: If we add for-loop as a top-level syntactical construct, such as let-statement, its usefulness is restricted. Here is an excerpt of BNF of TableGen: ---------------------------------------- ObjectList :== Object* Object ::= ClassInst Object ::= DefInst Object ::= MultiClassInst Object ::= DefMInst Object ::= LETCommand '{' ObjectList '}' Object ::= LETCommand Object DefInst ::= DEF ObjectName ObjectBody ClassInst ::= CLASS ID TemplateArgList? ObjectBody ObjectBody ::= BaseClassList Body Body ::= ';' Body ::= '{' BodyList '}' BodyList ::= BodyItem* BodyItem ::= Declaration ';' BodyItem ::= LET ID OptionalBitList '=' Value ';' ---------------------------------------- So it is natural to think that we could add for-loop as an Object: ---------------------------------------- Object ::= FORCommand '{' ObjectList '}' Object ::= FORCommand Object FORCommand ::= FOR ForList IN ForList ::= ForItem (',' ForItem)* ForItem ::= ID '=' ForRange ForRange ::= '[' ID (',' ID)* ']' ForRange ::= SEQUENCETOK '(' INTVAL ',' INTVAL ')' ---------------------------------------- Note that inside a def block and a class block, we cannot define an Object --- only Declaration and LET-assignment are allowed. So although It is technological possible to add for-loop as a Object, the usefulness is restricted: we may not write a for-loop inside a def block or a class block. Of course it is possible to rewrite the BNF to allow us write for-loop inside a def block or a class block. This leads to reason #2. Reason #2: If we make for-loop syntax orthogonal so that we may write for-loop inside a def block or a class block, TableGen would be very complicated regarding the BNF and the implementation. I believe we could implement for-loop in a way that does not add too much maintenance cost to the TableGen codebase. Conclusion: Instead of implementing for-loop in TableGen, I think it should be done in a preprocessor, which unrolls the for-loop and probably also injects C-style line control directives: "#line linenum filename". Make sense? Regards, Che-Liang On Wed, Aug 24, 2011 at 4:30 PM, Che-Liang Chiou <clchiou at gmail.com> wrote:> Hi folks, > > TableGen provides sufficiently rich syntax for expressing target > instruction set. Nevertheless, when I wrote the PTX backend, I > observed that some redundancy in TableGen can be further eliminated > through macro expansion of for-loops. > > The semantics of a for-loop is expanding the for-loop body, and so it > is equivalent to manually unroll the loop (see example #1). > > I believe the for-loop is not only useful to the PTX backend but also > to other backends (see examples below). Generally speaking, a for-loop > can be used anywhere when you see a "table filling" pattern --- you > are writing repeated identical lines that only differs in a few places > (see examples below). > > An (illustrative, not complete) BNF of for-loop is as follows: > ---------------------------------------- > FOR_LOOP := for INDICES in BLOCK > > INDICES := INDEX, INDICES | INDEX > > INDEX := VARIABLE = RANGE > > RANGE := [comma separated values, ...] | function(args, ...) > > BLOCK := { STATEMENTS } | STATEMENT; > > STATEMENTS := STATEMENT; STATEMENTS | STATEMENT; > ---------------------------------------- > > Notes: > * In statements, you may write #var or #func(args, ...) to expand a > macro or macro function. > * Macro functions are limited to a small set of functions, such as > sequence(), lower(), and upper(). > > ====Example #1===> > When defining register files, we repeatedly define every registers. In > ARMRegisterInfo.td: > ---------------------------------------- > def R0 : ARMReg< 0, "r0">, DwarfRegNum<[0]>; > def R1 : ARMReg< 1, "r1">, DwarfRegNum<[1]>; > def R2 : ARMReg< 2, "r2">, DwarfRegNum<[2]>; > def R3 : ARMReg< 3, "r3">, DwarfRegNum<[3]>; > def R4 : ARMReg< 4, "r4">, DwarfRegNum<[4]>; > def R5 : ARMReg< 5, "r5">, DwarfRegNum<[5]>; > def R6 : ARMReg< 6, "r6">, DwarfRegNum<[6]>; > def R7 : ARMReg< 7, "r7">, DwarfRegNum<[7]>; > ---------------------------------------- > > I think it could be cleaner and shorter if we could write: > ---------------------------------------- > for i = sequence(0, 7) in > def R#i : ARMReg<#i, "r#i">, DwarfRegNum<[#i]>; > ---------------------------------------- > The for-loop is expanded into 8 def's, and each #i is substituted with 0 ~ 7. > > As a matter of fact, we have added "(sequence ...)" to TableGen that > makes defining register classes easier. I think add for-loop expansion > to TableGen would make defining registers easier. > > ====Example #2===> > As you may see below, each "def load_*" is almost identical. The only > difference is the memory space name (global, constant, and etc.). I > believe a for-loop can make it much more readable. > > (defining memory space patterns in PTXInstrInfo.td for each memory space) > ---------------------------------------- > def load_global : PatFrag<(ops node:$ptr), (load node:$ptr), [{ > const Value *Src; > const PointerType *PT; > if ((Src = cast<LoadSDNode>(N)->getSrcValue()) && > (PT = dyn_cast<PointerType>(Src->getType()))) > return PT->getAddressSpace() == PTX::GLOBAL; > return false; > }]>; > def load_constant : PatFrag<(ops node:$ptr), (load node:$ptr), [{ > const Value *Src; > const PointerType *PT; > if ((Src = cast<LoadSDNode>(N)->getSrcValue()) && > (PT = dyn_cast<PointerType>(Src->getType()))) > return PT->getAddressSpace() == PTX::CONSTANT; > return false; > }]>; > def load_local : PatFrag<(ops node:$ptr), (load node:$ptr), [{ > const Value *Src; > const PointerType *PT; > if ((Src = cast<LoadSDNode>(N)->getSrcValue()) && > (PT = dyn_cast<PointerType>(Src->getType()))) > return PT->getAddressSpace() == PTX::LOCAL; > return false; > }]>; > def load_parameter : PatFrag<(ops node:$ptr), (load node:$ptr), [{ > const Value *Src; > const PointerType *PT; > if ((Src = cast<LoadSDNode>(N)->getSrcValue()) && > (PT = dyn_cast<PointerType>(Src->getType()))) > return PT->getAddressSpace() == PTX::PARAMETER; > return false; > }]>; > def load_shared : PatFrag<(ops node:$ptr), (load node:$ptr), [{ > const Value *Src; > const PointerType *PT; > if ((Src = cast<LoadSDNode>(N)->getSrcValue()) && > (PT = dyn_cast<PointerType>(Src->getType()))) > return PT->getAddressSpace() == PTX::SHARED; > return false; > }]>; > ---------------------------------------- > > It would be much cleaner if we could write: > ---------------------------------------- > for space = [global, constant, local, parameter, shared] in { > def load_#space : PatFrag<(ops node:$ptr), (load node:$ptr), [{ > const Value *Src; > const PointerType *PT; > if ((Src = cast<LoadSDNode>(N)->getSrcValue()) && > (PT = dyn_cast<PointerType>(Src->getType()))) > return PT->getAddressSpace() == PTX::#upper(space); > return false; > }]>; > } > ---------------------------------------- > > ====Example #3===> > The PTX backend makes excessive use of multiclass to "overload" and > instruction that supports different types of operands. Each "def" of > the multiclass is almost idential. > > In this case, although using a foo-loop adds a little bit cognitive > cost to understand macro expansion, since it removes a lot of > redundancy, I think it is actually more readable. > > (excerpt of PTXInstrInfo.td) > ---------------------------------------- > multiclass PTX_FLOAT_4OP<string opcstr, SDNode opnode1, SDNode opnode2> { > def rrr32 : InstPTX<(outs RegF32:$d), > (ins RegF32:$a, RegF32:$b, RegF32:$c), > !strconcat(opcstr, ".f32\t$d, $a, $b, $c"), > [(set RegF32:$d, (opnode2 (opnode1 RegF32:$a, > RegF32:$b), > RegF32:$c))]>; > def rri32 : InstPTX<(outs RegF32:$d), > (ins RegF32:$a, RegF32:$b, f32imm:$c), > !strconcat(opcstr, ".f32\t$d, $a, $b, $c"), > [(set RegF32:$d, (opnode2 (opnode1 RegF32:$a, > RegF32:$b), > fpimm:$c))]>; > def rrr64 : InstPTX<(outs RegF64:$d), > (ins RegF64:$a, RegF64:$b, RegF64:$c), > !strconcat(opcstr, ".f64\t$d, $a, $b, $c"), > [(set RegF64:$d, (opnode2 (opnode1 RegF64:$a, > RegF64:$b), > RegF64:$c))]>; > def rri64 : InstPTX<(outs RegF64:$d), > (ins RegF64:$a, RegF64:$b, f64imm:$c), > !strconcat(opcstr, ".f64\t$d, $a, $b, $c"), > [(set RegF64:$d, (opnode2 (opnode1 RegF64:$a, > RegF64:$b), > fpimm:$c))]>; > } > ---------------------------------------- > > (Equivalent TableGen code with a for-loop) > ---------------------------------------- > multiclass PTX_FLOAT_4OP<string opcstr, SDNode opnode1, SDNode opnode2> { > for nbit = [32, 32, 64, 64], > op_suffix = [r, i, r, i], > op_type = [RegF32, f32imm, RegF64, f64imm], > op_node_type = [RegF32, fpimm, RegF64, fpimm] in { > def rr#op_suffix#nbit > : InstPTX<(outs RegF#nbit:$d), > (ins RegF#nbit:$a, RegF#nbit:$b, #op_type:$c), > !strconcat(opcstr, ".f#nbit\t$d, $a, $b, $c"), > [(set RegF#nbit:$d, > (opnode2 (opnode1 RegF#nbit:$a, RegF#nbit:$b), > #op_node_type:$c))]>; > } > } > ---------------------------------------- > > ===> > Any comments are welcome. If no one thinks it does not make sense at > all, I will write up more details to the mailing list (like a complete > BNF). > > Regards, > Che-Liang >