Che-Liang Chiou
2011-Aug-24 08:30 UTC
[LLVMdev] proposal: add macro expansion of for-loop to TableGen
Hi folks,
TableGen provides sufficiently rich syntax for expressing target
instruction set. Nevertheless, when I wrote the PTX backend, I
observed that some redundancy in TableGen can be further eliminated
through macro expansion of for-loops.
The semantics of a for-loop is expanding the for-loop body, and so it
is equivalent to manually unroll the loop (see example #1).
I believe the for-loop is not only useful to the PTX backend but also
to other backends (see examples below). Generally speaking, a for-loop
can be used anywhere when you see a "table filling" pattern --- you
are writing repeated identical lines that only differs in a few places
(see examples below).
An (illustrative, not complete) BNF of for-loop is as follows:
----------------------------------------
FOR_LOOP := for INDICES in BLOCK
INDICES := INDEX, INDICES | INDEX
INDEX := VARIABLE = RANGE
RANGE := [comma separated values, ...] | function(args, ...)
BLOCK := { STATEMENTS } | STATEMENT;
STATEMENTS := STATEMENT; STATEMENTS | STATEMENT;
----------------------------------------
Notes:
* In statements, you may write #var or #func(args, ...) to expand a
macro or macro function.
* Macro functions are limited to a small set of functions, such as
sequence(), lower(), and upper().
====Example #1===
When defining register files, we repeatedly define every registers. In
ARMRegisterInfo.td:
----------------------------------------
def R0 : ARMReg< 0, "r0">, DwarfRegNum<[0]>;
def R1 : ARMReg< 1, "r1">, DwarfRegNum<[1]>;
def R2 : ARMReg< 2, "r2">, DwarfRegNum<[2]>;
def R3 : ARMReg< 3, "r3">, DwarfRegNum<[3]>;
def R4 : ARMReg< 4, "r4">, DwarfRegNum<[4]>;
def R5 : ARMReg< 5, "r5">, DwarfRegNum<[5]>;
def R6 : ARMReg< 6, "r6">, DwarfRegNum<[6]>;
def R7 : ARMReg< 7, "r7">, DwarfRegNum<[7]>;
----------------------------------------
I think it could be cleaner and shorter if we could write:
----------------------------------------
for i = sequence(0, 7) in
def R#i : ARMReg<#i, "r#i">, DwarfRegNum<[#i]>;
----------------------------------------
The for-loop is expanded into 8 def's, and each #i is substituted with 0 ~
7.
As a matter of fact, we have added "(sequence ...)" to TableGen that
makes defining register classes easier. I think add for-loop expansion
to TableGen would make defining registers easier.
====Example #2===
As you may see below, each "def load_*" is almost identical. The only
difference is the memory space name (global, constant, and etc.). I
believe a for-loop can make it much more readable.
(defining memory space patterns in PTXInstrInfo.td for each memory space)
----------------------------------------
def load_global : PatFrag<(ops node:$ptr), (load node:$ptr), [{
const Value *Src;
const PointerType *PT;
if ((Src = cast<LoadSDNode>(N)->getSrcValue()) &&
(PT = dyn_cast<PointerType>(Src->getType())))
return PT->getAddressSpace() == PTX::GLOBAL;
return false;
}]>;
def load_constant : PatFrag<(ops node:$ptr), (load node:$ptr), [{
const Value *Src;
const PointerType *PT;
if ((Src = cast<LoadSDNode>(N)->getSrcValue()) &&
(PT = dyn_cast<PointerType>(Src->getType())))
return PT->getAddressSpace() == PTX::CONSTANT;
return false;
}]>;
def load_local : PatFrag<(ops node:$ptr), (load node:$ptr), [{
const Value *Src;
const PointerType *PT;
if ((Src = cast<LoadSDNode>(N)->getSrcValue()) &&
(PT = dyn_cast<PointerType>(Src->getType())))
return PT->getAddressSpace() == PTX::LOCAL;
return false;
}]>;
def load_parameter : PatFrag<(ops node:$ptr), (load node:$ptr), [{
const Value *Src;
const PointerType *PT;
if ((Src = cast<LoadSDNode>(N)->getSrcValue()) &&
(PT = dyn_cast<PointerType>(Src->getType())))
return PT->getAddressSpace() == PTX::PARAMETER;
return false;
}]>;
def load_shared : PatFrag<(ops node:$ptr), (load node:$ptr), [{
const Value *Src;
const PointerType *PT;
if ((Src = cast<LoadSDNode>(N)->getSrcValue()) &&
(PT = dyn_cast<PointerType>(Src->getType())))
return PT->getAddressSpace() == PTX::SHARED;
return false;
}]>;
----------------------------------------
It would be much cleaner if we could write:
----------------------------------------
for space = [global, constant, local, parameter, shared] in {
def load_#space : PatFrag<(ops node:$ptr), (load node:$ptr), [{
const Value *Src;
const PointerType *PT;
if ((Src = cast<LoadSDNode>(N)->getSrcValue()) &&
(PT = dyn_cast<PointerType>(Src->getType())))
return PT->getAddressSpace() == PTX::#upper(space);
return false;
}]>;
}
----------------------------------------
====Example #3===
The PTX backend makes excessive use of multiclass to "overload" and
instruction that supports different types of operands. Each "def" of
the multiclass is almost idential.
In this case, although using a foo-loop adds a little bit cognitive
cost to understand macro expansion, since it removes a lot of
redundancy, I think it is actually more readable.
(excerpt of PTXInstrInfo.td)
----------------------------------------
multiclass PTX_FLOAT_4OP<string opcstr, SDNode opnode1, SDNode opnode2> {
def rrr32 : InstPTX<(outs RegF32:$d),
(ins RegF32:$a, RegF32:$b, RegF32:$c),
!strconcat(opcstr, ".f32\t$d, $a, $b, $c"),
[(set RegF32:$d, (opnode2 (opnode1 RegF32:$a,
RegF32:$b),
RegF32:$c))]>;
def rri32 : InstPTX<(outs RegF32:$d),
(ins RegF32:$a, RegF32:$b, f32imm:$c),
!strconcat(opcstr, ".f32\t$d, $a, $b, $c"),
[(set RegF32:$d, (opnode2 (opnode1 RegF32:$a,
RegF32:$b),
fpimm:$c))]>;
def rrr64 : InstPTX<(outs RegF64:$d),
(ins RegF64:$a, RegF64:$b, RegF64:$c),
!strconcat(opcstr, ".f64\t$d, $a, $b, $c"),
[(set RegF64:$d, (opnode2 (opnode1 RegF64:$a,
RegF64:$b),
RegF64:$c))]>;
def rri64 : InstPTX<(outs RegF64:$d),
(ins RegF64:$a, RegF64:$b, f64imm:$c),
!strconcat(opcstr, ".f64\t$d, $a, $b, $c"),
[(set RegF64:$d, (opnode2 (opnode1 RegF64:$a,
RegF64:$b),
fpimm:$c))]>;
}
----------------------------------------
(Equivalent TableGen code with a for-loop)
----------------------------------------
multiclass PTX_FLOAT_4OP<string opcstr, SDNode opnode1, SDNode opnode2> {
for nbit = [32, 32, 64, 64],
op_suffix = [r, i, r, i],
op_type = [RegF32, f32imm, RegF64, f64imm],
op_node_type = [RegF32, fpimm, RegF64, fpimm] in {
def rr#op_suffix#nbit
: InstPTX<(outs RegF#nbit:$d),
(ins RegF#nbit:$a, RegF#nbit:$b, #op_type:$c),
!strconcat(opcstr, ".f#nbit\t$d, $a, $b, $c"),
[(set RegF#nbit:$d,
(opnode2 (opnode1 RegF#nbit:$a, RegF#nbit:$b),
#op_node_type:$c))]>;
}
}
----------------------------------------
===
Any comments are welcome. If no one thinks it does not make sense at
all, I will write up more details to the mailing list (like a complete
BNF).
Regards,
Che-Liang
Che-Liang Chiou
2011-Aug-31 06:51 UTC
[LLVMdev] proposal: add macro expansion of for-loop to TableGen
Hi folks,
I have done studies since then, and I found that TableGen per se is
probably not a perfect place of adding macro expansion of for-loops.
If in the future someone proposes a similar thing, I hope this is
helpful to you. Here are the reasons:
Reason #1: If we add for-loop as a top-level syntactical construct,
such as let-statement, its usefulness is restricted.
Here is an excerpt of BNF of TableGen:
----------------------------------------
ObjectList :== Object*
Object ::= ClassInst
Object ::= DefInst
Object ::= MultiClassInst
Object ::= DefMInst
Object ::= LETCommand '{' ObjectList '}'
Object ::= LETCommand Object
DefInst ::= DEF ObjectName ObjectBody
ClassInst ::= CLASS ID TemplateArgList? ObjectBody
ObjectBody ::= BaseClassList Body
Body ::= ';'
Body ::= '{' BodyList '}'
BodyList ::= BodyItem*
BodyItem ::= Declaration ';'
BodyItem ::= LET ID OptionalBitList '=' Value ';'
----------------------------------------
So it is natural to think that we could add for-loop as an Object:
----------------------------------------
Object ::= FORCommand '{' ObjectList '}'
Object ::= FORCommand Object
FORCommand ::= FOR ForList IN
ForList ::= ForItem (',' ForItem)*
ForItem ::= ID '=' ForRange
ForRange ::= '[' ID (',' ID)* ']'
ForRange ::= SEQUENCETOK '(' INTVAL ',' INTVAL ')'
----------------------------------------
Note that inside a def block and a class block, we cannot define an
Object --- only Declaration and LET-assignment are allowed.
So although It is technological possible to add for-loop as a Object,
the usefulness is restricted: we may not write a for-loop inside a def
block or a class block.
Of course it is possible to rewrite the BNF to allow us write for-loop
inside a def block or a class block. This leads to reason #2.
Reason #2: If we make for-loop syntax orthogonal so that we may write
for-loop inside a def block or a class block, TableGen would be very
complicated regarding the BNF and the implementation. I believe we
could implement for-loop in a way that does not add too much
maintenance cost to the TableGen codebase.
Conclusion:
Instead of implementing for-loop in TableGen, I think it should be
done in a preprocessor, which unrolls the for-loop and probably also
injects C-style line control directives: "#line linenum filename".
Make sense?
Regards,
Che-Liang
On Wed, Aug 24, 2011 at 4:30 PM, Che-Liang Chiou <clchiou at gmail.com>
wrote:> Hi folks,
>
> TableGen provides sufficiently rich syntax for expressing target
> instruction set. Nevertheless, when I wrote the PTX backend, I
> observed that some redundancy in TableGen can be further eliminated
> through macro expansion of for-loops.
>
> The semantics of a for-loop is expanding the for-loop body, and so it
> is equivalent to manually unroll the loop (see example #1).
>
> I believe the for-loop is not only useful to the PTX backend but also
> to other backends (see examples below). Generally speaking, a for-loop
> can be used anywhere when you see a "table filling" pattern ---
you
> are writing repeated identical lines that only differs in a few places
> (see examples below).
>
> An (illustrative, not complete) BNF of for-loop is as follows:
> ----------------------------------------
> FOR_LOOP := for INDICES in BLOCK
>
> INDICES := INDEX, INDICES | INDEX
>
> INDEX := VARIABLE = RANGE
>
> RANGE := [comma separated values, ...] | function(args, ...)
>
> BLOCK := { STATEMENTS } | STATEMENT;
>
> STATEMENTS := STATEMENT; STATEMENTS | STATEMENT;
> ----------------------------------------
>
> Notes:
> * In statements, you may write #var or #func(args, ...) to expand a
> macro or macro function.
> * Macro functions are limited to a small set of functions, such as
> sequence(), lower(), and upper().
>
> ====Example #1===>
> When defining register files, we repeatedly define every registers. In
> ARMRegisterInfo.td:
> ----------------------------------------
> def R0 : ARMReg< 0, "r0">, DwarfRegNum<[0]>;
> def R1 : ARMReg< 1, "r1">, DwarfRegNum<[1]>;
> def R2 : ARMReg< 2, "r2">, DwarfRegNum<[2]>;
> def R3 : ARMReg< 3, "r3">, DwarfRegNum<[3]>;
> def R4 : ARMReg< 4, "r4">, DwarfRegNum<[4]>;
> def R5 : ARMReg< 5, "r5">, DwarfRegNum<[5]>;
> def R6 : ARMReg< 6, "r6">, DwarfRegNum<[6]>;
> def R7 : ARMReg< 7, "r7">, DwarfRegNum<[7]>;
> ----------------------------------------
>
> I think it could be cleaner and shorter if we could write:
> ----------------------------------------
> for i = sequence(0, 7) in
> def R#i : ARMReg<#i, "r#i">, DwarfRegNum<[#i]>;
> ----------------------------------------
> The for-loop is expanded into 8 def's, and each #i is substituted with
0 ~ 7.
>
> As a matter of fact, we have added "(sequence ...)" to TableGen
that
> makes defining register classes easier. I think add for-loop expansion
> to TableGen would make defining registers easier.
>
> ====Example #2===>
> As you may see below, each "def load_*" is almost identical. The
only
> difference is the memory space name (global, constant, and etc.). I
> believe a for-loop can make it much more readable.
>
> (defining memory space patterns in PTXInstrInfo.td for each memory space)
> ----------------------------------------
> def load_global : PatFrag<(ops node:$ptr), (load node:$ptr), [{
> const Value *Src;
> const PointerType *PT;
> if ((Src = cast<LoadSDNode>(N)->getSrcValue()) &&
> (PT = dyn_cast<PointerType>(Src->getType())))
> return PT->getAddressSpace() == PTX::GLOBAL;
> return false;
> }]>;
> def load_constant : PatFrag<(ops node:$ptr), (load node:$ptr), [{
> const Value *Src;
> const PointerType *PT;
> if ((Src = cast<LoadSDNode>(N)->getSrcValue()) &&
> (PT = dyn_cast<PointerType>(Src->getType())))
> return PT->getAddressSpace() == PTX::CONSTANT;
> return false;
> }]>;
> def load_local : PatFrag<(ops node:$ptr), (load node:$ptr), [{
> const Value *Src;
> const PointerType *PT;
> if ((Src = cast<LoadSDNode>(N)->getSrcValue()) &&
> (PT = dyn_cast<PointerType>(Src->getType())))
> return PT->getAddressSpace() == PTX::LOCAL;
> return false;
> }]>;
> def load_parameter : PatFrag<(ops node:$ptr), (load node:$ptr), [{
> const Value *Src;
> const PointerType *PT;
> if ((Src = cast<LoadSDNode>(N)->getSrcValue()) &&
> (PT = dyn_cast<PointerType>(Src->getType())))
> return PT->getAddressSpace() == PTX::PARAMETER;
> return false;
> }]>;
> def load_shared : PatFrag<(ops node:$ptr), (load node:$ptr), [{
> const Value *Src;
> const PointerType *PT;
> if ((Src = cast<LoadSDNode>(N)->getSrcValue()) &&
> (PT = dyn_cast<PointerType>(Src->getType())))
> return PT->getAddressSpace() == PTX::SHARED;
> return false;
> }]>;
> ----------------------------------------
>
> It would be much cleaner if we could write:
> ----------------------------------------
> for space = [global, constant, local, parameter, shared] in {
> def load_#space : PatFrag<(ops node:$ptr), (load node:$ptr), [{
> const Value *Src;
> const PointerType *PT;
> if ((Src = cast<LoadSDNode>(N)->getSrcValue()) &&
> (PT = dyn_cast<PointerType>(Src->getType())))
> return PT->getAddressSpace() == PTX::#upper(space);
> return false;
> }]>;
> }
> ----------------------------------------
>
> ====Example #3===>
> The PTX backend makes excessive use of multiclass to "overload"
and
> instruction that supports different types of operands. Each "def"
of
> the multiclass is almost idential.
>
> In this case, although using a foo-loop adds a little bit cognitive
> cost to understand macro expansion, since it removes a lot of
> redundancy, I think it is actually more readable.
>
> (excerpt of PTXInstrInfo.td)
> ----------------------------------------
> multiclass PTX_FLOAT_4OP<string opcstr, SDNode opnode1, SDNode
opnode2> {
> def rrr32 : InstPTX<(outs RegF32:$d),
> (ins RegF32:$a, RegF32:$b, RegF32:$c),
> !strconcat(opcstr, ".f32\t$d, $a, $b, $c"),
> [(set RegF32:$d, (opnode2 (opnode1 RegF32:$a,
> RegF32:$b),
> RegF32:$c))]>;
> def rri32 : InstPTX<(outs RegF32:$d),
> (ins RegF32:$a, RegF32:$b, f32imm:$c),
> !strconcat(opcstr, ".f32\t$d, $a, $b, $c"),
> [(set RegF32:$d, (opnode2 (opnode1 RegF32:$a,
> RegF32:$b),
> fpimm:$c))]>;
> def rrr64 : InstPTX<(outs RegF64:$d),
> (ins RegF64:$a, RegF64:$b, RegF64:$c),
> !strconcat(opcstr, ".f64\t$d, $a, $b, $c"),
> [(set RegF64:$d, (opnode2 (opnode1 RegF64:$a,
> RegF64:$b),
> RegF64:$c))]>;
> def rri64 : InstPTX<(outs RegF64:$d),
> (ins RegF64:$a, RegF64:$b, f64imm:$c),
> !strconcat(opcstr, ".f64\t$d, $a, $b, $c"),
> [(set RegF64:$d, (opnode2 (opnode1 RegF64:$a,
> RegF64:$b),
> fpimm:$c))]>;
> }
> ----------------------------------------
>
> (Equivalent TableGen code with a for-loop)
> ----------------------------------------
> multiclass PTX_FLOAT_4OP<string opcstr, SDNode opnode1, SDNode
opnode2> {
> for nbit = [32, 32, 64, 64],
> op_suffix = [r, i, r, i],
> op_type = [RegF32, f32imm, RegF64, f64imm],
> op_node_type = [RegF32, fpimm, RegF64, fpimm] in {
> def rr#op_suffix#nbit
> : InstPTX<(outs RegF#nbit:$d),
> (ins RegF#nbit:$a, RegF#nbit:$b, #op_type:$c),
> !strconcat(opcstr, ".f#nbit\t$d, $a, $b, $c"),
> [(set RegF#nbit:$d,
> (opnode2 (opnode1 RegF#nbit:$a, RegF#nbit:$b),
> #op_node_type:$c))]>;
> }
> }
> ----------------------------------------
>
> ===>
> Any comments are welcome. If no one thinks it does not make sense at
> all, I will write up more details to the mailing list (like a complete
> BNF).
>
> Regards,
> Che-Liang
>