thr3ads.net - llvm dev - [LLVMdev] Instruction descriptions question [Oct 2006]

If this information is useful, please help other people find it:
Share via:

Roman Levenstein

2006-Oct-01 12:06 UTC

[LLVMdev] Instruction descriptions question

Hi,

I'm trying to implement a new backend for an embedded CISC processor.
Therefore I thought that it makes sense to take X86 target as a basis,
to save some time. 

But when I look into the X86InstrInfo.td, I have a very strong feeling
that it is one of the most complex instruction set descriptions
compared to other targets. I can imagine that this is due to the
complexity of X86's instruction set. But still I have a feeling that it
is overcomplicated, but may be it is just an initial impression.

Since I just started, here are my first several questions and
proposals. I hope that some of them make sense for you.

1. Why does X86 instruction set description provide different
descriptions for the same instructions, which differ only in the size
of operands?
E.g.

def MOV8rm  : I<0x8A, MRMSrcMem, (ops GR8 :$dst, i8mem :$src),
                "mov{b} {$src, $dst|$dst, $src}",
                [(set GR8:$dst, (load addr:$src))]>;
def MOV16rm : I<0x8B, MRMSrcMem, (ops GR16:$dst, i16mem:$src),
                "mov{w} {$src, $dst|$dst, $src}",
                [(set GR16:$dst, (load addr:$src))]>, OpSize;
def MOV32rm : I<0x8B, MRMSrcMem, (ops GR32:$dst, i32mem:$src),
                "mov{l} {$src, $dst|$dst, $src}",
                [(set GR32:$dst, (load addr:$src))]>;

For my target processor, only the types of operands (e.g. rm) are
really important. The size is encoded into the opcode, but otherwise it
does not affect anything except for constraints on the sizes of
operands. And constraint is basically that the size of both operands
should be the same.

Wouldn't it be possible and even more clean to have just one
description like (I use a pseudo-description here):

def MOVrr  : I<0x88, MRMDestReg, (ops (GR8|GR16|GR32) :$dst, 
(i8mem|i16mem|i32mem):$src),
                "mov{b} {$src, $dst|$dst, $src}", []>,
isSameSize($dst,
$src);

The semantic of such a description would mean that $dst should be one
of GR8, GR16, GR32 and $dst is one of i8mem, i16mem, i32mem with the
additional constraint that the sizes of both operands are the same
(this is checked by isSameSize predicate). 

More over, in very many cases, the operation (e.g. OR) itself affects
only the encoding of the operation inside the opcode and of course the
description of the effect and assembler syntax e.g. [(store (or (load
addr:$dst), GR8:$src), addr:$dst)]. For OR insn the opreation should be
or, and for AND insn - obviously and. But operands and operand
combinations have the same limitations and descriptions also for many
other operations, e.g. XOR, AND, etc.
Wouldn't it make sense to provide an easier way to describe it?
For example, one would describe operands combinations (e.g. RR, RM)
separately and then just say which instructions they are applicable
for? May be using something like:

def rm : (GR32 :$dst, i32mem:$src)
{
  "$assm_op{l} {$src, $dst|$dst, $src}",
  [(set GR32:$dst, ($insn_effect_op GR16:$dst, (load addr:$src)))]     
         
};

Mapping from instruction name to instruction syntax and/or effect can
be also simplified to something like:

def AND -> assm_op="and", insn_effect_op="and";
def OR -> assm_op"or", insn_effect_op="or";

let rm in {
  def OR    : ...;
  def AND   : ...;
  def MOV   : ...;
  ...
}

The effect would be a set of instructions descriptions with names 
ORrm, ANDrm, MOVrm and required type of operands. Effectively, it
produces the same descriptions that would be written by hand (in this
regard it works pretty much as a preprocessor that expands macro
definitions). The advantage is that descriptions become much shorter
and cleaner, because parts common for multiple instructions are
factored out into a single description element.

I ask it because I have used BURG based instruction selectors (e.g. the
one from LCC compiler and some others) before and there is was quite
easy to express something like that (though not everything described
above) in the tree grammar for a code selector,e.g.
 gpr  -> GR8 | GR16| GR32;
 mem  -> i8mem | i16mem | i32mem;  
 insn -> mov(gpr:src, mem:dst) | operand_size_of(src) =operand_size_of(src)

Is it possible to express something like that using TableGen
descriptions? Or are they so limited at the moment that it is not
possible and this is the reason for so elaborative descriptions for the
X86 target. If it is not possible yet, are there any plans to implement
something like that for providing the possibilities for even cleaner,
shorter and more expressive and concise descriptions? Of course, all
extensions the I described are just sketches, but what do you think
about this approach as such?

2. Another related question is about instruciton costs. In BURG-based
selectors there is usually a possibility to describe costs for the
instructions so that a least-cost cover can be found during the
instruction selection process. I have not seen any such cost
descriptions in TableGen files. Does it mean that it is not supported?
Why? As far as I understand, LLVM uses BURG/IBURG for instruction
selection and both of these tools support costs, AFAIK.

Best Regards,
  Roman

__________________________________________________
Do You Yahoo!?
Tired of spam?  Yahoo! Mail has the best spam protection around 
http://mail.yahoo.com

Roman Levenstein

2006-Oct-02 19:37 UTC

head link

[LLVMdev] Instruction descriptions question

Hi Chris,

Thanks a lot for your answer!

Chris Lattner wrote:>> 1. Why does X86 instruction set description provide different
>> descriptions for the same instructions, which differ only in the
size>> of operands?
>> E.g.
>>
>> def MOV8rm  : I<0x8A, MRMSrcMem, (ops GR8 :$dst, i8mem :$src),
>>                "mov{b} {$src, $dst|$dst, $src}",
>>                [(set GR8:$dst, (load addr:$src))]>;
>> def MOV16rm : I<0x8B, MRMSrcMem, (ops GR16:$dst, i16mem:$src),
>>                "mov{w} {$src, $dst|$dst, $src}",
>>                [(set GR16:$dst, (load addr:$src))]>, OpSize;
>> def MOV32rm : I<0x8B, MRMSrcMem, (ops GR32:$dst, i32mem:$src),
>>                "mov{l} {$src, $dst|$dst, $src}",
>>                [(set GR32:$dst, (load addr:$src))]>;
> 
> We must do this, because they perform different operations. 
Specifically, > the loads all read a different number of bytes (1/2/4).
OK.

 >> For my target processor, only the types of operands (e.g. rm) are
>> really important. The size is encoded into the opcode, but otherwise
it>> does not affect anything except for constraints on the sizes of
>> operands. And constraint is basically that the size of both operands
>> should be the same.
> 
> Ok.
> 
>> Wouldn't it be possible and even more clean to have just one
>> description like (I use a pseudo-description here):
>>
>> def MOVrr  : I<0x88, MRMDestReg, (ops (GR8|GR16|GR32) :$dst,
>> (i8mem|i16mem|i32mem):$src),
>>                "mov{b} {$src, $dst|$dst, $src}", []>,
isSameSize($dst,>> $src);
> 
> We already have something like this, but it's a little more general. 
The > X86 backend hasn't been converted to use it. 
Any plans to do it, to make things simpler?
> This is the 'multiclass' 
> facility in tblgen:
> http://llvm.org/docs/TableGenFundamentals.html#multiclass
This is very interesting.
 > Basically this lets you use one definition to implement multiple
different > instructions.  For example, most instructions in the sparc target
come in > "reg,reg" and "reg,imm" forms.  As such, it defines:
> 
> multiclass F3_12<string OpcStr, bits<6> Op3Val, SDNode OpNode>
{
>    def rr  : F3_1<2, Op3Val,
>                   (ops IntRegs:$dst, IntRegs:$b, IntRegs:$c),
>                   !strconcat(OpcStr, " $b, $c, $dst"),
>                   [(set IntRegs:$dst, (OpNode IntRegs:$b,
IntRegs:$c))]>;>    def ri  : F3_2<2, Op3Val,
>                   (ops IntRegs:$dst, IntRegs:$b, i32imm:$c),
>                   !strconcat(OpcStr, " $b, $c, $dst"),
>                   [(set IntRegs:$dst, (OpNode IntRegs:$b,
simm13:$c))]>;> }
> 
> which allows it to use instructions like:
> 
> defm AND    : F3_12<"and"  , 0b000001, and>;
> defm OR     : F3_12<"or"   , 0b000010, or>;
> defm XOR    : F3_12<"xor"  , 0b000011, xor>;
> defm SLL    : F3_12<"sll"  , 0b100101, shl>;
> defm SRL    : F3_12<"srl"  , 0b100110, srl>;
> defm SRA    : F3_12<"sra"  , 0b100111, sra>;
> defm ADD    : F3_12<"add"  , 0b000000, add>;
> defm ADDCC  : F3_12<"addcc", 0b010000, addc>;
> defm ADDX   : F3_12<"addx" , 0b001000, adde>;
> defm SUB    : F3_12<"sub"  , 0b000100, sub>;
> defm SUBX   : F3_12<"subx" , 0b001100, sube>;
> defm SUBCC  : F3_12<"subcc", 0b010100, SPcmpicc>;
> ...
> 
> Each of these 'defm's expand into two instructions.
> 
>> The semantic of such a description would mean that $dst should be
one>> of GR8, GR16, GR32 and $dst is one of i8mem, i16mem, i32mem with the
>> additional constraint that the sizes of both operands are the same
>> (this is checked by isSameSize predicate).
> 
> Sure.  It would be straight-forward to define a multiclass for your
arch, > in which each use of defm makes three instructions: one for eachwidth.

Beautiful! This saved my day! Multiclass is an extremely useful
construct. Thanks a lot for explanations. Multiclasses seem to provide
almost everything (if not all) of the features that I was asking for.
I'll try to use it for writing a short and concise definition of
instructions for my target. 
 >> 2. Another related question is about instruciton costs. In
BURG-based>> selectors there is usually a possibility to describe costs for the
>> instructions so that a least-cost cover can be found during the
>> instruction selection process. I have not seen any such cost
>> descriptions in TableGen files. Does it mean that it is not
supported?> 
> LLVM currently uses assumes that all instructions have cost 1, unless
told > otherwise ('Pat' patterns which produce multiple instructions have
cost > equal to the sum of their result).  Given this, it attempts to match
the > largest patterns possible, which reduces cost.
> 
> The 'largest' metric is somewhat intricate, but you can directly
affect it > in your .td files with the 'AddedComplexity'.  If you increase it,
it
will > make an instruction more likely to be matched.  However, you really 
> shouldn't need to do this, except in extreme cases.  Are you seeing
cases > where the selector is missing an optimal match, or are you just used
to > other tools which aren't as smart at inferring cost as tblgen?
I guess, I'm just used to other more BURG-like tools. And yes, they are
not too intelligent compared to tblgen;) 

So far, I haven't seen a real case where an optimal match is missed by
tblgen. But I have not completely converted the old BURG-spec into
tblgen yet. When I'm ready, I'll try to compare the results and check
some corner cases (for example, I was using tree grammar to
automatically select a best addressing mode, which tblgen doesn't
support yet as far as I understand from the documentation; there were
also interesting tree patterns for C operations of the form X OP= Y and
increment operators) - let's see how tblgen will perform.
 >> Why? As far as I understand, LLVM uses BURG/IBURG for instruction
>> selection and both of these tools support costs, AFAIK.
> 
> Nope, it uses neither.  It uses custom code built into tblgen.
OK. Now I understand. Probably I thought that tblgen is BURG-based
because there is a Burg subdirectory in the llvm/utils. And in older
versions of llvm it was even non-empty, giving the impression that it
is used.

BTW, does it use the same theory as BURG/IBURG when it generates the
selector? I.e. dynamic programming + precomputed costs,etc?  Does it 
implicitly build a tree grammar or something like that? Does it try to
merge/share patterns as much as possible to speedup the recognition
phase as it is done by BURMs? I have the impression (by looking at the
produced C++ selector code) that tblgen-based selectors are not as
"precomputed" and optimal as BURG-based ones. Do you have an idea or
figures about how it compares to BURG/IBURG selectors?
 
Thanks again,
  Roman


__________________________________________________
Do You Yahoo!?
Tired of spam?  Yahoo! Mail has the best spam protection around 
http://mail.yahoo.com

Chris Lattner

2006-Oct-02 20:00 UTC

head link

[LLVMdev] Instruction descriptions question

On Sun, 1 Oct 2006, Roman Levenstein wrote:> I'm trying to implement a new backend for an embedded CISC processor.
> Therefore I thought that it makes sense to take X86 target as a basis,
> to save some time.
Ok.  Note that the X86 backend is one of the most complex though, because 
it supports several subtargets and ABIs, which makes it more complex than 
some other targets.
> But when I look into the X86InstrInfo.td, I have a very strong feeling
> that it is one of the most complex instruction set descriptions
> compared to other targets.
Yep.
> I can imagine that this is due to the
> complexity of X86's instruction set. But still I have a feeling that it
> is overcomplicated, but may be it is just an initial impression.
Ok.
> Since I just started, here are my first several questions and
> proposals. I hope that some of them make sense for you.
Ok.
> 1. Why does X86 instruction set description provide different
> descriptions for the same instructions, which differ only in the size
> of operands?
> E.g.
>
> def MOV8rm  : I<0x8A, MRMSrcMem, (ops GR8 :$dst, i8mem :$src),
>                "mov{b} {$src, $dst|$dst, $src}",
>                [(set GR8:$dst, (load addr:$src))]>;
> def MOV16rm : I<0x8B, MRMSrcMem, (ops GR16:$dst, i16mem:$src),
>                "mov{w} {$src, $dst|$dst, $src}",
>                [(set GR16:$dst, (load addr:$src))]>, OpSize;
> def MOV32rm : I<0x8B, MRMSrcMem, (ops GR32:$dst, i32mem:$src),
>                "mov{l} {$src, $dst|$dst, $src}",
>                [(set GR32:$dst, (load addr:$src))]>;
We must do this, because they perform different operations.  Specifically, 
the loads all read a different number of bytes (1/2/4).
> For my target processor, only the types of operands (e.g. rm) are
> really important. The size is encoded into the opcode, but otherwise it
> does not affect anything except for constraints on the sizes of
> operands. And constraint is basically that the size of both operands
> should be the same.
Ok.
> Wouldn't it be possible and even more clean to have just one
> description like (I use a pseudo-description here):
>
> def MOVrr  : I<0x88, MRMDestReg, (ops (GR8|GR16|GR32) :$dst,
> (i8mem|i16mem|i32mem):$src),
>                "mov{b} {$src, $dst|$dst, $src}", []>,
isSameSize($dst,
> $src);
We already have something like this, but it's a little more general.  The 
X86 backend hasn't been converted to use it.  This is the
'multiclass'
facility in tblgen:
http://llvm.org/docs/TableGenFundamentals.html#multiclass

Basically this lets you use one definition to implement multiple different 
instructions.  For example, most instructions in the sparc target come in 
"reg,reg" and "reg,imm" forms.  As such, it defines:

multiclass F3_12<string OpcStr, bits<6> Op3Val, SDNode OpNode> {
   def rr  : F3_1<2, Op3Val,
                  (ops IntRegs:$dst, IntRegs:$b, IntRegs:$c),
                  !strconcat(OpcStr, " $b, $c, $dst"),
                  [(set IntRegs:$dst, (OpNode IntRegs:$b, IntRegs:$c))]>;
   def ri  : F3_2<2, Op3Val,
                  (ops IntRegs:$dst, IntRegs:$b, i32imm:$c),
                  !strconcat(OpcStr, " $b, $c, $dst"),
                  [(set IntRegs:$dst, (OpNode IntRegs:$b, simm13:$c))]>;
}

which allows it to use instructions like:

defm AND    : F3_12<"and"  , 0b000001, and>;
defm OR     : F3_12<"or"   , 0b000010, or>;
defm XOR    : F3_12<"xor"  , 0b000011, xor>;
defm SLL    : F3_12<"sll"  , 0b100101, shl>;
defm SRL    : F3_12<"srl"  , 0b100110, srl>;
defm SRA    : F3_12<"sra"  , 0b100111, sra>;
defm ADD    : F3_12<"add"  , 0b000000, add>;
defm ADDCC  : F3_12<"addcc", 0b010000, addc>;
defm ADDX   : F3_12<"addx" , 0b001000, adde>;
defm SUB    : F3_12<"sub"  , 0b000100, sub>;
defm SUBX   : F3_12<"subx" , 0b001100, sube>;
defm SUBCC  : F3_12<"subcc", 0b010100, SPcmpicc>;
...

Each of these 'defm's expand into two instructions.
> The semantic of such a description would mean that $dst should be one
> of GR8, GR16, GR32 and $dst is one of i8mem, i16mem, i32mem with the
> additional constraint that the sizes of both operands are the same
> (this is checked by isSameSize predicate).
Sure.  It would be straight-forward to define a multiclass for your arch, 
in which each use of defm makes three instructions: one for each width.
> More over, in very many cases, the operation (e.g. OR) itself affects
> only the encoding of the operation inside the opcode and of course the
> description of the effect and assembler syntax e.g. [(store (or (load
> addr:$dst), GR8:$src), addr:$dst)]. For OR insn the opreation should be
> or, and for AND insn - obviously and. But operands and operand
> combinations have the same limitations and descriptions also for many
> other operations, e.g. XOR, AND, etc.
Yep.
> Wouldn't it make sense to provide an easier way to describe it?
> For example, one would describe operands combinations (e.g. RR, RM)
> separately and then just say which instructions they are applicable
> for? May be using something like:
>
> def rm : (GR32 :$dst, i32mem:$src)
> {
>  "$assm_op{l} {$src, $dst|$dst, $src}",
>  [(set GR32:$dst, ($insn_effect_op GR16:$dst, (load addr:$src)))]
>
> };
Yep.  tblgen gives you lots of tools to factor your instructions.
> 2. Another related question is about instruciton costs. In BURG-based
> selectors there is usually a possibility to describe costs for the
> instructions so that a least-cost cover can be found during the
> instruction selection process. I have not seen any such cost
> descriptions in TableGen files. Does it mean that it is not supported?
LLVM currently uses assumes that all instructions have cost 1, unless told 
otherwise ('Pat' patterns which produce multiple instructions have cost 
equal to the sum of their result).  Given this, it attempts to match the 
largest patterns possible, which reduces cost.

The 'largest' metric is somewhat intricate, but you can directly affect
it
in your .td files with the 'AddedComplexity'.  If you increase it, it
will
make an instruction more likely to be matched.  However, you really 
shouldn't need to do this, except in extreme cases.  Are you seeing cases 
where the selector is missing an optimal match, or are you just used to 
other tools which aren't as smart at inferring cost as tblgen?
> Why? As far as I understand, LLVM uses BURG/IBURG for instruction
> selection and both of these tools support costs, AFAIK.
Nope, it uses neither.  It uses custom code built into tblgen.

-Chris

-- 
http://nondot.org/sabre/
http://llvm.org/

Seemingly Similar Threads

Search for more reasonably related threads

llvm dev - Oct 2006 - [LLVMdev] Instruction descriptions question

[LLVMdev] Instruction descriptions question

[LLVMdev] Instruction descriptions question

[LLVMdev] Instruction descriptions question

Seemingly Similar Threads