thr3ads.net - llvm dev - [LLVMdev] How to partition registers into different RegisterClass? [Jul 2005]

Tzu-Chien Chiu

2005-Jul-25 09:38 UTC

[LLVMdev] How to partition registers into different RegisterClass?

Thanks, I think that can solve my problem.

But please allow me to explain the hardware in detail. I hope there is
a more elegant way to solve it.

The hardware is a "stream processor". That is, it processes samples
one by one. Each sample is associated with several 128-bit
four-element vector registers, namely:

* input registers - the attributes of the sample, the values of the
registers are different and initialized for each sample before
execution. READ-ONLY (can only be declared once by 'dcl' instruction).

* constant registers - sample-invariant. READ-ONLY (can only be
defined once by 'def' instruction). All samples share the same set of
constant register values.

* general purpose registers - values are not initialized before the
execution and destroyed after execution. They can be read and written.

* output registers - WRITE-ONLY.

Sample program converted to pseudo-LLVM assembly (SSA):

  %Vec4 = type < 4 x float>

  // declare input registers and 
  // define constant register values
  %v1 = dcl %Vec4 xyz      
  %v2 = dcl %Vec4 color
  %c1 = def %Vec4 <1,2,3,4>  

  // v1, v2, c1 are not allowed to be destination register 
  // of any instruction hereafter.

  %r1 = add %Vec4 v1, c1
  %r2 = mul %Vec4 v1, c2
  %o1 = mul %Vec4 r2, v2     // write the output register 'o1'

I planned to partition the registers into different RegisterClasses:
input, output, general purpose, constant, etc.

  def GeneralPurposeRC : RegisterClass<packed, 128, [R0, R1]>;
  def InputRC : RegisterClass<packed, 128, [V0, V1]>;
  def ConstantRC : RegisterClass<packed, 128, [C0, C1]>;

  def ADDgg : BinaryInst<0x51, (
    ops GeneralPurposeRC:$dest,
        GeneralPurposeRC:$src), "add $dest, $src">;

  def ADDgi : BinaryInst<0x52, (
    ops GeneralPurposeRC:$dest,
        InputRC:$src), "add $dest, $src">;

  def ADDgc : BinaryInst<0x52, (
    ops GeneralPurposeRC:$dest,
        ConstantRC:$src), "add $dest, $src">;

The problem is: SDOperand always returns the 'type' of the value (in
this case 'packed', the first argument of RegisterClass<>), but not
the 'RegisterClass'. With two 'packed' operands, the instruction
selector doesn't know whether an ADDgg, ADDgi, or ADDgc should be
generated (BuildMI() function).

The same problem exists when there are two types of constant registers,
floating point and integer, and each is declared 'packed' ([4xfloat]
and [4xint]). The instruction selector doesn't know which instruction
it should produce because the newly defined MVT type 'packed' is
always used for all operands (registers), even if it's actually a
[4xfloat] or [4xint].

2005/7/24, Chris Lattner <sabre at nondot.org>:
> On Sat, 23 Jul 2005, Tzu-Chien Chiu wrote:
> > 2005/7/23, Chris Lattner <sabre at nondot.org>:
> >> What does a 'read only' register mean?  Is it a constant (e.g. returns
> >> 1.0)?  Otherwise, how can it be a useful value?
> >
> > Yes, it's a constant register.
> >
> > Because the instruction cannot contain an immediate value, a constant
> > value may be stored in a constant register, and it's defined _before_
> > the program starts by API. For example:
> >
> >  SetConstantValue( 5, Vector4( 1, 2, 3, 4 ) ); // C5 = <1,2,3,4>
> >  HANDLE handle = LoadCodeFromFile( filename );
> >  SetCode( handle );  // C5 is referenced here
> >  Execute();
>
> Ah, ok. In that case, you want to put all of the registers in one register
> file, and not make the constant register allocatable (e.g. see
> X86RegisterInfo.td, and note how the register classes include EBP and ESP,
> but do not register allocate them, through the definition of
> allocation_order_end()).
> 
> -Chris
> 
> --
> http://nondot.org/sabre/
> http://llvm.org/
> 

-- 
Tzu-Chien Chiu,
3D Graphics Hardware Engineer,
<URL:http://www.csie.nctu.edu.tw/~jwchiu>

Chris Lattner

2005-Jul-26 05:27 UTC

[LLVMdev] How to partition registers into different RegisterClass?

On Mon, 25 Jul 2005, Tzu-Chien Chiu wrote:
> But please allow me to explain the hardware in detail. I hope there is
> a more elegant way to solve it.

Sounds good!

> The hardware is a "stream processor". That is, it processes samples
> one by one. Each sample is associated with several 128-bit
> four-element vector registers, namely:
>
> * input registers - the attributes of the sample, the values of the
> registers are different and initialized for each sample before
> execution. READ-ONLY (can only be declared once by 'dcl' instruction).

Ok.

> * constant registers - sample-invariant. READ-ONLY (can only be
> defined once by 'def' instruction). All samples share the same set of
> constant register values.

Ok.  I don't think the definition of these should be represented in your
code.  The code should just read them when needed.

> * general purpose registers - values are not initialized before the
> execution and destroyed after execution. They can be read and written.

Yup, these should be register allocated.

> * output registers - WRITE-ONLY.

And these should be explicitly defined once, also not register allocated.

> Sample program converted to pseudo-LLVM assembly (SSA):
>
>  %Vec4 = type < 4 x float>
>
>  // declare input registers and
>  // define constant register values
>  %v1 = dcl %Vec4 xyz
>  %v2 = dcl %Vec4 color
>  %c1 = def %Vec4 <1,2,3,4>
>
>  // v1, v2, c1 are not allowed to be destination register
>  // of any instruction hereafter.
>
>  %r1 = add %Vec4 v1, c1
>  %r2 = mul %Vec4 v1, c2
>  %o1 = mul %Vec4 r2, v2     // write the output register 'o1'

Here, the v1/v2/c1/o1 registers should be represented as explicit
registers, and the GPRs should be virtual registers.  This would give you
code that looks something like this:

%reg1024 = add v1, c1
%reg1025 = mul v1, c2
%reg1026 = mul %reg1025, %v2
%o1 = mov %reg1026

The 'mov' register-to-register copy instruction will be coalesced and
eliminated by the register allocator.  The regalloc will eliminate the
virtual registers, assigning physical GPRs.  This is what the 'allocation
order' is meant to cover.
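
A minimal sketch of the allocation-order idea, written as standalone C++ rather than against LLVM's actual TableGen or C++ interfaces: every register sits in one register file, but only the general purpose registers are offered to the allocator. The names `RegKind`, `PhysReg`, `allRegisters`, `allocationOrder`, and the output register `O0` are hypothetical and exist only for this illustration.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical register kinds for the stream processor described above.
enum class RegKind { Input, Constant, General, Output };

struct PhysReg {
  std::string Name;
  RegKind Kind;
};

// One register file holding every register of the (assumed) target.
inline std::vector<PhysReg> allRegisters() {
  return {{"V0", RegKind::Input},    {"V1", RegKind::Input},
          {"C0", RegKind::Constant}, {"C1", RegKind::Constant},
          {"R0", RegKind::General},  {"R1", RegKind::General},
          {"O0", RegKind::Output}};
}

// Returns the allocation order: the subset of the register file that the
// register allocator may assign to virtual registers. Everything else can
// only be named explicitly by the code generator.
inline std::vector<std::string>
allocationOrder(const std::vector<PhysReg> &Regs) {
  assert(!Regs.empty() && "register file must not be empty");
  std::vector<std::string> Order;
  for (const PhysReg &R : Regs)
    if (R.Kind == RegKind::General)
      Order.push_back(R.Name);
  return Order;
}
```

With the register set above, `allocationOrder(allRegisters())` contains only `R0` and `R1`; the input, constant, and output registers remain part of the register file but are never handed out by the allocator.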

> I planned to partition the registers into different RegisterClasses:
> input, output, general purpose, constant, etc.
>
>  def GeneralPurposeRC : RegisterClass<packed, 128, [R0, R1]>;
>  def InputRC : RegisterClass<packed, 128, [V0, V1]>;
>  def ConstantRC : RegisterClass<packed, 128, [C0, C1]>;

The way you want to partition these is based on how the instruction set
works.  If there is a single 'add' instruction that can operate on any of
these registers, there should be a single register class.  If there are
two adds (as it looks like you have below, judging by the opcode) with
different register constraints, then you should partition the registers so
that the register classes line up with the instruction operand
requirements.

> def ADDgg : BinaryInst<0x51, (
>   ops GeneralPurposeRC:$dest,
>       GeneralPurposeRC:$src), "add $dest, $src">;
>
> def ADDgi : BinaryInst<0x52, (
>   ops GeneralPurposeRC:$dest,
>       InputRC:$src), "add $dest, $src">;
>
> def ADDgc : BinaryInst<0x52, (
>   ops GeneralPurposeRC:$dest,
>       ConstantRC:$src), "add $dest, $src">;
>
> The problem is: SDOperand always returns the 'type' of the value (in
> this case 'packed', the first argument of RegisterClass<>), but not
> the 'RegisterClass'. With two 'packed' operands, the instruction
> selector doesn't know whether an ADDgg, ADDgi, or ADDgc should be
> generated (BuildMI() function).

Right.  You don't want to do this sort of partitioning.  All of the
'computed' values should be virtual registers which will end up being
assigned to GPRs.  The register allocator will attempt to coalesce the
GPR into an output or input register if possible.  To allow this
coalescing to happen, implement the TargetInstrInfo::isMoveInstr virtual
method for your target.
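
A toy illustration of why a move-recognition hook enables coalescing, written as standalone C++ and not as LLVM's implementation. `Instr`, `isVirtual`, and `coalesceCopies` are hypothetical names, and the `isMoveInstr` below is a simplified free function that only borrows the name of the hook mentioned above; its signature is an assumption. The sketch also assumes straight-line code in which the copy is the only use of the virtual register, so renaming is always safe.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Toy machine instruction: an opcode, one destination, and source operands.
struct Instr {
  std::string Opcode;
  std::string Dest;
  std::vector<std::string> Srcs;
};

// Simplified stand-in for a target's move hook: reports whether I is a plain
// register-to-register copy and, if so, returns its source and destination.
inline bool isMoveInstr(const Instr &I, std::string &Src, std::string &Dst) {
  if (I.Opcode != "mov" || I.Srcs.size() != 1)
    return false;
  Src = I.Srcs[0];
  Dst = I.Dest;
  return true;
}

// Virtual registers are spelled "%regN" in this sketch.
inline bool isVirtual(const std::string &Reg) {
  return Reg.rfind("%reg", 0) == 0;
}

// Folds each copy "fixed = mov %regN" into the instruction defining %regN.
inline std::vector<Instr> coalesceCopies(const std::vector<Instr> &Code) {
  // Pass 1: record every copy from a virtual register into a fixed register.
  std::map<std::string, std::string> Rename;
  for (const Instr &I : Code) {
    std::string Src, Dst;
    if (isMoveInstr(I, Src, Dst) && isVirtual(Src) && !isVirtual(Dst))
      Rename[Src] = Dst;
  }
  // Pass 2: drop the recorded copies and rename the remaining operands.
  std::vector<Instr> Out;
  for (Instr I : Code) {
    std::string Src, Dst;
    if (isMoveInstr(I, Src, Dst)) {
      auto It = Rename.find(Src);
      if (It != Rename.end() && It->second == Dst)
        continue; // the copy is redundant once its source is renamed
    }
    auto DIt = Rename.find(I.Dest);
    if (DIt != Rename.end())
      I.Dest = DIt->second;
    for (std::string &S : I.Srcs) {
      auto SIt = Rename.find(S);
      if (SIt != Rename.end())
        S = SIt->second;
    }
    Out.push_back(I);
  }
  return Out;
}
```

Fed a final `mul` into a virtual register followed by a `mov` of that register into `o1`, `coalesceCopies` returns the sequence without the `mov`, with the `mul` writing `o1` directly. A real allocator would also check liveness and register constraints before merging; none of that is modelled here.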

> The same problem exists when there are two types of constant registers,
> floating point and integer, and each is declared 'packed' ([4xfloat]
> and [4xint]). The instruction selector doesn't know which instruction
> it should produce because the newly defined MVT type 'packed' is
> always used for all operands (registers), even if it's actually a
> [4xfloat] or [4xint].

It might make sense to add two MVT enums: one for packed integers, and one
for packed floats?
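
A minimal sketch of the two-enum suggestion, in standalone C++ with made-up names: `ValueType`, `Opcode`, `ADDf`, `ADDi`, and `selectAdd` are hypothetical and do not correspond to LLVM's MVT definitions. It only shows that once the value type itself distinguishes float from integer vectors, the selector can choose an instruction without further information.

```cpp
#include <cassert>

// Hypothetical value types: one enumerator per packed flavor instead of a
// single 'packed' value.
enum class ValueType { PackedF32, PackedI32 };

// Hypothetical opcodes for the float and integer vector adds.
enum class Opcode { ADDf, ADDi };

// Picks the add instruction purely from the operand's value type.
inline Opcode selectAdd(ValueType VT) {
  switch (VT) {
  case ValueType::PackedF32:
    return Opcode::ADDf;
  case ValueType::PackedI32:
    return Opcode::ADDi;
  }
  assert(false && "unhandled value type");
  return Opcode::ADDf;
}
```

The cost of this design is one enumerator per combination of element type and element count that the target supports.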

-Chris

-- 
http://nondot.org/sabre/
http://llvm.org/

Tzu-Chien Chiu

2005-Jul-26 06:12 UTC

[LLVMdev] How to partition registers into different RegisterClass?

2005/7/26, Chris Lattner <sabre at nondot.org>:
> Tzu-Chien Chiu wrote:
> > The same problem exists when there are two types of constant registers,
> > floating point and integer, and each is declared 'packed' ([4xfloat]
> > and [4xint]). The instruction selector doesn't know which instruction
> > it should produce because the newly defined MVT type 'packed' is
> > always used for all operands (registers), even if it's actually a
> > [4xfloat] or [4xint].
>
> It might make sense to add two MVT enums: one for packed integers, and one
> for packed floats?

I thought about that too, but what if:
* there are many packed types (16- and 32-bit floating point, 16- and
  32-bit integers)? A lot of enums would be needed.
* the number of elements in a packed type can vary?

There could be a more general way to support packed types. The member
function SequentialType::getElementType() returns the type of the
packed elements:

File: include/llvm/Type.h
<code>
  class SequentialType : public CompositeType {
  public:
    inline const Type *getElementType() const { return ContainedTys[0]; }
  };

  class PackedType : public SequentialType {
  public:
    inline unsigned getNumElements() const { return NumElements; }
  };
</code>

If SDOperand could return a "const Type *", the element type of the
packed type could be obtained, and only one enum value, 'packed', would
need to be added to MVT::Type. Not only the element type would be
available, but also the number of elements.
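
A minimal sketch of this proposal, in standalone C++ rather than LLVM's real classes: a single 'packed' value carries its element type and element count, and the selector derives the instruction from those two fields. `ElemKind`, `PackedInfo`, `elemBits`, `isFloatElem`, `packedBits`, `addMnemonic`, and the mnemonic spelling are all hypothetical.

```cpp
#include <cassert>
#include <string>

// Hypothetical element kinds: 16- and 32-bit floats and integers.
enum class ElemKind { F16, F32, I16, I32 };

// Stand-in for what a "const Type *" would expose for a packed type:
// the element type and the number of elements.
struct PackedInfo {
  ElemKind Elem;
  unsigned NumElements;
};

inline unsigned elemBits(ElemKind K) {
  return (K == ElemKind::F16 || K == ElemKind::I16) ? 16 : 32;
}

inline bool isFloatElem(ElemKind K) {
  return K == ElemKind::F16 || K == ElemKind::F32;
}

// Total width of the packed value in bits.
inline unsigned packedBits(const PackedInfo &P) {
  return elemBits(P.Elem) * P.NumElements;
}

// Builds a made-up add mnemonic from the element type and element count,
// so no extra enumerator is needed per packed flavor.
inline std::string addMnemonic(const PackedInfo &P) {
  return std::string(isFloatElem(P.Elem) ? "fadd" : "iadd") +
         std::to_string(elemBits(P.Elem)) + "x" +
         std::to_string(P.NumElements);
}
```

Compared with adding one enumerator per packed flavor, this keeps the enum small and pushes the variation into data, at the price of the selector having to inspect a richer type description.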

-- 
Tzu-Chien Chiu,
3D Graphics Hardware Architect
<URL:http://www.csie.nctu.edu.tw/~jwchiu>
