Tzu-Chien Chiu
2005-Jul-25 09:38 UTC
[LLVMdev] How to partition registers into different RegisterClass?
Thanks, I think it can solve my problem.

But please allow me to explain the hardware in detail. I hope there is a more elegant way to solve it.

The hardware is a "stream processor". That is, it processes samples one by one. Each sample is associated with several 128-bit four-element vector registers, namely:

* input registers - the attributes of the sample. The values of these registers are different for each sample and are initialized before execution. READ-ONLY (can only be declared once, by the 'dcl' instruction).

* constant registers - sample-invariant. READ-ONLY (can only be defined once, by the 'def' instruction). All samples share the same set of constant register values.

* general purpose registers - values are not initialized before execution and are destroyed after execution. They can be read and written.

* output registers - WRITE-ONLY.

Sample program converted to pseudo-LLVM assembly (SSA):

    %Vec4 = type <4 x float>

    // declare input registers and
    // define constant register values
    %v1 = dcl %Vec4 xyz
    %v2 = dcl %Vec4 color
    %c1 = def %Vec4 <1,2,3,4>

    // v1, v2, c1 are not allowed to be the destination register
    // of any instruction hereafter.

    %r1 = add %Vec4 v1, c1
    %r2 = mul %Vec4 v1, c2
    %o1 = mul %Vec4 r2, v2   // write the output register 'o1'

I planned to partition the registers into different RegisterClasses: input, output, general purpose, constant, etc.

    def GeneralPurposeRC : RegisterClass<packed, 128, [R0, R1]>;
    def InputRC : RegisterClass<packed, 128, [V0, V1]>;
    def ConstantRC : RegisterClass<packed, 128, [C0, C1]>;

    def ADDgg : BinaryInst<0x51,
      (ops GeneralPurposeRC:$dest, GeneralPurposeRC:$src),
      "add $dest, $src">;

    def ADDgi : BinaryInst<0x52,
      (ops GeneralPurposeRC:$dest, InputRC:$src),
      "add $dest, $src">;

    def ADDgc : BinaryInst<0x52,
      (ops GeneralPurposeRC:$dest, ConstantRC:$src),
      "add $dest, $src">;

The problem is: SDOperand always returns the 'type' of the value (in this case 'packed', the first argument of RegisterClass<>), but not the 'RegisterClass'. With two 'packed' operands, the instruction selector doesn't know whether an ADDgg, an ADDgi, or an ADDgc should be generated (by the BuildMI() function).

The same problem exists when there are two types of constant registers, floating point and integer, and each is declared 'packed' ([4xfloat] and [4xint]). The instruction selector doesn't know which instruction it should produce because the newly defined MVT type 'packed' is always used for all operands (registers), even if the operand is actually a [4xfloat] or a [4xint].

2005/7/24, Chris Lattner <sabre at nondot.org>:
> On Sat, 23 Jul 2005, Tzu-Chien Chiu wrote:
> > 2005/7/23, Chris Lattner <sabre at nondot.org>:
> >> What does a 'read only' register mean? Is it a constant (e.g. returns
> >> 1.0)? Otherwise, how can it be a useful value?
> >
> > Yes, it's a constant register.
> >
> > Because the instruction cannot contain an immediate value, a constant
> > value may be stored in a constant register, and it's defined _before_
> > the program starts by API. For example:
> >
> > SetConstantValue( 5, Vector4( 1, 2, 3, 4 ) ); // C5 = <1,2,3,4>
> > HANDLE handle = LoadCodeFromFile( filename );
> > SetCode( handle ); // C5 is referenced here
> > Execute();
>
> Ah, ok. In that case, you want to put all of the registers in one register
> file, and not make the constant register allocatable (e.g. see
> X86RegisterInfo.td, and note how the register classes include EBP and ESP,
> but do not register allocate them (through the definition of
> allocation_order_end())).
>
> -Chris
>
> --
> http://nondot.org/sabre/
> http://llvm.org/

--
Tzu-Chien Chiu,
3D Graphics Hardware Engineer,
<URL:http://www.csie.nctu.edu.tw/~jwchiu>
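The advice quoted above (one register class holding every register, with the read-only ones kept out of the allocation order) can be sketched with a small standalone model. This is not LLVM code: `RegClassModel`, `allocate`, and `makeVec4Class` are hypothetical names invented for illustration, and the "allocatable prefix" is only an assumption about how an allocation order that stops early could be represented.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <string>
#include <vector>

// Hypothetical model of a register class whose allocation order is a
// prefix of its member list. This is NOT the LLVM RegisterClass API.
struct RegClassModel {
  std::vector<std::string> Members;  // allocatable registers listed first
  std::size_t NumAllocatable;        // length of the allocatable prefix

  // True if Reg may appear as an operand of instructions using this class.
  bool contains(const std::string &Reg) const {
    for (const std::string &M : Members)
      if (M == Reg)
        return true;
    return false;
  }
};

// Hand out the first free register from the allocatable prefix only.
// Returns an empty string when every allocatable register is in use.
std::string allocate(const RegClassModel &RC, std::set<std::string> &InUse) {
  assert(RC.NumAllocatable <= RC.Members.size());
  for (std::size_t I = 0; I < RC.NumAllocatable; ++I) {
    const std::string &Reg = RC.Members[I];
    if (InUse.insert(Reg).second)  // insertion succeeds only if Reg was free
      return Reg;
  }
  return std::string();
}

// Register names from the thread: R* general purpose, V* input, C* constant.
// Only the two general purpose registers are allocatable in this sketch.
RegClassModel makeVec4Class() {
  return RegClassModel{{"R0", "R1", "V0", "V1", "C0", "C1"}, 2};
}
```

In this model an instruction operand may name any member of the class, including `V0` or `C1`, but `allocate` only ever returns `R0` or `R1`, which mirrors the EBP/ESP arrangement mentioned in the quoted reply.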
Chris Lattner
2005-Jul-26 05:27 UTC
[LLVMdev] How to partition registers into different RegisterClass?
On Mon, 25 Jul 2005, Tzu-Chien Chiu wrote:
> But please allow me to explain the hardware in detail. I hope there is
> a more elegant way to solve it.

Sounds good!

> The hardware is a "stream processor". That is, it processes samples
> one by one. Each sample is associated with several 128-bit
> four-element vector registers, namely:
>
> * input registers - the attributes of the sample, the values of the
> registers are different and initialized for each sample before
> execution. READ-ONLY (can only be declared once by 'dcl' instruction).

Ok.

> * constant registers - sample-invariant. READ-ONLY (can only be
> defined once by 'def' instruction). All samples share the same set of
> constant register values.

Ok. I don't think the definition of these should be represented in your code. The code should just read them when needed.

> * general purpose registers - values are not initialized before the
> execution and destroyed after execution. They can be read and written.

Yup, these should be register allocated.

> * output registers - WRITE-ONLY.

And these should be explicitly defined once, also not register allocated.

> Sample program converted to pseudo-LLVM assembly (SSA):
>
> %Vec4 = type <4 x float>
>
> // declare input registers and
> // define constant register values
> %v1 = dcl %Vec4 xyz
> %v2 = dcl %Vec4 color
> %c1 = def %Vec4 <1,2,3,4>
>
> // v1, v2, c1 are not allowed to be destination register
> // of any instruction hereafter.
>
> %r1 = add %Vec4 v1, c1
> %r2 = mul %Vec4 v1, c2
> %o1 = mul %Vec4 r2, v2 // write the output register 'o1'

Here, the v1/v2/c1/o1 registers should be represented as explicit registers, and the GPRs should be virtual registers. This would give you code that looks something like this:

    %reg1024 = add v1, c1
    %reg1025 = mul v1, c2
    %reg1026 = mul %reg1024, %v2
    %o1 = mov %reg1026

The 'mov' register-to-register copy instruction will be coalesced and eliminated by the register allocator. The regalloc will eliminate the virtual registers, assigning physical GPRs. This is what the 'allocation order' is meant to cover.

> I planned to partition the registers into different RegisterClasses:
> input, output, general purpose, constant, etc.
>
> def GeneralPurposeRC : RegisterClass<packed, 128, [R0, R1]>;
> def InputRC : RegisterClass<packed, 128, [V0, V1]>;
> def ConstantRC : RegisterClass<packed, 128, [C0, C1]>;

The way you want to partition these is based on how the instruction set works. If there is a single 'add' instruction that can operate on any of these registers, there should be a single register class. If there are two adds (as it looks like you have below, judging by the opcode) with different register constraints, then you should partition the registers so that the register classes line up with the instruction operand requirements.

> def ADDgg : BinaryInst<0x51,
>   (ops GeneralPurposeRC:$dest, GeneralPurposeRC:$src),
>   "add $dest, $src">;
>
> def ADDgi : BinaryInst<0x52,
>   (ops GeneralPurposeRC:$dest, InputRC:$src),
>   "add $dest, $src">;
>
> def ADDgc : BinaryInst<0x52,
>   (ops GeneralPurposeRC:$dest, ConstantRC:$src),
>   "add $dest, $src">;
>
> The problem is: SDOperand always returns the 'type' of the value (in
> this case, 'packed', the first argument of RegisterClass<>), but not
> the 'RegisterClass'. With two 'packed' operands, the instruction
> selector doesn't know whether an ADDgg, ADDgi, or an ADDgc should be
> generated (BuildMI() function).

Right. You don't want to do this sort of partitioning. All of the 'computed' values should be virtual registers which will end up being assigned to GPRs. The register allocator will attempt to coalesce the GPR into an output or input register if possible. To allow this coalescing to happen, implement the TargetInstrInfo::isMoveInstr virtual method for your target.

> The same problem exists when there are two types of constant registers,
> floating point and integer, and each is declared 'packed' ([4xfloat]
> and [4xint]). The instruction selector doesn't know which instruction
> it should produce because the newly defined MVT type 'packed' is
> always used for all operands (registers), even if it's actually a
> [4xfloat] or [4xint].

It might make sense to add two MVT enums: one for packed integers, and one for packed floats?

-Chris

--
http://nondot.org/sabre/
http://llvm.org/
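The copy coalescing described in this reply can be illustrated with a toy model. It is only a sketch and not how LLVM's register allocator is implemented: `Inst`, `isMove`, `isVirtual`, `coalesceCopies`, and `exampleProgram` are hypothetical names, every virtual register is assumed to have a single definition, and `isMove` merely plays the role that an isMoveInstr-style hook would play for a real target.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Toy machine instruction: dest = op srcs... (illustrative only, not LLVM).
struct Inst {
  std::string op;
  std::string dest;
  std::vector<std::string> srcs;
};

// Stand-in for an isMoveInstr-style hook: true for a plain register copy.
bool isMove(const Inst &I) { return I.op == "mov" && I.srcs.size() == 1; }

// Virtual registers are spelled "%regN" in this sketch.
bool isVirtual(const std::string &R) { return R.rfind("%reg", 0) == 0; }

// Fold `phys = mov %regN` into the instruction defining %regN when the mov
// is the only use of %regN and phys is untouched between the def and the mov.
std::vector<Inst> coalesceCopies(std::vector<Inst> Prog) {
  for (std::size_t M = 0; M < Prog.size(); ++M) {
    if (!isMove(Prog[M]) || !isVirtual(Prog[M].srcs[0]) ||
        isVirtual(Prog[M].dest))
      continue;
    const std::string V = Prog[M].srcs[0];  // virtual source of the copy
    const std::string P = Prog[M].dest;     // physical destination
    std::size_t Def = Prog.size();          // Prog.size() means "not found"
    int Uses = 0;
    for (std::size_t I = 0; I < Prog.size(); ++I) {
      if (I < M && Prog[I].dest == V)
        Def = I;
      for (const std::string &S : Prog[I].srcs)
        if (S == V)
          ++Uses;
    }
    if (Def == Prog.size() || Uses != 1)
      continue;
    bool Clobbered = false;
    for (std::size_t I = Def + 1; I < M; ++I) {
      if (Prog[I].dest == P)
        Clobbered = true;
      for (const std::string &S : Prog[I].srcs)
        if (S == P)
          Clobbered = true;
    }
    if (Clobbered)
      continue;
    Prog[Def].dest = P;            // define the physical register directly
    Prog.erase(Prog.begin() + M);  // the copy is now redundant
    --M;                           // safe: Def < M, so M >= 1 here
  }
  return Prog;
}

// The four-instruction example from the message above.
std::vector<Inst> exampleProgram() {
  return {Inst{"add", "%reg1024", {"v1", "c1"}},
          Inst{"mul", "%reg1025", {"v1", "c2"}},
          Inst{"mul", "%reg1026", {"%reg1024", "v2"}},
          Inst{"mov", "o1", {"%reg1026"}}};
}
```

Running `coalesceCopies(exampleProgram())` leaves three instructions, with the final `mul` writing `o1` directly. The remaining virtual registers would still need to be assigned to physical GPRs, which this sketch does not attempt.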
Tzu-Chien Chiu
2005-Jul-26 06:12 UTC
[LLVMdev] How to partition registers into different RegisterClass?
2005/7/26, Chris Lattner <sabre at nondot.org>:
> Tzu-Chien Chiu wrote:
> > The same problem exists when there are two types of constant registers,
> > floating point and integer, and each is declared 'packed' ([4xfloat]
> > and [4xint]). The instruction selector doesn't know which instruction
> > it should produce because the newly defined MVT type 'packed' is
> > always used for all operands (registers), even if it's actually a
> > [4xfloat] or [4xint].
>
> It might make sense to add two MVT enums: one for packed integers, and one
> for packed floats?

I thought about that too, but what if:

* there are many packed types: with 16- and 32-bit floating point and 16- and 32-bit integers, a lot of enums will be needed.

* the number of elements in a packed type can vary.

There could be a more general way to support packed types. The member function SequentialType::getElementType() returns the type of the packed elements:

File: include/llvm/Type.h

<code>
class SequentialType : public CompositeType {
public:
  inline const Type *getElementType() const { return ContainedTys[0]; }
};

class PackedType : public SequentialType {
public:
  inline unsigned getNumElements() const { return NumElements; }
};
</code>

If SDOperand could return a "const Type *", the element type of the packed type could be obtained, and only one enum value, 'packed', would need to be added to MVT::Type. Not only the element type would be available, but also the number of elements.

--
Tzu-Chien Chiu,
3D Graphics Hardware Architect
<URL:http://www.csie.nctu.edu.tw/~jwchiu>
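The proposal above, a single 'packed' value type whose element type and element count can be queried, could look roughly like the following standalone sketch. None of these names exist in LLVM: `ElemKind`, `PackedInfo`, `AddOpcode`, and `selectAdd` are hypothetical and only show how a selector might choose an instruction from a descriptor rather than from one enum value per packed type.

```cpp
#include <cassert>

// Hypothetical element kinds; the message mentions 16- and 32-bit
// integers and floating point values.
enum class ElemKind { Int16, Int32, Float16, Float32 };

// One descriptor covers every packed type: element kind plus element count.
struct PackedInfo {
  ElemKind Elem;
  unsigned NumElements;
};

// Width in bits of a single element.
unsigned elemBits(ElemKind K) {
  return (K == ElemKind::Int16 || K == ElemKind::Float16) ? 16u : 32u;
}

bool isFloat(ElemKind K) {
  return K == ElemKind::Float16 || K == ElemKind::Float32;
}

// Total width of the packed value, e.g. 4 x 32-bit float is 128 bits.
unsigned totalBits(const PackedInfo &T) {
  return elemBits(T.Elem) * T.NumElements;
}

// Hypothetical opcodes for a floating point add and an integer add.
enum class AddOpcode { ADDfp, ADDint };

// Choose the add variant from the element kind instead of from a
// dedicated enum value per packed type.
AddOpcode selectAdd(const PackedInfo &T) {
  assert(T.NumElements > 0);
  return isFloat(T.Elem) ? AddOpcode::ADDfp : AddOpcode::ADDint;
}
```

With this shape, `selectAdd(PackedInfo{ElemKind::Float32, 4})` picks the floating point add and `selectAdd(PackedInfo{ElemKind::Int16, 8})` picks the integer add, while both describe 128-bit values. Adding a new element width extends `ElemKind` but does not multiply the number of value-type enums by the number of possible element counts.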