thr3ads.net - llvm dev - [LLVMdev] Overlapping register classes [Mar 2009]

Jakob Stoklund Olesen

2009-Mar-16 18:31 UTC

[LLVMdev] Overlapping register classes

Dan Gohman <gohman at apple.com> writes:
> On Mar 15, 2009, at 2:02 PM, Jakob Stoklund Olesen wrote:
>> Am I misusing register classes, or is this simply functionality that
>> has not been written yet? The existing backends seem to have only one
>> register class per machine value type.
>
> The x86 backend has an example of a partial solution.  The GR32
> register class has a subset, GR32_, which is the registers in GR32
> that support 8-bit subregs.  Instructions that reference 8-bit subregs
> are emitted with a copy (MOV32to32_) to put the value in a virtual
> register of the needed class.  This copy may then be optimized away
> by subsequent passes.
I missed this before (thanks, Eli).  I tried adding the explicit move
patterns, and at least it compiles correctly now:

i1_ls:
	R0.H = HI(i1_l); R0.L = LO(i1_l);
	P0 = R0;
	R0.H = HI(i1_s); R0.L = LO(i1_s);
	R1 = B[P0] (Z);
	R2 = 1 (X);
	P0 = R0;
	R0 = R1 & R2;
	B[P0] = R0;
	RTS;

The moves (P0 = R0) did not get optimized away by the register
allocator.  RALinScan::attemptTrivialCoalescing almost succeeded; it got
as far as testing if the source register R0 is contained in the
destination regclass (P).  It isn't, so the move stayed in.

The problem is that the source register is allocated before coalescing
is attempted.  The destination regclass does not backpropagate and
so doesn't influence the allocation class.
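
As a rough sketch of that check, assume a toy model where a register class is just a set of register names. The register lists and the helper name `can_trivially_coalesce` are illustrative only; they are not LLVM API and not the backend's actual definitions.

```python
# Toy model: a register class is a set of physical register names.
# These lists are illustrative only, not copied from the Blackfin backend.
D = frozenset(f"R{i}" for i in range(8))  # data registers
P = frozenset(f"P{i}" for i in range(6))  # pointer registers


def can_trivially_coalesce(src_physreg, dst_regclass):
    """The copy 'dst = src' can only be dropped if dst may live in src's register."""
    return src_physreg in dst_regclass


# The allocator had already picked R0 for the source before the copy was examined.
print(can_trivially_coalesce("R0", P))  # R0 is not a P register, so the copy stays
print(can_trivially_coalesce("P0", P))  # had the source landed in P0, the copy could go
```

In this model the test can only succeed if the destination class is known at the time the source register is chosen, which is the missing backpropagation.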

PBQP doesn't even attempt to remove a move unless source and destination
regclasses are identical.
> Right now the x86 target code has to explicitly spell out where
> such copies are needed.  It isn't a lot of trouble because there are
> a small number of situations where copies are needed.  From your
> description, it sounds like this would be much more significant on
> blackfin.  Handling this automatically seems possible, though this
> is functionality that has not been written yet.
Yes, inserting explicit patterns everywhere would make a complete mess
of my InstrInfo.td.  All arithmetic requires D-regs, and all load/stores
require P-regs.

It would be fairly simple to insert move instructions in the selection
DAG after instruction selection is complete.  I could do this in my
InstructionSelect() as a first fix, but I think I would have to do
something more clever eventually.

I think a few tricks when creating vregs would go a long way:

1. If the def regclass is a subset of the operand regclass, there is no
   problem.  ScheduleDAGSDNodes::AddOperand should simply allow this
   case.

2. If there is a regclass contained in the def regclass and all the
   operand regclasses, change the vreg regclass to the intersection.
   This could be a bad idea if there are many uses with different
   regclasses.

3. If def and operand regclasses are disjoint, a move is necessary.  It
   should be possible to produce an abstract vreg-vreg copy instruction
   that changes the regclass.  The copy instruction would eventually
   become a copyRegToReg() call after registers are allocated.
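
The three cases above can be sketched as a small decision function. This is only a toy model, assuming register classes are plain sets and that any intersection is itself a usable class (in a real target it would have to be a defined register class). The function name `resolve` and the register lists are made up for illustration.

```python
# Toy register classes as sets; the lists are illustrative only.
GR32 = frozenset(["EAX", "ECX", "EDX", "EBX", "ESI", "EDI", "EBP", "ESP"])
GR32_ = frozenset(["EAX", "ECX", "EDX", "EBX"])  # registers with 8-bit subregs
D = frozenset(f"R{i}" for i in range(8))
P = frozenset(f"P{i}" for i in range(6))


def resolve(def_rc, use_rcs):
    """Return (action, regclass) for a vreg defined in def_rc and used in use_rcs."""
    # Case 1: every use accepts anything the def can produce.
    if all(def_rc <= rc for rc in use_rcs):
        return ("keep", def_rc)
    # Case 2: some registers satisfy the def and all uses; narrow the vreg.
    common = def_rc.intersection(*use_rcs)
    if common:
        return ("narrow", common)
    # Case 3: nothing satisfies both sides, so a class-changing copy is needed.
    return ("copy", None)


print(resolve(GR32_, [GR32])[0])  # keep
print(resolve(GR32, [GR32_])[0])  # narrow
print(resolve(D, [P])[0])         # copy
```

Note that with several uses the sketch also falls through to "copy" when the combined intersection is empty, even if no single pair of classes is disjoint.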
> Also, the register allocator and associated passes don't yet know
> how to handle register classes like this.  For example, many
> architectures like this have an add instruction that can add two
> address registers, and one that can add two data registers, but
> not one that can directly add an address register and a data
> register.  In this case, if one operand of an add is in a known class,
> it may be desirable to allocate the other operand in the same
> class (in simple cases).  In LLVM, this is functionality that is not
> yet written.
I have the exact same problem on blackfin.  I can add D=D+D or P=P+P,
but no combinations.  The same goes for post-modify store: I can have
base+offset as P+P or I+M, where I and M are further register classes I
didn't tell you about.

One way of handling this would be to mark an instruction with a list of
alternative instructions.  The alternatives are functionally identical,
but with different operand and result regclasses.  Ideally the register
allocator would choose the best alternative.  However, this is a rather
big change in the problem definition for the register allocator.  I am
going to ignore this issue for now and live with a few redundant
register copies.
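
To make the idea of alternatives concrete, here is a minimal sketch that picks the alternative needing the fewest copies, assuming the classes of the operands and the wanted result class are already known. The opcode names `ADD_D` and `ADD_P`, the table layout, and both helper functions are hypothetical; nothing like this exists in LLVM as far as this thread is concerned.

```python
# Each alternative maps a hypothetical opcode name to
# (result class, operand classes), with classes named by letter.
ADD_ALTERNATIVES = {
    "ADD_D": ("D", ("D", "D")),
    "ADD_P": ("P", ("P", "P")),
}


def copies_needed(alt, operand_classes, wanted_result):
    """Count the copies required to use this alternative."""
    result_rc, operand_rcs = alt
    n = sum(1 for have, need in zip(operand_classes, operand_rcs) if have != need)
    if wanted_result != result_rc:
        n += 1  # the result has to be copied into the wanted class
    return n


def pick_alternative(alternatives, operand_classes, wanted_result):
    """Return the name of the alternative with the fewest copies."""
    return min(
        alternatives,
        key=lambda name: copies_needed(alternatives[name], operand_classes, wanted_result),
    )


print(pick_alternative(ADD_ALTERNATIVES, ("P", "P"), "P"))
print(pick_alternative(ADD_ALTERNATIVES, ("D", "P"), "D"))
```

A real allocator would have to make this choice together with the register assignment itself, which is why it changes the problem definition rather than being a simple pre-pass.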

Evan Cheng

2009-Mar-17 05:31 UTC

[LLVMdev] Overlapping register classes

On Mar 16, 2009, at 11:31 AM, Jakob Stoklund Olesen wrote:
> Dan Gohman <gohman at apple.com> writes:
>
>> On Mar 15, 2009, at 2:02 PM, Jakob Stoklund Olesen wrote:
>>> Am I misusing register classes, or is this simply functionality that
>>> has not been written yet? The existing backends seem to have only one
>>> register class per machine value type.
>>
>> The x86 backend has an example of a partial solution.  The GR32
>> register class has a subset, GR32_, which is the registers in GR32
>> that support 8-bit subregs.  Instructions that reference 8-bit subregs
>> are emitted with a copy (MOV32to32_) to put the value in a virtual
>> register of the needed class.  This copy may then be optimized away
>> by subsequent passes.
>
> I missed this before (thanks, Eli).  I tried adding the explicit move
> patterns, and at least it compiles correctly now:
>
> i1_ls:
> 	R0.H = HI(i1_l); R0.L = LO(i1_l);
> 	P0 = R0;
> 	R0.H = HI(i1_s); R0.L = LO(i1_s);
> 	R1 = B[P0] (Z);
> 	R2 = 1 (X);
> 	P0 = R0;
> 	R0 = R1 & R2;
> 	B[P0] = R0;
> 	RTS;
>
> The moves (P0 = R0) did not get optimized away by the register
> allocator.  RALinScan::attemptTrivialCoalescing almost succeeded; it got
> as far as testing if the source register R0 is contained in the
> destination regclass (P).  It isn't, so the move stayed in.
>
> The problem is that the source register is allocated before coalescing
> is attempted.  The destination regclass does not backpropagate and
> so doesn't influence the allocation class.
The coalescer has the capability to coalesce cross register class  
copies. It's not quite done. Try -join-cross-class-copies.
>
> PBQP doesn't even attempt to remove a move unless source and destination
> regclasses are identical.
>
>> Right now the x86 target code has to explicitly spell out where
>> such copies are needed.  It isn't a lot of trouble because there are
>> a small number of situations where copies are needed.  From your
>> description, it sounds like this would be much more significant on
>> blackfin.  Handling this automatically seems possible, though this
>> is functionality that has not been written yet.
>
> Yes, inserting explicit patterns everywhere would make a complete mess
> of my InstrInfo.td.  All arithmetic requires D-regs, and all load/stores
> require P-regs.
>
> It would be fairly simple to insert move instructions in the selection
> DAG after instruction selection is complete.  I could do this in my
> InstructionSelect() as a first fix, but I think I would have to do
> something more clever eventually.
>
> I think a few tricks when creating vregs would go a long way:
>
> 1. If the def regclass is a subset of the operand regclass, there is no
>   problem.  ScheduleDAGSDNodes::AddOperand should simply allow this
>   case.
>
> 2. If there is a regclass contained in the def regclass and all the
>   operand regclasses, change the vreg regclass to the intersection.
>   This could be a bad idea if there are many uses with different
>   regclasses.
>
> 3. If def and operand regclasses are disjoint, a move is necessary.  It
>   should be possible to produce an abstract vreg-vreg copy instruction
>   that changes the regclass.  The copy instruction would eventually
>   become a copyRegToReg() call after registers are allocated.
Sure. These tricks can be added by demand. Patches welcome.

Evan
>
>> Also, the register allocator and associated passes don't yet know
>> how to handle register classes like this.  For example, many
>> architectures like this have an add instruction that can add two
>> address registers, and one that can add two data registers, but
>> not one that can directly add an address register and a data
>> register.  In this case, if one operand of an add is in a known class,
>> it may be desirable to allocate the other operand in the same
>> class (in simple cases).  In LLVM, this is functionality that is not
>> yet written.
>
> I have the exact same problem on blackfin.  I can add D=D+D or P=P+P,
> but no combinations.  The same goes for post-modify store: I can have
> base+offset as P+P or I+M, where I and M are further register classes I
> didn't tell you about.
>
> One way of handling this would be to mark an instruction with a list of
> alternative instructions.  The alternatives are functionally identical,
> but with different operand and result regclasses.  Ideally the register
> allocator would choose the best alternative.  However, this is a rather
> big change in the problem definition for the register allocator.  I am
> going to ignore this issue for now and live with a few redundant
> register copies.

Jakob Stoklund Olesen

2009-Mar-17 06:58 UTC

[LLVMdev] Overlapping register classes

Evan Cheng <echeng at apple.com> writes:
> On Mar 16, 2009, at 11:31 AM, Jakob Stoklund Olesen wrote:
>> The problem is that the source register is allocated before coalescing
>> is attempted.  The destination regclass does not backpropagate and
>> so doesn't influence the allocation class.
>
> The coalescer has the capability to coalesce cross register class  
> copies. It's not quite done. Try -join-cross-class-copies.
That did the trick!  Now my trivial example becomes:

i1_ls:
	P0.H = HI(i1_l); P0.L = LO(i1_l);
	P1.H = HI(i1_s); P1.L = LO(i1_s);
	R0 = B[P0] (Z);
	R1 = 1 (X);
	R0 = R0 & R1;
	B[P1] = R0;
	RTS;

The inserted copies are gone.
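
One way to picture what joining a cross-class copy means, as a guess at the general idea rather than a description of the coalescer's code: assume the join is only legal when one class contains the other, and the merged vreg takes the smaller class. The class `ANY32` and the function `join_copy` are invented for this sketch.

```python
# Hypothetical classes for illustration; ANY32 is an invented name.
D = frozenset(f"R{i}" for i in range(8))
P = frozenset(f"P{i}" for i in range(6))
ANY32 = D | P


def join_copy(src_rc, dst_rc):
    """Return the class of the merged vreg, or None if the copy must stay."""
    if src_rc <= dst_rc:
        return src_rc  # source is already the tighter class
    if dst_rc <= src_rc:
        return dst_rc  # constrain the merged vreg to the destination class
    return None        # neither contains the other: keep the copy


print(join_copy(ANY32, P) == P)  # merged vreg is constrained to P
print(join_copy(D, P))           # disjoint classes cannot be joined
```

Under that assumption the merged vreg is constrained before allocation, which would be consistent with the addresses now being built directly in P0 and P1.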
>> 1. If the def regclass is a subset of the operand regclass, there is
>> no problem.  ScheduleDAGSDNodes::AddOperand should simply allow this
>> case.
>>
>> 2. If there is a regclass contained in the def regclass and all the
>> operand regclasses, change the vreg regclass to the intersection.
>> This could be a bad idea if there are many uses with different
>> regclasses.
>>
>> 3. If def and operand regclasses are disjoint, a move is necessary.
>> It should be possible to produce an abstract vreg-vreg copy
>> instruction that changes the regclass.  The copy instruction would
>> eventually become a copyRegToReg() call after registers are
>> allocated.
>
> Sure. These tricks can be added by demand. Patches welcome.
I will look into it.  It doesn't feel right to insert moves everywhere
and hope the coalescer will remove them again.

I am not completely sure 2. would be a good idea.  Changing the vreg to
a smaller regclass would increase the register pressure.  For instance,
in the X86 backend you could change the pattern:

def : Pat<(and GR32:$src1, 0xff),
          (MOVZX32rr8 (i8 (EXTRACT_SUBREG (MOV32to32_ GR32:$src1),
                                          x86_subreg_8bit)))>;

into:

def : Pat<(and GR32_:$src1, 0xff),
          (MOVZX32rr8 (i8 (EXTRACT_SUBREG GR32_:$src1,
                                          x86_subreg_8bit)))>;

If 2. above were implemented, the vreg representing $src1 would be
forced into GR32_.  If it is live for a long time, that might not be a
good thing.
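
The pressure concern can be illustrated with a toy check of whether a set of simultaneously live vregs can each get a distinct register from its class. The register lists ignore reserved registers, and the function `assignable` is a made-up brute-force helper, not anything from LLVM.

```python
# Toy classes; reserved registers are ignored for simplicity.
GR32 = frozenset(["EAX", "ECX", "EDX", "EBX", "ESI", "EDI", "EBP", "ESP"])
GR32_ = frozenset(["EAX", "ECX", "EDX", "EBX"])  # registers with 8-bit subregs


def assignable(live):
    """True if every live vreg can get a distinct register from its class."""
    def go(i, used):
        if i == len(live):
            return True
        # Try each still-free register of vreg i's class, backtracking on failure.
        return any(go(i + 1, used | {r}) for r in live[i] - used)
    return go(0, frozenset())


print(assignable([GR32] * 5))             # five values fit in GR32
print(assignable([GR32_] * 5))            # but not if all are forced into GR32_
print(assignable([GR32_] * 4 + [GR32]))   # one unconstrained value still fits
```

Each vreg narrowed to GR32_ competes for four registers instead of eight for as long as it is live, so the longer the live range, the more likely a spill.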

Is the register allocator able to insert moves in this case? A kind of
cross class spilling.
