thr3ads.net - llvm dev - [LLVMdev] Overlapping register classes [Mar 2009]

If this information is useful, please help other people find it:
Share via:

Jakob Stoklund Olesen

2009-Mar-15 21:02 UTC

[LLVMdev] Overlapping register classes

Hi,

I am writing a backend for the Blackfin processor from Analog Devices. I
just started so I still have a lot to learn about the code generator. So
far, I can compile test/CodeGen/Generic/BasicInstrs.ll correctly, but
that is about it.

The Blackfin 32-bit registers divide naturally into several classes. I
have modelled these register classes without knowing anything about what
the code generator expects.

There are data and pointer registers:

def D : RegisterClass<"Bfin", [i32], 32, [R0, R1, R2, R3, R4, R5,
R6, R7]>;
def P : RegisterClass<"Bfin", [i32], 32, [P0, P1, P2, P3, P4, P5,
SP, FP]>;

For instance, a zero-extending byte load needs the address in a P-reg
and can only load a D-reg:

def LOAD32p_8z: F1<(outs D:$dst), (ins P:$ptr),
                   "$dst = B[$ptr] (Z);",
                   [(set D:$dst, (zextloadi8 P:$ptr))]>;

Some instructions work on all registers:

def GR : RegisterClass<"Bfin", [i32], 32,
    [R0, R1, R2, R3, R4, R5, R6, R7,
     P0, P1, P2, P3, P4, P5, SP, FP,
     I0, I1, I2, I3, M0, M1, M2, M3,
     B0, B1, B2, B3, L0, L1, L2, L3]>;

For instance, I can load an arbitrary 32-bit constant or
globaladdr/externalsym into any register:

def LOAD32imm: Pseudo<(outs GR:$dst), (ins i32imm:$src),
                      "$dst.H = HI($src); $dst.L = LO($src);",
                      [(set GR:$dst, imm:$src)]>;

I think I am stretching the code generator beyond its capability by
doing this. As far as I can tell, instruction selection is done purely
based on value types. Register classes are not considered.

I get in trouble when I try to compile this function:

@i1_l = external global i1
@i1_s = external global i1

define void @i1_ls() nounwind  {
	%tmp = load i1* @i1_l
	store i1 %tmp, i1* @i1_s
	ret void
}

Instruction selection works correctly, but the scheduling step fails
with "Register class of operand and regclass of use don't agree!"
in
ScheduleDAGSDNodes::AddOperand. The selected DAG contains:

    (LOAD32p_8z (LOAD32imm (tglobaladdr "i1_l")))

LOAD32imm produces a GR-class vreg, and LOAD32p_8z expects a P-class
vreg, hence the error. But P is a subclass of GR, so if the vreg class
were changed to P everything would work.

The solution is not always that simple. There could be multiple uses
with different regclasses, or the def and use regclasses could be
disjoint so a reg-to-reg copy would be necessary.

Am I misusing register classes, or is this simply functionality that has
not been written yet? The existing backends seem to have only one
register class per machine value type.

Regards,
/stoklund

someguy

2009-Mar-16 07:33 UTC

head link

[LLVMdev] Overlapping register classes

I don't know if there is a 'real' solution to your issue, but you
might consider implementing some pseudo instructions for conversions
between register classes. This should help alleviate the issue you are
currently getting, as (with the right patterns) the generator should
automatically insert the conversion instructions when needed.

Alternatively, you could kill your 'joint' register class, and simply
use multiple patterns on those instructions which can use all
registers. This is  probably the right way to go.

Lastly, you could use a single 'joint' register class, and some
predicates in your .td to make sure that only specific registers in
the class are used for a given pattern.

In any case, I recommend you come hang out on the IRC channel. I've
just struggled through a lot of the backend target stuff, and may be
able to help you avoid the pitfalls I hit. (I also know a little
blackfin ;)).

On Sun, Mar 15, 2009 at 11:02 PM, Jakob Stoklund Olesen <stoklund at
2pi.dk> wrote:> Hi,
>
> I am writing a backend for the Blackfin processor from Analog Devices. I
> just started so I still have a lot to learn about the code generator. So
> far, I can compile test/CodeGen/Generic/BasicInstrs.ll correctly, but
> that is about it.
>
> The Blackfin 32-bit registers divide naturally into several classes. I
> have modelled these register classes without knowing anything about what
> the code generator expects.
>
> There are data and pointer registers:
>
> def D : RegisterClass<"Bfin", [i32], 32, [R0, R1, R2, R3, R4,
R5, R6, R7]>;
> def P : RegisterClass<"Bfin", [i32], 32, [P0, P1, P2, P3, P4,
P5, SP, FP]>;
>
> For instance, a zero-extending byte load needs the address in a P-reg
> and can only load a D-reg:
>
> def LOAD32p_8z: F1<(outs D:$dst), (ins P:$ptr),
>                   "$dst = B[$ptr] (Z);",
>                   [(set D:$dst, (zextloadi8 P:$ptr))]>;
>
> Some instructions work on all registers:
>
> def GR : RegisterClass<"Bfin", [i32], 32,
>    [R0, R1, R2, R3, R4, R5, R6, R7,
>     P0, P1, P2, P3, P4, P5, SP, FP,
>     I0, I1, I2, I3, M0, M1, M2, M3,
>     B0, B1, B2, B3, L0, L1, L2, L3]>;
>
> For instance, I can load an arbitrary 32-bit constant or
> globaladdr/externalsym into any register:
>
> def LOAD32imm: Pseudo<(outs GR:$dst), (ins i32imm:$src),
>                      "$dst.H = HI($src); $dst.L = LO($src);",
>                      [(set GR:$dst, imm:$src)]>;
>
> I think I am stretching the code generator beyond its capability by
> doing this. As far as I can tell, instruction selection is done purely
> based on value types. Register classes are not considered.
>
> I get in trouble when I try to compile this function:
>
> @i1_l = external global i1
> @i1_s = external global i1
>
> define void @i1_ls() nounwind  {
>        %tmp = load i1* @i1_l
>        store i1 %tmp, i1* @i1_s
>        ret void
> }
>
> Instruction selection works correctly, but the scheduling step fails
> with "Register class of operand and regclass of use don't
agree!" in
> ScheduleDAGSDNodes::AddOperand. The selected DAG contains:
>
>    (LOAD32p_8z (LOAD32imm (tglobaladdr "i1_l")))
>
> LOAD32imm produces a GR-class vreg, and LOAD32p_8z expects a P-class
> vreg, hence the error. But P is a subclass of GR, so if the vreg class
> were changed to P everything would work.
>
> The solution is not always that simple. There could be multiple uses
> with different regclasses, or the def and use regclasses could be
> disjoint so a reg-to-reg copy would be necessary.
>
> Am I misusing register classes, or is this simply functionality that has
> not been written yet? The existing backends seem to have only one
> register class per machine value type.
>
> Regards,
> /stoklund
>
> _______________________________________________
> LLVM Developers mailing list
> LLVMdev at cs.uiuc.edu         http://llvm.cs.uiuc.edu
> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
>

Eli Friedman

2009-Mar-16 09:25 UTC

head link

[LLVMdev] Overlapping register classes

On Sun, Mar 15, 2009 at 2:02 PM, Jakob Stoklund Olesen <stoklund at
2pi.dk> wrote:> Am I misusing register classes, or is this simply functionality that has
> not been written yet? The existing backends seem to have only one
> register class per machine value type.
Have you taken a look at MOV16to16_ in X86InstrInfo.td?  I think the
situation there is similar to your issue.

-Eli

Dan Gohman

2009-Mar-16 16:37 UTC

head link

[LLVMdev] Overlapping register classes

On Mar 15, 2009, at 2:02 PM, Jakob Stoklund Olesen wrote:
> Hi,
>
> I am writing a backend for the Blackfin processor from Analog  
> Devices. I
> just started so I still have a lot to learn about the code  
> generator. So
> far, I can compile test/CodeGen/Generic/BasicInstrs.ll correctly, but
> that is about it.
>
> The Blackfin 32-bit registers divide naturally into several classes. I
> have modelled these register classes without knowing anything about  
> what
> the code generator expects.
>
> There are data and pointer registers:
>
> def D : RegisterClass<"Bfin", [i32], 32, [R0, R1, R2, R3, R4,
R5,
> R6, R7]>;
> def P : RegisterClass<"Bfin", [i32], 32, [P0, P1, P2, P3, P4,
P5,
> SP, FP]>;
>
> For instance, a zero-extending byte load needs the address in a P-reg
> and can only load a D-reg:
>
> def LOAD32p_8z: F1<(outs D:$dst), (ins P:$ptr),
>                   "$dst = B[$ptr] (Z);",
>                   [(set D:$dst, (zextloadi8 P:$ptr))]>;
>
> Some instructions work on all registers:
>
> def GR : RegisterClass<"Bfin", [i32], 32,
>    [R0, R1, R2, R3, R4, R5, R6, R7,
>     P0, P1, P2, P3, P4, P5, SP, FP,
>     I0, I1, I2, I3, M0, M1, M2, M3,
>     B0, B1, B2, B3, L0, L1, L2, L3]>;
>
> For instance, I can load an arbitrary 32-bit constant or
> globaladdr/externalsym into any register:
>
> def LOAD32imm: Pseudo<(outs GR:$dst), (ins i32imm:$src),
>                      "$dst.H = HI($src); $dst.L = LO($src);",
>                      [(set GR:$dst, imm:$src)]>;
>
> I think I am stretching the code generator beyond its capability by
> doing this. As far as I can tell, instruction selection is done purely
> based on value types. Register classes are not considered.
>
> I get in trouble when I try to compile this function:
>
> @i1_l = external global i1
> @i1_s = external global i1
>
> define void @i1_ls() nounwind  {
> 	%tmp = load i1* @i1_l
> 	store i1 %tmp, i1* @i1_s
> 	ret void
> }
>
> Instruction selection works correctly, but the scheduling step fails
> with "Register class of operand and regclass of use don't
agree!" in
> ScheduleDAGSDNodes::AddOperand. The selected DAG contains:
>
>    (LOAD32p_8z (LOAD32imm (tglobaladdr "i1_l")))
>
> LOAD32imm produces a GR-class vreg, and LOAD32p_8z expects a P-class
> vreg, hence the error. But P is a subclass of GR, so if the vreg class
> were changed to P everything would work.
>
> The solution is not always that simple. There could be multiple uses
> with different regclasses, or the def and use regclasses could be
> disjoint so a reg-to-reg copy would be necessary.
>
> Am I misusing register classes, or is this simply functionality that  
> has
> not been written yet? The existing backends seem to have only one
> register class per machine value type.

The x86 backend has an example of a partial solution.  The GR32
register class has a subset, GR32_, which is the registers in GR32
that support 8-bit subregs.  Instructions that reference 8-bit subregs
are emitted with a copy (MOV32to32_) to put the value in a virtual
register of the needed class.  This copy may then optimized away
by subsequent passes.

Right now the x86 target code has to explicitly spell out where
such copies are needed.  It isn't a lot of trouble because there are
a small number of situations where copies are needed.  From your
description, it sounds like this would be much more significant on
blackfin.  Handling this automatically seems possible, though this
is functionality that has not been written yet.

Also, the register allocator and associated passes don't yet know
how to handle register classes like this.  For example, many
architectures like this have an add instruction that can add two
address registers, and one that can add two data registers, but
not one that can directly add an address register and a data
register.  In this case, if one operand of an add is in a known class,
it may be desireable to allocate the other operand in the same
class (in simple cases).  In LLVM, this is functionality that is not
yet written.

Dan

Jakob Stoklund Olesen

2009-Mar-16 18:31 UTC

head link

[LLVMdev] Overlapping register classes

Dan Gohman <gohman at apple.com> writes:
> On Mar 15, 2009, at 2:02 PM, Jakob Stoklund Olesen wrote:
>> Am I misusing register classes, or is this simply functionality that
>> has not been written yet? The existing backends seem to have only one
>> register class per machine value type.
>
> The x86 backend has an example of a partial solution.  The GR32
> register class has a subset, GR32_, which is the registers in GR32
> that support 8-bit subregs.  Instructions that reference 8-bit subregs
> are emitted with a copy (MOV32to32_) to put the value in a virtual
> register of the needed class.  This copy may then optimized away
> by subsequent passes.
I missed this before (thanks, Eli).  I tried adding the explicit move
patterns, and at least it compiles correctly now:

i1_ls:
	R0.H = HI(i1_l); R0.L = LO(i1_l);
	P0 = R0;
	R0.H = HI(i1_s); R0.L = LO(i1_s);
	R1 = B[P0] (Z);
	R2 = 1 (X);
	P0 = R0;
	R0 = R1 & R2;
	B[P0] = R0;
	RTS;

The moves (P0 = R0) did not get optimized away by the register
allocator.  RALinScan::attemptTrivialCoalescing almost succeeded; it got
as far as testing if the source register R0 is contained in the
destination regclass (P).  It isn't, so the move stayed in.

The problem is that the source register is allocated before coalescing
is attempted.  The destination regclass does not backpropagate and
so doesn't influence the allocation class.

PBQP doesn't even attempt to remove a move unless source and destination
regclasses are identical.
> Right now the x86 target code has to explicitly spell out where
> such copies are needed.  It isn't a lot of trouble because there are
> a small number of situations where copies are needed.  From your
> description, it sounds like this would be much more significant on
> blackfin.  Handling this automatically seems possible, though this
> is functionality that has not been written yet.
Yes, inserting explicit patterns everywhere would make a complete mess
of my InstrInfo.td.  All arithmetic requires D-regs, and all load/stores
require P-regs.

It would be fairly simple to insert move instructions in the selection
DAG after instruction selection is complete.  I could do this in my
InstructionSelect() as a first fix, but I think I would have to do
something more clever eventually.

I think a few tricks when creating vregs would go a long way:

1. If the def regclass is a subset of the operand regclass, there is no
   problem.  ScheduleDAGSDNodes::AddOperand should simply allow this
   case.

2. If there is a regclass contained in the def regclass and all the
   operand regclasses, change the vreg regclass to the intersection.
   This could be a bad idea if there are many uses with different
   regclasses.

3. If def and operand regclasses are disjoint, a move is necessary.  It
   should be possible to produce an abstract vreg-vreg copy instruction
   that changes the regclass.  The copy instruction would eventually
   become a copyRegToReg() call after registers are allocated.
> Also, the register allocator and associated passes don't yet know
> how to handle register classes like this.  For example, many
> architectures like this have an add instruction that can add two
> address registers, and one that can add two data registers, but
> not one that can directly add an address register and a data
> register.  In this case, if one operand of an add is in a known class,
> it may be desireable to allocate the other operand in the same
> class (in simple cases).  In LLVM, this is functionality that is not
> yet written.
I have the exact same problem on blackfin.  I can add D=D+D or P=P+P,
but no combinations.  The same goes for post-modify store: I can have
base+offset as P+P or I+M, where I and M are further register classes I
didn't tell you about.

One way of handling this would be to mark an instruction with a list of
alternative instructions.  The alternatives are functionally identical,
but with different operand and result regclasses.  Ideally the register
allocator would choose the best alternative.  However, this is a rather
big change in the problem definition for the register allocator.  I am
going to ignore this issue for now and live with a few redundant
register copies.

Jakob Stoklund Olesen

2009-Mar-18 06:59 UTC

head link

[LLVMdev] Overlapping register classes

Jakob Stoklund Olesen <stoklund at 2pi.dk> writes:
> The Blackfin 32-bit registers divide naturally into several classes. I
> have modelled these register classes without knowing anything about what
> the code generator expects.
Here is what I have done for now: I have added a narrowing move
instruction for each register subclass, much like the X86 MOV32to32_
instruction:

def MOVE_d: F1<(outs D:$dst), (ins ALL:$src),"$dst = $src;",
[]>;
def MOVE_p: F1<(outs P:$dst), (ins ALL:$src),"$dst = $src;",
[]>;

I do not use these move instructions in patterns; that would be too
messy.  Instead, I insert narrowing moves in the selection DAG after
instruction selection is complete, before scheduling.  Since the
narrowing moves take operands from the ALL regclass, ScheduleDAGSDNodes
gets upset unless you apply the following patch:

--- a/lib/CodeGen/SelectionDAG/ScheduleDAGSDNodesEmit.cpp
+++ b/lib/CodeGen/SelectionDAG/ScheduleDAGSDNodesEmit.cpp
@@ -262,7 +262,7 @@ void ScheduleDAGSDNodes::AddOperand(MachineInstr *MI,
SDValue Op,
       const TargetRegisterClass *RC= getInstrOperandRegClass(TRI, *II,
IIOpNum);
       assert((RC || II->isVariadic()) && "Expected reg class
info!");
       const TargetRegisterClass *VRC = MRI.getRegClass(VReg);
-      if (RC && VRC != RC) {
+      if (RC && VRC != RC && !VRC->hasSuperClass(RC)) {
         cerr << "Register class of operand and regclass of use
don't agree!\n";
         cerr << "Operand = " << IIOpNum <<
"\n";
         cerr << "Op->Val = "; Op.getNode()->dump(DAG);
cerr << "\n";

When running llc with -join-cross-class-copies as Evan suggested, it
produces good results.

I am going to move on to implementing advanced features like jumps and
condition codes now :-)

Thanks,
/jakob

Reasonably Related Threads

Search for more reasonably related threads

llvm dev - Mar 2009 - [LLVMdev] Overlapping register classes

[LLVMdev] Overlapping register classes

[LLVMdev] Overlapping register classes

[LLVMdev] Overlapping register classes

[LLVMdev] Overlapping register classes

[LLVMdev] Overlapping register classes

[LLVMdev] Overlapping register classes

Reasonably Related Threads