thr3ads.net - llvm dev - [LLVMdev] Register based vector insert/extract [Apr 2007]


Christopher Lamb

2007-Apr-23 20:53 UTC

[LLVMdev] Register based vector insert/extract

On Apr 23, 2007, at 1:43 PM, Christopher Lamb wrote:
> On Apr 23, 2007, at 1:17 PM, Christopher Lamb wrote:
>
>> On Apr 23, 2007, at 12:31 PM, Chris Lattner wrote:
>>
>>> On Mon, 23 Apr 2007, Christopher Lamb wrote:
>>>> How can one let the back end know how to insert and extract elements of
>>>> a vector through sub-register copies? I'm at a loss how to do
>>>> this...
>>>
>>> You probably want to custom lower the insertelement/extractelement
>>> operations for the cases you support.  Take a look at
>>> X86TargetLowering::LowerEXTRACT_VECTOR_ELT for some examples of  
>>> how to do
>>> this.
>>
>> The issue I'm having is that there is no extract/insert  
>> instruction in the ISA, it's simply based on using subregister  
>> operands in subsequent/preliminary instructions. At the point of
>> custom lowering register allocation has not yet been done, so I  
>> don't have a way to communicate the dependency.
>>
>
> An example is in order:
>
> If I have a register v4r0 with subregisters {r0, r1, r2, r3} and a  
> DAG that looks like
>
> load v4si <- extract_element 2 <- add -> load i32
>
> I'd like to be able to generate
>
> load v4r0
> load r10
> add r11, r10, r2 <== subregister 2 of v4r0

I see that Evan has added getSubRegisters()/getSuperRegisters() to
MRegisterInfo. This is what's needed in order to implement the
register allocation constraint, but there's no way yet to pass the
constraint through the operands from the DAG. There would need to be
some way to specify that the SDOperand is referencing a subvalue of
the produced value (perhaps a subclass of SDOperand?). This would
allow the register allocator to try to use the sub/super register
sets to perform the insert/extract.
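To make the sub/super register relationship concrete, here is a small self-contained C++ model of the vector ISA described above. It is only an illustration under stated assumptions: the v4r0 = {r0, r1, r2, r3} layout comes from the example, v4r1 is an invented second register, and `RegisterInfoModel`, `subRegisters` and `superRegisters` are hypothetical names, not the MRegisterInfo API. The super-register sets are derived by inverting the sub-register table.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Hypothetical model of a target's register description (not LLVM's MRegisterInfo).
struct RegisterInfoModel {
  // Each vector register lists its scalar sub-registers in element order.
  std::map<std::string, std::vector<std::string>> subRegs;

  // Returns the sub-registers of Reg, or an empty list if it has none.
  std::vector<std::string> subRegisters(const std::string &Reg) const {
    auto It = subRegs.find(Reg);
    return It == subRegs.end() ? std::vector<std::string>{} : It->second;
  }

  // Derives the super-registers of Reg by inverting the sub-register table.
  std::vector<std::string> superRegisters(const std::string &Reg) const {
    std::vector<std::string> Supers;
    for (const auto &Entry : subRegs)
      for (const auto &Sub : Entry.second)
        if (Sub == Reg)
          Supers.push_back(Entry.first);
    return Supers;
  }
};

// Builds the example ISA: v4r0 = {r0..r3}; v4r1 = {r4..r7} is an assumption.
RegisterInfoModel makeVectorISA() {
  RegisterInfoModel RI;
  RI.subRegs["v4r0"] = {"r0", "r1", "r2", "r3"};
  RI.subRegs["v4r1"] = {"r4", "r5", "r6", "r7"};
  return RI;
}
```

With such a table, an allocator that has placed the loaded vector in v4r0 could satisfy `extract_element 2` by reading `subRegisters("v4r0")[2]` (r2) instead of emitting a copy, while `superRegisters("r2")` tells it which vector register a write to r2 would clobber.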

Is any of this kind of work planned? The addition of those  
MRegisterInfo functions has me curious...

--
Christopher Lamb

Chris Lattner

2007-Apr-23 21:22 UTC

[LLVMdev] Register based vector insert/extract

On Mon, 23 Apr 2007, Christopher Lamb wrote:
>>> The issue I'm having is that there is no extract/insert
>>> instruction in the ISA, it's simply based on using subregister
>>> operands in subsequent/preliminary instructions. At the point of
>>> custom lowering register allocation has not yet been done, so I
>>> don't have a way to communicate the dependency.

Ok.

>> If I have a register v4r0 with subregisters {r0, r1, r2, r3} and a
>> DAG that looks like
>>
>> load v4si <- extract_element 2 <- add -> load i32
>>
>> I'd like to be able to generate
>>
>> load v4r0
>> load r10
>> add r11, r10, r2 <== subregister 2 of v4r0

Nice ISA.  That is entirely too logical. :)

We have a similar problem on X86.  In particular, an integer truncate or 
an extend (e.g. i16 -> i8) wants to make use of subregisters.  Consider 
code like this:

   t1 = load i16
   t2 = truncate i16 t1 to i8
   t3 = add i8 t2, 42

What we would really want to generate is something like this at the 
machine instr level:

   r1024 = X86_LOADi16 ...     ;; r1024 is i16
   r1026 = ADDi8 r1024[subreg #0], 42

More specifically, we want to be able to define, for each register class, 
a set of subregister classes.  In the X86 world, the 64-bit register 
classes could have subregclass0 = i8 parts, subregclass1 = i16 parts, 
subregclass2 = i32 parts.  Each <physreg, subreg#> pair should map to 
another physreg (e.g. <RAX,1> -> AX).
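The <physreg, subreg#> mapping can be pictured as a plain lookup table. The sketch below assumes the numbering from the paragraph above (0 = i8 part, 1 = i16 part, 2 = i32 part), covers only RAX and RCX, and uses a hypothetical `subRegister` helper; it is not how any LLVM target actually describes its registers.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <utility>

// Hypothetical subregister table: <physreg, subreg index> -> physreg.
// Index 0 = i8 part, 1 = i16 part, 2 = i32 part (numbering from the proposal).
const std::map<std::pair<std::string, unsigned>, std::string> &subRegTable() {
  static const std::map<std::pair<std::string, unsigned>, std::string> Table = {
      {{"RAX", 0}, "AL"}, {{"RAX", 1}, "AX"}, {{"RAX", 2}, "EAX"},
      {{"RCX", 0}, "CL"}, {{"RCX", 1}, "CX"}, {{"RCX", 2}, "ECX"},
  };
  return Table;
}

// Returns the physreg for <PhysReg, SubIdx>, or "" if the pair is not defined.
std::string subRegister(const std::string &PhysReg, unsigned SubIdx) {
  const auto &Table = subRegTable();
  auto It = Table.find({PhysReg, SubIdx});
  return It == Table.end() ? std::string() : It->second;
}
```

For example, `subRegister("RAX", 1)` yields AX, matching the <RAX,1> -> AX case in the text.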

The idea of this is that the register allocator allocates registers like 
normal, but when it does the rewriting pass, when it replaces vregs with 
pregs (e.g. r1024 with CX in this example), it rewrites r1024[subreg0] 
with CL instead of CX.  This would give us this code:

   CX = X86_LOADi16 ...
   DL = ADDi8 CL, 42
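The rewriting step can be sketched as a toy model, not LLVM's actual rewriter. `Operand`, `SubRegs` and `rewriteOperand` are hypothetical names, a subreg index of -1 is an assumed encoding for "the whole register", and the allocation r1024 -> CX, r1026 -> DL is taken from the example above.

```cpp
#include <cassert>
#include <map>
#include <stdexcept>
#include <string>
#include <utility>

// A virtual-register operand, optionally restricted to one subregister.
struct Operand {
  unsigned VReg;
  int SubIdx; // -1 means the whole register (assumed encoding)
};

// Hypothetical <physreg, subreg#> table for 16-bit registers (0 = low i8 part).
const std::map<std::pair<std::string, int>, std::string> SubRegs = {
    {{"AX", 0}, "AL"}, {{"CX", 0}, "CL"}, {{"DX", 0}, "DL"}};

// Replaces a vreg with the physreg the allocator chose, then narrows it to the
// requested subregister, so r1024[subreg0] becomes CL once r1024 is in CX.
std::string rewriteOperand(const Operand &Op,
                           const std::map<unsigned, std::string> &Assignment) {
  const std::string &Phys = Assignment.at(Op.VReg);
  if (Op.SubIdx < 0)
    return Phys;
  auto It = SubRegs.find({Phys, Op.SubIdx});
  if (It == SubRegs.end())
    throw std::runtime_error("no such subregister for " + Phys);
  return It->second;
}
```

Applied to the operands of the two instructions above, r1024 rewrites to CX, r1024[subreg0] to CL and r1026 to DL, giving `CX = X86_LOADi16 ...` and `DL = ADDi8 CL, 42`. The allocator itself never has to know about the subregister; only the final rewrite does.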

In your case, you'd define your vector register class with 4 subregs, one 
for each piece.


Unfortunately, none of this exists yet :(.  To handle truncates and 
extends on X86, we currently emulate this by generating machineinstrs 
like:

   r1024 = X86_LOADi16 ...
   r1025 = TRUNCATE_i16_to_i8 r1024
   r1026 = ADDi8 r1025, 42

In the asmprinter, we print TRUNCATE_i16_to_i8 as a commented out noop if 
the register allocator happens to allocate 1024 and 1025 to the same 
register.  If not, it uses an asmprinter hack to print this as a copy 
instruction.  This is horrible, and doesn't produce good code.  OTOH, 
before Evan improved this, we always copied into AX and out of AL for each 
i16->i8 truncate, which was much worse :)
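The current workaround boils down to a decision made at print time. The sketch below is only an approximation of the behavior described, not the real X86 asmprinter code: it assumes a 16-bit source and an 8-bit destination, and the helper names (`lowByteOf`, `printTruncate`) and output strings are hypothetical.

```cpp
#include <cassert>
#include <map>
#include <stdexcept>
#include <string>

// Low 8-bit part of each 16-bit register that has one (SI, DI, BP and SP have
// no 8-bit part in 32-bit mode, so they are left out and yield "").
std::string lowByteOf(const std::string &Reg16) {
  static const std::map<std::string, std::string> Low = {
      {"AX", "AL"}, {"BX", "BL"}, {"CX", "CL"}, {"DX", "DL"}};
  auto It = Low.find(Reg16);
  return It == Low.end() ? std::string() : It->second;
}

// Prints a TRUNCATE_i16_to_i8 pseudo-instruction after register allocation.
// If the destination already is the low byte of the source, nothing needs to
// be emitted, so it becomes a comment; otherwise it becomes a copy.
std::string printTruncate(const std::string &Src16, const std::string &Dst8) {
  const std::string Low = lowByteOf(Src16);
  if (Low.empty())
    throw std::invalid_argument(Src16 + " has no 8-bit subregister");
  if (Low == Dst8)
    return ";; TRUNCATE_i16_to_i8 " + Dst8 + ", " + Src16 + " (noop)";
  return "mov " + Dst8 + ", " + Low;
}
```

When the allocator puts r1024 in CX and r1025 in CL the truncate costs nothing; any other choice, such as DL, forces an extra copy, which is why real vreg subregister support would produce better code.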

> I see that Evan has added getSubRegisters()/getSuperRegisters() to
> MRegisterInfo. This is what's needed in order to implement the
> register allocation constraint, but there's no way yet to pass the
> constraint through the operands from the DAG. There would need to be
> some way to specify that the SDOperand is referencing a subvalue of
> the produced value (perhaps a subclass of SDOperand?). This would
> allow the register allocator to try to use the sub/super register
> sets to perform the insert/extract.

Right.  Evan is currently focusing on getting the late stages of the code
generator (e.g. livevars) to be able to understand arbitrary machine
instrs in the face of physreg subregs.  This lays the groundwork for
handling vreg subregs, but won't solve it directly.

> Is any of this kind of work planned? The addition of those
> MRegisterInfo functions has me curious...

This is on our mid-term plan, which means we'll probably tackle it over
the next year or so, but we don't have any concrete plans in the immediate 
future.  If you are interested, this should be a pretty reasonable project 
that will give you a chance to become more familiar with various pieces of 
the early code generator. :)

-Chris

-- 
http://nondot.org/sabre/
http://llvm.org/

Christopher Lamb

2007-Apr-23 23:07 UTC

[LLVMdev] Register based vector insert/extract

Thanks for the detailed response.

On Apr 23, 2007, at 4:22 PM, Chris Lattner wrote:
> Right.  Evan is currently focusing on getting the late stages of the code
> generator (e.g. livevars) to be able to understand arbitrary machine
> instrs in the face of physreg subregs.  This lays the groundwork for
> handling vreg subregs, but won't solve it directly.

Is the work Evan is doing a prerequisite for supporting vreg subregs?
Is there a PR for the feature Evan is working on?

>> Is any of this kind of work planned? The addition of those
>> MRegisterInfo functions has me curious...
>
> This is on our mid-term plan, which means we'll probably tackle it over
> the next year or so, but we don't have any concrete plans in the immediate
> future.  If you are interested, this should be a pretty reasonable project
> that will give you a chance to become more familiar with various pieces of
> the early code generator. :)

I have other higher-priority tasks right now, but I think we'll want
to have this in sooner rather than later. If you have any pointers on
a good starting point, it'd be mighty helpful. If I can get a grasp on
it, I'll start incremental work in the background.

Probably the place to start would be opening a PR. Does "Support for  
vreg subregs" capture the essence of the enhancement?

--
Christopher Lamb
