thr3ads.net - llvm dev - [LLVMdev] Custom GEP lowering [Aug 2007]

Scott Michel

2007-Aug-29 01:15 UTC

[LLVMdev] Custom GEP lowering

On Aug 28, 2007, at 7:02 AM, Dan Gohman wrote:
> On Mon, Aug 27, 2007 at 07:26:55PM -0700, Scott Michel wrote:
>> It looks like I need to be able to intercept GEP lowering (in
>> SelectionDAGLowering::visitGetElementPtr) and insert something else
>> other than the shifts and adds. The basic problem is that CellSPU
>> loads and stores on 16-byte boundaries. Consequently, the SPU backend
>> has to do the load or store differently than most normal
>> architectures that have byte-addressable operations.
>
> In TOT, load and store instructions have an alignment attribute which is
> useful for addressing similar needs on other architectures. For example,
> this attribute is used on x86, which also has a bunch of instructions
> which require 16-byte alignment. x86 uses it quite late, after legalize,
> and I don't know if that's appropriate for the CellSPU target, but
> wherever you're doing the lowering, could you use the load and store
> alignment attribute?
I'm aware of this attribute, but it doesn't help. The underlying
problem is that CellSPU does not know how to natively perform
byte-level addressing. For example, here's an indexed stack instruction to
load register $3:

	ldq	$3, 4($sp)

In reality, the "4($sp)" doesn't mean what you think it means in the
PPC and x86 worlds: that's 4 x 16 -- load quadword (ldq) appends four
zero bits to the right of the offset. To get at the 4th byte requires
loading from 0($sp) and some vector shuffling. (Dan: Think about
older Cray hardware... you'll immediately understand!)
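
As a rough illustration of the addressing described above, here is a small Python model (a sketch only, not SPU backend code). The helper names `ldq` and `load_byte` are hypothetical, local store is modelled as a plain `bytes` object, and masking off the low four address bits is an assumption of this model. A single array lookup stands in for the "vector shuffling" step.

```python
QUADWORD = 16  # bytes per quadword


def ldq(mem, base, offset):
    """Model of 'ldq rt, offset(base)' as described in the message:
    the offset counts quadwords, not bytes."""
    addr = base + (offset << 4)   # four zero bits appended to the offset
    addr &= ~(QUADWORD - 1)       # assumption: low four address bits are ignored
    return mem[addr:addr + QUADWORD]


def load_byte(mem, base, byte_offset):
    """Fetch one byte: load the enclosing quadword, then pick the byte out.
    On real hardware the pick would be a shuffle or rotate."""
    addr = base + byte_offset
    quad = ldq(mem, addr & ~(QUADWORD - 1), 0)
    return quad[addr & (QUADWORD - 1)]
```

With `mem = bytes(range(128))` and a stack pointer of 32, `ldq(mem, 32, 4)` returns the quadword starting at byte 96 (32 + 4 x 16), while `load_byte(mem, 32, 4)` returns the byte at address 36.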

I could try custom lowering loads and stores as an interim step and  
detect if one of the operands is really a frameindex (or global  
variable or external variable or ... <insert exhaustive list of edge  
cases here>) added to some offset. Ultimately, custom lowering GEPs  
is probably the better idea (if not a lot more work).

If I go ahead and shuffle around some code (no pun intended), would
it be worth my while to prototype some refactoring of the
legalize/promote/custom mess, since I'll have to touch it anyway for custom
GEP lowering?

-scooter

Christopher Lamb

2007-Aug-29 02:12 UTC

[LLVMdev] Custom GEP lowering

On Aug 28, 2007, at 6:15 PM, Scott Michel wrote:
> On Aug 28, 2007, at 7:02 AM, Dan Gohman wrote:
>
>> On Mon, Aug 27, 2007 at 07:26:55PM -0700, Scott Michel wrote:
>>> It looks like I need to be able to intercept GEP lowering (in
>>> SelectionDAGLowering::visitGetElementPtr) and insert something else
>>> other than the shifts and adds. The basic problem is that CellSPU
>>> loads and stores on 16-byte boundaries. Consequently, the SPU backend
>>> has to do the load or store differently than most normal
>>> architectures that have byte-addressable operations.
>>
>> In TOT, load and store instructions have an alignment attribute which is
>> useful for addressing similar needs on other architectures. For example,
>> this attribute is used on x86, which also has a bunch of instructions
>> which require 16-byte alignment. x86 uses it quite late, after legalize,
>> and I don't know if that's appropriate for the CellSPU target, but
>> wherever you're doing the lowering, could you use the load and store
>> alignment attribute?
>
> I'm aware of this attribute, but it doesn't help. The underlying
> problem is that CellSPU does not know how to natively perform byte-
> level addressing. For example, here's an indexed stack instruction to
> load register $3:
>
> 	ldq	$3, 4($sp)
>
> In reality, the "4($sp)" doesn't mean what you think it means in the
> PPC and x86 worlds: that's 4 x 16 -- load quadword (ldq) appends four
> zero bits to the right of the offset. To get at the 4th byte requires
> loading from 0($sp) and some vector shuffling. (Dan: Think about
> older Cray hardware... you'll immediately understand!)
Isn't this just an ISel issue? You have to ISel unaligned loads/stores
to more complex code, is all. It seems very simple to current
targets that support indexed and non-indexed addressing modes. In
this case it's simply that you have to implement the un-indexed modes
in terms of a more complex expression based on an indexed load.

I tackled a similar issue in Ageia's back end in just this way.
--
Christopher Lamb



Scott Michel

2007-Aug-29 03:06 UTC

[LLVMdev] Custom GEP lowering

On Aug 28, 2007, at 7:12 PM, Christopher Lamb wrote:
>
> On Aug 28, 2007, at 6:15 PM, Scott Michel wrote:
>
>> On Aug 28, 2007, at 7:02 AM, Dan Gohman wrote:
>>
>>> On Mon, Aug 27, 2007 at 07:26:55PM -0700, Scott Michel wrote:
>>>> It looks like I need to be able to intercept GEP lowering (in
>>>> SelectionDAGLowering::visitGetElementPtr) and insert something else
>>>> other than the shifts and adds. The basic problem is that CellSPU
>>>> loads and stores on 16-byte boundaries. Consequently, the SPU backend
>>>> has to do the load or store differently than most normal
>>>> architectures that have byte-addressable operations.
>>>
>>> In TOT, load and store instructions have an alignment attribute which is
>>> useful for addressing similar needs on other architectures. For example,
>>> this attribute is used on x86, which also has a bunch of instructions
>>> which require 16-byte alignment. x86 uses it quite late, after legalize,
>>> and I don't know if that's appropriate for the CellSPU target, but
>>> wherever you're doing the lowering, could you use the load and store
>>> alignment attribute?
>>
>> I'm aware of this attribute, but it doesn't help. The underlying
>> problem is that CellSPU does not know how to natively perform byte-
>> level addressing. For example, here's an indexed stack instruction to
>> load register $3:
>>
>> 	ldq	$3, 4($sp)
>>
>> In reality, the "4($sp)" doesn't mean what you think it means in the
>> PPC and x86 worlds: that's 4 x 16 -- load quadword (ldq) appends four
>> zero bits to the right of the offset. To get at the 4th byte requires
>> loading from 0($sp) and some vector shuffling. (Dan: Think about
>> older Cray hardware... you'll immediately understand!)
>
> Isn't this just an ISel issue? You have to ISel unaligned loads/stores
> to more complex code, is all. It seems very simple to current
> targets that support indexed and non-indexed addressing modes. In
> this case it's simply that you have to implement the un-indexed modes
> in terms of a more complex expression based on an indexed load.
>
> I tackled a similar issue in Ageia's back end in just this way.
Will do a little debugging and investigating and get back to you on  
this... although I suspect the answer is still going to be custom  
lowering GEPs. I'd like to be wrong!


-scooter

Chris Lattner

2007-Aug-29 06:34 UTC

[LLVMdev] Custom GEP lowering

On Aug 28, 2007, at 6:15 PM, Scott Michel wrote:
> On Aug 28, 2007, at 7:02 AM, Dan Gohman wrote:
>
>> On Mon, Aug 27, 2007 at 07:26:55PM -0700, Scott Michel wrote:
>>> It looks like I need to be able to intercept GEP lowering (in
>>> SelectionDAGLowering::visitGetElementPtr) and insert something else
>>> other than the shifts and adds. The basic problem is that CellSPU
>>> loads and stores on 16-byte boundaries. Consequently, the SPU backend
>>> has to do the load or store differently than most normal
>>> architectures that have byte-addressable operations.
>>
>> In TOT, load and store instructions have an alignment attribute which is
>> useful for addressing similar needs on other architectures. For example,
>> this attribute is used on x86, which also has a bunch of instructions
>> which require 16-byte alignment. x86 uses it quite late, after legalize,
>> and I don't know if that's appropriate for the CellSPU target, but
>> wherever you're doing the lowering, could you use the load and store
>> alignment attribute?
>
> I'm aware of this attribute, but it doesn't help. The underlying
> problem is that CellSPU does not know how to natively perform byte-
> level addressing. For example, here's an indexed stack instruction to
> load register $3:
>
> 	ldq	$3, 4($sp)
>
> In reality, the "4($sp)" doesn't mean what you think it means in the
> PPC and x86 worlds: that's 4 x 16 -- load quadword (ldq) appends four
> zero bits to the right of the offset. To get at the 4th byte requires
> loading from 0($sp) and some vector shuffling. (Dan: Think about
> older Cray hardware... you'll immediately understand!)
I agree with Christopher that this is just an unaligned load issue.   
Consider a risc chip with only a 32-bit load that requires the  
pointer to be aligned.  If you want to do an unaligned load, you'd  
have to do something like this:

  t1 = load (p & ~3)
  t2 = load ((p+4) & ~3)
  t3 = merge t1, t2, p & 3

In the AltiVec world this is a very, very common thing to code up.
The nice thing about doing this is that the DAG combiner can then
hack away loads if it discovers that p&3 is zero.
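
The three-step sequence above can be sketched in Python as follows. This is a minimal model under stated assumptions, not code from any LLVM backend: memory is a `bytes` object, words are big-endian, and `load_aligned32` and `unaligned_load32` are hypothetical names. The early return when `p & 3` is zero mirrors the case where the second load can be dropped.

```python
MASK32 = 0xFFFFFFFF


def load_aligned32(mem, addr):
    """The only load the modelled chip has: 32 bits from a 4-byte-aligned address."""
    assert addr % 4 == 0, "hardware load requires an aligned pointer"
    return int.from_bytes(mem[addr:addr + 4], "big")


def unaligned_load32(mem, p):
    """Build a 32-bit load at any byte address from aligned loads plus a merge."""
    t1 = load_aligned32(mem, p & ~3)        # word containing the first byte
    k = p & 3                               # misalignment in bytes
    if k == 0:
        return t1                           # already aligned: second load not needed
    t2 = load_aligned32(mem, (p + 4) & ~3)  # word containing the last byte
    shift = 8 * k
    # merge: low bytes of t1 followed by high bytes of t2
    return ((t1 << shift) | (t2 >> (32 - shift))) & MASK32
```

For `mem = bytes(range(16))`, `unaligned_load32(mem, 1)` assembles bytes 1 through 4 from the two aligned words at addresses 0 and 4.
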
> I could try custom lowering loads and stores as an interim step and
> detect if one of the operands is really a frameindex (or global
> variable or external variable or ... <insert exhaustive list of edge
> cases here>) added to some offset. Ultimately, custom lowering GEPs
> is probably the better idea (if not a lot more work).
You're really asking about alignment.  You can take alignment into  
consideration when you do this.

The bigger problem that you'll hit is that LSR lowers a lot of  
getelementptr instructions to explicit ptrtoint + add + inttoptr, so  
you won't get the GEP expressions in lots of cases.
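
To make that point concrete, here is a toy Python model (an illustration only; both function names are hypothetical and this is not how LSR is implemented). It shows that a structured "base plus index times element size" address and a strength-reduced running integer add produce the same addresses, so a hook that only looks at GEP-shaped expressions would not see the second form.

```python
def gep_addresses(base, count, elem_size):
    """Addresses as a getelementptr would express them: base + i * elem_size."""
    return [base + i * elem_size for i in range(count)]


def strength_reduced_addresses(base, count, elem_size):
    """The same addresses after a strength-reduction-style rewrite:
    the pointer is handled as a plain integer and bumped each iteration."""
    addrs = []
    p = base                # corresponds to ptrtoint
    for _ in range(count):
        addrs.append(p)     # corresponds to inttoptr, then the memory access
        p += elem_size      # corresponds to the explicit add
    return addrs
```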

Better yet, you won't have to do major surgery on the code generator :)

-Chris
