On Aug 28, 2007, at 7:02 AM, Dan Gohman wrote:

> On Mon, Aug 27, 2007 at 07:26:55PM -0700, Scott Michel wrote:
>> It looks like I need to be able to intercept GEP lowering (in
>> SelectionDAGLowering::visitGetElementPtr) and insert something else
>> other than the shifts and adds. The basic problem is that CellSPU
>> loads and stores on 16-byte boundaries. Consequently, the SPU backend
>> has to do the load or store differently than most normal
>> architectures that have byte-addressable operations.
>
> In TOT, load and store instructions have an alignment attribute which is
> useful for addressing similar needs on other architectures. For example,
> this attribute is used on x86, which also has a bunch of instructions
> which require 16-byte alignment. x86 uses it quite late, after legalize,
> and I don't know if that's appropriate for the CellSPU target, but
> wherever you're doing the lowering, could you use the load and store
> alignment attribute?

I'm aware of this attribute, but it doesn't help. The underlying problem is that CellSPU does not know how to natively perform byte-level addressing. For example, here's an indexed stack instruction to load register $3:

    ldq $3, 4($sp)

In reality, the "4($sp)" doesn't mean what you think it means in the PPC and x86 worlds: that's 4 x 16 -- load quadword (ldq) appends four zero bits to the right of the offset. To get at the 4th byte requires loading from 0($sp) and some vector shuffling. (Dan: Think about older Cray hardware... you'll immediately understand!)

I could try custom lowering loads and stores as an interim step and detect if one of the operands is really a frameindex (or global variable or external variable or ... <insert exhaustive list of edge cases here>) added to some offset. Ultimately, custom lowering GEPs is probably the better idea (if not a lot more work).

If I go ahead and shuffle around some code (no pun intended), would it be worth my while to prototype some refactoring of the legalize/promote/custom mess, since I'll have to touch it anyway for custom GEP lowering?

-scooter
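
The scaled offset Scott describes can be made concrete with a small model. This is only a sketch under stated assumptions: memory is a Python `bytes` object, the base address is already 16-byte aligned, and the names `ldq`, `load_byte`, and `QUADWORD` are illustrative. It follows the thread's description of the instruction, not the actual SPU instruction set or any LLVM API.

```python
QUADWORD = 16  # bytes moved by one load, per the thread's description


def ldq(memory: bytes, base: int, offset: int) -> bytes:
    """Model of the quadword load as described in the thread (assumption).

    The offset counts quadwords: four zero bits are appended to it,
    so offset 4 means byte 64 relative to base.
    """
    addr = base + (offset << 4)
    return memory[addr:addr + QUADWORD]


def load_byte(memory: bytes, base: int, byte_offset: int) -> int:
    """Fetch a single byte using only quadword loads.

    Assumes base is 16-byte aligned. The enclosing quadword is loaded and
    the wanted lane is picked out, standing in for the vector shuffle.
    """
    quad_index, lane = divmod(byte_offset, QUADWORD)
    return ldq(memory, base, quad_index)[lane]
```

With `memory = bytes(range(128))`, `ldq(memory, 0, 4)` returns bytes 64 through 79, whereas `load_byte(memory, 0, 4)` loads the quadword at offset 0 and selects lane 4.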
On Aug 28, 2007, at 6:15 PM, Scott Michel wrote:

> On Aug 28, 2007, at 7:02 AM, Dan Gohman wrote:
>
>> On Mon, Aug 27, 2007 at 07:26:55PM -0700, Scott Michel wrote:
>>> It looks like I need to be able to intercept GEP lowering (in
>>> SelectionDAGLowering::visitGetElementPtr) and insert something else
>>> other than the shifts and adds. The basic problem is that CellSPU
>>> loads and stores on 16-byte boundaries. Consequently, the SPU backend
>>> has to do the load or store differently than most normal
>>> architectures that have byte-addressable operations.
>>
>> In TOT, load and store instructions have an alignment attribute which is
>> useful for addressing similar needs on other architectures. For example,
>> this attribute is used on x86, which also has a bunch of instructions
>> which require 16-byte alignment. x86 uses it quite late, after legalize,
>> and I don't know if that's appropriate for the CellSPU target, but
>> wherever you're doing the lowering, could you use the load and store
>> alignment attribute?
>
> I'm aware of this attribute, but it doesn't help. The underlying
> problem is that CellSPU does not know how to natively perform
> byte-level addressing. For example, here's an indexed stack instruction
> to load register $3:
>
>     ldq $3, 4($sp)
>
> In reality, the "4($sp)" doesn't mean what you think it means in the
> PPC and x86 worlds: that's 4 x 16 -- load quadword (ldq) appends four
> zero bits to the right of the offset. To get at the 4th byte requires
> loading from 0($sp) and some vector shuffling. (Dan: Think about
> older Cray hardware... you'll immediately understand!)

Isn't this just an ISel issue? You just have to ISel unaligned loads and stores to more complex code. It seems very similar to current targets that support indexed and non-indexed addressing modes. In this case it's simply that you have to implement the un-indexed modes in terms of a more complex expression based on an indexed load.

I tackled a similar issue in Ageia's back end in just this way.

--
Christopher Lamb
On Aug 28, 2007, at 7:12 PM, Christopher Lamb wrote:

> On Aug 28, 2007, at 6:15 PM, Scott Michel wrote:
>
>> On Aug 28, 2007, at 7:02 AM, Dan Gohman wrote:
>>
>>> On Mon, Aug 27, 2007 at 07:26:55PM -0700, Scott Michel wrote:
>>>> It looks like I need to be able to intercept GEP lowering (in
>>>> SelectionDAGLowering::visitGetElementPtr) and insert something else
>>>> other than the shifts and adds. The basic problem is that CellSPU
>>>> loads and stores on 16-byte boundaries. Consequently, the SPU backend
>>>> has to do the load or store differently than most normal
>>>> architectures that have byte-addressable operations.
>>>
>>> In TOT, load and store instructions have an alignment attribute which is
>>> useful for addressing similar needs on other architectures. For example,
>>> this attribute is used on x86, which also has a bunch of instructions
>>> which require 16-byte alignment. x86 uses it quite late, after legalize,
>>> and I don't know if that's appropriate for the CellSPU target, but
>>> wherever you're doing the lowering, could you use the load and store
>>> alignment attribute?
>>
>> I'm aware of this attribute, but it doesn't help. The underlying
>> problem is that CellSPU does not know how to natively perform
>> byte-level addressing. For example, here's an indexed stack instruction
>> to load register $3:
>>
>>     ldq $3, 4($sp)
>>
>> In reality, the "4($sp)" doesn't mean what you think it means in the
>> PPC and x86 worlds: that's 4 x 16 -- load quadword (ldq) appends four
>> zero bits to the right of the offset. To get at the 4th byte requires
>> loading from 0($sp) and some vector shuffling. (Dan: Think about
>> older Cray hardware... you'll immediately understand!)
>
> Isn't this just an ISel issue? You have to ISel unaligned load/store's
> to more complex code is all. It seems very simple to current targets
> that support indexed and non-indexed addressing modes. In this case
> it's simply that you have to implement the un-indexed modes in terms
> of a more complex expression based on an indexed load.
>
> I tackled a similar issue in Ageia's back end in just this way.

Will do a little debugging and investigating and get back to you on this... although I suspect the answer is still going to be custom lowering GEPs. I'd like to be wrong!

-scooter
On Aug 28, 2007, at 6:15 PM, Scott Michel wrote:

> On Aug 28, 2007, at 7:02 AM, Dan Gohman wrote:
>
>> On Mon, Aug 27, 2007 at 07:26:55PM -0700, Scott Michel wrote:
>>> It looks like I need to be able to intercept GEP lowering (in
>>> SelectionDAGLowering::visitGetElementPtr) and insert something else
>>> other than the shifts and adds. The basic problem is that CellSPU
>>> loads and stores on 16-byte boundaries. Consequently, the SPU backend
>>> has to do the load or store differently than most normal
>>> architectures that have byte-addressable operations.
>>
>> In TOT, load and store instructions have an alignment attribute which is
>> useful for addressing similar needs on other architectures. For example,
>> this attribute is used on x86, which also has a bunch of instructions
>> which require 16-byte alignment. x86 uses it quite late, after legalize,
>> and I don't know if that's appropriate for the CellSPU target, but
>> wherever you're doing the lowering, could you use the load and store
>> alignment attribute?
>
> I'm aware of this attribute, but it doesn't help. The underlying
> problem is that CellSPU does not know how to natively perform
> byte-level addressing. For example, here's an indexed stack instruction
> to load register $3:
>
>     ldq $3, 4($sp)
>
> In reality, the "4($sp)" doesn't mean what you think it means in the
> PPC and x86 worlds: that's 4 x 16 -- load quadword (ldq) appends four
> zero bits to the right of the offset. To get at the 4th byte requires
> loading from 0($sp) and some vector shuffling. (Dan: Think about
> older Cray hardware... you'll immediately understand!)

I agree with Christopher that this is just an unaligned load issue. Consider a RISC chip with only a 32-bit load that requires the pointer to be aligned. If you want to do an unaligned load, you'd have to do something like this:

    t1 = load p & ~3
    t2 = load (p+4) & ~3
    t3 = merge t1, t2, p & 3

In the AltiVec world this is a very, very common thing to code up. The nice thing about doing this is that the dag combiner can then hack away loads if it discovers that p&3 is zero.

> I could try custom lowering loads and stores as an interim step and
> detect if one of the operands is really a frameindex (or global
> variable or external variable or ... <insert exhaustive list of edge
> cases here>) added to some offset. Ultimately, custom lowering GEPs
> is probably the better idea (if not a lot more work).

You're really asking about alignment. You can take alignment into consideration when you do this. The bigger problem that you'll hit is that LSR lowers a lot of getelementptr instructions to explicit ptrtoint + add + inttoptr, so you won't get the GEP expressions in lots of cases. Better yet, you won't have to do major surgery on the code generator :)

-Chris
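
The three-step load/load/merge pseudocode above can be tried out in a few lines. This is a minimal sketch rather than anything from LLVM: it assumes memory is a Python `bytes` object, treats "merge" as concatenating the two aligned words and slicing at `p & 3`, and the names `aligned_load` and `unaligned_load` are made up for illustration.

```python
WORD = 4  # the hypothetical RISC chip only has a 32-bit aligned load


def aligned_load(memory: bytes, addr: int) -> bytes:
    """The only load the chip offers: addr must be a multiple of 4."""
    assert addr % WORD == 0, "pointer must be aligned"
    return memory[addr:addr + WORD]


def unaligned_load(memory: bytes, p: int) -> bytes:
    """Build a 4-byte load at any address p from two aligned loads."""
    t1 = aligned_load(memory, p & ~3)        # word containing p
    t2 = aligned_load(memory, (p + 4) & ~3)  # the following word
    shift = p & 3                            # byte position of p within t1
    # "merge": take 4 bytes starting at 'shift' from the 8 loaded bytes.
    # When shift is 0 the result is just t1, which is the case where the
    # second load and the merge could be dropped.
    return (t1 + t2)[shift:shift + WORD]
```

For `memory = bytes(range(16))`, `unaligned_load(memory, 5)` yields bytes 5 through 8, and `unaligned_load(memory, 8)` is simply the aligned word at address 8.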