Tzu-Chien Chiu
2005-May-11 02:25 UTC
[LLVMdev] avoid live range overlap of "vector" registers
On Tue May 10 2005, Chris Lattner wrote:
> On Tue, 10 May 2005, Morten Ofstad wrote:
>> Actually, I think it would be better to define the registers as a machine
>> value type for packed float x4, and providing some 'extract' and 'inject'
>> instructions to access individual components... There should also be a
>> 'shuffle' instruction (corresponding to the SSE PSHUF instruction) to change
>> the individual components around.
>
> You're right, that would be a better way to go. To start, I would suggest
> adding extract/inject intrinsics (not instructions) because it is easier.
> If you're interested in doing this, there is documentation for this here:

Quoting <http://llvm.cs.uiuc.edu/docs/LangRef.html#intrinsics>:
"To do this, extend the default implementation of the IntrinsicLowering class to handle the intrinsic. Code generators use this class to lower intrinsics they do not understand to raw LLVM instructions that they do."

But to which LLVM instructions should the extract/inject (or shuffle/pack) intrinsics be lowered? No LLVM instruction allows access to an individual scalar value in a packed value.
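Setting the lowering question aside, it helps to pin down what extract and inject would compute. The following is a minimal Python model, offered as an illustrative assumption rather than LLVM code. The names `F32x4`, `extract` and `inject` are hypothetical. A packed float x4 value is treated as an immutable four-lane tuple (X, Y, Z, W). Because LLVM values are never modified in place, inject returns a new packed value instead of updating a lane.

```python
from typing import Tuple

# Hypothetical model of a <4 x float> value: an immutable tuple of lanes (X, Y, Z, W).
F32x4 = Tuple[float, float, float, float]


def extract(v: F32x4, idx: int) -> float:
    """Read one scalar lane out of a packed value."""
    if not 0 <= idx < 4:
        raise IndexError("lane index must be in 0..3")
    return v[idx]


def inject(v: F32x4, idx: int, s: float) -> F32x4:
    """Return a new packed value equal to v except that lane idx holds s.

    Like an LLVM value, v itself is never modified.
    """
    if not 0 <= idx < 4:
        raise IndexError("lane index must be in 0..3")
    lanes = list(v)
    lanes[idx] = s
    return (lanes[0], lanes[1], lanes[2], lanes[3])


a = (1.0, 2.0, 3.0, 4.0)
b = inject(a, 2, 9.0)
print(extract(a, 2), extract(b, 2))  # 3.0 9.0
```

In this model, a pass that splits a packed value into four scalars would make extract a plain read of one scalar and inject a copy with one scalar replaced.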
Chris Lattner
2005-May-11 04:02 UTC
On Wed, 11 May 2005, Tzu-Chien Chiu wrote:
> On Tue May 10 2005, Chris Lattner wrote:
>> On Tue, 10 May 2005, Morten Ofstad wrote:
>>> Actually, I think it would be better to define the registers as a machine
>>> value type for packed float x4, and providing some 'extract' and 'inject'
>>> instructions to access individual components... There should also be a
>>> 'shuffle' instruction (corresponding to the SSE PSHUF instruction) to change
>>> the individual components around.
>>
>> You're right, that would be a better way to go. To start, I would suggest
>> adding extract/inject intrinsics (not instructions) because it is easier.
>> If you're interested in doing this, there is documentation for this here:
>
> quote <http://llvm.cs.uiuc.edu/docs/LangRef.html#intrinsics>:
> "To do this, extend the default implementation of the
> IntrinsicLowering class to handle the intrinsic. Code generators use
> this class to lower intrinsics they do not understand to raw LLVM
> instructions that they do."
>
> but to which llvm instructions should the extract/inject (or
> shuffle/pack) intrinsics be lowered? llvm instruction does not allow
> to access the individual scalar value in a packed value.

None, that documentation is out of date and doesn't make a ton of sense
for your application. I would suggest that you implement it in the
context of the SelectionDAG framework that all of the code generators
either currently use or are moving to. I updated the documentation
here: http://llvm.cs.uiuc.edu/ChrisLLVM/docs/ExtendingLLVM.html#intrinsic

This will allow you to do something like this:

  %i32v4 = type <4 x uint>
  %f32v4 = type <4 x float>

  declare %f32v4 %swizzle(%f32v4 %In, %i32v4 %Form)

  %G = external global %f32v4

  void %test() {
    %A = load %f32v4* %G
    %B = call %f32v4 %swizzle(%f32v4 %A, %i32v4 <uint 1, uint 1, uint 1, uint 1>) ;; splat XYZW -> YYYY
    store %f32v4 %B, %f32v4* %G
    ret void
  }

... Except using llvm.swizzle instead of 'swizzle'.

Unfortunately the code generator currently does not support packed
types, so this will require some work. However, this certainly is the
closest match for your model.

-Chris

--
http://nondot.org/sabre/
http://llvm.cs.uiuc.edu/
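The example above does not spell out how %Form is interpreted. One plausible reading is that element i of the result is element Form[i] of the input. This reading is an assumption, not something the thread states. Under it, <uint 1, uint 1, uint 1, uint 1> copies lane Y into all four lanes, which is the "splat XYZW -> YYYY" in the comment. The Python sketch below models that reading. The function name `swizzle` mirrors the IR example but is hypothetical here.

```python
from typing import Sequence, Tuple

# Hypothetical model of a <4 x float> value: lanes X=0, Y=1, Z=2, W=3.
F32x4 = Tuple[float, float, float, float]


def swizzle(vec: F32x4, form: Sequence[int]) -> F32x4:
    """Assumed semantics: result lane i is input lane form[i]."""
    if len(form) != 4 or any(not 0 <= f < 4 for f in form):
        raise ValueError("form must hold four lane indices in 0..3")
    return (vec[form[0]], vec[form[1]], vec[form[2]], vec[form[3]])


xyzw = (10.0, 20.0, 30.0, 40.0)
print(swizzle(xyzw, (1, 1, 1, 1)))  # (20.0, 20.0, 20.0, 20.0): the YYYY splat
print(swizzle(xyzw, (3, 2, 1, 0)))  # (40.0, 30.0, 20.0, 10.0): lanes reversed
```

Passing the form as a second operand means one intrinsic covers splats, reversals and any other single-vector permutation. This is also why a constant form is the case a code generator can most easily match to a single machine instruction.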
Morten Ofstad
2005-May-11 07:40 UTC
Chris Lattner wrote:
> None, that documentation is out of date and doesn't make a ton of sense
> for your application. I would suggest that you implement it in the
> context of the SelectionDAG framework that all of the code generators
> either currently use or are moving to. I updated the documentation
> here: http://llvm.cs.uiuc.edu/ChrisLLVM/docs/ExtendingLLVM.html#intrinsic
>
> This will allow you to do something like this:
>
> %i32v4 = type <4 x uint>
>
> %f32v4 = type <4 x float>
>
> declare %f32v4 %swizzle(%f32v4 %In, %i32v4 %Form)
>
> %G = external global %f32v4
>
> void %test() {
>   %A = load %f32v4* %G
>   %B = call %f32v4 %swizzle(%f32v4 %A, %i32v4 <uint 1, uint 1,
> uint 1, uint 1>) ;; splat XYZW -> YYYY
>   store %f32v4 %B, %f32v4* %G
>   ret void
> }
>
> ... Except using llvm.swizzle instead of 'swizzle'.

I much prefer the name chosen in the SSE instruction set: 'shuffle'.

> Unfortunately the code generator currently does not support packed
> types, so this will require some work. However, this certainly is the
> closest match for your model.

This work needs to be done for SSE code generation, which I think would be of interest to several people (including me). Our front-end generates code that uses packed datatypes a lot, and I'm not entirely happy with the current situation using the LowerPacked pass... If SSE code generation were working, we would use LLVM for a lot more. At the moment we have a small runtime library with SSE-optimized functions for things like trilinear interpolation, but the LLVM optimizer can't do very much with these functions since they are just external calls.

m.
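The naming preference has a concrete basis. A swizzle with a constant form can be implemented by a single SSE SHUFPS ("shuffle packed single-precision") instruction when both source operands are the same register. The Python sketch below models SHUFPS lane selection: result lanes 0 and 1 come from the first operand and lanes 2 and 3 from the second. Each lane is chosen by a 2-bit field of an 8-bit immediate, using the same bit layout as the `_MM_SHUFFLE` macro in `xmmintrin.h`. The function names `shuffle_imm` and `shufps` are hypothetical. This is an illustration only and is not taken from LLVM or from the thread.

```python
from typing import Sequence, Tuple

# Hypothetical model of a 128-bit register holding four floats (lanes 0..3).
F32x4 = Tuple[float, float, float, float]


def shuffle_imm(form: Sequence[int]) -> int:
    """Pack four 2-bit lane selectors into an 8-bit immediate.

    Lane 0's selector goes in bits 1:0 and lane 3's in bits 7:6, the same
    layout that _MM_SHUFFLE(f3, f2, f1, f0) produces in C.
    """
    if len(form) != 4 or any(not 0 <= f < 4 for f in form):
        raise ValueError("form must hold four lane indices in 0..3")
    return form[0] | (form[1] << 2) | (form[2] << 4) | (form[3] << 6)


def shufps(a: F32x4, b: F32x4, imm: int) -> F32x4:
    """Model of SHUFPS: result lanes 0-1 are selected from a, lanes 2-3 from b."""
    sel = [(imm >> (2 * i)) & 3 for i in range(4)]
    return (a[sel[0]], a[sel[1]], b[sel[2]], b[sel[3]])


v = (10.0, 20.0, 30.0, 40.0)
imm = shuffle_imm((1, 1, 1, 1))
print(hex(imm))           # 0x55
print(shufps(v, v, imm))  # (20.0, 20.0, 20.0, 20.0): the YYYY splat
```

Because lanes 2 and 3 are read from the second operand, passing the same register twice is what allows one SHUFPS to implement any single-vector swizzle. With two different operands, only forms whose upper half comes from the second operand fit in one instruction.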