Tzu-Chien Chiu
2005-Apr-20 08:03 UTC
[LLVMdev] adding new instructions to support "swizzle" and "writemask"
Hello, everyone:

I am writing a compiler for programmable graphics hardware. Each register of the hardware has four channels, namely 'r', 'b', 'g', 'a', and each channel is a 32-bit floating-point value. This is similar to how the high and low 8 bits of the x86 16-bit general-purpose register "AX" can be referenced individually as "AH" and "AL". What's different is that the hardware further supports "source register swizzle" and "writemask". For example:

    # The following two instructions are equivalent.
    # They cost the same instruction slot and have the same
    # execution time. Four channels are added in parallel.
    add r0, r1, r2
    add r0.xyzw, r1.xyzw, r2.xyzw

    # equivalent to:
    #   r0.x = r1.y + r2.w
    #   r0.z = r1.y + r2.x
    # r0.y and r0.w remain unchanged
    add r0.xz, r1.y, r2.wx

Note that channel y of r1 is replicated in the third instruction.

Detailed documentation:
<http://msdn.microsoft.com/library/default.asp?url=/library/en-us/directx9_c/directx/graphics/reference/AssemblyLanguageShaders/PixelShaders/Registers/Modifiers/SourceRegisterModifiers/PS_Swizzling.asp>

The code must be transformed into SSA form (.ll file). The problem is that no existing LLVM instruction or intrinsic function supports swizzle and writemask.

I have a few solutions:

(1) Treat each channel of a register as an individual SSA variable. For example, the instruction:

    add r0.xz, r1.y, r2.wx

is translated to two LLVM instructions:

    r0_x = add float r1_y, r2_w
    r0_z = add float r1_y, r2_x

Subsequent optimization passes (for example, instruction scheduling) could insert other instructions between these two instructions, and I don't know how they could easily be merged back into one hardware instruction. The resulting machine code would be correct but could be inefficient.

(2) Add new LLVM instructions, "swizzle" and "merge".

    # A swizzle instruction acts like a channel "selector",
    # selecting channels from the temporary registers r1 and r2.
    temp_0 = swizzle.yy r1
    temp_1 = swizzle.wx r2
    temp_3 = add float temp_0, temp_1
    temp_4 = merge.xz float temp_3, r0

But implementing swizzle and merge instructions like this seems non-trivial.

I'd like to know whether anyone knows of possible and easier alternatives.

Thank you.
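As an illustration only, the swizzle and writemask semantics described in the post above can be emulated with a small Python sketch. It follows the post's own example: the i-th written channel takes the i-th swizzled source channel, and a single-channel swizzle is replicated. The helper names `swizzle` and `add_masked` are hypothetical and are not part of LLVM or of the hardware's toolchain; real hardware may expand swizzle patterns differently.

```python
CHANNELS = "xyzw"

def swizzle(reg, pattern):
    """Select channels of a 4-channel register by name, e.g. "wx" -> [reg.w, reg.x]."""
    return [reg[CHANNELS.index(c)] for c in pattern]

def add_masked(dst, mask, src0, sw0, src1, sw1):
    """Emulate `add dst.mask, src0.sw0, src1.sw1` and return the new dst value."""
    a = swizzle(src0, sw0)
    b = swizzle(src1, sw1)
    # Assumption taken from the post: a one-channel swizzle such as ".y"
    # is replicated to every written channel.
    if len(a) == 1:
        a = a * len(mask)
    if len(b) == 1:
        b = b * len(mask)
    out = list(dst)  # channels outside the writemask keep their old value
    for i, c in enumerate(mask):
        out[CHANNELS.index(c)] = a[i] + b[i]
    return out

r0 = [0.0, 0.0, 0.0, 0.0]
r1 = [1.0, 2.0, 3.0, 4.0]
r2 = [10.0, 20.0, 30.0, 40.0]

# add r0.xz, r1.y, r2.wx
result = add_masked(r0, "xz", r1, "y", r2, "wx")
print(result)  # [42.0, 0.0, 12.0, 0.0]
```

Here r0.x receives r1.y + r2.w and r0.z receives r1.y + r2.x, while r0.y and r0.w are left unchanged.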
Chris Lattner
2005-Apr-20 08:26 UTC
[LLVMdev] adding new instructions to support "swizzle" and "writemask"
On Wed, 20 Apr 2005, Tzu-Chien Chiu wrote:
> I am writing a compiler for a programmable graphics hardware. Each
> registers of the hardware has four channels, namely 'r', 'b', 'g',
> 'a', and each channel is a 32-bit floating point. It's similar to the
> high and low 8-bit of an x86 16-bit general purpose register "AX" can
> be individually referenced as "AH" and "AL". What's different is the
> hardware further "source register swizzle" and "writemask". For
> example:

Cool!

> But implementing swizzle and merge instructions like this seems non-trivial.
>
> I'd know if anyone knows if there is possible and ealier alternatives?
> Thank you.

I strongly suggest representing these with the LLVM packed type, e.g. as <4 x double>. This will keep the values together, which you require, and packed values are first-class SSA values:
http://llvm.cs.uiuc.edu/docs/LangRef.html#t_packed

Adding the instructions is possible, but for your purposes, I strongly suggest modelling these as intrinsics, which are much easier to add than new instructions. For info on adding intrinsics, take a look at:
http://llvm.cs.uiuc.edu/docs/ExtendingLLVM.html#intrinsic

-Chris

--
http://nondot.org/sabre/
http://llvm.cs.uiuc.edu/
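To make the suggestion concrete, here is a rough Python sketch of the idea: every register value is one immutable four-wide value (standing in for a packed SSA value), and swizzle and merge are pure functions (standing in for intrinsics). The names `swizzle`, `vadd` and `merge`, and the rule that short swizzle patterns are padded by repeating the last channel, are assumptions made for illustration; they are not existing LLVM intrinsics.

```python
from typing import Tuple

Vec4 = Tuple[float, float, float, float]
CHANNELS = "xyzw"

def swizzle(v: Vec4, pattern: str) -> Vec4:
    """Hypothetical intrinsic: pick channels of v by name.

    Assumption: patterns shorter than four channels are padded by
    repeating the last channel, so the result is always four-wide.
    """
    padded = pattern + pattern[-1] * (4 - len(pattern))
    return tuple(v[CHANNELS.index(c)] for c in padded)

def vadd(a: Vec4, b: Vec4) -> Vec4:
    """Element-wise add of two four-wide values."""
    return tuple(x + y for x, y in zip(a, b))

def merge(new: Vec4, old: Vec4, mask: str) -> Vec4:
    """Hypothetical intrinsic: the i-th masked channel takes new[i]; others keep old."""
    out = list(old)
    for i, c in enumerate(mask):
        out[CHANNELS.index(c)] = new[i]
    return tuple(out)

r0 = (0.5, 0.5, 0.5, 0.5)
r1 = (1.0, 2.0, 3.0, 4.0)
r2 = (10.0, 20.0, 30.0, 40.0)

# Mirrors the sequence from the original post; each name is assigned once.
temp_0 = swizzle(r1, "yy")
temp_1 = swizzle(r2, "wx")
temp_3 = vadd(temp_0, temp_1)
temp_4 = merge(temp_3, r0, "xz")
print(temp_4)  # (42.0, 0.5, 12.0, 0.5)
```

Because each step produces a whole four-wide value, the channels of a register stay together through the sequence, which is the property the packed type is meant to provide.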