thr3ads.net - llvm dev - [LLVMdev] adding new instructions to support "swizzle" and "writemask" [Apr 2005]

Tzu-Chien Chiu

2005-Apr-20 08:03 UTC

[LLVMdev] adding new instructions to support "swizzle" and "writemask"

Hello, everyone:

I am writing a compiler for programmable graphics hardware. Each
register of the hardware has four channels, namely 'r', 'g', 'b',
'a', and each channel is a 32-bit floating-point value. This is
similar to the way the high and low 8 bits of the x86 16-bit
general-purpose register "AX" can be individually referenced as "AH"
and "AL". What's different is that the hardware further supports
"source register swizzle" and "writemask". For example:

# The following two instructions are equivalent.
# They cost the same instruction slot, and have same
# execution time. Four channels are added in parallel.
add r0, r1, r2
add r0.xyzw, r1.xyzw, r2.xyzw

# equivalent to:
# r0.x = r1.y + r2.w
# r0.z = r1.y + r2.x
# r0.y and r0.w remain unchanged
add r0.xz, r1.y, r2.wx

Note that the channel y of r1 is replicated in the third instruction.
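
The semantics described above can be sketched in a few lines of Python.
This is only an illustration, not the hardware's definition: the helper
names (read_swizzled, add_masked) are made up for this sketch, and it
assumes that the i-th channel named in the writemask is paired with the
i-th channel of each swizzled source, which is how the comments in the
example read. The actual hardware rules may differ.

```python
# Sketch only: a register is modelled as a list [x, y, z, w].
# All names here are hypothetical and exist only for illustration.
CHANNELS = "xyzw"


def read_swizzled(reg, swizzle, count):
    """Read `count` channels from `reg` in the order given by `swizzle`.

    A swizzle shorter than `count` repeats its last channel, so ".y"
    behaves like ".yy" when two destination channels are written.
    """
    padded = swizzle + swizzle[-1] * (count - len(swizzle))
    return [reg[CHANNELS.index(c)] for c in padded[:count]]


def add_masked(dst, writemask, src_a, swz_a, src_b, swz_b):
    """Return a copy of `dst` where only the masked channels are replaced."""
    n = len(writemask)
    a = read_swizzled(src_a, swz_a, n)
    b = read_swizzled(src_b, swz_b, n)
    out = list(dst)  # channels outside the writemask keep their old value
    for i, c in enumerate(writemask):
        out[CHANNELS.index(c)] = a[i] + b[i]
    return out


r0 = [0.0, 10.0, 0.0, 20.0]
r1 = [1.0, 2.0, 3.0, 4.0]
r2 = [5.0, 6.0, 7.0, 8.0]

# add r0.xz, r1.y, r2.wx
r0 = add_masked(r0, "xz", r1, "y", r2, "wx")
print(r0)  # [10.0, 10.0, 7.0, 20.0]
```

Only r0.x and r0.z change; r0.y and r0.w keep their previous values.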

Detailed documentation:
<msdn.microsoft.com/library/default.asp?url=/library/en-us/directx9_c/directx/graphics/reference/AssemblyLanguageShaders/PixelShaders/Registers/Modifiers/SourceRegisterModifiers/PS_Swizzling.asp>

The code must be transformed into SSA form (an .ll file). The problem
is that no existing LLVM instruction or intrinsic function supports
swizzle and writemask.

I have a few solutions:

(1) Treat each channel of a register as an individual SSA variable.

This could generate inefficient machine code.

For example, the instruction:

add r0.xz, r1.y, r2.wx

is translated to two LLVM instructions:

r0_x = add float r1_y, r2_w
r0_z = add float r1_y, r2_x

Subsequent optimization passes could insert other instructions between
these two instructions (for example, in the instruction scheduling
pass). I don't know how they could easily be merged back into one
instruction. This could lead to inefficient (though correct) machine
code.
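
To make the concern concrete, here is a rough Python sketch of option
(1). It expands one masked add into per-channel scalar adds, then tries
to fuse them back with a deliberately naive pass that only looks at
adjacent lines. Everything here (scalarize_add, merge_adjacent, the
"mul" line, the register r5) is hypothetical and is not an LLVM API; a
real backend would need proper dependence analysis rather than text
matching.

```python
import re

# Matches lines such as "r0_x = add float r1_y, r2_w".
SCALAR_ADD = re.compile(
    r"(\w+)_([xyzw]) = add float (\w+)_([xyzw]), (\w+)_([xyzw])$"
)


def scalarize_add(dst, writemask, src_a, swz_a, src_b, swz_b):
    """Expand one masked, swizzled add into per-channel scalar adds."""
    n = len(writemask)
    swz_a = swz_a + swz_a[-1] * (n - len(swz_a))  # ".y" acts like ".yy"
    swz_b = swz_b + swz_b[-1] * (n - len(swz_b))
    return [
        f"{dst}_{writemask[i]} = add float {src_a}_{swz_a[i]}, {src_b}_{swz_b[i]}"
        for i in range(n)
    ]


def merge_adjacent(lines):
    """Fuse runs of *adjacent* scalar adds that use the same three registers."""
    out, run = [], None
    for line in lines + [None]:  # the trailing None flushes the last run
        m = SCALAR_ADD.match(line) if line is not None else None
        regs = (m.group(1), m.group(3), m.group(5)) if m else None
        if (run and regs == run["regs"]
                and regs[0] not in regs[1:]          # dst must not also be a source
                and m.group(2) not in run["mask"]):  # each channel written once
            run["mask"] += m.group(2)
            run["a"] += m.group(4)
            run["b"] += m.group(6)
            continue
        if run:
            dst, ra, rb = run["regs"]
            out.append(f"add {dst}.{run['mask']}, {ra}.{run['a']}, {rb}.{run['b']}")
            run = None
        if m:
            run = {"regs": regs, "mask": m.group(2),
                   "a": m.group(4), "b": m.group(6)}
        elif line is not None:
            out.append(line)  # unrelated instruction: passed through as-is
    return out


scalar = scalarize_add("r0", "xz", "r1", "y", "r2", "wx")
interleaved = [scalar[0], "r5_x = mul float r3_x, r4_x", scalar[1]]

print(merge_adjacent(scalar))       # one vector add
print(merge_adjacent(interleaved))  # two vector adds plus the mul
```

With the two scalar adds adjacent, the naive pass recovers a single
"add r0.xz, r1.yy, r2.wx". Once an unrelated instruction is scheduled
between them, it emits two separate adds, which is the inefficiency
described above.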

(2) Add new LLVM instructions, "swizzle" and "merge".

# A swizzle instruction acts like a channel "selector",
# selecting channels from the temporary registers r1 and r2.
temp_0 = swizzle.yy r1
temp_1 = swizzle.wx r2

temp_3 = add float temp_0, temp_1

temp_4 = merge.xz float temp_3, r0

But implementing swizzle and merge instructions like this seems non-trivial.
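
For reference, the intended behaviour of the two proposed operations can
be modelled as pure functions on immutable tuples, which mirrors the SSA
requirement that every operation produces a new value. This is a sketch
under assumptions: the Python functions swizzle, merge and add are
illustrative only, and it assumes that swizzle yields a value with as
many lanes as the pattern names and that merge places those lanes, in
order, into the channels listed in the mask.

```python
# Sketch only: values are immutable tuples, so nothing is updated in place.
CHANNELS = "xyzw"


def swizzle(pattern, value):
    """Return a new tuple whose i-th lane is the channel pattern[i] of value."""
    return tuple(value[CHANNELS.index(c)] for c in pattern)


def add(a, b):
    """Lane-wise add of two equally sized tuples."""
    return tuple(x + y for x, y in zip(a, b))


def merge(mask, new, old):
    """Return a copy of `old` whose channels in `mask` come, in order, from `new`."""
    result = list(old)
    for lane, c in zip(new, mask):
        result[CHANNELS.index(c)] = lane
    return tuple(result)


r0 = (9.0, 9.0, 9.0, 9.0)
r1 = (1.0, 2.0, 3.0, 4.0)
r2 = (5.0, 6.0, 7.0, 8.0)

temp_0 = swizzle("yy", r1)        # (r1.y, r1.y)
temp_1 = swizzle("wx", r2)        # (r2.w, r2.x)
temp_3 = add(temp_0, temp_1)
temp_4 = merge("xz", temp_3, r0)  # r0 itself is left untouched

print(temp_4)  # (10.0, 9.0, 7.0, 9.0)
```

Because merge takes the old r0 as an operand and returns a new value,
the "unchanged" channels are carried over explicitly rather than by
writing into r0.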

I'd like to know whether anyone knows of possible and easier alternatives.
Thank you.

Chris Lattner

2005-Apr-20 08:26 UTC

On Wed, 20 Apr 2005, Tzu-Chien Chiu wrote:
> I am writing a compiler for programmable graphics hardware. Each
> register of the hardware has four channels, namely 'r', 'g', 'b',
> 'a', and each channel is a 32-bit floating-point value. This is
> similar to the way the high and low 8 bits of the x86 16-bit
> general-purpose register "AX" can be individually referenced as "AH"
> and "AL". What's different is that the hardware further supports
> "source register swizzle" and "writemask". For example:

Cool!

> But implementing swizzle and merge instructions like this seems
> non-trivial.
>
> I'd like to know whether anyone knows of possible and easier
> alternatives. Thank you.

I strongly suggest representing these with the LLVM packed type, e.g. as
<4 x double>.  This will keep the values together, which you require, and
packed values are first-class SSA values:
llvm.cs.uiuc.edu/docs/LangRef.html#t_packed

Adding the instructions is possible, but for your purposes, I strongly 
suggest modelling these as intrinsics, which are much easier to add than 
new instructions.  For info on adding intrinsics, take a look at:
llvm.cs.uiuc.edu/docs/ExtendingLLVM.html#intrinsic

-Chris

-- 
nondot.org/sabre
llvm.cs.uiuc.edu
