Way back on Wed Dec 14, 2005, Tzu-Chien Chiu wrote:
> To write a compiler for Microsoft Direct3D shaders for our hardware,
> I have a program which translates the Direct3D shader assembly to LLVM
> assembly. I added several intrinsics for this purpose.
> It's a vector ISA and has some special instructions like:
> * rcp (reciprocal)
> ...
> These operations are very specific to 3D graphics and missing from the
> LLVM instructions.
In case you haven't already noticed, mainline CVS has significantly better
support for adding target-specific intrinsics like this, along with lots of
examples. Mainline CVS now supports all of the AltiVec intrinsics and a
big chunk of the SSE ones (still in progress). Adding support for
Direct3D shaders should be straightforward.
Take a look at llvm/include/llvm/IntrinsicsPowerPC.td for examples.
Adding an intrinsic to LLVM now is just a matter of adding it to the
include/llvm/Intrinsics*.td file and adding a line to your code generator
.td file.
> DSP and other scientific programs do not permute vectors as
> frequently as 3D programs do. Almost every 3D instruction needs to
> permute its operands. For example:
>
> // Each register is a 4-component vector
> // the names of the components are x, y, z, w
> add r0.xy, r1.zxyw, r2.yyyy
>
> The components of r1 and r2 are permuted before the addition, but the
> permutation result is _not_ written back to r1 and r2. 'zxyw' and
> 'yyyy' are the permutation patterns (this is called a 'swizzle').
> 'xy' is called the write mask. The result is written only to the x and
> y components of r0. z and w are left untouched.
To support this, among other things, LLVM now has a new shufflevector
instruction. In particular, you can write the example above as something
like this:
  %r0.1 = ...
  %r1.1 = ...
  %r2.1 = ...
  ; Swizzle the inputs (components numbered x=0, y=1, z=2, w=3)
  %tmp1 = shufflevector <4 x float> %r1.1, <4 x float> undef,
                        <4 x uint> <uint 2, uint 0, uint 1, uint 3>   ; r1.zxyw
  %tmp2 = shufflevector <4 x float> %r2.1, <4 x float> undef,
                        <4 x uint> <uint 1, uint 1, uint 1, uint 1>   ; r2.yyyy
  ; do the add
  %tmp3 = add <4 x float> %tmp1, %tmp2
  ; insert the values according to the write mask.
  %r0.2 = shufflevector <4 x float> %r0.1, <4 x float> %tmp3,
                        <4 x uint> <uint ...>
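
For what it's worth, here is a minimal Python sketch of how the shuffle
masks could be derived from the swizzle patterns and the write mask when
translating the shader assembly. It assumes components are numbered x=0,
y=1, z=2, w=3, and that shufflevector mask indices 0-3 select from the first
operand and 4-7 from the second. The helper names (swizzle_mask, write_mask,
emulate_add) are hypothetical and exist only for this illustration; they are
not part of LLVM.

```python
COMPONENTS = "xyzw"  # assumed numbering: x=0, y=1, z=2, w=3


def swizzle_mask(pattern):
    """Shuffle mask that applies a four-letter swizzle such as 'zxyw'."""
    return [COMPONENTS.index(c) for c in pattern]


def write_mask(written):
    """Mask for merging (old_dest, result).

    Lanes named in `written` are taken from the result (indices 4-7);
    all other lanes keep the old destination value (indices 0-3).
    """
    return [4 + i if c in written else i for i, c in enumerate(COMPONENTS)]


def shufflevector(v1, v2, mask):
    """Toy model of shufflevector on two 4-element vectors."""
    both = list(v1) + list(v2)
    return [both[i] for i in mask]


def emulate_add(r0, r1, r2, dst="xy", swz1="zxyw", swz2="yyyy"):
    """Emulate `add r0.<dst>, r1.<swz1>, r2.<swz2>` with three shuffles."""
    undef = [None] * 4  # stands in for the undef second operand
    tmp1 = shufflevector(r1, undef, swizzle_mask(swz1))
    tmp2 = shufflevector(r2, undef, swizzle_mask(swz2))
    tmp3 = [a + b for a, b in zip(tmp1, tmp2)]
    return shufflevector(r0, tmp3, write_mask(dst))
```

Under those assumptions, swizzle_mask("zxyw") gives [2, 0, 1, 3], and the
final shuffle only replaces the x and y lanes of the old r0 value.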
If you are using a SelectionDAG-based code generator, pattern matching
this as an add with two shuffle inputs and a shuffle result should be
straightforward.
Hopefully this helps!
-Chris
--
http://nondot.org/sabre/
http://llvm.org/