thr3ads.net - llvm dev - [LLVMdev] Vector LLVM extension v.s. DirectX Shaders [Dec 2005]


Tzu-Chien Chiu

2005-Dec-15 04:52 UTC

[LLVMdev] Vector LLVM extension v.s. DirectX Shaders

Dear all:

To write a compiler for Microsoft Direct3D shaders for our hardware,
I have a program that translates Direct3D shader assembly to LLVM
assembly. I added several intrinsics for this purpose.
Direct3D shader assembly is a vector ISA and has some special instructions like:
* rcp (reciprocal)
* frc (the fractional portion of each input component)
* dp4 (dot product)
* exp (exponential)
* max, min
These operations are very specific to 3D graphics and are missing from the
LLVM instructions. The vector LLVM extension is not enough to compile
Direct3D shaders.

The resulting LLVM assembly is assembled by llvm-as and passed directly
to llc. The frontend is missing from the picture. The reasons are
simple. The transformations/optimizations in the frontend
1) don't understand the intrinsics
2) don't deal with packed types (it's not vectorized)

I am considering adding new instructions, instead of intrinsics, to LLVM.
However, there are two options.

In the vector LLVM extension, there are dedicated instructions for
manipulating vectors, like 'extract', 'combine', and 'permute'. DSP
and other scientific programs do not permute vectors as frequently
as 3D programs do. Almost every 3D instruction needs to permute its
operands. For example:

// Each register is a 4-component vector
// the names of the components are x, y, z, w
add r0.xy, r1.zxyw, r2.yyyy

The components of r1 and r2 are permuted before the addition, but the
permutation result is _not_ written back to r1 and r2. 'zxyw' and
'yyyy' are the permutation patterns (they are called 'swizzles').

'xy' is called the write mask. The result is written only to the x and y
components of r0; z and w are left untouched.
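
The swizzle and write-mask semantics described above can be modelled in a few lines of Python. This is only a minimal sketch: the helper names `swizzle` and `masked_add` and the register values are illustrative assumptions, not part of Direct3D or LLVM.

```python
COMPONENTS = "xyzw"

def swizzle(reg, pattern):
    """Return a new 4-vector whose lanes are picked from reg by pattern."""
    return [reg[COMPONENTS.index(c)] for c in pattern]

def masked_add(dst, mask, a, a_swz, b, b_swz):
    """Model `add dst.mask, a.a_swz, b.b_swz`; only masked lanes of dst change."""
    lhs = swizzle(a, a_swz)   # temporary copy, a itself is not modified
    rhs = swizzle(b, b_swz)   # temporary copy, b itself is not modified
    out = list(dst)
    for c in mask:
        i = COMPONENTS.index(c)
        out[i] = lhs[i] + rhs[i]
    return out

# Hypothetical register contents, chosen only for illustration.
r0 = [0.0, 0.0, 7.0, 8.0]
r1 = [1.0, 2.0, 3.0, 4.0]
r2 = [10.0, 20.0, 30.0, 40.0]

# add r0.xy, r1.zxyw, r2.yyyy
r0 = masked_add(r0, "xy", r1, "zxyw", r2, "yyyy")
```

Here r0.x becomes r1.z + r2.y and r0.y becomes r1.x + r2.y, while r0.z, r0.w, r1, and r2 keep their previous values.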

_Almost every_ instruction specifies different write masks and
swizzles. There will be a lot of extract, combine, and permute LLVA
instructions. This may make it difficult for transformations to match a
certain pattern in the program semantic tree. For example, to match
'mul' and 'add' and merge them into a single 'mad' (multiply-and-add)
instruction. For another example, to vectorize several scalar
operations like:

add r0.xy, r1.xy, r2.xy
add r0.zw, r1.zw, r2.zw

to:

add r0.xyzw, r1.xyzw, r2.xyzw
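
One way to picture this merge is the Python sketch below. It rests on assumptions that are not in the original message: instructions are modelled as a hypothetical `Instr` tuple, swizzles are always written in full four-component form (so `r1.xy` and `r1.zw` above are both modelled as the identity swizzle `xyzw` restricted by the write mask), and the two instructions are adjacent.

```python
from typing import NamedTuple, Optional

COMPONENTS = "xyzw"

class Instr(NamedTuple):
    op: str
    dst: str
    mask: str      # write mask, e.g. "xy"
    srcs: tuple    # ((register, four-letter swizzle), ...)

def merge(first: Instr, second: Instr) -> Optional[Instr]:
    """Merge two adjacent instructions into one, or return None if not legal."""
    if first.op != second.op or first.dst != second.dst:
        return None
    if set(first.mask) & set(second.mask):
        return None  # overlapping lanes: the second write would win
    regs1 = [reg for reg, _ in first.srcs]
    regs2 = [reg for reg, _ in second.srcs]
    if regs1 != regs2 or first.dst in regs2:
        return None  # different inputs, or the second reads the first's result
    merged_srcs = []
    for (reg, swz1), (_, swz2) in zip(first.srcs, second.srcs):
        lanes = []
        for i, c in enumerate(COMPONENTS):
            if c in first.mask:
                lanes.append(swz1[i])
            elif c in second.mask:
                lanes.append(swz2[i])
            else:
                lanes.append(c)  # lane is never written, any component works
        merged_srcs.append((reg, "".join(lanes)))
    mask = "".join(c for c in COMPONENTS if c in first.mask + second.mask)
    return Instr(first.op, first.dst, mask, tuple(merged_srcs))

a = Instr("add", "r0", "xy", (("r1", "xyzw"), ("r2", "xyzw")))
b = Instr("add", "r0", "zw", (("r1", "xyzw"), ("r2", "xyzw")))
merged = merge(a, b)
```

With these inputs `merged` describes `add r0.xyzw, r1.xyzw, r2.xyzw`. A real pass would also have to prove that nothing between the two instructions reads or writes the registers involved.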

If the write mask and swizzles were 'supported' in each instruction
per se, the syntax/signature of LLVM assembly would need to be changed
from:

<result> = add <ty> <var1>, <var2>

to:

<result>.<writemask> = add <ty> <var1>.<swizzle>, <var2>.<swizzle>

This could make it easier for the frontend transformations to
recognize/identify the real program semantics, without the additional
extract, combine, and permute instruction sequences.

From the point of view of writing frontend vector transformations and
optimizations, which method is better?
1. Follow the vector LLVM extension style, using dedicated instructions
to manipulate the vectors.
2. Support write mask and swizzle (permute) as part of the instruction syntax.

I have worked on the backend and don't have much experience with the frontend.
Thank you all.

--
Tzu-Chien Chiu - XGI Technology, Inc.
URL: http://www.csie.nctu.edu.tw/~jwchiu/

Chris Lattner

2005-Dec-15 22:05 UTC

On Thu, 15 Dec 2005, Tzu-Chien Chiu wrote:
> To write a compiler for Microsoft Direct3D shaders for our hardware,
> I have a program that translates Direct3D shader assembly to LLVM
> assembly. I added several intrinsics for this purpose.
> Direct3D shader assembly is a vector ISA and has some special instructions like:
> * rcp (reciprocal)
> * frc (the fractional portion of each input component)
> * dp4 (dot product)
> * exp (exponential)
> * max, min
> These operations are very specific to 3D graphics and are missing from the
> LLVM instructions. The vector LLVM extension is not enough to compile
> Direct3D shaders.
ok.
> The resulting LLVM assembly is assembled by llvm-as and passed directly
> to llc. The frontend is missing from the picture. The reasons are
> simple. The transformations/optimizations in the frontend
> 1) don't understand the intrinsics
> 2) don't deal with packed types (it's not vectorized)
>
> I am considering adding new instructions, instead of intrinsics, to LLVM.
> However, there are two options.
I think it makes sense for some of these (e.g. permute and insert/extract 
element) to be native instructions, but other less used ones can probably 
stay intrinsics.  In general, intrinsics are far easier to add than 
instructions.
> In the vector LLVM extension, there are dedicated instructions for
> manipulating vectors, like 'extract', 'combine', and 'permute'. DSP
> and other scientific programs do not permute vectors as frequently
> as 3D programs do.
Yes, I think that Rob is interested in porting these instructions to
mainline LLVM; he just hasn't had time so far.
> Almost every 3D instruction needs to permute its
> operands. For example:
>
>  // Each register is a 4-component vector
>  // the names of the components are x, y, z, w
>  add r0.xy, r1.zxyw, r2.yyyy
>
> The components of r1 and r2 are permuted before the addition, but the
> permutation result is _not_ written back to r1 and r2. 'zxyw' and
> 'yyyy' are the permutation patterns (they are called 'swizzles').
Yup.  This is a matter of folding the permute into the add instruction as
part of instruction selection.
> 'xy' is called the write mask. The result is written only to the x and y
> components of r0; z and w are left untouched.
yup.
> _Almost every_ instruction specifies different write masks and
> swizzles. There will be a lot of extract, combine, and permute LLVA
> instructions. This may make it difficult for transformations to match a
> certain pattern in the program semantic tree. For example, to match
> 'mul' and 'add' and merge them into a single 'mad' (multiply-and-add)
> instruction.
I don't really buy that.  Why do you think it will be hard?  This is 
exactly what pattern-matching instruction selectors do.
> For another example, to vectorize several scalar
> operations like:
>
>  add r0.xy, r1.xy, r2.xy
>  add r0.zw, r1.zw, r2.zw
>
> to:
>
>  add r0.xyzw, r1.xyzw, r2.xyzw
>
> If the write mask and swizzles are 'supported' in each instruction
> per se.
This is a separate transformation that should be done in the DAG
combiner (not in instruction selection), but it shouldn't conceptually be a
problem.
> The syntax/signature of LLVM assembly will need to be changed
> from:
>
>   <result> = add <ty> <var1>, <var2>
>
> to:
>   <result>.<writemask> = add <ty> <var1>.<swizzle>, <var2>.<swizzle>
>
> This could be easier for the frontend transformations to
> recognize/identify the real program semantics, without the additional
> extract, combine, and permute instruction sequences.
I don't agree.  Exposing separate operations as separate pieces should be 
better.  Of course you are welcome to make changes to support this in your 
local tree, but we won't be able to accept it back to mainline: it's far
too domain specific, and it can be achieved with explicit permute 
instructions.
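
To illustrate how a swizzled, write-masked instruction might be expressed with explicit operations, here is a minimal Python sketch. The operation names 'permute' and 'combine' come from this thread, but the textual syntax, the `lower` helper, and the temporary names are assumptions made for illustration; this is not actual LLVM assembly, and it works at the register level rather than in SSA form.

```python
COMPONENTS = "xyzw"

def lower(op, dst, mask, srcs):
    """Expand `op dst.mask, src.swz, ...` into explicit permute/op/combine steps."""
    out = []
    operands = []
    for n, (reg, swz) in enumerate(srcs):
        if swz == COMPONENTS:
            operands.append(reg)  # identity swizzle: no permute needed
        else:
            tmp = f"t{n}"
            indices = [COMPONENTS.index(c) for c in swz]
            out.append(f"{tmp} = permute {reg}, {indices}")
            operands.append(tmp)
    if mask == COMPONENTS:
        out.append(f"{dst} = {op} {', '.join(operands)}")
    else:
        # Compute all four lanes, then blend only the masked lanes into dst.
        out.append(f"v = {op} {', '.join(operands)}")
        lanes = [COMPONENTS.index(c) for c in mask]
        out.append(f"{dst} = combine {dst}, v, {lanes}")
    return out

# add r0.xy, r1.zxyw, r2.yyyy
steps = lower("add", "r0", "xy", [("r1", "zxyw"), ("r2", "yyyy")])
for line in steps:
    print(line)
```

For the example from the original message this yields two permutes, one full-width add, and one combine that writes only lanes 0 and 1 of r0. Folding such a sequence back into a single target instruction would then be the instruction selector's job.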
> From the point of view of writing frontend vector transformations and
> optimizations, which method is better?
> 1. Follow the vector LLVM extension style, using dedicated instructions
> to manipulate the vectors.
> 2. Support write mask and swizzle (permute) as part of the instruction syntax.
>
> I have worked on the backend and don't have much experience with the frontend.
> Thank you all.
I think that #1 is the better approach.

-Chris

-- 
http://nondot.org/sabre/
http://llvm.org/

Robert L. Bocchino Jr.

2005-Dec-15 22:52 UTC

> Yes, I think that rob is interested in porting these instructions  
> to mainline LLVM, he just hasn't had time so far.
That's right, I've been busy finishing up the semester, but I'll get
to this soon.

Rob

Robert L. Bocchino Jr.
Ph.D. Student, Computer Science
University of Illinois, Urbana-Champaign

Tzu-Chien Chiu

2005-Dec-16 06:10 UTC

2005/12/16, Chris Lattner <sabre at nondot.org>:
> > _Almost every_ instruction specifies different write masks and
> > swizzles. There will be a lot of extract, combine, and permute LLVA
> > instructions. This may make it difficult for transformations to match a
> > certain pattern in the program semantic tree. For example, to match
> > 'mul' and 'add' and merge them into a single 'mad' (multiply-and-add)
> > instruction.
>
> I don't really buy that.  Why do you think it will be hard?  This is
> exactly what pattern-matching instruction selectors do.
For example, to match a 'mul' followed by an 'add' as a single
'mad', there could be several patterns because of the vector
manipulation instructions.

  y = A * B + C

They could simply be:

  x = mul A, B
  y = add x, C

or:
  x = mul A, B
  x_ = permute x
  y = add x_, C

or:
  x = mul A, B
  x_ = permute x
  C_ = permute C
  y = add x_, C_

or:
  x = mul A, B
  C_ = permute C
  y = add x, C_

or:
  x = mul A, B
  y_ = add x, C
  y = apply_write_mask y (select and combine)

All of the above instruction sequences could be selected as the
following machine instruction:

  // swizzle = permute
  y.writemask = mad A.permute, B.permute, C.permute

This is probably not a good example of a frontend transformation, but
I just want to point out the potential difficulty of implementing a
frontend pass.
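
The five shapes listed above can be handled by one matcher that strips permutes before looking for the mul/add pair. The Python sketch below assumes a hypothetical expression-tree encoding as nested tuples (`"add"`, `"mul"`, `"permute"`, `"write_mask"`) and assumes the mul result has no other users; it is an illustration, not LLVM's instruction selector.

```python
IDENTITY = "xyzw"

def compose(inner, outer):
    """Swizzle equal to applying `inner` first, then `outer`."""
    return "".join(inner[IDENTITY.index(c)] for c in outer)

def peel(node):
    """Strip nested permutes; return (core node, accumulated swizzle)."""
    swz = IDENTITY
    while isinstance(node, tuple) and node[0] == "permute":
        _, node, pattern = node
        swz = compose(pattern, swz)
    return node, swz

def match_mad(node):
    """Return ('mad', mask, (A, swz), (B, swz), (C, swz)), or None if no match."""
    mask = IDENTITY
    if isinstance(node, tuple) and node[0] == "write_mask":
        _, node, mask = node
    if not (isinstance(node, tuple) and node[0] == "add"):
        return None
    _, lhs, rhs = node
    for prod, addend in ((lhs, rhs), (rhs, lhs)):
        core, p = peel(prod)
        if isinstance(core, tuple) and core[0] == "mul":
            a, sa = peel(core[1])
            b, sb = peel(core[2])
            c, sc = peel(addend)
            # mul is elementwise, so permute(A * B) == permute(A) * permute(B).
            return ("mad", mask, (a, compose(sa, p)), (b, compose(sb, p)), (c, sc))
    return None

mul = ("mul", "A", "B")
shapes = [
    ("add", mul, "C"),
    ("add", ("permute", mul, "zxyw"), "C"),
    ("add", ("permute", mul, "zxyw"), ("permute", "C", "yyyy")),
    ("add", mul, ("permute", "C", "yyyy")),
    ("write_mask", ("add", mul, "C"), "xy"),
]
results = [match_mad(s) for s in shapes]
```

Each of the five trees yields a single 'mad' record whose operand swizzles and write mask absorb the surrounding permute and mask operations.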

--
Tzu-Chien Chiu - XGI Technology, Inc.
URL: http://www.csie.nctu.edu.tw/~jwchiu/
