search for: zxyw

Displaying 7 results from an estimated 7 matches for "zxyw".

2005 Dec 15
3
[LLVMdev] Vector LLVM extension v.s. DirectX Shaders
...and 'permute'. DSP and other scientific programs do not permute vectors as frequently as 3D programs do. Almost every 3D instruction needs to permute its operands. For example: // Each register is a 4-component vector // the names of the components are x, y, z, w add r0.xy, r1.zxyw, r2.yyyy The components of r1 and r2 are permuted before the addition, but the permutation result is _not_ written back to r1 and r2. 'zxyw' and 'yyyy' are the permutation patterns (they are called 'swizzle'). 'xy' is called the write mask. The result is written t...
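The swizzle and write-mask semantics described in this snippet can be sketched in Python. This is only an illustration under stated assumptions: registers are modelled as 4-element lists in x, y, z, w order, the helper names (swizzle, masked_write, add_instr) are hypothetical, and the register values are invented for the example.

```python
# Minimal sketch of "add r0.xy, r1.zxyw, r2.yyyy".
# All helper names and register values are hypothetical illustrations.
COMPONENTS = "xyzw"


def swizzle(reg, pattern):
    """Return a new 4-vector whose i-th component is reg[pattern[i]].

    The source register itself is left unchanged.
    """
    return [reg[COMPONENTS.index(c)] for c in pattern]


def masked_write(dst, value, mask):
    """Return a copy of dst where only the components named in mask are replaced."""
    out = list(dst)
    for c in mask:
        i = COMPONENTS.index(c)
        out[i] = value[i]
    return out


def add_instr(dst, mask, src_a, swz_a, src_b, swz_b):
    """Swizzle both operands, add component-wise, then apply the write mask."""
    a = swizzle(src_a, swz_a)
    b = swizzle(src_b, swz_b)
    total = [x + y for x, y in zip(a, b)]
    return masked_write(dst, total, mask)


r0 = [0.0, 0.0, 0.0, 0.0]
r1 = [1.0, 2.0, 3.0, 4.0]
r2 = [10.0, 20.0, 30.0, 40.0]

# add r0.xy, r1.zxyw, r2.yyyy
r0 = add_instr(r0, "xy", r1, "zxyw", r2, "yyyy")
print(r0)  # [23.0, 21.0, 0.0, 0.0]
```

Only r0.x and r0.y change; r0.z and r0.w keep their old values, and r1 and r2 are not modified by the swizzle.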
2006 Apr 08
0
[LLVMdev] RE: LLVM extension v.s. DirectX Shaders
...and other scientific programs do not permute vectors as > frequently as 3D programs do. Almost every 3D instruction needs to > permute its operands. For example: > > // Each register is a 4-component vector > // the names of the components are x, y, z, w > add r0.xy, r1.zxyw, r2.yyyy > > The components of r1 and r2 are permuted before the addition, but the > permutation result is _not_ written back to r1 and r2. 'zxyw' and > 'yyyy' are the permutation patterns (they are called 'swizzle'). > 'xy' is called the write mask....
2009 Feb 16
2
[LLVMdev] Modeling GPU vector registers, again (with my implementation)
Evan Cheng-2 wrote: > > Well, how many possible permutations are there? Is it possible to > model each case as a separate physical register? > > Evan > I don't think so. There are 4x4x4x4 = 256 permutations. For example: * xyzw: default * zxyw * yyyy: splat Even if I can model each of these 256 cases as a separate physical register, how can I model the use of r0.xyzw in the following example: // dp4 = dot product 4-element dp4 r0.x, r1, r2 dp4 r0.y, r3, r4 dp4 r0.z, r5, r6 dp4 r0.w, r7, r8 sub r5, r0.xyzw, r6 -- View this message in c...
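The 4x4x4x4 = 256 count in this snippet can be checked with a short Python sketch. It assumes that each of the four swizzle slots independently selects one of x, y, z, w; the variable names are hypothetical. It also shows that only 24 of the 256 patterns are permutations in the strict sense, since patterns such as 'yyyy' repeat a component.

```python
from itertools import permutations, product

COMPONENTS = "xyzw"

# Every swizzle pattern: each of the 4 slots picks any of the 4 components.
all_swizzles = ["".join(p) for p in product(COMPONENTS, repeat=4)]

# Strict permutations: each component appears exactly once.
strict_permutations = ["".join(p) for p in permutations(COMPONENTS)]

print(len(all_swizzles))         # 256
print(len(strict_permutations))  # 24
print("zxyw" in strict_permutations, "yyyy" in strict_permutations)  # True False
```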
2005 Dec 15
0
[LLVMdev] Vector LLVM extension v.s. DirectX Shaders
...sted in porting these instructions to mainline LLVM, he just hasn't had time so far. > Almost every 3D instruction needs to permute its > operands. For example: > > // Each register is a 4-component vector > // the names of the components are x, y, z, w > add r0.xy, r1.zxyw, r2.yyyy > > The components of r1 and r2 are permuted before the addition, but the > permutation result is _not_ written back to r1 and r2. 'zxyw' and > 'yyyy' are the permutation patterns (they are called 'swizzle'). Yup. This is a matter of folding the permu...
2009 Feb 16
0
[LLVMdev] Modeling GPU vector registers, again (with my implementation)
...ain (with my implementation) Evan Cheng-2 wrote: > > Well, how many possible permutations are there? Is it possible to > model each case as a separate physical register? > > Evan > I don't think so. There are 4x4x4x4 = 256 permutations. For example: * xyzw: default * zxyw * yyyy: splat Even if I can model each of these 256 cases as a separate physical register, how can I model the use of r0.xyzw in the following example: // dp4 = dot product 4-element dp4 r0.x, r1, r2 dp4 r0.y, r3, r4 dp4 r0.z, r5, r6 dp4 r0.w, r7, r8 sub r5, r0.xyzw, r6 -- View this message in c...
2009 Feb 13
0
[LLVMdev] Modeling GPU vector registers, again (with my implementation)
On Feb 13, 2009, at 9:47 AM, Alex wrote: > It seems to me that LLVM's sub-register model does not fit the following > hardware architecture. > > All instructions of the hardware are vector instructions. Each > register contains > four 32-bit FP sub-registers. They are called r0.x, r0.y, r0.z, r0.w. > > Most instructions write more than one element in this way: > > mul
2009 Feb 13
3
[LLVMdev] Modeling GPU vector registers, again (with my implementation)
It seems to me that LLVM's sub-register model does not fit the following hardware architecture. All instructions of the hardware are vector instructions. Each register contains four 32-bit FP sub-registers. They are called r0.x, r0.y, r0.z, r0.w. Most instructions write more than one element in this way: mul r0.xyw, r1, r2 add r0.z, r3, r4 sub r5, r0, r1 Notice that the four elements of r0 are written
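The three-instruction sequence in this snippet (mul r0.xyw, add r0.z, then a full read of r0) can be traced with a small Python sketch. It assumes registers are 4-element lists in x, y, z, w order and that an unmasked destination writes all four components; the write_masked helper and all register values are hypothetical.

```python
# Sketch of two partial writes to r0 followed by a full read.
# Helper name and register values are hypothetical illustrations.
COMPONENTS = "xyzw"


def write_masked(dst, value, mask):
    """Return a copy of dst where only the components named in mask are replaced."""
    out = list(dst)
    for c in mask:
        i = COMPONENTS.index(c)
        out[i] = value[i]
    return out


r0 = [None, None, None, None]  # not yet defined
r1 = [1.0, 2.0, 3.0, 4.0]
r2 = [2.0, 2.0, 2.0, 2.0]
r3 = [10.0, 20.0, 30.0, 40.0]
r4 = [1.0, 1.0, 1.0, 1.0]

# mul r0.xyw, r1, r2  (defines x, y, w; z is still undefined)
r0 = write_masked(r0, [a * b for a, b in zip(r1, r2)], "xyw")

# add r0.z, r3, r4    (defines the remaining z component)
r0 = write_masked(r0, [a + b for a, b in zip(r3, r4)], "z")

# sub r5, r0, r1      (reads all four components of r0)
r5 = [a - b for a, b in zip(r0, r1)]

print(r0)  # [2.0, 4.0, 31.0, 8.0]
print(r5)  # [1.0, 2.0, 28.0, 4.0]
```

The value of r0 read by the sub is assembled from two different instructions, which is the situation a per-register liveness model has to account for.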