search for: tp22001613p22034856

Displaying 4 results from an estimated 4 matches for "tp22001613p22034856".

2009 Feb 16
2
[LLVMdev] Modeling GPU vector registers, again (with my implementation)
...I model the use of r0.xyzw in the following example:
    // dp4 = dot product 4-element
    dp4 r0.x, r1, r2
    dp4 r0.y, r3, r4
    dp4 r0.z, r5, r6
    dp4 r0.w, r7, r8
    sub r5, r0.xyzw, r6
--
View this message in context: http://www.nabble.com/Modeling-GPU-vector-registers%2C-again-%28with-my-implementation%29-tp22001613p22034856.html
Sent from the LLVM - Dev mailing list archive at Nabble.com.
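The dp4/sub example in this snippet has four instructions that each define one component of r0, followed by one instruction that reads r0.xyzw as a whole. A minimal Python sketch of that per-component def-use relationship follows. It is only an illustration, not LLVM code and not the poster's implementation: the names `parse_operand`, `component_deps`, and `PROGRAM` are hypothetical, and the sketch assumes that an operand without a suffix means all four components.

```python
def parse_operand(text):
    """Split 'r0.xy' into ('r0', 'xy'); a bare 'r1' is assumed to mean all of xyzw."""
    reg, _, comps = text.strip().partition(".")
    return reg, comps or "xyzw"


def component_deps(program):
    """For each instruction, list the earlier instructions that wrote
    any register component it reads (true dependences only)."""
    last_writer = {}  # (register, component) -> index of the last writing instruction
    deps = []
    for index, line in enumerate(program):
        _opcode, operands = line.split(None, 1)
        dst, *srcs = [parse_operand(o) for o in operands.split(",")]
        needed = set()
        for reg, comps in srcs:
            for comp in comps:
                if (reg, comp) in last_writer:
                    needed.add(last_writer[(reg, comp)])
        deps.append(sorted(needed))
        # Record the write only after the reads, so an instruction
        # never depends on itself.
        dst_reg, dst_comps = dst
        for comp in dst_comps:
            last_writer[(dst_reg, comp)] = index
    return deps


# The instruction sequence from the snippet, one string per instruction.
PROGRAM = [
    "dp4 r0.x, r1, r2",
    "dp4 r0.y, r3, r4",
    "dp4 r0.z, r5, r6",
    "dp4 r0.w, r7, r8",
    "sub r5, r0.xyzw, r6",
]

for index, needed in enumerate(component_deps(PROGRAM)):
    print(index, PROGRAM[index], "depends on", needed)
```

Under these assumptions the final `sub` depends on all four `dp4` instructions, which is the situation a register model has to express when a full-register use is fed by several partial definitions.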
2009 Feb 16
0
[LLVMdev] Modeling GPU vector registers, again (with my implementation)
...I model the use of r0.xyzw in the following example:
    // dp4 = dot product 4-element
    dp4 r0.x, r1, r2
    dp4 r0.y, r3, r4
    dp4 r0.z, r5, r6
    dp4 r0.w, r7, r8
    sub r5, r0.xyzw, r6
--
View this message in context: http://www.nabble.com/Modeling-GPU-vector-registers%2C-again-%28with-my-implementation%29-tp22001613p22034856.html
Sent from the LLVM - Dev mailing list archive at Nabble.com.
_______________________________________________
LLVM Developers mailing list
LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu
http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
2009 Feb 13
0
[LLVMdev] Modeling GPU vector registers, again (with my implementation)
On Feb 13, 2009, at 9:47 AM, Alex wrote:
> It seems to me that LLVM sub-register is not for the following
> hardware architecture.
>
> All instructions of a hardware are vector instructions. All
> registers contains
> 4 32-bit FP sub-registers. They are called r0.x, r0.y, r0.z, r0.w.
>
> Most instructions write more than one elements in this way:
>
> mul
2009 Feb 13
3
[LLVMdev] Modeling GPU vector registers, again (with my implementation)
It seems to me that LLVM's sub-register support is not meant for the following hardware architecture. All instructions of the hardware are vector instructions. Every register contains four 32-bit FP sub-registers; for r0 they are called r0.x, r0.y, r0.z, r0.w. Most instructions write more than one element, in this way:
    mul r0.xyw, r1, r2
    add r0.z, r3, r4
    sub r5, r0, r1
Notice that the four elements of r0 are written
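The write-mask behaviour in this snippet (mul writes r0.x, r0.y and r0.w, add fills in r0.z, and sub then reads all of r0) can be sketched in a few lines of Python. This is an illustrative model only, not LLVM code and not the poster's implementation. The helpers `masked_write`, `vec_op`, and `run_example`, the initial register values, and the choice to treat unwritten components as 0.0 are all assumptions made for the example.

```python
import operator

COMPONENTS = "xyzw"


def masked_write(regs, dst, mask, values):
    """Store values into register dst, touching only the components named
    in mask. Components outside the mask keep their old contents
    (0.0 if the register was never written; an assumption of this sketch)."""
    current = list(regs.get(dst, [0.0] * 4))
    for i, comp in enumerate(COMPONENTS):
        if comp in mask:
            current[i] = values[i]
    regs[dst] = current


def vec_op(op, a, b):
    """Apply a binary operator component-wise to two 4-element registers."""
    return [op(x, y) for x, y in zip(a, b)]


def run_example():
    # Hypothetical input values, chosen so the results are easy to check by hand.
    regs = {
        "r1": [1.0, 2.0, 3.0, 4.0],
        "r2": [2.0, 2.0, 2.0, 2.0],
        "r3": [10.0, 20.0, 30.0, 40.0],
        "r4": [1.0, 1.0, 1.0, 1.0],
    }
    # mul r0.xyw, r1, r2
    masked_write(regs, "r0", "xyw", vec_op(operator.mul, regs["r1"], regs["r2"]))
    # add r0.z, r3, r4
    masked_write(regs, "r0", "z", vec_op(operator.add, regs["r3"], regs["r4"]))
    # sub r5, r0, r1  (no mask: all four components are written)
    masked_write(regs, "r5", "xyzw", vec_op(operator.sub, regs["r0"], regs["r1"]))
    return regs


regs = run_example()
print("r0 =", regs["r0"])
print("r5 =", regs["r5"])
```

In this model r0 only holds a complete value after both the mul and the add have executed, so the sub depends on both partial writes.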