thr3ads.net - search: "4x4x4x4"

Displaying 5 results from an estimated 5 matches for "4x4x4x4".

2003 Sep 25

apply on a 4D array

I am trying to multiply a 3D array of 4x4x4 by the 4 3D arrays of a 4D array with dimensions 4x4x4x4 (the last dimension being the one that I want to split by).

(4x4x4 array)
> hiaAry
, , a1

          i1       i2      i3        i4
h1 9.5936098 6.001040 0.08772 0.3138600
h2 1.2003500 1.454570 2.79248 0.0000000
h3 0.1346500 0.201220 0.39256 0.5464000
h4 0.0109000 0.012270 0.16417 0.2766900

,...
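The operation described in this post (multiplying one 4x4x4 array into each of the four 4x4x4 slices taken along the last dimension of a 4x4x4x4 array) can be sketched as follows. This is only an illustration in plain Python with nested lists, not the R solution from the thread; the function name `multiply_by_last_axis` and the sample data are hypothetical.

```python
def multiply_by_last_axis(a3, a4):
    """Return r with r[i][j][k][m] = a3[i][j][k] * a4[i][j][k][m].

    a3 is a 3D nested list; a4 is a 4D nested list whose first three
    dimensions match a3. The last axis of a4 indexes the slices.
    """
    return [
        [
            [
                [a3[i][j][k] * a4[i][j][k][m] for m in range(len(a4[i][j][k]))]
                for k in range(len(a3[i][j]))
            ]
            for j in range(len(a3[i]))
        ]
        for i in range(len(a3))
    ]


n = 4
# Hypothetical sample data: a3[i][j][k] = i + j + k, a4[i][j][k][m] = m + 1.
a3 = [[[i + j + k for k in range(n)] for j in range(n)] for i in range(n)]
a4 = [[[[m + 1 for m in range(n)] for k in range(n)] for j in range(n)]
      for i in range(n)]

result = multiply_by_last_axis(a3, a4)
```

Each slice `m` of `result` is the elementwise product of `a3` with slice `m` of `a4`, so the output keeps the 4x4x4x4 shape of the 4D input.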

2009 Feb 16

[LLVMdev] Modeling GPU vector registers, again (with my implementation)

Evan Cheng-2 wrote:
> Well, how many possible permutations are there? Is it possible to
> model each case as a separate physical register?
>
> Evan

I don't think so. There are 4x4x4x4 = 256 permutations. For example:
* xyzw: default
* zxyw
* yyyy: splat

Even if I can model each of these 256 cases as a separate physical register, how can I model the use of r0.xyzw in the following example:

// dp4 = dot product 4-element
dp4 r0.x, r1, r2
dp4 r0.y, r3, r4
dp4 r0.z, r5, r6
dp4 r0.w...
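The count of 4x4x4x4 = 256 can be checked by enumerating every four-letter selection over the components x, y, z, w. The sketch below is a plain Python illustration, not code from the thread; the names `swizzles`, `true_permutations`, and `apply_swizzle` are hypothetical. It also shows that the 256 cases include repeats such as yyyy, so only 24 of them are permutations in the strict sense.

```python
from itertools import product

COMPONENTS = "xyzw"

# Every way to pick one of four source components for each of four slots.
swizzles = ["".join(p) for p in product(COMPONENTS, repeat=4)]

# Selections that use each component exactly once (strict permutations).
true_permutations = [s for s in swizzles if len(set(s)) == 4]


def apply_swizzle(reg, swizzle):
    """Reorder or replicate the four elements of reg as named by swizzle."""
    return tuple(reg[COMPONENTS.index(c)] for c in swizzle)


r1 = (1.0, 2.0, 3.0, 4.0)
default = apply_swizzle(r1, "xyzw")  # identity selection
rotated = apply_swizzle(r1, "zxyw")
splat = apply_swizzle(r1, "yyyy")    # one element copied to all four slots
```

Modeling each selection as its own physical register would therefore need 256 register definitions per vector register, which is the cost the post is pointing at.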

2009 Feb 16

[LLVMdev] Modeling GPU vector registers, again (with my implementation)

...edu
Subject: Re: [LLVMdev] Modeling GPU vector registers, again (with my implementation)

Evan Cheng-2 wrote:
> Well, how many possible permutations are there? Is it possible to
> model each case as a separate physical register?
>
> Evan

I don't think so. There are 4x4x4x4 = 256 permutations. For example:
* xyzw: default
* zxyw
* yyyy: splat

Even if I can model each of these 256 cases as a separate physical register, how can I model the use of r0.xyzw in the following example:

// dp4 = dot product 4-element
dp4 r0.x, r1, r2
dp4 r0.y, r3, r4
dp4 r0.z, r5, r6
dp4 r0.w...

2009 Feb 13

[LLVMdev] Modeling GPU vector registers, again (with my implementation)

On Feb 13, 2009, at 9:47 AM, Alex wrote:
> It seems to me that LLVM sub-registers are not suited to the following
> hardware architecture.
>
> All instructions of the hardware are vector instructions. All
> registers contain
> four 32-bit FP sub-registers. They are called r0.x, r0.y, r0.z, r0.w.
>
> Most instructions write more than one element in this way:
>
> mul

2009 Feb 13

[LLVMdev] Modeling GPU vector registers, again (with my implementation)

It seems to me that LLVM sub-registers are not suited to the following hardware architecture.

All instructions of the hardware are vector instructions. All registers contain four 32-bit FP sub-registers. They are called r0.x, r0.y, r0.z, r0.w.

Most instructions write more than one element in this way:

mul r0.xyw, r1, r2
add r0.z, r3, r4
sub r5, r0, r1

Notice that the four elements of r0 are written
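The three instructions in this post can be simulated to show how r0 is assembled from two partial writes before it is read as a whole. This is a minimal Python sketch under two assumptions that the snippet does not spell out: each operation is applied element by element, and the suffix on the destination (such as .xyw) is a write mask naming the only elements that get updated. The helper `masked_write` and the register values are hypothetical.

```python
import operator

COMPONENTS = "xyzw"


def masked_write(dest, mask, op, a, b):
    """Return a copy of dest where only the elements named in mask
    are replaced by op(a[i], b[i]); all other elements are kept."""
    out = list(dest)
    for c in mask:
        i = COMPONENTS.index(c)
        out[i] = op(a[i], b[i])
    return out


# Hypothetical register contents; each register holds four floats.
regs = {f"r{n}": [0.0] * 4 for n in range(6)}
regs["r1"] = [1.0, 2.0, 3.0, 4.0]
regs["r2"] = [2.0, 2.0, 2.0, 2.0]
regs["r3"] = [10.0, 10.0, 10.0, 10.0]
regs["r4"] = [1.0, 1.0, 1.0, 1.0]

# mul r0.xyw, r1, r2   (writes x, y, w; leaves z untouched)
regs["r0"] = masked_write(regs["r0"], "xyw", operator.mul, regs["r1"], regs["r2"])
# add r0.z, r3, r4     (writes only z)
regs["r0"] = masked_write(regs["r0"], "z", operator.add, regs["r3"], regs["r4"])
# sub r5, r0, r1       (no mask: reads all of r0, writes all of r5)
regs["r5"] = masked_write(regs["r5"], "xyzw", operator.sub, regs["r0"], regs["r1"])
```

After the first two instructions r0 holds values produced by two different instructions, and the third instruction depends on both, which is the dependency a register model for this hardware has to capture.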
