Displaying 5 results from an estimated 5 matches for "4x4x4x4".
2003 Sep 25
1
apply on a 4D array
I am trying to multiply a 4x4x4 3D array by each of the four 3D slices of a
4D array with dimensions 4x4x4x4 (the last dimension being the one that I
want to split by).
(4x4x4 array)
> hiaAry
, , a1

          i1       i2      i3        i4
h1 9.5936098 6.001040 0.08772 0.3138600
h2 1.2003500 1.454570 2.79248 0.0000000
h3 0.1346500 0.201220 0.39256 0.5464000
h4 0.0109000 0.012270 0.16417 0.2766900
,...
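The operation asked about above can be sketched in Python. This is only an illustration under assumptions: "multiply" is taken to mean elementwise multiplication, the data are made-up placeholders rather than the poster's values, and the name multiply_slices is hypothetical.

```python
def multiply_slices(a3, a4):
    """Elementwise-multiply a3[i][j][k] into a4[i][j][k][l] for every index l
    of the last axis, i.e. multiply each 3D slice of a4 by a3."""
    return [
        [
            [
                [a3[i][j][k] * a4[i][j][k][l] for l in range(len(a4[i][j][k]))]
                for k in range(len(a3[i][j]))
            ]
            for j in range(len(a3[i]))
        ]
        for i in range(len(a3))
    ]


n = 4
# Placeholder data (assumption): a3 holds i + j + k, slice l of a4 holds l + 1.
a3 = [[[i + j + k for k in range(n)] for j in range(n)] for i in range(n)]
a4 = [[[[l + 1 for l in range(n)] for k in range(n)] for j in range(n)]
      for i in range(n)]

result = multiply_slices(a3, a4)
print(result[1][2][3])  # the four products for cell (1, 2, 3), one per slice
```

With NumPy arrays the same elementwise product could be written as a4 * a3[..., None], which broadcasts the 3D array across the last axis.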
2009 Feb 16
2
[LLVMdev] Modeling GPU vector registers, again (with my implementation)
Evan Cheng-2 wrote:
>
> Well, how many possible permutations are there? Is it possible to
> model each case as a separate physical register?
>
> Evan
>
I don't think so. There are 4x4x4x4 = 256 permutations. For example:
* xyzw: default
* zxyw
* yyyy: splat
Even if I can model each of these 256 cases as a separate physical register,
how can I model the use of r0.xyzw in the following example:
// dp4 = dot product 4-element
dp4 r0.x, r1, r2
dp4 r0.y, r3, r4
dp4 r0.z, r5, r6
dp4 r0.w...
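The count of 256 quoted above can be checked with a short Python sketch: each of the four slots of a swizzle independently selects one of x, y, z, w, with repetition allowed. The helper apply_swizzle and the register values are hypothetical and only show what reading a register through a swizzle means.

```python
from itertools import product

COMPONENTS = "xyzw"

# One choice out of four components for each of four slots: 4**4 combinations.
swizzles = ["".join(p) for p in product(COMPONENTS, repeat=4)]


def apply_swizzle(reg, swizzle):
    """Return the values of a 4-element register as seen through a swizzle."""
    return [reg[COMPONENTS.index(c)] for c in swizzle]


r1 = [1.0, 2.0, 3.0, 4.0]  # made-up contents for illustration
print(len(swizzles))              # 256
print(apply_swizzle(r1, "xyzw"))  # [1.0, 2.0, 3.0, 4.0]  (default)
print(apply_swizzle(r1, "zxyw"))  # [3.0, 1.0, 2.0, 4.0]
print(apply_swizzle(r1, "yyyy"))  # [2.0, 2.0, 2.0, 2.0]  (splat)
```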
2009 Feb 13
0
[LLVMdev] Modeling GPU vector registers, again (with my implementation)
On Feb 13, 2009, at 9:47 AM, Alex wrote:
> It seems to me that LLVM's sub-register support is not suited to the
> following hardware architecture.
>
> All instructions of the hardware are vector instructions. All
> registers contain four 32-bit FP sub-registers, called r0.x, r0.y,
> r0.z, and r0.w.
>
> Most instructions write more than one element, in this way:
>
> mul
2009 Feb 13
3
[LLVMdev] Modeling GPU vector registers, again (with my implementation)
It seems to me that LLVM's sub-register support is not suited to the
following hardware architecture.
All instructions of the hardware are vector instructions. All registers
contain four 32-bit FP sub-registers, called r0.x, r0.y, r0.z, and r0.w.
Most instructions write more than one element, in this way:
mul r0.xyw, r1, r2
add r0.z, r3, r4
sub r5, r0, r1
Notice that the four elements of r0 are written
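The three instructions above can be simulated in Python to show how a write mask leaves the unnamed components of the destination untouched. This is a sketch under assumptions: mul, add, and sub are taken to be per-component operations, the register contents are made up, and masked_write is a hypothetical helper, not part of LLVM.

```python
COMPONENTS = "xyzw"


def masked_write(dest, mask, values):
    """Return a copy of dest in which only the components named in mask
    are replaced by the corresponding components of values."""
    out = list(dest)
    for c in mask:
        i = COMPONENTS.index(c)
        out[i] = values[i]
    return out


def mul(a, b):
    return [x * y for x, y in zip(a, b)]


def add(a, b):
    return [x + y for x, y in zip(a, b)]


def sub(a, b):
    return [x - y for x, y in zip(a, b)]


# Made-up register contents for illustration.
r0 = [0.0, 0.0, 0.0, 0.0]
r1 = [1.0, 2.0, 3.0, 4.0]
r2 = [2.0, 2.0, 2.0, 2.0]
r3 = [10.0, 20.0, 30.0, 40.0]
r4 = [1.0, 1.0, 1.0, 1.0]

r0 = masked_write(r0, "xyw", mul(r1, r2))  # mul r0.xyw, r1, r2
r0 = masked_write(r0, "z", add(r3, r4))    # add r0.z, r3, r4
r5 = sub(r0, r1)                           # sub r5, r0, r1 (reads all of r0)

print(r0)  # [2.0, 4.0, 31.0, 8.0]
print(r5)  # [1.0, 2.0, 28.0, 4.0]
```

In this sketch the final read of r0 depends on both earlier writes, each of which defines only part of the register.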