Displaying 4 results from an estimated 4 matches for "v64i8".
2020 May 18
2
Use Galois field New Instructions (GFNI) to combine affine instructions
On 5/18/20 8:24 PM, Craig Topper wrote:
> I can tell you that your avx512 issue is that v64i8 gfni instructions also
> require avx512bw to be enabled to make v64i8 a supported type. The C
> intrinsics handling in the front end knows this rule. But since you
> generated your own intrinsics you bypassed that.
Indeed, that's the issue... I was sticking with what Intel announces here
(...
2020 Jun 25
2
How to implement load/store for vector predicate register
...ardware has 64 vector registers (vr for short) and 8 vector predicate registers (vpr for short). There are no move instructions between vr and vpr.
vr supports many operations, and vpr supports the vpror, vprxor, vprand and vprinv operations.
A vr has 512 bits, and a vpr has 128 bits. A vr is used for v16i32, v32i16 and v64i8 values. A scalar register has 32 bits.
If we compare or add two v16i32 values, an element in the vpr has 8 bits. If we compare or add two v64i8 values, an element in the vpr has 2 bits (one bit for the compare flag and one bit for the carry flag).
Each element in a vpr contains a carry flag and a compare flag.
We have defined registers...
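The element widths quoted above follow from dividing the 128 vpr bits evenly across the lanes of a 512-bit vr. A minimal sketch of that arithmetic, assuming the even split the post implies; the function name is hypothetical, and the v32i16 value is inferred rather than stated in the post:

```python
# Register sizes as stated in the post.
VR_BITS = 512   # one vector register (vr)
VPR_BITS = 128  # one vector predicate register (vpr)


def vpr_element_bits(element_bits: int) -> int:
    """Bits of predicate state per lane, assuming the vpr is split evenly.

    element_bits is the width of one vector element (32 for v16i32,
    16 for v32i16, 8 for v64i8).
    """
    lanes = VR_BITS // element_bits  # number of elements in one vr
    return VPR_BITS // lanes         # predicate bits available per element


if __name__ == "__main__":
    for name, width in (("v16i32", 32), ("v32i16", 16), ("v64i8", 8)):
        print(name, vpr_element_bits(width))
```

For v64i8 this leaves exactly two bits per lane, which matches the one compare bit plus one carry bit described in the post.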
2020 May 18
2
Use Galois field New Instructions (GFNI) to combine affine instructions
Hello everyone,
Over the last couple of days, I have been experimenting with teaching LLVM how to combine a
set of affine instructions into an instruction that uses the GFNI [1] AVX512 extension,
especially GF2P8AFFINEQB [2]. While the general idea seems to work, I have some questions
about my current implementation (see below). FTR, I have named this transformation
AffineCombineExpr (ACE).
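To make the target of the combine concrete, here is a small software model of what GF2P8AFFINEQB computes for a single byte lane, written after the pseudocode in Intel's instruction documentation. It is only an illustrative sketch, not code from the ACE patch; the function and constant names are hypothetical.

```python
def parity(value: int) -> int:
    """Return 1 if the low 8 bits of value have an odd number of set bits."""
    return bin(value & 0xFF).count("1") & 1


def gf2p8affine_byte(matrix: int, x: int, imm8: int) -> int:
    """Model one byte lane of GF2P8AFFINEQB: result = A * x + b over GF(2).

    matrix is the 64-bit operand holding an 8x8 bit matrix, x is the
    source byte, and imm8 is the constant that is XORed in.
    """
    result = 0
    for i in range(8):
        # Result bit i uses byte (7 - i) of the matrix as its row.
        row = (matrix >> (8 * (7 - i))) & 0xFF
        bit = parity(row & x) ^ ((imm8 >> i) & 1)
        result |= bit << i
    return result


# Matrix constants for this model (names are hypothetical).
IDENTITY = 0x0102040810204080     # leaves every byte unchanged
BIT_REVERSE = 0x8040201008040201  # reverses the bit order within a byte

if __name__ == "__main__":
    print(hex(gf2p8affine_byte(IDENTITY, 0x0F, 0xFF)))     # XOR with 0xFF only
    print(hex(gf2p8affine_byte(BIT_REVERSE, 0x01, 0x00)))  # bit permutation only
```

Any chain of per-byte bit permutations, XORs with constants, and other GF(2)-linear steps can be folded into one matrix and one constant of this form, which is what makes a single instruction a plausible replacement for the whole chain.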
2020 Jun 26
2
How to implement load/store for vector predicate register
...ardware has 64 vector registers (vr for short) and 8 vector predicate registers (vpr for short). There are no move instructions between vr and vpr.
vr supports many operations, and vpr supports the vpror, vprxor, vprand and vprinv operations.
A vr has 512 bits, and a vpr has 128 bits. A vr is used for v16i32, v32i16 and v64i8 values. A scalar register has 32 bits.
If we compare or add two v16i32 values, an element in the vpr has 8 bits. If we compare or add two v64i8 values, an element in the vpr has 2 bits (one bit for the compare flag and one bit for the carry flag).
Each element in a vpr contains a carry flag and a compare flag.
We have defined registers...