search for: v64i8

Displaying 4 results from an estimated 4 matches for "v64i8".

Did you mean: v4i8
2020 May 18
2
Use Galois field New Instructions (GFNI) to combine affine instructions
On 5/18/20 8:24 PM, Craig Topper wrote: > I can tell you that your avx512 issue is that v64i8 gfni instructions also > require avx512bw to be enabled to make v64i8 a supported type. The C > intrinsics handling in the front end know this rule. But since you > generated your own intrinsics you bypassed that. Indeed that's the issue... I was stick with what Intel announces here (...
2020 Jun 25
2
How to implement load/store for vector predicate register
...ardware has 64 vector registers(vr for short) and 8 vector predicate registers. And there is no move instructions between vr and vpr. vr supports many operations, and vpr supports vpror, vprxor, vprand and vprinv operations. A vr has 512 bits, and a vpr has 128 bits. vr is used for v16i32, v32i16, v64i8. And a scalar register has 32 bits. If we compare or add two v16i32, a element in vpr has 8 bits. If we compare or add two v64i8, then a element in vpr has 2 bits(one bit for compare flag and one bit for carry flag). A element in vpr contains carry flag and compare flag. We have defined registers...
2020 May 18
2
Use Galois field New Instructions (GFNI) to combine affine instructions
Hello everyone, On the last couple of days, I have been experimenting with teaching LLVM how to combine a set of affine instructions into an instruction that uses the GFNI [1] AVX512 extension, especially GF2P8AFFINEQB [2]. While the general idea seems to work, I have some questions about my current implementation (see below). FTR, I have named this transformation AffineCombineExpr (ACE).
2020 Jun 26
2
How to implement load/store for vector predicate register
...ardware has 64 vector registers(vr for short) and 8 vector predicate registers. And there is no move instructions between vr and vpr. vr supports many operations, and vpr supports vpror, vprxor, vprand and vprinv operations. A vr has 512 bits, and a vpr has 128 bits. vr is used for v16i32, v32i16, v64i8. And a scalar register has 32 bits. If we compare or add two v16i32, a element in vpr has 8 bits. If we compare or add two v64i8, then a element in vpr has 2 bits(one bit for compare flag and one bit for carry flag). A element in vpr contains carry flag and compare flag. We have defined registers...