thr3ads.net - llvm dev - [llvm-dev] Use Galois field New Instructions (GFNI) to combine affine instructions [May 2020]


Adrien Guinet via llvm-dev

2020-May-18 18:29 UTC

[llvm-dev] Use Galois field New Instructions (GFNI) to combine affine instructions

On 5/18/20 8:24 PM, Craig Topper wrote:
> I can tell you that your avx512 issue is that v64i8 gfni instructions also
> require avx512bw to be enabled to make v64i8 a supported type. The C
> intrinsics handling in the front end knows this rule. But since you
> generated your own intrinsics you bypassed that.

Indeed, that's the issue... I was sticking with what Intel documents here
(https://software.intel.com/sites/landingpage/IntrinsicsGuide/#text=gf2p&expand=2907),
but I guess I should have checked the C intrinsics.

I will fix my code to verify the presence of avx512bw if I ever need v64i8.
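
As a rough illustration of that check, here is a minimal C++ sketch. The `X86Features` struct and `canUseGfniAffine` are hypothetical names made up for this example (they are not LLVM APIs). Only the v64i8 rule comes from this thread; the v16i8 and v32i8 cases are assumptions added for completeness.

```cpp
// Hypothetical feature flags. In a real pass these would be queried from the
// target, not stored in a hand-written struct.
struct X86Features {
  bool gfni = false;
  bool avx = false;
  bool avx512f = false;
  bool avx512bw = false;
};

// Returns true if a GFNI affine operation may be emitted for a vector of
// `numBytes` i8 elements under the given feature set.
inline bool canUseGfniAffine(const X86Features &f, unsigned numBytes) {
  if (!f.gfni)
    return false;
  switch (numBytes) {
  case 16:
    return true;                      // v16i8: assumed to need GFNI only
  case 32:
    return f.avx;                     // v32i8: assumed to also need AVX
  case 64:
    return f.avx512f && f.avx512bw;   // v64i8: avx512bw required (per this thread)
  default:
    return false;                     // other widths are not modelled here
  }
}
```

With `gfni` and `avx512f` set but `avx512bw` clear, `canUseGfniAffine(f, 64)` returns false, which matches the failure described above.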

Thanks for the hint!

Craig Topper via llvm-dev

2020-May-19 00:23 UTC


I'm guessing AggressiveInstCombine only runs once in the pipeline and it's
probably before the vectorizers. Its existing transforms probably output
things the vectorizer can understand and vectorize. In your case you're
fully dependent on vectorized code.

We don't like to form target-specific intrinsics in the middle-end
pipeline. We'd prefer to do something in the X86-specific IR pipeline or
the Machine IR pipeline run by llc. Or have a generic concept in IR that we
can express, like llvm.ctlz, llvm.cttz, llvm.ctpop or llvm.bitreverse. We
have methods in TargetTransformInfo to query for targets supporting them,
or in the worst case we're able to generate reasonable code if the target
doesn't support it natively.
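
To make the "reasonable code if the target doesn't support it natively" point concrete, here is a small C++ sketch of a portable expansion for an 8-bit bit reversal. The function name `bitreverse8` is made up for this example, and the sequence is only one common expansion; it is not claimed to be what LLVM actually emits for llvm.bitreverse.

```cpp
#include <cstdint>

// Portable fallback for reversing the bits of a byte:
// swap the two nibbles, then adjacent bit pairs, then adjacent bits.
inline std::uint8_t bitreverse8(std::uint8_t x) {
  x = static_cast<std::uint8_t>((x >> 4) | (x << 4));
  x = static_cast<std::uint8_t>(((x & 0xCC) >> 2) | ((x & 0x33) << 2));
  x = static_cast<std::uint8_t>(((x & 0xAA) >> 1) | ((x & 0x55) << 1));
  return x;
}
```

A generic intrinsic lets the backend pick a native instruction where one exists and fall back to a sequence of this kind elsewhere.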

I'll try to point some more people here at Intel towards this thread.

~Craig

On Mon, May 18, 2020 at 11:30 AM Adrien Guinet via llvm-dev <
llvm-dev at lists.llvm.org> wrote:
> On 5/18/20 8:24 PM, Craig Topper wrote:
> > I can tell you that your avx512 issue is that v64i8 gfni instructions
> > also require avx512bw to be enabled to make v64i8 a supported type. The C
> > intrinsics handling in the front end knows this rule. But since you
> > generated your own intrinsics you bypassed that.
>
> Indeed, that's the issue... I was sticking with what Intel documents here
> (https://software.intel.com/sites/landingpage/IntrinsicsGuide/#text=gf2p&expand=2907),
> but I guess I should have checked the C intrinsics.
>
> I will fix my code to verify the presence of avx512bw if I ever need v64i8.
>
> Thanks for the hint!

Adrien Guinet via llvm-dev

2020-May-19 07:58 UTC


On 5/19/20 2:23 AM, Craig Topper wrote:
> I'm guessing AggressiveInstCombine only runs once in the pipeline and it's
> probably before the vectorizers. Its existing transforms probably output
> things the vectorizer can understand and vectorize. In your case you're
> fully dependent on vectorized code.

Yes. I think it would generate more efficient code if I did the combination
within the loop vectorization algorithm (that is, if I understood correctly,
in VPlan). It would, for instance, give more opportunities to "hide" the
latency of GF2P8AFFINEQB, e.g. by fine-tuning the loop unrolling factor. But
I need more time to figure out how to make this happen.

> We don't like to form target-specific intrinsics in the middle-end
> pipeline. We'd prefer to do something in the X86-specific IR pipeline or
> the Machine IR pipeline run by llc. Or have a generic concept in IR that we
> can express, like llvm.ctlz, llvm.cttz, llvm.ctpop or llvm.bitreverse. We
> have methods in TargetTransformInfo to query for targets supporting them,
> or in the worst case we're able to generate reasonable code if the target
> doesn't support it natively.

I thought about putting that in the X86 pipeline, but it might remove some
opportunities:

* supporting equivalent instructions from another vendor without rewriting
  everything
* fine-tuning the loop vectorization process, as mentioned above

The approach of defining a generic intrinsic in the IR could be an
interesting one. The remaining question would be which API to put there, so
that it's generic enough (to be a generic intrinsic) but doesn't prevent
some optimization opportunities. That being said, something like:

llvm.gf2p8.XX(<i8, XX> value, <i8, XX> matrix, i8 cst)

seems reasonable to me.
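
To make the intended semantics concrete, here is a scalar C++ reference sketch of what such an intrinsic could compute for a single byte. It follows my reading of the GF2P8AFFINEQB pseudocode in the Intel intrinsics guide (output bit i is the parity of matrix byte 7-i ANDed with the input byte, XORed with bit i of the constant). The names `parity8` and `gf2p8AffineByte` are made up for this example, and packing the matrix into one 64-bit value per byte is an assumption, not a settled API.

```cpp
#include <cstdint>

// Parity (XOR of all bits) of a byte.
inline unsigned parity8(std::uint8_t v) {
  v = static_cast<std::uint8_t>(v ^ (v >> 4));
  v = static_cast<std::uint8_t>(v ^ (v >> 2));
  v = static_cast<std::uint8_t>(v ^ (v >> 1));
  return v & 1u;
}

// Scalar model of an affine transform over GF(2): out = M * x + cst.
// `matrix` holds the 8x8 bit matrix, one row per byte; the row that
// produces output bit i is stored in byte (7 - i) of the 64-bit value.
inline std::uint8_t gf2p8AffineByte(std::uint8_t x, std::uint64_t matrix,
                                    std::uint8_t cst) {
  std::uint8_t out = 0;
  for (int i = 0; i < 8; ++i) {
    const auto row = static_cast<std::uint8_t>(matrix >> (8 * (7 - i)));
    const unsigned bit = parity8(static_cast<std::uint8_t>(row & x));
    out = static_cast<std::uint8_t>(out | (bit << i));
  }
  return static_cast<std::uint8_t>(out ^ cst);
}
```

Under this model, the matrix 0x0102040810204080 is the identity and 0x8040201008040201 reverses the bits of each byte. A vector version would apply the same function to every byte, using the matrix of the enclosing 64-bit lane.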

I guess that's the eternal debate on where optimization X should happen and
how... :)

> I'll try to point some more people here at Intel towards this thread.

Thanks a lot!

Adrien.
