thr3ads.net - llvm dev - [llvm-dev] Does middle-end pass need to consider some special type when doing optimization? Or letting back-end to revert the optimization accordingly? [Apr 2021]


Luo, Yuanke via llvm-dev

2021-Apr-14 12:39 UTC

[llvm-dev] Does middle-end pass need to consider some special type when doing optimization? Or letting back-end to revert the optimization accordingly?

Hi,

I discussed a solution with Florian at https://reviews.llvm.org/D99152.
Florian suggested introducing a specific intrinsic to replace bitcast in the
front-end, with the backend taking on the extra effort to optimize or eliminate
the intrinsic. This idea looks good to me. Here is my plan.

  1.  Specify x86_amx in LangRef and verify the IR. Patches were uploaded at
https://reviews.llvm.org/D100032 and https://reviews.llvm.org/D100472.
  2.  Add an llvm.x86.tile.cast intrinsic to LLVM.
  3.  Optimize some llvm.x86.tile.cast code the way bitcast is optimized, and
transform llvm.x86.tile.cast into an AMX intrinsic if it can't be eliminated.
  4.  After the above 3 items are finished, replace bitcast with
llvm.x86.tile.cast in the front-end when generating IR for AMX builtins.
  5.  After some time for stabilization, remove the bitcast transform code from
LLVM.
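
As a rough illustration of the kind of folding step 3 refers to, here is a
minimal Python sketch over a toy instruction list. It is not LLVM code: the
Inst record, the opcode strings ("tileload", "load", "amx.dp") and the
fold_cast_pairs helper are all hypothetical, and "tile.cast" merely stands in
for the proposed llvm.x86.tile.cast. The sketch removes a cast of a cast when
the round trip restores the original type, drops casts left without users, and
keeps any cast that still feeds an AMX operation (the case that would have to
be lowered to an AMX intrinsic instead).

```python
from dataclasses import dataclass

CAST = "tile.cast"  # stand-in for the proposed llvm.x86.tile.cast (toy model only)


@dataclass(frozen=True)
class Inst:
    name: str    # SSA-style result name
    op: str      # hypothetical opcode string
    args: tuple  # names of operand values
    ty: str      # result type: "x86_amx" or "v256i32"


def fold_cast_pairs(insts):
    """Fold cast(cast(x)) to x when the types match, then drop unused casts."""
    defs = {inst.name: inst for inst in insts}
    replaced = {}  # folded cast name -> value it round-trips to

    def resolve(name):
        # Follow replacements until reaching a value that was not folded away.
        while name in replaced:
            name = replaced[name]
        return name

    for inst in insts:
        if inst.op != CAST:
            continue
        inner = defs[resolve(inst.args[0])]
        if inner.op == CAST:
            source = defs[resolve(inner.args[0])]
            if source.ty == inst.ty:
                replaced[inst.name] = source.name

    # Rewrite operands of the surviving instructions.
    rewritten = [
        Inst(i.name, i.op, tuple(resolve(a) for a in i.args), i.ty)
        for i in insts
        if i.name not in replaced
    ]

    # Walk backwards so a dead cast also frees the cast that fed it.
    used, kept = set(), []
    for inst in reversed(rewritten):
        if inst.op == CAST and inst.name not in used:
            continue
        used.update(inst.args)
        kept.append(inst)
    kept.reverse()
    return kept


program = [
    Inst("t0", "tileload", (), "x86_amx"),
    Inst("v0", CAST, ("t0",), "v256i32"),
    Inst("t1", CAST, ("v0",), "x86_amx"),   # round trip back to t0
    Inst("v1", "load", (), "v256i32"),
    Inst("t2", CAST, ("v1",), "x86_amx"),   # not foldable; would need lowering
    Inst("d", "amx.dp", ("t1", "t2"), "x86_amx"),
]

result = fold_cast_pairs(program)
```

In this toy program the pair v0/t1 disappears and the dot-product operand is
rewritten to t0, while t2 survives because its source is a plain vector load.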

Thanks
Yuanke

From: Florian Hahn <florian_hahn at apple.com>
Sent: Tuesday, March 23, 2021 6:16 PM
To: Luo, Yuanke <yuanke.luo at intel.com>
Cc: llvm-dev <llvm-dev at lists.llvm.org>; Zhang, Xiang1
<xiang1.zhang at intel.com>; James Y Knight <jyknight at google.com>
Subject: Re: [llvm-dev] Does middle-end pass need to consider some special type
when doing optimization? Or letting back-end to revert the optimization
accordingly?




On Mar 23, 2021, at 08:21, Luo, Yuanke <yuanke.luo at intel.com> wrote:

I prototyped approach 1 at https://reviews.llvm.org/D99152 and realized
that bitcast optimization in the middle-end is sometimes helpful. For the
inner_product() test case, we need extra effort to eliminate
llvm.x86.vector.amx.cast.x86amx.v256i32 ourselves.


I think that’s expected; you might need to add some optimizations for the
conversion intrinsic. But that can easily be limited to the AMX-specific passes,
and all existing LLVM transformations should remain correct without changes.

Cheers,
Florian

Roman Lebedev via llvm-dev

2021-Apr-14 12:44 UTC

I think I commented about this already, but isn't it a problem
that you will still be doing non-sequential loads/stores
via plain load/store IR instructions?

If they would just natively take the underlying <256 x i32> or whatever,
will you even need all this x86_amx special handling, and x86_amx itself?

Roman
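
One way to read the concern about non-sequential accesses: a maximum-size AMX
tile is 16 rows of 64 bytes (1024 bytes, the same size as <256 x i32>), but a
tile load reads each row at base + row * stride, whereas a plain load of the
flat vector reads 1024 contiguous bytes. The Python sketch below only compares
the row start offsets of the two access patterns; the function names and the
256-byte row pitch are assumptions chosen for illustration, not anything taken
from the patches.

```python
TILE_ROWS = 16       # maximum rows in an AMX tile
TILE_ROW_BYTES = 64  # maximum bytes per tile row (16 x i32)


def plain_load_row_offsets(rows=TILE_ROWS, row_bytes=TILE_ROW_BYTES):
    """Row start offsets covered by one contiguous load of the flat vector."""
    return [r * row_bytes for r in range(rows)]


def tile_load_row_offsets(stride, rows=TILE_ROWS):
    """Row start offsets read by a strided tile load (base + row * stride)."""
    return [r * stride for r in range(rows)]


def same_memory_layout(stride, rows=TILE_ROWS, row_bytes=TILE_ROW_BYTES):
    """The two access patterns coincide only when the stride equals the row size."""
    return tile_load_row_offsets(stride, rows) == plain_load_row_offsets(rows, row_bytes)


# Hypothetical case: the tile is a window into a matrix whose rows are 256 bytes apart.
matrix_pitch = 256
flat = plain_load_row_offsets()
strided = tile_load_row_offsets(matrix_pitch)
```

With a 256-byte pitch the strided rows start at 0, 256, 512, and so on, so a
plain load of the first 1024 bytes does not describe the same data unless the
stride happens to be 64.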


Florian Hahn via llvm-dev

2021-Apr-14 13:38 UTC

> On Apr 14, 2021, at 13:39, Luo, Yuanke <yuanke.luo at intel.com> wrote:
>
> Hi,
>
> I discussed a solution with Florian at https://reviews.llvm.org/D99152.
> Florian suggested introducing a specific intrinsic to replace bitcast in the
> front-end, with the backend taking on the extra effort to optimize or
> eliminate the intrinsic. This idea looks good to me. Here is my plan.
>
>   1.  Specify x86_amx in LangRef and verify the IR. Patches were uploaded at
> https://reviews.llvm.org/D100032 and https://reviews.llvm.org/D100472.
>   2.  Add an llvm.x86.tile.cast intrinsic to LLVM.
>   3.  Optimize some llvm.x86.tile.cast code the way bitcast is optimized, and
> transform llvm.x86.tile.cast into an AMX intrinsic if it can't be eliminated.
>   4.  After the above 3 items are finished, replace bitcast with
> llvm.x86.tile.cast in the front-end when generating IR for AMX builtins.
>   5.  After some time for stabilization, remove the bitcast transform code
> from LLVM.

Thanks for the update, I think that’s a good step forward to more effectively
hash out the specification for the x86_amx type. I think it clarifies the
expected behavior, but it would be great if other people could also take a
look.

Cheers,
Florian