Luo, Yuanke via llvm-dev
2021-Mar-22 14:02 UTC
[llvm-dev] Does a middle-end pass need to consider special types when doing optimizations? Or should the back-end revert the optimization accordingly?
Yes, the bitcasts are introduced by the frontend when it calls the AMX intrinsics. We use a vector type to represent a 2D AMX tile in the C language, but we don't want to mix AMX tiles with other vector operations, so x86_amx was introduced to isolate the AMX intrinsics from normal vector operations. The bitcast marks the point where a normal vector is passed to an AMX intrinsic. In the example below, we need to transform the bitcast into a vector store and an AMX load intrinsic. The x86_amx* type is unexpected at first, but in the InstCombine pass the middle-end generates the x86_amx pointer.

define dso_local void @test_src_add(<256 x i32> %x, <256 x i32> %y, i16 %r, i16 %c, i8* %buf, i64 %s) {
; CHECK-LABEL: @test_src_add(
; CHECK-NEXT:  entry:
; CHECK-NEXT:    [[TMP0:%.*]] = alloca <256 x i32>, align 64
; CHECK-NEXT:    [[ADD:%.*]] = add <256 x i32> [[Y:%.*]], [[X:%.*]]
; CHECK-NEXT:    [[TMP1:%.*]] = bitcast <256 x i32>* [[TMP0]] to i8*
; CHECK-NEXT:    store <256 x i32> [[ADD]], <256 x i32>* [[TMP0]], align 1024
; CHECK-NEXT:    [[TMP2:%.*]] = call x86_amx @llvm.x86.tileloadd64.internal(i16 [[R:%.*]], i16 [[C:%.*]], i8* [[TMP1]], i64 64)
; CHECK-NEXT:    call void @llvm.x86.tilestored64.internal(i16 [[R]], i16 [[C]], i8* [[BUF:%.*]], i64 [[S:%.*]], x86_amx [[TMP2]])
; CHECK-NEXT:    ret void
;
entry:
  %add = add <256 x i32> %y, %x
  %t = bitcast <256 x i32> %add to x86_amx
  call void @llvm.x86.tilestored64.internal(i16 %r, i16 %c, i8* %buf, i64 %s, x86_amx %t)
  ret void
}

Thanks
Yuanke

From: Florian Hahn <florian_hahn at apple.com>
Sent: Monday, March 22, 2021 9:40 PM
To: Zhang, Xiang1 <xiang1.zhang at intel.com>; llvm-dev <llvm-dev at lists.llvm.org>
Cc: James Y Knight <jyknight at google.com>; Luo, Yuanke <yuanke.luo at intel.com>
Subject: Re: [llvm-dev] Does a middle-end pass need to consider special types when doing optimizations? Or should the back-end revert the optimization accordingly?

On Mar 19, 2021, at 02:04, Zhang, Xiang1 via llvm-dev <llvm-dev at lists.llvm.org> wrote:

Yes, that is equivalent, but in the frontend we have no existing type to express the AMX type. The "AMX type" in the C/C++ language is implied by the following structure:

typedef int tile1024i __attribute__((__vector_size__(1024), __aligned__(64)));
typedef struct __tile1024i_str {
  const unsigned short row;
  const unsigned short col;
  tile1024i tile;
} __tile1024i;

So we handle

  %src = load <256 x i32>, <256 x i32>* %addr, align 64
  %2 = bitcast <256 x i32> %src to x86_amx

not

  %2 = load x86_amx, x86_amx* %addr, align 64

Are the bitcasts introduced by the frontend? If you need different semantics for loading from an `x86_amx` pointer, could the frontend generate a call to an intrinsic instead?

Cheers,
Florian
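For readers following along, the InstCombine fold both messages refer to can be illustrated with a minimal stand-alone sketch. This is illustrative IR written for this thread, not code taken from the patches under discussion:

; The frontend emits a plain vector load plus a value-level bitcast
; into the tile type.
define void @before(<256 x i32>* %addr, i16 %r, i16 %c, i8* %buf, i64 %s) {
entry:
  %src = load <256 x i32>, <256 x i32>* %addr, align 64
  %t = bitcast <256 x i32> %src to x86_amx
  call void @llvm.x86.tilestored64.internal(i16 %r, i16 %c, i8* %buf, i64 %s, x86_amx %t)
  ret void
}

; After InstCombine folds the load/bitcast pair, the cast moves to the
; pointer operand, producing the unexpected x86_amx* pointer and an
; x86_amx load that the AMX lowering did not anticipate.
define void @after(<256 x i32>* %addr, i16 %r, i16 %c, i8* %buf, i64 %s) {
entry:
  %cast = bitcast <256 x i32>* %addr to x86_amx*
  %t = load x86_amx, x86_amx* %cast, align 64
  call void @llvm.x86.tilestored64.internal(i16 %r, i16 %c, i8* %buf, i64 %s, x86_amx %t)
  ret void
}

declare void @llvm.x86.tilestored64.internal(i16, i16, i8*, i64, x86_amx)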
Florian Hahn via llvm-dev
2021-Mar-22 15:04 UTC
[llvm-dev] Does a middle-end pass need to consider special types when doing optimizations? Or should the back-end revert the optimization accordingly?
> On Mar 22, 2021, at 14:02, Luo, Yuanke <yuanke.luo at intel.com> wrote:
>
> Yes, the bitcasts are introduced by the frontend when it calls the AMX intrinsics. We use a vector type to represent a 2D AMX tile in the C language, but we don't want to mix AMX tiles with other vector operations, so x86_amx was introduced to isolate the AMX intrinsics from normal vector operations. The bitcast marks the point where a normal vector is passed to an AMX intrinsic. In the example below, we need to transform the bitcast into a vector store and an AMX load intrinsic. The x86_amx* type is unexpected at first, but in the InstCombine pass the middle-end generates the x86_amx pointer.
>
> define dso_local void @test_src_add(<256 x i32> %x, <256 x i32> %y, i16 %r, i16 %c, i8* %buf, i64 %s) {
> ; CHECK-LABEL: @test_src_add(
> ; CHECK-NEXT:  entry:
> ; CHECK-NEXT:    [[TMP0:%.*]] = alloca <256 x i32>, align 64
> ; CHECK-NEXT:    [[ADD:%.*]] = add <256 x i32> [[Y:%.*]], [[X:%.*]]
> ; CHECK-NEXT:    [[TMP1:%.*]] = bitcast <256 x i32>* [[TMP0]] to i8*
> ; CHECK-NEXT:    store <256 x i32> [[ADD]], <256 x i32>* [[TMP0]], align 1024
> ; CHECK-NEXT:    [[TMP2:%.*]] = call x86_amx @llvm.x86.tileloadd64.internal(i16 [[R:%.*]], i16 [[C:%.*]], i8* [[TMP1]], i64 64)
> ; CHECK-NEXT:    call void @llvm.x86.tilestored64.internal(i16 [[R]], i16 [[C]], i8* [[BUF:%.*]], i64 [[S:%.*]], x86_amx [[TMP2]])
> ; CHECK-NEXT:    ret void
> ;
> entry:
>   %add = add <256 x i32> %y, %x
>   %t = bitcast <256 x i32> %add to x86_amx
>   call void @llvm.x86.tilestored64.internal(i16 %r, i16 %c, i8* %buf, i64 %s, x86_amx %t)
>   ret void
> }

Ok, I think I understand the issue better now. IIUC, you use `bitcast` in the frontend to convert between regular vector values and AMX values? This doesn't really match the way `bitcast` is defined (as discussed earlier), and this mismatch seems to be the source of the issues.

I don't think you should use `bitcast` that way. Instead, the frontend should emit different code for the conversion between vector and AMX values (e.g. use an intrinsic to convert between them; the intrinsic can be lowered directly to the conversion code).

I think there are at least two ways forward:

1. Avoid using bitcasts for the conversion in the frontend.
2. Try to define the semantics of bitcast/load for AMX types such that the transformations you want to exclude in InstCombine are illegal.

If you decide to go with 2., you will probably have to make a convincing argument for why this is the right thing to do and why the alternatives do not work, because it means that certain general transformations that are legal at the moment become illegal for certain types (as illustrated by the InstCombine patches you mentioned).

Cheers,
Florian
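To make option 1 concrete, here is a minimal sketch of the earlier test case rewritten with a frontend-emitted conversion intrinsic. The name `@llvm.x86.cast.vector.to.tile.v256i32` and its signature are hypothetical placeholders for whatever intrinsic would be chosen, not an existing LLVM intrinsic:

; Hypothetical conversion intrinsic (illustrative only).
declare x86_amx @llvm.x86.cast.vector.to.tile.v256i32(<256 x i32>)
declare void @llvm.x86.tilestored64.internal(i16, i16, i8*, i64, x86_amx)

define dso_local void @test_src_add(<256 x i32> %x, <256 x i32> %y, i16 %r, i16 %c, i8* %buf, i64 %s) {
entry:
  %add = add <256 x i32> %y, %x
  ; An opaque intrinsic call replaces the bitcast. InstCombine has no
  ; special knowledge of this call, so it cannot fold the conversion
  ; into an x86_amx load or an x86_amx* pointer cast.
  %t = call x86_amx @llvm.x86.cast.vector.to.tile.v256i32(<256 x i32> %add)
  call void @llvm.x86.tilestored64.internal(i16 %r, i16 %c, i8* %buf, i64 %s, x86_amx %t)
  ret void
}

A late X86 pass would then be free to lower the cast intrinsic to the store-plus-tileloadd64 sequence shown in the CHECK lines earlier in the thread.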