thr3ads.net - llvm dev - [llvm-dev] Is it legal to pass a half by value on x86

Wang, Pengfei via llvm-dev

2021-Mar-05 14:46 UTC

[llvm-dev] Is it legal to pass a half by value on x86_64?

Hi Jason,

The different behavior between Linux and Windows comes from the difference in
the calling convention. Windows uses 4 registers for argument passing, while
Linux uses 6.
https://docs.microsoft.com/en-us/cpp/build/x64-calling-convention?view=msvc-160#parameter-passing
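
As a rough illustration of that difference, the Python sketch below maps a list of scalar arguments to the place each one is passed under the two conventions. It is a simplification, not LLVM's lowering code: the function names and the "int"/"fp" labels are made up for this example, only scalar integer/pointer and floating-point arguments are modeled, and the half is assumed to travel as a float, as described elsewhere in this thread. The "6" for Linux refers to the integer registers; the System V convention counts floating-point registers separately.

```python
# Windows x64: the first four arguments share four positional slots.
# Slot N uses the Nth integer register or the Nth XMM register.
WIN64_INT_REGS = ["rcx", "rdx", "r8", "r9"]
WIN64_FP_REGS = ["xmm0", "xmm1", "xmm2", "xmm3"]

# System V x86-64 (Linux): integer and floating-point arguments are
# counted independently, with six integer and eight XMM registers.
SYSV_INT_REGS = ["rdi", "rsi", "rdx", "rcx", "r8", "r9"]
SYSV_FP_REGS = ["xmm%d" % i for i in range(8)]


def win64_locations(kinds):
    """Return where each argument goes on Windows x64 (hypothetical helper)."""
    locations = []
    for position, kind in enumerate(kinds):
        if position >= 4:
            locations.append("stack")
        elif kind == "fp":
            locations.append(WIN64_FP_REGS[position])
        else:
            locations.append(WIN64_INT_REGS[position])
    return locations


def sysv_locations(kinds):
    """Return where each argument goes on System V x86-64 (hypothetical helper)."""
    locations = []
    used = {"int": 0, "fp": 0}
    regs = {"int": SYSV_INT_REGS, "fp": SYSV_FP_REGS}
    for kind in kinds:
        if used[kind] < len(regs[kind]):
            locations.append(regs[kind][used[kind]])
            used[kind] += 1
        else:
            locations.append("stack")
    return locations


five_args = ["int", "int", "int", "int", "fp"]  # foo(i8, i8, i8, i8, half)
four_args = ["int", "int", "int", "fp"]         # foo(i8, i8, i8, half)
print(win64_locations(five_args))
print(win64_locations(four_args))
print(sysv_locations(five_args))
```

Under these rules the half in foo(i8, i8, i8, i8, half) is the fifth argument and goes on the stack on Windows, whereas in foo(i8, i8, i8, half) it lands in xmm3. That is consistent with the Windows listings quoted in this thread (a load from the stack in the first case, "movaps xmm0, xmm3" in the second). On Linux the half is the first floating-point argument in both cases and arrives in xmm0.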

Thanks
Pengfei

From: llvm-dev <llvm-dev-bounces at lists.llvm.org> On Behalf Of Jason
Hafer via llvm-dev
Sent: Friday, March 5, 2021 10:21 PM
To: Craig Topper <craig.topper at gmail.com>
Cc: llvm-dev at lists.llvm.org
Subject: Re: [llvm-dev] Is it legal to pass a half by value on x86_64?

Hi All,

Thank you very much for all the great information.  This is awesome!

To circle back on Craig's questions.
I did notice that LLVM 11 behaves very differently.

** Per: What does "incorrect math operations" mean?
The half is passed to the function as a float.  The function does operations
with other half numbers.  On Windows, when we don't get the float-to-half
conversion, the input is always truncated to 0.0.
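
A small Python sketch of what the missing conversion could look like at the bit level. It assumes (this is an interpretation, not something confirmed in the thread) that without the call to __gnu_f2h_ieee the low 16 bits of the 32-bit float pattern end up being read as the half. The helper names are made up for the example.

```python
import struct


def float_bits(x):
    """Bit pattern of x as an IEEE 754 single (32 bits)."""
    return struct.unpack("<I", struct.pack("<f", x))[0]


def half_bits(x):
    """Bit pattern of x after a proper float-to-half conversion (16 bits)."""
    return struct.unpack("<H", struct.pack("<e", x))[0]


def half_from_bits(bits):
    """Value obtained by interpreting a 16-bit pattern as an IEEE 754 half."""
    return struct.unpack("<e", struct.pack("<H", bits))[0]


single = float_bits(2.0)
print(hex(single))                      # 0x40000000, the single sent by the caller
print(hex(half_bits(2.0)))              # 0x4000, what a real conversion yields
print(half_from_bits(single & 0xFFFF))  # 0.0, low 16 bits misread as a half
```

Under that assumption, any value whose single-precision mantissa has zeros in its low 16 bits (2.0 is one) would show up as 0.0.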

** Per: "Do you have a more complete IR file for Windows that I can take a
look at?"
I can get you our IR if you want, but I think it is more convoluted than
required.  I was working on a unit test and I think all one needs to see the
anomaly is:
define void @foo(i8, i8, i8, i8, half) {
; CHECK-I686:    callq __gnu_f2h_ieee

  %6 = alloca half
  store half %4, half* %6, align 1
  ret void
}

x86_64-pc-windows gives:
push rax
.seh_stackalloc 8
.seh_endprologue
movss xmm0, dword ptr [rsp + 48] # xmm0 = mem[0],zero,zero,zero
movss dword ptr [rsp + 4], xmm0 # 4-byte Spill
pop rax
ret
.seh_handlerdata
.text
.seh_endproc

What I find extremely interesting is that the behavior seems to have something
to do with the stack.  If I drop the number of inputs by one, then even Windows
will generate the conversion.

define void @foo(i8, i8, i8, half) {
; CHECK-I686:    callq __gnu_f2h_ieee

  %5 = alloca half
  store half %3, half* %5, align 1
  ret void
}

x86_64-pc-windows gives:
sub rsp, 40
.seh_stackalloc 40
.seh_endprologue
movabs rax, offset __gnu_f2h_ieee
movaps xmm0, xmm3
call rax
mov word ptr [rsp + 38], ax
add rsp, 40
ret
.seh_handlerdata
.text
.seh_endproc


** If interested, here is a dissection of our real asm.
For both Windows and Linux our IR calls c2_foo() with a half(2):
...
call void @c2_foo(i8* %S_6, [21 x i8*]* %ptr_gvar_instance_7, %emlrtStack*
%c2_b_st_, [18 x float]* @15, half 0xH4000, [18 x i8]* %t10)

They both register this in c2_foo as:
...
  %c2_in2_ = alloca half
  store half %c2_in2, half* %c2_in2_, align 1

When we compile them, they both send 0x40000000 to c2_foo (a single).
The Linux c2_foo() asm addresses this with a float2half conversion:
...
 mov qword ptr [rsp + 448], rdi
 mov qword ptr [rsp + 440], rsi
 mov qword ptr [rsp + 432], rdx
 mov qword ptr [rsp + 424], rcx
 movabs rcx, offset __gnu_f2h_ieee     # <---Convert Here
 mov qword ptr [rsp + 336], r8 # 8-byte Spill
 call rcx
 mov word ptr [rsp + 422], ax
 mov rcx, qword ptr [rsp + 336] # 8-byte Reload
 mov qword ptr [rsp + 408], rcx
 mov qword ptr [rsp + 392], 0
 mov qword ptr [rsp + 384], 0
 mov qword ptr [rsp + 376], 0
 mov qword ptr [rsp + 368], 0
 mov rdx, qword ptr [rsp + 432]
 mov qword ptr [rsp + 360], rdx
 mov rdx, qword ptr [rsp + 432]
 mov rdx, qword ptr [rdx + 8]
 mov qword ptr [rsp + 352], rdx
 mov rdx, qword ptr [rsp + 440]
 mov rdx, qword ptr [rdx + 56]
 mov qword ptr [rsp + 344], rdx
 mov dword ptr [rsp + 400], 0
 jmp .LBB9_9

The Windows c2_foo() asm is missing this conversion but treats the value as if
it has been converted.
...
 mov rax, qword ptr [rsp + 424]
 movss xmm0, dword ptr [rsp + 416] # xmm0 = mem[0],zero,zero,zero  # <-- moves the data like it wants to convert but never does
 mov qword ptr [rsp + 344], rcx
 mov qword ptr [rsp + 336], rdx
 mov qword ptr [rsp + 328], r8
 mov qword ptr [rsp + 320], r9
 mov qword ptr [rsp + 304], 0
 mov qword ptr [rsp + 296], 0
 mov qword ptr [rsp + 288], 0
 mov qword ptr [rsp + 280], 0
 mov rcx, qword ptr [rsp + 328]
 mov qword ptr [rsp + 272], rcx
 mov rcx, qword ptr [rsp + 328]
 mov rcx, qword ptr [rcx + 8]
 mov qword ptr [rsp + 264], rcx
 mov rcx, qword ptr [rsp + 336]
 mov rcx, qword ptr [rcx + 56]
 mov qword ptr [rsp + 256], rcx
 mov dword ptr [rsp + 312], 0
 mov qword ptr [rsp + 248], rax # 8-byte Spill
 movss dword ptr




________________________________
From: Wang, Pengfei <pengfei.wang at intel.com>
Sent: Friday, March 5, 2021 7:30 AM
To: Sjoerd Meijer <Sjoerd.Meijer at arm.com>; Jason Hafer <jhafer at mathworks.com>
Cc: llvm-dev <llvm-dev at lists.llvm.org>
Subject: RE: Is it legal to pass a half by value on x86_64?


I guess it's designed for language portability. You can use this type across
different platforms. Nevertheless, I'm not a FE expert, so I cannot think
of other intentions.

The _Float16 is a primitive type in the latest x86 ABI, but there's no X86
target that supports it yet. So you cannot use it on X86 for now. I think
that's the difference from __fp16 and why one should use it.

We also have some discussion here. https://reviews.llvm.org/D97318



Thanks

Pengfei



From: Sjoerd Meijer <Sjoerd.Meijer at arm.com>
Sent: Friday, March 5, 2021 5:49 PM
To: Jason Hafer <jhafer at mathworks.com>; Wang, Pengfei <pengfei.wang at intel.com>
Cc: llvm-dev <llvm-dev at lists.llvm.org>
Subject: Re: Is it legal to pass a half by value on x86_64?



> __fp16 is a pure storage format. You cannot pass it by value, because only
> types permitted by the ABI <https://gitlab.com/x86-psABIs/x86-64-ABI> can be
> passed by value, and __fp16 is not one of them.

Yep. Any specific reason to use a pure storage format? The native type is
_Float16 and would give some benefits, but this is not yet supported on x86, see
also:



https://clang.llvm.org/docs/LanguageExtensions.html#half-precision-floating-point



Cheers,
Sjoerd.

________________________________

From: llvm-dev <llvm-dev-bounces at lists.llvm.org> on behalf of Wang, Pengfei via llvm-dev <llvm-dev at lists.llvm.org>
Sent: 05 March 2021 06:28
To: Jason Hafer <jhafer at mathworks.com>
Cc: llvm-dev <llvm-dev at lists.llvm.org>
Subject: Re: [llvm-dev] Is it legal to pass a half by value on x86_64?



Hi Jason,



__fp16 is a pure storage format. You cannot pass it by value, because only
types permitted by the ABI <https://gitlab.com/x86-psABIs/x86-64-ABI> can be
passed by value, and __fp16 is not one of them.



  *   if "define void @foo(i8, i8, i8, i8, half)" is even legal to use

half, as a target-independent type, is legal in LLVM IR. It's not a legal type
for a target that doesn't support it, like X86. The behavior depends on how we
lower it. But I don't know why there are differences between Linux and Windows.
Maybe because "__gnu_f2h_ieee" is a Linux-only function?



Thanks

Pengfei



From: llvm-dev <llvm-dev-bounces at lists.llvm.org> On Behalf Of Jason Hafer via llvm-dev
Sent: Friday, March 5, 2021 10:46 AM
To: llvm-dev at lists.llvm.org
Cc: Jason Hafer <jhafer at mathworks.com>
Subject: [llvm-dev] Is it legal to pass a half by value on x86_64?



Hello,



I am attempting to understand an anomaly I am seeing when dealing with half on
Windows and could use some help.



Using LLVM 8 or 10, if I have IR of the flavor below:
define void @foo(i8, i8, i8, i8, half) {
  %6 = alloca half
  store half %4, half* %6, align 1
  ...
  ret void
}



Using x86_64-pc-linux, we convert the float passed in with __gnu_f2h_ieee.

Using x86_64-pc-windows I do not get the conversion, so we end up with incorrect
math operations.



While investigating I noticed clang gave me the error below:

error: parameters cannot have __fp16 type; did you forget * ?
void foo(int dc1, int dc2,int dc3,int dc4, __fp16 in)



So, this got me wondering if "define void @foo(i8, i8, i8, i8, half)"
is even legal to use or if I should rather pass by ref?  I have yet to find
documentation to convince me one way or the other.  Thus, I was hoping someone
here might be able to shed some light on the issue.



Thank you in advance!



Cheers,



JP

Craig Topper via llvm-dev

2021-Mar-05 18:23 UTC

[llvm-dev] Is it legal to pass a half by value on x86_64?

For this code the half store from the IR appears to have been removed
because it is a local variable that was never read from. The store that
says "4-byte Spill" is a different store and seems to be some -O0
artifact.
With -O2 the whole thing becomes just a ret.

define void @foo(i8, i8, i8, i8, half) {
; CHECK-I686:    callq __gnu_f2h_ieee

  %6 = alloca half
  store half %4, half* %6, align 1
  ret void
}

x86_64-pc-windows gives:
push rax
.seh_stackalloc 8
.seh_endprologue
movss xmm0, dword ptr [rsp + 48] # xmm0 = mem[0],zero,zero,zero
movss dword ptr [rsp + 4], xmm0 # 4-byte Spill
pop rax
ret
.seh_handlerdata
.text
.seh_endproc


As an experiment, I tried this, which does produce a call to __gnu_f2h_ieee
on Windows with LLVM 8.0 and LLVM 10.0:

define void @foo(half*, i8, i8, half) {
store half %3, half* %0, align 1
ret void
}


For the assembly you provided, I don't see any reads from xmm0, or any
word stores. So it's hard for me to determine what might be going wrong.
Can you provide the assembly where xmm0 is eventually used?

mov rax, qword ptr [rsp + 424]
 movss xmm0, dword ptr [rsp + 416] # xmm0 = mem[0],zero,zero,zero  # <-- moves the data like it wants to convert but never does
 mov qword ptr [rsp + 344], rcx
 mov qword ptr [rsp + 336], rdx
 mov qword ptr [rsp + 328], r8
 mov qword ptr [rsp + 320], r9
 mov qword ptr [rsp + 304], 0
 mov qword ptr [rsp + 296], 0
 mov qword ptr [rsp + 288], 0
 mov qword ptr [rsp + 280], 0
 mov rcx, qword ptr [rsp + 328]
 mov qword ptr [rsp + 272], rcx
 mov rcx, qword ptr [rsp + 328]
 mov rcx, qword ptr [rcx + 8]
 mov qword ptr [rsp + 264], rcx
 mov rcx, qword ptr [rsp + 336]
 mov rcx, qword ptr [rcx + 56]
 mov qword ptr [rsp + 256], rcx
 mov dword ptr [rsp + 312], 0
 mov qword ptr [rsp + 248], rax # 8-byte Spill
 movss dword ptr


~Craig


