thr3ads.net - llvm dev - [llvm-dev] Is it legal to pass a half by value on x86

If this information is useful, please help other people find it:
Share via:

Wang, Pengfei via llvm-dev

2021-Mar-05 12:30 UTC

[llvm-dev] Is it legal to pass a half by value on x86_64?

I guess it's designed for language portability. You can use this type across
different platforms. Nevertheless, I'm not a FE expert, so I cannot think
out other intentions.
The _Float16 is a primitive type in the latest x86 ABI, but there's no X86
target that supports it yet. So you cannot use it on X86 by now. I think
that's the difference from __fp16 and why should use it.
We also have some discussion here. https://reviews.llvm.org/D97318

Thanks
Pengfei

From: Sjoerd Meijer <Sjoerd.Meijer at arm.com>
Sent: Friday, March 5, 2021 5:49 PM
To: Jason Hafer <jhafer at mathworks.com>; Wang, Pengfei <pengfei.wang
at intel.com>
Cc: llvm-dev <llvm-dev at lists.llvm.org>
Subject: Re: Is it legal to pass a half by value on x86_64?

__fp16 is a pure storage format. You cannot pass it by value, because only
ABI<https://gitlab.com/x86-psABIs/x86-64-ABI> permissive types can be
passed by value while __fp16 is not one of them.
Yep. Any specific reason to use a pure storage format? The native type is
_Float16 and would give some benefits, but this is not yet supported on x86, see
also:


https://clang.llvm.org/docs/LanguageExtensions.html#half-precision-floating-point


Cheers,
Sjoerd.
________________________________
From: llvm-dev <llvm-dev-bounces at lists.llvm.org<mailto:llvm-dev-bounces
at lists.llvm.org>> on behalf of Wang, Pengfei via llvm-dev <llvm-dev
at lists.llvm.org<mailto:llvm-dev at lists.llvm.org>>
Sent: 05 March 2021 06:28
To: Jason Hafer <jhafer at mathworks.com<mailto:jhafer at
mathworks.com>>
Cc: llvm-dev <llvm-dev at lists.llvm.org<mailto:llvm-dev at
lists.llvm.org>>
Subject: Re: [llvm-dev] Is it legal to pass a half by value on x86_64?


Hi Jason,



__fp16 is a pure storage format. You cannot pass it by value, because only
ABI<https://gitlab.com/x86-psABIs/x86-64-ABI> permissive types can be
passed by value while __fp16 is not one of them.



  *   if "define void @foo(i8, i8, i8, i8, half) " is even legal to
use

half as a target independent type is legal for LLVM. It's not legal for
unsupported target like X86. The behavior depends on how we lowering it. But I
don't know why there's differences between Linux and Windows. Maybe
because "__gnu_f2h_ieee" is a Linux only function?



Thanks

Pengfei



From: llvm-dev <llvm-dev-bounces at lists.llvm.org<mailto:llvm-dev-bounces
at lists.llvm.org>> On Behalf Of Jason Hafer via llvm-dev
Sent: Friday, March 5, 2021 10:46 AM
To: llvm-dev at lists.llvm.org<mailto:llvm-dev at lists.llvm.org>
Cc: Jason Hafer <jhafer at mathworks.com<mailto:jhafer at
mathworks.com>>
Subject: [llvm-dev] Is it legal to pass a half by value on x86_64?



Hello,



I am attempting to understand an anomaly I am seeing when dealing with half on
Windows and could use some help.



Using LLVM 8 or 10, if I have IR of the flavor below:
define void @foo(i8, i8, i8, i8, half) {

  %6 = alloca half

  store half %4, half* %6, align 1

  ...

  ret void

}



Using x86_64-pc-linux, we convert the float passed in with __gnu_f2h_ieee.

Using x86_64-pc-windows I do not get the conversion, so we end up with incorrect
math operations.



While investigating I noticed clang gave me the error below:

error: parameters cannot have __fp16 type; did you forget * ?
void foo(int dc1, int dc2,int dc3,int dc4, __fp16 in)



So, this got me wondering if "define void @foo(i8, i8, i8, i8, half) "
is even legal to use or if I should rather pass by ref?  I have yet to find
documentation to convince me one way or the other.  Thus, I was hoping someone
here might be able to shed some light on the issue.



Thank you in advance!



Cheers,



JP
-------------- next part --------------
An HTML attachment was scrubbed...
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20210305/ba5ae36b/attachment.html>

Jason Hafer via llvm-dev

2021-Mar-05 14:21 UTC

head link

[llvm-dev] Is it legal to pass a half by value on x86_64?

Hi All,

Thank you very much for all the great information.  This is awesome!

To circle back on Craig's questions.
I did notice LLVM 11 behave very differently.

** Per: What does "incorrect math operations" mean?
The half is passed to the function as a float.  The function does operations
with other half numbers.  On Windows when we don't get the float to half
conversation the input is always truncated to 0.0.

** Per: "Do you have a more complete IR file for Windows that I can take a
look at?"
I can get you our IR if you want, but I think it is more convoluted than
required.  I was working on a unit test and I think all one needs to see the
anomaly is:

define void @foo(i8, i8, i8, i8, half) {
; CHECK-I686:    callq __gnu_f2h_ieee

  %6 = alloca half
  store half %4, half* %6, align 1
  ret void
}

x86_64-pc-windows gives:
push rax
.seh_stackalloc 8
.seh_endprologue
movss xmm0, dword ptr [rsp + 48] # xmm0 = mem[0],zero,zero,zero
movss dword ptr [rsp + 4], xmm0 # 4-byte Spill
pop rax
ret
.seh_handlerdata
.text
.seh_endproc

What I find extremely interesting is the behavior seems has something to do with
the stack?  For dropping the inputs by one then even Windows will generate the
conversion.

define void @foo(i8, i8, i8, half) {
; CHECK-I686:    callq __gnu_f2h_ieee

  %5 = alloca half
  store half %3, half* %5, align 1
  ret void
}

x86_64-pc-windows gives:
sub rsp, 40
.seh_stackalloc 40
.seh_endprologue
movabs rax, offset __gnu_f2h_ieee
movaps xmm0, xmm3
call rax
mov word ptr [rsp + 38], ax
add rsp, 40
ret
.seh_handlerdata
.text
.seh_endproc


** If interested, here is a dissection of our real asm.
For both Windows and Linux our IR calls c2_foo() with a half(2):
...
call void @c2_foo(i8* %S_6, [21 x i8*]* %ptr_gvar_instance_7, %emlrtStack*
%c2_b_st_, [18 x float]* @15, half 0xH4000, [18 x i8]* %t10)

They both register this in c2_foo as:
...
  %c2_in2_ = alloca half
  store half %c2_in2, half* %c2_in2_, align 1

When we compile them, they both send 0x40000000 to c2_foo (a single).
The Linux c2_foo() asm addresses this with a float2half conversion:
...
 mov qword ptr [rsp + 448], rdi
 mov qword ptr [rsp + 440], rsi
 mov qword ptr [rsp + 432], rdx
 mov qword ptr [rsp + 424], rcx
 movabs rcx, offset __gnu_f2h_ieee     # <---Convert Here
 mov qword ptr [rsp + 336], r8 # 8-byte Spill
 call rcx
 mov word ptr [rsp + 422], ax
 mov rcx, qword ptr [rsp + 336] # 8-byte Reload
 mov qword ptr [rsp + 408], rcx
 mov qword ptr [rsp + 392], 0
 mov qword ptr [rsp + 384], 0
 mov qword ptr [rsp + 376], 0
 mov qword ptr [rsp + 368], 0
 mov rdx, qword ptr [rsp + 432]
 mov qword ptr [rsp + 360], rdx
 mov rdx, qword ptr [rsp + 432]
 mov rdx, qword ptr [rdx + 8]
 mov qword ptr [rsp + 352], rdx
 mov rdx, qword ptr [rsp + 440]
 mov rdx, qword ptr [rdx + 56]
 mov qword ptr [rsp + 344], rdx
 mov dword ptr [rsp + 400], 0
 jmp .LBB9_9

The Windows c2_foo() asm is missing this conversion but treats the value as if
it has been converted.
...
 mov rax, qword ptr [rsp + 424]
 movss xmm0, dword ptr [rsp + 416] # xmm0 = mem[0],zero,zero,zero  # <--
moves the data like it wants to convert but never does
 mov qword ptr [rsp + 344], rcx
 mov qword ptr [rsp + 336], rdx
 mov qword ptr [rsp + 328], r8
 mov qword ptr [rsp + 320], r9
 mov qword ptr [rsp + 304], 0
 mov qword ptr [rsp + 296], 0
 mov qword ptr [rsp + 288], 0
 mov qword ptr [rsp + 280], 0
 mov rcx, qword ptr [rsp + 328]
 mov qword ptr [rsp + 272], rcx
 mov rcx, qword ptr [rsp + 328]
 mov rcx, qword ptr [rcx + 8]
 mov qword ptr [rsp + 264], rcx
 mov rcx, qword ptr [rsp + 336]
 mov rcx, qword ptr [rcx + 56]
 mov qword ptr [rsp + 256], rcx
 mov dword ptr [rsp + 312], 0
 mov qword ptr [rsp + 248], rax # 8-byte Spill
 movss dword ptr





________________________________
From: Wang, Pengfei <pengfei.wang at intel.com>
Sent: Friday, March 5, 2021 7:30 AM
To: Sjoerd Meijer <Sjoerd.Meijer at arm.com>; Jason Hafer <jhafer at
mathworks.com>
Cc: llvm-dev <llvm-dev at lists.llvm.org>
Subject: RE: Is it legal to pass a half by value on x86_64?


I guess it’s designed for language portability. You can use this type across
different platforms. Nevertheless, I’m not a FE expert, so I cannot think out
other intentions.

The _Float16 is a primitive type in the latest x86 ABI, but there’s no X86
target that supports it yet. So you cannot use it on X86 by now. I think that’s
the difference from __fp16 and why should use it.

We also have some discussion here.
https://reviews.llvm.org/D97318<https://reviews.llvm.org/D97318>



Thanks

Pengfei



From: Sjoerd Meijer <Sjoerd.Meijer at arm.com>
Sent: Friday, March 5, 2021 5:49 PM
To: Jason Hafer <jhafer at mathworks.com>; Wang, Pengfei <pengfei.wang
at intel.com>
Cc: llvm-dev <llvm-dev at lists.llvm.org>
Subject: Re: Is it legal to pass a half by value on x86_64?



__fp16 is a pure storage format. You cannot pass it by value, because only
ABI<https://gitlab.com/x86-psABIs/x86-64-ABI> permissive types can be
passed by value while __fp16 is not one of them.

Yep. Any specific reason to use a pure storage format? The native type is
_Float16 and would give some benefits, but this is not yet supported on x86, see
also:



https://clang.llvm.org/docs/LanguageExtensions.html#half-precision-floating-point<https://clang.llvm.org/docs/LanguageExtensions.html#half-precision-floating-point>




Cheers,
Sjoerd.

________________________________

From: llvm-dev <llvm-dev-bounces at lists.llvm.org<mailto:llvm-dev-bounces
at lists.llvm.org>> on behalf of Wang, Pengfei via llvm-dev <llvm-dev
at lists.llvm.org<mailto:llvm-dev at lists.llvm.org>>
Sent: 05 March 2021 06:28
To: Jason Hafer <jhafer at mathworks.com<mailto:jhafer at
mathworks.com>>
Cc: llvm-dev <llvm-dev at lists.llvm.org<mailto:llvm-dev at
lists.llvm.org>>
Subject: Re: [llvm-dev] Is it legal to pass a half by value on x86_64?



Hi Jason,



__fp16 is a pure storage format. You cannot pass it by value, because only
ABI<https://gitlab.com/x86-psABIs/x86-64-ABI> permissive types can be
passed by value while __fp16 is not one of them.



  *   if "define void @foo(i8, i8, i8, i8, half) " is even legal to
use

half as a target independent type is legal for LLVM. It’s not legal for
unsupported target like X86. The behavior depends on how we lowering it. But I
don’t know why there’s differences between Linux and Windows. Maybe because
“__gnu_f2h_ieee” is a Linux only function?



Thanks

Pengfei



From: llvm-dev <llvm-dev-bounces at lists.llvm.org<mailto:llvm-dev-bounces
at lists.llvm.org>> On Behalf Of Jason Hafer via llvm-dev
Sent: Friday, March 5, 2021 10:46 AM
To: llvm-dev at lists.llvm.org<mailto:llvm-dev at lists.llvm.org>
Cc: Jason Hafer <jhafer at mathworks.com<mailto:jhafer at
mathworks.com>>
Subject: [llvm-dev] Is it legal to pass a half by value on x86_64?



Hello,



I am attempting to understand an anomaly I am seeing when dealing with half on
Windows and could use some help.



Using LLVM 8 or 10, if I have IR of the flavor below:
define void @foo(i8, i8, i8, i8, half) {

  %6 = alloca half

  store half %4, half* %6, align 1

  ...

  ret void

}



Using x86_64-pc-linux, we convert the float passed in with __gnu_f2h_ieee.

Using x86_64-pc-windows I do not get the conversion, so we end up with incorrect
math operations.



While investigating I noticed clang gave me the error below:

error: parameters cannot have __fp16 type; did you forget * ?
void foo(int dc1, int dc2,int dc3,int dc4, __fp16 in)



So, this got me wondering if "define void @foo(i8, i8, i8, i8, half) "
is even legal to use or if I should rather pass by ref?  I have yet to find
documentation to convince me one way or the other.  Thus, I was hoping someone
here might be able to shed some light on the issue.



Thank you in advance!



Cheers,



JP
-------------- next part --------------
An HTML attachment was scrubbed...
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20210305/1785e67b/attachment-0001.html>

llvm dev - Mar 2021 - Is it legal to pass a half by value on x86_64?

[llvm-dev] Is it legal to pass a half by value on x86_64?

[llvm-dev] Is it legal to pass a half by value on x86_64?