Wang, Pengfei via llvm-dev
2021-Mar-05 14:46 UTC
[llvm-dev] Is it legal to pass a half by value on x86_64?
Hi Jason,

The different behavior between Linux and Windows comes from the difference in the calling convention. Windows uses 4 registers for argument passing, whereas Linux uses 6.

https://docs.microsoft.com/en-us/cpp/build/x64-calling-convention?view=msvc-160#parameter-passing

Thanks
Pengfei

From: llvm-dev <llvm-dev-bounces at lists.llvm.org> On Behalf Of Jason Hafer via llvm-dev
Sent: Friday, March 5, 2021 10:21 PM
To: Craig Topper <craig.topper at gmail.com>
Cc: llvm-dev at lists.llvm.org
Subject: Re: [llvm-dev] Is it legal to pass a half by value on x86_64?

Hi All,

Thank you very much for all the great information. This is awesome!

To circle back on Craig's questions.

I did notice LLVM 11 behaves very differently.

** Per: What does "incorrect math operations" mean?
The half is passed to the function as a float. The function does operations with other half numbers. On Windows, when we don't get the float-to-half conversion, the input is always truncated to 0.0.

** Per: "Do you have a more complete IR file for Windows that I can take a look at?"
I can get you our IR if you want, but I think it is more convoluted than required. I was working on a unit test and I think all one needs to see the anomaly is:

    define void @foo(i8, i8, i8, i8, half) {
      ; CHECK-I686: callq __gnu_f2h_ieee
      %6 = alloca half
      store half %4, half* %6, align 1
      ret void
    }

x86_64-pc-windows gives:

    push rax
    .seh_stackalloc 8
    .seh_endprologue
    movss xmm0, dword ptr [rsp + 48]   # xmm0 = mem[0],zero,zero,zero
    movss dword ptr [rsp + 4], xmm0    # 4-byte Spill
    pop rax
    ret
    .seh_handlerdata
    .text
    .seh_endproc

What I find extremely interesting is that the behavior seems to have something to do with the stack. If I drop the number of inputs by one, then even Windows generates the conversion:

    define void @foo(i8, i8, i8, half) {
      ; CHECK-I686: callq __gnu_f2h_ieee
      %5 = alloca half
      store half %3, half* %5, align 1
      ret void
    }

x86_64-pc-windows gives:

    sub rsp, 40
    .seh_stackalloc 40
    .seh_endprologue
    movabs rax, offset __gnu_f2h_ieee
    movaps xmm0, xmm3
    call rax
    mov word ptr [rsp + 38], ax
    add rsp, 40
    ret
    .seh_handlerdata
    .text
    .seh_endproc

** If interested, here is a dissection of our real asm.
For both Windows and Linux our IR calls c2_foo() with a half(2):

    ...
    call void @c2_foo(i8* %S_6, [21 x i8*]* %ptr_gvar_instance_7, %emlrtStack* %c2_b_st_, [18 x float]* @15, half 0xH4000, [18 x i8]* %t10)

They both register this in c2_foo as:

    ...
    %c2_in2_ = alloca half
    store half %c2_in2, half* %c2_in2_, align 1

When we compile them, they both send 0x40000000 to c2_foo (a single).

The Linux c2_foo() asm addresses this with a float2half conversion:

    ...
    mov qword ptr [rsp + 448], rdi
    mov qword ptr [rsp + 440], rsi
    mov qword ptr [rsp + 432], rdx
    mov qword ptr [rsp + 424], rcx
    movabs rcx, offset __gnu_f2h_ieee  # <---Convert Here
    mov qword ptr [rsp + 336], r8      # 8-byte Spill
    call rcx
    mov word ptr [rsp + 422], ax
    mov rcx, qword ptr [rsp + 336]     # 8-byte Reload
    mov qword ptr [rsp + 408], rcx
    mov qword ptr [rsp + 392], 0
    mov qword ptr [rsp + 384], 0
    mov qword ptr [rsp + 376], 0
    mov qword ptr [rsp + 368], 0
    mov rdx, qword ptr [rsp + 432]
    mov qword ptr [rsp + 360], rdx
    mov rdx, qword ptr [rsp + 432]
    mov rdx, qword ptr [rdx + 8]
    mov qword ptr [rsp + 352], rdx
    mov rdx, qword ptr [rsp + 440]
    mov rdx, qword ptr [rdx + 56]
    mov qword ptr [rsp + 344], rdx
    mov dword ptr [rsp + 400], 0
    jmp .LBB9_9

The Windows c2_foo() asm is missing this conversion but treats the value as if it has been converted.

    ...
    mov rax, qword ptr [rsp + 424]
    movss xmm0, dword ptr [rsp + 416]  # xmm0 = mem[0],zero,zero,zero  # <-- moves the data like it wants to convert but never does
    mov qword ptr [rsp + 344], rcx
    mov qword ptr [rsp + 336], rdx
    mov qword ptr [rsp + 328], r8
    mov qword ptr [rsp + 320], r9
    mov qword ptr [rsp + 304], 0
    mov qword ptr [rsp + 296], 0
    mov qword ptr [rsp + 288], 0
    mov qword ptr [rsp + 280], 0
    mov rcx, qword ptr [rsp + 328]
    mov qword ptr [rsp + 272], rcx
    mov rcx, qword ptr [rsp + 328]
    mov rcx, qword ptr [rcx + 8]
    mov qword ptr [rsp + 264], rcx
    mov rcx, qword ptr [rsp + 336]
    mov rcx, qword ptr [rcx + 56]
    mov qword ptr [rsp + 256], rcx
    mov dword ptr [rsp + 312], 0
    mov qword ptr [rsp + 248], rax     # 8-byte Spill
    movss dword ptr

________________________________
From: Wang, Pengfei <pengfei.wang at intel.com>
Sent: Friday, March 5, 2021 7:30 AM
To: Sjoerd Meijer <Sjoerd.Meijer at arm.com>; Jason Hafer <jhafer at mathworks.com>
Cc: llvm-dev <llvm-dev at lists.llvm.org>
Subject: RE: Is it legal to pass a half by value on x86_64?

I guess it's designed for language portability. You can use this type across different platforms. Nevertheless, I'm not a front-end expert, so I cannot think of other intentions.

The _Float16 is a primitive type in the latest x86 ABI, but there's no X86 target that supports it yet. So you cannot use it on X86 for now. I think that's the difference from __fp16 and why one should use it.

We also have some discussion here:
https://reviews.llvm.org/D97318

Thanks
Pengfei

From: Sjoerd Meijer <Sjoerd.Meijer at arm.com>
Sent: Friday, March 5, 2021 5:49 PM
To: Jason Hafer <jhafer at mathworks.com>; Wang, Pengfei <pengfei.wang at intel.com>
Cc: llvm-dev <llvm-dev at lists.llvm.org>
Subject: Re: Is it legal to pass a half by value on x86_64?

> __fp16 is a pure storage format. You cannot pass it by value, because only
> types permitted by the ABI <https://gitlab.com/x86-psABIs/x86-64-ABI> can be
> passed by value, and __fp16 is not one of them.

Yep. Any specific reason to use a pure storage format? The native type is _Float16 and would give some benefits, but this is not yet supported on x86, see also:

https://clang.llvm.org/docs/LanguageExtensions.html#half-precision-floating-point

Cheers,
Sjoerd.

________________________________
From: llvm-dev <llvm-dev-bounces at lists.llvm.org> on behalf of Wang, Pengfei via llvm-dev <llvm-dev at lists.llvm.org>
Sent: 05 March 2021 06:28
To: Jason Hafer <jhafer at mathworks.com>
Cc: llvm-dev <llvm-dev at lists.llvm.org>
Subject: Re: [llvm-dev] Is it legal to pass a half by value on x86_64?

Hi Jason,

__fp16 is a pure storage format. You cannot pass it by value, because only types permitted by the ABI <https://gitlab.com/x86-psABIs/x86-64-ABI> can be passed by value, and __fp16 is not one of them.

> if "define void @foo(i8, i8, i8, i8, half)" is even legal to use

half, as a target-independent type, is legal for LLVM. It's not legal for unsupported targets like X86. The behavior depends on how we lower it. But I don't know why there are differences between Linux and Windows. Maybe because "__gnu_f2h_ieee" is a Linux-only function?

Thanks
Pengfei

From: llvm-dev <llvm-dev-bounces at lists.llvm.org> On Behalf Of Jason Hafer via llvm-dev
Sent: Friday, March 5, 2021 10:46 AM
To: llvm-dev at lists.llvm.org
Cc: Jason Hafer <jhafer at mathworks.com>
Subject: [llvm-dev] Is it legal to pass a half by value on x86_64?

Hello,

I am attempting to understand an anomaly I am seeing when dealing with half on Windows and could use some help.

Using LLVM 8 or 10, if I have IR of the flavor below:

    define void @foo(i8, i8, i8, i8, half) {
      %6 = alloca half
      store half %4, half* %6, align 1
      ...
      ret void
    }

Using x86_64-pc-linux, we convert the float passed in with __gnu_f2h_ieee.
Using x86_64-pc-windows I do not get the conversion, so we end up with incorrect math operations.

While investigating I noticed clang gave me the error below:

    error: parameters cannot have __fp16 type; did you forget * ?
    void foo(int dc1, int dc2,int dc3,int dc4, __fp16 in)

So, this got me wondering if "define void @foo(i8, i8, i8, i8, half)" is even legal to use or if I should rather pass by ref? I have yet to find documentation to convince me one way or the other. Thus, I was hoping someone here might be able to shed some light on the issue.

Thank you in advance!

Cheers,
JP
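To make the calling-convention point in this message concrete, here is a minimal Python sketch of where each convention places scalar arguments. It is a simplified model, not LLVM's lowering code: the function names (win64_locations, sysv_locations) and the "int"/"fp" labels are made up for illustration, and aggregates, varargs, and vector types are ignored. It assumes the half is passed as a float, as described in the thread.

```python
# Simplified model of x86-64 scalar argument placement.
# Assumption: only scalar integer ("int") and floating-point ("fp") arguments.
# All names here are hypothetical helpers, not part of LLVM or any ABI tool.

WIN64_INT = ["rcx", "rdx", "r8", "r9"]
WIN64_FP = ["xmm0", "xmm1", "xmm2", "xmm3"]
SYSV_INT = ["rdi", "rsi", "rdx", "rcx", "r8", "r9"]
SYSV_FP = ["xmm%d" % i for i in range(8)]


def win64_locations(kinds):
    """Win64: the first four arguments share four positional slots."""
    out = []
    for pos, kind in enumerate(kinds):
        if pos < 4:
            regs = WIN64_FP if kind == "fp" else WIN64_INT
            out.append(regs[pos])
        else:
            out.append("stack")
    return out


def sysv_locations(kinds):
    """System V: integer and FP arguments are counted separately."""
    out, n_int, n_fp = [], 0, 0
    for kind in kinds:
        if kind == "fp" and n_fp < len(SYSV_FP):
            out.append(SYSV_FP[n_fp])
            n_fp += 1
        elif kind == "int" and n_int < len(SYSV_INT):
            out.append(SYSV_INT[n_int])
            n_int += 1
        else:
            out.append("stack")
    return out


# foo(i8, i8, i8, i8, half) and foo(i8, i8, i8, half) from the thread
five_args = ["int"] * 4 + ["fp"]
four_args = ["int"] * 3 + ["fp"]
print("win64, 5 args:", win64_locations(five_args))
print("sysv,  5 args:", sysv_locations(five_args))
print("win64, 4 args:", win64_locations(four_args))
```

In this model the fifth argument lands on the stack under Win64 and in xmm0 under System V, while the four-argument variant puts it in xmm3 on Windows. That is consistent with the "movss xmm0, dword ptr [rsp + 48]" load and the "movaps xmm0, xmm3" move in the listings above. The sketch only shows where the value lives; it does not explain why the conversion call is missing in the stack case.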
Craig Topper via llvm-dev
2021-Mar-05 18:23 UTC
[llvm-dev] Is it legal to pass a half by value on x86_64?
For this code the half store from the IR appears to have been removed because it stores to a local variable that is never read. The store that says "4-byte Spill" is a different store and seems to be some -O0 artifact. With -O2 the whole thing becomes just a ret.

    define void @foo(i8, i8, i8, i8, half) {
      ; CHECK-I686: callq __gnu_f2h_ieee
      %6 = alloca half
      store half %4, half* %6, align 1
      ret void
    }

x86_64-pc-windows gives:

    push rax
    .seh_stackalloc 8
    .seh_endprologue
    movss xmm0, dword ptr [rsp + 48]   # xmm0 = mem[0],zero,zero,zero
    movss dword ptr [rsp + 4], xmm0    # 4-byte Spill
    pop rax
    ret
    .seh_handlerdata
    .text
    .seh_endproc

As an experiment, I tried this, which does produce a call to __gnu_f2h_ieee on Windows with LLVM 8.0 and LLVM 10.0:

    define void @foo(half*, i8, i8, half) {
      store half %3, half* %0, align 1
      ret void
    }

For the assembly you provided, I don't see any reads from xmm0, or any word stores. So it's hard for me to determine what might be going wrong. Can you provide the assembly where xmm0 is eventually used?

    mov rax, qword ptr [rsp + 424]
    movss xmm0, dword ptr [rsp + 416]  # xmm0 = mem[0],zero,zero,zero  # <-- moves the data like it wants to convert but never does
    mov qword ptr [rsp + 344], rcx
    mov qword ptr [rsp + 336], rdx
    mov qword ptr [rsp + 328], r8
    mov qword ptr [rsp + 320], r9
    mov qword ptr [rsp + 304], 0
    mov qword ptr [rsp + 296], 0
    mov qword ptr [rsp + 288], 0
    mov qword ptr [rsp + 280], 0
    mov rcx, qword ptr [rsp + 328]
    mov qword ptr [rsp + 272], rcx
    mov rcx, qword ptr [rsp + 328]
    mov rcx, qword ptr [rcx + 8]
    mov qword ptr [rsp + 264], rcx
    mov rcx, qword ptr [rsp + 336]
    mov rcx, qword ptr [rcx + 56]
    mov qword ptr [rsp + 256], rcx
    mov dword ptr [rsp + 312], 0
    mov qword ptr [rsp + 248], rax     # 8-byte Spill
    movss dword ptr

~Craig
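For readers following the bit patterns mentioned in the thread ("half 0xH4000" in the IR versus 0x40000000 arriving as a single), the following Python sketch shows how the two encodings of 2.0 relate. The helper low16_as_half is a hypothetical model of what could happen if the low 16 bits of the float were read as a half without conversion; it is one possible reading of the "truncated to 0.0" symptom, not a confirmed diagnosis of the Windows code.

```python
import struct


def float_bits(x):
    """IEEE-754 binary32 bit pattern of x as an unsigned integer."""
    return struct.unpack("<I", struct.pack("<f", x))[0]


def half_bits(x):
    """IEEE-754 binary16 bit pattern of x after a proper conversion."""
    return struct.unpack("<H", struct.pack("<e", x))[0]


def low16_as_half(x):
    """Hypothetical: reinterpret the low 16 bits of the binary32 pattern
    as a half, i.e. what a missing float-to-half conversion might yield."""
    low = float_bits(x) & 0xFFFF
    return struct.unpack("<e", struct.pack("<H", low))[0]


print(hex(float_bits(2.0)))   # the single-precision encoding of 2.0
print(hex(half_bits(2.0)))    # the half-precision encoding of 2.0
print(low16_as_half(2.0))     # value seen if no conversion takes place
```

Under this assumption, any float whose low 16 mantissa bits are zero (such as 2.0) would read back as 0.0, which matches the reported symptom for that input. Whether the generated code actually reads the value this way would need the assembly Craig asks for.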