H.J. Lu via llvm-dev
2021-Jul-13 14:26 UTC
[llvm-dev] [PATCH] Add optional _Float16 support
On Mon, Jul 12, 2021 at 8:59 PM Wang, Pengfei <pengfei.wang at intel.com> wrote:
>
> > Return _Float16 and _Complex _Float16 values in %xmm0/%xmm1 registers.
>
> Can you please explain the behavior here? Is there difference between _Float16 and _Complex _Float16 when return? I.e.,
> 1, In which case will _Float16 values return in both %xmm0 and %xmm1?
> 2, For a single _Float16 value, are both real part and imaginary part returned in %xmm0? Or returned in %xmm0 and %xmm1 respectively?

Here is the v2 patch to add the missing _Float16 bits. The PDF file is at

https://gitlab.com/x86-psABIs/i386-ABI/-/wikis/Intel386-psABI

> Thanks
> Pengfei
>
> -----Original Message-----
> From: llvm-dev <llvm-dev-bounces at lists.llvm.org> On Behalf Of H.J. Lu via llvm-dev
> Sent: Friday, July 2, 2021 6:28 AM
> To: Joseph Myers <joseph at codesourcery.com>
> Cc: llvm-dev at lists.llvm.org; GCC Patches <gcc-patches at gcc.gnu.org>; GNU C Library <libc-alpha at sourceware.org>; IA32 System V Application Binary Interface <ia32-abi at googlegroups.com>
> Subject: Re: [llvm-dev] [PATCH] Add optional _Float16 support
>
> On Thu, Jul 1, 2021 at 3:10 PM Joseph Myers <joseph at codesourcery.com> wrote:
> >
> > On Thu, 1 Jul 2021, H.J. Lu via Gcc-patches wrote:
> >
> > > 2. Return _Float16 and _Complex _Float16 values in %xmm0/%xmm1 registers.
> >
> > That restricts use of _Float16 to processors with SSE. Is that what
> > we want in the ABI, or should _Float16 be available with base 32-bit
> > x86 architecture features only, much like _Float128 and the decimal FP
> > types
>
> Yes, _Float16 requires XMM registers.
>
> > are? (If it is restricted to SSE, we can of course ensure relevant
> > libgcc functions are built with SSE enabled, and likewise in glibc if
> > that gains _Float16 functions, though maybe with some extra
> > complications to get relevant testcases to run whenever possible.)
> >
>
> _Float16 functions in libgcc should be compiled with SSE enabled.
>
> BTW, _Float16 software emulation may require more than just SSE since we need to do _Float16 load and store with XMM registers.
> There is no 16bit load/store for XMM registers without AVX512FP16.
>
> --
> H.J.

--
H.J.
-------------- next part --------------
A non-text attachment was scrubbed...
Name: v2-0001-Add-optional-_Float16-support.patch
Type: text/x-patch
Size: 7316 bytes
Desc: not available
URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20210713/ff0df3f1/attachment.bin>
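The remark above about software emulation can be illustrated away from any particular register file. The following is only a sketch, assuming _Float16 uses the IEEE 754 binary16 encoding (1 sign bit, 5 exponent bits, 10 fraction bits). It treats the value as a plain 16-bit integer, which is how an emulation path has to move it when no 16-bit floating-point load/store is available, and decodes it in software. The function name `half_bits_to_float` is hypothetical; this is not the libgcc implementation.

```python
import math
import struct


def half_bits_to_float(bits: int) -> float:
    """Decode a 16-bit integer holding an IEEE 754 binary16 pattern."""
    sign = -1.0 if (bits >> 15) & 1 else 1.0
    exponent = (bits >> 10) & 0x1F  # 5-bit biased exponent (bias 15)
    fraction = bits & 0x3FF         # 10-bit fraction

    if exponent == 0:
        # Zero and subnormals: fraction * 2^-10 * 2^-14.
        return sign * fraction * 2.0 ** -24
    if exponent == 0x1F:
        # All-ones exponent encodes infinity or NaN.
        return sign * math.inf if fraction == 0 else math.nan
    # Normal numbers carry an implicit leading 1.
    return sign * (1.0 + fraction / 1024.0) * 2.0 ** (exponent - 15)


def reference_decode(bits: int) -> float:
    """Cross-check using Python's built-in half-precision 'e' format."""
    return struct.unpack("<e", struct.pack("<H", bits))[0]


print(half_bits_to_float(0x3C00))  # 1.0
print(half_bits_to_float(0xC000))  # -2.0
```

The value only ever travels as a 16-bit integer here; the floating-point interpretation is applied afterwards, which is the separation an emulation routine needs when the hardware offers no direct half-precision move.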
Wang, Pengfei via llvm-dev
2021-Jul-13 14:48 UTC
[llvm-dev] [PATCH] Add optional _Float16 support
Hi H.J.,

Our LLVM implementation currently uses %xmm0 for both the real part and the imaginary part of a _Complex _Float16 value. Is there a special reason to use two registers? We use one register on X64. Considering performance, especially register pressure, would it be better to use one register for _Complex _Float16 on 32-bit targets?

Thanks
Pengfei
Joseph Myers via llvm-dev
2021-Jul-13 15:41 UTC
[llvm-dev] [PATCH] Add optional _Float16 support
On Tue, 13 Jul 2021, H.J. Lu wrote:

> On Mon, Jul 12, 2021 at 8:59 PM Wang, Pengfei <pengfei.wang at intel.com> wrote:
> >
> > > Return _Float16 and _Complex _Float16 values in %xmm0/%xmm1 registers.
> >
> > Can you please explain the behavior here? Is there difference between _Float16 and _Complex _Float16 when return? I.e.,
> > 1, In which case will _Float16 values return in both %xmm0 and %xmm1?
> > 2, For a single _Float16 value, are both real part and imaginary part returned in %xmm0? Or returned in %xmm0 and %xmm1 respectively?
>
> Here is the v2 patch to add the missing _Float16 bits. The PDF file is at
>
> https://gitlab.com/x86-psABIs/i386-ABI/-/wikis/Intel386-psABI

This PDF shows _Complex _Float16 as having a size of 2 bytes (should be 4-byte size, 2-byte alignment). It also seems to change double from 4-byte to 8-byte alignment, which is wrong. And it's inconsistent about whether it covers the long double = double (Android) case - it shows that case for _Complex long double but not for long double itself.

--
Joseph S. Myers
joseph at codesourcery.com
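The size and alignment expected for _Complex _Float16 follow from its being two consecutive 16-bit parts. As a rough illustration only, the sketch below models that layout with Python's ctypes. The class name `ComplexFloat16` is hypothetical, and because ctypes has no half-precision type, each part is assumed to be stood in for by an unsigned 16-bit integer. It reports the layout of that stand-in on the host and does not query any compiler's ABI.

```python
import ctypes


class ComplexFloat16(ctypes.Structure):
    # Hypothetical stand-in for _Complex _Float16: two 16-bit parts,
    # real first, then imaginary. c_uint16 replaces the half-precision
    # type that ctypes does not provide.
    _fields_ = [
        ("real", ctypes.c_uint16),
        ("imag", ctypes.c_uint16),
    ]


size = ctypes.sizeof(ComplexFloat16)
alignment = ctypes.alignment(ComplexFloat16)

print(f"size = {size} bytes, alignment = {alignment} bytes")
```

Two 2-byte members with no padding give a 4-byte object aligned like its members, which matches the 4-byte size and 2-byte alignment called for in the message above.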