thr3ads.net - llvm dev - [llvm-dev] Strange types on x86 vcvtph2ps and vcvtps2ph intrinsics [Sep 2015]


Dan Liew via llvm-dev

2015-Sep-08 23:48 UTC

[llvm-dev] Strange types on x86 vcvtph2ps and vcvtps2ph intrinsics

Hi,

I was looking at the x86 vector intrinsics for converting half
precision floating point numbers and I'm a bit confused as to why
certain types were chosen. I've gone ahead and used their current
definition with success but I'd like to understand why the types used
with these intrinsics are done this way.

For reference see ``include/llvm/IR/IntrinsicsX86.td``. Here are the
intrinsics of interest.

```
let TargetPrefix = "x86" in {  // All intrinsics start with "llvm.x86.".
  def int_x86_vcvtph2ps_128 : GCCBuiltin<"__builtin_ia32_vcvtph2ps">,
              Intrinsic<[llvm_v4f32_ty], [llvm_v8i16_ty], [IntrNoMem]>;
  def int_x86_vcvtph2ps_256 : GCCBuiltin<"__builtin_ia32_vcvtph2ps256">,
              Intrinsic<[llvm_v8f32_ty], [llvm_v8i16_ty], [IntrNoMem]>;
  def int_x86_vcvtps2ph_128 : GCCBuiltin<"__builtin_ia32_vcvtps2ph">,
              Intrinsic<[llvm_v8i16_ty], [llvm_v4f32_ty, llvm_i32_ty],
                        [IntrNoMem]>;
  def int_x86_vcvtps2ph_256 : GCCBuiltin<"__builtin_ia32_vcvtps2ph256">,
              Intrinsic<[llvm_v8i16_ty], [llvm_v8f32_ty, llvm_i32_ty],
                        [IntrNoMem]>;
```

Here's what seems weird to me:

* For the 4 wide intrinsics (``_128`` suffix) some of the types are
wider than they need to be. For example ``int_x86_vcvtph2ps_128``
takes <8 x i16> as an argument but this intrinsic only uses the first
four lanes so why is the argument type not <4 x i16>?
``int_x86_vcvtps2ph_128`` also has the same oddity but on its return
type (returns <8 x i16> but only the first four are relevant).

* The use of ``i16`` types also seems a little strange given that the
more semantically correct ``f16`` type and vectorized forms (e.g.
``llvm_v4f16_ty``) are available. Sure, I can use a bitcast with the
intrinsics to get the type I want in the IR, but why was ``i16``
chosen over ``f16``?

Any ideas?

Thanks,
Dan.

Ahmed Bougacha via llvm-dev

2015-Sep-08 23:58 UTC

Hi Dan,

On Tue, Sep 8, 2015 at 4:48 PM, Dan Liew via llvm-dev
<llvm-dev at lists.llvm.org> wrote:
> Hi,
>
> I was looking at the x86 vector intrinsics for converting half
> precision floating point numbers and I'm a bit confused as to why
> certain types were chosen. I've gone ahead and used their current
> definition with success but I'd like to understand why the types used
> with these intrinsics are done this way.
>
> For reference see ``include/llvm/IR/IntrinsicsX86.td``. Here are the
> intrinsics of interest.
>
> ```
> let TargetPrefix = "x86" in {  // All intrinsics start with "llvm.x86.".
>   def int_x86_vcvtph2ps_128 : GCCBuiltin<"__builtin_ia32_vcvtph2ps">,
>               Intrinsic<[llvm_v4f32_ty], [llvm_v8i16_ty], [IntrNoMem]>;
>   def int_x86_vcvtph2ps_256 : GCCBuiltin<"__builtin_ia32_vcvtph2ps256">,
>               Intrinsic<[llvm_v8f32_ty], [llvm_v8i16_ty], [IntrNoMem]>;
>   def int_x86_vcvtps2ph_128 : GCCBuiltin<"__builtin_ia32_vcvtps2ph">,
>               Intrinsic<[llvm_v8i16_ty], [llvm_v4f32_ty, llvm_i32_ty],
>                         [IntrNoMem]>;
>   def int_x86_vcvtps2ph_256 : GCCBuiltin<"__builtin_ia32_vcvtps2ph256">,
>               Intrinsic<[llvm_v8i16_ty], [llvm_v8f32_ty, llvm_i32_ty],
>                         [IntrNoMem]>;
> ```
>
> Here's what seems weird to me:
>
> * For the 4 wide intrinsics (``_128`` suffix) some of the types are
> wider than they need to be. For example ``int_x86_vcvtph2ps_128``
> takes <8 x i16> as an argument but this intrinsic only uses the first
> four lanes so why is the argument type not <4 x i16>?
> ``int_x86_vcvtps2ph_128`` also has the same oddity but on its return
> type (returns <8 x i16> but only the first four are relevant).

One reason is that <4 x i16> is too small to be a legal SSE vector
type, so the IR intrinsics, much like the Intel C intrinsics and the
instructions, are defined in terms of the widened <8 x i16> (with
either __m128, or xmm registers).

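To make the lane layout concrete, here is a minimal Python model of the two 128-bit conversions. It is only a software sketch, not LLVM or hardware code: the function names `model_vcvtph2ps_128` and `model_vcvtps2ph_128` are made up for illustration, the model always rounds to nearest-even (it ignores the intrinsic's `i32` operand), and the zero-filled upper lanes reflect my understanding of the register form of the instruction.

```python
import struct


def half_bits(x: float) -> int:
    """Return the binary16 bit pattern of x as an unsigned 16-bit integer.

    struct's "e" format rounds to nearest-even.
    """
    return struct.unpack("<H", struct.pack("<e", x))[0]


def half_from_bits(bits: int) -> float:
    """Reinterpret a 16-bit integer as a binary16 value (like an IR bitcast)."""
    return struct.unpack("<e", struct.pack("<H", bits))[0]


def model_vcvtph2ps_128(v8i16):
    """Model of the _128 half-to-float form: <8 x i16> in, <4 x float> out.

    Only the low four lanes are read; the upper four are ignored.
    """
    assert len(v8i16) == 8
    return [half_from_bits(b) for b in v8i16[:4]]


def model_vcvtps2ph_128(v4f32):
    """Model of the _128 float-to-half form: <4 x float> in, <8 x i16> out.

    The four results land in the low lanes; the upper lanes are assumed zero.
    """
    assert len(v4f32) == 4
    return [half_bits(x) for x in v4f32] + [0, 0, 0, 0]


packed = model_vcvtps2ph_128([1.0, -2.0, 0.5, 65504.0])
roundtrip = model_vcvtph2ps_128(packed)
print([hex(b) for b in packed])
print(roundtrip)
```

The point of the sketch is that the full-width `<8 x i16>` type is just the register-sized container; whatever sits in the upper four lanes of the input has no effect on the result.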
> * The use of ``i16`` types also seems a little strange given that the
> more semantically correct ``f16`` type and vectorized forms (e.g.
> ``llvm_v4f16_ty``) are available. Sure I can use a bitcast with the
> intrinsics to get the type I want in the IR but why was ``i16``
> chosen over using ``f16``?

f16 wasn't, until recently, very well supported. It still has rough
edges on targets without native scalar register classes such as X86.

Instead, these targets use i16, and do the conversion with other
(native) FP types using the dedicated convert.to/from.fp16 intrinsics.
We match that here and use an i16 element type.

Someday, we'll get rid of these intrinsics and use half everywhere,
but we're not there yet!

HTH,
-Ahmed
> Any ideas?
>
> Thanks,
> Dan.

Dan Liew via llvm-dev

2015-Sep-09 00:23 UTC

Hi,
>> Here's what seems weird to me:
>>
>> * For the 4 wide intrinsics (``_128`` suffix) some of the types are
>> wider than they need to be. For example ``int_x86_vcvtph2ps_128``
>> takes <8 x i16> as an argument but this intrinsic only uses the first
>> four lanes so why is the argument type not <4 x i16>?
>> ``int_x86_vcvtps2ph_128`` also has the same oddity but on its return
>> type (returns <8 x i16> but only the first four are relevant).
>
> One reason is that <4 x i16> is too small to be a legal SSE vector
> type, so the IR intrinsics, much like the Intel C intrinsics and the
> instructions, are defined in terms of the widened <8 x i16> (with
> either __m128, or xmm registers).

Ah, I see. Makes sense.

>> * The use of ``i16`` types also seems a little strange given that the
>> more semantically correct ``f16`` type and vectorized forms (e.g.
>> ``llvm_v4f16_ty``) are available. Sure I can use a bitcast with the
>> intrinsics to get the type I want in the IR but why was ``i16``
>> chosen over using ``f16``?
>
> f16 wasn't, until recently, very well supported. It still has rough
> edges on targets without native scalar register classes such as X86.

What do you mean by "register classes"? Sorry if this is a dumb
question.

> Instead, these targets use i16, and do the conversion with other
> (native) FP types using the dedicated convert.to/from.fp16 intrinsics.
> We match that here and use an i16 element type.

I remember seeing that intrinsic in the language reference, but
unfortunately ``convert.to.fp16`` [1] isn't useful for what I'm
working on because it doesn't specify a rounding mode. fp16 has so
little precision that the rounding mode **really matters**.

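A rough illustration of how much the rounding mode matters at half precision, written as a small Python sketch rather than anything LLVM-specific. The helper names (`bracket`, `round_to_half`) and the mode strings are hypothetical; the sketch assumes finite inputs inside the binary16 range and relies on Python's `struct` "e" format for the nearest-even case. As I understand it, the extra `i32` operand on the `vcvtps2ph` intrinsics is where the instruction takes its rounding control, which is what makes them usable when the mode has to be explicit.

```python
import struct


def half_bits(x: float) -> int:
    """binary16 bit pattern of x, rounded to nearest-even by struct."""
    return struct.unpack("<H", struct.pack("<e", x))[0]


def half_from_bits(bits: int) -> float:
    """Reinterpret a 16-bit integer as a binary16 value."""
    return struct.unpack("<e", struct.pack("<H", bits))[0]


def bracket(x: float):
    """Return adjacent half values (lo, hi) with lo <= x <= hi, for x >= 0.

    For non-negative halves, adding 1 to the bit pattern gives the next
    larger representable value.
    """
    n = half_bits(x)
    v = half_from_bits(n)
    if v == x:
        return v, v
    if v < x:
        return v, half_from_bits(n + 1)
    return half_from_bits(n - 1), v


def round_to_half(x: float, mode: str) -> float:
    """Round x to a representable half value under the named mode."""
    if x < 0:
        # Mirror the directed modes so the positive-only logic can be reused.
        mirrored = {"down": "up", "up": "down"}.get(mode, mode)
        return -round_to_half(-x, mirrored)
    lo, hi = bracket(x)
    if mode == "nearest-even":
        return half_from_bits(half_bits(x))
    if mode in ("down", "toward-zero"):
        return lo
    if mode == "up":
        return hi
    raise ValueError(f"unknown rounding mode: {mode}")


# Near 2048 the spacing between adjacent half values is already 2.0.
for mode in ("nearest-even", "down", "up", "toward-zero"):
    print(mode, round_to_half(2049.5, mode))
# nearest-even 2050.0
# down 2048.0
# up 2050.0
# toward-zero 2048.0
```

With only 10 explicit mantissa bits, the choice between rounding down and rounding up moves this result by a full 2.0.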
> Someday, we'll get rid of these intrinsics and use half everywhere,
> but we're not there yet!
Okay.
> HTH,
Very helpful, thanks.


[1] http://llvm.org/docs/LangRef.html#llvm-convert-to-fp16-intrinsic

Thanks,
Dan.
