Dan Liew via llvm-dev
2015-Sep-08 23:48 UTC
[llvm-dev] Strange types on x86 vcvtph2ps and vcvtps2ph intrinsics
Hi,

I was looking at the x86 vector intrinsics for converting half-precision floating-point numbers, and I'm a bit confused as to why certain types were chosen. I've gone ahead and used their current definitions with success, but I'd like to understand why the types used with these intrinsics are defined this way.

For reference, see ``include/llvm/IR/IntrinsicsX86.td``. Here are the intrinsics of interest:

```
let TargetPrefix = "x86" in {  // All intrinsics start with "llvm.x86.".
  def int_x86_vcvtph2ps_128 : GCCBuiltin<"__builtin_ia32_vcvtph2ps">,
            Intrinsic<[llvm_v4f32_ty], [llvm_v8i16_ty], [IntrNoMem]>;
  def int_x86_vcvtph2ps_256 : GCCBuiltin<"__builtin_ia32_vcvtph2ps256">,
            Intrinsic<[llvm_v8f32_ty], [llvm_v8i16_ty], [IntrNoMem]>;
  def int_x86_vcvtps2ph_128 : GCCBuiltin<"__builtin_ia32_vcvtps2ph">,
            Intrinsic<[llvm_v8i16_ty], [llvm_v4f32_ty, llvm_i32_ty],
                      [IntrNoMem]>;
  def int_x86_vcvtps2ph_256 : GCCBuiltin<"__builtin_ia32_vcvtps2ph256">,
            Intrinsic<[llvm_v8i16_ty], [llvm_v8f32_ty, llvm_i32_ty],
                      [IntrNoMem]>;
```

Here's what seems weird to me:

* For the 4-wide intrinsics (``_128`` suffix), some of the types are wider than they need to be. For example, ``int_x86_vcvtph2ps_128`` takes ``<8 x i16>`` as an argument, but this intrinsic only uses the first four lanes, so why is the argument type not ``<4 x i16>``? ``int_x86_vcvtps2ph_128`` has the same oddity in its return type (it returns ``<8 x i16>``, but only the first four lanes are relevant).
* The use of ``i16`` types also seems a little strange, given that the more semantically correct ``f16`` type and its vector forms (e.g. ``llvm_v4f16_ty``) are available. Sure, I can use a bitcast with the intrinsics to get the type I want in the IR, but why was ``i16`` chosen over ``f16``?

Any ideas?

Thanks,
Dan.
Ahmed Bougacha via llvm-dev
2015-Sep-08 23:58 UTC
Hi Dan,

On Tue, Sep 8, 2015 at 4:48 PM, Dan Liew via llvm-dev <llvm-dev at lists.llvm.org> wrote:
> Here's what seems weird to me:
>
> * For the 4 wide intrinsics (``_128`` suffix) some of the types are
> wider than they need to be. For example ``int_x86_vcvtph2ps_128``
> takes <8 x i16> as an argument but this intrinsic only uses the first
> four lanes so why is the argument type not <4 x i16>?
> ``int_x86_vcvtps2ph_128`` also has the same oddity but on its return
> type (returns <8 x i16> but only the first four are relevant).

One reason is that <4 x i16> is too small to be a legal SSE vector type, so the IR intrinsics, much like the Intel C intrinsics and the instructions, are defined in terms of the widened <8 x i16> (with either __m128 or xmm registers).

> * The use of ``i16`` types also seems a little strange given that the
> more semantically correct ``f16`` type and vectorized forms (e.g.
> ``llvm_v4f16_ty``) are available. Sure, I can use a bitcast with the
> intrinsics to get the type I want in the IR, but why was ``i16``
> chosen over ``f16``?

f16 wasn't, until recently, very well supported. It still has rough edges on targets without native scalar register classes such as X86. Instead, these targets use i16, and do the conversion with other (native) FP types using the dedicated convert.to/from.fp16 intrinsics. We match that here and use an i16 element type.

Someday, we'll get rid of these intrinsics and use half everywhere, but we're not there yet!

HTH,
-Ahmed
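To make the lane layout concrete, here is a minimal Python model of the two 128-bit intrinsics. It is not LLVM code: ``vcvtph2ps_128`` and ``vcvtps2ph_128`` are hypothetical stand-ins, each ``i16`` lane is modelled as a raw 16-bit pattern holding an IEEE half (the bits a bitcast to ``f16`` would reinterpret), and Python's ``struct`` ``'e'`` format does the half-precision conversion. It assumes the upper four lanes of the ``<8 x i16>`` result are zero, which we believe matches the register form of VCVTPS2PH, and it ignores the rounding-control ``i32`` operand, always rounding to nearest.

```python
import struct
from typing import List, Sequence


def vcvtph2ps_128(lanes: Sequence[int]) -> List[float]:
    """Hypothetical model of <8 x i16> -> <4 x float>.

    The argument is a full 128-bit register (8 lanes), but only lanes 0..3
    are read; lanes 4..7 are ignored, whatever they contain.
    """
    if len(lanes) != 8:
        raise ValueError("expected 8 x i16 (a full 128-bit register)")
    # i16 has no signedness in LLVM IR; mask so -16384 and 0xC000 mean the same bits.
    return [struct.unpack('<e', struct.pack('<H', b & 0xFFFF))[0] for b in lanes[:4]]


def vcvtps2ph_128(values: Sequence[float]) -> List[int]:
    """Hypothetical model of <4 x float> -> <8 x i16>.

    The four halves land in lanes 0..3 and lanes 4..7 are assumed to be zero.
    Rounding is struct's round-to-nearest; the real intrinsic takes an
    immediate that selects the rounding mode, which this sketch ignores.
    """
    if len(values) != 4:
        raise ValueError("expected 4 x float")
    low = [struct.unpack('<H', struct.pack('<e', v))[0] for v in values]
    return low + [0] * 4


packed = vcvtps2ph_128([1.0, -2.0, 0.5, 65504.0])
print([hex(b) for b in packed])   # the upper four lanes are 0
print(vcvtph2ps_128(packed))
```

The widened ``<8 x i16>`` shape is the point: the data occupies only the low 64 bits, but the value travels through a full xmm-sized vector, so a frontend that wants ``<4 x half>`` has to bitcast and shuffle around the intrinsic.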
Dan Liew via llvm-dev
2015-Sep-09 00:23 UTC
Hi,

>> Here's what seems weird to me:
>>
>> * For the 4 wide intrinsics (``_128`` suffix) some of the types are
>> wider than they need to be. For example ``int_x86_vcvtph2ps_128``
>> takes <8 x i16> as an argument but this intrinsic only uses the first
>> four lanes so why is the argument type not <4 x i16>?
>> ``int_x86_vcvtps2ph_128`` also has the same oddity but on its return
>> type (returns <8 x i16> but only the first four are relevant).
>
> One reason is that <4 x i16> is too small to be a legal SSE vector
> type, so the IR intrinsics, much like the Intel C intrinsics and the
> instructions, are defined in terms of the widened <8 x i16> (with
> either __m128, or xmm registers).

Ah, I see. Makes sense.

>> * The use of ``i16`` types also seems a little strange given that the
>> more semantically correct ``f16`` type and vectorized forms (e.g.
>> ``llvm_v4f16_ty``) are available. Sure, I can use a bitcast with the
>> intrinsics to get the type I want in the IR, but why was ``i16``
>> chosen over ``f16``?
>
> f16 wasn't, until recently, very well supported. It still has rough
> edges on targets without native scalar register classes such as X86.

What do you mean by "register classes"? Sorry if this is a dumb question.

> Instead, these targets use i16, and do the conversion with other
> (native) FP types using the dedicated convert.to/from.fp16 intrinsics.
> We match that here and use an i16 element type.

I remember seeing that intrinsic in the language reference, but unfortunately ``llvm.convert.to.fp16`` [1] isn't useful for what I'm working on because it doesn't specify a rounding mode. fp16 has so little precision that the rounding mode **really matters**.

> Someday, we'll get rid of these intrinsics and use half everywhere,
> but we're not there yet!

Okay.

> HTH,

Very helpful, thanks.

[1] http://llvm.org/docs/LangRef.html#llvm-convert-to-fp16-intrinsic

Thanks,
Dan.
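To see how much the rounding mode matters at half precision, the sketch below rounds a value to binary16 in each of four directions using exact rational arithmetic. ``round_to_half`` is a hypothetical helper, not an LLVM or Intel API; it handles finite values only and does not model overflow to infinity. The ``VCVTPS2PH_IMM_ROUNDING`` table is our reading of Intel's documentation for the instruction's rounding immediate (the ``llvm_i32_ty`` operand of ``int_x86_vcvtps2ph_128``/``_256``): bits 1:0 select nearest, down, up, or toward zero, while bit 2 defers to MXCSR.RC instead. Treat that mapping as an assumption and check the manual before relying on it.

```python
import math
from fractions import Fraction

# Assumed mapping (our reading of Intel's VCVTPS2PH docs): imm8 bits 1:0 pick the
# rounding direction; if imm8 bit 2 is set, MXCSR.RC is used instead (not modelled).
VCVTPS2PH_IMM_ROUNDING = {0: "nearest", 1: "down", 2: "up", 3: "zero"}


def round_to_half(x: float, mode: str) -> float:
    """Hypothetical helper: round finite x to a binary16 value in direction `mode`.

    Overflow past 65504 and NaN payloads are not modelled.
    """
    if x == 0.0 or not math.isfinite(x):
        return x
    _, e = math.frexp(abs(x))        # abs(x) lies in [2**(e-1), 2**e)
    exp = max(e - 1, -14)            # below 2**-14 halves are subnormal, with fixed spacing
    ulp = Fraction(2) ** (exp - 10)  # binary16 has 10 explicit fraction bits
    q = Fraction(x) / ulp            # exact position of x in units of ulp
    lo = math.floor(q)
    if mode == "down":               # toward -infinity
        n = lo
    elif mode == "up":               # toward +infinity
        n = math.ceil(q)
    elif mode == "zero":             # truncate toward zero
        n = lo if q >= 0 else math.ceil(q)
    elif mode == "nearest":          # round half to even
        rem = q - lo
        if rem > Fraction(1, 2) or (rem == Fraction(1, 2) and lo % 2 == 1):
            n = lo + 1
        else:
            n = lo
    else:
        raise ValueError(f"unknown rounding mode: {mode!r}")
    return float(n * ulp)


for imm, mode in VCVTPS2PH_IMM_ROUNDING.items():
    print(imm, mode, round_to_half(0.7, mode), round_to_half(-0.7, mode))
```

For 0.7 the four modes yield two different halves, 0.69970703125 and 0.7001953125, and for -0.7 rounding down and rounding toward zero disagree. A conversion that does not pin down its rounding mode leaves that choice unspecified.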