I have a question about the SSE scalar convert intrinsics.

cvtsd2si is defined thusly:

def int_x86_sse2_cvtsd2si64 :
    GCCBuiltin<"__builtin_ia32_cvtsd2si64">,
    Intrinsic<[llvm_i64_ty, llvm_v2f64_ty], [IntrNoMem]>;

This matches the signature of the GCC intrinsic. The fact that the GCC
intrinsic has a type mismatch on the input (vector rather than scalar)
is strange, but ok, we'll run with it.

Until this:

def Int_CVTSD2SIrm : SDI<0x2D, MRMSrcMem, (outs GR32:$dst), (ins f128mem:$src),
                         "cvtsd2si\t{$src, $dst|$dst, $src}",
                         [(set GR32:$dst, (int_x86_sse2_cvtsd2si
                                           (load addr:$src)))]>;

Er, this makes us load a 128-bit quantity, which is almost certainly not
what we want.

Do we need two intrinsics for these scalar converts, one to satisfy the
(arguably broken) GCC interface and one to really reflect the operation
as specified by the ISA?

-Dave
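[Editor's sketch, not part of the original message.] The mismatch described above can be seen from the C side. This is a minimal illustration, assuming an x86-64 target with SSE2 and the standard `<emmintrin.h>` intrinsics; the helper names `convert_low_lane` and `convert_from_memory` are hypothetical and do not appear in the thread. The first helper uses the vector-typed interface, which accepts a full 128-bit value even though only the low double is converted. The second reads just 8 bytes from memory before converting, which is the scalar-memory behaviour the message argues the pattern should reflect.

```c
#include <emmintrin.h>  /* SSE2 intrinsics */

/* Vector-typed interface: the argument is a full __m128d,
   but only the low double-precision element is converted. */
int convert_low_lane(__m128d v) {
    return _mm_cvtsd_si32(v);
}

/* Scalar-memory view (hypothetical helper): load only the 8 bytes
   of one double, then convert. No 128-bit load is requested here. */
int convert_from_memory(const double *p) {
    return _mm_cvtsd_si32(_mm_load_sd(p));
}
```

Calling `convert_low_lane(_mm_set_pd(99.0, 7.0))` converts only the low element (7.0); the high element is ignored.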
On Jun 5, 2009, at 8:51 AM, David Greene wrote:

> I have a question about the SSE scalar convert intrinsics.
>
> cvtsd2si is defined thusly:
>
> def int_x86_sse2_cvtsd2si64 :
>     GCCBuiltin<"__builtin_ia32_cvtsd2si64">,
>     Intrinsic<[llvm_i64_ty, llvm_v2f64_ty], [IntrNoMem]>;
>
> This matches the signature of the GCC intrinsic. The fact that the GCC
> intrinsic has a type mismatch on the input (vector rather than scalar)
> is strange, but ok, we'll run with it.
>
> Until this:
>
> def Int_CVTSD2SIrm : SDI<0x2D, MRMSrcMem, (outs GR32:$dst), (ins f128mem:$src),
>                          "cvtsd2si\t{$src, $dst|$dst, $src}",
>                          [(set GR32:$dst, (int_x86_sse2_cvtsd2si
>                                            (load addr:$src)))]>;
>
> Er, this makes us load a 128-bit quantity, which is almost certainly not
> what we want.

Yes, that looks wrong, even if it ends up doing something that
ends up working.

> Do we need two intrinsics for these scalar converts, one to satisfy the
> (arguably broken) GCC interface and one to really reflect the operation
> as specified by the ISA?

That's what's done for most other instructions, unfortunately.
For cvtsd2si, there's currently no "normal" version in the tree,
but if you add one, it wouldn't be alone.

One thing we'd like to do at some point is have front-ends lower
intrinsics for scalar instructions into
extractelement+op+insertelement, so that we don't need two
versions of each of the instructions. Doing this for everything
will require some work to make sure that the extra insert/extract
operators don't incur unnecessary copying, but that's also
something we'd like to do regardless.

Dan
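[Editor's sketch, not part of the original message.] The extractelement+op+insertelement idea can be approximated at the C level. This is only an analogy for the IR-level lowering described above, not LLVM IR and not the front-end change itself; it assumes SSE2 and `<emmintrin.h>`, uses addsd as the example operation, and the function names `add_low_intrinsic` and `add_low_decomposed` are hypothetical. The first function uses a dedicated scalar intrinsic; the second extracts the low elements, applies an ordinary scalar add, and inserts the result back into the first operand.

```c
#include <emmintrin.h>  /* SSE2 intrinsics */

/* Dedicated form: one intrinsic per scalar instruction (addsd). */
__m128d add_low_intrinsic(__m128d a, __m128d b) {
    return _mm_add_sd(a, b);
}

/* Decomposed form: extract the low elements, do a plain scalar add,
   then insert the sum into the low lane of a (high lane of a is kept). */
__m128d add_low_decomposed(__m128d a, __m128d b) {
    double a0 = _mm_cvtsd_f64(a);            /* extract element 0 of a */
    double b0 = _mm_cvtsd_f64(b);            /* extract element 0 of b */
    double sum = a0 + b0;                    /* ordinary scalar operation */
    return _mm_move_sd(a, _mm_set_sd(sum));  /* insert into element 0 */
}
```

Both functions produce the same vector for the same inputs. Whether the decomposed form compiles to a single instruction without extra copies depends on the compiler, which is the concern raised in the message.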
On Fri, Jun 5, 2009 at 8:51 AM, David Greene <dag at cray.com> wrote:

> def Int_CVTSD2SIrm : SDI<0x2D, MRMSrcMem, (outs GR32:$dst), (ins f128mem:$src),
>                          "cvtsd2si\t{$src, $dst|$dst, $src}",
>                          [(set GR32:$dst, (int_x86_sse2_cvtsd2si
>                                            (load addr:$src)))]>;
>
> Er, this makes us load a 128-bit quantity, which is almost certainly not
> what we want.

I agree, that doesn't look right.

> Do we need two intrinsics for these scalar converts, one to satisfy the
> (arguably broken) GCC interface and one to really reflect the operation
> as specified by the ISA?

We really need zero intrinsics... it's quite easy to map onto existing
LLVM instructions. See the definition of CVTSD2SIrm.

-Eli
On Jun 5, 2009, at 1:19 PM, Dan Gohman wrote:

> On Jun 5, 2009, at 8:51 AM, David Greene wrote:
>
>> I have a question about the SSE scalar convert intrinsics.
>>
>> cvtsd2si is defined thusly:
>>
>> def int_x86_sse2_cvtsd2si64 :
>>     GCCBuiltin<"__builtin_ia32_cvtsd2si64">,
>>     Intrinsic<[llvm_i64_ty, llvm_v2f64_ty], [IntrNoMem]>;
>>
>> This matches the signature of the GCC intrinsic. The fact that the GCC
>> intrinsic has a type mismatch on the input (vector rather than scalar)
>> is strange, but ok, we'll run with it.
>>
>> Until this:
>>
>> def Int_CVTSD2SIrm : SDI<0x2D, MRMSrcMem, (outs GR32:$dst), (ins f128mem:$src),
>>                          "cvtsd2si\t{$src, $dst|$dst, $src}",
>>                          [(set GR32:$dst, (int_x86_sse2_cvtsd2si
>>                                            (load addr:$src)))]>;
>>
>> Er, this makes us load a 128-bit quantity, which is almost certainly not
>> what we want.
>
> Yes, that looks wrong, even if it ends up doing something that
> ends up working.
>
>> Do we need two intrinsics for these scalar converts, one to satisfy the
>> (arguably broken) GCC interface and one to really reflect the operation
>> as specified by the ISA?
>
> That's what's done for most other instructions, unfortunately.
> For cvtsd2si, there's currently no "normal" version in the tree,
> but if you add one, it wouldn't be alone.
>
> One thing we'd like to do at some point is have front-ends lower
> intrinsics for scalar instructions into
> extractelement+op+insertelement, so that we don't need two
> versions of each of the instructions. Doing this for everything
> will require some work to make sure that the extra insert/extract
> operators don't incur unnecessary copying, but that's also
> something we'd like to do regardless.

Agreed!

Nate
On Friday 05 June 2009 15:19, Dan Gohman wrote:

> > Do we need two intrinsics for these scalar converts, one to satisfy the
> > (arguably broken) GCC interface and one to really reflect the operation
> > as specified by the ISA?
>
> That's what's done for most other instructions, unfortunately.
> For cvtsd2si, there's currently no "normal" version in the tree,
> but if you add one, it wouldn't be alone.

Ok.

> One thing we'd like to do at some point is have front-ends lower
> intrinsics for scalar instructions into
> extractelement+op+insertelement, so that we don't need two
> versions of each of the instructions. Doing this for everything
> will require some work to make sure that the extra insert/extract
> operators don't incur unnecessary copying, but that's also
> something we'd like to do regardless.

So then how does one do a memop intrinsic? Does it mean we can't match
to the memop versions of instructions?

-Dave
On Friday 05 June 2009 15:22, Eli Friedman wrote:

> > Do we need two intrinsics for these scalar converts, one to satisfy the
> > (arguably broken) GCC interface and one to really reflect the operation
> > as specified by the ISA?
>
> We really need zero intrinsics... it's quite easy to map onto existing
> LLVM instructions. See the definition of CVTSD2SIrm.

In some cases, yes. But not all of the X86 instructions are accessible
through LLVM IR. And sometimes we like the ability to have our frontend
lower to intrinsics so we know EXACTLY what code will come out the other
end.

And see my previous post about sint_to_fp with a memory operand not
working in TableGen ("TableGen Type Inference"). I'll be debugging that
next week, probably.

-Dave