thr3ads.net - llvm dev - [LLVMdev] SSE Scalar Convert Intrinsics [Jun 2009]

David Greene

2009-Jun-05 15:51 UTC

[LLVMdev] SSE Scalar Convert Intrinsics

I have a question about the SSE scalar convert intrinsics.

cvtsd2si is defined thusly:

  def int_x86_sse2_cvtsd2si64 :
              GCCBuiltin<"__builtin_ia32_cvtsd2si64">,
              Intrinsic<[llvm_i64_ty, llvm_v2f64_ty], [IntrNoMem]>;

This matches the signature of the GCC intrinsic.  The fact that the GCC
intrinsic has a type mismatch on the input (vector rather than scalar)
is strange, but ok, we'll run with it.

Until this:

  def Int_CVTSD2SIrm : SDI<0x2D, MRMSrcMem, (outs GR32:$dst),
                           (ins f128mem:$src),
                           "cvtsd2si\t{$src, $dst|$dst, $src}",
                           [(set GR32:$dst, (int_x86_sse2_cvtsd2si
                                             (load addr:$src)))]>;

Er, this makes us load a 128-bit quantity, which is almost certainly not
what we want.

Do we need two intrinsics for these scalar converts, one to satisfy the
(arguably broken) GCC interface and one to really reflect the operation
as specified by the ISA?

                                   -Dave
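
To make the size mismatch concrete, here is a minimal C sketch using the SSE2 intrinsics from <emmintrin.h>. It assumes an x86 target with SSE2, and the two function names are made up for this illustration. The intrinsic takes a whole __m128d, but the conversion only looks at the low double, so the memory form only ever needs 8 bytes:

```c
#include <emmintrin.h>  /* SSE2 intrinsics; assumes an x86 target with SSE2 */

/* Register form: the intrinsic's parameter is a full 128-bit vector,
   but only the low double takes part in the conversion. */
static int convert_low_lane(__m128d v) {
    return _mm_cvtsd_si32(v);
}

/* Memory form: _mm_load_sd reads a single double (8 bytes) and zeroes
   the upper lane, so p does not have to point at 16 readable bytes. */
static int convert_from_memory(const double *p) {
    return _mm_cvtsd_si32(_mm_load_sd(p));
}
```

Calling convert_from_memory on the address of a lone double is therefore fine, which is the behaviour one would want the memory-operand pattern to model.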

Dan Gohman

2009-Jun-05 20:19 UTC

On Jun 5, 2009, at 8:51 AM, David Greene wrote:
> I have a question about the SSE scalar convert intrinsics.
>
> cvtsd2si is defined thusly:
>
>   def int_x86_sse2_cvtsd2si64 :
>               GCCBuiltin<"__builtin_ia32_cvtsd2si64">,
>               Intrinsic<[llvm_i64_ty, llvm_v2f64_ty], [IntrNoMem]>;
>
> This matches the signature of the GCC intrinsic.  The fact that the GCC
> intrinsic has a type mismatch on the input (vector rather than scalar)
> is strange, but ok, we'll run with it.
>
> Until this:
>
>   def Int_CVTSD2SIrm : SDI<0x2D, MRMSrcMem, (outs GR32:$dst),
>                            (ins f128mem:$src),
>                            "cvtsd2si\t{$src, $dst|$dst, $src}",
>                            [(set GR32:$dst, (int_x86_sse2_cvtsd2si
>                                              (load addr:$src)))]>;
>
> Er, this makes us load a 128-bit quantity, which is almost certainly not
> what we want.

Yes, that looks wrong, even if it ends up doing something that
ends up working.

> Do we need two intrinsics for these scalar converts, one to satisfy the
> (arguably broken) GCC interface and one to really reflect the operation
> as specified by the ISA?

That's what's done for most other instructions, unfortunately.
For cvtsd2si, there's currently no "normal" version in the tree,
but if you add one, it wouldn't be alone.

One thing we'd like to do at some point is have front-ends lower
intrinsics for scalar instructions into
extractelement+op+insertelement, so that we don't need two
versions of each of the instructions.  Doing this for everything
will require some work to make sure that the extra insert/extract
operators don't incur unnecessary copying, but that's also
something we'd like to do regardless.

Dan
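
As a rough picture of the extractelement+op+insertelement idea, here is a small C sketch with SSE2 intrinsics. It is only an analogy at the C level, not the front-end lowering itself, and addsd is picked arbitrarily as the scalar op; the function names are invented for the example. The first function uses the scalar intrinsic directly; the second extracts lane 0, does an ordinary double add, and inserts the result back into lane 0:

```c
#include <emmintrin.h>  /* SSE2 intrinsics; assumes an x86 target with SSE2 */

/* Intrinsic form: adds the low lanes, passes a's upper lane through. */
static __m128d add_low_intrinsic(__m128d a, __m128d b) {
    return _mm_add_sd(a, b);
}

/* Decomposed form: extract lane 0, plain scalar add, insert into lane 0. */
static __m128d add_low_decomposed(__m128d a, __m128d b) {
    double x = _mm_cvtsd_f64(a);             /* extract low element of a */
    double y = _mm_cvtsd_f64(b);             /* extract low element of b */
    double sum = x + y;                      /* the scalar op */
    return _mm_move_sd(a, _mm_set_sd(sum));  /* low lane = sum, upper lane from a */
}
```

Both functions return the same vector. Whether a compiler folds the second one back into a single scalar instruction without extra copies is exactly the work described above.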

Eli Friedman

2009-Jun-05 20:22 UTC

On Fri, Jun 5, 2009 at 8:51 AM, David Greene <dag at cray.com> wrote:
> def Int_CVTSD2SIrm : SDI<0x2D, MRMSrcMem, (outs GR32:$dst),
>                          (ins f128mem:$src),
>                          "cvtsd2si\t{$src, $dst|$dst, $src}",
>                          [(set GR32:$dst, (int_x86_sse2_cvtsd2si
>                                            (load addr:$src)))]>;
>
> Er, this makes us load a 128-bit quantity, which is almost certainly not
> what we want.

I agree, that doesn't look right.

> Do we need two intrinsics for these scalar converts, one to satisfy the
> (arguably broken) GCC interface and one to really reflect the operation
> as specified by the ISA?

We really need zero intrinsics... it's quite easy to map onto existing
LLVM instructions.  See the definition of CVTSD2SIrm.

-Eli

Nate Begeman

2009-Jun-05 20:33 UTC

On Jun 5, 2009, at 1:19 PM, Dan Gohman wrote:
> On Jun 5, 2009, at 8:51 AM, David Greene wrote:
>
>> I have a question about the SSE scalar convert intrinsics.
>>
>> cvtsd2si is defined thusly:
>>
>>   def int_x86_sse2_cvtsd2si64 :
>>               GCCBuiltin<"__builtin_ia32_cvtsd2si64">,
>>               Intrinsic<[llvm_i64_ty, llvm_v2f64_ty], [IntrNoMem]>;
>>
>> This matches the signature of the GCC intrinsic.  The fact that the GCC
>> intrinsic has a type mismatch on the input (vector rather than scalar)
>> is strange, but ok, we'll run with it.
>>
>> Until this:
>>
>>   def Int_CVTSD2SIrm : SDI<0x2D, MRMSrcMem, (outs GR32:$dst),
>>                            (ins f128mem:$src),
>>                            "cvtsd2si\t{$src, $dst|$dst, $src}",
>>                            [(set GR32:$dst, (int_x86_sse2_cvtsd2si
>>                                              (load addr:$src)))]>;
>>
>> Er, this makes us load a 128-bit quantity, which is almost certainly not
>> what we want.
>
> Yes, that looks wrong, even if it ends up doing something that
> ends up working.
>
>> Do we need two intrinsics for these scalar converts, one to satisfy the
>> (arguably broken) GCC interface and one to really reflect the operation
>> as specified by the ISA?
>
> That's what's done for most other instructions, unfortunately.
> For cvtsd2si, there's currently no "normal" version in the tree,
> but if you add one, it wouldn't be alone.
>
> One thing we'd like to do at some point is have front-ends lower
> intrinsics for scalar instructions into
> extractelement+op+insertelement, so that we don't need two
> versions of each of the instructions.  Doing this for everything
> will require some work to make sure that the extra insert/extract
> operators don't incur unnecessary copying, but that's also
> something we'd like to do regardless.

Agreed!

Nate

David Greene

2009-Jun-05 22:16 UTC

On Friday 05 June 2009 15:19, Dan Gohman wrote:
> > Do we need two intrinsics for these scalar converts, one to satisfy the
> > (arguably broken) GCC interface and one to really reflect the operation
> > as specified by the ISA?
>
> That's what's done for most other instructions, unfortunately.
> For cvtsd2si, there's currently no "normal" version in the tree,
> but if you add one, it wouldn't be alone.

Ok.

> One thing we'd like to do at some point is have front-ends lower
> intrinsics for scalar instructions into
> extractelement+op+insertelement, so that we don't need two
> versions of each of the instructions.  Doing this for everything
> will require some work to make sure that the extra insert/extract
> operators don't incur unnecessary copying, but that's also
> something we'd like to do regardless.

So then how does one do a memop intrinsic?  Does it mean we can't
match to the memop versions of instructions?

                             -Dave

David Greene

2009-Jun-05 22:19 UTC

On Friday 05 June 2009 15:22, Eli Friedman wrote:
> > Do we need two intrinsics for these scalar converts, one to satisfy the
> > (arguably broken) GCC interface and one to really reflect the operation
> > as specified by the ISA?
>
> We really need zero intrinsics... it's quite easy to map onto existing
> LLVM instructions.  See the definition of CVTSD2SIrm.

In some cases, yes.  But not all of the X86 instructions are accessible
through LLVM IR.  And sometimes we like the ability to have our frontend
lower to intrinsics so we know EXACTLY what code will come out the other
end.

And see my previous post about sint_to_fp with a memory operand not working
in TableGen ("TableGen Type Inference").  I'll be debugging that next week,
probably.

                               -Dave
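
One concrete case where a plain conversion and the instruction differ, offered as an illustration and not necessarily the case the posters had in mind: a C cast from double to int truncates toward zero, whereas cvtsd2si rounds according to the current MXCSR rounding mode. The sketch below assumes an x86 target with SSE2 and the default round-to-nearest-even mode; the function names are made up for the example.

```c
#include <emmintrin.h>  /* SSE2 intrinsics; assumes an x86 target with SSE2 */

/* Ordinary C conversion: always truncates toward zero. */
static int convert_truncating(double d) {
    return (int)d;
}

/* cvtsd2si via its intrinsic: rounds per the MXCSR rounding mode
   (assumed here to be the default, round-to-nearest-even). */
static int convert_rounding(double d) {
    return _mm_cvtsd_si32(_mm_set_sd(d));
}
```

With the default rounding mode, convert_truncating(2.7) yields 2 while convert_rounding(2.7) yields 3, so a front end that wants the rounding behaviour cannot get it from the truncating conversion alone.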
