Hi all,
Consider the following snippet, the aim of which is to convert a
double to a signed i16, returning 0 if not exactly representable:

define i16 @foo(double) {
top:
  %1 = fptosi double %0 to i16
  %2 = sitofp i16 %1 to double
  %3 = fcmp une double %2, %0
  %4 = select i1 %3, i16 0, i16 %1
  ret i16 %4
}

Of course, if the value is out-of-range, the result of fptosi is
undefined. Nevertheless, the snippet works on x86 & x86_64, generating
what to me seems to be fairly efficient code for the task.
However, it breaks on ARM, with foo(200000.0) => 3392. From what I can
tell (given my very limited knowledge of LLVM IR, assembler and ARM
architecture), the fptosi on the first line is returning a value
outside the range of the i16 type.
1) I realise this is a somewhat silly question, but is this still
acceptable "undefined behaviour"?
2) If so, is there a way to do this in an efficient manner without
relying on undefined behaviour? (I could introduce a range check
before the fptosi instruction, but this would add further overhead.)
(for further context, this problem originally arose in the Julia issue
https://github.com/JuliaLang/julia/issues/14549)
Thanks,
Simon

On Fri, Jan 22, 2016 at 07:45:30PM +0000, Simon Byrne via llvm-dev wrote:
> 1) I realise this is a somewhat silly question, but is this still
> acceptable "undefined behaviour"?

Yes, it is.

> 2) If so, is there a way to do this in an efficient manner without
> relying on undefined behaviour? (i.e. I can introduce a range check
> before the fptosi call, but this would add further overhead).

You will need to add the bounds checks to the LLVM IR to get the
behavior that you want. If LLVM does not generate efficient code
for this, then you will need to teach the backends to recognize this
pattern and generate better code if it can.

-Tom
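
For reference, a minimal sketch of what a bounds-checked version of the snippet could look like. The function name @foo_checked, the value names, and the block labels are made up for illustration and are not from the original code. It assumes that only inputs in [-32768.0, 32767.0] should ever reach fptosi. The ordered comparisons (oge, ole, oeq) are false for NaN, so NaN takes the failure path too. Whether a given backend turns this into efficient code is not something the sketch shows.

```llvm
define i16 @foo_checked(double %x) {
top:
  ; Only convert when x lies inside the i16 range.
  ; Ordered compares are false if x is NaN.
  %ge = fcmp oge double %x, -32768.0
  %le = fcmp ole double %x, 32767.0
  %inrange = and i1 %ge, %le
  br i1 %inrange, label %convert, label %fail

convert:
  ; x is in range here, so fptosi has a well-defined result.
  %i = fptosi double %x to i16
  %f = sitofp i16 %i to double
  ; The round trip is exact only if x was an integer.
  %exact = fcmp oeq double %f, %x
  %r = select i1 %exact, i16 %i, i16 0
  ret i16 %r

fail:
  ; Out of range or NaN: not representable.
  ret i16 0
}
```

With this shape, an input such as 200000.0 never reaches the fptosi and takes the %fail path, so the function returns 0. An in-range non-integer such as 1.5 is rejected by the round-trip comparison instead.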

On 22 January 2016 at 12:20, Tom Stellard via llvm-dev <llvm-dev at lists.llvm.org> wrote:
>> 1) I realise this is a somewhat silly question, but is this still
>> acceptable "undefined behaviour"?
>
> Yes, it is.

I always thought these out-of-range instructions did produce an
"undef" rather than allowing fully-general undefined behaviour
(otherwise we couldn't speculate them, for a start).

If so, I think the code ought to be valid: %1 is *some* i16
bitpattern, which means %2 cannot be completely unconstrained and
should never be equal to %0.

Cheers.

Tim.