Carlos Liam via llvm-dev
2016-Apr-14 21:29 UTC
[llvm-dev] Integer -> Floating point -> Integer cast optimizations
I'm saying at the IR level, not the C level. IR makes certain assumptions about the representation of floating-point numbers. This has nothing to do with C; I only used it as an example.

- CL

> On Apr 14, 2016, at 4:49 PM, Martin J. O'Riordan <martin.oriordan at movidius.com> wrote:
>
> I don't think that this is correct.
>
> | Let's say we have an int x, and we cast it to a float and back. Floats have 8 exponent bits and 23 mantissa bits.
>
> 'float', 'double' and 'long double' do not have specific representations, and a given implementation might choose different FP implementations for each.
>
> ISO C and C++ only guarantee that 'long double' can accurately represent all values that may be represented by 'double', and that 'double' can accurately represent all values that may be represented by 'float'; but they do not state that 'float' has 8 bits of exponent and 23 bits of mantissa.
>
> And this is a particular problem I often face when porting floating-point code between platforms, each of which can genuinely claim to be ISO C compliant.
>
> It is "common" for 'float' to be IEEE 754 32-bit Single Precision compliant.
> It is also "common" for 'double' to be IEEE 754 64-bit Double Precision compliant.
>
> But "common" does not mean "standard". The 'clang' optimisations have to adhere to the ISO C/C++ Standards, and not to what might be perceived as "the norm". Floating point has been a problem for a very long time:
>
> o How does the machine resolve FP arithmetic?
> o How does the compiler perform FP arithmetic - is it the same as the target machine or different?
> o How does the pre-processor evaluate FP arithmetic - is it the same as the target machine or different?
>
> These have been issues since the very first ISO C standard (ANSI C'89/ISO C'90) and before. Very simple things like:
>
> #define MY_FP_VAL (3.14159 / 2.0)
>
> Where is that divide performed? In the compiler subject to host FP rules? In the compiler subject to target rules? Executed dynamically by the host? The same problem occurs when performing constant folding in the compiler: should it follow a model that is different from what the target would do? Worse still when the pre-processor, compiler, and target are each different machines.
>
> These are huge problems in the FP world, where exact equivalence and ordering of evaluation really matter (think partial ordering - not the happy unsaturated INT modulo-2^N world).
>
> On our architecture, we have chosen the 32-bit IEEE model provided by 'clang' for 'float' and 'double', but we have chosen the 64-bit IEEE model for 'long double'; other implementations are free to choose a different model. We also use IEEE 16-bit FP for 'half', aka '__fp16'. But IEEE also provides for 128-bit FP and 256-bit FP, and there are FP implementations that use 80 bits. In fact, 'clang' does not preclude an implementation choosing IEEE 754 16-bit Half Precision as its representation for 'float'. That means 5 bits of exponent and 10 bits of mantissa - and it is still ISO C compliant.
>
> Any target is free to choose the FP representation it prefers for 'float', and that does not mean that it is bound to IEEE 754 32-bit Single Precision Floating-Point. Any FP optimisations within the compiler need to keep that target clearly in mind; I know, I've been burned by this before.
>
> MartinO
>
>
> -----Original Message-----
> From: llvm-dev [mailto:llvm-dev-bounces at lists.llvm.org] On Behalf Of Carlos Liam via llvm-dev
> Sent: 14 April 2016 19:14
> To: llvm-dev at lists.llvm.org
> Subject: [llvm-dev] Integer -> Floating point -> Integer cast optimizations
>
> I brought this up in IRC and was told to consult someone who knows more about floating-point numbers; I propose an optimization as follows.
>
> Let's say we have an int x, and we cast it to a float and back. Floats have 8 exponent bits and 23 mantissa bits.
>
> If x matches the condition `countTrailingZeros(abs(x)) > (log2(abs(x)) - 24)`, then we can remove the float casts.
>
> So, if we can establish that abs(x) is <= 2**24, we can remove the casts. LLVM does not currently perform that optimization on this C code:
>
> int floatcast(int x) {
>   if (abs(x) <= 16777216) { // abs(x) is definitely <= 2**24 and fits into our mantissa cleanly
>     float flt = (float)x;
>     return (int)flt;
>   }
>   return x;
> }
>
> Things get more interesting when you bring in larger integers and trailing zeros. Floating point can't exactly represent integers that don't fit neatly into the mantissa; they have to round to a multiple of some power of 2. For example, integers between 2**24 and 2**25 round to a multiple of 2**1 - meaning that the result has *at least* 1 trailing zero. Integers between 2**25 and 2**26 round to a multiple of 2**2, with the result having at least 2 trailing zeros. Et cetera. If we can prove that the input to these casts lies within one of those ranges *and* has at least the corresponding number of trailing zeros, we can eliminate the casts. LLVM does not currently perform this optimization on this C code:
>
> int floatcast(int x) {
>   if (16777217 <= abs(x) && abs(x) <= 33554432) { // abs(x) is definitely between 2**24 and 2**25
>     float flt = (float)(x / abs(x) * (abs(x) & (UINT32_MAX ^ 1))); // what's being cast to float definitely has at least one trailing zero in its absolute value
>     return (int)flt;
>   }
>   return x;
> }
>
>
> - CL
>
> _______________________________________________
> LLVM Developers mailing list
> llvm-dev at lists.llvm.org
> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
>
escha via llvm-dev
2016-Apr-14 22:02 UTC
[llvm-dev] Integer -> Floating point -> Integer cast optimizations
We already do this to some extent; see this code in InstCombineCasts:

// fpto{s/u}i({u/s}itofp(X)) --> X or zext(X) or sext(X) or trunc(X)
// This is safe if the intermediate type has enough bits in its mantissa to
// accurately represent all values of X. For example, this won't work with
// i64 -> float -> i64.
Instruction *InstCombiner::FoldItoFPtoI(Instruction &FI) {

—escha

> On Apr 14, 2016, at 2:29 PM, Carlos Liam via llvm-dev <llvm-dev at lists.llvm.org> wrote:
>
> I'm saying at the IR level, not the C level. IR makes certain assumptions about the representation of floating point numbers. Nothing to do with C, I only used it as an example.
>
> - CL
Carlos Liam via llvm-dev
2016-Apr-15 12:53 UTC
[llvm-dev] Integer -> Floating point -> Integer cast optimizations
My understanding is that this checks whether the bit width of the integer *type* fits in the bit width of the mantissa, not the bit width of the integer value.

- CL

> On Apr 14, 2016, at 6:02 PM, escha at apple.com wrote:
>
> We already do this to some extent; see this code in InstCombineCasts:
>
> // fpto{s/u}i({u/s}itofp(X)) --> X or zext(X) or sext(X) or trunc(X)
> // This is safe if the intermediate type has enough bits in its mantissa to
> // accurately represent all values of X. For example, this won't work with
> // i64 -> float -> i64.
> Instruction *InstCombiner::FoldItoFPtoI(Instruction &FI) {
>
> —escha