Dan Liew via llvm-dev
2015-Aug-21 19:36 UTC
[llvm-dev] The semantics of the fptrunc instruction with an example of incorrect optimisation
I've recently been looking at how to implement in LLVM IR the rounding of floating point values when casting using different rounding modes and I've hit some problems.

It seems that when casting down floats to less precise types the ``fptrunc`` LLVM IR instruction is used. The LLVM language reference suggests that it just truncates the value (which would be equivalent to rounding towards zero) but this seems to be very misleading because on the target I'm using (x86_64) that **is not** what happens.

Consider the following example in C:

```
#include <stdio.h>
#include <fenv.h>
int main() {
  double x = 0.3;
  fesetround(FE_TONEAREST);
  float y = (float) x;
  printf("y (nearest):%a\n", y);
  fesetround(FE_UPWARD);
  y = (float) x;
  printf("y (upward):%a\n", y);
  fesetround(FE_DOWNWARD);
  y = (float) x;
  printf("y (downward):%a\n", y);
  return (int) y;
}
```

If I get the unoptimised LLVM IR for this by running ``clang -O0 float.c -emit-llvm -c -o float.clang.o0.bc`` I can see that the cast of variable x is being handled using LLVM IR's ``fptrunc``:

```
...
store double 3.000000e-01, double* %x, align 8
%call = call i32 @fesetround(i32 0) #3
%0 = load double, double* %x, align 8
%conv = fptrunc double %0 to float
....
```

If I look at the codegened assembly I see that the ``cvtsd2ss`` x86 instruction is used (how rounding is done is controlled by the MXCSR register apparently). So this instruction might not "truncate", depending on how MXCSR is set.

If I run the program

```
$ clang -O0 float.c -lm -o float.clang.o0
$ ./float.clang.o0
y (nearest):0x1.333334p-2
y (upward):0x1.333334p-2
y (downward):0x1.333332p-2
```

I can see that the last cast gives a different result because the rounding mode has been changed, as expected.

Now let's see what clang does when we ask it to optimize.
```
$ ./float.clang.o3
y (nearest):0x1.333334p-2
y (upward):0x1.333334p-2
y (downward):0x1.333334p-2
```

The result of the last cast is wrong (note gcc at -O3 also seems to do this) and looking at the optimized LLVM IR reveals why:

```
define i32 @main() #0 {
entry:
  %call = tail call i32 @fesetround(i32 0) #2
  %call2 = tail call i32 (i8*, ...) @printf(i8* getelementptr inbounds ([16 x i8], [16 x i8]* @.str, i64 0, i64 0), double 0x3FD3333340000000) #2
  %call3 = tail call i32 @fesetround(i32 2048) #2
  %call6 = tail call i32 (i8*, ...) @printf(i8* getelementptr inbounds ([15 x i8], [15 x i8]* @.str.1, i64 0, i64 0), double 0x3FD3333340000000) #2
  %call7 = tail call i32 @fesetround(i32 1024) #2
  %call10 = tail call i32 (i8*, ...) @printf(i8* getelementptr inbounds ([17 x i8], [17 x i8]* @.str.2, i64 0, i64 0), double 0x3FD3333340000000) #2
  ret i32 0
}
```

All three ``printf`` calls receive the same pre-computed constant ``0x3FD3333340000000`` (that is, ``0x1.333334p-2``, the round-to-nearest result): the cast of a constant has been constant folded incorrectly (I guess that clang is assuming a particular rounding mode, which in this case is sometimes the wrong rounding mode).

I'm not sure if there's a good way to fix this. First I thought it would be better if the rounding mode was an operand to ``fptrunc`` (which would make constant folding correct) but then I realized that for codegen to be always correct, every time a ``fptrunc`` is about to be executed the rounding mode might need to be reset, which most of the time would be a very wasteful thing to do.

In general it's not (at least in C) always possible to know statically what the rounding mode will be at any point during the program, because it's part of the currently executing thread's state.

On the other hand LLVM IR isn't supposed to be tied to C, so I feel like there ought to be a way to specify how certain floating point operations do rounding. (I think these rounding issues apply to more than just ``fptrunc``.)

Any thoughts on this? At the very least the LLVM IR documentation needs to be more specific about how rounding is done.

Thanks,
Dan.
Ahmed Bougacha via llvm-dev
2015-Aug-21 20:25 UTC
On Fri, Aug 21, 2015 at 12:36 PM, Dan Liew via llvm-dev <llvm-dev at lists.llvm.org> wrote:
> I've recently been looking at how to implement in LLVM IR the rounding
> of floating point values when casting using different rounding modes
> and I've hit some problems.

This sounds like https://llvm.org/bugs/show_bug.cgi?id=8100 : complete support for FP rounding and exceptions (via `#pragma STDC FENV_ACCESS ON', which you need for fesetround to be "meaningful") isn't implemented yet (and is probably a huge task, as you explain).

-Ahmed
Dan Liew via llvm-dev
2015-Aug-21 21:05 UTC
Hi,

> This sounds like https://llvm.org/bugs/show_bug.cgi?id=8100 : complete
> support for FP rounding and exceptions (via `#pragma STDC FENV_ACCESS
> ON', which you need for fesetround to be "meaningful") isn't
> implemented yet (and is probably a huge task, as you explain).

Thanks. I wasn't aware of ``STDC FENV_ACCESS``. Supporting something like this is no doubt difficult.

One way I could imagine supporting rounding in a more general way would be to have all floating point operations at the IR level take a rounding mode operand (which would let you do correctly rounded constant folding in all cases). When doing codegen for something like x86, the most simplistic thing you could do is reset the rounding mode before every floating point operation, but I could imagine handling this more efficiently by computing call-free single-entry-single-exit regions (calls to functions known not to modify the rounding mode could be ignored) in which the rounding mode does not change, and omitting the rounding mode reset instructions there.

I'm not really sure this is a good idea. If there aren't any real-world targets that make the rounding mode part of the instruction opcodes, then it feels like this would be forcing a virtual machine model onto LLVM IR that, although useful for static analysis, poorly reflects what real machines do.

However, some low-hanging fruit that could be addressed pretty quickly would be to give ``fptrunc`` better-defined semantics based on how targets currently codegen it. For x86 it seems to mean "convert a floating point number to a lower precision type using the current rounding mode of the floating point environment". I don't really know what the other targets do, though, so someone with a broader overview than me would need to rewrite the semantics.

Thanks,
Dan.