Michael Ilseman
2012-Nov-14 20:28 UTC
[LLVMdev] [RFC] Extend LLVM IR to express "fast-math" at a per-instruction level
I think I missed what problem we're trying to solve here.

I'm looking at implementing the bitcode now. I have code to successfully read and write out the LLVM IR textual format (LLParser, etc.) and set the corresponding SubclassOptionalData bits. Looking at LLVMBitCodes.h, I'm seeing where these bits reside in the bitcode, so I believe that things should be pretty straightforward from here.

Joe, what are the reasons for me to increment the IR version number? My understanding is that I'll just be using existing bits that were previously ignored. Ignoring these bits is still valid, just conservative. I believe these flags would be zeroed out in old IR (correct me if I'm wrong), which is the intended default.

Chris, what problem could be solved by adding extra operands to binary ops? I'm trying to avoid those sorts of modifications, as the fast-math flags could make sense applied to a variety of operations, e.g. comparisons and casts.

On Nov 12, 2012, at 5:42 PM, Chris Lattner <clattner at apple.com> wrote:
>
> On Nov 12, 2012, at 10:39 AM, Joe Abbey <jabbey at arxan.com> wrote:
>
>> Michael,
>>
>> Since you won't be using metadata to store this information and are augmenting the IR, I'd recommend incrementing the bitcode version number. The current version is stored in a local variable in BitcodeWriter.cpp:1814*
>>
>> I would suspect then you'll also need to provide additional logic for reading:
>>
>> switch (module_version) {
>> default: return Error("Unknown bitstream version!");
>> case 2:
>>   EncodesFastMathIR = true;
>> case 1:
>>   UseRelativeIDs = true;
>>   break;
>> case 0:
>>   UseRelativeIDs = false;
>>   break;
>> }
>
> Couldn't this be handled by adding an extra operand to the binary operators?
>
> -Chris
>
>> Joe
>>
>> (*TODO: Put this somewhere else).
>>
>> On Nov 9, 2012, at 5:34 PM, Michael Ilseman <milseman at apple.com> wrote:
>>
>>> Revision 2
>>>
>>> Revision 2 changes:
>>> * Add in separate Reciprocal flag
>>> * Clarified wording of flags, specified undefined values, not behavior
>>> * Removed some confusing language
>>> * Mentioned optimizations/analyses adding in flags due to inferred knowledge
>>>
>>> Revision 1 changes:
>>> * Removed Fusion flag from all sections
>>> * Clarified and changed descriptions of remaining flags:
>>>   * Make 'N' and 'I' flags be explicitly concerning values of operands, and producing undef values if a NaN/Inf is provided.
>>>   * 'S' is now only about distinguishing between +/-0.
>>> * LangRef changes updated to reflect flags changes
>>> * Updated Question section given the now simpler set of flags
>>> * Optimizations changed to reflect 'N' and 'I' describing operands and not results
>>> * Be explicit on what LLVM's default behavior is (no signaling NaNs, etc.)
>>> * Mention that this could be alternatively solved with metadata, and open the debate
>>>
>>>
>>> Introduction
>>> ---
>>>
>>> LLVM IR currently does not have any support for specifying fine-grained control over relaxing floating point requirements for the optimizer. Below is a proposal to extend floating point IR instructions to support a number of flags that a creator of IR can use to allow for greater optimizations when desired. Such changes are sometimes referred to as fast-math, but this proposal is about finer-grained specifications at a per-instruction level.
>>>
>>>
>>> What this doesn't address
>>> ---
>>>
>>> Default behavior is retained, and this proposal only addresses relaxing restrictions.
>>> LLVM currently by default:
>>> - ignores signaling NaNs
>>> - assumes default rounding mode
>>> - assumes FENV_ACCESS is off
>>>
>>> Discussion on changing the default behavior of LLVM or allowing for more restrictive behavior is outside the scope of this proposal. This proposal does not address behavior of denormals, which is more of a backend concern.
>>>
>>> Specifying exact precision control or requirements is outside the scope of this proposal, and can probably be handled with the existing metadata implementation.
>>>
>>> This proposal covers changes to and optimizations over LLVM IR; changes to codegen are outside the scope of this proposal. The flags described in the next section exist only at the IR level, and will not be propagated into codegen or the SelectionDAG.
>>>
>>>
>>> Flags
>>> ---
>>>
>>> LLVM IR instructions will have the following flags that can be set by the creator of the IR.
>>>
>>> no NaNs (N)
>>> - Allow optimizations that assume the arguments and result are not NaN. Such optimizations are required to retain defined behavior over NaNs, but the value of the result is undefined.
>>>
>>> no Infs (I)
>>> - Allow optimizations that assume the arguments and result are not +/-Inf. Such optimizations are required to retain defined behavior over +/-Inf, but the value of the result is undefined.
>>>
>>> no signed zeros (S)
>>> - Allow optimizations to treat the sign of a zero argument or result as insignificant.
>>>
>>> allow reciprocal (R)
>>> - Allow optimizations to use the reciprocal of an argument instead of dividing.
>>>
>>> unsafe algebra (A)
>>> - The optimizer is allowed to perform algebraically equivalent transformations that may dramatically change results in floating point (e.g. reassociation).
>>>
>>> Throughout I'll refer to these options in their short-hand, e.g. 'A'. Internally, these flags are to reside in SubclassData.
>>>
>>> Setting the 'A' flag implies the setting of all the others ('N', 'I', 'S', 'R').
>>>
>>>
>>> Changes to LangRef
>>> ---
>>>
>>> Change the definitions of floating point arithmetic operations; below is how fadd will change:
>>>
>>> 'fadd' Instruction
>>> Syntax:
>>>
>>>   <result> = fadd {flag}* <ty> <op1>, <op2>   ; yields {ty}:result
>>> ...
>>> Semantics:
>>> ...
>>> flag can be one of the following optimizer hints to enable otherwise unsafe floating point optimizations:
>>> N: no NaNs - Allow optimizations that assume the arguments and result are not NaN. Such optimizations are required to retain defined behavior over NaNs, but the value of the result is undefined.
>>> I: no infs - Allow optimizations that assume the arguments and result are not +/-Inf. Such optimizations are required to retain defined behavior over +/-Inf, but the value of the result is undefined.
>>> S: no signed zeros - Allow optimizations to treat the sign of a zero argument or result as insignificant.
>>> A: unsafe algebra - The optimizer is allowed to perform algebraically equivalent transformations that may dramatically change results in floating point (e.g. reassociation).
>>>
>>> fdiv will also mention that 'R' allows the fdiv to be replaced by a multiply-by-reciprocal.
>>>
>>>
>>> Changes to optimizations
>>> ---
>>>
>>> Optimizations should be allowed to perform unsafe optimizations provided the instructions involved have the corresponding restrictions relaxed.
>>> When combining instructions, optimizations should do what makes sense to not remove restrictions that previously existed (commonly, a bitwise-AND of the flags).
>>>
>>> Below are some example optimizations that could be allowed with the given relaxations.
>>>
>>> N - no NaNs
>>>   x == x ==> true
>>>
>>> S - no signed zeros
>>>   x - 0 ==> x
>>>   0 - (x - y) ==> y - x
>>>
>>> NIS - no signed zeros AND no NaNs AND no Infs
>>>   x * 0 ==> 0
>>>
>>> NI - no infs AND no NaNs
>>>   x - x ==> 0
>>>
>>> R - reciprocal
>>>   x / y ==> x * (1/y)
>>>
>>> A - unsafe-algebra
>>>   Reassociation
>>>     (x + y) + z ==> x + (y + z)
>>>     (x + C1) + C2 ==> x + (C1 + C2)
>>>   Redistribution
>>>     (x * C) + x ==> x * (C+1)
>>>     (x * C) + (x + x) ==> x * (C + 2)
>>>
>>> I propose to expand -instsimplify and -instcombine to perform these kinds of optimizations. -reassociate will be expanded to reassociate floating point operations when allowed. Similar to existing behavior regarding integer wrapping, -early-cse will not CSE FP operations with mismatched flags, while -gvn will (conservatively). This allows later optimizations to optimize the expressions independently between runs of -early-cse and -gvn.
>>>
>>> Optimizations and analyses that are able to infer certain properties of instructions are allowed to set relevant flags. For example, if some analysis has determined that the arguments and result of an instruction are not NaNs or Infs, then it may set the 'N' and 'I' flags, allowing every other optimization and analysis to benefit from this inferred knowledge.
>>>
>>>
>>> Changes to frontends
>>> ---
>>>
>>> Frontends are free to generate code with flags set as they desire. Frontends should continue to call llc with their desired options, as the flags apply only at the IR level and not at codegen or the SelectionDAG.
>>>
>>> The intention behind the flags is to allow the IR creator to say something along the lines of: "If this operation is given a NaN, or the result is a NaN, then I don't care what answer I get back. However, I expect my program to otherwise behave properly."
>>>
>>> Below is a suggested change to clang's command-line options.
>>>
>>> -ffast-math
>>> Currently described as:
>>>   Enable the *frontend*'s 'fast-math' mode. This has no effect on optimizations, but provides a preprocessor macro __FAST_MATH__, the same as GCC's -ffast-math flag.
>>>
>>> I propose to change the description and behavior to:
>>>   Enable 'fast-math' mode. This allows for optimizations that may produce incorrect and unsafe results, and thus should only be used with care. This also provides a preprocessor macro __FAST_MATH__, the same as GCC's -ffast-math flag.
>>>
>>> I propose that this turn on all flags for all floating point instructions. If this flag doesn't already cause clang to run llc with -enable-unsafe-fp-math, then I propose that it does so as well.
>>>
>>> (Optional)
>>> I propose adding the flags below:
>>>
>>> -ffinite-math-only
>>> Allow optimizations to assume that floating point arguments and results are not NaNs or +/-Inf. This may produce incorrect results, and so should be used with care.
>>>
>>> This would set the 'I' and 'N' bits on all generated floating point instructions.
>>>
>>> -fno-signed-zeros
>>> Allow optimizations to ignore the signedness of zero. This may produce incorrect results, and so should be used with care.
>>>
>>> This would set the 'S' bit on all FP instructions.
>>>
>>> -freciprocal-math
>>> Allow optimizations to use the reciprocal of an argument instead of using division. This may produce less precise results, and so should be used with care.
>>>
>>> This would set the 'R' bit on all relevant FP instructions.
>>>
>>>
>>> Changes to llvm cli tools
>>> ---
>>>
>>> opt and llc already have the command line options:
>>>   -enable-unsafe-fp-math: Enable optimizations that may decrease FP precision
>>>   -enable-no-infs-fp-math: Enable FP math optimizations that assume no +-Infs
>>>   -enable-no-nans-fp-math: Enable FP math optimizations that assume no NaNs
>>> However, opt makes no use of them, as they are currently only considered to be TargetOptions. llc will remain unchanged, as these options apply to DAG optimizations while this proposal deals with IR optimizations.
>>>
>>> (Optional)
>>> Have an opt pass that adds the desired flags to floating point instructions.
>>>
>>>
>>> Miscellaneous explanations in the form of Q&A
>>> ---
>>>
>>> Why not just have "fast-math" rather than individual flags?
>>>
>>> Having the individual flags gives the granularity to choose the levels of optimizations. For example, unsafe-algebra can lead to dramatically different results in corner cases, and may not be desired when a user just wants to ensure that x*0 folds to 0.
>>>
>>>
>>> Why have these flags attached to the instruction itself, rather than being a compiler mode?
>>>
>>> Being attached to the instruction itself allows much greater flexibility, both for other optimizations and for the concerns of the source and target. For example, a frontend may desire that x - x be folded to 0. This would require no-NaNs for the subtract. However, the frontend may want to keep NaNs for its comparisons.
>>>
>>> Additionally, these properties can be set internally in the optimizer when the property has been proven. For example, if x has been found to be positive, then operations involving x and a constant can be marked to ignore signed zero.
>>>
>>> Finally, having these flags allows for greater safety and optimization when code with different flags is mixed. For example, a function author may set the unsafe-algebra flag knowing that such transformations will not meaningfully alter its result. If that function gets inlined into a caller, however, we don't want to always assume that the function's expressions can be reassociated with the caller's expressions. These properties allow us to preserve the optimizations of the inlined function without affecting the caller.
>>>
>>>
>>> Why not use metadata rather than flags?
>>>
>>> There is existing metadata to denote precisions, and this proposal is orthogonal to those efforts. While these properties could still be expressed as metadata, the proposed flags are analogous to nsw/nuw and are inherent properties of the IR instructions themselves that all transformations should respect.
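[Editorial illustration] The proposal's nsw/nuw analogy suggests a natural C++ surface for setting and querying the flags on individual instructions. The sketch below is illustrative only: the FastMathFlags container and the setNoNaNs/setNoSignedZeros/setFastMathFlags accessors are assumed names that mirror the existing setHasNoUnsignedWrap pattern, not an API that exists at the time of this thread. IRBuilder::CreateFAdd and dyn_cast are existing LLVM API.

    // Sketch only: FastMathFlags and the set*/setFastMathFlags accessors are
    // hypothetical names following the proposal; they are not LLVM API here.
    #include "llvm/IRBuilder.h"   // 2012-era header location

    using namespace llvm;

    Value *emitRelaxedAdd(IRBuilder<> &B, Value *L, Value *R) {
      // Build an ordinary fadd first.
      Value *Sum = B.CreateFAdd(L, R, "sum");

      // Then relax it: assume no NaN operands/results and no signed zeros,
      // in the spirit of the proposed 'N' and 'S' flags.  CreateFAdd may
      // constant-fold, so only touch the result if it is an Instruction.
      if (Instruction *I = dyn_cast<Instruction>(Sum)) {
        FastMathFlags FMF;          // hypothetical flag container
        FMF.setNoNaNs();            // 'N'
        FMF.setNoSignedZeros();     // 'S'
        I->setFastMathFlags(FMF);   // hypothetical Instruction accessor
      }
      return Sum;
    }

    // A consumer (e.g. an instcombine-style fold) would query the same bits:
    //   if (I->hasNoNaNs()) { ... fold x == x to true ... }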
Krzysztof Parzyszek
2012-Nov-14 20:43 UTC
[LLVMdev] [RFC] Extend LLVM IR to express "fast-math" at a per-instruction level
On 11/14/2012 2:28 PM, Michael Ilseman wrote:
> I think I missed what problem we're trying to solve here.

I'm guessing that the actual encoding may change. If we know that there are certain unused bits, we don't need to store them. If the bits are used, we have to keep them. I think someone was working on some sort of bitcode compression scheme, and in that context the difference may be significant.

-Krzysztof

--
Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, hosted by The Linux Foundation
Chris Lattner
2012-Nov-14 20:47 UTC
[LLVMdev] [RFC] Extend LLVM IR to express "fast-math" at a per-instruction level
On Nov 14, 2012, at 12:28 PM, Michael Ilseman <milseman at apple.com> wrote:
> I think I missed what problem we're trying to solve here.
>
> I'm looking at implementing the bitcode now. I have code to successfully read and write out the LLVM IR textual format (LLParser, etc.) and set the corresponding SubclassOptionalData bits. Looking at LLVMBitCodes.h, I'm seeing where these bits reside in the bitcode, so I believe that things should be pretty straightforward from here.
>
> Joe, what are the reasons for me to increment the IR version number? My understanding is that I'll just be using existing bits that were previously ignored. Ignoring these bits is still valid, just conservative. I believe these flags would be zeroed out in old IR (correct me if I'm wrong), which is the intended default.

Yes, this is the right thing; just make the sense of the bits in the bitcode file be "1" for the cases that differ from the old default.

> Chris, what problem could be solved by adding extra operands to binary ops? I'm trying to avoid those sorts of modifications, as the fast-math flags could make sense applied to a variety of operations, e.g. comparisons and casts.

How, specifically, are you proposing that these bits be encoded?

-Chris
Michael Ilseman
2012-Nov-14 21:39 UTC
[LLVMdev] [RFC] Extend LLVM IR to express "fast-math" at a per-instruction level
On Nov 14, 2012, at 12:47 PM, Chris Lattner <clattner at apple.com> wrote:
>
> On Nov 14, 2012, at 12:28 PM, Michael Ilseman <milseman at apple.com> wrote:
>
>> I think I missed what problem we're trying to solve here.
>>
>> I'm looking at implementing the bitcode now. I have code to successfully read and write out the LLVM IR textual format (LLParser, etc.) and set the corresponding SubclassOptionalData bits. Looking at LLVMBitCodes.h, I'm seeing where these bits reside in the bitcode, so I believe that things should be pretty straightforward from here.
>>
>> Joe, what are the reasons for me to increment the IR version number? My understanding is that I'll just be using existing bits that were previously ignored. Ignoring these bits is still valid, just conservative. I believe these flags would be zeroed out in old IR (correct me if I'm wrong), which is the intended default.
>
> Yes, this is the right thing; just make the sense of the bits in the bitcode file be "1" for the cases that differ from the old default.

Will do!

>> Chris, what problem could be solved by adding extra operands to binary ops? I'm trying to avoid those sorts of modifications, as the fast-math flags could make sense applied to a variety of operations, e.g. comparisons and casts.
>
> How, specifically, are you proposing that these bits be encoded?

I'm new to the bitcode, so let me know if this doesn't make sense. I was going to look at the encoding for nuw (OBO_NO_UNSIGNED_WRAP) and follow how it is encoded/decoded in the bitcode. I would then specify some kind of fast-math enum and encode it in a similar fashion. After I go down this path a little more I might be able to give you a more intelligent answer. Thanks!

> -Chris
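[Editorial illustration] The nuw/nsw precedent Michael mentions is the small bit-position enum OverflowingBinaryOperatorOptionalFlags in include/llvm/Bitcode/LLVMBitCodes.h, whose bits the writer packs into the optional-flags field of binary-operator records. A fast-math analogue might look like the sketch below; the FastMathFlagsBitcode enum, its members, GetEncodedFastMathFlags, and the has* accessors are assumptions for illustration, not existing LLVM code.

    // Existing precedent in LLVMBitCodes.h (real code, shown for reference):
    //   enum OverflowingBinaryOperatorOptionalFlags {
    //     OBO_NO_UNSIGNED_WRAP = 0,
    //     OBO_NO_SIGNED_WRAP   = 1
    //   };
    //
    // Hypothetical analogue for the proposed per-instruction fast-math bits.
    enum FastMathFlagsBitcode {
      FMF_NO_NANS          = 0,  // 'N'
      FMF_NO_INFS          = 1,  // 'I'
      FMF_NO_SIGNED_ZEROS  = 2,  // 'S'
      FMF_ALLOW_RECIPROCAL = 3,  // 'R'
      FMF_UNSAFE_ALGEBRA   = 4   // 'A' (implies the others)
    };

    // Sketch of how a writer might pack the bits, mirroring the way nuw/nsw
    // are folded into the optional-flags word.  The has*() calls are the
    // hypothetical Instruction accessors assumed earlier in this thread.
    static uint64_t GetEncodedFastMathFlags(const Instruction &I) {
      uint64_t Flags = 0;
      if (I.hasNoNaNs())          Flags |= 1ULL << FMF_NO_NANS;
      if (I.hasNoInfs())          Flags |= 1ULL << FMF_NO_INFS;
      if (I.hasNoSignedZeros())   Flags |= 1ULL << FMF_NO_SIGNED_ZEROS;
      if (I.hasAllowReciprocal()) Flags |= 1ULL << FMF_ALLOW_RECIPROCAL;
      if (I.hasUnsafeAlgebra())   Flags |= 1ULL << FMF_UNSAFE_ALGEBRA;
      return Flags;
    }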
Michael Ilseman
2012-Nov-15 03:19 UTC
[LLVMdev] [RFC] Extend LLVM IR to express "fast-math" at a per-instruction level
On Nov 14, 2012, at 12:28 PM, Michael Ilseman <milseman at apple.com> wrote:
> I think I missed what problem we're trying to solve here.
...
> Chris, what problem could be solved by adding extra operands to binary ops? I'm trying to avoid those sorts of modifications, as the fast-math flags could make sense applied to a variety of operations, e.g. comparisons and casts.

I see now that it's only binary operators that have OptimizationFlags reserved for them in the bitcode. Adding fast-math flags for only binary ops is straightforward, but adding them for other ops might require a more involved bitcode change.

I think that there might be some benefit to having flags for other kinds of ops, but those seem a bit more far-fetched and less common. For example, "fcmp N oeq x, x ==> true" or "bitcast N (bitcast N i32 %foo to float) to i32 ==> i32 %foo" seem more contrived than optimizations over binary ops. Comparisons are already sort of their own beast, as they ignore the sign of zero and have ordered and unordered versions.

Given all that, I think it makes sense to add support for fast-math flags only to binary ops in this iteration, and think about adding it to other operations in the future. Thoughts?
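[Editorial illustration] To make the "changes to optimizations" idea concrete, here is a rough sketch of how an instsimplify-style fold might consult the proposed flags before firing, using the x - x ==> 0 example from the RFC (legal only under 'N' and 'I'). The hasNoNaNs()/hasNoInfs() queries are the hypothetical per-instruction accessors assumed earlier; Constant::getNullValue and the operand/opcode queries are existing LLVM API.

    // Sketch: fold 'fsub x, x' to +0.0 only when the instruction carries the
    // proposed no-NaNs ('N') and no-Infs ('I') flags.
    #include "llvm/Constants.h"      // 2012-era header locations
    #include "llvm/Instructions.h"

    using namespace llvm;

    static Value *simplifyRelaxedFSub(BinaryOperator *I) {
      if (I->getOpcode() != Instruction::FSub)
        return 0;
      // NaN - NaN is NaN and Inf - Inf is NaN, so the fold needs both flags.
      if (!I->hasNoNaNs() || !I->hasNoInfs())   // hypothetical flag queries
        return 0;
      if (I->getOperand(0) != I->getOperand(1))
        return 0;
      return Constant::getNullValue(I->getType());  // x - x ==> 0
    }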
Chandler Carruth
2012-Nov-15 03:37 UTC
[LLVMdev] [RFC] Extend LLVM IR to express "fast-math" at a per-instruction level
On Wed, Nov 14, 2012 at 7:19 PM, Michael Ilseman <milseman at apple.com> wrote:
> I see now that it's only binary operators that have OptimizationFlags reserved for them in the bitcode. Adding fast-math flags for only binary ops is straightforward, but adding them for other ops might require a more involved bitcode change.
...
> Given all that, I think it makes sense to add support for fast-math flags only to binary ops in this iteration, and think about adding it to other operations in the future. Thoughts?

I'm really not trying to rehash a discussion in too much depth, but with all of this I have to wonder: why not use metadata as the *encoding* mechanism for these flags?

Just to be clear, I have no strong feelings about any of this, but it feels like metadata at the bitcode level provides a nice, extensible encoding scheme. At the same time, the super-convenient accessor methods on the C++ instruction APIs, much like flags, seem really useful. But I don't see why we can't have the best of both worlds. Essentially, put the flags in "metadata", but still provide nice first-class APIs so that they're significantly easier to use.

:: shrug :: just an idea that might simplify modeling this stuff.
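[Editorial illustration] A rough sketch of the "metadata as encoding, flags as API" idea Chandler describes might look like the following. Instruction::setMetadata/getMetadata, MDNode, and MDString are real (2012-era) LLVM APIs; the "fast" metadata kind name and the setRelaxedFPFlags/instructionHasFlag helpers are purely illustrative assumptions.

    // Sketch of carrying the proposed flags as instruction metadata while
    // still exposing a flag-like C++ accessor.  The "fast" kind and these
    // helper names are hypothetical; the metadata calls are existing API.
    #include "llvm/ADT/ArrayRef.h"
    #include "llvm/ADT/SmallVector.h"
    #include "llvm/Instruction.h"
    #include "llvm/Metadata.h"

    using namespace llvm;

    // Attach e.g. {"N", "S"} to an instruction under a hypothetical "fast" kind.
    static void setRelaxedFPFlags(Instruction *I, ArrayRef<StringRef> Flags) {
      SmallVector<Value *, 5> Ops;
      for (unsigned i = 0, e = Flags.size(); i != e; ++i)
        Ops.push_back(MDString::get(I->getContext(), Flags[i]));
      I->setMetadata("fast", MDNode::get(I->getContext(), Ops));
    }

    // Flag-style query: does the instruction carry the given flag letter?
    static bool instructionHasFlag(const Instruction *I, StringRef Flag) {
      MDNode *MD = I->getMetadata("fast");
      if (!MD)
        return false;
      for (unsigned i = 0, e = MD->getNumOperands(); i != e; ++i)
        if (MDString *S = dyn_cast<MDString>(MD->getOperand(i)))
          if (S->getString() == Flag)
            return true;
      return false;
    }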
Evan Cheng
2012-Nov-15 06:55 UTC
[LLVMdev] [RFC] Extend LLVM IR to express "fast-math" at a per-instruction level
On Nov 14, 2012, at 7:19 PM, Michael Ilseman <milseman at apple.com> wrote:
> I see now that it's only binary operators that have OptimizationFlags reserved for them in the bitcode. Adding fast-math flags for only binary ops is straightforward, but adding them for other ops might require a more involved bitcode change.
>
> I think that there might be some benefit to having flags for other kinds of ops, but those seem a bit more far-fetched and less common. For example, "fcmp N oeq x, x ==> true" or "bitcast N (bitcast N i32 %foo to float) to i32 ==> i32 %foo" seem more contrived than optimizations over binary ops. Comparisons are already sort of their own beast, as they ignore the sign of zero and have ordered and unordered versions.
>
> Given all that, I think it makes sense to add support for fast-math flags only to binary ops in this iteration, and think about adding it to other operations in the future. Thoughts?

I agree. We are not trying to solve all the problems at this time.

Evan
>>>> >>>> On Nov 9, 2012, at 5:34 PM, Michael Ilseman <milseman at apple.com> wrote: >>>> >>>>> Revision 2 >>>>> >>>>> Revision 2 changes: >>>>> * Add in separate Reciprocal flag >>>>> * Clarified wording of flags, specified undefined values, not behavior >>>>> * Removed some confusing language >>>>> * Mentioned optimizations/analyses adding in flags due to inferred knowledge >>>>> >>>>> Revision 1 changes: >>>>> * Removed Fusion flag from all sections >>>>> * Clarified and changed descriptions of remaining flags: >>>>> * Make 'N' and 'I' flags be explicitly concerning values of operands, and >>>>> producing undef values if a NaN/Inf is provided. >>>>> * 'S' is now only about distinguishing between +/-0. >>>>> * LangRef changes updated to reflect flags changes >>>>> * Updated Quesiton section given the now simpler set of flags >>>>> * Optimizations changed to reflect 'N' and 'I' describing operands and not >>>>> results >>>>> * Be explicit on what LLVM's default behavior is (no signaling NaNs, etc) >>>>> * Mention that this could be alternatively solved with metadata, and open the >>>>> debate >>>>> >>>>> >>>>> Introduction >>>>> --- >>>>> >>>>> LLVM IR currently does not have any support for specifying fine-grained control >>>>> over relaxing floating point requirements for the optimizer. The below is a >>>>> proposal to extend floating point IR instructions to support a number of flags >>>>> that a creator of IR can use to allow for greater optimizations when >>>>> desired. Such changes are sometimes referred to as fast-math, but this proposal >>>>> is about finer-grained specifications at a per-instruction level. >>>>> >>>>> >>>>> What this doesn't address >>>>> --- >>>>> >>>>> Default behavior is retained, and this proposal is only addressing relaxing >>>>> restrictions. LLVM currently by default: >>>>> - ignores signaling NaNs >>>>> - assumes default rounding mode >>>>> - assumes FENV_ACCESS is off >>>>> >>>>> Discussion on changing the default behavior of LLVM or allowing for more >>>>> restrictive behavior is outside the scope of this proposal. This proposal does >>>>> not address behavior of denormals, which is more of a backend concern. >>>>> >>>>> Specifying exact precision control or requirements is outside the scope of this >>>>> proposal, and can probably be handled with the existing metadata implementation. >>>>> >>>>> This proposal covers changes to and optimizations over LLVM IR, and changes to >>>>> codegen are outside the scope of this proposal. The flags described in the next >>>>> section exist only at the IR level, and will not be propagated into codegen or >>>>> the SelectionDAG. >>>>> >>>>> >>>>> Flags >>>>> --- >>>>> >>>>> LLVM IR instructions will have the following flags that can be set by the >>>>> creator of the IR. >>>>> >>>>> no NaNs (N) >>>>> - Allow optimizations that assume the arguments and result are not NaN. Such >>>>> optimizations are required to retain defined behavior over NaNs, but the >>>>> value of the result is undefined. >>>>> >>>>> no Infs (I) >>>>> - Allow optimizations that assume the arguments and result are not >>>>> +/-Inf. Such optimizations are required to retain defined behavior over >>>>> +/-Inf, but the value of the result is undefined. >>>>> >>>>> no signed zeros (S) >>>>> - Allow optimizations to treat the sign of a zero argument or result as >>>>> insignificant. 
>>>>> >>>>> allow reciprocal (R) >>>>> - Allow optimizations to use the reciprocal of an argument instead of dividing >>>>> >>>>> unsafe algebra (A) >>>>> - The optimizer is allowed to perform algebraically equivalent transformations >>>>> that may dramatically change results in floating point. (e.g. >>>>> reassociation). >>>>> >>>>> Throughout I'll refer to these options in their short-hand, e.g. 'A'. >>>>> Internally, these flags are to reside in SubclassData. >>>>> >>>>> Setting the 'A' flag implies the setting of all the others ('N', 'I', 'S', 'R'). >>>>> >>>>> >>>>> Changes to LangRef >>>>> --- >>>>> >>>>> Change the definitions of floating point arithmetic operations, below is how >>>>> fadd will change: >>>>> >>>>> 'fadd' Instruction >>>>> Syntax: >>>>> >>>>> <result> = fadd {flag}* <ty> <op1>, <op2> ; yields {ty}:result >>>>> ... >>>>> Semantics: >>>>> ... >>>>> flag can be one of the following optimizer hints to enable otherwise unsafe >>>>> floating point optimizations: >>>>> N: no NaNs - Allow optimizations that assume the arguments and result are not >>>>> NaN. Such optimizations are required to retain defined behavior over NaNs, >>>>> but the value of the result is undefined. >>>>> I: no infs - Allow optimizations that assume the arguments and result are not >>>>> +/-Inf. Such optimizations are required to retain defined behavior over >>>>> +/-Inf, but the value of the result is undefined. >>>>> S: no signed zeros - Allow optimizations to treat the sign of a zero argument >>>>> or result as insignificant. >>>>> A: unsafe algebra - The optimizer is allowed to perform algebraically >>>>> equivalent transformations that may dramatically change results in floating >>>>> point. (e.g. reassociation). >>>>> >>>>> fdiv will also mention that 'R' allows the fdiv to be replaced by a >>>>> multiply-by-reciprocal. >>>>> >>>>> >>>>> Changes to optimizations >>>>> --- >>>>> >>>>> Optimizations should be allowed to perform unsafe optimizations provided the >>>>> instructions involved have the corresponding restrictions relaxed. When >>>>> combining instructions, optimizations should do what makes sense to not remove >>>>> restrictions that previously existed (commonly, a bitwise-AND of the flags). >>>>> >>>>> Below are some example optimizations that could be allowed with the given >>>>> relaxations. >>>>> >>>>> N - no NaNs >>>>> x == x ==> true >>>>> >>>>> S - no signed zeros >>>>> x - 0 ==> x >>>>> 0 - (x - y) ==> y - x >>>>> >>>>> NIS - no signed zeros AND no NaNs AND no Infs >>>>> x * 0 ==> 0 >>>>> >>>>> NI - no infs AND no NaNs >>>>> x - x ==> 0 >>>>> >>>>> R - reciprocal >>>>> x / y ==> x * (1/y) >>>>> >>>>> A - unsafe-algebra >>>>> Reassociation >>>>> (x + y) + z ==> x + (y + z) >>>>> (x + C1) + C2 ==> x + (C1 + C2) >>>>> Redistribution >>>>> (x * C) + x ==> x * (C+1) >>>>> (x * C) + (x + x) ==> x * (C + 2) >>>>> >>>>> I propose to expand -instsimplify and -instcombine to perform these kinds of >>>>> optimizations. -reassociate will be expanded to reassociate floating point >>>>> operations when allowed. Similar to existing behavior regarding integer >>>>> wrapping, -early-cse will not CSE FP operations with mismatched flags, while >>>>> -gvn will (conservatively). This allows later optimizations to optimize the >>>>> expressions independently between runs of -early-cse and -gvn. >>>>> >>>>> Optimizations and analyses that are able to infer certain properties of >>>>> instructions are allowed to set relevant flags. 
>>>>> I propose to expand -instsimplify and -instcombine to perform these kinds of optimizations. -reassociate will be expanded to reassociate floating point operations when allowed. Similar to existing behavior regarding integer wrapping, -early-cse will not CSE FP operations with mismatched flags, while -gvn will (conservatively). This allows later optimizations to optimize the expressions independently between runs of -early-cse and -gvn.
>>>>>
>>>>> Optimizations and analyses that are able to infer certain properties of instructions are allowed to set the relevant flags. For example, if some analysis has determined that the arguments and result of an instruction are not NaNs or Infs, then it may set the 'N' and 'I' flags, allowing every other optimization and analysis to benefit from this inferred knowledge.
>>>>>
>>>>> Changes to frontends
>>>>> ---
>>>>>
>>>>> Frontends are free to generate code with flags set as they desire. Frontends should continue to call llc with their desired options, as the flags apply only at the IR level and not to codegen or the SelectionDAG.
>>>>>
>>>>> The intention behind the flags is to allow the IR creator to say something along the lines of: "If this operation is given a NaN, or the result is a NaN, then I don't care what answer I get back. However, I expect my program to otherwise behave properly."
>>>>>
>>>>> Below is a suggested change to clang's command-line options.
>>>>>
>>>>> -ffast-math
>>>>> Currently described as:
>>>>>   Enable the *frontend*'s 'fast-math' mode. This has no effect on optimizations, but provides a preprocessor macro __FAST_MATH__, the same as GCC's -ffast-math flag.
>>>>> I propose to change the description and behavior to:
>>>>>   Enable 'fast-math' mode. This allows for optimizations that may produce incorrect and unsafe results, and thus should only be used with care. This also provides a preprocessor macro __FAST_MATH__, the same as GCC's -ffast-math flag.
>>>>> I propose that this turn on all flags for all floating point instructions. If this flag doesn't already cause clang to run llc with -enable-unsafe-fp-math, then I propose that it does so as well.
>>>>>
>>>>> (Optional) I propose adding the flags below:
>>>>>
>>>>> -ffinite-math-only
>>>>>   Allow optimizations to assume that floating point arguments and results are not NaNs or +/-Inf. This may produce incorrect results, and so should be used with care.
>>>>>   This would set the 'I' and 'N' bits on all generated floating point instructions.
>>>>>
>>>>> -fno-signed-zeros
>>>>>   Allow optimizations to ignore the signedness of zero. This may produce incorrect results, and so should be used with care.
>>>>>   This would set the 'S' bit on all FP instructions.
>>>>>
>>>>> -freciprocal-math
>>>>>   Allow optimizations to use the reciprocal of an argument instead of using division. This may produce less precise results, and so should be used with care.
>>>>>   This would set the 'R' bit on all relevant FP instructions.
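
To illustrate the frontend side, here is a hedged sketch of the IR a frontend might emit for a small function computing (a + b) / a in single precision when built with the proposed -ffinite-math-only; the function and value names are invented, and the flag spelling again follows the shorthand.

  define float @scale(float %a, float %b) {
  entry:
    ; every generated FP instruction carries the 'N' and 'I' bits
    %add = fadd N I float %a, %b
    %div = fdiv N I float %add, %a
    ret float %div
  }

Under the proposed -ffast-math behavior, the same instructions would instead carry 'A' (and therefore all of the other flags).
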
>>>>> Changes to llvm command-line tools
>>>>> ---
>>>>>
>>>>> opt and llc already have the command-line options:
>>>>>   -enable-unsafe-fp-math: Enable optimizations that may decrease FP precision
>>>>>   -enable-no-infs-fp-math: Enable FP math optimizations that assume no +-Infs
>>>>>   -enable-no-nans-fp-math: Enable FP math optimizations that assume no NaNs
>>>>> However, opt makes no use of them, as they are currently only considered to be TargetOptions. llc will remain unchanged, as these options apply to DAG optimizations while this proposal deals with IR optimizations.
>>>>>
>>>>> (Optional) Have an opt pass that adds the desired flags to floating point instructions.
>>>>>
>>>>> Miscellaneous explanations in the form of Q&A
>>>>> ---
>>>>>
>>>>> Why not just have "fast-math" rather than individual flags?
>>>>>
>>>>> Having the individual flags gives the granularity to choose the level of optimization. For example, unsafe algebra can lead to dramatically different results in corner cases, and may not be desired when a user just wants to ensure that x * 0 folds to 0.
>>>>>
>>>>> Why have these flags attached to the instruction itself, rather than as a compiler mode?
>>>>>
>>>>> Being attached to the instruction itself allows much greater flexibility, both for other optimizations and for the concerns of the source and target. For example, a frontend may desire that x - x be folded to 0. This would require no-NaNs for the subtract. However, the frontend may want to keep NaNs for its comparisons.
>>>>>
>>>>> Additionally, these properties can be set internally in the optimizer when the property has been proven. For example, if x has been found to be positive, then operations involving x and a constant can be marked to ignore signed zero.
>>>>>
>>>>> Finally, having these flags allows for greater safety and optimization when code with different flags is mixed. For example, a function author may set the unsafe-algebra flag knowing that such transformations will not meaningfully alter its result. If that function gets inlined into a caller, however, we don't want to always assume that the function's expressions can be reassociated with the caller's expressions. These properties allow us to preserve the optimizations of the inlined function without affecting the caller. (A short IR sketch at the end of this message illustrates the point.)
>>>>>
>>>>> Why not use metadata rather than flags?
>>>>>
>>>>> There is existing metadata to denote precision, and this proposal is orthogonal to those efforts. While these properties could still be expressed as metadata, the proposed flags are analogous to nsw/nuw and are inherent properties of the IR instructions themselves that all transformations should respect.
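
To illustrate the inlining point from the Q&A above, here is a hedged sketch (function and value names invented, flag spelling per the shorthand) of a caller after a flag-carrying callee has been inlined into it:

  ; inlined from a callee that was built with unsafe algebra:
  ; these three instructions may be reassociated among themselves
  %t0  = fmul A float %x0, %y0
  %t1  = fmul A float %x1, %y1
  %dot = fadd A float %t0, %t1
  ; the caller's own arithmetic carries no flags, so it must not be
  ; reassociated or otherwise combined with the flagged expressions above
  %sum = fadd float %dot, %acc

An optimization that wanted to merge the caller's fadd with the inlined one would take the bitwise AND of their flags, which is empty here, so the caller keeps its strict semantics.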