The attached patch is a first attempt at representing "-ffast-math" at the IR level, in fact on individual floating point instructions (fadd, fsub etc). It is done using metadata. We already have a "fpmath" metadata type which can be used to signal that reduced precision is OK for a floating point operation, e.g.

  %z = fmul float %x, %y, !fpmath !0
  ...
  !0 = metadata !{double 2.5}

indicates that the multiplication can be done in any way that doesn't introduce more than 2.5 ULPs of error.

The first observation is that !fpmath can be extended with additional operands in the future: operands that say things like whether it is OK to assume that there are no NaNs and so forth.

This patch doesn't add additional operands though. It just allows the existing accuracy operand to be the special keyword "fast" instead of a number:

  %z = fmul float %x, %y, !fpmath !0
  ...
  !0 = metadata !{metadata !"fast"}

This indicates that accuracy loss is acceptable (just how much is unspecified) for the sake of speed. Thanks to Chandler for pushing me to do it this way!

It also creates a simple way of getting and setting this information: the FPMathOperator class. You can cast appropriate instructions to this class and then use the querying/mutating methods to get/set the accuracy, whether 2.5 or "fast". The attached clang patch uses this to set the OpenCL 2.5 ULPs accuracy rather than doing it by hand, for example.

In addition it changes IRBuilder so that you can provide an accuracy when creating floating point operations. I don't like this so much. It would be more efficient to just create the metadata once and then splat it onto each instruction. Also, if fpmath gets a bunch more options/operands in the future then this interface will become more and more awkward. Opinions welcome!

I didn't actually implement any optimizations that use this yet.

I took a look at the impact on aermod.f90, a reasonably floating point heavy Fortran benchmark (4% of the human readable IR consists of floating point operations). At -O3 (the worst), the size of the bitcode increases by 0.8%. No idea if that's acceptable - hopefully it is!

Enjoy!

Duncan.

Attachments:
  fastm-llvm.diff (text/x-patch, 14251 bytes): <http://lists.llvm.org/pipermail/llvm-dev/attachments/20120414/95aa6cb6/attachment.bin>
  fastm-clang.diff (text/x-patch, 2240 bytes): <http://lists.llvm.org/pipermail/llvm-dev/attachments/20120414/95aa6cb6/attachment-0001.bin>
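To make the "2.5 ULPs" bound in the message above concrete, here is a minimal sketch of how the distance between two float results can be counted in units in the last place. The helpers `orderedBits`, `ulpDistance` and `withinUlps` are hypothetical names made up for this illustration and are not part of LLVM. The sketch compares two floats with each other, so it only gives a whole-number approximation of the error against the infinitely precise result, and it assumes IEEE-754 binary32 floats with no NaN inputs.

```cpp
#include <cmath>
#include <cstdint>
#include <cstring>

// Hypothetical helper (not an LLVM API): map a float's bit pattern onto an
// ordered integer line so that adjacent floats differ by exactly 1.
static std::int64_t orderedBits(float F) {
  std::uint32_t Bits;
  std::memcpy(&Bits, &F, sizeof Bits);
  // Negative floats sort in reverse bit order, so mirror them below zero.
  // Both +0.0 and -0.0 map to 0.
  return (Bits & 0x80000000u)
             ? -static_cast<std::int64_t>(Bits & 0x7fffffffu)
             : static_cast<std::int64_t>(Bits);
}

// Number of representable steps between A and B (0 if they are equal).
// Assumes neither input is NaN.
std::int64_t ulpDistance(float A, float B) {
  std::int64_t D = orderedBits(A) - orderedBits(B);
  return D < 0 ? -D : D;
}

// True if Approx lies within MaxUlps steps of Reference.
bool withinUlps(float Reference, float Approx, double MaxUlps) {
  return static_cast<double>(ulpDistance(Reference, Approx)) <= MaxUlps;
}
```

With a bound of 2.5, a result two steps away from the reference passes and a result three steps away does not.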
Hi Duncan,

I'm not sure about this:

+  if (!Accuracy)
+    // If it's not a floating point number then it must be 'fast'.
+    return getFastAccuracy();

Since you allow accuracies bigger than 1 in setFPAccuracy(), integers should be treated as float. Or at least assert.

Also, I'm thinking you should carry the annotation forward on all uses of an annotated result, or make sure the floating point library searches recursively for annotations on any dependency of the value being analysed.

About creating annotations every time, I think this could be a nice idea for a metadata factory functionality. Something that would cache metadata, and in case of repetition, point to the same metadata. This could be used for other optimisations (if I recall correctly, the debug metadata does that already).

The problem with this is that, if an optimisation pass changes one, you must make sure the other can also be changed, or split-on-write, and that can cause some bloated code in the optimiser, which is not ideal.

I think, for now, it's acceptable. But it should be on a request basis (i.e. only present if -fmath options are explicitly specified).

The rest of the patch looks sane, though. I like the idea of using metadata, since the target code can easily ignore it if it doesn't support FP optimisations or IEEE strictness.

cheers,
--renato
Hi Renato,

> I'm not sure about this:
>
> +  if (!Accuracy)
> +    // If it's not a floating point number then it must be 'fast'.
> +    return getFastAccuracy();
>
> Since you allow accuracies bigger than 1 in setFPAccuracy(), integers
> should be treated as float. Or at least assert.

The verifier checks that the accuracy operand is either a floating point number (ConstantFP) or the keyword "fast". If "Accuracy" is zero here then that means it wasn't ConstantFP. Thus it must have been the keyword "fast".

> Also, I'm thinking you should carry the annotation forward on all uses
> of an annotated result, or make sure the floating point library
> searches recursively for annotations on any dependency of the value
> being analysed.

Yes, this is a possible optimization (especially useful if functions from a -ffast-math compiled module are inlined into functions from a non -ffast-math compiled module or vice versa) but it is not needed for correctness. I plan to implement optimizations using the metadata later.

> About creating annotations every time, I think this could be a nice
> idea for a metadata factory functionality. Something that would cache
> metadata, and in case of repetition, point to the same metadata. This
> could be used for other optimisations (if I recall correctly, the
> debug metadata does that already).

Yes, Chandler suggested it already, and I think it is a good idea.

> The problem with this is that, if an optimisation pass changes one,
> you must make sure the other can also be changed, or split-on-write,
> and that can cause some bloated code in the optimiser, which is not
> ideal.

Optimizers don't (or shouldn't) change metadata because metadata is uniqued: if you change it you change it for all users. Instead new metadata has to be created. So I doubt that this is a problem in practice. Also, I think metadata is intrinsically a weak value handle, so if someone changes the metadata underneath the builder then its copy will become null. When it sees that the cached metadata is null then it can create it anew. So I think it should be possible to ensure that this works well.

> I think, for now, it's acceptable. But should be on request basis
> (aka, only present if -fmath options are explicitly specified).
>
> The rest of the patch looks sane, though. I like the idea of using
> metadata, since the target code can easily ignore if it doesn't
> support FP optimisations or IEEE strictness.

This kind of metadata must only relax IEEE strictness (and never tighten it) because *metadata can always be discarded*. Discarding it must never result in wrong IR/transforms, thus metadata can only give additional permissions.

Ciao, Duncan.
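The caching idea discussed above (keep one shared node, and rebuild it if it has gone away underneath the builder) can be sketched in plain C++ with `std::weak_ptr`. This is only a loose analogy under stated assumptions: `Node` and `FastTagCache` are hypothetical stand-ins invented for this example, and LLVM's real uniquing and value-handle machinery works differently.

```cpp
#include <memory>
#include <string>

// Hypothetical stand-in for a uniqued metadata node; handed out as
// pointer-to-const so that users cannot mutate the shared copy.
struct Node {
  std::string Keyword;
};

// Hypothetical builder-side cache. It holds only a weak reference, so the
// node can disappear when its last user drops it.
class FastTagCache {
  std::weak_ptr<const Node> Cached;

public:
  std::shared_ptr<const Node> get() {
    // If the cached node is still alive, every caller shares it.
    if (std::shared_ptr<const Node> Live = Cached.lock())
      return Live;
    // Otherwise the cached copy has "become null": create it anew.
    std::shared_ptr<const Node> Fresh = std::make_shared<Node>(Node{"fast"});
    Cached = Fresh;
    return Fresh;
  }
};
```

Two consecutive calls to `get()` return the same node while any user still holds it; once all users release it, the next call transparently builds a fresh one.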
Hi Duncan,

I'm not an expert in fp accuracy questions, but I have had quite a bit of experience dealing with fp accuracy problems during compiler transformations.

I think you have made a step in the right direction, walking away from ULPs, which are pretty useless for the purpose of describing allowed fp optimizations IMHO. But using just the "fast" keyword (or whatever else will be added in the future) is not enough without a strict definition of this keyword in terms of IR transformations. For example, a particular transformation may be interested in whether reassociation is allowed or not ((a+b)+c => a+(b+c)), whether fp contraction is allowed or not (a*b+c => fma(a,b,c)), whether addition of zero may be canceled (x+0 => x), etc. If this definition is not given at the infrastructure level, this may lead to disaster, when each transformation interprets "fast" in its own way.

Dmitry.

On Sat, Apr 14, 2012 at 10:28 PM, Duncan Sands <baldrick at free.fr> wrote:
> The attached patch is a first attempt at representing "-ffast-math" at the IR
> level, in fact on individual floating point instructions (fadd, fsub etc). It
> is done using metadata.
> [...]
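The first two transformations Dmitry lists really do change results, which is why each needs its own permission. The sketch below shows this for IEEE-754 doubles; the function names are made up for this example, and the `volatile` temporaries are only there to stop the compiler from keeping extra precision or fusing the operations itself. It assumes the code is compiled without any fast-math style flags.

```cpp
#include <cmath>

// (a + b) + c, evaluated strictly left to right.
double sumLeft(double A, double B, double C) {
  volatile double T = A + B;  // force the intermediate to be a rounded double
  return T + C;
}

// a + (b + c): the reassociated form.
double sumRight(double A, double B, double C) {
  volatile double T = B + C;
  return A + T;
}

// a*b + c with the product rounded before the addition (no contraction).
double mulAddUnfused(double A, double B, double C) {
  volatile double P = A * B;
  return P + C;
}

// fma(a, b, c): the exact product is added to c and rounded only once.
double mulAddFused(double A, double B, double C) {
  return std::fma(A, B, C);
}
```

For a = 1e30, b = -1e30, c = 1.0 the left-to-right sum gives 1.0 while the reassociated sum gives 0.0, because 1.0 is absorbed when added to -1e30. For a = 1 + 2^-27, b = 1 - 2^-27, c = -1 the unfused form gives 0.0 (the product 1 - 2^-54 rounds to 1.0) while the fused form gives -2^-54.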
Hi Dmitry,

> I'm not an expert in fp accuracy questions, but I have had quite a bit of
> experience dealing with fp accuracy problems during compiler transformations.

I agree that it's a minefield, which is why I intend to proceed conservatively.

> I think you have made a step in the right direction, walking away from ULPs,
> which are pretty useless for the purpose of describing allowed fp
> optimizations IMHO. But using just the "fast" keyword (or whatever else will
> be added in the future) is not enough without a strict definition of this
> keyword in terms of IR transformations. For example, a particular
> transformation may be interested in whether reassociation is allowed or not
> ((a+b)+c => a+(b+c)), whether fp contraction is allowed or not
> (a*b+c => fma(a,b,c)), whether addition of zero may be canceled (x+0 => x),
> etc. If this definition is not given at the infrastructure level, this may
> lead to disaster, when each transformation interprets "fast" in its own way.

This is actually the main reason for using metadata rather than a flag like the "nsw" flag on integer operations: it is easily extendible with more info to say whether reassociation is OK and so forth. The kinds of transforms I think can reasonably be done with the current information are things like: x + 0.0 -> x; x / constant -> x * (1 / constant) if constant and 1 / constant are normal (and not denormal) numbers.

Ciao, Duncan.
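The two transforms named in the reply above are "nearly exact", and a small experiment shows where the strict and transformed results part ways. This is a sketch for IEEE-754 doubles in the default rounding mode, compiled without fast-math flags; all function names are hypothetical and chosen for this example only.

```cpp
#include <cmath>

// Strict evaluation of "x + 0.0". Folding this to x changes the result only
// for x == -0.0, where the strict sum is +0.0.
double addZero(double X) { return X + 0.0; }

// Strict division.
double divide(double X, double C) { return X / C; }

// Division rewritten as multiplication by a precomputed reciprocal.
double mulReciprocal(double X, double C) {
  const double R = 1.0 / C;  // first rounding
  return X * R;              // second rounding
}

// The guard from the message: both the constant and its reciprocal must be
// normal numbers (not zero, denormal, infinite or NaN).
bool reciprocalIsNormal(double C) {
  return std::isnormal(C) && std::isnormal(1.0 / C);
}
```

For example, 3.0 / 10.0 yields the double nearest to 0.3, while 3.0 * (1.0 / 10.0) yields the next double above it, a difference of one ULP. When the constant is a power of two such as 4.0, the reciprocal is exact and both forms agree.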
Here's a revised patch, plus patches showing how fpmath metadata could be turned on in clang and dragonegg (it seemed safest for the moment to condition on -ffast-math rather than on one of the flags implied by -ffast-math).

Major changes:

- The FPMathOperator class can no longer be used to change math settings, only to read them. Currently it can be queried for accuracy info. I split the accuracy methods into two: one for 'fast' accuracy, one for a numerical accuracy (which returns +infty when the accuracy is 'fast').

- MDBuilder got support for creating fpmath metadata; in particular there is a function that returns the appropriate settings for -ffast-math.

- A default fpmath setting can be supplied to IRBuilder, which will then apply it to all floating point operations. It is also possible to specify specific fpmath metadata when creating an operation.

Ciao, Duncan.

Attachments:
  fastm.diff (text/x-patch, 18788 bytes): <http://lists.llvm.org/pipermail/llvm-dev/attachments/20120416/e7f4b1c8/attachment.bin>
  fastm-clang.diff (text/x-patch, 1497 bytes): <http://lists.llvm.org/pipermail/llvm-dev/attachments/20120416/e7f4b1c8/attachment-0001.bin>
  fastm-dragonegg.diff (text/x-patch, 563 bytes): <http://lists.llvm.org/pipermail/llvm-dev/attachments/20120416/e7f4b1c8/attachment-0002.bin>
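The split into two read-only accuracy queries can be modelled without any LLVM types. In the sketch below, `FPMathTag` is a hypothetical struct standing in for the single accuracy operand, and the two free functions only borrow the spirit of the accessors described above; their names, signatures and the choice of 0 ULPs for an untagged operation are assumptions of this example, not the patch's actual interface.

```cpp
#include <limits>

// Hypothetical model of one !fpmath accuracy operand: either a number of
// ULPs or the keyword "fast".
struct FPMathTag {
  enum Kind { Numeric, Fast } K;
  float Ulps;  // meaningful only when K == Numeric
};

// A null tag models an instruction that carries no !fpmath metadata.
bool isFastFPAccuracy(const FPMathTag *Tag) {
  return Tag && Tag->K == FPMathTag::Fast;
}

// Numerical accuracy in ULPs: 0 for an untagged (strict) operation in this
// model, and +infinity when the accuracy is "fast".
float getFPAccuracy(const FPMathTag *Tag) {
  if (!Tag)
    return 0.0f;
  if (Tag->K == FPMathTag::Fast)
    return std::numeric_limits<float>::infinity();
  return Tag->Ulps;
}
```

Returning +infinity for "fast" means a client that only compares against a numeric threshold still treats "fast" as permitting the transform, without needing to know about the keyword.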
Hi Duncan,

I like the changes to IRBuilder and how the operator can't change it. Looks a lot safer (mistake-wise) and more convenient.

This function won't remove a previously set tag, which could be used by optimisations or inlining.

+  Instruction *AddFPMathTag(Instruction *I, MDNode *FPMathTag) const {
+    if (!FPMathTag)
+      FPMathTag = DefaultFPMathTag;
+    if (FPMathTag)
+      I->setMetadata(LLVMContext::MD_fpmath, FPMathTag);
+    return I;
+  }

If you want to keep it as only Add, then give FPMathTag a default value of 0 so that you can easily add the default by just calling AddFPMathTag(instr);

But I'd add a ClearFPMathTag function for optimisations/inlining. Maybe later.

Also, it would be good to make sure the instruction is, in fact, a floating point operation. Either via restricting the type or asserting on it.

--
cheers,
--renato

http://systemcall.org/
Thanks for the updates! Minor comments:

+  if (!Accuracy)
+    // If it's not a floating point number then it must be 'fast'.
+    return HUGE_VALF;

Can we add an assert instead of a comment? It's just as documenting and will catch any goofs.

+  // If it's not a floating point number then it must be 'fast'.
+  return !isa<ConstantFP>(MD->getOperand(0));

Here as well.

+  if (ConstantFP *CFP0 = dyn_cast_or_null<ConstantFP>(Op0)) {
+    APFloat Accuracy = CFP0->getValueAPF();
+    Assert1(Accuracy.isNormal() && !Accuracy.isNegative(),
+            "fpmath accuracy not a positive number!", &I);

To be pedantic for a moment, zero is not a positive number. What about asserting these individually to give us clearer asserts if they fire? That also makes the string easier to write: "fpmath accuracy is a negative number!".

+  /// SetDefaultFPMathTag - Set the floating point math metadata to be used.
+  void SetDefaultFPMathTag(MDNode *FPMathTag) { DefaultFPMathTag = FPMathTag; }

This should be 'setDefault...' much like 'getDefault...' above.

+  Instruction *AddFPMathTag(Instruction *I, MDNode *FPMathTag) const {

Another bad case, but I think this instruction is gone...

+  MDString *GetFastString() const {
+    return CreateString("fast");
+  }

'getFastString'.

+  /// CreateFastFPMath - Return metadata with appropriate settings for 'fast
+  /// math'.

I would prefer the more modern doxygen style:

  /// \brief Return metadata ...

+  MDNode *CreateFastFPMath() {

Capitalization. The capitalization and doxygen style comments apply to the next function as well.

Both the Clang and DragonEgg patches look good, but both need test cases. =]
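The suggestion to check the accuracy conditions individually, so that each failure has its own message, can be sketched as a stand-alone function. `checkAccuracy` is a hypothetical helper written for this illustration using plain `float` instead of `APFloat`. It accepts exactly the values that are normal and non-negative, like the combined check quoted above; the message strings other than the one proposed in the review are assumptions.

```cpp
#include <cmath>
#include <string>

// Hypothetical stand-alone version of the accuracy check. Returns an empty
// string when the value is acceptable, otherwise a specific diagnostic.
std::string checkAccuracy(float Accuracy) {
  if (std::isnan(Accuracy))
    return "fpmath accuracy is NaN!";
  if (Accuracy == 0.0f)  // matches both +0.0 and -0.0
    return "fpmath accuracy is zero!";
  if (Accuracy < 0.0f)
    return "fpmath accuracy is a negative number!";
  if (!std::isnormal(Accuracy))  // only infinity or a denormal can reach here
    return "fpmath accuracy is not a normal number!";
  return "";
}
```

Ordering the tests from most to least specific keeps every message accurate: zero is reported as zero rather than as "not positive", and only infinities and denormals fall through to the last diagnostic.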
Duncan,

I have some issues with representing this as a single "fast" mode flag, which mostly boil down to the fact that this is a very C-centric view of the world. And, since C compilers are not generally known for their awesomeness on issues of numerics, I'm not sure that's a good idea.

Having something called a "fast" or "relaxed" mode implies that it is less precise than whatever the standard mode is. However, C is notably sparse in specifying what exactly the standard mode is. The typical assumption is that it is the strict one-to-one translation to IEEE754 semantics, but no optimizing C compiler actually implements that.

Other languages are more interesting in this regard. Fortran, for instance, allows reassociation within parentheses. (Can that even be represented with instruction metadata?) OpenCL has a fairly strict baseline mode, but specifies a number of specific options the user can enable to relax it (-cl-mad-enable, -cl-no-signed-zeros, -cl-unsafe-math-optimizations (implies the previous two), -cl-finite-math-only, -cl-fast-relaxed-math (implies all prior)). GLSL has distinct desktop and embedded specifications that place different levels of constraint on implementations.

If we define the baseline behavior to be strict IEEE conformance, and then don't provide a more nuanced method of relaxing it, we're not going to be in a significantly better world than we are today. No reasonable implementation of these languages wants strict conformance (except maybe desktop-profile OpenCL) as their default mode, nor is there any way a universal definition of "fast" math can work for all of them.

--Owen
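The OpenCL options Owen lists form a small implication hierarchy, which is one way to picture "a more nuanced method of relaxing" strict semantics. The sketch below encodes them as permission bits. The enum, its enumerator names and the helper functions are hypothetical and exist only for this example; the implications follow the list in the message above.

```cpp
#include <cstdint>

// Hypothetical permission bits, one per independent relaxation.
enum FPRelaxation : std::uint32_t {
  AllowMad       = 1u << 0,  // -cl-mad-enable
  NoSignedZeros  = 1u << 1,  // -cl-no-signed-zeros
  UnsafeMath     = 1u << 2,  // -cl-unsafe-math-optimizations
  FiniteMathOnly = 1u << 3,  // -cl-finite-math-only
};

// -cl-unsafe-math-optimizations implies the two options listed before it.
std::uint32_t unsafeMathOptimizations() {
  return UnsafeMath | AllowMad | NoSignedZeros;
}

// -cl-fast-relaxed-math implies all of the prior options.
std::uint32_t fastRelaxedMath() {
  return unsafeMathOptimizations() | FiniteMathOnly;
}

// A transform asks for exactly the permission it needs.
bool allows(std::uint32_t Set, FPRelaxation Bit) {
  return (Set & Bit) != 0;
}
```

An empty set corresponds to the strict baseline, so a frontend for another language could pick whichever subset matches its own rules instead of a single all-or-nothing "fast" switch.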
Hi Owen,

> I have some issues with representing this as a single "fast" mode flag,

it isn't a single flag, that's the whole point of using metadata. OK, right now there is only one option (the "accuracy"), true, but the intent is that others will be added, and the meaning of accuracy tightened, later. MDBuilder has a createFastFPMath method which is intended to produce settings that match GCC's -ffast-math, however frontends will be able to specify whatever settings they like if that doesn't suit them (i.e. createFPMath will get more arguments as more settings become available).

Note that as the current option isn't actually connected to any optimizations, there is nothing much to argue about for the moment. My plan is to introduce a few simple optimizations (x + 0.0 -> x for example) that introduce a finite number of ULPs of error, and hook them up. Thus this does not include things like x * 0.0 -> 0.0 (infinite ULPs of error), reassociation (infinite ULPs of error) or any other scary things.

> which mostly boil down to the fact that this is a very C-centric view of the
> world. And, since C compilers are not generally known for their awesomeness
> on issues of numerics, I'm not sure that's a good idea.
>
> Having something called a "fast" or "relaxed" mode implies that it is less
> precise than whatever the standard mode is. However, C is notably sparse in
> specifying what exactly the standard mode is. The typical assumption is that
> it is the strict one-to-one translation to IEEE754 semantics, but no
> optimizing C compiler actually implements that.

I think this is a misunderstanding of where I'm going, see above.

> Other languages are more interesting in this regard. Fortran, for instance,
> allows reassociation within parentheses. (Can that even be represented with
> instruction metadata?)

I'm aware of Fortran parentheses (PAREN_EXPR in gcc). If it can't be expressed well then too bad: reassociation can just be turned off and we won't optimize Fortran as well as we could. (As mentioned above I have no intention of turning on reassociation based on the current flag since it can introduce an unbounded number of ULPs of error).

> OpenCL has a fairly strict baseline mode, but specifies a number of specific
> options the user can enable to relax it (-cl-mad-enable,
> -cl-no-signed-zeros, -cl-unsafe-math-optimizations (implies the previous
> two), -cl-finite-math-only, -cl-fast-relaxed-math (implies all prior)). GLSL
> has distinct desktop and embedded specifications that place different levels
> of constraint on implementations.

Yup.

> If we define the baseline behavior to be strict IEEE conformance,

Which we do.

> and then don't provide a more nuanced method of relaxing it,

Allowing more nuanced ways is the reason for using metadata as explained above.

> we're not going to be in a significantly better world than we are today. No
> reasonable implementation of these languages wants strict conformance
> (except maybe desktop-profile OpenCL) as their default mode,

Strict conformance is what they get right now.

> nor is there any way a universal definition of "fast" math can work for all
> of them.

I agree, and I'm not trying to provide one.

Ciao, Duncan.
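The reason x * 0.0 -> 0.0 is excluded from the planned transforms shows up as soon as x is not an ordinary positive number. The sketch below, for IEEE-754 doubles compiled without fast-math flags, evaluates the strict product and reports whether folding it to +0.0 would have been exact; both function names are hypothetical.

```cpp
#include <cmath>
#include <limits>

// Strict evaluation of "x * 0.0".
double mulZero(double X) { return X * 0.0; }

// True if replacing x * 0.0 by the constant +0.0 gives the strict result.
bool foldToZeroIsExact(double X) {
  const double Strict = mulZero(X);
  // NaN compares unequal to everything, and -0.0 is caught by the sign test.
  return Strict == 0.0 && !std::signbit(Strict);
}
```

The fold is exact for x = 2.0, loses only the sign of zero for x = -2.0, and is unboundedly wrong for x = infinity or NaN, where the strict result is NaN rather than zero.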