Evan Cheng
2012-Oct-30 22:11 UTC
[LLVMdev] [RFC] Extend LLVM IR to express "fast-math" at a per-instruction level
On Oct 30, 2012, at 1:46 AM, Duncan Sands <baldrick at free.fr> wrote:

> Hi Michael,
>
>> Flags
>> ---
>> no NaNs (N)
>>   - ignore the existence of NaNs when convenient
>> no Infs (I)
>>   - ignore the existence of Infs when convenient
>> no signed zeros (S)
>>   - ignore the existence of negative zero when convenient
>
> while the above flags make perfect sense for me, the other two seem more
> dubious:
>
>> allow fusion (F)
>>   - fuse FP operations when convenient, despite possible differences in rounding
>>     (e.g. form FMAs)
>> unsafe algebra (A)
>>   - allow for algebraically equivalent transformations that may dramatically
>>     change results in floating point. (e.g. reassociation)
>
> They don't seem to be capturing a clear concept, they seem more like a grab-bag
> of "everything else" (A) or "here's a random thing that is important today so
> let's have a flag for it" (F).
>
> ...
>
>> Why not use metadata rather than flags?
>>
>> There is existing metadata to denote precisions, and this proposal is orthogonal
>> to those efforts. These flags are analogous to nsw/nuw, and are inherent
>> properties of the IR instructions themselves that all transformations should
>> respect.
>
> If you drop any of these flags then things are still conservatively correct,
> just like with metadata. In my opinion this could be implemented as metadata.
> (I'm not saying it should be represented as metadata, I'm saying it could be).
>
> Disadvantages of metadata:
>
> - Bloats the IR (however my measurements suggest this is by < 2% for math heavy
>   code)
> - More painful to work with (though helper classes can mitigate this)
> - Less efficient to modify (but will flags be cleared that often)?
>
> Disadvantages of using subclass data bits:
>
> - Can only represent flags. Thus you might end up with a mix of flags and
>   metadata for floating point math, with the metadata holding the non-flag
>   info, and subclass data holding the flags. In which case it might be better
>   to just have it all be metadata in the first place
> - Only a limited number of bits (but hey)
>
> Hopefully Chris will weigh in with his opinion.

FYI. We've already had extensive discussion with Chris on this. He has made it clear this *must* be implemented with subclass data bits, not with metadata.

Evan

> Ciao, Duncan.
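Duncan's observation that dropping any of these flags leaves things conservatively correct can be illustrated with a small bit-set model. This is only a sketch in Python, not LLVM's actual representation: the `FastMathFlags` class, its member names, and the `drop_flag` and `merge_flags` helpers are hypothetical names chosen for illustration, with one bit per flag from the proposal. Taking the intersection when merging is likewise an assumption about what a conservative transformation would do.

```python
from enum import IntFlag


class FastMathFlags(IntFlag):
    """Hypothetical one-bit-per-flag model of the proposed flags."""
    NONE = 0
    NO_NANS = 1          # N
    NO_INFS = 2          # I
    NO_SIGNED_ZEROS = 4  # S
    ALLOW_FUSION = 8     # F
    UNSAFE_ALGEBRA = 16  # A


def drop_flag(flags, flag):
    """Clear one flag; the result permits fewer relaxations, never more."""
    return flags & ~flag


def merge_flags(a, b):
    """Flags for an instruction derived from two others: keep only the
    relaxations that both inputs allowed (bitwise intersection)."""
    return a & b


# Two instructions carrying different (illustrative) flag sets.
fadd = FastMathFlags.NO_NANS | FastMathFlags.NO_INFS
fmul = FastMathFlags.NO_INFS | FastMathFlags.NO_SIGNED_ZEROS

merged = merge_flags(fadd, fmul)
dropped = drop_flag(fadd, FastMathFlags.NO_NANS)

print(int(merged), int(dropped))
```

Because each flag only grants extra freedom to the optimizer, clearing a bit can cost performance but cannot make a transformation invalid, which is the same property nsw/nuw have.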
Chris Lattner
2012-Oct-31 05:50 UTC
[LLVMdev] [RFC] Extend LLVM IR to express "fast-math" at a per-instruction level
On Oct 30, 2012, at 3:11 PM, Evan Cheng <evan.cheng at apple.com> wrote:

>> Disadvantages of using subclass data bits:
>>
>> - Can only represent flags. Thus you might end up with a mix of flags and
>>   metadata for floating point math, with the metadata holding the non-flag
>>   info, and subclass data holding the flags. In which case it might be better
>>   to just have it all be metadata in the first place
>> - Only a limited number of bits (but hey)
>>
>> Hopefully Chris will weigh in with his opinion.
>
> FYI. We've already had extensive discussion with Chris on this. He has made it clear this *must* be implemented with subclass data bits, not with metadata.

More specifically, I reviewed the proposal and I agree with its general design: I think it makes sense to use subclass data for these bits even though fpprecision doesn't. It follows the analogy of NSW/NUW bits which have worked well. I also think it makes a lot of sense to separate out the "relaxing FP math" part of the FP problem from orthogonal issues like modeling rounding modes, trapping operations (SNANs), etc.

That said, I agree that the individual proposed bits (e.g. "A") could use some refinement. I think it is really important to accurately model the concepts that GCC exposes, but it may make sense to decompose them into finer-grained concepts than what GCC exposes. Also, infer-ability is an important aspect of this: we already have stuff in LLVM that tries to figure out things like "this can never be negative zero". I'd like it if we can separate the inference of this property from the clients of it.

At a (ridiculous) limit, we could take everything in "A" and see what optimizations we want to permit, and add a separate bit for every suboptimization that it would enable. Hopefully from that list we can find natural clusters that would make sense to group together.

-Chris
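A fact such as "this can never be negative zero" matters because some familiar algebraic folds are only valid when it holds. The following Python sketch uses ordinary IEEE 754 doubles to show one fold per proposed N, I, and S flag that gives a wrong answer without the corresponding assumption. The variable names are illustrative only and this is not LLVM code.

```python
import math

# S: folding `x + 0.0` to `x` is wrong when x is -0.0,
# because -0.0 + 0.0 is +0.0 under round-to-nearest.
neg_zero = -0.0
sum_sign = math.copysign(1.0, neg_zero + 0.0)  # sign of the real result
fold_sign = math.copysign(1.0, neg_zero)       # sign if folded to x

# N: folding `x == x` to true is wrong when x is NaN.
nan = float("nan")
nan_equals_itself = (nan == nan)

# I: folding `x - x` to 0.0 is wrong when x is infinite,
# because inf - inf is NaN.
inf = float("inf")
inf_minus_inf = inf - inf

print(sum_sign, fold_sign)
print(nan_equals_itself)
print(math.isnan(inf_minus_inf))
```

Whether the fact comes from a flag on the instruction or from an analysis that proves it, the fold needs the same guarantee, which is one reason to keep the inference separate from its clients.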
Michael Ilseman
2012-Nov-02 16:58 UTC
[LLVMdev] [RFC] Extend LLVM IR to express "fast-math" at a per-instruction level
On Oct 30, 2012, at 10:50 PM, Chris Lattner <clattner at apple.com> wrote:

> On Oct 30, 2012, at 3:11 PM, Evan Cheng <evan.cheng at apple.com> wrote:
>>> Disadvantages of using subclass data bits:
>>>
>>> - Can only represent flags. Thus you might end up with a mix of flags and
>>>   metadata for floating point math, with the metadata holding the non-flag
>>>   info, and subclass data holding the flags. In which case it might be better
>>>   to just have it all be metadata in the first place
>>> - Only a limited number of bits (but hey)
>>>
>>> Hopefully Chris will weigh in with his opinion.
>>
>> FYI. We've already had extensive discussion with Chris on this. He has made it clear this *must* be implemented with subclass data bits, not with metadata.
>
> More specifically, I reviewed the proposal and I agree with its general design: I think it makes sense to use subclass data for these bits even though fpprecision doesn't. It follows the analogy of NSW/NUW bits which have worked well. I also think it makes a lot of sense to separate out the "relaxing FP math" part of the FP problem from orthogonal issues like modeling rounding modes, trapping operations (SNANs), etc.
>
> That said, I agree that the individual proposed bits (e.g. "A") could use some refinement. I think it is really important to accurately model the concepts that GCC exposes, but it may make sense to decompose them into finer-grained concepts than what GCC exposes. Also, infer-ability is an important aspect of this: we already have stuff in LLVM that tries to figure out things like "this can never be negative zero". I'd like it if we can separate the inference of this property from the clients of it.
>
> At a (ridiculous) limit, we could take everything in "A" and see what optimizations we want to permit, and add a separate bit for every suboptimization that it would enable. Hopefully from that list we can find natural clusters that would make sense to group together.
>

I should separate out Reciprocal from the rest of "A", as I believe that's pretty separable and safer than allowing the other transforms. One very desired transform from "A" is to allow the reassociation/canonicalization of floating point operations similarly to how the reassociation pass operates over integer operations. I'll think about whether there are other transforms that would be sufficiently distinct from this one remaining in "A" that would make sense to separate out.

> -Chris

Thanks for the feedback!
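The difference between the two transforms discussed here, reassociation and replacing a division with multiplication by a reciprocal, can be seen with plain IEEE 754 doubles. This is a minimal Python sketch with values picked for illustration; it does not model any particular LLVM pass.

```python
# Reassociation: (a + b) + c versus a + (b + c).
a, b, c = 2.0**60, -(2.0**60), 1.0
left = (a + b) + c   # a and b cancel exactly, then c is added
right = a + (b + c)  # c is lost when rounded into b, then a cancels b

# Reciprocal: x / y versus x * (1.0 / y).
x, y = 49.0, 49.0
quotient = x / y
via_reciprocal = x * (1.0 / y)  # 1.0 / y is rounded before the multiply

print(left, right)
print(quotient, via_reciprocal)
```

In this example the reassociated sum loses the entire contribution of `c`, while the reciprocal form differs from the true quotient only in the last bit. That is at least consistent with treating Reciprocal as a separate, milder relaxation than the rest of "A".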