thr3ads.net - llvm dev - [LLVMdev] [RFC] Extend LLVM IR to express "fast-math" at a per-instruction level [Nov 2012]

Evan Cheng

2012-Oct-30 22:11 UTC

[LLVMdev] [RFC] Extend LLVM IR to express "fast-math" at a per-instruction level

On Oct 30, 2012, at 1:46 AM, Duncan Sands <baldrick at free.fr> wrote:
> Hi Michael,
> 
>> Flags
>> ---
>> no NaNs (N)
>>   - ignore the existence of NaNs when convenient
>> no Infs (I)
>>   - ignore the existence of Infs when convenient
>> no signed zeros (S)
>>   - ignore the existence of negative zero when convenient
> 
> while the above flags make perfect sense for me, the other two seem more
> dubious:
> 
>> allow fusion (F)
>>   - fuse FP operations when convenient, despite possible differences in
>>     rounding (e.g. form FMAs)
>> unsafe algebra (A)
>>   - allow for algebraically equivalent transformations that may
>>     dramatically change results in floating point. (e.g. reassociation)
> 
> They don't seem to be capturing a clear concept, they seem more like a
> grab-bag of "everything else" (A) or "here's a random thing that
> is important today so let's have a flag for it" (F).
> 
> ...
> 
>> Why not use metadata rather than flags?
>> 
>> There is existing metadata to denote precisions, and this proposal is
>> orthogonal to those efforts. These flags are analogous to nsw/nuw, and are
>> inherent properties of the IR instructions themselves that all
>> transformations should respect.
> 
> If you drop any of these flags then things are still conservatively
> correct, just like with metadata.  In my opinion this could be implemented
> as metadata.  (I'm not saying it should be represented as metadata, I'm
> saying it could be).
> 
> Disadvantages of metadata:
> 
> - Bloats the IR (however my measurements suggest this is by < 2% for
>   math-heavy code)
> - More painful to work with (though helper classes can mitigate this)
> - Less efficient to modify (but will flags be cleared that often?)
> 
> Disadvantages of using subclass data bits:
> 
> - Can only represent flags.  Thus you might end up with a mix of flags and
>   metadata for floating point math, with the metadata holding the non-flag
>   info, and subclass data holding the flags.  In which case it might be
>   better to just have it all be metadata in the first place
> - Only a limited number of bits (but hey)
> 
> Hopefully Chris will weigh in with his opinion.
FYI. We've already had extensive discussion with Chris on this. He has made
it clear this *must* be implemented with subclass data bits, not with metadata.

Evan
> 
> Ciao, Duncan.
> _______________________________________________
> LLVM Developers mailing list
> LLVMdev at cs.uiuc.edu         http://llvm.cs.uiuc.edu
> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
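To make the nsw/nuw analogy concrete, here is a minimal Python sketch (not code from the RFC) of the five proposed flags packed as bits on an instruction, with two hypothetical peephole predicates that consult them. The names `FastMathFlags`, `may_fold_fadd_zero` and `may_fold_fmul_zero` and the bit positions are assumptions for illustration; LLVM's actual encoding may differ. The asserts at the bottom check the IEEE-754 corner cases that make each of N, I and S necessary.

```python
import math
from enum import IntFlag


class FastMathFlags(IntFlag):
    """Hypothetical bit layout for the per-instruction flags in the RFC.

    Like nsw/nuw, each flag is a single bit stored on the instruction itself.
    """
    NONE = 0
    NO_NANS = 1 << 0          # N
    NO_INFS = 1 << 1          # I
    NO_SIGNED_ZEROS = 1 << 2  # S
    ALLOW_FUSION = 1 << 3     # F
    UNSAFE_ALGEBRA = 1 << 4   # A


def may_fold_fadd_zero(flags: FastMathFlags) -> bool:
    """May 'x + 0.0' be rewritten to 'x'?

    Only with S: if x is -0.0, then x + 0.0 is +0.0, not x.
    """
    return bool(flags & FastMathFlags.NO_SIGNED_ZEROS)


def may_fold_fmul_zero(flags: FastMathFlags) -> bool:
    """May 'x * 0.0' be rewritten to '0.0'?

    Needs N (NaN * 0.0 is NaN), I (inf * 0.0 is NaN) and S (-1.0 * 0.0 is -0.0).
    """
    needed = (FastMathFlags.NO_NANS | FastMathFlags.NO_INFS
              | FastMathFlags.NO_SIGNED_ZEROS)
    return (flags & needed) == needed


# The IEEE-754 facts that make each flag necessary:
assert math.copysign(1.0, -0.0 + 0.0) == 1.0    # -0.0 + 0.0 is +0.0
assert math.isnan(float("nan") * 0.0)           # NaN * 0.0 is NaN
assert math.isnan(float("inf") * 0.0)           # inf * 0.0 is NaN
assert math.copysign(1.0, -1.0 * 0.0) == -1.0   # -1.0 * 0.0 is -0.0
```

With no flags set (`FastMathFlags.NONE`) both predicates return `False`, which matches Duncan's observation that dropping the flags leaves things conservatively correct: the only cost is a missed optimization.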

Chris Lattner

2012-Oct-31 05:50 UTC

[LLVMdev] [RFC] Extend LLVM IR to express "fast-math" at a per-instruction level

On Oct 30, 2012, at 3:11 PM, Evan Cheng <evan.cheng at apple.com> wrote:
>> Disadvantages of using subclass data bits:
>> 
>> - Can only represent flags.  Thus you might end up with a mix of flags
>>   and metadata for floating point math, with the metadata holding the
>>   non-flag info, and subclass data holding the flags.  In which case it
>>   might be better to just have it all be metadata in the first place
>> - Only a limited number of bits (but hey)
>> 
>> Hopefully Chris will weigh in with his opinion.
> 
> FYI. We've already had extensive discussion with Chris on this. He has
> made it clear this *must* be implemented with subclass data bits, not with
> metadata.

More specifically, I reviewed the proposal and I agree with its general
design: I think it makes sense to use subclass data for these bits even though
fpprecision doesn't.  It follows the analogy of NSW/NUW bits which have
worked well.  I also think it makes a lot of sense to separate out the
"relaxing FP math" part of the FP problem from orthogonal issues like
modeling rounding modes, trapping operations (SNANs), etc.

That said, I agree that the individual proposed bits (e.g. "A") could
use some refinement.  I think it is really important to accurately model the
concepts that GCC exposes, but it may make sense to decompose them into
finer-grained concepts than what GCC exposes.  Also, inferability is an
important aspect of this: we already have stuff in LLVM that tries to figure out
things like "this can never be negative zero".  I'd like it if we
can separate the inference of this property from the clients of it.

At a (ridiculous) limit, we could take everything in "A" and see what
optimizations we want to permit, and add a separate bit for every
suboptimization that it would enable.  Hopefully from that list we can find
natural clusters that would make sense to group together.

-Chris
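Chris's point about separating the inference of a property from its clients can be sketched as follows, assuming a toy expression representation (`Arg`, `Const`, `SIToFP`, `FAdd`) and hypothetical function names; this is not LLVM's API. The inference function only answers "can this value be -0.0?"; the client that wants to rewrite `x + 0.0` to `x` accepts either the per-instruction S bit or a proof from the inference. The `FAdd` rule assumes the default round-to-nearest mode, where a sum is -0.0 only when both operands are -0.0.

```python
import math
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class Arg:
    """An opaque function argument: nothing is known about its value."""
    name: str


@dataclass(frozen=True)
class Const:
    value: float


@dataclass(frozen=True)
class SIToFP:
    """Signed int -> float conversion; integer 0 converts to +0.0."""
    source: str


@dataclass(frozen=True)
class FAdd:
    lhs: "Expr"
    rhs: "Expr"
    no_signed_zeros: bool = False  # hypothetical per-instruction 'S' bit


Expr = Union[Arg, Const, SIToFP, FAdd]


def is_negative_zero(x: float) -> bool:
    return x == 0.0 and math.copysign(1.0, x) < 0


def cannot_be_negative_zero(e: Expr) -> bool:
    """Inference only: proves a fact about a value, never rewrites anything."""
    if isinstance(e, Const):
        return not is_negative_zero(e.value)
    if isinstance(e, SIToFP):
        return True
    if isinstance(e, FAdd):
        # Round-to-nearest: a + b is -0.0 only if both a and b are -0.0.
        return cannot_be_negative_zero(e.lhs) or cannot_be_negative_zero(e.rhs)
    return False  # Arg: unknown, so make no claim


def may_drop_added_zero(add: FAdd) -> bool:
    """Client: may 'x + (+0.0)' be rewritten to 'x'?

    Uses the per-instruction flag if set, otherwise asks the inference.
    """
    rhs = add.rhs
    if not (isinstance(rhs, Const) and rhs.value == 0.0
            and not is_negative_zero(rhs.value)):
        return False  # only the 'x + (+0.0)' pattern is handled here
    return add.no_signed_zeros or cannot_be_negative_zero(add.lhs)
```

Keeping the two apart means a new client, or a new inference rule, can be added without touching the other side.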

Michael Ilseman

2012-Nov-02 16:58 UTC

[LLVMdev] [RFC] Extend LLVM IR to express "fast-math" at a per-instruction level

On Oct 30, 2012, at 10:50 PM, Chris Lattner <clattner at apple.com> wrote:
> On Oct 30, 2012, at 3:11 PM, Evan Cheng <evan.cheng at apple.com> wrote:
>>> Disadvantages of using subclass data bits:
>>> 
>>> - Can only represent flags.  Thus you might end up with a mix of flags
>>>   and metadata for floating point math, with the metadata holding the
>>>   non-flag info, and subclass data holding the flags.  In which case it
>>>   might be better to just have it all be metadata in the first place
>>> - Only a limited number of bits (but hey)
>>> 
>>> Hopefully Chris will weigh in with his opinion.
>> 
>> FYI. We've already had extensive discussion with Chris on this. He has
>> made it clear this *must* be implemented with subclass data bits, not with
>> metadata.
> 
> More specifically, I reviewed the proposal and I agree with its general
> design: I think it makes sense to use subclass data for these bits even
> though fpprecision doesn't.  It follows the analogy of NSW/NUW bits which
> have worked well.  I also think it makes a lot of sense to separate out the
> "relaxing FP math" part of the FP problem from orthogonal issues like
> modeling rounding modes, trapping operations (SNANs), etc.
> 
> That said, I agree that the individual proposed bits (e.g. "A") could use
> some refinement.  I think it is really important to accurately model the
> concepts that GCC exposes, but it may make sense to decompose them into
> finer-grained concepts than what GCC exposes.  Also, inferability is an
> important aspect of this: we already have stuff in LLVM that tries to figure
> out things like "this can never be negative zero".  I'd like it if we can
> separate the inference of this property from the clients of it.
> 
> At a (ridiculous) limit, we could take everything in "A" and see what
> optimizations we want to permit, and add a separate bit for every
> suboptimization that it would enable.  Hopefully from that list we can find
> natural clusters that would make sense to group together.
> 
I should separate out Reciprocal from the rest of "A", as I believe
that's pretty separable and safer than allowing the other transforms.

One very desired transform from "A" is to allow the
reassociation/canonicalization of floating point operations similarly to how the
reassociation pass operates over integer operations. I'll think about
whether there are other transforms that would be sufficiently distinct from this
one remaining in "A" that would make sense to separate out.
> -Chris
Thanks for the feedback!
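The transforms discussed in the thread (fusion, reassociation, and the separable reciprocal transform) all change results because they change where rounding happens. The Python sketch below checks this on IEEE-754 doubles; the helper names are hypothetical, and the FMA is emulated with exact rational arithmetic from `fractions.Fraction` rather than a hardware instruction.

```python
from fractions import Fraction


def unfused_mul_add(a: float, b: float, c: float) -> float:
    """a*b + c with two roundings (after the multiply, then after the add)."""
    return a * b + c


def fused_mul_add(a: float, b: float, c: float) -> float:
    """Emulates an FMA: a*b + c computed exactly, then rounded once."""
    return float(Fraction(a) * Fraction(b) + Fraction(c))


def reassociation_changes_result(a: float, b: float, c: float) -> bool:
    """Does rewriting (a + b) + c as a + (b + c) change the value?"""
    return (a + b) + c != a + (b + c)


def reciprocal_changes_result(x: float, y: float) -> bool:
    """Does rewriting x / y as x * (1.0 / y) change the value?"""
    return x / y != x * (1.0 / y)


# Fusion (F): 0.1 * 10.0 rounds to exactly 1.0, but the exact product does not.
print(unfused_mul_add(0.1, 10.0, -1.0))         # 0.0
print(fused_mul_add(0.1, 10.0, -1.0))           # 5.551115123125783e-17 (2**-54)

# Reassociation (A):
print(reassociation_changes_result(0.1, 0.2, 0.3))  # True

# Reciprocal:
print(reciprocal_changes_result(49.0, 49.0))    # True: 49 * (1/49) != 1
print(reciprocal_changes_result(1.0, 4.0))      # False: 1/4 is exact
```

The last call shows why reciprocal is a comparatively tame transform: when 1/y is exact (y a power of two), x / y and x * (1/y) round the same real value, so that special case is value-preserving and needs no flag.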