Hi Dmitry,

> The kinds of transforms I think can reasonably be done with the current
> information are things like: x + 0.0 -> x; x / constant -> x * (1 / constant)
> if constant and 1 / constant are normal (and not denormal) numbers.
>
> The particular definition is not as important as the fact that this
> definition exists :) I.e. I think we need a set of transformations to be
> defined (most likely as an enum, as Renato pointed out) and an interface
> which accepts the "fp-model" (which is "fast", "strict" or whatever keyword
> we may end up with) and the particular transformation, and returns true or
> false depending on whether the definition of the fp-model allows this
> transformation or not. So the transformation would request, for example,
> whether reassociation is allowed or not.

at some point each optimization will have to decide if it is going to be
applied or not, so that's not really the point.  It seems to me that there
are many many possible optimizations, and putting them all as flags in the
metadata is out of the question.  What seems reasonable to me is dividing
transforms up into a few major (and orthogonal) classes and putting flags
for them in the metadata.

> Another point, important from a practical point of view, is that the
> fp-model is almost always the same for all instructions in a function (or
> even module), and tagging every instruction with fp-model metadata is quite
> a substantial waste of resources.

I measured the resource waste and it seems fairly small.

> So it makes sense to me to have a default fp-model defined for the function
> or module, which can be overridden with instruction metadata.

That's possible (I already discussed this with Chandler), but in my opinion
it is only worth doing if we see unreasonable increases in bitcode size in
real code.

> I also understand that clang generally derives GCC switches, and fp
> precision switches are not an exception, but I'd like to point out that
> there's a far more orderly way of defining the fp precision model (IMHO, of
> course :-) ), adopted by MS and the Intel Compiler
> (-fp-model [strict|precise|fast]).  It would be nice to have it adopted in
> clang.
>
> But while adding MS-style fp-model switches is a different topic (and I
> guess quite an arguable one), I'm mentioning it to show the importance of
> the idea of abstracting the internal compiler fp-model from external
> switches

The info in the metadata is essentially a bunch of external switches which
will then be used to determine which transforms are run.

> and exposing a querying interface to transformations.  Transformations
> shouldn't care about the particular model, they need to know only whether a
> particular type of transformation is allowed.

Do you have a concrete suggestion for what should be in the metadata?

Ciao, Duncan.

> Dmitry.
>
> Ciao, Duncan.
>
> Dmitry.
>
> On Sat, Apr 14, 2012 at 10:28 PM, Duncan Sands <baldrick at free.fr> wrote:
>
>     The attached patch is a first attempt at representing "-ffast-math"
>     at the IR level, in fact on individual floating point instructions
>     (fadd, fsub etc).  It is done using metadata.  We already have a
>     "fpmath" metadata type which can be used to signal that reduced
>     precision is OK for a floating point operation, eg
>
>       %z = fmul float %x, %y, !fpmath !0
>       ...
>       !0 = metadata !{double 2.5}
>
>     indicates that the multiplication can be done in any way that doesn't
>     introduce more than 2.5 ULPs of error.
>
>     The first observation is that !fpmath can be extended with additional
>     operands in the future: operands that say things like whether it is OK
>     to assume that there are no NaNs and so forth.
>
>     This patch doesn't add additional operands though.  It just allows the
>     existing accuracy operand to be the special keyword "fast" instead of a
>     number:
>
>       %z = fmul float %x, %y, !fpmath !0
>       ...
>       !0 = metadata !{metadata !"fast"}
>
>     This indicates that accuracy loss is acceptable (just how much is
>     unspecified) for the sake of speed.  Thanks to Chandler for pushing me
>     to do it this way!
>
>     It also creates a simple way of getting and setting this information:
>     the FPMathOperator class: you can cast appropriate instructions to this
>     class and then use the querying/mutating methods to get/set the
>     accuracy, whether 2.5 or "fast".  The attached clang patch uses this to
>     set the OpenCL 2.5 ULPs accuracy rather than doing it by hand for
>     example.
>
>     In addition it changes IRBuilder so that you can provide an accuracy
>     when creating floating point operations.  I don't like this so much.
>     It would be more efficient to just create the metadata once and then
>     splat it onto each instruction.  Also, if fpmath gets a bunch more
>     options/operands in the future then this interface will become more and
>     more awkward.  Opinions welcome!
>
>     I didn't actually implement any optimizations that use this yet.
>
>     I took a look at the impact on aermod.f90, a reasonably floating point
>     heavy Fortran benchmark (4% of the human readable IR consists of
>     floating point operations).  At -O3 (the worst), the size of the
>     bitcode increases by 0.8%.  No idea if that's acceptable - hopefully it
>     is!
>
>     Enjoy!
>
>     Duncan.
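As a rough illustration of how an optimization might consume the accuracy
information described above, here is a small C++ sketch built on the
FPMathOperator class mentioned in the patch.  The header paths and the exact
treatment of the "fast" keyword are assumptions; only the general shape (cast
the instruction, query its ULP budget) follows the description in the thread.

```cpp
// Sketch only: checks whether an instruction's !fpmath metadata permits a
// rewrite that may lose up to NeededULPs of accuracy.  Header paths follow
// current LLVM; the 2012 patch under discussion may have differed.
#include "llvm/IR/Instruction.h"
#include "llvm/IR/LLVMContext.h"
#include "llvm/IR/Operator.h"

using namespace llvm;

static bool accuracyLossAllowed(const Instruction *I, float NeededULPs) {
  // Only floating point arithmetic can carry !fpmath.
  const auto *FPOp = dyn_cast<FPMathOperator>(I);
  if (!FPOp || !I->getMetadata(LLVMContext::MD_fpmath))
    return false;                       // no metadata: assume strict IEEE
  // getFPAccuracy() returns the ULP budget recorded in the metadata; how a
  // "fast" keyword would be reported here is an assumption.
  return FPOp->getFPAccuracy() >= NeededULPs;
}
```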
On Sun, Apr 15, 2012 at 1:02 AM, Duncan Sands <baldrick at free.fr> wrote:

> Hi Dmitry,
>
>> The kinds of transforms I think can reasonably be done with the current
>> information are things like: x + 0.0 -> x; x / constant -> x * (1 /
>> constant) if constant and 1 / constant are normal (and not denormal)
>> numbers.
>>
>> The particular definition is not as important as the fact that this
>> definition exists :) I.e. I think we need a set of transformations to be
>> defined (most likely as an enum, as Renato pointed out) and an interface
>> which accepts the "fp-model" (which is "fast", "strict" or whatever
>> keyword we may end up with) and the particular transformation, and returns
>> true or false depending on whether the definition of the fp-model allows
>> this transformation or not. So the transformation would request, for
>> example, whether reassociation is allowed or not.
>
> at some point each optimization will have to decide if it is going to be
> applied or not, so that's not really the point.  It seems to me that there
> are many many possible optimizations, and putting them all as flags in the
> metadata is out of the question.  What seems reasonable to me is dividing
> transforms up into a few major (and orthogonal) classes and putting flags
> for them in the metadata.

The decision whether to apply an optimization should be based on a strict
definition of what is allowed, not on each optimization's own interpretation
of the "fast" fp-model (for example). Say, after widely adopting the "fast"
fp-model in the compiler, you suddenly realize that the definition is wrong
and allowing some type of transformation is a bad idea (for any reason -
being incompatible with some compiler, not taking into account some corner
cases, or whatever other reason); then you'll have to go and fix one million
places where the decision is made. Alternatively, by defining classes of
transformation and making optimizations query for particular types of
transformation, you keep it under control.

>> Another point, important from a practical point of view, is that the
>> fp-model is almost always the same for all instructions in a function (or
>> even module), and tagging every instruction with fp-model metadata is
>> quite a substantial waste of resources.
>
> I measured the resource waste and it seems fairly small.
>
>> So it makes sense to me to have a default fp-model defined for the
>> function or module, which can be overridden with instruction metadata.
>
> That's possible (I already discussed this with Chandler), but in my opinion
> it is only worth doing if we see unreasonable increases in bitcode size in
> real code.

What is reasonable or not is defined not only by absolute numbers (0.8% or
any other number). Does it make sense to increase bitcode size by 1% if it's
used only by math library writers and a couple of other people who reeeeally
care about precision *and* performance at the same time and are knowledgeable
enough to restrict precision on particular instructions only? In my
experience it's an extremely rare case when people want more than compiler
flags to control fp accuracy and are ready to deal with pragmas (when they
are available).

>> I also understand that clang generally derives GCC switches, and fp
>> precision switches are not an exception, but I'd like to point out that
>> there's a far more orderly way of defining the fp precision model (IMHO,
>> of course :-) ), adopted by MS and the Intel Compiler
>> (-fp-model [strict|precise|fast]).  It would be nice to have it adopted in
>> clang.
>>
>> But while adding MS-style fp-model switches is a different topic (and I
>> guess quite an arguable one), I'm mentioning it to show the importance of
>> the idea of abstracting the internal compiler fp-model from external
>> switches
>
> The info in the metadata is essentially a bunch of external switches which
> will then be used to determine which transforms are run.
>
>> and exposing a querying interface to transformations.  Transformations
>> shouldn't care about the particular model, they need to know only whether
>> a particular type of transformation is allowed.
>
> Do you have a concrete suggestion for what should be in the metadata?

I would define the set of transformations, such as (I can help with a more
complete list if you prefer):

   - reassociation
   - x + 0.0 => x
   - x * 0.0 => 0.0
   - x * 1.0 => x
   - a / b => a * (1 / b)
   - a * b + c => fma(a, b, c)
   - ignoring NaNs in compares, i.e. (a < b) => !(a >= b)
   - value-unsafe transformations (for aggressive fp optimizations, like
     a*b + a*c => a*(b + c)) and others of the kind,

and several aliases for the "strict", "precise" and "fast" models (which are
effectively combinations of the flags above). So that metadata would be able
to say "fast", "fast, but no fma allowed", "strict, but fma allowed", i.e.
metadata should be a base level plus an optional set of adjustments from the
list above.

And, again, I think this should be a function-level model, unless specified
otherwise on the instruction, as that will be the case in 99.9999% of
compilations.

> Ciao, Duncan.

Dmitry.
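To make the shape of Dmitry's suggestion concrete, here is a hypothetical C++
sketch of such a query interface.  None of these names exist in LLVM, and the
mapping from models to allowed transforms (for example what "precise"
permits) is purely illustrative.

```cpp
// Hypothetical sketch of the proposed interface: an enum of transformation
// classes plus a predicate mapping an fp-model to a yes/no answer.
// Nothing here is an existing LLVM API.
enum class FPTransform {
  Reassociation,    // (a + b) + c => a + (b + c)
  AddZero,          // x + 0.0 => x
  MulZero,          // x * 0.0 => 0.0
  MulOne,           // x * 1.0 => x
  DivToReciprocal,  // a / b => a * (1 / b)
  FuseMulAdd,       // a * b + c => fma(a, b, c)
  NoNaNCompare,     // (a < b) => !(a >= b)
  ValueUnsafe       // e.g. a*b + a*c => a*(b + c)
};

enum class FPModel { Strict, Precise, Fast };

// A pass asks whether its transform is allowed instead of interpreting the
// model keyword itself; the per-model choices below are illustrative only.
inline bool isAllowed(FPModel Model, FPTransform T) {
  switch (Model) {
  case FPModel::Strict:
    return false;                         // nothing beyond IEEE semantics
  case FPModel::Precise:
    return T == FPTransform::FuseMulAdd;  // example: allow contraction only
  case FPModel::Fast:
    return true;                          // everything on the list
  }
  return false;
}
```

An optimization would then call, say, isAllowed(Model,
FPTransform::Reassociation) rather than testing the model keyword directly,
which is the decoupling Dmitry is arguing for: if the definition of a model
later changes, only this one predicate needs updating.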
I feel like this discussion is getting a bit off track...

On Sun, Apr 15, 2012 at 12:00 AM, Dmitry Babokin <babokin at gmail.com> wrote:

> I would define the set of transformations, such as (I can help with a more
> complete list if you prefer):
>
>    - reassociation
>    - x + 0.0 => x
>    - x * 0.0 => 0.0
>    - x * 1.0 => x
>    - a / b => a * (1 / b)
>    - a * b + c => fma(a, b, c)
>    - ignoring NaNs in compares, i.e. (a < b) => !(a >= b)
>    - value-unsafe transformations (for aggressive fp optimizations, like
>      a*b + a*c => a*(b + c)) and others of the kind,
>
> and several aliases for the "strict", "precise" and "fast" models (which
> are effectively combinations of the flags above).
>
> So that metadata would be able to say "fast", "fast, but no fma allowed",
> "strict, but fma allowed", i.e. metadata should be a base level plus an
> optional set of adjustments from the list above.

I would love to see such detailed models if we have real use cases and people
interested in implementing them.

However, today we have a feature in moderately widespread use, '-ffast-math'.
Its semantics may not be the ideal way to enable restricted, predictable
optimizations of floating point operations, but they are effective for a wide
range of programs today.  I think having a generic flag value which
specifically attempts to model the *loose* semantics of '-ffast-math' is
really important, and I think any more detailed framework for classifying and
enabling specific optimizations should be layered on afterward.  While I
share your frustration with the very vague and hard to reason about semantics
of '-ffast-math', I think we can provide a clear enough spec to make it
implementable, and we should give ourselves the freedom to implement all the
optimizations within that spec which existing applications rely on for
performance.

> And, again, I think this should be a function-level model, unless specified
> otherwise on the instruction, as that will be the case in 99.9999% of
> compilations.

I actually lobbied with Duncan to use a function default, with
instruction-level overrides, but after his measurements of the metadata
overhead of just doing it on each instruction, I think his approach is
simpler.  As he argued to me, *eventually* this has to end up on the
instruction in order to model inlining correctly -- a function compiled with
'-ffast-math' might be inlined into a function compiled without it, and vice
versa.  Since you need this ability, it makes sense to simplify the inliner,
the metadata schema, etc. and just always place the data on the instructions
*unless* there is some significant scaling problem.  I think Duncan has
demonstrated it scales pretty well.
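Chandler's inlining argument is easy to picture in IR.  The sketch below is
hypothetical (function names and values invented) and follows the !fpmath
syntax from Duncan's original mail: after a callee built with -ffast-math is
inlined into a caller built without it, only the instructions that originated
in the callee carry the metadata.

```llvm
; Hypothetical result of inlining a fast-math callee into a strict caller.
define float @caller(float %x, float %y) {
entry:
  ; came from the strict caller: no !fpmath metadata
  %sum = fadd float %x, %y
  ; came from the inlined -ffast-math callee: keeps its !fpmath tag
  %scaled = fmul float %sum, 5.000000e-01, !fpmath !0
  %res = fadd float %scaled, %y, !fpmath !0
  ret float %res
}

!0 = metadata !{metadata !"fast"}
```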
Hi Dmitry,

>> That's possible (I already discussed this with Chandler), but in my
>> opinion it is only worth doing if we see unreasonable increases in bitcode
>> size in real code.
>
> What is reasonable or not is defined not only by absolute numbers (0.8% or
> any other number). Does it make sense to increase bitcode size by 1% if
> it's used only by math library writers and a couple of other people who
> reeeeally care about precision *and* performance at the same time and are
> knowledgeable enough to restrict precision on particular instructions only?
> In my experience it's an extremely rare case when people want more than
> compiler flags to control fp accuracy and are ready to deal with pragmas
> (when they are available).

there is no increase in bitcode size if you don't use this feature.  If more
options are added it will hardly increase the bitcode size: there will be one
metadatum with lots of options (!0 = metadata !{ this, that, other }), and
instructions just have a reference to it.  So the size increase isn't like
(number of options) * (number of instructions), it is
(number of options) + (number of instructions).

> And, again, I think this should be a function-level model, unless specified
> otherwise on the instruction, as that will be the case in 99.9999% of
> compilations.

Link-time optimization will sometimes result in "fast-math" functions being
inlined into non-fast-math functions and vice versa.  This pretty much
inevitably means that per-instruction fpmath options are required.  That
said, to save space, if every fp instruction in a function has the same
fpmath metadata then the metadata could be attached to the function instead.
But since (in my opinion) the size increase is mild, I don't think it is
worth the added complexity.

Ciao, Duncan.
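A small sketch (hypothetical values, using the syntax from earlier in the
thread) of why the growth is additive: every flagged instruction refers to
the same single metadata node, so adding more options only grows that one
node, not every instruction.

```llvm
; One shared node; each extra option enlarges !0 once, while each
; instruction only pays for a single reference to it.
%a = fadd float %x, %y, !fpmath !0
%b = fmul float %a, %z, !fpmath !0
%c = fsub float %b, %x, !fpmath !0

!0 = metadata !{metadata !"fast"}
```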
On Sun, 15 Apr 2012 02:00:37 +0400 Dmitry Babokin <babokin at gmail.com> wrote:

> On Sun, Apr 15, 2012 at 1:02 AM, Duncan Sands <baldrick at free.fr> wrote:
>
> [...]
>
>> Do you have a concrete suggestion for what should be in the metadata?
>
> I would define the set of transformations, such as (I can help with a more
> complete list if you prefer):
>
>  - reassociation
>  - x + 0.0 => x
>  - x * 0.0 => 0.0
>  - x * 1.0 => x
>  - a / b => a * (1 / b)
>  - a * b + c => fma(a, b, c)
>  - ignoring NaNs in compares, i.e. (a < b) => !(a >= b)
>  - value-unsafe transformations (for aggressive fp optimizations, like
>    a*b + a*c => a*(b + c)) and others of the kind,
>
> and several aliases for the "strict", "precise" and "fast" models (which
> are effectively combinations of the flags above).

From a user's perspective, I think that it is important to have categories
defining:

 - finite math (as precise as normal, but might do odd things for NaNs or
   Infty, etc.) - I'd suppose this is the strictest "fast" option.

 - algebraic-equivalence - the compiler might do anything that is
   algebraically the same (even if the numerics could be quite different) -
   this is probably the loosest "fast" option.

 -Hal

> So that metadata would be able to say "fast", "fast, but no fma allowed",
> "strict, but fma allowed", i.e. metadata should be a base level plus an
> optional set of adjustments from the list above.
>
> And, again, I think this should be a function-level model, unless specified
> otherwise on the instruction, as that will be the case in 99.9999% of
> compilations.
>
>> Ciao, Duncan.
>
> Dmitry.

-- 
Hal Finkel
Postdoctoral Appointee
Leadership Computing Facility
Argonne National Laboratory