thr3ads.net - llvm dev - [LLVMdev] Representing -ffast-math at the IR level [Apr 2012]


Duncan Sands

2012-Apr-14 19:44 UTC

[LLVMdev] Representing -ffast-math at the IR level

Hi Dmitry,
> I'm not an expert in fp accuracy question, but I had quite a few experience
> dealing with fp accuracy problems during compiler transformations.

I agree that it's a minefield which is why I intend to proceed conservatively.

> I think you have a step in the right direction, walking away from ULPs, which
> are pretty useless for the purpose of describing allowed fp optimizations IMHO.
> But using just "fast" keyword (or whatever else will be added in the future) is
> not enough without strict definition of this keyword in terms of IR
> transformations. For example, particular transformation may be interested if
> reassociation is allowed or not ((a+b)+c => a+(b+c)), if fp contraction is
> allowed or not (ab+c => fma(a,b,c)), if addition of zero may be canceled
> (x+0 => x) and etc. If this definition is not given on infrastructure level, this
> may lead to disaster, when each transformation interprets "fast" in its own way.

This is actually the main reason for using metadata rather than a flag like the
"nsw" flag on integer operations: it is easily extendible with more info to say
whether reassociation is OK and so forth.
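
To illustrate why reassociation in particular needs its own permission, here is
a minimal C++ sketch. It is not part of the patch, and the function names are
made up for this example. It sums the same three values with the two groupings
from the (a+b)+c => a+(b+c) example quoted above.

```cpp
// Hypothetical helpers, not from the patch: the same three values summed
// with the two groupings discussed in the thread.

// Grouping as written in the source: (a + b) + c.
double sumLeftToRight(double a, double b, double c) { return (a + b) + c; }

// Grouping after reassociation: a + (b + c).
double sumReassociated(double a, double b, double c) { return a + (b + c); }
```

With IEEE doubles, sumLeftToRight(1e20, -1e20, 1.0) gives 1.0, while
sumReassociated(1e20, -1e20, 1.0) gives 0.0, because the 1.0 is absorbed when it
is added to -1e20 first. This assumes the code is compiled without fast-math
options.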

The kinds of transforms I think can reasonably be done with the current
information are things like: x + 0.0 -> x; x / constant -> x * (1 / constant) if
constant and 1 / constant are normal (and not denormal) numbers.
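
The two transforms above can be sketched in plain C++ as follows. This is only
an illustration under stated assumptions, not code from the patch: the helper
names and the "fast" boolean parameter are hypothetical stand-ins for the
metadata check.

```cpp
#include <cmath>

// Hypothetical helper: true when both c and 1/c are normal numbers
// (neither zero, subnormal, infinite, nor NaN).
bool reciprocalIsNormal(double c) {
  return std::isnormal(c) && std::isnormal(1.0 / c);
}

// Hypothetical helper: rewrites x / c as x * (1 / c) only when "fast" is set
// and the reciprocal check passes; otherwise performs the exact division.
double divideByConstant(double x, double c, bool fast) {
  if (fast && reciprocalIsNormal(c))
    return x * (1.0 / c);
  return x / c;
}

// Hypothetical helper: reports whether x + 0.0 has the same sign bit as x.
// It does not for x == -0.0, which is why x + 0.0 -> x needs permission.
bool addZeroKeepsSign(double x) {
  return std::signbit(x + 0.0) == std::signbit(x);
}
```

Even when the reciprocal is normal, x * (1 / c) may differ from x / c in the
last bit unless 1 / c is exactly representable, so the rewrite is only suitable
when some accuracy loss has been declared acceptable.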

Ciao, Duncan.
>
> Dmitry.
>
> On Sat, Apr 14, 2012 at 10:28 PM, Duncan Sands <baldrick at free.fr> wrote:
>
>     The attached patch is a first attempt at representing "-ffast-math" at the IR
>     level, in fact on individual floating point instructions (fadd, fsub etc).  It
>     is done using metadata.  We already have a "fpmath" metadata type which can be
>     used to signal that reduced precision is OK for a floating point operation, eg
>
>         %z = fmul float %x, %y, !fpmath !0
>         ...
>         !0 = metadata !{double 2.5}
>
>     indicates that the multiplication can be done in any way that doesn't introduce
>     more than 2.5 ULPs of error.
>
>     The first observation is that !fpmath can be extended with additional operands
>     in the future: operands that say things like whether it is OK to assume that
>     there are no NaNs and so forth.
>
>     This patch doesn't add additional operands though.  It just allows the existing
>     accuracy operand to be the special keyword "fast" instead of a number:
>
>         %z = fmul float %x, %y, !fpmath !0
>         ...
>         !0 = metadata !{!metadata "fast"}
>
>     This indicates that accuracy loss is acceptable (just how much is unspecified)
>     for the sake of speed.  Thanks to Chandler for pushing me to do it this way!
>
>     It also creates a simple way of getting and setting this information: the
>     FPMathOperator class: you can cast appropriate instructions to this class
>     and then use the querying/mutating methods to get/set the accuracy, whether
>     2.5 or "fast".  The attached clang patch uses this to set the openCL 2.5 ULPs
>     accuracy rather than doing it by hand for example.
>
>     In addition it changes IRBuilder so that you can provide an accuracy when
>     creating floating point operations.  I don't like this so much.  It would
>     be more efficient to just create the metadata once and then splat it onto
>     each instruction.  Also, if fpmath gets a bunch more options/operands in
>     the future then this interface will become more and more awkward.  Opinions
>     welcome!
>
>     I didn't actually implement any optimizations that use this yet.
>
>     I took a look at the impact on aermod.f90, a reasonably floating point heavy
>     Fortran benchmark (4% of the human readable IR consists of floating point
>     operations).  At -O3 (the worst), the size of the bitcode increases by 0.8%.
>     No idea if that's acceptable - hopefully it is!
>
>     Enjoy!
>
>     Duncan.
>
>     _______________________________________________
>     LLVM Developers mailing list
>     LLVMdev at cs.uiuc.edu         http://llvm.cs.uiuc.edu
>     http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
>
>

Dmitry Babokin

2012-Apr-14 20:34 UTC

[LLVMdev] Representing -ffast-math at the IR level

On Sat, Apr 14, 2012 at 11:44 PM, Duncan Sands <baldrick at free.fr> wrote:

>> I think you have a step in the right direction, walking away from ULPs, which
>> are pretty useless for the purpose of describing allowed fp optimizations IMHO.
>> But using just "fast" keyword (or whatever else will be added in the future) is
>> not enough without strict definition of this keyword in terms of IR
>> transformations. For example, particular transformation may be interested if
>> reassociation is allowed or not ((a+b)+c => a+(b+c)), if fp contraction is
>> allowed or not (ab+c => fma(a,b,c)), if addition of zero may be canceled
>> (x+0 => x) and etc. If this definition is not given on infrastructure level, this
>> may lead to disaster, when each transformation interprets "fast" in its own way.
>
> This is actually the main reason for using metadata rather than a flag like the
> "nsw" flag on integer operations: it is easily extendible with more info to say
> whether reassociation is OK and so forth.
>
> The kinds of transforms I think can reasonably be done with the current
> information are things like: x + 0.0 -> x; x / constant -> x * (1 / constant) if
> constant and 1 / constant are normal (and not denormal) numbers.

The particular definition is not as important as the fact that a definition
exists :) I.e. I think we need a set of transformations to be defined (most
likely as an enum, as Renato pointed out) and an interface which accepts an
"fp-model" (which is "fast", "strict" or whatever keywords we may end up with)
and a particular transformation, and returns true or false depending on whether
the definition of that fp-model allows the transformation. So a transformation
would ask, for example, whether reassociation is allowed or not.
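
A minimal C++ sketch of such a querying interface might look like the
following. Every name here (FPModel, FPTransform, isTransformAllowed) is
hypothetical, and the table of what each model permits is an arbitrary
placeholder for illustration, not a proposed definition.

```cpp
// Hypothetical names throughout; nothing here comes from the patch or LLVM.
enum class FPModel { Strict, Fast };
enum class FPTransform { Reassociate, Contract, FoldAddZero };

// Returns true if the given fp-model permits the given transformation.
// Placeholder policy: "strict" forbids everything listed, "fast" allows it all.
bool isTransformAllowed(FPModel model, FPTransform transform) {
  switch (model) {
  case FPModel::Strict:
    return false;
  case FPModel::Fast:
    switch (transform) {
    case FPTransform::Reassociate:
    case FPTransform::Contract:
    case FPTransform::FoldAddZero:
      return true;
    }
    return false;
  }
  return false;
}
```

A transformation would then call, say, isTransformAllowed(model,
FPTransform::Reassociate) and never inspect the model itself.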

Another point, important from a practical point of view, is that the fp-model is
almost always the same for all instructions in a function (or even a module),
and tagging every instruction with fp-model metadata is quite a substantial
waste of resources. So it makes sense to me to have a default fp-model defined
for the function or module, which can be overridden with instruction metadata.
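
The default-plus-override lookup described above could be sketched like this in
C++. The types SketchFunction and SketchInstruction are hypothetical stand-ins
invented for this example and are unrelated to LLVM's own Function and
Instruction classes.

```cpp
// Hypothetical types for illustration only; not LLVM classes.
enum class FPModelTag { Strict, Fast };

struct SketchFunction {
  FPModelTag defaultModel;  // fp-model that applies to the whole function
};

struct SketchInstruction {
  const SketchFunction *parent;  // enclosing function
  bool hasOverride;              // true if this instruction carries its own model
  FPModelTag overrideModel;      // only meaningful when hasOverride is true
};

// The instruction's own model wins; otherwise fall back to the function default.
FPModelTag effectiveModel(const SketchInstruction &inst) {
  return inst.hasOverride ? inst.overrideModel : inst.parent->defaultModel;
}
```

Under this scheme only the instructions that deviate from the function-wide
model need to carry anything extra.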

I also understand that clang generally inherits GCC's switches and that the fp
precision switches are no exception, but I'd like to point out that there's a
far more orderly way of defining the fp precision model (IMHO, of course :-) ),
adopted by the MS and Intel compilers (-fp-model [strict|precise|fast]). It
would be nice to have it adopted in clang.

But while adding MS-style fp-model switches is a different topic (and I guess
quite an arguable one), I'm mentioning it to show the importance of abstracting
the compiler's internal fp-model from external switches and exposing a querying
interface to transformations. Transformations shouldn't care about the
particular model; they only need to know whether a particular type of
transformation is allowed.

Dmitry.


Duncan Sands

2012-Apr-14 21:02 UTC

[LLVMdev] Representing -ffast-math at the IR level

Hi Dmitry,
>     The kinds of transforms I think can reasonably be done with the current
>     information are things like: x + 0.0 -> x; x / constant -> x * (1 / constant) if
>     constant and 1 / constant are normal (and not denormal) numbers.
>
> The particular definition is not that important, as the fact that this
> definition exists :) I.e. I think we need a set of transformations to be defined
> (as enum the most likely, as Renato pointed out) and an interface, which accepts
> "fp-model" (which is "fast", "strict" or whatever keyword we may end up) and the
> particular transformation and returns true or false, depending whether the
> definition of fp-model allows this transformation or not. So the transformation
> would request, for example, if reassociation is allowed or not.

At some point each optimization will have to decide if it is going to be applied
or not, so that's not really the point.  It seems to me that there are many, many
possible optimizations, and putting them all as flags in the metadata is out of
the question.  What seems reasonable to me is dividing transforms up into a few
major (and orthogonal) classes and putting flags for them in the metadata.
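
One way to picture "a few major (and orthogonal) classes" is as independent
bits in a small flag set, sketched below in C++. The thread does not settle on
a list of classes, so the specific flags and every name here are assumptions
made for illustration.

```cpp
#include <cstdint>

// Hypothetical flag classes; the thread does not fix this list.
enum FPFlag : std::uint8_t {
  FPNone = 0,
  FPAllowReassoc = 1u << 0,   // (a+b)+c may become a+(b+c)
  FPAllowContract = 1u << 1,  // a*b+c may become fma(a,b,c)
  FPNoNaNs = 1u << 2,         // may assume no operand or result is NaN
  FPNoSignedZeros = 1u << 3,  // may ignore the sign of zero
};

// True if the bit for class f is set in flags.
bool hasFlag(std::uint8_t flags, FPFlag f) { return (flags & f) != 0; }

// x + 0.0 -> x changes the result only for x == -0.0, so in this sketch it
// is tied to the signed-zeros class rather than to its own flag.
bool mayFoldAddZero(std::uint8_t flags) {
  return hasFlag(flags, FPNoSignedZeros);
}
```

Each individual optimization then maps onto one or more of these classes
instead of needing a flag of its own.
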
> Another point, important from practical point of view, is that fp-model is
> almost always the same for any instructions in the function (or even module) and
> tagging every instruction with fp-model metadata is quite a substantial waste of
> resources.

I measured the resource waste and it seems fairly small.

> So it makes sense to me to have a default fp-model defined for the
> function or module, which can be overwritten with instruction metadata.

That's possible (I already discussed this with Chandler), but in my opinion is
only worth doing if we see unreasonable increases in bitcode size in real code.

> I also understand that clang generally derives GCC switches and fp precision
> switches are not an exception, but I'd like to point out that there's a far more
> orderly way of defining fp precision model (IMHO, of course :-) ), adopted by MS
> and Intel Compiler (-fp-model [strict|precise|fast]). It would be nice to have
> it adopted in clang.
>
> But while adding MS-style fp-model switches is different topic (and I guess
> quite arguable one), I'm mentioning it to show the importance of an idea of
> abstracting internal compiler fp-model from external switches

The info in the meta-data is essentially a bunch of external switches which
will then be used to determine which transforms are run.

> and exposing a querying interface to transformations. Transformations shouldn't
> care about particular model, they need to know only if particular type of
> transformation is allowed.

Do you have a concrete suggestion for what should be in the metadata?

Ciao, Duncan.
