thr3ads.net - llvm dev - [llvm-dev] FW: clarification needed for the constrained fp implementation. [Nov 2017]

Kaylor, Andrew via llvm-dev

2017-Nov-03 20:29 UTC

[llvm-dev] FW: clarification needed for the constrained fp implementation.

Copying the list on a discussion of potentially general interest....

From: Kaylor, Andrew
Sent: Friday, November 03, 2017 1:11 PM
To: 'Ding, Wei' <Wei.Ding2 at amd.com>; Sumner, Brian
<Brian.Sumner at amd.com>; Arsenault, Matthew <Matthew.Arsenault at
amd.com>
Subject: RE: clarification needed for the constrained fp implementation.

Hi Wei,

I've been meaning to write something up for discussion on the LLVM Dev list
about this.  I hope you don't mind if I copy the list now to accomplish that
while also answering your questions.  Eventually I will create a document
describing this in more detail and less formally than the language definition.

Basically, the "constraints" in the constrained FP intrinsics are
constraints on the optimizer.  They are a way of telling the optimizer what it
can and cannot assume about rounding mode and FP exception behavior.  By
default, the optimizer assumes that the rounding mode is round-to-nearest and
that FP exceptions are being ignored.  If the user code is going to do anything
that invalidates these assumptions, then we need a way to make the optimizer
stop assuming that.  That's what the intrinsics do.  Because most passes
don't recognize the intrinsics, they can't do anything with the
operations they represent and therefore can't make any assumptions about
them.

The intrinsics are not intended to do anything to change the rounding mode or FP
exception handling state.  I have an idea in mind for some additional intrinsics
that would provide a way to control the FP environment.  There are already some
target-specific mechanisms for doing that, but I'd like to have something
that's target independent.  I'll say more about this in a minute.

I mentioned in my review comments that my work on this has been motivated by the
STDC pragmas, and I think if I explain that it might make the semantics of the
intrinsics seem a little more natural.  The primary pragma I have in mind here
is the "STDC FENV_ACCESS" pragma.  I believe this is part of the C99
standard, but compiler support for it is still mostly (if not entirely) missing.
For instance, if you try to use this pragma with clang you will get a message
telling you that the pragma isn't supported and it will have no other
effect.  We want to change that.

Basically, what the "STDC FENV_ACCESS" pragma does is provide
programmers with a way to tell the compiler that the program might change the FP
environment.  This pragma represents a setting that has only two states -- on
and off.  The default setting of this state is documented as being
implementation defined.  In clang the default state will be off.  The C99
standard states that accessing the FP environment (testing FP status flags,
changing FP control modes, etc.) when FENV_ACCESS is off is undefined behavior. 
The C99 standard provides library calls to access the environment (fesetround,
fegetround, fetestexcept, etc.) but you can only safely use these if you have
set FENV_ACCESS to the "on" state.  A typical usage might look like
this:

#include <fenv.h>
#include <stdbool.h>

double someFunc(double A, double B, bool ForceRoundUp) {
  #pragma STDC FENV_ACCESS ON
  double Result;
  if (ForceRoundUp) {
    int OldRM = fegetround();
    fesetround(FE_UPWARD);
    Result = A/B;
    fesetround(OldRM);
  } else {
    Result = A/B;
  }
  return Result;
}

So you see here that there are explicit calls to change the rounding mode.  If
you were to do this in clang today, the generated IR would look like this:

define double @someFunc(double, double, i1) {
  br i1 %2, label %4, label %8

; <label>:4:                                      ; preds = %3
  %5 = tail call i32 @fegetround()
  %6 = tail call i32 @fesetround(i32 2048)
  %7 = tail call i32 @fesetround(i32 %5)
  br label %8

; <label>:8:                                      ; preds = %3, %4
  %9 = fdiv double %0, %1
  ret double %9
}

Notice that the fdiv got sunk outside of the calls to change the rounding mode. 
Once we support the FENV_ACCESS pragma, the generated IR will look like this
instead:

define double @someFunc(double, double, i1) {
  br i1 %2, label %4, label %9

; <label>:4:                                      ; preds = %3
  %5 = tail call i32 @fegetround()
  %6 = tail call i32 @fesetround(i32 2048)
  %7 = call double @llvm.experimental.constrained.fdiv.f64(double %0, double %1,
           metadata !"round.dynamic", metadata !"fpexcept.strict")
  %8 = tail call i32 @fesetround(i32 %5)
  br label %11

; <label>:9:                                      ; preds = %3
  %10 = call double @llvm.experimental.constrained.fdiv.f64(double %0, double %1,
            metadata !"round.dynamic", metadata !"fpexcept.strict")
  br label %11

; <label>:11:                                     ; preds = %4, %9
  %12 = phi double [ %7, %4 ], [ %10, %9 ]
  ret double %12
}

Note that I've left the rounding mode as "round.dynamic" here.  In
theory we could implement an optimization that recognizes the calls to
fesetround and changes that argument to "round.upward", but initially
it will be "round.dynamic" which indicates that the program is allowed
to change the rounding mode at runtime and that's what I expect will always
come out of the front end.
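
The optimization mentioned above would need some mapping from the constant
passed to fesetround to a rounding-mode metadata string.  The following C
sketch illustrates that mapping only; it is not LLVM code (LLVM is written in
C++), and both function names are hypothetical.  The strings are the
rounding-mode arguments documented for the constrained intrinsics.

```c
#include <assert.h>
#include <fenv.h>
#include <string.h>

/* Map a <fenv.h> rounding-mode constant to the corresponding rounding-mode
   metadata string.  Any value that is not recognized falls back to
   "round.dynamic", which makes no promise about the mode.  Assumes the
   platform defines all four FE_* rounding macros. */
const char *rounding_metadata_for(int fe_mode) {
  switch (fe_mode) {
  case FE_TONEAREST:  return "round.tonearest";
  case FE_UPWARD:     return "round.upward";
  case FE_DOWNWARD:   return "round.downward";
  case FE_TOWARDZERO: return "round.towardzero";
  default:            return "round.dynamic";
  }
}

/* Nonzero if the metadata string promises one specific rounding mode. */
int promises_static_mode(const char *md) {
  assert(md != NULL);
  return strcmp(md, "round.dynamic") != 0;
}
```

Such a rewrite would only be valid where the compiler can prove that no other
code changes the rounding mode between the fesetround call and the operation.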

The key point here is that the pragma just gives the programmer permission to
change things.  There are some other pragmas in draft standards that allow the
programmer to specify what the rounding mode will be for a given scope, but it
is my understanding that they do not actually change the rounding mode either. 
They just tell the compiler what the programmer is promising the rounding mode
will be at that point.

All of this is substantially background information with regard to LLVM since
the LLVM optimizer is independent of any front end.  The key point I'm
trying to convey is that the constrained intrinsics are meant to behave in a way
that is analogous to these pragmas.

What I think we need now is a set of language-neutral and target-independent
intrinsics that allow us to control the FP rounding mode.  For instance, suppose
you are implementing some function and you want to be able to control the
rounding mode for the entire scope of the function.  You typically will need to
do that similarly to what I showed in my example above -- get the current
rounding mode, set the rounding mode, perform some operations, and restore the
rounding mode.  I don't think you would want the constrained intrinsics that
we have to be responsible for controlling the rounding mode because that would
mean that the backend would need to generate instructions to get-set-restore the
rounding mode around every operation in your function (unless the target
supports explicit rounding mode operands).

What I'm thinking is that we need something like this:

void llvm.set.roundingmode(i32 mode)
i32 llvm.get.roundingmode()

These would then get translated during instruction selection to target-specific
instructions, which is equivalent to what fesetround() and fegetround() do. But
I think it would also be useful to have something like this:

void llvm.begin.local.roundingmode(i32 mode)
void llvm.end.local.roundingmode()

This could encapsulate the get-set-restore idiom.  My thinking then is that if a
target does support explicit rounding mode operands, these intrinsics
wouldn't need to result in any instructions that change the processor's
rounding mode and could instead just be used to determine what the rounding mode
operand should be where applicable.

There are still some issues that need to be worked out here (such as the fact
that an i32 here is far too general), but that's basically what I'm
thinking.

Does that make sense?

-Andy

From: Ding, Wei [mailto:Wei.Ding2 at amd.com]
Sent: Friday, November 03, 2017 12:04 PM
To: Kaylor, Andrew <andrew.kaylor at intel.com>; Sumner, Brian
<Brian.Sumner at amd.com>; Arsenault, Matthew <Matthew.Arsenault at amd.com>
Subject: clarification needed for the constrained fp implementation.

Hi Andy,

Thanks a lot for your comments on https://reviews.llvm.org/D38634. I don't
think I am 100% clear about the way you are trying to implement constrained FP.
If possible, could you please elaborate? In particular, I don't quite follow
comments like "I'm approaching this from the perspective of the STDC pragmas
related to the FP environment."

Thank you so much!

Best regards,

Wei

陳韋任 via llvm-dev

2017-Nov-03 22:26 UTC

2017-11-04 4:29 GMT+08:00 Kaylor, Andrew via llvm-dev <
llvm-dev at lists.llvm.org>:

... 
abridge
 ...

>
>
> What I’m thinking is that we need something like this:
>
>
>
> void llvm.set.roundingmode(i32 mode)
>
> i32 llvm.get.roundingmode()
>
>
>
> These would then get translated during instruction selection to
> target-specific instructions, which is equivalent to what fesetround() and
> fegetround() do. But I think it would also be useful to have something like
> this:
>
>
>
> void llvm.begin.local.roundingmode(i32 mode)
>
> void llvm.end.local.roundingmode()
>
Just out of curiosity: what if the user doesn't restore OldRM, like this?

double someFunc(double A, double B, bool ForceRoundUp) {
  #pragma STDC FENV_ACCESS ON
  double Result;
  if (ForceRoundUp) {
    int OldRM = fegetround();
    fesetround(FE_UPWARD);
    Result = A/B;
    // fesetround(OldRM);
  } else {
    Result = A/B;
  }
  return Result;
}


Are we still going to generate
llvm.begin.local.roundingmode/llvm.end.local.roundingmode?

Regards,
chenwj

-- 
Wei-Ren Chen (陳韋任)
Homepage: https://people.cs.nctu.edu.tw/~chenwj

Hal Finkel via llvm-dev

2017-Nov-04 00:45 UTC

On 11/03/2017 05:26 PM, 陳韋任 via llvm-dev wrote:
>
> 2017-11-04 4:29 GMT+08:00 Kaylor, Andrew via llvm-dev
> <llvm-dev at lists.llvm.org>:
>
>
> ... 
> abridge
>  ...
>
>
>     What I’m thinking is that we need something like this:
>
>     void llvm.set.roundingmode(i32 mode)
>
>     i32 llvm.get.roundingmode()
>
>     These would then get translated during instruction selection to
>     target-specific instructions, which is equivalent to what
>     fesetround() and fegetround() do. But I think it would also be
>     useful to have something like this:
>
>     void llvm.begin.local.roundingmode(i32 mode)
>
>     void llvm.end.local.roundingmode()
>
>
> Just out of curiosity: what if the user doesn't restore OldRM, like this?
>
>     double someFunc(double A, double B, bool ForceRoundUp) {
>       #pragma STDC FENV_ACCESS ON
>       double Result;
>       if (ForceRoundUp) {
>         int OldRM = fegetround();
>         fesetround(FE_UPWARD);
>         Result = A/B;
>         // fesetround(OldRM);
>       } else {
>         Result = A/B;
>       }
>       return Result;
>     }
>
>
> Are we still going to generate 
> llvm.begin.local.roundingmode/llvm.end.local.roundingmode?

I think that, however we do this, we'll need a way of dealing with "live in"
and "live out" FP-environment state. Moreover, any external call can change
these as well (unless we prove/know that it doesn't). Translating these calls
into intrinsics so that the optimizer can reason about them seems like a
reasonable plan.

  -Hal
-- 
Hal Finkel
Lead, Compiler Technology and Programming Languages
Leadership Computing Facility
Argonne National Laboratory
