thr3ads.net - llvm dev - [llvm-dev] Unsafe floating point operation (FDiv & FRem) in LoopVectorizer [Sep 2018]

Nema, Ashutosh via llvm-dev

2018-Sep-25 07:23 UTC

[llvm-dev] Unsafe floating point operation (FDiv & FRem) in LoopVectorizer

Hi,

Consider the following test case:

void foo(float *A, float *B, float *C, int len, int VSMALL) {
  for (int i = 0; i < len; i++)
    if (C[i] > VSMALL)
      A[i] = B[i] / C[i];
}

In this test the division is conditional, but LLVM generates an
unconditional div for it:

vector.body:                                      ; preds = %vector.body, %vector.ph
  %index = phi i64 [ 0, %vector.ph ], [ %index.next, %vector.body ]
  %0 = getelementptr inbounds float, float* %C, i64 %index
  %1 = bitcast float* %0 to <8 x float>*
  %wide.load = load <8 x float>, <8 x float>* %1, align 4, !tbaa !2, !alias.scope !6
  %2 = fcmp ogt <8 x float> %wide.load, %broadcast.splat30
  %3 = getelementptr inbounds float, float* %B, i64 %index
  %4 = bitcast float* %3 to <8 x float>*
  %wide.masked.load = call <8 x float> @llvm.masked.load.v8f32.p0v8f32(<8 x float>* %4, i32 4, <8 x i1> %2, <8 x float> undef), !tbaa !2, !alias.scope !9
  %5 = fdiv <8 x float> %wide.masked.load, %wide.load
  %6 = getelementptr inbounds float, float* %A, i64 %index
  %7 = bitcast float* %6 to <8 x float>*
  call void @llvm.masked.store.v8f32.p0v8f32(<8 x float> %5, <8 x float>* %7, i32 4, <8 x i1> %2), !tbaa !2, !alias.scope !11, !noalias !13
  %index.next = add i64 %index, 8
  %8 = icmp eq i64 %index.next, %n.vec
  br i1 %8, label %middle.block, label %vector.body, !llvm.loop !14

The generated IR seems unsafe because fdiv is not respecting the compare mask.
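
For readers following the IR, here is a rough scalar model in C of what the vector body computes, next to the original guarded loop. It is only a sketch: the names div_guarded, div_if_converted and VF are made up for illustration, the value 0 stands in for the undef lanes of the masked load, and it assumes len is a multiple of 8 so no remainder loop is needed.

```c
#include <assert.h>

enum { VF = 8 }; /* vector width seen in the IR (hypothetical name) */

/* The original scalar loop: the division only runs when the guard holds. */
void div_guarded(float *A, const float *B, const float *C, int len, int vsmall) {
  for (int i = 0; i < len; i++)
    if (C[i] > vsmall)
      A[i] = B[i] / C[i];
}

/* Scalar model of the vector body: divide in every lane, store masked lanes. */
void div_if_converted(float *A, const float *B, const float *C, int len, int vsmall) {
  assert(len >= 0 && len % VF == 0); /* sketch only: no scalar remainder loop */
  for (int i = 0; i < len; i += VF) {
    int mask[VF];
    float b[VF], q[VF];
    for (int l = 0; l < VF; l++) {
      mask[l] = C[i + l] > vsmall;      /* fcmp ogt */
      b[l] = mask[l] ? B[i + l] : 0.0f; /* masked load; 0 stands in for undef */
    }
    for (int l = 0; l < VF; l++)
      q[l] = b[l] / C[i + l];           /* fdiv runs in every lane */
    for (int l = 0; l < VF; l++)
      if (mask[l])
        A[i + l] = q[l];                /* masked store */
  }
}
```

In this model the inactive lanes still execute a division, possibly by zero, but their quotients are never stored, so both functions leave the same values in A as long as the division itself has no side effects.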

As div is the unsafe operation, LLVM should generate predicated divs.

If I change the data type of A, B & C to the integer type then it generates
the right code, where div is predicated based on the mask, and scalar div gets
generated for each lane.
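
As a rough illustration of the integer case, the following C sketch models a predicated, per-lane scalar division. The names idiv_predicated and LANES are hypothetical, and the sketch assumes len is a multiple of 8; it is not the code the vectorizer emits, only a model of its shape.

```c
#include <assert.h>

enum { LANES = 8 }; /* hypothetical vector width */

/* Model of the integer variant: the compare is done for all lanes, but each
   division is a scalar operation that only runs when its mask bit is set. */
void idiv_predicated(int *A, const int *B, const int *C, int len, int vsmall) {
  assert(len >= 0 && len % LANES == 0); /* sketch only: no remainder loop */
  for (int i = 0; i < len; i += LANES) {
    int mask[LANES];
    for (int l = 0; l < LANES; l++)
      mask[l] = C[i + l] > vsmall;      /* icmp sgt */
    for (int l = 0; l < LANES; l++)
      if (mask[l])                      /* predicate: skip inactive lanes */
        A[i + l] = B[i + l] / C[i + l]; /* scalar sdiv, never sees a masked-off 0 */
  }
}
```

Here the guard has to stay in front of every division, because an integer division by zero in an inactive lane would be undefined behavior.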

This seems to be a problem in the predicated-instruction detection part of LV,
which currently considers only UDiv, SDiv, URem and SRem:

bool LoopVectorizationCostModel::isScalarWithPredication(Instruction *I,
                                                         unsigned VF) {
  if (!Legal->blockNeedsPredication(I->getParent()))
    return false;
  switch (I->getOpcode()) {
  default:
    break;
  // Floating-point operations (FDiv and FRem) are not considered here.
  case Instruction::UDiv:
  case Instruction::SDiv:
  case Instruction::SRem:
  case Instruction::URem:
    return mayDivideByZero(*I);
  }
  return false;
}

I don't have any background on this function, but I feel it should
consider the FDiv and FRem instructions as well.

If there is no objection, I will prepare a patch.

Thanks,
Ashutosh

Robin Kruppe via llvm-dev

2018-Sep-25 09:44 UTC

Hi Ashutosh,


On Tue, 25 Sep 2018 at 09:23, Nema, Ashutosh via llvm-dev
<llvm-dev at lists.llvm.org> wrote:
> The generated IR seems unsafe because fdiv is not respecting the compare mask.
>
> As div is the unsafe operation, LLVM should generate predicated divs.

Can you elaborate on why you think the floating point operations are
"unsafe" and need to be predicated? Integer division by zero and
remainder by zero are undefined behavior, but the corresponding
floating point operations just result in a NaN or infinity in "error"
cases such as division by zero.

You might be thinking about the "floating point exceptions" that these
operations can signal. If so, keep in mind that by default these do
not trap but simply make the operation silently return a default
value such as an infinity, zero, or NaN. The LLVM IR instructions fdiv
and frem (as well as their siblings fadd, fmul, etc.) are assumed to
execute in an environment [1] where this default handling is not
changed and where nobody inspects any flags (e.g., in an FPU status
register) that may be set when exceptions occur. Programs where this
assumption does not hold have to use the constrained fp intrinsics [2],
which do constrain the vectorizer and all other optimization
passes (LV is far from the only pass that will move an fdiv out of a
conditional).


Cheers,
Robin

[1]: https://llvm.org/docs/LangRef.html#floating-point-environment
[2]: https://llvm.org/docs/LangRef.html#constrainedfp
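
The default behavior described in this reply can be observed with a small C sketch. It assumes IEEE 754 float arithmetic and that floating point traps have not been enabled, which is the usual default; the helper names fdiv_runtime and check_default_fp_results are made up for illustration.

```c
#include <assert.h>
#include <math.h>

/* volatile operands keep the compiler from folding the division away,
   so the divide really happens at run time. */
float fdiv_runtime(float num, float den) {
  volatile float n = num;
  volatile float d = den;
  return n / d;
}

/* With default exception handling, none of these divisions trap:
   each one simply returns a default value and execution continues. */
void check_default_fp_results(void) {
  assert(isinf(fdiv_runtime(1.0f, 0.0f)));   /* x / 0 gives an infinity */
  assert(fdiv_runtime(-1.0f, 0.0f) < 0.0f);  /* sign follows the operands */
  assert(isnan(fdiv_runtime(0.0f, 0.0f)));   /* 0 / 0 gives a NaN */
  assert(fdiv_runtime(1.0f, 2.0f) == 0.5f);  /* ordinary case */
}
```

A program that changes this handling, for example by unmasking the divide-by-zero trap, is outside what plain fdiv assumes, which is the situation the constrained intrinsics in [2] are meant for.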

Nema, Ashutosh via llvm-dev

2018-Sep-26 04:47 UTC

Thanks for the detailed explanation, Robin. I was not aware that LLVM
assumes the following for floating point operations:

"The default LLVM floating-point environment assumes that floating-point
instructions do not have side effects. Results assume the round-to-nearest
rounding mode. No floating-point exception state is maintained in this
environment."

The test snippet in my previous mail is from the OpenFOAM application; it
fails at runtime because of the unconditional FDIV.

Thanks,
Ashutosh

