Nema, Ashutosh via llvm-dev
2018-Sep-25 07:23 UTC
[llvm-dev] Unsafe floating point operation (FDiv & FRem) in LoopVectorizer
Hi,

Consider the following test case:

int foo(float *A, float *B, float *C, int len, int VSMALL) {
  for (int i = 0; i < len; i++)
    if (C[i] > VSMALL)
      A[i] = B[i] / C[i];
}

In this test the div operation is conditional, but LLVM generates an unconditional div for this case:

vector.body: ; preds = %vector.body, %vector.ph
  %index = phi i64 [ 0, %vector.ph ], [ %index.next, %vector.body ]
  %0 = getelementptr inbounds float, float* %C, i64 %index
  %1 = bitcast float* %0 to <8 x float>*
  %wide.load = load <8 x float>, <8 x float>* %1, align 4, !tbaa !2, !alias.scope !6
  %2 = fcmp ogt <8 x float> %wide.load, %broadcast.splat30
  %3 = getelementptr inbounds float, float* %B, i64 %index
  %4 = bitcast float* %3 to <8 x float>*
  %wide.masked.load = call <8 x float> @llvm.masked.load.v8f32.p0v8f32(<8 x float>* %4, i32 4, <8 x i1> %2, <8 x float> undef), !tbaa !2, !alias.scope !9
  %5 = fdiv <8 x float> %wide.masked.load, %wide.load
  %6 = getelementptr inbounds float, float* %A, i64 %index
  %7 = bitcast float* %6 to <8 x float>*
  call void @llvm.masked.store.v8f32.p0v8f32(<8 x float> %5, <8 x float>* %7, i32 4, <8 x i1> %2), !tbaa !2, !alias.scope !11, !noalias !13
  %index.next = add i64 %index, 8
  %8 = icmp eq i64 %index.next, %n.vec
  br i1 %8, label %middle.block, label %vector.body, !llvm.loop !14

The generated IR seems unsafe because the fdiv does not respect the compare mask.

As div is an unsafe operation, LLVM should generate predicated divs.

If I change the data type of A, B & C to an integer type, it generates the right code: the div is predicated on the mask, and a scalar div is generated for each lane.

This seems like a problem in the predicated-instruction detection part of LV, which currently considers only UDiv, SDiv, URem and SRem.

bool LoopVectorizationCostModel::isScalarWithPredication(Instruction *I, unsigned VF) {
  if (!Legal->blockNeedsPredication(I->getParent()))
    return false;
  switch(I->getOpcode()) {
  default:
    break;
  case Instruction::UDiv: // <- Floating point operations not considered, i.e. FDiv & FRem
  case Instruction::SDiv:
  case Instruction::SRem:
  case Instruction::URem:
    return mayDivideByZero(*I);
  }

I don't have any background on this function, but I feel it should consider the FDiv & FRem instructions as well.

If there is no objection, I will prepare a patch.

Thanks,
Ashutosh
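
To make the reported transformation concrete, here is a minimal scalar sketch in C++ of the two shapes of the loop: the source form, where the division is guarded, and a model of the vectorized body, where every element is divided and only the store is masked. The function names are made up for illustration. The sketch assumes IEEE 754 float arithmetic and the default floating-point environment (no traps enabled), and it reads B unconditionally, whereas the IR above uses a masked load.

```cpp
#include <cassert>
#include <vector>

// Source-level form: the division only executes when the guard holds.
void guarded_div(float *A, const float *B, const float *C, int len, int vsmall) {
  for (int i = 0; i < len; i++)
    if (C[i] > vsmall)
      A[i] = B[i] / C[i];
}

// Scalar model of the vectorized body (hypothetical helper, not LLVM output):
// the quotient is computed for every element and only the store is masked.
void speculated_div(float *A, const float *B, const float *C, int len, int vsmall) {
  for (int i = 0; i < len; i++) {
    bool mask = C[i] > vsmall;  // corresponds to %2 = fcmp ogt ...
    float q = B[i] / C[i];      // corresponds to %5 = fdiv ..., no mask applied
    if (mask)                   // corresponds to the masked store
      A[i] = q;
  }
}

// Runs both forms on inputs that include a zero divisor and checks that
// the values stored to A are identical.
void demo_same_stores() {
  const std::vector<float> B = {1.0f, 2.0f, 3.0f, 4.0f};
  const std::vector<float> C = {2.0f, 0.0f, -1.0f, 4.0f};
  std::vector<float> A1(4, 9.0f), A2(4, 9.0f);
  guarded_div(A1.data(), B.data(), C.data(), 4, 0);
  speculated_div(A2.data(), B.data(), C.data(), 4, 0);
  assert(A1 == A2);
}
```

Under those assumptions the two forms leave the same contents in A; a difference could only show up in a program that enables floating-point traps or inspects the exception flags.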
Robin Kruppe via llvm-dev
2018-Sep-25 09:44 UTC
[llvm-dev] Unsafe floating point operation (FDiv & FRem) in LoopVectorizer
Hi Ashutosh,

On Tue, 25 Sep 2018 at 09:23, Nema, Ashutosh via llvm-dev <llvm-dev at lists.llvm.org> wrote:
> [...]
> The generated IR seems unsafe because fdiv is not respecting the compare mask.
>
> As div is the unsafe operation, llvm should generate the predicated divs.

Can you elaborate on why you think the floating point operations are "unsafe" and need to be predicated? Integer division by zero and remainder by zero are Undefined Behavior, but the corresponding floating point operations just result in a NaN or infinity in "error" cases such as division by zero.

You might be thinking about the "floating point exceptions" that these operations can signal. If so, keep in mind that by default these do not trap but simply make the operation silently return a default value such as an infinity, zero, or NaN. The LLVM IR instructions fdiv and frem (as well as their siblings fadd, fmul, etc.) are assumed to execute in an environment [1] where this default handling is not changed and where nobody inspects any flags (e.g., in an FPU status register) that may be set when exceptions occur.

Programs where this assumption is not true have to use the constrained fp intrinsics [2], which indeed constrain the vectorizer and all other optimization passes (LV is far from the only pass that will move an fdiv out of a conditional).

Cheers,
Robin

[1]: https://llvm.org/docs/LangRef.html#floating-point-environment
[2]: https://llvm.org/docs/LangRef.html#constrainedfp
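
The default, non-trapping results of floating-point division and remainder by zero can be checked with a short C++ sketch. It assumes IEEE 754 floats, a build without -ffast-math, and a process in which nothing has enabled floating-point traps; `quotient` and `demo_default_fp_results` are names invented for this illustration.

```cpp
#include <cassert>
#include <cmath>
#include <limits>

// volatile keeps the compiler from folding the division at compile time,
// so the division is actually executed at run time.
float quotient(float num, float den) {
  volatile float n = num, d = den;
  return n / d;
}

void demo_default_fp_results() {
  static_assert(std::numeric_limits<float>::is_iec559,
                "this sketch assumes IEEE 754 floats");
  assert(std::isinf(quotient(1.0f, 0.0f)));   // nonzero / zero gives an infinity
  assert(std::isnan(quotient(0.0f, 0.0f)));   // zero / zero gives NaN
  assert(std::isnan(std::fmod(1.0f, 0.0f)));  // remainder by zero gives NaN
}
```

None of these operations trap in the default environment; they only return a special value, which is why speculating them ahead of a masked store does not change the stored results.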
Nema, Ashutosh via llvm-dev
2018-Sep-26 04:47 UTC
[llvm-dev] Unsafe floating point operation (FDiv & FRem) in LoopVectorizer
Thanks for the detailed explanation, Robin. I was not aware that, for floating point operations, LLVM assumes:

"The default LLVM floating-point environment assumes that floating-point instructions do not have side effects. Results assume the round-to-nearest rounding mode. No floating-point exception state is maintained in this environment."

The test snippet in my previous mail is from the OpenFOAM application; it fails at runtime because of the unconditional FDIV.

Thanks,
Ashutosh