Venkataramanan Kumar via llvm-dev
2021-Oct-06 16:32 UTC
[llvm-dev] Help: Question on Epilog Vectorization
Hi,

I wrote a small test case and tried to force epilog vectorization for the loop.

    #include <math.h>

    void foo(double * restrict a, double * restrict b, int N) {
      for (int i = 0; i < N; ++i)
        a[i] = sin(i);
    }

    clang -O3 -mavx2 -fveclib=libmvec sin.c -mllvm -epilogue-vectorization-minimum-VF=4 -S -emit-llvm -fno-unroll-loops

But epilog vectorization fails at the following check in the function "isCandidateForEpilogueVectorization":

-- Snip llvm/lib/Transforms/Vectorize/LoopVectorize.cpp --
  // Induction variables that are widened require special handling that is
  // currently not supported.
  if (any_of(Legal->getInductionVars(), [&](auto &Entry) {
        return !(this->isScalarAfterVectorization(Entry.first, VF) ||
                 this->isProfitableToScalarize(Entry.first, VF));
-- Snip --

I understand that when induction variables are widened as per the VPlan, we don't support such loops for epilog vectorization at the moment. But can someone please explain the "special handling" we need to do here?

If I remove the check from the source, epilog vectorization happens, but the generated LLVM IR seems to be wrong.
---Snip--
12:                                               ; preds = %12, %10
  %13 = phi i64 [ 0, %10 ], [ %19, %12 ]
  %14 = phi <4 x i32> [ <i32 0, i32 1, i32 2, i32 3>, %10 ], [ %20, %12 ]
  %15 = sitofp <4 x i32> %14 to <4 x double>
  %16 = call <4 x double> @_ZGVdN4v_sin(<4 x double> %15)
  %17 = getelementptr inbounds double, double* %0, i64 %13
  %18 = bitcast double* %17 to <4 x double>*
  store <4 x double> %16, <4 x double>* %18, align 8, !tbaa !3
  %19 = add nuw i64 %13, 4
  %20 = add <4 x i32> %14, <i32 4, i32 4, i32 4, i32 4>
  %21 = icmp eq i64 %19, %11
  br i1 %21, label %22, label %12, !llvm.loop !7

22:                                               ; preds = %12
  %23 = icmp eq i64 %11, %6
  br i1 %23, label %44, label %24

24:                                               ; preds = %22
  %25 = and i64 %6, 2
  %26 = icmp eq i64 %25, 0
  br i1 %26, label %42, label %27

27:                                               ; preds = %8, %24
  %28 = phi i64 [ %11, %24 ], [ 0, %8 ]
  %29 = and i64 %6, 4294967294
  br label %30

30:                                               ; preds = %30, %27
  %31 = phi i64 [ %28, %27 ], [ %37, %30 ]
  %32 = phi <2 x i32> [ <i32 0, i32 1>, %27 ], [ %38, %30 ]   <== Resume value seems to be wrong.
  %33 = sitofp <2 x i32> %32 to <2 x double>
  %34 = call <2 x double> @_ZGVbN2v_sin(<2 x double> %33)
  %35 = getelementptr inbounds double, double* %0, i64 %31
  %36 = bitcast double* %35 to <2 x double>*
  store <2 x double> %34, <2 x double>* %36, align 8, !tbaa !3
  %37 = add nuw i64 %31, 2
  %38 = add <2 x i32> %32, <i32 2, i32 2>
  %39 = icmp eq i64 %37, %29
  br i1 %39, label %40, label %30, !llvm.loop !11
---Snip--

I see that the resume value for the widened phi node in the epilog loop is not updated correctly: %32 always starts at <0, 1>, while the scalar index %31 correctly resumes from %28 (that is, %11 when coming from the main vector loop). Are there any other issues here apart from handling the widened induction variable's resume value?

Regards,
Venkat.
Bardia Mahjour via llvm-dev
2021-Oct-06 20:41 UTC
[llvm-dev] Help: Question on Epilog Vectorization
The resume value for the widened induction is the only problem I'm aware of. The issue is that scalar induction resume values are normally created/updated as part of skeleton creation. However, for widened inductions in the epilogue loop, we have corresponding recipes in the VPlan that haven't been executed at the time of skeleton creation. We either have to find the related phis after the fact and fix them up, or change the VPlan to update the incoming values of the widened IVs before executing it. Florian demonstrated the latter idea in https://reviews.llvm.org/D92132, so maybe he has more details to share.

Bardia Mahjour
Compiler Optimizations
IBM Toronto Software Lab

From: "Venkataramanan Kumar" <venkataramanan.kumar.llvm at gmail.com>
To: "llvm-dev" <llvm-dev at lists.llvm.org>
Cc: bmahjour at ca.ibm.com, "Florian Hahn" <florian_hahn at apple.com>
Date: 2021/10/06 12:33 PM
Subject: [EXTERNAL] Help: Question on Epilog Vectorization
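Whichever of the two options is taken, the value that has to end up as the epilogue's incoming value for a widened IV is the scalar resume value of the induction splatted across the lanes, plus a per-lane multiple of the step. A minimal C sketch of that arithmetic follows; `widened_iv_resume` is a hypothetical name, and the sketch says nothing about how LoopVectorize or D92132 actually materializes the value. In the IR from the original message the widened IV is <2 x i32> while %28 is i64, so real IR would also need a trunc; the sketch ignores types.

```c
#include <stdint.h>

/* Hypothetical helper (not an LLVM API): computes the start value of a
 * widened integer induction for a vector epilogue of width vf.
 *   start, step - the original scalar induction: i = start; i += step
 *   resume      - scalar iterations already executed by the main vector loop
 *   out         - receives vf lanes: lane j = start + (resume + j) * step */
static void widened_iv_resume(int64_t start, int64_t step, int64_t resume,
                              int vf, int64_t *out) {
    int64_t base = start + resume * step;         /* scalar resume value */
    for (int lane = 0; lane < vf; ++lane)
        out[lane] = base + (int64_t)lane * step;  /* splat(base) + <0,1,..>*step */
}
```

For `foo` with N = 7, the main loop covers 4 iterations, so `widened_iv_resume(0, 1, 4, 2, v)` yields {4, 5}: the value %32 should receive from %27 on the main-loop path, instead of <i32 0, i32 1>.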