Displaying 7 results from an estimated 7 matches for "vctp".
2020 May 04
3
LV: predication
...ing out the underlying element count given a predicate, maybe we could attack it from that angle? For example, introduce a special intrinsic for deriving the mask (sort of like the SVE whilelo).
That would be an excellent way of doing it and it would also map very well to MVE too, where we have a VCTP intrinsic/instruction that creates the mask/predicate (Vector Create Tail-Predicate). So I will go for this approach. Such an intrinsic was actually also proposed in Sam's original RFC (see https://lists.llvm.org/pipermail/llvm-dev/2019-May/132512.html), but we hadn't implemented it yet. Th...
2019 Jul 15
2
Tail-Loop Folding/Predication
...orm without
any predicated intrinsics:
#pragma tail_predicate
do {
  VLD(..); // some vector load intrinsic
  VST(..); // some vector store intrinsic
  ..
} while (N);
which can then be transformed and predication made explicit through data
dependencies like so:
do {
  mask = vctp(N); // intrinsic that generates the mask of active lanes
  VLD(.., mask);
  VST(.., mask);
  ..
} while (N);
A vector loop in this form can easily be picked up by the new hardware loop pass,
and the corresponding tail-predicated hardware loop can be generated. This is
only a small example,...
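
To make the role of the mask in the transformed loop concrete, here is a minimal scalar sketch in C of what a vctp-style mask does for a 4-lane vector of 32-bit elements. It is an illustration under stated assumptions, not the MVE intrinsic or the pass described in the thread: `model_vctp32`, `mask4_t`, `masked_copy` and `LANES` are hypothetical names, and the "lane i is active while i is below the remaining element count" rule is the behaviour the excerpt describes as "the mask of active lanes". The loop also uses an index-based form with an up-front bound check rather than the `do { .. } while (N)` shape quoted above.

```c
#include <stdbool.h>
#include <stdint.h>

#define LANES 4 /* assumed: four 32-bit lanes per vector */

/* Hypothetical scalar stand-in for a vector predicate. */
typedef struct {
    bool lane[LANES];
} mask4_t;

/* Hypothetical model of a vctp-style mask: lane i is active
   only while i is smaller than the number of remaining elements. */
static mask4_t model_vctp32(int32_t remaining) {
    mask4_t m;
    for (int i = 0; i < LANES; ++i) {
        m.lane[i] = (i < remaining);
    }
    return m;
}

/* Copies n elements in chunks of LANES. The final chunk is handled by
   the mask instead of a separate scalar tail loop. */
static void masked_copy(int32_t *dst, const int32_t *src, int32_t n) {
    for (int32_t base = 0; base < n; base += LANES) {
        mask4_t mask = model_vctp32(n - base);
        for (int i = 0; i < LANES; ++i) {
            if (mask.lane[i]) {      /* models VLD(.., mask) / VST(.., mask) */
                dst[base + i] = src[base + i];
            }
        }
    }
}
```

With n = 6, the first iteration runs with all four lanes active and the second with only two, so elements past the sixth are never read or written.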
2020 May 01
3
LV: predication
...4 x i32> @llvm.masked.load
call <4 x i32> @llvm.masked.load
call void @llvm.masked.store
call i32 @llvm.loop.decrement.reg
br i1 %12, label %.*, label %vector.body
We then pick this up in our tail-predication pass, remove the @llvm.set.loop.elements intrinsic, and add @vctp, which is our intrinsic that generates the mask of active/inactive lanes:
vector.ph:
call void @llvm.set.loop.iterations.i32(i32 %5)
br label %vector.body
vector.body:
call <4 x i1> @llvm.arm.mve.vctp32
call <4 x i32> @llvm.masked.load
call <4 x i32>...
2020 May 01
5
LV: predication
...2> @llvm.masked.load
call <4 x i32> @llvm.masked.load
call void @llvm.masked.store
call i32 @llvm.loop.decrement.reg
br i1 %12, label %.*, label %vector.body
We then pick this up in our tail-predication pass, remove the @llvm.set.loop.elements intrinsic, and add @vctp, which is our intrinsic that generates the mask of active/inactive lanes:
vector.ph:
call void @llvm.set.loop.iterations.i32(i32 %5)
br label %vector.body
vector.body:
call <4 x i1> @llvm.arm.mve.vctp32
call <4 x i32> @llvm.masked.load
call <...
2020 May 04
3
LV: predication
...om>
Cc: Eli Friedman <efriedma at quicinc.com>; llvm-dev <llvm-dev at lists.llvm.org>; Sam Parker <Sam.Parker at arm.com>
Subject: Re: [llvm-dev] LV: predication
Hi Sjoerd,
That would be an excellent way of doing it and it would also map very well to MVE too, where we have a VCTP intrinsic/instruction that creates the mask/predicate (Vector Create Tail-Predicate). So I will go for this approach. Such an intrinsic was actually also proposed in Sam's original RFC (see https://lists.llvm.org/pipermail/llvm-dev/2019-May/132512.html), but we hadn't implemented it yet. Th...
2020 May 20
2
LV: predication
...2> @llvm.masked.load
call <4 x i32> @llvm.masked.load
call void @llvm.masked.store
call i32 @llvm.loop.decrement.reg
br i1 %12, label %.*, label %vector.body
We then pick this up in our tail-predication pass, remove the @llvm.set.loop.elements intrinsic, and add @vctp, which is our intrinsic that generates the mask of active/inactive lanes:
vector.ph:
call void @llvm.set.loop.iterations.i32(i32 %5)
br label %vector.body
vector.body:
call <4 x i1> @llvm.arm.mve.vctp32
call <4 x i32> @llvm.masked.load
call <...
2020 May 21
2
LV: predication
...2> @llvm.masked.load
call <4 x i32> @llvm.masked.load
call void @llvm.masked.store
call i32 @llvm.loop.decrement.reg
br i1 %12, label %.*, label %vector.body
We then pick this up in our tail-predication pass, remove the @llvm.set.loop.elements intrinsic, and add @vctp, which is our intrinsic that generates the mask of active/inactive lanes:
vector.ph:
call void @llvm.set.loop.iterations.i32(i32 %5)
br label %vector.body
vector.body:
call <4 x i1> @llvm.arm.mve.vctp32
call <4 x i32> @llvm.masked.load
call <...