search for: p0v4i32

Displaying 6 results from an estimated 6 matches for "p0v4i32".

2015 May 05
2
[LLVMdev] [AArch64] Should we restrict to the pointer type used in ldN/stN intrinsics?
...n pass and generate ld2 with "llc -march=aarch64 < ld2.ll". I just think it's strange that the pointer has no relationship to the returned type. Currently there are IR regression test cases using different kinds of pointers like 'xx.ld2.v4i32.p0i32', 'xx.ld2.v4i32.p0v4i32' or 'xx.ld2.v4i32.p0i8', which looks confusing. Should we modify the definition of such intrinsics and restrict the pointer type? If you agree, I suggest using a pointer to the vector element type, because 'arm_neon.h' declares the ld2 intrinsic like 'int32x2...
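To make the naming concern concrete, here is a minimal typed-pointer (pre-opaque-pointer) IR sketch of the three suffix variants quoted in the snippet, plus the element-pointer form the author suggests. The declarations are reconstructed from the mangled names above, and the wrapper function is made up for illustration:

; Three ld2 declarations that differ only in the mangled pointer-operand type.
declare { <4 x i32>, <4 x i32> } @llvm.aarch64.neon.ld2.v4i32.p0i32(i32*)
declare { <4 x i32>, <4 x i32> } @llvm.aarch64.neon.ld2.v4i32.p0v4i32(<4 x i32>*)
declare { <4 x i32>, <4 x i32> } @llvm.aarch64.neon.ld2.v4i32.p0i8(i8*)

; The suggested restriction: always pass a pointer to the vector element type,
; mirroring the arm_neon.h prototype that takes a const int32_t *.
define { <4 x i32>, <4 x i32> } @load_two_v4i32(i32* %p) {
  %pair = call { <4 x i32>, <4 x i32> } @llvm.aarch64.neon.ld2.v4i32.p0i32(i32* %p)
  ret { <4 x i32>, <4 x i32> } %pair
}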
2019 May 20
3
[RFC] Intrinsics for Hardware Loops
...br i1 %1, label %vector.ph, label %for.loopexit vector.ph: br label %vector.body vector.body: %elts = phi i32 [ %N, %vector.ph ], [ %elts.rem, %vector.body ] %active = call <4 x i1> @llvm.arm.get.active.mask(i32 %elts, i32 4) %load = tail call <4 x i32> @llvm.masked.load.v4i32.p0v4i32(<4 x i32>* %addr, i32 4, <4 x i1> %active, <4 x i32> undef) tail call void @llvm.masked.store.v4i32.p0v4i32(<4 x i32> %load, <4 x i32>* %addr.1, i32 4, <4 x i1> %active) %elts.rem = call i32 @llvm.arm.loop.end(i32 %elts, i32 4) %cmp = icmp sgt i32 %elts.rem...
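For reference, these are the declarations implied by the calls in that snippet. The two masked memory intrinsics are the standard llvm.masked.load/store forms in typed-pointer mangling; the two llvm.arm.* intrinsics are the ones proposed in the RFC, so their names and signatures should be read as the proposal rather than as in-tree API:

; Proposed in the RFC: produce a lane mask from the remaining element count,
; and decrement that count by the vector width at the end of each iteration.
declare <4 x i1> @llvm.arm.get.active.mask(i32, i32)
declare i32 @llvm.arm.loop.end(i32, i32)

; Standard masked load/store intrinsics used under that mask
; (pointer, alignment, mask, passthru / value, pointer, alignment, mask).
declare <4 x i32> @llvm.masked.load.v4i32.p0v4i32(<4 x i32>*, i32, <4 x i1>, <4 x i32>)
declare void @llvm.masked.store.v4i32.p0v4i32(<4 x i32>, <4 x i32>*, i32, <4 x i1>)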
2016 Oct 10
2
[arm, aarch64] Alignment checking in interleaved access pass
...> %113, <4 x i32> <i32 16, i32 17, i32 18, i32 19> %117 = shufflevector <16 x i32> %112, <16 x i32> %113, <4 x i32> <i32 24, i32 25, i32 26, i32 27> %118 = bitcast <16 x i32>* %uglygep242243 to <4 x i32>* call void @llvm.aarch64.neon.st4.v4i32.p0v4i32(<4 x i32> %114, <4 x i32> %115, <4 x i32> %116, <4 x i32> %117, <4 x i32>* %118) %scevgep241 = getelementptr <16 x i32>, <16 x i32>* %uglygep242243, i64 1 %119 = shufflevector <16 x i32> %112, <16 x i32> %113, <4 x i32> <i32 4, i32...
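Reduced to a self-contained function, the pattern the pass emits looks roughly like this (a sketch with a single <16 x i32> source and a made-up function name; the code above feeds st4 from two wide vectors): shufflevectors slice out the per-field sub-vectors, the wide pointer is bitcast to the element vector type, and st4 writes the four registers to memory interleaved. The thread's question is whether that access is checked to be sufficiently aligned.

define void @st4_interleave(<16 x i32> %v, <16 x i32>* %p) {
  ; Slice the wide vector into four consecutive <4 x i32> fields.
  %f0 = shufflevector <16 x i32> %v, <16 x i32> undef, <4 x i32> <i32 0, i32 1, i32 2, i32 3>
  %f1 = shufflevector <16 x i32> %v, <16 x i32> undef, <4 x i32> <i32 4, i32 5, i32 6, i32 7>
  %f2 = shufflevector <16 x i32> %v, <16 x i32> undef, <4 x i32> <i32 8, i32 9, i32 10, i32 11>
  %f3 = shufflevector <16 x i32> %v, <16 x i32> undef, <4 x i32> <i32 12, i32 13, i32 14, i32 15>
  ; st4 stores the four sub-vectors element-wise interleaved at %dst.
  %dst = bitcast <16 x i32>* %p to <4 x i32>*
  call void @llvm.aarch64.neon.st4.v4i32.p0v4i32(<4 x i32> %f0, <4 x i32> %f1, <4 x i32> %f2, <4 x i32> %f3, <4 x i32>* %dst)
  ret void
}

declare void @llvm.aarch64.neon.st4.v4i32.p0v4i32(<4 x i32>, <4 x i32>, <4 x i32>, <4 x i32>, <4 x i32>*)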
2016 Oct 10
2
[arm, aarch64] Alignment checking in interleaved access pass
Hi Renato, Thank you for the answers! First, let me clarify a couple of things and give some context. The patch is looking at VSTn rather than VLDn (stores seem to be somewhat harder to get the "right" patterns for; the pass is already doing a good job for loads). The examples you gave come mostly from loop vectorization, which, as I understand it, was the reason for adding the
2020 May 21
2
LV: predication
...tory. A previous concern was this inhibiting other things, but I don't see that. What we are changing is this original icmp: %active.lane.mask =<4 x i1> icmp ult <4 x i32> %induction, <4 x i32> %broadcast.splat %wide.masked.load = call <4 x i32> @llvm.masked.load.v4i32.p0v4i32(<4 x i32>* %3, i32 4, <4 x i1> %active.lane.mask, <4 x i32> undef) with this: %active.lane.mask = call <4 x i1> @llvm.get.active.lane.mask.v4i32(<4 x i32> %induction, <4 x i32> %broadcast.splat) %wide.masked.load = call <4 x i32> @llvm.masked.load.v4i32.p...
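Spelled out as a self-contained function, the original form is just a vector compare: lane i of the mask is true while induction[i] is below the broadcast trip count (a minimal sketch; the function name is made up):

define <4 x i1> @active_lane_mask_via_icmp(<4 x i32> %induction, <4 x i32> %broadcast.splat) {
  ; Lane i is active while induction[i] < trip count (unsigned compare).
  %active.lane.mask = icmp ult <4 x i32> %induction, %broadcast.splat
  ret <4 x i1> %active.lane.mask
}

The proposed llvm.get.active.lane.mask call computes the same mask; the argument in the thread is that the intrinsic keeps that "remaining active lanes" meaning explicit for the backend rather than leaving it as a plain compare.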
2020 May 20
2
LV: predication
Hi Ayal, Let me start with commenting on this: > A dedicated intrinsic that freezes the compare instruction, for no apparent reason, may potentially cripple subsequent passes from further optimizing the vectorized loop. The point is that we have a very good reason, which is that it passes the right information on to the backend, enabling optimisations rather than crippling them. The compare