Displaying 3 results from an estimated 3 matches for "vstns".
2016 Oct 10
2
[arm, aarch64] Alignment checking in interleaved access pass
...y generating explicit VSTn intrinsics,
with some of the patterns I described, and I found no reason why Halide
shouldn't generate a single shuffle, followed by a generic vector store and
rely on the interleaved access pass to generate the right intrinsic.
Performance-wise, it is worth using the VSTns in the scenarios they
encounter; it's mostly a question of where they get generated.
The alignment question is orthogonal to the patch up for review. There was
no alignment check before, and I didn't have enough background on the
architectures to conclude whether this was needed or not. I added...
2016 Oct 10
2
[arm, aarch64] Alignment checking in interleaved access pass
...'t thought
> about the general case yet".
>
That's right, perhaps because Halide is not a regular vectorizer, which
opens up new cases.
To give a bit more insight, here's a simple example where the data is
still contiguous, [0 .. 32), but needs to be split to use multiple
VSTns/STns. This is what Halide generates for aarch64:
%uglygep242243 = bitcast i8* %uglygep242 to <16 x i32>*
%114 = shufflevector <16 x i32> %112, <16 x i32> %113, <4 x i32> <i32 0,
i32 1, i32 2, i32 3>
%115 = shufflevector <16 x i32> %112, <16 x i32> %...
2016 Sep 19
3
[arm, aarch64] Alignment checking in interleaved access pass
Hi,
As a follow up to Patch D23646 <https://reviews.llvm.org/D23646>, I'm
trying to figure out if there should be an alignment check and what the
correct approach is.
Some background:
For stores, the pass turns:
%i.vec = shuffle <8 x i32> %v0, <8 x i32> %v1,
<0, 4, 8, 1, 5, 9, 2, 6, 10, 3, 7, 11>
store <12 x i32> %i.vec, <12 x i32>* %ptr
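
The mask in the shuffle above follows a regular pattern: it interleaves three 4-lane sub-vectors taken from the concatenation of %v0 and %v1. As a rough illustration only (this is not the pass's code; the helper names interleave_mask and interleave are made up for this sketch), the mask can be reproduced like this:

```python
def interleave_mask(factor, lanes):
    """Build a shuffle mask that interleaves `factor` sub-vectors of
    `lanes` elements each (hypothetical helper, for illustration only)."""
    # For each lane, pick that lane from sub-vector 0, 1, ..., factor - 1.
    return [sub * lanes + lane for lane in range(lanes) for sub in range(factor)]


def interleave(subvectors):
    """Apply the interleave mask to a list of equal-length sub-vectors."""
    factor = len(subvectors)
    lanes = len(subvectors[0])
    # Concatenate the sub-vectors, as the two shuffle operands are concatenated.
    flat = [x for sv in subvectors for x in sv]
    return [flat[i] for i in interleave_mask(factor, lanes)]


mask = interleave_mask(3, 4)
print(mask)  # [0, 4, 8, 1, 5, 9, 2, 6, 10, 3, 7, 11]

# Three 4-element "channels" become one 12-element interleaved vector.
print(interleave([["a0", "a1", "a2", "a3"],
                  ["b0", "b1", "b2", "b3"],
                  ["c0", "c1", "c2", "c3"]]))
```

With factor 3 and 4 lanes this yields exactly the mask in the shuffle above, so the stored <12 x i32> holds the elements in a0, b0, c0, a1, b1, c1, ... order.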