search for: vstns

Displaying 3 results from an estimated 3 matches for "vstns".

2016 Oct 10
2
[arm, aarch64] Alignment checking in interleaved access pass
...y generating explicit VSTn intrinsics, with some of the patterns I described, and I found no reason why Halide shouldn't generate a single shuffle followed by a generic vector store, and rely on the interleaved access pass to generate the right intrinsic. Performance-wise, it is worth using the VSTns in the scenarios they encounter; it's mostly a question of where they get generated. The alignment question is orthogonal to the patch up for review. There was no alignment check before, and I didn't have enough background on the architectures to conclude whether this was needed or not. I added...
2016 Oct 10
2
[arm, aarch64] Alignment checking in interleaved access pass
> ...'t thought about the general case yet".
That's right, perhaps because Halide is not a regular vectorizer, which opens up new cases. To give a bit more insight, here's a simple example of where the data is still continuous: [0 .. 32), but it needs to be split to use multiple VSTns/STns. This is what Halide generates for aarch64:
    %uglygep242243 = bitcast i8* %uglygep242 to <16 x i32>*
    %114 = shufflevector <16 x i32> %112, <16 x i32> %113, <4 x i32> <i32 0, i32 1, i32 2, i32 3>
    %115 = shufflevector <16 x i32> %112, <16 x i32> %...
2016 Sep 19
3
[arm, aarch64] Alignment checking in interleaved access pass
Hi, As a follow up to Patch D23646 <https://reviews.llvm.org/D23646>, I'm trying to figure out if there should be an alignment check and what the correct approach is. Some background: For stores, the pass turns:
    %i.vec = shuffle <8 x i32> %v0, <8 x i32> %v1, <0, 4, 8, 1, 5, 9, 2, 6, 10, 3, 7, 11>
    store <12 x i32> %i.vec, <12 x i32>* %ptr
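
The shuffle mask quoted in the last result, <0, 4, 8, 1, 5, 9, 2, 6, 10, 3, 7, 11>, is a factor-3 interleave of three 4-lane sub-vectors. The following Python sketch shows how such a mask is built and recognized. The helper names `interleave_mask` and `is_interleave_mask` are hypothetical and chosen for illustration only; this is a simplified check under that assumption, not the matching logic of the LLVM pass itself, which may accept more general masks.

```python
def interleave_mask(factor, lanes):
    """Build the shuffle mask that interleaves `factor` sub-vectors.

    Sub-vector `sub` occupies source indices [sub * lanes, (sub + 1) * lanes).
    The result takes lane 0 of every sub-vector, then lane 1, and so on.
    (Hypothetical helper, for illustration only.)
    """
    return [sub * lanes + lane for lane in range(lanes) for sub in range(factor)]


def is_interleave_mask(mask, factor):
    """Return True if `mask` is exactly the simple interleave pattern above.

    Simplified assumption: sub-vectors start at index 0 and are contiguous.
    """
    if factor < 2 or len(mask) % factor != 0:
        return False
    lanes = len(mask) // factor
    return mask == interleave_mask(factor, lanes)


# The mask from the quoted store example: 12 elements, interleave factor 3.
mask = [0, 4, 8, 1, 5, 9, 2, 6, 10, 3, 7, 11]
print(interleave_mask(3, 4) == mask)
print(is_interleave_mask(mask, 3))
```

Read this way, source indices 0..3, 4..7 and 8..11 form the three sub-vectors whose elements alternate in the stored <12 x i32> value.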