search for: vst4

Displaying 5 results from an estimated 5 matches for "vst4".

2013 Oct 14
[LLVMdev] Vectorization of pointer PHI nodes
> ...loops (and the mentioning of VLD3(.8?)/VST3(.8?) led me to assume that
> this is what happened). But the LLVM-IR you sent has a store of 0 in there
> ;) and strides by 4.

I think so. Ignore the last write, it was bogus. (but don't ignore the fact that GCC vectorized it anyway with vst4!). By running GCC with -ftree-vectorizer-verbose=1 I got:
test.c:11: note: create runtime check for data references DELTA and *WRITE_30
test.c:11: note: create runtime check for data references *READ_29 and *WRITE_30
test.c:11: note: created 2 versioning for alias checks.
test.c:11: note: === vec...
2016 Oct 10
[arm, aarch64] Alignment checking in interleaved access pass
On Mon, Oct 10, 2016 at 1:14 PM, Renato Golin <renato.golin at linaro.org> wrote:
> On 10 October 2016 at 19:39, Alina Sbirlea <alina.sbirlea at gmail.com> wrote:
> > Now, for ARM archs Halide is currently generating explicit VSTn intrinsics,
> > with some of the patterns I described, and I found no reason why Halide
> > shouldn't generate a single
2020 Jan 27
Limited use types in the back end
I am hoping that someone can offer advice on a somewhat unusual issue that I am facing with the SDAG. Namely, I am trying to implement some custom operations that do very specific things on multiple registers at a time. The operations themselves will simply be intrinsics since there are no equivalent operations in IR/SDAG. However, handling the types seems rather tricky. One approach I tried is
2013 Oct 14
[LLVMdev] Vectorization of pointer PHI nodes
Renato, can you post the C code for the function and the assembly that gcc produces? Your initial example could be well handled by vectorization of strided loops (and the mentioning of VLD3(.8?)/VST3(.8?) led me to assume that this is what happened). But the LLVM-IR you sent has a store of 0 in there ;) and strides by 4.
Thanks,
Arnold

Vectorization of strided loops: I am using float as the
2013 Oct 14
[LLVMdev] Vectorization of pointer PHI nodes
This is almost ideal for SLP vectorization, except for two problems:
1. We have 4 stores to consecutive locations, but the last element is the constant zero, and not an additional SUB. At the moment we don’t have support for idempotence operations, but this is something that we should add.
2. The values that we are subtracting come from 3 loads. We usually load 4 elements from memory, or