Displaying 5 results from an estimated 5 matches for "vst3".
2013 Oct 14
[LLVMdev] Vectorization of pointer PHI nodes
Renato, can you post the C code for the function and the assembly that GCC produces?
Your initial example could be handled well by vectorization of strided loops (and the mention of VLD3(.8?)/VST3(.8?) led me to assume that this is what happened). But the LLVM-IR you sent has a store of 0 in there ;) and strides by 4.
Thanks,
Arnold
Vectorization of strided loops:
I am using float as the example otherwise would get too long.
void f(float * restrict read, float * restrict write) {
fo...
2013 Oct 14
[LLVMdev] Vectorization of pointer PHI nodes
...13 19:31, Arnold Schwaighofer <aschwaighofer at apple.com> wrote:
> Renato, can you post the c code for the function and the assembly that gcc
> produces?
>
Attached.
> Your initial example could be well handled by vectorization of strided
> loops (and the mentioning of VLD3(.8?)/VST3(.8?) led me to assume that
> this is what happened). But the LLVM-IR you sent has a store of 0 in there
> ;) and strides by 4.
>
I think so. Ignore the last write, it was bogus. (but don't ignore the fact
that GCC vectorized it anyway with vst4!).
By running GCC with -ftree-vectoriz...
2013 Oct 14
[LLVMdev] Vectorization of pointer PHI nodes
This is almost ideal for SLP vectorization, except for two problems:
1. We have 4 stores to consecutive locations, but the last element is the constant zero, and not an additional SUB. At the moment we don’t have support for idempotence operations, but this is something that we should add.
2. The values that we are subtracting come from 3 loads. We usually load 4 elements from memory, or
2013 Oct 14
[LLVMdev] Fwd: Vectorization of pointer PHI nodes
...teger and float reduction variables, not
pointers.
My code looks like this:
for (i: 0 -> MAX) {
  a = *read++;
  b = *read++;
  c = *read++;
  // do the same stuff to a, b, c
  *write++ = a;
  *write++ = b;
  *write++ = c;
}
Vectorizing this is very simple: it's a sequence of VLD3 + VOPS + VST3,
which GCC does nicely, but we don't.
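As a rough illustration of the access pattern, here is a minimal C sketch. It assumes that `read` advances by one element per load, just as `write` does, and it uses a hypothetical `op` as a stand-in for "do the same stuff to a, b, c". The names `f_ptr`, `f_idx` and `op` are made up for this example. The second function writes the same loop with explicit indices, which makes the constant stride of 3 visible; that is the interleaved pattern VLD3/VST3 are designed for.

```c
#include <stddef.h>

/* Hypothetical stand-in for "do the same stuff to a, b, c". */
static float op(float x) { return x * 2.0f + 1.0f; }

/* Pointer-increment form: three loads, three stores per iteration.
   Assumes read advances one element per load, like write. */
void f_ptr(const float *restrict read, float *restrict write, size_t n) {
    for (size_t i = 0; i < n; ++i) {
        float a = *read++;
        float b = *read++;
        float c = *read++;
        a = op(a);
        b = op(b);
        c = op(c);
        *write++ = a;
        *write++ = b;
        *write++ = c;
    }
}

/* Index form of the same loop: every access is base + 3*i + k,
   i.e. a constant stride of 3 elements per iteration. */
void f_idx(const float *restrict read, float *restrict write, size_t n) {
    for (size_t i = 0; i < n; ++i) {
        for (size_t k = 0; k < 3; ++k) {
            write[3 * i + k] = op(read[3 * i + k]);
        }
    }
}
```

Both functions compute the same result for `3 * n` input elements; whether a given compiler vectorizes either form depends on its version and flags.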
What would be the steps in adding a pointer increment reduction kind
(RK_PointerInc)? I believe the logic would be similar to RK_IntegerAdd, but
with a stride of type size, right?
Or maybe we'd have to translate the loop into something like:
for (...
2013 Oct 14
[LLVMdev] Vectorization of pointer PHI nodes
...this:
>
> for (i: 0 -> MAX) {
>   a = *read++;
>   b = *read++;
>   c = *read++;
>
>   // do the same stuff to a, b, c
>
>   *write++ = a;
>   *write++ = b;
>   *write++ = c;
> }
>
> Vectorizing this is very simple: it's a sequence of VLD3 + VOPS + VST3, which GCC does nicely, but we don't.
>
> What would be the steps in adding a pointer increment reduction kind (RK_PointerInc)? I believe the logic would be similar to RK_IntegerAdd, but with a stride of type size, right?
>
> Or maybe we'd have to translate the loop into so...