2013 Oct 14
[LLVMdev] Vectorization of pointer PHI nodes
...accesses (3 vector loads plus interleaves, which on ARM we can do with VLD3.8):
for (int i = 0; i < 256; i += 12) {
float a1 = *read[i] * 3.0;
float a1_2 = *read[i+3] * 3.0;
float a1_3 = *read[i+6] * 3.0;
float a1_4 = *read[i+9] * 3.0;
float a2 = *read[i+1] * 4.0;
float a2_2 = *read[i+3+1] * 4.0;
…
float a3 = *read[i+2] * 5.0;
float a3_2 = *read[i+3+2] * 5.0;
write[i] = a1;
write[i+3] = a1_2;
…
write[i+1] = a2;
write[i+1+3] = a2_2;
...
}
VLD3.f32 {a1..a1_4, a2..a2_4, a3..a3_4} [read+i]
a1..a1_4 = VMUL a1..a1_4, #3.0
a2..a2_4...
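
To make the access pattern in the loop above concrete, here is a minimal C sketch. It is not code from the thread: the function names, the `const float *` signatures, and the requirement that `n` be a multiple of 12 are assumptions made for illustration. The first function is the plain stride-3 scalar loop; the second arranges the same work the way a VLD3-style de-interleaving load would present it, with three groups of four lanes each.

```c
#include <stddef.h>

/* Scalar reference (hypothetical helper): each position within a group
   of three is scaled by its own constant, as in the loop above.
   Assumes n is a multiple of 3. */
void scale_interleaved(const float *read, float *write, size_t n)
{
    for (size_t i = 0; i + 2 < n; i += 3) {
        write[i]     = read[i]     * 3.0f;
        write[i + 1] = read[i + 1] * 4.0f;
        write[i + 2] = read[i + 2] * 5.0f;
    }
}

/* Same computation, grouped as a de-interleaving load would see it:
   12 consecutive floats become three 4-lane groups.
   Assumes n is a multiple of 12. */
void scale_deinterleaved(const float *read, float *write, size_t n)
{
    for (size_t i = 0; i + 11 < n; i += 12) {
        float a1[4], a2[4], a3[4];

        /* Models the de-interleaving load (stride 3). */
        for (int lane = 0; lane < 4; ++lane) {
            a1[lane] = read[i + 3 * lane];
            a2[lane] = read[i + 3 * lane + 1];
            a3[lane] = read[i + 3 * lane + 2];
        }

        /* One multiply per group, each with a single constant. */
        for (int lane = 0; lane < 4; ++lane) {
            a1[lane] *= 3.0f;
            a2[lane] *= 4.0f;
            a3[lane] *= 5.0f;
        }

        /* Models the re-interleaving store. */
        for (int lane = 0; lane < 4; ++lane) {
            write[i + 3 * lane]     = a1[lane];
            write[i + 3 * lane + 1] = a2[lane];
            write[i + 3 * lane + 2] = a3[lane];
        }
    }
}
```

Both functions perform the same multiplications on the same elements, so their outputs should match exactly; only the grouping differs.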
2013 Oct 14
[LLVMdev] Vectorization of pointer PHI nodes
This is almost ideal for SLP vectorization, except for two problems:
1. We have 4 stores to consecutive locations, but the last element is the constant zero, and not an additional SUB. At the moment we don’t have support for idempotence operations, but this is something that we should add.
2. The values that we are subtracting come from 3 loads. We usually load 4 elements from memory, or
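
The shape described in point 1 can be sketched in C as follows. This is only an illustration under stated assumptions: the thread excerpt does not show the actual code, so the function names, the `float` element type, and the operands are hypothetical. The second function shows one possible way to view the constant zero as a fourth subtraction (0 - 0), so that all four stores come from the same operation.

```c
/* Hypothetical scalar pattern: three subtractions plus a constant zero,
   stored to four consecutive locations. */
void sub3_store4(const float *a, const float *b, float *out)
{
    out[0] = a[0] - b[0];
    out[1] = a[1] - b[1];
    out[2] = a[2] - b[2];
    out[3] = 0.0f;          /* not a SUB: breaks the uniform 4-wide pattern */
}

/* Same result with the fourth lane rewritten as 0 - 0, so every lane
   is a subtraction and the four lanes form one uniform group. */
void sub4_padded(const float *a, const float *b, float *out)
{
    /* Only three elements are loaded; the fourth lane is padded with zero. */
    float va[4] = { a[0], a[1], a[2], 0.0f };
    float vb[4] = { b[0], b[1], b[2], 0.0f };

    for (int lane = 0; lane < 4; ++lane)
        out[lane] = va[lane] - vb[lane];
}
```

The padded form also touches on point 2: only three values are read from each input, so the fourth lane has to be filled in separately rather than loaded.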