thr3ads.net - search: "a1_4"

[LLVMdev] Vectorization of pointer PHI nodes

2013 Oct 14

...ead[i+9+2]) are consecutive, and we can efficiently vectorize these accesses (3 vector loads plus interleaves, which on ARM we can do with VLD3.8):

    for (int i = 0; i < 256; i += 12) {
      float a1   = *read[i]   * 3.0;
      float a1_2 = *read[i+3] * 3.0;
      float a1_3 = *read[i+6] * 3.0;
      float a1_4 = *read[i+9] * 3.0;
      float a2   = *read[i+1]   * 4.0;
      float a2_2 = *read[i+3+1] * 4.0;
      /* … */
      float a3   = *read[i+2]   * 5.0;
      float a3_2 = *read[i+3+2] * 5.0;
      write[i]   = a1;
      write[i+3] = a1_2;
      /* … */
      write[i+1]   = a2;
      write[i+1+3] = a2_2;
      /* ... */
    }

    VLD3.f32 {a1..a1_4, a2....

[LLVMdev] Vectorization of pointer PHI nodes

2013 Oct 14

This is almost ideal for SLP vectorization, except for two problems:

1. We have 4 stores to consecutive locations, but the last element is the constant zero, and not an additional SUB. At the moment we don’t have support for idempotence operations, but this is something that we should add.
2. The values that we are subtracting come from 3 loads. We usually load 4 elements from memory, or