2013 Oct 14
[LLVMdev] Vectorization of pointer PHI nodes
...
  float a3 = *read[i+2] * 5.0;
  float a3_2 = *read[i+3+2] * 5.0;
  write[i] = a1;
  write[i+3] = a1_2;
  …
  write[i+1] = a2;
  write[i+1+3] = a2_2;
  ...
}

VLD3.f32 {a1..a1_4, a2..a2_4, a3..a3_4} [read+i]
a1..a1_4 = VMUL a1..a1_4, #3.0
a2..a2_4 = VMUL a2..a2_4, #4.0
a3..a3_4 = VMUL a3..a3_4, #5.0
VST3.f32 {a1..a1_4, a2..a2_4, a3..a3_4} [write+i]

On Oct 14, 2013, at 12:15 PM, Nadav Rotem <nrotem at apple.com> wrote:
> This is almost ideal for SLP vectorization, except for two problems:
>
> 1. We have 4 stores to consecutive locations, but the last ele...
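To make the VLD3/VMUL/VST3 sequence concrete, here is a minimal C sketch of the same computation. It assumes `read` and `write` are plain `float` arrays holding interleaved triples (the email's `*read[...]` indirection is dropped), and the names `scale_triples_scalar` and `scale_triples_vld3_model` are hypothetical. The second function only models, lane by lane, what the NEON de-interleaving load, the three multiplies, and the interleaving store do for four triples at a time; it is not intrinsics code.

```c
#include <assert.h>
#include <stddef.h>

/* Scalar form (hypothetical): n_groups interleaved triples, each field
   scaled by its own constant, as in the loop above. */
static void scale_triples_scalar(const float *read, float *write, size_t n_groups) {
    for (size_t g = 0; g < n_groups; ++g) {
        size_t i = 3 * g;
        write[i]     = read[i]     * 3.0f;
        write[i + 1] = read[i + 1] * 4.0f;
        write[i + 2] = read[i + 2] * 5.0f;
    }
}

/* Portable model of VLD3.f32 / VMUL / VST3.f32 with 4 lanes:
   each iteration handles 4 triples (12 floats). */
static void scale_triples_vld3_model(const float *read, float *write, size_t n_groups) {
    assert(n_groups % 4 == 0); /* sketch assumption: no remainder loop */
    for (size_t g = 0; g < n_groups; g += 4) {
        const float *src = read + 3 * g;
        float *dst = write + 3 * g;
        float a1[4], a2[4], a3[4];

        /* VLD3: de-interleave 12 floats into three 4-lane registers */
        for (int l = 0; l < 4; ++l) {
            a1[l] = src[3 * l];
            a2[l] = src[3 * l + 1];
            a3[l] = src[3 * l + 2];
        }
        /* VMUL: one lane-wise multiply per register, one constant each */
        for (int l = 0; l < 4; ++l) {
            a1[l] *= 3.0f;
            a2[l] *= 4.0f;
            a3[l] *= 5.0f;
        }
        /* VST3: re-interleave the three registers and store */
        for (int l = 0; l < 4; ++l) {
            dst[3 * l]     = a1[l];
            dst[3 * l + 1] = a2[l];
            dst[3 * l + 2] = a3[l];
        }
    }
}
```

The de-interleave is what makes the multiplies uniform: after VLD3, every lane of a given register needs the same constant, so a single VMUL per register is enough.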
2013 Oct 14
[LLVMdev] Vectorization of pointer PHI nodes
This is almost ideal for SLP vectorization, except for two problems:

1. We have 4 stores to consecutive locations, but the last element is the constant zero, and not an additional SUB. At the moment we don’t have support for idempotence operations, but this is something that we should add.

2. The values that we are subtracting come from 3 loads. We usually load 4 elements from memory, or