thr3ads.net - search: "a3

Displaying 2 results from an estimated 2 matches for "a3_4".


[LLVMdev] Vectorization of pointer PHI nodes

2013 Oct 14


...float a3 = *read[i+2] * 5.0; float a3_2 = *read[i+3+2] * 5.0;
write[i] = a1; write[i+3] = a1_2; … write[i+1] = a2; write[i+1+3] = a2_2; ... }

VLD3.f32 {a1..a1_4, a2..a2_4, a3..a3_4} [read+i]
a1..a1_4 = VMUL a1..a1_4, #3.0
a2..a2_4 = VMUL a2..a2_4, #4.0
a3..a3_4 = VMUL a3..a3_4, #5.0
VST3.f32 {a1..a1_4, a2..a2_4, a3..a3_4} [read+i]

On Oct 14, 2013, at 12:15 PM, Nadav Rotem <nrotem at apple.com> wrote:
> This is almost ideal for SLP vectorization, except for two problems:
>
> 1. We have 4 stores to consecutive locations, but the last ele...
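As a rough illustration of the VLD3/VMUL/VST3 lines in the snippet above, the following C sketch spells out a scalar stride-3 loop and a lane-wise version that de-interleaves four groups, multiplies each lane set, and interleaves the results again. The function names `scale3_scalar` and `scale3_lanes`, the flat `float` arrays, and the separate output buffer are assumptions made for illustration and do not come from the thread; only the scale factors 3.0, 4.0 and 5.0 are taken from the VMUL lines.

```c
#include <assert.h>
#include <stddef.h>

/* Scalar form: every group of three floats is scaled by 3.0, 4.0 and 5.0. */
void scale3_scalar(const float *read, float *write, size_t groups) {
    assert(read && write);
    for (size_t g = 0; g < groups; ++g) {
        write[3 * g + 0] = read[3 * g + 0] * 3.0f;
        write[3 * g + 1] = read[3 * g + 1] * 4.0f;
        write[3 * g + 2] = read[3 * g + 2] * 5.0f;
    }
}

/* Lane-wise form: handles four groups per iteration, mirroring a
 * de-interleaving load, three multiplies, and an interleaving store. */
void scale3_lanes(const float *read, float *write, size_t groups) {
    assert(read && write);
    size_t g = 0;
    for (; g + 4 <= groups; g += 4) {
        float a1[4], a2[4], a3[4];
        for (int l = 0; l < 4; ++l) {          /* de-interleave (VLD3-like) */
            a1[l] = read[3 * (g + l) + 0];
            a2[l] = read[3 * (g + l) + 1];
            a3[l] = read[3 * (g + l) + 2];
        }
        for (int l = 0; l < 4; ++l) {          /* one multiply per lane set */
            a1[l] *= 3.0f;
            a2[l] *= 4.0f;
            a3[l] *= 5.0f;
        }
        for (int l = 0; l < 4; ++l) {          /* interleave (VST3-like) */
            write[3 * (g + l) + 0] = a1[l];
            write[3 * (g + l) + 1] = a2[l];
            write[3 * (g + l) + 2] = a3[l];
        }
    }
    /* Leftover groups fall back to the scalar form. */
    scale3_scalar(read + 3 * g, write + 3 * g, groups - g);
}
```

Both functions should produce the same output for any number of groups; the lane-wise version only makes the three-way de-interleaving explicit.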

[LLVMdev] Vectorization of pointer PHI nodes

2013 Oct 14


This is almost ideal for SLP vectorization, except for two problems:

1. We have 4 stores to consecutive locations, but the last element is the constant zero, and not an additional SUB. At the moment we don’t have support for idempotence operations, but this is something that we should add.
2. The values that we are subtracting come from 3 loads. We usually load 4 elements from memory, or
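The first problem can be pictured with a small C sketch. The quoted message does not show the code under discussion, so the arrays `a`, `b` and `out` and the function names `sub3_zero` and `sub4_uniform` are hypothetical. The first function has three subtractions and a constant-zero store; the second pads the inputs with zeros so that all four lanes perform the same operation, which is one possible way to give the four consecutive stores a uniform shape.

```c
#include <assert.h>

/* Shape described in the message: three SUBs plus a constant-zero store. */
void sub3_zero(const float *a, const float *b, float *out) {
    assert(a && b && out);
    out[0] = a[0] - b[0];
    out[1] = a[1] - b[1];
    out[2] = a[2] - b[2];
    out[3] = 0.0f;              /* not a SUB, so the four lanes differ */
}

/* Uniform shape (an assumption for illustration): treat the zero as 0 - 0
 * so that every lane is a subtraction over four-element inputs. */
void sub4_uniform(const float *a, const float *b, float *out) {
    assert(a && b && out);
    float a4[4] = { a[0], a[1], a[2], 0.0f };
    float b4[4] = { b[0], b[1], b[2], 0.0f };
    for (int l = 0; l < 4; ++l) {
        out[l] = a4[l] - b4[l];
    }
}
```

The two functions compute the same four values; the second simply makes explicit the four-wide form that a vectorizer would need to recognize.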
