Nadav Rotem
2013-Nov-14 15:03 UTC
[LLVMdev] Vectorization of loops with conditional dereferencing
I think that the best way to move forward with the vectorization of this loop is to make progress on the vectorization pragmas. The LoopVectorizer is already prepared for handling pragmas and we just need to add the clang-side support. Is anyone planning to work on this?

On Nov 14, 2013, at 2:18 AM, Renato Golin <renato.golin at linaro.org> wrote:

> On 1 November 2013 13:40, Hal Finkel <hfinkel at anl.gov> wrote:
>   Done = false;
>   FirstI = -1, LastI = -1;
>   while (!Done) {
>     for (I = FirstI+1; I < N; ++I)
>       if (r[I] > 0) {
>         FirstI = I;
>         break;
>       }
>
>     for (; I < N && !page_bound(&m[I]) && ...; ++I) {
>       if (r[I] > 0)
>         LastI = I;
>     }
>
>     Done = I == N;
>
>     for (I = FirstI; I <= LastI; ...) {
>       // Run the scalar/vector loop sequence as before.
>     }
>   }
>
> Hi Hal,
>
> Even if you could do something like this, there is no guarantee that (Last - First) will be a multiple of VF, or even bigger than VF, and you'll have to cope with boundary conditions on every external loop, which, depending on the distribution of r[], could be as many as half the size of r[] itself.
>
> I can't see an algorithmic (compile-time or run-time) way to guarantee that the number of clusters will be small without scanning the whole array. So, even if you add such a check at run time and only vectorize if the number of clusters is small, the worst case scenario will run twice as many times: the initial check can be vectorized, as it's a read-only loop, but the latter cannot. In the best case scenario, you'll have only a handful of loops, which will run in parallel.
>
> Worst case: n/VF + n
> Best case: n/VF + amortized n/VF
>
> For VF == 2,
>  * best case is as fast as scalar code, considering overheads.
>  * worst case is 50% slower
>
> For VF == 4,
>  * best case is 50% faster than scalar code
>  * worst case is 25% slower
>
> And all that depends on each workload, so it'll change for every different set of arguments, which, in an n-body simulation, change dynamically.
>
> cheers,
> --renato
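The figures in the quoted estimate can be checked with a few lines of C. This is only a sketch of the arithmetic, not code from the thread: the function names are hypothetical, the scalar loop is assumed to cost n iterations, n is assumed to be a multiple of VF, and all overheads are ignored.

```c
#include <assert.h>
#include <stddef.h>

/* Worst case from the estimate: a vectorized read-only scan (n/VF)
 * followed by a body that ends up running fully scalar (n). */
double worst_case_cost(size_t n, unsigned vf)
{
    assert(vf > 0);
    return (double)n / vf + (double)n;
}

/* Best case from the estimate: the scan (n/VF) plus a body that
 * vectorizes completely (amortized n/VF). */
double best_case_cost(size_t n, unsigned vf)
{
    assert(vf > 0);
    return (double)n / vf + (double)n / vf;
}

/* Cost relative to the plain scalar loop, assumed to take n iterations.
 * 1.0 means "as fast as scalar", 1.5 means "50% slower". */
double relative_to_scalar(double cost, size_t n)
{
    assert(n > 0);
    return cost / (double)n;
}
```

With n = 1024, VF == 2 gives ratios of 1.0 (best) and 1.5 (worst), and VF == 4 gives 0.5 (best) and 1.25 (worst), which matches the percentages quoted above.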
Renato Golin
2013-Nov-14 15:09 UTC
[LLVMdev] Vectorization of loops with conditional dereferencing
On 14 November 2013 15:03, Nadav Rotem <nrotem at apple.com> wrote:

> I think that the best way to move forward with the vectorization of this
> loop is to make progress on the vectorization pragmas. The LoopVectorizer
> is already prepared for handling pragmas and we just need to add the
> clang-side support. Is anyone planning to work on this?

I'm not. :(

What kind of pragmas would work for this loop? Something telling that it's safe to speculatively read from m[] at any position? In this reduction case it might be enough. But if this were an induction store like:

  for () {
    if (a[i] > 0)
      x[i] = ... + m[i];
  }

then, the store would be a more complicated way to write to memory and you'd need the read-pragma to not affect such cases.

cheers,
--renato
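To make the "speculative read" question concrete, here is a minimal C sketch. It assumes the loop under discussion is a guarded reduction over m[] (the original loop is not shown in this part of the thread), and both function names are hypothetical. The second form is what a vectorizer would effectively need to produce: it loads m[i] on every iteration and selects afterwards, which is only valid if every m[i] in [0, n) is readable.

```c
#include <assert.h>
#include <stddef.h>

/* Guarded form: m[i] is dereferenced only when r[i] > 0. */
double sum_guarded(const double *r, const double *m, size_t n)
{
    double sum = 0.0;
    for (size_t i = 0; i < n; ++i)
        if (r[i] > 0.0)
            sum += m[i];
    return sum;
}

/* If-converted form: m[i] is read unconditionally (speculatively) and the
 * condition only selects which value is added. The loop body is now
 * branch-free, but it touches m[i] even where the guarded form would not. */
double sum_speculative(const double *r, const double *m, size_t n)
{
    assert(r != NULL && m != NULL);
    double sum = 0.0;
    for (size_t i = 0; i < n; ++i) {
        double v = m[i];                  /* speculative load */
        sum += (r[i] > 0.0) ? v : 0.0;    /* select instead of branch */
    }
    return sum;
}
```

Without some guarantee from the programmer, the compiler cannot prove that the extra reads in the second form will not fault, which is the gap a pragma or attribute would have to cover.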
Pekka Jääskeläinen
2013-Nov-14 15:45 UTC
[LLVMdev] Vectorization of loops with conditional dereferencing
Hi all,

On 11/14/2013 05:09 PM, Renato Golin wrote:

> What kind of pragmas would work for this loop?

Wouldn't the "all-trap-type-covering" 'notrap' attribute take care of this one too?

--
--Pekka
Nadav Rotem
2013-Nov-14 16:38 UTC
[LLVMdev] Vectorization of loops with conditional dereferencing
> I'm not. :(

I think that this is probably the most important feature for the vectorizer right now. Other features require adding complexity to the vectorizer while this feature is relatively simple.

> What kind of pragmas would work for this loop? Something telling that it's safe to speculatively read from m[] at any position? In this reduction case it might be enough. But if this were an induction store like:
>
>   for () {
>     if (a[i] > 0)
>       x[i] = ... + m[i];

Sure. Vectorization of stores is done by loading the current value from memory, blending the new value and saving it back to memory.

> then, the store would be a more complicated way to write to memory and you'd need the read-pragma to not affect such cases.

There is no need for a read pragma or even a special attribute. The ‘vectorize’ pragma tells the vectorizer that it is safe to access the predicated memory (read or write).
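The load, blend, store sequence described above can be sketched in plain C, with short fixed-size arrays standing in for vector registers. This is an illustration only: the function name, the VF value, and the y[] operand (standing in for the elided "..." in the quoted loop) are assumptions, and n is assumed to be a multiple of VF.

```c
#include <assert.h>
#include <stddef.h>

#define VF 4  /* illustrative vector width, not taken from the thread */

/* Computes x[i] = y[i] + m[i] wherever a[i] > 0, VF elements at a time.
 * Every lane of x, y and m is accessed regardless of the predicate, so
 * all three arrays must be fully accessible over [0, n). */
void store_blend(const float *a, const float *y, const float *m,
                 float *x, size_t n)
{
    assert(n % VF == 0);
    for (size_t i = 0; i < n; i += VF) {
        float old[VF], val[VF];

        for (size_t l = 0; l < VF; ++l)   /* 1. load the current values */
            old[l] = x[i + l];

        for (size_t l = 0; l < VF; ++l)   /* 2. compute new values on all lanes */
            val[l] = y[i + l] + m[i + l];

        for (size_t l = 0; l < VF; ++l)   /* 3. blend by predicate, store back */
            x[i + l] = (a[i + l] > 0.0f) ? val[l] : old[l];
    }
}
```

Inactive lanes are rewritten with their old values, so the scheme reads and writes memory the scalar loop would never have touched. That is why it depends on the programmer asserting that the predicated memory is safe to access.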