Hi,
LLVM now has a new loop vectorizer. It is able to vectorize loops such as this
one:
    for (i = 0; i < n; i++) {
        a[i] = b[i+1] + c[i+3] + i;
        sum += d[i];
    }
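To make the transformation concrete, here is a hand-written sketch of what
widening this loop by four could look like. It is only an illustration, not the
vectorizer's actual output. It assumes int elements, that 'a' does not overlap
'b', 'c' or 'd', and that 'b' and 'c' hold at least n+1 and n+3 elements. The
names scalar_loop, widened_loop and VF are made up for this example.

```c
/* Scalar form of the loop above, with `sum` returned to the caller. */
int scalar_loop(int *a, const int *b, const int *c, const int *d, int n) {
    int sum = 0;
    for (int i = 0; i < n; i++) {
        a[i] = b[i + 1] + c[i + 3] + i;
        sum += d[i];
    }
    return sum;
}

/* Hypothetical vectorization factor, chosen only for illustration. */
enum { VF = 4 };

/* Hand-widened form: each outer iteration handles VF consecutive elements.
 * The inner `lane` loop stands in for a single vector operation.
 * Assumption: `a` does not alias `b`, `c` or `d`. */
int widened_loop(int *a, const int *b, const int *c, const int *d, int n) {
    int partial[VF] = {0, 0, 0, 0};  /* one partial sum per vector lane */
    int i = 0;

    /* Main body: runs while a full group of VF elements remains. */
    for (; i + VF <= n; i += VF) {
        for (int lane = 0; lane < VF; lane++) {
            a[i + lane] = b[i + lane + 1] + c[i + lane + 3] + (i + lane);
            partial[lane] += d[i + lane];
        }
    }

    /* Combine the per-lane partial sums into one scalar. */
    int sum = partial[0] + partial[1] + partial[2] + partial[3];

    /* Scalar remainder for the last n % VF elements. */
    for (; i < n; i++) {
        a[i] = b[i + 1] + c[i + 3] + i;
        sum += d[i];
    }
    return sum;
}
```

Both functions compute the same 'a' and the same sum. The widened form changes
the order in which the elements of 'd' are added, which is harmless for
integers but would need care for floating point.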
The loop vectorizer is disabled by default. It can be enabled in clang with the
"-mllvm -vectorize" flag (or with "-loop-vectorize" in opt). The loop
vectorizer is far from being ready, and this feature should be considered
highly experimental.
Work on the loop vectorizer has just begun, and there is a lot of work ahead.
If you find bugs or opportunities for improvement, please open a Bugzilla bug
report and CC me. If you decide to run the loop vectorizer on public benchmarks
or on your own workloads, please share the results. This information is
important because it helps us decide where to focus our efforts.
We already know of a number of areas where we can improve. At the moment the
vectorizer will vectorize anything it can, because we do not have a cost model
to estimate the profitability of vectorization. Implementing a cost model is a
high priority for us, and until it is ready you should expect to see slowdowns
on many loops. Another area that needs work is the memory dependence check: at
the moment we have only a very basic memory legality check. Additionally, there
are a number of cases where we generate poor vector code or suffer from
phase-ordering problems. Once we solve these problems we can move on to
implementing additional features.
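To illustrate what a memory legality question looks like, here is a minimal
sketch of one generic technique: checking at run time that two arrays do not
overlap before taking a widened code path. It is not a description of the check
the vectorizer currently performs. The helper name ranges_disjoint is
hypothetical, and the sketch assumes a flat address space in which converting
pointers to uintptr_t gives comparable addresses.

```c
#include <stddef.h>
#include <stdint.h>

/* Returns 1 if the byte ranges [p, p + len_p) and [q, q + len_q) do not
 * overlap, and 0 otherwise. Pointers are converted to integers because
 * relational comparison of pointers into different objects is not defined
 * by the C standard. Assumption: the address arithmetic does not wrap. */
int ranges_disjoint(const void *p, size_t len_p, const void *q, size_t len_q) {
    uintptr_t p_start = (uintptr_t)p;
    uintptr_t q_start = (uintptr_t)q;
    return p_start + len_p <= q_start || q_start + len_q <= p_start;
}
```

A caller could use the widened loop only when the written array is disjoint
from every array it reads, and fall back to the scalar loop otherwise.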
Thanks,
Nadav