Bernhard Manfred Gruber via llvm-dev
2021-Feb-24 17:43 UTC
[llvm-dev] Vectorization of single loop with AoSoA layout
Hi everybody!

I have a question for the vectorization experts and would like to ask for some insight, please.

I am working on an LLVM-independent library that offers various memory layouts for arrays of plain structs in C++. One of these layouts is AoSoA (Array of Structs of Arrays), e.g.:

    constexpr auto lanes = 8;
    struct Block {
        float a[lanes];
        float b[lanes];
        float c[lanes];
    };

Single loops that iterate over arrays of this layout fail to vectorize with clang/LLVM (and also with recent g++, icc and MSVC). For example, adding the float vectors a and b into c, where a, b and c are stored in one memory block as AoSoA:

    constexpr auto alignment = lanes * sizeof(float);
    void aosoa1(Block* ubuf, size_t n) {
        auto* buf = std::assume_aligned<alignment>(ubuf);
        for (size_t i = 0; i < (n/lanes)*lanes; i++) {
            const auto block = i / lanes;
            const auto lane = i % lanes;
            buf[block].c[lane] = buf[block].a[lane] + buf[block].b[lane];
        }
    }

Flags for clang: -std=c++20 -O3 -mavx2 -Rpass-analysis=loop-vectorize -Rpass-missed=loop-vectorize

clang gives me this remark:

    loop not vectorized: cannot identify array bounds [-Rpass-analysis=loop-vectorize]

I tried browsing through the LLVM source to figure out whether I could get it working, but that quickly went over my head :)

With two nested loops, the inner one vectorizes fine:

    void aosoa2(Block* ubuf, size_t n) {
        auto* buf = std::assume_aligned<alignment>(ubuf);
        for (size_t block = 0; block < n/lanes; block++) {
            for (size_t lane = 0; lane < lanes; lane++) {
                buf[block].c[lane] = buf[block].a[lane] + buf[block].b[lane];
            }
        }
    }

Full example: https://godbolt.org/z/qdG9aY

Why does clang/LLVM fail to vectorize the loop in aosoa1(), which splits the loop index into a block index and a lane index? I think I do not sufficiently understand the "cannot identify array bounds" remark.

Is vectorization theoretically possible for aosoa1()? That is, is there no fundamental reason that forbids it?
Is there a workaround for clang, like a #pragma, that can be used to allow clang to vectorize aosoa1()?

Would this use case be important enough that clang/LLVM could at some point recognize such a pattern and successfully vectorize it?

I really appreciate your input here! Thank you very much!

Bernhard
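
For reference, here is a minimal self-contained sketch of the two loop forms discussed above, together with a check that they compute the same result. It is an illustration under stated assumptions, not the code behind the Godbolt link: std::assume_aligned is left out so that the sketch also compiles before C++20, and the helpers make_blocks() and same_results() are hypothetical names introduced only for this example.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

constexpr std::size_t lanes = 8;

struct Block {
    float a[lanes];
    float b[lanes];
    float c[lanes];
};

// Single loop: the flat index i is split into a block index and a lane index.
// This is the form that is reported as not vectorized in the post.
// Assumption: std::assume_aligned from the original is omitted here.
void aosoa1(Block* buf, std::size_t n) {
    for (std::size_t i = 0; i < (n / lanes) * lanes; i++) {
        const auto block = i / lanes;
        const auto lane = i % lanes;
        buf[block].c[lane] = buf[block].a[lane] + buf[block].b[lane];
    }
}

// Nested loops: the outer loop walks the blocks, the inner loop walks the lanes.
// This is the form whose inner loop is reported to vectorize.
void aosoa2(Block* buf, std::size_t n) {
    for (std::size_t block = 0; block < n / lanes; block++) {
        for (std::size_t lane = 0; lane < lanes; lane++) {
            buf[block].c[lane] = buf[block].a[lane] + buf[block].b[lane];
        }
    }
}

// Hypothetical helper (not in the original post): allocates n / lanes blocks,
// fills a and b with simple values and leaves c zero-initialized.
std::vector<Block> make_blocks(std::size_t n) {
    std::vector<Block> blocks(n / lanes);  // value-initialized, so all zeros
    for (std::size_t i = 0; i < (n / lanes) * lanes; i++) {
        blocks[i / lanes].a[i % lanes] = static_cast<float>(i);
        blocks[i / lanes].b[i % lanes] = 2.0f * static_cast<float>(i);
    }
    return blocks;
}

// Hypothetical helper (not in the original post): runs both variants on
// identical input and reports whether every element of c matches.
bool same_results(std::size_t n) {
    auto x = make_blocks(n);
    auto y = make_blocks(n);
    aosoa1(x.data(), n);
    aosoa2(y.data(), n);
    for (std::size_t block = 0; block < n / lanes; block++) {
        for (std::size_t lane = 0; lane < lanes; lane++) {
            if (x[block].c[lane] != y[block].c[lane]) return false;
        }
    }
    return true;
}
```

Both functions visit exactly the same elements in the same order, since i = block * lanes + lane covers 0 .. (n/lanes)*lanes - 1 once each; any trailing n % lanes elements are skipped by both. Compiling this with the flags listed above should make it possible to compare the vectorizer remarks for the two functions side by side, although that has not been verified for this reduced version.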