I have a simple expression-evaluation language using LLVM (it's at https://cowlark.com/calculon, if anyone's interested). It has pretty primitive support for 3-vectors, which I'm representing as a <3 x float>.

One of my users has asked for proper n-vector support, and I agree with him, so I'm adding that. However, he wants to use quite large vectors. He's mentioned 30 elements, and I think he would like 50.

At what point do vectors stop being useful?

I've experimented with llc and, while large vectors *work*, the code that's produced looks pretty scary (and I don't know enough about SSE and AVX instructions to evaluate whether it's good or not). I can think of various things I could do: passing input parameters by reference instead of by value, ditto with output parameters, falling back to aggregates or plain arrays... but all this adds complexity to the code generator, and I don't know if it's worth it (or at what point it *becomes* worth it).

My JITted code consists of a single module with a single entry point, and is all compiled up front with aggressive inlining. Vector parameters are only ever used internally. I've noticed that LLVM does an excellent job of optimising. Can I just use the naive approach and trust LLVM to get on with it and make it work?

-- 
┌─── dg@cowlark.com ───── http://www.cowlark.com ─────
│
│ 𝕻𝖍'𝖓𝖌𝖑𝖚𝖎 𝖒𝖌𝖑𝖜'𝖓𝖆𝖋𝖍 𝕮𝖙𝖍𝖚𝖑𝖍𝖚 𝕽'𝖑𝖞𝖊𝖍 𝖜𝖌𝖆𝖍'𝖓𝖆𝖌𝖑 𝖋𝖍𝖙𝖆𝖌𝖓.
│
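The two parameter-passing shapes mentioned above can be sketched as a rough C++ analogy. This is not Calculon's code generator; N, VecN, add_by_value and add_by_ref are hypothetical names. The by-value version corresponds loosely to a function that takes and returns <N x float>, while the by-reference version reads its inputs and writes its output through pointers to plain arrays.

```cpp
#include <array>
#include <cstddef>

// Hypothetical element count; the thread mentions 30 and 50.
constexpr std::size_t N = 50;
using VecN = std::array<float, N>;

// By value: both operands and the result are whole N-element values,
// loosely analogous to an IR function of type <N x float> (<N x float>, <N x float>).
VecN add_by_value(VecN a, VecN b) {
    VecN out{};
    for (std::size_t i = 0; i < N; ++i) {
        out[i] = a[i] + b[i];
    }
    return out;
}

// By reference: operands are read through pointers and the result is
// written to caller-provided storage; no N-element value crosses the call.
void add_by_ref(const float* a, const float* b, float* out, std::size_t n) {
    for (std::size_t i = 0; i < n; ++i) {
        out[i] = a[i] + b[i];
    }
}
```

Both compute the same element-wise sum; the difference is only in how the data crosses the function boundary.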
I can see why freakishly large vectors would produce bad code. The type <50 x float> would be widened to the next power of two (<64 x float>), and then split over and over again until it fits into registers. So, with 128-bit SSE registers holding 4 floats each, any <50 x float> would take 16 XMM registers (64 / 4), which will then be spilled. The situation with integer types is even worse because you can truncate or extend from one type to another.

On Feb 6, 2013, at 8:41 AM, David Given <dg at cowlark.com> wrote:

> At what point do vectors stop being useful?

_______________________________________________
LLVM Developers mailing list
LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu
http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
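The register arithmetic in the reply above can be checked with a small C++ model. It is a simplification under stated assumptions: it models only the widen-to-a-power-of-two-then-split step described there, assumes 4 float lanes per 128-bit XMM register (8 per 256-bit AVX register), and ignores everything else type legalization does, which varies by LLVM version and target. The function names next_pow2 and regs_needed are hypothetical.

```cpp
#include <cstddef>

// Smallest power of two that is >= n (for n >= 1).
constexpr std::size_t next_pow2(std::size_t n) {
    std::size_t p = 1;
    while (p < n) {
        p <<= 1;
    }
    return p;
}

// Simplified model: widen the element count to a power of two, then split
// it into pieces of lanes_per_reg elements, one register per piece.
// lanes_per_reg is assumed to be 4 for floats in 128-bit SSE registers
// and 8 for floats in 256-bit AVX registers.
constexpr std::size_t regs_needed(std::size_t elems, std::size_t lanes_per_reg) {
    std::size_t widened = next_pow2(elems);
    return widened <= lanes_per_reg ? 1 : widened / lanes_per_reg;
}

static_assert(next_pow2(50) == 64, "<50 x float> is modeled as <64 x float>");
static_assert(regs_needed(50, 4) == 16, "16 XMM registers for one value");
static_assert(regs_needed(3, 4) == 1, "<3 x float> fits in one XMM register");
```

Since x86-64 without AVX-512 has only 16 XMM registers, a single <50 x float> value already fills all of them under this model, so an operation with two such operands cannot avoid spilling.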
On 6 February 2013 17:03, Nadav Rotem <nrotem at apple.com> wrote:

> The type <50 x float> would be widened to the next power of two, and then split over and over again until it fits into registers.

Given that, an inner loop with sequential access would be vectorized into much better code than a single <50 x float> value. Whether LLVM could do this itself for <50 x float>, or whether it should always be up to the front-end developer, I don't know. It doesn't seem particularly hard to do in the vectorizer, but it probably won't be high on the TODO list for a while either.

cheers,
--renato
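The loop-based alternative suggested above can be sketched as the shape of code a front end might produce for an element-wise n-vector add, written here in C++ over plain float arrays rather than as LLVM IR. The chunk width kWidth = 4 (one 128-bit SSE register of floats) and the name add_strip_mined are assumptions for illustration. The chunking is done by hand only to show the idea; a front end would more likely emit a plain scalar loop and leave the chunking to LLVM's loop vectorizer, whose actual output is not shown here.

```cpp
#include <cstddef>

// Hypothetical chunk width: 4 floats, the size of one 128-bit SSE register.
constexpr std::size_t kWidth = 4;

// Element-wise add of two n-element arrays, processed in fixed-width
// chunks plus a scalar tail. Only one chunk of each operand is being
// worked on at a time, so the working set does not grow with n.
void add_strip_mined(const float* a, const float* b, float* out, std::size_t n) {
    std::size_t i = 0;
    // Main body: whole chunks of kWidth elements.
    for (; i + kWidth <= n; i += kWidth) {
        for (std::size_t j = 0; j < kWidth; ++j) {
            out[i + j] = a[i + j] + b[i + j];
        }
    }
    // Tail: the remaining n % kWidth elements (2 when n == 50).
    for (; i < n; ++i) {
        out[i] = a[i] + b[i];
    }
}
```

For n = 50 this runs 12 four-element chunks followed by a 2-element tail, instead of materializing one 16-register <50 x float> value.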