Hi Dave,

>> We generate xEXT nodes in many cases. Unlike GCC which vectorizes
>> inner loops, we vectorize the implicit outermost loop of data-parallel
>> workloads (also called whole function vectorization). We vectorize
>> code even if the user uses xEXT instructions, uses mixed types, etc.
>> We choose a vectorization factor which is likely to generate more
>> legal vector types, but if the user mixes types then we are forced to
>> make a decision. We rely on the LLVM code generator to produce
>> quality code. To my understanding, the GCC vectorizer does not
>> vectorize code if it thinks that it misses a single operation.
>
> My experience is similar to Nadav's. The Cray vectorizer vectorizes
> much more code than the gcc vectorizer. Things are much more
> complicated than gcc vector code would lead one to believe.

I think it is important we produce non-scalarized code for the IR produced by the GCC vectorizer, since we know it can be done (otherwise GCC wouldn't have produced it). It is of course important to produce decent code in the most common cases coming from other vectorizers too. However it seems sensible to me to start with the case where you know you can easily get perfect results (GCC vectorizer output) and then try to progressively extend the goodness to the more problematic cases coming from other vectorizers.

Ciao, Duncan.
David A. Greene
2012-Feb-08 16:10 UTC
[LLVMdev] SelectionDAG scalarizes vector operations.
Duncan Sands <baldrick at free.fr> writes:

> I think it is important we produce non-scalarized code for the IR produced by
> the GCC vectorizer, since we know it can be done (otherwise GCC wouldn't have
> produced it). It is of course important to produce decent code in the most
> common cases coming from other vectorizers too. However it seems sensible to
> me to start with the case where you know you can easily get perfect results
> (GCC vectorizer output) and then try to progressively extend the goodness to
> the more problematic cases coming from other vectorizers.

Of course. I was simply supporting Nadav's explanation that there's a lot of pessimization in the current lowering that doesn't even appear for code generated by gcc.

We have a number of lowering modifications here to handle many of these cases. As always, I am slogging through trying to get them moved upstream. It's a long process, unfortunately. But don't be surprised to see changes that might look "unnecessary" but are very important for various compilers.

-Dave
Hi David!

I'd be interested in hearing about the places that you had to fix. It seems like there are a number of people who are starting to look at the quality of the generated vector code. Maybe we should report our findings in bug reports, so that we could share the work and discuss possible findings. I also plan to file a few bug reports with suboptimal code.

Thanks,
Nadav

---------------------------------------------------------------------
Intel Israel (74) Limited

This e-mail and any attachments may contain confidential material for the sole use of the intended recipient(s). Any review or distribution by others is strictly prohibited.
If you are not the intended recipient, please contact the sender and delete all copies.
I'd like to throw my backing behind this also. We see some IR that our internal passes have vectorized that the SelectionDAG then scalarizes.

Micah
Hi All,

> Hi Dave,
>
> >> We generate xEXT nodes in many cases. Unlike GCC which vectorizes
> >> inner loops, we vectorize the implicit outermost loop of
> >> data-parallel workloads (also called whole function vectorization).

Just to clarify, GCC vectorizes innermost and next-to-innermost (aka outer) loops, packing instances of the same original scalar instruction across different iterations into a vector instruction. It also vectorizes within basic blocks (aka SLP), packing distinct scalar instructions into vectors. And, it does the latter while considering a (possible) enclosing loop -- in order to place loop-invariant code outside, and also to unroll the enclosing loop if/as needed to fill the vectors. But, in any event, it creates fully vectorized code regions, with scalar code used only in supporting computations such as addressing, loop induction variable handling, reduction epilogs etc.

> >> We vectorize code even if the user uses xEXT instructions, uses mixed
> >> types, etc.

GCC does vectorize code which contains multiple data types, by choosing the vectorization factor according to the smallest type, and using multiple vectors to hold larger types.

> >> We choose a vectorization factor which is likely to generate more
> >> legal vector types, but if the user mixes types then we are forced to
> >> make a decision. We rely on the LLVM code generator to produce
> >> quality code. To my understanding, the GCC vectorizer does not
> >> vectorize code if it thinks that it misses a single operation.

Right. It queries whether the target supports a vectorized form (of the desired vectorization factor) for each scalar instruction in the loop or region. There is no scalarization -- code is either fully vectorized in a way that survives code generation, or else the vectorizer gives up and avoids modifying the relevant scalar code. This may indeed not be an optimal decision; but even then, there are cases where it's better not to vectorize.

> > My experience is similar to Nadav's. The Cray vectorizer vectorizes
> > much more code than the gcc vectorizer. Things are much more
> > complicated than gcc vector code would lead one to believe.
>
> I think it is important we produce non-scalarized code for the IR produced by
> the GCC vectorizer, since we know it can be done (otherwise GCC wouldn't
> have produced it). It is of course important to produce decent code in the
> most common cases coming from other vectorizers too. However it seems
> sensible to me to start with the case where you know you can easily get
> perfect results (GCC vectorizer output) and then try to progressively extend
> the goodness to the more problematic cases coming from other vectorizers.

BTW, the GCC vectorizer can also tell you why it did not vectorize; e.g., if some instruction was not available in vector form. So the vectorizer takes care of any desired unrollings on its own, and does not rely on a separate unroll pass. It does rely on a separate if-conversion pass especially designed to eliminate if-then-else hammocks in relevant regions (loops) right before the vectorizer kicks in. This part may require undoing, when an if-converted loop is not vectorized and the target does not support the resulting predicated scalar instructions.

Hope this helps. Had the pleasure of working with the GCC autovect guys (or rather gals) from the start, before joining Nadav et al. recently.

Ayal.

> Ciao, Duncan.
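The two rules described above (pick the vectorization factor from the smallest type and use several vectors for larger types; give up entirely if any operation lacks a vector form) can be sketched in a few lines. This is a toy Python model, not GCC code: the 128-bit vector width, the `SUPPORTED` table and all function names are assumptions invented for illustration.

```python
from typing import Dict, List, Optional, Set, Tuple

VECTOR_BITS = 128  # assumed vector register width for this sketch

# Hypothetical target description: (operation, element bits) pairs that
# have a vector form. Invented for illustration, not a real target.
SUPPORTED: Set[Tuple[str, int]] = {
    ("add", 8), ("add", 16), ("add", 32), ("add", 64),
    ("mul", 16), ("mul", 32),
    ("zext", 8), ("zext", 16), ("zext", 32),
}


def vectorization_factor(elem_bits: List[int]) -> int:
    """Pick the VF from the smallest element type used in the loop."""
    return VECTOR_BITS // min(elem_bits)


def vectors_needed(bits: int, vf: int) -> int:
    """Number of vector registers needed to hold `vf` elements of `bits` bits."""
    return (bits * vf) // VECTOR_BITS


def plan_loop(ops: List[Tuple[str, int]]) -> Optional[Dict[int, int]]:
    """Return {element bits: vectors per value}, or None to leave the loop scalar.

    All or nothing: one operation without a vector form means no vectorization.
    """
    if any(op not in SUPPORTED for op in ops):
        return None
    widths = [bits for _, bits in ops]
    vf = vectorization_factor(widths)
    return {bits: vectors_needed(bits, vf) for bits in sorted(set(widths))}
```

Under these assumptions, a loop mixing 8-bit and 32-bit operations gets a vectorization factor of 16, so each 32-bit value occupies four vectors, while a loop containing a 64-bit multiply (absent from the hypothetical table) is left untouched.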
Hi Ayal, thanks for this interesting information.

>>>> We choose a vectorization factor which is likely to generate more
>>>> legal vector types, but if the user mixes types then we are forced to
>>>> make a decision. We rely on the LLVM code generator to produce
>>>> quality code. To my understanding, the GCC vectorizer does not
>>>> vectorize code if it thinks that it misses a single operation.
>
> Right. It queries whether the target supports a vectorized form (of the
> desired vectorization factor) for each scalar instruction in the loop or
> region. There is no scalarization -- code is either fully vectorized in a
> way that survives code generation, or else the vectorizer gives up and
> avoids modifying the relevant scalar code. This may indeed not be an
> optimal decision; but even then, there are cases where it's better not to
> vectorize.

The problem right now is that LLVM's codegen takes the vector IR produced by GCC and often scalarizes it.

Ciao, Duncan.
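For readers unfamiliar with the term, "scalarizing" a vector operation means replacing it with one scalar operation per lane, surrounded by element extracts and inserts. The sketch below is a toy Python model and not LLVM code: the function names are hypothetical, and the count of four instructions per lane is an assumption of the model, since real costs depend on the target.

```python
from typing import List, Tuple


def vector_add(a: List[int], b: List[int]) -> Tuple[List[int], int]:
    """Model one vector add; returns (result, modeled instruction count)."""
    return [x + y for x, y in zip(a, b)], 1


def scalarized_add(a: List[int], b: List[int]) -> Tuple[List[int], int]:
    """Model the same add after scalarization.

    Each lane is modeled as two extracts, one scalar add and one insert.
    """
    result = [0] * len(a)
    count = 0
    for lane in range(len(a)):
        x = a[lane]        # extract element from a
        y = b[lane]        # extract element from b
        s = x + y          # scalar add
        result[lane] = s   # insert element into the result vector
        count += 4
    return result, count
```

Both functions compute the same values; in this model a four-lane add costs one instruction when kept as a vector and sixteen when scalarized, which is the kind of gap the thread is complaining about.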