thr3ads.net - llvm dev - [LLVMdev] SelectionDAG scalarizes vector operations. [Feb 2012]

Duncan Sands

2012-Feb-08 15:38 UTC

[LLVMdev] SelectionDAG scalarizes vector operations.

Hi Dave,
>> We generate xEXT nodes in many cases.  Unlike GCC which vectorizes
>> inner loops, we vectorize the implicit outermost loop of data-parallel
>> workloads (also called whole function vectorization).  We vectorize
>> code even if the user uses xEXT instructions, uses mixed types, etc.
>> We choose a vectorization factor which is likely to generate more
>> legal vector types, but if the user mixes types then we are forced to
>> make a decision.  We rely on the LLVM code generator to produce
>> quality code.  To my understanding, the GCC vectorizer does not
>> vectorize code if it thinks that it misses a single operation.
>
> My experience is similar to Nadav's.  The Cray vectorizer vectorizes
> much more code than the gcc vectorizer.  Things are much more
> complicated than gcc vector code would lead one to believe.
I think it is important we produce non-scalarized code for the IR produced by
the GCC vectorizer, since we know it can be done (otherwise GCC wouldn't have
produced it).  It is of course important to produce decent code in the most
common cases coming from other vectorizers too.  However it seems sensible to
me to start with the case where you know you can easily get perfect results
(GCC vectorizer output) and then try to progressively extend the goodness to
the more problematic cases coming from other vectorizers.

Ciao, Duncan.

David A. Greene

2012-Feb-08 16:10 UTC


Duncan Sands <baldrick at free.fr> writes:
> I think it is important we produce non-scalarized code for the IR produced by
> the GCC vectorizer, since we know it can be done (otherwise GCC wouldn't have
> produced it).  It is of course important to produce decent code in the most
> common cases coming from other vectorizers too.  However it seems sensible to
> me to start with the case where you know you can easily get perfect results
> (GCC vectorizer output) and then try to progressively extend the goodness to
> the more problematic cases coming from other vectorizers.
Of course.  I was simply supporting Nadav's explanation that there's a
lot of pessimization in the current lowering that doesn't even appear
for code generated by gcc.

We have a number of lowering modifications here to handle many of these
cases.  As always, I am slogging through trying to get them moved
upstream.  It's a long process, unfortunately.  But don't be surprised
to see changes that might look "unnecessary" but are very important for
various compilers.

                              -Dave

Rotem, Nadav

2012-Feb-08 16:24 UTC


Hi David!

I'd be interested in hearing about the places that you had to fix.  It seems
like there are a number of people who are starting to look at the quality of the
generated vector code.  Maybe we should report our findings in bug reports, so
that we could share the work and discuss possible findings.  I also plan to file
a few bug reports with suboptimal code.

Thanks,
Nadav

---------------------------------------------------------------------
Intel Israel (74) Limited

This e-mail and any attachments may contain confidential material for
the sole use of the intended recipient(s). Any review or distribution
by others is strictly prohibited. If you are not the intended
recipient, please contact the sender and delete all copies.

Villmow, Micah

2012-Feb-08 16:29 UTC


I'd like to throw my backing behind this as well. We see some IR that our
internal passes have vectorized that the SelectionDAG then scalarizes.

Micah

Zaks, Ayal

2012-Feb-08 21:46 UTC


Hi All,
> Hi Dave,
> 
> >> We generate xEXT nodes in many cases.  Unlike GCC which vectorizes
> >> inner loops, we vectorize the implicit outermost loop of
> >> data-parallel workloads (also called whole function vectorization).
Just to clarify, GCC vectorizes innermost and next-to-innermost (aka outer)
loops, packing instances of the same original scalar instruction across
different iterations into a vector instruction. It also vectorizes within basic
blocks (aka SLP), packing distinct scalar instructions into vectors. And, it
does the latter while considering a (possible) enclosing loop -- in order to
place loop-invariant code outside, and also to unroll the enclosing loop if/as
needed to fill the vectors. But, in any event, it creates fully vectorized code
regions, with scalar code used only in supporting computations such as
addressing, loop induction variable handling, reduction epilogs etc.
> >> We vectorize code even if the user uses xEXT instructions, uses mixed
> >> types, etc.
GCC does vectorize code which contains multiple data types, by choosing the
vectorization factor according to the smallest type, and using multiple vectors
to hold larger types.
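
As a rough illustration of that rule (a sketch only, not GCC's code): the function name `plan_vectorization`, the 128-bit register width, and the type table below are all assumptions made for the example.

```python
def plan_vectorization(vector_bits, type_bits):
    """Pick a vectorization factor (VF) from the smallest element type.

    vector_bits: width of one vector register in bits (assumed, e.g. 128).
    type_bits:   mapping of type name -> element width in bits.
    Returns (vf, vectors_per_type), where vectors_per_type says how many
    registers are needed to hold VF elements of each type.
    """
    smallest = min(type_bits.values())
    # The smallest type fills exactly one register, which fixes the VF.
    vf = vector_bits // smallest
    # Larger types need several registers to cover the same VF elements.
    vectors_per_type = {
        name: -(-(vf * bits) // vector_bits)  # ceiling division
        for name, bits in type_bits.items()
    }
    return vf, vectors_per_type


# A loop mixing 8-bit and 32-bit data on a hypothetical 128-bit target.
vf, regs = plan_vectorization(128, {"i8": 8, "i32": 32})
print(vf, regs)
```

With these assumed widths the 8-bit type sets VF to 16, so each 32-bit value stream is spread over four vectors.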
> >> We choose a vectorization factor which is likely to generate more
> >> legal vector types, but if the user mixes types then we are forced to
> >> make a decision.  We rely on the LLVM code generator to produce
> >> quality code.  To my understanding, the GCC vectorizer does not
> >> vectorize code if it thinks that it misses a single operation.
> >
Right. It queries whether the target supports a vectorized form (of the desired
vectorization factor) for each scalar instruction in the loop or region. There
is no scalarization -- code is either fully vectorized in a way that survives
code generation, or else the vectorizer gives up and avoids modifying the
relevant scalar code. This may indeed not be an optimal decision; but even then,
there are cases where it's better not to vectorize.
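
The all-or-nothing policy can be sketched roughly as follows. This is a toy model, not GCC's implementation: `try_vectorize`, the `SUPPORTED` table, and the operation names are hypothetical and stand in for a real target query.

```python
# Hypothetical table of (operation, VF) pairs the target can do in vector form.
SUPPORTED = {("load", 4), ("add", 4), ("mul", 4), ("store", 4)}


def target_supports(op, vf):
    """Stand-in for a target query; a real compiler asks the backend."""
    return (op, vf) in SUPPORTED


def try_vectorize(scalar_ops, vf, supports=target_supports):
    """Vectorize every operation or none of them.

    Returns (vector_ops, None) on success, or (None, reason) when any single
    operation lacks a vector form, leaving the scalar code untouched.
    """
    for op in scalar_ops:
        if not supports(op, vf):
            return None, f"operation '{op}' has no vector form at VF={vf}"
    return [f"v{vf}.{op}" for op in scalar_ops], None


print(try_vectorize(["load", "add", "store"], 4))
print(try_vectorize(["load", "sext", "store"], 4))
```

The second call gives up because of one unsupported operation, so nothing is ever left for the code generator to scalarize.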
> > My experience is similar to Nadav's.  The Cray vectorizer vectorizes
> > much more code than the gcc vectorizer.  Things are much more
> > complicated than gcc vector code would lead one to believe.
> 
> I think it is important we produce non-scalarized code for the IR produced by
> the GCC vectorizer, since we know it can be done (otherwise GCC wouldn't
> have produced it).  It is of course important to produce decent code in the
> most common cases coming from other vectorizers too.  However it seems
> sensible to me to start with the case where you know you can easily get
> perfect results (GCC vectorizer output) and then try to progressively extend
> the goodness to the more problematic cases coming from other vectorizers.
> 
BTW, the GCC vectorizer can also tell you why it did not vectorize; e.g., if
some instruction was not available in vector form.

So the vectorizer takes care of any desired unrollings on its own, and does not
rely on a separate unroll pass. It does rely on a separate if-conversion pass
especially designed to eliminate if-then-else hammocks in relevant regions
(loops) right before the vectorizer kicks in. This part may require undoing,
when an if-converted loop is not vectorized and the target does not support the
resulting predicated scalar instructions.
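
To make the if-conversion step concrete, here is a small sketch of the transformation on a single loop body. It is written in Python purely for illustration; the function names are made up, and it assumes both arms are free of side effects and traps so that evaluating them unconditionally is safe.

```python
def hammock(a, b):
    """Original loop: an if-then-else hammock inside the loop body."""
    out = []
    for x, y in zip(a, b):
        if x > y:
            out.append(x - y)
        else:
            out.append(y - x)
    return out


def if_converted(a, b):
    """Same loop after if-conversion: straight-line body plus a select."""
    out = []
    for x, y in zip(a, b):
        mask = x > y          # the branch condition becomes a predicate
        then_val = x - y      # both arms are now computed unconditionally
        else_val = y - x
        out.append(then_val if mask else else_val)  # select on the predicate
    return out


print(hammock([1, 5, 3], [4, 2, 3]))
print(if_converted([1, 5, 3], [4, 2, 3]))
```

The converted body has no control flow, so each statement maps onto one vector operation (compare, two subtracts, select). If the loop is not vectorized after all, the extra work in the second form is what may need undoing.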

Hope this helps. Had the pleasure of working with the GCC autovect guys (or
rather gals) from the start, before joining Nadav et al. recently.

Ayal.
> Ciao, Duncan.

Duncan Sands

2012-Feb-09 09:07 UTC


Hi Ayal, thanks for this interesting information.
>>>> We choose a vectorization factor which is likely to generate more
>>>> legal vector types, but if the user mixes types then we are forced to
>>>> make a decision.  We rely on the LLVM code generator to produce
>>>> quality code.  To my understanding, the GCC vectorizer does not
>>>> vectorize code if it thinks that it misses a single operation.
>>>
>
> Right. It queries whether the target supports a vectorized form (of the
> desired vectorization factor) for each scalar instruction in the loop or
> region. There is no scalarization -- code is either fully vectorized in a way
> that survives code generation, or else the vectorizer gives up and avoids
> modifying the relevant scalar code. This may indeed not be an optimal
> decision; but even then, there are cases where it's better not to vectorize.
The problem right now is that LLVM's codegen takes the vector IR produced by
GCC and often scalarizes it.

Ciao, Duncan.
