thr3ads.net - llvm dev - [LLVMdev] Vectorization: Next Steps [Feb 2012]

Hal Finkel

2012-Feb-03 03:56 UTC

[LLVMdev] Vectorization: Next Steps

As some of you may know, I committed my basic-block autovectorization
pass a few days ago. I encourage anyone interested to try it out (pass
-vectorize to opt or -mllvm -vectorize to clang) and provide feedback.
Especially in combination with -unroll-allow-partial, I have observed
some significant benchmark speedups, but I have also observed some
significant slowdowns. I would like to share my thoughts, and hopefully
get feedback, on next steps.

1. "Target Data" for vectorization - I think that in order to improve
the vectorization quality, the vectorizer will need more information
about the target. This information could be provided in the form of a
kind of extended target data. This extended target data might contain:
 - What basic types can be vectorized, and how many of them will fit
into (the largest) vector registers
 - What classes of operations can be vectorized (division, conversions /
sign extension, etc. are not always supported)
 - What alignment is necessary for loads and stores
 - Is scalar-to-vector free?

2. Feedback between passes - We may need to implement a closer coupling
between optimization passes than currently exists. Specifically, I have
in mind two things:
 - The vectorizer should communicate more closely with the loop
unroller. First, the loop unroller should try to unroll to preserve
maximal load/store alignments. Second, I think it would make a lot of
sense to be able to unroll speculatively, and to keep the unrolled
version in preference to the original only if unrolling helps
vectorization. With basic-block vectorization, it is often necessary to
(partially) unroll in order to vectorize. Even when we also have real
loop vectorization, however, I still think that it will be important for
the loop unroller to communicate with the vectorizer.
 - After vectorization, it would make sense for the vectorization pass
to request further simplification, but only on those parts of the code
that it modified. 

3. Loop vectorization - It would be nice to have, in addition to
basic-block vectorization, a more-traditional loop vectorization pass. I
think that we'll need a better loop analysis pass in order for this to
happen. Some of this was started in LoopDependenceAnalysis, but that
pass is not yet finished. We'll need something like this to recognize
affine memory references, etc.
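
To make "affine memory reference" concrete: an access is affine in the
loop induction variable i when its address has the form base + stride * i.
The toy Python check below is only a sketch of that definition. It is not
how LoopDependenceAnalysis works; the function name fit_affine and the
idea of inspecting sampled addresses are assumptions made for
illustration, and a real analysis would reason symbolically on the IR.

```python
from typing import Optional, Sequence, Tuple


def fit_affine(addresses: Sequence[int]) -> Optional[Tuple[int, int]]:
    """Return (base, stride) if addresses[i] == base + stride * i for all i.

    Returns None when the sequence is not affine or is too short to tell.
    (Hypothetical helper, for illustration only.)
    """
    if len(addresses) < 2:
        return None
    base = addresses[0]
    stride = addresses[1] - addresses[0]
    for i, addr in enumerate(addresses):
        if addr != base + stride * i:
            return None
    return base, stride


# Byte offsets of a[2*i + 1] with 4-byte elements: affine, stride 8.
strided = [4 * (2 * i + 1) for i in range(8)]
# Byte offsets of a[i*i]: not affine in i.
quadratic = [4 * (i * i) for i in range(8)]

print(fit_affine(strided))    # (4, 8)
print(fit_affine(quadratic))  # None
```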

I look forward to hearing everyone's thoughts.

 -Hal

-- 
Hal Finkel
Postdoctoral Appointee
Leadership Computing Facility
Argonne National Laboratory

Duncan Sands

2012-Feb-03 08:49 UTC

[LLVMdev] Vectorization: Next Steps

Hi Hal,
> As some of you may know, I committed my basic-block autovectorization
> pass a few days ago. I encourage anyone interested to try it out (pass
> -vectorize to opt or -mllvm -vectorize to clang) and provide feedback.
> Especially in combination with -unroll-allow-partial, I have observed
> some significant benchmark speedups, but, I have also observed some
> significant slowdowns.
codegen for vector constructs is not always that great in my experience.
It could be that your vectorizer is doing the right thing, and it's
codegen that needs to be improved.  For example when I use the GCC
autovectorizer I often see LLVM codegen unnecessarily scalarizing the
vector code.  Did you try to analyse these slowdowns?

Ciao, Duncan.

Rotem, Nadav

2012-Feb-03 11:34 UTC

[LLVMdev] Vectorization: Next Steps

Duncan, 

I also noticed cases where vector IR is scalarized by the codegen.  From what I
have seen (which is based on a different vectorizer with a different code model,
etc.) there are two main areas for improvement:

1. Complex instructions - Instructions such as shuffles are very sensitive to
the ability of the codegen to lower them. If a vectorizer generates shuffle
instructions which are not handled properly by the manual lowering code, then
the instruction is scalarized.

2. Instructions with mixed types - Instructions which operate on mixed types,
such as 2xfloat->2xdouble, are usually scalarized by the type legalizer.

Nadav

Jack Howarth

2012-Feb-03 13:45 UTC

[LLVMdev] Vectorization: Next Steps

On Fri, Feb 03, 2012 at 09:49:30AM +0100, Duncan Sands wrote:
> Hi Hal,
> 
> > As some of you may know, I committed my basic-block autovectorization
> > pass a few days ago. I encourage anyone interested to try it out (pass
> > -vectorize to opt or -mllvm -vectorize to clang) and provide feedback.
> > Especially in combination with -unroll-allow-partial, I have observed
> > some significant benchmark speedups, but, I have also observed some
> > significant slowdowns.
> 
> codegen for vector constructs is not always that great in my experience.
> It could be that your vectorizer is doing the right thing, and it's
> codegen that needs to be improved.  For example when I use the GCC
> autovectorizer I often see LLVM codegen unnecessarily scalarizing the
> vector code.  Did you try to analyse these slowdowns?
> 
> Ciao, Duncan.
Duncan,
  Is there a recommended approach for testing the new -vectorize support
within dragonegg?
               Jack

Hal Finkel

2012-Feb-03 13:56 UTC

[LLVMdev] Vectorization: Next Steps

On Fri, 2012-02-03 at 09:49 +0100, Duncan Sands wrote:
> Hi Hal,
> 
> > As some of you may know, I committed my basic-block autovectorization
> > pass a few days ago. I encourage anyone interested to try it out (pass
> > -vectorize to opt or -mllvm -vectorize to clang) and provide feedback.
> > Especially in combination with -unroll-allow-partial, I have observed
> > some significant benchmark speedups, but, I have also observed some
> > significant slowdowns.
> 
> codegen for vector constructs is not always that great in my experience.
> It could be that your vectorizer is doing the right thing, and it's
> codegen that needs to be improved.  For example when I use the GCC
> autovectorizer I often see LLVM codegen unnecessarily scalarizing the
> vector code.  Did you try to analyse these slowdowns?
There are a lot of them and I've only looked at a small fraction as of
yet. I have seen things that look like codegen deficiencies, but I've
not confirmed this in detail. One important case that I have noticed is
where the pass will vectorize sign-extended conversions, or int/float
conversions, etc. which end up being expensive to scalarize. There are
also cases where it vectorizes small integer operations which just get
scalarized by the codegen. I need to spend some more time looking at
this.

 -Hal 

Chris Lattner

2012-Feb-06 22:26 UTC

[LLVMdev] Vectorization: Next Steps

On Feb 2, 2012, at 7:56 PM, Hal Finkel wrote:
> As some of you may know, I committed my basic-block autovectorization
> pass a few days ago. I encourage anyone interested to try it out (pass
> -vectorize to opt or -mllvm -vectorize to clang) and provide feedback.
> Especially in combination with -unroll-allow-partial, I have observed
> some significant benchmark speedups, but, I have also observed some
> significant slowdowns. I would like to share my thoughts, and hopefully
> get feedback, on next steps.
Hi Hal,

I haven't had a chance to look at your pass in detail, but here are some
opinions: :)
> 1. "Target Data" for vectorization - I think that in order to improve
> the vectorization quality, the vectorizer will need more information
> about the target. This information could be provided in the form of a
> kind of extended target data. This extended target data might contain:
> - What basic types can be vectorized, and how many of them will fit
> into (the largest) vector registers
> - What classes of operations can be vectorized (division, conversions /
> sign extension, etc. are not always supported)
> - What alignment is necessary for loads and stores
> - Is scalar-to-vector free?
I think that this will be a really important API, but I strongly advocate that
you model this after TargetLoweringInfo instead of TargetData.  First,
TargetData isn't actually a target API (it should be fixed, I filed PR11936
to track this).  Second, targets will have to implement imperative code to
return precise answers to questions.  For example, you'll want something
like "what is the cost of a shuffle with this mask" which will be
extremely target specific, will depend on what CPU subfeatures are enabled, etc.

When you start working on this, I strongly encourage you to propose the API you
want here.  Start small and add features as you go.
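
As a rough illustration of the kind of imperative query described above
("what is the cost of a shuffle with this mask"), here is a toy model in
Python. It is a sketch under stated assumptions, not a proposal for the
real API: the class name, the "toy-permute" feature string, and every
cost number are invented for the example. The point it shows is only
that the answer depends on both the mask and the enabled subfeatures, so
a static table of target data is not enough.

```python
from dataclasses import dataclass, field
from typing import FrozenSet, Sequence


@dataclass(frozen=True)
class ToyTargetVectorInfo:
    """Hypothetical target hook; every name and cost here is illustrative."""

    features: FrozenSet[str] = field(default_factory=frozenset)
    scalarize_cost_per_lane: int = 2  # made-up cost of one extract + insert

    def shuffle_cost(self, mask: Sequence[int]) -> int:
        """Return a made-up relative cost for a single-input shuffle."""
        lanes = len(mask)
        if list(mask) == list(range(lanes)):
            return 0  # identity shuffle: nothing to do
        if len(set(mask)) == 1:
            return 1  # splat of one lane: assumed cheap on every target
        if "toy-permute" in self.features:
            return 1  # assumed single-instruction arbitrary permute
        # No suitable instruction: model the shuffle as being scalarized.
        return lanes * self.scalarize_cost_per_lane


baseline = ToyTargetVectorInfo()
with_permute = ToyTargetVectorInfo(features=frozenset({"toy-permute"}))

reverse = [3, 2, 1, 0]
print(baseline.shuffle_cost(reverse))      # 8
print(with_permute.shuffle_cost(reverse))  # 1
```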
> 2. Feedback between passes - We may need to implement a closer coupling
> between optimization passes than currently exists. Specifically, I have
> in mind two things:
> - The vectorizer should communicate more closely with the loop
> unroller. First, the loop unroller should try to unroll to preserve
> maximal load/store alignments. Second, I think it would make a lot of
> sense to be able to unroll and, only if this helps vectorization should
> the unrolled version be kept in preference to the original. With basic
> block vectorization, it is often necessary to (partially) unroll in
> order to vectorize. Even when we also have real loop vectorization,
> however, I still think that it will be important for the loop unroller
> to communicate with the vectorizer.
I really disagree with this, see below.
> 3. Loop vectorization - It would be nice to have, in addition to
> basic-block vectorization, a more-traditional loop vectorization pass. I
> think that we'll need a better loop analysis pass in order for this to
> happen. Some of this was started in LoopDependenceAnalysis, but that
> pass is not yet finished. We'll need something like this to recognize
> affine memory references, etc.
I think that a loop vectorizer and a basic block vectorizer both make perfect
sense and are important for different classes of code.  However, I don't
think that we should go down the path of trying to make a "basic block
vectorizer + loop unrolling" serve the purpose of a loop vectorizer.
Trying to make a BBVectorizer and a loop unroller play together will be really
fragile, because they'll both have to duplicate the same metrics (otherwise,
for example, you'd unroll a loop that isn't vectorizable).  This will
also be a huge hit to compile time.

-Chris

Hal Finkel

2012-Feb-07 20:10 UTC

[LLVMdev] Vectorization: Next Steps

On Mon, 2012-02-06 at 14:26 -0800, Chris Lattner wrote:
> On Feb 2, 2012, at 7:56 PM, Hal Finkel wrote:
> > As some of you may know, I committed my basic-block autovectorization
> > pass a few days ago. I encourage anyone interested to try it out (pass
> > -vectorize to opt or -mllvm -vectorize to clang) and provide feedback.
> > Especially in combination with -unroll-allow-partial, I have observed
> > some significant benchmark speedups, but, I have also observed some
> > significant slowdowns. I would like to share my thoughts, and hopefully
> > get feedback, on next steps.
> 
> Hi Hal,
> 
> I haven't had a chance to look at your pass in detail, but here are
> some opinions: :)
> 
> > 1. "Target Data" for vectorization - I think that in order to improve
> > the vectorization quality, the vectorizer will need more information
> > about the target. This information could be provided in the form of a
> > kind of extended target data. This extended target data might contain:
> > - What basic types can be vectorized, and how many of them will fit
> > into (the largest) vector registers
> > - What classes of operations can be vectorized (division, conversions /
> > sign extension, etc. are not always supported)
> > - What alignment is necessary for loads and stores
> > - Is scalar-to-vector free?
> 
> I think that this will be a really important API, but I strongly advocate
> that you model this after TargetLoweringInfo instead of TargetData.  First,
> TargetData isn't actually a target API (it should be fixed, I filed PR11936
> to track this).  Second, targets will have to implement imperative code to
> return precise answers to questions.  For example, you'll want something
> like "what is the cost of a shuffle with this mask" which will be
> extremely target specific, will depend on what CPU subfeatures are enabled, etc.
This makes sense. What do you think will be the best way of
synchronizing things like CPU subfeatures between this API and the
backend target libraries? They could be linked directly, although I
don't know if we want to do that. tablegen could extract a bunch of this
information into separate objects that get linked into opt.
> 
> When you start working on this, I strongly encourage you to propose the API
> you want here.  Start small and add features as you go.
> 
> > 2. Feedback between passes - We may need to implement a closer coupling
> > between optimization passes than currently exists. Specifically, I have
> > in mind two things:
> > - The vectorizer should communicate more closely with the loop
> > unroller. First, the loop unroller should try to unroll to preserve
> > maximal load/store alignments. Second, I think it would make a lot of
> > sense to be able to unroll and, only if this helps vectorization should
> > the unrolled version be kept in preference to the original. With basic
> > block vectorization, it is often necessary to (partially) unroll in
> > order to vectorize. Even when we also have real loop vectorization,
> > however, I still think that it will be important for the loop unroller
> > to communicate with the vectorizer.
> 
> I really disagree with this, see below.
> 
> > 3. Loop vectorization - It would be nice to have, in addition to
> > basic-block vectorization, a more-traditional loop vectorization pass. I
> > think that we'll need a better loop analysis pass in order for this to
> > happen. Some of this was started in LoopDependenceAnalysis, but that
> > pass is not yet finished. We'll need something like this to recognize
> > affine memory references, etc.
> 
> I think that a loop vectorizer and a basic block vectorizer both make
> perfect sense and are important for different classes of code.  However, I
> don't think that we should go down the path of trying to make a "basic
> block vectorizer + loop unrolling" serve the purpose of a loop vectorizer.
> Trying to make a BBVectorizer and a loop unroller play together will be really
> fragile, because they'll both have to duplicate the same metrics (otherwise,
> for example, you'd unroll a loop that isn't vectorizable).  This will
> also be a huge hit to compile time.
The only problem with this comes from loops for which unrolling is
necessary to expose vectorization because the memory access pattern is
too complicated to model in more-traditional loop vectorization. This
generally is useful only in cases with a large number of flops per
memory operation (or maybe integer ops too, but I have less experience
with those), so maybe we can design a useful heuristic to handle those
cases. That having been said, unroll+(failed vectorize)+rollback is not
really any more expensive at compile time than unroll+(failed vectorize)
except that the resulting code would run faster (actually it is cheaper
to compile because the optimization/compilation of the unvectorized
unrolled loop code takes longer than the non-unrolled loop). There might
be a clean way of doing this; I'll think about it.
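
The unroll + (failed vectorize) + rollback idea above can be sketched as
a small control-flow skeleton. This is a toy Python model, not LLVM pass
code: loops are plain lists of instruction names, and unroll_by_two and
toy_vectorize are hypothetical stand-ins invented for the example. The
only thing it is meant to show is that the original loop is kept
whenever vectorizing the unrolled copy fails.

```python
from typing import Callable, List, Optional

Loop = List[str]  # toy stand-in for a loop body: a list of instruction names


def unroll_if_vectorizable(
    loop: Loop,
    unroll: Callable[[Loop], Loop],
    try_vectorize: Callable[[Loop], Optional[Loop]],
) -> Loop:
    """Keep the unrolled loop only when vectorizing it succeeds."""
    candidate = unroll(loop)  # speculative copy: the original is untouched
    vectorized = try_vectorize(candidate)
    if vectorized is None:
        return loop  # roll back to the original, non-unrolled loop
    return vectorized


def unroll_by_two(loop: Loop) -> Loop:
    """Hypothetical unroller: duplicate the body once."""
    return loop + loop


def toy_vectorize(loop: Loop) -> Optional[Loop]:
    """Hypothetical vectorizer: gives up if the body contains a call."""
    if "call" in loop:
        return None
    half = len(loop) // 2
    return ["v2." + op for op in loop[:half]]


print(unroll_if_vectorizable(["load", "fmul", "store"], unroll_by_two, toy_vectorize))
# ['v2.load', 'v2.fmul', 'v2.store']
print(unroll_if_vectorizable(["load", "call", "store"], unroll_by_two, toy_vectorize))
# ['load', 'call', 'store']
```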

Thanks again,
Hal

Carl-Philip Hänsch

2012-Feb-09 12:27 UTC

[LLVMdev] Vectorization: Next Steps

I have a super-simple test case 4x4 matrix * 4-vector which gets correctly
unrolled, but is not vectorized by -bb-vectorize. (I used llvm 3.1svn)
I attached the test case so you can see what is going wrong there.

-------------- next part --------------
A non-text attachment was scrubbed...
Name: matrix.c
Type: text/x-csrc
Size: 443 bytes
Desc: not available
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20120209/d8a6d21a/attachment.c>

Hal Finkel

2012-Feb-10 18:02 UTC

[LLVMdev] Vectorization: Next Steps

Carl-Philip,

The reason that this does not vectorize is that it cannot vectorize the
stores; this leaves only the mul-add chains (and some chains with
loads), and they only have a depth of 2 (the threshold is 6).

If you give clang -mllvm -bb-vectorize-req-chain-depth=2 then it will
vectorize. The reason the heuristic has such a large default value is to
prevent cases where it costs more to permute all of the necessary values
into and out of the vector registers than is saved by vectorizing. Does
the code generated with -bb-vectorize-req-chain-depth=2 run faster than
the unvectorized code?

The heuristic can certainly be improved, and these kinds of test cases
are very important to that improvement process.
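
For readers unfamiliar with the chain-depth threshold, the following toy
Python model shows the shape of the decision described above: a mul-add
chain of depth 2 is rejected at the default threshold of 6 and accepted
when the required depth is lowered to 2. It is a simplified sketch, not
the pass's actual code. The function names are invented, the graph is a
plain dictionary of def-use edges, every instruction is assumed to count
as depth 1, and a chain is assumed to be accepted when its depth reaches
the required value.

```python
from functools import lru_cache
from typing import Dict, List


def chain_depth(users: Dict[str, List[str]], start: str) -> int:
    """Length of the longest def-use chain beginning at `start` (acyclic)."""

    @lru_cache(maxsize=None)
    def depth(node: str) -> int:
        return 1 + max((depth(u) for u in users.get(node, [])), default=0)

    return depth(start)


def should_vectorize(
    users: Dict[str, List[str]], start: str, req_chain_depth: int = 6
) -> bool:
    """Accept a chain only if it is at least `req_chain_depth` deep."""
    return chain_depth(users, start) >= req_chain_depth


# A multiply feeding an add: a chain of depth 2.
mul_add = {"mul": ["add"], "add": []}

print(chain_depth(mul_add, "mul"))                          # 2
print(should_vectorize(mul_add, "mul"))                     # False
print(should_vectorize(mul_add, "mul", req_chain_depth=2))  # True
```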

 -Hal
