Hi Arch,
Thanks for looking at this.
The reason the SLPVectorizer bails out on many cases that seem vectorizable is
scheduling. It needs to produce a legal schedule, and it does this by making
sure that it can move all vectorized instructions down to the last instruction
in a bundle. (Alternatively, you could build a DAG, make sure that you don’t
create cycles, and then produce a topological sort, but this was not done because
of compile-time concerns.)
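For comparison, the DAG alternative could look roughly like the minimal sketch below. It is not how the SLPVectorizer works. `Graph`, `mergeBundle` and `topoSort` are hypothetical names operating on a toy graph of instruction indices. Collapsing a bundle into one node and running Kahn's algorithm reports a cycle when vectorizing the bundle would make the vector instruction depend on itself, for example when one lane's fadd feeds an instruction that in turn feeds another lane's fadd.

```cpp
#include <cassert>
#include <optional>
#include <queue>
#include <vector>

// Toy dependence graph (hypothetical, not LLVM's data structures):
// Succs[i] lists the nodes that use the value defined by node i.
using Graph = std::vector<std::vector<int>>;

// Collapse a non-empty bundle into a single node (its first member), as if
// the bundle had become one vector instruction. Every edge is redirected to
// the representative; an edge between two bundle members becomes a
// self-loop, which the topological sort also reports as a cycle.
Graph mergeBundle(const Graph &Succs, const std::vector<int> &Bundle) {
  const int N = static_cast<int>(Succs.size());
  std::vector<int> Rep(N);
  for (int I = 0; I < N; ++I)
    Rep[I] = I;
  for (int B : Bundle)
    Rep[B] = Bundle.front();

  Graph Merged(N);
  for (int I = 0; I < N; ++I)
    for (int U : Succs[I])
      Merged[Rep[I]].push_back(Rep[U]);
  return Merged;
}

// Kahn's algorithm: returns a topological order of all nodes, or
// std::nullopt if the graph contains a cycle (no legal schedule exists).
std::optional<std::vector<int>> topoSort(const Graph &Succs) {
  const int N = static_cast<int>(Succs.size());
  std::vector<int> InDegree(N, 0);
  for (const auto &Users : Succs)
    for (int U : Users)
      ++InDegree[U];

  std::queue<int> Ready;
  for (int I = 0; I < N; ++I)
    if (InDegree[I] == 0)
      Ready.push(I);

  std::vector<int> Order;
  while (!Ready.empty()) {
    int Node = Ready.front();
    Ready.pop();
    Order.push_back(Node);
    for (int U : Succs[Node])
      if (--InDegree[U] == 0)
        Ready.push(U);
  }
  // A node that was never released lies on a cycle.
  if (static_cast<int>(Order.size()) != N)
    return std::nullopt;
  return Order;
}
```

With nodes 0 and 2 standing for two fadds of the bundle and node 1 for an instruction between them, `{{1}, {2}, {}}` becomes cyclic after merging {0, 2}, while `{{1}, {}, {}}` still sorts. Building and sorting such a graph for every candidate bundle is the extra compile time the current approach avoids.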
If I understand your patch correctly, you are disabling the above-mentioned check
if the vectorizer starts at an insertelement instruction? What about other
users? You still need to detect that you can schedule them correctly.
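A toy version of the "move everything down to the last bundle member" check can make this concrete. This is a minimal sketch, not the actual SLPVectorizer code: `Block`, `uses` and `canSinkBundle` are hypothetical, and the model ignores memory dependences. Sinking a member past an instruction that uses it would break def-before-use, so any such user fails the check.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Toy straight-line block (hypothetical model, not LLVM's classes):
// Ops[i] holds the indices of the earlier instructions that instruction i uses.
// Values defined outside the block (e.g. the extractelements) are omitted.
using Block = std::vector<std::vector<int>>;

static bool uses(const Block &Ops, int User, int Def) {
  return std::find(Ops[User].begin(), Ops[User].end(), Def) != Ops[User].end();
}

// Conservative check on a non-empty bundle: every member must be movable
// down to the position of the last member. Any instruction between a member
// and the last member that uses the member makes this impossible.
bool canSinkBundle(const Block &Ops, const std::vector<int> &Bundle) {
  const int Last = *std::max_element(Bundle.begin(), Bundle.end());
  for (int B : Bundle)
    for (int J = B + 1; J <= Last; ++J)
      if (uses(Ops, J, B))
        return false;
  return true;
}
```

Modelling only the fadds and insertelements of the @julia_foo111 example as indices 0 to 7 (%4, %5, %8, %9, %12, %13, %16, %17), the block is `{{}, {0}, {}, {1, 2}, {}, {3, 4}, {}, {5, 6}}`, and the bundle {0, 2, 4, 6} is rejected because %5 uses %4 before %16. Presumably that is the case the patch works around by skipping the check for insertelement users.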
define <4 x float> @julia_foo111(<4 x float>, <4 x float>) {
top:
%2 = extractelement <4 x float> %0, i32 0
%3 = extractelement <4 x float> %1, i32 0
%4 = fadd float %2, %3
%5 = insertelement <4 x float> undef, float %4, i32 0
%6 = extractelement <4 x float> %0, i32 1
%7 = extractelement <4 x float> %1, i32 1
%8 = fadd float %6, %7
%foo = <operation using %8>   ; potentially feeds %12. Even if it does not, all of
                              ; its users now need to be moved below %16, and we
                              ; need to check all of their users recursively …
%9 = insertelement <4 x float> %5, float %8, i32 1
%10 = extractelement <4 x float> %0, i32 2
%11 = extractelement <4 x float> %1, i32 2
%12 = fadd float %10, %11
%13 = insertelement <4 x float> %9, float %12, i32 2
%14 = extractelement <4 x float> %0, i32 3
%15 = extractelement <4 x float> %1, i32 3
%16 = fadd float %14, %15
%17 = insertelement <4 x float> %13, float %16, i32 3
ret <4 x float> %17
}
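The recursive check that %foo calls for can be sketched on the same kind of toy model. This is a hypothetical helper (`instructionsToSink`), not LLVM code, and it ignores memory dependences. Scanning from the first to the last bundle member, every instruction that transitively uses a bundle member is collected, because it has to move below the last member too; if one of those values feeds a bundle member, as %foo feeding %12 would, no legal schedule exists.

```cpp
#include <algorithm>
#include <cassert>
#include <optional>
#include <vector>

// Toy straight-line block (hypothetical model, not LLVM's classes):
// Ops[i] holds the indices of the earlier instructions that instruction i uses.
using Block = std::vector<std::vector<int>>;

// Returns the non-bundle instructions that must be moved below the last
// bundle member, or std::nullopt if a moved value feeds a bundle member
// (the vector instruction would then depend on itself).
std::optional<std::vector<int>>
instructionsToSink(const Block &Ops, const std::vector<int> &Bundle) {
  const int First = *std::min_element(Bundle.begin(), Bundle.end());
  const int Last = *std::max_element(Bundle.begin(), Bundle.end());
  std::vector<bool> InBundle(Ops.size(), false), Sunk(Ops.size(), false);
  for (int B : Bundle)
    InBundle[B] = true;

  std::vector<int> ToMove;
  for (int J = First; J <= Last; ++J) {
    const bool UsesSunk = std::any_of(Ops[J].begin(), Ops[J].end(),
                                      [&](int Op) { return Sunk[Op]; });
    if (InBundle[J]) {
      if (UsesSunk)
        return std::nullopt; // Cycle, e.g. %8 -> %foo -> %12.
      Sunk[J] = true;        // Bundle members move down to Last.
    } else if (UsesSunk) {
      Sunk[J] = true;        // A user of a moved value must move as well.
      ToMove.push_back(J);
    }
  }
  return ToMove;
}
```

Here indices 0 to 8 stand for %4, %5, %8, %foo, %9, %12, %13, %16 and %17. If %foo feeds %12 the bundle {0, 2, 5, 7} is rejected; if it does not, %5, %foo, %9 and %13 (indices 1, 3, 4, 6) all have to be moved below %16.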
For your case of insertelements that start a vector tree, you could get away
with keeping a set of the insertelement instructions from which
tryToVectorizeList (below) started.
if (InsertElementInst *IE = dyn_cast<InsertElementInst>(it)) {
  SmallVector<Value *, 8> Ops;
  if (!findBuildVector(IE, Ops))
    continue;
  // Add the insertelements to InsertVectorRoot. You would need to make sure
  // that all 'other' uses of those insertelements are below the last insert.
  if (tryToVectorizeList(Ops, R))
Instead of checking “buildsVector”, you could check this set:
  if (RdxOps && RdxOps->count(UI))
    continue;
+ // This user is part of building a vector.
+ if (buildsVector) // Instead, use something like: if (InsertVectorRoot.count(UI))
+   continue;
+
And this set would also contain the instructions that need to be moved.
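A sketch of the InsertVectorRoot idea on the same toy model. `InsertVectorRoot` is the hypothetical set suggested above, and `canVectorizeBuildVector` is a made-up name, not an SLPVectorizer function. Users that belong to the build vector are exempt from the sinking check, and all other uses of those insertelements must come after the last insert.

```cpp
#include <algorithm>
#include <cassert>
#include <set>
#include <vector>

// Toy straight-line block (hypothetical model, not LLVM's classes):
// Ops[i] holds the indices of the earlier instructions that instruction i uses.
using Block = std::vector<std::vector<int>>;

static bool uses(const Block &Ops, int User, int Def) {
  return std::find(Ops[User].begin(), Ops[User].end(), Def) != Ops[User].end();
}

// Bundle and InsertVectorRoot must be non-empty. InsertVectorRoot holds the
// insertelements that findBuildVector collected for this tree.
bool canVectorizeBuildVector(const Block &Ops, const std::vector<int> &Bundle,
                             const std::set<int> &InsertVectorRoot) {
  const int Last = *std::max_element(Bundle.begin(), Bundle.end());
  // 1. A user of a bundle member above the last member blocks sinking the
  //    bundle, unless that user is part of the build vector itself.
  for (int B : Bundle)
    for (int J = B + 1; J <= Last; ++J)
      if (uses(Ops, J, B) && !InsertVectorRoot.count(J))
        return false;

  // 2. All 'other' uses of the insertelements must be below the last insert.
  const int LastInsert = *InsertVectorRoot.rbegin();
  for (int I : InsertVectorRoot)
    for (int J = I + 1; J <= LastInsert; ++J)
      if (uses(Ops, J, I) && !InsertVectorRoot.count(J))
        return false;
  return true;
}
```

With the plain @julia_foo111 example (indices for %4, %5, %8, %9, %12, %13, %16, %17) the check passes. Once %foo uses %8 before %16, it fails even with the exemption, so other users still need the recursive scheduling check.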
Alternatively, we could teach the SLP vectorizer how to ‘vectorize’
insertelements and start the vectorization tree with the insertelements instead
of their operands. Then it would work naturally (because in-tree users are
considered safe).
Best,
Arnold
On Mar 17, 2014, at 2:38 PM, Robison, Arch <arch.robison at intel.com> wrote:
> define <4 x float> @julia_foo111(<4 x float>, <4 x float>) {
> top:
> %2 = extractelement <4 x float> %0, i32 0
> %3 = extractelement <4 x float> %1, i32 0
> %4 = fadd float %2, %3
> %5 = insertelement <4 x float> undef, float %4, i32 0
> %6 = extractelement <4 x float> %0, i32 1
> %7 = extractelement <4 x float> %1, i32 1
> %8 = fadd float %6, %7
> %9 = insertelement <4 x float> %5, float %8, i32 1
> %10 = extractelement <4 x float> %0, i32 2
> %11 = extractelement <4 x float> %1, i32 2
> %12 = fadd float %10, %11
> %13 = insertelement <4 x float> %9, float %12, i32 2
> %14 = extractelement <4 x float> %0, i32 3
> %15 = extractelement <4 x float> %1, i32 3
> %16 = fadd float %14, %15
> %17 = insertelement <4 x float> %13, float %16, i32 3
> ret <4 x float> %17
> }