Have you tried running the SLP vectorizer pass (-vectorize-slp)?
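
In case it helps to reproduce this outside OpenCL, here is a minimal sketch of the quoted function in plain C. It assumes the GCC/Clang vector_size extension as a stand-in for OpenCL's float4 and int4 (so the lanes are indexed as [0]..[3] rather than .x/.y/.z/.w); the typedefs are my own and not taken from the original code. Compiling it with something like clang -O2 -fslp-vectorize -S -emit-llvm should show whether the four scalar selects get merged, though I have not checked the result for this case.

```c
/* Assumption: GCC/Clang vector extensions stand in for the OpenCL
 * float4/int4 types; these typedefs are illustrative only. */
typedef float float4 __attribute__((vector_size(16)));
typedef int   int4   __attribute__((vector_size(16)));

/* Lane-wise select: lane i of the result is a[i] when c[i] is nonzero,
 * otherwise b[i]. Written as four scalar selects on purpose, to mirror
 * the pattern the optimizer is expected to recognize. */
float4 simple_select(float4 a, float4 b, int4 c)
{
    float4 result;

    result[0] = c[0] ? a[0] : b[0];
    result[1] = c[1] ? a[1] : b[1];
    result[2] = c[2] ? a[2] : b[2];
    result[3] = c[3] ? a[3] : b[3];

    return result;
}
```
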
Eugene
On Mon, Aug 19, 2013 at 9:04 PM, Matt Arsenault <arsenm2 at gmail.com> wrote:
> Hi,
>
> I've found a case that I would expect to optimize easily, but it doesn't.
> A simple implementation of vector select:
>
> float4 simple_select(float4 a, float4 b, int4 c)
> {
> float4 result;
>
> result.x = c.x ? a.x : b.x;
> result.y = c.y ? a.y : b.y;
> result.z = c.z ? a.z : b.z;
> result.w = c.w ? a.w : b.w;
>
> return result;
> }
>
> I would expect this would be optimized to
>
> %bool = icmp ne <4 x i32> %c, zeroinitializer
> %result = select <4 x i1> %bool, <4 x float> %a, <4 x float> %b
> ret <4 x float> %result
>
> However, it actually ends up as four separate extractelement/icmp/select
> sequences.
>
> Where would be the best place to fix this? Should InstCombine be taking
> care of this or the vectorizer?
>
>
> Thanks