On Aug 20, 2013, at 10:22, Nadav Rotem <nrotem at apple.com> wrote:
> Can you send the IR of the function?

Attached is the IR at -O0 and -O3.

[Attachments scrubbed by the archive: vselect_optimized.ll (1545 bytes, <http://lists.llvm.org/pipermail/llvm-dev/attachments/20130820/94befc0a/attachment.obj>); vselect_unoptimized.ll (4423 bytes, <http://lists.llvm.org/pipermail/llvm-dev/attachments/20130820/94befc0a/attachment-0001.obj>)]
Hi Matt,

This code maintains a vector of float4 and inserts and extracts values from this vector. The 'select' operations are already vectorized. Maybe a sequence of inst-combines (or DAG-combines) can help. If you rewrite this code using scalars, then the slp-vectorizer, with some tweaks, will be able to catch it.

Thanks,
Nadav

On Aug 20, 2013, at 1:14 PM, Matt Arsenault <arsenm2 at gmail.com> wrote:
> Attached is the -O0 and -O3 IR
>
> <vselect_optimized.ll><vselect_unoptimized.ll>
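A minimal sketch of what Nadav's suggestion (rewriting the code using scalars) might look like at the source level. It assumes the original code selects between two float4 values lane by lane on an int condition; the attached IR is not available here, and the function name `select4_scalar` is hypothetical. Each lane repeats the same compare-and-select on adjacent array elements, which is the kind of isomorphic straight-line code the SLP vectorizer tries to merge into vector operations. As Nadav notes, the vectorizer may still need tweaks before it actually catches this pattern.

```c
#include <assert.h>

/* Hypothetical scalarized form of a 4-lane vector select.
   Assumption: lane i takes a[i] when c[i] is nonzero, else b[i].
   The four statements are isomorphic and touch consecutive memory,
   which is the shape the SLP vectorizer looks for. */
static void select4_scalar(float out[4], const int c[4],
                           const float a[4], const float b[4]) {
    assert(out && c && a && b); /* all four arrays must be valid */
    out[0] = (c[0] != 0) ? a[0] : b[0];
    out[1] = (c[1] != 0) ? a[1] : b[1];
    out[2] = (c[2] != 0) ? a[2] : b[2];
    out[3] = (c[3] != 0) ? a[3] : b[3];
}
```

Compiling such a function with `clang -O3 -S -emit-llvm` and checking whether the four scalar `icmp`/`select` pairs were combined is one way to see what the vectorizer did with it.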
Nadav,

I think what Matt was asking is why the slp-vectorizer is not vectorizing the booleans. To me it seems like the vectorizer got the first step right (vectorizing the operands), but not the second step (vectorizing the comparison operation). I would actually expect a single icmp ne <4 x i32> %c, <i32 0, i32 0, i32 0, i32 0> instead of four scalar icmps.

Micah

> -----Original Message-----
> From: llvmdev-bounces at cs.uiuc.edu [mailto:llvmdev-bounces at cs.uiuc.edu] On Behalf Of Nadav Rotem
> Sent: Tuesday, August 20, 2013 2:49 PM
> To: Matt Arsenault
> Cc: Mailing List
> Subject: Re: [LLVMdev] Failure to optimize vector select
>
> _______________________________________________
> LLVM Developers mailing list
> LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu
> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
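For comparison, a minimal sketch of the fully vectorized shape Micah describes, written in C with the GCC/Clang vector extensions. The typedefs `int4`/`float4` and the function `vselect4` are hypothetical stand-ins for the float4 code discussed in the thread. The `c != zero` comparison is one vector compare, which Clang typically lowers to a single `icmp ne <4 x i32>` followed by a `sext` of the `<4 x i1>` result into an all-ones/all-zeros lane mask. The select is written as a bitwise blend because GCC documents the vector form of `?:` only for C++; in IR the more direct equivalent would be one `select` on the `<4 x i1>` mask.

```c
/* GCC/Clang vector extension types (hypothetical names). */
typedef int   int4   __attribute__((vector_size(16)));
typedef float float4 __attribute__((vector_size(16)));

/* Vector form of the same select: one vector compare produces a lane
   mask (-1 where c != 0, 0 elsewhere), then a bitwise blend picks
   a or b per lane. Casts between same-sized vector types reinterpret
   the bits rather than converting the values. */
static float4 vselect4(int4 c, float4 a, float4 b) {
    const int4 zero = {0, 0, 0, 0};
    int4 mask = c != zero;  /* single vector compare over all four lanes */
    int4 ai = (int4)a;      /* reinterpret float lanes as int lanes */
    int4 bi = (int4)b;
    return (float4)((mask & ai) | (~mask & bi));
}
```

Whether a later pass folds the and/or blend back into a `select` instruction depends on the optimizer; the point of the sketch is only that the comparison itself stays a single vector operation.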
On Aug 20, 2013, at 14:49, Nadav Rotem <nrotem at apple.com> wrote:
> Hi Matt,
>
> This code maintains a vector of float4 and it inserts and extracts values from this vector. The 'select' operations are already vectorized. Maybe a sequence of inst-combines (or DAG-combines) can help. If you re-write this code using scalars then the slp-vectorizer, with some tweaks, will be able to catch it.

I've tried manually scalarizing the arguments so that the other select operands are scalars, but the vectorizer still doesn't change it. Here is the scalarized IR.

[Attachment scrubbed by the archive: manual_scalarize.ll (1739 bytes, <http://lists.llvm.org/pipermail/llvm-dev/attachments/20130820/1b1f33b4/attachment.obj>)]