Nadav,
You are absolutely right, it's ISPC workload. I've checked SSE4 and
it's
also severely affected.
We use intrinsics only for conversion <N x i32> <=> i32, i.e.
movmsk.ps.
For the rest we use general LLVM instructions. And I actually would really
like to stick this way. We rely on LLVM's ability to produce efficient code
from general LLVM IR. Relying on intrinsics too much would be a crunch and
a path to nowhere for many reasons :)
What is the reason for this transformation, if it doesn't lead to efficient
code?
Dmitry.
On Mon, Oct 21, 2013 at 7:01 PM, Nadav Rotem <nrotem at apple.com> wrote:
> Hi Dmitry.
>
> This looks like an ISPC workload. ISPC works around a limitation in
> selection dag which does not know how to legalize mask types when both 128
> and 256 bit registers are available. ISPC works around this problem by
> expanding the mask to i32s and using intrinsics. Can you please verify that
> this regression only happens on AVX ? Can you change ISPC to use intrinsics
> ?
>
> Thanks
> Nadav
>
> Sent from my iPhone
>
> > On Oct 21, 2013, at 4:04, Dmitry Babokin <babokin at gmail.com>
wrote:
> >
> > Nadav,
> >
> > Could you please have a look at bug #16941 and let us know what you
> think about it? It's performance regression after one of your commits.
> >
> > Thanks.
> >
> > Dmitry.
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20131021/83d77e1f/attachment.html>