thr3ads.net - llvm dev - [LLVMdev] Bug #16941 [Oct 2013]


Dmitry Babokin

2013-Oct-21 11:04 UTC

[LLVMdev] Bug #16941

Nadav,

Could you please have a look at bug #16941 and let us know what you think
about it? It's a performance regression after one of your commits.

Thanks.

Dmitry.

Nadav Rotem

2013-Oct-21 15:01 UTC


[LLVMdev] Bug #16941

Hi Dmitry.

This looks like an ISPC workload. ISPC works around a limitation in
SelectionDAG, which does not know how to legalize mask types when both 128-bit
and 256-bit registers are available. It does so by expanding the mask to i32s
and using intrinsics. Can you please verify that this regression only happens
on AVX? Can you change ISPC to use intrinsics?

Thanks
Nadav

Sent from my iPhone
> On Oct 21, 2013, at 4:04, Dmitry Babokin <babokin at gmail.com> wrote:
> 
> Nadav,
> 
> Could you please have a look at bug #16941 and let us know what you think
> about it? It's performance regression after one of your commits.
> 
> Thanks.
> 
> Dmitry.

Dmitry Babokin

2013-Oct-21 18:12 UTC


[LLVMdev] Bug #16941

Nadav,

You are absolutely right, it's an ISPC workload. I've checked SSE4, and it's
also severely affected.

We use intrinsics only for the conversion <N x i32> <=> i32, i.e. movmsk.ps.
For the rest we use general LLVM instructions, and I would really like to
keep it that way. We rely on LLVM's ability to produce efficient code from
general LLVM IR. Relying on intrinsics too much would be a crutch and a path
to nowhere for many reasons :)
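
To make the <N x i32> <=> i32 conversion concrete, here is a minimal scalar
sketch in C of what movmsk.ps computes and of a blend written with general
bitwise operations instead of intrinsics. It is an illustration only, not
ISPC's actual code: the names mask_to_bits, bits_to_mask and blend are
hypothetical, the width of 4 lanes is an assumption, and the all-ones or
all-zeros lane convention is assumed rather than taken from this thread.

```c
#include <assert.h>
#include <stdint.h>

#define LANES 4 /* assumed width: one 128-bit register of i32 lanes */

/* Scalar model of movmsk.ps: gather the sign bit of each lane into the
   low bits of an integer (lane 0 -> bit 0). */
static int mask_to_bits(const int32_t mask[LANES]) {
    int bits = 0;
    for (int i = 0; i < LANES; ++i) {
        bits |= (int)(((uint32_t)mask[i] >> 31) & 1u) << i;
    }
    return bits;
}

/* Reverse direction: expand an integer bitmask into lanes that are
   all-ones (-1) or all-zeros (0). */
static void bits_to_mask(int bits, int32_t mask[LANES]) {
    for (int i = 0; i < LANES; ++i) {
        mask[i] = ((bits >> i) & 1) ? -1 : 0;
    }
}

/* Per-lane blend using only and/or/not: takes a[i] where the mask lane
   is set and b[i] where it is clear. */
static void blend(const int32_t mask[LANES], const int32_t a[LANES],
                  const int32_t b[LANES], int32_t out[LANES]) {
    for (int i = 0; i < LANES; ++i) {
        /* Assumed convention: every lane is exactly 0 or -1. */
        assert(mask[i] == 0 || mask[i] == -1);
        out[i] = (a[i] & mask[i]) | (b[i] & ~mask[i]);
    }
}
```

With a mask of {-1, 0, -1, 0}, mask_to_bits returns 5 (bits 0 and 2 set), and
blend picks lanes 0 and 2 from a and lanes 1 and 3 from b. The point of
keeping this in general operations is that the backend stays free to choose
the instructions for each target.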

What is the reason for this transformation, if it doesn't lead to efficient
code?

Dmitry.

On Mon, Oct 21, 2013 at 7:01 PM, Nadav Rotem <nrotem at apple.com> wrote:
> Hi Dmitry.
>
> This looks like an ISPC workload. ISPC works around a limitation in
> selection dag which does not know how to legalize mask types when both 128
> and 256 bit registers are available. ISPC works around this problem by
> expanding the mask to i32s and using intrinsics. Can you please verify that
> this regression only happens on AVX ? Can you change ISPC to use intrinsics
> ?
>
> Thanks
> Nadav
>
> Sent from my iPhone
>
> > On Oct 21, 2013, at 4:04, Dmitry Babokin <babokin at gmail.com> wrote:
> >
> > Nadav,
> >
> > Could you please have a look at bug #16941 and let us know what you
> > think about it? It's performance regression after one of your commits.
> >
> > Thanks.
> >
> > Dmitry.
