2013 Oct 25
2
[LLVMdev] Bug #16941
...<8 x i1> mask comes as two XMM registers; select converts them to a single XMM register (i.e. 8 x 16 bit), then immediately converts it back to two XMM registers to do the blend. This back-and-forth conversion has a huge overhead. I'm attaching 3 files with vectors of length 4, 8 and 16. Try 4 on SSE4 and you'll see that both cases work well; 8 demonstrates the difference on SSE4. The same holds on AVX (8 vs 16).
On Wed, Oct 23, 2013 at 1:41 AM, Nadav Rotem <nrotem at apple.com> wrote:
> On Oct 21, 2013, at 12:09 PM, Dmitry Babokin <babokin at gmail.com> wrote:
> > B...
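To make the described round trip concrete, here is a minimal scalar sketch in C. It assumes 32-bit data lanes (e.g. float), so an 8-lane vector and its <8 x i1> mask occupy two 128-bit XMM registers on SSE4, while 4 lanes fit in one. The type and function names (`xmm_i32x4`, `pack_masks`, `select8_roundtrip`, etc.) are hypothetical, and modeling the narrowing as a PACKSSDW-style saturating pack, the widening as a PMOVSXWD-style sign extension and the blend as BLENDVPS is an assumption for illustration, not taken from the thread or from LLVM's actual output.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical scalar stand-ins for one 128-bit XMM register. */
typedef struct { int32_t d[4]; } xmm_i32x4;  /* 4 x 32-bit mask lanes */
typedef struct { int16_t w[8]; } xmm_i16x8;  /* 8 x 16-bit mask lanes */
typedef struct { float f[4]; } xmm_f32x4;    /* 4 x float data lanes  */

/* Signed saturation to 16 bits, as a PACKSSDW-style pack does per lane. */
static int16_t sat16(int32_t v) {
    if (v > INT16_MAX) return INT16_MAX;
    if (v < INT16_MIN) return INT16_MIN;
    return (int16_t)v;
}

/* Narrow two 4 x i32 masks (two XMM registers) into one 8 x i16 mask. */
static xmm_i16x8 pack_masks(xmm_i32x4 lo, xmm_i32x4 hi) {
    xmm_i16x8 r;
    for (int i = 0; i < 4; i++) {
        /* The model assumes canonical masks: every lane is 0 or all-ones. */
        assert(lo.d[i] == 0 || lo.d[i] == -1);
        assert(hi.d[i] == 0 || hi.d[i] == -1);
        r.w[i] = sat16(lo.d[i]);
        r.w[i + 4] = sat16(hi.d[i]);
    }
    return r;
}

/* Sign-extend one half (0 = lanes 0-3, 1 = lanes 4-7) back to 4 x i32. */
static xmm_i32x4 unpack_half(xmm_i16x8 m, int half) {
    xmm_i32x4 r;
    for (int i = 0; i < 4; i++) r.d[i] = (int32_t)m.w[half * 4 + i];
    return r;
}

/* BLENDVPS-style blend: take b where the mask lane's sign bit is set, else a. */
static xmm_f32x4 blendv(xmm_f32x4 a, xmm_f32x4 b, xmm_i32x4 mask) {
    xmm_f32x4 r;
    for (int i = 0; i < 4; i++) r.f[i] = (mask.d[i] < 0) ? b.f[i] : a.f[i];
    return r;
}

/* select(mask, t, f) on 8 lanes: blend each half with its own mask register. */
static void select8_direct(const xmm_i32x4 mask[2], const xmm_f32x4 t[2],
                           const xmm_f32x4 f[2], xmm_f32x4 out[2]) {
    for (int h = 0; h < 2; h++) out[h] = blendv(f[h], t[h], mask[h]);
}

/* Same select with the round trip from the thread: pack the two mask
   registers into one 8 x i16 register (1 extra op), then unpack each
   half again before blending (2 extra ops). */
static void select8_roundtrip(const xmm_i32x4 mask[2], const xmm_f32x4 t[2],
                              const xmm_f32x4 f[2], xmm_f32x4 out[2]) {
    xmm_i16x8 packed = pack_masks(mask[0], mask[1]);
    for (int h = 0; h < 2; h++) {
        xmm_i32x4 m = unpack_half(packed, h);
        out[h] = blendv(f[h], t[h], m);
    }
}
```

Both versions produce identical lanes; the only difference is the work per select: the round-trip version performs the same two blends plus a pack and two unpacks. With 4 lanes the mask fits in a single XMM register under this assumption, so there is nothing to split and rejoin. On AVX (256-bit YMM registers) the analogous split would appear at 16 lanes rather than 8.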
2013 Oct 26
0
[LLVMdev] Bug #16941
> ... mask comes as two XMM registers; select converts them to a single XMM register (i.e. 8 x 16 bit), then immediately converts it back to two XMM registers to do the blend. This back-and-forth conversion has a huge overhead.
>
> I'm attaching 3 files with vectors of length 4, 8 and 16. Try 4 on SSE4 and you'll see that both cases work well; 8 demonstrates the difference on SSE4. The same holds on AVX (8 vs 16).
>
> On Wed, Oct 23, 2013 at 1:41 AM, Nadav Rotem <nrotem at apple.com> wrote:
> > On Oct 21, 2013, at 12:09 PM, Dmitry Babokin <babokin at gmail.c...
2013 Oct 26
1
[LLVMdev] Bug #16941
> ...s two XMM registers; select converts them to a single XMM register (i.e. 8 x 16 bit), then immediately converts it back to two XMM registers to do the blend. This back-and-forth conversion has a huge overhead.
>
> I'm attaching 3 files with vectors of length 4, 8 and 16. Try 4 on SSE4 and you'll see that both cases work well; 8 demonstrates the difference on SSE4. The same holds on AVX (8 vs 16).
>
> On Wed, Oct 23, 2013 at 1:41 AM, Nadav Rotem <nrotem at apple.com> wrote:
>> On Oct 21, 2013, at 12:09 PM, Dmitry Babokin <...
2013 Oct 22
0
[LLVMdev] Bug #16941
On Oct 21, 2013, at 12:09 PM, Dmitry Babokin <babokin at gmail.com> wrote:
> By the way, I'm curious: is there any reason why you focus on SSE4 rather than AVX? It seems that the vectorizer should care most about the latest silicon.

I am interested in looking at the SSE4 code because lowering AVX code is more complicated, especially for masks. The problem that <8 x i1> can be
2013 Oct 21
2
[LLVMdev] Bug #16941
Nadav, you are right: ISPC may issue intrinsics as a result of AST selection. Still, I believe we should stick to LLVM IR whenever possible. Intrinsics can act as optimization barriers (for both data flow and control flow) and are generally not optimizable. LLVM's performance may improve over time and we would benefit from that (or it may play against us, as in this