Displaying 20 results from an estimated 6000 matches similar to: "[LLVMdev] LLVMdev Digest, Vol 112, Issue 56"
2013 Oct 21
0
[LLVMdev] First attempt at recognizing pointer reduction
On Oct 21, 2013, at 1:00 PM, Renato Golin <renato.golin at linaro.org> wrote:
> Hi Arnold,
>
> To sum up my intentions, I want to understand how the reduction/induction variable detection works in LLVM, so that I can know better how to detect different patterns in memory, not just the stride vectorization.
To detect memory access patterns you will want to look at the SCEV of a
2013 Oct 21
0
[LLVMdev] Bug #16941
Hi Dmitry,
ISPC does some instruction selection as part of vectorization (on ASTs!) by placing intrinsics for specific operations. The SEXT to i32 pattern was implemented because LLVM did not support vector-selects when this code was written.
Can you submit a small SSE4 test case that demonstrates the problem? Select is the canonical form of this operation, and SEXT is usually more
2013 Oct 21
2
[LLVMdev] Bug #16941
Nadav,
You are right, ISPC may issue intrinsics as a result of AST selection.
Though I believe that we should stick to LLVM IR whenever possible.
Intrinsics may appear to be boundaries for optimizations (on both data and
control flow) and are generally not optimizable. LLVM may improve over time
from a performance standpoint and we would benefit from it (or it may play
against us, like in this
2013 Oct 21
2
[LLVMdev] Bug #16941
Nadav,
You are absolutely right, it's an ISPC workload. I've checked SSE4 and it's
also severely affected.
We use intrinsics only for conversion <N x i32> <=> i32, i.e. movmsk.ps.
For the rest we use general LLVM instructions. And I actually would really
like to keep it this way. We rely on LLVM's ability to produce efficient code
from general LLVM IR. Relying on
2013 Oct 21
0
[LLVMdev] Bug #16941
Hi Dmitry.
This looks like an ISPC workload. ISPC works around a limitation in SelectionDAG, which does not know how to legalize mask types when both 128-bit and 256-bit registers are available. ISPC works around this problem by expanding the mask to i32s and using intrinsics. Can you please verify that this regression only happens on AVX? Can you change ISPC to use intrinsics?
Thanks
Nadav
2013 Oct 26
1
[LLVMdev] Bug #16941
Hi Nadav,
ISPC has been generating long vectors (on the corresponding ISPC targets) this way
since the very beginning of ISPC, as far as I know. There is no such thing
in the official LLVM documents as "illegal vectors", so people do expect that
arbitrarily long vectors are supported and generated reasonably well. Note,
not super-optimal, but reasonably well. Keeping it this way allows
considering
2013 Oct 21
1
[LLVMdev] First attempt at recognizing pointer reduction
On 21 October 2013 20:58, Arnold Schwaighofer <aschwaighofer at apple.com> wrote:
> For example these should be the SCEVs of “int a[2*i] = ; a[2*i+1] =”:
>
> {ptr, +, 8}_loop
> {ptr+4, +, 8}_loop
>
> Each access on its own requires a gather/scatter (2 loads/stores when
> vectorized (VF=2) + inserts/extracts). But when we look at both at once we
> see that we only
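The two recurrences quoted above can be checked with plain arithmetic. The sketch below assumes 4-byte int elements and byte offsets measured from ptr (so the starts are 0 and 4). The helper names are hypothetical and this is not LLVM's ScalarEvolution API; it only evaluates {start, +, step} at iteration i and compares the result with the byte offsets of a[2*i] and a[2*i+1].

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumption of this sketch: "int" in the quoted example is 4 bytes wide. */
static_assert(sizeof(int32_t) == 4, "sketch assumes 4-byte elements");

/* Evaluate the add recurrence {start, +, step} at loop iteration i. */
static size_t addrec_at(size_t start, size_t step, size_t i) {
    return start + step * i;
}

/* Byte offset (from the array base) of element index idx. */
static size_t byte_offset(size_t idx) {
    return idx * sizeof(int32_t);
}

/* Returns 1 if, for every i in [0, n), {0,+,8} matches a[2*i],
   {4,+,8} matches a[2*i+1], and the second access lies directly
   after the first one in memory. Returns 0 otherwise. */
static int interleaved_pair_matches(size_t n) {
    for (size_t i = 0; i < n; ++i) {
        size_t even = addrec_at(0, 8, i); /* {ptr, +, 8}   */
        size_t odd  = addrec_at(4, 8, i); /* {ptr+4, +, 8} */
        if (even != byte_offset(2 * i)) return 0;
        if (odd != byte_offset(2 * i + 1)) return 0;
        if (odd != even + sizeof(int32_t)) return 0;
    }
    return 1;
}
```

Taken separately, each access has a stride of 8 bytes; taken as a pair, the two accesses are adjacent within every iteration.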
2013 Oct 26
0
[LLVMdev] Bug #16941
Hi Dmitry,
Yes, this is a known problem with legalizing vector masks. The type <8 x i1> is legalized to <8 x i16> on SSE, but your operands are legalized to <4 x i32>. Type legalization is performed per node and we don’t have a good way to support instructions that mix the mask and operand types. Why does ISPC generate illegal vector types? Does ISPC rely on the LLVM codegen to
2013 Oct 25
2
[LLVMdev] Bug #16941
Nadav,
The problem appears only for vectors longer than the available hardware
register (in doubleword elements, i.e. more than 4 on SSE4 and more than 8
on AVX). Select does a weird thing. The <8 x i1> mask comes as two XMM registers,
select converts them to a single XMM register (i.e. 8 x 16 bit), and
immediately afterwards it converts back to two XMM registers and does a blend.
Conversion back and forth has
2013 Oct 22
0
[LLVMdev] Bug #16941
On Oct 21, 2013, at 12:09 PM, Dmitry Babokin <babokin at gmail.com> wrote:
> By the way, I'm curious, is there any reason why you focus on SSE4, not AVX? It seems that the vectorizer should care the most about the latest silicon.
>
I am interested in looking at the SSE4 code because lowering of AVX code is more complicated, especially for masks. The problem that <8 x i1> can be
2013 Oct 21
2
[LLVMdev] First attempt at recognizing pointer reduction
Hi Arnold,
To sum up my intentions, I want to understand how the reduction/induction
variable detection works in LLVM, so that I can know better how to detect
different patterns in memory, not just the stride vectorization.
For instance, even if the relationship between each loop would be
complicated, I know that in each loop, all three reads are sequential. So,
at least, I could use a
2013 Oct 21
2
[LLVMdev] Bug #16941
Nadav,
Could you please have a look at bug #16941 and let us know what you think
about it? It's a performance regression after one of your commits.
Thanks.
Dmitry.
2013 Oct 21
1
[LLVMdev] [lld] Handle _GLOBAL_OFFSET_TABLE symbol
The Hexagon target adds a new atom using the addAbsoluteAtom() function
and then assigns a virtual address in the finalizeSymbolValues()
routine. The X86_64 target uses the addAtom() function to add an object of
the GLOBAL_OFFSET_TABLEAtom class to do the same thing. What is the
reason for this difference? Is the GLOBAL_OFFSET_TABLEAtom just a
useful wrapper which eliminates the necessity to assign an
2013 Sep 19
0
[LLVMdev] unaligned AVX store gets split into two instructions
Nadav,
We see multiple regressions after r172868 in the ISPC compiler (based on the LLVM
optimizer). The regressions are due to spills/reloads, which are due to
increased register pressure. This matches Zach's analysis. We've filed bug
17285 for this problem.
Is there any possibility to avoid splitting in case of multiple loads going
together?
Dmitry.
On Wed, Jul 10, 2013 at 1:12 PM, Zach
2013 Oct 23
0
[LLVMdev] First attempt at recognizing pointer reduction
On Oct 23, 2013, at 3:10 PM, Renato Golin <renato.golin at linaro.org> wrote:
> On 23 October 2013 16:05, Arnold Schwaighofer <aschwaighofer at apple.com> wrote:
> In the examples you gave there are no reduction variables in the loop vectorizer’s sense. But, they all have memory accesses that are strided.
>
> This is what I don't get. As far as I understood, a
2011 Nov 29
1
[LLVMdev] [llvm-commits] Vectors of Pointers and Vector-GEP
I agree that a single vector index is sufficient for many cases. Matt Pharr (from the ISPC compiler) showed me an interesting case where there is a single pointer into an array. In this case we need to have two indices, where the first index is zero. Once the basic patch is in, we can start looking at adding support for arrays and multiple indices.
Nadav
-----Original Message-----
From: David
2013 Oct 24
1
[LLVMdev] First attempt at recognizing pointer reduction
On 23 October 2013 23:05, Arnold Schwaighofer <aschwaighofer at apple.com> wrote:
> A reduction is something like:
>
> for (i= …) {
> r+= a[i];
> }
> return r;
>
Ok, so "reduction" is just a reduction in the map-reduce sense, and nothing
else.
> You don’t need to transform them in the legality phase. Believe me ;). Look
> at how we handle stride one
2016 Sep 26
2
RFC: New intrinsics masked.expandload and masked.compressstore
|
|How would this work in this case? The result would need to affect the
|legality and cost of the memory instruction. From your poster, it looks
|like we're talking about loops with constructs like this:
|
|for (i =0; i < N; i++) {
| if (topVal > b[i]) {
| *dst = a[i];
| dst++;
| }
|}
|
|is this loop vectorizable at all without these constructs?
Good
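For reference, the quoted loop can be modelled in scalar C as a per-block mask computation followed by a compress store, which writes only the selected lanes to consecutive destination slots. This is a minimal sketch under stated assumptions: the block width VF and the function names are hypothetical, the element type is taken to be int, and nothing here calls the LLVM intrinsics themselves.

```c
#include <assert.h>
#include <stddef.h>

enum { VF = 4 }; /* hypothetical vector width, chosen for illustration */

/* Scalar model of a compress store: copy the lanes of vec whose mask
   entry is non-zero to consecutive slots starting at dst.
   Returns the number of elements written. */
static size_t compress_store_block(int *dst, const int *vec,
                                   const int *mask, size_t lanes) {
    assert(lanes <= (size_t)VF);
    size_t written = 0;
    for (size_t l = 0; l < lanes; ++l) {
        if (mask[l]) {
            dst[written++] = vec[l];
        }
    }
    return written;
}

/* Blocked form of the quoted loop: for each block of up to VF
   elements, build the mask (topVal > b[i]) and compress-store the
   matching a[i] values. dst must have room for n elements.
   Returns how far dst advanced in total. */
static size_t compact_if_below(int *dst, const int *a, const int *b,
                               size_t n, int topVal) {
    size_t out = 0;
    for (size_t i = 0; i < n; i += (size_t)VF) {
        size_t lanes = (n - i < (size_t)VF) ? n - i : (size_t)VF;
        int mask[VF];
        for (size_t l = 0; l < lanes; ++l) {
            mask[l] = topVal > b[i + l];
        }
        out += compress_store_block(dst + out, a + i, mask, lanes);
    }
    return out;
}
```

The destination pointer advances by the number of set mask lanes in each block, which is the data-dependent step the quoted question is about.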
2013 Feb 26
0
[LLVMdev] Generate scalar SSE instructions instead of packed instructions
Thanks for the replies; they were very helpful.
Is it enough to prevent BBVectorize from packing together double precision instructions? If a non-clang frontend is used, such as ISPC, is it possible that the IR may contain packed double instructions?
Tyler
From: Cameron McInally [mailto:cameron.mcinally at nyu.edu]
Sent: Thursday, February 21, 2013 6:39 PM
To: Nowicki, Tyler
Cc: Nadav Rotem; LLVM
2011 Nov 23
3
[LLVMdev] [llvm-commits] Vectors of Pointers and Vector-GEP
Duncan,
Thanks for the quick review! Here is a short description (design) of where I am going with this patch:
1. Motivation: Vectors-of-pointers is the first step in supporting scatter/gather instructions (available in AVX2, for example). I believe that this feature was requested on the mailing list before. As mentioned by Hal Finkel earlier today, this feature is desired by autovectorizers as