similar to: [LLVMdev] GSoC 2009: Auto-vectorization


2009 Apr 01
0
[LLVMdev] GSoC 2009: Auto-vectorization
Andreas Bolka wrote: > Hi all, > > I'd like to have a first stab at a loop-based auto-vectorization pass as > part of 2009's Google Summer of Code program. As far as I can tell from > searching the mailing list archives, no work on such an auto-vectorizer > seems to be currently in progress. Hi Andreas, Actually, you'd be the third person to try writing one, with
2009 Apr 01
2
[LLVMdev] GSoC 2009: Auto-vectorization
Nick Lewycky wrote: > Andreas Bolka wrote: >> Hi all, >> >> I'd like to have a first stab at a loop-based auto-vectorization pass as >> part of 2009's Google Summer of Code program. As far as I can tell from >> searching the mailing list archives, no work on such an auto-vectorizer >> seems to be currently in progress. > > Hi Andreas, >
2009 Apr 01
0
[LLVMdev] GSoC 2009: Auto-vectorization
On Mar 31, 2009, at 5:27 PM, Andreas Bolka wrote: > Hi all, > I'd like to have a first stab at a loop-based auto-vectorization > pass as > part of 2009's Google Summer of Code program. As far as I can tell > from > searching the mailing list archives, no work on such an auto- > vectorizer > seems to be currently in progress. Hi Andreas, This would be a very
2009 Apr 01
0
[LLVMdev] GSoC 2009: Auto-vectorization
Hi Andreas, On 31-Mar-09, at 8:27 PM, Andreas Bolka wrote: > So, initially, I aim at supporting only the simplest loops such as: > > int a[256], b[256], c[256]; > for (int i = 0; i < 256; ++i) > c[i] = a[i] + b[i]; > > My goal is to implement the necessary analyses and transformations to > turn IR corresponding to such code into IR utilizing vector >
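A minimal sketch of the transformation this message describes, written as plain C rather than LLVM IR. It compares the quoted scalar loop with a strip-mined form whose inner loop is the part a vectorizer would replace with a single vector add. The vector width VF = 4 and all function names are assumptions made for illustration; they do not come from the thread.

```c
#include <stddef.h>

#define N 256
#define VF 4 /* assumed vector width in elements (hypothetical) */

/* Scalar loop, as in the quoted example. */
void add_scalar(const int *a, const int *b, int *c) {
    for (size_t i = 0; i < N; ++i)
        c[i] = a[i] + b[i];
}

/* Strip-mined loop: the outer loop advances VF elements at a time.
 * The inner loop over lanes stands in for one vector load/add/store. */
void add_strip_mined(const int *a, const int *b, int *c) {
    size_t i = 0;
    for (; i + VF <= N; i += VF)
        for (size_t lane = 0; lane < VF; ++lane)
            c[i + lane] = a[i + lane] + b[i + lane];
    /* Scalar remainder loop; it does no work when VF divides N. */
    for (; i < N; ++i)
        c[i] = a[i] + b[i];
}

/* Returns 1 if both forms produce identical results, 0 otherwise. */
int results_match(void) {
    int a[N], b[N], c1[N], c2[N];
    for (int i = 0; i < N; ++i) {
        a[i] = i;
        b[i] = 2 * i;
    }
    add_scalar(a, b, c1);
    add_strip_mined(a, b, c2);
    for (int i = 0; i < N; ++i)
        if (c1[i] != c2[i])
            return 0;
    return 1;
}
```

The rewrite is only valid because no iteration reads a value written by another iteration, which is what a dependence analysis would have to establish before the loop is transformed.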
2009 Apr 01
1
[LLVMdev] GSoC 2009: Auto-vectorization
Hi Stefanus, On Wed Apr 01 16:08:45 +0200 2009, Stefanus Du Toit wrote: > On 31-Mar-09, at 8:27 PM, Andreas Bolka wrote: > > i.e. the core of the desired result would look like: > > > > %va = load <256 x i32>* %a > > %vb = load <256 x i32>* %b > > %vc = add <256 x i32> %va, %vb > > store <256 x i32> %vc, <256 x i32>*
2009 Apr 01
1
[LLVMdev] GSoC 2009: Auto-vectorization
Hi Chris, On Wed Apr 01 08:18:28 +0200 2009, Chris Lattner wrote: > On Mar 31, 2009, at 5:27 PM, Andreas Bolka wrote: > > My goal is to implement the necessary analyses and transformations to > > turn IR corresponding to such code into IR utilizing vector > > instructions; > > Sounds great. Some important steps: > 1. We need an abstract dependence analysis
2013 Oct 25
2
[LLVMdev] Is there pass to break down <4 x float> to scalars
Hi, Great to see someone working on this. This will benefit the performance portability goal of the pocl's OpenCL kernel compiler. It has been one of the low hanging fruits in improving its implicit WG vectorization applicability. The use case there is that sometimes it makes sense to devectorize the explicitly used vector datatype code of OpenCL kernels in order to make better opportunities
2013 Oct 25
0
[LLVMdev] Is there pass to break down <4 x float> to scalars
Renato Golin <renato.golin at linaro.org> writes: > On 25 October 2013 11:06, Richard Sandiford <rsandifo at linux.vnet.ibm.com> wrote: >> It would also need some TargetTransformInfo hooks to decide which >> vectors should be decomposed. > > If I got it right, this may not be necessary, or it may even be harmful. > > Say you decide that <4 x i32> vectors
2020 Oct 02
2
PSLP: Padded SLP Automatic Vectorization
On 9/29/2020 14:37, David Chisnall via llvm-dev wrote: > On 28/09/2020 15:45, Matt P. Dziubinski via llvm-dev wrote: >> Hey, I noticed this talk from the EuroLLVM 2015 >> (https://llvm.org/devmtg/2015-04/slides/pslp_slides_EUROLLVM2015.pdf) >> on the PSLP vectorization algorithm (CGO 2015 paper: >> http://vporpo.me/papers/pslp_cgo2015.pdf). >> >> Is anyone
2013 Oct 25
3
[LLVMdev] Is there pass to break down <4 x float> to scalars
On 25 October 2013 11:06, Richard Sandiford <rsandifo at linux.vnet.ibm.com> wrote: > I wanted the same thing for SystemZ, which doesn't have vectors, > in order to improve the llvmpipe code. > Hi Richard, This is a nice patch. I was wondering how hard it'd be to do that, and it seems that you're catching lots of corner cases. My interest is also due to converting odd
2020 Sep 28
2
PSLP: Padded SLP Automatic Vectorization
Hey, I noticed this talk from the EuroLLVM 2015 (https://llvm.org/devmtg/2015-04/slides/pslp_slides_EUROLLVM2015.pdf) on the PSLP vectorization algorithm (CGO 2015 paper: http://vporpo.me/papers/pslp_cgo2015.pdf). Is anyone working on implementing it? If so, are there Phab reviews I can subscribe to? Best, Matt
2013 Oct 30
3
[LLVMdev] loop vectorizer
----- Original Message ----- > > > I ran the BB vectorizer as I guess this is the SLP vectorizer. No, while the BB vectorizer is doing a form of SLP vectorization, there is a separate SLP vectorization pass which uses a different algorithm. You can pass -vectorize-slp to opt. -Hal > > BBV: using target information > BBV: fusing loop #1 for for.body in _Z3barmmPfS_S_...
2013 Oct 30
3
[LLVMdev] loop vectorizer
On 30 October 2013 09:25, Nadav Rotem <nrotem at apple.com> wrote: > The access pattern to arrays a and b is non-linear. Unrolled loops are > usually handled by the SLP-vectorizer. Are ir0 and ir1 consecutive for all > values for i ? > Based on his list of values, it seems that the induction stride is linear within each block of 4 iterations, but it's not a clear
2012 Feb 03
8
[LLVMdev] Vectorization: Next Steps
As some of you may know, I committed my basic-block autovectorization pass a few days ago. I encourage anyone interested to try it out (pass -vectorize to opt or -mllvm -vectorize to clang) and provide feedback. Especially in combination with -unroll-allow-partial, I have observed some significant benchmark speedups, but I have also observed some significant slowdowns. I would like to share my
2013 Oct 31
3
[LLVMdev] loop vectorizer misses opportunity, exploit
----- Original Message ----- > > Hi Nadav, > > that's the whole point of it. I can't in general make the index > calculation simpler. The example given is the simplest non-trivial > index function that is needed. It might well be that it's that > simple that the index calculation in this case can be thrown away > altogether and - as you say - be replaced by
2013 Oct 30
3
[LLVMdev] loop vectorizer
Hi Frank, > We are looking at a variety of target architectures. Ultimately we aim to run on BG/Q and Intel Xeon Phi (native). However, running on those architectures with the LLVM technology is planned in some future. As a first step we would target vanilla x86 with SSE/AVX 128/256 as a proof-of-concept. Great! It should be easy to support these targets. When you said wide-vectors I assumed
2015 Nov 09
3
[RFC][SLP] Let's turn -slp-vectorize-hor on by default
I have not. I could feasibly do this, but I'm not set up to perform good experiments on X86-64 hardware. Furthermore, if I do it for X86-64, it only seems fair I should do it for the other backends as well, which is much less feasible for me. I'm reaching out to the community to see if there's any objection based on their own measurements of this feature about defaulting it to on. Please
2013 Oct 30
2
[LLVMdev] loop vectorizer
The debug messages are misleading. They should read “trying to vectorize a list of …”. The problem is that the SCEV analysis is unable to detect that C[ir0] and C[ir1] are consecutive. Is this loop from an important benchmark? Thanks, Nadav On Oct 30, 2013, at 11:13 AM, Frank Winter <fwinter at jlab.org> wrote: > The SLP vectorizer apparently did something in the prologue of the
2015 Nov 10
4
[RFC][SLP] Let's turn -slp-vectorize-hor on by default
I will try to get some spec cpu 2006 rate runs done under -O3 -flto with and without -slp-vectorize-hor and let you know. -Thx -----Original Message----- From: nrotem at apple.com [mailto:nrotem at apple.com] Sent: Tuesday, November 10, 2015 3:33 AM To: Charlie Turner Cc: Das, Dibyendu; llvm-dev at lists.llvm.org Subject: Re: [llvm-dev] [RFC][SLP] Let's turn -slp-vectorize-hor on by default
2013 Oct 30
0
[LLVMdev] loop vectorizer
I ran the BB vectorizer as I guess this is the SLP vectorizer. BBV: using target information BBV: fusing loop #1 for for.body in _Z3barmmPfS_S_... BBV: found 2 instructions with candidate pairs BBV: found 0 pair connections. BBV: done! However, this was run on the unrolled loop (I guess). Here is the IR printed by 'opt': entry: %cmp9 = icmp ult i64 %start, %end br i1 %cmp9, label