thr3ads.net - search: "slp"

Displaying 20 results from an estimated 431 matches for "slp".

[LLVMdev] Vectorizing alloca instructions

2013 Oct 24

Hi, I've been playing around with the SLPVectorizer trying to get it to vectorize this simple program:
define void @vector(i32 addrspace(1)* %out, i32 %index) {
entry:
  %0 = alloca [4 x i32]
  %x = getelementptr [4 x i32]* %0, i32 0, i32 0
  %y = getelementptr [4 x i32]* %0, i32 0, i32 1
  %z = getelementptr [4 x i32]* %0, i32 0, i32 2...

[LLVMdev] Vectorizing alloca instructions

2013 Oct 24

Hi Tom,
Thanks for working on this. The SLP-vectorizer thinks that %X %Y %Z and %W alias, so it tries to perform 4 scalar store operations (which is a bad idea). We need to figure out why AA thinks that X and Y may alias. Maybe there is a problem with the code that uses AA.
Thanks, Nadav
On Oct 24, 2013, at 2:04 PM, Tom Stellard <to...

Fatal trap 12: page fault while in kernel mode / current process=12 (swi1: net)

2006 Mar 17

...0 dr3 0 dr4 0xffff0ff0 dr5 0x400 dr6 0xffff0ff0 dr7 0x400
propagate_priority+0x66: movq 0x48(%r15),%rdi
db> ps
pid proc uid ppid pgrp flag stat wmesg wchan cmd
8390 ffffff004b20b340 1002 8389 408 4000000 [SLPQ user map 0xffffff004a2d7950][SLP] sh
8389 ffffff004b20b680 1002 1796 408 0004000 [SLPQ wait 0xffffff004b20b680][SLP] sh
7902 ffffff0049e3c340 1002 408 408 0000100 [SLPQ accept 0xffffff0060c632c6][SLP] httpd
7901 ffffff004a256680 1002 408 408 0000100 [SLPQ select 0xffffffff80569bf0][S...

PSLP: Padded SLP Automatic Vectorization

2020 Oct 02

On 9/29/2020 14:37, David Chisnall via llvm-dev wrote:
> On 28/09/2020 15:45, Matt P. Dziubinski via llvm-dev wrote:
>> Hey, I noticed this talk from the EuroLLVM 2015
>> (https://llvm.org/devmtg/2015-04/slides/pslp_slides_EUROLLVM2015.pdf)
>> on the PSLP vectorization algorithm (CGO 2015 paper:
>> http://vporpo.me/papers/pslp_cgo2015.pdf).
>>
>> Is anyone working on implementing it?
>>
>> If so, are there Phab reviews I can subscribe to?
>
> The CGO paper was based...

[RFC][SLP] Let's turn -slp-vectorize-hor on by default

2015 Nov 10

I will try to get some spec cpu 2006 rate runs done under -O3 -flto with and without -slp-vectorize-hor and let you know.
-Thx
-----Original Message-----
From: nrotem at apple.com [mailto:nrotem at apple.com]
Sent: Tuesday, November 10, 2015 3:33 AM
To: Charlie Turner
Cc: Das, Dibyendu; llvm-dev at lists.llvm.org
Subject: Re: [llvm-dev] [RFC][SLP] Let's turn -slp-vectorize-hor on...

[RFC] Make LoopVectorize Aware of SLP Operations

2018 Feb 06

Hello, We would like to propose making LoopVectorize aware of SLP operations, to improve the generated code for loops operating on struct fields or doing complex math. At the moment, LoopVectorize uses interleaving to vectorize loops that operate on values loaded/stored from consecutive addresses: vector loads/stores are generated to combine consecutive load...

[RFC] Make LoopVectorize Aware of SLP Operations

2018 Feb 08

Hi Florian! This proposal sounds pretty exciting! Integrating SLP-aware loop vectorization (or the other way around) and SLP into the VPlan framework is definitely aligned with the long term vision and we would prefer this approach to the LoopReroll and InstCombine alternatives that you mentioned. We prefer a generic implementation that can handle complicated cas...

[LLVMdev] Vectorizing alloca instructions

2013 Oct 24

On Thu, Oct 24, 2013 at 2:04 PM, Tom Stellard <tom at stellard.net> wrote:
> Hi,
>
> I've been playing around with the SLPVectorizer trying to get it to
> vectorize this simple program:
>
> define void @vector(i32 addrspace(1)* %out, i32 %index) {
> entry:
> %0 = alloca [4 x i32]
> %x = getelementptr [4 x i32]* %0, i32 0, i32 0
> %y = getelementptr [4 x i32]* %0, i32 0, i32 1
> %z = gete...

[PATCH] allow to disable SLP with runtime option

2009 Apr 22

Hi everyone, I'd like to propose a patch for review. It enhances rsync when patched and compiled with slp support. It adds a new global boolean option, 'disable slp', which can be used to disable SLP advertisements at runtime. The idea behind this patch is to allow distributors to build rsync with SLP support compiled in, but to allow the users to turn it off without recompiling rsync on their...

[RFC][SLP] Let's turn -slp-vectorize-hor on by default

2015 Nov 11

...Charlie Turner<mailto:charlesturner7c5 at gmail.com>
Sent: 11/11/2015 6:34 PM
To: Das, Dibyendu<mailto:Dibyendu.Das at amd.com>
Cc: nrotem at apple.com<mailto:nrotem at apple.com>; llvm-dev at lists.llvm.org<mailto:llvm-dev at lists.llvm.org>
Subject: Re: [llvm-dev] [RFC][SLP] Let's turn -slp-vectorize-hor on by default
> I will try to get some spec cpu 2006 rate runs done under -O3 -flto with and without -slp-vectorize-hor and let you know.
Do you have a time estimate on when you'll be able to get these numbers? Another option would be to default the flag...

[LLVMdev] MCJIT generates MOVAPS on unaligned address

2014 Aug 07

It's not reproducible with 'opt'. I call the SLP pass from my application and only then the wrong IR gets generated. On the attached module I call via the function pass manager:
1) TargetLibraryInfo with the target triple
2) Set the data layout
3) Basic Alias Analysis
4) SLP vectorizer
This produces the wrong IR. On the other hand running the...

[LLVMdev] How to broaden the SLP vectorizer's search

2014 Aug 07

The BB vectorizer has an option 'bb-vectorizer-search-limit'. Is there a similar option for the SLP vectorizer? Maybe an analysis pass's scope that can be widened? I have large basic blocks with instructions that should be merged into packed versions. However, the blocks are optimized independently from each other. Now, if the instructions to be merged aren't too far apart the SLP vecto...

[LLVMdev] loop vectorizer

2013 Oct 30

...hey should read “trying to vectorize a list of …”; The problem is that the SCEV analysis is unable to detect that C[ir0] and C[ir1] are consecutive. Is this loop from an important benchmark?
Thanks, Nadav
On Oct 30, 2013, at 11:13 AM, Frank Winter <fwinter at jlab.org> wrote:
> The SLP vectorizer apparently did something in the prologue of the function (where storing of arguments on the stack happens) which then got eliminated later on (since I don't see any vector instructions in the final IR). Below the debug output of the SLP pass:
>
> Args: opt -O1 -vectorize-slp -...

[RFC][SLP] Let's turn -slp-vectorize-hor on by default

2015 Nov 09

...compile-time impact of enabling this feature by default. I also ran the test-suite on an X86-64 machine. I can't imagine any other targets being uniquely affected in terms of compile-time by turning this on after testing both AArch64 and X86-64. I also timed running the regression tests with -slp-vectorize-hor enabled and disabled, no significant difference here either. There are no significant performance regressions (or much improvements) on AArch64 in night-test suite. I do see wins in third party benchmarks when using this flag, which is why I'm asking if there would be any objecti...

[LLVMdev] How to broaden the SLP vectorizer's search

2014 Aug 08

...his function we process all of the stores in the function in buckets of 16 elements because constructing consecutive stores is implemented using an O(n^2) algorithm. You can try to increase this threshold to 128 and see if it helps. I also agree with Renato and Chad that adding a flag to tell the SLP-vectorizer to put more effort (compile time) into the problem is a good idea.
Thanks, Nadav
> On Aug 8, 2014, at 8:27 AM, Frank Winter <fwinter at jlab.org> wrote:
>
> I changed the max. recursion depth to 36, and tried then 1000 (from the original value of 12) and it did not i...
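The bucketing trade-off described in this reply can be illustrated with a small sketch. This is not LLVM's code: the `Store` struct, the `find_consecutive_pairs` function, and the byte-offset model of a store are all hypothetical names introduced here for illustration. The sketch compares every pair of stores inside each bucket, so the cost is quadratic in the bucket size rather than in the total number of stores, at the price of missing pairs that straddle a bucket boundary.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

// Hypothetical model of a store: only its byte offset from a common base.
struct Store {
    std::int64_t offset;
};

// Splits `stores` into buckets of `bucket_size` and, inside each bucket,
// compares every pair (quadratic per bucket). Records (i, j) whenever
// store j writes exactly `elem_size` bytes after store i.
std::vector<std::pair<std::size_t, std::size_t>>
find_consecutive_pairs(const std::vector<Store>& stores,
                       std::int64_t elem_size,
                       std::size_t bucket_size) {
    std::vector<std::pair<std::size_t, std::size_t>> pairs;
    if (bucket_size == 0) {
        return pairs;  // nothing sensible to do with empty buckets
    }
    for (std::size_t begin = 0; begin < stores.size(); begin += bucket_size) {
        const std::size_t end = std::min(begin + bucket_size, stores.size());
        for (std::size_t i = begin; i < end; ++i) {
            for (std::size_t j = begin; j < end; ++j) {
                if (i != j &&
                    stores[j].offset == stores[i].offset + elem_size) {
                    pairs.emplace_back(i, j);
                }
            }
        }
    }
    return pairs;
}
```

With stores at offsets 0, 100, 200, 4 and an element size of 4, a bucket size of 2 finds no pair because the stores at offsets 0 and 4 land in different buckets, while a bucket size of 4 finds the pair (0, 3). That is the kind of effect raising the threshold is meant to address.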

[LLVMdev] loop vectorizer

2013 Oct 30

----- Original Message -----
>
>
> I ran the BB vectorizer as I guess this is the SLP vectorizer.
No, while the BB vectorizer is doing a form of SLP vectorization, there is a separate SLP vectorization pass which uses a different algorithm. You can pass -vectorize-slp to opt.
-Hal
>
> BBV: using target information
> BBV: fusing loop #1 for for.body in _Z3barmmPfS_S_......

[RFC][SLP] Let's turn -slp-vectorize-hor on by default

2015 Nov 09

...com> wrote:
> Have you run cpu2006 for x86-64 for perf progression/regression ?
>
> Sent from my Windows Phone
> ________________________________
> From: Charlie Turner via llvm-dev
> Sent: 11/9/2015 11:15 PM
> To: llvm-dev at lists.llvm.org
> Subject: [llvm-dev] [RFC][SLP] Let's turn -slp-vectorize-hor on by default
>
> I've done compile-time experiments for AArch64 over SPEC{2000,2006}
> and of course the test-suite. I measure no significant compile-time
> impact of enabling this feature by default.
>
> I also ran the test-suite on an X86-...

[LLVMdev] loop vectorizer

2013 Oct 31

...ir0 = ( ((i+3)/inner) * 2 + 0 ) * inner + (i+3)%4;
const std::uint64_t ir1 = ( ((i+3)/inner) * 2 + 1 ) * inner + (i+3)%4;
c[ ir0 ] = a[ ir0 ] + b[ ir0 ];
c[ ir1 ] = a[ ir1 ] + b[ ir1 ];
} } }
This should be an ideal test case for the SLP vectorizer, right? It seems, I am out of luck:
opt -O3 -vectorize-slp -debug loop.ll -S
SLP: Analyzing blocks in _Z3barmmPfS_S_.
SLP: Found 8 stores to vectorize.
SLP: Analyzing a store chain of length 8.
SLP: Trying to vectorize starting at PHIs (1)
SLP: Vectorizing a list of length = 2.
SLP: V...

[LLVMdev] Modifications to SLP

2015 Jul 07

Hi all! It takes the current SLP vectorizer too long to vectorize my scalar code. I am talking here about functions that have a single, huge basic block with O(10^6) instructions. Here's an example:
%0 = getelementptr float* %arg1, i32 49
%1 = load float* %0
%2 = getelementptr float* %arg1, i32 4145
%3 = load fl...

[LLVMdev] loop vectorizer

2013 Oct 31

...> loop iter 0:
> index_0 = 0 index_1 = 4
> index_0 = 1 index_1 = 5
> index_0 = 2 index_1 = 6
> index_0 = 3 index_1 = 7
>
> loop iter 1:
> index_0 = 8 index_1 = 12
> index_0 = 9 index_1 = 13
> index_0 = 10 index_1 = 14
> index_0 = 11 index_1 = 15
The SLP-vectorizer detects 8 stores, but it can’t prove that they are consecutive, so it moves on. Can you simplify the address expression? Can you write "index0 = i*8 + 0" and give it a try?
>
> For completeness, here the code:
>
> void bar(std::uint64_t start, std::uint64_t end,...
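The suggestion to simplify the address expression can be checked with a short sketch. It rests on assumptions that the truncated excerpts do not confirm: that `inner` is 4, that the loop advances `i` by 4 per iteration, and that the lanes are `i+0` through `i+3` (only the `i+3` lane is visible). It also reads the `i` in the suggested `i*8 + 0` as an iteration counter. The function names are hypothetical. Under those assumptions, the quoted expression collapses to an affine form that makes the four stores per group visibly consecutive.

```cpp
#include <cstdint>

// Index expression in the shape quoted in the thread.
// `which` is 0 for ir0 and 1 for ir1; `lane` (0..3) is an assumption,
// since only the (i+3) lane appears in the excerpt.
std::uint64_t original_index(std::uint64_t i, std::uint64_t lane,
                             std::uint64_t which, std::uint64_t inner) {
    return (((i + lane) / inner) * 2 + which) * inner + (i + lane) % 4;
}

// Simplified affine form, assuming inner == 4 and that `iter` counts loop
// iterations (so the original i equals 4 * iter).
std::uint64_t simplified_index(std::uint64_t iter, std::uint64_t lane,
                               std::uint64_t which) {
    return iter * 8 + which * 4 + lane;
}

// Returns true when both forms agree for the first `iterations` iterations.
bool forms_agree(std::uint64_t iterations) {
    for (std::uint64_t iter = 0; iter < iterations; ++iter) {
        for (std::uint64_t lane = 0; lane < 4; ++lane) {
            for (std::uint64_t which = 0; which < 2; ++which) {
                if (original_index(4 * iter, lane, which, 4) !=
                    simplified_index(iter, lane, which)) {
                    return false;
                }
            }
        }
    }
    return true;
}
```

For the first two iterations, `original_index` reproduces the index table quoted above (0 to 3 and 4 to 7, then 8 to 11 and 12 to 15). Whether the vectorizer actually recognizes the simplified form as consecutive is not something this sketch can show.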
