thr3ads.net - similar to: "[patch] SLP support (+ question)"

Displaying 20 results from an estimated 1000 matches similar to: "[patch] SLP support (+ question)"

[PATCH] rsync-patches/slp.diff: use lp_num_modules instead of the removed lp_numserv

2013 Oct 25

Hello, rsync-patches/slp.diff is still using lp_numserv which was removed by commit b583594ac7d2f8a38aca85c1bfa4b1487122377a Signed-off-by: Vitezslav Cizek <vcizek at suse.cz> --- slp.diff | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/slp.diff b/slp.diff index a9703f1..953b400 100644 --- a/slp.diff +++ b/slp.diff @@ -479,7 +479,7 @@ new file mode 100644 +

[RFC][SLP] Let's turn -slp-vectorize-hor on by default

2015 Nov 09

I've done compile-time experiments for AArch64 over SPEC{2000,2006} and of course the test-suite. I measure no significant compile-time impact of enabling this feature by default. I also ran the test-suite on an X86-64 machine. I can't imagine any other targets being uniquely affected in terms of compile-time by turning this on after testing both AArch64 and X86-64. I also timed running

[RFC][SLP] Let's turn -slp-vectorize-hor on by default

2015 Nov 09

I have not. I could feasibly do this, but I'm not set up to perform good experiments on X86-64 hardware. Furthermore, if I do it for X86-64, it only seems fair I should do it for the other backends as well, which is much less feasible for me. I'm reaching out to the community to see if there's any objection, based on their own measurements of this feature, to defaulting it to on. Please

[RFC][SLP] Let's turn -slp-vectorize-hor on by default

2015 Nov 10

I will try to get some spec cpu 2006 rate runs done under -O3 -flto with and without -slp-vectorize-hor and let you know. -Thx -----Original Message----- From: nrotem at apple.com [mailto:nrotem at apple.com] Sent: Tuesday, November 10, 2015 3:33 AM To: Charlie Turner Cc: Das, Dibyendu; llvm-dev at lists.llvm.org Subject: Re: [llvm-dev] [RFC][SLP] Let's turn -slp-vectorize-hor on by default

[RFC][SLP] Let's turn -slp-vectorize-hor on by default

2015 Nov 11

We have started this. Since there are some holidays expect a small delay. Will let you know by Friday. Thx Sent from my Windows Phone ________________________________ From: Charlie Turner<mailto:charlesturner7c5 at gmail.com> Sent: 11/11/2015 6:34 PM To: Das, Dibyendu<mailto:Dibyendu.Das at amd.com> Cc: nrotem at apple.com<mailto:nrotem at apple.com>; llvm-dev at

[LLVMdev] How to broaden the SLP vectorizer's search

2014 Aug 07

The BB vectorizer has an option 'bb-vectorizer-search-limit'. Is there a similar option for the SLP vectorizer? Maybe an analysis pass's scope that can be widened? I have large basic blocks with instructions that should be merged into packed versions. However, the blocks are optimized independently from each other. Now, if the instructions to be merged aren't too far apart the

[PATCH] allow to disable SLP with runtime option

2009 Apr 22

Hi everyone, I'd like to propose a patch for review. It enhances rsync when patched and compiled with slp support. It adds a new global boolean option, 'disable slp', which can be used to disable SLP advertisements at runtime. The idea behind this patch is to allow distributors to build rsync with SLP support compiled in, but to allow the users to turn it off without recompiling

PSLP: Padded SLP Automatic Vectorization

2020 Sep 28

Hey, I noticed this talk from the EuroLLVM 2015 (https://llvm.org/devmtg/2015-04/slides/pslp_slides_EUROLLVM2015.pdf) on the PSLP vectorization algorithm (CGO 2015 paper: http://vporpo.me/papers/pslp_cgo2015.pdf). Is anyone working on implementing it? If so, are there Phab reviews I can subscribe to? Best, Matt

PSLP: Padded SLP Automatic Vectorization

2020 Oct 02

On 9/29/2020 14:37, David Chisnall via llvm-dev wrote: > On 28/09/2020 15:45, Matt P. Dziubinski via llvm-dev wrote: >> Hey, I noticed this talk from the EuroLLVM 2015 >> (https://llvm.org/devmtg/2015-04/slides/pslp_slides_EUROLLVM2015.pdf) >> on the PSLP vectorization algorithm (CGO 2015 paper: >> http://vporpo.me/papers/pslp_cgo2015.pdf). >> >> Is anyone

[LLVMdev] SLP vectorizer on AVX feature

2015 Jul 01

On 1 July 2015 at 21:22, Frank Winter <fwinter at jlab.org> wrote: > there were two follow-up emails. I only got one... weird... > The issue is solved. The SLP vectorizer has > a magic number built into the code which determines the max. vector length > to search for. That was set to 128 bits. Increasing it to 256 bits solved > the issue. That looks like a simple fix. Is

SLP regression on SystemZ

2017 Mar 24

Hi, I have come across a major regression resulting after SLP vectorization (+18% on SystemZ, just for enabling SLP). This all relates to one particular very hot loop. Scalar code: %conv252 = zext i16 %110 to i64 %conv254 = zext i16 %111 to i64 %sub255 = sub nsw i64 %conv252, %conv254 ... repeated SLP output: %101 = zext <16 x i16> %100 to <16 x i64> %104 = zext

[LLVMdev] Modifications to SLP

2015 Jul 07

Hi all! It takes the current SLP vectorizer too long to vectorize my scalar code. I am talking here about functions that have a single, huge basic block with O(10^6) instructions. Here's an example: %0 = getelementptr float* %arg1, i32 49 %1 = load float* %0 %2 = getelementptr float* %arg1, i32 4145 %3 = load float* %2 %4 = getelementptr float* %arg2, i32 49 %5 = load

[LLVMdev] How to broaden the SLP vectorizer's search

2014 Aug 08

Hi Frank, Thanks for working on this. Please look at vectorizeStoreChains. In this function we process all of the stores in the function in buckets of 16 elements because constructing consecutive stores is implemented using an O(n^2) algorithm. You can try to increase this threshold to 128 and see if it helps. I also agree with Renato and Chad that adding a flag to tell the SLP-vectorizer to
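
The bucketing trade-off described in this reply can be illustrated with a small sketch. This is not LLVM's code: the function names, the integer "addresses", and the `bucket_size` parameter are all hypothetical, chosen only to show why a quadratic pairing search is run per bucket and why a larger bucket can find pairs that a smaller one misses.

```python
from typing import List, Tuple


def pair_consecutive(addresses: List[int], width: int = 1) -> List[Tuple[int, int]]:
    """Quadratic search: for every store, look for a store exactly `width` after it."""
    pairs = []
    for i, a in enumerate(addresses):
        for j, b in enumerate(addresses):
            if i != j and b - a == width:
                pairs.append((i, j))
    return pairs


def pair_in_buckets(addresses: List[int], bucket_size: int = 16,
                    width: int = 1) -> List[Tuple[int, int]]:
    """Run the quadratic search on fixed-size buckets to bound the total work.

    Stores that land in different buckets are never compared, so raising
    `bucket_size` widens the search at a quadratic cost per bucket.
    """
    pairs = []
    for start in range(0, len(addresses), bucket_size):
        bucket = addresses[start:start + bucket_size]
        for i, j in pair_consecutive(bucket, width):
            # Translate bucket-local indices back to positions in the full list.
            pairs.append((start + i, start + j))
    return pairs


if __name__ == "__main__":
    stores = [0, 5, 1, 6]  # hypothetical store addresses in program order
    print(pair_in_buckets(stores, bucket_size=2))
    print(pair_in_buckets(stores, bucket_size=4))
```

With a bucket of 2 the adjacent addresses 0/1 and 5/6 fall into different buckets and are never paired; with a bucket of 4 both pairs are found.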

SLP example not being vectorized

2019 Nov 28

Hi, I am new to LLVM with a particular interest in the optimization area, especially SLP. While working through the tutorial, I ran this example [1] with the hope of seeing SLP vectorization in action, but for some reason I do not see it in the LLVM assembly shown below. Is there anything I am missing? I am using Clearlinux as the build machine and this has clang version 9.0.0.

[LLVMdev] Enabling the SLP-vectorizer by default for -O3

2013 Jul 29

Cool. Thanks! -Jim On Jul 29, 2013, at 1:07 PM, Renato Golin <renato.golin at linaro.org> wrote: > On 29 July 2013 20:39, Jim Grosbach <grosbach at apple.com> wrote: > These results are really excellent. They’re on Intel, I assume, right? What do the ARM numbers look like? Before enabling by default, we should make sure that the results are comparable there as well. > >

[LLVMdev] SLP vectorizer on AVX feature

2015 Jul 01

Frank, It sounds like the SLP vectorizer thinks that it is more profitable to use 128bit wide operations (because 256bit operations are double pumped on Sandybridge). Did you see a different result on Haswell? Thanks, Nadav > On Jul 1, 2015, at 11:06 AM, Frank Winter <fwinter at jlab.org> wrote: > > I realized that the function parameters had no alignment attributes on them.

[LLVMdev] How to broaden the SLP vectorizer's search

2014 Aug 07

On 7 August 2014 17:33, Chad Rosier <mcrosier at codeaurora.org> wrote: > You might consider filing a bug (llvm.org/bugs) requesting a flag, but I > don't know if the code owners want to expose such a flag. I'm not sure that's a good idea as a raw access to that limit, as there are no guarantees that it'll stay the same. But maybe a flag turning some

Extending SLP Vectorizer to deal with aggregates?

2015 Oct 14

I'm looking for a sanity check on extending SLP Vectorizer to deal with aggregates. I'd like to vectorize Julia tuple operations. The Julia compiler lowers tuples to LLVM arrays, not LLVM vectors. I've tried making Julia lower tuples to LLVM vectors, but that hurt performance when SLP Vectorizer was not applicable, because of extraction/insertion overhead. I.e., the Julia lowering

[LLVMdev] Enabling the SLP-vectorizer by default for -O3

2013 Jul 29

On 29 July 2013 20:39, Jim Grosbach <grosbach at apple.com> wrote: > These results are really excellent. They’re on Intel, I assume, right? > What do the ARM numbers look like? Before enabling by default, we should > make sure that the results are comparable there as well. > Hi Jim, I'll have a look. --renato

Data structure improvement for the SLP vectorizer

2017 Mar 15

There was some discussion of this on the llvm-commits list, but I wanted to raise the topic for discussion here. The background of the -commits discussion was that r296863 added the ability to sort memory access when the SLP vectorizer reached a load (the SLP vectorizer starts at a store or some other sink, and tries to go up the tree vectorizing as it goes along - if the input is in a different
