similar to: SLP regression on SystemZ

Displaying 20 results from an estimated 6000 matches similar to: "SLP regression on SystemZ"

2013 Oct 24
4
[LLVMdev] Vectorizing alloca instructions
Hi, I've been playing around with the SLPVectorizer trying to get it to vectorize this simple program: define void @vector(i32 addrspace(1)* %out, i32 %index) { entry: %0 = alloca [4 x i32] %x = getelementptr [4 x i32]* %0, i32 0, i32 0 %y = getelementptr [4 x i32]* %0, i32 0, i32 1 %z = getelementptr [4 x i32]* %0, i32 0, i32 2 %w = getelementptr [4 x i32]* %0, i32 0, i32 3
2013 Oct 24
0
[LLVMdev] Vectorizing alloca instructions
Hi Tom, Thanks for working on this. The SLP-vectorizer thinks that %X %Y %Z and %W alias, so it tries to perform 4 scalar store operations (which is a bad idea). We need to figure out why AA thinks that X and Y may alias. Maybe there is a problem with the code that uses AA. Thanks, Nadav On Oct 24, 2013, at 2:04 PM, Tom Stellard <tom at stellard.net> wrote: > Hi, > >
2014 Sep 18
2
[LLVMdev] [Vectorization] Mis match in code generated
Hi, I am trying to understand LLVM vectorization implementation and was looking into both loop and SLP vectorization. test case 1: *int foo(int *a) {int sum = 0,i;for(i=0; i<16; i++) sum += a[i];return sum;}* This code is vectorized by loop vectorizer where we calculate scalar loop cost as 4 and vector loop cost as 2. Since vector loop cost is less and above reduction is legal to
2013 Oct 24
1
[LLVMdev] Vectorizing alloca instructions
On Oct 24, 2013, at 3:00 PM, Chandler Carruth <chandlerc at google.com> wrote: > Just a note, I don't think you should or need to vectorize the actual alloca stuff. If you can simply transform the dynamically indexed load: > > Then running SROA and InstCombine will mop up the rest. So its mostly about getting the SLPVectorizer to handle the dynamic GEP. As soon as it does
2014 Sep 19
3
[LLVMdev] [Vectorization] Mis match in code generated
Hi Arnold, Thanks for your reply. I tried test case as suggested by you. *void foo(int *a, int *sum) {*sum = a[0]+a[1]+a[2]+a[3]+a[4]+a[5]+a[6]+a[7]+a[8]+a[9]+a[10]+a[11]+a[12]+a[13]+a[14]+a[15];}* so that it has a 'store' in its IR. *IR before vectorization :*target datalayout = "e-m:e-p:32:32-f64:32:64-f80:32-n8:16:32-S128" target triple =
2014 Nov 10
2
[LLVMdev] [Vectorization] Mis match in code generated
Hi Suyog, Thanks for looking at this. This has recently got itself onto my TODO list too. > I am not sure how much all this will improve the code quality for horizontal reduction > (donno how frequently such pattern of horizontal reduction from same array occurs in real world/SPECS). Actually the main loop of 470.lbm can be SLP vectorized like this. We have three parts to it: A fully
2014 Sep 18
2
[LLVMdev] [Vectorization] Mis match in code generated
Hi Nadav, Thanks for the quick reply !! Ok, so as of now we are lacking capability to handle flat large reductions. I did go through function vectorizeChainsInBlock() (line number 2862). In this function, we try to vectorize if we have phi nodes in the IR (several if's check for phi nodes) i.e we try to construct tree that starts at chains. Any pointers on how to join multiple trees? I
2013 Oct 24
0
[LLVMdev] Vectorizing alloca instructions
On Thu, Oct 24, 2013 at 2:04 PM, Tom Stellard <tom at stellard.net> wrote: > Hi, > > I've been playing around with the SLPVectorizer trying to get it to > vectorize this simple program: > > define void @vector(i32 addrspace(1)* %out, i32 %index) { > entry: > %0 = alloca [4 x i32] > %x = getelementptr [4 x i32]* %0, i32 0, i32 0 > %y = getelementptr [4
2009 Apr 22
1
[PATCH] allow to disable SLP with runtime option
Hi everyone, I'd like to propose a patch for review. It enhances rsync when patched and compiled with slp support. It adds a new global boolean option, 'disable slp', which can be used to disable SLP advertisements at runtime. The idea behind this patch is to allow distributors to build rsync with SLP support compiled in, but to allow the users to turn it off without recompiling
2015 Nov 11
2
[RFC][SLP] Let's turn -slp-vectorize-hor on by default
We have started this. Since there are some holidays expect a small delay. Will let you know by Friday. Thx Sent from my Windows Phone ________________________________ From: Charlie Turner<mailto:charlesturner7c5 at gmail.com> Sent: ‎11/‎11/‎2015 6:34 PM To: Das, Dibyendu<mailto:Dibyendu.Das at amd.com> Cc: nrotem at apple.com<mailto:nrotem at apple.com>; llvm-dev at
2015 Nov 10
4
[RFC][SLP] Let's turn -slp-vectorize-hor on by default
I will try to get some spec cpu 2006 rate runs done under -O3 -flto with and without -slp-vectorize-hor and let you know. -Thx -----Original Message----- From: nrotem at apple.com [mailto:nrotem at apple.com] Sent: Tuesday, November 10, 2015 3:33 AM To: Charlie Turner Cc: Das, Dibyendu; llvm-dev at lists.llvm.org Subject: Re: [llvm-dev] [RFC][SLP] Let's turn -slp-vectorize-hor on by default
2020 Oct 02
2
PSLP: Padded SLP Automatic Vectorization
On 9/29/2020 14:37, David Chisnall via llvm-dev wrote: > On 28/09/2020 15:45, Matt P. Dziubinski via llvm-dev wrote: >> Hey, I noticed this talk from the EuroLLVM 2015 >> (https://llvm.org/devmtg/2015-04/slides/pslp_slides_EUROLLVM2015.pdf) >> on the PSLP vectorization algorithm (CGO 2015 paper: >> http://vporpo.me/papers/pslp_cgo2015.pdf). >> >> Is anyone
2015 Nov 09
3
[RFC][SLP] Let's turn -slp-vectorize-hor on by default
I have not. I could feasibly do this, but I'm not set up to perform good experiments on X86-64 hardware. Furthermore, if I do it for X86-64, it only seems fair I should do it for the other backends as well, which is much less feasible for me. I'm reaching out the community to see if there's any objection based on their own measurements of this feature about defaulting it to on. Please
2014 Aug 07
3
[LLVMdev] How to broaden the SLP vectorizer's search
The BB vectorizer has an option 'bb-vectorizer-search-limit'. Is there a similar option for the SLP vectorizer? Maybe an analysis pass' scope that can be widen? I have large basic blocks with instructions that should be merged into packed versions. However, the blocks are optimized independently from each other. Now, if the instructions to be merged aren't too far apart the
2015 Jul 07
2
[LLVMdev] Modifications to SLP
Hi all! It takes the current SLP vectorizer too long to vectorize my scalar code. I am talking here about functions that have a single, huge basic block with O(10^6) instructions. Here's an example: %0 = getelementptr float* %arg1, i32 49 %1 = load float* %0 %2 = getelementptr float* %arg1, i32 4145 %3 = load float* %2 %4 = getelementptr float* %arg2, i32 49 %5 = load
2015 Nov 09
2
[RFC][SLP] Let's turn -slp-vectorize-hor on by default
I've done compile-time experiments for AArch64 over SPEC{2000,2006} and of course the test-suite. I measure no significant compile-time impact of enabling this feature by default. I also ran the test-suite on an X86-64 machine. I can't imagine any other targets being uniquely effected in terms of compile-time by turning this on after testing both AArch64 and X86-64. I also timed running
2014 Aug 08
2
[LLVMdev] How to broaden the SLP vectorizer's search
Hi Frank, Thanks for working on this. Please look at vectorizeStoreChains. In this function we process all of the stores in the function in buckets of 16 elements because constructing consecutive stores is implemented using an O(n^2) algorithm. You can try to increase this threshold to 128 and see if it helps. I also agree with Renato and Chad that adding a flag to tell the SLP-vectorizer to
2002 Sep 22
2
[patch] SLP support (+ question)
-----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 G'day The attached patch adds "first draft" support for service location protocol, using OpenSLP (http://www.openslp.org). This allows you to automagically discover all the rsync servers on your network (which is defined in terms of your SLP configuration - typically equal to multicast scope, but you can change it around with
2018 Feb 08
0
[RFC] Make LoopVectorize Aware of SLP Operations
Hi Florian! This proposal sounds pretty exciting! Integrating SLP-aware loop vectorization (or the other way around) and SLP into the VPlan framework is definitely aligned with the long term vision and we would prefer this approach to the LoopReroll and InstCombine alternatives that you mentioned. We prefer a generic implementation that can handle complicated cases to something ad-hoc for some
2014 Mar 17
2
[LLVMdev] Improving SLPVectorizer for Julia
I'm working on some small improvements to SLPVectorizer.cpp so that it can deal with some tuple operations arising from Julia code. Being fairly new to LLVM, I could use some advice, particular from those familiar with the internals of SLPVectorizer. The motivation can be found in the Julia discussion https://github.com/JuliaLang/julia/issues/5857 . Here is an example of the kind of LLVM