Displaying 20 results from an estimated 6000 matches similar to: "SLP regression on SystemZ"
2013 Oct 24
4
[LLVMdev] Vectorizing alloca instructions
Hi,
I've been playing around with the SLPVectorizer trying to get it to
vectorize this simple program:
define void @vector(i32 addrspace(1)* %out, i32 %index) {
entry:
%0 = alloca [4 x i32]
%x = getelementptr [4 x i32]* %0, i32 0, i32 0
%y = getelementptr [4 x i32]* %0, i32 0, i32 1
%z = getelementptr [4 x i32]* %0, i32 0, i32 2
%w = getelementptr [4 x i32]* %0, i32 0, i32 3
2013 Oct 24
0
[LLVMdev] Vectorizing alloca instructions
Hi Tom,
Thanks for working on this. The SLP-vectorizer thinks that %X %Y %Z and %W alias, so it tries to perform 4 scalar store operations (which is a bad idea). We need to figure out why AA thinks that X and Y may alias. Maybe there is a problem with the code that uses AA.
Thanks,
Nadav
On Oct 24, 2013, at 2:04 PM, Tom Stellard <tom at stellard.net> wrote:
> Hi,
>
>
2014 Sep 18
2
[LLVMdev] [Vectorization] Mis match in code generated
Hi,
I am trying to understand LLVM vectorization implementation and was looking
into both loop and SLP vectorization.
test case 1:
int foo(int *a) { int sum = 0, i; for (i = 0; i < 16; i++) sum += a[i]; return sum; }
This code is vectorized by loop vectorizer where we calculate scalar loop
cost as 4 and vector loop cost as 2.
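For readers unfamiliar with how a reduction like this is vectorized, here is a minimal C sketch, not actual LLVM output. It assumes a 4-lane vector of int; the names sum16_scalar and sum16_lanes are hypothetical. Each lane keeps its own partial sum, and the lanes are combined once after the loop.

```c
#include <assert.h>
#include <stddef.h>

/* Scalar reduction over 16 elements, equivalent to the test case above. */
int sum16_scalar(const int *a) {
    assert(a != NULL);
    int sum = 0;
    for (size_t i = 0; i < 16; i++)
        sum += a[i];
    return sum;
}

/* Hand-written model of a 4-wide vectorized reduction (assumed width):
   four independent partial sums, then one horizontal add at the end. */
int sum16_lanes(const int *a) {
    assert(a != NULL);
    int lane[4] = {0, 0, 0, 0};
    for (size_t i = 0; i < 16; i += 4)      /* one "vector" iteration */
        for (size_t k = 0; k < 4; k++)      /* the four lanes */
            lane[k] += a[i + k];
    /* Horizontal reduction of the lanes. */
    return (lane[0] + lane[1]) + (lane[2] + lane[3]);
}
```

Both functions return the same value for inputs that do not overflow, since integer addition can be reassociated freely in that case.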
Since vector loop cost is less and above reduction is legal to
2013 Oct 24
1
[LLVMdev] Vectorizing alloca instructions
On Oct 24, 2013, at 3:00 PM, Chandler Carruth <chandlerc at google.com> wrote:
> Just a note, I don't think you should or need to vectorize the actual alloca stuff. If you can simply transform the dynamically indexed load:
>
> Then running SROA and InstCombine will mop up the rest. So it's mostly about getting the SLPVectorizer to handle the dynamic GEP. As soon as it does
2014 Sep 19
3
[LLVMdev] [Vectorization] Mis match in code generated
Hi Arnold,
Thanks for your reply.
I tried the test case you suggested:
void foo(int *a, int *sum) {
  *sum = a[0]+a[1]+a[2]+a[3]+a[4]+a[5]+a[6]+a[7]+a[8]+a[9]+a[10]+a[11]+a[12]+a[13]+a[14]+a[15];
}
so that it has a 'store' in its IR.
IR before vectorization:
target datalayout =
"e-m:e-p:32:32-f64:32:64-f80:32-n8:16:32-S128"
target triple =
2014 Nov 10
2
[LLVMdev] [Vectorization] Mis match in code generated
Hi Suyog,
Thanks for looking at this. This has recently got itself onto my TODO list
too.
> I am not sure how much all this will improve the code quality for horizontal reduction
> (donno how frequently such pattern of horizontal reduction from same array occurs in real world/SPECS).
Actually the main loop of 470.lbm can be SLP vectorized like this. We have
three parts to it: A fully
2014 Sep 18
2
[LLVMdev] [Vectorization] Mis match in code generated
Hi Nadav,
Thanks for the quick reply !!
OK, so as of now we lack the capability to handle large flat reductions.
I did go through the function vectorizeChainsInBlock() (line number 2862). In
this function, we try to vectorize if we have phi nodes in the IR (several
ifs check for phi nodes), i.e., we try to construct a tree that starts at chains.
Any pointers on how to join multiple trees? I
2013 Oct 24
0
[LLVMdev] Vectorizing alloca instructions
On Thu, Oct 24, 2013 at 2:04 PM, Tom Stellard <tom at stellard.net> wrote:
> Hi,
>
> I've been playing around with the SLPVectorizer trying to get it to
> vectorize this simple program:
>
> define void @vector(i32 addrspace(1)* %out, i32 %index) {
> entry:
> %0 = alloca [4 x i32]
> %x = getelementptr [4 x i32]* %0, i32 0, i32 0
> %y = getelementptr [4
2009 Apr 22
1
[PATCH] allow to disable SLP with runtime option
Hi everyone,
I'd like to propose a patch for review. It enhances rsync when
patched and compiled with slp support.
It adds a new global boolean option, 'disable slp', which can be used to disable
SLP advertisements at runtime. The idea behind this patch is to allow
distributors to build rsync with SLP support compiled in, but to allow
the users to turn it off without recompiling
2015 Nov 11
2
[RFC][SLP] Let's turn -slp-vectorize-hor on by default
We have started this. Since there are some holidays, expect a small delay. Will let you know by Friday.
Thx
Sent from my Windows Phone
________________________________
From: Charlie Turner<mailto:charlesturner7c5 at gmail.com>
Sent: 11/11/2015 6:34 PM
To: Das, Dibyendu<mailto:Dibyendu.Das at amd.com>
Cc: nrotem at apple.com<mailto:nrotem at apple.com>; llvm-dev at
2015 Nov 10
4
[RFC][SLP] Let's turn -slp-vectorize-hor on by default
I will try to get some spec cpu 2006 rate runs done under -O3 -flto with and without -slp-vectorize-hor and let you know.
-Thx
-----Original Message-----
From: nrotem at apple.com [mailto:nrotem at apple.com]
Sent: Tuesday, November 10, 2015 3:33 AM
To: Charlie Turner
Cc: Das, Dibyendu; llvm-dev at lists.llvm.org
Subject: Re: [llvm-dev] [RFC][SLP] Let's turn -slp-vectorize-hor on by default
2020 Oct 02
2
PSLP: Padded SLP Automatic Vectorization
On 9/29/2020 14:37, David Chisnall via llvm-dev wrote:
> On 28/09/2020 15:45, Matt P. Dziubinski via llvm-dev wrote:
>> Hey, I noticed this talk from the EuroLLVM 2015
>> (https://llvm.org/devmtg/2015-04/slides/pslp_slides_EUROLLVM2015.pdf)
>> on the PSLP vectorization algorithm (CGO 2015 paper:
>> http://vporpo.me/papers/pslp_cgo2015.pdf).
>>
>> Is anyone
2015 Nov 09
3
[RFC][SLP] Let's turn -slp-vectorize-hor on by default
I have not. I could feasibly do this, but I'm not set up to perform
good experiments on X86-64 hardware. Furthermore, if I do it for
X86-64, it only seems fair I should do it for the other backends as
well, which is much less feasible for me. I'm reaching out to the
community to see if anyone objects, based on their own
measurements of this feature, to turning it on by default.
Please
2014 Aug 07
3
[LLVMdev] How to broaden the SLP vectorizer's search
The BB vectorizer has an option 'bb-vectorizer-search-limit'. Is there a
similar option for the SLP vectorizer? Or maybe an analysis pass whose
scope can be widened?
I have large basic blocks with instructions that should be merged into
packed versions. However, the blocks are optimized independently from
each other. Now, if the instructions to be merged aren't too far apart
the
2015 Jul 07
2
[LLVMdev] Modifications to SLP
Hi all!
It takes the current SLP vectorizer too long to vectorize my scalar
code. I am talking here about functions that have a single, huge basic
block with O(10^6) instructions. Here's an example:
%0 = getelementptr float* %arg1, i32 49
%1 = load float* %0
%2 = getelementptr float* %arg1, i32 4145
%3 = load float* %2
%4 = getelementptr float* %arg2, i32 49
%5 = load
2015 Nov 09
2
[RFC][SLP] Let's turn -slp-vectorize-hor on by default
I've done compile-time experiments for AArch64 over SPEC{2000,2006}
and of course the test-suite. I measure no significant compile-time
impact of enabling this feature by default.
I also ran the test-suite on an X86-64 machine. I can't imagine any
other targets being uniquely affected in terms of compile-time by
turning this on after testing both AArch64 and X86-64. I also timed
running
2014 Aug 08
2
[LLVMdev] How to broaden the SLP vectorizer's search
Hi Frank,
Thanks for working on this. Please look at vectorizeStoreChains. In this function we process all of the stores in the function in buckets of 16 elements because constructing consecutive stores is implemented using an O(n^2) algorithm. You can try to increase this threshold to 128 and see if it helps.
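The trade-off described above can be illustrated with a small C sketch. This is not the code in vectorizeStoreChains; the function count_consecutive_pairs, its parameters, and the address model are assumptions made up for illustration. It runs a quadratic search for consecutive stores inside each fixed-size bucket, so the cost stays bounded, but pairs that straddle a bucket boundary are never found.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: addr[i] is the byte address written by store i.
   Within each bucket of `chunk` stores, count pairs (i, j) where store j
   writes the element immediately after store i. The search is O(chunk^2)
   per bucket instead of O(n^2) over the whole function. */
size_t count_consecutive_pairs(const long *addr, size_t n,
                               size_t chunk, long elem_size) {
    assert(chunk > 0);
    size_t found = 0;
    for (size_t begin = 0; begin < n; begin += chunk) {
        size_t end = (begin + chunk < n) ? begin + chunk : n;
        for (size_t i = begin; i < end; i++)
            for (size_t j = begin; j < end; j++)
                if (addr[j] == addr[i] + elem_size)
                    found++;    /* store j directly follows store i */
    }
    return found;
}
```

For example, with addresses {0, 4, 8, 12, 100, 104} and 4-byte elements, a bucket size of 2 misses the pair (4, 8) because the two stores fall into different buckets, while a bucket size of 4 or more finds every consecutive pair. A larger bucket finds more candidates at the price of more comparisons per bucket.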
I also agree with Renato and Chad that adding a flag to tell the SLP-vectorizer to
2002 Sep 22
2
[patch] SLP support (+ question)
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1
G'day
The attached patch adds "first draft" support for service location protocol,
using OpenSLP (http://www.openslp.org). This allows you to automagically
discover all the rsync servers on your network (which is defined in terms of
your SLP configuration - typically equal to multicast scope, but you can
change it around with
2018 Feb 08
0
[RFC] Make LoopVectorize Aware of SLP Operations
Hi Florian!
This proposal sounds pretty exciting! Integrating SLP-aware loop vectorization (or the other way around) and SLP into the VPlan framework is definitely aligned with the long term vision and we would prefer this approach to the LoopReroll and InstCombine alternatives that you mentioned. We prefer a generic implementation that can handle complicated cases to something ad-hoc for some
2014 Mar 17
2
[LLVMdev] Improving SLPVectorizer for Julia
I'm working on some small improvements to SLPVectorizer.cpp so that it can deal with some tuple operations arising from Julia code. Being fairly new to LLVM, I could use some advice, particularly from those familiar with the internals of SLPVectorizer.
The motivation can be found in the Julia discussion https://github.com/JuliaLang/julia/issues/5857 . Here is an example of the kind of LLVM