similar to: [RFC][SLP] Let's turn -slp-vectorize-hor on by default

Displaying 20 results from an estimated 7000 matches similar to: "[RFC][SLP] Let's turn -slp-vectorize-hor on by default"

2015 Nov 09
3
[RFC][SLP] Let's turn -slp-vectorize-hor on by default
I have not. I could feasibly do this, but I'm not set up to perform good experiments on X86-64 hardware. Furthermore, if I do it for X86-64, it only seems fair I should do it for the other backends as well, which is much less feasible for me. I'm reaching out the community to see if there's any objection based on their own measurements of this feature about defaulting it to on. Please
2015 Nov 11
2
[RFC][SLP] Let's turn -slp-vectorize-hor on by default
We have started this. Since there are some holidays expect a small delay. Will let you know by Friday. Thx Sent from my Windows Phone ________________________________ From: Charlie Turner<mailto:charlesturner7c5 at gmail.com> Sent: ‎11/‎11/‎2015 6:34 PM To: Das, Dibyendu<mailto:Dibyendu.Das at amd.com> Cc: nrotem at apple.com<mailto:nrotem at apple.com>; llvm-dev at
2015 Nov 10
4
[RFC][SLP] Let's turn -slp-vectorize-hor on by default
I will try to get some spec cpu 2006 rate runs done under -O3 -flto with and without -slp-vectorize-hor and let you know. -Thx -----Original Message----- From: nrotem at apple.com [mailto:nrotem at apple.com] Sent: Tuesday, November 10, 2015 3:33 AM To: Charlie Turner Cc: Das, Dibyendu; llvm-dev at lists.llvm.org Subject: Re: [llvm-dev] [RFC][SLP] Let's turn -slp-vectorize-hor on by default
2014 Nov 10
2
[LLVMdev] [Vectorization] Mis match in code generated
Hi Suyog, Thanks for looking at this. This has recently got itself onto my TODO list too. > I am not sure how much all this will improve the code quality for horizontal reduction > (donno how frequently such pattern of horizontal reduction from same array occurs in real world/SPECS). Actually the main loop of 470.lbm can be SLP vectorized like this. We have three parts to it: A fully
2014 Sep 19
3
[LLVMdev] [Vectorization] Mis match in code generated
Hi Arnold, Thanks for your reply. I tried test case as suggested by you. *void foo(int *a, int *sum) {*sum = a[0]+a[1]+a[2]+a[3]+a[4]+a[5]+a[6]+a[7]+a[8]+a[9]+a[10]+a[11]+a[12]+a[13]+a[14]+a[15];}* so that it has a 'store' in its IR. *IR before vectorization :*target datalayout = "e-m:e-p:32:32-f64:32:64-f80:32-n8:16:32-S128" target triple =
2014 Nov 28
5
[LLVMdev] [RFC] Removing BBVectorize?
Hi Everyone, I propose that we remove BBVectorize from trunk. Here's why: - It never made it from "interesting experiment" to "production quality" (it is not part of any in-tree optimization pipeline). - We now have an SLP vectorizer that we do use in production, had have for some time. - BBVectorize otherwise needs refactoring, and the implementation has lots of
2014 Sep 18
2
[LLVMdev] [Vectorization] Mis match in code generated
Hi Nadav, Thanks for the quick reply !! Ok, so as of now we are lacking capability to handle flat large reductions. I did go through function vectorizeChainsInBlock() (line number 2862). In this function, we try to vectorize if we have phi nodes in the IR (several if's check for phi nodes) i.e we try to construct tree that starts at chains. Any pointers on how to join multiple trees? I
2013 Nov 07
1
[LLVMdev] SLP vectorizer turned on in commit r190916 which says nothing about it - how to turn it off?
Revision 190916 Commit message: "Lift alignment restrictions for load/store folding on VINSERTF128/VEXTRACTF128. Fixes PR17268." Actual contents of the commit includes Index: tools/opt/opt.cpp =================================================================== --- tools/opt/opt.cpp (revision 190915) +++ tools/opt/opt.cpp (revision 190916) @@ -462,6 +462,7 @@
2013 Jul 27
2
[LLVMdev] [llvm] r187267 - SLP Vectorier: Don't vectorize really short chains because they are already handled by the SelectionDAG store-vectorizer, which does a better job in deciding when to vectorize.
Hey Nadav, I'd humbly suggest that rather than use 3 directly, you should add a shared constant between these two passes, so when one changes, the other doesn't need to be updated. It would also ensure this bit of info about what needs to be updated isn't only contained in the comments.. On Fri, Jul 26, 2013 at 4:07 PM, Nadav Rotem <nrotem at apple.com> wrote: > Author:
2017 Aug 16
2
Heroic LLVM optimizations
Hi Tobias- The loop fusion you mention is the one in libquantum/cpu2006 ? Or something else in cpu2017 ? -Thx Dibyendu -----Original Message----- From: llvm-dev [mailto:llvm-dev-bounces at lists.llvm.org] On Behalf Of Tobias Grosser via llvm-dev Sent: Wednesday, August 16, 2017 10:10 AM To: renau at uncore.io; llvm-dev at lists.llvm.org Subject: Re: [llvm-dev] Heroic LLVM optimizations Hi
2013 Jul 27
0
[LLVMdev] [llvm] r187267 - SLP Vectorier: Don't vectorize really short chains because they are already handled by the SelectionDAG store-vectorizer, which does a better job in deciding when to vectorize.
Hi Daniel, Maybe my commit message was not clear. The idea is that the SelectionDAG store vectorizer can only handle pairs. So, the number three means "more than a pair". Thanks, Nadav Sent from my iPhone > On Jul 26, 2013, at 17:48, Daniel Berlin <dberlin at dberlin.org> wrote: > > Hey Nadav, > I'd humbly suggest that rather than use 3 directly, you should
2013 Jul 27
1
[LLVMdev] [llvm] r187267 - SLP Vectorier: Don't vectorize really short chains because they are already handled by the SelectionDAG store-vectorizer, which does a better job in deciding when to vectorize.
Hi Nadav, Okay. 1. The comment doesn't make this clear. I would suggest, at a minimum, updating it to mention pairs specifically, to avoid the issue in #2 2. If the day comes when the selectiondag store vectorizer handles more than pairs, and does so better, is anyone really going to remember this random 3 exists in the other vectorizer? I would posit, based on experience, the answer is
2017 Aug 16
1
Heroic LLVM optimizations
I'll be interested in seeing the improvements. As a reference, this is what I get in an Intel 6700K when I compare gcc 5.4 (Ofast flto) vs published Intel results. 23x in libquantum, and over 40% in many benchmarks. I think that it is mostly from AoS vs SoA and loop transformations. 5.4
2014 Sep 18
2
[LLVMdev] [Vectorization] Mis match in code generated
Hi, I am trying to understand LLVM vectorization implementation and was looking into both loop and SLP vectorization. test case 1: *int foo(int *a) {int sum = 0,i;for(i=0; i<16; i++) sum += a[i];return sum;}* This code is vectorized by loop vectorizer where we calculate scalar loop cost as 4 and vector loop cost as 2. Since vector loop cost is less and above reduction is legal to
2015 Jul 01
3
[LLVMdev] SLP vectorizer on AVX feature
On 1 July 2015 at 21:22, Frank Winter <fwinter at jlab.org> wrote: > there were two follow-up emails. I only got one... weird... > The issue is solved. The SLP vectorizer has > a magic number built into the code which determines the max. vector length > to search for. That was set to 128 bits. Increasing it to 256 bits solved > the issue. That looks like a simple fix. Is
2020 Oct 02
2
PSLP: Padded SLP Automatic Vectorization
On 9/29/2020 14:37, David Chisnall via llvm-dev wrote: > On 28/09/2020 15:45, Matt P. Dziubinski via llvm-dev wrote: >> Hey, I noticed this talk from the EuroLLVM 2015 >> (https://llvm.org/devmtg/2015-04/slides/pslp_slides_EUROLLVM2015.pdf) >> on the PSLP vectorization algorithm (CGO 2015 paper: >> http://vporpo.me/papers/pslp_cgo2015.pdf). >> >> Is anyone
2015 Jul 07
2
[LLVMdev] Modifications to SLP
Hi all! It takes the current SLP vectorizer too long to vectorize my scalar code. I am talking here about functions that have a single, huge basic block with O(10^6) instructions. Here's an example: %0 = getelementptr float* %arg1, i32 49 %1 = load float* %0 %2 = getelementptr float* %arg1, i32 4145 %3 = load float* %2 %4 = getelementptr float* %arg2, i32 49 %5 = load
2020 Sep 28
2
PSLP: Padded SLP Automatic Vectorization
Hey, I noticed this talk from the EuroLLVM 2015 (https://llvm.org/devmtg/2015-04/slides/pslp_slides_EUROLLVM2015.pdf) on the PSLP vectorization algorithm (CGO 2015 paper: http://vporpo.me/papers/pslp_cgo2015.pdf). Is anyone working on implementing it? If so, are there Phab reviews I can subscribe to? Best, Matt
2014 Aug 07
3
[LLVMdev] How to broaden the SLP vectorizer's search
The BB vectorizer has an option 'bb-vectorizer-search-limit'. Is there a similar option for the SLP vectorizer? Maybe an analysis pass' scope that can be widen? I have large basic blocks with instructions that should be merged into packed versions. However, the blocks are optimized independently from each other. Now, if the instructions to be merged aren't too far apart the
2014 Aug 08
2
[LLVMdev] How to broaden the SLP vectorizer's search
Hi Frank, Thanks for working on this. Please look at vectorizeStoreChains. In this function we process all of the stores in the function in buckets of 16 elements because constructing consecutive stores is implemented using an O(n^2) algorithm. You can try to increase this threshold to 128 and see if it helps. I also agree with Renato and Chad that adding a flag to tell the SLP-vectorizer to