Results similar to: "Loop Vectorize: Testing cost model driven transformations"

Displaying 20 results from an estimated 9000 matches.

2016 Nov 29
2
Loop Vectorize: Testing cost model driven transformations
On Tue, Nov 29, 2016 at 5:11 PM, Adam Nemet via llvm-dev < llvm-dev at lists.llvm.org> wrote: > Do we need a new (loop-vectorizer-specific) command line option for this? > Don’t we get the default TTI if the target is unspecified in the test? I think you're right! It looks like I am getting the default TTI when the target is left unspecified. I was assuming it would default to
2016 Nov 30
1
Loop Vectorize: Testing cost model driven transformations
Yeah, this makes a lot of sense; -mcpu=generic (as opposed to -mcpu=native) is the sane default. I guess I was just expecting an x86 host to get a "generic x86 TTI" (whatever that means), not a "generic TTI". On Wed, Nov 30, 2016 at 11:49 AM, Matthew Simpson <mssimpso at codeaurora.org> wrote: > That's right. In your example, if the target isn't specified
2016 Nov 30
0
Loop Vectorize: Testing cost model driven transformations
Right, let's say what we get from llc --version is:

Default target: x86_64-unknown-linux-gnu
Host CPU: haswell

So, what we currently do is use the default target (which is normally the host target), but ignore the host cpu? Michael On Wed, Nov 30, 2016 at 10:58 AM, Matthew Simpson <mssimpso at codeaurora.org> wrote: > > On Wed, Nov 30, 2016 at 1:04 PM, Michael Kuperstein
2016 Nov 30
0
Loop Vectorize: Testing cost model driven transformations
Thanks Matt! So, just to make sure I understand, what is getting a specific TTI in llc triggered off? -mcpu? On Tue, Nov 29, 2016 at 2:49 PM, Matthew Simpson via llvm-dev < llvm-dev at lists.llvm.org> wrote: > > On Tue, Nov 29, 2016 at 5:11 PM, Adam Nemet via llvm-dev < > llvm-dev at lists.llvm.org> wrote: > >> Do we need a new (loop-vectorizer-specific) command
2016 Dec 02
2
Loop Vectorize: Testing cost model driven transformations
It isn't relevant, really; Matt just brought up "llc --version" as a way to show the default triple and native CPU. The same question ("Which TTI do/should we get with -mcpu=generic / when not providing -mcpu at all?") applies to opt. On Fri, Dec 2, 2016 at 9:53 AM, Adam Nemet <anemet at apple.com> wrote: > Why is llc relevant to this thread, is this just an
2016 Nov 30
2
Loop Vectorize: Testing cost model driven transformations
On Wed, Nov 30, 2016 at 1:04 PM, Michael Kuperstein via llvm-dev < llvm-dev at lists.llvm.org> wrote: > So, just to make sure I understand, what is getting a specific TTI in llc > triggered off? -mcpu? Right, TTI would be determined by the target specified in the IR or set explicitly with the -m flags. My understanding is that if the target is left unspecified in the IR and not set
2016 Nov 30
3
Loop Vectorize: Testing cost model driven transformations
That's right. In your example, if the target isn't specified anywhere, an llc invocation would be equivalent to "llc -mtriple=x86_64-unknown-linux-gnu -mcpu=generic". TTI queries (in e.g., CodeGenPrepare) would be based on this. From opt, if the target triple is left unspecified, we will use the "base" TTI implementation (not x86). -- Matt On Wed, Nov 30, 2016 at 2:07
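To make that concrete, here is a minimal lit-style sketch (the kernel and RUN lines are illustrative and use the legacy pass-manager spelling of the era): the first RUN line pins the triple and CPU and therefore exercises the x86 TTI, while the second leaves both unspecified and falls back to the base TTI.

; RUN: opt -S -loop-vectorize -mtriple=x86_64-unknown-linux-gnu -mcpu=generic < %s
; RUN: opt -S -loop-vectorize < %s

define void @add(i32* %a, i32* %b, i64 %n) {
entry:
  br label %loop

loop:
  %i = phi i64 [ 0, %entry ], [ %i.next, %loop ]
  %pa = getelementptr inbounds i32, i32* %a, i64 %i
  %pb = getelementptr inbounds i32, i32* %b, i64 %i
  %va = load i32, i32* %pa
  %vb = load i32, i32* %pb
  %sum = add i32 %va, %vb
  store i32 %sum, i32* %pa
  %i.next = add nuw nsw i64 %i, 1
  %cond = icmp eq i64 %i.next, %n
  br i1 %cond, label %exit, label %loop

exit:
  ret void
}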
2014 Dec 13
2
[LLVMdev] Vectorization factor limitation in Loop Vectorizer
So IMO, if we modify the VF calculation for targets/subtargets using TTI where a higher VF is supported, the vectorizer’s scope will become wider. Did/do you foresee any issues with this? Thanks, Shahid From: Nadav Rotem [mailto:nrotem at apple.com] Sent: Saturday, December 13, 2014 2:47 AM To: Shahid, Asghar-ahmad Cc: llvmdev at cs.uiuc.edu Subject: Re: [LLVMdev] Vectorization factor limitation in
2018 Feb 06
2
[RFC] Make LoopVectorize Aware of SLP Operations
Hello, We would like to propose making LoopVectorize aware of SLP operations, to improve the generated code for loops operating on struct fields or doing complex math. At the moment, LoopVectorize uses interleaving to vectorize loops that operate on values loaded/stored from consecutive addresses: vector loads/stores are generated to combine consecutive loads/stores and then shufflevector
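For illustration, a sketch (not taken from the RFC) of the kind of loop this targets, operating on the two fields of a struct: with the current interleaving approach, the vectorizer emits wide loads/stores that cover both fields and shufflevectors to split and recombine them.

%pair = type { float, float }

define void @scale(%pair* %p, i64 %n) {
entry:
  br label %loop

loop:
  %i = phi i64 [ 0, %entry ], [ %i.next, %loop ]
  %re.addr = getelementptr inbounds %pair, %pair* %p, i64 %i, i32 0
  %im.addr = getelementptr inbounds %pair, %pair* %p, i64 %i, i32 1
  %re = load float, float* %re.addr
  %im = load float, float* %im.addr
  %re2 = fmul float %re, 2.000000e+00
  %im2 = fmul float %im, 2.000000e+00
  store float %re2, float* %re.addr
  store float %im2, float* %im.addr
  %i.next = add nuw nsw i64 %i, 1
  %cond = icmp eq i64 %i.next, %n
  br i1 %cond, label %exit, label %loop

exit:
  ret void
}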
2016 Aug 05
2
enabling interleaved access loop vectorization
Regarding InterleavedAccessPass - sure, but proper strided/interleaved access optimization ought to have a positive impact even without target support. Case in point - Hal enabled it on PPC last September. An important difference vs. x86 seems to be that arbitrary shuffles are cheap on PPC, but, as I said below, I hope we can enable it on x86 with a conservative cost function, and still get
2016 Jun 15
3
[Proposal][RFC] Strided Memory Access Vectorization
Sorry for the spam. Copy-paste didn't capture the Subject properly. Resending with the correct Subject so that the thread is captured properly. -----Original Message----- From: Saito, Hideki Sent: Wednesday, June 15, 2016 1:39 PM To: 'llvm-dev at lists.llvm.org' <llvm-dev at lists.llvm.org> Subject: RE: [llvm-dev] [Proposal][RFC] Strided Memory Access Ashutosh, First,
2016 Jun 18
2
[Proposal][RFC] Strided Memory Access Vectorization
>Vectorizer's output should be as clean as vector code can be so that analyses and optimizers downstream can do a great job optimizing. Guess I should clarify this philosophical position of mine. In terms of vector code optimization that complicates the output of the vectorizer: if the vectorizer is the best place to perform the optimization, it should do so. This includes cases like
2016 Aug 05
3
enabling interleaved access loop vectorization
Hi Michael, Some time back I did some experiments with the interleave vectorizer and did not find any degradation; probably my tests/benchmarks are not extensive enough to cover much. Elina is the right person to comment on it, as she has already experienced cases where it hinders performance. For the interleave vectorizer on X86 we do not have any specific costing; it goes to BasicTTI, where the costing is not
2016 Jun 30
1
[Proposal][RFC] Strided Memory Access Vectorization
As a strong advocate of logical vector representation, I'm counting on the community liking Michael's RFC and on it proceeding sooner rather than later. I plan to pitch in (e.g., perf experiments). >Probably can depend on the support provided by below RFC by Michael: > "Allow loop vectorizer to choose vector widths that generate illegal types" >In that case Loop Vectorizer will
2016 Jun 30
0
[Proposal][RFC] Strided Memory Access Vectorization
One common concern raised for cases where the Loop Vectorizer generates bigger types than the target supports: based on the VF, we currently check the cost and generate the expected set of instruction[s] for the bigger type. This has two challenges: for bigger types the cost is not always correct, and code generation may not produce efficient instruction[s]. This can probably depend on the support provided by the RFC below by
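As an illustration of the kind of access this thread is about (a sketch, not code from the discussion): only every second element of %src is read, so a naive widening needs an over-wide load plus a shuffle, or a gather, which is exactly where the cost and code generation concerns above arise.

define void @stride2(i32* %src, i32* %dst, i64 %n) {
entry:
  br label %loop

loop:
  %i = phi i64 [ 0, %entry ], [ %i.next, %loop ]
  %idx = shl nuw nsw i64 %i, 1                              ; index 2 * i
  %s.addr = getelementptr inbounds i32, i32* %src, i64 %idx
  %d.addr = getelementptr inbounds i32, i32* %dst, i64 %i
  %v = load i32, i32* %s.addr
  store i32 %v, i32* %d.addr
  %i.next = add nuw nsw i64 %i, 1
  %cond = icmp eq i64 %i.next, %n
  br i1 %cond, label %exit, label %loop

exit:
  ret void
}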
2018 Feb 08
0
[RFC] Make LoopVectorize Aware of SLP Operations
Hi Florian! This proposal sounds pretty exciting! Integrating SLP-aware loop vectorization (or the other way around) and SLP into the VPlan framework is definitely aligned with the long term vision and we would prefer this approach to the LoopReroll and InstCombine alternatives that you mentioned. We prefer a generic implementation that can handle complicated cases to something ad-hoc for some
2016 May 09
2
Some questions about phase ordering in OPT and LLC
On Mon, May 09, 2016 at 01:07:07PM -0700, Mehdi Amini via llvm-dev wrote: > > > On May 9, 2016, at 10:43 AM, Ricardo Nobre via llvm-dev <llvm-dev at lists.llvm.org> wrote: > > > > Hi, > > > > I'm a PhD student doing phase ordering as part of my PhD topic and I would like to ask some questions about LLVM. > > > > Executing the following
2013 Jan 09
0
[LLVMdev] ARM vectorizer cost model
Hi Renato, > I'm interested in knowing how you'll work up the ARM cost model and how easy it'd be to split the work. Yes, I am starting to work on the ARM cost model and I would appreciate any help in the form of: advice, performance measurements, patches, etc. I tune the cost model by running the cost model analysis pass and I compare the output of the analysis to the output
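For reference, a sketch of that workflow (the triple, CPU, and kernel are illustrative): run the cost model analysis pass over a candidate kernel for the target of interest and inspect the per-instruction costs it prints.

; RUN: opt -cost-model -analyze -mtriple=armv7-unknown-linux-gnueabihf -mcpu=cortex-a9 < %s

define <4 x i32> @mul_v4i32(<4 x i32> %a, <4 x i32> %b) {
  %r = mul <4 x i32> %a, %b
  ret <4 x i32> %r
}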
2014 Dec 11
2
[LLVMdev] Vectorization factor limitation in Loop Vectorizer
Hi Nadav/Devs, I am exploring the Loop Vectorizer to vectorize i8 scalar operations into an 8 x i8 vector operation. I was expecting the Loop Vectorizer to analyze the profitability for a vectorization factor (VF) of 8; however, it is not doing so due to the widest type calculation done for the blocks inside the loop. Maybe I am missing something, but I am curious to know why the Loop Vectorizer limits the
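One hypothetical shape of the issue (names and types are illustrative, not taken from the thread): the loads here are i8, but the value is widened and stored as i32, so the widest-type calculation over the loop body sees i32 and derives the candidate VF from that rather than from the i8 accesses.

define void @widen(i8* %src, i32* %dst, i64 %n) {
entry:
  br label %loop

loop:
  %i = phi i64 [ 0, %entry ], [ %i.next, %loop ]
  %s.addr = getelementptr inbounds i8, i8* %src, i64 %i
  %d.addr = getelementptr inbounds i32, i32* %dst, i64 %i
  %v = load i8, i8* %s.addr
  %v.ext = zext i8 %v to i32
  store i32 %v.ext, i32* %d.addr
  %i.next = add nuw nsw i64 %i, 1
  %cond = icmp eq i64 %i.next, %n
  br i1 %cond, label %exit, label %loop

exit:
  ret void
}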
2015 Jan 19
2
[LLVMdev] Vectorization Cost Models and Multi-Instruction Patterns?
Hi all, While tinkering with saturation instructions, I hit problems with the cost model calculations. The loop vectorizer's cost model accumulates the individual TTI cost of each instruction. For saturating arithmetic, this is a gross overestimate, since you have 2 sexts (inputs), 2 icmps + 2 selects (for the saturation), and a truncate (output); these all fold away. With an intrinsic,
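A sketch of the scalar pattern being described (illustrative IR for an i16 saturating add built out of i32 arithmetic); the loop vectorizer sums the TTI cost of each of these instructions even though a target may be able to match the whole sequence to a single saturating-add instruction.

define i16 @sadd_sat_i16(i16 %a, i16 %b) {
  %a.ext = sext i16 %a to i32
  %b.ext = sext i16 %b to i32
  %sum = add i32 %a.ext, %b.ext
  %too.big = icmp sgt i32 %sum, 32767
  %clamp.hi = select i1 %too.big, i32 32767, i32 %sum
  %too.small = icmp slt i32 %clamp.hi, -32768
  %clamp = select i1 %too.small, i32 -32768, i32 %clamp.hi
  %res = trunc i32 %clamp to i16
  ret i16 %res
}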