Displaying 20 results from an estimated 9000 matches similar to: "Loop Vectorize: Testing cost model driven transformations"
2016 Nov 29
2
Loop Vectorize: Testing cost model driven transformations
On Tue, Nov 29, 2016 at 5:11 PM, Adam Nemet via llvm-dev <
llvm-dev at lists.llvm.org> wrote:
> Do we need a new (loop-vectorizer-specific) command line option for this?
> Don’t we get the default TTI if the target is unspecified in the test?
I think you're right! It looks like I am getting the default TTI when the
target is left unspecified. I was assuming it would default to
2016 Nov 30
1
Loop Vectorize: Testing cost model driven transformations
Yeah, this makes a lot of sense, -mcpu=generic (as opposed to -mcpu=native)
is the sane default.
I guess I was just expecting an x86 host to get a "generic x86 TTI"
(whatever that means), not a "generic TTI".
On Wed, Nov 30, 2016 at 11:49 AM, Matthew Simpson <mssimpso at codeaurora.org>
wrote:
> That's right. In your example, if the target isn't specified
2016 Nov 30
0
Loop Vectorize: Testing cost model driven transformations
Right, let's say what we get from llc --version is:
Default target: x86_64-unknown-linux-gnu
Host CPU: haswell
So, what we currently do is use the default target (which is normally the
host target), but ignore the host cpu?
Michael
On Wed, Nov 30, 2016 at 10:58 AM, Matthew Simpson <mssimpso at codeaurora.org>
wrote:
>
> On Wed, Nov 30, 2016 at 1:04 PM, Michael Kuperstein
2016 Nov 30
0
Loop Vectorize: Testing cost model driven transformations
Thanks Matt!
So, just to make sure I understand, what is getting a specific TTI in llc
triggered off of? -mcpu?
On Tue, Nov 29, 2016 at 2:49 PM, Matthew Simpson via llvm-dev <
llvm-dev at lists.llvm.org> wrote:
>
> On Tue, Nov 29, 2016 at 5:11 PM, Adam Nemet via llvm-dev <
> llvm-dev at lists.llvm.org> wrote:
>
>> Do we need a new (loop-vectorizer-specific) command
2016 Dec 02
2
Loop Vectorize: Testing cost model driven transformations
It isn't really relevant; Matt just brought up "llc --version" as a way to
show the default triple and native CPU.
The same question ("Which TTI do/should we get with -mcpu=generic / when
not providing -mcpu at all") applies to opt.
On Fri, Dec 2, 2016 at 9:53 AM, Adam Nemet <anemet at apple.com> wrote:
> Why is llc relevant to this thread, is this just an
2016 Nov 30
2
Loop Vectorize: Testing cost model driven transformations
On Wed, Nov 30, 2016 at 1:04 PM, Michael Kuperstein via llvm-dev <
llvm-dev at lists.llvm.org> wrote:
> So, just to make sure I understand, what is getting a specific TTI in llc
> triggered off? -mcpu?
Right, TTI would be determined by the target specified in the IR or set
explicitly with the -m flags. My understanding is that if the target is
left unspecified in the IR and not set
2016 Nov 30
3
Loop Vectorize: Testing cost model driven transformations
That's right. In your example, if the target isn't specified anywhere, an
llc invocation would be equivalent to "llc
-mtriple=x86_64-unknown-linux-gnu -mcpu=generic". TTI queries (in e.g.,
CodeGenPrepare) would be based on this. From opt, if the target triple is
left unspecified, we will use the "base" TTI implementation (not x86).
-- Matt
On Wed, Nov 30, 2016 at 2:07
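To make this concrete, here is a minimal sketch of the kind of loop-vectorizer lit test being discussed (the function, the haswell CPU choice, and the RUN-line layout are invented for illustration, and the legacy pass-manager syntax of that era is assumed): the first RUN line leaves the target unspecified, so opt falls back to the base TTI, while the second pins an x86 subtarget so the cost-model queries go through the x86 TTI.
    ; RUN: opt -S -loop-vectorize < %s
    ; RUN: opt -S -loop-vectorize -mtriple=x86_64-unknown-linux-gnu -mcpu=haswell < %s
    ; (FileCheck lines omitted in this sketch.)
    define void @add(float* noalias %a, float* noalias %b, i64 %n) {
    entry:
      br label %loop
    loop:
      %i = phi i64 [ 0, %entry ], [ %i.next, %loop ]
      %pa = getelementptr inbounds float, float* %a, i64 %i
      %pb = getelementptr inbounds float, float* %b, i64 %i
      %va = load float, float* %pa
      %vb = load float, float* %pb
      %sum = fadd float %va, %vb
      store float %sum, float* %pa
      %i.next = add nuw nsw i64 %i, 1
      %cmp = icmp ult i64 %i.next, %n
      br i1 %cmp, label %loop, label %exit
    exit:
      ret void
    }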
2014 Dec 13
2
[LLVMdev] Vectorization factor limitation in Loop Vectorizer
So IMO, if we modify the VF calculation via TTI for targets/subtargets where a higher VF is supported,
the vectorizer's scope will become wider.
Do you foresee any issues with this?
Thanks,
Shahid
From: Nadav Rotem [mailto:nrotem at apple.com]
Sent: Saturday, December 13, 2014 2:47 AM
To: Shahid, Asghar-ahmad
Cc: llvmdev at cs.uiuc.edu
Subject: Re: [LLVMdev] Vectorization factor limitation in
2018 Feb 06
2
[RFC] Make LoopVectorize Aware of SLP Operations
Hello,
We would like to propose making LoopVectorize aware of SLP operations,
to improve the generated code for loops operating on struct fields or
doing complex math.
At the moment, LoopVectorize uses interleaving to vectorize loops that
operate on values loaded/stored from consecutive addresses: vector
loads/stores are generated to combine consecutive loads/stores and then
shufflevector
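For reference, a minimal sketch (types and names invented) of the wide-load-plus-shufflevector pattern described above, de-interleaving the two fields of something like struct { float re, im; }:
    define void @deinterleave(<8 x float>* %p, <4 x float>* %re.out, <4 x float>* %im.out) {
      ; One wide load covers four consecutive { re, im } pairs...
      %wide = load <8 x float>, <8 x float>* %p, align 4
      ; ...and shufflevectors split out the even (re) and odd (im) lanes.
      %re = shufflevector <8 x float> %wide, <8 x float> undef, <4 x i32> <i32 0, i32 2, i32 4, i32 6>
      %im = shufflevector <8 x float> %wide, <8 x float> undef, <4 x i32> <i32 1, i32 3, i32 5, i32 7>
      store <4 x float> %re, <4 x float>* %re.out, align 16
      store <4 x float> %im, <4 x float>* %im.out, align 16
      ret void
    }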
2016 Aug 05
2
enabling interleaved access loop vectorization
Regarding InterleavedAccessPass - sure, but proper strided/interleaved
access optimization ought to have a positive impact even without target
support.
Case in point - Hal enabled it on PPC last September. An important
difference vs. x86 seems to be that arbitrary shuffles are cheap on PPC,
but, as I said below, I hope we can enable it on x86 with a conservative
cost function, and still get
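For anyone wanting to reproduce such experiments, a sketch of a stride-2 loop plus RUN line, assuming the loop vectorizer's -enable-interleaved-mem-accesses option (which, to my understanding, turns the feature on without relying on the target's default); the loop itself is made up:
    ; RUN: opt -S -loop-vectorize -enable-interleaved-mem-accesses \
    ; RUN:     -mtriple=x86_64-unknown-linux-gnu -mcpu=haswell < %s
    ; b[i] = a[2*i] + a[2*i+1]: a stride-2 (interleaved) load pattern.
    define void @pairs(i32* noalias %a, i32* noalias %b, i64 %n) {
    entry:
      br label %loop
    loop:
      %i  = phi i64 [ 0, %entry ], [ %i.next, %loop ]
      %j  = shl i64 %i, 1
      %j1 = or i64 %j, 1
      %p0 = getelementptr inbounds i32, i32* %a, i64 %j
      %p1 = getelementptr inbounds i32, i32* %a, i64 %j1
      %v0 = load i32, i32* %p0
      %v1 = load i32, i32* %p1
      %s  = add i32 %v0, %v1
      %pb = getelementptr inbounds i32, i32* %b, i64 %i
      store i32 %s, i32* %pb
      %i.next = add nuw nsw i64 %i, 1
      %cmp = icmp ult i64 %i.next, %n
      br i1 %cmp, label %loop, label %exit
    exit:
      ret void
    }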
2016 Jun 15
3
[Proposal][RFC] Strided Memory Access Vectorization
Sorry for the spam. Copy-paste didn't capture the Subject properly. Resending with the correct Subject so that the thread is captured properly.
-----Original Message-----
From: Saito, Hideki
Sent: Wednesday, June 15, 2016 1:39 PM
To: 'llvm-dev at lists.llvm.org' <llvm-dev at lists.llvm.org>
Subject: RE: [llvm-dev] [Proposal][RFC] Strided Memory Access
Ashutosh,
First,
2016 Jun 18
2
[Proposal][RFC] Strided Memory Access Vectorization
>Vectorizer's output should be as clean as vector code can be so that analyses and optimizers downstream can
>do a great job optimizing.
Guess I should clarify this philosophical position of mine. In terms of vector
code optimizations that complicate the output of the vectorizer:
if the vectorizer is the best place to perform the optimization, it should do so.
This includes the cases like
2016 Aug 05
3
enabling interleaved access loop vectorization
Hi Michael,
Some time back I did some experiments with the interleave vectorizer and did not find any degradation;
probably my tests/benchmarks are not extensive enough to cover much.
Elina is the right person to comment on it, as she has already seen cases where it hinders performance.
For the interleave vectorizer on X86 we do not have any specific costing; it goes to BasicTTI, where the costing is not
2016 Jun 30
1
[Proposal][RFC] Strided Memory Access Vectorization
As a strong advocate of logical vector representation, I'm counting on the community liking Michael's RFC and on it proceeding sooner rather than later.
I plan to pitch in (e.g., perf experiments).
>We can probably depend on the support provided by the RFC below by Michael:
> "Allow loop vectorizer to choose vector widths that generate illegal types"
>In that case the Loop Vectorizer will
2016 Jun 30
0
[Proposal][RFC] Strided Memory Access Vectorization
One common concern raised for cases where the Loop Vectorizer generates
bigger types than the target supports: based on the VF, we currently check
the cost and generate the expected set of instruction[s] for the bigger
type. This has two challenges for bigger types: the cost is not always
correct, and code generation may not generate efficient instruction[s].
We can probably depend on the support provided by the RFC below by
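As a concrete illustration of "bigger types than the target supports", assuming a target with only 128-bit vector registers (e.g., plain SSE2); the function is invented for illustration:
    define <16 x i32> @wide_add(<16 x i32> %a, <16 x i32> %b) {
      ; With only 128-bit vector registers, this single IR add is legalized into
      ; four <4 x i32> adds; the concern above is that for such wider-than-legal
      ; types the cost estimate is not always right and the generated code is
      ; not always efficient.
      %sum = add <16 x i32> %a, %b
      ret <16 x i32> %sum
    }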
2018 Feb 08
0
[RFC] Make LoopVectorize Aware of SLP Operations
Hi Florian!
This proposal sounds pretty exciting! Integrating SLP-aware loop vectorization (or the other way around) and SLP into the VPlan framework is definitely aligned with the long term vision and we would prefer this approach to the LoopReroll and InstCombine alternatives that you mentioned. We prefer a generic implementation that can handle complicated cases to something ad-hoc for some
2016 May 09
2
Some questions about phase ordering in OPT and LLC
On Mon, May 09, 2016 at 01:07:07PM -0700, Mehdi Amini via llvm-dev wrote:
>
> > On May 9, 2016, at 10:43 AM, Ricardo Nobre via llvm-dev <llvm-dev at lists.llvm.org> wrote:
> >
> > Hi,
> >
> > I'm a PhD student doing phase ordering as part of my PhD topic and I would like to ask some questions about LLVM.
> >
> > Executing the following
2013 Jan 09
0
[LLVMdev] ARM vectorizer cost model
Hi Renato,
> I'm interested in knowing how you'll work up the ARM cost model and how easy it'd be to split the work.
Yes, I am starting to work on the ARM cost model and I would appreciate any help in the form of: advice, performance measurements, patches, etc.
I tune the cost model by running the cost model analysis pass and I compare the output of the analysis to the output
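For context, a sketch of that workflow, assuming the legacy "opt -cost-model -analyze" invocation used by the in-tree CostModel tests of that era; the target, CPU, and tiny function are illustrative:
    ; Run, for example, with:
    ;   opt < costs.ll -cost-model -analyze -mtriple=thumbv7-apple-ios6.0.0 -mcpu=swift
    ; and compare the printed "Cost Model: Found an estimated cost of ..." lines
    ; against measurements on real hardware.
    define <4 x i32> @add_v4i32(<4 x i32> %a, <4 x i32> %b) {
      %c = add <4 x i32> %a, %b
      ret <4 x i32> %c
    }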
2014 Dec 11
2
[LLVMdev] Vectorization factor limitation in Loop Vectorizer
Hi Nadav/Devs
I am exploring the Loop Vectorizer to vectorize i8 scalar operations into an 8xi8 vector operation.
I was expecting the Loop Vectorizer to analyze the profitability of a vectorization factor (VF) of 8;
however, it is not doing so due to the widest-type calculation done for the blocks inside the loop.
Maybe I am missing something, but I am curious to know why the Loop Vectorizer limits the
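A hypothetical loop of the shape being described (the i32 reduction is invented to show the effect): the memory accesses are all i8, but the reduction raises the widest type the vectorizer computes for the loop to 32 bits, so with 128-bit vector registers the largest VF it will even cost is 128/32 = 4, and a VF of 8 is never considered.
    define i32 @sum_bytes(i8* noalias %a, i8* noalias %b, i64 %n) {
    entry:
      br label %loop
    loop:
      %i   = phi i64 [ 0, %entry ], [ %i.next, %loop ]
      %acc = phi i32 [ 0, %entry ], [ %acc.next, %loop ]
      %pa  = getelementptr inbounds i8, i8* %a, i64 %i
      %pb  = getelementptr inbounds i8, i8* %b, i64 %i
      %va  = load i8, i8* %pa
      %vb  = load i8, i8* %pb
      %s   = add i8 %va, %vb                ; the i8 work we want vectorized at VF 8
      %z   = zext i8 %s to i32
      %acc.next = add i32 %acc, %z          ; i32 reduction: widest type becomes 32 bits
      %i.next = add nuw nsw i64 %i, 1
      %cmp = icmp ult i64 %i.next, %n
      br i1 %cmp, label %loop, label %exit
    exit:
      ret i32 %acc.next
    }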
2015 Jan 19
2
[LLVMdev] Vectorization Cost Models and Multi-Instruction Patterns?
Hi all,
While tinkering with saturation instructions, I hit problems with the
cost model calculations.
The loop vectorizer cost model accumulates the individual TTI cost
model of each instruction. For saturating arithmetic, this is a gross
overestimate, since you have 2 sexts (inputs), 2 icmps + 2 selects
(for the saturation), and a truncate (output); these all fold away.
With an intrinsic,
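To make the overestimate concrete, a sketch of the saturating-add pattern being described, an i16 add widened through i32 (names invented): a target with a native saturating add can fold all of this into one instruction, yet summing the per-instruction TTI costs charges for every operation below.
    define i16 @sat_add_i16(i16 %a, i16 %b) {
      ; 2 sexts (inputs)
      %a32 = sext i16 %a to i32
      %b32 = sext i16 %b to i32
      %sum = add nsw i32 %a32, %b32
      ; 2 icmps + 2 selects (the saturation clamp)
      %lt = icmp slt i32 %sum, -32768
      %lo = select i1 %lt, i32 -32768, i32 %sum
      %gt = icmp sgt i32 %lo, 32767
      %hi = select i1 %gt, i32 32767, i32 %lo
      ; 1 truncate (output)
      %res = trunc i32 %hi to i16
      ret i16 %res
    }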