similar to: [LLVMdev] Instruction Cost

Displaying 20 results from an estimated 2000 matches similar to: "[LLVMdev] Instruction Cost"

2015 Jan 15
2
[LLVMdev] Instruction Cost
CostModule::getInstructionCost also consults TTI ( http://llvm.org/docs/doxygen/html/CostModel_8cpp_source.html#l00380). No? Jingyue On Wed, Jan 14, 2015 at 4:05 PM, Chandler Carruth <chandlerc at google.com> wrote: > > On Wed, Jan 14, 2015 at 3:54 PM, Jingyue Wu <jingyue at google.com> wrote: > >> I'm looking for APIs that compute instruction costs, and noticed
2017 Feb 08
2
ShuffleKind SK_ExtractSubvector
Hi, I am a little unsure about the semantics of the ShuffleKind SK_ExtractSubvector. It seems a subvector is to be extracted, starting from a given index of a given subtype. First of all, if index 0 is passed, I suppose this would mean a noop? But what about calls like the one made of LoopVectorizer for Instruction::PHI in getInstructionCost(): return
2015 Jan 20
2
[LLVMdev] Instruction Cost
Thanks all for replying! I'll try the CostModel class first. Jingyue On Thu, Jan 15, 2015 at 3:47 AM, Jonas Wagner <jonas.wagner at epfl.ch> wrote: > Hi, > > 3. CostModel::getInstructionCost::getInstructionCost in >> lib/Analysis/CostModel.cpp >> > > I've been using the CostModel class in a project, and it has worked quite > well. I don't have
2016 Jun 02
4
[GSoC 2016] Parameters of a target architecture
Dear LLVM contributors, I work on the "Improvement of vectorization process in Polly". At the moment I'm trying to implement tiling, interchanging and unrolling of specific loops based on the following algorithm for the analytical modeling [1]. It requires information about the following parameters of a target architecture: 1. Size of double-precision floating-point number. 2.
2016 Jan 05
3
TargetTransformInfo getOperationCost uses
Hi, I'm trying to implement the TTI hooks for AMDGPU to avoid unrolling loops for operations with huge expansions (i.e. integer division). The values that are ultimately reported by opt -cost-model -analyze (the actual cost model tests) seem to not matter for this. The huge cost I've assigned division doesn't prevent the loop from being unrolled, because it isn't actually
2013 Jan 09
2
[LLVMdev] ARM vectorizer cost model
Hi Nadav, I'm interested in knowing how you'll work up the ARM cost model and how easy it'd be to split the work. As far as I can see, LoopVectorizationCostModel is the class that does all the work, with assistance from the target transform info. Do you think that updating ARMTTI would be the best course of action now, and inspect the differences in the CostModel later? I also
2013 Jan 09
0
[LLVMdev] ARM vectorizer cost model
Hi Renato, > I'm interested in knowing how you'll work up the ARM cost model and how easy it'd be to split the work. Yes, I am starting to work on the ARM cost model and I would appreciate any help in the form of: advice, performance measurements, patches, etc. I tune the cost model by running the cost model analysis pass and I compare the output of the analysis to the output
2014 Jul 17
4
[LLVMdev] Using CostModel to estimate machine cycles of each instruction
There is CostModel.cpp since LLVM3, I am wondering if anyone can give me an concrete example on how to use this pass to estimate cycles used in a given IR file. Thank you very much. Don
2013 Jan 10
2
[LLVMdev] ARM vectorizer cost model
On 9 January 2013 17:10, Nadav Rotem <nrotem at apple.com> wrote: > For example: > "opt -cost-model -analyze dumper.ll -mtriple=thumbv7 > -mcpu=cortex-a15" > > I also run the vectorizer with -debug-only=loop-vectorize because it dumps > the costs of all of the instructions with different vectorization factors, > and it also detects the different kinds
2013 Feb 07
0
[LLVMdev] CostModelAnalysis for 3.0 release
Hi Ryan, I think that it would be difficult to back port the CostModel back to LLVM3.0 because it uses the new TargetTransformInfo analysis. I also wanted to mention that only _lowering_ passes (target-specific optimization passes) may use TTI and the cost model. Higher-level canonicalization passes should not use it. Thanks, Nadav On Feb 6, 2013, at 1:26 AM, ryan <stdstack at
2015 May 04
3
[LLVMdev] AVX2 Cost Table in X86TargetTransformInfo
Thanks Nadav for the info. It clears my query :) Yes its an integer ADD, and since AVX2 supports 256 bits integer arithmetic, so its cost is less than AVX1. One query though - shouldn't then the cost of integer ADD/SUB/MUL (which would be 1) be explicitly specified in AVX2 cost table? Because right now this entry is missing and cost of these operations are taken from BaseTTI (which is
2015 Jul 01
2
[LLVMdev] Deriving undefined behavior from nsw/inbounds/poison for scalar evolution
----- Original Message ----- > From: "Bjarke Roune" <broune at google.com> > To: "Hal Finkel" <hfinkel at anl.gov> > Cc: llvmdev at cs.uiuc.edu, "Jingyue Wu" <jingyue at google.com> > Sent: Wednesday, July 1, 2015 2:27:59 PM > Subject: Re: [LLVMdev] Deriving undefined behavior from nsw/inbounds/poison for scalar evolution > >
2013 Feb 06
2
[LLVMdev] CostModelAnalysis for 3.0 release
Hi All, I wanted to do some basic cost estimation of Instruction/BB. There're CostModelAnalysis and CodeMetrics available in 3.1 and 3.2 releases. I've been using 3.0 for a while. I'm wondering whether similar analysis can be done in the old 3.0 release, e.g. back porting the implementation. Thank you! Best, Ryan -------------- next part -------------- An HTML attachment was
2014 Jun 17
3
[LLVMdev] Attaching range metadata to IntrinsicInst
On Tue, Jun 17, 2014 at 2:33 PM, Jingyue Wu <jingyue at google.com> wrote: > Hi Eric, > > In the IR, besides "target datalayout" and "target triple", we have a > special "target cpu" string which is set by the Clang front-end according to > its -target-cpu flag. We also write a Module::getTargetCPU() method to > retrieve this string from the
2020 Nov 05
4
[Proposal] Introducing the concept of invalid costs to the IR cost model
Hi, I'd like to propose a change to our cost interfaces so that instead of returning an unsigned value from functions like getInstructionCost, getUserCost, etc., we instead return a wrapper class that encodes an integer cost along with extra state. The extra state can be used to express: 1. A cost as infinitely expensive in order to prevent certain optimisations taking place. For example,
2013 Jan 05
1
[LLVMdev] RFC: Can we make TargetTransformInfo an analysis group?
I know, I said a bad word -- analysis group. But it works pretty much the way I think we want here. We *always* want a TargetTransformInfo, and we have reasonable (conservative) stubs in place. We would just like the option of providing one from the target that has very clever implementations. I would propose that we make TargetTransformInfo be an analysis group, and provide
2014 Jun 17
4
[LLVMdev] Attaching range metadata to IntrinsicInst
On 17 June 2014 06:41, Eli Bendersky <eliben at google.com> wrote: > On Tue, Jun 17, 2014 at 1:38 AM, Nick Lewycky <nicholas at mxc.ca> wrote: > >> Chandler Carruth wrote: >> >>> This seems fine to me, but I'd like to make sure it looks OK to Nick as >>> well. >>> >> >> I strongly prefer baking in knowledge about the
2014 Jun 17
5
[LLVMdev] Attaching range metadata to IntrinsicInst
Chandler Carruth wrote: > This seems fine to me, but I'd like to make sure it looks OK to Nick as > well. I strongly prefer baking in knowledge about the intrinsics themselves into the passes if possible. Metadata will always be secondary. Separately, should value tracking look use range metadata when it's available? Absolutely. I think it should apply to all CallInst not just
2014 Jun 17
2
[LLVMdev] Attaching range metadata to IntrinsicInst
Eh? How do you envision this? -eric On Tue, Jun 17, 2014 at 2:09 PM, Jingyue Wu <jingyue at google.com> wrote: > Hi Nick, > > That makes sense. I think a main issue here is that the ranges of these PTX > special registers (e.g., threadIdx.x) depend on -target-cpu which is only > visible to clang and llc. Would you mind we specify "target cpu" in the IR > similar
2014 Oct 24
3
[LLVMdev] IndVar widening in IndVarSimplify causing performance regression on GPU programs
Hi, I noticed a significant performance regression (up to 40%) on some internal CUDA benchmarks (a reduced example presented below). The root cause of this regression seems that IndVarSimpilfy widens induction variables assuming arithmetics on wider integer types are as cheap as those on narrower ones. However, this assumption is wrong at least for the NVPTX64 target. Although the NVPTX64 target