Threads similar to: [LLVMdev] Why does Select have a higher speculation cost than other instructions?

Displaying 20 results from an estimated 7000 matches.

2017 Jul 05
3
Dataflow analysis regression in 3.7
Hi all, I just found an optimization regression regarding simple dataflow/constprop analysis: https://godbolt.org/g/Uz8P7t This code
```
int dataflow(int b) {
  int a;
  if (b == 4)
    a = 3*b;   // fully optimized when changed to a = 3
  else
    a = 5;
  if (a == 4)
    return 0;
  else
    return 1;
}
```
is no longer optimized to just a "return 1". The regression happened in LLVM
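A quick sketch of the reasoning, as an aside (the function name below is purely illustrative): on the b == 4 path, a becomes 3*4 = 12, and on the other path a is 5; neither value can equal 4, so the second comparison is always false and the whole function should reduce to this.
```
int dataflow_folded(int b) {
  // b == 4  ->  a = 3*4 = 12; otherwise a = 5. Neither value is 4, so the
  // "a == 4" test is always false and only the "return 1" path survives.
  (void)b;
  return 1;
}
```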
2020 Jul 28
2
_mm_lfence in both paths of an if/else is hoisted by SimplifyCFG, potentially breaking its use as a speculation barrier
_mm_lfence was originally documented as a load fence, but in light of speculative execution vulnerabilities it has started being advertised as a way to prevent speculative execution. The current Intel Software Developer's Manual documents it as follows: "Specifically, LFENCE does not execute until all prior instructions have completed locally, and no later instruction begins execution until LFENCE
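A minimal C-level sketch of the pattern being described, assuming _mm_lfence() from <immintrin.h> (the surrounding function and values are made up): the fence appears on both sides of the branch, so SimplifyCFG can treat it as common code and hoist it above the condition, defeating its use as a speculation barrier.
```
#include <immintrin.h>

int barrier_on_both_paths(int ok, int x) {
  int r;
  if (ok) {
    _mm_lfence();  // intended: fence inside the taken path
    r = x + 1;
  } else {
    _mm_lfence();  // intended: fence inside the not-taken path
    r = x - 1;
  }
  // After SimplifyCFG hoists the common _mm_lfence, a single fence sits
  // before the branch and no longer guards either arm individually.
  return r;
}
```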
2013 Jul 31
0
[LLVMdev] [Proposal] Speculative execution of function calls
On 31 Jul 2013, at 10:50, "Kuperstein, Michael M" <michael.m.kuperstein at intel.com> wrote: > This has two main uses: > 1) Intrinsics, including target-dependent intrinsics, can be marked with this attribute – hopefully a lot of intrinsics that do not have explicit side effects and do not rely on global state that is not currently modeled by “readnone” (e.g. rounding
2013 Jul 17
5
[LLVMdev] IR Passes and TargetTransformInfo: Straw Man
Since introducing the new TargetTransformInfo analysis, there has been some confusion over the role of target heuristics in IR passes. A few patches have led to interesting discussions. To centralize the discussion, until we get some documentation and better APIs in place, let me throw out an oversimplified Straw Man for a new pass pipeline. It serves two purposes: (1) an overdue reorganization of
2020 Aug 09
2
_mm_lfence in both paths of an if/else is hoisted by SimplifyCFG, potentially breaking its use as a speculation barrier
Hi Craig, The review for the similar GPU problem is now up here: https://reviews.llvm.org/D85603 (+ some other patches on the Phabricator stack). From a pragmatic perspective, the constraints added to program transforms there are sufficient for what you need. You'd produce IR such as:
  %token = call token @llvm.experimental.convergence.anchor()
  br i1 %c, label %then, label %else
2020 Feb 03
2
Eliminate some two entry PHI nodes - SimplifyCFG
SimplifyCFG's FoldTwoEntryPhiNode looks to simplify all two-entry phi nodes in a block; if it can't do them all, it won't do any and returns. There is a lot of code directly in this function geared toward that requirement. Is it currently possible to get this function (or pass) to fold only "some" of the phis (without having to fold them all)? I understand that
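For context, a minimal sketch of the source pattern behind a two-entry phi (the names are illustrative): the if/else merge point carries a single phi, and FoldTwoEntryPhiNode rewrites it into a select, which typically becomes a cmov on x86.
```
int two_entry_phi(int c, int x, int y) {
  int r;
  if (c)          // br i1 %c, label %then, label %else
    r = x + 1;    // value reaching the merge from %then
  else
    r = y - 1;    // value reaching the merge from %else
  return r;       // %r = phi i32 [...], [...]  -> folded to a select
}
```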
2020 Feb 05
2
Eliminate some two entry PHI nodes - SimplifyCFG
Conditional on the target supporting cmov? Though that's probably not optimal. On Wed, Feb 5, 2020, 7:47 AM Nicolai Hähnle <nhaehnle at gmail.com> wrote: > Hi Ryan, > > On Mon, Feb 3, 2020 at 7:08 PM Ryan Taylor via llvm-dev > <llvm-dev at lists.llvm.org> wrote: > > SimplifyCFG FoldTwoEntryPhiNode looks to simplify all 2 entry phi nodes > in a block, if it
2017 Jul 06
2
Dataflow analysis regression in 3.7
On Thu, Jul 6, 2017 at 7:00 AM, Davide Italiano <davide at freebsd.org> wrote: > On Wed, Jul 5, 2017 at 3:59 PM, Johan Engelen via llvm-dev > <llvm-dev at lists.llvm.org> wrote: > > Hi all, > > I just found an optimization regression regarding simple > > dataflow/constprop analysis: > > https://godbolt.org/g/Uz8P7t > > > > This code >
2013 Jul 31
4
[LLVMdev] [Proposal] Speculative execution of function calls
Hello, Chris requested I start a fresh discussion on this, so, here goes. The previous iterations can be found here (and in follow-ups): http://lists.cs.uiuc.edu/pipermail/llvm-commits/Week-of-Mon-20130722/182590.html http://lists.cs.uiuc.edu/pipermail/llvmdev/2013-July/064047.html Cutting to the chase, the goal is to enhance llvm::isSafeToSpeculativelyExecute() to support call instructions.
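As a rough illustration of the kind of check being proposed, here is a sketch using the speculatable function attribute that today's LLVM provides; it is not the actual isSafeToSpeculativelyExecute() logic, and the helper name is invented.
```
#include "llvm/IR/Attributes.h"
#include "llvm/IR/Instructions.h"

using namespace llvm;

// Hedged sketch: a call might be considered a speculation candidate only if
// it has no memory effects, cannot unwind, and is explicitly marked as safe
// to execute speculatively.
static bool looksSpeculatable(const CallInst &CI) {
  return CI.doesNotAccessMemory() &&            // readnone
         CI.doesNotThrow() &&                   // nounwind
         CI.hasFnAttr(Attribute::Speculatable); // explicit opt-in
}
```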
2015 Feb 04
2
[LLVMdev] Is this a bug with loop unrolling and TargetTransformInfo?
Hi, I ran into this issue recently and wanted to know if it was a bug or expected behavior. In the R600 backend's TargetTransformInfo implementation, we were setting UnrollingPreferences::Count = UINT_MAX. This was a mistake as we should have been setting UnrollingPreferences::MaxCount instead. However, as a result of setting Count to UINT_MAX, this loop would be unrolled 15 times: if (b
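A minimal sketch of the distinction at issue, assuming the Count and MaxCount fields of TargetTransformInfo::UnrollingPreferences (the surrounding hook is simplified, its exact signature differs between LLVM releases, and the function name is invented):
```
#include <climits>
#include "llvm/Analysis/TargetTransformInfo.h"

using namespace llvm;

static void fillUnrollPrefs(TargetTransformInfo::UnrollingPreferences &UP) {
  // Count *requests* a concrete unroll factor, so UINT_MAX asks the
  // unroller to unroll by an absurd amount (the accidental behaviour).
  // UP.Count = UINT_MAX;

  // MaxCount merely *caps* whatever factor the generic heuristics pick,
  // which is what the backend actually meant to express.
  UP.MaxCount = UINT_MAX;
}
```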
2013 Jul 29
0
[LLVMdev] IR Passes and TargetTransformInfo: Straw Man
On 7/16/2013 11:38 PM, Andrew Trick wrote: > Since introducing the new TargetTransformInfo analysis, there has been some confusion over the role of target heuristics in IR passes. A few patches have led to interesting discussions. > > To centralize the discussion, until we get some documentation and better APIs in place, let me throw out an oversimplified Straw Man for a new pass pipeline.
2017 May 04
3
Lookup table in function section
I have a legitimate requirement to keep the switch-generated lookup table in the function section. The lookup table is generated in the SimplifyCFG pass and is treated as a global. Is there a good way to mark these lookup tables and recognize them later so they can be kept in function sections? --Sumanth
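For reference, a small sketch of the kind of dense switch that SimplifyCFG's switch-to-lookup-table transform converts into a constant global array (the function and values here are arbitrary), which is the table the question is about placing alongside the function:
```
int classify(int i) {
  // With optimization enabled, SimplifyCFG can replace this switch with a
  // load from a private constant table indexed by i.
  switch (i) {
  case 0: return 7;
  case 1: return 3;
  case 2: return 9;
  case 3: return 1;
  default: return 0;
  }
}
```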
2017 Jul 11
6
RFC: Harvard architectures and default address spaces
Hello all, I’m looking into solving an AVR-specific issue and would love to hear people’s thoughts on how best to fix it. Background As you may or may not know, I maintain the in-tree AVR backend, which also happens to be (to the best of my knowledge) the first in-tree backend for a Harvard architecture. In this architecture, code lives inside the ‘program memory’ space (numbered 1), whereas data
2015 Jan 14
6
[LLVMdev] Instruction Cost
Hi, I'm looking for APIs that compute instruction costs, and noticed several of them. 1. A series of APIs of TargetTransformInfo that compute the cost of instructions of a particular type (e.g. getArithmeticInstrCost and getShuffleCost) 2. TargetTransformInfo::getOperationCost 3. CostModel::getInstructionCost in lib/Analysis/CostModel.cpp Only the first one is used
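A hedged sketch of API family (1): asking TargetTransformInfo what a single add of a given type costs. The return type and trailing default parameters of getArithmeticInstrCost have changed across LLVM releases, and the helper name is invented, so this is illustrative rather than exact.
```
#include "llvm/Analysis/TargetTransformInfo.h"
#include "llvm/IR/Instruction.h"
#include "llvm/IR/Type.h"

using namespace llvm;

static void queryAddCost(const TargetTransformInfo &TTI, Type *Ty) {
  // Per-instruction-kind hook: cost of a single 'add' of type Ty.
  auto Cost = TTI.getArithmeticInstrCost(Instruction::Add, Ty);
  (void)Cost; // a client pass would feed this into its own heuristic
}
```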
2015 May 04
2
[LLVMdev] AVX2 Cost Table in X86TargetTransformInfo
Hi all, I have a query regarding the cost table for AVX2 in TargetTransformInfo. The table consists of entries for shift and div operations only. There are no entries for ADD, SUB and MUL in the AVX2 cost table. Those entries are present in the cost table for AVX. The reason for the query is: when my subtarget feature is AVX2, in SLP vectorization, while calculating the scalar cost of ADD, it doesn't see
2015 May 04
3
[LLVMdev] AVX2 Cost Table in X86TargetTransformInfo
Thanks Nadav for the info, it clears up my query :) Yes, it's an integer ADD, and since AVX2 supports 256-bit integer arithmetic, its cost is lower than on AVX1. One query though - shouldn't the cost of integer ADD/SUB/MUL (which would be 1) then be explicitly specified in the AVX2 cost table? Right now this entry is missing and the cost of these operations is taken from BaseTTI (which is
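To make the suggestion concrete, a purely illustrative table; the struct, names and string keys below are inventions that only mirror the shape of LLVM's cost-table lookups, not the real X86TargetTransformInfo code.
```
// Illustrative only: mirrors the shape of a cost-table lookup without using
// the real LLVM types.
struct IllustrativeCostEntry {
  const char *Op;   // stand-in for the opcode key the real tables use
  const char *Ty;   // stand-in for the MVT key (a 256-bit vector type here)
  unsigned Cost;    // cost returned instead of falling back to BaseTTI
};

static const IllustrativeCostEntry AVX2IntegerArith[] = {
    {"ADD", "v8i32", 1},
    {"SUB", "v8i32", 1},
    {"MUL", "v8i32", 1}, // the value the poster argues the table should hold
};
```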
2015 Jan 15
2
[LLVMdev] Instruction Cost
CostModel::getInstructionCost also consults TTI (http://llvm.org/docs/doxygen/html/CostModel_8cpp_source.html#l00380). No? Jingyue On Wed, Jan 14, 2015 at 4:05 PM, Chandler Carruth <chandlerc at google.com> wrote: > > On Wed, Jan 14, 2015 at 3:54 PM, Jingyue Wu <jingyue at google.com> wrote: > >> I'm looking for APIs that compute instruction costs, and noticed
2020 Jun 08
2
Mitigating straight-line speculation vulnerability CVE-2020-13844
Hi, A new speculative cache side-channel vulnerability has been published at https://developer.arm.com/support/arm-security-updates/speculative-processor-vulnerability/downloads/straight-line-speculation, named "straight-line speculation", CVE-2020-13844. In this email, I'd like to explain the toolchain mitigation we've prepared against this vulnerability for AArch64.
2016 Aug 17
2
Cost model is missing in InstCombiner
Hi, I think canEvaluateTruncated() in InstCombiner needs to use the cost model to decide whether to perform the optimization or not. Without a cost model from TargetTransformInfo, aggressively optimizing IR in vector types according to the number of bits demanded may lead to scalarization of vector operations. For example, if the input IR is: %wide.load25 = load <32 x i8>, <32 x i8>* %231, align
2015 Jan 19
2
[LLVMdev] Vectorization Cost Models and Multi-Instruction Patterns?
Hi all, While tinkering with saturation instructions, I hit problems with the cost model calculations. The loop vectorizer cost model accumulates the individual TTI cost of each instruction. For saturating arithmetic, this is a gross overestimate, since you have 2 sexts (inputs), 2 icmps + 2 selects (for the saturation), and a truncate (output); these all fold away. With an intrinsic,
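A small worked sketch of the overestimate, using made-up unit costs for each IR instruction just to show the arithmetic; none of these numbers come from a real TTI implementation.
```
// Summing per-instruction costs for the saturating-add pattern the way the
// loop vectorizer's accumulation would, versus the single saturating
// instruction a target might actually emit.
unsigned naiveSaturatingAddCost() {
  const unsigned SExt = 1, ICmp = 1, Select = 1, Trunc = 1;
  // 2 sexts (inputs) + 2 icmps and 2 selects (saturation) + 1 trunc (output)
  return 2 * SExt + 2 * ICmp + 2 * Select + Trunc; // = 7
}

unsigned nativeSaturatingAddCost() {
  return 1; // one saturating add instruction if the target provides it
}
```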