thr3ads.net - search: "othercounter"

Displaying 4 results from an estimated 4 matches for "othercounter".

RFC: Pass to prune redundant profiling instrumentation

2016 Mar 12

...with their
> multiplicities
> create new counter
> emit side-table data that relates the new counter to an array of
> (other counter, multiplicity of update)
>
> The runtime just emits the side-table and then llvm-profdata does:
>
> for each counter C:
>   for (otherCounter, multiplicity) in side-table[C]:
>     counters[otherCounter] += multiplicity * counters[C]
>
>
There are other issues that can complicate the matter. 1) The assumption in the algorithm is that the source counter has only one update site -- but instead it may have more than one sites...
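
The post-processing loop quoted in this snippet can be sketched in Python as below. This is only an illustration of the quoted pseudocode, not code from llvm-profdata: the function name `apply_side_table`, the dict-based data layout, and the sample counter names are all hypothetical. It also assumes that a source counter is never itself the target of another side-table entry, so the order of iteration does not matter.

```python
def apply_side_table(counters, side_table):
    """Fold pruned counter updates back in, following the quoted loop.

    counters:   dict mapping counter name -> raw count from the profile
    side_table: dict mapping source counter C -> list of
                (other_counter, multiplicity) pairs
    Both layouts are assumptions made for this sketch.
    """
    # Work on a copy so the raw profile data is left untouched.
    result = dict(counters)
    for c, updates in side_table.items():
        for other_counter, multiplicity in updates:
            # counters[otherCounter] += multiplicity * counters[C]
            # Assumes C is not itself updated by another entry; otherwise
            # the result would depend on iteration order.
            result[other_counter] = (
                result.get(other_counter, 0) + multiplicity * result.get(c, 0)
            )
    return result


# Hypothetical example: "new" stands in for two pruned updates.
raw = {"new": 10, "a": 0, "b": 3}
table = {"new": [("a", 1), ("b", 2)]}
merged = apply_side_table(raw, table)
```

With these sample inputs, `a` receives 1 * 10 and `b` receives 2 * 10 on top of its existing count, while `new` itself is unchanged. The reply's first caveat (a source counter with more than one update site) is not handled by this sketch.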

MachineScheduler not scheduling for latency

2019 Sep 10

Hi Andy, Thanks for the explanations. Yes AMDGPU is in-order and has MicroOpBufferSize = 1. Re "issue limited" and instruction groups: could it make sense to disable the generic scheduler's detection of issue limitation on in-order CPUs, or on CPUs that don't define instruction groups, or some similar condition? Something like: --- a/lib/CodeGen/MachineScheduler.cpp +++

RFC: Pass to prune redundant profiling instrumentation

2016 Mar 12

> On Mar 11, 2016, at 5:28 PM, Sean Silva <chisophugis at gmail.com> wrote:
>
>
>
> On Fri, Mar 11, 2016 at 12:47 PM, Vedant Kumar <vsk at apple.com> wrote:
> There have been a lot of responses. I'll try to summarize the thread and respond
> to some of the questions/feedback.
>
>
> Summary
> =======
>
> 1. We should teach GlobalDCE to

Fwd: MachineScheduler not scheduling for latency

2019 Sep 09

Hi, I'm trying to understand why MachineScheduler does a poor job in straight line code in cases like the one in the attached debug dump. This is on AMDGPU, an in-order target, and the problem is that the IMAGE_SAMPLE instructions have very high (80 cycle) latency, but in the resulting schedule they are often placed right next to their uses like this: 1784B %140:vgpr_32 =
