Displaying 4 results from an estimated 4 matches for "othercounter".
2016 Mar 12
2
RFC: Pass to prune redundant profiling instrumentation
...with their
> multiplicities
> create new counter
> emit side-table data that relates the new counter to an array of
> (other counter, multiplicity of update)
>
> The runtime just emits the side-table and then llvm-profdata does:
>
> for each counter C:
>   for (otherCounter, multiplicity) in side-table[C]:
>     counters[otherCounter] += multiplicity * counters[C]
>
>
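A rough Python sketch of the quoted post-processing step, for illustration only. The function name apply_side_table, the dict-based layout of the counters and the side-table, and the counter names in the usage note are all assumptions made here; none of this is llvm-profdata's actual code or data format. It also assumes that a new counter never appears as a target in the side-table, so reading its value before or after the updates gives the same result.

```python
def apply_side_table(counters, side_table):
    """Replay pruned counter updates from a side-table.

    counters:   dict mapping counter name -> recorded count
    side_table: dict mapping a new counter C -> list of
                (other_counter, multiplicity) pairs
    Returns a new dict; the input dict is left unmodified.
    """
    result = dict(counters)
    for c, updates in side_table.items():
        for other_counter, multiplicity in updates:
            # Each execution of C stands for `multiplicity` updates
            # of other_counter that were pruned from the instrumentation.
            result[other_counter] = (
                result.get(other_counter, 0) + multiplicity * counters[c]
            )
    return result
```

For example, with counters {"new": 10, "a": 3, "b": 0} and side-table {"new": [("a", 1), ("b", 4)]}, the reconstructed values are a = 13 and b = 40, while "new" stays at 10.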
There are other issues that can complicate the matter.
1) The assumption in the algorithm is that the source counter has only one
update site -- but instead it may have more than one site...
2019 Sep 10
2
MachineScheduler not scheduling for latency
Hi Andy,
Thanks for the explanations. Yes, AMDGPU is in-order and has
MicroOpBufferSize = 1.
Re "issue limited" and instruction groups: could it make sense to
disable the generic scheduler's detection of issue limitation on
in-order CPUs, or on CPUs that don't define instruction groups, or
some similar condition? Something like:
--- a/lib/CodeGen/MachineScheduler.cpp
+++
2016 Mar 12
2
RFC: Pass to prune redundant profiling instrumentation
> On Mar 11, 2016, at 5:28 PM, Sean Silva <chisophugis at gmail.com> wrote:
>
>
>
> On Fri, Mar 11, 2016 at 12:47 PM, Vedant Kumar <vsk at apple.com> wrote:
> There have been a lot of responses. I'll try to summarize the thread and respond
> to some of the questions/feedback.
>
>
> Summary
> =======
>
> 1. We should teach GlobalDCE to
2019 Sep 09
2
Fwd: MachineScheduler not scheduling for latency
Hi,
I'm trying to understand why MachineScheduler does a poor job in
straight-line code in cases like the one in the attached debug dump.
This is on AMDGPU, an in-order target, and the problem is that the
IMAGE_SAMPLE instructions have very high (80-cycle) latency, but in
the resulting schedule they are often placed right next to their uses
like this:
1784B %140:vgpr_32 =