thr3ads.net - similar to: "mischeduler (pre-RA) experiments"

Displaying 20 results from an estimated 800 matches similar to: "mischeduler (pre-RA) experiments"

2016 Oct 28

mischeduler

Hi, Regarding the mischeduler, I wonder
  // For loops that are acyclic path limited, aggressively schedule for
  // latency. This can result in very long dependence chains scheduled in
  // sequence, so once every cycle (when CurrMOps == 0), switch to normal
  // heuristics.
  if (Rem.IsAcyclicLatencyLimited && !Zone->getCurrMOps() &&
      tryLatency(TryCand, Cand, *Zone))

MachineScheduler not scheduling for latency

2019 Sep 10

Hi Andy, Thanks for the explanations. Yes AMDGPU is in-order and has MicroOpBufferSize = 1. Re "issue limited" and instruction groups: could it make sense to disable the generic scheduler's detection of issue limitation on in-order CPUs, or on CPUs that don't define instruction groups, or some similar condition? Something like: --- a/lib/CodeGen/MachineScheduler.cpp +++

mischeduler (pre-RA) experiments

2017 Nov 25

> Of course, you want to duplicate as little of the generic scheduling logic
> as you can. So I think the challenge is how to expose the
> generic scheduler's functionality as a base class or composition of
> utilities so that defining your strategy doesn't require too much
> copy-paste.
Isn't GCNMaxOccupancySchedStrategy [1] already an example on using

[LLVMdev] Question about load clustering in the machine scheduler

2015 Mar 27

On Thu, Mar 26, 2015 at 11:50:20PM -0700, Andrew Trick wrote:
> > On Mar 26, 2015, at 7:36 PM, Tom Stellard <tom at stellard.net> wrote:
> >
> > Hi,
> >
> > I have a program with over 100 loads (each with a 10 cycle latency)
> > at the beginning of the program, and I can't figure out how to get
> > the machine scheduler to intermix ALU

[LLVMdev] Question about load clustering in the machine scheduler

2015 Mar 27

Hi, I have a program with over 100 loads (each with a 10 cycle latency) at the beginning of the program, and I can't figure out how to get the machine scheduler to intermix ALU instructions with the loads to effectively hide the latency. It seems the issue is with load clustering. I restrict load clustering to 4 at a time, but when I look at the debug output, the loads are always being

top-down vs. bottom-up list scheduling

2018 Nov 06

Hello List! I am looking at top-down vs. bottom-up list scheduling for simple(r) in-order cores. First, for some context, below is a fairly representative pseudo-code example of the sort of DSP-like codes I am looking at:
  uint64_t foo(int *pA, int *pB, unsigned N, unsigned C) {
    uint64_t sum = 0;
    while (N-- > 0) {
      A1 = *pA++;
      A2 = *pA++;
      B1 = *pB++;
      B2 =

[LLVMdev] MIScheduler / bundling

2013 Feb 27

Hi, I am looking at the Hexagon MI Scheduling and trying to adapt it to my target. As far as I can see, Hexagon does not bundle the VLIW-bundles by calling bundleWithPred() on MIs of the completed cycle. First of all, why is this not done? SlotIndexes seems to have at least some support for this, by calling getBundleStart() for each MI that is looked up. A follow up question is then, how would

Mischeduler: Unknown reason for peak register pressure increase

2017 Aug 12

I am working on a project where we are integrating an existing pre-RA scheduler into LLVM and we are trying to match our peak register pressure values with the machine instruction scheduler's values while using X86. I am finding some mismatches in test cases like the one attached. The registers "AH" and "AL" are live-out but not live-in and I don't see that they are defined

[LLVMdev] MIScheduler + AA: Missed scheduling opportunity in MIsNeedChainEdge. Bug?

2015 Jul 01

Hello, While tuning the MIScheduler for my target, I discovered code that unnecessarily restricts the scheduler. I think this is a bug, but I would appreciate a second opinion. In file ScheduleDAGInstrs.cpp, the function MIsNeedChainEdge determines whether two MachineInstrs are ordered by a memory dependence. It first runs through the standard criteria (Do both instructions access memory?

Machine Scheduler on Power PC: Latency Limit and Register Pressure

2017 Oct 13

> On Oct 13, 2017, at 1:46 PM, Matthias Braun <matze at braunis.de> wrote:
>
> Yes, I've run into the problem myself that the Pending queue isn't even checked with the tryCandidate() logic and so takes priority over all other scheduling decisions.
>
> I personally would be open to changes in this area. To start the brainstorming I could imagine that we move nodes

Prioritizing an SDNode for scheduling

2016 Oct 21

Hello. Is there a way to specify in the back end an (ISD::INLINEASM) SDNode to be scheduled first under all circumstances? I need to specify something like node priority to schedule the node before all other nodes in the SelectionDAG of the basic block. (Using chain or glue edges in order to make a node first is not a good idea, since I am doing this at instruction selection time, on

Prioritizing an SDNode for scheduling

2016 Oct 21

I probably misunderstood the question. You probably want to do this in SelectionDAG.
On Fri, Oct 21, 2016 at 10:29 AM, Ehsan Amiri <ehsanamiri at gmail.com> wrote:
> You can do this by changing instruction scheduling heuristics. I think the
> more important question is if this is always correct for all platforms.
>
> I don't know which scheduler you use. We use

reg coalescing improvements

2017 Aug 17

Hi, I am seeing cases of poorly coalesced IV updates on SystemZ: In the final IR, it is obvious that
  %R4D<def> = LA %R2D<kill>, 4, %noreg   // R4 = R2 + 4
  %R2D<def> = LGR %R4D<kill>             // R2 = R4
could be optimized to
  -> %R2D<def> = LA %R2D<kill>, 4, %noreg   // R2 = R2 + 4
The reason this wasn't coalesced, is

Fwd: MachineScheduler not scheduling for latency

2019 Sep 09

Hi, I'm trying to understand why MachineScheduler does a poor job in straight line code in cases like the one in the attached debug dump. This is on AMDGPU, an in-order target, and the problem is that the IMAGE_SAMPLE instructions have very high (80 cycle) latency, but in the resulting schedule they are often placed right next to their uses like this: 1784B %140:vgpr_32 =

Machine Scheduler on Power PC: Latency Limit and Register Pressure

2017 Oct 13

Hi, I've been looking at the Machine Scheduler on Power PC. I am looking only at the pre-RA machine scheduler and I am running it in the default bi-directional mode (so, both top down and bottom up queues are considered). I've come across an example where the scheduler picks a poor ordering for the instructions which results in very high register pressure which results in spills.

[buildSchedGraph] memory dependencies

2016 Feb 03

Hi, (This only concerns MIsNeedChainEdge(), and is separate from D8705) I found out that the MIScheduler (pre-RA) could not handle a simple test case (test/CodeGen/SystemZ/alias-01.ll), with 16 independent load / add / stores. The buildSchedGraph() put too many edges between memory accesses, because 1) There was no implementation of areMemAccessesTriviallyDisjoint() for SystemZ. 2) Type

appending

2006 Jun 14

All, In the function below I have 24 individuals and 6 calculations per individual. The 6 calculations are collected each time in a 1:24 loop when calculating "delta". I'd like to collect all 144 = 24*6 calculations in one vector ("delta.patient.comb"). The function works as is via indexing, but is there an easier way to collect the measurements via appending the 6

R's AIC values differ from published values

2012 Feb 13

Using the Cement hardening data in Anderson (2008) Model Based Inference in the Life Sciences: A Primer on Evidence, and working with the best model which is
  lm ( y ~ x1 + x2, data = cement )
the AIC value from R is
  model <- lm ( formula = y ~ x1 + x2 , data = cement )
  AIC ( model )
  64.312
which can be converted to AICc by adding the bias

2017 Aug 30

> On Aug 30, 2017, at 1:43 PM, Matthias Braun <matze at braunis.de> wrote:
>
> That means you cannot use the code from RegisterPressure.{cpp|h} to compute this. The other liveness analysis we have in llvm codegen is LiveIntervals (LiveIntervalAnalysis) which gives you a list of liveness segments of a given vreg (the same representation is used in most linear scan allocators even

A combinatorial optimization problem: finding the best permutation of a complex vector

2009 Nov 12

Hi, I have a complex-valued vector X in C^n. Given another complex-valued vector Y in C^n, I want to find a permutation of Y, say, Y*, that minimizes ||X - Y*||, the distance between X and Y*. Note that this problem can be trivially solved for "Real" vectors, since real numbers possess the ordering property. Complex numbers, however, do not possess this property. Hence the
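The problem stated in this snippet can be illustrated with a small brute-force sketch. It assumes the distance is the Euclidean norm and simply tries every permutation of Y; the function name best_permutation and the sample vectors are hypothetical and do not come from the thread.

```python
from itertools import permutations
import math


def best_permutation(x, y):
    """Return (distance, permuted_y) minimizing the Euclidean norm ||x - y*||.

    Hypothetical helper for illustration: exhaustive search over all
    orderings of y, so it is only practical for small n.
    """
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    best_dist, best_perm = math.inf, None
    for perm in permutations(y):
        # abs() of a complex difference is its modulus.
        dist = math.sqrt(sum(abs(a - b) ** 2 for a, b in zip(x, perm)))
        if dist < best_dist:
            best_dist, best_perm = dist, perm
    return best_dist, list(best_perm)


# Made-up sample data: y is a shuffled copy of x.
x = [1 + 1j, 2 - 1j, -1 + 0j]
y = [-1 + 0j, 1 + 1j, 2 - 1j]
dist, y_star = best_permutation(x, y)
```

Exhaustive search costs O(n!) and is shown only to make the objective concrete. With the squared Euclidean cost |x_i - y_j|^2 between each pair, the task is a linear assignment problem, which polynomial-time methods such as the Hungarian algorithm can solve.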
