similar to: mischeduler (pre-RA) experiments

Displaying 20 results from an estimated 800 matches similar to: "mischeduler (pre-RA) experiments"

2016 Oct 28
2
mischeduler
Hi, Regarding the mischeduler, I wonder // For loops that are acyclic path limited, aggressively schedule for // latency. This can result in very long dependence chains scheduled in // sequence, so once every cycle (when CurrMOps == 0), switch to normal // heuristics. if (Rem.IsAcyclicLatencyLimited && !Zone->getCurrMOps() && tryLatency(TryCand, Cand, *Zone))
2019 Sep 10
2
MachineScheduler not scheduling for latency
Hi Andy, Thanks for the explanations. Yes AMDGPU is in-order and has MicroOpBufferSize = 1. Re "issue limited" and instruction groups: could it make sense to disable the generic scheduler's detection of issue limitation on in-order CPUs, or on CPUs that don't define instruction groups, or some similar condition? Something like: --- a/lib/CodeGen/MachineScheduler.cpp +++
2017 Nov 25
2
mischeduler (pre-RA) experiments
> > Of course, you want to duplicate as little of the generic scheduling logic > as you can. So I think the challenge is how to expose the > generic scheduler's functionality as a base class or composition of > utilities so that defining your strategy doesn't require too much > copy-paste. ​Isn't GCNMaxOccupancySchedStrategy [1] already an example on using
2015 Mar 27
2
[LLVMdev] Question about load clustering in the machine scheduler
On Thu, Mar 26, 2015 at 11:50:20PM -0700, Andrew Trick wrote: > > > On Mar 26, 2015, at 7:36 PM, Tom Stellard <tom at stellard.net> wrote: > > > > Hi, > > > > I have a program with over 100 loads (each with a 10 cycle latency) > > at the beginning of the program, and I can't figure out how to get > > the machine scheduler to intermix ALU
2015 Mar 27
2
[LLVMdev] Question about load clustering in the machine scheduler
Hi, I have a program with over 100 loads (each with a 10 cycle latency) at the beginning of the program, and I can't figure out how to get the machine scheduler to intermix ALU instructions with the loads to effectively hide the latency. It seems the issue is with load clustering. I restrict load clustering to 4 at a time, but when I look at the debug output, the loads are always being
2018 Nov 06
4
top-down vs. bottom-up list scheduling
Hello List! I am looking at top-down vs. bottom-up list scheduling for simple(r) in-order cores. First, for some context, below is a fairly representative pseudo-code example of the sort of DSP-like codes I am looking at: uint64_t foo(int *pA, int *pB, unsigned N, unsigned C) { uint64_t sum = 0; while (N-- > 0) { A1 = *pA++; A2 = *pA++; B1 = *pB++; B2 =
2013 Feb 27
1
[LLVMdev] MIScheduler / bundling
Hi, I am looking at the Hexagon MI Scheduling and trying to adapt it to my target. As far as I can see, Hexagon does not bundle the VLIW-bundles by calling bundleWithPred() on MIs of the completed cycle. First of all, why is this not done? SlotIndexes seems to have at least some support for this, by calling getBundleStart() for each MI that is looked up. A follow up question is then, how would
2017 Aug 12
3
Mischeduler: Unknown reason for peak register pressure increase
I am working on a project where we are integrating an existing pre-RA scheduler into LLVM and we are trying to match our peak register pressure values with the machine instruction schedulers values while using X86. I am finding some mismatches in test cases like the one attached. The registers "AH" and "AL" are live-out but not live-in and I don't see that they are defined
2015 Jul 01
3
[LLVMdev] MIScheduler + AA: Missed scheduling opportunity in MIsNeedChainEdge. Bug?
Hello, While tuning the MIScheduler for my target, I discovered a code that unnecessarily restricts the scheduler. I think this is a bug, but I would appreciate a second opinion. In file ScheduleDAGInstrs.cpp, the function MIsNeedChainEdge determines whether two MachineInstrs are ordered by a memory dependence. It first runs through the standard criteria (Do both instructions access memory?
2017 Oct 13
3
Machine Scheduler on Power PC: Latency Limit and Register Pressure
> On Oct 13, 2017, at 1:46 PM, Matthias Braun <matze at braunis.de> wrote: > > Yes, I've run into the problem myself that the Pending queue isn't even checked with the tryCandidate() logic and so takes priority over all other scheduling decisions. > > I personally would be open to changes in this area. To start the brainstorming I could imagine that we move nodes
2016 Oct 21
2
Prioritizing an SDNode for scheduling
Hello. Is there a way to specify in the back end an (ISD::INLINEASM) SDNode to be scheduled first under all circumstances? I need to specify something like node priority to schedule the node before all other nodes in the SelectionDAG of the basic block. (Using chain or glue edges in order to make a node first is not a good idea, since I am doing this at instruction selection time, on
2016 Oct 21
3
Prioritizing an SDNode for scheduling
I probably misunderstood the question. You probably want to do this in SelectionDAG. On Fri, Oct 21, 2016 at 10:29 AM, Ehsan Amiri <ehsanamiri at gmail.com> wrote: > You can do this by changing instruction scheduling heuristics. I think the > more important question is if this correct always for all platforms. > > I don't know which scheduler you use. We use
2017 Aug 17
2
reg coalescing improvements
Hi, I am seeing cases of poorly coalesced IV updates on SystemZ: In the final IR, it is obvious that %R4D<def> = LA %R2D<kill>, 4, %noreg // R4 = R2 + 4 %R2D<def> = LGR %R4D<kill> // R2 = R4 could be optimized to -> %R2D<def> = LA %R2D<kill>, 4, %noreg // R2 = R2 + 4 The reason this wasn't coalesced, is
2019 Sep 09
2
Fwd: MachineScheduler not scheduling for latency
Hi, I'm trying to understand why MachineScheduler does a poor job in straight line code in cases like the one in the attached debug dump. This is on AMDGPU, an in-order target, and the problem is that the IMAGE_SAMPLE instructions have very high (80 cycle) latency, but in the resulting schedule they are often placed right next to their uses like this: 1784B %140:vgpr_32 =
2017 Oct 13
2
Machine Scheduler on Power PC: Latency Limit and Register Pressure
Hi, I've been looking at the Machine Scheduler on Power PC. I am looking only at the pre-RA machine scheduler and I am running it in the default bi-directional mode (so, both top down and bottom up queues are considered). I've come across an example where the scheduler picks a poor ordering for the instructions which results in very high register pressure which results in spills.
2016 Feb 03
2
[buildSchedGraph] memory dependencies
Hi, (This only concerns MISNeedChainEdge(), and is separate from D8705) I found out that the MIScheduler (pre-ra) could not handle a simple test case (test/CodeGen/SystemZ/alias-01.ll), with 16 independent load / add / stores. The buildSchedGraph() put too many edges between memory accesses, because 1) There was no implementation of areMemAccessesTriviallyDisjoint() for SystemZ. 2) Type
2006 Jun 14
3
appending
All, In the function below I have 24 individuals and 6 calculations per individual. The 6 calculations are collected each time in a 1:24 loop when calculating "delta". I'd like to collect all 144 = 24*6 calculations in one vector ("delta.patient.comb"). The function works as is via indexing, but is there an easier way to collect the measurements via appendinng the 6
2012 Feb 13
2
R's AIC values differ from published values
Using the Cement hardening data in Anderson (2008) Model Based Inference in the Life Sciences. A Primer on Evidence, and working with the best model which is lm ( y ~ x1 + x2, data = cement ) the AIC value from R is model <- lm ( formula = y ~ x1 + x2 , data = cement ) AIC ( model ) 64.312 which can be converted to AICc by adding the bias
2017 Aug 30
2
Register pressure calculation in the machine scheduler and live-through registers
> On Aug 30, 2017, at 1:43 PM, Matthias Braun <matze at braunis.de> wrote: > > That means you cannot use the code from RegisterPressure.{cpp|h} to compute this. The other liveness analysis we have in llvm codegen is LiveIntervals (LiveItnervalAnalysis) which gives you a list of liveness segments of a given vreg (the same representation is used in most linear scan allocators even
2009 Nov 12
2
A combinatorial optimization problem: finding the best permutation of a complex vector
Hi, I have a complex-valued vector X in C^n. Given another complex-valued vector Y in C^n, I want to find a permutation of Y, say, Y*, that minimizes ||X - Y*||, the distance between X and Y*. Note that this problem can be trivially solved for "Real" vectors, since real numbers possess the ordering property. Complex numbers, however, do not possess this property. Hence the