thr3ads.net - similar to: "Fwd: MachineScheduler not scheduling for latency"

Displaying 20 results from an estimated 200 matches similar to: "Fwd: MachineScheduler not scheduling for latency"

MachineScheduler not scheduling for latency

2019 Sep 10

Hi Andy, Thanks for the explanations. Yes AMDGPU is in-order and has MicroOpBufferSize = 1. Re "issue limited" and instruction groups: could it make sense to disable the generic scheduler's detection of issue limitation on in-order CPUs, or on CPUs that don't define instruction groups, or some similar condition? Something like: --- a/lib/CodeGen/MachineScheduler.cpp +++

[LLVMdev] Question about load clustering in the machine scheduler

2015 Mar 27

On Thu, Mar 26, 2015 at 11:50:20PM -0700, Andrew Trick wrote: > > > On Mar 26, 2015, at 7:36 PM, Tom Stellard <tom at stellard.net> wrote: > > > > Hi, > > > > I have a program with over 100 loads (each with a 10 cycle latency) > > at the beginning of the program, and I can't figure out how to get > > the machine scheduler to intermix ALU

Bug in TableGen RegisterBankEmitter

2017 May 16

On 05/16/2017 11:57 AM, Daniel Sanders wrote: >> If that's right, one possible fix would be to rename some of the subregister indices but that's likely to be quite painful. I'll have a think and see if I can come up with something nicer. > > I haven't been able to come up with a better answer for this, just an alternate choice as to where the complexity is. If we were

RFC: atomic operations on SI+

2016 Mar 28

On Fri, Mar 25, 2016 at 02:22:11PM -0400, Jan Vesely wrote: > Hi Tom, Matt, > > I'm working on a project that needs few coherent atomic operations (HSA > mode: load, store, compare-and-swap) for std::atomic_uint in HCC. > > the attached patch implements atomic compare and swap for SI+ > (untested). I tried to stay within what was available, but there are > few issues

LiveInterval error with 2 dead defs

2019 Sep 09

Hi, I’m hitting a machine verifier error in a trivial testcase which I don’t understand. There are 2 dead defs of the same register: --- name: multiple_connected_compnents_dead tracksRegLiveness: true body: | bb.0: dead %0:vgpr_32 = V_MOV_B32_e32 0, implicit $exec dead %0:vgpr_32 = V_MOV_B32_e32 1, implicit $exec ... The live intervals look OK to me with 1 valno

RFC: atomic operations on SI+

2016 Mar 25

Hi Tom, Matt, I'm working on a project that needs a few coherent atomic operations (HSA mode: load, store, compare-and-swap) for std::atomic_uint in HCC. The attached patch implements atomic compare and swap for SI+ (untested). I tried to stay within what was available, but there are a few issues that I was unsure how to address: 1.) it currently uses v2i32 for both input and output. This

Bug in TableGen RegisterBankEmitter

2017 May 10

Hi Tom, The output: Added VReg_64(explicit) Added VS_32(explicit (VS_32) VReg_64 class-with-subregs: VReg_64) is saying that VS_32 was added because VReg_64 was explicitly specified and that while inspecting VS_32, it noticed that every register in VS_32 was a subregister of a register from VReg_64 using a single common subregister index. I've added some more tracing to my local copy and

[MachineScheduler] Question about IssueWidth / NumMicroOps

2018 May 09

Hi, I would like to ask what IssueWidth and NumMicroOps refer to in MachineScheduler, just to be 100% sure what the intent is. Are we modeling the decoder phase or the execution stage? Background: First of all, there seem to be different meanings of "issue" depending on which platform you're on:

LiveInterval error with 2 dead defs

2019 Oct 07

The associated patch caused compilation problems on Hexagon: https://bugs.llvm.org/show_bug.cgi?id=43302 The splitting of a live interval should not be done automatically upon creation. Calling LIS->getInterval(Reg) should not go around changing the code behind the scenes. There is already a function “splitSeparateComponents” that does that. It should be added where it’s missing. --

mischeduler (pre-RA) experiments

2017 Nov 25

> > Of course, you want to duplicate as little of the generic scheduling logic > as you can. So I think the challenge is how to expose the > generic scheduler's functionality as a base class or composition of > utilities so that defining your strategy doesn't require too much > copy-paste. Isn't GCNMaxOccupancySchedStrategy [1] already an example on using

[MachineScheduler] Question about IssueWidth / NumMicroOps

2018 May 09

> On May 9, 2018, at 9:43 AM, Jonas Paulsson <paulsson at linux.vnet.ibm.com> wrote: > > Hi, > > I would like to ask what IssueWidth and NumMicroOps refer to in MachineScheduler, just to be 100% sure what the intent is. > Are we modeling the decoder phase or the execution stage? > > Background: > > First of all, there seems to be different meanings of

[LLVMdev] "Anti" scheduling with OoO cores?

2014 Nov 02

Hi Andy, Dave, I've been doing a bit of experimentation trying to understand the schedmodel a bit better and improving modelling of FDIV (on Cortex-A57). FDIV is not pipelined, and blocks other FDIV operations (FDIVDrr and FDIVSrr). This seems to be already semi-modelled, with a "ResourceCycles=[18]" line in the SchedWriteRes for this instruction. This doesn't seem to work (a

[LLVMdev] Question about load clustering in the machine scheduler

2015 Mar 27

Hi, I have a program with over 100 loads (each with a 10 cycle latency) at the beginning of the program, and I can't figure out how to get the machine scheduler to intermix ALU instructions with the loads to effectively hide the latency. It seems the issue is with load clustering. I restrict load clustering to 4 at a time, but when I look at the debug output, the loads are always being

Prioritizing an SDNode for scheduling

2016 Oct 21

Hello. Is there a way to specify in the back end an (ISD::INLINEASM) SDNode to be scheduled first under all circumstances? I need to specify something like node priority to schedule the node before all other nodes in the SelectionDAG of the basic block. (Using chain or glue edges in order to make a node first is not a good idea, since I am doing this at instruction selection time, on

[LLVMdev] how to detect data hazard in pre-RA-sched

2013 Sep 25

Hi, Andrew, Thank you for answering my question. What's the status of misched? Is it experimental? I found it is disabled by default for all architectures (3.4svn). I also don't understand the algorithm. Could you point me to more papers or text materials about your approach? It seems that you want to balance register pressure and ILP in misched. On Tue, Sep 24, 2013 at 4:07 PM,

How to get started with instruction scheduling? Advice needed.

2016 Apr 26

Hi Phil. You more or less answered your own question, but let me give you some more info. Maybe it is of use. From what I understand the SchedMachineModel is the future, although it is not as powerful as itineraries at present. The mi-scheduler is mostly developed around out-of-order cores, I believe (I'd love to hear arguments to the contrary). Some of the constraints that can be found in

Prioritizing an SDNode for scheduling

2016 Oct 21

I probably misunderstood the question. You probably want to do this in SelectionDAG. On Fri, Oct 21, 2016 at 10:29 AM, Ehsan Amiri <ehsanamiri at gmail.com> wrote: > You can do this by changing instruction scheduling heuristics. I think the > more important question is if this correct always for all platforms. > > I don't know which scheduler you use. We use

[MachineScheduler] Question about IssueWidth / NumMicroOps

2018 May 14

Hi Andrew, Thank you very much for the most helpful explanations! Many things could go in as comments, if you ask me - for example: --- > The LLVM machine model is an abstract machine. > The abstract pipeline is built around the notion of an "issue point". This is merely a reference point for counting machine cycles. > > > IssueWidth is meant to be a hard in-order

[LLVMdev] how to detect data hazard in pre-RA-sched

2013 Sep 26

On Wed, Sep 25, 2013 at 1:15 PM, Andrew Trick <atrick at apple.com> wrote: > > On Sep 24, 2013, at 7:59 PM, Liu Xin <navy.xliu at gmail.com> wrote: > > Hi, Andrew, > > Thank you for answering my question. > > What's the status of misched? is it experimental? I found it is disabled > by default for all architectures(3.4svn). I also don't understand

Scheduler: modelling long register reservations?

2017 Apr 03

Hello, My out-of-tree target features some high latency instructions (let's call them FXLV). When an FXLV issues, it reserves its destination register and execution continues; if a subsequent instruction attempts to read or write that register, the pipeline will stall until the FXLV completes. I have attempted to encode this constraint in the machine scheduler (excerpt at bottom of email).
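The stall behaviour described in this snippet can be illustrated with a small toy model. This is only a sketch under stated assumptions, not the poster's scheduler code and not an LLVM API: it assumes an in-order, single-issue pipeline, and the names `Instr` and `simulate`, the register names, and the 10-cycle FXLV latency are all hypothetical.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class Instr:
    name: str
    dst: Optional[str]          # destination register, if any
    srcs: Tuple[str, ...] = ()  # source registers
    latency: int = 1            # cycles for which dst stays reserved (hypothetical values)


def simulate(instrs):
    """Toy in-order, single-issue model: return (issue_cycles, total_stall_cycles)."""
    busy_until = {}   # register -> first cycle at which it is free again
    cycle = 0
    issue_cycles = []
    stalls = 0
    for ins in instrs:
        # An instruction stalls if it reads or writes a still-reserved register.
        regs = list(ins.srcs) + ([ins.dst] if ins.dst else [])
        ready = max((busy_until.get(r, 0) for r in regs), default=0)
        if ready > cycle:
            stalls += ready - cycle
            cycle = ready
        issue_cycles.append(cycle)
        if ins.dst:
            busy_until[ins.dst] = cycle + ins.latency
        cycle += 1  # single issue: the next instruction issues one cycle later at the earliest
    return issue_cycles, stalls


fxlv = Instr("FXLV", dst="r0", latency=10)
use = Instr("USE", dst="r4", srcs=("r0",))
add = Instr("ADD", dst="r1", srcs=("r2", "r3"))

# The consumer placed directly after the FXLV stalls for the whole reservation.
print(simulate([fxlv, use, add]))
# Moving independent work between producer and consumer hides one stall cycle.
print(simulate([fxlv, add, use]))
```

In this model, each independent instruction placed between the FXLV and the first access to its destination register removes one stall cycle, which is the effect a latency-aware schedule would aim for.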
