thr3ads.net - search: "microopbuffersize"

Displaying 20 results from an estimated 20 matches for "microopbuffersize".

MachineScheduler not scheduling for latency

2019 Sep 10

Hi Andy, Thanks for the explanations. Yes AMDGPU is in-order and has MicroOpBufferSize = 1. Re "issue limited" and instruction groups: could it make sense to disable the generic scheduler's detection of issue limitation on in-order CPUs, or on CPUs that don't define instruction groups, or some similar condition? Something like: --- a/lib/CodeGen/MachineScheduler.c...

Fwd: MachineScheduler not scheduling for latency

2019 Sep 09

Hi, I'm trying to understand why MachineScheduler does a poor job in straight line code in cases like the one in the attached debug dump. This is on AMDGPU, an in-order target, and the problem is that the IMAGE_SAMPLE instructions have very high (80 cycle) latency, but in the resulting schedule they are often placed right next to their uses like this: 1784B %140:vgpr_32 =

[LLVMdev] "Anti" scheduling with OoO cores?

2014 Nov 02

... line in the SchedWriteRes for this instruction. This doesn't seem to work (a poor schedule is produced), so I changed it to also require another resource that I modelled as unbuffered (BufferSize=0), in the hope that this would "block" other FDIVs... no joy. Then I noticed that the MicroOpBufferSize is set to 128, which is wildly high, as Cortex-A57 has separate, smaller reorder buffers, not one larger reorder buffer. Even reducing it down to "2" had no effect; the divs were still scheduled in a clump together. But "1" and "0" (denoting in-order) produced a nice schedu...

How to get started with instruction scheduling? Advice needed.

2016 Apr 26

...ple would be AArch64/AArch64SchedA53.td. If itineraries are present, they are used by the mi-scheduler next to the SchedMachineModel to detect hazards. I think that is the only place where the mi-scheduler uses itineraries. There are some magic numbers you need for in-order operation. Most notably MicroOpBufferSize should be set to 0 for full in-order behaviour. You also want to set CompleteModel to 0 as that prevents asserts due to instructions without scheduling information. There is a script that might help you to visualise if you have provided scheduling information in the SchedMachineModel for all instru...
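The in-order settings described in this reply can be sketched as a TableGen fragment. This is a minimal illustration, not taken from any in-tree target: the model name `MyInOrderModel` and the `IssueWidth` and `LoadLatency` values are assumptions chosen for the example.

```tablegen
// Hypothetical in-order machine model. The name and the numeric values
// are illustrative assumptions, not copied from a real target.
include "llvm/Target/Target.td"

def MyInOrderModel : SchedMachineModel {
  let IssueWidth = 1;         // assumed: single-issue pipeline
  let MicroOpBufferSize = 0;  // full in-order behaviour, as advised above
  let LoadLatency = 3;        // assumed value
  let CompleteModel = 0;      // tolerate instructions without scheduling info
}
```

Leaving `CompleteModel` at 0 is useful while the model is still being bootstrapped; once every instruction carries scheduling information it can be switched back to 1 so that gaps are reported.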

Scheduler: modelling long register reservations?

2017 Apr 03

...e vector result. Am I specifying the scheduling constraints incorrectly? Can llvm support this kind of constraint?

Thank you,
Nick Johnson
D. E. Shaw Research

  // Excerpted from lib/Target/MyTarget/MyTargetSchedule.td:
  //
  def DesGCv3GenericModel : SchedMachineModel {
    let IssueWidth = 1;
    let MicroOpBufferSize = 0;
    let CompleteModel = 1;
  }
  // ...
  def FlexU : ProcResource<64> {
    let BufferSize = 1;
  }
  def : WriteRes<IIFlexRead, [FlexU]> {
    let Latency = 25;
    let ResourceCycles = [25];
  }
  class SchedFlexRead : Sched< [IIFlexRead] >;
  // I apply this to the definition of...

Machine Scheduler on Power PC: Latency Limit and Register Pressure

2017 Oct 13

...lable instructions can effectively be scheduled “for free”. Scheduling them first won’t delay anything in the pending queue! If that’s not the case then there’s something wrong with your model. Or if this kind of VLIW-style scheduling isn’t what you wanted, then you shouldn’t have ordered it: Set MicroOpBufferSize=1 instead:

  // MicroOpBufferSize is the number of micro-ops that the processor may buffer
  // for out-of-order execution.
  //
  // "0" means operations that are not ready in this cycle are not considered
  // for scheduling (they go in the pending queue). Latency is paramount. This...
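To make the distinction concrete, the fragment below places the three kinds of setting side by side. The model names and the `IssueWidth` and buffer values are hypothetical, and the comments on "1" and "> 1" paraphrase the field documentation in `llvm/Target/TargetSchedule.td` rather than the truncated excerpt above, so treat it as a sketch and check the header in your LLVM version.

```tablegen
include "llvm/Target/Target.td"

// "0": only instructions that are ready in the current cycle are
// candidates; everything else waits in the pending queue.
def HypotheticalVLIWStyleModel : SchedMachineModel {
  let IssueWidth = 2;          // assumed
  let MicroOpBufferSize = 0;
  let CompleteModel = 0;
}

// "1": still in-order, but every instruction is a candidate whether or
// not it is ready; latency stalls are weighed against other heuristics.
def HypotheticalInOrderModel : SchedMachineModel {
  let IssueWidth = 2;          // assumed
  let MicroOpBufferSize = 1;
  let CompleteModel = 0;
}

// "> 1": the core is treated as out-of-order; the value is a rough,
// machine-independent estimate of its buffering capacity.
def HypotheticalOOOModel : SchedMachineModel {
  let IssueWidth = 2;          // assumed
  let MicroOpBufferSize = 64;  // assumed
  let CompleteModel = 0;
}
```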

How to get started with instruction scheduling? Advice needed.

2016 Apr 20

So if I use the SchedMachineModel method, can I just skip itineraries? Phil

On Wed, Apr 20, 2016 at 12:29 PM, Sergei Larin <slarin at codeaurora.org> wrote:
> Target does make a difference. VLIW needs more hand-holding. For what you
> are describing it should be fairly simple.
>
> Best strategy – see what other targets do. ARM might be a good start for
> generic

[VLIW Scheduler] Itineraries vs. per operand scheduling

2018 Feb 08

...e description. At least initially, there also wasn't anyone interested in working on an in-order scheduling implementation based on the new machine model, even though the machine description itself was designed for in-order scheduling. There are a handful of targets now using the new model with MicroOpBufferSize = 0/1 (in-order mode). It's hard for me to say how widely supported it is, because so many backends using the MachineScheduler are out-of-tree. The per-operand model should be much more compile time efficient, but that's often not a concern. It can be easier now to incrementally bootstrap...

[VLIW Scheduler] Itineraries vs. per operand scheduling

2018 Feb 04

Hi, What is the best way to model a scheduler for a VLIW in-order architecture? I've looked at the Hexagon and R600 architectures and they are using itineraries. I wanted to understand the benefit in using itineraries over the per operand scheduling. I also found this thread from almost 2 years ago: http://lists.llvm.org/pipermail/llvm-dev/2016-April/098763.html At that time it seemed the

Machine Scheduler on Power PC: Latency Limit and Register Pressure

2017 Oct 13

Hi, I've been looking at the Machine Scheduler on Power PC. I am looking only at the pre-RA machine scheduler and I am running it in the default bi-directional mode (so, both top down and bottom up queues are considered). I've come across an example where the scheduler picks a poor ordering for the instructions which results in very high register pressure which results in spills.

[LLVMdev] Question about load clustering in the machine scheduler

2015 Mar 27

On Thu, Mar 26, 2015 at 11:50:20PM -0700, Andrew Trick wrote:
>
> > On Mar 26, 2015, at 7:36 PM, Tom Stellard <tom at stellard.net> wrote:
> >
> > Hi,
> >
> > I have a program with over 100 loads (each with a 10 cycle latency)
> > at the beginning of the program, and I can't figure out how to get
> > the machine scheduler to intermix ALU

[RFC] llvm-mca: a static performance analysis tool

2018 Mar 02

...er detail, but the benefit didn’t justify the complexity and compile time. I always felt that a static analysis tool was the right place for this kind of simulation. > A few examples of details that are missing in scheduling models are: > - Maximum number of instructions retired per cycle. MicroOpBufferSize is presumed to cover register renaming and retirement, assuming they are well-balanced. For your tool, you certainly want to be more precise. > - Actual dispatch width (it often differs from the issue width). This was always a hard one to generalize in a machine independent way, and half the...

[LLVMdev] [RFC] Iterrative compilation framework for Clang/LLVM

2013 Dec 23

On Dec 16, 2013, at 4:26 PM, Hal Finkel <hfinkel at anl.gov> wrote:
>> At the end of each iteration, quality of generated code is estimated by
>> executing newly introduced target dependent pass. Based on results path for
>> the following iteration is calculated. At the moment, this has been proved
>> for MIPS only and it is based on code

[RFC] llvm-mca: a static performance analysis tool

2018 Mar 02

...s tool was the right place for this kind of simulation. > I agree. I am pretty confident that all the extra details can become opt-in for targets. > A few examples of details that are missing in scheduling models are: > - Maximum number of instructions retired per cycle. > > > MicroOpBufferSize is presumed to cover register renaming and retirement, > assuming they are well-balanced. For your tool, you certainly want to be > more precise. > Yes. The long term goal is to have specific (optional) fields for targets that want to specify a different value. MicroOpBufferSize could sti...

[LLVMdev] New machine model questions

2014 Jan 28

From: Andrew Trick [mailto:atrick at apple.com]
Sent: 24 January 2014 21:52
To: Daniel Sanders
Cc: LLVM Developers Mailing List (llvmdev at cs.uiuc.edu)
Subject: Re: New machine model questions

On Jan 24, 2014, at 2:21 AM, Daniel Sanders <Daniel.Sanders at imgtec.com> wrote:

Hi Andrew, I seem to be making good progress on the P5600 scheduler

[MachineScheduler] Question about IssueWidth / NumMicroOps

2018 May 09

...nction between decode bandwidth and in-order issue of micro-ops. Some target maintainers may want to schedule for an OOO machine as if it were in-order. They are welcome to do that (and hopefully have plenty of architectural registers). The scheduling mode can be broadly selected with the infamous MicroOpBufferSize setting, or individual resources can be marked in-order with BufferSize=0. And, as always, I suggest writing your own scheduling strategy if you care that deeply about scheduling for the peculiarities of your machine. (caveat: there may still be GenericScheduler implementation deficiencies because...
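The per-resource alternative mentioned here can be sketched as follows: the model as a whole stays out-of-order, while one resource is marked unbuffered. All names (`HypotheticalOOOModel`, `WriteMyDiv`, `MyDivUnit`) and numeric values are assumptions for illustration only.

```tablegen
include "llvm/Target/Target.td"

// Hypothetical out-of-order model; the values are assumed.
def HypotheticalOOOModel : SchedMachineModel {
  let IssueWidth = 3;
  let MicroOpBufferSize = 32;
  let CompleteModel = 0;
}

// Hypothetical scheduling class for a divide instruction.
def WriteMyDiv : SchedWrite;

let SchedModel = HypotheticalOOOModel in {
  // A single divider unit, marked unbuffered (BufferSize = 0) so that it
  // alone is treated as in-order.
  def MyDivUnit : ProcResource<1> { let BufferSize = 0; }

  // Divides occupy the unbuffered unit; the latency is an assumed value.
  def : WriteRes<WriteMyDiv, [MyDivUnit]> {
    let Latency = 12;
  }
}
```

This keeps the global `MicroOpBufferSize` estimate intact and confines the in-order constraint to the one unit that needs it.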

[MachineScheduler] Question about IssueWidth / NumMicroOps

2018 May 09

Hi, I would like to ask what IssueWidth and NumMicroOps refer to in MachineScheduler, just to be 100% sure what the intent is. Are we modeling the decoder phase or the execution stage? Background: First of all, there seem to be different meanings of "issue" depending on which platform you're on:

[RFC] llvm-mca: a static performance analysis tool

2018 Mar 01

...tSchedule.td). The reorder buffer is implemented by class RetireControlUnit (see Dispatch.h). Its goal is to track the progress of instructions that are "in-flight", and retire instructions in program order. The number of entries in the reorder buffer defaults to the value of field 'MicroOpBufferSize' from the target scheduling model. Instructions that are dispatched to the schedulers consume scheduler buffer entries. The tool queries the scheduling model to figure out the set of buffered resources consumed by an instruction. Buffered resources are treated like "scheduler" reso...

[RFC] llvm-mca: a static performance analysis tool

2018 Mar 02

> ...r is implemented by class RetireControlUnit (see Dispatch.h).
> Its goal is to track the progress of instructions that are "in-flight", and
> retire instructions in program order. The number of entries in the reorder
> buffer defaults to the value of field 'MicroOpBufferSize' from the target
> scheduling model.
>
> Instructions that are dispatched to the schedulers consume scheduler buffer
> entries. The tool queries the scheduling model to figure out the set of
> buffered resources consumed by an instruction. Buffered resources are
> tre...

[RFC] llvm-mca: a static performance analysis tool

2018 Mar 02

> ...reorder buffer is implemented by class RetireControlUnit (see Dispatch.h).
> Its goal is to track the progress of instructions that are "in-flight", and
> retire instructions in program order. The number of entries in the reorder
> buffer defaults to the value of field 'MicroOpBufferSize' from the target
> scheduling model.
>
> Instructions that are dispatched to the schedulers consume scheduler buffer
> entries. The tool queries the scheduling model to figure out the set of
> buffered resources consumed by an instruction. Buffered resources are
> treated...
