Displaying 20 results from an estimated 22 matches for "schedwriteres".
2014 Feb 18
2
[LLVMdev] Question about per-operand machine model
...class InstTEST<..., InstrItinClass itin> : Instruction {
  let Itinerary = itin;
}
// I assume this MI writes 2 registers.
def TESTINST : InstTEST<..., II_TEST>;
// schedule info
def II_TEST : InstrItinClass;
def ALU1 : ProcResource<1>;
def ALU2 : ProcResource<1>;
def WriteALU1 : SchedWriteRes<[ALU1]> { let Latency = 1; }
def WriteALU2 : SchedWriteRes<[ALU2]> { let Latency = 2; }
def : ItinRW<[WriteALU1, WriteALU2], [II_TEST]>;
From this example, we can access the latency information of MI with
'getWriteLatencyEntry()' and the resource information of MI with...
2018 Mar 26
2
InstrItin and SchedWriteRes
Hi,
From what I can understand from analyzing several *.td files, there are two
ways of specifying scheduling information for a specific target: either
SchedWriteRes or InstrItinClass/Data.
Specifically looking at ARMScheduleA9.td, I can find both representations
and a comment (at the beginning of the file):
// This section contains legacy support for itineraries. This is
// required until SD and PostRA schedulers are replaced by MachineScheduler.
This pose...
2018 Apr 06
0
InstrItin and SchedWriteRes
> On Mar 26, 2018, at 5:18 AM, Pedro Lopes via llvm-dev <llvm-dev at lists.llvm.org> wrote:
>
> Hi,
>
> From what I can understand from analyzing several *.td files, there are two ways of specifying scheduling information for a specific target: either SchedWriteRes or InstrItinClass/Data.
>
> Specifically looking at ARMScheduleA9.td, I can find both representations and a comment (at the beginning of the file):
>
> // This section contains legacy support for itineraries. This is
> // required until SD and PostRA schedulers are replaced by Mac...
2014 Jan 28
3
[LLVMdev] New machine model questions
...h pipeline independently. Some backend maintainers may still want to use itineraries if that level of precision is critical [1]. Another option is extending the new model. [2]
I will assume that each queue is fully pipelined (4 AGQ ops can be in-flight).
Forcing all this information into a single SchedWriteRes def would look like this:
def P5600FLD : SchedWriteRes <[P5600UnitAGQ, P5600UnitFP]> {
let Latency = 5; // 4 cycle load + 1 cycle FP writeback
let NumMicroOps = 2;
}
This is bad (for an in-order processor) because it prevents FPLoad + FPx from being scheduled in the same cycle and fails...
2018 Apr 06
1
InstrItin and SchedWriteRes
...2018, at 5:18 AM, Pedro Lopes via llvm-dev <
> llvm-dev at lists.llvm.org> wrote:
> >
> > Hi,
> >
> > From what I can understand from analyzing several *.td files, there are
> two ways of specifying scheduling information for a specific target: either
> SchedWriteRes or InstrItinClass/Data.
> >
> > Specifically looking at ARMScheduleA9.td, I can find both
> representations and a comment (at the beginning of the file):
> >
> > // This section contains legacy support for itineraries. This is
> > // required until SD and PostRA sc...
2018 Nov 15
2
Per-write cycle count with ReadAdvance - Do I really need that?
...o work with my ARCH.
It is about the scheduler info that describes reading my ARCH's vector
register. The latencies differ because forwarding/bypass is present. An
example is given below:
def : WriteRes<WriteVector, [MyArchVALU]> { let Latency = 6; }
...
def MyWriteAddVector : SchedWriteRes<[MyArchVALU]> { let Latency = 6; }
def MyWriteMulVector : SchedWriteRes<[MyArchVALU]> { let Latency = 6; }
...
Here I defined 3 different Writes with the same latency. The forwarding is
shown below.
def : ReadAdvance<MyReadVector, 5, [WriteVector]>;
def : ReadAdvance<MyReadVe...
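
The arithmetic behind the ReadAdvance entry in the snippet above can be sketched as follows. This is a simplified illustration, not LLVM's code: the helper name `effective_latency` is hypothetical, and it assumes the consumer's wait is the write latency minus the read advance, clamped at zero.

```python
def effective_latency(write_latency: int, read_advance: int) -> int:
    """Cycles a dependent read waits on a write (simplified, clamped at zero)."""
    return max(write_latency - read_advance, 0)


# Values from the snippet: WriteVector has Latency = 6, and
# ReadAdvance<MyReadVector, 5, [WriteVector]> advances the read by 5 cycles.
forwarded = effective_latency(6, 5)

# A read with no ReadAdvance entry sees the full write latency.
not_forwarded = effective_latency(6, 0)

print(forwarded, not_forwarded)
```

Under these assumptions the forwarded read waits 1 cycle while a read without forwarding waits the full 6.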
2018 Nov 17
2
Per-write cycle count with ReadAdvance - Do I really need that?
...bout the scheduler info which describes reading my ARCH's vector
> register. The latencies differ because forwarding/bypass is present. An
> example is given below:
>
> def : WriteRes<WriteVector, [MyArchVALU]> { let Latency = 6; }
> ...
> def MyWriteAddVector : SchedWriteRes<[MyArchVALU]> { let Latency = 6; }
> def MyWriteMulVector : SchedWriteRes<[MyArchVALU]> { let Latency = 6; }
> ...
>
> Here I defined 3 different Writes with the same latency. The forwarding is
> shown below.
>
> def : ReadAdvance<MyReadVector, 5, [WriteVector]>...
2018 Nov 19
2
Per-write cycle count with ReadAdvance - Do I really need that?
...which describes reading my ARCH's vector
>> register. The latencies differ because forwarding/bypass is present. An
>> example is given below:
>>
>> def : WriteRes<WriteVector, [MyArchVALU]> { let Latency = 6; }
>> ...
>> def MyWriteAddVector : SchedWriteRes<[MyArchVALU]> { let Latency = 6; }
>> def MyWriteMulVector : SchedWriteRes<[MyArchVALU]> { let Latency = 6; }
>> ...
>>
>> Here I defined 3 different Writes with the same latency. The forwarding is
>> shown below.
>>
>> def : ReadAdvance<MyRe...
2016 May 13
2
A question about AArch64 Cortex-A57 subtarget definition
...sGroup`, `A57UnitV`,
which can execute a 128-bit ASIMD floating-point operation,
such as FMLA(Q-form), in a single clock cycle.
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
But in lines 479-483 of `AArch64SchedA57.td`, as shown below
```
def A57WriteFPVMAD : SchedWriteRes<[A57UnitV]> { let Latency = 9; }
def A57WriteFPVMAQ : SchedWriteRes<[A57UnitV, A57UnitV]> { let Latency = 10; }
def A57ReadFPVMA5 : SchedReadAdvance<5, [A57WriteFPVMAD, A57WriteFPVMAQ]>;
def : InstRW<[A57WriteFPVMAD, A57ReadFPVMA5], (instregex "^FML[AS](v2f32|v1i32|v...
2014 Jan 24
2
[LLVMdev] New machine model questions
Hi Andrew,
I seem to be making good progress on the P5600 scheduler using the new machine model but I've got a few questions about it.
How would you represent an instruction that splits into two micro-ops and is dispatched to two different reservation stations?
For example, I have two reservation stations (AGQ and FPQ). An FPU load instruction is split into a load micro-op which is
2014 Nov 02
3
[LLVMdev] "Anti" scheduling with OoO cores?
...doing a bit of experimentation trying to understand the
schedmodel a bit better and improving modelling of FDIV (on Cortex-A57).
FDIV is not pipelined, and blocks other FDIV operations (FDIVDrr and
FDIVSrr). This seems to be already semi-modelled, with a
"ResourceCycles=[18]" line in the SchedWriteRes for this instruction. This
doesn't seem to work (a poor schedule is produced) so I changed it to also
require another resource that I modelled as unbuffered (BufferSize=0), in
the hope that this would "block" other FDIVs... no joy.
Then I noticed that the MicroOpBufferSize is set to...
2020 May 10
2
[llvm-mca] Resource consumption of ProcResGroups
...ideal, but it is
> sort-of a consequence of the two limitations mentioned above (plus the way
> the Haswell and Broadwell models were originally designed).
>
> I hope it helps,
> -Andrea
>
>
> Food for thought...
>
> It would be easy to add a DelayCycles vector to SchedWriteRes to indicate
> the relative start cycle for each reserved resource. That would effectively
> model dependent uOps.
>
> NumMicroOps is only meant to model any general limitation of the cpu
> frontend to issue/rename/retire micro-ops. So, yes, there's no way to
> associate resour...
2020 May 10
2
[llvm-mca] Resource consumption of ProcResGroups
..."reserved" flag is not ideal, but it is sort-of a consequence of the two limitations mentioned above (plus the way the Haswell and Broadwell models were originally designed).
>
> I hope it helps,
> -Andrea
Food for thought...
It would be easy to add a DelayCycles vector to SchedWriteRes to indicate the relative start cycle for each reserved resource. That would effectively model dependent uOps.
NumMicroOps is only meant to model any general limitation of the cpu frontend to issue/rename/retire micro-ops. So, yes, there's no way to associate resources with specific uOps. You c...
2014 Mar 03
2
[LLVMdev] Question about per-operand machine model
...1_LATENCY_WITH_P0, 0_LATENCY_WITH_P1, 0_LATENCY_WITH_P2], [II_ADD]>;
> def :ItinRW<[2_LATENCY_WITH_P0, 0_LATENCY_WITH_P1, 0_LATENCY_WITH_P2], [II_MUL]>;
>
> where n_LATENCY_WITH_p is defined roughly as:
>
> class n_LATENCY_WITH_p<int latency, ProcResourceKind port> : SchedWriteRes<[PR_Pp]> {
> let Latency = latency;
> let ResourceDelays = [latency];
> }
>
> class PR_Pp<int portIdx> : ProcResource<1>;
>
> The latency for register write-back/port access is static and without interlock, which I think means the port resources should...
2014 Mar 04
2
[LLVMdev] Question about per-operand machine model
...LATENCY_WITH_P2], [II_ADD]>;
>>> def :ItinRW<[2_LATENCY_WITH_P0, 0_LATENCY_WITH_P1, 0_LATENCY_WITH_P2], [II_MUL]>;
>>>
>>> where n_LATENCY_WITH_p is defined roughly as:
>>>
>>> class n_LATENCY_WITH_p<int latency, ProcResourceKind port> : SchedWriteRes<[PR_Pp]> {
>>> let Latency = latency;
>>> let ResourceDelays = [latency];
>>> }
>>>
>>> class PR_Pp<int portIdx> : ProcResource<1>;
>>>
>>> The latency for register write-back/port access is static and without...
2016 Mar 08
2
Head at revision #262824 - breaks Movidius Out-of-Tree target
[I tweaked the subject, #262824 did not introduce the problem, it is just the version I am first seeing this problem]
A quick update - I have added 'Sched<[]>' as a base class for all instructions, and also:
let hasNoSchedulingInfo = 1;
to all the Pseudos, but while most of the errors have gone, I still get the diagnostic for 'COPY' thus:
error : No schedule
2014 Feb 28
2
[LLVMdev] Question about per-operand machine model
On Feb 19, 2014, at 1:54 PM, jingu <jingu at codeplay.com> wrote:
> Hi Andy,
>
> I am trying to schedule and packetize instructions for VLIW at post-RA
> stage or final codegen stage, where code transformations are not allowed
> any more, because the hardware cannot resolve resource conflicts. A
> simple example follows:
>
> ADD dest_reg1, src_reg1,
2020 May 09
2
[llvm-mca] Resource consumption of ProcResGroups
Hi,
I’m trying to work out the behavior of llvm-mca on instructions with ProcResGroups. My current understanding is:
When an instruction requests a port group (e.g., HWPort015) and all of its atomic sub-resources (e.g., HWPort0,HWPort1,HWPort5), HWPort015 is marked as “reserved” and is issued in parallel with HWPort0, HWPort1, and HWPort5, blocking future instructions from reserving HWPort015
2018 May 10
2
[RFC] MC support for variant scheduling classes.
...t a register-register XOR is a
zero-idiom if both operands are the same register. That means, the XOR would
be optimized out at register renaming stage, and no opcode issued to the
pipelines. A variant scheduling class can be used to describe this case (see
example below):
```
def ZeroIdiomWrite : SchedWriteRes<[]> { let Latency = 0; }
def ZeroIdiom : SchedPredicate<[{
MI->getOpcode() == X86::XORrr &&
MI->getOperand(0).getReg() == MI->getOperand(1).getReg()
}]>;
def WriteXOR : SchedWriteVariant<[
SchedVar<ZeroIdiom, [ZeroIdiomWrite],
SchedVar<NoSched...
2018 Mar 02
0
[RFC] llvm-mca: a static performance analysis tool
...ler's queue.
>
> Zero latency instructions (for example NOP instructions) don't consume scheduler
> resources. However, those instructions still reserve a number of slots in the
> reorder buffer.
As currently modeled by dummy µOps:
// A single nop micro-op (uX).
def WriteX : SchedWriteRes<[]> { let Latency = 0; }
You’ll need MachineInstr’s though.
> Instruction Issue
> -----------------
>
> As mentioned in the previous section, each scheduler resource implements a queue
> of instructions. An instruction has to wait in the scheduler's queue until
> inp...