thr3ads.net - llvm dev - [LLVMdev] Question about per-operand machine model [Mar 2014]

If this information is useful, please help other people find it:
Share via:

Andrew Trick

2014-Feb-28 01:00 UTC

[LLVMdev] Question about per-operand machine model

On Feb 19, 2014, at 1:54 PM, jingu <jingu at codeplay.com> wrote:
> Hi Andy,
> 
> I am trying to schedule and packetize instructions for VLIW at post-RA
> stage or final codegen stage, where code transformations are not allowed
> any more, because hardware can not resolve resource conflict. There is a
> simple example as following:
> 
> ADD dest_reg1, src_reg1, src_reg2 (functional unit : ALU)
> STORE dest_reg2, mem (functional unit: LOAD_STORE)
> 
> These instructions can be genally packetized together because there is
> no dependency among operands and they use different functional unit. But
> we have one more restricton. The restriction is that some of
> instructions can not access to same register file at the same cycle. In
> other words, if 'src_reg1' of ADD instruction uses register file
'A' and
> 'dest_reg2' of STORE instruction uses same register file at the
same
> cycle, it causes resource conflict and can not be executed on same
> cycle. This restriction depends on instruction type. I tried to consider
> each register file as a resource unit which is consumed by each operand.
> While scheduling instructions per cycle, used register file is recorded
> on state per cycle to check the conflict. In our heristic, it depends on
> operand's latency to record this resource on specific cycle's
state. so
> I have tried to find a way to get latency and resource with each
> operand. If it is not possible to support this feature with per-operand
> resource model, as you suggested, I will try to make our own state
> machine or other scheduling constraint logic. I am newbee with
> scheduler. If you have any kinds of comment or feel something worng,
> please let me know. It will be really helpful.
It sounds like the register file is static and does not depend on register
allocation. In this case, what you tried makes sense but is really not
supported. The machine model tables are designed to be efficient for the common
case, and per-operand resources don’t really make sense most of the time.

It sounds like you want to model the pipeline stage at which a resource is used.
To do that with the per-operand machine model (misnomer), I think we need a
ResourceDelay vector in addition to ResourceCycles, which we could easily add.

However, overall, I think you’re target is interesting enough that you may be
better off augmenting the standard machine model with your own model. Your
scheduler plugin could keep your own tables or state machine to model the
constraints.

If you want to be clever, you could write tablegen code to build your model up
from the SchedRead/Write definitions that are part of the standard model. You
could add extra fields specific to your model.

Were you previously using the old instruction itineraries, and now moving to the
new model?

-Andy
> 
> Thanks for your kind response,
> JinGu Kang
> 
> On 2014-02-20 오전 2:27, Andrew Trick wrote:
>> Hi JinGu,
>> 
>> We currently have the ResourceCycles list to indicate the number of cpu
cycles during which a resource is reserved. We could simply add a ResourceDelay
with similar grammar. The MachineScheduler could be taught to keep track of the
first and last time that a resource is reserved.
>> 
>> Note that the MachineScheduler will work with the instruction
itineraries if you choose to implement them. That’s the only way to get a full
reservation table without customizing the scheduler. You can plugin your own
state machine or other scheduling constraint logic. You may want to do this if
you have very complicated constraints.
>> 
>> Can you provide an example of the most complicated instruction
resources that you need to model?
>> 
>> -Andy
>> 
>> On Feb 19, 2014, at 4:57 AM, JinGu Kang <jingu at codeplay.com>
wrote:
>> 
>>> Hi Andy,
>>> 
>>> I am sorry to misunderstand 'ReadAdvance' code. In order to
support
>>> resource per operand, I feel we need more table and function. If
>>> possbile, I would like to listen to your opinion whether this
feature is
>>> useful or not. As I mentioned on previous e-mail, it will be useful
to
>>> access the latency and the resource per operand while checking
resource
>>> conflict per cycle.
>>> 
>>> Thanks,
>>> JinGu Kang
>>> 
>>> On 18/02/14 23:09, jingu wrote:
>>>>> Resources and latency are not tied. An instruction is
mapped to a
>>>>> scheduling class. A scheduling class is mapped to a set of
resources
>>>>> and a per-operand list of latencies.
>>>> Thanks for your kind explanation.
>>>> 
>>>> Our heuristic algorithm have needed the latency and the
resource per
>>>> operand to check resource conflicts per cycle. In order to
support
>>>> this with LLVM, I expected a per-operand list of resources like
>>>> latencies with a scheduling class.
>>>> 
>>>> Can I ask you something to modify on tablegen? I think that the
>>>> 'WriteResourceID' field of
'MCWriteLatencyEntry' is for identifying
>>>> the WriteResources of each defintion as commented on code. As
you
>>>> know, tablegen sets the 'WriteResourceID' field of
>>>> 'MCWriteLatencyEntry' with 'WriteID' when the
'Write' of defition is
>>>> referenced by a 'ReadAdvance'. If we always set this
field with
>>>> 'WriteID', it causes problem? I can see that
'ReadAdvance' only uses
>>>> the 'WriteResourceID' field of
'MCWriteLatencyEntry' in
>>>> 'computeOperandLatency' function. I think the pair of
latency and
>>>> write resource for defintion will be useful to check conflicts
of
>>>> resources. As reference, I have attached simple patch.
>>>> 
>>>> Thanks,
>>>> JinGu Kang
>>>> 
>

Pierre-Andre Saulais

2014-Mar-03 16:53 UTC

head link

[LLVMdev] Question about per-operand machine model

Hi Andrew,

We are currently using a custom model where scheduling information is 
attached to each MCInstrDesc through tablegen, and we're trying to move 
to one of LLVM's models.

To expand on what JinGu mentioned, our target has explicit ports that 
are used to read and write values from and to the register file. The 
read port is usually accessed on cycle 0 while the write port is 
accessed when the result is written back to the destination register. 
Let's assume ADD has a latency of 1, MUL has a latency of 2 and both use 
port P0 to write back their result. The two instructions below would 
conflict on P0:

MUL r3, r4, r5
ADD r0, r1, r2
NOP               ; Both r0 and r4 are written back using P0 - conflict.

On our target there is no interlock which means any conflict results in 
the wrong value being written back to one of the register. That's why we 
want to model these ports as resources in the new model. That's also why 
we map these port resources to each operand as each operand accesses a 
different port.

After reading your replies, we have realized that the scheduler does not 
need to know which operand corresponds to each port. It simply needs to 
know the set of ports used by each instruction and after how many cycles 
these ports are used/reserved to avoid any conflict. That's why I 
believe the new process resource model closely fits what we need, except 
for the per-resource delay you mentioned.

This is how our model currently looks like:

def :ItinRW<[1_LATENCY_WITH_P0, 0_LATENCY_WITH_P1, 0_LATENCY_WITH_P2], 
[II_ADD]>;
def :ItinRW<[2_LATENCY_WITH_P0, 0_LATENCY_WITH_P1, 0_LATENCY_WITH_P2], 
[II_MUL]>;

where n_LATENCY_WITH_p is defined roughly as:

class n_LATENCY_WITH_p<int latency, ProcResourceKind port> : 
SchedWriteRes<[PR_Pp]> {
     let Latency = latency;
     let ResourceDelays = [latency];
}

class PR_Pp<int portIdx> : ProcResource<1>;

The latency for register write-back/port access is static and without 
interlock, which I think means the port resources should have 'Buffered 
= 0' in the definition. Is that correct?

I have attached a patch that adds the 'ResourceDelays' field in 
tablegen. Could you have a look at it? A couple possible issues are:
- 'Delay' is signed, since 'Cycles' in MCWriteLatencyEntry is
also signed.
- When an instruction accesses the same resource multiple times, the 
uses are aggregated in SubtargetEmitter::GenSchedClassTables. I'm not 
sure how that would work if we add a 'Delay' field to
MCWriteProcResEntry.

Thanks,
Pierre

On 28/02/14 01:00, Andrew Trick wrote:> On Feb 19, 2014, at 1:54 PM, jingu <jingu at codeplay.com> wrote:
>
>> Hi Andy,
>>
>> I am trying to schedule and packetize instructions for VLIW at post-RA
>> stage or final codegen stage, where code transformations are not
allowed
>> any more, because hardware can not resolve resource conflict. There is
a
>> simple example as following:
>>
>> ADD dest_reg1, src_reg1, src_reg2 (functional unit : ALU)
>> STORE dest_reg2, mem (functional unit: LOAD_STORE)
>>
>> These instructions can be genally packetized together because there is
>> no dependency among operands and they use different functional unit.
But
>> we have one more restricton. The restriction is that some of
>> instructions can not access to same register file at the same cycle. In
>> other words, if 'src_reg1' of ADD instruction uses register
file 'A' and
>> 'dest_reg2' of STORE instruction uses same register file at the
same
>> cycle, it causes resource conflict and can not be executed on same
>> cycle. This restriction depends on instruction type. I tried to
consider
>> each register file as a resource unit which is consumed by each
operand.
>> While scheduling instructions per cycle, used register file is recorded
>> on state per cycle to check the conflict. In our heristic, it depends
on
>> operand's latency to record this resource on specific cycle's
state. so
>> I have tried to find a way to get latency and resource with each
>> operand. If it is not possible to support this feature with per-operand
>> resource model, as you suggested, I will try to make our own state
>> machine or other scheduling constraint logic. I am newbee with
>> scheduler. If you have any kinds of comment or feel something worng,
>> please let me know. It will be really helpful.
> It sounds like the register file is static and does not depend on register
allocation. In this case, what you tried makes sense but is really not
supported. The machine model tables are designed to be efficient for the common
case, and per-operand resources don’t really make sense most of the time.
>
> It sounds like you want to model the pipeline stage at which a resource is
used. To do that with the per-operand machine model (misnomer), I think we need
a ResourceDelay vector in addition to ResourceCycles, which we could easily add.
>
> However, overall, I think you’re target is interesting enough that you may
be better off augmenting the standard machine model with your own model. Your
scheduler plugin could keep your own tables or state machine to model the
constraints.
>
> If you want to be clever, you could write tablegen code to build your model
up from the SchedRead/Write definitions that are part of the standard model. You
could add extra fields specific to your model.
>
> Were you previously using the old instruction itineraries, and now moving
to the new model?
>
> -Andy
>
>> Thanks for your kind response,
>> JinGu Kang
>>
>> On 2014-02-20 오전 2:27, Andrew Trick wrote:
>>> Hi JinGu,
>>>
>>> We currently have the ResourceCycles list to indicate the number of
cpu cycles during which a resource is reserved. We could simply add a
ResourceDelay with similar grammar. The MachineScheduler could be taught to keep
track of the first and last time that a resource is reserved.
>>>
>>> Note that the MachineScheduler will work with the instruction
itineraries if you choose to implement them. That’s the only way to get a full
reservation table without customizing the scheduler. You can plugin your own
state machine or other scheduling constraint logic. You may want to do this if
you have very complicated constraints.
>>>
>>> Can you provide an example of the most complicated instruction
resources that you need to model?
>>>
>>> -Andy
>>>
>>> On Feb 19, 2014, at 4:57 AM, JinGu Kang <jingu at
codeplay.com> wrote:
>>>
>>>> Hi Andy,
>>>>
>>>> I am sorry to misunderstand 'ReadAdvance' code. In
order to support
>>>> resource per operand, I feel we need more table and function.
If
>>>> possbile, I would like to listen to your opinion whether this
feature is
>>>> useful or not. As I mentioned on previous e-mail, it will be
useful to
>>>> access the latency and the resource per operand while checking
resource
>>>> conflict per cycle.
>>>>
>>>> Thanks,
>>>> JinGu Kang
>>>>
>>>> On 18/02/14 23:09, jingu wrote:
>>>>>> Resources and latency are not tied. An instruction is
mapped to a
>>>>>> scheduling class. A scheduling class is mapped to a set
of resources
>>>>>> and a per-operand list of latencies.
>>>>> Thanks for your kind explanation.
>>>>>
>>>>> Our heuristic algorithm have needed the latency and the
resource per
>>>>> operand to check resource conflicts per cycle. In order to
support
>>>>> this with LLVM, I expected a per-operand list of resources
like
>>>>> latencies with a scheduling class.
>>>>>
>>>>> Can I ask you something to modify on tablegen? I think that
the
>>>>> 'WriteResourceID' field of
'MCWriteLatencyEntry' is for identifying
>>>>> the WriteResources of each defintion as commented on code.
As you
>>>>> know, tablegen sets the 'WriteResourceID' field of
>>>>> 'MCWriteLatencyEntry' with 'WriteID' when
the 'Write' of defition is
>>>>> referenced by a 'ReadAdvance'. If we always set
this field with
>>>>> 'WriteID', it causes problem? I can see that
'ReadAdvance' only uses
>>>>> the 'WriteResourceID' field of
'MCWriteLatencyEntry' in
>>>>> 'computeOperandLatency' function. I think the pair
of latency and
>>>>> write resource for defintion will be useful to check
conflicts of
>>>>> resources. As reference, I have attached simple patch.
>>>>>
>>>>> Thanks,
>>>>> JinGu Kang
>>>>>

-- 
Pierre-Andre Saulais
Compiler Developer
Codeplay Software Ltd
45 York Place, Edinburgh, EH1 3HP
Tel: 0131 466 0503
Fax: 0131 557 6600
Website: http://www.codeplay.com
Twitter: https://twitter.com/codeplaysoft

This email and any attachments may contain confidential and /or privileged
information and is for use by the addressee only. If you are not the intended
recipient, please notify Codeplay Software Ltd immediately and delete the
message from your computer. You may not copy or forward it,or use or disclose
its contents to any other person. Any views or other information in this message
which do not relate to our business are not authorized by Codeplay software Ltd,
nor does this message form part of any contract unless so stated.
As internet communications are capable of data corruption Codeplay Software Ltd
does not accept any responsibility for any changes made to this message after it
was sent. Please note that Codeplay Software Ltd does not accept any liability
or responsibility for viruses and it is your responsibility to scan any
attachments.
Company registered in England and Wales, number: 04567874
Registered office: 81 Linkfield Street, Redhill RH1 6BY

-------------- next part --------------
A non-text attachment was scrubbed...
Name: add_resource_delays.patch
Type: text/x-patch
Size: 5050 bytes
Desc: not available
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20140303/86a89bfe/attachment.bin>

Andrew Trick

2014-Mar-03 22:21 UTC

head link

[LLVMdev] Question about per-operand machine model

On Mar 3, 2014, at 8:53 AM, Pierre-Andre Saulais <pierre-andre at
codeplay.com> wrote:
> Hi Andrew,
> 
> We are currently using a custom model where scheduling information is
attached to each MCInstrDesc through tablegen, and we're trying to move to
one of LLVM's models.
> 
> To expand on what JinGu mentioned, our target has explicit ports that are
used to read and write values from and to the register file. The read port is
usually accessed on cycle 0 while the write port is accessed when the result is
written back to the destination register. Let's assume ADD has a latency of
1, MUL has a latency of 2 and both use port P0 to write back their result. The
two instructions below would conflict on P0:
> 
> MUL r3, r4, r5
> ADD r0, r1, r2
> NOP               ; Both r0 and r4 are written back using P0 - conflict.
> 
> On our target there is no interlock which means any conflict results in the
wrong value being written back to one of the register. That's why we want to
model these ports as resources in the new model. That's also why we map
these port resources to each operand as each operand accesses a different port.
> 
> After reading your replies, we have realized that the scheduler does not
need to know which operand corresponds to each port. It simply needs to know the
set of ports used by each instruction and after how many cycles these ports are
used/reserved to avoid any conflict. That's why I believe the new process
resource model closely fits what we need, except for the per-resource delay you
mentioned.
> 
> This is how our model currently looks like:
> 
> def :ItinRW<[1_LATENCY_WITH_P0, 0_LATENCY_WITH_P1, 0_LATENCY_WITH_P2],
[II_ADD]>;
> def :ItinRW<[2_LATENCY_WITH_P0, 0_LATENCY_WITH_P1, 0_LATENCY_WITH_P2],
[II_MUL]>;
> 
> where n_LATENCY_WITH_p is defined roughly as:
> 
> class n_LATENCY_WITH_p<int latency, ProcResourceKind port> :
SchedWriteRes<[PR_Pp]> {
>    let Latency = latency;
>    let ResourceDelays = [latency];
> }
> 
> class PR_Pp<int portIdx> : ProcResource<1>;
> 
> The latency for register write-back/port access is static and without
interlock, which I think means the port resources should have 'Buffered =
0' in the definition. Is that correct?
Yes, but it isn’t sufficient. The scheduler makes no attempt to insert nops
currently. However, at the very least, you will want to implement your own
MachineSchedStrategy. It would be natural to handle nop insertion within your
implementation.

In fact, the interpretation of most machine model properties (MircoOpBufferSize,
resource BufferSize, ResourceCycles, ResourceDelay) is handled within the
MachineSchedStrategy. In past emails I have been explaining how the
GenericScheduler interprets the model, but it is really up to your custom
strategy to implement the model.
> I have attached a patch that adds the 'ResourceDelays' field in
tablegen. Could you have a look at it? A couple possible issues are:
> - 'Delay' is signed, since 'Cycles' in MCWriteLatencyEntry
is also signed.
Sure.
> - When an instruction accesses the same resource multiple times, the uses
are aggregated in SubtargetEmitter::GenSchedClassTables. I'm not sure how
that would work if we add a 'Delay' field to MCWriteProcResEntry.
Me neither. I suggest adding an assert to make sure no one accidentally uses two
resources with non-zero delay. Otherwise, your patch looks fine to me. It’s
totally up to you to test it though. I really want to take this patch, but we
have no mechanism for testing out-of-tree target features.

-Andy
> 
> Thanks,
> Pierre
> 
> On 28/02/14 01:00, Andrew Trick wrote:
>> On Feb 19, 2014, at 1:54 PM, jingu <jingu at codeplay.com> wrote:
>> 
>>> Hi Andy,
>>> 
>>> I am trying to schedule and packetize instructions for VLIW at
post-RA
>>> stage or final codegen stage, where code transformations are not
allowed
>>> any more, because hardware can not resolve resource conflict. There
is a
>>> simple example as following:
>>> 
>>> ADD dest_reg1, src_reg1, src_reg2 (functional unit : ALU)
>>> STORE dest_reg2, mem (functional unit: LOAD_STORE)
>>> 
>>> These instructions can be genally packetized together because there
is
>>> no dependency among operands and they use different functional
unit. But
>>> we have one more restricton. The restriction is that some of
>>> instructions can not access to same register file at the same
cycle. In
>>> other words, if 'src_reg1' of ADD instruction uses register
file 'A' and
>>> 'dest_reg2' of STORE instruction uses same register file at
the same
>>> cycle, it causes resource conflict and can not be executed on same
>>> cycle. This restriction depends on instruction type. I tried to
consider
>>> each register file as a resource unit which is consumed by each
operand.
>>> While scheduling instructions per cycle, used register file is
recorded
>>> on state per cycle to check the conflict. In our heristic, it
depends on
>>> operand's latency to record this resource on specific
cycle's state. so
>>> I have tried to find a way to get latency and resource with each
>>> operand. If it is not possible to support this feature with
per-operand
>>> resource model, as you suggested, I will try to make our own state
>>> machine or other scheduling constraint logic. I am newbee with
>>> scheduler. If you have any kinds of comment or feel something
worng,
>>> please let me know. It will be really helpful.
>> It sounds like the register file is static and does not depend on
register allocation. In this case, what you tried makes sense but is really not
supported. The machine model tables are designed to be efficient for the common
case, and per-operand resources don’t really make sense most of the time.
>> 
>> It sounds like you want to model the pipeline stage at which a resource
is used. To do that with the per-operand machine model (misnomer), I think we
need a ResourceDelay vector in addition to ResourceCycles, which we could easily
add.
>> 
>> However, overall, I think you’re target is interesting enough that you
may be better off augmenting the standard machine model with your own model.
Your scheduler plugin could keep your own tables or state machine to model the
constraints.
>> 
>> If you want to be clever, you could write tablegen code to build your
model up from the SchedRead/Write definitions that are part of the standard
model. You could add extra fields specific to your model.
>> 
>> Were you previously using the old instruction itineraries, and now
moving to the new model?
>> 
>> -Andy
>> 
>>> Thanks for your kind response,
>>> JinGu Kang
>>> 
>>> On 2014-02-20 오전 2:27, Andrew Trick wrote:
>>>> Hi JinGu,
>>>> 
>>>> We currently have the ResourceCycles list to indicate the
number of cpu cycles during which a resource is reserved. We could simply add a
ResourceDelay with similar grammar. The MachineScheduler could be taught to keep
track of the first and last time that a resource is reserved.
>>>> 
>>>> Note that the MachineScheduler will work with the instruction
itineraries if you choose to implement them. That’s the only way to get a full
reservation table without customizing the scheduler. You can plugin your own
state machine or other scheduling constraint logic. You may want to do this if
you have very complicated constraints.
>>>> 
>>>> Can you provide an example of the most complicated instruction
resources that you need to model?
>>>> 
>>>> -Andy
>>>> 
>>>> On Feb 19, 2014, at 4:57 AM, JinGu Kang <jingu at
codeplay.com> wrote:
>>>> 
>>>>> Hi Andy,
>>>>> 
>>>>> I am sorry to misunderstand 'ReadAdvance' code. In
order to support
>>>>> resource per operand, I feel we need more table and
function. If
>>>>> possbile, I would like to listen to your opinion whether
this feature is
>>>>> useful or not. As I mentioned on previous e-mail, it will
be useful to
>>>>> access the latency and the resource per operand while
checking resource
>>>>> conflict per cycle.
>>>>> 
>>>>> Thanks,
>>>>> JinGu Kang
>>>>> 
>>>>> On 18/02/14 23:09, jingu wrote:
>>>>>>> Resources and latency are not tied. An instruction
is mapped to a
>>>>>>> scheduling class. A scheduling class is mapped to a
set of resources
>>>>>>> and a per-operand list of latencies.
>>>>>> Thanks for your kind explanation.
>>>>>> 
>>>>>> Our heuristic algorithm have needed the latency and the
resource per
>>>>>> operand to check resource conflicts per cycle. In order
to support
>>>>>> this with LLVM, I expected a per-operand list of
resources like
>>>>>> latencies with a scheduling class.
>>>>>> 
>>>>>> Can I ask you something to modify on tablegen? I think
that the
>>>>>> 'WriteResourceID' field of
'MCWriteLatencyEntry' is for identifying
>>>>>> the WriteResources of each defintion as commented on
code. As you
>>>>>> know, tablegen sets the 'WriteResourceID' field
of
>>>>>> 'MCWriteLatencyEntry' with 'WriteID'
when the 'Write' of defition is
>>>>>> referenced by a 'ReadAdvance'. If we always set
this field with
>>>>>> 'WriteID', it causes problem? I can see that
'ReadAdvance' only uses
>>>>>> the 'WriteResourceID' field of
'MCWriteLatencyEntry' in
>>>>>> 'computeOperandLatency' function. I think the
pair of latency and
>>>>>> write resource for defintion will be useful to check
conflicts of
>>>>>> resources. As reference, I have attached simple patch.
>>>>>> 
>>>>>> Thanks,
>>>>>> JinGu Kang
>>>>>> 
> 
> 
> -- 
> Pierre-Andre Saulais
> Compiler Developer
> Codeplay Software Ltd
> 45 York Place, Edinburgh, EH1 3HP
> Tel: 0131 466 0503
> Fax: 0131 557 6600
> Website: http://www.codeplay.com
> Twitter: https://twitter.com/codeplaysoft
> 
> This email and any attachments may contain confidential and /or privileged
information and is for use by the addressee only. If you are not the intended
recipient, please notify Codeplay Software Ltd immediately and delete the
message from your computer. You may not copy or forward it,or use or disclose
its contents to any other person. Any views or other information in this message
which do not relate to our business are not authorized by Codeplay software Ltd,
nor does this message form part of any contract unless so stated.
> As internet communications are capable of data corruption Codeplay Software
Ltd does not accept any responsibility for any changes made to this message
after it was sent. Please note that Codeplay Software Ltd does not accept any
liability or responsibility for viruses and it is your responsibility to scan
any attachments.
> Company registered in England and Wales, number: 04567874
> Registered office: 81 Linkfield Street, Redhill RH1 6BY
> 
> <add_resource_delays.patch>
-------------- next part --------------
An HTML attachment was scrubbed...
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20140303/5a430496/attachment.html>

Maybe Matching Threads

Search for more apparently analagous threads

llvm dev - Mar 2014 - [LLVMdev] Question about per-operand machine model

[LLVMdev] Question about per-operand machine model

[LLVMdev] Question about per-operand machine model

[LLVMdev] Question about per-operand machine model

Maybe Matching Threads