thr3ads.net - llvm dev - [LLVMdev] Question about per-operand machine model [Mar 2014]

Andrew Trick

2014-Mar-04 18:08 UTC

[LLVMdev] Question about per-operand machine model

On Mar 4, 2014, at 10:05 AM, Pete Cooper <peter_cooper at apple.com> wrote:
> 
> On Mar 3, 2014, at 2:21 PM, Andrew Trick <atrick at apple.com> wrote:
> 
>> 
>> On Mar 3, 2014, at 8:53 AM, Pierre-Andre Saulais <pierre-andre at codeplay.com> wrote:
>> 
>>> Hi Andrew,
>>> 
>>> We are currently using a custom model where scheduling information is attached to each MCInstrDesc through tablegen, and we're trying to move to one of LLVM's models.
>>> 
>>> To expand on what JinGu mentioned, our target has explicit ports that are used to read and write values from and to the register file. The read port is usually accessed on cycle 0 while the write port is accessed when the result is written back to the destination register. Let's assume ADD has a latency of 1, MUL has a latency of 2 and both use port P0 to write back their result. The two instructions below would conflict on P0:
>>> 
>>> MUL r3, r4, r5
>>> ADD r0, r1, r2
>>> NOP               ; Both r0 and r3 are written back using P0 - conflict.
>>> 
>>> On our target there is no interlock, which means any conflict results in the wrong value being written back to one of the registers. That's why we want to model these ports as resources in the new model. That's also why we map these port resources to each operand, as each operand accesses a different port.
>>> 
>>> After reading your replies, we have realized that the scheduler does not need to know which operand corresponds to each port. It simply needs to know the set of ports used by each instruction and after how many cycles these ports are used/reserved, to avoid any conflict. That's why I believe the new processor resource model closely fits what we need, except for the per-resource delay you mentioned.
>>> 
>>> This is what our model currently looks like:
>>> 
>>> def :ItinRW<[1_LATENCY_WITH_P0, 0_LATENCY_WITH_P1, 0_LATENCY_WITH_P2], [II_ADD]>;
>>> def :ItinRW<[2_LATENCY_WITH_P0, 0_LATENCY_WITH_P1, 0_LATENCY_WITH_P2], [II_MUL]>;
>>> 
>>> where n_LATENCY_WITH_p is defined roughly as:
>>> 
>>> class n_LATENCY_WITH_p<int latency, ProcResourceKind port> : SchedWriteRes<[PR_Pp]> {
>>>    let Latency = latency;
>>>    let ResourceDelays = [latency];
>>> }
>>> 
>>> class PR_Pp<int portIdx> : ProcResource<1>;
>>> 
>>> The latency for register write-back/port access is static and without interlock, which I think means the port resources should have 'Buffered = 0' in the definition. Is that correct?
>> 
>> Yes, but it isn’t sufficient. The scheduler makes no attempt to insert nops currently. However, at the very least, you will want to implement your own MachineSchedStrategy. It would be natural to handle nop insertion within your implementation.
> Nop insertion during scheduling sounds good to me, but nop insertion after regalloc has the advantage of being able to insert nops for spill/reload. Unless you don’t have spills?

To elaborate a bit more, MachineScheduler can run both preRA and postRA. So, if
you want to do nop insertion within MachineScheduler (as opposed to a separate
pass) you could enable it only during postRA scheduling.

-Andy
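
The write-back conflict in the quoted MUL/ADD example, and the nop padding discussed above, can be sketched outside of LLVM. This is a minimal Python model, not LLVM code. It assumes one instruction issues per cycle, that each instruction reserves its write port for exactly one cycle at issue + latency (the role the proposed ResourceDelays field would play), and that a conflict is resolved by emitting NOPs before the later instruction. The names `Instr` and `pad_with_nops` are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Instr:
    name: str      # mnemonic, e.g. "MUL"
    latency: int   # cycles from issue to register write-back
    port: str      # write port reserved at issue + latency


def pad_with_nops(program):
    """Issue one instruction per cycle, inserting NOPs so that no two
    instructions reserve the same write port in the same cycle."""
    reserved = set()   # (port, cycle) pairs already claimed
    schedule = []      # issued mnemonics, one per cycle
    for instr in program:
        # Delay issue until this instruction's write-back slot is free.
        while (instr.port, len(schedule) + instr.latency) in reserved:
            schedule.append("NOP")
        reserved.add((instr.port, len(schedule) + instr.latency))
        schedule.append(instr.name)
    return schedule


# Latencies assumed in the example: MUL = 2, ADD = 1, both write via P0.
mul = Instr("MUL", latency=2, port="P0")
add = Instr("ADD", latency=1, port="P0")

print(pad_with_nops([mul, add]))  # ADD is pushed back by one NOP
print(pad_with_nops([add, mul]))  # this order needs no NOP
```

In this toy model the second ordering avoids the NOP entirely, which is the kind of choice a custom scheduling strategy could make before falling back to padding.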
> Pete
>> 
>> In fact, the interpretation of most machine model properties (MicroOpBufferSize, resource BufferSize, ResourceCycles, ResourceDelay) is handled within the MachineSchedStrategy. In past emails I have been explaining how the GenericScheduler interprets the model, but it is really up to your custom strategy to implement the model.
>> 
>>> I have attached a patch that adds the 'ResourceDelays' field in tablegen. Could you have a look at it? A couple of possible issues are:
>>> - 'Delay' is signed, since 'Cycles' in MCWriteLatencyEntry is also signed.
>> 
>> Sure.
>> 
>>> - When an instruction accesses the same resource multiple times, the uses are aggregated in SubtargetEmitter::GenSchedClassTables. I'm not sure how that would work if we add a 'Delay' field to MCWriteProcResEntry.
>> 
>> Me neither. I suggest adding an assert to make sure no one accidentally uses two resources with non-zero delay. Otherwise, your patch looks fine to me. It’s totally up to you to test it though. I really want to take this patch, but we have no mechanism for testing out-of-tree target features.
>> 
>> -Andy
>> 
>>> 
>>> Thanks,
>>> Pierre
>>> 
>>> On 28/02/14 01:00, Andrew Trick wrote:
>>>> On Feb 19, 2014, at 1:54 PM, jingu <jingu at codeplay.com> wrote:
>>>> 
>>>>> Hi Andy,
>>>>> 
>>>>> I am trying to schedule and packetize instructions for VLIW at the post-RA stage or final codegen stage, where code transformations are not allowed any more, because the hardware cannot resolve resource conflicts. Here is a simple example:
>>>>> 
>>>>> ADD dest_reg1, src_reg1, src_reg2 (functional unit: ALU)
>>>>> STORE dest_reg2, mem (functional unit: LOAD_STORE)
>>>>> 
>>>>> These instructions can generally be packetized together because there is no dependency among their operands and they use different functional units. But we have one more restriction. The restriction is that some instructions cannot access the same register file in the same cycle. In other words, if 'src_reg1' of the ADD instruction uses register file 'A' and 'dest_reg2' of the STORE instruction uses the same register file in the same cycle, it causes a resource conflict and they cannot be executed in the same cycle. This restriction depends on the instruction type. I tried to consider each register file as a resource unit which is consumed by each operand. While scheduling instructions per cycle, the used register file is recorded in a per-cycle state to check for conflicts. In our heuristic, the operand's latency determines which cycle's state records this resource, so I have tried to find a way to get the latency and resource for each operand. If it is not possible to support this feature with the per-operand resource model, as you suggested, I will try to make our own state machine or other scheduling constraint logic. I am a newbie with the scheduler. If you have any comments or feel something is wrong, please let me know. It will be really helpful.
>>>> It sounds like the register file is static and does not depend on register allocation. In this case, what you tried makes sense but is really not supported. The machine model tables are designed to be efficient for the common case, and per-operand resources don’t really make sense most of the time.
>>>> 
>>>> It sounds like you want to model the pipeline stage at which a resource is used. To do that with the per-operand machine model (misnomer), I think we need a ResourceDelay vector in addition to ResourceCycles, which we could easily add.
>>>> 
>>>> However, overall, I think your target is interesting enough that you may be better off augmenting the standard machine model with your own model. Your scheduler plugin could keep your own tables or state machine to model the constraints.
>>>> 
>>>> If you want to be clever, you could write tablegen code to build your model up from the SchedRead/Write definitions that are part of the standard model. You could add extra fields specific to your model.
>>>> 
>>>> Were you previously using the old instruction itineraries, and now moving to the new model?
>>>> 
>>>> -Andy
>>>> 
>>>>> Thanks for your kind response,
>>>>> JinGu Kang
>>>>> 
>>>>> On 2014-02-20 오전 2:27, Andrew Trick wrote:
>>>>>> Hi JinGu,
>>>>>> 
>>>>>> We currently have the ResourceCycles list to indicate the number of cpu cycles during which a resource is reserved. We could simply add a ResourceDelay with similar grammar. The MachineScheduler could be taught to keep track of the first and last time that a resource is reserved.
>>>>>> 
>>>>>> Note that the MachineScheduler will work with the instruction itineraries if you choose to implement them. That’s the only way to get a full reservation table without customizing the scheduler. You can plug in your own state machine or other scheduling constraint logic. You may want to do this if you have very complicated constraints.
>>>>>> 
>>>>>> Can you provide an example of the most complicated instruction resources that you need to model?
>>>>>> 
>>>>>> -Andy
>>>>>> 
>>>>>> On Feb 19, 2014, at 4:57 AM, JinGu Kang <jingu at codeplay.com> wrote:
>>>>>> 
>>>>>>> Hi Andy,
>>>>>>> 
>>>>>>> I am sorry that I misunderstood the 'ReadAdvance' code. In order to support resources per operand, I feel we need more tables and functions. If possible, I would like to hear your opinion on whether this feature is useful or not. As I mentioned in my previous e-mail, it would be useful to access the latency and the resource per operand while checking for resource conflicts per cycle.
>>>>>>> 
>>>>>>> Thanks,
>>>>>>> JinGu Kang
>>>>>>> 
>>>>>>> On 18/02/14 23:09, jingu wrote:
>>>>>>>>> Resources and latency are not tied. An instruction is mapped to a scheduling class. A scheduling class is mapped to a set of resources and a per-operand list of latencies.
>>>>>>>> Thanks for your kind explanation.
>>>>>>>> 
>>>>>>>> Our heuristic algorithm needs the latency and the resource per operand to check for resource conflicts per cycle. In order to support this with LLVM, I expected a scheduling class to have a per-operand list of resources, like the latencies.
>>>>>>>> 
>>>>>>>> Can I ask about a possible modification to tablegen? I think that the 'WriteResourceID' field of 'MCWriteLatencyEntry' is for identifying the WriteResources of each definition, as the code comment says. As you know, tablegen sets the 'WriteResourceID' field of 'MCWriteLatencyEntry' to the 'WriteID' when the 'Write' of the definition is referenced by a 'ReadAdvance'. Would it cause a problem if we always set this field to the 'WriteID'? I can see that only 'ReadAdvance' uses the 'WriteResourceID' field of 'MCWriteLatencyEntry', in the 'computeOperandLatency' function. I think the pair of latency and write resource for each definition would be useful for checking resource conflicts. For reference, I have attached a simple patch.
>>>>>>>> 
>>>>>>>> Thanks,
>>>>>>>> JinGu Kang
>>>>>>>> 
>>> 
>>> 
>>> -- 
>>> Pierre-Andre Saulais
>>> Compiler Developer
>>> Codeplay Software Ltd
>>> 45 York Place, Edinburgh, EH1 3HP
>>> Tel: 0131 466 0503
>>> Fax: 0131 557 6600
>>> Website: http://www.codeplay.com
>>> Twitter: https://twitter.com/codeplaysoft
>>> 
>>> This email and any attachments may contain confidential and /or
privileged information and is for use by the addressee only. If you are not the
intended recipient, please notify Codeplay Software Ltd immediately and delete
the message from your computer. You may not copy or forward it,or use or
disclose its contents to any other person. Any views or other information in
this message which do not relate to our business are not authorized by Codeplay
software Ltd, nor does this message form part of any contract unless so stated.
>>> As internet communications are capable of data corruption Codeplay
Software Ltd does not accept any responsibility for any changes made to this
message after it was sent. Please note that Codeplay Software Ltd does not
accept any liability or responsibility for viruses and it is your responsibility
to scan any attachments.
>>> Company registered in England and Wales, number: 04567874
>>> Registered office: 81 Linkfield Street, Redhill RH1 6BY
>>> 
>>> <add_resource_delays.patch>
>> 
>> _______________________________________________
>> LLVM Developers mailing list
>> LLVMdev at cs.uiuc.edu         http://llvm.cs.uiuc.edu
>> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev

Pierre-André Saulais

2014-Mar-04 20:25 UTC

[LLVMdev] Question about per-operand machine model

On 04/03/14 18:08, Andrew Trick wrote:
> On Mar 4, 2014, at 10:05 AM, Pete Cooper <peter_cooper at apple.com> wrote:
>
>>
>> On Mar 3, 2014, at 2:21 PM, Andrew Trick <atrick at apple.com> wrote:
>>
>>>
>>> On Mar 3, 2014, at 8:53 AM, Pierre-Andre Saulais <pierre-andre at codeplay.com> wrote:
>>>
>>>> The latency for register write-back/port access is static and without interlock, which I think means the port resources should have 'Buffered = 0' in the definition. Is that correct?
>>>
>>> Yes, but it isn’t sufficient. The scheduler makes no attempt to insert nops currently. However, at the very least, you will want to implement your own MachineSchedStrategy. It would be natural to handle nop insertion within your implementation.

Thanks, I'll have a look at MachineSchedStrategy and see how we can
implement it for our target.

>> Nop insertion during scheduling sounds good to me, but nop insertion after regalloc has the advantage of being able to insert nops for spill/reload. Unless you don’t have spills?
>
> To elaborate a bit more, MachineScheduler can run both preRA and postRA. So, if you want to do nop insertion within MachineScheduler (as opposed to a separate pass) you could enable it only during postRA scheduling.
>
> -Andy

We are currently doing scheduling with a custom scheduler/packetization
pass at the very end of the machine compilation process, before object
generation. We aren't using the MachineScheduler or any other scheduling
before that pass, neither preRA nor postRA. I think we could benefit from
at least preRA scheduling, so that the register live ranges seen by the
RA better match the final ranges (after our scheduling pass).

We do have spills, but I'm not sure if there is a benefit to inserting
nops before our final pass though. Would that improve register
allocation, if it was possible to do so preRA?

>
>> Pete
>>>
>>> In fact, the interpretation of most machine model properties (MicroOpBufferSize, resource BufferSize, ResourceCycles, ResourceDelay) is handled within the MachineSchedStrategy. In past emails I have been explaining how the GenericScheduler interprets the model, but it is really up to your custom strategy to implement the model.
>>>
>>>> I have attached a patch that adds the 'ResourceDelays' field in tablegen. Could you have a look at it? A couple of possible issues are:
>>>> - 'Delay' is signed, since 'Cycles' in MCWriteLatencyEntry is also signed.
>>>
>>> Sure.
>>>
>>>> - When an instruction accesses the same resource multiple times, the uses are aggregated in SubtargetEmitter::GenSchedClassTables. I'm not sure how that would work if we add a 'Delay' field to MCWriteProcResEntry.
>>>
>>> Me neither. I suggest adding an assert to make sure no one accidentally uses two resources with non-zero delay. Otherwise, your patch looks fine to me. It’s totally up to you to test it though. I really want to take this patch, but we have no mechanism for testing out-of-tree target features.

Adding an assert when someone uses two resources with non-zero delay, or
maybe two different delays, sounds good to me. I'm glad to hear that 
you'd want to take patches even for out-of-tree features, it's much 
appreciated.
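
The aggregation concern and the suggested assert can be illustrated with a small Python sketch. It is only an illustration and does not reproduce the logic of SubtargetEmitter::GenSchedClassTables: the tuple layout (resource, cycles, delay) and the function name `aggregate_uses` are assumptions. It follows the "two different delays" variant mentioned above. Repeated uses of a resource are summed, and the merge is rejected when the uses do not share one delay, because a single merged entry could not represent both.

```python
def aggregate_uses(uses):
    """Merge repeated uses of a resource into one (cycles, delay) entry.

    `uses` is a list of (resource, cycles, delay) tuples. Cycles are
    summed per resource; differing delays cannot be represented by one
    merged entry, so they are rejected with an assertion.
    """
    merged = {}
    for resource, cycles, delay in uses:
        if resource not in merged:
            merged[resource] = (cycles, delay)
            continue
        prev_cycles, prev_delay = merged[resource]
        assert prev_delay == delay, (
            f"{resource} is used with delays {prev_delay} and {delay}"
        )
        merged[resource] = (prev_cycles + cycles, delay)
    return merged


# Two uses of P0 with the same delay can be merged.
print(aggregate_uses([("P0", 1, 2), ("P1", 1, 0), ("P0", 1, 2)]))

# Two uses of P0 with different delays trip the assertion.
try:
    aggregate_uses([("P0", 1, 0), ("P0", 1, 2)])
except AssertionError as err:
    print("rejected:", err)
```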

I'm not very familiar with the itinerary model, but aren't these two 
ways of expressing schedules equivalent?

def :ItinRW<[1_LATENCY_WITH_P0, 0_LATENCY_WITH_P1, 0_LATENCY_WITH_P2], [II_ADD]>;

InstrItinData<II_ADD, [InstrStage<1, [P1], 0>, InstrStage<1, [P2]>, InstrStage<1, [P0]>], [1, 0, 0]>

If that's the case, then does the new machine model express itineraries 
with more than one stage, without adding this 'ResourceDelays' field?

Thanks,
Pierre
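
To make the timing implied by the itinerary above concrete, here is a short Python sketch that expands a list of stages into the cycles at which each unit is reserved. It does not settle whether the two notations are equivalent inside LLVM. It only follows the InstrStage convention as described in TargetItinerary.td, where an omitted TimeInc (modeled here as -1) means the next stage starts when the current one ends, and an explicit TimeInc gives the offset from the start of this stage to the start of the next. The function name `reservation_table` and the tuple encoding are hypothetical.

```python
def reservation_table(stages):
    """Return {unit: [cycles reserved]} for a list of itinerary stages.

    Each stage is (cycles, units, time_inc). A time_inc of -1 means the
    next stage starts when this one ends; an explicit value gives the
    offset from the start of this stage to the start of the next one.
    Every stage here names exactly one unit, so no unit choice is modeled.
    """
    table = {}
    start = 0
    for cycles, units, time_inc in stages:
        (unit,) = units  # sketch assumes a single unit per stage
        table.setdefault(unit, []).extend(range(start, start + cycles))
        start += cycles if time_inc < 0 else time_inc
    return table


# Stages of the II_ADD itinerary quoted above.
add_stages = [(1, ["P1"], 0), (1, ["P2"], -1), (1, ["P0"], -1)]
print(reservation_table(add_stages))
```

Under these assumptions P1 and P2 are reserved at cycle 0 and P0 at cycle 1, which matches the delays named by the three writes in the ItinRW line.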
>>>
>>> -Andy

Andrew Trick

2014-Mar-11 03:47 UTC

[LLVMdev] Question about per-operand machine model

On Mar 4, 2014, at 12:25 PM, Pierre-André Saulais <pierre-andre at codeplay.com> wrote:
> On 04/03/14 18:08, Andrew Trick wrote:
>> 
>> On Mar 4, 2014, at 10:05 AM, Pete Cooper <peter_cooper at apple.com> wrote:
>> 
>>> 
>>> On Mar 3, 2014, at 2:21 PM, Andrew Trick <atrick at apple.com> wrote:
>>> 
>>>> 
>>>> On Mar 3, 2014, at 8:53 AM, Pierre-Andre Saulais <pierre-andre at codeplay.com> wrote:
>>>> 
>>>>> The latency for register write-back/port access is static and without interlock, which I think means the port resources should have 'Buffered = 0' in the definition. Is that correct?
>>>> 
>>>> Yes, but it isn’t sufficient. The scheduler makes no attempt to insert nops currently. However, at the very least, you will want to implement your own MachineSchedStrategy. It would be natural to handle nop insertion within your implementation.
> Thanks, I'll have a look at MachineSchedStrategy and see how we can implement it for our target.
>>> Nop insertion during scheduling sounds good to me, but nop insertion after regalloc has the advantage of being able to insert nops for spill/reload. Unless you don’t have spills?
>> 
>> To elaborate a bit more, MachineScheduler can run both preRA and postRA. So, if you want to do nop insertion within MachineScheduler (as opposed to a separate pass) you could enable it only during postRA scheduling.
>> 
>> -Andy
> We are currently doing scheduling with a custom scheduler/packetization pass at the very end of the machine compilation process, before object generation. We aren't using the MachineScheduler or any other scheduling before that pass, neither preRA nor postRA. I think we could benefit from at least preRA scheduling, so that the register live ranges seen by the RA better match the final ranges (after our scheduling pass).

I see, then you really can do whatever you want with the machine model.
You’re just limited by the tables produced by the current tablegen backend and
whatever features you add to it. You just need to understand the format of the
tables in <YourTarget>GenSubtargetInfo.inc and try to fit your model into
that format. Hopefully adding ResourceDelays will give you enough flexibility.

> We do have spills, but I'm not sure if there is a benefit to inserting nops before our final pass though. Would that improve register allocation, if it was possible to do so preRA?

Nope. The main reason to bundle pre-RA is to expose more opportunity for code
motion (without physical register dependencies), and thus generate tighter
bundles. In general, there’s no reason to introduce nops other than to meet
your encoding constraints, so it’s just a matter of picking the most
convenient place to do that.

-Andy
>> 
>>> Pete
>>>> 
>>>> In fact, the interpretation of most machine model properties
(MircoOpBufferSize, resource BufferSize, ResourceCycles, ResourceDelay) is
handled within the MachineSchedStrategy. In past emails I have been explaining
how the GenericScheduler interprets the model, but it is really up to your
custom strategy to implement the model.
>>>> 
>>>>> I have attached a patch that adds the
'ResourceDelays' field in tablegen. Could you have a look at it? A
couple possible issues are:
>>>>> - 'Delay' is signed, since 'Cycles' in
MCWriteLatencyEntry is also signed.
>>>> 
>>>> Sure.
>>>> 
>>>>> - When an instruction accesses the same resource multiple times, the
>>>>>   uses are aggregated in SubtargetEmitter::GenSchedClassTables. I'm not
>>>>>   sure how that would work if we add a 'Delay' field to
>>>>>   MCWriteProcResEntry.
>>>> 
>>>> Me neither. I suggest adding an assert to make sure no one accidentally
>>>> uses two resources with non-zero delay. Otherwise, your patch looks fine
>>>> to me. It’s totally up to you to test it though. I really want to take
>>>> this patch, but we have no mechanism for testing out-of-tree target
>>>> features.
> Adding an assert when someone uses two resources with non-zero delay, or
> maybe two different delays, sounds good to me. I'm glad to hear that
> you'd want to take patches even for out-of-tree features; it's much
> appreciated.
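
The aggregation concern and the suggested assert can be sketched roughly as follows. This is a minimal standalone illustration, not the code in SubtargetEmitter::GenSchedClassTables: the `ResourceUse` struct, `canMerge` and `aggregateUses` are hypothetical names, and the sketch assumes the "two different delays" variant of the check, with repeated uses of one resource merged by summing their cycles.

```cpp
#include <cassert>
#include <vector>

// Hypothetical stand-in for one processor-resource use of a scheduling class.
// 'Delay' is the field proposed in this thread; this struct is not LLVM's
// MCWriteProcResEntry.
struct ResourceUse {
  unsigned ProcResourceIdx; // which resource (for example, a write port)
  unsigned Cycles;          // how many cycles the resource stays reserved
  int Delay;                // cycles after issue before it is reserved
};

// Two uses can be summed only if they name the same resource and their
// delays agree.
bool canMerge(const ResourceUse &A, const ResourceUse &B) {
  return A.ProcResourceIdx == B.ProcResourceIdx && A.Delay == B.Delay;
}

// Merge repeated uses of a resource by summing their cycles. Asserts if the
// same resource appears with two different delays.
std::vector<ResourceUse> aggregateUses(const std::vector<ResourceUse> &Uses) {
  std::vector<ResourceUse> Merged;
  for (const ResourceUse &Use : Uses) {
    bool Found = false;
    for (ResourceUse &Existing : Merged) {
      if (Existing.ProcResourceIdx != Use.ProcResourceIdx)
        continue;
      assert(canMerge(Existing, Use) &&
             "same resource used with two different delays");
      Existing.Cycles += Use.Cycles;
      Found = true;
      break;
    }
    if (!Found)
      Merged.push_back(Use);
  }
  return Merged;
}
```

With this rule, two one-cycle uses of a port at the same delay become a single two-cycle use, while uses at different delays are rejected rather than silently merged.
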
> 
> I'm not very familiar with the itinerary model, but aren't these two
> ways of expressing schedules equivalent?
> 
> def :ItinRW<[1_LATENCY_WITH_P0, 0_LATENCY_WITH_P1, 0_LATENCY_WITH_P2], [II_ADD]>;
> 
> InstrItinData<II_ADD, [InstrStage<1, [P1], 0>, InstrStage<1, [P2]>, InstrStage<1, [P0]>], [1, 0, 0]>
> 
> If that's the case, then does the new machine model express itineraries
> with more than one stage, without adding this 'ResourceDelays' field?
> 
> Thanks,
> Pierre
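
To make the role of a per-resource delay concrete, here is a small standalone sketch of a reservation table keyed by (cycle, port). It is an illustration under assumptions, not LLVM code: `PortUse` and `ReservationTable` are hypothetical names, ports P0/P1/P2 are numbered 0/1/2, and MUL is assumed to read through P1 and P2 at issue just like ADD.

```cpp
#include <cassert>
#include <set>
#include <utility>
#include <vector>

// One delayed resource use: the port is busy for 'Cycles' cycles, starting
// 'Delay' cycles after the instruction issues. Hypothetical layout.
struct PortUse {
  unsigned Port;
  unsigned Delay;
  unsigned Cycles;
};

// Reservation table holding the (cycle, port) slots that are already taken.
class ReservationTable {
  std::set<std::pair<unsigned, unsigned>> Busy;

public:
  // True if every use of an instruction issued at IssueCycle hits a free slot.
  bool canIssue(unsigned IssueCycle, const std::vector<PortUse> &Uses) const {
    for (const PortUse &U : Uses)
      for (unsigned C = 0; C < U.Cycles; ++C)
        if (Busy.count(std::make_pair(IssueCycle + U.Delay + C, U.Port)))
          return false;
    return true;
  }

  // Record the uses of an instruction issued at IssueCycle. The target has
  // no interlock, so reserving a taken slot is treated as a scheduler bug.
  void reserve(unsigned IssueCycle, const std::vector<PortUse> &Uses) {
    assert(canIssue(IssueCycle, Uses) && "port already reserved");
    for (const PortUse &U : Uses)
      for (unsigned C = 0; C < U.Cycles; ++C)
        Busy.emplace(IssueCycle + U.Delay + C, U.Port);
  }
};
```

Under these assumptions ADD would be `{{1, 0, 1}, {2, 0, 1}, {0, 1, 1}}` and MUL `{{1, 0, 1}, {2, 0, 1}, {0, 2, 1}}`. After reserving MUL at cycle 0, `canIssue(1, Add)` is false because both instructions would write back through P0 at cycle 2, which is the conflict described at the start of the thread.
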
> 
>>>> 
>>>> -Andy
>>>> 
>>>>> 
>>>>> Thanks,
>>>>> Pierre
>>>>> 
>>>>> On 28/02/14 01:00, Andrew Trick wrote:
>>>>>> On Feb 19, 2014, at 1:54 PM, jingu <jingu at codeplay.com> wrote:
>>>>>> 
>>>>>>> Hi Andy,
>>>>>>> 
>>>>>>> I am trying to schedule and packetize instructions for VLIW at the
>>>>>>> post-RA stage or final codegen stage, where code transformations are
>>>>>>> no longer allowed, because the hardware cannot resolve resource
>>>>>>> conflicts. Here is a simple example:
>>>>>>> 
>>>>>>> ADD dest_reg1, src_reg1, src_reg2 (functional unit: ALU)
>>>>>>> STORE dest_reg2, mem (functional unit: LOAD_STORE)
>>>>>>> 
>>>>>>> These instructions can generally be packetized together because there
>>>>>>> is no dependency among their operands and they use different
>>>>>>> functional units. But we have one more restriction: some instructions
>>>>>>> cannot access the same register file in the same cycle. In other
>>>>>>> words, if 'src_reg1' of the ADD instruction uses register file 'A' and
>>>>>>> 'dest_reg2' of the STORE instruction uses the same register file in
>>>>>>> the same cycle, it causes a resource conflict and the two cannot be
>>>>>>> executed in the same cycle. This restriction depends on the
>>>>>>> instruction type. I tried to treat each register file as a resource
>>>>>>> unit that is consumed by each operand. While scheduling instructions
>>>>>>> per cycle, the register file used is recorded in the per-cycle state
>>>>>>> to check for conflicts. In our heuristic, the operand's latency
>>>>>>> determines which cycle's state records this resource, so I have tried
>>>>>>> to find a way to get the latency and the resource for each operand. If
>>>>>>> it is not possible to support this feature with a per-operand resource
>>>>>>> model, I will, as you suggested, try to make our own state machine or
>>>>>>> other scheduling constraint logic. I am a newbie with the scheduler.
>>>>>>> If you have any comments or feel something is wrong, please let me
>>>>>>> know. It will be really helpful.
>>>>>> It sounds like the register file is static and does not depend on
>>>>>> register allocation. In this case, what you tried makes sense but is
>>>>>> really not supported. The machine model tables are designed to be
>>>>>> efficient for the common case, and per-operand resources don’t really
>>>>>> make sense most of the time.
>>>>>> 
>>>>>> It sounds like you want to model the pipeline stage at which a resource
>>>>>> is used. To do that with the per-operand machine model (misnomer), I
>>>>>> think we need a ResourceDelay vector in addition to ResourceCycles,
>>>>>> which we could easily add.
>>>>>> 
>>>>>> However, overall, I think your target is interesting enough that you
>>>>>> may be better off augmenting the standard machine model with your own
>>>>>> model. Your scheduler plugin could keep your own tables or state
>>>>>> machine to model the constraints.
>>>>>> 
>>>>>> If you want to be clever, you could write tablegen code to build your
>>>>>> model up from the SchedRead/Write definitions that are part of the
>>>>>> standard model. You could add extra fields specific to your model.
>>>>>> 
>>>>>> Were you previously using the old instruction itineraries, and now
>>>>>> moving to the new model?
>>>>>> 
>>>>>> -Andy
>>>>>> 
>>>>>>> Thanks for your kind response,
>>>>>>> JinGu Kang
>>>>>>> 
>>>>>>> On 2014-02-20 2:27 AM, Andrew Trick wrote:
>>>>>>>> Hi JinGu,
>>>>>>>> 
>>>>>>>> We currently have the ResourceCycles list to indicate the number of
>>>>>>>> cpu cycles during which a resource is reserved. We could simply add a
>>>>>>>> ResourceDelay with similar grammar. The MachineScheduler could be
>>>>>>>> taught to keep track of the first and last time that a resource is
>>>>>>>> reserved.
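
As a rough illustration of tracking "the first and last time that a resource is reserved", the interval for one use can be derived from the issue cycle, a delay and a cycle count. This is only a sketch: `ReservedSpan`, `reservedSpan` and `overlaps` are hypothetical names, and the `Delay` parameter stands for the proposed ResourceDelay, which is an assumption about how the field would be interpreted.

```cpp
#include <cassert>

// Inclusive range of cycles during which one resource is reserved.
struct ReservedSpan {
  unsigned First; // first cycle the resource is reserved
  unsigned Last;  // last cycle the resource is reserved
};

// Hypothetical helper: 'Cycles' plays the role of a ResourceCycles entry and
// 'Delay' the role of the proposed ResourceDelay entry.
ReservedSpan reservedSpan(unsigned IssueCycle, unsigned Delay,
                          unsigned Cycles) {
  assert(Cycles > 0 && "a resource use must last at least one cycle");
  return {IssueCycle + Delay, IssueCycle + Delay + Cycles - 1};
}

// Two reservations of the same resource conflict if their ranges intersect.
bool overlaps(const ReservedSpan &A, const ReservedSpan &B) {
  return A.First <= B.Last && B.First <= A.Last;
}
```

For instance, a one-cycle use with delay 2 issued at cycle 0 and a one-cycle use with delay 1 issued at cycle 1 both land on cycle 2, so `overlaps` reports a conflict.
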
>>>>>>>> 
>>>>>>>> Note that the MachineScheduler will work with the instruction
>>>>>>>> itineraries if you choose to implement them. That’s the only way to
>>>>>>>> get a full reservation table without customizing the scheduler. You
>>>>>>>> can plug in your own state machine or other scheduling constraint
>>>>>>>> logic. You may want to do this if you have very complicated
>>>>>>>> constraints.
>>>>>>>> 
>>>>>>>> Can you provide an example of the most complicated instruction
>>>>>>>> resources that you need to model?
>>>>>>>> 
>>>>>>>> -Andy
>>>>>>>> 
>>>>>>>> On Feb 19, 2014, at 4:57 AM, JinGu Kang <jingu at codeplay.com> wrote:
>>>>>>>> 
>>>>>>>>> Hi Andy,
>>>>>>>>> 
>>>>>>>>> I am sorry for misunderstanding the 'ReadAdvance' code. In order to
>>>>>>>>> support resources per operand, I feel we need more tables and
>>>>>>>>> functions. If possible, I would like to hear your opinion on whether
>>>>>>>>> this feature is useful or not. As I mentioned in my previous e-mail,
>>>>>>>>> it would be useful to access the latency and the resource per
>>>>>>>>> operand while checking for resource conflicts per cycle.
>>>>>>>>> 
>>>>>>>>> Thanks,
>>>>>>>>> JinGu Kang
>>>>>>>>> 
>>>>>>>>> On 18/02/14 23:09, jingu wrote:
>>>>>>>>>>> Resources and latency are not tied. An instruction is mapped to
>>>>>>>>>>> a scheduling class. A scheduling class is mapped to a set of
>>>>>>>>>>> resources and a per-operand list of latencies.
>>>>>>>>>> Thanks for your kind explanation.
>>>>>>>>>> 
>>>>>>>>>> Our heuristic algorithm needs the latency and the resource per
>>>>>>>>>> operand to check resource conflicts per cycle. In order to support
>>>>>>>>>> this with LLVM, I expected a per-operand list of resources, like
>>>>>>>>>> the latencies, with a scheduling class.
>>>>>>>>>> 
>>>>>>>>>> Can I ask you about modifying something in tablegen? I think that
>>>>>>>>>> the 'WriteResourceID' field of 'MCWriteLatencyEntry' is for
>>>>>>>>>> identifying the WriteResources of each definition, as commented in
>>>>>>>>>> the code. As you know, tablegen sets the 'WriteResourceID' field of
>>>>>>>>>> 'MCWriteLatencyEntry' to 'WriteID' when the 'Write' of the
>>>>>>>>>> definition is referenced by a 'ReadAdvance'. If we always set this
>>>>>>>>>> field to 'WriteID', would it cause a problem? I can see that
>>>>>>>>>> 'ReadAdvance' only uses the 'WriteResourceID' field of
>>>>>>>>>> 'MCWriteLatencyEntry' in the 'computeOperandLatency' function. I
>>>>>>>>>> think the pair of latency and write resource for each definition
>>>>>>>>>> would be useful for checking resource conflicts. For reference, I
>>>>>>>>>> have attached a simple patch.
>>>>>>>>>> 
>>>>>>>>>> Thanks,
>>>>>>>>>> JinGu Kang
>>>>>>>>>> 
>>>>> 
>>>>> 
>>>>> -- 
>>>>> Pierre-Andre Saulais
>>>>> Compiler Developer
>>>>> Codeplay Software Ltd
>>>>> 45 York Place, Edinburgh, EH1 3HP
>>>>> Tel: 0131 466 0503
>>>>> Fax: 0131 557 6600
>>>>> Website: http://www.codeplay.com
>>>>> Twitter: https://twitter.com/codeplaysoft
>>>>> 
>>>>> 
>>>>> <add_resource_delays.patch>
>>>> 