On Mar 4, 2014, at 10:05 AM, Pete Cooper <peter_cooper at apple.com> wrote:> > On Mar 3, 2014, at 2:21 PM, Andrew Trick <atrick at apple.com> wrote: > >> >> On Mar 3, 2014, at 8:53 AM, Pierre-Andre Saulais <pierre-andre at codeplay.com> wrote: >> >>> Hi Andrew, >>> >>> We are currently using a custom model where scheduling information is attached to each MCInstrDesc through tablegen, and we're trying to move to one of LLVM's models. >>> >>> To expand on what JinGu mentioned, our target has explicit ports that are used to read and write values from and to the register file. The read port is usually accessed on cycle 0 while the write port is accessed when the result is written back to the destination register. Let's assume ADD has a latency of 1, MUL has a latency of 2 and both use port P0 to write back their result. The two instructions below would conflict on P0: >>> >>> MUL r3, r4, r5 >>> ADD r0, r1, r2 >>> NOP ; Both r0 and r4 are written back using P0 - conflict. >>> >>> On our target there is no interlock which means any conflict results in the wrong value being written back to one of the register. That's why we want to model these ports as resources in the new model. That's also why we map these port resources to each operand as each operand accesses a different port. >>> >>> After reading your replies, we have realized that the scheduler does not need to know which operand corresponds to each port. It simply needs to know the set of ports used by each instruction and after how many cycles these ports are used/reserved to avoid any conflict. That's why I believe the new process resource model closely fits what we need, except for the per-resource delay you mentioned. >>> >>> This is how our model currently looks like: >>> >>> def :ItinRW<[1_LATENCY_WITH_P0, 0_LATENCY_WITH_P1, 0_LATENCY_WITH_P2], [II_ADD]>; >>> def :ItinRW<[2_LATENCY_WITH_P0, 0_LATENCY_WITH_P1, 0_LATENCY_WITH_P2], [II_MUL]>; >>> >>> where n_LATENCY_WITH_p is defined roughly as: >>> >>> class n_LATENCY_WITH_p<int latency, ProcResourceKind port> : SchedWriteRes<[PR_Pp]> { >>> let Latency = latency; >>> let ResourceDelays = [latency]; >>> } >>> >>> class PR_Pp<int portIdx> : ProcResource<1>; >>> >>> The latency for register write-back/port access is static and without interlock, which I think means the port resources should have 'Buffered = 0' in the definition. Is that correct? >> >> Yes, but it isn’t sufficient. The scheduler makes no attempt to insert nops currently. However, at the very least, you will want to implement your own MachineSchedStrategy. It would be natural to handle nop insertion within your implementation. > Nop insertion during scheduling sounds good to me, but nop insertion after regalloc has the advantage of being able to insert nops for spill/reload. Unless you don’t have spills?To elaborate a bit more, MachineScheduler can run both preRA and postRA. So, if you want to do nop insertion within MachineScheduler (as opposed to a separate pass) you could enable it only during postRA scheduling. -Andy> Pete >> >> In fact, the interpretation of most machine model properties (MircoOpBufferSize, resource BufferSize, ResourceCycles, ResourceDelay) is handled within the MachineSchedStrategy. In past emails I have been explaining how the GenericScheduler interprets the model, but it is really up to your custom strategy to implement the model. >> >>> I have attached a patch that adds the 'ResourceDelays' field in tablegen. Could you have a look at it? A couple possible issues are: >>> - 'Delay' is signed, since 'Cycles' in MCWriteLatencyEntry is also signed. >> >> Sure. >> >>> - When an instruction accesses the same resource multiple times, the uses are aggregated in SubtargetEmitter::GenSchedClassTables. I'm not sure how that would work if we add a 'Delay' field to MCWriteProcResEntry. >> >> Me neither. I suggest adding an assert to make sure no one accidentally uses two resources with non-zero delay. Otherwise, your patch looks fine to me. It’s totally up to you to test it though. I really want to take this patch, but we have no mechanism for testing out-of-tree target features. >> >> -Andy >> >>> >>> Thanks, >>> Pierre >>> >>> On 28/02/14 01:00, Andrew Trick wrote: >>>> On Feb 19, 2014, at 1:54 PM, jingu <jingu at codeplay.com> wrote: >>>> >>>>> Hi Andy, >>>>> >>>>> I am trying to schedule and packetize instructions for VLIW at post-RA >>>>> stage or final codegen stage, where code transformations are not allowed >>>>> any more, because hardware can not resolve resource conflict. There is a >>>>> simple example as following: >>>>> >>>>> ADD dest_reg1, src_reg1, src_reg2 (functional unit : ALU) >>>>> STORE dest_reg2, mem (functional unit: LOAD_STORE) >>>>> >>>>> These instructions can be genally packetized together because there is >>>>> no dependency among operands and they use different functional unit. But >>>>> we have one more restricton. The restriction is that some of >>>>> instructions can not access to same register file at the same cycle. In >>>>> other words, if 'src_reg1' of ADD instruction uses register file 'A' and >>>>> 'dest_reg2' of STORE instruction uses same register file at the same >>>>> cycle, it causes resource conflict and can not be executed on same >>>>> cycle. This restriction depends on instruction type. I tried to consider >>>>> each register file as a resource unit which is consumed by each operand. >>>>> While scheduling instructions per cycle, used register file is recorded >>>>> on state per cycle to check the conflict. In our heristic, it depends on >>>>> operand's latency to record this resource on specific cycle's state. so >>>>> I have tried to find a way to get latency and resource with each >>>>> operand. If it is not possible to support this feature with per-operand >>>>> resource model, as you suggested, I will try to make our own state >>>>> machine or other scheduling constraint logic. I am newbee with >>>>> scheduler. If you have any kinds of comment or feel something worng, >>>>> please let me know. It will be really helpful. >>>> It sounds like the register file is static and does not depend on register allocation. In this case, what you tried makes sense but is really not supported. The machine model tables are designed to be efficient for the common case, and per-operand resources don’t really make sense most of the time. >>>> >>>> It sounds like you want to model the pipeline stage at which a resource is used. To do that with the per-operand machine model (misnomer), I think we need a ResourceDelay vector in addition to ResourceCycles, which we could easily add. >>>> >>>> However, overall, I think you’re target is interesting enough that you may be better off augmenting the standard machine model with your own model. Your scheduler plugin could keep your own tables or state machine to model the constraints. >>>> >>>> If you want to be clever, you could write tablegen code to build your model up from the SchedRead/Write definitions that are part of the standard model. You could add extra fields specific to your model. >>>> >>>> Were you previously using the old instruction itineraries, and now moving to the new model? >>>> >>>> -Andy >>>> >>>>> Thanks for your kind response, >>>>> JinGu Kang >>>>> >>>>> On 2014-02-20 오전 2:27, Andrew Trick wrote: >>>>>> Hi JinGu, >>>>>> >>>>>> We currently have the ResourceCycles list to indicate the number of cpu cycles during which a resource is reserved. We could simply add a ResourceDelay with similar grammar. The MachineScheduler could be taught to keep track of the first and last time that a resource is reserved. >>>>>> >>>>>> Note that the MachineScheduler will work with the instruction itineraries if you choose to implement them. That’s the only way to get a full reservation table without customizing the scheduler. You can plugin your own state machine or other scheduling constraint logic. You may want to do this if you have very complicated constraints. >>>>>> >>>>>> Can you provide an example of the most complicated instruction resources that you need to model? >>>>>> >>>>>> -Andy >>>>>> >>>>>> On Feb 19, 2014, at 4:57 AM, JinGu Kang <jingu at codeplay.com> wrote: >>>>>> >>>>>>> Hi Andy, >>>>>>> >>>>>>> I am sorry to misunderstand 'ReadAdvance' code. In order to support >>>>>>> resource per operand, I feel we need more table and function. If >>>>>>> possbile, I would like to listen to your opinion whether this feature is >>>>>>> useful or not. As I mentioned on previous e-mail, it will be useful to >>>>>>> access the latency and the resource per operand while checking resource >>>>>>> conflict per cycle. >>>>>>> >>>>>>> Thanks, >>>>>>> JinGu Kang >>>>>>> >>>>>>> On 18/02/14 23:09, jingu wrote: >>>>>>>>> Resources and latency are not tied. An instruction is mapped to a >>>>>>>>> scheduling class. A scheduling class is mapped to a set of resources >>>>>>>>> and a per-operand list of latencies. >>>>>>>> Thanks for your kind explanation. >>>>>>>> >>>>>>>> Our heuristic algorithm have needed the latency and the resource per >>>>>>>> operand to check resource conflicts per cycle. In order to support >>>>>>>> this with LLVM, I expected a per-operand list of resources like >>>>>>>> latencies with a scheduling class. >>>>>>>> >>>>>>>> Can I ask you something to modify on tablegen? I think that the >>>>>>>> 'WriteResourceID' field of 'MCWriteLatencyEntry' is for identifying >>>>>>>> the WriteResources of each defintion as commented on code. As you >>>>>>>> know, tablegen sets the 'WriteResourceID' field of >>>>>>>> 'MCWriteLatencyEntry' with 'WriteID' when the 'Write' of defition is >>>>>>>> referenced by a 'ReadAdvance'. If we always set this field with >>>>>>>> 'WriteID', it causes problem? I can see that 'ReadAdvance' only uses >>>>>>>> the 'WriteResourceID' field of 'MCWriteLatencyEntry' in >>>>>>>> 'computeOperandLatency' function. I think the pair of latency and >>>>>>>> write resource for defintion will be useful to check conflicts of >>>>>>>> resources. As reference, I have attached simple patch. >>>>>>>> >>>>>>>> Thanks, >>>>>>>> JinGu Kang >>>>>>>> >>> >>> >>> -- >>> Pierre-Andre Saulais >>> Compiler Developer >>> Codeplay Software Ltd >>> 45 York Place, Edinburgh, EH1 3HP >>> Tel: 0131 466 0503 >>> Fax: 0131 557 6600 >>> Website: http://www.codeplay.com >>> Twitter: https://twitter.com/codeplaysoft >>> >>> This email and any attachments may contain confidential and /or privileged information and is for use by the addressee only. If you are not the intended recipient, please notify Codeplay Software Ltd immediately and delete the message from your computer. You may not copy or forward it,or use or disclose its contents to any other person. Any views or other information in this message which do not relate to our business are not authorized by Codeplay software Ltd, nor does this message form part of any contract unless so stated. >>> As internet communications are capable of data corruption Codeplay Software Ltd does not accept any responsibility for any changes made to this message after it was sent. Please note that Codeplay Software Ltd does not accept any liability or responsibility for viruses and it is your responsibility to scan any attachments. >>> Company registered in England and Wales, number: 04567874 >>> Registered office: 81 Linkfield Street, Redhill RH1 6BY >>> >>> <add_resource_delays.patch> >> >> _______________________________________________ >> LLVM Developers mailing list >> LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu >> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev-------------- next part -------------- An HTML attachment was scrubbed... URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20140304/3e91d049/attachment.html>
Pierre-André Saulais
2014-Mar-04 20:25 UTC
[LLVMdev] Question about per-operand machine model
On 04/03/14 18:08, Andrew Trick wrote:> > On Mar 4, 2014, at 10:05 AM, Pete Cooper <peter_cooper at apple.com > <mailto:peter_cooper at apple.com>> wrote: > >> >> On Mar 3, 2014, at 2:21 PM, Andrew Trick <atrick at apple.com >> <mailto:atrick at apple.com>> wrote: >> >>> >>> On Mar 3, 2014, at 8:53 AM, Pierre-Andre Saulais >>> <pierre-andre at codeplay.com <mailto:pierre-andre at codeplay.com>> wrote: >>> >>>> Hi Andrew, >>>> >>>> We are currently using a custom model where scheduling information >>>> is attached to each MCInstrDesc through tablegen, and we're trying >>>> to move to one of LLVM's models. >>>> >>>> To expand on what JinGu mentioned, our target has explicit ports >>>> that are used to read and write values from and to the register >>>> file. The read port is usually accessed on cycle 0 while the write >>>> port is accessed when the result is written back to the destination >>>> register. Let's assume ADD has a latency of 1, MUL has a latency of >>>> 2 and both use port P0 to write back their result. The two >>>> instructions below would conflict on P0: >>>> >>>> MUL r3, r4, r5 >>>> ADD r0, r1, r2 >>>> NOP ; Both r0 and r4 are written back using P0 - >>>> conflict. >>>> >>>> On our target there is no interlock which means any conflict >>>> results in the wrong value being written back to one of the >>>> register. That's why we want to model these ports as resources in >>>> the new model. That's also why we map these port resources to each >>>> operand as each operand accesses a different port. >>>> >>>> After reading your replies, we have realized that the scheduler >>>> does not need to know which operand corresponds to each port. It >>>> simply needs to know the set of ports used by each instruction and >>>> after how many cycles these ports are used/reserved to avoid any >>>> conflict. That's why I believe the new process resource model >>>> closely fits what we need, except for the per-resource delay you >>>> mentioned. >>>> >>>> This is how our model currently looks like: >>>> >>>> def :ItinRW<[1_LATENCY_WITH_P0, 0_LATENCY_WITH_P1, >>>> 0_LATENCY_WITH_P2], [II_ADD]>; >>>> def :ItinRW<[2_LATENCY_WITH_P0, 0_LATENCY_WITH_P1, >>>> 0_LATENCY_WITH_P2], [II_MUL]>; >>>> >>>> where n_LATENCY_WITH_p is defined roughly as: >>>> >>>> class n_LATENCY_WITH_p<int latency, ProcResourceKind port> : >>>> SchedWriteRes<[PR_Pp]> { >>>> let Latency = latency; >>>> let ResourceDelays = [latency]; >>>> } >>>> >>>> class PR_Pp<int portIdx> : ProcResource<1>; >>>> >>>> The latency for register write-back/port access is static and >>>> without interlock, which I think means the port resources should >>>> have 'Buffered = 0' in the definition. Is that correct? >>> >>> Yes, but it isn’t sufficient. The scheduler makes no attempt to >>> insert nops currently. However, at the very least, you will want to >>> implement your own MachineSchedStrategy. It would be natural to >>> handle nop insertion within your implementation.Thanks, I'll have a look at MachineSchedStrategy and see how we can implement it for our target.>> Nop insertion during scheduling sounds good to me, but nop insertion >> after regalloc has the advantage of being able to insert nops for >> spill/reload. Unless you don’t have spills? > > To elaborate a bit more, MachineScheduler can run both preRA and > postRA. So, if you want to do nop insertion within MachineScheduler > (as opposed to a separate pass) you could enable it only during postRA > scheduling. > > -AndyWe are currently doing scheduling with a custom scheduler/packetization pass at the very end of the machine compilation process, before object generation. We aren't using the MachineScheduler or any other scheduling before that pass, neither preRA or postRA. I think we could benefit from at least preRA scheduling, so that the register live ranges seen by the RA better match the final ranges (after our scheduling pass). We do have spills, but I'm not sure if there is a benefit to inserting nops before our final pass though. Would that improve register allocation, if it was possible to do so preRA?> >> Pete >>> >>> In fact, the interpretation of most machine model properties >>> (MircoOpBufferSize, resource BufferSize, ResourceCycles, >>> ResourceDelay) is handled within the MachineSchedStrategy. In past >>> emails I have been explaining how the GenericScheduler interprets >>> the model, but it is really up to your custom strategy to implement >>> the model. >>> >>>> I have attached a patch that adds the 'ResourceDelays' field in >>>> tablegen. Could you have a look at it? A couple possible issues are: >>>> - 'Delay' is signed, since 'Cycles' in MCWriteLatencyEntry is also >>>> signed. >>> >>> Sure. >>> >>>> - When an instruction accesses the same resource multiple times, >>>> the uses are aggregated in SubtargetEmitter::GenSchedClassTables. >>>> I'm not sure how that would work if we add a 'Delay' field to >>>> MCWriteProcResEntry. >>> >>> Me neither. I suggest adding an assert to make sure no one >>> accidentally uses two resources with non-zero delay. Otherwise, your >>> patch looks fine to me. It’s totally up to you to test it though. I >>> really want to take this patch, but we have no mechanism for testing >>> out-of-tree target features.Adding an assert when someone uses two resources with non-zero delay, or maybe two different delays, sounds good to me. I'm glad to hear that you'd want to take patches even for out-of-tree features, it's much appreciated. I'm not very familiar with the itinerary model, but aren't these two ways of expressing schedules equivalent? def :ItinRW<[1_LATENCY_WITH_P0, 0_LATENCY_WITH_P1, 0_LATENCY_WITH_P2], [II_ADD]>; InstrItinData<II_ADD, [InstrStage<1, [P1], 0>, InstrStage<1, [P2]>, InstrStage<1, [P0]>], [1, 0, 0]> If that's the case, then does the new machine model express itineraries with more than one stage, without adding this 'ResourceDelays' field? Thanks, Pierre>>> >>> -Andy >>> >>>> >>>> Thanks, >>>> Pierre >>>> >>>> On 28/02/14 01:00, Andrew Trick wrote: >>>>> On Feb 19, 2014, at 1:54 PM, jingu <jingu at codeplay.com >>>>> <mailto:jingu at codeplay.com>> wrote: >>>>> >>>>>> Hi Andy, >>>>>> >>>>>> I am trying to schedule and packetize instructions for VLIW at >>>>>> post-RA >>>>>> stage or final codegen stage, where code transformations are not >>>>>> allowed >>>>>> any more, because hardware can not resolve resource conflict. >>>>>> There is a >>>>>> simple example as following: >>>>>> >>>>>> ADD dest_reg1, src_reg1, src_reg2 (functional unit : ALU) >>>>>> STORE dest_reg2, mem (functional unit: LOAD_STORE) >>>>>> >>>>>> These instructions can be genally packetized together because >>>>>> there is >>>>>> no dependency among operands and they use different functional >>>>>> unit. But >>>>>> we have one more restricton. The restriction is that some of >>>>>> instructions can not access to same register file at the same >>>>>> cycle. In >>>>>> other words, if 'src_reg1' of ADD instruction uses register file >>>>>> 'A' and >>>>>> 'dest_reg2' of STORE instruction uses same register file at the same >>>>>> cycle, it causes resource conflict and can not be executed on same >>>>>> cycle. This restriction depends on instruction type. I tried to >>>>>> consider >>>>>> each register file as a resource unit which is consumed by each >>>>>> operand. >>>>>> While scheduling instructions per cycle, used register file is >>>>>> recorded >>>>>> on state per cycle to check the conflict. In our heristic, it >>>>>> depends on >>>>>> operand's latency to record this resource on specific cycle's >>>>>> state. so >>>>>> I have tried to find a way to get latency and resource with each >>>>>> operand. If it is not possible to support this feature with >>>>>> per-operand >>>>>> resource model, as you suggested, I will try to make our own state >>>>>> machine or other scheduling constraint logic. I am newbee with >>>>>> scheduler. If you have any kinds of comment or feel something worng, >>>>>> please let me know. It will be really helpful. >>>>> It sounds like the register file is static and does not depend on >>>>> register allocation. In this case, what you tried makes sense but >>>>> is really not supported. The machine model tables are designed to >>>>> be efficient for the common case, and per-operand resources don’t >>>>> really make sense most of the time. >>>>> >>>>> It sounds like you want to model the pipeline stage at which a >>>>> resource is used. To do that with the per-operand machine model >>>>> (misnomer), I think we need a ResourceDelay vector in addition to >>>>> ResourceCycles, which we could easily add. >>>>> >>>>> However, overall, I think you’re target is interesting enough that >>>>> you may be better off augmenting the standard machine model with >>>>> your own model. Your scheduler plugin could keep your own tables >>>>> or state machine to model the constraints. >>>>> >>>>> If you want to be clever, you could write tablegen code to build >>>>> your model up from the SchedRead/Write definitions that are part >>>>> of the standard model. You could add extra fields specific to your >>>>> model. >>>>> >>>>> Were you previously using the old instruction itineraries, and now >>>>> moving to the new model? >>>>> >>>>> -Andy >>>>> >>>>>> Thanks for your kind response, >>>>>> JinGu Kang >>>>>> >>>>>> On 2014-02-20 오전 2:27, Andrew Trick wrote: >>>>>>> Hi JinGu, >>>>>>> >>>>>>> We currently have the ResourceCycles list to indicate the number >>>>>>> of cpu cycles during which a resource is reserved. We could >>>>>>> simply add a ResourceDelay with similar grammar. The >>>>>>> MachineScheduler could be taught to keep track of the first and >>>>>>> last time that a resource is reserved. >>>>>>> >>>>>>> Note that the MachineScheduler will work with the instruction >>>>>>> itineraries if you choose to implement them. That’s the only way >>>>>>> to get a full reservation table without customizing the >>>>>>> scheduler. You can plugin your own state machine or other >>>>>>> scheduling constraint logic. You may want to do this if you have >>>>>>> very complicated constraints. >>>>>>> >>>>>>> Can you provide an example of the most complicated instruction >>>>>>> resources that you need to model? >>>>>>> >>>>>>> -Andy >>>>>>> >>>>>>> On Feb 19, 2014, at 4:57 AM, JinGu Kang <jingu at codeplay.com >>>>>>> <mailto:jingu at codeplay.com>> wrote: >>>>>>> >>>>>>>> Hi Andy, >>>>>>>> >>>>>>>> I am sorry to misunderstand 'ReadAdvance' code. In order to support >>>>>>>> resource per operand, I feel we need more table and function. If >>>>>>>> possbile, I would like to listen to your opinion whether this >>>>>>>> feature is >>>>>>>> useful or not. As I mentioned on previous e-mail, it will be >>>>>>>> useful to >>>>>>>> access the latency and the resource per operand while checking >>>>>>>> resource >>>>>>>> conflict per cycle. >>>>>>>> >>>>>>>> Thanks, >>>>>>>> JinGu Kang >>>>>>>> >>>>>>>> On 18/02/14 23:09, jingu wrote: >>>>>>>>>> Resources and latency are not tied. An instruction is mapped to a >>>>>>>>>> scheduling class. A scheduling class is mapped to a set of >>>>>>>>>> resources >>>>>>>>>> and a per-operand list of latencies. >>>>>>>>> Thanks for your kind explanation. >>>>>>>>> >>>>>>>>> Our heuristic algorithm have needed the latency and the >>>>>>>>> resource per >>>>>>>>> operand to check resource conflicts per cycle. In order to support >>>>>>>>> this with LLVM, I expected a per-operand list of resources like >>>>>>>>> latencies with a scheduling class. >>>>>>>>> >>>>>>>>> Can I ask you something to modify on tablegen? I think that the >>>>>>>>> 'WriteResourceID' field of 'MCWriteLatencyEntry' is for >>>>>>>>> identifying >>>>>>>>> the WriteResources of each defintion as commented on code. As you >>>>>>>>> know, tablegen sets the 'WriteResourceID' field of >>>>>>>>> 'MCWriteLatencyEntry' with 'WriteID' when the 'Write' of >>>>>>>>> defition is >>>>>>>>> referenced by a 'ReadAdvance'. If we always set this field with >>>>>>>>> 'WriteID', it causes problem? I can see that 'ReadAdvance' >>>>>>>>> only uses >>>>>>>>> the 'WriteResourceID' field of 'MCWriteLatencyEntry' in >>>>>>>>> 'computeOperandLatency' function. I think the pair of latency and >>>>>>>>> write resource for defintion will be useful to check conflicts of >>>>>>>>> resources. As reference, I have attached simple patch. >>>>>>>>> >>>>>>>>> Thanks, >>>>>>>>> JinGu Kang >>>>>>>>> >>>> >>>> >>>> -- >>>> Pierre-Andre Saulais >>>> Compiler Developer >>>> Codeplay Software Ltd >>>> 45 York Place, Edinburgh, EH1 3HP >>>> Tel: 0131 466 0503 >>>> Fax: 0131 557 6600 >>>> Website:http://www.codeplay.com <http://www.codeplay.com/> >>>> Twitter:https://twitter.com/codeplaysoft >>>> >>>> This email and any attachments may contain confidential and /or >>>> privileged information and is for use by the addressee only. If you >>>> are not the intended recipient, please notify Codeplay Software Ltd >>>> immediately and delete the message from your computer. You may not >>>> copy or forward it,or use or disclose its contents to any other >>>> person. Any views or other information in this message which do not >>>> relate to our business are not authorized by Codeplay software Ltd, >>>> nor does this message form part of any contract unless so stated. >>>> As internet communications are capable of data corruption Codeplay >>>> Software Ltd does not accept any responsibility for any changes >>>> made to this message after it was sent. Please note that Codeplay >>>> Software Ltd does not accept any liability or responsibility for >>>> viruses and it is your responsibility to scan any attachments. >>>> Company registered in England and Wales, number: 04567874 >>>> Registered office: 81 Linkfield Street, Redhill RH1 6BY >>>> >>>> <add_resource_delays.patch> >>> >>> _______________________________________________ >>> LLVM Developers mailing list >>> LLVMdev at cs.uiuc.edu >>> <mailto:LLVMdev at cs.uiuc.edu>http://llvm.cs.uiuc.edu >>> <http://llvm.cs.uiuc.edu/> >>> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev >-------------- next part -------------- An HTML attachment was scrubbed... URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20140304/b84641a4/attachment.html>
On Mar 4, 2014, at 12:25 PM, Pierre-André Saulais <pierre-andre at codeplay.com> wrote:> On 04/03/14 18:08, Andrew Trick wrote: >> >> On Mar 4, 2014, at 10:05 AM, Pete Cooper <peter_cooper at apple.com> wrote: >> >>> >>> On Mar 3, 2014, at 2:21 PM, Andrew Trick <atrick at apple.com> wrote: >>> >>>> >>>> On Mar 3, 2014, at 8:53 AM, Pierre-Andre Saulais <pierre-andre at codeplay.com> wrote: >>>> >>>>> Hi Andrew, >>>>> >>>>> We are currently using a custom model where scheduling information is attached to each MCInstrDesc through tablegen, and we're trying to move to one of LLVM's models. >>>>> >>>>> To expand on what JinGu mentioned, our target has explicit ports that are used to read and write values from and to the register file. The read port is usually accessed on cycle 0 while the write port is accessed when the result is written back to the destination register. Let's assume ADD has a latency of 1, MUL has a latency of 2 and both use port P0 to write back their result. The two instructions below would conflict on P0: >>>>> >>>>> MUL r3, r4, r5 >>>>> ADD r0, r1, r2 >>>>> NOP ; Both r0 and r4 are written back using P0 - conflict. >>>>> >>>>> On our target there is no interlock which means any conflict results in the wrong value being written back to one of the register. That's why we want to model these ports as resources in the new model. That's also why we map these port resources to each operand as each operand accesses a different port. >>>>> >>>>> After reading your replies, we have realized that the scheduler does not need to know which operand corresponds to each port. It simply needs to know the set of ports used by each instruction and after how many cycles these ports are used/reserved to avoid any conflict. That's why I believe the new process resource model closely fits what we need, except for the per-resource delay you mentioned. >>>>> >>>>> This is how our model currently looks like: >>>>> >>>>> def :ItinRW<[1_LATENCY_WITH_P0, 0_LATENCY_WITH_P1, 0_LATENCY_WITH_P2], [II_ADD]>; >>>>> def :ItinRW<[2_LATENCY_WITH_P0, 0_LATENCY_WITH_P1, 0_LATENCY_WITH_P2], [II_MUL]>; >>>>> >>>>> where n_LATENCY_WITH_p is defined roughly as: >>>>> >>>>> class n_LATENCY_WITH_p<int latency, ProcResourceKind port> : SchedWriteRes<[PR_Pp]> { >>>>> let Latency = latency; >>>>> let ResourceDelays = [latency]; >>>>> } >>>>> >>>>> class PR_Pp<int portIdx> : ProcResource<1>; >>>>> >>>>> The latency for register write-back/port access is static and without interlock, which I think means the port resources should have 'Buffered = 0' in the definition. Is that correct? >>>> >>>> Yes, but it isn’t sufficient. The scheduler makes no attempt to insert nops currently. However, at the very least, you will want to implement your own MachineSchedStrategy. It would be natural to handle nop insertion within your implementation. > Thanks, I'll have a look at MachineSchedStrategy and see how we can implement it for our target. >>> Nop insertion during scheduling sounds good to me, but nop insertion after regalloc has the advantage of being able to insert nops for spill/reload. Unless you don’t have spills? >> >> To elaborate a bit more, MachineScheduler can run both preRA and postRA. So, if you want to do nop insertion within MachineScheduler (as opposed to a separate pass) you could enable it only during postRA scheduling. >> >> -Andy > We are currently doing scheduling with a custom scheduler/packetization pass at the very end of the machine compilation process, before object generation. We aren't using the MachineScheduler or any other scheduling before that pass, neither preRA or postRA. I think we could benefit from at least preRA scheduling, so that the register live ranges seen by the RA better match the final ranges (after our scheduling pass).I see, then you’re really can do whatever you want with the machine model. You’re just limited by the tables produced by the current tablegen backend and whatever features you add to it. You just need to understand the format of the tables in <YourTarget>GenSubtargetInfo.inc and try to fit your model into that format. Hopefully adding ResourceDelays will give you enough flexibility.> We do have spills, but I'm not sure if there is a benefit to inserting nops before our final pass though. Would that improve register allocation, if it was possible to do so preRA?Nope. The main reason to bundle pre-RA is to expose more opportunity for code motion (without physical register dependencies), thus generate tighter bundles. In general, there’s no reason to introduce nops other than to meet your encoding constraints, so it’s just a matter of picking the most convenient place to do that. -Andy>> >>> Pete >>>> >>>> In fact, the interpretation of most machine model properties (MircoOpBufferSize, resource BufferSize, ResourceCycles, ResourceDelay) is handled within the MachineSchedStrategy. In past emails I have been explaining how the GenericScheduler interprets the model, but it is really up to your custom strategy to implement the model. >>>> >>>>> I have attached a patch that adds the 'ResourceDelays' field in tablegen. Could you have a look at it? A couple possible issues are: >>>>> - 'Delay' is signed, since 'Cycles' in MCWriteLatencyEntry is also signed. >>>> >>>> Sure. >>>> >>>>> - When an instruction accesses the same resource multiple times, the uses are aggregated in SubtargetEmitter::GenSchedClassTables. I'm not sure how that would work if we add a 'Delay' field to MCWriteProcResEntry. >>>> >>>> Me neither. I suggest adding an assert to make sure no one accidentally uses two resources with non-zero delay. Otherwise, your patch looks fine to me. It’s totally up to you to test it though. I really want to take this patch, but we have no mechanism for testing out-of-tree target features. > Adding an assert when someone uses two resources with non-zero delay, or maybe two different delays, sounds good to me. I'm glad to hear that you'd want to take patches even for out-of-tree features, it's much appreciated. > > I'm not very familiar with the itinerary model, but aren't these two ways of expressing schedules equivalent? > > def :ItinRW<[1_LATENCY_WITH_P0, 0_LATENCY_WITH_P1, 0_LATENCY_WITH_P2], [II_ADD]>; > > InstrItinData<II_ADD, [InstrStage<1, [P1], 0>, InstrStage<1, [P2]>, InstrStage<1, [P0]>], [1, 0, 0]> > > If that's the case, then does the new machine model express itineraries with more than one stage, without adding this 'ResourceDelays' field? > > Thanks, > Pierre > >>>> >>>> -Andy >>>> >>>>> >>>>> Thanks, >>>>> Pierre >>>>> >>>>> On 28/02/14 01:00, Andrew Trick wrote: >>>>>> On Feb 19, 2014, at 1:54 PM, jingu <jingu at codeplay.com> wrote: >>>>>> >>>>>>> Hi Andy, >>>>>>> >>>>>>> I am trying to schedule and packetize instructions for VLIW at post-RA >>>>>>> stage or final codegen stage, where code transformations are not allowed >>>>>>> any more, because hardware can not resolve resource conflict. There is a >>>>>>> simple example as following: >>>>>>> >>>>>>> ADD dest_reg1, src_reg1, src_reg2 (functional unit : ALU) >>>>>>> STORE dest_reg2, mem (functional unit: LOAD_STORE) >>>>>>> >>>>>>> These instructions can be genally packetized together because there is >>>>>>> no dependency among operands and they use different functional unit. But >>>>>>> we have one more restricton. The restriction is that some of >>>>>>> instructions can not access to same register file at the same cycle. In >>>>>>> other words, if 'src_reg1' of ADD instruction uses register file 'A' and >>>>>>> 'dest_reg2' of STORE instruction uses same register file at the same >>>>>>> cycle, it causes resource conflict and can not be executed on same >>>>>>> cycle. This restriction depends on instruction type. I tried to consider >>>>>>> each register file as a resource unit which is consumed by each operand. >>>>>>> While scheduling instructions per cycle, used register file is recorded >>>>>>> on state per cycle to check the conflict. In our heristic, it depends on >>>>>>> operand's latency to record this resource on specific cycle's state. so >>>>>>> I have tried to find a way to get latency and resource with each >>>>>>> operand. If it is not possible to support this feature with per-operand >>>>>>> resource model, as you suggested, I will try to make our own state >>>>>>> machine or other scheduling constraint logic. I am newbee with >>>>>>> scheduler. If you have any kinds of comment or feel something worng, >>>>>>> please let me know. It will be really helpful. >>>>>> It sounds like the register file is static and does not depend on register allocation. In this case, what you tried makes sense but is really not supported. The machine model tables are designed to be efficient for the common case, and per-operand resources don’t really make sense most of the time. >>>>>> >>>>>> It sounds like you want to model the pipeline stage at which a resource is used. To do that with the per-operand machine model (misnomer), I think we need a ResourceDelay vector in addition to ResourceCycles, which we could easily add. >>>>>> >>>>>> However, overall, I think you’re target is interesting enough that you may be better off augmenting the standard machine model with your own model. Your scheduler plugin could keep your own tables or state machine to model the constraints. >>>>>> >>>>>> If you want to be clever, you could write tablegen code to build your model up from the SchedRead/Write definitions that are part of the standard model. You could add extra fields specific to your model. >>>>>> >>>>>> Were you previously using the old instruction itineraries, and now moving to the new model? >>>>>> >>>>>> -Andy >>>>>> >>>>>>> Thanks for your kind response, >>>>>>> JinGu Kang >>>>>>> >>>>>>> On 2014-02-20 오전 2:27, Andrew Trick wrote: >>>>>>>> Hi JinGu, >>>>>>>> >>>>>>>> We currently have the ResourceCycles list to indicate the number of cpu cycles during which a resource is reserved. We could simply add a ResourceDelay with similar grammar. The MachineScheduler could be taught to keep track of the first and last time that a resource is reserved. >>>>>>>> >>>>>>>> Note that the MachineScheduler will work with the instruction itineraries if you choose to implement them. That’s the only way to get a full reservation table without customizing the scheduler. You can plugin your own state machine or other scheduling constraint logic. You may want to do this if you have very complicated constraints. >>>>>>>> >>>>>>>> Can you provide an example of the most complicated instruction resources that you need to model? >>>>>>>> >>>>>>>> -Andy >>>>>>>> >>>>>>>> On Feb 19, 2014, at 4:57 AM, JinGu Kang <jingu at codeplay.com> wrote: >>>>>>>> >>>>>>>>> Hi Andy, >>>>>>>>> >>>>>>>>> I am sorry to misunderstand 'ReadAdvance' code. In order to support >>>>>>>>> resource per operand, I feel we need more table and function. If >>>>>>>>> possbile, I would like to listen to your opinion whether this feature is >>>>>>>>> useful or not. As I mentioned on previous e-mail, it will be useful to >>>>>>>>> access the latency and the resource per operand while checking resource >>>>>>>>> conflict per cycle. >>>>>>>>> >>>>>>>>> Thanks, >>>>>>>>> JinGu Kang >>>>>>>>> >>>>>>>>> On 18/02/14 23:09, jingu wrote: >>>>>>>>>>> Resources and latency are not tied. An instruction is mapped to a >>>>>>>>>>> scheduling class. A scheduling class is mapped to a set of resources >>>>>>>>>>> and a per-operand list of latencies. >>>>>>>>>> Thanks for your kind explanation. >>>>>>>>>> >>>>>>>>>> Our heuristic algorithm have needed the latency and the resource per >>>>>>>>>> operand to check resource conflicts per cycle. In order to support >>>>>>>>>> this with LLVM, I expected a per-operand list of resources like >>>>>>>>>> latencies with a scheduling class. >>>>>>>>>> >>>>>>>>>> Can I ask you something to modify on tablegen? I think that the >>>>>>>>>> 'WriteResourceID' field of 'MCWriteLatencyEntry' is for identifying >>>>>>>>>> the WriteResources of each defintion as commented on code. As you >>>>>>>>>> know, tablegen sets the 'WriteResourceID' field of >>>>>>>>>> 'MCWriteLatencyEntry' with 'WriteID' when the 'Write' of defition is >>>>>>>>>> referenced by a 'ReadAdvance'. If we always set this field with >>>>>>>>>> 'WriteID', it causes problem? I can see that 'ReadAdvance' only uses >>>>>>>>>> the 'WriteResourceID' field of 'MCWriteLatencyEntry' in >>>>>>>>>> 'computeOperandLatency' function. I think the pair of latency and >>>>>>>>>> write resource for defintion will be useful to check conflicts of >>>>>>>>>> resources. As reference, I have attached simple patch. >>>>>>>>>> >>>>>>>>>> Thanks, >>>>>>>>>> JinGu Kang >>>>>>>>>> >>>>> >>>>> >>>>> -- >>>>> Pierre-Andre Saulais >>>>> Compiler Developer >>>>> Codeplay Software Ltd >>>>> 45 York Place, Edinburgh, EH1 3HP >>>>> Tel: 0131 466 0503 >>>>> Fax: 0131 557 6600 >>>>> Website: http://www.codeplay.com >>>>> Twitter: https://twitter.com/codeplaysoft >>>>> >>>>> This email and any attachments may contain confidential and /or privileged information and is for use by the addressee only. If you are not the intended recipient, please notify Codeplay Software Ltd immediately and delete the message from your computer. You may not copy or forward it,or use or disclose its contents to any other person. Any views or other information in this message which do not relate to our business are not authorized by Codeplay software Ltd, nor does this message form part of any contract unless so stated. >>>>> As internet communications are capable of data corruption Codeplay Software Ltd does not accept any responsibility for any changes made to this message after it was sent. Please note that Codeplay Software Ltd does not accept any liability or responsibility for viruses and it is your responsibility to scan any attachments. >>>>> Company registered in England and Wales, number: 04567874 >>>>> Registered office: 81 Linkfield Street, Redhill RH1 6BY >>>>> >>>>> <add_resource_delays.patch> >>>> >>>> _______________________________________________ >>>> LLVM Developers mailing list >>>> LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu >>>> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev >> >-------------- next part -------------- An HTML attachment was scrubbed... URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20140310/18f61269/attachment.html>