Phil Tomson via llvm-dev
2016-Jun-08 02:02 UTC
[llvm-dev] Instruction Itineraries: question about operand latencies
I overrode getInstrLatency and did some printing to see what is available there. It looks like the registers are still virtual when getInstrLatency is called. Is that correct? (We needed to make some decisions based on the actual registers that have been assigned, since some registers are reserved as address space pointers and we could vary the latency based on which address space pointer register is being used. But it looks like they're still virtual there.)

Phil

On Mon, Jun 6, 2016 at 3:10 PM, Ehsan Amiri <ehsanamiri at gmail.com> wrote:
> Hi Phil
>
> There are some comments in "include/llvm/Target/TargetItinerary.td" where
> class InstrItinData is defined.
>
> B is the number of cycles after issue at which the first operand of the
> instruction is defined. A is the number of cycles that the instruction
> stays in that particular stage of the pipeline. So for simple cases like
> your example, one would expect A and B to have the same value. But there
> are different APIs for accessing A and B.
>
> An example of accessing B in the source code can be found in
> PPCInstrInfo::getInstrLatency. You can also look at getStageLatency in
> include/llvm/MC/MCInstrItineraries.h. From these two you can probably find
> other relevant places.
>
> Hope this helps
> Ehsan
>
> On Mon, Jun 6, 2016 at 2:37 PM, Phil Tomson via llvm-dev <
> llvm-dev at lists.llvm.org> wrote:
>> In our architecture, loads from certain memory locations take a long time
>> to complete (on the order of 150 clock cycles). Since we don't have a way
>> to tell at compile time whether the address being loaded from lies in slow
>> or fast memory, I've gone ahead and made all of the load latencies high,
>> such as:
>>
>>   InstrItinData<II_LOAD1, [InstrStage<150, [AGU]>]>,
>>
>> However, I see that there is another field, which I haven't specified,
>> where operand latencies are given. Here's an example from
>> ARMScheduleA8.td:
>>
>>   InstrItinData<IIC_iALUi, [InstrStage<1, [A8_Pipe0, A8_Pipe1]>], [2, 2]>,
>>
>> Now I'm wondering whether, instead of what I had above, I should have
>> specified:
>>
>>   InstrItinData<II_LOAD1, [InstrStage<150, [AGU]>], [150, 1, 1]>,
>>
>> But is that first '150' parameter then redundant, since it's also given
>> in the operand latency list ([150, 1, 1], the first element of that array
>> being the latency for the output)?
>>
>> To clarify, for the values of 'A' and 'B' below:
>>
>>   InstrItinData<II_LOAD1, [InstrStage<A, [AGU]>], [B, 1, 1]>,
>>
>> ...what is the difference in meaning between 'A' and 'B'? Are they
>> essentially the same value, since only one functional unit ([AGU]) is
>> specified?
>>
>> Phil
>>
>> _______________________________________________
>> LLVM Developers mailing list
>> llvm-dev at lists.llvm.org
>> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
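To make the A/B distinction concrete, here is a small self-contained C++ model. It is a hypothetical sketch, not LLVM code: the struct and function names are invented, and the two formulas are modeled on getStageLatency and getOperandLatency in include/llvm/MC/MCInstrItineraries.h (A feeds the latest stage completion cycle; B feeds "def cycle minus use cycle plus one"), with pipeline forwarding ignored.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical model of one itinerary class; these are NOT LLVM's classes.
// StageModel::cycles plays the role of A in InstrStage<A, [AGU]>, and
// ItinClassModel::operandCycles plays the role of the [B, 1, 1] list.
struct StageModel {
  unsigned cycles;  // cycles the instruction occupies this stage (A)
  int nextCycles;   // cycles until the next stage starts; -1 means "same as cycles"
  explicit StageModel(unsigned c, int next = -1) : cycles(c), nextCycles(next) {}
};

struct ItinClassModel {
  std::vector<StageModel> stages;
  std::vector<int> operandCycles;  // per-operand cycle; entry 0 is B
};

// Modeled on getStageLatency: the latest completion cycle of any stage.
unsigned stageLatency(const ItinClassModel &itin) {
  unsigned latency = 0, startCycle = 0;
  for (const StageModel &s : itin.stages) {
    assert(s.nextCycles >= -1 && "-1 is the only negative value allowed");
    latency = std::max(latency, startCycle + s.cycles);
    startCycle += s.nextCycles >= 0 ? static_cast<unsigned>(s.nextCycles) : s.cycles;
  }
  return latency;
}

// Returns the cycle listed for an operand, or -1 if the class has no entry.
int operandCycle(const ItinClassModel &itin, unsigned opIdx) {
  if (opIdx >= itin.operandCycles.size()) return -1;
  return itin.operandCycles[opIdx];
}

// Modeled on getOperandLatency, ignoring pipeline forwarding:
// latency = (cycle the def is written) - (cycle the use is read) + 1.
int operandLatency(const ItinClassModel &def, unsigned defIdx,
                   const ItinClassModel &use, unsigned useIdx) {
  int defCycle = operandCycle(def, defIdx);
  int useCycle = operandCycle(use, useIdx);
  if (defCycle == -1 || useCycle == -1) return -1;  // no operand data
  return defCycle - useCycle + 1;
}
```

With the load class from the thread (one 150-cycle AGU stage, operand cycles [150, 1, 1]) feeding a consumer that reads its source in cycle 1, both numbers come out as 150, which is the case where A and B coincide. If the consumer instead read its source in cycle 2 (as in the A8 class's [2, 2]), the operand latency would drop to 149 while the stage latency stayed at 150.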
Ehsan Amiri via llvm-dev
2016-Jun-08 03:57 UTC
There are two scheduling passes: one runs before register allocation and the other runs after register allocation. You probably looked at the printouts from the first (pre-RA) scheduling pass. Start from TargetPassConfig::addMachinePasses to find more details about the codegen passes.
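Because the pre-RA pass only sees virtual registers, a latency hook that depends on the load's base register has to fall back to a conservative value until registers are physical. The plain C++ sketch below models that check. The encoding (top bit set for virtual registers) mirrors what TargetRegisterInfo::isVirtualRegister tests in LLVM 3.6; the register numbers, constant names, and the fast-load latency are hypothetical placeholders.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical model, not LLVM code. It borrows LLVM's encoding in which
// virtual registers have the top bit of the register number set.
using Reg = std::uint32_t;

constexpr Reg kVirtualBit = 1u << 31;

constexpr bool isVirtual(Reg r) { return (r & kVirtualBit) != 0; }
constexpr Reg makeVirtual(Reg index) { return index | kVirtualBit; }

// Hypothetical physical registers reserved as address-space pointers.
constexpr Reg kSlowSpacePtr = 10;
constexpr Reg kFastSpacePtr = 11;

constexpr unsigned kSlowLoadLatency = 150;  // worst case used in the thread
constexpr unsigned kFastLoadLatency = 4;    // placeholder, not a real figure

// Latency for a load whose address is based on baseReg.
unsigned loadLatency(Reg baseReg) {
  assert(baseReg != 0 && "register 0 means 'no register' in this model");
  // Pre-RA: the base is still virtual, so its address space is unknown.
  if (isVirtual(baseReg))
    return kSlowLoadLatency;
  // Post-RA: the physical base register identifies the address space.
  return baseReg == kFastSpacePtr ? kFastLoadLatency : kSlowLoadLatency;
}
```

In a real target, a check like this would sit inside the getInstrLatency or getOperandLatency override and inspect the load's base-register operand; calls made before register allocation would keep the 150-cycle estimate, and only the post-RA scheduler would see the refined value.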
Phil Tomson via llvm-dev
2016-Jun-08 22:13 UTC
I did some looking around and found this in Passes.cpp:

  // Temporary option to allow experimenting with MachineScheduler as a post-RA
  // scheduler. Targets can "properly" enable this with
  // substitutePass(&PostRASchedulerID, &PostMachineSchedulerID); Ideally it
  // wouldn't be part of the standard pass pipeline, and the target would just add
  // a PostRA scheduling pass wherever it wants.
  static cl::opt<bool> MISchedPostRA("misched-postra", cl::Hidden,
    cl::desc("Run MachineScheduler post regalloc (independent of preRA sched)"));

So I added this to our target's PassConfig subclass:

  class XSTGPassConfig : public TargetPassConfig {
  public:
    XSTGPassConfig(XSTGTargetMachine *TM, PassManagerBase &PM)
      : TargetPassConfig(TM, PM) {
      // When optimizing, replace the default post-RA list scheduler with the
      // MachineScheduler-based post-RA scheduler.
      if (TM->getOptLevel() != CodeGenOpt::None)
        substitutePass(&PostRASchedulerID, &PostMachineSchedulerID);
    }
    // ...
  };

Then I built and ran clang on some code. I had added some couts to getInstrLatency to display the UseInstr. This is an example of what I see in the output:

  >>> DefInstr: %vreg34<def> = LOADI32_RI %vreg3, 268;mem:LD4[%3](tbaa=<0x75a64d8>) R32C:%vreg34 GPRC:%vreg3 dbg:pipeline_routing_be.c:223:3 @[ pipeline_routing_be.c:1120:1 @[ pipeline_routing_be.c:1120:1 ] ]
  Latency: 142 for: DefIdx= 0 UseIdx= 1
  UseInstr: %vreg34<def> = LOADI32_RI %vreg3, 268; mem:LD4[%3](tbaa=<0x75a64d8>) R32C:%vreg34 GPRC:%vreg3 dbg:pipeline_routing_be.c:223:3 @[ pipeline_routing_be.c:1120:1 @[ pipeline_routing_be.c:1120:1 ] ]
  %vreg35<def> = CVT_U32_TO_U64 %vreg34; GPRC:%vreg35 R32C:%vreg34 dbg:pipeline_routing_be.c:223:3 @[ pipeline_routing_be.c:1120:1 @[ pipeline_routing_be.c:1120:1 ] ]

...since I see vregs mentioned there, I'm assuming this didn't run post-RA as I would have expected. (Our code is based on LLVM 3.6, if that's relevant.)

Phil
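The effect of substitutePass on the pipeline can be sketched with a toy model. This is a hypothetical, heavily simplified C++ sketch, not TargetPassConfig itself: the class and function names are invented, the pass labels are loosely based on the passes' command-line names ("regalloc" is made up), and LLVM does not store pass IDs as strings. It only illustrates that a substitution registered in the constructor replaces the post-RA entry while leaving the pre-RA scheduler in place.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Hypothetical model of a pass pipeline with target-registered substitutions.
class PipelineModel {
public:
  // Record that `standard` should be replaced by `replacement` when added.
  void substitutePass(const std::string &standard,
                      const std::string &replacement) {
    substitutions_[standard] = replacement;
  }

  // Add a standard pass, honoring any substitution registered earlier.
  void addPass(const std::string &id) {
    assert(!id.empty() && "pass id must not be empty");
    auto it = substitutions_.find(id);
    passes_.push_back(it == substitutions_.end() ? id : it->second);
  }

  const std::vector<std::string> &passes() const { return passes_; }

private:
  std::map<std::string, std::string> substitutions_;
  std::vector<std::string> passes_;
};

// Skeleton of the machine-pass order relevant to this thread.
std::vector<std::string> buildPipeline(bool usePostMachineScheduler) {
  PipelineModel pm;
  // Like the XSTGPassConfig constructor: register the substitution before
  // the standard passes are added.
  if (usePostMachineScheduler)
    pm.substitutePass("post-RA-sched", "postmisched");
  pm.addPass("machine-scheduler");  // pre-RA: operands are virtual registers
  pm.addPass("regalloc");           // assigns physical registers
  pm.addPass("post-RA-sched");      // post-RA: operands are physical registers
  return pm.passes();
}
```

In this model the substitution only changes the last entry, so a latency hook called from both scheduling passes would still print virtual-register operands during the first pass. Printing only when the operands are physical registers would isolate the calls made after register allocation.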