Dave,

Thank you for your interest. Please see my replies below. Sorry that my terminology is not as crisp as Andy's, but I think you can see what I mean.

Sergei

--
Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum.

> -----Original Message-----
> From: dag at cray.com [mailto:dag at cray.com]
> Sent: Friday, May 11, 2012 12:14 PM
> To: Sergei Larin
> Cc: 'Andrew Trick'; 'Hal Finkel'; wrf at cray.com; 'LLVM Developers Mailing List'
> Subject: Re: [LLVMdev] Scheduler Roadmap
>
> Sergei Larin <slarin at codeaurora.org> writes:
>
> > - We do need to have a way to assign bundles much earlier than we do now.
>
> Yeah, I can imagine why this would be useful.
>
> > And it needs to be intertwined with scheduling (Bundler currently
> > reuses a good chunk of scheduler infrastructure).
>
> Just to clarify, is the need due to the current bundling implementation
> of reusing scheduler infrastructure or is there a more fundamental
> reason the two should be tied together? I can imagine some advantages
> of fusing the two but I'm no VLIW expert.

[Larin, Sergei] A little bit of both. The current bundler uses the DAG dependency builder to facilitate its analysis; for that it actually instantiates a full MI scheduler... without scheduling. Ideally the scheduler itself should be able to produce bundled code (since it has the best picture of machine resources and the instruction stream), but a standalone bundler might still be needed to re-bundle incrementally (which it does not do right now). In short, I see the bundler as a utility, not as a pass.

> > It is also obvious that it will have adverse effect on all the
> > downstream passes.
>
> How so? Isn't the bundle representation supposed to be fairly
> transparent to passes that don't care about it?

[Larin, Sergei] Kind of. Once bundles are finalized, bundle headers become new "super instructions", and if a pass does not need to look at individual (MI) instructions, there will not be any difference for it.
But if a pass needs to deal with individual MIs, things get interesting. For one, we lack an API for moving/adding/removing individual MIs to/from finalized bundles. We also lack an API to move MIs between BBs in the presence of bundles. Live Intervals obviously do not work with bundles...
Two, semantics (dependencies) within a bundle are parallel (think { r0 = r1; r1 = r0 } in serial vs. parallel semantics), and if a pass needs to "understand" them, it will need to be "taught" how to do it. This is where incremental rebundling might come in handy. Fortunately we currently do bundling fairly late, so it is not an issue yet.

> > It is further insulting due to the fact that bundling is trivial to do
> > during scheduling, but it goes hard against the original assumptions
> > made elsewhere.
>
> Can you explain more about this?

[Larin, Sergei] The core of the bundler is the DFA state machine, which also _must_ be a part of any VLIW scheduler, so in my Hexagon VLIW "custom" scheduler (I actually have two: SDNode-based and MI-based) I virtually __create__ bundles, but discard them at the end of the pass, only to recreate them again later in the standalone bundler. The second attempt is riddled with additional false dependencies (anti, output etc.) introduced by register allocation, so bundling quality is affected.

> > Re-bundling is also a distinct task that might need to be addressed in
> > this context.

[Larin, Sergei] I think the above explanation covers this.

>
> Rebundling is certainly useful. I'm not sure what you mean by "in this
> context."
>
> > - We do need to have at least a distant plan for global scheduling.
>
> Yes, definitely!
>
> > BB scope is nice and manageable, but I can easily get several percent
> > of "missed" performance by simply implementing a pull-up pass at the
> > very end of code generation... meaning multiple opportunities were
> > lost earlier. Current way to express and maintain state needed for
> > global scheduling remains to be improved.
>
> We could also use a global scheduler in the medium-term though it's not
> absolutely critical. A nice clean infrastructure to support global
> scheduling with multiple heuristics, etc. would be very valuable. It
> would also be a lot of work. :)

[Larin, Sergei] I will need to start doing this very soon to meet my performance goals, but extensive discussion and generic support might take some time to crystallize... but it is indeed critically needed.

> > - SW pipelining and scheduler interaction with it. When (not if :) we
> > will have a robust SW pipeliner, it will likely take place before the
> > first scheduling pass, and we do not want to "undo" some decisions made there.
>
> Right. We (the LLVM community) will need ways to mark regions "don't
> schedule."
>
> -Dave
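The parallel-vs-serial point made above with { r0 = r1; r1 = r0 } can be made concrete with a toy interpreter. This is a minimal sketch, not LLVM code: `Copy`, `RegFile`, `runSerial` and `runParallel` are hypothetical names, and a two-register file stands in for the machine state.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical toy model, not LLVM code: a copy performs regs[dst] = regs[src].
struct Copy {
  std::size_t dst;
  std::size_t src;
};

// A two-register file: index 0 is r0, index 1 is r1.
using RegFile = std::array<int, 2>;

// Serial semantics: each copy observes the writes made by the copies before it.
RegFile runSerial(RegFile regs, const std::vector<Copy> &ops) {
  for (const Copy &op : ops) {
    assert(op.dst < regs.size() && op.src < regs.size());
    regs[op.dst] = regs[op.src];
  }
  return regs;
}

// Parallel (bundle) semantics: every source is read from the register file as
// it was on entry to the bundle, so no copy observes another copy's write.
RegFile runParallel(RegFile regs, const std::vector<Copy> &ops) {
  const RegFile onEntry = regs;
  for (const Copy &op : ops) {
    assert(op.dst < regs.size() && op.src < regs.size());
    regs[op.dst] = onEntry[op.src];
  }
  return regs;
}
```

With r0 = 10 and r1 = 20, running the bundle {{0, 1}, {1, 0}} serially leaves both registers at 20, while the parallel reading swaps them (r0 = 20, r1 = 10). A pass that walks the MIs inside a bundle one at a time and assumes the serial reading would reach the wrong conclusion about the values.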
Thanks for helping explain the infrastructure. One amendment...

On May 11, 2012, at 11:28 AM, Sergei Larin <slarin at codeaurora.org> wrote:

> [Larin, Sergei] Kind of. Once bundles are finalized, bundle header become a
> new "super instructions", and if a pass does not need to look at individual
> (MI) instructions, there will not be any difference for it. But if a pass
> need to deal with individual MIs, things get interesting. For one, we lack
> API for moving/adding/removing individual MIs to/from finalized bundles. We
> also lack API to move MIs between BBs in presence of bundles. Live Intervals
> obviously do not work with bundles...
> Two, semantics (dependencies) within a bundle are parallel (think { r0 =
> r1; r1 = r0 } in serial vs. parallel semantics) and if a pass needs to
> "understand" it, it will need to be "taught" how to do it. This is where
> incremental rebundling might come in handy. Fortunately we currently do
> bundling fairly late, so it is not an issue yet.

LiveIntervals should work for your bundles by giving each instruction in the bundle the same slot index. Regalloc should do the right thing. There's a proof-of-concept API in LiveIntervalAnalysis, handleMoveIntoBundle(), that you should try to use. As I was saying in my last message, we may want to improve that API as you actually start testing it.

-Andy
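A toy model of Andy's suggestion: every instruction in a bundle shares one slot index, so liveness sees the whole bundle as a single program point. This sketch does not use LLVM's SlotIndexes class; `ToyInstr`, `bundledWithPred`, `assignSlotIndexes` and the spacing of 16 between indexes are assumptions made for illustration.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical toy instruction: a name plus a flag saying whether it is
// bundled with the instruction before it. This is not the LLVM API.
struct ToyInstr {
  std::string name;
  bool bundledWithPred;
};

// Assign one slot index per bundle: an instruction bundled with its
// predecessor reuses the predecessor's index; otherwise a new index is opened.
// The gap between consecutive indexes (spacing) is an assumption of this sketch.
std::vector<unsigned> assignSlotIndexes(const std::vector<ToyInstr> &instrs,
                                        unsigned spacing = 16) {
  std::vector<unsigned> indexes;
  indexes.reserve(instrs.size());
  unsigned next = 0;
  for (const ToyInstr &mi : instrs) {
    if (mi.bundledWithPred) {
      assert(!indexes.empty() && "first instruction cannot continue a bundle");
      indexes.push_back(indexes.back());
    } else {
      indexes.push_back(next);
      next += spacing;
    }
  }
  return indexes;
}
```

For a compare bundled with a conditional load and followed by an unbundled add, the first two instructions both get index 0 and the add gets 16. Leaving gaps between indexes lets new instructions be numbered later without renumbering everything; that is a design choice of the sketch, not a claim about LLVM's exact numbering.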
Sergei Larin <slarin at codeaurora.org> writes:

> Ideally scheduler itself should be able to produce bundled code (since
> it has the best picture of machine resources and instruction stream)

Ok, that's what I was imagining.

>> How so? Isn't the bundle representation supposed to be fairly
>> transparent to passes that don't care about it?

> We also lack API to move MIs between BBs in presence of bundles.

Why would you want to do that? Wouldn't you just move bundles across BBs? A rebundling pass could split out the individual MIs you want to move if there are dependence issues.

> Live Intervals obviously do not work with bundles...

I don't see that as obvious. I'd been imagining LiveIntervals using a generic bundle interface to get at the defs and uses of the MIs inside it. Is it just a matter of not having those interfaces yet?

> Two, semantics (dependencies) within a bundle are parallel (think { r0
> = r1; r1 = r0 } in serial vs. parallel semantics) and if a pass needs to
> "understand" it, it will need to be "taught" how to do it.

But if a pass needs to know about that, it is a special bundle-aware pass and I wouldn't expect the bundle to be transparent.

> This is where incremental rebundling might come in handy. Fortunately
> we currently do bundling fairly late, so it is not an issue yet.

Right, but I'm glad you're thinking ahead. :)

> [Larin, Sergei] The core of bundler is the DFA state machine, which also
> _must_ be a part of any VLIW scheduler, so in my Hexagon VLIW "custom"
> scheduler (I actually have two - SDNode and MI based) I virtually __create__
> bundles, but discard them at the end of pass, only to recreate them again
> later in the standalone bundler. Second attempt is riddled with additional
> false dependencies (anti, output etc.) introduced by the register
> allocation, so bundling quality is affected.

They're only "false" in the sense of the register names being fixed at that point.
It's essentially the same problem as trying to schedule for latency after register allocation. You're constrained by hazards imposed by a reduced namespace. If the scheduler-created bundles were maintained, then the register allocator would have less freedom to assign registers. It's a classic scheduler/regalloc tradeoff. Bundling is "just" another fun detail, I think. :)

>> We could also use a global scheduler in the medium-term though it's not
>> absolutely critical. A nice clean infrastructure to support global
>> scheduling with multiple heuristics, etc. would be very valuable. It
>> would also be a lot of work. :)
>
> [Larin, Sergei] I will need to start doing this very soon to meet my
> performance goals, but extensive discussion and generic support might take
> some time to crystallize... but indeed critically needed.

Yes, I fully expect it will take a while to mature. I'm really excited that you're starting to look at this!

-Dave
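Dave's point that post-RA "false" dependencies come from the reduced register namespace can be illustrated with a purely name-based dependence check. In this hypothetical sketch (`ToyOp`, `Dep` and `classify` are not LLVM APIs), a load, a store of its result, and an unrelated add are independent while they use distinct virtual registers; once the allocator assigns both results to the same physical register, the add acquires an output dependence on the load and an anti dependence on the store.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical toy instruction: at most one defined register ("" means none)
// and a list of read registers. Not an LLVM data structure.
struct ToyOp {
  std::string def;
  std::vector<std::string> uses;
};

enum class Dep { None, True, Anti, Output };

// Classify the dependence of `later` on `earlier` purely by register name,
// which is all a post-RA bundler has to go on.
Dep classify(const ToyOp &earlier, const ToyOp &later) {
  if (!earlier.def.empty())
    for (const std::string &u : later.uses)
      if (u == earlier.def)
        return Dep::True;  // read-after-write
  if (!later.def.empty() && later.def == earlier.def)
    return Dep::Output;    // write-after-write
  if (!later.def.empty())
    for (const std::string &u : earlier.uses)
      if (u == later.def)
        return Dep::Anti;  // write-after-read
  return Dep::None;
}
```

Whether a given dependence actually keeps two operations out of the same bundle is target-specific; with the parallel read semantics discussed earlier in the thread, a write-after-read pair may be allowed in one packet. The sketch only shows how register reuse creates the dependence edges in the first place.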
Hello everyone,

Let me (re)present a question that might have been discussed before, but did not result in any code (AFAIK).

How do we represent a _conditional_ assignment (def) in a bundle MI?

More context: currently we expose internal def/use/kill information to the bundle header, something like this:

BUNDLE %PC<imp-def>, %R0<imp-def>, %P0<imp-use,kill>, %R16<imp-use>
  * %R0<def> = LDriuh_cdnNotPt %P0<kill,internal>, %R16, 0;
  * %P0<def> = CMPEQri %R16, 0;

Here CMPEQri is a compare-to-predicate-register instruction, and LDriuh_cdnNotPt is a _conditional_ load, which might or might not take place based on the outcome of the compare... As such, R0 might or might not be defined in this bundle, which obviously changes the liveness update process.

My question: do we need another attribute, along the lines of isImplicit and isEarlyClobber, to designate a conditional def? Furthermore, depending on architectural details we might well have a conditional use as well... And what about the individual (unbundled) def/use? Should this:

  %R0<def> = LDriuh_cdnNotPt %P0<kill,internal>, %R16, 0;

...become this:

  %R0<def-cond> = LDriuh_cdnNotPt %P0<kill,internal>, %R16, 0;

or even:

  %R0<def-cond> = LDriuh_cdnNotPt %P0<kill,internal>, %R16<use-cond>, 0;

So, if I am missing something in the current implementation or in an ongoing discussion (which is entirely possible, since I am just back from vacation), please let me know how to achieve this functionality; if this is something missing from the implementation, let's discuss how we want to realize it.

Thanks.

Sergei Larin

--
Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum.
Hi Sergei,

It seems to me that you can represent the semantics of a conditional instruction by adding a use of the conditionally defined register to the instruction. The value of the output register after the instruction is either the value the instruction produces, if it executes, or the value the register held before the instruction.

The bundle would be:

BUNDLE %PC<imp-def>, %R0<imp-def>, %P0<imp-use,kill>, %R16<imp-use>, %R0<imp-use,kill>
  * %R0<def> = LDriuh_cdnNotPt %P0<kill,internal>, %R16, 0, %R0<imp-use,kill>
  * %P0<def> = CMPEQri %R16, 0

The individual instruction would be:

  %R0<def> = LDriuh_cdnNotPt %P0<kill,internal>, %R16, 0, %R0<imp-use,kill>;

How would you use cond-def/uses? How would they change liveness?

Best,
Arnold

--
Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum.

On 8/9/2012 11:48 AM, Sergei Larin wrote:
> Hello everyone,
>
> Let me (re)present a question that might have previously been discussed,
> but did not result in any code (AFIK).
> [...]
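Arnold's suggestion can be checked with a toy backward liveness computation, where the registers live into an instruction are its uses plus whatever is live out of it minus its defs. The sketch is a simplified model, not LLVM's liveness code: `ToyInstr` and `liveIn` are hypothetical names, and a single conditional load stands in for LDriuh_cdnNotPt. Without the extra %R0 use, the conditional def looks like an unconditional kill, so R0 appears dead on entry even though its old value survives whenever the load does not execute; with the implicit use, R0 stays live.

```cpp
#include <cassert>
#include <set>
#include <string>
#include <utility>
#include <vector>

// Hypothetical toy instruction: registers it defines and registers it reads.
struct ToyInstr {
  std::vector<std::string> defs;
  std::vector<std::string> uses;
};

// Backward liveness over a straight-line block. For each instruction, from
// last to first: live-in = uses plus (live-out minus defs).
// Returns the set of registers live on entry to the block.
std::set<std::string> liveIn(const std::vector<ToyInstr> &block,
                             std::set<std::string> liveOut) {
  std::set<std::string> live = std::move(liveOut);
  for (auto it = block.rbegin(); it != block.rend(); ++it) {
    for (const std::string &d : it->defs)
      live.erase(d);
    for (const std::string &u : it->uses)
      live.insert(u);
  }
  return live;
}
```

The extra operand costs nothing at run time; it only tells liveness that the previous value of R0 may flow through the instruction, which is exactly what a conditional def does.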
Hi Sergei.

If an instruction conditionally writes R0, then I think it needs to implicitly use R0 for proper liveness.

Andy

On Aug 9, 2012, at 9:48 AM, Sergei Larin <slarin at codeaurora.org> wrote:
> Hello everyone,
> [...]
Andy,

This is less a question than a status quo verification. We currently have a certain nondeterminism in MI scheduler DAG construction: it is introduced by the use of std::map/std::set during edge traversal. The result is a random variation in SUnit edge order (which then remains fixed). Logically it is the same DAG, but topologically it is a slightly different one, and if some algorithm depends on the order of edge traversal, we can get nondeterministic performance and hard-to-reproduce debugging sessions.

The way I discovered it: the VLIW scheduler can produce an identical cost function for a pair of SUs, making visitation order the tie breaker, which is not deterministic per the above. For me it is trivial to fix, but I wonder if this might become a source of well-hidden issues in the future. At this time I am not proposing anything; a fix is definitely possible, but I would like to know what people think about it before I even consider this a bug.

Thanks.

Sergei

---
Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, hosted by The Linux Foundation
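One way to remove order dependence from the tie-breaking Sergei describes is to make the comparison total, breaking equal costs by a stable per-node key such as the node number. LLVM's SUnit does carry a NodeNum, but `ToySU`, its `cost` field (lower is better here) and `pickBest` are hypothetical names for this sketch. When a container is keyed by pointer values, its iteration order can follow memory layout rather than program order, which is the kind of variation this guards against.

```cpp
#include <cassert>
#include <vector>

// Hypothetical stand-in for a scheduling unit. The cost field and its
// "lower is better" convention are assumptions of this sketch.
struct ToySU {
  unsigned nodeNum;
  int cost;
};

// Pick the best candidate. Ties on cost are broken by the smaller node
// number, so the result does not depend on the order of `candidates`.
// Returns nullptr for an empty candidate list.
const ToySU *pickBest(const std::vector<const ToySU *> &candidates) {
  const ToySU *best = nullptr;
  for (const ToySU *su : candidates) {
    if (!best || su->cost < best->cost ||
        (su->cost == best->cost && su->nodeNum < best->nodeNum))
      best = su;
  }
  return best;
}
```

Visiting the same candidates in opposite orders selects the same unit, because equal costs fall back to the smaller node number. Making the DAG edge order itself deterministic would instead protect every consumer of the DAG, not just this one heuristic.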