Sergei,

I would say that each target has its own scheduling strategy that has
changed considerably over time. We try to maximize code reuse across
targets, but it's not easy and it has been done ad hoc. The result is
confusing code that makes it difficult to understand the strategy for any
particular target.

The right thing to do is:
1) Make it as easy as possible to understand how scheduling works for each
of the primary targets (x86 and ARM) independent of each other.
2) Make it easy for very similar targets to piggyback on one of those
implementations, without having to worry about other targets.
3) Allow dissimilar targets (e.g. VLIW) to completely bypass the scheduler
used by other targets and reuse only nicely self-contained parts of the
framework, such as the DAG builder and individual machine description
features.

We've recently moved further from this ideal scenario in that we're now
forcing targets to implement the bottom-up selection DAG scheduler. This is
not really so bad, because you can revert to "source order" scheduling,
-pre-RA-sched=source, and you don't need to implement many target hooks. It
burns compile time for no good reason, but you can probably live with it.
Then you're free to implement your own MI-level scheduler.

The next step in making it easier to maintain an LLVM scheduler for
"interesting" targets is to build an MI-level scheduling framework and move
at least one of the primary targets to this framework so it's well
supported. This would separate the nasty issues of serializing the
selection DAG from the challenge of microarchitecture-level scheduling, and
provide a suitable place to inject your own scheduling algorithm. It's
easier to implement a scheduler when starting from a valid instruction
sequence where all dependencies are resolved and no register interferences
exist.

To answer your question, there's no clear way to describe the current
overall scheduling strategy. For now, you'll need to ask porting questions
on llvm-dev. Maybe someone who's faced a similar problem will have a good
suggestion. We do want to improve that situation, and we intend to do that
by first providing a new scheduler framework. When we get to that point,
I'll make sure that the new direction can work for you and is easy to
understand. All I can say now is that the new design will allow a target to
compose a preRA scheduler from an MI-level framework combined with
target-specific logic for selecting the optimal instruction order. I don't
see any point in imposing a generic scheduling algorithm across all
targets.

-Andy

On Nov 29, 2011, at 11:20 AM, Sergei Larin wrote:

> Andy,
>
> Is there any good info/docs on scheduling strategy in LLVM? As I was
> complaining to you at the LLVM meeting, I end up reverse engineering /
> second-guessing more than I would like to... This thread shows that I am
> not exactly alone in this... Thanks.
>
> Sergei Larin
>
> --
> Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum.
>
> -----Original Message-----
> From: llvmdev-bounces at cs.uiuc.edu [mailto:llvmdev-bounces at cs.uiuc.edu]
> On Behalf Of Andrew Trick
> Sent: Tuesday, November 29, 2011 11:48 AM
> To: Hal Finkel
> Cc: llvmdev at cs.uiuc.edu
> Subject: Re: [LLVMdev] [llvm-commits] Bottom-Up Scheduling?
>
> ARM can reuse all the default scoreboard hazard recognizer logic, such as
> recede cycle (naturally, since it's the primary client). If you can do
> the same with PPC, that's great.
>
> Andy
>
> On Nov 29, 2011, at 8:51 AM, Hal Finkel <hfinkel at anl.gov> wrote:
>
>>> Thanks! Since I have to change PPCHazardRecognizer for bottom-up
>>> support anyway, is there any reason not to have it derive from
>>> ScoreboardHazardRecognizer at this point? It looks like the custom
>>> bundling logic could be implemented on top of the scoreboard recognizer
>>> (that seems similar to what ARM's recognizer is doing).
>>
>> Also, how does the ARM hazard recognizer get away with not implementing
>> RecedeCycle?
>>
>> Thanks again,
>> Hal
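As a concrete illustration of the hazard-recognizer reuse discussed above,
here is a minimal sketch of a target recognizer deriving from
ScoreboardHazardRecognizer, assuming the LLVM 3.0-era interface. The class
name, the "my-sched" debug tag and the custom bundling hook are
hypothetical; this is not the actual PPC or ARM implementation.

    // A minimal sketch, assuming the LLVM 3.0-era hazard recognizer API.
    // MyTargetHazardRecognizer and hasCustomBundlingHazard() are
    // hypothetical; they only show how target-specific checks can be
    // layered on top of the generic scoreboard.
    #include "llvm/CodeGen/ScheduleDAG.h"
    #include "llvm/CodeGen/ScoreboardHazardRecognizer.h"

    namespace llvm {

    class MyTargetHazardRecognizer : public ScoreboardHazardRecognizer {
    public:
      MyTargetHazardRecognizer(const InstrItineraryData *ItinData,
                               const ScheduleDAG *DAG)
        : ScoreboardHazardRecognizer(ItinData, DAG, "my-sched") {}

      // Run the target-specific (e.g. bundling) check first, then defer
      // to the inherited scoreboard logic.
      virtual HazardType getHazardType(SUnit *SU, int Stalls) {
        if (hasCustomBundlingHazard(SU))
          return Hazard;
        return ScoreboardHazardRecognizer::getHazardType(SU, Stalls);
      }

      // EmitInstruction(), AdvanceCycle() and RecedeCycle() are inherited
      // unchanged; this is how ARM reuses the default scoreboard logic,
      // including RecedeCycle() for bottom-up scheduling.

    private:
      // Hypothetical placeholder for target-specific bundling constraints.
      bool hasCustomBundlingHazard(SUnit *) const { return false; }
    };

    } // end namespace llvm

The point of this layering is that only the check that differs from the
generic scoreboard needs to be written; everything else, including
RecedeCycle() for bottom-up scheduling, is simply inherited.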
Andy,

Thank you for the extended and prompt answer. Let me try to summarize my
current position so you (and everyone interested) will have a better view
of the world through my eyes ;)

1) LLVM's first robust VLIW target is currently in review. Its needs for
scheduling strategy/quality are rather different from what the current
scheduler(s) can provide.

2) My first attempt at porting (while I was on 2.9) resulted in a new
top-down pre-RA VLIW-enabled scheduler that I was hoping to upstream as
soon as our back end is accepted. I guess I have missed the window, since
our commit took a bit longer than planned. Now Evan has told me (and you
have confirmed) that it would need to change to a bottom-up version for
3.0. Moreover, the current "level" (exact placement in the DAG->DAG pass)
of pre-RA scheduling is less than optimal (and I agree with that, since I
have to bend over backwards to extract info readily available in MIs).

3) Your group is working on a "new" scheduler, and as best I understand it,
it would be the same general algorithm moved "closer" to RA. I also
understand that at first it would not have support for
"packets"/bundles/multiops in the VLIW sense (or will it?). If they are
supported, an interesting discussion on how subsequent passes will be
modified to recognize them would follow... but we had another thread on
this topic not that long ago.

So, IMHO the following would make sense:

1) It would be very nice if we could have some sort of write-up detailing
the proposed changes, and maybe defining the overall strategy for
instruction scheduling in LLVM __before__ major decisions are made. It
should later be converted into a "how to" or a simple doc chapter on
porting scheduler(s) to new targets. Public discussion should follow, and
we need to try to accommodate all needs (as much as possible).

2) Any attempt on my part to further the VLIW scheduler design for my
target would be unwise until such a discussion takes place. I also do not
separate this process from the bundle/packet representation. If you
perceive an overhead associated with this activity, I could volunteer to
help.

Also, please see my comments embedded below. Thanks.

Sergei Larin

--
Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum.

> -----Original Message-----
> From: Andrew Trick [mailto:atrick at apple.com]
> Sent: Tuesday, November 29, 2011 3:16 PM
> To: Sergei Larin
> Cc: 'Hal Finkel'; llvmdev at cs.uiuc.edu
> Subject: Re: [LLVMdev] [llvm-commits] Bottom-Up Scheduling?
>
> Sergei,
>
> I would say that each target has its own scheduling strategy that has
> changed considerably over time. We try to maximize code reuse across
> targets, but it's not easy and it has been done ad hoc. The result is
> confusing code that makes it difficult to understand the strategy for
> any particular target.
>
> The right thing to do is:
> 1) Make it as easy as possible to understand how scheduling works for
> each of the primary targets (x86 and ARM) independent of each other.

[Larin, Sergei] Sure, that could be achieved with the design
document/documentation set I am talking about.

> 2) Make it easy for very similar targets to piggyback on one of those
> implementations, without having to worry about other targets.

[Larin, Sergei] Yes, and having a robust VLIW scheduler would greatly help
here. It would also IMHO set LLVM apart from GCC and become an additional
selling point for us.

> 3) Allow dissimilar targets (e.g. VLIW) to completely bypass the
> scheduler used by other targets and reuse only nicely self-contained
> parts of the framework, such as the DAG builder and individual machine
> description features.

[Larin, Sergei] I think this is rather implementation dependent, and we
can finesse this once we have the framework better defined.

> We've recently moved further from this ideal scenario in that we're now
> forcing targets to implement the bottom-up selection DAG scheduler.

[Larin, Sergei] I really dislike this, especially because of the reason
that led to this decision. I think general "flexibility"/functionality was
sacrificed for tactical reasons.

> This is not really so bad, because you can revert to "source order"
> scheduling, -pre-RA-sched=source, and you don't need to implement many
> target hooks. It burns compile time for no good reason, but you can
> probably live with it. Then you're free to implement your own MI-level
> scheduler.

[Larin, Sergei] I am not 100% sure about this statement, but as I get
closer to re-implementing my scheduler I might get a better picture.

> The next step in making it easier to maintain an LLVM scheduler for
> "interesting" targets is to build an MI-level scheduling framework and
> move at least one of the primary targets to this framework so it's well
> supported. This would separate the nasty issues of serializing the
> selection DAG from the challenge of microarchitecture-level scheduling,
> and provide a suitable place to inject your own scheduling algorithm.
> It's easier to implement a scheduler when starting from a valid
> instruction sequence where all dependencies are resolved and no register
> interferences exist.

[Larin, Sergei] Agreed, and my whole point is that it needs to be done
with a preceding public discussion, not de facto via code drops.

> To answer your question, there's no clear way to describe the current
> overall scheduling strategy. For now, you'll need to ask porting
> questions on llvm-dev. Maybe someone who's faced a similar problem will
> have a good suggestion. We do want to improve that situation, and we
> intend to do that by first providing a new scheduler framework. When we
> get to that point, I'll make sure that the new direction can work for you

[Larin, Sergei] Any clue on the time frame?

> and is easy to understand. All I can say now is that the new design will
> allow a target to compose a preRA scheduler from an MI-level framework
> combined with target-specific logic for selecting the optimal
> instruction order. I don't see any point in imposing a generic
> scheduling algorithm across all targets.
>
> -Andy

[Larin, Sergei] Thank you again for the explanation. I am really looking
forward to digging into it.
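To make the option of implementing your own pre-RA scheduler a bit more
concrete, here is a rough sketch of how a custom SelectionDAG scheduler can
be registered, assuming the LLVM 3.0-era RegisterScheduler mechanism.
VLIWListScheduler, its factory and the "vliw-td" name are hypothetical, and
the actual scheduling logic is elided.

    // A rough sketch, assuming the LLVM 3.0-era SelectionDAG scheduler
    // registry. VLIWListScheduler, its factory and the "vliw-td" name are
    // hypothetical; only RegisterScheduler is the real hook. Like the
    // in-tree SD schedulers, this would live in lib/CodeGen/SelectionDAG.
    #include "ScheduleDAGSDNodes.h"   // private header in that directory
    #include "llvm/CodeGen/SchedulerRegistry.h"
    #include "llvm/CodeGen/SelectionDAGISel.h"
    using namespace llvm;

    namespace {
    // Hypothetical top-down, packet-forming list scheduler over SUnits.
    class VLIWListScheduler : public ScheduleDAGSDNodes {
    public:
      explicit VLIWListScheduler(MachineFunction &MF)
        : ScheduleDAGSDNodes(MF) {}

      virtual void Schedule() {
        BuildSchedGraph(0);  // build SUnits for the current DAG, no AA
        // ...top-down VLIW list scheduling / packet formation would go
        // here, producing the final instruction sequence...
      }
    };
    } // end anonymous namespace

    static ScheduleDAGSDNodes *
    createVLIWListScheduler(SelectionDAGISel *IS, CodeGenOpt::Level) {
      return new VLIWListScheduler(*IS->MF);
    }

    // Makes the scheduler selectable with -pre-RA-sched=vliw-td.
    static RegisterScheduler
      VLIWTDScheduler("vliw-td", "Hypothetical top-down VLIW list scheduler",
                      createVLIWListScheduler);

Like the in-tree SelectionDAG schedulers, such a class would be chosen at
llc time through the -pre-RA-sched option.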
On Nov 30, 2011, at 9:11 AM, Sergei Larin wrote:

>> This is not really so bad, because you can revert to "source order"
>> scheduling, -pre-RA-sched=source, and you don't need to implement many
>> target hooks. It burns compile time for no good reason, but you can
>> probably live with it. Then you're free to implement your own MI-level
>> scheduler.
>
> [Larin, Sergei] I am not 100% sure about this statement, but as I get
> closer to re-implementing my scheduler I might get a better picture.

One thing that would be nice to have ASAP is a SelectionDAG serialization
pass that satisfies dependencies and physical register interferences while
preserving the IR instruction order whenever possible. This should be
totally separate from the SelectionDAG scheduler. It should not work on
SUnits. I realize this is quite disjoint from the work needed to port a new
target. I'm just pointing out that it would be a welcome feature. If we had
that pass, I could tell you that it would be fairly straightforward to
reenable the top-down SD scheduler. At this point, since you'd rather
schedule MIs anyway, you may choose to focus on that strategy instead.

>> The next step in making it easier to maintain an LLVM scheduler for
>> "interesting" targets is to build an MI-level scheduling framework and
>> move at least one of the primary targets to this framework so it's well
>> supported. This would separate the nasty issues of serializing the
>> selection DAG from the challenge of microarchitecture-level scheduling,
>> and provide a suitable place to inject your own scheduling algorithm.
>> It's easier to implement a scheduler when starting from a valid
>> instruction sequence where all dependencies are resolved and no register
>> interferences exist.
>
> [Larin, Sergei] Agreed, and my whole point is that it needs to be done
> with a preceding public discussion, not de facto via code drops.

It will be an incremental process. I'm not going to design a complete
scheduling framework for all microarchitectures "on paper" before making
any changes. Design decisions will be deferred as late as they can be
without holding up progress. You'll know when they're being made and have
the opportunity to influence them. In fact, any new design will be strongly
influenced by the scheduler work that you and others have done recently.

I think you're reacting to the recent dropping of preRA top-down scheduling
without public discussion. As you know, it was not part of a planned
strategy, and not a desirable outcome for anyone. The fact is that we
couldn't wait to fix an existing design flaw in DAG serialization. The
bottom-up scheduler has the ability to overcome this problem, but
implementing a fix that doesn't require running the bottom-up scheduler
requires significant work. The right thing to do is to implement the SD
serialization pass I mentioned above. That solution would be preferable to
everyone, but someone needs to make the investment. Of course, anyone is
welcome to fix the existing top-down scheduler as well. It requires
implementing the inverse of the bottom-up scheduler's physical register
tracking (see LiveRegDefs), plus some really hairy logic for resolving
interferences that the SelectionDAG builder has created.

FWIW, we're not going to run into this issue with the MI scheduling
framework that I'm referring to, because no part of it will be imposed on
any targets.

>> To answer your question, there's no clear way to describe the current
>> overall scheduling strategy. For now, you'll need to ask porting
>> questions on llvm-dev. Maybe someone who's faced a similar problem will
>> have a good suggestion. We do want to improve that situation, and we
>> intend to do that by first providing a new scheduler framework. When we
>> get to that point, I'll make sure that the new direction can work for you
>
> [Larin, Sergei] Any clue on the time frame?

2012 :)

-Andy
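To visualize the composition Andy describes (a generic MI-level framework
combined with target-specific logic for choosing the instruction order),
here is a purely hypothetical sketch. None of these names are LLVM APIs,
and the actual design had not been settled at the time of this thread.

    // Purely hypothetical sketch: a generic MI-level driver owns the
    // dependence bookkeeping and the scheduling loop, while the target
    // plugs in only the logic that picks the next instruction.
    #include <cassert>
    #include <vector>

    namespace sketch {

    struct MINode {                 // stand-in for a machine instruction
      std::vector<MINode *> Succs;  // dependence successors
      unsigned NumPredsLeft;        // unscheduled predecessors
      MINode() : NumPredsLeft(0) {}
    };

    // Target-specific policy: choose the next node among the ready ones.
    class TargetOrderPolicy {
    public:
      virtual ~TargetOrderPolicy() {}
      virtual unsigned pick(const std::vector<MINode *> &Ready) = 0;
    };

    // Generic driver: the scheduling loop and dependence updates live
    // here, independent of any particular microarchitecture.
    class MIScheduler {
      TargetOrderPolicy &Policy;
    public:
      explicit MIScheduler(TargetOrderPolicy &P) : Policy(P) {}

      // Ready holds the nodes whose predecessors are already scheduled.
      std::vector<MINode *> schedule(std::vector<MINode *> Ready) {
        std::vector<MINode *> Order;
        while (!Ready.empty()) {
          unsigned Idx = Policy.pick(Ready);
          assert(Idx < Ready.size() && "policy returned a bad index");
          MINode *N = Ready[Idx];
          Ready.erase(Ready.begin() + Idx);
          Order.push_back(N);
          // Release successors whose last predecessor was just scheduled.
          for (unsigned i = 0, e = N->Succs.size(); i != e; ++i)
            if (--N->Succs[i]->NumPredsLeft == 0)
              Ready.push_back(N->Succs[i]);
        }
        return Order;
      }
    };

    } // namespace sketch

The only point is the split of responsibilities: the driver owns the
dependence DAG and the scheduling loop, while the target-supplied policy
owns the ordering heuristic.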