Magenheimer, Dan (HP Labs Fort Collins)
2005-Mar-08 22:46 UTC
[Xen-devel] Scheduler portability problem
I am working on Xen/ia64 changes (within Xen itself) to support multiple domains and ran into the following problem:

It appears that __enter_scheduler was derived from an old version of the Linux scheduler ("schedule()"), with some changes made for simplification. Many of the function names are the same but some of the syntax and semantics have changed. In particular, four of note:

1) switch_to now takes two arguments instead of three
2) after switch_to is called, "other things" are done which utilize the "next" pointer
3) schedule_tail is passed the "next" task, rather than "prev"
4) schedule_tail is assumed to never return

I'm all for simplification if the Linux code is too complicated, but in this case, some of the complexity is present to support other architectures. I can speak for ia64 but I suspect that similar problems will occur with other non-x86 ports.

On Linux, switch_to is actually a macro and on ia64, another routine is called which returns a value that is "passed back" in the third switch_to argument. Why? Because switch_to actually does a task switch and the world may be very different when it returns. In particular, the values for prev and next are *different* when it returns. Why? Because switch_to (at least on ia64) is the key point where all of the current task state is put in memory, stacks are changed, and the new task state is taken back out of memory. Actually, that's not quite accurate... at the point of the call to switch_to, a fair amount of state has *already* been put in memory in both the memory stack and the register stack. The only way to restore this state (short of some very complex stack analysis) is to exit each routine in the call stack the same way as it was called.

So, on Linux, after the call to switch_to, "next" is no longer valid and is not used. "Prev" is used only because of the third argument macro trick, and "current" has already been changed to point to the new task.
On Xen/x86, it appears schedule_tail never returns because some cool assembly tricks are used to jump directly to the right place, basically as if throwing an exception (I'm guessing because there is no useful state on the call stack on x86). As previously noted, this is problematic on ia64.

Bottom line: The current code in __enter_scheduler() does not easily accommodate other architectures. I'll be taking a look at what it will take to "fix" it, but wanted to open discussion first. I know there are some that will say "just change the ia64 code"... because of architectural constraints, this is far FAR more easily said than done. And there are some that will say that mimicking Linux is a mistake because XINL (Xen is not Linux). However, I believe this is a case where leveraging the many many years of experience on many many architectures (with said experience only documented in the code itself) of Linux will benefit Xen portability in the long run (and, in my case, in the short run).

Comments?

Thanks,
Dan

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/xen-devel
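The point about "next" being stale once switch_to returns can be reproduced in user space. What follows is a minimal sketch, not Linux or Xen code: it assumes a POSIX system that provides <ucontext.h> (glibc, for example) and uses swapcontext() as a stand-in for the real stack switch. The names struct task, low_level_switch, switched_from, seen_next, seen_last and run_demo are hypothetical; only the three-argument switch_to(prev, next, last) shape comes from the message above. Three tasks switch A -> B -> C -> A, and A then records what its locals say.

```c
#include <stddef.h>
#include <ucontext.h>

#define STACK_SIZE (64 * 1024)

/* Hypothetical toy task: an id, a saved context, and a private stack. */
struct task {
    int id;
    ucontext_t ctx;
    char stack[STACK_SIZE];
};

static struct task tasks[3];        /* task 0 (A) runs on the caller's stack */
static struct task *current_task;
static struct task *switched_from;  /* written just before the stacks change */
static int seen_next[3];            /* what a task's local "next" said on resume */
static int seen_last[3];            /* who really ran just before it resumed */

/* Stand-in for the arch routine: returns the task we were switched away from. */
static struct task *low_level_switch(struct task *prev, struct task *next)
{
    switched_from = prev;
    current_task = next;
    swapcontext(&prev->ctx, &next->ctx);
    /* Back on prev's stack, possibly after several other switches. */
    return switched_from;
}

/* Same shape as the Linux macro: the result is passed back in "last". */
#define switch_to(prev, next, last) \
    do { (last) = low_level_switch((prev), (next)); } while (0)

static void schedule(struct task *next)
{
    struct task *prev = current_task;
    struct task *last;

    switch_to(prev, next, last);

    /* "next" is a stale local from before the switch; "last" is fresh. */
    seen_next[prev->id] = next->id;
    seen_last[prev->id] = last->id;
}

static void task_b(void) { schedule(&tasks[2]); }  /* B hands over to C */
static void task_c(void) { schedule(&tasks[0]); }  /* C hands back to A */

/* Runs A -> B -> C -> A. B and C are left suspended and never resumed. */
int run_demo(void)
{
    int i;

    for (i = 0; i < 3; i++)
        tasks[i].id = i;
    for (i = 1; i < 3; i++) {
        if (getcontext(&tasks[i].ctx) != 0)
            return -1;
        tasks[i].ctx.uc_stack.ss_sp = tasks[i].stack;
        tasks[i].ctx.uc_stack.ss_size = STACK_SIZE;
        tasks[i].ctx.uc_link = NULL;
    }
    makecontext(&tasks[1].ctx, task_b, 0);
    makecontext(&tasks[2].ctx, task_c, 0);

    current_task = &tasks[0];
    schedule(&tasks[1]);
    return 0;
}
```

After run_demo() returns, seen_next[0] is 1 (A's local still says it was switching to B) while seen_last[0] is 2 (C is the task that actually ran last). A two-argument switch_to has no way to hand that second value back to the resumed code.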
Hi, Dan,

Your finding is a real problem for porting Xen to architectures like IA64 that have a large register file. Current Xen/x86 adopts a so-called continuation mechanism that provides only one HV stack per LP, shared by all domains running on that LP. A simple flow for a context switch can be:

1. The scheduler picks a new domain
2. In switch_to:
   - Save the domain context (xen_regs) to prev's thread_struct.execution_context_t
   - Load next's domain context to the bottom of the stack (xen_regs)
3. Then schedule_tail simply does assembly tricks, like you said, to reset the stack pointer to the xen_regs area and resume the new domain

This flow is elegant given the small context of x86: it saves the time of normal function exits, since the stack content is known to be useless under this continuation mechanism. Two parameters are also enough for switch_to this way, since no stack switch happens at all.

IA64, by contrast, has a large register file (n Kbytes) and, in particular, a hardware engine to manage the stacked registers. Both performance and implementation difficulty are dramatically affected if we adopt the same mechanism there.

So, yes, we need to find a generic way to allow both mechanisms (per-LP stack and per-domain stack) to co-exist. A quick code surf seems to indicate that the first and major blocker is the BUG at the end of __enter_scheduler. If we move that check into the arch-specific schedule_tail, letting each arch decide whether it wants a normal return, per-domain stacks may start to work if we are fortunate. As long as the execution path follows the normal function return path to the assembly stub, the ia64_switch_to you ported from IPF Linux can work smoothly.

However, you are right, we need comments from more developers to see what a complete solution should be.
:)

Thanks,
Kevin
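The x86 flow in steps 1 to 3 above can be modeled as two structure copies. This is a toy sketch under stated assumptions, not the Xen source: struct regs, struct domain, cpu_frame, toy_switch_to and demo_switch are hypothetical names, with cpu_frame standing in for the xen_regs area at the bottom of the single per-LP stack. It is only meant to show why two arguments suffice when no stack switch takes place, since nothing else on the stack belongs to either domain.

```c
/* Hypothetical, heavily reduced register frame. */
struct regs {
    unsigned long ip;
    unsigned long sp;
    unsigned long gpr[4];
};

/* Hypothetical domain: an id plus the saved copy of its register frame. */
struct domain {
    int id;
    struct regs saved;
};

/* Stands in for the xen_regs area at the bottom of the per-LP stack. */
static struct regs cpu_frame;

/* Step 2 of the flow: save prev's frame, then load next's frame in place. */
static void toy_switch_to(struct domain *prev, struct domain *next)
{
    prev->saved = cpu_frame;
    cpu_frame = next->saved;
}

static struct domain dom_a = { 1, { 0x1000, 0x8000, { 1, 2, 3, 4 } } };
static struct domain dom_b = { 2, { 0x2000, 0x9000, { 5, 6, 7, 8 } } };

/* Domain A runs for a while, then the scheduler picks domain B. */
unsigned long demo_switch(void)
{
    cpu_frame = dom_a.saved;
    cpu_frame.ip = 0x1234;          /* A made progress since it was loaded */
    toy_switch_to(&dom_a, &dom_b);
    return cpu_frame.ip;            /* where B will resume */
}
```

In this model step 3 (resetting the stack pointer to the frame and resuming) has no C equivalent, which is the part that needs the assembly tricks. A per-domain-stack port would instead have to preserve the call stack itself, which no pair of structure copies can express.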
On 8 Mar 2005, at 22:46, Magenheimer, Dan (HP Labs Fort Collins) wrote:

> Bottom line: The current code in __enter_scheduler() does not easily
> accommodate other architectures. I'll be taking a look at what it
> will take to "fix" it, but wanted to open discussion first. I know
> there are some that will say "just change the ia64 code"... because
> of architectural constraints, this is far FAR more easily said than
> done. And there are some that will say that mimicking Linux is
> a mistake because XINL (Xen is not Linux). However, I believe this
> is a case where leveraging the many many years of experience on many
> many architectures (with said experience only documented in the code
> itself) of Linux will benefit Xen portability in the long run (and,
> in my case, in the short run).

I've changed the tail of __enter_scheduler() to call a new arch-specific function context_switch(). This subsumes switch_to() and schedule_tail() so should give you the freedom to do what you require.

 -- Keir
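A rough sketch of the freedom such a hook gives: once the generic scheduler tail only calls an arch-supplied function, each port can decide whether that call returns. The thread does not give the real context_switch() signature, so everything below is an assumption for illustration: the function-pointer type, context_switch_noreturn, context_switch_returning, enter_scheduler_tail and run_with are hypothetical names, and setjmp/longjmp merely imitates "jump straight to the resume point without unwinding".

```c
#include <setjmp.h>

struct domain { int id; };          /* hypothetical, reduced to an id */

static struct domain *current_domain;
static jmp_buf resume_point;        /* imitates the x86 resume stub */
static int reached_after_switch;    /* counts returns into generic code */

typedef void (*context_switch_fn)(struct domain *prev, struct domain *next);

/* x86-like port: one stack per CPU, so it jumps away and never returns. */
static void context_switch_noreturn(struct domain *prev, struct domain *next)
{
    (void)prev;
    current_domain = next;
    longjmp(resume_point, 1);
}

/* ia64-like port: per-domain stacks, so it returns and callers unwind. */
static void context_switch_returning(struct domain *prev, struct domain *next)
{
    (void)prev;
    current_domain = next;
}

/* Generic tail: it no longer assumes anything about returning. */
static void enter_scheduler_tail(context_switch_fn context_switch,
                                 struct domain *prev, struct domain *next)
{
    context_switch(prev, next);
    reached_after_switch++;         /* only reached if the hook returned */
}

static struct domain dom0 = { 0 };
static struct domain dom1 = { 1 };

/* Returns how many times control came back into the generic tail. */
int run_with(context_switch_fn fn, struct domain *prev, struct domain *next)
{
    reached_after_switch = 0;
    current_domain = prev;
    if (setjmp(resume_point) == 0)
        enter_scheduler_tail(fn, prev, next);
    return reached_after_switch;
}
```

With these definitions, run_with(context_switch_noreturn, &dom0, &dom1) returns 0 and run_with(context_switch_returning, &dom0, &dom1) returns 1, and in both cases current_domain ends up pointing at dom1. Generic code placed after the hook would therefore have to tolerate both outcomes.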
Magenheimer, Dan (HP Labs Fort Collins)
2005-Mar-09 20:40 UTC
RE: [Xen-devel] Scheduler portability problem
Thanks! Please pull bk://xen-ia64.bkbits.net/xeno-unstable-ia64.bk to get the complementary changes for arch/ia64. (Note that this doesn't fix the ia64 problem yet, just ensures top of trunk compiles for ia64 and works the same as before.)

> -----Original Message-----
> From: Keir Fraser [mailto:Keir.Fraser@cl.cam.ac.uk]
> Sent: Wednesday, March 09, 2005 2:06 AM
> To: Magenheimer, Dan (HP Labs Fort Collins)
> Cc: xen-devel@lists.sourceforge.net
> Subject: Re: [Xen-devel] Scheduler portability problem
>
> I've changed the tail of __enter_scheduler() to call a new
> arch-specific function context_switch(). This subsumes
> switch_to() and schedule_tail() so should give you the
> freedom to do what you require.
>
>  -- Keir