George, Yunhong, and others,

So, it seems that running stop_machine_run(), and now continue_hypercall_on_cpu(), in softirq context is a bit of a problem. Because the softirq can stop the currently-running vcpu from being descheduled, we can end up with subtle deadlocks. For example, with s_m_r() we try to rendezvous all cpus in softirq context -- we can have CPU A enter the softirq interrupting VCPU X, meanwhile VCPU Y on CPU B is spinning trying to pause VCPU X. Hence CPU B doesn't get into softirq, and so CPU A never leaves it, and we have deadlock.

There are various possible solutions to this, but one of the architecturally neatest would be to run the s_m_r() and c_h_o_c() work in a 'Linux-workqueue' type of environment -- i.e., in a proper non-guest vcpu context. Rather than introducing the whole kthread concept into Xen, one possibility would be to schedule this work on the idle vcpus -- effectively promoting idle vcpus to a more general kind of 'Xen worker vcpu' whose job can include running the idle loop.

One bit of mechanism this would require is the ability to bump the idle vcpu priority up - preferably to 'max' priority, forcing it to run next until we return it to idle/lowest priority. George: how hard would such a mechanism be to implement, do you think?

More generally: what do people think of this idea?

Thanks,
Keir
_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel
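The deadlock described above can be viewed as a cycle in a wait-for graph. The minimal C sketch below is illustrative only: the node names and the `build_softirq_scenario()` and `has_deadlock()` helpers are hypothetical, not Xen code. It models the three waits (CPU A's rendezvous waits for CPU B to enter softirq, CPU B spins until VCPU X is descheduled, and X cannot be descheduled while CPU A sits in softirq) and uses a depth-first search to find the cycle.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical nodes of a wait-for graph for the s_m_r() deadlock. */
enum node {
    CPU_A_RENDEZVOUS,   /* CPU A in softirq, waiting for all CPUs to join */
    CPU_B_PAUSING_X,    /* CPU B (running VCPU Y) spinning to pause X     */
    VCPU_X_DESCHEDULE,  /* VCPU X must be descheduled to finish the pause */
    NR_NODES
};

/* g[a][b] is true when a cannot make progress until b does. */
typedef bool wait_graph[NR_NODES][NR_NODES];

void build_softirq_scenario(wait_graph g, bool work_in_softirq)
{
    memset(g, 0, sizeof(wait_graph));
    g[CPU_A_RENDEZVOUS][CPU_B_PAUSING_X] = true;  /* A waits for B to enter softirq */
    g[CPU_B_PAUSING_X][VCPU_X_DESCHEDULE] = true; /* B spins until X is off CPU A   */
    if (work_in_softirq)
        /* X is stuck on CPU A until A leaves softirq context. */
        g[VCPU_X_DESCHEDULE][CPU_A_RENDEZVOUS] = true;
}

/* Depth-first search; state: 0 = unvisited, 1 = on current path, 2 = done. */
static bool dfs(wait_graph g, int n, int state[])
{
    assert(n >= 0 && n < NR_NODES);
    state[n] = 1;
    for (int m = 0; m < NR_NODES; m++) {
        if (!g[n][m])
            continue;
        if (state[m] == 1)
            return true;              /* back edge: a cycle of waits */
        if (state[m] == 0 && dfs(g, m, state))
            return true;
    }
    state[n] = 2;
    return false;
}

bool has_deadlock(wait_graph g)
{
    int state[NR_NODES] = {0};
    for (int n = 0; n < NR_NODES; n++)
        if (state[n] == 0 && dfs(g, n, state))
            return true;
    return false;
}
```

Running the work in a schedulable vcpu context corresponds to calling `build_softirq_scenario(g, false)`: X can then be descheduled, the last edge disappears, and the graph becomes acyclic.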
Juergen Gross
2010-Apr-19 05:00 UTC
Re: [Xen-devel] [PROPOSAL] Doing work in idle-vcpu context
Keir Fraser wrote:
> [...]
> More generally: what do people think of this idea?

Sounds a little bit like my original proposal for the change of c_h_o_c(): introduce a "hypervisor domain" for stuff like this. I still think this would be cleaner. The hypervisor vcpus would run with high priority, and they could block without raising major problems.

Juergen

--
Juergen Gross                 Principal Developer Operating Systems
TSP ES&S SWE OS6              Telephone: +49 (0) 89 3222 2967
Fujitsu Technology Solutions  e-mail: juergen.gross@ts.fujitsu.com
Domagkstr. 28                 Internet: ts.fujitsu.com
D-80807 Muenchen              Company details: ts.fujitsu.com/imprint.html
Jiang, Yunhong
2010-Apr-19 05:55 UTC
[Xen-devel] RE: [PROPOSAL] Doing work in idle-vcpu context
>-----Original Message-----
>From: Keir Fraser [mailto:keir.fraser@eu.citrix.com]
>Sent: Saturday, April 17, 2010 2:06 AM
>To: Jiang, Yunhong; George Dunlap; xen-devel@lists.xensource.com
>Subject: [PROPOSAL] Doing work in idle-vcpu context
>
>[...]
>One bit of mechanism this would require is the ability to bump the idle vcpu
>priority up - preferably to 'max' priority forcing it to run next until we
>return it to idle/lowest priority. George: how hard would such a mechanism
>be to implement do you think?
>
>More generally: what do people think of this idea?

The only concern from me is, are there any assumptions in other components that the idle vcpu is always for idling, and is always lowest priority?

--jyh
Dulloor
2010-Apr-19 06:08 UTC
Re: [Xen-devel] RE: [PROPOSAL] Doing work in idle-vcpu context
On Mon, Apr 19, 2010 at 1:55 AM, Jiang, Yunhong <yunhong.jiang@intel.com> wrote:
> [...]
> The only concern from me is, are there any assumptions in other components that
> the idle vcpu is always for idling, and is always lowest priority?

Using the idle_domain as a worker_domain sounds like a good idea. And bumping the credit up doesn't seem to be too difficult. I have attached a quickly whipped-up working patch (with a test driver) for this. Not many scheduler changes. I have looked at all the other places for idle_vcpu and PRI_IDLE too, and they look fine to me.

Keir, is this similar to what you are looking for?
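As a rough illustration of "bumping the credit up", the sketch below keeps a per-CPU runqueue sorted by priority (highest first, FIFO within a level), in the spirit of the credit scheduler's runqueue. The priority values are assumed for illustration, and `PRI_WORKER_MAX`, `idle_boost()` and `idle_unboost()` are hypothetical names, not taken from Dulloor's patch. An idle vcpu at its usual lowest priority always sorts to the tail; once boosted, it sorts ahead of every guest vcpu.

```c
#include <assert.h>
#include <stddef.h>

/* Illustrative priorities, loosely modeled on the credit scheduler;
 * higher value = more urgent. PRI_WORKER_MAX is hypothetical. */
enum {
    PRI_IDLE       = -64,
    PRI_TS_OVER    = -2,
    PRI_TS_UNDER   = -1,
    PRI_TS_BOOST   = 0,
    PRI_WORKER_MAX = 127,   /* hypothetical "run me next" level for idle */
};

struct vcpu { int id; int pri; };

#define RUNQ_MAX 16

struct runq { struct vcpu *v[RUNQ_MAX]; size_t len; };

/* Insert v after every entry of equal or higher priority, so the queue
 * stays sorted (highest first) and FIFO within a priority level. */
void runq_insert(struct runq *rq, struct vcpu *v)
{
    size_t pos = 0;

    assert(rq->len < RUNQ_MAX);
    while (pos < rq->len && rq->v[pos]->pri >= v->pri)
        pos++;
    for (size_t i = rq->len; i > pos; i--)   /* shift the tail down */
        rq->v[i] = rq->v[i - 1];
    rq->v[pos] = v;
    rq->len++;
}

/* Hypothetical helpers: raise the idle vcpu while it has work to do,
 * and drop it back to the bottom when the work is done. */
void idle_boost(struct vcpu *idle)   { idle->pri = PRI_WORKER_MAX; }
void idle_unboost(struct vcpu *idle) { idle->pri = PRI_IDLE; }
```

This also touches Yunhong's concern: any code that assumes the idle vcpu is always at `PRI_IDLE` would be wrong while it is boosted.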
Keir Fraser
2010-Apr-19 06:45 UTC
[Xen-devel] Re: [PROPOSAL] Doing work in idle-vcpu context
On 19/04/2010 06:55, "Jiang, Yunhong" <yunhong.jiang@intel.com> wrote:

>> One bit of mechanism this would require is the ability to bump the idle vcpu
>> priority up - preferably to 'max' priority forcing it to run next until we
>> return it to idle/lowest priority. George: how hard would such a mechanism
>> be to implement do you think?
>>
>> More generally: what do people think of this idea?
>
> The only concern from me is, are there any assumptions in other components that
> the idle vcpu is always for idling, and is always lowest priority?

I suppose we would find out. I don't think so, except, of course, that it is built into the scheduler that it is lowest priority.

 -- Keir
Keir Fraser
2010-Apr-19 06:53 UTC
Re: [Xen-devel] RE: [PROPOSAL] Doing work in idle-vcpu context
On 19/04/2010 07:08, "Dulloor" <dulloor@gmail.com> wrote:

> Using the idle_domain as a worker_domain sounds like a good idea. And
> bumping the credit up doesn't seem to be too difficult. I have attached a
> quickly whipped-up working patch (with a test driver) for this.
> Not many scheduler changes. I have looked at all the other places for
> idle_vcpu and PRI_IDLE too, and they look fine to me.
>
> Keir, is this similar to what you are looking for?

Yes, indeed, something similar to that. It's the scheduler changes I can't be sure about, and of course we really need SEDF and credit2 changes as well, hence I'd also like some feedback from George.

Letting the scheduler peek straight at per-cpu workqueues to check for work to do is a good idea though. It's easier than adding more scheduler hooks, I think.

Thanks,
Keir
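A minimal sketch of the "peek straight at per-cpu workqueues" idea, under assumed names: `per_cpu_sched`, `pending_work` and `pick_next()` are hypothetical, and a simple count stands in for the per-cpu work list. Instead of adding a new scheduler hook, the pick function checks whether this CPU has queued work and, if so, returns the idle vcpu ahead of any runnable guest vcpu.

```c
#include <assert.h>
#include <stddef.h>

#define NR_CPUS 4

struct vcpu { int id; };

/* Hypothetical per-cpu scheduler state. */
struct cpu_sched {
    struct vcpu *idle;          /* this CPU's idle (worker) vcpu         */
    struct vcpu *runq_head;     /* best runnable guest vcpu, or NULL     */
    unsigned int pending_work;  /* items queued for this CPU's idle vcpu */
};

static struct cpu_sched per_cpu_sched[NR_CPUS];

/* Pick the next vcpu for cpu. Queued work trumps everything, so the idle
 * vcpu effectively runs at "max" priority without any extra hook. */
struct vcpu *pick_next(unsigned int cpu)
{
    assert(cpu < NR_CPUS);
    struct cpu_sched *s = &per_cpu_sched[cpu];

    if (s->pending_work != 0)   /* peek directly at the per-cpu queue */
        return s->idle;
    return s->runq_head != NULL ? s->runq_head : s->idle;
}
```

The design choice is that the work queue, not the scheduler, owns the "is there work?" state; each scheduler only needs one extra check in its pick path.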
George Dunlap
2010-Apr-19 09:43 UTC
[Xen-devel] Re: [PROPOSAL] Doing work in idle-vcpu context
Keir Fraser wrote:
> One bit of mechanism this would require is the ability to bump the idle vcpu
> priority up - preferably to 'max' priority forcing it to run next until we
> return it to idle/lowest priority. George: how hard would such a mechanism
> be to implement do you think?
>
> More generally: what do people think of this idea?

I think it's a pretty good idea. Obviously, having the idle thread be able to switch to "schedule me NOW" priority would involve some special-casing (especially with shared runqueues), but it shouldn't be too bad.

 -George
Keir Fraser
2010-Apr-19 10:52 UTC
[Xen-devel] Re: [PROPOSAL] Doing work in idle-vcpu context
So I've now implemented this at the tip of xen-unstable staging tree. Except that I retasked the concept of 'tasklets' to implement this, rather than introducing a whole new abstraction like Linux workqueues.

Thanks to Dulloor for initial changes to the credit scheduler. I should have acknowledged you in the changeset comment too: sorry about that. :-(

George: let me know if the scheduler changes in c/s 21197 look okay.

Thanks,
Keir

On 16/04/2010 19:05, "Keir Fraser" <keir.fraser@eu.citrix.com> wrote:

> George, Yunhong, and others,
> [...]
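The sketch below shows, in simplified single-threaded form, what retasking tasklets for this purpose might look like: work is queued on a specific CPU, a flag standing in for the schedule softirq is set so that CPU's scheduler re-evaluates, and the idle vcpu drains the queue from its own loop. The names `tasklet_schedule_on_cpu()` and `do_tasklet()` are chosen for illustration; the bodies are hypothetical and omit the locking that any real multi-CPU implementation needs.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define NR_CPUS 4

/* Hypothetical, simplified tasklet: a callback plus an intrusive link. */
struct tasklet {
    struct tasklet *next;
    bool scheduled;
    void (*func)(unsigned long data);
    unsigned long data;
};

struct cpu_state {
    struct tasklet *head, *tail;  /* FIFO of tasklets bound to this CPU */
    bool schedule_softirq;        /* stands in for SCHEDULE_SOFTIRQ     */
};

static struct cpu_state cpus[NR_CPUS];

void tasklet_schedule_on_cpu(struct tasklet *t, unsigned int cpu)
{
    assert(cpu < NR_CPUS);
    struct cpu_state *c = &cpus[cpu];

    if (t->scheduled)
        return;                   /* already queued: nothing to do */
    t->scheduled = true;
    t->next = NULL;
    if (c->tail)
        c->tail->next = t;
    else
        c->head = t;
    c->tail = t;
    c->schedule_softirq = true;   /* make that CPU's scheduler re-run */
}

/* Called from the idle vcpu's loop: the work runs in a normal,
 * schedulable vcpu context rather than in softirq context. */
unsigned int do_tasklet(unsigned int cpu)
{
    assert(cpu < NR_CPUS);
    struct cpu_state *c = &cpus[cpu];
    unsigned int ran = 0;

    while (c->head) {
        struct tasklet *t = c->head;
        c->head = t->next;
        if (!c->head)
            c->tail = NULL;
        t->scheduled = false;
        t->func(t->data);
        ran++;
    }
    return ran;
}

/* Example work function for the usage below. */
static unsigned long work_done;
static void count_work(unsigned long data) { work_done += data; }
```

Because the work now runs in a vcpu that the scheduler can switch away from, a tasklet that spins waiting for another CPU no longer pins a guest vcpu in place the way softirq context did.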
On Mon, Apr 19, 2010 at 6:52 AM, Keir Fraser <keir.fraser@eu.citrix.com> wrote:
> So I've now implemented this at the tip of xen-unstable staging tree. Except
> that I retasked the concept of 'tasklets' to implement this, rather than
> introducing a whole new abstraction like Linux workqueues.

Yeah, this looks better.

>
> Thanks to Dulloor for initial changes to the credit scheduler. I should have
> acknowledged you in the changeset comment too: sorry about that. :-(

No problem :)

>
> George: let me know if the scheduler changes in c/s 21197 look okay.

George might be able to comment better, but two things:
1. Should we not set (ret.time) to some timeslice (rather than -1) when we BOOST the idle_vcpu (for csched and csched2)?
2. Is it fine to use a simple list_empty in checking if the tasklet_queue is empty for a cpu, with other cpus possibly accessing the list too?

> [...]
Keir Fraser
2010-Apr-20 12:47 UTC
[Xen-devel] Re: [PROPOSAL] Doing work in idle-vcpu context
On 20/04/2010 07:50, "Dulloor" <dulloor@gmail.com> wrote:

>> George: let me know if the scheduler changes in c/s 21197 look okay.
>
> George might be able to comment better, but two things:
> 1. Should we not set (ret.time) to some timeslice (rather than -1)
> when we BOOST the idle_vcpu (for csched and csched2)?

I purposely skipped that bit of your patch because the idle vcpu will not be descheduled until it voluntarily re-enters the scheduler with no tasklet work left to do, at which point it becomes unboosted. The time-slice mechanism is completely redundant in this scenario, so we may as well leave it turned off. I would have done that in the other two schedulers too, but they still appear to like to set a real timeout for the idle vcpu. I don't know why, but George will be able to answer for credit2 at least!

> 2. Is it fine to use a simple list_empty in checking if the
> tasklet_queue is empty for a cpu, with other cpus possibly accessing
> the list too?

The scheduler will always be run through in its entirety at some time after any change is made to the tasklet_list (because we always raise SCHEDULE_SOFTIRQ immediately after any such change). As long as the tasklet is not removed in the meantime, it is *guaranteed* that list_empty() will return false on that subsequent scheduler run-through.

Also note that no matter how clever we make the tasklet_queue_empty() function, the caller itself is not synchronised to the tasklet subsystem, so its return value can be stale before the caller even looks at it, no matter what synchronisation we do before returning. Therefore we have to accept that the scheduler can act on stale/bad information, and simply ensure that in such cases the scheduler will get run through again very soon after. Hence the proliferation of [cpu_]raise_softirq() calls in tasklet.c!

 -- Keir
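Keir's argument for the unsynchronised list_empty() check can be modeled in a few lines. In this hypothetical sketch, an atomic counter stands in for the tasklet list and an atomic flag for the SCHEDULE_SOFTIRQ bit; none of these names are Xen's. The producer publishes work before raising the flag, and the softirq loop clears the flag before each scheduler pass, so a peek that is already stale when it returns is always followed by another pass that sees the new work.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

struct cpu_state {
    atomic_int  nr_queued;         /* stands in for this CPU's tasklet list  */
    atomic_bool schedule_pending;  /* stands in for the SCHEDULE_SOFTIRQ bit */
    bool        idle_boosted;      /* decision made by the latest pass       */
    int         sched_passes;
};

void cpu_state_init(struct cpu_state *c)
{
    assert(c != NULL);
    atomic_init(&c->nr_queued, 0);
    atomic_init(&c->schedule_pending, false);
    c->idle_boosted = false;
    c->sched_passes = 0;
}

/* Producer (any CPU): publish the work first, then raise the softirq. */
void queue_work(struct cpu_state *c)
{
    atomic_fetch_add(&c->nr_queued, 1);
    atomic_store(&c->schedule_pending, true);
}

/* One scheduler pass: an unlocked peek, like list_empty(). The answer may
 * be stale by the time it is used; correctness does not depend on it. */
void schedule_pass(struct cpu_state *c)
{
    c->idle_boosted = atomic_load(&c->nr_queued) != 0;
    c->sched_passes++;
}

/* Softirq handling on the target CPU: clear the pending bit BEFORE each
 * pass, so any change that lands after the peek forces another pass. */
void do_softirq(struct cpu_state *c)
{
    while (atomic_exchange(&c->schedule_pending, false))
        schedule_pass(c);
}
```

The ordering is the whole point: if the loop cleared the flag after the peek instead, work queued between the peek and the clear would be missed until some unrelated event re-ran the scheduler.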