IMHO, it seems to be deadlock when changing dom0''s weight in credit2 scheduler. when the sched_adjust() in schedule.c is called, it grabs the schedule_lock after pausing all of the vcpus and then, csched_dom_cntl in sched_credit2.c, it also grab the schedule_lock by using vcpu_schedule_lock_irq(). In the credit2, all of the percpu schedule_lock points out same runqueue lock if they belong to same runqueue. Eventually, all of vcpu are paused except for itself running the code, and it try to grab schedule_lock that was grabbed by itself. Am I right? If I was wrong, please tell me my misunderstanding. And, I have a question about the code, where are in sched_adjust() in schedule.c if ( d == current->domain ) vcpu_schedule_lock_irq(current); It was very hard to understan for me..:) What does it exactly mean? I would be very grateful for any comments -- Best Regards, Eunbyung Park _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
2011/10/14 Eunbyung Park <silverbottlep@gmail.com>:> IMHO, it seems to be deadlock when changing dom0''s weight in credit2 > scheduler. > > when the sched_adjust() in schedule.c is called, it grabs the > schedule_lock after pausing all of the vcpus > > and then, csched_dom_cntl in sched_credit2.c, it also grab the > schedule_lock by using vcpu_schedule_lock_irq(). > > In the credit2, all of the percpu schedule_lock points out same runqueue > lock if they belong to same runqueue. > > Eventually, all of vcpu are paused except for itself running the code, > and it try to grab schedule_lock that was grabbed by itself. > > Am I right? If I was wrong, please tell me my misunderstanding.Hmm, I think you may have discovered the source of a bug that people have been reporting but I haven''t had time to look into yet. Keir, I think that lock in schedule.c around SCHED(adjust) must be wrong. By the time we grab that lock, grabbing it will be completely pointless. What are we going to be racing against? In any case, the actual scheduler should be responsible for grabbing locks; there''s no reason that the scheduler can''t grab whatever lock it needs inside that function. I haven''t done a deep analysis, but my initial instinct is to just get rid of it. What do you think?> if ( d == current->domain ) > vcpu_schedule_lock_irq(current); > > It was very hard to understan for me..:) What does it exactly mean?You''re asking what "current" means? "current" is a macro that always resolves to the vcpu which is running on the current processor. sched_adjust() seems to be trying to avoid scheduling races in general by pausing all vcpus before calling the per-scheduler function. But if a VM is calling the op on itself, the vcpu making the hypercall can''t pause itself. So in that case (current->domain == d) will be true, so sched_adjust() grab the schedule lock of that vm instead. But really all that locking should be handled in the scheduler function, not by the generic code. It knows best what needs to be locked when. -George _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
On 14/10/2011 12:47, "George Dunlap" <George.Dunlap@eu.citrix.com> wrote:> 2011/10/14 Eunbyung Park <silverbottlep@gmail.com>: >> IMHO, it seems to be deadlock when changing dom0''s weight in credit2 >> scheduler. >> >> when the sched_adjust() in schedule.c is called, it grabs the >> schedule_lock after pausing all of the vcpus >> >> and then, csched_dom_cntl in sched_credit2.c, it also grab the >> schedule_lock by using vcpu_schedule_lock_irq(). >> >> In the credit2, all of the percpu schedule_lock points out same runqueue >> lock if they belong to same runqueue. >> >> Eventually, all of vcpu are paused except for itself running the code, >> and it try to grab schedule_lock that was grabbed by itself. >> >> Am I right? If I was wrong, please tell me my misunderstanding. > > Hmm, I think you may have discovered the source of a bug that people > have been reporting but I haven''t had time to look into yet. > > Keir, I think that lock in schedule.c around SCHED(adjust) must be > wrong. By the time we grab that lock, grabbing it will be completely > pointless. What are we going to be racing against? In any case, the > actual scheduler should be responsible for grabbing locks; there''s no > reason that the scheduler can''t grab whatever lock it needs inside > that function. I haven''t done a deep analysis, but my initial > instinct is to just get rid of it. What do you think?Fine by me. The synchronisation in that function looks pretty fragile. It''s probably outdated too. -- Keir>> if ( d == current->domain ) >> vcpu_schedule_lock_irq(current); >> >> It was very hard to understan for me..:) What does it exactly mean? > > You''re asking what "current" means? "current" is a macro that always > resolves to the vcpu which is running on the current processor. > > sched_adjust() seems to be trying to avoid scheduling races in general > by pausing all vcpus before calling the per-scheduler function. But > if a VM is calling the op on itself, the vcpu making the hypercall > can''t pause itself. So in that case (current->domain == d) will be > true, so sched_adjust() grab the schedule lock of that vm instead. > > But really all that locking should be handled in the scheduler > function, not by the generic code. It knows best what needs to be > locked when. > > -George_______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
>>> if ( d == current->domain ) >>> vcpu_schedule_lock_irq(current); >>> >>> It was very hard to understan for me..:) What does it exactly mean? >> You''re asking what "current" means? "current" is a macro that always >> resolves to the vcpu which is running on the current processor. >> >> sched_adjust() seems to be trying to avoid scheduling races in general >> by pausing all vcpus before calling the per-scheduler function. But >> if a VM is calling the op on itself, the vcpu making the hypercall >> can''t pause itself. So in that case (current->domain == d) will be >> true, so sched_adjust() grab the schedule lock of that vm instead. >> >> But really all that locking should be handled in the scheduler >> function, not by the generic code. It knows best what needs to be >> locked when.This was what I really wanted to ask about. Now I can understand what the generic scheduler code was going to say. Thanks for your kindness. -- Best Regards, Eunbyung Park _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel