Hi, all.
I run 8 VMs on an HP DL380G7 server. These VMs run fine if each VM's
scheduler cap value is 0 (the default), but most of the VMs stop running
about one minute later if their cap value is 42 or any other non-zero value.
After a few weeks of tracking this down, I found that the VMs run fine
again when I modify sched_credit.c as follows:
1. in the csched_acct function:
- svc->flags |= CSCHED_FLAG_VCPU_PARKED;
+ set_bit(0, &svc->flags); /* set CSCHED_FLAG_VCPU_PARKED */
2. in the csched_vcpu_yield function:
- sv->flags |= CSCHED_FLAG_VCPU_YIELD;
+ set_bit(1, &sv->flags); /* set CSCHED_FLAG_VCPU_YIELD */
3. in the csched_vcpu struct:
- uint16_t flags;
+ uint32_t flags; /* set_bit needs a word-sized field */
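For reference, set_bit() and clear_bit() operate on a word-sized field,
which is why the flags field has to grow to uint32_t, and the bit numbers
correspond to the existing mask values. A hypothetical cleanup to avoid the
magic 0 and 1 (the CSCHED_BIT_* names are my own, assuming Xen 4.1's masks
CSCHED_FLAG_VCPU_PARKED == 0x0001 and CSCHED_FLAG_VCPU_YIELD == 0x0002):

/* Assumed mask-to-bit mapping (Xen 4.1):
 *   CSCHED_FLAG_VCPU_PARKED == 0x0001  ->  bit 0
 *   CSCHED_FLAG_VCPU_YIELD  == 0x0002  ->  bit 1
 */
#define CSCHED_BIT_VCPU_PARKED 0   /* hypothetical name */
#define CSCHED_BIT_VCPU_YIELD  1   /* hypothetical name */

/* The calls above then become self-documenting: */
set_bit(CSCHED_BIT_VCPU_PARKED, &svc->flags);
clear_bit(CSCHED_BIT_VCPU_YIELD, &scurr->flags);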
The VMs also don't stop running if I set sched_credit_default_yield=1 on
the Xen command line in GRUB's menu.lst.
So, are my modifications correct? Is there an access race when Xen
accesses svc->flags from both csched_acct and csched_vcpu_yield?
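To illustrate the interleaving I suspect, here is a minimal standalone
sketch (plain C with pthreads, NOT Xen code): two threads do a plain "|="
read-modify-write on the same flags word, and one thread's bit is
occasionally lost. Build with: gcc -O2 -pthread flags_race.c

#include <pthread.h>
#include <stdint.h>
#include <stdio.h>

#define FLAG_PARKED 0x0001   /* stands in for CSCHED_FLAG_VCPU_PARKED */
#define FLAG_YIELD  0x0002   /* stands in for CSCHED_FLAG_VCPU_YIELD  */

static volatile uint32_t flags;
static unsigned long lost;

static void *parker(void *arg)      /* plays the role of csched_acct() */
{
    (void)arg;
    for (long i = 0; i < 10000000; i++) {
        flags = 0;
        flags |= FLAG_PARKED;       /* non-atomic read-modify-write */
        if ( !(flags & FLAG_PARKED) )
            lost++;                 /* the other thread's RMW erased it */
    }
    return NULL;
}

static void *yielder(void *arg)     /* plays the role of csched_vcpu_yield() */
{
    (void)arg;
    for (long i = 0; i < 10000000; i++)
        flags |= FLAG_YIELD;        /* non-atomic read-modify-write */
    return NULL;
}

int main(void)
{
    pthread_t t1, t2;
    pthread_create(&t1, NULL, parker, NULL);
    pthread_create(&t2, NULL, yielder, NULL);
    pthread_join(t1, NULL);
    pthread_join(t2, NULL);
    printf("PARKED bit lost %lu times\n", lost);
    return 0;
}

In the scheduler the lost bit would be CSCHED_FLAG_VCPU_PARKED, leaving a
paused vcpu that csched_acct never sees as needing to be unparked.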
Another interesting thing is that the VMs don't stop running on other
servers even when their cap values are non-zero; it happens only on this
HP DL380G7 server. I don't know why.
My Xen version is 4.1.0, and my dom0 kernel is 2.6.32.41, taken from
Jeremy's git tree a few months ago.
Hi, all.
On my HP DL380G7 server, csched_vcpu_yield and csched_acct may access
svc->flags at the same time. When this happens, VMs stop running because
csched_vcpu_yield's read-modify-write of svc->flags overwrites the
CSCHED_FLAG_VCPU_PARKED bit that csched_acct has just set (my VMs'
scheduler cap values are not 0). The VMs run fine if I modify
sched_credit.c as follows:
--- xen/common/sched_credit.c  2010-12-10 10:19:45.000000000 +0800
+++ ../../xen-4.1.0/xen/common/sched_credit.c  2010-12-31 10:47:39.000000000 +0800
@@ -135,7 +135,7 @@ struct csched_vcpu {
     struct vcpu *vcpu;
     atomic_t credit;
     s_time_t start_time;   /* When we were scheduled (used for credit) */
-    uint16_t flags;
+    uint32_t flags;
     int16_t pri;
 #ifdef CSCHED_STATS
     struct {
@@ -787,7 +787,7 @@ csched_vcpu_yield(const struct scheduler
     if ( !sched_credit_default_yield )
     {
         /* Let the scheduler know that this vcpu is trying to yield */
-        sv->flags |= CSCHED_FLAG_VCPU_YIELD;
+        set_bit(1, &sv->flags);
     }
 }

@@ -1086,7 +1086,7 @@ csched_acct(void* dummy)
             {
                 CSCHED_STAT_CRANK(vcpu_park);
                 vcpu_pause_nosync(svc->vcpu);
-                svc->flags |= CSCHED_FLAG_VCPU_PARKED;
+                set_bit(0, &svc->flags);
             }

             /* Lower bound on credits */
@@ -1111,7 +1111,7 @@ csched_acct(void* dummy)
                  */
                 CSCHED_STAT_CRANK(vcpu_unpark);
                 vcpu_unpause(svc->vcpu);
-                svc->flags &= ~CSCHED_FLAG_VCPU_PARKED;
+                clear_bit(0, &svc->flags);
             }

             /* Upper bound on credits means VCPU stops earning */
@@ -1337,7 +1337,7 @@ csched_schedule(
      * Clear YIELD flag before scheduling out
      */
     if ( scurr->flags & CSCHED_FLAG_VCPU_YIELD )
-        scurr->flags &= ~(CSCHED_FLAG_VCPU_YIELD);
+        clear_bit(1, &scurr->flags);
Are these modifications correct? Another interesting thing is that the
VMs run fine on other servers even when their credit cap values are
non-zero. I don't know why.
My Xen version is 4.1.0, and my dom0 kernel is 2.6.32.41, taken from
Jeremy's git tree a few months ago.

Thanks,
liuyi
On 31/12/2011 02:55, "Liu.yi" <liu.yi24@zte.com.cn> wrote:
> On my HP DL380G7 server, csched_vcpu_yield and csched_acct may access
> svc->flags at the same time. When this happens, VMs stop running because
> csched_vcpu_yield's read-modify-write of svc->flags overwrites the
> CSCHED_FLAG_VCPU_PARKED bit that csched_acct has just set (my VMs'
> scheduler cap values are not 0). The VMs run fine if I modify
> sched_credit.c as follows:
> [...]

Cc'ing George Dunlap. This is probably a good bug fix.

 -- Keir
On Fri, Dec 30, 2011 at 9:55 PM, Liu.yi <liu.yi24@zte.com.cn> wrote:
> On my HP DL380G7 server, csched_vcpu_yield and csched_acct may access
> svc->flags at the same time. [...]
> The VMs run fine if I modify sched_credit.c as follows:
> [...]

Liu,

Thanks for the patch. Unfortunately it doesn't apply for me. It needs to
be able to apply using either "patch -p1 < patch_file" or "hg qimport
patch_file". There are instructions on how to make such a patch here:
http://wiki.xen.org/wiki/SubmittingXenPatches

Just from looking at it: You're right, the svc->flags variable is modified
while holding the vcpu scheduling lock in the case of vcpu_yield, but the
scheduler private lock in the case of csched_acct(). It looks like your
solution makes the update atomic -- let me see if that's the best thing.
I think the updates shouldn't be too frequent, so it might be easier than
trying to sort out locking.

I'll take a look at changing the locking and get back with you.

 -George
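A sketch of the locking alternative George mentions (untested; it assumes
Xen 4.1's vcpu_schedule_lock_irq()/vcpu_schedule_unlock_irq() helpers, and
the ordering of this lock against the private prv->lock would need
auditing to rule out deadlock):

/* Untested sketch: serialize csched_acct()'s flag update with
 * csched_vcpu_yield() by taking the vcpu's scheduling lock, which the
 * yield path already holds when it touches svc->flags. */
vcpu_schedule_lock_irq(svc->vcpu);
svc->flags |= CSCHED_FLAG_VCPU_PARKED;
vcpu_schedule_unlock_irq(svc->vcpu);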
Hi, Keir and George, thanks for your replies.
An access race on svc->flags seems unavoidable when csched_acct calls
vcpu_pause_nosync: while csched_acct is parking the vcpu, the VM executes
a PAUSE instruction, and the resulting call into the hypervisor reaches
csched_vcpu_yield.
I had thought about taking csched_private->lock in csched_vcpu_yield, but
csched_acct holds that lock for a long time, so csched_vcpu_yield could be
blocked for a long time too; that is why I chose set_bit and clear_bit
instead.
Atomic operations using the LOCK prefix (like set_bit and clear_bit) block
all physical CPUs that execute memory accesses. Does this imply that spin
locks are more efficient when their granularity is small?
Sorry for my poor English.
Liuyi
>>> On 05.01.12 at 03:36, "Liu.yi" <liu.yi24@zte.com.cn> wrote:
> Atomic operations using the LOCK prefix (like set_bit and clear_bit)
> block all physical CPUs that execute memory accesses.

... to that same cache line.

> Does this imply that spin locks are more efficient when their
> granularity is small?

No - after all, acquiring a spin lock unavoidably implies using a LOCKed
instruction.

Jan
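To make this concrete: even a minimal test-and-set spin lock must acquire
with a LOCKed read-modify-write. A standalone sketch using GCC builtins on
x86 (an illustration, not Xen's actual spinlock implementation):

#include <stdio.h>

typedef struct { volatile int v; } raw_spinlock_t;

static inline void raw_spin_lock(raw_spinlock_t *l)
{
    /* Acquiring needs a LOCKed RMW -- this compiles to LOCK XCHG on x86. */
    while (__sync_lock_test_and_set(&l->v, 1))
        while (l->v)
            __builtin_ia32_pause();   /* spin on a plain read, no bus lock */
}

static inline void raw_spin_unlock(raw_spinlock_t *l)
{
    __sync_lock_release(&l->v);       /* plain store with release semantics */
}

int main(void)
{
    raw_spinlock_t lock = { 0 };
    raw_spin_lock(&lock);
    puts("lock held");
    raw_spin_unlock(&lock);
    return 0;
}

So a spin-lock-protected update costs at least the same LOCKed operation as
a single set_bit/clear_bit, plus the unlock store; for a one-word flag
update, the atomic bit operation cannot be the more expensive choice.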