hi, all

I run 8 vms on an HP DL380G7 server. These vms run fine if the vms'
scheduler cap value is 0 (the default), but most of the vms stop running
after about one minute if their cap value is 42 or any other non-zero
value. After a few weeks of tracking this down, I found that the vms run
fine again when I modify sched_credit.c as follows:

1. in the csched_acct function:

-    svc->flags |= CSCHED_FLAG_VCPU_PARKED;
+    set_bit(0, &svc->flags);    /* set CSCHED_FLAG_VCPU_PARKED */

2. in the csched_vcpu_yield function:

-    sv->flags |= CSCHED_FLAG_VCPU_YIELD;
+    set_bit(1, &sv->flags);     /* set CSCHED_FLAG_VCPU_YIELD */

3. in the csched_vcpu struct:

-    uint16_t flags;
+    uint32_t flags;             /* wide enough for set_bit */

The vms also don't stop running if sched_credit_default_yield=1 is set on
the Xen command line in grub's menu.lst.

So, is what I modified correct? Is there an access race when Xen accesses
svc->flags from csched_acct and csched_vcpu_yield?

Another interesting thing is that the vms don't stop running on another
server even when their scheduler cap values are not 0; it only happens on
this HP DL380G7 server. I don't know why.

My Xen version is 4.1.0; the dom0 kernel is 2.6.32.41 from Jeremy's git
tree from a few months ago.
hi, all

In my HP DL380G7 server, csched_vcpu_yield and csched_acct may access
svc->flags at the same time. When this happens, vms stop running because
csched_vcpu_yield overwrites the CSCHED_FLAG_VCPU_PARKED bit that
csched_acct set in svc->flags (my vms' scheduler cap values are not 0).
The vms run fine if I modify sched_credit.c as follows:

--- xen/common/sched_credit.c  2010-12-10 10:19:45.000000000 +0800
+++ ../../xen-4.1.0/xen/common/sched_credit.c  2010-12-31 10:47:39.000000000 +0800
@@ -135,7 +135,7 @@ struct csched_vcpu {
     struct vcpu *vcpu;
     atomic_t credit;
     s_time_t start_time;   /* When we were scheduled (used for credit) */
-    uint16_t flags;
+    uint32_t flags;
     int16_t pri;
 #ifdef CSCHED_STATS
     struct {
@@ -787,7 +787,7 @@ csched_vcpu_yield(const struct scheduler
     if ( !sched_credit_default_yield )
     {
         /* Let the scheduler know that this vcpu is trying to yield */
-        sv->flags |= CSCHED_FLAG_VCPU_YIELD;
+        set_bit(1, &sv->flags);
     }
 }

@@ -1086,7 +1086,7 @@ csched_acct(void* dummy)
         {
             CSCHED_STAT_CRANK(vcpu_park);
             vcpu_pause_nosync(svc->vcpu);
-            svc->flags |= CSCHED_FLAG_VCPU_PARKED;
+            set_bit(0, &svc->flags);
         }

         /* Lower bound on credits */
@@ -1111,7 +1111,7 @@ csched_acct(void* dummy)
              */
             CSCHED_STAT_CRANK(vcpu_unpark);
             vcpu_unpause(svc->vcpu);
-            svc->flags &= ~CSCHED_FLAG_VCPU_PARKED;
+            clear_bit(0, &svc->flags);
         }

         /* Upper bound on credits means VCPU stops earning */
@@ -1337,7 +1337,7 @@ csched_schedule(
      * Clear YIELD flag before scheduling out
      */
     if ( scurr->flags & CSCHED_FLAG_VCPU_YIELD )
-        scurr->flags &= ~(CSCHED_FLAG_VCPU_YIELD);
+        clear_bit(1, &scurr->flags);

Are these modifications correct? Another interesting thing is that the vms
run fine on another server even when their scheduler cap values are not 0.
I don't know why.

My Xen version is 4.1.0; the dom0 kernel is 2.6.32.41 from Jeremy's git
tree from a few months ago.

Thanks

liuyi
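The failure mode described above is a classic lost update: csched_acct and
csched_vcpu_yield each do a plain read-modify-write on the same flags word
while holding different locks, so neither excludes the other. If the yield
path reads flags just before csched_acct sets CSCHED_FLAG_VCPU_PARKED, its
write-back silently drops the PARKED bit, the unpark path never fires, and
the paused vcpu stays paused. Below is a minimal standalone sketch of that
interleaving (not Xen code; the thread and flag names are illustrative,
and the race only manifests under an unlucky interleaving):

/* Two threads doing non-atomic |= on the same word can lose a bit. */
#include <pthread.h>
#include <stdio.h>

#define FLAG_PARKED (1u << 0)   /* stands in for CSCHED_FLAG_VCPU_PARKED */
#define FLAG_YIELD  (1u << 1)   /* stands in for CSCHED_FLAG_VCPU_YIELD  */

static volatile unsigned int flags;

static void *acct_thread(void *arg)   /* models csched_acct() */
{
    flags |= FLAG_PARKED;   /* compiles to load, OR, store -- not atomic */
    return NULL;
}

static void *yield_thread(void *arg)  /* models csched_vcpu_yield() */
{
    flags |= FLAG_YIELD;    /* may write back a value loaded before
                             * FLAG_PARKED was set, losing that bit */
    return NULL;
}

int main(void)
{
    pthread_t a, y;
    pthread_create(&a, NULL, acct_thread, NULL);
    pthread_create(&y, NULL, yield_thread, NULL);
    pthread_join(a, NULL);
    pthread_join(y, NULL);
    /* With unlucky timing, FLAG_PARKED is missing here; in the scheduler
     * that means vcpu_unpause() is never called and the vm hangs. */
    printf("flags = %#x\n", flags);
    return 0;
}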
On 31/12/2011 02:55, "Liu.yi" <liu.yi24@zte.com.cn> wrote:

> hi, all
>
> In my HP DL380G7 server, csched_vcpu_yield and csched_acct may access
> svc->flags at the same time. When this happens, vms stop running because
> csched_vcpu_yield overwrites the CSCHED_FLAG_VCPU_PARKED bit that
> csched_acct set in svc->flags (my vms' scheduler cap values are not 0).
> The vms run fine if I modify sched_credit.c as follows:

Cc'ing George Dunlap. This is probably a good bug fix.

 -- Keir
On Fri, Dec 30, 2011 at 9:55 PM, Liu.yi <liu.yi24@zte.com.cn> wrote:
> hi, all
>
> In my HP DL380G7 server, csched_vcpu_yield and csched_acct may access
> svc->flags at the same time. When this happens, vms stop running because
> csched_vcpu_yield overwrites the CSCHED_FLAG_VCPU_PARKED bit that
> csched_acct set in svc->flags (my vms' scheduler cap values are not 0).
> The vms run fine if I modify sched_credit.c as follows:
>
> --- xen/common/sched_credit.c  2010-12-10 10:19:45.000000000 +0800
> +++ ../../xen-4.1.0/xen/common/sched_credit.c  2010-12-31 10:47:39.000000000 +0800

Liu,

Thanks for the patch. Unfortunately it doesn't apply for me. It needs to
be able to apply using either "patch -p1 < patch_file" or "hg qimport
patch_file". There are instructions on how to make such a patch here:

http://wiki.xen.org/wiki/SubmittingXenPatches

Just from looking at it: you're right, the svc->flags variable is
modified while holding the vcpu scheduling lock in the case of
vcpu_yield, but while holding the scheduler private lock in the case of
csched_acct(). It looks like your solution makes the update atomic --
let me see if that's the best thing. I think the updates shouldn't be
too frequent, so it might be easier than trying to sort out the locking.

I'll take a look at changing the locking and get back with you.

 -George
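For illustration, here is a standalone sketch of the atomic-bitop
direction George mentions. Two assumptions are worth flagging: the
_CSCHED_FLAG_* bit-index names are illustrative (Liu's patch hard-codes
0 and 1), and set_bit/clear_bit/test_bit are modelled with GCC __atomic
builtins rather than Xen's arch bitops. It also shows why the patch has
to widen flags from uint16_t: the bitops operate on a full machine word.

#include <stdio.h>

#define _CSCHED_FLAG_VCPU_PARKED 0   /* bit index, not a mask */
#define _CSCHED_FLAG_VCPU_YIELD  1

/* Models of Xen's atomic bitops; each is a single LOCKed RMW on x86. */
static void set_bit(int nr, unsigned long *addr)
{
    __atomic_fetch_or(addr, 1UL << nr, __ATOMIC_SEQ_CST);
}

static void clear_bit(int nr, unsigned long *addr)
{
    __atomic_fetch_and(addr, ~(1UL << nr), __ATOMIC_SEQ_CST);
}

static int test_bit(int nr, const unsigned long *addr)
{
    return (__atomic_load_n(addr, __ATOMIC_SEQ_CST) >> nr) & 1;
}

int main(void)
{
    unsigned long flags = 0;   /* full word, hence the uint16_t widening */

    set_bit(_CSCHED_FLAG_VCPU_PARKED, &flags);   /* park: atomic RMW   */
    set_bit(_CSCHED_FLAG_VCPU_YIELD, &flags);    /* yield can no longer
                                                  * clobber PARKED     */
    clear_bit(_CSCHED_FLAG_VCPU_PARKED, &flags); /* unpark             */
    printf("yield still set: %d\n",
           test_bit(_CSCHED_FLAG_VCPU_YIELD, &flags));
    return 0;
}

Named bit indices also keep readers from having to remember which flag
lives in which bit, which the raw set_bit(0, ...) / set_bit(1, ...) calls
in the patch obscure.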
Hi, Keir and George, thanks for looking at this.

An access race on svc->flags seems inevitable when csched_acct calls
vcpu_pause_nosync: a vm can execute a PAUSE instruction at that moment,
and the resulting hypercall path calls csched_vcpu_yield.

I had thought about taking csched_private->lock in csched_vcpu_yield, but
csched_acct holds that lock for a long time, so csched_vcpu_yield could be
blocked for a long time too. That is why I chose set_bit and clear_bit.

Atomic operations using the LOCK prefix (like set_bit and clear_bit) block
all the physical cpus that execute a memory access. Does this imply that
spin locks are more efficient when their granularity is small?

Sorry for my poor English.

Liuyi
>>> On 05.01.12 at 03:36, "Liu.yi" <liu.yi24@zte.com.cn> wrote:
> Atomic operations using the LOCK prefix (like set_bit and clear_bit)
> block all the physical cpus that execute a memory access

... to that same cache line.

> Does this imply that spin locks are more efficient when their
> granularity is small?

No - after all, acquiring a spin lock unavoidably implies using a LOCKed
instruction.

Jan
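To make Jan's point concrete: even the simplest spin lock is built on a
LOCKed read-modify-write (an xchg or lock cmpxchg on x86), so taking and
releasing a lock can never be cheaper than the single LOCKed bitop it
would protect. A minimal test-and-set sketch using C11 atomics (this is
not Xen's spinlock implementation, just the shape of one):

#include <stdatomic.h>
#include <stdio.h>

typedef struct { atomic_flag locked; } spinlock_t;

static void spin_lock(spinlock_t *l)
{
    /* The test-and-set here is the LOCKed instruction Jan refers to. */
    while (atomic_flag_test_and_set_explicit(&l->locked,
                                             memory_order_acquire))
        ;  /* spin until the previous holder releases */
}

static void spin_unlock(spinlock_t *l)
{
    atomic_flag_clear_explicit(&l->locked, memory_order_release);
}

int main(void)
{
    spinlock_t l = { ATOMIC_FLAG_INIT };

    spin_lock(&l);
    puts("in critical section");
    spin_unlock(&l);
    return 0;
}

So for a single flag word, a lone set_bit/clear_bit is the cheaper
option; a spin lock only pays off when it protects several related
updates that must be seen together.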