Wei, Gang
2011-Mar-04 09:40 UTC
[Xen-devel] dump runq with debug key 'r' may cause dead loop
Recently I found that dumping the run queue with debug key 'r' may cause a dead loop like the one below:

(XEN) active vcpus:
(XEN) 1: [1.0] pri=0 flags=0 cpu=0 credit=263 [w=256]
(XEN) 2: [0.2] pri=0 flags=0 cpu=5 credit=284 [w=256]
(XEN) 3: [0.2] pri=0 flags=0 cpu=5 credit=282 [w=256]
...
(XEN) xxxxx: [0.2] pri=0 flags=0 cpu=2 credit=54 [w=256]
...
(XEN) xxxxx: [0.2] pri=0 flags=0 cpu=3 credit=-48 [w=256]
...

This means the active vcpu 0.2 became non-active just after it was accessed in loop iteration '2:', and its list element was left in the empty state (its next pointer refers back to itself, i.e. head->next == head). The dump loop then keeps following that pointer and never gets back to the real list head.

Should we always hold a lock before accessing any scheduler-related list, even in the debug-purpose dump code? If that is not acceptable, then we had better at least add a list_empty() check to the dump functions that access scheduler-related lists, to avoid such a dead loop.

Jimmy

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel
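The failure mode described in the report can be reproduced in user space. The following is a minimal sketch, not the Xen code: the list helpers are simplified stand-ins written for this example, `walk_with_concurrent_removal` and its `limit` parameter are hypothetical names, and it assumes the entry is unlinked with a list_del_init()-style operation that leaves it pointing at itself, as the message describes. The removal is simulated inline rather than from another CPU, and the walk is capped so the program terminates.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified stand-in for a kernel-style circular doubly linked list. */
struct list_head {
    struct list_head *next, *prev;
};

static void list_init(struct list_head *h)
{
    h->next = h;
    h->prev = h;
}

static void list_add_tail(struct list_head *n, struct list_head *head)
{
    n->prev = head->prev;
    n->next = head;
    head->prev->next = n;
    head->prev = n;
}

/* Unlink an entry and leave it self-linked ("empty"). */
static void list_del_init(struct list_head *n)
{
    n->prev->next = n->next;
    n->next->prev = n->prev;
    n->next = n;
    n->prev = n;
}

/*
 * Walk the list without holding any lock. When the walk reaches `victim`,
 * the entry is removed, standing in for another CPU deactivating the vcpu
 * at that moment. Returns the number of entries visited, capped at `limit`
 * (hypothetical safety bound so that the demonstration terminates).
 */
static int walk_with_concurrent_removal(struct list_head *head,
                                        struct list_head *victim,
                                        int limit)
{
    struct list_head *iter;
    int visited = 0;

    assert(limit > 0);
    for (iter = head->next; iter != head && visited < limit; iter = iter->next) {
        visited++;
        if (iter == victim)
            list_del_init(victim); /* iter->next now points at iter itself */
    }
    return visited;
}
```

With three entries and no removal, the walk visits three entries and stops at the head. If the second entry is removed while it is the current iterator, `iter->next` is the entry itself, the `iter != head` test never fails, and only the cap ends the walk. This matches the repeated `[0.2]` lines in the log.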
Keir Fraser
2011-Mar-04 10:05 UTC
Re: [Xen-devel] dump runq with debug key 'r' may cause dead loop
On 04/03/2011 09:40, "Wei, Gang" <gang.wei@intel.com> wrote:

> Recently I found dump runq with debug key 'r' may cause dead loop like below:
>
> (XEN) active vcpus:
> (XEN) 1: [1.0] pri=0 flags=0 cpu=0 credit=263 [w=256]
> (XEN) 2: [0.2] pri=0 flags=0 cpu=5 credit=284 [w=256]
> (XEN) 3: [0.2] pri=0 flags=0 cpu=5 credit=282 [w=256]
> ...
> (XEN) xxxxx: [0.2] pri=0 flags=0 cpu=2 credit=54 [w=256]
> ...
> (XEN) xxxxx: [0.2] pri=0 flags=0 cpu=3 credit=-48 [w=256]
> ...
>
> This means the active vcpu 0.2 became non-active just after it was access in
> the loop '2:', and that list element became empty state (head->next==next).
>
> Should we always hold a lock before access any schedule related list, even in
> the debug purpose dump code? If it is not acceptable, then we'd better add a
> list_empty() check in the dump functions which access schedule related list at
> least to avoid such a dead loop.

The appropriate lock should be taken. Please send a patch.

 -- Keir

> Jimmy
Wei, Gang
2011-Mar-04 14:51 UTC
[PATCH] sched_credit: Hold lock while dump scheduler info (RE: [Xen-devel] dump runq with debug key 'r' may cause dead loop)
Here is the patch.

sched_credit: Hold lock while dump scheduler info

Dumping the run queue with debug key 'r' may cause a dead loop like the one below:

(XEN) active vcpus:
(XEN) 1: [1.0] pri=0 flags=0 cpu=0 credit=263 [w=256]
(XEN) 2: [0.2] pri=0 flags=0 cpu=5 credit=284 [w=256]
(XEN) 3: [0.2] pri=0 flags=0 cpu=5 credit=282 [w=256]
...
(XEN) xxxxx: [0.2] pri=0 flags=0 cpu=2 credit=54 [w=256]
...
(XEN) xxxxx: [0.2] pri=0 flags=0 cpu=3 credit=-48 [w=256]
...

This means the active vcpu 0.2 became non-active, with its active list element left empty, just after it was accessed in loop iteration '2:'. We should always hold a lock before accessing scheduler-related lists, even in the debug-purpose dump code.

Signed-off-by: Wei Gang <gang.wei@intel.com>

diff -r 6241fa0ad1a9 xen/common/sched_credit.c
--- a/xen/common/sched_credit.c Thu Mar 03 18:52:09 2011 +0000
+++ b/xen/common/sched_credit.c Sun Mar 06 04:31:57 2011 +0800
@@ -1452,6 +1452,10 @@ csched_dump(const struct scheduler *ops)
     struct list_head *iter_sdom, *iter_svc;
     struct csched_private *prv = CSCHED_PRIV(ops);
     int loop;
+    unsigned long flags;
+
+    spin_lock_irqsave(&(prv->lock), flags);
+
 #define idlers_buf keyhandler_scratch
 
     printk("info:\n"
@@ -1500,6 +1504,8 @@ csched_dump(const struct scheduler *ops)
         }
     }
 #undef idlers_buf
+
+    spin_unlock_irqrestore(&(prv->lock), flags);
 }
 
 static int

Keir Fraser wrote on 2011-03-04:
> The appropriate lock should be taken. Please send a patch.

Jimmy
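The locking discipline the patch applies can be sketched in user space as follows. This is an illustration under stated assumptions, not the hypervisor code: `struct sched_priv`, `sched_init`, `sched_activate`, `sched_deactivate` and `sched_dump` are hypothetical names, and a POSIX mutex stands in for `prv->lock`. Unlike `spin_lock_irqsave()`, a mutex does not disable interrupts, so only the mutual exclusion between the dump path and the list mutators is modelled here.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Simplified stand-in for a kernel-style circular doubly linked list. */
struct list_head {
    struct list_head *next, *prev;
};

/* Hypothetical user-space stand-in for the scheduler's private data. */
struct sched_priv {
    pthread_mutex_t lock;    /* plays the role of prv->lock */
    struct list_head active; /* head of the list of active entries */
};

static void sched_init(struct sched_priv *prv)
{
    pthread_mutex_init(&prv->lock, NULL);
    prv->active.next = &prv->active;
    prv->active.prev = &prv->active;
}

/* Mutator: append an entry to the active list under the lock. */
static void sched_activate(struct sched_priv *prv, struct list_head *n)
{
    pthread_mutex_lock(&prv->lock);
    n->prev = prv->active.prev;
    n->next = &prv->active;
    prv->active.prev->next = n;
    prv->active.prev = n;
    pthread_mutex_unlock(&prv->lock);
}

/* Mutator: unlink an entry under the lock and leave it self-linked. */
static void sched_deactivate(struct sched_priv *prv, struct list_head *n)
{
    pthread_mutex_lock(&prv->lock);
    n->prev->next = n->next;
    n->next->prev = n->prev;
    n->next = n;
    n->prev = n;
    pthread_mutex_unlock(&prv->lock);
}

/*
 * Dump path: hold the same lock for the whole walk, so no entry can be
 * unlinked while it is the current iterator. Returns the entry count.
 */
static int sched_dump(struct sched_priv *prv)
{
    struct list_head *iter;
    int visited = 0;

    assert(prv != NULL);
    pthread_mutex_lock(&prv->lock);
    for (iter = prv->active.next; iter != &prv->active; iter = iter->next)
        visited++;
    pthread_mutex_unlock(&prv->lock);
    return visited;
}
```

Because every mutator and the dump path take the same lock, a removal either completes before the walk starts or waits until it has finished, so the iterator can never be left standing on a self-linked entry.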