Jiang, Yunhong
2010-Jan-06 08:14 UTC
[Xen-devel] A question about changeset 20621:f9392f6eda79 and Discontinuous online node
Hi, Andre and Keir, we meet a divide_by_zero bug in xend when create guest, after checking the code, seems it is caused by changeset 20621(see below for the patch). It removes the checking for len(info[''node_to_cpu''][i]) > 0 before nodeload[i] = int(nodeload[i] * 16 / len(info[''node_to_cpu''][i])), so that if a node has no CPU populated, it will fail. A deep checking of the code reveals more than this changeset. Per my understanding to the code, currently Xen API and control panel assumes the online node is always continuous. The XEN_SYSCTL_physinfo hypercall will return only the number of online node, and control panel like tools/python/xen/xend/XendDomainInfo.py will iterate from 0~nr_nodes. However, this is not always true, considering if no memory is populated behind some socket. For example, in a NUMA system with 4 pxm domain, pxm 0/2 has both CPU and memory populated, while pxm 1/3 has only CPU.. Xen hypervisor will setup pxm~node mapping for all 4 domain(assume pxm is 1:1 mapping with node), but only node 0/2 is online (per my understanding, according to current memory allocation mechanism, only node with memory populated is online). When control panel call XEN_SYSCTL_physinfo, it will get nr_nodes as 2, and currently it will assume node 0/1 is online, this is sure to be wrong and may cause various issues. This continuous assumption apply to CPU side also. Currently nr_cpus is returned as num_online_cpus(), this will cause issue if some of cpu is offlined. I''m considering if we can pass this dis-continuous information to user space too, but that requires change this sysctl interface. The worse is, even if we can change this interface, we may run out of the 128 byte limitation for xen_sysctl hypercall if we change the NR_CPUS == 128 in future (currently the struct xen_sysctl_physinfo is 104 byte already). I''d get some input from you guys and community before I try to fix this issue, any suggestion? Thanks Yunhong Jiang diff -r a50c1cbf08ec -r f9392f6eda79 tools/python/xen/xend/XendDomainInfo.py --- a/tools/python/xen/xend/XendDomainInfo.py Fri Dec 11 08:58:06 2009 +0000 +++ b/tools/python/xen/xend/XendDomainInfo.py Fri Dec 11 08:59:54 2009 +0000 @@ -2670,10 +2670,9 @@ class XendDomainInfo: nodeload[i] += 1 break for i in range(0, nr_nodes): - if len(info[''node_to_cpu''][i]) > 0 and i in node_list: - nodeload[i] = int(nodeload[i] * 16 / len(info[''node_to_cpu''][i])) - else: - nodeload[i] = sys.maxint + nodeload[i] = int(nodeload[i] * 16 / len(info[''node_to_cpu''][i])) + if len(info[''node_to_cpu''][i]) == 0 or i not in node_list: + nodelist[i] += 8 return map(lambda x: x[0], sorted(enumerate(nodeload), key=lambda x:x[1])) _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Keir Fraser
2010-Jan-06 08:35 UTC
[Xen-devel] Re: A question about changeset 20621:f9392f6eda79 and Discontinuous online node
On 06/01/2010 08:14, "Jiang, Yunhong" <yunhong.jiang@intel.com> wrote:> This continuous assumption apply to CPU side also. Currently nr_cpus is > returned as num_online_cpus(), this will cause issue if some of cpu is > offlined. > > I''m considering if we can pass this dis-continuous information to user space > too, but that requires change this sysctl interface. The worse is, even if we > can change this interface, we may run out of the 128 byte limitation for > xen_sysctl hypercall if we change the NR_CPUS == 128 in future (currently the > struct xen_sysctl_physinfo is 104 byte already). > > I''d get some input from you guys and community before I try to fix this issue, > any suggestion?I think sufficient info is already made available by sysctl: cpu_to_node[] info is provided by physinfo command; cpu_to_core[] and cpu_to_socket[] by get_cputopo command. In both cases holes due to offline cpus are filled with a special indicator value of ~0. In the case of the node_to_cpu list returned by xc.physinfo(), it already accounts for holes in node space by having empty lists of cpus for those non-existent or empty nodes. Should just be a case of having the xend python code understand that as necessary? -- Keir _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Jan Beulich
2010-Jan-06 09:05 UTC
Re: [Xen-devel] A question about changeset 20621:f9392f6eda79 and Discontinuous online node
>>> "Jiang, Yunhong" <yunhong.jiang@intel.com> 06.01.10 09:14 >>> >I''m considering if we can pass this dis-continuous information to user >space too, but that requires change this sysctl interface. The worse >is, even if we can change this interface, we may run out of the 128 >byte limitation for xen_sysctl hypercall if we change the >NR_CPUS == 128 in future (currently the struct xen_sysctl_physinfo >is 104 byte already).Could you point out where you see the risk of growing beyond 128 bytes? No data structure in the hypercall interface depends on NR_CPUS afaics. Jan _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Jiang, Yunhong
2010-Jan-06 09:07 UTC
[Xen-devel] RE: A question about changeset 20621:f9392f6eda79 and Discontinuous online node
>-----Original Message----- >From: Keir Fraser [mailto:keir.fraser@eu.citrix.com] >Sent: Wednesday, January 06, 2010 4:36 PM >To: Jiang, Yunhong; andre.przywara@amd.com >Cc: xen-devel@lists.xensource.com >Subject: Re: A question about changeset 20621:f9392f6eda79 and Discontinuous >online node > >On 06/01/2010 08:14, "Jiang, Yunhong" <yunhong.jiang@intel.com> wrote: > >> This continuous assumption apply to CPU side also. Currently nr_cpus is >> returned as num_online_cpus(), this will cause issue if some of cpu is >> offlined. >> >> I''m considering if we can pass this dis-continuous information to user space >> too, but that requires change this sysctl interface. The worse is, even if we >> can change this interface, we may run out of the 128 byte limitation for >> xen_sysctl hypercall if we change the NR_CPUS == 128 in future (currently the >> struct xen_sysctl_physinfo is 104 byte already). >> >> I''d get some input from you guys and community before I try to fix this issue, >> any suggestion? > >I think sufficient info is already made available by sysctl: cpu_to_node[] >info is provided by physinfo command; cpu_to_core[] and cpu_to_socket[] by >get_cputopo command. In both cases holes due to offline cpus are filled with >a special indicator value of ~0. > >In the case of the node_to_cpu list returned by xc.physinfo(), it already >accounts for holes in node space by having empty lists of cpus for those >non-existent or empty nodes. Should just be a case of having the xend python >code understand that as necessary?Keir, thanks for pointing out this.The API is sufficient. We only need changes to python side to understand this. One minor thing is, maybe we need change the nr_nodes from number of online node to the maxium of online node id, otherwise, xc.physinfo may have no idea when iterate the node_to_memory information (will it be possible the a node has only memory populated, while no CPU populated?) --jyh> > -- Keir >_______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Jiang, Yunhong
2010-Jan-06 09:09 UTC
RE: [Xen-devel] A question about changeset 20621:f9392f6eda79 and Discontinuous online node
Originally I thought to use bitmap to present the online status, but seems I forgot type that idea out :( As Keir pointed out, we don''t need change the interface at all. --jyh>-----Original Message----- >From: Jan Beulich [mailto:JBeulich@novell.com] >Sent: Wednesday, January 06, 2010 5:05 PM >To: Jiang, Yunhong >Cc: andre.przywara@amd.com; Keir Fraser; xen-devel@lists.xensource.com >Subject: Re: [Xen-devel] A question about changeset 20621:f9392f6eda79 and >Discontinuous online node > >>>> "Jiang, Yunhong" <yunhong.jiang@intel.com> 06.01.10 09:14 >>> >>I''m considering if we can pass this dis-continuous information to user >>space too, but that requires change this sysctl interface. The worse >>is, even if we can change this interface, we may run out of the 128 >>byte limitation for xen_sysctl hypercall if we change the >>NR_CPUS == 128 in future (currently the struct xen_sysctl_physinfo >>is 104 byte already). > >Could you point out where you see the risk of growing beyond 128 >bytes? No data structure in the hypercall interface depends on >NR_CPUS afaics. > >Jan_______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Keir Fraser
2010-Jan-06 09:28 UTC
[Xen-devel] Re: A question about changeset 20621:f9392f6eda79 and Discontinuous online node
On 06/01/2010 09:07, "Jiang, Yunhong" <yunhong.jiang@intel.com> wrote:>> In the case of the node_to_cpu list returned by xc.physinfo(), it already >> accounts for holes in node space by having empty lists of cpus for those >> non-existent or empty nodes. Should just be a case of having the xend python >> code understand that as necessary? > > Keir, thanks for pointing out this.The API is sufficient. We only need changes > to python side to understand this. One minor thing is, maybe we need change > the nr_nodes from number of online node to the maxium of online node id, > otherwise, xc.physinfo may have no idea when iterate the node_to_memory > information (will it be possible the a node has only memory populated, while > no CPU populated?)Yes, I think actually that would be a bug fix, given how tools currently use the nr_nodes field. Feel free to send a patch as part of your fix-up work: I don''t think even changing the field name is necessary, but we can add a comemnt to sysctl.h. -- Keir _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Jiang, Yunhong
2010-Jan-06 09:39 UTC
[Xen-devel] RE: A question about changeset 20621:f9392f6eda79 and Discontinuous online node
Sure. --jyh>-----Original Message----- >From: Keir Fraser [mailto:keir.fraser@eu.citrix.com] >Sent: Wednesday, January 06, 2010 5:29 PM >To: Jiang, Yunhong; andre.przywara@amd.com >Cc: xen-devel@lists.xensource.com >Subject: Re: A question about changeset 20621:f9392f6eda79 and Discontinuous >online node > >On 06/01/2010 09:07, "Jiang, Yunhong" <yunhong.jiang@intel.com> wrote: > >>> In the case of the node_to_cpu list returned by xc.physinfo(), it already >>> accounts for holes in node space by having empty lists of cpus for those >>> non-existent or empty nodes. Should just be a case of having the xend python >>> code understand that as necessary? >> >> Keir, thanks for pointing out this.The API is sufficient. We only need changes >> to python side to understand this. One minor thing is, maybe we need change >> the nr_nodes from number of online node to the maxium of online node id, >> otherwise, xc.physinfo may have no idea when iterate the node_to_memory >> information (will it be possible the a node has only memory populated, while >> no CPU populated?) > >Yes, I think actually that would be a bug fix, given how tools currently use >the nr_nodes field. Feel free to send a patch as part of your fix-up work: I >don''t think even changing the field name is necessary, but we can add a >comemnt to sysctl.h. > > -- Keir >_______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Keir Fraser
2010-Jan-06 09:44 UTC
Re: [Xen-devel] Re: A question about changeset 20621:f9392f6eda79 and Discontinuous online node
On 06/01/2010 09:28, "Keir Fraser" <keir.fraser@eu.citrix.com> wrote:>> Keir, thanks for pointing out this.The API is sufficient. We only need >> changes >> to python side to understand this. One minor thing is, maybe we need change >> the nr_nodes from number of online node to the maxium of online node id, >> otherwise, xc.physinfo may have no idea when iterate the node_to_memory >> information (will it be possible the a node has only memory populated, while >> no CPU populated?) > > Yes, I think actually that would be a bug fix, given how tools currently use > the nr_nodes field. Feel free to send a patch as part of your fix-up work: I > don''t think even changing the field name is necessary, but we can add a > comemnt to sysctl.h.Actually I think we should change the nr_cpus field in the same way, and may as well change their names to max_cpu_id and max_node_id. I''ll look into this myself. -- Keir _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Jiang, Yunhong
2010-Jan-06 09:46 UTC
RE: [Xen-devel] Re: A question about changeset 20621:f9392f6eda79 and Discontinuous online node
>-----Original Message----- >From: Keir Fraser [mailto:keir.fraser@eu.citrix.com] >Sent: Wednesday, January 06, 2010 5:45 PM >To: Keir Fraser; Jiang, Yunhong; andre.przywara@amd.com >Cc: xen-devel@lists.xensource.com >Subject: Re: [Xen-devel] Re: A question about changeset 20621:f9392f6eda79 and >Discontinuous online node > >On 06/01/2010 09:28, "Keir Fraser" <keir.fraser@eu.citrix.com> wrote: > >>> Keir, thanks for pointing out this.The API is sufficient. We only need >>> changes >>> to python side to understand this. One minor thing is, maybe we need change >>> the nr_nodes from number of online node to the maxium of online node id, >>> otherwise, xc.physinfo may have no idea when iterate the node_to_memory >>> information (will it be possible the a node has only memory populated, while >>> no CPU populated?) >> >> Yes, I think actually that would be a bug fix, given how tools currently use >> the nr_nodes field. Feel free to send a patch as part of your fix-up work: I >> don''t think even changing the field name is necessary, but we can add a >> comemnt to sysctl.h. > >Actually I think we should change the nr_cpus field in the same way, and may >as well change their names to max_cpu_id and max_node_id. I''ll look into >this myself.Why we need change the nr_cpus? It has max_cpu_id already. Thanks that you will look into this. --jyh> > -- Keir >_______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Keir Fraser
2010-Jan-06 10:18 UTC
Re: [Xen-devel] Re: A question about changeset 20621:f9392f6eda79 and Discontinuous online node
On 06/01/2010 09:46, "Jiang, Yunhong" <yunhong.jiang@intel.com> wrote:>>> Yes, I think actually that would be a bug fix, given how tools currently use >>> the nr_nodes field. Feel free to send a patch as part of your fix-up work: I >>> don''t think even changing the field name is necessary, but we can add a >>> comemnt to sysctl.h. >> >> Actually I think we should change the nr_cpus field in the same way, and may >> as well change their names to max_cpu_id and max_node_id. I''ll look into >> this myself. > > Why we need change the nr_cpus? It has max_cpu_id already. > Thanks that you will look into this.Yes, true. So I only changed nr_nodes field. I checked in a patch as c/s 20762. I didn''t change any Python code -- just up as far as the Python C extension. -- Keir _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel