Eric Shelton
2013-Aug-14 19:23 UTC
What is the target CPU "topology" of an SMP HVM machine?
In doing some work to run OS X under Xen on my MacBook Air 2012 (Ivy Bridge), I ran into some issues in Darwin's probing of what it refers to as the CPU topology. Although the Darwin kernel may make certain assumptions about the platforms on which it is being run, it nevertheless appears that the various values Xen returns via CPUID and MSR are not wholly consistent. For example, when I configured the domain to have only 1 vcpu, Darwin was still able to infer that the system had multiple processors (maybe even the correct numbers of cores and processors).

Adding the following to the domain config file got things to move past a divide by zero resulting from the topology info reported by Xen:

    cpuid = [ '4,3:eax=0001xxxxxxxxxx1111xxxxxxxxxxxxxx' ]

The '1111' portion is the key part, and was merely copied from the bits natively reported by the CPU outside of Xen - a configuration providing 4 logical processors.

So, seeing as this information is being closely interrogated, what is the target virtual CPU topology? How should this be reported via CPUID and MSR? Darwin appears to be trying to determine or take into account things such as the number of packages, dies per package, cores per die & package, and threads/logical CPUs per core & package; the degrees of sharing of caches by CPUs at various cache levels; and the presence of hyperthreading.

For example, Darwin's osfmk/i386/cpu_threads.c (thankfully open source) will report the following - I believe just based on the CPUID and MSR values:

    TOPO_DBG("\nCache Topology Parameters:\n");
    TOPO_DBG("\tLLC Depth: %d\n", topoParms.LLCDepth);
    TOPO_DBG("\tCores Sharing LLC: %d\n", topoParms.nCoresSharingLLC);
    TOPO_DBG("\tThreads Sharing LLC: %d\n", topoParms.nLCPUsSharingLLC);
    TOPO_DBG("\tmax Sharing of LLC: %d\n", topoParms.maxSharingLLC);

    TOPO_DBG("\nLogical Topology Parameters:\n");
    TOPO_DBG("\tThreads per Core: %d\n", topoParms.nLThreadsPerCore);
    TOPO_DBG("\tCores per Die: %d\n", topoParms.nLCoresPerDie);
    TOPO_DBG("\tThreads per Die: %d\n", topoParms.nLThreadsPerDie);
    TOPO_DBG("\tDies per Package: %d\n", topoParms.nLDiesPerPackage);
    TOPO_DBG("\tCores per Package: %d\n", topoParms.nLCoresPerPackage);
    TOPO_DBG("\tThreads per Package: %d\n", topoParms.nLThreadsPerPackage);

    TOPO_DBG("\nPhysical Topology Parameters:\n");
    TOPO_DBG("\tThreads per Core: %d\n", topoParms.nPThreadsPerCore);
    TOPO_DBG("\tCores per Die: %d\n", topoParms.nPCoresPerDie);
    TOPO_DBG("\tThreads per Die: %d\n", topoParms.nPThreadsPerDie);
    TOPO_DBG("\tDies per Package: %d\n", topoParms.nPDiesPerPackage);
    TOPO_DBG("\tCores per Package: %d\n", topoParms.nPCoresPerPackage);
    TOPO_DBG("\tThreads per Package: %d\n", topoParms.nPThreadsPerPackage);

In addition to CPUID and MSR, does any of this get reflected in the ACPI tables? Also, is there a presumed relationship between the number of dies or cores and the number of HPET comparators to be concerned with?

Finally, included in all of this is the use of an undocumented MSR 0x35, which appears to be available from at least Nehalem onward. It reports the number of cores and processors, and reports this information slightly differently between some of the Intel architectures. Would it be OK to trap & emulate this behavior where CPUID is reporting a model that implements MSR 0x35? Would it be better to be able to override MSR values in the domain config file, much as with CPUID?

Thanks,
Eric
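To make the cpuid override above concrete, here is a minimal sketch in C of how such a 32-character bit string could be applied to a register value, together with a decode of the two CPUID leaf 4 EAX fields it touches (bits 31-26 and bits 25-14). The function names are hypothetical and this is not Xen or libxl code. It also simplifies the config syntax: only '0', '1', and 'x' are handled, and 'x' is assumed to mean "keep the incoming bit".

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Apply a 32-character bit string, most significant bit first, to a
 * register value. Simplified assumption: '0' and '1' force the bit and
 * 'x' leaves the incoming bit untouched.
 * Returns 0 on success, -1 if the string is malformed.
 */
int apply_bit_string(const char *bits, uint32_t in, uint32_t *out)
{
    assert(bits != NULL && out != NULL);

    if (strlen(bits) != 32)
        return -1;

    uint32_t value = in;
    for (int i = 0; i < 32; i++) {
        uint32_t mask = UINT32_C(1) << (31 - i);

        switch (bits[i]) {
        case '0': value &= ~mask; break;
        case '1': value |= mask;  break;
        case 'x': break;
        default:  return -1;
        }
    }
    *out = value;
    return 0;
}

/* CPUID leaf 4, EAX[25:14]: maximum number of logical processor IDs
 * sharing this cache, stored minus one. */
unsigned int leaf4_threads_sharing_cache(uint32_t eax)
{
    return ((eax >> 14) & 0xFFFu) + 1;
}

/* CPUID leaf 4, EAX[31:26]: maximum number of core IDs in the package,
 * stored minus one. */
unsigned int leaf4_cores_per_package(uint32_t eax)
{
    return ((eax >> 26) & 0x3Fu) + 1;
}
```

In the string from the config line, the leading '0001' lands in bits 31-28 and the '1111' lands in bits 17-14, which are the low four bits of the cache-sharing field.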
Andrew Cooper
2013-Aug-14 20:06 UTC
Re: What is the target CPU "topology" of an SMP HVM machine?
On 14/08/13 20:23, Eric Shelton wrote:
> In doing some work to run OS X under Xen on my MacBook Air 2012 (Ivy
> Bridge), I ran into some issues in Darwin's probing of what it refers
> to as the CPU topology. Although the Darwin kernel may make certain
> assumptions about the platforms on which it is being run, it
> nevertheless appears the various values Xen returns via CPUID and MSR
> are not wholly consistent. For example, when I configured the domain
> to have only 1 vcpu, Darwin was still able to infer that the system
> had multiple processors (maybe even the correct numbers of cores and
> processors).

The extended model name is passed through from the real CPU, so Darwin could easily be working on logic such as "I have found a CPU which claims to be this type of IvyBridge - I know it has these details".

> Adding the following to the domain config file got things to move past
> a divide by zero resulting from the topology info reported by Xen:
>
> cpuid = [ '4,3:eax=0001xxxxxxxxxx1111xxxxxxxxxxxxxx' ]
>
> The '1111' portion is the key part, and was merely copied from the
> bits natively reported by the CPU outside of Xen - a configuration
> providing 4 logical processors.
>
> So, seeing as this information is being closely interrogated, what is
> the target virtual CPU topology? How should this be reported via
> CPUID and MSR? Darwin appears to be trying to determine or take into
> account things such as a number of packages, dies per package, cores
> per die & package, and threads/logical CPUs per core & package; the
> degrees of sharing of caches by CPUs at various cache levels, and the
> presence of hyperthreading.

Xen by default advertises all VCPUs as separate sockets, to try and dissuade "clever" schedulers from doing dumb things based on false information.

> For example, Darwin's osfmk/i386/cpu_threads.c (thankfully open
> source), will report the following - I believe just based on the CPUID
> and MSR values:
>
> TOPO_DBG("\nCache Topology Parameters:\n");
> TOPO_DBG("\tLLC Depth: %d\n", topoParms.LLCDepth);
> TOPO_DBG("\tCores Sharing LLC: %d\n", topoParms.nCoresSharingLLC);
> TOPO_DBG("\tThreads Sharing LLC: %d\n", topoParms.nLCPUsSharingLLC);
> TOPO_DBG("\tmax Sharing of LLC: %d\n", topoParms.maxSharingLLC);
>
> TOPO_DBG("\nLogical Topology Parameters:\n");
> TOPO_DBG("\tThreads per Core: %d\n", topoParms.nLThreadsPerCore);
> TOPO_DBG("\tCores per Die: %d\n", topoParms.nLCoresPerDie);
> TOPO_DBG("\tThreads per Die: %d\n", topoParms.nLThreadsPerDie);
> TOPO_DBG("\tDies per Package: %d\n", topoParms.nLDiesPerPackage);
> TOPO_DBG("\tCores per Package: %d\n", topoParms.nLCoresPerPackage);
> TOPO_DBG("\tThreads per Package: %d\n", topoParms.nLThreadsPerPackage);
>
> TOPO_DBG("\nPhysical Topology Parameters:\n");
> TOPO_DBG("\tThreads per Core: %d\n", topoParms.nPThreadsPerCore);
> TOPO_DBG("\tCores per Die: %d\n", topoParms.nPCoresPerDie);
> TOPO_DBG("\tThreads per Die: %d\n", topoParms.nPThreadsPerDie);
> TOPO_DBG("\tDies per Package: %d\n", topoParms.nPDiesPerPackage);
> TOPO_DBG("\tCores per Package: %d\n", topoParms.nPCoresPerPackage);
> TOPO_DBG("\tThreads per Package: %d\n", topoParms.nPThreadsPerPackage);
>
> In addition to CPUID and MSR, does any of this get reflected in the
> ACPI tables? Also, is there a presumed relationship between the
> number of dies or cores and the number of HPET comparators to be
> concerned with?

There is a distinct lack of consistency between the various mechanisms. The ACPI tables are essentially static from build time.

> Finally, included in all of this is the use of an undocumented MSR
> 0x35, which appears to be available on at least Nehalem on up, which
> reports the number of cores and processors, and reports this
> information slightly differently between some of the Intel
> architectures. Would it be OK to trap & emulate this behavior where
> CPUID is reporting a model that implements MSR 0x35? Would it be
> better to be able to override MSR values in the domain config file,
> much as with CPUID?
>
> Thanks,
> Eric

From a quick glance at the documentation, there are several different generations of processors which use MSR 0x35 for different purposes, although it does appear to be somewhat common as performance counters of one form or another.

What reference are you using to find out that this MSR provides topology information?

It would certainly be a good project to try and make this information more consistent and easier to configure.

~Andrew
Dario Faggioli
2013-Aug-14 20:52 UTC
Re: What is the target CPU "topology" of an SMP HVM machine?
On mer, 2013-08-14 at 21:06 +0100, Andrew Cooper wrote:
> On 14/08/13 20:23, Eric Shelton wrote:
> > So, seeing as this information is being closely interrogated, what is
> > the target virtual CPU topology? How should this be reported via
> > CPUID and MSR? Darwin appears to be trying to determine or take into
> > account things such as a number of packages, dies per package, cores
> > per die & package, and threads/logical CPUs per core & package; the
> > degrees of sharing of caches by CPUs at various cache levels, and the
> > presence of hyperthreading.
>
> Xen by default advertises all VCPUs as separate sockets, to try and
> dissuade "clever" schedulers from doing dumb things based on false
> information.
>
Are we absolutely sure about this? I'm asking because Elena ran into a similar issue, i.e., seeing some vCPUs being advertised as threads/siblings (although that was a PV guest)... Am I right, Elena?

I think she also has a patch that she may be able to share soon, which correctly masks some of the CPUID stuff, as it looks like some false information was reaching the Linux scheduler! :-O

I'm not sure this is the exact same issue, though... Elena, could you tell us something more about this?

Regards,
Dario

--
<<This happens because I choose it to happen!>> (Raistlin Majere)
-----------------------------------------------------------------
Dario Faggioli, Ph.D, http://about.me/dario.faggioli
Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK)
Eric Shelton
2013-Aug-14 22:59 UTC
Re: What is the target CPU "topology" of an SMP HVM machine?
On Wed, Aug 14, 2013 at 4:06 PM, Andrew Cooper <andrew.cooper3@citrix.com> wrote:
> On 14/08/13 20:23, Eric Shelton wrote:
>> In doing some work to run OS X under Xen on my MacBook Air 2012 (Ivy
>> Bridge), I ran into some issues in Darwin's probing of what it refers
>> to as the CPU topology. Although the Darwin kernel may make certain
>> assumptions about the platforms on which it is being run, it
>> nevertheless appears the various values Xen returns via CPUID and MSR
>> are not wholly consistent. For example, when I configured the domain
>> to have only 1 vcpu, Darwin was still able to infer that the system
>> had multiple processors (maybe even the correct numbers of cores and
>> processors).
>
> The extended model name is passed through from the real CPU, so Darwin
> could easily be working on logic such as "I have found a CPU which
> claims to be this type of IvyBridge - I know it has these details"

In this case, I do not have that kind of table lookup problem. The kernel source indicates that values obtained via CPUID and MSR are being used to make these determinations. The Ivy Bridge model ID also triggers reads from MSRs 0x35, 0xCE, and 0x194, and from CPUID leaf 7. MSR 0x35 is the one that affects the topology calculations.

> Xen by default advertises all VCPUs as separate sockets, to try and
> dissuade "clever" schedulers from doing dumb things based on false
> information.

I think OS X has such a scheduler, and I imagine more so with the next rev (10.9), which has some new emphasis on power consumption.

>> In addition to CPUID and MSR, does any of this get reflected in the
>> ACPI tables? Also, is there a presumed relationship between the
>> number of dies or cores and the number of HPET comparators to be
>> concerned with?
>
> There is a distinct lack of consistency between the various mechanisms.
> The ACPI tables are essentially static from build time.

Although it may be nontrivial to do runtime generation or modification of the ACPI tables, are there any specific items in the ACPI tables that come to mind which maybe should vary according to the number of CPUs for better consistency? Having 256 CPU entries seems to be harmless.

> From a quick glance at the documentation, there are several different
> generations of processors which use MSR 0x35 for different purposes,
> although it does appear to be somewhat common as performance counters of
> one form or another.
>
> What reference are you using to find out that this msr provides topology
> information?

From review of the kernel source (osfmk/i386/cpuid.c), which indicates for MSR 0x35 that on Westmere bits 19-16 are the core count and bits 15-0 are the thread count; on Nehalem, Sandy Bridge, and Ivy Bridge bits 31-16 are the core count and bits 15-0 are the thread count. There is no indication as to what the other bits are (e.g., the performance counters you mentioned). My Core i5 returns 0x00020004 in the lower 32 bits. It sounds like in an HVM guest both of these bit ranges should be equal to the number of vcpus.

> It would certainly be a good project to try and make this information
> more consistent and easier to configure.

I'm looking at least to have a few more of the CPUID and MSR values line up with the number of vcpus and their simple topology, and see where I end up. http://software.intel.com/en-us/articles/intel-64-architecture-processor-topology-enumeration/ appears to give a good amount of information as to what values are involved.
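The MSR 0x35 layout described above can be sketched as follows. This is only an illustration of the bit fields as reported from osfmk/i386/cpuid.c in this message, not Darwin or Xen code; the type and function names are hypothetical. The `msr35_for_vcpus` helper reflects the guess made above that an HVM guest could be shown both fields equal to the vcpu count.

```c
#include <assert.h>
#include <stdint.h>

/* Which field layout to use for the core count (names are hypothetical). */
enum msr35_layout {
    MSR35_LAYOUT_WESTMERE, /* cores in bits 19-16, threads in bits 15-0 */
    MSR35_LAYOUT_WIDE      /* Nehalem, Sandy Bridge, Ivy Bridge:
                              cores in bits 31-16, threads in bits 15-0 */
};

struct core_thread_count {
    unsigned int cores;
    unsigned int threads;
};

/* Split an MSR 0x35 value into core and thread counts. */
struct core_thread_count msr35_decode(uint64_t msr, enum msr35_layout layout)
{
    struct core_thread_count c;

    c.threads = (unsigned int)(msr & 0xFFFFu);
    if (layout == MSR35_LAYOUT_WESTMERE)
        c.cores = (unsigned int)((msr >> 16) & 0xFu);
    else
        c.cores = (unsigned int)((msr >> 16) & 0xFFFFu);
    return c;
}

/* Build a value in which both fields equal the number of vcpus, i.e.
 * every vcpu is presented as its own single-threaded core. */
uint64_t msr35_for_vcpus(unsigned int nr_vcpus, enum msr35_layout layout)
{
    unsigned int max_cores =
        (layout == MSR35_LAYOUT_WESTMERE) ? 0xFu : 0xFFFFu;

    assert(nr_vcpus > 0 && nr_vcpus <= max_cores);
    return ((uint64_t)nr_vcpus << 16) | nr_vcpus;
}
```

With the wide layout, the 0x00020004 value quoted above decodes to 2 cores and 4 threads.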
Elena Ufimtseva
2013-Aug-15 02:18 UTC
Re: What is the target CPU "topology" of an SMP HVM machine?
Hi,

Well, I see that's for an HVM guest, right? On a PV guest I ran into the following when assigning vcpus to the virtual NUMA nodes:

    [ 0.004000] ------------[ cut here ]------------
    [ 0.004000] WARNING: CPU: 1 PID: 0 at arch/x86/kernel/smpboot.c:324 topology_sane.isra.7+0x67/0x79()
    [ 0.004000] sched: CPU #1's smt-sibling CPU #0 is not on the same node! [node: 1 != 0]. Ignoring dependency.
    [ 0.004000] Modules linked in:
    [ 0.004000] CPU: 1 PID: 0 Comm: swapper/1 Not tainted 3.11.0-rc4+ #36
    [ 0.004000] 0000000000000000 0000000000000009 ffffffff813b9ce7 ffff88001f1b1e60
    [ 0.004000] ffffffff810462f0 ffff88001f1b1e70 ffffffff8102f5bb ffff880000000000
    [ 0.004000] 0000000000000001 ffff88003f613880 0000000000000000 000000000000b0c0
    [ 0.004000] Call Trace:
    [ 0.004000] [<ffffffff813b9ce7>] ? dump_stack+0x41/0x51
    [ 0.004000] [<ffffffff810462f0>] ? warn_slowpath_common+0x78/0x90
    [ 0.004000] [<ffffffff8102f5bb>] ? topology_sane.isra.7+0x67/0x79
    [ 0.004000] [<ffffffff810463a0>] ? warn_slowpath_fmt+0x45/0x4a
    [ 0.004000] [<ffffffff8102f5bb>] ? topology_sane.isra.7+0x67/0x79
    [ 0.004000] [<ffffffff8102f80d>] ? set_cpu_sibling_map+0x1c9/0x3fc
    [ 0.004000] [<ffffffff8100b9e1>] ? cpu_bringup+0x47/0x86
    [ 0.004000] [<ffffffff8100ba41>] ? cpu_bringup_and_idle+0x7/0x12
    [ 0.004000] ---[ end trace 62b6815bad5814b4 ]---

I just added, in the cpuid trap, masking of the SMT width for the initial APIC ID in leaf 0x1, so the topology will be physical package = logical processor. I am not sure whether in this case the SMT cache topology should be masked out as well. I also masked out X86_FEATURE_HT on leaf 0x1.

Elena

On Wed, Aug 14, 2013 at 4:52 PM, Dario Faggioli <dario.faggioli@citrix.com> wrote:
> On mer, 2013-08-14 at 21:06 +0100, Andrew Cooper wrote:
> > On 14/08/13 20:23, Eric Shelton wrote:
> > > So, seeing as this information is being closely interrogated, what is
> > > the target virtual CPU topology? How should this be reported via
> > > CPUID and MSR? Darwin appears to be trying to determine or take into
> > > account things such as a number of packages, dies per package, cores
> > > per die & package, and threads/logical CPUs per core & package; the
> > > degrees of sharing of caches by CPUs at various cache levels, and the
> > > presence of hyperthreading.
> >
> > Xen by default advertises all VCPUs as separate sockets, to try and
> > dissuade "clever" schedulers from doing dumb things based on false
> > information.
> >
> Are we absolutely sure about this? I'm asking because Elena ran into a
> similar issue, i.e., seeing some vCPUs being advertised as
> threads/siblings (although that was a pv-guest)... Am I right Elena?
>
> I think she also has a patch that she may be able to share soon, which
> does right the masking of some of the CPUID stuff, as it looks like some
> false information was reaching out to the Linux Scheduler! :-O
>
> I'm not sure this is the exact same issue, though.... Elena, could you
> tell something more about this?
>
> Regards,
> Dario

--
Elena
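As a rough illustration of the kind of leaf 0x1 masking described in this message, the sketch below clears the HTT feature flag (CPUID leaf 1, EDX bit 28) and sets the logical processor count field (EBX bits 23-16) to 1. This is not the patch being discussed: the function name is hypothetical, and writing 1 into the count field is an assumption about what "physical package = logical processor" would mean in register terms.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* CPUID leaf 1, EDX bit 28: HTT (the flag Linux calls X86_FEATURE_HT). */
#define CPUID1_EDX_HTT            (UINT32_C(1) << 28)
/* CPUID leaf 1, EBX[23:16]: maximum number of addressable logical
 * processor IDs in this physical package. */
#define CPUID1_EBX_LOGICAL_SHIFT  16
#define CPUID1_EBX_LOGICAL_MASK   (UINT32_C(0xFF) << CPUID1_EBX_LOGICAL_SHIFT)

/*
 * Adjust the leaf 1 registers a guest would see so that each logical
 * processor looks like a package of its own: no HTT flag and a logical
 * processor count of 1. All other bits are left untouched.
 */
void hide_smt_in_leaf1(uint32_t *ebx, uint32_t *edx)
{
    assert(ebx != NULL && edx != NULL);

    *edx &= ~CPUID1_EDX_HTT;
    *ebx = (*ebx & ~CPUID1_EBX_LOGICAL_MASK) |
           (UINT32_C(1) << CPUID1_EBX_LOGICAL_SHIFT);
}
```

Whether the cache-sharing fields in leaf 4 need the same treatment is the open question raised above; this sketch does not touch them.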
Matt Wilson
2013-Aug-21 21:30 UTC
Re: What is the target CPU "topology" of an SMP HVM machine?
On Wed, Aug 14, 2013 at 10:52:54PM +0200, Dario Faggioli wrote:
> On mer, 2013-08-14 at 21:06 +0100, Andrew Cooper wrote:
> > On 14/08/13 20:23, Eric Shelton wrote:
> > > So, seeing as this information is being closely interrogated, what is
> > > the target virtual CPU topology? How should this be reported via
> > > CPUID and MSR? Darwin appears to be trying to determine or take into
> > > account things such as a number of packages, dies per package, cores
> > > per die & package, and threads/logical CPUs per core & package; the
> > > degrees of sharing of caches by CPUs at various cache levels, and the
> > > presence of hyperthreading.
> >
> > Xen by default advertises all VCPUs as separate sockets, to try and
> > dissuade "clever" schedulers from doing dumb things based on false
> > information.
> >
> Are we absolutely sure about this? I'm asking because Elena run into a
> similar issue, i.e., seeing some vCPUs being advertised as
> threads/siblings (although that was a pv-guest)... Am I right Elena?

Yes, this is the current behavior when we set up HVM guests. My guest NUMA patch for HVM guests adds the necessary features to adjust the initial local APIC ID so that CPU topology enumeration works. We should figure out how we want topology to be presented to PV guests.

--msw

> I think she also has a patch that she may be able to share soon, which
> does right the masking of some of the CPUID stuff, as it looks like some
> false information was reaching out to the Linux Scheduler! :-O
>
> I'm not sure this is the exact same issue, though.... Elena, could you
> tell something more about this?
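For readers unfamiliar with why the initial local APIC ID matters for topology enumeration: a guest typically splits the APIC ID into thread, core, and package fields using bit widths it derives from CPUID. The sketch below shows that packing and unpacking in isolation. It is not taken from the patch mentioned above; the struct and function names are hypothetical, and the field widths are passed in directly instead of being read from CPUID.

```c
#include <assert.h>
#include <stdint.h>

/* Bit widths of the thread (SMT) and core fields within an APIC ID. */
struct apic_topology {
    unsigned int smt_bits;
    unsigned int core_bits;
};

/* Position of one logical processor in the topology. */
struct cpu_location {
    uint32_t package;
    uint32_t core;
    uint32_t thread;
};

/* Pack package/core/thread into an APIC ID: thread in the lowest bits,
 * core above it, package in the remaining upper bits. */
uint32_t apic_id_compose(struct apic_topology t, struct cpu_location loc)
{
    assert(t.smt_bits + t.core_bits < 32);
    return (loc.package << (t.smt_bits + t.core_bits)) |
           (loc.core << t.smt_bits) |
           loc.thread;
}

/* Reverse of apic_id_compose(): split an APIC ID into its fields. */
struct cpu_location apic_id_decompose(struct apic_topology t, uint32_t apic_id)
{
    struct cpu_location loc;

    assert(t.smt_bits + t.core_bits < 32);
    loc.thread  = apic_id & ((UINT32_C(1) << t.smt_bits) - 1);
    loc.core    = (apic_id >> t.smt_bits) &
                  ((UINT32_C(1) << t.core_bits) - 1);
    loc.package = apic_id >> (t.smt_bits + t.core_bits);
    return loc;
}
```

With both widths set to 0, every APIC ID decodes to its own package with core 0 and thread 0, which corresponds to the "all VCPUs as separate sockets" presentation discussed earlier in the thread.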