Hi George,

Both Justin and I were able to reproduce a situation where, on a 2 socket system (see below), credit2 was activating only 1 runqueue.

That seemed in line with some comments in the sched_credit2.c source file, such as this one:

    /*
     * Design:
     *
     * VMs "burn" credits based on their weight; higher weight means
     * credits burn more slowly.  The highest weight vcpu burns credits at
     * a rate of 1 credit per nanosecond.  Others burn proportionally
     * more.
     *
     * vcpus are inserted into the runqueue by credit order.
     *
     * Credits are "reset" when the next vcpu in the runqueue is less than
     * or equal to zero.  At that point, everyone's credits are "clipped"
     * to a small value, and a fixed credit is added to everyone.
     *
     * The plan is for all cores that share an L2 will share the same
     * runqueue.  At the moment, there is one global runqueue for all
     * cores.
     */

However, I remembered it differently, and looking at init_pcpu() I spotted this:

    /* Figure out which runqueue to put it in */
    /* NB: cpu 0 doesn't get a STARTING callback, so we hard-code it to runqueue 0. */
    if ( cpu == 0 )
        rqi = 0;
    else
        rqi = cpu_to_socket(cpu);

which looks to me like the code for having one runqueue per socket _is_ there already! That means two things: (1) that comment above is wrong :-) but, at the same time, (2) this code right here is not working!

Justin also noticed that init_pcpu() was actually being called twice for all pcpus except #0, triggering the following warning:

    printk("%s: Strange, cpu %d already initialized!\n", __func__, cpu);

I did some investigation on the following system:

    cpu_topology           :
    cpu:    core    socket     node
      0:       0        0        0
      1:       1        0        0
      2:       2        0        0
      3:       3        0        0
      4:       0        1        1
      5:       1        1        1
      6:       2        1        1
      7:       3        1        1

So, what I expect is, for instance, cpu 1 to be on runqueue 0, and cpu 5 on runqueue 1.
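To make that expectation concrete, here is a small standalone model (not Xen code) that applies the same rule as the init_pcpu() snippet to the topology table above. The names socket_of, model_cpu_to_socket() and expected_runqueue() are made up for illustration, and the socket values are simply copied from the table; the model assumes every cpu has already been identified.

```c
#include <assert.h>
#include <stdio.h>

#define NR_CPUS 8

/* Socket of each cpu, copied from the cpu_topology table above. */
static const int socket_of[NR_CPUS] = { 0, 0, 0, 0, 1, 1, 1, 1 };

/* Hypothetical stand-in for Xen's cpu_to_socket(); it assumes the cpu
 * has already been identified, so the socket value is meaningful. */
int model_cpu_to_socket(int cpu)
{
    assert(cpu >= 0 && cpu < NR_CPUS);
    return socket_of[cpu];
}

/* Same rule as the init_pcpu() snippet: cpu 0 is hard-coded to
 * runqueue 0, every other cpu goes to the runqueue of its socket. */
int expected_runqueue(int cpu)
{
    if ( cpu == 0 )
        return 0;
    return model_cpu_to_socket(cpu);
}

/* Print the expected cpu -> runqueue assignment for the whole system. */
void print_expected_runqueues(void)
{
    for ( int cpu = 0; cpu < NR_CPUS; cpu++ )
        printf("cpu %d -> runqueue %d\n", cpu, expected_runqueue(cpu));
}
```

Calling print_expected_runqueues() from a main() lists cpus 0-3 on runqueue 0 and cpus 4-7 on runqueue 1, which is the split I would expect credit2 to produce on this box.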

The problem is here:

    static void *
    csched_alloc_pdata(const struct scheduler *ops, int cpu)
    {
        /* Check to see if the cpu is online yet */
        /* Note: cpu 0 doesn't get a STARTING callback */
        if ( cpu == 0 || cpu_to_socket(cpu) >= 0 )
            init_pcpu(ops, cpu);
        else
            printk("%s: cpu %d not online yet, deferring initializatgion\n", __func__, cpu);

        return (void *)1;
    }

In fact, this is meant to call init_pcpu() *only* on pcpu 0 (which doesn't get the STARTING notification) and on those pcpus that are already online. Unfortunately, "cpu_to_socket(cpu) >= 0" is not (any longer?) a valid way to check the latter, and in fact init_pcpu() is always called, even for pcpus that are not identified and initialized yet. That, with cpu_to_socket() constantly returning 0 at that point, means all the pcpus end up in runqueue 0, which is then the only one that gets activated.

I verified that removing the right side of the || makes things work (I enabled some debug output and added some more myself):

(XEN) csched_alloc_pdata for cpu 0 on socket 0
(XEN) Adding cpu 0 to runqueue 0
(XEN)  First cpu on runqueue, activating
...
(XEN) CPU 1 APIC 1 -> Node 0
(XEN) csched_vcpu_insert: Inserting d32767v1
(XEN) csched_alloc_pdata for cpu 1 on socket 0
(XEN) csched_alloc_pdata: cpu 1 not online yet, deferring initializatgion
(XEN) Booting processor 1/1 eip 8e000
(XEN) Initializing CPU#1
(XEN) CPU: L1 I cache 64K (64 bytes/line), D cache 64K (64 bytes/line)
(XEN) CPU: L2 Cache: 512K (64 bytes/line)
(XEN) CPU 1(4) -> Processor 0, Core 1
(XEN) CPU1: AMD Quad-Core AMD Opteron(tm) Processor 2376 stepping 02
(XEN) csched_cpu_starting on cpu 1
(XEN) Adding cpu 1 to runqueue 0
...
(XEN) CPU 5 APIC 5 -> Node 1
(XEN) microcode: CPU4 collect_cpu_info: patch_id=0x1000086
(XEN) csched_vcpu_insert: Inserting d32767v5
(XEN) csched_alloc_pdata for cpu 5 on socket 0
(XEN) csched_alloc_pdata: cpu 5 not online yet, deferring initializatgion
(XEN) Booting processor 5/5 eip 8e000
(XEN) Initializing CPU#5
(XEN) CPU: L1 I cache 64K (64 bytes/line), D cache 64K (64 bytes/line)
(XEN) CPU: L2 Cache: 512K (64 bytes/line)
(XEN) CPU 5(4) -> Processor 1, Core 1
(XEN) CPU5: AMD Quad-Core AMD Opteron(tm) Processor 2376 stepping 02
(XEN) csched_cpu_starting on cpu 5
(XEN) Adding cpu 5 to runqueue 1
...

Now the question is: to fix this, would it be preferable to do something along these lines (i.e., remove the right side of the || and, in general, make csched_alloc_pdata() a pcpu 0 only thing)? Or, perhaps, should I look into a way to properly initialize the cpu_data array, so that cpu_to_socket() actually returns something '< 0' for pcpus not yet onlined and identified?

The former is surely quicker, but I think I like the latter better (provided it's doable). What do you think?

Thanks and Regards,
Dario

--
<<This happens because I choose it to happen!>> (Raistlin Majere)
-----------------------------------------------------------------
Dario Faggioli, Ph.D, http://about.me/dario.faggioli
Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK)

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel