Jiang, Yunhong
2009-Sep-24 15:31 UTC
[Xen-devel] [PATCH 2/2] Add physical CPU hotplug support in PV_ops dom0
This patch add physical CPU hotplug support to PV_ops dom0. The basic workflow is: after CPU hot add, instead of mark CPU to be present to kernel, we will try to invoke hypercall to Xen hypervisor to mark this CPU persent in hypervisor directly and then return. Xen hypervisor will send a vIRQ notification, with this notification, dom0 will online the new CPUs. We utilize processor external controller logic in pv_ops dom0 to add some hook in dom0''s ACPI code. We add /sys/device/system/xen_pcpu to present all physical CPU in the system, so that we can utilize the udev rules to online the CPU after it is added, this sepertaed directory will not cause conflict with vcpu hotplug logic. The method is echo "1" > /sys/device/system/xen_pcpuXX/online, which is quite similar to current vCPU online method, except in different sysfs entry. Currently Linux kernel has a lot of callback for logical/physical CPU O*L, we checked them and think the MCE/microcode callback is related with physical CPU O*L, while all other callback is just for virtual CPU. arch/x86/kernel/microcode_xen.c | 4 +- drivers/acpi/processor_core.c | 30 ++- drivers/xen/Makefile | 1 + drivers/xen/acpi_processor.c | 50 +++++- drivers/xen/mce.c | 75 ++++++-- drivers/xen/pcpu.c | 406 ++++++++++++++++++++++++++++++++++++++ include/linux/acpi.h | 1 + include/xen/interface/platform.h | 56 ++++++ include/xen/interface/xen.h | 1 + include/xen/pcpu.h | 8 + 10 files changed, 601 insertions(+), 31 deletions(-) Signed-off-by: Jiang, Yunhong <yunhong.jiang@intel.com> diff --git a/arch/x86/kernel/microcode_xen.c b/arch/x86/kernel/microcode_xen.c index 397cba1..fed9368 100644 --- a/arch/x86/kernel/microcode_xen.c +++ b/arch/x86/kernel/microcode_xen.c @@ -27,7 +27,7 @@ struct xen_microcode { char data[0]; }; -static int xen_microcode_update(int cpu) +int xen_microcode_update(int cpu) { int err; struct xen_platform_op op; @@ -54,7 +54,7 @@ static int xen_microcode_update(int cpu) return err; } - +EXPORT_SYMBOL_GPL(xen_microcode_update); static enum ucode_state xen_request_microcode_fw(int cpu, struct device *device) { char name[30]; diff --git a/drivers/acpi/processor_core.c b/drivers/acpi/processor_core.c index 98010d5..69d45f6 100644 --- a/drivers/acpi/processor_core.c +++ b/drivers/acpi/processor_core.c @@ -561,6 +561,17 @@ static int get_cpu_id(acpi_handle handle, int type, u32 acpi_id) } return -1; } + +int get_apic_id(acpi_handle handle, int type, u32 acpi_id) +{ + int apic_id = -1; + + apic_id = map_mat_entry(handle, type, acpi_id); + if (apic_id == -1) + apic_id = map_madt_entry(type, acpi_id); + return apic_id; +} +EXPORT_SYMBOL_GPL(get_apic_id); #endif /* -------------------------------------------------------------------------- @@ -647,7 +658,7 @@ static int acpi_processor_get_info(struct acpi_device *device) * less than the max # of CPUs. They should be ignored _iff * they are physically not present. */ - if (pr->id == -1) { + if ( !xen_pv_domain() && (pr->id == -1) ) { if (ACPI_FAILURE (acpi_processor_hotadd_init(pr->handle, &pr->id))) { return -ENODEV; @@ -735,6 +746,10 @@ static int __cpuinit acpi_processor_start(struct acpi_device *device) per_cpu(processors, pr->id) = pr; + if (processor_cntl_xen()) + processor_cntl_xen_notify(pr, + PROCESSOR_HOTPLUG, HOTPLUG_TYPE_ADD); + result = acpi_processor_add_fs(device); if (result) goto end; @@ -974,10 +989,6 @@ int acpi_processor_device_add(acpi_handle handle, struct acpi_device **device) if (!pr) return -ENODEV; - if (processor_cntl_xen()) - processor_cntl_xen_notify(pr, - PROCESSOR_HOTPLUG, HOTPLUG_TYPE_ADD); - if ((pr->id >= 0) && (pr->id < nr_cpu_ids)) { kobject_uevent(&(*device)->dev.kobj, KOBJ_ONLINE); } @@ -1023,13 +1034,12 @@ static void __ref acpi_processor_hotplug_notify(acpi_handle handle, if (pr->id >= 0 && (pr->id < nr_cpu_ids)) { kobject_uevent(&device->dev.kobj, KOBJ_OFFLINE); + + if (processor_cntl_xen()) + processor_cntl_xen_notify(pr, PROCESSOR_HOTPLUG, + HOTPLUG_TYPE_REMOVE); break; } - - if (processor_cntl_xen()) - processor_cntl_xen_notify(pr, PROCESSOR_HOTPLUG, - HOTPLUG_TYPE_REMOVE); - result = acpi_processor_start(device); if ((!result) && ((pr->id >= 0) && (pr->id < nr_cpu_ids))) { kobject_uevent(&device->dev.kobj, KOBJ_ONLINE); diff --git a/drivers/xen/Makefile b/drivers/xen/Makefile index 8e7ce48..d3c6bb3 100644 --- a/drivers/xen/Makefile +++ b/drivers/xen/Makefile @@ -6,6 +6,7 @@ CFLAGS_features.o := $(nostackp) obj-$(CONFIG_PCI) += pci.o obj-$(CONFIG_HOTPLUG_CPU) += cpu_hotplug.o +obj-$(CONFIG_HOTPLUG_CPU) += pcpu.o obj-$(CONFIG_XEN_XENCOMM) += xencomm.o obj-$(CONFIG_XEN_BALLOON) += balloon.o obj-$(CONFIG_XEN_DEV_EVTCHN) += evtchn.o diff --git a/drivers/xen/acpi_processor.c b/drivers/xen/acpi_processor.c index 6a4e8e4..a42a3d0 100644 --- a/drivers/xen/acpi_processor.c +++ b/drivers/xen/acpi_processor.c @@ -32,6 +32,7 @@ #include <linux/cpufreq.h> #include <acpi/processor.h> #include <xen/acpi.h> +#include <xen/pcpu.h> #include <asm/xen/hypercall.h> #include <asm/xen/hypervisor.h> @@ -71,6 +72,8 @@ int processor_cntl_xen_power_cache(int cpu, int cx, int processor_cntl_xen(void) { + if (!xen_initial_domain()) + return 0; return 1; } @@ -122,6 +125,7 @@ static int processor_notify_smm(void) int processor_cntl_xen_notify(struct acpi_processor *pr, int event, int type) { int ret = -EINVAL; + int apic_id; switch (event) { case PROCESSOR_PM_INIT: @@ -135,6 +139,8 @@ int processor_cntl_xen_notify(struct acpi_processor *pr, int event, int type) case PROCESSOR_HOTPLUG: if (xen_ops.hotplug) ret = xen_ops.hotplug(pr, type); + apic_id = get_apic_id(pr->handle, 1, pr->acpi_id); + xen_pcpu_hotplug(type, apic_id); break; default: printk(KERN_ERR "Unsupport processor events %d.\n", event); @@ -423,9 +429,51 @@ static int xen_tx_notifier(struct acpi_processor *pr, int action) { return -EINVAL; } + static int xen_hotplug_notifier(struct acpi_processor *pr, int event) { - return -EINVAL; + int ret = -EINVAL; + uint32_t apic_id; + unsigned long long pxm; + acpi_status status = 0; + + xen_platform_op_t op = { + .cmd = XENPF_resource_hotplug, + .interface_version = XENPF_INTERFACE_VERSION, + .u.resource.u.sadd.acpi_id = pr->acpi_id, + .u.resource.u.sadd.apic_id = pr->id, + }; + + apic_id = get_apic_id(pr->handle, 1, pr->acpi_id); + if (apic_id < 0) + { + printk(KERN_WARNING "Can''t get apic_id for acpi_id %x\n", pr->acpi_id); + return -1; + } + op.u.resource.u.sadd.apic_id = apic_id; + + status = acpi_evaluate_integer(pr->handle, "_PXM", + NULL, &pxm); + if (ACPI_FAILURE(status)) + { + printk(KERN_WARNING "can''t get pxm for acpi_id %x\n", pr->acpi_id); + return -1; + } + op.u.resource.u.sadd.pxm = pxm; + + switch (event) + { + case HOTPLUG_TYPE_ADD: + op.u.resource.sub_cmd = XEN_CPU_add; + ret = HYPERVISOR_dom0_op(&op); + break; + case HOTPLUG_TYPE_REMOVE: + op.u.resource.sub_cmd = XEN_CPU_remove; + ret = HYPERVISOR_dom0_op(&op); + break; + } + + return ret; } static int __init xen_acpi_processor_extcntl_init(void) diff --git a/drivers/xen/mce.c b/drivers/xen/mce.c index b354dc8..79616de 100644 --- a/drivers/xen/mce.c +++ b/drivers/xen/mce.c @@ -45,9 +45,14 @@ static mc_info_t *g_mi; static mcinfo_logical_cpu_t *g_physinfo; static uint32_t ncpus; +static DEFINE_SPINLOCK(pcpu_spinlock); + +#define enter_pcpu_list() spin_lock_irqsave(&pcpu_spinlock, flags); +#define leave_pcpu_list() spin_unlock_irqrestore(&pcpu_spinlock, flags); static int convert_log(struct mc_info *mi) { + unsigned long flags; struct mcinfo_common *mic = NULL; struct mcinfo_global *mc_global; struct mcinfo_bank *mc_bank; @@ -61,6 +66,7 @@ static int convert_log(struct mc_info *mi) mc_global = (struct mcinfo_global *)mic; m.mcgstatus = mc_global->mc_gstatus; m.apicid = mc_global->mc_apicid; + enter_pcpu_list(); for (i = 0; i < ncpus; i++) { if (g_physinfo[i].mc_apicid == m.apicid) { found = 1; @@ -72,6 +78,8 @@ static int convert_log(struct mc_info *mi) m.socketid = g_physinfo[i].mc_chipid; m.cpu = m.extcpu = g_physinfo[i].mc_cpunr; m.cpuvendor = (__u8)g_physinfo[i].mc_vendor; + leave_pcpu_list(); + x86_mcinfo_lookup(&mic, mi, MC_TYPE_BANK); do { if (mic == NULL || mic->size == 0) @@ -143,46 +151,77 @@ end: return IRQ_HANDLED; } -static int bind_virq_for_mce(void) +static int sync_pcpu_info(void) { - int ret; + int ret = 0; + unsigned long flags; xen_mc_t mc_op; + mcinfo_logical_cpu_t *pcpu_info = NULL; - g_mi = kmalloc(sizeof(struct mc_info), GFP_KERNEL); - - if (!g_mi) - return -ENOMEM; - + enter_pcpu_list(); /* Fetch physical CPU Numbers */ mc_op.cmd = XEN_MC_physcpuinfo; mc_op.interface_version = XEN_MCA_INTERFACE_VERSION; - set_xen_guest_handle(mc_op.u.mc_physcpuinfo.info, g_physinfo); + set_xen_guest_handle(mc_op.u.mc_physcpuinfo.info, NULL); ret = HYPERVISOR_mca(&mc_op); if (ret) { printk(KERN_ERR "MCE_DOM0_LOG: Fail to get physical CPU numbers\n"); - kfree(g_mi); - return ret; + goto failed; } /* Fetch each CPU Physical Info for later reference*/ ncpus = mc_op.u.mc_physcpuinfo.ncpus; - g_physinfo = kmalloc(sizeof(struct mcinfo_logical_cpu)*ncpus, + pcpu_info = kmalloc(sizeof(struct mcinfo_logical_cpu)*ncpus, GFP_KERNEL); - if (!g_physinfo) { - kfree(g_mi); - return -ENOMEM; - } - set_xen_guest_handle(mc_op.u.mc_physcpuinfo.info, g_physinfo); + if (!pcpu_info) + goto failed; + + set_xen_guest_handle(mc_op.u.mc_physcpuinfo.info, pcpu_info); ret = HYPERVISOR_mca(&mc_op); if (ret) { printk(KERN_ERR "MCE_DOM0_LOG: Fail to get physical CPUs info\n"); - kfree(g_mi); + kfree(pcpu_info); + goto failed; + } + /* replace the old pointer */ + if (g_physinfo) kfree(g_physinfo); + g_physinfo = pcpu_info; + leave_pcpu_list(); + return 0; +failed: + if (pcpu_info) + kfree(pcpu_info); + leave_pcpu_list(); + return ret; +} + +int xen_mce_virq_update_cpu(uint32_t xen_id) +{ + return sync_pcpu_info(); +} + +EXPORT_SYMBOL(xen_mce_virq_update_cpu); + +static int bind_virq_for_mce(void) +{ + int ret; + xen_mc_t mc_op; + + g_mi = kmalloc(sizeof(struct mc_info), GFP_KERNEL); + + if (!g_mi) + return -ENOMEM; + + /* Fetch physical CPU Numbers */ + if ((ret = sync_pcpu_info())) + { + kfree(g_mi); return ret; } ret = bind_virq_to_irqhandler(VIRQ_MCA, 0, - mce_dom_interrupt, 0, "mce", NULL); + mce_dom_interrupt, 0, "mce", NULL); if (ret < 0) { printk(KERN_ERR "MCE_DOM0_LOG: bind_virq for DOM0 failed\n"); diff --git a/drivers/xen/pcpu.c b/drivers/xen/pcpu.c new file mode 100644 index 0000000..3872c20 --- /dev/null +++ b/drivers/xen/pcpu.c @@ -0,0 +1,406 @@ +/* + * pcpu.c - management physical cpu in dom0 environment + */ +#include <linux/interrupt.h> +#include <linux/spinlock.h> +#include <asm/xen/hypervisor.h> +#include <asm/xen/hypercall.h> +#include <asm/cpu.h> +#include <xen/xenbus.h> +#include <xen/pcpu.h> +#include <xen/events.h> +#include <xen/acpi.h> + +void xen_pcpu_dpc(struct work_struct *work); +static struct sysdev_class xen_pcpu_sysdev_class = { + .name = "xen_pcpu", +}; + +DECLARE_WORK(work, xen_pcpu_dpc); +static DEFINE_SPINLOCK(pcpu_spinlock); + +/* No need for irq disable since hotplug notify is in workqueue context */ +#define enter_pcpu_list() spin_lock(&pcpu_spinlock); +#define leave_pcpu_list() spin_unlock(&pcpu_spinlock); + +struct pcpu +{ + struct list_head pcpu_list; + struct sys_device sysdev; + uint32_t xen_id; + uint32_t apic_id; +#define PCPU_LOOPED 0x10000000 + uint32_t flags; +}; + +struct xen_pcpus +{ + struct list_head list; + int possible; + int present; +}; +struct xen_pcpus xen_pcpus; + +#define XEN_PCPU_ONLINE 0x01 +#define XEN_PCPU_OFFLINE 0x02 +#define XEN_PCPU_ADD 0x04 +#define XEN_PCPU_REMOVE 0x08 + +static ssize_t show_online(struct sys_device *dev, struct sysdev_attribute *attr, + char *buf) +{ + struct pcpu *cpu = container_of(dev, struct pcpu, sysdev); + + return sprintf(buf, "%u\n", cpu->flags & XEN_PCPU_FLAGS_ONLINE); +} + +static int xen_pcpu_down(uint32_t xen_id) +{ + int ret; + xen_platform_op_t op = { + .cmd = XENPF_resource_hotplug, + .interface_version = XENPF_INTERFACE_VERSION, + .u.resource.u.cpu_ol.cpuid = xen_id, + }; + + op.u.resource.sub_cmd = XEN_CPU_offline; + ret = HYPERVISOR_dom0_op(&op); + return ret; +} + +static int xen_pcpu_up(uint32_t xen_id) +{ + int ret; + xen_platform_op_t op = { + .cmd = XENPF_resource_hotplug, + .interface_version = XENPF_INTERFACE_VERSION, + .u.resource.u.cpu_ol.cpuid = xen_id, + }; + + op.u.resource.sub_cmd = XEN_CPU_online; + ret = HYPERVISOR_dom0_op(&op); + return ret; +} + +static ssize_t __ref store_online(struct sys_device *dev, + struct sysdev_attribute *attr, + const char *buf, size_t count) +{ + struct pcpu *cpu = container_of(dev, struct pcpu, sysdev); + ssize_t ret; + + switch (buf[0]) { + case ''0'': + ret = xen_pcpu_down(cpu->xen_id); + break; + case ''1'': + ret = xen_pcpu_up(cpu->xen_id); + break; + default: + ret = -EINVAL; + } + + if (ret >= 0) + ret = count; + return ret; +} + +static SYSDEV_ATTR(online, 0644, show_online, store_online); + +static int xen_free_physical_cpu(struct pcpu * pcpu) +{ + if (!pcpu) + return 0; + + sysdev_remove_file(&pcpu->sysdev, &attr_online); + sysdev_unregister(&pcpu->sysdev); + list_del(&pcpu->pcpu_list); + kfree(pcpu); + + return 0; +} + +static struct pcpu * xen_add_physical_cpu(struct xen_physical_cpuinfo *info) +{ + struct pcpu *cpu; + int error; + + cpu = kzalloc(sizeof(struct pcpu), GFP_KERNEL); + if (!cpu) + return NULL; + + INIT_LIST_HEAD(&cpu->pcpu_list); + cpu->xen_id = info->xen_cpuid; + cpu->apic_id = info->apic_id; + cpu->flags = info->flags; + + cpu->sysdev.cls = &xen_pcpu_sysdev_class; + cpu->sysdev.id = info->xen_cpuid; + printk("get cpuid %x\n", cpu->sysdev.id); + + error = sysdev_register(&cpu->sysdev); + if (error) + { + printk("failed to register pcpu\n"); + kfree(cpu); + return NULL; + } + sysdev_create_file(&cpu->sysdev, &attr_online); + if (error) + { + printk("failed to create attr\n"); + sysdev_unregister(&cpu->sysdev); + kfree(cpu); + return NULL; + } + + return cpu; +} + +static struct xen_physical_cpuinfo * xen_fetch_cpu_info(int *num, + int *possible) +{ + int cpu_num, ret = 0; + struct xen_physical_cpuinfo *info; + xen_platform_op_t op = { + .cmd = XENPF_get_cpuinfo, + .interface_version = XENPF_INTERFACE_VERSION, + .u.pcpu_info.ncpus = 0, + }; + + set_xen_guest_handle(op.u.pcpu_info.info, NULL); + + ret = HYPERVISOR_dom0_op(&op); + if (ret) + return NULL; + cpu_num = op.u.pcpu_info.max_cpus; + + info = kzalloc(cpu_num * sizeof(struct xen_physical_cpuinfo), GFP_KERNEL); + if (!info) + return NULL; + op.u.pcpu_info.ncpus = cpu_num; + set_xen_guest_handle(op.u.pcpu_info.info, info); + + ret = HYPERVISOR_dom0_op(&op); + if (ret) + { + kfree(info); + printk(KERN_WARNING "Error for pcpu info fetch\n"); + return NULL; + } + if (possible) + *possible = op.u.pcpu_info.max_cpus; + if (num) + *num = op.u.pcpu_info.ncpus; + + return info; +} + +static int xen_pcpu_callback(uint32_t xen_id, int action) +{ + switch (action) + { + case XEN_PCPU_ONLINE: + { +#ifdef CONFIG_MICROCODE_XEN + xen_microcode_update(xen_id); +#endif +#ifdef CONFIG_MCE_XEN + xen_mce_virq_update_cpu(xen_id); +#endif + } + break; + case XEN_PCPU_OFFLINE: + { +#ifdef CONFIG_MCE_XEN + xen_mce_virq_update_cpu(xen_id); +#endif + } + break; + default: + break; + } + return 0; +} + +/* + * Sync dom0''s pcpu information with xen hypervisor''s + */ +static int xen_pcpu_sync(void) +{ + struct xen_physical_cpuinfo *info; + int cpu_num, i; + struct list_head *elem, *tmp; + struct pcpu *pcpu; + + info = xen_fetch_cpu_info(&cpu_num, NULL); + + if (!info) + return IRQ_HANDLED; + enter_pcpu_list(); + + /* Check for current cpu list */ + for (i = 0; i < cpu_num; i++) + { + int found = 0; + list_for_each_entry(pcpu, &xen_pcpus.list, pcpu_list) + { + if ( (pcpu->apic_id == info[i].apic_id) && + (pcpu->xen_id == info[i].xen_cpuid) ) + { + found = 1; + if( (info[i].flags & XEN_PCPU_FLAGS_ONLINE) && + !(pcpu->flags &XEN_PCPU_FLAGS_ONLINE)) + { + /* the pcpu is onlined now */ + pcpu->flags |= XEN_PCPU_FLAGS_ONLINE; + kobject_uevent(&pcpu->sysdev.kobj, KOBJ_ONLINE); + xen_pcpu_callback(pcpu->xen_id, XEN_PCPU_ONLINE); + } + else if ( !(info[i].flags & XEN_PCPU_FLAGS_ONLINE) && + (pcpu->flags & XEN_PCPU_FLAGS_ONLINE)) + { + /* The pcpu is offlined now */ + pcpu->flags &= ~XEN_PCPU_FLAGS_ONLINE; + kobject_uevent(&pcpu->sysdev.kobj, KOBJ_OFFLINE); + xen_pcpu_callback(pcpu->xen_id, XEN_PCPU_OFFLINE); + } + pcpu->flags |= PCPU_LOOPED; + } + } + if (!found) + { + struct pcpu *cpu; + + if (info[i].flags & XEN_PCPU_FLAGS_ONLINE) + printk(KERN_WARNING "A hotadd cpu is onlined also \n"); + cpu = xen_add_physical_cpu(&info[i]); + if (cpu == NULL) + goto failed; + list_add_tail(&cpu->pcpu_list, &xen_pcpus.list); + xen_pcpu_callback(cpu->xen_id, XEN_PCPU_ADD); + cpu->flags |= PCPU_LOOPED; + } + } + + list_for_each_safe(elem, tmp, &xen_pcpus.list) + { + pcpu = list_entry(elem, struct pcpu, pcpu_list); + if (pcpu->flags & PCPU_LOOPED) + pcpu->flags &= ~PCPU_LOOPED; + else + { + /* The pcpu does not exist any more, remove it */ + xen_free_physical_cpu(pcpu); + xen_pcpu_callback(pcpu->xen_id, XEN_PCPU_REMOVE); + } + } + +failed: + leave_pcpu_list(); + return 0; +} + +static int init_xen_pcpu_info(void) +{ + int possible, cpu_num, i; + struct xen_physical_cpuinfo *info = NULL; + struct list_head *elem, *tmp; + struct pcpu *pcpu; + + xen_pcpus.possible = xen_pcpus.present = 0; + INIT_LIST_HEAD(&xen_pcpus.list); + info = xen_fetch_cpu_info(&cpu_num, &possible); + if (!info) + return -1; + xen_pcpus.possible = possible; + xen_pcpus.present = cpu_num; + enter_pcpu_list(); + for (i = 0; i < cpu_num; i++) + { + pcpu = xen_add_physical_cpu(&info[i]); + if (!pcpu) + { + leave_pcpu_list(); + goto failed; + } + list_add_tail(&pcpu->pcpu_list, &xen_pcpus.list); + } + leave_pcpu_list(); + kfree(info); + + return 0; +failed: + list_for_each_safe(elem, tmp, &xen_pcpus.list) + { + pcpu = list_entry(elem, struct pcpu, pcpu_list); + xen_free_physical_cpu(pcpu); + } + if (info) + kfree(info); + xen_pcpus.possible = xen_pcpus.present = 0; + INIT_LIST_HEAD(&xen_pcpus.list); + return -1; +} + +static irqreturn_t xen_pcpu_interrupt(int irq, void *dev_id) +{ + schedule_work(&work); + return IRQ_HANDLED; +} + +/* + * type: 0 add, 1 remove + */ +int xen_pcpu_hotplug(int type, uint32_t apic_id) +{ + struct pcpu *pcpu; + int found = 0; + + xen_pcpu_sync(); + enter_pcpu_list(); + list_for_each_entry(pcpu, &xen_pcpus.list, pcpu_list) + { + if (pcpu->apic_id == apic_id) + { + found = 1; + break; + } + } + leave_pcpu_list(); + + if (!found && (type == HOTPLUG_TYPE_ADD)) + printk(KERN_WARNING "The cpu is not added into Xen HV?\n"); + + if (found && (type == HOTPLUG_TYPE_REMOVE) ) + printk(KERN_WARNING "The cpu still exits in Xen HV?\n"); + return 0; +} +EXPORT_SYMBOL(xen_pcpu_hotplug); + +void xen_pcpu_dpc(struct work_struct *work) +{ + xen_pcpu_sync(); +} + +static int __init xen_physical_cpu_init(void) +{ + int err; + + if (!xen_initial_domain()) + return 0; + + err = sysdev_class_register(&xen_pcpu_sysdev_class); + if (err) + { + printk(KERN_WARNING "error to register sysdev for xen_pcpu\n"); + return err; + } + + err = init_xen_pcpu_info(); + if (!err) + bind_virq_to_irqhandler(VIRQ_PCPU_STATE, 0, xen_pcpu_interrupt, 0, "pcpu", NULL); + return err; +} + +subsys_initcall(xen_physical_cpu_init); diff --git a/include/linux/acpi.h b/include/linux/acpi.h index 34321cf..e414fcc 100644 --- a/include/linux/acpi.h +++ b/include/linux/acpi.h @@ -85,6 +85,7 @@ int acpi_boot_init (void); int acpi_boot_table_init (void); int acpi_mps_check (void); int acpi_numa_init (void); +int get_apic_id(acpi_handle handle, int type, u32 acpi_id); int acpi_table_init (void); int acpi_table_parse (char *id, acpi_table_handler handler); diff --git a/include/xen/interface/platform.h b/include/xen/interface/platform.h index 6783fce..851a7c1 100644 --- a/include/xen/interface/platform.h +++ b/include/xen/interface/platform.h @@ -312,6 +312,60 @@ struct xenpf_set_processor_pminfo { typedef struct xenpf_set_processor_pminfo xenpf_set_processor_pminfo_t; DEFINE_GUEST_HANDLE_STRUCT(xenpf_set_processor_pminfo); +struct xenpf_hotadd_cpu +{ + uint32_t apic_id; + uint32_t acpi_id; + uint32_t pxm; +}; + +struct xenpf_hotremove_cpu +{ + uint32_t apic_id; +}; + +struct xenpf_cpu_ol +{ + uint32_t cpuid; +}; + +#define XENPF_resource_hotplug 55 +struct xenpf_resource_hotplug { + uint32_t sub_cmd; +#define XEN_CPU_add 1 +#define XEN_CPU_remove 2 +#define XEN_CPU_online 3 +#define XEN_CPU_offline 4 + union { + struct xenpf_hotadd_cpu sadd; + struct xenpf_hotremove_cpu sremove; + struct xenpf_cpu_ol cpu_ol; + }u; +}; + +#define XENPF_get_cpuinfo 56 +struct xen_physical_cpuinfo { + uint32_t xen_cpuid; + uint32_t apic_id; +#define XEN_PCPU_FLAGS_ONLINE 1 + uint32_t flags; + uint32_t pad; +}; +typedef struct xen_physical_cpuinfo xen_physical_cpuinfo_t; +DEFINE_GUEST_HANDLE_STRUCT(xen_physical_cpuinfo); + +struct xenpf_pcpu_info +{ + /* IN/OUT */ + uint32_t ncpus; + /* OUT */ + /* The possible CPU */ + uint32_t max_cpus; + GUEST_HANDLE(xen_physical_cpuinfo) info; +}; +typedef struct xenpf_pcpu_info xenpf_pcpu_info_t; +DEFINE_GUEST_HANDLE_STRUCT(xenpf_pcpu_info); + struct xen_platform_op { uint32_t cmd; uint32_t interface_version; /* XENPF_INTERFACE_VERSION */ @@ -327,6 +381,8 @@ struct xen_platform_op { struct xenpf_change_freq change_freq; struct xenpf_getidletime getidletime; struct xenpf_set_processor_pminfo set_pminfo; + struct xenpf_resource_hotplug resource; + struct xenpf_pcpu_info pcpu_info; uint8_t pad[128]; } u; }; diff --git a/include/xen/interface/xen.h b/include/xen/interface/xen.h index 812ffd5..c11920b 100644 --- a/include/xen/interface/xen.h +++ b/include/xen/interface/xen.h @@ -79,6 +79,7 @@ #define VIRQ_CONSOLE 2 /* (DOM0) Bytes received on emergency console. */ #define VIRQ_DOM_EXC 3 /* (DOM0) Exceptional event for some domain. */ #define VIRQ_DEBUGGER 6 /* (DOM0) A domain has paused for debugging. */ +#define VIRQ_PCPU_STATE 9 /* G. (DOM0) PCPU state changed */ /* Architecture-specific VIRQ definitions. */ #define VIRQ_ARCH_0 16 diff --git a/include/xen/pcpu.h b/include/xen/pcpu.h new file mode 100644 index 0000000..0d71149 --- /dev/null +++ b/include/xen/pcpu.h @@ -0,0 +1,8 @@ +#ifndef _XEN_PCPU_H +#define _XEN_PCPU_H + +extern int xen_pcpu_hotplug(int type, uint32_t apic_id); +extern int xen_mce_virq_update_cpu(uint32_t xen_id); +extern int xen_microcode_update(int cpu); + +#endif _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Jeremy Fitzhardinge
2009-Sep-24 17:18 UTC
[Xen-devel] Re: [PATCH 2/2] Add physical CPU hotplug support in PV_ops dom0
On 09/24/09 08:31, Jiang, Yunhong wrote:> This patch add physical CPU hotplug support to PV_ops dom0. >Is there a patch 1/2 for the kernel?> The basic workflow is: after CPU hot add, instead of mark CPU to be present to kernel, we will try to invoke hypercall to Xen hypervisor to mark this CPU persent in hypervisor directly and then return. Xen hypervisor will send a vIRQ notification, with this notification, dom0 will online the new CPUs. >I''m a little confused here. When you say "dom0 will online the new CPUs", do you mean in the sense that Xen will start using them to schedule vcpus contexts onto them? Or do you mean that it will online them within dom0 as new vcpus?> We utilize processor external controller logic in pv_ops dom0 to add some hook in dom0''s ACPI code. > > We add /sys/device/system/xen_pcpu to present all physical CPU in the system, so that we can utilize the udev rules to online the CPU after it is added, this sepertaed directory will not cause conflict with vcpu hotplug logic. The method is echo "1" > /sys/device/system/xen_pcpuXX/online, which is quite similar to current vCPU online method, except in different sysfs entry. >Yes, that makes sense. Would it eventually be worth mirroring the normal cpu heirarchy as much as possible to add an interface for stats and power management? And I guess offlining works as expected?> Currently Linux kernel has a lot of callback for logical/physical CPU O*L, we checked them and think the MCE/microcode callback is related with physical CPU O*L, while all other callback is just for virtual CPU. >I guess in general the microcode driver never cares about hotplug of vcpus. Overall comment: this looks fairly reasonable. I don''t have any good idea about how we''re going to get these core acpi changes into an upstreamable form, but it looks OK within the context of what we have. Though I guess its another of those cases where ACPI is uncomfortably straddling dom0 and xen''s responsibilities. Ideally Xen would handle pcpu hotplug events by itself and then just notify dom0 to make any policy decisions. I''d like to see this split up into a few patches though: * generic core acpi change (get_acpi_id) * core pcpu hotplug * microcode changes * mce changes You should also give the patches a going-over with checkpatch.pl. More detailed comments below. Thanks, J> arch/x86/kernel/microcode_xen.c | 4 +- > drivers/acpi/processor_core.c | 30 ++- > drivers/xen/Makefile | 1 + > drivers/xen/acpi_processor.c | 50 +++++- > drivers/xen/mce.c | 75 ++++++-- > drivers/xen/pcpu.c | 406 ++++++++++++++++++++++++++++++++++++++ > include/linux/acpi.h | 1 + > include/xen/interface/platform.h | 56 ++++++ > include/xen/interface/xen.h | 1 + > include/xen/pcpu.h | 8 + > 10 files changed, 601 insertions(+), 31 deletions(-) > > Signed-off-by: Jiang, Yunhong <yunhong.jiang@intel.com> > > > diff --git a/arch/x86/kernel/microcode_xen.c b/arch/x86/kernel/microcode_xen.c > index 397cba1..fed9368 100644 > --- a/arch/x86/kernel/microcode_xen.c > +++ b/arch/x86/kernel/microcode_xen.c > @@ -27,7 +27,7 @@ struct xen_microcode { > char data[0]; > }; > > -static int xen_microcode_update(int cpu) > +int xen_microcode_update(int cpu) > { > int err; > struct xen_platform_op op; > @@ -54,7 +54,7 @@ static int xen_microcode_update(int cpu) > > return err; > } > - > +EXPORT_SYMBOL_GPL(xen_microcode_update); >I think adding a new notifier would be better here, to avoid having an explicit dependency between pcpu hotplug and the microcode driver. If nothing else, I want to maintain the microcode driver and pcpu hotplug on different branches, and having one patch touch both is awkward to manage.> static enum ucode_state xen_request_microcode_fw(int cpu, struct device *device) > { > char name[30]; > diff --git a/drivers/acpi/processor_core.c b/drivers/acpi/processor_core.c > index 98010d5..69d45f6 100644 > --- a/drivers/acpi/processor_core.c > +++ b/drivers/acpi/processor_core.c > @@ -561,6 +561,17 @@ static int get_cpu_id(acpi_handle handle, int type, u32 acpi_id) > } > return -1; > } > + > +int get_apic_id(acpi_handle handle, int type, u32 acpi_id) > +{ > + int apic_id = -1; > + > + apic_id = map_mat_entry(handle, type, acpi_id); > + if (apic_id == -1) > + apic_id = map_madt_entry(type, acpi_id); > + return apic_id; > +} > +EXPORT_SYMBOL_GPL(get_apic_id); > #endif > > /* -------------------------------------------------------------------------- > @@ -647,7 +658,7 @@ static int acpi_processor_get_info(struct acpi_device *device) > * less than the max # of CPUs. They should be ignored _iff > * they are physically not present. > */ > - if (pr->id == -1) { > + if ( !xen_pv_domain() && (pr->id == -1) ) { > if (ACPI_FAILURE > (acpi_processor_hotadd_init(pr->handle, &pr->id))) { > return -ENODEV; > @@ -735,6 +746,10 @@ static int __cpuinit acpi_processor_start(struct acpi_device *device) > > per_cpu(processors, pr->id) = pr; > > + if (processor_cntl_xen()) > + processor_cntl_xen_notify(pr, > + PROCESSOR_HOTPLUG, HOTPLUG_TYPE_ADD); > + > result = acpi_processor_add_fs(device); > if (result) > goto end; > @@ -974,10 +989,6 @@ int acpi_processor_device_add(acpi_handle handle, struct acpi_device **device) > if (!pr) > return -ENODEV; > > - if (processor_cntl_xen()) > - processor_cntl_xen_notify(pr, > - PROCESSOR_HOTPLUG, HOTPLUG_TYPE_ADD); > - > if ((pr->id >= 0) && (pr->id < nr_cpu_ids)) { > kobject_uevent(&(*device)->dev.kobj, KOBJ_ONLINE); > } > @@ -1023,13 +1034,12 @@ static void __ref acpi_processor_hotplug_notify(acpi_handle handle, > > if (pr->id >= 0 && (pr->id < nr_cpu_ids)) { > kobject_uevent(&device->dev.kobj, KOBJ_OFFLINE); > + > + if (processor_cntl_xen()) > + processor_cntl_xen_notify(pr, PROCESSOR_HOTPLUG, > + HOTPLUG_TYPE_REMOVE); > break; > } > - > - if (processor_cntl_xen()) > - processor_cntl_xen_notify(pr, PROCESSOR_HOTPLUG, > - HOTPLUG_TYPE_REMOVE); > - > result = acpi_processor_start(device); > if ((!result) && ((pr->id >= 0) && (pr->id < nr_cpu_ids))) { > kobject_uevent(&device->dev.kobj, KOBJ_ONLINE); > diff --git a/drivers/xen/Makefile b/drivers/xen/Makefile > index 8e7ce48..d3c6bb3 100644 > --- a/drivers/xen/Makefile > +++ b/drivers/xen/Makefile > @@ -6,6 +6,7 @@ CFLAGS_features.o := $(nostackp) > > obj-$(CONFIG_PCI) += pci.o > obj-$(CONFIG_HOTPLUG_CPU) += cpu_hotplug.o > +obj-$(CONFIG_HOTPLUG_CPU) += pcpu.o >Does this have any other dependencies? CONFIG_HOTPLUG_CPU doesn''t depend on ACPI. Perhaps adding a new CONFIG_XEN_HOTPLUG_PCPU config flag with the appropriate dependencies would be better (which may or may not be user-settable; I''d prefer not unless there''s a good reason to want to explicitly disable it).> obj-$(CONFIG_XEN_XENCOMM) += xencomm.o > obj-$(CONFIG_XEN_BALLOON) += balloon.o > obj-$(CONFIG_XEN_DEV_EVTCHN) += evtchn.o > diff --git a/drivers/xen/acpi_processor.c b/drivers/xen/acpi_processor.c > index 6a4e8e4..a42a3d0 100644 > --- a/drivers/xen/acpi_processor.c > +++ b/drivers/xen/acpi_processor.c > @@ -32,6 +32,7 @@ > #include <linux/cpufreq.h> > #include <acpi/processor.h> > #include <xen/acpi.h> > +#include <xen/pcpu.h> > > #include <asm/xen/hypercall.h> > #include <asm/xen/hypervisor.h> > @@ -71,6 +72,8 @@ int processor_cntl_xen_power_cache(int cpu, int cx, > > int processor_cntl_xen(void) > { > + if (!xen_initial_domain()) > + return 0; > return 1; > } > > @@ -122,6 +125,7 @@ static int processor_notify_smm(void) > int processor_cntl_xen_notify(struct acpi_processor *pr, int event, int type) > { > int ret = -EINVAL; > + int apic_id; > > switch (event) { > case PROCESSOR_PM_INIT: > @@ -135,6 +139,8 @@ int processor_cntl_xen_notify(struct acpi_processor *pr, int event, int type) > case PROCESSOR_HOTPLUG: > if (xen_ops.hotplug) > ret = xen_ops.hotplug(pr, type); > + apic_id = get_apic_id(pr->handle, 1, pr->acpi_id); > + xen_pcpu_hotplug(type, apic_id); > break; > default: > printk(KERN_ERR "Unsupport processor events %d.\n", event); > @@ -423,9 +429,51 @@ static int xen_tx_notifier(struct acpi_processor *pr, int action) > { > return -EINVAL; > } > + > static int xen_hotplug_notifier(struct acpi_processor *pr, int event) > { > - return -EINVAL; > + int ret = -EINVAL; > + uint32_t apic_id; > + unsigned long long pxm; > + acpi_status status = 0; > + > + xen_platform_op_t op = { > + .cmd = XENPF_resource_hotplug, > + .interface_version = XENPF_INTERFACE_VERSION, > + .u.resource.u.sadd.acpi_id = pr->acpi_id, > + .u.resource.u.sadd.apic_id = pr->id, > + }; > + > + apic_id = get_apic_id(pr->handle, 1, pr->acpi_id); > + if (apic_id < 0) > + { > + printk(KERN_WARNING "Can''t get apic_id for acpi_id %x\n", pr->acpi_id); > + return -1; > + } > + op.u.resource.u.sadd.apic_id = apic_id; > + > + status = acpi_evaluate_integer(pr->handle, "_PXM", > + NULL, &pxm); > + if (ACPI_FAILURE(status)) > + { > + printk(KERN_WARNING "can''t get pxm for acpi_id %x\n", pr->acpi_id); > + return -1; > + } > + op.u.resource.u.sadd.pxm = pxm; > + > + switch (event) > + { > + case HOTPLUG_TYPE_ADD: > + op.u.resource.sub_cmd = XEN_CPU_add; > + ret = HYPERVISOR_dom0_op(&op); > + break; > + case HOTPLUG_TYPE_REMOVE: > + op.u.resource.sub_cmd = XEN_CPU_remove; > + ret = HYPERVISOR_dom0_op(&op); > + break; > + } > + > + return ret; > } > > static int __init xen_acpi_processor_extcntl_init(void) > diff --git a/drivers/xen/mce.c b/drivers/xen/mce.c > index b354dc8..79616de 100644 > --- a/drivers/xen/mce.c > +++ b/drivers/xen/mce.c > @@ -45,9 +45,14 @@ > static mc_info_t *g_mi; > static mcinfo_logical_cpu_t *g_physinfo; > static uint32_t ncpus; > +static DEFINE_SPINLOCK(pcpu_spinlock); >Split the mce changes out into a separate patch too, if possible.> + > +#define enter_pcpu_list() spin_lock_irqsave(&pcpu_spinlock, flags); > +#define leave_pcpu_list() spin_unlock_irqrestore(&pcpu_spinlock, flags); >Another definition of these? Please, no. (I found the drivers/xen/pcpu.c versions of these first; see below.)> static int convert_log(struct mc_info *mi) > { > + unsigned long flags; > struct mcinfo_common *mic = NULL; > struct mcinfo_global *mc_global; > struct mcinfo_bank *mc_bank; > @@ -61,6 +66,7 @@ static int convert_log(struct mc_info *mi) > mc_global = (struct mcinfo_global *)mic; > m.mcgstatus = mc_global->mc_gstatus; > m.apicid = mc_global->mc_apicid; > + enter_pcpu_list(); > for (i = 0; i < ncpus; i++) { > if (g_physinfo[i].mc_apicid == m.apicid) { > found = 1; > @@ -72,6 +78,8 @@ static int convert_log(struct mc_info *mi) > m.socketid = g_physinfo[i].mc_chipid; > m.cpu = m.extcpu = g_physinfo[i].mc_cpunr; > m.cpuvendor = (__u8)g_physinfo[i].mc_vendor; > + leave_pcpu_list(); > + > x86_mcinfo_lookup(&mic, mi, MC_TYPE_BANK); > do { > if (mic == NULL || mic->size == 0) > @@ -143,46 +151,77 @@ end: > return IRQ_HANDLED; > } > > -static int bind_virq_for_mce(void) > +static int sync_pcpu_info(void) > { > - int ret; > + int ret = 0; > + unsigned long flags; > xen_mc_t mc_op; > + mcinfo_logical_cpu_t *pcpu_info = NULL; > > - g_mi = kmalloc(sizeof(struct mc_info), GFP_KERNEL); > - > - if (!g_mi) > - return -ENOMEM; > - > + enter_pcpu_list(); > /* Fetch physical CPU Numbers */ > mc_op.cmd = XEN_MC_physcpuinfo; > mc_op.interface_version = XEN_MCA_INTERFACE_VERSION; > - set_xen_guest_handle(mc_op.u.mc_physcpuinfo.info, g_physinfo); > + set_xen_guest_handle(mc_op.u.mc_physcpuinfo.info, NULL); > ret = HYPERVISOR_mca(&mc_op); > if (ret) { > printk(KERN_ERR "MCE_DOM0_LOG: Fail to get physical CPU numbers\n"); > - kfree(g_mi); > - return ret; > + goto failed; > } > > /* Fetch each CPU Physical Info for later reference*/ > ncpus = mc_op.u.mc_physcpuinfo.ncpus; > - g_physinfo = kmalloc(sizeof(struct mcinfo_logical_cpu)*ncpus, > + pcpu_info = kmalloc(sizeof(struct mcinfo_logical_cpu)*ncpus, > GFP_KERNEL); > - if (!g_physinfo) { > - kfree(g_mi); > - return -ENOMEM; > - } > - set_xen_guest_handle(mc_op.u.mc_physcpuinfo.info, g_physinfo); > + if (!pcpu_info) > + goto failed; > + > + set_xen_guest_handle(mc_op.u.mc_physcpuinfo.info, pcpu_info); > ret = HYPERVISOR_mca(&mc_op); > if (ret) { > printk(KERN_ERR "MCE_DOM0_LOG: Fail to get physical CPUs info\n"); > - kfree(g_mi); > + kfree(pcpu_info); > + goto failed; > + } > + /* replace the old pointer */ > + if (g_physinfo) > kfree(g_physinfo); > + g_physinfo = pcpu_info; > + leave_pcpu_list(); > + return 0; > +failed: > + if (pcpu_info) > + kfree(pcpu_info); > + leave_pcpu_list(); > + return ret; > +} > + > +int xen_mce_virq_update_cpu(uint32_t xen_id) > +{ > + return sync_pcpu_info(); > +} > + > +EXPORT_SYMBOL(xen_mce_virq_update_cpu); > + > +static int bind_virq_for_mce(void) > +{ > + int ret; > + xen_mc_t mc_op; > + > + g_mi = kmalloc(sizeof(struct mc_info), GFP_KERNEL); > + > + if (!g_mi) > + return -ENOMEM; > + > + /* Fetch physical CPU Numbers */ > + if ((ret = sync_pcpu_info())) > + { > + kfree(g_mi); > return ret; > } > > ret = bind_virq_to_irqhandler(VIRQ_MCA, 0, > - mce_dom_interrupt, 0, "mce", NULL); > + mce_dom_interrupt, 0, "mce", NULL); >Don''t make unrelated formatting changes.> if (ret < 0) { > printk(KERN_ERR "MCE_DOM0_LOG: bind_virq for DOM0 failed\n"); > diff --git a/drivers/xen/pcpu.c b/drivers/xen/pcpu.c > new file mode 100644 > index 0000000..3872c20 > --- /dev/null > +++ b/drivers/xen/pcpu.c > @@ -0,0 +1,406 @@ > +/* > + * pcpu.c - management physical cpu in dom0 environment > + */ > +#include <linux/interrupt.h> > +#include <linux/spinlock.h> > +#include <asm/xen/hypervisor.h> > +#include <asm/xen/hypercall.h> > +#include <asm/cpu.h> > +#include <xen/xenbus.h> > +#include <xen/pcpu.h> > +#include <xen/events.h> > +#include <xen/acpi.h> > + > +void xen_pcpu_dpc(struct work_struct *work); > +static struct sysdev_class xen_pcpu_sysdev_class = { > + .name = "xen_pcpu", > +}; > + > +DECLARE_WORK(work, xen_pcpu_dpc); >static? Also "work" is a very generic name; how about something a bit more descriptive, or at least identifying.> +static DEFINE_SPINLOCK(pcpu_spinlock); > + > +/* No need for irq disable since hotplug notify is in workqueue context */ > +#define enter_pcpu_list() spin_lock(&pcpu_spinlock); > +#define leave_pcpu_list() spin_unlock(&pcpu_spinlock); >Don''t wrap spinlocks up like this. Always leave them open-coded, or at the very least put "lock"/"unlock" in the wrapper names. I didn''t realize enter/leave meant lock/unlock on my first read over the code.> + > +struct pcpu > +{ > + struct list_head pcpu_list; > + struct sys_device sysdev; > + uint32_t xen_id; > + uint32_t apic_id; > +#define PCPU_LOOPED 0x10000000 > + uint32_t flags; > +}; > + > +struct xen_pcpus > +{ > + struct list_head list; > + int possible; > + int present; > +}; > +struct xen_pcpus xen_pcpus; > + > +#define XEN_PCPU_ONLINE 0x01 > +#define XEN_PCPU_OFFLINE 0x02 > +#define XEN_PCPU_ADD 0x04 > +#define XEN_PCPU_REMOVE 0x08 > + > +static ssize_t show_online(struct sys_device *dev, struct sysdev_attribute *attr, > + char *buf) > +{ > + struct pcpu *cpu = container_of(dev, struct pcpu, sysdev); > + > + return sprintf(buf, "%u\n", cpu->flags & XEN_PCPU_FLAGS_ONLINE); >This relies on XEN_PCPU_FLAGS_ONLINE equalling 1 to guarantee that the file contains 0/1. You should be more explicit about it by adding "!= 0".> +} > + > +static int xen_pcpu_down(uint32_t xen_id) > +{ > + int ret; > + xen_platform_op_t op = { > + .cmd = XENPF_resource_hotplug, > + .interface_version = XENPF_INTERFACE_VERSION, > + .u.resource.u.cpu_ol.cpuid = xen_id, > + }; > + > + op.u.resource.sub_cmd = XEN_CPU_offline; > + ret = HYPERVISOR_dom0_op(&op); > + return ret; > +} > + > +static int xen_pcpu_up(uint32_t xen_id) > +{ > + int ret; > + xen_platform_op_t op = { > + .cmd = XENPF_resource_hotplug, > + .interface_version = XENPF_INTERFACE_VERSION, > + .u.resource.u.cpu_ol.cpuid = xen_id, > + }; > + > + op.u.resource.sub_cmd = XEN_CPU_online; > + ret = HYPERVISOR_dom0_op(&op); > + return ret; > +} > + > +static ssize_t __ref store_online(struct sys_device *dev, > + struct sysdev_attribute *attr, > + const char *buf, size_t count) > +{ > + struct pcpu *cpu = container_of(dev, struct pcpu, sysdev); > + ssize_t ret; > + > + switch (buf[0]) { > + case ''0'': > + ret = xen_pcpu_down(cpu->xen_id); > + break; > + case ''1'': > + ret = xen_pcpu_up(cpu->xen_id); > + break; > + default: > + ret = -EINVAL; > + } > + > + if (ret >= 0) > + ret = count; > + return ret; > +} > + > +static SYSDEV_ATTR(online, 0644, show_online, store_online); > + > +static int xen_free_physical_cpu(struct pcpu * pcpu) > +{ > + if (!pcpu) > + return 0; > + > + sysdev_remove_file(&pcpu->sysdev, &attr_online); > + sysdev_unregister(&pcpu->sysdev); > + list_del(&pcpu->pcpu_list); > + kfree(pcpu); > + > + return 0; > +} > + > +static struct pcpu * xen_add_physical_cpu(struct xen_physical_cpuinfo *info) > +{ > + struct pcpu *cpu; > + int error; > + > + cpu = kzalloc(sizeof(struct pcpu), GFP_KERNEL); > + if (!cpu) > + return NULL; > + > + INIT_LIST_HEAD(&cpu->pcpu_list); > + cpu->xen_id = info->xen_cpuid; > + cpu->apic_id = info->apic_id; > + cpu->flags = info->flags; > + > + cpu->sysdev.cls = &xen_pcpu_sysdev_class; > + cpu->sysdev.id = info->xen_cpuid; > + printk("get cpuid %x\n", cpu->sysdev.id); > + > + error = sysdev_register(&cpu->sysdev); > + if (error) > + { > + printk("failed to register pcpu\n"); > + kfree(cpu); > + return NULL; > + } > + sysdev_create_file(&cpu->sysdev, &attr_online); > + if (error) > + { > + printk("failed to create attr\n"); > + sysdev_unregister(&cpu->sysdev); > + kfree(cpu); > + return NULL; > + } > + > + return cpu; > +} > + > +static struct xen_physical_cpuinfo * xen_fetch_cpu_info(int *num, > + int *possible) > +{ > + int cpu_num, ret = 0; > + struct xen_physical_cpuinfo *info; > + xen_platform_op_t op = { > + .cmd = XENPF_get_cpuinfo, > + .interface_version = XENPF_INTERFACE_VERSION, > + .u.pcpu_info.ncpus = 0, > + }; > + > + set_xen_guest_handle(op.u.pcpu_info.info, NULL); > + > + ret = HYPERVISOR_dom0_op(&op); > + if (ret) > + return NULL; > + cpu_num = op.u.pcpu_info.max_cpus; > + > + info = kzalloc(cpu_num * sizeof(struct xen_physical_cpuinfo), GFP_KERNEL); > + if (!info) > + return NULL; > + op.u.pcpu_info.ncpus = cpu_num; > + set_xen_guest_handle(op.u.pcpu_info.info, info); > + > + ret = HYPERVISOR_dom0_op(&op); > + if (ret) > + { > + kfree(info); > + printk(KERN_WARNING "Error for pcpu info fetch\n"); > + return NULL; > + } > + if (possible) > + *possible = op.u.pcpu_info.max_cpus; > + if (num) > + *num = op.u.pcpu_info.ncpus; > + > + return info; > +} > + > +static int xen_pcpu_callback(uint32_t xen_id, int action) > +{ > + switch (action) > + { > + case XEN_PCPU_ONLINE: > + { > +#ifdef CONFIG_MICROCODE_XEN > + xen_microcode_update(xen_id); > +#endif > +#ifdef CONFIG_MCE_XEN > + xen_mce_virq_update_cpu(xen_id); > +#endif > + } > + break; > + case XEN_PCPU_OFFLINE: > + { > +#ifdef CONFIG_MCE_XEN > + xen_mce_virq_update_cpu(xen_id); > +#endif >Yeah, these are definitely notifier material.> + } > + break; > + default: > + break; > + } > + return 0; > +} > + > +/* > + * Sync dom0''s pcpu information with xen hypervisor''s > + */ > +static int xen_pcpu_sync(void) > +{ > + struct xen_physical_cpuinfo *info; > + int cpu_num, i; > + struct list_head *elem, *tmp; > + struct pcpu *pcpu; > + > + info = xen_fetch_cpu_info(&cpu_num, NULL); > + > + if (!info) > + return IRQ_HANDLED; > + enter_pcpu_list(); > + > + /* Check for current cpu list */ > + for (i = 0; i < cpu_num; i++) > + { > + int found = 0; > + list_for_each_entry(pcpu, &xen_pcpus.list, pcpu_list) > + { > + if ( (pcpu->apic_id == info[i].apic_id) && > + (pcpu->xen_id == info[i].xen_cpuid) ) > + { >Could you hoist some of this inner code into helper functions to reduce the nesting here?> + found = 1; > + if( (info[i].flags & XEN_PCPU_FLAGS_ONLINE) && > + !(pcpu->flags &XEN_PCPU_FLAGS_ONLINE)) > + { > + /* the pcpu is onlined now */ > + pcpu->flags |= XEN_PCPU_FLAGS_ONLINE; > + kobject_uevent(&pcpu->sysdev.kobj, KOBJ_ONLINE); > + xen_pcpu_callback(pcpu->xen_id, XEN_PCPU_ONLINE); > + } > + else if ( !(info[i].flags & XEN_PCPU_FLAGS_ONLINE) && > + (pcpu->flags & XEN_PCPU_FLAGS_ONLINE)) > + { > + /* The pcpu is offlined now */ > + pcpu->flags &= ~XEN_PCPU_FLAGS_ONLINE; > + kobject_uevent(&pcpu->sysdev.kobj, KOBJ_OFFLINE); > + xen_pcpu_callback(pcpu->xen_id, XEN_PCPU_OFFLINE); > + } > + pcpu->flags |= PCPU_LOOPED; > + } > + } > + if (!found) > + { > + struct pcpu *cpu; > + > + if (info[i].flags & XEN_PCPU_FLAGS_ONLINE) > + printk(KERN_WARNING "A hotadd cpu is onlined also \n"); >This could do with some clarification. Is this case that a newly added pcpu already appears to be online? Also it should be prefixed with some clue about what code/operation is printing the message.> + cpu = xen_add_physical_cpu(&info[i]); > + if (cpu == NULL) > + goto failed; > + list_add_tail(&cpu->pcpu_list, &xen_pcpus.list); > + xen_pcpu_callback(cpu->xen_id, XEN_PCPU_ADD); > + cpu->flags |= PCPU_LOOPED; > + } > + } > + > + list_for_each_safe(elem, tmp, &xen_pcpus.list) > + { > + pcpu = list_entry(elem, struct pcpu, pcpu_list); > + if (pcpu->flags & PCPU_LOOPED) > + pcpu->flags &= ~PCPU_LOOPED; > + else > + { > + /* The pcpu does not exist any more, remove it */ > + xen_free_physical_cpu(pcpu); > + xen_pcpu_callback(pcpu->xen_id, XEN_PCPU_REMOVE); > + } > + } > + > +failed: > + leave_pcpu_list(); > + return 0; > +} > + > +static int init_xen_pcpu_info(void) >__init? Also I prefer names of the form subsystem_action, so xen_pcpu_info_init().> +{ > + int possible, cpu_num, i; > + struct xen_physical_cpuinfo *info = NULL; > + struct list_head *elem, *tmp; > + struct pcpu *pcpu; > + > + xen_pcpus.possible = xen_pcpus.present = 0; > + INIT_LIST_HEAD(&xen_pcpus.list); > + info = xen_fetch_cpu_info(&cpu_num, &possible); > + if (!info) > + return -1; > + xen_pcpus.possible = possible; > + xen_pcpus.present = cpu_num; > + enter_pcpu_list(); > + for (i = 0; i < cpu_num; i++) > + { > + pcpu = xen_add_physical_cpu(&info[i]); > + if (!pcpu) > + { > + leave_pcpu_list(); > + goto failed; > + } > + list_add_tail(&pcpu->pcpu_list, &xen_pcpus.list); > + } > + leave_pcpu_list(); > + kfree(info); > + > + return 0; > +failed: > + list_for_each_safe(elem, tmp, &xen_pcpus.list) > + { > + pcpu = list_entry(elem, struct pcpu, pcpu_list); > + xen_free_physical_cpu(pcpu); > + } > + if (info) > + kfree(info); > + xen_pcpus.possible = xen_pcpus.present = 0; > + INIT_LIST_HEAD(&xen_pcpus.list); > + return -1; > +} > + > +static irqreturn_t xen_pcpu_interrupt(int irq, void *dev_id) > +{ > + schedule_work(&work); > + return IRQ_HANDLED; > +} > + > +/* > + * type: 0 add, 1 remove >Looks like "type" is really an enum.> + */ > +int xen_pcpu_hotplug(int type, uint32_t apic_id) > +{ > + struct pcpu *pcpu; > + int found = 0; > + > + xen_pcpu_sync(); > + enter_pcpu_list(); > + list_for_each_entry(pcpu, &xen_pcpus.list, pcpu_list) > + { > + if (pcpu->apic_id == apic_id) > + { > + found = 1; > + break; > + } > + } > + leave_pcpu_list(); > + > + if (!found && (type == HOTPLUG_TYPE_ADD)) > + printk(KERN_WARNING "The cpu is not added into Xen HV?\n"); > + > + if (found && (type == HOTPLUG_TYPE_REMOVE) ) > + printk(KERN_WARNING "The cpu still exits in Xen HV?\n"); > + return 0; > +} > +EXPORT_SYMBOL(xen_pcpu_hotplug); >_GPL?> + > +void xen_pcpu_dpc(struct work_struct *work) > +{ > + xen_pcpu_sync(); > +} > + > +static int __init xen_physical_cpu_init(void) > +{ > + int err; > + > + if (!xen_initial_domain()) > + return 0; > + > + err = sysdev_class_register(&xen_pcpu_sysdev_class); > + if (err) > + { > + printk(KERN_WARNING "error to register sysdev for xen_pcpu\n"); > + return err; > + } > + > + err = init_xen_pcpu_info(); > + if (!err) > + bind_virq_to_irqhandler(VIRQ_PCPU_STATE, 0, xen_pcpu_interrupt, 0, "pcpu", NULL); >So when does this interrupt get raised? Is the full flow: 1. ACPI raises SCI 2. dom0 catches that and gets the new pcpu event 3. dom0 notifies Xen that the pcpu exists 4. xen interrupts dom0 saying that a new pcpu exists 5. dom0 processes the list and adds it to sysfs 6. usermode onlines the cpu for Xen''s use ? Or does Xen interrupt dom0 once dom0 does the online?> + return err; > +} > + > +subsys_initcall(xen_physical_cpu_init); > diff --git a/include/linux/acpi.h b/include/linux/acpi.h > index 34321cf..e414fcc 100644 > --- a/include/linux/acpi.h > +++ b/include/linux/acpi.h > @@ -85,6 +85,7 @@ int acpi_boot_init (void); > int acpi_boot_table_init (void); > int acpi_mps_check (void); > int acpi_numa_init (void); > +int get_apic_id(acpi_handle handle, int type, u32 acpi_id); > > int acpi_table_init (void); > int acpi_table_parse (char *id, acpi_table_handler handler); > diff --git a/include/xen/interface/platform.h b/include/xen/interface/platform.h > index 6783fce..851a7c1 100644 > --- a/include/xen/interface/platform.h > +++ b/include/xen/interface/platform.h > @@ -312,6 +312,60 @@ struct xenpf_set_processor_pminfo { > typedef struct xenpf_set_processor_pminfo xenpf_set_processor_pminfo_t; > DEFINE_GUEST_HANDLE_STRUCT(xenpf_set_processor_pminfo); > > +struct xenpf_hotadd_cpu > +{ > + uint32_t apic_id; > + uint32_t acpi_id; > + uint32_t pxm; > +}; > + > +struct xenpf_hotremove_cpu > +{ > + uint32_t apic_id; > +}; > + > +struct xenpf_cpu_ol > +{ > + uint32_t cpuid; > +}; > + > +#define XENPF_resource_hotplug 55 > +struct xenpf_resource_hotplug { > + uint32_t sub_cmd; > +#define XEN_CPU_add 1 > +#define XEN_CPU_remove 2 > +#define XEN_CPU_online 3 > +#define XEN_CPU_offline 4 > + union { > + struct xenpf_hotadd_cpu sadd; > + struct xenpf_hotremove_cpu sremove; > + struct xenpf_cpu_ol cpu_ol; > + }u; > +}; > + > +#define XENPF_get_cpuinfo 56 > +struct xen_physical_cpuinfo { > + uint32_t xen_cpuid; > + uint32_t apic_id; > +#define XEN_PCPU_FLAGS_ONLINE 1 > + uint32_t flags; > + uint32_t pad; > +}; > +typedef struct xen_physical_cpuinfo xen_physical_cpuinfo_t; > +DEFINE_GUEST_HANDLE_STRUCT(xen_physical_cpuinfo); > + > +struct xenpf_pcpu_info > +{ > + /* IN/OUT */ > + uint32_t ncpus; > + /* OUT */ > + /* The possible CPU */ > + uint32_t max_cpus; > + GUEST_HANDLE(xen_physical_cpuinfo) info; > +}; > +typedef struct xenpf_pcpu_info xenpf_pcpu_info_t; > +DEFINE_GUEST_HANDLE_STRUCT(xenpf_pcpu_info); > + > struct xen_platform_op { > uint32_t cmd; > uint32_t interface_version; /* XENPF_INTERFACE_VERSION */ > @@ -327,6 +381,8 @@ struct xen_platform_op { > struct xenpf_change_freq change_freq; > struct xenpf_getidletime getidletime; > struct xenpf_set_processor_pminfo set_pminfo; > + struct xenpf_resource_hotplug resource; > + struct xenpf_pcpu_info pcpu_info; > uint8_t pad[128]; > } u; > }; > diff --git a/include/xen/interface/xen.h b/include/xen/interface/xen.h > index 812ffd5..c11920b 100644 > --- a/include/xen/interface/xen.h > +++ b/include/xen/interface/xen.h > @@ -79,6 +79,7 @@ > #define VIRQ_CONSOLE 2 /* (DOM0) Bytes received on emergency console. */ > #define VIRQ_DOM_EXC 3 /* (DOM0) Exceptional event for some domain. */ > #define VIRQ_DEBUGGER 6 /* (DOM0) A domain has paused for debugging. */ > +#define VIRQ_PCPU_STATE 9 /* G. (DOM0) PCPU state changed */ > > /* Architecture-specific VIRQ definitions. */ > #define VIRQ_ARCH_0 16 > diff --git a/include/xen/pcpu.h b/include/xen/pcpu.h > new file mode 100644 > index 0000000..0d71149 > --- /dev/null > +++ b/include/xen/pcpu.h > @@ -0,0 +1,8 @@ > +#ifndef _XEN_PCPU_H > +#define _XEN_PCPU_H > + > +extern int xen_pcpu_hotplug(int type, uint32_t apic_id); > +extern int xen_mce_virq_update_cpu(uint32_t xen_id); > +extern int xen_microcode_update(int cpu); > + > +#endif > >_______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Jiang, Yunhong
2009-Sep-25 00:27 UTC
[Xen-devel] RE: [PATCH 2/2] Add physical CPU hotplug support in PV_ops dom0
Jeremy Fitzhardinge wrote:> On 09/24/09 08:31, Jiang, Yunhong wrote: >> This patch add physical CPU hotplug support to PV_ops dom0. >> > > Is there a patch 1/2 for the kernel?Seems I named both patch to be 2/2 :( This is the only patch for kernel, while the other one is for Xen hypervisor.> >> The basic workflow is: after CPU hot add, instead of mark > CPU to be present to kernel, we will try to invoke hypercall > to Xen hypervisor to mark this CPU persent in hypervisor > directly and then return. Xen hypervisor will send a vIRQ > notification, with this notification, dom0 will online the new CPUs. >> > I''m a little confused here. When you say "dom0 will online the new > CPUs", do you mean in the sense that Xen will start using them to > schedule vcpus contexts onto them? Or do you mean that it will online > them within dom0 as new vcpus?"Online the new CPUs" means dom0 will notify Xen to online the new CPU, so that they will be used for schedule.> >> We utilize processor external controller logic in pv_ops > dom0 to add some hook in dom0''s ACPI code. >> >> We add /sys/device/system/xen_pcpu to present all physical > CPU in the system, so that we can utilize the udev rules to > online the CPU after it is added, this sepertaed directory > will not cause conflict with vcpu hotplug logic. The method is > echo "1" > /sys/device/system/xen_pcpuXX/online, which is > quite similar to current vCPU online method, except in > different sysfs entry. >> > > Yes, that makes sense. Would it eventually be worth mirroring the > normal cpu heirarchy as much as possible to add an interface for > stats and power management?As stated in the [patch 0/2] mail, the xen_pcpu is entry just for online now, but it maybe useful for other purpose. I think it should be useful for hierarchy, stats. Ke should have more idea of the power management side. Do you mean I need do these staff in this patch, we will hold for future work?> > And I guess offlining works as expected?Do you mean logical offlining or physical offlining? I think logical offlining is tested long before. For physical offlining, I implement it but I have no platform to test still. In fact , it is a tricky to physical offlining a cpu (or, more precisely speaking, socket) because that means we need offline all memory behind a socket. But it can be used for socket migration, i.e. put down the old CPU and bring up a spare CPU, in that situation, we need CPU offlining. We are still working on socket migration.> >> Currently Linux kernel has a lot of callback for > logical/physical CPU O*L, we checked them and think the > MCE/microcode callback is related with physical CPU O*L, while > all other callback is just for virtual CPU. >> > > I guess in general the microcode driver never cares about > hotplug of vcpus.Yes, also MCE. And, are you sure currently the microcode driver really not called in hotplug of vcpus? Will vcpu online not trigger the CPU_online notifier?> > Overall comment: this looks fairly reasonable. I don''t have any good > idea about how we''re going to get these core acpi changes into an > upstreamable form, but it looks OK within the context of what we have. > > Though I guess its another of those cases where ACPI is uncomfortably > straddling dom0 and xen''s responsibilities. Ideally Xen would handle > pcpu hotplug events by itself and then just notify dom0 to make any > policy decisions.Yes, ideally we should have xen handle the event, but xen have no idea of the DSDT :(> > I''d like to see this split up into a few patches though: > > * generic core acpi change (get_acpi_id) > * core pcpu hotplug > * microcode changes > * mce changesSure.> > You should also give the patches a going-over with checkpatch.pl. > > More detailed comments below. > > Thanks, > J > >> arch/x86/kernel/microcode_xen.c | 4 +- >> drivers/acpi/processor_core.c | 30 ++- >> drivers/xen/Makefile | 1 + >> drivers/xen/acpi_processor.c | 50 +++++- >> drivers/xen/mce.c | 75 ++++++-- >> drivers/xen/pcpu.c | 406 > ++++++++++++++++++++++++++++++++++++++ >> include/linux/acpi.h | 1 + >> include/xen/interface/platform.h | 56 ++++++ >> include/xen/interface/xen.h | 1 + >> include/xen/pcpu.h | 8 + >> 10 files changed, 601 insertions(+), 31 deletions(-) >> >> Signed-off-by: Jiang, Yunhong <yunhong.jiang@intel.com> >> >> >> diff --git a/arch/x86/kernel/microcode_xen.c > b/arch/x86/kernel/microcode_xen.c >> index 397cba1..fed9368 100644 >> --- a/arch/x86/kernel/microcode_xen.c >> +++ b/arch/x86/kernel/microcode_xen.c >> @@ -27,7 +27,7 @@ struct xen_microcode { >> char data[0]; >> }; >> >> -static int xen_microcode_update(int cpu) >> +int xen_microcode_update(int cpu) >> { >> int err; >> struct xen_platform_op op; >> @@ -54,7 +54,7 @@ static int xen_microcode_update(int cpu) >> >> return err; >> } >> - >> +EXPORT_SYMBOL_GPL(xen_microcode_update); >> > > I think adding a new notifier would be better here, to avoid having an > explicit dependency between pcpu hotplug and the microcode driver. > > If nothing else, I want to maintain the microcode driver and pcpu > hotplug on different branches, and having one patch touch both is > awkward to manage.Yes, that''s cleaner, I will try that way.> >> static enum ucode_state xen_request_microcode_fw(int cpu, struct >> device *device) { char name[30]; >> diff --git a/drivers/acpi/processor_core.c > b/drivers/acpi/processor_core.c >> index 98010d5..69d45f6 100644 >> --- a/drivers/acpi/processor_core.c >> +++ b/drivers/acpi/processor_core.c >> @@ -561,6 +561,17 @@ static int get_cpu_id(acpi_handle handle, int >> type, u32 acpi_id) } return -1; >> } >> + >> +int get_apic_id(acpi_handle handle, int type, u32 acpi_id) +{ >> + int apic_id = -1; >> + >> + apic_id = map_mat_entry(handle, type, acpi_id); + if >> (apic_id == -1) + apic_id = map_madt_entry(type, >> acpi_id); + return apic_id; +} >> +EXPORT_SYMBOL_GPL(get_apic_id); >> #endif >> >> /* > --------------------------------------------------------------- > ----------- >> @@ -647,7 +658,7 @@ static int > acpi_processor_get_info(struct acpi_device *device) >> * less than the max # of CPUs. They should be ignored _iff >> * they are physically not present. >> */ >> - if (pr->id == -1) { >> + if ( !xen_pv_domain() && (pr->id == -1) ) { >> if (ACPI_FAILURE >> (acpi_processor_hotadd_init(pr->handle, >> &pr->id))) { return -ENODEV; >> @@ -735,6 +746,10 @@ static int __cpuinit > acpi_processor_start(struct acpi_device *device) >> >> per_cpu(processors, pr->id) = pr; >> >> + if (processor_cntl_xen()) >> + processor_cntl_xen_notify(pr, >> + PROCESSOR_HOTPLUG, >> HOTPLUG_TYPE_ADD); + result = acpi_processor_add_fs(device); >> if (result) >> goto end; >> @@ -974,10 +989,6 @@ int > acpi_processor_device_add(acpi_handle handle, struct > acpi_device **device) >> if (!pr) >> return -ENODEV; >> >> - if (processor_cntl_xen()) >> - processor_cntl_xen_notify(pr, >> - PROCESSOR_HOTPLUG, HOTPLUG_TYPE_ADD); - >> if ((pr->id >= 0) && (pr->id < nr_cpu_ids)) { >> kobject_uevent(&(*device)->dev.kobj, KOBJ_ONLINE); >> } @@ -1023,13 +1034,12 @@ static void __ref > acpi_processor_hotplug_notify(acpi_handle handle, >> >> if (pr->id >= 0 && (pr->id < nr_cpu_ids)) { >> kobject_uevent(&device->dev.kobj, >> KOBJ_OFFLINE); + + if (processor_cntl_xen()) >> + > processor_cntl_xen_notify(pr, PROCESSOR_HOTPLUG, >> + HOTPLUG_TYPE_REMOVE); >> break; >> } >> - >> - if (processor_cntl_xen()) >> - processor_cntl_xen_notify(pr, >> PROCESSOR_HOTPLUG, - > HOTPLUG_TYPE_REMOVE); >> - >> result = acpi_processor_start(device); >> if ((!result) && ((pr->id >= 0) && (pr->id < >> nr_cpu_ids))) { >> kobject_uevent(&device->dev.kobj, KOBJ_ONLINE); >> diff --git a/drivers/xen/Makefile b/drivers/xen/Makefile >> index 8e7ce48..d3c6bb3 100644 >> --- a/drivers/xen/Makefile >> +++ b/drivers/xen/Makefile >> @@ -6,6 +6,7 @@ CFLAGS_features.o :>> $(nostackp) >> >> obj-$(CONFIG_PCI) += pci.o >> obj-$(CONFIG_HOTPLUG_CPU) += cpu_hotplug.o >> +obj-$(CONFIG_HOTPLUG_CPU) += pcpu.o >> > Does this have any other dependencies? CONFIG_HOTPLUG_CPU doesn''t > depend on ACPI. > > Perhaps adding a new CONFIG_XEN_HOTPLUG_PCPU config flag with the > appropriate dependencies would be better (which may or may not be > user-settable; I''d prefer not unless there''s a good reason to want to > explicitly disable it).Yes, thanks for suggestion.> >> obj-$(CONFIG_XEN_XENCOMM) += xencomm.o >> obj-$(CONFIG_XEN_BALLOON) += balloon.o >> obj-$(CONFIG_XEN_DEV_EVTCHN) += evtchn.o >> diff --git a/drivers/xen/acpi_processor.c >> b/drivers/xen/acpi_processor.c index 6a4e8e4..a42a3d0 100644 --- >> a/drivers/xen/acpi_processor.c +++ b/drivers/xen/acpi_processor.c >> @@ -32,6 +32,7 @@ >> #include <linux/cpufreq.h> >> #include <acpi/processor.h> >> #include <xen/acpi.h> >> +#include <xen/pcpu.h> >> >> #include <asm/xen/hypercall.h> >> #include <asm/xen/hypervisor.h> >> @@ -71,6 +72,8 @@ int processor_cntl_xen_power_cache(int cpu, int cx, >> >> int processor_cntl_xen(void) >> { >> + if (!xen_initial_domain()) >> + return 0; >> return 1; >> } >> >> @@ -122,6 +125,7 @@ static int processor_notify_smm(void) >> int processor_cntl_xen_notify(struct acpi_processor *pr, int event, >> int type) { int ret = -EINVAL; >> + int apic_id; >> >> switch (event) { >> case PROCESSOR_PM_INIT: >> @@ -135,6 +139,8 @@ int processor_cntl_xen_notify(struct > acpi_processor *pr, int event, int type) >> case PROCESSOR_HOTPLUG: >> if (xen_ops.hotplug) >> ret = xen_ops.hotplug(pr, type); >> + apic_id = get_apic_id(pr->handle, 1, pr->acpi_id); >> + xen_pcpu_hotplug(type, apic_id); >> break; >> default: >> printk(KERN_ERR "Unsupport processor events %d.\n", >> event); @@ -423,9 +429,51 @@ static int xen_tx_notifier(struct > acpi_processor *pr, int action) >> { >> return -EINVAL; >> } >> + >> static int xen_hotplug_notifier(struct acpi_processor *pr, int >> event) { - return -EINVAL; >> + int ret = -EINVAL; >> + uint32_t apic_id; >> + unsigned long long pxm; >> + acpi_status status = 0; >> + >> + xen_platform_op_t op = { >> + .cmd = XENPF_resource_hotplug, >> + .interface_version = XENPF_INTERFACE_VERSION, >> + .u.resource.u.sadd.acpi_id = pr->acpi_id, >> + .u.resource.u.sadd.apic_id = pr->id, + }; >> + >> + apic_id = get_apic_id(pr->handle, 1, pr->acpi_id); + >> if (apic_id < 0) + { >> + printk(KERN_WARNING "Can''t get apic_id for acpi_id >> %x\n", pr->acpi_id); + return -1; + } >> + op.u.resource.u.sadd.apic_id = apic_id; >> + >> + status = acpi_evaluate_integer(pr->handle, "_PXM", >> + NULL, &pxm); >> + if (ACPI_FAILURE(status)) >> + { >> + printk(KERN_WARNING "can''t get pxm for acpi_id >> %x\n", pr->acpi_id); + return -1; + } >> + op.u.resource.u.sadd.pxm = pxm; >> + >> + switch (event) >> + { >> + case HOTPLUG_TYPE_ADD: >> + op.u.resource.sub_cmd = XEN_CPU_add; >> + ret = HYPERVISOR_dom0_op(&op); >> + break; >> + case HOTPLUG_TYPE_REMOVE: >> + op.u.resource.sub_cmd = XEN_CPU_remove; >> + ret = HYPERVISOR_dom0_op(&op); >> + break; >> + } >> + >> + return ret; >> } >> >> static int __init xen_acpi_processor_extcntl_init(void) >> diff --git a/drivers/xen/mce.c b/drivers/xen/mce.c >> index b354dc8..79616de 100644 >> --- a/drivers/xen/mce.c >> +++ b/drivers/xen/mce.c >> @@ -45,9 +45,14 @@ >> static mc_info_t *g_mi; >> static mcinfo_logical_cpu_t *g_physinfo; >> static uint32_t ncpus; >> +static DEFINE_SPINLOCK(pcpu_spinlock); >> > > Split the mce changes out into a separate patch too, if possible.After using the notifier, yes, we can possible do that. In fact, I''m considering whether xen''s mce driver need hypercall to get the pcpu information, if we can export all the information through pcpu interface, I will check it.> >> + >> +#define enter_pcpu_list() spin_lock_irqsave(&pcpu_spinlock, flags); >> +#define leave_pcpu_list() > spin_unlock_irqrestore(&pcpu_spinlock, flags); >> > > Another definition of these? Please, no. (I found the > drivers/xen/pcpu.c versions of these first; see below.)It should be static.> >> static int convert_log(struct mc_info *mi) >> { >> + unsigned long flags; >> struct mcinfo_common *mic = NULL; >> struct mcinfo_global *mc_global; >> struct mcinfo_bank *mc_bank; >> @@ -61,6 +66,7 @@ static int convert_log(struct mc_info *mi) >> mc_global = (struct mcinfo_global *)mic; >> m.mcgstatus = mc_global->mc_gstatus; >> m.apicid = mc_global->mc_apicid; >> + enter_pcpu_list(); >> for (i = 0; i < ncpus; i++) { >> if (g_physinfo[i].mc_apicid == m.apicid) { >> found = 1; >> @@ -72,6 +78,8 @@ static int convert_log(struct mc_info *mi) >> m.socketid = g_physinfo[i].mc_chipid; >> m.cpu = m.extcpu = g_physinfo[i].mc_cpunr; >> m.cpuvendor = (__u8)g_physinfo[i].mc_vendor; + >> leave_pcpu_list(); + >> x86_mcinfo_lookup(&mic, mi, MC_TYPE_BANK); >> do { >> if (mic == NULL || mic->size == 0) >> @@ -143,46 +151,77 @@ end: >> return IRQ_HANDLED; >> } >> >> -static int bind_virq_for_mce(void) >> +static int sync_pcpu_info(void) >> { >> - int ret; >> + int ret = 0; >> + unsigned long flags; >> xen_mc_t mc_op; >> + mcinfo_logical_cpu_t *pcpu_info = NULL; >> >> - g_mi = kmalloc(sizeof(struct mc_info), GFP_KERNEL); - >> - if (!g_mi) >> - return -ENOMEM; >> - >> + enter_pcpu_list(); >> /* Fetch physical CPU Numbers */ >> mc_op.cmd = XEN_MC_physcpuinfo; >> mc_op.interface_version = XEN_MCA_INTERFACE_VERSION; >> - set_xen_guest_handle(mc_op.u.mc_physcpuinfo.info, >> g_physinfo); + >> set_xen_guest_handle(mc_op.u.mc_physcpuinfo.info, NULL); ret >> = HYPERVISOR_mca(&mc_op); if (ret) { >> printk(KERN_ERR "MCE_DOM0_LOG: Fail to get physical >> CPU numbers\n"); >> - kfree(g_mi); >> - return ret; >> + goto failed; >> } >> >> /* Fetch each CPU Physical Info for later reference*/ >> ncpus = mc_op.u.mc_physcpuinfo.ncpus; >> - g_physinfo = kmalloc(sizeof(struct mcinfo_logical_cpu)*ncpus, >> + pcpu_info = kmalloc(sizeof(struct mcinfo_logical_cpu)*ncpus, >> GFP_KERNEL); >> - if (!g_physinfo) { >> - kfree(g_mi); >> - return -ENOMEM; >> - } >> - set_xen_guest_handle(mc_op.u.mc_physcpuinfo.info, >> g_physinfo); + if (!pcpu_info) + goto failed; >> + >> + set_xen_guest_handle(mc_op.u.mc_physcpuinfo.info, pcpu_info); >> ret = HYPERVISOR_mca(&mc_op); >> if (ret) { >> printk(KERN_ERR "MCE_DOM0_LOG: Fail to get physical >> CPUs info\n"); - kfree(g_mi); + >> kfree(pcpu_info); + goto failed; >> + } >> + /* replace the old pointer */ >> + if (g_physinfo) >> kfree(g_physinfo); >> + g_physinfo = pcpu_info; >> + leave_pcpu_list(); >> + return 0; >> +failed: >> + if (pcpu_info) >> + kfree(pcpu_info); >> + leave_pcpu_list(); >> + return ret; >> +} >> + >> +int xen_mce_virq_update_cpu(uint32_t xen_id) >> +{ >> + return sync_pcpu_info(); >> +} >> + >> +EXPORT_SYMBOL(xen_mce_virq_update_cpu); >> + >> +static int bind_virq_for_mce(void) >> +{ >> + int ret; >> + xen_mc_t mc_op; >> + >> + g_mi = kmalloc(sizeof(struct mc_info), GFP_KERNEL); + >> + if (!g_mi) >> + return -ENOMEM; >> + >> + /* Fetch physical CPU Numbers */ >> + if ((ret = sync_pcpu_info())) >> + { >> + kfree(g_mi); >> return ret; >> } >> >> ret = bind_virq_to_irqhandler(VIRQ_MCA, 0, >> - mce_dom_interrupt, 0, "mce", NULL); >> + mce_dom_interrupt, 0, "mce", NULL); >> > > Don''t make unrelated formatting changes. > >> if (ret < 0) { >> printk(KERN_ERR "MCE_DOM0_LOG: bind_virq for DOM0 >> failed\n"); diff --git a/drivers/xen/pcpu.c b/drivers/xen/pcpu.c >> new file mode 100644 >> index 0000000..3872c20 >> --- /dev/null >> +++ b/drivers/xen/pcpu.c >> @@ -0,0 +1,406 @@ >> +/* >> + * pcpu.c - management physical cpu in dom0 environment + */ >> +#include <linux/interrupt.h> >> +#include <linux/spinlock.h> >> +#include <asm/xen/hypervisor.h> >> +#include <asm/xen/hypercall.h> >> +#include <asm/cpu.h> >> +#include <xen/xenbus.h> >> +#include <xen/pcpu.h> >> +#include <xen/events.h> >> +#include <xen/acpi.h> >> + >> +void xen_pcpu_dpc(struct work_struct *work); >> +static struct sysdev_class xen_pcpu_sysdev_class = { + .name >> = "xen_pcpu", +}; >> + >> +DECLARE_WORK(work, xen_pcpu_dpc); >> > > static? Also "work" is a very generic name; how about something a bit > more descriptive, or at least identifying.Sure.> >> +static DEFINE_SPINLOCK(pcpu_spinlock); >> + >> +/* No need for irq disable since hotplug notify is in workqueue >> context */ +#define enter_pcpu_list() spin_lock(&pcpu_spinlock); >> +#define leave_pcpu_list() spin_unlock(&pcpu_spinlock); >> > > Don''t wrap spinlocks up like this. Always leave them open-coded, or > at the very least put "lock"/"unlock" in the wrapper names. I didn''t > realize enter/leave meant lock/unlock on my first read over the code. > >> + >> +struct pcpu >> +{ >> + struct list_head pcpu_list; >> + struct sys_device sysdev; >> + uint32_t xen_id; >> + uint32_t apic_id; >> +#define PCPU_LOOPED 0x10000000 >> + uint32_t flags; >> +}; >> + >> +struct xen_pcpus >> +{ >> + struct list_head list; >> + int possible; >> + int present; >> +}; >> +struct xen_pcpus xen_pcpus; >> + >> +#define XEN_PCPU_ONLINE 0x01 >> +#define XEN_PCPU_OFFLINE 0x02 >> +#define XEN_PCPU_ADD 0x04 >> +#define XEN_PCPU_REMOVE 0x08 >> + >> +static ssize_t show_online(struct sys_device *dev, struct >> sysdev_attribute *attr, + char *buf) +{ >> + struct pcpu *cpu = container_of(dev, struct pcpu, sysdev); + >> + return sprintf(buf, "%u\n", cpu->flags & >> XEN_PCPU_FLAGS_ONLINE); >> > > This relies on XEN_PCPU_FLAGS_ONLINE equalling 1 to guarantee that the > file contains 0/1. You should be more explicit about it by > adding "!= 0".Sure.> >> +} >> + >> +static int xen_pcpu_down(uint32_t xen_id) >> +{ >> + int ret; >> + xen_platform_op_t op = { >> + .cmd = XENPF_resource_hotplug, >> + .interface_version = XENPF_INTERFACE_VERSION, >> + .u.resource.u.cpu_ol.cpuid = xen_id, + }; >> + >> + op.u.resource.sub_cmd = XEN_CPU_offline; >> + ret = HYPERVISOR_dom0_op(&op); >> + return ret; >> +} >> + >> +static int xen_pcpu_up(uint32_t xen_id) >> +{ >> + int ret; >> + xen_platform_op_t op = { >> + .cmd = XENPF_resource_hotplug, >> + .interface_version = XENPF_INTERFACE_VERSION, >> + .u.resource.u.cpu_ol.cpuid = xen_id, + }; >> + >> + op.u.resource.sub_cmd = XEN_CPU_online; >> + ret = HYPERVISOR_dom0_op(&op); >> + return ret; >> +} >> + >> +static ssize_t __ref store_online(struct sys_device *dev, >> + struct sysdev_attribute *attr, >> + const char *buf, size_t count) >> +{ >> + struct pcpu *cpu = container_of(dev, struct pcpu, sysdev); + >> ssize_t ret; + >> + switch (buf[0]) { >> + case ''0'': >> + ret = xen_pcpu_down(cpu->xen_id); >> + break; >> + case ''1'': >> + ret = xen_pcpu_up(cpu->xen_id); >> + break; >> + default: >> + ret = -EINVAL; >> + } >> + >> + if (ret >= 0) >> + ret = count; >> + return ret; >> +} >> + >> +static SYSDEV_ATTR(online, 0644, show_online, store_online); + >> +static int xen_free_physical_cpu(struct pcpu * pcpu) +{ >> + if (!pcpu) >> + return 0; >> + >> + sysdev_remove_file(&pcpu->sysdev, &attr_online); >> + sysdev_unregister(&pcpu->sysdev); >> + list_del(&pcpu->pcpu_list); >> + kfree(pcpu); >> + >> + return 0; >> +} >> + >> +static struct pcpu * xen_add_physical_cpu(struct >> xen_physical_cpuinfo *info) +{ + struct pcpu *cpu; >> + int error; >> + >> + cpu = kzalloc(sizeof(struct pcpu), GFP_KERNEL); + if >> (!cpu) + return NULL; >> + >> + INIT_LIST_HEAD(&cpu->pcpu_list); >> + cpu->xen_id = info->xen_cpuid; >> + cpu->apic_id = info->apic_id; >> + cpu->flags = info->flags; >> + >> + cpu->sysdev.cls = &xen_pcpu_sysdev_class; >> + cpu->sysdev.id = info->xen_cpuid; >> + printk("get cpuid %x\n", cpu->sysdev.id); >> + >> + error = sysdev_register(&cpu->sysdev); >> + if (error) >> + { >> + printk("failed to register pcpu\n"); + >> kfree(cpu); + return NULL; >> + } >> + sysdev_create_file(&cpu->sysdev, &attr_online); + if >> (error) + { >> + printk("failed to create attr\n"); >> + sysdev_unregister(&cpu->sysdev); >> + kfree(cpu); >> + return NULL; >> + } >> + >> + return cpu; >> +} >> + >> +static struct xen_physical_cpuinfo * xen_fetch_cpu_info(int *num, >> + int *possible) >> +{ >> + int cpu_num, ret = 0; >> + struct xen_physical_cpuinfo *info; >> + xen_platform_op_t op = { >> + .cmd = XENPF_get_cpuinfo, >> + .interface_version = XENPF_INTERFACE_VERSION, >> + .u.pcpu_info.ncpus = 0, >> + }; >> + >> + set_xen_guest_handle(op.u.pcpu_info.info, NULL); + >> + ret = HYPERVISOR_dom0_op(&op); >> + if (ret) >> + return NULL; >> + cpu_num = op.u.pcpu_info.max_cpus; >> + >> + info = kzalloc(cpu_num * sizeof(struct > xen_physical_cpuinfo), GFP_KERNEL); >> + if (!info) >> + return NULL; >> + op.u.pcpu_info.ncpus = cpu_num; >> + set_xen_guest_handle(op.u.pcpu_info.info, info); + >> + ret = HYPERVISOR_dom0_op(&op); >> + if (ret) >> + { >> + kfree(info); >> + printk(KERN_WARNING "Error for pcpu info fetch\n"); >> + return NULL; + } >> + if (possible) >> + *possible = op.u.pcpu_info.max_cpus; + if (num) >> + *num = op.u.pcpu_info.ncpus; >> + >> + return info; >> +} >> + >> +static int xen_pcpu_callback(uint32_t xen_id, int action) +{ >> + switch (action) >> + { >> + case XEN_PCPU_ONLINE: >> + { >> +#ifdef CONFIG_MICROCODE_XEN >> + xen_microcode_update(xen_id); +#endif >> +#ifdef CONFIG_MCE_XEN >> + xen_mce_virq_update_cpu(xen_id); >> +#endif + } >> + break; >> + case XEN_PCPU_OFFLINE: >> + { >> +#ifdef CONFIG_MCE_XEN >> + xen_mce_virq_update_cpu(xen_id); >> +#endif >> > > Yeah, these are definitely notifier material.Ok, I will do it that way.> >> + } >> + break; >> + default: >> + break; >> + } >> + return 0; >> +} >> + >> +/* >> + * Sync dom0''s pcpu information with xen hypervisor''s + */ >> +static int xen_pcpu_sync(void) >> +{ >> + struct xen_physical_cpuinfo *info; >> + int cpu_num, i; >> + struct list_head *elem, *tmp; >> + struct pcpu *pcpu; >> + >> + info = xen_fetch_cpu_info(&cpu_num, NULL); >> + >> + if (!info) >> + return IRQ_HANDLED; >> + enter_pcpu_list(); >> + >> + /* Check for current cpu list */ >> + for (i = 0; i < cpu_num; i++) >> + { >> + int found = 0; >> + list_for_each_entry(pcpu, &xen_pcpus.list, >> pcpu_list) + { + if ( >> (pcpu->apic_id == info[i].apic_id) && + >> (pcpu->xen_id == info[i].xen_cpuid) ) + { >> > > Could you hoist some of this inner code into helper functions to > reduce the nesting here?Yes.> >> + found = 1; >> + if( (info[i].flags & >> XEN_PCPU_FLAGS_ONLINE) && + > !(pcpu->flags &XEN_PCPU_FLAGS_ONLINE)) >> + { >> + /* the pcpu is onlined now */ >> + pcpu->flags |>> XEN_PCPU_FLAGS_ONLINE; + > kobject_uevent(&pcpu->sysdev.kobj, KOBJ_ONLINE); >> + > xen_pcpu_callback(pcpu->xen_id, XEN_PCPU_ONLINE); >> + } >> + else if ( !(info[i].flags & >> XEN_PCPU_FLAGS_ONLINE) && + >> (pcpu->flags & XEN_PCPU_FLAGS_ONLINE)) + >> { + /* The pcpu is offlined >> now */ + pcpu->flags &>> ~XEN_PCPU_FLAGS_ONLINE; + > kobject_uevent(&pcpu->sysdev.kobj, KOBJ_OFFLINE); >> + > xen_pcpu_callback(pcpu->xen_id, XEN_PCPU_OFFLINE); >> + } >> + pcpu->flags |= PCPU_LOOPED; + >> } + } >> + if (!found) >> + { >> + struct pcpu *cpu; >> + >> + if (info[i].flags & XEN_PCPU_FLAGS_ONLINE) >> + printk(KERN_WARNING "A > hotadd cpu is onlined also \n"); >> > > This could do with some clarification. Is this case that a newly > added pcpu already appears to be online?Because the vIRQ notification for pcpu hotplug from xen hypervisor is async, so maybe it happens after the pcpu is onlined by user. (i.e., the online notification and hotplug notification is merged)>Also it should be prefixed > with some > clue about what code/operation is printing the message. > >> + cpu = xen_add_physical_cpu(&info[i]); >> + if (cpu == NULL) >> + goto failed; >> + list_add_tail(&cpu->pcpu_list, >> &xen_pcpus.list); + >> xen_pcpu_callback(cpu->xen_id, XEN_PCPU_ADD); + >> cpu->flags |= PCPU_LOOPED; + } >> + } >> + >> + list_for_each_safe(elem, tmp, &xen_pcpus.list) + { >> + pcpu = list_entry(elem, struct pcpu, pcpu_list); >> + if (pcpu->flags & PCPU_LOOPED) >> + pcpu->flags &= ~PCPU_LOOPED; + >> else + { >> + /* The pcpu does not exist any more, remove >> it */ + xen_free_physical_cpu(pcpu); >> + xen_pcpu_callback(pcpu->xen_id, >> XEN_PCPU_REMOVE); + } + } >> + >> +failed: >> + leave_pcpu_list(); >> + return 0; >> +} >> + >> +static int init_xen_pcpu_info(void) >> > __init? > > Also I prefer names of the form subsystem_action, so > xen_pcpu_info_init().Sorry, what do you mean of subsystem_action? I didn''t find definition for it.> >> +{ >> + int possible, cpu_num, i; >> + struct xen_physical_cpuinfo *info = NULL; >> + struct list_head *elem, *tmp; >> + struct pcpu *pcpu; >> + >> + xen_pcpus.possible = xen_pcpus.present = 0; >> + INIT_LIST_HEAD(&xen_pcpus.list); >> + info = xen_fetch_cpu_info(&cpu_num, &possible); + if >> (!info) + return -1; >> + xen_pcpus.possible = possible; >> + xen_pcpus.present = cpu_num; >> + enter_pcpu_list(); >> + for (i = 0; i < cpu_num; i++) >> + { >> + pcpu = xen_add_physical_cpu(&info[i]); + >> if (!pcpu) + { >> + leave_pcpu_list(); >> + goto failed; >> + } >> + list_add_tail(&pcpu->pcpu_list, &xen_pcpus.list); + >> } + leave_pcpu_list(); >> + kfree(info); >> + >> + return 0; >> +failed: >> + list_for_each_safe(elem, tmp, &xen_pcpus.list) + { >> + pcpu = list_entry(elem, struct pcpu, pcpu_list); >> + xen_free_physical_cpu(pcpu); >> + } >> + if (info) >> + kfree(info); >> + xen_pcpus.possible = xen_pcpus.present = 0; >> + INIT_LIST_HEAD(&xen_pcpus.list); >> + return -1; >> +} >> + >> +static irqreturn_t xen_pcpu_interrupt(int irq, void *dev_id) +{ >> + schedule_work(&work); >> + return IRQ_HANDLED; >> +} >> + >> +/* >> + * type: 0 add, 1 remove >> > > Looks like "type" is really an enum. > >> + */ >> +int xen_pcpu_hotplug(int type, uint32_t apic_id) >> +{ >> + struct pcpu *pcpu; >> + int found = 0; >> + >> + xen_pcpu_sync(); >> + enter_pcpu_list(); >> + list_for_each_entry(pcpu, &xen_pcpus.list, pcpu_list) + >> { + if (pcpu->apic_id == apic_id) >> + { >> + found = 1; >> + break; >> + } >> + } >> + leave_pcpu_list(); >> + >> + if (!found && (type == HOTPLUG_TYPE_ADD)) >> + printk(KERN_WARNING "The cpu is not added into Xen >> HV?\n"); + + if (found && (type == HOTPLUG_TYPE_REMOVE) ) >> + printk(KERN_WARNING "The cpu still exits in Xen >> HV?\n"); + return 0; +} >> +EXPORT_SYMBOL(xen_pcpu_hotplug); >> > _GPL?Yes.> >> + >> +void xen_pcpu_dpc(struct work_struct *work) >> +{ >> + xen_pcpu_sync(); >> +} >> + >> +static int __init xen_physical_cpu_init(void) >> +{ >> + int err; >> + >> + if (!xen_initial_domain()) >> + return 0; >> + >> + err = sysdev_class_register(&xen_pcpu_sysdev_class); + >> if (err) + { >> + printk(KERN_WARNING "error to register sysdev for >> xen_pcpu\n"); + return err; + } >> + >> + err = init_xen_pcpu_info(); >> + if (!err) >> + bind_virq_to_irqhandler(VIRQ_PCPU_STATE, 0, > xen_pcpu_interrupt, 0, "pcpu", NULL); >> > > So when does this interrupt get raised? Is the full flow: > > 1. ACPI raises SCI > 2. dom0 catches that and gets the new pcpu event > 3. dom0 notifies Xen that the pcpu exists > 4. xen interrupts dom0 saying that a new pcpu exists > 5. dom0 processes the list and adds it to sysfs > 6. usermode onlines the cpu for Xen''s useAnd there is also step 7, Xen interrupt dom0 saying a new pCPU is onlined. Through step 7, dom0 will have the life cycle of the pCPU. Thanks Yunhong Jiang> > ? Or does Xen interrupt dom0 once dom0 does the online? > >> + return err; >> +} >> + >> +subsys_initcall(xen_physical_cpu_init); >> diff --git a/include/linux/acpi.h b/include/linux/acpi.h >> index 34321cf..e414fcc 100644 >> --- a/include/linux/acpi.h >> +++ b/include/linux/acpi.h >> @@ -85,6 +85,7 @@ int acpi_boot_init (void); >> int acpi_boot_table_init (void); >> int acpi_mps_check (void); >> int acpi_numa_init (void); >> +int get_apic_id(acpi_handle handle, int type, u32 acpi_id); >> >> int acpi_table_init (void); >> int acpi_table_parse (char *id, acpi_table_handler handler); >> diff --git a/include/xen/interface/platform.h > b/include/xen/interface/platform.h >> index 6783fce..851a7c1 100644 >> --- a/include/xen/interface/platform.h >> +++ b/include/xen/interface/platform.h >> @@ -312,6 +312,60 @@ struct xenpf_set_processor_pminfo { >> typedef struct xenpf_set_processor_pminfo >> xenpf_set_processor_pminfo_t; >> DEFINE_GUEST_HANDLE_STRUCT(xenpf_set_processor_pminfo); >> >> +struct xenpf_hotadd_cpu >> +{ >> + uint32_t apic_id; >> + uint32_t acpi_id; >> + uint32_t pxm; >> +}; >> + >> +struct xenpf_hotremove_cpu >> +{ >> + uint32_t apic_id; >> +}; >> + >> +struct xenpf_cpu_ol >> +{ >> + uint32_t cpuid; >> +}; >> + >> +#define XENPF_resource_hotplug 55 >> +struct xenpf_resource_hotplug { >> + uint32_t sub_cmd; >> +#define XEN_CPU_add 1 >> +#define XEN_CPU_remove 2 >> +#define XEN_CPU_online 3 >> +#define XEN_CPU_offline 4 >> + union { >> + struct xenpf_hotadd_cpu sadd; >> + struct xenpf_hotremove_cpu sremove; >> + struct xenpf_cpu_ol cpu_ol; >> + }u; >> +}; >> + >> +#define XENPF_get_cpuinfo 56 >> +struct xen_physical_cpuinfo { >> + uint32_t xen_cpuid; >> + uint32_t apic_id; >> +#define XEN_PCPU_FLAGS_ONLINE 1 >> + uint32_t flags; >> + uint32_t pad; >> +}; >> +typedef struct xen_physical_cpuinfo xen_physical_cpuinfo_t; >> +DEFINE_GUEST_HANDLE_STRUCT(xen_physical_cpuinfo); >> + >> +struct xenpf_pcpu_info >> +{ >> + /* IN/OUT */ >> + uint32_t ncpus; >> + /* OUT */ >> + /* The possible CPU */ >> + uint32_t max_cpus; >> + GUEST_HANDLE(xen_physical_cpuinfo) info; >> +}; >> +typedef struct xenpf_pcpu_info xenpf_pcpu_info_t; >> +DEFINE_GUEST_HANDLE_STRUCT(xenpf_pcpu_info); >> + >> struct xen_platform_op { >> uint32_t cmd; >> uint32_t interface_version; /* XENPF_INTERFACE_VERSION */ >> @@ -327,6 +381,8 @@ struct xen_platform_op { >> struct xenpf_change_freq change_freq; >> struct xenpf_getidletime getidletime; >> struct xenpf_set_processor_pminfo set_pminfo; >> + struct xenpf_resource_hotplug resource; >> + struct xenpf_pcpu_info pcpu_info; >> uint8_t pad[128]; } u; >> }; >> diff --git a/include/xen/interface/xen.h >> b/include/xen/interface/xen.h index 812ffd5..c11920b 100644 --- >> a/include/xen/interface/xen.h +++ b/include/xen/interface/xen.h >> @@ -79,6 +79,7 @@ >> #define VIRQ_CONSOLE 2 /* (DOM0) Bytes received on emergency >> console. */ #define VIRQ_DOM_EXC 3 /* (DOM0) Exceptional event >> for some domain. */ #define VIRQ_DEBUGGER 6 /* (DOM0) A domain >> has paused for debugging. */ +#define VIRQ_PCPU_STATE 9 /* G. >> (DOM0) PCPU state changed > */ >> >> /* Architecture-specific VIRQ definitions. */ >> #define VIRQ_ARCH_0 16 >> diff --git a/include/xen/pcpu.h b/include/xen/pcpu.h >> new file mode 100644 >> index 0000000..0d71149 >> --- /dev/null >> +++ b/include/xen/pcpu.h >> @@ -0,0 +1,8 @@ >> +#ifndef _XEN_PCPU_H >> +#define _XEN_PCPU_H >> + >> +extern int xen_pcpu_hotplug(int type, uint32_t apic_id); >> +extern int xen_mce_virq_update_cpu(uint32_t xen_id); >> +extern int xen_microcode_update(int cpu); >> + >> +#endif_______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Jeremy Fitzhardinge
2009-Sep-25 00:49 UTC
[Xen-devel] Re: [PATCH 2/2] Add physical CPU hotplug support in PV_ops dom0
On 09/24/09 17:27, Jiang, Yunhong wrote:> Do you mean logical offlining or physical offlining? I think logical offlining is tested long before. For physical offlining, I implement it but I have no platform to test still. > > In fact , it is a tricky to physical offlining a cpu (or, more precisely speaking, socket) because that means we need offline all memory behind a socket. But it can be used for socket migration, i.e. put down the old CPU and bring up a spare CPU, in that situation, we need CPU offlining. We are still working on socket migration. >I see. I guess it makes it hard when the CPUs are also the memory controllers.> Yes, also MCE. > And, are you sure currently the microcode driver really not called in hotplug of vcpus? Will vcpu online not trigger the CPU_online notifier? >I would expect vcpu hotplug will call the usual notifiers as expected, but drivers which really care about pcpus don''t have much use for those notifications.>> This could do with some clarification. Is this case that a newly >> added pcpu already appears to be online? >> > Because the vIRQ notification for pcpu hotplug from xen hypervisor is async, so maybe it happens after the pcpu is onlined by user. (i.e., the online notification and hotplug notification is merged) >I''m not sure I follow. Are you saying this is an expected race, and that this printk is just for debugging?>> __init? >> >> Also I prefer names of the form subsystem_action, so >> xen_pcpu_info_init(). >> > Sorry, what do you mean of subsystem_action? I didn''t find definition for it. >I meant "<subsystem>_<action>()".>> So when does this interrupt get raised? Is the full flow: >> >> 1. ACPI raises SCI >> 2. dom0 catches that and gets the new pcpu event >> 3. dom0 notifies Xen that the pcpu exists >> 4. xen interrupts dom0 saying that a new pcpu exists >> 5. dom0 processes the list and adds it to sysfs >> 6. usermode onlines the cpu for Xen''s use >> > And there is also step 7, Xen interrupt dom0 saying a new pCPU is onlined. Through step 7, dom0 will have the life cycle of the pCPU. >OK, I see. J _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Jiang, Yunhong
2009-Sep-25 01:32 UTC
[Xen-devel] RE: [PATCH 2/2] Add physical CPU hotplug support in PV_ops dom0
Jeremy Fitzhardinge wrote:> On 09/24/09 17:27, Jiang, Yunhong wrote: >> Do you mean logical offlining or physical offlining? I think > logical offlining is tested long before. For physical > offlining, I implement it but I have no platform to test still. >> >> In fact , it is a tricky to physical offlining a cpu (or, > more precisely speaking, socket) because that means we need > offline all memory behind a socket. But it can be used for > socket migration, i.e. put down the old CPU and bring up a > spare CPU, in that situation, we need CPU offlining. We are > still working on socket migration. >> > > I see. I guess it makes it hard when the CPUs are also the memory > controllers. > >> Yes, also MCE. >> And, are you sure currently the microcode driver really not > called in hotplug of vcpus? Will vcpu online not trigger the > CPU_online notifier? >> > > I would expect vcpu hotplug will call the usual notifiers as expected, > but drivers which really care about pcpus don''t have much use for > those notifications.Yes.>>> This could do with some clarification. Is this case that a newly >>> added pcpu already appears to be online? >>> >> Because the vIRQ notification for pcpu hotplug from xen > hypervisor is async, so maybe it happens after the pcpu is > onlined by user. (i.e., the online notification and hotplug > notification is merged) >> > > I''m not sure I follow. Are you saying this is an expected race, and > that this printk is just for debugging?Yes, exactly. --jyh _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel