Jeremy Fitzhardinge
2009-May-12 23:25 UTC
[Xen-devel] [GIT PULL] Xen APIC hooks (with io_apic_ops)
Hi Ingo,

Here's a revised set of the Xen APIC changes which adds io_apic_ops to
allow Xen to intercept IO APIC access operations.

Thanks,
    J

The following changes since commit ce791368bb4a53d05e78e1588bac0aacde8db84c:
  Jeremy Fitzhardinge (1):
        xen/i386: make sure initial VGA/ISA mappings are not overridden

are available in the git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/jeremy/xen.git for-ingo/xen/dom0/apic-ops

Gerd Hoffmann (2):
      xen: set pirq name to something useful.
      xen: fix legacy irq setup, make ioapic-less machines work.

Ian Campbell (1):
      xen: pre-initialize legacy irqs early

Jeremy Fitzhardinge (14):
      xen/dom0: handle acpi lapic parsing in Xen dom0
      x86: add io_apic_ops to allow interception
      xen: implement io_apic_ops
      xen: create dummy ioapic mapping
      xen: implement pirq type event channels
      x86/io_apic: add get_nr_irqs_gsi()
      xen/apic: identity map gsi->irqs
      xen: direct irq registration to pirq event channels
      xen: bind pirq to vector and event channel
      xen: don't setup acpi interrupt unless there is one
      xen: use acpi_get_override_irq() to get triggering for legacy irqs
      xen: initialize irq 0 too
      xen: dynamically allocate irq & event structures
      xen: disable MSI

 arch/x86/include/asm/io_apic.h |   10 ++
 arch/x86/include/asm/xen/pci.h |   13 ++
 arch/x86/kernel/acpi/boot.c    |   18 +++-
 arch/x86/kernel/apic/io_apic.c |   55 ++++++++-
 arch/x86/xen/Kconfig           |   11 ++
 arch/x86/xen/Makefile          |    3 +-
 arch/x86/xen/apic.c            |   69 ++++++++++
 arch/x86/xen/enlighten.c       |    2 +
 arch/x86/xen/mmu.c             |   10 ++
 arch/x86/xen/pci.c             |   86 +++++++++++++
 arch/x86/xen/xen-ops.h         |    6 +
 drivers/pci/pci.h              |    2 -
 drivers/xen/events.c           |  273 ++++++++++++++++++++++++++++++++++++++--
 include/linux/pci.h            |    6 +
 include/xen/events.h           |   19 +++
 15 files changed, 568 insertions(+), 15 deletions(-)
 create mode 100644 arch/x86/include/asm/xen/pci.h
 create mode 100644 arch/x86/xen/apic.c
 create mode 100644 arch/x86/xen/pci.c

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel
Jeremy Fitzhardinge
2009-May-12 23:25 UTC
[Xen-devel] [PATCH 01/17] xen/dom0: handle acpi lapic parsing in Xen dom0
When running in Xen dom0, we still want to parse the ACPI tables to
find out about local and IO apics, but we don't want to actually use
the lapics.  Put a couple of tests for Xen to prevent lapics from
being mapped or accessed.  This is very Xen-specific behaviour, so
there didn't seem to be any point in adding more indirection.

[ Impact: ignore local apics, which are not usable under Xen ]

Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Reviewed-by: "H. Peter Anvin" <hpa@zytor.com>
---
 arch/x86/kernel/acpi/boot.c |   10 ++++++++++
 1 files changed, 10 insertions(+), 0 deletions(-)

diff --git a/arch/x86/kernel/acpi/boot.c b/arch/x86/kernel/acpi/boot.c
index 723989d..4147e0c 100644
--- a/arch/x86/kernel/acpi/boot.c
+++ b/arch/x86/kernel/acpi/boot.c
@@ -41,6 +41,8 @@
 #include <asm/mpspec.h>
 #include <asm/smp.h>
 
+#include <asm/xen/hypervisor.h>
+
 static int __initdata acpi_force = 0;
 u32 acpi_rsdt_forced;
 #ifdef CONFIG_ACPI
@@ -218,6 +220,10 @@ static void __cpuinit acpi_register_lapic(int id, u8 enabled)
 {
 	unsigned int ver = 0;
 
+	/* We don't want to register lapics when in Xen dom0 */
+	if (xen_initial_domain())
+		return;
+
 	if (!enabled) {
 		++disabled_cpus;
 		return;
@@ -802,6 +808,10 @@ static int __init acpi_parse_fadt(struct acpi_table_header *table)
 
 static void __init acpi_register_lapic_address(unsigned long address)
 {
+	/* Xen dom0 doesn't have usable lapics */
+	if (xen_initial_domain())
+		return;
+
 	mp_lapic_addr = address;
 
 	set_fixmap_nocache(FIX_APIC_BASE, address);
-- 
1.6.0.6
Jeremy Fitzhardinge
2009-May-12 23:25 UTC
[Xen-devel] [PATCH 02/17] x86: add io_apic_ops to allow interception
From: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>

Xen dom0 needs to paravirtualize IO operations to the IO APIC, so add
an io_apic_ops for it to intercept.  Do this as an ops structure
because there's at least some chance that another paravirtualized
environment may want to intercept these.

[ Impact: indirect IO APIC access via io_apic_ops ]

Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
---
 arch/x86/include/asm/io_apic.h |    9 +++++++
 arch/x86/kernel/apic/io_apic.c |   50 +++++++++++++++++++++++++++++++++++++--
 2 files changed, 56 insertions(+), 3 deletions(-)

diff --git a/arch/x86/include/asm/io_apic.h b/arch/x86/include/asm/io_apic.h
index 9d826e4..8cbfe73 100644
--- a/arch/x86/include/asm/io_apic.h
+++ b/arch/x86/include/asm/io_apic.h
@@ -21,6 +21,15 @@
 #define IO_APIC_REDIR_LEVEL_TRIGGER	(1 << 15)
 #define IO_APIC_REDIR_MASKED		(1 << 16)
 
+struct io_apic_ops {
+	void		(*init)(void);
+	unsigned int	(*read)(unsigned int apic, unsigned int reg);
+	void		(*write)(unsigned int apic, unsigned int reg, unsigned int value);
+	void		(*modify)(unsigned int apic, unsigned int reg, unsigned int value);
+};
+
+void __init set_io_apic_ops(const struct io_apic_ops *);
+
 /*
  * The structure of the IO-APIC:
  */
diff --git a/arch/x86/kernel/apic/io_apic.c b/arch/x86/kernel/apic/io_apic.c
index 30da617..c24f116 100644
--- a/arch/x86/kernel/apic/io_apic.c
+++ b/arch/x86/kernel/apic/io_apic.c
@@ -66,6 +66,25 @@
 
 #define __apicdebuginit(type) static type __init
 
+static void __init __ioapic_init_mappings(void);
+static unsigned int __io_apic_read(unsigned int apic, unsigned int reg);
+static void __io_apic_write(unsigned int apic, unsigned int reg,
+			    unsigned int val);
+static void __io_apic_modify(unsigned int apic, unsigned int reg,
+			     unsigned int val);
+
+static struct io_apic_ops io_apic_ops = {
+	.init = __ioapic_init_mappings,
+	.read = __io_apic_read,
+	.write = __io_apic_write,
+	.modify = __io_apic_modify,
+};
+
+void __init set_io_apic_ops(const struct io_apic_ops *ops)
+{
+	io_apic_ops = *ops;
+}
+
 /*
  * Is the SiS APIC rmw bug present ?
  * -1 = don't know, 0 = no, 1 = yes
@@ -385,6 +404,24 @@ set_extra_move_desc(struct irq_desc *desc, const struct cpumask *mask)
 }
 #endif
 
+static inline unsigned int io_apic_read(unsigned int apic, unsigned int reg)
+{
+	return io_apic_ops.read(apic, reg);
+}
+
+static inline void io_apic_write(unsigned int apic, unsigned int reg,
+				 unsigned int value)
+{
+	io_apic_ops.write(apic, reg, value);
+}
+
+static inline void io_apic_modify(unsigned int apic, unsigned int reg,
+				  unsigned int value)
+{
+	io_apic_ops.modify(apic, reg, value);
+}
+
+
 struct io_apic {
 	unsigned int index;
 	unsigned int unused[3];
@@ -405,14 +442,15 @@ static inline void io_apic_eoi(unsigned int apic, unsigned int vector)
 	writel(vector, &io_apic->eoi);
 }
 
-static inline unsigned int io_apic_read(unsigned int apic, unsigned int reg)
+static unsigned int __io_apic_read(unsigned int apic, unsigned int reg)
 {
 	struct io_apic __iomem *io_apic = io_apic_base(apic);
 	writel(reg, &io_apic->index);
 	return readl(&io_apic->data);
 }
 
-static inline void io_apic_write(unsigned int apic, unsigned int reg, unsigned int value)
+static void __io_apic_write(unsigned int apic, unsigned int reg,
+			    unsigned int value)
 {
 	struct io_apic __iomem *io_apic = io_apic_base(apic);
 	writel(reg, &io_apic->index);
@@ -425,7 +463,8 @@ static inline void io_apic_write(unsigned int apic, unsigned int reg, unsigned i
  *
  * Older SiS APIC requires we rewrite the index register
  */
-static inline void io_apic_modify(unsigned int apic, unsigned int reg, unsigned int value)
+static void __io_apic_modify(unsigned int apic, unsigned int reg,
+			     unsigned int value)
 {
 	struct io_apic __iomem *io_apic = io_apic_base(apic);
 
@@ -4141,6 +4180,11 @@ static struct resource * __init ioapic_setup_resources(void)
 
 void __init ioapic_init_mappings(void)
 {
+	io_apic_ops.init();
+}
+
+static void __init __ioapic_init_mappings(void)
+{
 	unsigned long ioapic_phys, idx = FIX_IO_APIC_BASE_0;
 	struct resource *ioapic_res;
 	int i;
-- 
1.6.0.6
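The ops-struct interception pattern this patch introduces can be modelled in plain user-space C: callers always go through a struct of function pointers preloaded with native defaults, and a paravirtualized environment swaps the whole struct at boot. The sketch below is illustrative only; the `fake_regs` array, the counters, and `pv_read` are hypothetical stand-ins for real MMIO and hypercall-backed accessors.

```c
#include <assert.h>

static unsigned int fake_regs[256];  /* hypothetical stand-in for IO APIC registers */
static int native_reads, pv_reads;   /* counters, just to show interception happened */

struct io_apic_ops {
	unsigned int (*read)(unsigned int apic, unsigned int reg);
	void (*write)(unsigned int apic, unsigned int reg, unsigned int value);
};

static unsigned int native_read(unsigned int apic, unsigned int reg)
{
	native_reads++;
	return fake_regs[reg];
}

static void native_write(unsigned int apic, unsigned int reg, unsigned int value)
{
	fake_regs[reg] = value;
}

/* Default ops, as in the patch: callers never touch the backend directly. */
static struct io_apic_ops io_apic_ops = {
	.read  = native_read,
	.write = native_write,
};

/* The patch copies the struct rather than keeping a pointer. */
static void set_io_apic_ops(const struct io_apic_ops *ops)
{
	io_apic_ops = *ops;
}

static unsigned int io_apic_read(unsigned int apic, unsigned int reg)
{
	return io_apic_ops.read(apic, reg);
}

static void io_apic_write(unsigned int apic, unsigned int reg, unsigned int value)
{
	io_apic_ops.write(apic, reg, value);
}

/* A "paravirtualized" replacement, standing in for hypercall-backed ops. */
static unsigned int pv_read(unsigned int apic, unsigned int reg)
{
	pv_reads++;
	return fake_regs[reg];
}

static struct io_apic_ops pv_ops = {
	.read  = pv_read,
	.write = native_write,
};
```

After `set_io_apic_ops(&pv_ops)`, every existing call site is redirected without any further change, which is exactly why the indirection costs one extra pointer call per access and nothing else.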
Jeremy Fitzhardinge
2009-May-12 23:25 UTC
[Xen-devel] [PATCH 03/17] xen: implement io_apic_ops
Writes to the IO APIC are paravirtualized via hypercalls, so implement
the appropriate operations.

[ Impact: implement Xen interface for io_apic_ops ]

Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
---
 arch/x86/xen/Makefile    |    2 +-
 arch/x86/xen/apic.c      |   64 ++++++++++++++++++++++++++++++++++++++++++++++
 arch/x86/xen/enlighten.c |    2 +
 arch/x86/xen/xen-ops.h   |    6 ++++
 4 files changed, 73 insertions(+), 1 deletions(-)
 create mode 100644 arch/x86/xen/apic.c

diff --git a/arch/x86/xen/Makefile b/arch/x86/xen/Makefile
index c4cda96..73ecb74 100644
--- a/arch/x86/xen/Makefile
+++ b/arch/x86/xen/Makefile
@@ -11,4 +11,4 @@ obj-y		:= enlighten.o setup.o multicalls.o mmu.o irq.o \
 
 obj-$(CONFIG_SMP)		+= smp.o spinlock.o
 obj-$(CONFIG_XEN_DEBUG_FS)	+= debugfs.o
-obj-$(CONFIG_XEN_DOM0)		+= vga.o
+obj-$(CONFIG_XEN_DOM0)		+= vga.o apic.o
diff --git a/arch/x86/xen/apic.c b/arch/x86/xen/apic.c
new file mode 100644
index 0000000..8ae563c
--- /dev/null
+++ b/arch/x86/xen/apic.c
@@ -0,0 +1,64 @@
+#include <linux/kernel.h>
+#include <linux/threads.h>
+#include <linux/bitmap.h>
+
+#include <asm/io_apic.h>
+#include <asm/acpi.h>
+
+#include <asm/xen/hypervisor.h>
+#include <asm/xen/hypercall.h>
+
+#include <xen/interface/xen.h>
+#include <xen/interface/physdev.h>
+
+static void __init xen_io_apic_init(void)
+{
+}
+
+static unsigned int xen_io_apic_read(unsigned apic, unsigned reg)
+{
+	struct physdev_apic apic_op;
+	int ret;
+
+	apic_op.apic_physbase = mp_ioapics[apic].apicaddr;
+	apic_op.reg = reg;
+	ret = HYPERVISOR_physdev_op(PHYSDEVOP_apic_read, &apic_op);
+	if (ret)
+		BUG();
+	return apic_op.value;
+}
+
+
+static void xen_io_apic_write(unsigned int apic, unsigned int reg, unsigned int value)
+{
+	struct physdev_apic apic_op;
+
+	apic_op.apic_physbase = mp_ioapics[apic].apicaddr;
+	apic_op.reg = reg;
+	apic_op.value = value;
+	if (HYPERVISOR_physdev_op(PHYSDEVOP_apic_write, &apic_op))
+		BUG();
+}
+
+static struct io_apic_ops __initdata xen_ioapic_ops = {
+	.init = xen_io_apic_init,
+	.read = xen_io_apic_read,
+	.write = xen_io_apic_write,
+	.modify = xen_io_apic_write,
+};
+
+void xen_init_apic(void)
+{
+	if (!xen_initial_domain())
+		return;
+
+	set_io_apic_ops(&xen_ioapic_ops);
+
+#ifdef CONFIG_ACPI
+	/*
+	 * Pretend ACPI found our lapic even though we've disabled it,
+	 * to prevent MP tables from setting up lapics.
+	 */
+	acpi_lapic = 1;
+#endif
+}
diff --git a/arch/x86/xen/enlighten.c b/arch/x86/xen/enlighten.c
index 12e4d9c..3a4932a 100644
--- a/arch/x86/xen/enlighten.c
+++ b/arch/x86/xen/enlighten.c
@@ -1085,6 +1085,8 @@ asmlinkage void __init xen_start_kernel(void)
 		set_iopl.iopl = 1;
 		if (HYPERVISOR_physdev_op(PHYSDEVOP_set_iopl, &set_iopl) == -1)
 			BUG();
+
+		xen_init_apic();
 	}
 
 	/* set the limit of our address space */
diff --git a/arch/x86/xen/xen-ops.h b/arch/x86/xen/xen-ops.h
index 40abcef..0853949 100644
--- a/arch/x86/xen/xen-ops.h
+++ b/arch/x86/xen/xen-ops.h
@@ -76,13 +76,19 @@ struct dom0_vga_console_info;
 
 #ifdef CONFIG_XEN_DOM0
 void xen_init_vga(const struct dom0_vga_console_info *, size_t size);
+void xen_init_apic(void);
 #else
 static inline void xen_init_vga(const struct dom0_vga_console_info *info,
 				size_t size)
 {
 }
+
+static inline void xen_init_apic(void)
+{
+}
 #endif
 
+
 /* Declare an asm function, along with symbols needed to make it inlineable */
 #define DECL_ASM(ret, name, ...)			\
-- 
1.6.0.6
Jeremy Fitzhardinge
2009-May-12 23:25 UTC
[Xen-devel] [PATCH 04/17] xen: create dummy ioapic mapping
We don't allow direct access to the IO apic, so make sure that any
request to map it just "maps" non-present pages.  We should see any
attempts at direct access explode nicely.

[ Impact: debuggability (make failures obvious) ]

Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
---
 arch/x86/xen/mmu.c |   10 ++++++++++
 1 files changed, 10 insertions(+), 0 deletions(-)

diff --git a/arch/x86/xen/mmu.c b/arch/x86/xen/mmu.c
index 331e52d..139c8de 100644
--- a/arch/x86/xen/mmu.c
+++ b/arch/x86/xen/mmu.c
@@ -1919,6 +1919,16 @@ static void xen_set_fixmap(unsigned idx, phys_addr_t phys, pgprot_t prot)
 		pte = pfn_pte(phys, prot);
 		break;
 
+#ifdef CONFIG_X86_IO_APIC
+	case FIX_IO_APIC_BASE_0 ... FIX_IO_APIC_BASE_END:
+		/*
+		 * We just don't map the IO APIC - all access is via
+		 * hypercalls.  Keep the address in the pte for reference.
+		 */
+		pte = pfn_pte(phys, PAGE_NONE);
+		break;
+#endif
+
 	case FIX_PARAVIRT_BOOTMAP:
 		/* This is an MFN, but it isn't an IO mapping from the
 		   IO domain */
-- 
1.6.0.6
Jeremy Fitzhardinge
2009-May-12 23:25 UTC
[Xen-devel] [PATCH 05/17] xen: implement pirq type event channels
A privileged PV Xen domain can get direct access to hardware.  In
order for this to be useful, it must be able to get hardware
interrupts.

Being a PV Xen domain, all interrupts are delivered as event channels.
PIRQ event channels are bound to a pirq number and an interrupt
vector.  When an IO APIC raises a hardware interrupt on that vector,
it is delivered as an event channel, which we can deliver to the
appropriate device driver(s).

This patch simply implements the infrastructure for dealing with pirq
event channels.

[ Impact: integrate hardware interrupts into Xen's event scheme ]

Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
---
 drivers/xen/events.c |  245 +++++++++++++++++++++++++++++++++++++++++++++++++-
 include/xen/events.h |   11 +++
 2 files changed, 253 insertions(+), 3 deletions(-)

diff --git a/drivers/xen/events.c b/drivers/xen/events.c
index 97f4b39..fd98c19 100644
--- a/drivers/xen/events.c
+++ b/drivers/xen/events.c
@@ -16,7 +16,7 @@
  *    (typically dom0).
  * 2. VIRQs, typically used for timers.  These are per-cpu events.
  * 3. IPIs.
- * 4. Hardware interrupts. Not supported at present.
+ * 4. PIRQs - Hardware interrupts.
  *
  * Jeremy Fitzhardinge <jeremy@xensource.com>, XenSource Inc, 2007
  */
@@ -40,6 +40,9 @@
 #include <xen/interface/xen.h>
 #include <xen/interface/event_channel.h>
 
+/* Leave low irqs free for identity mapping */
+#define LEGACY_IRQS	16
+
 /*
  * This lock protects updates to the following mapping and reference-count
  * arrays. The lock does not need to be acquired to read the mapping tables.
@@ -83,10 +86,12 @@ struct irq_info
 		enum ipi_vector ipi;
 		struct {
 			unsigned short gsi;
-			unsigned short vector;
+			unsigned char vector;
+			unsigned char flags;
 		} pirq;
 	} u;
 };
+#define PIRQ_NEEDS_EOI	(1 << 0)
 
 static struct irq_info irq_info[NR_IRQS];
 
@@ -106,6 +111,7 @@ static inline unsigned long *cpu_evtchn_mask(int cpu)
 #define VALID_EVTCHN(chn)	((chn) != 0)
 
 static struct irq_chip xen_dynamic_chip;
+static struct irq_chip xen_pirq_chip;
 
 /* Constructor for packed IRQ information. */
 static struct irq_info mk_unbound_info(void)
@@ -218,6 +224,15 @@ static unsigned int cpu_from_evtchn(unsigned int evtchn)
 	return ret;
 }
 
+static bool pirq_needs_eoi(unsigned irq)
+{
+	struct irq_info *info = info_for_irq(irq);
+
+	BUG_ON(info->type != IRQT_PIRQ);
+
+	return info->u.pirq.flags & PIRQ_NEEDS_EOI;
+}
+
 static inline unsigned long active_evtchns(unsigned int cpu,
 					   struct shared_info *sh,
 					   unsigned int idx)
@@ -334,7 +349,7 @@ static int find_unbound_irq(void)
 	int irq;
 	struct irq_desc *desc;
 
-	for (irq = 0; irq < nr_irqs; irq++)
+	for (irq = LEGACY_IRQS; irq < nr_irqs; irq++)
 		if (irq_info[irq].type == IRQT_UNBOUND)
 			break;
 
@@ -350,6 +365,210 @@ static int find_unbound_irq(void)
 	return irq;
 }
 
+static bool identity_mapped_irq(unsigned irq)
+{
+	/* only identity map legacy irqs */
+	return irq < LEGACY_IRQS;
+}
+
+static void pirq_unmask_notify(int irq)
+{
+	struct physdev_eoi eoi = { .irq = irq };
+
+	if (unlikely(pirq_needs_eoi(irq))) {
+		int rc = HYPERVISOR_physdev_op(PHYSDEVOP_eoi, &eoi);
+		WARN_ON(rc);
+	}
+}
+
+static void pirq_query_unmask(int irq)
+{
+	struct physdev_irq_status_query irq_status;
+	struct irq_info *info = info_for_irq(irq);
+
+	BUG_ON(info->type != IRQT_PIRQ);
+
+	irq_status.irq = irq;
+	if (HYPERVISOR_physdev_op(PHYSDEVOP_irq_status_query, &irq_status))
+		irq_status.flags = 0;
+
+	info->u.pirq.flags &= ~PIRQ_NEEDS_EOI;
+	if (irq_status.flags & XENIRQSTAT_needs_eoi)
+		info->u.pirq.flags |= PIRQ_NEEDS_EOI;
+}
+
+static bool probing_irq(int irq)
+{
+	struct irq_desc *desc = irq_to_desc(irq);
+
+	return desc && desc->action == NULL;
+}
+
+static unsigned int startup_pirq(unsigned int irq)
+{
+	struct evtchn_bind_pirq bind_pirq;
+	struct irq_info *info = info_for_irq(irq);
+	int evtchn = evtchn_from_irq(irq);
+
+	BUG_ON(info->type != IRQT_PIRQ);
+
+	if (VALID_EVTCHN(evtchn))
+		goto out;
+
+	bind_pirq.pirq = irq;
+	/* NB. We are happy to share unless we are probing. */
+	bind_pirq.flags = probing_irq(irq) ? 0 : BIND_PIRQ__WILL_SHARE;
+	if (HYPERVISOR_event_channel_op(EVTCHNOP_bind_pirq, &bind_pirq) != 0) {
+		if (!probing_irq(irq))
+			printk(KERN_INFO "Failed to obtain physical IRQ %d\n",
+			       irq);
+		return 0;
+	}
+	evtchn = bind_pirq.port;
+
+	pirq_query_unmask(irq);
+
+	evtchn_to_irq[evtchn] = irq;
+	bind_evtchn_to_cpu(evtchn, 0);
+	info->evtchn = evtchn;
+
+ out:
+	unmask_evtchn(evtchn);
+	pirq_unmask_notify(irq);
+
+	return 0;
+}
+
+static void shutdown_pirq(unsigned int irq)
+{
+	struct evtchn_close close;
+	struct irq_info *info = info_for_irq(irq);
+	int evtchn = evtchn_from_irq(irq);
+
+	BUG_ON(info->type != IRQT_PIRQ);
+
+	if (!VALID_EVTCHN(evtchn))
+		return;
+
+	mask_evtchn(evtchn);
+
+	close.port = evtchn;
+	if (HYPERVISOR_event_channel_op(EVTCHNOP_close, &close) != 0)
+		BUG();
+
+	bind_evtchn_to_cpu(evtchn, 0);
+	evtchn_to_irq[evtchn] = -1;
+	info->evtchn = 0;
+}
+
+static void enable_pirq(unsigned int irq)
+{
+	startup_pirq(irq);
+}
+
+static void disable_pirq(unsigned int irq)
+{
+}
+
+static void ack_pirq(unsigned int irq)
+{
+	int evtchn = evtchn_from_irq(irq);
+
+	move_native_irq(irq);
+
+	if (VALID_EVTCHN(evtchn)) {
+		mask_evtchn(evtchn);
+		clear_evtchn(evtchn);
+	}
+}
+
+static void end_pirq(unsigned int irq)
+{
+	int evtchn = evtchn_from_irq(irq);
+	struct irq_desc *desc = irq_to_desc(irq);
+
+	if (WARN_ON(!desc))
+		return;
+
+	if ((desc->status & (IRQ_DISABLED|IRQ_PENDING)) ==
+	    (IRQ_DISABLED|IRQ_PENDING)) {
+		shutdown_pirq(irq);
+	} else if (VALID_EVTCHN(evtchn)) {
+		unmask_evtchn(evtchn);
+		pirq_unmask_notify(irq);
+	}
+}
+
+static int find_irq_by_gsi(unsigned gsi)
+{
+	int irq;
+
+	for (irq = 0; irq < NR_IRQS; irq++) {
+		struct irq_info *info = info_for_irq(irq);
+
+		if (info == NULL || info->type != IRQT_PIRQ)
+			continue;
+
+		if (gsi_from_irq(irq) == gsi)
+			return irq;
+	}
+
+	return -1;
+}
+
+/*
+ * Allocate a physical irq, along with a vector.  We don't assign an
+ * event channel until the irq actually started up.  Return an
+ * existing irq if we've already got one for the gsi.
+ */
+int xen_allocate_pirq(unsigned gsi)
+{
+	int irq;
+	struct physdev_irq irq_op;
+
+	spin_lock(&irq_mapping_update_lock);
+
+	irq = find_irq_by_gsi(gsi);
+	if (irq != -1) {
+		printk(KERN_INFO "xen_allocate_pirq: returning irq %d for gsi %u\n",
+		       irq, gsi);
+		goto out;	/* XXX need refcount? */
+	}
+
+	if (identity_mapped_irq(gsi)) {
+		irq = gsi;
+		dynamic_irq_init(irq);
+	} else
+		irq = find_unbound_irq();
+
+	set_irq_chip_and_handler_name(irq, &xen_pirq_chip,
+				      handle_level_irq, "pirq");
+
+	irq_op.irq = irq;
+	if (HYPERVISOR_physdev_op(PHYSDEVOP_alloc_irq_vector, &irq_op)) {
+		dynamic_irq_cleanup(irq);
+		irq = -ENOSPC;
+		goto out;
+	}
+
+	irq_info[irq] = mk_pirq_info(0, gsi, irq_op.vector);
+
+out:
+	spin_unlock(&irq_mapping_update_lock);
+
+	return irq;
+}
+
+int xen_vector_from_irq(unsigned irq)
+{
+	return vector_from_irq(irq);
+}
+
+int xen_gsi_from_irq(unsigned irq)
+{
+	return gsi_from_irq(irq);
+}
+
 int bind_evtchn_to_irq(unsigned int evtchn)
 {
 	int irq;
@@ -922,6 +1141,26 @@ static struct irq_chip xen_dynamic_chip __read_mostly = {
 	.retrigger	= retrigger_dynirq,
 };
 
+static struct irq_chip xen_pirq_chip __read_mostly = {
+	.name		= "xen-pirq",
+
+	.startup	= startup_pirq,
+	.shutdown	= shutdown_pirq,
+
+	.enable		= enable_pirq,
+	.unmask		= enable_pirq,
+
+	.disable	= disable_pirq,
+	.mask		= disable_pirq,
+
+	.ack		= ack_pirq,
+	.end		= end_pirq,
+
+	.set_affinity	= set_affinity_irq,
+
+	.retrigger	= retrigger_dynirq,
+};
+
 void __init xen_init_IRQ(void)
 {
 	int i;
diff --git a/include/xen/events.h b/include/xen/events.h
index 9f24b64..e5b541d 100644
--- a/include/xen/events.h
+++ b/include/xen/events.h
@@ -58,4 +58,15 @@ void xen_poll_irq(int irq);
 /* Determine the IRQ which is bound to an event channel */
 unsigned irq_from_evtchn(unsigned int evtchn);
 
+/* Allocate an irq for a physical interrupt, given a gsi.  "Legacy"
+   GSIs are identity mapped; others are dynamically allocated as
+   usual. */
+int xen_allocate_pirq(unsigned gsi);
+
+/* Return vector allocated to pirq */
+int xen_vector_from_irq(unsigned pirq);
+
+/* Return gsi allocated to pirq */
+int xen_gsi_from_irq(unsigned pirq);
+
 #endif	/* _XEN_EVENTS_H */
-- 
1.6.0.6
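The allocation policy in `xen_allocate_pirq()` above can be summarised in a small user-space model: legacy GSIs are identity-mapped to the same irq number, other GSIs get the first unbound irq above the legacy range, and asking again for a gsi that already has an irq returns the existing one. Everything here (the table sizes, the stripped-down `irq_info`) is a toy stand-in; the hypercall, vector, and locking parts of the real function are deliberately omitted.

```c
#include <assert.h>

#define LEGACY_IRQS 16   /* matches this patch; later widened to nr_irqs_gsi */
#define NR_IRQS     64   /* arbitrary table size for the sketch */

enum irq_type { IRQT_UNBOUND, IRQT_PIRQ };

static struct {
	enum irq_type type;
	unsigned gsi;
} irq_info[NR_IRQS];

/* Return the irq already bound to this gsi, or -1. */
static int find_irq_by_gsi(unsigned gsi)
{
	for (int irq = 0; irq < NR_IRQS; irq++)
		if (irq_info[irq].type == IRQT_PIRQ && irq_info[irq].gsi == gsi)
			return irq;
	return -1;
}

/* Dynamic irqs come from above the legacy/identity-mapped range. */
static int find_unbound_irq(void)
{
	for (int irq = LEGACY_IRQS; irq < NR_IRQS; irq++)
		if (irq_info[irq].type == IRQT_UNBOUND)
			return irq;
	return -1;
}

int xen_allocate_pirq(unsigned gsi)
{
	int irq = find_irq_by_gsi(gsi);
	if (irq != -1)
		return irq;	/* already allocated: reuse it */

	irq = (gsi < LEGACY_IRQS) ? (int)gsi : find_unbound_irq();
	if (irq < 0)
		return -1;

	irq_info[irq].type = IRQT_PIRQ;
	irq_info[irq].gsi = gsi;
	return irq;
}
```

The identity mapping for low irqs is what lets legacy drivers keep assuming irq == gsi, while dynamically bound event channels never collide with hardware interrupt numbers.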
Jeremy Fitzhardinge
2009-May-12 23:25 UTC
[Xen-devel] [PATCH 06/17] x86/io_apic: add get_nr_irqs_gsi()
From: Jeremy Fitzhardinge <jeremy@f9-builder.(none)>

Add get_nr_irqs_gsi() to return nr_irqs_gsi.  Xen will use this to
determine how many irqs it needs to reserve for hardware irqs.

[ Impact: new interface to get max GSI ]

Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Reviewed-by: "H. Peter Anvin" <hpa@zytor.com>
---
 arch/x86/include/asm/io_apic.h |    1 +
 arch/x86/kernel/apic/io_apic.c |    5 +++++
 2 files changed, 6 insertions(+), 0 deletions(-)

diff --git a/arch/x86/include/asm/io_apic.h b/arch/x86/include/asm/io_apic.h
index 8cbfe73..e33ccb7 100644
--- a/arch/x86/include/asm/io_apic.h
+++ b/arch/x86/include/asm/io_apic.h
@@ -181,6 +181,7 @@ extern void reinit_intr_remapped_IO_APIC(int intr_remapping,
 #endif
 
 extern void probe_nr_irqs_gsi(void);
+extern int get_nr_irqs_gsi(void);
 
 extern int setup_ioapic_entry(int apic, int irq,
 			      struct IO_APIC_route_entry *entry,
diff --git a/arch/x86/kernel/apic/io_apic.c b/arch/x86/kernel/apic/io_apic.c
index c24f116..07dc530 100644
--- a/arch/x86/kernel/apic/io_apic.c
+++ b/arch/x86/kernel/apic/io_apic.c
@@ -3917,6 +3917,11 @@ void __init probe_nr_irqs_gsi(void)
 	printk(KERN_DEBUG "nr_irqs_gsi: %d\n", nr_irqs_gsi);
 }
 
+int get_nr_irqs_gsi(void)
+{
+	return nr_irqs_gsi;
+}
+
 #ifdef CONFIG_SPARSE_IRQ
 int __init arch_probe_nr_irqs(void)
 {
-- 
1.6.0.6
Jeremy Fitzhardinge
2009-May-12 23:25 UTC
[Xen-devel] [PATCH 07/17] xen/apic: identity map gsi->irqs
From: Jeremy Fitzhardinge <jeremy@f9-builder.(none)>

Reserve the lower irq range for use for hardware interrupts so we can
identity-map them.

[ Impact: preserve compat with native ]

Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
---
 drivers/xen/events.c |   23 +++++++++++++++++------
 1 files changed, 17 insertions(+), 6 deletions(-)

diff --git a/drivers/xen/events.c b/drivers/xen/events.c
index fd98c19..88395bb 100644
--- a/drivers/xen/events.c
+++ b/drivers/xen/events.c
@@ -31,6 +31,7 @@
 #include <asm/ptrace.h>
 #include <asm/irq.h>
 #include <asm/idle.h>
+#include <asm/io_apic.h>
 #include <asm/sync_bitops.h>
 #include <asm/xen/hypercall.h>
 #include <asm/xen/hypervisor.h>
@@ -40,9 +41,6 @@
 #include <xen/interface/xen.h>
 #include <xen/interface/event_channel.h>
 
-/* Leave low irqs free for identity mapping */
-#define LEGACY_IRQS	16
-
 /*
  * This lock protects updates to the following mapping and reference-count
  * arrays. The lock does not need to be acquired to read the mapping tables.
@@ -344,12 +342,24 @@ static void unmask_evtchn(int port)
 	put_cpu();
 }
 
+static int get_nr_hw_irqs(void)
+{
+	int ret = 1;
+
+#ifdef CONFIG_X86_IO_APIC
+	ret = get_nr_irqs_gsi();
+#endif
+
+	return ret;
+}
+
 static int find_unbound_irq(void)
 {
 	int irq;
 	struct irq_desc *desc;
+	int start = get_nr_hw_irqs();
 
-	for (irq = LEGACY_IRQS; irq < nr_irqs; irq++)
+	for (irq = start; irq < nr_irqs; irq++)
 		if (irq_info[irq].type == IRQT_UNBOUND)
 			break;
 
@@ -367,8 +377,8 @@ static int find_unbound_irq(void)
 
 static bool identity_mapped_irq(unsigned irq)
 {
-	/* only identity map legacy irqs */
-	return irq < LEGACY_IRQS;
+	/* identity map all the hardware irqs */
+	return irq < get_nr_hw_irqs();
}
 
 static void pirq_unmask_notify(int irq)
@@ -537,6 +547,7 @@ int xen_allocate_pirq(unsigned gsi)
 
 	if (identity_mapped_irq(gsi)) {
 		irq = gsi;
+		irq_to_desc_alloc_cpu(irq, 0);
 		dynamic_irq_init(irq);
 	} else
 		irq = find_unbound_irq();
-- 
1.6.0.6
Jeremy Fitzhardinge
2009-May-12 23:25 UTC
[Xen-devel] [PATCH 08/17] xen: direct irq registration to pirq event channels
This patch puts the hooks into place so that when the interrupt
subsystem registers an irq, it gets routed via Xen (if we're running
under Xen).

The first step is to get a gsi for a particular device+pin.  We use
the normal acpi interrupt routing to do the mapping.  We reserve
enough irq space to fit the hardware interrupt sources in, so we can
allocate the irq == gsi, as we do in the native case; software events
will get allocated irqs above that.

Having allocated an irq, we ask Xen to allocate a vector, and then
bind that pirq/vector to an event channel.  When the hardware raises
an interrupt on a vector, Xen signals us on the corresponding event
channel, which gets routed to the irq and delivered to the appropriate
device driver.

This patch does everything except set up the IO APIC pin routing to
the vector.

[ Impact: route hardware interrupts via Xen ]

Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
---
 arch/x86/include/asm/xen/pci.h |   13 +++++++++++
 arch/x86/kernel/acpi/boot.c    |    8 ++++++-
 arch/x86/xen/Kconfig           |   11 +++++++++
 arch/x86/xen/Makefile          |    1 +
 arch/x86/xen/pci.c             |   47 ++++++++++++++++++++++++++++++++++++++++
 drivers/xen/events.c           |    6 ++++-
 include/xen/events.h           |    8 ++++++
 7 files changed, 92 insertions(+), 2 deletions(-)
 create mode 100644 arch/x86/include/asm/xen/pci.h
 create mode 100644 arch/x86/xen/pci.c

diff --git a/arch/x86/include/asm/xen/pci.h b/arch/x86/include/asm/xen/pci.h
new file mode 100644
index 0000000..0563fc6
--- /dev/null
+++ b/arch/x86/include/asm/xen/pci.h
@@ -0,0 +1,13 @@
+#ifndef _ASM_X86_XEN_PCI_H
+#define _ASM_X86_XEN_PCI_H
+
+#ifdef CONFIG_XEN_DOM0_PCI
+int xen_register_gsi(u32 gsi, int triggering, int polarity);
+#else
+static inline int xen_register_gsi(u32 gsi, int triggering, int polarity)
+{
+	return -1;
+}
+#endif
+
+#endif	/* _ASM_X86_XEN_PCI_H */
diff --git a/arch/x86/kernel/acpi/boot.c b/arch/x86/kernel/acpi/boot.c
index 4147e0c..d4de1c2 100644
--- a/arch/x86/kernel/acpi/boot.c
+++ b/arch/x86/kernel/acpi/boot.c
@@ -41,6 +41,8 @@
 #include <asm/mpspec.h>
 #include <asm/smp.h>
 
+#include <asm/xen/pci.h>
+
 #include <asm/xen/hypervisor.h>
 
 static int __initdata acpi_force = 0;
@@ -530,9 +532,13 @@ int acpi_gsi_to_irq(u32 gsi, unsigned int *irq)
  */
 int acpi_register_gsi(u32 gsi, int triggering, int polarity)
 {
-	unsigned int irq;
+	int irq;
 	unsigned int plat_gsi = gsi;
 
+	irq = xen_register_gsi(gsi, triggering, polarity);
+	if (irq >= 0)
+		return irq;
+
 #ifdef CONFIG_PCI
 	/*
 	 * Make sure all (legacy) PCI IRQs are set as level-triggered.
diff --git a/arch/x86/xen/Kconfig b/arch/x86/xen/Kconfig
index fe69286..42e9f0a 100644
--- a/arch/x86/xen/Kconfig
+++ b/arch/x86/xen/Kconfig
@@ -37,6 +37,17 @@ config XEN_DEBUG_FS
 	  Enable statistics output and various tuning options in debugfs.
 	  Enabling this option may incur a significant performance overhead.
 
+config XEN_PCI_PASSTHROUGH
+	bool #"Enable support for Xen PCI passthrough devices"
+	depends on XEN && PCI
+	help
+	  Enable support for passing PCI devices through to
+	  unprivileged domains. (COMPLETELY UNTESTED)
+
+config XEN_DOM0_PCI
+	def_bool y
+	depends on XEN_DOM0 && PCI
+
 config XEN_DOM0
 	bool "Enable Xen privileged domain support"
 	depends on XEN && X86_IO_APIC && ACPI
diff --git a/arch/x86/xen/Makefile b/arch/x86/xen/Makefile
index 73ecb74..639965a 100644
--- a/arch/x86/xen/Makefile
+++ b/arch/x86/xen/Makefile
@@ -12,3 +12,4 @@ obj-y		:= enlighten.o setup.o multicalls.o mmu.o irq.o \
 obj-$(CONFIG_SMP)		+= smp.o spinlock.o
 obj-$(CONFIG_XEN_DEBUG_FS)	+= debugfs.o
 obj-$(CONFIG_XEN_DOM0)		+= vga.o apic.o
+obj-$(CONFIG_XEN_DOM0_PCI)	+= pci.o
\ No newline at end of file
diff --git a/arch/x86/xen/pci.c b/arch/x86/xen/pci.c
new file mode 100644
index 0000000..f450007
--- /dev/null
+++ b/arch/x86/xen/pci.c
@@ -0,0 +1,47 @@
+#include <linux/kernel.h>
+#include <linux/acpi.h>
+#include <linux/pci.h>
+
+#include <asm/pci_x86.h>
+
+#include <asm/xen/hypervisor.h>
+
+#include <xen/interface/xen.h>
+#include <xen/events.h>
+
+#include "xen-ops.h"
+
+int xen_register_gsi(u32 gsi, int triggering, int polarity)
+{
+	int irq;
+
+	if (!xen_domain())
+		return -1;
+
+	printk(KERN_DEBUG "xen: registering gsi %u triggering %d polarity %d\n",
+	       gsi, triggering, polarity);
+
+	irq = xen_allocate_pirq(gsi);
+
+	printk(KERN_DEBUG "xen: --> irq=%d\n", irq);
+
+	return irq;
+}
+
+void __init xen_setup_pirqs(void)
+{
+#ifdef CONFIG_ACPI
+	int irq;
+
+	/*
+	 * Set up acpi interrupt in acpi_gbl_FADT.sci_interrupt.
+	 */
+	irq = xen_allocate_pirq(acpi_gbl_FADT.sci_interrupt);
+
+	printk(KERN_INFO "xen: allocated irq %d for acpi %d\n",
+	       irq, acpi_gbl_FADT.sci_interrupt);
+
+	/* Blerk. */
+	acpi_gbl_FADT.sci_interrupt = irq;
+#endif
+}
diff --git a/drivers/xen/events.c b/drivers/xen/events.c
index 88395bb..968e927 100644
--- a/drivers/xen/events.c
+++ b/drivers/xen/events.c
@@ -419,6 +419,7 @@ static unsigned int startup_pirq(unsigned int irq)
 	struct evtchn_bind_pirq bind_pirq;
 	struct irq_info *info = info_for_irq(irq);
 	int evtchn = evtchn_from_irq(irq);
+	int rc;
 
 	BUG_ON(info->type != IRQT_PIRQ);
 
@@ -428,7 +429,8 @@ static unsigned int startup_pirq(unsigned int irq)
 	bind_pirq.pirq = irq;
 	/* NB. We are happy to share unless we are probing. */
 	bind_pirq.flags = probing_irq(irq) ? 0 : BIND_PIRQ__WILL_SHARE;
-	if (HYPERVISOR_event_channel_op(EVTCHNOP_bind_pirq, &bind_pirq) != 0) {
+	rc = HYPERVISOR_event_channel_op(EVTCHNOP_bind_pirq, &bind_pirq);
+	if (rc != 0) {
 		if (!probing_irq(irq))
 			printk(KERN_INFO "Failed to obtain physical IRQ %d\n",
 			       irq);
@@ -1187,4 +1189,6 @@ void __init xen_init_IRQ(void)
 		mask_evtchn(i);
 
 	irq_ctx_init(smp_processor_id());
+
+	xen_setup_pirqs();
 }
diff --git a/include/xen/events.h b/include/xen/events.h
index e5b541d..6fe4863 100644
--- a/include/xen/events.h
+++ b/include/xen/events.h
@@ -69,4 +69,12 @@ int xen_vector_from_irq(unsigned pirq);
 /* Return gsi allocated to pirq */
 int xen_gsi_from_irq(unsigned pirq);
 
+#ifdef CONFIG_XEN_DOM0_PCI
+void xen_setup_pirqs(void);
+#else
+static inline void xen_setup_pirqs(void)
+{
+}
+#endif
+
 #endif	/* _XEN_EVENTS_H */
-- 
1.6.0.6
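The dispatch this patch adds to `acpi_register_gsi()` follows a common hook pattern: try the Xen path first, and fall back to the native path when Xen declines. A minimal user-space sketch of that control flow, with a `running_under_xen` flag standing in for `xen_domain()` and the `+1000` marker purely illustrative:

```c
#include <assert.h>

static int running_under_xen;	/* stand-in for xen_domain() */

/* Declines with -1 when not under Xen, so the caller uses the native path. */
static int xen_register_gsi(unsigned gsi)
{
	if (!running_under_xen)
		return -1;
	return (int)gsi;	/* toy: irq == gsi, as for hardware irqs */
}

static int native_register_gsi(unsigned gsi)
{
	return (int)gsi + 1000;	/* arbitrary marker for the native path */
}

int acpi_register_gsi(unsigned gsi)
{
	int irq = xen_register_gsi(gsi);
	if (irq >= 0)
		return irq;	/* Xen handled it */
	return native_register_gsi(gsi);
}
```

Using a sentinel return value rather than an `#ifdef` at the call site keeps the native path untouched on non-Xen kernels; the stub in `asm/xen/pci.h` compiles away entirely when `CONFIG_XEN_DOM0_PCI` is off.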
Jeremy Fitzhardinge
2009-May-12 23:25 UTC
[Xen-devel] [PATCH 09/17] xen: bind pirq to vector and event channel
Having converted a dev+pin to a gsi, and that gsi to an irq, and
allocated a vector for the irq, we must program the IO APIC to deliver
an interrupt on a pin to the vector, so Xen can deliver it as an event
channel. Given the pirq, we can get the gsi and vector. We map the gsi
to a specific IO APIC's pin, and set the routing entry.

(We were passing the ACPI triggering and polarity levels directly into
the apic - but they have reversed values. The result was that all the
level-triggered interrupts were edge, and vice-versa. It's surprising
that anything worked at all, but now AHCI works for me. Thanks to Gerd
Hoffmann for noticing this.)

[ Impact: program IO APICs under Xen ]

Diagnosed-by: Gerd Hoffmann <kraxel@redhat.com>
Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
---
 arch/x86/xen/apic.c |    2 ++
 arch/x86/xen/pci.c  |   33 +++++++++++++++++++++++++++++++++
 2 files changed, 35 insertions(+), 0 deletions(-)

diff --git a/arch/x86/xen/apic.c b/arch/x86/xen/apic.c
index 8ae563c..35a8af7 100644
--- a/arch/x86/xen/apic.c
+++ b/arch/x86/xen/apic.c
@@ -4,6 +4,7 @@
 #include <asm/io_apic.h>
 #include <asm/acpi.h>
+#include <asm/hw_irq.h>
 #include <asm/xen/hypervisor.h>
 #include <asm/xen/hypercall.h>
@@ -13,6 +14,7 @@
 static void __init xen_io_apic_init(void)
 {
+	enable_IO_APIC();
 }
 
 static unsigned int xen_io_apic_read(unsigned apic, unsigned reg)
diff --git a/arch/x86/xen/pci.c b/arch/x86/xen/pci.c
index f450007..af4e898 100644
--- a/arch/x86/xen/pci.c
+++ b/arch/x86/xen/pci.c
@@ -2,6 +2,8 @@
 #include <linux/acpi.h>
 #include <linux/pci.h>
+#include <asm/mpspec.h>
+#include <asm/io_apic.h>
 #include <asm/pci_x86.h>
 #include <asm/xen/hypervisor.h>
@@ -11,6 +13,32 @@
 #include "xen-ops.h"
 
+static void xen_set_io_apic_routing(int irq, int trigger, int polarity)
+{
+	int ioapic, ioapic_pin;
+	int vector, gsi;
+	struct IO_APIC_route_entry entry;
+
+	gsi = xen_gsi_from_irq(irq);
+	vector = xen_vector_from_irq(irq);
+
+	ioapic = mp_find_ioapic(gsi);
+	if (ioapic == -1) {
+		printk(KERN_WARNING "xen_set_ioapic_routing: irq %d gsi %d ioapic %d\n",
+		       irq, gsi, ioapic);
+		return;
+	}
+
+	ioapic_pin = mp_find_ioapic_pin(ioapic, gsi);
+
+	printk(KERN_INFO "xen_set_ioapic_routing: irq %d gsi %d vector %d ioapic %d pin %d triggering %d polarity %d\n",
+	       irq, gsi, vector, ioapic, ioapic_pin, trigger, polarity);
+
+	setup_ioapic_entry(ioapic, -1, &entry, ~0, trigger, polarity, vector,
+			   ioapic_pin);
+	ioapic_write_entry(ioapic, ioapic_pin, entry);
+}
+
 int xen_register_gsi(u32 gsi, int triggering, int polarity)
 {
 	int irq;
@@ -25,6 +53,11 @@ int xen_register_gsi(u32 gsi, int triggering, int polarity)
 
 	printk(KERN_DEBUG "xen: --> irq=%d\n", irq);
 
+	if (irq > 0)
+		xen_set_io_apic_routing(irq,
+					triggering == ACPI_EDGE_SENSITIVE ? 0 : 1,
+					polarity == ACPI_ACTIVE_HIGH ? 0 : 1);
+
 	return irq;
 }
-- 
1.6.0.6

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel
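For readers following the triggering/polarity inversion described above, here is a small standalone sketch of the conversion that the hunk in xen_register_gsi() performs. The constant values below are illustrative stand-ins, not the real definitions from the ACPI headers; only the comparisons matter. The IO APIC redirection-entry encoding assumed here is trigger 0 = edge / 1 = level and polarity 0 = active-high / 1 = active-low, which is why passing raw ACPI values straight through swapped edge and level interrupts:

```c
#include <assert.h>

/* Assumed stand-in values for the ACPI constants; the real definitions
 * live in the ACPI headers and differ from the IO APIC encoding, which
 * is the whole point of the conversion. */
#define ACPI_EDGE_SENSITIVE  1
#define ACPI_LEVEL_SENSITIVE 0
#define ACPI_ACTIVE_HIGH     0
#define ACPI_ACTIVE_LOW      1

/* IO APIC redirection entry: trigger 0 = edge, 1 = level. */
static int acpi_to_ioapic_trigger(int triggering)
{
	return triggering == ACPI_EDGE_SENSITIVE ? 0 : 1;
}

/* IO APIC redirection entry: polarity 0 = active-high, 1 = active-low. */
static int acpi_to_ioapic_polarity(int polarity)
{
	return polarity == ACPI_ACTIVE_HIGH ? 0 : 1;
}
```

This mirrors the two ternaries added to xen_register_gsi() in the patch; the helpers and macro values are hypothetical names for illustration only.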
Jeremy Fitzhardinge
2009-May-12 23:25 UTC
[Xen-devel] [PATCH 10/17] xen: pre-initialize legacy irqs early
From: Ian Campbell <ian.campbell@citrix.com>

Various legacy devices, such as IDE, assume their legacy interrupts
are already initialized and are immediately usable. Pre-initialize all
the legacy interrupts.

[ Impact: ISA/legacy device compat ]

Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
---
 arch/x86/xen/pci.c |    6 +++++-
 1 files changed, 5 insertions(+), 1 deletions(-)

diff --git a/arch/x86/xen/pci.c b/arch/x86/xen/pci.c
index af4e898..402a5bd 100644
--- a/arch/x86/xen/pci.c
+++ b/arch/x86/xen/pci.c
@@ -63,9 +63,9 @@ int xen_register_gsi(u32 gsi, int triggering, int polarity)
 
 void __init xen_setup_pirqs(void)
 {
-#ifdef CONFIG_ACPI
 	int irq;
 
+#ifdef CONFIG_ACPI
 	/*
 	 * Set up acpi interrupt in acpi_gbl_FADT.sci_interrupt.
 	 */
@@ -77,4 +77,8 @@ void __init xen_setup_pirqs(void)
 	/* Blerk. */
 	acpi_gbl_FADT.sci_interrupt = irq;
 #endif
+
+	/* Pre-allocate legacy irqs */
+	for (irq = 0; irq < NR_IRQS_LEGACY; irq++)
+		xen_allocate_pirq(irq);
 }
-- 
1.6.0.6
Jeremy Fitzhardinge
2009-May-12 23:25 UTC
[Xen-devel] [PATCH 11/17] xen: don''t setup acpi interrupt unless there is one
From: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>

If the SCI hasn't been set, then presumably we're not running with
acpi, so don't bother setting up the interrupt.

[ Impact: compatibility with pre-ACPI machines ]

Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
---
 arch/x86/xen/pci.c |   11 +++++------
 1 files changed, 5 insertions(+), 6 deletions(-)

diff --git a/arch/x86/xen/pci.c b/arch/x86/xen/pci.c
index 402a5bd..00ad6df 100644
--- a/arch/x86/xen/pci.c
+++ b/arch/x86/xen/pci.c
@@ -69,13 +69,12 @@ void __init xen_setup_pirqs(void)
 	/*
 	 * Set up acpi interrupt in acpi_gbl_FADT.sci_interrupt.
 	 */
-	irq = xen_allocate_pirq(acpi_gbl_FADT.sci_interrupt);
+	if (acpi_gbl_FADT.sci_interrupt > 0) {
+		irq = xen_allocate_pirq(acpi_gbl_FADT.sci_interrupt);
 
-	printk(KERN_INFO "xen: allocated irq %d for acpi %d\n",
-	       irq, acpi_gbl_FADT.sci_interrupt);
-
-	/* Blerk. */
-	acpi_gbl_FADT.sci_interrupt = irq;
+		printk(KERN_INFO "xen: allocated irq %d for acpi %d\n",
+		       irq, acpi_gbl_FADT.sci_interrupt);
+	}
 #endif
 
 	/* Pre-allocate legacy irqs */
-- 
1.6.0.6
Jeremy Fitzhardinge
2009-May-12 23:25 UTC
[Xen-devel] [PATCH 12/17] xen: use acpi_get_override_irq() to get triggering for legacy irqs
From: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>

We need to set up proper IO apic entries for legacy irqs, which are
not normally configured by either normal acpi interrupt routing or
PNP.

This also generalizes the acpi interrupt setup, so we can remove it as
a special case.

[ Impact: compatibility with legacy/ISA hardware ]

Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
---
 arch/x86/xen/pci.c |   24 ++++++++++--------------
 1 files changed, 10 insertions(+), 14 deletions(-)

diff --git a/arch/x86/xen/pci.c b/arch/x86/xen/pci.c
index 00ad6df..db0c74c 100644
--- a/arch/x86/xen/pci.c
+++ b/arch/x86/xen/pci.c
@@ -65,19 +65,15 @@ void __init xen_setup_pirqs(void)
 {
 	int irq;
 
-#ifdef CONFIG_ACPI
-	/*
-	 * Set up acpi interrupt in acpi_gbl_FADT.sci_interrupt.
-	 */
-	if (acpi_gbl_FADT.sci_interrupt > 0) {
-		irq = xen_allocate_pirq(acpi_gbl_FADT.sci_interrupt);
-
-		printk(KERN_INFO "xen: allocated irq %d for acpi %d\n",
-		       irq, acpi_gbl_FADT.sci_interrupt);
-	}
-#endif
-	/* Pre-allocate legacy irqs */
-	for (irq = 0; irq < NR_IRQS_LEGACY; irq++)
-		xen_allocate_pirq(irq);
+	for (irq = 0; irq < NR_IRQS_LEGACY; irq++) {
+		int trigger, polarity;
+
+		if (acpi_get_override_irq(irq, &trigger, &polarity) == -1)
+			continue;
+
+		xen_register_gsi(irq,
+			trigger ? ACPI_LEVEL_SENSITIVE : ACPI_EDGE_SENSITIVE,
+			polarity ? ACPI_ACTIVE_LOW : ACPI_ACTIVE_HIGH);
+	}
 }
-- 
1.6.0.6
Jeremy Fitzhardinge
2009-May-12 23:25 UTC
[Xen-devel] [PATCH 13/17] xen: initialize irq 0 too
From: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>

IRQ 0 is valid, so make sure it gets initialized properly too.
(Though in practice it doesn't matter, because it's the timer
interrupt, which we don't use under Xen.)

[ Impact: theoretical bugfix, cleanup ]

Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
---
 arch/x86/xen/pci.c |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/arch/x86/xen/pci.c b/arch/x86/xen/pci.c
index db0c74c..381b7ab 100644
--- a/arch/x86/xen/pci.c
+++ b/arch/x86/xen/pci.c
@@ -53,7 +53,7 @@ int xen_register_gsi(u32 gsi, int triggering, int polarity)
 
 	printk(KERN_DEBUG "xen: --> irq=%d\n", irq);
 
-	if (irq > 0)
+	if (irq >= 0)
 		xen_set_io_apic_routing(irq,
 					triggering == ACPI_EDGE_SENSITIVE ? 0 : 1,
 					polarity == ACPI_ACTIVE_HIGH ? 0 : 1);
-- 
1.6.0.6
Jeremy Fitzhardinge
2009-May-12 23:25 UTC
[Xen-devel] [PATCH 14/17] xen: dynamically allocate irq & event structures
From: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>

Dynamically allocate the irq_info and evtchn_to_irq arrays, so that:
 1) the irq_info array scales to the actual number of possible irqs, and
 2) we don't needlessly increase the static size of the kernel when we
    aren't running under Xen.

Derived from a patch by Mike Travis <travis@sgi.com>.

[ Impact: reduce memory usage ]

Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
---
 drivers/xen/events.c |   15 +++++++++------
 1 files changed, 9 insertions(+), 6 deletions(-)

diff --git a/drivers/xen/events.c b/drivers/xen/events.c
index 968e927..e6ddf78 100644
--- a/drivers/xen/events.c
+++ b/drivers/xen/events.c
@@ -27,6 +27,7 @@
 #include <linux/module.h>
 #include <linux/string.h>
 #include <linux/bootmem.h>
+#include <linux/irqnr.h>
 
 #include <asm/ptrace.h>
 #include <asm/irq.h>
@@ -91,11 +92,9 @@ struct irq_info
 };
 #define PIRQ_NEEDS_EOI	(1 << 0)
 
-static struct irq_info irq_info[NR_IRQS];
+static struct irq_info *irq_info;
 
-static int evtchn_to_irq[NR_EVENT_CHANNELS] = {
-	[0 ... NR_EVENT_CHANNELS-1] = -1
-};
+static int *evtchn_to_irq;
 
 struct cpu_evtchn_s {
 	unsigned long bits[NR_EVENT_CHANNELS/BITS_PER_LONG];
 };
@@ -515,7 +514,7 @@ static int find_irq_by_gsi(unsigned gsi)
 {
 	int irq;
 
-	for (irq = 0; irq < NR_IRQS; irq++) {
+	for (irq = 0; irq < nr_irqs; irq++) {
 		struct irq_info *info = info_for_irq(irq);
 
 		if (info == NULL || info->type != IRQT_PIRQ)
@@ -1180,7 +1179,11 @@ void __init xen_init_IRQ(void)
 	size_t size = nr_cpu_ids * sizeof(struct cpu_evtchn_s);
 
 	cpu_evtchn_mask_p = alloc_bootmem(size);
-	BUG_ON(cpu_evtchn_mask_p == NULL);
+	irq_info = alloc_bootmem(nr_irqs * sizeof(*irq_info));
+
+	evtchn_to_irq = alloc_bootmem(NR_EVENT_CHANNELS * sizeof(*evtchn_to_irq));
+	for (i = 0; i < NR_EVENT_CHANNELS; i++)
+		evtchn_to_irq[i] = -1;
 
 	init_evtchn_cpu_bindings();
-- 
1.6.0.6
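The subtlety in this patch is that a static array can be pre-filled with the `-1` "unmapped" sentinel via a designated-range initializer, but a dynamically allocated one must be filled by hand after allocation. A userspace sketch of the same pattern (assumptions: plain malloc() stands in for alloc_bootmem(), and NUM_CHANNELS is an arbitrary illustrative size rather than the real NR_EVENT_CHANNELS):

```c
#include <assert.h>
#include <stdlib.h>

/* Illustrative size only; not the kernel's NR_EVENT_CHANNELS. */
#define NUM_CHANNELS 64

static int *evtchn_to_irq;

static int init_evtchn_map(void)
{
	int i;

	/* malloc stands in for alloc_bootmem in this sketch */
	evtchn_to_irq = malloc(NUM_CHANNELS * sizeof(*evtchn_to_irq));
	if (!evtchn_to_irq)
		return -1;

	/* No [0 ... N-1] = -1 initializer is available for heap memory,
	 * so the "no irq bound" sentinel is written explicitly. */
	for (i = 0; i < NUM_CHANNELS; i++)
		evtchn_to_irq[i] = -1;
	return 0;
}
```

The same explicit loop appears in the patched xen_init_IRQ() above, replacing the designated initializer that the static array used.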
Jeremy Fitzhardinge
2009-May-12 23:25 UTC
[Xen-devel] [PATCH 15/17] xen: set pirq name to something useful.
From: Gerd Hoffmann <kraxel@xeni.home.kraxel.org>

Make pirq show useful information in /proc/interrupts.

[ Impact: better output in /proc/interrupts ]

Signed-off-by: Gerd Hoffmann <kraxel@xeni.home.kraxel.org>
Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
---
 arch/x86/xen/pci.c   |    3 ++-
 drivers/xen/events.c |    4 ++--
 include/xen/events.h |    2 +-
 3 files changed, 5 insertions(+), 4 deletions(-)

diff --git a/arch/x86/xen/pci.c b/arch/x86/xen/pci.c
index 381b7ab..4b286f1 100644
--- a/arch/x86/xen/pci.c
+++ b/arch/x86/xen/pci.c
@@ -49,7 +49,8 @@ int xen_register_gsi(u32 gsi, int triggering, int polarity)
 	printk(KERN_DEBUG "xen: registering gsi %u triggering %d polarity %d\n",
 	       gsi, triggering, polarity);
 
-	irq = xen_allocate_pirq(gsi);
+	irq = xen_allocate_pirq(gsi, (triggering == ACPI_EDGE_SENSITIVE)
+				     ? "ioapic-edge" : "ioapic-level");
 
 	printk(KERN_DEBUG "xen: --> irq=%d\n", irq);
 
diff --git a/drivers/xen/events.c b/drivers/xen/events.c
index e6ddf78..f84d13b 100644
--- a/drivers/xen/events.c
+++ b/drivers/xen/events.c
@@ -532,7 +532,7 @@ static int find_irq_by_gsi(unsigned gsi)
  * event channel until the irq actually started up.  Return an
  * existing irq if we've already got one for the gsi.
  */
-int xen_allocate_pirq(unsigned gsi)
+int xen_allocate_pirq(unsigned gsi, char *name)
 {
 	int irq;
 	struct physdev_irq irq_op;
@@ -554,7 +554,7 @@ int xen_allocate_pirq(unsigned gsi)
 	irq = find_unbound_irq();
 
 	set_irq_chip_and_handler_name(irq, &xen_pirq_chip,
-				      handle_level_irq, "pirq");
+				      handle_level_irq, name);
 
 	irq_op.irq = irq;
 	if (HYPERVISOR_physdev_op(PHYSDEVOP_alloc_irq_vector, &irq_op)) {
diff --git a/include/xen/events.h b/include/xen/events.h
index 6fe4863..4b19b9c 100644
--- a/include/xen/events.h
+++ b/include/xen/events.h
@@ -61,7 +61,7 @@ unsigned irq_from_evtchn(unsigned int evtchn);
 /* Allocate an irq for a physical interrupt, given a gsi.  "Legacy"
    GSIs are identity mapped; others are dynamically allocated as
    usual. */
-int xen_allocate_pirq(unsigned gsi);
+int xen_allocate_pirq(unsigned gsi, char *name);
 
 /* Return vector allocated to pirq */
 int xen_vector_from_irq(unsigned pirq);
-- 
1.6.0.6
Jeremy Fitzhardinge
2009-May-12 23:25 UTC
[Xen-devel] [PATCH 16/17] xen: fix legacy irq setup, make ioapic-less machines work.
From: Gerd Hoffmann <kraxel@xeni.home.kraxel.org>

If the machine has no IO APICs, then just allocate a set of legacy
interrupts.

[ Impact: fix Xen compatibility with old machines ]

Signed-off-by: Gerd Hoffmann <kraxel@xeni.home.kraxel.org>
Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
---
 arch/x86/xen/pci.c |    6 ++++++
 1 files changed, 6 insertions(+), 0 deletions(-)

diff --git a/arch/x86/xen/pci.c b/arch/x86/xen/pci.c
index 4b286f1..07b59fe 100644
--- a/arch/x86/xen/pci.c
+++ b/arch/x86/xen/pci.c
@@ -66,6 +66,12 @@ void __init xen_setup_pirqs(void)
 {
 	int irq;
 
+	if (0 == nr_ioapics) {
+		for (irq = 0; irq < NR_IRQS_LEGACY; irq++)
+			xen_allocate_pirq(irq, "xt-pic");
+		return;
+	}
+
 	/* Pre-allocate legacy irqs */
 	for (irq = 0; irq < NR_IRQS_LEGACY; irq++) {
 		int trigger, polarity;
-- 
1.6.0.6
From: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>

Disable MSI until we support it properly.

[ Impact: prevent MSI subsystem from crashing ]

Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
---
 arch/x86/xen/apic.c |    3 +++
 drivers/pci/pci.h   |    2 --
 include/linux/pci.h |    6 ++++++
 3 files changed, 9 insertions(+), 2 deletions(-)

diff --git a/arch/x86/xen/apic.c b/arch/x86/xen/apic.c
index 35a8af7..fece57a 100644
--- a/arch/x86/xen/apic.c
+++ b/arch/x86/xen/apic.c
@@ -1,6 +1,7 @@
 #include <linux/kernel.h>
 #include <linux/threads.h>
 #include <linux/bitmap.h>
+#include <linux/pci.h>
 
 #include <asm/io_apic.h>
 #include <asm/acpi.h>
@@ -54,6 +55,8 @@ void xen_init_apic(void)
 	if (!xen_initial_domain())
 		return;
 
+	pci_no_msi();
+
 	set_io_apic_ops(&xen_ioapic_ops);
 
 #ifdef CONFIG_ACPI
diff --git a/drivers/pci/pci.h b/drivers/pci/pci.h
index d03f6b9..79ada7b 100644
--- a/drivers/pci/pci.h
+++ b/drivers/pci/pci.h
@@ -111,10 +111,8 @@ extern struct rw_semaphore pci_bus_sem;
 extern unsigned int pci_pm_d3_delay;
 
 #ifdef CONFIG_PCI_MSI
-void pci_no_msi(void);
 extern void pci_msi_init_pci_dev(struct pci_dev *dev);
 #else
-static inline void pci_no_msi(void) { }
 static inline void pci_msi_init_pci_dev(struct pci_dev *dev) { }
 #endif
 
diff --git a/include/linux/pci.h b/include/linux/pci.h
index 72698d8..724d030 100644
--- a/include/linux/pci.h
+++ b/include/linux/pci.h
@@ -1253,5 +1253,11 @@ static inline irqreturn_t pci_sriov_migration(struct pci_dev *dev)
 }
 #endif
 
+#ifdef CONFIG_PCI_MSI
+void pci_no_msi(void);
+#else
+static inline void pci_no_msi(void) { }
+#endif
+
 #endif /* __KERNEL__ */
 #endif /* LINUX_PCI_H */
-- 
1.6.0.6
Ingo Molnar
2009-May-19 12:35 UTC
[Xen-devel] Re: [GIT PULL] Xen APIC hooks (with io_apic_ops)
* Jeremy Fitzhardinge <jeremy@goop.org> wrote:

> Hi Ingo,
>
> Here's a revised set of the Xen APIC changes which adds
> io_apic_ops to allow Xen to intercept IO APIC access operations.

In a previous discussion you said:

> IO APIC operations are not even slightly performance critical? Are
> they ever used on the interrupt delivery path?

Since they are not performance critical, why doesn't Xen catch the
IO-APIC accesses and virtualize the device?

If you want to hook into the IO-APIC code at such a low level, why
don't you hook into the _hardware_ API - i.e. catch those
setup/routing modifications to the IO-APIC space. No Linux changes
are needed in that case.

	Ingo
Jeremy Fitzhardinge
2009-May-20 17:57 UTC
[Xen-devel] Re: [GIT PULL] Xen APIC hooks (with io_apic_ops)
Ingo Molnar wrote:

> Since they are not performance critical, why doesn't Xen catch
> the IO-APIC accesses and virtualize the device?
>
> If you want to hook into the IO-APIC code at such a low level, why
> don't you hook into the _hardware_ API - i.e. catch those
> setup/routing modifications to the IO-APIC space. No Linux changes
> are needed in that case.

Yes, these changes aren't for a performance reason. It's a case where
a few lines of change in Linux save many hundreds or thousands of
lines of change in Xen.

Xen doesn't have an internal mechanism for emulating devices via
pagefaults (that's generally handled by a qemu instance running as
part of a guest domain), so there's no mechanism to map and emulate
the io-apic. Putting such support into Xen would mean adding a pile of
new infrastructure to support this case.

Unlike the mtrr discussion, where the msr read/write ops would allow
us to emulate the mtrr within the Xen-specific parts of the kernel,
the io-apic ops are just accessed via normal memory writes which we
can't hook, so it would have to be done within Xen.

The other thing I thought about was putting a hook in the Linux
pagefault handler, so we could emulate the ioapic at that level. But
putting a hook in a very hot path to avoid code changes in a cold path
doesn't make any sense. (The same applies to doing PF emulation within
Xen; that's an even hotter path than Linux's.)

	J
Avi Kivity
2009-May-24 20:10 UTC
[Xen-devel] Re: [GIT PULL] Xen APIC hooks (with io_apic_ops)
Ingo Molnar wrote:
>> IO APIC operations are not even slightly performance critical? Are
>> they ever used on the interrupt delivery path?
>
> Since they are not performance critical, why doesn't Xen catch
> the IO-APIC accesses and virtualize the device?
>
> If you want to hook into the IO-APIC code at such a low level, why
> don't you hook into the _hardware_ API - i.e. catch those
> setup/routing modifications to the IO-APIC space. No Linux changes
> are needed in that case.

When x2apic is enabled and EOI broadcast is disabled, the io apic does
become a hot path - it needs to be written for each level-triggered
interrupt EOI. In this case I might want to paravirtualize the EOI
write to exit only if an interrupt is pending; otherwise communicate
via shared memory.

We do something similar for Windows (by patching it) very
successfully; Windows likes to touch the APIC TPR ~100,000 times per
second, usually without triggering an interrupt. We hijack these
writes, do the checks in guest context, and only exit if the TPR write
would trigger an interrupt.

(kvm will likely gain x2apic support in 2.6.32; patches have already
been posted)

-- 
I have a truly marvellous patch that fixes the bug which this
signature is too narrow to contain.
Ingo Molnar
2009-May-25 03:51 UTC
[Xen-devel] Re: [GIT PULL] Xen APIC hooks (with io_apic_ops)
* Avi Kivity <avi@redhat.com> wrote:

> Ingo Molnar wrote:
>>> IO APIC operations are not even slightly performance critical? Are
>>> they ever used on the interrupt delivery path?
>>
>> Since they are not performance critical, why doesn't Xen catch the
>> IO-APIC accesses and virtualize the device?
>>
>> If you want to hook into the IO-APIC code at such a low level, why
>> don't you hook into the _hardware_ API - i.e. catch those
>> setup/routing modifications to the IO-APIC space. No Linux changes
>> are needed in that case.
>
> When x2apic is enabled and EOI broadcast is disabled, the io apic
> does become a hot path - it needs to be written for each
> level-triggered interrupt EOI. In this case I might want to
> paravirtualize the EOI write to exit only if an interrupt is
> pending; otherwise communicate via shared memory.
>
> We do something similar for Windows (by patching it) very
> successfully; Windows likes to touch the APIC TPR ~100,000 times
> per second, usually without triggering an interrupt. We hijack
> these writes, do the checks in guest context, and only exit if the
> TPR write would trigger an interrupt.

I suspect you're aware that this is about the io-apic, not the local
APIC. The local apic methods are already driver-ized - and they sit
closer to the CPU, so they matter more to performance.

> (kvm will likely gain x2apic support in 2.6.32; patches have
> already been posted)

ok. This points in the direction of the io-apic driver abstraction
from Jeremy being the right long-term approach. We already have a few
quirks that could be cleaned up by using a proper driver interface.

	Ingo
Ingo Molnar
2009-May-25 03:54 UTC
[Xen-devel] Re: [PATCH 02/17] x86: add io_apic_ops to allow interception
* Jeremy Fitzhardinge <jeremy@goop.org> wrote:

> From: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
>
> Xen dom0 needs to paravirtualize IO operations to the IO APIC, so add
> an io_apic_ops for it to intercept. Do this as an ops structure
> because there's at least some chance that another paravirtualized
> environment may want to intercept these.
>
> [Impact: indirect IO APIC access via io_apic_ops]
> Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
> ---
>  arch/x86/include/asm/io_apic.h |    9 +++++++
>  arch/x86/kernel/apic/io_apic.c |   50 +++++++++++++++++++++++++++++++++++++--
>  2 files changed, 56 insertions(+), 3 deletions(-)
>
> diff --git a/arch/x86/include/asm/io_apic.h b/arch/x86/include/asm/io_apic.h
> index 9d826e4..8cbfe73 100644
> --- a/arch/x86/include/asm/io_apic.h
> +++ b/arch/x86/include/asm/io_apic.h
> @@ -21,6 +21,15 @@
>  #define IO_APIC_REDIR_LEVEL_TRIGGER	(1 << 15)
>  #define IO_APIC_REDIR_MASKED		(1 << 16)
>
> +struct io_apic_ops {
> +	void (*init)(void);
> +	unsigned int (*read)(unsigned int apic, unsigned int reg);
> +	void (*write)(unsigned int apic, unsigned int reg, unsigned int value);
> +	void (*modify)(unsigned int apic, unsigned int reg, unsigned int value);
> +};
> +
> +void __init set_io_apic_ops(const struct io_apic_ops *);

ok, could you please turn the whole IO-APIC code into a driver
framework? I.e. all IO-APIC calls outside of
arch/x86/kernel/apic/io_apic.c should be to some io_apic-> method.

The advantage will be a proper abstraction for all IO-APIC details -
not just a minimalistic one for Xen's needs.

Also, please name it 'struct io_apic' - similar to the 'struct apic'
naming we have for the local APIC driver structure.

	Thanks,

		Ingo
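To make the pattern under discussion concrete, here is a standalone userspace sketch of the indirection the patch introduces: reads and writes go through a replaceable ops table, and a paravirt backend can swap the table out. The "hardware" below is a fake register array, and pv_read() is a purely hypothetical backend; this illustrates the mechanism, not the actual kernel code:

```c
#include <assert.h>

/* Fake register window standing in for the memory-mapped IO APIC. */
static unsigned int fake_regs[256];

struct io_apic_ops {
	void (*init)(void);
	unsigned int (*read)(unsigned int apic, unsigned int reg);
	void (*write)(unsigned int apic, unsigned int reg, unsigned int value);
};

static void native_init(void) { }

static unsigned int native_read(unsigned int apic, unsigned int reg)
{
	(void)apic;		/* single fake IO APIC in this sketch */
	return fake_regs[reg];
}

static void native_write(unsigned int apic, unsigned int reg, unsigned int value)
{
	(void)apic;
	fake_regs[reg] = value;
}

/* Default table: direct "hardware" access. */
static struct io_apic_ops io_apic_ops = {
	.init  = native_init,
	.read  = native_read,
	.write = native_write,
};

/* A paravirt backend installs its own table, like set_io_apic_ops(). */
static void set_io_apic_ops(const struct io_apic_ops *ops)
{
	io_apic_ops = *ops;
}

/* Call sites dispatch through the table instead of touching hardware. */
static unsigned int io_apic_read(unsigned int apic, unsigned int reg)
{
	return io_apic_ops.read(apic, reg);
}

static void io_apic_write(unsigned int apic, unsigned int reg, unsigned int value)
{
	io_apic_ops.write(apic, reg, value);
}

/* Hypothetical paravirt read that answers from "the hypervisor". */
static unsigned int pv_read(unsigned int apic, unsigned int reg)
{
	(void)apic; (void)reg;
	return 0x42;
}

static const struct io_apic_ops pv_ops = {
	.init  = native_init,
	.read  = pv_read,
	.write = native_write,
};
```

After set_io_apic_ops(&pv_ops), every io_apic_read() at an unchanged call site lands in the paravirt backend, which is the whole point of the hook: cold-path interception with zero changes outside the dispatch layer.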
Ingo Molnar
2009-May-25 04:10 UTC
[Xen-devel] Re: [GIT PULL] Xen APIC hooks (with io_apic_ops)
* Jeremy Fitzhardinge <jeremy@goop.org> wrote:

> Ingo Molnar wrote:
>> Since they are not performance critical, why doesn't Xen catch
>> the IO-APIC accesses and virtualize the device?
>>
>> If you want to hook into the IO-APIC code at such a low level, why
>> don't you hook into the _hardware_ API - i.e. catch those
>> setup/routing modifications to the IO-APIC space. No Linux changes
>> are needed in that case.
>
> Yes, these changes aren't for a performance reason. It's a case
> where a few lines of change in Linux save many hundreds or
> thousands of lines of change in Xen.
>
> Xen doesn't have an internal mechanism for emulating devices via
> pagefaults (that's generally handled by a qemu instance running as
> part of a guest domain), so there's no mechanism to map and
> emulate the io-apic. Putting such support into Xen would mean
> adding a pile of new infrastructure to support this case.

Note that this design problem has been created by Xen, intentionally,
and Xen is now suffering under those bad technical choices made years
ago. It's not Linux's problem.

The whole Xen design is messed up really: you have taken bits of the
Linux kernel you found interesting, turned them into a micro-kernel in
essence, and renamed it to 'Xen'. But drivers and proper architecture
are apparently boring (and fragile and hard and expensive to write and
support in a micro-kernel setup), so you came up with this DOM0 piece
of cr*p that ties Linux to Xen even closer (along an _ABI_), where
Linux does most of the real work while Xen still stays 'separate' on
paper.

Xen isn't actually useful _at all_ without Linux/DOM0. Without Dom0,
Xen is slow, and native hardware support within Xen is virtually
non-existent, as you point out above. This is proof that you should
have done all that work within Linux, instead of duplicating a lot of
code.

> Unlike the mtrr discussion, where the msr read/write ops would
> allow us to emulate the mtrr within the Xen-specific parts of the
> kernel, the io-apic ops are just accessed via normal memory writes
> which we can't hook, so it would have to be done within Xen.
>
> The other thing I thought about was putting a hook in the Linux
> pagefault handler, so we could emulate the ioapic at that level.
> But putting a hook in a very hot path to avoid code changes in a
> cold path doesn't make any sense. (Same applies to doing PF
> emulation within Xen; that's an even hotter path than Linux's.)

We already have various page fault notifiers; you could reuse them if
you wanted to.

Anyway, I'll pull the IO-APIC driver-ization changes if the result is
complete, thorough and clean, because that will obviously help Linux
too. But the influx of paravirt overhead slowing down the native
kernel really has to stop.

	Ingo
Avi Kivity
2009-May-25 04:55 UTC
[Xen-devel] Re: [GIT PULL] Xen APIC hooks (with io_apic_ops)
Ingo Molnar wrote:
>> We do something similar for Windows (by patching it) very
>> successfully; Windows likes to touch the APIC TPR ~100,000 times
>> per second, usually without triggering an interrupt. We hijack
>> these writes, do the checks in guest context, and only exit if the
>> TPR write would trigger an interrupt.
>
> I suspect you're aware that this is about the io-apic, not the local
> APIC. The local apic methods are already driver-ized - and they sit
> closer to the CPU, so they matter more to performance.

Yeah, I gave this as an example. It's very different -- io-apic vs.
local apic, paravirtualization vs. patching the guest behind its back,
Linux vs. Windows.

Of course, if we hook the io-apic EOI we'll want to hook the local
apic EOI as well.

-- 
I have a truly marvellous patch that fixes the bug which this
signature is too narrow to contain.
Ingo Molnar
2009-May-25 05:06 UTC
[Xen-devel] Re: [GIT PULL] Xen APIC hooks (with io_apic_ops)
* Avi Kivity <avi@redhat.com> wrote:

> Ingo Molnar wrote:
>>> We do something similar for Windows (by patching it) very
>>> successfully; Windows likes to touch the APIC TPR ~100,000 times
>>> per second, usually without triggering an interrupt. We hijack
>>> these writes, do the checks in guest context, and only exit if the
>>> TPR write would trigger an interrupt.
>>
>> I suspect you're aware that this is about the io-apic, not the
>> local APIC. The local apic methods are already driver-ized - and
>> they sit closer to the CPU, so they matter more to performance.
>
> Yeah, I gave this as an example. It's very different -- io-apic
> vs. local apic, paravirtualization vs. patching the guest behind
> its back, Linux vs. Windows.
>
> Of course, if we hook the io-apic EOI we'll want to hook the local
> apic EOI as well.

Yeah. Eventually anything that matters to performance will be
accelerated by hardware (and properly virtualized), which in turn will
be faster than any hypercall-based approach, right?

	Ingo
Avi Kivity
2009-May-25 05:12 UTC
[Xen-devel] Re: [GIT PULL] Xen APIC hooks (with io_apic_ops)
Ingo Molnar wrote:
> * Avi Kivity <avi@redhat.com> wrote:
>
>> Ingo Molnar wrote:
>>>> We do something similar for Windows (by patching it) very
>>>> successfully; Windows likes to touch the APIC TPR ~100,000 times
>>>> per second, usually without triggering an interrupt. We hijack
>>>> these writes, do the checks in guest context, and only exit if the
>>>> TPR write would trigger an interrupt.
>>>
>>> I suspect you're aware that this is about the io-apic, not the
>>> local APIC. The local apic methods are already driver-ized - and
>>> they sit closer to the CPU, so they matter more to performance.
>>
>> Yeah, I gave this as an example. It's very different -- io-apic
>> vs. local apic, paravirtualization vs. patching the guest behind
>> its back, Linux vs. Windows.
>>
>> Of course, if we hook the io-apic EOI we'll want to hook the local
>> apic EOI as well.
>
> Yeah. Eventually anything that matters to performance will be
> accelerated by hardware (and properly virtualized), which in turn
> will be faster than any hypercall-based approach, right?

Right. That's already happened to the TPR (Intel processors accelerate
that 4-bit register but ignore everything else in the local apic). As
another example, we have mmu paravirtualization in kvm, but
automatically disable it when the hardware does nested paging.

The problem is that hardware support has a long pipeline, and even
when support does appear, there's a massive installed base to care
about.

-- 
I have a truly marvellous patch that fixes the bug which this
signature is too narrow to contain.
Ingo Molnar
2009-May-25 05:19 UTC
[Xen-devel] Re: [GIT PULL] Xen APIC hooks (with io_apic_ops)
* Avi Kivity <avi@redhat.com> wrote:

> Ingo Molnar wrote:
>> * Avi Kivity <avi@redhat.com> wrote:
>>
>>> Ingo Molnar wrote:
>>>>> We do something similar for Windows (by patching it) very
>>>>> successfully; Windows likes to touch the APIC TPR ~100,000 times
>>>>> per second, usually without triggering an interrupt. We hijack
>>>>> these writes, do the checks in guest context, and only exit if
>>>>> the TPR write would trigger an interrupt.
>>>>
>>>> I suspect you're aware that this is about the io-apic, not the
>>>> local APIC. The local apic methods are already driver-ized - and
>>>> they sit closer to the CPU, so they matter more to performance.
>>>
>>> Yeah, I gave this as an example. It's very different -- io-apic
>>> vs. local apic, paravirtualization vs. patching the guest behind
>>> its back, Linux vs. Windows.
>>>
>>> Of course, if we hook the io-apic EOI we'll want to hook the local
>>> apic EOI as well.
>>
>> Yeah. Eventually anything that matters to performance will be
>> accelerated by hardware (and properly virtualized), which in turn
>> will be faster than any hypercall-based approach, right?
>
> Right. That's already happened to the TPR (Intel processors
> accelerate that 4-bit register but ignore everything else in the
> local apic). As another example, we have mmu paravirtualization
> in kvm, but automatically disable it when the hardware does nested
> paging. The problem is that hardware support has a long pipeline,
> and even when support does appear, there's a massive installed
> base to care about.

Yeah. Btw., I also think that in-kernel IO-APIC and APIC emulation
could have uses elsewhere as well - such as in testing. Currently you
actually have to own a big box to be able to test certain hardware
limits. This has a negative effect on test coverage and a subsequent
negative effect on kernel quality.

If KVM provided clean code to emulate certain hw environments, we
could check out limits (and our bugs) far more effectively.

	Ingo
George Dunlap
2009-May-26 12:46 UTC
Re: [Xen-devel] Re: [GIT PULL] Xen APIC hooks (with io_apic_ops)
On Mon, May 25, 2009 at 5:10 AM, Ingo Molnar <mingo@elte.hu> wrote:> Note that this design problem has been created by Xen, > intentionally, and Xen is now suffering under those bad technical > choices made years ago. It''s not Linux''s problem.I''d like to respecfully disagree with this. I think I can see your point of view: you''re being asked to make changes to accommodate a project you''re not involved in, and whose fundamental design you disagree with. And no one disagrees with the stance that changes to accomodate Xen must not impact native performance. But I think the current design (with dom0 running linux-as-hypervisor-component) is the best one, and it''s one we would make over again if we had to start from scratch. Basically, there are three ways to approach the hypervisor problem wrt Linux: 1. Make Linux into a hypervisor (linux-as-hypervisor). This is the KVM approach. 2. Fork Linux, stealing all the device drivers, and making a monolithic hypervisor. 3. Make a small, lean hypervisor, but leverage Linux to run the devices and control stack (linux-as-hypervisor-component). I''ve worked a bit at both kernel and hypervisor level (although admittedly much more in-depth at the hypervisor level). It seems to me that being a hypervisor is a much different thing than being a kernel. I don''t believe that one piece of software can do both well. And I believe that, when it begins to mature more, KVM will run into the very same issue. KVM developers will really want to start to make the kernel into a hypervisor, and there will be a disagreement between those who want the kernel to be just a kernel, and those who want the kernel also to be a hypervisor. The result will be either a heavily modified Linux (much more than linux-as-hypervisor-component) or a really sucky hypervisor. As a simple example, take scheduling. I''m about to re-write the Xen scheduler, and in the process I took a good look at the scheduler you wrote. 
I think it's got a lot of really good ideas, which I plan to steal. :-) However, I'm going to have to make some key changes in order for it to function well as a hypervisor scheduler. If KVM is used on a production server with 20 or 30 multi-vcpu VMs, I predict the current scheduler will do very poorly, because it wasn't designed with VMs in mind, but with processes. Making changes so that VMs run better will fundamentally make processes run less well.

Forking Linux, drivers and all, is not a good idea; anyone would have to be a fool to try it. I think if you thought seriously about it, you'd never do something like that. I don't believe such a project would have a snowball's chance in hell of attracting anywhere near the required number of hardware developers to make it an enterprise-class system. If, somehow, it did manage to attract a critical mass to make it viable, the result would be two much weaker projects, wasting millions of man-hours of labor on unnecessary duplication.

No, I think the best option, and the option the Xen project would take again if we were to start from scratch, is what we have done: build a hypervisor to be a hypervisor, and let the kernel be a kernel, but leverage the millions of man-hours still being put into hardware support for Linux.

Either way, time will tell in the end. If I'm wrong, and KVM can become an enterprise-class hypervisor while playing well with linux-as-kernel, then eventually it will dominate and Xen will die out. You can say "I told you so" and remove all the crap you've been objecting to. If I'm right, however, then having Xen around will be critical, not just for open-source virtualization, but for the kernel as well. You'll be happy to be able to tell people, "Don't put this hypervisor crap in here. If you want a hypervisor, go to Xen." :-)

Until things are shown clearly one way or the other, the best thing to do is hedge your bets, and allow both projects to develop.
[That's my main point; in-line responses below.]

> The whole Xen design is messed up really: you have taken off bits of
> the Linux kernel you found interesting, turned them into a
> micro-kernel in essence and renamed it to 'Xen'.

That's how Xen started, and that's really the beauty of open-source. (After all, KVM has stolen some ideas from the Xen shadow code.) But since then, basically all of the code has been replaced with Xen-written code. I think if you did an SCO-style audit comparing Linux and Xen 3.4, you'd find a lot less in common than you think.

> But drivers and proper architecture is apparently boring (and
> fragile and hard and expensive to write and support in a
> micro-kernel setup) so you came up with this DOM0 piece of cr*p that
> ties Linux to Xen even closer (along an _ABI_), where Linux does
> most of the real work while Xen still stays 'separate' on paper.

It's not boring, it's just a colossal waste of time and resources to duplicate all that effort. "Real work" is done by all of the components: Xen does the "real work" of scheduling and resource management; Linux does the "real work" of process-level stuff, filesystems, and so on, and (in the case of dom0) hardware support; qemu does the "real work" of device emulation. All of them are unique, difficult, and interesting to somebody. Reducing duplication means everyone can work on what interests them the most, and minimizes the total "busy work" for all involved.

How many KVM developers are working on device drivers? And how would Xen duplicating all the driver development help Linux? Linux would still have to do everything; there'd just be fewer developers to do it (since some people would be working on Xen drivers instead).

> Xen isn't actually useful _at all_ without Linux/DOM0.
> Without Dom0
> Xen is slow and native hardware support within Xen is virtually
> non-existent, as you point out above.

And qemu-kvm isn't useful _at_all_ without Linux either; and Linux-KVM isn't useful _at_all_ without qemu. Your point?

Xen will run without dom0? I wasn't aware of that... ;-)

> This is proof that you should have done all that work within Linux -
> instead of duplicating a lot of code.

See above.

 -George Dunlap

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel
Avi Kivity
2009-May-26 18:26 UTC
Re: [Xen-devel] Re: [GIT PULL] Xen APIC hooks (with io_apic_ops)
George Dunlap wrote:
> As a simple example, take scheduling. I'm about to re-write the Xen
> scheduler, and in the process I took a good look at the scheduler you
> wrote. I think it's got a lot of really good ideas, which I plan to
> steal. :-) However, I'm going to have to make some key changes in
> order for it to function well as a hypervisor scheduler. If KVM is
> used on a production server with 20 or 30 multi-vcpu VMs, I predict
> the current scheduler will do very poorly, because it wasn't designed
> with VMs in mind, but with processes. Making changes so that VMs run
> better will fundamentally make processes run less well.

The Linux scheduler already supports multiple scheduling classes. If we find that none of them fits our needs, we'll propose a new one. There are also multiple I/O schedulers, multiple allocators (perhaps a bad example), and multiple filesystems. When the need can be demonstrated to be real, and the implementation can be clean, Linux can usually be adapted.

I think the Xen design has merit if it can truly make dom0 a guest -- that is, if it can survive dom0 failure. Until then, you're just taking a large interdependent codebase and splitting it at some random point, but you don't get any stability or security in return. It will also be interesting to see how far Xen can get along without real memory management (overcommit).

>> The whole Xen design is messed up really: you have taken off bits of
>> the Linux kernel you found interesting, turned them into a
>> micro-kernel in essence and renamed it to 'Xen'.
>
> That's how Xen started, and that's really the beauty of open-source.
> (After all, KVM has stolen some ideas from the Xen shadow code.) But
> since then, basically all of the code has been replaced with
> Xen-written code. I think if you did an SCO-style audit comparing
> Linux and Xen 3.4, you'd find a lot less in common than you think.
> A lot of the arch code is derived from Linux.

>> Xen isn't actually useful _at all_ without Linux/DOM0. Without Dom0
>> Xen is slow and native hardware support within Xen is virtually
>> non-existent, as you point out above.
>
> And qemu-kvm isn't useful _at_all_ without Linux either; and Linux-KVM
> isn't useful _at_all_ without qemu. Your point?

kvm is actually being used by other userspaces.

--
Do not meddle in the internals of kernels, for they are subtle and quick to panic.
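The pluggable scheduling-class mechanism Avi refers to can be sketched very roughly as follows. This is a hypothetical toy, not the kernel's actual interface (the real one is struct sched_class with many more hooks): each class exposes a pick-next operation, and the core walks the classes in priority order, so a new policy can be added without touching the core loop.

```c
#include <stddef.h>

/* Toy model of priority-ordered scheduling classes (hypothetical names). */

struct task {
    int id;
    struct task *next;
};

struct sched_class {
    const char *name;
    /* Return a runnable task from this class, or NULL if it has none. */
    struct task *(*pick_next)(void);
};

/* A "realtime-like" class with its own run queue... */
static struct task *rt_queue;
static struct task *rt_pick_next(void) { return rt_queue; }

/* ...and a "fair" class consulted only when the higher class is empty. */
static struct task *fair_queue;
static struct task *fair_pick_next(void) { return fair_queue; }

static struct sched_class classes[] = {
    { "rt",   rt_pick_next },
    { "fair", fair_pick_next },
};

/* Core scheduler: walk classes in priority order.  A hypervisor-oriented
 * policy could be slotted in as another class without changing this loop. */
struct task *schedule_pick(void)
{
    for (size_t i = 0; i < sizeof(classes) / sizeof(classes[0]); i++) {
        struct task *t = classes[i].pick_next();
        if (t)
            return t;
    }
    return NULL;  /* idle */
}
```

The design point of contention in the thread is exactly which hooks such a class would need to schedule VMs well, not whether the class mechanism itself can host new policies.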
Dan Magenheimer
2009-May-26 19:18 UTC
RE: [Xen-devel] Re: [GIT PULL] Xen APIC hooks (with io_apic_ops)
> It will also be
> interesting to see how far Xen can get along without real memory
> management (overcommit).

Several implementations of "classic" memory overcommit have been done for Xen, most recently the Difference Engine work at UCSD. It is true that none have been merged yet, in part because, in many real-world environments, "generalized" overcommit often leads to hypervisor swapping, and performance becomes unacceptable. (In other words, except in certain limited customer use models, memory overcommit is a "marketing feature".)

There's also a novel approach, Transcendent Memory (aka "tmem", see http://oss.oracle.com/projects/tmem). Though tmem requires the guest to participate in memory management decisions (thus requiring a Linux patch), system-wide physical memory efficiency may improve vs. memory deduplication, and hypervisor-based swapping is not necessary.

> The Linux scheduler already supports multiple scheduling
> classes. If we find that none of them will fit our needs, we'll
> propose a new one. When the need can be demonstrated to be real,
> and the implementation can be clean, Linux can usually be adapted.

But that's exactly George and Jeremy's point. KVM will eventually require changes that clutter Linux for purposes that are relevant only to a hypervisor.

>> I think if you did an SCO-style audit comparing
>> Linux and Xen 3.4, you'd find a lot less in common than you think.
>
> A lot of the arch code is derived from Linux.

Indeed it is, but the operative word is "derived". In many cases, the code has been modified to be more applicable to a hypervisor. For example, in Xen, tmem uses radix trees in a way that is similar to Linux but different enough that the changes would not likely be acceptable in Linux. The separation between Xen and Linux allows this diversity without cluttering Linux.
I think we can all agree that drawing boundaries between "hypervisor" functionality and "operating system" functionality is a work in progress and may take many more years to settle. In the meantime, there should be room (and support) for different approaches.
Avi Kivity
2009-May-26 19:41 UTC
Re: [Xen-devel] Re: [GIT PULL] Xen APIC hooks (with io_apic_ops)
Dan Magenheimer wrote:
>> It will also be
>> interesting to see how far Xen can get along without real memory
>> management (overcommit).
>
> Several implementations of "classic" memory overcommit have been
> done for Xen, most recently the Difference Engine work at UCSD.
> It is true that none have been merged yet, in part because,
> in many real world environments, "generalized" overcommit
> often leads to hypervisor swapping, and performance becomes
> unacceptable. (In other words, except in certain limited customer
> use models, memory overcommit is a "marketing feature".)

Swapping indeed drags performance down horribly. I regard it as a last-resort solution, used when everything else (page sharing, compression, ballooning, live migration) has failed. By having that last resort, you can actually use the other methods without fearing an eventual out-of-memory condition.

Note that with SSDs, disks have started to narrow the gap between memory and secondary storage access times, so swapping will actually start improving rather than regressing as it has done in recent times.

> There's also a novel approach, Transcendent Memory (aka "tmem"
> see http://oss.oracle.com/projects/tmem). Though tmem requires the
> guest to participate in memory management decisions (thus requiring
> a Linux patch), system-wide physical memory efficiency may
> improve vs memory deduplication, and hypervisor-based swapping
> is not necessary.

Yes, I've seen that. Another tool in the memory management arsenal.

>> The Linux scheduler already supports multiple scheduling
>> classes. If we find that none of them will fit our needs, we'll
>> propose a new one. When the need can be demonstrated to be real,
>> and the implementation can be clean, Linux can usually be adapted.
>
> But that's exactly George and Jeremy's point. KVM will
> eventually require changes that clutter Linux for purposes
> that are relevant only to a hypervisor.

kvm has already made changes to Linux.
Preemption notifiers allow us to have a lightweight exit path, and mmu notifiers allow the Linux mmu to control the kvm mmu. And in fact mmu notifiers have proven useful to device drivers.

It also works the other way around; for example, work on cpu controllers will benefit kvm, and the real-time scheduler will also apply to kvm guests. In fact many scheduler and memory management features immediately apply to kvm, usually without any need for integration.

--
Do not meddle in the internals of kernels, for they are subtle and quick to panic.
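The notifier mechanisms Avi mentions all follow the same shape: a subsystem such as kvm registers a callback, and the core kernel invokes every registered callback at the interesting event. A minimal, self-contained sketch of that pattern (hypothetical names; the kernel's preemption and mmu notifiers carry far richer per-event arguments):

```c
/* Toy notifier chain (hypothetical names), illustrating the hook
 * pattern behind preemption notifiers and mmu notifiers. */

struct notifier {
    /* Callback invoked by the core at the event; the notifier itself is
     * passed so an embedding struct can be recovered by the subscriber. */
    void (*on_event)(struct notifier *n);
    struct notifier *next;
};

static struct notifier *chain;

/* A subsystem (e.g. a hypervisor module) subscribes to the event. */
void notifier_register(struct notifier *n)
{
    n->next = chain;
    chain = n;
}

/* Called by the core when the event occurs, e.g. a task being
 * preempted or a range of the mm being invalidated. */
void notifier_fire(void)
{
    for (struct notifier *n = chain; n; n = n->next)
        n->on_event(n);
}
```

The point of the pattern is that the generic code stays ignorant of its subscribers, which is why the same hooks ended up useful to device drivers as well as to kvm.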
Gerd Hoffmann
2009-May-26 21:19 UTC
Re: [Xen-devel] Re: [GIT PULL] Xen APIC hooks (with io_apic_ops)
On 05/26/09 14:46, George Dunlap wrote:
> On Mon, May 25, 2009 at 5:10 AM, Ingo Molnar<mingo@elte.hu> wrote:
>> Note that this design problem has been created by Xen,
>> intentionally, and Xen is now suffering under those bad technical
>> choices made years ago. It's not Linux's problem.
>
> I'd like to respectfully disagree with this.

Well. Xen *does* suffer from bad technical choices made years ago. I'm pretty sure Xen would look radically different if it were rewritten from scratch today.

One reason is that Xen predates vt and svm. With that in mind, some of the xen interface bits don't look *that* odd any more; back then it did make sense to handle things that way. The ioapic hypercalls discussed in this thread belong to that group, IMHO.

Another reason is that Xen wasn't "designed". Xen was "hacked up". As far as I know there is no document which describes the overall design of the guest/xen ABI. Also there is no documentation (other than code) which describes all details of the guest/xen ABI. Simple reason: the ABI wasn't designed. It was hammered into shape until it worked. On x86. The guys who attempted (and failed) to port xen to ppc had a lot of *ahem* fun with that stuff; for example, passing guest virtual addresses in (some) hypercalls. Also, direct paging mode is very x86-ish and is the reason for a number of ia64 ifdefs in places where you don't expect them ...

cheers,
  Gerd
Jeremy Fitzhardinge
2009-May-27 07:17 UTC
[Xen-devel] Re: [PATCH 02/17] x86: add io_apic_ops to allow interception
Ingo Molnar wrote:
> ok, could you please turn the whole IO-APIC code into a driver
> framework? I.e. all IO-APIC calls outside of
> arch/x86/kernel/apic/io_apic.c should be to some io_apic-> method.
>
> The advantage will be a proper abstraction for all IO-APIC details -
> not just a minimalistic one for Xen's need.
>
> Also, please name it 'struct io_apic' - similar to the 'struct apic'
> naming we have for the local APIC driver structure.

OK, I'll have a look at it. I think it could turn out quite nicely, and possibly remove the need for some other Xen hooks around the place, as well as make the path for some other upcoming things clearer.

But in the meantime, would you consider taking the minimal ops approach for this next merge window, and the full API in the next dev cycle?

Thanks,
  J
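The "minimal ops" idea under discussion can be sketched in miniature. The names and the fake register array below are illustrative only (the actual patch is in arch/x86/include/asm/io_apic.h and io_apic.c of the pulled series): accessors default to native register operations, all callers go through the ops table, and Xen swaps in its own implementations at boot.

```c
/* Illustrative sketch of an io_apic_ops-style indirection layer.
 * Hypothetical names and a fake register file stand in for real
 * IO APIC MMIO access and Xen hypercalls. */

struct io_apic_ops {
    unsigned int (*read)(unsigned int apic, unsigned int reg);
    void (*write)(unsigned int apic, unsigned int reg, unsigned int value);
};

/* Fake "hardware": one IO APIC with a handful of registers. */
static unsigned int fake_regs[16];

static unsigned int native_io_apic_read(unsigned int apic, unsigned int reg)
{
    (void)apic;
    return fake_regs[reg];
}

static void native_io_apic_write(unsigned int apic, unsigned int reg,
                                 unsigned int value)
{
    (void)apic;
    fake_regs[reg] = value;
}

/* A Xen-style backend would issue a hypercall instead of touching the
 * registers; here we just tag the value so interception is observable. */
static void xen_io_apic_write(unsigned int apic, unsigned int reg,
                              unsigned int value)
{
    (void)apic;
    fake_regs[reg] = value | 0x80000000u;  /* stand-in for a hypercall */
}

static struct io_apic_ops ops = {
    .read  = native_io_apic_read,
    .write = native_io_apic_write,
};

/* All IO-APIC accesses outside io_apic.c would funnel through these. */
unsigned int io_apic_read(unsigned int apic, unsigned int reg)
{
    return ops.read(apic, reg);
}

void io_apic_write(unsigned int apic, unsigned int reg, unsigned int value)
{
    ops.write(apic, reg, value);
}

/* Boot-time hook: Xen replaces the defaults before the IO APIC is used. */
void xen_init_io_apic_ops(void)
{
    ops.write = xen_io_apic_write;
}
```

Ingo's request is essentially to widen this from two or three hooks into a full driver interface covering every IO-APIC detail.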
George Dunlap
2009-May-27 10:14 UTC
Re: [Xen-devel] Re: [GIT PULL] Xen APIC hooks (with io_apic_ops)
On Tue, May 26, 2009 at 10:19 PM, Gerd Hoffmann <kraxel@redhat.com> wrote:
> Well. Xen *does* suffer from bad technical choices made years ago. I'm
> pretty sure Xen would look radically different when being rewritten from
> scratch today.

That may be. I don't know enough about the specific issues you raise below to comment. But Ingo wasn't bringing up those issues: he was disagreeing with the whole idea of including dom0 Linux as a key component of the Xen system. If the Xen project were to start over from scratch, we might make a lot of different decisions; but running Linux as the hypervisor (as KVM does) or forking Linux (as Ingo seemed to suggest) are not among them.

 -George
Ingo Molnar
2009-May-28 00:13 UTC
Re: [Xen-devel] Re: [GIT PULL] Xen APIC hooks (with io_apic_ops)
* Dan Magenheimer <dan.magenheimer@oracle.com> wrote:
>> The Linux scheduler already supports multiple scheduling
>> classes. If we find that none of them will fit our needs, we'll
>> propose a new one. When the need can be demonstrated to be
>> real, and the implementation can be clean, Linux can usually be
>> adapted.
>
> But that's exactly George and Jeremy's point. KVM will eventually
> require changes that clutter Linux for purposes that are relevant
> only to a hypervisor.

That's wrong. Any such scheduler classes would also help control groups, containers, vserver, UML, and who knows what other isolation project. Many such mechanisms are already implemented as well. I rarely see any KVM-only feature in generic kernel code, and that's good. Xen changes - especially dom0 - are overwhelmingly not about improving Linux, but about having some special hook and extra treatment in random places - and that's really bad.

I also find it pretty telling that you cut out the most important point of Avi's reply:

> > I think the Xen design has merit if it can truly make dom0 a
> > guest -- that is, if it can survive dom0 failure. Until then,
> > you're just taking a large interdependent codebase and splitting
> > it at some random point, but you don't get any stability or
> > security in return.

That crucial question really has to be answered honestly and upfront.

	Ingo
Jeremy Fitzhardinge
2009-May-28 00:49 UTC
Re: [Xen-devel] Re: [GIT PULL] Xen APIC hooks (with io_apic_ops)
Ingo Molnar wrote:
> I also find it pretty telling that you cut out the most important
> point of Avi's reply:
>
>>> I think the Xen design has merit if it can truly make dom0 a
>>> guest -- that is, if it can survive dom0 failure. Until then,
>>> you're just taking a large interdependent codebase and splitting
>>> it at some random point, but you don't get any stability or
>>> security in return.
>
> that crucial question really has to be answered honestly and
> upfront.

Xen, the hypervisor itself, doesn't require any services from dom0. From its perspective, dom0 is just another guest domain, though with enough privileges to access hardware. Dom0's job is to provide device access to other, less privileged domains.

There is currently some system-wide information which is stored in a usermode daemon in dom0. Recovering from its loss is hard, but there is a prototype to pull that daemon out into its own special-purpose domain. At that point, dom0 can reboot without affecting any of the other domains or Xen itself.

If dom0 goes away, the other domains will get a disconnect and temporarily lose access to their devices, but they can cope with that. From their perspective, it would look like they'd just been save/restored or migrated to another machine. When dom0 comes back, they'll reconnect and carry on.

The disaggregation of dom0's functions is something that the Xen development community is actively pursuing.

   J
Dan Magenheimer
2009-May-28 03:47 UTC
RE: [Xen-devel] Re: [GIT PULL] Xen APIC hooks (with io_apic_ops)
> * Dan Magenheimer <dan.magenheimer@oracle.com> wrote:
>>> The Linux scheduler already supports multiple scheduling
>>> classes. If we find that none of them will fit our needs, we'll
>>> propose a new one. When the need can be demonstrated to be
>>> real, and the implementation can be clean, Linux can usually be
>>> adapted.
>>
>> But that's exactly George and Jeremy's point. KVM will eventually
>> require changes that clutter Linux for purposes that are relevant
>> only to a hypervisor.
>
> That's wrong. Any such scheduler classes would also help: control
> groups, containers, vserver, UML and who knows what other isolation
> project. Many of such mechanisms are already implemented as well.

I think you are missing the point. Yes, certainly, generic scheduler code can be written that applies to all of these uses. But will that be the same code that is best for KVM to succeed in an enterprise-class virtual data center? I agree with George that it will not; generic code and optimal code are rarely the same thing. What's best for an operating system is not always what's best for a hypervisor. But we are both speculating; I guess only time will tell.

> I also find it pretty telling that you cut out the most important
> point of Avi's reply:
>
>>> I think the Xen design has merit if it can truly make dom0 a
>>> guest -- that is, if it can survive dom0 failure. Until then,
>>> you're just taking a large interdependent codebase and splitting
>>> it at some random point, but you don't get any stability or
>>> security in return.
>
> that crucial question really has to be answered honestly and
> upfront.

I cut it out because I thought others would be more qualified to answer, but since nobody else has, I will. Absolutely there is work going on to survive failure of dom0 (or any domain)! This is a must for enterprise-grade availability and security, such as is needed for huge corporate data centers and "clouds".
However, the majority of users (individuals and small businesses) will probably be most happy with their distro (and distro kernel) as dom0, since it is convenient and familiar.
Luke S Crawford
2009-May-28 12:03 UTC
Distro kernel and 'virtualization server' vs. 'server that sometimes runs virtual instances' rant (was: Re: [Xen-devel] Re: [GIT PULL] Xen APIC hooks (with io_apic_ops))
Dan Magenheimer <dan.magenheimer@oracle.com> writes:
> such as is needed for huge corporate data centers and "clouds".
> However, the majority of users (individuals and small businesses)
> will probably be most happy with their distro (and distro kernel)
> as dom0 since it is convenient and familiar.

Hey, I'm going to hijack this for a rant, if you don't mind. I've pared back the cc list.

Now, I'm just the janitor; I rightly belong on xen-users. I lurk in -devel because it's good to know what the developers are thinking. I don't want to come off like I think I'm smarter than anyone here; I don't. But I am a heavy user of Xen, and I don't think I'm an unusual user of Xen.

I've been selling VPSs using Xen since 2005. After the marketing people convince the middle managers that virtualization is the way to go, someone like me has to actually bang on the thing with a spanner or rub it with a greasy rag until it works. I also do contracting for some of those 'large corporate data centers' of which you speak. (Corporate data centers seem to be the worst in terms of operational efficiency. Do you know how many Linux installations I've seen where the customer pays a few hundred extra per box for integrated KVM-over-IP functionality rather than the much cheaper and more useful serial consoles? Oy. You expect me to tell you why your server crashed when you have no console logs of the backtrace?)

But I'm getting sidetracked. My point is that small companies need good tools more than large corporations do. The big guys can just keep throwing money at the problem until their stuff mostly works.

In the Dom0, I want something as stable, minimal, standard, and supported as possible. (By supported, I really mean standard and widely used; the best 'support' is searching mailing lists such as this one.) The last thing I want is all the cowboy hackery that goes into my favorite desktop OS to be included in my Xen Dom0.
I moved to Ubuntu on my laptop last year and I was amazed how easy it was. Everything just worked; making new hardware work was easier than on Windows. But do I want that on my Xen Dom0? Certainly not until you get that thing working where I can reboot the Dom0 without killing everything.

This is what I think is wrong about the default install of Xen: it is set up so that you can run your desktop in the dom0 and spin up DomUs as needed. It tries to be a virtualization server and a desktop at the same time, and it gives up stability for this. If you've ever run a Xen host and have forgotten to change the default dom0-min-mem of 192MiB, you'd know most (especially x86_64) Linux installations are not stable under load with that much memory. Even if you set dom0-min-mem to a reasonable value, I've seen enough problems related to ballooning that I always disable it on all hosts. (Granted, most of those problems are on old RHEL installs.)

Also, in all the environments I work in, both at the 'large corporate data center' and my own, the configuration of the DomUs is fairly static. Sure, it's kind of nice to be able to add resources without rebooting the DomU, but to give up stability for that is just crazy. If your customer says "I want more X" it's fine to say "OK, let me know when I can reboot you" - it's not OK to crash.

In my work, people mostly use the 'I take this Linux box, I set it up, and I use it for three years' model. They don't need any of the fancy 'computing on demand' - they just want to move 16 of those crusty P3 servers that are killing their power bill and crashing due to bad hardware twice a month onto a nice shiny new 8-core box with 32GiB of RAM and a warranty. I've seen lots of people who buy ec2 instances and do the same thing; they leave them on all the time. (The basic ec2 instances are particularly unsuited to this usage, but people do it anyhow.)
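For anyone hitting the same defaults: the dom0-min-mem setting lives in xend's configuration file. A sketch with illustrative values (paths and numbers will vary by distro; adjust for your own dom0):

```
# /etc/xen/xend-config.sxp -- illustrative values
# Raise the floor below which xend may balloon dom0 down:
(dom0-min-mem 1024)

# Alternatively, pin dom0's allocation at boot and avoid ballooning it
# at all, by adding dom0_mem=1024M to the Xen line in the bootloader.
```

The second approach matches the "always disable ballooning" practice described above: dom0 simply never gives memory back.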
I'm not going to say memory overcommit is never useful for anyone, but I can say it is never useful for me. 32GiB of registered ECC DDR2 is around $600. That's not very many billable hours, and it's around half the approximate cost of an unplanned reboot of one of my servers. (I'm only counting money lost due to SLA and time to clean up; if you count loss to reputation, it gets even worse.)

My experience with memory overcommit has been that it makes your system unstable, slow, or both. Now, I don't know if you could theoretically make a zero-cost memory overcommit system; I'm just saying that every attempt at overcommitting memory between virtual servers I have seen ended in tears. (Heck, I've seen quite a lot of tearful endings due to the memory overcommit Linux itself does.) This is why I ditched FreeBSD jails and came to Xen in 2005.

Right now, I'm using CentOS5 with the xen.org kernels, but it sure would be nice if there were some pared-down pre-built dom0 configuration available. (I personally give my Dom0 1024MiB out of 32GiB.) It could be based on centos, or on ttylinux, or whatever - just something standard, small, and simple. Make it good enough that people use it. When I see a problem, I want fifty other guys to have seen the problem first. I'm thinking about starting such a project myself once I get a few other things done. If nothing else, I can distribute kickstart files of a minimal dom0.

Going forward, now that NetBSD 5 is out, perhaps I will switch back to NetBSD as my Dom0. (I switched from FreeBSD jails to NetBSD/Xen2 in 2005, and switched away for pae/x86_64 support. I mean, no pae sucked, but the OS was solid.) Unfortunately, that means I would have less 'support' in the form of other people doing the same thing and talking about it in public.
RedHat is talking about doing it with KVM - see the Red Hat Enterprise Virtualization hypervisor - they claim you will have a KVM 'dom0' that uses only 64M of RAM. Which seems funny to me, as my perception of KVM has always been that it was optimized to run virtual instances as needed on a box that usually ran applications on the bare metal, like a desktop.

--
Luke S. Crawford
http://prgmr.com/xen/ - Hosting for the technically adept
http://nostarch.com/xen.htm  We don't assume you are stupid.
Tim Post
2009-May-28 13:39 UTC
Re: Distro kernel and 'virtualization server' vs. 'server that sometimes runs virtual instances' rant (was: Re: [Xen-devel] Re: [GIT PULL] Xen APIC hooks (with io_apic_ops))
On Thu, 2009-05-28 at 08:03 -0400, Luke S Crawford wrote:
> Dan Magenheimer <dan.magenheimer@oracle.com> writes:
>> such as is needed for huge corporate data centers and "clouds".
>> However, the majority of users (individuals and small businesses)
>> will probably be most happy with their distro (and distro kernel)
>> as dom0 since it is convenient and familiar.

> I've been selling VPSs using Xen since 2005. After the
> marketing people convince the middle managers that virtualization is the
> way to go, someone like me has to actually bang on the thing with a spanner
> or rub it with a greasy rag until it works.

So have I, since (pre) 2.0.7. I was one of the first (and only) to offer OpenSSI (paravirtualized) as an offering.

> I also do contracting for some of those 'large corporate data centers' of
> which you speak. (corporate data centers seem to be the worst in terms of
> operational efficiency. do you know how many Linux installations I've seen
> where the customer pays a few hundred extra per box for integrated KVM over
> IP functionality rather than the much cheaper and more useful serial
> consoles? Oy. You expect me to tell you why your server crashed when
> you have no console logs of the backtrace?)

They are in business to make money, which is why real system integrators flourish and stand out from the crowd who read "linux for dummies version (x), now including KVM!!" You either know how Xen and Linux work or you don't. Most DC "hands and eyes" just follow a pre-set procedure and can't be bothered to deviate from it or handle special cases. Again, that's why we have jobs.

> But I'm getting sidetracked. My point is that small companies need good
> tools more than large corporations do. The big guys can just keep
> throwing money at the problem until their stuff mostly works.

Here we go again. Writing your own tools is not too difficult, and it makes you money, using LGPL libraries that are (reasonably) self-explanatory. Xen is a tool in your toolbox.
All too often many fail to realize the difference between Xen the hypervisor and the tools provided.

> the last thing I want is all the cowboy hackery that goes into my favorite
> desktop OS to be included in my Xen Dom0. I moved to Ubuntu on my laptop
> last year and I was amazed how easy it was. everything just worked.
> making new hardware work was easier than windows.

Desktop OS? We have to draw a line here. There is desktop virtualization and there is server virtualization. If you want to try xyz-distro on your desktop, use Virtualbox. If you want to put virtual machines to work, use Xen. What, exactly, is cowboy hackery? A dom-0 that might be a little slower if you boot it without Xen?

> But do I want that on my Xen Dom0? certainly not until you get that thing
> working where I can reboot the Dom0 without killing everything.

Mmmm, then work on getting xenstored into a stub domain.

> This is what I think is wrong about the default install of Xen; it is setup
> so that you can run your desktop in the dom0, and spin up DomUs as needed.
> It tries to be a virtualization server and a desktop at the same time,
> and it gives up stability for this.

The only reason you should be using Xen on a desktop is to test stuff that you want to propagate to servers. You've already said that you make your living as an integrator selling the use of computers that run Xen. Xen is meant for production; it can also be used on a desktop.

> If you've ever run a Xen host and have forgotten to change the default
> dom0-min-mem of 192MiB, you'd know most (especially x86_64) linux
> installations are not stable under load with that much memory.

I have, and I don't forget to change it.

> In my work, people mostly use the 'I take this Linux box, I set it up,
> and I use it for three years' model.
They don''t need any of the fancy > ''computing on demand'' - they just want to move 16 of those crusty P3 > servers that are killing their power bill and crashing due to bad hardware > twice a month on to a nice shiny new 8 core box with 32GiB ram and a > warranty. I''ve seen lots of people who buy ec2 instances and do the > same thing; they leave it on all the time. (the basic ec2 instances are > particularly unsuited to this usage, but people do it anyhow.)Have you even looked at / tried Eucalyptus ?> I''m not going to say memory overcommit is never useful for anyone; > but I can say it is never useful for me. 32GiB registered ecc ddr2 > is around $600. That''s not very many billable hours. That''s around > half the approximate cost of an unplanned reboot of one of my servers. > (I''m only counting money lost due to SLA and time to clean up; if you > count loss to reputation, it gets even worse)I don''t have this problem. I export PV guest vitals over xenbus and set up watches on them. As for overcommitment, the first step is knowing how much memory each domain''s kernel has actually promised to running processes. That much is already in the tree.> Right now, I''m using CentOS5 with the xen.org kernels, but it sure > would be nice if there was some pared down pre-built dom0 configuration > available. (I personally give my Dom0 1024MiB out of 32GiB) It could be > based on centos, or on ttylinux, or whatever. just something standard, small, > and simple. Make it good enough that people use it. When I see a problem, > I want fifty other guys to have seen the problem first.I don''t want to seem combative or antagonistic .. however, if I give you a screw driver and a wrench, I''d expect that you''d use them in your own way. Xen is no different.> I''m thinking about starting such a project myself once I get a few other > things done. If nothing else, I can distribute kickstart files of a minimal > dom0.Just as many others have done with debootstrap. 
I know your frustrated with dom-0 not being in mainline, we all are. However, it seems the tools frustrate you the most. Xen gives us a solid hypervisor, solid low level libraries and some examples on how to use them. I can''t see (at this point) why you are so seemingly disgruntled?> RedHat is talking about doing it with KVM - see the Red Hat Enterprise > Virtualization hypervisor - they claim you will have a KVM ''dom0'' that > uses only 64M ram- which seems funny to me, as my perception of KVM has > always been that it was optimized to run virtual instances as needed on > a box that usually ran applications on the bare metal, like a desktop.Eh, that funny thing we call "market research" influences that. People want easy desktop virtualization. Desktop virtualization is _most_decidedly_not_ IAAS. There will _always_ be a market for people who can make tools (or modify the existing ones) to suit some need. I agree with some of what you have to say, I always appreciate a rant and I do not mean to seem unfriendly .. however, I also fail to see the basis? Maybe I missed something, entirely possible. Cheers, --Tim _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
George Dunlap
2009-May-28 14:26 UTC
Re: [Xen-devel] Re: [GIT PULL] Xen APIC hooks (with io_apic_ops)
On Thu, May 28, 2009 at 1:13 AM, Ingo Molnar <mingo@elte.hu> wrote:>> > I think the Xen design has merit if it can truly make dom0 a >> > guest -- that is, if it can survive dom0 failure. Until then, >> > you''re just taking a large interdependent codebase and splitting >> > it at some random point, but you don''t get any stability or >> > security in return.Let me turn this around: are you (Ingo) saying that if a Xen system could successfully survive a dom0 failure, then you would consider that a valid reason for this design choice, and would be willing to support and pursue changes required to allow mainline linux to run as dom0? If not then this line of discussion is just a distraction. I personally think the strongest argument for an interdependent codebase is the ability to have a separate piece of software as a dedicated hypervisor. I also think Xen provides extra security and stability as it is right now. The code is much smaller and simpler than the kernel. The number of hypercalls is smaller than the number of system calls, and the complexity of hypercalls is much lower than the complexity of system calls in general. Driver domains, in which a driver runs in a domain other than dom0 and can fail and reboot, have been supported in Xen for years. The ability to survive dom0 failure is just an added benefit. As Dan and Jeremy said, the Xen community is actively pursuing the changes required to allow dom0 to panic / reboot without requiring a reboot of Xen and other guests. I''m sure if that would make members of the linux community actively support inclusion of dom0 support, we could make that work a priority. -George _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Luke S Crawford
2009-May-28 22:23 UTC
Re: Distro kernel and ''virtualization server'' vs. ''server that sometimes runs virtual instances'' rant (was: Re: [Xen-devel] Re: [GIT PULL] Xen APIC hooks (with io_apic_ops))
Tim Post <echo@echoreply.us> writes:> What, exactly is cowboy hackery? A dom-0 that might be a little slower > if you boot it without Xen?No, I mean like Debian''s 2.6.27 Dom0. As far as I can tell, they imported the SUSE Xen patch once, and have not pulled any of SUSE''s bugfixes since. By all reports, it works fine, and is excellent as a desktop OS. However, it''s not something I want on the server where reboots cost me money. (I''m further arguing that even in the case of a small office, you want your ''dedicated virtualization server'' to be just that; and rock solid.)> > If you''ve ever run a Xen host and have forgotten to change the default > > dom0-min-mem of 192MiB, you''d know most (especially x86_64) linux > > installations are not stable under load with that much memory. > > I have , and I don''t forget to change it.But why make the default something that will crash the server?> Have you even looked at / tried Eucalyptus ?I''ve looked at it, I''m thinking about using it. The EC2 interface is really neat. I have a lot of admiration and respect for amazon. But the interface is not what makes the small ec2 instances unsuitable as co-lo replacements, the unmirrored disk and the high price is. the small ec2 images are simply not designed to be a replacement for servers in the usual case. They are great if you have a nice redundant application, designed to let any node fail at any time, but that''s not how most small businesses configure their servers. For most small companies, it''s cheaper to get reasonably good hardware and redundant disk, take backups, and then have a fire drill every time the hardware fails.> I don''t have this problem. I export PV guest vitals over xenbus and set > up watches on them. As for overcommitment, the first step is knowing how > much memory each domain''s kernel has actually promised to running > processes. 
That much is already in the tree.That only solves half of the problem and gets you back to where you are with FreeBSD jails/unionfs (well, also my users run their own kernels and have full control over userland.) Even with that problem solved, you still have the problem of disk cache, which is essential for acceptable performance. If you want to buy a small image from me and thrash it, that''s fine. However, I don''t want that user who underprovisions his or her domain to make performance suck for a more responsible user. This is why I moved to xen in the first place; a few heavy users were trashing the disk cache on my FreeBSD jail system, and it was slow for everyone. With the move to Xen, suddenly the heavy user was the only user seeing the slowness. Now the heavy user has the option of paying me more money for more ram to use as disk cache, or of dealing with it being slow. Light users had no more trouble. Log in once every 3 months? your /etc/passwd is still cached from last time. This is why I''m so uneasy above overcommit. Ram is not like CPU, which you can take away at a moments notice and give back as if (almost) nothing happened. (or perhaps new CPUs are just so much more powerful than I need that I don''t notice the degridation.)> Just as many others have done with debootstrap. I know your frustrated > with dom-0 not being in mainline, we all are. However, it seems the > tools frustrate you the most. Xen gives us a solid hypervisor, solid low > level libraries and some examples on how to use them. I can''t see (at > this point) why you are so seemingly disgruntled?hm. Well, it is bad that I am coming across as disgruntled. But I do think it is bad that the tools that come with xen seem to be focused on Xen as a desktop OS at the expense of xen as a dedicated virtualization server. I don''t think Xen makes a particularly good desktop virtualization platform, and this setup unnecessarily raises the barrier to using Xen as a dedicated virtualization server. 
_______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Ingo Molnar wrote:> Xen changes - especially dom0 - are overwhelmingly not about > improving Linux, but about having some special hook and extra > treatment in random places - and that''s really bad. >You''ve made this argument a few times now, and I take exception to it. It seems to be predicated on the idea that Xen has some kind of niche usage, with barely more users than Voyager. Or that it is a parasite sitting on the side of Linux, being a pure drain. Neither is true. Xen is very widely used. There are at least 500k servers running Xen in commercial user sites (and untold numbers of smaller sites and personal users), running millions of virtual guest domains. If you browse the net at all widely, you''re likely to be using a Xen-based server; all of Amazon runs on Xen, for example. Mozilla and Debian are hosted on Xen systems. Hardware vendors like Dell and HP are shipping servers with Xen built into the firmware, and increasingly, desktops and laptops. Many laptop "instant-on/instant-access" features are based on a combination of Xen and Linux. All major Linux distributions support running as a Xen guest, and many support running as a Xen host. For these users, Xen support is an active feature of Linux; Linux without Xen support would be much less useful to them, and better Xen support would be more useful. For them, Xen support is no different from any other kind of platform support. They are being actively hampered by the fact that the only dom0 support is available in the form of either ancient or very patched kernels. To them, improved Xen support *is* "improving Linux". Your view appears to be that virtualization is either useless, or a neat trick useful for doing a quick kernel test (which is why kvm got early traction in this community; it is well suited to this use-case). But that is a very parochial kernel-dev view. 
For many users, virtualization (in general, but commonly on Xen) has become an absolutely essential part of their computing infrastructure, and they would no more go without it than they would go without ethernet. We''re taking your technical critiques very seriously, of course, and I appreciate any constructive comment. But your baseline position of animosity towards Xen is unreasonable, unfair and unnecessary. J _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Tim Post
2009-May-29 01:00 UTC
Re: Distro kernel and ''virtualization server'' vs. ''server that sometimes runs virtual instances'' rant (was: Re: [Xen-devel] Re: [GIT PULL] Xen APIC hooks (with io_apic_ops))
On Thu, 2009-05-28 at 18:23 -0400, Luke S Crawford wrote:> Tim Post <echo@echoreply.us> writes: > > > What, exactly is cowboy hackery? A dom-0 that might be a little slower > > if you boot it without Xen? > > No, I mean like Debian''s 2.6.27 Dom0. As far as I can tell, > they imported the SUSE Xen patch once, and have not pulled any of SUSE''s > bugfixes since. By all reports, it works fine, and is excellent as a > desktop OS. However, it''s not something I want on the server where > reboots cost me money.One of the biggest problems is having to go out of tree to get a usable dom-0, then you deploy it .. then you find interesting bugs a week later. I think everyone right now is just crossing their fingers.> But why make the default something that will crash the server?That''s really something for the various distros to address too.> > I don''t have this problem. I export PV guest vitals over xenbus and set > > up watches on them. As for overcommitment, the first step is knowing how > > much memory each domain''s kernel has actually promised to running > > processes. That much is already in the tree. > > That only solves half of the problem and gets you back to where > you are with FreeBSD jails/unionfs (well, also my users run their own > kernels and have full control over userland.) Even with that problem > solved, you still have the problem of disk cache, which is essential for > acceptable performance.Right now, what we''re doing is not quite overcommitment, its more like accounting. By placing the output of sysinfo() and more (bits of /proc/meminfo) on Xenbus, its easy to get a bird''s eye view of what domains are under or over utilizing their given RAM. If a domain has 1GB, yet its kernel is consistently committing only 384MB (actual size), there''s a good chance that the guest would do just as well with 512MB, depending on its buffer use. The reverse is also true. Its looking at the whole VM big picture, including buffers, swap, etc. 
Its not an automatic process, but it does allow an administrator to better organize domains and allocate resources. In the case where you''ve sold someone 1GB its not applicable .. but in an office / enterprise setting it does make things easier.> This is why I''m so uneasy above overcommit. Ram is not like CPU, which > you can take away at a moments notice and give back as if (almost) nothing > happened. (or perhaps new CPUs are just so much more powerful than I > need that I don''t notice the degridation.)I''m the same way, I look forward to seeing the balloon driver advance, however I''d never flip a switch to ''auto''.> hm. Well, it is bad that I am coming across as disgruntled.Frustrated is probably a better word.> But > I do think it is bad that the tools that come with xen seem to be focused > on Xen as a desktop OS at the expense of xen as a dedicated > virtualization server. I don''t think Xen makes a particularly good > desktop virtualization platform, and this setup unnecessarily > raises the barrier to using Xen as a dedicated virtualization server.The tools are the minimum needed to control and manage domains, plus an API for those who don''t want to get intimiate with the lower level libraries. I know they''re basic, but they also present good examples and a great opportunity to make tools that suit your exact need. I don''t quite understand why you feel they are better suited to desktop virtualization (taking the API into consideration for multi server setups)? Cheers, --Tim _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
On Thu, May 28, 2009 at 05:45:34PM -0700, Jeremy Fitzhardinge wrote:> Mozilla and Debian are hosted on Xen systems.A tiny data point about these domains. They are hosted by osuosl.org, which uses xen systems running with the current dom0 patch set. Because those patches are out-of-tree, they have a hard time updating kernel versions, and generally lag kernel.org releases by a lot, which is not always a good thing. So getting the dom0 patches into mainline will make their lives much easier, and more secure. thanks, greg k-h _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
From: Jeremy Fitzhardinge <jeremy@goop.org> Date: Thu, 28 May 2009 17:45:34 -0700> Ingo Molnar wrote: >> Xen changes - especially dom0 - are overwhelmingly not about improving >> Linux, but about having some special hook and extra treatment in >> random places - and that''s really bad. >> > > You''ve made this argument a few times now, and I take exception to it. > > It seems to be predicated on the idea that Xen has some kind of niche > usage, with barely more users than Voyager. Or that it is a parasite > sitting on the side of Linux, being a pure drain.I don''t see Ingo''s comments, whether I agree with them or not, as an implication of Xen being niche. Rather I see his comments as an opposition to how Xen is implemented.> We''re taking your technical critiques very seriously, of course, and I > appreciate any constructive comment. But your baseline position of > animosity towards Xen is unreasonable, unfair and unnecessary.I don''t see any animosity at all in what Ingo has said. _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Hi Dave, On Thu, 2009-05-28 at 21:05 -0700, David Miller wrote:> From: Jeremy Fitzhardinge <jeremy@goop.org> > Date: Thu, 28 May 2009 17:45:34 -0700 > > > Ingo Molnar wrote: > >> Xen changes - especially dom0 - are overwhelmingly not about improving > >> Linux, but about having some special hook and extra treatment in > >> random places - and that''s really bad. > >> > > > > You''ve made this argument a few times now, and I take exception to it. > > > > It seems to be predicated on the idea that Xen has some kind of niche > > usage, with barely more users than Voyager. Or that it is a parasite > > sitting on the side of Linux, being a pure drain. > > I don''t see Ingo''s comments, whether I agree with them or not, as > an implication of Xen being niche. Rather I see his comments as > an opposition to how Xen is implemented. >You can see Ingo''s comments and whole thread under subject : Re: [Xen-devel] Re: [GIT PULL] Xen APIC hooks (with io_apic_ops) http://lkml.org/lkml/2009/5/27/758 -- JSR _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
From: Jaswinder Singh Rajput <jaswinder@kernel.org> Date: Fri, 29 May 2009 12:07:32 +0530> Hi Dave, > > On Thu, 2009-05-28 at 21:05 -0700, David Miller wrote: >> From: Jeremy Fitzhardinge <jeremy@goop.org> >> Date: Thu, 28 May 2009 17:45:34 -0700 >> >> > Ingo Molnar wrote: >> >> Xen changes - especially dom0 - are overwhelmingly not about improving >> >> Linux, but about having some special hook and extra treatment in >> >> random places - and that''s really bad. >> >> >> > >> > You''ve made this argument a few times now, and I take exception to it. >> > >> > It seems to be predicated on the idea that Xen has some kind of niche >> > usage, with barely more users than Voyager. Or that it is a parasite >> > sitting on the side of Linux, being a pure drain. >> >> I don''t see Ingo''s comments, whether I agree with them or not, as >> an implication of Xen being niche. Rather I see his comments as >> an opposition to how Xen is implemented. >> > > You can see Ingo''s comments and whole thread under subject : > > Re: [Xen-devel] Re: [GIT PULL] Xen APIC hooks (with io_apic_ops) > > http://lkml.org/lkml/2009/5/27/758Jeremy is specifically commenting on Ingo''s quoted "argument". And that "argument" is what he takes "exception to". And that''s the scope of what I''m commenting on too. _______________________________________________ Xen-users mailing list Xen-users@lists.xensource.com http://lists.xensource.com/xen-users
Tim Post
2009-May-29 08:31 UTC
Re: Distro kernel and ''virtualization server'' vs. ''server that sometimes runs virtual instances'' rant (was: Re: [Xen-devel] Re: [GIT PULL] Xen APIC hooks (with io_apic_ops))
On Fri, 2009-05-29 at 09:00 +0800, Tim Post wrote:> Right now, what we''re doing is not quite overcommitment, its more like > accounting. By placing the output of sysinfo() and more (bits > of /proc/meminfo) on Xenbus, its easy to get a bird''s eye view of what > domains are under or over utilizing their given RAM. If a domain has > 1GB, yet its kernel is consistently committing only 384MB (actual size), > there''s a good chance that the guest would do just as well with 512MB, > depending on its buffer use. The reverse is also true. Its looking at > the whole VM big picture, including buffers, swap, etc.Sorry, forgot to mention, average (aggregate) IOWAIT is also a key factor. Users can do odd things like bypass buffers with relational databases. So, when we see the kernel overselling, next to nill buffers and a very high aggregate average IOWAIT across all vcpus, we have a pretty good idea of what''s going on. Xenbus/Xenstore exists, the combined size of these vitals are small .. until admin friendly introspection surfaces, its really the best way to put any given host under a stereo microscope. The problem is differentiating disk I/O from network I/O. Cheers, --Tim _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
George Dunlap
2009-May-29 09:49 UTC
Re: Distro kernel and ''virtualization server'' vs. ''server that sometimes runs virtual instances'' rant (was: Re: [Xen-devel] Re: [GIT PULL] Xen APIC hooks (with io_apic_ops))
Luke, I hope this doesn''t come off as a shameless plug, but Citrix XenServer is exactly what you describe: dom0 is used only as a utility domain to control other VMs. And the basic version, which now includes (if I recall our marketing blah blah blah correclty) support for server pools, migration, and remote storage, is available for free (as-in-beer, and with some registration so we can figure out who''s using it). It''s honestly what I would use if I were running a sever in a small business. If you''re commenting on the lack of free-as-in-speech distro that looks like XenServer, Xen as a project doesn''t have much say in how distros integrate Xen. I don''t see any technical reason why someone couldn''t take a Debian base and set up something like XenServer; or any technical reason why someone couldn''t do like CentOS has done, and clone our entire open-source tree as a starting point. (Obviously it would take a little bit of additional work, since the control stack on XenServer isn''t open-source.) If you''re up for starting a distro based on Xen, that would be great. I think it would probably get a lot of traction with server admins, and if you make good design choices to minimize the amount of work you have to do as things move forward, and can get a good community around it, it''s got a chance to have a big impact on OSS virtualization. And any technical feedback, such as suggesting a better dom0_min_mem size, can be submitted to the list, or put in a bugzilla note, even if you don''t have a patch to change it. -George _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
David Miller wrote:> I don''t see Ingo''s comments, whether I agree with them or not, as > an implication of Xen being niche. Rather I see his comments as > an opposition to how Xen is implemented. >It''s in his definition of "improving Linux". Jeremy is saying that allowing Linux to run as dom0 *is* improving Linux. The lack of dom0 support is at this moment making life more difficult for a huge number of Linux users who use Xen, including Mozilla, Debian, and Amazon. Adding dom0 support would make Linux even more useful to a wide variety of people not using Xen at the moment. Saying that dom0 support is "not about improving Linux" completely ignores the cost people are paying right now, and the benefits people could have. That (if I understand him) what Jeremy meant by saying it was treating it as if it was some kind of "niche usage, with barely more users than Voyager", and "being a pure drain".> I don''t see any animosity at all in what Ingo has said. >The last few paragraphs of the e-mail weren''t about that particular argument, but about the sum of the interaction with Ingo over dom0 support for the last 6 months. If you read the various threads, it''s pretty clear that Ingo is resistant to accepting dom0 changes, for whatever reason, and has been looking for reasons not to include it. If we take him at his word, that the root issue is that he fundamentally dislikes the design choice of running Linux-as-hypervisor-component, then we have a difference of opinion and we''re just going to have to agree to disagree. But there are reasons to include it anyway, including benefits to existing Xen users and potential Xen users (who have decided not to use KVM for whatever reason), and the idea of survival-of-the-fittest: Xen and KVM have made different design choices, let''s let them both grow and see which one thrives. If KVM''s design is unilaterally superior, eventually Xen will die off. 
But I suspect that there''s significant demand in the OSS virtualization ecology for both approaches, and the world will be the worse for dom0 support being out-of-tree. In any case, making unreasonable or inconsistent technical objections, when the root issue is is actually something else, is a waste of time and energy for everyone involved. -George _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Dan Magenheimer
2009-May-29 13:42 UTC
RE: Distro kernel and ''virtualization server'' vs. ''server that sometimes runs virtual instances'' rant (was: Re: [Xen-devel] Re: [GIT PULL] Xen APIC hooks (with io_apic_ops))
> With the move to Xen, suddenly the heavy user was the only user > seeing the slowness. Now the heavy user has the option of paying > me more money for more ram to use as disk cache, or of dealing with it > being slow. Light users had no more trouble. Log in once > every 3 months? > your /etc/passwd is still cached from last time.Am I understanding this correctly that you are "renting" a fixed partition of physical RAM that (assuming the physical server never reboots) persistently holds one VSP customer''s VM''s memory forever, never saved to disk? Although I can see this being advantageous for some users, no matter how cheap RAM is, having RAM sit "idle" for months (or even minutes) seems a dreadful waste of resources, which is either increasing the price of the service or the cost to the provider for a very small benefit for a small number of users. I see it as akin to every VM computing pi in a background process because, after all, the CPU has nothing better to do if it was going to be idle anyway. While I can see how the current sorry state of memory management by OS''s and hypervisors might lead to this business decision, my goal is to make RAM a much more "renewable" resource. The same way CPU''s are adding power management so that they can be shut down when idle even for extremely small periods of time to conserve resources, I''d like to see "idle memory" dramatically reduced. Self-ballooning and tmem are admittedly only a step in that direction, but at least it is (I hope) the right direction. Dan _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
On Fri, May 29, 2009 at 01:01:18PM +0100, George Dunlap wrote:> David Miller wrote: > >I don''t see Ingo''s comments, whether I agree with them or not, as > >an implication of Xen being niche. Rather I see his comments as > >an opposition to how Xen is implemented. > > > It''s in his definition of "improving Linux". Jeremy is saying that > allowing Linux to run as dom0 *is* improving Linux. The lack of dom0 > support is at this moment making life more difficult for a huge number > of Linux users who use Xen, including Mozilla, Debian, and Amazon. > Adding dom0 support would make Linux even more useful to a wide variety > of people not using Xen at the moment. >Like stated already earlier, there is a huge amount of Xen in use all around the globe for server/datacenter virtualization. Personally I know many Xen installations in production, but not a single KVM installation (I''m sure those exist aswell, but personally I haven''t seen those). At the moment it''s pretty painful for the distro developers to ship dom0 enabled kernels (most of the distros do ship or are waiting for upstream dom0 enabled kernel), and also for many advanced users who build their custom Xen based solutions.. The current situation is not good for anyone. We really need Xen dom0 support in mainline Linux. Just my 2 eurocents. -- Pasi _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
George Dunlap <george.dunlap@eu.citrix.com> writes: cc list from hell trimmed.> allowing Linux to run as dom0 *is* improving Linux. The lack of dom0 > support is at this moment making life more difficult for a huge number > of Linux users who use Xen, including Mozilla, Debian, and Amazon. > Adding dom0 support would make Linux even more useful to a wide > variety of people not using Xen at the moment.Perhaps one way to address this problem would be to make the Dom0 interface less intrusive for the host OS? Maybe impression last time I looked was that there was huge potential of improvement in this area. For example the PAT issue recently discussed was completely unnecessary. Or if you added a "VT/SVM only" Dom0 mode I''m sure the interface would be significantly cleaner too. If you can come up with a slim clean interface the chances for actual integration would be likely much higher. And if people want to update the Dom0 they surely could update the hypervisor to one with cleaner interfaces too. I understand that the DomU Xen ABI is becoming a kind of standard and should be supported, but that''s far from true for Dom0. -Andi -- ak@linux.intel.com -- Speaking for myself only. _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
From: Pasi Kärkkäinen <pasik@iki.fi> Date: Fri, 29 May 2009 17:14:39 +0300> We really need Xen dom0 support in mainline Linux.Whether we want a feature is seperate from making sure it''s implementation is up to snuff and doesn''t suck. But the concentration of the talk seems to be on wanting the feature, and that''s only half the story. I''m getting sick of hearing over and over how many people use Xen, that point has been made succintly so let''s move on ok? _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Andi Kleen wrote:> George Dunlap <george.dunlap@eu.citrix.com> writes: > > cc list from hell trimmed. > > >> allowing Linux to run as dom0 *is* improving Linux. The lack of dom0 >> support is at this moment making life more difficult for a huge number >> of Linux users who use Xen, including Mozilla, Debian, and Amazon. >> Adding dom0 support would make Linux even more useful to a wide >> variety of people not using Xen at the moment. >> > > Perhaps one way to address this problem would be to make the Dom0 > interface less intrusive for the host OS? >I''m certainly not deaf to criticism along those lines, and I''m looking at ways of cleaning up/decoupling those interactions. But my frustration arises from the fact that there''s been a total stall on merging any of the pieces, even the ones which are either uncontroversial, or purely xen-internal changes. J _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
On 5/29/2009 11:34:40 AM, Andi Kleen wrote:
> Perhaps one way to address this problem would be to make the Dom0
> interface less intrusive for the host OS?
>
> My impression last time I looked was that there was huge potential
> for improvement in this area. For example the PAT issue recently
> discussed was completely unnecessary. Or if you added a "VT/SVM only"
> Dom0 mode I'm sure the interface would be significantly cleaner too.
> If you can come up with a slim clean interface the chances for actual
> integration would be likely much higher.

I think we still need some (or all?) of the additional dom0 PV ops even for an HVM (Hardware-based VM) dom0. Hardware-based virtualization can significantly clean up the CPU-related PV ops (including some for the local APIC), but those have nothing to do with dom0.

Some hooks in the host could be removed by reusing the HVM-specific code with modifications to the virtualization logic, but to be fair, I think people need to say which specific ones are intrusive.

Jun Nakajima | Intel Open Source Technology Center
Nakajima, Jun wrote:
> I think we still need some (or all?) of the additional dom0 PV ops even
> for an HVM (Hardware-based VM) dom0. Hardware-based virtualization can
> significantly clean up the CPU-related PV ops (including some for the
> local APIC), but those have nothing to do with dom0.
>
> Some hooks in the host could be removed by reusing the HVM-specific
> code with modifications to the virtualization logic, but to be fair, I
> think people need to say which specific ones are intrusive.

I think two things will significantly clean up the dom0 apic patches:

One is to adjust the LAPIC and IOAPIC probing code so that it behaves correctly if the APIC cpuid flag is clear. That would remove a lot of the init-time ad-hoc Xen changes I made.

The other is to implement Ingo's suggestion of a proper ioapic driver layer. I think that would not only resolve the low-level IO-APIC register access issue, but probably clean up a lot of the vector allocation/handling, and make a clear path for MSI support. With luck it will also clean up things like x2apic support. I'm planning on putting some time into investigating these next week.

Once we've nailed down the details of how to make PAT work for PV guests on the Xen side, we should be able to implement that fairly easily in Linux with no core x86 changes.

I really don't think emulating MTRR register writes is the right way to implement Xen MTRR support, given that a much more semantically appropriate interface already exists, but we can do that if nothing else gets merged.

IanC is restructuring the swiotlb changes in a way that I hope will be acceptable to all.

At that point, I think we really will have resolved all the high-level concerns expressed about the overall architecture of the patches, and maybe we can finally see some progress.

J
Michael David Crawford
2009-May-30 01:10 UTC
[Xen-devel] Re: Distro kernel and 'virtualization server' vs. 'server that sometimes runs virtual instances' rant
Tim Post wrote:
> What, exactly is cowboy hackery? A dom-0 that might be a little slower
> if you boot it without Xen?

I unpacked the source RPM to some kernel (I don't recall if it was CentOS's or Fedora's), and there were something like a hundred patches applied to the original kernel.org sources.

I don't doubt that many of those were a good idea, but it would have been a great deal of work simply to determine which ones were both important and properly implemented.

Mike
--
Michael David Crawford
mdc@prgmr.com

prgmr.com - We Don't Assume You Are Stupid.

Xen-Powered Virtual Private Servers: http://prgmr.com/xen
2009/5/29 Jeremy Fitzhardinge <jeremy@goop.org>:
>> Ingo Molnar wrote:
>> Xen changes - especially dom0 - are overwhelmingly not about improving
>> Linux, but about having some special hook and extra treatment in random
>> places - and that's really bad.
>
> You've made this argument a few times now, and I take exception to it.
>
> There are at least 500k servers running Xen in commercial user sites
> (and untold numbers of smaller sites and personal users), running
> millions of virtual guest domains. To them, improved Xen support *is*
> "improving Linux".

Well said. I use Xen both personally and in my business, as a dozen or so of those unseen millions of domUs. I've bitten my tongue for months while watching Xen developers jump through the hoops to get pv_ops dom0 into the mainstream, only to be knocked back or left until the next merge window, and the next, and the next.

Sure, there were "the bad old days" of Xen's history, but having been asked to go the pv_ops route, I feel that keeping dom0 out of mainstream is not just failing to improve Linux; it is actually hurting users and trapping them on ancient kernels which are missing newer hardware support.

Sure, I wouldn't like to see any old rubbish merged into the kernel, but I'm amazed at Jeremy's patience over this.
Luke S Crawford
2009-May-30 21:02 UTC
Re: Distro kernel and 'virtualization server' vs. 'server that sometimes runs virtual instances' rant (was: Re: [Xen-devel] Re: [GIT PULL] Xen APIC hooks (with io_apic_ops))
Dan Magenheimer <dan.magenheimer@oracle.com> writes:
>> With the move to Xen, suddenly the heavy user was the only user
>> seeing the slowness. Now the heavy user has the option of paying
>> me more money for more ram to use as disk cache, or of dealing with it
>> being slow. Light users had no more trouble. Log in once
>> every 3 months? your /etc/passwd is still cached from last time.
>
> Am I understanding this correctly that you are "renting" a
> fixed partition of physical RAM that (assuming the physical
> server never reboots) persistently holds one VSP customer's
> VM's memory forever, never saved to disk?

Yes. Exactly. If you rent a 1 GB VPS from me, the way I see it, you are renting 1/32nd of one of my 32GiB servers (and paying a premium for the privilege). Because the cost of giving you extra CPU when nobody else wants it is small, I'll give you up to a full core if nobody else needs it, so that's a small bonus.

> Although I can see this being advantageous for some users,
> no matter how cheap RAM is, having RAM sit "idle" for months
> (or even minutes) seems a dreadful waste of resources,
> which is either increasing the price of the service or the
> cost to the provider for a very small benefit for a
> small number of users. I see it as akin to every VM
> computing pi in a background process because, after all,
> the CPU has nothing better to do if it was going to be
> idle anyway.

Wait, what? The difference is that if you aren't using the CPU, I can take it away, and then give it back to you when you want it almost immediately, with a small cost (of flushing the cpu cache; but that is fast enough that while it's a big deal for scientific-type applications, it doesn't really make the perceived responsiveness of the box worse, unless you do it a bunch of times in a small period of time).

RAM is different.
If I take away your pagecache, either I save it to disk (slow) and restore it (slow) when I return it, or I take it from you without saving to disk, and return clean pages when you want it back, meaning if you want that data you've got to re-read it from disk (slow).

By slow, I mean slow enough that you notice. You type a command and sit, wondering what the problem with this cheap piece of crap you rented from me is, while the disk seeks. Hitting disk brings the performance of nearly anything well into 'unacceptable' even when you use the expensive 10K disks, especially when you have a bunch of people hitting those same disks. (I and all competitors I know of within an order of magnitude of my pricing use 7500rpm sata, exacerbating the problem; but the difference between 10K sas and 7.5K sata is not many orders of magnitude, like the difference between ram and disk is.)

This does not help 'a few users'; this massively increases the perceived responsiveness of nearly all VPSs. What if you only get a website hit every 10 minutes? Would you be satisfied if that hit took north of a second to return because it had to hit disk every time? I wouldn't. Would you complain if there was often north of a 1500ms delay between when you typed a command and when you got a response? I can tell you that my customers did, when I used a shared pagecache (and yeah, that was on 10K fibre disks in raid 1+0). Solving these problems is what pagecache is for.

> While I can see how the current sorry state of memory management
> by OS's and hypervisors might lead to this business decision,
> my goal is to make RAM a much more "renewable" resource.
> The same way CPU's are adding power management so that
> they can be shut down when idle even for extremely small
> periods of time to conserve resources, I'd like to see
> "idle memory" dramatically reduced.
> Self-ballooning and tmem are admittedly only a step in that
> direction, but at least it is (I hope) the right direction.

I keep saying: pagecache is not idle ram. Pagecache is essential to the perception of acceptable system performance. I've tried selling service (on 10K fibre disk, no less) with shared pagecache, and by all reasonable standards, performance was unacceptable.
Tim Post
2009-May-31 16:44 UTC
Re: Distro kernel and 'virtualization server' vs. 'server that sometimes runs virtual instances' rant (was: Re: [Xen-devel] Re: [GIT PULL] Xen APIC hooks (with io_apic_ops))
On Sat, 2009-05-30 at 17:02 -0400, Luke S Crawford wrote:
> I keep saying: pagecache is not idle ram. Pagecache is essential to the
> perception of acceptable system performance. I've tried selling service
> (on 10K fibre disk, no less) with shared pagecache, and by all reasonable
> standards, performance was unacceptable.

I've never seen automatic overcommitment work out in a way that everyone was happy with in the hosting industry.

You are 100% correct: by default Linux is like Pac-Man, gobbling up blocks for cache. However, this is partly because even most well-written services and applications neglect to advise the kernel to do anything different. posix_madvise() and posix_fadvise() do not see the light of day nearly as often as they should.

Are you parsing some m4-generated configuration file that's just under or north of the system page size? You'd then want to tell the kernel "Hey, I only need this once .. " prior to even talking to read(). Yet I see people going hog wild with O_DIRECT because they think it's supposed to make things faster.

On enterprise systems (i.e. not hosting web sites and databases that are created by others and uploaded), this is less of a hassle and a bit easier to manage. You _know_ better than to make 1500 static HTML pages 360K long each and put them where Google can access them. You _know_ better than to mix services that allocate 20x more than they actually need on the same host. You're able to adjust your swappiness on a whole group of domains instantly from a central place. Finally, you're able to patch your services so they better suit your goals.

What Dan is describing is very useful, but not to IAAS providers. Like I said before, I would not flip a switch to AUTO on any server that is providing the use of a VM to a customer. However, customers do get e-mails saying "You bought 1 GB, on average this month you've used only xxx (detail averages sampled through /proc and sysinfo()) you may wish to switch to a cheaper plan".
Sound nuts? It actually makes more money, because our density per server goes up quite a bit.

So in a large way, I think Dan is correct. If a client bought the use of memory and barely uses it, I'd rather give them a discount for giving some back, enabling me to set up another domain on that node. But don't get me wrong, I'd never dream of doing that 'automagically' :)

Cheers,
--Tim
Tim Post
2009-May-31 17:00 UTC
Re: Distro kernel and 'virtualization server' vs. 'server that sometimes runs virtual instances' rant (was: Re: [Xen-devel] Re: [GIT PULL] Xen APIC hooks (with io_apic_ops))
Sorry, hit send too quickly:

On Mon, 2009-06-01 at 00:44 +0800, Tim Post wrote:
> So in a large way, I think Dan is correct. If a client bought the use of
> memory and barely uses it, I'd rather give them a discount for giving
> some back, enabling me to set up another domain on that node. But don't
> get me wrong, I'd never dream of doing that 'automagically' :)

I meant to add: if an overcommit feature could just make and log suggestions, it would eliminate a ton of userspace hackery. Thus, it would be very useful to hosts (albeit in a neutered form).

Most hosts would gladly deal with sed, grep and awk vs libxc and libxs :)

Cheers,
--Tim
Dan Magenheimer
2009-May-31 19:48 UTC
RE: Distro kernel and 'virtualization server' vs. 'server that sometimes runs virtual instances' rant (was: Re: [Xen-devel] Re: [GIT PULL] Xen APIC hooks (with io_apic_ops))
>> So in a large way, I think Dan is correct. If a client bought the use
>> of memory and barely uses it, I'd rather give them a discount for
>> giving some back, enabling me to set up another domain on that node.
>> But don't get me wrong, I'd never dream of doing that 'automagically' :)
>
> I meant to add: if an overcommit feature could just make and log
> suggestions, it would eliminate a ton of userspace hackery. Thus, it
> would be very useful to hosts (albeit in a neutered form).
>
> Most hosts would gladly deal with sed, grep and awk vs libxc and
> libxs :)

Tmem with self-ballooning can be controlled on a guest-by-guest basis, dynamically and with fairly good granularity. So you need not turn overcommit "on" or "off". And there is no hypervisor-based swapping which is invisible to the guest; overcommit requires guests to provide swap space, and if they don't balloon down (voluntarily) and don't exceed their RAM, they don't use it.

Picture this (and assume tools exist to help you measure and manage it): Each user is billed only for the resources they use, including RAM. RAM "optimization" can be controlled by the user via a menu (or slider bar for more granularity); at one extreme, RAM (and more specifically page cache) is aggressively reduced... but only if another VM is demanding it. At the other extreme, fixed maximum RAM is fully owned by the user, and it sits idle if not in use. The user can choose dynamically whether to pay more for fast responsiveness, or to pay less and surrender RAM if needed elsewhere, with some probability of slower responsiveness.

In other words, this is like the option that some power utilities are providing to give you a discount if you are willing to let them shut off your air conditioning or water heater at peak load.

Note that these tools DON'T exist today... and I don't plan on writing them.
I'm just working at the hypervisor level to ensure that memory utilization can be more effective and flexible (and measurable when the flexibility is used).

Does that sound more attractive to an IAAS provider?
Dan Magenheimer
2009-Jun-01 18:04 UTC
RE: Distro kernel and 'virtualization server' vs. 'server that sometimes runs virtual instances' rant (was: Re: [Xen-devel] Re: [GIT PULL] Xen APIC hooks (with io_apic_ops))
Not to beat this to death, but one more comment:

> Wait, what? The difference is that if you aren't using the CPU, I can
> take it away, and then give it back to you when you want it almost
> immediately, with a small cost (of flushing the cpu cache; but that is
> fast enough that while it's a big deal for scientific-type
> applications, it doesn't really make the perceived responsiveness of
> the box worse, unless you do it a bunch of times in a small period of
> time).
>
> RAM is different. If I take away your pagecache, either I save it to
> disk (slow) and restore it (slow) when I return it, or I take it from
> you without saving to disk, and return clean pages when you want it
> back, meaning if you want that data you've got to re-read it from
> disk (slow).

You are technically correct, but I'm not talking about taking away ALL of the pagecache.

Pagecache is a guess as to what pages might be used in the future. A large percentage of those guesses are wrong; the page will never be used again and will eventually be evicted. This is what I call "idle memory", but I love the way Tim Post put it: "Linux is like Pac-Man gobbling up blocks for cache."

The right long-term answer is for Linux and OS's in general to get smarter about giving up memory that they know is not going to be used again, but even if they get smarter, they will never be omniscient.

So self-ballooning creates pressure on the page cache, making the OS evict pages that it's not so sure about. Then tmem acts as a backup for those pages; if the OS was wrong and the page is needed again (soon), it can get it right back without a disk read.

Clearly this won't help users who leave their VM idle for three months and then expect instantaneous response, but that's what I meant by your memory partitioning helping only a few users.

Does that make sense? Is it at least a step in the right direction?
Dan
Luke S Crawford
2009-Jun-02 00:15 UTC
Re: Distro kernel and 'virtualization server' vs. 'server that sometimes runs virtual instances' rant (was: Re: [Xen-devel] Re: [GIT PULL] Xen APIC hooks (with io_apic_ops))
Dan Magenheimer <dan.magenheimer@oracle.com> writes:
> Picture this (and assume tools exist to help you measure
> and manage it): Each user is billed only for the resources
> they use, including RAM. RAM "optimization" can be controlled
> by the user via a menu (or slider bar for more granularity);
> at one extreme, RAM (and more specifically page cache) is
> aggressively reduced... but only if another VM is demanding
> it. At the other extreme, fixed maximum RAM is fully owned
> by the user, and it sits idle if not in use. The user
> can choose dynamically whether to pay more for fast responsiveness,
> or to pay less and surrender RAM if needed elsewhere, with
> some probability of slower responsiveness.

That sounds excellent for situations where I can quickly and cheaply move a guest from one piece of physical hardware to another.

> Does that sound more attractive to an IAAS provider?

This is useful in some cases. Still not in mine; see, I can't afford shared storage, so giving me free ram that may only be free for a few minutes is of limited utility. Yeah, I can use it as shared disk cache for extra-heavy disk users, but it's still a more complex model for the customer to understand, and I can't bring up more guests on that host. I could give it to other people on the same host, but I think that might be of limited utility, as I don't know how many customers will be willing to pay for extra capacity if that extra capacity is only sometimes available.

But then, I am experimenting with low-cost homebrew OpenSolaris NAS setups, so if that works out, and I get a working live migration system together, then this could be useful. Not as useful as, say, some mechanism for live or nearly live migration with local storage, but still useful.
On Fri, 29 May 2009, George Dunlap wrote:
> David Miller wrote:
>> I don't see Ingo's comments, whether I agree with them or not, as
>> an implication of Xen being niche. Rather I see his comments as
>> an opposition to how Xen is implemented.
>
> It's in his definition of "improving Linux". Jeremy is saying that
> allowing Linux to run as dom0 *is* improving Linux. The lack of dom0
> support is at this moment making life more difficult for a huge number
> of Linux users who

Exactly, that's the point. Adding dom0 makes life easier for a group of users who decided to use Xen some time ago, but what Ingo wants is technical improvement of the kernel.

There are many features which have been widely used in the distro world where developers tried to push support into the kernel with the same line of arguments. The kernel policy always was and still is to accept only those features which have a technical benefit to the code base. I'm just picking a few examples:

Aside from paravirt, which seems to expand through arch/x86 like a hydra, the new patches sprinkle "if (xen_...)" all over the place. These extra Xen dependencies are no improvement; they are a royal pain in the ... They are sticky once they get merged, simply because the hypervisor relies on them and we need to provide compatibility for a long time.

Aside from that, it grows interfaces like pat_disable() just because the CPU model of Xen is obviously not able to kill the PAT flags in the CPUID emulation. Why for heaven's sake do we have a cpuid paravirt op when we need to disable stuff separately which can be disabled by paravirt functionality already? I don't see this as an improvement either; it's simply sloppy hackery.
The changelogs of the patches are partially confusing as hell:

    commit 7d2b03ff4ae27b7c9e99a421a5b965f20e4bfaab
    x86: fix up flush_tlb_all

    - initialize the locks before the first use
    - make sure preemption is disabled

    [ Impact: Bug fixes: boot time warning, and crash ]

This patch is in the Xen queue, and I assume it's Xen-related, as we have not seen a boot-time warning and crash anywhere with the current code AFAICT, but the changelog reads as if this were some generic BUG in the SMP boot code. There is neither a hint to Xen nor to another patch which caused that problem. While the patch itself is harmless, I do not see what is improved and why the change was necessary in the first place.

That's what maintainers have to look at, not who is using the code already and wants to see it merged.

> use Xen, including Mozilla, Debian, and Amazon. Adding dom0 support would
> make Linux even more useful to a wide variety of people not using Xen at
> the moment.

I really have a hard time seeing why dom0 support makes Linux more useful to people who do not use it. It does not improve the Linux experience of Joe User at all.

In fact it could be harmful to the average user if it's merged in a crappy way that increases overhead, has a performance cost, and draws development and maintenance resources away from other areas of the kernel.

Aside from that, it can also hinder the development of a properly designed hypervisor in Linux: 'why bother with that new stuff, it might be cleaner and nicer, but we have this Xen dom0 stuff already?'.

Thanks,

tglx
Thomas Gleixner wrote:
> Exactly, that's the point. Adding dom0 makes life easier for a group of
> users who decided to use Xen some time ago, but what Ingo wants is
> technical improvement of the kernel.
>
> There are many features which have been widely used in the distro
> world where developers tried to push support into the kernel with the
> same line of arguments.
>
> The kernel policy always was and still is to accept only those
> features which have a technical benefit to the code base.

I can appreciate the idea of resisting the pushing of random features. Still, your definition of "improving Linux" is lacking. Obviously a new scheduler is taking something that exists and improving it. But adding a new filesystem, a new driver, or a new feature, such as notifications, AIO, a new hardware architecture, or even KVM: how do those classify as "technical improvement to the kernel" or "features which have technical benefit to the code base" in a way that Xen does not?

If you mean "increases Linux's technical capability", and define Xen as outside of Linux, then I think the definition is too small. After all, allowing Linux to run on an ARM processor isn't increasing Linux's technical capability; it's just allowing a new group of people (people with ARM chips) to use Linux. It's the same with Xen.

No one disputes the idea that changes shouldn't be ugly; no one disputes the idea that changes shouldn't introduce performance regressions. But there are patchqueues that are ready, signed off by other maintainers, and which Ingo admits he has no technical objections to, but refuses to merge. (His most recent "objection" is that he claims the currently existing pv_ops infrastructure (which KVM and others benefit from as well as Xen) introduces almost a 1% overhead on native in an mm-heavy microbenchmark.
So he refuses to merge feature Y (dom0 support) until the Xen community helps the technically unrelated existing feature X (pv_ops) meet some criteria. So it has nothing to do with the quality of the patches themselves.) [Not qualified to speak to the specific technical objections.]

> I really have a hard time seeing why dom0 support makes Linux more
> useful to people who do not use it. It does not improve the Linux
> experience of Joe User at all.

If Joe User uses Amazon, he benefits. If Joe User downloads an Ubuntu or Debian distro, and the hosting providers were more secure and had to do less work because dom0 was inlined, then he benefits from the lower cost / resources freed to do other things.

But what I was actually talking about is the number of people who don't use it now but would use it if it were merged in. There are hundreds of thousands of instances running now, and more people are choosing to use it at the moment, even though those who use it have the devil's choice between doing patching or using a 3-year-old kernel. How many more would use it if it were in mainline?

> In fact it could be harmful to the average user if it's merged in a
> crappy way that increases overhead, has a performance cost, and draws
> development and maintenance resources away from other areas of the
> kernel.

No one is asking for something to be merged in a crappy way, or with unacceptable performance cost. There are a number of patchqueues that Ingo has no technical objections to, but which he still refuses to merge.

"Drawing away development and maintenance resources" is a cost/benefits question, and Jeremy's main point was that there is a *high* benefit to dom0 being merged into mainline.
The same could be said of almost anything: are you suggesting not accepting any more KVM code because it might "draw away development and maintenance resources from other areas of the kernel"?

> Aside from that, it can also hinder the development of a properly
> designed hypervisor in Linux: 'why bother with that new stuff, it
> might be cleaner and nicer, but we have this Xen dom0 stuff
> already?'.

This argument doesn't make any sense. Would you advocate having only one filesystem for fear that people would somehow be discouraged from working on a new filesystem?

Even if that were a valid argument, it wouldn't apply in this situation. KVM has plenty of mind-share, and the support of RedHat. Also, I'd wager that it's a lot easier for a Linux kernel developer to get involved in KVM than in Xen, because they're already familiar with Linux. I don't think anyone working on KVM will be tempted to give up just because Xen is also available, unless it becomes clear that linux-as-hypervisor isn't the best technical solution; in which case, moving to Xen would be the right thing to do anyway. Merging dom0 Xen will in no way interfere with the development of KVM or other linux-as-hypervisor projects.

The main point of Jeremy's e-mail was NOT to say, "Lots of people use this so you should merge it." He was responding to Xen being treated like it had no benefit. It does have a benefit; it is a feature.

-George
George Dunlap wrote:
> Thomas Gleixner wrote:
> No one disputes the idea that changes shouldn't be ugly; no one
> disputes the idea that changes shouldn't introduce performance
> regressions. But there are patchqueues that are ready, signed off by
> other maintainers, and which Ingo admits he has no technical
> objections to, but refuses to merge.

I can't comment on this part, but if so that seems unfortunate.

> The main point of Jeremy's e-mail was NOT to say, "Lots of people use
> this so you should merge it." He was responding to Xen being treated
> like it had no benefit. It does have a benefit; it is a feature.

I don't know about others, but I certainly interpreted a number of posts as saying exactly that: that it's useful, so it should be included. I don't think anyone is arguing that Xen is not useful or that it should not ever be included; rather, the question is whether the current set of patches is suitable for addition or whether they are too messy and should be cleaned up first.

Chris
On Tue, 2 Jun 2009, George Dunlap wrote:
> idea that changes shouldn't introduce performance regressions. But
> there are patchqueues that are ready, signed off by other maintainers,
> and which Ingo admits he has no technical objections to, but refuses
> to merge.

I've seen technical objections in this thread. The whole thing _started_ with one, and Thomas brought up others.

As a top-level maintainer, I can also very much sympathise with the "don't merge new stuff if there are known problems and no known solutions to those issues". Is Ingo supposed to just continue to merge crap, when it's admitted that it has problems and pollutes code that he has to maintain?

The fact is (and this is a _fact_): Xen is a total mess from a development standpoint. I talked about this in private with Jeremy. Xen pollutes the architecture code in ways that NO OTHER subsystem does. And I have never EVER seen the Xen developers really acknowledge that and try to fix it.

Thomas pointed to patches that add _explicitly_ Xen-related special cases that aren't even trying to make sense. See the local apic thing.

So quite frankly, I wish some of the Xen people looked themselves in the mirror, and then asked themselves "would _I_ merge something ugly like that, if it was filling my subsystem with totally unrelated hacks for some other crap"?

Seriously. If it was just the local APIC, fine. But it may be just the local APIC code this time around; next time it will be something else. It's been TLB, it's been entry_*.S, it's been all over. Some of them are performance issues.

I dunno. I just do know that I pointed out the statistics for how mindlessly incestuous the Xen patches have historically been to Jeremy. He admitted it. I've not seen _anybody_ say that things will improve.

Xen has been painful. If you give maintainers pain, don't expect them to love you or respect you. So I would really suggest that Xen people should look at _why_ they are giving maintainers so much pain.
Linus
On Tue, 2 Jun 2009, Linus Torvalds wrote:
> I dunno. I just do know that I pointed out the statistics for how
> mindlessly incestuous the Xen patches have historically been to Jeremy. He
> admitted it. I've not seen _anybody_ say that things will improve.

In case people want to look at this on their own, get a git tree, and run the examples I asked Jeremy to run:

	git log --pretty=oneline --full-diff --stat arch/x86/kvm/ | grep -v '/kvm' | less -S

and then go ahead and do the same except with "xen" instead of "kvm". Now, once you've done that, ask yourself which one is going to be merged easily and without any pushback.

Btw, this is NOT meant to be a "xen vs kvm" thing. Before you react to the "kvm" part, replace "arch/x86/kvm" above with "drivers/scsi" or something.

The point? Xen really is horribly badly separated out. It gets way more incestuous with other systems than it should. It's entirely possible that this is very fundamental to both paravirtualization and to hypervisor behavior, but it doesn't matter - it just means that I can well see that Xen is a f*cking pain to merge.

So please, Xen people, look at your track record, and look at the issues from the standpoint of somebody merging your code, rather than just from the standpoint of somebody who whines "I want my code to be merged".

IOW, if you have trouble getting your code merged, ask yourself what _you_ are doing wrong.

Linus
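[What that pipeline measures can be seen on a toy repository. The sketch below is illustrative only - the `arch/x86/demo` subsystem, file names, and commit are invented, and it assumes `git` is installed - but on a real kernel checkout the same final `git log` invocation applies to `arch/x86/kvm` or `arch/x86/xen`:]

```shell
#!/bin/sh
# Illustrative sketch: measure how often commits touching one
# subsystem also touch files outside it ("external churn").
# The repository below is synthetic; on a real kernel tree you
# would run the final pipeline against arch/x86/kvm or arch/x86/xen.
set -e

repo=$(mktemp -d)
cd "$repo"
git init -q .
git config user.email demo@example.invalid
git config user.name demo

mkdir -p arch/x86/demo kernel
echo 'subsystem code' > arch/x86/demo/core.c
echo 'core hook'      > kernel/sched.c
git add .
git commit -q -m 'demo: subsystem change reaching outside its directory'

# List commits touching the subsystem with their *full* diffstat
# (--full-diff shows every file each commit changed, not just the
# path filter), then drop the subsystem's own files: what remains
# is the churn those commits caused elsewhere in the tree.
git log --pretty=oneline --full-diff --stat -- arch/x86/demo \
    | grep -v 'arch/x86/demo' \
    | grep '|'
```

[A clean subsystem leaves this output empty; every surviving diffstat line is a file outside the subsystem that its commits reached into.]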
On Tue, 2 Jun 2009, George Dunlap wrote:
> Thomas Gleixner wrote:
> > Exactly that's the point. Adding dom0 makes life easier for a group of
> > users who decided to use Xen some time ago, but what Ingo wants is
> > technical improvement of the kernel.
> >
> > There are many features which have been widely used in the distro
> > world where developers tried to push support into the kernel with the
> > same line of arguments.
> >
> > The kernel policy always was and still is to accept only those
> > features which have a technical benefit to the code base.
>
> I can appreciate the idea of resisting the pushing of random features. Still,
> your definition of "improving Linux" is still lacking. Obviously a new
> scheduler is taking something that's existing and improving it. But adding a
> new filesystem, a new driver, or adding a new feature, such as notifications,
> AIO, a new hardware architecture, or even KVM: how do those classify as
> "technical improvement to the kernel" or "features which have technical
> benefit to the code base" in a way that Xen does not?

There is a huge difference between new filesystems, drivers, architectures and Xen.

A new filesystem is not intrusive to the filesystem layers; it's not adding its special cases all over the place. There is no single "if (fs_whatever)" hackery in the code base. Neither is a driver nor a new architecture. If the new functionality needs some extension to the generic code base then this is carefully added with the maintainers of that code, and the extension is usually useful to others (filesystems, drivers, architectures) as well.

If it's necessary to add some special case for one architecture then this is done by proper abstraction to keep the burden and the maintenance cost down. There is no #ifdef ARCH_ARM in mm/ fs/ kernel/ block/ .....

Talking about KVM, there is not a single "if (kvm)" line in the arch/x86 code base.
There is _ONE_ lonely #ifdef CONFIG_KVM_CLOCK (which could be eliminated) in the whole x86 codebase, but at least 10 CONFIG_XEN* ones all over the place. The KVM developers went to great lengths to avoid adding restrictions to the existing code base.

I'm not saying that the Xen folks did not listen to us; they improved lots of their code base, and Jeremy was particularly helpful to unify the 32/64-bit code. But right now I see a big code dump with subtle details where some of them are just not acceptable to me.

> If you mean "increases Linux's technical capability", and define Xen as
> outside of Linux, then I think the definition is too small. After all,
> allowing Linux to run on an ARM processor isn't increasing Linux's technical
> capability, it's just allowing a new group of people (people with ARM chips)
> to use Linux. It's the same with Xen.

No, it's not. ARM does not interfere with anything and it keeps its architecture-specific limitations confined in arch/arm. Xen injects its design limitation workarounds into the arch/x86 codebase and burdens developers and maintainers with it.

> No one disputes the idea that changes shouldn't be ugly; no one disputes the
> idea that changes shouldn't introduce performance regressions. But there are
> patchqueues that are ready, signed-off by other maintainers, and which Ingo
> admits that he has no technical objections to, but refuses to merge.
> (His most recent "objection" is that he claims the currently existing pv_ops
> infrastructure (which KVM and others benefit from as well as Xen) introduces
> almost a 1% overhead on native in an mm-heavy microbenchmark. So he refuses
> to merge feature Y (dom0 support) until the Xen community helps technically
> unrelated existing feature X (pv_ops) meet some criteria. So it has nothing
> to do with the quality of the patches themselves.)

Oh well. It has a lot to do with the quality of the patches.
The design is part of the quality, and right now the shortcomings of the design are papered over by adding Xen restrictions into the x86 code base.

> [Not qualified to speak to the specific technical objections.]
>
> > I really have a hard time to see why dom0 support makes Linux more
> > useful to people who do not use it. It does not improve the Linux
> > experience of Joe User at all.
>
> If Joe User uses Amazon, he benefits. If Joe User downloads an Ubuntu or
> Debian distro, and the hosting providers were more secure and had to do less
> work because dom0 was inlined, then he benefits because of the lower cost /
> resources freed to do other things.

Right, then they can concentrate on adding another bunch of out-of-tree patches to their kernels. Next time you stand up and tell me the same argument for AppArmor, ndiswrapper or whatever people like to use.

> But what I was actually talking about is the number of people who don't use it
> now but would use it if it were merged in. There are hundreds of thousands of
> instances running now, and more people are choosing to use it at the moment,
> even though those who use it have the devil's choice between doing patching or
> using a 3-year-old kernel. How many more would use it if it were in mainline?

How many more would use ndiswrapper if it were in mainline?

> > In fact it could be harmful to the average user, if it's merged in a
> > crappy way that increases overhead, has a performance cost and draws
> > away development and maintenance resources from other areas of the
> > kernel.
>
> No one is asking for something to be merged in a crappy way, or with
> unacceptable performance cost. There are a number of patchqueues that Ingo
> has no technical objections to, but which he still refuses to merge.

Right, because the lineup of patches is not completely untangled and we still have objections against the overall outcome and design of the Dom0 integration into the kernel proper.
It's not our fault that the Dom0 design decisions were made in total disconnect from the kernel community, and now a "swallow them as is" policy is imposed on us with the argument that the newer kernels need to run on ancient hypervisors as well. You whine about users having to use 3-year-old kernels, but 3-year-old hypervisors are fine, right?

I'm not against merging dom0 in general; I'm opposing that we need to buy inferior technical solutions which we can not change for a long time. Once we merged them, the "you can not break existing hypervisors" argument will be used to prevent any design change and cleanup.

> The main point of Jeremy's e-mail was NOT to say, "Lots of people use this so
> you should merge it." He was responding to Xen being treated like it had no
> benefit. It does have a benefit; it is a feature.

Right, a feature which comes with cost. The cost is the de facto injection of a dom0 ABI into the arch/x86 code base. A new driver is a feature as well, but it just adds the feature w/o impact to the general system.

Thanks,

	tglx
Linus Torvalds wrote:
> The point? Xen really is horribly badly separated out. It gets way more
> incestuous with other systems than it should. It's entirely possible that
> this is very fundamental to both paravirtualization and to hypervisor
> behavior, but it doesn't matter - it just means that I can well see that
> Xen is a f*cking pain to merge.
>
> So please, Xen people, look at your track record, and look at the issues
> from the standpoint of somebody merging your code, rather than just from
> the standpoint of somebody who whines "I want my code to be merged".
>
> IOW, if you have trouble getting your code merged, ask yourself what _you_
> are doing wrong.

There is in fact a way to get dom0 support with nearly no changes to Linux, but it involves massive changes to Xen itself and requires hardware support: run dom0 as a fully virtualized guest, and assign it all the resources dom0 can access. It's probably a massive effort though.

I've considered it for kvm when faced with the "I want a thin hypervisor" question: compile the hypervisor kernel with PCI support but nothing else (no CONFIG_BLOCK or CONFIG_NET, no device drivers), load userspace from initramfs, and assign host devices to one or more privileged guests. You could probably run the host with a heavily stripped configuration, and enjoy the slimness while every interrupt invokes the scheduler, a context switch, and maybe an IPI for good measure.

-- 
Do not meddle in the internals of kernels, for they are subtle and quick to panic.
On Fri, May 29, 2009 at 01:01:18PM +0100, George Dunlap wrote:
> If we take him at his word, that the root issue is that he fundamentally
> dislikes the design choice of running Linux-as-hypervisor-component,
> then we have a difference of opinion and we're just going to have to
> agree to disagree. But there are reasons to include it anyway,
> including benefits to existing Xen users and potential Xen users (who
> have decided not to use KVM for whatever reason), and the idea of
> survival-of-the-fittest: Xen and KVM have made different design choices,
> let's let them both grow and see which one thrives. If KVM's design is
> unilaterally superior, eventually Xen will die off. But I suspect that
> there's significant demand in the OSS virtualization ecology for both
> approaches, and the world will be the worse for dom0 support being
> out-of-tree.

Three years ago, when I was hired by Red Hat, I was put on the Virt team, and I had to work on Xen. I found it an awkward community, to say the least. But I'll refrain from talking about that experience.

Before I was hired, I was full time developing the -rt patch. I was accustomed to the way Linux development worked, and felt comfortable with it. I was very pleased when I left the virt team to go back to work on the -rt patch.

Just before I left, KVM came out. I started playing with it and once again felt comfortable in that development. I probably would not have minded working in the virt team if it was KVM that I was working on.

I guess the point I'm trying to make here is that KVM is developed in a Linux community; Xen is not. The major difference between KVM and Xen is that KVM _is_ part of Linux. Xen is not. The reason that this matters is that if we need to make a change to the way Linux works we can simply make KVM handle the change. That is, you could think of it as Dom0 and the hypervisor would always be in sync.
If we were to break an interface with Dom0 for Xen then we would have a bunch of people crying foul about us breaking a defined API. One of Thomas's complaints (and a valid one) is that once Linux supports an external API it must always keep it compatible. This will hamper new development in Linux if the APIs are scattered throughout the kernel without much thought.

Now here's a crazy solution: merge the Xen hypervisor into Linux ;-)

Give full ownership of Xen to the Linux community. One of your people could be a maintainer. This way the API between Dom0 and the hypervisor would be an internal one. If you needed to upgrade Dom0, you would also have to upgrade the hypervisor, but that would be fine since the hypervisor would also be in the kernel proper.

This may not solve all the issues that the x86 maintainers have with the Dom0 patches, but it may help solve the API one.

Yeah, I know, I'll be having snowball fights with Saddam before that happens.

-- Steve
* Steven Rostedt <rostedt@goodmis.org> wrote:
> Now here's a crazy solution. Merge the Xen hypervisor into Linux
> ;-)

That's not that crazy - it's the right technical solution if DOM0 is desired for upstream. From what I've seen in DOM0 land, the incestuous dependencies are really only long-term manageable if the whole thing is in a single tree.

A lot of Xen legacies could be dropped: the crazy ring1 hack on 32-bit, the various wide interfaces to make pure-software virtualization limp along. All major CPUs shipped with hardware virtualization support in the past 2-3 years, so the availability of VMX and SVM can be taken for granted for such a project. That cuts down on a fair amount of crap.

A lot of code on the Linux side could be reused, and a pure CONFIG_PCI=y build (all other things disabled) would provide a "slim hypervisor" instance with a very small and concentrated code base. (That 'slim hypervisor' might even be built with CONFIG_NOMMU.)

That way dom0 would be a natural extension: a minimal interface between Linux-Xen-minimal and the dom0 guest instance. It's a sane technical model IMO, and makes dom0 a lot more palatable. Having in-tree competition to KVM would also obviously be good for Linux in general.

	Ingo

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel
On Tue, 2 Jun 2009, Steven Rostedt wrote:
> If we were to break an interface with Dom0 for Xen then we would have a bunch
> of people crying foul about us breaking a defined API. One of Thomas's complaints
> (and a valid one) is that once Linux supports an external API it must always
> keep it compatible. This will hamper new development in Linux if the APIs are
> scattered throughout the kernel without much thought.
>
> Now here's a crazy solution. Merge the Xen hypervisor into Linux ;-)

Not as crazy as you might think.

> Give full ownership of Xen to the Linux community. One of your people could be
> a maintainer. This way the API between Dom0 and the hypervisor would be an internal

s/API/ABI/ :)

> one. If you needed to upgrade Dom0, you also must upgrade the hypervisor, but that
> would be fine since the hypervisor would also be in the Kernel proper.
>
> This may not solve all the issues that the x86 maintainers have with the Dom0
> patches, but it may help solve the API one.

In fact it would resolve the ABI problem once and forever, as we could fix hypervisor / dom0 in sync. Hypervisor and dom0 need to run in lock-step anyway if you want to make useful progress, aside from maintaining versioned interfaces which are known to bloat rapidly. It's not a big deal to set a flag day which says: update hypervisor and (dom0) kernel in one go.

Thanks,

	tglx
[ Speaking as me, no regard to $EMPLOYER ]

On Wed, Jun 03, 2009 at 01:28:43AM +0200, Ingo Molnar wrote:
> A lot of Xen legacies could be dropped: the crazy ring1 hack on
> 32-bit, the various wide interfaces to make pure-software
> virtualization limp along. All major CPUs shipped with hardware
> virtualization support in the past 2-3 years, so the availability of
> VMX and SVM can be taken for granted for such a project.

The biggest reason I personally want Xen to be in mainline is PVM. Dropping PVM is, to me, pretty much saying "let's merge Xen without taking the useful parts."

I have only two large machines I control. They're too big to run as single hosts - it's a waste - but I can leverage cluster testing by virtualizing them.

The first machine has HVM support. The early kind. It's about 2 years old. It's so dreadfully slow that I had to go to PVM. That runs at very good speeds and I've stopped noticing the virtualization. The only problem I have is managing the hypervisor bits, because they're out of tree. Now, perhaps that could be fixed. Someone told me that older HVM boxen can't be fixed; you need a very recent VMX/SVM to perform well. But if it is fixable, then perhaps future plans shouldn't worry about it.

The second machine is pre-HVM by a short period. It is not even three years old. I can't run HVM on it, at all. I can either run PVM or I can't virtualize. It has fast CPUs and many GB of RAM. I can do an entire four-node cluster test on it, with serious (read, memory-intensive) software. In a PVM-less world, this machine becomes a single cluster node, and I have to go find three more machines. Of course, if I had infinite machines, I wouldn't be worrying about this at all.

So I want to see PVM continue for a long time. I'd like it to be something I can get with mainline Linux. I don't care if it is dom0, dom0 and the hypervisor, whatever. I just don't want to have to be patching out-of-tree patches for a pretty basic functionality.
I don't see 2-3 years as a time frame to assume "everyone has one." Otherwise, why does Linux have code for x86_32? Everyone's had a 64-bit system for at least that long. Sure, that's a straw man. It goes both ways.

Like Chris said, if we have technical hurdles for Xen to cross, let's get them out in the open and fixed. If previous Xen developer interaction has left a bad taste in people's mouths, then the current crew has to make it up to us. But we have to be willing to notice they're doing so. At the end of the day, I want to use Linux on my systems.

Joel

-- 
"I almost ran over an angel
 He had a nice big fat cigar.
 'In a sense,' he said, 'You're alone here
 So if you jump, you'd best jump far.'"

Joel Becker
Principal Software Developer
Oracle
E-mail: joel.becker@oracle.com
Phone: (650) 506-8127
On Tue, 2 Jun 2009, Joel Becker wrote:
> [ Speaking as me, no regard to $EMPLOYER ]
>
> On Wed, Jun 03, 2009 at 01:28:43AM +0200, Ingo Molnar wrote:
>> A lot of Xen legacies could be dropped: the crazy ring1 hack on
>> 32-bit, the various wide interfaces to make pure-software
>> virtualization limp along. All major CPUs shipped with hardware
>> virtualization support in the past 2-3 years, so the availability of
>> VMX and SVM can be taken for granted for such a project.
>
> The biggest reason I personally want Xen to be in mainline is
> PVM. Dropping PVM is, to me, pretty much saying "let's merge Xen
> without taking the useful parts."

> So I want to see PVM continue for a long time. I'd like it to
> be something I can get with mainline Linux. I don't care if it is dom0,
> dom0 and the hypervisor, whatever. I just don't want to have to be
> patching out-of-tree patches for a pretty basic functionality.
>
> I don't see 2-3 years as a time frame to assume "everyone has
> one." Otherwise, why does Linux have code for x86_32? Everyone's had a
> 64bit system for at least that long. Sure, that's a straw man. It goes
> both ways.

It's always easier to continue to support stuff that you already have in place than it is to add new things.

If the non-PVM stuff could be added to the kernel, how much would that simplify the code needed to support PVM? Would that reduce the amount of effort that the Xen people need to spend to something that would mean they would be able to keep up with fairly recent kernels?

Or what about getting the non-PVM version in, and then making the separate argument to add PVM support with a different config option ('Xen support for older CPUs; note there is a performance degradation if this option is selected')? Distros could support Xen in their main kernel package on new hardware, and users like you could enable the slower version.
David Lang

Note: I am not an approver in this process, just an interested observer (who doesn't use Xen).
On Tue, Jun 02, 2009 at 05:00:21PM -0700, Dan Magenheimer wrote:
> That sound you heard was 10000 xen-users@lists.xensource.com
> all having heart attacks at once.
>
> Need I say more.

So maybe I'm stupid, but why would they be having heart attacks? It seems like a decent solution to me. What's being proposed would make the dom0/hypervisor interface an internal one, always subject to change. What's wrong with that? Presumably the domU/hypervisor interface would have to remain stable, but why does the dom0/hypervisor interface have to be sacred and unchanging? I don't understand the concern.

						- Ted
Steven Rostedt
2009-Jun-03 03:42 UTC
[Xen-users] Re: Merge Xen (the hypervisor) into Linux
On Tue, 2 Jun 2009, Theodore Tso wrote:
> On Tue, Jun 02, 2009 at 05:00:21PM -0700, Dan Magenheimer wrote:
> > That sound you heard was 10000 xen-users@lists.xensource.com
> > all having heart attacks at once.
> >
> > Need I say more.
>
> So maybe I'm stupid, but why would they be having heart attacks?

Maybe because they asked for an apple and got an apple pie? That is, they are pushing hard for an interface for Dom0, and Ingo just agreed to take it along with the entire Xen hypervisor ;-)

> It seems like a decent solution to me. What's being proposed would
> make the dom0/hypervisor interface an internal one, always subject to
> change. What's wrong with that? Presumably the domU/hypervisor
> interface would have to remain stable, but why does the
> dom0/hypervisor interface have to be sacred and unchanging? I don't
> understand the concern.

I know I said it was a crazy idea, but the craziness was not with the technical side, or even whether it is the correct thing to do. I just don't see the Xen team cooperating with the Linux team. But maybe those are the old days.

Perhaps the rightful place for the Xen hypervisor is in Linux. Xen is GPL, right? Thus we could do this even without the permission of Citrix.

The Dom0 push of Xen just seems too much like Linux being Xen's sex slave, when it should be the other way around. By Linux acquiring the Xen hypervisor, I can imagine much more progress in the area of Xen. KVM may be a competitor, but the two may also be able to share code, thus both could benefit. I'm not as turned off by paravirt as others (although I've had my cursing at it), but with Xen inside Linux, we can tame the damage.

Progress of Xen would speed up since there would be no barrier between the changes in Linux and the changes in Xen. That is, they will always be compatible.

-- Steve
From: Dan Magenheimer <dan.magenheimer@oracle.com>
Date: Tue, 2 Jun 2009 21:49:58 -0700 (PDT)

> A hypervisor is not an operating system.

This is a pretty bogus statement if you ask me.

A hypervisor is a software system that provides separation between protection realms. It also handles exceptions and "system calls" on behalf of the other protection realms.

I personally don't see the difference at all. And since many hypervisors even do cpu scheduling, the fundamental differences converge to almost nothing.
Steven Rostedt
2009-Jun-03 05:07 UTC
[Xen-users] Re: Merge Xen (the hypervisor) into Linux
On Tue, 2 Jun 2009, David Miller wrote:
> From: Dan Magenheimer <dan.magenheimer@oracle.com>
> Date: Tue, 2 Jun 2009 21:49:58 -0700 (PDT)
>
> > A hypervisor is not an operating system.
>
> This is a pretty bogus statement if you ask me.
>
> A hypervisor is a software system that provides separation between
> protection realms.
>
> It also handles exceptions and "system calls" on behalf of the other
> protection realms.
>
> I personally don't see the difference at all. And since many
> hypervisors even do cpu scheduling, the fundamental differences
> converge to almost nothing.

I recently sat in an Operating Systems class where the professor was an old IBM retiree who worked on the 390 system way back when. He would argue the point that an operating system must do at least two things: schedule tasks and manage paging. The Xen hypervisor does both; thus, in his eyes, it is indeed an operating system.

-- Steve

P.S. He also thought that filesystem management does not have to be a duty of the OS, and he hated the fact he had to teach it ;-)
Hi,

> It seems like a decent solution to me. What's being proposed would
> make the dom0/hypervisor interface an internal one, always subject to
> change. What's wrong with that?

Linux is not the only player here. NetBSD can run as a dom0 guest. Solaris can run as a dom0 guest too. Thus making the dom0/xen interface private to Linux and Xen isn't going to fly.

cheers,
  Gerd
> The biggest reason I personally want Xen to be in mainline is
> PVM. Dropping PVM is, to me, pretty much saying "let's merge Xen
> without taking the useful parts."

PVM is, and has been for a long time, a message-passing parallel machine. Can you not misuse the abbreviation in confusing ways (especially in email I read in the morning ;))

Merging just hardware-assisted VM support initially might be a perfectly sensible path.

> Like Chris said, if we have technical hurdles for Xen to cross,
> let's get them out in the open and fixed. If previous Xen developer
> interaction has left a bad taste in people's mouths, then the current
> crew has to make it up to us. But we have to be willing to notice
> they're doing so.

Start by changing the mentality. Right now much of the patched code looks like "We made a decision years ago when creating Xen. Now we need to force that code we wrote into Linux somehow". Stuff gets merged a lot better if the thinking is "how do we make the minimal changes to the existing kernel, cleanly and with minimal inter-relationships".

Only after that do you worry about whether the existing in-kernel interfaces are right. There is a simple reason for this: changing an interface in the kernel is a consensus-finding process around all visible users of the interface. It's much easier to do that as a follow-up. That way you can bench alternatives, test if it harms any of the users, and merge change sets that span all the various users of the interface in one go. It's also frequently the case that when you have a simple clean interface that doesn't fit some in-tree users, it becomes blindingly obvious what it should look like.
So I would suggest the path is:

- Use existing interfaces
- Merge chunks of the Xen code without worrying too much about performance in Xen, but worry in detail about bare-metal performance
- Don't worry about "hard" problems initially - e.g. with PAE just use the paravirt CPUID hook and deny having PAE to begin with
- Where there isn't a clean simple interface, try as hard as possible to build some glue code using existing interfaces in the kernel

When it works, doesn't harm bare-metal performance and is merged, then go back and worry about the harder stuff, optimisation and fine tuning. It doesn't even need to be able to run all guests or all configurations initially.

Also, please can folks get out of the "how do we merge Xen" mentality into the "how do we create dom0 functionality for Xen in Linux" one - don't pre-suppose the existing implementation is right.

Alan
Christian Tramnitz
2009-Jun-03 08:07 UTC
[Xen-users] Re: Merge Xen (the hypervisor) into Linux
Ingo Molnar wrote:
> A lot of Xen legacies could be dropped: the crazy ring1 hack on
> 32-bit, the various wide interfaces to make pure-software
> virtualization limp along. All major CPUs shipped with hardware
> virtualization support in the past 2-3 years, so the availability of
> VMX and SVM can be taken for granted for such a project.

What a great idea, and while we're doing this let's also drop support for legacy stuff like PATA and i8042 in mainline. No one will need it anyway, because their successors have been on the market for years... let's just take it for granted that everyone is using SATA and USB nowadays!

Best regards,
   Christian
> Linux is not the only player here. NetBSD can run as dom0 guest.
> Solaris can run as dom0 guest too. Thus making the dom0/xen interface
> private to linux and xen isn't going to fly.

It does not, however, preclude fixing the dom0 interface. Anyway, we deal with unfixable interfaces on a regular basis with device hardware. What we don't do is screw up the kernel handling garbage hardware. We dump the adaptation on the driver.

Same with Xen: impedance-matching Xen's interface with the kernel is (at least initially) something that belongs entirely in the Xen glue, or can be sidestepped to get started by just turning off stuff. MTRR, PAE etc. can all be turned off for the purposes of an initial merge.
On 06/03/09 10:47, Alan Cox wrote:
>> Linux is not the only player here. NetBSD can run as dom0 guest.
>> Solaris can run as dom0 guest too. Thus making the dom0/xen interface
>> private to linux and xen isn't going to fly.
>
> It does not however preclude fixing the dom0 interface.

It wasn't my intention to imply that. The interface can be extended when needed. PAT support will probably be such a case. Changing it in incompatible ways isn't going to work though.

> MTRR, PAE etc can all be turned off for the purpose an initial merge.

s/PAE/PAT/? PAE is mandatory ...

Having not-yet-supported stuff disabled initially is sensible IMHO. Can be done for MTRR and PAT. Is already done for MSI ;) The lapic/ioapic stuff must be sorted though, because otherwise you can't boot the box at all. I think the same is true for the swiotlb bits.

cheers,
  Gerd
On 03/06/2009 10:09, "Gerd Hoffmann" <kraxel@redhat.com> wrote:
> On 06/03/09 10:47, Alan Cox wrote:
>>> Linux is not the only player here. NetBSD can run as dom0 guest.
>>> Solaris can run as dom0 guest too. Thus making the dom0/xen interface
>>> private to linux and xen isn't going to fly.
>>
>> It does not however preclude fixing the dom0 interface.
>
> It wasn't my intention to imply that. The interface can be extended
> when needed. PAT support will probably be such a case. Changing it in
> incompatible ways isn't going to work though.

We're happy to change interfaces where we agree that makes sense. Compatibility is our own (Xen's) problem of course, and it's generally not an insurmountable problem -- worst case, we can launch dom0 in a varying environment dependent on a Xen-specific ELF note, for example.

-- Keir
On Wed, Jun 03, 2009 at 11:09:39AM +0200, Gerd Hoffmann wrote:
> On 06/03/09 10:47, Alan Cox wrote:
>>> Linux is not the only player here. NetBSD can run as dom0 guest.
>>> Solaris can run as dom0 guest too. Thus making the dom0/xen interface
>>> private to linux and xen isn't going to fly.
>>
>> It does not however preclude fixing the dom0 interface.
>
> It wasn't my intention to imply that. The interface can be extended
> when needed. PAT support will probably be such a case. Changing it in
> incompatible ways isn't going to work though.

But that means that if there is some fundamentally broken piece of dom0
design, the Linux kernel will be stuck with it ***forever***, and it
will contaminate code paths and make the code harder to maintain
***forever***, if we consent to the Xen merge? Is that really what you
are saying? Be careful how you answer that....

 - Ted
On 03/06/2009 12:15, "Theodore Tso" <tytso@mit.edu> wrote:
>>> It does not however preclude fixing the dom0 interface.
>>
>> It wasn't my intention to imply that. The interface can be extended
>> when needed. PAT support will probably be such a case. Changing it in
>> incompatible ways isn't going to work though.
>
> But that means that if there is some fundamentally broken piece of
> dom0 design, that the Linux kernel will be stuck with it ***forever***
> and it will contaminate code paths and make the code harder to
> maintain ***forever*** if we consent to the Xen merge? Is that really
> what you are saying? Be careful how you answer that....

It's not true, if you are prepared for a new dom0 kernel to require a
new version of Xen (which seems not unreasonable). We're happy to make
reasonable interface changes, and deal with compatibility issues as
necessary within Xen.

 -- Keir
On 06/03/09 13:15, Theodore Tso wrote:
> On Wed, Jun 03, 2009 at 11:09:39AM +0200, Gerd Hoffmann wrote:
>> It wasn't my intention to imply that. The interface can be extended
>> when needed. PAT support will probably be such a case. Changing it in
>> incompatible ways isn't going to work though.
>
> But that means that if there is some fundamentally broken piece of
> dom0 design, that the Linux kernel will be stuck with it ***forever***
> and it will contaminate code paths and make the code harder to
> maintain ***forever*** if we consent to the Xen merge?

No. Xen is stuck with it forever (or at least for a few releases).

Even when adding new & better dom0/xen interfaces in the merge process,
Xen has to keep the old ones to handle the other dom0 guests (NetBSD,
Solaris, the old 2.6.18 out-of-tree linux kernel). Pretty much like the
linux kernel has to keep old syscalls to not break the ABI for
applications, xen has to maintain old hypercalls [1].

Other way around: apps can use new system calls only when running on
recent kernels, and they have to deal with -ENOSYS. Likewise it might
be that the pv_ops-based dom0 kernel can provide some features only
when running on a recent hypervisor. That will likely be the case for
PAT.

cheers,
  Gerd

[1] and other interfaces, like trap'n'emulate of certain instructions.
Ingo Molnar wrote:
> A lot of Xen legacies could be dropped: the crazy ring1 hack on
> 32-bit, the various wide interfaces to make pure-software
> virtualization limp along. All major CPUs shipped with hardware
> virtualization support in the past 2-3 years, so the availability of
> VMX and SVM can be taken for granted for such a project.

That's a pretty bold statement. I have five x86 machines in my house
currently being used, and none of them support VMX/SVM.

At least some Lenovo laptops disable VMX in the BIOS with no way to
enable it. Some of the Core2Duo chips don't support VMX at all.

I think Xen without paravirtualization would be a serious degradation
of usefulness.

Chris
On Wed, 03 Jun 2009 11:31:13 -0600
"Chris Friesen" <cfriesen@nortel.com> wrote:
> Ingo Molnar wrote:
>
>> A lot of Xen legacies could be dropped: the crazy ring1 hack on
>> 32-bit, the various wide interfaces to make pure-software
>> virtualization limp along. All major CPUs shipped with hardware
>> virtualization support in the past 2-3 years, so the availability of
>> VMX and SVM can be taken for granted for such a project.
>
> That's a pretty bold statement. I have five x86 machines in my house
> currently being used, and none of them support VMX/SVM.
>
> At least some Lenovo laptops disable VMX in the BIOS with no way to
> enable it. Some of the Core2Duo chips don't support VMX at all.

Ditto some Atom CPUs, which in turn means you can't run kvm on all the
netbooks right now - which is one place it's very useful.

> I think Xen without paravirtualization would be a serious degradation
> of usefulness.

At that point you can just use kvm anyway.
Thomas Gleixner wrote:
> On Fri, 29 May 2009, George Dunlap wrote:
>> David Miller wrote:
>>> I don't see Ingo's comments, whether I agree with them or not, as
>>> an implication of Xen being niche. Rather I see his comments as
>>> an opposition to how Xen is implemented.
>>
>> It's in his definition of "improving Linux". Jeremy is saying that
>> allowing Linux to run as dom0 *is* improving Linux. The lack of dom0
>> support is at this moment making life more difficult for a huge
>> number of Linux users who
>
> Exactly that's the point. Adding dom0 makes life easier for a group of
> users who decided to use Xen some time ago, but what Ingo wants is
> technical improvement of the kernel.
>
> There are many features which have been widely used in the distro
> world where developers tried to push support into the kernel with the
> same line of arguments.
>
> The kernel policy always was and still is to accept only those
> features which have a technical benefit to the code base.
>
> I'm just picking a few examples:
>
> Aside of the paravirt, which seems to expand through arch/x86 like a
> hydra, the new patches sprinkle "if (xen_...)" all over the
> place. These extra xen dependencies are no improvement, they are a
> royal pain in the ... They are sticky once they got merged, simply
> because the hypervisor relies on them and we need to provide
> compatibility for a long time.

Wait, let's not classify something as "no improvement" when you mean "I
don't need it." The fact that processors without hardware VM support
can run virtual machines is a non-trivial benefit for many users, and
in future embedded applications, where both hvm and 64-bit capability
may not justify their power requirements. And the improved PV
performance over full virtualization is an improvement, even though it
certainly isn't night and day.
Having replaced some systems with new hardware just so I could use KVM
does not make me forget that I used xen for some time, and that PV is
still a savings, even with the latest hardware.

Let's stick to technical issues, and not deny that there are a number
of users who really will have expanded capability. The technical points
are valid, but as a former and probable future xen (CentOS) user, so
are the benefits.

-- 
Bill Davidsen <davidsen@tmr.com>
  "We have more to fear from the bungling of the incompetent than from
   the machinations of the wicked." - from Slashdot
On Wed, 3 Jun 2009, Bill Davidsen wrote:
> Thomas Gleixner wrote:
>> Aside of the paravirt, which seems to expand through arch/x86 like a
>> hydra, the new patches sprinkle "if (xen_...)" all over the
>> place. These extra xen dependencies are no improvement, they are a
>> royal pain in the ... They are sticky once they got merged simply
>> because the hypervisor relies on them and we need to provide
>> compatibility for a long time.
>
> Wait, let's not classify something as "no improvement" when you mean
> "I don't need it."

It's not about "I don't need it." It's about having Xen dependencies in
the code all over the place which make maintenance harder. I have to
balance the users' benefit (xen dom0 support) vs. the impact on
maintainability and the restrictions which are going to be set almost
in stone by merging it.

> Let's stick to technical issues, and not deny that there are a number
> of users who really will have expanded capability. The technical
> points are valid, but as a former and probable future xen (CentOS)
> user, so are the benefits.

Refusing random "if (xen...)" dependencies is a purely technical
decision. I have said more than once that I'm not against merging dom0
in general; I'm just frightened by the technical impact of a de facto
ABI which we swallow with it.

We have enough problems with real silicon and BIOS/ACPI already, why
should we add artificial and _avoidable_ virtual silicon horror?

Thanks,
  tglx
Thomas Gleixner wrote:
> On Wed, 3 Jun 2009, Bill Davidsen wrote:
>> Thomas Gleixner wrote:
>>> Aside of the paravirt, which seems to expand through arch/x86 like a
>>> hydra, the new patches sprinkle "if (xen_...)" all over the
>>> place. These extra xen dependencies are no improvement, they are a
>>> royal pain in the ... They are sticky once they got merged simply
>>> because the hypervisor relies on them and we need to provide
>>> compatibility for a long time.
>>
>> Wait, let's not classify something as "no improvement" when you mean
>> "I don't need it."
>
> It's not about "I don't need it." It's about having Xen dependencies
> in the code all over the place which make maintenance harder. I have
> to balance the users' benefit (xen dom0 support) vs. the impact on
> maintainability and the restrictions which are going to be set almost
> in stone by merging it.
>
>> Let's stick to technical issues, and not deny that there are a number
>> of users who really will have expanded capability. The technical
>> points are valid, but as a former and probable future xen (CentOS)
>> user, so are the benefits.
>
> Refusing random "if (xen...)" dependencies is a purely technical
> decision. I have said more than once that I'm not against merging dom0
> in general, I'm just frightened by the technical impact of a de facto
> ABI which we swallow with it.

I was referring to your "no benefit" comment; I don't dispute the
technical issues. I think the idea of moving the hypervisor into the
kernel and letting xen folks do the external parts as they please has
merit.

> We have enough problems with real silicon and BIOS/ACPI already, why
> should we add artificial and _avoidable_ virtual silicon horror?

I guess my point wasn't clear, sorry; it's just that I felt as though
the benefit to systems lacking KVM support (old/small/BIOS-limited
CPUs) might be hidden in the smoke due to the technical issues.
-- 
Bill Davidsen <davidsen@tmr.com>
  Even purely technical things can appear to be magic, if the
  documentation is obscure enough. For example, PulseAudio is configured
  by dancing naked around a fire at midnight, shaking a rattle with one
  hand and a LISP manual with the other, while reciting the GNU
  manifesto in hexadecimal. The documentation fails to note that you
  must circle the fire counter-clockwise in the southern hemisphere.
Bill Davidsen wrote:
> I was referring to your "no benefit" comment, I don't dispute the
> technical issues. I think the idea of moving the hypervisor into the
> kernel and letting xen folks do the external parts as they please.

Where does that come from? AFAICT Thomas never made a "no benefit"
comment other than limited to the context of the technical
implementation. I've always understood his meaning in this thread to
be: "the proposed patch set does not improve the technical standard of
the linux kernel, but would instead lower it considerably".

Thomas has been extremely correct in this thread and IMO does not
deserve this attack. Let's look at his exact comments (emphasis mine).

! The kernel policy always was and still is to accept only those
! features which have a technical benefit **to the code base**.

and

! Aside of the paravirt, which seems to expand through arch/x86 like a
! hydra, the new patches sprinkle "if (xen_...)" all over the
! place. These extra xen dependencies are no improvement, they are a
! royal pain in the ...

Also clearly limited to technical implementation.

! I really have a hard time to see why dom0 support makes Linux more
! useful **to people who do not use it**. It does not improve the Linux
! experience **of Joe User** at all.

Or has Thomas made some "no benefit" comment I've missed?

Cheers,
FJP
Frans Pop wrote:
> ! The kernel policy always was and still is to accept only those
> ! features which have a technical benefit **to the code base**.

Yes, I think I understood him better after I responded to his e-mail
(unfortunately).

When people say things like "dom0 adds all these hooks but doesn't add
anything to Linux", they mean something like this (please correct me,
anyone, if I'm wrong). Kernel developers want Linux, as a project, to
have cool things in it. They want it to be cool. Adding new features,
new capabilities, new technical code, makes it cooler. Sometimes adding
new features to make it cooler has some cost in terms of adding things
to other parts of the code, possibly making it a little less clean or a
little more convoluted. But if the coolness is cool enough, it's worth
the cost.

The feeling is that adding a bunch of these dom0 hooks (especially of
the type "if(xen) { foo; }") is a cost to Linux. They make the code
ugly. They do allow a new kind of coolness, a (linux-dom0 + Xen)
coolness. But none of the coolness actually happens in Linux; it all
happens in Xen. So coolness may happen, and world happiness might
increase marginally, but Linux itself doesn't seem any cooler, it just
has the cost of all these ugly hooks. Thus the "Linux is Xen's sex
slave" analogy. :-)

If (hypothetically) we merged Xen into Linux, then (people are
suggesting) the coolness of Xen would actually contribute to the
coolness of Linux ("add technical benefit to the code base"). People
would feel like working on the interface between linux-xen and the rest
of linux would be making their own piece of software, Linux, work
better, rather than feeling like they have to work with some foreign
project that doesn't make their code any cooler.

Is that a pretty accurate representation of the "adding features which
have a technical benefit to the code base" argument?
 -George
Linus Torvalds wrote:
> Seriously.
>
> If it was just the local APIC, fine. But it may be just the local APIC
> code this time around; next time it will be something else. It's been
> TLB, it's been entry_*.S, it's been all over. Some of them are
> performance issues.
>
> I dunno. I just do know that I pointed out the statistics for how
> mindlessly incestuous the Xen patches have historically been to
> Jeremy. He admitted it. I've not seen _anybody_ say that things will
> improve.
>
> Xen has been painful. If you give maintainers pain, don't expect them
> to love you or respect you.
>
> So I would really suggest that Xen people should look at _why_ they
> are giving maintainers so much pain.
>
> Linus

Seriously, reading this is discouraging. I had to stop myself
criticizing this opinion too much here, but it's kind of hard to read
"mindless", "painful" and such considering the consequences of the
current state. As time passes, it's becoming more and more
unmaintainable to manage the dom0 patch on one side and the mainline
kernel on the other, even from a user/admin point of view. THIS is
years of mindless and painful administration/patching tasks. We've all
been waiting too long already. We need the Xen dom0 "feature" NOW! Not
tomorrow, not in one week, not in 10 years...

As a developer myself (not on the kernel though), I can perfectly
understand the standpoint about ugliness of the code. However, refusing
to merge gives bad headaches to hundreds of people trying to deal with
and maintain production systems with the issues it creates. I stand on
Steven Rostedt's side (and many others' too). Merging WILL make it
possible to have Xen going the way you wish. Otherwise, it's again a
cathedral type of development. Keir Fraser and others seem to be
willing to make changes in the API if needed. It's just not right to
say they don't want to.
And if there is such a need for ABI/API compatibility, why not just add
a config option "compatibility with old-style Xen (dirty ugly slow
feature)" if there are some issues?

Now, about merging the Xen hypervisor: that's another discussion that
can happen later on, IMHO. What's URGENT (I insist here) is dom0
support (including 64-bit).

Thomas
On Thu, Jun 04, 2009 at 02:21:08PM +0100, George Dunlap wrote:
> If (hypothetically) we merged Xen into Linux, then (people are
> suggesting) the coolness of Xen would actually contribute to the
> coolness of Linux ("add technical benefit to the code base"). People
> would feel like working on the interface between linux-xen and the
> rest of linux would be making their own piece of software, Linux, work
> better, rather than feeling like they have to work with some foreign
> project that doesn't make their code any cooler.
>
> Is that a pretty accurate representation of the "adding features which
> have a technical benefit to the code base" argument?

The other argument is that by merging Xen into Linux, it becomes easier
for kernel developers to understand *why* "if (xen) ..." shows up in
random places in core kernel code, and it becomes easier to clean that
up. If Xen isn't merged, it becomes much harder to believe that those
cleanups will occur, since the Xen developers might stonewall such
cleanups for reasons that Linux developers might not consider valid.
So the threshold for accepting patches might be much higher, since the
subsystem maintainers involved might decide to NAK patches as uglifying
the Linux kernel codebase with no real benefit to the Linux codebase
--- and not much hope that said ugly hacks will get cleaned up later.
Historically, once code with warts gets merged, we lose all leverage
towards fixing those warts afterwards; this is true in general, and not
a statement of a lack of trust of Xen developers specifically.

This doesn't make merging Xen *impossible*, but probably makes it
harder, since each of those objections will have to be cleared,
possibly by refactoring the code so that it adds benefits not just for
Xen, but for some other in-kernel user of that abstraction (i.e., KVM,
lguest, etc.), or by cleaning up the code in general, in order to clear
NAKs by the relevant developers.
If Xen is merged, then ultimately Linus gets to make the call about
whether something gets fixed, even at the cost of making a change to
the hypervisor/dom0 interface. So this would likely decrease the
threshold of what has to be fixed before people are willing to ACK a
Xen merge, since there's better confidence that these warts will be
cleaned up. An example of that might be XFS, which had all sorts of
Irix warts which have been gradually cleaned up over the years.

Of course, there might still be some hideous abstraction violations
that would have to be cleaned up first; but that's up to the relevant
subsystem maintainers.

 - Ted
George Dunlap wrote:
> Frans Pop wrote:
>> ! The kernel policy always was and still is to accept only those
>> ! features which have a technical benefit **to the code base**.
>
> If (hypothetically) we merged Xen into Linux, then (people are
> suggesting) the coolness of Xen would actually contribute to the
> coolness of Linux ("add technical benefit to the code base"). People
> would feel like working on the interface between linux-xen and the
> rest of linux would be making their own piece of software, Linux, work
> better, rather than feeling like they have to work with some foreign
> project that doesn't make their code any cooler.

I suspect that there is an element of this. There is also the factor
that if Xen was merged into linux, we would then be able to work
towards a sane(r) virtualization layer that would be useful for KVM,
Xen, and possibly others. This provides a technical benefit to the code
base by introducing a more logical organization rather than having
ad-hoc changes sprinkled all over.

Chris
Linus Torvalds
2009-Jun-04 18:53 UTC
[Xen-devel] Re: Merge Xen (the hypervisor) into Linux
On Wed, 3 Jun 2009, Christian Tramnitz wrote:
>
> What a great idea, and while we're doing this let's also drop support
> for legacy stuff like PATA and i8042 in mainline. No one will need it
> anyway because their successors have been on the market for years...
> let's just take it for granted that everyone is using SATA and USB
> nowadays!

Have you noticed how PATA and i8042 don't screw up anything else?

You're totally missing the problem. If Xen was a single-driver thing,
we wouldn't have this discussion. But as is, Xen craps all over OTHER
PEOPLE'S CODE.

When those people then aren't interested in Xen, why is anybody
surprised that people aren't excited?

Linus
Samuel Thibault
2009-Jun-05 00:09 UTC
[Xen-devel] Re: Merge Xen (the hypervisor) into Linux
Linus Torvalds wrote, on Thu 04 Jun 2009 11:53:45 -0700:
> On Wed, 3 Jun 2009, Christian Tramnitz wrote:
>>
>> What a great idea, and while we're doing this let's also drop support
>> for legacy stuff like PATA and i8042 in mainline. No one will need it
>> anyway because their successors have been on the market for years...
>> let's just take it for granted that everyone is using SATA and USB
>> nowadays!
>
> Have you noticed how PATA and i8042 don't screw up anything else?

Right. We should get rid of all the HIGHMEM kmap crap that cripples all
the code.

Samuel
Theodore Tso wrote:
> The other argument is that by merging Xen into Linux, it becomes
> easier for kernel developers to understand *why* "if (xen) ..." shows
> up in random places in core kernel code, and it becomes easier to
> clean that up.

I explicitly put the "if (xen) ..." stuff in there because I *know*
it's ugly: I didn't want to sugar-coat it, I didn't want to hide it
behind some bogus prettifying abstraction layer, and I didn't want it
to be left unfixed. But it was also the pragmatic way to make progress
towards something which actually works and is actually useful to
people.

> If Xen isn't merged, it becomes much harder to believe that those
> cleanups will occur, since the Xen developers might stonewall such
> cleanups for reasons that Linux developers might not consider valid.
> So the threshold for accepting patches might be much higher, since the
> subsystem maintainers involved might decide to NAK patches as
> uglifying the Linux kernel codebase with no real benefit to the Linux
> codebase --- and not much hope that said ugly hacks will get cleaned
> up later. Historically, once code with warts gets merged, we lose all
> leverage towards fixing those warts afterwards; this is true in
> general, and not a statement of a lack of trust of Xen developers
> specifically.

Well, my whole goal with getting Xen into the kernel has been to make
it a proper first-class kernel subsystem. That is: merge it with full
review and consensus; take comments and bug reports seriously; work
with other subsystem maintainers to sort problems out; take advantage
of better kernel mechanisms to improve the Xen code; add better kernel
mechanisms to improve the rest of the kernel. Stonewalling or blocking
changes don't come into it.
But it works two ways; if I feel that I'm being stonewalled, if people
aren't working with me, then I get frustrated.

> This doesn't make merging Xen *impossible*, but probably makes it
> harder, since each of those objections will have to be cleared,
> possibly by refactoring the code so that it adds benefits not just for
> Xen, but some other in-kernel user of that abstraction (i.e., like
> KVM, lguest, etc.) or by cleaning up the code in general, in order to
> clear NAKs by the relevant developers.

A lot of my kernel work of the last few years has been along those
lines: lots of unification, refactoring, cleanups, horse-trading with
other subsystems to find common ground, etc. As a result, most of the
Xen stuff has slipped in fairly cleanly and quietly. Consequently, very
few people seem to realize how uncontroversial the Xen work has been so
far; in the 3-ish years I've been working on it, this is the first big
mailing list blowup. In the meantime, there have been lots of users
happily using Xen as shipped with their stock kernel.

> If Xen is merged, then ultimately Linus gets to make the call about
> whether something gets fixed, even at the cost of making a change to
> the hypervisor/dom0 interface. So this would likely decrease the
> threshold of what has to be fixed before people are willing to ACK a
> Xen merge, since there's better confidence that these warts will be
> cleaned up.

Well, let's be precise here. Full Xen domU support has been in released
kernels for something like 18 months now. This whole discussion isn't
about "should we merge Xen?", because that has already happened. A
group of distinct subsystem developers got together, worked out a
common set of requirements, built an interface to meet those
requirements and implemented it. The result is Xen, VMI, lguest - and
now kvm, which didn't exist at the time, but has since found the
interface useful.
This controversy is about the - quite small - dom0 support subset of
Xen, which primarily relates to allowing Xen domains to have direct
access to hardware. It is technically challenging because it covers a
quite different set of functionality in different parts of the kernel -
pci, dma, interrupts, etc.

In some cases, the dom0 changes are fairly uncontroversial because
they're just another user of existing interfaces (dma_ops), or slightly
controversial because they need tweaks to an existing interface
(swiotlb).

However, where the existing kernel code doesn't have a suitable
abstraction layer, or even particularly clean internal interfaces (like
the apic code), working out how to make the appropriate Xen changes
poses a tricky tradeoff: do I attempt to restructure a large complex
subsystem with lots of subtle interactions with the rest of the kernel
- not to mention subtle interactions with many types of quirky hardware
- just to add my changes? Or do I make some relatively small, low-risk
(but low-beauty) changes to get the job done?

I went for the latter; the cost-benefit tradeoff just didn't seem to
justify a massive refactor. But others have pretty pointedly had the
opposite view, so I'm now investigating what it's actually going to
involve.

    J
From: Samuel Thibault <samuel.thibault@ens-lyon.org>
Date: Fri, 5 Jun 2009 02:09:10 +0200

> Linus Torvalds wrote, on Thu 04 Jun 2009 11:53:45 -0700:
>> On Wed, 3 Jun 2009, Christian Tramnitz wrote:
>>>
>>> What a great idea, and while we're doing this let's also drop
>>> support for legacy stuff like PATA and i8042 in mainline. No one
>>> will need it anyway because their successors have been on the market
>>> for years... let's just take it for granted that everyone is using
>>> SATA and USB nowadays!
>>
>> Have you noticed how PATA and i8042 don't screw up anything else?
>
> Right. We should get rid of all the HIGHMEM kmap crap that cripples
> all the code.

The kmap interfaces are pretty damn clean if you ask me. Especially
compared to the abortion Xen plops into the x86 platform code.

So, keep searching for an argument where none exists.
Linus Torvalds
2009-Jun-05 00:54 UTC
[Xen-devel] Re: Merge Xen (the hypervisor) into Linux
On Fri, 5 Jun 2009, Samuel Thibault wrote:
>
> Right. We should get rid of all the HIGHMEM kmap crap that cripples
> all the code.

Now you're starting to understand.

However, the difference between Xen and highmem (which I do hate, and
which took a long time and lots of effort to get done) is how many
people care. And in particular how many kernel developers do.

Until you can face these obvious facts, please just shut up. Ok?

Linus
Frans Pop wrote:
> Bill Davidsen wrote:
>> I was referring to your "no benefit" comment, I don't dispute the
>> technical issues. I think the idea of moving the hypervisor into the
>> kernel and letting xen folks do the external parts as they please.
>
> Where does that come from? AFAICT Thomas never made a "no benefit"
> comment other than limited to the context of the technical
> implementation.

Where it comes from is his very recent statement, which contains those
very words. You may interpret what he said in any way you choose, but
denying that he said it shows that you didn't follow the link back. I
never denied the ugliness of the code, nor does the author, but it adds
a great deal of value for many people, and that's the point I was
making.

-- 
Bill Davidsen <davidsen@tmr.com>
Bill Davidsen wrote:
> Where it comes from is his very recent statement, which contains those
> very words. You may interpret what he said in any way you choose, but
> denying that he said it shows that you didn't follow the link back. I
> never denied the ugliness of the code, nor does the author, but it
> adds a great deal of value for many people, and that's the point I was
> making.

Lots of code could be said to add a great deal of value for many people
(semi-closed video card drivers, ndiswrapper, etc.), but it's never
going to be accepted into the kernel. The maintainers get to decide
whether the perceived benefit outweighs the perceived cost. So far,
they've decided that Xen isn't worth it.

The most likely way to get Xen merged is to lower the cost (reduce the
churn and ugliness), increase the benefit (improve the virtualization
layer, thus cleaning up other code as well), or both.

Chris
* Avi Kivity <avi@redhat.com> wrote:
> Linus Torvalds wrote:
>> The point? Xen really is horribly badly separated out. It gets way
>> more incestuous with other systems than it should. It's entirely
>> possible that this is very fundamental to both paravirtualization
>> and to hypervisor behavior, but it doesn't matter - it just means
>> that I can well see that Xen is a f*cking pain to merge.
>>
>> So please, Xen people, look at your track record, and look at the
>> issues from the standpoint of somebody merging your code, rather
>> than just from the standpoint of somebody who whines "I want my
>> code to be merged".
>>
>> IOW, if you have trouble getting your code merged, ask yourself
>> what _you_ are doing wrong.
>
> There is in fact a way to get dom0 support with nearly no changes
> to Linux, but it involves massive changes to Xen itself and
> requires hardware support: run dom0 as a fully virtualized guest,
> and assign it all the resources dom0 can access. It's probably a
> massive effort though.
>
> I've considered it for kvm when faced with the "I want a thin
> hypervisor" question: compile the hypervisor kernel with PCI
> support but nothing else (no CONFIG_BLOCK or CONFIG_NET, no device
> drivers), load userspace from initramfs, and assign host devices
> to one or more privileged guests. You could probably run the host
> with a heavily stripped configuration, and enjoy the slimness
> while every interrupt invokes the scheduler, a context switch, and
> maybe an IPI for good measure.

This would be an acceptable model I suspect, if someone wants a 'slim
hypervisor'.

We can context switch way faster than we handle IRQs. Plus in a
slimmed-down config we could intentionally slim down aspects of the
scheduler as well, if it ever became a measurable performance issue.
The hypervisor would run a minimal user-space, and most of the
context-switching overhead relates to having a full-fledged user-space
with rich requirements.
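[As a rough illustration of Avi's "PCI support but nothing else" host: a
kernel .config fragment might look something like the following. This is
a sketch only, not from the thread; exact option names and dependencies
vary by kernel version, and a real config needs many more entries.]

```
# Slim KVM-host sketch: keep PCI and KVM, drop block/net and drivers.
CONFIG_PCI=y
CONFIG_VIRTUALIZATION=y
CONFIG_KVM=y
CONFIG_KVM_INTEL=y
# CONFIG_BLOCK is not set
# CONFIG_NET is not set
# Userspace comes from a built-in initramfs, so no block devices needed.
CONFIG_BLK_DEV_INITRD=y
CONFIG_INITRAMFS_SOURCE="initramfs.cpio"
```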
So there's no real conceptual friction between a 'lean and mean'
hypervisor and a full-featured native kernel.

This would certainly be an utterly clean design, and it would be
interesting to see a Linux/Xen + Linux/Dom0 combo engineered in such a
way - if people really find this layered kernel approach interesting.

So the door is not closed to dom0 at all - but it has to be designed
cleanly, without messing up the native kernel.

	Ingo
Ingo Molnar wrote:
>> I've considered it for kvm when faced with the "I want a thin
>> hypervisor" question: compile the hypervisor kernel with PCI
>> support but nothing else (no CONFIG_BLOCK or CONFIG_NET, no device
>> drivers), load userspace from initramfs, and assign host devices
>> to one or more privileged guests. [...]
>
> This would be an acceptable model I suspect, if someone wants a
> 'slim hypervisor'.
>
> We can context switch way faster than we handle IRQs. [...] So
> there's no real conceptual friction between a 'lean and mean'
> hypervisor and a full-featured native kernel.

The context switch would be taken by the Xen scheduler, not the Linux
scheduler. It's how interrupts work under Xen: an interrupt is taken,
Xen schedules the domain that owns the interrupt (dom0 usually), which
then handles the interrupt. The Linux scheduler would only be involved
if you thread your interrupt handlers.

This context switch is necessary regardless of how dom0 is integrated
into Linux; it's simply a side effect of implementing device drivers
outside the kernel (in this context, the kernel is Xen, and dom0 is
just another userspace, albeit with elevated privileges).
The Linux equivalent to dom0 is a process that uses uio.

-- 
Do not meddle in the internals of kernels, for they are subtle and
quick to panic.
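[To make the dom0/uio analogy concrete, here is a minimal sketch, not
from the thread, of how a userspace "driver" waits for an interrupt
through the UIO interface, much as dom0 waits for Xen to deliver an
event. The device node path is the caller's to supply; UIO devices
normally appear as /dev/uioN, and interrupt re-enabling via write()
only works for drivers that implement an irqcontrol hook.]

```c
#include <fcntl.h>
#include <stdint.h>
#include <unistd.h>

/* Block until the next interrupt on a UIO device.  Returns the total
 * interrupt count reported by the kernel, or -1 on error (no such
 * device, or the driver does not support irqcontrol). */
long wait_for_irq(const char *devnode)
{
    int fd = open(devnode, O_RDWR);
    if (fd < 0)
        return -1;

    /* UIO convention: writing a 32-bit 1 re-enables the device
     * interrupt (where the driver supports it). */
    uint32_t enable = 1;
    if (write(fd, &enable, sizeof(enable)) != (ssize_t)sizeof(enable)) {
        close(fd);
        return -1;
    }

    /* read() blocks until the next interrupt; the value read is the
     * number of interrupts seen so far. */
    uint32_t count;
    ssize_t n = read(fd, &count, sizeof(count));
    close(fd);
    return n == (ssize_t)sizeof(count) ? (long)count : -1;
}
```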
* Avi Kivity <avi@redhat.com> wrote:
> Ingo Molnar wrote:
>> This would be an acceptable model I suspect, if someone wants a
>> 'slim hypervisor'.
>>
>> We can context switch way faster than we handle IRQs. [...] So
>> there's no real conceptual friction between a 'lean and mean'
>> hypervisor and a full-featured native kernel.
>
> The context switch would be taken by the Xen scheduler, not the
> Linux scheduler. [...]

The 'slim hypervisor' model I was suggesting was a slimmed down
_Linux_ kernel.

	Ingo
Ingo Molnar wrote:
> * Avi Kivity <avi@redhat.com> wrote:
>> The context switch would be taken by the Xen scheduler, not the
>> Linux scheduler. [...]
>
> The 'slim hypervisor' model I was suggesting was a slimmed down
> _Linux_ kernel.

Yeah, I lost the context. I should reduce my own context switching.

-- 
Do not meddle in the internals of kernels, for they are subtle and
quick to panic.
On Sun, 2009-06-07 at 15:46 +0300, Avi Kivity wrote:
> Ingo Molnar wrote:
>>
>> The 'slim hypervisor' model I was suggesting was a slimmed down
>> _Linux_ kernel.
>
> Yeah, I lost the context. I should reduce my own context switching.

It would also be useful to monitor the switches, entries/exits and
other relevant parameters (counts and frequencies) through debugfs, to
help with performance tuning.

Thanks,
-- 
JSR
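[A sketch of the kind of monitoring JSR suggests, not from the thread:
reading a single numeric counter out of debugfs. The example path is an
assumption - on x86 hosts with the kvm module loaded, counters such as
/sys/kernel/debug/kvm/exits have historically been exposed there, but
availability depends on the kernel and on debugfs being mounted.]

```c
#include <stdio.h>

/* Read one numeric counter from a debugfs file, e.g.
 * "/sys/kernel/debug/kvm/exits".  Returns the counter value, or -1 if
 * the file is missing or unreadable (debugfs not mounted, module not
 * loaded, or insufficient privileges). */
long long read_counter(const char *path)
{
    FILE *f = fopen(path, "r");
    if (!f)
        return -1;

    long long value = -1;
    if (fscanf(f, "%lld", &value) != 1)
        value = -1;          /* file present but not a plain number */
    fclose(f);
    return value;
}
```

Sampling such a counter twice and dividing by the interval gives the
frequency JSR asks about.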