Paravirt ops is currently only capable of either replacing a lot of Linux
internal code or none at all. There are users that don't need all of the
possibilities pv-ops delivers though.

On KVM for example we're perfectly fine not using the PV MMU, thus not
touching any MMU code. That way we don't have to improve pv-ops to become
fast, we just don't compile the MMU parts in!

This patchset splits pv-ops into several smaller config options split by
feature category and then converts the KVM pv-ops code to use only the
bits that are required, lowering overhead.

Alexander Graf (3):
  Split paravirt ops by functionality
  Only export selected pv-ops feature structs
  Split the KVM pv-ops support by feature

 arch/x86/Kconfig | 72 +++++++++++++++++++++++++-
 arch/x86/include/asm/apic.h | 2 +-
 arch/x86/include/asm/desc.h | 4 +-
 arch/x86/include/asm/fixmap.h | 2 +-
 arch/x86/include/asm/highmem.h | 2 +-
 arch/x86/include/asm/io_32.h | 4 +-
 arch/x86/include/asm/io_64.h | 2 +-
 arch/x86/include/asm/irqflags.h | 21 ++++++--
 arch/x86/include/asm/mmu_context.h | 4 +-
 arch/x86/include/asm/msr.h | 4 +-
 arch/x86/include/asm/paravirt.h | 44 ++++++++++++++++-
 arch/x86/include/asm/paravirt_types.h | 12 +++++
 arch/x86/include/asm/pgalloc.h | 2 +-
 arch/x86/include/asm/pgtable-3level_types.h | 2 +-
 arch/x86/include/asm/pgtable.h | 2 +-
 arch/x86/include/asm/processor.h | 2 +-
 arch/x86/include/asm/required-features.h | 2 +-
 arch/x86/include/asm/smp.h | 2 +-
 arch/x86/include/asm/system.h | 13 +++--
 arch/x86/include/asm/tlbflush.h | 4 +-
 arch/x86/kernel/head_64.S | 2 +-
 arch/x86/kernel/kvm.c | 22 ++++++---
 arch/x86/kernel/paravirt.c | 37 +++++++++++--
 arch/x86/kernel/tsc.c | 2 +-
 arch/x86/kernel/vsmp_64.c | 2 +-
 arch/x86/xen/Kconfig | 2 +-
 26 files changed, 219 insertions(+), 50 deletions(-)
[PATCH 1/3] Split paravirt ops by functionality
Currently when using paravirt ops it's an all-or-nothing option. We can
either use pv-ops for CPU, MMU, timing, etc. or not at all.

Now there are some use cases where we don't need the full feature set, but
only a small chunk of it. KVM is a pretty prominent example for this.

So let's make everything a bit more fine-grained. We already have a splitting
by function groups, namely "cpu", "mmu", "time", "irq", "apic" and "spinlock".

Taking that existing splitting and extending it to only compile in the PV
capable bits sounded like a natural fit. That way we don't get performance
hits in MMU code from using the KVM PV clock which only needs the TIME parts
of pv-ops.

We define a new CONFIG_PARAVIRT_ALL option that basically does the same thing
the CONFIG_PARAVIRT did before this splitting. We move all users of
CONFIG_PARAVIRT to CONFIG_PARAVIRT_ALL, so they behave the same way they did
before.

So here it is - the splitting! I would have made the patch smaller, but this
was the closest I could get to atomic (for bisect) while staying sane.

Signed-off-by: Alexander Graf <agraf at suse.de>
---
 arch/x86/Kconfig | 47 ++++++++++++++++++++++++--
 arch/x86/include/asm/apic.h | 2 +-
 arch/x86/include/asm/desc.h | 4 +-
 arch/x86/include/asm/fixmap.h | 2 +-
 arch/x86/include/asm/highmem.h | 2 +-
 arch/x86/include/asm/io_32.h | 4 ++-
 arch/x86/include/asm/io_64.h | 2 +-
 arch/x86/include/asm/irqflags.h | 21 +++++++++---
 arch/x86/include/asm/mmu_context.h | 4 +-
 arch/x86/include/asm/msr.h | 4 +-
 arch/x86/include/asm/paravirt.h | 44 ++++++++++++++++++++++++-
 arch/x86/include/asm/paravirt_types.h | 12 +++++++
 arch/x86/include/asm/pgalloc.h | 2 +-
 arch/x86/include/asm/pgtable-3level_types.h | 2 +-
 arch/x86/include/asm/pgtable.h | 2 +-
 arch/x86/include/asm/processor.h | 2 +-
 arch/x86/include/asm/required-features.h | 2 +-
 arch/x86/include/asm/smp.h | 2 +-
 arch/x86/include/asm/system.h | 13 +++++--
 arch/x86/include/asm/tlbflush.h | 4 +-
 arch/x86/kernel/head_64.S | 2 +-
 arch/x86/kernel/paravirt.c | 2 +
 arch/x86/kernel/tsc.c | 2 +-
 arch/x86/kernel/vsmp_64.c | 2 +-
 arch/x86/xen/Kconfig | 2 +-
 25 files changed, 149 insertions(+), 38 deletions(-)

diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index 0c7b699..8c150b6 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -350,7 +350,7 @@ endif
 
 config X86_VSMP
 	bool "ScaleMP vSMP"
-	select PARAVIRT
+	select PARAVIRT_ALL
 	depends on X86_64 && PCI
 	depends on X86_EXTENDED_PLATFORM
 	---help---
@@ -493,7 +493,7 @@ source "arch/x86/xen/Kconfig"
 
 config VMI
 	bool "VMI Guest support (DEPRECATED)"
-	select PARAVIRT
+	select PARAVIRT_ALL
 	depends on X86_32
 	---help---
 	  VMI provides a paravirtualized interface to the VMware ESX server
@@ -512,7 +512,6 @@ config VMI
 
 config KVM_CLOCK
 	bool "KVM paravirtualized clock"
-	select PARAVIRT
 	select PARAVIRT_CLOCK
 	---help---
 	  Turning on this option will allow you to run a paravirtualized clock
@@ -523,7 +522,7 @@ config KVM_CLOCK
 
 config KVM_GUEST
 	bool "KVM Guest support"
-	select PARAVIRT
+	select PARAVIRT_ALL
 	---help---
 	  This option enables various optimizations for running under the KVM
 	  hypervisor.
@@ -551,8 +550,48 @@ config PARAVIRT_SPINLOCKS
 
 	  If you are unsure how to answer this question, answer N.
 
+config PARAVIRT_CPU
+	bool
+	select PARAVIRT
+	default n
+
+config PARAVIRT_TIME
+	bool
+	select PARAVIRT
+	default n
+
+config PARAVIRT_IRQ
+	bool
+	select PARAVIRT
+	default n
+
+config PARAVIRT_APIC
+	bool
+	select PARAVIRT
+	default n
+
+config PARAVIRT_MMU
+	bool
+	select PARAVIRT
+	default n
+
+#
+# This is a placeholder to activate the old "include all pv-ops functionality"
+# behavior. If you're using this I'd recommend looking through your code to see
+# if you can be more specific. It probably saves you a few cycles!
+#
+config PARAVIRT_ALL
+	bool
+	select PARAVIRT_CPU
+	select PARAVIRT_TIME
+	select PARAVIRT_IRQ
+	select PARAVIRT_APIC
+	select PARAVIRT_MMU
+	default n
+
 config PARAVIRT_CLOCK
 	bool
+	select PARAVIRT_TIME
 	default n
 
 endif
diff --git a/arch/x86/include/asm/apic.h b/arch/x86/include/asm/apic.h
index 474d80d..b54c24a 100644
--- a/arch/x86/include/asm/apic.h
+++ b/arch/x86/include/asm/apic.h
@@ -81,7 +81,7 @@ static inline bool apic_from_smp_config(void)
 /*
  * Basic functions accessing APICs.
  */
-#ifdef CONFIG_PARAVIRT
+#ifdef CONFIG_PARAVIRT_APIC
 #include <asm/paravirt.h>
 #endif
 
diff --git a/arch/x86/include/asm/desc.h b/arch/x86/include/asm/desc.h
index e8de2f6..cf65891 100644
--- a/arch/x86/include/asm/desc.h
+++ b/arch/x86/include/asm/desc.h
@@ -78,7 +78,7 @@ static inline int desc_empty(const void *ptr)
 	return !(desc[0] | desc[1]);
 }
 
-#ifdef CONFIG_PARAVIRT
+#ifdef CONFIG_PARAVIRT_CPU
 #include <asm/paravirt.h>
 #else
 #define load_TR_desc() native_load_tr_desc()
@@ -108,7 +108,7 @@ static inline void paravirt_alloc_ldt(struct desc_struct *ldt, unsigned entries)
 static inline void paravirt_free_ldt(struct desc_struct *ldt, unsigned entries)
 {
 }
-#endif /* CONFIG_PARAVIRT */
+#endif /* CONFIG_PARAVIRT_CPU */
 
 #define store_ldt(ldt) asm("sldt %0" : "=m"(ldt))
 
diff --git a/arch/x86/include/asm/fixmap.h b/arch/x86/include/asm/fixmap.h
index 14f9890..5f29317 100644
--- a/arch/x86/include/asm/fixmap.h
+++ b/arch/x86/include/asm/fixmap.h
@@ -156,7 +156,7 @@ void __native_set_fixmap(enum fixed_addresses idx, pte_t pte);
 void native_set_fixmap(enum fixed_addresses idx,
 		       phys_addr_t phys, pgprot_t flags);
 
-#ifndef CONFIG_PARAVIRT
+#ifndef CONFIG_PARAVIRT_MMU
 static inline void __set_fixmap(enum fixed_addresses idx,
 				phys_addr_t phys, pgprot_t flags)
 {
diff --git a/arch/x86/include/asm/highmem.h b/arch/x86/include/asm/highmem.h
index 014c2b8..458d785 100644
--- a/arch/x86/include/asm/highmem.h
+++ b/arch/x86/include/asm/highmem.h
@@ -66,7 +66,7 @@ void *kmap_atomic_pfn(unsigned long pfn, enum km_type type);
 void *kmap_atomic_prot_pfn(unsigned long pfn, enum km_type type, pgprot_t prot);
 struct page *kmap_atomic_to_page(void *ptr);
 
-#ifndef CONFIG_PARAVIRT
+#ifndef CONFIG_PARAVIRT_MMU
 #define kmap_atomic_pte(page, type) kmap_atomic(page, type)
 #endif
 
diff --git a/arch/x86/include/asm/io_32.h b/arch/x86/include/asm/io_32.h
index a299900..a263c6f 100644
--- a/arch/x86/include/asm/io_32.h
+++ b/arch/x86/include/asm/io_32.h
@@ -109,7 +109,9 @@ extern void io_delay_init(void);
 
 #if defined(CONFIG_PARAVIRT)
 #include <asm/paravirt.h>
-#else
+#endif
+
+#ifndef CONFIG_PARAVIRT_CPU
 
 static inline void slow_down_io(void)
 {
diff --git a/arch/x86/include/asm/io_64.h b/arch/x86/include/asm/io_64.h
index 2440678..82c6eae 100644
--- a/arch/x86/include/asm/io_64.h
+++ b/arch/x86/include/asm/io_64.h
@@ -40,7 +40,7 @@ extern void native_io_delay(void);
 extern int io_delay_type;
 extern void io_delay_init(void);
 
-#if defined(CONFIG_PARAVIRT)
+#if defined(CONFIG_PARAVIRT_CPU)
 #include <asm/paravirt.h>
 #else
 
diff --git a/arch/x86/include/asm/irqflags.h b/arch/x86/include/asm/irqflags.h
index 9e2b952..b8d8f4c 100644
--- a/arch/x86/include/asm/irqflags.h
+++ b/arch/x86/include/asm/irqflags.h
@@ -58,9 +58,11 @@ static inline void native_halt(void)
 
 #ifdef CONFIG_PARAVIRT
 #include <asm/paravirt.h>
-#else
+#endif
+
 #ifndef __ASSEMBLY__
 
+#ifndef CONFIG_PARAVIRT_IRQ
 static inline unsigned long __raw_local_save_flags(void)
 {
 	return native_save_fl();
@@ -110,12 +112,17 @@ static inline unsigned long __raw_local_irq_save(void)
 
 	return flags;
 }
-#else
+#endif /* CONFIG_PARAVIRT_IRQ */
+
+#else /* __ASSEMBLY__ */
 
+#ifndef CONFIG_PARAVIRT_IRQ
 #define ENABLE_INTERRUPTS(x) sti
 #define DISABLE_INTERRUPTS(x) cli
+#endif /* !CONFIG_PARAVIRT_IRQ */
 
 #ifdef CONFIG_X86_64
+#ifndef CONFIG_PARAVIRT_CPU
 #define SWAPGS swapgs
 /*
  * Currently paravirt can't handle swapgs nicely when we
@@ -128,8 +135,6 @@ static inline unsigned long __raw_local_irq_save(void)
  */
 #define SWAPGS_UNSAFE_STACK swapgs
 
-#define PARAVIRT_ADJUST_EXCEPTION_FRAME /* */
-
 #define INTERRUPT_RETURN iretq
 #define USERGS_SYSRET64 \
 	swapgs; \
@@ -141,16 +146,22 @@ static inline unsigned long __raw_local_irq_save(void)
 	swapgs; \
 	sti; \
 	sysexit
+#endif /* !CONFIG_PARAVIRT_CPU */
+
+#ifndef CONFIG_PARAVIRT_IRQ
+#define PARAVIRT_ADJUST_EXCEPTION_FRAME /* */
+#endif /* !CONFIG_PARAVIRT_IRQ */
 
 #else
+#ifndef CONFIG_PARAVIRT_CPU
 #define INTERRUPT_RETURN iret
 #define ENABLE_INTERRUPTS_SYSEXIT sti; sysexit
 #define GET_CR0_INTO_EAX movl %cr0, %eax
+#endif /* !CONFIG_PARAVIRT_CPU */
 #endif
 
 
 #endif /* __ASSEMBLY__ */
-#endif /* CONFIG_PARAVIRT */
 
 #ifndef __ASSEMBLY__
 #define raw_local_save_flags(flags) \
diff --git a/arch/x86/include/asm/mmu_context.h b/arch/x86/include/asm/mmu_context.h
index 4a2d4e0..a209e67 100644
--- a/arch/x86/include/asm/mmu_context.h
+++ b/arch/x86/include/asm/mmu_context.h
@@ -6,14 +6,14 @@
 #include <asm/pgalloc.h>
 #include <asm/tlbflush.h>
 #include <asm/paravirt.h>
-#ifndef CONFIG_PARAVIRT
+#ifndef CONFIG_PARAVIRT_MMU
 #include <asm-generic/mm_hooks.h>
 
 static inline void paravirt_activate_mm(struct mm_struct *prev,
 					struct mm_struct *next)
 {
 }
-#endif /* !CONFIG_PARAVIRT */
+#endif /* !CONFIG_PARAVIRT_MMU */
 
 /*
  * Used for LDT copy/destruction.
diff --git a/arch/x86/include/asm/msr.h b/arch/x86/include/asm/msr.h
index 7e2b6ba..80ec5a5 100644
--- a/arch/x86/include/asm/msr.h
+++ b/arch/x86/include/asm/msr.h
@@ -123,7 +123,7 @@ static inline unsigned long long native_read_pmc(int counter)
 	return EAX_EDX_VAL(val, low, high);
 }
 
-#ifdef CONFIG_PARAVIRT
+#ifdef CONFIG_PARAVIRT_CPU
 #include <asm/paravirt.h>
 #else
 #include <linux/errno.h>
@@ -234,7 +234,7 @@ do { \
 
 #define rdtscpll(val, aux) (val) = native_read_tscp(&(aux))
 
-#endif /* !CONFIG_PARAVIRT */
+#endif /* !CONFIG_PARAVIRT_CPU */
 
 
 #define checking_wrmsrl(msr, val) wrmsr_safe((msr), (u32)(val), \
diff --git a/arch/x86/include/asm/paravirt.h b/arch/x86/include/asm/paravirt.h
index efb3899..e543098 100644
--- a/arch/x86/include/asm/paravirt.h
+++ b/arch/x86/include/asm/paravirt.h
@@ -18,6 +18,7 @@ static inline int paravirt_enabled(void)
 	return pv_info.paravirt_enabled;
 }
 
+#ifdef CONFIG_PARAVIRT_CPU
 static inline void load_sp0(struct tss_struct *tss,
 			     struct thread_struct *thread)
 {
@@ -58,7 +59,9 @@ static inline void write_cr0(unsigned long x)
 {
 	PVOP_VCALL1(pv_cpu_ops.write_cr0, x);
 }
+#endif /* CONFIG_PARAVIRT_CPU */
 
+#ifdef CONFIG_PARAVIRT_MMU
 static inline unsigned long read_cr2(void)
 {
 	return PVOP_CALL0(unsigned long, pv_mmu_ops.read_cr2);
@@ -78,7 +81,9 @@ static inline void write_cr3(unsigned long x)
 {
 	PVOP_VCALL1(pv_mmu_ops.write_cr3, x);
 }
+#endif /* CONFIG_PARAVIRT_MMU */
 
+#ifdef CONFIG_PARAVIRT_CPU
 static inline unsigned long read_cr4(void)
 {
 	return PVOP_CALL0(unsigned long, pv_cpu_ops.read_cr4);
@@ -92,8 +97,9 @@ static inline void write_cr4(unsigned long x)
 {
 	PVOP_VCALL1(pv_cpu_ops.write_cr4, x);
 }
+#endif /* CONFIG_PARAVIRT_CPU */
 
-#ifdef CONFIG_X86_64
+#if defined(CONFIG_X86_64) && defined(CONFIG_PARAVIRT_CPU)
 static inline unsigned long read_cr8(void)
 {
 	return PVOP_CALL0(unsigned long, pv_cpu_ops.read_cr8);
@@ -105,6 +111,7 @@ static inline void write_cr8(unsigned long x)
 }
 #endif
 
+#ifdef CONFIG_PARAVIRT_IRQ
 static inline void raw_safe_halt(void)
 {
 	PVOP_VCALL0(pv_irq_ops.safe_halt);
@@ -114,14 +121,18 @@ static inline void halt(void)
 {
 	PVOP_VCALL0(pv_irq_ops.safe_halt);
 }
+#endif /* CONFIG_PARAVIRT_IRQ */
 
+#ifdef CONFIG_PARAVIRT_CPU
 static inline void wbinvd(void)
 {
 	PVOP_VCALL0(pv_cpu_ops.wbinvd);
 }
+#endif
 
 #define get_kernel_rpl() (pv_info.kernel_rpl)
 
+#ifdef CONFIG_PARAVIRT_CPU
 static inline u64 paravirt_read_msr(unsigned msr, int *err)
 {
 	return PVOP_CALL2(u64, pv_cpu_ops.read_msr, msr, err);
@@ -224,12 +235,16 @@ do { \
 } while (0)
 
 #define rdtscll(val) (val = paravirt_read_tsc())
+#endif /* CONFIG_PARAVIRT_CPU */
 
+#ifdef CONFIG_PARAVIRT_TIME
 static inline unsigned long long paravirt_sched_clock(void)
 {
 	return PVOP_CALL0(unsigned long long, pv_time_ops.sched_clock);
 }
+#endif /* CONFIG_PARAVIRT_TIME */
 
+#ifdef CONFIG_PARAVIRT_CPU
 static inline unsigned long long paravirt_read_pmc(int counter)
 {
 	return PVOP_CALL1(u64, pv_cpu_ops.read_pmc, counter);
@@ -345,8 +360,9 @@ static inline void slow_down_io(void)
 	pv_cpu_ops.io_delay();
 #endif
 }
+#endif /* CONFIG_PARAVIRT_CPU */
 
-#ifdef CONFIG_SMP
+#if defined(CONFIG_SMP) && defined(CONFIG_PARAVIRT_APIC)
 static inline void startup_ipi_hook(int phys_apicid, unsigned long start_eip,
 				    unsigned long start_esp)
 {
@@ -355,6 +371,7 @@ static inline void startup_ipi_hook(int phys_apicid, unsigned long start_eip,
 }
 #endif
 
+#ifdef CONFIG_PARAVIRT_MMU
 static inline void paravirt_activate_mm(struct mm_struct *prev,
 					struct mm_struct *next)
 {
@@ -698,7 +715,9 @@ static inline void pmd_clear(pmd_t *pmdp)
 	set_pmd(pmdp, __pmd(0));
 }
 #endif /* CONFIG_X86_PAE */
+#endif /* CONFIG_PARAVIRT_MMU */
 
+#ifdef CONFIG_PARAVIRT_CPU
 #define __HAVE_ARCH_START_CONTEXT_SWITCH
 static inline void arch_start_context_switch(struct task_struct *prev)
 {
@@ -709,7 +728,9 @@ static inline void arch_end_context_switch(struct task_struct *next)
 {
 	PVOP_VCALL1(pv_cpu_ops.end_context_switch, next);
 }
+#endif /* CONFIG_PARAVIRT_CPU */
 
+#ifdef CONFIG_PARAVIRT_MMU
 #define __HAVE_ARCH_ENTER_LAZY_MMU_MODE
 static inline void arch_enter_lazy_mmu_mode(void)
 {
@@ -728,6 +749,7 @@ static inline void __set_fixmap(unsigned /* enum fixed_addresses */ idx,
 {
 	pv_mmu_ops.set_fixmap(idx, phys, flags);
 }
+#endif /* CONFIG_PARAVIRT_MMU */
 
 #if defined(CONFIG_SMP) && defined(CONFIG_PARAVIRT_SPINLOCKS)
 
@@ -838,6 +860,7 @@ static __always_inline void __raw_spin_unlock(struct raw_spinlock *lock)
 #define __PV_IS_CALLEE_SAVE(func) \
 	((struct paravirt_callee_save) { func })
 
+#ifdef CONFIG_PARAVIRT_IRQ
 static inline unsigned long __raw_local_save_flags(void)
 {
 	return PVOP_CALLEE0(unsigned long, pv_irq_ops.save_fl);
@@ -866,6 +889,7 @@ static inline unsigned long __raw_local_irq_save(void)
 	raw_local_irq_disable();
 	return f;
 }
+#endif /* CONFIG_PARAVIRT_IRQ */
 
 
 /* Make sure as little as possible of this mess escapes. */
@@ -948,10 +972,13 @@ extern void default_banner(void);
 #define PARA_INDIRECT(addr) *%cs:addr
 #endif
 
+#ifdef CONFIG_PARAVIRT_CPU
 #define INTERRUPT_RETURN \
 	PARA_SITE(PARA_PATCH(pv_cpu_ops, PV_CPU_iret), CLBR_NONE, \
 		  jmp PARA_INDIRECT(pv_cpu_ops+PV_CPU_iret))
+#endif /* CONFIG_PARAVIRT_CPU */
 
+#ifdef CONFIG_PARAVIRT_IRQ
 #define DISABLE_INTERRUPTS(clobbers) \
 	PARA_SITE(PARA_PATCH(pv_irq_ops, PV_IRQ_irq_disable), clobbers, \
 		  PV_SAVE_REGS(clobbers | CLBR_CALLEE_SAVE); \
@@ -963,13 +990,17 @@ extern void default_banner(void);
 		  PV_SAVE_REGS(clobbers | CLBR_CALLEE_SAVE); \
 		  call PARA_INDIRECT(pv_irq_ops+PV_IRQ_irq_enable); \
 		  PV_RESTORE_REGS(clobbers | CLBR_CALLEE_SAVE);)
+#endif /* CONFIG_PARAVIRT_IRQ */
 
+#ifdef CONFIG_PARAVIRT_CPU
 #define USERGS_SYSRET32 \
 	PARA_SITE(PARA_PATCH(pv_cpu_ops, PV_CPU_usergs_sysret32), \
 		  CLBR_NONE, \
 		  jmp PARA_INDIRECT(pv_cpu_ops+PV_CPU_usergs_sysret32))
+#endif /* CONFIG_PARAVIRT_CPU */
 
 #ifdef CONFIG_X86_32
+#ifdef CONFIG_PARAVIRT_CPU
 #define GET_CR0_INTO_EAX \
 	push %ecx; push %edx; \
 	call PARA_INDIRECT(pv_cpu_ops+PV_CPU_read_cr0); \
@@ -979,10 +1010,12 @@ extern void default_banner(void);
 	PARA_SITE(PARA_PATCH(pv_cpu_ops, PV_CPU_irq_enable_sysexit), \
 		  CLBR_NONE, \
 		  jmp PARA_INDIRECT(pv_cpu_ops+PV_CPU_irq_enable_sysexit))
+#endif /* CONFIG_PARAVIRT_CPU */
 
 
 #else /* !CONFIG_X86_32 */
 
+#ifdef CONFIG_PARAVIRT_CPU
 /*
  * If swapgs is used while the userspace stack is still current,
  * there's no way to call a pvop. The PV replacement *must* be
@@ -1002,17 +1035,23 @@ extern void default_banner(void);
 	PARA_SITE(PARA_PATCH(pv_cpu_ops, PV_CPU_swapgs), CLBR_NONE, \
 		  call PARA_INDIRECT(pv_cpu_ops+PV_CPU_swapgs) \
 		 )
+#endif /* CONFIG_PARAVIRT_CPU */
 
+#ifdef CONFIG_PARAVIRT_MMU
 #define GET_CR2_INTO_RCX \
 	call PARA_INDIRECT(pv_mmu_ops+PV_MMU_read_cr2); \
 	movq %rax, %rcx; \
 	xorq %rax, %rax;
+#endif /* CONFIG_PARAVIRT_MMU */
 
+#ifdef CONFIG_PARAVIRT_IRQ
 #define PARAVIRT_ADJUST_EXCEPTION_FRAME \
 	PARA_SITE(PARA_PATCH(pv_irq_ops, PV_IRQ_adjust_exception_frame), \
 		  CLBR_NONE, \
 		  call PARA_INDIRECT(pv_irq_ops+PV_IRQ_adjust_exception_frame))
+#endif /* CONFIG_PARAVIRT_IRQ */
 
+#ifdef CONFIG_PARAVIRT_CPU
 #define USERGS_SYSRET64 \
 	PARA_SITE(PARA_PATCH(pv_cpu_ops, PV_CPU_usergs_sysret64), \
 		  CLBR_NONE, \
@@ -1022,6 +1061,7 @@ extern void default_banner(void);
 	PARA_SITE(PARA_PATCH(pv_cpu_ops, PV_CPU_irq_enable_sysexit), \
 		  CLBR_NONE, \
 		  jmp PARA_INDIRECT(pv_cpu_ops+PV_CPU_irq_enable_sysexit))
+#endif /* CONFIG_PARAVIRT_CPU */
 #endif /* CONFIG_X86_32 */
 
 #endif /* __ASSEMBLY__ */
diff --git a/arch/x86/include/asm/paravirt_types.h b/arch/x86/include/asm/paravirt_types.h
index 9357473..e190450 100644
--- a/arch/x86/include/asm/paravirt_types.h
+++ b/arch/x86/include/asm/paravirt_types.h
@@ -343,12 +343,24 @@ struct paravirt_patch_template {
 
 extern struct pv_info pv_info;
 extern struct pv_init_ops pv_init_ops;
+#ifdef CONFIG_PARAVIRT_TIME
 extern struct pv_time_ops pv_time_ops;
+#endif
+#ifdef CONFIG_PARAVIRT_CPU
 extern struct pv_cpu_ops pv_cpu_ops;
+#endif
+#ifdef CONFIG_PARAVIRT_IRQ
 extern struct pv_irq_ops pv_irq_ops;
+#endif
+#ifdef CONFIG_PARAVIRT_APIC
 extern struct pv_apic_ops pv_apic_ops;
+#endif
+#ifdef CONFIG_PARAVIRT_MMU
 extern struct pv_mmu_ops pv_mmu_ops;
+#endif
+#ifdef CONFIG_PARAVIRT_SPINLOCKS
 extern struct pv_lock_ops pv_lock_ops;
+#endif
 
 #define PARAVIRT_PATCH(x) \
 	(offsetof(struct paravirt_patch_template, x) / sizeof(void *))
diff --git a/arch/x86/include/asm/pgalloc.h b/arch/x86/include/asm/pgalloc.h
index 0e8c2a0..94cce3d 100644
--- a/arch/x86/include/asm/pgalloc.h
+++ b/arch/x86/include/asm/pgalloc.h
@@ -7,7 +7,7 @@
 
 static inline int __paravirt_pgd_alloc(struct mm_struct *mm) { return 0; }
 
-#ifdef CONFIG_PARAVIRT
+#ifdef CONFIG_PARAVIRT_MMU
 #include <asm/paravirt.h>
 #else
 #define paravirt_pgd_alloc(mm) __paravirt_pgd_alloc(mm)
diff --git a/arch/x86/include/asm/pgtable-3level_types.h b/arch/x86/include/asm/pgtable-3level_types.h
index 1bd5876..be58e74 100644
--- a/arch/x86/include/asm/pgtable-3level_types.h
+++ b/arch/x86/include/asm/pgtable-3level_types.h
@@ -18,7 +18,7 @@ typedef union {
 } pte_t;
 #endif /* !__ASSEMBLY__ */
 
-#ifdef CONFIG_PARAVIRT
+#ifdef CONFIG_PARAVIRT_MMU
 #define SHARED_KERNEL_PMD (pv_info.shared_kernel_pmd)
 #else
 #define SHARED_KERNEL_PMD 1
diff --git a/arch/x86/include/asm/pgtable.h b/arch/x86/include/asm/pgtable.h
index af6fd36..b68edfc 100644
--- a/arch/x86/include/asm/pgtable.h
+++ b/arch/x86/include/asm/pgtable.h
@@ -26,7 +26,7 @@ extern unsigned long empty_zero_page[PAGE_SIZE / sizeof(unsigned long)];
 extern spinlock_t pgd_lock;
 extern struct list_head pgd_list;
 
-#ifdef CONFIG_PARAVIRT
+#ifdef CONFIG_PARAVIRT_MMU
 #include <asm/paravirt.h>
 #else /* !CONFIG_PARAVIRT */
 #define set_pte(ptep, pte) native_set_pte(ptep, pte)
diff --git a/arch/x86/include/asm/processor.h b/arch/x86/include/asm/processor.h
index c3429e8..a42a807 100644
--- a/arch/x86/include/asm/processor.h
+++ b/arch/x86/include/asm/processor.h
@@ -571,7 +571,7 @@ static inline void native_swapgs(void)
 #endif
 }
 
-#ifdef CONFIG_PARAVIRT
+#ifdef CONFIG_PARAVIRT_CPU
 #include <asm/paravirt.h>
 #else
 #define __cpuid native_cpuid
diff --git a/arch/x86/include/asm/required-features.h b/arch/x86/include/asm/required-features.h
index 64cf2d2..f68edf2 100644
--- a/arch/x86/include/asm/required-features.h
+++ b/arch/x86/include/asm/required-features.h
@@ -48,7 +48,7 @@
 #endif
 
 #ifdef CONFIG_X86_64
-#ifdef CONFIG_PARAVIRT
+#ifdef CONFIG_PARAVIRT_MMU
 /* Paravirtualized systems may not have PSE or PGE available */
 #define NEED_PSE 0
 #define NEED_PGE 0
diff --git a/arch/x86/include/asm/smp.h b/arch/x86/include/asm/smp.h
index 1e79678..fdd889a 100644
--- a/arch/x86/include/asm/smp.h
+++ b/arch/x86/include/asm/smp.h
@@ -66,7 +66,7 @@ struct smp_ops {
 extern void set_cpu_sibling_map(int cpu);
 
 #ifdef CONFIG_SMP
-#ifndef CONFIG_PARAVIRT
+#ifndef CONFIG_PARAVIRT_APIC
 #define startup_ipi_hook(phys_apicid, start_eip, start_esp) do { } while (0)
 #endif
 extern struct smp_ops smp_ops;
diff --git a/arch/x86/include/asm/system.h b/arch/x86/include/asm/system.h
index f08f973..63ca93c 100644
--- a/arch/x86/include/asm/system.h
+++ b/arch/x86/include/asm/system.h
@@ -302,13 +302,18 @@ static inline void native_wbinvd(void)
 
 #ifdef CONFIG_PARAVIRT
 #include <asm/paravirt.h>
-#else
-#define read_cr0() (native_read_cr0())
-#define write_cr0(x) (native_write_cr0(x))
+#endif/* CONFIG_PARAVIRT */
+
+#ifndef CONFIG_PARAVIRT_MMU
 #define read_cr2() (native_read_cr2())
 #define write_cr2(x) (native_write_cr2(x))
 #define read_cr3() (native_read_cr3())
 #define write_cr3(x) (native_write_cr3(x))
+#endif /* CONFIG_PARAVIRT_MMU */
+
+#ifndef CONFIG_PARAVIRT_CPU
+#define read_cr0() (native_read_cr0())
+#define write_cr0(x) (native_write_cr0(x))
 #define read_cr4() (native_read_cr4())
 #define read_cr4_safe() (native_read_cr4_safe())
 #define write_cr4(x) (native_write_cr4(x))
@@ -322,7 +327,7 @@ static inline void native_wbinvd(void)
 /* Clear the 'TS' bit */
 #define clts() (native_clts())
 
-#endif/* CONFIG_PARAVIRT */
+#endif /* CONFIG_PARAVIRT_CPU */
 
 #define stts() write_cr0(read_cr0() | X86_CR0_TS)
 
diff --git a/arch/x86/include/asm/tlbflush.h b/arch/x86/include/asm/tlbflush.h
index 7f3eba0..89e055c 100644
--- a/arch/x86/include/asm/tlbflush.h
+++ b/arch/x86/include/asm/tlbflush.h
@@ -7,7 +7,7 @@
 #include <asm/processor.h>
 #include <asm/system.h>
 
-#ifdef CONFIG_PARAVIRT
+#ifdef CONFIG_PARAVIRT_MMU
 #include <asm/paravirt.h>
 #else
 #define __flush_tlb() __native_flush_tlb()
@@ -162,7 +162,7 @@ static inline void reset_lazy_tlbstate(void)
 
 #endif /* SMP */
 
-#ifndef CONFIG_PARAVIRT
+#ifndef CONFIG_PARAVIRT_MMU
 #define flush_tlb_others(mask, mm, va) native_flush_tlb_others(mask, mm, va)
 #endif
 
diff --git a/arch/x86/kernel/head_64.S b/arch/x86/kernel/head_64.S
index 780cd92..1284d8d 100644
--- a/arch/x86/kernel/head_64.S
+++ b/arch/x86/kernel/head_64.S
@@ -20,7 +20,7 @@
 #include <asm/processor-flags.h>
 #include <asm/percpu.h>
 
-#ifdef CONFIG_PARAVIRT
+#ifdef CONFIG_PARAVIRT_MMU
 #include <asm/asm-offsets.h>
 #include <asm/paravirt.h>
 #else
diff --git a/arch/x86/kernel/paravirt.c b/arch/x86/kernel/paravirt.c
index 1b1739d..c8530bd 100644
--- a/arch/x86/kernel/paravirt.c
+++ b/arch/x86/kernel/paravirt.c
@@ -155,12 +155,14 @@ unsigned paravirt_patch_default(u8 type, u16 clobbers, void *insnbuf,
 	else if (opfunc == _paravirt_ident_64)
 		ret = paravirt_patch_ident_64(insnbuf, len);
 
+#ifdef CONFIG_PARAVIRT_CPU
 	else if (type == PARAVIRT_PATCH(pv_cpu_ops.iret) ||
 		 type == PARAVIRT_PATCH(pv_cpu_ops.irq_enable_sysexit) ||
 		 type == PARAVIRT_PATCH(pv_cpu_ops.usergs_sysret32) ||
 		 type == PARAVIRT_PATCH(pv_cpu_ops.usergs_sysret64))
 		/* If operation requires a jmp, then jmp */
 		ret = paravirt_patch_jmp(insnbuf, opfunc, addr, len);
+#endif
 	else
 		/* Otherwise call the function; assume target could
 		   clobber any caller-save reg */
diff --git a/arch/x86/kernel/tsc.c b/arch/x86/kernel/tsc.c
index cd982f4..96aad98 100644
--- a/arch/x86/kernel/tsc.c
+++ b/arch/x86/kernel/tsc.c
@@ -66,7 +66,7 @@ u64 native_sched_clock(void)
 
 /* We need to define a real function for sched_clock, to override the
    weak default version */
-#ifdef CONFIG_PARAVIRT
+#ifdef CONFIG_PARAVIRT_TIME
 unsigned long long sched_clock(void)
 {
 	return paravirt_sched_clock();
diff --git a/arch/x86/kernel/vsmp_64.c b/arch/x86/kernel/vsmp_64.c
index a1d804b..23f4612 100644
--- a/arch/x86/kernel/vsmp_64.c
+++ b/arch/x86/kernel/vsmp_64.c
@@ -22,7 +22,7 @@
 #include <asm/paravirt.h>
 #include <asm/setup.h>
 
-#if defined CONFIG_PCI && defined CONFIG_PARAVIRT
+#if defined CONFIG_PCI && defined CONFIG_PARAVIRT_IRQ
 /*
  * Interrupt control on vSMPowered systems:
  * ~AC is a shadow of IF. If IF is 'on' AC should be 'off'
diff --git a/arch/x86/xen/Kconfig b/arch/x86/xen/Kconfig
index b83e119..eef41bd 100644
--- a/arch/x86/xen/Kconfig
+++ b/arch/x86/xen/Kconfig
@@ -4,7 +4,7 @@
 
 config XEN
 	bool "Xen guest support"
-	select PARAVIRT
+	select PARAVIRT_ALL
 	select PARAVIRT_CLOCK
 	depends on X86_64 || (X86_32 && X86_PAE && !X86_VISWS)
 	depends on X86_CMPXCHG && X86_TSC
-- 
1.6.0.2
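For readers skimming the diff, the header pattern it applies over and over can be boiled down to a small user-space sketch. This is not kernel code: the two CONFIG_ defines are hand-written stand-ins for Kconfig output, the native_* bodies return made-up values, and hv_sched_clock() is a hypothetical hypervisor hook. It only illustrates that a configured group goes through a replaceable function pointer while an unconfigured group collapses to the native call.

```c
/* Hypothetical stand-ins for Kconfig output; a real kernel build gets these
 * from the generated configuration, not from the source file. */
#define CONFIG_PARAVIRT_TIME 1
/* CONFIG_PARAVIRT_CPU is deliberately left undefined. */

/* Stub "native" implementations with made-up return values. */
static unsigned long native_read_cr0(void) { return 0x80050033UL; }
static unsigned long long native_sched_clock(void) { return 1000ULL; }

#ifdef CONFIG_PARAVIRT_TIME
/* TIME is compiled in: sched_clock() goes through a replaceable pointer. */
struct pv_time_ops {
	unsigned long long (*sched_clock)(void);
};
static struct pv_time_ops pv_time_ops = {
	.sched_clock = native_sched_clock,
};
static unsigned long long sched_clock(void)
{
	return pv_time_ops.sched_clock();
}
#else
static unsigned long long sched_clock(void)
{
	return native_sched_clock();
}
#endif

#ifdef CONFIG_PARAVIRT_CPU
struct pv_cpu_ops {
	unsigned long (*read_cr0)(void);
};
static struct pv_cpu_ops pv_cpu_ops = {
	.read_cr0 = native_read_cr0,
};
static unsigned long read_cr0(void)
{
	return pv_cpu_ops.read_cr0();
}
#else
/* CPU is compiled out: read_cr0() is the native call, no indirection. */
#define read_cr0() (native_read_cr0())
#endif

/* Hypothetical hypervisor clock a guest could install at boot. */
static unsigned long long hv_sched_clock(void) { return 42ULL; }
```

With only the TIME group defined, assigning hv_sched_clock to pv_time_ops.sched_clock changes what sched_clock() returns, while read_cr0() stays a direct native call that nothing can redirect.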
Alexander Graf
2009-Nov-18 00:13 UTC
[PATCH 2/3] Only export selected pv-ops feature structs
To really check for sure that we're not using any pv-ops code by accident, we
should make sure that we don't even export the structures used to access
pv-ops exported functions.

So let's surround the pv-ops structs by #ifdefs.

Signed-off-by: Alexander Graf <agraf at suse.de>
---
 arch/x86/kernel/paravirt.c | 35 +++++++++++++++++++++++++++++------
 1 files changed, 29 insertions(+), 6 deletions(-)

diff --git a/arch/x86/kernel/paravirt.c b/arch/x86/kernel/paravirt.c
index c8530bd..0619e7c 100644
--- a/arch/x86/kernel/paravirt.c
+++ b/arch/x86/kernel/paravirt.c
@@ -124,11 +124,21 @@ static void *get_call_destination(u8 type)
 {
 	struct paravirt_patch_template tmpl = {
 		.pv_init_ops = pv_init_ops,
+#ifdef CONFIG_PARAVIRT_TIME
 		.pv_time_ops = pv_time_ops,
+#endif
+#ifdef CONFIG_PARAVIRT_CPU
 		.pv_cpu_ops = pv_cpu_ops,
+#endif
+#ifdef CONFIG_PARAVIRT_IRQ
 		.pv_irq_ops = pv_irq_ops,
+#endif
+#ifdef CONFIG_PARAVIRT_APIC
 		.pv_apic_ops = pv_apic_ops,
+#endif
+#ifdef CONFIG_PARAVIRT_MMU
 		.pv_mmu_ops = pv_mmu_ops,
+#endif
 #ifdef CONFIG_PARAVIRT_SPINLOCKS
 		.pv_lock_ops = pv_lock_ops,
 #endif
@@ -185,6 +195,7 @@ unsigned paravirt_patch_insns(void *insnbuf, unsigned len,
 	return insn_len;
 }
 
+#ifdef CONFIG_PARAVIRT_MMU
 static void native_flush_tlb(void)
 {
 	__native_flush_tlb();
@@ -203,6 +214,7 @@ static void native_flush_tlb_single(unsigned long addr)
 {
 	__native_flush_tlb_single(addr);
 }
+#endif /* CONFIG_PARAVIRT_MMU */
 
 /* These are in entry.S */
 extern void native_iret(void);
@@ -284,6 +296,7 @@ enum paravirt_lazy_mode paravirt_get_lazy_mode(void)
 	return percpu_read(paravirt_lazy_mode);
 }
 
+#ifdef CONFIG_PARAVIRT_MMU
 void arch_flush_lazy_mmu_mode(void)
 {
 	preempt_disable();
@@ -295,6 +308,7 @@ void arch_flush_lazy_mmu_mode(void)
 
 	preempt_enable();
 }
+#endif /* CONFIG_PARAVIRT_MMU */
 
 struct pv_info pv_info = {
 	.name = "bare hardware",
@@ -306,11 +320,16 @@ struct pv_info pv_info = {
 struct pv_init_ops pv_init_ops = {
 	.patch = native_patch,
 };
+EXPORT_SYMBOL_GPL(pv_info);
 
+#ifdef CONFIG_PARAVIRT_TIME
 struct pv_time_ops pv_time_ops = {
 	.sched_clock = native_sched_clock,
 };
+EXPORT_SYMBOL_GPL(pv_time_ops);
+#endif
 
+#ifdef CONFIG_PARAVIRT_IRQ
 struct pv_irq_ops pv_irq_ops = {
 	.save_fl = __PV_IS_CALLEE_SAVE(native_save_fl),
 	.restore_fl = __PV_IS_CALLEE_SAVE(native_restore_fl),
@@ -322,7 +341,10 @@ struct pv_irq_ops pv_irq_ops = {
 	.adjust_exception_frame = paravirt_nop,
 #endif
 };
+EXPORT_SYMBOL (pv_irq_ops);
+#endif
 
+#ifdef CONFIG_PARAVIRT_CPU
 struct pv_cpu_ops pv_cpu_ops = {
 	.cpuid = native_cpuid,
 	.get_debugreg = native_get_debugreg,
@@ -383,12 +405,17 @@ struct pv_cpu_ops pv_cpu_ops = {
 	.start_context_switch = paravirt_nop,
 	.end_context_switch = paravirt_nop,
 };
+EXPORT_SYMBOL (pv_cpu_ops);
+#endif
 
+#ifdef CONFIG_PARAVIRT_APIC
 struct pv_apic_ops pv_apic_ops = {
 #ifdef CONFIG_X86_LOCAL_APIC
 	.startup_ipi_hook = paravirt_nop,
 #endif
 };
+EXPORT_SYMBOL_GPL(pv_apic_ops);
+#endif
 
 #if defined(CONFIG_X86_32) && !defined(CONFIG_X86_PAE)
 /* 32-bit pagetable entries */
@@ -398,6 +425,7 @@ struct pv_apic_ops pv_apic_ops = {
 #define PTE_IDENT __PV_IS_CALLEE_SAVE(_paravirt_ident_64)
 #endif
 
+#ifdef CONFIG_PARAVIRT_MMU
 struct pv_mmu_ops pv_mmu_ops = {
 
 	.read_cr2 = native_read_cr2,
@@ -470,10 +498,5 @@ struct pv_mmu_ops pv_mmu_ops = {
 
 	.set_fixmap = native_set_fixmap,
 };
-
-EXPORT_SYMBOL_GPL(pv_time_ops);
-EXPORT_SYMBOL (pv_cpu_ops);
 EXPORT_SYMBOL (pv_mmu_ops);
-EXPORT_SYMBOL_GPL(pv_apic_ops);
-EXPORT_SYMBOL_GPL(pv_info);
-EXPORT_SYMBOL (pv_irq_ops);
+#endif
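The effect on get_call_destination() can be seen in a reduced user-space model. Everything here is a simplification made up for illustration: all ops share one signature (pv_fn), the template is called patch_template and has only three groups, the CONFIG_ defines are hand-written, and memcpy is used instead of the kernel's pointer cast. Under those assumptions, the sketch shows that the template keeps a slot for every group, and a group that is configured out is simply never filled in, so its slots come back empty.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-ins for Kconfig output. */
#define CONFIG_PARAVIRT_TIME 1
#define CONFIG_PARAVIRT_IRQ 1
/* CONFIG_PARAVIRT_MMU is deliberately left undefined. */

typedef void (*pv_fn)(void);	/* simplified: every op has one signature */

struct pv_time_ops { pv_fn sched_clock; };
struct pv_irq_ops { pv_fn irq_disable; pv_fn irq_enable; };
struct pv_mmu_ops { pv_fn read_cr2; };

/* The template keeps a slot for every group, configured or not. */
struct patch_template {
	struct pv_time_ops pv_time_ops;
	struct pv_irq_ops pv_irq_ops;
	struct pv_mmu_ops pv_mmu_ops;
};

/* Slot index of one op, in units of function pointers. */
#define PATCH_SLOT(x) (offsetof(struct patch_template, x) / sizeof(pv_fn))

#ifdef CONFIG_PARAVIRT_TIME
static void native_sched_clock(void) { }
static struct pv_time_ops pv_time_ops = {
	.sched_clock = native_sched_clock,
};
#endif

#ifdef CONFIG_PARAVIRT_IRQ
static void native_irq_disable(void) { }
static void native_irq_enable(void) { }
static struct pv_irq_ops pv_irq_ops = {
	.irq_disable = native_irq_disable,
	.irq_enable = native_irq_enable,
};
#endif

#ifdef CONFIG_PARAVIRT_MMU
static void native_read_cr2(void) { }
static struct pv_mmu_ops pv_mmu_ops = {
	.read_cr2 = native_read_cr2,
};
#endif

/* Look up the current target for a slot; groups that are not compiled in
 * are left zero-initialized, so their slots yield a null pointer. */
static pv_fn get_call_destination(size_t slot)
{
	struct patch_template tmpl = {
#ifdef CONFIG_PARAVIRT_TIME
		.pv_time_ops = pv_time_ops,
#endif
#ifdef CONFIG_PARAVIRT_IRQ
		.pv_irq_ops = pv_irq_ops,
#endif
#ifdef CONFIG_PARAVIRT_MMU
		.pv_mmu_ops = pv_mmu_ops,
#endif
	};
	pv_fn fn;

	memcpy(&fn, (char *)&tmpl + slot * sizeof(pv_fn), sizeof(fn));
	return fn;
}
```

Since pv_mmu_ops does not exist as an object in this configuration, any code that still references it fails at build time instead of quietly going through an indirection, which is the accident the patch is trying to catch.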
[PATCH 3/3] Split the KVM pv-ops support by feature
Currently selecting KVM guest support enables multiple features at once that
not everyone necessarily wants to have, namely:

 - PV MMU
 - zero io delay
 - apic detection workaround

Let's split them off so we don't drag in the full pv-ops framework just to
detect we're running on KVM. That gives us more chances to tweak performance!

Signed-off-by: Alexander Graf <agraf at suse.de>
---
 arch/x86/Kconfig | 29 ++++++++++++++++++++++++++++-
 arch/x86/kernel/kvm.c | 22 +++++++++++++++-------
 2 files changed, 43 insertions(+), 8 deletions(-)

diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index 8c150b6..97d4f92 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -522,11 +522,38 @@ config KVM_CLOCK
 
 config KVM_GUEST
 	bool "KVM Guest support"
-	select PARAVIRT_ALL
+	select PARAVIRT
 	---help---
 	  This option enables various optimizations for running under the KVM
 	  hypervisor.
 
+config KVM_IODELAY
+	bool "KVM IO-delay support"
+	depends on KVM_GUEST
+	select PARAVIRT_CPU
+	---help---
+	  Usually we wait for PIO access to complete. When inside KVM there's
+	  no need to do that, as we know that we're not going through a bus,
+	  but process PIO requests instantly.
+
+	  This option disables PIO waits, but drags in CPU-bound pv-ops. Thus
+	  you will probably get more speed loss than speedup using this option.
+
+	  If in doubt, say N.
+
+config KVM_MMU
+	bool "KVM PV MMU support"
+	depends on KVM_GUEST
+	select PARAVIRT_MMU
+	---help---
+	  This option enables the paravirtualized MMU for KVM. In most cases
+	  it's pretty useless and shouldn't be used.
+
+	  It will only cost you performance, because it drags in pv-ops for
+	  memory management.
+
+	  If in doubt, say N.
+
 source "arch/x86/lguest/Kconfig"
 
 config PARAVIRT
diff --git a/arch/x86/kernel/kvm.c b/arch/x86/kernel/kvm.c
index 63b0ec8..7e0207f 100644
--- a/arch/x86/kernel/kvm.c
+++ b/arch/x86/kernel/kvm.c
@@ -29,6 +29,16 @@
 #include <linux/hardirq.h>
 #include <asm/timer.h>
 
+#ifdef CONFIG_KVM_IODELAY
+/*
+ * No need for any "IO delay" on KVM
+ */
+static void kvm_io_delay(void)
+{
+}
+#endif /* CONFIG_KVM_IODELAY */
+
+#ifdef CONFIG_KVM_MMU
 #define MMU_QUEUE_SIZE 1024
 
 struct kvm_para_state {
@@ -43,13 +53,6 @@ static struct kvm_para_state *kvm_para_state(void)
 	return &per_cpu(para_state, raw_smp_processor_id());
 }
 
-/*
- * No need for any "IO delay" on KVM
- */
-static void kvm_io_delay(void)
-{
-}
-
 static void kvm_mmu_op(void *buffer, unsigned len)
 {
 	int r;
@@ -194,15 +197,19 @@ static void kvm_leave_lazy_mmu(void)
 	mmu_queue_flush(state);
 	paravirt_leave_lazy_mmu();
 }
+#endif /* CONFIG_KVM_MMU */
 
 static void __init paravirt_ops_setup(void)
 {
 	pv_info.name = "KVM";
 	pv_info.paravirt_enabled = 1;
 
+#ifdef CONFIG_KVM_IODELAY
 	if (kvm_para_has_feature(KVM_FEATURE_NOP_IO_DELAY))
 		pv_cpu_ops.io_delay = kvm_io_delay;
+#endif
 
+#ifdef CONFIG_KVM_MMU
 	if (kvm_para_has_feature(KVM_FEATURE_MMU_OP)) {
 		pv_mmu_ops.set_pte = kvm_set_pte;
 		pv_mmu_ops.set_pte_at = kvm_set_pte_at;
@@ -226,6 +233,7 @@ static void __init paravirt_ops_setup(void)
 		pv_mmu_ops.lazy_mode.enter = kvm_enter_lazy_mmu;
 		pv_mmu_ops.lazy_mode.leave = kvm_leave_lazy_mmu;
 	}
+#endif /* CONFIG_KVM_MMU */
 #ifdef CONFIG_X86_IO_APIC
 	no_timer_check = 1;
 #endif
-- 
1.6.0.2
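The two gates that decide whether a KVM hook gets installed (built in at all, then offered by the host) can be modelled in a few lines of user-space C. This is a rough sketch under stated assumptions: host_features, para_has_feature() and the FEATURE_* bit masks are hypothetical stand-ins for the kernel's kvm_para_has_feature() and KVM_FEATURE_* constants, the CONFIG_ defines are hand-written, and native_io_delay() merely counts calls.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for Kconfig output. */
#define CONFIG_KVM_IODELAY 1
#define CONFIG_PARAVIRT_CPU 1	/* what "select PARAVIRT_CPU" would turn on */
/* CONFIG_KVM_MMU is deliberately left undefined. */

/* Hypothetical feature bit masks, not the real KVM ABI values. */
enum {
	FEATURE_NOP_IO_DELAY = 1u << 0,
	FEATURE_MMU_OP = 1u << 1,
};

static unsigned int host_features;	/* what the host claims to support */

static bool para_has_feature(unsigned int feature)
{
	return (host_features & feature) != 0;
}

static int native_io_delay_calls;

static void native_io_delay(void)
{
	native_io_delay_calls++;	/* stands in for the real port 0x80 write */
}

#ifdef CONFIG_PARAVIRT_CPU
struct pv_cpu_ops {
	void (*io_delay)(void);
};
static struct pv_cpu_ops pv_cpu_ops = {
	.io_delay = native_io_delay,
};
static void slow_down_io(void)
{
	pv_cpu_ops.io_delay();
}
#else
static void slow_down_io(void)
{
	native_io_delay();
}
#endif

#ifdef CONFIG_KVM_IODELAY
/* No need for any "IO delay" on KVM */
static void kvm_io_delay(void)
{
}
#endif

static void paravirt_ops_setup(void)
{
#ifdef CONFIG_KVM_IODELAY
	/* Gate 1: compiled in. Gate 2: the host actually offers it. */
	if (para_has_feature(FEATURE_NOP_IO_DELAY))
		pv_cpu_ops.io_delay = kvm_io_delay;
#endif
#ifdef CONFIG_KVM_MMU
	/* PV MMU hooks would be installed here; not built in this sketch. */
#endif
}
```

Before setup, slow_down_io() bumps the native counter; once the host advertises the no-op delay and paravirt_ops_setup() has run, it no longer does. With CONFIG_KVM_MMU undefined, no MMU hook exists to install, no matter what the host advertises.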
On 11/18/2009 02:13 AM, Alexander Graf wrote:
> Paravirt ops is currently only capable of either replacing a lot of Linux
> internal code or none at all. The are users that don't need all of the
> possibilities pv-ops delivers though.
>
> On KVM for example we're perfectly fine not using the PV MMU, thus not
> touching any MMU code. That way we don't have to improve pv-ops to become
> fast, we just don't compile the MMU parts in!
>
> This patchset splits pv-ops into several smaller config options split by
> feature category and then converts the KVM pv-ops code to use only the
> bits that are required, lowering overhead.
>
> Alexander Graf (3):
>   Split paravirt ops by functionality
>   Only export selected pv-ops feature structs
>   Split the KVM pv-ops support by feature
>

The whole thing looks good to me. Let's wait for Jeremy to ack though.

-- 
Do not meddle in the internals of kernels, for they are subtle and quick to panic.
Alexander Graf wrote:
> Paravirt ops is currently only capable of either replacing a lot of Linux
> internal code or none at all. The are users that don't need all of the
> possibilities pv-ops delivers though.
>
> On KVM for example we're perfectly fine not using the PV MMU, thus not
> touching any MMU code. That way we don't have to improve pv-ops to become
> fast, we just don't compile the MMU parts in!
>
> This patchset splits pv-ops into several smaller config options split by
> feature category and then converts the KVM pv-ops code to use only the
> bits that are required, lowering overhead.
>

So has this ended up in some tree yet?