Hi. This patchset implements ia64/pv_ops support, the framework for
virtualization support. All the comments so far have been addressed,
with only a few exceptions.

On x86, various ways to support virtualization were proposed, and
eventually pv_ops won. The pv_ops strategy is appropriate for ia64 too.
Later I'll post the patchset which implements xen domU on top of
ia64/pv_ops. Currently only the ia64/xen pv_ops implementation exists,
but I believe ia64/kvm can also benefit from this ia64/pv_ops framework.

linux/ia64 has the machine vector interface, and another approach might
be to utilize it. However, pv_ops is better for several reasons; please
see the mailing list thread starting at
http://www.spinics.net/lists/linux-ia64/msg05238.html

This patchset does the following:
- some preparation work, mainly making the kernel virtualization
  friendly.
- introduce pv_info which describes the execution environment.
- introduce macros for hand written assembly code so that its
  sensitive/privileged instructions can be paravirtualized.
- introduce various hooks to replace implementations with
  paravirtualized ones. They are defined as function tables called
  pv_init_ops, pv_cpu_ops, pv_irq_ops, pv_iosapic_ops and pv_time_ops:
    pv_init_ops:    hooks for various initialization.
    pv_cpu_ops:     hooks for ia64 intrinsics.
    pv_irq_ops:     hooks for irq related operations.
    pv_iosapic_ops: hooks for the paravirtualized iosapic.
    pv_time_ops:    hooks for steal time accounting.

The working full source is available from
http://people.valinux.co.jp/~yamahata/xen-ia64/linux-2.6-xen-ia64.git/
branch: xen-ia64-2008apr30

At this phase, we don't address the following issues. Those will be
addressed after the first merge.
- optimization by binary patching. In fact we had a patch to do that,
  but we intentionally dropped it for patch size/readability/cleanness.
- freeing the unused pages, i.e. the pages for the unused ivt.S.

Changes from take 4:
- refined NR_IRQS paravirtualization according to Jes Sorensen's comment.
- rebased and fixed the breakage due to upstream changes.
- fixed various compilation errors.

Changes from take 3:
- split the patch set into a pv_ops part and a xen domU part.
- many clean ups.
- introduced pv_ops: pv_cpu_ops and pv_time_ops.

Changes from take 2:
- many clean ups following comments.
- cleaned up the assembly instruction macros.
- introduced pv_ops: pv_info, pv_init_ops, pv_iosapic_ops, pv_irq_ops.

Changes from take 1:
- single IVT source code, compiled multiple times using assembler macros.
thanks,

Diffstat:
 arch/ia64/Makefile                 |    6 +
 arch/ia64/kernel/Makefile          |   44 +++++
 arch/ia64/kernel/entry.S           |  115 +++++++----
 arch/ia64/kernel/iosapic.c         |   45 +++--
 arch/ia64/kernel/irq_ia64.c        |   19 ++-
 arch/ia64/kernel/ivt.S             |  227 +++++++++++-----------
 arch/ia64/kernel/minstate.h        |   13 +-
 arch/ia64/kernel/nr-irqs.c         |   24 +++
 arch/ia64/kernel/paravirt.c        |  369 ++++++++++++++++++++++++++++++++++++
 arch/ia64/kernel/paravirt_inst.h   |   29 +++
 arch/ia64/kernel/paravirtentry.S   |   60 ++++++
 arch/ia64/kernel/setup.c           |   10 +
 arch/ia64/kernel/smpboot.c         |    2 +
 arch/ia64/kernel/time.c            |   23 +++
 include/asm-ia64/Kbuild            |    2 +-
 include/asm-ia64/gcc_intrin.h      |   24 ++--
 include/asm-ia64/hw_irq.h          |   23 ++-
 include/asm-ia64/intel_intrin.h    |   41 ++--
 include/asm-ia64/intrinsics.h      |   55 ++++++
 include/asm-ia64/iosapic.h         |   18 ++-
 include/asm-ia64/irq.h             |    9 +-
 include/asm-ia64/mmu_context.h     |    6 +-
 include/asm-ia64/native/inst.h     |  180 ++++++++++++++++++
 include/asm-ia64/native/irq.h      |   35 ++++
 include/asm-ia64/paravirt.h        |  252 ++++++++++++++++++++++++
 include/asm-ia64/paravirt_privop.h |  114 +++++++++++
 include/asm-ia64/smp.h             |    2 +
 include/asm-ia64/system.h          |   10 +-
 28 files changed, 1520 insertions(+), 237 deletions(-)
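For readers new to pv_ops, the idea described above boils down to a table
of function pointers that the native kernel fills with direct
implementations and that a guest kernel overwrites early at boot. The
following is a minimal sketch of that pattern, using illustrative names
(example_pv_cpu_ops, native_example_get_psr_i); it is not the patchset's
actual definition of pv_cpu_ops.

/* Minimal pv_ops-style sketch.  Names are illustrative assumptions,
 * not the definitions used by this patchset. */
#include <asm/intrinsics.h>
#include <asm/kregs.h>

struct example_pv_cpu_ops {
	unsigned long (*get_psr_i)(void);	/* read only psr.i */
	void (*set_rr)(unsigned long rr, unsigned long val);
};

/* Native implementations are just the usual privileged intrinsics. */
static unsigned long native_example_get_psr_i(void)
{
	return ia64_getreg(_IA64_REG_PSR) & IA64_PSR_I;
}

static void native_example_set_rr(unsigned long rr, unsigned long val)
{
	ia64_set_rr(rr, val);
}

struct example_pv_cpu_ops example_cpu_ops = {
	.get_psr_i = native_example_get_psr_i,
	.set_rr    = native_example_set_rr,
};

/* A paravirtualized guest (e.g. a Xen domU) would overwrite selected
 * entries during early boot, for instance
 *	example_cpu_ops.get_psr_i = xen_get_psr_i;   (hypothetical)
 * after which every caller of example_cpu_ops.get_psr_i() transparently
 * uses the hypervisor-friendly version, with no change to the callers. */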
Isaku Yamahata
2008-Apr-30 12:29 UTC
[PATCH 01/15] ia64: preparation: remove extern in irq_ia64.c
remove extern declaration of handle_IPI() in irq_ia64.c. Instead, declare it in asm-ia64/smp.h. Later handle_IPI() will be referenced from another file. Signed-off-by: Isaku Yamahata <yamahata at valinux.co.jp> --- arch/ia64/kernel/irq_ia64.c | 1 - include/asm-ia64/smp.h | 2 ++ 2 files changed, 2 insertions(+), 1 deletions(-) diff --git a/arch/ia64/kernel/irq_ia64.c b/arch/ia64/kernel/irq_ia64.c index 5538471..c48171b 100644 --- a/arch/ia64/kernel/irq_ia64.c +++ b/arch/ia64/kernel/irq_ia64.c @@ -600,7 +600,6 @@ static irqreturn_t dummy_handler (int irq, void *dev_id) { BUG(); } -extern irqreturn_t handle_IPI (int irq, void *dev_id); static struct irqaction ipi_irqaction = { .handler = handle_IPI, diff --git a/include/asm-ia64/smp.h b/include/asm-ia64/smp.h index ec5f355..2984e26 100644 --- a/include/asm-ia64/smp.h +++ b/include/asm-ia64/smp.h @@ -15,6 +15,7 @@ #include <linux/kernel.h> #include <linux/cpumask.h> #include <linux/bitops.h> +#include <linux/irqreturn.h> #include <asm/io.h> #include <asm/param.h> @@ -123,6 +124,7 @@ extern void __init smp_build_cpu_map(void); extern void __init init_smp_config (void); extern void smp_do_timer (struct pt_regs *regs); +extern irqreturn_t handle_IPI(int irq, void *dev_id); extern void smp_send_reschedule (int cpu); extern void lock_ipi_calllock(void); extern void unlock_ipi_calllock(void); -- 1.5.3
Isaku Yamahata
2008-Apr-30 12:29 UTC
[PATCH 02/15] ia64/pv_ops: preparation: introduce ia64_set_rr0_to_rr4() to make kernel paravirtualization friendly.
make kernel paravirtualization friendly by introducing ia64_set_rr0_to_rr4(). ia64/Xen will replace setting rr[0-4] with single hypercall later. Signed-off-by: Isaku Yamahata <yamahata at valinux.co.jp> --- include/asm-ia64/intrinsics.h | 9 +++++++++ include/asm-ia64/mmu_context.h | 6 +----- 2 files changed, 10 insertions(+), 5 deletions(-) diff --git a/include/asm-ia64/intrinsics.h b/include/asm-ia64/intrinsics.h index f1135b5..9b83f8f 100644 --- a/include/asm-ia64/intrinsics.h +++ b/include/asm-ia64/intrinsics.h @@ -18,6 +18,15 @@ # include <asm/gcc_intrin.h> #endif +#define ia64_set_rr0_to_rr4(val0, val1, val2, val3, val4) \ +do { \ + ia64_set_rr(0x0000000000000000UL, (val0)); \ + ia64_set_rr(0x2000000000000000UL, (val1)); \ + ia64_set_rr(0x4000000000000000UL, (val2)); \ + ia64_set_rr(0x6000000000000000UL, (val3)); \ + ia64_set_rr(0x8000000000000000UL, (val4)); \ +} while (0) + /* * Force an unresolved reference if someone tries to use * ia64_fetch_and_add() with a bad value. diff --git a/include/asm-ia64/mmu_context.h b/include/asm-ia64/mmu_context.h index cef2400..040bc87 100644 --- a/include/asm-ia64/mmu_context.h +++ b/include/asm-ia64/mmu_context.h @@ -152,11 +152,7 @@ reload_context (nv_mm_context_t context) # endif #endif - ia64_set_rr(0x0000000000000000UL, rr0); - ia64_set_rr(0x2000000000000000UL, rr1); - ia64_set_rr(0x4000000000000000UL, rr2); - ia64_set_rr(0x6000000000000000UL, rr3); - ia64_set_rr(0x8000000000000000UL, rr4); + ia64_set_rr0_to_rr4(rr0, rr1, rr2, rr3, rr4); ia64_srlz_i(); /* srlz.i implies srlz.d */ } -- 1.5.3
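The benefit of folding the five region-register writes into one operation
is that a hypervisor can service them in a single guest/host transition
instead of five. Below is a rough sketch of what a guest-side
implementation could look like; HYPERVISOR_set_region_regs() is a made-up
name used purely for illustration, not an existing Xen hypercall.

/* Sketch only: how a paravirtualized guest could provide
 * ia64_set_rr0_to_rr4() as one batched request to the hypervisor.
 * HYPERVISOR_set_region_regs() is a hypothetical call. */
static inline void
example_xen_set_rr0_to_rr4(unsigned long val0, unsigned long val1,
			   unsigned long val2, unsigned long val3,
			   unsigned long val4)
{
	/* one transition to the hypervisor covering rr0..rr4,
	 * instead of trapping on five separate "mov rr[]=" writes */
	HYPERVISOR_set_region_regs(val0, val1, val2, val3, val4);
}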
Isaku Yamahata
2008-Apr-30 12:29 UTC
[PATCH 03/15] ia64/pv_ops: preparation: introduce ia64_get_psr_i() to make kernel paravirtualization friendly.
__local_irq_save() and local_save_flags() are used to mask interruptions. They read all psr bits that requres whole bit emulation. On the other hand, reading only psr.i, the single bit, can be virtualized cheaply. Signed-off-by: Isaku Yamahata <yamahata at valinux.co.jp> --- include/asm-ia64/intrinsics.h | 2 ++ include/asm-ia64/system.h | 10 ++++++++-- 2 files changed, 10 insertions(+), 2 deletions(-) diff --git a/include/asm-ia64/intrinsics.h b/include/asm-ia64/intrinsics.h index 9b83f8f..a3b9689 100644 --- a/include/asm-ia64/intrinsics.h +++ b/include/asm-ia64/intrinsics.h @@ -18,6 +18,8 @@ # include <asm/gcc_intrin.h> #endif +#define ia64_get_psr_i() (ia64_getreg(_IA64_REG_PSR) & IA64_PSR_I) + #define ia64_set_rr0_to_rr4(val0, val1, val2, val3, val4) \ do { \ ia64_set_rr(0x0000000000000000UL, (val0)); \ diff --git a/include/asm-ia64/system.h b/include/asm-ia64/system.h index 26e250b..ed38c43 100644 --- a/include/asm-ia64/system.h +++ b/include/asm-ia64/system.h @@ -122,10 +122,16 @@ extern struct ia64_boot_param { * write a floating-point register right before reading the PSR * and that writes to PSR.mfl */ +#ifdef CONFIG_PARAVIRT +#define __local_save_flags() ia64_get_psr_i() +#else +#define __local_save_flags() ia64_getreg(_IA64_REG_PSR) +#endif + #define __local_irq_save(x) \ do { \ ia64_stop(); \ - (x) = ia64_getreg(_IA64_REG_PSR); \ + (x) = __local_save_flags(); \ ia64_stop(); \ ia64_rsm(IA64_PSR_I); \ } while (0) @@ -173,7 +179,7 @@ do { \ #endif /* !CONFIG_IA64_DEBUG_IRQ */ #define local_irq_enable() ({ ia64_stop(); ia64_ssm(IA64_PSR_I); ia64_srlz_d(); }) -#define local_save_flags(flags) ({ ia64_stop(); (flags) = ia64_getreg(_IA64_REG_PSR); }) +#define local_save_flags(flags) ({ ia64_stop(); (flags) = __local_save_flags(); }) #define irqs_disabled() \ ({ \ -- 1.5.3
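The point is that reading the full PSR forces a virtual machine monitor to
emulate every PSR bit, whereas the interrupt-enable state alone can be
kept in memory that the guest reads directly. A sketch of such a
guest-side replacement for ia64_get_psr_i(), using a made-up
example_vcpu_info structure shared with the hypervisor, could look like
this:

/* Sketch (assumption): the virtual psr.i lives in memory shared with
 * the hypervisor, so reading it needs no privileged instruction and
 * no trap.  example_vcpu_info is a hypothetical structure. */
#include <linux/smp.h>
#include <linux/threads.h>
#include <asm/kregs.h>

struct example_vcpu_info {
	unsigned char interrupts_masked;	/* 1 == virtual psr.i off */
};
extern struct example_vcpu_info example_vcpu_info[NR_CPUS];

static inline unsigned long example_get_psr_i(void)
{
	/* just a memory load, cheap to virtualize */
	return example_vcpu_info[smp_processor_id()].interrupts_masked ?
		0 : IA64_PSR_I;
}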
Isaku Yamahata
2008-Apr-30 12:29 UTC
[PATCH 04/15] ia64/pv_ops: introduce pv_info which describes the execution environment.
introduce pv_info which describes some randome info about underlying execution environment. Signed-off-by: Isaku Yamahata <yamahata at valinux.co.jp> --- arch/ia64/kernel/Makefile | 2 + arch/ia64/kernel/paravirt.c | 41 ++++++++++++++++++++++++++++ include/asm-ia64/paravirt.h | 62 +++++++++++++++++++++++++++++++++++++++++++ 3 files changed, 105 insertions(+), 0 deletions(-) create mode 100644 arch/ia64/kernel/paravirt.c create mode 100644 include/asm-ia64/paravirt.h diff --git a/arch/ia64/kernel/Makefile b/arch/ia64/kernel/Makefile index 13fd10e..10a4ddb 100644 --- a/arch/ia64/kernel/Makefile +++ b/arch/ia64/kernel/Makefile @@ -36,6 +36,8 @@ obj-$(CONFIG_PCI_MSI) += msi_ia64.o mca_recovery-y += mca_drv.o mca_drv_asm.o obj-$(CONFIG_IA64_MC_ERR_INJECT)+= err_inject.o +obj-$(CONFIG_PARAVIRT) += paravirt.o + obj-$(CONFIG_IA64_ESI) += esi.o ifneq ($(CONFIG_IA64_ESI),) obj-y += esi_stub.o # must be in kernel proper diff --git a/arch/ia64/kernel/paravirt.c b/arch/ia64/kernel/paravirt.c new file mode 100644 index 0000000..d295ea5 --- /dev/null +++ b/arch/ia64/kernel/paravirt.c @@ -0,0 +1,41 @@ +/****************************************************************************** + * arch/ia64/kernel/paravirt.c + * + * Copyright (c) 2008 Isaku Yamahata <yamahata at valinux co jp> + * VA Linux Systems Japan K.K. + * Yaozu (Eddie) Dong <eddie.dong at intel.com> + * + * This program is free software; you can redistribute it and/or modify + * it under the terms of the GNU General Public License as published by + * the Free Software Foundation; either version 2 of the License, or + * (at your option) any later version. + * + * This program is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + * GNU General Public License for more details. + * + * You should have received a copy of the GNU General Public License + * along with this program; if not, write to the Free Software + * Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA + * + */ + +#include <linux/init.h> + +#include <linux/compiler.h> +#include <linux/io.h> +#include <linux/irq.h> +#include <linux/types.h> + +#include <asm/iosapic.h> +#include <asm/paravirt.h> + +/*************************************************************************** + * general info + */ +struct pv_info pv_info = { + .kernel_rpl = 0, + .paravirt_enabled = 0, + .name = "bare hardware" +}; diff --git a/include/asm-ia64/paravirt.h b/include/asm-ia64/paravirt.h new file mode 100644 index 0000000..26b4334 --- /dev/null +++ b/include/asm-ia64/paravirt.h @@ -0,0 +1,62 @@ +/****************************************************************************** + * include/asm-ia64/paravirt.h + * + * Copyright (c) 2008 Isaku Yamahata <yamahata at valinux co jp> + * VA Linux Systems Japan K.K. + * + * This program is free software; you can redistribute it and/or modify + * it under the terms of the GNU General Public License as published by + * the Free Software Foundation; either version 2 of the License, or + * (at your option) any later version. + * + * This program is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + * GNU General Public License for more details. 
+ * + * You should have received a copy of the GNU General Public License + * along with this program; if not, write to the Free Software + * Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA + * + */ + + +#ifndef __ASM_PARAVIRT_H +#define __ASM_PARAVIRT_H + +#ifdef CONFIG_PARAVIRT_GUEST + +#ifndef __ASSEMBLY__ + +#include <asm/hw_irq.h> +#include <asm/meminit.h> + +/****************************************************************************** + * general info + */ +struct pv_info { + unsigned int kernel_rpl; + int paravirt_enabled; + const char *name; +}; + +extern struct pv_info pv_info; + +static inline int paravirt_enabled(void) +{ + return pv_info.paravirt_enabled; +} + +static inline unsigned int get_kernel_rpl(void) +{ + return pv_info.kernel_rpl; +} + +#endif /* !__ASSEMBLY__ */ + +#else +/* fallback for native case */ + +#endif /* CONFIG_PARAVIRT_GUEST */ + +#endif /* __ASM_PARAVIRT_H */ -- 1.5.3
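As a usage note, code built with CONFIG_PARAVIRT_GUEST can query this
structure to find out what it is running on. The function below is a
hypothetical caller, not part of the patch, and it assumes
CONFIG_PARAVIRT_GUEST since the native fallback branch is still empty at
this point in the series.

#include <linux/init.h>
#include <linux/kernel.h>
#include <asm/paravirt.h>

/* Hypothetical example of a pv_info consumer. */
static void __init example_report_environment(void)
{
	if (paravirt_enabled())
		printk(KERN_INFO "booting paravirtualized on %s, "
		       "kernel runs at rpl %u\n",
		       pv_info.name, get_kernel_rpl());
	else
		printk(KERN_INFO "booting on bare hardware\n");
}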
Isaku Yamahata
2008-Apr-30 12:29 UTC
[PATCH 06/15] ia64/pv_ops: preparation for paravirtualization of hand written assembly code.
Preparation for paravirtualization of hand written assembly code. They are paravirtualized by single source code and compiled multi times. To tell those files for target (including native), add one defines. Signed-off-by: Isaku Yamahata <yamahata at valinux.co.jp> --- arch/ia64/kernel/Makefile | 9 +++++++++ 1 files changed, 9 insertions(+), 0 deletions(-) diff --git a/arch/ia64/kernel/Makefile b/arch/ia64/kernel/Makefile index 10a4ddb..8b25242 100644 --- a/arch/ia64/kernel/Makefile +++ b/arch/ia64/kernel/Makefile @@ -72,3 +72,12 @@ $(obj)/gate-syms.o: $(obj)/gate.lds $(obj)/gate.o FORCE # We must build gate.so before we can assemble it. # Note: kbuild does not track this dependency due to usage of .incbin $(obj)/gate-data.o: $(obj)/gate.so + +# +# native ivt.S and entry.S +# +ASM_PARAVIRT_OBJS = ivt.o entry.o +define paravirtualized_native +AFLAGS_$(1) += -D__IA64_ASM_PARAVIRTUALIZED_NATIVE +endef +$(foreach obj,$(ASM_PARAVIRT_OBJS),$(eval $(call paravirtualized_native,$(obj)))) -- 1.5.3
Isaku Yamahata
2008-Apr-30 12:29 UTC
[PATCH 07/15] ia64/pv_ops: define paravirtualized instructions for native.
pv_cpu_asm_ops: define paravirtualized introduce for native execution environment. Signed-off-by: Yaozu (Eddie) Dong <eddie.dong at intel.com> Signed-off-by: Isaku Yamahata <yamahata at valinux.co.jp> --- include/asm-ia64/native/inst.h | 170 ++++++++++++++++++++++++++++++++++++++++ 1 files changed, 170 insertions(+), 0 deletions(-) create mode 100644 include/asm-ia64/native/inst.h diff --git a/include/asm-ia64/native/inst.h b/include/asm-ia64/native/inst.h new file mode 100644 index 0000000..295fde3 --- /dev/null +++ b/include/asm-ia64/native/inst.h @@ -0,0 +1,170 @@ +/****************************************************************************** + * include/asm-ia64/native/inst.h + * + * Copyright (c) 2008 Isaku Yamahata <yamahata at valinux co jp> + * VA Linux Systems Japan K.K. + * + * This program is free software; you can redistribute it and/or modify + * it under the terms of the GNU General Public License as published by + * the Free Software Foundation; either version 2 of the License, or + * (at your option) any later version. + * + * This program is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + * GNU General Public License for more details. + * + * You should have received a copy of the GNU General Public License + * along with this program; if not, write to the Free Software + * Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA + * + */ + +#ifdef CONFIG_PARAVIRT_GUEST_ASM_CLOBBER_CHECK +# define PARAVIRT_POISON 0xdeadbeefbaadf00d +# define CLOBBER(clob) \ + ;; \ + movl clob = PARAVIRT_POISON; \ + ;; +#else +# define CLOBBER(clob) /* nothing */ +#endif + +#define MOV_FROM_IFA(reg) \ + mov reg = cr.ifa + +#define MOV_FROM_ITIR(reg) \ + mov reg = cr.itir + +#define MOV_FROM_ISR(reg) \ + mov reg = cr.isr + +#define MOV_FROM_IHA(reg) \ + mov reg = cr.iha + +#define MOV_FROM_IPSR(pred, reg) \ +(pred) mov reg = cr.ipsr + +#define MOV_FROM_IIM(reg) \ + mov reg = cr.iim + +#define MOV_FROM_IIP(reg) \ + mov reg = cr.iip + +#define MOV_FROM_IVR(reg, clob) \ + mov reg = cr.ivr \ + CLOBBER(clob) + +#define MOV_FROM_PSR(pred, reg, clob) \ +(pred) mov reg = psr \ + CLOBBER(clob) + +#define MOV_TO_IFA(reg, clob) \ + mov cr.ifa = reg \ + CLOBBER(clob) + +#define MOV_TO_ITIR(pred, reg, clob) \ +(pred) mov cr.itir = reg \ + CLOBBER(clob) + +#define MOV_TO_IHA(pred, reg, clob) \ +(pred) mov cr.iha = reg \ + CLOBBER(clob) + +#define MOV_TO_IPSR(reg, clob) \ + mov cr.ipsr = reg \ + CLOBBER(clob) + +#define MOV_TO_IFS(pred, reg, clob) \ +(pred) mov cr.ifs = reg \ + CLOBBER(clob) + +#define MOV_TO_IIP(reg, clob) \ + mov cr.iip = reg \ + CLOBBER(clob) + +#define MOV_TO_KR(kr, reg, clob0, clob1) \ + mov IA64_KR(kr) = reg \ + CLOBBER(clob0) \ + CLOBBER(clob1) + +#define ITC_I(pred, reg, clob) \ +(pred) itc.i reg \ + CLOBBER(clob) + +#define ITC_D(pred, reg, clob) \ +(pred) itc.d reg \ + CLOBBER(clob) + +#define ITC_I_AND_D(pred_i, pred_d, reg, clob) \ +(pred_i) itc.i reg; \ +(pred_d) itc.d reg \ + CLOBBER(clob) + +#define THASH(pred, reg0, reg1, clob) \ +(pred) thash reg0 = reg1 \ + CLOBBER(clob) + +#define SSM_PSR_IC_AND_DEFAULT_BITS(clob0, clob1) \ + ssm psr.ic | PSR_DEFAULT_BITS \ + CLOBBER(clob0) \ + CLOBBER(clob1) \ + ;; \ + srlz.i /* guarantee that interruption collectin is on */ \ + ;; + +#define SSM_PSR_IC_AND_SRLZ_D(clob0, clob1) \ + ssm psr.ic \ + CLOBBER(clob0) \ + CLOBBER(clob1) \ + ;; \ + srlz.d + +#define RSM_PSR_IC(clob) \ + rsm psr.ic 
\ + CLOBBER(clob) + +#define SSM_PSR_I(pred, clob) \ +(pred) ssm psr.i \ + CLOBBER(clob) + +#define RSM_PSR_I(pred, clob0, clob1) \ +(pred) rsm psr.i \ + CLOBBER(clob0) \ + CLOBBER(clob1) + +#define RSM_PSR_I_IC(clob0, clob1, clob2) \ + rsm psr.i | psr.ic \ + CLOBBER(clob0) \ + CLOBBER(clob1) \ + CLOBBER(clob2) + +#define RSM_PSR_DT \ + rsm psr.dt + +#define RSM_PSR_DT_AND_SRLZ_I \ + rsm psr.dt \ + ;; \ + srlz.i + +#define SSM_PSR_DT_AND_SRLZ_I \ + ssm psr.dt \ + ;; \ + srlz.i + +#define BSW_0(clob0, clob1, clob2) \ + bsw.0 \ + CLOBBER(clob0) \ + CLOBBER(clob1) \ + CLOBBER(clob2) + +#define BSW_1(clob0, clob1) \ + bsw.1 \ + CLOBBER(clob0) \ + CLOBBER(clob1) + +#define COVER \ + cover + +#define RFI \ + rfi -- 1.5.3
Isaku Yamahata
2008-Apr-30 12:29 UTC
[PATCH 08/15] ia64/pv_ops: paravirtualize minstate.h.
paravirtualize minstate.h which are hand written assembly code. They include sensitive or performance critical privileged instructions. So that they are appropriate for paravirtualization. Signed-off-by: Yaozu (Eddie) Dong <eddie.dong at intel.com> Signed-off-by: Isaku Yamahata <yamahata at valinux.co.jp> --- arch/ia64/kernel/minstate.h | 13 +++++++------ arch/ia64/kernel/paravirt_inst.h | 29 +++++++++++++++++++++++++++++ include/asm-ia64/native/inst.h | 2 ++ 3 files changed, 38 insertions(+), 6 deletions(-) create mode 100644 arch/ia64/kernel/paravirt_inst.h diff --git a/arch/ia64/kernel/minstate.h b/arch/ia64/kernel/minstate.h index 7c548ac..9db53cf 100644 --- a/arch/ia64/kernel/minstate.h +++ b/arch/ia64/kernel/minstate.h @@ -2,6 +2,7 @@ #include <asm/cache.h> #include "entry.h" +#include "paravirt_inst.h" #ifdef CONFIG_VIRT_CPU_ACCOUNTING /* read ar.itc in advance, and use it before leaving bank 0 */ @@ -40,16 +41,16 @@ * Note that psr.ic is NOT turned on by this macro. This is so that * we can pass interruption state as arguments to a handler. */ -#define DO_SAVE_MIN(COVER,SAVE_IFS,EXTRA) \ +#define IA64_NATIVE_DO_SAVE_MIN(__COVER,SAVE_IFS,EXTRA) \ mov r16=IA64_KR(CURRENT); /* M */ \ mov r27=ar.rsc; /* M */ \ mov r20=r1; /* A */ \ mov r25=ar.unat; /* M */ \ - mov r29=cr.ipsr; /* M */ \ + MOV_FROM_IPSR(p0,r29); /* M */ \ mov r26=ar.pfs; /* I */ \ - mov r28=cr.iip; /* M */ \ + MOV_FROM_IIP(r28); /* M */ \ mov r21=ar.fpsr; /* M */ \ - COVER; /* B;; (or nothing) */ \ + __COVER; /* B;; (or nothing) */ \ ;; \ adds r16=IA64_TASK_THREAD_ON_USTACK_OFFSET,r16; \ ;; \ @@ -206,6 +207,6 @@ st8 [r25]=r10; /* ar.ssd */ \ ;; -#define SAVE_MIN_WITH_COVER DO_SAVE_MIN(cover, mov r30=cr.ifs,) -#define SAVE_MIN_WITH_COVER_R19 DO_SAVE_MIN(cover, mov r30=cr.ifs, mov r15=r19) +#define SAVE_MIN_WITH_COVER DO_SAVE_MIN(COVER, mov r30=cr.ifs,) +#define SAVE_MIN_WITH_COVER_R19 DO_SAVE_MIN(COVER, mov r30=cr.ifs, mov r15=r19) #define SAVE_MIN DO_SAVE_MIN( , mov r30=r0, ) diff --git a/arch/ia64/kernel/paravirt_inst.h b/arch/ia64/kernel/paravirt_inst.h new file mode 100644 index 0000000..5cad6fb --- /dev/null +++ b/arch/ia64/kernel/paravirt_inst.h @@ -0,0 +1,29 @@ +/****************************************************************************** + * linux/arch/ia64/xen/paravirt_inst.h + * + * Copyright (c) 2008 Isaku Yamahata <yamahata at valinux co jp> + * VA Linux Systems Japan K.K. + * + * This program is free software; you can redistribute it and/or modify + * it under the terms of the GNU General Public License as published by + * the Free Software Foundation; either version 2 of the License, or + * (at your option) any later version. + * + * This program is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + * GNU General Public License for more details. 
+ * + * You should have received a copy of the GNU General Public License + * along with this program; if not, write to the Free Software + * Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA + * + */ + +#ifdef __IA64_ASM_PARAVIRTUALIZED_XEN +#include <asm/xen/inst.h> +#include <asm/xen/minstate.h> +#else +#include <asm/native/inst.h> +#endif + diff --git a/include/asm-ia64/native/inst.h b/include/asm-ia64/native/inst.h index 295fde3..c167b7d 100644 --- a/include/asm-ia64/native/inst.h +++ b/include/asm-ia64/native/inst.h @@ -20,6 +20,8 @@ * */ +#define DO_SAVE_MIN IA64_NATIVE_DO_SAVE_MIN + #ifdef CONFIG_PARAVIRT_GUEST_ASM_CLOBBER_CHECK # define PARAVIRT_POISON 0xdeadbeefbaadf00d # define CLOBBER(clob) \ -- 1.5.3
paravirtualize ivt.S which implements fault handler in hand written assembly code. They includes sensitive or performance critical privileged instructions. So they need paravirtualization. Signed-off-by: Yaozu (Eddie) Dong <eddie.dong at intel.com> Signed-off-by: Isaku Yamahata <yamahata at valinux.co.jp> --- arch/ia64/kernel/ivt.S | 227 ++++++++++++++++++++++++------------------------ 1 files changed, 115 insertions(+), 112 deletions(-) diff --git a/arch/ia64/kernel/ivt.S b/arch/ia64/kernel/ivt.S index 6678c49..4244287 100644 --- a/arch/ia64/kernel/ivt.S +++ b/arch/ia64/kernel/ivt.S @@ -12,6 +12,14 @@ * * 00/08/23 Asit Mallick <asit.k.mallick at intel.com> TLB handling for SMP * 00/12/20 David Mosberger-Tang <davidm at hpl.hp.com> DTLB/ITLB handler now uses virtual PT. + * + * Copyright (C) 2005 Hewlett-Packard Co + * Dan Magenheimer <dan.magenheimer at hp.com> + * Xen paravirtualization + * Copyright (c) 2008 Isaku Yamahata <yamahata at valinux co jp> + * VA Linux Systems Japan K.K. + * pv_ops. + * Yaozu (Eddie) Dong <eddie.dong at intel.com> */ /* * This file defines the interruption vector table used by the CPU. @@ -102,13 +110,13 @@ ENTRY(vhpt_miss) * - the faulting virtual address uses unimplemented address bits * - the faulting virtual address has no valid page table mapping */ - mov r16=cr.ifa // get address that caused the TLB miss + MOV_FROM_IFA(r16) // get address that caused the TLB miss #ifdef CONFIG_HUGETLB_PAGE movl r18=PAGE_SHIFT - mov r25=cr.itir + MOV_FROM_ITIR(r25) #endif ;; - rsm psr.dt // use physical addressing for data + RSM_PSR_DT // use physical addressing for data mov r31=pr // save the predicate registers mov r19=IA64_KR(PT_BASE) // get page table base address shl r21=r16,3 // shift bit 60 into sign bit @@ -168,21 +176,20 @@ ENTRY(vhpt_miss) dep r21=r19,r20,3,(PAGE_SHIFT-3) // r21=pte_offset(pmd,addr) ;; (p7) ld8 r18=[r21] // read *pte - mov r19=cr.isr // cr.isr bit 32 tells us if this is an insn miss + MOV_FROM_ISR(r19) // cr.isr bit 32 tells us if this is an insn miss ;; (p7) tbit.z p6,p7=r18,_PAGE_P_BIT // page present bit cleared? - mov r22=cr.iha // get the VHPT address that caused the TLB miss + MOV_FROM_IHA(r22) // get the VHPT address that caused the TLB miss ;; // avoid RAW on p7 (p7) tbit.nz.unc p10,p11=r19,32 // is it an instruction TLB miss? dep r23=0,r20,0,PAGE_SHIFT // clear low bits to get page address ;; -(p10) itc.i r18 // insert the instruction TLB entry -(p11) itc.d r18 // insert the data TLB entry + ITC_I_AND_D(p10, p11, r18, r24) // insert the instruction TLB entry and + // insert the data TLB entry (p6) br.cond.spnt.many page_fault // handle bad address/page not present (page fault) - mov cr.ifa=r22 - + MOV_TO_IFA(r22, r24) #ifdef CONFIG_HUGETLB_PAGE -(p8) mov cr.itir=r25 // change to default page-size for VHPT + MOV_TO_ITIR(p8, r25, r24) // change to default page-size for VHPT #endif /* @@ -192,7 +199,7 @@ ENTRY(vhpt_miss) */ adds r24=__DIRTY_BITS_NO_ED|_PAGE_PL_0|_PAGE_AR_RW,r23 ;; -(p7) itc.d r24 + ITC_D(p7, r24, r25) ;; #ifdef CONFIG_SMP /* @@ -234,7 +241,7 @@ ENTRY(vhpt_miss) #endif mov pr=r31,-1 // restore predicate registers - rfi + RFI END(vhpt_miss) .org ia64_ivt+0x400 @@ -248,11 +255,11 @@ ENTRY(itlb_miss) * mode, walk the page table, and then re-execute the PTE read and * go on normally after that. 
*/ - mov r16=cr.ifa // get virtual address + MOV_FROM_IFA(r16) // get virtual address mov r29=b0 // save b0 mov r31=pr // save predicates .itlb_fault: - mov r17=cr.iha // get virtual address of PTE + MOV_FROM_IHA(r17) // get virtual address of PTE movl r30=1f // load nested fault continuation point ;; 1: ld8 r18=[r17] // read *pte @@ -261,7 +268,7 @@ ENTRY(itlb_miss) tbit.z p6,p0=r18,_PAGE_P_BIT // page present bit cleared? (p6) br.cond.spnt page_fault ;; - itc.i r18 + ITC_I(p0, r18, r19) ;; #ifdef CONFIG_SMP /* @@ -278,7 +285,7 @@ ENTRY(itlb_miss) (p7) ptc.l r16,r20 #endif mov pr=r31,-1 - rfi + RFI END(itlb_miss) .org ia64_ivt+0x0800 @@ -292,11 +299,11 @@ ENTRY(dtlb_miss) * mode, walk the page table, and then re-execute the PTE read and * go on normally after that. */ - mov r16=cr.ifa // get virtual address + MOV_FROM_IFA(r16) // get virtual address mov r29=b0 // save b0 mov r31=pr // save predicates dtlb_fault: - mov r17=cr.iha // get virtual address of PTE + MOV_FROM_IHA(r17) // get virtual address of PTE movl r30=1f // load nested fault continuation point ;; 1: ld8 r18=[r17] // read *pte @@ -305,7 +312,7 @@ dtlb_fault: tbit.z p6,p0=r18,_PAGE_P_BIT // page present bit cleared? (p6) br.cond.spnt page_fault ;; - itc.d r18 + ITC_D(p0, r18, r19) ;; #ifdef CONFIG_SMP /* @@ -322,7 +329,7 @@ dtlb_fault: (p7) ptc.l r16,r20 #endif mov pr=r31,-1 - rfi + RFI END(dtlb_miss) .org ia64_ivt+0x0c00 @@ -330,9 +337,9 @@ END(dtlb_miss) // 0x0c00 Entry 3 (size 64 bundles) Alt ITLB (19) ENTRY(alt_itlb_miss) DBG_FAULT(3) - mov r16=cr.ifa // get address that caused the TLB miss + MOV_FROM_IFA(r16) // get address that caused the TLB miss movl r17=PAGE_KERNEL - mov r21=cr.ipsr + MOV_FROM_IPSR(p0,r21) movl r19=(((1 << IA64_MAX_PHYS_BITS) - 1) & ~0xfff) mov r31=pr ;; @@ -341,9 +348,9 @@ ENTRY(alt_itlb_miss) ;; cmp.gt p8,p0=6,r22 // user mode ;; -(p8) thash r17=r16 + THASH(p8, r17, r16, r23) ;; -(p8) mov cr.iha=r17 + MOV_TO_IHA(p8, r17, r23) (p8) mov r29=b0 // save b0 (p8) br.cond.dptk .itlb_fault #endif @@ -358,9 +365,9 @@ ENTRY(alt_itlb_miss) or r19=r19,r18 // set bit 4 (uncached) if the access was to region 6 (p8) br.cond.spnt page_fault ;; - itc.i r19 // insert the TLB entry + ITC_I(p0, r19, r18) // insert the TLB entry mov pr=r31,-1 - rfi + RFI END(alt_itlb_miss) .org ia64_ivt+0x1000 @@ -368,11 +375,11 @@ END(alt_itlb_miss) // 0x1000 Entry 4 (size 64 bundles) Alt DTLB (7,46) ENTRY(alt_dtlb_miss) DBG_FAULT(4) - mov r16=cr.ifa // get address that caused the TLB miss + MOV_FROM_IFA(r16) // get address that caused the TLB miss movl r17=PAGE_KERNEL - mov r20=cr.isr + MOV_FROM_ISR(r20) movl r19=(((1 << IA64_MAX_PHYS_BITS) - 1) & ~0xfff) - mov r21=cr.ipsr + MOV_FROM_IPSR(p0,r21) mov r31=pr mov r24=PERCPU_ADDR ;; @@ -381,9 +388,9 @@ ENTRY(alt_dtlb_miss) ;; cmp.gt p8,p0=6,r22 // access to region 0-5 ;; -(p8) thash r17=r16 + THASH(p8, r17, r16, r25) ;; -(p8) mov cr.iha=r17 + MOV_TO_IHA(p8, r17, r25) (p8) mov r29=b0 // save b0 (p8) br.cond.dptk dtlb_fault #endif @@ -402,7 +409,7 @@ ENTRY(alt_dtlb_miss) tbit.nz p9,p0=r20,IA64_ISR_NA_BIT // is non-access bit on? 
;; (p10) sub r19=r19,r26 -(p10) mov cr.itir=r25 + MOV_TO_ITIR(p10, r25, r24) cmp.ne p8,p0=r0,r23 (p9) cmp.eq.or.andcm p6,p7=IA64_ISR_CODE_LFETCH,r22 // check isr.code field (p12) dep r17=-1,r17,4,1 // set ma=UC for region 6 addr @@ -411,11 +418,11 @@ ENTRY(alt_dtlb_miss) dep r21=-1,r21,IA64_PSR_ED_BIT,1 ;; or r19=r19,r17 // insert PTE control bits into r19 -(p6) mov cr.ipsr=r21 + MOV_FROM_IPSR(p6,r21) ;; -(p7) itc.d r19 // insert the TLB entry + ITC_D(p7, r19, r18) // insert the TLB entry mov pr=r31,-1 - rfi + RFI END(alt_dtlb_miss) .org ia64_ivt+0x1400 @@ -444,10 +451,10 @@ ENTRY(nested_dtlb_miss) * * Clobbered: b0, r18, r19, r21, r22, psr.dt (cleared) */ - rsm psr.dt // switch to using physical data addressing + RSM_PSR_DT_AND_SRLZ_I // switch to using physical data addressing mov r19=IA64_KR(PT_BASE) // get the page table base address shl r21=r16,3 // shift bit 60 into sign bit - mov r18=cr.itir + MOV_FROM_ITIR(r18) ;; shr.u r17=r16,61 // get the region number into r17 extr.u r18=r18,2,6 // get the faulting page size @@ -510,21 +517,15 @@ END(ikey_miss) //----------------------------------------------------------------------------------- // call do_page_fault (predicates are in r31, psr.dt may be off, r16 is faulting address) ENTRY(page_fault) - ssm psr.dt - ;; - srlz.i + SSM_PSR_DT_AND_SRLZ_I ;; SAVE_MIN_WITH_COVER alloc r15=ar.pfs,0,0,3,0 - mov out0=cr.ifa - mov out1=cr.isr + MOV_FROM_IFA(out0) + MOV_FROM_ISR(out1) + SSM_PSR_IC_AND_DEFAULT_BITS(r14, r3) adds r3=8,r2 // set up second base pointer - ;; - ssm psr.ic | PSR_DEFAULT_BITS - ;; - srlz.i // guarantee that interruption collectin is on - ;; -(p15) ssm psr.i // restore psr.i + SSM_PSR_I(p15, r14) // restore psr.i movl r14=ia64_leave_kernel ;; SAVE_REST @@ -556,10 +557,10 @@ ENTRY(dirty_bit) * page table TLB entry isn't present, we take a nested TLB miss hit where we look * up the physical address of the L3 PTE and then continue at label 1 below. */ - mov r16=cr.ifa // get the address that caused the fault + MOV_FROM_IFA(r16) // get the address that caused the fault movl r30=1f // load continuation point in case of nested fault ;; - thash r17=r16 // compute virtual address of L3 PTE + THASH(p0, r17, r16, r18) // compute virtual address of L3 PTE mov r29=b0 // save b0 in case of nested fault mov r31=pr // save pr #ifdef CONFIG_SMP @@ -576,7 +577,7 @@ ENTRY(dirty_bit) ;; (p6) cmp.eq p6,p7=r26,r18 // Only compare if page is present ;; -(p6) itc.d r25 // install updated PTE + ITC_D(p6, r25, r18) // install updated PTE ;; /* * Tell the assemblers dependency-violation checker that the above "itc" instructions @@ -602,7 +603,7 @@ ENTRY(dirty_bit) itc.d r18 // install updated PTE #endif mov pr=r31,-1 // restore pr - rfi + RFI END(dirty_bit) .org ia64_ivt+0x2400 @@ -611,22 +612,22 @@ END(dirty_bit) ENTRY(iaccess_bit) DBG_FAULT(9) // Like Entry 8, except for instruction access - mov r16=cr.ifa // get the address that caused the fault + MOV_FROM_IFA(r16) // get the address that caused the fault movl r30=1f // load continuation point in case of nested fault mov r31=pr // save predicates #ifdef CONFIG_ITANIUM /* * Erratum 10 (IFA may contain incorrect address) has "NoFix" status. */ - mov r17=cr.ipsr + MOV_FROM_IPSR(p0,r17) ;; - mov r18=cr.iip + MOV_FROM_IIP(r18) tbit.z p6,p0=r17,IA64_PSR_IS_BIT // IA64 instruction set? 
;; (p6) mov r16=r18 // if so, use cr.iip instead of cr.ifa #endif /* CONFIG_ITANIUM */ ;; - thash r17=r16 // compute virtual address of L3 PTE + THASH(p0, r17, r16, r18) // compute virtual address of L3 PTE mov r29=b0 // save b0 in case of nested fault) #ifdef CONFIG_SMP mov r28=ar.ccv // save ar.ccv @@ -642,7 +643,7 @@ ENTRY(iaccess_bit) ;; (p6) cmp.eq p6,p7=r26,r18 // Only if page present ;; -(p6) itc.i r25 // install updated PTE + ITC_I(p6, r25, r26) // install updated PTE ;; /* * Tell the assemblers dependency-violation checker that the above "itc" instructions @@ -668,7 +669,7 @@ ENTRY(iaccess_bit) itc.i r18 // install updated PTE #endif /* !CONFIG_SMP */ mov pr=r31,-1 - rfi + RFI END(iaccess_bit) .org ia64_ivt+0x2800 @@ -677,10 +678,10 @@ END(iaccess_bit) ENTRY(daccess_bit) DBG_FAULT(10) // Like Entry 8, except for data access - mov r16=cr.ifa // get the address that caused the fault + MOV_FROM_IFA(r16) // get the address that caused the fault movl r30=1f // load continuation point in case of nested fault ;; - thash r17=r16 // compute virtual address of L3 PTE + THASH(p0, r17, r16, r18) // compute virtual address of L3 PTE mov r31=pr mov r29=b0 // save b0 in case of nested fault) #ifdef CONFIG_SMP @@ -697,7 +698,7 @@ ENTRY(daccess_bit) ;; (p6) cmp.eq p6,p7=r26,r18 // Only if page is present ;; -(p6) itc.d r25 // install updated PTE + ITC_D(p6, r25, r26) // install updated PTE /* * Tell the assemblers dependency-violation checker that the above "itc" instructions * cannot possibly affect the following loads: @@ -721,7 +722,7 @@ ENTRY(daccess_bit) #endif mov b0=r29 // restore b0 mov pr=r31,-1 - rfi + RFI END(daccess_bit) .org ia64_ivt+0x2c00 @@ -745,10 +746,10 @@ ENTRY(break_fault) */ DBG_FAULT(11) mov.m r16=IA64_KR(CURRENT) // M2 r16 <- current task (12 cyc) - mov r29=cr.ipsr // M2 (12 cyc) + MOV_FROM_IPSR(p0,r29) // M2 (12 cyc) mov r31=pr // I0 (2 cyc) - mov r17=cr.iim // M2 (2 cyc) + MOV_FROM_IIM(r17) // M2 (2 cyc) mov.m r27=ar.rsc // M2 (12 cyc) mov r18=__IA64_BREAK_SYSCALL // A @@ -767,7 +768,7 @@ ENTRY(break_fault) nop.m 0 movl r30=sys_call_table // X - mov r28=cr.iip // M2 (2 cyc) + MOV_FROM_IIP(r28) // M2 (2 cyc) cmp.eq p0,p7=r18,r17 // I0 is this a system call? 
(p7) br.cond.spnt non_syscall // B no -> // @@ -864,10 +865,10 @@ ENTRY(break_fault) #endif mov ar.rsc=0x3 // M2 set eager mode, pl 0, LE, loadrs=0 nop 0 - bsw.1 // B (6 cyc) regs are saved, switch to bank 1 + BSW_1(r2, r14) // B (6 cyc) regs are saved, switch to bank 1 ;; - ssm psr.ic | PSR_DEFAULT_BITS // M2 now it's safe to re-enable intr.-collection + SSM_PSR_IC_AND_DEFAULT_BITS(r3, r16) // M2 now it's safe to re-enable intr.-collection movl r3=ia64_ret_from_syscall // X ;; @@ -875,7 +876,7 @@ ENTRY(break_fault) mov rp=r3 // I0 set the real return addr (p10) br.cond.spnt.many ia64_ret_from_syscall // B return if bad call-frame or r15 is a NaT -(p15) ssm psr.i // M2 restore psr.i + SSM_PSR_I(p15, r16) // M2 restore psr.i (p14) br.call.sptk.many b6=b6 // B invoke syscall-handker (ignore return addr) br.cond.spnt.many ia64_trace_syscall // B do syscall-tracing thingamagic // NOT REACHED @@ -899,7 +900,7 @@ ENTRY(interrupt) mov r31=pr // prepare to save predicates ;; SAVE_MIN_WITH_COVER // uses r31; defines r2 and r3 - ssm psr.ic | PSR_DEFAULT_BITS + SSM_PSR_IC_AND_DEFAULT_BITS(r3, r14) ;; adds r3=8,r2 // set up second base pointer for SAVE_REST srlz.i // ensure everybody knows psr.ic is back on @@ -908,7 +909,7 @@ ENTRY(interrupt) ;; MCA_RECOVER_RANGE(interrupt) alloc r14=ar.pfs,0,0,2,0 // must be first in an insn group - mov out0=cr.ivr // pass cr.ivr as first arg + MOV_FROM_IVR(out0, r8) // pass cr.ivr as first arg add out1=16,sp // pass pointer to pt_regs as second arg ;; srlz.d // make sure we see the effect of cr.ivr @@ -978,6 +979,7 @@ END(interrupt) * - ar.fpsr: set to kernel settings * - b6: preserved (same as on entry) */ +#ifdef __IA64_ASM_PARAVIRTUALIZED_NATIVE GLOBAL_ENTRY(ia64_syscall_setup) #if PT(B6) != 0 # error This code assumes that b6 is the first field in pt_regs. @@ -1069,6 +1071,7 @@ GLOBAL_ENTRY(ia64_syscall_setup) (p10) mov r8=-EINVAL br.ret.sptk.many b7 END(ia64_syscall_setup) +#endif /* __IA64_ASM_PARAVIRTUALIZED_NATIVE */ .org ia64_ivt+0x3c00 ///////////////////////////////////////////////////////////////////////////////////////// @@ -1089,11 +1092,11 @@ ENTRY(dispatch_illegal_op_fault) .prologue .body SAVE_MIN_WITH_COVER - ssm psr.ic | PSR_DEFAULT_BITS + SSM_PSR_IC_AND_DEFAULT_BITS(r3,r24) ;; srlz.i // guarantee that interruption collection is on ;; -(p15) ssm psr.i // restore psr.i + SSM_PSR_I(p15,r3) // restore psr.i adds r3=8,r2 // set up second base pointer for SAVE_REST ;; alloc r14=ar.pfs,0,0,1,0 // must be first in insn group @@ -1124,7 +1127,7 @@ END(dispatch_illegal_op_fault) DBG_FAULT(16) FAULT(16) -#ifdef CONFIG_VIRT_CPU_ACCOUNTING +#if defined(CONFIG_VIRT_CPU_ACCOUNTING) && defined(__IA64_ASM_PARAVIRTUALIZED_NATIVE) /* * There is no particular reason for this code to be here, other than * that there happens to be space here that would go unused otherwise. @@ -1134,7 +1137,7 @@ END(dispatch_illegal_op_fault) * account_sys_enter is called from SAVE_MIN* macros if accounting is * enabled and if the macro is entered from user mode. */ -ENTRY(account_sys_enter) +GLOBAL_ENTRY(account_sys_enter) // mov.m r20=ar.itc is called in advance, and r13 is current add r16=TI_AC_STAMP+IA64_TASK_SIZE,r13 add r17=TI_AC_LEAVE+IA64_TASK_SIZE,r13 @@ -1176,15 +1179,15 @@ ENTRY(non_syscall) // suitable spot... 
alloc r14=ar.pfs,0,0,2,0 - mov out0=cr.iim + MOV_FROM_IIM(out0) add out1=16,sp adds r3=8,r2 // set up second base pointer for SAVE_REST - ssm psr.ic | PSR_DEFAULT_BITS + SSM_PSR_IC_AND_DEFAULT_BITS(r15,r24) ;; srlz.i // guarantee that interruption collection is on ;; -(p15) ssm psr.i // restore psr.i + SSM_PSR_I(p15,r15) // restore psr.i movl r15=ia64_leave_kernel ;; SAVE_REST @@ -1210,14 +1213,14 @@ ENTRY(dispatch_unaligned_handler) SAVE_MIN_WITH_COVER ;; alloc r14=ar.pfs,0,0,2,0 // now it's safe (must be first in insn group!) - mov out0=cr.ifa + MOV_FROM_IFA(out0) adds out1=16,sp - ssm psr.ic | PSR_DEFAULT_BITS + SSM_PSR_IC_AND_DEFAULT_BITS(r3,r24) ;; srlz.i // guarantee that interruption collection is on ;; -(p15) ssm psr.i // restore psr.i + SSM_PSR_I(p15,r3) // restore psr.i adds r3=8,r2 // set up second base pointer ;; SAVE_REST @@ -1249,17 +1252,17 @@ ENTRY(dispatch_to_fault_handler) */ SAVE_MIN_WITH_COVER_R19 alloc r14=ar.pfs,0,0,5,0 - mov out0=r15 - mov out1=cr.isr - mov out2=cr.ifa - mov out3=cr.iim - mov out4=cr.itir + MOV_FROM_ISR(out1) + MOV_FROM_IFA(out2) + MOV_FROM_IIM(out3) + MOV_FROM_ITIR(out4) ;; - ssm psr.ic | PSR_DEFAULT_BITS + SSM_PSR_IC_AND_DEFAULT_BITS(r3, out0) + mov out0=r15 ;; srlz.i // guarantee that interruption collection is on ;; -(p15) ssm psr.i // restore psr.i + SSM_PSR_I(p15,r3) // restore psr.i adds r3=8,r2 // set up second base pointer for SAVE_REST ;; SAVE_REST @@ -1278,8 +1281,8 @@ END(dispatch_to_fault_handler) // 0x5000 Entry 20 (size 16 bundles) Page Not Present (10,22,49) ENTRY(page_not_present) DBG_FAULT(20) - mov r16=cr.ifa - rsm psr.dt + MOV_FROM_IFA(r16) + RSM_PSR_DT /* * The Linux page fault handler doesn't expect non-present pages to be in * the TLB. Flush the existing entry now, so we meet that expectation. 
@@ -1298,8 +1301,8 @@ END(page_not_present) // 0x5100 Entry 21 (size 16 bundles) Key Permission (13,25,52) ENTRY(key_permission) DBG_FAULT(21) - mov r16=cr.ifa - rsm psr.dt + MOV_FROM_IFA(r16) + RSM_PSR_DT mov r31=pr ;; srlz.d @@ -1311,8 +1314,8 @@ END(key_permission) // 0x5200 Entry 22 (size 16 bundles) Instruction Access Rights (26) ENTRY(iaccess_rights) DBG_FAULT(22) - mov r16=cr.ifa - rsm psr.dt + MOV_FROM_IFA(r16) + RSM_PSR_DT mov r31=pr ;; srlz.d @@ -1324,8 +1327,8 @@ END(iaccess_rights) // 0x5300 Entry 23 (size 16 bundles) Data Access Rights (14,53) ENTRY(daccess_rights) DBG_FAULT(23) - mov r16=cr.ifa - rsm psr.dt + MOV_FROM_IFA(r16) + RSM_PSR_DT mov r31=pr ;; srlz.d @@ -1337,7 +1340,7 @@ END(daccess_rights) // 0x5400 Entry 24 (size 16 bundles) General Exception (5,32,34,36,38,39) ENTRY(general_exception) DBG_FAULT(24) - mov r16=cr.isr + MOV_FROM_ISR(r16) mov r31=pr ;; cmp4.eq p6,p0=0,r16 @@ -1366,8 +1369,8 @@ END(disabled_fp_reg) ENTRY(nat_consumption) DBG_FAULT(26) - mov r16=cr.ipsr - mov r17=cr.isr + MOV_FROM_IPSR(p0,r16); + MOV_FROM_ISR(r17); mov r31=pr // save PR ;; and r18=0xf,r17 // r18 = cr.ipsr.code{3:0} @@ -1377,10 +1380,10 @@ ENTRY(nat_consumption) dep r16=-1,r16,IA64_PSR_ED_BIT,1 (p6) br.cond.spnt 1f // branch if (cr.ispr.na == 0 || cr.ipsr.code{3:0} != LFETCH) ;; - mov cr.ipsr=r16 // set cr.ipsr.na + MOV_TO_IPSR(r16,r18); mov pr=r31,-1 ;; - rfi + RFI 1: mov pr=r31,-1 ;; @@ -1402,26 +1405,26 @@ ENTRY(speculation_vector) * * cr.imm contains zero_ext(imm21) */ - mov r18=cr.iim + MOV_FROM_IIM(r18) ;; - mov r17=cr.iip + MOV_FROM_IIP(r17) shl r18=r18,43 // put sign bit in position (43=64-21) ;; - mov r16=cr.ipsr + MOV_FROM_IPSR(p0,r16) shr r18=r18,39 // sign extend (39=43-4) ;; add r17=r17,r18 // now add the offset ;; - mov cr.iip=r17 + MOV_FROM_IIP(r17) dep r16=0,r16,41,2 // clear EI ;; - mov cr.ipsr=r16 + MOV_FROM_IPSR(p0,r16) ;; - rfi // and go back + RFI END(speculation_vector) .org ia64_ivt+0x5800 @@ -1559,11 +1562,11 @@ ENTRY(ia32_intercept) DBG_FAULT(46) #ifdef CONFIG_IA32_SUPPORT mov r31=pr - mov r16=cr.isr + MOV_FROM_ISR(r16) ;; extr.u r17=r16,16,8 // get ISR.code mov r18=ar.eflag - mov r19=cr.iim // old eflag value + MOV_FROM_IIM(r19) // old eflag value ;; cmp.ne p6,p0=2,r17 (p6) br.cond.spnt 1f // not a system flag fault @@ -1575,7 +1578,7 @@ ENTRY(ia32_intercept) (p6) br.cond.spnt 1f // eflags.ac bit didn't change ;; mov pr=r31,-1 // restore predicate registers - rfi + RFI 1: #endif // CONFIG_IA32_SUPPORT @@ -1729,12 +1732,12 @@ END(ia32_interrupt) ENTRY(dispatch_to_ia32_handler) SAVE_MIN ;; - mov r14=cr.isr - ssm psr.ic | PSR_DEFAULT_BITS + MOV_FROM_ISR(r14) + SSM_PSR_IC_AND_DEFAULT_BITS(r3,r24) ;; srlz.i // guarantee that interruption collection is on ;; -(p15) ssm psr.i + SSM_PSR_I(p15,r3) adds r3=8,r2 // Base pointer for SAVE_REST ;; SAVE_REST -- 1.5.3
paravirtualize ia64_swtich_to, ia64_leave_syscall and ia64_leave_kernel. They include sensitive or performance critical privileged instructions so that they need paravirtualization. To paravirtualize them by single source and multi compile they are converted into indirect jump. And define each pv instances. Signed-off-by: Isaku Yamahata <yamahata at valinux.co.jp> --- arch/ia64/kernel/Makefile | 2 +- arch/ia64/kernel/entry.S | 115 ++++++++++++++++++++++------------- arch/ia64/kernel/paravirt.c | 19 ++++++ arch/ia64/kernel/paravirtentry.S | 60 +++++++++++++++++++ include/asm-ia64/native/inst.h | 8 +++ include/asm-ia64/paravirt_privop.h | 23 +++++++ 6 files changed, 183 insertions(+), 44 deletions(-) create mode 100644 arch/ia64/kernel/paravirtentry.S diff --git a/arch/ia64/kernel/Makefile b/arch/ia64/kernel/Makefile index 8b25242..cea91f1 100644 --- a/arch/ia64/kernel/Makefile +++ b/arch/ia64/kernel/Makefile @@ -36,7 +36,7 @@ obj-$(CONFIG_PCI_MSI) += msi_ia64.o mca_recovery-y += mca_drv.o mca_drv_asm.o obj-$(CONFIG_IA64_MC_ERR_INJECT)+= err_inject.o -obj-$(CONFIG_PARAVIRT) += paravirt.o +obj-$(CONFIG_PARAVIRT) += paravirt.o paravirtentry.o obj-$(CONFIG_IA64_ESI) += esi.o ifneq ($(CONFIG_IA64_ESI),) diff --git a/arch/ia64/kernel/entry.S b/arch/ia64/kernel/entry.S index e49ad8c..68fb7ef 100644 --- a/arch/ia64/kernel/entry.S +++ b/arch/ia64/kernel/entry.S @@ -23,6 +23,11 @@ * 11/07/2000 */ /* + * Copyright (c) 2008 Isaku Yamahata <yamahata at valinux co jp> + * VA Linux Systems Japan K.K. + * pv_ops. + */ +/* * Global (preserved) predicate usage on syscall entry/exit path: * * pKStk: See entry.h. @@ -45,6 +50,7 @@ #include "minstate.h" +#ifdef __IA64_ASM_PARAVIRTUALIZED_NATIVE /* * execve() is special because in case of success, we need to * setup a null register window frame. @@ -173,6 +179,7 @@ GLOBAL_ENTRY(sys_clone) mov rp=loc0 br.ret.sptk.many rp END(sys_clone) +#endif /* __IA64_ASM_PARAVIRTUALIZED_NATIVE */ /* * prev_task <- ia64_switch_to(struct task_struct *next) @@ -180,7 +187,7 @@ END(sys_clone) * called. The code starting at .map relies on this. The rest of the code * doesn't care about the interrupt masking status. */ -GLOBAL_ENTRY(ia64_switch_to) +GLOBAL_ENTRY(__paravirt_switch_to) .prologue alloc r16=ar.pfs,1,0,0,0 DO_SAVE_SWITCH_STACK @@ -204,7 +211,7 @@ GLOBAL_ENTRY(ia64_switch_to) ;; .done: ld8 sp=[r21] // load kernel stack pointer of new task - mov IA64_KR(CURRENT)=in0 // update "current" application register + MOV_TO_KR(CURRENT, in0, r8, r9) // update "current" application register mov r8=r13 // return pointer to previously running task mov r13=in0 // set "current" pointer ;; @@ -216,26 +223,25 @@ GLOBAL_ENTRY(ia64_switch_to) br.ret.sptk.many rp // boogie on out in new context .map: - rsm psr.ic // interrupts (psr.i) are already disabled here + RSM_PSR_IC(r25) // interrupts (psr.i) are already disabled here movl r25=PAGE_KERNEL ;; srlz.d or r23=r25,r20 // construct PA | page properties mov r25=IA64_GRANULE_SHIFT<<2 ;; - mov cr.itir=r25 - mov cr.ifa=in0 // VA of next task... + MOV_TO_ITIR(p0, r25, r8) + MOV_TO_IFA(in0, r8) // VA of next task... ;; mov r25=IA64_TR_CURRENT_STACK - mov IA64_KR(CURRENT_STACK)=r26 // remember last page we mapped... + MOV_TO_KR(CURRENT_STACK, r26, r8, r9) // remember last page we mapped... ;; itr.d dtr[r25]=r23 // wire in new mapping... 
- ssm psr.ic // reenable the psr.ic bit - ;; - srlz.d + SSM_PSR_IC_AND_SRLZ_D(r8, r9) // reenable the psr.ic bit br.cond.sptk .done -END(ia64_switch_to) +END(__paravirt_switch_to) +#ifdef __IA64_ASM_PARAVIRTUALIZED_NATIVE /* * Note that interrupts are enabled during save_switch_stack and load_switch_stack. This * means that we may get an interrupt with "sp" pointing to the new kernel stack while @@ -375,7 +381,7 @@ END(save_switch_stack) * - b7 holds address to return to * - must not touch r8-r11 */ -ENTRY(load_switch_stack) +GLOBAL_ENTRY(load_switch_stack) .prologue .altrp b7 @@ -571,7 +577,7 @@ GLOBAL_ENTRY(ia64_trace_syscall) .ret3: (pUStk) cmp.eq.unc p6,p0=r0,r0 // p6 <- pUStk (pUStk) rsm psr.i // disable interrupts - br.cond.sptk .work_pending_syscall_end + br.cond.sptk ia64_work_pending_syscall_end strace_error: ld8 r3=[r2] // load pt_regs.r8 @@ -636,8 +642,17 @@ GLOBAL_ENTRY(ia64_ret_from_syscall) adds r2=PT(R8)+16,sp // r2 = &pt_regs.r8 mov r10=r0 // clear error indication in r10 (p7) br.cond.spnt handle_syscall_error // handle potential syscall failure +#ifdef CONFIG_PARAVIRT + ;; + br.cond.sptk.few ia64_leave_syscall + ;; +#endif /* CONFIG_PARAVIRT */ END(ia64_ret_from_syscall) +#ifndef CONFIG_PARAVIRT // fall through +#endif +#endif /* __IA64_ASM_PARAVIRTUALIZED_NATIVE */ + /* * ia64_leave_syscall(): Same as ia64_leave_kernel, except that it doesn't * need to switch to bank 0 and doesn't restore the scratch registers. @@ -682,7 +697,7 @@ END(ia64_ret_from_syscall) * ar.csd: cleared * ar.ssd: cleared */ -ENTRY(ia64_leave_syscall) +GLOBAL_ENTRY(__paravirt_leave_syscall) PT_REGS_UNWIND_INFO(0) /* * work.need_resched etc. mustn't get changed by this CPU before it returns to @@ -692,11 +707,11 @@ ENTRY(ia64_leave_syscall) * extra work. We always check for extra work when returning to user-level. * With CONFIG_PREEMPT, we also check for extra work when the preempt_count * is 0. After extra work processing has been completed, execution - * resumes at .work_processed_syscall with p6 set to 1 if the extra-work-check + * resumes at ia64_work_processed_syscall with p6 set to 1 if the extra-work-check * needs to be redone. */ #ifdef CONFIG_PREEMPT - rsm psr.i // disable interrupts + RSM_PSR_I(p0, r2, r18) // disable interrupts cmp.eq pLvSys,p0=r0,r0 // pLvSys=1: leave from syscall (pKStk) adds r20=TI_PRE_COUNT+IA64_TASK_SIZE,r13 ;; @@ -706,11 +721,12 @@ ENTRY(ia64_leave_syscall) ;; cmp.eq p6,p0=r21,r0 // p6 <- pUStk || (preempt_count == 0) #else /* !CONFIG_PREEMPT */ -(pUStk) rsm psr.i + RSM_PSR_I(pUStk, r2, r18) cmp.eq pLvSys,p0=r0,r0 // pLvSys=1: leave from syscall (pUStk) cmp.eq.unc p6,p0=r0,r0 // p6 <- pUStk #endif -.work_processed_syscall: +.global __paravirt_work_processed_syscall; +__paravirt_work_processed_syscall: #ifdef CONFIG_VIRT_CPU_ACCOUNTING adds r2=PT(LOADRS)+16,r12 (pUStk) mov.m r22=ar.itc // fetch time at leave @@ -744,7 +760,7 @@ ENTRY(ia64_leave_syscall) (pNonSys) break 0 // bug check: we shouldn't be here if pNonSys is TRUE! 
;; invala // M0|1 invalidate ALAT - rsm psr.i | psr.ic // M2 turn off interrupts and interruption collection + RSM_PSR_I_IC(r28, r29, r30) // M2 turn off interrupts and interruption collection cmp.eq p9,p0=r0,r0 // A set p9 to indicate that we should restore cr.ifs ld8 r29=[r2],16 // M0|1 load cr.ipsr @@ -765,7 +781,7 @@ ENTRY(ia64_leave_syscall) ;; #endif ld8 r26=[r2],PT(B0)-PT(AR_PFS) // M0|1 load ar.pfs -(pKStk) mov r22=psr // M2 read PSR now that interrupts are disabled + MOV_FROM_PSR(pKStk, r22, r21) // M2 read PSR now that interrupts are disabled nop 0 ;; ld8 r21=[r2],PT(AR_RNAT)-PT(B0) // M0|1 load b0 @@ -798,7 +814,7 @@ ENTRY(ia64_leave_syscall) srlz.d // M0 ensure interruption collection is off (for cover) shr.u r18=r19,16 // I0|1 get byte size of existing "dirty" partition - cover // B add current frame into dirty partition & set cr.ifs + COVER // B add current frame into dirty partition & set cr.ifs ;; #ifdef CONFIG_VIRT_CPU_ACCOUNTING mov r19=ar.bsp // M2 get new backing store pointer @@ -823,8 +839,9 @@ ENTRY(ia64_leave_syscall) mov.m ar.ssd=r0 // M2 clear ar.ssd mov f11=f0 // F clear f11 br.cond.sptk.many rbs_switch // B -END(ia64_leave_syscall) +END(__paravirt_leave_syscall) +#ifdef __IA64_ASM_PARAVIRTUALIZED_NATIVE #ifdef CONFIG_IA32_SUPPORT GLOBAL_ENTRY(ia64_ret_from_ia32_execve) PT_REGS_UNWIND_INFO(0) @@ -835,10 +852,20 @@ GLOBAL_ENTRY(ia64_ret_from_ia32_execve) st8.spill [r2]=r8 // store return value in slot for r8 and set unat bit .mem.offset 8,0 st8.spill [r3]=r0 // clear error indication in slot for r10 and set unat bit +#ifdef CONFIG_PARAVIRT + ;; + // don't fall through, ia64_leave_kernel may be #define'd + br.cond.sptk.few ia64_leave_kernel + ;; +#endif /* CONFIG_PARAVIRT */ END(ia64_ret_from_ia32_execve) +#ifndef CONFIG_PARAVIRT // fall through +#endif #endif /* CONFIG_IA32_SUPPORT */ -GLOBAL_ENTRY(ia64_leave_kernel) +#endif /* __IA64_ASM_PARAVIRTUALIZED_NATIVE */ + +GLOBAL_ENTRY(__paravirt_leave_kernel) PT_REGS_UNWIND_INFO(0) /* * work.need_resched etc. mustn't get changed by this CPU before it returns to @@ -852,7 +879,7 @@ GLOBAL_ENTRY(ia64_leave_kernel) * needs to be redone. */ #ifdef CONFIG_PREEMPT - rsm psr.i // disable interrupts + RSM_PSR_I(p0, r17, r31) // disable interrupts cmp.eq p0,pLvSys=r0,r0 // pLvSys=0: leave from kernel (pKStk) adds r20=TI_PRE_COUNT+IA64_TASK_SIZE,r13 ;; @@ -862,7 +889,7 @@ GLOBAL_ENTRY(ia64_leave_kernel) ;; cmp.eq p6,p0=r21,r0 // p6 <- pUStk || (preempt_count == 0) #else -(pUStk) rsm psr.i + RSM_PSR_I(pUStk, r17, r31) cmp.eq p0,pLvSys=r0,r0 // pLvSys=0: leave from kernel (pUStk) cmp.eq.unc p6,p0=r0,r0 // p6 <- pUStk #endif @@ -910,7 +937,7 @@ GLOBAL_ENTRY(ia64_leave_kernel) mov ar.csd=r30 mov ar.ssd=r31 ;; - rsm psr.i | psr.ic // initiate turning off of interrupt and interruption collection + RSM_PSR_I_IC(r23, r22, r25) // initiate turning off of interrupt and interruption collection invala // invalidate ALAT ;; ld8.fill r22=[r2],24 @@ -942,7 +969,7 @@ GLOBAL_ENTRY(ia64_leave_kernel) mov ar.ccv=r15 ;; ldf.fill f11=[r2] - bsw.0 // switch back to bank 0 (no stop bit required beforehand...) + BSW_0(r2, r3, r15) // switch back to bank 0 (no stop bit required beforehand...) 
;; (pUStk) mov r18=IA64_KR(CURRENT)// M2 (12 cycle read latency) adds r16=PT(CR_IPSR)+16,r12 @@ -950,12 +977,12 @@ GLOBAL_ENTRY(ia64_leave_kernel) #ifdef CONFIG_VIRT_CPU_ACCOUNTING .pred.rel.mutex pUStk,pKStk -(pKStk) mov r22=psr // M2 read PSR now that interrupts are disabled + MOV_FROM_PSR(pKStk, r22, r29) // M2 read PSR now that interrupts are disabled (pUStk) mov.m r22=ar.itc // M fetch time at leave nop.i 0 ;; #else -(pKStk) mov r22=psr // M2 read PSR now that interrupts are disabled + MOV_FROM_PSR(pKStk, r22, r29) // M2 read PSR now that interrupts are disabled nop.i 0 nop.i 0 ;; @@ -1027,7 +1054,7 @@ GLOBAL_ENTRY(ia64_leave_kernel) * NOTE: alloc, loadrs, and cover can't be predicated. */ (pNonSys) br.cond.dpnt dont_preserve_current_frame - cover // add current frame into dirty partition and set cr.ifs + COVER // add current frame into dirty partition and set cr.ifs ;; mov r19=ar.bsp // get new backing store pointer rbs_switch: @@ -1130,16 +1157,16 @@ skip_rbs_switch: (pKStk) dep r29=r22,r29,21,1 // I0 update ipsr.pp with psr.pp (pLvSys)mov r16=r0 // A clear r16 for leave_syscall, no-op otherwise ;; - mov cr.ipsr=r29 // M2 + MOV_TO_IPSR(r29, r25) // M2 mov ar.pfs=r26 // I0 (pLvSys)mov r17=r0 // A clear r17 for leave_syscall, no-op otherwise -(p9) mov cr.ifs=r30 // M2 + MOV_TO_IFS(p9, r30, r25)// M2 mov b0=r21 // I0 (pLvSys)mov r18=r0 // A clear r18 for leave_syscall, no-op otherwise mov ar.fpsr=r20 // M2 - mov cr.iip=r28 // M2 + MOV_TO_IIP(r28, r25) // M2 nop 0 ;; (pUStk) mov ar.rnat=r24 // M2 must happen with RSE in lazy mode @@ -1148,7 +1175,7 @@ skip_rbs_switch: mov ar.rsc=r27 // M2 mov pr=r31,-1 // I0 - rfi // B + RFI // B /* * On entry: @@ -1170,36 +1197,37 @@ skip_rbs_switch: (pKStk) dep r21=-1,r0,PREEMPT_ACTIVE_BIT,1 ;; (pKStk) st4 [r20]=r21 - ssm psr.i // enable interrupts + SSM_PSR_I(p0, r2) // enable interrupts #endif br.call.spnt.many rp=schedule .ret9: cmp.eq p6,p0=r0,r0 // p6 <- 1 - rsm psr.i // disable interrupts + RSM_PSR_I(p0, r2, r20) // disable interrupts ;; #ifdef CONFIG_PREEMPT (pKStk) adds r20=TI_PRE_COUNT+IA64_TASK_SIZE,r13 ;; (pKStk) st4 [r20]=r0 // preempt_count() <- 0 #endif -(pLvSys)br.cond.sptk.few .work_pending_syscall_end +(pLvSys)br.cond.sptk.few __paravirt_pending_syscall_end br.cond.sptk.many .work_processed_kernel // re-check .notify: (pUStk) br.call.spnt.many rp=notify_resume_user .ret10: cmp.ne p6,p0=r0,r0 // p6 <- 0 -(pLvSys)br.cond.sptk.few .work_pending_syscall_end +(pLvSys)br.cond.sptk.few __paravirt_pending_syscall_end br.cond.sptk.many .work_processed_kernel // don't re-check -.work_pending_syscall_end: +.global __paravirt_pending_syscall_end; +__paravirt_pending_syscall_end: adds r2=PT(R8)+16,r12 adds r3=PT(R10)+16,r12 ;; ld8 r8=[r2] ld8 r10=[r3] - br.cond.sptk.many .work_processed_syscall // re-check - -END(ia64_leave_kernel) + br.cond.sptk.many __paravirt_work_processed_syscall_target // re-check +END(__paravirt_leave_kernel) +#ifdef __IA64_ASM_PARAVIRTUALIZED_NATIVE ENTRY(handle_syscall_error) /* * Some system calls (e.g., ptrace, mmap) can return arbitrary values which could @@ -1238,7 +1266,7 @@ END(ia64_invoke_schedule_tail) * be set up by the caller. We declare 8 input registers so the system call * args get preserved, in case we need to restart a system call. */ -ENTRY(notify_resume_user) +GLOBAL_ENTRY(notify_resume_user) .prologue ASM_UNW_PRLG_RP|ASM_UNW_PRLG_PFS, ASM_UNW_PRLG_GRSAVE(8) alloc loc1=ar.pfs,8,2,3,0 // preserve all eight input regs in case of syscall restart! 
mov r9=ar.unat @@ -1300,7 +1328,7 @@ ENTRY(sys_rt_sigreturn) adds sp=16,sp ;; ld8 r9=[sp] // load new ar.unat - mov.sptk b7=r8,ia64_leave_kernel + mov.sptk b7=r8,ia64_native_leave_kernel ;; mov ar.unat=r9 br.many b7 @@ -1659,3 +1687,4 @@ sys_call_table: data8 sys_timerfd_gettime .org sys_call_table + 8*NR_syscalls // guard against failures to increase NR_syscalls +#endif /* __IA64_ASM_PARAVIRTUALIZED_NATIVE */ diff --git a/arch/ia64/kernel/paravirt.c b/arch/ia64/kernel/paravirt.c index e5482bb..7126ea8 100644 --- a/arch/ia64/kernel/paravirt.c +++ b/arch/ia64/kernel/paravirt.c @@ -286,3 +286,22 @@ struct pv_cpu_ops pv_cpu_ops = { = ia64_native_intrin_local_irq_restore_func, }; EXPORT_SYMBOL(pv_cpu_ops); + +/****************************************************************************** + * replacement of hand written assembly codes. + */ + +void +paravirt_cpu_asm_init(const struct pv_cpu_asm_switch *cpu_asm_switch) +{ + extern unsigned long paravirt_switch_to_targ; + extern unsigned long paravirt_leave_syscall_targ; + extern unsigned long paravirt_work_processed_syscall_targ; + extern unsigned long paravirt_leave_kernel_targ; + + paravirt_switch_to_targ = cpu_asm_switch->switch_to; + paravirt_leave_syscall_targ = cpu_asm_switch->leave_syscall; + paravirt_work_processed_syscall_targ + cpu_asm_switch->work_processed_syscall; + paravirt_leave_kernel_targ = cpu_asm_switch->leave_kernel; +} diff --git a/arch/ia64/kernel/paravirtentry.S b/arch/ia64/kernel/paravirtentry.S new file mode 100644 index 0000000..2f42fcb --- /dev/null +++ b/arch/ia64/kernel/paravirtentry.S @@ -0,0 +1,60 @@ +/****************************************************************************** + * linux/arch/ia64/xen/paravirtentry.S + * + * Copyright (c) 2008 Isaku Yamahata <yamahata at valinux co jp> + * VA Linux Systems Japan K.K. + * + * This program is free software; you can redistribute it and/or modify + * it under the terms of the GNU General Public License as published by + * the Free Software Foundation; either version 2 of the License, or + * (at your option) any later version. + * + * This program is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + * GNU General Public License for more details. 
+ * + * You should have received a copy of the GNU General Public License + * along with this program; if not, write to the Free Software + * Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA + * + */ + +#include <asm/asmmacro.h> +#include <asm/asm-offsets.h> +#include "entry.h" + +#define DATA8(sym, init_value) \ + .pushsection .data.read_mostly ; \ + .align 8 ; \ + .global sym ; \ + sym: ; \ + data8 init_value ; \ + .popsection + +#define BRANCH(targ, reg, breg) \ + movl reg=targ ; \ + ;; \ + ld8 reg=[reg] ; \ + ;; \ + mov breg=reg ; \ + br.cond.sptk.many breg + +#define BRANCH_PROC(sym, reg, breg) \ + DATA8(paravirt_ ## sym ## _targ, ia64_native_ ## sym) ; \ + GLOBAL_ENTRY(paravirt_ ## sym) ; \ + BRANCH(paravirt_ ## sym ## _targ, reg, breg) ; \ + END(paravirt_ ## sym) + +#define BRANCH_PROC_UNWINFO(sym, reg, breg) \ + DATA8(paravirt_ ## sym ## _targ, ia64_native_ ## sym) ; \ + GLOBAL_ENTRY(paravirt_ ## sym) ; \ + PT_REGS_UNWIND_INFO(0) ; \ + BRANCH(paravirt_ ## sym ## _targ, reg, breg) ; \ + END(paravirt_ ## sym) + + +BRANCH_PROC(switch_to, r22, b7) +BRANCH_PROC_UNWINFO(leave_syscall, r22, b7) +BRANCH_PROC(work_processed_syscall, r2, b7) +BRANCH_PROC_UNWINFO(leave_kernel, r22, b7) diff --git a/include/asm-ia64/native/inst.h b/include/asm-ia64/native/inst.h index c167b7d..1855dae 100644 --- a/include/asm-ia64/native/inst.h +++ b/include/asm-ia64/native/inst.h @@ -22,6 +22,14 @@ #define DO_SAVE_MIN IA64_NATIVE_DO_SAVE_MIN +#define __paravirt_switch_to ia64_native_switch_to +#define __paravirt_leave_syscall ia64_native_leave_syscall +#define __paravirt_work_processed_syscall ia64_native_work_processed_syscall +#define __paravirt_leave_kernel ia64_native_leave_kernel +#define __paravirt_pending_syscall_end ia64_work_pending_syscall_end +#define __paravirt_work_processed_syscall_target \ + ia64_work_processed_syscall + #ifdef CONFIG_PARAVIRT_GUEST_ASM_CLOBBER_CHECK # define PARAVIRT_POISON 0xdeadbeefbaadf00d # define CLOBBER(clob) \ diff --git a/include/asm-ia64/paravirt_privop.h b/include/asm-ia64/paravirt_privop.h index 7b133ae..52482e6 100644 --- a/include/asm-ia64/paravirt_privop.h +++ b/include/asm-ia64/paravirt_privop.h @@ -80,12 +80,35 @@ extern unsigned long ia64_native_getreg_func(int regnum); ia64_native_rsm(mask); \ } while (0) +/****************************************************************************** + * replacement of hand written assembly codes. + */ +struct pv_cpu_asm_switch { + unsigned long switch_to; + unsigned long leave_syscall; + unsigned long work_processed_syscall; + unsigned long leave_kernel; +}; +void paravirt_cpu_asm_init(const struct pv_cpu_asm_switch *cpu_asm_switch); + #endif /* __ASSEMBLY__ */ +#define IA64_PARAVIRT_ASM_FUNC(name) paravirt_ ## name + #else /* fallback for native case */ +#define IA64_PARAVIRT_ASM_FUNC(name) ia64_native_ ## name #endif /* CONFIG_PARAVIRT */ +/* these routines utilize privilege-sensitive or performance-sensitive + * privileged instructions so the code must be replaced with + * paravirtualized versions */ +#define ia64_switch_to IA64_PARAVIRT_ASM_FUNC(switch_to) +#define ia64_leave_syscall IA64_PARAVIRT_ASM_FUNC(leave_syscall) +#define ia64_work_processed_syscall \ + IA64_PARAVIRT_ASM_FUNC(work_processed_syscall) +#define ia64_leave_kernel IA64_PARAVIRT_ASM_FUNC(leave_kernel) + #endif /* _ASM_IA64_PARAVIRT_PRIVOP_H */ -- 1.5.3
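For reference, this is how a pv instance is expected to use the hooks above. paravirtentry.S turns ia64_switch_to, ia64_leave_syscall, ia64_work_processed_syscall and ia64_leave_kernel into indirect branches through per-symbol data words that initially point at the ia64_native_* entry points; paravirt_cpu_asm_init() simply rewrites those words. (As posted, the paravirt.c hunk seems to have dropped the '=' in the work_processed_syscall line; it is meant to be an assignment like the other three.) A guest port would register its own hand-written entry points early in boot, roughly like this. The myguest_* names are only illustrative and are not part of this patch:

#include <linux/init.h>
#include <asm/paravirt_privop.h>

/* hand-written assembly entry points provided by the guest (placeholders) */
extern char myguest_switch_to[];
extern char myguest_leave_syscall[];
extern char myguest_work_processed_syscall[];
extern char myguest_leave_kernel[];

static const struct pv_cpu_asm_switch myguest_cpu_asm_switch = {
        .switch_to              = (unsigned long)myguest_switch_to,
        .leave_syscall          = (unsigned long)myguest_leave_syscall,
        .work_processed_syscall = (unsigned long)myguest_work_processed_syscall,
        .leave_kernel           = (unsigned long)myguest_leave_kernel,
};

void __init
myguest_setup_pv_cpu_asm(void)
{
        /* from here on the BRANCH_PROC stubs jump to the guest's code */
        paravirt_cpu_asm_init(&myguest_cpu_asm_switch);
}

In the native case the cost of this indirection is one load plus an indirect branch per entry point.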
Make NR_IRQ overridable by each pv instances. Pv instance may need each own number of irqs so that NR_IRQS should be the maximum number of nr_irqs each pv instances need. Signed-off-by: Isaku Yamahata <yamahata at valinux.co.jp> --- arch/ia64/Makefile | 6 ++++++ arch/ia64/kernel/Makefile | 33 +++++++++++++++++++++++++++++++++ arch/ia64/kernel/nr-irqs.c | 24 ++++++++++++++++++++++++ include/asm-ia64/irq.h | 9 +-------- include/asm-ia64/native/irq.h | 35 +++++++++++++++++++++++++++++++++++ 5 files changed, 99 insertions(+), 8 deletions(-) create mode 100644 arch/ia64/kernel/nr-irqs.c create mode 100644 include/asm-ia64/native/irq.h diff --git a/arch/ia64/Makefile b/arch/ia64/Makefile index ec4cca4..4721a12 100644 --- a/arch/ia64/Makefile +++ b/arch/ia64/Makefile @@ -99,3 +99,9 @@ define archhelp echo ' boot - Build vmlinux and bootloader for Ski simulator' echo '* unwcheck - Check vmlinux for invalid unwind info' endef + +archprepare: make_nr_irqs_h FORCE +PHONY += make_nr_irqs_h FORCE + +make_nr_irqs_h: FORCE + $(Q)$(MAKE) $(build)=arch/ia64/kernel include/asm-ia64/nr-irqs.h diff --git a/arch/ia64/kernel/Makefile b/arch/ia64/kernel/Makefile index cea91f1..87fea11 100644 --- a/arch/ia64/kernel/Makefile +++ b/arch/ia64/kernel/Makefile @@ -73,6 +73,39 @@ $(obj)/gate-syms.o: $(obj)/gate.lds $(obj)/gate.o FORCE # Note: kbuild does not track this dependency due to usage of .incbin $(obj)/gate-data.o: $(obj)/gate.so +# Calculate NR_IRQ = max(IA64_NATIVE_NR_IRQS, XEN_NR_IRQS, ...) based on config +define sed-y + "/^->/{s:^->\([^ ]*\) [\$$#]*\([^ ]*\) \(.*\):#define \1 \2 /* \3 */:; s:->::; p;}" +endef +quiet_cmd_nr_irqs = GEN $@ +define cmd_nr_irqs + (set -e; \ + echo "#ifndef __ASM_NR_IRQS_H__"; \ + echo "#define __ASM_NR_IRQS_H__"; \ + echo "/*"; \ + echo " * DO NOT MODIFY."; \ + echo " *"; \ + echo " * This file was generated by Kbuild"; \ + echo " *"; \ + echo " */"; \ + echo ""; \ + sed -ne $(sed-y) $<; \ + echo ""; \ + echo "#endif" ) > $@ +endef + +# We use internal kbuild rules to avoid the "is up to date" message from make +arch/$(SRCARCH)/kernel/nr-irqs.s: $(srctree)/arch/$(SRCARCH)/kernel/nr-irqs.c \ + $(wildcard $(srctree)/include/asm-ia64/*/irq.h) + $(Q)mkdir -p $(dir $@) + $(call if_changed_dep,cc_s_c) + +include/asm-ia64/nr-irqs.h: arch/$(SRCARCH)/kernel/nr-irqs.s + $(Q)mkdir -p $(dir $@) + $(call cmd,nr_irqs) + +clean-files += $(objtree)/include/asm-ia64/nr-irqs.h + # # native ivt.S and entry.S # diff --git a/arch/ia64/kernel/nr-irqs.c b/arch/ia64/kernel/nr-irqs.c new file mode 100644 index 0000000..1ae0491 --- /dev/null +++ b/arch/ia64/kernel/nr-irqs.c @@ -0,0 +1,24 @@ +/* + * calculate + * NR_IRQS = max(IA64_NATIVE_NR_IRQS, XEN_NR_IRQS, FOO_NR_IRQS...) + * depending on config. + * This must be calculated before processing asm-offset.c. 
+ */ + +#define ASM_OFFSETS_C 1 + +#include <linux/kbuild.h> +#include <linux/threads.h> +#include <asm-ia64/native/irq.h> + +void foo(void) +{ + union paravirt_nr_irqs_max { + char ia64_native_nr_irqs[IA64_NATIVE_NR_IRQS]; +#ifdef CONFIG_XEN + char xen_nr_irqs[XEN_NR_IRQS]; +#endif + }; + + DEFINE(NR_IRQS, sizeof (union paravirt_nr_irqs_max)); +} diff --git a/include/asm-ia64/irq.h b/include/asm-ia64/irq.h index a66d268..3627116 100644 --- a/include/asm-ia64/irq.h +++ b/include/asm-ia64/irq.h @@ -13,14 +13,7 @@ #include <linux/types.h> #include <linux/cpumask.h> - -#define NR_VECTORS 256 - -#if (NR_VECTORS + 32 * NR_CPUS) < 1024 -#define NR_IRQS (NR_VECTORS + 32 * NR_CPUS) -#else -#define NR_IRQS 1024 -#endif +#include <asm-ia64/nr-irqs.h> static __inline__ int irq_canonicalize (int irq) diff --git a/include/asm-ia64/native/irq.h b/include/asm-ia64/native/irq.h new file mode 100644 index 0000000..efe9ff7 --- /dev/null +++ b/include/asm-ia64/native/irq.h @@ -0,0 +1,35 @@ +/****************************************************************************** + * include/asm-ia64/native/irq.h + * + * Copyright (c) 2008 Isaku Yamahata <yamahata at valinux co jp> + * VA Linux Systems Japan K.K. + * + * This program is free software; you can redistribute it and/or modify + * it under the terms of the GNU General Public License as published by + * the Free Software Foundation; either version 2 of the License, or + * (at your option) any later version. + * + * This program is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + * GNU General Public License for more details. + * + * You should have received a copy of the GNU General Public License + * along with this program; if not, write to the Free Software + * Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA + * + * moved from linux/include/asm-ia64/irq.h. + */ + +#ifndef _ASM_IA64_NATIVE_IRQ_H +#define _ASM_IA64_NATIVE_IRQ_H + +#define NR_VECTORS 256 + +#if (NR_VECTORS + 32 * NR_CPUS) < 1024 +#define IA64_NATIVE_NR_IRQS (NR_VECTORS + 32 * NR_CPUS) +#else +#define IA64_NATIVE_NR_IRQS 1024 +#endif + +#endif /* _ASM_IA64_NATIVE_IRQ_H */ -- 1.5.3
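The trick in nr-irqs.c is that sizeof() of a union is the size of its largest member, so NR_IRQS comes out as the compile-time maximum of whatever each configured pv instance needs, without any preprocessor max() chain; the generated include/asm-ia64/nr-irqs.h then carries that value into <asm-ia64/irq.h>. A new pv instance only has to contribute one union member. For example (a fragment of nr-irqs.c; CONFIG_MYGUEST and MYGUEST_NR_IRQS are made-up placeholders, not part of this patch):

        union paravirt_nr_irqs_max {
                char ia64_native_nr_irqs[IA64_NATIVE_NR_IRQS];
#ifdef CONFIG_XEN
                char xen_nr_irqs[XEN_NR_IRQS];
#endif
#ifdef CONFIG_MYGUEST
                char myguest_nr_irqs[MYGUEST_NR_IRQS];  /* hypothetical instance */
#endif
        };

        DEFINE(NR_IRQS, sizeof(union paravirt_nr_irqs_max));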
Isaku Yamahata
2008-Apr-30 12:29 UTC
[PATCH 12/15] ia64/pv_ops: define initialization hooks, pv_init_ops, for a paravirtualized environment.
define pv_init_ops hooks which represents various initialization hooks for paravirtualized environment. and add hooks. Signed-off-by: Alex Williamson <alex.williamson at hp.com> Signed-off-by: Isaku Yamahata <yamahata at valinux.co.jp> --- arch/ia64/kernel/paravirt.c | 7 ++++ arch/ia64/kernel/setup.c | 10 ++++++ arch/ia64/kernel/smpboot.c | 2 + include/asm-ia64/paravirt.h | 70 +++++++++++++++++++++++++++++++++++++++++++ 4 files changed, 89 insertions(+), 0 deletions(-) diff --git a/arch/ia64/kernel/paravirt.c b/arch/ia64/kernel/paravirt.c index 7126ea8..5daf659 100644 --- a/arch/ia64/kernel/paravirt.c +++ b/arch/ia64/kernel/paravirt.c @@ -42,6 +42,13 @@ struct pv_info pv_info = { }; /*************************************************************************** + * pv_init_ops + * initialization hooks. + */ + +struct pv_init_ops pv_init_ops; + +/*************************************************************************** * pv_cpu_ops * intrinsics hooks. */ diff --git a/arch/ia64/kernel/setup.c b/arch/ia64/kernel/setup.c index 5015ca1..dc1b26b 100644 --- a/arch/ia64/kernel/setup.c +++ b/arch/ia64/kernel/setup.c @@ -51,6 +51,7 @@ #include <asm/mca.h> #include <asm/meminit.h> #include <asm/page.h> +#include <asm/paravirt.h> #include <asm/patch.h> #include <asm/pgtable.h> #include <asm/processor.h> @@ -312,6 +313,8 @@ reserve_memory (void) rsvd_region[n].end = (unsigned long) ia64_imva(_end); n++; + n += paravirt_reserve_memory(&rsvd_region[n]); + #ifdef CONFIG_BLK_DEV_INITRD if (ia64_boot_param->initrd_start) { rsvd_region[n].start = (unsigned long)__va(ia64_boot_param->initrd_start); @@ -490,6 +493,8 @@ setup_arch (char **cmdline_p) { unw_init(); + paravirt_arch_setup_early(); + ia64_patch_vtop((u64) __start___vtop_patchlist, (u64) __end___vtop_patchlist); *cmdline_p = __va(ia64_boot_param->command_line); @@ -544,6 +549,9 @@ setup_arch (char **cmdline_p) acpi_boot_init(); #endif + paravirt_banner(); + paravirt_arch_setup_console(cmdline_p); + #ifdef CONFIG_VT if (!conswitchp) { # if defined(CONFIG_DUMMY_CONSOLE) @@ -563,6 +571,8 @@ setup_arch (char **cmdline_p) #endif /* enable IA-64 Machine Check Abort Handling unless disabled */ + if (paravirt_arch_setup_nomca()) + nomca = 1; if (!nomca) ia64_mca_init(); diff --git a/arch/ia64/kernel/smpboot.c b/arch/ia64/kernel/smpboot.c index d7ad42b..933f388 100644 --- a/arch/ia64/kernel/smpboot.c +++ b/arch/ia64/kernel/smpboot.c @@ -50,6 +50,7 @@ #include <asm/machvec.h> #include <asm/mca.h> #include <asm/page.h> +#include <asm/paravirt.h> #include <asm/pgalloc.h> #include <asm/pgtable.h> #include <asm/processor.h> @@ -642,6 +643,7 @@ void __devinit smp_prepare_boot_cpu(void) cpu_set(smp_processor_id(), cpu_online_map); cpu_set(smp_processor_id(), cpu_callin_map); per_cpu(cpu_state, smp_processor_id()) = CPU_ONLINE; + paravirt_post_smp_prepare_boot_cpu(); } #ifdef CONFIG_HOTPLUG_CPU diff --git a/include/asm-ia64/paravirt.h b/include/asm-ia64/paravirt.h index 26b4334..8778e8b 100644 --- a/include/asm-ia64/paravirt.h +++ b/include/asm-ia64/paravirt.h @@ -52,11 +52,81 @@ static inline unsigned int get_kernel_rpl(void) return pv_info.kernel_rpl; } +/****************************************************************************** + * initialization hooks. 
+ */ +struct rsvd_region; + +struct pv_init_ops { + void (*banner)(void); + + int (*reserve_memory)(struct rsvd_region *region); + + void (*arch_setup_early)(void); + void (*arch_setup_console)(char **cmdline_p); + int (*arch_setup_nomca)(void); + + void (*post_smp_prepare_boot_cpu)(void); +}; + +extern struct pv_init_ops pv_init_ops; + +static inline void paravirt_banner(void) +{ + if (pv_init_ops.banner) + pv_init_ops.banner(); +} + +static inline int paravirt_reserve_memory(struct rsvd_region *region) +{ + if (pv_init_ops.reserve_memory) + return pv_init_ops.reserve_memory(region); + return 0; +} + +static inline void paravirt_arch_setup_early(void) +{ + if (pv_init_ops.arch_setup_early) + pv_init_ops.arch_setup_early(); +} + +static inline void paravirt_arch_setup_console(char **cmdline_p) +{ + if (pv_init_ops.arch_setup_console) + pv_init_ops.arch_setup_console(cmdline_p); +} + +static inline int paravirt_arch_setup_nomca(void) +{ + if (pv_init_ops.arch_setup_nomca) + return pv_init_ops.arch_setup_nomca(); + return 0; +} + +static inline void paravirt_post_smp_prepare_boot_cpu(void) +{ + if (pv_init_ops.post_smp_prepare_boot_cpu) + pv_init_ops.post_smp_prepare_boot_cpu(); +} + #endif /* !__ASSEMBLY__ */ #else /* fallback for native case */ +#ifndef __ASSEMBLY__ + +#define paravirt_banner() do { } while (0) +#define paravirt_reserve_memory(region) 0 + +#define paravirt_arch_setup_early() do { } while (0) +#define paravirt_arch_setup_console(cmdline_p) do { } while (0) +#define paravirt_arch_setup_nomca() 0 +#define paravirt_post_smp_prepare_boot_cpu() do { } while (0) + +#endif /* __ASSEMBLY__ */ + + #endif /* CONFIG_PARAVIRT_GUEST */ #endif /* __ASM_PARAVIRT_H */ -- 1.5.3
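In the native case pv_init_ops stays zero-filled, so every paravirt_*() wrapper above reduces to a NULL check that is skipped, and with CONFIG_PARAVIRT_GUEST unset the wrappers compile away entirely. A guest port fills in only the hooks it needs, along these lines (a sketch; the myguest_* names are assumptions, not part of this patch):

#include <linux/init.h>
#include <linux/kernel.h>
#include <asm/paravirt.h>

static void
myguest_banner(void)
{
        printk(KERN_INFO "booting as a myguest paravirtualized domain\n");
}

static int
myguest_arch_setup_nomca(void)
{
        return 1;       /* no MCA handling available in this environment */
}

void __init
myguest_setup_pv_init_ops(void)
{
        pv_init_ops.banner = myguest_banner;
        pv_init_ops.arch_setup_nomca = myguest_arch_setup_nomca;
        /* reserve_memory, arch_setup_early/console and
         * post_smp_prepare_boot_cpu keep their do-nothing defaults */
}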
Isaku Yamahata
2008-Apr-30 12:29 UTC
[PATCH 13/15] ia64/pv_ops: add hooks, pv_iosapic_ops, to paravirtualize iosapic.
add hooks to paravirtualize iosapic which is a real hardware resource. On virtualized environment it may be replaced something virtualized friendly. Define pv_iosapic_ops and add the hooks. Signed-off-by: Isaku Yamahata <yamahata at valinux.co.jp> --- arch/ia64/kernel/iosapic.c | 45 +++++++++++++++++++++++++++--------------- arch/ia64/kernel/paravirt.c | 25 +++++++++++++++++++++++ include/asm-ia64/iosapic.h | 18 +++++++++++++++- include/asm-ia64/paravirt.h | 40 ++++++++++++++++++++++++++++++++++++++ 4 files changed, 110 insertions(+), 18 deletions(-) diff --git a/arch/ia64/kernel/iosapic.c b/arch/ia64/kernel/iosapic.c index 082c31d..587196d 100644 --- a/arch/ia64/kernel/iosapic.c +++ b/arch/ia64/kernel/iosapic.c @@ -587,6 +587,15 @@ static inline int irq_is_shared (int irq) return (iosapic_intr_info[irq].count > 1); } +struct irq_chip* +ia64_native_iosapic_get_irq_chip(unsigned long trigger) +{ + if (trigger == IOSAPIC_EDGE) + return &irq_type_iosapic_edge; + else + return &irq_type_iosapic_level; +} + static int register_intr (unsigned int gsi, int irq, unsigned char delivery, unsigned long polarity, unsigned long trigger) @@ -637,13 +646,10 @@ register_intr (unsigned int gsi, int irq, unsigned char delivery, iosapic_intr_info[irq].dmode = delivery; iosapic_intr_info[irq].trigger = trigger; - if (trigger == IOSAPIC_EDGE) - irq_type = &irq_type_iosapic_edge; - else - irq_type = &irq_type_iosapic_level; + irq_type = iosapic_get_irq_chip(trigger); idesc = irq_desc + irq; - if (idesc->chip != irq_type) { + if (irq_type != NULL && idesc->chip != irq_type) { if (idesc->chip != &no_irq_type) printk(KERN_WARNING "%s: changing vector %d from %s to %s\n", @@ -976,6 +982,22 @@ iosapic_override_isa_irq (unsigned int isa_irq, unsigned int gsi, } void __init +ia64_native_iosapic_pcat_compat_init(void) +{ + if (pcat_compat) { + /* + * Disable the compatibility mode interrupts (8259 style), + * needs IN/OUT support enabled. + */ + printk(KERN_INFO + "%s: Disabling PC-AT compatible 8259 interrupts\n", + __func__); + outb(0xff, 0xA1); + outb(0xff, 0x21); + } +} + +void __init iosapic_system_init (int system_pcat_compat) { int irq; @@ -989,17 +1011,8 @@ iosapic_system_init (int system_pcat_compat) } pcat_compat = system_pcat_compat; - if (pcat_compat) { - /* - * Disable the compatibility mode interrupts (8259 style), - * needs IN/OUT support enabled. - */ - printk(KERN_INFO - "%s: Disabling PC-AT compatible 8259 interrupts\n", - __func__); - outb(0xff, 0xA1); - outb(0xff, 0x21); - } + if (pcat_compat) + iosapic_pcat_compat_init(); } static inline int diff --git a/arch/ia64/kernel/paravirt.c b/arch/ia64/kernel/paravirt.c index 5daf659..65c211b 100644 --- a/arch/ia64/kernel/paravirt.c +++ b/arch/ia64/kernel/paravirt.c @@ -312,3 +312,28 @@ paravirt_cpu_asm_init(const struct pv_cpu_asm_switch *cpu_asm_switch) cpu_asm_switch->work_processed_syscall; paravirt_leave_kernel_targ = cpu_asm_switch->leave_kernel; } + +/*************************************************************************** + * pv_iosapic_ops + * iosapic read/write hooks. 
+ */ + +static unsigned int +ia64_native_iosapic_read(char __iomem *iosapic, unsigned int reg) +{ + return __ia64_native_iosapic_read(iosapic, reg); +} + +static void +ia64_native_iosapic_write(char __iomem *iosapic, unsigned int reg, u32 val) +{ + __ia64_native_iosapic_write(iosapic, reg, val); +} + +struct pv_iosapic_ops pv_iosapic_ops = { + .pcat_compat_init = ia64_native_iosapic_pcat_compat_init, + .get_irq_chip = ia64_native_iosapic_get_irq_chip, + + .__read = ia64_native_iosapic_read, + .__write = ia64_native_iosapic_write, +}; diff --git a/include/asm-ia64/iosapic.h b/include/asm-ia64/iosapic.h index a3a4288..b9c102e 100644 --- a/include/asm-ia64/iosapic.h +++ b/include/asm-ia64/iosapic.h @@ -55,13 +55,27 @@ #define NR_IOSAPICS 256 -static inline unsigned int __iosapic_read(char __iomem *iosapic, unsigned int reg) +#ifdef CONFIG_PARAVIRT_GUEST +#include <asm/paravirt.h> +#else +#define iosapic_pcat_compat_init ia64_native_iosapic_pcat_compat_init +#define __iosapic_read __ia64_native_iosapic_read +#define __iosapic_write __ia64_native_iosapic_write +#define iosapic_get_irq_chip ia64_native_iosapic_get_irq_chip +#endif + +extern void __init ia64_native_iosapic_pcat_compat_init(void); +extern struct irq_chip *ia64_native_iosapic_get_irq_chip(unsigned long trigger); + +static inline unsigned int +__ia64_native_iosapic_read(char __iomem *iosapic, unsigned int reg) { writel(reg, iosapic + IOSAPIC_REG_SELECT); return readl(iosapic + IOSAPIC_WINDOW); } -static inline void __iosapic_write(char __iomem *iosapic, unsigned int reg, u32 val) +static inline void +__ia64_native_iosapic_write(char __iomem *iosapic, unsigned int reg, u32 val) { writel(reg, iosapic + IOSAPIC_REG_SELECT); writel(val, iosapic + IOSAPIC_WINDOW); diff --git a/include/asm-ia64/paravirt.h b/include/asm-ia64/paravirt.h index 8778e8b..48887a4 100644 --- a/include/asm-ia64/paravirt.h +++ b/include/asm-ia64/paravirt.h @@ -109,6 +109,46 @@ static inline void paravirt_post_smp_prepare_boot_cpu(void) pv_init_ops.post_smp_prepare_boot_cpu(); } +/****************************************************************************** + * replacement of iosapic operations. + */ + +struct pv_iosapic_ops { + void (*pcat_compat_init)(void); + + struct irq_chip *(*get_irq_chip)(unsigned long trigger); + + unsigned int (*__read)(char __iomem *iosapic, unsigned int reg); + void (*__write)(char __iomem *iosapic, unsigned int reg, u32 val); +}; + +extern struct pv_iosapic_ops pv_iosapic_ops; + +static inline void +iosapic_pcat_compat_init(void) +{ + if (pv_iosapic_ops.pcat_compat_init) + pv_iosapic_ops.pcat_compat_init(); +} + +static inline struct irq_chip* +iosapic_get_irq_chip(unsigned long trigger) +{ + return pv_iosapic_ops.get_irq_chip(trigger); +} + +static inline unsigned int +__iosapic_read(char __iomem *iosapic, unsigned int reg) +{ + return pv_iosapic_ops.__read(iosapic, reg); +} + +static inline void +__iosapic_write(char __iomem *iosapic, unsigned int reg, u32 val) +{ + return pv_iosapic_ops.__write(iosapic, reg, val); +} + #endif /* !__ASSEMBLY__ */ #else -- 1.5.3
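pv_iosapic_ops is statically initialized with the native accessors, so the common iosapic code keeps working unchanged; a virtualized environment interposes on every register access by overriding __read/__write. A sketch, with made-up myguest_* names that are not part of this patch:

#include <linux/init.h>
#include <linux/types.h>
#include <asm/iosapic.h>
#include <asm/paravirt.h>

/* guest-specific accessors, e.g. backed by hypercalls (placeholders) */
extern unsigned int myguest_iosapic_read(char __iomem *iosapic, unsigned int reg);
extern void myguest_iosapic_write(char __iomem *iosapic, unsigned int reg, u32 val);

void __init
myguest_setup_pv_iosapic_ops(void)
{
        pv_iosapic_ops.__read  = myguest_iosapic_read;
        pv_iosapic_ops.__write = myguest_iosapic_write;
        /* pcat_compat_init and get_irq_chip keep the native defaults;
         * a guest with its own irq_chip would override get_irq_chip too */
}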
Isaku Yamahata
2008-Apr-30 12:29 UTC
[PATCH 14/15] ia64/pv_ops: add hooks, pv_irq_ops, to paravirtualize irq related operations.
introduce pv_irq_ops which adds hooks to paravirtualize irq related operations. On virtualized environment, interruption may be replaced by something virtualization friendly. So the irq related operation also may need paravirtualization. This patch adds necessary hooks to paravirtualize irq related operations. Signed-off-by: Yaozu (Eddie) Dong <eddie.dong at intel.com> Signed-off-by: Isaku Yamahata <yamahata at valinux.co.jp> --- arch/ia64/kernel/irq_ia64.c | 18 +++++++++++---- arch/ia64/kernel/paravirt.c | 15 +++++++++++++ include/asm-ia64/hw_irq.h | 23 +++++++++++++++++--- include/asm-ia64/paravirt.h | 48 +++++++++++++++++++++++++++++++++++++++++++ 4 files changed, 95 insertions(+), 9 deletions(-) diff --git a/arch/ia64/kernel/irq_ia64.c b/arch/ia64/kernel/irq_ia64.c index c48171b..28d3d48 100644 --- a/arch/ia64/kernel/irq_ia64.c +++ b/arch/ia64/kernel/irq_ia64.c @@ -196,7 +196,7 @@ static void clear_irq_vector(int irq) } int -assign_irq_vector (int irq) +ia64_native_assign_irq_vector (int irq) { unsigned long flags; int vector, cpu; @@ -222,7 +222,7 @@ assign_irq_vector (int irq) } void -free_irq_vector (int vector) +ia64_native_free_irq_vector (int vector) { if (vector < IA64_FIRST_DEVICE_VECTOR || vector > IA64_LAST_DEVICE_VECTOR) @@ -622,7 +622,7 @@ static struct irqaction tlb_irqaction = { #endif void -register_percpu_irq (ia64_vector vec, struct irqaction *action) +ia64_native_register_percpu_irq (ia64_vector vec, struct irqaction *action) { irq_desc_t *desc; unsigned int irq; @@ -637,13 +637,21 @@ register_percpu_irq (ia64_vector vec, struct irqaction *action) } void __init -init_IRQ (void) +ia64_native_register_ipi(void) { - register_percpu_irq(IA64_SPURIOUS_INT_VECTOR, NULL); #ifdef CONFIG_SMP register_percpu_irq(IA64_IPI_VECTOR, &ipi_irqaction); register_percpu_irq(IA64_IPI_RESCHEDULE, &resched_irqaction); register_percpu_irq(IA64_IPI_LOCAL_TLB_FLUSH, &tlb_irqaction); +#endif +} + +void __init +init_IRQ (void) +{ + ia64_register_ipi(); + register_percpu_irq(IA64_SPURIOUS_INT_VECTOR, NULL); +#ifdef CONFIG_SMP #if defined(CONFIG_IA64_GENERIC) || defined(CONFIG_IA64_DIG) if (vector_domain_type != VECTOR_DOMAIN_NONE) { BUG_ON(IA64_FIRST_DEVICE_VECTOR != IA64_IRQ_MOVE_VECTOR); diff --git a/arch/ia64/kernel/paravirt.c b/arch/ia64/kernel/paravirt.c index 65c211b..ba5383b 100644 --- a/arch/ia64/kernel/paravirt.c +++ b/arch/ia64/kernel/paravirt.c @@ -337,3 +337,18 @@ struct pv_iosapic_ops pv_iosapic_ops = { .__read = ia64_native_iosapic_read, .__write = ia64_native_iosapic_write, }; + +/*************************************************************************** + * pv_irq_ops + * irq operations + */ + +struct pv_irq_ops pv_irq_ops = { + .register_ipi = ia64_native_register_ipi, + + .assign_irq_vector = ia64_native_assign_irq_vector, + .free_irq_vector = ia64_native_free_irq_vector, + .register_percpu_irq = ia64_native_register_percpu_irq, + + .resend_irq = ia64_native_resend_irq, +}; diff --git a/include/asm-ia64/hw_irq.h b/include/asm-ia64/hw_irq.h index 76366dc..5c99cbc 100644 --- a/include/asm-ia64/hw_irq.h +++ b/include/asm-ia64/hw_irq.h @@ -15,7 +15,11 @@ #include <asm/ptrace.h> #include <asm/smp.h> +#ifndef CONFIG_PARAVIRT typedef u8 ia64_vector; +#else +typedef u16 ia64_vector; +#endif /* * 0 special @@ -104,13 +108,24 @@ DECLARE_PER_CPU(int[IA64_NUM_VECTORS], vector_irq); extern struct hw_interrupt_type irq_type_ia64_lsapic; /* CPU-internal interrupt controller */ +#ifdef CONFIG_PARAVIRT_GUEST +#include <asm/paravirt.h> +#else +#define ia64_register_ipi ia64_native_register_ipi 
+#define assign_irq_vector ia64_native_assign_irq_vector +#define free_irq_vector ia64_native_free_irq_vector +#define register_percpu_irq ia64_native_register_percpu_irq +#define ia64_resend_irq ia64_native_resend_irq +#endif + +extern void ia64_native_register_ipi(void); extern int bind_irq_vector(int irq, int vector, cpumask_t domain); -extern int assign_irq_vector (int irq); /* allocate a free vector */ -extern void free_irq_vector (int vector); +extern int ia64_native_assign_irq_vector (int irq); /* allocate a free vector */ +extern void ia64_native_free_irq_vector (int vector); extern int reserve_irq_vector (int vector); extern void __setup_vector_irq(int cpu); extern void ia64_send_ipi (int cpu, int vector, int delivery_mode, int redirect); -extern void register_percpu_irq (ia64_vector vec, struct irqaction *action); +extern void ia64_native_register_percpu_irq (ia64_vector vec, struct irqaction *action); extern int check_irq_used (int irq); extern void destroy_and_reserve_irq (unsigned int irq); @@ -122,7 +137,7 @@ static inline int irq_prepare_move(int irq, int cpu) { return 0; } static inline void irq_complete_move(unsigned int irq) {} #endif -static inline void ia64_resend_irq(unsigned int vector) +static inline void ia64_native_resend_irq(unsigned int vector) { platform_send_ipi(smp_processor_id(), vector, IA64_IPI_DM_INT, 0); } diff --git a/include/asm-ia64/paravirt.h b/include/asm-ia64/paravirt.h index 48887a4..2203e9a 100644 --- a/include/asm-ia64/paravirt.h +++ b/include/asm-ia64/paravirt.h @@ -149,6 +149,54 @@ __iosapic_write(char __iomem *iosapic, unsigned int reg, u32 val) return pv_iosapic_ops.__write(iosapic, reg, val); } +/****************************************************************************** + * replacement of irq operations. + */ + +struct pv_irq_ops { + void (*register_ipi)(void); + + int (*assign_irq_vector)(int irq); + void (*free_irq_vector)(int vector); + + void (*register_percpu_irq)(ia64_vector vec, + struct irqaction *action); + + void (*resend_irq)(unsigned int vector); +}; + +extern struct pv_irq_ops pv_irq_ops; + +static inline void +ia64_register_ipi(void) +{ + pv_irq_ops.register_ipi(); +} + +static inline int +assign_irq_vector(int irq) +{ + return pv_irq_ops.assign_irq_vector(irq); +} + +static inline void +free_irq_vector(int vector) +{ + return pv_irq_ops.free_irq_vector(vector); +} + +static inline void +register_percpu_irq(ia64_vector vec, struct irqaction *action) +{ + pv_irq_ops.register_percpu_irq(vec, action); +} + +static inline void +ia64_resend_irq(unsigned int vector) +{ + pv_irq_ops.resend_irq(vector); +} + #endif /* !__ASSEMBLY__ */ #else -- 1.5.3
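With init_IRQ() now reaching the IPI setup through ia64_register_ipi(), a guest that delivers IPIs by some mechanism other than the local SAPIC (event channels, for instance) only overrides the hooks it actually needs; vector allocation and resend can stay native. Sketch (the myguest_* functions are illustrative names, not part of this patch):

#include <linux/init.h>
#include <linux/interrupt.h>
#include <asm/hw_irq.h>
#include <asm/paravirt.h>

/* guest-provided replacements (placeholders) */
extern void myguest_register_ipi(void);
extern void myguest_register_percpu_irq(ia64_vector vec, struct irqaction *action);

void __init
myguest_setup_pv_irq_ops(void)
{
        pv_irq_ops.register_ipi        = myguest_register_ipi;
        pv_irq_ops.register_percpu_irq = myguest_register_percpu_irq;
        /* assign_irq_vector, free_irq_vector and resend_irq keep the
         * ia64_native_* implementations set in paravirt.c */
}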
Isaku Yamahata
2008-Apr-30 12:29 UTC
[PATCH 15/15] ia64/pv_ops: add hooks, pv_time_ops, for steal time accounting.
Introduce pv_time_ops which adds hook to steal time accounting. On virtualized environment, cpus are shared by many guests and steal time is the time which is used for other guests. On virtualized environtment, streal time should be accounted. Signed-off-by: Isaku Yamahata <yamahata at valinux.co.jp> --- arch/ia64/kernel/paravirt.c | 15 +++++++++++++++ arch/ia64/kernel/time.c | 23 +++++++++++++++++++++++ include/asm-ia64/paravirt.h | 32 ++++++++++++++++++++++++++++++++ 3 files changed, 70 insertions(+), 0 deletions(-) diff --git a/arch/ia64/kernel/paravirt.c b/arch/ia64/kernel/paravirt.c index ba5383b..afaf5b9 100644 --- a/arch/ia64/kernel/paravirt.c +++ b/arch/ia64/kernel/paravirt.c @@ -352,3 +352,18 @@ struct pv_irq_ops pv_irq_ops = { .resend_irq = ia64_native_resend_irq, }; + +/*************************************************************************** + * pv_time_ops + * time operations + */ + +static int +ia64_native_do_steal_accounting(unsigned long *new_itm) +{ + return 0; +} + +struct pv_time_ops pv_time_ops = { + .do_steal_accounting = ia64_native_do_steal_accounting, +}; diff --git a/arch/ia64/kernel/time.c b/arch/ia64/kernel/time.c index 48e15a5..744247c 100644 --- a/arch/ia64/kernel/time.c +++ b/arch/ia64/kernel/time.c @@ -24,6 +24,7 @@ #include <asm/machvec.h> #include <asm/delay.h> #include <asm/hw_irq.h> +#include <asm/paravirt.h> #include <asm/ptrace.h> #include <asm/sal.h> #include <asm/sections.h> @@ -48,6 +49,15 @@ EXPORT_SYMBOL(last_cli_ip); #endif +#ifdef CONFIG_PARAVIRT +static void +paravirt_clocksource_resume(void) +{ + if (pv_time_ops.clocksource_resume) + pv_time_ops.clocksource_resume(); +} +#endif + static struct clocksource clocksource_itc = { .name = "itc", .rating = 350, @@ -56,6 +66,9 @@ static struct clocksource clocksource_itc = { .mult = 0, /*to be calculated*/ .shift = 16, .flags = CLOCK_SOURCE_IS_CONTINUOUS, +#ifdef CONFIG_PARAVIRT + .resume = paravirt_clocksource_resume, +#endif }; static struct clocksource *itc_clocksource; @@ -156,6 +169,9 @@ timer_interrupt (int irq, void *dev_id) profile_tick(CPU_PROFILING); + if (paravirt_do_steal_accounting(&new_itm)) + goto skip_process_time_accounting; + while (1) { update_process_times(user_mode(get_irq_regs())); @@ -185,6 +201,8 @@ timer_interrupt (int irq, void *dev_id) local_irq_disable(); } +skip_process_time_accounting: + do { /* * If we're too close to the next clock tick for @@ -334,6 +352,11 @@ ia64_init_itm (void) */ clocksource_itc.rating = 50; + paravirt_init_missing_ticks_accounting(smp_processor_id()); + + /* avoid softlock up message when cpu is unplug and plugged again. */ + touch_softlockup_watchdog(); + /* Setup the CPU local timer tick */ ia64_cpu_local_tick(); diff --git a/include/asm-ia64/paravirt.h b/include/asm-ia64/paravirt.h index 2203e9a..44017c1 100644 --- a/include/asm-ia64/paravirt.h +++ b/include/asm-ia64/paravirt.h @@ -197,6 +197,35 @@ ia64_resend_irq(unsigned int vector) pv_irq_ops.resend_irq(vector); } +/****************************************************************************** + * replacement of time operations. 
+ */ + +extern struct itc_jitter_data_t itc_jitter_data; +extern volatile int time_keeper_id; + +struct pv_time_ops { + void (*init_missing_ticks_accounting)(int cpu); + int (*do_steal_accounting)(unsigned long *new_itm); + + void (*clocksource_resume)(void); +}; + +extern struct pv_time_ops pv_time_ops; + +static inline void +paravirt_init_missing_ticks_accounting(int cpu) +{ + if (pv_time_ops.init_missing_ticks_accounting) + pv_time_ops.init_missing_ticks_accounting(cpu); +} + +static inline int +paravirt_do_steal_accounting(unsigned long *new_itm) +{ + return pv_time_ops.do_steal_accounting(new_itm); +} + #endif /* !__ASSEMBLY__ */ #else @@ -212,6 +241,9 @@ ia64_resend_irq(unsigned int vector) #define paravirt_arch_setup_nomca() 0 #define paravirt_post_smp_prepare_boot_cpu() do { } while (0) +#define paravirt_init_missing_ticks_accounting(cpu) do { } while (0) +#define paravirt_do_steal_accounting(new_itm) 0 + #endif /* __ASSEMBLY__ */ -- 1.5.3
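The intent of do_steal_accounting() is that the hook may account stolen/blocked time itself, adjust *new_itm accordingly, and return non-zero so that timer_interrupt() skips its own update_process_times() loop; the native implementation just returns 0. A minimal sketch of a guest hook follows. The myguest_* names and the way stolen time is obtained are assumptions, not part of this patch:

#include <linux/init.h>
#include <asm/paravirt.h>

/* placeholder: asks the hypervisor how much time was stolen since the
 * last tick, does the actual accounting, and returns the amount in
 * itc cycles, or 0 if nothing needs to be done */
extern unsigned long myguest_account_stolen_time(void);

static int
myguest_do_steal_accounting(unsigned long *new_itm)
{
        unsigned long stolen_cycles = myguest_account_stolen_time();

        if (!stolen_cycles)
                return 0;               /* fall back to normal tick accounting */

        *new_itm += stolen_cycles;      /* skip the interval already accounted */
        return 1;                       /* timer_interrupt() skips its own loop */
}

void __init
myguest_setup_pv_time_ops(void)
{
        pv_time_ops.do_steal_accounting = myguest_do_steal_accounting;
        /* init_missing_ticks_accounting and clocksource_resume are optional */
}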
Hi Tony. Could you give me comments/hints on how to make progress? What do you think of these patches? I believe that ia64/kvm and other ia64 virtualization technologies can also benefit greatly from ia64/pv_ops, and that there is no performance penalty with CONFIG_PARAVIRT=n.

Regarding the ia64/xen specific patches, should they go through the xen maintainer, as the ia64/kvm patches did?

thanks,

On Wed, Apr 30, 2008 at 09:29:13PM +0900, Isaku Yamahata wrote:
> [...]
--
yamahata
I started looking at this patch set. Parts 1-9 applied OK, but part 10 (entry.S) failed to apply because of recent changes to this file to fix the problems with warnings when trying to get locks with interrupts blocked.

I thought this would be a good point to test the bisectability of this patch set and started a build with just parts 1-9 applied. The build was perfectly clean, but the kernel hung during boot (possibly when first trying to run user processes, by the look of the last few messages on the console).

The code with parts 1-9 applied still looks readable, but I think we may need some documentation later to help with maintainability. Perhaps even a tool that can check that people don't add direct uses of instructions that need to be paravirtualized to .S files.

I'm talking to some of the virtualization people here to see if they can help me set up a system so that I will be able to test out the non-native execution paths.

-Tony