Magnus Damm
2006-Oct-16 08:33 UTC
[Xen-devel] [PATCH 00/04] Kexec / Kdump: Release 20061016 (xen-unstable-11760)
[PATCH 00/04] Kexec / Kdump: Release 20061016 (xen-unstable-11760)

This is the 20061016 release of the Kexec / Kdump patches for x86 Xen.

Test Results:

                        Kexec    Kexec    Kexec       Kexec    Kdump
           Hardware     Xen ->   Xen ->   bzImage ->  Xen ->   Xen ->
Arch       Platform     Xen      bzImage  Xen         vmlinux  vmlinux
i386       A            PASS     PASS     PASS        PASS     PASS
i386       B (VMX)      PASS     PASS     PASS        PASS     PASS
i386       C (SVM)      PASS     PASS     PASS        PASS     PASS
i386/PAE   A            PASS     PASS     PASS        PASS     PASS
i386/PAE   B (VMX)      PASS     PASS     PASS        PASS     PASS
i386/PAE   C (SVM)      PASS     PASS     PASS        PASS     PASS
x86_64     D            PASS     PASS     PASS        PASS     PASS
x86_64     B (VMX)      PASS     PASS     PASS        PASS     PASS
x86_64     C (SVM)      PASS     PASS     PASS        PASS     PASS

The tests were made with version 46ecc6c6c77b1fab20b08286209631a00eb1049e
of kexec-tools from the kexec-tools-testing tree which can be found here:

http://www.kernel.org/git/?p=linux/kernel/git/horms/kexec-tools-testing.git

Hardware Platforms:

A: i386     - VA Linux 1220, 2 x Pentium III 866 MHz, 2 GB
B: Intel VT - Shuttle XPC SD36G5, 1 x Pentium D 930, 1 GB
C: AMD VT   - Shuttle XPC SK22G2, 1 x Athlon64 x2 3800+, 1 GB
D: x86_64   - TYAN Transport GX28 B2881, 2 x Opteron 244 1.8 GHz, 2 GB

Changes:

20061016
- Release 20061016 for xen-unstable-11760
- "Avoid overwriting the current pgd (V4)" patches accepted upstream
- Included in Linux-2.6.19-rc1
- Up-ported Xen code to build on top of merged patches
- Implemented and tested VT-extension support for x86:
  - Intel VMX / IVT "Vanderpool" support for x86_32 and x86_64
  - AMD SVM / AMD-V "Pacifica" support for x86_32 and x86_64
- Command line parameter is now the same as for Linux:
  - For instance, "crashkernel=64M@32M" reserves a 64 MB window at 32 MB
- x86 and ia64 patches are now separated, this release is x86-only
- The x86 port is from this release handled by Magnus Damm
- The ia64 port is handled by Simon Horman

20060931
- Take XIV for xen-unstable-11296 posted by Simon Horman

Enjoy!
/ magnus
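For readers who want to see what the "crashkernel=64M@32M" syntax from the change list amounts to, here is a minimal standalone sketch in plain C. It is not the code from the patch: the Xen-side parser lives in xen/common/kexec.c and uses simple_strtoul(), while the names struct crash_window, parse_size() and parse_crashkernel() below are hypothetical and exist only for illustration. The sketch also assumes the "@offset" part is mandatory, which is stricter than the hypervisor code.

```c
#include <ctype.h>
#include <stdlib.h>

/* Hypothetical container for the parsed window; not a Xen type. */
struct crash_window {
    unsigned long long size;   /* bytes to reserve */
    unsigned long long start;  /* physical start address in bytes */
};

/* Parse one number with an optional K/M/G suffix and advance *s past it. */
static int parse_size(const char **s, unsigned long long *out)
{
    char *end;
    unsigned long long v = strtoull(*s, &end, 0);

    if (end == *s)
        return -1;             /* no digits found */

    switch (toupper((unsigned char)*end)) {
    case 'G': v <<= 10;        /* fall through */
    case 'M': v <<= 10;        /* fall through */
    case 'K': v <<= 10; end++; break;
    default: break;
    }

    *out = v;
    *s = end;
    return 0;
}

/* Parse "size@start", e.g. "64M@32M". Returns 0 on success, -1 on error. */
int parse_crashkernel(const char *arg, struct crash_window *w)
{
    w->size = 0;
    w->start = 0;

    if (parse_size(&arg, &w->size) != 0)
        return -1;
    if (*arg != '@')           /* assumption: the offset is required */
        return -1;
    arg++;
    if (parse_size(&arg, &w->start) != 0)
        return -1;

    return *arg == '\0' ? 0 : -1;
}
```

Calling parse_crashkernel("64M@32M", &w) under these assumptions yields a size of 64 << 20 bytes and a start of 32 << 20 bytes, which matches the "64 MB window at 32 MB" wording above.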
[PATCH 01/04] Kexec / Kdump: Generic code This patch implements the generic portion of the Kexec / Kdump port to Xen. Signed-Off-By: Magnus Damm <magnus@valinux.co.jp> --- Applies on top of xen-unstable-11760. linux-2.6-xen-sparse/drivers/xen/core/Makefile | 1 linux-2.6-xen-sparse/drivers/xen/core/reboot.c | 4 patches/linux-2.6.16.29/series | 1 linux-2.6-xen-sparse/drivers/xen/core/crash.c | 49 ++ linux-2.6-xen-sparse/drivers/xen/core/machine_kexec.c | 76 +++ patches/linux-2.6.16.29/kexec-generic.patch | 283 ++++++++++++ xen/arch/ia64/xen/crash.c | 26 + xen/arch/ia64/xen/machine_kexec.c | 41 + xen/arch/powerpc/crash.c | 26 + xen/arch/powerpc/machine_kexec.c | 41 + xen/arch/x86/crash.c | 26 + xen/arch/x86/machine_kexec.c | 41 + xen/common/kexec.c | 260 +++++++++++ xen/include/asm-ia64/kexec.h | 32 + xen/include/asm-x86/kexec.h | 31 + xen/include/public/kexec.h | 84 +++ xen/include/xen/elfcore.h | 73 +++ xen/include/xen/kexec.h | 30 + xen/arch/ia64/xen/Makefile | 2 xen/arch/powerpc/Makefile | 2 xen/arch/x86/Makefile | 2 xen/common/Makefile | 1 xen/common/page_alloc.c | 33 - xen/drivers/char/console.c | 3 xen/include/xen/hypercall.h | 6 xen/include/xen/mm.h | 1 26 files changed, 1164 insertions(+), 11 deletions(-) --- 0001/linux-2.6-xen-sparse/drivers/xen/core/Makefile +++ work/linux-2.6-xen-sparse/drivers/xen/core/Makefile 2006-10-16 12:04:03.000000000 +0900 @@ -11,3 +11,4 @@ obj-$(CONFIG_XEN_SYSFS) += xen_sysfs.o obj-$(CONFIG_XEN_SKBUFF) += skbuff.o obj-$(CONFIG_XEN_REBOOT) += reboot.o obj-$(CONFIG_XEN_SMPBOOT) += smpboot.o +obj-$(CONFIG_KEXEC) += machine_kexec.o crash.o --- /dev/null +++ work/linux-2.6-xen-sparse/drivers/xen/core/crash.c 2006-10-16 12:04:04.000000000 +0900 @@ -0,0 +1,49 @@ +/* + * drivers/xen/core/crash.c + * Architecture independent functions for kexec based crash dumps in xen. 
+ * + * Created by: Horms <horms@verge.net.au> + * + */ + +#include <asm/ptrace.h> +#include <linux/types.h> +#include <asm/kexec-xen.h> +#include <asm/hypervisor.h> +#include <asm/system.h> +#include <linux/preempt.h> +#include <linux/smp.h> +#include <asm/hw_irq.h> +#include <xen/interface/kexec.h> + +/* + * This passes the registers's down to the hypervisor and has it kexec() + * This is a bit different to the linux implementation which + * has this call save registers and stop CPUs and then goes into + * machine_kexec() later. But for Xen it makes more sense to + * have the kexec hypercall do everything, and this call + * has the registers parameter that is needed. + * to the hypervisor to allow the hypervisor to kdump itself + * on an internal panic + */ +void machine_crash_shutdown(struct pt_regs *regs) +{ + struct cpu_user_regs xen_regs; + printk("machine_crash_shutdown: %d\n", smp_processor_id()); + local_irq_disable(); +#ifdef CONFIG_X86_IO_APIC + disable_IO_APIC(); +#endif + crash_translate_regs(regs, &xen_regs); + HYPERVISOR_kexec(KEXEC_CMD_kexec, KEXEC_TYPE_CRASH, &xen_regs); +} + +/* + * Local variables: + * c-file-style: "linux" + * indent-tabs-mode: t + * c-indent-level: 8 + * c-basic-offset: 8 + * tab-width: 8 + * End: + */ --- /dev/null +++ work/linux-2.6-xen-sparse/drivers/xen/core/machine_kexec.c 2006-10-16 12:04:04.000000000 +0900 @@ -0,0 +1,76 @@ +/* + * drivers/xen/core/machine_kexec.c + * handle transition of Linux booting another kernel + * + * Created By: Horms <horms@verge.net.au> + * + * Losely based on arch/i386/kernel/machine_kexec.c + */ + +#include <linux/kexec.h> +#include <xen/interface/kexec.h> +#include <linux/mm.h> +#include <asm/hypercall.h> +#include <asm/kexec-xen.h> + +extern void machine_kexec_setup_load_arg(xen_kexec_image_t *xki, + struct kimage *image); + +static void setup_load_arg(xen_kexec_image_t *xki, struct kimage *image) +{ + memset(xki, 0, sizeof(*xki)); + + machine_kexec_setup_load_arg(xki, image); + + 
xki->indirection_page = image->head; + xki->start_address = image->start; +} + +/* + * Load the image into xen so xen can kdump itself + * This might have been done in prepare, but prepare + * is currently called too early. It might make sense + * to move prepare, but for now, just add an extra hook. + */ +int xen_machine_kexec_load(struct kimage *image) +{ + xen_kexec_image_t xki; + + setup_load_arg(&xki, image); + return HYPERVISOR_kexec(KEXEC_CMD_kexec_load, image->type, &xki); +} + +/* + * Unload the image that was stored by machine_kexec_load() + * This might have been done in machine_kexec_cleanup() but it + * is called too late, and its possible xen could try and kdump + * using resources that have been freed. + */ +void xen_machine_kexec_unload(struct kimage *image) +{ + HYPERVISOR_kexec(KEXEC_CMD_kexec_unload, image->type, NULL); +} + +/* + * Do not allocate memory (or fail in any way) in machine_kexec(). + * We are past the point of no return, committed to rebooting now. + * + * This has the hypervisor move to the prefered reboot CPU, + * stop all CPUs and kexec. That is it combines machine_shutdown() + * and machine_kexec() in Linux kexec terms. 
+ */ +NORET_TYPE void xen_machine_kexec(struct kimage *image) +{ + HYPERVISOR_kexec(KEXEC_CMD_kexec, image->type, NULL); + panic("KEXEC_CMD_kexec hypercall should not return\n"); +} + +/* + * Local variables: + * c-file-style: "linux" + * indent-tabs-mode: t + * c-indent-level: 8 + * c-basic-offset: 8 + * tab-width: 8 + * End: + */ --- 0001/linux-2.6-xen-sparse/drivers/xen/core/reboot.c +++ work/linux-2.6-xen-sparse/drivers/xen/core/reboot.c 2006-10-16 12:04:03.000000000 +0900 @@ -65,6 +65,10 @@ void machine_power_off(void) HYPERVISOR_shutdown(SHUTDOWN_poweroff); } +#ifdef CONFIG_KEXEC +void machine_shutdown(void) { } +#endif + int reboot_thru_bios = 0; /* for dmi_scan.c */ EXPORT_SYMBOL(machine_restart); EXPORT_SYMBOL(machine_halt); --- /dev/null +++ work/patches/linux-2.6.16.29/kexec-generic.patch 2006-10-16 12:04:04.000000000 +0900 @@ -0,0 +1,283 @@ + drivers/base/cpu.c | 20 +++++++++++++++++ + include/linux/kexec.h | 5 ++++ + kernel/kexec.c | 57 ++++++++++++++++++++++++++++++++++++++----------- + kernel/sys.c | 4 +++ + 4 files changed, 74 insertions(+), 12 deletions(-) + +--- x/drivers/base/cpu.c ++++ x/drivers/base/cpu.c +@@ -11,6 +11,10 @@ + + #include "base.h" + ++#ifdef CONFIG_XEN ++#include <xen/interface/kexec.h> ++#endif ++ + struct sysdev_class cpu_sysdev_class = { + set_kset_name("cpu"), + }; +@@ -86,6 +90,18 @@ static inline void register_cpu_control( + #ifdef CONFIG_KEXEC + #include <linux/kexec.h> + ++#ifdef CONFIG_XEN ++static unsigned long get_crash_notes(int cpu) ++{ ++ unsigned long crash_note; ++ ++ if (HYPERVISOR_kexec(KEXEC_CMD_kexec_crash_note, cpu, &crash_note) < 0) ++ return 0UL; ++ return crash_note; ++} ++#endif ++ ++/* XXX: This only finds dom0's CPU's */ + static ssize_t show_crash_notes(struct sys_device *dev, char *buf) + { + struct cpu *cpu = container_of(dev, struct cpu, sysdev); +@@ -101,7 +117,11 @@ static ssize_t show_crash_notes(struct s + * boot up and this data does not change there after. 
Hence this + * operation should be safe. No locking required. + */ ++#ifndef CONFIG_XEN + addr = __pa(per_cpu_ptr(crash_notes, cpunum)); ++#else ++ addr = (unsigned long long)get_crash_notes(cpunum); ++#endif + rc = sprintf(buf, "%Lx\n", addr); + return rc; + } +--- x/include/linux/kexec.h ++++ x/include/linux/kexec.h +@@ -91,6 +91,11 @@ struct kimage { + extern NORET_TYPE void machine_kexec(struct kimage *image) ATTRIB_NORET; + extern int machine_kexec_prepare(struct kimage *image); + extern void machine_kexec_cleanup(struct kimage *image); ++#ifdef CONFIG_XEN ++extern int xen_machine_kexec_load(struct kimage *image); ++extern void xen_machine_kexec_unload(struct kimage *image); ++extern NORET_TYPE void xen_machine_kexec(struct kimage *image) ATTRIB_NORET; ++#endif + extern asmlinkage long sys_kexec_load(unsigned long entry, + unsigned long nr_segments, + struct kexec_segment __user *segments, +--- x/kernel/kexec.c ++++ x/kernel/kexec.c +@@ -26,6 +26,9 @@ + #include <asm/io.h> + #include <asm/system.h> + #include <asm/semaphore.h> ++#ifdef CONFIG_XEN ++#include <asm/kexec-xen.h> ++#endif + + /* Per cpu memory for storing cpu states in case of system crash. */ + note_buf_t* crash_notes; +@@ -403,7 +406,7 @@ static struct page *kimage_alloc_normal_ + pages = kimage_alloc_pages(GFP_KERNEL, order); + if (!pages) + break; +- pfn = page_to_pfn(pages); ++ pfn = kexec_page_to_pfn(pages); + epfn = pfn + count; + addr = pfn << PAGE_SHIFT; + eaddr = epfn << PAGE_SHIFT; +@@ -437,6 +440,7 @@ static struct page *kimage_alloc_normal_ + return pages; + } + ++#ifndef CONFIG_XEN + static struct page *kimage_alloc_crash_control_pages(struct kimage *image, + unsigned int order) + { +@@ -490,7 +494,7 @@ static struct page *kimage_alloc_crash_c + } + /* If I don't overlap any segments I have found my hole! 
*/ + if (i == image->nr_segments) { +- pages = pfn_to_page(hole_start >> PAGE_SHIFT); ++ pages = kexec_pfn_to_page(hole_start >> PAGE_SHIFT); + break; + } + } +@@ -517,6 +521,13 @@ struct page *kimage_alloc_control_pages( + + return pages; + } ++#else /* !CONFIG_XEN */ ++struct page *kimage_alloc_control_pages(struct kimage *image, ++ unsigned int order) ++{ ++ return kimage_alloc_normal_control_pages(image, order); ++} ++#endif + + static int kimage_add_entry(struct kimage *image, kimage_entry_t entry) + { +@@ -532,7 +543,7 @@ static int kimage_add_entry(struct kimag + return -ENOMEM; + + ind_page = page_address(page); +- *image->entry = virt_to_phys(ind_page) | IND_INDIRECTION; ++ *image->entry = kexec_virt_to_phys(ind_page) | IND_INDIRECTION; + image->entry = ind_page; + image->last_entry = ind_page + + ((PAGE_SIZE/sizeof(kimage_entry_t)) - 1); +@@ -593,13 +604,13 @@ static int kimage_terminate(struct kimag + #define for_each_kimage_entry(image, ptr, entry) \ + for (ptr = &image->head; (entry = *ptr) && !(entry & IND_DONE); \ + ptr = (entry & IND_INDIRECTION)? \ +- phys_to_virt((entry & PAGE_MASK)): ptr +1) ++ kexec_phys_to_virt((entry & PAGE_MASK)): ptr +1) + + static void kimage_free_entry(kimage_entry_t entry) + { + struct page *page; + +- page = pfn_to_page(entry >> PAGE_SHIFT); ++ page = kexec_pfn_to_page(entry >> PAGE_SHIFT); + kimage_free_pages(page); + } + +@@ -611,6 +622,10 @@ static void kimage_free(struct kimage *i + if (!image) + return; + ++#ifdef CONFIG_XEN ++ xen_machine_kexec_unload(image); ++#endif ++ + kimage_free_extra_pages(image); + for_each_kimage_entry(image, ptr, entry) { + if (entry & IND_INDIRECTION) { +@@ -686,7 +701,7 @@ static struct page *kimage_alloc_page(st + * have a match. 
+ */ + list_for_each_entry(page, &image->dest_pages, lru) { +- addr = page_to_pfn(page) << PAGE_SHIFT; ++ addr = kexec_page_to_pfn(page) << PAGE_SHIFT; + if (addr == destination) { + list_del(&page->lru); + return page; +@@ -701,12 +716,12 @@ static struct page *kimage_alloc_page(st + if (!page) + return NULL; + /* If the page cannot be used file it away */ +- if (page_to_pfn(page) > ++ if (kexec_page_to_pfn(page) > + (KEXEC_SOURCE_MEMORY_LIMIT >> PAGE_SHIFT)) { + list_add(&page->lru, &image->unuseable_pages); + continue; + } +- addr = page_to_pfn(page) << PAGE_SHIFT; ++ addr = kexec_page_to_pfn(page) << PAGE_SHIFT; + + /* If it is the destination page we want use it */ + if (addr == destination) +@@ -729,7 +744,7 @@ static struct page *kimage_alloc_page(st + struct page *old_page; + + old_addr = *old & PAGE_MASK; +- old_page = pfn_to_page(old_addr >> PAGE_SHIFT); ++ old_page = kexec_pfn_to_page(old_addr >> PAGE_SHIFT); + copy_highpage(page, old_page); + *old = addr | (*old & ~PAGE_MASK); + +@@ -779,7 +794,7 @@ static int kimage_load_normal_segment(st + result = -ENOMEM; + goto out; + } +- result = kimage_add_page(image, page_to_pfn(page) ++ result = kimage_add_page(image, kexec_page_to_pfn(page) + << PAGE_SHIFT); + if (result < 0) + goto out; +@@ -811,6 +826,7 @@ out: + return result; + } + ++#ifndef CONFIG_XEN + static int kimage_load_crash_segment(struct kimage *image, + struct kexec_segment *segment) + { +@@ -833,7 +849,7 @@ static int kimage_load_crash_segment(str + char *ptr; + size_t uchunk, mchunk; + +- page = pfn_to_page(maddr >> PAGE_SHIFT); ++ page = kexec_pfn_to_page(maddr >> PAGE_SHIFT); + if (page == 0) { + result = -ENOMEM; + goto out; +@@ -881,6 +897,13 @@ static int kimage_load_segment(struct ki + + return result; + } ++#else /* CONFIG_XEN */ ++static int kimage_load_segment(struct kimage *image, ++ struct kexec_segment *segment) ++{ ++ return kimage_load_normal_segment(image, segment); ++} ++#endif + + /* + * Exec Kernel system call: for obvious 
reasons only root may call it. +@@ -991,6 +1014,11 @@ asmlinkage long sys_kexec_load(unsigned + if (result) + goto out; + } ++#ifdef CONFIG_XEN ++ result = xen_machine_kexec_load(image); ++ if (result) ++ goto out; ++#endif + /* Install the new kernel, and Uninstall the old */ + image = xchg(dest_image, image); + +@@ -1045,7 +1073,6 @@ void crash_kexec(struct pt_regs *regs) + struct kimage *image; + int locked; + +- + /* Take the kexec_lock here to prevent sys_kexec_load + * running on one cpu from replacing the crash kernel + * we are using after a panic on a different cpu. +@@ -1061,12 +1088,17 @@ void crash_kexec(struct pt_regs *regs) + struct pt_regs fixed_regs; + crash_setup_regs(&fixed_regs, regs); + machine_crash_shutdown(&fixed_regs); ++#ifdef CONFIG_XEN ++ xen_machine_kexec(image); ++#else + machine_kexec(image); ++#endif + } + xchg(&kexec_lock, 0); + } + } + ++#ifndef CONFIG_XEN + static int __init crash_notes_memory_init(void) + { + /* Allocate memory for saving cpu registers. 
*/ +@@ -1079,3 +1111,4 @@ static int __init crash_notes_memory_ini + return 0; + } + module_init(crash_notes_memory_init) ++#endif +--- x/kernel/sys.c ++++ x/kernel/sys.c +@@ -435,8 +435,12 @@ void kernel_kexec(void) + kernel_restart_prepare(NULL); + printk(KERN_EMERG "Starting new kernel\n"); + machine_shutdown(); ++#ifdef CONFIG_XEN ++ xen_machine_kexec(image); ++#else + machine_kexec(image); + #endif ++#endif + } + EXPORT_SYMBOL_GPL(kernel_kexec); + --- 0001/patches/linux-2.6.16.29/series +++ work/patches/linux-2.6.16.29/series 2006-10-16 12:04:03.000000000 +0900 @@ -1,3 +1,4 @@ +kexec-generic.patch blktap-aio-16_03_06.patch device_bind.patch fix-hz-suspend.patch --- 0001/xen/arch/ia64/xen/Makefile +++ work/xen/arch/ia64/xen/Makefile 2006-10-16 12:04:03.000000000 +0900 @@ -25,5 +25,7 @@ obj-y += xensetup.o obj-y += xentime.o obj-y += flushd.o obj-y += privop_stat.o +obj-y += machine_kexec.o +obj-y += crash.o obj-$(crash_debug) += gdbstub.o --- /dev/null +++ work/xen/arch/ia64/xen/crash.c 2006-10-16 12:04:04.000000000 +0900 @@ -0,0 +1,26 @@ +/********************************************************************** + * arch/ia64/xen/crash.c + * + * Created By: Horms + * + */ + +#include <xen/lib.h> /* for printk() used in stub */ +#include <xen/types.h> +#include <public/kexec.h> + +void machine_crash_shutdown(struct cpu_user_regs *regs) +{ + printk("STUB: " __FILE__ ": %s: not implemented\n", __FUNCTION__); +} + +/* + * Local variables: + * mode: C + * c-set-style: "BSD" + * c-basic-offset: 4 + * tab-width: 4 + * indent-tabs-mode: nil + * End: + */ + --- /dev/null +++ work/xen/arch/ia64/xen/machine_kexec.c 2006-10-16 12:04:04.000000000 +0900 @@ -0,0 +1,41 @@ +/********************************************************************** + * arch/ia64/xen/machine_kexec.c + * + * Created By: Horms + * + */ + +#include <xen/lib.h> /* for printk() used in stubs */ +#include <xen/types.h> +#include <public/kexec.h> + +int machine_kexec_load(int type, xen_kexec_image_t *image) 
+{ + printk("STUB: " __FILE__ ": %s: not implemented\n", __FUNCTION__); + return -1; +} + +void machine_kexec_unload(int type, xen_kexec_image_t *image) +{ + printk("STUB: " __FILE__ ": %s: not implemented\n", __FUNCTION__); +} + +void machine_kexec(xen_kexec_image_t *image) +{ + printk("STUB: " __FILE__ ": %s: not implemented\n", __FUNCTION__); +} + +void machine_shutdown(xen_kexec_image_t *image) +{ + printk("STUB: " __FILE__ ": %s: not implemented\n", __FUNCTION__); +} + +/* + * Local variables: + * mode: C + * c-set-style: "BSD" + * c-basic-offset: 4 + * tab-width: 4 + * indent-tabs-mode: nil + * End: + */ --- 0001/xen/arch/powerpc/Makefile +++ work/xen/arch/powerpc/Makefile 2006-10-16 12:04:03.000000000 +0900 @@ -40,6 +40,8 @@ obj-y += smp-tbsync.o obj-y += sysctl.o obj-y += time.o obj-y += usercopy.o +obj-y += machine_kexec.o +obj-y += crash.o obj-$(debug) += 0opt.o obj-$(crash_debug) += gdbstub.o --- /dev/null +++ work/xen/arch/powerpc/crash.c 2006-10-16 12:04:04.000000000 +0900 @@ -0,0 +1,26 @@ +/********************************************************************** + * arch/powerpc/crash.c + * + * Created By: Horms + * + */ + +#include <xen/lib.h> /* for printk() used in stub */ +#include <xen/types.h> +#include <public/kexec.h> + +void machine_crash_shutdown(struct cpu_user_regs *regs) +{ + printk("STUB: " __FILE__ ": %s: not implemented\n", __FUNCTION__); +} + +/* + * Local variables: + * mode: C + * c-set-style: "BSD" + * c-basic-offset: 4 + * tab-width: 4 + * indent-tabs-mode: nil + * End: + */ + --- /dev/null +++ work/xen/arch/powerpc/machine_kexec.c 2006-10-16 12:04:04.000000000 +0900 @@ -0,0 +1,41 @@ +/********************************************************************** + * arch/powerpc/machine_kexec.c + * + * Created By: Horms + * + */ + +#include <xen/lib.h> /* for printk() used in stubs */ +#include <xen/types.h> +#include <public/kexec.h> + +int machine_kexec_load(int type, xen_kexec_image_t *image) +{ + printk("STUB: " __FILE__ ": %s: not 
implemented\n", __FUNCTION__); + return -1; +} + +void machine_kexec_unload(int type, xen_kexec_image_t *image) +{ + printk("STUB: " __FILE__ ": %s: not implemented\n", __FUNCTION__); +} + +void machine_kexec(xen_kexec_image_t *image) +{ + printk("STUB: " __FILE__ ": %s: not implemented\n", __FUNCTION__); +} + +void machine_shutdown(xen_kexec_image_t *image) +{ + printk("STUB: " __FILE__ ": %s: not implemented\n", __FUNCTION__); +} + +/* + * Local variables: + * mode: C + * c-set-style: "BSD" + * c-basic-offset: 4 + * tab-width: 4 + * indent-tabs-mode: nil + * End: + */ --- 0001/xen/arch/x86/Makefile +++ work/xen/arch/x86/Makefile 2006-10-16 12:04:03.000000000 +0900 @@ -41,6 +41,8 @@ obj-y += trampoline.o obj-y += traps.o obj-y += usercopy.o obj-y += x86_emulate.o +obj-y += machine_kexec.o +obj-y += crash.o obj-$(crash_debug) += gdbstub.o --- /dev/null +++ work/xen/arch/x86/crash.c 2006-10-16 12:04:04.000000000 +0900 @@ -0,0 +1,26 @@ +/****************************************************************************** + * arch/x86/crash.c + * + * Created By: Horms + * + * Should be based heavily on arch/i386/kernel/crash.c from Linux 2.6.16 + */ + +#include <xen/lib.h> /* for printk() used in stub */ +#include <xen/types.h> +#include <public/xen.h> + +void machine_crash_shutdown(struct cpu_user_regs *regs) +{ + printk("STUB: " __FILE__ ": %s: not implemented\n", __FUNCTION__); +} + +/* + * Local variables: + * mode: C + * c-set-style: "BSD" + * c-basic-offset: 4 + * tab-width: 4 + * indent-tabs-mode: nil + * End: + */ --- /dev/null +++ work/xen/arch/x86/machine_kexec.c 2006-10-16 12:04:04.000000000 +0900 @@ -0,0 +1,41 @@ +/****************************************************************************** + * arch/x86/machine_kexec.c + * + * Created By: Horms + * + */ + +#include <xen/lib.h> /* for printk() used in stubs */ +#include <xen/types.h> +#include <public/kexec.h> + +int machine_kexec_load(int type, xen_kexec_image_t *image) +{ + printk("STUB: " __FILE__ ": %s: not 
implemented\n", __FUNCTION__); + return -1; +} + +void machine_kexec_unload(int type, xen_kexec_image_t *image) +{ + printk("STUB: " __FILE__ ": %s: not implemented\n", __FUNCTION__); +} + +void machine_kexec(xen_kexec_image_t *image) +{ + printk("STUB: " __FILE__ ": %s: not implemented\n", __FUNCTION__); +} + +void machine_shutdown(xen_kexec_image_t *image) +{ + printk("STUB: " __FILE__ ": %s: not implemented\n", __FUNCTION__); +} + +/* + * Local variables: + * mode: C + * c-set-style: "BSD" + * c-basic-offset: 4 + * tab-width: 4 + * indent-tabs-mode: nil + * End: + */ --- 0001/xen/common/Makefile +++ work/xen/common/Makefile 2006-10-16 12:04:03.000000000 +0900 @@ -7,6 +7,7 @@ obj-y += event_channel.o obj-y += grant_table.o obj-y += kernel.o obj-y += keyhandler.o +obj-y += kexec.o obj-y += lib.o obj-y += memory.o obj-y += multicall.o --- /dev/null +++ work/xen/common/kexec.c 2006-10-16 12:23:11.000000000 +0900 @@ -0,0 +1,260 @@ +/****************************************************************************** + * common/kexec.c - Achitecture independent kexec code for Xen + * + * Created By: Horms <horms@verge.net.au> + * + * Based in part on Linux 2.6.16's kernel/kexec.c + */ + +#include <asm/kexec.h> +#include <xen/lib.h> +#include <xen/ctype.h> +#include <xen/errno.h> +#include <xen/guest_access.h> +#include <xen/sched.h> +#include <xen/types.h> +#include <xen/kexec.h> +#include <xen/keyhandler.h> +#include <public/kexec.h> + +static char opt_crashkernel[32] = ""; +string_param("crashkernel", opt_crashkernel); + +DEFINE_PER_CPU (note_buf_t, crash_notes); + +static xen_kexec_image_t kexec_image; +static int kexec_image_set = 0; +static xen_kexec_image_t kexec_crash_image; +static int kexec_crash_image_set = 0; +static int kexec_crash_lock = 0; + +/* Must call with kexec_crash_lock held */ +void __crash_kexec(struct cpu_user_regs *regs) +{ + struct cpu_user_regs fixed_regs; + + if (!kexec_crash_image_set) + return; + crash_setup_regs(&fixed_regs, regs); + 
machine_crash_shutdown(&fixed_regs); + machine_kexec(&kexec_crash_image); /* Does not return */ +} + +void crash_kexec(struct cpu_user_regs *regs) +{ + int locked; + + locked = xchg(&kexec_crash_lock, 1); + if (locked) + return; + __crash_kexec(regs); + + /* The if() here is bogus, but gcc will throws a warning that the + * computed value is unused and xen compiles with -Werror. + * This seems like a viable work around. + * This did not seem to happen with slightly older gcc. + * Observed with: + * gcc version 4.1.2 20060604 (prerelease) (Debian * 4.1.1-2) */ + if (xchg(&kexec_crash_lock, 0)) ; + + return; +} + +static void do_crashdump_trigger(unsigned char key) +{ + printk("triggering crashdump\n"); + crash_kexec(NULL); +} + +static __init int register_crashdump_trigger(void) +{ + register_keyhandler('c', do_crashdump_trigger, "trigger a crashdump"); + return 0; +} +__initcall(register_crashdump_trigger); + +static int get_crash_note(int vcpuid, XEN_GUEST_HANDLE(void) uarg) +{ + struct domain *domain = current->domain; + unsigned long crash_note; + struct vcpu *vcpu; + int locked; + + if (vcpuid < 0 || vcpuid > MAX_VIRT_CPUS) + return -EINVAL; + + if ( ! (vcpu = domain->vcpu[vcpuid]) ) + return -EINVAL; + + locked = xchg(&kexec_crash_lock, 1); + if (locked) + { + printk("do_kexec_op: (CMD_kexec_crash_note): dump is locked\n"); + return -EFAULT; + } + crash_note = __pa((unsigned long)per_cpu(crash_notes, vcpu->processor)); + + /* The if() here is bogus, but gcc will throws a warning that the + * computed value is unused and xen compiles with -Werror. + * This seems like a viable work around. + * This did not seem to happen with slightly older gcc. 
+ * Observed with: + * gcc version 4.1.2 20060604 (prerelease) (Debian * 4.1.1-2) */ + if (xchg(&kexec_crash_lock, 0)) ; + + if ( unlikely(copy_to_guest(uarg, &crash_note, 1) != 0) ) + { + printk("do_kexec_op: (CMD_kexec_crash_note): copy_to_guest failed\n"); + return -EFAULT; + } + + return 0; +} + +void machine_kexec_reserved(xen_kexec_reserve_t *reservation) +{ + unsigned long val[2]; + char *str = opt_crashkernel; + int k = 0; + + memset(reservation, 0, sizeof(*reservation)); + + while (k < ARRAY_SIZE(val)) { + if (*str == '\0') { + break; + } + val[k] = simple_strtoul(str, &str, 0); + switch (toupper(*str)) { + case 'G': val[k] <<= 10; + case 'M': val[k] <<= 10; + case 'K': val[k] <<= 10; + str++; + } + if (*str == '@') { + str++; + } + k++; + } + + if (k == ARRAY_SIZE(val)) { + reservation->size = val[0]; + reservation->start = val[1]; + } +} + +static int get_reserve(XEN_GUEST_HANDLE(void) uarg) +{ + xen_kexec_reserve_t reservation; + + machine_kexec_reserved(&reservation); + if ( unlikely(copy_to_guest(uarg, &reservation, 1) != 0) ) + { + printk("do_kexec_op (CMD_kexec_reserve): copy_to_guest failed\n"); + return -EFAULT; + } + + return 0; +} + +static int __do_kexec(unsigned long type, XEN_GUEST_HANDLE(void) uarg, + xen_kexec_image_t *image) +{ + cpu_user_regs_t regs; + + if (type == KEXEC_TYPE_DEFAULT) + machine_shutdown(image); /* Does not return */ + else + { + if ( unlikely(copy_from_guest(&regs, uarg, 1) != 0) ) + { + printk("do_kexec_op (CMD_kexec): copy_from_guest failed\n"); + return -EFAULT; + } + __crash_kexec(&regs); /* Does not return */ + } + + return -EINVAL; +} + +long do_kexec_op(unsigned long op, int arg1, XEN_GUEST_HANDLE(void) uarg) +{ + xen_kexec_image_t *image; + int locked; + int *image_set; + int status = -EINVAL; + + if ( !IS_PRIV(current->domain) ) + return -EPERM; + + switch (op) + { + case KEXEC_CMD_kexec_crash_note: + return get_crash_note(arg1, uarg); + case KEXEC_CMD_kexec_reserve: + return get_reserve(uarg); + } + + /* 
For all other ops, arg1 is the type of kexec, that is + * KEXEC_TYPE_DEFAULT or KEXEC_TYPE_CRASH */ + if (arg1 == KEXEC_TYPE_CRASH) + { + image = &kexec_crash_image; + image_set = &kexec_crash_image_set; + locked = xchg(&kexec_crash_lock, 1); + if (locked) + { + printk("do_kexec_op: dump is locked\n"); + return -EFAULT; + } + } + else + { + image = &kexec_image; + image_set = &kexec_image_set; + } + + switch(op) { + case KEXEC_CMD_kexec: + BUG_ON(!*image_set); + status = __do_kexec(arg1, uarg, image); + break; + case KEXEC_CMD_kexec_load: + BUG_ON(*image_set); + if ( unlikely(copy_from_guest(image, uarg, 1) != 0) ) + { + printk("do_kexec_op (CMD_kexec_load): copy_from_guest failed\n"); + status = -EFAULT; + break; + } + *image_set = 1; + status = machine_kexec_load(arg1, image); + break; + case KEXEC_CMD_kexec_unload: + BUG_ON(!*image_set); + *image_set = 0; + machine_kexec_unload(arg1, image); + status = 0; + break; + } + + if (arg1 == KEXEC_TYPE_CRASH) + /* The if() here is bogus, but gcc will throws a warning that the + * computed value is unused and xen compiles with -Werror. + * This seems like a viable work around. + * This did not seem to happen with slightly older gcc. 
+ * Observed with: + * gcc version 4.1.2 20060604 (prerelease) (Debian * 4.1.1-2) */ + if (xchg(&kexec_crash_lock, 0)) ; + + return status; +} + +/* + * Local variables: + * mode: C + * c-set-style: "BSD" + * c-basic-offset: 4 + * tab-width: 4 + * indent-tabs-mode: nil + * End: + */ --- 0001/xen/common/page_alloc.c +++ work/xen/common/page_alloc.c 2006-10-16 12:04:03.000000000 +0900 @@ -213,24 +213,35 @@ void init_boot_pages(paddr_t ps, paddr_t } } +unsigned long alloc_boot_pages_at(unsigned long nr_pfns, unsigned long pfn_at) +{ + unsigned long i; + + for ( i = 0; i < nr_pfns; i++ ) + if ( allocated_in_map(pfn_at + i) ) + break; + + if ( i == nr_pfns ) + { + map_alloc(pfn_at, nr_pfns); + return pfn_at; + } + + return 0; +} + unsigned long alloc_boot_pages(unsigned long nr_pfns, unsigned long pfn_align) { - unsigned long pg, i; + unsigned long pg, i = 0; for ( pg = 0; (pg + nr_pfns) < max_page; pg += pfn_align ) { - for ( i = 0; i < nr_pfns; i++ ) - if ( allocated_in_map(pg + i) ) - break; - - if ( i == nr_pfns ) - { - map_alloc(pg, nr_pfns); - return pg; - } + i = alloc_boot_pages_at(nr_pfns, pg); + if (i != 0) + break; } - return 0; + return i; } --- 0001/xen/drivers/char/console.c +++ work/xen/drivers/char/console.c 2006-10-16 12:04:03.000000000 +0900 @@ -613,6 +613,7 @@ void panic(const char *fmt, ...) char buf[128]; unsigned long flags; static DEFINE_SPINLOCK(lock); + extern void crash_kexec(struct cpu_user_regs *regs); debugtrace_dump(); @@ -635,6 +636,8 @@ void panic(const char *fmt, ...) 
debugger_trap_immediate(); + crash_kexec(NULL); + if ( opt_noreboot ) { machine_halt(); --- /dev/null +++ work/xen/include/asm-ia64/kexec.h 2006-10-16 12:04:04.000000000 +0900 @@ -0,0 +1,32 @@ +/****************************************************************************** + * include/asm-ia64/kexec.h + * + * Created By: Horms + * + */ + +#ifndef __IA64_KEXEC_H__ +#define __IA64_KEXEC_H__ + +#include <xen/lib.h> /* for printk() used in stub */ +#include <xen/types.h> +#include <public/xen.h> + +static void crash_setup_regs(struct cpu_user_regs *newregs, + struct cpu_user_regs *oldregs) +{ + printk("STUB: " __FILE__ ": %s: not implemented\n", __FUNCTION__); +} + +#endif /* __IA64_KEXEC_H__ */ + +/* + * Local variables: + * mode: C + * c-set-style: "BSD" + * c-basic-offset: 4 + * tab-width: 4 + * indent-tabs-mode: nil + * End: + */ + --- /dev/null +++ work/xen/include/asm-x86/kexec.h 2006-10-16 12:04:04.000000000 +0900 @@ -0,0 +1,31 @@ +/****************************************************************************** + * include/asm-x86/kexec.h + * + * Created By: Horms + * + */ + +#ifndef __X86_KEXEC_H__ +#define __X86_KEXEC_H__ + +#include <xen/lib.h> /* for printk() used in stub */ +#include <xen/types.h> +#include <public/xen.h> + +static void crash_setup_regs(struct cpu_user_regs *newregs, + struct cpu_user_regs *oldregs) +{ + printk("STUB: " __FILE__ ": %s: not implemented\n", __FUNCTION__); +} + +#endif /* __X86_KEXEC_H__ */ + +/* + * Local variables: + * mode: C + * c-set-style: "BSD" + * c-basic-offset: 4 + * tab-width: 4 + * indent-tabs-mode: nil + * End: + */ --- /dev/null +++ work/xen/include/public/kexec.h 2006-10-16 12:04:04.000000000 +0900 @@ -0,0 +1,84 @@ +/****************************************************************************** + * kexec.h - Public portion + * + * Created By: Horms <horms@verge.net.au> + * + * Types based on those in ./vcpu.h on request from Keir Frasier + */ + +#ifndef _XEN_PUBLIC_KEXEC_H +#define _XEN_PUBLIC_KEXEC_H + +#include 
"xen.h" + +/* + * Prototype for this hypercall is: + * int kexec_op(int cmd, int type, void *extra_args) + * @cmd == KEXEC_CMD_... + * KEXEC operation to perform + * @arg1 == Operation-specific unsigned long argument + * This could be in extra_args, but by putting it here + * copy_from_user can be avoided, inparticular in + * KEXEC_CMD_kexec during a crash dump, which is a failry + * critical section of code.If this turns out not to be + * important then it can be collapsed into extra_args. + * @extra_args == Operation-specific extra arguments (NULL if none). + */ + +#define KEXEC_TYPE_DEFAULT 0 +#define KEXEC_TYPE_CRASH 1 + +/* + * Perform kexec having previously loaded a kexec or kdump kernel + * as appropriate. + * @arg1 == KEXEC_TYPE_DEFAULT or KEXEC_TYPE_CRASH + * @extra_arg == pointer to cpu_user_regs_t structure. + */ +#define KEXEC_CMD_kexec 0 + +/* + * Load kernel image in preparation for kexec or kdump. + * @arg1 == KEXEC_TYPE_DEFAULT or KEXEC_TYPE_CRASH + * @extra_arg == pointer to xen_kexec_image_t structure. + */ +#define KEXEC_CMD_kexec_load 1 +typedef struct xen_kexec_image { + unsigned long indirection_page; + unsigned long start_address; +} xen_kexec_image_t; + +/* + * Clean up image loaded by KEXEC_CMD_kexec_load + * @arg1 == KEXEC_TYPE_DEFAULT or KEXEC_TYPE_CRASH + */ +#define KEXEC_CMD_kexec_unload 2 + +/* + * Find the base pointer and size of the area that xen has + * reserved for use by the crash kernel. + * @extra_arg == pointer to xen_kexec_reserve_t structure. + */ +#define KEXEC_CMD_kexec_reserve 3 +typedef struct xen_kexec_reserve { + unsigned long size; + unsigned long start; +} xen_kexec_reserve_t; + +/* + * Find the base pointer of the area that xen has + * reserved for use by a crash note for a given VCPU + * @extra_arg == pointer to unsigned long. 
+ */ +#define KEXEC_CMD_kexec_crash_note 4 + +#endif /* _XEN_PUBLIC_KEXEC_H */ + +/* + * Local variables: + * mode: C + * c-set-style: "BSD" + * c-basic-offset: 4 + * tab-width: 4 + * indent-tabs-mode: nil + * End: + */ --- /dev/null +++ work/xen/include/xen/elfcore.h 2006-10-16 12:04:04.000000000 +0900 @@ -0,0 +1,73 @@ +/****************************************************************************** + * include/xen/elfcore.h + * + * Created By: Horms + * + * Based heavily on include/linux/elfcore.h from Linux 2.6.16 + * Naming scheme based on include/xen/elf.h (not include/linux/elfcore.h) + * + */ + +#ifndef __ELFCOREC_H__ +#define __ELFCOREC_H__ + +#include <xen/types.h> +#include <xen/elf.h> +#include <public/xen.h> + +#define NT_PRSTATUS 1 + +typedef struct +{ + int signo; /* signal number */ + int code; /* extra code */ + int errno; /* errno */ +} ELF_Signifo; + +/* These seem to be the same length on all architectures on Linux */ +typedef int ELF_Pid; +typedef struct { + long tv_sec; + long tv_usec; +} ELF_Timeval; +typedef unsigned long ELF_Greg; +#define ELF_NGREG (sizeof (struct cpu_user_regs) / sizeof(ELF_Greg)) +typedef ELF_Greg ELF_Gregset[ELF_NGREG]; + +/* + * Definitions to generate Intel SVR4-like core files. + * These mostly have the same names as the SVR4 types with "elf_" + * tacked on the front to prevent clashes with linux definitions, + * and the typedef forms have been avoided. This is mostly like + * the SVR4 structure, but more Linuxy, with things that Linux does + * not support and which gdb doesn't really use excluded.
+ */ +typedef struct +{ + ELF_Signifo pr_info; /* Info associated with signal */ + short pr_cursig; /* Current signal */ + unsigned long pr_sigpend; /* Set of pending signals */ + unsigned long pr_sighold; /* Set of held signals */ + ELF_Pid pr_pid; + ELF_Pid pr_ppid; + ELF_Pid pr_pgrp; + ELF_Pid pr_sid; + ELF_Timeval pr_utime; /* User time */ + ELF_Timeval pr_stime; /* System time */ + ELF_Timeval pr_cutime; /* Cumulative user time */ + ELF_Timeval pr_cstime; /* Cumulative system time */ + ELF_Gregset pr_reg; /* GP registers */ + int pr_fpvalid; /* True if math co-processor being used. */ +} ELF_Prstatus; + +#endif /* __ELFCOREC_H__ */ + +/* + * Local variables: + * mode: C + * c-set-style: "BSD" + * c-basic-offset: 4 + * tab-width: 4 + * indent-tabs-mode: nil + * End: + */ --- 0001/xen/include/xen/hypercall.h +++ work/xen/include/xen/hypercall.h 2006-10-16 12:04:03.000000000 +0900 @@ -102,4 +102,10 @@ do_hvm_op( unsigned long op, XEN_GUEST_HANDLE(void) arg); +extern long +do_kexec_op( + unsigned long op, + int arg1, + XEN_GUEST_HANDLE(void) arg); + #endif /* __XEN_HYPERCALL_H__ */ --- /dev/null +++ work/xen/include/xen/kexec.h 2006-10-16 12:21:58.000000000 +0900 @@ -0,0 +1,30 @@ +/****************************************************************************** + * include/xen/kexec.h - Internal archtecture independant portion + * + * Created By: Horms <horms@verge.net.au> + * + */ + +#include <public/kexec.h> + +#define MAX_NOTE_BYTES 1024 + +typedef u32 note_buf_t[MAX_NOTE_BYTES/4]; +DECLARE_PER_CPU (note_buf_t, crash_notes); + +int machine_kexec_load(int type, xen_kexec_image_t *image); +void machine_kexec_unload(int type, xen_kexec_image_t *image); +void machine_kexec_reserved(xen_kexec_reserve_t *reservation); +void machine_kexec(xen_kexec_image_t *image); +void machine_shutdown(xen_kexec_image_t *image); +void machine_crash_shutdown(cpu_user_regs_t *regs); + +/* + * Local variables: + * mode: C + * c-set-style: "BSD" + * c-basic-offset: 4 + * tab-width: 4 + * 
indent-tabs-mode: nil + * End: + */ --- 0001/xen/include/xen/mm.h +++ work/xen/include/xen/mm.h 2006-10-16 12:04:03.000000000 +0900 @@ -40,6 +40,7 @@ struct page_info; paddr_t init_boot_allocator(paddr_t bitmap_start); void init_boot_pages(paddr_t ps, paddr_t pe); unsigned long alloc_boot_pages(unsigned long nr_pfns, unsigned long pfn_align); +unsigned long alloc_boot_pages_at(unsigned long nr_pfns, unsigned long pfn_at); void end_boot_allocator(void); /* Generic allocator. These functions are *not* interrupt-safe. */
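For readers following the include/public/kexec.h hunk in this patch, here is a minimal stand-alone sketch of how a KEXEC_CMD_kexec_reserve query could be answered. The command number and xen_kexec_reserve_t are copied from the patch; mock_kexec_op() is a hypothetical stand-in written only for illustration (it is not the do_kexec_op() handler from this series), and the 64 MB at 32 MB window is simply the "crashkernel=64M@32M" example from the release notes.

```c
#include <stddef.h>

/* Copied from the include/public/kexec.h hunk in this patch. */
#define KEXEC_TYPE_DEFAULT      0
#define KEXEC_CMD_kexec_reserve 3

typedef struct xen_kexec_reserve {
    unsigned long size;
    unsigned long start;
} xen_kexec_reserve_t;

/*
 * Hypothetical mock of the hypercall entry point, following the
 * documented prototype: int kexec_op(int cmd, int type, void *extra_args).
 * It only understands KEXEC_CMD_kexec_reserve and reports a fixed
 * window, assumed here to be "crashkernel=64M@32M".
 */
static int mock_kexec_op(int cmd, int type, void *extra_args)
{
    (void)type; /* unused by the reserve query */

    switch (cmd) {
    case KEXEC_CMD_kexec_reserve: {
        xen_kexec_reserve_t *reservation = extra_args;

        if (reservation == NULL)
            return -1;
        reservation->size  = 64UL << 20; /* 64 MB window */
        reservation->start = 32UL << 20; /* placed at 32 MB */
        return 0;
    }
    default:
        return -1; /* other commands are not modelled in this sketch */
    }
}
```

A caller would pass a pointer to an empty xen_kexec_reserve_t as extra_args and read back size and start, which is the information kexec-tools needs to place the crash kernel.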
Magnus Damm
2006-Oct-16 08:33 UTC
[Xen-devel] [PATCH 02/04] Kexec / Kdump: Code shared between x86_32 and x86_64
[PATCH 02/04] Kexec / Kdump: Code shared between x86_32 and x86_64 This patch contains Kexec / Kdump code shared between x86_32 and x86_64. Signed-Off-By: Magnus Damm <magnus@valinux.co.jp> --- Applies on top of xen-unstable-11760. patches/linux-2.6.16.29/series | 2 patches/linux-2.6.16.29/git-2a..f7.patch | 62 +++ patches/linux-2.6.16.29/git-2e..11.patch | 93 +++++ xen/arch/x86/x86_32/machine_kexec.c | 26 + xen/arch/x86/x86_64/machine_kexec.c | 27 + xen/include/asm-x86/elf.h | 27 + xen/include/asm-x86/x86_32/elf.h | 28 + xen/include/asm-x86/x86_32/kexec.h | 48 ++ xen/include/asm-x86/x86_64/elf.h | 28 + xen/include/asm-x86/x86_64/kexec.h | 33 + xen/arch/x86/crash.c | 178 +++++++++- xen/arch/x86/machine_kexec.c | 72 +++- xen/arch/x86/setup.c | 73 +++- xen/arch/x86/traps.c | 3 xen/arch/x86/x86_32/Makefile | 1 xen/arch/x86/x86_64/Makefile | 1 xen/include/asm-x86/fixmap.h | 3 xen/include/asm-x86/hypercall.h | 5 xen/include/asm-x86/kexec.h | 13 xen/include/public/kexec.h | 7 xen/include/xen/elfcore.h | 3 21 files changed, 706 insertions(+), 27 deletions(-) --- /dev/null +++ work/patches/linux-2.6.16.29/git-2a8a3d5b65e86ec1dfef7d268c64a909eab94af7.patch 2006-10-16 12:15:10.000000000 +0900 @@ -0,0 +1,62 @@ +From: Eric W. Biederman <ebiederm@xmission.com> +Date: Sun, 30 Jul 2006 10:03:20 +0000 (-0700) +Subject: [PATCH] machine_kexec.c: Fix the description of segment handling +X-Git-Tag: v2.6.18-rc4 +X-Git-Url: http://www.kernel.org/git/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commitdiff;h=2a8a3d5b65e86ec1dfef7d268c64a909eab94af7 + +[PATCH] machine_kexec.c: Fix the description of segment handling + +One of my original comments in machine_kexec was unclear +and this should fix it. + +Signed-off-by: Eric W. 
Biederman <ebiederm@xmission.com> +Cc: Andi Kleen <ak@muc.de> +Acked-by: Horms <horms@verge.net.au> +Signed-off-by: Andrew Morton <akpm@osdl.org> +Signed-off-by: Linus Torvalds <torvalds@osdl.org> +--- + +--- a/arch/i386/kernel/machine_kexec.c ++++ b/arch/i386/kernel/machine_kexec.c +@@ -189,14 +189,11 @@ NORET_TYPE void machine_kexec(struct kim + memcpy((void *)reboot_code_buffer, relocate_new_kernel, + relocate_new_kernel_size); + +- /* The segment registers are funny things, they are +- * automatically loaded from a table, in memory wherever you +- * set them to a specific selector, but this table is never +- * accessed again you set the segment to a different selector. +- * +- * The more common model is are caches where the behide +- * the scenes work is done, but is also dropped at arbitrary +- * times. ++ /* The segment registers are funny things, they have both a ++ * visible and an invisible part. Whenever the visible part is ++ * set to a specific selector, the invisible part is loaded ++ * with from a table in memory. At no other time is the ++ * descriptor table in memory accessed. + * + * I take advantage of this here by force loading the + * segments, before I zap the gdt with an invalid value. +--- a/arch/x86_64/kernel/machine_kexec.c ++++ b/arch/x86_64/kernel/machine_kexec.c +@@ -207,14 +207,11 @@ NORET_TYPE void machine_kexec(struct kim + __flush_tlb(); + + +- /* The segment registers are funny things, they are +- * automatically loaded from a table, in memory wherever you +- * set them to a specific selector, but this table is never +- * accessed again unless you set the segment to a different selector. +- * +- * The more common model are caches where the behide +- * the scenes work is done, but is also dropped at arbitrary +- * times. ++ /* The segment registers are funny things, they have both a ++ * visible and an invisible part. 
Whenever the visible part is ++ * set to a specific selector, the invisible part is loaded ++ * with from a table in memory. At no other time is the ++ * descriptor table in memory accessed. + * + * I take advantage of this here by force loading the + * segments, before I zap the gdt with an invalid value. --- /dev/null +++ work/patches/linux-2.6.16.29/git-2efe55a9cec8418f0e0cde3dc3787a42fddc4411.patch 2006-10-16 12:15:10.000000000 +0900 @@ -0,0 +1,93 @@ +From: Tobias Klauser <tklauser@nuerscht.ch> +Date: Mon, 26 Jun 2006 16:57:34 +0000 (+0200) +Subject: Storage class should be first +X-Git-Tag: v2.6.18-rc1 +X-Git-Url: http://www.kernel.org/git/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commitdiff;h=2efe55a9cec8418f0e0cde3dc3787a42fddc4411 + +Storage class should be first + +Storage class should be before const + +Signed-off-by: Tobias Klauser <tklauser@nuerscht.ch> +Signed-off-by: Adrian Bunk <bunk@stusta.de> +--- + +--- a/arch/i386/kernel/machine_kexec.c ++++ b/arch/i386/kernel/machine_kexec.c +@@ -133,9 +133,9 @@ typedef asmlinkage NORET_TYPE void (*rel + unsigned long start_address, + unsigned int has_pae) ATTRIB_NORET; + +-const extern unsigned char relocate_new_kernel[]; ++extern const unsigned char relocate_new_kernel[]; + extern void relocate_new_kernel_end(void); +-const extern unsigned int relocate_new_kernel_size; ++extern const unsigned int relocate_new_kernel_size; + + /* + * A architecture hook called to validate the +--- a/arch/powerpc/kernel/machine_kexec_32.c ++++ b/arch/powerpc/kernel/machine_kexec_32.c +@@ -30,8 +30,8 @@ typedef NORET_TYPE void (*relocate_new_k + */ + void default_machine_kexec(struct kimage *image) + { +- const extern unsigned char relocate_new_kernel[]; +- const extern unsigned int relocate_new_kernel_size; ++ extern const unsigned char relocate_new_kernel[]; ++ extern const unsigned int relocate_new_kernel_size; + unsigned long page_list; + unsigned long reboot_code_buffer, reboot_code_buffer_phys; + relocate_new_kernel_t 
rnk; +--- a/arch/ppc/kernel/machine_kexec.c ++++ b/arch/ppc/kernel/machine_kexec.c +@@ -25,8 +25,8 @@ typedef NORET_TYPE void (*relocate_new_k + unsigned long reboot_code_buffer, + unsigned long start_address) ATTRIB_NORET; + +-const extern unsigned char relocate_new_kernel[]; +-const extern unsigned int relocate_new_kernel_size; ++extern const unsigned char relocate_new_kernel[]; ++extern const unsigned int relocate_new_kernel_size; + + void machine_shutdown(void) + { +--- a/arch/s390/kernel/machine_kexec.c ++++ b/arch/s390/kernel/machine_kexec.c +@@ -27,8 +27,8 @@ static void kexec_halt_all_cpus(void *); + + typedef void (*relocate_kernel_t) (kimage_entry_t *, unsigned long); + +-const extern unsigned char relocate_kernel[]; +-const extern unsigned long long relocate_kernel_len; ++extern const unsigned char relocate_kernel[]; ++extern const unsigned long long relocate_kernel_len; + + int + machine_kexec_prepare(struct kimage *image) +--- a/arch/sh/kernel/machine_kexec.c ++++ b/arch/sh/kernel/machine_kexec.c +@@ -25,8 +25,8 @@ typedef NORET_TYPE void (*relocate_new_k + unsigned long start_address, + unsigned long vbr_reg) ATTRIB_NORET; + +-const extern unsigned char relocate_new_kernel[]; +-const extern unsigned int relocate_new_kernel_size; ++extern const unsigned char relocate_new_kernel[]; ++extern const unsigned int relocate_new_kernel_size; + extern void *gdb_vbr_vector; + + /* +--- a/arch/x86_64/kernel/machine_kexec.c ++++ b/arch/x86_64/kernel/machine_kexec.c +@@ -149,8 +149,8 @@ typedef NORET_TYPE void (*relocate_new_k + unsigned long start_address, + unsigned long pgtable) ATTRIB_NORET; + +-const extern unsigned char relocate_new_kernel[]; +-const extern unsigned long relocate_new_kernel_size; ++extern const unsigned char relocate_new_kernel[]; ++extern const unsigned long relocate_new_kernel_size; + + int machine_kexec_prepare(struct kimage *image) + { --- 0003/patches/linux-2.6.16.29/series +++ work/patches/linux-2.6.16.29/series 2006-10-16 
12:15:09.000000000 +0900 @@ -1,4 +1,6 @@ kexec-generic.patch +git-2efe55a9cec8418f0e0cde3dc3787a42fddc4411.patch +git-2a8a3d5b65e86ec1dfef7d268c64a909eab94af7.patch blktap-aio-16_03_06.patch device_bind.patch fix-hz-suspend.patch --- 0003/xen/arch/x86/crash.c +++ work/xen/arch/x86/crash.c 2006-10-16 12:15:09.000000000 +0900 @@ -3,16 +3,188 @@ * * Created By: Horms * - * Should be based heavily on arch/i386/kernel/crash.c from Linux 2.6.16 + * Based heavily on arch/i386/kernel/crash.c from Linux 2.6.16 */ -#include <xen/lib.h> /* for printk() used in stub */ +#include <asm/atomic.h> +#include <asm/elf.h> +#include <asm/percpu.h> +#include <asm/kexec.h> #include <xen/types.h> +#include <xen/irq.h> +#include <asm/ipi.h> +#include <asm/nmi.h> +#include <xen/string.h> +#include <xen/elf.h> +#include <xen/elfcore.h> +#include <xen/smp.h> +#include <xen/delay.h> +#include <xen/perfc.h> +#include <xen/kexec.h> #include <public/xen.h> +#include <asm/hvm/hvm.h> + +static int crashing_cpu; + +static u32 *append_elf_note(u32 *buf, char *name, unsigned type, void *data, + size_t data_len) +{ + Elf_Note note; + + note.namesz = strlen(name) + 1; + note.descsz = data_len; + note.type = type; + memcpy(buf, &note, sizeof(note)); + buf += (sizeof(note) +3)/4; + memcpy(buf, name, note.namesz); + buf += (note.namesz + 3)/4; + memcpy(buf, data, note.descsz); + buf += (note.descsz + 3)/4; + + return buf; +} + +static void final_note(u32 *buf) +{ + Elf_Note note; + + note.namesz = 0; + note.descsz = 0; + note.type = 0; + memcpy(buf, &note, sizeof(note)); +} + +static void crash_save_this_cpu(struct cpu_user_regs *regs, int cpu) +{ + ELF_Prstatus prstatus; + uint32_t *buf; + + printk("crash_save_this_cpu: %d\n", cpu); + + if ((cpu < 0) || (cpu >= NR_CPUS)) + return; + + /* Using ELF notes here is opportunistic. + * A well defined structure format with tags is needed + * ELF notes happen to provide this and there is infrastructure + * in the Linux kernel to support them.
In order to make + * crash dumps produced by xen the same, the same + * technique is used here. + */ + + /* It should be safe to use per_cpu() here instead of per_cpu_ptr() + * (which does not exist in xen) as kexecing_lock must be held in + * order to get anywhere near here */ + buf = (uint32_t *)per_cpu(crash_notes, cpu); + if (!buf) /* XXX: Can this ever occur? */ + return; + memset(&prstatus, 0, sizeof(prstatus)); + /* XXX: Xen does not have processes. For the crashing CPU on a dom0 + * crash this could be passed down from dom0, but is this + * necessary? + * prstatus.pr_pid = current->pid; */ + ELF_CORE_COPY_REGS(prstatus.pr_reg, regs); + buf = append_elf_note(buf, "CORE", NT_PRSTATUS, &prstatus, + sizeof(prstatus)); + final_note(buf); +} + +static void crash_save_self(struct cpu_user_regs *regs) +{ + crash_save_this_cpu(regs, smp_processor_id()); +} + +#ifdef CONFIG_SMP +static atomic_t waiting_for_crash_ipi; + +static int crash_nmi_callback(struct cpu_user_regs *regs, int cpu) +{ + struct cpu_user_regs fixed_regs; + + /* Don't do anything if this handler is invoked on crashing cpu. + * Otherwise, system will completely hang. Crashing cpu can get + * an NMI if system was initially booted with nmi_watchdog parameter. + */ + if (cpu == crashing_cpu) + return 1; + local_irq_disable(); + +#ifdef CONFIG_X86_32 + if (!user_mode(regs)) { + crash_fixup_ss_esp(&fixed_regs, regs); + regs = &fixed_regs; + } +#endif + crash_save_this_cpu(regs, cpu); + disable_local_APIC(); + atomic_dec(&waiting_for_crash_ipi); + hvm_disable(); + + for ( ; ; ) + __asm__ __volatile__ ( "hlt" ); + + return 1; + + /* Need to use this somewhere as Xen builds with -Werror */ + crash_setup_regs(&fixed_regs, regs); +} + +/* + * By using the NMI code instead of a vector we just sneak thru the + * word generator coming out with just what we want. AND it does + * not matter if clustered_apic_mode is set or not.
+ */ +static void smp_send_nmi_allbutself(void) +{ + cpumask_t allbutself = cpu_online_map; + cpu_clear(smp_processor_id(), allbutself); + send_IPI_mask(allbutself, APIC_DM_NMI); +} + +static void nmi_shootdown_cpus(void) +{ + unsigned long msecs; + + atomic_set(&waiting_for_crash_ipi, num_online_cpus() - 1); + /* Would it be better to replace the trap vector here? */ + set_nmi_callback(crash_nmi_callback); + /* Ensure the new callback function is set before sending + * out the NMI + */ + wmb(); + + smp_send_nmi_allbutself(); + + msecs = 1000; /* Wait at most a second for the other cpus to stop */ + while ((atomic_read(&waiting_for_crash_ipi) > 0) && msecs) { + mdelay(1); + msecs--; + } + + /* Leave the nmi callback set */ + disable_local_APIC(); +} +#else +static void nmi_shootdown_cpus(void) +{ + /* There are no cpus to shootdown */ +} +#endif void machine_crash_shutdown(struct cpu_user_regs *regs) { - printk("STUB: " __FILE__ ": %s: not implemented\n", __FUNCTION__); + printk("machine_crash_shutdown: %d\n", smp_processor_id()); + local_irq_disable(); + + crashing_cpu = smp_processor_id(); + nmi_shootdown_cpus(); + +#ifdef CONFIG_X86_IO_APIC + disable_IO_APIC(); +#endif + hvm_disable(); + + crash_save_self(regs); } /* --- 0003/xen/arch/x86/machine_kexec.c +++ work/xen/arch/x86/machine_kexec.c 2006-10-16 12:15:09.000000000 +0900 @@ -4,30 +4,84 @@ * Created By: Horms * */ - -#include <xen/lib.h> /* for printk() used in stubs */ + +#include <xen/lib.h> +#include <asm/irq.h> +#include <asm/page.h> +#include <asm/flushtlb.h> +#include <xen/smp.h> +#include <xen/nmi.h> #include <xen/types.h> +#include <xen/console.h> +#include <xen/kexec.h> #include <public/kexec.h> +#include <xen/domain_page.h> +#include <asm/fixmap.h> +#include <asm/hvm/hvm.h> int machine_kexec_load(int type, xen_kexec_image_t *image) { - printk("STUB: " __FILE__ ": %s: not implemented\n", __FUNCTION__); - return -1; + unsigned long prev_ma = 0; + int k; + + /* setup fixmap to point to our pages and 
record the virtual address + * in every odd index in page_list[]. + */ + + for (k = 0; k < KEXEC_XEN_NO_PAGES; k++) { + if ((k & 1) == 0) { /* even pages: machine address */ + prev_ma = image->page_list[k]; + } + else { /* odd pages: va for previous ma */ + set_fixmap(FIX_KEXEC_BASE_0 + (k >> 1), prev_ma); + image->page_list[k] = fix_to_virt(FIX_KEXEC_BASE_0 + (k >> 1)); + } + } + + return 0; } void machine_kexec_unload(int type, xen_kexec_image_t *image) { - printk("STUB: " __FILE__ ": %s: not implemented\n", __FUNCTION__); } -void machine_kexec(xen_kexec_image_t *image) +static void __machine_shutdown(void *data) { - printk("STUB: " __FILE__ ": %s: not implemented\n", __FUNCTION__); -} + xen_kexec_image_t *image = (xen_kexec_image_t *)data; + + watchdog_disable(); + console_start_sync(); + smp_send_stop(); + +#ifdef CONFIG_X86_IO_APIC + disable_IO_APIC(); +#endif + hvm_disable(); + + machine_kexec(image); +} + void machine_shutdown(xen_kexec_image_t *image) { - printk("STUB: " __FILE__ ": %s: not implemented\n", __FUNCTION__); + int reboot_cpu_id; + cpumask_t reboot_cpu; + + reboot_cpu_id = 0; + + if (!cpu_isset(reboot_cpu_id, cpu_online_map)) + reboot_cpu_id = smp_processor_id(); + + if (reboot_cpu_id != smp_processor_id()) { + cpus_clear(reboot_cpu); + cpu_set(reboot_cpu_id, reboot_cpu); + on_selected_cpus(reboot_cpu, __machine_shutdown, image, 1, 0); + for (;;) + ; /* nothing */ + } + else + __machine_shutdown(image); + BUG(); } /* --- 0001/xen/arch/x86/setup.c +++ work/xen/arch/x86/setup.c 2006-10-16 12:20:51.000000000 +0900 @@ -26,6 +26,7 @@ #include <asm/shadow.h> #include <asm/e820.h> #include <acm/acm_hooks.h> +#include <xen/kexec.h> extern void dmi_scan_machine(void); extern void generic_apic_probe(void); @@ -219,6 +220,20 @@ static void __init init_idle_domain(void setup_idle_pagetable(); } +void __init move_memory(unsigned long dst, + unsigned long src_start, unsigned long src_end) +{ +#if defined(CONFIG_X86_32) + memmove((void *)dst, /* use low 
mapping */ + (void *)src_start, /* use low mapping */ + src_end - src_start); +#elif defined(CONFIG_X86_64) + memmove(__va(dst), + __va(src_start), + src_end - src_start); +#endif +} + void __init __start_xen(multiboot_info_t *mbi) { char __cmdline[] = "", *cmdline = __cmdline; @@ -228,6 +243,7 @@ void __init __start_xen(multiboot_info_t unsigned long nr_pages, modules_length; paddr_t s, e; int i, e820_warn = 0, e820_raw_nr = 0, bytes = 0; + xen_kexec_reserve_t crash_area; struct ns16550_defaults ns16550 = { .data_bits = 8, .parity = 'n', @@ -359,15 +375,8 @@ void __init __start_xen(multiboot_info_t initial_images_start = xenheap_phys_end; initial_images_end = initial_images_start + modules_length; -#if defined(CONFIG_X86_32) - memmove((void *)initial_images_start, /* use low mapping */ - (void *)mod[0].mod_start, /* use low mapping */ - mod[mbi->mods_count-1].mod_end - mod[0].mod_start); -#elif defined(CONFIG_X86_64) - memmove(__va(initial_images_start), - __va(mod[0].mod_start), - mod[mbi->mods_count-1].mod_end - mod[0].mod_start); -#endif + move_memory(initial_images_start, + mod[0].mod_start, mod[mbi->mods_count-1].mod_end); /* Initialise boot-time allocator with all RAM situated after modules.
*/ xenheap_phys_start = init_boot_allocator(__pa(&_end)); @@ -415,6 +424,52 @@ void __init __start_xen(multiboot_info_t #endif } + machine_kexec_reserved(&crash_area); + if (crash_area.size > 0) { + unsigned long kdump_start, kdump_size, k; + + /* mark images pages as free for now */ + + init_boot_pages(initial_images_start, initial_images_end); + + kdump_start = crash_area.start; + kdump_size = crash_area.size; + + printk("Kdump: %luMB (%lukB) at 0x%lx\n", + kdump_size >> 20, + kdump_size >> 10, + kdump_start); + + if ((kdump_start & ~PAGE_MASK) || (kdump_size & ~PAGE_MASK)) + panic("Kdump parameters not page aligned\n"); + + kdump_start >>= PAGE_SHIFT; + kdump_size >>= PAGE_SHIFT; + + /* allocate pages for Kdump memory area */ + + k = alloc_boot_pages_at(kdump_size, kdump_start); + + if (k != kdump_start) + panic("Unable to reserve Kdump memory\n"); + + /* allocate pages for relocated initial images */ + + k = ((initial_images_end - initial_images_start) & ~PAGE_MASK) ? 1 : 0; + k += (initial_images_end - initial_images_start) >> PAGE_SHIFT; + + k = alloc_boot_pages(k, 1); + + if (!k) + panic("Unable to allocate initial images memory\n"); + + move_memory(k << PAGE_SHIFT, initial_images_start, initial_images_end); + + initial_images_end -= initial_images_start; + initial_images_start = k << PAGE_SHIFT; + initial_images_end += initial_images_start; + } + memguard_init(); percpu_guard_areas(); --- 0001/xen/arch/x86/traps.c +++ work/xen/arch/x86/traps.c 2006-10-16 12:15:09.000000000 +0900 @@ -105,6 +105,8 @@ unsigned long do_get_debugreg(int reg); static int debug_stack_lines = 20; integer_param("debug_stack_lines", debug_stack_lines); +extern void crash_kexec(struct cpu_user_regs *regs); + #ifdef CONFIG_X86_32 #define stack_words_per_line 8 #define ESP_BEFORE_EXCEPTION(regs) ((unsigned long *)&regs->esp) @@ -1595,6 +1597,7 @@ static void unknown_nmi_error(unsigned c printk("Uhhuh.
NMI received for unknown reason %02x.\n", reason); printk("Dazed and confused, but trying to continue\n"); printk("Do you have a strange power saving mode enabled?\n"); + crash_kexec(NULL); } } --- 0001/xen/arch/x86/x86_32/Makefile +++ work/xen/arch/x86/x86_32/Makefile 2006-10-16 12:15:09.000000000 +0900 @@ -3,5 +3,6 @@ obj-y += entry.o obj-y += mm.o obj-y += seg_fixup.o obj-y += traps.o +obj-y += machine_kexec.o obj-$(supervisor_mode_kernel) += supervisor_mode_kernel.o --- /dev/null +++ work/xen/arch/x86/x86_32/machine_kexec.c 2006-10-16 12:15:11.000000000 +0900 @@ -0,0 +1,26 @@ +/* + * arch/x86/x86_32/machine_kexec.c + * Handle transition of Linux booting another kernel + * + * Created By: Horms <horms@verge.net.au> + * + * Should be losely based on arch/i386/kernel/machine_kexec.c + */ + +#include <xen/lib.h> /* for printk() used in stub */ +#include <public/kexec.h> + +void machine_kexec(xen_kexec_image_t *image) +{ + printk("STUB: " __FILE__ ": %s: not implemented\n", __FUNCTION__); +} + +/* + * Local variables: + * mode: C + * c-set-style: "BSD" + * c-basic-offset: 4 + * tab-width: 4 + * indent-tabs-mode: nil + * End: + */ --- 0001/xen/arch/x86/x86_64/Makefile +++ work/xen/arch/x86/x86_64/Makefile 2006-10-16 12:15:09.000000000 +0900 @@ -1,3 +1,4 @@ obj-y += entry.o obj-y += mm.o obj-y += traps.o +obj-y += machine_kexec.o --- /dev/null +++ work/xen/arch/x86/x86_64/machine_kexec.c 2006-10-16 12:15:11.000000000 +0900 @@ -0,0 +1,27 @@ +/****************************************************************************** + * arch/x86/x86_64/machine_kexec.c + * Handle transition of Linux booting another kernel + * + * Created By: Horms <horms@verge.net.au> + * + * Should be losely based on arch/x86_64/kernel/machine_kexec.c + */ + +#include <xen/lib.h> /* for printk() used in stub */ +#include <xen/types.h> +#include <public/kexec.h> + +void machine_kexec(xen_kexec_image_t *image) +{ + printk("STUB: " __FILE__ ": %s: not implemented\n", __FUNCTION__); +} + +/* + * Local 
variables: + * mode: C + * c-set-style: "BSD" + * c-basic-offset: 4 + * tab-width: 4 + * indent-tabs-mode: nil + * End: + */ --- /dev/null +++ work/xen/include/asm-x86/elf.h 2006-10-16 12:15:10.000000000 +0900 @@ -0,0 +1,27 @@ +/****************************************************************************** + * include/asm-x86/elf.h + * + * Created By: Horms + * + */ + +#ifndef __X86_ELF_H__ +#define __X86_ELF_H__ + +#ifdef __x86_64__ +#include <asm/x86_64/elf.h> +#else +#include <asm/x86_32/elf.h> +#endif + +#endif /* __X86_ELF_H__ */ + +/* + * Local variables: + * mode: C + * c-set-style: "BSD" + * c-basic-offset: 4 + * tab-width: 4 + * indent-tabs-mode: nil + * End: + */ --- 0001/xen/include/asm-x86/fixmap.h +++ work/xen/include/asm-x86/fixmap.h 2006-10-16 12:15:09.000000000 +0900 @@ -16,6 +16,7 @@ #include <asm/apicdef.h> #include <asm/acpi.h> #include <asm/page.h> +#include <public/kexec.h> /* * Here we define all the compile-time 'special' virtual @@ -36,6 +37,8 @@ enum fixed_addresses { FIX_ACPI_END = FIX_ACPI_BEGIN + FIX_ACPI_PAGES - 1, FIX_HPET_BASE, FIX_CYCLONE_TIMER, + FIX_KEXEC_BASE_0, + FIX_KEXEC_BASE_END = FIX_KEXEC_BASE_0 + (KEXEC_XEN_NO_PAGES >> 1)-1, __end_of_fixed_addresses }; --- 0001/xen/include/asm-x86/hypercall.h +++ work/xen/include/asm-x86/hypercall.h 2006-10-16 12:15:09.000000000 +0900 @@ -6,6 +6,7 @@ #define __ASM_X86_HYPERCALL_H__ #include <public/physdev.h> +#include <xen/types.h> extern long do_event_channel_op_compat( @@ -87,6 +88,10 @@ extern long arch_do_vcpu_op( int cmd, struct vcpu *v, XEN_GUEST_HANDLE(void) arg); +extern int +do_kexec( + unsigned long op, unsigned arg1, XEN_GUEST_HANDLE(void) uarg); + #ifdef __x86_64__ extern long --- 0003/xen/include/asm-x86/kexec.h +++ work/xen/include/asm-x86/kexec.h 2006-10-16 12:15:09.000000000 +0900 @@ -8,15 +8,16 @@ #ifndef __X86_KEXEC_H__ #define __X86_KEXEC_H__ -#include <xen/lib.h> /* for printk() used in stub */ +#include <asm/processor.h> #include <xen/types.h> +#include
<xen/string.h> #include <public/xen.h> -static void crash_setup_regs(struct cpu_user_regs *newregs, - struct cpu_user_regs *oldregs) -{ - printk("STUB: " __FILE__ ": %s: not implemented\n", __FUNCTION__); -} +#ifdef __x86_64__ +#include <asm/x86_64/kexec.h> +#else +#include <asm/x86_32/kexec.h> +#endif #endif /* __X86_KEXEC_H__ */ --- /dev/null +++ work/xen/include/asm-x86/x86_32/elf.h 2006-10-16 12:15:10.000000000 +0900 @@ -0,0 +1,28 @@ +/****************************************************************************** + * include/asm-x86/x86_32/elf.h + * + * Created By: Horms + * + * Should pull be based on include/asm-i386/elf.h:ELF_CORE_COPY_REGS + * from Linux 2.6.16 + */ + +#ifndef __X86_ELF_X86_32_H__ +#define __X86_ELF_X86_32_H__ + +#include <xen/lib.h> /* for printk() used in stub */ + +#define ELF_CORE_COPY_REGS(pr_reg, regs) \ + printk("STUB: " __FILE__ ": %s: not implemented\n", __FUNCTION__); + +#endif /* __X86_ELF_X86_32_H__ */ + +/* + * Local variables: + * mode: C + * c-set-style: "BSD" + * c-basic-offset: 4 + * tab-width: 4 + * indent-tabs-mode: nil + * End: + */ --- /dev/null +++ work/xen/include/asm-x86/x86_32/kexec.h 2006-10-16 12:15:10.000000000 +0900 @@ -0,0 +1,48 @@ +/****************************************************************************** + * include/asm-x86/x86_32/kexec.h + * + * Created By: Horms + * + * Should be based heavily on include/asm-i386/kexec.h from Linux 2.6.16 + * + */ + +#ifndef __X86_32_KEXEC_H__ +#define __X86_32_KEXEC_H__ + +#include <xen/lib.h> /* for printk() used in stub */ +#include <xen/types.h> +#include <public/xen.h> + +static void crash_fixup_ss_esp(struct cpu_user_regs *newregs, + struct cpu_user_regs *oldregs) +{ + printk("STUB: " __FILE__ ": %s: not implemented\n", __FUNCTION__); + return; + crash_fixup_ss_esp(newregs, oldregs); +} + +static void crash_setup_regs(struct cpu_user_regs *newregs, + struct cpu_user_regs *oldregs) +{ + printk("STUB: " __FILE__ ": %s: not implemented\n", __FUNCTION__); +} + 
+static inline int user_mode(struct cpu_user_regs *regs) +{ + printk("STUB: " __FILE__ ": %s: not implemented\n", __FUNCTION__); + return -1; +} + + +#endif /* __X86_32_KEXEC_H__ */ + +/* + * Local variables: + * mode: C + * c-set-style: "BSD" + * c-basic-offset: 4 + * tab-width: 4 + * indent-tabs-mode: nil + * End: + */ --- /dev/null +++ work/xen/include/asm-x86/x86_64/elf.h 2006-10-16 12:15:10.000000000 +0900 @@ -0,0 +1,28 @@ +/****************************************************************************** + * include/asm-x86/x86_64/elf.h + * + * Created By: Horms + * + * Should pull be based on include/asm-x86_64/elf.h:ELF_CORE_COPY_REGS + * from Linux 2.6.16 + */ + +#ifndef __X86_ELF_X86_64_H__ +#define __X86_ELF_X86_64_H__ + +#include <xen/lib.h> /* for printk() used in stub */ + +#define ELF_CORE_COPY_REGS(pr_reg, regs) \ + printk("STUB: " __FILE__ ": %s: not implemented\n", __FUNCTION__); + +#endif /* __X86_ELF_X86_64_H__ */ + +/* + * Local variables: + * mode: C + * c-set-style: "BSD" + * c-basic-offset: 4 + * tab-width: 4 + * indent-tabs-mode: nil + * End: + */ --- /dev/null +++ work/xen/include/asm-x86/x86_64/kexec.h 2006-10-16 12:15:10.000000000 +0900 @@ -0,0 +1,33 @@ +/****************************************************************************** + * include/asm-x86/x86_64/kexec.h + * + * Created By: Horms + * + * Should be based heavily on include/asm-x86_64/kexec.h from Linux 2.6.16 + * + */ + +#ifndef __X86_64_KEXEC_H__ +#define __X86_64_KEXEC_H__ + +#include <xen/lib.h> /* for printk() used in stub */ +#include <xen/types.h> +#include <public/xen.h> + +static void crash_setup_regs(struct cpu_user_regs *newregs, + struct cpu_user_regs *oldregs) +{ + printk("STUB: " __FILE__ ": %s: not implemented\n", __FUNCTION__); +} + +#endif /* __X86_64_KEXEC_H__ */ + +/* + * Local variables: + * mode: C + * c-set-style: "BSD" + * c-basic-offset: 4 + * tab-width: 4 + * indent-tabs-mode: nil + * End: + */ --- 0003/xen/include/public/kexec.h +++ 
work/xen/include/public/kexec.h 2006-10-16 12:15:09.000000000 +0900 @@ -11,6 +11,10 @@ #include "xen.h" +#if defined(__i386__) || defined(__x86_64__) +#define KEXEC_XEN_NO_PAGES 17 +#endif + /* * Prototype for this hypercall is: * int kexec_op(int cmd, int type, void *extra_args) @@ -43,6 +47,9 @@ */ #define KEXEC_CMD_kexec_load 1 typedef struct xen_kexec_image { +#if defined(__i386__) || defined(__x86_64__) + unsigned long page_list[KEXEC_XEN_NO_PAGES]; +#endif unsigned long indirection_page; unsigned long start_address; } xen_kexec_image_t; --- 0003/xen/include/xen/elfcore.h +++ work/xen/include/xen/elfcore.h 2006-10-16 12:15:09.000000000 +0900 @@ -16,6 +16,9 @@ #include <public/xen.h> #define NT_PRSTATUS 1 +#define NT_XEN_DOM0_CR3 0x10000001 /* XXX: Hopefully this is unused, + feel free to change to a + better/different value */ typedef struct {
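The page_list array added to xen_kexec_image above is easiest to picture with a small example. What follows is only a sketch under stated assumptions, not the code from this series: the sketch_ names are hypothetical, the struct is a local copy of the x86 layout, the slot indices are taken from the i386 kexec header that the x86_32 patch of this series adds, and the addresses passed in are placeholders rather than real machine addresses.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Same value this patch adds to public/kexec.h for x86. */
#define KEXEC_XEN_NO_PAGES 17

/* Slot indices as defined in include/asm-i386/kexec.h by the i386 patch
 * of this series. Only the physical-address slots are used here. */
#define PA_CONTROL_PAGE 0
#define PA_PGD          2
#define PA_PTE_0        4
#define PA_PTE_1        6

/* Local copy of the x86 layout of xen_kexec_image (illustration only). */
typedef struct sketch_kexec_image {
    unsigned long page_list[KEXEC_XEN_NO_PAGES];
    unsigned long indirection_page;
    unsigned long start_address;
} sketch_kexec_image_t;

/* Compile-time check that the highest slot used fits into the array. */
typedef char sketch_slots_fit[(PA_PTE_1 < KEXEC_XEN_NO_PAGES) ? 1 : -1];

/* Hypothetical helper: clear every slot, then record the addresses of the
 * control page and of the page-table pages in their slots. */
static void sketch_fill_page_list(sketch_kexec_image_t *xki,
                                  unsigned long control_ma,
                                  unsigned long pgd_ma,
                                  unsigned long pte0_ma,
                                  unsigned long pte1_ma)
{
    assert(xki != NULL);

    memset(xki->page_list, 0, sizeof(xki->page_list));
    xki->page_list[PA_CONTROL_PAGE] = control_ma;
    xki->page_list[PA_PGD]          = pgd_ma;
    xki->page_list[PA_PTE_0]        = pte0_ma;
    xki->page_list[PA_PTE_1]        = pte1_ma;
}
```

Slots that are not filled stay zero, so code on the receiving side could in principle tell used entries from unused ones. Whether the hypervisor relies on that is not stated in the patch.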
Magnus Damm
2006-Oct-16 08:33 UTC
[Xen-devel] [PATCH 03/04] Kexec / Kdump: x86_32 specific code
[PATCH 03/04] Kexec / Kdump: x86_32 specific code This patch contains the x86_32 implementation of Kexec / Kdump for Xen. Signed-Off-By: Magnus Damm <magnus@valinux.co.jp> --- Applies on top of xen-unstable-11760. buildconfigs/linux-defconfig_xen_x86_32 | 2 linux-2.6-xen-sparse/arch/i386/Kconfig | 2 linux-2.6-xen-sparse/arch/i386/kernel/Makefile | 2 linux-2.6-xen-sparse/arch/i386/kernel/setup-xen.c | 25 linux-2.6-xen-sparse/include/asm-i386/mach-xen/asm/hypercall.h | 8 patches/linux-2.6.16.29/series | 3 linux-2.6-xen-sparse/include/asm-i386/kexec-xen.h | 57 + patches/linux-2.6.16.29/git-35..cc9.patch | 401 +++++++ patches/linux-2.6.16.29/linux-2.6.19-rc1-kexe..code-i386.patch | 169 ++++ patches/linux-2.6.16.29/linux-2.6.19-rc1-kexec-xen-i386.patch | 54 + xen/arch/x86/crash.c | 47 + xen/arch/x86/x86_32/entry.S | 2 xen/arch/x86/x86_32/machine_kexec.c | 25 xen/include/asm-x86/x86_32/elf.h | 32 xen/include/asm-x86/x86_32/kexec.h | 65 + 15 files changed, 863 insertions(+), 31 deletions(-) --- 0002/buildconfigs/linux-defconfig_xen_x86_32 +++ work/buildconfigs/linux-defconfig_xen_x86_32 2006-10-16 12:23:54.000000000 +0900 @@ -183,6 +183,7 @@ CONFIG_MTRR=y CONFIG_REGPARM=y CONFIG_SECCOMP=y CONFIG_HZ_100=y +CONFIG_KEXEC=y # CONFIG_HZ_250 is not set # CONFIG_HZ_1000 is not set CONFIG_HZ=100 @@ -1036,6 +1037,7 @@ CONFIG_DNOTIFY=y # CONFIG_PROC_FS=y CONFIG_PROC_KCORE=y +# CONFIG_PROC_VMCORE is not set CONFIG_SYSFS=y CONFIG_TMPFS=y # CONFIG_HUGETLB_PAGE is not set --- 0001/linux-2.6-xen-sparse/arch/i386/Kconfig +++ work/linux-2.6-xen-sparse/arch/i386/Kconfig 2006-10-16 12:23:54.000000000 +0900 @@ -726,7 +726,7 @@ source kernel/Kconfig.hz config KEXEC bool "kexec system call (EXPERIMENTAL)" - depends on EXPERIMENTAL && !X86_XEN + depends on EXPERIMENTAL && !XEN_UNPRIVILEGED_GUEST help kexec is a system call that implements the ability to shutdown your current kernel, and to start another kernel. 
It is like a reboot --- 0001/linux-2.6-xen-sparse/arch/i386/kernel/Makefile +++ work/linux-2.6-xen-sparse/arch/i386/kernel/Makefile 2006-10-16 12:23:54.000000000 +0900 @@ -89,7 +89,7 @@ include $(srctree)/scripts/Makefile.xen obj-y += fixup.o microcode-$(subst m,y,$(CONFIG_MICROCODE)) := microcode-xen.o -n-obj-xen := i8259.o timers/ reboot.o smpboot.o trampoline.o +n-obj-xen := i8259.o timers/ reboot.o smpboot.o trampoline.o crash.o obj-y := $(call filterxen, $(obj-y), $(n-obj-xen)) obj-y := $(call cherrypickxen, $(obj-y)) --- 0001/linux-2.6-xen-sparse/arch/i386/kernel/setup-xen.c +++ work/linux-2.6-xen-sparse/arch/i386/kernel/setup-xen.c 2006-10-16 12:40:53.000000000 +0900 @@ -69,6 +69,10 @@ #include "setup_arch_pre.h" #include <bios_ebda.h> +#ifdef CONFIG_XEN +#include <xen/interface/kexec.h> +#endif + /* Forward Declaration. */ void __init find_max_pfn(void); @@ -943,6 +947,7 @@ static void __init parse_cmdline_early ( * after a kernel panic. */ else if (!memcmp(from, "crashkernel=", 12)) { +#ifndef CONFIG_XEN unsigned long size, base; size = memparse(from+12, &from); if (*from == '@') { @@ -953,6 +958,10 @@ static void __init parse_cmdline_early ( crashk_res.start = base; crashk_res.end = base + size - 1; } +#else + printk("Ignoring crashkernel command line, " + "parameter will be supplied by xen\n"); +#endif } #endif #ifdef CONFIG_PROC_VMCORE @@ -1322,9 +1331,22 @@ void __init setup_bootmem_allocator(void } #endif #ifdef CONFIG_KEXEC +#ifndef CONFIG_XEN if (crashk_res.start != crashk_res.end) reserve_bootmem(crashk_res.start, crashk_res.end - crashk_res.start + 1); +#else + { + xen_kexec_reserve_t reservation; + BUG_ON(HYPERVISOR_kexec(KEXEC_CMD_kexec_reserve, 0, + &reservation)); + if (reservation.size) { + crashk_res.start = reservation.start; + crashk_res.end = reservation.start + + reservation.size - 1; + } + } +#endif #endif if (!xen_feature(XENFEAT_auto_translated_physmap)) @@ -1389,7 +1411,8 @@ legacy_init_iomem_resources(struct e820e 
request_resource(res, data_resource); #endif #ifdef CONFIG_KEXEC - request_resource(res, &crashk_res); + if (crashk_res.start != crashk_res.end) + request_resource(res, &crashk_res); #endif } } --- /dev/null +++ work/linux-2.6-xen-sparse/include/asm-i386/kexec-xen.h 2006-10-16 12:23:55.000000000 +0900 @@ -0,0 +1,57 @@ +/* + * include/asm-i386/kexec-xen.h + * + * Created By: Horms <horms@verge.net.au> + */ + +#ifndef _I386_KEXEC_XEN_H +#define _I386_KEXEC_XEN_H + +#include <asm/ptrace.h> +#include <asm/types.h> +#include <xen/interface/arch-x86_32.h> + +static inline void crash_translate_regs(struct pt_regs *linux_regs, + struct cpu_user_regs *xen_regs) +{ + xen_regs->ebx = linux_regs->ebx; + xen_regs->ecx = linux_regs->ecx; + xen_regs->edx = linux_regs->edx; + xen_regs->esi = linux_regs->esi; + xen_regs->edi = linux_regs->edi; + xen_regs->ebp = linux_regs->ebp; + xen_regs->eax = linux_regs->eax; + xen_regs->esp = linux_regs->esp; + xen_regs->ss = linux_regs->xss; + xen_regs->cs = linux_regs->xcs; + xen_regs->ds = linux_regs->xds; + xen_regs->es = linux_regs->xes; + xen_regs->eflags = linux_regs->eflags; +} + +/* Kexec needs to know about the actual physical addresss. + * But in xen, on some architectures, a physical address is a + * pseudo-physical addresss. 
*/ +#ifdef CONFIG_XEN +#define kexec_page_to_pfn(page) pfn_to_mfn(page_to_pfn(page)) +#define kexec_pfn_to_page(pfn) pfn_to_page(mfn_to_pfn(pfn)) +#define kexec_virt_to_phys(addr) virt_to_machine(addr) +#define kexec_phys_to_virt(addr) phys_to_virt(machine_to_phys(addr)) +#else +#define kexec_page_to_pfn(page) page_to_pfn(page) +#define kexec_pfn_to_page(pfn) pfn_to_page(pfn) +#define kexec_virt_to_phys(addr) virt_to_phys(addr) +#define kexec_phys_to_virt(addr) phys_to_virt(addr) +#endif + +#endif /* _I386_KEXEC_XEN_H */ + +/* + * Local variables: + * c-file-style: "linux" + * indent-tabs-mode: t + * c-indent-level: 8 + * c-basic-offset: 8 + * tab-width: 8 + * End: + */ --- 0001/linux-2.6-xen-sparse/include/asm-i386/mach-xen/asm/hypercall.h +++ work/linux-2.6-xen-sparse/include/asm-i386/mach-xen/asm/hypercall.h 2006-10-16 12:23:54.000000000 +0900 @@ -385,5 +385,13 @@ HYPERVISOR_xenoprof_op( return _hypercall2(int, xenoprof_op, op, arg); } +static inline int +HYPERVISOR_kexec( + unsigned long op, unsigned int arg1, void * extra_args) +{ + return _hypercall3(int, kexec_op, op, arg1, extra_args); +} + + #endif /* __HYPERCALL_H__ */ --- /dev/null +++ work/patches/linux-2.6.16.29/git-3566561bfadffcb5dbc85d576be80c0dbf2cccc9.patch 2006-10-16 12:23:55.000000000 +0900 @@ -0,0 +1,401 @@ +From: Magnus Damm <magnus@valinux.co.jp> +Date: Tue, 26 Sep 2006 08:52:38 +0000 (+0200) +Subject: [PATCH] i386: Avoid overwriting the current pgd (V4, i386) +X-Git-Url: http://www.kernel.org/git/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commitdiff;h=3566561bfadffcb5dbc85d576be80c0dbf2cccc9 + +[PATCH] i386: Avoid overwriting the current pgd (V4, i386) + +kexec: Avoid overwriting the current pgd (V4, i386) + +This patch upgrades the i386-specific kexec code to avoid overwriting the +current pgd. Overwriting the current pgd is bad when CONFIG_CRASH_DUMP is used +to start a secondary kernel that dumps the memory of the previous kernel. + +The code introduces a new set of page tables. 
These tables are used to provide +an executable identity mapping without overwriting the current pgd. + +Signed-off-by: Magnus Damm <magnus@valinux.co.jp> +Signed-off-by: Andi Kleen <ak@suse.de> +--- + +--- a/arch/i386/kernel/machine_kexec.c ++++ b/arch/i386/kernel/machine_kexec.c +@@ -21,70 +21,13 @@ + #include <asm/system.h> + + #define PAGE_ALIGNED __attribute__ ((__aligned__(PAGE_SIZE))) +- +-#define L0_ATTR (_PAGE_PRESENT | _PAGE_RW | _PAGE_ACCESSED | _PAGE_DIRTY) +-#define L1_ATTR (_PAGE_PRESENT | _PAGE_RW | _PAGE_ACCESSED | _PAGE_DIRTY) +-#define L2_ATTR (_PAGE_PRESENT) +- +-#define LEVEL0_SIZE (1UL << 12UL) +- +-#ifndef CONFIG_X86_PAE +-#define LEVEL1_SIZE (1UL << 22UL) +-static u32 pgtable_level1[1024] PAGE_ALIGNED; +- +-static void identity_map_page(unsigned long address) +-{ +- unsigned long level1_index, level2_index; +- u32 *pgtable_level2; +- +- /* Find the current page table */ +- pgtable_level2 = __va(read_cr3()); +- +- /* Find the indexes of the physical address to identity map */ +- level1_index = (address % LEVEL1_SIZE)/LEVEL0_SIZE; +- level2_index = address / LEVEL1_SIZE; +- +- /* Identity map the page table entry */ +- pgtable_level1[level1_index] = address | L0_ATTR; +- pgtable_level2[level2_index] = __pa(pgtable_level1) | L1_ATTR; +- +- /* Flush the tlb so the new mapping takes effect. +- * Global tlb entries are not flushed but that is not an issue. 
+- */ +- load_cr3(pgtable_level2); +-} +- +-#else +-#define LEVEL1_SIZE (1UL << 21UL) +-#define LEVEL2_SIZE (1UL << 30UL) +-static u64 pgtable_level1[512] PAGE_ALIGNED; +-static u64 pgtable_level2[512] PAGE_ALIGNED; +- +-static void identity_map_page(unsigned long address) +-{ +- unsigned long level1_index, level2_index, level3_index; +- u64 *pgtable_level3; +- +- /* Find the current page table */ +- pgtable_level3 = __va(read_cr3()); +- +- /* Find the indexes of the physical address to identity map */ +- level1_index = (address % LEVEL1_SIZE)/LEVEL0_SIZE; +- level2_index = (address % LEVEL2_SIZE)/LEVEL1_SIZE; +- level3_index = address / LEVEL2_SIZE; +- +- /* Identity map the page table entry */ +- pgtable_level1[level1_index] = address | L0_ATTR; +- pgtable_level2[level2_index] = __pa(pgtable_level1) | L1_ATTR; +- set_64bit(&pgtable_level3[level3_index], +- __pa(pgtable_level2) | L2_ATTR); +- +- /* Flush the tlb so the new mapping takes effect. +- * Global tlb entries are not flushed but that is not an issue. 
+- */ +- load_cr3(pgtable_level3); +-} ++static u32 kexec_pgd[1024] PAGE_ALIGNED; ++#ifdef CONFIG_X86_PAE ++static u32 kexec_pmd0[1024] PAGE_ALIGNED; ++static u32 kexec_pmd1[1024] PAGE_ALIGNED; + #endif ++static u32 kexec_pte0[1024] PAGE_ALIGNED; ++static u32 kexec_pte1[1024] PAGE_ALIGNED; + + static void set_idt(void *newidt, __u16 limit) + { +@@ -128,16 +71,6 @@ static void load_segments(void) + #undef __STR + } + +-typedef asmlinkage NORET_TYPE void (*relocate_new_kernel_t)( +- unsigned long indirection_page, +- unsigned long reboot_code_buffer, +- unsigned long start_address, +- unsigned int has_pae) ATTRIB_NORET; +- +-extern const unsigned char relocate_new_kernel[]; +-extern void relocate_new_kernel_end(void); +-extern const unsigned int relocate_new_kernel_size; +- + /* + * A architecture hook called to validate the + * proposed image and prepare the control pages +@@ -170,25 +103,29 @@ void machine_kexec_cleanup(struct kimage + */ + NORET_TYPE void machine_kexec(struct kimage *image) + { +- unsigned long page_list; +- unsigned long reboot_code_buffer; +- +- relocate_new_kernel_t rnk; ++ unsigned long page_list[PAGES_NR]; ++ void *control_page; + + /* Interrupts aren't acceptable while we reboot */ + local_irq_disable(); + +- /* Compute some offsets */ +- reboot_code_buffer = page_to_pfn(image->control_code_page) +- << PAGE_SHIFT; +- page_list = image->head; +- +- /* Set up an identity mapping for the reboot_code_buffer */ +- identity_map_page(reboot_code_buffer); +- +- /* copy it out */ +- memcpy((void *)reboot_code_buffer, relocate_new_kernel, +- relocate_new_kernel_size); ++ control_page = page_address(image->control_code_page); ++ memcpy(control_page, relocate_kernel, PAGE_SIZE); ++ ++ page_list[PA_CONTROL_PAGE] = __pa(control_page); ++ page_list[VA_CONTROL_PAGE] = (unsigned long)relocate_kernel; ++ page_list[PA_PGD] = __pa(kexec_pgd); ++ page_list[VA_PGD] = (unsigned long)kexec_pgd; ++#ifdef CONFIG_X86_PAE ++ page_list[PA_PMD_0] = __pa(kexec_pmd0); ++ 
page_list[VA_PMD_0] = (unsigned long)kexec_pmd0; ++ page_list[PA_PMD_1] = __pa(kexec_pmd1); ++ page_list[VA_PMD_1] = (unsigned long)kexec_pmd1; ++#endif ++ page_list[PA_PTE_0] = __pa(kexec_pte0); ++ page_list[VA_PTE_0] = (unsigned long)kexec_pte0; ++ page_list[PA_PTE_1] = __pa(kexec_pte1); ++ page_list[VA_PTE_1] = (unsigned long)kexec_pte1; + + /* The segment registers are funny things, they have both a + * visible and an invisible part. Whenever the visible part is +@@ -207,8 +144,8 @@ NORET_TYPE void machine_kexec(struct kim + set_idt(phys_to_virt(0),0); + + /* now call it */ +- rnk = (relocate_new_kernel_t) reboot_code_buffer; +- (*rnk)(page_list, reboot_code_buffer, image->start, cpu_has_pae); ++ relocate_kernel((unsigned long)image->head, (unsigned long)page_list, ++ image->start, cpu_has_pae); + } + + /* crashkernel=size@addr specifies the location to reserve for +--- a/arch/i386/kernel/relocate_kernel.S ++++ b/arch/i386/kernel/relocate_kernel.S +@@ -7,16 +7,138 @@ + */ + + #include <linux/linkage.h> ++#include <asm/page.h> ++#include <asm/kexec.h> ++ ++/* ++ * Must be relocatable PIC code callable as a C function ++ */ ++ ++#define PTR(x) (x << 2) ++#define PAGE_ALIGNED (1 << PAGE_SHIFT) ++#define PAGE_ATTR 0x63 /* _PAGE_PRESENT|_PAGE_RW|_PAGE_ACCESSED|_PAGE_DIRTY */ ++#define PAE_PGD_ATTR 0x01 /* _PAGE_PRESENT */ ++ ++ .text ++ .align PAGE_ALIGNED ++ .globl relocate_kernel ++relocate_kernel: ++ movl 8(%esp), %ebp /* list of pages */ ++ ++#ifdef CONFIG_X86_PAE ++ /* map the control page at its virtual address */ ++ ++ movl PTR(VA_PGD)(%ebp), %edi ++ movl PTR(VA_CONTROL_PAGE)(%ebp), %eax ++ andl $0xc0000000, %eax ++ shrl $27, %eax ++ addl %edi, %eax ++ ++ movl PTR(PA_PMD_0)(%ebp), %edx ++ orl $PAE_PGD_ATTR, %edx ++ movl %edx, (%eax) ++ ++ movl PTR(VA_PMD_0)(%ebp), %edi ++ movl PTR(VA_CONTROL_PAGE)(%ebp), %eax ++ andl $0x3fe00000, %eax ++ shrl $18, %eax ++ addl %edi, %eax ++ ++ movl PTR(PA_PTE_0)(%ebp), %edx ++ orl $PAGE_ATTR, %edx ++ movl %edx, (%eax) ++ ++ 
movl PTR(VA_PTE_0)(%ebp), %edi ++ movl PTR(VA_CONTROL_PAGE)(%ebp), %eax ++ andl $0x001ff000, %eax ++ shrl $9, %eax ++ addl %edi, %eax ++ ++ movl PTR(PA_CONTROL_PAGE)(%ebp), %edx ++ orl $PAGE_ATTR, %edx ++ movl %edx, (%eax) ++ ++ /* identity map the control page at its physical address */ ++ ++ movl PTR(VA_PGD)(%ebp), %edi ++ movl PTR(PA_CONTROL_PAGE)(%ebp), %eax ++ andl $0xc0000000, %eax ++ shrl $27, %eax ++ addl %edi, %eax ++ ++ movl PTR(PA_PMD_1)(%ebp), %edx ++ orl $PAE_PGD_ATTR, %edx ++ movl %edx, (%eax) ++ ++ movl PTR(VA_PMD_1)(%ebp), %edi ++ movl PTR(PA_CONTROL_PAGE)(%ebp), %eax ++ andl $0x3fe00000, %eax ++ shrl $18, %eax ++ addl %edi, %eax ++ ++ movl PTR(PA_PTE_1)(%ebp), %edx ++ orl $PAGE_ATTR, %edx ++ movl %edx, (%eax) ++ ++ movl PTR(VA_PTE_1)(%ebp), %edi ++ movl PTR(PA_CONTROL_PAGE)(%ebp), %eax ++ andl $0x001ff000, %eax ++ shrl $9, %eax ++ addl %edi, %eax ++ ++ movl PTR(PA_CONTROL_PAGE)(%ebp), %edx ++ orl $PAGE_ATTR, %edx ++ movl %edx, (%eax) ++#else ++ /* map the control page at its virtual address */ ++ ++ movl PTR(VA_PGD)(%ebp), %edi ++ movl PTR(VA_CONTROL_PAGE)(%ebp), %eax ++ andl $0xffc00000, %eax ++ shrl $20, %eax ++ addl %edi, %eax ++ ++ movl PTR(PA_PTE_0)(%ebp), %edx ++ orl $PAGE_ATTR, %edx ++ movl %edx, (%eax) ++ ++ movl PTR(VA_PTE_0)(%ebp), %edi ++ movl PTR(VA_CONTROL_PAGE)(%ebp), %eax ++ andl $0x003ff000, %eax ++ shrl $10, %eax ++ addl %edi, %eax ++ ++ movl PTR(PA_CONTROL_PAGE)(%ebp), %edx ++ orl $PAGE_ATTR, %edx ++ movl %edx, (%eax) ++ ++ /* identity map the control page at its physical address */ ++ ++ movl PTR(VA_PGD)(%ebp), %edi ++ movl PTR(PA_CONTROL_PAGE)(%ebp), %eax ++ andl $0xffc00000, %eax ++ shrl $20, %eax ++ addl %edi, %eax ++ ++ movl PTR(PA_PTE_1)(%ebp), %edx ++ orl $PAGE_ATTR, %edx ++ movl %edx, (%eax) ++ ++ movl PTR(VA_PTE_1)(%ebp), %edi ++ movl PTR(PA_CONTROL_PAGE)(%ebp), %eax ++ andl $0x003ff000, %eax ++ shrl $10, %eax ++ addl %edi, %eax ++ ++ movl PTR(PA_CONTROL_PAGE)(%ebp), %edx ++ orl $PAGE_ATTR, %edx ++ movl %edx, (%eax) 
++#endif + +- /* +- * Must be relocatable PIC code callable as a C function, that once +- * it starts can not use the previous processes stack. +- */ +- .globl relocate_new_kernel + relocate_new_kernel: + /* read the arguments and say goodbye to the stack */ + movl 4(%esp), %ebx /* page_list */ +- movl 8(%esp), %ebp /* reboot_code_buffer */ ++ movl 8(%esp), %ebp /* list of pages */ + movl 12(%esp), %edx /* start address */ + movl 16(%esp), %ecx /* cpu_has_pae */ + +@@ -24,11 +146,26 @@ relocate_new_kernel: + pushl $0 + popfl + +- /* set a new stack at the bottom of our page... */ +- lea 4096(%ebp), %esp ++ /* get physical address of control page now */ ++ /* this is impossible after page table switch */ ++ movl PTR(PA_CONTROL_PAGE)(%ebp), %edi ++ ++ /* switch to new set of page tables */ ++ movl PTR(PA_PGD)(%ebp), %eax ++ movl %eax, %cr3 ++ ++ /* setup a new stack at the end of the physical control page */ ++ lea 4096(%edi), %esp + +- /* store the parameters back on the stack */ +- pushl %edx /* store the start address */ ++ /* jump to identity mapped page */ ++ movl %edi, %eax ++ addl $(identity_mapped - relocate_kernel), %eax ++ pushl %eax ++ ret ++ ++identity_mapped: ++ /* store the start address on the stack */ ++ pushl %edx + + /* Set cr0 to a known state: + * 31 0 == Paging disabled +@@ -113,8 +250,3 @@ relocate_new_kernel: + xorl %edi, %edi + xorl %ebp, %ebp + ret +-relocate_new_kernel_end: +- +- .globl relocate_new_kernel_size +-relocate_new_kernel_size: +- .long relocate_new_kernel_end - relocate_new_kernel +--- a/include/asm-i386/kexec.h ++++ b/include/asm-i386/kexec.h +@@ -1,6 +1,26 @@ + #ifndef _I386_KEXEC_H + #define _I386_KEXEC_H + ++#define PA_CONTROL_PAGE 0 ++#define VA_CONTROL_PAGE 1 ++#define PA_PGD 2 ++#define VA_PGD 3 ++#define PA_PTE_0 4 ++#define VA_PTE_0 5 ++#define PA_PTE_1 6 ++#define VA_PTE_1 7 ++#ifdef CONFIG_X86_PAE ++#define PA_PMD_0 8 ++#define VA_PMD_0 9 ++#define PA_PMD_1 10 ++#define VA_PMD_1 11 ++#define PAGES_NR 12 ++#else 
++#define PAGES_NR 8 ++#endif ++ ++#ifndef __ASSEMBLY__ ++ + #include <asm/fixmap.h> + #include <asm/ptrace.h> + #include <asm/string.h> +@@ -72,5 +92,12 @@ static inline void crash_setup_regs(stru + newregs->eip = (unsigned long)current_text_addr(); + } + } ++asmlinkage NORET_TYPE void ++relocate_kernel(unsigned long indirection_page, ++ unsigned long control_page, ++ unsigned long start_address, ++ unsigned int has_pae) ATTRIB_NORET; ++ ++#endif /* __ASSEMBLY__ */ + + #endif /* _I386_KEXEC_H */ --- /dev/null +++ work/patches/linux-2.6.16.29/linux-2.6.19-rc1-kexec-move_segment_code-i386.patch 2006-10-16 12:23:55.000000000 +0900 @@ -0,0 +1,169 @@ +kexec: Move asm segment handling code to the assembly file (i386) + +This patch moves the idt, gdt, and segment handling code from machine_kexec.c +to relocate_kernel.S. The main reason behind this move is to avoid code +duplication in the Xen hypervisor. With this patch all code required to kexec +is put on the control page. + +On top of that this patch also counts as a cleanup - I think it is much +nicer to write assembly directly in assembly files than wrap inline assembly +in C functions for no apparent reason. + +Signed-off-by: Magnus Damm <magnus@valinux.co.jp> +--- + + Applies to 2.6.19-rc1. 
+ + machine_kexec.c | 59 ----------------------------------------------------- + relocate_kernel.S | 58 +++++++++++++++++++++++++++++++++++++++++++++++----- + 2 files changed, 53 insertions(+), 64 deletions(-) + +--- 0002/arch/i386/kernel/machine_kexec.c ++++ work/arch/i386/kernel/machine_kexec.c 2006-10-05 15:49:08.000000000 +0900 +@@ -29,48 +29,6 @@ static u32 kexec_pmd1[1024] PAGE_ALIGNED + static u32 kexec_pte0[1024] PAGE_ALIGNED; + static u32 kexec_pte1[1024] PAGE_ALIGNED; + +-static void set_idt(void *newidt, __u16 limit) +-{ +- struct Xgt_desc_struct curidt; +- +- /* ia32 supports unaliged loads & stores */ +- curidt.size = limit; +- curidt.address = (unsigned long)newidt; +- +- load_idt(&curidt); +-}; +- +- +-static void set_gdt(void *newgdt, __u16 limit) +-{ +- struct Xgt_desc_struct curgdt; +- +- /* ia32 supports unaligned loads & stores */ +- curgdt.size = limit; +- curgdt.address = (unsigned long)newgdt; +- +- load_gdt(&curgdt); +-}; +- +-static void load_segments(void) +-{ +-#define __STR(X) #X +-#define STR(X) __STR(X) +- +- __asm__ __volatile__ ( +- "\tljmp $"STR(__KERNEL_CS)",$1f\n" +- "\t1:\n" +- "\tmovl $"STR(__KERNEL_DS)",%%eax\n" +- "\tmovl %%eax,%%ds\n" +- "\tmovl %%eax,%%es\n" +- "\tmovl %%eax,%%fs\n" +- "\tmovl %%eax,%%gs\n" +- "\tmovl %%eax,%%ss\n" +- ::: "eax", "memory"); +-#undef STR +-#undef __STR +-} +- + /* + * A architecture hook called to validate the + * proposed image and prepare the control pages +@@ -127,23 +85,6 @@ NORET_TYPE void machine_kexec(struct kim + page_list[PA_PTE_1] = __pa(kexec_pte1); + page_list[VA_PTE_1] = (unsigned long)kexec_pte1; + +- /* The segment registers are funny things, they have both a +- * visible and an invisible part. Whenever the visible part is +- * set to a specific selector, the invisible part is loaded +- * with from a table in memory. At no other time is the +- * descriptor table in memory accessed. 
+- * +- * I take advantage of this here by force loading the +- * segments, before I zap the gdt with an invalid value. +- */ +- load_segments(); +- /* The gdt & idt are now invalid. +- * If you want to load them you must set up your own idt & gdt. +- */ +- set_gdt(phys_to_virt(0),0); +- set_idt(phys_to_virt(0),0); +- +- /* now call it */ + relocate_kernel((unsigned long)image->head, (unsigned long)page_list, + image->start, cpu_has_pae); + } +--- 0002/arch/i386/kernel/relocate_kernel.S ++++ work/arch/i386/kernel/relocate_kernel.S 2006-10-05 16:03:21.000000000 +0900 +@@ -154,14 +154,45 @@ relocate_new_kernel: + movl PTR(PA_PGD)(%ebp), %eax + movl %eax, %cr3 + ++ /* setup idt */ ++ movl %edi, %eax ++ addl $(idt_48 - relocate_kernel), %eax ++ lidtl (%eax) ++ ++ /* setup gdt */ ++ movl %edi, %eax ++ addl $(gdt - relocate_kernel), %eax ++ movl %edi, %esi ++ addl $((gdt_48 - relocate_kernel) + 2), %esi ++ movl %eax, (%esi) ++ ++ movl %edi, %eax ++ addl $(gdt_48 - relocate_kernel), %eax ++ lgdtl (%eax) ++ ++ /* setup data segment registers */ ++ mov $(gdt_ds - gdt), %eax ++ mov %eax, %ds ++ mov %eax, %es ++ mov %eax, %fs ++ mov %eax, %gs ++ mov %eax, %ss ++ + /* setup a new stack at the end of the physical control page */ + lea 4096(%edi), %esp + +- /* jump to identity mapped page */ +- movl %edi, %eax +- addl $(identity_mapped - relocate_kernel), %eax +- pushl %eax +- ret ++ /* load new code segment and jump to identity mapped page */ ++ movl %edi, %esi ++ xorl %eax, %eax ++ pushl %eax ++ pushl %esi ++ pushl %eax ++ movl $(gdt_cs - gdt), %eax ++ pushl %eax ++ movl %edi, %eax ++ addl $(identity_mapped - relocate_kernel),%eax ++ pushl %eax ++ iretl + + identity_mapped: + /* store the start address on the stack */ +@@ -250,3 +281,20 @@ identity_mapped: + xorl %edi, %edi + xorl %ebp, %ebp + ret ++ ++ .align 16 ++gdt: ++ .quad 0x0000000000000000 /* NULL descriptor */ ++gdt_cs: ++ .quad 0x00cf9a000000ffff /* kernel 4GB code at 0x00000000 */ ++gdt_ds: ++ .quad 
0x00cf92000000ffff /* kernel 4GB data at 0x00000000 */ ++gdt_end: ++ ++gdt_48: ++ .word gdt_end - gdt - 1 /* limit */ ++ .long 0 /* base - filled in by code above */ ++ ++idt_48: ++ .word 0 /* limit */ ++ .long 0 /* base */ --- /dev/null +++ work/patches/linux-2.6.16.29/linux-2.6.19-rc1-kexec-xen-i386.patch 2006-10-16 12:23:55.000000000 +0900 @@ -0,0 +1,54 @@ +--- 0004/arch/i386/kernel/machine_kexec.c ++++ work/arch/i386/kernel/machine_kexec.c 2006-10-11 18:34:06.000000000 +0900 +@@ -20,6 +20,10 @@ + #include <asm/desc.h> + #include <asm/system.h> + ++#ifdef CONFIG_XEN ++#include <xen/interface/kexec.h> ++#endif ++ + #define PAGE_ALIGNED __attribute__ ((__aligned__(PAGE_SIZE))) + static u32 kexec_pgd[1024] PAGE_ALIGNED; + #ifdef CONFIG_X86_PAE +@@ -29,6 +33,40 @@ static u32 kexec_pmd1[1024] PAGE_ALIGNED + static u32 kexec_pte0[1024] PAGE_ALIGNED; + static u32 kexec_pte1[1024] PAGE_ALIGNED; + ++#ifdef CONFIG_XEN ++ ++#define __ma(x) (pfn_to_mfn(__pa((x)) >> PAGE_SHIFT) << PAGE_SHIFT) ++ ++#if PAGES_NR > KEXEC_XEN_NO_PAGES ++#error PAGES_NR is greater than KEXEC_XEN_NO_PAGES - Xen support will break ++#endif ++ ++#if PA_CONTROL_PAGE != 0 ++#error PA_CONTROL_PAGE is non zero - Xen support will break ++#endif ++ ++void machine_kexec_setup_load_arg(xen_kexec_image_t *xki, struct kimage *image) ++{ ++ void *control_page; ++ ++ memset(xki->page_list, 0, sizeof(xki->page_list)); ++ ++ control_page = page_address(image->control_code_page); ++ memcpy(control_page, relocate_kernel, PAGE_SIZE); ++ ++ xki->page_list[PA_CONTROL_PAGE] = __ma(control_page); ++ xki->page_list[PA_PGD] = __ma(kexec_pgd); ++#ifdef CONFIG_X86_PAE ++ xki->page_list[PA_PMD_0] = __ma(kexec_pmd0); ++ xki->page_list[PA_PMD_1] = __ma(kexec_pmd1); ++#endif ++ xki->page_list[PA_PTE_0] = __ma(kexec_pte0); ++ xki->page_list[PA_PTE_1] = __ma(kexec_pte1); ++ ++} ++ ++#endif /* CONFIG_XEN */ ++ + /* + * A architecture hook called to validate the + * proposed image and prepare the control pages --- 
0004/patches/linux-2.6.16.29/series +++ work/patches/linux-2.6.16.29/series 2006-10-16 12:23:54.000000000 +0900 @@ -1,6 +1,9 @@ kexec-generic.patch git-2efe55a9cec8418f0e0cde3dc3787a42fddc4411.patch git-2a8a3d5b65e86ec1dfef7d268c64a909eab94af7.patch +git-3566561bfadffcb5dbc85d576be80c0dbf2cccc9.patch +linux-2.6.19-rc1-kexec-move_segment_code-i386.patch +linux-2.6.19-rc1-kexec-xen-i386.patch blktap-aio-16_03_06.patch device_bind.patch fix-hz-suspend.patch --- 0004/xen/arch/x86/crash.c +++ work/xen/arch/x86/crash.c 2006-10-16 12:23:54.000000000 +0900 @@ -21,6 +21,7 @@ #include <xen/delay.h> #include <xen/perfc.h> #include <xen/kexec.h> +#include <xen/sched.h> #include <public/xen.h> #include <asm/hvm/hvm.h> @@ -171,6 +172,51 @@ static void nmi_shootdown_cpus(void) } #endif +/* The cr3 for dom0 on each of its vcpus + * It is added as ELF_Prstatus prstatus.pr_reg[ELF_NGREG-1)], where + * prstatus is the data of the elf note, and ELF_NGREG was extended + * by one to allow extra space. + * This code runs after all cpus except the crashing one have + * been shutdown so as to avoid having to hold domlist_lock, + * as locking after a crash is playing with fire */ +void find_dom0_cr3(void) +{ + struct domain *d; + struct vcpu *v; + uint32_t *buf; + uint32_t cr3; + Elf_Note note; + + /* Don't need to grab domlist_lock as we are the only thing running */ + + /* No need to traverse domain_list, as dom0 is always first */ + d = domain_list; + BUG_ON(d->domain_id); + + for_each_vcpu ( d, v ) { + if ( test_bit(_VCPUF_down, &v->vcpu_flags) ) + continue; + buf = (uint32_t *)per_cpu(crash_notes, v->processor); + if (!buf) /* XXX: Can this ever occur? 
*/ + continue; + + memcpy(&note, buf, sizeof(Elf_Note)); + buf += (sizeof(Elf_Note) +3)/4 + (note.namesz + 3)/4 + + (note.descsz + 3)/4; + + /* XXX: This probably doesn't take into account shadow mode, + * but that might not be a problem */ + cr3 = pagetable_get_pfn(v->arch.guest_table); + + buf = append_elf_note(buf, "Xen Domanin-0 CR3", + NT_XEN_DOM0_CR3, &cr3, 4); + final_note(buf); + + printk("domain:%i vcpu:%u processor:%u cr3:%08x\n", + d->domain_id, v->vcpu_id, v->processor, cr3); + } +} + void machine_crash_shutdown(struct cpu_user_regs *regs) { printk("machine_crash_shutdown: %d\n", smp_processor_id()); @@ -185,6 +231,7 @@ void machine_crash_shutdown(struct cpu_u hvm_disable(); crash_save_self(regs); + find_dom0_cr3(); } /* --- 0001/xen/arch/x86/x86_32/entry.S +++ work/xen/arch/x86/x86_32/entry.S 2006-10-16 12:23:54.000000000 +0900 @@ -672,6 +672,7 @@ ENTRY(hypercall_table) .long do_hvm_op .long do_sysctl /* 35 */ .long do_domctl + .long do_kexec_op .rept NR_hypercalls-((.-hypercall_table)/4) .long do_ni_hypercall .endr @@ -714,6 +715,7 @@ ENTRY(hypercall_args_table) .byte 2 /* do_hvm_op */ .byte 1 /* do_sysctl */ /* 35 */ .byte 1 /* do_domctl */ + .byte 1 /* do_kexec_op */ .rept NR_hypercalls-(.-hypercall_args_table) .byte 0 /* do_ni_hypercall */ .endr --- 0004/xen/arch/x86/x86_32/machine_kexec.c +++ work/xen/arch/x86/x86_32/machine_kexec.c 2006-10-16 12:23:55.000000000 +0900 @@ -1,18 +1,29 @@ -/* +/****************************************************************************** * arch/x86/x86_32/machine_kexec.c - * Handle transition of Linux booting another kernel - * - * Created By: Horms <horms@verge.net.au> + * + * Created By: Horms * - * Should be losely based on arch/i386/kernel/machine_kexec.c + * Based heavily on arch/i386/machine_kexec.c from Linux 2.6.16 */ -#include <xen/lib.h> /* for printk() used in stub */ +#include <xen/types.h> #include <public/kexec.h> +#include <asm/fixmap.h> +#include <asm/processor.h> + +typedef asmlinkage void 
(*relocate_new_kernel_t)( + unsigned long indirection_page, + unsigned long control_page, + unsigned long start_address, + unsigned int has_pae); void machine_kexec(xen_kexec_image_t *image) { - printk("STUB: " __FILE__ ": %s: not implemented\n", __FUNCTION__); + relocate_new_kernel_t rnk; + + rnk = (relocate_new_kernel_t) fix_to_virt(FIX_KEXEC_BASE_0); + (*rnk)(image->indirection_page, (unsigned long)image->page_list, + image->start_address, (unsigned long)cpu_has_pae); } /* --- 0004/xen/include/asm-x86/x86_32/elf.h +++ work/xen/include/asm-x86/x86_32/elf.h 2006-10-16 12:23:55.000000000 +0900 @@ -3,17 +3,39 @@ * * Created By: Horms * - * Should pull be based on include/asm-i386/elf.h:ELF_CORE_COPY_REGS - * from Linux 2.6.16 + * Based heavily on include/asm-i386/elf.h and + * include/asm-i386/system.h from Linux 2.6.16 */ #ifndef __X86_ELF_X86_32_H__ #define __X86_ELF_X86_32_H__ -#include <xen/lib.h> /* for printk() used in stub */ +/* XXX: Xen doesn't have orig_eax. For kdump, on a dom0 crash, the values + * for the crashing CPU could could be passed down from dom0, but is that + * neccessary? 
+ * Also, I'm not sure why fs and gs are derived from the CPU + * rather than regs */ -#define ELF_CORE_COPY_REGS(pr_reg, regs) \ - printk("STUB: " __FILE__ ": %s: not implemented\n", __FUNCTION__); +#define ELF_CORE_COPY_REGS(pr_reg, regs) do { \ + unsigned i; \ + pr_reg[0] = regs->ebx; \ + pr_reg[1] = regs->ecx; \ + pr_reg[2] = regs->edx; \ + pr_reg[3] = regs->esi; \ + pr_reg[4] = regs->edi; \ + pr_reg[5] = regs->ebp; \ + pr_reg[6] = regs->eax; \ + pr_reg[7] = regs->ds; \ + pr_reg[8] = regs->es; \ + asm volatile("mov %%fs,%0":"=rm" (i)); pr_reg[9] = i; \ + asm volatile("mov %%gs,%0":"=rm" (i)); pr_reg[10] = i; \ + pr_reg[11] = 0; /* regs->orig_eax; */ \ + pr_reg[12] = regs->eip; \ + pr_reg[13] = regs->cs; \ + pr_reg[14] = regs->eflags; \ + pr_reg[15] = regs->esp; \ + pr_reg[16] = regs->ss; \ +} while(0); #endif /* __X86_ELF_X86_32_H__ */ --- 0004/xen/include/asm-x86/x86_32/kexec.h +++ work/xen/include/asm-x86/x86_32/kexec.h 2006-10-16 12:23:55.000000000 +0900 @@ -3,39 +3,72 @@ * * Created By: Horms * - * Should be based heavily on include/asm-i386/kexec.h from Linux 2.6.16 - * + * Based heavily on include/asm-i386/kexec.h from Linux 2.6.16 */ -#ifndef __X86_32_KEXEC_H__ -#define __X86_32_KEXEC_H__ - -#include <xen/lib.h> /* for printk() used in stub */ -#include <xen/types.h> -#include <public/xen.h> +#ifndef __X86_KEXEC_X86_32_H__ +#define __X86_KEXEC_X86_32_H__ +/* CPU does not save ss and esp on stack if execution is already + * running in kernel mode at the time of NMI occurrence. This code + * fixes it. 
+ */ static void crash_fixup_ss_esp(struct cpu_user_regs *newregs, - struct cpu_user_regs *oldregs) + struct cpu_user_regs *oldregs) { - printk("STUB: " __FILE__ ": %s: not implemented\n", __FUNCTION__); - return; - crash_fixup_ss_esp(newregs, oldregs); + memcpy(newregs, oldregs, sizeof(*newregs)); + newregs->esp = (unsigned long)&(oldregs->esp); + __asm__ __volatile__( + "xorl %%eax, %%eax\n\t" + "movw %%ss, %%ax\n\t" + :"=a"(newregs->ss)); } +/* + * This function is responsible for capturing register states if coming + * via panic otherwise just fix up the ss and esp if coming via kernel + * mode exception. + */ static void crash_setup_regs(struct cpu_user_regs *newregs, struct cpu_user_regs *oldregs) { - printk("STUB: " __FILE__ ": %s: not implemented\n", __FUNCTION__); + if (oldregs) + crash_fixup_ss_esp(newregs, oldregs); + else { + __asm__ __volatile__("movl %%ebx,%0" : "=m"(newregs->ebx)); + __asm__ __volatile__("movl %%ecx,%0" : "=m"(newregs->ecx)); + __asm__ __volatile__("movl %%edx,%0" : "=m"(newregs->edx)); + __asm__ __volatile__("movl %%esi,%0" : "=m"(newregs->esi)); + __asm__ __volatile__("movl %%edi,%0" : "=m"(newregs->edi)); + __asm__ __volatile__("movl %%ebp,%0" : "=m"(newregs->ebp)); + __asm__ __volatile__("movl %%eax,%0" : "=m"(newregs->eax)); + __asm__ __volatile__("movl %%esp,%0" : "=m"(newregs->esp)); + __asm__ __volatile__("movw %%ss, %%ax;" :"=a"(newregs->ss)); + __asm__ __volatile__("movw %%cs, %%ax;" :"=a"(newregs->cs)); + __asm__ __volatile__("movw %%ds, %%ax;" :"=a"(newregs->ds)); + __asm__ __volatile__("movw %%es, %%ax;" :"=a"(newregs->es)); + __asm__ __volatile__("pushfl; popl %0" :"=m"(newregs->eflags)); + + newregs->eip = (unsigned long)current_text_addr(); + } } +/* + * From Linux 2.6.16's include/asm-i386/mach-xen/asm/ptrace.h + * + * user_mode_vm(regs) determines whether a register set came from user mode. + * This is true if V8086 mode was enabled OR if the register set was from + * protected mode with RPL-3 CS value.
This tricky test checks that with + * one comparison. Many places in the kernel can bypass this full check + * if they have already ruled out V8086 mode, so user_mode(regs) can be used. + */ static inline int user_mode(struct cpu_user_regs *regs) { - printk("STUB: " __FILE__ ": %s: not implemented\n", __FUNCTION__); - return -1; + return (regs->cs & 2) != 0; } -#endif /* __X86_32_KEXEC_H__ */ +#endif /* __X86_KEXEC_X86_32_H__ */ /* * Local variables:
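A note on the user_mode() test at the end of the patch above: it relies on the usual x86 convention that the low two bits of a segment selector hold the requested privilege level (RPL). Masking with 2 is therefore non-zero only for RPL 2 and 3, so ring 0 and ring 1 both count as "not user mode". The sketch below is only an illustration of that bit test. The helper names and the descriptor index are hypothetical and do not appear in the patch, and the "& 3" variant is shown purely for comparison with a kernel that runs in ring 0.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: build a selector from a descriptor index and an RPL.
 * Bit 2 (the table indicator) is left clear, which selects the GDT. */
static inline uint16_t make_selector(uint16_t index, unsigned rpl)
{
    return (uint16_t)((index << 3) | (rpl & 3));
}

/* Same comparison as the patch's user_mode(): true only for RPL 2 and 3,
 * so code running in ring 0 or ring 1 is treated as kernel mode. */
static inline int cs_is_user_rpl_mask2(uint16_t cs)
{
    return (cs & 2) != 0;
}

/* For comparison only: a test that treats every ring above 0 as user mode. */
static inline int cs_is_user_rpl_mask3(uint16_t cs)
{
    return (cs & 3) != 0;
}
```

With these definitions a selector carrying RPL 1 is classified as kernel mode by the first test and as user mode by the second, which is the only case where the two disagree.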
Magnus Damm
2006-Oct-16 08:33 UTC
[Xen-devel] [PATCH 04/04] Kexec / Kdump: x86_64 specific code
[PATCH 04/04] Kexec / Kdump: x86_64 specific code This patch contains the x86_64 implementation of Kexec / Kdump for Xen. Signed-Off-By: Magnus Damm <magnus@valinux.co.jp> --- Applies on top of xen-unstable-11760. buildconfigs/linux-defconfig_xen_x86_64 | 1 linux-2.6-xen-sparse/arch/x86_64/Kconfig | 2 linux-2.6-xen-sparse/arch/x86_64/kernel/Makefile | 2 linux-2.6-xen-sparse/arch/x86_64/kernel/setup-xen.c | 27 linux-2.6-xen-sparse/include/asm-x86_64/mach-xen/asm/hypercall.h | 7 linux-2.6-xen-sparse/include/asm-x86_64/mach-xen/asm/ptrace.h | 2 patches/linux-2.6.16.29/series | 3 linux-2.6-xen-sparse/include/asm-x86_64/kexec-xen.h | 64 + patches/linux-2.6.16.29/git-4b..1f.patch | 375 ++++++ patches/linux-2.6.16.29/linux-2.6.19-rc1-kexe..code-x86_64.patch | 161 ++++ patches/linux-2.6.16.29/linux-2.6.19-rc1-kexec-xen-x86_64.patch | 162 ++++ xen/arch/x86/x86_64/entry.S | 2 xen/arch/x86/x86_64/machine_kexec.c | 19 xen/include/asm-x86/x86_64/elf.h | 48 + xen/include/asm-x86/x86_64/kexec.h | 33 xen/include/public/kexec.h | 3 16 files changed, 897 insertions(+), 14 deletions(-) --- 0002/buildconfigs/linux-defconfig_xen_x86_64 +++ work/buildconfigs/linux-defconfig_xen_x86_64 2006-10-16 12:41:27.000000000 +0900 @@ -138,6 +138,7 @@ CONFIG_SWIOTLB=y CONFIG_PHYSICAL_START=0x100000 CONFIG_SECCOMP=y CONFIG_HZ_100=y +CONFIG_KEXEC=y # CONFIG_HZ_250 is not set # CONFIG_HZ_1000 is not set CONFIG_HZ=100 --- 0001/linux-2.6-xen-sparse/arch/x86_64/Kconfig +++ work/linux-2.6-xen-sparse/arch/x86_64/Kconfig 2006-10-16 12:41:27.000000000 +0900 @@ -435,7 +435,7 @@ config X86_MCE_AMD config KEXEC bool "kexec system call (EXPERIMENTAL)" - depends on EXPERIMENTAL && !X86_64_XEN + depends on EXPERIMENTAL && !XEN_UNPRIVILEGED_GUEST help kexec is a system call that implements the ability to shutdown your current kernel, and to start another kernel. 
It is like a reboot --- 0001/linux-2.6-xen-sparse/arch/x86_64/kernel/Makefile +++ work/linux-2.6-xen-sparse/arch/x86_64/kernel/Makefile 2006-10-16 12:41:27.000000000 +0900 @@ -59,7 +59,7 @@ pci-dma-y += ../../i386/kernel/pci-dma microcode-$(subst m,y,$(CONFIG_MICROCODE)) := ../../i386/kernel/microcode-xen.o quirks-y := ../../i386/kernel/quirks-xen.o -n-obj-xen := i8259.o reboot.o i8237.o smpboot.o trampoline.o +n-obj-xen := i8259.o reboot.o i8237.o smpboot.o trampoline.o crash.o include $(srctree)/scripts/Makefile.xen --- 0001/linux-2.6-xen-sparse/arch/x86_64/kernel/setup-xen.c +++ work/linux-2.6-xen-sparse/arch/x86_64/kernel/setup-xen.c 2006-10-16 12:41:56.000000000 +0900 @@ -80,6 +80,10 @@ #include <asm/mach-xen/setup_arch_post.h> #include <xen/interface/memory.h> +#ifdef CONFIG_XEN +#include <xen/interface/kexec.h> +#endif + extern unsigned long start_pfn; extern struct edid_info edid_info; @@ -450,6 +454,7 @@ static __init void parse_cmdline_early ( * after a kernel panic. */ else if (!memcmp(from, "crashkernel=", 12)) { +#ifndef CONFIG_XEN unsigned long size, base; size = memparse(from+12, &from); if (*from == '@') { @@ -460,6 +465,10 @@ static __init void parse_cmdline_early ( crashk_res.start = base; crashk_res.end = base + size - 1; } +#else + printk("Ignoring crashkernel command line, " + "parameter will be supplied by xen\n"); +#endif } #endif @@ -812,10 +821,23 @@ void __init setup_arch(char **cmdline_p) #endif #endif /* !CONFIG_XEN */ #ifdef CONFIG_KEXEC +#ifndef CONFIG_XEN if (crashk_res.start != crashk_res.end) { reserve_bootmem(crashk_res.start, crashk_res.end - crashk_res.start + 1); } +#else + { + xen_kexec_reserve_t reservation; + BUG_ON(HYPERVISOR_kexec(KEXEC_CMD_kexec_reserve, 0, + &reservation)); + if (reservation.size) { + crashk_res.start = reservation.start; + crashk_res.end = reservation.start + + reservation.size - 1; + } + } +#endif #endif paging_init(); @@ -954,6 +976,11 @@ void __init setup_arch(char **cmdline_p) iommu_hole_init();
#endif +#ifdef CONFIG_KEXEC + if (crashk_res.start != crashk_res.end) + request_resource(&ioport_resource, &crashk_res); +#endif + #ifdef CONFIG_XEN { struct physdev_set_iopl set_iopl; --- /dev/null +++ work/linux-2.6-xen-sparse/include/asm-x86_64/kexec-xen.h 2006-10-16 12:41:28.000000000 +0900 @@ -0,0 +1,64 @@ +/* + * include/asm-x86_64/kexec-xen.h + * + * Created By: Horms <horms@verge.net.au> + */ + +#ifndef _X86_64_KEXEC_XEN_H +#define _X86_64_KEXEC_XEN_H + +#include <asm/ptrace.h> +#include <asm/types.h> +#include <xen/interface/arch-x86_64.h> + +static inline void crash_translate_regs(struct pt_regs *linux_regs, + struct cpu_user_regs *xen_regs) +{ + xen_regs->r15 = linux_regs->r15; + xen_regs->r14 = linux_regs->r14; + xen_regs->r13 = linux_regs->r13; + xen_regs->r12 = linux_regs->r12; + xen_regs->rbp = linux_regs->rbp; + xen_regs->rbx = linux_regs->rbx; + xen_regs->r11 = linux_regs->r11; + xen_regs->r10 = linux_regs->r10; + xen_regs->r9 = linux_regs->r9; + xen_regs->r8 = linux_regs->r8; + xen_regs->rax = linux_regs->rax; + xen_regs->rcx = linux_regs->rcx; + xen_regs->rdx = linux_regs->rdx; + xen_regs->rsi = linux_regs->rsi; + xen_regs->rdi = linux_regs->rdi; + xen_regs->rip = linux_regs->rip; + xen_regs->cs = linux_regs->cs; + xen_regs->rflags = linux_regs->eflags; + xen_regs->rsp = linux_regs->rsp; + xen_regs->ss = linux_regs->ss; +} + +/* Kexec needs to know about the actual physical address. + * But in xen, on some architectures, a physical address is a + * pseudo-physical address.
*/ +#ifdef CONFIG_XEN +#define kexec_page_to_pfn(page) pfn_to_mfn(page_to_pfn(page)) +#define kexec_pfn_to_page(pfn) pfn_to_page(mfn_to_pfn(pfn)) +#define kexec_virt_to_phys(addr) virt_to_machine(addr) +#define kexec_phys_to_virt(addr) phys_to_virt(machine_to_phys(addr)) +#else +#define kexec_page_to_pfn(page) page_to_pfn(page) +#define kexec_pfn_to_page(pfn) pfn_to_page(pfn) +#define kexec_virt_to_phys(addr) virt_to_phys(addr) +#define kexec_phys_to_virt(addr) phys_to_virt(addr) +#endif + +#endif /* _X86_64_KEXEC_XEN_H */ + +/* + * Local variables: + * c-file-style: "linux" + * indent-tabs-mode: t + * c-indent-level: 8 + * c-basic-offset: 8 + * tab-width: 8 + * End: + */ --- 0001/linux-2.6-xen-sparse/include/asm-x86_64/mach-xen/asm/hypercall.h +++ work/linux-2.6-xen-sparse/include/asm-x86_64/mach-xen/asm/hypercall.h 2006-10-16 12:41:27.000000000 +0900 @@ -386,4 +386,11 @@ HYPERVISOR_xenoprof_op( return _hypercall2(int, xenoprof_op, op, arg); } +static inline int +HYPERVISOR_kexec( + unsigned long op, unsigned int arg1, void * extra_args) +{ + return _hypercall3(int, kexec_op, op, arg1, extra_args); +} + #endif /* __HYPERCALL_H__ */ --- 0001/linux-2.6-xen-sparse/include/asm-x86_64/mach-xen/asm/ptrace.h +++ work/linux-2.6-xen-sparse/include/asm-x86_64/mach-xen/asm/ptrace.h 2006-10-16 12:41:27.000000000 +0900 @@ -90,6 +90,8 @@ extern unsigned long profile_pc(struct p #define profile_pc(regs) instruction_pointer(regs) #endif +#include <linux/compiler.h> + void signal_fault(struct pt_regs *regs, void __user *frame, char *where); struct task_struct; --- /dev/null +++ work/patches/linux-2.6.16.29/git-4bfaaef01a1badb9e8ffb0c0a37cd2379008d21f.patch 2006-10-16 12:41:28.000000000 +0900 @@ -0,0 +1,375 @@ +From: Magnus Damm <magnus@valinux.co.jp> +Date: Tue, 26 Sep 2006 08:52:38 +0000 (+0200) +Subject: [PATCH] Avoid overwriting the current pgd (V4, x86_64) +X-Git-Tag: v2.6.19-rc1 +X-Git-Url: 
http://www.kernel.org/git/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commitdiff;h=4bfaaef01a1badb9e8ffb0c0a37cd2379008d21f + +[PATCH] Avoid overwriting the current pgd (V4, x86_64) + +kexec: Avoid overwriting the current pgd (V4, x86_64) + +This patch upgrades the x86_64-specific kexec code to avoid overwriting the +current pgd. Overwriting the current pgd is bad when CONFIG_CRASH_DUMP is used +to start a secondary kernel that dumps the memory of the previous kernel. + +The code introduces a new set of page tables. These tables are used to provide +an executable identity mapping without overwriting the current pgd. + +Signed-off-by: Magnus Damm <magnus@valinux.co.jp> +Signed-off-by: Andi Kleen <ak@suse.de> +--- + +--- a/arch/x86_64/kernel/machine_kexec.c ++++ b/arch/x86_64/kernel/machine_kexec.c +@@ -15,6 +15,15 @@ + #include <asm/mmu_context.h> + #include <asm/io.h> + ++#define PAGE_ALIGNED __attribute__ ((__aligned__(PAGE_SIZE))) ++static u64 kexec_pgd[512] PAGE_ALIGNED; ++static u64 kexec_pud0[512] PAGE_ALIGNED; ++static u64 kexec_pmd0[512] PAGE_ALIGNED; ++static u64 kexec_pte0[512] PAGE_ALIGNED; ++static u64 kexec_pud1[512] PAGE_ALIGNED; ++static u64 kexec_pmd1[512] PAGE_ALIGNED; ++static u64 kexec_pte1[512] PAGE_ALIGNED; ++ + static void init_level2_page(pmd_t *level2p, unsigned long addr) + { + unsigned long end_addr; +@@ -144,32 +153,19 @@ static void load_segments(void) + ); + } + +-typedef NORET_TYPE void (*relocate_new_kernel_t)(unsigned long indirection_page, +- unsigned long control_code_buffer, +- unsigned long start_address, +- unsigned long pgtable) ATTRIB_NORET; +- +-extern const unsigned char relocate_new_kernel[]; +-extern const unsigned long relocate_new_kernel_size; +- + int machine_kexec_prepare(struct kimage *image) + { +- unsigned long start_pgtable, control_code_buffer; ++ unsigned long start_pgtable; + int result; + + /* Calculate the offsets */ + start_pgtable = page_to_pfn(image->control_code_page) << PAGE_SHIFT; +- control_code_buffer 
= start_pgtable + PAGE_SIZE; + + /* Setup the identity mapped 64bit page table */ + result = init_pgtable(image, start_pgtable); + if (result) + return result; + +- /* Place the code in the reboot code buffer */ +- memcpy(__va(control_code_buffer), relocate_new_kernel, +- relocate_new_kernel_size); +- + return 0; + } + +@@ -184,28 +180,34 @@ void machine_kexec_cleanup(struct kimage + */ + NORET_TYPE void machine_kexec(struct kimage *image) + { +- unsigned long page_list; +- unsigned long control_code_buffer; +- unsigned long start_pgtable; +- relocate_new_kernel_t rnk; ++ unsigned long page_list[PAGES_NR]; ++ void *control_page; + + /* Interrupts aren't acceptable while we reboot */ + local_irq_disable(); + +- /* Calculate the offsets */ +- page_list = image->head; +- start_pgtable = page_to_pfn(image->control_code_page) << PAGE_SHIFT; +- control_code_buffer = start_pgtable + PAGE_SIZE; ++ control_page = page_address(image->control_code_page) + PAGE_SIZE; ++ memcpy(control_page, relocate_kernel, PAGE_SIZE); + +- /* Set the low half of the page table to my identity mapped +- * page table for kexec. Leave the high half pointing at the +- * kernel pages. Don't bother to flush the global pages +- * as that will happen when I fully switch to my identity mapped +- * page table anyway.
+- */ +- memcpy(__va(read_cr3()), __va(start_pgtable), PAGE_SIZE/2); +- __flush_tlb(); ++ page_list[PA_CONTROL_PAGE] = __pa(control_page); ++ page_list[VA_CONTROL_PAGE] = (unsigned long)relocate_kernel; ++ page_list[PA_PGD] = __pa(kexec_pgd); ++ page_list[VA_PGD] = (unsigned long)kexec_pgd; ++ page_list[PA_PUD_0] = __pa(kexec_pud0); ++ page_list[VA_PUD_0] = (unsigned long)kexec_pud0; ++ page_list[PA_PMD_0] = __pa(kexec_pmd0); ++ page_list[VA_PMD_0] = (unsigned long)kexec_pmd0; ++ page_list[PA_PTE_0] = __pa(kexec_pte0); ++ page_list[VA_PTE_0] = (unsigned long)kexec_pte0; ++ page_list[PA_PUD_1] = __pa(kexec_pud1); ++ page_list[VA_PUD_1] = (unsigned long)kexec_pud1; ++ page_list[PA_PMD_1] = __pa(kexec_pmd1); ++ page_list[VA_PMD_1] = (unsigned long)kexec_pmd1; ++ page_list[PA_PTE_1] = __pa(kexec_pte1); ++ page_list[VA_PTE_1] = (unsigned long)kexec_pte1; + ++ page_list[PA_TABLE_PAGE] = ++ (unsigned long)__pa(page_address(image->control_code_page)); + + /* The segment registers are funny things, they have both a + * visible and an invisible part. Whenever the visible part is +@@ -222,9 +224,10 @@ NORET_TYPE void machine_kexec(struct kim + */ + set_gdt(phys_to_virt(0),0); + set_idt(phys_to_virt(0),0); ++ + /* now call it */ +- rnk = (relocate_new_kernel_t) control_code_buffer; +- (*rnk)(page_list, control_code_buffer, image->start, start_pgtable); ++ relocate_kernel((unsigned long)image->head, (unsigned long)page_list, ++ image->start); + } + + /* crashkernel=size@addr specifies the location to reserve for +--- a/arch/x86_64/kernel/relocate_kernel.S ++++ b/arch/x86_64/kernel/relocate_kernel.S +@@ -7,31 +7,169 @@ + */ + + #include <linux/linkage.h> ++#include <asm/page.h> ++#include <asm/kexec.h> + +- /* +- * Must be relocatable PIC code callable as a C function, that once +- * it starts can not use the previous processes stack.
+- */ +- .globl relocate_new_kernel ++/* ++ * Must be relocatable PIC code callable as a C function ++ */ ++ ++#define PTR(x) (x << 3) ++#define PAGE_ALIGNED (1 << PAGE_SHIFT) ++#define PAGE_ATTR 0x63 /* _PAGE_PRESENT|_PAGE_RW|_PAGE_ACCESSED|_PAGE_DIRTY */ ++ ++ .text ++ .align PAGE_ALIGNED + .code64 ++ .globl relocate_kernel ++relocate_kernel: ++ /* %rdi indirection_page ++ * %rsi page_list ++ * %rdx start address ++ */ ++ ++ /* map the control page at its virtual address */ ++ ++ movq $0x0000ff8000000000, %r10 /* mask */ ++ mov $(39 - 3), %cl /* bits to shift */ ++ movq PTR(VA_CONTROL_PAGE)(%rsi), %r11 /* address to map */ ++ ++ movq %r11, %r9 ++ andq %r10, %r9 ++ shrq %cl, %r9 ++ ++ movq PTR(VA_PGD)(%rsi), %r8 ++ addq %r8, %r9 ++ movq PTR(PA_PUD_0)(%rsi), %r8 ++ orq $PAGE_ATTR, %r8 ++ movq %r8, (%r9) ++ ++ shrq $9, %r10 ++ sub $9, %cl ++ ++ movq %r11, %r9 ++ andq %r10, %r9 ++ shrq %cl, %r9 ++ ++ movq PTR(VA_PUD_0)(%rsi), %r8 ++ addq %r8, %r9 ++ movq PTR(PA_PMD_0)(%rsi), %r8 ++ orq $PAGE_ATTR, %r8 ++ movq %r8, (%r9) ++ ++ shrq $9, %r10 ++ sub $9, %cl ++ ++ movq %r11, %r9 ++ andq %r10, %r9 ++ shrq %cl, %r9 ++ ++ movq PTR(VA_PMD_0)(%rsi), %r8 ++ addq %r8, %r9 ++ movq PTR(PA_PTE_0)(%rsi), %r8 ++ orq $PAGE_ATTR, %r8 ++ movq %r8, (%r9) ++ ++ shrq $9, %r10 ++ sub $9, %cl ++ ++ movq %r11, %r9 ++ andq %r10, %r9 ++ shrq %cl, %r9 ++ ++ movq PTR(VA_PTE_0)(%rsi), %r8 ++ addq %r8, %r9 ++ movq PTR(PA_CONTROL_PAGE)(%rsi), %r8 ++ orq $PAGE_ATTR, %r8 ++ movq %r8, (%r9) ++ ++ /* identity map the control page at its physical address */ ++ ++ movq $0x0000ff8000000000, %r10 /* mask */ ++ mov $(39 - 3), %cl /* bits to shift */ ++ movq PTR(PA_CONTROL_PAGE)(%rsi), %r11 /* address to map */ ++ ++ movq %r11, %r9 ++ andq %r10, %r9 ++ shrq %cl, %r9 ++ ++ movq PTR(VA_PGD)(%rsi), %r8 ++ addq %r8, %r9 ++ movq PTR(PA_PUD_1)(%rsi), %r8 ++ orq $PAGE_ATTR, %r8 ++ movq %r8, (%r9) ++ ++ shrq $9, %r10 ++ sub $9, %cl ++ ++ movq %r11, %r9 ++ andq %r10, %r9 ++ shrq %cl, %r9 ++ ++ movq 
PTR(VA_PUD_1)(%rsi), %r8 ++ addq %r8, %r9 ++ movq PTR(PA_PMD_1)(%rsi), %r8 ++ orq $PAGE_ATTR, %r8 ++ movq %r8, (%r9) ++ ++ shrq $9, %r10 ++ sub $9, %cl ++ ++ movq %r11, %r9 ++ andq %r10, %r9 ++ shrq %cl, %r9 ++ ++ movq PTR(VA_PMD_1)(%rsi), %r8 ++ addq %r8, %r9 ++ movq PTR(PA_PTE_1)(%rsi), %r8 ++ orq $PAGE_ATTR, %r8 ++ movq %r8, (%r9) ++ ++ shrq $9, %r10 ++ sub $9, %cl ++ ++ movq %r11, %r9 ++ andq %r10, %r9 ++ shrq %cl, %r9 ++ ++ movq PTR(VA_PTE_1)(%rsi), %r8 ++ addq %r8, %r9 ++ movq PTR(PA_CONTROL_PAGE)(%rsi), %r8 ++ orq $PAGE_ATTR, %r8 ++ movq %r8, (%r9) ++ + relocate_new_kernel: +- /* %rdi page_list +- * %rsi reboot_code_buffer ++ /* %rdi indirection_page ++ * %rsi page_list + * %rdx start address +- * %rcx page_table +- * %r8 arg5 +- * %r9 arg6 + */ + + /* zero out flags, and disable interrupts */ + pushq $0 + popfq + +- /* set a new stack at the bottom of our page... */ +- lea 4096(%rsi), %rsp ++ /* get physical address of control page now */ ++ /* this is impossible after page table switch */ ++ movq PTR(PA_CONTROL_PAGE)(%rsi), %r8 ++ ++ /* get physical address of page table now too */ ++ movq PTR(PA_TABLE_PAGE)(%rsi), %rcx ++ ++ /* switch to new set of page tables */ ++ movq PTR(PA_PGD)(%rsi), %r9 ++ movq %r9, %cr3 ++ ++ /* setup a new stack at the end of the physical control page */ ++ lea 4096(%r8), %rsp ++ ++ /* jump to identity mapped page */ ++ addq $(identity_mapped - relocate_kernel), %r8 ++ pushq %r8 ++ ret + +- /* store the parameters back on the stack */ +- pushq %rdx /* store the start address */ ++identity_mapped: ++ /* store the start address on the stack */ ++ pushq %rdx + + /* Set cr0 to a known state: + * 31 1 == Paging enabled +@@ -136,8 +274,3 @@ relocate_new_kernel: + xorq %r15, %r15 + + ret +-relocate_new_kernel_end: +- +- .globl relocate_new_kernel_size +-relocate_new_kernel_size: +- .quad relocate_new_kernel_end - relocate_new_kernel +--- a/include/asm-x86_64/kexec.h ++++ b/include/asm-x86_64/kexec.h +@@ -1,6 +1,27 @@ + #ifndef 
_X86_64_KEXEC_H + #define _X86_64_KEXEC_H + ++#define PA_CONTROL_PAGE 0 ++#define VA_CONTROL_PAGE 1 ++#define PA_PGD 2 ++#define VA_PGD 3 ++#define PA_PUD_0 4 ++#define VA_PUD_0 5 ++#define PA_PMD_0 6 ++#define VA_PMD_0 7 ++#define PA_PTE_0 8 ++#define VA_PTE_0 9 ++#define PA_PUD_1 10 ++#define VA_PUD_1 11 ++#define PA_PMD_1 12 ++#define VA_PMD_1 13 ++#define PA_PTE_1 14 ++#define VA_PTE_1 15 ++#define PA_TABLE_PAGE 16 ++#define PAGES_NR 17 ++ ++#ifndef __ASSEMBLY__ ++ + #include <linux/string.h> + + #include <asm/page.h> +@@ -64,4 +85,12 @@ static inline void crash_setup_regs(stru + newregs->rip = (unsigned long)current_text_addr(); + } + } ++ ++NORET_TYPE void ++relocate_kernel(unsigned long indirection_page, ++ unsigned long page_list, ++ unsigned long start_address) ATTRIB_NORET; ++ ++#endif /* __ASSEMBLY__ */ ++ + #endif /* _X86_64_KEXEC_H */ --- /dev/null +++ work/patches/linux-2.6.16.29/linux-2.6.19-rc1-kexec-move_segment_code-x86_64.patch 2006-10-16 12:41:28.000000000 +0900 @@ -0,0 +1,161 @@ +kexec: Move asm segment handling code to the assembly file (x86_64) + +This patch moves the idt, gdt, and segment handling code from machine_kexec.c +to relocate_kernel.S. The main reason behind this move is to avoid code +duplication in the Xen hypervisor. With this patch all code required to kexec +is put on the control page. + +On top of that this patch also counts as a cleanup - I think it is much +nicer to write assembly directly in assembly files than wrap inline assembly +in C functions for no apparent reason. + +Signed-off-by: Magnus Damm <magnus@valinux.co.jp> +--- + + Applies to 2.6.19-rc1. 
+ + machine_kexec.c | 58 ----------------------------------------------------- + relocate_kernel.S | 50 +++++++++++++++++++++++++++++++++++++++++---- + 2 files changed, 45 insertions(+), 63 deletions(-) + +--- 0002/arch/x86_64/kernel/machine_kexec.c ++++ work/arch/x86_64/kernel/machine_kexec.c 2006-10-05 16:15:49.000000000 +0900 +@@ -112,47 +112,6 @@ static int init_pgtable(struct kimage *i + return init_level4_page(image, level4p, 0, end_pfn << PAGE_SHIFT); + } + +-static void set_idt(void *newidt, u16 limit) +-{ +- struct desc_ptr curidt; +- +- /* x86-64 supports unaliged loads & stores */ +- curidt.size = limit; +- curidt.address = (unsigned long)newidt; +- +- __asm__ __volatile__ ( +- "lidtq %0\n" +- : : "m" (curidt) +- ); +-}; +- +- +-static void set_gdt(void *newgdt, u16 limit) +-{ +- struct desc_ptr curgdt; +- +- /* x86-64 supports unaligned loads & stores */ +- curgdt.size = limit; +- curgdt.address = (unsigned long)newgdt; +- +- __asm__ __volatile__ ( +- "lgdtq %0\n" +- : : "m" (curgdt) +- ); +-}; +- +-static void load_segments(void) +-{ +- __asm__ __volatile__ ( +- "\tmovl %0,%%ds\n" +- "\tmovl %0,%%es\n" +- "\tmovl %0,%%ss\n" +- "\tmovl %0,%%fs\n" +- "\tmovl %0,%%gs\n" +- : : "a" (__KERNEL_DS) : "memory" +- ); +-} +- + int machine_kexec_prepare(struct kimage *image) + { + unsigned long start_pgtable; +@@ -209,23 +168,6 @@ NORET_TYPE void machine_kexec(struct kim + page_list[PA_TABLE_PAGE] = + (unsigned long)__pa(page_address(image->control_code_page)); + +- /* The segment registers are funny things, they have both a +- * visible and an invisible part. Whenever the visible part is +- * set to a specific selector, the invisible part is loaded +- * with from a table in memory. At no other time is the +- * descriptor table in memory accessed. +- * +- * I take advantage of this here by force loading the +- * segments, before I zap the gdt with an invalid value. +- */ +- load_segments(); +- /* The gdt & idt are now invalid.
+- * If you want to load them you must set up your own idt & gdt. +- */ +- set_gdt(phys_to_virt(0),0); +- set_idt(phys_to_virt(0),0); +- +- /* now call it */ + relocate_kernel((unsigned long)image->head, (unsigned long)page_list, + image->start); + } +--- 0002/arch/x86_64/kernel/relocate_kernel.S ++++ work/arch/x86_64/kernel/relocate_kernel.S 2006-10-05 16:18:07.000000000 +0900 +@@ -159,13 +159,39 @@ relocate_new_kernel: + movq PTR(PA_PGD)(%rsi), %r9 + movq %r9, %cr3 + ++ /* setup idt */ ++ movq %r8, %rax ++ addq $(idt_80 - relocate_kernel), %rax ++ lidtq (%rax) ++ ++ /* setup gdt */ ++ movq %r8, %rax ++ addq $(gdt - relocate_kernel), %rax ++ movq %r8, %r9 ++ addq $((gdt_80 - relocate_kernel) + 2), %r9 ++ movq %rax, (%r9) ++ ++ movq %r8, %rax ++ addq $(gdt_80 - relocate_kernel), %rax ++ lgdtq (%rax) ++ ++ /* setup data segment registers */ ++ xorl %eax, %eax ++ movl %eax, %ds ++ movl %eax, %es ++ movl %eax, %fs ++ movl %eax, %gs ++ movl %eax, %ss ++ + /* setup a new stack at the end of the physical control page */ + lea 4096(%r8), %rsp + +- /* jump to identity mapped page */ +- addq $(identity_mapped - relocate_kernel), %r8 +- pushq %r8 +- ret ++ /* load new code segment and jump to identity mapped page */ ++ movq %r8, %rax ++ addq $(identity_mapped - relocate_kernel), %rax ++ pushq $(gdt_cs - gdt) ++ pushq %rax ++ lretq + + identity_mapped: + /* store the start address on the stack */ +@@ -272,5 +298,19 @@ identity_mapped: + xorq %r13, %r13 + xorq %r14, %r14 + xorq %r15, %r15 +- + ret ++ ++ .align 16 ++gdt: ++ .quad 0x0000000000000000 /* NULL descriptor */ ++gdt_cs: ++ .quad 0x00af9a000000ffff ++gdt_end: ++ ++gdt_80: ++ .word gdt_end - gdt - 1 /* limit */ ++ .quad 0 /* base - filled in by code above */ ++ ++idt_80: ++ .word 0 /* limit */ ++ .quad 0 /* base */ --- /dev/null +++ work/patches/linux-2.6.16.29/linux-2.6.19-rc1-kexec-xen-x86_64.patch 2006-10-16 12:41:28.000000000 +0900 @@ -0,0 +1,162 @@ +--- 0006/arch/x86_64/kernel/machine_kexec.c ++++ 
work/arch/x86_64/kernel/machine_kexec.c 2006-10-06 15:36:16.000000000 +0900 +@@ -24,6 +24,104 @@ static u64 kexec_pud1[512] PAGE_ALIGNED; + static u64 kexec_pmd1[512] PAGE_ALIGNED; + static u64 kexec_pte1[512] PAGE_ALIGNED; + ++#ifdef CONFIG_XEN ++ ++/* In the case of Xen, override hypervisor functions to be able to create ++ * a regular identity mapping page table... ++ */ ++ ++#include <xen/interface/kexec.h> ++#include <xen/interface/memory.h> ++ ++#define x__pmd(x) ((pmd_t) { (x) } ) ++#define x__pud(x) ((pud_t) { (x) } ) ++#define x__pgd(x) ((pgd_t) { (x) } ) ++ ++#define x_pmd_val(x) ((x).pmd) ++#define x_pud_val(x) ((x).pud) ++#define x_pgd_val(x) ((x).pgd) ++ ++static inline void x_set_pmd(pmd_t *dst, pmd_t val) ++{ ++ x_pmd_val(*dst) = x_pmd_val(val); ++} ++ ++static inline void x_set_pud(pud_t *dst, pud_t val) ++{ ++ x_pud_val(*dst) = phys_to_machine(x_pud_val(val)); ++} ++ ++static inline void x_pud_clear (pud_t *pud) ++{ ++ x_pud_val(*pud) = 0; ++} ++ ++static inline void x_set_pgd(pgd_t *dst, pgd_t val) ++{ ++ x_pgd_val(*dst) = phys_to_machine(x_pgd_val(val)); ++} ++ ++static inline void x_pgd_clear (pgd_t * pgd) ++{ ++ x_pgd_val(*pgd) = 0; ++} ++ ++#define X__PAGE_KERNEL_LARGE_EXEC \ ++ _PAGE_PRESENT | _PAGE_RW | _PAGE_DIRTY | _PAGE_ACCESSED | _PAGE_PSE ++#define X_KERNPG_TABLE _PAGE_PRESENT | _PAGE_RW | _PAGE_ACCESSED | _PAGE_DIRTY ++ ++#define __ma(x) (pfn_to_mfn(__pa((x)) >> PAGE_SHIFT) << PAGE_SHIFT) ++ ++#if PAGES_NR > KEXEC_XEN_NO_PAGES ++#error PAGES_NR is greater than KEXEC_XEN_NO_PAGES - Xen support will break ++#endif ++ ++#if PA_CONTROL_PAGE != 0 ++#error PA_CONTROL_PAGE is non zero - Xen support will break ++#endif ++ ++void machine_kexec_setup_load_arg(xen_kexec_image_t *xki, struct kimage *image) ++{ ++ void *control_page; ++ void *table_page; ++ ++ memset(xki->page_list, 0, sizeof(xki->page_list)); ++ ++ control_page = page_address(image->control_code_page) + PAGE_SIZE; ++ memcpy(control_page, relocate_kernel, PAGE_SIZE); ++ ++ 
table_page = page_address(image->control_code_page); ++ ++ xki->page_list[PA_CONTROL_PAGE] = __ma(control_page); ++ xki->page_list[PA_TABLE_PAGE] = __ma(table_page); ++ ++ xki->page_list[PA_PGD] = __ma(kexec_pgd); ++ xki->page_list[PA_PUD_0] = __ma(kexec_pud0); ++ xki->page_list[PA_PUD_1] = __ma(kexec_pud1); ++ xki->page_list[PA_PMD_0] = __ma(kexec_pmd0); ++ xki->page_list[PA_PMD_1] = __ma(kexec_pmd1); ++ xki->page_list[PA_PTE_0] = __ma(kexec_pte0); ++ xki->page_list[PA_PTE_1] = __ma(kexec_pte1); ++} ++ ++#else /* CONFIG_XEN */ ++ ++#define x__pmd(x) __pmd(x) ++#define x__pud(x) __pud(x) ++#define x__pgd(x) __pgd(x) ++ ++#define x_set_pmd(x, y) set_pmd(x, y) ++#define x_set_pud(x, y) set_pud(x, y) ++#define x_set_pgd(x, y) set_pgd(x, y) ++ ++#define x_pud_clear(x) pud_clear(x) ++#define x_pgd_clear(x) pgd_clear(x) ++ ++#define X__PAGE_KERNEL_LARGE_EXEC __PAGE_KERNEL_LARGE_EXEC ++#define X_KERNPG_TABLE _KERNPG_TABLE ++ ++#endif /* CONFIG_XEN */ ++ + static void init_level2_page(pmd_t *level2p, unsigned long addr) + { + unsigned long end_addr; +@@ -31,7 +129,7 @@ static void init_level2_page(pmd_t *leve + addr &= PAGE_MASK; + end_addr = addr + PUD_SIZE; + while (addr < end_addr) { +- set_pmd(level2p++, __pmd(addr | __PAGE_KERNEL_LARGE_EXEC)); ++ x_set_pmd(level2p++, x__pmd(addr | X__PAGE_KERNEL_LARGE_EXEC)); + addr += PMD_SIZE; + } + } +@@ -56,12 +154,12 @@ static int init_level3_page(struct kimag + } + level2p = (pmd_t *)page_address(page); + init_level2_page(level2p, addr); +- set_pud(level3p++, __pud(__pa(level2p) | _KERNPG_TABLE)); ++ x_set_pud(level3p++, x__pud(__pa(level2p) | X_KERNPG_TABLE)); + addr += PUD_SIZE; + } + /* clear the unused entries */ + while (addr < end_addr) { +- pud_clear(level3p++); ++ x_pud_clear(level3p++); + addr += PUD_SIZE; + } + out: +@@ -92,12 +190,12 @@ static int init_level4_page(struct kimag + if (result) { + goto out; + } +- set_pgd(level4p++, __pgd(__pa(level3p) | _KERNPG_TABLE)); ++ x_set_pgd(level4p++, x__pgd(__pa(level3p) | 
X_KERNPG_TABLE)); + addr += PGDIR_SIZE; + } + /* clear the unused entries */ + while (addr < end_addr) { +- pgd_clear(level4p++); ++ x_pgd_clear(level4p++); + addr += PGDIR_SIZE; + } + out: +@@ -108,8 +206,14 @@ out: + static int init_pgtable(struct kimage *image, unsigned long start_pgtable) + { + pgd_t *level4p; ++ unsigned long x_end_pfn = end_pfn; ++ ++#ifdef CONFIG_XEN ++ x_end_pfn = HYPERVISOR_memory_op(XENMEM_maximum_ram_page, NULL); ++#endif ++ + level4p = (pgd_t *)__va(start_pgtable); +- return init_level4_page(image, level4p, 0, end_pfn << PAGE_SHIFT); ++ return init_level4_page(image, level4p, 0, x_end_pfn << PAGE_SHIFT); + } + + int machine_kexec_prepare(struct kimage *image) --- 0005/patches/linux-2.6.16.29/series +++ work/patches/linux-2.6.16.29/series 2006-10-16 12:41:27.000000000 +0900 @@ -4,6 +4,9 @@ git-2a8a3d5b65e86ec1dfef7d268c64a909eab9 git-3566561bfadffcb5dbc85d576be80c0dbf2cccc9.patch linux-2.6.19-rc1-kexec-move_segment_code-i386.patch linux-2.6.19-rc1-kexec-xen-i386.patch +git-4bfaaef01a1badb9e8ffb0c0a37cd2379008d21f.patch +linux-2.6.19-rc1-kexec-move_segment_code-x86_64.patch +linux-2.6.19-rc1-kexec-xen-x86_64.patch blktap-aio-16_03_06.patch device_bind.patch fix-hz-suspend.patch --- 0001/xen/arch/x86/x86_64/entry.S +++ work/xen/arch/x86/x86_64/entry.S 2006-10-16 12:41:27.000000000 +0900 @@ -573,6 +573,7 @@ ENTRY(hypercall_table) .quad do_hvm_op .quad do_sysctl /* 35 */ .quad do_domctl + .quad do_kexec_op .rept NR_hypercalls-((.-hypercall_table)/8) .quad do_ni_hypercall .endr @@ -615,6 +616,7 @@ ENTRY(hypercall_args_table) .byte 2 /* do_hvm_op */ .byte 1 /* do_sysctl */ /* 35 */ .byte 1 /* do_domctl */ + .byte 3 /* do_kexec */ .rept NR_hypercalls-(.-hypercall_args_table) .byte 0 /* do_ni_hypercall */ .endr --- 0004/xen/arch/x86/x86_64/machine_kexec.c +++ work/xen/arch/x86/x86_64/machine_kexec.c 2006-10-16 12:41:27.000000000 +0900 @@ -4,18 +4,27 @@ * * Created By: Horms <horms@verge.net.au> * - * Should be losely based on 
arch/x86_64/kernel/machine_kexec.c + * Loosely based on arch/x86_64/kernel/machine_kexec.c */ - -#include <xen/lib.h> /* for printk() used in stub */ + #include <xen/types.h> #include <public/kexec.h> +#include <asm/fixmap.h> + +typedef void (*relocate_new_kernel_t)( + unsigned long indirection_page, + unsigned long page_list, + unsigned long start_address); void machine_kexec(xen_kexec_image_t *image) { - printk("STUB: " __FILE__ ": %s: not implemented\n", __FUNCTION__); -} + relocate_new_kernel_t rnk; + rnk = (relocate_new_kernel_t) fix_to_virt(FIX_KEXEC_BASE_0); + (*rnk)(image->indirection_page, (unsigned long)image->page_list, + image->start_address); + } + /* * Local variables: * mode: C --- 0004/xen/include/asm-x86/x86_64/elf.h +++ work/xen/include/asm-x86/x86_64/elf.h 2006-10-16 12:41:27.000000000 +0900 @@ -3,17 +3,55 @@ * * Created By: Horms * - * Should pull be based on include/asm-x86_64/elf.h:ELF_CORE_COPY_REGS - * from Linux 2.6.16 + * Based on include/asm-x86_64/elf.h:ELF_CORE_COPY_REGS from Linux 2.6.16 */ #ifndef __X86_ELF_X86_64_H__ #define __X86_ELF_X86_64_H__ -#include <xen/lib.h> /* for printk() used in stub */ +#include <xen/lib.h> -#define ELF_CORE_COPY_REGS(pr_reg, regs) \ - printk("STUB: " __FILE__ ": %s: not implemented\n", __FUNCTION__); +#include <xen/lib.h> + +/* XXX: Xen doesn't have orig_rax, so it is omitted. + * Xen doesn't have threads, so fs and gs are read from the CPU and + * thus values 21 and 22 are just duplicates of 25 and 26 + * respectively. All these values could be passed from dom0 in the + * case of it crashing, but does that help?
+ * + * Lastly, I''m not sure why ds, es, fs and gs are read from + * the CPU rather than regs, but linux does this + */ + +#define ELF_CORE_COPY_REGS(pr_reg, regs) do { \ + unsigned v; \ + (pr_reg)[0] = (regs)->r15; \ + (pr_reg)[1] = (regs)->r14; \ + (pr_reg)[2] = (regs)->r13; \ + (pr_reg)[3] = (regs)->r12; \ + (pr_reg)[4] = (regs)->rbp; \ + (pr_reg)[5] = (regs)->rbx; \ + (pr_reg)[6] = (regs)->r11; \ + (pr_reg)[7] = (regs)->r10; \ + (pr_reg)[8] = (regs)->r9; \ + (pr_reg)[9] = (regs)->r8; \ + (pr_reg)[10] = (regs)->rax; \ + (pr_reg)[11] = (regs)->rcx; \ + (pr_reg)[12] = (regs)->rdx; \ + (pr_reg)[13] = (regs)->rsi; \ + (pr_reg)[14] = (regs)->rdi; \ + (pr_reg)[16] = (regs)->rip; \ + (pr_reg)[17] = (regs)->cs; \ + (pr_reg)[18] = (regs)->eflags; \ + (pr_reg)[19] = (regs)->rsp; \ + (pr_reg)[20] = (regs)->ss; \ + asm("movl %%fs,%0" : "=r" (v)); (pr_reg)[21] = v; \ + asm("movl %%gs,%0" : "=r" (v)); (pr_reg)[22] = v; \ + asm("movl %%ds,%0" : "=r" (v)); (pr_reg)[23] = v; \ + asm("movl %%es,%0" : "=r" (v)); (pr_reg)[24] = v; \ + asm("movl %%fs,%0" : "=r" (v)); (pr_reg)[25] = v; \ + asm("movl %%gs,%0" : "=r" (v)); (pr_reg)[26] = v; \ +} while(0); #endif /* __X86_ELF_X86_64_H__ */ --- 0004/xen/include/asm-x86/x86_64/kexec.h +++ work/xen/include/asm-x86/x86_64/kexec.h 2006-10-16 12:41:27.000000000 +0900 @@ -10,14 +10,43 @@ #ifndef __X86_64_KEXEC_H__ #define __X86_64_KEXEC_H__ -#include <xen/lib.h> /* for printk() used in stub */ +#include <xen/lib.h> #include <xen/types.h> #include <public/xen.h> +/* + * Saving the registers of the cpu on which panic occured in + * crash_kexec to save a valid sp. The registers of other cpus + * will be saved in machine_crash_shutdown while shooting down them. 
+ */ static void crash_setup_regs(struct cpu_user_regs *newregs, struct cpu_user_regs *oldregs) { - printk("STUB: " __FILE__ ": %s: not implemented\n", __FUNCTION__); + if (oldregs) + memcpy(newregs, oldregs, sizeof(*newregs)); + else { + __asm__ __volatile__("movq %%rbx,%0" : "=m"(newregs->rbx)); + __asm__ __volatile__("movq %%rcx,%0" : "=m"(newregs->rcx)); + __asm__ __volatile__("movq %%rdx,%0" : "=m"(newregs->rdx)); + __asm__ __volatile__("movq %%rsi,%0" : "=m"(newregs->rsi)); + __asm__ __volatile__("movq %%rdi,%0" : "=m"(newregs->rdi)); + __asm__ __volatile__("movq %%rbp,%0" : "=m"(newregs->rbp)); + __asm__ __volatile__("movq %%rax,%0" : "=m"(newregs->rax)); + __asm__ __volatile__("movq %%rsp,%0" : "=m"(newregs->rsp)); + __asm__ __volatile__("movq %%r8,%0" : "=m"(newregs->r8)); + __asm__ __volatile__("movq %%r9,%0" : "=m"(newregs->r9)); + __asm__ __volatile__("movq %%r10,%0" : "=m"(newregs->r10)); + __asm__ __volatile__("movq %%r11,%0" : "=m"(newregs->r11)); + __asm__ __volatile__("movq %%r12,%0" : "=m"(newregs->r12)); + __asm__ __volatile__("movq %%r13,%0" : "=m"(newregs->r13)); + __asm__ __volatile__("movq %%r14,%0" : "=m"(newregs->r14)); + __asm__ __volatile__("movq %%r15,%0" : "=m"(newregs->r15)); + __asm__ __volatile__("movl %%ss, %%eax;" :"=a"(newregs->ss)); + __asm__ __volatile__("movl %%cs, %%eax;" :"=a"(newregs->cs)); + __asm__ __volatile__("pushfq; popq %0" :"=m"(newregs->eflags)); + + newregs->rip = (unsigned long)current_text_addr(); + } } #endif /* __X86_64_KEXEC_H__ */ --- 0004/xen/include/public/kexec.h +++ work/xen/include/public/kexec.h 2006-10-16 12:41:27.000000000 +0900 @@ -50,6 +50,9 @@ typedef struct xen_kexec_image { #if defined(__i386__) || defined(__x86_64__) unsigned long page_list[KEXEC_XEN_NO_PAGES]; #endif +#if defined(__x86_64__) + unsigned long page_table_b; +#endif unsigned long indirection_page; unsigned long start_address; } xen_kexec_image_t; _______________________________________________ Xen-devel mailing list 
Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Keir Fraser
2006-Oct-16 14:03 UTC
[Xen-devel] Re: [PATCH 01/04] Kexec / Kdump: Generic code
Some comments:

No need for IO-APIC work on the guest side of the kexec interfaces (e.g., don't call disable_IO_APIC()).

What's the second argument to the hypercall for? There's no clear explanation of what the TYPE parameter means, and currently it is only ever specified as TYPE_CRASH. So what's TYPE_DEFAULT for? And do we really need to avoid copy_to/from_guest so it can't be folded into the structural parameter?

The comment attached to every use of xchg() is dubious. We don't specify warn_unused_result on that function so there's no good reason for the compiler to complain about discarding the result. If it's a reproducible problem it needs investigating. We shouldn't work around a broken compiler.

Attribution at the top of many files: 'Horms' is a bit vague. Could we have a full name? A company name? An email address?

-- Keir
On Mon, Oct 16, 2006 at 03:03:29PM +0100, Keir Fraser wrote:
> Some comments:
>
> No need for IO-APIC work on the guest side of the kexec interfaces (e.g.,
> don't call disable_IO_APIC()).
>
> What's the second argument to the hypercall for? There's no clear
> explanation of what the TYPE parameter means, and currently it is only ever
> specified as TYPE_CRASH. So what's TYPE_DEFAULT for? And do we really need
> to avoid copy_to/from_guest so it can't be folded into the structural
> parameter?

TYPE_DEFAULT is kexec, TYPE_CRASH is kdump.

It's just to avoid a from_guest. If you think that isn't worthwhile we can fold it into the structural parameter.

> The comment attached to every use of xchg() is dubious. We don't specify
> warn_unused_result on that function so there's no good reason for the
> compiler to complain about discarding the result. If it's a reproducible
> problem it needs investigating. We shouldn't work around a broken compiler.

I certainly saw the problem. I'll do some more investigations.

... Actually now that I think about it for a moment, perhaps it was inheriting -Wall from CFLAGS in my environment. If that is the case, then I think the work-around I added is valid.

> Attribution at the top of many files: 'Horms' is a bit vague. Could we have
> a full name? A company name? An email address?

Sure.

-- 
Horms
H: http://www.vergenet.net/~horms/
W: http://www.valinux.co.jp/en/
Magnus Damm
2006-Oct-17 06:36 UTC
[Xen-devel] Re: [PATCH 01/04] Kexec / Kdump: Generic code
Hi Keir, thanks for the comments!

On 10/16/06, Keir Fraser <Keir.Fraser@cl.cam.ac.uk> wrote:
> Some comments:
>
> No need for IO-APIC work on the guest side of the kexec interfaces (e.g.,
> don't call disable_IO_APIC()).

Ok, good idea.

> What's the second argument to the hypercall for? There's no clear
> explanation of what the TYPE parameter means, and currently it is only ever
> specified as TYPE_CRASH. So what's TYPE_DEFAULT for? And do we really need
> to avoid copy_to/from_guest so it can't be folded into the structural
> parameter?
>
> The comment attached to every use of xchg() is dubious. We don't specify
> warn_unused_result on that function so there's no good reason for the
> compiler to complain about discarding the result. If it's a reproducible
> problem it needs investigating. We shouldn't work around a broken compiler.
>
> Attribution at the top of many files: 'Horms' is a bit vague. Could we have
> a full name? A company name? An email address?

I agree with all your observations. The purpose of this release was to quickly release something that supports VT hardware and applies to a working x86_64 changeset. Now that the release is done I'd like to focus on fixing the issues you've brought up, together with scratching some itches that I've listed below:

- Headers and comments cleanup.
- Merge load and unload hypercall - make unload same as load NULL.
- Investigate xen/common/kexec.c locking - clean up the code.
- Merge crash and smp code - it may be possible to share cpu stopping code.
- Try to move the ELF notes out of the hypervisor - this is intrusive and complicated.

My plan is to spend the week fixing these things and post a new release some time next week.

Thanks!

/ magnus
Keir Fraser
2006-Oct-17 06:56 UTC
[Xen-devel] Re: [PATCH 01/04] Kexec / Kdump: Generic code
On 17/10/06 2:25 am, "Horms" <horms@verge.net.au> wrote:
> TYPE_DEFAULT is kexec, TYPE_CRASH is kdump.
>
> It's just to avoid a from_guest. If you think that isn't worthwhile
> we can fold it into the structural parameter.

Yes please. This isn't performance critical. Folding will make the interface better match other hypercalls.

> I certainly saw the problem. I'll do some more investigations.
>
> ... Actually now that I think about it for a moment, perhaps it was
> inheriting -Wall from CFLAGS in my environment. If that is the case,
> then I think the work-around I added is valid.

Xen is always built with -Wall. I simply don't believe this can be a valid warning message. There's nothing special about __xchg() that should cause complaint when its return value is discarded (a common thing to do in C). Can you repro the problem on vanilla versions of gcc (straight from gnu.org)?

-- Keir
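To make the "fold it into the structural parameter" suggestion concrete, here is a rough sketch of what a folded argument block could look like. Everything named demo_* is hypothetical and not taken from the patch; the numeric values of the two types are assumed, and only the indirection_page and start_address fields mirror names that appear in xen_kexec_image_t.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for TYPE_DEFAULT (kexec) and TYPE_CRASH (kdump);
 * the numeric values are assumed for this sketch. */
enum demo_kexec_type { DEMO_TYPE_DEFAULT = 0, DEMO_TYPE_CRASH = 1 };

/* Hypothetical argument block: the type travels inside the structure
 * instead of being passed as a separate hypercall argument. */
struct demo_kexec_load {
    int type;
    unsigned long indirection_page;
    unsigned long start_address;
};

/* One slot per image type, indexed by the type value. */
static struct demo_kexec_load demo_slots[2];

/* Store the image description; returns 0 on success, -1 on an unknown type. */
static int demo_kexec_load(const struct demo_kexec_load *arg)
{
    assert(arg != NULL);
    if (arg->type != DEMO_TYPE_DEFAULT && arg->type != DEMO_TYPE_CRASH)
        return -1;
    demo_slots[arg->type] = *arg;
    return 0;
}
```

With this shape the hypercall takes a single pointer, and the type is validated in the same place as the rest of the structure.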
Keir Fraser
2006-Oct-17 14:49 UTC
Re: [Xen-devel] Re: [PATCH 01/04] Kexec / Kdump: Generic code
On 16/10/06 15:03, "Keir Fraser" <Keir.Fraser@cl.cam.ac.uk> wrote:
> The comment attached to every use of xchg() is dubious. We don't specify
> warn_unused_result on that function so there's no good reason for the
> compiler to complain about discarding the result. If it's a reproducible
> problem it needs investigating. We shouldn't work around a broken compiler.

Ok, this seems to be a new 'feature' of gcc4, which has widened the class of things it will complain about with -Wunused-value. Basically any cast of a value causes the compiler to see an expression whose value must not be discarded -- what a pain! This seems to me that it could mean that ignoring return value of any function becomes potentially dangerous, as the function may be a macro that includes a cast on the return value.

I think the fix is -Wno-unused-value. :-) It's about the least useful of the -Wunused-<foo> warnings anyway.

I'll check this in, so please just get rid of the if(xchg()) hacky workaround.

-- Keir
Mark Williamson
2006-Oct-17 15:05 UTC
Re: [Xen-devel] Re: [PATCH 01/04] Kexec / Kdump: Generic code
> > The comment attached to every use of xchg() is dubious. We don't specify
> > warn_unused_result on that function so there's no good reason for the
> > compiler to complain about discarding the result. If it's a reproducible
> > problem it needs investigating. We shouldn't work around a broken
> > compiler.
>
> Ok, this seems to be a new 'feature' of gcc4, which has widened the class
> of things it will complain about with -Wunused-value. Basically any cast of
> a value causes the compiler to see an expression whose value must not be
> discarded -- what a pain! This seems to me that it could mean that ignoring
> return value of any function becomes potentially dangerous, as the function
> may be a macro that includes a cast on the return value.
>
> I think the fix is -Wno-unused-value. :-) It's about the least useful of
> the -Wunused-<foo> warnings anyway.
>
> I'll check this in, so please just get rid of the if(xchg()) hacky
> workaround.

If you wanted to leave the warning on, you should be able to cast to (void) instead of using the if

(void)xchg(&kexec_crash_lock, 0);

in order to explicitly indicate the value is unused. Alternatively I imagine there's a gcc attribute somewhere to tell it not to worry about unused returns from xchg (like for printf)

Cheers,
Mark
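For readers who have not hit this warning, here is a small sketch of the pattern under discussion. my_xchg() is a hypothetical stand-in for the real xchg() macro, built on the GCC/Clang __atomic_exchange_n builtin purely for illustration; what it shares with the real one is that the result passes through a cast, which is what gcc 4.1 is reported in this thread to complain about when the value is dropped. The (void) cast in demo_unlock() is the explicit discard suggested above.

```c
#include <assert.h>

/*
 * Hypothetical stand-in for xchg(): the result is cast back to the
 * pointee type. __atomic_exchange_n is a GCC/Clang builtin, assumed
 * here only for the purposes of the sketch.
 */
#define my_xchg(ptr, val) \
    ((__typeof__(*(ptr)))__atomic_exchange_n((ptr), (val), __ATOMIC_SEQ_CST))

static int demo_lock = 1;   /* hypothetical lock word, 1 = held */

/* Release the lock, explicitly discarding the previous value. */
static void demo_unlock(void)
{
    (void)my_xchg(&demo_lock, 0);
    assert(demo_lock == 0); /* single-threaded sketch: lock word is now clear */
}

/* Try to take the lock; returns 1 on success, 0 if it was already held. */
static int demo_trylock(void)
{
    return my_xchg(&demo_lock, 1) == 0;
}
```

The (void) form documents intent at each call site, but it only helps where the caller knows the callee is a macro with a cast in it.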
Keir Fraser
2006-Oct-17 20:54 UTC
Re: [Xen-devel] Re: [PATCH 01/04] Kexec / Kdump: Generic code
On 17/10/06 4:05 pm, "Mark Williamson" <mark.williamson@cl.cam.ac.uk> wrote:
> If you wanted to leave the warning on, you should be able to cast to (void)
> instead of using the if
>
> (void)xchg(&kexec_crash_lock, 0);

This would be fine if the problem were known to be restricted to xchg(), but it's not. For example, on my build box ignoring the return value of strncmp() can in some cases trigger the same warning. This is because of some optimisation trickery which includes casting the return value of a memcpy() to char *. Because callers are usually unaware of such trickery, ignoring the return value of any function that produces a return value becomes hazardous.

> in order to explicitly indicate the value is unused. Alternatively I imagine
> there's a gcc attribute somewhere to tell it not to worry about unused
> returns from xchg (like for printf)

Which attribute would this be? There's one to force a warning, but I don't believe there's one to force the opposite. Anyway, this problem is caused by a cast of the return value, so it's not actually a property of the C function itself.

-- Keir
Mark Williamson
2006-Oct-17 22:38 UTC
Re: [Xen-devel] Re: [PATCH 01/04] Kexec / Kdump: Generic code
> This would be fine if the problem were known to be restricted to xchg(),
> but it's not. For example, on my build box ignoring the return value of
> strncmp() can in some cases trigger the same warning.

Does that happen in the tree at the moment? In any case, I guess it's not something we want to bite us in the bum in future!

> This is because of
> some optimisation trickery which includes casting the return value of a
> memcpy() to char *. Because callers are usually unaware of such trickery,
> ignoring the return value of any function that produces a return value
> becomes hazardous.

Oh sorry, I see. How icky.

> Which attribute would this be? There's one to force a warning, but I don't
> believe there's one to force the opposite. Anyway, this problem is caused
> by a cast of the return value, so it's not actually a property of the C
> function itself.

Ah, I hadn't understood that it was caused by the cast in the definition of xchg. What a pain!

I've not actually been able to reproduce this problem in test cases with gcc 4.0.0, so I assume it's a more recent addition - what's your compiler version?

Do you know if the warn_unused_result attribute will override the -Wno-unused-value and generate a warning anyhow? There's probably only a few functions we'd want this warning on - if anyone cared about switching these warnings off a fairly small patch could annotate the relevant ones.

Cheers,
Mark

-- 
Dave: Just a question. What use is a unicycle with no seat? And no pedals!
Mark: To answer a question with a question: What use is a skateboard?
Dave: Skateboards have wheels.
Mark: My wheel has a wheel!
On Tue, Oct 17, 2006 at 03:49:12PM +0100, Keir Fraser wrote:
> On 16/10/06 15:03, "Keir Fraser" <Keir.Fraser@cl.cam.ac.uk> wrote:
>
> > The comment attached to every use of xchg() is dubious. We don't specify
> > warn_unused_result on that function so there's no good reason for the
> > compiler to complain about discarding the result. If it's a reproducible
> > problem it needs investigating. We shouldn't work around a broken compiler.
>
> Ok, this seems to be a new 'feature' of gcc4, which has widened the class of
> things it will complain about with -Wunused-value. Basically any cast of a
> value causes the compiler to see an expression whose value must not be
> discarded -- what a pain! This seems to me that it could mean that ignoring
> return value of any function becomes potentially dangerous, as the function
> may be a macro that includes a cast on the return value.
>
> I think the fix is -Wno-unused-value. :-) It's about the least useful of the
> -Wunused-<foo> warnings anyway.
>
> I'll check this in, so please just get rid of the if(xchg()) hacky
> workaround.

I have to say that I entirely disagree with your solution. I think the solution is to get rid of -Werror. But I will remove the if(xchg()) foo as you request.

-- 
Horms
H: http://www.vergenet.net/~horms/
W: http://www.valinux.co.jp/en/
On Tue, Oct 17, 2006 at 07:56:37AM +0100, Keir Fraser wrote:
> On 17/10/06 2:25 am, "Horms" <horms@verge.net.au> wrote:
>
> > TYPE_DEFAULT is kexec, TYPE_CRASH is kdump.
> >
> > It's just to avoid a from_guest. If you think that isn't worthwhile
> > we can fold it into the structural parameter.
>
> Yes please. This isn't performance critical. Folding will make the interface
> better match other hypercalls.

The main thing that I am concerned about is that, in the case that dom0 panics and we are asking the hypercall to take a dump for us, do we really want a copy_from_guest in that path? It seems to me that it's best to have that path as simple as possible. But I do agree that it comes at the expense of making the interface a bit unclean.

-- 
Horms
H: http://www.vergenet.net/~horms/
W: http://www.valinux.co.jp/en/
Keir Fraser
2006-Oct-18 07:08 UTC
Re: [Xen-devel] Re: [PATCH 01/04] Kexec / Kdump: Generic code
On 17/10/06 11:38 pm, "Mark Williamson" <mark.williamson@cl.cam.ac.uk> wrote:
> I've not actually been able to reproduce this problem in test cases with gcc
> 4.0.0, so I assume it's a more recent addition - what's your compiler
> version?

Gcc 4.1.1 definitely has the problem. The strncmp() error I saw was caused by its definition on Debian 3.1 (latest stable; hardly cutting edge).

> Do you know if the warn_unused_result attribute will override
> the -Wno-unused-value and generate a warning anyhow? There's probably only a
> few functions we'd want this warning on - if anyone cared about switching
> these warnings off a fairly small patch could annotate the relevant ones.

Warn_unused_result seems to trigger from a totally different code path that's unaffected by -Wno-unused-value.

I definitely think that unused-value is not as generally useful as unused-{function,label,variable}. unused-{value,parameter} both have a tendency for false positives.

-- Keir
Keir Fraser
2006-Oct-18 07:13 UTC
Re: [Xen-devel] Re: [PATCH 01/04] Kexec / Kdump: Generic code
On 18/10/06 1:56 am, "Horms" <horms@verge.net.au> wrote:
>> I'll check this in, so please just get rid of the if(xchg()) hacky
>> workaround.
>
> I have to say that I entirely disagree with your solution.
> I think the solution is to get rid of -Werror. But I will
> remove the if(xchg()) foo as you request.

Holy crap! Getting rid of Werror makes Wall much less useful. We're too lazy to scan build output for warnings if they don't break the build.

The GCC developers have basically broken the unused-value checking, at least for Linux-style code which is heavy on macro usage. If a particular warning option is no longer useful, the correct answer is to switch it off.

-- Keir
Keir Fraser
2006-Oct-18 07:15 UTC
[Xen-devel] Re: [PATCH 01/04] Kexec / Kdump: Generic code
On 18/10/06 2:00 am, "Horms" <horms@verge.net.au> wrote:
>> Yes please. This isn't performance critical. Folding will make the interface
>> better match other hypercalls.
>
> The main thing that I am concerned about is, that in the
> case that dom0 panics, and we are asking the hypercall to take a dump
> for us, do we really want a copy_from_guest in that path?
> It seems to me that it's best to have that path as simple as possible.
> But I do agree that it comes at the expense of making the interface
> a bit unclean.

We need to assume that kernel address space hasn't been too badly compromised, or we couldn't execute as far as making the hypercall. A failed copy_from_guest() in the crash hypercall could simply cause you to use the guest_cpu_user_regs() instead. So you still crash, but the crash site is now the hypercall call site.

-- Keir
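The fallback described above can be sketched in a few lines. This is only an illustration under stated assumptions: demo_regs is a cut-down, hypothetical register block, demo_copy_from_guest() stands in for copy_from_guest() (assumed here to return nonzero on failure), and demo_entry_regs stands in for what guest_cpu_user_regs() would return at hypercall entry.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical, cut-down register block for the sketch. */
struct demo_regs { unsigned long rip, rsp; };

/* Stand-in for copy_from_guest(): returns 0 on success, nonzero on failure.
 * A NULL pointer models an unreadable guest buffer. */
static int demo_copy_from_guest(struct demo_regs *dst,
                                const struct demo_regs *guest_ptr)
{
    if (guest_ptr == NULL)
        return 1;
    memcpy(dst, guest_ptr, sizeof(*dst));
    return 0;
}

/* Stand-in for guest_cpu_user_regs(): state saved at hypercall entry. */
static struct demo_regs demo_entry_regs = { 0x1000, 0x2000 };

/* Choose the registers to record in the dump: the guest-supplied panic
 * state if it can be read, otherwise the hypercall call site. */
static void demo_crash_regs(struct demo_regs *out,
                            const struct demo_regs *guest_ptr)
{
    assert(out != NULL);
    if (demo_copy_from_guest(out, guest_ptr) != 0)
        *out = demo_entry_regs;
}
```

Either way the dump is still taken; the only difference is which register state ends up recorded as the crash site.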
On Wed, Oct 18, 2006 at 08:13:49AM +0100, Keir Fraser wrote:
> On 18/10/06 1:56 am, "Horms" <horms@verge.net.au> wrote:
>
> >> I'll check this in, so please just get rid of the if(xchg()) hacky
> >> workaround.
> >
> > I have to say that I entirely disagree with your solution.
> > I think the solution is to get rid of -Werror. But I will
> > remove the if(xchg()) foo as you request.
>
> Holy crap! Getting rid of Werror makes Wall much less useful. We're too lazy
> to scan build output for warnings if they don't break the build.
>
> The GCC developers have basically broken the unused-value checking, at least
> for Linux-style code which is heavy on macro usage. If a particular warning
> option is no longer useful, the correct answer is to switch it off.

I'm quite happy with that argument. However I personally prefer a severity differentiation between warnings and errors, that is all.

-- 
Horms
H: http://www.vergenet.net/~horms/
W: http://www.valinux.co.jp/en/
On Wed, Oct 18, 2006 at 08:15:45AM +0100, Keir Fraser wrote:
> On 18/10/06 2:00 am, "Horms" <horms@verge.net.au> wrote:
>
> >> Yes please. This isn't performance critical. Folding will make the interface
> >> better match other hypercalls.
> >
> > The main thing that I am concerned about is, that in the
> > case that dom0 panics, and we are asking the hypercall to take a dump
> > for us, do we really want a copy_from_guest in that path?
> > It seems to me that it's best to have that path as simple as possible.
> > But I do agree that it comes at the expense of making the interface
> > a bit unclean.
>
> We need to assume that kernel address space hasn't been too badly
> compromised, or we couldn't execute as far as making the hypercall. A failed
> copy_from_guest() in the crash hypercall could simply cause you to use the
> guest_cpu_user_regs() instead. So you still crash, but the crash site is now
> the hypercall call site.

Indeed, that is true. I do think there is some value in avoiding guest_cpu_user_regs(), but it's theoretical at best. Magnus and I will fold the arguments as you requested.

-- 
Horms
H: http://www.vergenet.net/~horms/
W: http://www.valinux.co.jp/en/
Adam Heath
2006-Oct-18 15:49 UTC
Re: [Xen-devel] Re: [PATCH 01/04] Kexec / Kdump: Generic code
Keir Fraser wrote:
> On 18/10/06 1:56 am, "Horms" <horms@verge.net.au> wrote:
>
>>> I'll check this in, so please just get rid of the if(xchg()) hacky
>>> workaround.
>> I have to say that I entirely disagree with your solution.
>> I think the solution is to get rid of -Werror. But I will
>> remove the if(xchg()) foo as you request.
>
> Holy crap! Getting rid of Werror makes Wall much less useful. We're too lazy
> to scan build output for warnings if they don't break the build.

I wrote a script that I used when I was working on kaffe. It would scan the build output, and sort warnings based on the -W option that produces them. It'd then give output sorted by warning option, and file. Made it much easier to work with.

I should put that up someplace. Was really handy when a warning was in some .h file, so would be shown several thousand times in the build log, but was really only one problem.