kexec: framework and i386

Here is a first cut of kexec for dom0/xen, which will actually
kexec the physical machine from xen. The approach taken is
to move the architecture-dependent kexec code into a new hypercall.

Some notes:
  * machine_kexec_cleanup() and machine_kexec_prepare() don't do
    anything in i386. So while this patch adds a framework for them,
    I am not sure what parameters are needed at this stage.
  * Only works for UP, as machine_shutdown is not implemented yet
  * kexecing into xen does not seem to work, I think that
    kexec-tools needs updating, but I have not investigated yet
  * I don't believe that kdump works yet
  * This patch was prepared against xen-unstable.hg 9514
    As of today (9574) two new hypercalls have been added.
    I rediffed and moved the kexec hypercall to 33. However
    this exceeds hypercall_NR, which is currently 32.
    I tried increasing this, but the dom0 now crashes
    in entry.S on init. Even after rebuilding both xen and the kernel
    completely from scratch after a make distclean. Help!!

Prepared with the assistance of my colleague Magnus Damm

Signed-Off-By: Horms <horms@verge.net.au>

--- from-0001/linux-2.6-xen-sparse/arch/i386/Kconfig
+++ to-work/linux-2.6-xen-sparse/arch/i386/Kconfig 2006-04-03 15:13:38.000000000 +0900
@@ -726,7 +726,7 @@ source kernel/Kconfig.hz
 
 config KEXEC
 	bool "kexec system call (EXPERIMENTAL)"
-	depends on EXPERIMENTAL && !X86_XEN
+	depends on EXPERIMENTAL
 	help
 	  kexec is a system call that implements the ability to shutdown your
 	  current kernel, and to start another kernel.  It is like a reboot
--- /dev/null
+++ to-work/linux-2.6-xen-sparse/arch/i386/kernel/crash-xen.c 2006-04-03 15:13:38.000000000 +0900
@@ -0,0 +1,15 @@
+/*
+ * Architecture specific (i386-xen) functions for kexec based crash dumps.
+ *
+ * Created by: Horms <horms@verge.net.au>
+ *
+ */
+
+#include <linux/kernel.h> /* For printk */
+
+void machine_crash_shutdown(struct pt_regs *regs)
+{
+	/* XXX: This should do something */
+	printk("xen-kexec: Need to turn of other CPUS in "
+	       "machine_crash_shutdown()\n");
+}
--- /dev/null
+++ to-work/linux-2.6-xen-sparse/arch/i386/kernel/machine_kexec-xen.c 2006-04-07 12:59:51.000000000 +0900
@@ -0,0 +1,80 @@
+/*
+ * machine_kexec-xen.c - handle transition of Linux booting another kernel
+ *
+ * Created By: Horms <horms@verge.net.au>
+ *
+ * Losely based on arch/i386/kernel/machine_kexec-xen.c
+ */
+
+#include <linux/kexec.h>
+#include <xen/interface/kexec.h>
+#include <linux/mm.h>
+#include <asm/hypercall.h>
+
+const extern unsigned char relocate_new_kernel[];
+extern unsigned int relocate_new_kernel_size;
+static kexec_arg_t hypercall_arg;
+
+/*
+ * A architecture hook called to validate the
+ * proposed image and prepare the control pages
+ * as needed. The pages for KEXEC_CONTROL_CODE_SIZE
+ * have been allocated, but the segments have yet
+ * been copied into the kernel.
+ *
+ * Do what every setup is needed on image and the
+ * reboot code buffer to allow us to avoid allocations
+ * later.
+ *
+ * Currently nothing.
+ */
+int machine_kexec_prepare(struct kimage *image)
+{
+	return 0;
+}
+
+/*
+ * Undo anything leftover by machine_kexec_prepare
+ * when an image is freed.
+ */
+void machine_kexec_cleanup(struct kimage *image)
+{
+}
+
+/*
+ * Do not allocate memory (or fail in any way) in machine_kexec().
+ * We are past the point of no return, committed to rebooting now.
+ */
+NORET_TYPE void machine_kexec(struct kimage *image)
+{
+	kimage_entry_t *ptr, entry;
+
+	/*
+	 * Translate addresses inside head from physcical to machine
+	 * In practice, this only needs to change the pointer to
+	 * indirection pages as non-indirected pages are relative.
+	 */
+	ptr = &image->head;
+	while ((entry = *ptr) && !(entry & IND_DONE)) {
+		if (!(entry & IND_DESTINATION))
+			*ptr = phys_to_machine(entry & PAGE_MASK) |
+				(entry & ~PAGE_MASK);
+
+		if (entry & IND_INDIRECTION)
+			ptr = __va(entry & PAGE_MASK);
+		else
+			ptr++;
+	}
+
+	/* Set up arguments to hypercall */
+	hypercall_arg.u.kexec.indirection_page = image->head;
+	hypercall_arg.u.kexec.reboot_code_buffer =
+		pfn_to_mfn(page_to_pfn(image->control_code_page)) << PAGE_SHIFT;
+	hypercall_arg.u.kexec.start_address = image->start;
+	hypercall_arg.u.kexec.relocate_new_kernel = relocate_new_kernel;
+	hypercall_arg.u.kexec.relocate_new_kernel_size =
+		relocate_new_kernel_size;
+
+	/* Let Xen do the rest of the work */
+	HYPERVISOR_kexec(KEXEC_CMD_kexec, &hypercall_arg);
+}
--- from-0001/linux-2.6-xen-sparse/drivers/xen/core/reboot.c
+++ to-work/linux-2.6-xen-sparse/drivers/xen/core/reboot.c 2006-04-03 15:13:38.000000000 +0900
@@ -38,6 +38,11 @@ extern void ctrl_alt_del(void);
  */
 #define SHUTDOWN_HALT 4
 
+void machine_shutdown(void)
+{
+	printk("machine_shutdown: does nothing\n");
+}
+
 void machine_emergency_restart(void)
 {
 	/* We really want to get pending console data out before we die. */
--- from-0001/linux-2.6-xen-sparse/include/asm-i386/mach-xen/asm/hypercall.h
+++ to-work/linux-2.6-xen-sparse/include/asm-i386/mach-xen/asm/hypercall.h 2006-04-06 11:00:03.000000000 +0900
@@ -37,6 +37,8 @@
 # error "please don't include this file directly"
 #endif
 
+#include <xen/interface/kexec.h>
+
 #define __STR(x) #x
 #define STR(x) __STR(x)
 
@@ -329,6 +331,13 @@ HYPERVISOR_nmi_op(
 	return _hypercall2(int, nmi_op, op, arg);
 }
 
+static inline int
+HYPERVISOR_kexec(
+	unsigned long op, kexec_arg_t * arg)
+{
+	return _hypercall2(int, kexec_op, op, arg);
+}
+
 #endif /* __HYPERCALL_H__ */
 
 /*
Binary files /dev/null and to-work/linux-2.6.16-xen/kernel/.kexec.c.swp differ
--- from-0001/xen/arch/x86/x86_32/Makefile
+++ to-work/xen/arch/x86/x86_32/Makefile 2006-04-03 16:25:31.000000000 +0900
@@ -5,6 +5,7 @@ obj-y += entry.o
 obj-y += mm.o
 obj-y += seg_fixup.o
 obj-y += traps.o
+obj-y += machine_kexec.o
 
 obj-$(supervisor_mode_kernel) += supervisor_mode_kernel.o
--- from-0001/xen/arch/x86/x86_32/entry.S
+++ to-work/xen/arch/x86/x86_32/entry.S 2006-04-04 13:02:36.000000000 +0900
@@ -648,6 +648,7 @@ ENTRY(hypercall_table)
         .long do_acm_op
         .long do_nmi_op
         .long do_arch_sched_op
+        .long do_kexec /* 30 */
         .rept NR_hypercalls-((.-hypercall_table)/4)
         .long do_ni_hypercall
         .endr
@@ -683,6 +684,7 @@ ENTRY(hypercall_args_table)
         .byte 1 /* do_acm_op */
         .byte 2 /* do_nmi_op */
         .byte 2 /* do_arch_sched_op */
+        .byte 2 /* do_kexec */ /* 30 */
         .rept NR_hypercalls-(.-hypercall_args_table)
         .byte 0 /* do_ni_hypercall */
         .endr
--- /dev/null
+++ to-work/xen/arch/x86/x86_32/machine_kexec.c 2006-04-07 12:44:16.000000000 +0900
@@ -0,0 +1,168 @@
+/******************************************************************************
+ * arch/x86/machine_kexec.c
+ *
+ * Created By: Horms
+ *
+ * Based heavily on arch/i386/machine_kexec.c from Linux 2.6.16
+ */
+
+#include <xen/config.h>
+#include <xen/types.h>
+#include <xen/domain_page.h>
+#include <xen/timer.h>
+#include <xen/sched.h>
+#include <asm/page.h>
+#include <asm/flushtlb.h>
+#include <public/xen.h>
+#include <public/kexec.h>
+
+typedef asmlinkage void (*relocate_new_kernel_t)(
+    unsigned long indirection_page,
+    unsigned long reboot_code_buffer,
+    unsigned long start_address,
+    unsigned int has_pae);
+
+#define PAGE_ALIGNED __attribute__ ((__aligned__(PAGE_SIZE)))
+
+#define L0_ATTR (_PAGE_PRESENT | _PAGE_RW | _PAGE_ACCESSED | _PAGE_DIRTY)
+#define L1_ATTR (_PAGE_PRESENT | _PAGE_RW | _PAGE_ACCESSED | _PAGE_DIRTY)
+#define L2_ATTR (_PAGE_PRESENT)
+
+#ifndef CONFIG_X86_PAE
+
+static u32 pgtable_level1[L1_PAGETABLE_ENTRIES] PAGE_ALIGNED;
+
+static void identity_map_page(unsigned long address)
+{
+    unsigned long mfn;
+    u32 *pgtable_level2;
+
+    /* Find the current page table */
+    mfn = read_cr3() >> PAGE_SHIFT;
+    pgtable_level2 = map_domain_page(mfn);
+
+    /* Identity map the page table entry */
+    pgtable_level1[l1_table_offset(address)] = address | L0_ATTR;
+    pgtable_level2[l2_table_offset(address)] = __pa(pgtable_level1) | L1_ATTR;
+
+    /* Flush the tlb so the new mapping takes effect.
+     * Global tlb entries are not flushed but that is not an issue.
+     */
+    write_cr3(mfn << PAGE_SHIFT);
+
+    unmap_domain_page(pgtable_level2);
+}
+
+#else
+static u64 pgtable_level1[L1_PAGETABLE_ENTRIES] PAGE_ALIGNED;
+static u64 pgtable_level2[L2_PAGETABLE_ENTRIES] PAGE_ALIGNED;
+
+static void identity_map_page(unsigned long address)
+{
+    int mfn;
+    intpte_t *pgtable_level3;
+
+    /* Find the current page table */
+    mfn = read_cr3() >> PAGE_SHIFT;
+    pgtable_level3 = map_domain_page(mfn);
+
+    /* Identity map the page table entry */
+    pgtable_level1[l1_table_offset(address)] = address | L0_ATTR;
+    pgtable_level2[l2_table_offset(address)] = __pa(pgtable_level1) | L1_ATTR;
+    set_64bit(&pgtable_level3[l3_table_offset(address)],
+              __pa(pgtable_level2) | L2_ATTR);
+
+    /* Flush the tlb so the new mapping takes effect.
+     * Global tlb entries are not flushed but that is not an issue.
+     */
+    load_cr3(mfn << PAGE_SHIFT);
+
+    unmap_domain_page(pgtable_level3);
+}
+#endif
+
+static void kexec_load_segments(void)
+{
+#define __SSTR(X) #X
+#define SSTR(X) __SSTR(X)
+    __asm__ __volatile__ (
+        "\tljmp $"SSTR(__HYPERVISOR_CS)",$1f\n"
+        "\t1:\n"
+        "\tmovl $"SSTR(__HYPERVISOR_DS)",%%eax\n"
+        "\tmovl %%eax,%%ds\n"
+        "\tmovl %%eax,%%es\n"
+        "\tmovl %%eax,%%fs\n"
+        "\tmovl %%eax,%%gs\n"
+        "\tmovl %%eax,%%ss\n"
+        ::: "eax", "memory");
+#undef SSTR
+#undef __SSTR
+}
+
+#define kexec_load_idt(dtr) __asm__ __volatile("lidt %0"::"m" (*dtr))
+static void kexec_set_idt(void *newidt, __u16 limit)
+{
+    struct Xgt_desc_struct curidt;
+
+    /* ia32 supports unaliged loads & stores */
+    curidt.size = limit;
+    curidt.address = (unsigned long)newidt;
+
+    kexec_load_idt(&curidt);
+
+};
+
+#define kexec_load_gdt(dtr) __asm__ __volatile("lgdt %0"::"m" (*dtr))
+static void kexec_set_gdt(void *newgdt, __u16 limit)
+{
+    struct Xgt_desc_struct curgdt;
+
+    /* ia32 supports unaligned loads & stores */
+    curgdt.size = limit;
+    curgdt.address = (unsigned long)newgdt;
+
+    kexec_load_gdt(&curgdt);
+};
+
+int machine_kexec_prepare(struct kexec_arg *arg)
+{
+    return 0;
+}
+
+void machine_kexec_cleanup(struct kexec_arg *arg)
+{
+}
+
+void machine_kexec(struct kexec_arg *arg)
+{
+    relocate_new_kernel_t rnk;
+
+    local_irq_disable();
+
+    identity_map_page(arg->u.kexec.reboot_code_buffer);
+
+    copy_from_user((void *)arg->u.kexec.reboot_code_buffer,
+                   arg->u.kexec.relocate_new_kernel,
+                   arg->u.kexec.relocate_new_kernel_size);
+
+    kexec_load_segments();
+
+    kexec_set_gdt(__va(0),0);
+
+    kexec_set_idt(__va(0),0);
+
+    rnk = (relocate_new_kernel_t) arg->u.kexec.reboot_code_buffer;
+
+    (*rnk)(arg->u.kexec.indirection_page, arg->u.kexec.reboot_code_buffer,
+           arg->u.kexec.start_address, cpu_has_pae);
+}
+
+/*
+ * Local variables:
+ * mode: C
+ * c-set-style: "BSD"
+ * c-basic-offset: 4
+ * tab-width: 4
+ * indent-tabs-mode: nil
+ * End:
+ */
--- from-0001/xen/common/Makefile
+++ to-work/xen/common/Makefile 2006-04-03 15:13:38.000000000 +0900
@@ -24,6 +24,7 @@ obj-y += trace.o
 obj-y += timer.o
 obj-y += vsprintf.o
 obj-y += xmalloc.o
+obj-y += kexec.o
 
 obj-$(perfc) += perfc.o
 obj-$(crash_debug) += gdbstub.o
--- /dev/null
+++ to-work/xen/common/kexec.c 2006-04-07 13:06:54.000000000 +0900
@@ -0,0 +1,54 @@
+/*
+ * Achitecture independent kexec code for Xen
+ *
+ * At this statge, just a switch for the kexec hypercall into
+ * architecture dependent code.
+ *
+ * Created By: Horms <horms@verge.net.au>
+ */
+
+#include <xen/lib.h>
+#include <xen/errno.h>
+#include <xen/guest_access.h>
+#include <public/xen.h>
+#include <public/kexec.h>
+
+extern int machine_kexec_prepare(struct kexec_arg *arg);
+extern void machine_kexec_cleanup(struct kexec_arg *arg);
+extern void machine_kexec(struct kexec_arg *arg);
+
+int do_kexec(unsigned long op,
+             GUEST_HANDLE(kexec_arg_t) uarg)
+{
+    struct kexec_arg arg;
+
+    if ( unlikely(copy_from_guest(&arg, uarg, 1) != 0) )
+    {
+        printk("do_kexec: __copy_from_guest failed");
+        return -EFAULT;
+    }
+
+    switch(op) {
+    case KEXEC_CMD_kexec:
+        machine_kexec(&arg);
+        return -EINVAL; /* Not Reached */
+    case KEXEC_CMD_kexec_prepare:
+        return machine_kexec_prepare(&arg);
+    case KEXEC_CMD_kexec_cleanup:
+        machine_kexec_cleanup(&arg);
+        return 0;
+    }
+
+    return -EINVAL;
+}
+
+/*
+ * Local variables:
+ * mode: C
+ * c-set-style: "BSD"
+ * c-basic-offset: 4
+ * tab-width: 4
+ * indent-tabs-mode: nil
+ * End:
+ */
+
--- from-0001/xen/include/asm-x86/hypercall.h
+++ to-work/xen/include/asm-x86/hypercall.h 2006-04-07 13:05:06.000000000 +0900
@@ -6,6 +6,7 @@
 #define __ASM_X86_HYPERCALL_H__
 
 #include <public/physdev.h>
+#include <public/kexec.h>
 
 extern long
 do_set_trap_table(
@@ -79,6 +80,11 @@ extern long arch_do_vcpu_op(
     int cmd,
     struct vcpu *v,
     GUEST_HANDLE(void) arg);
 
+extern int
+do_kexec(
+    unsigned long op,
+    GUEST_HANDLE(kexec_arg_t) uarg);
+
 #ifdef __x86_64__
 
 extern long
--- /dev/null
+++ to-work/xen/include/public/kexec.h 2006-04-07 12:44:43.000000000 +0900
@@ -0,0 +1,39 @@
+/*
+ * kexec.h: Xen kexec
+ *
+ * Created By: Horms <horms@verge.net.au>
+ */
+
+#ifndef _XEN_PUBLIC_KEXEC_H
+#define _XEN_PUBLIC_KEXEC_H
+
+/*
+ * Scratch space for passing arguments to the kexec hypercall
+ */
+typedef struct kexec_arg {
+    union {
+        struct {
+            unsigned long data; /* Not sure what this should be yet */
+        } helper;
+        struct {
+            unsigned long indirection_page;
+            unsigned long reboot_code_buffer;
+            unsigned long start_address;
+            const char *relocate_new_kernel;
+            unsigned int relocate_new_kernel_size;
+        } kexec;
+    } u;
+} kexec_arg_t;
+DEFINE_GUEST_HANDLE(kexec_arg_t);
+
+#endif
+
+/*
+ * Local variables:
+ * mode: C
+ * c-set-style: "BSD"
+ * c-basic-offset: 4
+ * tab-width: 4
+ * indent-tabs-mode: nil
+ * End:
+ */
--- from-0001/xen/include/public/xen.h
+++ to-work/xen/include/public/xen.h 2006-04-04 13:29:54.000000000 +0900
@@ -60,6 +60,7 @@
 #define __HYPERVISOR_acm_op 27
 #define __HYPERVISOR_nmi_op 28
 #define __HYPERVISOR_sched_op 29
+#define __HYPERVISOR_kexec_op 30
 
 /*
  * VIRTUAL INTERRUPTS
@@ -206,6 +207,13 @@ DEFINE_GUEST_HANDLE(mmuext_op_t);
 #define VMASST_TYPE_writable_pagetables 2
 #define MAX_VMASST_TYPE 2
 
+/*
+ * Commands to HYPERVISOR_kexec().
+ */
+#define KEXEC_CMD_kexec 0
+#define KEXEC_CMD_kexec_prepare 1
+#define KEXEC_CMD_kexec_cleanup 2
+
 #ifndef __ASSEMBLY__
 
 typedef uint16_t domid_t;
diff -r 0010df11836d buildconfigs/linux-defconfig_xen_x86_32
--- a/buildconfigs/linux-defconfig_xen_x86_32 Fri Apr 7 00:32:54 2006 +0100
+++ b/buildconfigs/linux-defconfig_xen_x86_32 Fri Apr 7 14:54:45 2006 +0900
@@ -184,6 +184,7 @@ CONFIG_HZ_100=y
 # CONFIG_HZ_250 is not set
 # CONFIG_HZ_1000 is not set
 CONFIG_HZ=100
+CONFIG_KEXEC=y
 # CONFIG_CRASH_DUMP is not set
 CONFIG_PHYSICAL_START=0x100000
 CONFIG_HOTPLUG_CPU=y

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel
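
The hypercall-number problem mentioned in the notes of the patch above can be illustrated outside Xen. What follows is a minimal sketch, assuming a dispatch table with 32 slots (the limit quoted in the message). The names `hypercall_table`, `table_init`, `table_register`, `dispatch`, `do_ni` and `do_kexec_stub` are hypothetical stand-ins and are not the code in entry.S. With 32 slots the valid numbers run from 0 to 31, so 33 cannot be registered until the table is enlarged. The sketch only shows why 33 does not fit; it says nothing about the dom0 crash reported after raising the limit.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

#define NR_HYPERCALLS 32    /* limit quoted in the message; assumed here */

typedef long (*hypercall_fn)(unsigned long op);

/* Placeholder handler for unused slots (hypothetical name). */
static long do_ni(unsigned long op)
{
    (void)op;
    return -ENOSYS;
}

/* Stand-in for a real kexec handler (hypothetical). */
static long do_kexec_stub(unsigned long op)
{
    (void)op;
    return 0;
}

static hypercall_fn hypercall_table[NR_HYPERCALLS];

/* Fill every slot with the "not implemented" handler. */
static void table_init(void)
{
    for (size_t i = 0; i < NR_HYPERCALLS; i++)
        hypercall_table[i] = do_ni;
}

/* Register a handler; fails if nr does not fit in the table. */
static int table_register(unsigned int nr, hypercall_fn fn)
{
    assert(fn != NULL);
    if (nr >= NR_HYPERCALLS)
        return -EINVAL;
    hypercall_table[nr] = fn;
    return 0;
}

/* Dispatch with a bounds check, so an oversized number is rejected
 * rather than indexing past the end of the table. */
static long dispatch(unsigned int nr, unsigned long op)
{
    if (nr >= NR_HYPERCALLS)
        return -ENOSYS;
    return hypercall_table[nr](op);
}
```

After `table_init()`, registering `do_kexec_stub` at 30 succeeds, while registering it at 33 is refused.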
On Fri, Apr 07, 2006 at 04:42:36PM +0900, Horms wrote:
> kexec: framework and i386
>
> Here is a first cut of kexec for dom0/xen, which will actually
> kexec the physical machine from xen. The approach taken is
> to move the architecture-dependent kexec code into a new hypercall.
>
> Some notes:
>   * machine_kexec_cleanup() and machine_kexec_prepare() don't do
>     anything in i386. So while this patch adds a framework for them,
>     I am not sure what parameters are needed at this stage.
>   * Only works for UP, as machine_shutdown is not implemented yet
>   * kexecing into xen does not seem to work, I think that
>     kexec-tools needs updating, but I have not investigated yet
>   * I don't believe that kdump works yet
>   * This patch was prepared against xen-unstable.hg 9514
>     As of today (9574) two new hypercalls have been added.
>     I rediffed and moved the kexec hypercall to 33. However
>     this exceeds hypercall_NR, which is currently 32.
>     I tried increasing this, but the dom0 now crashes
>     in entry.S on init. Even after rebuilding both xen and the kernel
>     completely from scratch after a make distclean. Help!!
>

I was looking at doing the same but focusing more on kdump initially.
However, the more I understood kexec/kdump and the more I understood the
hypervisor and xend, I realized they both were solving the same problem in
two different ways.

Instead I was trying to focus on a domain0 failover/backup copy. By
utilizing xend to set up all the infrastructure of loading the
image/initrd, all I had to do was set a flag in the hypervisor letting
it know this was a second copy of another domain0.

Upon reboot/crash, the hypervisor could then look to see if there is a
second copy of a domain0 and if so run that copy (which would perform the
same functionality as kexec AND kdump - minus the memory hole).

This has the advantage (if done correctly) of not having to reboot the
domainU kernels (which is a _huge_ win). The only penalty is dealing with
the couple of seconds when the domain0 switches block/net driver control
to the other domain0 and any dropped transactions.

The infrastructure in xen is there, I am slowly weeding through the lower
layers to set the right bits and such. Unfortunately, I can't commit all
my time to this little project but this is the direction I am trying to
head towards. (Any help would be great!)

Like I said, this is my 2 cents. I just thought this approach would be a
better fit with xen, than trying to drag the whole kexec/kdump layer
inside the hypervisor. Opinions are welcomed.

Cheers,
Don
Hi,

> Here is a first cut of kexec for dom0/xen, which will actually
> kexec the physical machine from xen. The approach taken is
> to move the architecture-dependent kexec code into a new hypercall.

First you need some more security checks. On a first quick look it
seems you can zap and take over the whole machine from within a domU by
kexec-booting the machine.

Second I think we'll need a new kexec flag to indicate we'll go zap the
physical machine, not the virtual machine. I'm looking into the latter,
and I think we'll be able to do both at some point in the future. Maybe
it is enough to care about dom0 (physical machine kexec) vs. domU
(virtual machine kexec) only though. We certainly don't want to allow
domUs to kexec the whole machine, and virtual machine kexec for dom0
doesn't make that much sense given how tightly xen and dom0 work
hand-in-hand.

> * kexecing into xen does not seem to work, I think that
>   kexec-tools needs updating, but I have not investigated yet

Yep, actually _a lot_ of the kexec magic happens in userspace.

cheers,

  Gerd

-- 
Gerd 'just married' Hoffmann <kraxel@suse.de>
I'm the hacker formerly known as Gerd Knorr.
http://www.suse.de/~kraxel/just-married.jpeg
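
The two points raised above, a privilege check and a flag selecting physical-machine kexec, can be sketched together. This is a minimal illustration under stated assumptions, not code from the patch or from Xen: `struct mock_domain`, `KEXEC_FLAG_PHYSICAL` and `kexec_permitted` are hypothetical names, and a real hypervisor check would use Xen's own domain structure and privilege test.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical flag: the caller asks to kexec the physical machine. */
#define KEXEC_FLAG_PHYSICAL (1u << 0)

/* Minimal stand-in for a domain descriptor (not Xen's struct domain). */
struct mock_domain {
    unsigned int domain_id;
    int is_privileged;      /* non-zero for dom0 in this sketch */
};

/*
 * Decide whether a kexec request may proceed.
 *   0        physical-machine kexec requested by a privileged domain
 *   -EPERM   physical-machine kexec requested by an unprivileged domain
 *   -ENOSYS  virtual-machine kexec, which this sketch does not implement
 */
static int kexec_permitted(const struct mock_domain *d, unsigned int flags)
{
    assert(d != NULL);

    if (flags & KEXEC_FLAG_PHYSICAL)
        return d->is_privileged ? 0 : -EPERM;

    return -ENOSYS;
}
```

A check of this kind would run before any command dispatch, so that a domU request for a physical-machine kexec fails early with `-EPERM`.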
On Fri, Apr 07, 2006 at 05:09:15PM +0200, Gerd Hoffmann wrote:
> Hi,
>
> > Here is a first cut of kexec for dom0/xen, which will actually
> > kexec the physical machine from xen. The approach taken is
> > to move the architecture-dependent kexec code into a new hypercall.
>
> First you need some more security checks. On a first quick look it
> seems you can zap and take over the whole machine from within a domU by
> kexec-booting the machine.

Yes, I think you are right, I had completely forgotten about that.

> Second I think we'll need a new kexec flag to indicate we'll go zap the
> physical machine, not the virtual machine. I'm looking into the latter,
> and I think we'll be able to do both at some point in the future. Maybe
> it is enough to care about dom0 (physical machine kexec) vs. domU
> (virtual machine kexec) only though. We certainly don't want to allow
> domUs to kexec the whole machine, and virtual machine kexec for dom0
> doesn't make that much sense given how tightly xen and dom0 work
> hand-in-hand.

Sounds fine by me. The focus of what I was trying to achieve is to zap the
entire physical machine, which is what the current code does. I am
actually most interested in kdump, though it's not working yet.

In any case a flag makes perfect sense. Though it might make sense to add
it when more flexible incarnations of kexec are added.

> > * kexecing into xen does not seem to work, I think that
> >   kexec-tools needs updating, but I have not investigated yet
>
> Yep, actually _a lot_ of the kexec magic happens in userspace.

Yes, I became aware of that along the way. I'm pretty confident that
the way I have done things, if you fixed up user-space kexec so that
linux -> xen worked, then xen -> xen would also work.

-- 
Horms
Hirokazu Takahashi
2006-Apr-10 05:09 UTC
Re: [Xen-devel] [PATCH]: kexec: framework and i386
Hi Don,

> > kexec: framework and i386
> >
> > Here is a first cut of kexec for dom0/xen, which will actually
> > kexec the physical machine from xen. The approach taken is
> > to move the architecture-dependent kexec code into a new hypercall.
> >
> > Some notes:
> >   * machine_kexec_cleanup() and machine_kexec_prepare() don't do
> >     anything in i386. So while this patch adds a framework for them,
> >     I am not sure what parameters are needed at this stage.
> >   * Only works for UP, as machine_shutdown is not implemented yet
> >   * kexecing into xen does not seem to work, I think that
> >     kexec-tools needs updating, but I have not investigated yet
> >   * I don't believe that kdump works yet
> >   * This patch was prepared against xen-unstable.hg 9514
> >     As of today (9574) two new hypercalls have been added.
> >     I rediffed and moved the kexec hypercall to 33. However
> >     this exceeds hypercall_NR, which is currently 32.
> >     I tried increasing this, but the dom0 now crashes
> >     in entry.S on init. Even after rebuilding both xen and the kernel
> >     completely from scratch after a make distclean. Help!!
> >
>
> I was looking at doing the same but focusing more on kdump initially.
> However, the more I understood kexec/kdump and the more I understood the
> hypervisor and xend, I realized they both were solving the same problem in
> two different ways.
>
> Instead I was trying to focus on a domain0 failover/backup copy. By
> utilizing xend to set up all the infrastructure of loading the
> image/initrd, all I had to do was set a flag in the hypervisor letting
> it know this was a second copy of another domain0.
>
> Upon reboot/crash, the hypervisor could then look to see if there is a
> second copy of a domain0 and if so run that copy (which would perform the
> same functionality as kexec AND kdump - minus the memory hole).
>
> This has the advantage (if done correctly) of not having to reboot the
> domainU kernels (which is a _huge_ win). The only penalty is dealing with
> the couple of seconds when the domain0 switches block/net driver control
> to the other domain0 and any dropped transactions.
>
> The infrastructure in xen is there, I am slowly weeding through the lower
> layers to set the right bits and such. Unfortunately, I can't commit all
> my time to this little project but this is the direction I am trying to
> head towards. (Any help would be great!)
>
> Like I said, this is my 2 cents. I just thought this approach would be a
> better fit with xen, than trying to drag the whole kexec/kdump layer
> inside the hypervisor. Opinions are welcomed.
>
> Cheers,
> Don

Would you let me confirm that my understanding is correct?

You prefer the kexec/kdump approach for taking over a crashed domain0
to the HA approach where the backup domain stands by.
This is because the former can reset the whole hardware
while it would be harder with the latter, right?

Thanks,
Hirokazu Takahashi.
On Mon, Apr 10, 2006 at 02:09:17PM +0900, Hirokazu Takahashi wrote:
> Hi Don,
>
> > > kexec: framework and i386
> > >
> > > Here is a first cut of kexec for dom0/xen, which will actually
> > > kexec the physical machine from xen. The approach taken is
> > > to move the architecture-dependent kexec code into a new hypercall.
> > >
> > > Some notes:
> > >   * machine_kexec_cleanup() and machine_kexec_prepare() don't do
> > >     anything in i386. So while this patch adds a framework for them,
> > >     I am not sure what parameters are needed at this stage.
> > >   * Only works for UP, as machine_shutdown is not implemented yet
> > >   * kexecing into xen does not seem to work, I think that
> > >     kexec-tools needs updating, but I have not investigated yet
> > >   * I don't believe that kdump works yet
> > >   * This patch was prepared against xen-unstable.hg 9514
> > >     As of today (9574) two new hypercalls have been added.
> > >     I rediffed and moved the kexec hypercall to 33. However
> > >     this exceeds hypercall_NR, which is currently 32.
> > >     I tried increasing this, but the dom0 now crashes
> > >     in entry.S on init. Even after rebuilding both xen and the kernel
> > >     completely from scratch after a make distclean. Help!!
> > >
> >
> > I was looking at doing the same but focusing more on kdump initially.
> > However, the more I understood kexec/kdump and the more I understood the
> > hypervisor and xend, I realized they both were solving the same problem in
> > two different ways.
> >
> > Instead I was trying to focus on a domain0 failover/backup copy. By
> > utilizing xend to set up all the infrastructure of loading the
> > image/initrd, all I had to do was set a flag in the hypervisor letting
> > it know this was a second copy of another domain0.
> >
> > Upon reboot/crash, the hypervisor could then look to see if there is a
> > second copy of a domain0 and if so run that copy (which would perform the
> > same functionality as kexec AND kdump - minus the memory hole).
> >
> > This has the advantage (if done correctly) of not having to reboot the
> > domainU kernels (which is a _huge_ win). The only penalty is dealing with
> > the couple of seconds when the domain0 switches block/net driver control
> > to the other domain0 and any dropped transactions.
> >
> > The infrastructure in xen is there, I am slowly weeding through the lower
> > layers to set the right bits and such. Unfortunately, I can't commit all
> > my time to this little project but this is the direction I am trying to
> > head towards. (Any help would be great!)
> >
> > Like I said, this is my 2 cents. I just thought this approach would be a
> > better fit with xen, than trying to drag the whole kexec/kdump layer
> > inside the hypervisor. Opinions are welcomed.
> >
> > Cheers,
> > Don
>
>
> Would you let me confirm that my understanding is correct?
>
> You prefer the kexec/kdump approach for taking over a crashed domain0
> to the HA approach where the backup domain stands by.
> This is because the former can reset the whole hardware
> while it would be harder with the latter, right?
>

Actually the opposite. I prefer the HA approach over kexec/kdump. It
seemed like it gave more flexibility (reset dom0 or the whole machine). As
much as I would like to see kexec/kdump in xen, for some reason it just
doesn't make sense to me.

Cheers,
Don

>
> Thanks,
> Hirokazu Takahashi.
Hi Don, Hi all,

The key reason why I think that kexec/kdump does make sense for xen,
at least to some extent, is for the case where the hypervisor goes
into a bad state, and you actually want to get rid of it and kdump
into something else for forensics.

There is also the advantage that by kexecing xen, you get access to the
entire physical machine, either for crash-dump analysis, or because
*gasp* you want to get out of xen for some other crazy reason :)

And, on hardware that takes forever and a day to reboot, I believe that
doing a kexec will be quite useful for hypervisor development.

I would also like to note that, while my patch does involve moving parts
of kexec/kdump into the hypervisor, and more similar parts need to be
added in order to support other architectures, it is by no means all of
kexec/kdump.

-- 
Horms
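
As an illustration of the part that stays on the dom0 side in this approach, the loop in machine_kexec() that rewrites the kimage entry list from pseudo-physical to machine addresses can be modelled in user space. This is a minimal sketch under stated assumptions: the `IND_*` values follow Linux's kexec.h as far as I recall, and `mock_mem`, `mock_p2m`, `mock_phys_to_machine`, `mock_va` and `translate_entries` are hypothetical stand-ins for guest memory, the p2m table, phys_to_machine() and __va(). As in the patch, destination entries are left untouched and indirection pointers are followed using the pre-translation address.

```c
#include <assert.h>
#include <stddef.h>

#define PAGE_SHIFT 12
#define PAGE_SIZE  (1UL << PAGE_SHIFT)
#define PAGE_MASK  (~(PAGE_SIZE - 1))

/* Entry flags kept in the low bits (values assumed from Linux). */
#define IND_DESTINATION 0x1UL
#define IND_INDIRECTION 0x2UL
#define IND_DONE        0x4UL
#define IND_SOURCE      0x8UL

#define NR_PAGES         8   /* tiny mock address space */
#define ENTRIES_PER_PAGE 4   /* tiny mock indirection page */

typedef unsigned long entry_t;

/* Mock pseudo-physical memory, indexed by pfn. */
static entry_t mock_mem[NR_PAGES][ENTRIES_PER_PAGE];
/* Mock p2m table: pfn -> mfn. */
static unsigned long mock_p2m[NR_PAGES];

/* Stand-in for phys_to_machine(). */
static unsigned long mock_phys_to_machine(unsigned long phys)
{
    unsigned long pfn = phys >> PAGE_SHIFT;

    assert(pfn < NR_PAGES);
    return (mock_p2m[pfn] << PAGE_SHIFT) | (phys & ~PAGE_MASK);
}

/* Stand-in for __va(): pseudo-physical address -> usable pointer. */
static entry_t *mock_va(unsigned long phys)
{
    unsigned long pfn = phys >> PAGE_SHIFT;

    assert(pfn < NR_PAGES);
    return mock_mem[pfn];
}

/* Walk the entry list from head, translating every entry that is
 * not a destination entry, and following indirection pages. */
static void translate_entries(entry_t *head)
{
    entry_t *ptr = head, entry;

    while ((entry = *ptr) && !(entry & IND_DONE)) {
        if (!(entry & IND_DESTINATION))
            *ptr = mock_phys_to_machine(entry & PAGE_MASK) |
                   (entry & ~PAGE_MASK);

        if (entry & IND_INDIRECTION)
            ptr = mock_va(entry & PAGE_MASK);   /* old value: still a pfn */
        else
            ptr++;
    }
}
```

Following the indirection pointer through the untranslated `entry` matters: the guest can only dereference pseudo-physical addresses, while the rewritten value is what the hypervisor will read later.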
kexec: framework and i386

Hi,

here is a second take at this patch. The main changes over the
predecessor are that kdump now works, mfns are used instead of pfns (was
wrong before), and some code has been moved about. The code still uses
the basic approach of moving architecture-specific operations into the
hypervisor.

Some notes:
  * machine_kexec_cleanup() and machine_kexec_prepare() don't do
    anything in i386. So while this patch adds a framework for them,
    I am not sure what parameters are needed at this stage.
  * Only works for UP, as machine_shutdown is not implemented yet
  * kexecing into xen does not seem to work, I think that
    kexec-tools needs updating, but I have not investigated yet
  * I don't believe that kdump works yet
  * This patch was prepared against xen-unstable.hg 9514
    As of today (9574) two new hypercalls have been added.
    I rediffed and moved the kexec hypercall to 33. However
    this exceeds hypercall_NR, which is currently 32.
    I tried increasing this, but the dom0 now crashes
    in entry.S on init. Even after rebuilding both xen and the kernel
    completely from scratch after a make distclean. Help!!

Prepared with the assistance of my colleague Magnus Damm

Signed-Off-By: Horms <horms@verge.net.au>

--- from-0002/buildconfigs/linux-defconfig_xen_x86_32
+++ to-work/buildconfigs/linux-defconfig_xen_x86_32 2006-04-10 12:29:46.000000000 +0900
@@ -183,6 +183,7 @@ CONFIG_HZ_100=y
 # CONFIG_HZ_250 is not set
 # CONFIG_HZ_1000 is not set
 CONFIG_HZ=100
+CONFIG_KEXEC=y
 # CONFIG_CRASH_DUMP is not set
 CONFIG_PHYSICAL_START=0x100000
 CONFIG_HOTPLUG_CPU=y
--- from-0001/linux-2.6-xen-sparse/arch/i386/Kconfig
+++ to-work/linux-2.6-xen-sparse/arch/i386/Kconfig 2006-04-10 12:29:46.000000000 +0900
@@ -726,7 +726,7 @@ source kernel/Kconfig.hz
 
 config KEXEC
 	bool "kexec system call (EXPERIMENTAL)"
-	depends on EXPERIMENTAL && !X86_XEN
+	depends on EXPERIMENTAL
 	help
 	  kexec is a system call that implements the ability to shutdown your
 	  current kernel, and to start another kernel.  It is like a reboot
--- from-0001/linux-2.6-xen-sparse/arch/i386/kernel/Makefile
+++ to-work/linux-2.6-xen-sparse/arch/i386/kernel/Makefile 2006-04-10 12:29:46.000000000 +0900
@@ -92,7 +92,7 @@ include $(srctree)/scripts/Makefile.xen
 obj-y += fixup.o
 microcode-$(subst m,y,$(CONFIG_MICROCODE)) := microcode-xen.o
 
-n-obj-xen := i8259.o timers/ reboot.o smpboot.o trampoline.o
+n-obj-xen := i8259.o timers/ reboot.o smpboot.o trampoline.o machine_kexec.o crash.o
 
 obj-y := $(call filterxen, $(obj-y), $(n-obj-xen))
 obj-y := $(call cherrypickxen, $(obj-y))
--- from-0001/linux-2.6-xen-sparse/arch/x86_64/kernel/Makefile
+++ to-work/linux-2.6-xen-sparse/arch/x86_64/kernel/Makefile 2006-04-10 12:29:46.000000000 +0900
@@ -59,7 +59,7 @@ pci-dma-y += ../../i386/kernel/pci-dma
 microcode-$(subst m,y,$(CONFIG_MICROCODE)) := ../../i386/kernel/microcode-xen.o
 quirks-y := ../../i386/kernel/quirks-xen.o
 
-n-obj-xen := i8259.o reboot.o i8237.o smpboot.o trampoline.o
+n-obj-xen := i8259.o reboot.o i8237.o smpboot.o trampoline.o machine_kexec.o crash.o
 
 include $(srctree)/scripts/Makefile.xen
 
--- from-0001/linux-2.6-xen-sparse/drivers/xen/core/reboot.c
+++ to-work/linux-2.6-xen-sparse/drivers/xen/core/reboot.c 2006-04-10 12:29:46.000000000 +0900
@@ -17,6 +17,11 @@
 #include <linux/kthread.h>
 #include <xen/gnttab.h>
 #include <xen/xencons.h>
+#include <linux/kexec.h>
+#include <xen/interface/kexec.h>
+#include <linux/mm.h>
+#include <asm/hypercall.h>
+
 #if defined(__i386__) || defined(__x86_64__)
 
 /*
@@ -38,6 +43,86 @@ extern void ctrl_alt_del(void);
  */
 #define SHUTDOWN_HALT 4
 
+void machine_shutdown(void)
+{
+	printk("machine_shutdown: does nothing\n");
+}
+
+void machine_crash_shutdown(struct pt_regs *regs)
+{
+	/* XXX: This should do something */
+	printk("xen-kexec: Need to turn of other CPUS in "
+	       "machine_crash_shutdown()\n");
+}
+
+const extern unsigned char relocate_new_kernel[];
+extern unsigned int relocate_new_kernel_size;
+static kexec_arg_t hypercall_arg;
+
+/*
+ * A architecture hook called to validate the
+ * proposed image and prepare the control pages
+ * as needed. The pages for KEXEC_CONTROL_CODE_SIZE
+ * have been allocated, but the segments have yet
+ * been copied into the kernel.
+ *
+ * Do what every setup is needed on image and the
+ * reboot code buffer to allow us to avoid allocations
+ * later.
+ *
+ * Currently nothing.
+ */
+int machine_kexec_prepare(struct kimage *image)
+{
+	return 0;
+}
+
+/*
+ * Undo anything leftover by machine_kexec_prepare
+ * when an image is freed.
+ */
+void machine_kexec_cleanup(struct kimage *image)
+{
+}
+
+/*
+ * Do not allocate memory (or fail in any way) in machine_kexec().
+ * We are past the point of no return, committed to rebooting now.
+ */
+NORET_TYPE void machine_kexec(struct kimage *image)
+{
+	kimage_entry_t *ptr, entry;
+
+	/*
+	 * Translate addresses inside head from physcical to machine
+	 * In practice, this only needs to change the pointer to
+	 * indirection pages as non-indirected pages are relative.
+	 */
+	ptr = &image->head;
+	while ((entry = *ptr) && !(entry & IND_DONE)) {
+		if (!(entry & IND_DESTINATION))
+			*ptr = phys_to_machine(entry & PAGE_MASK) |
+				(entry & ~PAGE_MASK);
+
+		if (entry & IND_INDIRECTION)
+			ptr = __va(entry & PAGE_MASK);
+		else
+			ptr++;
+	}
+
+	/* Set up arguments to hypercall */
+	hypercall_arg.u.kexec.indirection_page = image->head;
+	hypercall_arg.u.kexec.reboot_code_buffer =
+		pfn_to_mfn(page_to_pfn(image->control_code_page)) << PAGE_SHIFT;
+	hypercall_arg.u.kexec.start_address = image->start;
+	hypercall_arg.u.kexec.relocate_new_kernel = relocate_new_kernel;
+	hypercall_arg.u.kexec.relocate_new_kernel_size =
+		relocate_new_kernel_size;
+
+	/* Let Xen do the rest of the work */
+	HYPERVISOR_kexec(KEXEC_CMD_kexec, &hypercall_arg);
+}
+
 void machine_emergency_restart(void)
 {
 	/* We really want to get pending console data out before we die. */
--- from-0001/linux-2.6-xen-sparse/include/asm-i386/mach-xen/asm/hypercall.h
+++ to-work/linux-2.6-xen-sparse/include/asm-i386/mach-xen/asm/hypercall.h 2006-04-10 12:29:46.000000000 +0900
@@ -37,6 +37,8 @@
 # error "please don't include this file directly"
 #endif
 
+#include <xen/interface/kexec.h>
+
 #define __STR(x) #x
 #define STR(x) __STR(x)
 
@@ -329,6 +331,13 @@ HYPERVISOR_nmi_op(
 	return _hypercall2(int, nmi_op, op, arg);
 }
 
+static inline int
+HYPERVISOR_kexec(
+	unsigned long op, kexec_arg_t * arg)
+{
+	return _hypercall2(int, kexec_op, op, arg);
+}
+
 #endif /* __HYPERCALL_H__ */
 
 /*
--- from-0001/linux-2.6-xen-sparse/include/asm-x86_64/mach-xen/asm/hypercall.h
+++ to-work/linux-2.6-xen-sparse/include/asm-x86_64/mach-xen/asm/hypercall.h 2006-04-10 12:29:46.000000000 +0900
@@ -41,6 +41,8 @@
 # error "please don't include this file directly"
 #endif
 
+#include <xen/interface/kexec.h>
+
 #define __STR(x) #x
 #define STR(x) __STR(x)
 
@@ -330,6 +332,13 @@ HYPERVISOR_nmi_op(
 	return _hypercall2(int, nmi_op, op, arg);
 }
 
+static inline int
+HYPERVISOR_kexec(
+	unsigned long op, kexec_arg_t * arg)
+{
+	return _hypercall2(int, kexec_op, op, arg);
+}
+
 #endif /* __HYPERCALL_H__ */
 
 /*
--- from-0001/xen/arch/x86/x86_32/Makefile
+++ to-work/xen/arch/x86/x86_32/Makefile 2006-04-10 12:29:46.000000000 +0900
@@ -5,6 +5,7 @@ obj-y += entry.o
 obj-y += mm.o
 obj-y += seg_fixup.o
 obj-y += traps.o
+obj-y += machine_kexec.o
 
 obj-$(supervisor_mode_kernel) += supervisor_mode_kernel.o
--- from-0001/xen/arch/x86/x86_32/entry.S
+++ to-work/xen/arch/x86/x86_32/entry.S 2006-04-10 12:29:46.000000000 +0900
@@ -648,6 +648,7 @@ ENTRY(hypercall_table)
         .long do_acm_op
         .long do_nmi_op
         .long do_arch_sched_op
+        .long do_kexec /* 30 */
         .rept NR_hypercalls-((.-hypercall_table)/4)
         .long do_ni_hypercall
         .endr
@@ -683,6 +684,7 @@ ENTRY(hypercall_args_table)
         .byte 1 /* do_acm_op */
         .byte 2 /* do_nmi_op */
         .byte 2 /* do_arch_sched_op */
+        .byte 2 /* do_kexec */ /* 30 */
         .rept NR_hypercalls-(.-hypercall_args_table)
.byte 0 /* do_ni_hypercall */ .endr --- /dev/null +++ to-work/xen/arch/x86/x86_32/machine_kexec.c 2006-04-10 12:29:46.000000000 +0900 @@ -0,0 +1,168 @@ +/****************************************************************************** + * arch/x86/machine_kexec.c + * + * Created By: Horms + * + * Based heavily on arch/i386/machine_kexec.c from Linux 2.6.16 + */ + +#include <xen/config.h> +#include <xen/types.h> +#include <xen/domain_page.h> +#include <xen/timer.h> +#include <xen/sched.h> +#include <asm/page.h> +#include <asm/flushtlb.h> +#include <public/xen.h> +#include <public/kexec.h> + +typedef asmlinkage void (*relocate_new_kernel_t)( + unsigned long indirection_page, + unsigned long reboot_code_buffer, + unsigned long start_address, + unsigned int has_pae); + +#define PAGE_ALIGNED __attribute__ ((__aligned__(PAGE_SIZE))) + +#define L0_ATTR (_PAGE_PRESENT | _PAGE_RW | _PAGE_ACCESSED | _PAGE_DIRTY) +#define L1_ATTR (_PAGE_PRESENT | _PAGE_RW | _PAGE_ACCESSED | _PAGE_DIRTY) +#define L2_ATTR (_PAGE_PRESENT) + +#ifndef CONFIG_X86_PAE + +static u32 pgtable_level1[L1_PAGETABLE_ENTRIES] PAGE_ALIGNED; + +static void identity_map_page(unsigned long address) +{ + unsigned long mfn; + u32 *pgtable_level2; + + /* Find the current page table */ + mfn = read_cr3() >> PAGE_SHIFT; + pgtable_level2 = map_domain_page(mfn); + + /* Identity map the page table entry */ + pgtable_level1[l1_table_offset(address)] = address | L0_ATTR; + pgtable_level2[l2_table_offset(address)] = __pa(pgtable_level1) | L1_ATTR; + + /* Flush the tlb so the new mapping takes effect. + * Global tlb entries are not flushed but that is not an issue. 
+ */ + write_cr3(mfn << PAGE_SHIFT); + + unmap_domain_page(pgtable_level2); +} + +#else +static u64 pgtable_level1[L1_PAGETABLE_ENTRIES] PAGE_ALIGNED; +static u64 pgtable_level2[L2_PAGETABLE_ENTRIES] PAGE_ALIGNED; + +static void identity_map_page(unsigned long address) +{ + int mfn; + intpte_t *pgtable_level3; + + /* Find the current page table */ + mfn = read_cr3() >> PAGE_SHIFT; + pgtable_level3 = map_domain_page(mfn); + + /* Identity map the page table entry */ + pgtable_level1[l1_table_offset(address)] = address | L0_ATTR; + pgtable_level2[l2_table_offset(address)] = __pa(pgtable_level1) | L1_ATTR; + set_64bit(&pgtable_level3[l3_table_offset(address)], + __pa(pgtable_level2) | L2_ATTR); + + /* Flush the tlb so the new mapping takes effect. + * Global tlb entries are not flushed but that is not an issue. + */ + load_cr3(mfn << PAGE_SHIFT); + + unmap_domain_page(pgtable_level3); +} +#endif + +static void kexec_load_segments(void) +{ +#define __SSTR(X) #X +#define SSTR(X) __SSTR(X) + __asm__ __volatile__ ( + "\tljmp $"SSTR(__HYPERVISOR_CS)",$1f\n" + "\t1:\n" + "\tmovl $"SSTR(__HYPERVISOR_DS)",%%eax\n" + "\tmovl %%eax,%%ds\n" + "\tmovl %%eax,%%es\n" + "\tmovl %%eax,%%fs\n" + "\tmovl %%eax,%%gs\n" + "\tmovl %%eax,%%ss\n" + ::: "eax", "memory"); +#undef SSTR +#undef __SSTR +} + +#define kexec_load_idt(dtr) __asm__ __volatile("lidt %0"::"m" (*dtr)) +static void kexec_set_idt(void *newidt, __u16 limit) +{ + struct Xgt_desc_struct curidt; + + /* ia32 supports unaliged loads & stores */ + curidt.size = limit; + curidt.address = (unsigned long)newidt; + + kexec_load_idt(&curidt); + +}; + +#define kexec_load_gdt(dtr) __asm__ __volatile("lgdt %0"::"m" (*dtr)) +static void kexec_set_gdt(void *newgdt, __u16 limit) +{ + struct Xgt_desc_struct curgdt; + + /* ia32 supports unaligned loads & stores */ + curgdt.size = limit; + curgdt.address = (unsigned long)newgdt; + + kexec_load_gdt(&curgdt); +}; + +int machine_kexec_prepare(struct kexec_arg *arg) +{ + return 0; +} + +void 
machine_kexec_cleanup(struct kexec_arg *arg) +{ +} + +void machine_kexec(struct kexec_arg *arg) +{ + relocate_new_kernel_t rnk; + + local_irq_disable(); + + identity_map_page(arg->u.kexec.reboot_code_buffer); + + copy_from_user((void *)arg->u.kexec.reboot_code_buffer, + arg->u.kexec.relocate_new_kernel, + arg->u.kexec.relocate_new_kernel_size); + + kexec_load_segments(); + + kexec_set_gdt(__va(0),0); + + kexec_set_idt(__va(0),0); + + rnk = (relocate_new_kernel_t) arg->u.kexec.reboot_code_buffer; + + (*rnk)(arg->u.kexec.indirection_page, arg->u.kexec.reboot_code_buffer, + arg->u.kexec.start_address, cpu_has_pae); +} + +/* + * Local variables: + * mode: C + * c-set-style: "BSD" + * c-basic-offset: 4 + * tab-width: 4 + * indent-tabs-mode: nil + * End: + */ --- from-0001/xen/common/Makefile +++ to-work/xen/common/Makefile 2006-04-10 12:29:46.000000000 +0900 @@ -24,6 +24,7 @@ obj-y += trace.o obj-y += timer.o obj-y += vsprintf.o obj-y += xmalloc.o +obj-y += kexec.o obj-$(perfc) += perfc.o obj-$(crash_debug) += gdbstub.o --- /dev/null +++ to-work/xen/common/kexec.c 2006-04-10 12:38:29.000000000 +0900 @@ -0,0 +1,58 @@ +/* + * Achitecture independent kexec code for Xen + * + * At this statge, just a switch for the kexec hypercall into + * architecture dependent code. 
+ * + * Created By: Horms <horms@verge.net.au> + */ + +#include <xen/lib.h> +#include <xen/errno.h> +#include <xen/guest_access.h> +#include <xen/sched.h> +#include <public/xen.h> +#include <public/kexec.h> + +extern int machine_kexec_prepare(struct kexec_arg *arg); +extern void machine_kexec_cleanup(struct kexec_arg *arg); +extern void machine_kexec(struct kexec_arg *arg); + +int do_kexec(unsigned long op, + GUEST_HANDLE(kexec_arg_t) uarg) +{ + struct kexec_arg arg; + + if ( !IS_PRIV(current->domain) ) + return -EPERM; + + if ( unlikely(copy_from_guest(&arg, uarg, 1) != 0) ) + { + printk("do_kexec: __copy_from_guest failed"); + return -EFAULT; + } + + switch(op) { + case KEXEC_CMD_kexec: + machine_kexec(&arg); + return -EINVAL; /* Not Reached */ + case KEXEC_CMD_kexec_prepare: + return machine_kexec_prepare(&arg); + case KEXEC_CMD_kexec_cleanup: + machine_kexec_cleanup(&arg); + return 0; + } + + return -EINVAL; +} + +/* + * Local variables: + * mode: C + * c-set-style: "BSD" + * c-basic-offset: 4 + * tab-width: 4 + * indent-tabs-mode: nil + * End: + */ + --- from-0001/xen/include/asm-x86/hypercall.h +++ to-work/xen/include/asm-x86/hypercall.h 2006-04-10 12:29:46.000000000 +0900 @@ -6,6 +6,7 @@ #define __ASM_X86_HYPERCALL_H__ #include <public/physdev.h> +#include <public/kexec.h> extern long do_set_trap_table( @@ -79,6 +80,11 @@ extern long arch_do_vcpu_op( int cmd, struct vcpu *v, GUEST_HANDLE(void) arg); +extern int +do_kexec( + unsigned long op, + GUEST_HANDLE(kexec_arg_t) uarg); + #ifdef __x86_64__ extern long --- /dev/null +++ to-work/xen/include/public/kexec.h 2006-04-10 12:29:46.000000000 +0900 @@ -0,0 +1,39 @@ +/* + * kexec.h: Xen kexec + * + * Created By: Horms <horms@verge.net.au> + */ + +#ifndef _XEN_PUBLIC_KEXEC_H +#define _XEN_PUBLIC_KEXEC_H + +/* + * Scratch space for passing arguments to the kexec hypercall + */ +typedef struct kexec_arg { + union { + struct { + unsigned long data; /* Not sure what this should be yet */ + } helper; + struct { + 
unsigned long indirection_page; + unsigned long reboot_code_buffer; + unsigned long start_address; + const char *relocate_new_kernel; + unsigned int relocate_new_kernel_size; + } kexec; + } u; +} kexec_arg_t; +DEFINE_GUEST_HANDLE(kexec_arg_t); + +#endif + +/* + * Local variables: + * mode: C + * c-set-style: "BSD" + * c-basic-offset: 4 + * tab-width: 4 + * indent-tabs-mode: nil + * End: + */ --- from-0001/xen/include/public/xen.h +++ to-work/xen/include/public/xen.h 2006-04-10 12:29:46.000000000 +0900 @@ -60,6 +60,7 @@ #define __HYPERVISOR_acm_op 27 #define __HYPERVISOR_nmi_op 28 #define __HYPERVISOR_sched_op 29 +#define __HYPERVISOR_kexec_op 30 /* * VIRTUAL INTERRUPTS @@ -206,6 +207,13 @@ DEFINE_GUEST_HANDLE(mmuext_op_t); #define VMASST_TYPE_writable_pagetables 2 #define MAX_VMASST_TYPE 2 +/* + * Commands to HYPERVISOR_kexec(). + */ +#define KEXEC_CMD_kexec 0 +#define KEXEC_CMD_kexec_prepare 1 +#define KEXEC_CMD_kexec_cleanup 2 + #ifndef __ASSEMBLY__ typedef uint16_t domid_t; _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
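The indirection-list walk in machine_kexec() in the patch above can be tried out in user space. What follows is only a minimal sketch, not the patch's code: toy_mem, toy_p2m, toy_phys_to_machine(), toy_va(), translate_entries() and build_example() are hypothetical stand-ins for dom0 memory, the pseudo-physical to machine table, phys_to_machine() and __va(). The IND_* flag values are assumed to match those in Linux's include/linux/kexec.h.

```c
#include <assert.h>

#define PAGE_SHIFT 12
#define PAGE_SIZE  (1UL << PAGE_SHIFT)
#define PAGE_MASK  (~(PAGE_SIZE - 1))

/* Flag bits kept in the low bits of each entry (assumed Linux values) */
#define IND_DESTINATION 0x1UL
#define IND_INDIRECTION 0x2UL
#define IND_DONE        0x4UL
#define IND_SOURCE      0x8UL

typedef unsigned long kimage_entry_t;

/* Hypothetical toy memory: 4 pseudo-physical pages of 4 entries each */
#define TOY_PAGES   4
#define TOY_ENTRIES 4
static kimage_entry_t toy_mem[TOY_PAGES][TOY_ENTRIES];

/* Hypothetical pseudo-physical frame -> machine frame table */
static const unsigned long toy_p2m[TOY_PAGES] = { 7, 3, 9, 5 };

/* Stand-in for phys_to_machine(): swap the frame, keep the page offset */
static unsigned long toy_phys_to_machine(unsigned long phys)
{
    unsigned long pfn = phys >> PAGE_SHIFT;

    assert(pfn < TOY_PAGES);
    return (toy_p2m[pfn] << PAGE_SHIFT) | (phys & ~PAGE_MASK);
}

/* Stand-in for __va(): find the toy page behind a pseudo-physical address */
static kimage_entry_t *toy_va(unsigned long phys)
{
    unsigned long pfn = phys >> PAGE_SHIFT;

    assert(pfn < TOY_PAGES);
    return toy_mem[pfn];
}

/*
 * Same loop shape as the patch: every entry that is not a destination
 * address is rewritten to a machine address, and the walk itself follows
 * the ORIGINAL pseudo-physical value when it meets an indirection page.
 */
static void translate_entries(kimage_entry_t *head)
{
    kimage_entry_t *ptr = head, entry;

    while ((entry = *ptr) && !(entry & IND_DONE)) {
        if (!(entry & IND_DESTINATION))
            *ptr = toy_phys_to_machine(entry & PAGE_MASK) |
                   (entry & ~PAGE_MASK);

        if (entry & IND_INDIRECTION)
            ptr = toy_va(entry & PAGE_MASK);
        else
            ptr++;
    }
}

/* Build a tiny list: head -> page 1 { destination, source, done } */
static kimage_entry_t build_example(void)
{
    toy_mem[1][0] = 0x100000UL | IND_DESTINATION;     /* left untouched */
    toy_mem[1][1] = (2UL << PAGE_SHIFT) | IND_SOURCE; /* frame 2 -> 9 */
    toy_mem[1][2] = IND_DONE;
    return (1UL << PAGE_SHIFT) | IND_INDIRECTION;     /* frame 1 -> 3 */
}
```

Calling build_example() and passing the address of its result to translate_entries() rewrites the head and the source entry to machine frames while leaving the destination entry alone. Destination entries are already absolute addresses for the new kernel, which is why the loop skips them.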
On Tue, Apr 11, 2006 at 10:44:37AM +0900, Horms wrote:
> Hi Don, Hi all,
>
> The key reason why I think that kexec/kdump does makes sense for xen, at
> least to some extent, is for the case where the hypervisor goes into a
> bad state, and you actually want to get rid of it and kdump into
> something else for forensics. There is also the advantage that by
> kexecing xen, you get access to the entire physical machine, either for
> crash-dump analysis, or because *gasp* you want to get out of xen for
> some other crazy reason :) And, on hardware that takes forever and a day
> to reboot, I believe that doing a kexec will be quite useful for
> hypervisor development.

I guess I never thought about it from the hypervisor perspective. ;)

Part of my concern was that the hypervisor had a bunch of this functionality built in (like mapping memory and loading cpu context). However, after re-reading some of the kexec code, you don't use the hypervisor to load a new kernel into memory? And I don't know enough about the low level bits to understand whether a hypercall to load vcpu context would be useful.

>
> I would also like to note, that while my patch does involve moving parts
> of kexec/kdump into the hypervisor, and more similar parts need to be
> added in order to support other architectures, it is by no means all of
> kexec/kdump.

I understand what you are saying now. I skimmed through the first patch you sent and immediately thought you were trying to move most parts down into the hypervisor. Upon reviewing it again, it doesn't seem as intrusive. :)

Cheers,
Don
On Wed, Apr 12, 2006 at 06:12:30PM +0900, Horms wrote:
>
> kexec: framework and i386

[snip]

> * This patch was prepared against xen-unstable.hg 9514
>   As of today (9574) two new hypercalls have been added.
>   I rediffed and moved the kexec hypercall to 33. However
>   this exceedes hypercall_NR, which is currently 32.
>   I tried increasing this, but the dom0 now crashes
>   in entry.S on init. Even after rebuilding both xen and the kernel
>   completely from scratch after a make distclean. Help!!

Hi,

I am a bit concerned that this patch is going to start rotting if I can't at least track the current xen-unstable.hg, or better still get it merged.

I would really appreciate it if someone could take a moment to comment on the hypercall problem. Is adding a new hypercall, as the current patch does, the best way? If so, could someone point me to how to increase the hypercall table size? If not, is it best to piggyback on the dom0_op hypercall? Or is there some other preferred option?

-- 
Horms
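The table-size problem raised in this message can be illustrated with a small user-space dispatch table. This is a sketch under stated assumptions, not Xen's entry.S: hypercall_fn, ni_hypercall(), kexec_stub(), hypercall_table and dispatch() are hypothetical names, and returning -ENOSYS for unimplemented or out-of-range numbers is an assumption of the sketch. The table size of 32 and the slot number 30 are taken from the thread and from the first patch.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

#define NR_HYPERCALLS      32   /* table size quoted in the thread */
#define HYPERCALL_KEXEC_OP 30   /* slot used by the first patch */

/* Changing the slot to 33 without growing the table fails right here */
static_assert(HYPERCALL_KEXEC_OP < NR_HYPERCALLS,
              "hypercall number must fit in the table");

typedef long (*hypercall_fn)(unsigned long op, void *arg);

/* Default handler for every slot that has no implementation */
static long ni_hypercall(unsigned long op, void *arg)
{
    (void)op;
    (void)arg;
    return -ENOSYS;
}

/* Hypothetical stand-in for do_kexec(): echoes op to show it was reached */
static long kexec_stub(unsigned long op, void *arg)
{
    (void)arg;
    return (long)op;
}

/* Unlisted slots are NULL and fall back to ni_hypercall() in dispatch() */
static const hypercall_fn hypercall_table[NR_HYPERCALLS] = {
    [HYPERCALL_KEXEC_OP] = kexec_stub,
};

static long dispatch(unsigned int nr, unsigned long op, void *arg)
{
    hypercall_fn fn;

    /* A number at or beyond the table size can never be dispatched */
    if (nr >= NR_HYPERCALLS)
        return -ENOSYS;

    fn = hypercall_table[nr];
    if (fn == NULL)
        fn = ni_hypercall;
    return fn(op, arg);
}
```

In this model, dispatch(33, ...) is rejected by the bounds check however the table is filled in, so a hypercall numbered 33 requires both a larger NR_HYPERCALLS and every user of that constant being rebuilt consistently. Multiplexing through an existing hypercall, as the dom0_op question suggests, avoids touching the table at all.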
Hi,

here is the latest update of the kexec xen/dom0 patch.

-- 
Horms

kexec: framework and i386

This is an implementation of kexec for dom0/xen that allows kexecing of the physical machine from xen. The approach taken is to move the architecture-dependent kexec code into xen, where it is invoked through a hypercall.

Some notes:

* machine_kexec_cleanup() and machine_kexec_prepare() don't do anything in i386. So while this patch adds a framework for them, I am not sure what parameters are needed at this stage.
* Only works for UP, as machine_shutdown is not implemented yet
* kexecing into xen does not seem to work. I think that kexec-tools needs updating, but I have not investigated yet
* Kdump works by first copying the kernel into dom0 segments and relocating them later in xen, the same way that kexec does. The only difference is that the relocation is made into an area reserved by xen
* Kdump reservation is made using the xen command line parameters, kdump_megabytes and kdump_megabytes_base, rather than the linux option crashkernel, which is now ignored. Two parameters are used instead of one to simplify parsing. This can be cleaned up later if desired. But the reservation seems to need to be made by xen to make sure that it happens early enough.
* This patch uses dom0_op for hypercalls

Highlights since the previous posted version:

* Use dom0_op instead of a new kexec hypercall
  - the hypercall table is currently full, so there is nowhere to put a new kexec hypercall
  - This kexec patch makes sense for dom0 at this stage
* Kernel notes are filled in for kdump
  - UP only, this patch does not support SMP kdump yet
* Share x86 code between x86_64 and x86_32 (though x86_64 is not finished and not included in this patch)
* Doesn't break x86_64 build

Prepared by Horms and Magnus Damm

Signed-Off-By: Magnus Damm <magnus@valinux.co.jp>
Signed-Off-By: Horms <horms@verge.net.au>

 linux-2.6-xen-sparse/arch/i386/Kconfig                |    2
 linux-2.6-xen-sparse/arch/i386/kernel/Makefile        |    2
 linux-2.6-xen-sparse/arch/i386/kernel/setup-xen.c     |   26 ++
 linux-2.6-xen-sparse/drivers/xen/core/Makefile        |    1
 linux-2.6-xen-sparse/drivers/xen/core/crash.c         |   98 +++++++++
 linux-2.6-xen-sparse/drivers/xen/core/machine_kexec.c |   78 +++++++
 linux-2.6-xen-sparse/drivers/xen/core/reboot.c        |    7
 ref-linux-2.6.16/drivers/base/cpu.c                   |    4
 ref-linux-2.6.16/kernel/kexec.c                       |   52 ++++-
 xen/arch/x86/Makefile                                 |    1
 xen/arch/x86/dom0_ops.c                               |   33 +++
 xen/arch/x86/machine_kexec.c                          |  174 +++++++++++++++++
 xen/arch/x86/setup.c                                  |   75 ++++++-
 xen/common/page_alloc.c                               |   33 ++-
 xen/include/public/dom0_ops.h                         |   23 ++
 xen/include/public/xen.h                              |    8
 xen/include/xen/mm.h                                  |    1
 17 files changed, 585 insertions(+), 33 deletions(-)

--- x/linux-2.6-xen-sparse/arch/i386/Kconfig
+++ x/linux-2.6-xen-sparse/arch/i386/Kconfig
@@ -726,7 +726,7 @@ source kernel/Kconfig.hz
 config KEXEC
 	bool "kexec system call (EXPERIMENTAL)"
-	depends on EXPERIMENTAL && !X86_XEN
+	depends on EXPERIMENTAL
 	help
 	  kexec is a system call that implements the ability to shutdown your
 	  current kernel, and to start another kernel.
It is like a reboot --- x/linux-2.6-xen-sparse/arch/i386/kernel/Makefile +++ x/linux-2.6-xen-sparse/arch/i386/kernel/Makefile @@ -92,7 +92,7 @@ include $(srctree)/scripts/Makefile.xen obj-y += fixup.o microcode-$(subst m,y,$(CONFIG_MICROCODE)) := microcode-xen.o -n-obj-xen := i8259.o timers/ reboot.o smpboot.o trampoline.o +n-obj-xen := i8259.o timers/ reboot.o smpboot.o trampoline.o machine_kexec.o crash.o obj-y := $(call filterxen, $(obj-y), $(n-obj-xen)) obj-y := $(call cherrypickxen, $(obj-y)) --- x/linux-2.6-xen-sparse/arch/i386/kernel/setup-xen.c +++ x/linux-2.6-xen-sparse/arch/i386/kernel/setup-xen.c @@ -68,6 +68,10 @@ #include "setup_arch_pre.h" #include <bios_ebda.h> +#ifdef CONFIG_XEN +#include <xen/interface/dom0_ops.h> +#endif + /* Forward Declaration. */ void __init find_max_pfn(void); @@ -932,6 +936,7 @@ static void __init parse_cmdline_early ( * after a kernel panic. */ else if (!memcmp(from, "crashkernel=", 12)) { +#ifndef CONFIG_XEN unsigned long size, base; size = memparse(from+12, &from); if (*from == '@') { @@ -942,6 +947,10 @@ static void __init parse_cmdline_early ( crashk_res.start = base; crashk_res.end = base + size - 1; } +#else + printk("Ignoring crashkernel command line, " + "parameter will be supplied by xen\n"); +#endif } #endif #ifdef CONFIG_PROC_VMCORE @@ -1318,9 +1327,23 @@ void __init setup_bootmem_allocator(void } #endif #ifdef CONFIG_KEXEC +#ifndef CONFIG_XEN if (crashk_res.start != crashk_res.end) reserve_bootmem(crashk_res.start, crashk_res.end - crashk_res.start + 1); +#else + { + dom0_op_t op; + op.cmd = DOM0_KEXEC; + op.u.kexec.op = KEXEC_CMD_reserve; + BUG_ON(HYPERVISOR_dom0_op(&op)); + if (op.u.kexec.u.reserve.size) { + crashk_res.start = op.u.kexec.u.reserve.start; + crashk_res.end = op.u.kexec.u.reserve.start + + op.u.kexec.u.reserve.size - 1; + } + } +#endif #endif if (!xen_feature(XENFEAT_auto_translated_physmap)) @@ -1395,6 +1418,9 @@ legacy_init_iomem_resources(struct resou res->end = map[i].end - 1; res->flags =
IORESOURCE_MEM | IORESOURCE_BUSY; request_resource(&iomem_resource, res); +#ifdef CONFIG_KEXEC + request_resource(res, &crashk_res); +#endif } free_bootmem(__pa(map), PAGE_SIZE); --- x/linux-2.6-xen-sparse/drivers/xen/core/Makefile +++ x/linux-2.6-xen-sparse/drivers/xen/core/Makefile @@ -9,3 +9,4 @@ obj-$(CONFIG_NET) += skbuff.o obj-$(CONFIG_SMP) += smpboot.o obj-$(CONFIG_SYSFS) += hypervisor_sysfs.o obj-$(CONFIG_XEN_SYSFS) += xen_sysfs.o +obj-$(CONFIG_KEXEC) += machine_kexec.o crash.o --- /dev/null +++ x/linux-2.6-xen-sparse/drivers/xen/core/crash.c @@ -0,0 +1,98 @@ +/* + * Architecture specific (i386-xen) functions for kexec based crash dumps. + * + * Created by: Horms <horms@verge.net.au> + * + */ + +#include <linux/kernel.h> /* For printk */ + +/* XXX: final_note(), crash_save_this_cpu() and crash_save_self() + * are copied from arch/i386/kernel/crash.c, might be good to either + * the original functions non-static and use them, or just + * merge this this into that file. + */ +#include <linux/elf.h> /* For struct elf_note */ +#include <linux/elfcore.h> /* For struct elf_prstatus */ +#include <linux/kexec.h> /* crash_notes */ + +static u32 *append_elf_note(u32 *buf, char *name, unsigned type, void *data, + size_t data_len) +{ + struct elf_note note; + + note.n_namesz = strlen(name) + 1; + note.n_descsz = data_len; + note.n_type = type; + memcpy(buf, &note, sizeof(note)); + buf += (sizeof(note) +3)/4; + memcpy(buf, name, note.n_namesz); + buf += (note.n_namesz + 3)/4; + memcpy(buf, data, note.n_descsz); + buf += (note.n_descsz + 3)/4; + + return buf; +} + +static void final_note(u32 *buf) +{ + struct elf_note note; + + note.n_namesz = 0; + note.n_descsz = 0; + note.n_type = 0; + memcpy(buf, &note, sizeof(note)); +} + +static void crash_save_this_cpu(struct pt_regs *regs, int cpu) +{ + struct elf_prstatus prstatus; + u32 *buf; + + if ((cpu < 0) || (cpu >= NR_CPUS)) + return; + + /* Using ELF notes here is opportunistic.
+ * I need a well defined structure format + * for the data I pass, and I need tags + * on the data to indicate what information I have + * squirrelled away. ELF notes happen to provide + * all of that that no need to invent something new. + */ + buf = (u32*)per_cpu_ptr(crash_notes, cpu); + if (!buf) + return; + memset(&prstatus, 0, sizeof(prstatus)); + prstatus.pr_pid = current->pid; + elf_core_copy_regs(&prstatus.pr_reg, regs); + buf = append_elf_note(buf, "CORE", NT_PRSTATUS, &prstatus, + sizeof(prstatus)); + final_note(buf); +} + +static void crash_save_self(struct pt_regs *regs) +{ + int cpu; + + cpu = smp_processor_id(); + crash_save_this_cpu(regs, cpu); +} + + +void machine_crash_shutdown(struct pt_regs *regs) +{ + /* XXX: This should do something */ + printk("xen-kexec: Need to turn of other CPUS in " + "machine_crash_shutdown()\n"); + crash_save_self(regs); +} + +/* + * Local variables: + * c-file-style: "linux" + * indent-tabs-mode: t + * c-indent-level: 8 + * c-basic-offset: 8 + * tab-width: 8 + * End: + */ --- /dev/null +++ x/linux-2.6-xen-sparse/drivers/xen/core/machine_kexec.c @@ -0,0 +1,78 @@ +/* + * machine_kexec.c - handle transition of Linux booting another kernel + * + * Created By: Horms <horms@verge.net.au> + * + * Losely based on arch/i386/kernel/machine_kexec.c + */ + +#include <linux/kexec.h> +#include <xen/interface/dom0_ops.h> +#include <linux/mm.h> +#include <asm/hypercall.h> + +const extern unsigned char relocate_new_kernel[]; +extern unsigned int relocate_new_kernel_size; + +/* + * A architecture hook called to validate the + * proposed image and prepare the control pages + * as needed. The pages for KEXEC_CONTROL_CODE_SIZE + * have been allocated, but the segments have yet + * been copied into the kernel. + * + * Do what every setup is needed on image and the + * reboot code buffer to allow us to avoid allocations + * later. + * + * Currently nothing. 
+ */ +int machine_kexec_prepare(struct kimage *image) +{ + struct dom0_op op; + op.cmd = DOM0_KEXEC; + op.u.kexec.op = KEXEC_CMD_kexec_prepare; + op.u.kexec.u.helper.data = 0; + return HYPERVISOR_dom0_op(&op); +} + +/* + * Undo anything leftover by machine_kexec_prepare + * when an image is freed. + */ +void machine_kexec_cleanup(struct kimage *image) +{ + struct dom0_op op; + op.cmd = DOM0_KEXEC; + op.u.kexec.op = KEXEC_CMD_kexec_cleanup; + op.u.kexec.u.helper.data = 0; + HYPERVISOR_dom0_op(&op); +} + +/* + * Do not allocate memory (or fail in any way) in machine_kexec(). + * We are past the point of no return, committed to rebooting now. + */ +NORET_TYPE void machine_kexec(struct kimage *image) +{ + struct dom0_op op; + op.cmd = DOM0_KEXEC; + op.u.kexec.op = KEXEC_CMD_kexec; + op.u.kexec.u.kexec.indirection_page = image->head; + op.u.kexec.u.kexec.reboot_code_buffer = + pfn_to_mfn(page_to_pfn(image->control_code_page)) << PAGE_SHIFT; + op.u.kexec.u.kexec.start_address = image->start; + op.u.kexec.u.kexec.relocate_new_kernel = relocate_new_kernel; + op.u.kexec.u.kexec.relocate_new_kernel_size = relocate_new_kernel_size; + HYPERVISOR_dom0_op(&op); +} + +/* + * Local variables: + * c-file-style: "linux" + * indent-tabs-mode: t + * c-indent-level: 8 + * c-basic-offset: 8 + * tab-width: 8 + * End: + */ --- x/linux-2.6-xen-sparse/drivers/xen/core/reboot.c +++ x/linux-2.6-xen-sparse/drivers/xen/core/reboot.c @@ -370,6 +370,13 @@ static int __init setup_shutdown_event(v subsys_initcall(setup_shutdown_event); +#ifdef CONFIG_KEXEC +void machine_shutdown(void) +{ + printk("machine_shutdown: does nothing\n"); +} +#endif + /* * Local variables: * c-file-style: "linux" --- x/ref-linux-2.6.16/drivers/base/cpu.c +++ x/ref-linux-2.6.16/drivers/base/cpu.c @@ -101,7 +101,11 @@ static ssize_t show_crash_notes(struct s * boot up and this data does not change there after. Hence this * operation should be safe. No locking required. 
*/ +#ifndef CONFIG_XEN addr = __pa(per_cpu_ptr(crash_notes, cpunum)); +#else + addr = virt_to_machine(per_cpu_ptr(crash_notes, cpunum)); +#endif rc = sprintf(buf, "%Lx\n", addr); return rc; } --- x/ref-linux-2.6.16/kernel/kexec.c +++ x/ref-linux-2.6.16/kernel/kexec.c @@ -38,6 +38,20 @@ struct resource crashk_res = { .flags = IORESOURCE_BUSY | IORESOURCE_MEM }; +/* Kexec needs to know about the actually physical addresss. + * But in xen, a physical address is a pseudo-physical addresss. */ +#ifndef CONFIG_XEN +#define kexec_page_to_pfn(page) page_to_pfn(page) +#define kexec_pfn_to_page(pfn) pfn_to_page(pfn) +#define kexec_virt_to_phys(addr) virt_to_phys(addr) +#define kexec_phys_to_virt(addr) phys_to_virt(addr) +#else +#define kexec_page_to_pfn(page) pfn_to_mfn(page_to_pfn(page)) +#define kexec_pfn_to_page(pfn) pfn_to_page(mfn_to_pfn(pfn)) +#define kexec_virt_to_phys(addr) virt_to_machine(addr) +#define kexec_phys_to_virt(addr) phys_to_virt(machine_to_phys(addr)) +#endif + int kexec_should_crash(struct task_struct *p) { if (in_interrupt() || !p->pid || p->pid == 1 || panic_on_oops) @@ -403,7 +417,7 @@ static struct page *kimage_alloc_normal_ pages = kimage_alloc_pages(GFP_KERNEL, order); if (!pages) break; - pfn = page_to_pfn(pages); + pfn = kexec_page_to_pfn(pages); epfn = pfn + count; addr = pfn << PAGE_SHIFT; eaddr = epfn << PAGE_SHIFT; @@ -437,6 +451,7 @@ static struct page *kimage_alloc_normal_ return pages; } +#ifndef CONFIG_XEN static struct page *kimage_alloc_crash_control_pages(struct kimage *image, unsigned int order) { @@ -490,7 +505,7 @@ static struct page *kimage_alloc_crash_c } /* If I don''t overlap any segments I have found my hole! 
*/ if (i == image->nr_segments) { - pages = pfn_to_page(hole_start >> PAGE_SHIFT); + pages = kexec_pfn_to_page(hole_start >> PAGE_SHIFT); break; } } @@ -517,6 +532,13 @@ struct page *kimage_alloc_control_pages( return pages; } +#else /* !CONFIG_XEN */ +struct page *kimage_alloc_control_pages(struct kimage *image, + unsigned int order) +{ + return kimage_alloc_normal_control_pages(image, order); +} +#endif static int kimage_add_entry(struct kimage *image, kimage_entry_t entry) { @@ -532,7 +554,7 @@ static int kimage_add_entry(struct kimag return -ENOMEM; ind_page = page_address(page); - *image->entry = virt_to_phys(ind_page) | IND_INDIRECTION; + *image->entry = kexec_virt_to_phys(ind_page) | IND_INDIRECTION; image->entry = ind_page; image->last_entry = ind_page + ((PAGE_SIZE/sizeof(kimage_entry_t)) - 1); @@ -593,13 +615,13 @@ static int kimage_terminate(struct kimag #define for_each_kimage_entry(image, ptr, entry) \ for (ptr = &image->head; (entry = *ptr) && !(entry & IND_DONE); \ ptr = (entry & IND_INDIRECTION)? \ - phys_to_virt((entry & PAGE_MASK)): ptr +1) + kexec_phys_to_virt((entry & PAGE_MASK)): ptr +1) static void kimage_free_entry(kimage_entry_t entry) { struct page *page; - page = pfn_to_page(entry >> PAGE_SHIFT); + page = kexec_pfn_to_page(entry >> PAGE_SHIFT); kimage_free_pages(page); } @@ -686,7 +708,7 @@ static struct page *kimage_alloc_page(st * have a match. 
*/ list_for_each_entry(page, &image->dest_pages, lru) { - addr = page_to_pfn(page) << PAGE_SHIFT; + addr = kexec_page_to_pfn(page) << PAGE_SHIFT; if (addr == destination) { list_del(&page->lru); return page; @@ -701,12 +723,12 @@ static struct page *kimage_alloc_page(st if (!page) return NULL; /* If the page cannot be used file it away */ - if (page_to_pfn(page) > + if (kexec_page_to_pfn(page) > (KEXEC_SOURCE_MEMORY_LIMIT >> PAGE_SHIFT)) { list_add(&page->lru, &image->unuseable_pages); continue; } - addr = page_to_pfn(page) << PAGE_SHIFT; + addr = kexec_page_to_pfn(page) << PAGE_SHIFT; /* If it is the destination page we want use it */ if (addr == destination) @@ -729,7 +751,7 @@ static struct page *kimage_alloc_page(st struct page *old_page; old_addr = *old & PAGE_MASK; - old_page = pfn_to_page(old_addr >> PAGE_SHIFT); + old_page = kexec_pfn_to_page(old_addr >> PAGE_SHIFT); copy_highpage(page, old_page); *old = addr | (*old & ~PAGE_MASK); @@ -779,7 +801,7 @@ static int kimage_load_normal_segment(st result = -ENOMEM; goto out; } - result = kimage_add_page(image, page_to_pfn(page) + result = kimage_add_page(image, kexec_page_to_pfn(page) << PAGE_SHIFT); if (result < 0) goto out; @@ -811,6 +833,7 @@ out: return result; } +#ifndef CONFIG_XEN static int kimage_load_crash_segment(struct kimage *image, struct kexec_segment *segment) { @@ -833,7 +856,7 @@ static int kimage_load_crash_segment(str char *ptr; size_t uchunk, mchunk; - page = pfn_to_page(maddr >> PAGE_SHIFT); + page = kexec_pfn_to_page(maddr >> PAGE_SHIFT); if (page == 0) { result = -ENOMEM; goto out; @@ -881,6 +904,13 @@ static int kimage_load_segment(struct ki return result; } +#else /* CONFIG_XEN */ +static int kimage_load_segment(struct kimage *image, + struct kexec_segment *segment) +{ + return kimage_load_normal_segment(image, segment); +} +#endif /* * Exec Kernel system call: for obvious reasons only root may call it. 
--- x/xen/arch/x86/Makefile +++ x/xen/arch/x86/Makefile @@ -38,6 +38,7 @@ obj-y += trampoline.o obj-y += traps.o obj-y += usercopy.o obj-y += x86_emulate.o +obj-y += machine_kexec.o ifneq ($(pae),n) obj-$(x86_32) += shadow.o shadow_public.o shadow_guest32.o --- x/xen/arch/x86/dom0_ops.c +++ x/xen/arch/x86/dom0_ops.c @@ -29,6 +29,13 @@ #include <asm/mtrr.h> #include "cpu/mtrr/mtrr.h" +extern int machine_kexec_prepare(struct dom0_kexec *arg); +extern void machine_kexec_cleanup(struct dom0_kexec *arg); +extern void machine_kexec(struct dom0_kexec *arg); + +extern unsigned int opt_kdump_megabytes; +extern unsigned int opt_kdump_megabytes_base; + #define TRC_DOM0OP_ENTER_BASE 0x00020000 #define TRC_DOM0OP_LEAVE_BASE 0x00030000 @@ -445,6 +452,32 @@ long arch_do_dom0_op(struct dom0_op *op, } break; + case DOM0_KEXEC: + switch(op->u.kexec.op) { + case KEXEC_CMD_kexec: + machine_kexec(&op->u.kexec); + ret = -EINVAL; /* Not Reached */ + break; + case KEXEC_CMD_kexec_prepare: + ret = machine_kexec_prepare(&op->u.kexec); + break; + case KEXEC_CMD_kexec_cleanup: + machine_kexec_cleanup(&op->u.kexec); + ret = 0; + break; + case KEXEC_CMD_reserve: + op->u.kexec.u.reserve.size = opt_kdump_megabytes << 20; + op->u.kexec.u.reserve.start = opt_kdump_megabytes_base << 20; + if ( unlikely(copy_to_guest(u_dom0_op, op, 1) != 0) ) + { + printk("arch_do_dom0_op: kexec: copy_to_guest failed"); + return -EFAULT; + } + ret = 0; + break; + } + break; + default: ret = -ENOSYS; break; --- /dev/null +++ x/xen/arch/x86/machine_kexec.c @@ -0,0 +1,174 @@ +/****************************************************************************** + * arch/x86/machine_kexec.c + * + * Created By: Horms + * + * Based heavily on arch/i386/machine_kexec.c from Linux 2.6.16 + */ + +#include <xen/config.h> +#include <xen/types.h> +#include <xen/domain_page.h> +#include <xen/timer.h> +#include <xen/sched.h> +#include <asm/page.h> +#include <asm/flushtlb.h> +#include <public/xen.h> +#include <public/dom0_ops.h> + +#ifdef 
CONFIG_X86_32 + +typedef asmlinkage void (*relocate_new_kernel_t)( + unsigned long indirection_page, + unsigned long reboot_code_buffer, + unsigned long start_address, + unsigned int has_pae); + +#define PAGE_ALIGNED __attribute__ ((__aligned__(PAGE_SIZE))) + +#define L0_ATTR (_PAGE_PRESENT | _PAGE_RW | _PAGE_ACCESSED | _PAGE_DIRTY) +#define L1_ATTR (_PAGE_PRESENT | _PAGE_RW | _PAGE_ACCESSED | _PAGE_DIRTY) +#define L2_ATTR (_PAGE_PRESENT) + +#ifndef CONFIG_X86_PAE + +static u32 pgtable_level1[L1_PAGETABLE_ENTRIES] PAGE_ALIGNED; + +static void identity_map_page(unsigned long address) +{ + unsigned long mfn; + u32 *pgtable_level2; + + /* Find the current page table */ + mfn = read_cr3() >> PAGE_SHIFT; + pgtable_level2 = map_domain_page(mfn); + + /* Identity map the page table entry */ + pgtable_level1[l1_table_offset(address)] = address | L0_ATTR; + pgtable_level2[l2_table_offset(address)] = __pa(pgtable_level1) | L1_ATTR; + + /* Flush the tlb so the new mapping takes effect. + * Global tlb entries are not flushed but that is not an issue. + */ + write_cr3(mfn << PAGE_SHIFT); + + unmap_domain_page(pgtable_level2); +} + +#else +static u64 pgtable_level1[L1_PAGETABLE_ENTRIES] PAGE_ALIGNED; +static u64 pgtable_level2[L2_PAGETABLE_ENTRIES] PAGE_ALIGNED; + +static void identity_map_page(unsigned long address) +{ + int mfn; + intpte_t *pgtable_level3; + + /* Find the current page table */ + mfn = read_cr3() >> PAGE_SHIFT; + pgtable_level3 = map_domain_page(mfn); + + /* Identity map the page table entry */ + pgtable_level1[l1_table_offset(address)] = address | L0_ATTR; + pgtable_level2[l2_table_offset(address)] = __pa(pgtable_level1) | L1_ATTR; + set_64bit(&pgtable_level3[l3_table_offset(address)], + __pa(pgtable_level2) | L2_ATTR); + + /* Flush the tlb so the new mapping takes effect. + * Global tlb entries are not flushed but that is not an issue. 
+ */ + load_cr3(mfn << PAGE_SHIFT); + + unmap_domain_page(pgtable_level3); +} +#endif + +static void kexec_load_segments(void) +{ +#define __SSTR(X) #X +#define SSTR(X) __SSTR(X) + __asm__ __volatile__ ( + "\tljmp $"SSTR(__HYPERVISOR_CS)",$1f\n" + "\t1:\n" + "\tmovl $"SSTR(__HYPERVISOR_DS)",%%eax\n" + "\tmovl %%eax,%%ds\n" + "\tmovl %%eax,%%es\n" + "\tmovl %%eax,%%fs\n" + "\tmovl %%eax,%%gs\n" + "\tmovl %%eax,%%ss\n" + ::: "eax", "memory"); +#undef SSTR +#undef __SSTR +} + +#define kexec_load_idt(dtr) __asm__ __volatile("lidt %0"::"m" (*dtr)) +static void kexec_set_idt(void *newidt, __u16 limit) +{ + struct Xgt_desc_struct curidt; + + /* ia32 supports unaliged loads & stores */ + curidt.size = limit; + curidt.address = (unsigned long)newidt; + + kexec_load_idt(&curidt); + +}; + +#define kexec_load_gdt(dtr) __asm__ __volatile("lgdt %0"::"m" (*dtr)) +static void kexec_set_gdt(void *newgdt, __u16 limit) +{ + struct Xgt_desc_struct curgdt; + + /* ia32 supports unaligned loads & stores */ + curgdt.size = limit; + curgdt.address = (unsigned long)newgdt; + + kexec_load_gdt(&curgdt); +}; + +#endif + +int machine_kexec_prepare(struct dom0_kexec *arg) +{ + return 0; +} + +void machine_kexec_cleanup(struct dom0_kexec *arg) +{ +} + +void machine_kexec(struct dom0_kexec *arg) +{ +#ifdef CONFIG_X86_32 + relocate_new_kernel_t rnk; + + local_irq_disable(); + + identity_map_page(arg->u.kexec.reboot_code_buffer); + + copy_from_user((void *)arg->u.kexec.reboot_code_buffer, + arg->u.kexec.relocate_new_kernel, + arg->u.kexec.relocate_new_kernel_size); + + kexec_load_segments(); + + kexec_set_gdt(__va(0),0); + + kexec_set_idt(__va(0),0); + + rnk = (relocate_new_kernel_t) arg->u.kexec.reboot_code_buffer; + + (*rnk)(arg->u.kexec.indirection_page, arg->u.kexec.reboot_code_buffer, + arg->u.kexec.start_address, cpu_has_pae); +#endif +} + +/* + * Local variables: + * mode: C + * c-set-style: "BSD" + * c-basic-offset: 4 + * tab-width: 4 + * indent-tabs-mode: nil + * End: + */ --- 
x/xen/arch/x86/setup.c +++ x/xen/arch/x86/setup.c @@ -37,6 +37,11 @@ static unsigned int opt_xenheap_megabyte integer_param("xenheap_megabytes", opt_xenheap_megabytes); #endif +unsigned int opt_kdump_megabytes = 0; +integer_param("kdump_megabytes", opt_kdump_megabytes); +unsigned int opt_kdump_megabytes_base = 0; +integer_param("kdump_megabytes_base", opt_kdump_megabytes_base); + /* opt_nosmp: If true, secondary processors are ignored. */ static int opt_nosmp = 0; boolean_param("nosmp", opt_nosmp); @@ -159,6 +164,20 @@ void discard_initial_images(void) init_domheap_pages(initial_images_start, initial_images_end); } +void __init move_memory(unsigned long dst, + unsigned long src_start, unsigned long src_end) +{ +#if defined(CONFIG_X86_32) + memmove((void *)dst, /* use low mapping */ + (void *)src_start, /* use low mapping */ + src_end - src_start); +#elif defined(CONFIG_X86_64) + memmove(__va(dst), + __va(src_start), + src_end - src_start); +#endif +} + void __init __start_xen(multiboot_info_t *mbi) { char *cmdline; @@ -289,15 +308,8 @@ void __init __start_xen(multiboot_info_t initial_images_start = xenheap_phys_end; initial_images_end = initial_images_start + modules_length; -#if defined(CONFIG_X86_32) - memmove((void *)initial_images_start, /* use low mapping */ - (void *)mod[0].mod_start, /* use low mapping */ - mod[mbi->mods_count-1].mod_end - mod[0].mod_start); -#elif defined(CONFIG_X86_64) - memmove(__va(initial_images_start), - __va(mod[0].mod_start), - mod[mbi->mods_count-1].mod_end - mod[0].mod_start); -#endif + move_memory(initial_images_start, + mod[0].mod_start, mod[mbi->mods_count-1].mod_end); /* Initialise boot-time allocator with all RAM situated after modules. 
*/ xenheap_phys_start = init_boot_allocator(__pa(&_end)); @@ -344,6 +356,51 @@ void __init __start_xen(multiboot_info_t #endif } + if (opt_kdump_megabytes) { + unsigned long kdump_start, kdump_size, k; + + /* mark images pages as free for now */ + + init_boot_pages(initial_images_start, initial_images_end); + + kdump_start = opt_kdump_megabytes_base << 20; + kdump_size = opt_kdump_megabytes << 20; + + printk("Kdump: %luMB (%lukB) at 0x%lx\n", + kdump_size >> 20, + kdump_size >> 10, + kdump_start); + + if ((kdump_start & ~PAGE_MASK) || (kdump_size & ~PAGE_MASK)) + panic("Kdump parameters not page aligned\n"); + + kdump_start >>= PAGE_SHIFT; + kdump_size >>= PAGE_SHIFT; + + /* allocate pages for Kdump memory area */ + + k = alloc_boot_pages_at(kdump_size, kdump_start); + + if (k != kdump_start) + panic("Unable to reserve Kdump memory\n"); + + /* allocate pages for relocated initial images */ + + k = ((initial_images_end - initial_images_start) & ~PAGE_MASK) ? 1 : 0; + k += (initial_images_end - initial_images_start) >> PAGE_SHIFT; + + k = alloc_boot_pages(k, 1); + + if (!k) + panic("Unable to allocate initial images memory\n"); + + move_memory(k << PAGE_SHIFT, initial_images_start, initial_images_end); + + initial_images_end -= initial_images_start; + initial_images_start = k << PAGE_SHIFT; + initial_images_end += initial_images_start; + } + memguard_init(); printk("System RAM: %luMB (%lukB)\n", --- x/xen/common/page_alloc.c +++ x/xen/common/page_alloc.c @@ -212,24 +212,35 @@ void init_boot_pages(paddr_t ps, paddr_t } } +unsigned long alloc_boot_pages_at(unsigned long nr_pfns, unsigned long pfn_at) +{ + unsigned long i; + + for ( i = 0; i < nr_pfns; i++ ) + if ( allocated_in_map(pfn_at + i) ) + break; + + if ( i == nr_pfns ) + { + map_alloc(pfn_at, nr_pfns); + return pfn_at; + } + + return 0; +} + unsigned long alloc_boot_pages(unsigned long nr_pfns, unsigned long pfn_align) { - unsigned long pg, i; + unsigned long pg, i = 0; for ( pg = 0; (pg + nr_pfns) < max_page; 
pg += pfn_align ) { - for ( i = 0; i < nr_pfns; i++ ) - if ( allocated_in_map(pg + i) ) - break; - - if ( i == nr_pfns ) - { - map_alloc(pg, nr_pfns); - return pg; - } + i = alloc_boot_pages_at(nr_pfns, pg); + if (i != 0) + break; } - return 0; + return i; } --- x/xen/include/public/dom0_ops.h +++ x/xen/include/public/dom0_ops.h @@ -472,6 +472,28 @@ typedef struct dom0_hypercall_init { } dom0_hypercall_init_t; DEFINE_GUEST_HANDLE(dom0_hypercall_init_t); +#define DOM0_KEXEC 49 +typedef struct dom0_kexec{ + unsigned long op; + union { + struct { + unsigned long data; /* Not sure what this should be yet */ + } helper; + struct { + unsigned long indirection_page; + unsigned long reboot_code_buffer; + unsigned long start_address; + const char *relocate_new_kernel; + unsigned int relocate_new_kernel_size; + } kexec; + struct { + unsigned long size; + unsigned long start; + } reserve; + } u; +} dom0_kexec_t; +DEFINE_GUEST_HANDLE(dom0_kexec_t); + typedef struct dom0_op { uint32_t cmd; uint32_t interface_version; /* DOM0_INTERFACE_VERSION */ @@ -513,6 +535,7 @@ typedef struct dom0_op { struct dom0_irq_permission irq_permission; struct dom0_iomem_permission iomem_permission; struct dom0_hypercall_init hypercall_init; + struct dom0_kexec kexec; uint8_t pad[128]; } u; } dom0_op_t; --- x/xen/include/public/xen.h +++ x/xen/include/public/xen.h @@ -215,6 +215,14 @@ DEFINE_GUEST_HANDLE(mmuext_op_t); #define VMASST_TYPE_writable_pagetables 2 #define MAX_VMASST_TYPE 2 +/* + * Operations for kexec. 
+ */ +#define KEXEC_CMD_kexec 0 +#define KEXEC_CMD_kexec_prepare 1 +#define KEXEC_CMD_kexec_cleanup 2 +#define KEXEC_CMD_reserve 3 + #ifndef __ASSEMBLY__ typedef uint16_t domid_t; --- x/xen/include/xen/mm.h +++ x/xen/include/xen/mm.h @@ -40,6 +40,7 @@ struct page_info; paddr_t init_boot_allocator(paddr_t bitmap_start); void init_boot_pages(paddr_t ps, paddr_t pe); unsigned long alloc_boot_pages(unsigned long nr_pfns, unsigned long pfn_align); +unsigned long alloc_boot_pages_at(unsigned long nr_pfns, unsigned long pfn_at); void end_boot_allocator(void); /* Generic allocator. These functions are *not* interrupt-safe. */ _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
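The page_alloc.c hunk above splits the boot allocator into "claim this exact range" (alloc_boot_pages_at) and "search for any free range" (alloc_boot_pages). A minimal stand-alone sketch of that split follows. It is not the Xen code: the 64-frame toy bitmap, the MAX_PAGE value and the function names alloc_pages_at/alloc_pages are assumptions made for illustration, and unlike the patch it returns a status code instead of the frame number, so that frame 0 cannot be confused with failure.

```c
#include <string.h>

#define MAX_PAGE 64UL                       /* toy machine: 64 page frames (assumption) */

static unsigned char alloc_map[MAX_PAGE];   /* 1 = allocated, 0 = free */

static int allocated_in_map(unsigned long pfn)
{
    return alloc_map[pfn];
}

static void map_alloc(unsigned long first, unsigned long nr)
{
    memset(&alloc_map[first], 1, nr);
}

/*
 * Claim exactly the frames [pfn_at, pfn_at + nr_pfns).
 * Returns 0 on success, -1 if the range is out of bounds or
 * any frame in it is already allocated.
 */
static int alloc_pages_at(unsigned long nr_pfns, unsigned long pfn_at)
{
    unsigned long i;

    if (nr_pfns == 0 || pfn_at >= MAX_PAGE || nr_pfns > MAX_PAGE - pfn_at)
        return -1;

    for (i = 0; i < nr_pfns; i++)
        if (allocated_in_map(pfn_at + i))
            return -1;

    map_alloc(pfn_at, nr_pfns);
    return 0;
}

/*
 * First-fit search built on top of alloc_pages_at(): try every
 * pfn_align-aligned start frame and take the first range that is free.
 * On success the start frame is stored in *pfn_out.
 */
static int alloc_pages(unsigned long nr_pfns, unsigned long pfn_align,
                       unsigned long *pfn_out)
{
    unsigned long pg;

    if (pfn_align == 0)
        return -1;

    for (pg = 0; pg < MAX_PAGE; pg += pfn_align) {
        if (alloc_pages_at(nr_pfns, pg) == 0) {
            *pfn_out = pg;
            return 0;
        }
    }

    return -1;
}
```

In this sketch, reserving a fixed window first (as the Kdump block in setup.c does) and then asking alloc_pages() for a floating range makes the search skip the reserved frames.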
Hi, Horms and Mark

This is good work! It is very necessary for debugging the hypervisor or domain0. Some points are not clear to me.

1. Is this feature available on uni-processor machines?
2. Could you explain the usage in more detail?

Mark, what do you think about this kdump implementation?

Best Regards,

Akio Takebe

>On Wed, Apr 12, 2006 at 06:12:30PM +0900, Horms wrote:
>>
>> kexec: framework and i386
>
>[snip]
>
>> * This patch was prepared against xen-unstable.hg 9514
>> As of today (9574) two new hypercalls have been added.
>> I rediffed and moved the kexec hypercall to 33. However
>> this exceeds hypercall_NR, which is currently 32.
>> I tried increasing this, but the dom0 now crashes
>> in entry.S on init. Even after rebuilding both xen and the kernel
>> completely from scratch after a make distclean. Help!!
>
>Hi,
>
>I am a bit concerned that this patch is going to start rotting if I
>can't at least track the current xen-unstable.hg, or better still get it
>merged.
>
>I would really appreciate it if someone could take a moment to comment on
>the hypercall problem. Is adding a new hypercall, as the current patch
>does, the best way? If so could someone point me to how to increase the
>hypercall table size. If not, is it best to piggyback on the dom0_op
>hypercall? Or is there some other preferred option?
>
>--
>Horms
horms-home@verge.net.au
2006-Apr-21 06:55 UTC
Re: [Xen-devel] Re: [PATCH]: kexec: framework and i386
On Fri, Apr 21, 2006 at 03:10:33PM +0900, Akio Takebe wrote:
> Hi, Horms and Mark
>
> This is good work!
> It is very necessary for debugging hypervisor or domain0.
> I'm not clear at some points.
>
> 1. Is this feature available on uni-processor machine?

At this stage only uni-processor is supported. However, the framework supports SMP, and Magnus and I are planning to fill in those bits soon. Actually, this is the next thing on my list of things to do.

> 2. Could you explain more detail usage?

The usage is mostly the same as kexec in the Linux kernel. You can kexec from xen into another kernel by using kexec -l and kexec -e, as per linux.

    kexec -l /boot/vmlinux --append "ro root=/dev/hda..."
    kexec -e

You can also load a kernel that will be run on system crash using kexec -p. This is discussed at some length in Documentation/kdump/kdump.txt of the linux source tree. Those instructions can be followed verbatim for xen.

The main difference with kdump is that instead of using the crashkernel command line option to linux, you use the kdump_megabytes and kdump_megabytes_base command line options to xen. When running xen, the crashkernel linux command line option is ignored.

The reason for moving the option from linux to xen is that the memory apparently needs to be reserved by xen before it starts dom0. The reason that there are two options instead of one is simplicity: Linux provides infrastructure to read the more complex compound option, xen does not. This can be changed if it is a problem.

In summary, for kdump "linux crashkernel=64M@16M" becomes "xen kdump_megabytes=64 kdump_megabytes_base=16"

The other main point to note for users is that only the xen -> linux transition is currently possible; xen -> xen and linux -> xen are not. This is because kexec-tools, the user-space component of kexec, does not understand enough about xen, and thus needs to be enhanced in order to make this possible.

> Mark, what do you think about this kdump implementation?
>
> Best Regards,
>
> Akio Takebe

--
Horms
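To make the mapping between the two option styles concrete, here is a small sketch that splits the "64M@16M" form quoted above into the two megabyte values the patch reads via kdump_megabytes and kdump_megabytes_base. The helper names split_crash_region and format_xen_options are hypothetical and appear nowhere in the patch; the sketch also assumes both numbers carry an "M" suffix, as in the example in the message.

```c
#include <stdio.h>
#include <stdlib.h>

/*
 * Hypothetical helper (not part of the patch): parse "<size>M@<base>M"
 * into two megabyte counts. Returns 0 on success, -1 on malformed input.
 */
static int split_crash_region(const char *spec,
                              unsigned long *size_mb,
                              unsigned long *base_mb)
{
    char *end;

    *size_mb = strtoul(spec, &end, 10);
    if (end == spec || end[0] != 'M' || end[1] != '@')
        return -1;

    spec = end + 2;                      /* skip "M@" */
    *base_mb = strtoul(spec, &end, 10);
    if (end == spec || end[0] != 'M' || end[1] != '\0')
        return -1;

    return 0;
}

/*
 * Write the equivalent Xen command line fragment into buf.
 * Returns 0 on success, -1 if spec could not be parsed.
 */
static int format_xen_options(const char *spec, char *buf, size_t len)
{
    unsigned long size_mb, base_mb;

    if (split_crash_region(spec, &size_mb, &base_mb) != 0)
        return -1;

    snprintf(buf, len, "kdump_megabytes=%lu kdump_megabytes_base=%lu",
             size_mb, base_mb);
    return 0;
}
```

Inside Xen the patch then shifts both values left by 20 to obtain byte counts, and panics if either is not page aligned.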
Hi, Horms

Thank you for your kind explanation. I have a small question.

Is this kdump called only when domain0 panics? Is it also called when Xen (the hypervisor) panics?

I think the dump feature needs to be callable from the NMI handler and from panic(), as in the Linux code.

Best Regards,

Akio Takebe
> In summary, for kdump "linux crashkernel=64M@16M"
> becomes "xen kdump_megabytes=64 kdump_megabytes_base=16"

Please can you explain a bit about how kdump works, and why a physically contiguous region with a known base address is required. What actually gets written out in the crash dump, and in what format?

> The other main point to note for users is that while the
> following transition is possible: xen -> linux. But xen ->
> xen and linux -> xen is currently not possible. This is
> because kexec-tools, the user-space component of kexec does
> not understand enough about xen, and thus needs to be
> enhanced in order to make this possible.

Tim Deegan submitted a patch to add support for multiboot images (such as Xen) to kexec a couple of years ago, and I believe it has been part of the standard package for some time.

Thanks,
Ian
Mark Williamson
2006-Apr-23 14:45 UTC
Re: [Xen-devel] Re: [PATCH]: kexec: framework and i386
> This is good work!
> It is very necessary for debugging hypervisor or domain0.

Agreed.

> Mark, what do you think about this kdump implementation?

I think kdump, in general, is a nifty solution for supporting crashdumps, since using a separate kernel for crashdumps gives you the best possible opportunity to complete them successfully.

Integrating with the Linux crashdump infrastructure seems like a good idea. It doesn't stop other dom0 OSes using their own crashdump infrastructure, but with kdump we have a chance of getting a dump even if Xen itself crashes (whilst admittedly rare, such crashes are something you'd want to get debugged quickly!)

So, essentially, I think the idea is good - I'll try and take a look through the code.

Cheers,
Mark

> Best Regards,
>
> Akio Takebe
>
> >On Wed, Apr 12, 2006 at 06:12:30PM +0900, Horms wrote:
> >> kexec: framework and i386
> >
> >[snip]
> >
> >> * This patch was prepared against xen-unstable.hg 9514
> >> As of today (9574) two new hypercalls have been added.
> >> I rediffed and moved the kexec hypercall to 33. However
> >> this exceeds hypercall_NR, which is currently 32.
> >> I tried increasing this, but the dom0 now crashes
> >> in entry.S on init. Even after rebuilding both xen and the kernel
> >> completely from scratch after a make distclean. Help!!
> >
> >Hi,
> >
> >I am a bit concerned that this patch is going to start rotting if I
> >can't at least track the current xen-unstable.hg, or better still get it
> >merged.
> >
> >I would really appreciate it if someone could take moments to comment on
> >the hypercall problem. Is adding a new hypercall, as the current patch
> >does, the best way? If so could someone point me to how to increase the
> >hypercall table size. If not, is it best to piggyback of the dom0_op
> >hypercall? Or is there some other preferred option?
> >
> >--
> >Horms

--
Dave: Just a question. What use is a unicycle with no seat? And no pedals!
Mark: To answer a question with a question: What use is a skateboard?
Dave: Skateboards have wheels.
Mark: My wheel has a wheel!
Mark Williamson
2006-Apr-23 14:53 UTC
Re: [Xen-devel] Re: [PATCH]: kexec: framework and i386
On Sunday 23 April 2006 15:33, Ian Pratt wrote:
> > In summary, for kdump "linux crashkernel=64M@16M"
> > becomes "xen kdump_megabytes=64 kdump_megabytes_base=16"
>
> Please can you explain a bit about how kdump works, and why a physically
> contiguous region with known base address is required. What actually
> gets written out in the crash dump and in what format?

The reserved region is the memory space for the "dump kernel". I believe the base address has to correspond to the base address compiled into the dump kernel - since we don't want the dump kernel to try to own all of memory. It's native Linux, so it likes to run in contiguous memory.

When a panic occurs, Linux kexec jumps into the preloaded kdump kernel (if any). This kernel then reinitialises the hardware using its own device drivers, and uses these to write out the dump to disk. ISTR that the dump format is currently ELF, although I remember some talk on the Fastboot ML about adding some extra headers to make OS debugging easier.

It's a nice solution because you don't rely on the hosed kernel to do the dump for you, and you don't disturb its state in the process. It also makes it easy to do things like dumping to network devices, etc.

In our case it has the added bonus that on a dom0 *or* a Xen crash it ought to be possible to kexec into a native Linux kernel which could dump (possibly some configurable combination of) Xen itself, dom0, and all the other domains. Admittedly hypervisor crashes / hangs are rare, but it might aid debugging to be able to get a reliable dump of a crashed / hung Xen.

This would also integrate with the Linux dump infrastructure, which would be useful to have.

> Tim Deegan submitted a patch to add support for multiboot images (such
> as Xen) to kexec a couple of years ago, and I believe it has been part
> of the standard package for some time.

It was in there last time I looked at the source code... I've never actually used it though, so in principle I guess it could have rotted. Or there could just be something weird happening.

Cheers,
Mark

--
Dave: Just a question. What use is a unicycle with no seat? And no pedals!
Mark: To answer a question with a question: What use is a skateboard?
Dave: Skateboards have wheels.
Mark: My wheel has a wheel!
Hirokazu Takahashi
2006-Apr-23 15:53 UTC
Re: [Xen-devel] Re: [PATCH]: kexec: framework and i386
Hi,

> > > In summary, for kdump "linux crashkernel=64M@16M"
> > > becomes "xen kdump_megabytes=64 kdump_megabytes_base=16"
> >
> > Please can you explain a bit about how kdump works, and why a physically
> > contiguous region with known base address is required. What actually
> > gets written out in the crash dump and in what format?
>
> The reserved region is the memory space for the "dump kernel". I believe the
> base address has to correspond to the base address compiled into the dump
> kernel - since we don't want the dump kernel to try to own all of memory.
> It's native Linux, so it likes to run in contiguous memory.

The current implementation of the linux kernel for x86 requires:

1. Memory for the kernel has to be physically contiguous.
2. The physical memory has to be mapped to a specific virtual space. It assumes: virtual address for the kernel == physical address | 0x80000000.

> When a panic occurs, Linux kexec jumps into the preloaded kdump kernel (if
> any). This kernel then reinitialises the hardware, using its own device
> drivers and uses these to write out the dump to disk. ISTR that the dump
> format is currently ELF, although I remember some talk on the Fastboot ML
> about adding some extra headers to make OS debugging easier.
>
> It's a nice solution because you don't rely on the hosed kernel to do the dump
> for you, and you don't disturb its state in the process. It also makes it
> easy to do things like dumping to network devices, etc.
>
> In our case it has the added bonus that on dom0 *or* a Xen crash it ought to
> be possible to kexec into a native Linux kernel which could dump (possibly
> some configurable combination of) Xen itself, dom0, and all the other
> domains. Admittedly hypervisor crashes / hangs are rare, but it might aid
> debugging to be able to get a reliable dump of a crashed / hung Xen.
>
> This would also integrate with the Linux dump infrastructure, which would be
> useful to have.

After the dump, I think you will be able to kexec a new Xen and dom0 to restart the box automatically.

> > Tim Deegan submitted a patch to add support for multiboot images (such
> > as Xen) to kexec a couple of years ago, and I believe it has been part
> > of the standard package for some time.
>
> It was in there last time I looked at the source code... I've never actually
> used it though, so in principle I guess it could have rotted. Or there could
> just be something weird happening.
>
> Cheers,
> Mark

Thanks,
Hirokazu Takahashi.
> The reserved region is the memory space for the "dump
> kernel". I believe the base address has to correspond to the
> base address compiled into the dump kernel - since we don't
> want the dump kernel to try to own all of memory.
> It's native Linux, so it likes to run in contiguous memory.

This approach is rather wasteful of memory, though maybe burning 64MB isn't a big deal these days.

Strictly speaking, we only really need to reserve a couple of MB for the kernel text. It's unlikely that you'd want to dump pages belonging to unpriv guests, so there's actually quite a lot of opportunity for shuffling things around to get the dump kernel to where it expects to be in the machine address space, zeroing the data/bss segments.

> When a panic occurs, Linux kexec jumps into the preloaded
> kdump kernel (if any). This kernel then reinitialises the
> hardware, using its own device drivers and uses these to
> write out the dump to disk. ISTR that the dump format is
> currently ELF, although I remember some talk on the Fastboot
> ML about adding some extra headers to make OS debugging easier.

Are Xen and the dom0 kernel dumped as separate ELF cores?

Ian

> It's a nice solution because you don't rely on the hosed
> kernel to do the dump for you, and you don't disturb its
> state in the process. It also makes it easy to do things
> like dumping to network devices, etc.
>
> In our case it has the added bonus that on dom0 *or* a Xen
> crash it ought to be possible to kexec into a native Linux
> kernel which could dump (possibly some configurable
> combination of) Xen itself, dom0, and all the other domains.
> Admittedly hypervisor crashes / hangs are rare, but it might
> aid debugging to be able to get a reliable dump of a crashed
> / hung Xen.
>
> This would also integrate with the Linux dump infrastructure,
> which would be useful to have.
>
> > Tim Deegan submitted a patch to add support for multiboot
> > images (such as Xen) to kexec a couple of years ago, and I
> > believe it has been part of the standard package for some time.
>
> It was in there last time I looked at the source code...
> I've never actually used it though, so in principle I guess
> it could have rotted. Or there could just be something weird
> happening.
>
> Cheers,
> Mark
>
> --
> Dave: Just a question. What use is a unicycle with no seat?
> And no pedals!
> Mark: To answer a question with a question: What use is a skateboard?
> Dave: Skateboards have wheels.
> Mark: My wheel has a wheel!
Mark Williamson
2006-Apr-23 22:47 UTC
Re: [Xen-devel] Re: [PATCH]: kexec: framework and i386
> > The reserved region is the memory space for the "dump
> > kernel". I believe the base address has to correspond to the
> > base address compiled into the dump kernel - since we don't
> > want the dump kernel to try to own all of memory.
> > It's native Linux, so it likes to run in contiguous memory.
>
> This approach is rather wasteful of memory, though maybe burning 64MB
> isn't a big deal these days.

Yep, although for a dump kernel's userspace you probably don't need that much. With a minimalised userland you could get this substantially lower, but I'm not sure to what extent modern distros bother... I'm not sure the distros are fully leveraging kdump yet, since it's still fairly new.

It depends a bit on how much you want to run - you could use gdb interactively from the dump kernel's console, for instance.

> Strictly speaking, we only really need to reserve a couple of MB for the
> kernel text.
> It's unlikely that you'd want to dump pages belonging to unpriv guests,
> so there's actually quite a lot of opportunity for shuffling things
> around to get the dump kernel to where it expects to be in the machine
> address space, zeroing the data/bss segments.

Yep, also true.

> > When a panic occurs, Linux kexec jumps into the preloaded
> > kdump kernel (if any). This kernel then reinitialises the
> > hardware, using its own device drivers and uses these to
> > write out the dump to disk. ISTR that the dump format is
> > currently ELF, although I remember some talk on the Fastboot
> > ML about adding some extra headers to make OS debugging easier.
>
> Is Xen and the dom0 kernel dumped as as separate ELF cores?

I'm not sure how Horms has done this, if he has yet... I'm more familiar with native Linux kdump. I can't really see that it'd be that much harder to do this, though.

Cheers,
Mark

> Ian
>
> > It's a nice solution because you don't rely on the hosed
> > kernel to do the dump for you, and you don't disturb its
> > state in the process. It also makes it easy to do things
> > like dumping to network devices, etc.
> >
> > In our case it has the added bonus that on dom0 *or* a Xen
> > crash it ought to be possible to kexec into a native Linux
> > kernel which could dump (possibly some configurable
> > combination of) Xen itself, dom0, and all the other domains.
> > Admittedly hypervisor crashes / hangs are rare, but it might
> > aid debugging to be able to get a reliable dump of a crashed
> > / hung Xen.
> >
> > This would also integrate with the Linux dump infrastructure,
> > which would be useful to have.
> >
> > > Tim Deegan submitted a patch to add support for multiboot
> > > images (such as Xen) to kexec a couple of years ago, and I
> > > believe it has been part of the standard package for some time.
> >
> > It was in there last time I looked at the source code...
> > I've never actually used it though, so in principle I guess
> > it could have rotted. Or there could just be something weird
> > happening.
> >
> > Cheers,
> > Mark
> >
> > --
> > Dave: Just a question. What use is a unicycle with no seat?
> > And no pedals!
> > Mark: To answer a question with a question: What use is a skateboard?
> > Dave: Skateboards have wheels.
> > Mark: My wheel has a wheel!

--
Dave: Just a question. What use is a unicycle with no seat? And no pedals!
Mark: To answer a question with a question: What use is a skateboard?
Dave: Skateboards have wheels.
Mark: My wheel has a wheel!
Hi, Mark

Thank you for your comments!

>I think kdump, in general, is a nifty solution for supporting crashdumps,
>since using a separate kernel for crashdumps gives you the best possible
>opportunity to complete them successfully.
>
>Integrating with the Linux crashdump infrastructure seems like a good idea.
>It doesn't stop other dom0 OSes using their own crashdump infrastructure,
>but with kdump we have a chance of getting a dump even if Xen itself crashes
>(whilst admittedly rare, such crashes are something you'd want to get
>debugged quickly!)

Yes, I also agree.

>So, essentially, I think the idea is good - I'll try and take a look through
>the code.

I will also try to read the new patch.
http://lists.xensource.com/archives/html/xen-devel/2006-04/msg00968.html

Can the kexec hypercall use hypercall number 11 or 30? I think using dom0_op rather than a new hypercall is a good idea, because kexec is rarely used. :-)

Best Regards,

Akio Takebe
Kazuo Moriwaka
2006-Apr-24 01:16 UTC
Re: [Xen-devel] Re: [PATCH]: kexec: framework and i386
Hi,

On 4/24/06, Ian Pratt <m+Ian.Pratt@cl.cam.ac.uk> wrote:
> > When a panic occurs, Linux kexec jumps into the preloaded
> > kdump kernel (if any). This kernel then reinitialises the
> > hardware, using its own device drivers and uses these to
> > write out the dump to disk. ISTR that the dump format is
> > currently ELF, although I remember some talk on the Fastboot
> > ML about adding some extra headers to make OS debugging easier.
>
> Is Xen and the dom0 kernel dumped as as separate ELF cores?

I'm now working on clipping a domain image out of a whole-machine dump for x86_32. My current prototype reads an ELF core and writes a dom0 image.

todo:
- The output format is not ELF core yet. It is the Xen domain core image format (works with gdbserverxen).
- Register information does not work well yet.

--
Kazuo Moriwaka
Kazuo Moriwaka
2006-Apr-24 01:30 UTC
Re: [Xen-devel] Re: [PATCH]: kexec: framework and i386
I forgot to attach my scripts and readme.

On 4/24/06, Kazuo Moriwaka <moriwaka@valinux.co.jp> wrote:
> Hi,
>
> On 4/24/06, Ian Pratt <m+Ian.Pratt@cl.cam.ac.uk> wrote:
> > > When a panic occurs, Linux kexec jumps into the preloaded
> > > kdump kernel (if any). This kernel then reinitialises the
> > > hardware, using its own device drivers and uses these to
> > > write out the dump to disk. ISTR that the dump format is
> > > currently ELF, although I remember some talk on the Fastboot
> > > ML about adding some extra headers to make OS debugging easier.
> >
> > Is Xen and the dom0 kernel dumped as as separate ELF cores?
>
> I'm working on clipping domain image from whole-machine dump for x86_32 now.
> Now my prototype reads ELF core and write dom0 image.
>
> todo:
> - Output format is not ELF core yet. Xen domain core image
> format(works with gdbserverxen).
> - register information is not work well.

--
Kazuo Moriwaka
Isaku Yamahata
2006-Apr-24 01:53 UTC
Hypercall number assignment convention (was Re: [Xen-devel] Re: [PATCH]: kexec: framework and i386)
On Mon, Apr 24, 2006 at 10:10:36AM +0900, Akio Takebe wrote:
> Can the kexec hypercall use hypercall 11 or 30?
> I think using dom0_op rather than a hypercall is a good idea
> because kexec is rarely used. :-)

I think Rusty's xen share also had a similar problem caused by a
hypercall number conflict.
Xen/ia64 with the virtual physical model also needs a hypercall number
for its own use.
Currently it is large enough (=256) that it is unlikely to be used by
xen/x86.

Is there any convention for how to take a hypercall number?
At least the hypercall numbers for arch-specific and experimental
purposes should be defined.

--
yamahata
Keir Fraser
2006-Apr-24 07:32 UTC
Re: Hypercall number assignment convention (was Re: [Xen-devel] Re: [PATCH]: kexec: framework and i386)
On 24 Apr 2006, at 02:53, Isaku Yamahata wrote:
> I think Rusty's xen share also had a similar problem caused by
> the hypercall number conflict.
> Xen/ia64 with virtual physical model also needs a hypercall number
> for its own use.
> Currently it large enough (=256) that it is unlikely to be used by
> xen/x86.
>
> Is there any convention about how to take hypercall number?
> At least hypercall numbers for arch-specific purpose and
> experimental purpose should be defined.

The list of __HYPERVISOR_* defines in public/xen.h in the main xen
repository is the canonical place. For hypercalls in our tree, simply
grabbing the next number in sequence usually makes sense. I'm not sure
whether having structure to the hypercall numbers makes sense (e.g., a
range for arch-specific usage) -- if so then maybe allocating from 64
upwards would make sense.

 -- Keir
Muli Ben-Yehuda
2006-Apr-24 11:20 UTC
Re: Hypercall number assignment convention (was Re: [Xen-devel] Re: [PATCH]: kexec: framework and i386)
On Mon, Apr 24, 2006 at 08:32:09AM +0100, Keir Fraser wrote:
> The list of __HYPERVISOR_* defines in public/xen.h in the main xen
> repository is the canonical place. For hypercalls in our tree, simply
> grabbing the next number in sequence usually makes sense. I'm not sure
> whether having structure to the hypercall numbers makes sense (e.g., a
> range for arch-specific usage) -- if so then maybe allocating from 64
> upwards would make sense.

Won't having discontiguous regions of hcalls break the NR_hypercall
masking check?

Cheers,
Muli
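For readers following the masking question, here is a simplified C model of a dispatch table indexed through a mask. It is only a sketch and not Xen's actual entry.S code: NR_HYPERCALLS, do_example_op and the slot numbers 32 and 48 are made up for illustration. It assumes the table size is a power of two and that every unassigned slot points at a "not implemented" handler, as the `.rept ... do_ni_hypercall` filler in the posted patch does.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical table size; it must be a power of two for the mask to work. */
#define NR_HYPERCALLS 64

typedef long (*hypercall_fn)(unsigned long op, void *arg);

/* Handler used for every slot that has no real hypercall assigned. */
static long do_ni_hypercall(unsigned long op, void *arg)
{
    (void)op;
    (void)arg;
    return -ENOSYS;
}

/* Hypothetical hypercall that simply echoes its op argument. */
static long do_example_op(unsigned long op, void *arg)
{
    (void)arg;
    return (long)op;
}

static hypercall_fn hypercall_table[NR_HYPERCALLS];

/* Fill the whole table with the "not implemented" handler, then
 * assign two non-adjacent slots to show that gaps are harmless. */
static void init_table(void)
{
    for (size_t i = 0; i < NR_HYPERCALLS; i++)
        hypercall_table[i] = do_ni_hypercall;
    hypercall_table[32] = do_example_op;
    hypercall_table[48] = do_example_op;
}

/* Masking keeps any guest-supplied number inside the table. */
static long dispatch(unsigned long nr, unsigned long op, void *arg)
{
    return hypercall_table[nr & (NR_HYPERCALLS - 1)](op, arg);
}
```

In this model a gap between assigned numbers only costs unused table entries. The mask guarantees an in-bounds index, but an out-of-range number wraps onto a valid slot rather than being rejected, so the table has to be large enough to hold the highest assigned number.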
Horms
2006-Apr-25 00:11 UTC
Re: Hypercall number assignment convention (was Re: [Xen-devel] Re: [PATCH]: kexec: framework and i386)
On Mon, Apr 24, 2006 at 08:32:09AM +0100, Keir Fraser wrote:
>
> On 24 Apr 2006, at 02:53, Isaku Yamahata wrote:
>
> >I think Rusty's xen share also had a similar problem caused by
> >the hypercall number conflict.
> >Xen/ia64 with virtual physical model also needs a hypercall number
> >for its own use.
> >Currently it large enough (=256) that it is unlikely to be used by
> >xen/x86.
> >
> >Is there any convention about how to take hypercall number?
> >At least hypercall numbers for arch-specific purpose and
> >experimental purpose should be defined.
>
> The list of __HYPERVISOR_* defines in public/xen.h in the main xen
> repository is the canonical place. For hypercalls in our tree, simply
> grabbing the next number in sequence usually makes sense. I'm not sure
> whether having structure to the hypercall numbers makes sense (e.g., a
> range for arch-specific usage) -- if so then maybe allocating from 64
> upwards would make sense.

There is a small problem, in that for x86_32 at least the hypercall
table is currently full with 32 entries (well, the last time I checked
anyway), and my attempts to extend it were futile. Could you give me
some advice on how to increase its size?

--
Horms
Hi, Kazuo

Good work!
Could you explain how to use your tool?
To use it, do I need xen-syms, vmlinux (with debug option), and so on?
Can this tool extract only dom0 from a coredump file?
Can it also extract xen or domU?

Best Regards,

Akio Takebe
Kazuo Moriwaka
2006-Apr-25 02:34 UTC
Re: [Xen-devel] Re: [PATCH]: kexec: framework and i386
Hi, Akio,

On 4/25/06, Akio Takebe <takebe_akio@jp.fujitsu.com> wrote:
> Could you explain usage of your tool?

I'm sorry for the lack of description. Its usage is:

  ./dom0cut_x86_32.py -oOutputFile -dDumpFile -xXenSymsFile -vVmlinuxFile -telf

The argument to -t is the file type of the input dump file.
The script also shows some help:

  ./dom0cut_x86_32.py --help

> For using your tool, do I need xen-syms, vmlinux(with debug option),
> and so on?

Yes. Internally it calls readelf and nm to fetch ELF sections and some
other info.
# For now this script doesn't use the vmlinux debug info.

> Can this tool extract only dom0 from coredump file?

Yes, for now it extracts only dom0.

> Can this tool also extract xen or domU?

Not yet. But I think extending the script to extract domU is not so
difficult. To support domU:
- In dom0cut_x86_32.py, dom0extract() currently calls XenImage.get_dom0p()
  to get *dom0. It should be replaced with some function like
  XenImage.get_domainp(id) to extract a domU image.
- kdump saves dom0's latest CPU register info in the ELF core's PT_NOTE,
  so dom0 needs some special treatment of registers.

For xen, I'm not clear on how to set up suitable context information for
the dump file.

regards,
--
Kazuo Moriwaka
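As background to the PT_NOTE remark, the following is a small C sketch of walking a buffer of ELF notes to find one by name and type. It is not taken from dom0cut_x86_32.py (which is a Python script); find_note(), struct note_hdr and ROUND4 are hypothetical names. It assumes the common note layout, a 12-byte header followed by the name and the descriptor with each padded to 4 bytes, and an all-zero header as terminator.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Three-word note header: name size, descriptor size, type. */
struct note_hdr {
    uint32_t n_namesz;
    uint32_t n_descsz;
    uint32_t n_type;
};

/* Round a size up to the next multiple of 4 bytes. */
#define ROUND4(x) (((size_t)(x) + 3) & ~(size_t)3)

/*
 * Return a pointer to the descriptor of the first note in buf whose
 * name and type match, storing its length in *desc_len, or NULL if
 * no such note exists. An all-zero header ends the walk.
 */
static const void *find_note(const uint8_t *buf, size_t len,
                             const char *name, uint32_t type,
                             uint32_t *desc_len)
{
    size_t off = 0;

    while (off + sizeof(struct note_hdr) <= len) {
        struct note_hdr h;
        size_t name_off, desc_off, next;

        memcpy(&h, buf + off, sizeof(h));
        if (h.n_namesz == 0 && h.n_descsz == 0 && h.n_type == 0)
            break;                      /* terminating empty note */

        name_off = off + sizeof(h);
        desc_off = name_off + ROUND4(h.n_namesz);
        next = desc_off + ROUND4(h.n_descsz);
        if (desc_off > len || next > len)
            break;                      /* truncated note, stop */

        if (h.n_type == type &&
            h.n_namesz == strlen(name) + 1 &&
            memcmp(buf + name_off, name, h.n_namesz) == 0) {
            *desc_len = h.n_descsz;
            return buf + desc_off;
        }
        off = next;
    }
    return NULL;
}
```

A caller would pass the contents of a PT_NOTE segment and look for the name "CORE" with type NT_PRSTATUS (1) to reach the saved register block.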
Keir Fraser
2006-Apr-25 09:57 UTC
Re: Hypercall number assignment convention (was Re: [Xen-devel] Re: [PATCH]: kexec: framework and i386)
On 25 Apr 2006, at 01:11, Horms wrote:
> There is a small problem, in that for x86_32 at least the hypercall
> table is currently full with 32 entries (well, the last time I checked
> anyway), and my attempts to extend it were futile. Could you give me
> some advice on how to increase its size?

Double NR_hypercalls in include/asm-x86/config.h.

 -- Keir
Isaku Yamahata
2006-Apr-26 02:09 UTC
Re: Hypercall number assignment convention (was Re: [Xen-devel] Re: [PATCH]: kexec: framework and i386)
On Mon, Apr 24, 2006 at 08:32:09AM +0100, Keir Fraser wrote:
> The list of __HYPERVISOR_* defines in public/xen.h in the main xen
> repository is the canonical place. For hypercalls in our tree, simply
> grabbing the next number in sequence usually makes sense. I'm not sure
> whether having structure to the hypercall numbers makes sense (e.g., a
> range for arch-specific usage) -- if so then maybe allocating from 64
> upwards would make sense.

Actually xen/ia64 requires only one hypercall number for now.
I attached the patches to take one.
I'm not sure what name you prefer, so I attached two patches.
Please apply whichever you prefer (or invent whatever name you like).

--
yamahata
On Tue, Apr 25, 2006 at 10:57:11AM +0100, Keir Fraser wrote:
>
> On 25 Apr 2006, at 01:11, Horms wrote:
>
> >There is a small problem, in that for x86_32 at least the hypercall
> >table is currently full with 32 entries (well, the last time I checked
> >anyway), and my attempts to extend it were futile. Could you give me
> >some advice on how to increase its size?
>
> Double NR_hypercalls in include/asm-x86/config.h.

Thanks, the new version of the kexec patch below does just that.
I did try that before, but for some reason it didn't work, most likely
because it was Friday afternoon at the time.

Also, this patch takes hypercall 32, which conflicts with Yamahata-san's
ia64 hypercall. Should I move to 33 and submit a patch that just does
hypercall reservation as he did?

--
Horms                                           http://www.vergenet.net/~horms/

kexec: framework and i386

This is an implementation of kexec for dom0/xen, that allows
kexecing of the physical machine from xen. The approach taken is
to move the architecture-dependent kexec code into a new hypercall.

Some notes:
  * machine_kexec_cleanup() and machine_kexec_prepare() don't do
    anything in i386. So while this patch adds a framework for them,
    I am not sure what parameters are needed at this stage.
  * Only works for UP, as machine_shutdown is not implemented yet
  * kexecing into xen does not seem to work, I think that
    kexec-tools needs updating, but I have not investigated yet
  * Kdump works by first copying the kernel into dom0 segments
    and relocating them later in xen, the same way that kexec does
    The only difference is that the relocation is made into
    an area reserved by xen
  * Kdump reservation is made using the xen command line parameters,
    kdump_megabytes and kdump_megabytes_base, rather than
    the linux option crashkernel, which is now ignored.
    Two parameters are used instead of one to simplify parsing.
    This can be cleaned up later if desired. But the reservation
    seems to need to be made by xen to make sure that it happens
    early enough.
  * This patch uses a new kexec hypercall

Highlights since the previous posted version:

  * Use new kexec hypercall instead of dom0_op
    - hypercall table is expanded to 64 entries

Prepared by Horms and Magnus Damm

Signed-Off-By: Magnus Damm <magnus@valinux.co.jp>
Signed-Off-By: Horms <horms@verge.net.au>

 linux-2.6-xen-sparse/arch/i386/Kconfig                         |    2
 linux-2.6-xen-sparse/arch/i386/kernel/Makefile                 |    2
 linux-2.6-xen-sparse/arch/i386/kernel/setup-xen.c              |   24 +
 linux-2.6-xen-sparse/drivers/xen/core/Makefile                 |    1
 linux-2.6-xen-sparse/drivers/xen/core/crash.c                  |   98 +++++
 linux-2.6-xen-sparse/drivers/xen/core/machine_kexec.c          |   73 ++++
 linux-2.6-xen-sparse/drivers/xen/core/reboot.c                 |    7
 linux-2.6-xen-sparse/include/asm-i386/mach-xen/asm/hypercall.h |   10
 ref-linux-2.6.16/drivers/base/cpu.c                            |    4
 ref-linux-2.6.16/kernel/kexec.c                                |   52 ++
 xen/arch/x86/Makefile                                          |    1
 xen/arch/x86/dom0_ops.c                                        |    3
 xen/arch/x86/machine_kexec.c                                   |  174 ++++++++++
 xen/arch/x86/setup.c                                           |   75 +++-
 xen/arch/x86/x86_32/entry.S                                    |    2
 xen/common/Makefile                                            |    1
 xen/common/kexec.c                                             |   71 ++++
 xen/common/page_alloc.c                                        |   33 +
 xen/include/asm-x86/config.h                                   |    2
 xen/include/asm-x86/hypercall.h                                |    5
 xen/include/public/kexec.h                                     |   43 ++
 xen/include/public/xen.h                                       |    9
 xen/include/xen/mm.h                                           |    1
 23 files changed, 659 insertions(+), 34 deletions(-)

--- x/linux-2.6-xen-sparse/arch/i386/Kconfig
+++ x/linux-2.6-xen-sparse/arch/i386/Kconfig
@@ -726,7 +726,7 @@ source kernel/Kconfig.hz
 config KEXEC
 	bool "kexec system call (EXPERIMENTAL)"
-	depends on EXPERIMENTAL && !X86_XEN
+	depends on EXPERIMENTAL
 	help
 	  kexec is a system call that implements the ability to shutdown your
 	  current kernel, and to start another kernel. It is like a reboot
--- x/linux-2.6-xen-sparse/arch/i386/kernel/Makefile
+++ x/linux-2.6-xen-sparse/arch/i386/kernel/Makefile
@@ -89,7 +89,7 @@ include $(srctree)/scripts/Makefile.xen
 obj-y += fixup.o
 microcode-$(subst m,y,$(CONFIG_MICROCODE)) := microcode-xen.o
-n-obj-xen := i8259.o timers/ reboot.o smpboot.o trampoline.o
+n-obj-xen := i8259.o timers/ reboot.o smpboot.o trampoline.o machine_kexec.o crash.o
 obj-y := $(call filterxen, $(obj-y), $(n-obj-xen))
 obj-y := $(call cherrypickxen, $(obj-y))
--- x/linux-2.6-xen-sparse/arch/i386/kernel/setup-xen.c
+++ x/linux-2.6-xen-sparse/arch/i386/kernel/setup-xen.c
@@ -68,6 +68,10 @@
 #include "setup_arch_pre.h"
 #include <bios_ebda.h>
+#ifdef CONFIG_XEN
+#include <xen/interface/kexec.h>
+#endif
+
 /* Forward Declaration. */
 void __init find_max_pfn(void);
@@ -932,6 +936,7 @@ static void __init parse_cmdline_early (
 		 * after a kernel panic.
 		 */
 		else if (!memcmp(from, "crashkernel=", 12)) {
+#ifndef CONFIG_XEN
 			unsigned long size, base;
 			size = memparse(from+12, &from);
 			if (*from == '@') {
@@ -942,6 +947,10 @@ static void __init parse_cmdline_early (
 				crashk_res.start = base;
 				crashk_res.end = base + size - 1;
 			}
+#else
+			printk("Ignoring crashkernel command line, "
+			       "parameter will be supplied by xen\n");
+#endif
 		}
 #endif
 #ifdef CONFIG_PROC_VMCORE
@@ -1318,9 +1327,21 @@ void __init setup_bootmem_allocator(void
 	}
 #endif
 #ifdef CONFIG_KEXEC
+#ifndef CONFIG_XEN
 	if (crashk_res.start != crashk_res.end)
 		reserve_bootmem(crashk_res.start,
 			crashk_res.end - crashk_res.start + 1);
+#else
+	{
+		struct kexec_arg xen_kexec_arg;
+		BUG_ON(HYPERVISOR_kexec(KEXEC_CMD_reserve, &xen_kexec_arg));
+		if (xen_kexec_arg.u.reserve.size) {
+			crashk_res.start = xen_kexec_arg.u.reserve.start;
+			crashk_res.end = xen_kexec_arg.u.reserve.start +
+				xen_kexec_arg.u.reserve.size - 1;
+		}
+	}
+#endif
 #endif
 	if (!xen_feature(XENFEAT_auto_translated_physmap))
@@ -1395,6 +1416,9 @@ legacy_init_iomem_resources(struct resou
 		res->end = map[i].end - 1;
 		res->flags = IORESOURCE_MEM | IORESOURCE_BUSY;
 		request_resource(&iomem_resource, res);
+#ifdef CONFIG_KEXEC
+		request_resource(res, &crashk_res);
+#endif
 	}
 	free_bootmem(__pa(map), PAGE_SIZE);
--- x/linux-2.6-xen-sparse/drivers/xen/core/Makefile
+++ x/linux-2.6-xen-sparse/drivers/xen/core/Makefile
@@ -9,3 +9,4 @@ obj-$(CONFIG_NET) += skbuff.o
 obj-$(CONFIG_SMP) += smpboot.o
 obj-$(CONFIG_SYSFS) += hypervisor_sysfs.o
 obj-$(CONFIG_XEN_SYSFS) += xen_sysfs.o
+obj-$(CONFIG_KEXEC) += machine_kexec.o crash.o
--- /dev/null
+++ x/linux-2.6-xen-sparse/drivers/xen/core/crash.c
@@ -0,0 +1,98 @@
+/*
+ * Architecture specific (i386-xen) functions for kexec based crash dumps.
+ *
+ * Created by: Horms <horms@verge.net.au>
+ *
+ */
+
+#include <linux/kernel.h> /* For printk */
+
+/* XXX: final_note(), crash_save_this_cpu() and crash_save_self()
+ * are copied from arch/i386/kernel/crash.c, might be good to either
+ * the original functions non-static and use them, or just
+ * merge this this into that file.
+ */
+#include <linux/elf.h> /* For struct elf_note */
+#include <linux/elfcore.h> /* For struct elf_prstatus */
+#include <linux/kexec.h> /* crash_notes */
+
+static u32 *append_elf_note(u32 *buf, char *name, unsigned type, void *data,
+			    size_t data_len)
+{
+	struct elf_note note;
+
+	note.n_namesz = strlen(name) + 1;
+	note.n_descsz = data_len;
+	note.n_type = type;
+	memcpy(buf, &note, sizeof(note));
+	buf += (sizeof(note) +3)/4;
+	memcpy(buf, name, note.n_namesz);
+	buf += (note.n_namesz + 3)/4;
+	memcpy(buf, data, note.n_descsz);
+	buf += (note.n_descsz + 3)/4;
+
+	return buf;
+}
+
+static void final_note(u32 *buf)
+{
+	struct elf_note note;
+
+	note.n_namesz = 0;
+	note.n_descsz = 0;
+	note.n_type = 0;
+	memcpy(buf, &note, sizeof(note));
+}
+
+static void crash_save_this_cpu(struct pt_regs *regs, int cpu)
+{
+	struct elf_prstatus prstatus;
+	u32 *buf;
+
+	if ((cpu < 0) || (cpu >= NR_CPUS))
+		return;
+
+	/* Using ELF notes here is opportunistic.
+	 * I need a well defined structure format
+	 * for the data I pass, and I need tags
+	 * on the data to indicate what information I have
+	 * squirrelled away. ELF notes happen to provide
+	 * all of that that no need to invent something new.
+	 */
+	buf = (u32*)per_cpu_ptr(crash_notes, cpu);
+	if (!buf)
+		return;
+	memset(&prstatus, 0, sizeof(prstatus));
+	prstatus.pr_pid = current->pid;
+	elf_core_copy_regs(&prstatus.pr_reg, regs);
+	buf = append_elf_note(buf, "CORE", NT_PRSTATUS, &prstatus,
+			      sizeof(prstatus));
+	final_note(buf);
+}
+
+static void crash_save_self(struct pt_regs *regs)
+{
+	int cpu;
+
+	cpu = smp_processor_id();
+	crash_save_this_cpu(regs, cpu);
+}
+
+
+void machine_crash_shutdown(struct pt_regs *regs)
+{
+	/* XXX: This should do something */
+	printk("xen-kexec: Need to turn of other CPUS in "
+	       "machine_crash_shutdown()\n");
+	crash_save_self(regs);
+}
+
+/*
+ * Local variables:
+ * c-file-style: "linux"
+ * indent-tabs-mode: t
+ * c-indent-level: 8
+ * c-basic-offset: 8
+ * tab-width: 8
+ * End:
+ */
--- /dev/null
+++ x/linux-2.6-xen-sparse/drivers/xen/core/machine_kexec.c
@@ -0,0 +1,73 @@
+/*
+ * machine_kexec.c - handle transition of Linux booting another kernel
+ *
+ * Created By: Horms <horms@verge.net.au>
+ *
+ * Losely based on arch/i386/kernel/machine_kexec.c
+ */
+
+#include <linux/kexec.h>
+#include <xen/interface/kexec.h>
+#include <linux/mm.h>
+#include <asm/hypercall.h>
+
+const extern unsigned char relocate_new_kernel[];
+extern unsigned int relocate_new_kernel_size;
+
+/*
+ * A architecture hook called to validate the
+ * proposed image and prepare the control pages
+ * as needed. The pages for KEXEC_CONTROL_CODE_SIZE
+ * have been allocated, but the segments have yet
+ * been copied into the kernel.
+ *
+ * Do what every setup is needed on image and the
+ * reboot code buffer to allow us to avoid allocations
+ * later.
+ *
+ * Currently nothing.
+ */
+int machine_kexec_prepare(struct kimage *image)
+{
+	kexec_arg_t hypercall_arg;
+	hypercall_arg.u.helper.data = NULL;
+	return HYPERVISOR_kexec(KEXEC_CMD_kexec_prepare, &hypercall_arg);
+}
+
+/*
+ * Undo anything leftover by machine_kexec_prepare
+ * when an image is freed.
+ */
+void machine_kexec_cleanup(struct kimage *image)
+{
+	kexec_arg_t hypercall_arg;
+	hypercall_arg.u.helper.data = NULL;
+	HYPERVISOR_kexec(KEXEC_CMD_kexec_cleanup, &hypercall_arg);
+}
+
+/*
+ * Do not allocate memory (or fail in any way) in machine_kexec().
+ * We are past the point of no return, committed to rebooting now.
+ */
+NORET_TYPE void machine_kexec(struct kimage *image)
+{
+	kexec_arg_t hypercall_arg;
+	hypercall_arg.u.kexec.indirection_page = image->head;
+	hypercall_arg.u.kexec.reboot_code_buffer =
+		pfn_to_mfn(page_to_pfn(image->control_code_page)) << PAGE_SHIFT;
+	hypercall_arg.u.kexec.start_address = image->start;
+	hypercall_arg.u.kexec.relocate_new_kernel = relocate_new_kernel;
+	hypercall_arg.u.kexec.relocate_new_kernel_size =
+		relocate_new_kernel_size;
+	HYPERVISOR_kexec(KEXEC_CMD_kexec, &hypercall_arg);
+}
+
+/*
+ * Local variables:
+ * c-file-style: "linux"
+ * indent-tabs-mode: t
+ * c-indent-level: 8
+ * c-basic-offset: 8
+ * tab-width: 8
+ * End:
+ */
--- x/linux-2.6-xen-sparse/drivers/xen/core/reboot.c
+++ x/linux-2.6-xen-sparse/drivers/xen/core/reboot.c
@@ -370,6 +370,13 @@ static int __init setup_shutdown_event(v
 subsys_initcall(setup_shutdown_event);
+#ifdef CONFIG_KEXEC
+void machine_shutdown(void)
+{
+	printk("machine_shutdown: does nothing\n");
+}
+#endif
+
 /*
  * Local variables:
  * c-file-style: "linux"
--- x/linux-2.6-xen-sparse/include/asm-i386/mach-xen/asm/hypercall.h
+++ x/linux-2.6-xen-sparse/include/asm-i386/mach-xen/asm/hypercall.h
@@ -37,6 +37,8 @@
 # error "please don't include this file directly"
 #endif
+#include <xen/interface/kexec.h>
+
 #define __STR(x) #x
 #define STR(x) __STR(x)
@@ -343,6 +345,14 @@ HYPERVISOR_xenoprof_op(
 	return _hypercall2(int, xenoprof_op, op, arg);
 }
+static inline int
+HYPERVISOR_kexec(
+	unsigned long op, kexec_arg_t * arg)
+{
+	return _hypercall2(int, kexec_op, op, arg);
+}
+
+
 #endif /* __HYPERCALL_H__ */
--- x/ref-linux-2.6.16/drivers/base/cpu.c
+++ x/ref-linux-2.6.16/drivers/base/cpu.c
@@ -101,7 +101,11 @@ static ssize_t show_crash_notes(struct s
 	 * boot up and this data does not change there after. Hence this
 	 * operation should be safe. No locking required.
 	 */
+#ifndef CONFIG_XEN
 	addr = __pa(per_cpu_ptr(crash_notes, cpunum));
+#else
+	addr = virt_to_machine(per_cpu_ptr(crash_notes, cpunum));
+#endif
 	rc = sprintf(buf, "%Lx\n", addr);
 	return rc;
 }
--- x/ref-linux-2.6.16/kernel/kexec.c
+++ x/ref-linux-2.6.16/kernel/kexec.c
@@ -38,6 +38,20 @@ struct resource crashk_res = {
 	.flags = IORESOURCE_BUSY | IORESOURCE_MEM
 };
+/* Kexec needs to know about the actually physical addresss.
+ * But in xen, a physical address is a pseudo-physical addresss. */
+#ifndef CONFIG_XEN
+#define kexec_page_to_pfn(page) page_to_pfn(page)
+#define kexec_pfn_to_page(pfn) pfn_to_page(pfn)
+#define kexec_virt_to_phys(addr) virt_to_phys(addr)
+#define kexec_phys_to_virt(addr) phys_to_virt(addr)
+#else
+#define kexec_page_to_pfn(page) pfn_to_mfn(page_to_pfn(page))
+#define kexec_pfn_to_page(pfn) pfn_to_page(mfn_to_pfn(pfn))
+#define kexec_virt_to_phys(addr) virt_to_machine(addr)
+#define kexec_phys_to_virt(addr) phys_to_virt(machine_to_phys(addr))
+#endif
+
 int kexec_should_crash(struct task_struct *p)
 {
 	if (in_interrupt() || !p->pid || p->pid == 1 || panic_on_oops)
@@ -403,7 +417,7 @@ static struct page *kimage_alloc_normal_
 		pages = kimage_alloc_pages(GFP_KERNEL, order);
 		if (!pages)
 			break;
-		pfn = page_to_pfn(pages);
+		pfn = kexec_page_to_pfn(pages);
 		epfn = pfn + count;
 		addr = pfn << PAGE_SHIFT;
 		eaddr = epfn << PAGE_SHIFT;
@@ -437,6 +451,7 @@ static struct page *kimage_alloc_normal_
 	return pages;
 }
+#ifndef CONFIG_XEN
 static struct page *kimage_alloc_crash_control_pages(struct kimage *image,
 						      unsigned int order)
 {
@@ -490,7 +505,7 @@ static struct page *kimage_alloc_crash_c
 		}
 		/* If I don't overlap any segments I have found my hole! */
 		if (i == image->nr_segments) {
-			pages = pfn_to_page(hole_start >> PAGE_SHIFT);
+			pages = kexec_pfn_to_page(hole_start >> PAGE_SHIFT);
 			break;
 		}
 	}
@@ -517,6 +532,13 @@ struct page *kimage_alloc_control_pages(
 	return pages;
 }
+#else /* !CONFIG_XEN */
+struct page *kimage_alloc_control_pages(struct kimage *image,
+					 unsigned int order)
+{
+	return kimage_alloc_normal_control_pages(image, order);
+}
+#endif
 static int kimage_add_entry(struct kimage *image, kimage_entry_t entry)
 {
@@ -532,7 +554,7 @@ static int kimage_add_entry(struct kimag
 			return -ENOMEM;
 		ind_page = page_address(page);
-		*image->entry = virt_to_phys(ind_page) | IND_INDIRECTION;
+		*image->entry = kexec_virt_to_phys(ind_page) | IND_INDIRECTION;
 		image->entry = ind_page;
 		image->last_entry = ind_page +
 				      ((PAGE_SIZE/sizeof(kimage_entry_t)) - 1);
@@ -593,13 +615,13 @@ static int kimage_terminate(struct kimag
 #define for_each_kimage_entry(image, ptr, entry) \
 	for (ptr = &image->head; (entry = *ptr) && !(entry & IND_DONE); \
 		ptr = (entry & IND_INDIRECTION)? \
-			phys_to_virt((entry & PAGE_MASK)): ptr +1)
+			kexec_phys_to_virt((entry & PAGE_MASK)): ptr +1)
 static void kimage_free_entry(kimage_entry_t entry)
 {
 	struct page *page;
-	page = pfn_to_page(entry >> PAGE_SHIFT);
+	page = kexec_pfn_to_page(entry >> PAGE_SHIFT);
 	kimage_free_pages(page);
 }
@@ -686,7 +708,7 @@ static struct page *kimage_alloc_page(st
 	 * have a match.
 	 */
 	list_for_each_entry(page, &image->dest_pages, lru) {
-		addr = page_to_pfn(page) << PAGE_SHIFT;
+		addr = kexec_page_to_pfn(page) << PAGE_SHIFT;
 		if (addr == destination) {
 			list_del(&page->lru);
 			return page;
@@ -701,12 +723,12 @@ static struct page *kimage_alloc_page(st
 		if (!page)
 			return NULL;
 		/* If the page cannot be used file it away */
-		if (page_to_pfn(page) >
+		if (kexec_page_to_pfn(page) >
 				(KEXEC_SOURCE_MEMORY_LIMIT >> PAGE_SHIFT)) {
 			list_add(&page->lru, &image->unuseable_pages);
 			continue;
 		}
-		addr = page_to_pfn(page) << PAGE_SHIFT;
+		addr = kexec_page_to_pfn(page) << PAGE_SHIFT;
 		/* If it is the destination page we want use it */
 		if (addr == destination)
@@ -729,7 +751,7 @@ static struct page *kimage_alloc_page(st
 			struct page *old_page;
 			old_addr = *old & PAGE_MASK;
-			old_page = pfn_to_page(old_addr >> PAGE_SHIFT);
+			old_page = kexec_pfn_to_page(old_addr >> PAGE_SHIFT);
 			copy_highpage(page, old_page);
 			*old = addr | (*old & ~PAGE_MASK);
@@ -779,7 +801,7 @@ static int kimage_load_normal_segment(st
 			result = -ENOMEM;
 			goto out;
 		}
-		result = kimage_add_page(image, page_to_pfn(page)
+		result = kimage_add_page(image, kexec_page_to_pfn(page)
 								<< PAGE_SHIFT);
 		if (result < 0)
 			goto out;
@@ -811,6 +833,7 @@ out:
 	return result;
 }
+#ifndef CONFIG_XEN
 static int kimage_load_crash_segment(struct kimage *image,
 					struct kexec_segment *segment)
 {
@@ -833,7 +856,7 @@ static int kimage_load_crash_segment(str
 		char *ptr;
 		size_t uchunk, mchunk;
-		page = pfn_to_page(maddr >> PAGE_SHIFT);
+		page = kexec_pfn_to_page(maddr >> PAGE_SHIFT);
 		if (page == 0) {
 			result = -ENOMEM;
 			goto out;
@@ -881,6 +904,13 @@ static int kimage_load_segment(struct ki
 	return result;
 }
+#else /* CONFIG_XEN */
+static int kimage_load_segment(struct kimage *image,
+				struct kexec_segment *segment)
+{
+	return kimage_load_normal_segment(image, segment);
+}
+#endif
 /*
  * Exec Kernel system call: for obvious reasons only root may call it.
--- x/xen/arch/x86/Makefile
+++ x/xen/arch/x86/Makefile
@@ -38,6 +38,7 @@ obj-y += trampoline.o
 obj-y += traps.o
 obj-y += usercopy.o
 obj-y += x86_emulate.o
+obj-y += machine_kexec.o
 ifneq ($(pae),n)
 obj-$(x86_32) += shadow.o shadow_public.o shadow_guest32.o
--- x/xen/arch/x86/dom0_ops.c
+++ x/xen/arch/x86/dom0_ops.c
@@ -29,6 +29,9 @@
 #include <asm/mtrr.h>
 #include "cpu/mtrr/mtrr.h"
+extern unsigned int opt_kdump_megabytes;
+extern unsigned int opt_kdump_megabytes_base;
+
 #define TRC_DOM0OP_ENTER_BASE 0x00020000
 #define TRC_DOM0OP_LEAVE_BASE 0x00030000
--- /dev/null
+++ x/xen/arch/x86/machine_kexec.c
@@ -0,0 +1,174 @@
+/******************************************************************************
+ * arch/x86/machine_kexec.c
+ *
+ * Created By: Horms
+ *
+ * Based heavily on arch/i386/machine_kexec.c from Linux 2.6.16
+ */
+
+#include <xen/config.h>
+#include <xen/types.h>
+#include <xen/domain_page.h>
+#include <xen/timer.h>
+#include <xen/sched.h>
+#include <asm/page.h>
+#include <asm/flushtlb.h>
+#include <public/xen.h>
+#include <public/kexec.h>
+
+#ifdef CONFIG_X86_32
+
+typedef asmlinkage void (*relocate_new_kernel_t)(
+    unsigned long indirection_page,
+    unsigned long reboot_code_buffer,
+    unsigned long start_address,
+    unsigned int has_pae);
+
+#define PAGE_ALIGNED __attribute__ ((__aligned__(PAGE_SIZE)))
+
+#define L0_ATTR (_PAGE_PRESENT | _PAGE_RW | _PAGE_ACCESSED | _PAGE_DIRTY)
+#define L1_ATTR (_PAGE_PRESENT | _PAGE_RW | _PAGE_ACCESSED | _PAGE_DIRTY)
+#define L2_ATTR (_PAGE_PRESENT)
+
+#ifndef CONFIG_X86_PAE
+
+static u32 pgtable_level1[L1_PAGETABLE_ENTRIES] PAGE_ALIGNED;
+
+static void identity_map_page(unsigned long address)
+{
+    unsigned long mfn;
+    u32 *pgtable_level2;
+
+    /* Find the current page table */
+    mfn = read_cr3() >> PAGE_SHIFT;
+    pgtable_level2 = map_domain_page(mfn);
+
+    /* Identity map the page table entry */
+    pgtable_level1[l1_table_offset(address)] = address | L0_ATTR;
+    pgtable_level2[l2_table_offset(address)] = __pa(pgtable_level1) | L1_ATTR;
+
+    /* Flush the tlb so the new mapping takes effect.
+     * Global tlb entries are not flushed but that is not an issue.
+     */
+    write_cr3(mfn << PAGE_SHIFT);
+
+    unmap_domain_page(pgtable_level2);
+}
+
+#else
+static u64 pgtable_level1[L1_PAGETABLE_ENTRIES] PAGE_ALIGNED;
+static u64 pgtable_level2[L2_PAGETABLE_ENTRIES] PAGE_ALIGNED;
+
+static void identity_map_page(unsigned long address)
+{
+    int mfn;
+    intpte_t *pgtable_level3;
+
+    /* Find the current page table */
+    mfn = read_cr3() >> PAGE_SHIFT;
+    pgtable_level3 = map_domain_page(mfn);
+
+    /* Identity map the page table entry */
+    pgtable_level1[l1_table_offset(address)] = address | L0_ATTR;
+    pgtable_level2[l2_table_offset(address)] = __pa(pgtable_level1) | L1_ATTR;
+    set_64bit(&pgtable_level3[l3_table_offset(address)],
+              __pa(pgtable_level2) | L2_ATTR);
+
+    /* Flush the tlb so the new mapping takes effect.
+     * Global tlb entries are not flushed but that is not an issue.
+     */
+    load_cr3(mfn << PAGE_SHIFT);
+
+    unmap_domain_page(pgtable_level3);
+}
+#endif
+
+static void kexec_load_segments(void)
+{
+#define __SSTR(X) #X
+#define SSTR(X) __SSTR(X)
+    __asm__ __volatile__ (
+        "\tljmp $"SSTR(__HYPERVISOR_CS)",$1f\n"
+        "\t1:\n"
+        "\tmovl $"SSTR(__HYPERVISOR_DS)",%%eax\n"
+        "\tmovl %%eax,%%ds\n"
+        "\tmovl %%eax,%%es\n"
+        "\tmovl %%eax,%%fs\n"
+        "\tmovl %%eax,%%gs\n"
+        "\tmovl %%eax,%%ss\n"
+        ::: "eax", "memory");
+#undef SSTR
+#undef __SSTR
+}
+
+#define kexec_load_idt(dtr) __asm__ __volatile("lidt %0"::"m" (*dtr))
+static void kexec_set_idt(void *newidt, __u16 limit)
+{
+    struct Xgt_desc_struct curidt;
+
+    /* ia32 supports unaliged loads & stores */
+    curidt.size = limit;
+    curidt.address = (unsigned long)newidt;
+
+    kexec_load_idt(&curidt);
+
+};
+
+#define kexec_load_gdt(dtr) __asm__ __volatile("lgdt %0"::"m" (*dtr))
+static void kexec_set_gdt(void *newgdt, __u16 limit)
+{
+    struct Xgt_desc_struct curgdt;
+
+    /* ia32 supports unaligned loads & stores */
+    curgdt.size = limit;
+    curgdt.address = (unsigned long)newgdt;
+
+    kexec_load_gdt(&curgdt);
+};
+
+#endif
+
+int machine_kexec_prepare(struct kexec_arg *arg)
+{
+    return 0;
+}
+
+void machine_kexec_cleanup(struct kexec_arg *arg)
+{
+}
+
+void machine_kexec(struct kexec_arg *arg)
+{
+#ifdef CONFIG_X86_32
+    relocate_new_kernel_t rnk;
+
+    local_irq_disable();
+
+    identity_map_page(arg->u.kexec.reboot_code_buffer);
+
+    copy_from_user((void *)arg->u.kexec.reboot_code_buffer,
+                   arg->u.kexec.relocate_new_kernel,
+                   arg->u.kexec.relocate_new_kernel_size);
+
+    kexec_load_segments();
+
+    kexec_set_gdt(__va(0),0);
+
+    kexec_set_idt(__va(0),0);
+
+    rnk = (relocate_new_kernel_t) arg->u.kexec.reboot_code_buffer;
+
+    (*rnk)(arg->u.kexec.indirection_page, arg->u.kexec.reboot_code_buffer,
+           arg->u.kexec.start_address, cpu_has_pae);
+#endif
+}
+
+/*
+ * Local variables:
+ * mode: C
+ * c-set-style: "BSD"
+ * c-basic-offset: 4
+ * tab-width: 4
+ * indent-tabs-mode: nil
+ * End:
+ */
--- x/xen/arch/x86/setup.c
+++ x/xen/arch/x86/setup.c
@@ -38,6 +38,11 @@ static unsigned int opt_xenheap_megabyte
 integer_param("xenheap_megabytes", opt_xenheap_megabytes);
 #endif
+unsigned int opt_kdump_megabytes = 0;
+integer_param("kdump_megabytes", opt_kdump_megabytes);
+unsigned int opt_kdump_megabytes_base = 0;
+integer_param("kdump_megabytes_base", opt_kdump_megabytes_base);
+
 /* opt_nosmp: If true, secondary processors are ignored. */
 static int opt_nosmp = 0;
 boolean_param("nosmp", opt_nosmp);
@@ -192,6 +197,20 @@ static void percpu_free_unused_areas(voi
                        __pa(__per_cpu_end));
 }
+void __init move_memory(unsigned long dst,
+                        unsigned long src_start, unsigned long src_end)
+{
+#if defined(CONFIG_X86_32)
+    memmove((void *)dst, /* use low mapping */
+            (void *)src_start, /* use low mapping */
+            src_end - src_start);
+#elif defined(CONFIG_X86_64)
+    memmove(__va(dst),
+            __va(src_start),
+            src_end - src_start);
+#endif
+}
+
 void __init __start_xen(multiboot_info_t *mbi)
 {
     char __cmdline[] = "", *cmdline = __cmdline;
@@ -327,15 +346,8 @@ void __init __start_xen(multiboot_info_t
     initial_images_start = xenheap_phys_end;
     initial_images_end = initial_images_start + modules_length;
-#if defined(CONFIG_X86_32)
-    memmove((void *)initial_images_start, /* use low mapping */
-            (void *)mod[0].mod_start, /* use low mapping */
-            mod[mbi->mods_count-1].mod_end - mod[0].mod_start);
-#elif defined(CONFIG_X86_64)
-    memmove(__va(initial_images_start),
-            __va(mod[0].mod_start),
-            mod[mbi->mods_count-1].mod_end - mod[0].mod_start);
-#endif
+    move_memory(initial_images_start,
+                mod[0].mod_start, mod[mbi->mods_count-1].mod_end);
     /* Initialise boot-time allocator with all RAM situated after modules. */
     xenheap_phys_start = init_boot_allocator(__pa(&_end));
@@ -383,6 +395,51 @@ void __init __start_xen(multiboot_info_t
 #endif
     }
+    if (opt_kdump_megabytes) {
+        unsigned long kdump_start, kdump_size, k;
+
+        /* mark images pages as free for now */
+
+        init_boot_pages(initial_images_start, initial_images_end);
+
+        kdump_start = opt_kdump_megabytes_base << 20;
+        kdump_size = opt_kdump_megabytes << 20;
+
+        printk("Kdump: %luMB (%lukB) at 0x%lx\n",
+               kdump_size >> 20,
+               kdump_size >> 10,
+               kdump_start);
+
+        if ((kdump_start & ~PAGE_MASK) || (kdump_size & ~PAGE_MASK))
+            panic("Kdump parameters not page aligned\n");
+
+        kdump_start >>= PAGE_SHIFT;
+        kdump_size >>= PAGE_SHIFT;
+
+        /* allocate pages for Kdump memory area */
+
+        k = alloc_boot_pages_at(kdump_size, kdump_start);
+
+        if (k != kdump_start)
+            panic("Unable to reserve Kdump memory\n");
+
+        /* allocate pages for relocated initial images */
+
+        k = ((initial_images_end - initial_images_start) & ~PAGE_MASK) ? 1 : 0;
+        k += (initial_images_end - initial_images_start) >> PAGE_SHIFT;
+
+        k = alloc_boot_pages(k, 1);
+
+        if (!k)
+            panic("Unable to allocate initial images memory\n");
+
+        move_memory(k << PAGE_SHIFT, initial_images_start, initial_images_end);
+
+        initial_images_end -= initial_images_start;
+        initial_images_start = k << PAGE_SHIFT;
+        initial_images_end += initial_images_start;
+    }
+
     memguard_init();
     printk("System RAM: %luMB (%lukB)\n",
--- x/xen/arch/x86/x86_32/entry.S
+++ x/xen/arch/x86/x86_32/entry.S
@@ -646,6 +646,7 @@ ENTRY(hypercall_table)
         .long do_arch_sched_op
         .long do_callback_op /* 30 */
         .long do_xenoprof_op
+        .long do_kexec
         .rept NR_hypercalls-((.-hypercall_table)/4)
         .long do_ni_hypercall
         .endr
@@ -683,6 +684,7 @@ ENTRY(hypercall_args_table)
         .byte 2 /* do_arch_sched_op */
         .byte 2 /* do_callback_op */ /* 30 */
         .byte 2 /* do_xenoprof_op */
+        .byte 2 /* do_kexec */
         .rept NR_hypercalls-(.-hypercall_args_table)
         .byte 0 /* do_ni_hypercall */
         .endr
--- x/xen/common/Makefile
+++ x/xen/common/Makefile
@@
-7,6 +7,7 @@ obj-y += event_channel.o obj-y += grant_table.o obj-y += kernel.o obj-y += keyhandler.o +obj-y += kexec.o obj-y += lib.o obj-y += memory.o obj-y += multicall.o --- /dev/null +++ x/xen/common/kexec.c @@ -0,0 +1,71 @@ +/* + * Achitecture independent kexec code for Xen + * + * At this statge, just a switch for the kexec hypercall into + * architecture dependent code. + * + * Created By: Horms <horms@verge.net.au> + */ + +#include <xen/lib.h> +#include <xen/errno.h> +#include <xen/guest_access.h> +#include <xen/sched.h> +#include <public/xen.h> +#include <public/kexec.h> + +extern int machine_kexec_prepare(struct kexec_arg *arg); +extern void machine_kexec_cleanup(struct kexec_arg *arg); +extern void machine_kexec(struct kexec_arg *arg); + +extern unsigned int opt_kdump_megabytes; +extern unsigned int opt_kdump_megabytes_base; + +int do_kexec(unsigned long op, + GUEST_HANDLE(kexec_arg_t) uarg) +{ + struct kexec_arg arg; + + if ( !IS_PRIV(current->domain) ) + return -EPERM; + + if ( op != KEXEC_CMD_reserve && + unlikely(copy_from_guest(&arg, uarg, 1) != 0) ) + { + printk("do_kexec: __copy_from_guest failed"); + return -EFAULT; + } + + switch(op) { + case KEXEC_CMD_kexec: + machine_kexec(&arg); + return -EINVAL; /* Not Reached */ + case KEXEC_CMD_kexec_prepare: + return machine_kexec_prepare(&arg); + case KEXEC_CMD_kexec_cleanup: + machine_kexec_cleanup(&arg); + return 0; + case KEXEC_CMD_reserve: + arg.u.reserve.size = opt_kdump_megabytes << 20; + arg.u.reserve.start = opt_kdump_megabytes_base << 20; + if ( unlikely(copy_to_guest(uarg, &arg, 1) != 0) ) + { + printk("do_kexec: copy_to_guest failed"); + return -EFAULT; + } + return 0; + } + + return -EINVAL; +} + +/* + * Local variables: + * mode: C + * c-set-style: "BSD" + * c-basic-offset: 4 + * tab-width: 4 + * indent-tabs-mode: nil + * End: + */ + --- x/xen/common/page_alloc.c +++ x/xen/common/page_alloc.c @@ -212,24 +212,35 @@ void init_boot_pages(paddr_t ps, paddr_t } } +unsigned long 
alloc_boot_pages_at(unsigned long nr_pfns, unsigned long pfn_at) +{ + unsigned long i; + + for ( i = 0; i < nr_pfns; i++ ) + if ( allocated_in_map(pfn_at + i) ) + break; + + if ( i == nr_pfns ) + { + map_alloc(pfn_at, nr_pfns); + return pfn_at; + } + + return 0; +} + unsigned long alloc_boot_pages(unsigned long nr_pfns, unsigned long pfn_align) { - unsigned long pg, i; + unsigned long pg, i = 0; for ( pg = 0; (pg + nr_pfns) < max_page; pg += pfn_align ) { - for ( i = 0; i < nr_pfns; i++ ) - if ( allocated_in_map(pg + i) ) - break; - - if ( i == nr_pfns ) - { - map_alloc(pg, nr_pfns); - return pg; - } + i = alloc_boot_pages_at(nr_pfns, pg); + if (i != 0) + break; } - return 0; + return i; } --- x/xen/include/asm-x86/config.h +++ x/xen/include/asm-x86/config.h @@ -66,7 +66,7 @@ #define barrier() __asm__ __volatile__("": : :"memory") /* A power-of-two value greater than or equal to number of hypercalls. */ -#define NR_hypercalls 32 +#define NR_hypercalls 64 #if NR_hypercalls & (NR_hypercalls - 1) #error "NR_hypercalls must be a power-of-two value" --- x/xen/include/asm-x86/hypercall.h +++ x/xen/include/asm-x86/hypercall.h @@ -6,6 +6,7 @@ #define __ASM_X86_HYPERCALL_H__ #include <public/physdev.h> +#include <public/kexec.h> extern long do_set_trap_table( @@ -79,6 +80,10 @@ extern long arch_do_vcpu_op( int cmd, struct vcpu *v, GUEST_HANDLE(void) arg); +extern int +do_kexec( + unsigned long op, GUEST_HANDLE(kexec_arg_t) uarg); + #ifdef __x86_64__ extern long --- /dev/null +++ x/xen/include/public/kexec.h @@ -0,0 +1,43 @@ +/* + * kexec.h: Xen kexec + * + * Created By: Horms <horms@verge.net.au> + */ + +#ifndef _XEN_PUBLIC_KEXEC_H +#define _XEN_PUBLIC_KEXEC_H + +/* + * Scratch space for passing arguments to the kexec hypercall + */ +typedef struct kexec_arg { + union { + struct { + unsigned long data; /* Not sure what this should be yet */ + } helper; + struct { + unsigned long indirection_page; + unsigned long reboot_code_buffer; + unsigned long start_address; + const 
char *relocate_new_kernel; + unsigned int relocate_new_kernel_size; + } kexec; + struct { + unsigned long size; + unsigned long start; + } reserve; + } u; +} kexec_arg_t; +DEFINE_GUEST_HANDLE(kexec_arg_t); + +#endif + +/* + * Local variables: + * mode: C + * c-set-style: "BSD" + * c-basic-offset: 4 + * tab-width: 4 + * indent-tabs-mode: nil + * End: + */ --- x/xen/include/public/xen.h +++ x/xen/include/public/xen.h @@ -62,6 +62,7 @@ #define __HYPERVISOR_sched_op 29 #define __HYPERVISOR_callback_op 30 #define __HYPERVISOR_xenoprof_op 31 +#define __HYPERVISOR_kexec_op 32 /* * VIRTUAL INTERRUPTS @@ -215,6 +216,14 @@ DEFINE_GUEST_HANDLE(mmuext_op_t); #define VMASST_TYPE_writable_pagetables 2 #define MAX_VMASST_TYPE 2 +/* + * Operations for kexec. + */ +#define KEXEC_CMD_kexec 0 +#define KEXEC_CMD_kexec_prepare 1 +#define KEXEC_CMD_kexec_cleanup 2 +#define KEXEC_CMD_reserve 3 + #ifndef __ASSEMBLY__ typedef uint16_t domid_t; --- x/xen/include/xen/mm.h +++ x/xen/include/xen/mm.h @@ -40,6 +40,7 @@ struct page_info; paddr_t init_boot_allocator(paddr_t bitmap_start); void init_boot_pages(paddr_t ps, paddr_t pe); unsigned long alloc_boot_pages(unsigned long nr_pfns, unsigned long pfn_align); +unsigned long alloc_boot_pages_at(unsigned long nr_pfns, unsigned long pfn_at); void end_boot_allocator(void); /* Generic allocator. These functions are *not* interrupt-safe. */
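A note on the kdump reservation arithmetic in the setup.c hunk of the patch above: the two megabyte options are converted to bytes with `<< 20`, values that are not page aligned are rejected, and the length of the initial images is rounded up to whole pages before they are relocated. The following is only a standalone sketch of that arithmetic, assuming 4 kB pages. The `sketch_` names and `SKETCH_` macros are hypothetical and are not part of Xen or of the patch.

```c
/*
 * Standalone sketch of the size arithmetic used for the kdump
 * reservation. Assumption: 4 kB pages. All names here are
 * illustrative only and do not exist in Xen.
 */
#define SKETCH_PAGE_SHIFT 12
#define SKETCH_PAGE_SIZE  (1UL << SKETCH_PAGE_SHIFT)
#define SKETCH_PAGE_MASK  (~(SKETCH_PAGE_SIZE - 1))

/* Convert a megabyte count to bytes, mirroring the "<< 20" in the patch.
 * The cast widens the value before shifting, which the patch does not do. */
unsigned long sketch_mb_to_bytes(unsigned int mb)
{
    return (unsigned long)mb << 20;
}

/* Return non-zero if the byte value lies on a page boundary. */
int sketch_is_page_aligned(unsigned long bytes)
{
    return (bytes & ~SKETCH_PAGE_MASK) == 0;
}

/* Number of pages needed to hold len bytes, rounding a partial page up. */
unsigned long sketch_pages_for(unsigned long len)
{
    unsigned long pages = len >> SKETCH_PAGE_SHIFT;

    if (len & ~SKETCH_PAGE_MASK)
        pages++;
    return pages;
}
```

Since both options are given in megabytes, the alignment check in the patch can presumably only trigger if the shift overflows; the sketch keeps the check separate so that it can be exercised with arbitrary byte values.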
Simon Horman [Horms]
2006-May-02 08:17 UTC
[Xen-devel] [PATCH]: kexec: framework and i386 (Take VI)
Hi,

I will be out of the office until next Monday, so here is the latest
and greatest before I go. Tested against 9896; it should also work
fine with tip (9903).

-- 
Horms                                           http://www.vergenet.net/~horms/

kexec: framework and i386

This is an implementation of kexec for dom0/xen that allows kexecing
the physical machine from xen. The approach taken is to move the
architecture-dependent kexec code into a new hypercall.

Some notes:

* machine_kexec_cleanup() and machine_kexec_prepare() don't do
  anything on i386. So while this patch adds a framework for them,
  I am not sure what parameters are needed at this stage.
* Only works for UP, as machine_shutdown is not implemented yet.
* kexecing into xen does not seem to work. I think that kexec-tools
  needs updating, but I have not investigated yet.
* Kdump works by first copying the kernel into dom0 segments and
  relocating them later in xen, the same way that kexec does.
  The only difference is that the relocation is made into an area
  reserved by xen.
* Kdump reservation is made using the xen command line parameters
  kdump_megabytes and kdump_megabytes_base, rather than the linux
  option crashkernel, which is now ignored. Two parameters are used
  instead of one to simplify parsing; this can be cleaned up later if
  desired. The reservation seems to need to be made by xen to make
  sure that it happens early enough.
* This patch uses a new kexec hypercall.

Highlights since the previously posted version:

* SMP kexec (not kdump yet)
* Split x86_32 specific xen code out

Prepared by Horms and Magnus Damm

Signed-Off-By: Magnus Damm <magnus@valinux.co.jp>
Signed-Off-By: Horms <horms@verge.net.au>

 linux-2.6-xen-sparse/arch/i386/Kconfig                         |    2
 linux-2.6-xen-sparse/arch/i386/kernel/Makefile                 |    2
 linux-2.6-xen-sparse/arch/i386/kernel/setup-xen.c              |   24 +
 linux-2.6-xen-sparse/drivers/xen/core/Makefile                 |    1
 linux-2.6-xen-sparse/drivers/xen/core/crash.c                  |   98 ++++
 linux-2.6-xen-sparse/drivers/xen/core/machine_kexec.c          |   73 +++
 linux-2.6-xen-sparse/include/asm-i386/mach-xen/asm/hypercall.h |   10
 ref-linux-2.6.16/drivers/base/cpu.c                            |    4
 ref-linux-2.6.16/kernel/kexec.c                                |   52 +-
 xen/arch/x86/Makefile                                          |    1
 xen/arch/x86/dom0_ops.c                                        |    3
 xen/arch/x86/machine_kexec.c                                   |   27 +
 xen/arch/x86/setup.c                                           |   75 +++
 xen/arch/x86/x86_32/Makefile                                   |    1
 xen/arch/x86/x86_32/entry.S                                    |    2
 xen/arch/x86/x86_32/machine_kexec.c                            |  206 ++++++++++
 xen/arch/x86/x86_64/Makefile                                   |    1
 xen/arch/x86/x86_64/machine_kexec.c                            |   24 +
 xen/common/Makefile                                            |    1
 xen/common/kexec.c                                             |   73 +++
 xen/common/page_alloc.c                                        |   33 +
 xen/include/asm-x86/hypercall.h                                |    5
 xen/include/public/kexec.h                                     |   46 ++
 xen/include/public/xen.h                                       |    9
 xen/include/xen/mm.h                                           |    1
 25 files changed, 741 insertions(+), 33 deletions(-)

--- x/linux-2.6-xen-sparse/arch/i386/Kconfig +++ x/linux-2.6-xen-sparse/arch/i386/Kconfig @@ -726,7 +726,7 @@ source kernel/Kconfig.hz config KEXEC bool "kexec system call (EXPERIMENTAL)" - depends on EXPERIMENTAL && !X86_XEN + depends on EXPERIMENTAL help kexec is a system call that implements the ability to shutdown your current kernel, and to start another kernel.
It is like a reboot --- x/linux-2.6-xen-sparse/arch/i386/kernel/Makefile +++ x/linux-2.6-xen-sparse/arch/i386/kernel/Makefile @@ -89,7 +89,7 @@ include $(srctree)/scripts/Makefile.xen obj-y += fixup.o microcode-$(subst m,y,$(CONFIG_MICROCODE)) := microcode-xen.o -n-obj-xen := i8259.o timers/ reboot.o smpboot.o trampoline.o +n-obj-xen := i8259.o timers/ reboot.o smpboot.o trampoline.o machine_kexec.o crash.o obj-y := $(call filterxen, $(obj-y), $(n-obj-xen)) obj-y := $(call cherrypickxen, $(obj-y)) --- x/linux-2.6-xen-sparse/arch/i386/kernel/setup-xen.c +++ x/linux-2.6-xen-sparse/arch/i386/kernel/setup-xen.c @@ -68,6 +68,10 @@ #include "setup_arch_pre.h" #include <bios_ebda.h> +#ifdef CONFIG_XEN +#include <xen/interface/kexec.h> +#endif + /* Forward Declaration. */ void __init find_max_pfn(void); @@ -932,6 +936,7 @@ static void __init parse_cmdline_early ( * after a kernel panic. */ else if (!memcmp(from, "crashkernel=", 12)) { +#ifndef CONFIG_XEN unsigned long size, base; size = memparse(from+12, &from); if (*from == '@') { @@ -942,6 +947,10 @@ static void __init parse_cmdline_early ( crashk_res.start = base; crashk_res.end = base + size - 1; } +#else + printk("Ignoring crashkernel command line, " + "parameter will be supplied by xen\n"); +#endif } #endif #ifdef CONFIG_PROC_VMCORE @@ -1318,9 +1327,21 @@ void __init setup_bootmem_allocator(void } #endif #ifdef CONFIG_KEXEC +#ifndef CONFIG_XEN if (crashk_res.start != crashk_res.end) reserve_bootmem(crashk_res.start, crashk_res.end - crashk_res.start + 1); +#else + { + struct kexec_arg xen_kexec_arg; + BUG_ON(HYPERVISOR_kexec(KEXEC_CMD_reserve, &xen_kexec_arg)); + if (xen_kexec_arg.u.reserve.size) { + crashk_res.start = xen_kexec_arg.u.reserve.start; + crashk_res.end = xen_kexec_arg.u.reserve.start + + xen_kexec_arg.u.reserve.size - 1; + } + } +#endif #endif if (!xen_feature(XENFEAT_auto_translated_physmap)) @@ -1395,6 +1416,9 @@ legacy_init_iomem_resources(struct resou res->end = map[i].end - 1; res->flags =
IORESOURCE_MEM | IORESOURCE_BUSY; request_resource(&iomem_resource, res); +#ifdef CONFIG_KEXEC + request_resource(res, &crashk_res); +#endif } free_bootmem(__pa(map), PAGE_SIZE); --- x/linux-2.6-xen-sparse/drivers/xen/core/Makefile +++ x/linux-2.6-xen-sparse/drivers/xen/core/Makefile @@ -9,3 +9,4 @@ obj-$(CONFIG_NET) += skbuff.o obj-$(CONFIG_SMP) += smpboot.o obj-$(CONFIG_SYSFS) += hypervisor_sysfs.o obj-$(CONFIG_XEN_SYSFS) += xen_sysfs.o +obj-$(CONFIG_KEXEC) += machine_kexec.o crash.o --- /dev/null +++ x/linux-2.6-xen-sparse/drivers/xen/core/crash.c @@ -0,0 +1,98 @@ +/* + * Architecture specific (i386-xen) functions for kexec based crash dumps. + * + * Created by: Horms <horms@verge.net.au> + * + */ + +#include <linux/kernel.h> /* For printk */ + +/* XXX: final_note(), crash_save_this_cpu() and crash_save_self() + * are copied from arch/i386/kernel/crash.c, might be good to either + * the original functions non-static and use them, or just + * merge this this into that file. + */ +#include <linux/elf.h> /* For struct elf_note */ +#include <linux/elfcore.h> /* For struct elf_prstatus */ +#include <linux/kexec.h> /* crash_notes */ + +static u32 *append_elf_note(u32 *buf, char *name, unsigned type, void *data, + size_t data_len) +{ + struct elf_note note; + + note.n_namesz = strlen(name) + 1; + note.n_descsz = data_len; + note.n_type = type; + memcpy(buf, &note, sizeof(note)); + buf += (sizeof(note) +3)/4; + memcpy(buf, name, note.n_namesz); + buf += (note.n_namesz + 3)/4; + memcpy(buf, data, note.n_descsz); + buf += (note.n_descsz + 3)/4; + + return buf; +} + +static void final_note(u32 *buf) +{ + struct elf_note note; + + note.n_namesz = 0; + note.n_descsz = 0; + note.n_type = 0; + memcpy(buf, &note, sizeof(note)); +} + +static void crash_save_this_cpu(struct pt_regs *regs, int cpu) +{ + struct elf_prstatus prstatus; + u32 *buf; + + if ((cpu < 0) || (cpu >= NR_CPUS)) + return; + + /* Using ELF notes here is opportunistic.
+ * I need a well defined structure format + * for the data I pass, and I need tags + * on the data to indicate what information I have + * squirrelled away. ELF notes happen to provide + * all of that that no need to invent something new. + */ + buf = (u32*)per_cpu_ptr(crash_notes, cpu); + if (!buf) + return; + memset(&prstatus, 0, sizeof(prstatus)); + prstatus.pr_pid = current->pid; + elf_core_copy_regs(&prstatus.pr_reg, regs); + buf = append_elf_note(buf, "CORE", NT_PRSTATUS, &prstatus, + sizeof(prstatus)); + final_note(buf); +} + +static void crash_save_self(struct pt_regs *regs) +{ + int cpu; + + cpu = smp_processor_id(); + crash_save_this_cpu(regs, cpu); +} + + +void machine_crash_shutdown(struct pt_regs *regs) +{ + /* XXX: This should do something */ + printk("xen-kexec: Need to turn of other CPUS in " + "machine_crash_shutdown()\n"); + crash_save_self(regs); +} + +/* + * Local variables: + * c-file-style: "linux" + * indent-tabs-mode: t + * c-indent-level: 8 + * c-basic-offset: 8 + * tab-width: 8 + * End: + */ --- /dev/null +++ x/linux-2.6-xen-sparse/drivers/xen/core/machine_kexec.c @@ -0,0 +1,73 @@ +/* + * machine_kexec.c - handle transition of Linux booting another kernel + * + * Created By: Horms <horms@verge.net.au> + * + * Losely based on arch/i386/kernel/machine_kexec.c + */ + +#include <linux/kexec.h> +#include <xen/interface/kexec.h> +#include <linux/mm.h> +#include <asm/hypercall.h> + +const extern unsigned char relocate_new_kernel[]; +extern unsigned int relocate_new_kernel_size; + +/* + * A architecture hook called to validate the + * proposed image and prepare the control pages + * as needed. The pages for KEXEC_CONTROL_CODE_SIZE + * have been allocated, but the segments have yet + * been copied into the kernel. + * + * Do what every setup is needed on image and the + * reboot code buffer to allow us to avoid allocations + * later. + * + * Currently nothing. 
+ */ +int machine_kexec_prepare(struct kimage *image) +{ + kexec_arg_t hypercall_arg; + hypercall_arg.u.helper.data = NULL; + return HYPERVISOR_kexec(KEXEC_CMD_kexec_prepare, &hypercall_arg); +} + +/* + * Undo anything leftover by machine_kexec_prepare + * when an image is freed. + */ +void machine_kexec_cleanup(struct kimage *image) +{ + kexec_arg_t hypercall_arg; + hypercall_arg.u.helper.data = NULL; + HYPERVISOR_kexec(KEXEC_CMD_kexec_cleanup, &hypercall_arg); +} + +/* + * Do not allocate memory (or fail in any way) in machine_kexec(). + * We are past the point of no return, committed to rebooting now. + */ +NORET_TYPE void machine_kexec(struct kimage *image) +{ + kexec_arg_t hypercall_arg; + hypercall_arg.u.kexec.indirection_page = image->head; + hypercall_arg.u.kexec.reboot_code_buffer = + pfn_to_mfn(page_to_pfn(image->control_code_page)) << PAGE_SHIFT; + hypercall_arg.u.kexec.start_address = image->start; + hypercall_arg.u.kexec.relocate_new_kernel = relocate_new_kernel; + hypercall_arg.u.kexec.relocate_new_kernel_size = + relocate_new_kernel_size; + HYPERVISOR_kexec(KEXEC_CMD_kexec, &hypercall_arg); +} + +/* + * Local variables: + * c-file-style: "linux" + * indent-tabs-mode: t + * c-indent-level: 8 + * c-basic-offset: 8 + * tab-width: 8 + * End: + */ --- x/linux-2.6-xen-sparse/include/asm-i386/mach-xen/asm/hypercall.h +++ x/linux-2.6-xen-sparse/include/asm-i386/mach-xen/asm/hypercall.h @@ -37,6 +37,8 @@ # error "please don't include this file directly" #endif +#include <xen/interface/kexec.h> + #define __STR(x) #x #define STR(x) __STR(x) @@ -357,6 +359,14 @@ HYPERVISOR_xenoprof_op( return _hypercall2(int, xenoprof_op, op, arg); } +static inline int +HYPERVISOR_kexec( + unsigned long op, kexec_arg_t * arg) +{ + return _hypercall2(int, kexec_op, op, arg); +} + + #endif /* __HYPERCALL_H__ */ --- x/ref-linux-2.6.16/drivers/base/cpu.c +++ x/ref-linux-2.6.16/drivers/base/cpu.c @@ -101,7 +101,11 @@ static ssize_t show_crash_notes(struct s * boot up and this data
does not change there after. Hence this * operation should be safe. No locking required. */ +#ifndef CONFIG_XEN addr = __pa(per_cpu_ptr(crash_notes, cpunum)); +#else + addr = virt_to_machine(per_cpu_ptr(crash_notes, cpunum)); +#endif rc = sprintf(buf, "%Lx\n", addr); return rc; } --- x/ref-linux-2.6.16/kernel/kexec.c +++ x/ref-linux-2.6.16/kernel/kexec.c @@ -38,6 +38,20 @@ struct resource crashk_res = { .flags = IORESOURCE_BUSY | IORESOURCE_MEM }; +/* Kexec needs to know about the actually physical addresss. + * But in xen, a physical address is a pseudo-physical addresss. */ +#ifndef CONFIG_XEN +#define kexec_page_to_pfn(page) page_to_pfn(page) +#define kexec_pfn_to_page(pfn) pfn_to_page(pfn) +#define kexec_virt_to_phys(addr) virt_to_phys(addr) +#define kexec_phys_to_virt(addr) phys_to_virt(addr) +#else +#define kexec_page_to_pfn(page) pfn_to_mfn(page_to_pfn(page)) +#define kexec_pfn_to_page(pfn) pfn_to_page(mfn_to_pfn(pfn)) +#define kexec_virt_to_phys(addr) virt_to_machine(addr) +#define kexec_phys_to_virt(addr) phys_to_virt(machine_to_phys(addr)) +#endif + int kexec_should_crash(struct task_struct *p) { if (in_interrupt() || !p->pid || p->pid == 1 || panic_on_oops) @@ -403,7 +417,7 @@ static struct page *kimage_alloc_normal_ pages = kimage_alloc_pages(GFP_KERNEL, order); if (!pages) break; - pfn = page_to_pfn(pages); + pfn = kexec_page_to_pfn(pages); epfn = pfn + count; addr = pfn << PAGE_SHIFT; eaddr = epfn << PAGE_SHIFT; @@ -437,6 +451,7 @@ static struct page *kimage_alloc_normal_ return pages; } +#ifndef CONFIG_XEN static struct page *kimage_alloc_crash_control_pages(struct kimage *image, unsigned int order) { @@ -490,7 +505,7 @@ static struct page *kimage_alloc_crash_c } /* If I don't overlap any segments I have found my hole!
*/ if (i == image->nr_segments) { - pages = pfn_to_page(hole_start >> PAGE_SHIFT); + pages = kexec_pfn_to_page(hole_start >> PAGE_SHIFT); break; } } @@ -517,6 +532,13 @@ struct page *kimage_alloc_control_pages( return pages; } +#else /* !CONFIG_XEN */ +struct page *kimage_alloc_control_pages(struct kimage *image, + unsigned int order) +{ + return kimage_alloc_normal_control_pages(image, order); +} +#endif static int kimage_add_entry(struct kimage *image, kimage_entry_t entry) { @@ -532,7 +554,7 @@ static int kimage_add_entry(struct kimag return -ENOMEM; ind_page = page_address(page); - *image->entry = virt_to_phys(ind_page) | IND_INDIRECTION; + *image->entry = kexec_virt_to_phys(ind_page) | IND_INDIRECTION; image->entry = ind_page; image->last_entry = ind_page + ((PAGE_SIZE/sizeof(kimage_entry_t)) - 1); @@ -593,13 +615,13 @@ static int kimage_terminate(struct kimag #define for_each_kimage_entry(image, ptr, entry) \ for (ptr = &image->head; (entry = *ptr) && !(entry & IND_DONE); \ ptr = (entry & IND_INDIRECTION)? \ - phys_to_virt((entry & PAGE_MASK)): ptr +1) + kexec_phys_to_virt((entry & PAGE_MASK)): ptr +1) static void kimage_free_entry(kimage_entry_t entry) { struct page *page; - page = pfn_to_page(entry >> PAGE_SHIFT); + page = kexec_pfn_to_page(entry >> PAGE_SHIFT); kimage_free_pages(page); } @@ -686,7 +708,7 @@ static struct page *kimage_alloc_page(st * have a match. 
*/ list_for_each_entry(page, &image->dest_pages, lru) { - addr = page_to_pfn(page) << PAGE_SHIFT; + addr = kexec_page_to_pfn(page) << PAGE_SHIFT; if (addr == destination) { list_del(&page->lru); return page; @@ -701,12 +723,12 @@ static struct page *kimage_alloc_page(st if (!page) return NULL; /* If the page cannot be used file it away */ - if (page_to_pfn(page) > + if (kexec_page_to_pfn(page) > (KEXEC_SOURCE_MEMORY_LIMIT >> PAGE_SHIFT)) { list_add(&page->lru, &image->unuseable_pages); continue; } - addr = page_to_pfn(page) << PAGE_SHIFT; + addr = kexec_page_to_pfn(page) << PAGE_SHIFT; /* If it is the destination page we want use it */ if (addr == destination) @@ -729,7 +751,7 @@ static struct page *kimage_alloc_page(st struct page *old_page; old_addr = *old & PAGE_MASK; - old_page = pfn_to_page(old_addr >> PAGE_SHIFT); + old_page = kexec_pfn_to_page(old_addr >> PAGE_SHIFT); copy_highpage(page, old_page); *old = addr | (*old & ~PAGE_MASK); @@ -779,7 +801,7 @@ static int kimage_load_normal_segment(st result = -ENOMEM; goto out; } - result = kimage_add_page(image, page_to_pfn(page) + result = kimage_add_page(image, kexec_page_to_pfn(page) << PAGE_SHIFT); if (result < 0) goto out; @@ -811,6 +833,7 @@ out: return result; } +#ifndef CONFIG_XEN static int kimage_load_crash_segment(struct kimage *image, struct kexec_segment *segment) { @@ -833,7 +856,7 @@ static int kimage_load_crash_segment(str char *ptr; size_t uchunk, mchunk; - page = pfn_to_page(maddr >> PAGE_SHIFT); + page = kexec_pfn_to_page(maddr >> PAGE_SHIFT); if (page == 0) { result = -ENOMEM; goto out; @@ -881,6 +904,13 @@ static int kimage_load_segment(struct ki return result; } +#else /* CONFIG_XEN */ +static int kimage_load_segment(struct kimage *image, + struct kexec_segment *segment) +{ + return kimage_load_normal_segment(image, segment); +} +#endif /* * Exec Kernel system call: for obvious reasons only root may call it. 
--- x/xen/arch/x86/Makefile +++ x/xen/arch/x86/Makefile @@ -39,6 +39,7 @@ obj-y += trampoline.o obj-y += traps.o obj-y += usercopy.o obj-y += x86_emulate.o +obj-y += machine_kexec.o ifneq ($(pae),n) obj-$(x86_32) += shadow.o shadow_public.o shadow_guest32.o --- x/xen/arch/x86/dom0_ops.c +++ x/xen/arch/x86/dom0_ops.c @@ -29,6 +29,9 @@ #include <asm/mtrr.h> #include "cpu/mtrr/mtrr.h" +extern unsigned int opt_kdump_megabytes; +extern unsigned int opt_kdump_megabytes_base; + #define TRC_DOM0OP_ENTER_BASE 0x00020000 #define TRC_DOM0OP_LEAVE_BASE 0x00030000 --- /dev/null +++ x/xen/arch/x86/machine_kexec.c @@ -0,0 +1,27 @@ +/****************************************************************************** + * arch/x86/machine_kexec.c + * + * Created By: Horms + * + */ + +#include <public/kexec.h> + +int machine_kexec_prepare(struct kexec_arg *arg) +{ + return 0; +} + +void machine_kexec_cleanup(struct kexec_arg *arg) +{ +} + +/* + * Local variables: + * mode: C + * c-set-style: "BSD" + * c-basic-offset: 4 + * tab-width: 4 + * indent-tabs-mode: nil + * End: + */ --- x/xen/arch/x86/setup.c +++ x/xen/arch/x86/setup.c @@ -38,6 +38,11 @@ static unsigned int opt_xenheap_megabyte integer_param("xenheap_megabytes", opt_xenheap_megabytes); #endif +unsigned int opt_kdump_megabytes = 0; +integer_param("kdump_megabytes", opt_kdump_megabytes); +unsigned int opt_kdump_megabytes_base = 0; +integer_param("kdump_megabytes_base", opt_kdump_megabytes_base); + /* opt_nosmp: If true, secondary processors are ignored. 
*/ static int opt_nosmp = 0; boolean_param("nosmp", opt_nosmp); @@ -192,6 +197,20 @@ static void percpu_free_unused_areas(voi __pa(__per_cpu_end)); } +void __init move_memory(unsigned long dst, + unsigned long src_start, unsigned long src_end) +{ +#if defined(CONFIG_X86_32) + memmove((void *)dst, /* use low mapping */ + (void *)src_start, /* use low mapping */ + src_end - src_start); +#elif defined(CONFIG_X86_64) + memmove(__va(dst), + __va(src_start), + src_end - src_start); +#endif +} + void __init __start_xen(multiboot_info_t *mbi) { char __cmdline[] = "", *cmdline = __cmdline; @@ -327,15 +346,8 @@ void __init __start_xen(multiboot_info_t initial_images_start = xenheap_phys_end; initial_images_end = initial_images_start + modules_length; -#if defined(CONFIG_X86_32) - memmove((void *)initial_images_start, /* use low mapping */ - (void *)mod[0].mod_start, /* use low mapping */ - mod[mbi->mods_count-1].mod_end - mod[0].mod_start); -#elif defined(CONFIG_X86_64) - memmove(__va(initial_images_start), - __va(mod[0].mod_start), - mod[mbi->mods_count-1].mod_end - mod[0].mod_start); -#endif + move_memory(initial_images_start, + mod[0].mod_start, mod[mbi->mods_count-1].mod_end); /* Initialise boot-time allocator with all RAM situated after modules. 
*/ xenheap_phys_start = init_boot_allocator(__pa(&_end)); @@ -383,6 +395,51 @@ void __init __start_xen(multiboot_info_t #endif } + if (opt_kdump_megabytes) { + unsigned long kdump_start, kdump_size, k; + + /* mark images pages as free for now */ + + init_boot_pages(initial_images_start, initial_images_end); + + kdump_start = opt_kdump_megabytes_base << 20; + kdump_size = opt_kdump_megabytes << 20; + + printk("Kdump: %luMB (%lukB) at 0x%lx\n", + kdump_size >> 20, + kdump_size >> 10, + kdump_start); + + if ((kdump_start & ~PAGE_MASK) || (kdump_size & ~PAGE_MASK)) + panic("Kdump parameters not page aligned\n"); + + kdump_start >>= PAGE_SHIFT; + kdump_size >>= PAGE_SHIFT; + + /* allocate pages for Kdump memory area */ + + k = alloc_boot_pages_at(kdump_size, kdump_start); + + if (k != kdump_start) + panic("Unable to reserve Kdump memory\n"); + + /* allocate pages for relocated initial images */ + + k = ((initial_images_end - initial_images_start) & ~PAGE_MASK) ? 1 : 0; + k += (initial_images_end - initial_images_start) >> PAGE_SHIFT; + + k = alloc_boot_pages(k, 1); + + if (!k) + panic("Unable to allocate initial images memory\n"); + + move_memory(k << PAGE_SHIFT, initial_images_start, initial_images_end); + + initial_images_end -= initial_images_start; + initial_images_start = k << PAGE_SHIFT; + initial_images_end += initial_images_start; + } + memguard_init(); printk("System RAM: %luMB (%lukB)\n", --- x/xen/arch/x86/x86_32/Makefile +++ x/xen/arch/x86/x86_32/Makefile @@ -3,5 +3,6 @@ obj-y += entry.o obj-y += mm.o obj-y += seg_fixup.o obj-y += traps.o +obj-y += machine_kexec.o obj-$(supervisor_mode_kernel) += supervisor_mode_kernel.o --- x/xen/arch/x86/x86_32/entry.S +++ x/xen/arch/x86/x86_32/entry.S @@ -648,6 +648,7 @@ ENTRY(hypercall_table) .long do_xenoprof_op .long do_event_channel_op .long do_physdev_op + .long do_kexec .rept NR_hypercalls-((.-hypercall_table)/4) .long do_ni_hypercall .endr @@ -687,6 +688,7 @@ ENTRY(hypercall_args_table) .byte 2 /* do_xenoprof_op */ 
.byte 2 /* do_event_channel_op */ .byte 2 /* do_physdev_op */ + .byte 2 /* do_kexec */ .rept NR_hypercalls-(.-hypercall_args_table) .byte 0 /* do_ni_hypercall */ .endr --- /dev/null +++ x/xen/arch/x86/x86_32/machine_kexec.c @@ -0,0 +1,206 @@ +/****************************************************************************** + * arch/x86/x86_32/machine_kexec.c + * + * Created By: Horms + * + * Based heavily on arch/i386/machine_kexec.c from Linux 2.6.16 + */ + +#include <xen/config.h> +#include <xen/types.h> +#include <xen/domain_page.h> +#include <xen/timer.h> +#include <xen/sched.h> +#include <xen/reboot.h> +#include <xen/console.h> +#include <asm/page.h> +#include <asm/flushtlb.h> +#include <public/xen.h> +#include <public/kexec.h> + +static void __machine_kexec(struct kexec_arg *arg); + +typedef asmlinkage void (*relocate_new_kernel_t)( + unsigned long indirection_page, + unsigned long reboot_code_buffer, + unsigned long start_address, + unsigned int has_pae); + +#define PAGE_ALIGNED __attribute__ ((__aligned__(PAGE_SIZE))) + +#define L0_ATTR (_PAGE_PRESENT | _PAGE_RW | _PAGE_ACCESSED | _PAGE_DIRTY) +#define L1_ATTR (_PAGE_PRESENT | _PAGE_RW | _PAGE_ACCESSED | _PAGE_DIRTY) +#define L2_ATTR (_PAGE_PRESENT) + +#ifndef CONFIG_X86_PAE + +static u32 pgtable_level1[L1_PAGETABLE_ENTRIES] PAGE_ALIGNED; + +static void identity_map_page(unsigned long address) +{ + unsigned long mfn; + u32 *pgtable_level2; + + /* Find the current page table */ + mfn = read_cr3() >> PAGE_SHIFT; + pgtable_level2 = map_domain_page(mfn); + + /* Identity map the page table entry */ + pgtable_level1[l1_table_offset(address)] = address | L0_ATTR; + pgtable_level2[l2_table_offset(address)] = __pa(pgtable_level1) | L1_ATTR; + + /* Flush the tlb so the new mapping takes effect. + * Global tlb entries are not flushed but that is not an issue. 
+     */
+    write_cr3(mfn << PAGE_SHIFT);
+
+    unmap_domain_page(pgtable_level2);
+}
+
+#else
+static u64 pgtable_level1[L1_PAGETABLE_ENTRIES] PAGE_ALIGNED;
+static u64 pgtable_level2[L2_PAGETABLE_ENTRIES] PAGE_ALIGNED;
+
+static void identity_map_page(unsigned long address)
+{
+    int mfn;
+    intpte_t *pgtable_level3;
+
+    /* Find the current page table */
+    mfn = read_cr3() >> PAGE_SHIFT;
+    pgtable_level3 = map_domain_page(mfn);
+
+    /* Identity map the page table entry */
+    pgtable_level1[l1_table_offset(address)] = address | L0_ATTR;
+    pgtable_level2[l2_table_offset(address)] = __pa(pgtable_level1) | L1_ATTR;
+    set_64bit(&pgtable_level3[l3_table_offset(address)],
+              __pa(pgtable_level2) | L2_ATTR);
+
+    /* Flush the tlb so the new mapping takes effect.
+     * Global tlb entries are not flushed but that is not an issue.
+     */
+    load_cr3(mfn << PAGE_SHIFT);
+
+    unmap_domain_page(pgtable_level3);
+}
+#endif
+
+static void kexec_load_segments(void)
+{
+#define __SSTR(X) #X
+#define SSTR(X) __SSTR(X)
+    __asm__ __volatile__ (
+        "\tljmp $"SSTR(__HYPERVISOR_CS)",$1f\n"
+        "\t1:\n"
+        "\tmovl $"SSTR(__HYPERVISOR_DS)",%%eax\n"
+        "\tmovl %%eax,%%ds\n"
+        "\tmovl %%eax,%%es\n"
+        "\tmovl %%eax,%%fs\n"
+        "\tmovl %%eax,%%gs\n"
+        "\tmovl %%eax,%%ss\n"
+        ::: "eax", "memory");
+#undef SSTR
+#undef __SSTR
+}
+
+#define kexec_load_idt(dtr) __asm__ __volatile("lidt %0"::"m" (*dtr))
+static void kexec_set_idt(void *newidt, __u16 limit)
+{
+    struct Xgt_desc_struct curidt;
+
+    /* ia32 supports unaliged loads & stores */
+    curidt.size = limit;
+    curidt.address = (unsigned long)newidt;
+
+    kexec_load_idt(&curidt);
+
+};
+
+#define kexec_load_gdt(dtr) __asm__ __volatile("lgdt %0"::"m" (*dtr))
+static void kexec_set_gdt(void *newgdt, __u16 limit)
+{
+    struct Xgt_desc_struct curgdt;
+
+    /* ia32 supports unaligned loads & stores */
+    curgdt.size = limit;
+    curgdt.address = (unsigned long)newgdt;
+
+    kexec_load_gdt(&curgdt);
+};
+
+static void __machine_shutdown(void *data)
+{
+    struct kexec_arg *arg = (struct kexec_arg *)data;
+
+    printk("__machine_shutdown: cpu=%u\n", smp_processor_id());
+
+    watchdog_disable();
+    console_start_sync();
+
+    smp_send_stop();
+
+#ifdef CONFIG_X86_IO_APIC
+    disable_IO_APIC();
+#endif
+
+    __machine_kexec(arg);
+}
+
+void machine_shutdown(struct kexec_arg *arg)
+{
+    int reboot_cpu_id;
+    cpumask_t reboot_cpu;
+
+
+    reboot_cpu_id = 0;
+
+    if (!cpu_isset(reboot_cpu_id, cpu_online_map))
+        reboot_cpu_id = smp_processor_id();
+
+    if (reboot_cpu_id != smp_processor_id()) {
+        cpus_clear(reboot_cpu);
+        cpu_set(reboot_cpu_id, reboot_cpu);
+        on_selected_cpus(reboot_cpu, __machine_shutdown, arg, 1, 0);
+        for (;;)
+            ; /* nothing */
+    }
+    else
+        __machine_shutdown(arg);
+    BUG();
+}
+
+static void __machine_kexec(struct kexec_arg *arg)
+{
+    relocate_new_kernel_t rnk;
+
+    local_irq_disable();
+
+    identity_map_page(arg->u.kexec.reboot_code_buffer);
+
+    copy_from_user((void *)arg->u.kexec.reboot_code_buffer,
+                   arg->u.kexec.relocate_new_kernel,
+                   arg->u.kexec.relocate_new_kernel_size);
+
+    kexec_load_segments();
+    kexec_set_gdt(__va(0),0);
+    kexec_set_idt(__va(0),0);
+
+    rnk = (relocate_new_kernel_t) arg->u.kexec.reboot_code_buffer;
+    (*rnk)(arg->u.kexec.indirection_page, arg->u.kexec.reboot_code_buffer,
+           arg->u.kexec.start_address, cpu_has_pae);
+}
+
+void machine_kexec(struct kexec_arg *arg)
+{
+    machine_shutdown(arg);
+}
+
+/*
+ * Local variables:
+ * mode: C
+ * c-set-style: "BSD"
+ * c-basic-offset: 4
+ * tab-width: 4
+ * indent-tabs-mode: nil
+ * End:
+ */
--- x/xen/arch/x86/x86_64/Makefile
+++ x/xen/arch/x86/x86_64/Makefile
@@ -1,3 +1,4 @@
 obj-y += entry.o
 obj-y += mm.o
 obj-y += traps.o
+obj-y += machine_kexec.o
--- /dev/null
+++ x/xen/arch/x86/x86_64/machine_kexec.c
@@ -0,0 +1,24 @@
+/******************************************************************************
+ * arch/x86/x86_64/machine_kexec.c
+ *
+ * Created By: Horms
+ *
+ * Based heavily on arch/i386/machine_kexec.c from Linux 2.6.16
+ */
+
+#include <public/kexec.h>
+
+void machine_kexec(struct kexec_arg *arg)
+{
+    printk("machine_kexec: not implemented\n");
+}
+
+/*
+ * Local variables:
+ * mode: C
+ * c-set-style: "BSD"
+ * c-basic-offset: 4
+ * tab-width: 4
+ * indent-tabs-mode: nil
+ * End:
+ */
--- x/xen/common/Makefile
+++ x/xen/common/Makefile
@@ -7,6 +7,7 @@ obj-y += event_channel.o
 obj-y += grant_table.o
 obj-y += kernel.o
 obj-y += keyhandler.o
+obj-y += kexec.o
 obj-y += lib.o
 obj-y += memory.o
 obj-y += multicall.o
--- /dev/null
+++ x/xen/common/kexec.c
@@ -0,0 +1,73 @@
+/*
+ * Achitecture independent kexec code for Xen
+ *
+ * At this statge, just a switch for the kexec hypercall into
+ * architecture dependent code.
+ *
+ * Created By: Horms <horms@verge.net.au>
+ */
+
+#include <xen/lib.h>
+#include <xen/errno.h>
+#include <xen/guest_access.h>
+#include <xen/sched.h>
+#include <public/xen.h>
+#include <public/kexec.h>
+
+extern int machine_kexec_prepare(struct kexec_arg *arg);
+extern void machine_kexec_cleanup(struct kexec_arg *arg);
+extern void machine_kexec(struct kexec_arg *arg);
+
+extern unsigned int opt_kdump_megabytes;
+extern unsigned int opt_kdump_megabytes_base;
+
+int do_kexec(unsigned long op,
+             XEN_GUEST_HANDLE(kexec_arg_t) uarg)
+{
+    struct kexec_arg arg;
+
+    if ( !IS_PRIV(current->domain) )
+        return -EPERM;
+
+    if (op == KEXEC_CMD_reserve)
+    {
+        arg.u.reserve.size = opt_kdump_megabytes << 20;
+        arg.u.reserve.start = opt_kdump_megabytes_base << 20;
+        if ( unlikely(copy_to_guest(uarg, &arg, 1) != 0) )
+        {
+            printk("do_kexec: copy_to_guest failed");
+            return -EFAULT;
+        }
+        return 0;
+    }
+
+    if ( unlikely(copy_from_guest(&arg, uarg, 1) != 0) )
+    {
+        printk("do_kexec: __copy_from_guest failed");
+        return -EFAULT;
+    }
+
+    switch(op) {
+    case KEXEC_CMD_kexec:
+        machine_kexec(&arg);
+        return -EINVAL; /* Not Reached */
+    case KEXEC_CMD_kexec_prepare:
+        return machine_kexec_prepare(&arg);
+    case KEXEC_CMD_kexec_cleanup:
+        machine_kexec_cleanup(&arg);
+        return 0;
+    }
+
+    return -EINVAL;
+}
+
+/*
+ * Local variables:
+ * mode: C
+ * c-set-style: "BSD"
+ * c-basic-offset: 4
+ * tab-width: 4
+ * indent-tabs-mode: nil
+ * End:
+ */
+
--- x/xen/common/page_alloc.c
+++ x/xen/common/page_alloc.c
@@ -212,24 +212,35 @@ void init_boot_pages(paddr_t ps, paddr_t
     }
 }
 
+unsigned long alloc_boot_pages_at(unsigned long nr_pfns, unsigned long pfn_at)
+{
+    unsigned long i;
+
+    for ( i = 0; i < nr_pfns; i++ )
+        if ( allocated_in_map(pfn_at + i) )
+            break;
+
+    if ( i == nr_pfns )
+    {
+        map_alloc(pfn_at, nr_pfns);
+        return pfn_at;
+    }
+
+    return 0;
+}
+
 unsigned long alloc_boot_pages(unsigned long nr_pfns, unsigned long pfn_align)
 {
-    unsigned long pg, i;
+    unsigned long pg, i = 0;
 
     for ( pg = 0; (pg + nr_pfns) < max_page; pg += pfn_align )
     {
-        for ( i = 0; i < nr_pfns; i++ )
-            if ( allocated_in_map(pg + i) )
-                break;
-
-        if ( i == nr_pfns )
-        {
-            map_alloc(pg, nr_pfns);
-            return pg;
-        }
+        i = alloc_boot_pages_at(nr_pfns, pg);
+        if (i != 0)
+            break;
     }
 
-    return 0;
+    return i;
 }
 
 
--- x/xen/include/asm-x86/hypercall.h
+++ x/xen/include/asm-x86/hypercall.h
@@ -6,6 +6,7 @@
 #define __ASM_X86_HYPERCALL_H__
 
 #include <public/physdev.h>
+#include <public/kexec.h>
 
 extern long
 do_event_channel_op_compat(
@@ -87,6 +88,10 @@ extern long
 arch_do_vcpu_op(
     int cmd, struct vcpu *v, XEN_GUEST_HANDLE(void) arg);
 
+extern int
+do_kexec(
+    unsigned long op, XEN_GUEST_HANDLE(kexec_arg_t) uarg);
+
 #ifdef __x86_64__
 
 extern long
--- /dev/null
+++ x/xen/include/public/kexec.h
@@ -0,0 +1,46 @@
+/*
+ * kexec.h: Xen kexec public
+ *
+ * Created By: Horms <horms@verge.net.au>
+ */
+
+#ifndef _XEN_PUBLIC_KEXEC_H
+#define _XEN_PUBLIC_KEXEC_H
+
+#include <xen/types.h>
+#include <public/xen.h>
+
+/*
+ * Scratch space for passing arguments to the kexec hypercall
+ */
+typedef struct kexec_arg {
+    union {
+        struct {
+            unsigned long data; /* Not sure what this should be yet */
+        } helper;
+        struct {
+            unsigned long indirection_page;
+            unsigned long reboot_code_buffer;
+            unsigned long start_address;
+            const char *relocate_new_kernel;
+            unsigned int relocate_new_kernel_size;
+        } kexec;
+        struct {
+            unsigned long size;
+            unsigned long start;
+        } reserve;
+    } u;
+} kexec_arg_t;
+DEFINE_XEN_GUEST_HANDLE(kexec_arg_t);
+
+#endif
+
+/*
+ * Local variables:
+ * mode: C
+ * c-set-style: "BSD"
+ * c-basic-offset: 4
+ * tab-width: 4
+ * indent-tabs-mode: nil
+ * End:
+ */
--- x/xen/include/public/xen.h
+++ x/xen/include/public/xen.h
@@ -64,6 +64,7 @@
 #define __HYPERVISOR_xenoprof_op          31
 #define __HYPERVISOR_event_channel_op     32
 #define __HYPERVISOR_physdev_op           33
+#define __HYPERVISOR_kexec_op             34
 
 /* Architecture-specific hypercall definitions. */
 #define __HYPERVISOR_arch_0               48
@@ -238,6 +239,14 @@ DEFINE_XEN_GUEST_HANDLE(mmuext_op_t);
 #define VMASST_TYPE_writable_pagetables 2
 #define MAX_VMASST_TYPE 2
 
+/*
+ * Operations for kexec.
+ */
+#define KEXEC_CMD_kexec                 0
+#define KEXEC_CMD_kexec_prepare         1
+#define KEXEC_CMD_kexec_cleanup         2
+#define KEXEC_CMD_reserve               3
+
 #ifndef __ASSEMBLY__
 
 typedef uint16_t domid_t;
--- x/xen/include/xen/mm.h
+++ x/xen/include/xen/mm.h
@@ -40,6 +40,7 @@ struct page_info;
 paddr_t init_boot_allocator(paddr_t bitmap_start);
 void init_boot_pages(paddr_t ps, paddr_t pe);
 unsigned long alloc_boot_pages(unsigned long nr_pfns, unsigned long pfn_align);
+unsigned long alloc_boot_pages_at(unsigned long nr_pfns, unsigned long pfn_at);
 void end_boot_allocator(void);
 
 /* Generic allocator. These functions are *not* interrupt-safe. */
_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel
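The page_alloc.c hunk in the patch above adds alloc_boot_pages_at(), which reserves a run of pages at a fixed frame number and returns 0 on failure. What follows is only a rough user-space sketch of that idea, not the Xen code: a byte array stands in for Xen's boot bitmap, and MAX_PAGE and ALLOC_FAILED are names assumed here for illustration. The sketch uses a distinct failure value because, with 0 as the failure code, a successful reservation at pfn 0 cannot be told apart from a failure. Its loop bound also differs slightly from the patch, which tests (pg + nr_pfns) < max_page.

```c
#include <assert.h>
#include <string.h>

#define MAX_PAGE     64UL    /* assumed size of the modelled machine, in pages */
#define ALLOC_FAILED (~0UL)  /* assumed failure value; the patch itself returns 0 */

/* One byte per page, 1 = allocated. A stand-in for Xen's boot allocator bitmap. */
static unsigned char alloc_map[MAX_PAGE];

static int allocated_in_map(unsigned long pfn)
{
    return alloc_map[pfn];
}

static void map_alloc(unsigned long first, unsigned long nr)
{
    assert(first <= MAX_PAGE && nr <= MAX_PAGE - first);
    memset(&alloc_map[first], 1, nr);
}

/* Reserve nr_pfns pages starting exactly at pfn_at, or fail and change nothing. */
static unsigned long alloc_boot_pages_at(unsigned long nr_pfns,
                                         unsigned long pfn_at)
{
    unsigned long i;

    if (nr_pfns == 0 || pfn_at >= MAX_PAGE || nr_pfns > MAX_PAGE - pfn_at)
        return ALLOC_FAILED;

    for (i = 0; i < nr_pfns; i++)
        if (allocated_in_map(pfn_at + i))
            return ALLOC_FAILED;

    map_alloc(pfn_at, nr_pfns);
    return pfn_at;
}

/* First-fit search built on the fixed-address helper, as in the patch. */
static unsigned long alloc_boot_pages(unsigned long nr_pfns,
                                      unsigned long pfn_align)
{
    unsigned long pg, got;

    if (pfn_align == 0)
        return ALLOC_FAILED;

    for (pg = 0; pg < MAX_PAGE && nr_pfns <= MAX_PAGE - pg; pg += pfn_align) {
        got = alloc_boot_pages_at(nr_pfns, pg);
        if (got != ALLOC_FAILED)
            return got;
    }

    return ALLOC_FAILED;
}
```

For example, alloc_boot_pages_at(4, 8) reserves frames 8 to 11 in this model, after which any request that overlaps them fails and leaves the map unchanged.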
Akio Takebe
2006-May-03 07:16 UTC
[Xen-devel] Re: [PATCH]: kexec: framework and i386 (Take VI)
Hi, Simon and Magnus

I have one question.
When Xen panics, it seems that kexec is not called.
kexec is only called when dom0 panics.
But in the case of nmi=dom0, can we use kexec by pushing the NMI button?
Am I right?

I'll use your patch soon, and report. :-)

Best Regards,

Akio Takebe
On Wed, May 03, 2006 at 04:16:22PM +0900, Akio Takebe wrote:
> Hi, Simon and Magnus
>
> I have one question.
> When Xen is panic, I seemed kexec is not called.
> Only when dom0 is panic, kexec is called.

That is a good point.

> But in the case of nmi=dom0, can we use kexec by pushing NMI button?
> Am I righit?

Probably, I will have to investigate a little further. Though, I'm not sure that I have ever seen an NMI button. Are you thinking about the INIT button on some ia64 boxes? That is a bit different to NMI on x86.

> I'll use your patch soon, and report. :-)

Thanks

--
Horms
http://www.vergenet.net/~horms/
Akio Takebe
2006-May-06 08:44 UTC
Re: [Xen-devel] [PATCH]: kexec: framework and i386 (Take VI)
Hi, Horms

Why do you modify ref-linux-2.6.16/{drivers/base/cpu.c, kernel/kexec.c}?
I tried to apply your kexec patch, but it failed to apply.
How do you apply it?
I think you can make a patch in patches/linux-2.6.16/
if you want to modify these files.

Best Regards,

Akio Takebe

>Hi,
>
>I will be out of the office until next Monday, so here is the latest and
>greatest before I go. Tested against 9896, should also work fine
>with tip (9903).
>
>--
>Horms http://www.vergenet.net/~horms/
>
>kexec: framework and i386
>
>This is an implementation of kexec for dom0/xen, that allows
>kexecing of the physical machine from xen. The approach taken is
>to move the architecture-dependant kexec code into a new hypercall.
>
>Some notes:
> * machine_kexec_cleanup() and machine_kexec_prepare() don't do
>   anything in i386. So while this patch adds a framework for them,
>   I am not sure what parameters are needs at this stage.
> * Only works for UP, as machine_shutdown is not implemented yet
> * kexecing into xen does not seem to work, I think that
>   kexec-tools needs updating, but I have not investigated yet
> * Kdump works by first copying the kernel into dom0 segments
>   and relocating them later in xen, the same way that kexec does
>   The only difference is that the relocation is made into
>   an area reserved by xen
> * Kdump reservation is made using the xen command line parameters,
>   kdump_megabytes and kdump_megabytes_base, rather than
>   the linux option crashkernel, which is now ignored.
>   Two parameters are used instead of one to simplify parsing.
>   This can be cleaned up later if desired. But the reservation
>   seems to need to be made by xen to make sure that it happens
>   early enough.
> * This patch uses a new kexec hypercall > >Highlights since the previous posted version: > > * SMP kexec (not kdump yet) > * Split x86_32 specific xen code out > >Prepared by Horms and Magnus Damm > >Signed-Off-By: Magnus Damm <magnus@valinux.co.jp> >Signed-Off-By: Horms <horms@verge.net.au> > > linux-2.6-xen-sparse/arch/i386/Kconfig | 2 > linux-2.6-xen-sparse/arch/i386/kernel/Makefile | 2 > linux-2.6-xen-sparse/arch/i386/kernel/setup-xen.c | 24 + > linux-2.6-xen-sparse/drivers/xen/core/Makefile | 1 > linux-2.6-xen-sparse/drivers/xen/core/crash.c | 98 ++++ > linux-2.6-xen-sparse/drivers/xen/core/machine_kexec.c | 73 +++ > linux-2.6-xen-sparse/include/asm-i386/mach-xen/asm/hypercall.h | 10 > ref-linux-2.6.16/drivers/base/cpu.c | 4 > ref-linux-2.6.16/kernel/kexec.c | 52 +- > xen/arch/x86/Makefile | 1 > xen/arch/x86/dom0_ops.c | 3 > xen/arch/x86/machine_kexec.c | 27 + > xen/arch/x86/setup.c | 75 +++ > xen/arch/x86/x86_32/Makefile | 1 > xen/arch/x86/x86_32/entry.S | 2 > xen/arch/x86/x86_32/machine_kexec.c | 206 ++++ >++++++ > xen/arch/x86/x86_64/Makefile | 1 > xen/arch/x86/x86_64/machine_kexec.c | 24 + > xen/common/Makefile | 1 > xen/common/kexec.c | 73 +++ > xen/common/page_alloc.c | 33 + > xen/include/asm-x86/hypercall.h | 5 > xen/include/public/kexec.h | 46 ++ > xen/include/public/xen.h | 9 > xen/include/xen/mm.h | 1 > 25 files changed, 741 insertions(+), 33 deletions(-) > >--- x/linux-2.6-xen-sparse/arch/i386/Kconfig >+++ x/linux-2.6-xen-sparse/arch/i386/Kconfig >@@ -726,7 +726,7 @@ source kernel/Kconfig.hz > > config KEXEC > bool "kexec system call (EXPERIMENTAL)" >- depends on EXPERIMENTAL && !X86_XEN >+ depends on EXPERIMENTAL > help > kexec is a system call that implements the ability to shutdown your > current kernel, and to start another kernel. 
It is like a reboot >--- x/linux-2.6-xen-sparse/arch/i386/kernel/Makefile >+++ x/linux-2.6-xen-sparse/arch/i386/kernel/Makefile >@@ -89,7 +89,7 @@ include $(srctree)/scripts/Makefile.xen > > obj-y += fixup.o > microcode-$(subst m,y,$(CONFIG_MICROCODE)) := microcode-xen.o >-n-obj-xen := i8259.o timers/ reboot.o smpboot.o trampoline.o >+n-obj-xen := i8259.o timers/ reboot.o smpboot.o trampoline.o machine_kexec >.o crash.o > > obj-y := $(call filterxen, $(obj-y), $(n-obj-xen)) > obj-y := $(call cherrypickxen, $(obj-y)) >--- x/linux-2.6-xen-sparse/arch/i386/kernel/setup-xen.c >+++ x/linux-2.6-xen-sparse/arch/i386/kernel/setup-xen.c >@@ -68,6 +68,10 @@ > #include "setup_arch_pre.h" > #include <bios_ebda.h> > >+#ifdef CONFIG_XEN >+#include <xen/interface/kexec.h> >+#endif >+ > /* Forward Declaration. */ > void __init find_max_pfn(void); > >@@ -932,6 +936,7 @@ static void __init parse_cmdline_early ( > * after a kernel panic. > */ > else if (!memcmp(from, "crashkernel=", 12)) { >+#ifndef CONFIG_XEN > unsigned long size, base; > size = memparse(from+12, &from); > if (*from == ''@'') { >@@ -942,6 +947,10 @@ static void __init parse_cmdline_early ( > crashk_res.start = base; > crashk_res.end = base + size - 1; > } >+#else >+ printk("Ignoring crashkernel command line, " >+ "parameter will be supplied by xen\n"); >+#endif > } > #endif > #ifdef CONFIG_PROC_VMCORE >@@ -1318,9 +1327,21 @@ void __init setup_bootmem_allocator(void > } > #endif > #ifdef CONFIG_KEXEC >+#ifndef CONFIG_XEN > if (crashk_res.start != crashk_res.end) > reserve_bootmem(crashk_res.start, > crashk_res.end - crashk_res.start + 1); >+#else >+ { >+ struct kexec_arg xen_kexec_arg; >+ BUG_ON(HYPERVISOR_kexec(KEXEC_CMD_reserve, &xen_kexec_arg)); >+ if (xen_kexec_arg.u.reserve.size) { >+ crashk_res.start = xen_kexec_arg.u.reserve.start; >+ crashk_res.end = xen_kexec_arg.u.reserve.start + >+ xen_kexec_arg.u.reserve.size - 1; >+ } >+ } >+#endif > #endif > > if (!xen_feature(XENFEAT_auto_translated_physmap)) >@@ 
-1395,6 +1416,9 @@ legacy_init_iomem_resources(struct resou > res->end = map[i].end - 1; > res->flags = IORESOURCE_MEM | IORESOURCE_BUSY; > request_resource(&iomem_resource, res); >+#ifdef CONFIG_KEXEC >+ request_resource(res, &crashk_res); >+#endif > } > > free_bootmem(__pa(map), PAGE_SIZE); >--- x/linux-2.6-xen-sparse/drivers/xen/core/Makefile >+++ x/linux-2.6-xen-sparse/drivers/xen/core/Makefile >@@ -9,3 +9,4 @@ obj-$(CONFIG_NET) += skbuff.o > obj-$(CONFIG_SMP) += smpboot.o > obj-$(CONFIG_SYSFS) += hypervisor_sysfs.o > obj-$(CONFIG_XEN_SYSFS) += xen_sysfs.o >+obj-$(CONFIG_KEXEC) += machine_kexec.o crash.o >--- /dev/null >+++ x/linux-2.6-xen-sparse/drivers/xen/core/crash.c >@@ -0,0 +1,98 @@ >+/* >+ * Architecture specific (i386-xen) functions for kexec based crash dumps. >+ * >+ * Created by: Horms <horms@verge.net.au> >+ * >+ */ >+ >+#include <linux/kernel.h> /* For printk */ >+ >+/* XXX: final_note(), crash_save_this_cpu() and crash_save_self() >+ * are copied from arch/i386/kernel/crash.c, might be good to either >+ * the original functions non-static and use them, or just >+ * merge this this into that file. 
>+ */ >+#include <linux/elf.h> /* For struct elf_note */ >+#include <linux/elfcore.h> /* For struct elf_prstatus */ >+#include <linux/kexec.h> /* crash_notes */ >+ >+static u32 *append_elf_note(u32 *buf, char *name, unsigned type, void *data, >+ size_t data_len) >+{ >+ struct elf_note note; >+ >+ note.n_namesz = strlen(name) + 1; >+ note.n_descsz = data_len; >+ note.n_type = type; >+ memcpy(buf, ¬e, sizeof(note)); >+ buf += (sizeof(note) +3)/4; >+ memcpy(buf, name, note.n_namesz); >+ buf += (note.n_namesz + 3)/4; >+ memcpy(buf, data, note.n_descsz); >+ buf += (note.n_descsz + 3)/4; >+ >+ return buf; >+} >+ >+static void final_note(u32 *buf) >+{ >+ struct elf_note note; >+ >+ note.n_namesz = 0; >+ note.n_descsz = 0; >+ note.n_type = 0; >+ memcpy(buf, ¬e, sizeof(note)); >+} >+ >+static void crash_save_this_cpu(struct pt_regs *regs, int cpu) >+{ >+ struct elf_prstatus prstatus; >+ u32 *buf; >+ >+ if ((cpu < 0) || (cpu >= NR_CPUS)) >+ return; >+ >+ /* Using ELF notes here is opportunistic. >+ * I need a well defined structure format >+ * for the data I pass, and I need tags >+ * on the data to indicate what information I have >+ * squirrelled away. ELF notes happen to provide >+ * all of that that no need to invent something new. 
>+ */ >+ buf = (u32*)per_cpu_ptr(crash_notes, cpu); >+ if (!buf) >+ return; >+ memset(&prstatus, 0, sizeof(prstatus)); >+ prstatus.pr_pid = current->pid; >+ elf_core_copy_regs(&prstatus.pr_reg, regs); >+ buf = append_elf_note(buf, "CORE", NT_PRSTATUS, &prstatus, >+ sizeof(prstatus)); >+ final_note(buf); >+} >+ >+static void crash_save_self(struct pt_regs *regs) >+{ >+ int cpu; >+ >+ cpu = smp_processor_id(); >+ crash_save_this_cpu(regs, cpu); >+} >+ >+ >+void machine_crash_shutdown(struct pt_regs *regs) >+{ >+ /* XXX: This should do something */ >+ printk("xen-kexec: Need to turn of other CPUS in " >+ "machine_crash_shutdown()\n"); >+ crash_save_self(regs); >+} >+ >+/* >+ * Local variables: >+ * c-file-style: "linux" >+ * indent-tabs-mode: t >+ * c-indent-level: 8 >+ * c-basic-offset: 8 >+ * tab-width: 8 >+ * End: >+ */ >--- /dev/null >+++ x/linux-2.6-xen-sparse/drivers/xen/core/machine_kexec.c >@@ -0,0 +1,73 @@ >+/* >+ * machine_kexec.c - handle transition of Linux booting another kernel >+ * >+ * Created By: Horms <horms@verge.net.au> >+ * >+ * Losely based on arch/i386/kernel/machine_kexec.c >+ */ >+ >+#include <linux/kexec.h> >+#include <xen/interface/kexec.h> >+#include <linux/mm.h> >+#include <asm/hypercall.h> >+ >+const extern unsigned char relocate_new_kernel[]; >+extern unsigned int relocate_new_kernel_size; >+ >+/* >+ * A architecture hook called to validate the >+ * proposed image and prepare the control pages >+ * as needed. The pages for KEXEC_CONTROL_CODE_SIZE >+ * have been allocated, but the segments have yet >+ * been copied into the kernel. >+ * >+ * Do what every setup is needed on image and the >+ * reboot code buffer to allow us to avoid allocations >+ * later. >+ * >+ * Currently nothing. 
>+ */ >+int machine_kexec_prepare(struct kimage *image) >+{ >+ kexec_arg_t hypercall_arg; >+ hypercall_arg.u.helper.data = NULL; >+ return HYPERVISOR_kexec(KEXEC_CMD_kexec_prepare, &hypercall_arg); >+} >+ >+/* >+ * Undo anything leftover by machine_kexec_prepare >+ * when an image is freed. >+ */ >+void machine_kexec_cleanup(struct kimage *image) >+{ >+ kexec_arg_t hypercall_arg; >+ hypercall_arg.u.helper.data = NULL; >+ HYPERVISOR_kexec(KEXEC_CMD_kexec_cleanup, &hypercall_arg); >+} >+ >+/* >+ * Do not allocate memory (or fail in any way) in machine_kexec(). >+ * We are past the point of no return, committed to rebooting now. >+ */ >+NORET_TYPE void machine_kexec(struct kimage *image) >+{ >+ kexec_arg_t hypercall_arg; >+ hypercall_arg.u.kexec.indirection_page = image->head; >+ hypercall_arg.u.kexec.reboot_code_buffer = >+ pfn_to_mfn(page_to_pfn(image->control_code_page)) << PAGE_SHIFT; >+ hypercall_arg.u.kexec.start_address = image->start; >+ hypercall_arg.u.kexec.relocate_new_kernel = relocate_new_kernel; >+ hypercall_arg.u.kexec.relocate_new_kernel_size = >+ relocate_new_kernel_size; >+ HYPERVISOR_kexec(KEXEC_CMD_kexec, &hypercall_arg); >+} >+ >+/* >+ * Local variables: >+ * c-file-style: "linux" >+ * indent-tabs-mode: t >+ * c-indent-level: 8 >+ * c-basic-offset: 8 >+ * tab-width: 8 >+ * End: >+ */ >--- x/linux-2.6-xen-sparse/include/asm-i386/mach-xen/asm/hypercall.h >+++ x/linux-2.6-xen-sparse/include/asm-i386/mach-xen/asm/hypercall.h >@@ -37,6 +37,8 @@ > # error "please don''t include this file directly" > #endif > >+#include <xen/interface/kexec.h> >+ > #define __STR(x) #x > #define STR(x) __STR(x) > >@@ -357,6 +359,14 @@ HYPERVISOR_xenoprof_op( > return _hypercall2(int, xenoprof_op, op, arg); > } > >+static inline int >+HYPERVISOR_kexec( >+ unsigned long op, kexec_arg_t * arg) >+{ >+ return _hypercall2(int, kexec_op, op, arg); >+} >+ >+ > > #endif /* __HYPERCALL_H__ */ > >--- x/ref-linux-2.6.16/drivers/base/cpu.c >+++ x/ref-linux-2.6.16/drivers/base/cpu.c 
>@@ -101,7 +101,11 @@ static ssize_t show_crash_notes(struct s > * boot up and this data does not change there after. Hence this > * operation should be safe. No locking required. > */ >+#ifndef CONFIG_XEN > addr = __pa(per_cpu_ptr(crash_notes, cpunum)); >+#else >+ addr = virt_to_machine(per_cpu_ptr(crash_notes, cpunum)); >+#endif > rc = sprintf(buf, "%Lx\n", addr); > return rc; > } >--- x/ref-linux-2.6.16/kernel/kexec.c >+++ x/ref-linux-2.6.16/kernel/kexec.c >@@ -38,6 +38,20 @@ struct resource crashk_res = { > .flags = IORESOURCE_BUSY | IORESOURCE_MEM > }; > >+/* Kexec needs to know about the actually physical addresss. >+ * But in xen, a physical address is a pseudo-physical addresss. */ >+#ifndef CONFIG_XEN >+#define kexec_page_to_pfn(page) page_to_pfn(page) >+#define kexec_pfn_to_page(pfn) pfn_to_page(pfn) >+#define kexec_virt_to_phys(addr) virt_to_phys(addr) >+#define kexec_phys_to_virt(addr) phys_to_virt(addr) >+#else >+#define kexec_page_to_pfn(page) pfn_to_mfn(page_to_pfn(page)) >+#define kexec_pfn_to_page(pfn) pfn_to_page(mfn_to_pfn(pfn)) >+#define kexec_virt_to_phys(addr) virt_to_machine(addr) >+#define kexec_phys_to_virt(addr) phys_to_virt(machine_to_phys(addr)) >+#endif >+ > int kexec_should_crash(struct task_struct *p) > { > if (in_interrupt() || !p->pid || p->pid == 1 || panic_on_oops) >@@ -403,7 +417,7 @@ static struct page *kimage_alloc_normal_ > pages = kimage_alloc_pages(GFP_KERNEL, order); > if (!pages) > break; >- pfn = page_to_pfn(pages); >+ pfn = kexec_page_to_pfn(pages); > epfn = pfn + count; > addr = pfn << PAGE_SHIFT; > eaddr = epfn << PAGE_SHIFT; >@@ -437,6 +451,7 @@ static struct page *kimage_alloc_normal_ > return pages; > } > >+#ifndef CONFIG_XEN > static struct page *kimage_alloc_crash_control_pages(struct kimage *image, > unsigned int order) > { >@@ -490,7 +505,7 @@ static struct page *kimage_alloc_crash_c > } > /* If I don''t overlap any segments I have found my hole! 
*/ > if (i == image->nr_segments) { >- pages = pfn_to_page(hole_start >> PAGE_SHIFT); >+ pages = kexec_pfn_to_page(hole_start >> PAGE_SHIFT); > break; > } > } >@@ -517,6 +532,13 @@ struct page *kimage_alloc_control_pages( > > return pages; > } >+#else /* !CONFIG_XEN */ >+struct page *kimage_alloc_control_pages(struct kimage *image, >+ unsigned int order) >+{ >+ return kimage_alloc_normal_control_pages(image, order); >+} >+#endif > > static int kimage_add_entry(struct kimage *image, kimage_entry_t entry) > { >@@ -532,7 +554,7 @@ static int kimage_add_entry(struct kimag > return -ENOMEM; > > ind_page = page_address(page); >- *image->entry = virt_to_phys(ind_page) | IND_INDIRECTION; >+ *image->entry = kexec_virt_to_phys(ind_page) | IND_INDIRECTION; > image->entry = ind_page; > image->last_entry = ind_page + > ((PAGE_SIZE/sizeof(kimage_entry_t)) - 1); >@@ -593,13 +615,13 @@ static int kimage_terminate(struct kimag > #define for_each_kimage_entry(image, ptr, entry) \ > for (ptr = &image->head; (entry = *ptr) && !(entry & IND_DONE); \ > ptr = (entry & IND_INDIRECTION)? \ >- phys_to_virt((entry & PAGE_MASK)): ptr +1) >+ kexec_phys_to_virt((entry & PAGE_MASK)): ptr +1) > > static void kimage_free_entry(kimage_entry_t entry) > { > struct page *page; > >- page = pfn_to_page(entry >> PAGE_SHIFT); >+ page = kexec_pfn_to_page(entry >> PAGE_SHIFT); > kimage_free_pages(page); > } > >@@ -686,7 +708,7 @@ static struct page *kimage_alloc_page(st > * have a match. 
> */ > list_for_each_entry(page, &image->dest_pages, lru) { >- addr = page_to_pfn(page) << PAGE_SHIFT; >+ addr = kexec_page_to_pfn(page) << PAGE_SHIFT; > if (addr == destination) { > list_del(&page->lru); > return page; >@@ -701,12 +723,12 @@ static struct page *kimage_alloc_page(st > if (!page) > return NULL; > /* If the page cannot be used file it away */ >- if (page_to_pfn(page) > >+ if (kexec_page_to_pfn(page) > > (KEXEC_SOURCE_MEMORY_LIMIT >> PAGE_SHIFT)) { > list_add(&page->lru, &image->unuseable_pages); > continue; > } >- addr = page_to_pfn(page) << PAGE_SHIFT; >+ addr = kexec_page_to_pfn(page) << PAGE_SHIFT; > > /* If it is the destination page we want use it */ > if (addr == destination) >@@ -729,7 +751,7 @@ static struct page *kimage_alloc_page(st > struct page *old_page; > > old_addr = *old & PAGE_MASK; >- old_page = pfn_to_page(old_addr >> PAGE_SHIFT); >+ old_page = kexec_pfn_to_page(old_addr >> PAGE_SHIFT); > copy_highpage(page, old_page); > *old = addr | (*old & ~PAGE_MASK); > >@@ -779,7 +801,7 @@ static int kimage_load_normal_segment(st > result = -ENOMEM; > goto out; > } >- result = kimage_add_page(image, page_to_pfn(page) >+ result = kimage_add_page(image, kexec_page_to_pfn(page) > << PAGE_SHIFT); > if (result < 0) > goto out; >@@ -811,6 +833,7 @@ out: > return result; > } > >+#ifndef CONFIG_XEN > static int kimage_load_crash_segment(struct kimage *image, > struct kexec_segment *segment) > { >@@ -833,7 +856,7 @@ static int kimage_load_crash_segment(str > char *ptr; > size_t uchunk, mchunk; > >- page = pfn_to_page(maddr >> PAGE_SHIFT); >+ page = kexec_pfn_to_page(maddr >> PAGE_SHIFT); > if (page == 0) { > result = -ENOMEM; > goto out; >@@ -881,6 +904,13 @@ static int kimage_load_segment(struct ki > > return result; > } >+#else /* CONFIG_XEN */ >+static int kimage_load_segment(struct kimage *image, >+ struct kexec_segment *segment) >+{ >+ return kimage_load_normal_segment(image, segment); >+} >+#endif > > /* > * Exec Kernel system call: for obvious 
reasons only root may call it. >--- x/xen/arch/x86/Makefile >+++ x/xen/arch/x86/Makefile >@@ -39,6 +39,7 @@ obj-y += trampoline.o > obj-y += traps.o > obj-y += usercopy.o > obj-y += x86_emulate.o >+obj-y += machine_kexec.o > > ifneq ($(pae),n) > obj-$(x86_32) += shadow.o shadow_public.o shadow_guest32.o >--- x/xen/arch/x86/dom0_ops.c >+++ x/xen/arch/x86/dom0_ops.c >@@ -29,6 +29,9 @@ > #include <asm/mtrr.h> > #include "cpu/mtrr/mtrr.h" > >+extern unsigned int opt_kdump_megabytes; >+extern unsigned int opt_kdump_megabytes_base; >+ > #define TRC_DOM0OP_ENTER_BASE 0x00020000 > #define TRC_DOM0OP_LEAVE_BASE 0x00030000 > >--- /dev/null >+++ x/xen/arch/x86/machine_kexec.c >@@ -0,0 +1,27 @@ >+/************************************************************************* >***** >+ * arch/x86/machine_kexec.c >+ * >+ * Created By: Horms >+ * >+ */ >+ >+#include <public/kexec.h> >+ >+int machine_kexec_prepare(struct kexec_arg *arg) >+{ >+ return 0; >+} >+ >+void machine_kexec_cleanup(struct kexec_arg *arg) >+{ >+} >+ >+/* >+ * Local variables: >+ * mode: C >+ * c-set-style: "BSD" >+ * c-basic-offset: 4 >+ * tab-width: 4 >+ * indent-tabs-mode: nil >+ * End: >+ */ >--- x/xen/arch/x86/setup.c >+++ x/xen/arch/x86/setup.c >@@ -38,6 +38,11 @@ static unsigned int opt_xenheap_megabyte > integer_param("xenheap_megabytes", opt_xenheap_megabytes); > #endif > >+unsigned int opt_kdump_megabytes = 0; >+integer_param("kdump_megabytes", opt_kdump_megabytes); >+unsigned int opt_kdump_megabytes_base = 0; >+integer_param("kdump_megabytes_base", opt_kdump_megabytes_base); >+ > /* opt_nosmp: If true, secondary processors are ignored. 
*/ > static int opt_nosmp = 0; > boolean_param("nosmp", opt_nosmp); >@@ -192,6 +197,20 @@ static void percpu_free_unused_areas(voi > __pa(__per_cpu_end)); > } > >+void __init move_memory(unsigned long dst, >+ unsigned long src_start, unsigned long src_end) >+{ >+#if defined(CONFIG_X86_32) >+ memmove((void *)dst, /* use low mapping */ >+ (void *)src_start, /* use low mapping */ >+ src_end - src_start); >+#elif defined(CONFIG_X86_64) >+ memmove(__va(dst), >+ __va(src_start), >+ src_end - src_start); >+#endif >+} >+ > void __init __start_xen(multiboot_info_t *mbi) > { > char __cmdline[] = "", *cmdline = __cmdline; >@@ -327,15 +346,8 @@ void __init __start_xen(multiboot_info_t > initial_images_start = xenheap_phys_end; > initial_images_end = initial_images_start + modules_length; > >-#if defined(CONFIG_X86_32) >- memmove((void *)initial_images_start, /* use low mapping */ >- (void *)mod[0].mod_start, /* use low mapping */ >- mod[mbi->mods_count-1].mod_end - mod[0].mod_start); >-#elif defined(CONFIG_X86_64) >- memmove(__va(initial_images_start), >- __va(mod[0].mod_start), >- mod[mbi->mods_count-1].mod_end - mod[0].mod_start); >-#endif >+ move_memory(initial_images_start, >+ mod[0].mod_start, mod[mbi->mods_count-1].mod_end); > > /* Initialise boot-time allocator with all RAM situated after modules. 
> */ > xenheap_phys_start = init_boot_allocator(__pa(&_end)); >@@ -383,6 +395,51 @@ void __init __start_xen(multiboot_info_t > #endif > } > >+ if (opt_kdump_megabytes) { >+ unsigned long kdump_start, kdump_size, k; >+ >+ /* mark images pages as free for now */ >+ >+ init_boot_pages(initial_images_start, initial_images_end); >+ >+ kdump_start = opt_kdump_megabytes_base << 20; >+ kdump_size = opt_kdump_megabytes << 20; >+ >+ printk("Kdump: %luMB (%lukB) at 0x%lx\n", >+ kdump_size >> 20, >+ kdump_size >> 10, >+ kdump_start); >+ >+ if ((kdump_start & ~PAGE_MASK) || (kdump_size & ~PAGE_MASK)) >+ panic("Kdump parameters not page aligned\n"); >+ >+ kdump_start >>= PAGE_SHIFT; >+ kdump_size >>= PAGE_SHIFT; >+ >+ /* allocate pages for Kdump memory area */ >+ >+ k = alloc_boot_pages_at(kdump_size, kdump_start); >+ >+ if (k != kdump_start) >+ panic("Unable to reserve Kdump memory\n"); >+ >+ /* allocate pages for relocated initial images */ >+ >+ k = ((initial_images_end - initial_images_start) & ~PAGE_MASK) ? 
1 > : 0; >+ k += (initial_images_end - initial_images_start) >> PAGE_SHIFT; >+ >+ k = alloc_boot_pages(k, 1); >+ >+ if (!k) >+ panic("Unable to allocate initial images memory\n"); >+ >+ move_memory(k << PAGE_SHIFT, initial_images_start, >initial_images_end); >+ >+ initial_images_end -= initial_images_start; >+ initial_images_start = k << PAGE_SHIFT; >+ initial_images_end += initial_images_start; >+ } >+ > memguard_init(); > > printk("System RAM: %luMB (%lukB)\n", >--- x/xen/arch/x86/x86_32/Makefile >+++ x/xen/arch/x86/x86_32/Makefile >@@ -3,5 +3,6 @@ obj-y += entry.o > obj-y += mm.o > obj-y += seg_fixup.o > obj-y += traps.o >+obj-y += machine_kexec.o > > obj-$(supervisor_mode_kernel) += supervisor_mode_kernel.o >--- x/xen/arch/x86/x86_32/entry.S >+++ x/xen/arch/x86/x86_32/entry.S >@@ -648,6 +648,7 @@ ENTRY(hypercall_table) > .long do_xenoprof_op > .long do_event_channel_op > .long do_physdev_op >+ .long do_kexec > .rept NR_hypercalls-((.-hypercall_table)/4) > .long do_ni_hypercall > .endr >@@ -687,6 +688,7 @@ ENTRY(hypercall_args_table) > .byte 2 /* do_xenoprof_op */ > .byte 2 /* do_event_channel_op */ > .byte 2 /* do_physdev_op */ >+ .byte 2 /* do_kexec */ > .rept NR_hypercalls-(.-hypercall_args_table) > .byte 0 /* do_ni_hypercall */ > .endr >--- /dev/null >+++ x/xen/arch/x86/x86_32/machine_kexec.c >@@ -0,0 +1,206 @@ >+/************************************************************************* >***** >+ * arch/x86/x86_32/machine_kexec.c >+ * >+ * Created By: Horms >+ * >+ * Based heavily on arch/i386/machine_kexec.c from Linux 2.6.16 >+ */ >+ >+#include <xen/config.h> >+#include <xen/types.h> >+#include <xen/domain_page.h> >+#include <xen/timer.h> >+#include <xen/sched.h> >+#include <xen/reboot.h> >+#include <xen/console.h> >+#include <asm/page.h> >+#include <asm/flushtlb.h> >+#include <public/xen.h> >+#include <public/kexec.h> >+ >+static void __machine_kexec(struct kexec_arg *arg); >+ >+typedef asmlinkage void (*relocate_new_kernel_t)( >+ unsigned long 
indirection_page, >+ unsigned long reboot_code_buffer, >+ unsigned long start_address, >+ unsigned int has_pae); >+ >+#define PAGE_ALIGNED __attribute__ ((__aligned__(PAGE_SIZE))) >+ >+#define L0_ATTR (_PAGE_PRESENT | _PAGE_RW | _PAGE_ACCESSED | _PAGE_DIRTY) >+#define L1_ATTR (_PAGE_PRESENT | _PAGE_RW | _PAGE_ACCESSED | _PAGE_DIRTY) >+#define L2_ATTR (_PAGE_PRESENT) >+ >+#ifndef CONFIG_X86_PAE >+ >+static u32 pgtable_level1[L1_PAGETABLE_ENTRIES] PAGE_ALIGNED; >+ >+static void identity_map_page(unsigned long address) >+{ >+ unsigned long mfn; >+ u32 *pgtable_level2; >+ >+ /* Find the current page table */ >+ mfn = read_cr3() >> PAGE_SHIFT; >+ pgtable_level2 = map_domain_page(mfn); >+ >+ /* Identity map the page table entry */ >+ pgtable_level1[l1_table_offset(address)] = address | L0_ATTR; >+ pgtable_level2[l2_table_offset(address)] = __pa(pgtable_level1) | >L1_ATTR; >+ >+ /* Flush the tlb so the new mapping takes effect. >+ * Global tlb entries are not flushed but that is not an issue. >+ */ >+ write_cr3(mfn << PAGE_SHIFT); >+ >+ unmap_domain_page(pgtable_level2); >+} >+ >+#else >+static u64 pgtable_level1[L1_PAGETABLE_ENTRIES] PAGE_ALIGNED; >+static u64 pgtable_level2[L2_PAGETABLE_ENTRIES] PAGE_ALIGNED; >+ >+static void identity_map_page(unsigned long address) >+{ >+ int mfn; >+ intpte_t *pgtable_level3; >+ >+ /* Find the current page table */ >+ mfn = read_cr3() >> PAGE_SHIFT; >+ pgtable_level3 = map_domain_page(mfn); >+ >+ /* Identity map the page table entry */ >+ pgtable_level1[l1_table_offset(address)] = address | L0_ATTR; >+ pgtable_level2[l2_table_offset(address)] = __pa(pgtable_level1) | >L1_ATTR; >+ set_64bit(&pgtable_level3[l3_table_offset(address)], >+ __pa(pgtable_level2) | L2_ATTR); >+ >+ /* Flush the tlb so the new mapping takes effect. >+ * Global tlb entries are not flushed but that is not an issue. 
>+ */ >+ load_cr3(mfn << PAGE_SHIFT); >+ >+ unmap_domain_page(pgtable_level3); >+} >+#endif >+ >+static void kexec_load_segments(void) >+{ >+#define __SSTR(X) #X >+#define SSTR(X) __SSTR(X) >+ __asm__ __volatile__ ( >+ "\tljmp $"SSTR(__HYPERVISOR_CS)",$1f\n" >+ "\t1:\n" >+ "\tmovl $"SSTR(__HYPERVISOR_DS)",%%eax\n" >+ "\tmovl %%eax,%%ds\n" >+ "\tmovl %%eax,%%es\n" >+ "\tmovl %%eax,%%fs\n" >+ "\tmovl %%eax,%%gs\n" >+ "\tmovl %%eax,%%ss\n" >+ ::: "eax", "memory"); >+#undef SSTR >+#undef __SSTR >+} >+ >+#define kexec_load_idt(dtr) __asm__ __volatile("lidt %0"::"m" (*dtr)) >+static void kexec_set_idt(void *newidt, __u16 limit) >+{ >+ struct Xgt_desc_struct curidt; >+ >+ /* ia32 supports unaliged loads & stores */ >+ curidt.size = limit; >+ curidt.address = (unsigned long)newidt; >+ >+ kexec_load_idt(&curidt); >+ >+}; >+ >+#define kexec_load_gdt(dtr) __asm__ __volatile("lgdt %0"::"m" (*dtr)) >+static void kexec_set_gdt(void *newgdt, __u16 limit) >+{ >+ struct Xgt_desc_struct curgdt; >+ >+ /* ia32 supports unaligned loads & stores */ >+ curgdt.size = limit; >+ curgdt.address = (unsigned long)newgdt; >+ >+ kexec_load_gdt(&curgdt); >+}; >+ >+static void __machine_shutdown(void *data) >+{ >+ struct kexec_arg *arg = (struct kexec_arg *)data; >+ >+ printk("__machine_shutdown: cpu=%u\n", smp_processor_id()); >+ >+ watchdog_disable(); >+ console_start_sync(); >+ >+ smp_send_stop(); >+ >+#ifdef CONFIG_X86_IO_APIC >+ disable_IO_APIC(); >+#endif >+ >+ __machine_kexec(arg); >+} >+ >+void machine_shutdown(struct kexec_arg *arg) >+{ >+ int reboot_cpu_id; >+ cpumask_t reboot_cpu; >+ >+ >+ reboot_cpu_id = 0; >+ >+ if (!cpu_isset(reboot_cpu_id, cpu_online_map)) >+ reboot_cpu_id = smp_processor_id(); >+ >+ if (reboot_cpu_id != smp_processor_id()) { >+ cpus_clear(reboot_cpu); >+ cpu_set(reboot_cpu_id, reboot_cpu); >+ on_selected_cpus(reboot_cpu, __machine_shutdown, arg, 1, 0); >+ for (;;) >+ ; /* nothing */ >+ } >+ else >+ __machine_shutdown(arg); >+ BUG(); >+} >+ >+static void 
__machine_kexec(struct kexec_arg *arg) >+{ >+ relocate_new_kernel_t rnk; >+ >+ local_irq_disable(); >+ >+ identity_map_page(arg->u.kexec.reboot_code_buffer); >+ >+ copy_from_user((void *)arg->u.kexec.reboot_code_buffer, >+ arg->u.kexec.relocate_new_kernel, >+ arg->u.kexec.relocate_new_kernel_size); >+ >+ kexec_load_segments(); >+ kexec_set_gdt(__va(0),0); >+ kexec_set_idt(__va(0),0); >+ >+ rnk = (relocate_new_kernel_t) arg->u.kexec.reboot_code_buffer; >+ (*rnk)(arg->u.kexec.indirection_page, arg->u.kexec.reboot_code_buffer, >+ arg->u.kexec.start_address, cpu_has_pae); >+} >+ >+void machine_kexec(struct kexec_arg *arg) >+{ >+ machine_shutdown(arg); >+} >+ >+/* >+ * Local variables: >+ * mode: C >+ * c-set-style: "BSD" >+ * c-basic-offset: 4 >+ * tab-width: 4 >+ * indent-tabs-mode: nil >+ * End: >+ */ >--- x/xen/arch/x86/x86_64/Makefile >+++ x/xen/arch/x86/x86_64/Makefile >@@ -1,3 +1,4 @@ > obj-y += entry.o > obj-y += mm.o > obj-y += traps.o >+obj-y += machine_kexec.o >--- /dev/null >+++ x/xen/arch/x86/x86_64/machine_kexec.c >@@ -0,0 +1,24 @@ >+/************************************************************************* >***** >+ * arch/x86/x86_64/machine_kexec.c >+ * >+ * Created By: Horms >+ * >+ * Based heavily on arch/i386/machine_kexec.c from Linux 2.6.16 >+ */ >+ >+#include <public/kexec.h> >+ >+void machine_kexec(struct kexec_arg *arg) >+{ >+ printk("machine_kexec: not implemented\n"); >+} >+ >+/* >+ * Local variables: >+ * mode: C >+ * c-set-style: "BSD" >+ * c-basic-offset: 4 >+ * tab-width: 4 >+ * indent-tabs-mode: nil >+ * End: >+ */ >--- x/xen/common/Makefile >+++ x/xen/common/Makefile >@@ -7,6 +7,7 @@ obj-y += event_channel.o > obj-y += grant_table.o > obj-y += kernel.o > obj-y += keyhandler.o >+obj-y += kexec.o > obj-y += lib.o > obj-y += memory.o > obj-y += multicall.o >--- /dev/null >+++ x/xen/common/kexec.c >@@ -0,0 +1,73 @@ >+/* >+ * Achitecture independent kexec code for Xen >+ * >+ * At this statge, just a switch for the kexec hypercall into >+ * 
architecture dependent code. >+ * >+ * Created By: Horms <horms@verge.net.au> >+ */ >+ >+#include <xen/lib.h> >+#include <xen/errno.h> >+#include <xen/guest_access.h> >+#include <xen/sched.h> >+#include <public/xen.h> >+#include <public/kexec.h> >+ >+extern int machine_kexec_prepare(struct kexec_arg *arg); >+extern void machine_kexec_cleanup(struct kexec_arg *arg); >+extern void machine_kexec(struct kexec_arg *arg); >+ >+extern unsigned int opt_kdump_megabytes; >+extern unsigned int opt_kdump_megabytes_base; >+ >+int do_kexec(unsigned long op, >+ XEN_GUEST_HANDLE(kexec_arg_t) uarg) >+{ >+ struct kexec_arg arg; >+ >+ if ( !IS_PRIV(current->domain) ) >+ return -EPERM; >+ >+ if (op == KEXEC_CMD_reserve) >+ { >+ arg.u.reserve.size = opt_kdump_megabytes << 20; >+ arg.u.reserve.start = opt_kdump_megabytes_base << 20; >+ if ( unlikely(copy_to_guest(uarg, &arg, 1) != 0) ) >+ { >+ printk("do_kexec: copy_to_guest failed"); >+ return -EFAULT; >+ } >+ return 0; >+ } >+ >+ if ( unlikely(copy_from_guest(&arg, uarg, 1) != 0) ) >+ { >+ printk("do_kexec: __copy_from_guest failed"); >+ return -EFAULT; >+ } >+ >+ switch(op) { >+ case KEXEC_CMD_kexec: >+ machine_kexec(&arg); >+ return -EINVAL; /* Not Reached */ >+ case KEXEC_CMD_kexec_prepare: >+ return machine_kexec_prepare(&arg); >+ case KEXEC_CMD_kexec_cleanup: >+ machine_kexec_cleanup(&arg); >+ return 0; >+ } >+ >+ return -EINVAL; >+} >+ >+/* >+ * Local variables: >+ * mode: C >+ * c-set-style: "BSD" >+ * c-basic-offset: 4 >+ * tab-width: 4 >+ * indent-tabs-mode: nil >+ * End: >+ */ >+ >--- x/xen/common/page_alloc.c >+++ x/xen/common/page_alloc.c >@@ -212,24 +212,35 @@ void init_boot_pages(paddr_t ps, paddr_t > } > } > >+unsigned long alloc_boot_pages_at(unsigned long nr_pfns, unsigned long >pfn_at) >+{ >+ unsigned long i; >+ >+ for ( i = 0; i < nr_pfns; i++ ) >+ if ( allocated_in_map(pfn_at + i) ) >+ break; >+ >+ if ( i == nr_pfns ) >+ { >+ map_alloc(pfn_at, nr_pfns); >+ return pfn_at; >+ } >+ >+ return 0; >+} >+ > unsigned long 
alloc_boot_pages(unsigned long nr_pfns, unsigned long >pfn_align) > { >- unsigned long pg, i; >+ unsigned long pg, i = 0; > > for ( pg = 0; (pg + nr_pfns) < max_page; pg += pfn_align ) > { >- for ( i = 0; i < nr_pfns; i++ ) >- if ( allocated_in_map(pg + i) ) >- break; >- >- if ( i == nr_pfns ) >- { >- map_alloc(pg, nr_pfns); >- return pg; >- } >+ i = alloc_boot_pages_at(nr_pfns, pg); >+ if (i != 0) >+ break; > } > >- return 0; >+ return i; > } > > >--- x/xen/include/asm-x86/hypercall.h >+++ x/xen/include/asm-x86/hypercall.h >@@ -6,6 +6,7 @@ > #define __ASM_X86_HYPERCALL_H__ > > #include <public/physdev.h> >+#include <public/kexec.h> > > extern long > do_event_channel_op_compat( >@@ -87,6 +88,10 @@ extern long > arch_do_vcpu_op( > int cmd, struct vcpu *v, XEN_GUEST_HANDLE(void) arg); > >+extern int >+do_kexec( >+ unsigned long op, XEN_GUEST_HANDLE(kexec_arg_t) uarg); >+ > #ifdef __x86_64__ > > extern long >--- /dev/null >+++ x/xen/include/public/kexec.h >@@ -0,0 +1,46 @@ >+/* >+ * kexec.h: Xen kexec public >+ * >+ * Created By: Horms <horms@verge.net.au> >+ */ >+ >+#ifndef _XEN_PUBLIC_KEXEC_H >+#define _XEN_PUBLIC_KEXEC_H >+ >+#include <xen/types.h> >+#include <public/xen.h> >+ >+/* >+ * Scratch space for passing arguments to the kexec hypercall >+ */ >+typedef struct kexec_arg { >+ union { >+ struct { >+ unsigned long data; /* Not sure what this should be yet */ >+ } helper; >+ struct { >+ unsigned long indirection_page; >+ unsigned long reboot_code_buffer; >+ unsigned long start_address; >+ const char *relocate_new_kernel; >+ unsigned int relocate_new_kernel_size; >+ } kexec; >+ struct { >+ unsigned long size; >+ unsigned long start; >+ } reserve; >+ } u; >+} kexec_arg_t; >+DEFINE_XEN_GUEST_HANDLE(kexec_arg_t); >+ >+#endif >+ >+/* >+ * Local variables: >+ * mode: C >+ * c-set-style: "BSD" >+ * c-basic-offset: 4 >+ * tab-width: 4 >+ * indent-tabs-mode: nil >+ * End: >+ */ >--- x/xen/include/public/xen.h >+++ x/xen/include/public/xen.h >@@ -64,6 +64,7 @@ > #define 
__HYPERVISOR_xenoprof_op 31 > #define __HYPERVISOR_event_channel_op 32 > #define __HYPERVISOR_physdev_op 33 >+#define __HYPERVISOR_kexec_op 34 > > /* Architecture-specific hypercall definitions. */ > #define __HYPERVISOR_arch_0 48 >@@ -238,6 +239,14 @@ DEFINE_XEN_GUEST_HANDLE(mmuext_op_t); > #define VMASST_TYPE_writable_pagetables 2 > #define MAX_VMASST_TYPE 2 > >+/* >+ * Operations for kexec. >+ */ >+#define KEXEC_CMD_kexec 0 >+#define KEXEC_CMD_kexec_prepare 1 >+#define KEXEC_CMD_kexec_cleanup 2 >+#define KEXEC_CMD_reserve 3 >+ > #ifndef __ASSEMBLY__ > > typedef uint16_t domid_t; >--- x/xen/include/xen/mm.h >+++ x/xen/include/xen/mm.h >@@ -40,6 +40,7 @@ struct page_info; > paddr_t init_boot_allocator(paddr_t bitmap_start); > void init_boot_pages(paddr_t ps, paddr_t pe); > unsigned long alloc_boot_pages(unsigned long nr_pfns, unsigned long >pfn_align); >+unsigned long alloc_boot_pages_at(unsigned long nr_pfns, unsigned long >pfn_at); > void end_boot_allocator(void); > > /* Generic allocator. These functions are *not* interrupt-safe. */ > > >_______________________________________________ >Xen-devel mailing list >Xen-devel@lists.xensource.com >http://lists.xensource.com/xen-devel_______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
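For readers following the boot-time reservation logic in the quoted patch, here is a minimal userspace sketch (not Xen code) of what alloc_boot_pages_at() and the kdump sizing in setup.c appear to do. The toy_ names, the one-flag-per-frame array and the 64MB machine size are assumptions made for illustration only; Xen itself keeps this state in the boot allocator's bitmap.

```c
#include <stdbool.h>

#define PAGE_SHIFT 12
#define PAGE_SIZE  (1UL << PAGE_SHIFT)
#define MAX_PAGES  16384UL   /* assumption: toy machine with 64MB of 4kB pages */

/* Toy allocation map: one flag per page frame. */
static bool page_allocated[MAX_PAGES];

/*
 * Claim nr_pfns frames starting exactly at pfn_at.
 * Returns pfn_at on success and 0 on failure, mirroring the patch.
 */
static unsigned long toy_alloc_boot_pages_at(unsigned long nr_pfns,
                                             unsigned long pfn_at)
{
    unsigned long i;

    if (nr_pfns == 0 || pfn_at >= MAX_PAGES || nr_pfns > MAX_PAGES - pfn_at)
        return 0;

    /* Refuse the request if any frame in the range is already taken. */
    for (i = 0; i < nr_pfns; i++)
        if (page_allocated[pfn_at + i])
            return 0;

    for (i = 0; i < nr_pfns; i++)
        page_allocated[pfn_at + i] = true;

    return pfn_at;
}

/* Pages needed to hold [start, end), rounding a partial page up. */
static unsigned long pages_spanned(unsigned long start, unsigned long end)
{
    unsigned long len = end - start;

    return (len >> PAGE_SHIFT) + ((len & (PAGE_SIZE - 1)) ? 1 : 0);
}

/* Reserve a kdump area given in megabytes, as the command line does. */
static unsigned long toy_reserve_kdump(unsigned long megabytes,
                                       unsigned long base_megabytes)
{
    unsigned long start_pfn = (base_megabytes << 20) >> PAGE_SHIFT;
    unsigned long nr_pfns   = (megabytes << 20) >> PAGE_SHIFT;

    return toy_alloc_boot_pages_at(nr_pfns, start_pfn);
}
```

As in the patch, 0 doubles as the failure value, so a reservation that starts at frame 0 cannot be told apart from a failed one.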
Akio Takebe
2006-May-06 08:46 UTC
[Xen-devel] Re: [PATCH]: kexec: framework and i386 (Take VI)
Hi, Horms

Thank you for your reply.

>On Wed, May 03, 2006 at 04:16:22PM +0900, Akio Takebe wrote:
>> Hi, Simon and Magnus
>>
>> I have one question.
>> When Xen panics, kexec does not seem to be called.
>> kexec is called only when dom0 panics.
>
>That is a good point.
>
>> But in the case of nmi=dom0, can we use kexec by pushing the NMI button?
>> Am I right?
>
>Probably, I will have to investigate a little further.
>Though, I'm not sure that I have ever seen an NMI button.
>Are you thinking about the INIT button on some ia64 boxes?
>That is a bit different to NMI on x86.

I was talking about the NMI button on x86.
Many x86 servers (not PCs) have an NMI button,
just as many ia64 servers have an INIT button.

Best Regards,

Akio Takebe
On Sat, May 06, 2006 at 05:44:44PM +0900, Akio Takebe wrote:
> Hi, Horms
>
> Why did you modify ref-linux-2.6.16/kernel/{drivers/base/cpu.c, kernel/kexec.c }?
> I tried to apply your kexec patch, but it failed to apply.
> How do you apply it?

Sorry, the drivers/base/cpu.c portion is just an artifact of xen's
build system, which modifies that file on build but doesn't unmodify
it on distclean. It shouldn't have been included in my patch.

kernel/kexec.c needs to be modified primarily so that mfns are used
instead of pfns. Again because of strangeness in the xen build system,
this patch is a bit odd as it patches a file not covered by a xen
checkout (even though it's needed for a xen build).

If you run the following before applying the patch it should apply.

    make prep-kernels clean kclean
    make -C linux-2.6.16-xen distclean

> I think you can make a patch in patches/linux-2.6.16/ if you would
> modify these.

Yes, that is probably the best way forward, I'll work on breaking it
out in that manner.

-- 
Horms
http://www.vergenet.net/~horms/
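To make the pfn/mfn point concrete: dom0 works with pseudo-physical frame numbers (pfns), while Xen and the relocation code act on machine frame numbers (mfns), so addresses handed over for kexec have to be translated first. The posted patch does this for the control page with pfn_to_mfn(page_to_pfn(image->control_code_page)) << PAGE_SHIFT. Below is a minimal userspace sketch of that translation; the toy_ names and the four-entry table are made up for illustration and are not the kernel's actual p2m implementation.

```c
#define PAGE_SHIFT 12
#define PAGE_OFFSET_MASK ((1UL << PAGE_SHIFT) - 1)

/* Hypothetical p2m table for a 4-page guest: index is the pfn, value is the mfn. */
static const unsigned long toy_p2m[] = { 0x1a0, 0x07f, 0x2c3, 0x010 };

#define TOY_NR_PFNS     (sizeof(toy_p2m) / sizeof(toy_p2m[0]))
#define TOY_INVALID_MFN (~0UL)

/* Look up the machine frame backing a pseudo-physical frame. */
static unsigned long toy_pfn_to_mfn(unsigned long pfn)
{
    return pfn < TOY_NR_PFNS ? toy_p2m[pfn] : TOY_INVALID_MFN;
}

/*
 * Translate a pseudo-physical address into a machine address,
 * keeping the offset within the page unchanged.
 */
static unsigned long toy_phys_to_machine(unsigned long paddr)
{
    unsigned long mfn = toy_pfn_to_mfn(paddr >> PAGE_SHIFT);

    if (mfn == TOY_INVALID_MFN)
        return TOY_INVALID_MFN;

    return (mfn << PAGE_SHIFT) | (paddr & PAGE_OFFSET_MASK);
}
```

Consecutive pfns generally map to unrelated mfns, which is why every page address in the image has to be translated individually rather than by adding a fixed offset.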
On Sat, May 06, 2006 at 05:46:43PM +0900, Akio Takebe wrote:
> Hi, Horms
>
> Thank you for your reply.
>
> >On Wed, May 03, 2006 at 04:16:22PM +0900, Akio Takebe wrote:
> >> Hi, Simon and Magnus
> >>
> >> I have one question.
> >> When Xen panics, kexec does not seem to be called.
> >> kexec is called only when dom0 panics.
> >
> >That is a good point.
> >
> >> But in the case of nmi=dom0, can we use kexec by pushing the NMI button?
> >> Am I right?
> >
> >Probably, I will have to investigate a little further.
> >Though, I'm not sure that I have ever seen an NMI button.
> >Are you thinking about the INIT button on some ia64 boxes?
> >That is a bit different to NMI on x86.
> I was talking about the NMI button on x86.
> Many x86 servers (not PCs) have an NMI button,
> just as many ia64 servers have an INIT button.

Ok thanks, I haven't seen such a machine.
I'll look into simulating it in software.

-- 
Horms
http://www.vergenet.net/~horms/
Akio Takebe
2006-May-07 09:45 UTC
[Xen-devel] Re: [PATCH]: kexec: framework and i386 (Take VI)
>
>Ok thanks, I haven't seen such a machine.
>I'll look into simulating it in software.
>
I have an x86 server with an NMI button.
If necessary, I can test it :-)

Best Regards,

Akio Takebe
Ian Campbell
2006-May-08 09:02 UTC
Re: [Xen-devel] Re: [PATCH]: kexec: framework and i386 (Take VI)
I didn't get Horms' (I presume that's who is quoted below) original mail
so I'll reply to this one.

> >Ok thanks, I haven't seen such a machine.
> >I'll look into simulating it in software.

There is code in xen/arch/x86/nmi.c:do_nmi_trigger(). You can trigger it
with the 'n' keyhandler.

Ian.
On Sun, May 07, 2006 at 01:45:22PM +0900, Horms wrote:
> On Sat, May 06, 2006 at 05:44:44PM +0900, Akio Takebe wrote:
>
> > I think you can make a patch in patches/linux-2.6.16/ if you would
> > modify these.
>
> Yes, that is probably the best way forward, I'll work on breaking it
> out in that manner.

Hi Takebe-san,

here is an updated version of the patch which moves portions into
patches/linux-2.6.16/ as you suggested. It also moves to
xen-unstable 9969 / Linux 2.6.16.13 and has some minor build fixes
for problems that crept into the previous patch.

-- 
Horms
http://www.vergenet.net/~horms/

kexec: framework and i386

This is an implementation of kexec for dom0/xen that allows
kexecing of the physical machine from xen. The approach taken is
to move the architecture-dependent kexec code into a new hypercall.

Some notes:
  * machine_kexec_cleanup() and machine_kexec_prepare() don't do
    anything in i386. So while this patch adds a framework for them,
    I am not sure what parameters are needed at this stage.
  * Only works for UP, as machine_shutdown is not implemented yet
  * kexecing into xen does not seem to work, I think that
    kexec-tools needs updating, but I have not investigated yet
  * Kdump works by first copying the kernel into dom0 segments
    and relocating them later in xen, the same way that kexec does.
    The only difference is that the relocation is made into
    an area reserved by xen
  * Kdump reservation is made using the xen command line parameters,
    kdump_megabytes and kdump_megabytes_base, rather than
    the linux option crashkernel, which is now ignored.
    Two parameters are used instead of one to simplify parsing.
    This can be cleaned up later if desired. But the reservation
    seems to need to be made by xen to make sure that it happens
    early enough.
    The tested values are kdump_megabytes=16, kdump_megabytes_base=32
    (kdump_megabytes_base=16 does not seem to work)
  * This patch uses a new kexec hypercall
  * SMP Kexec works, Kdump is next on the list

Highlights since the previous posted version:

  * Diff now applies to a xen checkout from hg
    (previously it assumed that the kernel was unpacked)
    - xen-unstable-hg 9660 / Linux 2.6.16.13
  * Added machine_shutdown, which disappeared in the previous release
    of this patch
  * Fixed include problems in kexec.h

Prepared by Horms and Magnus Damm

Signed-Off-By: Magnus Damm <magnus@valinux.co.jp>
Signed-Off-By: Horms <horms@verge.net.au>

 linux-2.6-xen-sparse/arch/i386/Kconfig                         |    2
 linux-2.6-xen-sparse/arch/i386/kernel/Makefile                 |    2
 linux-2.6-xen-sparse/arch/i386/kernel/setup-xen.c              |   24 +
 linux-2.6-xen-sparse/drivers/xen/core/Makefile                 |    1
 linux-2.6-xen-sparse/drivers/xen/core/crash.c                  |   98 ++++
 linux-2.6-xen-sparse/drivers/xen/core/machine_kexec.c          |   73 +++
 linux-2.6-xen-sparse/drivers/xen/core/reboot.c                 |    4
 linux-2.6-xen-sparse/include/asm-i386/mach-xen/asm/hypercall.h |   10
 linux-2.6.16.13/kexec.patch                                    |  175 ++++++++
 xen/arch/x86/Makefile                                          |    1
 xen/arch/x86/dom0_ops.c                                        |    3
 xen/arch/x86/machine_kexec.c                                   |   28 +
 xen/arch/x86/setup.c                                           |   75 +++
 xen/arch/x86/x86_32/Makefile                                   |    1
 xen/arch/x86/x86_32/entry.S                                    |    2
 xen/arch/x86/x86_32/machine_kexec.c                            |  205 ++++++++++
 xen/arch/x86/x86_64/Makefile                                   |    1
 xen/arch/x86/x86_64/machine_kexec.c                            |   25 +
 xen/common/Makefile                                            |    1
 xen/common/kexec.c                                             |   73 +++
 xen/common/page_alloc.c                                        |   33 +
 xen/include/asm-x86/hypercall.h                                |    6
 xen/include/public/kexec.h                                     |   45 ++
 xen/include/public/xen.h                                       |    9
 xen/include/xen/mm.h                                           |    1
 25 files changed, 876 insertions(+), 22 deletions(-)

--- x/linux-2.6-xen-sparse/arch/i386/Kconfig
+++ x/linux-2.6-xen-sparse/arch/i386/Kconfig
@@ -726,7 +726,7 @@ source kernel/Kconfig.hz
 
 config KEXEC
 	bool "kexec system call (EXPERIMENTAL)"
-	depends on EXPERIMENTAL && !X86_XEN
+	depends on EXPERIMENTAL
 	help
 	  kexec is a system call that implements the ability to
shutdown your current kernel, and to start another kernel. It is like a reboot --- x/linux-2.6-xen-sparse/arch/i386/kernel/Makefile +++ x/linux-2.6-xen-sparse/arch/i386/kernel/Makefile @@ -89,7 +89,7 @@ include $(srctree)/scripts/Makefile.xen obj-y += fixup.o microcode-$(subst m,y,$(CONFIG_MICROCODE)) := microcode-xen.o -n-obj-xen := i8259.o timers/ reboot.o smpboot.o trampoline.o +n-obj-xen := i8259.o timers/ reboot.o smpboot.o trampoline.o machine_kexec.o crash.o obj-y := $(call filterxen, $(obj-y), $(n-obj-xen)) obj-y := $(call cherrypickxen, $(obj-y)) --- x/linux-2.6-xen-sparse/arch/i386/kernel/setup-xen.c +++ x/linux-2.6-xen-sparse/arch/i386/kernel/setup-xen.c @@ -68,6 +68,10 @@ #include "setup_arch_pre.h" #include <bios_ebda.h> +#ifdef CONFIG_XEN +#include <xen/interface/kexec.h> +#endif + /* Forward Declaration. */ void __init find_max_pfn(void); @@ -932,6 +936,7 @@ static void __init parse_cmdline_early ( * after a kernel panic. */ else if (!memcmp(from, "crashkernel=", 12)) { +#ifndef CONFIG_XEN unsigned long size, base; size = memparse(from+12, &from); if (*from == ''@'') { @@ -942,6 +947,10 @@ static void __init parse_cmdline_early ( crashk_res.start = base; crashk_res.end = base + size - 1; } +#else + printk("Ignoring crashkernel command line, " + "parameter will be supplied by xen\n"); +#endif } #endif #ifdef CONFIG_PROC_VMCORE @@ -1318,9 +1327,21 @@ void __init setup_bootmem_allocator(void } #endif #ifdef CONFIG_KEXEC +#ifndef CONFIG_XEN if (crashk_res.start != crashk_res.end) reserve_bootmem(crashk_res.start, crashk_res.end - crashk_res.start + 1); +#else + { + struct kexec_arg xen_kexec_arg; + BUG_ON(HYPERVISOR_kexec(KEXEC_CMD_reserve, &xen_kexec_arg)); + if (xen_kexec_arg.u.reserve.size) { + crashk_res.start = xen_kexec_arg.u.reserve.start; + crashk_res.end = xen_kexec_arg.u.reserve.start + + xen_kexec_arg.u.reserve.size - 1; + } + } +#endif #endif if (!xen_feature(XENFEAT_auto_translated_physmap)) @@ -1395,6 +1416,9 @@ 
legacy_init_iomem_resources(struct resou res->end = map[i].end - 1; res->flags = IORESOURCE_MEM | IORESOURCE_BUSY; request_resource(&iomem_resource, res); +#ifdef CONFIG_KEXEC + request_resource(res, &crashk_res); +#endif } free_bootmem(__pa(map), PAGE_SIZE); --- x/linux-2.6-xen-sparse/drivers/xen/core/Makefile +++ x/linux-2.6-xen-sparse/drivers/xen/core/Makefile @@ -9,3 +9,4 @@ obj-$(CONFIG_NET) += skbuff.o obj-$(CONFIG_SMP) += smpboot.o obj-$(CONFIG_SYSFS) += hypervisor_sysfs.o obj-$(CONFIG_XEN_SYSFS) += xen_sysfs.o +obj-$(CONFIG_KEXEC) += machine_kexec.o crash.o --- /dev/null +++ x/linux-2.6-xen-sparse/drivers/xen/core/crash.c @@ -0,0 +1,98 @@ +/* + * Architecture specific (i386-xen) functions for kexec based crash dumps. + * + * Created by: Horms <horms@verge.net.au> + * + */ + +#include <linux/kernel.h> /* For printk */ + +/* XXX: final_note(), crash_save_this_cpu() and crash_save_self() + * are copied from arch/i386/kernel/crash.c, might be good to either + * the original functions non-static and use them, or just + * merge this this into that file. 
+ */
+#include <linux/elf.h> /* For struct elf_note */
+#include <linux/elfcore.h> /* For struct elf_prstatus */
+#include <linux/kexec.h> /* crash_notes */
+
+static u32 *append_elf_note(u32 *buf, char *name, unsigned type, void *data,
+			    size_t data_len)
+{
+	struct elf_note note;
+
+	note.n_namesz = strlen(name) + 1;
+	note.n_descsz = data_len;
+	note.n_type = type;
+	memcpy(buf, &note, sizeof(note));
+	buf += (sizeof(note) +3)/4;
+	memcpy(buf, name, note.n_namesz);
+	buf += (note.n_namesz + 3)/4;
+	memcpy(buf, data, note.n_descsz);
+	buf += (note.n_descsz + 3)/4;
+
+	return buf;
+}
+
+static void final_note(u32 *buf)
+{
+	struct elf_note note;
+
+	note.n_namesz = 0;
+	note.n_descsz = 0;
+	note.n_type = 0;
+	memcpy(buf, &note, sizeof(note));
+}
+
+static void crash_save_this_cpu(struct pt_regs *regs, int cpu)
+{
+	struct elf_prstatus prstatus;
+	u32 *buf;
+
+	if ((cpu < 0) || (cpu >= NR_CPUS))
+		return;
+
+	/* Using ELF notes here is opportunistic.
+	 * I need a well defined structure format
+	 * for the data I pass, and I need tags
+	 * on the data to indicate what information I have
+	 * squirrelled away. ELF notes happen to provide
+	 * all of that that no need to invent something new.
+ */ + buf = (u32*)per_cpu_ptr(crash_notes, cpu); + if (!buf) + return; + memset(&prstatus, 0, sizeof(prstatus)); + prstatus.pr_pid = current->pid; + elf_core_copy_regs(&prstatus.pr_reg, regs); + buf = append_elf_note(buf, "CORE", NT_PRSTATUS, &prstatus, + sizeof(prstatus)); + final_note(buf); +} + +static void crash_save_self(struct pt_regs *regs) +{ + int cpu; + + cpu = smp_processor_id(); + crash_save_this_cpu(regs, cpu); +} + + +void machine_crash_shutdown(struct pt_regs *regs) +{ + /* XXX: This should do something */ + printk("xen-kexec: Need to turn of other CPUS in " + "machine_crash_shutdown()\n"); + crash_save_self(regs); +} + +/* + * Local variables: + * c-file-style: "linux" + * indent-tabs-mode: t + * c-indent-level: 8 + * c-basic-offset: 8 + * tab-width: 8 + * End: + */ --- /dev/null +++ x/linux-2.6-xen-sparse/drivers/xen/core/machine_kexec.c @@ -0,0 +1,73 @@ +/* + * machine_kexec.c - handle transition of Linux booting another kernel + * + * Created By: Horms <horms@verge.net.au> + * + * Losely based on arch/i386/kernel/machine_kexec.c + */ + +#include <linux/kexec.h> +#include <xen/interface/kexec.h> +#include <linux/mm.h> +#include <asm/hypercall.h> + +const extern unsigned char relocate_new_kernel[]; +extern unsigned int relocate_new_kernel_size; + +/* + * A architecture hook called to validate the + * proposed image and prepare the control pages + * as needed. The pages for KEXEC_CONTROL_CODE_SIZE + * have been allocated, but the segments have yet + * been copied into the kernel. + * + * Do what every setup is needed on image and the + * reboot code buffer to allow us to avoid allocations + * later. + * + * Currently nothing. + */ +int machine_kexec_prepare(struct kimage *image) +{ + kexec_arg_t hypercall_arg; + hypercall_arg.u.helper.data = NULL; + return HYPERVISOR_kexec(KEXEC_CMD_kexec_prepare, &hypercall_arg); +} + +/* + * Undo anything leftover by machine_kexec_prepare + * when an image is freed. 
+ */ +void machine_kexec_cleanup(struct kimage *image) +{ + kexec_arg_t hypercall_arg; + hypercall_arg.u.helper.data = NULL; + HYPERVISOR_kexec(KEXEC_CMD_kexec_cleanup, &hypercall_arg); +} + +/* + * Do not allocate memory (or fail in any way) in machine_kexec(). + * We are past the point of no return, committed to rebooting now. + */ +NORET_TYPE void machine_kexec(struct kimage *image) +{ + kexec_arg_t hypercall_arg; + hypercall_arg.u.kexec.indirection_page = image->head; + hypercall_arg.u.kexec.reboot_code_buffer = + pfn_to_mfn(page_to_pfn(image->control_code_page)) << PAGE_SHIFT; + hypercall_arg.u.kexec.start_address = image->start; + hypercall_arg.u.kexec.relocate_new_kernel = relocate_new_kernel; + hypercall_arg.u.kexec.relocate_new_kernel_size = + relocate_new_kernel_size; + HYPERVISOR_kexec(KEXEC_CMD_kexec, &hypercall_arg); +} + +/* + * Local variables: + * c-file-style: "linux" + * indent-tabs-mode: t + * c-indent-level: 8 + * c-basic-offset: 8 + * tab-width: 8 + * End: + */ --- x/linux-2.6-xen-sparse/drivers/xen/core/reboot.c +++ x/linux-2.6-xen-sparse/drivers/xen/core/reboot.c @@ -66,6 +66,10 @@ void machine_power_off(void) HYPERVISOR_shutdown(SHUTDOWN_poweroff); } +#ifdef CONFIG_KEXEC +void machine_shutdown(void) { } +#endif + int reboot_thru_bios = 0; /* for dmi_scan.c */ EXPORT_SYMBOL(machine_restart); EXPORT_SYMBOL(machine_halt); --- x/linux-2.6-xen-sparse/include/asm-i386/mach-xen/asm/hypercall.h +++ x/linux-2.6-xen-sparse/include/asm-i386/mach-xen/asm/hypercall.h @@ -39,6 +39,8 @@ # error "please don''t include this file directly" #endif +#include <xen/interface/kexec.h> + #define __STR(x) #x #define STR(x) __STR(x) @@ -359,6 +361,14 @@ HYPERVISOR_xenoprof_op( return _hypercall2(int, xenoprof_op, op, arg); } +static inline int +HYPERVISOR_kexec( + unsigned long op, kexec_arg_t * arg) +{ + return _hypercall2(int, kexec_op, op, arg); +} + + #endif /* __HYPERCALL_H__ */ --- x/xen/arch/x86/Makefile +++ x/xen/arch/x86/Makefile @@ -39,6 +39,7 @@ obj-y += 
trampoline.o obj-y += traps.o obj-y += usercopy.o obj-y += x86_emulate.o +obj-y += machine_kexec.o ifneq ($(pae),n) obj-$(x86_32) += shadow.o shadow_public.o shadow_guest32.o --- x/xen/arch/x86/dom0_ops.c +++ x/xen/arch/x86/dom0_ops.c @@ -29,6 +29,9 @@ #include <asm/mtrr.h> #include "cpu/mtrr/mtrr.h" +extern unsigned int opt_kdump_megabytes; +extern unsigned int opt_kdump_megabytes_base; + #define TRC_DOM0OP_ENTER_BASE 0x00020000 #define TRC_DOM0OP_LEAVE_BASE 0x00030000 --- /dev/null +++ x/xen/arch/x86/machine_kexec.c @@ -0,0 +1,28 @@ +/****************************************************************************** + * arch/x86/machine_kexec.c + * + * Created By: Horms + * + */ + +#include <xen/types.h> +#include <public/kexec.h> + +int machine_kexec_prepare(struct kexec_arg *arg) +{ + return 0; +} + +void machine_kexec_cleanup(struct kexec_arg *arg) +{ +} + +/* + * Local variables: + * mode: C + * c-set-style: "BSD" + * c-basic-offset: 4 + * tab-width: 4 + * indent-tabs-mode: nil + * End: + */ --- x/xen/arch/x86/setup.c +++ x/xen/arch/x86/setup.c @@ -38,6 +38,11 @@ static unsigned int opt_xenheap_megabyte integer_param("xenheap_megabytes", opt_xenheap_megabytes); #endif +unsigned int opt_kdump_megabytes = 0; +integer_param("kdump_megabytes", opt_kdump_megabytes); +unsigned int opt_kdump_megabytes_base = 0; +integer_param("kdump_megabytes_base", opt_kdump_megabytes_base); + /* opt_nosmp: If true, secondary processors are ignored. 
*/ static int opt_nosmp = 0; boolean_param("nosmp", opt_nosmp); @@ -192,6 +197,20 @@ static void percpu_free_unused_areas(voi __pa(__per_cpu_end)); } +void __init move_memory(unsigned long dst, + unsigned long src_start, unsigned long src_end) +{ +#if defined(CONFIG_X86_32) + memmove((void *)dst, /* use low mapping */ + (void *)src_start, /* use low mapping */ + src_end - src_start); +#elif defined(CONFIG_X86_64) + memmove(__va(dst), + __va(src_start), + src_end - src_start); +#endif +} + void __init __start_xen(multiboot_info_t *mbi) { char __cmdline[] = "", *cmdline = __cmdline; @@ -327,15 +346,8 @@ void __init __start_xen(multiboot_info_t initial_images_start = xenheap_phys_end; initial_images_end = initial_images_start + modules_length; -#if defined(CONFIG_X86_32) - memmove((void *)initial_images_start, /* use low mapping */ - (void *)mod[0].mod_start, /* use low mapping */ - mod[mbi->mods_count-1].mod_end - mod[0].mod_start); -#elif defined(CONFIG_X86_64) - memmove(__va(initial_images_start), - __va(mod[0].mod_start), - mod[mbi->mods_count-1].mod_end - mod[0].mod_start); -#endif + move_memory(initial_images_start, + mod[0].mod_start, mod[mbi->mods_count-1].mod_end); /* Initialise boot-time allocator with all RAM situated after modules. 
*/ xenheap_phys_start = init_boot_allocator(__pa(&_end)); @@ -383,6 +395,51 @@ void __init __start_xen(multiboot_info_t #endif } + if (opt_kdump_megabytes) { + unsigned long kdump_start, kdump_size, k; + + /* mark images pages as free for now */ + + init_boot_pages(initial_images_start, initial_images_end); + + kdump_start = opt_kdump_megabytes_base << 20; + kdump_size = opt_kdump_megabytes << 20; + + printk("Kdump: %luMB (%lukB) at 0x%lx\n", + kdump_size >> 20, + kdump_size >> 10, + kdump_start); + + if ((kdump_start & ~PAGE_MASK) || (kdump_size & ~PAGE_MASK)) + panic("Kdump parameters not page aligned\n"); + + kdump_start >>= PAGE_SHIFT; + kdump_size >>= PAGE_SHIFT; + + /* allocate pages for Kdump memory area */ + + k = alloc_boot_pages_at(kdump_size, kdump_start); + + if (k != kdump_start) + panic("Unable to reserve Kdump memory\n"); + + /* allocate pages for relocated initial images */ + + k = ((initial_images_end - initial_images_start) & ~PAGE_MASK) ? 1 : 0; + k += (initial_images_end - initial_images_start) >> PAGE_SHIFT; + + k = alloc_boot_pages(k, 1); + + if (!k) + panic("Unable to allocate initial images memory\n"); + + move_memory(k << PAGE_SHIFT, initial_images_start, initial_images_end); + + initial_images_end -= initial_images_start; + initial_images_start = k << PAGE_SHIFT; + initial_images_end += initial_images_start; + } + memguard_init(); printk("System RAM: %luMB (%lukB)\n", --- x/xen/arch/x86/x86_32/Makefile +++ x/xen/arch/x86/x86_32/Makefile @@ -3,5 +3,6 @@ obj-y += entry.o obj-y += mm.o obj-y += seg_fixup.o obj-y += traps.o +obj-y += machine_kexec.o obj-$(supervisor_mode_kernel) += supervisor_mode_kernel.o --- x/xen/arch/x86/x86_32/entry.S +++ x/xen/arch/x86/x86_32/entry.S @@ -648,6 +648,7 @@ ENTRY(hypercall_table) .long do_xenoprof_op .long do_event_channel_op .long do_physdev_op + .long do_kexec .rept NR_hypercalls-((.-hypercall_table)/4) .long do_ni_hypercall .endr @@ -687,6 +688,7 @@ ENTRY(hypercall_args_table) .byte 2 /* do_xenoprof_op */ 
.byte 2 /* do_event_channel_op */ .byte 2 /* do_physdev_op */ + .byte 2 /* do_kexec */ .rept NR_hypercalls-(.-hypercall_args_table) .byte 0 /* do_ni_hypercall */ .endr --- /dev/null +++ x/xen/arch/x86/x86_32/machine_kexec.c @@ -0,0 +1,205 @@ +/****************************************************************************** + * arch/x86/x86_32/machine_kexec.c + * + * Created By: Horms + * + * Based heavily on arch/i386/machine_kexec.c from Linux 2.6.16 + */ + +#include <xen/config.h> +#include <xen/types.h> +#include <xen/domain_page.h> +#include <xen/timer.h> +#include <xen/sched.h> +#include <xen/reboot.h> +#include <xen/console.h> +#include <asm/page.h> +#include <asm/flushtlb.h> +#include <public/kexec.h> + +static void __machine_kexec(struct kexec_arg *arg); + +typedef asmlinkage void (*relocate_new_kernel_t)( + unsigned long indirection_page, + unsigned long reboot_code_buffer, + unsigned long start_address, + unsigned int has_pae); + +#define PAGE_ALIGNED __attribute__ ((__aligned__(PAGE_SIZE))) + +#define L0_ATTR (_PAGE_PRESENT | _PAGE_RW | _PAGE_ACCESSED | _PAGE_DIRTY) +#define L1_ATTR (_PAGE_PRESENT | _PAGE_RW | _PAGE_ACCESSED | _PAGE_DIRTY) +#define L2_ATTR (_PAGE_PRESENT) + +#ifndef CONFIG_X86_PAE + +static u32 pgtable_level1[L1_PAGETABLE_ENTRIES] PAGE_ALIGNED; + +static void identity_map_page(unsigned long address) +{ + unsigned long mfn; + u32 *pgtable_level2; + + /* Find the current page table */ + mfn = read_cr3() >> PAGE_SHIFT; + pgtable_level2 = map_domain_page(mfn); + + /* Identity map the page table entry */ + pgtable_level1[l1_table_offset(address)] = address | L0_ATTR; + pgtable_level2[l2_table_offset(address)] = __pa(pgtable_level1) | L1_ATTR; + + /* Flush the tlb so the new mapping takes effect. + * Global tlb entries are not flushed but that is not an issue. 
+ */ + write_cr3(mfn << PAGE_SHIFT); + + unmap_domain_page(pgtable_level2); +} + +#else +static u64 pgtable_level1[L1_PAGETABLE_ENTRIES] PAGE_ALIGNED; +static u64 pgtable_level2[L2_PAGETABLE_ENTRIES] PAGE_ALIGNED; + +static void identity_map_page(unsigned long address) +{ + int mfn; + intpte_t *pgtable_level3; + + /* Find the current page table */ + mfn = read_cr3() >> PAGE_SHIFT; + pgtable_level3 = map_domain_page(mfn); + + /* Identity map the page table entry */ + pgtable_level1[l1_table_offset(address)] = address | L0_ATTR; + pgtable_level2[l2_table_offset(address)] = __pa(pgtable_level1) | L1_ATTR; + set_64bit(&pgtable_level3[l3_table_offset(address)], + __pa(pgtable_level2) | L2_ATTR); + + /* Flush the tlb so the new mapping takes effect. + * Global tlb entries are not flushed but that is not an issue. + */ + load_cr3(mfn << PAGE_SHIFT); + + unmap_domain_page(pgtable_level3); +} +#endif + +static void kexec_load_segments(void) +{ +#define __SSTR(X) #X +#define SSTR(X) __SSTR(X) + __asm__ __volatile__ ( + "\tljmp $"SSTR(__HYPERVISOR_CS)",$1f\n" + "\t1:\n" + "\tmovl $"SSTR(__HYPERVISOR_DS)",%%eax\n" + "\tmovl %%eax,%%ds\n" + "\tmovl %%eax,%%es\n" + "\tmovl %%eax,%%fs\n" + "\tmovl %%eax,%%gs\n" + "\tmovl %%eax,%%ss\n" + ::: "eax", "memory"); +#undef SSTR +#undef __SSTR +} + +#define kexec_load_idt(dtr) __asm__ __volatile("lidt %0"::"m" (*dtr)) +static void kexec_set_idt(void *newidt, __u16 limit) +{ + struct Xgt_desc_struct curidt; + + /* ia32 supports unaliged loads & stores */ + curidt.size = limit; + curidt.address = (unsigned long)newidt; + + kexec_load_idt(&curidt); + +}; + +#define kexec_load_gdt(dtr) __asm__ __volatile("lgdt %0"::"m" (*dtr)) +static void kexec_set_gdt(void *newgdt, __u16 limit) +{ + struct Xgt_desc_struct curgdt; + + /* ia32 supports unaligned loads & stores */ + curgdt.size = limit; + curgdt.address = (unsigned long)newgdt; + + kexec_load_gdt(&curgdt); +}; + +static void __machine_shutdown(void *data) +{ + struct kexec_arg *arg = (struct 
kexec_arg *)data; + + printk("__machine_shutdown: cpu=%u\n", smp_processor_id()); + + watchdog_disable(); + console_start_sync(); + + smp_send_stop(); + +#ifdef CONFIG_X86_IO_APIC + disable_IO_APIC(); +#endif + + __machine_kexec(arg); +} + +void machine_shutdown(struct kexec_arg *arg) +{ + int reboot_cpu_id; + cpumask_t reboot_cpu; + + + reboot_cpu_id = 0; + + if (!cpu_isset(reboot_cpu_id, cpu_online_map)) + reboot_cpu_id = smp_processor_id(); + + if (reboot_cpu_id != smp_processor_id()) { + cpus_clear(reboot_cpu); + cpu_set(reboot_cpu_id, reboot_cpu); + on_selected_cpus(reboot_cpu, __machine_shutdown, arg, 1, 0); + for (;;) + ; /* nothing */ + } + else + __machine_shutdown(arg); + BUG(); +} + +static void __machine_kexec(struct kexec_arg *arg) +{ + relocate_new_kernel_t rnk; + + local_irq_disable(); + + identity_map_page(arg->u.kexec.reboot_code_buffer); + + copy_from_user((void *)arg->u.kexec.reboot_code_buffer, + arg->u.kexec.relocate_new_kernel, + arg->u.kexec.relocate_new_kernel_size); + + kexec_load_segments(); + kexec_set_gdt(__va(0),0); + kexec_set_idt(__va(0),0); + + rnk = (relocate_new_kernel_t) arg->u.kexec.reboot_code_buffer; + (*rnk)(arg->u.kexec.indirection_page, arg->u.kexec.reboot_code_buffer, + arg->u.kexec.start_address, cpu_has_pae); +} + +void machine_kexec(struct kexec_arg *arg) +{ + machine_shutdown(arg); +} + +/* + * Local variables: + * mode: C + * c-set-style: "BSD" + * c-basic-offset: 4 + * tab-width: 4 + * indent-tabs-mode: nil + * End: + */ --- x/xen/arch/x86/x86_64/Makefile +++ x/xen/arch/x86/x86_64/Makefile @@ -1,3 +1,4 @@ obj-y += entry.o obj-y += mm.o obj-y += traps.o +obj-y += machine_kexec.o --- /dev/null +++ x/xen/arch/x86/x86_64/machine_kexec.c @@ -0,0 +1,25 @@ +/****************************************************************************** + * arch/x86/x86_64/machine_kexec.c + * + * Created By: Horms + * + * Based heavily on arch/i386/machine_kexec.c from Linux 2.6.16 + */ + +#include <xen/types.h> +#include <public/kexec.h> + 
+void machine_kexec(struct kexec_arg *arg) +{ + printk("machine_kexec: not implemented\n"); +} + +/* + * Local variables: + * mode: C + * c-set-style: "BSD" + * c-basic-offset: 4 + * tab-width: 4 + * indent-tabs-mode: nil + * End: + */ --- x/xen/common/Makefile +++ x/xen/common/Makefile @@ -7,6 +7,7 @@ obj-y += event_channel.o obj-y += grant_table.o obj-y += kernel.o obj-y += keyhandler.o +obj-y += kexec.o obj-y += lib.o obj-y += memory.o obj-y += multicall.o --- /dev/null +++ x/xen/common/kexec.c @@ -0,0 +1,73 @@ +/* + * Achitecture independent kexec code for Xen + * + * At this statge, just a switch for the kexec hypercall into + * architecture dependent code. + * + * Created By: Horms <horms@verge.net.au> + */ + +#include <xen/lib.h> +#include <xen/errno.h> +#include <xen/guest_access.h> +#include <xen/sched.h> +#include <xen/types.h> +#include <public/kexec.h> + +extern int machine_kexec_prepare(struct kexec_arg *arg); +extern void machine_kexec_cleanup(struct kexec_arg *arg); +extern void machine_kexec(struct kexec_arg *arg); + +extern unsigned int opt_kdump_megabytes; +extern unsigned int opt_kdump_megabytes_base; + +int do_kexec(unsigned long op, + XEN_GUEST_HANDLE(kexec_arg_t) uarg) +{ + struct kexec_arg arg; + + if ( !IS_PRIV(current->domain) ) + return -EPERM; + + if (op == KEXEC_CMD_reserve) + { + arg.u.reserve.size = opt_kdump_megabytes << 20; + arg.u.reserve.start = opt_kdump_megabytes_base << 20; + if ( unlikely(copy_to_guest(uarg, &arg, 1) != 0) ) + { + printk("do_kexec: copy_to_guest failed"); + return -EFAULT; + } + return 0; + } + + if ( unlikely(copy_from_guest(&arg, uarg, 1) != 0) ) + { + printk("do_kexec: __copy_from_guest failed"); + return -EFAULT; + } + + switch(op) { + case KEXEC_CMD_kexec: + machine_kexec(&arg); + return -EINVAL; /* Not Reached */ + case KEXEC_CMD_kexec_prepare: + return machine_kexec_prepare(&arg); + case KEXEC_CMD_kexec_cleanup: + machine_kexec_cleanup(&arg); + return 0; + } + + return -EINVAL; +} + +/* + * Local 
variables: + * mode: C + * c-set-style: "BSD" + * c-basic-offset: 4 + * tab-width: 4 + * indent-tabs-mode: nil + * End: + */ + --- x/xen/common/page_alloc.c +++ x/xen/common/page_alloc.c @@ -212,24 +212,35 @@ void init_boot_pages(paddr_t ps, paddr_t } } +unsigned long alloc_boot_pages_at(unsigned long nr_pfns, unsigned long pfn_at) +{ + unsigned long i; + + for ( i = 0; i < nr_pfns; i++ ) + if ( allocated_in_map(pfn_at + i) ) + break; + + if ( i == nr_pfns ) + { + map_alloc(pfn_at, nr_pfns); + return pfn_at; + } + + return 0; +} + unsigned long alloc_boot_pages(unsigned long nr_pfns, unsigned long pfn_align) { - unsigned long pg, i; + unsigned long pg, i = 0; for ( pg = 0; (pg + nr_pfns) < max_page; pg += pfn_align ) { - for ( i = 0; i < nr_pfns; i++ ) - if ( allocated_in_map(pg + i) ) - break; - - if ( i == nr_pfns ) - { - map_alloc(pg, nr_pfns); - return pg; - } + i = alloc_boot_pages_at(nr_pfns, pg); + if (i != 0) + break; } - return 0; + return i; } --- x/xen/include/asm-x86/hypercall.h +++ x/xen/include/asm-x86/hypercall.h @@ -6,6 +6,8 @@ #define __ASM_X86_HYPERCALL_H__ #include <public/physdev.h> +#include <xen/types.h> +#include <public/kexec.h> extern long do_event_channel_op_compat( @@ -87,6 +89,10 @@ extern long arch_do_vcpu_op( int cmd, struct vcpu *v, XEN_GUEST_HANDLE(void) arg); +extern int +do_kexec( + unsigned long op, XEN_GUEST_HANDLE(kexec_arg_t) uarg); + #ifdef __x86_64__ extern long --- /dev/null +++ x/xen/include/public/kexec.h @@ -0,0 +1,45 @@ +/* + * kexec.h: Xen kexec public + * + * Created By: Horms <horms@verge.net.au> + */ + +#ifndef _XEN_PUBLIC_KEXEC_H +#define _XEN_PUBLIC_KEXEC_H + +#include "xen.h" + +/* + * Scratch space for passing arguments to the kexec hypercall + */ +typedef struct kexec_arg { + union { + struct { + unsigned long data; /* Not sure what this should be yet */ + } helper; + struct { + unsigned long indirection_page; + unsigned long reboot_code_buffer; + unsigned long start_address; + const char *relocate_new_kernel; + 
unsigned int relocate_new_kernel_size; + } kexec; + struct { + unsigned long size; + unsigned long start; + } reserve; + } u; +} kexec_arg_t; +DEFINE_XEN_GUEST_HANDLE(kexec_arg_t); + +#endif + +/* + * Local variables: + * mode: C + * c-set-style: "BSD" + * c-basic-offset: 4 + * tab-width: 4 + * indent-tabs-mode: nil + * End: + */ --- x/xen/include/public/xen.h +++ x/xen/include/public/xen.h @@ -64,6 +64,7 @@ #define __HYPERVISOR_xenoprof_op 31 #define __HYPERVISOR_event_channel_op 32 #define __HYPERVISOR_physdev_op 33 +#define __HYPERVISOR_kexec_op 34 /* Architecture-specific hypercall definitions. */ #define __HYPERVISOR_arch_0 48 @@ -238,6 +239,14 @@ DEFINE_XEN_GUEST_HANDLE(mmuext_op_t); #define VMASST_TYPE_writable_pagetables 2 #define MAX_VMASST_TYPE 2 +/* + * Operations for kexec. + */ +#define KEXEC_CMD_kexec 0 +#define KEXEC_CMD_kexec_prepare 1 +#define KEXEC_CMD_kexec_cleanup 2 +#define KEXEC_CMD_reserve 3 + #ifndef __ASSEMBLY__ typedef uint16_t domid_t; --- x/xen/include/xen/mm.h +++ x/xen/include/xen/mm.h @@ -40,6 +40,7 @@ struct page_info; paddr_t init_boot_allocator(paddr_t bitmap_start); void init_boot_pages(paddr_t ps, paddr_t pe); unsigned long alloc_boot_pages(unsigned long nr_pfns, unsigned long pfn_align); +unsigned long alloc_boot_pages_at(unsigned long nr_pfns, unsigned long pfn_at); void end_boot_allocator(void); /* Generic allocator. These functions are *not* interrupt-safe. */ --- /dev/null 2006-05-08 18:31:14.283785672 +0900 +++ x/linux-2.6.16.13/kexec.patch 2006-05-09 10:19:10.000000000 +0900 @@ -0,0 +1,175 @@ +--- x/ref-linux-2.6.16.13/drivers/base/cpu.c ++++ x/ref-linux-2.6.16.13/drivers/base/cpu.c +@@ -101,7 +101,11 @@ static ssize_t show_crash_notes(struct s + * boot up and this data does not change there after. Hence this + * operation should be safe. No locking required. 
+ */ ++#ifndef CONFIG_XEN + addr = __pa(per_cpu_ptr(crash_notes, cpunum)); ++#else ++ addr = virt_to_machine(per_cpu_ptr(crash_notes, cpunum)); ++#endif + rc = sprintf(buf, "%Lx\n", addr); + return rc; + } +--- x/ref-linux-2.6.16.13/kernel/kexec.c ++++ x/ref-linux-2.6.16.13/kernel/kexec.c +@@ -38,6 +38,20 @@ struct resource crashk_res = { + .flags = IORESOURCE_BUSY | IORESOURCE_MEM + }; + ++/* Kexec needs to know about the actually physical addresss. ++ * But in xen, a physical address is a pseudo-physical addresss. */ ++#ifndef CONFIG_XEN ++#define kexec_page_to_pfn(page) page_to_pfn(page) ++#define kexec_pfn_to_page(pfn) pfn_to_page(pfn) ++#define kexec_virt_to_phys(addr) virt_to_phys(addr) ++#define kexec_phys_to_virt(addr) phys_to_virt(addr) ++#else ++#define kexec_page_to_pfn(page) pfn_to_mfn(page_to_pfn(page)) ++#define kexec_pfn_to_page(pfn) pfn_to_page(mfn_to_pfn(pfn)) ++#define kexec_virt_to_phys(addr) virt_to_machine(addr) ++#define kexec_phys_to_virt(addr) phys_to_virt(machine_to_phys(addr)) ++#endif ++ + int kexec_should_crash(struct task_struct *p) + { + if (in_interrupt() || !p->pid || p->pid == 1 || panic_on_oops) +@@ -403,7 +417,7 @@ static struct page *kimage_alloc_normal_ + pages = kimage_alloc_pages(GFP_KERNEL, order); + if (!pages) + break; +- pfn = page_to_pfn(pages); ++ pfn = kexec_page_to_pfn(pages); + epfn = pfn + count; + addr = pfn << PAGE_SHIFT; + eaddr = epfn << PAGE_SHIFT; +@@ -437,6 +451,7 @@ static struct page *kimage_alloc_normal_ + return pages; + } + ++#ifndef CONFIG_XEN + static struct page *kimage_alloc_crash_control_pages(struct kimage *image, + unsigned int order) + { +@@ -490,7 +505,7 @@ static struct page *kimage_alloc_crash_c + } + /* If I don''t overlap any segments I have found my hole! 
*/ + if (i == image->nr_segments) { +- pages = pfn_to_page(hole_start >> PAGE_SHIFT); ++ pages = kexec_pfn_to_page(hole_start >> PAGE_SHIFT); + break; + } + } +@@ -517,6 +532,13 @@ struct page *kimage_alloc_control_pages( + + return pages; + } ++#else /* !CONFIG_XEN */ ++struct page *kimage_alloc_control_pages(struct kimage *image, ++ unsigned int order) ++{ ++ return kimage_alloc_normal_control_pages(image, order); ++} ++#endif + + static int kimage_add_entry(struct kimage *image, kimage_entry_t entry) + { +@@ -532,7 +554,7 @@ static int kimage_add_entry(struct kimag + return -ENOMEM; + + ind_page = page_address(page); +- *image->entry = virt_to_phys(ind_page) | IND_INDIRECTION; ++ *image->entry = kexec_virt_to_phys(ind_page) | IND_INDIRECTION; + image->entry = ind_page; + image->last_entry = ind_page + + ((PAGE_SIZE/sizeof(kimage_entry_t)) - 1); +@@ -593,13 +615,13 @@ static int kimage_terminate(struct kimag + #define for_each_kimage_entry(image, ptr, entry) \ + for (ptr = &image->head; (entry = *ptr) && !(entry & IND_DONE); \ + ptr = (entry & IND_INDIRECTION)? \ +- phys_to_virt((entry & PAGE_MASK)): ptr +1) ++ kexec_phys_to_virt((entry & PAGE_MASK)): ptr +1) + + static void kimage_free_entry(kimage_entry_t entry) + { + struct page *page; + +- page = pfn_to_page(entry >> PAGE_SHIFT); ++ page = kexec_pfn_to_page(entry >> PAGE_SHIFT); + kimage_free_pages(page); + } + +@@ -686,7 +708,7 @@ static struct page *kimage_alloc_page(st + * have a match. 
+ */ + list_for_each_entry(page, &image->dest_pages, lru) { +- addr = page_to_pfn(page) << PAGE_SHIFT; ++ addr = kexec_page_to_pfn(page) << PAGE_SHIFT; + if (addr == destination) { + list_del(&page->lru); + return page; +@@ -701,12 +723,12 @@ static struct page *kimage_alloc_page(st + if (!page) + return NULL; + /* If the page cannot be used file it away */ +- if (page_to_pfn(page) > ++ if (kexec_page_to_pfn(page) > + (KEXEC_SOURCE_MEMORY_LIMIT >> PAGE_SHIFT)) { + list_add(&page->lru, &image->unuseable_pages); + continue; + } +- addr = page_to_pfn(page) << PAGE_SHIFT; ++ addr = kexec_page_to_pfn(page) << PAGE_SHIFT; + + /* If it is the destination page we want use it */ + if (addr == destination) +@@ -729,7 +751,7 @@ static struct page *kimage_alloc_page(st + struct page *old_page; + + old_addr = *old & PAGE_MASK; +- old_page = pfn_to_page(old_addr >> PAGE_SHIFT); ++ old_page = kexec_pfn_to_page(old_addr >> PAGE_SHIFT); + copy_highpage(page, old_page); + *old = addr | (*old & ~PAGE_MASK); + +@@ -779,7 +801,7 @@ static int kimage_load_normal_segment(st + result = -ENOMEM; + goto out; + } +- result = kimage_add_page(image, page_to_pfn(page) ++ result = kimage_add_page(image, kexec_page_to_pfn(page) + << PAGE_SHIFT); + if (result < 0) + goto out; +@@ -811,6 +833,7 @@ out: + return result; + } + ++#ifndef CONFIG_XEN + static int kimage_load_crash_segment(struct kimage *image, + struct kexec_segment *segment) + { +@@ -833,7 +856,7 @@ static int kimage_load_crash_segment(str + char *ptr; + size_t uchunk, mchunk; + +- page = pfn_to_page(maddr >> PAGE_SHIFT); ++ page = kexec_pfn_to_page(maddr >> PAGE_SHIFT); + if (page == 0) { + result = -ENOMEM; + goto out; +@@ -881,6 +904,13 @@ static int kimage_load_segment(struct ki + + return result; + } ++#else /* CONFIG_XEN */ ++static int kimage_load_segment(struct kimage *image, ++ struct kexec_segment *segment) ++{ ++ return kimage_load_normal_segment(image, segment); ++} ++#endif + + /* + * Exec Kernel system call: for obvious 
reasons only root may call it.
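For anyone reading the page_alloc.c hunk above without a Xen tree at hand, the split of alloc_boot_pages() into a fixed-address helper plus an aligned search can be modelled in user space roughly as follows. This is only a sketch under stated assumptions: the byte-per-page alloc_map, MAX_PAGE and boot_map_reset() are stand-ins invented for the illustration and are not Xen code, and pfn 0 is reserved up front (an assumption made here) so that a return value of 0 can only mean failure.

```c
#include <assert.h>
#include <string.h>

#define MAX_PAGE 64UL               /* hypothetical machine size, in pages */

/* Toy allocation map, one byte per page: 1 = allocated, 0 = free.
 * Xen keeps a real bitmap; this array is an assumption of the sketch. */
static unsigned char alloc_map[MAX_PAGE];

/* Hypothetical helper: free every page, then reserve pfn 0 so that a
 * return value of 0 is never a successful allocation. */
void boot_map_reset(void)
{
    memset(alloc_map, 0, sizeof(alloc_map));
    alloc_map[0] = 1;
}

static int allocated_in_map(unsigned long pfn)
{
    return alloc_map[pfn];
}

static void map_alloc(unsigned long first_pfn, unsigned long nr_pfns)
{
    unsigned long i;

    for (i = 0; i < nr_pfns; i++)
        alloc_map[first_pfn + i] = 1;
}

/* Claim exactly [pfn_at, pfn_at + nr_pfns). Returns pfn_at, or 0 if any
 * page in the range is already allocated or the range does not fit. */
unsigned long alloc_boot_pages_at(unsigned long nr_pfns, unsigned long pfn_at)
{
    unsigned long i;

    assert(nr_pfns > 0);
    if (pfn_at >= MAX_PAGE || nr_pfns > MAX_PAGE - pfn_at)
        return 0;

    for (i = 0; i < nr_pfns; i++)
        if (allocated_in_map(pfn_at + i))
            return 0;

    map_alloc(pfn_at, nr_pfns);
    return pfn_at;
}

/* Walk candidate start pages in steps of pfn_align and take the first
 * range that the fixed-address helper manages to claim. */
unsigned long alloc_boot_pages(unsigned long nr_pfns, unsigned long pfn_align)
{
    unsigned long pg, got;

    assert(pfn_align > 0);
    for (pg = 0; pg + nr_pfns < MAX_PAGE; pg += pfn_align) {
        got = alloc_boot_pages_at(nr_pfns, pg);
        if (got != 0)
            return got;
    }
    return 0;
}
```

A caller that needs a region at a known address uses the fixed-address variant and compares the result with the requested start page; anything else keeps calling alloc_boot_pages() as before. Reserving pfn 0 in the model sidesteps the case where the helper claims page 0, returns 0, and the search loop carries on as if nothing had been allocated.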
On Tue, May 09, 2006 at 01:16:32PM +0900, Horms wrote:
> On Sun, May 07, 2006 at 01:45:22PM +0900, Horms wrote:
> > On Sat, May 06, 2006 at 05:44:44PM +0900, Akio Takebe wrote:
> >
> > > I think you can make a patch in patches/linux-2.6.16/ if you would
> > > modify these.
> >
> > Yes, that is probably the best way forward, I'll work on breaking it
> > out in that manner.
>
> Hi Takebe-san,
>
> here is an updated version of the patch which moves portions into
> patches/linux-2.6.16/ as you suggested. It also moves to
> xen-unstable 9969 / Linux 2.6.16.13 and has some minor build fixes,
> for problems that crept into the previous patch.

Sorry, this morning's patch had the internal patch in the wrong
location and with the wrong diff level.

-- 
Horms                                           http://www.vergenet.net/~horms/

kexec: framework and i386

This is an implementation of kexec for dom0/xen, that allows
kexecing of the physical machine from xen. The approach taken is
to move the architecture-dependent kexec code into a new hypercall.

Some notes:
  * machine_kexec_cleanup() and machine_kexec_prepare() don't do
    anything in i386. So while this patch adds a framework for them,
    I am not sure what parameters are needed at this stage.
  * Only works for UP, as machine_shutdown is not implemented yet
  * kexecing into xen does not seem to work, I think that kexec-tools
    needs updating, but I have not investigated yet
  * Kdump works by first copying the kernel into dom0 segments and
    relocating them later in xen, the same way that kexec does.
    The only difference is that the relocation is made into
    an area reserved by xen
  * Kdump reservation is made using the xen command line parameters,
    kdump_megabytes and kdump_megabytes_base, rather than the linux
    option crashkernel, which is now ignored. Two parameters are used
    instead of one to simplify parsing. This can be cleaned up later
    if desired. But the reservation seems to need to be made by xen to
    make sure that it happens early enough.
    The tested values are kdump_megabytes=16, kdump_megabytes_base=32
    (kdump_megabytes_base=16 does not seem to work)
  * This patch uses a new kexec hypercall
  * SMP Kexec works, Kdump is next on the list

Highlights since the previous posted version:

  * Diff now applies to a xen checkout from hg
    (previously it assumed that the kernel was unpacked)
    - xen-unstable-hg 9660 / Linux 2.6.16.13
  * Added machine_shutdown, which disappeared in the previous release
    of this patch
  * Fixed include problems in kexec.h

Prepared by Horms and Magnus Damm

Signed-Off-By: Magnus Damm <magnus@valinux.co.jp>
Signed-Off-By: Horms <horms@verge.net.au>

 buildconfigs/linux-defconfig_xen_x86_32                        |   1
 linux-2.6-xen-sparse/arch/i386/Kconfig                         |   2
 linux-2.6-xen-sparse/arch/i386/kernel/Makefile                 |   2
 linux-2.6-xen-sparse/arch/i386/kernel/setup-xen.c              |  24 +
 linux-2.6-xen-sparse/drivers/xen/core/Makefile                 |   1
 linux-2.6-xen-sparse/drivers/xen/core/crash.c                  |  98 ++++
 linux-2.6-xen-sparse/drivers/xen/core/machine_kexec.c          |  73 +++
 linux-2.6-xen-sparse/drivers/xen/core/reboot.c                 |   4
 linux-2.6-xen-sparse/include/asm-i386/mach-xen/asm/hypercall.h |  10
 patches/linux-2.6.16.13/kexec.patch                            | 175 ++++++++
 xen/arch/x86/Makefile                                          |   1
 xen/arch/x86/dom0_ops.c                                        |   3
 xen/arch/x86/machine_kexec.c                                   |  28 +
 xen/arch/x86/setup.c                                           |  75 +++
 xen/arch/x86/x86_32/Makefile                                   |   1
 xen/arch/x86/x86_32/entry.S                                    |   2
 xen/arch/x86/x86_32/machine_kexec.c                            | 205 ++++++++++
 xen/arch/x86/x86_64/Makefile                                   |   1
 xen/arch/x86/x86_64/machine_kexec.c                            |  25 +
 xen/common/Makefile                                            |   1
 xen/common/kexec.c                                             |  73 +++
 xen/common/page_alloc.c                                        |  33 +
 xen/include/asm-x86/hypercall.h                                |   6
 xen/include/public/kexec.h                                     |  45 ++
 xen/include/public/xen.h                                       |   9
 xen/include/xen/mm.h                                           |   1
 26 files changed, 877 insertions(+), 22 deletions(-)

--- x/buildconfigs/linux-defconfig_xen_x86_32
+++ x/buildconfigs/linux-defconfig_xen_x86_32
@@ -184,6 +184,7 @@ CONFIG_MTRR=y
 CONFIG_REGPARM=y
 CONFIG_SECCOMP=y
 CONFIG_HZ_100=y
+CONFIG_KEXEC=y
 # CONFIG_HZ_250 is not set
 # CONFIG_HZ_1000 is not set
 CONFIG_HZ=100
--- x/linux-2.6-xen-sparse/arch/i386/Kconfig +++ x/linux-2.6-xen-sparse/arch/i386/Kconfig @@ -726,7 +726,7 @@ source kernel/Kconfig.hz config KEXEC bool "kexec system call (EXPERIMENTAL)" - depends on EXPERIMENTAL && !X86_XEN + depends on EXPERIMENTAL help kexec is a system call that implements the ability to shutdown your current kernel, and to start another kernel. It is like a reboot --- x/linux-2.6-xen-sparse/arch/i386/kernel/Makefile +++ x/linux-2.6-xen-sparse/arch/i386/kernel/Makefile @@ -89,7 +89,7 @@ include $(srctree)/scripts/Makefile.xen obj-y += fixup.o microcode-$(subst m,y,$(CONFIG_MICROCODE)) := microcode-xen.o -n-obj-xen := i8259.o timers/ reboot.o smpboot.o trampoline.o +n-obj-xen := i8259.o timers/ reboot.o smpboot.o trampoline.o machine_kexec.o crash.o obj-y := $(call filterxen, $(obj-y), $(n-obj-xen)) obj-y := $(call cherrypickxen, $(obj-y)) --- x/linux-2.6-xen-sparse/arch/i386/kernel/setup-xen.c +++ x/linux-2.6-xen-sparse/arch/i386/kernel/setup-xen.c @@ -68,6 +68,10 @@ #include "setup_arch_pre.h" #include <bios_ebda.h> +#ifdef CONFIG_XEN +#include <xen/interface/kexec.h> +#endif + /* Forward Declaration. */ void __init find_max_pfn(void); @@ -932,6 +936,7 @@ static void __init parse_cmdline_early ( * after a kernel panic. 
 		 */
 		else if (!memcmp(from, "crashkernel=", 12)) {
+#ifndef CONFIG_XEN
 			unsigned long size, base;
 			size = memparse(from+12, &from);
 			if (*from == '@') {
@@ -942,6 +947,10 @@ static void __init parse_cmdline_early (
 				crashk_res.start = base;
 				crashk_res.end = base + size - 1;
 			}
+#else
+			printk("Ignoring crashkernel command line, "
+			       "parameter will be supplied by xen\n");
+#endif
 		}
 #endif
 #ifdef CONFIG_PROC_VMCORE
@@ -1318,9 +1327,21 @@ void __init setup_bootmem_allocator(void
 	}
 #endif
 #ifdef CONFIG_KEXEC
+#ifndef CONFIG_XEN
 	if (crashk_res.start != crashk_res.end)
 		reserve_bootmem(crashk_res.start,
 			crashk_res.end - crashk_res.start + 1);
+#else
+	{
+		struct kexec_arg xen_kexec_arg;
+		BUG_ON(HYPERVISOR_kexec(KEXEC_CMD_reserve, &xen_kexec_arg));
+		if (xen_kexec_arg.u.reserve.size) {
+			crashk_res.start = xen_kexec_arg.u.reserve.start;
+			crashk_res.end = xen_kexec_arg.u.reserve.start +
+				xen_kexec_arg.u.reserve.size - 1;
+		}
+	}
+#endif
 #endif

 	if (!xen_feature(XENFEAT_auto_translated_physmap))
@@ -1395,6 +1416,9 @@ legacy_init_iomem_resources(struct resou
 		res->end = map[i].end - 1;
 		res->flags = IORESOURCE_MEM | IORESOURCE_BUSY;
 		request_resource(&iomem_resource, res);
+#ifdef CONFIG_KEXEC
+		request_resource(res, &crashk_res);
+#endif
 	}

 	free_bootmem(__pa(map), PAGE_SIZE);
--- x/linux-2.6-xen-sparse/drivers/xen/core/Makefile
+++ x/linux-2.6-xen-sparse/drivers/xen/core/Makefile
@@ -9,3 +9,4 @@ obj-$(CONFIG_NET) += skbuff.o
 obj-$(CONFIG_SMP) += smpboot.o
 obj-$(CONFIG_SYSFS) += hypervisor_sysfs.o
 obj-$(CONFIG_XEN_SYSFS) += xen_sysfs.o
+obj-$(CONFIG_KEXEC) += machine_kexec.o crash.o
--- /dev/null
+++ x/linux-2.6-xen-sparse/drivers/xen/core/crash.c
@@ -0,0 +1,98 @@
+/*
+ * Architecture specific (i386-xen) functions for kexec based crash dumps.
+ *
+ * Created by: Horms <horms@verge.net.au>
+ *
+ */
+
+#include <linux/kernel.h> /* For printk */
+
+/* XXX: final_note(), crash_save_this_cpu() and crash_save_self()
+ * are copied from arch/i386/kernel/crash.c, might be good to either
+ * make the original functions non-static and use them, or just
+ * merge this into that file.
+ */
+#include <linux/elf.h> /* For struct elf_note */
+#include <linux/elfcore.h> /* For struct elf_prstatus */
+#include <linux/kexec.h> /* crash_notes */
+
+static u32 *append_elf_note(u32 *buf, char *name, unsigned type, void *data,
+	size_t data_len)
+{
+	struct elf_note note;
+
+	note.n_namesz = strlen(name) + 1;
+	note.n_descsz = data_len;
+	note.n_type = type;
+	memcpy(buf, &note, sizeof(note));
+	buf += (sizeof(note) +3)/4;
+	memcpy(buf, name, note.n_namesz);
+	buf += (note.n_namesz + 3)/4;
+	memcpy(buf, data, note.n_descsz);
+	buf += (note.n_descsz + 3)/4;
+
+	return buf;
+}
+
+static void final_note(u32 *buf)
+{
+	struct elf_note note;
+
+	note.n_namesz = 0;
+	note.n_descsz = 0;
+	note.n_type = 0;
+	memcpy(buf, &note, sizeof(note));
+}
+
+static void crash_save_this_cpu(struct pt_regs *regs, int cpu)
+{
+	struct elf_prstatus prstatus;
+	u32 *buf;
+
+	if ((cpu < 0) || (cpu >= NR_CPUS))
+		return;
+
+	/* Using ELF notes here is opportunistic.
+	 * I need a well defined structure format
+	 * for the data I pass, and I need tags
+	 * on the data to indicate what information I have
+	 * squirrelled away. ELF notes happen to provide
+	 * all of that, so there is no need to invent something new.
+	 */
+	buf = (u32*)per_cpu_ptr(crash_notes, cpu);
+	if (!buf)
+		return;
+	memset(&prstatus, 0, sizeof(prstatus));
+	prstatus.pr_pid = current->pid;
+	elf_core_copy_regs(&prstatus.pr_reg, regs);
+	buf = append_elf_note(buf, "CORE", NT_PRSTATUS, &prstatus,
+		sizeof(prstatus));
+	final_note(buf);
+}
+
+static void crash_save_self(struct pt_regs *regs)
+{
+	int cpu;
+
+	cpu = smp_processor_id();
+	crash_save_this_cpu(regs, cpu);
+}
+
+
+void machine_crash_shutdown(struct pt_regs *regs)
+{
+	/* XXX: This should do something */
+	printk("xen-kexec: Need to turn off other CPUs in "
+	       "machine_crash_shutdown()\n");
+	crash_save_self(regs);
+}
+
+/*
+ * Local variables:
+ *  c-file-style: "linux"
+ *  indent-tabs-mode: t
+ *  c-indent-level: 8
+ *  c-basic-offset: 8
+ *  tab-width: 8
+ * End:
+ */
--- /dev/null
+++ x/linux-2.6-xen-sparse/drivers/xen/core/machine_kexec.c
@@ -0,0 +1,73 @@
+/*
+ * machine_kexec.c - handle transition of Linux booting another kernel
+ *
+ * Created By: Horms <horms@verge.net.au>
+ *
+ * Loosely based on arch/i386/kernel/machine_kexec.c
+ */
+
+#include <linux/kexec.h>
+#include <xen/interface/kexec.h>
+#include <linux/mm.h>
+#include <asm/hypercall.h>
+
+const extern unsigned char relocate_new_kernel[];
+extern unsigned int relocate_new_kernel_size;
+
+/*
+ * An architecture hook called to validate the
+ * proposed image and prepare the control pages
+ * as needed. The pages for KEXEC_CONTROL_CODE_SIZE
+ * have been allocated, but the segments have not yet
+ * been copied into the kernel.
+ *
+ * Do whatever setup is needed on image and the
+ * reboot code buffer to allow us to avoid allocations
+ * later.
+ *
+ * Currently nothing.
+ */
+int machine_kexec_prepare(struct kimage *image)
+{
+	kexec_arg_t hypercall_arg;
+	hypercall_arg.u.helper.data = NULL;
+	return HYPERVISOR_kexec(KEXEC_CMD_kexec_prepare, &hypercall_arg);
+}
+
+/*
+ * Undo anything leftover by machine_kexec_prepare
+ * when an image is freed.
+ */ +void machine_kexec_cleanup(struct kimage *image) +{ + kexec_arg_t hypercall_arg; + hypercall_arg.u.helper.data = NULL; + HYPERVISOR_kexec(KEXEC_CMD_kexec_cleanup, &hypercall_arg); +} + +/* + * Do not allocate memory (or fail in any way) in machine_kexec(). + * We are past the point of no return, committed to rebooting now. + */ +NORET_TYPE void machine_kexec(struct kimage *image) +{ + kexec_arg_t hypercall_arg; + hypercall_arg.u.kexec.indirection_page = image->head; + hypercall_arg.u.kexec.reboot_code_buffer = + pfn_to_mfn(page_to_pfn(image->control_code_page)) << PAGE_SHIFT; + hypercall_arg.u.kexec.start_address = image->start; + hypercall_arg.u.kexec.relocate_new_kernel = relocate_new_kernel; + hypercall_arg.u.kexec.relocate_new_kernel_size = + relocate_new_kernel_size; + HYPERVISOR_kexec(KEXEC_CMD_kexec, &hypercall_arg); +} + +/* + * Local variables: + * c-file-style: "linux" + * indent-tabs-mode: t + * c-indent-level: 8 + * c-basic-offset: 8 + * tab-width: 8 + * End: + */ --- x/linux-2.6-xen-sparse/drivers/xen/core/reboot.c +++ x/linux-2.6-xen-sparse/drivers/xen/core/reboot.c @@ -66,6 +66,10 @@ void machine_power_off(void) HYPERVISOR_shutdown(SHUTDOWN_poweroff); } +#ifdef CONFIG_KEXEC +void machine_shutdown(void) { } +#endif + int reboot_thru_bios = 0; /* for dmi_scan.c */ EXPORT_SYMBOL(machine_restart); EXPORT_SYMBOL(machine_halt); --- x/linux-2.6-xen-sparse/include/asm-i386/mach-xen/asm/hypercall.h +++ x/linux-2.6-xen-sparse/include/asm-i386/mach-xen/asm/hypercall.h @@ -39,6 +39,8 @@ # error "please don''t include this file directly" #endif +#include <xen/interface/kexec.h> + #define __STR(x) #x #define STR(x) __STR(x) @@ -359,6 +361,14 @@ HYPERVISOR_xenoprof_op( return _hypercall2(int, xenoprof_op, op, arg); } +static inline int +HYPERVISOR_kexec( + unsigned long op, kexec_arg_t * arg) +{ + return _hypercall2(int, kexec_op, op, arg); +} + + #endif /* __HYPERCALL_H__ */ --- x/xen/arch/x86/Makefile +++ x/xen/arch/x86/Makefile @@ -39,6 +39,7 @@ obj-y += 
trampoline.o obj-y += traps.o obj-y += usercopy.o obj-y += x86_emulate.o +obj-y += machine_kexec.o ifneq ($(pae),n) obj-$(x86_32) += shadow.o shadow_public.o shadow_guest32.o --- x/xen/arch/x86/dom0_ops.c +++ x/xen/arch/x86/dom0_ops.c @@ -29,6 +29,9 @@ #include <asm/mtrr.h> #include "cpu/mtrr/mtrr.h" +extern unsigned int opt_kdump_megabytes; +extern unsigned int opt_kdump_megabytes_base; + #define TRC_DOM0OP_ENTER_BASE 0x00020000 #define TRC_DOM0OP_LEAVE_BASE 0x00030000 --- /dev/null +++ x/xen/arch/x86/machine_kexec.c @@ -0,0 +1,28 @@ +/****************************************************************************** + * arch/x86/machine_kexec.c + * + * Created By: Horms + * + */ + +#include <xen/types.h> +#include <public/kexec.h> + +int machine_kexec_prepare(struct kexec_arg *arg) +{ + return 0; +} + +void machine_kexec_cleanup(struct kexec_arg *arg) +{ +} + +/* + * Local variables: + * mode: C + * c-set-style: "BSD" + * c-basic-offset: 4 + * tab-width: 4 + * indent-tabs-mode: nil + * End: + */ --- x/xen/arch/x86/setup.c +++ x/xen/arch/x86/setup.c @@ -38,6 +38,11 @@ static unsigned int opt_xenheap_megabyte integer_param("xenheap_megabytes", opt_xenheap_megabytes); #endif +unsigned int opt_kdump_megabytes = 0; +integer_param("kdump_megabytes", opt_kdump_megabytes); +unsigned int opt_kdump_megabytes_base = 0; +integer_param("kdump_megabytes_base", opt_kdump_megabytes_base); + /* opt_nosmp: If true, secondary processors are ignored. 
*/ static int opt_nosmp = 0; boolean_param("nosmp", opt_nosmp); @@ -192,6 +197,20 @@ static void percpu_free_unused_areas(voi __pa(__per_cpu_end)); } +void __init move_memory(unsigned long dst, + unsigned long src_start, unsigned long src_end) +{ +#if defined(CONFIG_X86_32) + memmove((void *)dst, /* use low mapping */ + (void *)src_start, /* use low mapping */ + src_end - src_start); +#elif defined(CONFIG_X86_64) + memmove(__va(dst), + __va(src_start), + src_end - src_start); +#endif +} + void __init __start_xen(multiboot_info_t *mbi) { char __cmdline[] = "", *cmdline = __cmdline; @@ -327,15 +346,8 @@ void __init __start_xen(multiboot_info_t initial_images_start = xenheap_phys_end; initial_images_end = initial_images_start + modules_length; -#if defined(CONFIG_X86_32) - memmove((void *)initial_images_start, /* use low mapping */ - (void *)mod[0].mod_start, /* use low mapping */ - mod[mbi->mods_count-1].mod_end - mod[0].mod_start); -#elif defined(CONFIG_X86_64) - memmove(__va(initial_images_start), - __va(mod[0].mod_start), - mod[mbi->mods_count-1].mod_end - mod[0].mod_start); -#endif + move_memory(initial_images_start, + mod[0].mod_start, mod[mbi->mods_count-1].mod_end); /* Initialise boot-time allocator with all RAM situated after modules. 
*/ xenheap_phys_start = init_boot_allocator(__pa(&_end)); @@ -383,6 +395,51 @@ void __init __start_xen(multiboot_info_t #endif } + if (opt_kdump_megabytes) { + unsigned long kdump_start, kdump_size, k; + + /* mark images pages as free for now */ + + init_boot_pages(initial_images_start, initial_images_end); + + kdump_start = opt_kdump_megabytes_base << 20; + kdump_size = opt_kdump_megabytes << 20; + + printk("Kdump: %luMB (%lukB) at 0x%lx\n", + kdump_size >> 20, + kdump_size >> 10, + kdump_start); + + if ((kdump_start & ~PAGE_MASK) || (kdump_size & ~PAGE_MASK)) + panic("Kdump parameters not page aligned\n"); + + kdump_start >>= PAGE_SHIFT; + kdump_size >>= PAGE_SHIFT; + + /* allocate pages for Kdump memory area */ + + k = alloc_boot_pages_at(kdump_size, kdump_start); + + if (k != kdump_start) + panic("Unable to reserve Kdump memory\n"); + + /* allocate pages for relocated initial images */ + + k = ((initial_images_end - initial_images_start) & ~PAGE_MASK) ? 1 : 0; + k += (initial_images_end - initial_images_start) >> PAGE_SHIFT; + + k = alloc_boot_pages(k, 1); + + if (!k) + panic("Unable to allocate initial images memory\n"); + + move_memory(k << PAGE_SHIFT, initial_images_start, initial_images_end); + + initial_images_end -= initial_images_start; + initial_images_start = k << PAGE_SHIFT; + initial_images_end += initial_images_start; + } + memguard_init(); printk("System RAM: %luMB (%lukB)\n", --- x/xen/arch/x86/x86_32/Makefile +++ x/xen/arch/x86/x86_32/Makefile @@ -3,5 +3,6 @@ obj-y += entry.o obj-y += mm.o obj-y += seg_fixup.o obj-y += traps.o +obj-y += machine_kexec.o obj-$(supervisor_mode_kernel) += supervisor_mode_kernel.o --- x/xen/arch/x86/x86_32/entry.S +++ x/xen/arch/x86/x86_32/entry.S @@ -648,6 +648,7 @@ ENTRY(hypercall_table) .long do_xenoprof_op .long do_event_channel_op .long do_physdev_op + .long do_kexec .rept NR_hypercalls-((.-hypercall_table)/4) .long do_ni_hypercall .endr @@ -687,6 +688,7 @@ ENTRY(hypercall_args_table) .byte 2 /* do_xenoprof_op */ 
.byte 2 /* do_event_channel_op */ .byte 2 /* do_physdev_op */ + .byte 2 /* do_kexec */ .rept NR_hypercalls-(.-hypercall_args_table) .byte 0 /* do_ni_hypercall */ .endr --- /dev/null +++ x/xen/arch/x86/x86_32/machine_kexec.c @@ -0,0 +1,205 @@ +/****************************************************************************** + * arch/x86/x86_32/machine_kexec.c + * + * Created By: Horms + * + * Based heavily on arch/i386/machine_kexec.c from Linux 2.6.16 + */ + +#include <xen/config.h> +#include <xen/types.h> +#include <xen/domain_page.h> +#include <xen/timer.h> +#include <xen/sched.h> +#include <xen/reboot.h> +#include <xen/console.h> +#include <asm/page.h> +#include <asm/flushtlb.h> +#include <public/kexec.h> + +static void __machine_kexec(struct kexec_arg *arg); + +typedef asmlinkage void (*relocate_new_kernel_t)( + unsigned long indirection_page, + unsigned long reboot_code_buffer, + unsigned long start_address, + unsigned int has_pae); + +#define PAGE_ALIGNED __attribute__ ((__aligned__(PAGE_SIZE))) + +#define L0_ATTR (_PAGE_PRESENT | _PAGE_RW | _PAGE_ACCESSED | _PAGE_DIRTY) +#define L1_ATTR (_PAGE_PRESENT | _PAGE_RW | _PAGE_ACCESSED | _PAGE_DIRTY) +#define L2_ATTR (_PAGE_PRESENT) + +#ifndef CONFIG_X86_PAE + +static u32 pgtable_level1[L1_PAGETABLE_ENTRIES] PAGE_ALIGNED; + +static void identity_map_page(unsigned long address) +{ + unsigned long mfn; + u32 *pgtable_level2; + + /* Find the current page table */ + mfn = read_cr3() >> PAGE_SHIFT; + pgtable_level2 = map_domain_page(mfn); + + /* Identity map the page table entry */ + pgtable_level1[l1_table_offset(address)] = address | L0_ATTR; + pgtable_level2[l2_table_offset(address)] = __pa(pgtable_level1) | L1_ATTR; + + /* Flush the tlb so the new mapping takes effect. + * Global tlb entries are not flushed but that is not an issue. 
+ */ + write_cr3(mfn << PAGE_SHIFT); + + unmap_domain_page(pgtable_level2); +} + +#else +static u64 pgtable_level1[L1_PAGETABLE_ENTRIES] PAGE_ALIGNED; +static u64 pgtable_level2[L2_PAGETABLE_ENTRIES] PAGE_ALIGNED; + +static void identity_map_page(unsigned long address) +{ + int mfn; + intpte_t *pgtable_level3; + + /* Find the current page table */ + mfn = read_cr3() >> PAGE_SHIFT; + pgtable_level3 = map_domain_page(mfn); + + /* Identity map the page table entry */ + pgtable_level1[l1_table_offset(address)] = address | L0_ATTR; + pgtable_level2[l2_table_offset(address)] = __pa(pgtable_level1) | L1_ATTR; + set_64bit(&pgtable_level3[l3_table_offset(address)], + __pa(pgtable_level2) | L2_ATTR); + + /* Flush the tlb so the new mapping takes effect. + * Global tlb entries are not flushed but that is not an issue. + */ + load_cr3(mfn << PAGE_SHIFT); + + unmap_domain_page(pgtable_level3); +} +#endif + +static void kexec_load_segments(void) +{ +#define __SSTR(X) #X +#define SSTR(X) __SSTR(X) + __asm__ __volatile__ ( + "\tljmp $"SSTR(__HYPERVISOR_CS)",$1f\n" + "\t1:\n" + "\tmovl $"SSTR(__HYPERVISOR_DS)",%%eax\n" + "\tmovl %%eax,%%ds\n" + "\tmovl %%eax,%%es\n" + "\tmovl %%eax,%%fs\n" + "\tmovl %%eax,%%gs\n" + "\tmovl %%eax,%%ss\n" + ::: "eax", "memory"); +#undef SSTR +#undef __SSTR +} + +#define kexec_load_idt(dtr) __asm__ __volatile("lidt %0"::"m" (*dtr)) +static void kexec_set_idt(void *newidt, __u16 limit) +{ + struct Xgt_desc_struct curidt; + + /* ia32 supports unaliged loads & stores */ + curidt.size = limit; + curidt.address = (unsigned long)newidt; + + kexec_load_idt(&curidt); + +}; + +#define kexec_load_gdt(dtr) __asm__ __volatile("lgdt %0"::"m" (*dtr)) +static void kexec_set_gdt(void *newgdt, __u16 limit) +{ + struct Xgt_desc_struct curgdt; + + /* ia32 supports unaligned loads & stores */ + curgdt.size = limit; + curgdt.address = (unsigned long)newgdt; + + kexec_load_gdt(&curgdt); +}; + +static void __machine_shutdown(void *data) +{ + struct kexec_arg *arg = (struct 
kexec_arg *)data; + + printk("__machine_shutdown: cpu=%u\n", smp_processor_id()); + + watchdog_disable(); + console_start_sync(); + + smp_send_stop(); + +#ifdef CONFIG_X86_IO_APIC + disable_IO_APIC(); +#endif + + __machine_kexec(arg); +} + +void machine_shutdown(struct kexec_arg *arg) +{ + int reboot_cpu_id; + cpumask_t reboot_cpu; + + + reboot_cpu_id = 0; + + if (!cpu_isset(reboot_cpu_id, cpu_online_map)) + reboot_cpu_id = smp_processor_id(); + + if (reboot_cpu_id != smp_processor_id()) { + cpus_clear(reboot_cpu); + cpu_set(reboot_cpu_id, reboot_cpu); + on_selected_cpus(reboot_cpu, __machine_shutdown, arg, 1, 0); + for (;;) + ; /* nothing */ + } + else + __machine_shutdown(arg); + BUG(); +} + +static void __machine_kexec(struct kexec_arg *arg) +{ + relocate_new_kernel_t rnk; + + local_irq_disable(); + + identity_map_page(arg->u.kexec.reboot_code_buffer); + + copy_from_user((void *)arg->u.kexec.reboot_code_buffer, + arg->u.kexec.relocate_new_kernel, + arg->u.kexec.relocate_new_kernel_size); + + kexec_load_segments(); + kexec_set_gdt(__va(0),0); + kexec_set_idt(__va(0),0); + + rnk = (relocate_new_kernel_t) arg->u.kexec.reboot_code_buffer; + (*rnk)(arg->u.kexec.indirection_page, arg->u.kexec.reboot_code_buffer, + arg->u.kexec.start_address, cpu_has_pae); +} + +void machine_kexec(struct kexec_arg *arg) +{ + machine_shutdown(arg); +} + +/* + * Local variables: + * mode: C + * c-set-style: "BSD" + * c-basic-offset: 4 + * tab-width: 4 + * indent-tabs-mode: nil + * End: + */ --- x/xen/arch/x86/x86_64/Makefile +++ x/xen/arch/x86/x86_64/Makefile @@ -1,3 +1,4 @@ obj-y += entry.o obj-y += mm.o obj-y += traps.o +obj-y += machine_kexec.o --- /dev/null +++ x/xen/arch/x86/x86_64/machine_kexec.c @@ -0,0 +1,25 @@ +/****************************************************************************** + * arch/x86/x86_64/machine_kexec.c + * + * Created By: Horms + * + * Based heavily on arch/i386/machine_kexec.c from Linux 2.6.16 + */ + +#include <xen/types.h> +#include <public/kexec.h> + 
+void machine_kexec(struct kexec_arg *arg) +{ + printk("machine_kexec: not implemented\n"); +} + +/* + * Local variables: + * mode: C + * c-set-style: "BSD" + * c-basic-offset: 4 + * tab-width: 4 + * indent-tabs-mode: nil + * End: + */ --- x/xen/common/Makefile +++ x/xen/common/Makefile @@ -7,6 +7,7 @@ obj-y += event_channel.o obj-y += grant_table.o obj-y += kernel.o obj-y += keyhandler.o +obj-y += kexec.o obj-y += lib.o obj-y += memory.o obj-y += multicall.o --- /dev/null +++ x/xen/common/kexec.c @@ -0,0 +1,73 @@ +/* + * Achitecture independent kexec code for Xen + * + * At this statge, just a switch for the kexec hypercall into + * architecture dependent code. + * + * Created By: Horms <horms@verge.net.au> + */ + +#include <xen/lib.h> +#include <xen/errno.h> +#include <xen/guest_access.h> +#include <xen/sched.h> +#include <xen/types.h> +#include <public/kexec.h> + +extern int machine_kexec_prepare(struct kexec_arg *arg); +extern void machine_kexec_cleanup(struct kexec_arg *arg); +extern void machine_kexec(struct kexec_arg *arg); + +extern unsigned int opt_kdump_megabytes; +extern unsigned int opt_kdump_megabytes_base; + +int do_kexec(unsigned long op, + XEN_GUEST_HANDLE(kexec_arg_t) uarg) +{ + struct kexec_arg arg; + + if ( !IS_PRIV(current->domain) ) + return -EPERM; + + if (op == KEXEC_CMD_reserve) + { + arg.u.reserve.size = opt_kdump_megabytes << 20; + arg.u.reserve.start = opt_kdump_megabytes_base << 20; + if ( unlikely(copy_to_guest(uarg, &arg, 1) != 0) ) + { + printk("do_kexec: copy_to_guest failed"); + return -EFAULT; + } + return 0; + } + + if ( unlikely(copy_from_guest(&arg, uarg, 1) != 0) ) + { + printk("do_kexec: __copy_from_guest failed"); + return -EFAULT; + } + + switch(op) { + case KEXEC_CMD_kexec: + machine_kexec(&arg); + return -EINVAL; /* Not Reached */ + case KEXEC_CMD_kexec_prepare: + return machine_kexec_prepare(&arg); + case KEXEC_CMD_kexec_cleanup: + machine_kexec_cleanup(&arg); + return 0; + } + + return -EINVAL; +} + +/* + * Local 
variables: + * mode: C + * c-set-style: "BSD" + * c-basic-offset: 4 + * tab-width: 4 + * indent-tabs-mode: nil + * End: + */ + --- x/xen/common/page_alloc.c +++ x/xen/common/page_alloc.c @@ -212,24 +212,35 @@ void init_boot_pages(paddr_t ps, paddr_t } } +unsigned long alloc_boot_pages_at(unsigned long nr_pfns, unsigned long pfn_at) +{ + unsigned long i; + + for ( i = 0; i < nr_pfns; i++ ) + if ( allocated_in_map(pfn_at + i) ) + break; + + if ( i == nr_pfns ) + { + map_alloc(pfn_at, nr_pfns); + return pfn_at; + } + + return 0; +} + unsigned long alloc_boot_pages(unsigned long nr_pfns, unsigned long pfn_align) { - unsigned long pg, i; + unsigned long pg, i = 0; for ( pg = 0; (pg + nr_pfns) < max_page; pg += pfn_align ) { - for ( i = 0; i < nr_pfns; i++ ) - if ( allocated_in_map(pg + i) ) - break; - - if ( i == nr_pfns ) - { - map_alloc(pg, nr_pfns); - return pg; - } + i = alloc_boot_pages_at(nr_pfns, pg); + if (i != 0) + break; } - return 0; + return i; } --- x/xen/include/asm-x86/hypercall.h +++ x/xen/include/asm-x86/hypercall.h @@ -6,6 +6,8 @@ #define __ASM_X86_HYPERCALL_H__ #include <public/physdev.h> +#include <xen/types.h> +#include <public/kexec.h> extern long do_event_channel_op_compat( @@ -87,6 +89,10 @@ extern long arch_do_vcpu_op( int cmd, struct vcpu *v, XEN_GUEST_HANDLE(void) arg); +extern int +do_kexec( + unsigned long op, XEN_GUEST_HANDLE(kexec_arg_t) uarg); + #ifdef __x86_64__ extern long --- /dev/null +++ x/xen/include/public/kexec.h @@ -0,0 +1,45 @@ +/* + * kexec.h: Xen kexec public + * + * Created By: Horms <horms@verge.net.au> + */ + +#ifndef _XEN_PUBLIC_KEXEC_H +#define _XEN_PUBLIC_KEXEC_H + +#include "xen.h" + +/* + * Scratch space for passing arguments to the kexec hypercall + */ +typedef struct kexec_arg { + union { + struct { + unsigned long data; /* Not sure what this should be yet */ + } helper; + struct { + unsigned long indirection_page; + unsigned long reboot_code_buffer; + unsigned long start_address; + const char *relocate_new_kernel; + 
unsigned int relocate_new_kernel_size; + } kexec; + struct { + unsigned long size; + unsigned long start; + } reserve; + } u; +} kexec_arg_t; +DEFINE_XEN_GUEST_HANDLE(kexec_arg_t); + +#endif + +/* + * Local variables: + * mode: C + * c-set-style: "BSD" + * c-basic-offset: 4 + * tab-width: 4 + * indent-tabs-mode: nil + * End: + */ --- x/xen/include/public/xen.h +++ x/xen/include/public/xen.h @@ -64,6 +64,7 @@ #define __HYPERVISOR_xenoprof_op 31 #define __HYPERVISOR_event_channel_op 32 #define __HYPERVISOR_physdev_op 33 +#define __HYPERVISOR_kexec_op 34 /* Architecture-specific hypercall definitions. */ #define __HYPERVISOR_arch_0 48 @@ -238,6 +239,14 @@ DEFINE_XEN_GUEST_HANDLE(mmuext_op_t); #define VMASST_TYPE_writable_pagetables 2 #define MAX_VMASST_TYPE 2 +/* + * Operations for kexec. + */ +#define KEXEC_CMD_kexec 0 +#define KEXEC_CMD_kexec_prepare 1 +#define KEXEC_CMD_kexec_cleanup 2 +#define KEXEC_CMD_reserve 3 + #ifndef __ASSEMBLY__ typedef uint16_t domid_t; --- x/xen/include/xen/mm.h +++ x/xen/include/xen/mm.h @@ -40,6 +40,7 @@ struct page_info; paddr_t init_boot_allocator(paddr_t bitmap_start); void init_boot_pages(paddr_t ps, paddr_t pe); unsigned long alloc_boot_pages(unsigned long nr_pfns, unsigned long pfn_align); +unsigned long alloc_boot_pages_at(unsigned long nr_pfns, unsigned long pfn_at); void end_boot_allocator(void); /* Generic allocator. These functions are *not* interrupt-safe. */ --- /dev/null 2006-05-09 15:32:30.399072192 +0900 +++ x/patches/linux-2.6.16.13/kexec.patch 2006-05-09 18:03:46.000000000 +0900 @@ -0,0 +1,175 @@ +--- x/drivers/base/cpu.c ++++ x/drivers/base/cpu.c +@@ -101,7 +101,11 @@ static ssize_t show_crash_notes(struct s + * boot up and this data does not change there after. Hence this + * operation should be safe. No locking required. 
+ */ ++#ifndef CONFIG_XEN + addr = __pa(per_cpu_ptr(crash_notes, cpunum)); ++#else ++ addr = virt_to_machine(per_cpu_ptr(crash_notes, cpunum)); ++#endif + rc = sprintf(buf, "%Lx\n", addr); + return rc; + } +--- x/kernel/kexec.c ++++ x/kernel/kexec.c +@@ -38,6 +38,20 @@ struct resource crashk_res = { + .flags = IORESOURCE_BUSY | IORESOURCE_MEM + }; + ++/* Kexec needs to know about the actually physical addresss. ++ * But in xen, a physical address is a pseudo-physical addresss. */ ++#ifndef CONFIG_XEN ++#define kexec_page_to_pfn(page) page_to_pfn(page) ++#define kexec_pfn_to_page(pfn) pfn_to_page(pfn) ++#define kexec_virt_to_phys(addr) virt_to_phys(addr) ++#define kexec_phys_to_virt(addr) phys_to_virt(addr) ++#else ++#define kexec_page_to_pfn(page) pfn_to_mfn(page_to_pfn(page)) ++#define kexec_pfn_to_page(pfn) pfn_to_page(mfn_to_pfn(pfn)) ++#define kexec_virt_to_phys(addr) virt_to_machine(addr) ++#define kexec_phys_to_virt(addr) phys_to_virt(machine_to_phys(addr)) ++#endif ++ + int kexec_should_crash(struct task_struct *p) + { + if (in_interrupt() || !p->pid || p->pid == 1 || panic_on_oops) +@@ -403,7 +417,7 @@ static struct page *kimage_alloc_normal_ + pages = kimage_alloc_pages(GFP_KERNEL, order); + if (!pages) + break; +- pfn = page_to_pfn(pages); ++ pfn = kexec_page_to_pfn(pages); + epfn = pfn + count; + addr = pfn << PAGE_SHIFT; + eaddr = epfn << PAGE_SHIFT; +@@ -437,6 +451,7 @@ static struct page *kimage_alloc_normal_ + return pages; + } + ++#ifndef CONFIG_XEN + static struct page *kimage_alloc_crash_control_pages(struct kimage *image, + unsigned int order) + { +@@ -490,7 +505,7 @@ static struct page *kimage_alloc_crash_c + } + /* If I don''t overlap any segments I have found my hole! 
*/ + if (i == image->nr_segments) { +- pages = pfn_to_page(hole_start >> PAGE_SHIFT); ++ pages = kexec_pfn_to_page(hole_start >> PAGE_SHIFT); + break; + } + } +@@ -517,6 +532,13 @@ struct page *kimage_alloc_control_pages( + + return pages; + } ++#else /* !CONFIG_XEN */ ++struct page *kimage_alloc_control_pages(struct kimage *image, ++ unsigned int order) ++{ ++ return kimage_alloc_normal_control_pages(image, order); ++} ++#endif + + static int kimage_add_entry(struct kimage *image, kimage_entry_t entry) + { +@@ -532,7 +554,7 @@ static int kimage_add_entry(struct kimag + return -ENOMEM; + + ind_page = page_address(page); +- *image->entry = virt_to_phys(ind_page) | IND_INDIRECTION; ++ *image->entry = kexec_virt_to_phys(ind_page) | IND_INDIRECTION; + image->entry = ind_page; + image->last_entry = ind_page + + ((PAGE_SIZE/sizeof(kimage_entry_t)) - 1); +@@ -593,13 +615,13 @@ static int kimage_terminate(struct kimag + #define for_each_kimage_entry(image, ptr, entry) \ + for (ptr = &image->head; (entry = *ptr) && !(entry & IND_DONE); \ + ptr = (entry & IND_INDIRECTION)? \ +- phys_to_virt((entry & PAGE_MASK)): ptr +1) ++ kexec_phys_to_virt((entry & PAGE_MASK)): ptr +1) + + static void kimage_free_entry(kimage_entry_t entry) + { + struct page *page; + +- page = pfn_to_page(entry >> PAGE_SHIFT); ++ page = kexec_pfn_to_page(entry >> PAGE_SHIFT); + kimage_free_pages(page); + } + +@@ -686,7 +708,7 @@ static struct page *kimage_alloc_page(st + * have a match. 
+ */ + list_for_each_entry(page, &image->dest_pages, lru) { +- addr = page_to_pfn(page) << PAGE_SHIFT; ++ addr = kexec_page_to_pfn(page) << PAGE_SHIFT; + if (addr == destination) { + list_del(&page->lru); + return page; +@@ -701,12 +723,12 @@ static struct page *kimage_alloc_page(st + if (!page) + return NULL; + /* If the page cannot be used file it away */ +- if (page_to_pfn(page) > ++ if (kexec_page_to_pfn(page) > + (KEXEC_SOURCE_MEMORY_LIMIT >> PAGE_SHIFT)) { + list_add(&page->lru, &image->unuseable_pages); + continue; + } +- addr = page_to_pfn(page) << PAGE_SHIFT; ++ addr = kexec_page_to_pfn(page) << PAGE_SHIFT; + + /* If it is the destination page we want use it */ + if (addr == destination) +@@ -729,7 +751,7 @@ static struct page *kimage_alloc_page(st + struct page *old_page; + + old_addr = *old & PAGE_MASK; +- old_page = pfn_to_page(old_addr >> PAGE_SHIFT); ++ old_page = kexec_pfn_to_page(old_addr >> PAGE_SHIFT); + copy_highpage(page, old_page); + *old = addr | (*old & ~PAGE_MASK); + +@@ -779,7 +801,7 @@ static int kimage_load_normal_segment(st + result = -ENOMEM; + goto out; + } +- result = kimage_add_page(image, page_to_pfn(page) ++ result = kimage_add_page(image, kexec_page_to_pfn(page) + << PAGE_SHIFT); + if (result < 0) + goto out; +@@ -811,6 +833,7 @@ out: + return result; + } + ++#ifndef CONFIG_XEN + static int kimage_load_crash_segment(struct kimage *image, + struct kexec_segment *segment) + { +@@ -833,7 +856,7 @@ static int kimage_load_crash_segment(str + char *ptr; + size_t uchunk, mchunk; + +- page = pfn_to_page(maddr >> PAGE_SHIFT); ++ page = kexec_pfn_to_page(maddr >> PAGE_SHIFT); + if (page == 0) { + result = -ENOMEM; + goto out; +@@ -881,6 +904,13 @@ static int kimage_load_segment(struct ki + + return result; + } ++#else /* CONFIG_XEN */ ++static int kimage_load_segment(struct kimage *image, ++ struct kexec_segment *segment) ++{ ++ return kimage_load_normal_segment(image, segment); ++} ++#endif + + /* + * Exec Kernel system call: for obvious 
reasons only root may call it. _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
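The boot-allocator change in the patch above adds a "reserve exactly this range or fail" helper and rebuilds the aligned search on top of it. A minimal standalone sketch of that idea follows. It is not the Xen code: the `toy_*` names, the fixed 64-frame bitmap and the `TOY_ALLOC_FAILED` sentinel are all assumptions made for illustration. The sentinel is used because, as far as I can tell from the patch, `alloc_boot_pages_at()` returns 0 on failure, which cannot be told apart from a successful allocation at frame 0.

```c
#include <stdbool.h>

/* Hypothetical toy allocator: 64 page frames tracked by a flat bitmap. */
#define TOY_MAX_PAGES    64UL
#define TOY_ALLOC_FAILED ((unsigned long)-1) /* distinct from frame 0 */

static bool toy_allocated[TOY_MAX_PAGES];

/* Reserve nr frames starting exactly at pfn_at, or fail without side effects. */
static unsigned long toy_alloc_pages_at(unsigned long nr, unsigned long pfn_at)
{
    unsigned long i;

    /* Reject empty requests and ranges that run past the end of the map. */
    if (nr == 0 || pfn_at >= TOY_MAX_PAGES || nr > TOY_MAX_PAGES - pfn_at)
        return TOY_ALLOC_FAILED;

    /* First pass: make sure every frame in the range is still free. */
    for (i = 0; i < nr; i++)
        if (toy_allocated[pfn_at + i])
            return TOY_ALLOC_FAILED;

    /* Second pass: mark the whole range as allocated. */
    for (i = 0; i < nr; i++)
        toy_allocated[pfn_at + i] = true;

    return pfn_at;
}

/* Find the first free range of nr frames whose start is a multiple of align. */
static unsigned long toy_alloc_pages(unsigned long nr, unsigned long align)
{
    unsigned long pg;

    if (align == 0)
        return TOY_ALLOC_FAILED;

    for (pg = 0; pg < TOY_MAX_PAGES && nr <= TOY_MAX_PAGES - pg; pg += align)
        if (toy_alloc_pages_at(nr, pg) != TOY_ALLOC_FAILED)
            return pg;

    return TOY_ALLOC_FAILED;
}
```

With a separate failure value, a caller such as the kdump reservation can compare the result against the requested start frame without special-casing frame 0.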
Akio Takebe
2006-May-09 13:28 UTC
[Xen-devel] Re: [PATCH]: kexec: framework and i386 (Take VIII)
>> Hi Takebe-san,
>>
>> here is an updated version of the patch which moves portions into
>> patches/linux-2.6.16/ as you suggested. It also moves to
>> xen-unstable 9969 / Linux 2.6.16.13 and has some minor build fixes,
>> for problems that crept into the previous patch.
>
>Sorry, this morning's patch had the internal patch in the wrong location
>and with the wrong diff level.

Hi, Horms

Thank you for sending your new patch.
This patch compiles cleanly. :)
I will try it and report soon.

Best Regards
Akio Takebe
Kazuo Moriwaka
2006-May-10 06:49 UTC
Re: [Xen-devel] Re: [PATCH]: kexec: framework and i386
Hi,

I'll send a patch for the dom0 image extraction (from kdump) script.

Updates:
- get register values in dom0 context from kdump's ELF Core PT_NOTE header.
- get the cr3 register value in context info from the vcpu.
- you can now look up vmlinux symbols from gdbserver-xen.

Todo:
- It supports only a single processor for now.
- ELF Core output.

On 4/24/06, Kazuo Moriwaka <moriwaka@valinux.co.jp> wrote:
> Hi,
>
> On 4/24/06, Ian Pratt <m+Ian.Pratt@cl.cam.ac.uk> wrote:
> > > When a panic occurs, Linux kexec jumps into the preloaded
> > > kdump kernel (if any). This kernel then reinitialises the
> > > hardware, using its own device drivers and uses these to
> > > write out the dump to disk. ISTR that the dump format is
> > > currently ELF, although I remember some talk on the Fastboot
> > > ML about adding some extra headers to make OS debugging easier.
> >
> > Are Xen and the dom0 kernel dumped as separate ELF cores?
>
> I'm working on clipping the domain image from a whole-machine dump for x86_32 now.
> Now my prototype reads an ELF core and writes a dom0 image.
>
> todo:
> - Output format is not ELF core yet. It is the Xen domain core image
>   format (works with gdbserver-xen).
> - register information does not work well.

--
Kazuo Moriwaka
Kazuo Moriwaka
2006-May-10 06:50 UTC
Re: [Xen-devel] Re: [PATCH]: kexec: framework and i386
I forgot to attach the patch.

On 5/10/06, Kazuo Moriwaka <moriwaka@valinux.co.jp> wrote:
> Hi,
>
> I'll send a patch for the dom0 image extraction (from kdump) script.
>
> Updates:
> - get register values in dom0 context from kdump's ELF Core PT_NOTE header.
> - get the cr3 register value in context info from the vcpu.
> - you can now look up vmlinux symbols from gdbserver-xen.
>
> Todo:
> - It supports only a single processor for now.
> - ELF Core output.
>
> On 4/24/06, Kazuo Moriwaka <moriwaka@valinux.co.jp> wrote:
> > Hi,
> >
> > On 4/24/06, Ian Pratt <m+Ian.Pratt@cl.cam.ac.uk> wrote:
> > > > When a panic occurs, Linux kexec jumps into the preloaded
> > > > kdump kernel (if any). This kernel then reinitialises the
> > > > hardware, using its own device drivers and uses these to
> > > > write out the dump to disk. ISTR that the dump format is
> > > > currently ELF, although I remember some talk on the Fastboot
> > > > ML about adding some extra headers to make OS debugging easier.
> > >
> > > Are Xen and the dom0 kernel dumped as separate ELF cores?
> >
> > I'm working on clipping the domain image from a whole-machine dump for x86_32 now.
> > Now my prototype reads an ELF core and writes a dom0 image.
> >
> > todo:
> > - Output format is not ELF core yet. It is the Xen domain core image
> >   format (works with gdbserver-xen).
> > - register information does not work well.
>
> --
> Kazuo Moriwaka

--
Kazuo Moriwaka
horms
2006-May-11 11:35 UTC
Re: [Xen-devel] Re: [PATCH]: kexec: framework and i386 (Take VI)
On Mon, May 08, 2006 at 10:02:37AM +0100, Ian Campbell wrote:
> I didn't get Horms' (I presume that's who is quoted below) original mail
> so I'll reply to this one.
>
> > >Ok thanks, I haven't seen such a machine.
> > >I'll look into simulating it in software.
>
> There is code in xen/arch/x86/nmi.c:do_nmi_trigger(). You can trigger it
> with the 'n' keyhandler.

Thanks, I will use that technique.

--
Horms http://www.vergenet.net/~horms/
Akio Takebe
2006-May-15 08:29 UTC
Re: [Xen-devel] Re: [PATCH]: kexec: framework and i386 (Take VI)
Hi,

I tested the NMI button with your patch.
I got a coredump including all memory!
We can always use dom0's coredump by using Horms's and Magnus's patch. :-)

FYI, I used the following grub.conf and /proc/sys/kernel/unknown_nmi_panic=1

  title Xen 3.0 kexec
      root (hd0,0)
      kernel /xen-3.0.gz dom0_mem=256M kdump_megabytes=64 kdump_megabytes_base=32 nmi=dom0 nosmp
      module /vmlinuz-2.6-xen ro root=LABEL=/ rhgb nosmp
      module /initrd-2.6-xen.img

Best Regards,

Akio Takebe

>Hi, Simon and Magnus
>
>I have one question.
>When Xen panics, it seems kexec is not called.
>Only when dom0 panics is kexec called.
>But in the case of nmi=dom0, can we use kexec by pushing the NMI button?
>Am I right?
>
>I'll use your patch soon, and report. :-)
>
>Best Regards,
>
>Akio Takebe
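As a worked example of the boot parameters quoted above, the following sketch converts kdump_megabytes=64 and kdump_megabytes_base=32 into the byte range and page-frame range that the patch's setup code reserves. It assumes 4 KiB pages (as on x86_32); the `SKETCH_*` macros, `struct kdump_region` and `kdump_region_from_mb()` are hypothetical names, not part of the patch. The alignment check mirrors the one in the patch, although megabyte-sized inputs are always page aligned.

```c
#include <stdint.h>

#define SKETCH_PAGE_SHIFT 12                      /* assumption: 4 KiB pages */
#define SKETCH_PAGE_SIZE  (1ULL << SKETCH_PAGE_SHIFT)

/* Hypothetical container for the computed reservation. */
struct kdump_region {
    uint64_t start_bytes; /* kdump_megabytes_base << 20 */
    uint64_t size_bytes;  /* kdump_megabytes << 20 */
    uint64_t start_pfn;   /* first page frame of the region */
    uint64_t nr_pages;    /* number of page frames in the region */
};

/* Returns 0 on success, -1 if start or size is not page aligned. */
static int kdump_region_from_mb(unsigned int base_mb, unsigned int size_mb,
                                struct kdump_region *out)
{
    /* Widen before shifting so large values do not overflow 32 bits. */
    uint64_t start = (uint64_t)base_mb << 20;
    uint64_t size = (uint64_t)size_mb << 20;

    if ((start & (SKETCH_PAGE_SIZE - 1)) || (size & (SKETCH_PAGE_SIZE - 1)))
        return -1;

    out->start_bytes = start;
    out->size_bytes = size;
    out->start_pfn = start >> SKETCH_PAGE_SHIFT;
    out->nr_pages = size >> SKETCH_PAGE_SHIFT;
    return 0;
}
```

For base 32 and size 64 this gives a region starting at 0x2000000 with length 0x4000000, that is, 16384 frames starting at frame 8192.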
Akio Takebe
2006-May-16 10:43 UTC
Re: [Xen-devel] Re: [PATCH]: kexec: framework and i386 (Take VIII)
Hi, Keir

I tried Horms's kexec patch and Kazuo's tools,
and I could get a coredump of dom0!
By using this feature, we can debug dom0 with gdbserver-xen
in the same way as domU.
I think that this is very useful.
Xen doesn't have a dump feature yet,
and this feature doesn't affect performance, stability, and so on.
We think this feature is necessary for troubleshooting Xen.

Could Keir apply this feature?
Or are there more comments?

Best Regards,
Akio Takebe
Keir Fraser
2006-May-16 10:44 UTC
Re: [Xen-devel] Re: [PATCH]: kexec: framework and i386 (Take VIII)
On 16 May 2006, at 11:43, Akio Takebe wrote:
> I tried Horms's kexec patch and Kazuo's tools,
> and I could get a coredump of dom0!
> By using this feature, we can debug dom0 with gdbserver-xen
> in the same way as domU.
> I think that this is very useful.
> Xen doesn't have a dump feature yet,
> and this feature doesn't affect performance, stability, and so on.
> We think this feature is necessary for troubleshooting Xen.
>
> Could Keir apply this feature?
> Or are there more comments?

Can it kexec to Xen yet?

 -- Keir
Akio Takebe
2006-May-16 11:03 UTC
Re: [Xen-devel] Re: [PATCH]: kexec: framework and i386 (Take VIII)
Hi,

No, it can kexec only to a kdump kernel.
I think we need to update kexec-tools for kexecing Xen
(e.g. to load xen, dom0 and initrd).
Am I right, Horms?

But this feature is good as a coredump feature.
Even if it cannot kexec to Xen, I believe this is an important feature.

Best Regards,
Akio Takebe

>On 16 May 2006, at 11:43, Akio Takebe wrote:
>
>> I tried Horms's kexec patch and Kazuo's tools,
>> and I could get a coredump of dom0!
>> By using this feature, we can debug dom0 with gdbserver-xen
>> in the same way as domU.
>> I think that this is very useful.
>> Xen doesn't have a dump feature yet,
>> and this feature doesn't affect performance, stability, and so on.
>> We think this feature is necessary for troubleshooting Xen.
>>
>> Could Keir apply this feature?
>> Or are there more comments?
>
>Can it kexec to Xen yet?
>
> -- Keir
Keir Fraser
2006-May-16 12:39 UTC
Re: [Xen-devel] Re: [PATCH]: kexec: framework and i386 (Take VIII)
On 16 May 2006, at 12:03, Akio Takebe wrote:
> No, it can kexec only to a kdump kernel.
> I think we need to update kexec-tools for kexecing Xen
> (e.g. to load xen, dom0 and initrd).
> Am I right, Horms?

kexec-tools supports the multiboot format these days. So if kexec is added
to Xen then we should support kexec'ing to Xen, or we need a good
explanation why we can't.

 -- Keir

> But this feature is good as a coredump feature.
> Even if it cannot kexec to Xen, I believe this is an important feature.
Horms
2006-May-17 02:44 UTC
Re: [Xen-devel] Re: [PATCH]: kexec: framework and i386 (Take VIII)
On Tue, May 16, 2006 at 01:39:31PM +0100, Keir Fraser wrote:
>
> On 16 May 2006, at 12:03, Akio Takebe wrote:
>
> >No, it can kexec only to a kdump kernel.
> >I think we need to update kexec-tools for kexecing Xen
> >(e.g. to load xen, dom0 and initrd).
> >Am I right, Horms?
>
> kexec-tools supports the multiboot format these days. So if kexec is added
> to Xen then we should support kexec'ing to Xen, or we need a good
> explanation why we can't.

No, it can't kexec into Xen yet. I haven't looked into this in depth, but I
suspect that kexec-tools needs to be updated as Takebe-san suggests. As
you mention, kexec-tools does support multiboot, so I suspect that it is
not much work. I will look into it and get back to you. I take it that you
would like this to be working before merging?

In semi-related news, I will post an updated version of the patch in
the next day or so. It is able to capture all of Xen's CPUs on kdump,
and to kdump when Xen crashes. This means that, feature-wise, in terms of
xen/kernel code the x86_32 port is pretty much complete. I would be
really excited to get this merged so more eyes can go over the code and
we can get some good feedback and testing.

My colleague Magnus's x86_64 port is well under way, however we are
having a few problems relating to the approach he has taken to page
table handling on kexec. I am hoping to take a crack at ia64 in the near
future, though I suspect that x86_32 bug fixes and other merge-related
work will delay that a little.

--
Horms http://www.vergenet.net/~horms/
Horms
2006-May-17 04:53 UTC
Re: [Xen-devel] Re: [PATCH]: kexec: framework and i386 (Take VIII)
On Wed, May 17, 2006 at 11:44:04AM +0900, Horms wrote:
> On Tue, May 16, 2006 at 01:39:31PM +0100, Keir Fraser wrote:
> >
> > On 16 May 2006, at 12:03, Akio Takebe wrote:
> >
> > >No, it can kexec only to a kdump kernel.
> > >I think we need to update kexec-tools for kexecing Xen
> > >(e.g. to load xen, dom0 and initrd).
> > >Am I right, Horms?
> >
> > kexec-tools supports the multiboot format these days. So if kexec is added
> > to Xen then we should support kexec'ing to Xen, or we need a good
> > explanation why we can't.
>
> No, it can't kexec into Xen yet.

I'm happy to report that, with some more testing, as long as kexec-tools
is compiled with zlib support I can kexec linux->xen and xen->xen.
Actually, zlib might not be necessary if both xen and linux are
uncompressed. In any case, for reference, here is a kexec command line
that works for me.

  kexec -l -t multiboot-x86 --append="console=com1 sync_console conswitch=bb com1=115200,8n1,0x3f8 dom0_mem=48000" /root/xen --module="/root/vmlinuz-xen root=/dev/hda1 ro console=ttyS0,115200 clock=pit ip=on apm=power-off" --module=/tmp/initramfs_data.cpio

I will post an updated patch today or tomorrow, and at that time I will
outline the immediate targets for further development. For now, I'd like
to ask that you don't merge what I have posted, as there are some
invasive changes coming up with regards to page table handling. I
should be able to provide a patch that includes those changes within the
next week, as the code is already done; it just needs to be cleaned up a
bit and merged with the kexec patch.

--
Horms http://www.vergenet.net/~horms/
Horms
2006-May-17 09:52 UTC
Re: [Xen-devel] Re: [PATCH]: kexec: framework and i386 (Take IX)
Hi,

as promised earlier in the day, here is an update on the kexec/kdump
patch. The main changes are that SMP now works, and the dumping of
cpu registers for kdump has been moved into the hypervisor so as to
allow all CPUs to be captured, not just dom0's VCPUs.

Also, as mentioned earlier in the day, linux->xen and xen->xen kexec
does work, contrary to what I previously reported.

I have also broken the patch out into generic, x86 and x86_32 patches
which need to be applied in that order. This was done to allow other
architectures to be worked on more easily. By that, I mean it makes it
easier for my colleagues and me to work together. It should also make
it easier to review the code. If a monolithic patch is desired please
let me know, as it is very easy for me to produce one.

I hope to make the next round available within the next few (working)
days. This will change page table handling around a bit (only for
kexec/kdump, not for the rest of the time) so as to avoid trampling the
page tables, which is a problem for kdump as it destroys data that might
otherwise be analysed. My colleague Magnus Damm is working on having his
approach adopted by Linux kdump.

Beyond that, Magnus has also been working on an x86_64 port, though that
is not quite working. And I plan to start on ia64 soon.

--
Horms http://www.vergenet.net/~horms/
Keir Fraser
2006-May-17 10:10 UTC
Re: [Xen-devel] Re: [PATCH]: kexec: framework and i386 (Take IX)
On 17 May 2006, at 10:52, Horms wrote:
> as promised earlier in the day, here is an update on the kexec/kdump
> patch. The main changes are that SMP now works, and the dumping of
> cpu registers for kdump has been moved into the hypervisor so as to
> allow all CPUs to be captured, not just dom0's VCPUs.

Just looking at the generic patch:
 * Define KEXEC_CMD_* in your public kexec.h header, not xen.h.
 * Don't pack all the different arg structs into a union -- the union
   will change in size if you ever add a bigger argument substructure,
   plus it's ugly. Split them out and put a comment by each KEXEC_CMD_*
   definition explaining what its argument parameter points at (see other
   header files like vcpu.h for an example).
 * Can you explain the need for all the changes in your kexec.patch? I
   guess there are some virt_to_phys address translations that need fixing
   up, but you also scatter a few hypercalls around in there (e.g., in
   base/cpu.c) -- can they not be handled more cleanly, or is kexec-on-xen
   somehow more special than kexec on any native architecture?

 -- Keir
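The second review point above can be sketched roughly as follows: one standalone argument structure per command, each documented next to its KEXEC_CMD_* value. This is only an illustration of the suggested layout, not the header that was eventually merged. The command values are taken from the posted patch; the `sketch_*` names are hypothetical, and the pairing of prepare/cleanup with the single-field "helper" argument is an assumption.

```c
#include <stddef.h>

/* arg: pointer to struct sketch_kexec_exec (hypothetical name) */
#define KEXEC_CMD_kexec         0
/* arg: pointer to struct sketch_kexec_helper (assumed pairing) */
#define KEXEC_CMD_kexec_prepare 1
/* arg: pointer to struct sketch_kexec_helper (assumed pairing) */
#define KEXEC_CMD_kexec_cleanup 2
/* arg: pointer to struct sketch_kexec_reserve, filled in by the hypervisor */
#define KEXEC_CMD_reserve       3

struct sketch_kexec_helper {
    unsigned long data;            /* placeholder, as in the posted patch */
};

struct sketch_kexec_exec {
    unsigned long indirection_page;
    unsigned long reboot_code_buffer;
    unsigned long start_address;
    const char *relocate_new_kernel;
    unsigned int relocate_new_kernel_size;
};

struct sketch_kexec_reserve {
    unsigned long size;            /* bytes reserved for the crash kernel */
    unsigned long start;           /* start address of the reservation */
};

/* Size of the argument a command expects, or 0 for an unknown command. */
static size_t sketch_kexec_arg_size(unsigned long op)
{
    switch (op) {
    case KEXEC_CMD_kexec:
        return sizeof(struct sketch_kexec_exec);
    case KEXEC_CMD_kexec_prepare:
    case KEXEC_CMD_kexec_cleanup:
        return sizeof(struct sketch_kexec_helper);
    case KEXEC_CMD_reserve:
        return sizeof(struct sketch_kexec_reserve);
    default:
        return 0;
    }
}
```

With separate structures, adding a larger argument for a new command leaves the size of the existing ones unchanged, which is the ABI concern raised about the union.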
Horms
2006-May-18 03:37 UTC
Re: [Xen-devel] Re: [PATCH]: kexec: framework and i386 (Take IX)
On Wed, May 17, 2006 at 11:10:45AM +0100, Keir Fraser wrote:
>
> On 17 May 2006, at 10:52, Horms wrote:
>
> >as promised earlier in the day, here is an update on the kexec/kdump
> >patch. The main changes are that SMP now works, and the dumping of
> >cpu registers for kdump has been moved into the hypervisor so as to
> >allow all CPUs to be captured, not just dom0's VCPUs.
>
> Just looking at the generic patch:
>  * Define KEXEC_CMD_* in your public kexec.h header, not xen.h.
>  * Don't pack all the different arg structs into a union -- the union
>    will change in size if you ever add a bigger argument substructure,
>    plus it's ugly. Split them out and put a comment by each KEXEC_CMD_*
>    definition explaining what its argument parameter points at (see other
>    header files like vcpu.h for an example).
>  * Can you explain the need for all the changes in your kexec.patch? I
>    guess there are some virt_to_phys address translations that need fixing
>    up, but you also scatter a few hypercalls around in there (e.g., in
>    base/cpu.c) -- can they not be handled more cleanly, or is kexec-on-xen
>    somehow more special than kexec on any native architecture?

Hi Keir,

thanks for your suggestions, I'll address these and send a more detailed
reply a little later.

--
Horms http://www.vergenet.net/~horms/
Kazuo Moriwaka
2006-May-19 11:21 UTC
Re: [Xen-devel] Re: [PATCH]: kexec: framework and i386
Hi,

I have updated the dom0cut script to extract dom0 in both Xen core and
ELF core format. With an ELF core, you can use gdb without gdbserver-xen.

Typical usage is:

$ ./dom0cut.py -telf -f -dvmcore -odom0core -xxen-syms -vvmlinux-2.6.16
$ gdb vmlinux-2.6.16 dom0core

or

$ ./dom0cut.py -txen -f -dvmcore -odom0core -xxen-syms -vvmlinux-2.6.16
$ gdbserver-xen localhost:9999 --file dom0core ...

To build the ELF headers, it uses libelf
(http://directory.fsf.org/libs/misc/libelf.html) and SWIG
(http://www.swig.org/). I attach the patch and the libelf wrapper.

Todo:
- SMP support.
- Support for other architectures.

On 5/10/06, Kazuo Moriwaka <moriwaka@valinux.co.jp> wrote:
> Hi,
>
> I'll send a patch for the dom0 image extract (from kdump) script.
>
> Updates:
> - get register values in dom0 context from kdump's ELF Core PT_NOTE header.
> - get cr3 register value in context info from vcpu.
> - now you can look up vmlinux symbols from gdbserver-xen.
>
> Todo:
> - It supports only a single processor now.
> - ELF Core output.
>
> On 4/24/06, Kazuo Moriwaka <moriwaka@valinux.co.jp> wrote:
> > Hi,
> >
> > On 4/24/06, Ian Pratt <m+Ian.Pratt@cl.cam.ac.uk> wrote:
> > > > When a panic occurs, Linux kexec jumps into the preloaded
> > > > kdump kernel (if any). This kernel then reinitialises the
> > > > hardware, using its own device drivers and uses these to
> > > > write out the dump to disk. ISTR that the dump format is
> > > > currently ELF, although I remember some talk on the Fastboot
> > > > ML about adding some extra headers to make OS debugging easier.
> > >
> > > Are Xen and the dom0 kernel dumped as separate ELF cores?
> >
> > I'm working on clipping the domain image from a whole-machine dump
> > for x86_32 now. My prototype reads the ELF core and writes the dom0 image.
> >
> > todo:
> > - Output format is not ELF core yet. It is the Xen domain core image
> >   format (works with gdbserver-xen).
> > - register information does not work well.
> >
> > --
> > Kazuo Moriwaka

--
Kazuo Moriwaka
Hi,

sorry for the somewhat long delay between sending updates.
I'm happy to announce the tenth take of the kexec/kdump patch.
I'll address Keir's questions from the 9th release below,
but first I would like to quickly summarise the patches.

Kexec/kdump is implemented by moving the privileged portions
(and related plumbing where needed) from linux into the hypervisor.
This is primarily done by implementing kexec's architecture
independent hooks as hypercalls.

Kexec is working on both x86_32 and x86_64, for SMP and UP.
Kdump is also working for SMP and UP on x86_32. x86_64 may work,
but still needs more attention. In particular the register
saving code has not been implemented.

These patches also include some reworking of kexec's internals in
order that the page table is not mangled on kdump. These changes
also make x86_64 kexec/kdump somewhat easier to implement.
Collectively this is the pagetable_a approach developed by my colleague
Magnus Damm, and he is working with the linux kexec maintainers to
get it merged there.

The code is broken out into four patches.
They should apply cleanly to xen-unstable.hg 10151.

 1. 51.1-kexec-generic-upstream.patch
    * Common code for all architectures,
      the basic plumbing for kexec/kdump

 2. 51.2.1-kexec-x86-upstream.patch
    * Glue between 1, and 3 and 4.
      This would not be needed for ppc or ia64, but
      neither have been written yet.
      We are planning to commence work on ia64 soon.
    * Depends on 1

 3. 51.2.1.1-kexec-x86_32-upstream.patch
    * Kexec/kdump for x86_32
    * Depends on 2 (and 1)

 4. 51.2.31.2-kexec-x86_64-upstream.patch
    * Kexec/kdump for x86_64
    * Depends on 2 (and 1)

On Thu, May 18, 2006 at 12:37:54PM +0900, Horms wrote:
> On Wed, May 17, 2006 at 11:10:45AM +0100, Keir Fraser wrote:
> >
> > On 17 May 2006, at 10:52, Horms wrote:
> >
> > >as promised earlier in the day, here is an update on the kexec/kdump
> > >patch. The main changes are that SMP now works, and the dumping of
> > >cpu registers for kdump has been moved into the hypervisor so as to
> > >allow all CPUs to be captured, not just dom0's VCPUs.
> >
> > Just looking at the generic patch:
> > * Define KEXEC_CMD_* in your public kexec.h header, not xen.h.
> > * Don't pack all the different arg structs into a union -- the union
> > will change in size if you ever add a bigger argument substructure,
> > plus it's ugly. Split them out and put a comment by each KEXEC_CMD_*
> > definition explaining what its argument parameter points at (see other
> > header files like vcpu.h for an example).

I have changed both of these things.

> > * Can you explain the need for all the changes in your kexec.patch? I
> > guess there are some virt_to_phys address translations that need fixing
> > up, but you also scatter a few hypercalls around in there (e.g., in
> > base/cpu.c) -- can they not be handled more cleanly, or is kexec-on-xen
> > somehow more special than kexec on any native architecture?

Sure. There are several areas of change, I will address them one by one.
If I have missed any, please let me know.

* pfn vs mfn

  Linux kexec works in pfns, but as kexec needs to work in real mode,
  under Xen mfns are needed. This change should be fairly obvious,
  though more invasive than I would have liked.

* get_crash_notes

  When a kernel is loaded for kexec or kdump, part of the work is done in
  user-space. In particular the elf header is created in user-space, and
  it needs to know the location of the elf notes where the registers are
  saved on crash dump. As only xen knows about all the CPUs, the notes
  are handled by the hypervisor, and get_crash_notes() uses a hypercall
  to get the address of the notes, which is exposed to user-space as
  required by kexec-tools.

  It is worth noting that only dom0's vcpus are exposed to user-space,
  however the notes for all CPUs will be written by xen. In practice I
  strongly suspect that a customised tool will be needed to analyse
  crash dumps, well the xen specific parts anyway, and such a tool should
  be able to find the crash notes that are not in the elf header.
  Actually, I'm not really sure why the crash notes need to be in the elf
  header at all.

  In essence this code is really just there to keep kexec-tools happy and
  avoid having to modify it. To that end I am happy to say that neither
  kexec-tools nor the target kernel (crash or kexec kernel) needs to be
  modified in order to kexec or kdump from xen.

* xen_machine_kexec_load and xen_machine_kexec_unload

  It was originally hoped that the machine_kexec_prepare and
  machine_kexec_cleanup hooks could be used, however it turns out that
  the place that they are called from is not very useful for xen. Well,
  on x86_32 and x86_64 at least. So instead xen_machine_kexec_load and
  xen_machine_kexec_unload were added.

  xen_machine_kexec_load loads the kernel into xen. It is at this time
  that all the preparation work is done, leaving xen_machine_kexec as
  just a trigger.

  xen_machine_kexec_unload reverses the work of xen_machine_kexec_load.

--
Horms
http://www.vergenet.net/~horms/
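The "pfn vs mfn" point above can be made concrete with a small C sketch. It assumes a flat physical-to-machine table indexed by pfn; the names `ex_phys_to_machine`, `EX_PAGE_SHIFT` and the `p2m` parameter are hypothetical and are not the patch's or Xen's API. Only the page frame number is translated, the offset within the page is carried over unchanged.

```c
#include <assert.h>
#include <stddef.h>

#define EX_PAGE_SHIFT 12                       /* 4 KiB pages, as on x86 */
#define EX_PAGE_MASK  ((1UL << EX_PAGE_SHIFT) - 1)

/*
 * Translate a pseudo-physical address into a machine address using a
 * hypothetical flat table where p2m[pfn] holds the mfn for that pfn.
 */
unsigned long ex_phys_to_machine(const unsigned long *p2m,
                                 unsigned long paddr)
{
    unsigned long pfn, offset;

    assert(p2m != NULL);
    pfn    = paddr >> EX_PAGE_SHIFT;   /* which guest page */
    offset = paddr & EX_PAGE_MASK;     /* position inside that page */
    return (p2m[pfn] << EX_PAGE_SHIFT) | offset;
}
```

For example, with a table `{7, 3, 9}` the address 0x1234 lies in pfn 1, which maps to mfn 3, so the machine address is 0x3234.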
Akio Takebe
2006-Jun-05 02:53 UTC
Re: [Xen-devel] [PATCH] kexec: framework and i386 (Take X)
Hi, Horms

When I tried your patch, I got compile errors.
If you use xchg(), you need to use the return value of xchg().
The compile errors occur at the places marked below.

+void crash_kexec(struct cpu_user_regs *regs)
+{
+    int locked;
+
+    locked = xchg(&kexec_crash_lock, 1);
+    if (locked)
+        return;
+    __crash_kexec(regs);
+    xchg(&kexec_crash_lock, 0);   <-------------this one
+}
+
+static int get_crash_note(int vcpuid, XEN_GUEST_HANDLE(void) uarg)
+{
+    struct domain *domain = current->domain;
+    unsigned long crash_note;
+    struct vcpu *vcpu;
+    int locked;
+
+    if (vcpuid < 0 || vcpuid > MAX_VIRT_CPUS)
+        return -EINVAL;
+
+    if ( ! (vcpu = domain->vcpu[vcpuid]) )
+        return -EINVAL;
+
+    locked = xchg(&kexec_crash_lock, 1);
+    if (locked)
+    {
+        printk("do_kexec: (CMD_kexec_crash_note): dump is locked\n");
+        return -EFAULT;
+    }
+    crash_note = __pa((unsigned long)per_cpu(crash_notes, vcpu->processor));
+    xchg(&kexec_crash_lock, 0);   <-------------this one
+
+    if ( unlikely(copy_to_guest(uarg, &crash_note, 1) != 0) )
+    {
+        printk("do_kexec: (CMD_kexec_crash_note): copy_to_guest failed\n");
+        return -EFAULT;
+    }
+
+    return 0;
+}
+
+int do_kexec(unsigned long op, int arg1, XEN_GUEST_HANDLE(void) uarg)
+{
+    xen_kexec_image_t *image;
+    int locked;
+    int *image_set;
+    int status = -EINVAL;
+
+    if ( !IS_PRIV(current->domain) )
+        return -EPERM;
+
+    switch (op)
+    {
+    case KEXEC_CMD_kexec_crash_note:
+        return get_crash_note(arg1, uarg);
+    case KEXEC_CMD_kexec_reserve:
+        return get_reserve(uarg);
+    }
+
+    /* For all other ops, arg1 is the type of kexec, that is
+     * KEXEC_TYPE_DEFAULT or KEXEC_TYPE_CRASH */
+    if (arg1 == KEXEC_TYPE_CRASH)
+    {
+        image = &kexec_crash_image;
+        image_set = &kexec_crash_image_set;
+        locked = xchg(&kexec_crash_lock, 1);
+        if (locked)
+        {
+            printk("do_kexec: dump is locked\n");
+            return -EFAULT;
+        }
+    }
+    else
+    {
+        image = &kexec_image;
+        image_set = &kexec_image_set;
+    }
+
+    switch(op) {
+    case KEXEC_CMD_kexec:
+        BUG_ON(!*image_set);
+        status = __do_kexec(arg1, uarg, image);
+        break;
+    case KEXEC_CMD_kexec_load:
+        BUG_ON(*image_set);
+        if ( unlikely(copy_from_guest(image, uarg, 1) != 0) )
+        {
+            printk("do_kexec (CMD_kexec_load): copy_from_guest failed\n");
+            status = -EFAULT;
+            break;
+        }
+        *image_set = 1;
+        status = machine_kexec_load(arg1, image);
+        break;
+    case KEXEC_CMD_kexec_unload:
+        BUG_ON(!*image_set);
+        *image_set = 0;
+        machine_kexec_unload(arg1, image);
+        status = 0;
+        break;
+    }
+
+    if (arg1 == KEXEC_TYPE_CRASH)
+        xchg(&kexec_crash_lock, 0);   <-------------this one
+    return status;
+}
+

Best Regards,

Akio Takebe
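Separately from the xchg() warning, the range check in the quoted get_crash_note() may deserve a second look: it rejects vcpuid > MAX_VIRT_CPUS. If domain->vcpu[] has exactly MAX_VIRT_CPUS entries (an assumption, since the array declaration is not shown in this thread), an index equal to MAX_VIRT_CPUS would pass the check and read one element past the end. A minimal sketch of the stricter check, using hypothetical stand-in types (`ex_vcpu`, `ex_domain`, `EX_MAX_VIRT_CPUS`) rather than Xen's own:

```c
#include <stddef.h>

#define EX_MAX_VIRT_CPUS 32   /* hypothetical array size, for illustration */

struct ex_vcpu {
    int processor;
};

struct ex_domain {
    /* Valid indices are 0 .. EX_MAX_VIRT_CPUS - 1. */
    struct ex_vcpu *vcpu[EX_MAX_VIRT_CPUS];
};

/* Returns the vcpu for vcpuid, or NULL if out of range or not present. */
struct ex_vcpu *ex_lookup_vcpu(struct ex_domain *d, int vcpuid)
{
    if (vcpuid < 0 || vcpuid >= EX_MAX_VIRT_CPUS)   /* note >=, not > */
        return NULL;
    return d->vcpu[vcpuid];
}
```

Whether the original check is actually wrong depends on how the array is declared in the tree the patch applies to, so this is a point to verify rather than a confirmed bug.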
On Thu, May 25, 2006 at 04:20:19PM +0900, Horms wrote:
> Hi,
>
> sorry for the somewhat long delay between sending updates.
> I'm happy to announce the tenth take of the kexec/kdump patch.
> I'll address Keir's questions from the 9th release below,
> but first I would like to quickly summarise the patches.
>
> Kexec/kdump is implemented by moving the privileged portions
> (and related plumbing where needed) from linux into the hypervisor.
> This is primarily done by implementing kexec's architecture
> independent hooks as hypercalls.
>
> Kexec is working on both x86_32 and x86_64, for SMP and UP.
> Kdump is also working for SMP and UP on x86_32. x86_64 may work,
> but still needs more attention. In particular the register
> saving code has not been implemented.
>
> These patches also include some reworking of kexec's internals in
> order that the page table is not mangled on kdump. These changes
> also make x86_64 kexec/kdump somewhat easier to implement.
> Collectively this is the pagetable_a approach developed by my colleague
> Magnus Damm, and he is working with the linux kexec maintainers to
> get it merged there.
>
> The code is broken out into four patches.
> They should apply cleanly to xen-unstable.hg 10151.
>
> 1. 51.1-kexec-generic-upstream.patch
>    * Common code for all architectures,
>      the basic plumbing for kexec/kdump
>
> 2. 51.2.1-kexec-x86-upstream.patch
>    * Glue between 1, and 3 and 4.
>      This would not be needed for ppc or ia64, but
>      neither have been written yet.
>      We are planning to commence work on ia64 soon.
>    * Depends on 1
>
> 3. 51.2.1.1-kexec-x86_32-upstream.patch
>    * Kexec/kdump for x86_32
>    * Depends on 2 (and 1)
>
> 4. 51.2.31.2-kexec-x86_64-upstream.patch
>    * Kexec/kdump for x86_64
>    * Depends on 2 (and 1)

Hi,

here is a modest update to the kexec patches, broken out as per
the description above. The changes are:

* Kconfig: don't allow kexec to be built for a non-privileged domain,
  as this makes no sense at this time.

* Fix a gcc compilation error that became apparent in
  gcc (GCC) 4.1.2 20060604 (prerelease) (Debian 4.1.1-2).
  A warning is produced, and it causes the build to fail because -Werror
  is used when compiling the hypervisor. The warning relates to kexec's
  use of xchg as a simple locking mechanism: the return value is not
  always used, as it isn't always of any value.

* Record dom0's cr3 in vmcore for analysis by crash
  https://www.redhat.com/archives/crash-utility/2006-June/msg00015.html

* Forward port from xen-unstable.hg 10151 to 10352, which is the current
  tree at present. This involved fixing two minor diffing issues,
  nothing more.

--
Horms
http://www.vergenet.net/~horms/
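The xchg-as-lock pattern and the unused-result warning mentioned above can be sketched in portable C. This is only an illustration: it uses C11 `<stdatomic.h>` instead of Xen's xchg() macro, the `ex_` names are hypothetical, and it is not necessarily how the patch itself was fixed. Two ways of releasing the lock without discarding an expression result are shown.

```c
#include <assert.h>
#include <stdatomic.h>

/* 0 = free, 1 = held. File-scope atomics start out as zero. */
atomic_int ex_crash_lock;

/* Try to take the lock; returns 1 on success, 0 if it was already held. */
int ex_crash_trylock(void)
{
    /* The previous value tells us whether somebody else holds the lock. */
    return atomic_exchange(&ex_crash_lock, 1) == 0;
}

/* Release variant 1: keep using the exchange, but consume its result. */
void ex_crash_unlock(void)
{
    int was_held = atomic_exchange(&ex_crash_lock, 0);

    assert(was_held);   /* releasing a free lock would be a caller bug */
    (void)was_held;     /* keeps builds with NDEBUG free of warnings */
}

/* Release variant 2: a plain atomic store, which has no result to ignore. */
void ex_crash_unlock_store(void)
{
    atomic_store(&ex_crash_lock, 0);
}
```

A caller would mirror the quoted crash_kexec(): return early if `ex_crash_trylock()` fails, do the work, then call one of the unlock variants.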
Hi,

here is another modest update of the kexec patchset for kdump.

A brief summary of changes (all fairly minor):

* Forward port to xen-unstable-10650
* Move hypercall argument setup into machine specific code,
  otherwise it's going to get messy as more architectures are added
* Don't pass kimage to the kexec_unload sub-hypercall, as it's not needed
* Add ia64 stubs
* Use __FILE__ and __FUNCTION__ in stubs to make them less prone to error
* Add xen-console trigger crash_dump

The patches are currently:

 1. 51.1-kexec-generic-upstream.patch
    * Common code for all architectures,
      the basic plumbing for kexec/kdump

 2. 51.1.1-kexec-trigger_crash_dump.patch
    * xen-console trigger crash_dump
    * Depends on 1

 3. 51.2.1-kexec-x86-upstream.patch
    * Glue between 1, and 4 and 5.
      This would not be needed for ppc or ia64, but
      neither have been written yet.
      We are planning to commence work on ia64 soon.
    * Depends on 1

 4. 51.2.1.1-kexec-x86_32-upstream.patch
    * Kexec/kdump for x86_32
    * Depends on 3 (and 1)

 5. 51.2.31.2-kexec-x86_64-upstream.patch
    * Kexec/kdump for x86_64
    * Depends on 3 (and 1)

I also have some ia64 patches, but they are still not working or
complete, so I'll hold onto them for a bit longer. If anyone wants
them, let me know.

--
Horms
  H: http://www.vergenet.net/~horms/
  W: http://www.valinux.co.jp/en/
Hi,

here is a minor update of the kexec/kdump patchset.
The changes since the last post are quite minimal.

* Forward port to xen-unstable-11059 from 10650
* Add powerpc stubs to 51.1-kexec-generic-upstream.patch

The patches are currently:

 1. 51.1-kexec-generic-upstream.patch
    * Common code for all architectures,
      the basic plumbing for kexec/kdump

 2. 51.1.1-kexec-trigger_crash_dump.patch
    * xen-console trigger crash_dump
    * Depends on 1

 3. 51.2.1-kexec-x86-upstream.patch
    * Glue between 1, and 4 and 5.
      This would not be needed for ppc or ia64, but
      neither have been written yet.
      We are planning to commence work on ia64 soon.
    * Depends on 1

 4. 51.2.1.1-kexec-x86_32-upstream.patch
    * Kexec/kdump for x86_32
    * Depends on 3 (and 1)

 5. 51.2.31.2-kexec-x86_64-upstream.patch
    * Kexec/kdump for x86_64
    * Depends on 3 (and 1)

Things that are being worked on:

* Porting kexec for ia64. This is going somewhat slower than I had hoped,
  partly because of my own schedule, and partly because the Linux code is
  flakier than I previously thought. If anyone cares, the problem that
  is currently bothering me most about linux ia64 kexec is that you
  can usually kexec once, but twice doesn't work. e.g.

  linux --kexec--> linux: ok
  linux --kexec--> linux --kexec fails--> linux: not ok

* Kdump for x86_64. My colleague Magnus is working on this.
  But he is seeing a very strange problem where kdumping into
  a bzImage works, while a vmlinux does not. Please prod him
  if you want more details.

Things that would be good to work on:

* PPC port

--
Horms
  H: http://www.vergenet.net/~horms/
  W: http://www.valinux.co.jp/en/
Hi,

here is an update of the kexec/kdump patchset.

Summary:

* Forward port to xen-unstable.hg-11296 (45f6ee334fcc)
  - kexec hypercall number fragment is now in xen-unstable
* Make kexec_page_to_pfn and friends architecture specific
  - this abstraction is needed to support ia64
* Use kexec_page_to_pfn in machine_kexec_setup_load_arg()
  - this abstraction is needed to support ia64
* Rename do_kexec to do_kexec_op to make it consistent with other
  hypercalls
* Add ppc stubs
* Add ia64 support

Architectures:

x86_32:

Seems to be working fine

x86_64:

Probably working fine, but I can't test this as dom0 refuses to boot for
me on xen-unstable-11388 (50aea0ec406b). That is, even without the
kexec patches. I'm not sure what the problem is and I've decided to
get these patches out now and investigate later.

ia64:

This patchset also, for the first time, includes ia64 code.
Please note that this currently does _not_ work. I am actually
struggling to work out why, and would really appreciate it
if someone could cast an eye over it.

One possible area of concern is that relocate_kernel wipes out TLB
entries. However many of the entries instated in
arch/ia64/xen/xenasm.S:ia64_new_rr7() are not wiped. In particular,
VHPT_ADDR, Shared info, and Map mapped_reg are not handled by
relocate_kernel(), and the handling of current seems to be different.

There are also problems with constants inside kexec_fake_sal_rendez.
However this function probably also suffers the same problems as
relocate_kernel. And it is easy to avoid running kexec_fake_sal_rendez,
which is used in cpu shutdown, by booting xen with maxcpus=1.

ppc:

stubs only

Patches

 1. 51.1-kexec-generic-upstream.patch
    * Common code for all architectures,
      the basic plumbing for kexec/kdump

 2. 51.1.1-kexec-trigger_crash_dump.patch
    * xen-console trigger crash_dump
    * Depends on 1

 3. 51.2.1-kexec-x86-upstream.patch
    * Glue between 1, and 4 and 5.
    * Depends on 1

 4. 51.2.1.1-kexec-x86_32-upstream.patch
    * Kexec/kdump for x86_32
    * Depends on 3 (and 1)

 5. 51.2.31.2-kexec-x86_64-upstream.patch
    * Kexec/kdump for x86_64
    * Depends on 3 (and 1)

 6. 51.2.2-kexec-ia64-upstream.patch
    * Kexec/kdump for ia64
    * Depends on 1

Discussion:

Email is always good. Also my partner in crime, Magnus Damm,
will be at Xen Summit.

--
Horms
  H: http://www.vergenet.net/~horms/
  W: http://www.valinux.co.jp/en/
Akio Takebe
2006-Aug-31 08:55 UTC
[Xen-ia64-devel] Re: [PATCH] kexec: framework and i386 (Take XIV)
Hi, Horms and Magnus

Good work. :-)
I have one comment.

I believe crash_kexec should be called directly
when an unknown NMI occurs.
In your patch, crash_kexec is called as below:
 1. An unknown NMI occurs (e.g. by pushing the NMI button).
 2. xen receives the NMI and calls do_nmi.
 3. xen reports to dom0 by using raise_softirq(NMI_SOFTIRQ).
 4. dom0 calls its own crash_kexec.
 5. dom0's crash_kexec calls xen's crash_kexec.

Am I correct?
If so, the above process is not reliable.
So I believe xen's crash_kexec should be called directly, as in the
following patch.

diff -r 9611a5c9e1a1 xen/arch/x86/traps.c
--- a/xen/arch/x86/traps.c      Thu Aug 31 13:12:26 2006 +0900
+++ b/xen/arch/x86/traps.c      Thu Aug 31 17:40:19 2006 +0900
@@ -1612,6 +1612,7 @@ asmlinkage void do_nmi(struct cpu_user_r
         else if ( reason & 0x40 )
             io_check_error(regs);
         else if ( !nmi_watchdog )
+            crash_kexec(NULL);
             unknown_nmi_error((unsigned char)(reason&0xff));
     }
 }

What do you think about it?

Best Regards,

Akio Takebe

_______________________________________________
Xen-ia64-devel mailing list
Xen-ia64-devel@lists.xensource.com
http://lists.xensource.com/xen-ia64-devel
On Thu, Aug 31, 2006 at 05:55:52PM +0900, Akio Takebe wrote:
> Hi, Horms and Magnus
>
> Good work. :-)
> I have one comment.
>
> I believe crash_kexec should be called directly
> when an unknown NMI occurs.
> In your patch, crash_kexec is called as below:
> 1. An unknown NMI occurs (e.g. by pushing the NMI button).
> 2. xen receives the NMI and calls do_nmi.
> 3. xen reports to dom0 by using raise_softirq(NMI_SOFTIRQ).
> 4. dom0 calls its own crash_kexec.
> 5. dom0's crash_kexec calls xen's crash_kexec.
>
> Am I correct?
> If so, the above process is not reliable.
> So I believe xen's crash_kexec should be called directly, as in the
> following patch.
>
> diff -r 9611a5c9e1a1 xen/arch/x86/traps.c
> --- a/xen/arch/x86/traps.c      Thu Aug 31 13:12:26 2006 +0900
> +++ b/xen/arch/x86/traps.c      Thu Aug 31 17:40:19 2006 +0900
> @@ -1612,6 +1612,7 @@ asmlinkage void do_nmi(struct cpu_user_r
>          else if ( reason & 0x40 )
>              io_check_error(regs);
>          else if ( !nmi_watchdog )
> +            crash_kexec(NULL);
>              unknown_nmi_error((unsigned char)(reason&0xff));
>      }
>  }
>
> What do you think about it?

That seems like a good idea to me. Though I think you are missing { }.
Can you test to see if this works?

--- a/xen/arch/x86/traps.c      2006-09-01 11:53:44.000000000 +0900
+++ b/xen/arch/x86/traps.c      2006-09-01 11:53:56.000000000 +0900
@@ -1611,8 +1611,10 @@
             mem_parity_error(regs);
         else if ( reason & 0x40 )
             io_check_error(regs);
-        else if ( !nmi_watchdog )
+        else if ( !nmi_watchdog ) {
+            crash_kexec(NULL);
             unknown_nmi_error((unsigned char)(reason&0xff));
+        }
     }
 }
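The effect of the missing braces can be shown with a small self-contained C sketch. The counters below are hypothetical stand-ins for mem_parity_error(), io_check_error(), crash_kexec() and unknown_nmi_error(), and the 0x80 test for the parity case is an assumption, since only the 0x40 test is visible in the quoted hunk. The first function is written the way the compiler parses the unbraced version: the last call is not part of the else-if chain at all.

```c
/* Call counters standing in for the real handlers (hypothetical stubs). */
int ex_parity_calls, ex_io_calls, ex_crash_calls, ex_unknown_calls;

void ex_reset_counters(void)
{
    ex_parity_calls = ex_io_calls = ex_crash_calls = ex_unknown_calls = 0;
}

/* How the compiler sees the first patch, which has no braces. */
void ex_nmi_unbraced(unsigned int reason, int nmi_watchdog)
{
    if (reason & 0x80)
        ex_parity_calls++;
    else if (reason & 0x40)
        ex_io_calls++;
    else if (!nmi_watchdog)
        ex_crash_calls++;
    ex_unknown_calls++;        /* outside the chain: runs for every NMI */
}

/* The corrected version: both calls belong to the last branch. */
void ex_nmi_braced(unsigned int reason, int nmi_watchdog)
{
    if (reason & 0x80)
        ex_parity_calls++;
    else if (reason & 0x40)
        ex_io_calls++;
    else if (!nmi_watchdog) {
        ex_crash_calls++;
        ex_unknown_calls++;
    }
}
```

In the unbraced form a parity or I/O check NMI would also be reported as unknown, which is the behaviour change the braces avoid.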
Akio Takebe
2006-Sep-01 08:41 UTC
[Xen-ia64-devel] Re: [PATCH] kexec: framework and i386 (Take XIV)
Hi, Horms

>That seems like a good idea to me. Though I think you are missing { }.
>Can you test to see if this works?

Oops, you're right. But I think unknown_nmi_error() is not called,
because crash_kexec() is called before that.
Yes, I'll test it. :-)

>
>--- a/xen/arch/x86/traps.c      2006-09-01 11:53:44.000000000 +0900
>+++ b/xen/arch/x86/traps.c      2006-09-01 11:53:56.000000000 +0900
>@@ -1611,8 +1611,10 @@
>             mem_parity_error(regs);
>         else if ( reason & 0x40 )
>             io_check_error(regs);
>-        else if ( !nmi_watchdog )
>+        else if ( !nmi_watchdog ) {
>+            crash_kexec(NULL);
>             unknown_nmi_error((unsigned char)(reason&0xff));
>+        }
>     }
> }
>

Best Regards,

Akio Takebe
Akio Takebe
2006-Sep-01 08:45 UTC
[Xen-ia64-devel] Re: [Xen-devel] Re: [PATCH] kexec: framework and i386 (Take XIV)
>Hi, Horms
>
>>That seems like a good idea to me. Though I think you are missing { }.
>>Can you test to see if this works?
>Oops, you're right. But I think unknown_nmi_error() is not called,
>because crash_kexec() is called before that.

Sorry.
The above is right only in the case of CONFIG_KEXEC=y.

Best Regards,

Akio Takebe
Horms
2006-Sep-01 10:21 UTC
Re: [Xen-devel] Re: [PATCH] kexec: framework and i386 (Take XIV)
On Fri, Sep 01, 2006 at 05:45:59PM +0900, Akio Takebe wrote:
> >Hi, Horms
> >
> >>That seems like a good idea to me. Though I think you are missing { }.
> >>Can you test to see if this works?
> >Oops, you're right. But I think unknown_nmi_error() is not called,
> >because crash_kexec() is called before that.
> Sorry.
> The above is right only in the case of CONFIG_KEXEC=y.

Yes, I think that is the case. I will put your patch into the kexec
series, as I think that it is a worthy addition.

--
Horms
  H: http://www.vergenet.net/~horms/
  W: http://www.valinux.co.jp/en/
Akio Takebe
2006-Sep-04 21:45 UTC
[Xen-ia64-devel] Re: [Xen-devel] Re: [PATCH] kexec: framework and i386 (Take XIV)
Hi, Horms

I tested the following patch together with Horms' kexec patch.

My test:
    push the NMI button after loading the kdump kernel.

The result:
    OK, I could get a vmcore.

diff -r b688d4a68a3e xen/arch/x86/traps.c
--- a/xen/arch/x86/traps.c      Tue Aug 22 14:59:16 2006 +0100
+++ b/xen/arch/x86/traps.c      Tue Sep 05 06:37:49 2006 +0900
@@ -105,6 +105,8 @@ static int debug_stack_lines = 20;
 static int debug_stack_lines = 20;
 integer_param("debug_stack_lines", debug_stack_lines);
 
+extern void crash_kexec(struct cpu_user_regs *regs);
+
 #ifdef CONFIG_X86_32
 #define stack_words_per_line 8
 #define ESP_BEFORE_EXCEPTION(regs) ((unsigned long *)&regs->esp)
@@ -1611,8 +1613,10 @@ asmlinkage void do_nmi(struct cpu_user_r
             mem_parity_error(regs);
         else if ( reason & 0x40 )
             io_check_error(regs);
-        else if ( !nmi_watchdog )
+        else if ( !nmi_watchdog ){
+            crash_kexec(NULL);
             unknown_nmi_error((unsigned char)(reason&0xff));
+        }
     }
 }

Best Regards,

Akio Takebe
Kazuo Moriwaka
2006-Sep-05 11:43 UTC
[Xen-ia64-devel] Re: [Xen-devel] [PATCH] kexec: framework and i386 (Take XIV)
On 8/31/06, Horms <horms@verge.net.au> wrote:
> x86_64:
>
> Probably working fine, but I can't test this as dom0 refuses to boot for
> me on xen-unstable-11388 (50aea0ec406b). That is, even without the
> kexec patches. I'm not sure what the problem is and I've decided to
> get these patches out now and investigate later.

I tried some versions of xen with the kdump patches on x86_64;
the results follow. Sorry that this wasn't done systematically.

changeset   result
11414       doesn't boot
11251       doesn't boot
11134       doesn't boot
11076       boots

--
Kazuo Moriwaka
On Tue, Sep 05, 2006 at 08:43:44PM +0900, Kazuo Moriwaka wrote:
> On 8/31/06, Horms <horms@verge.net.au> wrote:
>
> >x86_64:
> >
> >Probably working fine, but I can't test this as dom0 refuses to boot for
> >me on xen-unstable-11388 (50aea0ec406b). That is, even without the
> >kexec patches. I'm not sure what the problem is and I've decided to
> >get these patches out now and investigate later.
>
> I tried some versions of xen with the kdump patches on x86_64;
> the results follow. Sorry that this wasn't done systematically.
>
> changeset   result
> 11414       doesn't boot
> 11251       doesn't boot
> 11134       doesn't boot
> 11076       boots

Thanks, that is valuable information. I am guessing that doing a
bisection between 11134 and 11076 would help shed some light on what has
gone astray.

--
Horms
  H: http://www.vergenet.net/~horms/
  W: http://www.valinux.co.jp/en/
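The bisection suggested above is a binary search over the changeset range. A minimal C sketch follows, assuming that changeset numbers are linear in the tree being tested, that the known-good end boots and the known-bad end does not, and that every changeset after the breaking one also fails. The function names and the sample predicate are hypothetical; in particular the breaking changeset used by the predicate (11100) is made up purely to exercise the code and says nothing about the real tree.

```c
#include <assert.h>

/* Callback answering "does dom0 boot at this changeset?" (1 = yes). */
typedef int (*ex_boots_fn)(int changeset);

/*
 * Return the first changeset in (good, bad] that fails to boot.
 * Each loop iteration halves the range, so the 58 changesets between
 * 11076 and 11134 need at most 6 test boots.
 */
int ex_first_bad_changeset(int good, int bad, ex_boots_fn boots)
{
    assert(good < bad);
    while (bad - good > 1) {
        int mid = good + (bad - good) / 2;

        if (boots(mid))
            good = mid;     /* breakage was introduced after mid */
        else
            bad = mid;      /* mid is already broken */
    }
    return bad;
}

/* Made-up predicate for illustration only: pretend 11100 broke booting. */
int ex_sample_boots(int changeset)
{
    return changeset < 11100;
}
```

In practice the predicate is a manual build-and-boot of the hypervisor and dom0 at the chosen changeset, so keeping the number of steps small is the main benefit.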
Horms
2007-May-28 05:28 UTC
Re: [Xen-devel] Re: [PATCH] kexec: framework and i386 (Take XIV)
[ Ian Campbell added to CC list ]

On Tue, Sep 05, 2006 at 06:45:35AM +0900, Akio Takebe wrote:
> Hi, Horms
>
> I tested the following patch together with Horms' kexec patch.
>
> My test:
>     push the NMI button after loading the kdump kernel.
>
> The result:
>     OK, I could get a vmcore.

Hi Takebe-san,

this patch seems ok to me, but it seems that it never went into the
tree. Ian, what are your thoughts on it?

> diff -r b688d4a68a3e xen/arch/x86/traps.c
> --- a/xen/arch/x86/traps.c      Tue Aug 22 14:59:16 2006 +0100
> +++ b/xen/arch/x86/traps.c      Tue Sep 05 06:37:49 2006 +0900
> @@ -105,6 +105,8 @@ static int debug_stack_lines = 20;
>  static int debug_stack_lines = 20;
>  integer_param("debug_stack_lines", debug_stack_lines);
>
> +extern void crash_kexec(struct cpu_user_regs *regs);
> +
>  #ifdef CONFIG_X86_32
>  #define stack_words_per_line 8
>  #define ESP_BEFORE_EXCEPTION(regs) ((unsigned long *)&regs->esp)
> @@ -1611,8 +1613,10 @@ asmlinkage void do_nmi(struct cpu_user_r
>              mem_parity_error(regs);
>          else if ( reason & 0x40 )
>              io_check_error(regs);
> -        else if ( !nmi_watchdog )
> +        else if ( !nmi_watchdog ){
> +            crash_kexec(NULL);
>              unknown_nmi_error((unsigned char)(reason&0xff));
> +        }
>      }
>  }
>
> Best Regards,
>
> Akio Takebe

--
Horms
  H: http://www.vergenet.net/~horms/
  W: http://www.valinux.co.jp/en/
Akio Takebe
2007-May-28 06:25 UTC
[Xen-ia64-devel] Re: [Xen-devel] Re: [PATCH] kexec: framework and i386 (Take XIV)
Hi, Horms and Ian

Thank you for your reply, Horms.
I forgot the Signed-off-by for the patch.

Signed-off-by: Horms <horms@verge.net.au>
Signed-off-by: Akio Takebe <takebe_akio@jp.fujitsu.com>

Is the Signed-off-by OK, Horms?

Best Regards,

Akio Takebe

>[ Ian Campbell added to CC list ]
>
>On Tue, Sep 05, 2006 at 06:45:35AM +0900, Akio Takebe wrote:
>> Hi, Horms
>>
>> I tested the following patch with Horms' kexec patch.
>>
>> My test is:
>> push the NMI button after loading the kdump kernel.
>>
>> The result is:
>> OK, I could get a vmcore
>
>
>Hi Takebe-san,
>
>this patch seems ok to me, but it seems that it never went into the
>tree. Ian, what are your thoughts on it?
>
>> diff -r b688d4a68a3e xen/arch/x86/traps.c
>> --- a/xen/arch/x86/traps.c Tue Aug 22 14:59:16 2006 +0100
>> +++ b/xen/arch/x86/traps.c Tue Sep 05 06:37:49 2006 +0900
>> @@ -105,6 +105,8 @@ static int debug_stack_lines = 20;
>>  static int debug_stack_lines = 20;
>>  integer_param("debug_stack_lines", debug_stack_lines);
>>
>> +extern void crash_kexec(struct cpu_user_regs *regs);
>> +
>>  #ifdef CONFIG_X86_32
>>  #define stack_words_per_line 8
>>  #define ESP_BEFORE_EXCEPTION(regs) ((unsigned long *)&regs->esp)
>> @@ -1611,8 +1613,10 @@ asmlinkage void do_nmi(struct cpu_user_r
>>              mem_parity_error(regs);
>>          else if ( reason & 0x40 )
>>              io_check_error(regs);
>> -        else if ( !nmi_watchdog )
>> +        else if ( !nmi_watchdog ){
>> +            crash_kexec(NULL);
>>              unknown_nmi_error((unsigned char)(reason&0xff));
>> +        }
>>      }
>>  }
>>
>> Best Regards,
>>
>> Akio Takebe

_______________________________________________
Xen-ia64-devel mailing list
Xen-ia64-devel@lists.xensource.com
http://lists.xensource.com/xen-ia64-devel
Horms
2007-May-29 01:05 UTC
Re: [Xen-devel] Re: [PATCH] kexec: framework and i386 (Take XIV)
On Mon, May 28, 2007 at 03:25:04PM +0900, Akio Takebe wrote:
> Hi, Horms and Ian
>
> Thank you for your reply, Horms.
> I forgot the Signed-off-by for the patch.
>
> Signed-off-by: Horms <horms@verge.net.au>
> Signed-off-by: Akio Takebe <takebe_akio@jp.fujitsu.com>
>
> Is the Signed-off-by OK, Horms?

Actually, I think this might be better:

Acked-by: Simon Horman <horms@verge.net.au>
Ian Campbell
2007-May-29 09:04 UTC
Re: [Xen-devel] Re: [PATCH] kexec: framework and i386 (Take XIV)
On Mon, 2007-05-28 at 14:28 +0900, Horms wrote:
> [ Ian Campbell added to CC list ]
>
> On Tue, Sep 05, 2006 at 06:45:35AM +0900, Akio Takebe wrote:
> > Hi, Horms
> >
> > I tested the following patch with Horms' kexec patch.
> >
> > My test is:
> > push the NMI button after loading the kdump kernel.
> >
> > The result is:
> > OK, I could get a vmcore
>
>
> Hi Takebe-san,
>
> this patch seems ok to me, but it seems that it never went into the
> tree. Ian, what are your thoughts on it?

The default in non-debug builds is to forward the crash to domain 0, so
we'd never get here, although I'd expect domain 0 probably does a kdump
itself nowadays when an NMI is received. For debug builds I guess it
does make sense.

Assuming crash_kexec gracefully returns if no crash kernel has been
loaded, so that the old behaviour is preserved, then the behaviour would
be fine with me. Alternatively "nmi=kdump" on the command line might be
nice.

> > +extern void crash_kexec(struct cpu_user_regs *regs);

I can't find crash_kexec in xen-unstable. Is it now kexec_crash, with no
parameters? Whatever the function is now called, it should probably be
declared in a header somewhere, so no local prototype is required.

> > + else if ( !nmi_watchdog ){

Needs a space between ) and {.

Ian.
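The condition Ian asks for (the crash entry point returning harmlessly when no crash kernel is loaded, so the old unknown-NMI report still happens) can be sketched as below. This is only an illustration under stated assumptions: crash_image_loaded, crash_kexec_sketch() and unknown_nmi_sketch() are hypothetical names invented here, not Xen symbols, and the real jump into a crash kernel would never return, whereas this stand-in only bumps a counter.

```c
#include <stdbool.h>

/* Hypothetical stand-in for "a crash kernel image has been loaded". */
static bool crash_image_loaded = false;

/* Counters so the sketch can be observed without rebooting anything. */
static int crash_jumps = 0;
static int unknown_nmi_reports = 0;

/*
 * Sketch of a crash entry point that is a no-op when nothing is loaded.
 * A real implementation would not return after jumping to the crash
 * kernel; here the jump is modelled by incrementing a counter.
 */
static bool crash_kexec_sketch(void)
{
    if ( !crash_image_loaded )
        return false;       /* nothing loaded: caller keeps old behaviour */
    crash_jumps++;          /* stand-in for entering the crash kernel */
    return true;
}

/* Sketch of the unknown-NMI branch, with braces around both statements. */
static void unknown_nmi_sketch(void)
{
    if ( crash_kexec_sketch() )
        return;             /* the crash kernel took over */
    unknown_nmi_reports++;  /* old behaviour: report the unknown NMI */
}
```

With no image loaded, a call to unknown_nmi_sketch() only increments unknown_nmi_reports, which is the "old behaviour is preserved" case.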
Akio Takebe
2007-May-31 10:43 UTC
[Xen-ia64-devel] Re: [Xen-devel] Re: [PATCH] kexec: framework and i386 (Take XIV)
Hi, Ian and Horms

I added the nmi=kdump option as Ian suggested.
What do you think about it?

Signed-off-by: Akio Takebe <takebe_akio@jp.fujitsu.com>

---

diff -r 089696e0c603 xen/arch/x86/traps.c
--- a/xen/arch/x86/traps.c Thu May 17 11:42:46 2007 +0100
+++ b/xen/arch/x86/traps.c Thu May 31 02:25:02 2007 +0900
@@ -1897,6 +1897,7 @@ asmlinkage void io_check_error(struct cp
     {
     case 'd': /* 'dom0' */
         nmi_dom0_report(_XEN_NMIREASON_io_error);
+    case 'k': /* 'kdump' */
     case 'i': /* 'ignore' */
         break;
     default:  /* 'fatal' */
@@ -1916,6 +1917,8 @@ static void unknown_nmi_error(unsigned c
     {
     case 'd': /* 'dom0' */
         nmi_dom0_report(_XEN_NMIREASON_unknown);
+    case 'k': /* 'kdump' */
+        kexec_crash();
     case 'i': /* 'ignore' */
         break;
     default:  /* 'fatal' */

Best Regards,

Akio Takebe
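Reading the second hunk as posted, the 'd' case has no break of its own, so with the new 'k' case placed directly after it, an nmi=dom0 setting would appear to fall through into kexec_crash() as well. A minimal sketch of a dispatch where each option character maps to exactly one action follows. The enum and the function name are hypothetical and exist only for illustration; they are not part of the patch or of Xen.

```c
/* Hypothetical outcomes, used only so the sketch is observable. */
enum nmi_action { NMI_IGNORE, NMI_REPORT_DOM0, NMI_KDUMP, NMI_FATAL };

/*
 * Map the first character of an "nmi=" option to a single action.
 * Every case returns, so no case can fall through into the next one.
 */
static enum nmi_action unknown_nmi_action(char opt)
{
    switch ( opt )
    {
    case 'd': /* 'dom0' */
        return NMI_REPORT_DOM0;
    case 'k': /* 'kdump' */
        return NMI_KDUMP;
    case 'i': /* 'ignore' */
        return NMI_IGNORE;
    default:  /* 'fatal' */
        return NMI_FATAL;
    }
}
```

Returning a value per case, rather than relying on a shared break, keeps the 'dom0' and 'kdump' paths separate however the cases are ordered.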
Keir Fraser
2007-May-31 10:49 UTC
Re: [Xen-devel] Re: [PATCH] kexec: framework and i386 (Take XIV)
On 31/5/07 11:43, "Akio Takebe" <takebe_akio@jp.fujitsu.com> wrote:

> Hi, Ian and Horms
>
> I add the nmi=kdump option as Ian suggested.
> What do you think about it?

Won't the default fatal_trap() behaviour cause you to drop into kdump code
anyway? fatal_trap -> panic -> kexec_crash.

 -- Keir
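The call chain Keir describes can be pictured with the toy trace below. It only models the order of calls he gives (fatal_trap -> panic -> kexec_crash); the *_sketch functions, the note() helper and the trace buffer are hypothetical stand-ins written for this illustration, and nothing here is taken from the Xen sources.

```c
#include <assert.h>
#include <string.h>

/* Records the order in which the stand-in functions are entered. */
static char trace[64];

/* Append one step to the trace, separated by " -> ". */
static void note(const char *step)
{
    /* Guard against overflowing the fixed-size buffer. */
    assert(strlen(trace) + strlen(step) + 5 < sizeof(trace));
    if ( trace[0] != '\0' )
        strcat(trace, " -> ");
    strcat(trace, step);
}

/* Stand-in for the crash entry point at the end of the chain. */
static void kexec_crash_sketch(void)
{
    note("kexec_crash");
}

/* Stand-in for panic, which hands over to the crash entry point. */
static void panic_sketch(void)
{
    note("panic");
    kexec_crash_sketch();
}

/* Stand-in for the default "fatal" NMI handling. */
static void fatal_trap_sketch(void)
{
    note("fatal_trap");
    panic_sketch();
}
```

If the real path behaves as Keir says, the default "fatal" setting already ends in the kdump code, which is why a separate 'kdump' option adds nothing.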
Akio Takebe
2007-May-31 11:07 UTC
[Xen-ia64-devel] Re: [Xen-devel] Re: [PATCH] kexec: framework and i386 (Take XIV)
Hi, Keir

>On 31/5/07 11:43, "Akio Takebe" <takebe_akio@jp.fujitsu.com> wrote:
>
>> Hi, Ian and Horms
>>
>> I add the nmi=kdump option as Ian suggested.
>> What do you think about it?
>
>Won't the default fatal_trap() behaviour cause you to drop into kdump code
>anyway? fatal_trap -> panic -> kexec_crash.
>

Oops, you're right.
All we need to do is set nmi=kdump.

Best Regards,

Akio Takebe
Akio Takebe
2007-May-31 11:17 UTC
[Xen-ia64-devel] Re: [Xen-devel] Re: [PATCH] kexec: framework and i386 (Take XIV)
Hi, Keir

>Hi, Keir
>
>>On 31/5/07 11:43, "Akio Takebe" <takebe_akio@jp.fujitsu.com> wrote:
>>
>>> Hi, Ian and Horms
>>>
>>> I add the nmi=kdump option as Ian suggested.
>>> What do you think about it?
>>
>>Won't the default fatal_trap() behaviour cause you to drop into kdump code
>>anyway? fatal_trap -> panic -> kexec_crash.
>>
>Oops, you're right.
>All we need to do is set nmi=kdump.
>

Sorry, please ignore the previous mail.
Yes, as Keir said, fatal_trap() should call panic.
All we need to do is set nmi=fatal.

Best Regards,

Akio Takebe