Ever since c/s 13829, the native (32-bit -> 32-bit) call to invoke the secondary kernel has been missing its fourth argument. Apparently this worked out because the respective stack location was non-zero.

Starting with Linux 2.6.27 (32-bit) and 2.6.30 (64-bit) a new argument is being expected by the secondary kernel, and again apparently out of pure luck the 64-bit -> 64-bit case still appears to work for those of our customers who want to use it.

The question really is whether this code has ever been tested with sufficiently recent kernels in all three variants (32->32, 64->64, and 64->32).

While it seems that putting together a patch to address this shouldn't be that difficult, a second question is how we can avoid getting into the same situation again when Linux extends the protocol again.

Thanks, Jan

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel
Jan Beulich writes ("[Xen-devel] kexec woes with 32-bit secondary kernel"):
> The question really is whether this code has ever been tested
> with sufficiently recent kernels in all three variants (32->32, 64->64,
> and 64->32).

I don't think it gets much testing.

> While it seems that putting together a patch to address this
> shouldn't be that difficult, a second question is how we can avoid
> getting into the same situation again when Linux extends the
> protocol again.

How is one Linux kernel supposed to safely kexec another, potentially different-version, Linux kernel? We should use whatever mechanism Linux upstream use.

Now you're going to tell me that Linux upstream haven't provided a way for that to work properly :-/

Ian.
Ian Campbell
2010-Sep-17 16:57 UTC
Re: [Xen-devel] kexec woes with 32-bit secondary kernel
On Fri, 2010-09-17 at 16:49 +0100, Jan Beulich wrote:
> Ever since c/s 13829, the native (32-bit -> 32-bit) call to invoke the
> secondary kernel has been missing its fourth argument. Apparently
> this worked out because the respective stack location was non-zero.

Which argument is this?

> Starting with Linux 2.6.27 (32-bit) and 2.6.30 (64-bit) a new
> argument is being expected by the secondary kernel, and again
> apparently out of pure luck the 64-bit -> 64-bit case still appears
> to work for those of our customers who want to use it.
>
> The question really is whether this code has ever been tested
> with sufficiently recent kernels in all three variants (32->32, 64->64,
> and 64->32).

It gets pretty regular testing in XenServer and XCP in the 32on64->32native variant. This works at least with the 2.6.27 and 2.6.32 domain 0 kernels used in those two situations.

I can't speak for any testing done elsewhere though. I suspect that other than what you guys do there isn't that much of it.

> While it seems that putting together a patch to address this
> shouldn't be that difficult, a second question is how we can avoid
> getting into the same situation again when Linux extends the
> protocol again.

I've always thought that the hypercall interface is rather too closely modelled on internals of a particular implementation from a particular version of Linux. On the other hand I'm not sure I have any better ideas.

Ian.
>>> On 17.09.10 at 18:33, Ian Jackson <Ian.Jackson@eu.citrix.com> wrote:
>> While it seems that putting together a patch to address this
>> shouldn't be that difficult, a second question is how we can avoid
>> getting into the same situation again when Linux extends the
>> protocol again.
>
> How is one Linux kernel supposed to safely kexec another, potentially
> different-version, Linux kernel ? We should use whatever mechanism
> Linux upstream use.

That's not an issue afaict: The trampoline code and the code calling it live in the same kernel, and hence no compatibility is needed between kernel versions. In the Xen case, the calling code is Xen's, while the called code comes from the kernel.

Jan
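[Editorial illustration, not part of the original message.] The caller/callee split described above can be modelled in a few lines of C. This is only a sketch under stated assumptions: it does not call through a mismatched function pointer (that would be undefined behaviour), but represents the 32-bit argument area as an array of slots. All names here (frame_t, caller_three_args, callee_sees_pae, and so on) are hypothetical and come from neither Xen nor Linux.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of the argument area a 32-bit trampoline reads:
 * slot 0 is the first argument, slot 3 the fourth, slot 4 the fifth. */
#define FRAME_SLOTS 5

typedef struct {
    uint32_t slot[FRAME_SLOTS];
} frame_t;

/* Fill every slot with a value standing in for stale stack contents. */
static void frame_init_stale(frame_t *f, uint32_t stale)
{
    assert(f != NULL);
    for (int i = 0; i < FRAME_SLOTS; i++)
        f->slot[i] = stale;
}

/* A caller built against a three-argument prototype: slots 3 and 4 are
 * never written, so they keep whatever was there before. */
static void caller_three_args(frame_t *f, uint32_t indirection_page,
                              uint32_t page_list, uint32_t start_address)
{
    f->slot[0] = indirection_page;
    f->slot[1] = page_list;
    f->slot[2] = start_address;
}

/* A caller built against the five-argument prototype. */
static void caller_five_args(frame_t *f, uint32_t indirection_page,
                             uint32_t page_list, uint32_t start_address,
                             uint32_t cpu_has_pae, uint32_t preserve_context)
{
    caller_three_args(f, indirection_page, page_list, start_address);
    f->slot[3] = cpu_has_pae;
    f->slot[4] = preserve_context;
}

/* The callee's view: it reads slots 3 and 4 whether or not the caller
 * filled them in. */
static int callee_sees_pae(const frame_t *f)
{
    return f->slot[3] != 0;
}

static int callee_sees_preserve_context(const frame_t *f)
{
    return f->slot[4] != 0;
}
```

With non-zero stale contents, the three-argument caller makes the callee see both flags as set; with zeroed contents it sees both as clear. In this model, whether a too-short call "works" depends entirely on what happened to be in those slots beforehand.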
>>> On 17.09.10 at 18:57, Ian Campbell <Ian.Campbell@citrix.com> wrote:
> On Fri, 2010-09-17 at 16:49 +0100, Jan Beulich wrote:
>> Ever since c/s 13829, the native (32-bit -> 32-bit) call to invoke the
>> secondary kernel has been missing its fourth argument. Apparently
>> this worked out because the respective stack location was non-zero.
>
> Which argument is this?

The cpu_has_pae one.

>> Starting with Linux 2.6.27 (32-bit) and 2.6.30 (64-bit) a new
>> argument is being expected by the secondary kernel, and again
>> apparently out of pure luck the 64-bit -> 64-bit case still appears
>> to work for those of our customers who want to use it.
>>
>> The question really is whether this code has ever been tested
>> with sufficiently recent kernels in all three variants (32->32, 64->64,
>> and 64->32).
>
> It gets pretty regular testing in XenServer and XCP in the
> 32on64->32native variant. This works at least with the 2.6.27 and 2.6.32
> domain 0 kernels used in those two situations.

Hmm, that contradicts what we got told: Neither 32->32native nor 32on64->32native work. But surely it working for you can be a simple matter of luck with the compiler version you're using (pretty likely different from the ones used for SLE).

> I can't speak for any testing done elsewhere though. I suspect that
> other than what you guys do there isn't that much of it.
>
>> While it seems that putting together a patch to address this
>> shouldn't be that difficult, a second question is how we can avoid
>> getting into the same situation again when Linux extends the
>> protocol again.
>
> I've always thought that the hypercall interface is rather too closely
> modelled on internals of a particular implementation from a particular
> version of Linux. On the other hand I'm not sure I have any better
> ideas.

Yeah, I agree with both parts. Probably some sort of signature of the called code would have helped.

Jan
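[Editorial illustration, not part of the original message.] One way to picture the "signature of the called code" idea is a small header that the trampoline would carry and the caller would check before jumping. Nothing like this exists in the kexec interface discussed in this thread; the magic value, the kexec_sig_t layout, and kexec_check_signature() below are all invented for the sketch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical magic value ("KXEC" as ASCII bytes); not a real constant. */
#define KEXEC_SIG_MAGIC 0x4b584543u

/* Hypothetical header the called code would place at a known location. */
typedef struct {
    uint32_t magic;            /* identifies a signed trampoline        */
    uint32_t protocol_version; /* bumped whenever the arguments change  */
    uint32_t nr_args;          /* number of arguments the code expects  */
} kexec_sig_t;

enum {
    KEXEC_SIG_OK          =  0,
    KEXEC_SIG_MISSING     = -1, /* no recognisable header               */
    KEXEC_SIG_UNSUPPORTED = -2  /* header found, version not understood */
};

/* Validate the header at the start of `page` (of `len` bytes) against the
 * newest protocol version the caller knows how to speak. */
static int kexec_check_signature(const void *page, size_t len,
                                 uint32_t max_supported_version,
                                 kexec_sig_t *out)
{
    kexec_sig_t sig;

    assert(page != NULL && out != NULL);

    if (len < sizeof(sig))
        return KEXEC_SIG_MISSING;

    /* memcpy avoids any alignment assumption about `page`. */
    memcpy(&sig, page, sizeof(sig));

    if (sig.magic != KEXEC_SIG_MAGIC)
        return KEXEC_SIG_MISSING;
    if (sig.protocol_version == 0 ||
        sig.protocol_version > max_supported_version)
        return KEXEC_SIG_UNSUPPORTED;

    *out = sig;
    return KEXEC_SIG_OK;
}
```

A caller along these lines could refuse to jump (or fall back to an older argument layout) when the version is unknown, instead of relying on stack contents happening to be suitable.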
>>> On 17.09.10 at 17:49, "Jan Beulich" <JBeulich@novell.com> wrote:
> Ever since c/s 13829, the native (32-bit -> 32-bit) call to invoke the
> secondary kernel has been missing its fourth argument. Apparently
> this worked out because the respective stack location was non-zero.
>
> Starting with Linux 2.6.27 (32-bit) and 2.6.30 (64-bit) a new
> argument is being expected by the secondary kernel, and again
> apparently out of pure luck the 64-bit -> 64-bit case still appears
> to work for those of our customers who want to use it.
>
> The question really is whether this code has ever been tested
> with sufficiently recent kernels in all three variants (32->32, 64->64,
> and 64->32).
>
> While it seems that putting together a patch to address this
> shouldn't be that difficult, a second question is how we can avoid
> getting into the same situation again when Linux extends the
> protocol again.

Below a tentative, untested patch.

Jan

--- 2010-09-20.orig/xen/arch/x86/machine_kexec.c
+++ 2010-09-20/xen/arch/x86/machine_kexec.c
@@ -23,7 +23,11 @@
 typedef void (*relocate_new_kernel_t)(
                 unsigned long indirection_page,
                 unsigned long *page_list,
-                unsigned long start_address);
+                unsigned long start_address,
+#ifdef __i386__
+                unsigned int cpu_has_pae,
+#endif
+                unsigned int preserve_context);
 
 extern int machine_kexec_get_xen(xen_kexec_range_t *range);
 
@@ -121,7 +125,11 @@ void machine_kexec(xen_kexec_image_t *im
 
         rnk = (relocate_new_kernel_t) image->page_list[1];
         (*rnk)(image->indirection_page, image->page_list,
-               image->start_address);
+               image->start_address,
+#ifdef __i386__
+               1 /* cpu_has_pae */,
+#endif
+               0 /* preserve_context */);
     }
 }
 
--- 2010-09-20.orig/xen/arch/x86/x86_64/compat_kexec.S
+++ 2010-09-20/xen/arch/x86/x86_64/compat_kexec.S
@@ -119,6 +119,7 @@ compatibility_mode:
         movl %eax, %ss
 
         /* Push arguments onto stack. */
+        pushl $0    /* 20(%esp) - preserve context */
         pushl $1    /* 16(%esp) - cpu has pae */
         pushl %ecx  /* 12(%esp) - start address */
         pushl %edx  /*  8(%esp) - page list */
--- 2010-09-20.orig/xen/include/asm-x86/cpufeature.h
+++ 2010-09-20/xen/include/asm-x86/cpufeature.h
@@ -139,7 +139,6 @@
 #define cpu_has_de   boot_cpu_has(X86_FEATURE_DE)
 #define cpu_has_pse  boot_cpu_has(X86_FEATURE_PSE)
 #define cpu_has_tsc  boot_cpu_has(X86_FEATURE_TSC)
-#define cpu_has_pae  boot_cpu_has(X86_FEATURE_PAE)
 #define cpu_has_pge  boot_cpu_has(X86_FEATURE_PGE)
 #define cpu_has_pat  boot_cpu_has(X86_FEATURE_PAT)
 #define cpu_has_apic boot_cpu_has(X86_FEATURE_APIC)
@@ -165,7 +164,6 @@
 #define cpu_has_de   1
 #define cpu_has_pse  1
 #define cpu_has_tsc  1
-#define cpu_has_pae  1
 #define cpu_has_pge  1
 #define cpu_has_pat  1
 #define cpu_has_apic boot_cpu_has(X86_FEATURE_APIC)
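[Editorial illustration, not part of the original message.] The offset comments in the compat_kexec.S hunk follow from pushing the arguments in reverse order onto a downward-growing stack of 4-byte words. The sketch below simulates that in C. The stack32_t model and its helpers are hypothetical. It also assumes that the two pushes not shown in the hunk are the indirection page at 4(%esp) and a return-address slot at 0(%esp); the quoted diff stops at the page list, so that part is an assumption.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define STACK_WORDS 16

/* Hypothetical model of a 32-bit stack: sp indexes the current top word,
 * and pushing moves it toward lower indices (lower addresses). */
typedef struct {
    uint32_t mem[STACK_WORDS];
    unsigned sp;
} stack32_t;

static void stack_init(stack32_t *s)
{
    assert(s != NULL);
    s->sp = STACK_WORDS; /* empty stack */
}

/* Equivalent of "pushl value". */
static void pushl(stack32_t *s, uint32_t value)
{
    assert(s->sp > 0);
    s->mem[--s->sp] = value;
}

/* Read the word at byte offset `off` from the stack pointer: off(%esp). */
static uint32_t at_esp(const stack32_t *s, unsigned off)
{
    assert(off % 4 == 0 && s->sp + off / 4 < STACK_WORDS);
    return s->mem[s->sp + off / 4];
}

/* Push the arguments in the order used by the patched hunk. The last two
 * pushes are assumed, since they are not visible in the quoted diff. */
static void push_kexec_args(stack32_t *s, uint32_t indirection_page,
                            uint32_t page_list, uint32_t start_address,
                            uint32_t return_slot)
{
    pushl(s, 0);                /* 20(%esp) - preserve context          */
    pushl(s, 1);                /* 16(%esp) - cpu has pae               */
    pushl(s, start_address);    /* 12(%esp) - start address             */
    pushl(s, page_list);        /*  8(%esp) - page list                 */
    pushl(s, indirection_page); /*  4(%esp) - assumed: indirection page */
    pushl(s, return_slot);      /*  0(%esp) - assumed: return address   */
}
```

Reading the simulated stack back with at_esp() at offsets 8, 12, 16 and 20 yields the page list, the start address, 1 and 0 respectively, matching the comments in the hunk.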