As I became aware of only now (by way of a bug report), Linux side kexec changes on i386 silently broke assumptions in the kexec hypercall interface. Therefore I''d like to get an understanding on what the general perspective on the following problems is: 1) machine_kexec_load() silently assumes that page_list[] has alternating (and paired) physical/virtual entries. This is no longer valid with 2.6.27, and while the immediate fix is to re-arrange the list entries so that the newly added entry would end up after all pairs (in 2.6.29 an extra gap will need to be added), this points out that the interface definition itself is really flawed. 2) KEXEC_XEN_NO_PAGES is set to the apparently arbitrary value of 17. While a build error would clearly show any incompatible Linux side change in this case, it still seems bogus to hard-code this Linux defined value into the hypercall interface. 3) machine_kexec_load() blindly iterates over all page_list[] entries, regardless of whether any of them is zero, and hence establishes (currently on 32-bit only) numerous bogus page table entries mapping mfn 0. Thanks for any comments, Jan _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
On Thu, Mar 12, 2009 at 02:49:50PM +0000, Jan Beulich wrote:> As I became aware of only now (by way of a bug report), Linux side kexec > changes on i386 silently broke assumptions in the kexec hypercall interface. > Therefore I''d like to get an understanding on what the general perspective > on the following problems is: > > 1) machine_kexec_load() silently assumes that page_list[] has alternating > (and paired) physical/virtual entries. This is no longer valid with 2.6.27, and > while the immediate fix is to re-arrange the list entries so that the newly > added entry would end up after all pairs (in 2.6.29 an extra gap will need > to be added), this points out that the interface definition itself is really > flawed. > > 2) KEXEC_XEN_NO_PAGES is set to the apparently arbitrary value of 17. > While a build error would clearly show any incompatible Linux side change > in this case, it still seems bogus to hard-code this Linux defined value into > the hypercall interface. > > 3) machine_kexec_load() blindly iterates over all page_list[] entries, > regardless of whether any of them is zero, and hence establishes > (currently on 32-bit only) numerous bogus page table entries mapping > mfn 0.Hi Jan, sorry for not replying earlier. I have CCed Magnus Damm who along with myself was involved in the original port of kexec to Xen. I must confess that I am bit rusty on the details, but as I recall the premise of the original design of our port was to allow the existing kexec functionality to work inside Xen. Not great deal of thought was given to to anticipating ways that kexec might change in the future. In reference to questions 1) and 3), all I can really respond with is that is how kexec worked at that time (IIRC), so it seemed to be logical for xen to read the data in that way. If its not now, then I am more than happy to help work towards a better solution. In reference to question 2), the number 17 is derived from include/asm-x86_64/kexec.h in the linux-xen 2.6.18 tree as x86_64 has the highest value for PAGES_NR. I agree that having this number included in the hypercall interface isn''t ideal. -- Simon Horman VA Linux Systems Japan K.K., Sydney, Australia Satellite Office H: www.vergenet.net/~horms/ W: www.valinux.co.jp/en _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
>>> Magnus Damm <magnus.damm@gmail.com> 16.03.09 12:19 >>> >So the code assumes that PA_.. are even and VA_.. are odd. The defines >could have been copied over to make more readable code, yes, or we >could have shared header files somehow. In the end the hypervisor >assumes the interface remains unchanged. The addresses are modified so >the assembly snipped can run from hypervisor address space. > >What is the real problem? Just that the interface has changed?The issue is that the Linux side changed without having any way to recognize resulting problems apart from running into them. If a lower software layer gets designed based on implementation details of a higher layer, there should at least be build time checks that make sure the upper layer doesn''t change in incompatible ways (and yes, a few checks are in that code, but their coverage is rather limited). But I view the issue as broader: Any other OS wanting to make use of the kexec hypercall interface would be required to match the Linux implementation in various respects. This is what I consider a design flaw, which makes me think that the current kexec sub-hypercalls should be urgently deprecated in favor of a clean interface.>>> 3) machine_kexec_load() blindly iterates over all page_list[] entries, >>> regardless of whether any of them is zero, and hence establishes >>> (currently on 32-bit only) numerous bogus page table entries mapping >>> mfn 0. > >I don''t remember if this was how things were in our original >implementation, or if it is a side effect of upstream linux moving. >Are the page table mappings bad? They may be a side effect of a shared >implementation between 32-bit and 64-bit x86..I don''t think they''re bad - they''re just superfluous and as such latently dangerous.>> On Thu, Mar 12, 2009 at 02:49:50PM +0000, Jan Beulich wrote: >>> 2) KEXEC_XEN_NO_PAGES is set to the apparently arbitrary value of 17. >>> While a build error would clearly show any incompatible Linux side change >>> in this case, it still seems bogus to hard-code this Linux defined value into >>> the hypercall interface. > >> In reference to question 2), the number 17 is derived from >> include/asm-x86_64/kexec.h in the linux-xen 2.6.18 tree as x86_64 has the >> highest value for PAGES_NR. I agree that having this number included in >> the hypercall interface isn''t ideal. > >I remember that the KEXEC_XEN_NO_PAGES value is related to fixmap(). >We use virtual addresses from the fixmap to map in the pages coming >from dom0. The fixmap is static by definition. So i think a static >number comes from there.But what if x86-64 decides to add PA_SWAP_PAGE (as i386 did in 2.6.27)? While at least we''d get a build-time error on the kernel side, there will be no way to get this to work with the current interface, unless they also drop the middle level page table pointers (as i386 did in 2.6.29). That number should have been chosen at least to leave some head room, but really it shouldn''t be a fixed number at all (after all, at the time the respective mappings are needed, almost the entire Xen address space is available, so there seems little reason to constrain them to the fixmap space). Jan _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
On Mon, Mar 16, 2009 at 01:54:34PM +0000, Jan Beulich wrote:> >>> Magnus Damm <magnus.damm@gmail.com> 16.03.09 12:19 >>> > >So the code assumes that PA_.. are even and VA_.. are odd. The defines > >could have been copied over to make more readable code, yes, or we > >could have shared header files somehow. In the end the hypervisor > >assumes the interface remains unchanged. The addresses are modified so > >the assembly snipped can run from hypervisor address space. > > > >What is the real problem? Just that the interface has changed? > > The issue is that the Linux side changed without having any way to > recognize resulting problems apart from running into them. If a lower > software layer gets designed based on implementation details of a higher > layer, there should at least be build time checks that make sure the upper > layer doesn''t change in incompatible ways (and yes, a few checks are > in that code, but their coverage is rather limited). > > But I view the issue as broader: Any other OS wanting to make use of > the kexec hypercall interface would be required to match the Linux > implementation in various respects. This is what I consider a design flaw, > which makes me think that the current kexec sub-hypercalls should be > urgently deprecated in favor of a clean interface.Skipping to the chase, what do you envisage such an interface looking like? -- Simon Horman VA Linux Systems Japan K.K., Sydney, Australia Satellite Office H: www.vergenet.net/~horms/ W: www.valinux.co.jp/en _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
>>> Simon Horman <horms@verge.net.au> 16.03.09 21:51 >>> >Skipping to the chase, what do you envisage such an interface looking like?Something like changing page_list to a pointer (and add a count), and having an extra attribute (or operation code if you will) field for each page that tells what to do with it (i.e. as a replacement for the pa/va pairs: indicating that a certain entry is to represent the virtual address for a machine address specified by another entry). Jan _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
On Mon, 2009-03-16 at 09:54 -0400, Jan Beulich wrote:> > But I view the issue as broader: Any other OS wanting to make use of > the kexec hypercall interface would be required to match the Linux > implementation in various respects. This is what I consider a design > flaw, which makes me think that the current kexec sub-hypercalls > should be urgently deprecated in favor of a clean interface.I agree with this, the existing interface has always irked me a little due it its very close ties to the specifics of the Linux 2.6.18 kexec interface. I''ve never been entirely sure what a better interface would look like, but something like you suggest later seems to make sense: On Tue, 2009-03-17 at 04:09 -0400, Jan Beulich wrote:> Something like changing page_list to a pointer (and add a count), and > having an extra attribute (or operation code if you will) field for > each page that tells what to do with it (i.e. as a replacement for the > pa/va pairs: indicating that a certain entry is to represent the > virtual address for a machine address specified by another entry).Ian. _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel