Hi folks,

The Xen ELF kernel loader is quite quirky wrt. physical and virtual addresses, probably for historical reasons; Linux got that wrong too until very recently (kexec merge in 2.6.14 or so). The patch below fixes that.

Changes:
 * Fix Linux kernel ELF entry point (also submitted to lkml).
 * Drop the LOAD_OFFSET re-#define hack in the Xen headers.
 * Fix both the dom0 and libxc ELF loaders.
 * Add a quick mode so loading old Linux kernels doesn't break.

Linux-wise everything should be OK with that, but it might break other OSes which also use the ELF loader (in case they create bug-compatible ELF headers with broken paddr entries ...).

please apply,
  Gerd

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel
I'm afraid this won't work for 64 bits, because of the way the vsyscall page gets set up there. If you dump the program headers you'll see that the respective segment is at an address close to the top of the address space, which would make the image appear to need nearly 2 GB of memory (making the load fail if there's less than that available to the domain).

Also, I'm unclear why you need to make the physical address information persistent for the dom0 case, and hence why the bug compatibility hack is needed there.

Jan
Jan Beulich wrote:
> I'm afraid this won't work for 64 bits, because of the way the vsyscall
> page gets set up there. If you dump the program headers you'll see that
> the respective segment is at an address close to the top of the address
> space, which would make the image appear to need nearly 2 GB of memory
> (making the load fail if there's less than that available to the domain).

Yep, it will not work as-is; the domain builder needs more fixes for that.

> Also, I'm unclear why you need to make the physical address information
> persistent for the dom0 case, and hence why the bug compatibility hack
> is needed there.

Well, the libxc loader needs it to be able to load both old kernels and kernels with correct physical addresses in the phdr.

The dom0 loader doesn't really need it; at least on x32, right now it doesn't look at the physical addresses anyway. Instead it depends on the VIRT_BASE hack to get it right. It's sort of documentation only ;) For x64 it will certainly be needed to get the memory size calculations right (use physical, not virtual, addresses) and fix the issue you've outlined above that way.

cheers,
  Gerd

--
Gerd 'just married' Hoffmann <kraxel@suse.de>
I'm the hacker formerly known as Gerd Knorr.
http://www.suse.de/~kraxel/just-married.jpeg
Jan Beulich wrote:
> I'm afraid this won't work for 64 bits, because of the way the vsyscall
> page gets set up there. If you dump the program headers you'll see that
> the respective segment is at an address close to the top of the address
> space, which would make the image appear to need nearly 2 GB of memory
> (making the load fail if there's less than that available to the domain).

Hmm, who sets up the vsyscall mapping right now? The Linux kernel, I guess? The ELF kernel loader certainly doesn't look like it cares much about virtual addresses (other than VIRT_BASE) ...

cheers,
  Gerd
>>> Gerd Hoffmann <kraxel@suse.de> 22.02.06 16:12:25 >>>
> Hmm, who sets up the vsyscall mapping right now? The Linux kernel I
> guess?

Yes.

> The ELF kernel loader certainly doesn't look like it cares much
> about virtual addresses (other than VIRT_BASE) ...

Correct. And the same applies to boot loaders.

Jan
>> The ELF kernel loader certainly doesn't look like it cares much
>> about virtual addresses (other than VIRT_BASE) ...
>
> Correct. And the same applies to boot loaders.

Well, the difference between Xen and other boot loaders is that Xen boots the kernel with paging already enabled ...

Right now it's completely broken: the loader works only with virtual addresses in the paddr header field. It can't stay that way, especially if we want to have kexec working with xenlinux kernels some day ;)

As I see things there are basically two ways to fix this up: either create a simple linear mapping using paddr + VIRT_BASE, which would be pretty close to the current behaviour (and other boot loaders), or do a complete virtual memory setup, which probably can't be done without major changes in the domain builder ...

cheers,
  Gerd
Gerd Hoffmann wrote:
>>> The ELF kernel loader certainly doesn't look like it cares much
>>> about virtual addresses (other than VIRT_BASE) ...
>>
>> Correct. And the same applies to boot loaders.
>
> Well, the difference between xen and other boot loaders is that xen
> boots the kernel with paging already enabled ...

Actually, there have historically been a ton of boot loaders that boot with paging already enabled: just go back to the early Suns. Part of the task of SunOS was to move the firmware's page tables to kernel space and use them until it could build its own. I hit this when I ported SunOS to a machine with firmware that did not set up page tables. It's been very common, in the non-PC world, to have paging turned on very early.

Just a historical comment of no value to this discussion :-)

ron
On 2/22/06, Gerd Hoffmann <kraxel@suse.de> wrote:
> As I see things there are basically two ways to fixup this: Either
> create a simple linear mapping using paddr + VIRT_BASE, which would be
> pretty close to the current behaviour (and other boot loaders). Or do a
> complete virtual memory setup, which probably can't be done without
> major changes in the domain builder ...

Hi,

As previously mentioned here and at the XenSummit, I have a simple yet functioning domU bootloader which defers ELF decoding to domU space. It currently only works when the input is a ramdisk image created with a small tool called 'pack'. It used to work from a TCP connection (provided by the fabulous uIP, which is still in my tree), hence the weird state-machine look of the code.

I suppose taking this approach could solve all our domU-building woes for good. It should also provide a more elegant solution to the attestation problems the Intel guys were talking about and trying to solve with their two-stage domain building proposal.

Clone my tree at http://www.distlab.dk/hg/index.cgi/xen-gfx.hg and have a look at the extras/mstrap subdir. The 'pack' tool is in tools/migrate. You will need Jam to build, or you have to roll your own Makefile.

The tree also contains my very experimental 'Blink' display system as demoed at the Summit in tools/gfx, and my self-migration patch to XenLinux, which at the moment is mostly useful for self-checkpointing to disk (see tools/minimig.c). Your mileage may vary.

Jacob
This is a good start, as the PHYS vs. VIRT stuff in the ELF loader is all a bit overloaded. However, I believe these changes aren't quite complete and, for example, would break the released OpenSolaris Xen client. It has multiple PT_LOAD sections in the ELF file, some with p_vaddr == p_paddr on purpose and some which don't. We rely on a "boot loader" (ie grub or the domain builder) that only cares about p_paddr. The identity-mapped PT_LOAD section contains the OS entry point and has the code to remap the other sections to their final VA by creating/installing new page table entries.

For example, I think the xc_load_elf.c change:

@@ -189,7 +189,18 @@
     for ( done = 0; done < phdr->p_filesz; done += chunksz )
     {
-        pa = (phdr->p_paddr + done) - dsi->v_start;
+        if (phdr->p_paddr == phdr->p_vaddr) {
+            /*
+             * Bug compatibility alert: In older linux kernels
+             * p_paddr is broken, it doesn't contain the physical
+             * address but instead is identical to p_vaddr.  Thus
+             * we can't use it directly, instead we'll guess it
+             * using dsi->v_start.
+             */
+            pa = (phdr->p_vaddr + done) - dsi->v_start;
+        } else {
+            pa = (phdr->p_paddr + done);
+        }
         va = xc_map_foreign_range(
             xch, dom, PAGE_SIZE, PROT_WRITE, parray[pa>>PAGE_SHIFT]);
         chunksz = phdr->p_filesz - done;

needs to have the line:

        pa = (phdr->p_paddr + done);

be more like:

        pa = (phdr->p_paddr + done) - kernstart;

or, better yet, add a dsi->p_start and dsi->p_end to use. The same applies to your change to xen/common/elf.c for dom0.

To save you downloading OpenSolaris, here are sample values from the domU/dom0 ELF image. In the xenguest section we currently specify VIRT_BASE=0x40000000, as there was no PHYS_BASE=. In the flavor of your other changes, I'd expect you could add PHYS_BASE= and OpenSolaris would change to use that.
e_entry: 0x40800000

Program Header[0]:
    p_vaddr:  0x40800000    p_flags: [ PF_X PF_W PF_R ]
    p_paddr:  0x40800000    p_type:  [ PT_LOAD ]
    p_filesz: 0xe95c        p_memsz: 0xe95c
    p_offset: 0xd4          p_align: 0

Program Header[1]:
    p_vaddr:  0xfb400000    p_flags: [ PF_X PF_R ]
    p_paddr:  0x40000000    p_type:  [ PT_LOAD ]
    p_filesz: 0x2aa362      p_memsz: 0x2aa362
    p_offset: 0xea40        p_align: 0

Program Header[2]:
    p_vaddr:  0xfb800000    p_flags: [ PF_X PF_W PF_R ]
    p_paddr:  0x40400000    p_type:  [ PT_LOAD ]
    p_filesz: 0x16515       p_memsz: 0x94a44
    p_offset: 0x2b8dc0      p_align: 0

Here are sample values we use for the 64-bit Xen OS image:

e_entry: 0x40800000

Program Header[0]:
    p_vaddr:  0x40800000            p_flags: [ PF_X PF_W PF_R ]
    p_paddr:  0x40800000            p_type:  [ PT_LOAD ]
    p_filesz: 0xed28                p_memsz: 0xed28
    p_offset: 0x190                 p_align: 0

Program Header[1]:
    p_vaddr:  0xfffffffffb800000    p_flags: [ PF_X PF_R ]
    p_paddr:  0x40000000            p_type:  [ PT_LOAD ]
    p_filesz: 0x39adca              p_memsz: 0x39adca
    p_offset: 0xeec0                p_align: 0

Program Header[2]:
    p_vaddr:  0xfffffffffbc00000    p_flags: [ PF_X PF_W PF_R ]
    p_paddr:  0x40400000            p_type:  [ PT_LOAD ]
    p_filesz: 0x20fe9               p_memsz: 0xe36c0
    p_offset: 0x3a9cc0              p_align: 0

Joe
> some with p_vaddr == p_paddr on purpose and some which don't. We rely
> on a "boot loader" (ie grub or domain builder) that only cares about
> p_paddr.

Same goes for the x86_64 Linux kernel with the vsyscall page ...

I've settled for a slightly different approach now. To keep the behaviour as close as possible to classic i386 boot loaders I'll check paddr only and use the virt_base value from the __xen_guest section to shift the addresses. For bug compatibility with old Linux kernels I compare paddr + virt_base. New patch below, this time tested both 32 and 64 bit (Linux only, though); I think it should be OK for OpenSolaris too ;)

cheers,
  Gerd
On 2/23/06, Gerd Hoffmann <kraxel@suse.de> wrote:
> > some with p_vaddr == p_paddr on purpose and some which don't. We rely
> > on a "boot loader" (ie grub or domain builder) that only cares about
> > p_paddr.
>
> Same goes for the x86_64 linux kernel with the vsyscall page ...
>
> I've settled for a slightly different approach now. To keep behaviour
> as close as possible to classic i386 boot loaders I'll check paddr only
> and use the virt_base value from the __xen_guest section to shift the
> addresses. For bug compatibility with old linux kernels I compare paddr
> + virt_base. New patch below, this time tested both 32 and 64 bit
> (linux only though), I think it should be ok for OpenSolaris too ;)

I have two issues with your latest patch:
- the heuristics you use to distinguish between "old" and "new" kernel images, especially the case where a valid "new" kernel image could be mistaken for an "old" image.
- the change to the entry point; now the entry point and the ELF header paddr fields are in different address modes.

We have the current combination of VIRT_BASE and ELF header paddr fields containing virtual addresses because:
- we want to be able to specify where the virtual address space mapped by the initial pagetables starts; this is why we have VIRT_BASE.
- since we always run in virtual address mode, we thought that the loader should also do so, thus it will load the image to a virtual address. This seems to be in line with how other loaders would read the ELF header paddr fields, i.e. use them as load addresses.

If we really need to get rid of how we change LOAD_OFFSET in Linux, how about this: we add another entry to the __xen_guest header, PHYS_OFFSET=0, and then subtract PHYS_OFFSET from all ELF header paddr fields to turn them into physical addresses, resp. add VIRT_BASE - PHYS_OFFSET to turn them into virtual addresses. We default PHYS_OFFSET to VIRT_BASE if it's not present in the __xen_guest header.
christian
Hi,

> I have two issues with your latest patch:
> - the heuristics you use to distinguish between "old" and "new" kernel
>   images, especially the case where a valid "new" kernel image could be
>   mistaken for an "old" image.

I don't think it can ever happen, at least not with Linux. The (correct) paddr addresses must be relatively small, otherwise loading the kernel on machines with a small amount of memory will not work. In particular, they must be much smaller than the usual Linux kernel VIRT_BASE. Not sure about other OSes; maybe there are some which use a small VIRT_BASE, then it could be problematic, yes.

> - the change to the entry point, now entry point and the elf header
>   paddr fields are in different address modes.

xen/include/xen/elf.h says the entry point is virtual. IMHO the ELF headers should be correctly filled.

> - we want to be able to specify where the virtual address space mapped
>   by the initial pagetables starts, this is why we have VIRT_BASE

Sure.

> - since we always run in virtual address mode, we thought that the
>   loader should also do so, thus it will load the image to a virtual
>   address - this seems to be in line with how other loaders would read
>   the elf header paddr fields, i.e. use them as load addresses.

Well, loadelfimage() in tools/libxc/xc_load_elf.c copies the image page-by-page to the _physical_ addresses. And this is how it should be, IMHO. virt_base is only used to create the initial virtual mappings as expected by the guest kernel (and for bug compatibility with old kernels).

The dom0 builder can take a shortcut and simply copy the big blobs to virtual addresses (paddr + virt_base). That works because the initial virtual mapping is a simple offset to the physical address and the mappings are already created and active at that point.

> If we really need to get rid of how we change LOAD_OFFSET in Linux,
> how about this: we add another entry to the __xen_guest header,
> PHYS_OFFSET=0, and then subtract PHYS_OFFSET from all ELF header paddr
> fields to turn them into physical addresses, resp. add
> VIRT_BASE - PHYS_OFFSET to turn them into virtual addresses. We default
> PHYS_OFFSET to VIRT_BASE if it's not present in the __xen_guest header.

I don't think the new PHYS_OFFSET is needed.

cheers,
  Gerd
> - the change to the entry point, now entry point and the elf header
>   paddr fields are in different address modes.

FYI: that one was rejected on lkml, so we probably should go with (physical) entry_point + virt_base ...

cheers,
  Gerd