Jeremy Fitzhardinge
2007-Jun-06 16:15 UTC
[PATCH RFC 0/7] proposed updates to boot protocol and paravirt booting
This series:

 1. Updates the boot protocol to version 2.07.
 2. Cleans up the existing build process, getting rid of tools/build and
    making the linker do more of the heavy lifting.
 3. Makes the bzImage payload an ELF file. The bootloader can extract
    this as a naked ELF file by skipping over boot_params.setup_sects
    worth of 16-bit setup code.
 4. Updates the boot_params to 2.07, and updates the kernel's head.S to
    jump to the appropriate subarch-specific kernel entrypoint. The
    very earliest code is common (copy boot_params, clear bss); the
    split happens just before the initial pagetable setup.

 + random little changes to make it all hang together

This boots native for me, so everything basically works. But I haven't
tested it end-to-end yet, because I haven't done the Xen bits yet.
Perhaps Rusty can do the lguest version to verify that it's all sound in
principle (hint hint ;).

So, how does it look?

    J
--
Jeremy Fitzhardinge
2007-Jun-06 16:15 UTC
[PATCH RFC 2/7] add WEAK() for creating weak asm labels
Signed-off-by: Jeremy Fitzhardinge <jeremy@xensource.com>
---
 include/linux/linkage.h |    6 ++++++
 1 file changed, 6 insertions(+)

==================================================================
--- a/include/linux/linkage.h
+++ b/include/linux/linkage.h
@@ -34,6 +34,12 @@
 	name:
 #endif

+#ifndef WEAK
+#define WEAK(name)	   \
+	.weak name;	   \
+	name:
+#endif
+
 #define KPROBE_ENTRY(name) \
   .pushsection .kprobes.text, "ax"; \
   ENTRY(name)
--
Proposed updates for version 2.07 of the boot protocol. This includes:

    loadflags.KEEP_SEGMENTS - flag to request/inhibit segment reloads
    hardware_subarch        - what subarchitecture we're booting under
    hardware_subarch_data   - per-architecture data
    kernel_payload          - address of the raw kernel blob

The intention of these changes is to make booting a paravirtualized
kernel work via the normal Linux boot protocol.

A further goal is that the bzImage payload can be a properly formed ELF
file, so that the bootloader can use its ELF notes and Phdrs to get more
metadata about the kernel and its requirements. The ELF file could be
the uncompressed kernel vmlinux itself; it would only take small
buildsystem changes to implement this.

kernel_payload was added so that a bootloader can just get to the raw
bits of the kernel, so that it can do its own decompression/relocation
if it wishes. This is not particularly well-defined yet; I just added
it with the hope that it keeps HPA happy.

Signed-off-by: Jeremy Fitzhardinge <jeremy@xensource.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: H.
Peter Anvin <hpa@zytor.com> Cc: Vivek Goyal <vgoyal@in.ibm.com> Cc: Rusty Russell <rusty@rustcorp.com.au> --- Documentation/i386/boot.txt | 43 +++++++++++++++++++++++++++++++++++++++- arch/i386/kernel/asm-offsets.c | 7 ++++++ include/asm-i386/bootparam.h | 10 +++++++-- 3 files changed, 57 insertions(+), 3 deletions(-) ==================================================================--- a/Documentation/i386/boot.txt +++ b/Documentation/i386/boot.txt @@ -168,6 +168,9 @@ 0234/1 2.05+ relocatable_kernel Whether 0234/1 2.05+ relocatable_kernel Whether kernel is relocatable or not 0235/3 N/A pad2 Unused 0238/4 2.06+ cmdline_size Maximum size of the kernel command line +023C/4 2.07+ hardware_subarch Hardware subarchitecture +0240/8 2.07+ hardware_subarch_data Subarchitecture-specific data +0248/4 2.07+ kernel_payload Pointer to raw kernel data (1) For backwards compatibility, if the setup_sects field contains 0, the real value is 4. @@ -204,7 +207,7 @@ boot loaders can ignore those fields. The byte order of all fields is littleendian (this is x86, after all.) -Field name: setup_secs +Field name: setup_sects Type: read Offset/size: 0x1f1/1 Protocol: ALL @@ -356,6 +359,13 @@ Protocol: 2.00+ - If 0, the protected-mode code is loaded at 0x10000. - If 1, the protected-mode code is loaded at 0x100000. + Bit 6 (write): KEEP_SEGMENTS + Protocol: 2.07+ + - if 0, reload the segment registers in the 32bit entry point. + - if 1, do not reload the segment registers in the 32bit entry point. + Assume that %cs %ds %ss %es are all set to flat segments with + a base of 0 (or the equivalent for their environment). + Bit 7 (write): CAN_USE_HEAP Set this bit to 1 to indicate that the value entered in the heap_end_ptr is valid. If this field is clear, some setup code @@ -479,6 +489,37 @@ Protocol: 2.06+ zero. This means that the command line can contain at most cmdline_size characters. With protocol version 2.05 and earlier, the maximum size was 255. 
+ +Field name: hardware_subarch +Type: write +Offset/size: 0x23c/4 +Protocol: 2.07+ + + In a paravirtualized environment the hardware low level architectural + pieces such as interrupt handling, page table handling, and + accessing process control registers needs to be done differently. + + This field allows the bootloader to inform the kernel we are in one + one of those environments. + + 0x00000000 The default x86/PC environment + 0x00000001 lguest + 0x00000002 Xen + +Field name: hardware_subarch_data +Type: write +Offset/size: 0x240/8 +Protocol: 2.07+ + + A pointer to data that is specific to hardware subarch + +Field name: kernel_payload +Type: read +Offset/size: 0x248/4 +Protocol: 2.07+ + + The relocated pointer to the actual kernel payload, in whatever form + it exists in (gzip image, normally). **** THE KERNEL COMMAND LINE ==================================================================--- a/arch/i386/kernel/asm-offsets.c +++ b/arch/i386/kernel/asm-offsets.c @@ -15,6 +15,7 @@ #include <asm/fixmap.h> #include <asm/processor.h> #include <asm/thread_info.h> +#include <asm/bootparam.h> #include <asm/elf.h> #include <xen/interface/xen.h> @@ -143,4 +144,10 @@ void foo(void) OFFSET(LGUEST_PAGES_regs_errcode, lguest_pages, regs.errcode); OFFSET(LGUEST_PAGES_regs, lguest_pages, regs); #endif + + BLANK(); + OFFSET(BP_scratch, boot_params, scratch); + OFFSET(BP_loadflags, boot_params, hdr.loadflags); + OFFSET(BP_hardware_subarch, boot_params, hdr.hardware_subarch); + OFFSET(BP_version, boot_params, hdr.version); } ==================================================================--- a/include/asm-i386/bootparam.h +++ b/include/asm-i386/bootparam.h @@ -24,8 +24,9 @@ struct setup_header { u16 kernel_version; u8 type_of_loader; u8 loadflags; -#define LOADED_HIGH 0x01 -#define CAN_USE_HEAP 0x80 +#define LOADED_HIGH (1<<0) +#define KEEP_SEGMENTS (1<<6) +#define CAN_USE_HEAP (1<<7) u16 setup_move_size; u32 code32_start; u32 ramdisk_image; @@ -37,6 +38,11 @@ struct 
setup_header { u32 initrd_addr_max; u32 kernel_alignment; u8 relocatable_kernel; + u8 _pad2[3]; + u32 cmdline_size; + u32 hardware_subarch; + u64 hardware_subarch_data; + u32 kernel_payload; } __attribute__((packed)); struct sys_desc_table { --
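The loader side of the proposed fields can be sketched in C. This is only an illustration, not code from the series: the helper name set_paravirt_boot() and its parameters are hypothetical, the offsets 0x23c and 0x240 and the KEEP_SEGMENTS bit are the ones proposed in the patch above, and the offsets of the "HdrS" signature (0x202), version (0x206) and loadflags (0x211) are taken from the existing boot protocol. Whether a given loader should set KEEP_SEGMENTS is a policy decision, so it is left as a parameter.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Offsets into the start of a bzImage. The first three come from the
 * existing boot protocol; the subarch fields are the 2.07 proposal. */
#define OFF_HEADER_MAGIC    0x202  /* "HdrS" */
#define OFF_VERSION         0x206  /* u16, 0x0207 for this proposal */
#define OFF_LOADFLAGS       0x211  /* u8 */
#define OFF_HW_SUBARCH      0x23c  /* u32, proposed */
#define OFF_HW_SUBARCH_DATA 0x240  /* u64, proposed */
#define SETUP_HEADER_END    0x24c  /* one past kernel_payload (0x248/4) */

#define KEEP_SEGMENTS       (1 << 6)

/* Subarchitecture values listed in the proposed boot.txt text. */
enum { SUBARCH_PC = 0, SUBARCH_LGUEST = 1, SUBARCH_XEN = 2 };

/* Little-endian accessors, so the sketch does not depend on host byte order. */
static uint16_t get_le16(const uint8_t *p)
{
    return (uint16_t)(p[0] | (p[1] << 8));
}

static void put_le32(uint8_t *p, uint32_t v)
{
    for (int i = 0; i < 4; i++)
        p[i] = (uint8_t)(v >> (8 * i));
}

static void put_le64(uint8_t *p, uint64_t v)
{
    for (int i = 0; i < 8; i++)
        p[i] = (uint8_t)(v >> (8 * i));
}

/* Hypothetical loader helper: fill in the 2.07 fields of an in-memory
 * image. Returns 0 on success, -1 if the image is too short, lacks the
 * "HdrS" signature, or advertises a protocol older than 2.07. */
int set_paravirt_boot(uint8_t *image, size_t len, uint32_t subarch,
                      uint64_t subarch_data, int keep_segments)
{
    assert(image != NULL);

    if (len < SETUP_HEADER_END)
        return -1;
    if (memcmp(image + OFF_HEADER_MAGIC, "HdrS", 4) != 0)
        return -1;
    if (get_le16(image + OFF_VERSION) < 0x0207)
        return -1;

    put_le32(image + OFF_HW_SUBARCH, subarch);
    put_le64(image + OFF_HW_SUBARCH_DATA, subarch_data);

    if (keep_segments)
        image[OFF_LOADFLAGS] |= (uint8_t)KEEP_SEGMENTS;
    else
        image[OFF_LOADFLAGS] &= (uint8_t)~KEEP_SEGMENTS;
    return 0;
}
```

Checking the version before writing matters because an older kernel would not expect anything at 0x23c and beyond, and a loader that finds an older version can simply fall back to the normal boot path.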
Jeremy Fitzhardinge
2007-Jun-06 16:16 UTC
[PATCH RFC 5/7] i386: clean up bzImage generation
This patch cleans up image generation in several ways:

 - Firstly, it removes tools/build, and uses binutils to do all the
   final construction of the bzImage. This removes a chunk of code and
   makes the image generation more flexible, since we can compute
   various numbers rather than be forced to use fixed constants.

 - Rename compressed/vmlinux to compressed/blob, to make it a bit
   clearer that it's the compressed kernel image + decompressor (now
   all the files named "vmlinux*" are directly derived from the kernel
   vmlinux).

 - Rather than using objcopy to wrap the compressed kernel into an
   object file, simply use the assembler: payload.S does a .incbin of
   the blob.bin file, which allows us to easily place it into a
   section, and it makes the Makefile dependency a little clearer.

 - Similarly, use the same technique to create compressed/piggy.o,
   which cleans things up even more, since the .S file can also set the
   input_len and output_len symbols without further linker script
   hackery; it also removes a complete linker script.

Signed-off-by: Jeremy Fitzhardinge <jeremy@xensource.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: H.
Peter Anvin <hpa@zytor.com> Cc: Vivek Goyal <vgoyal@in.ibm.com> Cc: Rusty Russell <rusty@rustcorp.com.au> --- arch/i386/boot/Makefile | 31 +----- arch/i386/boot/compressed/Makefile | 13 -- arch/i386/boot/compressed/piggy.S | 10 + arch/i386/boot/compressed/vmlinux.scr | 10 - arch/i386/boot/header.S | 6 - arch/i386/boot/payload.S | 3 arch/i386/boot/setup.ld | 39 ++++--- arch/i386/boot/tools/.gitignore | 1 arch/i386/boot/tools/build.c | 168 --------------------------------- 9 files changed, 56 insertions(+), 225 deletions(-) ==================================================================--- a/arch/i386/boot/Makefile +++ b/arch/i386/boot/Makefile @@ -25,12 +25,13 @@ SVGA_MODE := -DSVGA_MODE=NORMAL_VGA #RAMDISK := -DRAMDISK=512 -targets := vmlinux.bin setup.bin setup.elf zImage bzImage +targets := blob.bin setup.elf zImage bzImage subdir- := compressed setup-y += a20.o apm.o cmdline.o copy.o cpu.o cpucheck.o edd.o -setup-y += header.o main.o mca.o memory.o pm.o pmjump.o -setup-y += printf.o string.o tty.o video.o version.o voyager.o +setup-y += header.o main.o mca.o memory.o payload.o pm.o +setup-y += pmjump.o printf.o string.o tty.o video.o version.o +setup-y += voyager.o # The link order of the video-*.o modules can matter. In particular, # video-vga.o *must* be listed first, followed by video-vesa.o. 
@@ -39,10 +40,6 @@ setup-y += video-vga.o setup-y += video-vga.o setup-y += video-vesa.o setup-y += video-bios.o - -hostprogs-y := tools/build - -HOSTCFLAGS_build.o := $(LINUXINCLUDE) # --------------------------------------------------------------------------- @@ -65,18 +62,12 @@ AFLAGS := $(CFLAGS) -D__ASSEMBLY__ $(obj)/bzImage: IMAGE_OFFSET := 0x100000 $(obj)/bzImage: EXTRA_CFLAGS := -D__BIG_KERNEL__ $(obj)/bzImage: EXTRA_AFLAGS := $(SVGA_MODE) $(RAMDISK) -D__BIG_KERNEL__ -$(obj)/bzImage: BUILDFLAGS := -b -quiet_cmd_image = BUILD $@ -cmd_image = $(obj)/tools/build $(BUILDFLAGS) $(obj)/setup.bin \ - $(obj)/vmlinux.bin $(ROOT_DEV) > $@ - -$(obj)/zImage $(obj)/bzImage: $(obj)/setup.bin \ - $(obj)/vmlinux.bin $(obj)/tools/build FORCE - $(call if_changed,image) +$(obj)/zImage $(obj)/bzImage: $(obj)/setup.elf FORCE + $(call if_changed,objcopy) @echo 'Kernel: $@ is ready' ' (#'`cat .version`')' -$(obj)/vmlinux.bin: $(obj)/compressed/vmlinux FORCE +$(obj)/blob.bin: $(obj)/compressed/blob FORCE $(call if_changed,objcopy) SETUP_OBJS = $(addprefix $(obj)/,$(setup-y)) @@ -85,12 +76,10 @@ LDFLAGS_setup.elf := -T $(obj)/setup.elf: $(src)/setup.ld $(SETUP_OBJS) FORCE $(call if_changed,ld) -OBJCOPYFLAGS_setup.bin := -O binary +$(obj)/payload.o: EXTRA_AFLAGS := -Wa,-I$(obj) +$(obj)/payload.o: $(src)/payload.S $(obj)/blob.bin -$(obj)/setup.bin: $(obj)/setup.elf FORCE - $(call if_changed,objcopy) - -$(obj)/compressed/vmlinux: FORCE +$(obj)/compressed/blob: FORCE $(Q)$(MAKE) $(build)=$(obj)/compressed IMAGE_OFFSET=$(IMAGE_OFFSET) $@ # Set this if you want to pass append arguments to the zdisk/fdimage/isoimage kernel ==================================================================--- a/arch/i386/boot/compressed/Makefile +++ b/arch/i386/boot/compressed/Makefile @@ -4,11 +4,10 @@ # create a compressed vmlinux image from the original vmlinux # -targets := vmlinux vmlinux.bin vmlinux.bin.gz head.o misc.o piggy.o \ +targets := blob vmlinux.bin vmlinux.bin.gz head.o misc.o piggy.o \ 
vmlinux.bin.all vmlinux.relocs -EXTRA_AFLAGS := -traditional -LDFLAGS_vmlinux := -T +LDFLAGS_blob := -T hostprogs-y := relocs CFLAGS := -m32 -D__KERNEL__ $(LINUX_INCLUDE) -O2 \ @@ -17,7 +16,7 @@ CFLAGS := -m32 -D__KERNEL__ $(LINUX_INC $(call cc-option,-fno-stack-protector) LDFLAGS := -m elf_i386 -$(obj)/vmlinux: $(src)/vmlinux.lds $(obj)/head.o $(obj)/misc.o $(obj)/piggy.o FORCE +$(obj)/blob: $(src)/vmlinux.lds $(obj)/head.o $(obj)/misc.o $(obj)/piggy.o FORCE $(call if_changed,ld) @: @@ -44,7 +43,5 @@ else $(call if_changed,gzip) endif -LDFLAGS_piggy.o := -r --format binary --oformat elf32-i386 -T - -$(obj)/piggy.o: $(src)/vmlinux.scr $(obj)/vmlinux.bin.gz FORCE - $(call if_changed,ld) +$(obj)/piggy.o: EXTRA_AFLAGS := -Wa,-I$(obj) +$(obj)/piggy.o: $(obj)/vmlinux.bin.gz ==================================================================--- /dev/null +++ b/arch/i386/boot/compressed/piggy.S @@ -0,0 +1,10 @@ +.section .data.compressed,"a",@progbits + +.globl input_data, input_len, output_len + +input_len: .long input_data_end - input_data + +input_data: +.incbin "vmlinux.bin.gz" +output_len = .-4 +input_data_end: ==================================================================--- a/arch/i386/boot/compressed/vmlinux.scr +++ /dev/null @@ -1,10 +0,0 @@ -SECTIONS -{ - .data.compressed : { - input_len = .; - LONG(input_data_end - input_data) input_data = .; - *(.data) - output_len = . 
- 4; - input_data_end = .; - } -} ==================================================================--- a/arch/i386/boot/header.S +++ b/arch/i386/boot/header.S @@ -97,9 +97,9 @@ bugger_off_msg: .section ".header", "a" .globl hdr hdr: -setup_sects: .byte SETUPSECTS +setup_sects: .byte _setup_sects root_flags: .word ROOT_RDONLY -syssize: .long SYSSIZE +syssize: .long kernel_size_para ram_size: .word RAMDISK vid_mode: .word SVGA_MODE root_dev: .word ROOT_DEV @@ -148,7 +148,7 @@ CAN_USE_HEAP = 0x80 # If set, the load .byte LOADED_HIGH #endif -setup_move_size: .word 0x8000 # size to move, when setup is not +setup_move_size: .word _setup_size # size to move, when setup is not # loaded at 0x90000. We will move setup # to 0x90000 then just before jumping # into the kernel. However, only the ==================================================================--- /dev/null +++ b/arch/i386/boot/payload.S @@ -0,0 +1,3 @@ +.section .kernel,"a",@progbits + +.incbin "blob.bin" ==================================================================--- a/arch/i386/boot/setup.ld +++ b/arch/i386/boot/setup.ld @@ -3,18 +3,16 @@ * * Linker script for the i386 setup code */ -OUTPUT_FORMAT("elf32-i386", "elf32-i386", "elf32-i386") +OUTPUT_FORMAT("elf32-i386") OUTPUT_ARCH(i386) ENTRY(_start) SECTIONS { - . = 0; - .bstext : { *(.bstext) } + .bstext 0 : { *(.bstext) } .bsdata : { *(.bsdata) } - . = 497; - .header : { *(.header) } + .header 497 : { *(.header) } .inittext : { *(.inittext) } .initdata : { *(.initdata) } .text : { *(.text*) } @@ -38,16 +36,29 @@ SECTIONS . = ALIGN(16); - __bss_start = .; - .bss : - { - *(.bss) - } - . = ALIGN(16); - _end = .; + .bss ALIGN(16) : { + __bss_start = .; + *(.bss) + . = ALIGN(16); + _end = .; + } /DISCARD/ : { *(.note*) } - . = ASSERT(_end <= 0x8000, "Setup too big!"); - . = ASSERT(hdr == 0x1f1, "The setup header has the wrong offset!"); + . = ALIGN(512); /* align to sector size */ + _setup_size = . 
- _start; + _setup_sects = _setup_size / 512; + + /* compressed kernel data */ + .kernel : { + kernel = .; + *(.kernel) + kernel_end = .; + + } + kernel_size = kernel_end - kernel; + kernel_size_para = (kernel_size + 15) / 16; } + +ASSERT(_end <= 0x8000, "Setup too big!"); +ASSERT(hdr == 0x1f1, "The setup header has the wrong offset!"); ==================================================================--- a/arch/i386/boot/tools/.gitignore +++ /dev/null @@ -1,1 +0,0 @@ -build ==================================================================--- a/arch/i386/boot/tools/build.c +++ /dev/null @@ -1,168 +0,0 @@ -/* - * Copyright (C) 1991, 1992 Linus Torvalds - * Copyright (C) 1997 Martin Mares - * Copyright (C) 2007 H. Peter Anvin - */ - -/* - * This file builds a disk-image from three different files: - * - * - setup: 8086 machine code, sets up system parm - * - system: 80386 code for actual system - * - * It does some checking that all files are of the correct type, and - * just writes the result to stdout, removing headers and padding to - * the right amount. It also writes some system data to stderr. - */ - -/* - * Changes by tytso to allow root device specification - * High loaded stuff by Hans Lermen & Werner Almesberger, Feb. 1996 - * Cross compiling fixes by Gertjan van Wingerde, July 1996 - * Rewritten by Martin Mares, April 1997 - * Substantially overhauled by H. 
Peter Anvin, April 2007 - */ - -#include <stdio.h> -#include <string.h> -#include <stdlib.h> -#include <stdarg.h> -#include <sys/types.h> -#include <sys/stat.h> -#include <sys/sysmacros.h> -#include <unistd.h> -#include <fcntl.h> -#include <sys/mman.h> -#include <asm/boot.h> - -typedef unsigned char u8; -typedef unsigned short u16; -typedef unsigned long u32; - -#define DEFAULT_MAJOR_ROOT 0 -#define DEFAULT_MINOR_ROOT 0 - -/* Minimal number of setup sectors */ -#define SETUP_SECT_MIN 5 -#define SETUP_SECT_MAX 64 - -/* This must be large enough to hold the entire setup */ -u8 buf[SETUP_SECT_MAX*512]; -int is_big_kernel; - -static void die(const char * str, ...) -{ - va_list args; - va_start(args, str); - vfprintf(stderr, str, args); - fputc('\n', stderr); - exit(1); -} - -static void usage(void) -{ - die("Usage: build [-b] setup system [rootdev] [> image]"); -} - -int main(int argc, char ** argv) -{ - unsigned int i, sz, setup_sectors; - int c; - u32 sys_size; - u8 major_root, minor_root; - struct stat sb; - FILE *file; - int fd; - void *kernel; - - if (argc > 2 && !strcmp(argv[1], "-b")) - { - is_big_kernel = 1; - argc--, argv++; - } - if ((argc < 3) || (argc > 4)) - usage(); - if (argc > 3) { - if (!strcmp(argv[3], "CURRENT")) { - if (stat("/", &sb)) { - perror("/"); - die("Couldn't stat /"); - } - major_root = major(sb.st_dev); - minor_root = minor(sb.st_dev); - } else if (strcmp(argv[3], "FLOPPY")) { - if (stat(argv[3], &sb)) { - perror(argv[3]); - die("Couldn't stat root device."); - } - major_root = major(sb.st_rdev); - minor_root = minor(sb.st_rdev); - } else { - major_root = 0; - minor_root = 0; - } - } else { - major_root = DEFAULT_MAJOR_ROOT; - minor_root = DEFAULT_MINOR_ROOT; - } - fprintf(stderr, "Root device is (%d, %d)\n", major_root, minor_root); - - /* Copy the setup code */ - file = fopen(argv[1], "r"); - if (!file) - die("Unable to open `%s': %m", argv[1]); - c = fread(buf, 1, sizeof(buf), file); - if (ferror(file)) - die("read-error on `setup'"); 
- if (c < 1024) - die("The setup must be at least 1024 bytes"); - if (buf[510] != 0x55 || buf[511] != 0xaa) - die("Boot block hasn't got boot flag (0xAA55)"); - fclose(file); - - /* Pad unused space with zeros */ - setup_sectors = (c + 511) / 512; - if (setup_sectors < SETUP_SECT_MIN) - setup_sectors = SETUP_SECT_MIN; - i = setup_sectors*512; - memset(buf+c, 0, i-c); - - /* Set the default root device */ - buf[508] = minor_root; - buf[509] = major_root; - - fprintf(stderr, "Setup is %d bytes (padded to %d bytes).\n", c, i); - - /* Open and stat the kernel file */ - fd = open(argv[2], O_RDONLY); - if (fd < 0) - die("Unable to open `%s': %m", argv[2]); - if (fstat(fd, &sb)) - die("Unable to stat `%s': %m", argv[2]); - sz = sb.st_size; - fprintf (stderr, "System is %d kB\n", (sz+1023)/1024); - kernel = mmap(NULL, sz, PROT_READ, MAP_SHARED, fd, 0); - if (kernel == MAP_FAILED) - die("Unable to mmap '%s': %m", argv[2]); - sys_size = (sz + 15) / 16; - if (!is_big_kernel && sys_size > DEF_SYSSIZE) - die("System is too big. Try using bzImage or modules."); - - /* Patch the setup code with the appropriate size parameters */ - buf[0x1f1] = setup_sectors-1; - buf[0x1f4] = sys_size; - buf[0x1f5] = sys_size >> 8; - buf[0x1f6] = sys_size >> 16; - buf[0x1f7] = sys_size >> 24; - - if (fwrite(buf, 1, i, stdout) != i) - die("Writing setup failed"); - - /* Copy the kernel code */ - if (fwrite(kernel, 1, sz, stdout) != sz) - die("Writing kernel failed"); - close(fd); - - /* Everything is OK */ - return 0; -} --
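The size arithmetic that moves from build.c into setup.ld can be written out as a short C sketch. The function names are hypothetical; the formulas mirror the linker-script expressions in the patch (". = ALIGN(512)", "_setup_size / 512", and "(kernel_size + 15) / 16") and the "(sz + 15) / 16" computation in the removed build.c. The sketch assumes sizes small enough that the additions do not overflow 32 bits.

```c
#include <assert.h>
#include <stdint.h>

#define SECTOR_SIZE    512u
#define PARAGRAPH_SIZE 16u

/* Round n up to the next multiple of align, which must be a power of
 * two. This is what ". = ALIGN(512)" does to the location counter. */
static uint32_t align_up(uint32_t n, uint32_t align)
{
    assert(align != 0 && (align & (align - 1)) == 0);
    return (n + align - 1) & ~(align - 1);
}

/* Size in 16-byte paragraphs, rounded up: the unit of the syssize
 * header field, computed as kernel_size_para in setup.ld. */
uint32_t size_in_paragraphs(uint32_t bytes)
{
    return (bytes + PARAGRAPH_SIZE - 1) / PARAGRAPH_SIZE;
}

/* Size in 512-byte sectors after sector alignment, as in
 * "_setup_sects = _setup_size / 512". */
uint32_t size_in_sectors(uint32_t bytes)
{
    return align_up(bytes, SECTOR_SIZE) / SECTOR_SIZE;
}
```

For example, a setup image of exactly 0x8000 bytes (the limit enforced by the "Setup too big!" assertion) occupies 64 sectors, which matches SETUP_SECT_MAX in the removed build.c.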
Jeremy Fitzhardinge
2007-Jun-06 16:16 UTC
[PATCH RFC 6/7] i386: make the bzImage payload an ELF file
This patch makes the payload of the bzImage file an ELF file. In other
words, the bzImage is structured as follows:

 - boot sector
 - 16bit setup code
 - ELF header
 - decompressor
 - compressed kernel

A bootloader may find the start of the ELF file by looking at the
setup_size entry in the boot params, and using that to find the offset
of the ELF header. The ELF Phdrs contain all the mapped memory required
to decompress and start booting the kernel.

One slightly complex part of this is that the bzImage boot_params need
to know about the internal structure of the ELF file, at least to the
extent of being able to point the code32_start entry at the ELF file's
entrypoint, so that loaders which use this field will still work.

Similarly, the ELF header needs to know how big the kernel vmlinux's
bss segment is, in order to make sure it is mapped properly.

To handle these two cases, we generate abstracted versions of the object
files which only contain the symbols we care about (generated with
objcopy --strip-all --keep-symbol=X), and then include those symbol
tables with ld -R.

Signed-off-by: Jeremy Fitzhardinge <jeremy@xensource.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: H.
Peter Anvin <hpa@zytor.com> Cc: Vivek Goyal <vgoyal@in.ibm.com> Cc: Rusty Russell <rusty@rustcorp.com.au> --- arch/i386/boot/Makefile | 11 ++++-- arch/i386/boot/compressed/Makefile | 29 +++++++++++++-- arch/i386/boot/compressed/elfhdr.S | 60 +++++++++++++++++++++++++++++++++ arch/i386/boot/compressed/head.S | 9 ++-- arch/i386/boot/compressed/notes.S | 7 +++ arch/i386/boot/compressed/vmlinux.lds | 24 ++++++++++--- arch/i386/boot/header.S | 7 --- arch/i386/boot/setup.ld | 5 ++ arch/i386/kernel/head.S | 1 arch/i386/kernel/vmlinux.lds.S | 1 10 files changed, 131 insertions(+), 23 deletions(-) ==================================================================--- a/arch/i386/boot/Makefile +++ b/arch/i386/boot/Makefile @@ -72,14 +72,19 @@ AFLAGS := $(CFLAGS) -D__ASSEMBLY__ SETUP_OBJS = $(addprefix $(obj)/,$(setup-y)) -LDFLAGS_setup.elf := -T -$(obj)/setup.elf: $(src)/setup.ld $(SETUP_OBJS) FORCE +$(obj)/zImage $(obj)/bzImage: \ + LDFLAGS := \ + -R $(obj)/compressed/blob-syms \ + --defsym IMAGE_OFFSET=$(IMAGE_OFFSET) -T + +$(obj)/setup.elf: $(src)/setup.ld $(SETUP_OBJS) \ + $(obj)/compressed/blob-syms FORCE $(call if_changed,ld) $(obj)/payload.o: EXTRA_AFLAGS := -Wa,-I$(obj) $(obj)/payload.o: $(src)/payload.S $(obj)/blob.bin -$(obj)/compressed/blob: FORCE +$(obj)/compressed/blob $(obj)/compressed/blob-syms: FORCE $(Q)$(MAKE) $(build)=$(obj)/compressed IMAGE_OFFSET=$(IMAGE_OFFSET) $@ # Set this if you want to pass append arguments to the zdisk/fdimage/isoimage kernel ==================================================================--- a/arch/i386/boot/compressed/Makefile +++ b/arch/i386/boot/compressed/Makefile @@ -4,21 +4,42 @@ # create a compressed vmlinux image from the original vmlinux # -targets := blob vmlinux.bin vmlinux.bin.gz head.o misc.o piggy.o \ +targets := blob vmlinux.bin vmlinux.bin.gz \ + elfhdr.o head.o misc.o notes.o piggy.o \ vmlinux.bin.all vmlinux.relocs -LDFLAGS_blob := -T hostprogs-y := relocs CFLAGS := -m32 -D__KERNEL__ $(LINUX_INCLUDE) -O2 \ 
-fno-strict-aliasing -fPIC \ $(call cc-option,-ffreestanding) \ $(call cc-option,-fno-stack-protector) -LDFLAGS := -m elf_i386 +LDFLAGS := -R $(obj)/vmlinux-syms --defsym IMAGE_OFFSET=$(IMAGE_OFFSET) -T -$(obj)/blob: $(src)/vmlinux.lds $(obj)/head.o $(obj)/misc.o $(obj)/piggy.o FORCE +OBJS=$(addprefix $(obj)/,elfhdr.o head.o misc.o notes.o piggy.o) + +$(obj)/blob: $(src)/vmlinux.lds $(obj)/vmlinux-syms $(OBJS) FORCE $(call if_changed,ld) @: + +# Generate a stripped-down object including only the symbols needed +# so that we can get them with ld -R. Direct stderr to /dev/null to +# shut useless warning up. +quiet_cmd_symextract = SYMEXT $@ + cmd_symextract = objcopy -S \ + $(addprefix -j,$(EXTRACTSECTS)) \ + $(addprefix -K,$(EXTRACTSYMS)) \ + $< $@ 2>/dev/null + +$(obj)/blob-syms: EXTRACTSYMS := blob_entry blob_payload +$(obj)/blob-syms: EXTRACTSECTS := .text.head .data.compressed +$(obj)/blob-syms: $(obj)/blob FORCE + $(call if_changed,symextract) + +$(obj)/vmlinux-syms: EXTRACTSYMS := __reserved_end +$(obj)/vmlinux-syms: EXTRACTSECTS := .bss +$(obj)/vmlinux-syms: vmlinux FORCE + $(call if_changed,symextract) $(obj)/vmlinux.bin: vmlinux FORCE $(call if_changed,objcopy) ==================================================================--- /dev/null +++ b/arch/i386/boot/compressed/elfhdr.S @@ -0,0 +1,60 @@ +/* DIY ELF header */ + +#include <linux/elf.h> +#include <asm/boot.h> + +.section .elfhdr,"a",@progbits +ehdr: + # e_ident + .byte ELFMAG0, ELFMAG1, ELFMAG2, ELFMAG3 + .byte ELFCLASS32, ELFDATA2LSB, EV_CURRENT, ELFOSABI_STANDALONE + .org ehdr + EI_NIDENT +#ifndef CONFIG_RELOCATABLE + .word ET_EXEC # e_type +#else + .word ET_DYN # e_type +#endif + .word EM_386 # e_machine + .int 1 # e_version + .int LOAD_PHYSICAL_ADDR + blob_startup_32 - ehdr # e_entry + .int phdr - ehdr # e_phoff + .int 0 # e_shoff + .int 0 # e_flags + .word ehdr_end - ehdr # e_ehsize + .word phdr_size # e_phentsize + .word phnum # e_phnum + .word 40 # e_shentsize + .word 0 # e_shnum + .word 0 # 
e_shstrndx +ehdr_end: + +phdr: + .int PT_LOAD # p_type + .int _head - ehdr # p_offset + .int LOAD_PHYSICAL_ADDR # p_vaddr + .int LOAD_PHYSICAL_ADDR # p_paddr + .int blob_filesz # p_filesz + .int blob_memsz # p_memsz + .int PF_R | PF_W | PF_X # p_flags + .int 4096 # p_align +phdr_size = . - phdr + + .int PT_NOTE # p_type + .int _notes - ehdr # p_offset + .int 0 # p_vaddr + .int 0 # p_paddr + .int blob_notesz # p_filesz + .int 0 # p_memsz + .int 0 # p_flags + .int 0 # p_align + + .int PT_PHDR # p_type + .int phdr - ehdr # p_offset + .int LOAD_PHYSICAL_ADDR + phdr - ehdr # p_vaddr + .int LOAD_PHYSICAL_ADDR + phdr - ehdr # p_paddr + .int phdr_end - phdr # p_filesz + .int phdr_end - phdr # p_memsz + .int PF_R | PF_W | PF_X # p_flags + .int 0 # p_align +phdr_end: +phnum = (phdr_end - phdr) / phdr_size ==================================================================--- a/arch/i386/boot/compressed/head.S +++ b/arch/i386/boot/compressed/head.S @@ -27,11 +27,12 @@ #include <asm/segment.h> #include <asm/page.h> #include <asm/boot.h> +#include <asm/asm-offsets.h> .section ".text.head","ax",@progbits - .globl startup_32 + .globl blob_startup_32 -startup_32: +blob_startup_32: cld cli movl $(__BOOT_DS),%eax @@ -48,7 +49,7 @@ startup_32: * data at 0x1e4 (defined as a scratch field) are used as the stack * for this calculation. Only 4 bytes are needed. 
*/ - leal (0x1e4+4)(%esi), %esp + leal (BP_scratch+4)(%esi), %esp call 1f 1: popl %ebp subl $1b, %ebp @@ -85,7 +86,7 @@ 1: popl %ebp pushl %esi leal _end(%ebp), %esi leal _end(%ebx), %edi - movl $(_end - startup_32), %ecx + movl $(_end - blob_startup_32), %ecx std rep movsb ==================================================================--- /dev/null +++ b/arch/i386/boot/compressed/notes.S @@ -0,0 +1,7 @@ +#include <linux/elfnote.h> +#include <linux/elf_boot.h> +#include <linux/utsrelease.h> + +ELFNOTE(ELF_NOTE_BOOT, EIN_PROGRAM_NAME, .asciz "Linux") +ELFNOTE(ELF_NOTE_BOOT, EIN_PROGRAM_VERSION, .asciz UTS_RELEASE) +ELFNOTE(ELF_NOTE_BOOT, EIN_ARGUMENT_STYLE, .asciz "Linux") ==================================================================--- a/arch/i386/boot/compressed/vmlinux.lds +++ b/arch/i386/boot/compressed/vmlinux.lds @@ -1,18 +1,21 @@ OUTPUT_FORMAT("elf32-i386", "elf32-i386" -OUTPUT_FORMAT("elf32-i386", "elf32-i386", "elf32-i386") +OUTPUT_FORMAT("elf32-i386") OUTPUT_ARCH(i386) -ENTRY(startup_32) + SECTIONS { - /* Be careful parts of head.S assume startup_32 is at - * address 0. - */ + /* make sure we don't get anything from vmlinux-syms */ + /DISCARD/ : { */vmlinux-syms(*) } + . = 0 ; .text.head : { + *(.elfhdr) _head = . ; + blob_entry = blob_startup_32 + IMAGE_OFFSET; *(.text.head) _ehead = . ; } .data.compressed : { + blob_payload = input_data + IMAGE_OFFSET; *(.data.compressed) } .text : { @@ -33,6 +36,7 @@ SECTIONS *(.data.*) _edata = . ; } + blob_filesz = . ; .bss : { _bss = . ; *(.bss) @@ -40,4 +44,14 @@ SECTIONS *(COMMON) _end = . ; } + + /* __reserved_end taken from vmlinux */ + blob_memsz = __reserved_end; + + .notes : { + _notes = . 
; + *(.note*) + _notes_end = .; + } + blob_notesz = _notes_end - _notes; } ==================================================================--- a/arch/i386/boot/header.S +++ b/arch/i386/boot/header.S @@ -155,13 +155,8 @@ setup_move_size: .word _setup_size # si # loader knows how much data behind # us also needs to be loaded. -code32_start: # here loaders can put a different +code32_start: .long blob_entry # here loaders can put a different # start address for 32-bit code. -#ifndef __BIG_KERNEL__ - .long 0x1000 # 0x1000 = default for zImage -#else - .long 0x100000 # 0x100000 = default for big kernel -#endif ramdisk_image: .long 0 # address of loaded ramdisk image # Here the loader puts the 32-bit ==================================================================--- a/arch/i386/boot/setup.ld +++ b/arch/i386/boot/setup.ld @@ -9,6 +9,9 @@ ENTRY(_start) SECTIONS { + /* make sure we don't get anything from blob-syms */ + /DISCARD/ : { */blob-syms(*) } + .bstext 0 : { *(.bstext) } .bsdata : { *(.bsdata) } @@ -45,7 +48,7 @@ SECTIONS /DISCARD/ : { *(.note*) } - . = ALIGN(512); /* align to sector size */ + . = ALIGN(4096); /* align to page size */ _setup_size = . - _start; _setup_sects = _setup_size / 512; ==================================================================--- a/arch/i386/kernel/head.S +++ b/arch/i386/kernel/head.S @@ -59,6 +59,7 @@ BOOTBITMAP_SIZE = LOW_PAGES / 8 BOOTBITMAP_SIZE = LOW_PAGES / 8 ALLOCATOR_SLOP = 4 +.globl INIT_MAP_BEYOND_END INIT_MAP_BEYOND_END = BOOTBITMAP_SIZE + (PAGE_TABLE_SIZE + ALLOCATOR_SLOP)*PAGE_SIZE_asm /* ==================================================================--- a/arch/i386/kernel/vmlinux.lds.S +++ b/arch/i386/kernel/vmlinux.lds.S @@ -194,6 +194,7 @@ SECTIONS /* This is where the kernel creates the early boot page tables */ . = ALIGN(4096); pg0 = . ; + __reserved_end = . + INIT_MAP_BEYOND_END - LOAD_OFFSET; } /* Sections to be discarded */ --
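How a bootloader might locate the embedded ELF header can be sketched in C. This is a sketch under stated assumptions, not the method defined by the series: the function name find_elf_payload() is hypothetical, and the offset is computed with the conventional boot-protocol rule that the payload starts (setup_sects + 1) * 512 bytes into the file, with a setup_sects value of 0 meaning 4. This RFC computes _setup_sects in its linker script, and whether that value counts the boot sector has not been checked here, so the "+ 1" is the main assumption.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define OFF_SETUP_SECTS 0x1f1
#define SECTOR_SIZE     512u

/* Hypothetical helper: return the byte offset of the ELF header inside
 * a bzImage held in memory, or -1 if no ELF magic is found there. */
long find_elf_payload(const uint8_t *image, size_t len)
{
    static const uint8_t elf_magic[4] = { 0x7f, 'E', 'L', 'F' };
    size_t sects, off;

    assert(image != NULL);

    if (len <= OFF_SETUP_SECTS)
        return -1;

    sects = image[OFF_SETUP_SECTS];
    if (sects == 0)
        sects = 4;  /* backwards compatibility rule from boot.txt */

    /* Assumption: one boot sector precedes the setup sectors. */
    off = (sects + 1) * SECTOR_SIZE;

    if (off + sizeof elf_magic > len)
        return -1;
    if (memcmp(image + off, elf_magic, sizeof elf_magic) != 0)
        return -1;
    return (long)off;
}
```

Checking the magic before trusting the offset lets the same loader handle older images whose payload is not an ELF file, by falling back to the code32_start path.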
Jeremy Fitzhardinge
2007-Jun-06 16:16 UTC
[PATCH RFC 3/7] allow linux/elf.h to be included in assembler
Signed-off-by: Jeremy Fitzhardinge <jeremy@xensource.com> --- include/linux/elf.h | 24 +++++++++++++++++++----- 1 file changed, 19 insertions(+), 5 deletions(-) ==================================================================--- a/include/linux/elf.h +++ b/include/linux/elf.h @@ -1,9 +1,10 @@ #ifndef _LINUX_ELF_H #define _LINUX_ELF_H +#include <linux/elf-em.h> +#ifndef __ASSEMBLY__ #include <linux/types.h> #include <linux/auxvec.h> -#include <linux/elf-em.h> #include <asm/elf.h> struct file; @@ -31,6 +32,7 @@ typedef __u32 Elf64_Word; typedef __u32 Elf64_Word; typedef __u64 Elf64_Xword; typedef __s64 Elf64_Sxword; +#endif /* __ASSEMBLY__ */ /* These constants are for the segment types stored in the image headers */ #define PT_NULL 0 @@ -123,6 +125,7 @@ typedef __s64 Elf64_Sxword; #define ELF64_ST_BIND(x) ELF_ST_BIND(x) #define ELF64_ST_TYPE(x) ELF_ST_TYPE(x) +#ifndef __ASSEMBLY__ typedef struct dynamic{ Elf32_Sword d_tag; union{ @@ -138,6 +141,7 @@ typedef struct { Elf64_Addr d_ptr; } d_un; } Elf64_Dyn; +#endif /* __ASSEMBLY__ */ /* The following are used with relocations */ #define ELF32_R_SYM(x) ((x) >> 8) @@ -146,6 +150,7 @@ typedef struct { #define ELF64_R_SYM(i) ((i) >> 32) #define ELF64_R_TYPE(i) ((i) & 0xffffffff) +#ifndef __ASSEMBLY__ typedef struct elf32_rel { Elf32_Addr r_offset; Elf32_Word r_info; @@ -185,11 +190,12 @@ typedef struct elf64_sym { Elf64_Addr st_value; /* Value of the symbol */ Elf64_Xword st_size; /* Associated symbol size */ } Elf64_Sym; - +#endif /* __ASSEMBLY__ */ #define EI_NIDENT 16 -typedef struct elf32_hdr{ +#ifndef __ASSEMBLY__ +typedef struct elf32_hdr { unsigned char e_ident[EI_NIDENT]; Elf32_Half e_type; Elf32_Half e_machine; @@ -222,6 +228,7 @@ typedef struct elf64_hdr { Elf64_Half e_shnum; Elf64_Half e_shstrndx; } Elf64_Ehdr; +#endif /* __ASSEMBLY__ */ /* These constants define the permissions on sections in the program header, p_flags. 
*/ @@ -229,7 +236,8 @@ typedef struct elf64_hdr { #define PF_W 0x2 #define PF_X 0x1 -typedef struct elf32_phdr{ +#ifndef __ASSEMBLY__ +typedef struct elf32_phdr { Elf32_Word p_type; Elf32_Off p_offset; Elf32_Addr p_vaddr; @@ -250,6 +258,7 @@ typedef struct elf64_phdr { Elf64_Xword p_memsz; /* Segment size in memory */ Elf64_Xword p_align; /* Segment alignment, file & memory */ } Elf64_Phdr; +#endif /* __ASSEMBLY__ */ /* sh_type */ #define SHT_NULL 0 @@ -284,7 +293,8 @@ typedef struct elf64_phdr { #define SHN_ABS 0xfff1 #define SHN_COMMON 0xfff2 #define SHN_HIRESERVE 0xffff - + +#ifndef __ASSEMBLY__ typedef struct { Elf32_Word sh_name; Elf32_Word sh_type; @@ -310,6 +320,7 @@ typedef struct elf64_shdr { Elf64_Xword sh_addralign; /* Section alignment */ Elf64_Xword sh_entsize; /* Entry size if section holds table */ } Elf64_Shdr; +#endif /* __ASSEMBLY__ */ #define EI_MAG0 0 /* e_ident[] indexes */ #define EI_MAG1 1 @@ -343,6 +354,7 @@ typedef struct elf64_shdr { #define ELFOSABI_NONE 0 #define ELFOSABI_LINUX 3 +#define ELFOSABI_STANDALONE 255 #ifndef ELF_OSABI #define ELF_OSABI ELFOSABI_NONE @@ -357,6 +369,7 @@ typedef struct elf64_shdr { #define NT_PRXFPREG 0x46e62b7f /* copied from gdb5.1/include/elf/common.h */ +#ifndef __ASSEMBLY__ /* Note header in a PT_NOTE section */ typedef struct elf32_note { Elf32_Word n_namesz; /* Name size */ @@ -396,5 +409,6 @@ static inline void arch_write_notes(stru #define ELF_CORE_EXTRA_NOTES_SIZE arch_notes_size() #define ELF_CORE_WRITE_EXTRA_NOTES arch_write_notes(file) #endif /* ARCH_HAVE_EXTRA_ELF_NOTES */ +#endif /* __ASSEMBLY__ */ #endif /* _LINUX_ELF_H */ --
Jeremy Fitzhardinge
2007-Jun-06 16:16 UTC
[PATCH RFC 4/7] define ELF notes for adding to a boot image
Signed-off-by: Jeremy Fitzhardinge <jeremy@xensource.com>
Cc: Vivek Goyal <vgoyal@in.ibm.com>
---
 include/linux/elf_boot.h | 15 +++++++++++++++
 1 file changed, 15 insertions(+)

==================================================================
--- /dev/null
+++ b/include/linux/elf_boot.h
@@ -0,0 +1,15 @@
+#ifndef ELF_BOOT_H
+#define ELF_BOOT_H
+
+/* Elf notes to help bootloaders identify what program they are booting.
+ */
+
+/* Standardized Elf image notes for booting... The name for all of these is ELFBoot */
+#define ELF_NOTE_BOOT ELFBoot
+
+#define EIN_PROGRAM_NAME 1 /* The program in this ELF file */
+#define EIN_PROGRAM_VERSION 2 /* The version of the program in this ELF file */
+#define EIN_PROGRAM_CHECKSUM 3 /* ip style checksum of the memory image. */
+#define EIN_ARGUMENT_STYLE 4 /* String identifying argument passing style */
+
+#endif /* ELF_BOOT_H */
--
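For readers wondering how a bootloader might consume these notes, here is a rough sketch in C. It assumes the standard ELF note layout (three 32-bit words followed by a 4-byte-padded name and descriptor) and that the note name is stored as the string "ELFBoot". The helper find_boot_note() and struct note_hdr are hypothetical names invented for this illustration; nothing like them is in the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Note types copied from the patch's include/linux/elf_boot.h. */
#define EIN_PROGRAM_NAME     1
#define EIN_PROGRAM_VERSION  2
#define EIN_PROGRAM_CHECKSUM 3
#define EIN_ARGUMENT_STYLE   4

/* Generic ELF note header: name and descriptor follow, each padded to
 * a 4-byte boundary. */
struct note_hdr {
	uint32_t n_namesz;	/* name length, including the NUL */
	uint32_t n_descsz;	/* descriptor length */
	uint32_t n_type;	/* one of the EIN_* values */
};

static size_t align4(size_t n)
{
	return (n + 3) & ~(size_t)3;
}

/*
 * Hypothetical helper (not part of the patch): scan a buffer holding
 * the contents of a PT_NOTE segment for an "ELFBoot" note of the given
 * type. Returns a pointer to the descriptor and stores its length in
 * *len, or returns NULL if no such note exists or the data is truncated.
 */
static const void *find_boot_note(const uint8_t *buf, size_t size,
				  uint32_t type, size_t *len)
{
	static const char name[] = "ELFBoot";
	size_t off = 0;

	while (size - off >= sizeof(struct note_hdr)) {
		struct note_hdr h;
		size_t name_len, desc_len;

		memcpy(&h, buf + off, sizeof(h));
		off += sizeof(h);

		name_len = align4(h.n_namesz);
		desc_len = align4(h.n_descsz);
		if (name_len > size - off || desc_len > size - off - name_len)
			return NULL;	/* truncated note */

		if (h.n_type == type && h.n_namesz == sizeof(name) &&
		    memcmp(buf + off, name, sizeof(name)) == 0) {
			*len = h.n_descsz;
			return buf + off + name_len;
		}
		off += name_len + desc_len;
	}
	return NULL;
}
```

A loader would call this once per note type it cares about, for example with EIN_PROGRAM_NAME to identify the image before deciding how to boot it. Notes with other names or types are skipped rather than treated as errors.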
This patch uses the updated boot protocol to do paravirtualized boot.
If the boot version is >= 2.07, then it will do two things:

 1. Check the bootparams loadflags to see if we should reload the
    segment registers and clear interrupts. This is appropriate for
    normal native boot and some paravirtualized environments, but
    inappropriate for others.

 2. Check the hardware architecture, and dispatch to the appropriate
    kernel entrypoint. If the bootloader doesn't set this, then we
    simply do the normal boot sequence.

Signed-off-by: Jeremy Fitzhardinge <jeremy@xensource.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Vivek Goyal <vgoyal@in.ibm.com>
Cc: Rusty Russell <rusty@rustcorp.com.au>
---
 arch/i386/boot/header.S | 9 ++++++++-
 arch/i386/kernel/head.S | 47 +++++++++++++++++++++++++++++++++++++++++++----
 2 files changed, 51 insertions(+), 5 deletions(-)

==================================================================
--- a/arch/i386/boot/header.S
+++ b/arch/i386/boot/header.S
@@ -119,7 +119,7 @@ 1:
 # Part 2 of the header, from the old setup.S
 	.ascii "HdrS" # header signature
-	.word 0x0206 # header version number (>= 0x0105)
+	.word 0x0207 # header version number (>= 0x0105)
 	# or else old loadlin-1.5 will fail)
 	.globl realmode_swtch
 realmode_swtch: .word 0, 0 # default_switch, SETUPSEG
@@ -209,6 +209,13 @@ cmdline_size: .long COMMAND_LINE_SIZ
 	#added with boot protocol
 	#version 2.06
+hardware_subarch: .long 0 # subarchitecture, added with 2.07
+	# default to 0 for normal x86 PC
+
+hardware_subarch_data: .quad 0
+
+kernel_payload: .long blob_payload # raw kernel data
+
 # End of setup header #####################################################
 	.section ".inittext", "ax"
==================================================================
--- a/arch/i386/kernel/head.S
+++ b/arch/i386/kernel/head.S
@@ -71,28 +71,37 @@ INIT_MAP_BEYOND_END = BOOTBITMAP_SIZE +
 */
 .section .text.head,"ax",@progbits
 ENTRY(startup_32)
+	/* check to see if KEEP_SEGMENTS flag is meaningful */
+	cmpw $0x207, BP_version(%esi)
+	jb 1f
+
+	/* test KEEP_SEGMENTS flag to see if the bootloader is asking
+	   us to not reload segments */
+	testb $(1<<6), BP_loadflags(%esi)
+	jnz 2f
 /*
  * Set segments to known values.
  */
-	cld
-	lgdt boot_gdt_descr - __PAGE_OFFSET
+1:	lgdt boot_gdt_descr - __PAGE_OFFSET
 	movl $(__BOOT_DS),%eax
 	movl %eax,%ds
 	movl %eax,%es
 	movl %eax,%fs
 	movl %eax,%gs
+2:
 /*
  * Clear BSS first so that there are no surprises...
- * No need to cld as DF is already clear from cld above...
- */
+ */
+	cld
 	xorl %eax,%eax
 	movl $__bss_start - __PAGE_OFFSET,%edi
 	movl $__bss_stop - __PAGE_OFFSET,%ecx
 	subl %edi,%ecx
 	shrl $2,%ecx
 	rep ; stosl
+
 /*
  * Copy bootup parameters out of the way.
  * Note: %esi still has the pointer to the real-mode data.
@@ -120,6 +129,35 @@ 2:
 	movsl
 1:
+#ifdef CONFIG_PARAVIRT
+	cmpw $0x207, (boot_params + BP_version - __PAGE_OFFSET)
+	jb default_entry
+
+	/* Paravirt-compatible boot parameters. Look to see what architecture
+	   we're booting under. */
+	movl (boot_params + BP_hardware_subarch - __PAGE_OFFSET), %eax
+	cmpl $num_subarch_entries, %eax
+	jae bad_subarch
+
+	movl subarch_entries - __PAGE_OFFSET(,%eax,4), %eax
+	subl $__PAGE_OFFSET, %eax
+	jmp *%eax
+
+bad_subarch:
+WEAK(lguest_entry)
+WEAK(xen_entry)
+	/* Unknown implementation; there's really
+	   nothing we can do at this point. */
+	ud2a
+.data
+subarch_entries:
+	.long default_entry /* normal x86/PC */
+	.long lguest_entry /* lguest hypervisor */
+	.long xen_entry /* Xen hypervisor */
+num_subarch_entries = (. - subarch_entries) / 4
+.previous
+#endif /* CONFIG_PARAVIRT */
+
 /*
  * Initialize page tables. This creates a PDE and a set of page
  * tables, which are located immediately beyond _end. The variable
@@ -132,6 +170,7 @@ 1:
 */
 page_pde_offset = (__PAGE_OFFSET >> 20);
+default_entry:
 	movl $(pg0 - __PAGE_OFFSET), %edi
 	movl $(swapper_pg_dir - __PAGE_OFFSET), %edx
 	movl $0x007, %eax /* 0x007 = PRESENT+RW+USER */
--
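The two decisions head.S makes in this patch can be modelled in C for anyone who finds the assembly hard to follow. This is only a sketch of the control flow, assuming a kernel built with CONFIG_PARAVIRT. struct boot_hdr, enum entry and both function names are made up for the illustration and do not match the real boot_params layout. The 0x207 version check, bit 6 of loadflags and the three-entry subarch table are taken from the diff.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define BOOT_PROTO_2_07	0x0207
#define KEEP_SEGMENTS	(1 << 6)	/* loadflags bit tested in head.S */

/* Hypothetical, trimmed-down view of the fields head.S reads. */
struct boot_hdr {
	uint16_t version;		/* BP_version */
	uint8_t  loadflags;		/* BP_loadflags */
	uint32_t hardware_subarch;	/* BP_hardware_subarch */
};

enum entry { ENTRY_DEFAULT, ENTRY_LGUEST, ENTRY_XEN, ENTRY_BAD };

/* Mirrors subarch_entries: the index is the hardware_subarch value. */
static const enum entry subarch_entries[] = {
	ENTRY_DEFAULT,	/* normal x86/PC */
	ENTRY_LGUEST,	/* lguest hypervisor */
	ENTRY_XEN,	/* Xen hypervisor */
};

/* Segments are reloaded unless a >= 2.07 loader sets KEEP_SEGMENTS. */
static bool should_reload_segments(const struct boot_hdr *h)
{
	if (h->version < BOOT_PROTO_2_07)
		return true;	/* flag has no meaning before 2.07 */
	return !(h->loadflags & KEEP_SEGMENTS);
}

/* Pre-2.07 loaders always get the normal entry; out-of-range
 * subarch values correspond to bad_subarch (ud2a) in the assembly. */
static enum entry pick_entry(const struct boot_hdr *h)
{
	size_t n = sizeof(subarch_entries) / sizeof(subarch_entries[0]);

	if (h->version < BOOT_PROTO_2_07)
		return ENTRY_DEFAULT;
	if (h->hardware_subarch >= n)
		return ENTRY_BAD;
	return subarch_entries[h->hardware_subarch];
}
```

One thing the model leaves out: in the assembly, lguest_entry and xen_entry are weak labels placed at bad_subarch, so a valid subarch number still ends in ud2a when the matching hypervisor support is not linked in.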