Olaf Hering
2012-Jul-05 21:06 UTC
incorrect layout of globals from head_64.S during kexec boot
During kexec in a Xen PVonHVM guest the new kernel crashes most of the time in secondary_startup_64 because the content of phys_base is corrupted. Its not zero as expected but has some random other values. While debugging that crash I came up with the change below to inspect the memory around phys_base. It turned out that the globales are not in the expected memory location. An expected value such as phys_base_plus1 is shifted, but by a different amount during repeated kexec attempts. Up to now I havent figured out where this happens. My question is: were to put additional debug to trace the copying of the data section to its final destination? Is this a task of kexec -l or does that happen during decompressing? I suspect the latter. This is the console output before the crash (the crash happens in ''movq %rax, %cr3''): ... [ 44.072548] Starting new kernel I''m in purgatory early console in decompress_kernel Decompressing Linux... Parsing ELF... done. Booting the kernel. ... example xenctx output: rip: 0000000001000146 flags: 00010086 rf s nz p rsp: 0000000002119c80 rax: 888888888a495999 rcx: 00000000000003d5 rdx: 0000000001000000 rbx: 0000000001cac000 rsi: 0000000000003000 rdi: 0000000001c13000 rbp: 0000000000000000 r8: 0000000001c13000 r9: 1111111111112222 r10: 0000000000001111 r11: 9999999999990000 r12: 8888888888889999 r13: 7777777777778888 r14: 0000000000007777 r15: 0000000000000000 cs: 0010 ss: 0000 ds: 0000 es: 0000 fs: 0000 @ 0000000000000000 gs: 0000 @ 0000000000000000/0000000000000000 cr0: 80000011 cr2: ffffffffff600400 cr3: 0211a000 cr4: 000000a0 dr0: 00000000 dr1: 00000000 dr2: 00000000 dr3: 00000000 dr6: ffff0ff0 dr7: 00000400 Code (instr addr 01000146) a0 00 00 00 0f 22 e0 48 c7 c0 00 c0 c0 01 48 03 05 02 3f c1 00 <0f> 22 d8 48 c7 c0 52 01 00 81 ff diff --git a/arch/x86/kernel/head_64.S b/arch/x86/kernel/head_64.S index 94bf9cc..999807c 100644 --- a/arch/x86/kernel/head_64.S +++ b/arch/x86/kernel/head_64.S @@ -69,6 +69,22 @@ startup_64: /* Compute the delta between the address I am compiled to run at and the * address I am actually running at. */ +#if 1 + movq $phys_base - __START_KERNEL_map, %rdx + movq phys_base_minus5(%rip),%rbp + movq phys_base_minus4(%rip),%r8 + movq phys_base_minus3(%rip),%r9 + movq phys_base_minus2(%rip),%r10 + movq phys_base_minus1(%rip),%r11 + movq phys_base(%rip),%r12 + movq phys_base_plus1(%rip),%r13 + movq phys_base_plus2(%rip),%r14 + movq phys_base_plus3(%rip),%r15 +#if 0 + ud2a + hlt +#endif +#endif leaq _text(%rip), %rbp subq $_text - __START_KERNEL_map, %rbp @@ -166,6 +182,10 @@ ENTRY(secondary_startup_64) /* Setup early boot stage 4 level pagetables. */ movq $(init_level4_pgt - __START_KERNEL_map), %rax addq phys_base(%rip), %rax +#if 0 + ud2a + hlt +#endif movq %rax, %cr3 /* Ensure I am executing from virtual addresses */ @@ -439,10 +459,28 @@ early_gdt_descr: .word GDT_ENTRIES*8-1 early_gdt_descr_base: .quad INIT_PER_CPU_VAR(gdt_page) - -ENTRY(phys_base) + .align 32 +phys_base_minus5: + .quad 0x5555555555555555 +phys_base_minus4: + .quad 0x4444444444444444 +phys_base_minus3: + .quad 0x3333333333333333 +phys_base_minus2: + .quad 0x2222222222222222 +phys_base_minus1: + .quad 0x1111111111111111 + + .globl phys_base +phys_base: /* This must match the first entry in level2_kernel_pgt */ .quad 0x0000000000000000 +phys_base_plus1: + .quad 0x9999999999999999 +phys_base_plus2: + .quad 0x8888888888888888 +phys_base_plus3: + .quad 0x7777777777777777 #include "../../x86/xen/xen-head.S"
Jan Beulich
2012-Jul-06 08:29 UTC
Re: [Xen-devel] incorrect layout of globals from head_64.S during kexec boot
>>> On 05.07.12 at 23:06, Olaf Hering <olaf-QOLJcTWqO2uzQB+pC5nmwQ@public.gmane.org> wrote: > My question is: were to put additional debug to trace the copying of the > data section to its final destination? Is this a task of kexec -l or > does that happen during decompressing? I suspect the latter. This is the > console output before the crash (the crash happens in ''movq %rax, %cr3''): > > ... > [ 44.072548] Starting new kernel > I''m in purgatory > early console in decompress_kernel > > Decompressing Linux... Parsing ELF... done.According to this, I''d first of all extend the printing done on arch/x86/boot/compressed/misc.c:parse_elf(). One possible problem (without much looking at how the individual address ranges get determined) could be overlapping memory ranges in the call to memcpy() inside the loop. Jan
Daniel Kiper
2012-Jul-06 08:41 UTC
Re: incorrect layout of globals from head_64.S during kexec boot
On Thu, Jul 05, 2012 at 11:06:07PM +0200, Olaf Hering wrote:> > During kexec in a Xen PVonHVM guest the new kernel crashes most of the > time in secondary_startup_64 because the content of phys_base is > corrupted. Its not zero as expected but has some random other values. > While debugging that crash I came up with the change below to inspect > the memory around phys_base. > > It turned out that the globales are not in the expected memory location. > An expected value such as phys_base_plus1 is shifted, but by a different > amount during repeated kexec attempts. Up to now I havent figured out > where this happens. > > My question is: were to put additional debug to trace the copying of the > data section to its final destination? Is this a task of kexec -l or > does that happen during decompressing? I suspect the latter. This is the > console output before the crash (the crash happens in ''movq %rax, %cr3''):Copy is done a few times durnig kexec/kdump but the most important in this case, I think, is in relocate_kernel() function (look for rep movsl or rep movsq and code around it). But I am a bit surprised that kernel is decompressing itself. I always thought that it is done during kexec/kdump load phase but maybe I am wrong. Could you send me more info about your Linux Kernel version, kexec-tools version and exact commands which you are using to load/exececute kernel? Daniel
Olaf Hering
2012-Jul-06 12:07 UTC
Re: incorrect layout of globals from head_64.S during kexec boot
On Fri, Jul 06, Daniel Kiper wrote:> Copy is done a few times durnig kexec/kdump but the most important > in this case, I think, is in relocate_kernel() function (look for > rep movsl or rep movsq and code around it). But I am a bit surprised > that kernel is decompressing itself. I always thought that it is done > during kexec/kdump load phase but maybe I am wrong. Could you send > me more info about your Linux Kernel version, kexec-tools version > and exact commands which you are using to load/exececute kernel?Its kexec-tools and kernel mainline, but it happens also with older versions of both. kexec works fine with the forward ported version of xenlinux. kexec -l bzImage --ramdisk=/boot/initrd-3.5.0-rc5-bug694863+ ''--command-line=root=/dev/disk/by-label/sles11sp1_full sysrq=yes panic=9 oops=panic console=ttyS0,115200 log_buf_len=16M ignore_loglevel initcall_debug debug earlyprintk=serial,ttyS0,115200'' -t bzImage --console-serial --serial=ttyS0 --serial-baud=115200 --debug kexec -e As Jan pointed out, the copying is done in arch/x86/boot/compressed/misc.c. But adding some debug to inspect *output in parse_elf() shows that the second entry in program headers is already shifted by 44 bytes in my testing, the others are shifted by the same amount. Program Headers: Type Offset VirtAddr PhysAddr FileSiz MemSiz Flg Align LOAD 0x200000 0xffffffff81000000 0x0000000001000000 0xa3b000 0xa3b000 R E 0x200000 LOAD 0xe00000 0xffffffff81c00000 0x0000000001c00000 0x05b0e8 0x05b0e8 RW 0x200000 LOAD 0x1000000 0x0000000000000000 0x0000000001c5c000 0x012c40 0x012c40 RW 0x200000 LOAD 0x106f000 0xffffffff81c6f000 0x0000000001c6f000 0x087000 0x702000 RWE 0x200000 NOTE 0x82d5bc 0xffffffff8162d5bc 0x000000000162d5bc 0x00017c 0x00017c 0x4 That makes me wonder wether kexec-tools is the culprit. Olaf
Jan Beulich
2012-Jul-06 12:56 UTC
Re: [Xen-devel] incorrect layout of globals from head_64.S during kexec boot
>>> On 06.07.12 at 14:07, Olaf Hering <olaf@aepfle.de> wrote: > But adding some debug to inspect > *output in parse_elf() shows that the second entry in program headers is > already shifted by 44 bytes in my testing, the others are shifted by the > same amount.Unfortunately it''s not clear what is shifted - the printout below looks just fine. Also, from your first mail I understood that the shift there was by an amount not divisible by 4 - does that amount vary?> Program Headers: > Type Offset VirtAddr PhysAddr FileSiz > MemSiz Flg Align > LOAD 0x200000 0xffffffff81000000 0x0000000001000000 0xa3b000 > 0xa3b000 R E 0x200000 > LOAD 0xe00000 0xffffffff81c00000 0x0000000001c00000 0x05b0e8 > 0x05b0e8 RW 0x200000 > LOAD 0x1000000 0x0000000000000000 0x0000000001c5c000 0x012c40 > 0x012c40 RW 0x200000 > LOAD 0x106f000 0xffffffff81c6f000 0x0000000001c6f000 0x087000 > 0x702000 RWE 0x200000 > NOTE 0x82d5bc 0xffffffff8162d5bc 0x000000000162d5bc 0x00017c > 0x00017c 0x4 > > > That makes me wonder wether kexec-tools is the culprit.Possibly, though generally any corruption to the compressed image should make decompression fail I would think. Jan
Olaf Hering
2012-Jul-06 13:31 UTC
Re: [Xen-devel] incorrect layout of globals from head_64.S during kexec boot
On Fri, Jul 06, Jan Beulich wrote:> >>> On 06.07.12 at 14:07, Olaf Hering <olaf-QOLJcTWqO2uzQB+pC5nmwQ@public.gmane.org> wrote: > > But adding some debug to inspect > > *output in parse_elf() shows that the second entry in program headers is > > already shifted by 44 bytes in my testing, the others are shifted by the > > same amount. > > Unfortunately it''s not clear what is shifted - the printout below > looks just fine. Also, from your first mail I understood that the shift > there was by an amount not divisible by 4 - does that amount vary?The memory location of the second LOAD entry (the .data section) is wrong. It should be at 0x1c00000, but in fact its content starts at 0x1c0002c. I looked at the x86 boot code, the vmlinux is gzipped and placed as binary blob, which is then extracted by decompress(). I will cleanup my debug changes and post the output. Olaf> > Program Headers: > > Type Offset VirtAddr PhysAddr FileSiz > > MemSiz Flg Align > > LOAD 0x200000 0xffffffff81000000 0x0000000001000000 0xa3b000 > > 0xa3b000 R E 0x200000 > > LOAD 0xe00000 0xffffffff81c00000 0x0000000001c00000 0x05b0e8 > > 0x05b0e8 RW 0x200000 > > LOAD 0x1000000 0x0000000000000000 0x0000000001c5c000 0x012c40 > > 0x012c40 RW 0x200000 > > LOAD 0x106f000 0xffffffff81c6f000 0x0000000001c6f000 0x087000 > > 0x702000 RWE 0x200000 > > NOTE 0x82d5bc 0xffffffff8162d5bc 0x000000000162d5bc 0x00017c > > 0x00017c 0x4
Jan Beulich
2012-Jul-06 13:53 UTC
Re: [Xen-devel] incorrect layout of globals from head_64.S during kexec boot
>>> On 06.07.12 at 15:31, Olaf Hering <olaf@aepfle.de> wrote: > On Fri, Jul 06, Jan Beulich wrote: > >> >>> On 06.07.12 at 14:07, Olaf Hering <olaf@aepfle.de> wrote: >> > But adding some debug to inspect >> > *output in parse_elf() shows that the second entry in program headers is >> > already shifted by 44 bytes in my testing, the others are shifted by the >> > same amount. >> >> Unfortunately it''s not clear what is shifted - the printout below >> looks just fine. Also, from your first mail I understood that the shift >> there was by an amount not divisible by 4 - does that amount vary? > > The memory location of the second LOAD entry (the .data section) is wrong. > It should be at 0x1c00000, but in fact its content starts at 0x1c0002c. > I looked at the x86 boot code, the vmlinux is gzipped and placed as > binary blob, which is then extracted by decompress().Are the virtual addresses then offset as well? Is phdr->p_offset sane? And you didn''t clarify whether the offset was always the same. Jan
Olaf Hering
2012-Jul-06 14:14 UTC
Re: [Xen-devel] incorrect layout of globals from head_64.S during kexec boot
On Fri, Jul 06, Olaf Hering wrote:> I will cleanup my debug changes and post the output.What I see is that the content of the uncompressed vmlinux is appearently already corrupted after decompress(). After I made small changes to arch/x86/boot/compressed/misc.c and arch/x86/kernel/head_64.S the offset in memory changed from 0x2c to 0x8. This could mean that the unzip code is broken, but this is rather unlikely. The odd thing is, if the first kernel is forced to return false in xen_hvm_platform() to disable the PVonHVM features then kexec works ok. Could it be that some code tweaks the stack content used by decompress() in some odd way? But that would most likely lead to a crash, not to unexpected uncompressing results. I will study the code some more. This is the readelf output from vmlinux: Program Headers: Type Offset VirtAddr PhysAddr FileSiz MemSiz Flg Align 0 LOAD 0x200000 0xffffffff81000000 0x0000000001000000 0xa3b000 0xa3b000 R E 0x200000 1 LOAD 0xe00000 0xffffffff81c00000 0x0000000001c00000 0x05b0e8 0x05b0e8 RW 0x200000 2 LOAD 0x1000000 0x0000000000000000 0x0000000001c5c000 0x012c40 0x012c40 RW 0x200000 3 LOAD 0x106f000 0xffffffff81c6f000 0x0000000001c6f000 0x087000 0x702000 RWE 0x200000 4 NOTE 0x82d5bc 0xffffffff8162d5bc 0x000000000162d5bc 0x00017c 0x00017c 0x4 Dump of the Program Header sections: #0 dd if=../O/x86_64-O-3.5/vmlinux bs=1 skip=$(( 0x200000 )) count=$(( 0xa3b000 )) | xxd -a -g 8 | head -n 12 0000000: 4c8b0d2940c1004c 8b152a40c1004c8b L..)@..L..*@..L. 0000010: 1d2b40c1004c8b25 3440c1004c8b2d35 .+@..L.%4@..L.-5 0000020: 40c1004c8b353640 c1004c8b3d3740c1 @..L.56@..L.=7@. 0000030: 00488d2dc8ffffff 4881ed0000000148 .H.-....H......H 0000040: 89e825ffff1f0085 c00f855b01000048 ..%........[...H 0000050: 8d15aaffffff48b8 0000000080000000 ......H......... 0000060: 4839c20f83410100 0048012d90bfc000 H9...A...H.-.... 0000070: 48012d09c8c00048 012d7acfc0004801 H.-....H.-z...H. 0000080: 2d7bcfc00048012d 64efc00048012d65 -{...H.-d...H.-e 0000090: efc00048012d36ff c000488d3d5fffff ...H.-6...H.=_.. 00000a0: ff4881e70000e0ff 4889f848c1e81e48 .H......H..H...H 00000b0: 25ff010000743148 8d956330c101488d %....t1H..c0..H. #1 dd if=../O/x86_64-O-3.5/vmlinux bs=1 skip=$(( 0xe00000 )) count=$(( 0x05b0e8 )) | xxd -a -g 8 | head -n 12 0000000: 8044c181ffffffff 60acc181ffffffff .D......`....... 0000010: 0000000000000000 0000000001000010 ................ 0000020: ffffffffffffffff c0d70481ffffffff ................ 0000030: 0000000000000000 0000000000000000 ................ * 0002000: 48c7c0600000000f 05c3cccccccccccc H..`............ 0002010: cccccccccccccccc cccccccccccccccc ................ 0002020: cccccccccccccccc cccccccccccccccc ................ 0002030: cccccccccccccccc cccccccccccccccc ................ 0002040: cccccccccccccccc cccccccccccccccc ................ 0002050: cccccccccccccccc cccccccccccccccc ................ 0002060: cccccccccccccccc cccccccccccccccc ................ #2 dd if=../O/x86_64-O-3.5/vmlinux bs=1 skip=$(( 0x1000000 )) count=$(( 0x012c40 )) | xxd -a -g 8 | head -n 12 0000000: 0000000000000000 0000000000000000 ................ * 0004000: 0000000000000000 ffff0000009bcf00 ................ 0004010: ffff0000009baf00 ffff00000093cf00 ................ 0004020: ffff000000fbcf00 ffff000000f3cf00 ................ 0004030: ffff000000fbaf00 0000000000000000 ................ 0004040: 0000000000000000 0000000000000000 ................ * 000be80: ffffffff00000000 0000000000000000 ................ 000be90: 0000000000000000 0000000000000000 ................ * 000bf00: ffffffffffffffff ffffffffffffffff ................ #3 dd if=../O/x86_64-O-3.5/vmlinux bs=1 skip=$(( 0x106f000 )) count=$(( 0x087000 )) | xxd -a -g 8 | head -n 12 0000000: 6a006a00e9170100 006a006a01e90e01 j.j......j.j.... 0000010: 00006a006a02e905 0100006a006a03e9 ..j.j......j.j.. 0000020: fc0000006a006a04 e9f30000006a006a ....j.j......j.j 0000030: 05e9ea0000006a00 6a06e9e10000006a ......j.j......j 0000040: 006a07e9d8000000 66906a08e9cf0000 .j......f.j..... 0000050: 006a006a09e9c600 000066906a0ae9bd .j.j......f.j... 0000060: 00000066906a0be9 b400000066906a0c ...f.j......f.j. 0000070: e9ab00000066906a 0de9a20000006690 .....f.j......f. 0000080: 6a0ee9990000006a 006a0fe990000000 j......j.j...... 0000090: 6a006a10e9870000 0066906a11e97e00 j.j......f.j..~. 00000a0: 00006a006a12e975 0000006a006a13e9 ..j.j..u...j.j.. 00000b0: 6c0000006a006a14 e9630000006a006a l...j.j..c...j.j #4 dd if=../O/x86_64-O-3.5/vmlinux bs=1 skip=$(( 0x82d5bc )) count=$(( 0x00017c )) 2>/dev/null | xxd -a -g 8 | head -n 12 0000000: 0400000006000000 0600000058656e00 ............Xen. 0000010: 6c696e7578000000 0400000004000000 linux........... 0000020: 0700000058656e00 322e360004000000 ....Xen.2.6..... 0000030: 0800000005000000 58656e0078656e2d ........Xen.xen- 0000040: 332e300004000000 0800000003000000 3.0............. 0000050: 58656e0000000080 ffffffff04000000 Xen............. 0000060: 0800000001000000 58656e0010f2c681 ........Xen..... 0000070: ffffffff04000000 0800000002000000 ................ 0000080: 58656e0000100081 ffffffff04000000 Xen............. 0000090: 2a0000000a000000 58656e0021777269 *.......Xen.!wri 00000a0: 7461626c655f7061 67655f7461626c65 table_page_table 00000b0: 737c7061655f7067 6469725f61626f76 s|pae_pgdir_abov Dump of the place where phys_base is stored in the .data section: (5555555555555555 should have been at offset 0x14020, but in fact its stored at 0x14028 in the output of the guest below) * 0014000: 7f000000c681ffff ffff000000000000 ................ 0014010: 0000000000000000 0000000000000000 ................ 0014020: 5555555555555555 4444444444444444 UUUUUUUUDDDDDDDD 0014030: 3333333333333333 2222222222222222 33333333"""""""" 0014040: 1111111111111111 9090909090909090 ................ 0014050: 0000000000000000 9999999999999999 ................ 0014060: 8888888888888888 7777777777777777 ........wwwwwwww 0014070: 0000000000000000 0000000000000000 ................ 0014080: 7b599781ffffffff 82599781ffffffff {Y.......Y...... 0014090: 0000000000000000 0000000000000000 ................ * Thats what I get in the guest: ... [ 29.590290] Starting new kernel I''m in purgatory early console in decompress_kernel Decompressing Linux... Parsing ELF... output 0000000001000000 phdrs 000000000210e040 ehdr.e_phoff 0000000000000040 ehdr.e_phnum 0000000000000005 i 0000000000000000 p 0000000001200000 4c8b0d2940c1004c8b152a40c1004c8b1d2b40c1004c8b253440c1004c8b2d3540c1004c8b353640c1004c8b3d3740c100488d2dc8ffffff4881ed0000000148 i 0000000000000001 p 0000000001e00000 00000000000000008044c181ffffffff60acc181ffffffff00000000000000000000000001000010ffffffffffffffffc0d70481ffffffff0000000000000000 i 0000000000000001 &p[j] 0000000001e14028 5555555555555555444444444444444433333333333333332222222222222222111111111111111190909090909090900000000000000000999999999999999988888888888888887777777777777777 i 0000000000000002 p 0000000002000000 00000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000 i 0000000000000003 p 000000000206f000 00000000000000006a006a00e9170100006a006a01e90e0100006a006a02e9050100006a006a03e9fc0000006a006a04e9f30000006a006a05e9ea0000006a00 i 0000000000000004 p 000000000182d5bc 6631d2e9c9fdfdff04ec300006ec300006ec300058656e006c696e7578ec300004ec300004ec300007ec300058656e00322e360004ec30000830ec3005ec3000 i 0000000000000000 dest 0000000001000000 phdr->p_paddr 0000000001000000 phdr->p_offset 0000000000200000 phdr->p_filesz 0000000000a3b000 i 0000000000000000 dest 0000000001000000 output + phdr->p_offset 0000000001200000 phdr->p_filesz 0000000000a3b000 overlap 000000000083b000 i 0000000000000001 dest 0000000001c00000 phdr->p_paddr 0000000001c00000 phdr->p_offset 0000000000e00000 phdr->p_filesz 000000000005b0e8 i 0000000000000001 dest 0000000001c00000 output + phdr->p_offset 0000000001e00000 phdr->p_filesz 000000000005b0e8 i 0000000000000002 dest 0000000001c5c000 phdr->p_paddr 0000000001c5c000 phdr->p_offset 0000000001000000 phdr->p_filesz 0000000000012c40 i 0000000000000002 dest 0000000001c5c000 output + phdr->p_offset 0000000002000000 phdr->p_filesz 0000000000012c40 i 0000000000000003 dest 0000000001c6f000 phdr->p_paddr 0000000001c6f000 phdr->p_offset 000000000106f000 phdr->p_filesz 0000000000087000 i 0000000000000003 dest 0000000001c6f000 output + phdr->p_offset 000000000206f000 phdr->p_filesz 0000000000087000 done. Booting the kernel. <crash> xenctx shows this: rip: 0000000001000136 flags: 00010086 rf s nz p rsp: 000000000211a040 rax: 9090909092515090 rcx: 00000000000003d5 rdx: 0000000001000000 rbx: 0000000001cac000 rsi: 0000000000003000 rdi: 0000000001c13000 rbp: 0000000000000000 r8: 0000000001c13000 r9: 4444444444444444 r10: 3333333333333333 r11: 2222222222222222 r12: 9090909090909090 r13: 0000000000000000 r14: 9999999999999999 r15: 8888888888888888 cs: 0010 ss: 0000 ds: 0000 es: 0000 fs: 0000 @ 0000000000000000 gs: 0000 @ 0000000000000000/0000000000000000 cr0: 80000011 cr2: ffffffffff600400 cr3: 0211b000 cr4: 000000a0 dr0: 00000000 dr1: 00000000 dr2: 00000000 dr3: 00000000 dr6: ffff0ff0 dr7: 00000400 Code (instr addr 01000136) a0 00 00 00 0f 22 e0 48 c7 c0 00 c0 c0 01 48 03 05 1a 3f c1 00 <0f> 22 d8 48 c7 c0 42 01 00 81 ff The debug patch for 3.5-rc5: diff --git a/arch/x86/boot/compressed/misc.c b/arch/x86/boot/compressed/misc.c index 7116dcb..14e77cf 100644 --- a/arch/x86/boot/compressed/misc.c +++ b/arch/x86/boot/compressed/misc.c @@ -273,6 +273,36 @@ static void error(char *x) asm("hlt"); } +static void putbyte(unsigned char val) +{ + static const char digits[] = "0123456789abcdef"; + char str[(sizeof(unsigned char) * 2) + 1]; + int i = sizeof(str), c = sizeof(unsigned char) * 2; + str[--i] = 0; + while(c--) { + str[--i] = digits[val & 0xf]; + val >>=4; + } + putstr(str); +} +static void __putval(char *s, unsigned long val) +{ + static const char digits[] = "0123456789abcdef"; + char str[(sizeof(unsigned long) * 2) + 3]; + int i = sizeof(str), c = sizeof(unsigned long) * 2; + str[--i] = 0; + str[--i] = '' ''; + while(c--) { + str[--i] = digits[val & 0xf]; + val >>=4; + } + str[--i] = '' ''; + + putstr(s); + putstr(str); +} +#define putval(x) __putval(#x, (unsigned long)(x)) + static void parse_elf(void *output) { #ifdef CONFIG_X86_64 @@ -284,6 +314,9 @@ static void parse_elf(void *output) #endif void *dest; int i; + unsigned long j, c; + unsigned char *p; + signed long overlap; memcpy(&ehdr, output, sizeof(ehdr)); if (ehdr.e_ident[EI_MAG0] != ELFMAG0 || @@ -303,6 +336,33 @@ static void parse_elf(void *output) memcpy(phdrs, output + ehdr.e_phoff, sizeof(*phdrs) * ehdr.e_phnum); + putstr("\n"); + putval(output); + putval(phdrs); + putval(ehdr.e_phoff); + putval(ehdr.e_phnum); + putstr("\n"); + for (i = 0; i < ehdr.e_phnum; i++) { + phdr = &phdrs[i]; + p = output + phdr->p_offset; + putval(i); + putval(p); + for (j = 0; j < 64; j++) + putbyte(p[j]); + putstr("\n"); + if (i == 1) + for (j = 0; j < phdr->p_filesz; j++) { + if (p[j + 0] == 0x55 && p[j + 1] == 0x55 && p[j + 2] == 0x55 && p[j + 3] == 0x55 && p[j + 4] == 0x55 && p[j + 5] == 0x55 && p[j + 6] == 0x55 && p[j + 7] == 0x55) { + putval(i); + putval(&p[j]); + for (c = 0; c < 10 * sizeof(unsigned long); c++) + putbyte(p[j +c]); + putstr("\n"); + break; + } + } + } + putstr("\n"); for (i = 0; i < ehdr.e_phnum; i++) { phdr = &phdrs[i]; @@ -314,6 +374,22 @@ static void parse_elf(void *output) #else dest = (void *)(phdr->p_paddr); #endif + putval(i); + putval(dest); + putval(phdr->p_paddr); + putval(phdr->p_offset); + putval(phdr->p_filesz); + putstr("\n"); + putval(i); + putval(dest); + putval(output + phdr->p_offset); + putval(phdr->p_filesz); + putstr("\n"); + overlap = ((long)dest + (long)phdr->p_filesz) - ((long)output + (long)phdr->p_offset); + if (overlap > 0) { + putval(overlap); + putstr("\n"); + } memcpy(dest, output + phdr->p_offset, phdr->p_filesz); diff --git a/arch/x86/kernel/head_64.S b/arch/x86/kernel/head_64.S index 94bf9cc..42f1836 100644 --- a/arch/x86/kernel/head_64.S +++ b/arch/x86/kernel/head_64.S @@ -69,6 +69,15 @@ startup_64: /* Compute the delta between the address I am compiled to run at and the * address I am actually running at. */ +#if 1 + movq phys_base_minus3(%rip),%r9 + movq phys_base_minus2(%rip),%r10 + movq phys_base_minus1(%rip),%r11 + movq phys_base(%rip),%r12 + movq phys_base_plus1(%rip),%r13 + movq phys_base_plus2(%rip),%r14 + movq phys_base_plus3(%rip),%r15 +#endif leaq _text(%rip), %rbp subq $_text - __START_KERNEL_map, %rbp @@ -166,6 +175,9 @@ ENTRY(secondary_startup_64) /* Setup early boot stage 4 level pagetables. */ movq $(init_level4_pgt - __START_KERNEL_map), %rax addq phys_base(%rip), %rax +#if 0 + ud2a +#endif movq %rax, %cr3 /* Ensure I am executing from virtual addresses */ @@ -439,10 +451,27 @@ early_gdt_descr: .word GDT_ENTRIES*8-1 early_gdt_descr_base: .quad INIT_PER_CPU_VAR(gdt_page) + .align 32 +phys_base_minus5: + .quad 0x5555555555555555 +phys_base_minus4: + .quad 0x4444444444444444 +phys_base_minus3: + .quad 0x3333333333333333 +phys_base_minus2: + .quad 0x2222222222222222 +phys_base_minus1: + .quad 0x1111111111111111 ENTRY(phys_base) /* This must match the first entry in level2_kernel_pgt */ .quad 0x0000000000000000 +phys_base_plus1: + .quad 0x9999999999999999 +phys_base_plus2: + .quad 0x8888888888888888 +phys_base_plus3: + .quad 0x7777777777777777 #include "../../x86/xen/xen-head.S"
Jan Beulich
2012-Jul-06 14:50 UTC
Re: [Xen-devel] incorrect layout of globals from head_64.S during kexec boot
>>> On 06.07.12 at 16:14, Olaf Hering <olaf@aepfle.de> wrote: > On Fri, Jul 06, Olaf Hering wrote: > >> I will cleanup my debug changes and post the output. > > What I see is that the content of the uncompressed vmlinux is > appearently already corrupted after decompress(). After I made small > changes to arch/x86/boot/compressed/misc.c and arch/x86/kernel/head_64.S > the offset in memory changed from 0x2c to 0x8. > > This could mean that the unzip code is broken, but this is rather > unlikely. The odd thing is, if the first kernel is forced to return > false in xen_hvm_platform() to disable the PVonHVM features then kexec > works ok. > > Could it be that some code tweaks the stack content used by decompress() > in some odd way? But that would most likely lead to a crash, not to > unexpected uncompressing results.Especially if the old and new kernel are using the exact same image, how about the decompression writing over the shared info page causing all this? As the decompressor wouldn''t expect Xen to possibly write stuff there itself, it could easily be that some repeat count gets altered, thus breaking the decompressed data without the decompression code necessarily noticing. If that''s the case, there would be a more general problem here (for kdump at least), as granted pages could also still get written to when the new kernel already is in the process of launching. Jan
Daniel Kiper
2012-Jul-06 15:54 UTC
Re: incorrect layout of globals from head_64.S during kexec boot
On Fri, Jul 06, 2012 at 02:07:50PM +0200, Olaf Hering wrote:> On Fri, Jul 06, Daniel Kiper wrote: > > > Copy is done a few times durnig kexec/kdump but the most important > > in this case, I think, is in relocate_kernel() function (look for > > rep movsl or rep movsq and code around it). But I am a bit surprised > > that kernel is decompressing itself. I always thought that it is done > > during kexec/kdump load phase but maybe I am wrong. Could you sendI am wrong.> > me more info about your Linux Kernel version, kexec-tools version > > and exact commands which you are using to load/exececute kernel? > > Its kexec-tools and kernel mainline, but it happens also with older > versions of both. kexec works fine with the forward ported version of > xenlinux. > > kexec -l bzImage --ramdisk=/boot/initrd-3.5.0-rc5-bug694863+ ''--command-line=root=/dev/disk/by-label/sles11sp1_full > sysrq=yes > panic=9 > oops=panic > console=ttyS0,115200 > log_buf_len=16M > ignore_loglevel > initcall_debug > debug earlyprintk=serial,ttyS0,115200'' -t bzImage --console-serial --serial=ttyS0 --serial-baud=115200 --debug > kexec -eNothing special. But try also ELF loader.> As Jan pointed out, the copying is done in > arch/x86/boot/compressed/misc.c. But adding some debug to inspect > *output in parse_elf() shows that the second entry in program headers is > already shifted by 44 bytes in my testing, the others are shifted by the > same amount. > > Program Headers: > Type Offset VirtAddr PhysAddr FileSiz MemSiz Flg Align > LOAD 0x200000 0xffffffff81000000 0x0000000001000000 0xa3b000 0xa3b000 R E 0x200000 > LOAD 0xe00000 0xffffffff81c00000 0x0000000001c00000 0x05b0e8 0x05b0e8 RW 0x200000 > LOAD 0x1000000 0x0000000000000000 0x0000000001c5c000 0x012c40 0x012c40 RW 0x200000 > LOAD 0x106f000 0xffffffff81c6f000 0x0000000001c6f000 0x087000 0x702000 RWE 0x200000 > NOTE 0x82d5bc 0xffffffff8162d5bc 0x000000000162d5bc 0x00017c 0x00017c 0x4 > > That makes me wonder wether kexec-tools is the culprit.Is it relocatable kernel? If not please try use not relocatable one. Then kernel will be always in the same place. Daniel
Olaf Hering
2012-Jul-06 17:29 UTC
Re: [Xen-devel] incorrect layout of globals from head_64.S during kexec boot
On Fri, Jul 06, Jan Beulich wrote:> > Could it be that some code tweaks the stack content used by decompress() > > in some odd way? But that would most likely lead to a crash, not to > > unexpected uncompressing results. > > Especially if the old and new kernel are using the exact same > image, how about the decompression writing over the shared > info page causing all this? As the decompressor wouldn''t > expect Xen to possibly write stuff there itself, it could easily be > that some repeat count gets altered, thus breaking the > decompressed data without the decompression code necessarily > noticing.In my case the gfn of the shared info page is 1f54. Is it possible to move the page at runtime? It looks like it can be done because the hvm loader configures fffff initially. Perhaps the PVonHVM code has to give up the shared pages or move them to some dedicated unused page during the process of booting into the new kernel. Konrad, any idea how that could be done?> If that''s the case, there would be a more general problem here > (for kdump at least), as granted pages could also still get written > to when the new kernel already is in the process of launching.Maybe you meant to say kexec instead of kdump? kdump runs in its own memory area, so I think the worst thing is that some pages of the crashed kernel get modified. Olaf
Olaf Hering
2012-Jul-10 09:33 UTC
Re: [Xen-devel] incorrect layout of globals from head_64.S during kexec boot
On Fri, Jul 06, Olaf Hering wrote:> On Fri, Jul 06, Jan Beulich wrote: > > > > Could it be that some code tweaks the stack content used by decompress() > > > in some odd way? But that would most likely lead to a crash, not to > > > unexpected uncompressing results. > > > > Especially if the old and new kernel are using the exact same > > image, how about the decompression writing over the shared > > info page causing all this? As the decompressor wouldn''t > > expect Xen to possibly write stuff there itself, it could easily be > > that some repeat count gets altered, thus breaking the > > decompressed data without the decompression code necessarily > > noticing. > > In my case the gfn of the shared info page is 1f54. > Is it possible to move the page at runtime? It looks like it can be done > because the hvm loader configures fffff initially. > > Perhaps the PVonHVM code has to give up the shared pages or move them to > some dedicated unused page during the process of booting into the new > kernel.The pfn 1f54 of the shared info page is in the middle of the new bzImage: pfn 32080 KiB _head 29360 KiB _text 33826 KiB _end 33924 KiB In other words, the compressed vmlinux.bin gets slightly modified during uncompression. For some reason this does not lead to decompression errors. In the patch below the pfn is moved from the bss to the data section. As a result the new location is now at 28680 KiB, which is outside of the bzImage. Maybe there is a way to use another dedicated page as shared info page. Olaf diff --git a/arch/x86/xen/enlighten.c b/arch/x86/xen/enlighten.c index ff962d4..258e8f3 100644 --- a/arch/x86/xen/enlighten.c +++ b/arch/x86/xen/enlighten.c @@ -1469,19 +1469,16 @@ void __ref xen_hvm_init_shared_info(void) { int cpu; struct xen_add_to_physmap xatp; - static struct shared_info *shared_info_page = 0; + static struct shared_info shared_info_page __page_aligned_data = { }; - if (!shared_info_page) - shared_info_page = (struct shared_info *) - extend_brk(PAGE_SIZE, PAGE_SIZE); xatp.domid = DOMID_SELF; xatp.idx = 0; xatp.space = XENMAPSPACE_shared_info; - xatp.gpfn = __pa(shared_info_page) >> PAGE_SHIFT; + xatp.gpfn = __pa(&shared_info_page) >> PAGE_SHIFT; if (HYPERVISOR_memory_op(XENMEM_add_to_physmap, &xatp)) BUG(); - HYPERVISOR_shared_info = (struct shared_info *)shared_info_page; + HYPERVISOR_shared_info = &shared_info_page; /* xen_vcpu is a pointer to the vcpu_info struct in the shared_info * page, we use it in the event channel upcall and in some pvclock
Konrad Rzeszutek Wilk
2012-Jul-10 14:14 UTC
Re: [Xen-devel] incorrect layout of globals from head_64.S during kexec boot
On Tue, Jul 10, 2012 at 11:33:27AM +0200, Olaf Hering wrote:> On Fri, Jul 06, Olaf Hering wrote: > > > On Fri, Jul 06, Jan Beulich wrote: > > > > > > Could it be that some code tweaks the stack content used by decompress() > > > > in some odd way? But that would most likely lead to a crash, not to > > > > unexpected uncompressing results. > > > > > > Especially if the old and new kernel are using the exact same > > > image, how about the decompression writing over the shared > > > info page causing all this? As the decompressor wouldn''t > > > expect Xen to possibly write stuff there itself, it could easily be > > > that some repeat count gets altered, thus breaking the > > > decompressed data without the decompression code necessarily > > > noticing. > > > > In my case the gfn of the shared info page is 1f54. > > Is it possible to move the page at runtime? It looks like it can be done > > because the hvm loader configures fffff initially. > > > > Perhaps the PVonHVM code has to give up the shared pages or move them to > > some dedicated unused page during the process of booting into the new > > kernel. > > The pfn 1f54 of the shared info page is in the middle of the new bzImage: > pfn 32080 KiB <= ok, so the old .bss> _head 29360 KiB > _text 33826 KiB > _end 33924 KiBSo _head is at 1CAC and _text starts at 2108h? Ugh, and 1F54 gets overriden. And with your patch, the data gets stuck in between _text and _end? No wait, where would the shared_info be located in the old kernel? Somewhere below the 1CACh? I presume 1F54 is the _brk_end for the old kernel as well? Could you tell me how the decompress code works? Is the new kernel put at PFN 1000h and the decompressor code is put below it? And the decompressor code uses the .bss section of the "new" kernel to do its deed - since it assumes that the carcass of the old kernel is truly dead and it is not raising a hand saying: "I am not dead yet!". Which brings me to another question - say we do use this patch, what if the decompressor overwrites the old kernels .data section. Won''t we run into this problem again? And what about the new kernel? It will try to register at a new MFN location the VCPU structure. Is that something that the hypervisor is OK with? Does that work with more than VCPU? Or is is stuck with just one VCPU (b/c it couldn''t actually do the hypercall?). Or is the registration OK, since the new kernel has the same layout so it registers at the same MFN as the "dead" kernel and it works peachy? What if the kernel version used in the kexec is different from the old one (say it has less built in things)? That would mean the .text and .data section are different than the "dead" kernel?> > In other words, the compressed vmlinux.bin gets slightly modified during > uncompression. For some reason this does not lead to decompression > errors.Hm> > In the patch below the pfn is moved from the bss to the data section. As > a result the new location is now at 28680 KiB, which is outside of the > bzImage. > > Maybe there is a way to use another dedicated page as shared info page.That would do it, but it has the negative consequence that we end up consuming an extra PAGE_SIZE that on baremetal kernels won''t be used. Also, I think there are other issues lurking around. The other users of the .bss (or rather of the extend_brk) call would also be trashed. On PVHVM that would be the DMI. On PV guests (Daniel has some thoughts on how to make kexec work under PV guests) that means the P2M tree gets trashed. And I am kind of worried that moving it to the .data section won''t be completly safe - as the decompressor might blow away that part too.> > Olaf > > > diff --git a/arch/x86/xen/enlighten.c b/arch/x86/xen/enlighten.c > index ff962d4..258e8f3 100644 > --- a/arch/x86/xen/enlighten.c > +++ b/arch/x86/xen/enlighten.c > @@ -1469,19 +1469,16 @@ void __ref xen_hvm_init_shared_info(void) > { > int cpu; > struct xen_add_to_physmap xatp; > - static struct shared_info *shared_info_page = 0; > + static struct shared_info shared_info_page __page_aligned_data = { }; > > - if (!shared_info_page) > - shared_info_page = (struct shared_info *) > - extend_brk(PAGE_SIZE, PAGE_SIZE); > xatp.domid = DOMID_SELF; > xatp.idx = 0; > xatp.space = XENMAPSPACE_shared_info; > - xatp.gpfn = __pa(shared_info_page) >> PAGE_SHIFT; > + xatp.gpfn = __pa(&shared_info_page) >> PAGE_SHIFT; > if (HYPERVISOR_memory_op(XENMEM_add_to_physmap, &xatp)) > BUG(); > > - HYPERVISOR_shared_info = (struct shared_info *)shared_info_page; > + HYPERVISOR_shared_info = &shared_info_page; > > /* xen_vcpu is a pointer to the vcpu_info struct in the shared_info > * page, we use it in the event channel upcall and in some pvclock > > _______________________________________________ > Xen-devel mailing list > Xen-devel@lists.xen.org > http://lists.xen.org/xen-devel
Ian Campbell
2012-Jul-10 14:46 UTC
Re: [Xen-devel] incorrect layout of globals from head_64.S during kexec boot
On Tue, 2012-07-10 at 10:14 -0400, Konrad Rzeszutek Wilk wrote:> Which brings me to another question - say we do use this patch, what > if the decompressor overwrites the old kernels .data section. Won''t > we run into this problem again?I''ve not really been following this thread that closely but wouldn''t the right answer be for the original kernel to unmap the shared info on kexec? Or maybe remap it up to some high/reserved address? Can it read the original address used by hvmloader at start of day and reuse that? Ian.
Konrad Rzeszutek Wilk
2012-Jul-10 14:51 UTC
Re: [Xen-devel] incorrect layout of globals from head_64.S during kexec boot
On Tue, Jul 10, 2012 at 08:46:34AM -0600, Ian Campbell wrote:> On Tue, 2012-07-10 at 10:14 -0400, Konrad Rzeszutek Wilk wrote: > > Which brings me to another question - say we do use this patch, what > > if the decompressor overwrites the old kernels .data section. Won''t > > we run into this problem again? > > I''ve not really been following this thread that closely but wouldn''t the > right answer be for the original kernel to unmap the shared info on > kexec? Or maybe remap it up to some high/reserved address? Can it readThat would be the right answer I think, but I don''t see the a VCPU_deregister call (only VCPU_register). But perhaps the XENMEM_decrease_reservation for the particular MFN is the answer to do a VCPU "de-register" ?> the original address used by hvmloader at start of day and reuse that?Wait, we can map multiple shared_info? Ooh, somehow I thought the guest could only do one registration.> > Ian. >
Olaf Hering
2012-Jul-10 15:23 UTC
Re: [Xen-devel] incorrect layout of globals from head_64.S during kexec boot
On Tue, Jul 10, Konrad Rzeszutek Wilk wrote:> On Tue, Jul 10, 2012 at 11:33:27AM +0200, Olaf Hering wrote: > > On Fri, Jul 06, Olaf Hering wrote: > > > > > On Fri, Jul 06, Jan Beulich wrote: > > > > > > > > Could it be that some code tweaks the stack content used by decompress() > > > > > in some odd way? But that would most likely lead to a crash, not to > > > > > unexpected uncompressing results. > > > > > > > > Especially if the old and new kernel are using the exact same > > > > image, how about the decompression writing over the shared > > > > info page causing all this? As the decompressor wouldn''t > > > > expect Xen to possibly write stuff there itself, it could easily be > > > > that some repeat count gets altered, thus breaking the > > > > decompressed data without the decompression code necessarily > > > > noticing. > > > > > > In my case the gfn of the shared info page is 1f54. > > > Is it possible to move the page at runtime? It looks like it can be done > > > because the hvm loader configures fffff initially. > > > > > > Perhaps the PVonHVM code has to give up the shared pages or move them to > > > some dedicated unused page during the process of booting into the new > > > kernel. > > > > The pfn 1f54 of the shared info page is in the middle of the new bzImage: > > pfn 32080 KiB <= ok, so the old .bss > > > _head 29360 KiB > > _text 33826 KiB > > _end 33924 KiB > > So _head is at 1CAC and _text starts at 2108h? > Ugh, and 1F54 gets overriden. And with your patch, the data gets > stuck in between _text and _end? No wait, where would the shared_info > be located in the old kernel? Somewhere below the 1CACh?The 3 symbols above are from bzImage, which contains the gzipped vmlinux and some code to decompress and actually start the uncompressed vmlinux.> I presume 1F54 is the _brk_end for the old kernel as well?Its in the .brk section of the old kernel.> Could you tell me how the decompress code works? Is the new kernel > put at PFN 1000h and the decompressor code is put below it?I''m not too familiar with the details of the events that happen during kexec -e. This is what happens from my understanding: kexec -l loads the new kernel and some helper code into some allocated memory. kexec -e starts the helper code which relocates the new bzImage to its new location (_head) and starts it. The new bzImage uncompresses the vmlinux to its final location and finally starts the new vmlinux. Now that I think about it, during the relocation of the new bzImage the part which contains the compressed vmlinux will already be corrupted because the shared info page will be modified by Xen (I dont know in what intervals the page gets modified).> And the decompressor code uses the .bss section of the "new" kernel > to do its deed - since it assumes that the carcass of the old kernel > is truly dead and it is not raising a hand saying: "I am not dead yet!".The decompressor uses its own .bss.> Which brings me to another question - say we do use this patch, what > if the decompressor overwrites the old kernels .data section. Won''t > we run into this problem again?The old kernel is dead at this point. If both kernels have the same memory layout then the decompressor will clear the page. If they have a different layout the .data section (or whatever happens to be there) of the new kernel will be corrupted.> And what about the new kernel? It will try to register at a new > MFN location the VCPU structure. Is that something that the hypervisor > is OK with? Does that work with more than VCPU? Or is is stuck with > just one VCPU (b/c it couldn''t actually do the hypercall?).So far I havent seen an issue because my guest uses a single cpu.> Or is the registration OK, since the new kernel has the same layout > so it registers at the same MFN as the "dead" kernel and it works > peachy? What if the kernel version used in the kexec is different > from the old one (say it has less built in things)? That would mean > the .text and .data section are different than the "dead" kernel?Yes, the layout would differ. During decompression corruption may occour.> > In the patch below the pfn is moved from the bss to the data section. As > > a result the new location is now at 28680 KiB, which is outside of the > > bzImage. > > > > Maybe there is a way to use another dedicated page as shared info page. > > That would do it, but it has the negative consequence that we end up > consuming an extra PAGE_SIZE that on baremetal kernels won''t be used.I was not thinking of statically allocated pages but some new concept of allocating such shared pages. Shouldnt there be some dedicated area in the E820 table which has to be used during the whole life time of the guest? Are there more shared areas or is it just the shared info page?> And I am kind of worried that moving it to the .data section won''t > be completly safe - as the decompressor might blow away that part too.The decompressor may just clear the area, but since there is no way to tell where the shared pages are its always a risk to allocate them at compile time. Olaf
Ian Campbell
2012-Jul-10 15:29 UTC
Re: [Xen-devel] incorrect layout of globals from head_64.S during kexec boot
On Tue, 2012-07-10 at 10:51 -0400, Konrad Rzeszutek Wilk wrote:> On Tue, Jul 10, 2012 at 08:46:34AM -0600, Ian Campbell wrote: > > On Tue, 2012-07-10 at 10:14 -0400, Konrad Rzeszutek Wilk wrote: > > > Which brings me to another question - say we do use this patch, what > > > if the decompressor overwrites the old kernels .data section. Won''t > > > we run into this problem again? > > > > I''ve not really been following this thread that closely but wouldn''t the > > right answer be for the original kernel to unmap the shared info on > > kexec? Or maybe remap it up to some high/reserved address? Can it read > > That would be the right answer I think, but I don''t see the a VCPU_deregister > call (only VCPU_register).Is the issue here vcpuinfo or the shared info (or both)?> But perhaps the XENMEM_decrease_reservation for the particular MFN is the > answer to do a VCPU "de-register" ? > > > the original address used by hvmloader at start of day and reuse that? > > Wait, we can map multiple shared_info? Ooh, somehow I thought the guest > could only do one registration.I think you can only have one mapping but you can move it by registering it again. Ian.
Olaf Hering
2012-Jul-10 15:37 UTC
Re: [Xen-devel] incorrect layout of globals from head_64.S during kexec boot
On Tue, Jul 10, Ian Campbell wrote:> On Tue, 2012-07-10 at 10:51 -0400, Konrad Rzeszutek Wilk wrote: > > On Tue, Jul 10, 2012 at 08:46:34AM -0600, Ian Campbell wrote: > > > On Tue, 2012-07-10 at 10:14 -0400, Konrad Rzeszutek Wilk wrote: > > > > Which brings me to another question - say we do use this patch, what > > > > if the decompressor overwrites the old kernels .data section. Won''t > > > > we run into this problem again? > > > > > > I''ve not really been following this thread that closely but wouldn''t the > > > right answer be for the original kernel to unmap the shared info on > > > kexec? Or maybe remap it up to some high/reserved address? Can it read > > > > That would be the right answer I think, but I don''t see the a VCPU_deregister > > call (only VCPU_register). > > Is the issue here vcpuinfo or the shared info (or both)?shared info is the issue in PVonHVM. Olaf
Konrad Rzeszutek Wilk
2012-Jul-10 17:26 UTC
Re: [Xen-devel] incorrect layout of globals from head_64.S during kexec boot
On Tue, Jul 10, 2012 at 05:23:08PM +0200, Olaf Hering wrote:> On Tue, Jul 10, Konrad Rzeszutek Wilk wrote: > > > On Tue, Jul 10, 2012 at 11:33:27AM +0200, Olaf Hering wrote: > > > On Fri, Jul 06, Olaf Hering wrote: > > > > > > > On Fri, Jul 06, Jan Beulich wrote: > > > > > > > > > > Could it be that some code tweaks the stack content used by decompress() > > > > > > in some odd way? But that would most likely lead to a crash, not to > > > > > > unexpected uncompressing results. > > > > > > > > > > Especially if the old and new kernel are using the exact same > > > > > image, how about the decompression writing over the shared > > > > > info page causing all this? As the decompressor wouldn''t > > > > > expect Xen to possibly write stuff there itself, it could easily be > > > > > that some repeat count gets altered, thus breaking the > > > > > decompressed data without the decompression code necessarily > > > > > noticing. > > > > > > > > In my case the gfn of the shared info page is 1f54. > > > > Is it possible to move the page at runtime? It looks like it can be done > > > > because the hvm loader configures fffff initially. > > > > > > > > Perhaps the PVonHVM code has to give up the shared pages or move them to > > > > some dedicated unused page during the process of booting into the new > > > > kernel. > > > > > > The pfn 1f54 of the shared info page is in the middle of the new bzImage: > > > pfn 32080 KiB <= ok, so the old .bss > > > > > _head 29360 KiB > > > _text 33826 KiB > > > _end 33924 KiB > > > > So _head is at 1CAC and _text starts at 2108h? > > Ugh, and 1F54 gets overriden. And with your patch, the data gets > > stuck in between _text and _end? No wait, where would the shared_info > > be located in the old kernel? Somewhere below the 1CACh? > > The 3 symbols above are from bzImage, which contains the gzipped vmlinux > and some code to decompress and actually start the uncompressed vmlinux. > > > I presume 1F54 is the _brk_end for the old kernel as well? > > Its in the .brk section of the old kernel. > > > Could you tell me how the decompress code works? Is the new kernel > > put at PFN 1000h and the decompressor code is put below it? > > I''m not too familiar with the details of the events that happen during > kexec -e. This is what happens from my understanding: > kexec -l loads the new kernel and some helper code into some allocated > memory. kexec -e starts the helper code which relocates the new bzImage > to its new location (_head) and starts it. The new bzImage uncompresses > the vmlinux to its final location and finally starts the new vmlinux. > > > Now that I think about it, during the relocation of the new bzImage the > part which contains the compressed vmlinux will already be corrupted > because the shared info page will be modified by Xen (I dont know in > what intervals the page gets modified). > > > And the decompressor code uses the .bss section of the "new" kernel > > to do its deed - since it assumes that the carcass of the old kernel > > is truly dead and it is not raising a hand saying: "I am not dead yet!". > > The decompressor uses its own .bss. > > > Which brings me to another question - say we do use this patch, what > > if the decompressor overwrites the old kernels .data section. Won''t > > we run into this problem again? > > The old kernel is dead at this point. If both kernels have the same > memory layout then the decompressor will clear the page. If they have a > different layout the .data section (or whatever happens to be there) of > the new kernel will be corrupted. > > > And what about the new kernel? It will try to register at a new > > MFN location the VCPU structure. Is that something that the hypervisor > > is OK with? Does that work with more than VCPU? Or is is stuck with > > just one VCPU (b/c it couldn''t actually do the hypercall?). > > So far I havent seen an issue because my guest uses a single cpu. > > > Or is the registration OK, since the new kernel has the same layout > > so it registers at the same MFN as the "dead" kernel and it works > > peachy? What if the kernel version used in the kexec is different > > from the old one (say it has less built in things)? That would mean > > the .text and .data section are different than the "dead" kernel? > > Yes, the layout would differ. During decompression corruption may > occour. > > > > In the patch below the pfn is moved from the bss to the data section. As > > > a result the new location is now at 28680 KiB, which is outside of the > > > bzImage. > > > > > > Maybe there is a way to use another dedicated page as shared info page. > > > > That would do it, but it has the negative consequence that we end up > > consuming an extra PAGE_SIZE that on baremetal kernels won''t be used. > > I was not thinking of statically allocated pages but some new concept of > allocating such shared pages. Shouldnt there be some dedicated area in > the E820 table which has to be used during the whole life time of the > guest?Not that I can see. But I don''t see why that could not be added? Perhaps the HVM loader can make it happen? But then how would it tell the kernel that this E820_RESERVED is the shared_info one. Not the other ones..> Are there more shared areas or is it just the shared info page? > > > And I am kind of worried that moving it to the .data section won''t > > be completly safe - as the decompressor might blow away that part too. > > The decompressor may just clear the area, but since there is no way to > tell where the shared pages are its always a risk to allocate them at > compile time.Yeah, and with the hypervisor potentially still updating the "old" MFN before the new kernel has registered the new MFN, we can end up corrupting the new kernel. Ouch. Would all of these issues disappear if the hypervisor had a hypercall that would stop updating the shared info? or just deregister the MFN? What if you ripped the GMFN out using ''decrease_reservation'' hypercall? Would that eliminate the pesky GMFN?
Olaf Hering
2012-Jul-10 18:09 UTC
Re: [Xen-devel] incorrect layout of globals from head_64.S during kexec boot
On Tue, Jul 10, Konrad Rzeszutek Wilk wrote:> On Tue, Jul 10, 2012 at 05:23:08PM +0200, Olaf Hering wrote: > > I was not thinking of statically allocated pages but some new concept of > > allocating such shared pages. Shouldnt there be some dedicated area in > > the E820 table which has to be used during the whole life time of the > > guest? > > Not that I can see. But I don''t see why that could not be added? Perhaps > the HVM loader can make it happen? But then how would it tell the kernel > that this E820_RESERVED is the shared_info one. Not the other ones..Maybe just use a new E820 type for this sort of thing? Its just the question wether some other OS can cope with an unknown type. From my reading of the e820 related code a region with an unknown type is just ignored.> > Are there more shared areas or is it just the shared info page? > > > > > And I am kind of worried that moving it to the .data section won''t > > > be completly safe - as the decompressor might blow away that part too. > > > > The decompressor may just clear the area, but since there is no way to > > tell where the shared pages are its always a risk to allocate them at > > compile time. > > Yeah, and with the hypervisor potentially still updating the "old" > MFN before the new kernel has registered the new MFN, we can end up > corrupting the new kernel. Ouch. > > Would all of these issues disappear if the hypervisor had a hypercall > that would stop updating the shared info? or just deregister the MFN? > What if you ripped the GMFN out using ''decrease_reservation'' hypercall? > Would that eliminate the pesky GMFN?I''m not sure, most likely the gfn will just disappear from the guest, like a ballooned page disappears. Accessing it will likely cause a crash. Olaf
Konrad Rzeszutek Wilk
2012-Jul-10 18:32 UTC
Re: [Xen-devel] incorrect layout of globals from head_64.S during kexec boot
On Tue, Jul 10, 2012 at 08:09:53PM +0200, Olaf Hering wrote:> On Tue, Jul 10, Konrad Rzeszutek Wilk wrote: > > > On Tue, Jul 10, 2012 at 05:23:08PM +0200, Olaf Hering wrote: > > > I was not thinking of statically allocated pages but some new concept of > > > allocating such shared pages. Shouldnt there be some dedicated area in > > > the E820 table which has to be used during the whole life time of the > > > guest? > > > > Not that I can see. But I don''t see why that could not be added? Perhaps > > the HVM loader can make it happen? But then how would it tell the kernel > > that this E820_RESERVED is the shared_info one. Not the other ones.. > > Maybe just use a new E820 type for this sort of thing? Its just theEwww.> question wether some other OS can cope with an unknown type. From my > reading of the e820 related code a region with an unknown type is just > ignored.Sure. And we could scan it.. but scanning E820_UNKNOWN for some magic header seems .. hacky.> > > > Are there more shared areas or is it just the shared info page? > > > > > > > And I am kind of worried that moving it to the .data section won''t > > > > be completly safe - as the decompressor might blow away that part too. > > > > > > The decompressor may just clear the area, but since there is no way to > > > tell where the shared pages are its always a risk to allocate them at > > > compile time. > > > > Yeah, and with the hypervisor potentially still updating the "old" > > MFN before the new kernel has registered the new MFN, we can end up > > corrupting the new kernel. Ouch. > > > > Would all of these issues disappear if the hypervisor had a hypercall > > that would stop updating the shared info? or just deregister the MFN? > > What if you ripped the GMFN out using ''decrease_reservation'' hypercall? > > Would that eliminate the pesky GMFN? > > I''m not sure, most likely the gfn will just disappear from the guest, > like a ballooned page disappears. Accessing it will likely cause a > crash.What about an populate_physmap right afterwards to stick a newly minted GMFN in its place? I don''t really know whether this dance of balloon out/balloon in the same GMFN will break the shared_info relationship. Perhaps not? What we are going for is to stop the hypervisor from using the shared_info MFN... perhaps there are other ways to do this?
Keir Fraser
2012-Jul-10 19:08 UTC
Re: [Xen-devel] incorrect layout of globals from head_64.S during kexec boot
On 10/07/2012 19:09, "Olaf Hering" <olaf@aepfle.de> wrote:>>> Are there more shared areas or is it just the shared info page? >>> >>>> And I am kind of worried that moving it to the .data section won''t >>>> be completly safe - as the decompressor might blow away that part too. >>> >>> The decompressor may just clear the area, but since there is no way to >>> tell where the shared pages are its always a risk to allocate them at >>> compile time. >> >> Yeah, and with the hypervisor potentially still updating the "old" >> MFN before the new kernel has registered the new MFN, we can end up >> corrupting the new kernel. Ouch. >> >> Would all of these issues disappear if the hypervisor had a hypercall >> that would stop updating the shared info? or just deregister the MFN? >> What if you ripped the GMFN out using ''decrease_reservation'' hypercall? >> Would that eliminate the pesky GMFN? > > I''m not sure, most likely the gfn will just disappear from the guest, > like a ballooned page disappears. Accessing it will likely cause a > crash.Best thing to do, is possible, is map the shared-info page in the xen-platform pci device''s BAR memory range. Then it will not conflict with any RAM. If you do map it over the top of an existing RAM page, you will have to repopulate that RAM page before kexec, using populate_physmap hypercall. The good news is that the populate_physmap hypercall will have the side effect of unmapping the shared-info page, reayd to be mapped wherever the new kernel would like it to reside :) Hope this clears up some of the confusion. ;) -- Keir
Olaf Hering
2012-Jul-13 20:20 UTC
Re: [Xen-devel] incorrect layout of globals from head_64.S during kexec boot
On Tue, Jul 10, Keir Fraser wrote:> On 10/07/2012 19:09, "Olaf Hering" <olaf@aepfle.de> wrote: > > I''m not sure, most likely the gfn will just disappear from the guest, > > like a ballooned page disappears. Accessing it will likely cause a > > crash. > > Best thing to do, is possible, is map the shared-info page in the > xen-platform pci device''s BAR memory range. Then it will not conflict with > any RAM. > > If you do map it over the top of an existing RAM page, you will have to > repopulate that RAM page before kexec, using populate_physmap hypercall. The > good news is that the populate_physmap hypercall will have the side effect > of unmapping the shared-info page, reayd to be mapped wherever the new > kernel would like it to reside :)Keir, is this a safe thing to do in a SMP guest? If arch/x86/xen/enlighten.c:xen_hvm_init_shared_info() allocates a page (backed by mfn M and pfn A) and assigns *HYPERVISOR_shared_info and *xen_vcpu then everything will reference these pointers. If drivers/xen/platform-pci.c:platform_pci_init would also do a XENMAPSPACE_shared_info call with pfn B, isnt there a small window where pfn A is not backed by a mfn because mfn M is now connected to pfn C? As a result other code paths which access *HYPERVISOR_shared_info and *xen_vcpu between the hypercall and the update of the pointers will read 0xff. If I read the hypercall code of XENMEM_add_to_physmap correctly the mfn backing *HYPERVISOR_shared_info will remain the same, so there is no need to copy data from the old to the new *HYPERVISOR_shared_info. What do you think, is that race real? Olaf
Keir Fraser
2012-Jul-14 04:54 UTC
Re: [Xen-devel] incorrect layout of globals from head_64.S during kexec boot
On 13/07/2012 21:20, "Olaf Hering" <olaf-QOLJcTWqO2uzQB+pC5nmwQ@public.gmane.org> wrote:> On Tue, Jul 10, Keir Fraser wrote: > >> On 10/07/2012 19:09, "Olaf Hering" <olaf-QOLJcTWqO2uzQB+pC5nmwQ@public.gmane.org> wrote: >>> I''m not sure, most likely the gfn will just disappear from the guest, >>> like a ballooned page disappears. Accessing it will likely cause a >>> crash. >> >> Best thing to do, is possible, is map the shared-info page in the >> xen-platform pci device''s BAR memory range. Then it will not conflict with >> any RAM. >> >> If you do map it over the top of an existing RAM page, you will have to >> repopulate that RAM page before kexec, using populate_physmap hypercall. The >> good news is that the populate_physmap hypercall will have the side effect >> of unmapping the shared-info page, reayd to be mapped wherever the new >> kernel would like it to reside :) > > Keir, > > is this a safe thing to do in a SMP guest? > If arch/x86/xen/enlighten.c:xen_hvm_init_shared_info() allocates a page > (backed by mfn M and pfn A) and assigns *HYPERVISOR_shared_info and > *xen_vcpu then everything will reference these pointers.So pfn A now points at shared_info, and mfn M is lost (freed back to Xen). Xen_vcpu doesn''t come into it, you''d have that mapped at yet another pfn.> If drivers/xen/platform-pci.c:platform_pci_init would also do a > XENMAPSPACE_shared_info call with pfn B, isnt there a small window where > pfn A is not backed by a mfn because mfn M is now connected to pfn C? As > a result other code paths which access *HYPERVISOR_shared_info and > *xen_vcpu between the hypercall and the update of the pointers will read > 0xff.Don''t really understand this. After the XENMAPSPACE_shared_info_call: * PFN B points at shared_info, mfn M_B it previously mapped is lost (freed back to Xen). * PFN A maps nothing, reads return all-1s. Yes, obviously you can''t atomically update the mapping of shinfo from A->B, ad update your pointer in the kernel at exactly the same time. Presumably you do this early during boot, or late during kexec, or otherwise at a time when other processors are not expected to touch shinfo.> > If I read the hypercall code of XENMEM_add_to_physmap correctly the mfn > backing *HYPERVISOR_shared_info will remain the same, so there is no need > to copy data from the old to the new *HYPERVISOR_shared_info.That is correct.> What do you think, is that race real?I suppose it is. I didn''t imagine it would be a troublesome one though. -- Keir> Olaf
Olaf Hering
2012-Jul-15 16:06 UTC
Re: [Xen-devel] incorrect layout of globals from head_64.S during kexec boot
On Tue, Jul 10, Keir Fraser wrote:> Best thing to do, is possible, is map the shared-info page in the > xen-platform pci device''s BAR memory range. Then it will not conflict with > any RAM.This patch does that. I did a kexec boot and a save/restore. It does not deal with the possible race were the old pfn is not backed by a mfn. Olaf commit a9d5567c67a2317c9db174a4deef6c5690220749 Author: Olaf Hering <olaf@aepfle.de> Date: Thu Jul 12 19:38:39 2012 +0200 xen PVonHVM: move shared info page from RAM to MMIO Signed-off-by: Olaf Hering <ohering@suse.de> diff --git a/arch/x86/xen/enlighten.c b/arch/x86/xen/enlighten.c index ff962d4..8a743af 100644 --- a/arch/x86/xen/enlighten.c +++ b/arch/x86/xen/enlighten.c @@ -1465,23 +1465,22 @@ static int init_hvm_pv_info(int *major, int *minor) return 0; } -void __ref xen_hvm_init_shared_info(void) +static struct shared_info *hvm_shared_info; +static unsigned long hvm_shared_info_pfn; + +static void __ref set_shared_info(struct shared_info *sip, unsigned long pfn) { int cpu; struct xen_add_to_physmap xatp; - static struct shared_info *shared_info_page = 0; - if (!shared_info_page) - shared_info_page = (struct shared_info *) - extend_brk(PAGE_SIZE, PAGE_SIZE); xatp.domid = DOMID_SELF; xatp.idx = 0; xatp.space = XENMAPSPACE_shared_info; - xatp.gpfn = __pa(shared_info_page) >> PAGE_SHIFT; + xatp.gpfn = pfn; if (HYPERVISOR_memory_op(XENMEM_add_to_physmap, &xatp)) BUG(); - HYPERVISOR_shared_info = (struct shared_info *)shared_info_page; + HYPERVISOR_shared_info = sip; /* xen_vcpu is a pointer to the vcpu_info struct in the shared_info * page, we use it in the event channel upcall and in some pvclock @@ -1494,8 +1493,48 @@ void __ref xen_hvm_init_shared_info(void) for_each_online_cpu(cpu) { per_cpu(xen_vcpu, cpu) = &HYPERVISOR_shared_info->vcpu_info[cpu]; } + mb(); } +/* Reconnect the pfn to a mfn */ +void __ref xen_hvm_resume_shared_info(void) +{ + set_shared_info(hvm_shared_info, hvm_shared_info_pfn); +} + +/* Move the pfn in RAM to a pfn in MMIO space */ +void __ref xen_hvm_finalize_shared_info(struct shared_info *sip, unsigned long pfn) +{ + struct xen_memory_reservation reservation = { + .domid = DOMID_SELF, + .nr_extents = 1, + }; + int rc; + + set_xen_guest_handle(reservation.extent_start, &hvm_shared_info_pfn); + + /* Move pfn, disconnects previous pfn from mfn */ + set_shared_info(sip, pfn); + + /* Allocate new mfn for previous pfn */ + rc = HYPERVISOR_memory_op(XENMEM_populate_physmap, &reservation); + WARN_ON(rc != 1); + + /* Remember final pfn and pointer for resume */ + hvm_shared_info_pfn = pfn; + hvm_shared_info = sip; +} + +/* Use a pfn in RAM until PCI init is done. */ +static void __ref xen_hvm_initial_shared_info(void) +{ + /* FIXME simply allocate a page and release it after pfn move (How at this stage?) */ + hvm_shared_info = extend_brk(PAGE_SIZE, PAGE_SIZE); + hvm_shared_info_pfn = __pa(hvm_shared_info) >> PAGE_SHIFT; + set_shared_info(hvm_shared_info, hvm_shared_info_pfn); +} + + #ifdef CONFIG_XEN_PVHVM static int __cpuinit xen_hvm_cpu_notify(struct notifier_block *self, unsigned long action, void *hcpu) @@ -1526,7 +1565,7 @@ static void __init xen_hvm_guest_init(void) if (r < 0) return; - xen_hvm_init_shared_info(); + xen_hvm_initial_shared_info(); if (xen_feature(XENFEAT_hvm_callback_vector)) xen_have_vector_callback = 1; diff --git a/arch/x86/xen/suspend.c b/arch/x86/xen/suspend.c index 45329c8..ae8a00c 100644 --- a/arch/x86/xen/suspend.c +++ b/arch/x86/xen/suspend.c @@ -30,7 +30,7 @@ void xen_arch_hvm_post_suspend(int suspend_cancelled) { #ifdef CONFIG_XEN_PVHVM int cpu; - xen_hvm_init_shared_info(); + xen_hvm_resume_shared_info(); xen_callback_vector(); xen_unplug_emulated_devices(); if (xen_feature(XENFEAT_hvm_safe_pvclock)) { diff --git a/arch/x86/xen/xen-ops.h b/arch/x86/xen/xen-ops.h index 202d4c1..1e4329e 100644 --- a/arch/x86/xen/xen-ops.h +++ b/arch/x86/xen/xen-ops.h @@ -41,7 +41,7 @@ void xen_enable_syscall(void); void xen_vcpu_restore(void); void xen_callback_vector(void); -void xen_hvm_init_shared_info(void); +void xen_hvm_resume_shared_info(void); void xen_unplug_emulated_devices(void); void __init xen_build_dynamic_phys_to_machine(void); diff --git a/drivers/xen/platform-pci.c b/drivers/xen/platform-pci.c index 97ca359..cbe866b 100644 --- a/drivers/xen/platform-pci.c +++ b/drivers/xen/platform-pci.c @@ -108,6 +108,8 @@ static int __devinit platform_pci_init(struct pci_dev *pdev, long ioaddr; long mmio_addr, mmio_len; unsigned int max_nr_gframes; + unsigned long addr; + struct shared_info *hvm_shared_info; if (!xen_domain()) return -ENODEV; @@ -138,6 +140,11 @@ static int __devinit platform_pci_init(struct pci_dev *pdev, platform_mmio = mmio_addr; platform_mmiolen = mmio_len; + addr = alloc_xen_mmio(PAGE_SIZE); + hvm_shared_info = ioremap(addr, PAGE_SIZE); + memset(hvm_shared_info, 0, PAGE_SIZE); + xen_hvm_finalize_shared_info(hvm_shared_info, addr >> PAGE_SHIFT); + if (!xen_have_vector_callback) { ret = xen_allocate_irq(pdev); if (ret) { diff --git a/include/xen/events.h b/include/xen/events.h index 04399b2..3337698 100644 --- a/include/xen/events.h +++ b/include/xen/events.h @@ -58,6 +58,8 @@ void notify_remote_via_irq(int irq); void xen_irq_resume(void); +void xen_hvm_finalize_shared_info(struct shared_info *sip, unsigned long pfn); + /* Clear an irq''s pending state, in preparation for polling on it */ void xen_clear_irq_pending(int irq); void xen_set_irq_pending(int irq);
Keir Fraser
2012-Jul-15 17:17 UTC
Re: [Xen-devel] incorrect layout of globals from head_64.S during kexec boot
On 15/07/2012 17:06, "Olaf Hering" <olaf@aepfle.de> wrote:> On Tue, Jul 10, Keir Fraser wrote: > >> Best thing to do, is possible, is map the shared-info page in the >> xen-platform pci device''s BAR memory range. Then it will not conflict with >> any RAM. > > This patch does that. I did a kexec boot and a save/restore. > It does not deal with the possible race were the old pfn is not backed > by a mfn.Looks good to me. -- Keir> Olaf > > > commit a9d5567c67a2317c9db174a4deef6c5690220749 > Author: Olaf Hering <olaf@aepfle.de> > Date: Thu Jul 12 19:38:39 2012 +0200 > > xen PVonHVM: move shared info page from RAM to MMIO > > Signed-off-by: Olaf Hering <ohering@suse.de> > > diff --git a/arch/x86/xen/enlighten.c b/arch/x86/xen/enlighten.c > index ff962d4..8a743af 100644 > --- a/arch/x86/xen/enlighten.c > +++ b/arch/x86/xen/enlighten.c > @@ -1465,23 +1465,22 @@ static int init_hvm_pv_info(int *major, int *minor) > return 0; > } > > -void __ref xen_hvm_init_shared_info(void) > +static struct shared_info *hvm_shared_info; > +static unsigned long hvm_shared_info_pfn; > + > +static void __ref set_shared_info(struct shared_info *sip, unsigned long pfn) > { > int cpu; > struct xen_add_to_physmap xatp; > - static struct shared_info *shared_info_page = 0; > > - if (!shared_info_page) > - shared_info_page = (struct shared_info *) > - extend_brk(PAGE_SIZE, PAGE_SIZE); > xatp.domid = DOMID_SELF; > xatp.idx = 0; > xatp.space = XENMAPSPACE_shared_info; > - xatp.gpfn = __pa(shared_info_page) >> PAGE_SHIFT; > + xatp.gpfn = pfn; > if (HYPERVISOR_memory_op(XENMEM_add_to_physmap, &xatp)) > BUG(); > > - HYPERVISOR_shared_info = (struct shared_info *)shared_info_page; > + HYPERVISOR_shared_info = sip; > > /* xen_vcpu is a pointer to the vcpu_info struct in the shared_info > * page, we use it in the event channel upcall and in some pvclock > @@ -1494,8 +1493,48 @@ void __ref xen_hvm_init_shared_info(void) > for_each_online_cpu(cpu) { > per_cpu(xen_vcpu, cpu) = &HYPERVISOR_shared_info->vcpu_info[cpu]; > } > + mb(); > } > > +/* Reconnect the pfn to a mfn */ > +void __ref xen_hvm_resume_shared_info(void) > +{ > + set_shared_info(hvm_shared_info, hvm_shared_info_pfn); > +} > + > +/* Move the pfn in RAM to a pfn in MMIO space */ > +void __ref xen_hvm_finalize_shared_info(struct shared_info *sip, unsigned > long pfn) > +{ > + struct xen_memory_reservation reservation = { > + .domid = DOMID_SELF, > + .nr_extents = 1, > + }; > + int rc; > + > + set_xen_guest_handle(reservation.extent_start, &hvm_shared_info_pfn); > + > + /* Move pfn, disconnects previous pfn from mfn */ > + set_shared_info(sip, pfn); > + > + /* Allocate new mfn for previous pfn */ > + rc = HYPERVISOR_memory_op(XENMEM_populate_physmap, &reservation); > + WARN_ON(rc != 1); > + > + /* Remember final pfn and pointer for resume */ > + hvm_shared_info_pfn = pfn; > + hvm_shared_info = sip; > +} > + > +/* Use a pfn in RAM until PCI init is done. */ > +static void __ref xen_hvm_initial_shared_info(void) > +{ > + /* FIXME simply allocate a page and release it after pfn move (How at this > stage?) */ > + hvm_shared_info = extend_brk(PAGE_SIZE, PAGE_SIZE); > + hvm_shared_info_pfn = __pa(hvm_shared_info) >> PAGE_SHIFT; > + set_shared_info(hvm_shared_info, hvm_shared_info_pfn); > +} > + > + > #ifdef CONFIG_XEN_PVHVM > static int __cpuinit xen_hvm_cpu_notify(struct notifier_block *self, > unsigned long action, void *hcpu) > @@ -1526,7 +1565,7 @@ static void __init xen_hvm_guest_init(void) > if (r < 0) > return; > > - xen_hvm_init_shared_info(); > + xen_hvm_initial_shared_info(); > > if (xen_feature(XENFEAT_hvm_callback_vector)) > xen_have_vector_callback = 1; > diff --git a/arch/x86/xen/suspend.c b/arch/x86/xen/suspend.c > index 45329c8..ae8a00c 100644 > --- a/arch/x86/xen/suspend.c > +++ b/arch/x86/xen/suspend.c > @@ -30,7 +30,7 @@ void xen_arch_hvm_post_suspend(int suspend_cancelled) > { > #ifdef CONFIG_XEN_PVHVM > int cpu; > - xen_hvm_init_shared_info(); > + xen_hvm_resume_shared_info(); > xen_callback_vector(); > xen_unplug_emulated_devices(); > if (xen_feature(XENFEAT_hvm_safe_pvclock)) { > diff --git a/arch/x86/xen/xen-ops.h b/arch/x86/xen/xen-ops.h > index 202d4c1..1e4329e 100644 > --- a/arch/x86/xen/xen-ops.h > +++ b/arch/x86/xen/xen-ops.h > @@ -41,7 +41,7 @@ void xen_enable_syscall(void); > void xen_vcpu_restore(void); > > void xen_callback_vector(void); > -void xen_hvm_init_shared_info(void); > +void xen_hvm_resume_shared_info(void); > void xen_unplug_emulated_devices(void); > > void __init xen_build_dynamic_phys_to_machine(void); > diff --git a/drivers/xen/platform-pci.c b/drivers/xen/platform-pci.c > index 97ca359..cbe866b 100644 > --- a/drivers/xen/platform-pci.c > +++ b/drivers/xen/platform-pci.c > @@ -108,6 +108,8 @@ static int __devinit platform_pci_init(struct pci_dev > *pdev, > long ioaddr; > long mmio_addr, mmio_len; > unsigned int max_nr_gframes; > + unsigned long addr; > + struct shared_info *hvm_shared_info; > > if (!xen_domain()) > return -ENODEV; > @@ -138,6 +140,11 @@ static int __devinit platform_pci_init(struct pci_dev > *pdev, > platform_mmio = mmio_addr; > platform_mmiolen = mmio_len; > > + addr = alloc_xen_mmio(PAGE_SIZE); > + hvm_shared_info = ioremap(addr, PAGE_SIZE); > + memset(hvm_shared_info, 0, PAGE_SIZE); > + xen_hvm_finalize_shared_info(hvm_shared_info, addr >> PAGE_SHIFT); > + > if (!xen_have_vector_callback) { > ret = xen_allocate_irq(pdev); > if (ret) { > diff --git a/include/xen/events.h b/include/xen/events.h > index 04399b2..3337698 100644 > --- a/include/xen/events.h > +++ b/include/xen/events.h > @@ -58,6 +58,8 @@ void notify_remote_via_irq(int irq); > > void xen_irq_resume(void); > > +void xen_hvm_finalize_shared_info(struct shared_info *sip, unsigned long > pfn); > + > /* Clear an irq''s pending state, in preparation for polling on it */ > void xen_clear_irq_pending(int irq); > void xen_set_irq_pending(int irq);
Konrad Rzeszutek Wilk
2012-Jul-16 15:46 UTC
Re: [Xen-devel] incorrect layout of globals from head_64.S during kexec boot
On Sun, Jul 15, 2012 at 06:06:53PM +0200, Olaf Hering wrote:> On Tue, Jul 10, Keir Fraser wrote: > > > Best thing to do, is possible, is map the shared-info page in the > > xen-platform pci device''s BAR memory range. Then it will not conflict with > > any RAM. > > This patch does that. I did a kexec boot and a save/restore. > It does not deal with the possible race were the old pfn is not backed > by a mfn. > > Olaf > > > commit a9d5567c67a2317c9db174a4deef6c5690220749 > Author: Olaf Hering <olaf@aepfle.de> > Date: Thu Jul 12 19:38:39 2012 +0200 > > xen PVonHVM: move shared info page from RAM to MMIO > > Signed-off-by: Olaf Hering <ohering@suse.de> > > diff --git a/arch/x86/xen/enlighten.c b/arch/x86/xen/enlighten.c > index ff962d4..8a743af 100644 > --- a/arch/x86/xen/enlighten.c > +++ b/arch/x86/xen/enlighten.c > @@ -1465,23 +1465,22 @@ static int init_hvm_pv_info(int *major, int *minor) > return 0; > } > > -void __ref xen_hvm_init_shared_info(void) > +static struct shared_info *hvm_shared_info; > +static unsigned long hvm_shared_info_pfn; > +Please include a big comment explainig what this machination is OK to do multiple times. If you can include the juicy snippets of the email converstation in this comment so that in 6 months if somebody looks at this they have a good understanding of it.> +static void __ref set_shared_info(struct shared_info *sip, unsigned long pfn) > { > int cpu; > struct xen_add_to_physmap xatp; > - static struct shared_info *shared_info_page = 0; > > - if (!shared_info_page) > - shared_info_page = (struct shared_info *) > - extend_brk(PAGE_SIZE, PAGE_SIZE); > xatp.domid = DOMID_SELF; > xatp.idx = 0; > xatp.space = XENMAPSPACE_shared_info; > - xatp.gpfn = __pa(shared_info_page) >> PAGE_SHIFT; > + xatp.gpfn = pfn; > if (HYPERVISOR_memory_op(XENMEM_add_to_physmap, &xatp)) > BUG(); > > - HYPERVISOR_shared_info = (struct shared_info *)shared_info_page; > + HYPERVISOR_shared_info = sip; > > /* xen_vcpu is a pointer to the vcpu_info struct in the shared_info > * page, we use it in the event channel upcall and in some pvclock > @@ -1494,8 +1493,48 @@ void __ref xen_hvm_init_shared_info(void) > for_each_online_cpu(cpu) { > per_cpu(xen_vcpu, cpu) = &HYPERVISOR_shared_info->vcpu_info[cpu]; > } > + mb(); > } > > +/* Reconnect the pfn to a mfn */ > +void __ref xen_hvm_resume_shared_info(void) > +{ > + set_shared_info(hvm_shared_info, hvm_shared_info_pfn); > +} > + > +/* Move the pfn in RAM to a pfn in MMIO space */ > +void __ref xen_hvm_finalize_shared_info(struct shared_info *sip, unsigned long pfn) > +{ > + struct xen_memory_reservation reservation = { > + .domid = DOMID_SELF, > + .nr_extents = 1, > + }; > + int rc; > + > + set_xen_guest_handle(reservation.extent_start, &hvm_shared_info_pfn); > + > + /* Move pfn, disconnects previous pfn from mfn */ > + set_shared_info(sip, pfn); > + > + /* Allocate new mfn for previous pfn */ > + rc = HYPERVISOR_memory_op(XENMEM_populate_physmap, &reservation); > + WARN_ON(rc != 1);Shouldn''t you do some recovery if it fails? Like reassign the old RAM pfn back to the shared page?> + > + /* Remember final pfn and pointer for resume */ > + hvm_shared_info_pfn = pfn; > + hvm_shared_info = sip; > +} > + > +/* Use a pfn in RAM until PCI init is done. */ > +static void __ref xen_hvm_initial_shared_info(void) > +{ > + /* FIXME simply allocate a page and release it after pfn move (How at this stage?) */You could just have an page on the .bss that has __init_data. During bootup it would be recycled - and since the platform PCI driver is always built in, it would not be a problem I think? Thought if somebody does not compile with CONFIG_XEN_PVHVM then it would make sense to keep the existing allocation from the .brk.> + hvm_shared_info = extend_brk(PAGE_SIZE, PAGE_SIZE); > + hvm_shared_info_pfn = __pa(hvm_shared_info) >> PAGE_SHIFT; > + set_shared_info(hvm_shared_info, hvm_shared_info_pfn); > +} > + > + > #ifdef CONFIG_XEN_PVHVM > static int __cpuinit xen_hvm_cpu_notify(struct notifier_block *self, > unsigned long action, void *hcpu) > @@ -1526,7 +1565,7 @@ static void __init xen_hvm_guest_init(void) > if (r < 0) > return; > > - xen_hvm_init_shared_info();Put here a comment saying: /* Initaliazing a RAM PFN that is later going to be replaced with * a MMIO PFN. */> + xen_hvm_initial_shared_info(); > > if (xen_feature(XENFEAT_hvm_callback_vector)) > xen_have_vector_callback = 1; > diff --git a/arch/x86/xen/suspend.c b/arch/x86/xen/suspend.c > index 45329c8..ae8a00c 100644 > --- a/arch/x86/xen/suspend.c > +++ b/arch/x86/xen/suspend.c > @@ -30,7 +30,7 @@ void xen_arch_hvm_post_suspend(int suspend_cancelled) > { > #ifdef CONFIG_XEN_PVHVM > int cpu; > - xen_hvm_init_shared_info(); > + xen_hvm_resume_shared_info(); > xen_callback_vector(); > xen_unplug_emulated_devices(); > if (xen_feature(XENFEAT_hvm_safe_pvclock)) { > diff --git a/arch/x86/xen/xen-ops.h b/arch/x86/xen/xen-ops.h > index 202d4c1..1e4329e 100644 > --- a/arch/x86/xen/xen-ops.h > +++ b/arch/x86/xen/xen-ops.h > @@ -41,7 +41,7 @@ void xen_enable_syscall(void); > void xen_vcpu_restore(void); > > void xen_callback_vector(void); > -void xen_hvm_init_shared_info(void); > +void xen_hvm_resume_shared_info(void); > void xen_unplug_emulated_devices(void); > > void __init xen_build_dynamic_phys_to_machine(void); > diff --git a/drivers/xen/platform-pci.c b/drivers/xen/platform-pci.c > index 97ca359..cbe866b 100644 > --- a/drivers/xen/platform-pci.c > +++ b/drivers/xen/platform-pci.c > @@ -108,6 +108,8 @@ static int __devinit platform_pci_init(struct pci_dev *pdev, > long ioaddr; > long mmio_addr, mmio_len; > unsigned int max_nr_gframes; > + unsigned long addr; > + struct shared_info *hvm_shared_info; > > if (!xen_domain()) > return -ENODEV; > @@ -138,6 +140,11 @@ static int __devinit platform_pci_init(struct pci_dev *pdev, > platform_mmio = mmio_addr; > platform_mmiolen = mmio_len; > > + addr = alloc_xen_mmio(PAGE_SIZE); > + hvm_shared_info = ioremap(addr, PAGE_SIZE); > + memset(hvm_shared_info, 0, PAGE_SIZE); > + xen_hvm_finalize_shared_info(hvm_shared_info, addr >> PAGE_SHIFT); > + > if (!xen_have_vector_callback) { > ret = xen_allocate_irq(pdev); > if (ret) { > diff --git a/include/xen/events.h b/include/xen/events.h > index 04399b2..3337698 100644 > --- a/include/xen/events.h > +++ b/include/xen/events.h > @@ -58,6 +58,8 @@ void notify_remote_via_irq(int irq); > > void xen_irq_resume(void); > > +void xen_hvm_finalize_shared_info(struct shared_info *sip, unsigned long pfn); > + > /* Clear an irq''s pending state, in preparation for polling on it */ > void xen_clear_irq_pending(int irq); > void xen_set_irq_pending(int irq); > > _______________________________________________ > Xen-devel mailing list > Xen-devel@lists.xen.org > http://lists.xen.org/xen-devel
Olaf Hering
2012-Jul-17 10:24 UTC
Re: [Xen-devel] incorrect layout of globals from head_64.S during kexec boot
On Mon, Jul 16, Konrad Rzeszutek Wilk wrote:> On Sun, Jul 15, 2012 at 06:06:53PM +0200, Olaf Hering wrote: > > -void __ref xen_hvm_init_shared_info(void) > > +static struct shared_info *hvm_shared_info; > > +static unsigned long hvm_shared_info_pfn; > > + > > Please include a big comment explainig what this machination is OK > to do multiple times. If you can include the juicy snippets of the > email converstation in this comment so that in 6 months if somebody > looks at this they have a good understanding of it.I have added a comment to the new patch, and I will extend it further in the commit message.> > + /* Move pfn, disconnects previous pfn from mfn */ > > + set_shared_info(sip, pfn); > > + > > + /* Allocate new mfn for previous pfn */ > > + rc = HYPERVISOR_memory_op(XENMEM_populate_physmap, &reservation); > > + WARN_ON(rc != 1); > > Shouldn''t you do some recovery if it fails? Like reassign the old RAM pfn > back to the shared page?I moved the whole reassignment into the shutdown path because there is appearently no sane way to make the move atomic. Shortly before reboot all vcpus are still online, but hopefully the other vcpus have no work left. See below, the change is on top of the two other patches I sent out today. Olaf xen PVonHVM: move shared_info to MMIO before kexec Signed-off-by: Olaf Hering <olaf@aepfle.de> --- arch/x86/xen/enlighten.c | 114 +++++++++++++++++++++++++++++++++++++++----- arch/x86/xen/suspend.c | 2 +- arch/x86/xen/xen-ops.h | 2 +- drivers/xen/platform-pci.c | 15 ++++++ include/xen/events.h | 2 + 5 Dateien geändert, 122 Zeilen hinzugefügt(+), 13 Zeilen entfernt(-) diff --git a/arch/x86/xen/enlighten.c b/arch/x86/xen/enlighten.c index 184fa9c..3cf233b 100644 --- a/arch/x86/xen/enlighten.c +++ b/arch/x86/xen/enlighten.c @@ -31,6 +31,7 @@ #include <linux/pci.h> #include <linux/gfp.h> #include <linux/memblock.h> +#include <linux/syscore_ops.h> #include <xen/xen.h> #include <xen/interface/xen.h> @@ -1439,38 +1440,126 @@ asmlinkage void __init xen_start_kernel(void) #endif } -void __ref xen_hvm_init_shared_info(void) +#ifdef CONFIG_XEN_PVHVM +/* + * The pfn containing the shared_info is located somewhere in RAM. This + * will cause trouble if the current kernel is doing a kexec boot into a + * new kernel. The new kernel (and its startup code) can not know where + * the pfn is, so it can not reserve the page. The hypervisor will + * continue to update the pfn, and as a result memory corruption occours + * in the new kernel. + * + * One way to work around this issue is to allocate a page in the + * xen-platform pci device''s BAR memory range. But pci init is done very + * late and the shared_info page is already in use very early to read + * the pvclock. So moving the pfn from RAM to MMIO is racy because some + * code paths on other vcpus could access the pfn during the small + * window when the old pfn is moved to the new pfn. There is even a + * small window were the old pfn is not backed by a mfn, and during that + * time all reads return -1. + * + * Because it is not known upfront where the MMIO region is located it + * can not be used right from the start in xen_hvm_init_shared_info. + * + * To minimise trouble the move of the pfn is done shortly before kexec. + * This does not eliminate the race because all vcpus are still online + * when the syscore_ops will be called. But hopefully there is no work + * pending at this point in time. Also the syscore_op is run last which + * reduces the risk further. + */ + +static struct shared_info *xen_hvm_shared_info; + +static void xen_hvm_connect_shared_info(unsigned long pfn) { - int cpu; struct xen_add_to_physmap xatp; - static struct shared_info *shared_info_page = 0; - if (!shared_info_page) - shared_info_page = (struct shared_info *) - extend_brk(PAGE_SIZE, PAGE_SIZE); xatp.domid = DOMID_SELF; xatp.idx = 0; xatp.space = XENMAPSPACE_shared_info; - xatp.gpfn = __pa(shared_info_page) >> PAGE_SHIFT; + xatp.gpfn = pfn; if (HYPERVISOR_memory_op(XENMEM_add_to_physmap, &xatp)) BUG(); - HYPERVISOR_shared_info = (struct shared_info *)shared_info_page; +} +static void xen_hvm_set_shared_info(struct shared_info *sip) +{ + int cpu; + + HYPERVISOR_shared_info = sip; /* xen_vcpu is a pointer to the vcpu_info struct in the shared_info * page, we use it in the event channel upcall and in some pvclock * related functions. We don''t need the vcpu_info placement * optimizations because we don''t use any pv_mmu or pv_irq op on * HVM. - * When xen_hvm_init_shared_info is run at boot time only vcpu 0 is - * online but xen_hvm_init_shared_info is run at resume time too and + * When xen_hvm_set_shared_info is run at boot time only vcpu 0 is + * online but xen_hvm_set_shared_info is run at resume time too and * in that case multiple vcpus might be online. */ for_each_online_cpu(cpu) { per_cpu(xen_vcpu, cpu) = &HYPERVISOR_shared_info->vcpu_info[cpu]; } } -#ifdef CONFIG_XEN_PVHVM +/* Reconnect the shared_info pfn to a mfn */ +void xen_hvm_resume_shared_info(void) +{ + xen_hvm_connect_shared_info(__pa(xen_hvm_shared_info) >> PAGE_SHIFT); +} + +#ifdef CONFIG_KEXEC +static struct shared_info *xen_hvm_shared_info_kexec; +static unsigned long xen_hvm_shared_info_pfn_kexec; + +/* Remember a pfn in MMIO space for kexec reboot */ +void __devinit xen_hvm_prepare_kexec(struct shared_info *sip, unsigned long pfn) +{ + xen_hvm_shared_info_kexec = sip; + xen_hvm_shared_info_pfn_kexec = pfn; +} + +static void xen_hvm_syscore_shutdown(void) +{ + struct xen_memory_reservation reservation = { + .domid = DOMID_SELF, + .nr_extents = 1, + }; + unsigned long prev_pfn; + int rc; + + if (!xen_hvm_shared_info_kexec) + return; + + prev_pfn = __pa(xen_hvm_shared_info) >> PAGE_SHIFT; + set_xen_guest_handle(reservation.extent_start, &prev_pfn); + + /* Move pfn to MMIO, disconnects previous pfn from mfn */ + xen_hvm_connect_shared_info(xen_hvm_shared_info_pfn_kexec); + + /* Update pointers, following hypercall is also a memory barrier */ + xen_hvm_set_shared_info(xen_hvm_shared_info_kexec); + + /* Allocate new mfn for previous pfn */ + rc = HYPERVISOR_memory_op(XENMEM_populate_physmap, &reservation); + + /* Make sure the previous pfn is really connected to a (new) mfn */ + BUG_ON(rc != 1); +} + +static struct syscore_ops xen_hvm_syscore_ops = { + .shutdown = xen_hvm_syscore_shutdown, +}; +#endif + +/* Use a pfn in RAM, may move to MMIO before kexec. */ +static void __init xen_hvm_init_shared_info(void) +{ + /* Remember pointer for resume */ + xen_hvm_shared_info = extend_brk(PAGE_SIZE, PAGE_SIZE); + xen_hvm_connect_shared_info(__pa(xen_hvm_shared_info) >> PAGE_SHIFT); + xen_hvm_set_shared_info(xen_hvm_shared_info); +} + static void __init init_hvm_pv_info(void) { int major, minor; @@ -1521,6 +1610,9 @@ static void __init xen_hvm_guest_init(void) init_hvm_pv_info(); xen_hvm_init_shared_info(); +#ifdef CONFIG_KEXEC + register_syscore_ops(&xen_hvm_syscore_ops); +#endif if (xen_feature(XENFEAT_hvm_callback_vector)) xen_have_vector_callback = 1; diff --git a/arch/x86/xen/suspend.c b/arch/x86/xen/suspend.c index 45329c8..ae8a00c 100644 --- a/arch/x86/xen/suspend.c +++ b/arch/x86/xen/suspend.c @@ -30,7 +30,7 @@ void xen_arch_hvm_post_suspend(int suspend_cancelled) { #ifdef CONFIG_XEN_PVHVM int cpu; - xen_hvm_init_shared_info(); + xen_hvm_resume_shared_info(); xen_callback_vector(); xen_unplug_emulated_devices(); if (xen_feature(XENFEAT_hvm_safe_pvclock)) { diff --git a/arch/x86/xen/xen-ops.h b/arch/x86/xen/xen-ops.h index 202d4c1..1e4329e 100644 --- a/arch/x86/xen/xen-ops.h +++ b/arch/x86/xen/xen-ops.h @@ -41,7 +41,7 @@ void xen_enable_syscall(void); void xen_vcpu_restore(void); void xen_callback_vector(void); -void xen_hvm_init_shared_info(void); +void xen_hvm_resume_shared_info(void); void xen_unplug_emulated_devices(void); void __init xen_build_dynamic_phys_to_machine(void); diff --git a/drivers/xen/platform-pci.c b/drivers/xen/platform-pci.c index 97ca359..d4c50d6 100644 --- a/drivers/xen/platform-pci.c +++ b/drivers/xen/platform-pci.c @@ -101,6 +101,19 @@ static int platform_pci_resume(struct pci_dev *pdev) return 0; } +static void __devinit prepare_shared_info(void) +{ +#ifdef CONFIG_KEXEC + unsigned long addr; + struct shared_info *hvm_shared_info; + + addr = alloc_xen_mmio(PAGE_SIZE); + hvm_shared_info = ioremap(addr, PAGE_SIZE); + memset(hvm_shared_info, 0, PAGE_SIZE); + xen_hvm_prepare_kexec(hvm_shared_info, addr >> PAGE_SHIFT); +#endif +} + static int __devinit platform_pci_init(struct pci_dev *pdev, const struct pci_device_id *ent) { @@ -138,6 +151,8 @@ static int __devinit platform_pci_init(struct pci_dev *pdev, platform_mmio = mmio_addr; platform_mmiolen = mmio_len; + prepare_shared_info(); + if (!xen_have_vector_callback) { ret = xen_allocate_irq(pdev); if (ret) { diff --git a/include/xen/events.h b/include/xen/events.h index 04399b2..9c641de 100644 --- a/include/xen/events.h +++ b/include/xen/events.h @@ -58,6 +58,8 @@ void notify_remote_via_irq(int irq); void xen_irq_resume(void); +void xen_hvm_prepare_kexec(struct shared_info *sip, unsigned long pfn); + /* Clear an irq''s pending state, in preparation for polling on it */ void xen_clear_irq_pending(int irq); void xen_set_irq_pending(int irq); -- 1.7.10.4
Olaf Hering
2012-Jul-17 12:34 UTC
Re: [Xen-devel] incorrect layout of globals from head_64.S during kexec boot
On Tue, Jul 17, Olaf Hering wrote: To make this robust against allocation errors I will change it to do {> + /* Allocate new mfn for previous pfn */ > + rc = HYPERVISOR_memory_op(XENMEM_populate_physmap, &reservation);} while (rc == 0);> + > + /* Make sure the previous pfn is really connected to a (new) mfn */ > + BUG_ON(rc != 1);Olaf