Thanks for your ideas, Sean!

> The bug is not likely to be corrupted data in the decompressed output
> (that is just calling into a gzip routine or something). You shouldn't
> have to dump/printf->trace from memory during boot to see that data since
> the "real" kernel binary that is being decompressed into that memory
> region is probably already somewhere in your build tree
> (arch/x86/boot/compressed/Makefile seems like it has the details;

Yeah, what I mean is that it does not just decompress it, it also applies relocations to the decompressed output:

    __decompress(input_data, input_len, NULL, NULL, output, output_len,
                 NULL, error);
    parse_elf(output);
    handle_relocations(output, output_len, virt_addr);

My idea of quick dumping was based on experience with one of the FreeBSD loaders I fixed earlier. It was very short (512 bytes, if I am not mistaken), and a quick look at the hex views of the LLD-linked and BFD-linked binaries revealed the reason instantly. The LLD-linked binary looked corrupted, with 4-byte holes; all the other code around them was identical. That showed that we simply did not apply one of the relocations (R_386_16, I think) at all. So the main idea of dumping was to check whether there is something obviously wrong. Maybe it was not the best idea :) I want to try it though.

> Grepping around, it seems like they build this list of relocations based
> on some sort of homegrown tooling in arch/x86/tools/. E.g. look at
> arch/x86/tools/relocs.c.

So I also suspect it is something related to relocations. My suspicion is also based on the fact that we had to implement the --emit-relocs option to support how they handle it. This is a new feature for LLD, so it may still have bugs (the last part was committed this Friday) and needs testing. Details about how they use --emit-relocs are in this comment: https://reviews.llvm.org/D28612#647277. In short, --emit-relocs generates .rel[a].xxx sections in the output; they extract these sections using objdump and wrap them into a separate binary with its own format, and then use this data to apply the relocations themselves.

George.
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20170219/3ce0626d/attachment-0001.html>
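The hex-view comparison described in this message can be sketched as a small script. This is only an illustration, assuming two raw dumps of the same image produced by the two linkers; the function names `diff_runs` and `report` and the file paths are hypothetical and not part of any existing tool.

```python
def diff_runs(a: bytes, b: bytes):
    """Return (offset, length) for each run of consecutive differing bytes.

    Only the common prefix of the two inputs is compared.
    """
    runs = []
    start = None
    n = min(len(a), len(b))
    for i in range(n):
        if a[i] != b[i]:
            if start is None:
                start = i          # a new differing run begins here
        elif start is not None:
            runs.append((start, i - start))
            start = None
    if start is not None:          # run reaches the end of the common prefix
        runs.append((start, n - start))
    return runs


def report(path_a: str, path_b: str) -> None:
    """Print every differing run between two dump files (hypothetical paths)."""
    with open(path_a, "rb") as fa, open(path_b, "rb") as fb:
        a, b = fa.read(), fb.read()
    for off, length in diff_runs(a, b):
        print(f"0x{off:06x}: {length} bytes differ: "
              f"{a[off:off + length].hex()} vs {b[off:off + length].hex()}")
    if len(a) != len(b):
        print(f"sizes differ: {len(a)} vs {len(b)} bytes")
```

A relocation that was never applied would tend to show up as short runs whose length matches the relocation width, for example `report("loader.bfd.bin", "loader.lld.bin")` printing a series of 4-byte runs.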
I finally found that bug :) I dumped the output for the kernel and observed that the LLD output starts with the ELF header, while the BFD one points to what seems to be code, which is expected since the execution flow jumps there. That means the decompressed kernel was not copied properly, and the issue is at one of the points you sent earlier: https://github.com/torvalds/linux/blob/5924bbecd0267d87c24110cbe2041b5075173a25/arch/x86/boot/compressed/misc.c#L300

Looking at the readelf output for vmlinux, the BFD LOAD segments have correct LMAs (the third address column):

    LOAD 0x0000000000200000 0xffffffff81000000 0x0000000001000000 0x0000000000241000 0x0000000000241000 R E 200000
    LOAD 0x0000000000600000 0xffffffff81400000 0x0000000001400000 0x0000000000041000 0x0000000000041000 RW 200000
    LOAD 0x0000000000641000 0xffffffff81441000 0x0000000001441000 0x0000000000070699 0x00000000000c4000 RWE 200000

But the LLD output is broken in that part; all LMAs are 0x0000000080000000 here:

    LOAD 0x0000000000001000 0xffffffff81000000 0x0000000080000000 0x0000000000242000 0x0000000000242000 R E 1000
    LOAD 0x0000000000243000 0xffffffff81400000 0x0000000080000000 0x0000000000041000 0x0000000000041000 RW 1000
    LOAD 0x0000000000284000 0xffffffff81441000 0x0000000080000000 0x0000000000071000 0x00000000000c4000 RWE 1000

That happened because the script sets the addresses in the following way:

    .text : AT(ADDR(.text) - 0xffffffff80000000) { ... }
    .data : AT(ADDR(.data) - 0xffffffff80000000) { .. }
    .init.begin : AT(ADDR(.init.begin) - 0xffffffff80000000) { ... }

We currently just evaluate ADDR(..) as zero in such cases, and because of that the result is always 0x0000000080000000. That is the reason for the kernel corruption. I prepared a patch that fixes it: https://reviews.llvm.org/D30163. With that in, I was able to get past that place and QEMU no longer reboots for me! Though it still hangs a bit later; at the moment it shows:

    CPU: AMD QEMU Virtual CPU version 2.5+ (family: 0x6, model: 0x6, stepping: 0x3)
    Performance Events: PMU not available due to virtualization, using software events only.
    ..TIMER: vector=0x30 apic1=0 pin1=2 apic2=-1 pin2=-1
    ..MP-BIOS bug: 8254 timer not connected to IO-APIC
    ...trying to set up timer (IRQ0) through the 8259A ...
    ..... (found apic 0 pin 2) ...
    ....... failed.
    ...trying to set up timer as Virtual Wire IRQ...
    ..... failed.
    ...trying to set up timer as ExtINT IRQ...
    ..... failed :(.
    Kernel panic - not syncing: IO-APIC + timer doesn't work! Boot with apic=debug and send a report. Then try booting with the 'noapic' option.
    ---[ end Kernel panic - not syncing: IO-APIC + timer doesn't work! Boot with apic=debug and send a report. Then try booting with the 'noapic' option.
    random: fast init done

But that is definitely a different, new issue; I am investigating it. (At least I am happy that there are no more silent QEMU reboots and that it shows some readable error messages.)

George.
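The LMA values in this message follow from plain 64-bit wrap-around arithmetic, which the sketch below reproduces. It is only an illustration: the name `OFFSET` is a hypothetical label for the constant in the AT(...) expressions, and the section addresses are assumed to equal the VirtAddr values of the three LOAD segments quoted in the message.

```python
MASK64 = (1 << 64) - 1

# Constant subtracted in the AT(...) expressions (the name is hypothetical).
OFFSET = 0xFFFFFFFF80000000


def lma(addr: int) -> int:
    """Evaluate `addr - OFFSET` the way a 64-bit linker expression would."""
    return (addr - OFFSET) & MASK64


# Assumption: each section starts at the VirtAddr of its LOAD segment.
sections = {
    ".text": 0xFFFFFFFF81000000,
    ".data": 0xFFFFFFFF81400000,
    ".init.begin": 0xFFFFFFFF81441000,
}

for name, vma in sections.items():
    # Correct ADDR() gives the BFD value; ADDR() == 0 gives the broken one.
    print(f"{name:12} ADDR ok: {lma(vma):#018x}   ADDR == 0: {lma(0):#018x}")
```

With a correct ADDR() the subtraction yields the BFD values (0x1000000, 0x1400000, 0x1441000). With ADDR() evaluated as zero, 0 - 0xffffffff80000000 wraps to 0x0000000080000000 for every section, which matches the broken LLD output.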
And I think the current issue with "Kernel panic - not syncing: IO-APIC + timer doesn't work!" is also clear. timer_irq_works(void) never returns 1: https://github.com/torvalds/linux/blob/d966564fcdc19e13eb6ba1fbe6b8101070339c3d/arch/x86/kernel/apic/io_apic.c#L1641

I think it happens because of jiffies (http://www.makelinux.net/books/lkd2/ch10lev1sec3#ch10fig01). It should have the same address as jiffies_64: https://github.com/torvalds/linux/blob/b66484cd74706fa8681d051840fe4b18a3da40ff/arch/x86/kernel/vmlinux.lds.S#L41

And that is true for the BFD-linked binary:

    10595: ffffffff8140b000 8 OBJECT GLOBAL DEFAULT 8 jiffies
    11730: ffffffff8140b000 8 OBJECT GLOBAL DEFAULT 8 jiffies_64

But something is wrong with them in the LLD case:

    6422: ffffffff8140b000 8 OBJECT GLOBAL DEFAULT 19 jiffies
    7416: ffffffff81400000 0 NOTYPE GLOBAL DEFAULT 19 jiffies_64

I think we probably handle symbols assigned outside SECTIONS declarations in scripts incorrectly. Looking at it.

George.