Karl Johnson
2017-Oct-24 19:09 UTC
[CentOS-virt] Crash in CentOS 7 kernel-3.10.0-514.16.1.el7.x86_64 in Xen PV mode
On Tue, Oct 24, 2017 at 3:36 AM, Akemi Yagi <amyagi at gmail.com> wrote:

> On Mon, Oct 23, 2017 at 11:08 PM, Akemi Yagi <amyagi at gmail.com> wrote:
>
>> On Mon, Oct 23, 2017 at 12:57 PM, Karl Johnson <karljohnson.it at gmail.com>
>> wrote:
>>
>>> On Sat, May 20, 2017 at 8:30 PM, Sarah Newman <srn at prgmr.com> wrote:
>>>
>>>> I experienced a bug that is likely the same as
>>>> https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1350373 . Commit
>>>> b7dd0e350e0bd4c0fddcc9b8958342700b00b168 , which is supposed to fix
>>>> it, doesn't appear in this kernel and doesn't apply cleanly either.
>>>> Is there any point in trying to backport the patch?
>>>>
>>> I had the same kernel panic while booting a PV domU on
>>> 3.10.0-693.2.2.el7.centos.plus.x86_64. I had to start the domU again to
>>> boot correctly. Can this patch be added to the CentOS 7 kernel-plus?
>>>
>>> Karl
>>>
>>
>> I can certainly add the patch (commit b7dd0e350e0bd4c0fddcc9b8958342700b00b168)
>> to the Plus kernel. It would be best if you could file a request on
>> http://bugs.centos.org so that we can track it better.
>>
>> Akemi
>>
>
> A CentOSPlus kernel set with the referenced patch applied is available
> for testing at:
>
> https://people.centos.org/toracat/kernel/7/plus/xen/
>
> Feedback appreciated,
>
> Akemi
>

Thanks for the build, Akemi. I will try to test this kernel in the next few days; however, it will be hard to know whether it fixes the kernel panic because I can't reproduce it. It seems to be random and pretty rare in my case.
Karl Johnson
2017-Oct-24 19:53 UTC
[CentOS-virt] Crash in CentOS 7 kernel-3.10.0-514.16.1.el7.x86_64 in Xen PV mode
On Tue, Oct 24, 2017 at 3:09 PM, Karl Johnson <karljohnson.it at gmail.com> wrote:

> Thanks for the build, Akemi. I will try to test this kernel in the next
> few days; however, it will be hard to know whether it fixes the kernel
> panic because I can't reproduce it. It seems to be random and pretty
> rare in my case.
>

The test kernel doesn't boot on my side:

[ 0.000000] Initializing cgroup subsys cpuset
[ 0.000000] Initializing cgroup subsys cpu
[ 0.000000] Initializing cgroup subsys cpuacct
[ 0.000000] Linux version 3.10.0-693.5.2.el7.centos.plus.1.x86_64 (yagi2 at h64r7) (gcc version 4.8.5 20150623 (Red Hat 4.8.5-16) (GCC) ) #1 SMP Mon Oct 23 22:30:37 PDT 2017
[ 0.000000] Command line: console=hvc0 xencons=tty0 root=/dev/xvda1 ro LANG=en_CA.UTF-8 elevator=noop nohz=off
[ 0.000000] ACPI in unprivileged domain disabled
[ 0.000000] e820: BIOS-provided physical RAM map:
[ 0.000000] Xen: [mem 0x0000000000000000-0x000000000009ffff] usable
[ 0.000000] Xen: [mem 0x00000000000a0000-0x00000000000fffff] reserved
[ 0.000000] Xen: [mem 0x0000000000100000-0x000000003fffffff] usable
[ 0.000000] NX (Execute Disable) protection: active
[ 0.000000] DMI not present or invalid.
[ 0.000000] e820: last_pfn = 0x40000 max_arch_pfn = 0x400000000
[ 0.000000] RAMDISK: [mem 0x0242d000-0x038e0fff]
[ 0.000000] NUMA turned off
[ 0.000000] Faking a node at [mem 0x0000000000000000-0x000000003fffffff]
[ 0.000000] NODE_DATA(0) allocated [mem 0x3fe03000-0x3fe29fff]
[ 0.000000] Zone ranges:
[ 0.000000] DMA [mem 0x00001000-0x00ffffff]
[ 0.000000] DMA32 [mem 0x01000000-0xffffffff]
[ 0.000000] Normal empty
[ 0.000000] Movable zone start for each node
[ 0.000000] Early memory node ranges
[ 0.000000] node 0: [mem 0x00001000-0x0009ffff]
[ 0.000000] node 0: [mem 0x00100000-0x3fffffff]
[ 0.000000] Initmem setup node 0 [mem 0x00001000-0x3fffffff]
[ 0.000000] SFI: Simple Firmware Interface v0.81 http://simplefirmware.org
[ 0.000000] No local APIC present
[ 0.000000] APIC: disable apic facility
[ 0.000000] APIC: switched to apic NOOP
[ 0.000000] smpboot: Allowing 2 CPUs, 0 hotplug CPUs
[ 0.000000] PM: Registered nosave memory: [mem 0x000a0000-0x000fffff]
[ 0.000000] e820: [mem 0x40000000-0xffffffff] available for PCI devices
[ 0.000000] Booting paravirtualized kernel on Xen
[ 0.000000] Xen version: 4.6.3-3.el6 (preserve-AD)
[ 0.000000] setup_percpu: NR_CPUS:5120 nr_cpumask_bits:2 nr_cpu_ids:2 nr_node_ids:1
[ 0.000000] PERCPU: Embedded 33 pages/cpu @ffff88003f800000 s97112 r8192 d29864 u1048576
[ 0.000000] PV qspinlock hash table entries: 256 (order: 0, 4096 bytes)
[ 0.000000] Built 1 zonelists in Node order, mobility grouping on. Total pages: 257930
[ 0.000000] Policy zone: DMA32
[ 0.000000] Kernel command line: console=hvc0 xencons=tty0 root=/dev/xvda1 ro LANG=en_CA.UTF-8 elevator=noop nohz=off
[ 0.000000] PID hash table entries: 4096 (order: 3, 32768 bytes)
[ 0.000000] x86/fpu: xstate_offset[2]: 0240, xstate_sizes[2]: 0100
[ 0.000000] xsave: enabled xstate_bv 0x7, cntxt size 0x340 using standard form
[ 0.000000] Memory: 989236k/1048576k available (6954k kernel code, 388k absent, 58952k reserved, 4575k data, 1768k init)
[ 0.000000] SLUB: HWalign=64, Order=0-3, MinObjects=0, CPUs=2, Nodes=1
[ 0.000000] Hierarchical RCU implementation.
[ 0.000000] RCU restricting CPUs from NR_CPUS=5120 to nr_cpu_ids=2.
[ 0.000000] NR_IRQS:327936 nr_irqs:32 0
[ 0.000000] Console: colour dummy device 80x25
[ 0.000000] console [tty0] enabled
[ 0.000000] console [hvc0] enabled
[ 0.000000] allocated 4194304 bytes of page_cgroup
[ 0.000000] please try 'cgroup_disable=memory' option if you don't want memory cgroups
[ 0.000000] installing Xen timer for CPU 0
[ 0.000000] tsc: Fast TSC calibration using PIT
[ 0.000000] tsc: Detected 2100.066 MHz processor
[ 0.002000] Calibrating delay loop (skipped), value calculated using timer frequency.. 4200.06 BogoMIPS (lpj=2100030)
[ 0.002000] pid_max: default: 32768 minimum: 301
[ 0.002000] Security Framework initialized
[ 0.002000] SELinux: Initializing.
[ 0.002000] Yama: becoming mindful.
[ 0.002000] Dentry cache hash table entries: 131072 (order: 8, 1048576 bytes)
[ 0.002000] Inode-cache hash table entries: 65536 (order: 7, 524288 bytes)
[ 0.002000] Mount-cache hash table entries: 2048 (order: 2, 16384 bytes)
[ 0.002000] Mountpoint-cache hash table entries: 2048 (order: 2, 16384 bytes)
[ 0.002086] Initializing cgroup subsys memory
[ 0.002104] Initializing cgroup subsys devices
[ 0.002111] Initializing cgroup subsys freezer
[ 0.002116] Initializing cgroup subsys net_cls
[ 0.002122] Initializing cgroup subsys blkio
[ 0.002127] Initializing cgroup subsys perf_event
[ 0.002133] Initializing cgroup subsys hugetlb
[ 0.002138] Initializing cgroup subsys pids
[ 0.002143] Initializing cgroup subsys net_prio
[ 0.002207] ENERGY_PERF_BIAS: Set to 'normal', was 'performance'
[ 0.002214] ENERGY_PERF_BIAS: View and update with x86_energy_perf_policy(8)
[ 0.002221] CPU: Physical Processor ID: 0
[ 0.002225] CPU: Processor Core ID: 0
[ 0.003093] Last level iTLB entries: 4KB 512, 2MB 0, 4MB 0
[ 0.003098] Last level dTLB entries: 4KB 512, 2MB 0, 4MB 0
[ 0.003103] tlb_flushall_shift: 6
[ 0.036643] ftrace: allocating 26819 entries in 105 pages
[ 0.043078] cpu 0 spinlock event irq 17
[ 0.043086] smpboot: Max logical packages: 1
[ 0.043118] Performance Events: unsupported p6 CPU model 62 no PMU driver, software events only.
[ 0.044508] NMI watchdog: disabled (cpu0): hardware events not enabled
[ 0.044515] NMI watchdog: Shutting down hard lockup detector on all cpus
[ 0.044598] installing Xen timer for CPU 1
[ 0.044613] cpu 1 spinlock event irq 24
[ 0.044678] SMP alternatives: switching to SMP code
[ 0.002000] [Firmware Bug]: CPU1: APIC id mismatch. Firmware: ffff APIC: 6
[ 0.072708] Brought up 2 CPUs
[ 0.073046] devtmpfs: initialized
[ 0.075736] EVM: security.selinux
[ 0.075742] EVM: security.ima
[ 0.075746] EVM: security.capability
[ 0.076705] atomic64 test passed for x86-64 platform with CX8 and with SSE
[ 0.076714] pinctrl core: initialized pinctrl subsystem
[ 0.076763] xen:grant_table: Grant tables using version 2 layout
[ 0.076775] BUG: unable to handle kernel NULL pointer dereference at 0000000000000010
[ 0.076786] IP: [<ffffffff813f6d0f>] gnttab_init+0xff/0x260
[ 0.076796] PGD 0
[ 0.076802] Oops: 0002 [#1] SMP
[ 0.076808] Modules linked in:
[ 0.076817] CPU: 0 PID: 1 Comm: swapper/0 Not tainted 3.10.0-693.5.2.el7.centos.plus.1.x86_64 #1
[ 0.076825] task: ffff88003da38000 ti: ffff88003daa0000 task.ti: ffff88003daa0000
[ 0.076831] RIP: e030:[<ffffffff813f6d0f>] [<ffffffff813f6d0f>] gnttab_init+0xff/0x260
[ 0.076840] RSP: e02b:ffff88003daa3df8 EFLAGS: 00010286
[ 0.076844] RAX: ffff88003d405000 RBX: 0000000000000000 RCX: 000000000001a210
[ 0.076849] RDX: 0000000000000000 RSI: 000000000000001e RDI: 0000000000000000
[ 0.076854] RBP: ffff88003daa3e40 R08: 0000000000000000 R09: 000000000001a1b0
[ 0.076859] R10: ffff88003fe03800 R11: 0000000000000001 R12: 0000000000000000
[ 0.077000] R13: 0000000000000001 R14: 0000000000000010 R15: 0000000000000000
[ 0.077000] FS: 0000000000000000(0000) GS:ffff88003f800000(0000) knlGS:0000000000000000
[ 0.077000] CS: e033 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 0.077000] CR2: 0000000000000010 CR3: 0000000001a0a000 CR4: 0000000000042660
[ 0.077000] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 0.077000] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[ 0.077000] Stack:
[ 0.077000] 0000000000000000 0000000400007ff0 ffff000000000020 000000001fffa6db
[ 0.077000] ffffffff81a11020 ffff88003e002a70 ffffffff813f6e70 0000000000000000
[ 0.077000] 0000000000000000 ffff88003daa3e50 ffffffff813f6e93 ffff88003daa3e80
[ 0.077000] Call Trace:
[ 0.077000] [<ffffffff813f6e70>] ? gnttab_init+0x260/0x260
[ 0.077000] [<ffffffff813f6e93>] __gnttab_init+0x23/0x40
[ 0.077000] [<ffffffff810020e8>] do_one_initcall+0xb8/0x230
[ 0.077000] [<ffffffff81b5d1fb>] kernel_init_freeable+0x17a/0x219
[ 0.077000] [<ffffffff81b5c9d4>] ? initcall_blacklist+0xb0/0xb0
[ 0.077000] [<ffffffff816a3d20>] ? rest_init+0x80/0x80
[ 0.077000] [<ffffffff816a3d2e>] kernel_init+0xe/0xf0
[ 0.077000] [<ffffffff816c5f98>] ret_from_fork+0x58/0x90
[ 0.077000] [<ffffffff816a3d20>] ? rest_init+0x80/0x80
[ 0.077000] Code: 00 00 66 2e 0f 1f 84 00 00 00 00 00 83 c3 01 41 39 dd 0f 86 84 00 00 00 4c 63 e3 31 f6 bf d0 00 00 00 4e 8d 34 e0 e8 01 09 d9 ff <49> 89 06 48 8b 05 37 0d bf 00 4a 83 3c e0 00 75 d0 48 89 c7 41
[ 0.077000] RIP [<ffffffff813f6d0f>] gnttab_init+0xff/0x260
[ 0.077000] RSP <ffff88003daa3df8>
[ 0.077000] CR2: 0000000000000010
[ 0.077000] ---[ end trace ad7a936cdeb5166e ]---
[ 0.077000] Kernel panic - not syncing: Fatal exception

I switched back to 3.10.0-693.2.2.el7.centos.plus.x86_64.
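A note on reading this oops: the kernel prints code addresses as symbol+offset/size, so gnttab_init+0xff/0x260 places the faulting instruction 0xff bytes into the 0x260-byte gnttab_init function, and CR2 (0x10) is the address that could not be accessed, 16 bytes past NULL, matching the "NULL pointer dereference at 0000000000000010" line. Since the original panic is rare and hard to reproduce, reducing each oops to such a signature may make it easier to tell the test-kernel failure apart from the earlier crash. The following is a minimal Python sketch assuming the x86-64 oops layout shown in this thread; parse_oops, OopsSignature and looks_like_null_deref are hypothetical helpers written for illustration, not part of any CentOS or kernel tooling.

```python
import re
from typing import NamedTuple, Optional

# Hypothetical helper, assuming the x86-64 oops format seen in this thread.
# The "IP:" line prints a code address as "[<address>] symbol+0xOFFSET/0xSIZE".
_IP_RE = re.compile(
    r"\bIP: \[<(?P<addr>[0-9a-f]+)>\]\s+"
    r"(?P<func>\w+)\+0x(?P<off>[0-9a-f]+)/0x(?P<size>[0-9a-f]+)"
)
# On x86, CR2 holds the virtual address that caused the page fault.
_CR2_RE = re.compile(r"\bCR2: (?P<cr2>[0-9a-f]+)")


class OopsSignature(NamedTuple):
    func: str                  # function containing the faulting instruction
    offset: int                # byte offset of that instruction in the function
    size: int                  # total size of the function in bytes
    fault_addr: Optional[int]  # CR2 value, if the log contains one


def parse_oops(log: str) -> Optional[OopsSignature]:
    """Extract a compact crash signature from oops text, or None if absent."""
    ip = _IP_RE.search(log)
    if ip is None:
        return None
    cr2 = _CR2_RE.search(log)
    return OopsSignature(
        func=ip.group("func"),
        offset=int(ip.group("off"), 16),
        size=int(ip.group("size"), 16),
        fault_addr=int(cr2.group("cr2"), 16) if cr2 else None,
    )


def looks_like_null_deref(sig: OopsSignature, page_size: int = 4096) -> bool:
    """Heuristic: a fault inside the first page is usually NULL plus a small offset."""
    return sig.fault_addr is not None and sig.fault_addr < page_size


if __name__ == "__main__":
    # Three lines copied from the oops in this thread.
    excerpt = (
        "[ 0.076775] BUG: unable to handle kernel NULL pointer dereference at 0000000000000010\n"
        "[ 0.076786] IP: [<ffffffff813f6d0f>] gnttab_init+0xff/0x260\n"
        "[ 0.077000] CR2: 0000000000000010 CR3: 0000000001a0a000 CR4: 0000000000042660\n"
    )
    sig = parse_oops(excerpt)
    print(sig.func, hex(sig.offset), hex(sig.size), hex(sig.fault_addr))
    # prints: gnttab_init 0xff 0x260 0x10
```

Mapping the offset back to a source line would additionally require the debuginfo vmlinux for that exact kernel build; the sketch only compares signatures and does not attempt that.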
Akemi Yagi
2017-Oct-24 21:09 UTC
[CentOS-virt] Crash in CentOS 7 kernel-3.10.0-514.16.1.el7.x86_64 in Xen PV mode
On Tue, Oct 24, 2017 at 12:53 PM, Karl Johnson <karljohnson.it at gmail.com> wrote:

> The test kernel doesn't boot on my side:
>
> [...]
> [ 0.077000] Kernel panic - not syncing: Fatal exception
>
> I switched back to 3.10.0-693.2.2.el7.centos.plus.x86_64.
>

Looks as if the patch broke something before it could fix the problem...

Akemi