Jonas Meurer
2014-Oct-13 10:04 UTC
[Pkg-xen-devel] kernel crashes after soft lockups in xen domU
Hey again,

On 2014-08-19 12:26, Jonas Meurer wrote:
> I encounter kernel crashes on an up-to-date Debian/Wheezy Xen domU with the
> stock kernel. The dom0 runs the same linux kernel and xen/4.1.4-3+deb7u1.

The bug is still reproducible with the latest kernel and Xen packages from Debian Wheezy. Unfortunately, it seems to be a corner case that is somehow related to the hardware setup. I'm unable to reproduce the crash on another Xen system with the same kernel and Xen versions but a different CPU and motherboard. The VM has been running in production on that second Xen system for several weeks without a single crash.

I now have a test system where I can reproduce the bug by putting the VM (a webserver) under heavy load with the help of siege and pylot. The VM crashes every time I put the webserver under heavy load, every time with the same backtrace.

Can I do anything else to help debug this? Should I report it to Xen upstream or send it to lkml?

Regarding the hardware: the RAM was checked with memtest86+ and no errors were found, and the CPU has been replaced by a new one (same model). Still, the VM crash is reproducible.

The hardware on the crashing system is:
CPU: Intel Xeon E5-2609v2/4x2,5GHz
Motherboard: Supermicro X9SRI-F

For information, the hardware on the non-crashing system is:
CPU: Intel XEON L5639/6x2,13 GHz
Motherboard: Supermicro X8STi

> It seems like the crashes are related to a RT process, even though no
> sched_fifo/rr processes are started on this system intentionally. Also, the
> CPU usage is low all the time, no peaks at all. But the kernel reports:
>
> kernel: [39101.461586] sched: RT throttling activated

This is still valid, even though I no longer think that it's related to an RT process at all, as no sched_fifo/rr processes are started.

> Usually, a few minutes later, soft lockups start to happen, and then the
> system crashes:

The backtrace is slightly different now due to kernel and Xen updates:

[353013.384931] sched: RT throttling activated
[354008.100835] BUG: soft lockup - CPU#5 stuck for 22s! [apache2:24848]
[354008.100846] Modules linked in: evdev coretemp crc32c_intel ghash_clmulni_intel snd_pcm snd_page_alloc aesni_intel snd_timer aes_x86_64 snd aes_generic soundcore cryptd pcspkr ext4 crc16 jbd2 mbcache dm_mod xen_netfront xen_blkfront
[354008.100872] CPU 5
[354008.100874] Modules linked in: evdev coretemp crc32c_intel ghash_clmulni_intel snd_pcm snd_page_alloc aesni_intel snd_timer aes_x86_64 snd aes_generic soundcore cryptd pcspkr ext4 crc16 jbd2 mbcache dm_mod xen_netfront xen_blkfront
[354008.100894]
[354008.100898] Pid: 24848, comm: apache2 Not tainted 3.2.0-4-amd64 #1 Debian 3.2.60-1+deb7u3
[354008.100904] RIP: e030:[<ffffffff8100122a>] [<ffffffff8100122a>] hypercall_page+0x22a/0x1000
[354008.100914] RSP: e02b:ffff8802f0b41c00 EFLAGS: 00000246
[354008.100918] RAX: 0000000000040001 RBX: ffff8802ffff0200 RCX: ffffffff8100122a
[354008.100922] RDX: ffffffffffffffc8 RSI: 0000000000000000 RDI: 0000000000000000
[354008.100927] RBP: 000000000000000e R08: 0000000000000200 R09: dead000000100100
[354008.100931] R10: dead000000200200 R11: 0000000000000246 R12: ffff88028e261da0
[354008.100935] R13: ffff8802ffffbe00 R14: ffffea00099905e8 R15: 000000000000000d
[354008.100944] FS: 00007f7b66cd2740(0000) GS:ffff8802ffd40000(0000) knlGS:0000000000000000
[354008.100949] CS: e033 DS: 0000 ES: 0000 CR0: 000000008005003b
[354008.100953] CR2: 00007f7b68c2b000 CR3: 00000002855b7000 CR4: 0000000000002660
[354008.100958] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[354008.100962] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[354008.100967] Process apache2 (pid: 24848, threadinfo ffff8802f0b40000, task ffff8802ef49c8c0)
[354008.100972] Stack:
[354008.100974] 0000000000000000 0000000000000200 ffffffff81006790 ffffffff81006d22
[354008.100981] 0000000000000000 dead000000200200 dead000000100100 0000000000000200
[354008.100988] ffff8802ffff0200 ffff8802ffff0200 ffffffffffffffc8 0000000000000200
[354008.100995] Call Trace:
[354008.101000] [<ffffffff81006790>] ? xen_force_evtchn_callback+0x9/0xa
[354008.101006] [<ffffffff81006d22>] ? check_events+0x12/0x20
[354008.101011] [<ffffffff81006d0f>] ? xen_restore_fl_direct_reloc+0x4/0x4
[354008.101017] [<ffffffff81071153>] ? arch_local_irq_restore+0x7/0x8
[354008.101024] [<ffffffff8135049f>] ? _raw_spin_unlock_irqrestore+0xe/0xf
[354008.101031] [<ffffffff810be895>] ? release_pages+0xf4/0x14d
[354008.101038] [<ffffffff810de78b>] ? free_pages_and_swap_cache+0x48/0x60
[354008.101045] [<ffffffff810cf527>] ? tlb_flush_mmu+0x37/0x50
[354008.101049] [<ffffffff810cf54c>] ? tlb_finish_mmu+0xc/0x31
[354008.101054] [<ffffffff810d5e79>] ? exit_mmap+0xc4/0xe9
[354008.101060] [<ffffffff81044b82>] ? mmput+0x56/0xf8
[354008.101064] [<ffffffff81049d07>] ? exit_mm+0x117/0x122
[354008.101069] [<ffffffff8107115b>] ? arch_local_irq_disable+0x7/0x8
[354008.101074] [<ffffffff81350487>] ? _raw_spin_lock_irq+0xa/0x14
[354008.101078] [<ffffffff8107115b>] ? arch_local_irq_disable+0x7/0x8
[354008.101083] [<ffffffff81049f57>] ? do_exit+0x245/0x713
[354008.101088] [<ffffffff81039a92>] ? finish_task_switch+0x4e/0xb9
[354008.101093] [<ffffffff8134f0e9>] ? __schedule+0x5f9/0x610
[354008.101098] [<ffffffff810d5c58>] ? do_munmap+0x2da/0x2f3
[354008.101102] [<ffffffff8104a6a5>] ? do_group_exit+0x74/0x9e
[354008.101107] [<ffffffff8104a6de>] ? sys_exit_group+0xf/0xf
[354008.101112] [<ffffffff81355452>] ? system_call_fastpath+0x16/0x1b
[354008.101115] Code: cc 51 41 53 b8 10 00 00 00 0f 05 41 5b 59 c3 cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc 51 41 53 b8 11 00 00 00 0f 05 <41> 5b 59 c3 cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc
[354008.101157] Call Trace:
[354008.101160] [<ffffffff81006790>] ? xen_force_evtchn_callback+0x9/0xa
[354008.101165] [<ffffffff81006d22>] ? check_events+0x12/0x20
[354008.101170] [<ffffffff81006d0f>] ? xen_restore_fl_direct_reloc+0x4/0x4
[354008.101175] [<ffffffff81071153>] ? arch_local_irq_restore+0x7/0x8
[354008.101180] [<ffffffff8135049f>] ? _raw_spin_unlock_irqrestore+0xe/0xf
[354008.101185] [<ffffffff810be895>] ? release_pages+0xf4/0x14d
[354008.101190] [<ffffffff810de78b>] ? free_pages_and_swap_cache+0x48/0x60
[354008.101195] [<ffffffff810cf527>] ? tlb_flush_mmu+0x37/0x50
[354008.101199] [<ffffffff810cf54c>] ? tlb_finish_mmu+0xc/0x31
[354008.101204] [<ffffffff810d5e79>] ? exit_mmap+0xc4/0xe9
[354008.101209] [<ffffffff81044b82>] ? mmput+0x56/0xf8
[354008.101213] [<ffffffff81049d07>] ? exit_mm+0x117/0x122
[354008.101217] [<ffffffff8107115b>] ? arch_local_irq_disable+0x7/0x8
[354008.101221] [<ffffffff81350487>] ? _raw_spin_lock_irq+0xa/0x14
[354008.101226] [<ffffffff8107115b>] ? arch_local_irq_disable+0x7/0x8
[354008.101230] [<ffffffff81049f57>] ? do_exit+0x245/0x713
[354008.101234] [<ffffffff81039a92>] ? finish_task_switch+0x4e/0xb9
[354008.101239] [<ffffffff8134f0e9>] ? __schedule+0x5f9/0x610
[354008.101243] [<ffffffff810d5c58>] ? do_munmap+0x2da/0x2f3
[354008.101247] [<ffffffff8104a6a5>] ? do_group_exit+0x74/0x9e
[354008.101251] [<ffffffff8104a6de>] ? sys_exit_group+0xf/0xf
[354008.101256] [<ffffffff81355452>] ? system_call_fastpath+0x16/0x1b
[354036.127528] BUG: soft lockup - CPU#5 stuck for 22s! [apache2:24848]
[354036.127539] Modules linked in: evdev coretemp crc32c_intel ghash_clmulni_intel snd_pcm snd_page_alloc aesni_intel snd_timer aes_x86_64 snd aes_generic soundcore cryptd pcspkr ext4 crc16 jbd2 mbcache dm_mod xen_netfront xen_blkfront
[354036.127564] CPU 5
[354036.127567] Modules linked in: evdev coretemp crc32c_intel ghash_clmulni_intel snd_pcm snd_page_alloc aesni_intel snd_timer aes_x86_64 snd aes_generic soundcore cryptd pcspkr ext4 crc16 jbd2 mbcache dm_mod xen_netfront xen_blkfront
[354036.127587]
[354036.127590] Pid: 24848, comm: apache2 Not tainted 3.2.0-4-amd64 #1 Debian 3.2.60-1+deb7u3
[354036.127597] RIP: e030:[<ffffffff8100122a>] [<ffffffff8100122a>] hypercall_page+0x22a/0x1000
[354036.127606] RSP: e02b:ffff8802f0b41c00 EFLAGS: 00000246
[354036.127610] RAX: 0000000000040001 RBX: ffff8802ffff0200 RCX: ffffffff8100122a
[354036.127615] RDX: ffffffffffffffc8 RSI: 0000000000000000 RDI: 0000000000000000
[354036.127619] RBP: 000000000000000e R08: 0000000000000200 R09: dead000000100100
[354036.127623] R10: dead000000200200 R11: 0000000000000246 R12: ffff880265e01cc0
[354036.127627] R13: ffff8802ffffbe00 R14: ffffea00099748c0 R15: 000000000000000d
[354036.127637] FS: 00007f7b66cd2740(0000) GS:ffff8802ffd40000(0000) knlGS:0000000000000000
[354036.127642] CS: e033 DS: 0000 ES: 0000 CR0: 000000008005003b
[354036.127646] CR2: 00007f7b68c2b000 CR3: 00000002855b7000 CR4: 0000000000002660
[354036.127650] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[354036.127655] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[354036.127660] Process apache2 (pid: 24848, threadinfo ffff8802f0b40000, task ffff8802ef49c8c0)
[354036.127665] Stack:
[354036.127667] 0000000000000000 0000000000000200 ffffffff81006790 ffffffff81006d22
[354036.127674] 0000000000000000 dead000000200200 dead000000100100 0000000000000200
[354036.127680] ffff8802ffff0200 ffff8802ffff0200 ffffffffffffffc8 0000000000000200
[354036.127687] Call Trace:
[354036.127693] [<ffffffff81006790>] ? xen_force_evtchn_callback+0x9/0xa
[354036.127699] [<ffffffff81006d22>] ? check_events+0x12/0x20
[354036.127704] [<ffffffff81006d0f>] ? xen_restore_fl_direct_reloc+0x4/0x4
[354036.127710] [<ffffffff81071153>] ? arch_local_irq_restore+0x7/0x8
[354036.127717] [<ffffffff8135049f>] ? _raw_spin_unlock_irqrestore+0xe/0xf
[354036.127723] [<ffffffff810be895>] ? release_pages+0xf4/0x14d
[354036.127730] [<ffffffff810de78b>] ? free_pages_and_swap_cache+0x48/0x60
[354036.127736] [<ffffffff810cf527>] ? tlb_flush_mmu+0x37/0x50
[354036.127741] [<ffffffff810cf54c>] ? tlb_finish_mmu+0xc/0x31
[354036.127746] [<ffffffff810d5e79>] ? exit_mmap+0xc4/0xe9
[354036.127752] [<ffffffff81044b82>] ? mmput+0x56/0xf8
[354036.127756] [<ffffffff81049d07>] ? exit_mm+0x117/0x122
[354036.127760] [<ffffffff8107115b>] ? arch_local_irq_disable+0x7/0x8
[354036.127765] [<ffffffff81350487>] ? _raw_spin_lock_irq+0xa/0x14
[354036.127769] [<ffffffff8107115b>] ? arch_local_irq_disable+0x7/0x8
[354036.127773] [<ffffffff81049f57>] ? do_exit+0x245/0x713
[354036.127779] [<ffffffff81039a92>] ? finish_task_switch+0x4e/0xb9
[354036.127784] [<ffffffff8134f0e9>] ? __schedule+0x5f9/0x610
[354036.127788] [<ffffffff810d5c58>] ? do_munmap+0x2da/0x2f3
[354036.127793] [<ffffffff8104a6a5>] ? do_group_exit+0x74/0x9e
[354036.127797] [<ffffffff8104a6de>] ? sys_exit_group+0xf/0xf
[354036.127802] [<ffffffff81355452>] ? system_call_fastpath+0x16/0x1b
[354036.127805] Code: cc 51 41 53 b8 10 00 00 00 0f 05 41 5b 59 c3 cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc 51 41 53 b8 11 00 00 00 0f 05 <41> 5b 59 c3 cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc
[354036.127846] Call Trace:
[354036.127849] [<ffffffff81006790>] ? xen_force_evtchn_callback+0x9/0xa
[354036.127854] [<ffffffff81006d22>] ? check_events+0x12/0x20
[354036.127859] [<ffffffff81006d0f>] ? xen_restore_fl_direct_reloc+0x4/0x4
[354036.127864] [<ffffffff81071153>] ? arch_local_irq_restore+0x7/0x8
[354036.127868] [<ffffffff8135049f>] ? _raw_spin_unlock_irqrestore+0xe/0xf
[354036.127873] [<ffffffff810be895>] ? release_pages+0xf4/0x14d
[354036.127879] [<ffffffff810de78b>] ? free_pages_and_swap_cache+0x48/0x60
[354036.127883] [<ffffffff810cf527>] ? tlb_flush_mmu+0x37/0x50
[354036.127888] [<ffffffff810cf54c>] ? tlb_finish_mmu+0xc/0x31
[354036.127893] [<ffffffff810d5e79>] ? exit_mmap+0xc4/0xe9
[354036.127897] [<ffffffff81044b82>] ? mmput+0x56/0xf8
[354036.127901] [<ffffffff81049d07>] ? exit_mm+0x117/0x122
[354036.127905] [<ffffffff8107115b>] ? arch_local_irq_disable+0x7/0x8
[354036.127909] [<ffffffff81350487>] ? _raw_spin_lock_irq+0xa/0x14
[354036.127914] [<ffffffff8107115b>] ? arch_local_irq_disable+0x7/0x8
[354036.127918] [<ffffffff81049f57>] ? do_exit+0x245/0x713
[354036.127922] [<ffffffff81039a92>] ? finish_task_switch+0x4e/0xb9
[354036.127927] [<ffffffff8134f0e9>] ? __schedule+0x5f9/0x610
[354036.127931] [<ffffffff810d5c58>] ? do_munmap+0x2da/0x2f3
[354036.127935] [<ffffffff8104a6a5>] ? do_group_exit+0x74/0x9e
[354036.128001] [<ffffffff8104a6de>] ? sys_exit_group+0xf/0xf
[354036.128007] [<ffffffff81355452>] ? system_call_fastpath+0x16/0x1b
[354044.085361] BUG: soft lockup - CPU#3 stuck for 23s! [apache2:20278]
[354044.085370] Modules linked in: evdev coretemp crc32c_intel ghash_clmulni_intel snd_pcm snd_page_alloc aesni_intel snd_timer aes_x86_64 snd aes_generic soundcore cryptd pcspkr ext4 crc16 jbd2 mbcache dm_mod xen_netfront xen_blkfront
[354044.085395] CPU 3
[354044.085397] Modules linked in: evdev coretemp crc32c_intel ghash_clmulni_intel snd_pcm snd_page_alloc aesni_intel snd_timer aes_x86_64 snd aes_generic soundcore cryptd pcspkr ext4 crc16 jbd2 mbcache dm_mod xen_netfront xen_blkfront
[354044.085417]
[354044.085420] Pid: 20278, comm: apache2 Not tainted 3.2.0-4-amd64 #1 Debian 3.2.60-1+deb7u3
[354044.085426] RIP: e030:[<ffffffff8100122a>] [<ffffffff8100122a>] hypercall_page+0x22a/0x1000
[354044.085436] RSP: e02b:ffff8802ef743c00 EFLAGS: 00000246
[354044.085440] RAX: 0000000000040001 RBX: ffff8802ffff0200 RCX: ffffffff8100122a
[354044.085444] RDX: ffffffffffffffc8 RSI: 0000000000000000 RDI: 0000000000000000
[354044.085449] RBP: 000000000000000e R08: 0000000000000200 R09: dead000000100100
[354044.085453] R10: dead000000200200 R11: 0000000000000246 R12: ffff880265c53320
[354044.085457] R13: ffff8802ffffbe00 R14: ffffea0009cec350 R15: 000000000000000d
[354044.085466] FS: 00007f7b66cd2740(0000) GS:ffff8802ffcc0000(0000) knlGS:0000000000000000
[354044.085471] CS: e033 DS: 0000 ES: 0000 CR0: 000000008005003b
[354044.085475] CR2: 00007f7b6906cff0 CR3: 00000002efbfb000 CR4: 0000000000002660
[354044.085479] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[354044.085484] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[354044.085489] Process apache2 (pid: 20278, threadinfo ffff8802ef742000, task ffff8802efa5b880)
[354044.085493] Stack:
[354044.085495] 0000000000000000 0000000000000200 ffffffff81006790 ffffffff81006d22
[354044.085502] 0000000000000000 dead000000200200 dead000000100100 0000000000000200
[354044.085508] ffff8802ffff0200 ffff8802ffff0200 ffffffffffffffc8 0000000000000200
[354044.085746] Call Trace:
[354044.085753] [<ffffffff81006790>] ? xen_force_evtchn_callback+0x9/0xa
[354044.085758] [<ffffffff81006d22>] ? check_events+0x12/0x20
[354044.085763] [<ffffffff81006d0f>] ? xen_restore_fl_direct_reloc+0x4/0x4
[354044.085769] [<ffffffff81071153>] ? arch_local_irq_restore+0x7/0x8
[354044.085776] [<ffffffff8135049f>] ? _raw_spin_unlock_irqrestore+0xe/0xf
[354044.085783] [<ffffffff810be895>] ? release_pages+0xf4/0x14d
[354044.085790] [<ffffffff810de78b>] ? free_pages_and_swap_cache+0x48/0x60
[354044.085796] [<ffffffff810cf527>] ? tlb_flush_mmu+0x37/0x50
[354044.085800] [<ffffffff810cf54c>] ? tlb_finish_mmu+0xc/0x31
[354044.085806] [<ffffffff810d5e79>] ? exit_mmap+0xc4/0xe9
[354044.085811] [<ffffffff81044b82>] ? mmput+0x56/0xf8
[354044.085815] [<ffffffff81049d07>] ? exit_mm+0x117/0x122
[354044.085820] [<ffffffff8107115b>] ? arch_local_irq_disable+0x7/0x8
[354044.085824] [<ffffffff81350487>] ? _raw_spin_lock_irq+0xa/0x14
[354044.085829] [<ffffffff8107115b>] ? arch_local_irq_disable+0x7/0x8
[354044.085833] [<ffffffff81049f57>] ? do_exit+0x245/0x713
[354044.085838] [<ffffffff81039a92>] ? finish_task_switch+0x4e/0xb9
[354044.085843] [<ffffffff8134f0e9>] ? __schedule+0x5f9/0x610
[354044.085848] [<ffffffff810d5c58>] ? do_munmap+0x2da/0x2f3
[354044.085852] [<ffffffff8104a6a5>] ? do_group_exit+0x74/0x9e
[354044.085856] [<ffffffff8104a6de>] ? sys_exit_group+0xf/0xf
[354044.085861] [<ffffffff81355452>] ? system_call_fastpath+0x16/0x1b
[354044.085865] Code: cc 51 41 53 b8 10 00 00 00 0f 05 41 5b 59 c3 cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc 51 41 53 b8 11 00 00 00 0f 05 <41> 5b 59 c3 cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc
[354044.085905] Call Trace:
[354044.085909] [<ffffffff81006790>] ? xen_force_evtchn_callback+0x9/0xa
[354044.085914] [<ffffffff81006d22>] ? check_events+0x12/0x20
[354044.085919] [<ffffffff81006d0f>] ? xen_restore_fl_direct_reloc+0x4/0x4
[354044.085924] [<ffffffff81071153>] ? arch_local_irq_restore+0x7/0x8
[354044.085928] [<ffffffff8135049f>] ? _raw_spin_unlock_irqrestore+0xe/0xf
[354044.085933] [<ffffffff810be895>] ? release_pages+0xf4/0x14d
[354044.085939] [<ffffffff810de78b>] ? free_pages_and_swap_cache+0x48/0x60
[354044.085944] [<ffffffff810cf527>] ? tlb_flush_mmu+0x37/0x50
[354044.085948] [<ffffffff810cf54c>] ? tlb_finish_mmu+0xc/0x31
[354044.085953] [<ffffffff810d5e79>] ? exit_mmap+0xc4/0xe9
[354044.085957] [<ffffffff81044b82>] ? mmput+0x56/0xf8
[354044.085961] [<ffffffff81049d07>] ? exit_mm+0x117/0x122
[354044.085965] [<ffffffff8107115b>] ? arch_local_irq_disable+0x7/0x8
[354044.085970] [<ffffffff81350487>] ? _raw_spin_lock_irq+0xa/0x14
[354044.085974] [<ffffffff8107115b>] ? arch_local_irq_disable+0x7/0x8
[354044.085978] [<ffffffff81049f57>] ? do_exit+0x245/0x713
[354044.085983] [<ffffffff81039a92>] ? finish_task_switch+0x4e/0xb9
[354044.085987] [<ffffffff8134f0e9>] ? __schedule+0x5f9/0x610
[354044.085992] [<ffffffff810d5c58>] ? do_munmap+0x2da/0x2f3
[354044.085996] [<ffffffff8104a6a5>] ? do_group_exit+0x74/0x9e
[354044.086000] [<ffffffff8104a6de>] ? sys_exit_group+0xf/0xf
[354044.086004] [<ffffffff81355452>] ? system_call_fastpath+0x16/0x1b
[354047.219103] INFO: rcu_sched detected stall on CPU 5 (t=15039 jiffies)
[354047.224054] INFO: rcu_sched detected stalls on CPUs/tasks: { 5} (detected by 1, t=15039 jiffies)
[354047.224054] sending NMI to all CPUs:
[354047.224054] BUG: unable to handle kernel paging request at ffffffffff5fb310
[354047.224054] IP: [<ffffffff81027b9a>] native_apic_mem_write+0x2/0x9
[354047.224054] PGD 1607067 PUD 1608067 PMD 172d067 PTE 0
[354047.224054] Oops: 0002 [#1] SMP
[354047.224054] CPU 1
[354047.224054] Modules linked in: evdev coretemp crc32c_intel ghash_clmulni_intel snd_pcm snd_page_alloc aesni_intel snd_timer aes_x86_64 snd aes_generic soundcore cryptd pcspkr ext4 crc16 jbd2 mbcache dm_mod xen_netfront xen_blkfront
[354047.224054]
[354047.224054] Pid: 24843, comm: apache2 Not tainted 3.2.0-4-amd64 #1 Debian 3.2.60-1+deb7u3
[354047.224054] RIP: e030:[<ffffffff81027b9a>] [<ffffffff81027b9a>] native_apic_mem_write+0x2/0x9
[354047.224054] RSP: e02b:ffff8802ffc43c90 EFLAGS: 00010086
[354047.224054] RAX: 0000000000000000 RBX: ffffffff816800e0 RCX: 00000000000007c2
[354047.224054] RDX: 0000000000000000 RSI: 00000000ff000000 RDI: 0000000000000310
[354047.224054] RBP: 0000000000000002 R08: 0000000000000002 R09: 0000000000000000
[354047.224054] R10: 0000000000000000 R11: ffff8802f2863d40 R12: 0000000000000800
[354047.224054] R13: 00000000000000ff R14: ffffffff81624400 R15: 0000000000000000
[354047.224054] FS: 00007f7b66cd2740(0000) GS:ffff8802ffc40000(0000) knlGS:0000000000000000
[354047.224054] CS: e033 DS: 0000 ES: 0000 CR0: 000000008005003b
[354047.224054] CR2: ffffffffff5fb310 CR3: 00000002f06c2000 CR4: 0000000000002660
[354047.224054] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[354047.224054] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[354047.224054] Process apache2 (pid: 24843, threadinfo ffff8802ef78a000, task ffff8802f1476800)
[354047.224054] Stack:
[354047.224054] ffffffff81027dd3 0000000000000000 0000000000002710 ffffffff81622300
[354047.224054] ffffffff81622400 0000000000000001 ffffffff81024d16 ffff8802ffc4edc0
[354047.224054] ffffffff81096308 0000000000000048 ffffffff81624400 ffffffff81098d57
[354047.224054] Call Trace:
[354047.224054] <IRQ>
[354047.224054] [<ffffffff81027dd3>] ? _flat_send_IPI_mask+0x4b/0x78
[354047.224054] [<ffffffff81024d16>] ? arch_trigger_all_cpu_backtrace+0x4d/0x7b
[354047.224054] [<ffffffff81096308>] ? __rcu_pending+0x21a/0x358
[354047.224054] [<ffffffff81098d57>] ? arch_local_irq_restore+0x7/0x8
[354047.224054] [<ffffffff8106c35c>] ? tick_nohz_handler+0xd0/0xd0
[354047.224054] [<ffffffff81096773>] ? rcu_check_callbacks+0x90/0xcc
[354047.224054] [<ffffffff81052cbe>] ? update_process_times+0x31/0x63
[354047.224054] [<ffffffff8106c3c6>] ? tick_sched_timer+0x6a/0x90
[354047.224054] [<ffffffff81062632>] ? __run_hrtimer+0xac/0x135
[354047.224054] [<ffffffff81062d1c>] ? hrtimer_interrupt+0xd7/0x1b1
[354047.224054] [<ffffffff8103b087>] ? check_preempt_curr+0x52/0x5f
[354047.224054] [<ffffffff810068b9>] ? xen_timer_interrupt+0x28/0xfc
[354047.224054] [<ffffffff81244771>] ? get_cycles+0x5/0x8
[354047.224054] [<ffffffff8124561e>] ? add_interrupt_randomness+0x38/0x155
[354047.224054] [<ffffffff8109124d>] ? handle_irq_event_percpu+0x50/0x17d
[354047.224054] [<ffffffff8121ca3a>] ? disable_pirq+0x2/0x2
[354047.224054] [<ffffffff8121c624>] ? info_for_irq+0x7/0x17
[354047.224054] [<ffffffff81093847>] ? handle_percpu_irq+0x3a/0x4f
[354047.224054] [<ffffffff8121c866>] ? __xen_evtchn_do_upcall+0xd3/0x287
[354047.224054] [<ffffffff8104b780>] ? __local_bh_enable+0x40/0x77
[354047.224054] [<ffffffff813576ac>] ? call_softirq+0x1c/0x30
[354047.224054] [<ffffffff81095239>] ? arch_local_irq_save+0x11/0x15
[354047.224054] [<ffffffff8121dd98>] ? xen_evtchn_do_upcall+0x22/0x32
[354047.224054] [<ffffffff813576fe>] ? xen_do_hypervisor_callback+0x1e/0x30
[354047.224054] <EOI>
[354047.224054] [<ffffffff810be219>] ? add_page_to_lru_list+0x64/0x64
[354047.224054] [<ffffffff8100122a>] ? hypercall_page+0x22a/0x1000
[354047.224054] [<ffffffff8100122a>] ? hypercall_page+0x22a/0x1000
[354047.224054] [<ffffffff81006790>] ? xen_force_evtchn_callback+0x9/0xa
[354047.224054] [<ffffffff81006d22>] ? check_events+0x12/0x20
[354047.224054] [<ffffffff81006d0f>] ? xen_restore_fl_direct_reloc+0x4/0x4
[354047.224054] [<ffffffff81071153>] ? arch_local_irq_restore+0x7/0x8
[354047.224054] [<ffffffff8135049f>] ? _raw_spin_unlock_irqrestore+0xe/0xf
[354047.224054] [<ffffffff810be97d>] ? pagevec_lru_move_fn+0x8f/0xb5
[354047.224054] [<ffffffff810beb8a>] ? __lru_cache_add+0x4a/0x51
[354047.224054] [<ffffffff810d1537>] ? handle_pte_fault+0x224/0x79f
[354047.224054] [<ffffffff810ceacb>] ? pmd_val+0x7/0x8
[354047.224054] [<ffffffff810ceb49>] ? pte_offset_kernel+0x16/0x35
[354047.224054] [<ffffffff813533ee>] ? do_page_fault+0x320/0x345
[354047.224054] [<ffffffff81003223>] ? xen_end_context_switch+0xe/0x1c
[354047.224054] [<ffffffff81003ba5>] ? xen_mc_issue.constprop.23+0x31/0x49
[354047.224054] [<ffffffff8100d750>] ? __switch_to+0x1e5/0x258
[354047.224054] [<ffffffff81035bd7>] ? arch_local_irq_enable+0x7/0x8
[354047.224054] [<ffffffff81039a92>] ? finish_task_switch+0x4e/0xb9
[354047.224054] [<ffffffff8134f0e9>] ? __schedule+0x5f9/0x610
[354047.224054] [<ffffffff813509f5>] ? page_fault+0x25/0x30
[354047.224054] Code: 00 74 18 48 8d 74 24 0c bf 1b 00 00 00 e8 ab fb ff ff f6 c4 04 0f 95 c0 0f b6 c0 48 83 c4 10 c3 90 ff 14 25 d8 57 61 81 c3 89 ff <89> b7 00 b0 5f ff c3 89 ff 8b 87 00 b0 5f ff c3 48 8b 07 25 ff
[354047.224054] RIP [<ffffffff81027b9a>] native_apic_mem_write+0x2/0x9
[354047.224054] RSP <ffff8802ffc43c90>
[354047.224054] CR2: ffffffffff5fb310
[354047.224054] ---[ end trace 94d691dcc7253fa7 ]---
[354047.224054] Kernel panic - not syncing: Fatal exception in interrupt
[354047.224054] Pid: 24843, comm: apache2 Tainted: G D 3.2.0-4-amd64 #1 Debian 3.2.60-1+deb7u3
[354047.224054] Call Trace:
[354047.224054] <IRQ> [<ffffffff81349af6>] ? panic+0x95/0x1a2
[354047.224054] [<ffffffff8107115b>] ? arch_local_irq_disable+0x7/0x8
[354047.224054] [<ffffffff8135049f>] ? _raw_spin_unlock_irqrestore+0xe/0xf
[354047.224054] [<ffffffff81351286>] ? oops_end+0xa9/0xb6
[354047.224054] [<ffffffff8134943f>] ? no_context+0x1ff/0x20e
[354047.224054] [<ffffffff81348ccd>] ? pmd_val+0x7/0x8
[354047.224054] [<ffffffff81348d0c>] ? pte_offset_kernel+0x16/0x35
[354047.224054] [<ffffffff81353284>] ? do_page_fault+0x1b6/0x345
[354047.224054] [<ffffffff811b2c7f>] ? vsnprintf+0x7c/0x427
[354047.224054] [<ffffffff811b2c7f>] ? vsnprintf+0x7c/0x427
[354047.224054] [<ffffffff8102bb7c>] ? pvclock_clocksource_read+0x42/0xb2
[354047.224054] [<ffffffff811b3070>] ? sprintf+0x46/0x4b
[354047.224054] [<ffffffff8107115b>] ? arch_local_irq_disable+0x7/0x8
[354047.224054] [<ffffffff8107116d>] ? arch_local_irq_save+0x11/0x17
[354047.224054] [<ffffffff81071153>] ? arch_local_irq_restore+0x7/0x8
[354047.224054] [<ffffffff8135049f>] ? _raw_spin_unlock_irqrestore+0xe/0xf
[354047.224054] [<ffffffff810637e0>] ? down_trylock+0x20/0x29
[354047.224054] [<ffffffff813509f5>] ? page_fault+0x25/0x30
[354047.224054] [<ffffffff81027b9a>] ? native_apic_mem_write+0x2/0x9
[354047.224054] [<ffffffff81027dd3>] ? _flat_send_IPI_mask+0x4b/0x78
[354047.224054] [<ffffffff81024d16>] ? arch_trigger_all_cpu_backtrace+0x4d/0x7b
[354047.224054] [<ffffffff81096308>] ? __rcu_pending+0x21a/0x358
[354047.224054] [<ffffffff81098d57>] ? arch_local_irq_restore+0x7/0x8
[354047.224054] [<ffffffff8106c35c>] ? tick_nohz_handler+0xd0/0xd0
[354047.224054] [<ffffffff81096773>] ? rcu_check_callbacks+0x90/0xcc
[354047.224054] [<ffffffff81052cbe>] ? update_process_times+0x31/0x63
[354047.224054] [<ffffffff8106c3c6>] ? tick_sched_timer+0x6a/0x90
[354047.224054] [<ffffffff81062632>] ? __run_hrtimer+0xac/0x135
[354047.224054] [<ffffffff81062d1c>] ? hrtimer_interrupt+0xd7/0x1b1
[354047.224054] [<ffffffff8103b087>] ? check_preempt_curr+0x52/0x5f
[354047.224054] [<ffffffff810068b9>] ? xen_timer_interrupt+0x28/0xfc
[354047.224054] [<ffffffff81244771>] ? get_cycles+0x5/0x8
[354047.224054] [<ffffffff8124561e>] ? add_interrupt_randomness+0x38/0x155
[354047.224054] [<ffffffff8109124d>] ? handle_irq_event_percpu+0x50/0x17d
[354047.224054] [<ffffffff8121ca3a>] ? disable_pirq+0x2/0x2
[354047.224054] [<ffffffff8121c624>] ? info_for_irq+0x7/0x17
[354047.224054] [<ffffffff81093847>] ? handle_percpu_irq+0x3a/0x4f
[354047.224054] [<ffffffff8121c866>] ? __xen_evtchn_do_upcall+0xd3/0x287
[354047.224054] [<ffffffff8104b780>] ? __local_bh_enable+0x40/0x77
[354047.224054] [<ffffffff813576ac>] ? call_softirq+0x1c/0x30
[354047.224054] [<ffffffff81095239>] ? arch_local_irq_save+0x11/0x15
[354047.224054] [<ffffffff8121dd98>] ? xen_evtchn_do_upcall+0x22/0x32
[354047.224054] [<ffffffff813576fe>] ? xen_do_hypervisor_callback+0x1e/0x30
[354047.224054] <EOI> [<ffffffff810be219>] ? add_page_to_lru_list+0x64/0x64
[354047.224054] [<ffffffff8100122a>] ? hypercall_page+0x22a/0x1000
[354047.224054] [<ffffffff8100122a>] ? hypercall_page+0x22a/0x1000
[354047.224054] [<ffffffff81006790>] ? xen_force_evtchn_callback+0x9/0xa
[354047.224054] [<ffffffff81006d22>] ? check_events+0x12/0x20
[354047.224054] [<ffffffff81006d0f>] ? xen_restore_fl_direct_reloc+0x4/0x4
[354047.224054] [<ffffffff81071153>] ? arch_local_irq_restore+0x7/0x8
[354047.224054] [<ffffffff8135049f>] ? _raw_spin_unlock_irqrestore+0xe/0xf
[354047.224054] [<ffffffff810be97d>] ? pagevec_lru_move_fn+0x8f/0xb5
[354047.224054] [<ffffffff810beb8a>] ? __lru_cache_add+0x4a/0x51
[354047.224054] [<ffffffff810d1537>] ? handle_pte_fault+0x224/0x79f
[354047.224054] [<ffffffff810ceacb>] ? pmd_val+0x7/0x8
[354047.224054] [<ffffffff810ceb49>] ? pte_offset_kernel+0x16/0x35
[354047.224054] [<ffffffff813533ee>] ? do_page_fault+0x320/0x345
[354047.224054] [<ffffffff81003223>] ? xen_end_context_switch+0xe/0x1c
[354047.224054] [<ffffffff81003ba5>] ? xen_mc_issue.constprop.23+0x31/0x49
[354047.224054] [<ffffffff8100d750>] ? __switch_to+0x1e5/0x258
[354047.224054] [<ffffffff81035bd7>] ? arch_local_irq_enable+0x7/0x8
[354047.224054] [<ffffffff81039a92>] ? finish_task_switch+0x4e/0xb9
[354047.224054] [<ffffffff8134f0e9>] ? __schedule+0x5f9/0x610
[354047.224054] [<ffffffff813509f5>] ? page_fault+0x25/0x30

Cheers,
 jonas
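For anyone comparing several such console logs, the soft-lockup reports can be pulled out with a short script. The following is only a minimal sketch and not part of the original report: it assumes the log has been saved as plain text, and the names `soft_lockups` and `lockups_per_cpu` are hypothetical helpers made up for illustration. The pattern tolerates arbitrary whitespace between fields, so it also works on logs whose lines were re-wrapped by a mail client.

```python
import re
from collections import Counter

# Matches reports such as:
#   [354008.100835] BUG: soft lockup - CPU#5 stuck for 22s! [apache2:24848]
# \s+ between fields tolerates logs that were re-wrapped in transit.
SOFT_LOCKUP = re.compile(
    r"\[\s*(?P<ts>\d+\.\d+)\]\s+BUG: soft lockup - CPU#(?P<cpu>\d+)\s+"
    r"stuck for (?P<secs>\d+)s!\s+\[(?P<comm>[^:\]]+):(?P<pid>\d+)\]"
)


def soft_lockups(log_text):
    """Return one dict per soft-lockup report found in log_text."""
    return [
        {
            "ts": float(m["ts"]),      # kernel timestamp in seconds since boot
            "cpu": int(m["cpu"]),      # CPU that was reported as stuck
            "secs": int(m["secs"]),    # how long the watchdog saw it stuck
            "comm": m["comm"],         # command name of the running task
            "pid": int(m["pid"]),
        }
        for m in SOFT_LOCKUP.finditer(log_text)
    ]


def lockups_per_cpu(events):
    """Count soft-lockup reports per CPU number."""
    return Counter(e["cpu"] for e in events)


# Two report lines copied from the log above, the second one re-wrapped.
sample = (
    "[354008.100835] BUG: soft lockup - CPU#5 stuck for 22s! [apache2:24848]\n"
    "[354044.085361] BUG: soft lockup - CPU#3 stuck for 23s!\n"
    "[apache2:20278]\n"
)

for event in soft_lockups(sample):
    print(event["ts"], event["cpu"], event["comm"], event["pid"])
```

Feeding the full console log to `soft_lockups` instead of `sample` gives a compact list of which CPU and which process were involved in each report, which makes it easier to compare runs on the crashing and the non-crashing host.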
Jonas Meurer
2014-Nov-05 16:56 UTC
[Pkg-xen-devel] kernel crashes after soft lockups in xen domU
And some more information ...

On 2014-10-13 12:04, Jonas Meurer wrote:
> On 2014-08-19 12:26, Jonas Meurer wrote:
>> I encounter kernel crashes on an up-to-date Debian/Wheezy Xen domU with the
>> stock kernel. The dom0 runs the same linux kernel and xen/4.1.4-3+deb7u1.
>
> the bug is still reproducible with the latest kernel and Xen packages from
> Debian Wheezy. Unfortunately it seems like a corner case, somehow related
> to the hardware setup. I'm unable to reproduce the crash on another Xen
> system with same kernel and Xen versions but different CPU and motherboard.
> The VM runs in production mode on the second Xen system since several weeks
> without one single crash.
>
> I've got a test system now where I'm able to reproduce the bug by putting the
> VM (a webserver) under heavy load with the help of siege and pylot. The VM
> crashes every time I put the webserver under heavy load, every time with the
> same backtrace.

In the meantime I replaced the full system (all hardware components except the hard disks) with a new one, and the bug is still reproducible.

I'll try to describe the setup: two Xen virtualization servers (xen1 and xen2) mirror one block device with DRBD in a primary/secondary setup. The DRBD device holds LVM with the LVs for one single virtual machine (the webserver). This webserver runs an Apache2 daemon.

The first virtualization server (xen1, the one that's live) runs rock stable, and so does the webserver VM on top of it. No crashes, no exploding load.

The second virtualization server (xen2) runs well as long as it's only secondary (i.e. no virtual machine started). As soon as I switch the DRBD primary to xen2 and start the webserver VM there, the load on the webserver is unusually high, it becomes laggy, and after some hours (sometimes minutes) it crashes as described in earlier mails.

I have now created a test VM on xen2 that is not on top of the DRBD device, in order to rule out DRBD as the cause. As already written, if I fire some HTTP requests against the Apache daemon on the test VM, the VM crashes the same way.

I first replaced the memory modules and the CPU with similar ones, without any effect. Now I have replaced the whole hardware (except the hard disks) with another system, and I still get the same crashes.

So the question is: why does the VM run stably on xen1 while it crashes all the time on xen2? Comparing xen1 and xen2, the only real differences are the mainboard (Supermicro X8 on xen1; Supermicro X9 on xen2) and the CPU (Xeon L5639 on xen1; E5-2609 on xen2).

As a next step I'll put the hard disks into another X8/Xeon L5639 server system and try to reproduce the crashes there. My bet is that this system will not crash anymore. In other words, I guess that this very bug is only triggered by the X9 + E5-2609 combination.

> Can I do anything additional to help debugging the bug? Shall I report it
> to Xen upstream or send it to lkml?

Still the same question: shall I send the bug report upstream? Unfortunately, nobody from the Debian Linux kernel and/or Xen teams seems to care :-/

Cheers,
 jonas

> Regarding the hardware: the RAM was checked with memtest86+ and no errors
> found and the CPU has been replaced by a new one (same model). Still, the
> VM crash is reproducible.
>
> The hardware on the crashing system is:
> CPU: Intel Xeon E5-2609v2/4x2,5GHz
> Motherboard: Supermicro X9SRI-F
>
> For information, the hardware on non-crashing system is:
> CPU: Intel XEON L5639/6x2,13 GHz
> Motherboard: Supermicro X8STi
>
>> It seems like the crashes are related to a RT process, even though no
>> sched_fifo/rr processes are started on this system intentionally. Also, the
>> CPU usage is low all the time, no peaks at all. But the kernel reports:
>>
>> kernel: [39101.461586] sched: RT throttling activated
>
> This is still valid, even though I no longer think that it's related to a
> RT process at all, as no sched_fifo/rr processes are started.
[apache2:24848] > [354036.127539] Modules linked in: evdev coretemp crc32c_intel > ghash_clmulni_intel snd_pcm snd_page_alloc aesni_intel snd_timer > aes_x86_64 snd aes_generic soundcore cryptd pcspkr ext4 crc16 jbd2 > mbcache dm_mod xen_netfront xen_blkfront > [354036.127564] CPU 5 > [354036.127567] Modules linked in: evdev coretemp crc32c_intel > ghash_clmulni_intel snd_pcm snd_page_alloc aesni_intel snd_timer > aes_x86_64 snd aes_generic soundcore cryptd pcspkr ext4 crc16 jbd2 > mbcache dm_mod xen_netfront xen_blkfront > [354036.127587] > [354036.127590] Pid: 24848, comm: apache2 Not tainted 3.2.0-4-amd64 #1 > Debian 3.2.60-1+deb7u3 > [354036.127597] RIP: e030:[<ffffffff8100122a>] [<ffffffff8100122a>] > hypercall_page+0x22a/0x1000 > [354036.127606] RSP: e02b:ffff8802f0b41c00 EFLAGS: 00000246 > [354036.127610] RAX: 0000000000040001 RBX: ffff8802ffff0200 RCX: > ffffffff8100122a > [354036.127615] RDX: ffffffffffffffc8 RSI: 0000000000000000 RDI: > 0000000000000000 > [354036.127619] RBP: 000000000000000e R08: 0000000000000200 R09: > dead000000100100 > [354036.127623] R10: dead000000200200 R11: 0000000000000246 R12: > ffff880265e01cc0 > [354036.127627] R13: ffff8802ffffbe00 R14: ffffea00099748c0 R15: > 000000000000000d > [354036.127637] FS: 00007f7b66cd2740(0000) GS:ffff8802ffd40000(0000) > knlGS:0000000000000000 > [354036.127642] CS: e033 DS: 0000 ES: 0000 CR0: 000000008005003b > [354036.127646] CR2: 00007f7b68c2b000 CR3: 00000002855b7000 CR4: > 0000000000002660 > [354036.127650] DR0: 0000000000000000 DR1: 0000000000000000 DR2: > 0000000000000000 > [354036.127655] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: > 0000000000000400 > [354036.127660] Process apache2 (pid: 24848, threadinfo > ffff8802f0b40000, task ffff8802ef49c8c0) > [354036.127665] Stack: > [354036.127667] 0000000000000000 0000000000000200 ffffffff81006790 > ffffffff81006d22 > [354036.127674] 0000000000000000 dead000000200200 dead000000100100 > 0000000000000200 > [354036.127680] ffff8802ffff0200 
ffff8802ffff0200 ffffffffffffffc8 > 0000000000000200 > [354036.127687] Call Trace: > [354036.127693] [<ffffffff81006790>] ? > xen_force_evtchn_callback+0x9/0xa > [354036.127699] [<ffffffff81006d22>] ? check_events+0x12/0x20 > [354036.127704] [<ffffffff81006d0f>] ? > xen_restore_fl_direct_reloc+0x4/0x4 > [354036.127710] [<ffffffff81071153>] ? arch_local_irq_restore+0x7/0x8 > [354036.127717] [<ffffffff8135049f>] ? > _raw_spin_unlock_irqrestore+0xe/0xf > [354036.127723] [<ffffffff810be895>] ? release_pages+0xf4/0x14d > [354036.127730] [<ffffffff810de78b>] ? > free_pages_and_swap_cache+0x48/0x60 > [354036.127736] [<ffffffff810cf527>] ? tlb_flush_mmu+0x37/0x50 > [354036.127741] [<ffffffff810cf54c>] ? tlb_finish_mmu+0xc/0x31 > [354036.127746] [<ffffffff810d5e79>] ? exit_mmap+0xc4/0xe9 > [354036.127752] [<ffffffff81044b82>] ? mmput+0x56/0xf8 > [354036.127756] [<ffffffff81049d07>] ? exit_mm+0x117/0x122 > [354036.127760] [<ffffffff8107115b>] ? arch_local_irq_disable+0x7/0x8 > [354036.127765] [<ffffffff81350487>] ? _raw_spin_lock_irq+0xa/0x14 > [354036.127769] [<ffffffff8107115b>] ? arch_local_irq_disable+0x7/0x8 > [354036.127773] [<ffffffff81049f57>] ? do_exit+0x245/0x713 > [354036.127779] [<ffffffff81039a92>] ? finish_task_switch+0x4e/0xb9 > [354036.127784] [<ffffffff8134f0e9>] ? __schedule+0x5f9/0x610 > [354036.127788] [<ffffffff810d5c58>] ? do_munmap+0x2da/0x2f3 > [354036.127793] [<ffffffff8104a6a5>] ? do_group_exit+0x74/0x9e > [354036.127797] [<ffffffff8104a6de>] ? sys_exit_group+0xf/0xf > [354036.127802] [<ffffffff81355452>] ? system_call_fastpath+0x16/0x1b > [354036.127805] Code: cc 51 41 53 b8 10 00 00 00 0f 05 41 5b 59 c3 cc > cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc 51 41 53 b8 11 00 > 00 00 0f 05 <41> 5b 59 c3 cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc > cc cc > [354036.127846] Call Trace: > [354036.127849] [<ffffffff81006790>] ? > xen_force_evtchn_callback+0x9/0xa > [354036.127854] [<ffffffff81006d22>] ? 
check_events+0x12/0x20 > [354036.127859] [<ffffffff81006d0f>] ? > xen_restore_fl_direct_reloc+0x4/0x4 > [354036.127864] [<ffffffff81071153>] ? arch_local_irq_restore+0x7/0x8 > [354036.127868] [<ffffffff8135049f>] ? > _raw_spin_unlock_irqrestore+0xe/0xf > [354036.127873] [<ffffffff810be895>] ? release_pages+0xf4/0x14d > [354036.127879] [<ffffffff810de78b>] ? > free_pages_and_swap_cache+0x48/0x60 > [354036.127883] [<ffffffff810cf527>] ? tlb_flush_mmu+0x37/0x50 > [354036.127888] [<ffffffff810cf54c>] ? tlb_finish_mmu+0xc/0x31 > [354036.127893] [<ffffffff810d5e79>] ? exit_mmap+0xc4/0xe9 > [354036.127897] [<ffffffff81044b82>] ? mmput+0x56/0xf8 > [354036.127901] [<ffffffff81049d07>] ? exit_mm+0x117/0x122 > [354036.127905] [<ffffffff8107115b>] ? arch_local_irq_disable+0x7/0x8 > [354036.127909] [<ffffffff81350487>] ? _raw_spin_lock_irq+0xa/0x14 > [354036.127914] [<ffffffff8107115b>] ? arch_local_irq_disable+0x7/0x8 > [354036.127918] [<ffffffff81049f57>] ? do_exit+0x245/0x713 > [354036.127922] [<ffffffff81039a92>] ? finish_task_switch+0x4e/0xb9 > [354036.127927] [<ffffffff8134f0e9>] ? __schedule+0x5f9/0x610 > [354036.127931] [<ffffffff810d5c58>] ? do_munmap+0x2da/0x2f3 > [354036.127935] [<ffffffff8104a6a5>] ? do_group_exit+0x74/0x9e > [354036.128001] [<ffffffff8104a6de>] ? sys_exit_group+0xf/0xf > [354036.128007] [<ffffffff81355452>] ? system_call_fastpath+0x16/0x1b > [354044.085361] BUG: soft lockup - CPU#3 stuck for 23s! 
[apache2:20278] > [354044.085370] Modules linked in: evdev coretemp crc32c_intel > ghash_clmulni_intel snd_pcm snd_page_alloc aesni_intel snd_timer > aes_x86_64 snd aes_generic soundcore cryptd pcspkr ext4 crc16 jbd2 > mbcache dm_mod xen_netfront xen_blkfront > [354044.085395] CPU 3 > [354044.085397] Modules linked in: evdev coretemp crc32c_intel > ghash_clmulni_intel snd_pcm snd_page_alloc aesni_intel snd_timer > aes_x86_64 snd aes_generic soundcore cryptd pcspkr ext4 crc16 jbd2 > mbcache dm_mod xen_netfront xen_blkfront > [354044.085417] > [354044.085420] Pid: 20278, comm: apache2 Not tainted 3.2.0-4-amd64 #1 > Debian 3.2.60-1+deb7u3 > [354044.085426] RIP: e030:[<ffffffff8100122a>] [<ffffffff8100122a>] > hypercall_page+0x22a/0x1000 > [354044.085436] RSP: e02b:ffff8802ef743c00 EFLAGS: 00000246 > [354044.085440] RAX: 0000000000040001 RBX: ffff8802ffff0200 RCX: > ffffffff8100122a > [354044.085444] RDX: ffffffffffffffc8 RSI: 0000000000000000 RDI: > 0000000000000000 > [354044.085449] RBP: 000000000000000e R08: 0000000000000200 R09: > dead000000100100 > [354044.085453] R10: dead000000200200 R11: 0000000000000246 R12: > ffff880265c53320 > [354044.085457] R13: ffff8802ffffbe00 R14: ffffea0009cec350 R15: > 000000000000000d > [354044.085466] FS: 00007f7b66cd2740(0000) GS:ffff8802ffcc0000(0000) > knlGS:0000000000000000 > [354044.085471] CS: e033 DS: 0000 ES: 0000 CR0: 000000008005003b > [354044.085475] CR2: 00007f7b6906cff0 CR3: 00000002efbfb000 CR4: > 0000000000002660 > [354044.085479] DR0: 0000000000000000 DR1: 0000000000000000 DR2: > 0000000000000000 > [354044.085484] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: > 0000000000000400 > [354044.085489] Process apache2 (pid: 20278, threadinfo > ffff8802ef742000, task ffff8802efa5b880) > [354044.085493] Stack: > [354044.085495] 0000000000000000 0000000000000200 ffffffff81006790 > ffffffff81006d22 > [354044.085502] 0000000000000000 dead000000200200 dead000000100100 > 0000000000000200 > [354044.085508] ffff8802ffff0200 
ffff8802ffff0200 ffffffffffffffc8 > 0000000000000200 > [354044.085746] Call Trace: > [354044.085753] [<ffffffff81006790>] ? > xen_force_evtchn_callback+0x9/0xa > [354044.085758] [<ffffffff81006d22>] ? check_events+0x12/0x20 > [354044.085763] [<ffffffff81006d0f>] ? > xen_restore_fl_direct_reloc+0x4/0x4 > [354044.085769] [<ffffffff81071153>] ? arch_local_irq_restore+0x7/0x8 > [354044.085776] [<ffffffff8135049f>] ? > _raw_spin_unlock_irqrestore+0xe/0xf > [354044.085783] [<ffffffff810be895>] ? release_pages+0xf4/0x14d > [354044.085790] [<ffffffff810de78b>] ? > free_pages_and_swap_cache+0x48/0x60 > [354044.085796] [<ffffffff810cf527>] ? tlb_flush_mmu+0x37/0x50 > [354044.085800] [<ffffffff810cf54c>] ? tlb_finish_mmu+0xc/0x31 > [354044.085806] [<ffffffff810d5e79>] ? exit_mmap+0xc4/0xe9 > [354044.085811] [<ffffffff81044b82>] ? mmput+0x56/0xf8 > [354044.085815] [<ffffffff81049d07>] ? exit_mm+0x117/0x122 > [354044.085820] [<ffffffff8107115b>] ? arch_local_irq_disable+0x7/0x8 > [354044.085824] [<ffffffff81350487>] ? _raw_spin_lock_irq+0xa/0x14 > [354044.085829] [<ffffffff8107115b>] ? arch_local_irq_disable+0x7/0x8 > [354044.085833] [<ffffffff81049f57>] ? do_exit+0x245/0x713 > [354044.085838] [<ffffffff81039a92>] ? finish_task_switch+0x4e/0xb9 > [354044.085843] [<ffffffff8134f0e9>] ? __schedule+0x5f9/0x610 > [354044.085848] [<ffffffff810d5c58>] ? do_munmap+0x2da/0x2f3 > [354044.085852] [<ffffffff8104a6a5>] ? do_group_exit+0x74/0x9e > [354044.085856] [<ffffffff8104a6de>] ? sys_exit_group+0xf/0xf > [354044.085861] [<ffffffff81355452>] ? system_call_fastpath+0x16/0x1b > [354044.085865] Code: cc 51 41 53 b8 10 00 00 00 0f 05 41 5b 59 c3 cc > cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc 51 41 53 b8 11 00 > 00 00 0f 05 <41> 5b 59 c3 cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc > cc cc > [354044.085905] Call Trace: > [354044.085909] [<ffffffff81006790>] ? > xen_force_evtchn_callback+0x9/0xa > [354044.085914] [<ffffffff81006d22>] ? 
check_events+0x12/0x20 > [354044.085919] [<ffffffff81006d0f>] ? > xen_restore_fl_direct_reloc+0x4/0x4 > [354044.085924] [<ffffffff81071153>] ? arch_local_irq_restore+0x7/0x8 > [354044.085928] [<ffffffff8135049f>] ? > _raw_spin_unlock_irqrestore+0xe/0xf > [354044.085933] [<ffffffff810be895>] ? release_pages+0xf4/0x14d > [354044.085939] [<ffffffff810de78b>] ? > free_pages_and_swap_cache+0x48/0x60 > [354044.085944] [<ffffffff810cf527>] ? tlb_flush_mmu+0x37/0x50 > [354044.085948] [<ffffffff810cf54c>] ? tlb_finish_mmu+0xc/0x31 > [354044.085953] [<ffffffff810d5e79>] ? exit_mmap+0xc4/0xe9 > [354044.085957] [<ffffffff81044b82>] ? mmput+0x56/0xf8 > [354044.085961] [<ffffffff81049d07>] ? exit_mm+0x117/0x122 > [354044.085965] [<ffffffff8107115b>] ? arch_local_irq_disable+0x7/0x8 > [354044.085970] [<ffffffff81350487>] ? _raw_spin_lock_irq+0xa/0x14 > [354044.085974] [<ffffffff8107115b>] ? arch_local_irq_disable+0x7/0x8 > [354044.085978] [<ffffffff81049f57>] ? do_exit+0x245/0x713 > [354044.085983] [<ffffffff81039a92>] ? finish_task_switch+0x4e/0xb9 > [354044.085987] [<ffffffff8134f0e9>] ? __schedule+0x5f9/0x610 > [354044.085992] [<ffffffff810d5c58>] ? do_munmap+0x2da/0x2f3 > [354044.085996] [<ffffffff8104a6a5>] ? do_group_exit+0x74/0x9e > [354044.086000] [<ffffffff8104a6de>] ? sys_exit_group+0xf/0xf > [354044.086004] [<ffffffff81355452>] ? 
system_call_fastpath+0x16/0x1b > [354047.219103] INFO: rcu_sched detected stall on CPU 5 (t=15039 > jiffies) > [354047.224054] INFO: rcu_sched detected stalls on CPUs/tasks: { 5} > (detected by 1, t=15039 jiffies) > [354047.224054] sending NMI to all CPUs: > [354047.224054] BUG: unable to handle kernel paging request at > ffffffffff5fb310 > [354047.224054] IP: [<ffffffff81027b9a>] native_apic_mem_write+0x2/0x9 > [354047.224054] PGD 1607067 PUD 1608067 PMD 172d067 PTE 0 > [354047.224054] Oops: 0002 [#1] SMP > [354047.224054] CPU 1 > [354047.224054] Modules linked in: evdev coretemp crc32c_intel > ghash_clmulni_intel snd_pcm snd_page_alloc aesni_intel snd_timer > aes_x86_64 snd aes_generic soundcore cryptd pcspkr ext4 crc16 jbd2 > mbcache dm_mod xen_netfront xen_blkfront > [354047.224054] > [354047.224054] Pid: 24843, comm: apache2 Not tainted 3.2.0-4-amd64 #1 > Debian 3.2.60-1+deb7u3 > [354047.224054] RIP: e030:[<ffffffff81027b9a>] [<ffffffff81027b9a>] > native_apic_mem_write+0x2/0x9 > [354047.224054] RSP: e02b:ffff8802ffc43c90 EFLAGS: 00010086 > [354047.224054] RAX: 0000000000000000 RBX: ffffffff816800e0 RCX: > 00000000000007c2 > [354047.224054] RDX: 0000000000000000 RSI: 00000000ff000000 RDI: > 0000000000000310 > [354047.224054] RBP: 0000000000000002 R08: 0000000000000002 R09: > 0000000000000000 > [354047.224054] R10: 0000000000000000 R11: ffff8802f2863d40 R12: > 0000000000000800 > [354047.224054] R13: 00000000000000ff R14: ffffffff81624400 R15: > 0000000000000000 > [354047.224054] FS: 00007f7b66cd2740(0000) GS:ffff8802ffc40000(0000) > knlGS:0000000000000000 > [354047.224054] CS: e033 DS: 0000 ES: 0000 CR0: 000000008005003b > [354047.224054] CR2: ffffffffff5fb310 CR3: 00000002f06c2000 CR4: > 0000000000002660 > [354047.224054] DR0: 0000000000000000 DR1: 0000000000000000 DR2: > 0000000000000000 > [354047.224054] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: > 0000000000000400 > [354047.224054] Process apache2 (pid: 24843, threadinfo > ffff8802ef78a000, task 
ffff8802f1476800) > [354047.224054] Stack: > [354047.224054] ffffffff81027dd3 0000000000000000 0000000000002710 > ffffffff81622300 > [354047.224054] ffffffff81622400 0000000000000001 ffffffff81024d16 > ffff8802ffc4edc0 > [354047.224054] ffffffff81096308 0000000000000048 ffffffff81624400 > ffffffff81098d57 > [354047.224054] Call Trace: > [354047.224054] <IRQ> > [354047.224054] [<ffffffff81027dd3>] ? _flat_send_IPI_mask+0x4b/0x78 > [354047.224054] [<ffffffff81024d16>] ? > arch_trigger_all_cpu_backtrace+0x4d/0x7b > [354047.224054] [<ffffffff81096308>] ? __rcu_pending+0x21a/0x358 > [354047.224054] [<ffffffff81098d57>] ? arch_local_irq_restore+0x7/0x8 > [354047.224054] [<ffffffff8106c35c>] ? tick_nohz_handler+0xd0/0xd0 > [354047.224054] [<ffffffff81096773>] ? rcu_check_callbacks+0x90/0xcc > [354047.224054] [<ffffffff81052cbe>] ? update_process_times+0x31/0x63 > [354047.224054] [<ffffffff8106c3c6>] ? tick_sched_timer+0x6a/0x90 > [354047.224054] [<ffffffff81062632>] ? __run_hrtimer+0xac/0x135 > [354047.224054] [<ffffffff81062d1c>] ? hrtimer_interrupt+0xd7/0x1b1 > [354047.224054] [<ffffffff8103b087>] ? check_preempt_curr+0x52/0x5f > [354047.224054] [<ffffffff810068b9>] ? xen_timer_interrupt+0x28/0xfc > [354047.224054] [<ffffffff81244771>] ? get_cycles+0x5/0x8 > [354047.224054] [<ffffffff8124561e>] ? > add_interrupt_randomness+0x38/0x155 > [354047.224054] [<ffffffff8109124d>] ? > handle_irq_event_percpu+0x50/0x17d > [354047.224054] [<ffffffff8121ca3a>] ? disable_pirq+0x2/0x2 > [354047.224054] [<ffffffff8121c624>] ? info_for_irq+0x7/0x17 > [354047.224054] [<ffffffff81093847>] ? handle_percpu_irq+0x3a/0x4f > [354047.224054] [<ffffffff8121c866>] ? > __xen_evtchn_do_upcall+0xd3/0x287 > [354047.224054] [<ffffffff8104b780>] ? __local_bh_enable+0x40/0x77 > [354047.224054] [<ffffffff813576ac>] ? call_softirq+0x1c/0x30 > [354047.224054] [<ffffffff81095239>] ? arch_local_irq_save+0x11/0x15 > [354047.224054] [<ffffffff8121dd98>] ? 
xen_evtchn_do_upcall+0x22/0x32 > [354047.224054] [<ffffffff813576fe>] ? > xen_do_hypervisor_callback+0x1e/0x30 > [354047.224054] <EOI> > [354047.224054] [<ffffffff810be219>] ? add_page_to_lru_list+0x64/0x64 > [354047.224054] [<ffffffff8100122a>] ? hypercall_page+0x22a/0x1000 > [354047.224054] [<ffffffff8100122a>] ? hypercall_page+0x22a/0x1000 > [354047.224054] [<ffffffff81006790>] ? > xen_force_evtchn_callback+0x9/0xa > [354047.224054] [<ffffffff81006d22>] ? check_events+0x12/0x20 > [354047.224054] [<ffffffff81006d0f>] ? > xen_restore_fl_direct_reloc+0x4/0x4 > [354047.224054] [<ffffffff81071153>] ? arch_local_irq_restore+0x7/0x8 > [354047.224054] [<ffffffff8135049f>] ? > _raw_spin_unlock_irqrestore+0xe/0xf > [354047.224054] [<ffffffff810be97d>] ? pagevec_lru_move_fn+0x8f/0xb5 > [354047.224054] [<ffffffff810beb8a>] ? __lru_cache_add+0x4a/0x51 > [354047.224054] [<ffffffff810d1537>] ? handle_pte_fault+0x224/0x79f > [354047.224054] [<ffffffff810ceacb>] ? pmd_val+0x7/0x8 > [354047.224054] [<ffffffff810ceb49>] ? pte_offset_kernel+0x16/0x35 > [354047.224054] [<ffffffff813533ee>] ? do_page_fault+0x320/0x345 > [354047.224054] [<ffffffff81003223>] ? xen_end_context_switch+0xe/0x1c > [354047.224054] [<ffffffff81003ba5>] ? > xen_mc_issue.constprop.23+0x31/0x49 > [354047.224054] [<ffffffff8100d750>] ? __switch_to+0x1e5/0x258 > [354047.224054] [<ffffffff81035bd7>] ? arch_local_irq_enable+0x7/0x8 > [354047.224054] [<ffffffff81039a92>] ? finish_task_switch+0x4e/0xb9 > [354047.224054] [<ffffffff8134f0e9>] ? __schedule+0x5f9/0x610 > [354047.224054] [<ffffffff813509f5>] ? 
page_fault+0x25/0x30 > [354047.224054] Code: 00 74 18 48 8d 74 24 0c bf 1b 00 00 00 e8 ab fb > ff ff f6 c4 04 0f 95 c0 0f b6 c0 48 83 c4 10 c3 90 ff 14 25 d8 57 61 > 81 c3 89 ff <89> b7 00 b0 5f ff c3 89 ff 8b 87 00 b0 5f ff c3 48 8b 07 > 25 ff > [354047.224054] RIP [<ffffffff81027b9a>] native_apic_mem_write+0x2/0x9 > [354047.224054] RSP <ffff8802ffc43c90> > [354047.224054] CR2: ffffffffff5fb310 > [354047.224054] ---[ end trace 94d691dcc7253fa7 ]--- > [354047.224054] Kernel panic - not syncing: Fatal exception in > interrupt > [354047.224054] Pid: 24843, comm: apache2 Tainted: G D > 3.2.0-4-amd64 #1 Debian 3.2.60-1+deb7u3 > [354047.224054] Call Trace: > [354047.224054] <IRQ> [<ffffffff81349af6>] ? panic+0x95/0x1a2 > [354047.224054] [<ffffffff8107115b>] ? arch_local_irq_disable+0x7/0x8 > [354047.224054] [<ffffffff8135049f>] ? > _raw_spin_unlock_irqrestore+0xe/0xf > [354047.224054] [<ffffffff81351286>] ? oops_end+0xa9/0xb6 > [354047.224054] [<ffffffff8134943f>] ? no_context+0x1ff/0x20e > [354047.224054] [<ffffffff81348ccd>] ? pmd_val+0x7/0x8 > [354047.224054] [<ffffffff81348d0c>] ? pte_offset_kernel+0x16/0x35 > [354047.224054] [<ffffffff81353284>] ? do_page_fault+0x1b6/0x345 > [354047.224054] [<ffffffff811b2c7f>] ? vsnprintf+0x7c/0x427 > [354047.224054] [<ffffffff811b2c7f>] ? vsnprintf+0x7c/0x427 > [354047.224054] [<ffffffff8102bb7c>] ? > pvclock_clocksource_read+0x42/0xb2 > [354047.224054] [<ffffffff811b3070>] ? sprintf+0x46/0x4b > [354047.224054] [<ffffffff8107115b>] ? arch_local_irq_disable+0x7/0x8 > [354047.224054] [<ffffffff8107116d>] ? arch_local_irq_save+0x11/0x17 > [354047.224054] [<ffffffff81071153>] ? arch_local_irq_restore+0x7/0x8 > [354047.224054] [<ffffffff8135049f>] ? > _raw_spin_unlock_irqrestore+0xe/0xf > [354047.224054] [<ffffffff810637e0>] ? down_trylock+0x20/0x29 > [354047.224054] [<ffffffff813509f5>] ? page_fault+0x25/0x30 > [354047.224054] [<ffffffff81027b9a>] ? native_apic_mem_write+0x2/0x9 > [354047.224054] [<ffffffff81027dd3>] ? 
_flat_send_IPI_mask+0x4b/0x78 > [354047.224054] [<ffffffff81024d16>] ? > arch_trigger_all_cpu_backtrace+0x4d/0x7b > [354047.224054] [<ffffffff81096308>] ? __rcu_pending+0x21a/0x358 > [354047.224054] [<ffffffff81098d57>] ? arch_local_irq_restore+0x7/0x8 > [354047.224054] [<ffffffff8106c35c>] ? tick_nohz_handler+0xd0/0xd0 > [354047.224054] [<ffffffff81096773>] ? rcu_check_callbacks+0x90/0xcc > [354047.224054] [<ffffffff81052cbe>] ? update_process_times+0x31/0x63 > [354047.224054] [<ffffffff8106c3c6>] ? tick_sched_timer+0x6a/0x90 > [354047.224054] [<ffffffff81062632>] ? __run_hrtimer+0xac/0x135 > [354047.224054] [<ffffffff81062d1c>] ? hrtimer_interrupt+0xd7/0x1b1 > [354047.224054] [<ffffffff8103b087>] ? check_preempt_curr+0x52/0x5f > [354047.224054] [<ffffffff810068b9>] ? xen_timer_interrupt+0x28/0xfc > [354047.224054] [<ffffffff81244771>] ? get_cycles+0x5/0x8 > [354047.224054] [<ffffffff8124561e>] ? > add_interrupt_randomness+0x38/0x155 > [354047.224054] [<ffffffff8109124d>] ? > handle_irq_event_percpu+0x50/0x17d > [354047.224054] [<ffffffff8121ca3a>] ? disable_pirq+0x2/0x2 > [354047.224054] [<ffffffff8121c624>] ? info_for_irq+0x7/0x17 > [354047.224054] [<ffffffff81093847>] ? handle_percpu_irq+0x3a/0x4f > [354047.224054] [<ffffffff8121c866>] ? > __xen_evtchn_do_upcall+0xd3/0x287 > [354047.224054] [<ffffffff8104b780>] ? __local_bh_enable+0x40/0x77 > [354047.224054] [<ffffffff813576ac>] ? call_softirq+0x1c/0x30 > [354047.224054] [<ffffffff81095239>] ? arch_local_irq_save+0x11/0x15 > [354047.224054] [<ffffffff8121dd98>] ? xen_evtchn_do_upcall+0x22/0x32 > [354047.224054] [<ffffffff813576fe>] ? > xen_do_hypervisor_callback+0x1e/0x30 > [354047.224054] <EOI> [<ffffffff810be219>] ? > add_page_to_lru_list+0x64/0x64 > [354047.224054] [<ffffffff8100122a>] ? hypercall_page+0x22a/0x1000 > [354047.224054] [<ffffffff8100122a>] ? hypercall_page+0x22a/0x1000 > [354047.224054] [<ffffffff81006790>] ? > xen_force_evtchn_callback+0x9/0xa > [354047.224054] [<ffffffff81006d22>] ? 
check_events+0x12/0x20 > [354047.224054] [<ffffffff81006d0f>] ? > xen_restore_fl_direct_reloc+0x4/0x4 > [354047.224054] [<ffffffff81071153>] ? arch_local_irq_restore+0x7/0x8 > [354047.224054] [<ffffffff8135049f>] ? > _raw_spin_unlock_irqrestore+0xe/0xf > [354047.224054] [<ffffffff810be97d>] ? pagevec_lru_move_fn+0x8f/0xb5 > [354047.224054] [<ffffffff810beb8a>] ? __lru_cache_add+0x4a/0x51 > [354047.224054] [<ffffffff810d1537>] ? handle_pte_fault+0x224/0x79f > [354047.224054] [<ffffffff810ceacb>] ? pmd_val+0x7/0x8 > [354047.224054] [<ffffffff810ceb49>] ? pte_offset_kernel+0x16/0x35 > [354047.224054] [<ffffffff813533ee>] ? do_page_fault+0x320/0x345 > [354047.224054] [<ffffffff81003223>] ? xen_end_context_switch+0xe/0x1c > [354047.224054] [<ffffffff81003ba5>] ? > xen_mc_issue.constprop.23+0x31/0x49 > [354047.224054] [<ffffffff8100d750>] ? __switch_to+0x1e5/0x258 > [354047.224054] [<ffffffff81035bd7>] ? arch_local_irq_enable+0x7/0x8 > [354047.224054] [<ffffffff81039a92>] ? finish_task_switch+0x4e/0xb9 > [354047.224054] [<ffffffff8134f0e9>] ? __schedule+0x5f9/0x610 > [354047.224054] [<ffffffff813509f5>] ? page_fault+0x25/0x30 > > Cheers, > jonas
Ben Hutchings
2014-Nov-05 20:40 UTC
[Pkg-xen-devel] kernel crashes after soft lockups in xen domU
On Wed, 2014-11-05 at 17:56 +0100, Jonas Meurer wrote:
[...]
> So the question is: why does the VM run stable on xen1 while it
> crashes all the time on xen2? If I compare xen1 and xen2, the only
> real difference is the mainboard (Supermicro X8 on xen1; Supermicro
> X9 on xen2) and the CPU (Xeon L5639 on xen1; E5-2609 on xen2).
>
> As a next step I'll put the hard disks into another X8/Xeon L5639
> server system and try to reproduce the crashes there. My bet is
> that this system will not crash anymore. In other words, I guess
> that this very bug is only triggered with the X9 + E5-2609
> combination.
>
> > Can I do anything additional to help debugging the bug? Shall I
> > report it to Xen upstream or send it to lkml?
>
> Still the same question. Shall I send the bug report to upstream?
> Unfortunately nobody from Debian Linux kernel and/or Xen team seems
> to care :-/
[...]

Sorry you haven't had a response from us so far. This seems to be fairly
clearly a Linux/Xen interaction and I don't know enough about Xen to
suggest how to debug it.

As it involves a relatively old kernel version, I don't think Linux
upstream developers will want to hear about this unless you can also
reproduce it with a more recent version. Linux 3.16 is available (in
testing and wheezy-backports) if you would like to try that.

I don't know whether the Xen upstream developers will accept a bug
report against this version.

Ben.

-- 
Ben Hutchings
The program is absolutely right; therefore, the computer must be wrong.
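
If the crash is retested under a newer kernel, a compact summary of the soft-lockup reports from each run may make the comparison easier than reading full backtraces. The following is only a minimal sketch: it assumes the guest's kernel log has been saved as plain text, the function names summarize_soft_lockups and count_by_cpu are hypothetical, and the pattern is derived solely from the "BUG: soft lockup" lines quoted in this thread, so other kernel versions may word the message differently.

```python
import re
from collections import Counter

# Pattern taken from lines in this thread such as:
#   [354008.100835] BUG: soft lockup - CPU#5 stuck for 22s! [apache2:24848]
# Assumption: other kernels may format this message differently.
SOFT_LOCKUP = re.compile(
    r"\[\s*(?P<ts>\d+\.\d+)\]\s+BUG: soft lockup - CPU#(?P<cpu>\d+) "
    r"stuck for (?P<secs>\d+)s! \[(?P<comm>[^:\]]+):(?P<pid>\d+)\]"
)


def summarize_soft_lockups(log_text):
    """Return one dict per soft-lockup report found in log_text."""
    events = []
    for match in SOFT_LOCKUP.finditer(log_text):
        events.append(
            {
                "timestamp": float(match.group("ts")),
                "cpu": int(match.group("cpu")),
                "stuck_seconds": int(match.group("secs")),
                "comm": match.group("comm"),
                "pid": int(match.group("pid")),
            }
        )
    return events


def count_by_cpu(events):
    """Count how many soft-lockup reports were logged for each CPU."""
    return Counter(event["cpu"] for event in events)
```

For example, reading a saved log (the file name here is a placeholder) with `events = summarize_soft_lockups(open("dmesg-domU.txt").read())` and then calling `count_by_cpu(events)` shows whether the lockups cluster on particular vCPUs. Because the pattern is searched anywhere in the text, it also works on logs that carry a "> " quote prefix.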