Raphael Bauduin
2014-Mar-26 07:45 UTC
[libvirt-users] host crashes "unable to handle paging request"
Hi,

we have regular crashes of a KVM host with the error "unable to handle paging request".
Can this be due to memory over-commitment even if some memory is still used by the kernel for caches and buffers? (The collectd graph shows no free memory, with 15G used, very little in buffers, and 1G of cache.) There are 32GB of swap, of which only 150MB are used.

I suspect this might be the direction to search to find the cause, but I would be happy to learn from people versed in kernel behaviour who can confirm or reject my hypothesis. Below is the full error.

Thanks!

Raph

Mar 23 14:27:37 sMaster01 kernel: [241450.355339] BUG: unable to handle kernel paging request at ffff8804c001fade
Mar 23 14:27:37 sMaster01 kernel: [241450.355384] IP: [<ffffffff8117e9e9>] bio_check_eod+0x29/0xcd
Mar 23 14:27:37 sMaster01 kernel: [241450.355433] PGD 1002063 PUD 0
Mar 23 14:27:37 sMaster01 kernel: [241450.355464] Oops: 0000 [#1] SMP
Mar 23 14:27:37 sMaster01 kernel: [241450.355496] last sysfs file: /sys/devices/system/cpu/cpu15/topology/thread_siblings
Mar 23 14:27:37 sMaster01 kernel: [241450.355551] CPU 4
Mar 23 14:27:37 sMaster01 kernel: [241450.355577] Modules linked in: ipt_MASQUERADE iptable_nat nf_nat nf_conntrack_ipv4 nf_defrag_ipv4 xt_conntrack nf_conntrack ipt_REJECT xt_tcpudp kvm_amd kvm ip6table_filter ip6_tables iptable_filter ip_tables x_tables tun nfsd exportfs nfs lockd fscache nfs_acl auth_rpcgss sunrpc bridge stp bonding dm_round_robin dm_multipath scsi_dh loop snd_pcm snd_timer snd soundcore snd_page_alloc serio_raw evdev tpm_tis tpm tpm_bios psmouse pcspkr amd64_edac_mod edac_core button edac_mce_amd shpchp i2c_piix4 container pci_hotplug i2c_core processor ext3 jbd mbcache dm_mirror dm_region_hash dm_log dm_snapshot dm_mod sd_mod crc_t10dif mptsas mptscsih mptbase lpfc ehci_hcd scsi_transport_fc tg3 scsi_tgt scsi_transport_sas ohci_hcd libphy scsi_mod usbcore nls_base thermal fan thermal_sys [last unloaded: scsi_wait_scan]
Mar 23 14:27:37 sMaster01 kernel: [241450.356084] Pid: 3557, comm: kjournald Not tainted 2.6.32.61vanilla #1 PRIMERGY BX630 S2
Mar 23 14:27:37 sMaster01 kernel: [241450.356141] RIP: 0010:[<ffffffff8117e9e9>] [<ffffffff8117e9e9>] bio_check_eod+0x29/0xcd
Mar 23 14:27:37 sMaster01 kernel: [241450.356196] RSP: 0018:ffff8804229abba0 EFLAGS: 00010202
Mar 23 14:27:37 sMaster01 kernel: [241450.356228] RAX: ffff8804c001fad6 RBX: ffff8802e7235080 RCX: 00011200061e5110
Mar 23 14:27:37 sMaster01 kernel: [241450.356279] RDX: 0000000000000008 RSI: 0000000000000008 RDI: ffff8802e7235080
Mar 23 14:27:37 sMaster01 kernel: [241450.356331] RBP: ffff8802e7235080 R08: 0000000000000000 R09: ffff880425c54c00
Mar 23 14:27:37 sMaster01 kernel: [241450.356383] R10: 0000000000000003 R11: 00000000022e539e R12: ffff8802e7235080
Mar 23 14:27:37 sMaster01 kernel: [241450.356434] R13: ffff8802e7235080 R14: ffff880425c54c00 R15: ffff8802e6281850
Mar 23 14:27:37 sMaster01 kernel: [241450.356486] FS: 00007faa6a757820(0000) GS:ffff88000fc80000(0000) knlGS:0000000000000000
Mar 23 14:27:37 sMaster01 kernel: [241450.356540] CS: 0010 DS: 0018 ES: 0018 CR0: 000000008005003b
Mar 23 14:27:37 sMaster01 kernel: [241450.356573] CR2: ffff8804c001fade CR3: 00000000cc11f000 CR4: 00000000000006e0
Mar 23 14:27:37 sMaster01 kernel: [241450.356628] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
Mar 23 14:27:37 sMaster01 kernel: [241450.356681] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Mar 23 14:27:37 sMaster01 kernel: [241450.356733] Process kjournald (pid: 3557, threadinfo ffff8804229aa000, task ffff88041490a300)
Mar 23 14:27:37 sMaster01 kernel: [241450.356788] Stack:
Mar 23 14:27:37 sMaster01 kernel: [241450.356812] ffff880415382c00 0000000100000285 ffff8804229abfd8 0000000000005186
Mar 23 14:27:37 sMaster01 kernel: [241450.356852] <0> 0000000000000000 000000000f1c2776 ffff8804128efa38 ffff8802e7235080
Mar 23 14:27:37 sMaster01 kernel: [241450.356913] <0> ffff8802e7235080 ffff8802e7235080 ffff8800cdacae40 ffffffff8117eb5a
Mar 23 14:27:37 sMaster01 kernel: [241450.356993] Call Trace:
Mar 23 14:27:37 sMaster01 kernel: [241450.357021] [<ffffffff8117eb5a>] ? generic_make_request+0xcd/0x2f9
Mar 23 14:27:37 sMaster01 kernel: [241450.357058] [<ffffffff810b6034>] ? mempool_alloc+0x55/0x106
Mar 23 14:27:37 sMaster01 kernel: [241450.357091] [<ffffffff8117ee5c>] ? submit_bio+0xd6/0xf2
Mar 23 14:27:37 sMaster01 kernel: [241450.357125] [<ffffffff8110d83f>] ? submit_bh+0xf5/0x115
Mar 23 14:27:37 sMaster01 kernel: [241450.357158] [<ffffffff8110edc0>] ? sync_dirty_buffer+0x51/0x93
Mar 23 14:27:37 sMaster01 kernel: [241450.357196] [<ffffffffa01727c7>] ? journal_commit_transaction+0xaa6/0xe4f [jbd]
Mar 23 14:27:37 sMaster01 kernel: [241450.357252] [<ffffffffa0175194>] ? kjournald+0xdf/0x226 [jbd]
Mar 23 14:27:37 sMaster01 kernel: [241450.357288] [<ffffffff810651de>] ? autoremove_wake_function+0x0/0x2e
Mar 23 14:27:37 sMaster01 kernel: [241450.357324] [<ffffffffa01750b5>] ? kjournald+0x0/0x226 [jbd]
Mar 23 14:27:37 sMaster01 kernel: [241450.357357] [<ffffffff81064f11>] ? kthread+0x79/0x81
Mar 23 14:27:37 sMaster01 kernel: [241450.357391] [<ffffffff81011baa>] ? child_rip+0xa/0x20
Mar 23 14:27:37 sMaster01 kernel: [241450.357425] [<ffffffff81016568>] ? read_tsc+0xa/0x20
Mar 23 14:27:37 sMaster01 kernel: [241450.357456] [<ffffffff81064e98>] ? kthread+0x0/0x81
Mar 23 14:27:37 sMaster01 kernel: [241450.357487] [<ffffffff81011ba0>] ? child_rip+0x0/0x20
Mar 23 14:27:37 sMaster01 kernel: [241450.357517] Code: 5c c3 41 55 49 89 fd 41 54 55 53 48 83 ec 38 65 48 8b 04 25 28 00 00 00 48 89 44 24 28 31 c0 85 f6 0f 84 86 00 00 00 48 8b 47 10 <48> 8b 40 08 48 8b 40 68 48 c1 f8 09 74 74 89 f2 48 8b 0f 48 39
Mar 23 14:27:37 sMaster01 kernel: [241450.357738] RIP [<ffffffff8117e9e9>] bio_check_eod+0x29/0xcd
Mar 23 14:27:37 sMaster01 kernel: [241450.357772] RSP <ffff8804229abba0>
Mar 23 14:27:37 sMaster01 kernel: [241450.357799] CR2: ffff8804c001fade
Mar 23 14:27:37 sMaster01 kernel: [241450.358183] ---[ end trace 608fcf1f5a482549 ]---

--
Web database: http://www.myowndb.com
Free Software Developers Meeting: http://www.fosdem.org
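For readers working through an oops like the one above, it can help to pull out the faulting address, the faulting symbol, and the registers, then compare them. The following is only a minimal sketch: `parse_oops` and the `OOPS` excerpt are hypothetical helpers written for this thread, not part of any kernel tool. It checks that the faulting address (CR2) sits 8 bytes past RAX, which appears to match the marked instruction bytes `<48> 8b 40 08` in the Code: line (a 64-bit load from RAX plus 8). That only suggests the pointer held in RAX was invalid; it does not say why.

```python
import re

# Excerpt of the oops posted above, with the syslog prefixes removed.
OOPS = """\
BUG: unable to handle kernel paging request at ffff8804c001fade
IP: [<ffffffff8117e9e9>] bio_check_eod+0x29/0xcd
RAX: ffff8804c001fad6 RBX: ffff8802e7235080 RCX: 00011200061e5110
CR2: ffff8804c001fade CR3: 00000000cc11f000 CR4: 00000000000006e0
"""


def parse_oops(text):
    """Extract the faulting address, faulting symbol and dumped registers."""
    fault = re.search(r"paging request at ([0-9a-f]+)", text)
    ip = re.search(r"IP: \[<[0-9a-f]+>\] (\S+)", text)
    # Three-character register names followed by a full 16-digit hex value,
    # e.g. "RAX: ffff8804c001fad6" or "CR2: ffff8804c001fade".
    regs = {
        name: int(value, 16)
        for name, value in re.findall(
            r"\b([A-Z][A-Z0-9]{2}): ([0-9a-f]{16})\b", text
        )
    }
    return {
        "fault_addr": int(fault.group(1), 16) if fault else None,
        "symbol": ip.group(1) if ip else None,
        "regs": regs,
    }


info = parse_oops(OOPS)
offset = info["fault_addr"] - info["regs"]["RAX"]
print(info["symbol"], "faulted at RAX +", hex(offset))
```

The same function can be fed the full log; lines such as `RSP: 0018:...` are skipped because their values are not plain 16-digit hex numbers.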
Raphael Bauduin
2014-Mar-28 14:45 UTC
Re: [libvirt-users] host crashes "unable to handle paging request"
On Wed, Mar 26, 2014 at 8:45 AM, Raphael Bauduin <rblists@gmail.com> wrote:
> [...]

We had a guest crash with the same error, "unable to handle kernel paging request", but in the function __destroy_inode this time. Could faulty memory cause this problem on both the host and the guest?

Raph
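On the faulty-memory question, the module list in the oops includes amd64_edac_mod and edac_core, so the host may already be counting ECC errors. The sketch below reads the per-controller corrected and uncorrected error counters that the EDAC subsystem exposes under /sys/devices/system/edac/mc. The function name `read_edac_counts` is hypothetical, and the sketch assumes the EDAC driver has registered at least one memory controller on this machine. Zero counts would not rule out bad memory; non-zero counts would be a reason to look at the DIMMs first.

```python
from pathlib import Path


def read_edac_counts(base="/sys/devices/system/edac/mc"):
    """Return {controller: (corrected, uncorrected)} from EDAC sysfs counters."""
    root = Path(base)
    if not root.is_dir():
        return {}  # EDAC not loaded, or no memory controller registered
    counts = {}
    for mc in sorted(root.glob("mc[0-9]*")):
        try:
            ce = int((mc / "ce_count").read_text().strip())
            ue = int((mc / "ue_count").read_text().strip())
        except (OSError, ValueError):
            continue  # skip controllers without readable counters
        counts[mc.name] = (ce, ue)
    return counts


if __name__ == "__main__":
    counts = read_edac_counts()
    if not counts:
        print("no EDAC memory controllers found")
    for name, (ce, ue) in counts.items():
        print(f"{name}: corrected={ce} uncorrected={ue}")
```

Run on the host itself (not inside a guest), since the counters describe the physical memory controllers.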