hi
xen 4.0.1 w/2.6.32.41

Last week dom0 experienced a hard crash and the box needed to be restarted manually (despite kernel.panic=20).
Serial console was not set up, only netconsole. No relevant entries came through netconsole, but analyzing the logs I see some crashes twenty minutes before the fatal hang.

Dec 2 01:29:39 xenhost-rack1 kernel: [4437064.011963] Call Trace:
Dec 2 01:29:39 xenhost-rack1 kernel: [4437064.011964] <IRQ> [<ffffffff810e5f8e>] __alloc_pages_nodemask+0x586/0x600
Dec 2 01:29:39 xenhost-rack1 kernel: [4437064.011976] [<ffffffff8110dd6e>] alloc_slab_page+0x19/0x1b
Dec 2 01:29:39 xenhost-rack1 kernel: [4437064.011980] [<ffffffff8110f632>] __slab_alloc+0x167/0x4b4
Dec 2 01:29:39 xenhost-rack1 kernel: [4437064.011983] [<ffffffff81038d75>] ? xen_force_evtchn_callback+0xd/0xf
Dec 2 01:29:39 xenhost-rack1 kernel: [4437064.011987] [<ffffffff8149064d>] ? find_skb+0x32/0x7d
Dec 2 01:29:39 xenhost-rack1 kernel: [4437064.011991] [<ffffffff811113db>] __kmalloc_track_caller+0xfd/0x17e
Dec 2 01:29:39 xenhost-rack1 kernel: [4437064.011994] [<ffffffff8103947f>] ? xen_restore_fl_direct_end+0x0/0x1
Dec 2 01:29:39 xenhost-rack1 kernel: [4437064.011997] [<ffffffff8149064d>] ? find_skb+0x32/0x7d
Dec 2 01:29:39 xenhost-rack1 kernel: [4437064.011999] [<ffffffff81039492>] ? check_events+0x12/0x20
Dec 2 01:29:39 xenhost-rack1 kernel: [4437064.012003] [<ffffffff81479851>] __alloc_skb+0x69/0x155
Dec 2 01:29:39 xenhost-rack1 kernel: [4437064.012005] [<ffffffff8103947f>] ? xen_restore_fl_direct_end+0x0/0x1
Dec 2 01:29:39 xenhost-rack1 kernel: [4437064.012045] [<ffffffff81039492>] ? check_events+0x12/0x20
Dec 2 01:29:39 xenhost-rack1 kernel: [4437064.012050] [<ffffffff815b3fb0>] printk+0x3c/0x3e
Dec 2 01:29:39 xenhost-rack1 kernel: [4437064.012052] [<ffffffff810775c6>] ? release_console_sem+0x1b1/0x1e2
Dec 2 01:29:39 xenhost-rack1 kernel: [4437064.012057] [<ffffffff8103007f>] ? vmx_set_cr0+0x313/0x4bc
Dec 2 01:29:39 xenhost-rack1 kernel: [4437064.012061] [<ffffffff814f8de0>] dump_packet+0x310/0x794
Dec 2 01:29:39 xenhost-rack1 kernel: [4437064.012065] [<ffffffff811d8b90>] ? selinux_ip_postroute+0x2e/0x1d6
Dec 2 01:29:39 xenhost-rack1 kernel: [4437064.012068] [<ffffffff815b6005>] ? _spin_unlock_irqrestore+0x37/0x39
Dec 2 01:29:39 xenhost-rack1 kernel: [4437064.012071] [<ffffffff815b3fb0>] ? printk+0x3c/0x3e
Dec 2 01:29:39 xenhost-rack1 kernel: [4437064.012074] [<ffffffff814f93c3>] ipt_log_packet+0x15f/0x185
Dec 2 01:29:39 xenhost-rack1 kernel: [4437064.012077] [<ffffffff814f942e>] log_tg+0x45/0x4b
Dec 2 01:29:39 xenhost-rack1 kernel: [4437064.012080] [<ffffffff814b3067>] ? conntrack_mt_v2+0x1b/0x1d
Dec 2 01:29:39 xenhost-rack1 kernel: [4437064.012121] [<ffffffff81543be2>] ? br_forward_finish+0x0/0x53
Dec 2 01:29:39 xenhost-rack1 kernel: [4437064.012124] [<ffffffff81543c35>] ? __br_forward+0x0/0x9c
Dec 2 01:29:39 xenhost-rack1 kernel: [4437064.012127] [<ffffffff81543cbe>] __br_forward+0x89/0x9c
Dec 2 01:29:39 xenhost-rack1 kernel: [4437064.012129] [<ffffffff814792e6>] ? skb_clone+0x58/0x5d
Dec 2 01:29:39 xenhost-rack1 kernel: [4437064.012132] [<ffffffff81543b22>] br_flood+0xa2/0xbb
Dec 2 01:29:39 xenhost-rack1 kernel: [4437064.012135] [<ffffffff81543b4b>] br_flood_forward+0x10/0x12
Dec 2 01:29:39 xenhost-rack1 kernel: [4437064.012137] [<ffffffff81544872>] br_handle_frame_finish+0x13a/0x151
Dec 2 01:29:39 xenhost-rack1 kernel: [4437064.012141] [<ffffffff81548616>] br_nf_pre_routing_finish+0x249/0x258
Dec 2 01:29:39 xenhost-rack1 kernel: [4437064.012144] [<ffffffff815492d3>] br_nf_pre_routing+0x55d/0x57e
Dec 2 01:29:39 xenhost-rack1 kernel: [4437064.012147] [<ffffffff8149fe3e>] nf_iterate+0x41/0x84
Dec 2 01:29:39 xenhost-rack1 kernel: [4437064.012150] [<ffffffff81544738>] ? br_handle_frame_finish+0x0/0x151
Dec 2 01:29:39 xenhost-rack1 kernel: [4437064.012155] [<ffffffff81544738>] ? br_handle_frame_finish+0x0/0x151
Dec 2 01:29:39 xenhost-rack1 kernel: [4437064.012158] [<ffffffff81544a51>] br_handle_frame+0x1c8/0x1ef
Dec 2 01:29:39 xenhost-rack1 kernel: [4437064.012163] [<ffffffff81482270>] netif_receive_skb+0x330/0x3fc
Dec 2 01:29:39 xenhost-rack1 kernel: [4437064.012165] [<ffffffff81479ec5>] ? __netdev_alloc_skb+0x1d/0x3a
Dec 2 01:29:39 xenhost-rack1 kernel: [4437064.012169] [<ffffffff814824bc>] napi_skb_finish+0x24/0x38
Dec 2 01:29:39 xenhost-rack1 kernel: [4437064.012172] [<ffffffff81482900>] napi_gro_receive+0x2a/0x2f
Dec 2 01:29:39 xenhost-rack1 kernel: [4437064.012175] [<ffffffff8135aa86>] e1000_receive_skb+0x43/0x4b
Dec 2 01:29:39 xenhost-rack1 kernel: [4437064.012178] [<ffffffff8135c557>] e1000_clean_rx_irq+0x212/0x2b7
Dec 2 01:29:39 xenhost-rack1 kernel: [4437064.012181] [<ffffffff8135b11f>] e1000_clean+0x75/0x21f
Dec 2 01:29:39 xenhost-rack1 kernel: [4437064.012185] [<ffffffff8127e67c>] ? HYPERVISOR_physdev_op+0x16/0x4c
Dec 2 01:29:39 xenhost-rack1 kernel: [4437064.012194] [<ffffffff8107d4ca>] __do_softirq+0xee/0x1c4
Dec 2 01:29:39 xenhost-rack1 kernel: [4437064.012199] [<ffffffff8103de2c>] call_softirq+0x1c/0x30
Dec 2 01:29:39 xenhost-rack1 kernel: [4437064.012202] [<ffffffff8103f4bd>] do_softirq+0x61/0xc2
Dec 2 01:29:39 xenhost-rack1 kernel: [4437064.012206] [<ffffffff8107d311>] irq_exit+0x36/0x78
Dec 2 01:29:39 xenhost-rack1 kernel: [4437064.012208] [<ffffffff8127f297>] xen_evtchn_do_upcall+0x37/0x47
Dec 2 01:29:39 xenhost-rack1 kernel: [4437064.012212] [<ffffffff8103de7e>] xen_do_hypervisor_callback+0x1e/0x30
Dec 2 01:29:39 xenhost-rack1 kernel: [4437064.012213] <EOI> [<ffffffff810093aa>] ? hypercall_page+0x3aa/0x1000
Dec 2 01:29:39 xenhost-rack1 kernel: [4437064.012221] [<ffffffff810093aa>] ? hypercall_page+0x3aa/0x1000
Dec 2 01:29:39 xenhost-rack1 kernel: [4437064.012224] [<ffffffff810093aa>] ? hypercall_page+0x3aa/0x1000
Dec 2 01:29:39 xenhost-rack1 kernel: [4437064.012226] [<ffffffff81038dbb>] ? xen_safe_halt+0x10/0x1a
Dec 2 01:29:39 xenhost-rack1 kernel: [4437064.012232] [<ffffffff8103bde6>] ? cpu_idle+0x66/0xa4
Dec 2 01:29:39 xenhost-rack1 kernel: [4437064.012237] [<ffffffff81594359>] ? rest_init+0x6d/0x6f
Dec 2 01:29:40 xenhost-rack1 kernel: [4437064.012241] [<ffffffff81976dd5>] ? start_kernel+0x43c/0x447
Dec 2 01:29:40 xenhost-rack1 kernel: [4437064.012245] [<ffffffff819762c1>] ? x86_64_start_reservations+0xac/0xb0
Dec 2 01:29:40 xenhost-rack1 kernel: [4437064.012248] [<ffffffff8197ab24>] ? xen_start_kernel+0x5ea/0x5f1
Dec 2 01:29:40 xenhost-rack1 kernel: [4437064.012250] Mem-Info:
Dec 2 01:29:40 xenhost-rack1 kernel: [4437064.012251] DMA per-cpu:
Dec 2 01:29:40 xenhost-rack1 kernel: [4437064.012253] CPU 0: hi: 0, btch: 1 usd: 0
Dec 2 01:29:40 xenhost-rack1 kernel: [4437064.012255] CPU 1: hi: 0, btch: 1 usd: 0
Dec 2 01:29:40 xenhost-rack1 kernel: [4437064.012257] CPU 2: hi: 0, btch: 1 usd: 0
Dec 2 01:29:40 xenhost-rack1 kernel: [4437064.012258] CPU 3: hi: 0, btch: 1 usd: 0
Dec 2 01:29:40 xenhost-rack1 kernel: [4437064.012260] DMA32 per-cpu:
Dec 2 01:29:40 xenhost-rack1 kernel: [4437064.012261] CPU 0: hi: 186, btch: 31 usd: 61
Dec 2 01:29:40 xenhost-rack1 kernel: [4437064.012263] CPU 1: hi: 186, btch: 31 usd: 58
Dec 2 01:29:40 xenhost-rack1 kernel: [4437064.012265] CPU 2: hi: 186, btch: 31 usd: 66
Dec 2 01:29:40 xenhost-rack1 kernel: [4437064.012267] CPU 3: hi: 186, btch: 31 usd: 128
Dec 2 01:29:40 xenhost-rack1 kernel: [4437064.012268] Normal per-cpu:
Dec 2 01:29:40 xenhost-rack1 kernel: [4437064.012269] CPU 0: hi: 186, btch: 31 usd: 0
Dec 2 01:29:40 xenhost-rack1 kernel: [4437064.012271] CPU 1: hi: 186, btch: 31 usd: 0
Dec 2 01:29:40 xenhost-rack1 kernel: [4437064.012273] CPU 2: hi: 186, btch: 31 usd: 0
Dec 2 01:29:40 xenhost-rack1 kernel: [4437064.012275] CPU 3: hi: 186, btch: 31 usd: 0
Dec 2 01:29:40 xenhost-rack1 kernel: [4437064.012279] active_anon:3388 inactive_anon:14639 isolated_anon:0
Dec 2 01:29:40 xenhost-rack1 kernel: [4437064.012280] active_file:70965 inactive_file:66104 isolated_file:0
Dec 2 01:29:40 xenhost-rack1 kernel: [4437064.012281] unevictable:3570 dirty:1 writeback:148 unstable:0
Dec 2 01:29:40 xenhost-rack1 kernel: [4437064.012282] free:13738 slab_reclaimable:10452 slab_unreclaimable:5582
Dec 2 01:29:40 xenhost-rack1 kernel: [4437064.012283] mapped:1948 shmem:325 pagetables:1309 bounce:0
Dec 2 01:29:40 xenhost-rack1 kernel: [4437064.012288] DMA free:13972kB min:16kB low:20kB high:24kB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:13764kB mlocked:0kB dirty:0kB writeback:0kB mapped:0kB shmem:0kB slab_reclaimable:0kB slab_unreclaimable:0kB kernel_stack:0kB pagetables:0kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? yes
Dec 2 01:29:40 xenhost-rack1 kernel: [4437064.012292] lowmem_reserve[]: 0 952 10042 10042
Dec 2 01:29:40 xenhost-rack1 kernel: [4437064.012300] DMA32 free:36792kB min:1212kB low:1512kB high:1816kB active_anon:13552kB inactive_anon:58556kB active_file:283860kB inactive_file:264416kB unevictable:14280kB isolated(anon):0kB isolated(file):0kB present:975072kB mlocked:14280kB dirty:4kB writeback:592kB mapped:7792kB shmem:1300kB slab_reclaimable:41808kB slab_unreclaimable:19604kB kernel_stack:1208kB pagetables:5236kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? no
Dec 2 01:29:40 xenhost-rack1 kernel: [4437064.012304] lowmem_reserve[]: 0 0 9090 9090
Dec 2 01:29:40 xenhost-rack1 kernel: [4437064.012311] Normal free:4188kB min:11596kB low:14492kB high:17392kB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:9308160kB mlocked:0kB dirty:0kB writeback:0kB mapped:0kB shmem:0kB slab_reclaimable:0kB slab_unreclaimable:2724kB kernel_stack:0kB pagetables:0kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? no
Dec 2 01:29:40 xenhost-rack1 kernel: [4437064.012316] lowmem_reserve[]: 0 0 0 0
Dec 2 01:29:40 xenhost-rack1 kernel: [4437064.012318] DMA: 1*4kB 2*8kB 2*16kB 3*32kB 4*64kB 2*128kB 0*256kB 0*512kB 1*1024kB 0*2048kB 3*4096kB = 13972kB
Dec 2 01:29:40 xenhost-rack1 kernel: [4437064.012326] DMA32: 8114*4kB 535*8kB 0*16kB 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 36736kB
Dec 2 01:29:40 xenhost-rack1 kernel: [4437064.012333] Normal: 1*4kB 1*8kB 1*16kB 0*32kB 1*64kB 0*128kB 2*256kB 1*512kB 1*1024kB 1*2048kB 0*4096kB = 4188kB
Dec 2 01:29:40 xenhost-rack1 kernel: [4437064.012340] 138450 total pagecache pages
Dec 2 01:29:40 xenhost-rack1 kernel: [4437064.012342] 13 pages in swap cache
Dec 2 01:29:40 xenhost-rack1 kernel: [4437064.012344] Swap cache stats: add 21, delete 8, find 123000/123002
Dec 2 01:29:40 xenhost-rack1 kernel: [4437064.012345] Free swap = 3905476kB
Dec 2 01:29:40 xenhost-rack1 kernel: [4437064.012347] Total swap = 3905528kB
Dec 2 01:29:40 xenhost-rack1 kernel: [4437064.035564] 2621440 pages RAM
Dec 2 01:29:40 xenhost-rack1 kernel: [4437064.035566] 2425336 pages reserved
Dec 2 01:29:40 xenhost-rack1 kernel: [4437064.035568] 154640 pages shared
Dec 2 01:29:40 xenhost-rack1 kernel: [4437064.035569] 45402 pages non-shared
Dec 2 01:29:40 xenhost-rack1 kernel: [4437064.035572] SLUB: Unable to allocate memory on node -1 (gfp=0x20)
Dec 2 01:29:40 xenhost-rack1 kernel: [4437064.035574] cache: kmalloc-1024, object size: 1024, buffer size: 1024, default order: 2, min order: 0
Dec 2 01:29:40 xenhost-rack1 kernel: [4437064.035577] node 0: slabs: 870, objs: 6900, free: 0
Dec 2 01:29:40 xenhost-rack1 kernel: [4437064.035919] swapper: page allocation failure. order:0, mode:0x4020
Dec 2 01:29:40 xenhost-rack1 kernel: [4437064.035923] Pid: 0, comm: swapper Not tainted 2.6.32.41 #2
.....

It looks like an out-of-memory condition, doesn't it?
Not sure if this could be a Xen bug, a network driver issue or something else.
For the moment I have upgraded to 2.6.32.47.
Any help would be greatly appreciated.
Thanks in advance

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel
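As a rough aid for reading the Mem-Info dump in the message above, the following sketch extracts the `free` and `min` values from each zone line and reports zones whose free memory is below the `min` watermark. The function name `zones_below_min` and the regular expression are my own illustration, not part of any kernel tool, and the sample strings are abbreviated copies of the zone lines from the log.

```python
import re

# Abbreviated zone lines copied from the Mem-Info dump in this thread.
SAMPLE = """\
DMA free:13972kB min:16kB low:20kB high:24kB
DMA32 free:36792kB min:1212kB low:1512kB high:1816kB
Normal free:4188kB min:11596kB low:14492kB high:17392kB
"""

# Matches "<zone> free:<n>kB min:<n>kB"; a syslog prefix before the
# zone name does not get in the way because the pattern is unanchored.
ZONE_RE = re.compile(r"(\w+) free:(\d+)kB min:(\d+)kB")


def zones_below_min(text):
    """Return [(zone, free_kb, min_kb)] for zones with free < min."""
    result = []
    for zone, free_kb, min_kb in ZONE_RE.findall(text):
        if int(free_kb) < int(min_kb):
            result.append((zone, int(free_kb), int(min_kb)))
    return result


if __name__ == "__main__":
    for zone, free_kb, min_kb in zones_below_min(SAMPLE):
        print(f"{zone}: free {free_kb} kB is below min {min_kb} kB")
```

On the values posted here only the Normal zone is flagged (4188 kB free against a min of 11596 kB). The failing allocation was made from the network receive softirq (the trace starts at `<IRQ>`), a context where the allocator cannot sleep and wait for reclaim, so a zone sitting under its watermark is at least consistent with the failure. This does not by itself say whether Xen, the driver or plain memory pressure is to blame.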
svenvan svenvan
2011-Dec-11 17:21 UTC
Re: xen 4.0.1/w 2.6.32 swapper: page allocation failure
2011/12/5 svenvan svenvan <svenvan.van@gmail.com>

> hi
> xen 4.0.1 w/2.6.32.41
>
> Last week dom0 experienced a hard crash and the box needed to be restarted
> manually (despite kernel.panic=20).
> Serial console was not set up, only netconsole. No relevant entries
> came through netconsole, but analyzing the logs I see some crashes twenty
> minutes before the fatal hang.

Browsing the archive I found a reply from Konrad Rzeszutek about something similar:
http://old-list-archives.xen.org/archives/html/xen-devel/2011-09/msg00600.html

Can someone confirm whether it's the same issue or not? Konrad maybe?

Thanks
Konrad Rzeszutek Wilk
2011-Dec-12 22:00 UTC
Re: xen 4.0.1/w 2.6.32 swapper: page allocation failure
On Sun, Dec 11, 2011 at 06:21:52PM +0100, svenvan svenvan wrote:
> 2011/12/5 svenvan svenvan <svenvan.van@gmail.com>
>
> > hi
> > xen 4.0.1 w/2.6.32.41
> >
> > Last week dom0 experienced a hard crash and the box needed to be restarted
> > manually (despite kernel.panic=20).
> > Serial console was not set up, only netconsole. No relevant entries
> > came through netconsole, but analyzing the logs I see some crashes twenty
> > minutes before the fatal hang.
>
> Browsing the archive I found a reply from Konrad Rzeszutek about something
> similar:
> http://old-list-archives.xen.org/archives/html/xen-devel/2011-09/msg00600.html
>
> Can someone confirm whether it's the same issue or not? Konrad maybe?

You would get much more traction if you CC-ed me.
Anyhow, no idea - 2.6.32-41 is a bit ancient and this thread does not seem to have relevant data (such as serial console output, for example, or the "some crashes").
svenvan svenvan
2011-Dec-13 12:08 UTC
Re: xen 4.0.1/w 2.6.32 swapper: page allocation failure
2011/12/12 Konrad Rzeszutek Wilk <konrad@darnok.org>

> On Sun, Dec 11, 2011 at 06:21:52PM +0100, svenvan svenvan wrote:
> > 2011/12/5 svenvan svenvan <svenvan.van@gmail.com>
> >
> > > hi
> > > xen 4.0.1 w/2.6.32.41
> > >
> > > Last week dom0 experienced a hard crash and the box needed to be
> > > restarted manually (despite kernel.panic=20).
> > > Serial console was not set up, only netconsole. No relevant entries
> > > came through netconsole, but analyzing the logs I see some crashes
> > > twenty minutes before the fatal hang.
> >
> > Browsing the archive I found a reply from Konrad Rzeszutek about something
> > similar:
> > http://old-list-archives.xen.org/archives/html/xen-devel/2011-09/msg00600.html
> >
> > Can someone confirm whether it's the same issue or not? Konrad maybe?
>
> You would get much more traction if you CC-ed me.
> Anyhow, no idea - 2.6.32-41 is a bit ancient and this thread does not
> seem to have relevant data (such as serial console output, for example,
> or the "some crashes").

Thanks. I have now set up a serial console too.

Adding more info: it happened at night, when some heavy rsync jobs run from the guests to an NFS server for backup purposes.

Just thinking about this one:
Page allocation failure with e1000
http://lists.debian.org/debian-backports/2011/01/msg00037.html
http://forums.gentoo.org/viewtopic-t-899896.html

and this one:
http://xen.1045712.n5.nabble.com/devel-next-2-6-39-SLUB-Unable-to-allocate-memory-on-node-1-gfp-0x20-tt4418288.html#a4422100

Maybe something related to the e1000e driver?
Can increasing /proc/sys/vm/min_free_kbytes help?

Relevant xend-config.sxp entries:
(dom0-min-mem 1024)
(enable-dom0-ballooning no)

Relevant grub entry:
kernel /boot/xen-4.0.1.gz dom0_mem=1024M max_cstate=1

Do I need to add 'mem=1GB' too?

Thanks
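Regarding the min_free_kbytes question, a first step is to see what dom0 currently uses. The sketch below reads the value from the /proc path named in the message; the helper name `read_min_free_kbytes` is hypothetical, and the script only reports the value, it does not suggest or apply a new one.

```python
# Path taken from the message above; present on Linux systems only.
MIN_FREE_PATH = "/proc/sys/vm/min_free_kbytes"


def read_min_free_kbytes(path=MIN_FREE_PATH):
    """Return the current vm.min_free_kbytes setting in kB as an int."""
    with open(path) as f:
        return int(f.read().strip())


if __name__ == "__main__":
    try:
        print(f"vm.min_free_kbytes = {read_min_free_kbytes()} kB")
    except OSError as exc:
        # For example when run on a system without /proc.
        print(f"could not read {MIN_FREE_PATH}: {exc}")
```

Changing the value is normally done as root with `sysctl -w vm.min_free_kbytes=<value>` or by writing to the same file. Whether a larger reserve avoids the allocation failures on this box is something only a test under the nightly rsync load can show.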