I have 3 VMs, two running webservers and the 3rd running netperf/iperf. This is a multi-cpu setup, with dom0 on CPU-0 and all the remaining VMs on a separate CPU.

Currently my dom0 has 528M of memory, while each VM has around 160M. Under high loads, the system crashes. I'm pasting a representative crash here:

file=grant_table.c, line=729) gnttab_transfer: out-of-range or xen frame 2f016001
(XEN) (file=grant_table.c, line=729) gnttab_transfer: out-of-range or xen frame 2f017001
(XEN) (file=grant_table.c, line=729) gnttab_transfer: out-of-range or xen frame 18fca001
(XEN) (file=grant_table.c, line=729) gnttab_transfer: out-of-range or xen frame 18fcb001
(XEN) (file=grant_table.c, line=729) gnttab_transfer: out-of-range or xen frame 2270c001
(XEN) (file=grant_table.c, line=729) gnttab_transfer: out-of-range or xen frame 2270d001
------------[ cut here ]------------
kernel BUG at drivers/xen/netback/netback.c:335!
invalid operand: 0000 [#1]
Modules linked in: ipt_physdev iptable_filter ip_tables video thermal processor fan button battery ac md sworks_agp agpgart dm_snapshot dm_zero dm_mirror ext3 jbd dm_mod mptscsih mptbase sd_mod scsi_mod
CPU:    0
EIP:    0061:[<c02c6782>]    Not tainted VLI
EFLAGS: 00010246   (2.6.12.6-xen0)
EIP is at net_rx_action+0x4c2/0x4f0
eax: 0000fff7   ebx: df26b620   ecx: 00000042   edx: c04b8920
esi: dc073480   edi: 00000000   ebp: c04b3900   esp: c0a23d28
ds: 007b   es: 007b   ss: 0069
Process ksoftirqd/0 (pid: 2, threadinfo=c0a22000 task=c0a16510)
Stack: c04b38e0 c0362d90 80000000 c0363b36 dbad7d80 db6fee80 df26b400 5cb34f36
       00000088 00000000 0003d700 db6ff012 c04b8920 00000106 00a23e2c c05e5000
       00000000 c0363a90 00000001 00000000 00000000 00000001 c0a16510 00000000
Call Trace:
 [<c0362d90>] br_forward_finish+0x0/0x80
 [<c0363b36>] br_handle_frame_finish+0xa6/0x160
 [<c0363a90>] br_handle_frame_finish+0x0/0x160
 [<c01423a5>] kmem_getpages+0x65/0x90
 [<c013ece2>] __rmqueue+0xb2/0xf0
 [<c032302d>] nf_iterate+0x5d/0x90
 [<c0367aa0>] br_nf_pre_routing_finish+0x0/0x420
 [<c0367aa0>] br_nf_pre_routing_finish+0x0/0x420
 [<c032336e>] nf_hook_slow+0x6e/0x120
 [<c0367aa0>] br_nf_pre_routing_finish+0x0/0x420
 [<c0363a90>] br_handle_frame_finish+0x0/0x160
 [<c0368549>] br_nf_pre_routing+0x319/0x4a0
 [<c0367aa0>] br_nf_pre_routing_finish+0x0/0x420
 [<c032302d>] nf_iterate+0x5d/0x90
 [<c0363a90>] br_handle_frame_finish+0x0/0x160
 [<c0363a90>] br_handle_frame_finish+0x0/0x160
 [<c032336e>] nf_hook_slow+0x6e/0x120
 [<c0363a90>] br_handle_frame_finish+0x0/0x160
 [<c0363db3>] br_handle_frame+0x1c3/0x260
 [<c0363a90>] br_handle_frame_finish+0x0/0x160
 [<c03188d3>] netif_receive_skb+0x113/0x230
 [<c02820bf>] tg3_rx+0x2cf/0x490
 [<c027e246>] tg3_restart_ints+0x26/0xa0
 [<c02823a6>] tg3_poll+0x126/0x1a0
 [<c0121660>] ksoftirqd+0x0/0xa0
 [<c0121660>] ksoftirqd+0x0/0xa0
 [<c01214ff>] tasklet_action+0x5f/0xa0
 [<c0121152>] __do_softirq+0x52/0xc0
 [<c0121207>] do_softirq+0x47/0x60
 [<c01216b9>] ksoftirqd+0x59/0xa0
 [<c013079d>] kthread+0xad/0xf0
 [<c01306f0>] kthread+0x0/0xf0
 [<c0106855>] kernel_thread_helper+0x5/0x10
Code: 0f 0b 44 01 38 19 3a c0 90 e9 5a fe ff ff b8 74 64 40 c0 e8 31 ac e5 ff eb 8f c7 04 24 9c 10 3b c0 e8 b3 60 e5 ff 8d 76 00 eb 92 <0f> 0b 4f 01 38 19 3a c0 e9 4c fe ff ff 0f 0b 2a 01 38 19 3a c0
 <0>Kernel panic - not syncing: Fatal exception in interrupt
(XEN) Domain 0 shutdown: rebooting machine.

NOTE: the line number in netback.c (335) might not be very useful for reference. I have some additional instrumentation in netback, so the line number might not match the files in xen-unstable.hg

Will increasing dom0 memory further help? Or increasing the size of the rings?

-- 
Web/Blog/Gallery: http://floatingsun.net

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel

> I have 3 VMs, two running webservers and the 3rd running
> netperf/iperf. This is a multi-cpu setup, with dom0 on CPU-0
> and all the remaining VMs on a separate CPU.
>
> Currently my dom0 has 528M of memory, while each VM has around 160M.
> Under high loads, the system crashes. I'm pasting a
> representative crash here:
>
> file=grant_table.c, line=729) gnttab_transfer: out-of-range
> or xen frame 2f016001
> (XEN) (file=grant_table.c, line=729) gnttab_transfer:
> out-of-range or xen frame 2f017001

Interesting. We've seen this very occasionally before, but this is the first time on a 32b kernel.

The clue is that the errant frame numbers always end 001, and are actually valid if you shift them >>12.

It would be very helpful if you could work on a minimal repro case, ideally with only one domU.

Chris: any extra debugging that might be helpful?

Thanks,
Ian

Hi Diwaker

Could you add this patch to your build of the domain 0 kernel and try to exercise the fault again please?

thanks

Christopher

diff -r 821368442403 linux-2.6-xen-sparse/drivers/xen/netback/netback.c
--- a/linux-2.6-xen-sparse/drivers/xen/netback/netback.c	Thu Jan 12 11:45:49 2006
+++ b/linux-2.6-xen-sparse/drivers/xen/netback/netback.c	Thu Jan 12 14:36:56 2006
@@ -212,6 +212,14 @@
         vdata = (unsigned long)skb->data;
         old_mfn = virt_to_mfn(vdata);
 
+        if ( ((old_mfn & 0xfff) == 0x001) && (old_mfn > 0x10000000UL) )
+        {
+            printk("XXX: nasty mfn from p2m: v:%p p:%p m:%p\n",
+                   vdata, __pa(vdata), old_mfn );
+            /* HACK: let's try shifting it until it looks sane... */
+            old_mfn >>= 12;
+        }
+
         /* Memory squeeze? Back off for an arbitrary while. */
         if ((new_mfn = alloc_mfn()) == 0) {
             if ( net_ratelimit() )

On 1/12/06, Ian Pratt <m+Ian.Pratt@cl.cam.ac.uk> wrote:
> Interesting. We've seen this very occasionally before, but this is the
> first time on a 32b kernel.
>
> The clue is that the errant frame numbers always end 001, and are
> actually valid if you shift them >>12.
>
> It would be very helpful if you could work on a minimal repro case,
> ideally with only one domU.
>
> Chris: any extra debugging that might be helpful?
>
> Thanks,
> Ian

> I have 3 VMs, two running webservers and the 3rd running
> netperf/iperf. This is a multi-cpu setup, with dom0 on CPU-0
> and all the remaining VMs on a separate CPU.
>
> Currently my dom0 has 528M of memory, while each VM has around 160M.
> Under high loads, the system crashes. I'm pasting a
> representative crash here:

Is this PAE or not? How much memory has the system got?

Thanks,
Ian

Hi Christopher,

Curiously I'm unable to reproduce the bug today (I just recompiled my sources; I didn't pull in any new changes though, so I don't think anything changed) even without your patch.

I'll report back if I see the crash again.

Thanks,
Diwaker

On 1/12/06, Christopher Clark <christopher.clark@cl.cam.ac.uk> wrote:
> Hi Diwaker
>
> Could you add this patch to your build of the domain 0 kernel and try to
> exercise the fault again please?
>
> thanks
>
> Christopher

-- 
Web/Blog/Gallery: http://floatingsun.net