hi,

I am working on a network driver for mini-os, and after creating a lot
of semi-broken test domains, and a lot of vifs (80 or so), I got dom0 to
crash like below.

I have no idea how to reproduce this, just wanted to let people know
there is some bug in netback somewhere.

Jacob

=======================================================

(The gnttab errors are due to bugs in my domU code)

(XEN) (file=/mnt/hda3/home/jacobg/xen-evilman5.hg/xen/include/asm/mm.h, line=220) Error pfn d95a: rd=ffb04080, od=00000000, caf=00000000, taf=00000000
(XEN) (file=grant_table.c, line=906) could not get destination frame d95a.
(XEN) (file=/mnt/hda3/home/jacobg/xen-evilman5.hg/xen/include/asm/mm.h, line=220) Error pfn db11: rd=ffb04080, od=00000000, caf=00000000, taf=00000000
(XEN) (file=grant_table.c, line=906) could not get destination frame db11.
(XEN) (file=/mnt/hda3/home/jacobg/xen-evilman5.hg/xen/include/asm/mm.h, line=220) Error pfn db10: rd=ffb04080, od=00000000, caf=00000000, taf=00000000
(XEN) (file=grant_table.c, line=906) could not get destination frame db10.
(XEN) (file=/mnt/hda3/home/jacobg/xen-evilman5.hg/xen/include/asm/mm.h, line=220) Error pfn c805: rd=ffb04080, od=00000000, caf=00000000, taf=00000000
(XEN) (file=grant_table.c, line=906) could not get destination frame c805.
(XEN) (file=/mnt/hda3/home/jacobg/xen-evilman5.hg/xen/include/asm/mm.h, line=220) Error pfn c804: rd=ffb04080, od=00000000, caf=00000000, taf=00000000
(XEN) (file=grant_table.c, line=906) could not get destination frame c804.
(XEN) (file=/mnt/hda3/home/jacobg/xen-evilman5.hg/xen/include/asm/mm.h, line=220) Error pfn c823: rd=ffb04080, od=00000000, caf=00000000, taf=00000000
(XEN) (file=grant_table.c, line=906) could not get destination frame c823.
(XEN) (file=/mnt/hda3/home/jacobg/xen-evilman5.hg/xen/include/asm/mm.h, line=220) Error pfn c822: rd=ffb04080, od=00000000, caf=00000000, taf=00000000
(XEN) (file=grant_table.c, line=906) could not get destination frame c822.
Unable to handle kernel NULL pointer dereference at virtual address 0000000c
 printing eip:
*pde = ma 00000000 pa fffff000
Oops: 0000 [#1]
Modules linked in:
CPU:    0
EIP:    0061:[<c026c7a8>]    Not tainted VLI
EFLAGS: 00010286   (2.6.16.29-xenxen #13)
EIP is at net_rx_action+0x483/0x897
eax: 00000001   ebx: 00000000   ecx: 00000000   edx: c0447c60
esi: cbaafa60   edi: 0000001e   ebp: cd6e8880   esp: c03f9eb0
ds: 007b   es: 007b   ss: 0069
Process swapper (pid: 0, threadinfo=c03f8000 task=c039ecc0)
Stack: <0>000000f9 c03f9f3c c06bc030 00000000 00000000 00000001 00000001 00000000
       00000000 cbaafa60 00000000 00000000 c03f9f0c c0114593 00000000 00000000
       00000000 00000000 00000000 00000001 00000000 00000001 00000001 00000001
Call Trace:
 [<c0114593>] __wake_up_common+0x2a/0x4a
 [<c011b86c>] tasklet_action+0x52/0x8e
 [<c011b73f>] __do_softirq+0x46/0x99
 [<c011b7bd>] do_softirq+0x2b/0x48
 [<c0105bae>] do_IRQ+0x1f/0x25
 [<c02624cf>] evtchn_do_upcall+0x4f/0x76
 [<c01046bd>] hypervisor_callback+0x3d/0x48
 [<c0106c2f>] safe_halt+0x6b/0x84
 [<c0102bc1>] xen_idle+0x36/0x3d
 [<c0102c78>] cpu_idle+0x2d/0x42
 [<c03fa607>] start_kernel+0x26e/0x270
Code: e0 04 03 44 24 74 f6 40 0c 01 74 3c 8b 44 24 58 6b d0 18 40 89 44 24 58 03 54 24 6c 66 83 7a 14 00 0f 84 94 00 00 00 8b 4c 24 2c <0f> bf 41 0c 57 50 68 f7 fb 36 c0 e8 ca b9 ea ff c7 44 24 34 ff
 <0>Kernel panic - not syncing: Fatal exception in interrupt
(XEN) Domain 0 crashed: rebooting machine in 5 seconds.

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel
> I am working on a network driver for mini-os, and after creating a lot
> of semi-broken test domains, and a lot of vifs (80 or so), I got dom0 to
> crash like below.
>
> I have no idea how to reproduce this, just wanted to let people know
> there is some bug in netback somewhere.

Please could you try and look at the disassembly and figure out which
pointer dereference is causing the problem. Are you using rx-copy or
rx-flip?

Thanks,
Ian

> =======================================================
>
> (The gnttab errors are due to bugs in my domU code)
>
> (XEN) (file=/mnt/hda3/home/jacobg/xen-evilman5.hg/xen/include/asm/mm.h, line=220) Error pfn d95a: rd=ffb04080, od=00000000, caf=00000000, taf=00000000
> (XEN) (file=grant_table.c, line=906) could not get destination frame d95a.
> (XEN) (file=/mnt/hda3/home/jacobg/xen-evilman5.hg/xen/include/asm/mm.h, line=220) Error pfn db11: rd=ffb04080, od=00000000, caf=00000000, taf=00000000
> (XEN) (file=grant_table.c, line=906) could not get destination frame db11.
> (XEN) (file=/mnt/hda3/home/jacobg/xen-evilman5.hg/xen/include/asm/mm.h, line=220) Error pfn db10: rd=ffb04080, od=00000000, caf=00000000, taf=00000000
> (XEN) (file=grant_table.c, line=906) could not get destination frame db10.
> (XEN) (file=/mnt/hda3/home/jacobg/xen-evilman5.hg/xen/include/asm/mm.h, line=220) Error pfn c805: rd=ffb04080, od=00000000, caf=00000000, taf=00000000
> (XEN) (file=grant_table.c, line=906) could not get destination frame c805.
> (XEN) (file=/mnt/hda3/home/jacobg/xen-evilman5.hg/xen/include/asm/mm.h, line=220) Error pfn c804: rd=ffb04080, od=00000000, caf=00000000, taf=00000000
> (XEN) (file=grant_table.c, line=906) could not get destination frame c804.
> (XEN) (file=/mnt/hda3/home/jacobg/xen-evilman5.hg/xen/include/asm/mm.h, line=220) Error pfn c823: rd=ffb04080, od=00000000, caf=00000000, taf=00000000
> (XEN) (file=grant_table.c, line=906) could not get destination frame c823.
> (XEN) (file=/mnt/hda3/home/jacobg/xen-evilman5.hg/xen/include/asm/mm.h, line=220) Error pfn c822: rd=ffb04080, od=00000000, caf=00000000, taf=00000000
> (XEN) (file=grant_table.c, line=906) could not get destination frame c822.
> Unable to handle kernel NULL pointer dereference at virtual address 0000000c
>  printing eip:
> *pde = ma 00000000 pa fffff000
> Oops: 0000 [#1]
> Modules linked in:
> CPU:    0
> EIP:    0061:[<c026c7a8>]    Not tainted VLI
> EFLAGS: 00010286   (2.6.16.29-xenxen #13)
> EIP is at net_rx_action+0x483/0x897
> eax: 00000001   ebx: 00000000   ecx: 00000000   edx: c0447c60
> esi: cbaafa60   edi: 0000001e   ebp: cd6e8880   esp: c03f9eb0
> ds: 007b   es: 007b   ss: 0069
> Process swapper (pid: 0, threadinfo=c03f8000 task=c039ecc0)
> Stack: <0>000000f9 c03f9f3c c06bc030 00000000 00000000 00000001 00000001 00000000
>        00000000 cbaafa60 00000000 00000000 c03f9f0c c0114593 00000000 00000000
>        00000000 00000000 00000000 00000001 00000000 00000001 00000001 00000001
> Call Trace:
>  [<c0114593>] __wake_up_common+0x2a/0x4a
>  [<c011b86c>] tasklet_action+0x52/0x8e
>  [<c011b73f>] __do_softirq+0x46/0x99
>  [<c011b7bd>] do_softirq+0x2b/0x48
>  [<c0105bae>] do_IRQ+0x1f/0x25
>  [<c02624cf>] evtchn_do_upcall+0x4f/0x76
>  [<c01046bd>] hypervisor_callback+0x3d/0x48
>  [<c0106c2f>] safe_halt+0x6b/0x84
>  [<c0102bc1>] xen_idle+0x36/0x3d
>  [<c0102c78>] cpu_idle+0x2d/0x42
>  [<c03fa607>] start_kernel+0x26e/0x270
> Code: e0 04 03 44 24 74 f6 40 0c 01 74 3c 8b 44 24 58 6b d0 18 40 89 44 24 58 03 54 24 6c 66 83 7a 14 00 0f 84 94 00 00 00 8b 4c 24 2c <0f> bf 41 0c 57 50 68 f7 fb 36 c0 e8 ca b9 ea ff c7 44 24 34 ff
>  <0>Kernel panic - not syncing: Fatal exception in interrupt
> (XEN) Domain 0 crashed: rebooting machine in 5 seconds.
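One way to approach the disassembly question is to decode the bytes at the faulting EIP by hand. The `Code:` line marks the faulting instruction with `<0f>`, and the bytes `0f bf 41 0c` are, under standard IA-32 encoding, a `MOVSX r32, r/m16` with ModRM byte `0x41` followed by an 8-bit displacement `0x0c`. The sketch below only splits the ModRM byte and computes the resulting address; `decode_modrm` and `fault_address` are hypothetical helper names, not anything from the Xen or Linux trees.

```c
#include <assert.h>
#include <stdint.h>

/* Fields of an IA-32 ModRM byte: mod (bits 7-6), reg (bits 5-3), rm (bits 2-0). */
struct modrm {
    uint8_t mod;
    uint8_t reg;
    uint8_t rm;
};

/* Hypothetical helper: split a ModRM byte into its three fields. */
static struct modrm decode_modrm(uint8_t byte)
{
    struct modrm m;
    m.mod = (uint8_t)(byte >> 6);
    m.reg = (uint8_t)((byte >> 3) & 7);
    m.rm  = (uint8_t)(byte & 7);
    return m;
}

/* Hypothetical helper: for mod == 1 the operand is [base + sign-extended disp8]. */
static uint32_t fault_address(uint32_t base, int8_t disp8)
{
    return base + (uint32_t)(int32_t)disp8;
}
```

For `0x41` this gives mod = 1, reg = 0 (eax) and rm = 1 (ecx), so the instruction reads a 16-bit value from `[ecx + 0x0c]`. With `ecx: 00000000` in the register dump, that address is 0x0000000c, which matches the address reported in the oops.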
On Fri, Nov 24, 2006 at 07:16:13PM -0000, Ian Pratt wrote:
> > I am working on a network driver for mini-os, and after creating a lot
> > of semi-broken test domains, and a lot of vifs (80 or so), I got dom0 to
> > crash like below.
> >
> > I have no idea how to reproduce this, just wanted to let people know
> > there is some bug in netback somewhere.
>
> Please could you try and look at the disassembly and figure out which
> pointer dereference is causing the problem. Are you using rx-copy or
> rx-flip?

hi,

I am using rx-copy. The bug seems to stem from the first DPRINTK in
netbk_check_gop, where gop may be used uninitialized. Actually, gcc
already warns about this, and since most people will not have this
printk, it is probably harmless in real life.

Jacob

==============

static int netbk_check_gop(int nr_frags, domid_t domid,
                           struct netrx_pending_operations *npo)
{
        multicall_entry_t *mcl;
        gnttab_transfer_t *gop;
        gnttab_copy_t *copy_op;
        int status = NETIF_RSP_OKAY;
        int i;

        for (i = 0; i <= nr_frags; i++) {
                if (npo->meta[npo->meta_cons + i].copy) {
                        copy_op = npo->copy + npo->copy_cons++;
                        if (copy_op->status != GNTST_okay) {
                                DPRINTK("Bad status %d from copy to DOM%d.\n",
                                        gop->status, domid);
                                status = NETIF_RSP_ERROR;
                        }
                } else {
...
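A rough sketch of why reading gop->status through a zero pointer would fault at address 0xc, as the oops reports. It assumes gnttab_transfer_t is laid out as a frame number, a domain id, a grant reference and a 16-bit status, in that order, on 32-bit x86. The struct below is a stand-in written for illustration (with `uint32_t` in place of the 32-bit `unsigned long` frame number), and `status_address` is a hypothetical helper.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-in for gnttab_transfer_t on 32-bit x86 (assumed layout, not the real header). */
struct gnttab_transfer_sketch {
    uint32_t mfn;     /* frame number; unsigned long on 32-bit x86 */
    uint16_t domid;   /* domain id */
    uint32_t ref;     /* grant reference */
    int16_t  status;  /* result code, read by the DPRINTK */
};

/* Hypothetical helper: address touched when reading ->status through pointer value gop. */
static uintptr_t status_address(uintptr_t gop)
{
    return gop + offsetof(struct gnttab_transfer_sketch, status);
}
```

Under that assumed layout the status field sits at offset 12, so if the uninitialized gop happened to hold zero, the read lands on 0xc. A different garbage value would fault elsewhere or not at all, which may be part of why the crash is hard to reproduce.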
On Mon, 2006-11-27 at 12:36 +0100, Jacob Gorm Hansen wrote:
> I am using rx-copy. The bug seems to stem from the first DPRINTK in
> netbk_check_gop, where gop may be used uninitialized. Actually, gcc
> already warns about this, and since most people will not have this
> printk, it is probably harmless in real life.

>                         if (copy_op->status != GNTST_okay) {
>                                 DPRINTK("Bad status %d from copy to DOM%d.\n",
>                                         gop->status, domid);

That looks to me like a straight typo: it should be printing the value
of copy_op->status, not gop->status.

Does a change as simple as this need a patch sending in or will someone
in Xensource pick it up from here?

Kieran
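A minimal, self-contained sketch of the corrected branch, for illustration only. The constants and the `copy_op_sketch_t` type are stand-ins defined here so the fragment compiles outside the kernel, `check_copy_status` is a hypothetical name, and `fprintf` takes the place of DPRINTK. The only change relative to the quoted code is that the message prints copy_op->status.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Stand-in values for this sketch; the real ones come from the Xen headers. */
#define GNTST_okay       0
#define NETIF_RSP_OKAY   0
#define NETIF_RSP_ERROR (-1)

/* Stand-in for gnttab_copy_t: only the field the check needs. */
typedef struct {
    int16_t status;
} copy_op_sketch_t;

/* Hypothetical helper mirroring the copy branch of netbk_check_gop. */
static int check_copy_status(const copy_op_sketch_t *copy_op, uint16_t domid)
{
    int status = NETIF_RSP_OKAY;

    assert(copy_op != NULL);
    if (copy_op->status != GNTST_okay) {
        /* Report the status of the operation that was actually checked. */
        fprintf(stderr, "Bad status %d from copy to DOM%u.\n",
                copy_op->status, (unsigned)domid);
        status = NETIF_RSP_ERROR;
    }
    return status;
}
```

Because the message now reads the same pointer the condition tests, the transfer pointer is never touched on the copy path.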
On 27/11/06 12:31, "Kieran Mansley" <kmansley@solarflare.com> wrote:

> That looks to me like a straight typo: it should be printing the value
> of copy_op->status not gop->status
>
> Does a change as simple as this need a patch sending in or will someone
> in Xensource pick it up from here?

I fixed it.

 K.