Kirk Allan
2006-May-25 15:11 UTC
[Xen-devel] netfront.c: gnttab_query_foreign_access returns non zero in network_tx_buf_gc
I've been working from the netfront.c in the testing tree, using SLES
10 RC1 for i386 on an SMP box. When I stress the network using iperf in
a domU (the domU acting as client on a gigabit network), I occasionally
get a panic at the dev_kfree_skb_irq(skb); line. This is the same panic
as reported in
http://lists.xensource.com/archives/html/xen-devel/2006-05/msg00919.html

The trace indicates that the skb is bad, and it looks like the skb is
an id. Investigating further, the condition occurs if
gnttab_query_foreign_access returns non-zero on a second or later
iteration through the for loop. If it returns non-zero, the code takes
the 'goto out', which bypasses fixing up np->tx.rsp_cons. Then the next
time in network_tx_buf_gc we reuse np->tx.rsp_cons, which is at the
location of a previously completed skb, and the skb gets an id and not
an skb.

Looking at the unstable tree, the goto has been removed and replaced
with a break. However, it looks like if gnttab_query_foreign_access
returns non-zero between np->tx.rsp_cons and prod, then the
np->tx.rsp_cons = prod; could advance np->tx.rsp_cons too far, causing
other problems later (I have not tested this yet, though).

The problem I'm having is that I can't find the root cause of why
gnttab_query_foreign_access returns an 8 (GTF_reading?) and not 0. I've
looked in netback.c and xen/common/grant_table.c and am not seeing it
(not that it's not there).

Thanks for any help and understanding.

Kirk
_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel
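The two bail-out behaviours described in this message can be compared in a small stand-alone model. This is only a sketch under stated assumptions: `tx_model`, `tx_gc`, `run_scenario`, `bail_policy` and `NR_IDS` are hypothetical names invented for illustration, the "skb pointers" are fake integers, and the freelist handling only imitates the idea that a freed slot holds an id instead of a pointer. It is not the netfront.c code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define NR_IDS 4  /* hypothetical ring size for the model */

enum bail_policy { BAIL_GOTO_OUT, BAIL_BREAK };

/* Hypothetical stand-in for the netfront tx state; not the real structures. */
struct tx_model {
    unsigned  rsp_cons;               /* next response to consume */
    unsigned  rsp_prod;               /* responses published by the backend */
    unsigned  resp_id[NR_IDS];        /* id carried by response i */
    uintptr_t slot[NR_IDS + 1];       /* skb pointer, or a freelist id once freed;
                                         slot[0] is the freelist head */
    bool      grant_busy[NR_IDS + 1]; /* models a non-zero gnttab_query_foreign_access() */
    unsigned  freed;                  /* genuine skb pointers released */
    unsigned  bad_frees;              /* freelist ids mistaken for skb pointers */
};

/* In this model, any value small enough to be an id is not a pointer. */
static bool looks_like_id(uintptr_t v) { return v <= NR_IDS; }

static void tx_gc(struct tx_model *m, enum bail_policy policy)
{
    unsigned prod = m->rsp_prod;

    for (unsigned i = m->rsp_cons; i != prod; i++) {
        unsigned id = m->resp_id[i];
        assert(id >= 1 && id <= NR_IDS);
        uintptr_t skb = m->slot[id];

        if (m->grant_busy[id]) {
            if (policy == BAIL_GOTO_OUT)
                return;   /* rsp_cons not updated: freed entries get revisited */
            break;        /* rsp_cons = prod below: unprocessed entries are skipped */
        }

        /* Return the id to the freelist; the slot now holds an id. */
        m->slot[id] = m->slot[0];
        m->slot[0] = id;

        if (looks_like_id(skb))
            m->bad_frees++;   /* the real driver would pass this to dev_kfree_skb_irq() */
        else
            m->freed++;
    }
    m->rsp_cons = prod;
}

/* Four responses, the third with a busy grant; gc runs twice. */
static struct tx_model run_scenario(enum bail_policy policy)
{
    struct tx_model m = { .rsp_prod = NR_IDS };

    for (unsigned i = 0; i < NR_IDS; i++) {
        m.resp_id[i] = i + 1;
        m.slot[i + 1] = (uintptr_t)0x1000 * (i + 1);  /* fake skb pointers */
    }
    m.grant_busy[3] = true;   /* third response: grant still in use */
    tx_gc(&m, policy);
    m.grant_busy[3] = false;  /* backend lets go; gc runs again */
    tx_gc(&m, policy);
    return m;
}
```

In this model, `run_scenario(BAIL_GOTO_OUT)` ends with `bad_frees == 2`, because the first two entries are processed a second time after their slots have become freelist ids. `run_scenario(BAIL_BREAK)` ends with `freed == 2` and `rsp_cons == 4`, so the last two responses are never reclaimed.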
Steven Hand
2006-May-25 16:37 UTC
Re: [Xen-devel] netfront.c: gnttab_query_foreign_access returns nonzero in network_tx_buf_gc
> I've been working from the netfront.c in the testing tree and using SLES
> 10 RC1 for i386 on a SMP box. When I stress the network using iperf in
> a domU, domU acting as client on a gigabit network, I occasionally get a
> panic at the dev_kfree_skb_irq(skb); line. This is the same panic as
> reported in
> http://lists.xensource.com/archives/html/xen-devel/2006-05/msg00919.html
>
> The trace indicates that the skb is bad and it looks like the skb is
> an id. Investigating further, the condition occurs if the
> gnttab_query_foreign_access returns non zero on a second or later
> iteration through the for loop. If it returns non zero, the code
> takes the 'goto out' which bypasses fixing up np->tx.rsp_cons. Then
> the next time in network_tx_buf_gc we reuse np->tx.rsp_cons which is at
> the location of a previously completed skb and the skb gets an id and
> not a skb.
>
> Looking at the unstable tree, the goto has been removed and replaced
> with a break. However, it looks like if gnttab_query_foreign_access
> returns non zero between np->tx.rsp_cons and prod, then the
> np->tx.rsp_cons = prod; could advance np->tx.rsp_cons too far causing
> other problems later (I have not tested this yet though).

Yes, this definitely looks like a bug; the 'break' in -unstable is not
really much better than the 'goto out:' in -testing, since in either
case we can't easily recover correctly.

> The problem I'm having is that I can't find the root cause as to why
> gnttab_query_foreign_access returns an 8 (GTF_reading?) and not 0. I've
> looked in netback.c and xen/common/grant_table.c and am not seeing
> it (not that it's not there).

Well, all this means is that netback is still using the grant, which
should of course be impossible since the ring pointers have been
advanced. I.e. something is borked.

Can you try this with a debug build of xen? It would be interesting to
see if xen complains about any grant refs prior to this occurrence...

cheers,

S.
Kirk Allan
2006-May-25 19:58 UTC
Re: [Xen-devel] netfront.c: gnttab_query_foreign_access returns nonzero in network_tx_buf_gc
>>> On Thu, May 25, 2006 at 10:37 AM, in message <04b301c68019$96989e80$0302a8c0@Violet>, "Steven Hand" <steven.hand@cl.cam.ac.uk> wrote:
>> I've been working from the netfront.c in the testing tree and using SLES
>> 10 RC1 for i386 on a SMP box. When I stress the network using iperf in
>> a domU, domU acting as client on a gigabit network, I occasionally get a
>> panic at the dev_kfree_skb_irq(skb); line. This is the same panic as
>> reported in
>> http://lists.xensource.com/archives/html/xen-devel/2006-05/msg00919.html
>>
>> The trace indicates that the skb is bad and it looks like the skb is
>> an id. Investigating further, the condition occurs if the
>> gnttab_query_foreign_access returns non zero on a second or later
>> iteration through the for loop. If it returns non zero, the code
>> takes the 'goto out' which bypasses fixing up np->tx.rsp_cons. Then
>> the next time in network_tx_buf_gc we reuse np->tx.rsp_cons which is at
>> the location of a previously completed skb and the skb gets an id and
>> not a skb.
>>
>> Looking at the unstable tree, the goto has been removed and replaced
>> with a break. However, it looks like if gnttab_query_foreign_access
>> returns non zero between np->tx.rsp_cons and prod, then the
>> np->tx.rsp_cons = prod; could advance np->tx.rsp_cons too far causing
>> other problems later (I have not tested this yet though).
>
> Yes, this definitely looks like a bug; the 'break' in -unstable is not
> really much better than the 'goto out:' in -testing, since in either
> case we can't easily recover correctly.
>
>> The problem I'm having is that I can't find the root cause as to why
>> gnttab_query_foreign_access returns an 8 (GTF_reading?) and not 0. I've
>> looked in netback.c and xen/common/grant_table.c and am not seeing
>> it (not that it's not there).
>
> Well, all this means is that netback is still using the grant, which
> should of course be impossible since the ring pointers have been
> advanced. I.e. something is borked.
>
> Can you try this with a debug build of xen? It would be interesting to
> see if xen complains about any grant refs prior to this occurrence...
>
> cheers,
>
> S.

Here's the serial output from a debug build of xen. The domain_crash
does not happen on the non-debug xen.

.
.
.
(XEN) (file=memory.c, line=64) Could not allocate order=0 extent: id=0 flags=0 (61 of 64)
(XEN) (file=memory.c, line=64) Could not allocate order=0 extent: id=0 flags=0 (59 of 64)
(XEN) (file=memory.c, line=64) Could not allocate order=0 extent: id=0 flags=0 (61 of 64)
(XEN) (file=memory.c, line=64) Could not allocate order=0 extent: id=0 flags=0 (58 of 64)
(XEN) (file=memory.c, line=64) Could not allocate order=0 extent: id=0 flags=0 (61 of 64)
(XEN) (file=memory.c, line=64) Could not allocate order=0 extent: id=0 flags=0 (60 of 64)
(XEN) DOM0: (file=mm.c, line=2449) PTE entry 0 for address f2c81000 doesn't match frame 7a568
(XEN) DOM0: (file=mm.c, line=637) Attempt to implicitly unmap a granted PTE 4b2fe861
(XEN) domain_crash called from mm.c:638
(XEN) Domain 0 (vcpu#1) crashed on cpu#1:
(XEN) ----[ Xen-3.0.2_09668-0.1 Not tainted ]----
(XEN) CPU: 1
(XEN) EIP: 0061:[<c0101287>]
(XEN) EFLAGS: 00200212 CONTEXT: guest
(XEN) eax: 00000014 ebx: 00000000 ecx: f4c312c0 edx: 00000001
(XEN) esi: f2881f34 edi: f4c2eb9c ebp: f364e408 esp: f2881ee4
(XEN) cr0: 80050033 cr3: 79a7d000
(XEN) ds: 007b es: 007b fs: 0000 gs: 0033 ss: 0069 cs: 0061
(XEN) Guest stack trace from esp=f2881ee4:
(XEN) f4c24761 f37a2380 f37a2000 f3f02180 f37a2000 f3f02180 c1658c20 f2881f28
(XEN) c014a91a 00483f02 c032c900 00000001 00000001 00915700 c0c78380 00000000
(XEN) 00483f01 00000027 0003013e 05ea0020 f4c28c40 f2880000 00000000 c03acd10
(XEN) c0123a41 00000001 c036e128 f2880000 c03ab180 c0123555 c03ade60 00000007
(XEN) 00000001 f2880000 00000001 fbdf7000 00000020 c0123665 00000013 f2881fbc
(XEN) c01068cc 00000000 c017b610 00000000 00000000 c024d5b1 00000020 00000000
(XEN) b7b8d8d9 08315c88 bfdd38ac bfdd3848 c0105138 f2881fbc b7b8d8d9 00800d4a
(XEN) bfdd3b40 08315c88 bfdd38ac bfdd3848 08315c88 0000007b 0000007b ffffffec
(XEN) b7b8bbba 00000073 00200286 bfdd37bc 0000007b 00000008 0000240b
(XEN) Domain 0 crashed: rebooting machine in 5 seconds.

On a non-debug xen I see destroy_grant_host_mapping failing with rc
0xffffffff, domain 2, ref 0x20, flags 6 in __gnttab_unmap_grant_ref in
xen/common/grant_table.c. Also, in net_tx_action_dealloc in netback.c,
the HYPERVISOR_grant_table_op call succeeds, but if you look at the
status of each of the gnttab_unmap_grant_ref_t entries, there is one
with 0xffffffff.
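The last observation, that the batched call as a whole succeeds while one entry carries an error status, suggests scanning every entry after the call. A minimal sketch of that scan follows, assuming a simplified stand-in structure: `unmap_op_model`, `MODEL_STATUS_OKAY` and `count_failed_unmaps` are hypothetical names for illustration, and the field layout is not claimed to match gnttab_unmap_grant_ref_t.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MODEL_STATUS_OKAY 0  /* assumed success value for this model */

/* Hypothetical mirror of one unmap request; the layout is illustrative only. */
struct unmap_op_model {
    uint64_t host_addr;
    uint32_t handle;
    int16_t  status;   /* filled in per entry once the batch has been processed */
};

/*
 * Count entries whose status is not MODEL_STATUS_OKAY.
 * *first_bad receives the index of the first failure, or n if none failed.
 */
static size_t count_failed_unmaps(const struct unmap_op_model *ops, size_t n,
                                  size_t *first_bad)
{
    size_t failed = 0;

    assert(ops != NULL && first_bad != NULL);
    *first_bad = n;
    for (size_t i = 0; i < n; i++) {
        if (ops[i].status != MODEL_STATUS_OKAY) {
            if (failed == 0)
                *first_bad = i;   /* remember where the batch first went wrong */
            failed++;
        }
    }
    return failed;
}
```

A caller would run this over the array it passed to the batched operation and log or act on the failing index, since a zero return from the batch call alone says nothing about the individual entries.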
Keir Fraser
2006-May-25 20:05 UTC
Re: [Xen-devel] netfront.c: gnttab_query_foreign_access returns nonzero in network_tx_buf_gc
On 25 May 2006, at 20:58, Kirk Allan wrote:

> Here's the serial output from a debug build of xen. The domain_crash
> does not happen on the non-debug xen.

So the problem is in the netback driver, or in the map/unmap grant-table
logic in Xen. The problem you saw in netfront is simply a knock-on
symptom (given netfront's current inability to recover, the bail path
there could more usefully be BUG(), I suppose).

For some reason Xen has decided that the PTE that netback supplies when
it tries to unmap a grant reference does not match up with the memory
page that the grant reference grants access to.

 -- Keir
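The mismatch described here can be illustrated with the numbers from the debug log. The sketch below assumes 32-bit non-PAE page-table entries with 4 KiB pages, and it assumes that the first number in the "PTE entry 0 ... doesn't match frame 7a568" message is the PTE value. The helper names (`pte_frame`, `pte_flags`, `pte_maps_frame`) are hypothetical, and the present-bit test is part of this model, not a statement about what Xen's mm.c checks.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define MODEL_PAGE_SHIFT 12u  /* assumes 4 KiB pages */

/* Frame number held in the upper bits of a 32-bit PTE. */
static uint32_t pte_frame(uint32_t pte)
{
    return pte >> MODEL_PAGE_SHIFT;
}

/* Flag bits held in the low 12 bits of a 32-bit PTE. */
static uint32_t pte_flags(uint32_t pte)
{
    return pte & ((1u << MODEL_PAGE_SHIFT) - 1u);
}

/* True if the PTE is present and points at the expected frame. */
static bool pte_maps_frame(uint32_t pte, uint32_t frame)
{
    assert(frame <= 0xfffffu);        /* a 32-bit PTE holds a 20-bit frame number */
    bool present = (pte & 1u) != 0;   /* bit 0 is the x86 present bit */
    return present && pte_frame(pte) == frame;
}
```

Under these assumptions, `pte_maps_frame(0, 0x7a568)` is false, which corresponds to the first DOM0 message, and the PTE 4b2fe861 from the second message decodes to frame 0x4b2fe with flag bits 0x861.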