Hey all,

What is the 'veth0' interface that is created in -unstable?

I just upgraded from RH's snapshot of 3.0 in FC4 to the current -unstable (with the 2.6.12 patches - they seem to work fine); however, when I bring up a domU, veth0 is brought up, and my network traffic dies. I also still have a 'xen-br0' interface, and the normal 'eth0' interface.

Just wondering what veth0 is supposed to be, and the best way to debug this.

Thanks!

------------------------------------------------------------------------
| nate carlson | natecars@natecarlson.com | http://www.natecarlson.com |
|       depriving some poor village of its idiot since 1981            |
------------------------------------------------------------------------
_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel
On 8 Jul 2005, at 17:29, Nate Carlson wrote:
> Just wondering what veth0 is supposed to be, and the best way to debug
> this.

veth0 serves two purposes. First, it correctly provides checksum-avoidance across the Ethernet bridge. Second, it ensures that locally-delivered packets coming from other domains on the same box are copied into local buffers. This is actually quite important: otherwise we end up tying up network buffers belonging to a domU for unbounded time (buffers may wait in socket queues in dom0, for example) and also tying up resources in dom0's netback driver.

We have had reports of problems with veth0 though -- possibly the setup script (/etc/xen/scripts/network iirc) is broken for some people.

 -- Keir
_______________________________________________
On Fri, 8 Jul 2005, Keir Fraser wrote:
> veth0 serves two purposes. First, it correctly provides
> checksum-avoidance across the Ethernet bridge. Second, it ensures that
> locally-delivered packets coming from other domains on the same box are
> copied into local buffers. This is actually quite important: otherwise
> we end up tying up network buffers belonging to a domU for unbounded
> time (buffers may wait in socket queues in dom0 for example) and also
> tying up resources in dom0's netback driver.
>
> We have had reports of problems with veth0 though -- possibly the setup
> script (/etc/xen/scripts/network iirc) is broken for some people.

OK - so for a bridging setup, on d0, as I understand it from that script, we're supposed to have 4(!) interfaces now:

eth0
veth0
vif0.0
xen-br0

One thing I find odd is that it assigns the IP address to veth0, but vif0.0 is the one that gets added to the bridge with eth0 - is that the expected behavior?

How does veth0/vif0.0 relate to the physical device? Is it just another entirely "virtual" device, or should I be able to take eth0 down, assign the IP to veth0, and have things work?

------------------------------------------------------------------------
| nate carlson | natecars@natecarlson.com | http://www.natecarlson.com |
|       depriving some poor village of its idiot since 1981            |
------------------------------------------------------------------------
_______________________________________________
On Fri, 8 Jul 2005, Nate Carlson wrote:
> OK - so for a bridging setup, on d0, as I understand it from that
> script, we're supposed to have 4(!) interfaces now:
>
> eth0
> veth0
> vif0.0
> xen-br0
>
> One thing I find odd is that it assigns the IP address to veth0, but
> vif0.0 is the one that gets added to the bridge with eth0 - is that the
> expected behavior?
>
> How does veth0/vif0.0 relate to the physical device? Is it just another
> entirely "virtual" device, or should I be able to take eth0 down, assign
> the IP to veth0, and have things work?

Interesting - the default network script does assign veth0 on bootup, and everything works great - until I start up a guest domain. Anything I can do to troubleshoot that?

------------------------------------------------------------------------
| nate carlson | natecars@natecarlson.com | http://www.natecarlson.com |
|       depriving some poor village of its idiot since 1981            |
------------------------------------------------------------------------
_______________________________________________
> OK - so for a bridging setup, on d0, as I understand it from that script,
> we're supposed to have 4(!) interfaces now:
>
> eth0
> veth0
> vif0.0
> xen-br0
>
> One thing I find odd is that it assigns the IP address to veth0, but
> vif0.0 is the one that gets added to the bridge with eth0 - is that the
> expected behavior?
>
> How does veth0/vif0.0 relate to the physical device? Is it just another
> entirely "virtual" device, or should I be able to take eth0 down, assign
> the IP to veth0, and have things work?

veth0/vif0.0 is basically a two-interface (back-to-back) loopback device. Packets transmitted on one interface are received on the other (with appropriate copying and checksum-offload twiddling).

 -- Keir
_______________________________________________
> > How does veth0/vif0.0 relate to the physical device? Is it just another
> > entirely "virtual" device, or should I be able to take eth0 down, assign
> > the IP to veth0, and have things work?
>
> Interesting - the default network script does assign veth0 on bootup, and
> everything works great - until I start up a guest domain. Anything I can
> do to troubleshoot that?

This might have something to do with general networking problems we've had this week with domUs. Bringing up a domU shouldn't affect dom0's access to the outside world unless there is some nasty bug. We thought this bug had maybe disappeared, but perhaps it is still alive and well. :-(

At least we think it was added this week, so there's not too much changeset history to wade through.

 -- Keir
_______________________________________________
On Fri, 8 Jul 2005, Keir Fraser wrote:
> This might have something to do with general networking problems we've
> had this week with domU's. Bringing up a domU shouldn't affect dom0's
> access to the outside world unless there is some nasty bug. We thought
> this bug had maybe disappeared but perhaps it is still alive and well.
> :-(

Must be a nasty bug. :/ Anything I can do to help debug the bug?

> At least we think it was added this week, so there's not too much
> changeset history to wade through.

That's helpful, at least! :)

------------------------------------------------------------------------
| nate carlson | natecars@natecarlson.com | http://www.natecarlson.com |
|       depriving some poor village of its idiot since 1981            |
------------------------------------------------------------------------
_______________________________________________
Keir Fraser wrote:
>>>How does veth0/vif0.0 relate to the physical device? Is it just another
>>>entirely "virtual" device, or should I be able to take eth0 down, assign
>>>the IP to veth0, and have things work?
>>
>>Interesting - the default network script does assign veth0 on bootup, and
>>everything works great - until I start up a guest domain. Anything I can
>>do to troubleshoot that?
>
> This might have something to do with general networking problems we've
> had this week with domU's. Bringing up a domU shouldn't affect dom0's
> access to the outside world unless there is some nasty bug. We thought
> this bug had maybe disappeared but perhaps it is still alive and
> well. :-(

Nah, a colleague ran into it on yesterday's bits, and I'm trying to debug.

> At least we think it was added this week, so there's not too much
> changeset history to wade through.

Sounds similar to a few problems reported in older posts, so I think it might go back a bit, but if not, that's good to know.

thanks,
Nivedita
_______________________________________________
On Fri, 8 Jul 2005, Nate Carlson wrote:
> Anything I can do to help debug the bug?

One data point, at least - I reconfigured my test host with the old network script (that just puts the IP on xen-br0); I get the same behavior with that. (Bring up domU, networking works for about 2 pings, then dies.)

------------------------------------------------------------------------
| nate carlson | natecars@natecarlson.com | http://www.natecarlson.com |
|       depriving some poor village of its idiot since 1981            |
------------------------------------------------------------------------
_______________________________________________
On Fri, 8 Jul 2005, Nivedita Singhvi wrote:
>> This might have something to do with general networking problems we've
>> had this week with domU's. Bringing up a domU shouldn't affect dom0's
>> access to the outside world unless there is some nasty bug. We thought
>> this bug had maybe disappeared but perhaps it is still alive and
>> well. :-(
>
> Nah, a colleague ran into it on yesterday's bits, and I'm
> trying to debug.
>
>> At least we think it was added this week, so there's not too much
>> changeset history to wade through.
>
> Sounds similar to a few problems reported in older posts, so I
> think it might go back a bit, but if not, that's good to know.

Hey all,

Has anyone run down what the root of this is yet? Is there anything a non-programmer like myself can do to help?

------------------------------------------------------------------------
| nate carlson | natecars@natecarlson.com | http://www.natecarlson.com |
|       depriving some poor village of its idiot since 1981            |
------------------------------------------------------------------------
_______________________________________________
Nate Carlson <natecars@natecarlson.com> writes:
> Has anyone run down what the root of this is yet?

I ran into this as well. I think there is a second bug too; see the comments in the log below.

Network setup is the "classic" one, with the bridge being configured as the network device; veth0/vif0.0 is unused. "eth0" is the bridge, "hw-eth0" the network card.

master-xen login: root
Password:
Last login: Thu Jul 14 07:34:27 from eskarina.ber.suse.de
Have a lot of fun...
SuSE Linux 9.3 (i586)
SysRq : Changing Loglevel
Loglevel set to 9
master-xen root ~# device vif1.0 entered promiscuous mode
eth0: port 2(vif1.0) entering learning state
(XEN) (file=traps.c, line=872) Non-priv domain attempted RDMSR(c0000080,00000000,20100000).
(XEN) (file=traps.c, line=864) Non-priv domain attempted WRMSR(c0000080,00000800,00000000).
eth0: topology change detected, propagating
eth0: port 2(vif1.0) entering forwarding state

[ Note #1: That was the initial domU boot. fsck asked for a manual run due to an unclean filesystem from the previous crash, so I did that and rebooted. ]

device vif1.0 left promiscuous mode
eth0: port 2(vif1.0) entering disabled state
eth0: port 2(vif1.0) entering disabled state
device vif1.0 entered promiscuous mode
eth0: port 2(vif1.0) entering learning state
(XEN) (file=traps.c, line=872) Non-priv domain attempted RDMSR(c0000080,00000000,20100000).
(XEN) (file=traps.c, line=864) Non-priv domain attempted WRMSR(c0000080,00000800,00000000).
eth0: port 2(vif1.0) entering disabled state

[ Note #2: DomU comes up fine now, but without functional network. ]

ip link ls vif1.0
7: vif1.0: <BROADCAST,MULTICAST> mtu 1500 qdisc noqueue
    link/ether fe:ff:ff:ff:ff:ff brd ff:ff:ff:ff:ff:ff

[ Note #3: Hmm, the virtual bridge port is down. That shouldn't be that way, should it? Fixed up manually. Shortly thereafter the machine dies; it must be one of the first network packets from domU which kills it. Full oops log below. ]

master-xen root ~# ip link set vif1.0 up
eth0: port 2(vif1.0) entering learning state
master-xen root ~# eth0: topology change detected, propagating
eth0: port 2(vif1.0) entering forwarding state
general protection fault: 0000 [#1]
Modules linked in:
CPU:    0
EIP:    0061:[<c02f0dad>]    Not tainted VLI
EFLAGS: 00010213   (2.6.12-xen0-hg64f26eed8d473a96beab96162c230f1300539d7c)
EIP is at skb_release_data+0x54/0xe2
eax: dd0c4080   ebx: 00000000   ecx: 00000002   edx: ffffffff
esi: dbcdf580   edi: 00000012   ebp: 0000003c   esp: c0453c68
ds: 007b   es: 007b   ss: 0069
Process swapper (pid: 0, threadinfo=c0452000 task=c03c4500)
Stack: dd0c4000 00000000 00000000 dd553b80 dbcdf580 dbcdf580 c02f0e4b dbcdf580
       dd553b80 00000000 c02f0f32 dbcdf580 0081f992 dbcdf580 dc56ee20 dbcdf580
       dc56ee20 c0274685 dbcdf580 00000002 00000000 38704032 0000003c 00000000
Call Trace:
 [<c02f0e4b>] kfree_skbmem+0x10/0x26
 [<c02f0f32>] __kfree_skb+0xd1/0xdd
 [<c0274685>] net_rx_action+0x3e3/0x4b3
 [<c0125d5c>] update_process_times+0x130/0x140
 [<c011e3bd>] profile_tick+0x4e/0x5a
 [<c0107b81>] xen_idle+0x45/0x4c
 [<c010b6ea>] __get_time_values_from_xen+0x6a/0x6b
 [<c010bf44>] timer_interrupt+0x39/0x4ca
 [<c013d4a7>] mempool_alloc_slab+0x17/0x1b
 [<c02084a2>] __delay+0x12/0x16
 [<c0208524>] __const_udelay+0x25/0x29
 [<c029a196>] ata_exec_command_pio+0x27/0x2b
 [<c029a1f1>] ata_exec_command+0x2b/0x2f
 [<c013d4c2>] mempool_free_slab+0x17/0x25
 [<c01196ce>] recalc_task_prio+0x141/0x151
 [<c02f0e5c>] kfree_skbmem+0x21/0x26
 [<c02f0e35>] skb_release_data+0xdc/0xe2
 [<c02f0e5c>] kfree_skbmem+0x21/0x26
 [<c02f0f32>] __kfree_skb+0xd1/0xdd
 [<c02f6c95>] dev_queue_xmit+0x291/0x2a7
 [<c033ae64>] packet_rcv_spkt+0x212/0x21f
 [<c02f0f5e>] skb_clone+0x20/0x191
 [<c02f71fd>] netif_receive_skb+0x20c/0x24b
 [<c033dfdf>] br_pass_frame_up_finish+0xf/0x18
 [<c033e00d>] br_pass_frame_up+0x25/0x29
 [<c033e0c7>] br_handle_frame_finish+0xb6/0x120
 [<c033e26a>] br_handle_frame+0x139/0x17f
 [<c01254db>] __mod_timer+0xb1/0xd7
 [<c02f0c02>] alloc_skb_from_cache+0x51/0x141
 [<c0269fb2>] e100_poll+0xe6/0x87e
 [<c01221d4>] tasklet_action+0x8b/0xca
 [<c0121edb>] __do_softirq+0x4b/0x9e
 [<c0121f5a>] do_softirq+0x2c/0x45
 [<c012200a>] irq_exit+0x29/0x2a
 [<c010e002>] do_IRQ+0x22/0x28
 [<c01062e6>] evtchn_do_upcall+0x66/0x8e
 [<c0109dc8>] hypervisor_callback+0x2c/0x34
 [<c0107b81>] xen_idle+0x45/0x4c
 [<c0107bc4>] cpu_idle+0x3c/0x4a
 [<c022bf06>] acpi_enable_subsystem+0x29/0x55
 [<c0105024>] _stext+0x24/0x28
 [<c010505a>] init+0x0/0xfa
 [<c045484a>] start_kernel+0x1ca/0x1d1
 [<c045432f>] unknown_bootoption+0x0/0x23e
Code: 89 c1 0f c1 02 01 c8 85 c0 0f 85 a4 00 00 00 8b 96 94 00 00 00 89 d0 83 7a 04 00 74 74 bb 00 00 00 00 3b 5a 04 73 6a 8b 54 d8 10 <8b> 02 f6 c4 08 75 53 8b 42 04 83 f8 ff 75 35 c7 44 24 0c 99 71
 <0>Kernel panic - not syncing: Fatal exception in interrupt
 (XEN) Domain 0 shutdown: rebooting machine.

The faulting instruction is this:

c02f0d9f:  bb 00 00 00 00   mov    $0x0,%ebx
c02f0da4:  3b 5a 04         cmp    0x4(%edx),%ebx
c02f0da7:  73 6a            jae    c02f0e13 <skb_release_data+0xba>
c02f0da9:  8b 54 d8 10      mov    0x10(%eax,%ebx,8),%edx
c02f0dad:  8b 02            mov    (%edx),%eax           <= HERE

That should be this loop here:

void skb_release_data(struct sk_buff *skb)
[ ... ]
        for (i = 0; i < skb_shinfo(skb)->nr_frags; i++)
                put_page(skb_shinfo(skb)->frags[i].page);

ebx is the loop count and is zero, so it's the first time we enter the loop. skb_shinfo(skb)->frags[0].page is loaded into edx. It is 0xffffffff (-1?). Trying to dereference edx faults because it points into Xen's memory area ...

So the question is: why the heck is the struct page pointer -1 at this point?

  Gerd
_______________________________________________
Gerd Knorr <kraxel@suse.de> writes:
>         for (i = 0; i < skb_shinfo(skb)->nr_frags; i++)
>                 put_page(skb_shinfo(skb)->frags[i].page);
>
> ebx is the loop count and is zero, so it's the first time we enter the
> loop. skb_shinfo(skb)->frags[0].page is loaded into edx. It is
> 0xffffffff (-1?). Trying to dereference edx faults because it points
> into xen's memory area ...
>
> So the question is why the heck the struct page pointer is -1 at this
> point?

Hmm, added a quick check to the most obvious place; that doesn't trigger though, so it must happen somewhere else ...

Ideas anyone?

  Gerd

Index: linux-2.6.12-work/drivers/xen/netback/netback.c
===================================================================
--- linux-2.6.12-work.orig/drivers/xen/netback/netback.c	2005-07-14 16:41:22.000000000 +0200
+++ linux-2.6.12-work/drivers/xen/netback/netback.c	2005-07-14 16:43:18.000000000 +0200
@@ -626,6 +626,7 @@ static void net_tx_action(unsigned long
         /* Append the packet payload as a fragment. */
         skb_shinfo(skb)->frags[0].page =
             virt_to_page(MMAP_VADDR(pending_idx));
+        BUG_ON((void*)-1 == skb_shinfo(skb)->frags[0].page);
         skb_shinfo(skb)->frags[0].size = txreq.size - data_len;
         skb_shinfo(skb)->frags[0].page_offset =
             (txreq.addr + data_len) & ~PAGE_MASK;
_______________________________________________
On 14 Jul 2005, at 15:58, Gerd Knorr wrote:
>> So the question is why the heck the struct page pointer is -1 at this
>> point?
>
> Hmm, added a quick check to the most obvious place, that doesn't
> trigger though, so it must happen somewhere else ...
>
> Ideas anyone?

I think most likely there's random memory corruption going on, and this must be how it is currently manifesting itself on your system. :-(

We also see all kinds of random crap going on in the dom0 netback driver, where it looks like data pointers may be overwritten with garbage.

 -- Keir
_______________________________________________
On Thu, Jul 14, 2005 at 04:12:51PM +0100, Keir Fraser wrote:
> I think most likely there's random memory corruption going on, and this
> must be how it is currently manifesting itself on your system. :-(

I don't really think it is all that random, though: Kurt Garloff and the IBM daily builds report the same problems. I still have no clue how to reproduce it; I even installed a SLES9 rescue image in case that was related.

However, I can get this one on dom0 quite easily as soon as I start a domU:

recvmsg bug: copied C1811631 seq C1811661
KERNEL: assertion (flags & MSG_PEEK) failed at net/ipv4/tcp.c (1295)

-- 
Vincent Hanquez
_______________________________________________
Vincent Hanquez wrote:
> On Thu, Jul 14, 2005 at 04:12:51PM +0100, Keir Fraser wrote:
>
>>I think most likely there's random memory corruption going on, and this
>>must be how it is currently manifesting itself on your system. :-(
>
> I don't really think it is all that random, though: Kurt Garloff and
> the IBM daily builds report the same problems. I still have no clue
> how to reproduce it; I even installed a SLES9 rescue image in case
> that was related.
>
> However, I can get this one on dom0 quite easily as soon as I start a domU:
>
> recvmsg bug: copied C1811631 seq C1811661
> KERNEL: assertion (flags & MSG_PEEK) failed at net/ipv4/tcp.c (1295)

The above used to be caused by a problem that was fixed in mainline (an e100 NAPI driver issue), but 2.6.12 should have that fix. Anything else corrupting the TCP socket data will provoke the same message, though; still looking...

thanks,
Nivedita
_______________________________________________
On Thu, Jul 14, 2005 at 08:44:21AM -0700, Nivedita Singhvi wrote:
> The above used to be caused by a problem that was fixed (e100 NAPI
> driver issue) in mainline, but 2.6.12 should have that fix.

The network card is a tg3 anyway.

-- 
Vincent Hanquez
_______________________________________________