Hi all,

I have a strange network problem with some domUs on three Xen hosts.
They are losing their network connectivity. I use bridged networking.

* It happens randomly and can happen right after bootup of the domU or
  anytime later.
* The domU is not reachable from another host on the LAN.
* The domU is always reachable from the dom0 (ssh, ping).
* I can 'repair' the connection by attaching to the console and pinging
  out from the domU. First nothing happens, then the machine gets its
  network back. (That's also my workaround for the moment: pinging all
  the time from the console.)
* Pinging from another host at the same time helps too.
* It can happen that I can ping continuously from one host while another
  host gets only every 10th packet or so back.
* The interfaces sometimes come back from their sleep by themselves.
* When the network has fallen asleep, ssh to the domU from another host
  hangs; it does not come back with "no route to host" or similar.

I'm suspicious about the network controllers; they are the same on all
hosts: "Intel Corporation 80003ES2LAN Gigabit Ethernet Controller
(Copper)" (lspci), some kind of "Intel® PRO/1000 EB Network Connection
with I/O Acceleration" (Intel website). I've tried the latest e1000
driver from Intel but it didn't help.
I've checked all MAC addresses, they are unique, and so are the IP
addresses.

Any ideas are welcome :)

-------------------------------------------------------------------------
"xm info" from host1, openSUSE 10.2 (X86-64):

release                : 2.6.18.8-0.9-xen
version                : #1 SMP Sun Feb 10 22:48:05 UTC 2008
machine                : x86_64
nr_cpus                : 4
nr_nodes               : 1
sockets_per_node       : 2
cores_per_socket       : 2
threads_per_core       : 1
cpu_mhz                : 2327
hw_caps                : bfebfbff:20100800:00000000:00000140:0004e3bd:00000000:00000001
total_memory           : 32766
free_memory            : 21607
max_free_memory        : 21607
max_para_memory        : 21603
max_hvm_memory         : 21544
xen_major              : 3
xen_minor              : 0
xen_extra              : .3_11774-23
xen_caps               : xen-3.0-x86_64
xen_pagesize           : 4096
platform_params        : virt_start=0xffff800000000000
xen_changeset          : 11774
cc_compiler            : gcc version 4.1.2 20061115 (prerelease) (SUSE Linux)
cc_compile_by          : abuild
cc_compile_domain      : suse.de
cc_compile_date        : Thu Jan 10 21:22:54 UTC 2008
xend_config_format     : 2
-------------------------------------------------------------------------
"xm info" output on host2, openSUSE 10.3 (X86-64):

release                : 2.6.22.13-0.3-xen
version                : #1 SMP 2007/11/19 15:02:58 UTC
machine                : x86_64
nr_cpus                : 8
nr_nodes               : 1
sockets_per_node       : 2
cores_per_socket       : 4
threads_per_core       : 1
cpu_mhz                : 3000
hw_caps                : bfebfbff:20100800:00000000:00000140:0004e3bd:00000000:00000001
total_memory           : 16382
free_memory            : 591
max_free_memory        : 591
max_para_memory        : 587
max_hvm_memory         : 577
xen_major              : 3
xen_minor              : 1
xen_extra              : .0_15042-51
xen_caps               : xen-3.0-x86_64 xen-3.0-x86_32p
xen_scheduler          : credit
xen_pagesize           : 4096
platform_params        : virt_start=0xffff800000000000
xen_changeset          : 15042
cc_compiler            : gcc version 4.2.1 (SUSE Linux)
cc_compile_by          : abuild
cc_compile_domain      : suse.de
cc_compile_date        : Tue Sep 25 21:16:06 UTC 2007
xend_config_format     : 4

--
Marc Teichgraeber
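The pattern described here (reachable from dom0, unreachable from the
LAN, and "woken up" by outbound traffic) can be narrowed down with a
packet capture before blaming the NIC. A minimal sketch, assuming the
default bridged setup with bridge xenbr0, physical interface peth0,
backend interface vif5.0 and a domU at 192.168.10.50 (all placeholder
names; check "brctl show" and "xm network-list <domU>" for the real ones):

  # While the domU is "asleep", ping it from another LAN host and watch
  # where the packets stop. First: do they reach the physical interface?
  tcpdump -ni peth0 icmp and host 192.168.10.50

  # Do they make it through the bridge to the domU's backend interface?
  tcpdump -ni vif5.0 icmp and host 192.168.10.50

  # Has the bridge learned (or aged out) the domU's MAC address?
  brctl showmacs xenbr0

If the pings appear on peth0 but never on the vif, the bridge forwarding
state is the suspect; if they never appear on peth0, the problem is
upstream (switch or ARP) rather than in Xen.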
I've seen the same problem with my Xen 3.1.0 setup. What the Xen gurus
are telling us is that this is a symptom of the Xen dom0 being busy and
not servicing the network interrupts of the domUs promptly. Their advice
to us was to shift an application that had been running on dom0 to
another Xen instance to see if that would help. We are in the process of
implementing that solution now.

By the way, my system (Dell PowerEdge 2950) has built-in Broadcom
network cards, not Intel e1000, so it is unlikely to be a network
driver specific issue.

During these episodes of lost network connectivity, by the way, it was
not unusual to see the following kernel dump in dom0:

2008-02-05T18:35:16-06:00 s_sys@fermigrid5.fnal.gov kernel: Call Trace:
2008-02-05T18:35:16-06:00 s_sys@fermigrid5.fnal.gov kernel: <IRQ> [<ffffffff80258269>] softlockup_tick+0xcc/0xde
2008-02-05T18:35:16-06:00 s_sys@fermigrid5.fnal.gov kernel: [<ffffffff8020e84d>] timer_interrupt+0x3a3/0x401
2008-02-05T18:35:16-06:00 s_sys@fermigrid5.fnal.gov kernel: [<ffffffff80258898>] handle_IRQ_event+0x4b/0x93
2008-02-05T18:35:16-06:00 s_sys@fermigrid5.fnal.gov kernel: [<ffffffff8025897e>] __do_IRQ+0x9e/0x100
2008-02-05T18:35:16-06:00 s_sys@fermigrid5.fnal.gov kernel: [<ffffffff8020cc97>] do_IRQ+0x63/0x71
2008-02-05T18:35:16-06:00 s_sys@fermigrid5.fnal.gov kernel: [<ffffffff8034b347>] evtchn_do_upcall+0xee/0x165
2008-02-05T18:35:16-06:00 s_sys@fermigrid5.fnal.gov kernel: [<ffffffff8020abca>] do_hypervisor_callback+0x1e/0x2c
2008-02-05T18:35:16-06:00 s_sys@fermigrid5.fnal.gov kernel: <EOI>

or

Feb 25 10:32:39 fermigrid6 kernel: BUG: soft lockup detected on CPU#0!
Feb 25 10:32:39 fermigrid6 kernel:
Feb 25 10:32:39 fermigrid6 kernel: Call Trace:
Feb 25 10:32:39 fermigrid6 kernel: <IRQ> [<ffffffff80258269>] softlockup_tick+0xcc/0xde
Feb 25 10:32:39 fermigrid6 kernel: [<ffffffff8020e84d>] timer_interrupt+0x3a3/0x401
Feb 25 10:32:39 fermigrid6 kernel: [<ffffffff80258898>] handle_IRQ_event+0x4b/0x93
Feb 25 10:32:39 fermigrid6 kernel: [<ffffffff8025897e>] __do_IRQ+0x9e/0x100
Feb 25 10:32:39 fermigrid6 kernel: [<ffffffff8020cc97>] do_IRQ+0x63/0x71
Feb 25 10:32:39 fermigrid6 kernel: [<ffffffff8034b347>] evtchn_do_upcall+0xee/0x165
Feb 25 10:32:39 fermigrid6 kernel: [<ffffffff8020abca>] do_hypervisor_callback+0x1e/0x2c
Feb 25 10:32:39 fermigrid6 kernel: <EOI> [<ffffffff8020622a>] hypercall_page+0x22a/0x1000
Feb 25 10:32:39 fermigrid6 kernel: [<ffffffff8020622a>] hypercall_page+0x22a/0x1000
Feb 25 10:32:39 fermigrid6 kernel: [<ffffffff8034b258>] force_evtchn_callback+0xa/0xb
Feb 25 10:32:39 fermigrid6 kernel: [<ffffffff803f2272>] thread_return+0xdf/0x119
Feb 25 10:32:39 fermigrid6 kernel: [<ffffffff8020622a>] hypercall_page+0x22a/0x1000
Feb 25 10:32:39 fermigrid6 kernel: [<ffffffff80228a25>] __cond_resched+0x1c/0x44
Feb 25 10:32:39 fermigrid6 kernel: [<ffffffff803f25df>] cond_resched+0x37/0x42
Feb 25 10:32:39 fermigrid6 kernel: [<ffffffff802343c4>] ksoftirqd+0x0/0xbf
Feb 25 10:32:39 fermigrid6 kernel: [<ffffffff80234432>] ksoftirqd+0x6e/0xbf
Feb 25 10:32:39 fermigrid6 kernel: [<ffffffff802422d7>] kthread+0xc8/0xf1
Feb 25 10:32:39 fermigrid6 kernel: [<ffffffff8020ae1c>] child_rip+0xa/0x12
Feb 25 10:32:39 fermigrid6 kernel: [<ffffffff8024220f>] kthread+0x0/0xf1
Feb 25 10:32:39 fermigrid6 kernel: [<ffffffff8020ae12>] child_rip+0x0/0x12

----------------

One of our dom0s was running an LVS server; the other one, on identical
hardware, was not. We moved the LVS server from one to the other, and the
network problems and kernel panics followed it.
Steve Timm

--
Steven C. Timm, Ph.D (630) 840-8525 timm@fnal.gov http://home.fnal.gov/~timm/
Fermilab Computing Division, Scientific Computing Facilities,
Grid Facilities Department, FermiGrid Services Group, Assistant Group Leader.
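If a busy dom0 really is starving the netback interrupts, as Steve
describes, one mitigation commonly suggested for Xen 3.x credit-scheduler
setups is to reserve a physical CPU for dom0 and raise its scheduler
weight. A sketch under those assumptions, not advice given in this
thread; verify the exact syntax with "xm help" on your version:

  # Pin dom0 (domain id 0), vCPU 0, to physical CPU 0:
  xm vcpu-pin 0 0 0

  # Raise dom0's credit-scheduler weight above the default of 256 so it
  # wins CPU time over busy domUs and can service netback promptly:
  xm sched-credit -d 0 -w 512

  # Optionally keep a busy domU off CPU 0 so dom0 has it to itself
  # ("builddomu" is a placeholder domain name):
  xm vcpu-pin builddomu all 1-3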
We had exactly the same problem, but with only one domU (among about 75
others located on 15 dom0s), two months ago. We did a quick and dirty
fix: a cron'd ping of the domU every five minutes. Since then, we have
had no more user complaints (this domU is heavily used). I know it is
not a real solution; it just works for us.

Alain.
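A minimal sketch of Alain's workaround, assuming a standard cron
installation on a machine that can reach the domU, and a placeholder
domU address of 192.168.10.50:

  # /etc/cron.d/ping-domu  (hypothetical file name; any root crontab works)
  # Every five minutes, send three quiet pings to the sleepy domU to keep
  # its ARP/bridge state warm; discard all output.
  */5 * * * * root /bin/ping -c 3 -q 192.168.10.50 >/dev/null 2>&1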
Marc Teichgraeber
2008-Mar-03 17:19 UTC
Re: [Xen-users] domU network has sleeping sickness
Steven Timm wrote:
> I've seen the same problem with my Xen 3.1.0 setup. What the Xen gurus
> are telling us is that this is a symptom of the Xen dom0 being busy and
> not servicing the network interrupts of the domUs promptly.

There is nothing running on my dom0s. Their only purpose is managing the
domUs. On one of the problematic Xen hosts there is actually load on the
three domUs; they serve continuous build systems. But another sleepy Xen
host with five domUs is more or less in a pre-production state and is
idling.

> During these episodes of lost network connectivity, by the way, it was
> not unusual to see the following kernel dump in dom0:

I don't find anything helpful or suspicious in any log, but maybe I'm
missing it. In dom0 I'm looking at dmesg, messages, warn,
xend-debug.log, xend.log and xen-hotplug.log, and in the domU at dmesg,
messages and warn. But after the bootup process there is more or less
nothing important logged.
--
Marc Teichgraeber
Systemadministrator, Systemadministration

neofonie GmbH
Robert-Koch-Platz 4
10115 Berlin
fon: +49.30 24627 185
fax: +49.30 24627 120
marc.teichgraeber@neofonie.de
http://www.neofonie.de
Handelsregister Berlin-Charlottenburg: HRB 67460
Geschaeftsfuehrung Helmut Hoffer von Ankershoffen, Nurhan Yildirim
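One log Marc's list does not cover is the hypervisor's own message
buffer, which is separate from the dom0 kernel log and the xend logs. A
quick sketch of checking it, using only the standard Xen 3.x tools:

  # The hypervisor keeps its own ring buffer of messages:
  xm dmesg | tail -50

  # xentop gives a live per-domain view of CPU usage, so you can see
  # whether dom0 or a domU is starved at the moment the network "sleeps":
  xentop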
On Mon, 03 Mar 2008 10:05:26 -0600 (CST), <timm@fnal.gov> wrote:
> I've seen the same problem with my Xen 3.1.0 setup. What the Xen gurus
> are telling us is that this is a symptom of the Xen dom0 being busy and
> not servicing the network interrupts of the domUs

The problem is made worse by employing bridging (the most primitive form
of routing). Bridging across several machines brings a large cost in
overhead packet traffic, since most packets need to be flooded
everywhere, because people insist on setting these up as a /24.

--
One of the strokes of genius from McCarthy was making lists the center
of the language - kt
The symptoms you describe sound similar to what's seen when there's a
MAC address conflict. Make sure that the MAC addresses of your virtual
machines are completely unique and not used anywhere else in your
network.

--
Joshua West
Systems Engineer
Brandeis University
http://www.brandeis.edu
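One way to hunt for such a conflict from the dom0, sketched on the
assumption that iputils arping and bridge-utils are installed (the
interface names and address are placeholders):

  # Send ARP requests for the domU's IP and watch which MAC answers;
  # replies alternating between two different MACs indicate a conflict:
  arping -I eth0 -c 5 192.168.10.50

  # Compare the answering MAC against what the bridge has learned and
  # against the mac= entry in the domU's config:
  brctl showmacs xenbr0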
Joshua West wrote:
> The symptoms you describe sound similar to what's seen when there's a
> MAC address conflict. Make sure that the MAC addresses of your virtual
> machines are completely unique and not used anywhere else in your
> network.

< -- snip -->

Joshua, what will happen if the network aliases (eth0:0, eth0:1, eth0:2,
etc.) all have the same MAC address?

And how do I know what MAC address to give to the domUs?

--
Kind Regards
Rudi Ahlers
CEO, SoftDux
Web: http://www.SoftDux.com
Check out my technical blog, http://blog.softdux.com for Linux or other
technical stuff, or visit http://www.WebHostingTalk.co.za for Web
Hosting stuff
Rudi Ahlers wrote:
> Joshua, what will happen if the network aliases (eth0:0, eth0:1,
> eth0:2, etc.) all have the same MAC address?
>
> And how do I know what MAC address to give to the domUs?

Sub-interfaces (eth0:0, eth0:1, etc.) should and do have the same MAC
address. It's when one gets into assigning MAC addresses to the
different virtual interfaces of your domU (i.e. eth0 in your VM and eth1
in your VM) that you need different MAC addresses.

Additionally, unless you have a really good reason, each virtual
interface in your domU (eth0, eth1, etc.) should be in a different
network. Meaning, if eth0 is in the 192.168.100.0/24 network, then eth1
should not also be in 192.168.100.0/24. You can run into asymmetric
routing issues that way, as well as some fun when using stateless
protocols like UDP and ICMP. And that is not even considering what
happens when you're also behind a stateful, connection-tracking firewall
like a Cisco FWSM (they don't like to see asymmetric routing).

You configure the MAC addresses of your domU's virtual interfaces
(VIFs) in your domU configuration file. For example, the following line
gives a domU two network interfaces:

vif = [ 'bridge=xenbr20, mac=aa:bb:cc:00:00:80', 'bridge=xenbr50, mac=aa:bb:cc:00:00:81' ]

The first interface, aka eth0 in the domU, gets attached to xenbr20 in
the dom0. The second interface, aka eth1 in the domU, gets attached to
xenbr50 in the dom0.

Regarding knowing what MAC addresses to give your domUs: I believe you
can have Xen automatically assign one for you (but last time I checked,
it could change upon reboot of a VM). IMHO it's better to just keep a
list of the MAC addresses you've assigned to specific interfaces in your
domUs. Additionally, prefix the MAC address with an OUI (the first three
bytes) that you'll never find on a vendor's network card, such as
"aa:bb:cc".

Hope this helps.

--
Joshua West
Systems Engineer
Brandeis University
http://www.brandeis.edu
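As a complement to keeping such a list, here is a small sketch (an
assumption of this note, not Joshua's own practice) that generates
addresses under 00:16:3e, the OUI registered to XenSource specifically
for virtual NICs, which sidesteps the question of whether an invented
prefix is safe:

  #!/bin/bash
  # Print a random MAC under the 00:16:3e Xen OUI. The last three bytes
  # are random, so record each generated address in your list to avoid
  # handing the same one to two domUs.
  printf '00:16:3e:%02x:%02x:%02x\n' \
      $((RANDOM % 256)) $((RANDOM % 256)) $((RANDOM % 256))

The output drops straight into a vif line like the one above, e.g.
mac=00:16:3e:1f:08:42.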
Joshua West wrote:
> Sub-interfaces (eth0:0, eth0:1, etc.) should and do have the same MAC
> address. It's when one gets into assigning MAC addresses to the
> different virtual interfaces of your domU (i.e. eth0 in your VM and
> eth1 in your VM) that you need different MAC addresses.

< -- snip -->

Thanks, that helps. I already knew about not using two different NICs on
the same IP subnet, but using different virtual / alias NICs on the same
physical NIC is a bit new to me.

So, if dom0 is on 192.168.10.0/24, then would it be a problem if vm01 is
on .10/24 - .19/24, vm02 on .20/24 - .29/24, etc. - all on the same IP
subnet?

How is a MAC address calculated? I.e. in hex / dec / binary etc.? Is
aa.bb.cc safe to use for the 1st 3 octets?

--
Kind Regards
Rudi Ahlers
CEO, SoftDux
Web: http://www.SoftDux.com
Check out my technical blog, http://blog.softdux.com for Linux or other
technical stuff, or visit http://www.WebHostingTalk.co.za for Web
Hosting stuff
Rudi Ahlers wrote:
> So, if dom0 is on 192.168.10.0/24, then would it be a problem if vm01
> is on .10/24 - .19/24, vm02 on .20/24 - .29/24, etc. - all on the same
> IP subnet?

Hi!

I don't quite understand the notation you use, but if you mean "Can I
use addresses in different subnets for my domUs?", the answer is yes, of
course you can, if you have set up routing correctly. But you don't have
to. You can have all your domUs in the same subnet if you want. But if
you have more than one address assigned to a single domU, the addresses
should be in different subnets.

> How is a MAC address calculated? I.e. in hex / dec / binary etc.?

All MAC addresses are hex. See
<http://en.wikipedia.org/wiki/MAC_address> for more info.

> Is aa.bb.cc safe to use for the 1st 3 octets?

Yes, because a MAC address that begins with "x2", "x6", "xA" or "xE"
(where "x" is anything between 0 and F inclusive) is a so-called
"locally administered address". No equipment you can buy should, as far
as I understand, use those MAC addresses.

As a side note, I must admit that I am a little confused about the
following, which contradicts what I just said. But I guess the folks at
IEEE just made a few mistakes.

lynx -dump http://standards.ieee.org/regauth/oui/oui.txt | \
  egrep "^([0-9A-F][26AE]-[0-9A-F][0-9A-F]-[0-9A-F][0-9A-F])"

02-07-01   (hex)   RACAL-DATACOM
02-1C-7C   (hex)   PERQ SYSTEMS CORPORATION
02-60-86   (hex)   LOGIC REPLACEMENT TECH. LTD.
02-60-8C   (hex)   3COM CORPORATION
02-70-01   (hex)   RACAL-DATACOM
02-70-B0   (hex)   M/A-COM INC. COMPANIES
02-70-B3   (hex)   DATA RECALL LTD
02-9D-8E   (hex)   CARDIAC RECORDERS INC.
02-AA-3C   (hex)   OLIVETTI TELECOMM SPA (OLTECO)
02-BB-01   (hex)   OCTOTHORPE CORP.
02-C0-8C   (hex)   3COM CORPORATION
02-CF-1C   (hex)   COMMUNICATION MACHINERY CORP.
02-E6-D3   (hex)   NIXDORF COMPUTER CORPORATION
AA-00-00   (hex)   DIGITAL EQUIPMENT CORPORATION
AA-00-01   (hex)   DIGITAL EQUIPMENT CORPORATION
AA-00-02   (hex)   DIGITAL EQUIPMENT CORPORATION
AA-00-03   (hex)   DIGITAL EQUIPMENT CORPORATION
AA-00-04   (hex)   DIGITAL EQUIPMENT CORPORATION

BR
/Martin Leben
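The "x2", "x6", "xA" or "xE" rule Martin describes comes down to two
bits of the first octet: the locally-administered bit (0x02) must be
set and the multicast bit (0x01) must be clear. A sketch of checking
any candidate address in bash, with no external tools assumed:

  #!/bin/bash
  # A MAC is locally administered when bit 0x02 of the first octet is
  # set, and unicast when bit 0x01 is clear.
  mac="aa:bb:cc:00:00:80"      # candidate address (example from the thread)
  first=$(( 0x${mac%%:*} ))    # numeric value of the first octet
  if (( (first & 0x02) && !(first & 0x01) )); then
      echo "$mac is a locally administered unicast address - safe to invent"
  else
      echo "$mac falls in the globally administered (vendor) or multicast space"
  fi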