Hi all,

I have a strange network problem with some domUs on three Xen hosts.
They are losing their network connectivity. I use bridged networking.

* It happens randomly and can happen right after bootup of the domU or
  anytime later.
* The domU is not reachable from another host on the LAN.
* The domU is always reachable from the dom0 (ssh, ping).
* I can 'repair' the connection by attaching to the console and pinging
  out from the domU. First nothing happens, then the machine gets its
  network back. (That's also my workaround for the moment: pinging all
  the time from the console.)
* Pinging from another host at the same time helps too.
* It can happen that I can ping continuously from one host while another
  host gets only every 10th packet or so back.
* The interfaces sometimes come back from their sleep by themselves.
* When the network has fallen asleep, ssh to the domU from another host
  hangs; it does not come back with "no route to host" or similar.

I'm suspicious about the network controllers; they are the same on all
hosts: "Intel Corporation 80003ES2LAN Gigabit Ethernet Controller
(Copper)" (lspci), some kind of "Intel® PRO/1000 EB Network Connection
with I/O Acceleration" (Intel website). I've tried the latest e1000
driver from Intel but it didn't help.
I've checked all MAC addresses, they are unique, and so are the IP
addresses.

Any ideas are welcome :)

-------------------------------------------------------------------------
"xm info" from host1, openSUSE 10.2 (X86-64):

release                : 2.6.18.8-0.9-xen
version                : #1 SMP Sun Feb 10 22:48:05 UTC 2008
machine                : x86_64
nr_cpus                : 4
nr_nodes               : 1
sockets_per_node       : 2
cores_per_socket       : 2
threads_per_core       : 1
cpu_mhz                : 2327
hw_caps                : bfebfbff:20100800:00000000:00000140:0004e3bd:00000000:00000001
total_memory           : 32766
free_memory            : 21607
max_free_memory        : 21607
max_para_memory        : 21603
max_hvm_memory         : 21544
xen_major              : 3
xen_minor              : 0
xen_extra              : .3_11774-23
xen_caps               : xen-3.0-x86_64
xen_pagesize           : 4096
platform_params        : virt_start=0xffff800000000000
xen_changeset          : 11774
cc_compiler            : gcc version 4.1.2 20061115 (prerelease) (SUSE Linux)
cc_compile_by          : abuild
cc_compile_domain      : suse.de
cc_compile_date        : Thu Jan 10 21:22:54 UTC 2008
xend_config_format     : 2
-------------------------------------------------------------------------
"xm info" output on host2, openSUSE 10.3 (X86-64):

release                : 2.6.22.13-0.3-xen
version                : #1 SMP 2007/11/19 15:02:58 UTC
machine                : x86_64
nr_cpus                : 8
nr_nodes               : 1
sockets_per_node       : 2
cores_per_socket       : 4
threads_per_core       : 1
cpu_mhz                : 3000
hw_caps                : bfebfbff:20100800:00000000:00000140:0004e3bd:00000000:00000001
total_memory           : 16382
free_memory            : 591
max_free_memory        : 591
max_para_memory        : 587
max_hvm_memory         : 577
xen_major              : 3
xen_minor              : 1
xen_extra              : .0_15042-51
xen_caps               : xen-3.0-x86_64 xen-3.0-x86_32p
xen_scheduler          : credit
xen_pagesize           : 4096
platform_params        : virt_start=0xffff800000000000
xen_changeset          : 15042
cc_compiler            : gcc version 4.2.1 (SUSE Linux)
cc_compile_by          : abuild
cc_compile_domain      : suse.de
cc_compile_date        : Tue Sep 25 21:16:06 UTC 2007
xend_config_format     : 4

--
Marc Teichgraeber
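The pattern described here (reachable from dom0, unreachable from the
LAN, and "woken up" by outbound traffic) can be narrowed down with a
packet capture before blaming the NIC. A minimal sketch, assuming the
default bridged setup with bridge xenbr0, physical interface peth0,
backend interface vif5.0 and a domU at 192.168.10.50 (all placeholder
names; check "brctl show" and "xm network-list <domU>" for the real ones):

  # While the domU is "asleep", ping it from another LAN host and watch
  # where the packets stop. First: do they reach the physical interface?
  tcpdump -ni peth0 icmp and host 192.168.10.50

  # Do they make it through the bridge to the domU's backend interface?
  tcpdump -ni vif5.0 icmp and host 192.168.10.50

  # Has the bridge learned (or aged out) the domU's MAC address?
  brctl showmacs xenbr0

If the pings appear on peth0 but never on the vif, the bridge forwarding
state is the suspect; if they never appear on peth0, the problem is
upstream (switch or ARP) rather than in Xen.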
I've seen the same problem with my Xen 3.1.0 setup. What the Xen gurus
are telling us is that this is a symptom of the Xen dom0 being busy and
not servicing the network interrupts of the domUs promptly. Their advice
to us was to shift an application that had been running on dom0 to
another Xen instance to see if that would help. We are in the process of
implementing that solution now.

By the way, my system (Dell PowerEdge 2950) has built-in Broadcom
network cards, not Intel e1000, so it is unlikely to be a network
driver specific issue.

During these episodes of lost network connectivity, by the way, it was
not unusual to see the following kernel dump in dom0:

2008-02-05T18:35:16-06:00 s_sys@fermigrid5.fnal.gov kernel: Call Trace:
2008-02-05T18:35:16-06:00 s_sys@fermigrid5.fnal.gov kernel: <IRQ> [<ffffffff80258269>] softlockup_tick+0xcc/0xde
2008-02-05T18:35:16-06:00 s_sys@fermigrid5.fnal.gov kernel: [<ffffffff8020e84d>] timer_interrupt+0x3a3/0x401
2008-02-05T18:35:16-06:00 s_sys@fermigrid5.fnal.gov kernel: [<ffffffff80258898>] handle_IRQ_event+0x4b/0x93
2008-02-05T18:35:16-06:00 s_sys@fermigrid5.fnal.gov kernel: [<ffffffff8025897e>] __do_IRQ+0x9e/0x100
2008-02-05T18:35:16-06:00 s_sys@fermigrid5.fnal.gov kernel: [<ffffffff8020cc97>] do_IRQ+0x63/0x71
2008-02-05T18:35:16-06:00 s_sys@fermigrid5.fnal.gov kernel: [<ffffffff8034b347>] evtchn_do_upcall+0xee/0x165
2008-02-05T18:35:16-06:00 s_sys@fermigrid5.fnal.gov kernel: [<ffffffff8020abca>] do_hypervisor_callback+0x1e/0x2c
2008-02-05T18:35:16-06:00 s_sys@fermigrid5.fnal.gov kernel: <EOI>

or

Feb 25 10:32:39 fermigrid6 kernel: BUG: soft lockup detected on CPU#0!
Feb 25 10:32:39 fermigrid6 kernel:
Feb 25 10:32:39 fermigrid6 kernel: Call Trace:
Feb 25 10:32:39 fermigrid6 kernel: <IRQ> [<ffffffff80258269>] softlockup_tick+0xcc/0xde
Feb 25 10:32:39 fermigrid6 kernel: [<ffffffff8020e84d>] timer_interrupt+0x3a3/0x401
Feb 25 10:32:39 fermigrid6 kernel: [<ffffffff80258898>] handle_IRQ_event+0x4b/0x93
Feb 25 10:32:39 fermigrid6 kernel: [<ffffffff8025897e>] __do_IRQ+0x9e/0x100
Feb 25 10:32:39 fermigrid6 kernel: [<ffffffff8020cc97>] do_IRQ+0x63/0x71
Feb 25 10:32:39 fermigrid6 kernel: [<ffffffff8034b347>] evtchn_do_upcall+0xee/0x165
Feb 25 10:32:39 fermigrid6 kernel: [<ffffffff8020abca>] do_hypervisor_callback+0x1e/0x2c
Feb 25 10:32:39 fermigrid6 kernel: <EOI> [<ffffffff8020622a>] hypercall_page+0x22a/0x1000
Feb 25 10:32:39 fermigrid6 kernel: [<ffffffff8020622a>] hypercall_page+0x22a/0x1000
Feb 25 10:32:39 fermigrid6 kernel: [<ffffffff8034b258>] force_evtchn_callback+0xa/0xb
Feb 25 10:32:39 fermigrid6 kernel: [<ffffffff803f2272>] thread_return+0xdf/0x119
Feb 25 10:32:39 fermigrid6 kernel: [<ffffffff8020622a>] hypercall_page+0x22a/0x1000
Feb 25 10:32:39 fermigrid6 kernel: [<ffffffff80228a25>] __cond_resched+0x1c/0x44
Feb 25 10:32:39 fermigrid6 kernel: [<ffffffff803f25df>] cond_resched+0x37/0x42
Feb 25 10:32:39 fermigrid6 kernel: [<ffffffff802343c4>] ksoftirqd+0x0/0xbf
Feb 25 10:32:39 fermigrid6 kernel: [<ffffffff80234432>] ksoftirqd+0x6e/0xbf
Feb 25 10:32:39 fermigrid6 kernel: [<ffffffff802422d7>] kthread+0xc8/0xf1
Feb 25 10:32:39 fermigrid6 kernel: [<ffffffff8020ae1c>] child_rip+0xa/0x12
Feb 25 10:32:39 fermigrid6 kernel: [<ffffffff8024220f>] kthread+0x0/0xf1
Feb 25 10:32:39 fermigrid6 kernel: [<ffffffff8020ae12>] child_rip+0x0/0x12

----------------

One of our dom0s was running an LVS server; the other one, on identical
hardware, was not. We moved the LVS server from one to the other, and the
network problems and kernel panics followed it.
Steve Timm

--
Steven C. Timm, Ph.D (630) 840-8525 timm@fnal.gov http://home.fnal.gov/~timm/
Fermilab Computing Division, Scientific Computing Facilities,
Grid Facilities Department, FermiGrid Services Group, Assistant Group Leader.
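If a busy dom0 really is starving the netback interrupts, as Steve
describes, one mitigation commonly suggested for Xen 3.x credit-scheduler
setups is to reserve a physical CPU for dom0 and raise its scheduler
weight. A sketch under those assumptions, not advice given in this
thread; verify the exact syntax with "xm help" on your version:

  # Pin dom0 (domain id 0), vCPU 0, to physical CPU 0:
  xm vcpu-pin 0 0 0

  # Raise dom0's credit-scheduler weight above the default of 256 so it
  # wins CPU time over busy domUs and can service netback promptly:
  xm sched-credit -d 0 -w 512

  # Optionally keep a busy domU off CPU 0 so dom0 has it to itself
  # ("builddomu" is a placeholder domain name):
  xm vcpu-pin builddomu all 1-3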
We had exactly the same problem, but with only one domU (among about 75
others located on 15 dom0s), two months ago. We did a quick and dirty
fix: a cron'd ping of the domU every five minutes. Since then, we have
had no more user complaints (this domU is heavily used). I know it is
not a real solution; it just works for us.

Alain.
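A minimal sketch of Alain's workaround, assuming a standard cron
installation on a machine that can reach the domU, and a placeholder
domU address of 192.168.10.50:

  # /etc/cron.d/ping-domu  (hypothetical file name; any root crontab works)
  # Every five minutes, send three quiet pings to the sleepy domU to keep
  # its ARP/bridge state warm; discard all output.
  */5 * * * * root /bin/ping -c 3 -q 192.168.10.50 >/dev/null 2>&1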
Marc Teichgraeber
2008-Mar-03 17:19 UTC
Re: [Xen-users] domU network has sleeping sickness
Steven Timm wrote:
> I've seen the same problem with my Xen 3.1.0 setup. What the Xen gurus
> are telling us is that this is a symptom of the Xen dom0 being busy and
> not servicing the network interrupts of the domUs promptly.

There is nothing running on my dom0s. Their only purpose is managing the
domUs. On one of the problematic Xen hosts there is actually load on the
three domUs; they serve continuous build systems. But another sleepy Xen
host with five domUs is more or less in a pre-production state and is
idling.

> During these episodes of lost network connectivity, by the way, it was
> not unusual to see the following kernel dump in dom0:

I don't find anything helpful or suspicious in any log, but maybe I'm
missing it. In dom0 I'm looking at dmesg, messages, warn,
xend-debug.log, xend.log and xen-hotplug.log, and in the domU at dmesg,
messages and warn. But after the bootup process there is more or less
nothing important logged.
--
Marc Teichgraeber
Systemadministrator, Systemadministration

neofonie GmbH
Robert-Koch-Platz 4
10115 Berlin
fon: +49.30 24627 185
fax: +49.30 24627 120
marc.teichgraeber@neofonie.de
http://www.neofonie.de
Handelsregister Berlin-Charlottenburg: HRB 67460
Geschaeftsfuehrung Helmut Hoffer von Ankershoffen, Nurhan Yildirim
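One log Marc's list does not cover is the hypervisor's own message
buffer, which is separate from the dom0 kernel log and the xend logs. A
quick sketch of checking it, using only the standard Xen 3.x tools:

  # The hypervisor keeps its own ring buffer of messages:
  xm dmesg | tail -50

  # xentop gives a live per-domain view of CPU usage, so you can see
  # whether dom0 or a domU is starved at the moment the network "sleeps":
  xentop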
On Mon, 03 Mar 2008 10:05:26 -0600 (CST), <timm@fnal.gov> wrote:
> I've seen the same problem with my Xen 3.1.0 setup. What the Xen gurus
> are telling us is that this is a symptom of the Xen dom0 being busy and
> not servicing the network interrupts of the domUs

The problem is made worse by employing bridging (the most primitive form
of routing). Bridging across several machines brings a large cost in
overhead packet traffic, since most packets need to be flooded
everywhere, because people insist on setting these up as a /24.

--
One of the strokes of genius from McCarthy was making lists the center
of the language - kt
The symptoms you describe sound similar to what's seen when there's a
MAC address conflict. Make sure that the MAC addresses of your virtual
machines are completely unique and not used anywhere else in your
network.

--
Joshua West
Systems Engineer
Brandeis University
http://www.brandeis.edu
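One way to hunt for such a conflict from the dom0, sketched on the
assumption that iputils arping and bridge-utils are installed (the
interface names and address are placeholders):

  # Send ARP requests for the domU's IP and watch which MAC answers;
  # replies alternating between two different MACs indicate a conflict:
  arping -I eth0 -c 5 192.168.10.50

  # Compare the answering MAC against what the bridge has learned and
  # against the mac= entry in the domU's config:
  brctl showmacs xenbr0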
Joshua West wrote:
> The symptoms you describe sound similar to what's seen when there's a
> MAC address conflict. Make sure that the MAC addresses of your virtual
> machines are completely unique and not used anywhere else in your
> network.

< -- snip -->

Joshua, what will happen if the network aliases (eth0:0, eth0:1, eth0:2,
etc.) all have the same MAC address?

And how do I know what MAC address to give to the domUs?

--
Kind Regards
Rudi Ahlers
CEO, SoftDux
Web: http://www.SoftDux.com
Check out my technical blog, http://blog.softdux.com for Linux or other
technical stuff, or visit http://www.WebHostingTalk.co.za for Web
Hosting stuff
Rudi Ahlers wrote:
> Joshua, what will happen if the network aliases (eth0:0, eth0:1,
> eth0:2, etc.) all have the same MAC address?
>
> And how do I know what MAC address to give to the domUs?

Sub-interfaces (eth0:0, eth0:1, etc.) should and do have the same MAC
address. It's when one gets into assigning MAC addresses to the
different virtual interfaces of your domU (i.e. eth0 in your VM and eth1
in your VM) that you need different MAC addresses.

Additionally, unless you have a really good reason, each virtual
interface in your domU (eth0, eth1, etc.) should be in a different
network. Meaning, if eth0 is in the 192.168.100.0/24 network, then eth1
should not also be in 192.168.100.0/24. You can run into asymmetric
routing issues that way, as well as some fun when using stateless
protocols like UDP and ICMP. And that is not even considering what
happens when you're also behind a stateful, connection-tracking firewall
like a Cisco FWSM (they don't like to see asymmetric routing).

You configure the MAC addresses of your domU's virtual interfaces
(VIFs) in your domU configuration file. For example, the following line
gives a domU two network interfaces:

vif = [ 'bridge=xenbr20, mac=aa:bb:cc:00:00:80', 'bridge=xenbr50, mac=aa:bb:cc:00:00:81' ]

The first interface, aka eth0 in the domU, gets attached to xenbr20 in
the dom0. The second interface, aka eth1 in the domU, gets attached to
xenbr50 in the dom0.

Regarding knowing what MAC addresses to give your domUs: I believe you
can have Xen automatically assign one for you (but last time I checked,
it could change upon reboot of a VM). IMHO it's better to just keep a
list of the MAC addresses you've assigned to specific interfaces in your
domUs. Additionally, prefix the MAC address with an OUI (the first three
bytes) that you'll never find on a vendor's network card, such as
"aa:bb:cc".

Hope this helps.

--
Joshua West
Systems Engineer
Brandeis University
http://www.brandeis.edu
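As a complement to keeping such a list, here is a small sketch (an
assumption of this note, not Joshua's own practice) that generates
addresses under 00:16:3e, the OUI registered to XenSource specifically
for virtual NICs, which sidesteps the question of whether an invented
prefix is safe:

  #!/bin/bash
  # Print a random MAC under the 00:16:3e Xen OUI. The last three bytes
  # are random, so record each generated address in your list to avoid
  # handing the same one to two domUs.
  printf '00:16:3e:%02x:%02x:%02x\n' \
      $((RANDOM % 256)) $((RANDOM % 256)) $((RANDOM % 256))

The output drops straight into a vif line like the one above, e.g.
mac=00:16:3e:1f:08:42.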
Joshua West wrote:
> Sub-interfaces (eth0:0, eth0:1, etc.) should and do have the same MAC
> address. It's when one gets into assigning MAC addresses to the
> different virtual interfaces of your domU (i.e. eth0 in your VM and
> eth1 in your VM) that you need different MAC addresses.

< -- snip -->

Thanks, that helps. I already knew about not using two different NICs on
the same IP subnet, but using different virtual / alias NICs on the same
physical NIC is a bit new to me.

So, if dom0 is on 192.168.10.0/24, then would it be a problem if vm01 is
on .10/24 - .19/24, vm02 on .20/24 - .29/24, etc. - all on the same IP
subnet?

How is a MAC address calculated? I.e. in hex / dec / binary etc.? Is
aa.bb.cc safe to use for the 1st 3 octets?

--
Kind Regards
Rudi Ahlers
CEO, SoftDux
Web: http://www.SoftDux.com
Check out my technical blog, http://blog.softdux.com for Linux or other
technical stuff, or visit http://www.WebHostingTalk.co.za for Web
Hosting stuff
Rudi Ahlers wrote:
> So, if dom0 is on 192.168.10.0/24, then would it be a problem if vm01
> is on .10/24 - .19/24, vm02 on .20/24 - .29/24, etc. - all on the same
> IP subnet?

Hi!

I don't quite understand the notation you use, but if you mean "Can I
use addresses in different subnets for my domUs?", the answer is yes, of
course you can, if you have set up routing correctly. But you don't have
to. You can have all your domUs in the same subnet if you want. But if
you have more than one address assigned to a single domU, the addresses
should be in different subnets.

> How is a MAC address calculated? I.e. in hex / dec / binary etc.?

All MAC addresses are hex. See
<http://en.wikipedia.org/wiki/MAC_address> for more info.

> Is aa.bb.cc safe to use for the 1st 3 octets?

Yes, because a MAC address that begins with "x2", "x6", "xA" or "xE"
(where "x" is anything between 0 and F inclusive) is a so-called
"locally administered address". No equipment you can buy should, as far
as I understand, use those MAC addresses.

As a side note, I must admit that I am a little confused about the
following, which contradicts what I just said. But I guess the folks at
IEEE just made a few mistakes.

lynx -dump http://standards.ieee.org/regauth/oui/oui.txt | \
  egrep "^([0-9A-F][26AE]-[0-9A-F][0-9A-F]-[0-9A-F][0-9A-F])"

02-07-01   (hex)   RACAL-DATACOM
02-1C-7C   (hex)   PERQ SYSTEMS CORPORATION
02-60-86   (hex)   LOGIC REPLACEMENT TECH. LTD.
02-60-8C   (hex)   3COM CORPORATION
02-70-01   (hex)   RACAL-DATACOM
02-70-B0   (hex)   M/A-COM INC. COMPANIES
02-70-B3   (hex)   DATA RECALL LTD
02-9D-8E   (hex)   CARDIAC RECORDERS INC.
02-AA-3C   (hex)   OLIVETTI TELECOMM SPA (OLTECO)
02-BB-01   (hex)   OCTOTHORPE CORP.
02-C0-8C   (hex)   3COM CORPORATION
02-CF-1C   (hex)   COMMUNICATION MACHINERY CORP.
02-E6-D3   (hex)   NIXDORF COMPUTER CORPORATION
AA-00-00   (hex)   DIGITAL EQUIPMENT CORPORATION
AA-00-01   (hex)   DIGITAL EQUIPMENT CORPORATION
AA-00-02   (hex)   DIGITAL EQUIPMENT CORPORATION
AA-00-03   (hex)   DIGITAL EQUIPMENT CORPORATION
AA-00-04   (hex)   DIGITAL EQUIPMENT CORPORATION

BR
/Martin Leben
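The "x2", "x6", "xA" or "xE" rule Martin describes comes down to two
bits of the first octet: the locally-administered bit (0x02) must be
set and the multicast bit (0x01) must be clear. A sketch of checking
any candidate address in bash, with no external tools assumed:

  #!/bin/bash
  # A MAC is locally administered when bit 0x02 of the first octet is
  # set, and unicast when bit 0x01 is clear.
  mac="aa:bb:cc:00:00:80"      # candidate address (example from the thread)
  first=$(( 0x${mac%%:*} ))    # numeric value of the first octet
  if (( (first & 0x02) && !(first & 0x01) )); then
      echo "$mac is a locally administered unicast address - safe to invent"
  else
      echo "$mac falls in the globally administered (vendor) or multicast space"
  fi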