Environment: dom0 is OpenSolaris 2009.06, with Open HA 2009.06 installed. Because of Open HA, physical network connectivity from the dom0 to the switches is via a pair of NICs (e1000g driver). Multiple VLANs are configured on the dom0; this domU should exist on only one of them.

Problem: an HVM domU running 64-bit Debian 5.0.3 can see inbound traffic from the world but can't respond.

Details: If I ping from the dom0 or any other machine on the same subnet, the ARP requests are making it to the domU, as shown by 'cat /proc/net/arp' on the domU getting populated with ARP entries that correctly correspond to the ping initiator's IP address and MAC address. In the domU, 'ifconfig -a' shows the TX and RX packet counts increasing as expected. Unfortunately, since the domU can't get to the network, I can't reach the repositories to install tcpdump. From the dom0, if I run 'snoop -d' against the xVM VNIC that Xen creates, I see the broadcast packets for the ARP request, but no response. 'dmesg' on the domU shows the network device as an 'RTL-8139C+' during discovery.
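For anyone retracing the diagnosis, the checks described above amount to the following; the VNIC name xvm2_0 comes from the dladm output further down the thread, so adjust it for your own domain:

```shell
# on the domU (Debian): watch ARP entries and interface counters
cat /proc/net/arp
ifconfig -a

# on the dom0 (OpenSolaris): watch the xVM VNIC for the domU's traffic
dladm show-vnic        # find the xvmN_0 VNIC belonging to the domain
snoop -d xvm2_0        # ARP requests arrive, but no replies go out
```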
Network attached on dom0 with the following:

virsh attach-interface zimbra0 bridge e1000g0 --vlanid=20

Details from dom0 (zimbra0 is the domU in question):

root@mltproc1:~# virsh list
 Id Name                 State
----------------------------------
  0 Domain-0             running
  2 zimbra0              blocked

root@mltproc1:~# dladm show-link
LINK           CLASS    MTU    STATE    OVER
vnic0          vnic     1500   up       e1000g2
rge0           phys     1500   unknown  --
vnic3          vnic     1500   up       e1000g0
e1000g0        phys     1500   up       --
xvm2_0         vnic     1500   up       e1000g0
vnic1          vnic     1500   up       e1000g0
vzimbra0       vlan     1500   up       e1000g0
e1000g2        phys     1500   up       --
vnic2          vnic     1500   up       e1000g2
vcmguest0      vlan     1500   up       e1000g2
vmltsysadmin0  vlan     1500   up       e1000g0
vmltmain0      vnic     1500   up       e1000g2

root@mltproc1:~# dladm show-vlan
LINK           VID   OVER      FLAGS
vzimbra0       20    e1000g0   -----
vcmguest0      7     e1000g2   -----
vmltsysadmin0  21    e1000g0   -----

root@mltproc1:~# dladm show-vnic
LINK       OVER     SPEED  MACADDRESS        MACADDRTYPE  VID
vnic0      e1000g2  1000   2:8:20:df:e0:51   random       24
vnic3      e1000g0  1000   2:8:20:2e:4:6f    random       27
xvm2_0     e1000g0  1000   0:16:3e:15:d2:eb  fixed        20
vnic1      e1000g0  1000   2:8:20:57:a5:28   random       25
vnic2      e1000g2  1000   2:8:20:52:3c:e4   random       26
vmltmain0  e1000g2  1000   2:8:20:6b:9d:30   random       20

-- This message posted from opensolaris.org
Trunda, can you dump the configuration of the domU when it is in the failing state? (virsh dumpxml)
Certainly - not sure if the XML tags will survive posting, so if this doesn't work I'll attach it as a file.

<domain type='xen' id='2'>
  <name>zimbra0</name>
  <uuid>320c7796-577f-2e5e-dc4b-4b12604fc802</uuid>
  <os>
    <type>hvm</type>
    <loader>/usr/lib/xen/boot/hvmloader</loader>
    <boot dev='hd'/>
  </os>
  <memory>786432</memory>
  <vcpu>1</vcpu>
  <on_poweroff>destroy</on_poweroff>
  <on_reboot>restart</on_reboot>
  <on_crash>restart</on_crash>
  <distro name='debianLenny'/>
  <features>
    <acpi/>
    <apic/>
    <pae/>
  </features>
  <clock offset='utc'/>
  <devices>
    <emulator>/usr/lib/xen/bin/qemu-dm</emulator>
    <interface type='bridge'>
      <source bridge='e1000g0'/>
      <target dev='vif2.0'/>
      <vlan id='20'/>
      <mac address='00:16:3e:15:d2:eb'/>
      <script path='vif-vnic'/>
    </interface>
    <disk type='block' device='disk'>
      <driver name='phy'/>
      <source dev='/dev/zvol/dsk/zimbra_zonepool0/zimbraroot'/>
      <target dev='hda'/>
    </disk>
    <input type='mouse' bus='ps2'/>
    <graphics type='vnc' port='5900'/>
    <console tty='/dev/pts/4'/>
  </devices>
</domain>
That all looks correct. I'll try to reproduce the problem.
To test, I also set up an additional domU running Windows 2000 Professional SP4, which is also unable to ping or be pinged. Since this is part of an Open HA cluster, I also migrated the domUs to another node in the cluster and experienced the same symptoms. For reference, I also have some cluster resources running as zones rather than Xen domUs, and I do have full network connectivity from those.
Also, check your switch to make sure it doesn't have port security turned on (one MAC per port).

Tommy

On Nov 3, 2009, at 1:01 AM, David Edmondson wrote:
> Trunda, can you dump the configuration of the domU when it is in the
> failing state? (virsh dumpxml)
>
> _______________________________________________
> xen-discuss mailing list
> xen-discuss@opensolaris.org
> Also, check your switch to make sure it doesn't have port security
> turned on (one mac per port)

Unless I'm missing something, I can't see how the switch could be involved. Traffic outbound from the domU doesn't appear on the dom0 with snoop against the Xen-created VNIC, so it isn't reaching the switch. Additionally, pings from the dom0 to the domU aren't answered (seemingly because the ARP reply fails to make it back to the dom0).

That said, it's always possible I'm misunderstanding something, so I checked the switch (HP 1800) and do not see any restriction of one MAC address per port. Each of the VNICs I have set up on the dom0 going over the physical links for iSCSI and for the Open HA interconnect (four VNICs in total) gets assigned a random MAC address, and all four are working, as are the VLANs from the dom0, so I think it's safe to say the switch is not preventing traffic from multiple MAC addresses per port. If there is something I'm overlooking, or if that seems just plain wrong, I'm open to correction.
Is there any diagnostic step that I'm not considering? I'm at a point on this project where I don't really have any other useful work I can do until I figure out how to get the domUs talking to the network.

The only thing I can find in the logs that doesn't make sense to me is in /var/log/xen/xend-debug.log:

Xend started at Wed Nov  4 14:14:00 2009.
sh: line 1: brctl: not found
sh: line 1: brctl: not found
Warning: 2Mb page allocation failed for HVM guest.

Isn't 'brctl' used on a Linux Xen dom0? The same line appears twice when starting either the Debian domU or the Windows 2000 domU. I'm not sure where the call to it is happening.
Tundra Slosek wrote:
> Is there any diagnostic step that I'm not considering? I'm at a point
> on this project where I don't really have any other useful work I can
> do until I figure out how to get the domUs talking to the network.

Does your setup work if Open HA isn't installed?

> The only thing I can find in logs that doesn't make sense to me is in
> /var/log/xen/xend-debug.log
>
> Xend started at Wed Nov  4 14:14:00 2009.
> sh: line 1: brctl: not found
> sh: line 1: brctl: not found

Known bug... doesn't hurt anything...

> Warning: 2Mb page allocation failed for HVM guest.

You most likely don't have dom0_mem set, so the hypervisor can't allocate contiguous physical chunks of memory for the guest... You really should set dom0_mem to something like 2g.

MRJ
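For reference, on an OpenSolaris xVM dom0 the dom0_mem setting goes on the xen.gz kernel$ line in /rpool/boot/grub/menu.lst. A sketch of the relevant entry, assuming a stock 2009.06 layout - the title, findroot, and module$ lines will differ per system:

```
title OpenSolaris 2009.06 xVM
findroot (pool_rpool,0,a)
kernel$ /boot/$ISADIR/xen.gz dom0_mem=2048M
module$ /platform/i86xpv/kernel/$ISADIR/unix /platform/i86xpv/kernel/$ISADIR/unix -B $ZFS-BOOTFS
module$ /platform/i86pc/$ISADIR/boot_archive
```

A reboot is needed for the change to take effect.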
> Tundra Slosek wrote:
>> Is there any diagnostic step that I'm not considering? I'm at a point
>> on this project where I don't really have any other useful work I can
>> do until I figure out how to get the domUs talking to the network.
>
> Does your setup work if Open HA isn't installed?

This will be a fun one to test... do you mean with Open HA uninstalled, or is booting outside of the cluster with '-x' in grub sufficient for this test? In my case, Xen was installed/configured AFTER OpenHA, so my domUs didn't exist before OpenHA was installed.

>> Warning: 2Mb page allocation failed for HVM guest.
>
> You most likely don't have dom0_mem set, so the hypervisor can't
> allocate contiguous physical chunks of memory for the guest... You
> really should set dom0_mem to something like 2g.

Quite correct - I did not have dom0_mem set (or the other adjustments suggested at http://hub.opensolaris.org/bin/view/Community+Group+xen/configuring-dom0) - on an 8GB physical host with Open HA and Xen, I've set it to 2GB.
On 4 Nov 2009, at 21:15, Tundra Slosek wrote:
>> Does your setup work if Open HA isn't installed?
>
> This will be a fun one to test... do you mean with Open HA
> uninstalled, or is booting outside of the cluster with '-x' in grub
> sufficient for this test? In my case, Xen was installed/configured
> AFTER OpenHA, so my domUs didn't exist before OpenHA was installed.

Please note that when Open HA Cluster (OHAC) is installed, network routing is disabled by creating /etc/notrouter on each cluster node, i.e. dom0. As such, please ensure that your domU is configured to use the same subnet as the OHAC public interfaces and that your domU default route is set up appropriately.

In an earlier response, you also mentioned "For reference, I also have some cluster resources running as zones rather than Xen domUs, and I do have full network connectivity from those." I'm not sure if this is within the same OHAC (i.e. dom0) or on another cluster; however, looking at those zones running within the cluster, you will also note that they use the same subnet as the cluster's public NICs.
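A quick way to check the subnet and routing points above; this is a sketch, not from the original messages, using the tools available on a Debian Lenny guest and an OpenSolaris dom0:

```shell
# on the domU (Debian Lenny)
ifconfig eth0      # address should sit in the OHAC public subnet
route -n           # default route should point at a gateway on that subnet

# on the dom0, confirm OHAC really has disabled routing
ls /etc/notrouter
routeadm -p        # ipv4-forwarding should show as disabled
```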
Tundra Slosek wrote:
>> Does your setup work if Open HA isn't installed?
>
> This will be a fun one to test... do you mean with Open HA
> uninstalled, or is booting outside of the cluster with '-x' in grub
> sufficient for this test? In my case, Xen was installed/configured
> AFTER OpenHA, so my domUs didn't exist before OpenHA was installed.

I've never used Open HA so I'm not sure :-) Is there a way to disable it temporarily?

Thanks,

MRJ
> Please note that when Open HA Cluster (OHAC) is installed then network
> routing will be disabled by creating /etc/notrouter on each cluster
> node, i.e. dom0. As such, please ensure that your domU is configured
> to use the same subnet as the OHAC public interfaces and your domU
> default route is setup appropriately.

I can't get to the cluster at the moment to VNC into the domU to explicitly confirm (and I will do this later today, as it's better to double-check and find a mistake than it is to assume I know what I'm talking about and not find the mistake) - however, the intent is for the dom0, the domUs, and my laptop to all be on the 192.168.11.0/24 subnet.

I believe that SOME traffic is flowing. dom0 device 'vmltmain0' is at 192.168.11.23, the laptop is 192.168.11.37, and the domU is 192.168.11.88. If I do 'ping 192.168.11.88' from my laptop and then (remember the domU is Debian) 'cat /proc/net/arp' on the domU, I see the laptop's MAC address correctly populated as an ARP entry against 192.168.11.37 -- but the dom0 snoop on the xVM-created VNIC for the domU shows no traffic going back out from the domU.

It's going to take a bit of doing to get tcpdump installed on the domU (it isn't on the Debian 5.0.3 install CD, and I don't have network access, so I will have to find it, get it onto an ISO, and mount that against the domU with virsh) in order to provide a snoop from the dom0 over the same timeframe as a tcpdump from the domU.

> In an earlier response, you also mentioned "For reference, I also have
> some cluster resources running as zones rather than Xen domUs, and I
> do have full network connectivity from those." I'm not sure if this is
> within the same OHAC, i.e. dom0 or on another cluster, however looking
> at those zones, running within the cluster, you will also note that
> they use the same subnet as the cluster's public NICs.

These zones are indeed resources within this same cluster. I am able to ping them (e.g. 
zone smb2, 192.168.11.3, which uses a shared IP [attached by OpenHA as the logicalhostname], and zone gate1, 192.168.11.1, which uses an exclusive IP [attached in the zonecfg to a manually created VNIC dedicated to this purpose on each physical node]) without issue from my laptop (192.168.11.37).
>> Does your setup work if Open HA isn't installed?
>
> This will be a fun one to test... do you mean with Open HA
> uninstalled, or is booting outside of the cluster with '-x' in grub
> sufficient for this test? In my case, Xen was installed/configured
> AFTER OpenHA, so my domUs didn't exist before OpenHA was installed.
>
> I've never used Open HA so I'm not sure :-) Is there a way to disable
> it temporarily?

As I understand it, booting with '-x' on the grub line causes the node to boot without trying to become a member of the cluster. There is a mechanism to test it without the cluster software installed:

1.) copy the backing disk for Xen from the cluster-managed resource to a zpool that lives strictly locally on the node
2.) use beadm to create a testing BE just for this purpose
3.) shut down the node
4.) unplug the node from the network (both cables)
5.) start up the node in non-cluster mode on the test BE
6.) uninstall the cluster software
7.) reboot again into the test BE
8.) use virsh to change the location of the domU backing disk to point at the local zpool instead of the cluster-managed zpool
9.) fire up the domU and see if it can contact the dom0 (it won't be able to contact anything else because the node is unplugged from the network)
10.) run whatever diagnostics are useful
11.) shut down, plug in the network, and reboot back into the normal BE and into the cluster

It's certainly doable, and non-destructive, just more work.
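The zvol-copy and BE steps above (1, 2, and 8) might look something like the following sketch; the local pool name 'localpool' and BE name 'xen-test' are hypothetical, and the zvol path matches the domain XML earlier in the thread:

```shell
# 1.) copy the domU's backing zvol to a strictly local zpool
zfs snapshot zimbra_zonepool0/zimbraroot@xen-test
zfs send zimbra_zonepool0/zimbraroot@xen-test | \
    zfs recv localpool/zimbraroot

# 2.) create and activate a throwaway boot environment
beadm create xen-test
beadm activate xen-test

# 8.) re-point the domU at the local copy
virsh detach-disk zimbra0 hda
virsh attach-disk zimbra0 /dev/zvol/dsk/localpool/zimbraroot hda --driver phy
```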
I did the 'boot the node with -x so that it isn't in the cluster' test, and it doesn't change the symptoms at all.
To follow up on this, I have confirmed that the dom0 and the domU are both in 192.168.11.0/255.255.255.0. I'm still working on getting tcpdump onto the domU.
I can reproduce this (using Solaris 10 as a guest) and filed 6899342 to track it. At the moment I can't see a workaround.
> I can reproduce this (using Solaris 10 as a guest) and filed 6899342
> to track it.
>
> At the moment I can't see a workaround.

David,

Looking at the bug report, do you think a possible workaround would be to hand a non-VLAN VNIC down from the dom0 (i.e. instead of telling it to use 'e1000g0' vlanid 20, having it just use e1000g0) and do the VLAN tagging in the domU? It hands more trust to the domU than I would really like, but at this point my options are 'live with that risk' or 'move to a completely different OS for dom0'. As I think about it, I suppose I could always add an additional physical NIC for the domUs to share, and not allow that NIC to reach the VLANs I want kept private at the switch level.
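Spelled out as commands, the guest-side-tagging proposal above would look roughly like this sketch (the domU address is the one given earlier in the thread; the Debian side assumes the vlan/8021q tooling is installed). Note that the follow-up in this thread reports guest-side tagging did not actually work against a tagged switch port, so treat this as the experiment, not the fix:

```shell
# dom0: attach the plain NIC, without --vlanid
virsh attach-interface zimbra0 bridge e1000g0

# domU (Debian Lenny): tag VLAN 20 on the guest side
modprobe 8021q
vconfig add eth0 20
ifconfig eth0.20 192.168.11.88 netmask 255.255.255.0 up
```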
David, first of all, thank you so much for the research you did, as it's been enough for me to figure out a workaround to go forward with my current project. To follow up a bit on what I've determined since I stopped using '--vlanid' with virsh attach-interface:

1.) If I do 'virsh attach-interface zimbra0 bridge rge0', where rge0 is an unused NIC physically wired to a switch port that uses 'primary, untagged VLAN 20' (i.e. the same way my workstations will all ultimately be configured), I have network connectivity. From the domU I don't have to do anything other than set up the IP address and routing.

From dom0, this is what shows when the domU is running:

root@mltproc1:~# virsh list
 Id Name                 State
----------------------------------
  0 Domain-0             running
  3 zimbra0              blocked

root@mltproc1:~# dladm show-link xvm3_0
LINK    CLASS  MTU   STATE  OVER
xvm3_0  vnic   1500  up     rge0

root@mltproc1:~# dladm show-vnic xvm3_0
LINK    OVER  SPEED  MACADDRESS        MACADDRTYPE  VID
xvm3_0  rge0  1000   0:16:3e:44:b1:7c  fixed        0

2.) If I do 'virsh attach-interface zimbra0 bridge e1000g0' (wired to a switch port which requires VLAN tags for all traffic), I do not get network connectivity - even if I set the VLAN tag within the domU.

3.) If I do 'virsh attach-interface zimbra0 bridge e1000g0' (wired to a switch port which defaults to VLAN 20 if no VLAN tag is present), I do get network connectivity, without needing to set the VLAN tag in the domU. If I do set a VLAN tag within the domU, I am not able to get connectivity - which is what I want from a security point of view, but I'm not sure I fully understand it.

It seems like I have no mechanism to get from a domU onto a VLAN other than the 'default when no tag is present' setting on my switch for the port. For my purposes on this project this is acceptable and I can move forward, but I don't fully understand it, and it wouldn't be a usable constraint on the next project I'm hoping to use OpenSolaris/Xen/OpenHA for.
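One hedged avenue not tested in the thread: since the failing path is xVM's own '--vlanid' handling (bug 6899342), it might be worth pre-creating the tagged VNIC by hand with Crossbow and handing that link name to virsh. The VNIC name here is made up, and whether virsh/vif-vnic accepts a pre-existing VNIC this way is untested:

```shell
# dom0: pre-create a VNIC that tags VLAN 20 ('zimbra20' is hypothetical)
dladm create-vnic -l e1000g0 -v 20 zimbra20

# then try attaching the domU to that link instead of the physical NIC
virsh attach-interface zimbra0 bridge zimbra20
```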