Marcus Goller
2008-Dec-05 10:27 UTC
[crossbow-discuss] Strange networking issue with ISC dhcp and Crossbow
Hi,

I sent the email below to the networking-discuss list but have not received any answers so far. I have also done some additional tests, which lead me to conclude that the problem is actually more Crossbow related. To verify this, I ran the ISC dhcp software in the global zone, forcing it to also use the devices in /dev/net (as it would in a zone), but using the physical e1000g0 instead of the virtual interfaces. As far as I can tell, everything worked fine and I did not see the issues described below. Any ideas on what the issue could be, or how to debug it, would be appreciated.

Thanks,

Marcus


I have a very strange (at least for me) problem getting the ISC dhcp server and relay agent to work in a virtualized environment using zones and Crossbow. I think this ends up raising some more generic networking questions, hence my post to this list.

I am currently trying to simulate the following environment:

- one client subnet
- two dhcp servers in two different networks
- one router tying these networks together, with a dhcp relay agent running on it

Client, servers and relay agent each run in their own zone.

When I compile the software with the default setting, which uses DLPI for sending packets towards the dhcp client, I get the following behaviour on the server:

- after the first start of the server, it logs that it received a DISCOVER and sent an OFFER, though nothing can be seen on the virtual wire (using snoop)
- after that first request, no other packet is received, although snoop on that interface shows a DISCOVER being sent towards the server
- restarting the daemon does not make a difference
- after about 15 minutes without network traffic (I could not pinpoint the exact time), it works again for exactly one request (same as the first item)

This can be worked around by defining "USE_SOCKETS"; then everything works flawlessly.
I get similar behaviour on the relay agent:

- after the first start, the relay agent receives the client DISCOVER on one interface and forwards it to the two servers on two different networks, using two different interfaces (this works because sockets seem to be used, i.e. a normal IP unicast)
- the (active) server responds with an OFFER, which is received and processed by the relay agent. The relay agent then tries to send it out via DLPI (to set the broadcast addresses accordingly) towards the client; this gets logged, but nothing can be seen on the network
- all subsequent requests from the client get forwarded to the server, but the OFFER coming back is not seen by the relay agent, although I can see it on the network
- after some time (about 15 minutes) without dhcp traffic, it works again for exactly one request/response

Unfortunately the relay agent needs to use DLPI, because "USE_SOCKETS" is only supported with one interface.

This is what I found out after hours of debugging (using my intermediate C and debugging skills):

- Four file descriptors are open on the relay agent (3 on the interfaces + 1 on the socket). For the first packet, "select" reports two fds ready for reading (the socket + one interface); for all following requests only one fd (the socket) is reported ready, so the relay agent does not see the incoming OFFERs on the interface.
- Using a debugger to see how the DLPI-generated packet is assembled (I suspected a problem there), I was able to successfully send a packet towards the client. Because of the debugging, about 30 minutes passed between the relay agent receiving the DISCOVER and sending the OFFER out on the same interface, which apparently made a difference in this case.

I would very much like to get this setup running, because it allows me to simulate different network scenarios pretty easily.
To me this looks like something is "blocking" the interface after the first use, causing "select" not to report anything until a certain timeout period has passed, after which everything works again. Any pointers on what the issue could be, or on what else I can verify to get a better picture of what is happening, would be appreciated. I do not know whether this is an application-specific issue (incorrect use of DLPI) or just a special case in this virtual environment (which needs to be worked around somehow).
Nicolas Droux
2008-Dec-05 14:57 UTC
[crossbow-discuss] Strange networking issue with ISC dhcp and Crossbow
Hi Marcus,

Can you provide additional specific info on your setup, such as which VNICs are created on which interfaces or etherstubs, and assigned to which zones? Now that we integrated in ONNV build 106, you should also try these bits when they are available. We made several improvements to the code in the last few weeks as we were polishing the code for integration.

Nicolas.

_______________________________________________
crossbow-discuss mailing list
crossbow-discuss at opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/crossbow-discuss

-- 
Nicolas Droux - Solaris Kernel Networking - Sun Microsystems, Inc.
nicolas.droux at sun.com - http://blogs.sun.com/droux
Nicolas Droux
2008-Dec-05 15:01 UTC
[crossbow-discuss] Strange networking issue with ISC dhcp and Crossbow
On Dec 5, 2008, at 7:57 AM, Nicolas Droux wrote:

> Hi Marcus,
>
> Can you provide additional specific info on your setup, such as
> which VNICs are created on which interfaces or etherstubs, and
> assigned to which zones? Now that we integrated in ONNV build 106, you

Integrated in build 105, not 106. I need to catch up on sleep :-)

> should also try these bits when they are available. We made several
> improvements to the code in the last few weeks as we were polishing
> the code for integration.
>
> Nicolas.

-- 
Nicolas Droux - Solaris Kernel Networking - Sun Microsystems, Inc.
nicolas.droux at sun.com - http://blogs.sun.com/droux
Marcus Goller
2008-Dec-05 16:42 UTC
[crossbow-discuss] Strange networking issue with ISC dhcp and Crossbow
Hi Nicolas,

Sure I can.

>uname -a
SunOS work01 5.11 net-virt_xb_40_snv_101_103108 i86pc i386 i86pc

>dladm show-vnic
LINK     OVER  SPEED  MACADDRESS       MACADDRTYPE  VID
a1       e1    0      2:8:20:59:b3:59  random       0
router1  e1    0      2:8:20:31:aa:4   random       0
router2  e2    0      2:8:20:b2:af:4c  random       0
router3  e3    0      2:8:20:91:63:65  random       0
server1  e2    0      2:8:20:7d:50:e   random       0
server2  e3    0      2:8:20:5d:a3:4b  random       0

a1 is in the dhcpclient-zone (the name remained from the Getting Started document) on etherstub e1, together with router1, the client-net interface of the dhcprouter-zone (where the dhcp relay agent runs). router2 and server1 (dhcpserver1-zone) are on etherstub e2, forming server network 1. router3 and server2 (dhcpserver2-zone) are on server network 2 (etherstub e3).

Hope that helps.

I saw that the latest SXCE release available for download is 103, so I guess I need to do some reading on how to install ONNV 105 :-).

Marcus
Marcus
2008-Dec-10 09:46 UTC
[crossbow-discuss] Strange networking issue with ISC dhcp and Crossbow
Nicolas,

Would it also help, or make a difference, to try out the December 8th Crossbow release, or would only build 105 have the changes you mentioned?

Thanks,

Marcus
Nicolas Droux
2008-Dec-10 19:04 UTC
[crossbow-discuss] Strange networking issue with ISC dhcp and Crossbow
Hi Marcus,

Marcus wrote:
> Nicolas,
>
> Would it also help, or make a difference, to try out the December 8th
> Crossbow release, or would only have 105 the changes you mentioned?

The bits that we just released have our latest changes, and match our project gate before we did our last resync with onnv for integration in snv_105. So these should be fine for you to use until the official snv_105 archives are released.

Nicolas.
>>>> nicolas.droux at sun.com - http://blogs.sun.com/droux >>>> >>> -- >>> Nicolas Droux - Solaris Kernel Networking - Sun Microsystems, Inc. >>> nicolas.droux at sun.com - http://blogs.sun.com/droux >>> >>>
Marcus
2009-Jan-23 09:09 UTC
[crossbow-discuss] Strange networking issue with ISC dhcp and Crossbow
Hi Nicolas,

On Wed, Dec 10, 2008 at 8:04 PM, Nicolas Droux <Nicolas.Droux at sun.com> wrote:
> Hi Marcus,
>
> Marcus wrote:
>>
>> Nicolas,
>>
>> Would it also help, or make a difference, to try out the December 8th
>> Crossbow release, or would only 105 have the changes you mentioned?
>
> The bits that we just released have our latest changes, and match our
> project gate before we did our last resync with onnv for integration in
> snv_105.
>
> So these should be fine for you to use until the official snv_105 archives
> are released.
>
> Nicolas.

I finally got some time to upgrade my test environment to snv_105
(SXCE). The zones are still running on snv_101, but for the underlying
Crossbow network infrastructure only the global zone is relevant
anyway. Initial tests look very promising: so far I have not been able
to reproduce the issues I mentioned earlier (DHCP traffic not being
accepted or forwarded).

Thank you for the help.

Marcus
Nicolas Droux
2009-Jan-23 23:55 UTC
[crossbow-discuss] Strange networking issue with ISC dhcp and Crossbow
Hi Marcus,

That's good news, thanks for the update.

Nicolas.

-- 
Nicolas Droux - Solaris Kernel Networking - Sun Microsystems, Inc.
droux at sun.com - http://blogs.sun.com/droux
Marcus
2009-Jan-26 14:54 UTC
[crossbow-discuss] Strange networking issue with ISC dhcp and Crossbow
Hi Nicolas,

It seems I was a bit too quick with my success mail. I had not noticed
earlier that I was running the DHCP daemon with sockets (which work)
instead of DLPI. That workaround does not help with the DHCP relay, so
when I included the relay in my tests, I saw the same behavior again.

I have also re-tested this on a fresh OpenSolaris 2008.11 installation,
upgraded to snv_105 from the development repositories. I mirrored my
setup and can reproduce the issue again.

Any pointers on how I can find the root cause would be appreciated.

Marcus
Brian Banister
2009-Feb-14 02:35 UTC
[crossbow-discuss] Strange networking issue with ISC dhcp and Crossbow
I believe I am experiencing the same problem. I am running snv_106 and serving 2 subnets with the ISC DHCP server; one of the subnets is relayed. I see the relayed requests arrive when I run snoop, but I see no indication that the DHCP server actually receives them. No response message is generated. The same binary (from Blastwave) and configuration file worked fine on snv_98. Is there a fix for this yet? -- This message posted from opensolaris.org