What is the current outlook for the interaction of IPMP and crossbow?

Will management of network resources need to be above the physical network interfaces (and create lots of vnics and ipmp groups that way) or can it be on the IPMPng interface?

Of course if the IPMPng interfaces were to be GLDv3 interfaces, we could just create vnics on top of that and everything would be sweet :-)

Darren
Darren.Reed at Sun.COM writes:
> Of course if the IPMPng interfaces were to be GLDv3 interfaces,
> we could just create vnics on top of that and everything would be
> sweet :-)

IPMP is an IP-level mechanism, not a link-layer technology. Providing GLDv3 interfaces for IPMP group interfaces makes no sense to me.

-- 
James Carlson, Solaris Networking <james.d.carlson at sun.com>
Sun Microsystems / 1 Network Drive        71.232W Vox +1 781 442 2084
MS UBUR02-212 / Burlington MA 01803-2757  42.496N Fax +1 781 442 1677
Steffen Weiberle
2007-Feb-08 21:02 UTC
[crossbow-discuss] Re: [clearview-discuss] IPMP and Crossbow
Regarding the VNIC part of Crossbow, I imagine IPMP should work, and I will have to try that.

However, if you use the IP Instances part to make an interface (NIC, VLAN, VNIC, aggregation) exclusive to a non-global zone, it is not seen in the global zone where in.mpathd runs. Looking at the net-init SMF method, though, it does not do a global-zone check, so it might run in the NGZ. I will definitely test this and let you know how it goes.

Steffen

reply-to set to aliases

Darren.Reed at Sun.COM wrote On 02/08/07 15:19,:
> What is the current outlook for the interaction of IPMP and crossbow?
>
> Will management of network resources need to be above the physical
> network interfaces (and create lots of vnics and ipmp groups that way)
> or can it be on the IPMPng interface?
>
> Of course if the IPMPng interfaces were to be GLDv3 interfaces,
> we could just create vnics on top of that and everything would be
> sweet :-)
>
> Darren
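A rough sketch of the exclusive-IP setup described above (the zone name and NIC are placeholders, the zone is assumed to exist already, and this assumes a build with the IP Instances bits; exact steps may differ):

    # Dedicate a datalink to a non-global zone via an exclusive IP instance
    zonecfg -z testzone
    zonecfg:testzone> set ip-type=exclusive
    zonecfg:testzone> add net
    zonecfg:testzone:net> set physical=bge1
    zonecfg:testzone:net> end
    zonecfg:testzone> commit
    zonecfg:testzone> exit

    # bge1 is then not visible in the global zone, so any in.mpathd
    # monitoring of it would have to happen inside the zone.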
Erik Nordmark
2007-Feb-09 01:28 UTC
[crossbow-discuss] Re: [clearview-discuss] IPMP and Crossbow
Steffen Weiberle wrote:
> Regarding the VNIC part of crossbow, I imagine IPMP should work, and
> will have to try that.
>
> However, if you use the IP instance part to make an interface (NIC,
> VLAN, VNIC, aggregation) exclusive to a non-global zone, it is not seen
> in the global zone where mpathd runs, but looking at the net-init SMF
> method, it does not do a global zone check, so it might run in the NGZ.
>
> I will definitely test this and let you know how it goes.

I'm pretty sure IPMP was tested in exclusive-IP zones as part of the test plan for IP Instances.

Erik
Peter Memishian
2007-Feb-09 04:15 UTC
[crossbow-discuss] Re: [clearview-discuss] IPMP and Crossbow
> Regarding the VNIC part of crossbow, I imagine IPMP should work, and
> will have to try that.

Absolutely. IPMP across IP interfaces that have been plumbed over VNICs should pose no problem.

-- 
meem
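A minimal sketch of such a configuration, assuming two physical NICs (bge0 and bge1 are placeholders), made-up addresses, and the eventual integrated Crossbow dladm syntax (the pre-integration bits used different options):

    # One VNIC on each physical NIC (Crossbow)
    dladm create-vnic -l bge0 vnic1
    dladm create-vnic -l bge1 vnic2

    # Plumb IP over the VNICs and place both interfaces in one IPMP group
    ifconfig vnic1 plumb 192.168.10.11/24 group ipmp0 up
    ifconfig vnic2 plumb 192.168.10.12/24 group ipmp0 up

    # in.mpathd starts once a group exists; with no test addresses
    # configured, only link-based failure detection applies, which is
    # exactly the wrinkle discussed in the follow-ups below.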
Nicolas Droux
2007-Feb-09 19:25 UTC
[crossbow-discuss] Re: [clearview-discuss] IPMP and Crossbow
Peter Memishian wrote:
> > Regarding the VNIC part of crossbow, I imagine IPMP should work, and
> > will have to try that.
>
> Absolutely. IPMP across IP interfaces that have been plumbed over VNICs
> should pose no problem.

Yep, we're already planning to support this.

There's one wrinkle with link-based failure detection, though. Currently VNICs always report an "up" link state regardless of the state of the physical link. This was needed to preserve the inter-VNIC connectivity in the case of a physical link failure (ARP stops transmitting if the link state is down). We'll have to revisit this design, and fix ARP, if we want to support link-based failure detection on top of VNICs.

Nicolas.

-- 
Nicolas Droux - Solaris Networking - Sun Microsystems, Inc.
droux at sun.com - http://blogs.sun.com/droux
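As a point of reference, the physical link state Nicolas contrasts with the VNIC's is the one visible through dladm; a sketch (the interface name is a placeholder and the output format varies by build):

    # The physical NIC reports the real link state
    dladm show-dev bge0
    #   bge0    link: down    speed: 0    Mbps    duplex: unknown

    # Per the design described above, a VNIC created over bge0 would
    # nevertheless keep reporting its link as up so that VNIC-to-VNIC
    # traffic continues to flow.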
James Carlson
2007-Feb-09 22:54 UTC
[crossbow-discuss] Re: [clearview-discuss] IPMP and Crossbow
Nicolas Droux writes:
> Peter Memishian wrote:
> > > Regarding the VNIC part of crossbow, I imagine IPMP should work, and
> > > will have to try that.
> >
> > Absolutely. IPMP across IP interfaces that have been plumbed over VNICs
> > should pose no problem.
>
> Yep, we're already planning to support this.
>
> There's one wrinkle with link-based failure detection, though. Currently
> VNICs always report an "up" link state regardless of the state of the
> physical link. This was needed to preserve the inter-VNIC connectivity
> in the case of a physical link failure (ARP stops transmitting if the
> link state is down). We'll have to revisit this design, and fix ARP, if
> we want to support link-based failure detection on top of VNICs.

I'm not so sure I agree that ARP is actually "broken" here. Having the VNIC down implies that _nothing_ is or should be reachable via that interface. If there are "local" things that are available, but remote things that are not, then this doesn't appear to mirror the way real interfaces plugged into a hub work -- if the upstream link on your hub goes down, you don't get a local link down notification.

That's why IPMP uses ICMP probes -- to discern path failures.

I think that if you want to reflect underlying physical link status back to the VNIC, you should do it _only_ in the case where there's either exactly one VNIC on the link, or where none of the VNICs are able to talk to each other (for example, if they're all on separate VLANs).

Giving link down when the link isn't really down sounds like a mixed message to me.

-- 
James Carlson, Solaris Networking <james.d.carlson at sun.com>
Sun Microsystems / 1 Network Drive        71.232W Vox +1 781 442 2084
MS UBUR02-212 / Burlington MA 01803-2757  42.496N Fax +1 781 442 1677
Nicolas Droux
2007-Feb-09 23:45 UTC
[crossbow-discuss] Re: [clearview-discuss] IPMP and Crossbow
James Carlson wrote:
>> There's one wrinkle with link-based failure detection, though. Currently
>> VNICs always report an "up" link state regardless of the state of the
>> physical link. This was needed to preserve the inter-VNIC connectivity
>> in the case of a physical link failure (ARP stops transmitting if the
>> link state is down). We'll have to revisit this design, and fix ARP, if
>> we want to support link-based failure detection on top of VNICs.
>
> I'm not so sure I agree that ARP is actually "broken" here. Having
> the VNIC down implies that _nothing_ is or should be reachable via
> that interface. If there are "local" things that are available, but
> remote things that are not, then this doesn't appear to mirror the way
> real interfaces plugged into a hub work -- if the upstream link on
> your hub goes down, you don't get a local link down notification.

Yes, I agree that it's not a perfect solution either (note that I was using the conditional in my email above). Maybe the "fix" in "fix ARP" in my previous email was too strong. I'm also worried that any other protocol or subsystem outside of our control which relies on the link state would be misled and could be negatively affected as well.

> That's why IPMP uses ICMP probes -- to discern path failures.

That's always an option as well, and the way it works today in the code. I.e. probe-based failure detection would be required when using IPMP on top of VNICs. If that's an acceptable requirement I'm fine leaving the code as-is :-)

> I think that if you want to reflect underlying physical link status
> back to the VNIC, you should do it _only_ in the case where there's
> either exactly one VNIC on the link, or where none of the VNICs are
> able to talk to each other (for example, if they're all on separate
> VLANs).

I'm concerned that this would be harder to track, and that the behavior of subsystems such as IPMP in a non-global zone could change depending on the VNICs created on an interface.

Thanks,
Nicolas.

-- 
Nicolas Droux - Solaris Networking - Sun Microsystems, Inc.
droux at sun.com - http://blogs.sun.com/droux
Peter Memishian
2007-Feb-10 05:31 UTC
[crossbow-discuss] Re: [clearview-discuss] IPMP and Crossbow
> I think that if you want to reflect underlying physical link status
> back to the VNIC, you should do it _only_ in the case where there's
> either exactly one VNIC on the link, or where none of the VNICs are
> able to talk to each other (for example, if they're all on separate
> VLANs).

Seems like if VLANs are reimplemented as a type of VNIC, then the second approach will be required. Otherwise, the existing handling of link state on VLANs will be broken.

-- 
meem
Peter Memishian
2007-Feb-10 05:38 UTC
[crossbow-discuss] Re: [clearview-discuss] IPMP and Crossbow
> > That's why IPMP uses ICMP probes -- to discern path failures.
>
> That's always an option as well, and the way it works today in the code.
> I.e. probe-based failure detection would be required when using IPMP on
> top of VNICs. If that's an acceptable requirement I'm fine leaving the
> code as-is :-)

I think it would only be OK if in.mpathd can figure out whether a given IP interface supports link-based failure detection, and require the customer to configure test addresses if it does not. (This is a problem today as well, but since most links currently support link-based failure detection, it's not as pressing a problem. If there isn't one already, I'll log a bug on this.)

I suspect there will also be a fair number of customers who will be disappointed in not having link-based failure detection over VNICs, since it's become a very popular configuration choice.

-- 
meem
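For completeness, probe-based failure detection needs a test address on each interface in the group, marked deprecated and -failover so that it carries only probe traffic and never moves on failover; a hedged sketch with invented addresses, continuing the earlier example:

    # Add a test address to each IPMP group member for in.mpathd probes
    ifconfig vnic1 addif 192.168.10.111/24 deprecated -failover up
    ifconfig vnic2 addif 192.168.10.112/24 deprecated -failover up

    # The same options go into /etc/hostname.vnic1 and /etc/hostname.vnic2
    # to make the test addresses persistent across reboots.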
Thirumalai Srinivasan
2007-Feb-11 22:20 UTC
[crossbow-discuss] Re: [clearview-discuss] IPMP and Crossbow
I am wondering about the IPMP <--> VNIC feature interaction.

If vnic1 and vnic2 are in the same IPMP group, IP requires that vnic1 and vnic2 be completely equivalent from an IP perspective, i.e. anything reachable via vnic1 is also reachable via vnic2 and vice-versa. Also vnic1 and vnic2 must be on the same broadcast domain.

If vnic1 and vnic2 are plumbed over the same underlying physical NIC, the above conditions are satisfied. But this case does not seem to be highly useful, since an underlying NIC failure will impact both vnics and there isn't any real multipathing here.

If vnic1 and vnic2 are plumbed over different physical NICs, then we have 2 sub-cases.

i. vnic1 and vnic2 are the only vnics over the respective physical NICs. This satisfies the IPMP requirements. This subcase also means an underlying NIC failure can be translated to a vnic failure. Generating a DL_NOTE_LINK_DOWN is not an issue in this case.

ii. There are some other vnics on either or both of the physical NICs apart from vnic1 and vnic2. This subcase violates the IPMP requirements, since some destinations are locally reachable from vnic1 and not from vnic2 and vice-versa. vnic1 and vnic2 are not really equivalent from IP's perspective, unless we have a (v)bridge across the vnics over different physical NICs. But that makes assumptions about the vnic configurations and what is permitted.

Thirumalai

Steffen Weiberle wrote:
> Regarding the VNIC part of crossbow, I imagine IPMP should work, and
> will have to try that.
James Carlson
2007-Feb-12 20:16 UTC
[crossbow-discuss] Re: [clearview-discuss] IPMP and Crossbow
Peter Memishian writes:
> > > That's why IPMP uses ICMP probes -- to discern path failures.
> >
> > That's always an option as well, and the way it works today in the code.
> > I.e. probe-based failure detection would be required when using IPMP on
> > top of VNICs. If that's an acceptable requirement I'm fine leaving the
> > code as-is :-)
>
> I think it would only be OK if in.mpathd can figure out whether a given IP
> interface supports link-based failure detection, and require the customer
> to configure test addresses if it does not.

Yes. The point is that VNICs that allow "local" communication while the remote link is down can't claim to have link-based failure detection.

> (This is a problem today as
> well, but since most links currently support link-based failure detection,
> it's not as pressing a problem. If there isn't one already, I'll log a
> bug on this.)

It sounds like the sort of thing that's somewhat in the NWAM court.

> I suspect there will also be a fair number of customers who will be
> disappointed in not having link-based failure detection over VNICs, since
> it's become a very popular configuration choice.

Yes. But as long as the VNIC is emulating an internal connection to a bridge or repeater of some sort, I think the answer makes a fair bit of sense.

Now, if someone were to update the 802 protocols to allow a switch to notify other ports when one port fails ...

-- 
James Carlson, Solaris Networking <james.d.carlson at sun.com>
Sun Microsystems / 1 Network Drive        71.232W Vox +1 781 442 2084
MS UBUR02-212 / Burlington MA 01803-2757  42.496N Fax +1 781 442 1677
Nicolas Droux
2007-Feb-12 20:46 UTC
[crossbow-discuss] Re: [clearview-discuss] IPMP and Crossbow
Thirumalai Srinivasan wrote:
> i. vnic1 and vnic2 are the only vnics over the respective physical NICs.
> This satisfies the IPMP requirements. This subcase also means an
> underlying NIC failure can be translated to a vnic failure. Generating
> a DL_NOTE_LINK_DOWN is not an issue in this case.

I think we might also be ok if there is more than one VNIC defined on top of the physical NIC, as long as these additional VNICs are also configured within IPMP groups.

> ii. There are some other vnics on either or both of the physical NICs apart
> from vnic1 and vnic2. This subcase violates the IPMP requirements, since
> some destinations are locally reachable from vnic1 and not from vnic2 and
> vice-versa. vnic1 and vnic2 are not really equivalent from IP's
> perspective, unless we have a (v)bridge across the vnics over different
> physical NICs. But that makes assumptions about the vnic configurations
> and what is permitted.

I think this is similar to the case where IPMP would be used to group two NICs connected to two different switches, and some hosts are connected to only one switch but not the other. If one of the switches dies, then some of these hosts might become unreachable.

Until we are able to build full virtual networks in the host itself, I agree that we might have to limit the use of physical NICs by VNICs when IPMP is in use.

Nicolas.

-- 
Nicolas Droux - Solaris Networking - Sun Microsystems, Inc.
droux at sun.com - http://blogs.sun.com/droux
James Carlson
2007-Feb-12 21:26 UTC
[crossbow-discuss] Re: [clearview-discuss] IPMP and Crossbow
Nicolas Droux writes:
> James Carlson wrote:
> > That's why IPMP uses ICMP probes -- to discern path failures.
>
> That's always an option as well, and the way it works today in the code.
> I.e. probe-based failure detection would be required when using IPMP on
> top of VNICs. If that's an acceptable requirement I'm fine leaving the
> code as-is :-)

I don't know whether it's an "acceptable requirement."

What I'm pointing out is that if you report "link down," but the link is still actually usable for some subset of 'local' communication, then something's wrong with the definition of "link down" that's in use here.

Scratch IPMP out of the picture for a moment. Suppose you reported the real interface's link-down event up to the VNIC. This would cause the IFF_RUNNING bit to disappear from the IP-level interface. And that would cause applications to avoid using the interface (declaring it to be "dead" and so forth), even though there are other VNICs on the same real interface that can still talk to each other.

I don't want to see us carve out exceptions to IFF_RUNNING. It's one thing if a packet can sneak in or out because the underlying medium is somehow unstable during the "down" period, but quite another matter if "down" really means "partitioned somewhere other than this local NIC."

> > I think that if you want to reflect underlying physical link status
> > back to the VNIC, you should do it _only_ in the case where there's
> > either exactly one VNIC on the link, or where none of the VNICs are
> > able to talk to each other (for example, if they're all on separate
> > VLANs).
>
> I'm concerned that this would be harder to track, and that the behavior
> of subsystems such as IPMP in a non-global zone could change depending
> on the VNICs created on an interface.

Yes, but at least it's a fair representation of the structure the user has created.

The clear alternative that I can see is to *prohibit* the user from creating any VNICs that can talk to each other, unless they're attached via a virtual bridge. That makes the administrative relationship clear.

-- 
James Carlson, Solaris Networking <james.d.carlson at sun.com>
Sun Microsystems / 1 Network Drive        71.232W Vox +1 781 442 2084
MS UBUR02-212 / Burlington MA 01803-2757  42.496N Fax +1 781 442 1677
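As an aside, the IFF_RUNNING state discussed here is what surfaces in the flags line of ifconfig output, so the effect on applications is easy to observe; the interface name and flag value below are illustrative:

    # RUNNING is present in the flags while the interface has link
    ifconfig vnic1
    #   vnic1: flags=1000843<UP,BROADCAST,RUNNING,MULTICAST,IPv4> mtu 1500 ...

    # Applications typically read the same bit via the SIOCGLIFFLAGS ioctl
    # and test lifr_flags & IFF_RUNNING.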
Nicolas Droux
2007-Feb-13 00:25 UTC
[crossbow-discuss] Re: [clearview-discuss] IPMP and Crossbow
James Carlson wrote:
> What I'm pointing out is that if you report "link down," but the link
> is still actually usable for some subset of 'local' communication,
> then something's wrong with the definition of "link down" that's in
> use here.
>
> Scratch IPMP out of the picture for a moment. Suppose you reported
> the real interface's link-down event up to the VNIC. This would cause
> the IFF_RUNNING bit to disappear from the IP-level interface. And
> that would cause applications to avoid using the interface (declaring
> it to be "dead" and so forth), even though there are other VNICs on
> the same real interface that can still talk to each other.

Yes, I have the same concern and pointed that out in my previous email as well.

> I don't want to see us carve out exceptions to IFF_RUNNING. It's one
> thing if a packet can sneak in or out because the underlying medium is
> somehow unstable during the "down" period, but quite another matter if
> "down" really means "partitioned somewhere other than this local NIC."

So maybe we should virtualize the link state. So far we've assumed that the only information available to IPMP was the state of the link advertised by the VNIC, but it doesn't have to be that way.

I.e. we could have a "virtual link state", always up, which would be used to set the IFF_RUNNING flag and would reflect the connectivity between VNICs, and a "physical link state", reflecting the state of the underlying hardware, which would be used by IPMP for link-based failure detection.

Nicolas.

-- 
Nicolas Droux - Solaris Networking - Sun Microsystems, Inc.
droux at sun.com - http://blogs.sun.com/droux
Richard L. Hamilton
2007-Feb-13 11:57 UTC
[crossbow-discuss] Re: Re: [clearview-discuss] IPMP and Crossbow
> If vnic1 and vnic2 are plumbed over the same underlying physical NIC, the
> above conditions are satisfied. But this case does not seem to be highly
> useful, since an underlying NIC failure will impact both vnics and there
> isn't any real multipathing here.

Just as "format" now goes to some trouble to warn about partitions currently in use as filesystems, swap, zpools, etc., so perhaps something (NWAM?) should warn about pointless (but not necessarily outright forbidden) configurations such as trying to multipath over multiple VNICs on the same physical NIC.
James Carlson
2007-Feb-13 16:29 UTC
[crossbow-discuss] Re: [clearview-discuss] IPMP and Crossbow
Nicolas Droux writes:
> > I don't want to see us carve out exceptions to IFF_RUNNING. It's one
> > thing if a packet can sneak in or out because the underlying medium is
> > somehow unstable during the "down" period, but quite another matter if
> > "down" really means "partitioned somewhere other than this local NIC."
>
> So maybe we should virtualize the link state. So far we've assumed that
> the only information available to IPMP was the state of the link
> advertised by the VNIC, but it doesn't have to be that way.
>
> I.e. we could have a "virtual link state", always up, which would be
> used to set the IFF_RUNNING flag and would reflect the connectivity between
> VNICs, and a "physical link state", reflecting the state of the
> underlying hardware, which would be used by IPMP for link-based failure
> detection.

I suspect that proper administrative use of IPMP requires quite a bit more information than just physical link up/down. Once we stop caring about just the local interface (which is what IPMP historically has guarded), this reduces to the same problem as "path protection." Guarding against remote facilities failure means understanding the topology of the underlying network and knowing which parts are "important" for your application.

In other words, the user setting up IPMP and the utilities that assist need to know whether two VNICs represent the same underlying interface. Attempting to put two VNICs with the same physical interface in the same group *might* be regarded as a configuration error -- unless the user understands that he hasn't gained any redundancy by doing so. In addition, it may well be a configuration error if some VNIC pairs are grouped in one way between VNICs on particular physical interfaces, but other pairs are grouped in a different way.

But I think that in this direction may lie eventual madness, as allowing bridges to connect in through VNICs means that connectivity itself is much harder to define.

Stepping back for a moment, I think that, though it was implemented within IP itself, IPMP was originally intended to protect against physical interface failure primarily and switch path to "the default router" as a second issue. Instead of trying to define how to run IPMP on virtual interfaces, and dealing with the confusion and broken programming interfaces that result, I think we should be looking at means to allow IPMP-like fail-over between the physical interfaces used by VNICs and some sort of common probing mechanism for those who need the switch path protection feature.

IPMP on VNICs might just be a syntax error.

-- 
James Carlson, Solaris Networking <james.d.carlson at sun.com>
Sun Microsystems / 1 Network Drive        71.232W Vox +1 781 442 2084
MS UBUR02-212 / Burlington MA 01803-2757  42.496N Fax +1 781 442 1677
Peter Memishian
2007-Feb-13 16:42 UTC
[crossbow-discuss] Re: [clearview-discuss] IPMP and Crossbow
> Stepping back for a moment, I think that, though it was implemented
> within IP itself, IPMP was originally intended to protect against
> physical interface failure primarily and switch path to "the default
> router" as a second issue. Instead of trying to define how to run
> IPMP on virtual interfaces, and dealing with the confusion and broken
> programming interfaces that result, I think we should be looking at
> means to allow IPMP-like fail-over between the physical interfaces
> used by VNICs and some sort of common probing mechanism for those who
> need the switch path protection feature.

I know you said "IPMP-like", but with respect to making IPMP do this, one challenge is that the set of VNICs -- rather than the physical devices they're atop -- define the precise set of hardware addresses that may be tied to the IP addresses in the group. So, while the new IPMP model stops hosting the IP addresses on the IP interfaces in the group, the IP interfaces in the group still are important since they reflect the set of available hardware addresses.

-- 
meem
James Carlson
2007-Feb-13 16:52 UTC
[crossbow-discuss] Re: [clearview-discuss] IPMP and Crossbow
Peter Memishian writes:
> > Stepping back for a moment, I think that, though it was implemented
> > within IP itself, IPMP was originally intended to protect against
> > physical interface failure primarily and switch path to "the default
> > router" as a second issue. Instead of trying to define how to run
> > IPMP on virtual interfaces, and dealing with the confusion and broken
> > programming interfaces that result, I think we should be looking at
> > means to allow IPMP-like fail-over between the physical interfaces
> > used by VNICs and some sort of common probing mechanism for those who
> > need the switch path protection feature.
>
> I know you said "IPMP-like", but with respect to making IPMP do this, one
> challenge is that the set of VNICs -- rather than the physical devices
> they're atop -- define the precise set of hardware addresses that may be
> tied to the IP addresses in the group. So, while the new IPMP model stops
> hosting the IP addresses on the IP interfaces in the group, the IP
> interfaces in the group still are important since they reflect the set of
> available hardware addresses.

I don't think that's a significant barrier, because the underlying physical interfaces themselves will have addresses.

There's certainly the potential for a bit of confusion with some VNICs borrowing those 'real' MAC addresses and others using ones of their own devising, but I don't think that really addresses the problem that the part being protected has different assumptions in the two cases.

The inbound load-spreading aspect of shuffling IP addresses among MAC addresses, though, might be an interesting problem.

I just don't want to see us rush into supporting IPMP on VNICs and then discover later that it has either strange deployment artifacts or unsolvable bugs. We've been bitten before.

-- 
James Carlson, Solaris Networking <james.d.carlson at sun.com>
Sun Microsystems / 1 Network Drive        71.232W Vox +1 781 442 2084
MS UBUR02-212 / Burlington MA 01803-2757  42.496N Fax +1 781 442 1677
Peter Memishian
2007-Feb-13 17:04 UTC
[crossbow-discuss] Re: [clearview-discuss] IPMP and Crossbow
> I don't think that's a significant barrier, because the underlying
> physical interfaces themselves will have addresses.

Which types of addresses are you talking about? If you're talking about hardware addresses, then I think it would be inappropriate to map those to IP addresses that are hosted on the VNIC.

> The inbound load-spreading aspect of shuffling IP addresses among MAC
> addresses, though, might be an interesting problem.

It's the inbound bindings that I'm concerned about.

-- 
meem
James Carlson
2007-Feb-13 17:23 UTC
[crossbow-discuss] Re: [clearview-discuss] IPMP and Crossbow
Peter Memishian writes:
> > I don't think that's a significant barrier, because the underlying
> > physical interfaces themselves will have addresses.
>
> Which types of addresses are you talking about?

MAC-layer.

> If you're talking about
> hardware addresses, then I think it would be inappropriate to map those to
> IP addresses that are hosted on the VNIC.

It's not clear to me. It depends on how the VNICs are used, I think.

> > The inbound load-spreading aspect of shuffling IP addresses among MAC
> > addresses, though, might be an interesting problem.
>
> It's the inbound bindings that I'm concerned about.

Yes.

-- 
James Carlson, Solaris Networking <james.d.carlson at sun.com>
Sun Microsystems / 1 Network Drive        71.232W Vox +1 781 442 2084
MS UBUR02-212 / Burlington MA 01803-2757  42.496N Fax +1 781 442 1677
Erik Nordmark
2007-Feb-13 21:45 UTC
[crossbow-discuss] Re: [clearview-discuss] IPMP and Crossbow
James Carlson wrote:
> I'm not so sure I agree that ARP is actually "broken" here. Having
> the VNIC down implies that _nothing_ is or should be reachable via
> that interface. If there are "local" things that are available, but
> remote things that are not, then this doesn't appear to mirror the way
> real interfaces plugged into a hub work -- if the upstream link on
> your hub goes down, you don't get a local link down notification.
>
> That's why IPMP uses ICMP probes -- to discern path failures.
>
> I think that if you want to reflect underlying physical link status
> back to the VNIC, you should do it _only_ in the case where there's
> either exactly one VNIC on the link, or where none of the VNICs are
> able to talk to each other (for example, if they're all on separate
> VLANs).
>
> Giving link down when the link isn't really down sounds like a mixed
> message to me.

I guess the issue is how to handle various forms of degraded operation as opposed to "this NIC is guaranteed to not receive or be able to send any packets".

But do we really know that the semantics of Ethernet link test (which drivers use to declare "link down") in fact mean that no packets can be received or sent? I certainly don't know how link test reacts to partial failure (such as cutting the tx pair or rx pair but not both in a cat5 cable). Do you?

If packets can be received and/or successfully transmitted when link test says "down", then we don't have a black and white semantics. Instead we'd just have a difference in degree of degraded operation.

Erik
James Carlson
2007-Feb-13 21:53 UTC
[crossbow-discuss] Re: [clearview-discuss] IPMP and Crossbow
Erik Nordmark writes:
> But do we really know that the semantics of Ethernet link test (which
> drivers use to declare "link down") in fact mean that no packets can be
> received or sent?

It may well depend on the specific driver and the hardware, though the standards seem to say otherwise.

> I certainly don't know how link test reacts to partial failure (such as
> cutting the tx pair or rx pair but not both in a cat5 cable). Do you?

You fail to negotiate, and you're supposed to leave the link down.

> If packets can be received and/or successfully transmitted when link
> test says "down", then we don't have a black and white semantics. Instead
> we'd just have a difference in degree of degraded operation.

That's not quite the case.

The VNIC issue is that it creates a clique of nodes that can communicate with each other perfectly well. It's as if they're all connected to a single repeater. The fact that they're in fact housed in the same enclosure and the same Solaris system doesn't seem as important to me as the fact that they appear to be separate interfaces that communicate as peers.

I'm intentionally not talking about the case where there might be intermittent connectivity even though the link layer reports "down." That's understandable. What's *NOT* understandable is to get a link down message when the network itself is remotely partitioned. That doesn't make sense and doesn't map at all well into the semantics we have.

If we're happy with intentionally trashing that local communication among VNICs when the external link goes down -- that is, reporting link down and at least _assuming_ that applications will be unable to use the local path to any advantage -- then my problem with it goes away.

-- 
James Carlson, Solaris Networking <james.d.carlson at sun.com>
Sun Microsystems / 1 Network Drive        71.232W Vox +1 781 442 2084
MS UBUR02-212 / Burlington MA 01803-2757  42.496N Fax +1 781 442 1677
Erik Nordmark
2007-Feb-13 22:04 UTC
[crossbow-discuss] Re: [clearview-discuss] IPMP and Crossbow
James Carlson wrote:
> Stepping back for a moment, I think that, though it was implemented
> within IP itself, IPMP was originally intended to protect against
> physical interface failure primarily and switch path to "the default
> router" as a second issue.

It wasn't just to protect against NIC failures or router failures. It was to build a component that together with other components (e.g. STP failover between bridges, and routing protocol failover in routers) would ensure that there are no single points of failure between the host and the "cloud".

The way this is implemented, with ICMP probing to some probe targets, captures this for most but not necessarily all failure modes.

The cases one needs to be careful about are bridges that can internally partition so that the port stays active (i.e. link test towards the host still says it is ok) but the port isn't connected to e.g. the backplane in the switch any more. ICMP probing detects that case since packets can not reach any probe targets.

But should the ports be on e.g. line cards and a line card becomes disconnected from other line cards on the switch (and the line card still works and continues to assert link test and forward frames), then should the probe targets be connected to the same line card as the host, the probing will not detect the partitioned bridge.

> Instead of trying to define how to run
> IPMP on virtual interfaces, and dealing with the confusion and broken
> programming interfaces that result, I think we should be looking at
> means to allow IPMP-like fail-over between the physical interfaces
> used by VNICs and some sort of common probing mechanism for those who
> need the switch path protection feature.

I think the implications of this would be to do L2 failover (link aggregation) instead of IPMP. The reason I think that would be the result is that presumably we'd want both VNICs and VLANs (which are different yet similar abstractions on top of a physical) to have similar properties.

For VLANs we have the same issue, in the sense that vlan1 and vlan2 on top of bge0 should benefit from a single probing going on for bge0 instead of each vlan interface doing its own independent probing. But in the case of VLANs one can't do L3 probing on the physical, because the physical (which corresponds to either untagged frames or to any vlan tag you'd like) doesn't have an IP subnet number assigned to it. And should one decide to assign such a subnet number just for the purposes of IPMP probing, then there is a significant risk of removing the isolation that was the reason for having VLANs in the first place!

So I don't see a useful solution that is something in between L2 and L3 probing and failure detection.

> I just don't want to see us rush into supporting IPMP on VNICs and
> then discover later that it has either strange deployment artifacts or
> unsolvable bugs. We've been bitten before.

When were we bitten and due to what root cause? For instance, customers do IPMP on VLANs with no ill effects that I'm aware of.

Erik
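For comparison, the L2 failover Erik refers to is what dladm link aggregation already provides; a minimal sketch (interface names and the aggregation key are placeholders, and the syntax shown is the key-based form of that era):

    # Aggregate two physical NICs into one datalink (creates aggr1)
    dladm create-aggr -d bge0 -d bge1 1

    # IP (or, eventually, VNICs) can then be configured over aggr1, and a
    # failed physical port is handled below IP rather than by IPMP.
    ifconfig aggr1 plumb 192.168.10.10/24 up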
James Carlson
2007-Feb-13 22:21 UTC
[crossbow-discuss] Re: [clearview-discuss] IPMP and Crossbow
Erik Nordmark writes:
> James Carlson wrote:
> > Stepping back for a moment, I think that, though it was implemented
> > within IP itself, IPMP was originally intended to protect against
> > physical interface failure primarily and switch path to "the default
> > router" as a second issue.
>
> It wasn't just to protect against NIC failures or router failures.
> It was to build a component that together with other components (e.g.
> STP failover between bridges, and routing protocol failover in routers)
> would ensure that there are no single points of failure between the host
> and the "cloud".

But it can't actually deliver that, as it doesn't probe out all targets within the local network.

> The way this is implemented, with ICMP probing to some probe targets,
> captures this for most but not necessarily all failure modes.

Right, which is what I was pointing out above. It helps if you care about a special path to a special egress node in your network. It doesn't _necessarily_ help if you have multiple such local paths.

> The cases one needs to be careful about are bridges that can internally
> partition so that the port stays active (i.e. link test towards the host
> still says it is ok) but the port isn't connected to e.g. the backplane
> in the switch any more.

Yes.

> ICMP probing detects that case since packets can not reach any probe
> targets.
> But should the ports be on e.g. line cards and a line card becomes
> disconnected from other line cards on the switch (and the line card
> still works and continues to assert link test and forward frames), then
> should the probe targets be connected to the same line card as the host,
> the probing will not detect the partitioned bridge.

It sounds like we're saying the same thing.

> > Instead of trying to define how to run
> > IPMP on virtual interfaces, and dealing with the confusion and broken
> > programming interfaces that result, I think we should be looking at
> > means to allow IPMP-like fail-over between the physical interfaces
> > used by VNICs and some sort of common probing mechanism for those who
> > need the switch path protection feature.
>
> I think the implications of this would be to do L2 failover (link
> aggregation) instead of IPMP.

Possibly so.

> The reason I think that would be the result
> is that presumably we'd want both VNICs and VLANs (which are different
> yet similar abstractions on top of a physical) to have similar properties.
>
> For VLANs we have the same issue, in the sense that vlan1 and vlan2 on
> top of bge0 should benefit from a single probing going on for bge0
> instead of each vlan interface doing its own independent probing. But in
> the case of VLANs one can't do L3 probing on the physical, because the
> physical (which corresponds to either untagged frames or to any vlan tag
> you'd like) doesn't have an IP subnet number assigned to it.
> And should one decide to assign such a subnet number just for the
> purposes of IPMP probing, then there is a significant risk of removing
> the isolation that was the reason for having VLANs in the first place!

Yes; there's an analogous bit of trouble there.

Fortunately, with VLANs we can (and do) reflect the link-down notification to all the clients without causing confusion. That doesn't seem possible here. If we report link down to a VNIC, then we're saying that the link is in fact down, when it's not.

As I said in my last message, if we really do regard the VNIC as down (and don't have applications that _rely_ on local VNIC-to-VNIC traffic) when the external link layer goes down, then that inconsistency goes away.

> So I don't see a useful solution that is something in between L2 and L3
> probing and failure detection.

That's just the niche (with routing entirely at L3 and standards-based failover at L2) that IPMP seems to fill.

> > I just don't want to see us rush into supporting IPMP on VNICs and
> > then discover later that it has either strange deployment artifacts or
> > unsolvable bugs. We've been bitten before.
>
> When were we bitten and due to what root cause?

IPMP in general has a fair number of these, and that's what I was referring to.

In particular, routing daemons are baffled by addresses moving back and forth between interfaces and by the subtle way groups interact with routes (that RTA_IFP isn't what you think it is) and with inbound packets, and some applications end up getting confused by "test addresses" that look entirely usable but aren't.

Those problems are being addressed by Clearview. What I'm suggesting is that it'd be good to avoid creating another "almost like a normal interface but not quite" element in Solaris. It could easily mean another round of Clearview-like fixes in the future.

> For instance, customers do IPMP on VLANs with no ill effects that I'm
> aware of.

True.

-- 
James Carlson, Solaris Networking <james.d.carlson at sun.com>
Sun Microsystems / 1 Network Drive        71.232W Vox +1 781 442 2084
MS UBUR02-212 / Burlington MA 01803-2757  42.496N Fax +1 781 442 1677
Erik Nordmark
2007-Feb-14 02:30 UTC
[crossbow-discuss] Re: [clearview-discuss] IPMP and Crossbow
James Carlson wrote:
> Erik Nordmark writes:
> > But do we really know that the semantics of Ethernet link test (which
> > drivers use to declare "link down") in fact mean that no packets can be
> > received or sent?
>
> It may well depend on the specific driver and the hardware, though the
> standards seem to say otherwise.
>
> > I certainly don't know how link test reacts to partial failure (such as
> > cutting the tx pair or rx pair but not both in a cat5 cable). Do you?
>
> You fail to negotiate, and you're supposed to leave the link down.

And if the cable is partially severed after the negotiation is complete?

> > If packets can be received and/or successfully transmitted when link
> > test says "down", then we don't have a black and white semantics. Instead
> > we'd just have a difference in degree of degraded operation.
>
> That's not quite the case.

Because you use a different definition of "degree" than I do? ;-)

> The VNIC issue is that it creates a clique of nodes that can
> communicate with each other perfectly well. It's as if they're all
> connected to a single repeater.

I know that.

> The fact that they're in fact housed in the same enclosure and the
> same Solaris system doesn't seem as important to me as the fact that
> they appear to be separate interfaces that communicate as peers.

But one can potentially have the same failure modes with largish bridges, where a line card could be isolated from the rest of the bridge yet still forward frames between the ports on that line card.

> I'm intentionally not talking about the case where there might be
> intermittent connectivity even though the link layer reports "down."
> That's understandable. What's *NOT* understandable is to get a link
> down message when the network itself is remotely partitioned. That
> doesn't make sense and doesn't map at all well into the semantics we
> have.

I thought you argued in an email that the bridges should propagate link down from one port to the other. Was that a joke?

> If we're happy with intentionally trashing that local communication
> among VNICs when the external link goes down -- that is, reporting
> link down and at least _assuming_ that applications will be unable to
> use the local path to any advantage -- then my problem with it goes
> away.

You seem to be assuming the answer to my question, yet I'm still looking for the most desirable answer.

Should we or should we not treat !IFF_RUNNING as 'no packets can make it through' as opposed to 'serious problems here'?

If it is the latter, then *if* there is an alternative (IPMP, routing protocol), it makes sense to switch. But if there is no alternative then it doesn't make sense to do more than report in FMA (which we should do in the first case as well). Thus in the latter case it makes no sense for applications (other than routing/failover/fma ones) to look at IFF_RUNNING.

Erik
Erik Nordmark
2007-Feb-14 02:55 UTC
[crossbow-discuss] Re: [clearview-discuss] IPMP and Crossbow
James Carlson wrote:
> But it can't actually deliver that, as it doesn't probe out all
> targets within the local network.

I tried to explain that with my examples. Here is another attempt: as long as there isn't a gap but instead some non-zero overlap between the different pieces of the chain of failover technologies, we can ensure that there is no single point of failure. Thus IPMP probes need to reach *sufficiently far* into the network to get epsilon past the point where STP will take over and be responsible for finding alternate paths, with an assumed epsilon overlap between STP and the routing protocols towards the routers to take us the rest of the way to any endpoint in the network.

As I said further down, there is a theoretical bridge failure mode (line card partitioned from the rest of the bridge while the bridge continues to forward frames between the ports on that line card) where the target selection matters. If this is a practical failure mode, then one would have to make sure the routers (probe targets) are attached to different line cards than the hosts. (The only other alternative, whether IPMP probing or OSPF hellos between the hosts, implies N-squared scaling, which is a bit undesirable.)

> > The way this is implemented, with ICMP probing to some probe targets,
> > captures this for most but not necessarily all failure modes.
>
> Right, which is what I was pointing out above. It helps if you care
> about a special path to a special egress node in your network. It
> doesn't _necessarily_ help if you have multiple such local paths.

No, that isn't what it is doing. It doesn't care about the probe target being reachable. That is merely a way to be able to probe sufficiently far into the first-hop bridge.

> It sounds like we're saying the same thing.

I don't think so. See above.

> Fortunately, with VLANs we can (and do) reflect the link-down
> notification to all the clients without causing confusion.

I think we have some inherent confusion without any vnics or vlans, depending on how closely we look at the system. For instance, doesn't snoop claim that a packet is sent even when the link is down? snoop, just like VNICs, is an example of multiple local attachments to the same place.

> That doesn't seem possible here. If we report link down to a VNIC,
> then we're saying that the link is in fact down, when it's not.

Let's leave that topic to the other email.

> > So I don't see a useful solution that is something in between L2 and L3
> > probing and failure detection.
>
> That's just the niche (with routing entirely at L3 and standards-based
> failover at L2) that IPMP seems to fill.

No. IPMP does L3 probing and L3 failover. You were looking for something IPMP-like which would do L2 failovers underneath a collection of VNICs. That's the thing I don't think exists in any useful sense.

> > > I just don't want to see us rush into supporting IPMP on VNICs and
> > > then discover later that it has either strange deployment artifacts or
> > > unsolvable bugs. We've been bitten before.
> >
> > When were we bitten and due to what root cause?
>
> IPMP in general has a fair number of these, and that's what I was
> referring to.
>
> In particular, routing daemons are baffled by addresses moving back
> and forth between interfaces and by the subtle way groups interact
> with routes (that RTA_IFP isn't what you think it is) and with inbound
> packets, and some applications end up getting confused by "test
> addresses" that look entirely usable but aren't.

Sure, we know that. And thankfully it is finally being addressed.

> Those problems are being addressed by Clearview. What I'm suggesting
> is that it'd be good to avoid creating another "almost like a normal
> interface but not quite" element in Solaris. It could easily mean
> another round of Clearview-like fixes in the future.

But I think that is inherent; see the snoop example above. There will be cases when we will be able to have local communication (whether just with snoop as of today, or with other local parties with VNICs). That is just a side-effect of virtualization: more of the network moves inside the box.

As more of the network moves inside the box we should try to handle that reasonably well, even in failure cases, where we want to provide in.mpathd and routing daemons with information about degraded operation, yet allow things that can still get packets through (e.g., to local destinations) to continue to get those packets through.

That's the problem I think we need to address. Whether or not IFF_RUNNING is part of the solution is something we can work out once we agree on the problem to solve.

Erik