To rehash a question I had (and was told this might be a better place to find the answer) -- I've been working on LLDP support for OpenSolaris ( http://www.opensolaris.org/os/projects/lld ). LLDP works by sending Ethernet frames to the Ethernet multicast address 01-80-c2-00-00-0e with an ethertype of 0x88cc. My current prototype ( ssh://anon at hg.opensolaris.org/hg/lld/lld-proto2 ) works by doing a dlpi_open(), dlpi_enabmulti(), and dlpi_bind() ( http://src.opensolaris.org/source/xref/lld/lld-proto2/lldp.c#204 ).

It appears that doing a dlpi_bind() could possibly cause a drop in performance on the link. Since I don't have the means to test this myself, I would like to know: if that is in fact correct, is it significant enough to warrant taking another approach? If so, is there an approach that wouldn't have the performance problems?
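For readers less familiar with libdlpi, a minimal sketch of that open/enabmulti/bind sequence; the helper name and error handling are illustrative, not code from the prototype (the real version is at the lldp.c link above):

#include <sys/types.h>
#include <libdlpi.h>

#define	LLDP_SAP	0x88cc	/* LLDP ethertype */

/* LLDP nearest-bridge multicast address, 01-80-c2-00-00-0e */
static const uchar_t lldp_mcast[] = { 0x01, 0x80, 0xc2, 0x00, 0x00, 0x0e };

int
lldp_open_link(const char *linkname, dlpi_handle_t *dhp)
{
	int rc;

	/* Open the data link (e.g. "bge0") for DLPI access. */
	if ((rc = dlpi_open(linkname, dhp, 0)) != DLPI_SUCCESS)
		return (rc);

	/* Join the LLDP multicast group and bind to the LLDP SAP. */
	if ((rc = dlpi_enabmulti(*dhp, lldp_mcast,
	    sizeof (lldp_mcast))) != DLPI_SUCCESS ||
	    (rc = dlpi_bind(*dhp, LLDP_SAP, NULL)) != DLPI_SUCCESS)
		dlpi_close(*dhp);

	return (rc);
}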
Jason,

This must be the most amazing coincidence there is. I was talking to Nicolas just yesterday, telling him that we need to get in touch with Jason King and see where he is with LLDP, because we need it to make DCBX work with Crossbow.

At this point, it would be best if this is a MAC client, although we might need to tweak the MAC layer to support the SAP classification (which we are planning to do very soon). The MAC client API is described here: http://opensolaris.org/os/project/crossbow/Docs/crossbow-virt.pdf although you should probably chat with Nicolas since he is trying to create an official version of this API.

Cheers,
Sunay
> It appears that doing a dlpi_bind() could possibly cause a drop in
> performance on the link.

If you're not sending very often, one approach would be to create two DLPI endpoints (one for receive and one for send), create the receive one with DLPI_PASSIVE, and bracket the sends with dlpi_bind()/dlpi_unbind(). Admittedly a bit awkward, but it should retain performance.

As far as the performance issues I was alluding to, search for checks against mi_nactiveclients in the source tree.

-- 
meem
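A sketch of the bracketed-send half of that suggestion, assuming the receive endpoint is the one opened with DLPI_PASSIVE; the function name and the idea that the caller hands in a fully built LLDPDU are assumptions for the example, not part of the suggestion itself:

#include <sys/types.h>
#include <libdlpi.h>

#define	LLDP_SAP	0x88cc

static const uchar_t lldp_mcast[] = { 0x01, 0x80, 0xc2, 0x00, 0x00, 0x0e };

/*
 * The receive endpoint would be opened once with
 * dlpi_open(link, &rxdh, DLPI_PASSIVE) and left bound to LLDP_SAP.
 * The send endpoint below binds only around each (infrequent) send.
 */
int
lldp_send_pdu(dlpi_handle_t txdh, const void *pdu, size_t pdulen)
{
	int rc;

	if ((rc = dlpi_bind(txdh, LLDP_SAP, NULL)) != DLPI_SUCCESS)
		return (rc);
	rc = dlpi_send(txdh, lldp_mcast, sizeof (lldp_mcast), pdu, pdulen,
	    NULL);
	(void) dlpi_unbind(txdh);
	return (rc);
}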
On Fri, Mar 6, 2009 at 8:28 PM, Sunay Tripathi <Sunay.Tripathi at sun.com> wrote:
> At this point, it would be best if this is a MAC client, although
> we might need to tweak the MAC layer to support the SAP classification
> (which we are planning to do very soon). The MAC client API is
> described here: http://opensolaris.org/os/project/crossbow/Docs/crossbow-virt.pdf
> although you should probably chat with Nicolas since he is trying
> to create an official version of this API.

I've been playing around with it quite a bit, trying to come up with a design I'm happy with, and I think I'm pretty much there. I was planning to send out a request for comments from the networking community very soon before presenting it to ARC; I guess I might as well do that now.

The actual decoding of the TLVs is pretty simple. I had to get some experience (I'm a sysadmin by day, so I've been doing this in my spare time) with the other pieces not directly related to decoding the TLVs.

The current design can be seen at http://www.opensolaris.org/os/projects/lld/design -- anyone interested please comment. Nothing there is set in stone, and I would welcome feedback from those who deal with this every day. To summarize it: there is a daemon that listens and transmits on configured interfaces (currently using libdlpi); it decodes the packets from each neighbor into an nvlist and stores them. Access to the data, and modification of the daemon's configuration, is done via a door.

If the MAC client API is the most appropriate, it sounds like at least some component would need to be in the kernel. I'd definitely like to get feedback on that -- how much (all of it, some of it, etc.) would be appropriate.

For DCBX, my current thought is to extend the door interface to allow the FCoE drivers to indicate which interfaces should enable DCBX, as well as get and set the necessary data for feature negotiation. I did plan to try to get some discussions going with whoever is working on FCoE to figure out what sort of API would make the most sense for that, so I guess it sounds like now is a good time to do that :)
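As an illustration of the TLV decoding the daemon does, a sketch of parsing the 16-bit LLDP TLV header (7-bit type, 9-bit length) and stashing one TLV into an nvlist; the nvlist key names are invented for the example and are not taken from the lld prototype:

#include <sys/types.h>
#include <libnvpair.h>

/*
 * Decode one TLV starting at buf[off].  The LLDP TLV header is two
 * bytes: the top 7 bits are the type, the low 9 bits the length of
 * the value that follows.  Returns the offset of the next TLV, or 0
 * at the End of LLDPDU TLV or on a malformed TLV.
 */
static size_t
lldp_decode_tlv(const uint8_t *buf, size_t off, size_t buflen, nvlist_t *nvl)
{
	uint16_t hdr, type, len;

	if (off + 2 > buflen)
		return (0);
	hdr = (buf[off] << 8) | buf[off + 1];
	type = hdr >> 9;
	len = hdr & 0x1ff;
	if (type == 0 || off + 2 + len > buflen)
		return (0);

	/* Example keys only -- the real daemon uses its own schema. */
	(void) nvlist_add_uint8(nvl, "tlv-type", (uint8_t)type);
	(void) nvlist_add_byte_array(nvl, "tlv-value",
	    (uchar_t *)(buf + off + 2), len);

	return (off + 2 + len);
}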
On Fri, Mar 6, 2009 at 9:26 PM, Peter Memishian <peter.memishian at sun.com> wrote:
> If you're not sending very often, one approach would be to create two DLPI
> endpoints (one for receive and one for send), create the receive one with
> DLPI_PASSIVE, and bracket the sends with dlpi_bind()/dlpi_unbind().
> Admittedly a bit awkward, but it should retain performance.
>
> As far as the performance issues I was alluding to, search for checks
> against mi_nactiveclients in the source tree.

That might work -- at most a packet will be transmitted on an interface once a second (typically once every 30 seconds) -- I'm guessing that'd qualify as 'not often'.
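In that 'not often' regime the transmit side is essentially a timer loop; a sketch built on the lldp_send_pdu() helper sketched earlier (the 30-second default matches 802.1AB's msgTxInterval; everything else is illustrative):

#include <sys/types.h>
#include <unistd.h>
#include <libdlpi.h>

#define	LLDP_TX_INTERVAL	30	/* seconds; 802.1AB msgTxInterval default */

extern int lldp_send_pdu(dlpi_handle_t, const void *, size_t); /* earlier sketch */

/* Periodically send the (pre-built) LLDPDU for one link. */
static void
lldp_tx_loop(dlpi_handle_t txdh, const uint8_t *pdu, size_t pdulen)
{
	for (;;) {
		(void) lldp_send_pdu(txdh, pdu, pdulen);
		(void) sleep(LLDP_TX_INTERVAL);
	}
}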
On Sat, Mar 7, 2009 at 9:00 AM, Jason King <jason at ansipunx.net> wrote:
> The current design can be seen at
> http://www.opensolaris.org/os/projects/lld/design -- anyone
> interested please comment. Nothing there is set in stone, and I would
> welcome feedback from those who deal with this every day.

Minor correction: the correct link is http://www.opensolaris.org/os/project/lld/design/

-- Sriram
On Fri, Mar 6, 2009 at 10:00 PM, Sriram Narayanan <sriram at belenix.org> wrote:
> Minor correction: the correct link is http://www.opensolaris.org/os/project/lld/design/

Ack... my apologies -- that's what I get for typing it out by hand :)
On Mar 6, 2009, at 8:26 PM, Peter Memishian wrote:
> If you're not sending very often, one approach would be to create two DLPI
> endpoints (one for receive and one for send), create the receive one with
> DLPI_PASSIVE, and bracket the sends with dlpi_bind()/dlpi_unbind().
> Admittedly a bit awkward, but it should retain performance.
>
> As far as the performance issues I was alluding to, search for checks
> against mi_nactiveclients in the source tree.

mi_nactiveclients tracks the number of active clients at the MAC client API level. In Jason's case, the bind would reuse the dls link and its MAC client handle for the underlying data-link, which would not cause mi_nactiveclients to be bumped up.

But I think that a larger issue is how LLDP intersects with other MAC layer features, such as link aggregation/VLANs/VNICs/etc., and what components will be consuming and generating LLDP PDUs. A kernel implementation might be needed to properly interoperate with other MAC features and provide the right set of APIs for protocols or features which will use LLDP.

I think we first need to better understand these issues and related requirements before deciding on the best approach.

Nicolas.

-- 
Nicolas Droux - Solaris Kernel Networking - Sun Microsystems, Inc.
droux at sun.com - http://blogs.sun.com/droux
> mi_nactiveclients tracks the number of active clients at the MAC
> client API level. In Jason's case, the bind would reuse the dls link
> and its MAC client handle for the underlying data-link, which would
> not cause mi_nactiveclients to be bumped up.

OK, you're saying that since both DLPI and IP use DLS, they will both use the same MAC client handle and thus mi_nactiveclients will not be bumped, right?

-- 
meem
On Mar 10, 2009, at 5:15 PM, Peter Memishian wrote:
> OK, you're saying that since both DLPI and IP use DLS, they will both use
> the same MAC client handle and thus mi_nactiveclients will not be bumped,
> right?

That's right.

Nicolas.

-- 
Nicolas Droux - Solaris Kernel Networking - Sun Microsystems, Inc.
droux at sun.com - http://blogs.sun.com/droux
> That's right.

OK, thanks for explaining that. My misunderstanding. (It's still good for in.mpathd to avoid the dlpi_bind(), since it has no interest in receiving packets and the system would just be wasting resources passing them up to it.)

-- 
meem
On Tue, Mar 10, 2009 at 5:32 PM, Nicolas Droux <droux at sun.com> wrote:
> mi_nactiveclients tracks the number of active clients at the MAC client API
> level. In Jason's case, the bind would reuse the dls link and its MAC client
> handle for the underlying data-link, which would not cause mi_nactiveclients
> to be bumped up.

That's good to know.

> But I think that a larger issue is how LLDP intersects with other MAC layer
> features, such as link aggregation/VLANs/VNICs/etc., and what components will
> be consuming and generating LLDP PDUs. A kernel implementation might be
> needed to properly interoperate with other MAC features and provide the
> right set of APIs for protocols or features which will use LLDP.

LLDP works on 'physical' links, in that from what I can glean from reading the specification, PDUs would be sent on every link in an aggregation (with the link-specific information), and sent without any VLAN tags (i.e. you would not have a separate PDU for every VLAN).

For example, if 'hostA' had a hostid of 'abcd1234' and a port 'nge2' with vlan 400 and vlan 500 configured, the neighbor would see the following PDU:

Chassis ID: abcd1234
Port ID: nge2
VLAN ID: 400, enabled
VLAN ID: 500, enabled

If the host was acting as a switch and a VLAN was associated with the untagged frames on the port, a 'Port VLAN ID' TLV could be included as well that carried the untagged VLAN value.

For an aggregation, assuming each link is configured for transmission (it would be odd to only have some of them, but it doesn't appear to be prohibited), you would see the chassis ID, the port ID (for that link -- i.e. 'bge0' or such), and the aggregation TLV (which contains the aggregation ID) sent. Thus if 'hostA' with a hostid of 'abcd1234' had aggr1234 configured on bge3 and e1000g5, the neighbors would see two PDUs:

From bge3:
Chassis ID: abcd1234
Port ID: bge3
Aggr: Enabled, ID=1234

From e1000g5:
Chassis ID: abcd1234
Port ID: e1000g5
Aggr: Enabled, ID=1234

So far, the potential consumers would be dladm(1M) and DCBX (for FCoE).
In the future, other consumers might be nwam, an SMA agent (for implementing the LLDP MIB), and VoIP software (if the MED extensions are implemented).

For VNICs, that is an interesting case that might warrant further discussion. If associated (if that's the right term) with an etherstub, I think they'd behave like any other link. When they're associated with a physical link, it becomes a more interesting question: should PDUs be sent with the VNIC information as well? It is permissible, though the current draft for DCBX suggests that negotiation in such an instance (where one end sees multiple neighbors on a single link) is halted until only one neighbor is seen.
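To make the PDU examples above concrete, a sketch of serializing the mandatory Chassis ID and Port ID TLVs (a 16-bit header with 7-bit type and 9-bit length, followed by the value); the subtype values come from IEEE 802.1AB, but the helpers themselves are illustrative, not code from the lld prototype:

#include <sys/types.h>
#include <string.h>

#define	LLDP_TLV_CHASSIS_ID	1
#define	LLDP_TLV_PORT_ID	2
#define	LLDP_CHASSIS_LOCAL	7	/* "locally assigned" chassis subtype */
#define	LLDP_PORT_IFNAME	5	/* "interface name" port subtype */

/* Append one TLV (type, subtype, string value) to buf; returns bytes written. */
static size_t
lldp_append_tlv(uint8_t *buf, uint8_t type, uint8_t subtype, const char *value)
{
	size_t vlen = strlen(value) + 1;	/* subtype byte + string */
	uint16_t hdr = (type << 9) | (vlen & 0x1ff);

	buf[0] = hdr >> 8;
	buf[1] = hdr & 0xff;
	buf[2] = subtype;
	(void) memcpy(buf + 3, value, vlen - 1);
	return (2 + vlen);
}

/* "Chassis ID: abcd1234" / "Port ID: bge3" from the example above. */
static size_t
lldp_build_mandatory(uint8_t *buf, const char *hostid, const char *port)
{
	size_t off = 0;

	off += lldp_append_tlv(buf + off, LLDP_TLV_CHASSIS_ID,
	    LLDP_CHASSIS_LOCAL, hostid);
	off += lldp_append_tlv(buf + off, LLDP_TLV_PORT_ID,
	    LLDP_PORT_IFNAME, port);
	return (off);
}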
Jason King writes:
> LLDP works on 'physical' links, in that from what I can glean from
> reading the specification, PDUs would be sent on every link in an
> aggregation (with the link-specific information), and sent without any
> VLAN tags (i.e. you would not have a separate PDU for every VLAN).

That's pretty much what I expected, and it puts this protocol on a par with LACP.

Unfortunately, because of the way aggregations work in OpenSolaris, I don't think you can do this with DLPI from user space today -- you can't send packets through those links, because the aggregation owns them exclusively. (We allow just passive snooping.)

I think you'll need a special new mechanism down in the mac layer to handle LLDP I/O.

-- 
James Carlson, Solaris Networking         <james.d.carlson at sun.com>
Sun Microsystems / 35 Network Drive        71.232W Vox +1 781 442 2084
MS UBUR02-212 / Burlington MA 01803-2757   42.496N Fax +1 781 442 1677
On Wed, Mar 11, 2009 at 2:25 PM, James Carlson <james.d.carlson at sun.com> wrote:
> I think you'll need a special new mechanism down in the mac layer to
> handle LLDP I/O.

Hrmm... Doing a quick inspection, it appears that the aggr driver opens the port in exclusive mode, so nothing else (such as LLDP) would be able to open it. This is getting more interesting :)

I can think of a few workarounds (mostly around special-casing aggregations and adding some hooks in either the mac or aggr driver to let LLDP sneak in), but none would look particularly pretty. Does anyone a bit more familiar with this area have any ideas?
On Mar 10, 2009, at 10:09 PM, Jason King wrote:
> LLDP works on 'physical' links, in that from what I can glean from
> reading the specification, PDUs would be sent on every link in an
> aggregation (with the link-specific information), and sent without any
> VLAN tags (i.e. you would not have a separate PDU for every VLAN).

As Jim already pointed out in his reply, the current implementation of link aggregation in OpenSolaris is based on the assumption that the link aggregation "owns" the underlying link, and a DLPI application like your proposed daemon cannot at the same time send PDUs on individual ports. So some more work will be needed to make this work for link aggregation.

> For example, if 'hostA' had a hostid of 'abcd1234' and a port 'nge2'
> with vlan 400 and vlan 500 configured, the neighbor would see the
> following PDU:
>
> Chassis ID: abcd1234
> Port ID: nge2
> VLAN ID: 400, enabled
> VLAN ID: 500, enabled
>
> If the host was acting as a switch and a VLAN was associated with the
> untagged frames on the port, a 'Port VLAN ID' TLV could be included as
> well that carried the untagged VLAN value.
> For an aggregation, assuming each link is configured for transmission
> (it would be odd to only have some of them, but it doesn't appear to be
> prohibited), you would see the chassis ID, the port ID (for that link
> -- i.e. 'bge0' or such), and the aggregation TLV (which contains the
> aggregation ID) sent. Thus if 'hostA' with a hostid of 'abcd1234' had
> aggr1234 configured on bge3 and e1000g5, the neighbors would see two PDUs:
>
> From bge3:
> Chassis ID: abcd1234
> Port ID: bge3
> Aggr: Enabled, ID=1234
>
> From e1000g5:
> Chassis ID: abcd1234
> Port ID: e1000g5
> Aggr: Enabled, ID=1234

And you are planning to use libdladm to find the information needed on the various data-links to generate the LLDP contents, right?

> So far, the potential consumers would be dladm(1M) and DCBX (for FCoE).
> In the future, other consumers might be nwam, an SMA agent (for
> implementing the LLDP MIB), and VoIP software (if the MED extensions
> are implemented).

It would be interesting to discuss how this will fit with the FCoE stack in more detail. I'm cc'ing Zhong Wang, who is working on FCoE in OpenSolaris and requires DCBX support for negotiating Priority Flow Control (PFC). The FCoE stack has both user and kernel-space components, so I suppose that FCoE could use DCBX on LLDP either way, and then, as a result of the negotiation, pass the CoS value to be used and other info through the MAC client API.

> For VNICs, that is an interesting case that might warrant further
> discussion. If associated (if that's the right term) with an etherstub,
> I think they'd behave like any other link. When they're associated
> with a physical link, it becomes a more interesting question: should
> PDUs be sent with the VNIC information as well? It is permissible,
> though the current draft for DCBX suggests that negotiation in such an
> instance (where one end sees multiple neighbors on a single link) is
> halted until only one neighbor is seen.

We might be required to do this if DCBX relies on such information during negotiation. Also, since VNICs can have their own VID, and VLAN data-links are implemented as VNICs, VNICs would have to be taken into account when advertising the VIDs configured on the link.

Thanks,
Nicolas.

-- 
Nicolas Droux - Solaris Kernel Networking - Sun Microsystems, Inc.
droux at sun.com - http://blogs.sun.com/droux
On Mar 11, 2009, at 3:08 PM, Jason King wrote:
> Hrmm... Doing a quick inspection, it appears that the aggr driver opens
> the port in exclusive mode, so nothing else (such as LLDP) would be
> able to open it. This is getting more interesting :)

Something else could open it, but only to capture packets in passive mode; you can still snoop an aggregated link, for example.

> I can think of a few workarounds (mostly around special-casing
> aggregations and adding some hooks in either the mac or aggr driver to
> let LLDP sneak in), but none would look particularly pretty. Does anyone
> a bit more familiar with this area have any ideas?

One approach would be to send/receive the PDUs through the aggregation. On the receive side, the PDUs received by the ports should already be passed up. On the send side, the aggregation could detect the LLDP PDUs (this may not be a big issue since aggr needs to parse the packet headers to compute the hash for port selection anyway), and when an LLDP PDU is being sent, send a corresponding PDU across all members of the aggregation. How tricky this gets could depend on the contents of the PDUs for aggregation links. This is not particularly pretty, but at least the kernel changes would be contained in aggr itself.

Nicolas.

-- 
Nicolas Droux - Solaris Kernel Networking - Sun Microsystems, Inc.
droux at sun.com - http://blogs.sun.com/droux
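The detection itself would be cheap; a hedged sketch of the sort of ethertype check aggr's transmit path could make, assuming the Ethernet header sits in a single pulled-up mblk (real code would have to verify that):

#include <sys/types.h>
#include <sys/stream.h>
#include <sys/strsun.h>
#include <sys/ethernet.h>
#include <sys/byteorder.h>

#define	LLDP_ETHERTYPE	0x88cc

/* Return B_TRUE if the (pulled-up, untagged) frame in mp is an LLDP PDU. */
static boolean_t
aggr_pdu_is_lldp(mblk_t *mp)
{
	struct ether_header *ehp;

	if (MBLKL(mp) < sizeof (struct ether_header))
		return (B_FALSE);
	ehp = (struct ether_header *)mp->b_rptr;
	return (ntohs(ehp->ether_type) == LLDP_ETHERTYPE ? B_TRUE : B_FALSE);
}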
Nicolas Droux writes:
> One approach would be to send/receive the PDUs through the
> aggregation. On the receive side, the PDUs received by the ports should
> already be passed up. On the send side, the aggregation could detect
> the LLDP PDUs (this may not be a big issue since aggr needs to parse
> the packet headers to compute the hash for port selection anyway), and
> when an LLDP PDU is being sent, send a corresponding PDU across all
> members of the aggregation. How tricky this gets could depend on the
> contents of the PDUs for aggregation links. This is not particularly
> pretty, but at least the kernel changes would be contained in aggr
> itself.

The LLDPDUs sent are different on each port -- at least the Port ID TLV needs to be different. That means that the aggr code would need to parse the TLVs, find the right one (fortunately, there's a mandatory ordering), and modify it for sending on each of the links in the aggregation.

On receive, the listener will need to know which underlying port received a given LLDPDU so that it can keep them straight.

I suppose it's possible to do this, but I'm not sure how viable that design would be. I think it'd be much better to provide a way to get real per-port access. You're going to need it anyway if you implement an 802.1X authenticator.

-- 
James Carlson, Solaris Networking         <james.d.carlson at sun.com>
Sun Microsystems / 35 Network Drive        71.232W Vox +1 781 442 2084
MS UBUR02-212 / Burlington MA 01803-2757   42.496N Fax +1 781 442 1677
On Thu, Mar 12, 2009 at 7:02 AM, James Carlson <james.d.carlson at sun.com> wrote:
> I suppose it's possible to do this, but I'm not sure how viable that
> design would be. I think it'd be much better to provide a way to get
> real per-port access. You're going to need it anyway if you implement
> an 802.1X authenticator.

Are there any plans afoot to implement 802.1X?
Jason King writes:
> On Thu, Mar 12, 2009 at 7:02 AM, James Carlson <james.d.carlson at sun.com> wrote:
> > I think it'd be much better to provide a way to get real per-port
> > access. You're going to need it anyway if you implement an 802.1X
> > authenticator.
>
> Are there any plans afoot to implement 802.1X?

It's CR 5092062. There's been a fair amount of interest in it (including from our own IT department), and I know we've internally discussed plans to launch a project, but I don't know the current state of things. I'd like to see us have a nice generic implementation that works on all interfaces, rather than something wired into just 802.11.

(And in terms of priority, I'd expect that supplicant would come first. Authenticator is harder, as doing a good job on that means connecting into AAA, and Solaris just doesn't have _any_ AAA infrastructure.)

-- 
James Carlson, Solaris Networking         <james.d.carlson at sun.com>
Sun Microsystems / 35 Network Drive        71.232W Vox +1 781 442 2084
MS UBUR02-212 / Burlington MA 01803-2757   42.496N Fax +1 781 442 1677
On Mar 12, 2009, at 6:02 AM, James Carlson wrote:
> The LLDPDUs sent are different on each port -- at least the Port ID
> TLV needs to be different. That means that the aggr code would need
> to parse the TLVs, find the right one (fortunately, there's a
> mandatory ordering), and modify it for sending on each of the links in
> the aggregation.
>
> On receive, the listener will need to know which underlying port
> received a given LLDPDU so that it can keep them straight.
>
> I suppose it's possible to do this, but I'm not sure how viable that
> design would be. I think it'd be much better to provide a way to get
> real per-port access. You're going to need it anyway if you implement
> an 802.1X authenticator.

For performance reasons the aggregation owns the underlying MAC, and the MAC layer corresponding to the aggregated ports is bypassed on the data-path (today we have full bypass on RX and partial bypass on TX; full TX bypass is planned as well). This doesn't allow other clients to have direct access to an aggregated port to send and receive data. Other changes to the core MAC layer would be needed on the control path as well to make this possible. Since protocols like 802.1X and LLDP don't need the full functionality of the MAC client API, we could potentially provide them limited access to ports via the aggregation, while avoiding having aggr understand these protocols, and preserving the bypass.

Nicolas.

-- 
Nicolas Droux - Solaris Kernel Networking - Sun Microsystems, Inc.
droux at sun.com - http://blogs.sun.com/droux
On Fri, Mar 13, 2009 at 5:23 PM, Nicolas Droux <droux at sun.com> wrote:
> For performance reasons the aggregation owns the underlying MAC, and the MAC
> layer corresponding to the aggregated ports is bypassed on the data-path
> (today we have full bypass on RX and partial bypass on TX; full TX bypass is
> planned as well). This doesn't allow other clients to have direct access to
> an aggregated port to send and receive data. Other changes to the core MAC
> layer would be needed on the control path as well to make this possible.
> Since protocols like 802.1X and LLDP don't need the full functionality of
> the MAC client API, we could potentially provide them limited access to
> ports via the aggregation, while avoiding having aggr understand these
> protocols, and preserving the bypass.

I'm not sure if I'd need to bounce this back to networking-discuss, but I'll start here.

If LLDP or 802.1X call the MAC API directly (implying at least some pieces of this need to sit in a kernel module), it sounds like it could have a negative impact on performance (if something else -- IP? -- is also using the link) due to multiple MAC clients, whereas if libdlpi is used, it sounds like the existing MAC handle can be reused.

Does anyone know if the overhead is significant enough to warrant possibly making the limited access needed for aggregations go through some private DLPI calls that can make the corresponding MAC calls to get at the underlying links?
On Fri, Mar 13, 2009 at 5:23 PM, Nicolas Droux <droux at sun.com> wrote:
> For performance reasons the aggregation owns the underlying MAC, and the MAC
> layer corresponding to the aggregated ports is bypassed on the data-path
> (today we have full bypass on RX and partial bypass on TX; full TX bypass is
> planned as well). This doesn't allow other clients to have direct access to
> an aggregated port to send and receive data.

Can you explain this a bit more? I've done work on either end of the stack (DLPI clients, and NIC drivers using GLDv3), but nothing in between, so I've been trying to figure out how it all interacts.

Do you mean the MAC instance that aggr creates for itself is bypassed? I.e., on RX it goes from driver->mac->dls, instead of driver->mac->aggr->mac->dls? If so, does this imply LACP has to be off (tangentially related, but just trying to get the big picture)?
On Apr 7, 2009, at 9:43 PM, Jason King wrote:
> Can you explain this a bit more? I've done work on either end of the
> stack (DLPI clients, and NIC drivers using GLDv3), but nothing in
> between, so I've been trying to figure out how it all interacts.
>
> Do you mean the MAC instance that aggr creates for itself is bypassed?
> I.e., on RX it goes from driver->mac->dls, instead of
> driver->mac->aggr->mac->dls? If so, does this imply LACP has to be off
> (tangentially related, but just trying to get the big picture)?

It's the MAC layer associated with the port itself that's bypassed, since we still need to have aggr on the receive path for LACP, and on TX for port selection. So instead of driver->mac->aggr->mac->dls, we have driver->aggr->mac->dls (actually dls is also bypassed on the data-path for the most common cases, but you get the drift).

In order to do this, MAC provides mac_hwring*() functions to aggr. Aggr uses these functions to obtain the list of the rings of the underlying NIC, and exposes corresponding pseudo-rings to the MAC layer above it. These functions also rewire the port's MAC data-path, both on the interrupt and polling path, to enable the bypass.

Hope this helps,
Nicolas.
-- 
Nicolas Droux - Solaris Kernel Networking - Sun Microsystems, Inc.
droux at sun.com - http://blogs.sun.com/droux
On Wed, Apr 8, 2009 at 11:11 PM, Nicolas Droux <Nicolas.Droux at sun.com> wrote:
> It's the MAC layer associated with the port itself that's bypassed, since we
> still need to have aggr on the receive path for LACP, and on TX for port
> selection. So instead of driver->mac->aggr->mac->dls, we have
> driver->aggr->mac->dls (actually dls is also bypassed on the data-path for
> the most common cases, but you get the drift).
>
> In order to do this, MAC provides mac_hwring*() functions to aggr. Aggr uses
> these functions to obtain the list of the rings of the underlying NIC, and
> exposes corresponding pseudo-rings to the MAC layer above it. These
> functions also rewire the port's MAC data-path, both on the interrupt and
> polling path, to enable the bypass.

Yes, I was suspecting something like that, but I only glanced through the Crossbow design docs a long time ago, so I'm not (yet) familiar with all the updates that have been made to the MAC layer.
What I'm wondering about as a solution for LLDP (and 802.1X) is perhaps a private callback function for RX, something conceptually like the following (the actual types are probably off a bit since I don't have the source in front of me, and the names are just placeholders):

typedef enum { AGGR_CB_LLDP, AGGR_CB_8021X } aggr_cb_proto_t;

typedef void (*aggr_per_port_cb_t)(aggr_grp_t *grp, datalink_id_t link,
    mblk_t *mp, void *cookie);

int aggr_add_proto_cb(aggr_grp_t *grp, aggr_cb_proto_t proto,
    aggr_per_port_cb_t cb, void *cookie);
int aggr_del_proto_cb(aggr_grp_t *grp, aggr_cb_proto_t proto,
    aggr_per_port_cb_t cb);
int aggr_tx_port(aggr_grp_t *grp, datalink_id_t link, mblk_t *mp);

Though it has the disadvantage of complicating add/remove of links, and port up/down events. But as a starting point, I'd like to get feedback / have holes shot in it / etc.
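For what it's worth, a sketch of how an LLDP consumer might sit on top of those placeholders; everything here (including lldp_agent_t and lldp_process_pdu()) builds on the hypothetical names above, so none of it is an existing interface:

/*
 * Hypothetical consumer of the proposed aggr callbacks above.
 * aggr_grp_t, aggr_add_proto_cb() and aggr_tx_port() are the
 * placeholders from the proposal, not existing interfaces.
 */

typedef struct lldp_agent lldp_agent_t;		/* hypothetical per-aggr state */
extern void lldp_process_pdu(lldp_agent_t *, datalink_id_t, mblk_t *); /* hypothetical */

/* Called by aggr for every LLDP PDU received on an individual port. */
static void
lldp_rx_cb(aggr_grp_t *grp, datalink_id_t port, mblk_t *mp, void *cookie)
{
	lldp_agent_t *agent = cookie;

	/* Record which physical port the LLDPDU arrived on. */
	lldp_process_pdu(agent, port, mp);
}

static int
lldp_attach_aggr(aggr_grp_t *grp, lldp_agent_t *agent)
{
	return (aggr_add_proto_cb(grp, AGGR_CB_LLDP, lldp_rx_cb, agent));
}

/* Transmit a per-port LLDPDU (already built with that port's Port ID TLV). */
static int
lldp_tx_on_port(aggr_grp_t *grp, datalink_id_t port, mblk_t *pdu)
{
	return (aggr_tx_port(grp, port, pdu));
}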
On Apr 8, 2009, at 10:39 PM, Jason King wrote:
> What I'm wondering about as a solution for LLDP (and 802.1X) is perhaps a
> private callback function for RX, something conceptually like the following
> (the actual types are probably off a bit since I don't have the source in
> front of me, and the names are just placeholders):
>
> typedef enum { AGGR_CB_LLDP, AGGR_CB_8021X } aggr_cb_proto_t;
> typedef void (*aggr_per_port_cb_t)(aggr_grp_t *grp, datalink_id_t link,
>     mblk_t *mp, void *cookie);
> int aggr_add_proto_cb(aggr_grp_t *grp, aggr_cb_proto_t proto,
>     aggr_per_port_cb_t cb, void *cookie);
> int aggr_del_proto_cb(aggr_grp_t *grp, aggr_cb_proto_t proto,
>     aggr_per_port_cb_t cb);
> int aggr_tx_port(aggr_grp_t *grp, datalink_id_t link, mblk_t *mp);
>
> Though it has the disadvantage of complicating add/remove of links,
> and port up/down events. But as a starting point, I'd like to get
> feedback / have holes shot in it / etc.

Jason,

This could work. It would be nice to prototype this to validate the approach. If this becomes a large effort in itself, something to consider would be to handle the aggregated ports through a separate RFE/project at a later time. This would allow other protocols which rely on LLDP (e.g. DCBX) to start their development, and users to start experimenting with the feature.

BTW, did you have a chance to look into how the DCBX support would be layered on top of your LLDP implementation? Since DCBX is used by FCoE, it would be interesting to see how these pieces can fit together.

Nicolas.

-- 
Nicolas Droux - Solaris Kernel Networking - Sun Microsystems, Inc.
droux at sun.com - http://blogs.sun.com/droux
On Wed, Apr 15, 2009 at 1:27 AM, Nicolas Droux <Nicolas.Droux at sun.com> wrote:
> This could work. It would be nice to prototype this to validate the
> approach. If this becomes a large effort in itself, something to consider
> would be to handle the aggregated ports through a separate RFE/project at a
> later time. This would allow other protocols which rely on LLDP (e.g. DCBX)
> to start their development, and users to start experimenting with the
> feature.

I can try to whip something up in the next few weeks, though unfortunately I don't have the equipment myself to test with aggregations (outside of making sure it builds correctly) -- I've been using VNICs on my home PC -- so I would need some assistance there. It also would mean moving most of the operations into the kernel, so I might have some questions as I go along related to that.

> BTW, did you have a chance to look into how the DCBX support would be
> layered on top of your LLDP implementation? Since DCBX is used by FCoE, it
> would be interesting to see how these pieces can fit together.

I tried soliciting feedback on the design from storage-discuss to find out what they would need, but haven't received anything from them.