To rehash a question I had (and was told this might be a better place to find the answer) -- I've been working on LLDP support for OpenSolaris ( http://www.opensolaris.org/os/projects/lld ). LLDP works by sending Ethernet frames to the Ethernet multicast address 01-80-c2-00-00-0e with an ethertype of 0x88cc. My current prototype ( ssh://anon at hg.opensolaris.org/hg/lld/lld-proto2 ) works by doing a dlpi_open(), dlpi_enabmulti(), and dlpi_bind() ( http://src.opensolaris.org/source/xref/lld/lld-proto2/lldp.c#204 ).

It appears that doing a dlpi_bind() could possibly cause a drop in performance on the link. Since I don't have the means to test this myself, I would like to know: if that is in fact correct, is it significant enough to warrant taking another approach? If so, is there an approach that wouldn't have the performance problems?
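For readers less familiar with libdlpi, a minimal sketch of that open/enabmulti/bind sequence; the helper name and error handling are illustrative, not code from the prototype (the real version is at the lldp.c link above):

#include <sys/types.h>
#include <libdlpi.h>

#define	LLDP_SAP	0x88cc	/* LLDP ethertype */

/* LLDP nearest-bridge multicast address, 01-80-c2-00-00-0e */
static const uchar_t lldp_mcast[] = { 0x01, 0x80, 0xc2, 0x00, 0x00, 0x0e };

int
lldp_open_link(const char *linkname, dlpi_handle_t *dhp)
{
	int rc;

	/* Open the data link (e.g. "bge0") for DLPI access. */
	if ((rc = dlpi_open(linkname, dhp, 0)) != DLPI_SUCCESS)
		return (rc);

	/* Join the LLDP multicast group and bind to the LLDP SAP. */
	if ((rc = dlpi_enabmulti(*dhp, lldp_mcast,
	    sizeof (lldp_mcast))) != DLPI_SUCCESS ||
	    (rc = dlpi_bind(*dhp, LLDP_SAP, NULL)) != DLPI_SUCCESS)
		dlpi_close(*dhp);

	return (rc);
}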
Jason,

This must be the most amazing coincidence there is. I was talking to Nicolas just yesterday, telling him that we need to get in touch with Jason King and see where he is with LLDP, because we need it to make DCBX work with Crossbow.

At this point, it would be best if this is a MAC client, although we might need to tweak the MAC layer to support the SAP classification (which we are planning to do very soon). The MAC client API is described here: http://opensolaris.org/os/project/crossbow/Docs/crossbow-virt.pdf although you should probably chat with Nicolas since he is trying to create an official version of this API.

Cheers,
Sunay
> It appears that doing a dlpi_bind() could possibly cause a drop in
> performance on the link.

If you're not sending very often, one approach would be to create two DLPI endpoints (one for receive and one for send), create the receive one with DLPI_PASSIVE, and bracket the sends with dlpi_bind()/dlpi_unbind(). Admittedly a bit awkward, but it should retain performance.

As far as the performance issues I was alluding to, search for checks against mi_nactiveclients in the source tree.

-- 
meem
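A sketch of the bracketed-send half of that suggestion, assuming the receive endpoint is the one opened with DLPI_PASSIVE; the function name and the idea that the caller hands in a fully built LLDPDU are assumptions for the example, not part of the suggestion itself:

#include <sys/types.h>
#include <libdlpi.h>

#define	LLDP_SAP	0x88cc

static const uchar_t lldp_mcast[] = { 0x01, 0x80, 0xc2, 0x00, 0x00, 0x0e };

/*
 * The receive endpoint would be opened once with
 * dlpi_open(link, &rxdh, DLPI_PASSIVE) and left bound to LLDP_SAP.
 * The send endpoint below binds only around each (infrequent) send.
 */
int
lldp_send_pdu(dlpi_handle_t txdh, const void *pdu, size_t pdulen)
{
	int rc;

	if ((rc = dlpi_bind(txdh, LLDP_SAP, NULL)) != DLPI_SUCCESS)
		return (rc);
	rc = dlpi_send(txdh, lldp_mcast, sizeof (lldp_mcast), pdu, pdulen,
	    NULL);
	(void) dlpi_unbind(txdh);
	return (rc);
}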
On Fri, Mar 6, 2009 at 8:28 PM, Sunay Tripathi <Sunay.Tripathi at sun.com> wrote:
> At this point, it would be best if this is a MAC client, although
> we might need to tweak the MAC layer to support the SAP classification
> (which we are planning to do very soon). The MAC client API is
> described here: http://opensolaris.org/os/project/crossbow/Docs/crossbow-virt.pdf
> although you should probably chat with Nicolas since he is trying
> to create an official version of this API.

I've been playing around with it quite a bit, trying to come up with a design I'm happy with, and I think I'm pretty much there. I was planning to send out a request for comments from the networking community very soon before presenting it to ARC; I guess I might as well do that now.

The actual decoding of the TLVs is pretty simple. I had to get some experience (I'm a sysadmin by day, so I've been doing this in my spare time) with the other pieces not directly related to decoding the TLVs.

The current design can be seen at http://www.opensolaris.org/os/projects/lld/design -- anyone interested please comment. Nothing there is set in stone, and I would welcome feedback from those who deal with this every day. To summarize it: there is a daemon that listens and transmits on configured interfaces (currently using libdlpi); it decodes the packets from each neighbor into an nvlist and stores them. Access to the data, and modification of the daemon's configuration, is done via a door.

If the MAC client API is the most appropriate, it sounds like at least some component would need to be in the kernel. I'd definitely like to get feedback on that -- how much (all of it, some of it, etc.) would be appropriate.

For DCBX, my current thought is to extend the door interface to allow the FCoE drivers to indicate which interfaces should enable DCBX, as well as get and set the necessary data for feature negotiation. I did plan to try to get some discussions going with whoever is working on FCoE to figure out what sort of API would make the most sense for that, so I guess it sounds like now is a good time to do that :)
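As an illustration of the TLV decoding the daemon does, a sketch of parsing the 16-bit LLDP TLV header (7-bit type, 9-bit length) and stashing one TLV into an nvlist; the nvlist key names are invented for the example and are not taken from the lld prototype:

#include <sys/types.h>
#include <libnvpair.h>

/*
 * Decode one TLV starting at buf[off].  The LLDP TLV header is two
 * bytes: the top 7 bits are the type, the low 9 bits the length of
 * the value that follows.  Returns the offset of the next TLV, or 0
 * at the End of LLDPDU TLV or on a malformed TLV.
 */
static size_t
lldp_decode_tlv(const uint8_t *buf, size_t off, size_t buflen, nvlist_t *nvl)
{
	uint16_t hdr, type, len;

	if (off + 2 > buflen)
		return (0);
	hdr = (buf[off] << 8) | buf[off + 1];
	type = hdr >> 9;
	len = hdr & 0x1ff;
	if (type == 0 || off + 2 + len > buflen)
		return (0);

	/* Example keys only -- the real daemon uses its own schema. */
	(void) nvlist_add_uint8(nvl, "tlv-type", (uint8_t)type);
	(void) nvlist_add_byte_array(nvl, "tlv-value",
	    (uchar_t *)(buf + off + 2), len);

	return (off + 2 + len);
}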
On Fri, Mar 6, 2009 at 9:26 PM, Peter Memishian <peter.memishian at sun.com> wrote:
> If you're not sending very often, one approach would be to create two DLPI
> endpoints (one for receive and one for send), create the receive one with
> DLPI_PASSIVE, and bracket the sends with dlpi_bind()/dlpi_unbind().
> Admittedly a bit awkward, but it should retain performance.
>
> As far as the performance issues I was alluding to, search for checks
> against mi_nactiveclients in the source tree.

That might work -- at most a packet will be transmitted on an interface once a second (typically once every 30 seconds) -- I'm guessing that'd qualify as 'not often'.
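In that 'not often' regime the transmit side is essentially a timer loop; a sketch built on the lldp_send_pdu() helper sketched earlier (the 30-second default matches 802.1AB's msgTxInterval; everything else is illustrative):

#include <sys/types.h>
#include <unistd.h>
#include <libdlpi.h>

#define	LLDP_TX_INTERVAL	30	/* seconds; 802.1AB msgTxInterval default */

extern int lldp_send_pdu(dlpi_handle_t, const void *, size_t); /* earlier sketch */

/* Periodically send the (pre-built) LLDPDU for one link. */
static void
lldp_tx_loop(dlpi_handle_t txdh, const uint8_t *pdu, size_t pdulen)
{
	for (;;) {
		(void) lldp_send_pdu(txdh, pdu, pdulen);
		(void) sleep(LLDP_TX_INTERVAL);
	}
}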
On Sat, Mar 7, 2009 at 9:00 AM, Jason King <jason at ansipunx.net> wrote:
> The current design can be seen at
> http://www.opensolaris.org/os/projects/lld/design -- anyone
> interested please comment. Nothing there is set in stone, and I would
> welcome feedback from those who deal with this every day.

Minor correction: the correct link is http://www.opensolaris.org/os/project/lld/design/

-- Sriram
On Fri, Mar 6, 2009 at 10:00 PM, Sriram Narayanan <sriram at belenix.org> wrote:
> Minor correction: the correct link is http://www.opensolaris.org/os/project/lld/design/

Ack... my apologies -- that's what I get for typing it out by hand :)
On Mar 6, 2009, at 8:26 PM, Peter Memishian wrote:
> If you're not sending very often, one approach would be to create two DLPI
> endpoints (one for receive and one for send), create the receive one with
> DLPI_PASSIVE, and bracket the sends with dlpi_bind()/dlpi_unbind().
> Admittedly a bit awkward, but it should retain performance.
>
> As far as the performance issues I was alluding to, search for checks
> against mi_nactiveclients in the source tree.

mi_nactiveclients tracks the number of active clients at the MAC client API level. In Jason's case, the bind would reuse the dls link and its MAC client handle for the underlying data-link, which would not cause mi_nactiveclients to be bumped up.

But I think that a larger issue is how LLDP intersects with other MAC layer features, such as link aggregation/VLANs/VNICs/etc., and what components will be consuming and generating LLDP PDUs. A kernel implementation might be needed to properly interoperate with other MAC features and provide the right set of APIs for protocols or features which will use LLDP.

I think we first need to better understand these issues and related requirements before deciding on the best approach.

Nicolas.

-- 
Nicolas Droux - Solaris Kernel Networking - Sun Microsystems, Inc.
droux at sun.com - http://blogs.sun.com/droux
> mi_nactiveclients tracks the number of active clients at the MAC
> client API level. In Jason's case, the bind would reuse the dls link
> and its MAC client handle for the underlying data-link, which would
> not cause mi_nactiveclients to be bumped up.

OK, you're saying that since both DLPI and IP use DLS, they will both use the same MAC client handle and thus mi_nactiveclients will not be bumped, right?

-- 
meem
On Mar 10, 2009, at 5:15 PM, Peter Memishian wrote:
> OK, you're saying that since both DLPI and IP use DLS, they will both use
> the same MAC client handle and thus mi_nactiveclients will not be bumped,
> right?

That's right.

Nicolas.

-- 
Nicolas Droux - Solaris Kernel Networking - Sun Microsystems, Inc.
droux at sun.com - http://blogs.sun.com/droux
> That's right.

OK, thanks for explaining that. My misunderstanding. (It's still good for in.mpathd to avoid the dlpi_bind(), since it has no interest in receiving packets and the system would just be wasting resources passing them up to it.)

-- 
meem
On Tue, Mar 10, 2009 at 5:32 PM, Nicolas Droux <droux at sun.com> wrote:
> mi_nactiveclients tracks the number of active clients at the MAC client API
> level. In Jason's case, the bind would reuse the dls link and its MAC client
> handle for the underlying data-link, which would not cause mi_nactiveclients
> to be bumped up.

That's good to know.

> But I think that a larger issue is how LLDP intersects with other MAC layer
> features, such as link aggregation/VLANs/VNICs/etc., and what components will
> be consuming and generating LLDP PDUs. A kernel implementation might be
> needed to properly interoperate with other MAC features and provide the
> right set of APIs for protocols or features which will use LLDP.

LLDP works on 'physical' links, in that from what I can glean from reading the specification, PDUs would be sent on every link in an aggregation (with the link-specific information), and sent without any VLAN tags (i.e. you would not have a separate PDU for every VLAN).

For example, if 'hostA' had a hostid of 'abcd1234' and a port 'nge2' with vlan 400 and vlan 500 configured, the neighbor would see the following PDU:

Chassis ID: abcd1234
Port ID: nge2
VLAN ID: 400, enabled
VLAN ID: 500, enabled

If the host was acting as a switch and a VLAN was associated with the untagged frames on the port, a 'Port VLAN ID' TLV could be included as well that carried the untagged VLAN value.

For an aggregation, assuming each link is configured for transmission (it would be odd to only have some of them, but it doesn't appear to be prohibited), you would see the chassis ID, the port ID (for that link -- i.e. 'bge0' or such), and the aggregation TLV (which contains the aggregation ID) sent. Thus if 'hostA' with a hostid of 'abcd1234' had aggr1234 configured on bge3 and e1000g5, the neighbors would see two PDUs:

From bge3:
Chassis ID: abcd1234
Port ID: bge3
Aggr: Enabled, ID=1234

From e1000g5:
Chassis ID: abcd1234
Port ID: e1000g5
Aggr: Enabled, ID=1234

So far, the potential consumers would be dladm(1M) and DCBX (for FCoE).
In the future, other consumers might be nwam, an SMA agent (for implementing the LLDP MIB), and VoIP software (if the MED extensions are implemented).

For VNICs, that is an interesting case that might warrant further discussion. If associated (if that's the right term) with an etherstub, I think they'd behave like any other link. When they're associated with a physical link, it becomes a more interesting question: should PDUs be sent with the VNIC information as well? It is permissible, though the current draft for DCBX suggests that negotiation in such an instance (where one end sees multiple neighbors on a single link) is halted until only one neighbor is seen.
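To make the PDU examples above concrete, a sketch of serializing the mandatory Chassis ID and Port ID TLVs (a 16-bit header with 7-bit type and 9-bit length, followed by the value); the subtype values come from IEEE 802.1AB, but the helpers themselves are illustrative, not code from the lld prototype:

#include <sys/types.h>
#include <string.h>

#define	LLDP_TLV_CHASSIS_ID	1
#define	LLDP_TLV_PORT_ID	2
#define	LLDP_CHASSIS_LOCAL	7	/* "locally assigned" chassis subtype */
#define	LLDP_PORT_IFNAME	5	/* "interface name" port subtype */

/* Append one TLV (type, subtype, string value) to buf; returns bytes written. */
static size_t
lldp_append_tlv(uint8_t *buf, uint8_t type, uint8_t subtype, const char *value)
{
	size_t vlen = strlen(value) + 1;	/* subtype byte + string */
	uint16_t hdr = (type << 9) | (vlen & 0x1ff);

	buf[0] = hdr >> 8;
	buf[1] = hdr & 0xff;
	buf[2] = subtype;
	(void) memcpy(buf + 3, value, vlen - 1);
	return (2 + vlen);
}

/* "Chassis ID: abcd1234" / "Port ID: bge3" from the example above. */
static size_t
lldp_build_mandatory(uint8_t *buf, const char *hostid, const char *port)
{
	size_t off = 0;

	off += lldp_append_tlv(buf + off, LLDP_TLV_CHASSIS_ID,
	    LLDP_CHASSIS_LOCAL, hostid);
	off += lldp_append_tlv(buf + off, LLDP_TLV_PORT_ID,
	    LLDP_PORT_IFNAME, port);
	return (off);
}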
Jason King writes:
> LLDP works on 'physical' links, in that from what I can glean from
> reading the specification, PDUs would be sent on every link in an
> aggregation (with the link-specific information), and sent without any
> VLAN tags (i.e. you would not have a separate PDU for every VLAN).

That's pretty much what I expected, and it puts this protocol on a par with LACP.

Unfortunately, because of the way aggregations work in OpenSolaris, I don't think you can do this with DLPI from user space today -- you can't send packets through those links, because the aggregation owns them exclusively. (We allow just passive snooping.)

I think you'll need a special new mechanism down in the mac layer to handle LLDP I/O.

-- 
James Carlson, Solaris Networking         <james.d.carlson at sun.com>
Sun Microsystems / 35 Network Drive        71.232W Vox +1 781 442 2084
MS UBUR02-212 / Burlington MA 01803-2757   42.496N Fax +1 781 442 1677
On Wed, Mar 11, 2009 at 2:25 PM, James Carlson <james.d.carlson at sun.com> wrote:
> I think you'll need a special new mechanism down in the mac layer to
> handle LLDP I/O.

Hrmm... Doing a quick inspection, it appears that the aggr driver opens the port in exclusive mode, so nothing else (such as LLDP) would be able to open it. This is getting more interesting :)

I can think of a few workarounds (mostly around special-casing aggregations and adding some hooks in either the mac or aggr driver to let LLDP sneak in), but none would look particularly pretty. Does anyone a bit more familiar with this area have any ideas?
On Mar 10, 2009, at 10:09 PM, Jason King wrote:
> LLDP works on 'physical' links, in that from what I can glean from
> reading the specification, PDUs would be sent on every link in an
> aggregation (with the link-specific information), and sent without any
> VLAN tags (i.e. you would not have a separate PDU for every VLAN).

As Jim already pointed out in his reply, the current implementation of link aggregation in OpenSolaris is based on the assumption that the link aggregation "owns" the underlying link, and a DLPI application like your proposed daemon cannot at the same time send PDUs on individual ports. So some more work will be needed to make this work for link aggregation.

> For example, if 'hostA' had a hostid of 'abcd1234' and a port 'nge2'
> with vlan 400 and vlan 500 configured, the neighbor would see the
> following PDU:
>
> Chassis ID: abcd1234
> Port ID: nge2
> VLAN ID: 400, enabled
> VLAN ID: 500, enabled
>
> If the host was acting as a switch and a VLAN was associated with the
> untagged frames on the port, a 'Port VLAN ID' TLV could be included as
> well that carried the untagged VLAN value.
> For an aggregation, assuming each link is configured for transmission
> (it would be odd to only have some of them, but it doesn't appear to be
> prohibited), you would see the chassis ID, the port ID (for that link
> -- i.e. 'bge0' or such), and the aggregation TLV (which contains the
> aggregation ID) sent. Thus if 'hostA' with a hostid of 'abcd1234' had
> aggr1234 configured on bge3 and e1000g5, the neighbors would see two PDUs:
>
> From bge3:
> Chassis ID: abcd1234
> Port ID: bge3
> Aggr: Enabled, ID=1234
>
> From e1000g5:
> Chassis ID: abcd1234
> Port ID: e1000g5
> Aggr: Enabled, ID=1234

And you are planning to use libdladm to find the information needed on the various data-links to generate the LLDP contents, right?

> So far, the potential consumers would be dladm(1M) and DCBX (for FCoE).
> In the future, other consumers might be nwam, an SMA agent (for
> implementing the LLDP MIB), and VoIP software (if the MED extensions
> are implemented).

It would be interesting to discuss how this will fit with the FCoE stack in more detail. I'm cc'ing Zhong Wang, who is working on FCoE in OpenSolaris and requires DCBX support for negotiating Priority Flow Control (PFC). The FCoE stack has both user and kernel-space components, so I suppose that FCoE could use DCBX on LLDP either way, and then, as a result of the negotiation, pass the CoS value to be used and other info through the MAC client API.

> For VNICs, that is an interesting case that might warrant further
> discussion. If associated (if that's the right term) with an etherstub,
> I think they'd behave like any other link. When they're associated
> with a physical link, it becomes a more interesting question: should
> PDUs be sent with the VNIC information as well? It is permissible,
> though the current draft for DCBX suggests that negotiation in such an
> instance (where one end sees multiple neighbors on a single link) is
> halted until only one neighbor is seen.

We might be required to do this if DCBX relies on such information during negotiation. Also, since VNICs can have their own VID, and VLAN data-links are implemented as VNICs, VNICs would have to be taken into account when advertising the VIDs configured on the link.

Thanks,
Nicolas.

-- 
Nicolas Droux - Solaris Kernel Networking - Sun Microsystems, Inc.
droux at sun.com - http://blogs.sun.com/droux
On Mar 11, 2009, at 3:08 PM, Jason King wrote:
> Hrmm... Doing a quick inspection, it appears that the aggr driver opens
> the port in exclusive mode, so nothing else (such as LLDP) would be
> able to open it. This is getting more interesting :)

Something else could open it, but only to capture packets in passive mode; you can still snoop an aggregated link, for example.

> I can think of a few workarounds (mostly around special-casing
> aggregations and adding some hooks in either the mac or aggr driver to
> let LLDP sneak in), but none would look particularly pretty. Does anyone
> a bit more familiar with this area have any ideas?

One approach would be to send/receive the PDUs through the aggregation. On the receive side, the PDUs received by the ports should already be passed up. On the send side, the aggregation could detect the LLDP PDUs (this may not be a big issue since aggr needs to parse the packet headers to compute the hash for port selection anyway), and when an LLDP PDU is being sent, send a corresponding PDU across all members of the aggregation. How tricky this gets could depend on the contents of the PDUs for aggregation links. This is not particularly pretty, but at least the kernel changes would be contained in aggr itself.

Nicolas.

-- 
Nicolas Droux - Solaris Kernel Networking - Sun Microsystems, Inc.
droux at sun.com - http://blogs.sun.com/droux
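The detection itself would be cheap; a hedged sketch of the sort of ethertype check aggr's transmit path could make, assuming the Ethernet header sits in a single pulled-up mblk (real code would have to verify that):

#include <sys/types.h>
#include <sys/stream.h>
#include <sys/strsun.h>
#include <sys/ethernet.h>
#include <sys/byteorder.h>

#define	LLDP_ETHERTYPE	0x88cc

/* Return B_TRUE if the (pulled-up, untagged) frame in mp is an LLDP PDU. */
static boolean_t
aggr_pdu_is_lldp(mblk_t *mp)
{
	struct ether_header *ehp;

	if (MBLKL(mp) < sizeof (struct ether_header))
		return (B_FALSE);
	ehp = (struct ether_header *)mp->b_rptr;
	return (ntohs(ehp->ether_type) == LLDP_ETHERTYPE ? B_TRUE : B_FALSE);
}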
Nicolas Droux writes:
> One approach would be to send/receive the PDUs through the
> aggregation. On the receive side, the PDUs received by the ports should
> already be passed up. On the send side, the aggregation could detect
> the LLDP PDUs (this may not be a big issue since aggr needs to parse
> the packet headers to compute the hash for port selection anyway), and
> when an LLDP PDU is being sent, send a corresponding PDU across all
> members of the aggregation. How tricky this gets could depend on the
> contents of the PDUs for aggregation links. This is not particularly
> pretty, but at least the kernel changes would be contained in aggr
> itself.

The LLDPDUs sent are different on each port -- at least the Port ID TLV needs to be different. That means that the aggr code would need to parse the TLVs, find the right one (fortunately, there's a mandatory ordering), and modify it for sending on each of the links in the aggregation.

On receive, the listener will need to know which underlying port received a given LLDPDU so that it can keep them straight.

I suppose it's possible to do this, but I'm not sure how viable that design would be. I think it'd be much better to provide a way to get real per-port access. You're going to need it anyway if you implement an 802.1X authenticator.

-- 
James Carlson, Solaris Networking         <james.d.carlson at sun.com>
Sun Microsystems / 35 Network Drive        71.232W Vox +1 781 442 2084
MS UBUR02-212 / Burlington MA 01803-2757   42.496N Fax +1 781 442 1677
On Thu, Mar 12, 2009 at 7:02 AM, James Carlson <james.d.carlson at sun.com> wrote:
> I suppose it's possible to do this, but I'm not sure how viable that
> design would be. I think it'd be much better to provide a way to get
> real per-port access. You're going to need it anyway if you implement
> an 802.1X authenticator.

Are there any plans afoot to implement 802.1X?
Jason King writes:
> On Thu, Mar 12, 2009 at 7:02 AM, James Carlson <james.d.carlson at sun.com> wrote:
> > I think it'd be much better to provide a way to get real per-port
> > access. You're going to need it anyway if you implement an 802.1X
> > authenticator.
>
> Are there any plans afoot to implement 802.1X?

It's CR 5092062. There's been a fair amount of interest in it (including from our own IT department), and I know we've internally discussed plans to launch a project, but I don't know the current state of things. I'd like to see us have a nice generic implementation that works on all interfaces, rather than something wired into just 802.11.

(And in terms of priority, I'd expect that supplicant would come first. Authenticator is harder, as doing a good job on that means connecting into AAA, and Solaris just doesn't have _any_ AAA infrastructure.)

-- 
James Carlson, Solaris Networking         <james.d.carlson at sun.com>
Sun Microsystems / 35 Network Drive        71.232W Vox +1 781 442 2084
MS UBUR02-212 / Burlington MA 01803-2757   42.496N Fax +1 781 442 1677
On Mar 12, 2009, at 6:02 AM, James Carlson wrote:
> The LLDPDUs sent are different on each port -- at least the Port ID
> TLV needs to be different. That means that the aggr code would need
> to parse the TLVs, find the right one (fortunately, there's a
> mandatory ordering), and modify it for sending on each of the links in
> the aggregation.
>
> On receive, the listener will need to know which underlying port
> received a given LLDPDU so that it can keep them straight.
>
> I suppose it's possible to do this, but I'm not sure how viable that
> design would be. I think it'd be much better to provide a way to get
> real per-port access. You're going to need it anyway if you implement
> an 802.1X authenticator.

For performance reasons the aggregation owns the underlying MAC, and the MAC layer corresponding to the aggregated ports is bypassed on the data-path (today we have full bypass on RX and partial bypass on TX; full TX bypass is planned as well). This doesn't allow other clients to have direct access to an aggregated port to send and receive data. Other changes to the core MAC layer would be needed on the control path as well to make this possible. Since protocols like 802.1X and LLDP don't need the full functionality of the MAC client API, we could potentially provide them limited access to ports via the aggregation, while avoiding having aggr understand these protocols, and preserving the bypass.

Nicolas.

-- 
Nicolas Droux - Solaris Kernel Networking - Sun Microsystems, Inc.
droux at sun.com - http://blogs.sun.com/droux
On Fri, Mar 13, 2009 at 5:23 PM, Nicolas Droux <droux at sun.com> wrote:
> For performance reasons the aggregation owns the underlying MAC, and the MAC
> layer corresponding to the aggregated ports is bypassed on the data-path
> (today we have full bypass on RX and partial bypass on TX; full TX bypass is
> planned as well). This doesn't allow other clients to have direct access to
> an aggregated port to send and receive data. Other changes to the core MAC
> layer would be needed on the control path as well to make this possible.
> Since protocols like 802.1X and LLDP don't need the full functionality of
> the MAC client API, we could potentially provide them limited access to
> ports via the aggregation, while avoiding having aggr understand these
> protocols, and preserving the bypass.

I'm not sure if I'd need to bounce this back to networking-discuss, but I'll start here.

If LLDP or 802.1X call the MAC API directly (implying at least some pieces of this need to sit in a kernel module), it sounds like it could have a negative impact on performance (if something else -- IP? -- is also using the link) due to multiple MAC clients, whereas if libdlpi is used, it sounds like the existing MAC handle can be reused.

Does anyone know if the overhead is significant enough to warrant possibly making the limited access needed for aggregations go through some private DLPI calls that can make the corresponding MAC calls to get at the underlying links?
On Fri, Mar 13, 2009 at 5:23 PM, Nicolas Droux <droux at sun.com> wrote:
> For performance reasons the aggregation owns the underlying MAC, and the MAC
> layer corresponding to the aggregated ports is bypassed on the data-path
> (today we have full bypass on RX and partial bypass on TX; full TX bypass is
> planned as well). This doesn't allow other clients to have direct access to
> an aggregated port to send and receive data.

Can you explain this a bit more? I've done work on either end of the stack (DLPI clients, and NIC drivers using GLDv3), but nothing in between, so I've been trying to figure out how it all interacts.

Do you mean the MAC instance that aggr creates for itself is bypassed? I.e., on RX it goes from driver->mac->dls, instead of driver->mac->aggr->mac->dls? If so, does this imply LACP has to be off (tangentially related, but just trying to get the big picture)?
On Apr 7, 2009, at 9:43 PM, Jason King wrote:
> Can you explain this a bit more? I've done work on either end of the
> stack (DLPI clients, and NIC drivers using GLDv3), but nothing in
> between, so I've been trying to figure out how it all interacts.
>
> Do you mean the MAC instance that aggr creates for itself is bypassed?
> I.e., on RX it goes from driver->mac->dls, instead of
> driver->mac->aggr->mac->dls? If so, does this imply LACP has to be off
> (tangentially related, but just trying to get the big picture)?

It's the MAC layer associated with the port itself that's bypassed, since we still need to have aggr on the receive path for LACP, and on TX for port selection. So instead of driver->mac->aggr->mac->dls, we have driver->aggr->mac->dls (actually dls is also bypassed on the data-path for the most common cases, but you get the drift).

In order to do this, MAC provides mac_hwring*() functions to aggr. Aggr uses these functions to obtain the list of the rings of the underlying NIC, and exposes corresponding pseudo-rings to the MAC layer above it. These functions also rewire the port's MAC data-path, both on the interrupt and polling path, to enable the bypass.

Hope this helps,
Nicolas.
-- 
Nicolas Droux - Solaris Kernel Networking - Sun Microsystems, Inc.
droux at sun.com - http://blogs.sun.com/droux
On Wed, Apr 8, 2009 at 11:11 PM, Nicolas Droux <Nicolas.Droux at sun.com> wrote:
> It's the MAC layer associated with the port itself that's bypassed, since we
> still need to have aggr on the receive path for LACP, and on TX for port
> selection. So instead of driver->mac->aggr->mac->dls, we have
> driver->aggr->mac->dls (actually dls is also bypassed on the data-path for
> the most common cases, but you get the drift).
>
> In order to do this, MAC provides mac_hwring*() functions to aggr. Aggr uses
> these functions to obtain the list of the rings of the underlying NIC, and
> exposes corresponding pseudo-rings to the MAC layer above it. These
> functions also rewire the port's MAC data-path, both on the interrupt and
> polling path, to enable the bypass.

Yes, I was suspecting something like that, but I only glanced through the Crossbow design docs a long time ago, so I'm not (yet) familiar with all the updates that have been made to the MAC layer.
What I'm wondering about as a solution for LLDP (and 802.1X) is perhaps a private callback function for RX, something conceptually like the following (the actual types are probably off a bit since I don't have the source in front of me, and the names are just placeholders):

typedef enum { AGGR_CB_LLDP, AGGR_CB_8021X } aggr_cb_proto_t;

typedef void (*aggr_per_port_cb_t)(aggr_grp_t *grp, datalink_id_t link,
    mblk_t *mp, void *cookie);

int aggr_add_proto_cb(aggr_grp_t *grp, aggr_cb_proto_t proto,
    aggr_per_port_cb_t cb, void *cookie);
int aggr_del_proto_cb(aggr_grp_t *grp, aggr_cb_proto_t proto,
    aggr_per_port_cb_t cb);
int aggr_tx_port(aggr_grp_t *grp, datalink_id_t link, mblk_t *mp);

Though it has the disadvantage of complicating add/remove of links, and port up/down events. But as a starting point, I'd like to get feedback / have holes shot in it / etc.
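For what it's worth, a sketch of how an LLDP consumer might sit on top of those placeholders; everything here (including lldp_agent_t and lldp_process_pdu()) builds on the hypothetical names above, so none of it is an existing interface:

/*
 * Hypothetical consumer of the proposed aggr callbacks above.
 * aggr_grp_t, aggr_add_proto_cb() and aggr_tx_port() are the
 * placeholders from the proposal, not existing interfaces.
 */

typedef struct lldp_agent lldp_agent_t;		/* hypothetical per-aggr state */
extern void lldp_process_pdu(lldp_agent_t *, datalink_id_t, mblk_t *); /* hypothetical */

/* Called by aggr for every LLDP PDU received on an individual port. */
static void
lldp_rx_cb(aggr_grp_t *grp, datalink_id_t port, mblk_t *mp, void *cookie)
{
	lldp_agent_t *agent = cookie;

	/* Record which physical port the LLDPDU arrived on. */
	lldp_process_pdu(agent, port, mp);
}

static int
lldp_attach_aggr(aggr_grp_t *grp, lldp_agent_t *agent)
{
	return (aggr_add_proto_cb(grp, AGGR_CB_LLDP, lldp_rx_cb, agent));
}

/* Transmit a per-port LLDPDU (already built with that port's Port ID TLV). */
static int
lldp_tx_on_port(aggr_grp_t *grp, datalink_id_t port, mblk_t *pdu)
{
	return (aggr_tx_port(grp, port, pdu));
}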
On Apr 8, 2009, at 10:39 PM, Jason King wrote:
> What I'm wondering about as a solution for LLDP (and 802.1X) is perhaps a
> private callback function for RX, something conceptually like the following
> (the actual types are probably off a bit since I don't have the source in
> front of me, and the names are just placeholders):
>
> typedef enum { AGGR_CB_LLDP, AGGR_CB_8021X } aggr_cb_proto_t;
> typedef void (*aggr_per_port_cb_t)(aggr_grp_t *grp, datalink_id_t link,
>     mblk_t *mp, void *cookie);
> int aggr_add_proto_cb(aggr_grp_t *grp, aggr_cb_proto_t proto,
>     aggr_per_port_cb_t cb, void *cookie);
> int aggr_del_proto_cb(aggr_grp_t *grp, aggr_cb_proto_t proto,
>     aggr_per_port_cb_t cb);
> int aggr_tx_port(aggr_grp_t *grp, datalink_id_t link, mblk_t *mp);
>
> Though it has the disadvantage of complicating add/remove of links,
> and port up/down events. But as a starting point, I'd like to get
> feedback / have holes shot in it / etc.

Jason,

This could work. It would be nice to prototype this to validate the approach. If this becomes a large effort in itself, something to consider would be to handle the aggregated ports through a separate RFE/project at a later time. This would allow other protocols which rely on LLDP (e.g. DCBX) to start their development, and users to start experimenting with the feature.

BTW, did you have a chance to look into how the DCBX support would be layered on top of your LLDP implementation? Since DCBX is used by FCoE, it would be interesting to see how these pieces can fit together.

Nicolas.

-- 
Nicolas Droux - Solaris Kernel Networking - Sun Microsystems, Inc.
droux at sun.com - http://blogs.sun.com/droux
On Wed, Apr 15, 2009 at 1:27 AM, Nicolas Droux <Nicolas.Droux at sun.com> wrote:
> This could work. It would be nice to prototype this to validate the
> approach. If this becomes a large effort in itself, something to consider
> would be to handle the aggregated ports through a separate RFE/project at a
> later time. This would allow other protocols which rely on LLDP (e.g. DCBX)
> to start their development, and users to start experimenting with the
> feature.

I can try to whip something up in the next few weeks, though unfortunately I don't have the equipment myself to test with aggregations (outside of making sure it builds correctly) -- I've been using VNICs on my home PC -- so I would need some assistance there. It also would mean moving most of the operations into the kernel, so I might have some questions as I go along related to that.

> BTW, did you have a chance to look into how the DCBX support would be
> layered on top of your LLDP implementation? Since DCBX is used by FCoE, it
> would be interesting to see how these pieces can fit together.

I tried soliciting feedback on the design from storage-discuss to find out what they would need, but haven't received anything from them.