Please see attached document for a list of comments/questions related
to the latest version of the Crossbow arch. document. Specifically, it calls
out issues/questions with the mac_xxx interfaces specified in sec 5.x.

Thanks
Narayan

This message posted from opensolaris.org
-------------- next part --------------
1) MAC client open/close related:
(Crossbow-virt.pdf Section 5.2.2 Pg 36)

    int mac_client_open(mac_handle_t *mh,
        mac_client_handle_t *mchp, mac_bind_cpus
    void mac_client_close(mac_client_handle_t mch);

    typedef struct mac_bind_cpus_s {
        uint_t mbc_ncpus;
        uint32_t *mbc_cpus;
    } mac_bind_cpus_t;

Q1.1) The mac_client_open() interface definition line in the document
      is abruptly cut. It seems like there are additional arguments
      such as flags etc.

Q1.2) On pg 38, there is a reference to the following flags, but which
      interface takes them as an argument?

          MAC_OPEN_FLAGS_FORCE_MULTI_RINGS
          MAC_OPEN_FLAGS_FORCE_ONE_RING

      It seems like these are an argument to mac_client_open(),
      but there is a reference to mac_open() in the description, see below:

      "If MAC_OPEN_FLAGS_FORCE_MULTI_RINGS flag is set and it is not
      possible to allocate mbc_ncpus hardware rings, the mac_open()
      call will fail, otherwise the MAC layer will attempt to reserve
      one hardware ring for the MAC client."

Q1.3) Are there any flags other than the following ones?

          MAC_OPEN_FLAGS_FORCE_MULTI_RINGS
          MAC_OPEN_FLAGS_FORCE_ONE_RING

      - Is there a way to force a software ring?

Q1.4) Is the mbc_cpus in mac_bind_cpus_t an array of CPU ids?

Q1.5) The following description of mbc_cpus on pg 37 is not clear,
      especially for the non-NULL case.

      "If mbc_cpus is NULL, the MAC layer will pick the CPUs.
      If mbc_cpus is non-NULL, the MAC layer will chose the CPUs.".

Q1.6) What is the relationship between unicast addresses (multiple
      unicast addresses set via mac_unicast_add()), rings and CPUs?

      - Is there a 1:1 relation between a unicast address and a ring?
      - Is there a 1:1 relation between a ring and a CPU?
      - The rings and CPUs are tightly coupled in this interface.
        How can a client allocate multiple rings when there is only
        one CPU (or fewer CPUs than rings)?
      - When there are multiple CPUs and multiple unicast addresses,
        is there address fanout per CPU?

Q1.7) How is the binding of CPUs via mac_bind_cpus_t coordinated
      with CPU DR (on the platforms that support it)?

      NOTE: CPU DR is already a supported feature on LDoms.

Q1.8) LDoms requires the CPU binding to be changed dynamically,
      how can this be accomplished?

Q1.9) The following XXX on pg 37. When will the interface changes for
      priority and bandwidth specification be available?

      "XXX We still need to add the priority and bandwidth
      limit as argument to mac_open(). We also need an entry
      point to change the set of CPUs."

Q1.10) Can the MAC client interface be extended to support creating
       a client based on ether_type? This is required for MAC clients
       like Fibre Channel over Ethernet.

2) MAC Unicast address related:
(Crossbow-virt.pdf Section 5.2.4 Pg 38)

    mac_unicast_handle_t mac_unicast_add(mac_client_handle_t mch,
        mac_addr_type_t addr_type, int *addr_slot, uint_t prefix_len,
        uchar_t *mac_addr, uint32_t flags);

    void mac_unicast_unset(mac_unicast_handle_t);
    void mac_unicast_get(mac_unicast_handle_t mah, uchar_t *mac_addr);
    void mac_unicast_update(mac_unicast_handle_t mah,
        mac_addr_type_t addr_type, int *addr_slot, uint_t prefix_len,
        uchar_t *mac_addr);

QUESTIONS:

Q2.1) Section 4.5 describes the "By value" type, which is used
      to set a specific MAC address by the MAC client. But there
      is no equivalent addr_type definition under the mac_unicast_add()
      interface.

      NOTE: LDoms requires the MAC addresses that are allocated
      by the LDom manager be used by the network device. So, LDoms
      will not use any addr_type other than the "By value" type.

Q2.2) Is there an impact to the multiaddress_capab_t.maddr_add()/
      maddr_remove() interfaces? Are these being obsoleted or
      going away?
Q2.3) A system with many domains (aka LDoms) with virtual network
      devices requires the use of a large number of layer2 addresses;
      this will exhaust the h/w slots available on most standard NICs.
      How can a client take advantage of the layer2 filtering provided by
      NICs like NII-NIU/Neptune? Specifically, this will help in
      avoiding the programming of the device into promiscuous mode
      etc. Currently there are no interfaces that seem to provide
      such an ability.

Q2.4) Clients will need the ability to specify whether mac_unicast_add()
      is allowed to go into promiscuous mode or not. An error return
      value is required if no h/w MAC address slot is available.

Q2.5) On pg 40, the following description still points to the
      rings argument even though it has been removed from the
      mac_unicast_add() interface.

      "The rings argument specifies the list of rings to
      associate with the specified unicast MAC address.
      If it is NULL, the MAC layer allocates a set of rings
      according to those available to the MAC client, see
      Section ringselection."

Q2.6) Can it be assumed that every address added to a client is
      processed in a separate ring (either h/w ring or s/w ring)?

Q2.7) How are the multiple addresses per client maintained? Is it done
      in the MAC layer, or does it bypass the MAC layer and get passed
      to the h/w directly?

Q2.8) Can an unlimited number of MAC addresses be assigned to a MAC
      client? What are the software/hardware features that limit this?

3) Rings related:
(Crossbow-virt.pdf Section 5.3 Pg 43)

    mac_rint_t *mac_rings_list_get(mac_client_handle_t mch,
        uint_t nrings);
    void mac_rings_list_free(mac_rings_t *rings, uint_t nrings);
    uint16_t mac_ring_get_flags(mac_ring_t ring);

QUESTIONS:

Q3.1) All of these interfaces are now categorized as project-private
      API. What motivated this change? These interfaces need to be
      more open.

Q3.2) The mac_rings_list_get() is only for h/w rings; is there
      an equivalent interface to obtain s/w ring information?
      Or can this interface be extended to return both h/w ring
      and s/w ring information?

Q3.3) Are the mac_resource_set() and mac_resources() interfaces
      going away?

Q3.4) What is the action taken when no free h/w ring is available?
      As per the documentation of mac_rings_list_get(), if no h/w
      ring is available, it returns NULL. In such a case, how does
      mac_unicast_add() behave when NULL is passed for rings?

Q3.5) Are there any interfaces other than the above mac_rings_xxx
      interfaces that are available to deal with MAC rings?

Q3.6) Does mac_rings_list_get() return the list of MAC rings
      assigned to the client at the time of client open? How can
      this be changed after the client is open?

Q3.7) Assigning h/w rings to a specific MAC address limits the
      bandwidth to the number of rings that are assigned to that
      address. Is there a way not to bind h/w rings to a specific
      MAC address, so that the bandwidth could be used by
      any MAC client depending on the traffic?

4) Receive callback related:
(Crossbow-virt.pdf Section 5.2.5 Pg 40)

    int mac_rx_set(mac_client_handle_t mch, mac_rx_fn_t rx_fn,
        void *arg);
    int mac_rx_clear(mac_client_handle_t mch);

QUESTIONS:

Q4.1) How can a client get an rx callback per ring that is assigned
      to the MAC client? This will allow parallel processing
      and improve the performance. Such a feature is already
      being used in the current implementation of the LDoms vSwitch
      driver, and the mac_xxx interfaces should support such an
      ability.

Q4.2) How can a client get a separate callback for a defined type of
      traffic, such as different SAP numbers etc.? This will
      be useful to provide out-of-band packet processing
      or related services.

Q4.3) There is a reference to mac_addr_set(), should it be
      mac_unicast_add()?

5) Transmit related:
(Crossbow-virt.pdf Section 5.2.7 Pg 41)

    mblk_t *mac_tx(mac_client_handle_t *mch, mblk_t *mp, uint64_t hint);

QUESTIONS:

Q5.1) What are the valid values for the 'hint' argument?
      From the description on pg 42, NULL seems to be
      a valid value.
      Is it safe to assume that the 'hint' is a ring-id? If so,
      a NULL value of 0 will conflict with a ring-id of 0.

Q5.2) If NULL is specified as a 'hint', how is the tx ring selected?

Q5.3) The 'hint' argument description says the following.
      What is the meaning of a connection in this context, and
      how is it identified?

      "The hint must be the same for packets of the same connection."

6) Multicast addresses related:
(Crossbow-virt.pdf Section 5.2.6 Pg 41)

    int mac_multicast_add(mac_client_handle_t mch, const uint8_t *addr);
    int mac_multicast_remove(mac_client_handle_t mch, const uint8_t *addr);

No comments at this point.

7) Promiscuous mode related:
(Crossbow-virt.pdf Section 5.2.8 Pg 42)

It's not clear if the above interface will be available or not,
but two new interfaces are added:

    int mac_promisc_add(mac_client_handle_t mch, mac_promisc_type
        promisc_type, mac_promisc_fn_t promisc_fn, void *arg,
        mac_promisc_handle_t *php);
    int mac_promisc_remove(mac_client_handle_t mch,
        mac_promisc_handle_t *ph);

    MAC_PROMISC_ALL   - send all packets
    MAC_PROMISC_MULTI - only broadcast and multicast

Maybe mac_promisc_add(MAC_PROMISC_ALL) will force the device
to operate in promiscuous mode.

QUESTIONS:

Q7.1) According to section 4.6, the promiscuous mode operates
      in the layer2 switch model. When choosing the promiscuous mode
      model, can it be either the layer2 switch model or the shared
      ethernet model?

Q7.2) From the explanation of mac_promisc_add(), it seems like
      mac_promisc_add() could be called without setting a
      MAC address via mac_unicast_add(). Is this correct?
      If so, what is the expected behaviour?

8) Statistics related:

Q8.1) Is the mac_stat_get() interface being obsoleted or changed?
      If so, what is the new equivalent interface?

GENERAL QUESTIONS:
==================

Qg.1) Are there any GLDv3 MAC client interfaces (provided by the Nemo
      framework) that are being obsoleted but not documented
      in this doc?

Qg.2) Are there any changes to the MAC driver interfaces, or are any
      being obsoleted?
Qg.3) There are no MAC client interfaces to specify bandwidth
      attributes. From section 4.7, it seems like they are
      implemented as part of VNIC and not as MAC client interfaces.
      If this is the case, how can the bandwidth attributes be
      specified?

Qg.4) When will the classification interface be fully documented
      for review?

Qg.5) In the future it will be great if the document can include
      version info and change bars.
Narayan,

Thanks for the feedback and questions. I was traveling at Tech Days
this week and haven't had the cycles to answer your questions yet.
I'll send you my answers next week.

Thanks,
Nicolas.

Narayan Venkat wrote:
> Please see attached document for a list of comments/questions related
> to the latest version of Crossbow arch. document. Specifically, it calls
> out issues/questions with the mac_xxx interfaces specified in sec 5.x.
>
> Thanks
> Narayan

--
Nicolas Droux - Solaris Core OS - Sun Microsystems, Inc.
droux at sun.com - http://blogs.sun.com/droux
Narayan,

Thanks for the comments, my answers below...

On Sep 8, 2007, at 12:12 PM, Narayan Venkat wrote:

> 1) MAC client open/close related:
> (Crossbow-virt.pdf Section 5.2.2 Pg 36)
>
>     int mac_client_open(mac_handle_t *mh,
>         mac_client_handle_t *mchp, mac_bind_cpus
>     void mac_client_close(mac_client_handle_t mch);
>
>     typedef struct mac_bind_cpus_s {
>         uint_t mbc_ncpus;
>         uint32_t *mbc_cpus;
>     } mac_bind_cpus_t;
>
> Q1.1) The mac_client_open() interface definition line in the document
>       is abruptly cut. It seems like there are additional arguments
>       such as flags etc.

Yes, there's a missing break in that line, and the flag argument is
missing, will fix.

> Q1.2) On pg 38, there is a reference to the following flags, but which
>       interface takes them as an argument?
>
>           MAC_OPEN_FLAGS_FORCE_MULTI_RINGS
>           MAC_OPEN_FLAGS_FORCE_ONE_RING
>
>       It seems like these are an argument to mac_client_open(),
>       but there is a reference to mac_open() in the description, see below:
>
>       "If MAC_OPEN_FLAGS_FORCE_MULTI_RINGS flag is set and it is not
>       possible to allocate mbc_ncpus hardware rings, the mac_open()
>       call will fail, otherwise the MAC layer will attempt to reserve
>       one hardware ring for the MAC client."

These flags are specified when calling mac_client_open(), not mac_open().

> Q1.3) Are there any flags other than the following ones?
>
>           MAC_OPEN_FLAGS_FORCE_MULTI_RINGS
>           MAC_OPEN_FLAGS_FORCE_ONE_RING

No.

>       - Is there a way to force a software ring?

Do you mean not assign a hardware ring? I think this is something we
could add, yes.

> Q1.4) Is the mbc_cpus in mac_bind_cpus_t an array of CPU ids?

Yes.

> Q1.5) The following description of mbc_cpus on pg 37 is not clear,
>       especially for the non-NULL case.
>
>       "If mbc_cpus is NULL, the MAC layer will pick the CPUs.
>       If mbc_cpus is non-NULL, the MAC layer will chose the CPUs.".

The first one is correct.
If mbc_cpus is non-NULL, the MAC layer will assign the CPUs provided
by the caller.

> Q1.6) What is the relationship between unicast addresses (multiple
>       unicast addresses set via mac_unicast_add()), rings and CPUs?
>
>       - Is there a 1:1 relation between a unicast address and a ring?
>       - Is there a 1:1 relation between a ring and a CPU?

Neither. The MAC addresses will share the same rings and CPUs.

>       - The rings and CPUs are tightly coupled in this interface.
>         How can a client allocate multiple rings when there is only
>         one CPU (or fewer CPUs than rings)?

You don't allocate rings explicitly; you express a level of
parallelism instead, and the framework distributes the hardware rings
transparently.

>       - When there are multiple CPUs and multiple unicast addresses,
>         is there address fanout per CPU?

See 2 answers above.

> Q1.7) How is the binding of CPUs via mac_bind_cpus_t coordinated
>       with CPU DR (on the platforms that support it)?

The MAC layer will be notified of the removal of the CPU and will
stop using it for its worker threads and interrupts.

>       NOTE: CPU DR is already a supported feature on LDoms.
>
> Q1.8) LDoms requires the CPU binding to be changed dynamically,
>       how can this be accomplished?

This cannot be done with the API as documented today. It seems that
you are looking for a call to change the set of CPUs assigned to the
MAC client, is that what you are asking for?

> Q1.9) The following XXX on pg 37. When will the interface changes for
>       priority and bandwidth specification be available?
>
>       "XXX We still need to add the priority and bandwidth
>       limit as argument to mac_open(). We also need an entry
>       point to change the set of CPUs."

I'm working on it but I don't have a firm date.

> Q1.10) Can the MAC client interface be extended to support creating
>        a client based on ether_type? This is required for MAC clients
>        like Fibre Channel over Ethernet.

No, each MAC client corresponds to a MAC level entity which is
defined by its MAC address.
Multiple ether types can be supported on top of a MAC client.

> 2) MAC Unicast address related:
> (Crossbow-virt.pdf Section 5.2.4 Pg 38)
>
>     mac_unicast_handle_t mac_unicast_add(mac_client_handle_t mch,
>         mac_addr_type_t addr_type, int *addr_slot, uint_t prefix_len,
>         uchar_t *mac_addr, uint32_t flags);
>
>     void mac_unicast_unset(mac_unicast_handle_t);
>     void mac_unicast_get(mac_unicast_handle_t mah, uchar_t *mac_addr);
>     void mac_unicast_update(mac_unicast_handle_t mah,
>         mac_addr_type_t addr_type, int *addr_slot, uint_t prefix_len,
>         uchar_t *mac_addr);
>
> QUESTIONS:
>
> Q2.1) Section 4.5 describes the "By value" type, which is used
>       to set a specific MAC address by the MAC client. But there
>       is no equivalent addr_type definition under the mac_unicast_add()
>       interface.

MAC_UNICAST_VALUE is missing from the list, this is what you are
looking for.

>       NOTE: LDoms requires the MAC addresses that are allocated
>       by the LDom manager be used by the network device. So, LDoms
>       will not use any addr_type other than the "By value" type.

That's fine.

> Q2.2) Is there an impact to the multiaddress_capab_t.maddr_add()/
>       maddr_remove() interfaces? Are these being obsoleted or
>       going away?

The capability will stay, and the framework will continue to use that
capability to query and control the allocation of MAC address slots.
However, that interface is not intended to be used by drivers, which
should use the MAC client interfaces instead.

> Q2.3) A system with many domains (aka LDoms) with virtual network
>       devices requires the use of a large number of layer2 addresses;
>       this will exhaust the h/w slots available on most standard NICs.
>       How can a client take advantage of the layer2 filtering provided by
>       NICs like NII-NIU/Neptune? Specifically, this will help in
>       avoiding the programming of the device into promiscuous mode
>       etc. Currently there are no interfaces that seem to provide
>       such an ability.

Yes, this is a situation we are aware of.
We've talked on this list about having multiple VNICs sharing the
same MAC address, and identified by their IP address instead. However,
this needs to be scoped and defined further before we can commit to
providing that functionality.

> Q2.4) Clients will need the ability to specify whether mac_unicast_add()
>       is allowed to go into promiscuous mode or not. An error return
>       value is required if no h/w MAC address slot is available.

OK, I will add a flag.

> Q2.5) On pg 40, the following description still points to the
>       rings argument even though it has been removed from the
>       mac_unicast_add() interface.
>
>       "The rings argument specifies the list of rings to
>       associate with the specified unicast MAC address.
>       If it is NULL, the MAC layer allocates a set of rings
>       according to those available to the MAC client, see
>       Section ringselection."

This should be removed, good catch.

> Q2.6) Can it be assumed that every address added to a client is
>       processed in a separate ring (either h/w ring or s/w ring)?

No, all the MAC addresses for a client will share the same ring(s).
If there's a need to have a different set of rings associated with a
MAC address, then a different MAC client should be created.

> Q2.7) How are the multiple addresses per client maintained? Is it done
>       in the MAC layer, or does it bypass the MAC layer and get passed
>       to the h/w directly?

Since the action of reserving the MAC address is triggered by a call
to the MAC layer, the MAC layer cannot be bypassed. The MAC layer
will use the multiple MAC address capability exposed by the driver to
reserve a new MAC address slot.

> Q2.8) Can an unlimited number of MAC addresses be assigned to a MAC
>       client?
>       What are the software/hardware features that limit this?

Memory that can be allocated by the kernel.

> 3) Rings related:
> (Crossbow-virt.pdf Section 5.3 Pg 43)
>
>     mac_rint_t *mac_rings_list_get(mac_client_handle_t mch,
>         uint_t nrings);
>     void mac_rings_list_free(mac_rings_t *rings, uint_t nrings);
>     uint16_t mac_ring_get_flags(mac_ring_t ring);
>
> QUESTIONS:
>
> Q3.1) All of these interfaces are now categorized as project-private
>       API. What motivated this change? These interfaces need to be
>       more open.

The MAC layer will do the allocation of hardware resources to the
various MAC clients and their flows. Instead of having each MAC
client manage its own set of resources, the resources are allocated
to MAC clients based on their needs, for example the degree of
parallelism expressed through mac_client_open(). If you have specific
functional requirements that are not satisfied by the current
document, please list them.

> Q3.2) The mac_rings_list_get() is only for h/w rings; is there
>       an equivalent interface to obtain s/w ring information?
>       Or can this interface be extended to return both h/w ring
>       and s/w ring information?

The interface will evolve to provide that information, but it will
remain project private. It is provided here FYI but will change in
future revisions of the document.

> Q3.3) Are the mac_resource_set() and mac_resources() interfaces
>       going away?

Yes, they will be replaced by different interfaces. But note that
they are already project private in Nevada and were not supposed to
be used by other ON components.

> Q3.4) What is the action taken when no free h/w ring is available?
>       As per the documentation of mac_rings_list_get(), if no h/w
>       ring is available, it returns NULL. In such a case, how does
>       mac_unicast_add() behave when NULL is passed for rings?

mac_unicast_add() no longer takes rings.
This will be handled transparently to the MAC clients by using a
default ring and falling back to software classification.

> Q3.5) Are there any interfaces other than the above mac_rings_xxx
>       interfaces that are available to deal with MAC rings?

Not available to MAC clients. The set of project private interfaces
might evolve as we refine the design.

> Q3.6) Does mac_rings_list_get() return the list of MAC rings
>       assigned to the client at the time of client open? How can
>       this be changed after the client is open?

The set of assigned rings may change. The details on the APIs needed
to support this still need to be defined, but they will remain
project private.

> Q3.7) Assigning h/w rings to a specific MAC address limits the
>       bandwidth to the number of rings that are assigned to that
>       address. Is there a way not to bind h/w rings to a specific
>       MAC address, so that the bandwidth could be used by
>       any MAC client depending on the traffic?

See Q1.3.

> 4) Receive callback related:
> (Crossbow-virt.pdf Section 5.2.5 Pg 40)
>
>     int mac_rx_set(mac_client_handle_t mch, mac_rx_fn_t rx_fn,
>         void *arg);
>     int mac_rx_clear(mac_client_handle_t mch);
>
> QUESTIONS:
>
> Q4.1) How can a client get an rx callback per ring that is assigned
>       to the MAC client? This will allow parallel processing
>       and improve the performance. Such a feature is already
>       being used in the current implementation of the LDoms vSwitch
>       driver, and the mac_xxx interfaces should support such an
>       ability.

The parallel processing will still happen, i.e. if multiple hardware
rings or software rings are assigned to a MAC client, multiple
connections associated with that MAC client will be spread across
these rings.

> Q4.2) How can a client get a separate callback for a defined type of
>       traffic, such as different SAP numbers etc.? This will
>       be useful to provide out-of-band packet processing
>       or related services.

This will be supported by a MAC flow API built on top of the MAC
client API.
The flow API will be described by a separate document.

> Q4.3) There is a reference to mac_addr_set(), should it be
>       mac_unicast_add()?

Yes, will fix.

> 5) Transmit related:
> (Crossbow-virt.pdf Section 5.2.7 Pg 41)
>
>     mblk_t *mac_tx(mac_client_handle_t *mch, mblk_t *mp, uint64_t hint);
>
> QUESTIONS:
>
> Q5.1) What are the valid values for the 'hint' argument?
>       From the description on pg 42, NULL seems to be
>       a valid value. Is it safe to assume that the 'hint' is a
>       ring-id? If so, a NULL value of 0 will conflict with a
>       ring-id of 0.

The hint can be any 64 bit value, but it must always be the same
value for the packets corresponding to the same connection to avoid
reordering. TCP and UDP for example pass the connection pointer as
the hint, which allows us to avoid packet inspection for these
protocols.

> Q5.2) If NULL is specified as a 'hint', how is the tx ring selected?

In this case mac_tx() will parse the packet headers and hash on the
header information to select a transmit ring.

> Q5.3) The 'hint' argument description says the following.
>       What is the meaning of a connection in this context, and
>       how is it identified?
>
>       "The hint must be the same for packets of the same connection."

It can be a TCP connection for example. This is required to avoid
reordering of packets for the same connection.

> 6) Multicast addresses related:
> (Crossbow-virt.pdf Section 5.2.6 Pg 41)
>
>     int mac_multicast_add(mac_client_handle_t mch, const uint8_t *addr);
>     int mac_multicast_remove(mac_client_handle_t mch, const uint8_t *addr);
>
> No comments at this point.
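The hint behaviour described in the answers to Q5.1 through Q5.3 can be illustrated with a small user-space sketch. This is not the Crossbow implementation: select_tx_ring(), flow_key_t and mix64() are hypothetical names made up for illustration, the header fields are a stand-in for whatever mac_tx() actually parses, and treating a hint of 0 as "no hint supplied" is an assumption based on the NULL case discussed above. The only point it tries to show is that a stable per-connection hint always maps to the same ring, so packets of one connection are not reordered.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the header fields a MAC layer might parse. */
typedef struct flow_key {
	uint32_t src_ip;
	uint32_t dst_ip;
	uint16_t src_port;
	uint16_t dst_port;
} flow_key_t;

/* 64-bit mixing step, used only to spread values across rings. */
static uint64_t
mix64(uint64_t x)
{
	x ^= x >> 30;
	x *= 0xbf58476d1ce4e5b9ULL;
	x ^= x >> 27;
	x *= 0x94d049bb133111ebULL;
	x ^= x >> 31;
	return (x);
}

/*
 * Pick a transmit ring index in [0, nrings).
 * Assumption: hint == 0 means "no hint", in which case the header
 * fields in *key are hashed instead. key is not read when hint != 0.
 */
static unsigned int
select_tx_ring(uint64_t hint, const flow_key_t *key, unsigned int nrings)
{
	uint64_t h;

	if (nrings <= 1)
		return (0);
	if (hint != 0) {
		/* Same hint in, same ring out: no per-packet inspection. */
		h = mix64(hint);
	} else {
		/* Fall back to hashing the header information. */
		h = ((uint64_t)key->src_ip << 32) | key->dst_ip;
		h = mix64(h ^ (((uint64_t)key->src_port << 16) | key->dst_port));
	}
	return ((unsigned int)(h % nrings));
}
```

Because the hint is hashed rather than used directly as an index, it does not need to be a ring-id, which is consistent with the answer that any 64 bit value is acceptable as long as it is stable per connection.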
> 7) Promiscuous mode related:
> (Crossbow-virt.pdf Section 5.2.8 Pg 42)
>
> It's not clear if the above interface will be available or not,
> but two new interfaces are added:
>
>     int mac_promisc_add(mac_client_handle_t mch, mac_promisc_type
>         promisc_type, mac_promisc_fn_t promisc_fn, void *arg,
>         mac_promisc_handle_t *php);
>     int mac_promisc_remove(mac_client_handle_t mch,
>         mac_promisc_handle_t *ph);
>
>     MAC_PROMISC_ALL   - send all packets
>     MAC_PROMISC_MULTI - only broadcast and multicast
>
> Maybe mac_promisc_add(MAC_PROMISC_ALL) will force the device
> to operate in promiscuous mode.

Both need to, since the device also needs to be in promiscuous mode
to receive all multicast traffic.

> QUESTIONS:
>
> Q7.1) According to section 4.6, the promiscuous mode operates
>       in the layer2 switch model. When choosing the promiscuous mode
>       model, can it be either the layer2 switch model or the shared
>       ethernet model?
>
> Q7.2) From the explanation of mac_promisc_add(), it seems like
>       mac_promisc_add() could be called without setting a
>       MAC address via mac_unicast_add(). Is this correct?
>       If so, what is the expected behaviour?

Currently we provide the same semantics as a switched environment,
i.e. a MAC client will see the same traffic that would be seen by a
NIC connected to a switch. What we would also like to provide is the
ability for a MAC client to obtain all the traffic going in and out
of the box, as well as the traffic exchanged between MAC clients. The
non-unicast address was part of that solution. Another option would
be to generalize this with the shared ethernet model, and allow a MAC
client to specify that it wants to observe all traffic via a separate
promiscuous type. I need to see how this can be added to the API.

> 8) Statistics related:
>
> Q8.1) Is the mac_stat_get() interface being obsoleted or changed?
>       If so, what is the new equivalent interface?

Yes, there will be a new MAC client interface.
The MAC layer will also maintain per-MAC-client statistics, such as
the number of packets sent and received. I need to add that interface
to the document.

> GENERAL QUESTIONS:
> ==================
>
> Qg.1) Are there any GLDv3 MAC client interfaces (provided by the Nemo
>       framework) that are being obsoleted but not documented
>       in this doc?

The MAC client interface was project private, and most of the
interface is being completely revamped by Crossbow. The set of MAC
client APIs available to ON consolidation components is described by
section 5.2 of the document. Any other MAC client APIs are still
project private.

> Qg.2) Are there any changes to the MAC driver interfaces, or are any
>       being obsoleted?

The changes made to the driver API will be published as part of a
separate forthcoming document.

> Qg.3) There are no MAC client interfaces to specify bandwidth
>       attributes. From section 4.7, it seems like they are
>       implemented as part of VNIC and not as MAC client interfaces.
>       If this is the case, how can the bandwidth attributes be
>       specified?

They are not documented yet, but will be specified as arguments to
mac_client_open().

> Qg.4) When will the classification interface be fully documented
>       for review?

There will be separate documents for the MAC driver classification
interfaces, and for the MAC client flow APIs.

> Qg.5) In the future it will be great if the document can include
>       version info and change bars.

Will do.

Thanks,
Nicolas.

--
Nicolas Droux - Solaris Core OS - Sun Microsystems, Inc.
droux at sun.com - http://blogs.sun.com/droux
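As a footnote to the Q1.4 and Q1.5 answers in the reply above, the two mbc_cpus cases can be sketched in a few lines of user-space C. The struct layout is the one quoted from Section 5.2.2 (with uint_t spelled out as unsigned int so the fragment compiles outside Solaris); resolve_bind_cpus() is a hypothetical helper, not part of the documented API, and "pick CPUs 0..n-1" is only a placeholder for whatever policy the MAC layer really applies when mbc_cpus is NULL.

```c
#include <stddef.h>
#include <stdint.h>

/* Layout as quoted from Section 5.2.2; uint_t written as unsigned int. */
typedef struct mac_bind_cpus_s {
	unsigned int	mbc_ncpus;
	uint32_t	*mbc_cpus;
} mac_bind_cpus_t;

/*
 * Hypothetical helper: fill out[] with the CPU ids a client would be
 * bound to. Returns the number of ids written, or -1 if out[] cannot
 * hold mbc_ncpus entries.
 */
static int
resolve_bind_cpus(const mac_bind_cpus_t *req, uint32_t *out, size_t outlen)
{
	unsigned int i;

	if (req->mbc_ncpus > outlen)
		return (-1);
	for (i = 0; i < req->mbc_ncpus; i++) {
		if (req->mbc_cpus == NULL)
			out[i] = i;		/* placeholder: layer picks */
		else
			out[i] = req->mbc_cpus[i];	/* caller's CPU ids */
	}
	return ((int)req->mbc_ncpus);
}
```

In both cases mbc_ncpus carries the requested degree of parallelism; only the source of the CPU ids differs.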
Narayan, I didn''t hear back from you since I sent my response to your comments 3 weeks ago. In order to make our current schedule, we need to resolve all points of contention regarding LDOMs by October 19th, and I''m concerned by the lack of progress of this discussion. When can we expect to see a follow-up from you so that we bring closure to this discussion? Nicolas. On Sep 19, 2007, at 10:13 AM, Nicolas Droux wrote:> Narayan, > > Thanks for the comments, my answers below... > > On Sep 8, 2007, at 12:12 PM, Narayan Venkat wrote: > >> 1) MAC client open/close related: >> (Crossbow-virt.pdf Section 5.2.2 Pg 36) >> int mac_client_open(mac_handle_t *mh, >> mac_client_handle_t *mchp, mac_bind_cpus >> void mac_client_close(mac_client_handle_t mch); >> >> typedef struct mac_bind_cpus_s { >> uint_t mbc_ncpus; >> uint32_t *mbc_cpus; >> } mac_bind_cpus_t; >> >> Q1.1) The mac_client_open() interface definition line in the >> document >> is abruptly cut. It seems like there are additional >> arguments >> such as flags etc. > > Yes, there''s a missing break in that line, and the flag argument is > missing, will fix. > >> >> Q1.2) On pg 38, there is a reference to the following flags, >> but which >> interface takes them as an argument? >> >> MAC_OPEN_FLAGS_FORCE_MULTI_RINGS >> MAC_OPEN_FLAGS_FORCE_ONE_RING >> >> It seems like these are an argument to mac_client_open(), >> but there is a reference mac_open() in the description see below: >> >> "If MAC_OPEN_FLAGS_FORCE_MULTI_RINGS flag is set and it is not >> possible to allocate mbc_ncpus hardware rings, the mac_open() >> call will fail, otherwise the MAC layer will attempt to reserve >> one hardware ring for the MAC client." > > These flags are specified when calling mac_client_open(), not mac_open > (). > >> >> Q1.3) Are there any other flags other than the following ones? >> >> MAC_OPEN_FLAGS_FORCE_MULTI_RINGS >> MAC_OPEN_FLAGS_FORCE_ONE_RING > > No. > >> >> - Is there a way to force a software ring? 
> > Do you mean not assign a hardware ring? I think this is something we > could add, yes. > >> >> Q1.4) Is the mbc_cpus in mac_bind_cpus_t an array of CPU ids? > > Yes. > >> >> Q1.5) The following description of mbc_cpus on pg 37 is not >> clear, >> especially for the non-NULL case. >> >> "If mbc_cpus is NULL, the MAC layer will pick the CPUs. >> If mbc_cpus is non-NULL, the MAC layer will chose the CPUs.". > > The first one is correct. If mbc_cpus is non-NULL, the MAC layer will > assign the CPUs provided by the caller. > >> >> Q1.6) What is the relationship between Unicast addresses(multiple >> unicast set via mac_unicast_add()), Rings and CPUs? >> >> - Is there a 1:1 relation between a unicast address and a ring? >> - Is there a 1:1 relation between a ring and CPU? > > Neither. The MAC addresses will share the same rings and CPUs. > >> >> - The Rings and CPUs are tightly coupled in this interface. >> How can allocate multiple rings even when there is one CPU(or >> less >> number of CPUs). > > You don''t allocate rings explicitly, you express a level of > parallelism instead, the framework distributes the hardware rings > transparently. > >> - When there are multiple CPUs and multiple unicast addresses, >> is there address fanout per CPU? > > See 2 answers above. > >> >> Q1.7) How is the binding of CPUs via mac_bind_cpus_t is co- >> ordinated >> with CPU DR(on the platforms that support them)? > > The MAC layer will be notified of the removal of the CPU and will > stop using it for its worker threads and interrupts. > >> >> NOTE: CPU DR is already a supported feature on LDoms. >> >> Q1.8) LDoms requires the CPU binding to be changed dynamically, >> how can this be accomplished ? > > This cannot be done with the API as documented today. It seems that > you are looking for a call to change the set of CPUs assigned to the > MAC client, is that what you are asking for? > >> >> Q1.9) The following XXX on pg 37. 
When are the interface >> changes for >> priority and bandwidth specification available? >> >> "XXX We still need to add the priority and bandwidth >> limit as argument to mac_open(). We also need an entry >> point to change the set of CPUs." > > I''m working on it but I don''t have a firm date. > >> >> Q1.10) Can the mac client interface be extended to support >> creating >> a client based on ether_type? This is required for mac >> clients >> like fiberchannel over ethernet. > > No, each MAC client corresponds to a MAC level entity which is > defined by its MAC address. Multiple ether types can be supported on > top of a MAC client. > >> >> 2) MAC Unicast address related: >> (Crossbow-virt.pdf Section 5.2.4 Pg 38) >> >> mac_unicast_handle_t mac_unicast_add(mac_client_handle_t mch, >> mac_addr_type_t addr_type, int *addr_slot, uint_t prefix_len, >> uchar_t *mac_addr, uint32_t flags); >> >> void mac_unicast_unset(mac_unicast_handle_t); >> void mac_unicast_get(mac_unicast_handle_t mah, uchar_t *mac_addr); >> void mac_unicast_update(mac_unicast_handle_t mah, >> mac_addr_type_t addr_type, int *addr_slot, uint_t prefix_len, >> uchar_t *mac_addr); >> >> QUESTIONS: >> Q2.1) The section 4.5 describes "By value" type which is used >> to set a specific MAC address by the MAC client. But there >> is no equivalent addr_type definition under mac_unicast_add() >> interface. > > MAC_UNICAST_VALUE is missing from the list, this is what you are > looking for. > >> >> NOTE: LDoms requires the MAC addresses that are allocated >> by LDom manager be used by the network device. So, LDoms >> will not use any other addr_type other than "By value" type. > > That''s fine. > >> >> Q2.2) Is there an impact to the multiaddress_capab_t.maddr_add()/ >> maddr_remove() interfaces? Are these being obsoleted or >> going away? > > The capability will stay, and the framework will continue to use that > capability to query and control the allocation of MAC address slots. 
> However that interface is not intended to be used by drivers which
> should use the MAC client interfaces instead.
>
>> Q2.3) A system with many domains (aka LDoms) with virtual network
>>       devices, it requires the use of a large number layer2 addresses,
>>       this will exhaust h/w slots available on most standard NICs.
>>       How can a client take advantage of layer2 filtering provided by
>>       NICs like NII-NIU/Neptune. Specifically, this will help in
>>       avoiding the programming of the device into promiscuous mode
>>       etc. Currently there are no interfaces that seem to provide
>>       such ability.
>
> Yes, this is a situation we are aware of. We've talked on this list
> about having multiple VNICs sharing the same MAC address, and
> identified by their IP address instead. However this needs to be
> scoped and defined further before we can commit on providing that
> functionality.
>
>> Q2.4) Clients will need the ability to specify if mac_unicast_add()
>>       is allowed it to go into promiscuous mode or not. An error
>>       return value is required if no h/w mac address slot is
>>       available.
>
> OK, I will add a flag.
>
>> Q2.5) On pg 40, the follow description still pointing to the
>>       rings argument even though it has been removed from
>>       mac_unicast_add() interface.
>>
>>       "The rings argument specifies the list of rings to
>>       associate with the specified unicast MAC address.
>>       If it is NULL, the MAC layer allocates a set of rings
>>       according to those available to the MAC client, see
>>       Section ringselection."
>
> This should be removed, good catch.
>
>> Q2.6) Can it be assumed that every address added to a client is
>>       processed in a separate ring (either h/w ring or s/w ring)?
>
> No, all the MAC addresses for a client will share the same ring(s).
> If there's a need to have a different set of rings associated with a
> MAC address, then a different MAC client should be created.
>
>> Q2.7) How are the multiple addresses per client maintained, is it
>>       done in the MAC layer or does it bypass the MAC layer and
>>       passed to h/w directly.
>
> Since the action of reserving the MAC address is triggered by a call
> to the MAC layer, the MAC layer cannot be bypassed. The MAC layer
> will use the multiple MAC address capability exposed by the driver to
> reserve a new MAC address slot.
>
>> Q2.8) Can unlimited number of mac addresses be assigned to a MAC
>>       client? What are the software/hardware features that limit
>>       this?
>
> Memory that can be allocated by the kernel.
>
>> 3) Rings related:
>>    (Crossbow-virt.pdf Section 5.3 Pg 43)
>>    mac_rint_t *mac_rings_list_get(mac_client_handle_t mch,
>>        uint_t nrings);
>>    void mac_rings_list_free(mac_rings_t *rings, uint_t nrings);
>>    uint16_t mac_ring_get_flags(mac_ring_t ring);
>>
>> QUESTIONS:
>>
>> Q3.1) All of these interfaces are now categorized as project-private
>>       API. What motivated this change. These interfaces need to be
>>       more open.
>
> The MAC layer will do the allocation of hardware resources to the
> various MAC clients and their flows. Instead of having each MAC
> client manage its own set of resources, the resources are allocated
> to MAC clients based on their needs, for example the degree of
> parallelism expressed through mac_client_open(). If you have specific
> functional requirements that are not satisfied by the current
> document, please list them.
>
>> Q3.2) The mac_rings_list_get() is only for h/w rings, is there
>>       an equivalent interface to obtain s/w ring information.
>>       Or this interface can be extended return both h/w ring
>>       or s/w ring information.
>
> The interface will evolve to provide that information, but it will
> remain project private. It is provided here FYI but will change in
> future revisions of the document.
>
>> Q3.3) Are the mac_resource_set() and mac_resources() interfaces
>>       going away?
>
> Yes, they will be replaced by different interfaces. But note that
> they are already project private in Nevada and were not supposed to
> be used by other ON components.
>
>> Q3.4) What is the action taken when no free h/w ring available.
>>       As per the documentation of mac_rings_list_get(), if no h/w
>>       ring available, it returns NULL. In such case, how does
>>       mac_unicast_add() behave when NULL is passed for rings?
>
> mac_unicast_add() no longer takes rings. This will be handled
> transparently to the MAC clients by using a default ring and falling
> back to software classification.
>
>> Q3.5) Are there any interfaces other than the above mac_rings_xxx
>>       interfaces that are available to deal with MAC rings?
>
> Not available to MAC clients. The set of project private interfaces
> might evolve as we refine the design.
>
>> Q3.6) Is the mac_rings_list_get() returns the list of mac rings
>>       assigned to the client at the time of client open. How can
>>       this be changed after the client is open.
>
> The set of assigned rings may change. The details on the APIs needed
> to support this still need to be defined, but they will remain
> project private.
>
>> Q3.7) Assigning h/w rings to a specific MAC address limits the
>>       bandwidth to the number of rings that are assigned to that
>>       address. Is there a way to not to bind h/w rings specific
>>       to MAC address so that the bandwidth could be used by
>>       any mac client depending on the traffic?
>
> See Q1.3.
>
>> 4) Receive callback related:
>>    (Crossbow-virt.pdf Section 5.2.5 Pg 40)
>>    int mac_rx_set(mac_client_handle_t mch, mac_rx_fn_t rx_fn,
>>        void *arg);
>>    int mac_rx_clear(mac_client_handle_t mch);
>>
>> QUESTIONS:
>>
>> Q4.1) How can a client get rx callback per ring that is assigned
>>       to the mac client? This will allow parallel processing
>>       and improve the performance.
>>       Such a feature is already being used in the current
>>       implementation of LDoms vSwitch driver and the mac_xxx
>>       interfaces should support such an ability.
>
> The parallel processing will still happen. I.e. if multiple hardware
> rings or software rings are assigned to a MAC clients, multiple
> connections associated with that MAC client will be spread across
> these rings.
>
>> Q4.2) How can a client get a separate callback for a defined type of
>>       traffic, such as different SAP numbers etc. This will
>>       be useful to provide out of the band type packet processing
>>       or related services.
>
> This will be supported by a MAC flow API built on top of the MAC
> client API. The flow API will be described by a separate document.
>
>> Q4.3) There is a reference mac_addr_set(), should it be
>>       mac_unicast_add()?
>
> Yes, will fix.
>
>> 5) Transmit related:
>>
>>    (Crossbow-virt.pdf Section 5.2.7 Pg 41)
>>    mblk_t *mac_tx(mac_client_handle_t *mch, mblk_t *mp, uint64_t hint);
>>
>> QUESTIONS:
>>
>> Q5.1) What are the valid values for the 'hint' argument?
>>       From the description on pg 42, NULL seems to be
>>       a valid value. Is it safe to assume that the 'hint' is a
>>       ring-id, if so, a NULL value of 0 will conflict with a
>>       ring-id of 0.
>
> The hint can be any 64 bit value, but it must always be the same
> value for the packets corresponding to the same connection to avoid
> reordering. TCP and UDP for example pass the connection pointer as
> the hint, which allows us to avoid packet inspection for these
> protocols.
>
>> Q5.2) If NULL specified as a 'hint', how is the tx ring selected?
>
> In this case mac_tx() will parse the packet headers and hash on the
> header information to select a transmit ring.
>
>> Q5.3) The 'hint' argument description says the following.
>>       What is the meaning of a connection in this context and
>>       how to identify this?
>>
>>       "The hint must be the same for packets of the same
>>       connection."
>
> It can be a TCP connection for example. This is required to avoid
> reordering of packets for the same connection.
>
>> 6) Multicast addresses related:
>>    (Crossbow-virt.pdf Section 5.2.6 Pg 41)
>>    int mac_multicast_add(mac_client_handle_t mch,
>>        const uint8_t *addr);
>>    int mac_multicast_remove(mac_client_handle_t mch,
>>        const uint8_t *addr);
>>
>>    No comments at this point.
>>
>> 7) Promiscuous mode related:
>>
>>    (Crossbow-virt.pdf Section 5.2.8 Pg 42)
>>    Its not clear if the above interface will be available or not,
>>    but two new interfaces are added:
>>
>>    int mac_promisc_add(mac_client_handle_t mch, mac_promisc_type
>>        promisc_type, mac_promisc_fn_t promisc_fn, void *arg,
>>        mac_promisc_handle_t *php);
>>    int mac_promisc_remove(mac_client_handle_t mch,
>>        mac_promisc_handle_t *ph);
>>
>>    MAC_PROMISC_ALL   - send all packets
>>    MAC_PROMISC_MULTI - only broadcast and multicast
>>
>>    May be the mac_promisc_add(MAC_PROMISC_ALL) will force device
>>    to operate in the promiscuous mode.
>
> Both need to, since the device needs to be in promiscuous mode also
> to receive all multicast traffic.
>
>> QUESTIONS:
>>
>> Q7.1) According to the section 4.6, the promiscuous mode operates
>>       in the layer2 switch model. When choosing the promiscuous mode
>>       model can it be either layer2 switch model or shared
>>       ethernet model?
>
>> Q7.2) From the explanation of mac_promisc_add(), it seems like
>>       the mac_promisc_add() could be called without setting
>>       MAC address via mac_unicast_add(). Is this correct?
>>       If so, what is the expected behaviour?
>
> Currently we provide the same semantics as a switched environment,
> i.e. a MAC client will see the same traffic that would be seen by a
> NIC connected to a switch.
>
> What we would also like to provide is the ability to for a MAC client
> to obtain all the traffic going in and out of the box, as well as the
> traffic exchanged between MAC clients. The non-unicast address was
> part of that solution.
>
> Another option would be to generalize this with the shared ethernet
> model, and allow a MAC client to specify that it wants to observe all
> traffic via a separate promiscuous type. I need to see how this can
> be added to the API.
>
>> 8) Statistics related:
>>
>> Q8.1) Is the mac_stat_get() interface being obsoleted or changed?
>>       If so, what is the new equivalent interface?
>
> Yes, there will be a new MAC client interface. The MAC layer will
> also maintain per-MAC client statistics for MAC client specific
> statistics such as number of packets sent/received, etc. I need to
> add that interface to the document.
>
>> GENERAL QUESTIONS:
>> ==================
>>
>> Qg.1) Are there any GLDv3 MAC client interfaces that are being
>>       obsoleted (provided by the Nemo framework) but not documented
>>       in this doc?
>
> The MAC client interface was project private, and most of the
> interface is being completely revamped by Crossbow. The set of MAC
> client API available to ON consolidation components is described by
> section 5.2 of the document. Any other MAC client API are still
> project private.
>
>> Qg.2) Are there any changes to the MAC driver interfaces or being
>>       obsoleted?
>
> The changes made to the driver API will be published as part of a
> separate forthcoming document.
>
>> Qg.3) There are no MAC client interfaces to specify bandwidth
>>       attributes. From the section 4.7, it seems like they are
>>       implemented as part of VNIC and not as MAC client interfaces.
>>       If this is the case, how can the bandwidth attributes be
>>       specified?
>
> They are not documented yet, but will be specified as arguments to
> mac_client_open().
>
>> Qg.4) When will the classification interface be fully documented
>>       for review?
>
> There will be separate documents for the MAC driver classification
> interfaces, and for the MAC client flow APIs.
>
>> Qg.5) In the future it will be great if the document can include
>>       version info and change bars.
>
> Will do.
>
> Thanks,
> Nicolas.
>
> --
> Nicolas Droux - Solaris Core OS - Sun Microsystems, Inc.
> droux at sun.com - http://blogs.sun.com/droux

--
Nicolas Droux - Solaris Core OS - Sun Microsystems, Inc.
droux at sun.com - http://blogs.sun.com/droux
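
The Tx ring selection described in the Q5.1 and Q5.2 answers can be illustrated with a small user-space model. This is only a sketch under stated assumptions, not Crossbow code: tx_ring_select(), flow_key_t, flow_key_hash() and mix64() are hypothetical names, treating a hint of 0 as "no hint" is an assumption taken from the NULL discussion, and the thread does not say whether the MAC layer hashes the hint or which header fields it hashes. The one property taken from the thread is that equal hints (or equal headers) must always map to the same ring so that packets of a connection are not reordered.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical result of parsing a packet's headers. */
typedef struct flow_key {
	uint32_t src_ip;
	uint32_t dst_ip;
	uint16_t src_port;
	uint16_t dst_port;
} flow_key_t;

/* Generic 64-bit mixer (splitmix64 finalizer) so nearby values spread out. */
static uint64_t
mix64(uint64_t x)
{
	x ^= x >> 30;
	x *= 0xbf58476d1ce4e5b9ULL;
	x ^= x >> 27;
	x *= 0x94d049bb133111ebULL;
	x ^= x >> 31;
	return (x);
}

/* Fold the parsed header fields into one 64-bit value and mix it. */
static uint64_t
flow_key_hash(const flow_key_t *k)
{
	uint64_t v = ((uint64_t)k->src_ip << 32) | k->dst_ip;

	v ^= ((uint64_t)k->src_port << 16) | k->dst_port;
	return (mix64(v));
}

/*
 * Pick a Tx ring index in [0, nrings).
 * hint != 0: use the caller's hint, no packet inspection needed.
 * hint == 0: fall back to hashing the parsed header fields (assumption).
 */
unsigned int
tx_ring_select(uint64_t hint, const flow_key_t *k, unsigned int nrings)
{
	assert(nrings > 0);
	if (hint != 0)
		return ((unsigned int)(mix64(hint) % nrings));
	assert(k != NULL);
	return ((unsigned int)(flow_key_hash(k) % nrings));
}
```

In this model a client such as TCP would pass its connection pointer cast to uint64_t as the hint, and a client with no per-connection state would pass 0 and let the headers decide. Because the mapping is a pure function of the hint, a client cannot use it to request a specific ring.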
Hi Nicolas,

I will send out more questions/comments before the end of today or
early tomorrow. Very sorry for the delay in sending out a response.

Thanks,
-Narayan
Hi Nicolas,

Sorry for the late response. Please see some more questions/comments
below.

<..snip..>

>> Q1.2) On pg 38, there is a reference to the following flags, but
>>       which interface takes them as an argument?
>>
>>       MAC_OPEN_FLAGS_FORCE_MULTI_RINGS
>>       MAC_OPEN_FLAGS_FORCE_ONE_RING
>>
>>       It seems like these are an argument to mac_client_open(),
>>       but there is a reference mac_open() in the description see
>>       below:
>>
>>       "If MAC_OPEN_FLAGS_FORCE_MULTI_RINGS flag is set and it is not
>>       possible to allocate mbc_ncpus hardware rings, the mac_open()
>>       call will fail, otherwise the MAC layer will attempt to reserve
>>       one hardware ring for the MAC client."
>
> These flags are specified when calling mac_client_open(), not
> mac_open().

I guess the above text will get fixed in a subsequent revision.

>> Q1.3) Are there any other flags other than the following ones?
>>
>>       MAC_OPEN_FLAGS_FORCE_MULTI_RINGS
>>       MAC_OPEN_FLAGS_FORCE_ONE_RING
>
> No.

Is there a reason this is tied to hardware rings? We would like the
mac client open request to be extended so that a client can get all
software rings, a mix of hardware and software rings, or all hardware
rings, in order to match the number of CPUs specified in the
client_open call. A flag could be specified for this.

Also, according to the explanation on page 38 of the doc, there is a
case where no flags are specified. It seems that in that case the MAC
layer will attempt to reserve one hardware ring, and that the open
does not fail even if that reservation fails, but this is not clearly
specified.

>>       - Is there a way to force a software ring?
>
> Do you mean not assign a hardware ring? I think this is something
> we could add, yes.

This is related to the above. Can you add a flag that a client can use
to indicate that it wants a single ring or multiple rings, without
forcing hardware rings? That way, even when the underlying device does
not have enough hardware rings, a client can get a soft ring per CPU.
The comment above applies to this one also. The behavior without any
flags seems to be an attempt to reserve one h/w ring. What is the
failure case?

>> Q1.4) Is the mbc_cpus in mac_bind_cpus_t an array of CPU ids?
>
> Yes.
>
>> Q1.5) The following description of mbc_cpus on pg 37 is not clear,
>>       especially for the non-NULL case.
>>
>>       "If mbc_cpus is NULL, the MAC layer will pick the CPUs.
>>       If mbc_cpus is non-NULL, the MAC layer will chose the CPUs.".
>
> The first one is correct. If mbc_cpus is non-NULL, the MAC layer
> will assign the CPUs provided by the caller.

When mbc_cpus is NULL, what determines how many CPUs, and hence how
many rings, are available to this client?

>> Q1.6) What is the relationship between Unicast addresses (multiple
>>       unicast set via mac_unicast_add()), Rings and CPUs?
>>
>>       - Is there a 1:1 relation between a unicast address and a ring?
>>       - Is there a 1:1 relation between a ring and CPU?
>
> Neither. The MAC addresses will share the same rings and CPUs.

But since you are allowing multiple MAC addresses to be associated
with a client, can we add support, as part of the unicast_add call, to
indicate that each of these addresses should be associated with a ring
(either HW or SW)?

>>       - The Rings and CPUs are tightly coupled in this interface.
>>         How can allocate multiple rings even when there is one CPU
>>         (or less number of CPUs).
>
> You don't allocate rings explicitly, you express a level of
> parallelism instead, the framework distributes the hardware rings
> transparently.

But the only way we can control this parallelism is by specifying the
number of CPUs in the domain. In a system capable of adding and
removing CPUs dynamically, we might want to change the parallelism
level too. The current APIs don't allow changing this. We will need a
way to specify this, either as an extension to client_open or via a
new API call. Also, the document states that if mbc_ncpus HW rings
cannot be allocated, the open will fail.
As I mentioned earlier, it would be nice if we could get software
rings in this case. Also, is the parallelism specified by the number
of CPUs or by the unique CPU IDs in the array? What happens if I
specify ncpus where all IDs are the same: do I get ncpus HW rings if
they are available? And can we then change the ring-to-CPU mapping
when CPUs are added to or removed from the domain?

>>       - When there are multiple CPUs and multiple unicast addresses,
>>         is there address fanout per CPU?
>
> See 2 answers above.

This would be a very useful feature, as it would allow clients to
associate each ring with a MAC address. Currently the only way to do
this is to make separate mac_client_open calls, associate each client
with a ring, and then bind it to a MAC address.

>> Q1.7) How is the binding of CPUs via mac_bind_cpus_t is coordinated
>>       with CPU DR (on the platforms that support them)?
>
> The MAC layer will be notified of the removal of the CPU and will
> stop using it for its worker threads and interrupts.

That is purely error handling. We need to be able to use more CPUs and
improve the level of parallelism when CPUs are added. The reverse is
true when CPUs are removed. When the MAC layer is notified about CPUs
going away, does it remove the rings associated with those CPUs?

>>       NOTE: CPU DR is already a supported feature on LDoms.
>>
>> Q1.8) LDoms requires the CPU binding to be changed dynamically,
>>       how can this be accomplished ?
>
> This cannot be done with the API as documented today. It seems that
> you are looking for a call to change the set of CPUs assigned to
> the MAC client, is that what you are asking for?

See 1.7.

<..snip..>

>> Q1.10) Can the mac client interface be extended to support creating
>>       a client based on ether_type? This is required for mac clients
>>       like fiberchannel over ethernet.
>
> No, each MAC client corresponds to a MAC level entity which is
> defined by its MAC address.
> Multiple ether types can be supported on top of a MAC client.

Devices like the Niagara2 NIU allow classification of packets using
parameters like the ether_type. How can a mac_client take advantage of
such functionality?

<..snip..>

>> Q2.1) The section 4.5 describes "By value" type which is used
>>       to set a specific MAC address by the MAC client. But there
>>       is no equivalent addr_type definition under mac_unicast_add()
>>       interface.
>
> MAC_UNICAST_VALUE is missing from the list, this is what you are
> looking for.

I presume this will be documented in the next revision.

>>       NOTE: LDoms requires the MAC addresses that are allocated
>>       by LDom manager be used by the network device. So, LDoms
>>       will not use any other addr_type other than "By value" type.
>
> That's fine.
>
>> Q2.2) Is there an impact to the multiaddress_capab_t.maddr_add()/
>>       maddr_remove() interfaces? Are these being obsoleted or
>>       going away?
>
> The capability will stay, and the framework will continue to use
> that capability to query and control the allocation of MAC address
> slots. However that interface is not intended to be used by drivers
> which should use the MAC client interfaces instead.

OK.

>> Q2.3) A system with many domains (aka LDoms) with virtual network
>>       devices, it requires the use of a large number layer2 addresses,
>>       this will exhaust h/w slots available on most standard NICs.
>>       How can a client take advantage of layer2 filtering provided by
>>       NICs like NII-NIU/Neptune. Specifically, this will help in
>>       avoiding the programming of the device into promiscuous mode
>>       etc. Currently there are no interfaces that seem to provide
>>       such ability.
>
> Yes, this is a situation we are aware of. We've talked on this list
> about having multiple VNICs sharing the same MAC address, and
> identified by their IP address instead. However this needs to be
> scoped and defined further before we can commit on providing that
> functionality.
>

The current APIs only allow adding as many addresses as there are
slots available. After that, the adapter is put into promiscuous mode.
Instead, can you add the capability to specify when to use a filter
and when to take up a slot in the HW?

>> Q2.4) Clients will need the ability to specify if mac_unicast_add()
>>       is allowed it to go into promiscuous mode or not. An error
>>       return value is required if no h/w mac address slot is
>>       available.
>
> OK, I will add a flag.

Thanks.

<..snip..>

>> Q2.6) Can it be assumed that every address added to a client is
>>       processed in a separate ring (either h/w ring or s/w ring)?
>
> No, all the MAC addresses for a client will share the same ring(s).
> If there's a need to have a different set of rings associated with
> a MAC address, then a different MAC client should be created.

What happens when a single client has multiple rings and multiple MAC
addresses? How is the mapping done in that case? Would it be possible
in that case to request a 1-to-1 mapping and reserve a ring for each
address?

>> Q2.7) How are the multiple addresses per client maintained, is it
>>       done in the MAC layer or does it bypass the MAC layer and
>>       passed to h/w directly.
>
> Since the action of reserving the MAC address is triggered by a
> call to the MAC layer, the MAC layer cannot be bypassed. The MAC
> layer will use the multiple MAC address capability exposed by the
> driver to reserve a new MAC address slot.

What if the driver does not expose that capability? Will the
unicast_add call fail? Is the MAC layer essentially reflecting the
capability of the underlying hardware, or does it provide the ability
to have multiple addresses irrespective of whether the HW has multiple
slots?

>> Q2.8) Can unlimited number of mac addresses be assigned to a MAC
>>       client? What are the software/hardware features that limit
>>       this?
>
> Memory that can be allocated by the kernel.

So even if the underlying device runs out of slots, the MAC layer will
maintain all the addresses associated with that client. How does it
then manage these addresses and associate them with the rings
allocated for this client? What does it do in software and in hardware
to filter the addresses for this client? Which addresses get HW slots
and which don't? And if you run out of slots, does the HW go into
promiscuous mode?

>> 3) Rings related:
>>    (Crossbow-virt.pdf Section 5.3 Pg 43)
>>    mac_rint_t *mac_rings_list_get(mac_client_handle_t mch,
>>        uint_t nrings);
>>    void mac_rings_list_free(mac_rings_t *rings, uint_t nrings);
>>    uint16_t mac_ring_get_flags(mac_ring_t ring);
>>
>> QUESTIONS:
>>
>> Q3.1) All of these interfaces are now categorized as project-private
>>       API. What motivated this change. These interfaces need to be
>>       more open.
>
> The MAC layer will do the allocation of hardware resources to the
> various MAC clients and their flows. Instead of having each MAC
> client manage its own set of resources, the resources are allocated
> to MAC clients based on their needs, for example the degree of
> parallelism expressed through mac_client_open(). If you have
> specific functional requirements that are not satisfied by the
> current document, please list them.

Currently rings are hidden resources entirely managed by the MAC
layer, and clients have no visibility into them. All a client gets to
do is request a degree of parallelism. APIs that allow clients to see
how rings were allocated would be useful.

>> Q3.2) The mac_rings_list_get() is only for h/w rings, is there
>>       an equivalent interface to obtain s/w ring information.
>>       Or this interface can be extended return both h/w ring
>>       or s/w ring information.
>
> The interface will evolve to provide that information, but it will
> remain project private.
> It is provided here FYI but will change in future revisions of the
> document.

So the expectation is that the ring APIs should not be used by
clients, and that rings are only an internal resource managed by the
MAC layer?

>> Q3.3) Are the mac_resource_set() and mac_resources() interfaces
>>       going away?
>
> Yes, they will be replaced by different interfaces. But note that
> they are already project private in Nevada and were not supposed to
> be used by other ON components.

Agreed, but the new Crossbow API offers no other way to take advantage
of multiple rings. There is one generic RX callback, and no way to
associate a callback with a specific ring. This is a limitation of the
existing API, and support should be added to the open API list so that
we can process traffic independently. Some additional APIs to expose
this would be very useful.

>> Q3.4) What is the action taken when no free h/w ring available.
>>       As per the documentation of mac_rings_list_get(), if no h/w
>>       ring available, it returns NULL. In such case, how does
>>       mac_unicast_add() behave when NULL is passed for rings?
>
> mac_unicast_add() no longer takes rings. This will be handled
> transparently to the MAC clients by using a default ring and
> falling back to software classification.

So when there are multiple rings, and these rings are associated with
all the addresses, there is no pre-defined mapping? That can be
inefficient, as the hardware has the ability to associate each ring
with an address. Can we extend the MAC APIs to allow a client to
choose between the default behavior and binding addresses to rings?

>> Q3.5) Are there any interfaces other than the above mac_rings_xxx
>>       interfaces that are available to deal with MAC rings?
>
> Not available to MAC clients. The set of project private interfaces
> might evolve as we refine the design.

I would like to see some of the functionality exported via these
private MAC interfaces promoted to an open API.
Even if the API cannot be moved over, can we extend the open APIs to
provide hints?

>> Q3.6) Is the mac_rings_list_get() returns the list of mac rings
>>       assigned to the client at the time of client open. How can
>>       this be changed after the client is open.
>
> The set of assigned rings may change. The details on the APIs
> needed to support this still need to be defined, but they will
> remain project private.

So you are saying that a client cannot rely on how many rings are
available to it, and that this can change outside the client's
control? Is the removal of CPUs from the system one case in which this
will happen?

>> Q3.7) Assigning h/w rings to a specific MAC address limits the
>>       bandwidth to the number of rings that are assigned to that
>>       address. Is there a way to not to bind h/w rings specific
>>       to MAC address so that the bandwidth could be used by
>>       any mac client depending on the traffic?
>
> See Q1.3.

Not sure what you mean. Are you suggesting that some MAC addresses
will have SW rings and others will be associated with HW rings?

>> 4) Receive callback related:
>>    (Crossbow-virt.pdf Section 5.2.5 Pg 40)
>>    int mac_rx_set(mac_client_handle_t mch, mac_rx_fn_t rx_fn,
>>        void *arg);
>>    int mac_rx_clear(mac_client_handle_t mch);
>>
>> QUESTIONS:
>>
>> Q4.1) How can a client get rx callback per ring that is assigned
>>       to the mac client? This will allow parallel processing
>>       and improve the performance. Such a feature is already
>>       being used in the current implementation of LDoms vSwitch
>>       driver and the mac_xxx interfaces should support such an
>>       ability.
>
> The parallel processing will still happen. I.e. if multiple
> hardware rings or software rings are assigned to a MAC clients,
> multiple connections associated with that MAC client will be spread
> across these rings.

So with multiple rings, there will be concurrent callbacks to the
rx_fn, each with packets from the corresponding ring?
Also, will each callback be able to determine which ring triggered it?

>> Q4.2) How can a client get a separate callback for a defined type of
>>       traffic, such as different SAP numbers etc. This will
>>       be useful to provide out of the band type packet processing
>>       or related services.
>
> This will be supported by a MAC flow API built on top of the MAC
> client API. The flow API will be described by a separate document.

So if a client wants to use the flow API, will it need to layer itself
on the flow API rather than on the MAC client API directly? Can you
give me more information on what this layering will look like? Also,
when do you expect the flow API doc to be available?

<..snip..>

>> 5) Transmit related:
>>
>>    (Crossbow-virt.pdf Section 5.2.7 Pg 41)
>>    mblk_t *mac_tx(mac_client_handle_t *mch, mblk_t *mp, uint64_t hint);
>>
>> QUESTIONS:
>>
>> Q5.1) What are the valid values for the 'hint' argument?
>>       From the description on pg 42, NULL seems to be
>>       a valid value. Is it safe to assume that the 'hint' is a
>>       ring-id, if so, a NULL value of 0 will conflict with a
>>       ring-id of 0.
>
> The hint can be any 64 bit value, but it must always be the same
> value for the packets corresponding to the same connection to avoid
> reordering. TCP and UDP for example pass the connection pointer as
> the hint, which allows us to avoid packet inspection for these
> protocols.

Can you clarify what the hint is used for? Is it similar to the case
below, where a hash is applied to the hint value to pick a TX ring?

>> Q5.2) If NULL specified as a 'hint', how is the tx ring selected?
>
> In this case mac_tx() will parse the packet headers and hash on the
> header information to select a transmit ring.

Is the goal here to somehow bifurcate the traffic sent by a client via
the interface? So the algorithm is pre-determined by the MAC layer,
and either the hint, or the packet headers plus a hash, will be used
to determine the Tx ring for transmit?
Is it possible for a client to request a specific ring? Is the only
way to do this to pick a unique hint?

>> Q5.3) The 'hint' argument description says the following.
>>       What is the meaning of a connection in this context and
>>       how to identify this?
>>
>>       "The hint must be the same for packets of the same
>>       connection."
>
> It can be a TCP connection for example. This is required to avoid
> reordering of packets for the same connection.

OK.

>> 6) Multicast addresses related:
>>    (Crossbow-virt.pdf Section 5.2.6 Pg 41)
>>    int mac_multicast_add(mac_client_handle_t mch,
>>        const uint8_t *addr);
>>    int mac_multicast_remove(mac_client_handle_t mch,
>>        const uint8_t *addr);
>>
>>    No comments at this point.
>>
>> 7) Promiscuous mode related:
>>
>>    (Crossbow-virt.pdf Section 5.2.8 Pg 42)
>>    Its not clear if the above interface will be available or not,
>>    but two new interfaces are added:
>>
>>    int mac_promisc_add(mac_client_handle_t mch, mac_promisc_type
>>        promisc_type, mac_promisc_fn_t promisc_fn, void *arg,
>>        mac_promisc_handle_t *php);
>>    int mac_promisc_remove(mac_client_handle_t mch,
>>        mac_promisc_handle_t *ph);
>>
>>    MAC_PROMISC_ALL   - send all packets
>>    MAC_PROMISC_MULTI - only broadcast and multicast
>>
>>    May be the mac_promisc_add(MAC_PROMISC_ALL) will force device
>>    to operate in the promiscuous mode.
>
> Both need to, since the device needs to be in promiscuous mode also
> to receive all multicast traffic.

What do you mean by "Both need to"? In addition to the above interface
to enable promiscuous mode, will the existing promisc_set() interface
be removed? Also, is it correct that PROMISC_ALL is unicast plus
multicast, whereas MULTI is all multicast traffic only?

>> QUESTIONS:
>>
>> Q7.1) According to the section 4.6, the promiscuous mode operates
>>       in the layer2 switch model. When choosing the promiscuous mode
>>       model can it be either layer2 switch model or shared
>>       ethernet model?
>

Comments?

>> Q7.2) From the explanation of mac_promisc_add(), it seems like
>>       the mac_promisc_add() could be called without setting
>>       MAC address via mac_unicast_add(). Is this correct?
>>       If so, what is the expected behaviour?
>
> Currently we provide the same semantics as a switched environment,
> i.e. a MAC client will see the same traffic that would be seen by a
> NIC connected to a switch.

Is there a way to see only the multicast traffic associated with all
MAC clients, i.e. the union of all the addresses the MAC clients added
with multicast_add? The MULTI promisc option seems more like a way to
weed out unicast and broadcast traffic on the wire and pass all wire
multicast traffic up, including traffic the system may not be
interested in. Is this the case?

> What we would also like to provide is the ability to for a MAC
> client to obtain all the traffic going in and out of the box, as
> well as the traffic exchanged between MAC clients. The non-unicast
> address was part of that solution.

OK.

> Another option would be to generalize this with the shared ethernet
> model, and allow a MAC client to specify that it wants to observe
> all traffic via a separate promiscuous type. I need to see how this
> can be added to the API.

This will be very useful. How about something like
MAC_PROMISC_CLIENTS?

>> 8) Statistics related:
>>
>> Q8.1) Is the mac_stat_get() interface being obsoleted or changed?
>>       If so, what is the new equivalent interface?
>
> Yes, there will be a new MAC client interface. The MAC layer will
> also maintain per-MAC client statistics for MAC client specific
> statistics such as number of packets sent/received, etc. I need to
> add that interface to the document.

OK. When will this be available? In the next doc update?
Until that point in time, should we continue to use the mac_stat_get()
interface?

>> GENERAL QUESTIONS:
>> ==================
>>
>> Qg.1) Are there any GLDv3 MAC client interfaces that are being
>> obsoleted(provided by the Nemo framework) but not documented
>> in this doc?
>
> The MAC client interface was project private, and most of the
> interface is being completely revamped by Crossbow. The set of MAC
> client API available to ON consolidation components is described by
> section 5.2 of the document. Any other MAC client API are still
> project private.

So all interfaces that were part of GLDv3 will be replaced with the
interfaces specified here. Anything that is not specified here should
not be used moving forward?

>> Qg.2) Are there any changes to the MAC driver interfaces or being
>> obsoleted?
>
> The changes made to the driver API will be published as part of a
> separate forthcoming document.

How soon will this doc become available? I see in the latest doc Kais
published there are some new interfaces for ring support. I presume
there will be a separate doc for the generic mac driver interfaces?

>>
>> Qg.3) There are no MAC client interfaces to specify bandwidth
>> attributes. From the section 4.7, it seems like they are
>> implemented as part of VNIC and not as MAC client interfaces.
>> If this is the case, how can the bandwidth attributes be
>> specified?
>
> They are not documented yet, but will be specified as arguments to
> mac_client_open().

Can we expect to see this in the next revision of this document?

>
>>
>> Qg.4) When will the classification interface be fully documented
>> for review?
>
> There will be separate documents for the MAC driver classification
> interfaces, and for the MAC client flow APIs.

When will this be available?

Thanks
-Narayan
Narayan Venkat wrote:
>
>>> Q1.3) Are there any other flags other than the following ones?
>>>
>>> MAC_OPEN_FLAGS_FORCE_MULTI_RINGS
>>> MAC_OPEN_FLAGS_FORCE_ONE_RING
>>>
>> No.
>>
>
> Is there a reason this is tied to hardware rings. We would like the
> mac client open request be extended so that it can get either all
> software rings, a mix of hardware and software or HW rings in order
> to match the number of cpus specified in the client_open call ..
> A flag can be specified for this ..
>

I'm not clear on this requirement. Do you have examples of MAC clients
that need that level of detail?

Thanks,
Kais.
Narayan,

Thanks for the response. A general comment before diving into the
details: In general it seems that the issues being discussed are
related to either (1) the port of the existing LDOM functionality to
the updated MAC client interfaces introduced by Crossbow, or (2) the
addition of new LDOM functionality based on these new APIs.

It's critical that we resolve the issues falling into the first
category as soon as possible. This will allow the port of LDOM to the
new APIs to start as soon as possible, maintain LDOM functionality,
and allow Crossbow to meet its integration schedule.

For the class of issues related to future LDOM functionality, this
discussion will have to take place with a clear understanding of the
new functional requirements.

So given my latest response below, what are in your opinion the
remaining issues which are directly related to the port of the
existing LDOM functionality to the new MAC API?

On Oct 11, 2007, at 3:13 PM, Narayan Venkat wrote:

> Hi Nicolas
>
> Sorry for the late response. Please see some more questions/comments
> below ..
>
> <..snip..>
>
>>
>>>
>>> Q1.2) On pg 38, there is a reference to the following flags,
>>> but which interface takes them as an argument?
>>>
>>> MAC_OPEN_FLAGS_FORCE_MULTI_RINGS
>>> MAC_OPEN_FLAGS_FORCE_ONE_RING
>>>
>>> It seems like these are an argument to mac_client_open(),
>>> but there is a reference mac_open() in the description see below:
>>>
>>> "If MAC_OPEN_FLAGS_FORCE_MULTI_RINGS flag is set and it is not
>>> possible to allocate mbc_ncpus hardware rings, the mac_open()
>>> call will fail, otherwise the MAC layer will attempt to reserve
>>> one hardware ring for the MAC client."
>>
>> These flags are specified when calling mac_client_open(), not
>> mac_open().
>
> I guess the above text will get fixed in a subsequent revision.

Yes.

>
>
>>>
>>> Q1.3) Are there any other flags other than the following ones?
>>>
>>> MAC_OPEN_FLAGS_FORCE_MULTI_RINGS
>>> MAC_OPEN_FLAGS_FORCE_ONE_RING
>>
>> No.
>
> Is there a reason this is tied to hardware rings. We would like the
> mac client open request be extended so that it can get either all
> software rings, a mix of hardware and software or HW rings in order
> to match the number of cpus specified in the client_open call ..
> A flag can be specified for this ..

Why do you want to do this? What is the functional requirement here?

>
> Also according to the explanation in the doc at page 38, there is
> also a case where no flags is specified. It seems like, if no flags
> specified, then it will attempt to reserve one hardware ring.
> It seems not to fail even if such reservation fails, but it is not
> clearly specified.

If you don't specify the ONE_RING flag, and a hardware ring cannot be
reserved for the MAC client, then the MAC client will share the
default ring with other MAC clients.

>
>>> - Is there a way to force a software ring?
>>
>> Do you mean not assign a hardware ring? I think this is something
>> we could add, yes.
>
> This is related to the above. Can you add a flag that we can use
> to indicate that client wants to use a single ring or multiple rings
> but not force hardware rings. That way even when the underlying
> device does not have enough hardware rings, a client can get a soft
> ring per CPU.

How is that different than the currently documented behavior?

> Above comment applies to this one also. The behavior without any flags
> seem to attempt to reserve one h/w ring. What is the failure case ?

Without any flag, we try to allocate N hardware rings. If that fails,
then we try to allocate 1 hardware ring and do fanout to N soft rings.
If that fails too, then we share the default hardware ring and do
fanout to N soft rings.

>
>>>
>>> Q1.4) Is the mbc_cpus in mac_bind_cpus_t an array of CPU ids?
>>
>> Yes.
>>
>>>
>>> Q1.5) The following description of mbc_cpus on pg 37 is not clear,
>>> especially for the non-NULL case.
>>>
>>> "If mbc_cpus is NULL, the MAC layer will pick the CPUs.
>>> If mbc_cpus is non-NULL, the MAC layer will chose the CPUs.".
>>
>> The first one is correct. If mbc_cpus is non-NULL, the MAC layer
>> will assign the CPUs provided by the caller.
>
> When the mbc_cpus is NULL what determines how many CPUs and hence
> the number of rings available to this client.

mbc_ncpus.

>
>>
>>>
>>> Q1.6) What is the relationship between Unicast addresses (multiple
>>> unicast set via mac_unicast_add()), Rings and CPUs?
>>>
>>> - Is there a 1:1 relation between a unicast address and a ring?
>>> - Is there a 1:1 relation between a ring and CPU?
>>
>> Neither. The MAC addresses will share the same rings and CPUs.
>
> But since you are allowing multiple mac addresses to be associated
> with a client, can we add support as part of the unicast_add call
> to indicate that each of these addresses should be associated with
> ring (either HW or SW).

No, each client is associated with a group of hardware rings or soft
rings. Each group or set of rings corresponds to a set of unicast MAC
addresses. The bandwidth limits are set on a per MAC client basis.
This maps to how hardware NICs do their classification and fanout. If
you want a separate set of rings for different MAC addresses, then you
create a new MAC client.

>>> - The Rings and CPUs are tightly coupled in this interface.
>>> How can allocate multiple rings even when there is one CPU(or less
>>> number of CPUs).
>>
>> You don't allocate rings explicitly, you express a level of
>> parallelism instead, the framework distributes the hardware rings
>> transparently.
>
> But the only way we can control this parallelism is by specifying
> the number of CPUs in the domain. But in a system capable of adding
> and removing CPUs dynamically, we might want to change the parallelism
> level too. The current APIs don't allow changing this.
> We will need a way to specify this as an extension to the
> client_open or via a new API call.

So you want an API which allows you to change the actual
mac_bind_cpus_t for a client which has been already opened? I think we
can do that.

>
> Also the document states that if mbc_ncpus HW rings cannot be
> allocated the open will fail. As I mentioned earlier it would be
> nice if we can get software rings in this case.

See Q1.3 above, you can fall back to using software rings depending on
the flags set.

> Also, in terms of parallelism is this specified by the no. of CPUs
> or by unique CPUIDs in the array. What happens if I specify ncpus
> where all IDs are the same - do I get ncpus HW rings if they are
> available. Also can we then change the ring to cpu mapping when
> more CPUs are added/removed to/from the domain ?

There should be no duplicate CPU ids in the array.

>
>>
>>> - When there are multiple CPUs and multiple unicast addresses,
>>> is there address fanout per CPU?
>>
>> See 2 answers above.
>
> This will be a very useful feature as it will allow clients to
> associate each ring with a mac address. Currently the only way to do
> this is to do separate mac_client_open calls associate it with a ring
> and then bind it to a mac address.

Yes, you have to do a mac_client_open() if you want to assign a
different set of rings to a MAC address, see also above.

>>> Q1.7) How is the binding of CPUs via mac_bind_cpus_t is co-ordinated
>>> with CPU DR(on the platforms that support them)?
>>
>> The MAC layer will be notified of the removal of the CPU and will
>> stop using it for its worker threads and interrupts.
>
> That is purely error handling. We need the ability to be able to use
> more CPUs and improve the level of parallelism when CPUs are added.
> The reverse is true when the CPUs are removed. When the MAC layer is
> notified about CPUs going away does it remove the rings associated
> with the CPUs ?

I was not talking specifically about error handling.
If the MAC layer bound a ring worker thread or interrupt to a CPU and
that CPU is going away, the MAC layer will move that thread or
interrupt to a different CPU. The API discussed in Q1.6 above would
allow a MAC client to increase the number of CPUs if it detects that
CPUs were added to the system.

<snip>

>
>>>
>>> Q1.10) Can the mac client interface be extended to support creating
>>> a client based on ether_type? This is required for mac clients
>>> like fiberchannel over ethernet.
>>
>> No, each MAC client corresponds to a MAC level entity which is
>> defined by its MAC address. Multiple ether types can be supported
>> on top of a MAC client.
>
> Devices like the Niagara2 NIU allow classification of packets using
> parameters like the ether_type. How can a mac_client take advantage
> of such a functionality.

The fact that a particular hardware implementation can do
classification on a specific header field of a packet doesn't
necessarily mean that a MAC client needs to be associated with that
field. Today the SAP demultiplexing is done by DLS on top of MAC
clients. At some point in the future we may make use of hardware
classification to offload that demultiplexing, but that can be done at
a level above the MAC layer, and maintain the separation between MAC
clients and what defines them (MAC addresses and VLANs), from SAP
demultiplexing.

>>>
>>> Q2.1) The section 4.5 describes "By value" type which is used
>>> to set a specific MAC address by the MAC client. But there
>>> is no equivalent addr_type definition under mac_unicast_add()
>>> interface.
>>
>> MAC_UNICAST_VALUE is missing from the list, this is what you are
>> looking for.
>
> I presume this will be documented in the next revision.

Correct.

>>>
>>> Q2.2) Is there an impact to the multiaddress_capab_t.maddr_add()/
>>> maddr_remove() interfaces? Are these being obsoleted or
>>> going away?
>>
>> The capability will stay, and the framework will continue to use
>> that capability to query and control the allocation of MAC address
>> slots. However that interface is not intended to be used by
>> drivers which should use the MAC client interfaces instead.
>
> OK.

Since my last reply Kais and Roamer have been working on the design
for the new driver interface. Their proposal removes the multiple MAC
address capability as it is known today. You should read their design
document, which is available at
http://www.opensolaris.org/os/project/crossbow/Docs/virtual_resources.pdf

>>>
>>> Q2.3) A system with many domains (aka LDoms) with virtual network
>>> devices, it requires the use of a large number layer2 addresses,
>>> this will exhaust h/w slots available on most standard NICs.
>>> How can a client take advantage of layer2 filtering provided by
>>> NICs like NII-NIU/Neptune. Specifically, this will help in
>>> avoiding the programming of the device into promiscuous mode
>>> etc. Currently there are no interfaces that seem to provide
>>> such ability.
>>
>> Yes, this is a situation we are aware of. We've talked on this
>> list about having multiple VNICs sharing the same MAC address, and
>> identified by their IP address instead. However this needs to be
>> scoped and defined further before we can commit on providing that
>> functionality.
>>
>
> The current APIs only allow adding as many addresses as the
> number of slots available. Following this it will put the adapter
> in promisc mode. Instead can you add the capability to specify
> when to use a filter and when to take up a slot in the HW.

Do you mean that you want to be able to specify that a
mac_unicast_add() should put the NIC in promiscuous mode even though
there are MAC address slots available? What is the use case for this?

<snip>

>>>
>>> Q2.6) Can it be assumed that every address added to a client is
>>> processed in a separate ring (either h/w ring or s/w ring)?
>>
>> No, all the MAC addresses for a client will share the same ring(s).
>> If there's a need to have a different set of rings associated
>> with a MAC address, then a different MAC client should be created.
>
> What happens when a single client has multiple rings and multiple
> mac addresses. How is the mapping done in that case ? Would it
> be possible to in that case request a 1-to-1 mapping and reserve
> a ring for each address ?

If you need a 1-1 mapping between a MAC address and a hardware ring,
then you need to use multiple MAC clients, and assign a ring and a
unicast address to each client.

>>> Q2.7) How are the multiple addresses per client maintained, is it
>>> done in the MAC layer or does it bypass the MAC layer and passed
>>> to h/w directly.
>>
>> Since the action of reserving the MAC address is triggered by a
>> call to the MAC layer, the MAC layer cannot be bypassed. The MAC
>> layer will use the multiple MAC address capability exposed by the
>> driver to reserve a new MAC address slot.
>
> What if the driver does not expose that capability. Will the
> unicast_add call fail ? Is the MAC layer essentially reflecting the
> capability of the underlying hardware or does it provide the ability
> to have multiple addresses irrespective of whether the HW has
> multiple slots or not ?

The request will still succeed if the number of MAC address slots is
exhausted, or if the underlying NIC doesn't support the multiple MAC
address capability. However in these cases, the MAC layer will
transparently put the NIC in promiscuous mode in order to receive
traffic for that new MAC unicast address.

>>> Q2.8) Can unlimited number of mac addresses be assigned to a MAC
>>> client? What are the software/hardware features that limit
>>> this?
>>
>> Memory that can be allocated by the kernel.
>
> So even if the underlying device runs out of slots the MAC layer will
> maintain all the addresses associated with that client.
> How does it then manage and associate these addresses with the rings
> allocated for this client ? What does it do in both software and
> hardware to filter the addresses for this client ? Also which
> addresses get HW slots and which don't ? Also if you run out of slots
> does the HW go to promisc mode ..

Each MAC client is associated with a group of rings. Each group of
rings is therefore associated with a set of MAC addresses. If a client
needs to be associated with more than one MAC address, then the
corresponding group needs to be associated with the same set of
addresses. If the hardware runs out of MAC address slots, then the NIC
is put in promiscuous mode. The allocation of slots is on a first
come, first served basis.

>>> 3) Rings related:
>>> (Crossbow-virt.pdf Section 5.3 Pg 43)
>>> mac_rint_t *mac_rings_list_get(mac_client_handle_t mch,
>>> uint_t nrings);
>>> void mac_rings_list_free(mac_rings_t *rings, uint_t nrings);
>>> uint16_t mac_ring_get_flags(mac_ring_t ring);
>>>
>>>
>>> QUESTIONS:
>>>
>>> Q3.1) All of these interfaces are now categorized as project-private
>>> API. What motivated this change. These interfaces need to be
>>> more open.
>>
>> The MAC layer will do the allocation of hardware resources to the
>> various MAC clients and their flows. Instead of having each MAC
>> client manage its own set of resources, the resources are
>> allocated to MAC clients based on their needs, for example the
>> degree of parallelism expressed through mac_client_open(). If you
>> have specific functional requirements that are not satisfied by
>> the current document, please list them.
>
> Currently rings are hidden resources entirely managed by the mac layer
> and clients have no visibility. All the client gets to do is request a
> degree of parallelism. Providing APIs that allow clients to see how
> rings were allocated will be useful.

Why?
What is the functional requirement?

>>> Q3.2) The mac_rings_list_get() is only for h/w rings, is there
>>> an equivalent interface to obtain s/w ring information.
>>> Or this interface can be extended return both h/w ring
>>> or s/w ring information.
>>
>> The interface will evolve to provide that information, but it will
>> remain project private. It is provided here FYI but will change in
>> future revisions of the document.
>
> So the expectation is that ring APIs should not be used by clients
> and it is only an internal MAC layer resource managed by it ?

Yes, the MAC layer does the allocation of resources to MAC clients and
their flows.

>>> Q3.3) Are the mac_resource_set() and mac_resources() interfaces
>>> going away?
>>
>> Yes, they will be replaced by different interfaces. But note that
>> they are already project private in Nevada and were not supposed
>> to be used by other ON components.
>
> Agreed, but there is no other way in the new Crossbow API a way
> to take advantage of multiple rings. There is one generic RX callback,
> but no other way to associate a callback with a specific ring. This
> is a limitation of the existing API and support should be added to
> open API list, so that we can process traffic independently. So having
> some additional APIs to expose this will be very useful.

But you can process the traffic independently since packets will be
sent up to the MAC client concurrently for the multiple rings
associated with the client. What else do you need to do here
specifically that is prevented by the API?

>>> Q3.4) What is the action taken when no free h/w ring available.
>>> As per the documentation of mac_rings_list_get(), if no h/w
>>> ring available, it returns NULL. In such case, how does
>>> mac_unicast_add() behave when NULL is passed for rings?
>>
>> mac_unicast_add() no longer takes rings. This will be handled
>> transparently to the MAC clients by using a default ring and
>> falling back to software classification.
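
For illustration, the no-flags fallback order given under Q1.3 above
(N hardware rings, else one hardware ring with fanout to N soft rings,
else the shared default ring with fanout to N soft rings) can be
sketched in C as below. This is only a sketch and not Crossbow code:
ring_plan_t, ring_plan_no_flags() and the free_hw_rings counter are
hypothetical names made up for the example, and the behavior with the
MAC_OPEN_FLAGS_* flags set is deliberately not modeled.

```c
/*
 * Hypothetical names throughout; none of this is part of the MAC API.
 * Models only the fallback order used when mac_client_open() is called
 * without any MAC_OPEN_FLAGS_* flag.
 */
typedef enum {
	RING_PLAN_N_HW,			/* N dedicated hardware rings */
	RING_PLAN_ONE_HW_FANOUT,	/* 1 hardware ring, fanout to N soft rings */
	RING_PLAN_DEFAULT_FANOUT	/* shared default ring, fanout to N soft rings */
} ring_plan_t;

/*
 * free_hw_rings: hardware rings not yet reserved, not counting the
 *                default ring (an assumption of this sketch).
 * ncpus:         requested degree of parallelism (mbc_ncpus).
 */
static ring_plan_t
ring_plan_no_flags(unsigned int free_hw_rings, unsigned int ncpus)
{
	/* First choice: one hardware ring per requested CPU. */
	if (ncpus > 0 && free_hw_rings >= ncpus)
		return (RING_PLAN_N_HW);

	/* Second choice: a single hardware ring plus soft ring fanout. */
	if (free_hw_rings >= 1)
		return (RING_PLAN_ONE_HW_FANOUT);

	/* Last resort: share the default ring with other MAC clients. */
	return (RING_PLAN_DEFAULT_FANOUT);
}
```

In this model a client asking for 4 CPUs on a NIC with only 2 free
hardware rings ends up with one hardware ring and soft ring fanout,
and the open never fails for lack of hardware rings.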
>
> So when there are multiple rings -- and these rings are associated
> with all the addresses - there is no pre-defined mapping ? That can
> be inefficient as the hardware has the ability to associate each ring
> with an address. Can we extend the MAC apis to allow a client to
> choose between the default behavior and binding addresses to rings ?

I think I already answered this question multiple times above.

>>> Q3.5) Are there any interfaces other than the above mac_rings_xxx
>>> interfaces that are available to deal with MAC rings?
>>
>> Not available to MAC clients. The set of project private
>> interfaces might evolve as we refine the design.
>
> I would like to see some of the functionality exported via these
> private mac interfaces promoted to a open API. Even if the API
> cannot be moved over, can we extend APIs to provide hints ..

Again, what is the functional requirement? What functionality do you
need to provide with this information? What hints do you think are
missing from the API?

>>> Q3.6) Is the mac_rings_list_get() returns the list of mac rings
>>> assigned to the client at the time of client open. How can
>>> this be changed after the client is open.
>>
>> The set of assigned rings may change. The details on the APIs
>> needed to support this still need to be defined, but they will
>> remain project private.
>
> So you are saying there is no way to rely on how many rings are
> available to a particular client. This will change without the
> client's control ? Is CPUs being removed from the system a case
> under which this will happen ?

The flags taken by mac_client_open() allow some control by the MAC
client, see Q1.3. If the client specified that a given CPU be assigned
to the client, we could block the DR'ing out of the CPU until the MAC
client releases that CPU. What is your requirement here?

>>> Q3.7) Assigning h/w rings to a specific MAC address limits the
>>> bandwidth to the number of rings that are assigned to that
>>> address.
>>> Is there a way not to bind h/w rings specific
>>> to MAC address so that the bandwidth could be used by
>>> any mac client depending on the traffic?
>>
>> See Q1.3.
>
> Not sure what you mean. Are you suggesting that some mac addresses
> will have SW rings and others will be associated to HW rings ?

Between different MAC clients, that's possible. But within the same
MAC client, all unicast addresses of that client will share the same
group of hardware rings or SRS.

>>> 4) Receive callback related:
>>> (Crossbow-virt.pdf Section 5.2.5 Pg 40)
>>> int mac_rx_set(mac_client_handle_t mch, mac_rx_fn_t rx_fn,
>>> void *arg);
>>> int mac_rx_clear(mac_client_handle_t mch);
>>>
>>> QUESTIONS:
>>>
>>> Q4.1) How can a client get rx callback per ring that is assigned
>>> to the mac client? This will allow parallel processing
>>> and improve the performance. Such a feature is already
>>> being used in the current implementation of LDoms vSwitch
>>> driver and the mac_xxx interfaces should support such an
>>> ability.
>>
>> The parallel processing will still happen. I.e. if multiple
>> hardware rings or software rings are assigned to a MAC client,
>> multiple connections associated with that MAC client will be
>> spread across these rings.
>
> So with multiple rings, there will be concurrent callbacks to the
> rx_fn, each with packets in the corresponding ring ? Also will each
> callback be able to determine what ring did the callback ?

Why do you need that information, what is your functional requirement?

>>> Q4.2) How can a client get a separate callback for a defined type of
>>> traffic, such as different SAP numbers etc. This will
>>> be useful to provide out of the band type packet processing
>>> or related services.
>>
>> This will be supported by a MAC flow API built on top of the MAC
>> client API. The flow API will be described by a separate document.
>
> So if a client wants to use the flow API will it need to layer
> itself on the flow API and not the mac client API directly. Can you
> give me more information on what this layering will look like. Also,
> when do you expect the flow API doc to be available ?

The flow API will be an addition to the MAC client API. A MAC client
will be able to use that flow API. Such a flow operation would be of
the form mac_flow_xxx(mac_client_handle_t mch, <flow description>,
<bandwidth properties>, etc). Kais is working on defining that API,
I'll let him comment on expected availability.

>>>
>>> 5) Transmit related:
>>>
>>> (Crossbow-virt.pdf Section 5.2.7 Pg 41)
>>> mblk_t *mac_tx(mac_client_handle_t *mch, mblk_t *mp, uint64_t hint);
>>>
>>> QUESTIONS:
>>>
>>> Q5.1) What are the valid values for the 'hint' argument?
>>> From the description on pg 42, NULL seems to be
>>> a valid value. Is it safe to assume that the 'hint' is a
>>> ring-id, if so, a NULL value of 0 will conflict with a
>>> ring-id of 0.
>>
>> The hint can be any 64 bit value, but it must always be the same
>> value for the packets corresponding to the same connection to
>> avoid reordering. TCP and UDP for example pass the connection
>> pointer as the hint, which allows us to avoid packet inspection
>> for these protocols.
>
> Can you clarify what the hint is being used for. Is it similar
> to the case below -- where a hash will be applied on hint value
> to pick up a TX ring ?

Basically yes, a hash function is applied to the hint to select the
outbound TX ring.

>>> Q5.2) If NULL specified as a 'hint', how is the tx ring
>>> selected?
>>
>> In this case mac_tx() will parse the packet headers and hash on
>> the header information to select a transmit ring.
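
As an illustration of the hint handling described under Q5.1, here is a
minimal C sketch of hashing a 64-bit hint to a transmit ring index.
The thread does not say which hash function the MAC layer uses, so the
mixer below (the 64-bit finalizer from MurmurHash3) is an assumption
picked for the example, and mix64() and tx_ring_for_hint() are
hypothetical names, not MAC API functions. The NULL-hint path, where
mac_tx() hashes on parsed header fields instead, is not shown.

```c
#include <stdint.h>

/*
 * Hypothetical helper: scramble the hint so that similar values (for
 * example connection pointers that differ only in a few bits) spread
 * across rings. The constants are the MurmurHash3 64-bit finalizer; the
 * real MAC layer may use a different hash.
 */
static uint64_t
mix64(uint64_t x)
{
	x ^= x >> 33;
	x *= 0xff51afd7ed558ccdULL;
	x ^= x >> 33;
	x *= 0xc4ceb9fe1a85ec53ULL;
	x ^= x >> 33;
	return (x);
}

/*
 * Hypothetical helper: map a mac_tx() hint to one of nrings transmit
 * rings. The same hint always yields the same ring, which is what keeps
 * the packets of one connection in order.
 */
static unsigned int
tx_ring_for_hint(uint64_t hint, unsigned int nrings)
{
	if (nrings == 0)
		return (0);	/* no rings to choose from in this sketch */
	return ((unsigned int)(mix64(hint) % nrings));
}
```

Because the mapping depends only on the hint and the ring count, the
client never needs to know which transmit rings it was assigned; it
only has to keep the hint stable per connection.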
>>
>
> Is the goal here to somehow bifurcate traffic being sent by a client
> via the interface ?

The goal is to spread the traffic among the transmit rings assigned to
the client while maintaining packet ordering for individual
connections, without exposing the details of assignment of transmit
rings to MAC clients.

> The algorithm is pre-determined by the mac layer
> and either hint or misc header + hash will be used to determine the
> Tx ring for transmit ?

Yes.

> Is it possible for a client to request a specific
> ring -- is the only way to do this to pick a unique hint ?

No, this is done transparently to the MAC client, which doesn't need
to know which transmit rings are assigned to it.

<snip>

>>
>>> 6) Multicast addresses related:
>>> (Crossbow-virt.pdf Section 5.2.6 Pg 41)
>>> int mac_multicast_add(mac_client_handle_t mch, const uint8_t *addr);
>>> int mac_multicast_remove(mac_client_handle_t mch, const uint8_t *addr);
>>>
>>>
>>> No comments at this point.
>>>
>>> 7) Promiscuous mode related:
>>>
>>> (Crossbow-virt.pdf Section 5.2.8 Pg 42)
>>> It's not clear if the above interface will be available or not,
>>> but two new interfaces are added:
>>>
>>> int mac_promisc_add(mac_client_handle_t mch, mac_promisc_type
>>> promisc_type, mac_promisc_fn_t promisc_fn, void *arg,
>>> mac_promisc_handle_t *php);
>>> int mac_promisc_remove(mac_client_handle_t mch,
>>> mac_promisc_handle_t *ph);
>>>
>>> MAC_PROMISC_ALL - send all packets
>>> MAC_PROMISC_MULTI - only broadcast and multicast
>>>
>>> Maybe the mac_promisc_add(MAC_PROMISC_ALL) will force device
>>> to operate in the promiscuous mode.
>>
>> Both need to, since the device needs to be in promiscuous mode
>> also to receive all multicast traffic.
>
> What do you mean by "Both need to." ? In addition to above interface
> to enable promisc mode, will the existing promisc_set() interface
> be removed ?
> Also, PROMISC_ALL is unicast+multicast, whereas MULTI
> is only all multicast traffic ?

Both MAC_PROMISC_* modes will need to cause the device to be put in
promiscuous mode. The existing mac_promisc_set() interface is being
removed. Yes, PROMISC_ALL will be for all traffic, MULTI only for
multicast (and broadcast) traffic.

>
>>
>>>
>>> QUESTIONS:
>>>
>>> Q7.1) According to the section 4.6, the promiscuous mode operates
>>> in the layer2 switch model. When choosing the promiscuous mode
>>> model can it be either layer2 switch model or shared
>>> ethernet model?
>>
>
> Comments ?

The answer was included with my answer to Q7.2.

>>> Q7.2) From the explanation of mac_promisc_add(), it seems like
>>> the mac_promisc_add() could be called without setting
>>> MAC address via mac_unicast_add(). Is this correct?
>>> If so, what is the expected behaviour?
>>
>> Currently we provide the same semantics as a switched environment,
>> i.e. a MAC client will see the same traffic that would be seen by
>> a NIC connected to a switch.
>>
>
> Is there a way to see only the multicast traffic associated with
> all mac clients - union of all mac_client multicast_add addresses.
> The MULTI promisc option seems more a way to weed out unicast and
> broadcast traffic on the wire and pass all wire multicast traffic
> up - including ones the system may not be interested in ? Is this
> the case ?

These broadcast flags apply not only to the incoming received traffic
but also to the traffic sent by MAC clients of the same underlying
MAC. I.e. a MAC client PROMISC_MULTI callback will also see all
multicast traffic sent by the other MAC clients defined on top of the
same MAC.
In order to preserve the semantics that are implemented by a real
physical switch, this applies to *all* multicast traffic, not just the
multicast groups that were "joined" by the individual MAC clients.

>> What we would also like to provide is the ability for a MAC
>> client to obtain all the traffic going in and out of the box, as
>> well as the traffic exchanged between MAC clients. The non-unicast
>> address was part of that solution.
>>
>
> OK ..
>
>> Another option would be to generalize this with the shared
>> ethernet model, and allow a MAC client to specify that it wants to
>> observe all traffic via a separate promiscuous type. I need to see
>> how this can be added to the API.
>
> This will be very useful. How about something MAC_PROMISC_CLIENTS ?

The new modes will be:

* ALL: all traffic, including all traffic sent by MAC clients and
  traffic seen by the hardware
* FILTERED: all multicast and broadcast traffic, plus the traffic to
  the unicast MAC addresses associated with the MAC client.
* MULTI: all multicast and broadcast traffic received on the wire and
  sent by the MAC clients

>
>>
>>>
>>> 8) Statistics related:
>>>
>>> Q8.1) Is the mac_stat_get() interface being obsoleted or changed?
>>> If so, what is the new equivalent interface?
>>
>> Yes, there will be a new MAC client interface. The MAC layer will
>> also maintain per-MAC client statistics for MAC client specific
>> statistics such as number of packets sent/received, etc. I need to
>> add that interface to the document.
>
> OK -- when will this be available. Next doc update ? Until that
> point in time should we continue to use the mac_stat_get() interface ?

Yes, this will be documented in the next doc update.
You will need to use this new interface as part of the port to the new
MAC API introduced by Crossbow.

>
>>> GENERAL QUESTIONS:
>>> ==================
>>>
>>> Qg.1) Are there any GLDv3 MAC client interfaces that are being
>>> obsoleted(provided by the Nemo framework) but not documented
>>> in this doc?
>>
>> The MAC client interface was project private, and most of the
>> interface is being completely revamped by Crossbow. The set of MAC
>> client API available to ON consolidation components is described
>> by section 5.2 of the document. Any other MAC client API are still
>> project private.
>
> So all interfaces that were part of GLDv3 will be replaced with the
> interfaces specified here. Anything that is not specified here should
> not be used moving forward ?

Yes.

>>> Qg.2) Are there any changes to the MAC driver interfaces or being
>>> obsoleted?
>>
>> The changes made to the driver API will be published as part of a
>> separate forthcoming document.
>
> How soon will this doc become available ? I see in the latest doc
> Kais published there are some new interfaces for ring support. I
> presume there will be a separate doc for the generic mac driver
> interfaces ?

That's the document. It describes the changes to the driver interface
that are made by Crossbow.

>>> Qg.3) There are no MAC client interfaces to specify bandwidth
>>> attributes. From the section 4.7, it seems like they are
>>> implemented as part of VNIC and not as MAC client interfaces.
>>> If this is the case, how can the bandwidth attributes be
>>> specified?
>>
>> They are not documented yet, but will be specified as arguments to
>> mac_client_open().
>
> Can we expect to see this in the next revision of this document ?

Yes, I need to formalize the interface and document it. The changes
will consist of new arguments to the mac_client_open() call specifying
the bandwidth limit and priority.

>>> Qg.4) When will the classification interface be fully documented
>>> for review?
>>
>> There will be separate documents for the MAC driver classification
>> interfaces, and for the MAC client flow APIs.
>
> When will this be available ?

This was already discussed in Q4.2 and Qg.2.

Thanks,
Nicolas.

--
Nicolas Droux - Solaris Core OS - Sun Microsystems, Inc.
droux at sun.com - http://blogs.sun.com/droux