Kais Belgaied
2008-Sep-08 22:05 UTC
[crossbow-discuss] Crossbow Code Review Phase II ready
(networking-discuss Bcc''ed) Good afternoon, The Crossbow team would like to invite you to participate in the project''s second phase of code review. The second phase of the review starts today, and will last for two weeks. It covers the following parts of the code: - MAC layer: . MAC client interface . MAC driver interfaces . Hybrid I/O support . resource management and allocation . packet scheduling . software classification . soft-rings, SRS, and polling . transmit flow control and fanout . promiscuous mode - Other GLDv3 changes: . vnic . DLS . DLD . softmac - IP changes - e1000g driver The webrev is available in http://www.opensolaris.org/os/project/crossbow/Code-review2 The link above includes the suggested format for comments and a list of known issues that will be addressed in the next phase. For a refresher about the project features detailed design, we recommend the updated documents, http://www.opensolaris.org/os/project/crossbow/Docs/ Interested? please send the code review comments by September 20th, 2008 Thanks, The Crossbow team.
Kais, When will crossbow be converting to mercurial? I just pulled up arpddi.c and straight off the bat I see #pragma ident (these are gone with hg if the code is "hg pbchk" clean) So the reason for asking is two fold: - you won''t be doing a putback from the gate that generated the webrev (obviously) and more importantly.. - it might be safer to have code reviewers look at a gate that is after the transition in case the change brings problems with it. Thoughts? Darren
Location: mac.c 774-876 Type: T Priority: 3 Comment: Underlying the calls to mac_perim_enter/exit is a new variation on using mutex''s with kcondvar_t''s to implement locking. Whilst similar to the macros in <sys/condvar_impl.h>, it differs in that it allows recursion on write locking if the owning thread is reacquiring the same lock. What I''d like the project to consider is that rather than embedding this new locking style into these functions, break it out so that this design can be reused later as it seems generally useful. Location: mac_flow.c 1777-1795 (example) Type: T Priority: 2 Comment: the organisation of the flow_desc_t structure should be revised so that instead of having two individual checks for IPv4 and IPv6, there is a single 32bit mask and equality check that covers the IP version and dsfield information. This should work equally well for Ipv4 (with space left over for protocol matching) and Ipv6, where you may also be able to do protocol matching, so long as flow id''s aren''t being used. So in an appropriately designed structure, rather than having: 1780 switch (l3info->l3_version) { 1781 case IPV4_VERSION: { 1782 ipha_t *ipha = (ipha_t *)l3info->l3_start; 1783 1784 return ((ipha->ipha_type_of_service & 1785 fd->fd_dsfield_mask) == fd->fd_dsfield); 1786 } 1787 case IPV6_VERSION: { 1788 ip6_t *ip6h = (ip6_t *)l3info->l3_start; 1789 1790 return ((IPV6_FLOW_TCLASS(ip6h->ip6_vcf) & 1791 fd->fd_dsfield_mask) == fd->fd_dsfield); 1792 } 1793 default: 1794 return (B_FALSE); 1795 } You would have: return ((((uint32_t *)ipha) & fd->fd_l3hdrmask) == fd->fd_l3hdrs); Location: mac_flow.c 2004-2005 Type: T Priority: 3 Comment: there should be no need for the ?: here, the assignment of something to fe_match can be done inside the above if(). Location: mac_flow.c 2089-2093 Type: T Priority: 3 Comment: If fd_ipversion isn''t 4, then what else can it be but 6? We''ve already ASSERT''d that we have an IP version (2058, 2060) that is valid.. Darren
On Tue, Sep 09, 2008 at 08:48:39PM -0700, Darren Reed wrote:> Location: mac.c 774-876 > Type: T > Priority: 3 > Comment: Underlying the calls to mac_perim_enter/exit is a new > variation on using mutex''s with kcondvar_t''s to implement locking. > Whilst similar to the macros in <sys/condvar_impl.h>, it differs in > that it allows recursion on write locking if the owning thread is > reacquiring the same lock. What I''d like the project to consider > is that rather than embedding this new locking style into these > functions, break it out so that this design can be reused later as > it seems generally useful. >I''ll defer to thiru on this one.> Location: mac_flow.c 1777-1795 (example) > Type: T > Priority: 2 > Comment: the organisation of the flow_desc_t structure should be > revised so that instead of having two individual checks for IPv4 > and IPv6, there is a single 32bit mask and equality check that > covers the IP version and dsfield information. This should work > equally well for Ipv4 (with space left over for protocol matching) > and Ipv6, where you may also be able to do protocol matching, > so long as flow id''s aren''t being used. So in an appropriately > designed structure, rather than having: > > 1780 switch (l3info->l3_version) { > 1781 case IPV4_VERSION: { > 1782 ipha_t *ipha = (ipha_t *)l3info->l3_start; > 1783 > 1784 return ((ipha->ipha_type_of_service & > 1785 fd->fd_dsfield_mask) == fd->fd_dsfield); > 1786 } > 1787 case IPV6_VERSION: { > 1788 ip6_t *ip6h = (ip6_t *)l3info->l3_start; > 1789 > 1790 return ((IPV6_FLOW_TCLASS(ip6h->ip6_vcf) & > 1791 fd->fd_dsfield_mask) == fd->fd_dsfield); > 1792 } > 1793 default: > 1794 return (B_FALSE); > 1795 } > > You would have: > return ((((uint32_t *)ipha) & fd->fd_l3hdrmask) == fd->fd_l3hdrs); >the current flow_desc_t design allows the user to specify a dsfield without also specifying ip version. whereas your design makes the ip version mandatory (the fd_l3hdrmask above is ip version dependent because the tos field in ipv4 and traffic class in ipv6 exist in different offsets) also, I think that saving two comparisons is not worth sacrificing readability. most fields in the flow_desc_t now have 1-1 correspondence with user-settable attributes. we deliberately chose to make this structure independent of header format details.> Location: mac_flow.c 2004-2005 > Type: T > Priority: 3 > Comment: there should be no need for the ?: here, the assignment of > something to fe_match can be done inside the above if(). >ACCEPT.> Location: mac_flow.c 2089-2093 > Type: T > Priority: 3 > Comment: If fd_ipversion isn''t 4, then what else can it be but 6? > We''ve already ASSERT''d that we have an IP version (2058, 2060) that > is valid.. >ACCEPT. will change v6 comparison into an assert. eric
Kais Belgaied
2008-Sep-10 15:16 UTC
[crossbow-discuss] Crossbow Code Review Phase II ready
Hi Darren, the current plan is to convert to Mercurial within a couple of weeks, before the next resync to the latest onnv snapshot. On 09/09/08 19:46, Darren Reed wrote:> Kais, > > When will crossbow be converting to mercurial? > > I just pulled up arpddi.c and straight off the bat I see #pragma ident > (these are gone with hg if the code is "hg pbchk" clean)yep, we''ll b need to run pbchk> > So the reason for asking is two fold: > - you won''t be doing a putback from the gate that generated the > webrev (obviously)that''s right> > and more importantly.. > > - it might be safer to have code reviewers look at a gate that is > after the transition in case the change brings problems with it. > > Thoughts?we''re asking for a review of the difference between onnv and crossbow gates. We realize that we''ve got two moving targets here, so the baseline code (onnv) had to be kept unchanged as much as possible for two reasons: 1. what you just mentioned. Minimize the potential confusion to our reviewers. 2. Prevent the insanely costly mergers that are bound to be error prone. That''s why a few months ago the Crossbow team internally requested that no major changes are done to the Solaris networking stack, DLD, DLS, MAC, softmac, aggr, etc. All the areas in Crossbow''s scope. The rest of the Solaris networking community inside Sun was quite understanding and kindly accepted to re-arrange their schedule to integrate their projects shortly after Crossbow, and in cases, started working off children of the crossbow-gate. Once we address the reviewers comments for this Phase, we''ll send a refreshed webrev against onnv. The content of the delta should be similar, barring the changes for the code review comments of course. When the changes are non trivial, we''ll probably be efficient to include the new location (file name + line numbers) per individual change. Kais> > Darren > > >
Location: mac_sched.c Type: T Priority: 1 Comment: To what extent is header stacking in IPv6 being supported by Crossbow? The only one I see supported is IPPROTO_HOPOPTS. Of immediate concern is that this leaves out IPPROTO_FRAGMENT, IPPROTO_ROUTING and IPPROTO_DSTOPTS. Is further handling of IPv6 part of the plan for further work on Crossbow? (The exclusion of IPPROTO_FRAGMENT would appear to be the most pressing one.) Location: mac_sched.c, mac_flow.c Type: T Priority: 3 Comment: In mac_sched.c we examine the header and say ''type=OTH'' for fragments. How is this knowledge transferred through to the flow classification in mac_flow.c?
On Wed, Sep 10, 2008 at 04:23:06PM -0700, Darren Reed wrote:> Location: mac_sched.c > Type: T > Priority: 1 > Comment: To what extent is header stacking in IPv6 being supported > by Crossbow? The only one I see supported is IPPROTO_HOPOPTS. > Of immediate concern is that this leaves out IPPROTO_FRAGMENT, > IPPROTO_ROUTING and IPPROTO_DSTOPTS. Is further handling > of IPv6 part of the plan for further work on Crossbow? (The exclusion > of IPPROTO_FRAGMENT would appear to be the most pressing one.) >this is being worked on.> Location: mac_sched.c, mac_flow.c > Type: T > Priority: 3 > Comment: In mac_sched.c we examine the header and say ''type=OTH'' > for fragments. How is this knowledge transferred through to the flow > classification in mac_flow.c? >actually, the mac_sched code is entered after flow classification. so any pre-parsed state should be transferred the other way around (mac_flow -> mac_sched). unfortunately this is not being done now due to the current code structure, we''ll try to see if this is feasible in the upcoming weeks. eric
On 09/10/08 17:08, Eric Cheng wrote:> On Wed, Sep 10, 2008 at 04:23:06PM -0700, Darren Reed wrote: > >> Location: mac_sched.c >> Type: T >> Priority: 1 >> Comment: To what extent is header stacking in IPv6 being supported >> by Crossbow? The only one I see supported is IPPROTO_HOPOPTS. >> Of immediate concern is that this leaves out IPPROTO_FRAGMENT, >> IPPROTO_ROUTING and IPPROTO_DSTOPTS. Is further handling >> of IPv6 part of the plan for further work on Crossbow? (The exclusion >> of IPPROTO_FRAGMENT would appear to be the most pressing one.) >> >> > > this is being worked on. > > >> Location: mac_sched.c, mac_flow.c >> Type: T >> Priority: 3 >> Comment: In mac_sched.c we examine the header and say ''type=OTH'' >> for fragments. How is this knowledge transferred through to the flow >> classification in mac_flow.c? >> >> > > actually, the mac_sched code is entered after flow classification. > so any pre-parsed state should be transferred the other way around > (mac_flow -> mac_sched). unfortunately this is not being > done now due to the current code structure, we''ll try to see if > this is feasible in the upcoming weeks. >In that case: Location: mac_flow.c Type: T Priority: 1 Comment: Throughout this code the assumption is made that the packet is not a fragment and that what follows the IP header is in fact a layer 4 protocol header. Darren -------------- next part -------------- An HTML attachment was scrubbed... URL: <http://mail.opensolaris.org/pipermail/crossbow-discuss/attachments/20080910/5b4bc6ff/attachment.html>
Darren Reed wrote:> > In that case: > > Location: mac_flow.c > Type: T > Priority: 1 > Comment: Throughout this code the assumption is made that the > packet is not a fragment and that what follows the IP header is in > fact a layer 4 protocol header. >I think we have quite a few PKT_TOO_SMALL() checks to ensure the transport header really exists before dereferencing it. do you not think that''s sufficient? the likelihood of having a fragment <= ip + mac header sizes is small so we chose to fail the classification for these cases (the packet would not be dropped and will reach ip. it''s just that we don''t want to be in business of reassembling fragments within the mac) eric
On 09/10/08 17:43, Eric Cheng wrote:> Darren Reed wrote: >> >> In that case: >> >> Location: mac_flow.c >> Type: T >> Priority: 1 >> Comment: Throughout this code the assumption is made that the >> packet is not a fragment and that what follows the IP header is in >> fact a layer 4 protocol header. >> > > I think we have quite a few PKT_TOO_SMALL() checks to ensure the > transport header really exists before dereferencing it. do you not > think that''s sufficient? the likelihood of having a fragment <= ip + > mac header sizes is small so we chose to fail the classification for > these cases (the packet would not be dropped and will reach ip. it''s > just that we don''t want to be in business of reassembling fragments > within the mac)PKT_TOO_SMALL() verifies that you have "n" bytes of data after IP. It doesn''t guarantee you that it is fragmented data or protocol header. Darren
Darren Reed wrote:> On 09/10/08 17:43, Eric Cheng wrote: >> Darren Reed wrote: >>> In that case: >>> >>> Location: mac_flow.c >>> Type: T >>> Priority: 1 >>> Comment: Throughout this code the assumption is made that the >>> packet is not a fragment and that what follows the IP header is in >>> fact a layer 4 protocol header. >>> >> I think we have quite a few PKT_TOO_SMALL() checks to ensure the >> transport header really exists before dereferencing it. do you not >> think that''s sufficient? the likelihood of having a fragment <= ip + >> mac header sizes is small so we chose to fail the classification for >> these cases (the packet would not be dropped and will reach ip. it''s >> just that we don''t want to be in business of reassembling fragments >> within the mac) > > PKT_TOO_SMALL() verifies that you have "n" bytes of data after IP. > It doesn''t guarantee you that it is fragmented data or protocol header. >ok, I see what you''re saying. we''ll look into this. eric
I will send more comments in the next few days, but firstly a couple of things that caught my eyes: Location: mac_sched.c, squeue.c Type: T Priority: 1 Comment: Could someone explain a bit how mac_client_srs_poll() works? As I understand it directly calls the driver poll entry point when there is a RX ring assigned, so the b_rptr in mblks point at the MAC header. But this function is called from squeue_get_pkts(), which later calls ip_accept_tcp() but the latter assumes the b_rptr is pointing at the IP header... but I didn''t see any b_rptr adjustment across these function calls, or am I missing anything? Also, is this a fast path that will only be taken when there''s no promisc listeners? As I didn''t find any call to functions which dispatch packets to promisc clients. Location: mac_sched.c Type: E Priority: 1 Comment: Is there any reason that we grab the mac_bw_lock to update mac_bw_sz but only for MAC_RX_SRS_ENQUEUE_CHAIN but not for MAC_TX_SRS_ENQUEUE_CHAIN ? Probably a typo? More later. Thanks, Zhijun Kais Belgaied wrote:> (networking-discuss Bcc''ed) > > Good afternoon, > > The Crossbow team would like to invite you to participate in the project''s > second phase of code review. > > The second phase of the review starts today, and will last for > two weeks. It covers the following > parts of the code: > > - MAC layer: > . MAC client interface > . MAC driver interfaces > . Hybrid I/O support > . resource management and allocation > . packet scheduling > . software classification > . soft-rings, SRS, and polling > . transmit flow control and fanout > . promiscuous mode > - Other GLDv3 changes: > . vnic > . DLS > . DLD > . softmac > - IP changes > - e1000g driver > > > The webrev is available in > http://www.opensolaris.org/os/project/crossbow/Code-review2 > > The link above includes the suggested format for comments and a list > of known issues that will be addressed in the next phase. > > For a refresher about the project features detailed design, we recommend > the > updated documents, > http://www.opensolaris.org/os/project/crossbow/Docs/ > > Interested? please send the code review comments by September 20th, 2008 > > Thanks, > > The Crossbow team. > _______________________________________________ > crossbow-discuss mailing list > crossbow-discuss at opensolaris.org > http://mail.opensolaris.org/mailman/listinfo/crossbow-discuss >
rajagopal kunhappan
2008-Sep-11 21:29 UTC
[crossbow-discuss] Crossbow Code Review Phase II ready
Zhijun Fu wrote:> I will send more comments in the next few days, but firstly a couple of things that > caught my eyes: > > Location: mac_sched.c, squeue.c > Type: T > Priority: 1 > Comment: Could someone explain a bit how mac_client_srs_poll() works? As I understand > it directly calls the driver poll entry point when there is a RX ring assigned, > so the b_rptr in mblks point at the MAC header. But this function is called from > squeue_get_pkts(), which later calls ip_accept_tcp() but the latter assumes the b_rptr > is pointing at the IP header... but I didn''t see any b_rptr adjustment across these > function calls, or am I missing anything?Agree with your comments about not moving the b_rptr to the ip header but this code is currently not in use and should have been "ifdef''d" out. Will do that. The idea behind the code was that TCP traffic could have its own hardware Rx ring and the SRS on top of it do the polling and deliver the packets to squeue and squeue could in turn ask the SRS to stop sending packets up, and do polling via SRS. Anyway the code is not in use. Sorry.> Also, is this a fast path that will only be taken when there''s no promisc listeners? > As I didn''t find any call to functions which dispatch packets to promisc clients. > > > Location: mac_sched.c > Type: E > Priority: 1 > Comment: Is there any reason that we grab the mac_bw_lock to update mac_bw_sz but > only for MAC_RX_SRS_ENQUEUE_CHAIN but not for MAC_TX_SRS_ENQUEUE_CHAIN ? Probably > a typo?This is one of the asymmetry between Rx and Tx side. On the Rx side, there could be multiple Rx rings associated with a link. When bw is configured for the link, you need to control the incoming packet rate from all the Rx rings. A single mac_bw_ctl_t structure is shared among all the soft_ring_set''s (SRS) associated with Rx rings. So you grab the mac_bw_lock and do the update. Note that there is a 1-1 mapping between an Rx ring and a SRS. On the Tx side, all outgoing packets will come to Tx SRS via mac_tx() call and we do the bw control under srs_lock. mac_bw_lock is not used. Thanks, -krgopi> > > More later. > > Thanks, > Zhijun > > > Kais Belgaied wrote: >> (networking-discuss Bcc''ed) >> >> Good afternoon, >> >> The Crossbow team would like to invite you to participate in the project''s >> second phase of code review. >> >> The second phase of the review starts today, and will last for >> two weeks. It covers the following >> parts of the code: >> >> - MAC layer: >> . MAC client interface >> . MAC driver interfaces >> . Hybrid I/O support >> . resource management and allocation >> . packet scheduling >> . software classification >> . soft-rings, SRS, and polling >> . transmit flow control and fanout >> . promiscuous mode >> - Other GLDv3 changes: >> . vnic >> . DLS >> . DLD >> . softmac >> - IP changes >> - e1000g driver >> >> >> The webrev is available in >> http://www.opensolaris.org/os/project/crossbow/Code-review2 >> >> The link above includes the suggested format for comments and a list >> of known issues that will be addressed in the next phase. >> >> For a refresher about the project features detailed design, we recommend >> the >> updated documents, >> http://www.opensolaris.org/os/project/crossbow/Docs/ >> >> Interested? please send the code review comments by September 20th, 2008 >> >> Thanks, >> >> The Crossbow team. >> _______________________________________________ >> crossbow-discuss mailing list >> crossbow-discuss at opensolaris.org >> http://mail.opensolaris.org/mailman/listinfo/crossbow-discuss >> > > _______________________________________________ > crossbow-discuss mailing list > crossbow-discuss at opensolaris.org > http://mail.opensolaris.org/mailman/listinfo/crossbow-discuss--
rajagopal kunhappan wrote:> Zhijun Fu wrote: >> I will send more comments in the next few days, but firstly a couple >> of things that >> caught my eyes: >> >> Location: mac_sched.c, squeue.c >> Type: T >> Priority: 1 >> Comment: Could someone explain a bit how mac_client_srs_poll() works? >> As I understand >> it directly calls the driver poll entry point when there is a RX ring >> assigned, >> so the b_rptr in mblks point at the MAC header. But this function is >> called from >> squeue_get_pkts(), which later calls ip_accept_tcp() but the latter >> assumes the b_rptr >> is pointing at the IP header... but I didn''t see any b_rptr >> adjustment across these >> function calls, or am I missing anything? > > Agree with your comments about not moving the b_rptr to the ip header > but this code is currently not in use and should have been "ifdef''d" > out. Will do that. The idea behind the code was that TCP traffic could > have its own hardware Rx ring and the SRS on top of it do the polling > and deliver the packets to squeue and squeue could in turn ask the SRS > to stop sending packets up, and do polling via SRS. Anyway the code is > not in use. Sorry.I see. Thanks.> >> Also, is this a fast path that will only be taken when there''s no >> promisc listeners? >> As I didn''t find any call to functions which dispatch packets to >> promisc clients. >> >> >> Location: mac_sched.c >> Type: E >> Priority: 1 >> Comment: Is there any reason that we grab the mac_bw_lock to update >> mac_bw_sz but >> only for MAC_RX_SRS_ENQUEUE_CHAIN but not for >> MAC_TX_SRS_ENQUEUE_CHAIN ? Probably >> a typo? > > This is one of the asymmetry between Rx and Tx side. On the Rx side, > there could be multiple Rx rings associated with a link. When bw is > configured for the link, you need to control the incoming packet rate > from all the Rx rings. A single mac_bw_ctl_t structure is shared among > all the soft_ring_set''s (SRS) associated with Rx rings. So you grab > the mac_bw_lock and do the update. Note that there is a 1-1 mapping > between an Rx ring and a SRS. > > On the Tx side, all outgoing packets will come to Tx SRS via mac_tx() > call and we do the bw control under srs_lock. mac_bw_lock is not used.Got it. Thanks, Zhijun> > Thanks, > -krgopi > >> >> >> More later. >> >> Thanks, >> Zhijun >> >> >> Kais Belgaied wrote: >>> (networking-discuss Bcc''ed) >>> >>> Good afternoon, >>> >>> The Crossbow team would like to invite you to participate in the >>> project''s >>> second phase of code review. >>> >>> The second phase of the review starts today, and will last for >>> two weeks. It covers the following >>> parts of the code: >>> >>> - MAC layer: >>> . MAC client interface >>> . MAC driver interfaces >>> . Hybrid I/O support >>> . resource management and allocation >>> . packet scheduling >>> . software classification >>> . soft-rings, SRS, and polling >>> . transmit flow control and fanout >>> . promiscuous mode >>> - Other GLDv3 changes: >>> . vnic >>> . DLS >>> . DLD >>> . softmac >>> - IP changes >>> - e1000g driver >>> >>> >>> The webrev is available in >>> http://www.opensolaris.org/os/project/crossbow/Code-review2 >>> >>> The link above includes the suggested format for comments and a list >>> of known issues that will be addressed in the next phase. >>> >>> For a refresher about the project features detailed design, we >>> recommend the >>> updated documents, >>> http://www.opensolaris.org/os/project/crossbow/Docs/ >>> >>> Interested? please send the code review comments by September 20th, >>> 2008 >>> >>> Thanks, >>> >>> The Crossbow team. >>> _______________________________________________ >>> crossbow-discuss mailing list >>> crossbow-discuss at opensolaris.org >>> http://mail.opensolaris.org/mailman/listinfo/crossbow-discuss >>> >> >> _______________________________________________ >> crossbow-discuss mailing list >> crossbow-discuss at opensolaris.org >> http://mail.opensolaris.org/mailman/listinfo/crossbow-discuss >
Peter Memishian
2008-Sep-12 00:51 UTC
[crossbow-discuss] Crossbow Code Review Phase II ready
> Agree with your comments about not moving the b_rptr to the ip header> but this code is currently not in use and should have been "ifdef''d" > out. Will do that. So this code will be made to work before Crossbow is integrated? If not, shouldn''t it just be removed? -- meem
Thirumalai Srinivasan
2008-Sep-15 21:12 UTC
[crossbow-discuss] Crossbow Code Review Phase II ready
Kais, I looked at some parts of the MAC code. Attached is the list of comments Thirumalai Kais Belgaied wrote:> (networking-discuss Bcc''ed) > > Good afternoon, > > The Crossbow team would like to invite you to participate in the project''s > second phase of code review. > > The second phase of the review starts today, and will last for > two weeks. It covers the following > parts of the code: > > - MAC layer: > . MAC client interface > . MAC driver interfaces > . Hybrid I/O support > . resource management and allocation > . packet scheduling > . software classification > . soft-rings, SRS, and polling > . transmit flow control and fanout > . promiscuous mode > - Other GLDv3 changes: > . vnic > . DLS > . DLD > . softmac > - IP changes > - e1000g driver > > > The webrev is available in > http://www.opensolaris.org/os/project/crossbow/Code-review2 > > The link above includes the suggested format for comments and a list > of known issues that will be addressed in the next phase. > > For a refresher about the project features detailed design, we recommend > the > updated documents, > http://www.opensolaris.org/os/project/crossbow/Docs/ > > Interested? please send the code review comments by September 20th, 2008 > > Thanks, > > The Crossbow team. > _______________________________________________ > crossbow-discuss mailing list > crossbow-discuss at opensolaris.org > http://mail.opensolaris.org/mailman/listinfo/crossbow-discuss >-------------- next part -------------- An embedded and charset-unspecified text was scrubbed... Name: cr_phase2 URL: <http://mail.opensolaris.org/pipermail/crossbow-discuss/attachments/20080915/e5bd2541/attachment.ksh>
Here are my comments for mac.c. the rest will come later. general issues: EC-G1 there are lots of places using "(mask & flag)" as a boolean. this might seem ok if used in a conditional but sometimes these actually gets saved in boolean variables. mdb would end up showing numeric values instead of B_TRUE/B_FALSE if this happens. it''s best to replace all with "((mask & flag) != 0)" EC-G2: naming of dtrace probes. I have noticed lots of probes with names like "mac__whatever__something" inside function "mac_whatever()". the "mac__whatever" prefix should be removed. if the intention is to make the probe name unique, that''s not necessary because the same probe name can be used in multiple functions. to enable a specific probe within a specific function at run time, you just have to do this: dtrace -n ''mac_whatever:something'' which is much simpler than ''mac_whatever:mac-whatever-something''. specific issues: mac.c: EC-1: mi_lock, mi_rx_lock, mi_tx_lock, mi_rx_cv are all unused. it might also be possible to remove mi_ring_lock and use mi_rw_lock instead. EC-2: in mac_fini(): rename mac_soft_ring_finish() to mac_soft_ring_fini() to be consistent with other *_fini calls in this function. EC-3: remove i_mac_ring_ctor/dtor and do the initialization/destruction in mac_ring_alloc/free instead. also, kmem_cache_alloc("mac_ring_cache",...) should pass NULL for ctor/dtor. EC-4: MAC_NOTE_PROMISC and mi_promisc does not seem to be used anymore. the callback str_notify() does nothing. EC-5: in mac_addr_factory_*(), why does the slot # have to be >= 1? why can''t the slot index (starts from 0) be returned to the caller? EC-6: mac_rx_group_unmark() should be moved closer to where it''s used. also, before resetting a flag, need to assert that it''s really set. EC-7: mac_tx_classify(), mac_rx_classify(): better return an int (the err code) instead of the flent. it''d be helpful to tell whether mac_flow_lookup() or FLOW_TRY_REFHOLD() failed. EC-8: mac_tx_send(): for the fastpath case, mi_tx() is capable of taking a chain so there is no need to send each mp separately. (a bug might already exist) on a related note, the driver entry point used by mac_ring_tx() doesn''t seem to take a chain. why is this? behavior of mri_send() should be consistent with mi_tx(). EC-9: mac_set_rx_group_state(): need to add checks to ensure it''s legal to transition from grp->mrg_state to state. EC-10: the macro MAC_RX_GROUP_ONLY_CLIENT() does not seem necessary. in mac_set_rx_group_state(), mac_rx_srs_clone(), the macro can be replaced with grp->mrg_clients->mgc_client. because the returned mcip is assumed to be non-NULL. in mac_rx_group_add_flow(), mac_rx_group_remove_flow(), the macro could be replaced with "grp->mrg_clients->mgc_nflows == 1". (mgc_nflows should be renamed mgc_nclients). in mac_reserve_rx_group(), the check: if (!MAX_RX_GROUP_NO_CLIENT(grp) && (MAC_RX_GROUP_ONLY_CLIENT(grp) != mcip)) { .... } should be replaced with: if (grp->mrg_clients->mgc_nclients == 1 && grp->mrg_clients->mgc_client != mcip) { .... } all references to MAX_RX_GROUP_NO_CLIENT() (badly misnamed) should be replaced with "grp->mrg_clients->mgc_nclients == 0". EC-11: the macros GROUP_INTR_* should be placed into functions just like mac_start_group(), mac_stop_group()...etc. EC-12: quite a few functions seem to be related to srs''s and should belong in mac_datapath_setup.c. e.g.: mac_rx_srs_clone(), mac_rx_srs_quiesce(), mac_rx_srs_remove(), mac_srs_clear_flag(), mac_rx_srs_restart(), mac_rx_classify_flow_rem(), mac_rx_classify_flow_quiesce(), mac_rx_classify_flow_restart(), mac_rx_client_quiesce(), mac_rx_client_restart(), mac_tx_srs_quiesce(), mac_tx_srs_restart(), mac_tx_flow_quiesce(), mac_tx_flow_restart(), mac_tx_client_flush(), mac_client_quiesce(), mac_client_restart(), i_mac_tx_srs_notify() EC-13: s/MAC_PROP_BIND_CPU/MAC_PROP_CPUS/ s/MAC_PROP_PRIO/MAC_PROP_PRIORITY/ in mac_{set,get}_prop(), the "switch (macprop->mp_id)" is not needed since we already know this is a mac prop. just call bcopy and mac_{set,get}_resources() directly. EC-14: mac_ring_*(), mac_group_*() routines do not have a consistent naming convention. mac_start_ring(), mac_start_group()..etc should be changed to mac_ring_start(), mac_group_start(),..etc. EC-15: in mac_init_rings(), the last group (i.e. groups[cap_rings->mr_gnum]) is used as a dummy group for storing unused rings. it''d be better to have an explicit unused-rings group (e.g. mi_rx_group_unused, mi_tx_group_unused). related to this is that mac_rx_reserve_group() doesn''t seem to be taking rings from the last group mi_rx_groups[mip->mi_rx_group_count], it takes rings from mi_rx_groups[0], which doesn''t have the leftover rings. EC-16: remove the RING_SEND_FUNC(), RING_DRIVER_HANDLE() and related macros. these macros will not be used directly by anyone. the mac_ring_*() functions already act as wrappers for clients and can dereference ring->mr_info directly. EC-17: in i_mac_group_add_ring(), cap_rings->mr_gaddring would only get called if mrg_driver is not NULL. but it doesn''t seem like mrg_driver can ever be NULL for a valid mac_group_t. the NULL check can be removed. (do the same for i_mac_group_rem_ring()) EC-18: mac_group_mov_ring(): s_group cannot be NULL so the check can be removed. EC-19: these functions are unused: mac_group_addring(), mac_group_remring(), mac_get_macaddr_group() EC-20: mac_free_macaddr(): this function already takes the map so there''s no need to call mac_find_macaddr(). just remove it directly from the mi_addresses list. EC-21: mac_add_macaddr(): what if group != NULL && group != mip->mi_rx_groups? you''ll end up freeing the map and returning 0, a silent error. it''d also be useful to add an output argument that returns the mac_address_t pointer. with this, the callers won''t have to make an extra call to mac_find_macaddr() to get the pointer. EC-22: mac_remove_macaddr(): the check, if (map->ma_group == NULL) return (0); will cause leakage of map. this, if (err != 0) return (err); will need to re-adjust map->ma_ref before returning. EC-23: why is mac_init_macaddr()/mac_fini_macaddr() necessary? why not just call mac_add_macaddr()/mac_remove_macaddr() directly? EC-24: mac_update_macaddr(): it''s better to do the duplicate detection in here rather than relying on the caller. this should remove a few mac_find_macaddr()''s from the callers. EC-25: mac_write_flow_desc(), this needs to use fd_mac_len rather than ETHERADDRL. EC-26: mac_log_flowinfo(): this code reminds of a potential misuse of subflows. the user could create a subflow with the same name as a vnic. supposing the subflow is created on another link and the vnic with same name as the subflow is not yet up (not mac_registered), when the vnic comes up, it won''t be able to create its kstats because someone took its name. we might need some naming scheme for subflow kstats to ensure no collision with link names. EC-27: mac_log_flowinfo(): I think this function assumes flent->fe_mcip is non-NULL because if this is not so, panic will happen at mac_write_flow_desc(). EC-28: mac_reserve_tx_ring(): is there a need for this code: /* find the client which owns that ring */ for (client = mip->mi_clients_list; client != NULL; client = client->mci_client_next) { srs = client->mci_tx_srs; if (mac_tx_srs_ring_present(srs, desired_ring)) { /* found our ring */ break; } } since we could get the srs from desired_ring->mr_srs and client from srs->srs_mcip. this will panic: /* give a new ring to the client... */ sring = mac_reserve_tx_ring(mip, NULL); ASSERT(sring != NULL); if there are no free rings available. this function needs to return an int error code because it could fail here and in mac_start_ring() and we need to be able to tell what happened. EC-29: i_mac_group_allocate_rings(): need to add checks to ensure that the ''nrings'' returned by ms_squery is non-zero. just the assert is not enough. the ''rings'' array could be leaked in the error cases below. (ran out of rings, failed to move rings) if mac_group_mov_ring() fails for the tx case, 0 is returned instead of rv. also, tmp_ring needs to be released before returning. EC-30: mac_reserve_rx_group(): the code in the loop: for (i = start; i < start + loopcount; i++) { } should be moved to a function. mixing breaks and continues makes the code somewhat confusing. doing this will also allow proper error codes to be returned (provided that mac_reserve_rx_group() is changed to return int also) EC-31: mac_rx_group_add_flow(): if mac_start_group() fails, the mac_add_macaddr() above needs to be undone. EC-32: mac_rx_group_remove_flow(): this function should be changed to return an int. mac_set_rx_group_state() is returning void now but that needs to be fixed too because it can fail in many ways. EC-33: mac_reserve_tx_group(): this function assumes that mac_start_group() never fails and tx groups never run out. both of the error cases need to be handled. similar to the above functions, this should be changed to return in int. the mac_group_t* could be returned as an output arg. this is good for diagnostic purposes since we have multiple points of failure.
Kais Belgaied
2008-Sep-16 23:33 UTC
[crossbow-discuss] Crossbow Code Review Phase II ready
On 09/15/08 14:12, Thirumalai Srinivasan wrote:> > Kais, > > I looked at some parts of the MAC code. Attached is the list of commentsThanks Thiru for the review, The crossbow team will be addressing these comments and replying in the next few days Kais> > Thirumalai > > Kais Belgaied wrote:
Kais Belgaied
2008-Sep-16 23:34 UTC
[crossbow-discuss] Crossbow Code Review Phase II ready
On 09/16/08 05:24, Eric Cheng wrote:> Here are my comments for mac.c. > the rest will come later. >Thanks Eric for the review, The crossbow team will be addressing these comments and replying in the next few days
Kais, I looked at some of the mac changes, comments attached. Thanks, Zhijun Kais Belgaied wrote:> (networking-discuss Bcc''ed) > > Good afternoon, > > The Crossbow team would like to invite you to participate in the project''s > second phase of code review. > > The second phase of the review starts today, and will last for > two weeks. It covers the following > parts of the code: > > - MAC layer: > . MAC client interface > . MAC driver interfaces > . Hybrid I/O support > . resource management and allocation > . packet scheduling > . software classification > . soft-rings, SRS, and polling > . transmit flow control and fanout > . promiscuous mode > - Other GLDv3 changes: > . vnic > . DLS > . DLD > . softmac > - IP changes > - e1000g driver > > > The webrev is available in > http://www.opensolaris.org/os/project/crossbow/Code-review2 > > The link above includes the suggested format for comments and a list > of known issues that will be addressed in the next phase. > > For a refresher about the project features detailed design, we recommend > the > updated documents, > http://www.opensolaris.org/os/project/crossbow/Docs/ > > Interested? please send the code review comments by September 20th, 2008 > > Thanks, > > The Crossbow team. > _______________________________________________ > crossbow-discuss mailing list > crossbow-discuss at opensolaris.org > http://mail.opensolaris.org/mailman/listinfo/crossbow-discuss >-- #mdb -K [0]> eri.prc.sun.com::walk staff s|::print staff_t s_name| ::grep .== zhijun |::eval <s=K|::print staff_t Zhijun.Fu at Sun.COM, x84349 Network Virtualization & Performance Team, Solaris Core Operating Systems Since Jul 10,2006 [0]> :c -------------- next part -------------- An embedded and charset-unspecified text was scrubbed... Name: Crossbow-code-review.txt URL: <http://mail.opensolaris.org/pipermail/crossbow-discuss/attachments/20080918/b7fbf07f/attachment.txt>
Kais Belgaied
2008-Sep-18 14:15 UTC
[crossbow-discuss] Crossbow Code Review Phase II ready
Thanks Zhijun for the review, we''ll be going over all the comments and start addressing and replying over the next few weeks. Kais. On 09/18/08 04:08, Zhijun Fu wrote:> Kais, > > I looked at some of the mac changes, comments attached. > > Thanks, > Zhijun
Erik Nordmark
2008-Sep-22 21:45 UTC
[crossbow-discuss] Crossbow Code Review Phase II ready
* Reviewer Name: Erik Nordmark * Time Spent * General Comment: I reviewed uts/common/inet/ files, but I didn''t pay close attention to the capability changes. There are several files which only have sccsid changes. Why not remove them before sending out the code review? --- * Comment Location: ip_impl.h et al * Type: T * Priority 1 * Comment Text You have defined ILL_SEND_TX (line 508) as checking for mp != NULL. Is is extremely bad practice both from an understanding and maintance perspective. When the code takes advantage of this as in tcp.c: 19711 FW_HOOKS(ipst->ips_ip4_physical_out_event, 19712 ipst->ips_ipv4firewall_physical_out, 19713 NULL, ill, ipha, mp, mp, 0, ipst); 19714 DTRACE_PROBE1(ip4__physical__out__end, mblk_t *, mp); 19715 DTRACE_IP_FASTPATH(mp, ipha, ill, ipha, NULL); 19716 19717 ILL_SEND_TX(ill, ire, connp, mp, 0); there is nothing that indicates that mp could be NULL after FW_HOOKS; unlike the current onnv-gate where this is made explicit. Thus it wouldn''t be unreasonable to try to insert some references to mp after FW_HOOKS and ILL_SEND_TX; it is easier to understand with an explicit check for mp != NULL. (And I suspect that DTRACE_IP_FASTPATH would actually need to check for NULL already.) It is bad for performance since all code between FW_HOOKS and ILL_SEND_TX have to check for NULL. And if ILL_SEND_TX is used in some other context there is an unneeded check for mp != NULL. * Comment Location: ip6.c around line 12523 * Type: T * Priority 1 * Comment Text You are changing the code so it drops UDP/RAWIP packets instead of doing a putq. This means that UDP applications will see a behavior change. I don''t think that is a good thing. Given that you preserve the putq elsewhere in the code, why have you removed it in ip_xmit_v6() only? * Comment Location: udp.c around line 6364 * Type: E * Priority 1 * Comment Text I don''t see any explanation of how ip_wsrv() gets called once ill_flow_blocked is set (and there wasn''t a canputnext() to the driver). Who calls ill_flow_enable()? Explain in comments. * Comment Location: udp_impl.h * Type: ET * Priority 1 * Comment Text ILL_FLOW_BLOCKED() sure seems like it should be a predicate (test the state of the ill) but in fact it changes ill_flow_blocked. That is likely to lead to maintenance problems. Please factor out the changing of ill_flow_blocked so that it is not part of the macro (if that is done then you might no longer need a macro). Also, given that ILL_FLOW_BLOCKED is used by ip*.c, why is it defined in a UDP specific header file (udp_impl.h)? * Comment Location: ip.c lock ordering * Type: T * Priority 2 * Comment Text ip.c makes me loose sleep by involing cpu_lock in the TCP/IP lock hiearchy: 374 * squeue(sq_lock), flow related (ft_lock, fe_lock) locking 375 * 376 * cpu_lock --> ill_lock --> sqset_lock --> sq_lock 377 * sq_lock -> conn_lock -> QLOCK(q) 378 * ill_lock -> ft_lock -> fe_lock Can we avoid defining lock hierarchies across subsystems? Why do we need to hold cpu_lock when acquiring ill_lock and friends? Given that ip_squeue_getfree() holds sqset_lock and then calls thread_lock() I think there is a deadlock waiting to happen due to the lock order. Does the code in ip_squeue require that the new CPU be known atomically with the new CPU coming into existence? If it does not then the locking interaction can be greatly simplified by having the callback from the cpu code just record that a new CPU has arrived and schedule a taskq to setup the squeues etc. * Comment Location: ip.c around L 5966 * Type: T * Priority 1 * Comment Text Missing cleaup in ip_stack_fini for the ipst->ips_capab_taskq_thread which is created in ip_stack_init. (Alternatively, if it is sufficient to have a single taskq thread for all IP instances you can move it to ip_ddi_init with cleanup in ip_ddi_destroy). * Comment Location: ip.c L 14873 * Type: E * Priority 2 * Comment Text You change the last argument from NULL to 0 in 14783 ip_input(ill, NULL, mp, 0); Since the last argument is a pointer, NULL is the correct value. * Comment Location: ip.c * Type: E * Priority 2 * Comment Text From where is ip_accept_tcp() called? Please add comment explaining this. * Comment Location: usr/src/uts/intel/pfil/pfil.objt-symbols.obj64 * Type: T * Priority 2 * Comment Text Why do you touch that file? pfil no longer exists. * Comment Location: ip_squeue.c L47 * Type: E * Priority 1 * Comment Text Please clarify whether squeues are destroyed or not. The comments are conflicting at: 47 * When a CPU goes offline its squeue set is destroyed, and all its squeues and at 54 * them without holding the squeue lock because of the guarantee that squeues 55 * are never destroyed. ip_squeue locks must be held, however. * Comment Location: ip_squeue.c L265-271 * Type: T * Priority 1 * Comment Text This loop does nothing. Remove it? Or should it be break instead of continue? Currently sq is always NULL at the end of the loop, which means that part of the code after line 271 never executes. 265 for (sq = sqs->sqs_head; sq != NULL; sq = sq->sq_next) { 266 /* 267 * Select a non-default squeue 268 */ 269 if (sq->sq_state & SQS_DEFAULT) 270 continue; 271 } * Comment Location: ip_squeue.c L344 * Type: T * Priority 1 * Comment Text When sqset_global_size = 1 this does the wrong thing: 344 sqs = sqset_global_list[(index % (sqset_global_size-1)) + 1]; Shouldn''t it just be without the -1 and +1? * Comment Location: squeue.c L 239 * Type: T * Priority 3 * Comment Text It sure would be nice if SQUEUE_MSG_SIZE wasn''t needed. How hard would it be to get tcp_timercache to only use normal mblks? * Comment Location: ip_squeue.c L272 * Type: E * Priority 1 * Comment Text You define ENQUEUE_CHAIN_AT_HEAD but nothing uses this macro. Remove? * Comment Location: squeue.c Line 586 * Type: T * Priority 2 * Comment Text We have 586 interrupt = (servicing_interrupt() || tag == SQTAG_IP_INPUT_RX_RING); which is odd to me. The SQTAG has so far only been a debugging aid. It is very odd to now use it for something other than debugging. This oddity also shows up in ip.c L14911. ----
Peter Memishian
2008-Sep-22 22:16 UTC
[crossbow-discuss] Crossbow Code Review Phase II ready
> ip.c makes me loose sleep by involing cpu_lock in the TCP/IP lock hiearchy:As well it should: there are several open bugs that the current (pre-Crossbow) code has deadlocks due to violations here. -- meem
Kais Belgaied
2008-Sep-24 02:16 UTC
[crossbow-discuss] Crossbow Code Review Phase II ready
Thanks Erik for the review the team is addressing all the comments and we''ll start to send the answers soon. On 09/22/08 14:45, Erik Nordmark wrote:> > * Reviewer Name: Erik Nordmark > * Time Spent > * General Comment: > I reviewed uts/common/inet/ files, but I didn''t pay close attention > to the capability changes. > > There are several files which only have sccsid changes. Why not remove > them before sending out the code review?some were missed, We''ll look carefully for those in the Phase II and in the updated webrev. Kais.> > --- >
sowmini.varadhan at sun.com
2008-Sep-26 00:04 UTC
[crossbow-discuss] Crossbow Code Review Phase II ready
On (09/08/08 15:05), Kais Belgaied wrote:> > The webrev is available in > http://www.opensolaris.org/os/project/crossbow/Code-review2:> Interested? please send the code review comments by September 20th, 2008Sorry this is a little late, and I only looked at specific parts of this quickly, but here''s my $0.02. ------------------------------------------------------------------------ General: -------- - lot of zero-delta files; would be helpful to filter these out of the webrev to get a focussed review of such a complex project. - dld_drv.c, libdladm: needs to merge with recent nevada builds - it seems like we have a whole family of functions in ip with the _simple suffix that duplicate the code of their un-simple counterparts. Does this produce sufficient performance improvement to justify the mainenance problem produced by duplicated code? Specifics: ---------- - mac_set_prop: 2879 if (mac_is_macprop(macprop)) { : 2898 } else if (mip->mi_callbacks->mc_callbacks & MC_SETPROP) { : 2901 } 2902 2903 return (err); 2904 } Could end up return ing uninitialized value if property is !mac_is_macprop() and the driver does not support MC_SETPROP. Suggest 2873 int err = ENOTSUP; : 2879 if (mac_is_macprop(macprop)) { : 2898 } + if (mip->mi_callbacks->mc_callbacks & MC_SETPROP) { : 2901 } 2902 2903 return (err); - mac_get_prop: similar issue around uninitialized err. - mac.c: if property is a mac_prop we would return so Line 2942 is redundant, and the else creates needless tabbing of subsequent lines - mac.c: need to check valsize before doing bcopy in line 2932, 2937 - mac_get_prop: lines 2931-2933 are identcial to lines 2936-2938. Merge them into one case? 2929 case MAC_PROP_MAXBW: 2930 case MAC_PROP_PRIO: 2931 mac_get_resources(mh, &mrp); 2932 bcopy(&mrp, val, sizeof (mac_resource_props_t)); 2933 return (0); 2934 case MAC_PROP_BIND_CPU: 2935 case MAC_PROP_FANOUT: 2936 mac_get_resources(mh, &mrp); 2937 bcopy(&mrp, val, sizeof (mac_resource_props_t)); 2938 return (0); - mac.c: bzero at line 2926 of mac_getprop can be moved to line 2931. - mac.c: similarly lines 2887-2888 are identical to lines 2891-2892; line 2882 can be moved to line 2887. - if mi_resource_props is "SL", then should mac_get_resources invoke i_mac_perim_enter? - linkprop.c: do_check_priority should set up vd_val with the mac_resource_props_t (which is currently done in do_set_res). do_set_res can be replaced by a call to dld_set_public_prop(). - linkprop.c: why isn''t DLDIOC_MODIFYFLOW reusing DLDIOC_SETMACPROP? - linkprop.c: how do MAC_PROP_BIND_CPU and MAC_PROP_FANOUT get set from dladm? I see them in dld_public_prop_t, but not in the prop_table[] and I don''t see how they would get set. --Sowmini
Erik Nordmark
2008-Sep-26 06:03 UTC
[crossbow-discuss] Crossbow Code Review Phase II ready
sowmini.varadhan at sun.com wrote:> - it seems like we have a whole family of functions in ip with > the _simple suffix that duplicate the code of their un-simple > counterparts. Does this produce sufficient performance improvement > to justify the mainenance problem produced by duplicated code?I hope the duplication will be short-lived. The reason is that refactor-gate already includes a more condensed implementation of the same ideas (this shows up as ire_route_recursive_dstonly in refactor-gate) Erik
Peter Memishian
2008-Sep-26 07:14 UTC
[crossbow-discuss] Crossbow Code Review Phase II ready
> > - it seems like we have a whole family of functions in ip with> > the _simple suffix that duplicate the code of their un-simple > > counterparts. Does this produce sufficient performance improvement > > to justify the mainenance problem produced by duplicated code? > > I hope the duplication will be short-lived. Likewile -- this looks like a maintenance hazard. As an aside, will systems using IPMP be forced into the non-fastpath, or do I need to extend this hack further as part of Clearview IPMP? :-/ -- meem
Kais Belgaied
2008-Sep-26 17:54 UTC
[crossbow-discuss] Crossbow Code Review Phase II ready
On 09/25/08 17:04, sowmini.varadhan at sun.com wrote:> On (09/08/08 15:05), Kais Belgaied wrote: > >> The webrev is available in >> http://www.opensolaris.org/os/project/crossbow/Code-review2 >> > : > >> Interested? please send the code review comments by September 20th, 2008 >> > > Sorry this is a little late, and I only looked at specific parts of > this quickly, but here''s my $0.02. > > >Thanks Sowmini for the review, the team is compiling the whole list of comments and shall be addressing them in the next coming days. Kais. -------------- next part -------------- An HTML attachment was scrubbed... URL: <http://mail.opensolaris.org/pipermail/crossbow-discuss/attachments/20080926/00f3fa65/attachment.html>
Kais Belgaied
2008-Sep-27 00:49 UTC
[crossbow-discuss] Crossbow Code Review Phase II ready
Thanks Sasha for the review, the team will be addressing and responding in the next few days. Kais. On 09/26/08 17:27, Alexander Kolbasov wrote:> General comment for the squeue code. > > These are for include files only. I have more comments for squeue.c file > itself. >
Peter Memishian
2008-Sep-28 05:58 UTC
[crossbow-discuss] Crossbow Code Review Phase II ready
Unfortunately, I only had time to look over the IP changes. uts/common/inet/ip/ip_squeue.c: * 186: ip_squeue_set_create() does KM_SLEEP allocations, but all of its callers have cpu_lock held. * 259: First argument to ip_squeue_getfree() is unused. * 289: Do we really need to grab sq_lock here considering the squeue_t has not been made available to a ring? If so, why do we not need to hold sq_lock when adjusting sq_priority earlier in the function? * 290: Unclear what SQS_ILL_BOUND is needed for. Is this strictly a debugging aid? * 336: Is this ASSERT() supposed to be > 0 rather than > 1? Should probably also be checked under sqset_lock. * 490: Unclear why ip_squeue_restart_ring() needs to check ILL_CONDEMNED. * 507: Seems like this cv is misnamed, as we use it both to wait for quiesce and restart. Also, if there''s any risk that multiple threads could be waiting on different conditions on the cv, we need to use cv_broadcast(). * 538: Spurious ARGSUSED. * 540: No clue why this is called ip_squeue_extend(). Seems to be ip_squeue_complete_bind() or somesuch. * 562: Unclear how we can get here with cpuid == -1. Seems like this should be an ASSERT(). * 594: Comment appears to be spurious as there''s no ill_ring_state. * 631: Would be cheaper/cleaner to make this test before doing the kmem_zalloc(). * 640: Some comments explaining why we do the taskq_dispatch() here would be helpful. uts/common/inet/ip.h: * 2009: ill_notify_mh seems too generic; perhaps ill_flownotify_mh? * Synchronization rules for ill_tqlist, ill_notify_mh an ill_capab_pending_cnt need to be documented in the table. * 3432: Name this enum and use it when defining rr_ring_state. * 3449: What is ip_dls_unbind_t? * 3458-3610: This appears a bit stale, as none of ill_mac_accept() dls_accept_pkts() or dls_sendup() exist. * 3465: Unclear what this comment has to do with the provided. code, which doesn''t appear to implement this. * 3469: What is ip_mac_accept_t? * 3474: What is rr_handle -- rr_rx_handle or rr_intr_handle? * 3490: What is rr_cpu_id for? * 3529-3539: Revised IP squeue implementation have left a few of these adrift, like ip_squeue_free() and ip_squeue_tcp(). uts/common/inet/ip_impl.h: * 537: What is ill_direct_tx()? uts/common/inet/ip_if.h: * 148: ILL_MAC_PERIM_HELD() seems a more appropriate name since it only provides the guarantee for the specified ill. uts/common/inet/ip/ip_if.c: * 530: Comment references non-existent ip_squeue_soft_ring variable, also implying that the Niagara/Ontario references are incorrect or stale. * 829: Why isn''t ILL_CAPAB_DLD_LSO in this list? * 1744: Given that we''re writer on the IPSQ, how could ILL_CONDEMNED have been set? * 2067: Indeed, it seems bad if ill_capability_dld_reset() silently fails not to send a reset message. Maybe the reset message(s) need to be preallocated as part of doing initial capability negotiation? As an aside, the error messages here appear to refer to MDT. * 2641, 2664: I''m a bit confused about the terminology here; it seems like ill_ring_bind() and ill_ring_add() belong to the set of ip_squeue_* routines off in ip_squeue.c -- that is, they seem to be ip_squeue_add_ring() and ip_squeue_bind_ring(). It also seems like ip_squeue_clean_ring() should be ip_squeue_unbind_ring(). Looks like all of this may also be responsible for having to #include <sys/squeue_impl.h> in ip_if.c. * 2640: Unclear what it means for ill_ring_bind() to fail, as neither the direct call from ill_ring_add() nor the indirect call through mci_resource_bind() seem to check the return value. (As an aside, many return values from pointer-to-function calls seem to be ignored, but lint isn''t smart enough to notice.) * 2641: Why does ill_ring_bind() take both an ill_t *ill and an ill_rx_ring_t *rx_ring? Is there a case where rx_ring->rr_ill != ill? (The same is true of most of the ip_squeue_*_ring() functions.) * 2644: I''m not convinced this ASSERT() is needed; ill_ring_bind() itself really doesn''t care about cpu_lock, and ip_squeue_cpu_move() already has this ASSERT(). * 2646, 2680: Why do we check ILL_CONDEMNED here? Is it needed for correctness? * 2655: How does one get here with a NULL rr_sqp? It appears the only time it becomes NULL is when we''ve gone through ip_squeue_clean_ring() -> squeue_worker_thr_control(), but ip_squeue_clean_ring() sets rr_ring_state to RR_FREE_INPROG, which we''ve already checked for. * 2676: Why do we "gracefully" fail if mr_type is not MAC_RX_FIFO? I''d expect this to be an ASSERT(). * 2723: This seems dangerous, as ip_squeue_getfree() may block and we''re holding ill_lock. Seem like ip_squeue_getfree() needs to be changed to be non-blocking, and we need to handle failure. * 2734: If someone wants ill_name, they can get it from the ill_t pointer we pass; no need to clutter up the DTrace probe points with needless arguments. * 3122, 3126, 3146, 3155: Direct use of DLD_VERSION_1 seems strange. I would''ve expected DLD_CURRENT_VERSION or somesuch. * 3141: I''m surprised by this check; since ill_dld_capab is only NULLed out in ill_delete_tail(), it seems even if we do e.g. ill_capability_dld_reset(), ill_dld_capab will not be reinitialized. * 3179, 3188: Should VERIFY() that these succeeded. * 3192: A boolean_t would be more appropriate for ill_mac_perim_held(). * 3230, 3312: Unclear why these use CE_CONT and hardcode "warning" rather than using CE_WARN. * 3299: Don''t we need to disable DLD_CAPAB_LSO in this error case? * 3365-3373: Holding ill_lock here seems dangerous since idd_tx_cb_df() can block via mac_client_tx_notify() -> mac_client_tx_notify_remove() -> mac_callback_remove_wait(). Further, unclear why ill_lock is needed at all here since neither ill_notify_mp nor ill_capabilities seems to require it, and it doesn''t appear to provide any synchronization with ill_flow_enable(). * 3481: Comment here is stale; this is not called from ip_rput_dlpi_writer(). * 3503: Nit: no need for ?: operator here, as == does the job. * 3566: Is capability negotiation performance-critical enough to justify the two ways into ill_capability_ack_thr? Why not just always cv_signal() the ill_taskq_dispatch thread? * 3369: Should we VERIFY() that this succeeded? * 3382-3384: Unclear why unbind_conn variable is needed. More broadly, I''m unclear what idp_unbind_conn is for. Nothing seems to use it -- we just create it and destroy it. * 18220: This comment is wrong: we still receive high-priority DLPI responses even after ILL_CONDEMNED is set -- and FWIW all standard DLPI ACKs are high-priority per the DLPI spec. As such, it seems like one issue is that we didn''t define DL_CAPABILITY_ACK as being high-priority. * 3434, 3435, 18225: ill_capability_dld_disable_thr doesn''t exist. * 18347-18373: The new ill_capab_send()/done() logic appears unnecessary. In particular, after the changes for 6651864 introduced ipsq_current_done, which should provide the mechanism you need to ensure another ISPQ operation won''t sneak in. If not, please justify. Also, please be sure to revisit the comment on lines 3415-3419. uts/common/inet/ip/ip.c: * 5666: Where are ips_capab_taskq_cv, ips_capab_taskq_lock, and ips_capab_taskq_list cv/mutex/list_destroy()''d? * 6945, 6948: What does this change have to do with anything? * 14032: This is a big-time hack. If there are checks that can be optimized out, we need to indicate what they are semantically (e.g., a unicast_only argument or somesuch) so that it can be maintained. Also, why replicate lines 14079-14080 rather than just hoisting `skip'' to 14078? * 14200: Nit: name here is awkward; maybe just DEV_Q_FLOW_BLOCKED to match ILL_FLOW_BLOCKED? * 14783: Unclear why this was changed; it was right before. * 15427-15634: As others have mentioned, why is this needed? This will need to be evolved/maintained as a duplicate codepath, only to be shot by IP datapath refactoring -- and e.g. the check on line 15582 puts load-spread IPMP traffic on the slow path, and will put all IPMP traffic on the slow path after Clearview IPMP because the IRE_LOCAL''s ire_rfq is never ill_rq (though it appears straightforward to make it work with Clearview IPMP, it''s yet another datapath to test). If you do decide to keep this function, please at least consider adding common "next" label the bottom of the loop that has the ADD_TO_CHAIN() logic inlined (and the macro removed) along with the ire_refrele()/ire=NULL dance. uts/common/inet/ip/ip6.c: * 114: ip_squeue_flag is already in <sys/squeue_impl.h>; remove. (That said, it seems like it should be in <inet/ip.h>) pkgdefs/SUNWhea/prototype_com: * Why are we shipping any GLDv3 headers at this point? tools/scripts/bfu.sh: * 222-223: Nit: this list is supposed to be alphabetical. -- meem
Kais Belgaied
2008-Sep-29 15:27 UTC
[crossbow-discuss] Crossbow Code Review Phase II ready
On 09/27/08 22:58, Peter Memishian wrote:> Unfortunately, I only had time to look over the IP changes. > >Thanks Meem for the review. The team will be addressing all comments in a few days. Also, stay tuned for Phase III. Kais,