Kais Belgaied
2008-Sep-08 22:05 UTC
[crossbow-discuss] Crossbow Code Review Phase II ready
(networking-discuss Bcc''ed) Good afternoon, The Crossbow team would like to invite you to participate in the project''s second phase of code review. The second phase of the review starts today, and will last for two weeks. It covers the following parts of the code: - MAC layer: . MAC client interface . MAC driver interfaces . Hybrid I/O support . resource management and allocation . packet scheduling . software classification . soft-rings, SRS, and polling . transmit flow control and fanout . promiscuous mode - Other GLDv3 changes: . vnic . DLS . DLD . softmac - IP changes - e1000g driver The webrev is available in http://www.opensolaris.org/os/project/crossbow/Code-review2 The link above includes the suggested format for comments and a list of known issues that will be addressed in the next phase. For a refresher about the project features detailed design, we recommend the updated documents, http://www.opensolaris.org/os/project/crossbow/Docs/ Interested? please send the code review comments by September 20th, 2008 Thanks, The Crossbow team.
Kais, When will crossbow be converting to mercurial? I just pulled up arpddi.c and straight off the bat I see #pragma ident (these are gone with hg if the code is "hg pbchk" clean) So the reason for asking is two fold: - you won''t be doing a putback from the gate that generated the webrev (obviously) and more importantly.. - it might be safer to have code reviewers look at a gate that is after the transition in case the change brings problems with it. Thoughts? Darren
Location: mac.c 774-876
Type: T
Priority: 3
Comment: Underlying the calls to mac_perim_enter/exit is a new
variation on using mutex''s with kcondvar_t''s to implement
locking.
Whilst similar to the macros in <sys/condvar_impl.h>, it differs in
that it allows recursion on write locking if the owning thread is
reacquiring the same lock. What I''d like the project to consider
is that rather than embedding this new locking style into these
functions, break it out so that this design can be reused later as
it seems generally useful.
Location: mac_flow.c 1777-1795 (example)
Type: T
Priority: 2
Comment: the organisation of the flow_desc_t structure should be
revised so that instead of having two individual checks for IPv4
and IPv6, there is a single 32bit mask and equality check that
covers the IP version and dsfield information. This should work
equally well for Ipv4 (with space left over for protocol matching)
and Ipv6, where you may also be able to do protocol matching,
so long as flow id''s aren''t being used. So in an appropriately
designed structure, rather than having:
1780 switch (l3info->l3_version) {
1781 case IPV4_VERSION: {
1782 ipha_t *ipha = (ipha_t *)l3info->l3_start;
1783
1784 return ((ipha->ipha_type_of_service &
1785 fd->fd_dsfield_mask) == fd->fd_dsfield);
1786 }
1787 case IPV6_VERSION: {
1788 ip6_t *ip6h = (ip6_t *)l3info->l3_start;
1789
1790 return ((IPV6_FLOW_TCLASS(ip6h->ip6_vcf) &
1791 fd->fd_dsfield_mask) == fd->fd_dsfield);
1792 }
1793 default:
1794 return (B_FALSE);
1795 }
You would have:
return ((((uint32_t *)ipha) & fd->fd_l3hdrmask) == fd->fd_l3hdrs);
Location: mac_flow.c 2004-2005
Type: T
Priority: 3
Comment: there should be no need for the ?: here, the assignment of
something to fe_match can be done inside the above if().
Location: mac_flow.c 2089-2093
Type: T
Priority: 3
Comment: If fd_ipversion isn''t 4, then what else can it be but 6?
We''ve already ASSERT''d that we have an IP version (2058, 2060)
that
is valid..
Darren
On Tue, Sep 09, 2008 at 08:48:39PM -0700, Darren Reed wrote:> Location: mac.c 774-876 > Type: T > Priority: 3 > Comment: Underlying the calls to mac_perim_enter/exit is a new > variation on using mutex''s with kcondvar_t''s to implement locking. > Whilst similar to the macros in <sys/condvar_impl.h>, it differs in > that it allows recursion on write locking if the owning thread is > reacquiring the same lock. What I''d like the project to consider > is that rather than embedding this new locking style into these > functions, break it out so that this design can be reused later as > it seems generally useful. >I''ll defer to thiru on this one.> Location: mac_flow.c 1777-1795 (example) > Type: T > Priority: 2 > Comment: the organisation of the flow_desc_t structure should be > revised so that instead of having two individual checks for IPv4 > and IPv6, there is a single 32bit mask and equality check that > covers the IP version and dsfield information. This should work > equally well for Ipv4 (with space left over for protocol matching) > and Ipv6, where you may also be able to do protocol matching, > so long as flow id''s aren''t being used. So in an appropriately > designed structure, rather than having: > > 1780 switch (l3info->l3_version) { > 1781 case IPV4_VERSION: { > 1782 ipha_t *ipha = (ipha_t *)l3info->l3_start; > 1783 > 1784 return ((ipha->ipha_type_of_service & > 1785 fd->fd_dsfield_mask) == fd->fd_dsfield); > 1786 } > 1787 case IPV6_VERSION: { > 1788 ip6_t *ip6h = (ip6_t *)l3info->l3_start; > 1789 > 1790 return ((IPV6_FLOW_TCLASS(ip6h->ip6_vcf) & > 1791 fd->fd_dsfield_mask) == fd->fd_dsfield); > 1792 } > 1793 default: > 1794 return (B_FALSE); > 1795 } > > You would have: > return ((((uint32_t *)ipha) & fd->fd_l3hdrmask) == fd->fd_l3hdrs); >the current flow_desc_t design allows the user to specify a dsfield without also specifying ip version. whereas your design makes the ip version mandatory (the fd_l3hdrmask above is ip version dependent because the tos field in ipv4 and traffic class in ipv6 exist in different offsets) also, I think that saving two comparisons is not worth sacrificing readability. most fields in the flow_desc_t now have 1-1 correspondence with user-settable attributes. we deliberately chose to make this structure independent of header format details.> Location: mac_flow.c 2004-2005 > Type: T > Priority: 3 > Comment: there should be no need for the ?: here, the assignment of > something to fe_match can be done inside the above if(). >ACCEPT.> Location: mac_flow.c 2089-2093 > Type: T > Priority: 3 > Comment: If fd_ipversion isn''t 4, then what else can it be but 6? > We''ve already ASSERT''d that we have an IP version (2058, 2060) that > is valid.. >ACCEPT. will change v6 comparison into an assert. eric
Kais Belgaied
2008-Sep-10 15:16 UTC
[crossbow-discuss] Crossbow Code Review Phase II ready
Hi Darren, the current plan is to convert to Mercurial within a couple of weeks, before the next resync to the latest onnv snapshot. On 09/09/08 19:46, Darren Reed wrote:> Kais, > > When will crossbow be converting to mercurial? > > I just pulled up arpddi.c and straight off the bat I see #pragma ident > (these are gone with hg if the code is "hg pbchk" clean)yep, we''ll b need to run pbchk> > So the reason for asking is two fold: > - you won''t be doing a putback from the gate that generated the > webrev (obviously)that''s right> > and more importantly.. > > - it might be safer to have code reviewers look at a gate that is > after the transition in case the change brings problems with it. > > Thoughts?we''re asking for a review of the difference between onnv and crossbow gates. We realize that we''ve got two moving targets here, so the baseline code (onnv) had to be kept unchanged as much as possible for two reasons: 1. what you just mentioned. Minimize the potential confusion to our reviewers. 2. Prevent the insanely costly mergers that are bound to be error prone. That''s why a few months ago the Crossbow team internally requested that no major changes are done to the Solaris networking stack, DLD, DLS, MAC, softmac, aggr, etc. All the areas in Crossbow''s scope. The rest of the Solaris networking community inside Sun was quite understanding and kindly accepted to re-arrange their schedule to integrate their projects shortly after Crossbow, and in cases, started working off children of the crossbow-gate. Once we address the reviewers comments for this Phase, we''ll send a refreshed webrev against onnv. The content of the delta should be similar, barring the changes for the code review comments of course. When the changes are non trivial, we''ll probably be efficient to include the new location (file name + line numbers) per individual change. Kais> > Darren > > >
Location: mac_sched.c Type: T Priority: 1 Comment: To what extent is header stacking in IPv6 being supported by Crossbow? The only one I see supported is IPPROTO_HOPOPTS. Of immediate concern is that this leaves out IPPROTO_FRAGMENT, IPPROTO_ROUTING and IPPROTO_DSTOPTS. Is further handling of IPv6 part of the plan for further work on Crossbow? (The exclusion of IPPROTO_FRAGMENT would appear to be the most pressing one.) Location: mac_sched.c, mac_flow.c Type: T Priority: 3 Comment: In mac_sched.c we examine the header and say ''type=OTH'' for fragments. How is this knowledge transferred through to the flow classification in mac_flow.c?
On Wed, Sep 10, 2008 at 04:23:06PM -0700, Darren Reed wrote:> Location: mac_sched.c > Type: T > Priority: 1 > Comment: To what extent is header stacking in IPv6 being supported > by Crossbow? The only one I see supported is IPPROTO_HOPOPTS. > Of immediate concern is that this leaves out IPPROTO_FRAGMENT, > IPPROTO_ROUTING and IPPROTO_DSTOPTS. Is further handling > of IPv6 part of the plan for further work on Crossbow? (The exclusion > of IPPROTO_FRAGMENT would appear to be the most pressing one.) >this is being worked on.> Location: mac_sched.c, mac_flow.c > Type: T > Priority: 3 > Comment: In mac_sched.c we examine the header and say ''type=OTH'' > for fragments. How is this knowledge transferred through to the flow > classification in mac_flow.c? >actually, the mac_sched code is entered after flow classification. so any pre-parsed state should be transferred the other way around (mac_flow -> mac_sched). unfortunately this is not being done now due to the current code structure, we''ll try to see if this is feasible in the upcoming weeks. eric
On 09/10/08 17:08, Eric Cheng wrote:> On Wed, Sep 10, 2008 at 04:23:06PM -0700, Darren Reed wrote: > >> Location: mac_sched.c >> Type: T >> Priority: 1 >> Comment: To what extent is header stacking in IPv6 being supported >> by Crossbow? The only one I see supported is IPPROTO_HOPOPTS. >> Of immediate concern is that this leaves out IPPROTO_FRAGMENT, >> IPPROTO_ROUTING and IPPROTO_DSTOPTS. Is further handling >> of IPv6 part of the plan for further work on Crossbow? (The exclusion >> of IPPROTO_FRAGMENT would appear to be the most pressing one.) >> >> > > this is being worked on. > > >> Location: mac_sched.c, mac_flow.c >> Type: T >> Priority: 3 >> Comment: In mac_sched.c we examine the header and say ''type=OTH'' >> for fragments. How is this knowledge transferred through to the flow >> classification in mac_flow.c? >> >> > > actually, the mac_sched code is entered after flow classification. > so any pre-parsed state should be transferred the other way around > (mac_flow -> mac_sched). unfortunately this is not being > done now due to the current code structure, we''ll try to see if > this is feasible in the upcoming weeks. >In that case: Location: mac_flow.c Type: T Priority: 1 Comment: Throughout this code the assumption is made that the packet is not a fragment and that what follows the IP header is in fact a layer 4 protocol header. Darren -------------- next part -------------- An HTML attachment was scrubbed... URL: <http://mail.opensolaris.org/pipermail/crossbow-discuss/attachments/20080910/5b4bc6ff/attachment.html>
Darren Reed wrote:> > In that case: > > Location: mac_flow.c > Type: T > Priority: 1 > Comment: Throughout this code the assumption is made that the > packet is not a fragment and that what follows the IP header is in > fact a layer 4 protocol header. >I think we have quite a few PKT_TOO_SMALL() checks to ensure the transport header really exists before dereferencing it. do you not think that''s sufficient? the likelihood of having a fragment <= ip + mac header sizes is small so we chose to fail the classification for these cases (the packet would not be dropped and will reach ip. it''s just that we don''t want to be in business of reassembling fragments within the mac) eric
On 09/10/08 17:43, Eric Cheng wrote:> Darren Reed wrote: >> >> In that case: >> >> Location: mac_flow.c >> Type: T >> Priority: 1 >> Comment: Throughout this code the assumption is made that the >> packet is not a fragment and that what follows the IP header is in >> fact a layer 4 protocol header. >> > > I think we have quite a few PKT_TOO_SMALL() checks to ensure the > transport header really exists before dereferencing it. do you not > think that''s sufficient? the likelihood of having a fragment <= ip + > mac header sizes is small so we chose to fail the classification for > these cases (the packet would not be dropped and will reach ip. it''s > just that we don''t want to be in business of reassembling fragments > within the mac)PKT_TOO_SMALL() verifies that you have "n" bytes of data after IP. It doesn''t guarantee you that it is fragmented data or protocol header. Darren
Darren Reed wrote:> On 09/10/08 17:43, Eric Cheng wrote: >> Darren Reed wrote: >>> In that case: >>> >>> Location: mac_flow.c >>> Type: T >>> Priority: 1 >>> Comment: Throughout this code the assumption is made that the >>> packet is not a fragment and that what follows the IP header is in >>> fact a layer 4 protocol header. >>> >> I think we have quite a few PKT_TOO_SMALL() checks to ensure the >> transport header really exists before dereferencing it. do you not >> think that''s sufficient? the likelihood of having a fragment <= ip + >> mac header sizes is small so we chose to fail the classification for >> these cases (the packet would not be dropped and will reach ip. it''s >> just that we don''t want to be in business of reassembling fragments >> within the mac) > > PKT_TOO_SMALL() verifies that you have "n" bytes of data after IP. > It doesn''t guarantee you that it is fragmented data or protocol header. >ok, I see what you''re saying. we''ll look into this. eric
I will send more comments in the next few days, but firstly a couple of things that caught my eyes: Location: mac_sched.c, squeue.c Type: T Priority: 1 Comment: Could someone explain a bit how mac_client_srs_poll() works? As I understand it directly calls the driver poll entry point when there is a RX ring assigned, so the b_rptr in mblks point at the MAC header. But this function is called from squeue_get_pkts(), which later calls ip_accept_tcp() but the latter assumes the b_rptr is pointing at the IP header... but I didn''t see any b_rptr adjustment across these function calls, or am I missing anything? Also, is this a fast path that will only be taken when there''s no promisc listeners? As I didn''t find any call to functions which dispatch packets to promisc clients. Location: mac_sched.c Type: E Priority: 1 Comment: Is there any reason that we grab the mac_bw_lock to update mac_bw_sz but only for MAC_RX_SRS_ENQUEUE_CHAIN but not for MAC_TX_SRS_ENQUEUE_CHAIN ? Probably a typo? More later. Thanks, Zhijun Kais Belgaied wrote:> (networking-discuss Bcc''ed) > > Good afternoon, > > The Crossbow team would like to invite you to participate in the project''s > second phase of code review. > > The second phase of the review starts today, and will last for > two weeks. It covers the following > parts of the code: > > - MAC layer: > . MAC client interface > . MAC driver interfaces > . Hybrid I/O support > . resource management and allocation > . packet scheduling > . software classification > . soft-rings, SRS, and polling > . transmit flow control and fanout > . promiscuous mode > - Other GLDv3 changes: > . vnic > . DLS > . DLD > . softmac > - IP changes > - e1000g driver > > > The webrev is available in > http://www.opensolaris.org/os/project/crossbow/Code-review2 > > The link above includes the suggested format for comments and a list > of known issues that will be addressed in the next phase. > > For a refresher about the project features detailed design, we recommend > the > updated documents, > http://www.opensolaris.org/os/project/crossbow/Docs/ > > Interested? please send the code review comments by September 20th, 2008 > > Thanks, > > The Crossbow team. > _______________________________________________ > crossbow-discuss mailing list > crossbow-discuss at opensolaris.org > http://mail.opensolaris.org/mailman/listinfo/crossbow-discuss >
rajagopal kunhappan
2008-Sep-11 21:29 UTC
[crossbow-discuss] Crossbow Code Review Phase II ready
Zhijun Fu wrote:> I will send more comments in the next few days, but firstly a couple of things that > caught my eyes: > > Location: mac_sched.c, squeue.c > Type: T > Priority: 1 > Comment: Could someone explain a bit how mac_client_srs_poll() works? As I understand > it directly calls the driver poll entry point when there is a RX ring assigned, > so the b_rptr in mblks point at the MAC header. But this function is called from > squeue_get_pkts(), which later calls ip_accept_tcp() but the latter assumes the b_rptr > is pointing at the IP header... but I didn''t see any b_rptr adjustment across these > function calls, or am I missing anything?Agree with your comments about not moving the b_rptr to the ip header but this code is currently not in use and should have been "ifdef''d" out. Will do that. The idea behind the code was that TCP traffic could have its own hardware Rx ring and the SRS on top of it do the polling and deliver the packets to squeue and squeue could in turn ask the SRS to stop sending packets up, and do polling via SRS. Anyway the code is not in use. Sorry.> Also, is this a fast path that will only be taken when there''s no promisc listeners? > As I didn''t find any call to functions which dispatch packets to promisc clients. > > > Location: mac_sched.c > Type: E > Priority: 1 > Comment: Is there any reason that we grab the mac_bw_lock to update mac_bw_sz but > only for MAC_RX_SRS_ENQUEUE_CHAIN but not for MAC_TX_SRS_ENQUEUE_CHAIN ? Probably > a typo?This is one of the asymmetry between Rx and Tx side. On the Rx side, there could be multiple Rx rings associated with a link. When bw is configured for the link, you need to control the incoming packet rate from all the Rx rings. A single mac_bw_ctl_t structure is shared among all the soft_ring_set''s (SRS) associated with Rx rings. So you grab the mac_bw_lock and do the update. Note that there is a 1-1 mapping between an Rx ring and a SRS. On the Tx side, all outgoing packets will come to Tx SRS via mac_tx() call and we do the bw control under srs_lock. mac_bw_lock is not used. Thanks, -krgopi> > > More later. > > Thanks, > Zhijun > > > Kais Belgaied wrote: >> (networking-discuss Bcc''ed) >> >> Good afternoon, >> >> The Crossbow team would like to invite you to participate in the project''s >> second phase of code review. >> >> The second phase of the review starts today, and will last for >> two weeks. It covers the following >> parts of the code: >> >> - MAC layer: >> . MAC client interface >> . MAC driver interfaces >> . Hybrid I/O support >> . resource management and allocation >> . packet scheduling >> . software classification >> . soft-rings, SRS, and polling >> . transmit flow control and fanout >> . promiscuous mode >> - Other GLDv3 changes: >> . vnic >> . DLS >> . DLD >> . softmac >> - IP changes >> - e1000g driver >> >> >> The webrev is available in >> http://www.opensolaris.org/os/project/crossbow/Code-review2 >> >> The link above includes the suggested format for comments and a list >> of known issues that will be addressed in the next phase. >> >> For a refresher about the project features detailed design, we recommend >> the >> updated documents, >> http://www.opensolaris.org/os/project/crossbow/Docs/ >> >> Interested? please send the code review comments by September 20th, 2008 >> >> Thanks, >> >> The Crossbow team. >> _______________________________________________ >> crossbow-discuss mailing list >> crossbow-discuss at opensolaris.org >> http://mail.opensolaris.org/mailman/listinfo/crossbow-discuss >> > > _______________________________________________ > crossbow-discuss mailing list > crossbow-discuss at opensolaris.org > http://mail.opensolaris.org/mailman/listinfo/crossbow-discuss--
rajagopal kunhappan wrote:> Zhijun Fu wrote: >> I will send more comments in the next few days, but firstly a couple >> of things that >> caught my eyes: >> >> Location: mac_sched.c, squeue.c >> Type: T >> Priority: 1 >> Comment: Could someone explain a bit how mac_client_srs_poll() works? >> As I understand >> it directly calls the driver poll entry point when there is a RX ring >> assigned, >> so the b_rptr in mblks point at the MAC header. But this function is >> called from >> squeue_get_pkts(), which later calls ip_accept_tcp() but the latter >> assumes the b_rptr >> is pointing at the IP header... but I didn''t see any b_rptr >> adjustment across these >> function calls, or am I missing anything? > > Agree with your comments about not moving the b_rptr to the ip header > but this code is currently not in use and should have been "ifdef''d" > out. Will do that. The idea behind the code was that TCP traffic could > have its own hardware Rx ring and the SRS on top of it do the polling > and deliver the packets to squeue and squeue could in turn ask the SRS > to stop sending packets up, and do polling via SRS. Anyway the code is > not in use. Sorry.I see. Thanks.> >> Also, is this a fast path that will only be taken when there''s no >> promisc listeners? >> As I didn''t find any call to functions which dispatch packets to >> promisc clients. >> >> >> Location: mac_sched.c >> Type: E >> Priority: 1 >> Comment: Is there any reason that we grab the mac_bw_lock to update >> mac_bw_sz but >> only for MAC_RX_SRS_ENQUEUE_CHAIN but not for >> MAC_TX_SRS_ENQUEUE_CHAIN ? Probably >> a typo? > > This is one of the asymmetry between Rx and Tx side. On the Rx side, > there could be multiple Rx rings associated with a link. When bw is > configured for the link, you need to control the incoming packet rate > from all the Rx rings. A single mac_bw_ctl_t structure is shared among > all the soft_ring_set''s (SRS) associated with Rx rings. So you grab > the mac_bw_lock and do the update. Note that there is a 1-1 mapping > between an Rx ring and a SRS. > > On the Tx side, all outgoing packets will come to Tx SRS via mac_tx() > call and we do the bw control under srs_lock. mac_bw_lock is not used.Got it. Thanks, Zhijun> > Thanks, > -krgopi > >> >> >> More later. >> >> Thanks, >> Zhijun >> >> >> Kais Belgaied wrote: >>> (networking-discuss Bcc''ed) >>> >>> Good afternoon, >>> >>> The Crossbow team would like to invite you to participate in the >>> project''s >>> second phase of code review. >>> >>> The second phase of the review starts today, and will last for >>> two weeks. It covers the following >>> parts of the code: >>> >>> - MAC layer: >>> . MAC client interface >>> . MAC driver interfaces >>> . Hybrid I/O support >>> . resource management and allocation >>> . packet scheduling >>> . software classification >>> . soft-rings, SRS, and polling >>> . transmit flow control and fanout >>> . promiscuous mode >>> - Other GLDv3 changes: >>> . vnic >>> . DLS >>> . DLD >>> . softmac >>> - IP changes >>> - e1000g driver >>> >>> >>> The webrev is available in >>> http://www.opensolaris.org/os/project/crossbow/Code-review2 >>> >>> The link above includes the suggested format for comments and a list >>> of known issues that will be addressed in the next phase. >>> >>> For a refresher about the project features detailed design, we >>> recommend the >>> updated documents, >>> http://www.opensolaris.org/os/project/crossbow/Docs/ >>> >>> Interested? please send the code review comments by September 20th, >>> 2008 >>> >>> Thanks, >>> >>> The Crossbow team. >>> _______________________________________________ >>> crossbow-discuss mailing list >>> crossbow-discuss at opensolaris.org >>> http://mail.opensolaris.org/mailman/listinfo/crossbow-discuss >>> >> >> _______________________________________________ >> crossbow-discuss mailing list >> crossbow-discuss at opensolaris.org >> http://mail.opensolaris.org/mailman/listinfo/crossbow-discuss >
Peter Memishian
2008-Sep-12 00:51 UTC
[crossbow-discuss] Crossbow Code Review Phase II ready
> Agree with your comments about not moving the b_rptr to the ip header> but this code is currently not in use and should have been "ifdef''d" > out. Will do that. So this code will be made to work before Crossbow is integrated? If not, shouldn''t it just be removed? -- meem
Thirumalai Srinivasan
2008-Sep-15 21:12 UTC
[crossbow-discuss] Crossbow Code Review Phase II ready
Kais, I looked at some parts of the MAC code. Attached is the list of comments Thirumalai Kais Belgaied wrote:> (networking-discuss Bcc''ed) > > Good afternoon, > > The Crossbow team would like to invite you to participate in the project''s > second phase of code review. > > The second phase of the review starts today, and will last for > two weeks. It covers the following > parts of the code: > > - MAC layer: > . MAC client interface > . MAC driver interfaces > . Hybrid I/O support > . resource management and allocation > . packet scheduling > . software classification > . soft-rings, SRS, and polling > . transmit flow control and fanout > . promiscuous mode > - Other GLDv3 changes: > . vnic > . DLS > . DLD > . softmac > - IP changes > - e1000g driver > > > The webrev is available in > http://www.opensolaris.org/os/project/crossbow/Code-review2 > > The link above includes the suggested format for comments and a list > of known issues that will be addressed in the next phase. > > For a refresher about the project features detailed design, we recommend > the > updated documents, > http://www.opensolaris.org/os/project/crossbow/Docs/ > > Interested? please send the code review comments by September 20th, 2008 > > Thanks, > > The Crossbow team. > _______________________________________________ > crossbow-discuss mailing list > crossbow-discuss at opensolaris.org > http://mail.opensolaris.org/mailman/listinfo/crossbow-discuss >-------------- next part -------------- An embedded and charset-unspecified text was scrubbed... Name: cr_phase2 URL: <http://mail.opensolaris.org/pipermail/crossbow-discuss/attachments/20080915/e5bd2541/attachment.ksh>
Here are my comments for mac.c.
the rest will come later.
general issues:
EC-G1
there are lots of places using "(mask & flag)" as a boolean. this
might
seem ok if used in a conditional but sometimes these actually gets saved
in boolean variables. mdb would end up showing numeric values instead of
B_TRUE/B_FALSE if this happens. it''s best to replace all with
"((mask & flag) != 0)"
EC-G2:
naming of dtrace probes.
I have noticed lots of probes with names like
"mac__whatever__something"
inside function "mac_whatever()". the "mac__whatever" prefix
should be
removed.
if the intention is to make the probe name unique, that''s not necessary
because the same probe name can be used in multiple functions. to enable
a specific probe within a specific function at run time, you just have
to do this: dtrace -n ''mac_whatever:something''
which is much simpler than
''mac_whatever:mac-whatever-something''.
specific issues:
mac.c:
EC-1:
mi_lock, mi_rx_lock, mi_tx_lock, mi_rx_cv are all unused. it might also
be possible to remove mi_ring_lock and use mi_rw_lock instead.
EC-2:
in mac_fini():
rename mac_soft_ring_finish() to mac_soft_ring_fini() to be consistent
with other *_fini calls in this function.
EC-3:
remove i_mac_ring_ctor/dtor and do the initialization/destruction in
mac_ring_alloc/free instead.
also, kmem_cache_alloc("mac_ring_cache",...) should pass NULL for
ctor/dtor.
EC-4:
MAC_NOTE_PROMISC and mi_promisc does not seem to be used anymore. the
callback str_notify() does nothing.
EC-5:
in mac_addr_factory_*(), why does the slot # have to be >= 1? why
can''t
the slot index (starts from 0) be returned to the caller?
EC-6:
mac_rx_group_unmark() should be moved closer to where it''s used. also,
before resetting a flag, need to assert that it''s really set.
EC-7:
mac_tx_classify(), mac_rx_classify():
better return an int (the err code) instead of the flent. it''d be
helpful to tell whether mac_flow_lookup() or FLOW_TRY_REFHOLD() failed.
EC-8:
mac_tx_send():
for the fastpath case, mi_tx() is capable of taking a chain so there
is no need to send each mp separately. (a bug might already exist)
on a related note, the driver entry point used by mac_ring_tx()
doesn''t seem to take a chain. why is this? behavior of mri_send()
should be consistent with mi_tx().
EC-9:
mac_set_rx_group_state():
need to add checks to ensure it''s legal to transition from
grp->mrg_state to state.
EC-10:
the macro MAC_RX_GROUP_ONLY_CLIENT() does not seem necessary.
in mac_set_rx_group_state(), mac_rx_srs_clone(), the macro can be
replaced with grp->mrg_clients->mgc_client. because the returned mcip
is assumed to be non-NULL.
in mac_rx_group_add_flow(), mac_rx_group_remove_flow(), the macro could
be replaced with "grp->mrg_clients->mgc_nflows == 1".
(mgc_nflows should be renamed mgc_nclients).
in mac_reserve_rx_group(), the check:
if (!MAX_RX_GROUP_NO_CLIENT(grp) &&
(MAC_RX_GROUP_ONLY_CLIENT(grp) != mcip)) {
....
}
should be replaced with:
if (grp->mrg_clients->mgc_nclients == 1 &&
grp->mrg_clients->mgc_client != mcip) {
....
}
all references to MAX_RX_GROUP_NO_CLIENT() (badly misnamed)
should be replaced with "grp->mrg_clients->mgc_nclients == 0".
EC-11:
the macros GROUP_INTR_* should be placed into functions just like
mac_start_group(), mac_stop_group()...etc.
EC-12:
quite a few functions seem to be related to srs''s and should belong
in mac_datapath_setup.c. e.g.:
mac_rx_srs_clone(), mac_rx_srs_quiesce(), mac_rx_srs_remove(),
mac_srs_clear_flag(), mac_rx_srs_restart(), mac_rx_classify_flow_rem(),
mac_rx_classify_flow_quiesce(), mac_rx_classify_flow_restart(),
mac_rx_client_quiesce(), mac_rx_client_restart(), mac_tx_srs_quiesce(),
mac_tx_srs_restart(), mac_tx_flow_quiesce(), mac_tx_flow_restart(),
mac_tx_client_flush(), mac_client_quiesce(), mac_client_restart(),
i_mac_tx_srs_notify()
EC-13:
s/MAC_PROP_BIND_CPU/MAC_PROP_CPUS/
s/MAC_PROP_PRIO/MAC_PROP_PRIORITY/
in mac_{set,get}_prop(), the "switch (macprop->mp_id)" is not
needed
since we already know this is a mac prop. just call bcopy and
mac_{set,get}_resources() directly.
EC-14:
mac_ring_*(), mac_group_*() routines do not have a consistent naming
convention. mac_start_ring(), mac_start_group()..etc should be changed
to mac_ring_start(), mac_group_start(),..etc.
EC-15:
in mac_init_rings(), the last group (i.e. groups[cap_rings->mr_gnum])
is used as a dummy group for storing unused rings.
it''d be better to have an explicit unused-rings group (e.g.
mi_rx_group_unused, mi_tx_group_unused).
related to this is that mac_rx_reserve_group() doesn''t seem to be
taking rings from the last group mi_rx_groups[mip->mi_rx_group_count],
it takes rings from mi_rx_groups[0], which doesn''t have the leftover
rings.
EC-16:
remove the RING_SEND_FUNC(), RING_DRIVER_HANDLE() and related macros.
these macros will not be used directly by anyone. the mac_ring_*()
functions already act as wrappers for clients and can dereference
ring->mr_info directly.
EC-17:
in i_mac_group_add_ring(), cap_rings->mr_gaddring would only get
called if mrg_driver is not NULL. but it doesn''t seem like mrg_driver
can ever be NULL for a valid mac_group_t. the NULL check can be
removed. (do the same for i_mac_group_rem_ring())
EC-18:
mac_group_mov_ring():
s_group cannot be NULL so the check can be removed.
EC-19:
these functions are unused:
mac_group_addring(), mac_group_remring(),
mac_get_macaddr_group()
EC-20:
mac_free_macaddr():
this function already takes the map so there''s no need to call
mac_find_macaddr(). just remove it directly from the
mi_addresses list.
EC-21:
mac_add_macaddr():
what if group != NULL && group != mip->mi_rx_groups?
you''ll end up freeing the map and returning 0, a silent error.
it''d also be useful to add an output argument that returns
the mac_address_t pointer. with this, the callers won''t have to
make an extra call to mac_find_macaddr() to get the pointer.
EC-22:
mac_remove_macaddr():
the check,
if (map->ma_group == NULL)
return (0);
will cause leakage of map.
this,
if (err != 0)
return (err);
will need to re-adjust map->ma_ref before returning.
EC-23:
why is mac_init_macaddr()/mac_fini_macaddr() necessary? why not just
call mac_add_macaddr()/mac_remove_macaddr() directly?
EC-24:
mac_update_macaddr():
it''s better to do the duplicate detection in here rather than relying
on the caller. this should remove a few mac_find_macaddr()''s from the
callers.
EC-25:
mac_write_flow_desc(),
this needs to use fd_mac_len rather than ETHERADDRL.
EC-26:
mac_log_flowinfo():
this code reminds of a potential misuse of subflows.
the user could create a subflow with the same name as a vnic.
supposing the subflow is created on another link and the vnic with same
name as the subflow is not yet up (not mac_registered), when the vnic
comes up, it won''t be able to create its kstats because someone took
its name. we might need some naming scheme for subflow kstats to ensure
no collision with link names.
EC-27:
mac_log_flowinfo():
I think this function assumes flent->fe_mcip is non-NULL because if
this is not so, panic will happen at mac_write_flow_desc().
EC-28:
mac_reserve_tx_ring():
is there a need for this code:
/* find the client which owns that ring */
for (client = mip->mi_clients_list; client != NULL;
client = client->mci_client_next) {
srs = client->mci_tx_srs;
if (mac_tx_srs_ring_present(srs,
desired_ring)) {
/* found our ring */
break;
}
}
since we could get the srs from desired_ring->mr_srs and client from
srs->srs_mcip.
this will panic:
/* give a new ring to the client... */
sring = mac_reserve_tx_ring(mip, NULL);
ASSERT(sring != NULL);
if there are no free rings available.
this function needs to return an int error code because it could
fail here and in mac_start_ring() and we need to be able to tell what
happened.
EC-29:
i_mac_group_allocate_rings():
need to add checks to ensure that the ''nrings'' returned by
ms_squery is
non-zero. just the assert is not enough.
the ''rings'' array could be leaked in the error cases below.
(ran out of rings, failed to move rings)
if mac_group_mov_ring() fails for the tx case, 0 is returned instead
of rv. also, tmp_ring needs to be released before returning.
EC-30:
mac_reserve_rx_group():
the code in the loop:
for (i = start; i < start + loopcount; i++) {
}
should be moved to a function. mixing breaks and continues makes
the code somewhat confusing. doing this will also allow proper
error codes to be returned (provided that mac_reserve_rx_group()
is changed to return int also)
EC-31:
mac_rx_group_add_flow():
if mac_start_group() fails, the mac_add_macaddr() above needs to
be undone.
EC-32:
mac_rx_group_remove_flow():
this function should be changed to return an int.
mac_set_rx_group_state() is returning void now but that needs to be
fixed too because it can fail in many ways.
EC-33:
mac_reserve_tx_group():
this function assumes that mac_start_group() never fails and tx groups
never run out. both of the error cases need to be handled.
similar to the above functions, this should be changed to return in int.
the mac_group_t* could be returned as an output arg.
this is good for diagnostic purposes since we have multiple points of
failure.
Kais Belgaied
2008-Sep-16 23:33 UTC
[crossbow-discuss] Crossbow Code Review Phase II ready
On 09/15/08 14:12, Thirumalai Srinivasan wrote:> > Kais, > > I looked at some parts of the MAC code. Attached is the list of commentsThanks Thiru for the review, The crossbow team will be addressing these comments and replying in the next few days Kais> > Thirumalai > > Kais Belgaied wrote:
Kais Belgaied
2008-Sep-16 23:34 UTC
[crossbow-discuss] Crossbow Code Review Phase II ready
On 09/16/08 05:24, Eric Cheng wrote:> Here are my comments for mac.c. > the rest will come later. >Thanks Eric for the review, The crossbow team will be addressing these comments and replying in the next few days
Kais, I looked at some of the mac changes, comments attached. Thanks, Zhijun Kais Belgaied wrote:> (networking-discuss Bcc''ed) > > Good afternoon, > > The Crossbow team would like to invite you to participate in the project''s > second phase of code review. > > The second phase of the review starts today, and will last for > two weeks. It covers the following > parts of the code: > > - MAC layer: > . MAC client interface > . MAC driver interfaces > . Hybrid I/O support > . resource management and allocation > . packet scheduling > . software classification > . soft-rings, SRS, and polling > . transmit flow control and fanout > . promiscuous mode > - Other GLDv3 changes: > . vnic > . DLS > . DLD > . softmac > - IP changes > - e1000g driver > > > The webrev is available in > http://www.opensolaris.org/os/project/crossbow/Code-review2 > > The link above includes the suggested format for comments and a list > of known issues that will be addressed in the next phase. > > For a refresher about the project features detailed design, we recommend > the > updated documents, > http://www.opensolaris.org/os/project/crossbow/Docs/ > > Interested? please send the code review comments by September 20th, 2008 > > Thanks, > > The Crossbow team. > _______________________________________________ > crossbow-discuss mailing list > crossbow-discuss at opensolaris.org > http://mail.opensolaris.org/mailman/listinfo/crossbow-discuss >-- #mdb -K [0]> eri.prc.sun.com::walk staff s|::print staff_t s_name| ::grep .== zhijun |::eval <s=K|::print staff_t Zhijun.Fu at Sun.COM, x84349 Network Virtualization & Performance Team, Solaris Core Operating Systems Since Jul 10,2006 [0]> :c -------------- next part -------------- An embedded and charset-unspecified text was scrubbed... Name: Crossbow-code-review.txt URL: <http://mail.opensolaris.org/pipermail/crossbow-discuss/attachments/20080918/b7fbf07f/attachment.txt>
Kais Belgaied
2008-Sep-18 14:15 UTC
[crossbow-discuss] Crossbow Code Review Phase II ready
Thanks Zhijun for the review,
we''ll be going over all the comments and start addressing and replying
over the next few weeks.
Kais.
On 09/18/08 04:08, Zhijun Fu wrote:> Kais,
>
> I looked at some of the mac changes, comments attached.
>
> Thanks,
> Zhijun
Erik Nordmark
2008-Sep-22 21:45 UTC
[crossbow-discuss] Crossbow Code Review Phase II ready
* Reviewer Name: Erik Nordmark
* Time Spent
* General Comment:
I reviewed uts/common/inet/ files, but I didn''t pay close attention
to the capability changes.
There are several files which only have sccsid changes. Why not remove
them before sending out the code review?
---
* Comment Location: ip_impl.h et al
* Type: T
* Priority 1
* Comment Text
You have defined ILL_SEND_TX (line 508) as checking for mp != NULL.
Is is extremely bad practice both from an understanding and maintance
perspective.
When the code takes advantage of this as in tcp.c:
19711 FW_HOOKS(ipst->ips_ip4_physical_out_event,
19712 ipst->ips_ipv4firewall_physical_out,
19713 NULL, ill, ipha, mp, mp, 0, ipst);
19714 DTRACE_PROBE1(ip4__physical__out__end, mblk_t *, mp);
19715 DTRACE_IP_FASTPATH(mp, ipha, ill, ipha, NULL);
19716
19717 ILL_SEND_TX(ill, ire, connp, mp, 0);
there is nothing that indicates that mp could be NULL after FW_HOOKS; unlike
the current onnv-gate where this is made explicit.
Thus it wouldn''t be unreasonable to try to insert some references to mp
after
FW_HOOKS and ILL_SEND_TX; it is easier to understand with an explicit check
for mp != NULL. (And I suspect that DTRACE_IP_FASTPATH would actually
need to
check for NULL already.)
It is bad for performance since all code between FW_HOOKS and ILL_SEND_TX
have to check for NULL. And if ILL_SEND_TX is used in some other context
there
is an unneeded check for mp != NULL.
* Comment Location: ip6.c around line 12523
* Type: T
* Priority 1
* Comment Text
You are changing the code so it drops UDP/RAWIP packets instead of doing
a putq.
This means that UDP applications will see a behavior change.
I don''t think that is a good thing. Given that you preserve the putq
elsewhere
in the code, why have you removed it in ip_xmit_v6() only?
* Comment Location: udp.c around line 6364
* Type: E
* Priority 1
* Comment Text
I don''t see any explanation of how ip_wsrv() gets called once
ill_flow_blocked
is set (and there wasn''t a canputnext() to the driver).
Who calls ill_flow_enable()? Explain in comments.
* Comment Location: udp_impl.h
* Type: ET
* Priority 1
* Comment Text
ILL_FLOW_BLOCKED() sure seems like it should be a predicate (test the state
of the ill) but in fact it changes ill_flow_blocked. That is likely to lead
to maintenance problems. Please factor out the changing of ill_flow_blocked
so that it is not part of the macro (if that is done then you might no
longer need a macro).
Also, given that ILL_FLOW_BLOCKED is used by ip*.c, why is it defined in
a UDP
specific header file (udp_impl.h)?
* Comment Location: ip.c lock ordering
* Type: T
* Priority 2
* Comment Text
ip.c makes me loose sleep by involing cpu_lock in the TCP/IP lock hiearchy:
374 * squeue(sq_lock), flow related (ft_lock, fe_lock) locking
375 *
376 * cpu_lock --> ill_lock --> sqset_lock --> sq_lock
377 * sq_lock -> conn_lock -> QLOCK(q)
378 * ill_lock -> ft_lock -> fe_lock
Can we avoid defining lock hierarchies across subsystems? Why do we need to
hold cpu_lock when acquiring ill_lock and friends?
Given that ip_squeue_getfree() holds sqset_lock and then calls thread_lock()
I think there is a deadlock waiting to happen due to the lock order.
Does the code in ip_squeue require that the new CPU be known atomically
with the new CPU coming into existence? If it does not then the locking
interaction can be greatly simplified by having the callback from the
cpu code
just record that a new CPU has arrived and schedule a taskq to setup the
squeues etc.
* Comment Location: ip.c around L 5966
* Type: T
* Priority 1
* Comment Text
Missing cleaup in ip_stack_fini for the ipst->ips_capab_taskq_thread which
is created in ip_stack_init. (Alternatively, if it is sufficient to have
a single taskq thread for all IP instances you can move it to
ip_ddi_init with
cleanup in ip_ddi_destroy).
* Comment Location: ip.c L 14873
* Type: E
* Priority 2
* Comment Text
You change the last argument from NULL to 0 in
14783 ip_input(ill, NULL, mp, 0);
Since the last argument is a pointer, NULL is the correct value.
* Comment Location: ip.c
* Type: E
* Priority 2
* Comment Text
From where is ip_accept_tcp() called? Please add comment explaining this.
* Comment Location: usr/src/uts/intel/pfil/pfil.objt-symbols.obj64
* Type: T
* Priority 2
* Comment Text
Why do you touch that file? pfil no longer exists.
* Comment Location: ip_squeue.c L47
* Type: E
* Priority 1
* Comment Text
Please clarify whether squeues are destroyed or not.
The comments are conflicting at:
47 * When a CPU goes offline its squeue set is destroyed, and all
its squeues
and at
54 * them without holding the squeue lock because of the guarantee
that squeues
55 * are never destroyed. ip_squeue locks must be held, however.
* Comment Location: ip_squeue.c L265-271
* Type: T
* Priority 1
* Comment Text
This loop does nothing. Remove it? Or should it be break instead of
continue?
Currently sq is always NULL at the end of the loop, which means that part of
the code after line 271 never executes.
265 for (sq = sqs->sqs_head; sq != NULL; sq = sq->sq_next) {
266 /*
267 * Select a non-default squeue
268 */
269 if (sq->sq_state & SQS_DEFAULT)
270 continue;
271 }
* Comment Location: ip_squeue.c L344
* Type: T
* Priority 1
* Comment Text
When sqset_global_size = 1 this does the wrong thing:
344 sqs = sqset_global_list[(index %
(sqset_global_size-1)) + 1];
Shouldn''t it just be without the -1 and +1?
* Comment Location: squeue.c L 239
* Type: T
* Priority 3
* Comment Text
It sure would be nice if SQUEUE_MSG_SIZE wasn''t needed. How hard would
it
be to get tcp_timercache to only use normal mblks?
* Comment Location: ip_squeue.c L272
* Type: E
* Priority 1
* Comment Text
You define ENQUEUE_CHAIN_AT_HEAD but nothing uses this macro. Remove?
* Comment Location: squeue.c Line 586
* Type: T
* Priority 2
* Comment Text
We have
586 interrupt = (servicing_interrupt() || tag ==
SQTAG_IP_INPUT_RX_RING);
which is odd to me. The SQTAG has so far only been a debugging aid. It
is very odd to now use it for something other than debugging.
This oddity also shows up in ip.c L14911.
----
Peter Memishian
2008-Sep-22 22:16 UTC
[crossbow-discuss] Crossbow Code Review Phase II ready
> ip.c makes me loose sleep by involing cpu_lock in the TCP/IP lock hiearchy:As well it should: there are several open bugs that the current (pre-Crossbow) code has deadlocks due to violations here. -- meem
Kais Belgaied
2008-Sep-24 02:16 UTC
[crossbow-discuss] Crossbow Code Review Phase II ready
Thanks Erik for the review the team is addressing all the comments and we''ll start to send the answers soon. On 09/22/08 14:45, Erik Nordmark wrote:> > * Reviewer Name: Erik Nordmark > * Time Spent > * General Comment: > I reviewed uts/common/inet/ files, but I didn''t pay close attention > to the capability changes. > > There are several files which only have sccsid changes. Why not remove > them before sending out the code review?some were missed, We''ll look carefully for those in the Phase II and in the updated webrev. Kais.> > --- >
sowmini.varadhan at sun.com
2008-Sep-26 00:04 UTC
[crossbow-discuss] Crossbow Code Review Phase II ready
On (09/08/08 15:05), Kais Belgaied wrote:> > The webrev is available in > http://www.opensolaris.org/os/project/crossbow/Code-review2:> Interested? please send the code review comments by September 20th, 2008Sorry this is a little late, and I only looked at specific parts of this quickly, but here''s my $0.02. ------------------------------------------------------------------------ General: -------- - lot of zero-delta files; would be helpful to filter these out of the webrev to get a focussed review of such a complex project. - dld_drv.c, libdladm: needs to merge with recent nevada builds - it seems like we have a whole family of functions in ip with the _simple suffix that duplicate the code of their un-simple counterparts. Does this produce sufficient performance improvement to justify the mainenance problem produced by duplicated code? Specifics: ---------- - mac_set_prop: 2879 if (mac_is_macprop(macprop)) { : 2898 } else if (mip->mi_callbacks->mc_callbacks & MC_SETPROP) { : 2901 } 2902 2903 return (err); 2904 } Could end up return ing uninitialized value if property is !mac_is_macprop() and the driver does not support MC_SETPROP. Suggest 2873 int err = ENOTSUP; : 2879 if (mac_is_macprop(macprop)) { : 2898 } + if (mip->mi_callbacks->mc_callbacks & MC_SETPROP) { : 2901 } 2902 2903 return (err); - mac_get_prop: similar issue around uninitialized err. - mac.c: if property is a mac_prop we would return so Line 2942 is redundant, and the else creates needless tabbing of subsequent lines - mac.c: need to check valsize before doing bcopy in line 2932, 2937 - mac_get_prop: lines 2931-2933 are identcial to lines 2936-2938. Merge them into one case? 2929 case MAC_PROP_MAXBW: 2930 case MAC_PROP_PRIO: 2931 mac_get_resources(mh, &mrp); 2932 bcopy(&mrp, val, sizeof (mac_resource_props_t)); 2933 return (0); 2934 case MAC_PROP_BIND_CPU: 2935 case MAC_PROP_FANOUT: 2936 mac_get_resources(mh, &mrp); 2937 bcopy(&mrp, val, sizeof (mac_resource_props_t)); 2938 return (0); - mac.c: bzero at line 2926 of mac_getprop can be moved to line 2931. - mac.c: similarly lines 2887-2888 are identical to lines 2891-2892; line 2882 can be moved to line 2887. - if mi_resource_props is "SL", then should mac_get_resources invoke i_mac_perim_enter? - linkprop.c: do_check_priority should set up vd_val with the mac_resource_props_t (which is currently done in do_set_res). do_set_res can be replaced by a call to dld_set_public_prop(). - linkprop.c: why isn''t DLDIOC_MODIFYFLOW reusing DLDIOC_SETMACPROP? - linkprop.c: how do MAC_PROP_BIND_CPU and MAC_PROP_FANOUT get set from dladm? I see them in dld_public_prop_t, but not in the prop_table[] and I don''t see how they would get set. --Sowmini
Erik Nordmark
2008-Sep-26 06:03 UTC
[crossbow-discuss] Crossbow Code Review Phase II ready
sowmini.varadhan at sun.com wrote:> - it seems like we have a whole family of functions in ip with > the _simple suffix that duplicate the code of their un-simple > counterparts. Does this produce sufficient performance improvement > to justify the mainenance problem produced by duplicated code?I hope the duplication will be short-lived. The reason is that refactor-gate already includes a more condensed implementation of the same ideas (this shows up as ire_route_recursive_dstonly in refactor-gate) Erik
Peter Memishian
2008-Sep-26 07:14 UTC
[crossbow-discuss] Crossbow Code Review Phase II ready
> > - it seems like we have a whole family of functions in ip with> > the _simple suffix that duplicate the code of their un-simple > > counterparts. Does this produce sufficient performance improvement > > to justify the mainenance problem produced by duplicated code? > > I hope the duplication will be short-lived. Likewile -- this looks like a maintenance hazard. As an aside, will systems using IPMP be forced into the non-fastpath, or do I need to extend this hack further as part of Clearview IPMP? :-/ -- meem
Kais Belgaied
2008-Sep-26 17:54 UTC
[crossbow-discuss] Crossbow Code Review Phase II ready
On 09/25/08 17:04, sowmini.varadhan at sun.com wrote:> On (09/08/08 15:05), Kais Belgaied wrote: > >> The webrev is available in >> http://www.opensolaris.org/os/project/crossbow/Code-review2 >> > : > >> Interested? please send the code review comments by September 20th, 2008 >> > > Sorry this is a little late, and I only looked at specific parts of > this quickly, but here''s my $0.02. > > >Thanks Sowmini for the review, the team is compiling the whole list of comments and shall be addressing them in the next coming days. Kais. -------------- next part -------------- An HTML attachment was scrubbed... URL: <http://mail.opensolaris.org/pipermail/crossbow-discuss/attachments/20080926/00f3fa65/attachment.html>
Kais Belgaied
2008-Sep-27 00:49 UTC
[crossbow-discuss] Crossbow Code Review Phase II ready
Thanks Sasha for the review,
the team will be addressing and responding in the next few days.
Kais.
On 09/26/08 17:27, Alexander Kolbasov wrote:> General comment for the squeue code.
>
> These are for include files only. I have more comments for squeue.c file
> itself.
>
Peter Memishian
2008-Sep-28 05:58 UTC
[crossbow-discuss] Crossbow Code Review Phase II ready
Unfortunately, I only had time to look over the IP changes.
uts/common/inet/ip/ip_squeue.c:
* 186: ip_squeue_set_create() does KM_SLEEP allocations, but all
of its callers have cpu_lock held.
* 259: First argument to ip_squeue_getfree() is unused.
* 289: Do we really need to grab sq_lock here considering the
squeue_t has not been made available to a ring? If so, why do
we not need to hold sq_lock when adjusting sq_priority earlier
in the function?
* 290: Unclear what SQS_ILL_BOUND is needed for. Is this strictly
a debugging aid?
* 336: Is this ASSERT() supposed to be > 0 rather than > 1?
Should probably also be checked under sqset_lock.
* 490: Unclear why ip_squeue_restart_ring() needs to check
ILL_CONDEMNED.
* 507: Seems like this cv is misnamed, as we use it both to wait
for quiesce and restart. Also, if there''s any risk that
multiple threads could be waiting on different conditions on
the cv, we need to use cv_broadcast().
* 538: Spurious ARGSUSED.
* 540: No clue why this is called ip_squeue_extend(). Seems to be
ip_squeue_complete_bind() or somesuch.
* 562: Unclear how we can get here with cpuid == -1. Seems like
this should be an ASSERT().
* 594: Comment appears to be spurious as there''s no
ill_ring_state.
* 631: Would be cheaper/cleaner to make this test before doing the
kmem_zalloc().
* 640: Some comments explaining why we do the taskq_dispatch()
here would be helpful.
uts/common/inet/ip.h:
* 2009: ill_notify_mh seems too generic; perhaps ill_flownotify_mh?
* Synchronization rules for ill_tqlist, ill_notify_mh an
ill_capab_pending_cnt need to be documented in the table.
* 3432: Name this enum and use it when defining rr_ring_state.
* 3449: What is ip_dls_unbind_t?
* 3458-3610: This appears a bit stale, as none of ill_mac_accept()
dls_accept_pkts() or dls_sendup() exist.
* 3465: Unclear what this comment has to do with the provided.
code, which doesn''t appear to implement this.
* 3469: What is ip_mac_accept_t?
* 3474: What is rr_handle -- rr_rx_handle or rr_intr_handle?
* 3490: What is rr_cpu_id for?
* 3529-3539: Revised IP squeue implementation have left a few
of these adrift, like ip_squeue_free() and ip_squeue_tcp().
uts/common/inet/ip_impl.h:
* 537: What is ill_direct_tx()?
uts/common/inet/ip_if.h:
* 148: ILL_MAC_PERIM_HELD() seems a more appropriate name
since it only provides the guarantee for the specified ill.
uts/common/inet/ip/ip_if.c:
* 530: Comment references non-existent ip_squeue_soft_ring
variable, also implying that the Niagara/Ontario references
are incorrect or stale.
* 829: Why isn''t ILL_CAPAB_DLD_LSO in this list?
* 1744: Given that we''re writer on the IPSQ, how could
ILL_CONDEMNED have been set?
* 2067: Indeed, it seems bad if ill_capability_dld_reset()
silently fails not to send a reset message. Maybe the reset
message(s) need to be preallocated as part of doing initial
capability negotiation? As an aside, the error messages here
appear to refer to MDT.
* 2641, 2664: I''m a bit confused about the terminology here; it
seems like ill_ring_bind() and ill_ring_add() belong to the
set of ip_squeue_* routines off in ip_squeue.c -- that is, they
seem to be ip_squeue_add_ring() and ip_squeue_bind_ring().
It also seems like ip_squeue_clean_ring() should be
ip_squeue_unbind_ring(). Looks like all of this may also be
responsible for having to #include <sys/squeue_impl.h> in
ip_if.c.
* 2640: Unclear what it means for ill_ring_bind() to fail, as
neither the direct call from ill_ring_add() nor the indirect
call through mci_resource_bind() seem to check the return value.
(As an aside, many return values from pointer-to-function calls
seem to be ignored, but lint isn''t smart enough to notice.)
* 2641: Why does ill_ring_bind() take both an ill_t *ill and an
ill_rx_ring_t *rx_ring? Is there a case where rx_ring->rr_ill
!= ill? (The same is true of most of the ip_squeue_*_ring()
functions.)
* 2644: I''m not convinced this ASSERT() is needed;
ill_ring_bind()
itself really doesn''t care about cpu_lock, and
ip_squeue_cpu_move() already has this ASSERT().
* 2646, 2680: Why do we check ILL_CONDEMNED here? Is it needed
for correctness?
* 2655: How does one get here with a NULL rr_sqp? It appears the
only time it becomes NULL is when we''ve gone through
ip_squeue_clean_ring() -> squeue_worker_thr_control(), but
ip_squeue_clean_ring() sets rr_ring_state to RR_FREE_INPROG,
which we''ve already checked for.
* 2676: Why do we "gracefully" fail if mr_type is not
MAC_RX_FIFO?
I''d expect this to be an ASSERT().
* 2723: This seems dangerous, as ip_squeue_getfree() may block and
we''re holding ill_lock. Seem like ip_squeue_getfree() needs
to
be changed to be non-blocking, and we need to handle failure.
* 2734: If someone wants ill_name, they can get it from the ill_t
pointer we pass; no need to clutter up the DTrace probe points
with needless arguments.
* 3122, 3126, 3146, 3155: Direct use of DLD_VERSION_1 seems
strange. I would''ve expected DLD_CURRENT_VERSION or
somesuch.
* 3141: I''m surprised by this check; since ill_dld_capab is
only
NULLed out in ill_delete_tail(), it seems even if we do e.g.
ill_capability_dld_reset(), ill_dld_capab will not be
reinitialized.
* 3179, 3188: Should VERIFY() that these succeeded.
* 3192: A boolean_t would be more appropriate for
ill_mac_perim_held().
* 3230, 3312: Unclear why these use CE_CONT and hardcode
"warning" rather than using CE_WARN.
* 3299: Don''t we need to disable DLD_CAPAB_LSO in this error
case?
* 3365-3373: Holding ill_lock here seems dangerous since
idd_tx_cb_df() can block via mac_client_tx_notify() ->
mac_client_tx_notify_remove() -> mac_callback_remove_wait().
Further, unclear why ill_lock is needed at all here since
neither ill_notify_mp nor ill_capabilities seems to require it,
and it doesn''t appear to provide any synchronization with
ill_flow_enable().
* 3481: Comment here is stale; this is not called from
ip_rput_dlpi_writer().
* 3503: Nit: no need for ?: operator here, as == does the job.
* 3566: Is capability negotiation performance-critical enough
to justify the two ways into ill_capability_ack_thr? Why not
just always cv_signal() the ill_taskq_dispatch thread?
* 3369: Should we VERIFY() that this succeeded?
* 3382-3384: Unclear why unbind_conn variable is needed. More
broadly, I''m unclear what idp_unbind_conn is for. Nothing
seems to use it -- we just create it and destroy it.
* 18220: This comment is wrong: we still receive high-priority
DLPI responses even after ILL_CONDEMNED is set -- and FWIW
all standard DLPI ACKs are high-priority per the DLPI spec.
As such, it seems like one issue is that we didn''t define
DL_CAPABILITY_ACK as being high-priority.
* 3434, 3435, 18225: ill_capability_dld_disable_thr doesn''t
exist.
* 18347-18373: The new ill_capab_send()/done() logic appears
unnecessary. In particular, after the changes for 6651864
introduced ipsq_current_done, which should provide the mechanism
you need to ensure another ISPQ operation won''t sneak in. If
not, please justify. Also, please be sure to revisit the
comment on lines 3415-3419.
uts/common/inet/ip/ip.c:
* 5666: Where are ips_capab_taskq_cv, ips_capab_taskq_lock,
and ips_capab_taskq_list cv/mutex/list_destroy()''d?
* 6945, 6948: What does this change have to do with anything?
* 14032: This is a big-time hack. If there are checks that
can be optimized out, we need to indicate what they are
semantically (e.g., a unicast_only argument or somesuch)
so that it can be maintained. Also, why replicate lines
14079-14080 rather than just hoisting `skip'' to 14078?
* 14200: Nit: name here is awkward; maybe just DEV_Q_FLOW_BLOCKED
to match ILL_FLOW_BLOCKED?
* 14783: Unclear why this was changed; it was right before.
* 15427-15634: As others have mentioned, why is this needed?
This will need to be evolved/maintained as a duplicate codepath,
only to be shot by IP datapath refactoring -- and e.g. the check
on line 15582 puts load-spread IPMP traffic on the slow path,
and will put all IPMP traffic on the slow path after Clearview
IPMP because the IRE_LOCAL''s ire_rfq is never ill_rq (though
it
appears straightforward to make it work with Clearview IPMP,
it''s yet another datapath to test).
If you do decide to keep this function, please at least consider
adding common "next" label the bottom of the loop that has
the
ADD_TO_CHAIN() logic inlined (and the macro removed) along with
the ire_refrele()/ire=NULL dance.
uts/common/inet/ip/ip6.c:
* 114: ip_squeue_flag is already in <sys/squeue_impl.h>; remove.
(That said, it seems like it should be in <inet/ip.h>)
pkgdefs/SUNWhea/prototype_com:
* Why are we shipping any GLDv3 headers at this point?
tools/scripts/bfu.sh:
* 222-223: Nit: this list is supposed to be alphabetical.
--
meem
Kais Belgaied
2008-Sep-29 15:27 UTC
[crossbow-discuss] Crossbow Code Review Phase II ready
On 09/27/08 22:58, Peter Memishian wrote:> Unfortunately, I only had time to look over the IP changes. > >Thanks Meem for the review. The team will be addressing all comments in a few days. Also, stay tuned for Phase III. Kais,