Nicolas Droux
2009-Apr-03 23:06 UTC
[crossbow-discuss] [networking-discuss] Solaris Bridging design review [18-25 March 2009]
On Apr 1, 2009, at 2:40 PM, James Carlson wrote:

> Nicolas Droux writes:
>> - We already talked about having more common MAC hooks that would
>> allow consumers like bridging and L2 filtering intercept and inject
>> packets without having to change the common data-path. I'll start
>> defining those soon.
>
> OK.
>
>> - For the RX path you need to handle the poll path as well or you
>> will miss packets, see mac_rx_srs_poll_ring(). I didn't see that
>> covered in your design doc or code. mac_rx() covers the interrupt
>> path only.
>
> Good point. Thanks for the reference. It took a while, but I was
> able to reverse-engineer a good bit of that code path. (I don't know
> of a design document that covers this; my sparse notes in case anyone
> is interested are below.)
>
> It seems that the right place to hook this is in mac_rx_srs_drain and
> mac_rx_srs_drain_bw, at approximately the same point they do the
> mac_promisc_client_dispatch. I don't want to mess with the polling
> functions up above.

The drain routines can be invoked for both interrupt and polling
data-path. So having the hooks in both the drain routines and the
interrupt-driven mac_rx() could cause some packets to be seen by the
bridge twice. You cannot use the drain routines alone, because when L2
classification kicks in, they will only see the packets for their
corresponding MAC clients. The combination of hook in
mac_rx_srs_poll_ring() (near mac_promisc_dispatch()) and mac_rx()
would cover all cases.

Another issue with polling in the context of bridging is that packets
in the polling path will be processed from the context of the polling
thread, and that polling thread may not pull more packets from a
receive ring if there is backlog on the MAC client, or if bandwidth
control kicks in. Bridged packets could therefore be throttled by the
MAC client which owns the receive rings from which the packets are
received.

An alternative would be to disable polling of the default rings when
bridging is enabled on a NIC. In this case packets would be always
coming in through the interrupt path, and the bridge would have a
chance to process the packets before they are dispatched to clients
SRS'es. The major drawback is that we would lose the performance
benefits of polling when a bridge is enabled. We would have to provide
some code you could call from mac_bridge_set() and mac_bridge_clear(),
but this would allow you to not have to worry about hooks in the
polling path.

> Given the context (the fact that a bridge can make input packets look
> like they came from a different underlying mac), I think the right
> thing to do is to let the bridge process the packets and continue
> delivery up through mac_bridge_rx and the software classifier.
>
>> BTW this is another example why generic MAC hooks would be useful to
>> have, i.e. you wouldn't have to deal with all the details of the
>> data-path.
>
> Yes.
>
>> - For the TX path we're going to modify the TX default processing to
>> use a fanout of rings instead of a single default TX ring to better
>> scale. So some of the changes you are making related to the TX entry
>> points would have to be redone. It would be better to not have the
>> callers specify the default TX ring.
>
> Agreed; I think the simpler way would be to have a single mac_tx
> function that knows how to deal with rings when necessary.

Yes that would be best.

Nicolas.

-- 
Nicolas Droux - Solaris Kernel Networking - Sun Microsystems, Inc.
droux at sun.com - http://blogs.sun.com/droux
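
[Editorial sketch] The hook-placement argument in the message above can be checked with a small toy model. This is only an illustration under stated assumptions: the types and the function bridge_sightings() are hypothetical, and the model reduces the receive path to three possible hook sites (interrupt entry, poll-ring pull, per-client drain). It is not the actual MAC-layer code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model only: hypothetical names, not the real MAC data-path. */

typedef enum { PATH_INTERRUPT, PATH_POLL } rx_path_t;

typedef struct {
	bool in_mac_rx;    /* hook at the interrupt entry point, like mac_rx() */
	bool in_poll_ring; /* hook where the poll thread pulls from the ring */
	bool in_drain;     /* hook in the per-client drain routine */
} hook_cfg_t;

/*
 * Count how many times the bridge hook fires for one packet.
 * for_local_client is false for transit traffic, which in this model
 * no client's drain routine sees once L2 classification is active.
 */
int
bridge_sightings(const hook_cfg_t *cfg, rx_path_t path, bool for_local_client)
{
	int seen = 0;

	assert(cfg != NULL);
	if (path == PATH_INTERRUPT && cfg->in_mac_rx)
		seen++;
	if (path == PATH_POLL && cfg->in_poll_ring)
		seen++;
	/* Drain follows either entry point, but only for the client's own packets. */
	if (for_local_client && cfg->in_drain)
		seen++;
	return (seen);
}
```

In this model, hooking mac_rx() plus the drain routine counts an interrupt-path packet for a local client twice, hooking the drain routine alone never counts transit packets, and hooking mac_rx() plus the poll-ring site counts every packet exactly once.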
James Carlson
2009-Apr-15 20:50 UTC
[crossbow-discuss] [networking-discuss] Solaris Bridging design review [18-25 March 2009]
Nicolas Droux writes:
> On Apr 1, 2009, at 2:40 PM, James Carlson wrote:
>
> > Nicolas Droux writes:
> >> - We already talked about having more common MAC hooks that would
> >> allow consumers like bridging and L2 filtering intercept and inject
> >> packets without having to change the common data-path. I'll start
> >> defining those soon.
> >
> > OK.
> >
> >> - For the RX path you need to handle the poll path as well or you
> >> will miss packets, see mac_rx_srs_poll_ring(). I didn't see that
> >> covered in your design doc or code. mac_rx() covers the interrupt
> >> path only.
> >
> > Good point. Thanks for the reference. It took a while, but I was
> > able to reverse-engineer a good bit of that code path. (I don't know
> > of a design document that covers this; my sparse notes in case anyone
> > is interested are below.)
> >
> > It seems that the right place to hook this is in mac_rx_srs_drain and
> > mac_rx_srs_drain_bw, at approximately the same point they do the
> > mac_promisc_client_dispatch. I don't want to mess with the polling
> > functions up above.
>
> The drain routines can be invoked for both interrupt and polling
> data-path. So having the hooks in both the drain routines and the
> interrupt-driven mac_rx() could cause some packets to be seen by the
> bridge twice. You cannot use the drain routines alone, because when L2
> classification kicks in, they will only see the packets for their
> corresponding MAC clients. The combination of hook in
> mac_rx_srs_poll_ring() (near mac_promisc_dispatch()) and mac_rx()
> would cover all cases.

Ah, ok. mac_rx_srs_process is the issue here. I could use
mac_rx_srs_poll_ring instead; that makes sense now.

> Another issue with polling in the context of bridging is that packets
> in the polling path will be processed from the context of the polling
> thread, and that polling thread may not pull more packets from a
> receive ring if there is backlog on the MAC client, or if bandwidth
> control kicks in. Bridged packets could therefore be throttled by the
> MAC client which owns the receive rings from which the packets are
> received.

Ew. Yes, that does sound a lot less than desirable. It would make
traffic that's transiting the box dependent on the behavior of local
MAC receivers and (worse) potentially hold up BPDU messages.

> An alternative would be to disable polling of the default rings when
> bridging is enabled on a NIC. In this case packets would be always
> coming in through the interrupt path, and the bridge would have a
> chance to process the packets before they are dispatched to clients
> SRS'es. The major drawback is that we would lose the performance
> benefits of polling when a bridge is enabled.

We're already losing a fair bit of performance when bridging is
enabled -- the interfaces are required to be in promiscuous mode,
after all. I doubt that this is important, particularly if it means
we have to sacrifice correctness to get it.

> We would have to provide
> some code you could call from mac_bridge_set() and mac_bridge_clear(),
> but this would allow you to not have to worry about hooks in the
> polling path.

OK ... so I'll look into how to disable polling and force interrupt
mode for everything. It looks like that should be possible to do, but
it's somewhat unclear in the code.

-- 
James Carlson, Solaris Networking <james.d.carlson at sun.com>
Sun Microsystems / 35 Network Drive        71.232W   Vox +1 781 442 2084
MS UBUR02-212 / Burlington MA 01803-2757   42.496N   Fax +1 781 442 1677
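
[Editorial sketch] The throttling concern discussed above (transit traffic, including BPDUs, waiting on a local MAC client's backlog) can be illustrated with a toy model of a receive ring in poll mode. Everything here is an assumption made for illustration: toy_ring_t, toy_arrive() and toy_poll() are hypothetical, and the backlog check is a simplified stand-in for whatever the real polling thread does.

```c
#include <assert.h>
#include <stddef.h>

/* Toy model only: hypothetical names, not the real polling code. */

typedef struct {
	int ring_len;       /* packets waiting in the receive ring */
	int ring_cap;       /* ring capacity; arrivals beyond this are dropped */
	int client_backlog; /* packets already queued for the owning MAC client */
	int backlog_limit;  /* the poll thread stops pulling at or above this */
	int dropped;        /* arrivals lost because the ring was full */
} toy_ring_t;

/* A packet (local or transit) arrives while the ring is in poll mode. */
void
toy_arrive(toy_ring_t *r)
{
	assert(r != NULL);
	if (r->ring_len >= r->ring_cap)
		r->dropped++;
	else
		r->ring_len++;
}

/*
 * One pass of the polling thread. Returns the number of packets pulled,
 * which is zero whenever the owning client is backlogged.
 */
int
toy_poll(toy_ring_t *r)
{
	int pulled;

	assert(r != NULL);
	if (r->client_backlog >= r->backlog_limit)
		return (0);
	pulled = r->ring_len;
	r->ring_len = 0;
	return (pulled);
}
```

While client_backlog stays at or above backlog_limit, toy_poll() pulls nothing, so packets that only need to transit the box sit in the ring and new arrivals are eventually dropped once the ring fills.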
Nicolas Droux
2009-Apr-15 23:09 UTC
[crossbow-discuss] [networking-discuss] Solaris Bridging design review [18-25 March 2009]
James Carlson wrote:
>> We would have to provide
>> some code you could call from mac_bridge_set() and mac_bridge_clear(),
>> but this would allow you to not have to worry about hooks in the
>> polling path.
>
> OK ... so I'll look into how to disable polling and force interrupt
> mode for everything. It looks like that should be possible to do, but
> it's somewhat unclear in the code.

We can provide you the functions to enable/disable polling that you can
call from the bridging code. Someone on our team will look into the best
approach (either calling existing code or provide a new API) and
follow-up separately.

Nicolas.

-- 
Nicolas Droux - Solaris Kernel Networking - Sun Microsystems, Inc.
nicolas.droux at sun.com - http://blogs.sun.com/droux
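
[Editorial sketch] One possible shape for the enable/disable arrangement described above is a pair of calls that turn polling off when a bridge is attached and restore the previous setting when it is removed. The API had not been defined at this point in the thread, so everything below is hypothetical: toy_mac_t, toy_bridge_set(), toy_bridge_clear() and toy_rx_path() are illustrative stand-ins, not real MAC-layer functions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model only: hypothetical names, not a real or proposed API. */

typedef enum { TOY_INTERRUPT, TOY_POLL } toy_path_t;

typedef struct {
	bool polling_allowed; /* may the default rings switch to poll mode? */
	bool saved_polling;   /* setting to restore when the bridge is removed */
	bool bridged;         /* is a bridge currently attached? */
} toy_mac_t;

/* Attach a bridge: remember the polling setting, then force interrupts. */
void
toy_bridge_set(toy_mac_t *m)
{
	assert(m != NULL);
	if (m->bridged)
		return;
	m->saved_polling = m->polling_allowed;
	m->polling_allowed = false;
	m->bridged = true;
}

/* Detach the bridge: restore whatever polling setting was in effect. */
void
toy_bridge_clear(toy_mac_t *m)
{
	assert(m != NULL);
	if (!m->bridged)
		return;
	m->polling_allowed = m->saved_polling;
	m->bridged = false;
}

/* Which path delivers packets under the given load in this model? */
toy_path_t
toy_rx_path(const toy_mac_t *m, bool high_load)
{
	assert(m != NULL);
	return ((m->polling_allowed && high_load) ? TOY_POLL : TOY_INTERRUPT);
}
```

Saving and restoring the earlier setting, rather than unconditionally re-enabling polling on clear, is a design choice of this sketch; it keeps the bridge from overriding a configuration where polling was already off.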
James Carlson
2009-Apr-16 14:06 UTC
[crossbow-discuss] [networking-discuss] Solaris Bridging design review [18-25 March 2009]
Nicolas Droux writes:
> James Carlson wrote:
> > OK ... so I'll look into how to disable polling and force interrupt
> > mode for everything. It looks like that should be possible to do, but
> > it's somewhat unclear in the code.
>
> We can provide you the functions to enable/disable polling that you can
> call from the bridging code. Someone on our team will look into the best
> approach (either calling existing code or provide a new API) and
> follow-up separately.

OK; thanks. That'll help a good bit, as I'm likely to get hurt in
there.

If you want, you're welcome to look at and contribute directly to the
rbridges-on gate. (That way, you can deliver the feature along with
an actual consumer, rather than trying to deliver to ON with no
consumer.)

My sense of it now is that it's important to get right, but likely not
a gating item in practice. The reason is that:

  (a) The polling feature gets invoked in and is intended for
      situations where we're compute-bound due to network traffic on
      very high speed interfaces, and we need that incremental savings
      gained by avoiding interrupts. Those are situations where the
      user is very performance sensitive and is trying to run right to
      the edge of what the hardware can do. Those are exactly the
      users who will not want to use bridging, at least because of the
      cost of having all interfaces forced into promiscuous mode all
      of the time. You do this on e1000g, but not nxge.

  (b) The failure mode that occurs effectively just results in packet
      loss when we're driven into polling mode. Packet loss under
      very high utilization is unfortunate, but at some point, it's
      completely unavoidable. This problem just makes it happen a
      little earlier in the performance curve. (And it's unclear how
      much earlier, and whether with reasonable configurations it
      happens at all.)

It might not actually be a blocker.

-- 
James Carlson, Solaris Networking <james.d.carlson at sun.com>
Sun Microsystems / 35 Network Drive        71.232W   Vox +1 781 442 2084
MS UBUR02-212 / Burlington MA 01803-2757   42.496N   Fax +1 781 442 1677