Kais Belgaied
2008-Feb-21 22:18 UTC
[crossbow-discuss] Reclaiming transmit descriptors by NIC drivers with Crossbow new scheduling
The following is mainly a capture of parts of multiple off-line discussions among members of the Crossbow team (Gopi, Thiru, Roamer, May-Lin, Thirumailai, Nitin, KB, ...); I thought I'd open it up to other participants.

Crossbow's core scheduling involves switching a NIC (or individual Rx rings on the NIC) to polling mode. The receive interrupt becomes not only rarer but, more importantly, outside the control of the NIC drivers. Some drivers, on the other hand, were designed and written before that change. They used to piggy-back the Tx descriptor reclaiming at the end of the Rx interrupt, for example. At the same time, they disable the transmit interrupt altogether, in an effort to minimize the number of interrupts to the host (and the context switches they entail). As expected, that sort of approach will lead (and was actually observed to lead) to quick exhaustion of the Tx descriptors, with no one to reclaim them soon enough under polling mode.

Multiple choices have been suggested, and some actually implemented in some drivers:
. A timeout thread that just reclaims periodically
. Reclaim some descriptors at the beginning of, or before returning from, the tx routine
. Turn on tx interrupts per transmit ring
. Have a single NIC-wide interrupt shared for all tx rings' transmit completion events, and turn that interrupt ON
. Have a high water mark/low water mark for triggering the reclaiming from the transmit routine
. Some internal tunables to choose one or a combination of the approaches above (how do we detect the right time to change the tunables' settings?)

I am wondering what the experience of folks on this alias has been with the performance gain or penalty from the various choices above, or whether they used a different way to reclaim efficiently and promptly.

ideas? comments?

Kais
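[Editor's note: the high/low watermark option above can be sketched roughly as follows. This is a toy model, not driver code: tx_ring_t, the watermark values, and the done[] completion flags are hypothetical stand-ins for a real driver's ring state and DMA-completion checks.]

```c
/*
 * Minimal model of watermark-triggered Tx reclaim: reclaim lazily
 * from the transmit routine, and only when free descriptors drop
 * to the low watermark.  All names here are illustrative.
 */
#include <stdbool.h>

#define RING_SIZE     512
#define TX_LOW_WATER   64   /* reclaim when free descriptors drop to this */
#define TX_HIGH_WATER 256   /* ...and stop once this many are free again */

typedef struct {
	int  head;              /* oldest descriptor handed to the HW */
	int  tail;              /* next descriptor software will fill */
	int  nfree;             /* free descriptors available to software */
	bool done[RING_SIZE];   /* stand-in for the HW "descriptor done" bit */
} tx_ring_t;

/* Reclaim completed descriptors until the high watermark is reached. */
static int
tx_reclaim(tx_ring_t *ring)
{
	int reclaimed = 0;

	while (ring->nfree < TX_HIGH_WATER && ring->done[ring->head]) {
		ring->done[ring->head] = false;
		ring->head = (ring->head + 1) % RING_SIZE;
		ring->nfree++;
		reclaimed++;
	}
	return (reclaimed);
}

/* Transmit entry point: reclaim only when running low. */
static bool
tx_send(tx_ring_t *ring)
{
	if (ring->nfree <= TX_LOW_WATER)
		(void) tx_reclaim(ring);

	if (ring->nfree == 0)
		return (false);     /* caller must queue or drop the packet */

	ring->tail = (ring->tail + 1) % RING_SIZE;
	ring->nfree--;
	return (true);
}
```

The appeal of this variant is that no reclaim work at all happens while descriptors are plentiful, yet exhaustion is still avoided without any interrupt.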
James Carlson
2008-Feb-21 22:50 UTC
[crossbow-discuss] Reclaiming transmit descriptors by NIC drivers with Crossbow new scheduling
Kais Belgaied writes:
> I am wondering what the experience of folks on this alias has been with
> the performance gain or penalty from the various choices above, or
> whether they used a different way to reclaim efficiently and promptly.

I don't have the experience you're seeking, but a closely related issue is pinning down creds with dblks that chill out in TX descriptor blocks.

A problem we've had with Zones is that the stream head puts on dblks with creds, those creds hold references to the zone_t, the zone_t in turn holds a reference to the root vnode_t for the zone, and thus you can't shut down or unmount a zone until the zone-related mblks floating around in the system somehow get cleaned up. If they're rotting away on a rarely-used (or perhaps even zone-specific) adapter, you may be in for a bad administrative hair day.

We need to have those creds scrubbed away as soon as possible, either by semi-aggressively reclaiming TX resources or by having the creds stripped _before_ being placed on the TX queue. (Preferably by the Nemo framework.) However, one of the thorny problems I think this raises is that, at least for TCP, the db_ref counter is often greater than 1, so removing the credp may not be a kosher move.

No, I don't know what the right fix is here, but this looks like a good place to throw the problem into the mix. ;-}

For what it's worth, and at least a bit more related to the question you're asking, I'm fairly opposed to "tunables" for these sorts of fiddly internal design details. If we can't manage to get it right under all conditions, then we need to make it at least "self-tuning" so that users don't have to tweak voodoo variables. No more "putq versus putnext" ndd flags, or their moral equivalent, please.

--
James Carlson, Solaris Networking <james.d.carlson at sun.com>
Sun Microsystems / 35 Network Drive        71.232W Vox +1 781 442 2084
MS UBUR02-212 / Burlington MA 01803-2757   42.496N Fax +1 781 442 1677
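[Editor's note: the db_ref complication above can be made concrete with a toy model. The mblk/dblk structs below are simplified stand-ins for the real STREAMS msgb/datab structures; the point is only that a shared dblk (db_ref > 1) cannot have its credential stripped in place, which is exactly the TCP case mentioned.]

```c
/*
 * Toy model of "strip creds before TX queueing".  In real STREAMS
 * code the credential lives in the dblk and may be shared by
 * several mblks (e.g. TCP's retransmit copies), so stripping it in
 * place is only safe when this mblk is the sole owner.
 */
#include <stddef.h>

typedef struct cred cred_t;     /* opaque in this sketch */

typedef struct dblk {
	int     db_ref;         /* number of mblks sharing this dblk */
	cred_t *db_credp;       /* credential pinning the zone */
} dblk_t;

typedef struct mblk {
	dblk_t *b_datap;
} mblk_t;

/*
 * Returns 1 if the mblk carries no cred after the call, 0 if the
 * dblk was shared and the cred had to be left in place.
 */
static int
tx_strip_cred(mblk_t *mp)
{
	dblk_t *dbp = mp->b_datap;

	if (dbp->db_credp == NULL)
		return (1);     /* nothing to do */
	if (dbp->db_ref > 1)
		return (0);     /* shared: unsafe to strip in place */
	dbp->db_credp = NULL;   /* real code would also crfree() it */
	return (1);
}
```

A real fix would presumably need either a copy-on-strip of the shared dblk or the aggressive TX reclaim James describes; the sketch only shows why the naive strip is conditional.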
deepti dhokte
2008-Feb-21 23:02 UTC
[crossbow-discuss] Reclaiming transmit descriptors by NIC drivers with Crossbow new scheduling
Kais Belgaied wrote:
> Multiple choices have been suggested, and some actually implemented in
> some drivers:
> . A timeout thread that just reclaims periodically

I wonder what the priority of such a garbage-collection thread would be. If it never gets scheduled, or gets scheduled less frequently than other control/data-plane threads, then we still have the issue of out-of-sync reclaims.

> . Reclaim some descriptors at the beginning of or before returning from
>   the tx routine

Would the transmitting application be blocked until the Tx is done and the Tx descriptor is recycled back to the pool, unless the recycling is deferred (tell the app the Tx is done, while the actual descriptor-recycling routine is scheduled to be executed at some later time)?

> . Turn on tx interrupts per transmit ring.
> . Have a single NIC-wide interrupt shared for all tx rings' transmit
>   completion events, and turn that interrupt ON.

No comment, as I do not understand the above.

> . Have a high water mark/low water mark for triggering the reclaiming
>   from the transmit routine.

This could be an optimization of when to trigger a reclaiming thread to collect the descriptors.

> . Some internal tunables to choose one or a combination of the
>   approaches above (how to detect the right time to change the
>   tunables' settings?)
>
> I am wondering what has been the experience of folks

I have no direct experience, but if performance is observed to be lost because of unavailability of Tx descriptors (say we observe throughput loss in the data path due to always waiting/blocking on a Tx descriptor), then having a solution that finds Tx descriptors and keeps them available for future need should buy a performance gain, I think. So a combined solution of the above ideas should help: track high and low watermarks of available descriptors, and kick the garbage-collection/Tx-descriptor thread when the number of available descriptors is <= the low watermark, running it at a higher or equal priority.

2 cents,
-Deepti
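[Editor's note: the scheduling concern raised above about the pure timeout approach can be quantified with a small simulation. The numbers and the simulate_timeout_reclaim() function are illustrative only; the model assumes hardware completes descriptors immediately, so every drop is attributable to the reclaim thread running too rarely.]

```c
/*
 * Simulation of periodic-timeout reclaim.  Each tick transmits
 * tx_per_tick packets; the hardware completes them instantly, but
 * descriptors are only recycled every reclaim_period ticks.  The
 * ring exhausts whenever reclaim_period * tx_per_tick exceeds the
 * ring size -- the "out-of-sync reclaim" failure mode.
 *
 * Returns the number of packets dropped for want of a descriptor.
 */
static int
simulate_timeout_reclaim(int ring_size, int tx_per_tick,
    int reclaim_period, int ticks)
{
	int nfree = ring_size;  /* descriptors available to software */
	int completed = 0;      /* done by HW, not yet reclaimed */
	int dropped = 0;

	for (int t = 1; t <= ticks; t++) {
		for (int i = 0; i < tx_per_tick; i++) {
			if (nfree == 0) {
				dropped++;
			} else {
				nfree--;
				completed++;
			}
		}
		if (t % reclaim_period == 0) {
			nfree += completed;  /* timeout thread finally runs */
			completed = 0;
		}
	}
	return (dropped);
}
```

With a 512-entry ring and 100 packets per tick, reclaiming every 2 ticks drops nothing, while reclaiming every 10 ticks stalls the ring for nearly half of each cycle; this is why the thread's scheduling latency, not just its existence, matters.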
Paul Durrant
2008-Feb-22 10:19 UTC
[crossbow-discuss] Reclaiming transmit descriptors by NIC drivers with Crossbow new scheduling
Kais Belgaied wrote:
> Multiple choices have been suggested, and some actually implemented in
> some drivers:
> . A timeout thread that just reclaims periodically
> . Reclaim some descriptors at the beginning of or before returning from
>   the tx routine
> . Turn on tx interrupts per transmit ring.
> . Have a single NIC-wide interrupt shared for all tx rings' transmit
>   completion events, and turn that interrupt ON.
> . Have a high water mark/low water mark for triggering the reclaiming
>   from the transmit routine.
> . Some internal tunables to choose one or a combination of the
>   approaches above (how to detect the right time to change the
>   tunables' settings?)
>
> I am wondering what has been the experience of folks on this alias with
> the performance gain or penalty from the various choices above, or if
> they used a different way to reclaim efficiently and promptly.
>
> ideas? comments?

I find that reaping descriptors (and freeing the mblk/dblk) in the TX path tends to give the lowest contention on both the STREAMS dblk caches and the CPU caches, because the data structures tend to be accessed on the same CPU, and having the transmitting thread free the mblk/dblk avoids lock contention (since it is generally this thread that allocates the blocks). Of course this tends to leave some descriptors hanging at the end of a burst of packets, but they can be cleaned up by a worker thread after a small delay.

Paul
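[Editor's note: Paul's hybrid (inline reap on the TX path, deferred worker for end-of-burst stragglers) can be sketched as follows. This is a toy model: txq_t, the timer_pending flag, and the instant-completion reap() stand in for a real driver's ring state and a timeout(9F)-style one-shot callback.]

```c
/*
 * Inline reap on the transmit path keeps mblk frees on the same CPU
 * (and thread) that allocated them; a one-shot deferred worker mops
 * up whatever the last transmit of a burst leaves outstanding.
 */
#include <stdbool.h>

typedef struct {
	int  outstanding;   /* descriptors handed to HW, not yet reaped */
	bool timer_pending; /* end-of-burst cleanup already scheduled? */
} txq_t;

/* Model: the hardware has completed everything outstanding. */
static int
reap(txq_t *q)
{
	int n = q->outstanding;

	q->outstanding = 0;
	return (n);
}

/* Transmit path: reap inline, queue one descriptor, arm the worker. */
static void
tx_one(txq_t *q)
{
	(void) reap(q);     /* frees blocks on the transmitting CPU */
	q->outstanding++;
	if (!q->timer_pending)
		q->timer_pending = true;  /* schedule delayed cleanup */
}

/* Deferred worker: clean up the tail end of the burst. */
static int
tx_worker(txq_t *q)
{
	q->timer_pending = false;
	return (reap(q));
}
```

The invariant worth noting is that at most one transmit's worth of descriptors is ever waiting on the timer, so the delayed worker stays cheap and the common-case reclaim cost is paid where the cache lines are already hot.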