[Bcc'ed driver-discuss at opensolaris.org and networking-discuss at opensolaris.org]

I am pleased to announce the availability of the first revision of the "Crossbow APIs for Device Drivers" document, available at the following location:

http://opensolaris.org/os/project/crossbow/Docs/crossbow-drivers.pdf

The document focuses on the new driver APIs that were introduced as part of the Crossbow project, specifically support for multiple receive and transmit rings, virtualization, dynamic polling, and multiple factory MAC addresses. It includes a detailed description of the current version of the APIs and should be useful to device driver writers who want to port their drivers to the Crossbow APIs for integration in OpenSolaris.

Note that, as is the case for GLDv3, these APIs are not yet committed and should be used only by drivers that are part of the ON consolidation.

Other Crossbow documents, use cases, man pages, papers, etc., are available from http://opensolaris.org/os/project/crossbow/Docs

Nicolas.

--
Nicolas Droux - Solaris Kernel Networking - Sun Microsystems, Inc.
droux at sun.com - http://blogs.sun.com/droux
Andrew Gallatin
2009-May-06 12:49 UTC
[crossbow-discuss] [networking-discuss] Crossbow APIs for device drivers
Nicolas Droux wrote:
> I am pleased to announce the availability of the first revision of the
> "Crossbow APIs for Device Drivers" document, available at the following
> location: [...]

I recently ported a 10GbE driver to Crossbow. My driver currently has a single ring group and a configurable number of rings. The NIC hashes received traffic to the rings in hardware.

I'm having a strange issue which I do not see in the non-Crossbow version of the driver. When I run TCP benchmarks, I'm seeing what seems like packet loss. Specifically, netstat shows tcpInUnorderBytes and tcpInDupBytes increasing at a rapid rate, and bandwidth is terrible (~1 Gb/s for Crossbow vs. 7 Gb/s non-Crossbow on the same box with the same OS revision).

The first thing I suspected was that packets were getting dropped due to my having the wrong generation number, but a dtrace probe doesn't show any drops there.

Now I'm wondering if perhaps the interrupt handler is in the middle of a call to mac_rx_ring() when interrupts are disabled. Am I supposed to ensure that my interrupt handler is not calling mac_rx_ring() before my rx_ring_intr_disable() routine returns? Or does the mac layer serialize this?

Thanks,

Drew
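The interleaving being asked about here can be modeled as a short sketch (Python purely for illustration; only mac_rx_ring() and rx_ring_intr_disable() are interface names from this thread, everything else — Ring, racy_run, the packet numbers — is invented for the sketch):

```python
# Toy model of the suspected race: the interrupt handler has dequeued a
# chain but has not yet called mac_rx_ring() when the MAC layer disables
# interrupts and starts polling. Nothing here is real Solaris code.

class Ring:
    def __init__(self, packets):
        self.queue = list(packets)   # packets waiting in the rx ring
        self.delivered = []          # what the upper stack sees, in order

    def mac_rx_ring(self, chain):    # stand-in for the real mac_rx_ring()
        self.delivered.extend(chain)

def racy_run():
    ring = Ring([1, 2, 3, 4, 5, 6])
    # Interrupt handler dequeues a chain...
    intr_chain = [ring.queue.pop(0) for _ in range(3)]
    # ...but before it delivers, rx_ring_intr_disable() returns and the
    # poll thread drains the rest of the ring and delivers first.
    poll_chain = [ring.queue.pop(0) for _ in range(3)]
    ring.mac_rx_ring(poll_chain)
    # The interrupted handler finally delivers its older chain.
    ring.mac_rx_ring(intr_chain)
    return ring.delivered

print(racy_run())  # [4, 5, 6, 1, 2, 3]
```

If this interleaving is possible, the upper stack sees exactly the out-of-order segments that netstat reports.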
Andrew Gallatin
2009-May-06 16:28 UTC
[crossbow-discuss] [networking-discuss] Crossbow APIs for device drivers
Andrew Gallatin wrote:
> Now I'm wondering if perhaps the interrupt handler is in the middle of
> a call to mac_rx_ring() when interrupts are disabled. Am I supposed to
> ensure that my interrupt handler is not calling mac_rx_ring() before my
> rx_ring_intr_disable() routine returns? Or does the mac layer serialize
> this?

I'm still trying to figure this out. The code is just so large that I'm having a hard time figuring out the big picture.

I have discovered that if I use the dladm create-vnic trick, which disables polling, the out-of-order problem goes away. I guess this implies that there is some problem with synchronization between the interrupt routine and the polling routine.

My driver looks quite a bit like the bge2 driver, in that my interrupt handler builds an mblk chain while holding an rx ring's lock, and then drops the lock before calling mac_rx_ring(). I tried changing the code to hold the lock around mac_rx_ring(), and I still see TCP complaining of out-of-order and duplicate packets. So perhaps things are being dropped..?

Please help..

Drew
Nicolas Droux
2009-May-06 23:06 UTC
[crossbow-discuss] [networking-discuss] Crossbow APIs for device drivers
On May 6, 2009, at 5:49 AM, Andrew Gallatin wrote:
> Now I'm wondering if perhaps the interrupt handler is in the middle of
> a call to mac_rx_ring() when interrupts are disabled. Am I supposed to
> ensure that my interrupt handler is not calling mac_rx_ring() before my
> rx_ring_intr_disable() routine returns? Or does the mac layer serialize
> this?

Can you reproduce the problem with only one RX ring enabled? If so, something to try would be to bind the poll thread to the same CPU as the MSI for that single RX ring. To find the CPU the MSI is bound to, run ::interrupts from mdb, then assign that CPU to the poll thread by doing a "dladm setlinkprop -p cpus=<cpuid> <link>".

There might be a race between the poll thread and the thread trying to deliver the chain through mac_rx_ring() from interrupt context, since we currently don't rebind the MSIs to the same CPUs as their corresponding poll threads. We are planning to do the rebinding of MSIs, but we depend on interrupt rebinding APIs which are still being worked on. The experiment above would allow us to confirm whether this is the issue seen here or whether we need to look somewhere else.

BTW, which ONNV build are you currently using?

Nicolas.

--
Nicolas Droux - Solaris Kernel Networking - Sun Microsystems, Inc.
nicolas.droux at sun.com - http://blogs.sun.com/droux
Andrew Gallatin
2009-May-07 00:22 UTC
[crossbow-discuss] [networking-discuss] Crossbow APIs for device drivers
Nicolas Droux wrote:
> Can you reproduce the problem with only one RX ring enabled? If so,

Yes, easily.

> something to try would be to bind the poll thread to the same CPU as
> the MSI for that single RX ring. To find the CPU the MSI is bound to,
> run ::interrupts from mdb, then assign that CPU to the poll thread by
> doing a "dladm setlinkprop -p cpus=<cpuid> <link>".

That helps quite a bit. For comparison, with no binding at all, it looks like this (~1 Gb/s):

TCP     tcpRtoAlgorithm     =     0     tcpRtoMin           =   400
        tcpRtoMax           = 60000     tcpMaxConn          =    -1
        tcpActiveOpens      =     0     tcpPassiveOpens     =     0
        tcpAttemptFails     =     0     tcpEstabResets      =     0
        tcpCurrEstab        =     5     tcpOutSegs          = 17456
        tcpOutDataSegs      =    21     tcpOutDataBytes     =  2272
        tcpRetransSegs      =     0     tcpRetransBytes     =     0
        tcpOutAck           = 17435     tcpOutAckDelayed    =     0
        tcpOutUrg           =     0     tcpOutWinUpdate     =     0
        tcpOutWinProbe      =     0     tcpOutControl       =     0
        tcpOutRsts          =     0     tcpOutFastRetrans   =     0
        tcpInSegs           =124676
        tcpInAckSegs        =    21     tcpInAckBytes       =  2272
        tcpInDupAck         =   412     tcpInAckUnsent      =     0
        tcpInInorderSegs    =122654     tcpInInorderBytes   =175240560
        tcpInUnorderSegs    =   125     tcpInUnorderBytes   =152184
        tcpInDupSegs        =   412     tcpInDupBytes       =590976
        tcpInPartDupSegs    =     0     tcpInPartDupBytes   =     0
        tcpInPastWinSegs    =     0     tcpInPastWinBytes   =     0
        tcpInWinProbe       =     0     tcpInWinUpdate      =     0
        tcpInClosed         =     0     tcpRttNoUpdate      =     0
        tcpRttUpdate        =    21     tcpTimRetrans       =     0
        tcpTimRetransDrop   =     0     tcpTimKeepalive     =     0
        tcpTimKeepaliveProbe=     0     tcpTimKeepaliveDrop =     0
        tcpListenDrop       =     0     tcpListenDropQ0     =     0
        tcpHalfOpenDrop     =     0     tcpOutSackRetrans   =     0

After doing the binding, I'm seeing fewer out-of-order packets. netstat -s -P tcp 1 now looks like this (~4 Gb/s):

TCP     tcpRtoAlgorithm     =     0     tcpRtoMin           =   400
        tcpRtoMax           = 60000     tcpMaxConn          =    -1
        tcpActiveOpens      =     0     tcpPassiveOpens     =     0
        tcpAttemptFails     =     0     tcpEstabResets      =     0
        tcpCurrEstab        =     5     tcpOutSegs          = 46865
        tcpOutDataSegs      =     3     tcpOutDataBytes     =  1600
        tcpRetransSegs      =     0     tcpRetransBytes     =     0
        tcpOutAck           = 46869     tcpOutAckDelayed    =     0
        tcpOutUrg           =     0     tcpOutWinUpdate     =    19
        tcpOutWinProbe      =     0     tcpOutControl       =     0
        tcpOutRsts          =     0     tcpOutFastRetrans   =     0
        tcpInSegs           =372387
        tcpInAckSegs        =     3     tcpInAckBytes       =  1600
        tcpInDupAck         =    33     tcpInAckUnsent      =     0
        tcpInInorderSegs    =372264     tcpInInorderBytes   =527482971
        tcpInUnorderSegs    =    14     tcpInUnorderBytes   = 18806
        tcpInDupSegs        =    33     tcpInDupBytes       = 46591
        tcpInPartDupSegs    =     0     tcpInPartDupBytes   =     0
        tcpInPastWinSegs    =     0     tcpInPastWinBytes   =     0
        tcpInWinProbe       =     0     tcpInWinUpdate      =     0
        tcpInClosed         =     0     tcpRttNoUpdate      =     0
        tcpRttUpdate        =     3     tcpTimRetrans       =     0
        tcpTimRetransDrop   =     0     tcpTimKeepalive     =     0
        tcpTimKeepaliveProbe=     0     tcpTimKeepaliveDrop =     0
        tcpListenDrop       =     0     tcpListenDropQ0     =     0
        tcpHalfOpenDrop     =     0     tcpOutSackRetrans   =     0

And the old version of the driver, which does not use the new Crossbow interfaces:

TCP     tcpRtoAlgorithm     =     0     tcpRtoMin           =   400
        tcpRtoMax           = 60000     tcpMaxConn          =    -1
        tcpActiveOpens      =     0     tcpPassiveOpens     =     0
        tcpAttemptFails     =     0     tcpEstabResets      =     0
        tcpCurrEstab        =     5     tcpOutSegs          = 55231
        tcpOutDataSegs      =     3     tcpOutDataBytes     =  1600
        tcpRetransSegs      =     0     tcpRetransBytes     =     0
        tcpOutAck           = 55228     tcpOutAckDelayed    =     0
        tcpOutUrg           =     0     tcpOutWinUpdate     =   465
        tcpOutWinProbe      =     0     tcpOutControl       =     0
        tcpOutRsts          =     0     tcpOutFastRetrans   =     0
        tcpInSegs           =438394
        tcpInAckSegs        =     3     tcpInAckBytes       =  1600
        tcpInDupAck         =     0     tcpInAckUnsent      =     0
        tcpInInorderSegs    =438392     tcpInInorderBytes   =617512374
        tcpInUnorderSegs    =     0     tcpInUnorderBytes   =     0
        tcpInDupSegs        =     0     tcpInDupBytes       =     0
        tcpInPartDupSegs    =     0     tcpInPartDupBytes   =     0
        tcpInPastWinSegs    =     0     tcpInPastWinBytes   =     0
        tcpInWinProbe       =     0     tcpInWinUpdate      =     0
        tcpInClosed         =     0     tcpRttNoUpdate      =     0
        tcpRttUpdate        =     3     tcpTimRetrans       =     0
        tcpTimRetransDrop   =     0     tcpTimKeepalive     =     0
        tcpTimKeepaliveProbe=     0     tcpTimKeepaliveDrop =     0
        tcpListenDrop       =     0     tcpListenDropQ0     =     0
        tcpHalfOpenDrop     =     0     tcpOutSackRetrans   =     0

> There might be a race between the poll thread and the thread trying to
> deliver the chain through mac_rx_ring() from interrupt context [...]
>
> BTW, which ONNV build are you currently using?

SunOS dell1435a 5.11 snv_111a i86pc i386 i86pc

This is an OpenSolaris 2009.06 machine (updated from snv_84). I can try BFU'ing a different machine to a later build (once I've BFU'ed it to 111a so as to repro it there). I'm traveling, and won't have a chance to try that test until next week.

BTW, did you see my earlier message on networking-discuss (http://mail.opensolaris.org/pipermail/networking-discuss/2009-April/010979.html)? That was with the pre-crossbow version of the driver.

Cheers,

Drew
Nicolas Droux
2009-May-11 19:17 UTC
[crossbow-discuss] [networking-discuss] Crossbow APIs for device drivers
Andrew,

Thanks for the additional info. We'd like to verify that interrupts are getting disabled from interrupt context itself. If you don't mind, could you gather an aggregation of the callers of mac_hwring_disable_intr() during one of your runs? You should be able to do this with:

dtrace -n fbt::mac_hwring_disable_intr:entry'{@[stack()] = count()}'

Thanks,

Nicolas.

On May 6, 2009, at 6:22 PM, Andrew Gallatin wrote:
> That helps quite a bit. For comparison, with no binding at all, it
> looks like this (~1 Gb/s): [...]
>
> SunOS dell1435a 5.11 snv_111a i86pc i386 i86pc
>
> This is an OpenSolaris 2009.06 machine (updated from snv_84).
>
> I can try BFU'ing a different machine to a later build (once I've
> BFU'ed it to 111a so as to repro it there). I'm traveling, and won't
> have a chance to try that test until next week. [...]

--
Nicolas Droux - Solaris Kernel Networking - Sun Microsystems, Inc.
nicolas.droux at sun.com - http://blogs.sun.com/droux
Andrew Gallatin
2009-May-11 19:57 UTC
[crossbow-discuss] [networking-discuss] Crossbow APIs for device drivers
Nicolas Droux wrote:
> Thanks for the additional info. We'd like to verify that interrupts are
> getting disabled from interrupt context itself. If you don't mind,
> could you gather an aggregation of the callers of
> mac_hwring_disable_intr() during one of your runs?

Yes, they're all coming from interrupt context, via my rx interrupt handler:

dtrace: description 'fbt::mac_hwring_disable_intr:entry' matched 1 probe
^C

              mac`mac_rx_srs_drain+0x359
              mac`mac_rx_srs_process+0x1db
              mac`mac_rx+0x94
              mac`mac_rx_ring+0x4c
              myri10ge`myri10ge_intr_rx+0x70
              myri10ge`myri10ge_intr+0xa2
              unix`av_dispatch_autovect+0x7c
              unix`dispatch_hardint+0x33
              unix`switch_sp_and_call+0x13
            66481

FWIW, I was able to reproduce the problem on another machine running build 111. I BFU'ed to 112, 113, and 114, and I see the same thing in each build.

I think that I must be doing something wrong in my driver. Is there some sort of counter or dtrace script that one can use to track down out-of-order packets?

I'm not sure if this is helpful, but the 2 stacks I see for the increment of mib:::tcpInDataUnorderSegs are:

# dtrace -n mib:::tcpInDataUnorderSegs'{@[stack()] = count()}'
dtrace: description 'mib:::tcpInDataUnorderSegs' matched 1 probe
^C

              ip`tcp_rput_data+0xdd0
              ip`squeue_enter+0x330
              ip`ip_input+0xc17
              mac`mac_rx_soft_ring_drain+0xdf
              mac`mac_soft_ring_worker+0x111
              unix`thread_start+0x8
             1710

              ip`tcp_rput_data+0xdd0
              ip`squeue_drain+0x179
              ip`squeue_enter+0x3f4
              ip`ip_input+0xc17
              mac`mac_rx_soft_ring_drain+0xdf
              mac`mac_soft_ring_worker+0x111
              unix`thread_start+0x8
            195335

Thanks,

Drew
Andrew Gallatin
2009-May-11 20:51 UTC
[crossbow-discuss] [networking-discuss] Crossbow APIs for device drivers
I found it! It was entirely my bug.

I had code which would automatically flush any chain of more than X packets up through mac_rx(). This code would occasionally trigger in polling mode and cause chaos. Ripping it out fixes the problem.

Drew
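The failure mode described here — a second, unsynchronized delivery path firing while the poll thread owns the ring — can be sketched like this (illustrative Python; the threshold, packet numbers, and everything except the mac_rx() name are invented):

```python
# Toy model of the bug: the driver flushed long chains up through
# mac_rx() even while the MAC layer was polling the ring, giving the
# ring two unsynchronized delivery paths.

FLUSH_THRESHOLD = 3  # hypothetical stand-in for the driver's "X"

def run(buggy):
    # Packets arrive in order 1..6 while the ring is in polling mode.
    ring = [1, 2, 3, 4, 5, 6]
    delivered = []
    # Buggy path: the chain exceeds the threshold, so the driver pushes
    # it up via mac_rx() even though the poll thread now owns the ring.
    flush_chain = ring[:FLUSH_THRESHOLD] if buggy else []
    ring = ring[len(flush_chain):]
    # The poll thread drains and delivers what is left first...
    delivered += ring
    # ...and the unsynchronized flush lands afterwards, out of order.
    delivered += flush_chain
    return delivered

print(run(buggy=True))   # [4, 5, 6, 1, 2, 3] -- the reordering TCP reported
print(run(buggy=False))  # [1, 2, 3, 4, 5, 6] -- in order once the flush is gone
```

With the flush removed, the poll thread is the only consumer, so delivery order matches arrival order.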
Nicolas Droux
2009-May-11 21:12 UTC
[crossbow-discuss] [networking-discuss] Crossbow APIs for device drivers
On May 11, 2009, at 1:57 PM, Andrew Gallatin wrote:
> Yes, they're all coming from interrupt context, via my rx interrupt
> handler:
>
>               mac`mac_rx_srs_drain+0x359
>               mac`mac_rx_srs_process+0x1db
>               mac`mac_rx+0x94
>               mac`mac_rx_ring+0x4c
>               myri10ge`myri10ge_intr_rx+0x70
>               myri10ge`myri10ge_intr+0xa2
>               unix`av_dispatch_autovect+0x7c
>               unix`dispatch_hardint+0x33
>               unix`switch_sp_and_call+0x13
>             66481

OK, that's good to know.

> FWIW, I was able to reproduce the problem on another machine running
> build 111. I BFU'ed to 112, 113, and 114, and I see the same thing in
> each build.
>
> I think that I must be doing something wrong in my driver.

I wouldn't rule this out given the results of these experiments. One key requirement is that the driver must not pass new packets through mac_rx_ring() for a ring once its interrupt disable entry point for that ring has returned. Otherwise the mac_rx_ring() thread can race against the poll thread, which can cause packet reordering. This could be the issue here.

The driver is required to implement the locking needed for this. If the driver needs to change state from the interrupt disable entry point and query that state from the interrupt handler before invoking mac_rx_ring(), both pieces of code need to protect their access to that state via a common lock.

Nicolas.

--
Nicolas Droux - Solaris Kernel Networking - Sun Microsystems, Inc.
nicolas.droux at sun.com - http://blogs.sun.com/droux
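The common-lock contract described above can be sketched as follows (Python with a threading.Lock standing in for the driver's mutex; the entry-point names mirror this thread, the bodies are invented for illustration):

```python
import threading

class RxRing:
    """Toy model of one rx ring and the locking contract: once
    rx_ring_intr_disable() returns, the interrupt path must no longer
    deliver packets upstream."""

    def __init__(self):
        self.lock = threading.Lock()   # stands in for the driver's mutex
        self.poll_on = False           # True while the poll thread owns the ring
        self.delivered = []

    def rx_ring_intr_disable(self):
        # Taking the same lock as the interrupt handler guarantees that
        # any in-flight delivery has finished before this returns.
        with self.lock:
            self.poll_on = True

    def rx_ring_intr_enable(self):
        with self.lock:
            self.poll_on = False

    def intr_handler(self, chain):
        # Check the shared state and deliver under one lock, so
        # rx_ring_intr_disable() cannot return mid-delivery.
        with self.lock:
            if self.poll_on:
                return False               # poll thread owns the ring now
            self.delivered.extend(chain)   # stands in for mac_rx_ring()
            return True
```

Usage: intr_handler() delivers while interrupts are logically enabled, and refuses once the disable entry point has returned, which is the ordering guarantee the MAC layer relies on.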