[Bcc'ed driver-discuss at opensolaris.org and networking-discuss at opensolaris.org]

I am pleased to announce the availability of the first revision of the "Crossbow APIs for Device Drivers" document, available at the following location:

http://opensolaris.org/os/project/crossbow/Docs/crossbow-drivers.pdf

The document focuses on the new driver APIs that were introduced as part of the Crossbow project, specifically support for multiple receive and transmit rings, virtualization, dynamic polling, and multiple factory MAC addresses. It includes a detailed description of the current version of the APIs and should be useful to device driver writers who want to port their drivers to the Crossbow APIs for integration in OpenSolaris.

Note that, as is the case for GLDv3, these APIs are not yet committed and should be used only by drivers that are part of the ON consolidation.

Other Crossbow documents, use cases, man pages, papers, etc., are available from http://opensolaris.org/os/project/crossbow/Docs

Nicolas.

--
Nicolas Droux - Solaris Kernel Networking - Sun Microsystems, Inc.
droux at sun.com - http://blogs.sun.com/droux
Andrew Gallatin
2009-May-06 12:49 UTC
[crossbow-discuss] [networking-discuss] Crossbow APIs for device drivers
Nicolas Droux wrote:
> I am pleased to announce the availability of the first revision of the
> "Crossbow APIs for Device Drivers" document, available at the following
> location: [...]

I recently ported a 10GbE driver to Crossbow. My driver currently has a single ring group and a configurable number of rings. The NIC hashes received traffic to the rings in hardware.

I'm having a strange issue which I do not see in the non-Crossbow version of the driver. When I run TCP benchmarks, I'm seeing what seems like packet loss. Specifically, netstat shows tcpInUnorderBytes and tcpInDupBytes increasing at a rapid rate, and bandwidth is terrible (~1 Gb/s for Crossbow vs. 7 Gb/s non-Crossbow on the same box with the same OS revision).

The first thing I suspected was that packets were getting dropped due to my having the wrong generation number, but a dtrace probe doesn't show any drops there.

Now I'm wondering if perhaps the interrupt handler is in the middle of a call to mac_rx_ring() when interrupts are disabled. Am I supposed to ensure that my interrupt handler is not calling mac_rx_ring() before my rx_ring_intr_disable() routine returns? Or does the mac layer serialize this?

Thanks,

Drew
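The interleaving being asked about here can be modeled as a short sketch (Python purely for illustration; only mac_rx_ring() and rx_ring_intr_disable() are interface names from this thread, everything else — Ring, racy_run, the packet numbers — is invented for the sketch):

```python
# Toy model of the suspected race: the interrupt handler has dequeued a
# chain but has not yet called mac_rx_ring() when the MAC layer disables
# interrupts and starts polling. Nothing here is real Solaris code.

class Ring:
    def __init__(self, packets):
        self.queue = list(packets)   # packets waiting in the rx ring
        self.delivered = []          # what the upper stack sees, in order

    def mac_rx_ring(self, chain):    # stand-in for the real mac_rx_ring()
        self.delivered.extend(chain)

def racy_run():
    ring = Ring([1, 2, 3, 4, 5, 6])
    # Interrupt handler dequeues a chain...
    intr_chain = [ring.queue.pop(0) for _ in range(3)]
    # ...but before it delivers, rx_ring_intr_disable() returns and the
    # poll thread drains the rest of the ring and delivers first.
    poll_chain = [ring.queue.pop(0) for _ in range(3)]
    ring.mac_rx_ring(poll_chain)
    # The interrupted handler finally delivers its older chain.
    ring.mac_rx_ring(intr_chain)
    return ring.delivered

print(racy_run())  # [4, 5, 6, 1, 2, 3]
```

If this interleaving is possible, the upper stack sees exactly the out-of-order segments that netstat reports.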
Andrew Gallatin
2009-May-06 16:28 UTC
[crossbow-discuss] [networking-discuss] Crossbow APIs for device drivers
Andrew Gallatin wrote:
> Now I'm wondering if perhaps the interrupt handler is in the middle of
> a call to mac_rx_ring() when interrupts are disabled. Am I supposed to
> ensure that my interrupt handler is not calling mac_rx_ring() before my
> rx_ring_intr_disable() routine returns? Or does the mac layer serialize
> this?

I'm still trying to figure this out. The code is just so large that I'm having a hard time figuring out the big picture.

I have discovered that if I use the dladm create-vnic trick, which disables polling, the out-of-order problem goes away. I guess this implies that there is some problem with synchronization between the interrupt routine and the polling routine.

My driver looks quite a bit like the bge2 driver, in that my interrupt handler builds an mblk chain while holding an rx ring's lock, and then drops the lock before calling mac_rx_ring(). I tried changing the code to hold the lock around mac_rx_ring(), and I still see TCP complaining of out-of-order and duplicate packets. So perhaps things are being dropped..?

Please help..

Drew
Nicolas Droux
2009-May-06 23:06 UTC
[crossbow-discuss] [networking-discuss] Crossbow APIs for device drivers
On May 6, 2009, at 5:49 AM, Andrew Gallatin wrote:
> Now I'm wondering if perhaps the interrupt handler is in the middle of
> a call to mac_rx_ring() when interrupts are disabled. Am I supposed to
> ensure that my interrupt handler is not calling mac_rx_ring() before my
> rx_ring_intr_disable() routine returns? Or does the mac layer serialize
> this?

Can you reproduce the problem with only one RX ring enabled? If so, something to try would be to bind the poll thread to the same CPU as the MSI for that single RX ring. To find the CPU the MSI is bound to, run ::interrupts from mdb, then assign that CPU to the poll thread by doing a "dladm setlinkprop -p cpus=<cpuid> <link>".

There might be a race between the poll thread and the thread trying to deliver the chain through mac_rx_ring() from interrupt context, since we currently don't rebind the MSIs to the same CPUs as their corresponding poll threads. We are planning to do the rebinding of MSIs, but we depend on interrupt rebinding APIs which are still being worked on. The experiment above would allow us to confirm whether this is the issue seen here or whether we need to look somewhere else.

BTW, which ONNV build are you currently using?

Nicolas.

--
Nicolas Droux - Solaris Kernel Networking - Sun Microsystems, Inc.
nicolas.droux at sun.com - http://blogs.sun.com/droux
Andrew Gallatin
2009-May-07 00:22 UTC
[crossbow-discuss] [networking-discuss] Crossbow APIs for device drivers
Nicolas Droux wrote:
> Can you reproduce the problem with only one RX ring enabled? If so,

Yes, easily.

> something to try would be to bind the poll thread to the same CPU as
> the MSI for that single RX ring. To find the CPU the MSI is bound to,
> run ::interrupts from mdb, then assign that CPU to the poll thread by
> doing a "dladm setlinkprop -p cpus=<cpuid> <link>".

That helps quite a bit. For comparison, with no binding at all, it looks like this (~1 Gb/s):

TCP     tcpRtoAlgorithm     =     0     tcpRtoMin           =   400
        tcpRtoMax           = 60000     tcpMaxConn          =    -1
        tcpActiveOpens      =     0     tcpPassiveOpens     =     0
        tcpAttemptFails     =     0     tcpEstabResets      =     0
        tcpCurrEstab        =     5     tcpOutSegs          = 17456
        tcpOutDataSegs      =    21     tcpOutDataBytes     =  2272
        tcpRetransSegs      =     0     tcpRetransBytes     =     0
        tcpOutAck           = 17435     tcpOutAckDelayed    =     0
        tcpOutUrg           =     0     tcpOutWinUpdate     =     0
        tcpOutWinProbe      =     0     tcpOutControl       =     0
        tcpOutRsts          =     0     tcpOutFastRetrans   =     0
        tcpInSegs           =124676
        tcpInAckSegs        =    21     tcpInAckBytes       =  2272
        tcpInDupAck         =   412     tcpInAckUnsent      =     0
        tcpInInorderSegs    =122654     tcpInInorderBytes   =175240560
        tcpInUnorderSegs    =   125     tcpInUnorderBytes   =152184
        tcpInDupSegs        =   412     tcpInDupBytes       =590976
        tcpInPartDupSegs    =     0     tcpInPartDupBytes   =     0
        tcpInPastWinSegs    =     0     tcpInPastWinBytes   =     0
        tcpInWinProbe       =     0     tcpInWinUpdate      =     0
        tcpInClosed         =     0     tcpRttNoUpdate      =     0
        tcpRttUpdate        =    21     tcpTimRetrans       =     0
        tcpTimRetransDrop   =     0     tcpTimKeepalive     =     0
        tcpTimKeepaliveProbe=     0     tcpTimKeepaliveDrop =     0
        tcpListenDrop       =     0     tcpListenDropQ0     =     0
        tcpHalfOpenDrop     =     0     tcpOutSackRetrans   =     0

After doing the binding, I'm seeing fewer out-of-order packets. netstat -s -P tcp 1 now looks like this (~4 Gb/s):

TCP     tcpRtoAlgorithm     =     0     tcpRtoMin           =   400
        tcpRtoMax           = 60000     tcpMaxConn          =    -1
        tcpActiveOpens      =     0     tcpPassiveOpens     =     0
        tcpAttemptFails     =     0     tcpEstabResets      =     0
        tcpCurrEstab        =     5     tcpOutSegs          = 46865
        tcpOutDataSegs      =     3     tcpOutDataBytes     =  1600
        tcpRetransSegs      =     0     tcpRetransBytes     =     0
        tcpOutAck           = 46869     tcpOutAckDelayed    =     0
        tcpOutUrg           =     0     tcpOutWinUpdate     =    19
        tcpOutWinProbe      =     0     tcpOutControl       =     0
        tcpOutRsts          =     0     tcpOutFastRetrans   =     0
        tcpInSegs           =372387
        tcpInAckSegs        =     3     tcpInAckBytes       =  1600
        tcpInDupAck         =    33     tcpInAckUnsent      =     0
        tcpInInorderSegs    =372264     tcpInInorderBytes   =527482971
        tcpInUnorderSegs    =    14     tcpInUnorderBytes   = 18806
        tcpInDupSegs        =    33     tcpInDupBytes       = 46591
        tcpInPartDupSegs    =     0     tcpInPartDupBytes   =     0
        tcpInPastWinSegs    =     0     tcpInPastWinBytes   =     0
        tcpInWinProbe       =     0     tcpInWinUpdate      =     0
        tcpInClosed         =     0     tcpRttNoUpdate      =     0
        tcpRttUpdate        =     3     tcpTimRetrans       =     0
        tcpTimRetransDrop   =     0     tcpTimKeepalive     =     0
        tcpTimKeepaliveProbe=     0     tcpTimKeepaliveDrop =     0
        tcpListenDrop       =     0     tcpListenDropQ0     =     0
        tcpHalfOpenDrop     =     0     tcpOutSackRetrans   =     0

And the old version of the driver, which does not use the new Crossbow interfaces:

TCP     tcpRtoAlgorithm     =     0     tcpRtoMin           =   400
        tcpRtoMax           = 60000     tcpMaxConn          =    -1
        tcpActiveOpens      =     0     tcpPassiveOpens     =     0
        tcpAttemptFails     =     0     tcpEstabResets      =     0
        tcpCurrEstab        =     5     tcpOutSegs          = 55231
        tcpOutDataSegs      =     3     tcpOutDataBytes     =  1600
        tcpRetransSegs      =     0     tcpRetransBytes     =     0
        tcpOutAck           = 55228     tcpOutAckDelayed    =     0
        tcpOutUrg           =     0     tcpOutWinUpdate     =   465
        tcpOutWinProbe      =     0     tcpOutControl       =     0
        tcpOutRsts          =     0     tcpOutFastRetrans   =     0
        tcpInSegs           =438394
        tcpInAckSegs        =     3     tcpInAckBytes       =  1600
        tcpInDupAck         =     0     tcpInAckUnsent      =     0
        tcpInInorderSegs    =438392     tcpInInorderBytes   =617512374
        tcpInUnorderSegs    =     0     tcpInUnorderBytes   =     0
        tcpInDupSegs        =     0     tcpInDupBytes       =     0
        tcpInPartDupSegs    =     0     tcpInPartDupBytes   =     0
        tcpInPastWinSegs    =     0     tcpInPastWinBytes   =     0
        tcpInWinProbe       =     0     tcpInWinUpdate      =     0
        tcpInClosed         =     0     tcpRttNoUpdate      =     0
        tcpRttUpdate        =     3     tcpTimRetrans       =     0
        tcpTimRetransDrop   =     0     tcpTimKeepalive     =     0
        tcpTimKeepaliveProbe=     0     tcpTimKeepaliveDrop =     0
        tcpListenDrop       =     0     tcpListenDropQ0     =     0
        tcpHalfOpenDrop     =     0     tcpOutSackRetrans   =     0

> There might be a race between the poll thread and the thread trying to
> deliver the chain through mac_rx_ring() from interrupt context [...]
>
> BTW, which ONNV build are you currently using?

SunOS dell1435a 5.11 snv_111a i86pc i386 i86pc

This is an OpenSolaris 2009.06 machine (updated from snv_84). I can try BFU'ing a different machine to a later build (once I've BFU'ed it to 111a so as to repro it there). I'm traveling, and won't have a chance to try that test until next week.

BTW, did you see my earlier message on networking-discuss (http://mail.opensolaris.org/pipermail/networking-discuss/2009-April/010979.html)? That was with the pre-crossbow version of the driver.

Cheers,

Drew
Nicolas Droux
2009-May-11 19:17 UTC
[crossbow-discuss] [networking-discuss] Crossbow APIs for device drivers
Andrew,

Thanks for the additional info. We'd like to verify that interrupts are getting disabled from interrupt context itself. If you don't mind, could you gather an aggregation of the callers of mac_hwring_disable_intr() during one of your runs? You should be able to do this with:

dtrace -n fbt::mac_hwring_disable_intr:entry'{@[stack()] = count()}'

Thanks,

Nicolas.

On May 6, 2009, at 6:22 PM, Andrew Gallatin wrote:
> That helps quite a bit. For comparison, with no binding at all, it
> looks like this (~1 Gb/s): [...]
>
> SunOS dell1435a 5.11 snv_111a i86pc i386 i86pc
>
> This is an OpenSolaris 2009.06 machine (updated from snv_84).
>
> I can try BFU'ing a different machine to a later build (once I've
> BFU'ed it to 111a so as to repro it there). I'm traveling, and won't
> have a chance to try that test until next week. [...]

--
Nicolas Droux - Solaris Kernel Networking - Sun Microsystems, Inc.
nicolas.droux at sun.com - http://blogs.sun.com/droux
Andrew Gallatin
2009-May-11 19:57 UTC
[crossbow-discuss] [networking-discuss] Crossbow APIs for device drivers
Nicolas Droux wrote:
> Thanks for the additional info. We'd like to verify that interrupts are
> getting disabled from interrupt context itself. If you don't mind,
> could you gather an aggregation of the callers of
> mac_hwring_disable_intr() during one of your runs?

Yes, they're all coming from interrupt context, via my rx interrupt handler:

dtrace: description 'fbt::mac_hwring_disable_intr:entry' matched 1 probe
^C

              mac`mac_rx_srs_drain+0x359
              mac`mac_rx_srs_process+0x1db
              mac`mac_rx+0x94
              mac`mac_rx_ring+0x4c
              myri10ge`myri10ge_intr_rx+0x70
              myri10ge`myri10ge_intr+0xa2
              unix`av_dispatch_autovect+0x7c
              unix`dispatch_hardint+0x33
              unix`switch_sp_and_call+0x13
            66481

FWIW, I was able to reproduce the problem on another machine running build 111. I BFU'ed to 112, 113, and 114, and I see the same thing in each build.

I think that I must be doing something wrong in my driver. Is there some sort of counter or dtrace script that one can use to track down out-of-order packets?

I'm not sure if this is helpful, but the 2 stacks I see for the increment of mib:::tcpInDataUnorderSegs are:

# dtrace -n mib:::tcpInDataUnorderSegs'{@[stack()] = count()}'
dtrace: description 'mib:::tcpInDataUnorderSegs' matched 1 probe
^C

              ip`tcp_rput_data+0xdd0
              ip`squeue_enter+0x330
              ip`ip_input+0xc17
              mac`mac_rx_soft_ring_drain+0xdf
              mac`mac_soft_ring_worker+0x111
              unix`thread_start+0x8
             1710

              ip`tcp_rput_data+0xdd0
              ip`squeue_drain+0x179
              ip`squeue_enter+0x3f4
              ip`ip_input+0xc17
              mac`mac_rx_soft_ring_drain+0xdf
              mac`mac_soft_ring_worker+0x111
              unix`thread_start+0x8
            195335

Thanks,

Drew
Andrew Gallatin
2009-May-11 20:51 UTC
[crossbow-discuss] [networking-discuss] Crossbow APIs for device drivers
I found it! It was entirely my bug.

I had code which would automatically flush any chain of more than X packets up through mac_rx(). This code would occasionally trigger in polling mode and cause chaos. Ripping it out fixes the problem.

Drew
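The failure mode described here — a second, unsynchronized delivery path firing while the poll thread owns the ring — can be sketched like this (illustrative Python; the threshold, packet numbers, and everything except the mac_rx() name are invented):

```python
# Toy model of the bug: the driver flushed long chains up through
# mac_rx() even while the MAC layer was polling the ring, giving the
# ring two unsynchronized delivery paths.

FLUSH_THRESHOLD = 3  # hypothetical stand-in for the driver's "X"

def run(buggy):
    # Packets arrive in order 1..6 while the ring is in polling mode.
    ring = [1, 2, 3, 4, 5, 6]
    delivered = []
    # Buggy path: the chain exceeds the threshold, so the driver pushes
    # it up via mac_rx() even though the poll thread now owns the ring.
    flush_chain = ring[:FLUSH_THRESHOLD] if buggy else []
    ring = ring[len(flush_chain):]
    # The poll thread drains and delivers what is left first...
    delivered += ring
    # ...and the unsynchronized flush lands afterwards, out of order.
    delivered += flush_chain
    return delivered

print(run(buggy=True))   # [4, 5, 6, 1, 2, 3] -- the reordering TCP reported
print(run(buggy=False))  # [1, 2, 3, 4, 5, 6] -- in order once the flush is gone
```

With the flush removed, the poll thread is the only consumer, so delivery order matches arrival order.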
Nicolas Droux
2009-May-11 21:12 UTC
[crossbow-discuss] [networking-discuss] Crossbow APIs for device drivers
On May 11, 2009, at 1:57 PM, Andrew Gallatin wrote:
> Yes, they're all coming from interrupt context, via my rx interrupt
> handler:
>
>               mac`mac_rx_srs_drain+0x359
>               mac`mac_rx_srs_process+0x1db
>               mac`mac_rx+0x94
>               mac`mac_rx_ring+0x4c
>               myri10ge`myri10ge_intr_rx+0x70
>               myri10ge`myri10ge_intr+0xa2
>               unix`av_dispatch_autovect+0x7c
>               unix`dispatch_hardint+0x33
>               unix`switch_sp_and_call+0x13
>             66481

OK, that's good to know.

> FWIW, I was able to reproduce the problem on another machine running
> build 111. I BFU'ed to 112, 113, and 114, and I see the same thing in
> each build.
>
> I think that I must be doing something wrong in my driver.

I wouldn't rule this out given the results of these experiments. One key requirement is that the driver must not pass new packets through mac_rx_ring() for a ring once its interrupt disable entry point for that ring has returned. Otherwise the mac_rx_ring() thread can race against the poll thread, which can cause packet reordering. This could be the issue here.

The driver is required to implement the locking needed for this. If the driver needs to change state from the interrupt disable entry point and query that state from the interrupt handler before invoking mac_rx_ring(), both pieces of code need to protect their access to that state via a common lock.

Nicolas.

--
Nicolas Droux - Solaris Kernel Networking - Sun Microsystems, Inc.
nicolas.droux at sun.com - http://blogs.sun.com/droux
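The common-lock contract described above can be sketched as follows (Python with a threading.Lock standing in for the driver's mutex; the entry-point names mirror this thread, the bodies are invented for illustration):

```python
import threading

class RxRing:
    """Toy model of one rx ring and the locking contract: once
    rx_ring_intr_disable() returns, the interrupt path must no longer
    deliver packets upstream."""

    def __init__(self):
        self.lock = threading.Lock()   # stands in for the driver's mutex
        self.poll_on = False           # True while the poll thread owns the ring
        self.delivered = []

    def rx_ring_intr_disable(self):
        # Taking the same lock as the interrupt handler guarantees that
        # any in-flight delivery has finished before this returns.
        with self.lock:
            self.poll_on = True

    def rx_ring_intr_enable(self):
        with self.lock:
            self.poll_on = False

    def intr_handler(self, chain):
        # Check the shared state and deliver under one lock, so
        # rx_ring_intr_disable() cannot return mid-delivery.
        with self.lock:
            if self.poll_on:
                return False               # poll thread owns the ring now
            self.delivered.extend(chain)   # stands in for mac_rx_ring()
            return True
```

Usage: intr_handler() delivers while interrupts are logically enabled, and refuses once the disable entry point has returned, which is the ordering guarantee the MAC layer relies on.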