thr3ads.net - crossbow discuss - [crossbow-discuss] mac_srs_rx_poll_ring thread never stop polling hardware in kernel [Mar 2009]


zhihui Chen

2009-Mar-28 15:52 UTC

[crossbow-discuss] mac_srs_rx_poll_ring thread never stop polling hardware in kernel

Recently I found that the mac_rx_srs_poll_ring thread can get stuck
polling in the kernel and never stop. In the mpstat output below, CPU 2
shows 100% kernel (sys) time, with no syscalls and almost no interrupts.

     CPU minf mjf xcal intr ithr csw icsw migr smtx srw syscl usr sys  wt idl
       0    0   0    0  300  100   0    0    1    0   0     0   0   0   0 100
       1   14   0    0  134   68 134    1    2    0   0   155   0   1   0  99
       2    0   0    0    4    1   0    0    0    0   0     0   0 100   0   0
       3    0   0    0   67   34  70    0    2    0   0    38   0   0   0 100
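
The pattern in this table (high sys time, zero syscalls, zero context switches) can be checked mechanically. The following is only a rough sketch, assuming the 16-column mpstat layout shown above; the function names and thresholds are hypothetical choices for illustration, not part of mpstat or Crossbow.

```python
# Column names as printed by mpstat in the report above.
COLUMNS = ("CPU minf mjf xcal intr ithr csw icsw migr smtx srw "
           "syscl usr sys wt idl").split()

# Sample rows copied from the report above, one CPU per line.
SAMPLE = """\
CPU minf mjf xcal intr ithr csw icsw migr smtx srw syscl usr sys wt idl
0 0 0 0 300 100 0 0 1 0 0 0 0 0 0 100
1 14 0 0 134 68 134 1 2 0 0 155 0 1 0 99
2 0 0 0 4 1 0 0 0 0 0 0 0 100 0 0
3 0 0 0 67 34 70 0 2 0 0 38 0 0 0 100
"""


def parse_rows(text):
    """Turn whitespace-separated mpstat rows into dicts keyed by column name."""
    rows = []
    for line in text.splitlines():
        fields = line.split()
        # Skip the header line and anything that is not a full data row.
        if len(fields) != len(COLUMNS) or not fields[0].isdigit():
            continue
        rows.append(dict(zip(COLUMNS, map(int, fields))))
    return rows


def spinning_cpus(rows, sys_threshold=95, max_syscalls=0, max_csw=0):
    """Return ids of CPUs busy in the kernel with no syscalls or context switches.

    The thresholds are illustrative assumptions, not tuned values.
    """
    return [r["CPU"] for r in rows
            if r["sys"] >= sys_threshold
            and r["syscl"] <= max_syscalls
            and r["csw"] <= max_csw]


print(spinning_cpus(parse_rows(SAMPLE)))  # prints [2]
```

A CPU flagged this way is only a candidate for a spinning kernel thread; the DTrace steps that follow are what actually identify the thread.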

      What is CPU 2 doing? Running
      dtrace -n 'fbt:::entry/cpu==2/{@[probefunc]=count();}'
      shows the following output:
      ...........
      ddi_fm_acc_err_clear        416312
      ddi_fm_acc_err_get          416312
      ixgbe_check_acc_handle      416312
      ixgbe_check_dma_handle      416312
      ixgbe_ring_tx               416312
      ixgbe_ring_rx_poll          416312
      mac_rx_srs_drain            416312
      ddi_fm_dma_err_get          416312
      ddi_dma_sync                832624
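
The identical counts are what suggest a single tight loop. As a rough sketch, the counts can be normalized by the smallest one to estimate how many times each function runs per loop iteration. Treating the smallest count as the iteration count is an assumption, the helper names are hypothetical, and the sample covers only the tail of the aggregation shown above.

```python
# Tail of the aggregation from the report above: function name, call count.
SAMPLE = """\
ddi_fm_acc_err_clear 416312
ddi_fm_acc_err_get 416312
ixgbe_check_acc_handle 416312
ixgbe_check_dma_handle 416312
ixgbe_ring_tx 416312
ixgbe_ring_rx_poll 416312
mac_rx_srs_drain 416312
ddi_fm_dma_err_get 416312
ddi_dma_sync 832624
"""


def parse_counts(text):
    """Parse 'name count' lines into a dict, ignoring anything else."""
    counts = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 2 and parts[1].isdigit():
            counts[parts[0]] = int(parts[1])
    return counts


def calls_per_iteration(counts):
    """Divide every count by the smallest one (assumed to be one call per iteration)."""
    base = min(counts.values())
    return {name: n / base for name, n in counts.items()}


ratios = calls_per_iteration(parse_counts(SAMPLE))
for name, ratio in sorted(ratios.items()):
    print(f"{name:28s} {ratio:.1f}")
```

Under that assumption every function runs once per iteration except ddi_dma_sync, which runs twice.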

      Which stacks are calling ddi_dma_sync? Running
      dtrace -n 'ddi_dma_sync:entry/cpu==2/{@[stack()]=count();}' -c "sleep 10"
      gives the following output:
                  ixgbe`ixgbe_ring_rx+0x49
                  ixgbe`ixgbe_ring_rx_poll+0x3e
                  mac`mac_rx_srs_poll_ring+0x143
                  unix`thread_start+0x8
          3545313
                  ixgbe`ixgbe_ring_rx+0x200
                  ixgbe`ixgbe_ring_rx_poll+0x3e
                  mac`mac_rx_srs_poll_ring+0x143
                  unix`thread_start+0x8
          3545314
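
To put these counts in perspective, the sketch below parses the two stacks and converts each count into an approximate per-second rate. It assumes the trace ran for about 10 seconds (the duration of the "sleep 10" command); the helper names are hypothetical.

```python
# Stack aggregation copied from the output above: frames, then a count.
SAMPLE = """\
ixgbe`ixgbe_ring_rx+0x49
ixgbe`ixgbe_ring_rx_poll+0x3e
mac`mac_rx_srs_poll_ring+0x143
unix`thread_start+0x8
3545313
ixgbe`ixgbe_ring_rx+0x200
ixgbe`ixgbe_ring_rx_poll+0x3e
mac`mac_rx_srs_poll_ring+0x143
unix`thread_start+0x8
3545314
"""


def parse_stacks(text):
    """Return a list of (frames, count) pairs from stack() aggregation output."""
    stacks, frames = [], []
    for line in text.splitlines():
        token = line.strip()
        if not token:
            continue
        if token.isdigit():
            # A bare number closes the stack collected so far.
            stacks.append((tuple(frames), int(token)))
            frames = []
        else:
            frames.append(token)
    return stacks


def rate_per_second(stacks, seconds):
    """Map the top frame of each stack to its approximate calls per second."""
    return {frames[0]: count / seconds for frames, count in stacks}


stacks = parse_stacks(SAMPLE)
for top, rate in rate_per_second(stacks, seconds=10).items():
    print(f"{top:30s} ~{rate:,.0f} calls/s")
```

Both call sites sit in ixgbe_ring_rx under the same poll thread and fire at nearly the same rate, which is what one would expect from a loop that never leaves polling mode.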

      From this, it appears that the "mac_rx_srs_poll_ring" thread polls
the hardware again and again and never stops. Is this a known bug in
Crossbow?

thanks
Zhihui

Sunay Tripathi

2009-Mar-28 18:28 UTC

Zhihui,

What is the network load on the system/NIC? If the system is heavily
loaded, the poll thread will run as fast as it can to bring packets
into the system (without dropping them). Which Nevada build are you
running? By default we currently enable only one H/W lane on the
system, so only one poll thread is active. If the load is normal and
you do want to service it, then I suggest manually enabling more H/W
lanes by exposing more Rx rings. Set these in your /etc/system to
enable more MSI-X interrupts:
set ddi_msix_alloc_limit=8
set pcplusmp:apic_multi_msi_max=8
set pcplusmp:apic_msix_max=8
set pcplusmp:apic_intr_policy=1

and then set these in /kernel/drv/ixgbe.conf to expose more Rx rings
rx_queue_number = 4;
tx_queue_number = 4;

You will have to reboot but then your ability to service the load
will be much better.

If you don't want all these packets to come into the system, you can
set a B/W limit on the link by setting the 'dladm' maxbw property to a
lower value (say 2000 Mbps), and then the system will poll much less.

And if there is no network load and you still see the CPU pegged at
100%, then obviously we have a bug somewhere, in which case we need
more details on the system, NIC, build, workload, and so on.

HTH,
Sunay


-- 
Sunay Tripathi
Distinguished Engineer
Solaris Core Operating System
Sun MicroSystems Inc.

Solaris Networking:     http://www.opensolaris.org/os/community/networking
Project Crossbow:       http://www.opensolaris.org/os/project/crossbow

Eric Cheng

2009-Mar-28 19:16 UTC

On Sat, Mar 28, 2009 at 11:52:24PM +0800, zhihui Chen wrote:
>       From that, we can say that the thread "mac_rx_srs_poll_ring" is
> polling the hardware again and again, never stops. Is this a known bug in
> crossbow?
>
this is bug 6819793. it'll be fixed in snv_112.

eric

zhihui Chen

2009-Mar-29 03:30 UTC

Thanks. The workload is TCP_STREAM with netperf, the underlying hardware
is 10GbE, and traffic can reach 9 Gbps. Without any system tuning, the
workload runs smoothly in most situations. The problem occurs randomly,
and there is no traffic on the link when it happens. It also stops
traffic on every other link, so I can only work on the console.
Thanks
Zhihui


zhihui Chen

2009-Mar-29 03:32 UTC

Yes, the workload is a heavy TX load, and I set the link property using
"dladm set-linkprop -p cpus=2 ixgbe", but the problem I reported does not
look like the one described in bug 6819793.
Thanks
Zhihui

