Hi Slawa,
On 9/21/16 9:51 PM, Slawa Olhovchenkov wrote:
> On Wed, Sep 21, 2016 at 09:11:24AM +0200, Julien Charbon wrote:
>> You can also use Dtrace and lockstat (especially with the lockstat -s
>> option):
>>
>> https://wiki.freebsd.org/DTrace/One-Liners#Kernel_Locks
>>
>> https://www.freebsd.org/cgi/man.cgi?query=lockstat&manpath=FreeBSD+11.0-RELEASE
>>
>> But I am less familiar with Dtrace/lockstat tools.
>
> I am still using the old kernel and got a lockup again.
> I tried lockstat (I saved more output); the interesting part may be this:
>
> R/W writer spin on writer: 190019 events in 1.070 seconds (177571 events/sec)
>
>
> -------------------------------------------------------------------------------
> Count indv cuml rcnt nsec Lock Caller
> 140839 74% 74% 0.00 24659 tcpinp tcp_tw_2msl_scan+0xc6
>
> nsec ------ Time Distribution ------ count Stack
> 4096 | 913 tcp_twstart+0xa3
> 8192 |@@@@@@@@@@@@ 58191 tcp_do_segment+0x201f
> 16384 |@@@@@@ 29594 tcp_input+0xe1c
> 32768 |@@@@ 23447 ip_input+0x15f
> 65536 |@@@ 16197
> 131072 |@ 8674
> 262144 | 3358
> 524288 | 456
> 1048576 | 9
>
> -------------------------------------------------------------------------------
> Count indv cuml rcnt nsec Lock Caller
> 49180 26% 100% 0.00 15929 tcpinp tcp_tw_2msl_scan+0xc6
>
> nsec ------ Time Distribution ------ count Stack
> 4096 | 157 pfslowtimo+0x54
> 8192 |@@@@@@@@@@@@@@@ 24796 softclock_call_cc+0x179
> 16384 |@@@@@@ 11223 softclock+0x44
> 32768 |@@@@ 7426 intr_event_execute_handlers+0x95
> 65536 |@@ 3918
> 131072 | 1363
> 262144 | 278
> 524288 | 19
>
> -------------------------------------------------------------------------------
This is interesting: it seems that you have two call paths competing
for INP locks here:

 - pfslowtimo()/tcp_tw_2msl_scan(reuse=0) and
 - tcp_input()/tcp_twstart()/tcp_tw_2msl_scan(reuse=1)

These paths can indeed compete for the same INP lock, as both
tcp_tw_2msl_scan() calls always start with the first inp found in the
twq_2msl list.  But in both cases this first inp should be used quickly
and its lock released anyway, so that would only explain your situation
if the TCP stack is doing this all the time, for example:
 - Let's say you are completely and constantly running out of tcptw;
then all connections transitioning to the TIME_WAIT state compete with
the TIME_WAIT timeout scan that tries to free all the expired tcptw.
If the stack is doing that all the time, it can appear to be "live"
locked.

This is just a hypothesis and, as usual, it might be a red herring.
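If you want to double-check which paths are actually hammering
tcp_tw_2msl_scan() while the problem is happening, a DTrace one-liner in
the spirit of the wiki page above should do it (untested here, and it
assumes fbt can attach to that function, i.e. it has not been inlined
on your kernel):

$ dtrace -n 'fbt::tcp_tw_2msl_scan:entry { @[stack()] = count(); }'

Let it run for a few seconds, then Ctrl-C; the stack counts should show
whether the pfslowtimo() path or the tcp_input()/tcp_twstart() path
dominates.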
Anyway, could you run:
$ vmstat -z | head -2; vmstat -z | grep -E 'tcp|sock'
Ideally run it once when everything is OK and once when you have the
issue, to see the differences (if any).
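For example, to keep timestamped snapshots around so they are easy to
diff later (just a convenience, the file name is arbitrary):

$ out=vmstat-z-$(date +%Y%m%d-%H%M%S).txt
$ { vmstat -z | head -2; vmstat -z | grep -E 'tcp|sock'; } > "$out"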
If it appears you are quite low on tcptw, and if you have enough
memory, could you try increasing the tcptw limit using the sysctl
net.inet.tcp.maxtcptw?  And then see whether it improves (or not) your
performance.
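Something like this (the value below is only a placeholder, size it
according to your workload and available memory):

$ sysctl net.inet.tcp.maxtcptw
$ vmstat -z | grep tcptw
$ sysctl net.inet.tcp.maxtcptw=163840

If it does help, the new value can go into /etc/sysctl.conf so it
survives a reboot.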
My 2 cents.
--
Julien