Michael Maier
2019-Aug-17 05:27 UTC
[asterisk-users] asterisk 16.5 / pjsip outage because of task processor queue >= 500 tasks and too many open files later on
Hello! I encountered an outage of asterisk which showed like that: - 2019-08-10 07:22:21 Asterisk start - 2019-08-15 19:39:33 WARNING taskprocessor.c: The 'pjsip/outreg/ispPJSIP-00000060' task processor queue reached 500 scheduled tasks. - 2019-08-15 19:39:34 WARNING res_pjsip_outbound_registration.c: No response received from 'sip:isp.de' on registration attempt to '..', retrying in '60' - 2019-08-15 19:57:19 WARNING res_pjsip_outbound_registration.c: No response received from 'sip:isp.de' on registration attempt to '..', retrying in '60' - 2019-08-15 22:59:59 WARNING res_pjsip_outbound_registration.c: No response received from 'sip:isp.de' on registration attempt to '..', retrying in '60' - 2019-08-16 08:28:04 WARNING res_pjsip_outbound_registration.c: No response received from 'sip:isp.de' on registration attempt to '..', retrying in '60' - 2019-08-16 08:29:04 WARNING res_pjsip_outbound_registration.c: No response received from 'sip:isp.de' on registration attempt to '..', retrying in '60' - 2019-08-16 08:30:04 WARNING res_pjsip_outbound_registration.c: No response received from 'sip:isp.de' on registration attempt to '..', retrying in '60' - 2019-08-16 08:31:04 WARNING res_pjsip_outbound_registration.c: No response received from 'sip:isp.de' on registration attempt to '..', retrying in '60' - 2019-08-16 08:32:04 WARNING res_pjsip_outbound_registration.c: No response received from 'sip:isp.de' on registration attempt to '..', retrying in '60' - 2019-08-16 08:33:04 WARNING res_pjsip_outbound_registration.c: No response received from 'sip:isp.de' on registration attempt to '..', retrying in '60' - 2019-08-16 08:34:04 WARNING pjproject: SSL STATUS_FROM_SSL_ERR (status): Level: 0 err: <33558552> <system library-fopen-Too many open files> len: 0 peer: ISP-IP:5061 - 2019-08-16 08:34:04 ERROR pjproject: ssl0x7fbf92d096f0 Error loading CA list file '/etc/pki/tls/certs/ca-bundle.crt': Too many open files - 2019-08-16 08:34:04 WARNING res_pjsip_outbound_registration.c: No response received from 'sip:isp.de' on registration attempt to '..', retrying in '60' Inound calls via other ISP have been dropped - only those logentries can be seen: - 2019-08-16 09:25:43 WARNING[27924] res_rtp_asterisk.c: Unable to allocate RTCP socket: Too many open files - 2019-08-16 09:25:44 WARNING[27924] res_rtp_asterisk.c: Unable to allocate RTCP socket: Too many open files limits of asterisk: Limit Soft Limit Hard Limit Units Max cpu time unlimited unlimited seconds Max file size unlimited unlimited bytes Max data size unlimited unlimited bytes Max stack size 8388608 unlimited bytes Max core file size unlimited unlimited bytes Max resident set unlimited unlimited bytes Max processes 7767 7767 processes Max open files 1024 4096 files Max locked memory 65536 65536 bytes Max address space unlimited unlimited bytes Max file locks unlimited unlimited locks Max pending signals 7767 7767 signals Max msgqueue size 819200 819200 bytes Max nice priority 0 0 Max realtime priority 0 0 Max realtime timeout unlimited unlimited us Memory of the device: Disk: 7866 MB / (SD card) 771 MB Swap phys. RAM: 2 GB Memory consumption of the asterisk process 'ps -C asterisk u': USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND 15.08.2019 19:01 asterisk 15910 2.2 12.8 2237024 258960 ? Ssl Aug10 173:59 /usr/sbin/asterisk -vvv -mqf -C /etc/asterisk/asterisk.conf 15.08.2019 20:01 asterisk 15910 2.2 12.9 2237024 260016 ? Ssl Aug10 175:06 /usr/sbin/asterisk -vvv -mqf -C /etc/asterisk/asterisk.conf 15.08.2019 21:01 asterisk 15910 2.2 12.9 2237024 260280 ? Ssl Aug10 176:11 /usr/sbin/asterisk -vvv -mqf -C /etc/asterisk/asterisk.conf 15.08.2019 22:01 asterisk 15910 2.2 12.9 2237024 260544 ? Ssl Aug10 177:16 /usr/sbin/asterisk -vvv -mqf -C /etc/asterisk/asterisk.conf 15.08.2019 23:01 asterisk 15910 2.2 12.9 2237024 261072 ? Ssl Aug10 178:21 /usr/sbin/asterisk -vvv -mqf -C /etc/asterisk/asterisk.conf 16.08.2019 00:01 asterisk 15910 2.2 16.0 2302560 323984 ? Ssl Aug10 179:36 /usr/sbin/asterisk -vvv -mqf -C /etc/asterisk/asterisk.conf 16.08.2019 01:01 asterisk 15910 2.2 21.4 2368096 431076 ? Ssl Aug10 180:51 /usr/sbin/asterisk -vvv -mqf -C /etc/asterisk/asterisk.conf 16.08.2019 02:01 asterisk 15910 2.2 26.6 2499168 535784 ? Ssl Aug10 182:06 /usr/sbin/asterisk -vvv -mqf -C /etc/asterisk/asterisk.conf 16.08.2019 03:01 asterisk 15910 2.2 31.0 2564704 626192 ? Ssl Aug10 183:22 /usr/sbin/asterisk -vvv -mqf -C /etc/asterisk/asterisk.conf 16.08.2019 04:01 asterisk 15910 2.2 36.6 2695776 738540 ? Ssl Aug10 184:38 /usr/sbin/asterisk -vvv -mqf -C /etc/asterisk/asterisk.conf 16.08.2019 05:01 asterisk 15910 2.2 41.1 2761312 827892 ? Ssl Aug10 185:55 /usr/sbin/asterisk -vvv -mqf -C /etc/asterisk/asterisk.conf 16.08.2019 06:01 asterisk 15910 2.2 46.2 2892384 932244 ? Ssl Aug10 187:12 /usr/sbin/asterisk -vvv -mqf -C /etc/asterisk/asterisk.conf 16.08.2019 07:01 asterisk 15910 2.2 51.4 2957920 1036004 ? Ssl Aug10 188:31 /usr/sbin/asterisk -vvv -mqf -C /etc/asterisk/asterisk.conf 16.08.2019 08:01 asterisk 15910 2.2 56.2 3088992 1133328 ? Ssl Aug10 189:49 /usr/sbin/asterisk -vvv -mqf -C /etc/asterisk/asterisk.conf 16.08.2019 09:01 asterisk 15910 2.2 58.3 3088992 1175064 ? Ssl Aug10 191:07 /usr/sbin/asterisk -vvv -mqf -C /etc/asterisk/asterisk.conf restart 16.08.2019 10:01 asterisk 629 4.9 3.5 2103380 71700 ? Ssl 09:34 1:19 /usr/sbin/asterisk -vvv -mqf -C /etc/asterisk/asterisk.conf VSZ virtual memory size of the process in KiB (1024-byte units) RSS resident set size, the non-swapped physical memory that a task has used (in kiloBytes). => Memory usage had been rapidly growing since 2019-08-16 00:00:01 Few words about the usage of asterisk: - 2 registered endpoints - 4 SIPS / SRTP trunks - 46 calls at 2019-08-15 - the sip:isp.de trunk hadn't been used Some findings: - The problem seems to be triggered by the "task processor queue reached 500 scheduled tasks" problem. I saw this message in June, too. Context each time was the same ISP (-> not Deutsche Telekom!) as described above in conjunction with registration retries, too. - Registration configuration of this provider: <Registration/ServerURI..............................> <Auth..........> <Status.......> ========================================================================================= ispPJSIP/sip:isp.de ispPJSIP Registered ParameterName : ParameterValue ============================================================= auth_rejection_permanent : true client_uri : sip:0049... at isp.de contact_user : +49... endpoint : ispPJSIP expiration : 3600 fatal_retry_interval : 0 forbidden_retry_interval : 10 line : true max_retries : 10000 outbound_auth : ispPJSIP outbound_proxy : retry_interval : 60 server_uri : sip:isp.de support_path : false transport : 0.0.0.0-tls The expiration value given above is not true. isp.de forces ReRegistration each 90s (asterisk does it each 60s if I remember correctly)! Contact: <sip:+49... at isp-ip:5061;transport=TLS;line=....>;expires=90 - After performing the restart of asterisk, registration to the isp.de hadn't any problem any more. Therefore I think, the reregistration problem wasn't a problem of the provider not answering but in fact a problem of asterisk being unable to correctly perform the ReRegistration. The final question: ==================Is there a problem with taskprocessors probably not being canceled on some conditions (maybe unanswered or hanging registrations?) and therefore steadily growing up and using more and more open files (and memory) until asterisk can't do anything anymore because some limits are exceeded as a result? Could there be a problem with the retry interval 60s and the real ReRegister done each 60s, too (performing a "fork" bomb)? Thanks Michael
Joshua C. Colp
2019-Aug-19 10:21 UTC
[asterisk-users] asterisk 16.5 / pjsip outage because of task processor queue >= 500 tasks and too many open files later on
On Sat, Aug 17, 2019, at 3:07 AM, Michael Maier wrote:> Hello! ><snip>> > Few words about the usage of asterisk: > - 2 registered endpoints > - 4 SIPS / SRTP trunks > - 46 calls at 2019-08-15 > - the sip:isp.de trunk hadn't been used > > > Some findings: > > - The problem seems to be triggered by the "task processor queue > reached 500 scheduled tasks" problem. I saw this message in June, too. > Context each time was the same ISP (-> not Deutsche Telekom!) as > described above in conjunction with registration retries, too. > > - Registration configuration of this provider: > > <Registration/ServerURI..............................> > <Auth..........> <Status.......> > =========================================================================================> > ispPJSIP/sip:isp.de ispPJSIP > Registered > > ParameterName : ParameterValue > =============================================================> auth_rejection_permanent : true > client_uri : sip:0049... at isp.de > contact_user : +49... > endpoint : ispPJSIP > expiration : 3600 > fatal_retry_interval : 0 > forbidden_retry_interval : 10 > line : true > max_retries : 10000 > outbound_auth : ispPJSIP > outbound_proxy : > retry_interval : 60 > server_uri : sip:isp.de > support_path : false > transport : 0.0.0.0-tls > > > The expiration value given above is not true. isp.de forces > ReRegistration each 90s (asterisk does it each 60s if I remember > correctly)! > Contact: <sip:+49... at isp-ip:5061;transport=TLS;line=....>;expires=90 > > - After performing the restart of asterisk, registration to the isp.de > hadn't any problem any more. Therefore I think, > the reregistration problem wasn't a problem of the provider not > answering but in fact a problem of asterisk being unable to correctly > perform the ReRegistration. > > > > > The final question: > ==================> Is there a problem with taskprocessors probably not being canceled on > some conditions (maybe unanswered or hanging registrations?) and > therefore steadily growing up and using more and more open files (and > memory) until asterisk can't do > anything anymore because some limits are exceeded as a result? > Could there be a problem with the retry interval 60s and the real > ReRegister done each 60s, too (performing a "fork" bomb)?Taskprocessors aren't recurring things individually, they are a work queue item that is always executed. It may be a problem with the fact that it is TLS, and perhaps the act of trying to establish the TLS connection is taking a long time to fail causing things to build up. I'd suggest collecting a backtrace[1] and providing complete information on an issue[2]. [1] https://wiki.asterisk.org/wiki/display/AST/Getting+a+Backtrace [2] https://issues.asterisk.org/jira -- Joshua C. Colp Digium - A Sangoma Company | Senior Software Developer 445 Jan Davis Drive NW - Huntsville, AL 35806 - US Check us out at: www.digium.com & www.asterisk.org