Dan Cropp
2023-Feb-06 20:05 UTC
[asterisk-users] Asterisk rtp.conf stunaddr setting - what happens if there is an outage
Over the weekend, several of our customers running at AWS were affected by an AWS outage. The customer discussed here is running Asterisk 16.23.0 (which includes the STUN timeout crash fix). From what I have been told, other customers running the newer Asterisk 18.12.1 encountered similar issues (I haven't had a chance to verify this). All of these customers should be running PJSIP, but I haven't had a chance to verify that either.

The logs show Asterisk reporting problems communicating with the STUN address configured in rtp.conf:

[02/04 00:15:03.812] NOTICE[5943] stun.c: Attempt 1 to send STUN request to 'x.x.x.x' timed out.
[02/04 00:15:06.812] NOTICE[5943] stun.c: Attempt 2 to send STUN request to 'x.x.x.x' timed out.
[02/04 00:15:09.813] WARNING[5943] stun.c: Attempt 3 to send STUN request to 'x.x.x.x' timed out. Check that the server address is correct and reachable.

Until Asterisk was restarted, the same pattern kept repeating:

Asterisk receives an INVITE and immediately sends 100 Trying.
7 seconds later, Asterisk receives a CANCEL from the SIP provider.
Another half second later, Asterisk receives a second CANCEL.
A second after that, Asterisk receives a third CANCEL.
After the third STUN request fails, Asterisk sends a 200 OK response to the CANCEL,
followed by a 487 Request Terminated,
then a second 200 OK response to the CANCEL,
then a third 200 OK response to the CANCEL.

We have an AMI connection, and only at this point do we see the Newchannel event for this channel. Asterisk immediately sends various events for the channel, including Event: Hangup, indicating the channel has ended. 63 ms later, it receives an ACK, which completes the Call-ID processing.

This went on for over 8 hours. When they restarted the Asterisk box, everything was fine. I have been told they had to restart every Asterisk instance we had running at AWS to clear the STUN request failures. No calls/channels would work until that was done.
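To make the timing in these logs concrete, here is a minimal Python sketch of a blocking STUN Binding Request (RFC 5389) retried three times with a 3-second timeout per attempt, which is consistent with the roughly 3-second spacing of the timestamps above. It is an illustration only, not Asterisk's stun.c: the function names `build_binding_request` and `stun_query` are hypothetical, and the attempt count and timeout are assumptions read off the log spacing. What it shows is that an unreachable server costs about attempts × timeout = 3 × 3 s ≈ 9 s per lookup, and anything waiting on that lookup waits that long too.

```python
import os
import socket
import struct
from typing import Optional, Tuple

# RFC 5389 constants for a STUN Binding Request.
BINDING_REQUEST = 0x0001
MAGIC_COOKIE = 0x2112A442


def build_binding_request(txn_id: bytes) -> bytes:
    """Build a 20-byte STUN Binding Request header with no attributes."""
    if len(txn_id) != 12:
        raise ValueError("transaction ID must be 12 bytes")
    # Message type (2 bytes), message length (2 bytes, 0 = no attributes),
    # magic cookie (4 bytes), then the 12-byte transaction ID.
    return struct.pack("!HHI", BINDING_REQUEST, 0, MAGIC_COOKIE) + txn_id


def stun_query(addr: Tuple[str, int], attempts: int = 3,
               timeout: float = 3.0) -> Optional[bytes]:
    """Send a Binding Request up to `attempts` times, waiting `timeout` s each.

    Returns the raw response bytes, or None if no matching response arrived.
    Hypothetical sketch: the defaults only mirror the spacing seen in the logs.
    """
    txn_id = os.urandom(12)
    request = build_binding_request(txn_id)
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        for attempt in range(1, attempts + 1):
            sock.sendto(request, addr)
            try:
                data, _ = sock.recvfrom(2048)
            except socket.timeout:
                print(f"Attempt {attempt} to send STUN request to "
                      f"'{addr[0]}' timed out.")
                continue
            # Accept only a response that echoes our transaction ID (bytes 8..20);
            # anything else falls through to the next attempt.
            if len(data) >= 20 and data[8:20] == txn_id:
                return data
    return None
```

For example, `stun_query(("192.0.2.1", 3478))` against an unreachable documentation address would typically print three timeout lines and return `None` after about 9 seconds. Port 3478 is the standard STUN port; the port actually used by a given rtp.conf may differ.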
I wonder whether the STUN address lookup happens only once, and whether AWS DNS changed something during the outage/recovery. Is there a recommendation on how to prevent this from happening? Any thoughts?

Dan

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.digium.com/pipermail/asterisk-users/attachments/20230206/10a30595/attachment.html>
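The resolve-once hypothesis can be illustrated with a minimal Python sketch contrasting two policies for a hostname-valued stunaddr: resolve once at load time, or re-resolve after a refresh interval. This is not how res_rtp_asterisk is implemented; the `StunTarget` class, its `refresh` parameter, and the hostname stun.example.com are hypothetical. Python's standard resolver does not expose DNS TTLs, so a fixed interval stands in for the record's TTL.

```python
import socket
import time
from typing import Callable, Optional, Tuple

Address = Tuple[str, int]
Resolver = Callable[[str, int], Address]


def system_resolver(host: str, port: int) -> Address:
    """Return the first IPv4/UDP address the system resolver gives for host:port."""
    info = socket.getaddrinfo(host, port, socket.AF_INET, socket.SOCK_DGRAM)
    ip, resolved_port = info[0][4][:2]
    return ip, resolved_port


class StunTarget:
    """A STUN server hostname plus its cached resolved address (hypothetical).

    refresh=None models resolving once at load time and never again.
    refresh=N re-resolves on the first lookup at least N seconds after the
    last successful resolution.
    """

    def __init__(self, host: str, port: int, refresh: Optional[float],
                 resolver: Resolver = system_resolver,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.host = host
        self.port = port
        self.refresh = refresh
        self._resolver = resolver
        self._clock = clock
        self._addr = resolver(host, port)  # initial lookup, e.g. at config load
        self._resolved_at = clock()

    def address(self) -> Address:
        stale = (self.refresh is not None
                 and self._clock() - self._resolved_at >= self.refresh)
        if stale:
            try:
                self._addr = self._resolver(self.host, self.port)
                self._resolved_at = self._clock()
            except OSError:
                # Lookup failed (socket.gaierror is an OSError): keep the last
                # known address and try again on the next call.
                pass
        return self._addr
```

With `refresh=None`, a DNS change during an outage is never picked up until restart; with a refresh interval, the new address is used on the next lookup after the interval expires. Re-resolving only helps if the address actually changed; it does nothing if the same server is simply unreachable.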
Dan Cropp
2023-Feb-06 21:27 UTC
[asterisk-users] Asterisk rtp.conf stunaddr setting - what happens if there is an outage
A quick follow-up. We looked at other customers running 18.12.1 who reported problems at exactly the same time as the AWS issue described in my previous message, and we are seeing similar behavior.

On these systems, the third STUN attempt also fails. The call does get answered, because the SIP provider didn't CANCEL it. However, the call had already been terminated upstream of the SIP provider. The result is a call between the SIP provider and Asterisk that is live but has no caller on it, so it sounds like dead air.

Does the res_rtp_asterisk stunaddr DNS TTL expiration handling mentioned in change ID I7955a046293f913ba121bbd82153b04439e3465f require dnsmgr.conf to be enabled?

Dan