John Todd
2005-Nov-28 17:44 UTC
[Asterisk-Users] SIP rapid INVITE re-transmission: bug, or config problem?
I'm having a problem with Asterisk sending too many INVITEs to a peer for a single call. I can't quite figure out why there are these rapid INVITEs sent to the call proxy. The call completes correctly (to, in this example, an echo test found via ENUM) but the number of INVITEs is really out of control and is a Bad Thing overall. My notes: 1) This isn't a firewall problem - there aren't any. Additionally, you'll note that the INVITEs are all being replied to eventually. 2) The intervals between the INIVTEs after the 407 sequence are: 34ms, 30ms, 49ms, 91ms. This is _way_ too fast for response timers to be expiring for reliable re-transmissions of INVITEs... isn't it? According to DEFAULT_RETRANS in chan_sip.c, the proper delay should be 1000ms between retransmissions. 3) Here is the peer definition for this system: [testbed] type=peer username=9 secret=dio0sywa82a host=10.0.3.173 insecure=very context=default qualify=4000 4) The 10.0.3.173 host is running a variant of SER, but that shouldn't matter since the problem is pretty clearly Asterisk's pushy INVITE problems. 5) I'm running CVS-HEAD 2005-11-18 22:14:30 UTC. 6) The INVITEs create a huge logjam of "100 Trying" and "200 OK with SDP" messages. This is Bad. 7) I have verified that only one INVITE is coming from the UA to Asterisk and the dialplan is only being executed once, so this is an Asterisk->testbed problem. 8) Here's the console output: cookies*CLI> testbed -- Executing Dial("SIP/2598-dbb4", "SIP/437800047111@testbed") in new stack -- Called 437800047111@testbed -- SIP/testbed-6b7c answered SIP/2598-dbb4 -- Attempting native bridge of SIP/2598-dbb4 and SIP/testbed-6b7c -- Executing Hangup("SIP/2598-dbb4", "") in new stack cookies*CLI> 9) Here's the tethereal output showing the conversation between Asterisk (10.0.3.170) and SER (10.0.3.173): poptart# tethereal -n port 5060 0.000000 10.0.3.170 -> 10.0.3.173 SIP/SDP Request: INVITE sip:437800047111@10.0.3.173, with session description 0.037369 10.0.3.173 -> 10.0.3.170 SIP Status: 407 Proxy Authentication Required 0.037941 10.0.3.170 -> 10.0.3.173 SIP Request: ACK sip:437800047111@10.0.3.173 0.038190 10.0.3.170 -> 10.0.3.173 SIP/SDP Request: INVITE sip:437800047111@10.0.3.173, with session description 0.072043 10.0.3.170 -> 10.0.3.173 SIP/SDP Request: INVITE sip:437800047111@10.0.3.173, with session description 0.102148 10.0.3.170 -> 10.0.3.173 SIP/SDP Request: INVITE sip:437800047111@10.0.3.173, with session description 0.151999 10.0.3.170 -> 10.0.3.173 SIP/SDP Request: INVITE sip:437800047111@10.0.3.173, with session description 0.242048 10.0.3.170 -> 10.0.3.173 SIP/SDP Request: INVITE sip:437800047111@10.0.3.173, with session description 0.317500 10.0.3.173 -> 10.0.3.170 SIP Status: 100 trying -- your call is important to us 0.354345 10.0.3.173 -> 10.0.3.170 SIP Status: 100 trying -- your call is important to us 0.395474 10.0.3.173 -> 10.0.3.170 SIP Status: 100 trying -- your call is important to us [etc] 10) The (post-407) INVITEs are identical to each other - there are no differences in Call-ID, branch, tag, nonces, or SDP. I then compared the INVITES between a working peer and the broken peer to each other - they're almost identical except for IP addresses, so nothing obvious there, either. 11) Each INVITE in a "sip debug" output is tagged with something like "Retransmitting #4 (no NAT) to 10.0.3.173:5060:" but there are no timer statements that I could see in the debug. 12) Interestingly, when I dial one of my UAs which is listed as a peer in sip.conf, there is (in my first test) a 420ms delay between the INVITE and the "100 Trying" from the UA. Asterisk does not re-transmit the INVITE in that interval. Something is different about my [testbed] peer. 13) More interestingly, a peer configured _almost_ identically (different IP address and secret, but otherwise EXACTLY the same) doesn't have the problem. Asterisk respectfully waits for the reply from the proxy and does not blast several INVITEs at it. The delay between the first (post-407) INVITE and the reply from the functional peer is ~50ms, much longer than Asterisk waits before re-transmitting on the [testbed] peer. I am _totally_ stumped here. I have changed the names of the peers, changed the qualify= statement, moved the peers around in sip.conf, stood on my head, etc. Your insights on this would be appreciated, since I'm not quite sure what Asterisk is up to with these rapid INVITE sequences. I'm thinking "bug" but maybe there is some subtle config option that I'm overlooking, so I'll ask the list before I file the bug. Maybe it's just that I've had too much caffeine today and the obvious solution is the one that's the most difficult to see. JT
Kevin P. Fleming
2005-Nov-28 17:52 UTC
[Asterisk-Users] SIP rapid INVITE re-transmission: bug, or config problem?
John Todd wrote:> 2) The intervals between the INIVTEs after the 407 sequence are: 34ms, 30ms, 49ms, 91ms. This is _way_ too fast for response timers to be expiring for reliable re-transmissions of INVITEs... isn't it? According to DEFAULT_RETRANS in chan_sip.c, the proper delay should be 1000ms between retransmissions.The intervals are computed based on the qualify timer results. If the peer responds quickly to OPTIONS, we assume it will respond quickly to INVITE.> I am _totally_ stumped here. I have changed the names of the peers, changed the qualify= statement, moved the peers around in sip.conf, stood on my head, etc. Your insights on this would be appreciated, since I'm not quite sure what Asterisk is up to with these rapid INVITE sequences. I'm thinking "bug" but maybe there is some subtle config option that I'm overlooking, so I'll ask the list before I file the bug. Maybe it's just that I've had too much caffeine today and the obvious solution is the one that's the most difficult to see.I would try removing qualify completely to see if it helps; it's likely that the peer takes a long time to respond to INVITE with authentication and we aren't allowing enough time.
Olle E. Johansson
2005-Nov-29 01:43 UTC
[Asterisk-Users] SIP rapid INVITE re-transmission: bug, or config problem?
John Todd wrote:> I'm having a problem with Asterisk sending too many INVITEs to a peer for a single call. I can't quite figure out why there are these rapid INVITEs sent to the call proxy. The call completes correctly (to, in this example, an echo test found via ENUM) but the number of INVITEs is really out of control and is a Bad Thing overall. > > My notes: > > 1) This isn't a firewall problem - there aren't any. Additionally, you'll note that the INVITEs are all > being replied to eventually. > > 2) The intervals between the INIVTEs after the 407 sequence are: 34ms, 30ms, 49ms, 91ms. > This is _way_ too fast for response timers to be expiring for reliablere-transmissions of INVITEs...> isn't it? According to DEFAULT_RETRANS in chan_sip.c, the properdelay should be 1000ms between retransmissions. No, it's twice the known roundtrip time or twice 500 ms. Since you have qualify on, we propably know the roundtrip time and do the retransmissions based on that.> 3) Here is the peer definition for this system: > > [testbed] > type=peer > username=9 > secret=dio0sywa82a > host=10.0.3.173 > insecure=very > context=default > qualify=4000 > > 6) The INVITEs create a huge logjam of "100 Trying" and "200 OK with SDP" messages. This is Bad.Can we see a SIP debug?> cookies*CLI> testbed > -- Executing Dial("SIP/2598-dbb4", "SIP/437800047111@testbed") in new stack > -- Called 437800047111@testbed > -- SIP/testbed-6b7c answered SIP/2598-dbb4 > -- Attempting native bridge of SIP/2598-dbb4 and SIP/testbed-6b7c > -- Executing Hangup("SIP/2598-dbb4", "") in new stack > cookies*CLI>Looks fine.> 10) The (post-407) INVITEs are identical to each other - there are no differences in Call-ID, branch, tag, nonces, or SDP. I then compared the INVITES between a working peer and the broken peer to each other - they're almost identical except for IP addresses, so nothing obvious there, either.Good.> 11) Each INVITE in a "sip debug" output is tagged with something like "Retransmitting #4 (no NAT) to 10.0.3.173:5060:" but there > are no timer statements that I could see in the debug.Turn on debug to 4. There should be messages about changing T1 timers then.> > I am _totally_ stumped here. I have changed the names of the peers, changed the qualify= statement, moved the peers around in sip.conf, stood on my head, etc. Your insights on this would be appreciated, since I'm not quite sure what Asterisk is up to with these rapid INVITE sequences. I'm thinking "bug" but maybe there is some subtle config option that I'm overlooking, so I'll ask the list before I file the bug. Maybe it's just that I've had too much caffeine today and the obvious solution is the one that's the most difficult to see.If you turn off qualify, we will use the default 500 ms as Timer T1. /O