On Jul 11, 2013, at 2:40 AM, Alexander Frolkin <avf at eldamar.org.uk>
wrote:
> The Java process has an AMI connection to Asterisk which it keeps open
> continuously.  When sending an AMI request, the process will wait a
> certain amount of time for a response (I think the timeout is 5 seconds)
> and if it hasn't received it, will retry the request up to 5 times.
Timeout-based retries on AMI are probably going to be a problem.
AMI uses TCP, which has its own retry/retransmission logic. While the
TCP connection is alive, you can safely assume that the message is
on its way to Asterisk, that Asterisk is processing it, or that the
response is heading back to you.
Now if the TCP connection fails after you've sent a request, but
before you receive a response, then you really have no way of knowing
if the request was received/processed/rejected/etc. At that point, the
best you can do is reestablish the AMI connection and retry the
action.
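One alternative to timer-driven retries is to tag each action with an
ActionID header (AMI echoes it back in the response) and only fail the
outstanding requests when the connection itself drops. Below is a rough
sketch of that bookkeeping, not a definitive implementation. The
PendingActions class and its method names are hypothetical, and it assumes
your reader thread has already parsed each response into a header map
keyed exactly as "ActionID".

```java
import java.io.IOException;
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical helper: tracks in-flight AMI actions by ActionID.
// It is not part of Asterisk or of any existing AMI client library.
class PendingActions {
    private final Map<String, CompletableFuture<Map<String, String>>> pending =
            new ConcurrentHashMap<>();

    // Call once per action, just before writing it to the socket.
    // The returned future completes when the matching response arrives.
    CompletableFuture<Map<String, String>> track(String actionId) {
        CompletableFuture<Map<String, String>> future = new CompletableFuture<>();
        pending.put(actionId, future);
        return future;
    }

    // Call from the reader thread for every parsed response.
    // Returns false for responses that match no in-flight action.
    boolean onResponse(Map<String, String> headers) {
        String actionId = headers.get("ActionID");
        if (actionId == null) {
            return false;
        }
        CompletableFuture<Map<String, String>> future = pending.remove(actionId);
        if (future == null) {
            return false;
        }
        future.complete(headers);
        return true;
    }

    // Call when the TCP connection fails. Only at this point is the outcome
    // of each in-flight action unknown, so only now is a retry considered.
    void onConnectionLost(IOException cause) {
        for (String actionId : pending.keySet()) {
            CompletableFuture<Map<String, String>> future = pending.remove(actionId);
            if (future != null) {
                future.completeExceptionally(cause);
            }
        }
    }
}
```

With something like this, a slow response still reaches its caller whenever
it arrives, and the same action is never sent twice over a healthy
connection. After a reconnect, the caller decides whether resending is safe.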
> Occasionally, the Java process logs show AMI requests timing out (after
> 5 tries).  What I see in packet captures of the AMI traffic is:
> 
>  1. Java process sends a request (e.g., add member to queue)
Do you see the TCP ACK coming back from Asterisk?
>  2. Retries 5 seconds later
>  3. Retries again 5 seconds later
>  4. Retries again 5 seconds later
>  5. Retries again 5 seconds later
>  6. Logs a timeout
>  7. After a few more seconds, Asterisk replies to all five requests
>     in one go (in a single packet; so, e.g., for "add member to
>     queue" it would reply "success", then four failures because the
>     member is already added); however at this stage, the Java process
>     has given up
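A side note on step 7: TCP is a byte stream, so several AMI responses
showing up in one packet is not unusual in itself. Each AMI message is a
set of "Key: Value" lines ended by a blank line, and the reader has to
split on that boundary rather than on packets. Here is a minimal sketch of
the splitting, assuming the chunk holds only complete messages (a real
reader would also need to buffer a partial message across reads). The
AmiMessageSplitter name is hypothetical, and the Message texts in the usage
note are made up for illustration.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper: splits one chunk of AMI text into individual messages.
class AmiMessageSplitter {
    // Assumes the chunk contains only whole messages, each ended by CRLF CRLF.
    static List<Map<String, String>> split(String chunk) {
        List<Map<String, String>> messages = new ArrayList<>();
        for (String block : chunk.split("\r\n\r\n")) {
            Map<String, String> headers = new LinkedHashMap<>();
            for (String line : block.split("\r\n")) {
                int colon = line.indexOf(':');
                if (colon < 0) {
                    continue; // skip anything that is not a "Key: Value" line
                }
                String key = line.substring(0, colon).trim();
                String value = line.substring(colon + 1).trim();
                headers.put(key, value);
            }
            if (!headers.isEmpty()) {
                messages.add(headers);
            }
        }
        return messages;
    }
}
```

For example, passing in
"Response: Success\r\nActionID: 1\r\nMessage: ok\r\n\r\nResponse: Error\r\nActionID: 2\r\nMessage: already there\r\n\r\n"
yields two header maps, one per response, which can then be matched to
their requests by ActionID.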
During the quiet period while you're waiting for the response, do you
receive events over that AMI connection? Are there other actions that
you're attempting to execute? Is there any consistency as to which
commands are getting delayed?
> It feels like Asterisk queues up the AMI responses and then
> periodically sends out all the responses in the queue in one go.  Is
> something like this going on?  Does the frequency at which Asterisk
> flushes the queue depend on load?  Are there any tunables in the config
> for this?
No, there's no response queue in Asterisk. For the actions I've
looked at, it pretty much immediately processes the request and sends
the response.
There are any number of reasons why the response would be delayed, but
the 25+ second delay you're seeing is excessive for any of the
reasons I can think of. The resource you're using AMI to access may be
busy doing something else. Or the request is simply taking that long
to process. Packet loss could cause delays in getting responses, but
usually not for the lengths of time you're talking about.
I know it's not a lot of info, but hopefully you can turn up some
logging or packet captures to narrow down what's going on.
> Thanks in advance,
> 
> Alex
-- 
David M. Lee
Digium, Inc. | Software Developer
445 Jan Davis Drive NW - Huntsville, AL 35806 - USA
Check us out at:  www.digium.com  & www.asterisk.org