2017-12-07 15:50 GMT+01:00 George Joseph <gjoseph at digium.com>:
>
>
> On Wed, Dec 6, 2017 at 11:13 AM, Olivier <oza.4h07 at gmail.com> wrote:
>
>>
>>
>> 2017-12-06 15:52 GMT+01:00 George Joseph <gjoseph at digium.com>:
>>
>>>
>>>
>>> On Tue, Dec 5, 2017 at 9:20 AM, Olivier <oza.4h07 at gmail.com> wrote:
>>>
>>>> Hello,
>>>>
>>>> I carefully read [1], which details how backtrace files can be produced.
>>>>
>>>> Maybe this seems natural to some, but how can I go one step further and
>>>> check that the produced XXX-thread1.txt, XXX-brief.txt, ... files are OK?
>>>>
>>>> In other words, where can I find an example of how to use one of those
>>>> files, so I can check for myself now and, if a system ever fails, not
>>>> have to wait for another failure before providing the required data to
>>>> support teams?
>>>>
>>>
>>> It's a great question, but I could spend a week answering it and not
>>> scratch the surface. :)
>>>
>>
>> Thanks very much for trying, anyway ;-)
>>
>>
>>> It's not a straightforward thing unless you know the code in question.
>>> The most common crash is a segmentation fault (segfault or SEGV).
>>>
>>
>> True! I have experienced segfaults recently, and I could not configure
>> the platform I used then (Debian Jessie) to produce core files in a
>> directory Asterisk can write into.
>> Now, with Debian Stretch, I can produce a core file at will (with
>> kill -s SIGSEGV <processid>).
>> I checked that ast_coredumper worked OK, as it produced the thread1.txt
>> files and so on.
>>
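For the core-file setup problem described above, one quick sanity check before the next crash is to look at the kernel's core_pattern and the core-size limit. The Python sketch below is illustrative only and is not part of Asterisk or ast_coredumper; the function names are hypothetical. It reads the limit of the shell that runs it, which is not necessarily the limit of the running Asterisk process (/proc/<pid>/limits shows that as "Max core file size"), and it should be run as the user Asterisk runs as so that the writability check means something.

```python
import os
import resource
from pathlib import Path

def core_dump_target(core_pattern):
    """Classify /proc/sys/kernel/core_pattern: where would a core file go?"""
    pattern = core_pattern.strip()
    if pattern.startswith("|"):
        # A leading pipe hands the core to a helper program instead of a file.
        helper = pattern[1:].split()
        return "pipe", (helper[0] if helper else "")
    # Note: %-specifiers inside the directory part are not expanded here.
    directory = os.path.dirname(pattern)
    if directory:
        return "directory", directory
    # A bare name such as "core" is written to the crashing process's cwd.
    return "cwd", None

def check_core_dumps(soft_limit, core_pattern, writable):
    """Return (kind, where, problems) for a soft RLIMIT_CORE and core_pattern.

    writable(path) -> bool is passed in so the logic can be tested offline.
    """
    problems = []
    if soft_limit == 0:
        problems.append("RLIMIT_CORE soft limit is 0, so no core file will be written")
    kind, where = core_dump_target(core_pattern)
    if kind == "directory" and not writable(where):
        problems.append(f"{where} is not writable by this user")
    return kind, where, problems

if __name__ == "__main__":
    soft, _hard = resource.getrlimit(resource.RLIMIT_CORE)
    pattern = Path("/proc/sys/kernel/core_pattern").read_text()
    kind, where, problems = check_core_dumps(
        soft, pattern, lambda d: os.access(d, os.W_OK))
    print("core_pattern sends cores to:", kind, where or "")
    for problem in problems:
        print("WARNING:", problem)
```

A "pipe" result (for example to apport or systemd-coredump) is not an error, but it means the core will not land where ast_coredumper might expect it.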
>> Ideally, I would like to go one step further: check now that a future
>> .txt file would be "workable" (and not get "you should have compiled with
>> option XXX or configured with option YYY").
>>
>>
>>
>>> In that case, the thread1.txt file is the place to start. Since most
>>> of the objects passed around are really pointers to objects, the most
>>> obvious cause would be a 0x0 for a value, for instance "chan=0x0".
>>> That would be a pointer to a channel object that was not set when it
>>> probably should have been. Unfortunately, 0x0 is not the only thing that
>>> can cause a segv. Any time a program tries to access memory it doesn't
>>> own, that signal is raised. So let's say there's a 256-byte buffer which
>>> the process owns. If there's a bug somewhere that causes the program to
>>> try to access bytes beyond the end of the buffer, you MAY get a segv if
>>> the process doesn't also own that memory. In this case, the backtrace
>>> won't show anything obvious because the pointers all look valid. There
>>> would probably be an index variable (i or ix, etc.) that may be set to
>>> 257, but you'd have to know that the buffer was only 256 bytes to realize
>>> that was the issue.
>>>
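As a rough first pass over thread1.txt, the "name=0x0" arguments described above can be picked out mechanically. This is a hypothetical Python helper, not an Asterisk tool, and it assumes gdb's usual "#N 0x... in func (args) at file.c:line" frame format. A hit is only a lead: a NULL argument can be legitimate, and, as explained above, an out-of-bounds access may show no NULL at all.

```python
import re
import sys

FRAME_START = re.compile(r"^#(\d+)\s")
# "name=0x0" followed by a non-word character, so 0x00007f... does not match.
NULL_ARG = re.compile(r"\b(\w+)=0x0\b")

def frames(lines):
    """Join gdb backtrace lines into one string per frame ("#N ...")."""
    current = None
    for raw in lines:
        line = raw.strip()
        if FRAME_START.match(line):
            if current is not None:
                yield current
            current = line
        elif not line or line.startswith("Thread "):
            # A blank line or a new thread header ends the current frame.
            if current is not None:
                yield current
            current = None
        elif current is not None:
            current += " " + line  # continuation of a wrapped frame
    if current is not None:
        yield current

def null_pointer_args(lines):
    """Yield (frame_number, arg_name, frame_text) for each 'name=0x0' argument."""
    for frame in frames(lines):
        number = int(FRAME_START.match(frame).group(1))
        for name in NULL_ARG.findall(frame):
            yield number, name, frame

if __name__ == "__main__":
    # Usage (hypothetical file names): python3 find_null_args.py core-XXX-thread1.txt
    with open(sys.argv[1], encoding="utf-8", errors="replace") as f:
        for number, name, frame in null_pointer_args(f):
            print(f"frame #{number}: {name}=0x0 in: {frame[:120]}")
```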
>>
>> So, with an artificial kill -s SIGSEGV <processid>, does the output
>> below prove I have workable .txt files? (Having .txt files that let
>> people find the root cause of the issue is another story, as we can
>> probably only hope for the best there.)
>>
>>
>> # head core-brief.txt
>> !@!@!@! brief.txt !@!@!@!
>>
>>
>> Thread 38 (Thread 0x7f2aa5dd0700 (LWP 992)):
>> #0 pthread_cond_timedwait@@GLIBC_2.3.2 () at
>> ../sysdeps/unix/sysv/linux/x86_64/pthread_cond_timedwait.S:225
>> #1 0x000055cdcb69ae84 in __ast_cond_timedwait (filename=0x55cdcb7d4910
>> "threadpool.c", lineno=1131, func=0x55cdcb7d4ea8 <__PRETTY_FUNCTION__.8978>
>> "worker_idle", cond_name=0x55cdcb7d4b7f "&worker->cond",
>> mutex_name=0x55cdcb7d4b71 "&worker->lock", cond=0x7f2abc000978,
>> t=0x7f2abc0009a8, abstime=0x7f2aa5dcfc30) at lock.c:668
>> #2 0x000055cdcb75d153 in worker_idle (worker=0x7f2abc000970) at
>> threadpool.c:1131
>> #3 0x000055cdcb75ce61 in worker_start (arg=0x7f2abc000970) at
>> threadpool.c:1022
>> #4 0x000055cdcb769a8c in dummy_start (data=0x7f2abc000a80) at
>> utils.c:1238
>> #5 0x00007f2aeddad494 in start_thread (arg=0x7f2aa5dd0700) at
>> pthread_create.c:333
>>
>
>
> That's it! The key pieces of information are the function names
> (worker_idle, worker_start, etc.), the filenames (threadpool.c, etc.) and
> the line numbers (1131, 1022, etc.).
>
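To check ahead of time that a backtrace is "workable" in the sense above (function names, filenames and line numbers present), one option is to count the frames that lack them. This is a hedged Python sketch; the script and its names are hypothetical, and it assumes the same gdb frame format as the brief.txt excerpt. Frames in system libraries (libc, libpthread) commonly show "from /lib/..." without a line number unless their debug symbols are installed, so the frames that matter most are the ones from Asterisk itself.

```python
import re
import sys
from collections import Counter

# "#N [0xADDR in ]function (" : frame 0 often has no address.
FRAME = re.compile(r"^#(\d+)\s+(?:0x[0-9a-fA-F]+ in )?(\S+) \(")
# ") at file.c:123" is only present when debug info was available.
SOURCE = re.compile(r"\) at (\S+):(\d+)")

def frames(lines):
    """Join gdb backtrace lines into one string per frame ("#N ...")."""
    current = None
    for raw in lines:
        line = raw.strip()
        if line.startswith("#") and line[1:2].isdigit():
            if current:
                yield current
            current = line
        elif not line or line.startswith("Thread "):
            if current:
                yield current
            current = None
        elif current:
            current += " " + line  # continuation of a wrapped frame
    if current:
        yield current

def classify(frame):
    """Return (function, "file:line" or None) for one joined frame."""
    m = FRAME.match(frame)
    if not m:
        return None, None
    src = SOURCE.search(frame)
    return m.group(2), (f"{src.group(1)}:{src.group(2)}" if src else None)

def report(lines):
    """Count frames that are fully symbolized versus missing information."""
    counts, incomplete = Counter(), []
    for frame in frames(lines):
        func, where = classify(frame)
        if func is None:
            continue
        if func == "??":
            counts["unknown function"] += 1
            incomplete.append(frame)
        elif where is None:
            counts["no file:line"] += 1
            incomplete.append(frame)
        else:
            counts["ok"] += 1
    return counts, incomplete

if __name__ == "__main__":
    with open(sys.argv[1], encoding="utf-8", errors="replace") as f:
        counts, incomplete = report(f)
    print(dict(counts))
    for frame in incomplete:
        print("  incomplete:", frame[:120])
```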
>
>
>
>>
>>
>>> Deadlocks are even harder to troubleshoot. For that, you need to look
>>> at full.txt to see where the threads are stuck and find the one thread
>>> that's holding the lock that the others are stuck on.
>>>
>>> Sorry. I wish I had a better answer, because it'd help a lot if folks
>>> could do more investigation themselves.
>>>
> --
> _____________________________________________________________________
> -- Bandwidth and Colocation Provided by http://www.api-digital.com --
>
> Check out the new Asterisk community forum at: https://community.asterisk.org/
>
> New to Asterisk? Start here:
> https://wiki.asterisk.org/wiki/display/AST/Getting+Started
>
> asterisk-users mailing list
> To UNSUBSCRIBE or update options visit:
> http://lists.digium.com/mailman/listinfo/asterisk-users
>
Thank you very much, guys, for your replies!
Now I can't wait for our next segfault to happen ;-))))