przemolicc at poczta.fm
2006-Dec-14 13:35 UTC
[dtrace-discuss] dtrace: processing aborted: Abort due to systemic unresponsiveness - again
Hello,

we have a 2-way (4-core) Opteron server (x220M2):

bash-3.00# uname -a
SunOS e2 5.10 Generic_118855-19 i86pc i386 i86pc
bash-3.00# psrinfo -v
Status of virtual processor 0 as of: 12/14/2006 14:08:12
  on-line since 12/13/2006 11:41:29.
  The i386 processor operates at 2613 MHz,
        and has an i387 compatible floating point processor.
Status of virtual processor 1 as of: 12/14/2006 14:08:12
  on-line since 12/13/2006 11:41:34.
  The i386 processor operates at 2613 MHz,
        and has an i387 compatible floating point processor.
Status of virtual processor 2 as of: 12/14/2006 14:08:12
  on-line since 12/13/2006 11:41:36.
  The i386 processor operates at 2613 MHz,
        and has an i387 compatible floating point processor.
Status of virtual processor 3 as of: 12/14/2006 14:08:12
  on-line since 12/13/2006 11:41:38.
  The i386 processor operates at 2613 MHz,
        and has an i387 compatible floating point processor.

I was trying to catch syscalls and got an unexpected message:

bash-3.00# dtrace -n 'syscall::: { @[execname] = count (); }'
dtrace: description 'syscall::: ' matched 454 probes
dtrace: processing aborted: Abort due to systemic unresponsiveness

Because the server is not busy and is being prepared for production, I was
surprised, so I ran dtrace again while watching the system with vmstat.
Below is the vmstat output:

[...]
 0 0 0 14546888 7392452 0   0    0  0  0  0  0  0  0  0  0   668  160  316  0  0 100
 0 0 0 14546888 7392452 0   0    0  0  0  0  0  2  2  1  1   671  130  268  0  0 100
 0 0 0 14546888 7392452 0   0    0  0  0  0  0  3  3  1  1   738  250  326  0  0 100
 0 0 0 14546888 7392452 0   0    0  0  0  0  0  2  2  1  1   690  145  284  0  0 100
 0 0 0 14546888 7392452 0   0    0  0  0  0  0  0  0  0  0   651  161  285  0  0 100
 0 0 0 14546888 7392452 0   0    0  0  0  0  0  0  0  0  0   687  284  316  0  0 100
 0 0 0 14546888 7392452 0   0    0  0  0  0  0  0  0  0  0   634  146  268  0  0 100
 0 0 0 14546888 7392452 0   0    0  0  0  0  0  0  0  0  0   644  150  271  0  0 100
 0 0 0 14546284 7391632 143 288  0  0  0  0  0  0  0  0  0   692  910  342  0  0 100
 0 0 0 14459332 7312120 228 3332 0  0  0  0  0  0  0  0  0   884 2050  293  5  4  91 [1]
 0 0 0 14459332 7312116 0   1    0  0  0  0  0  0  0  0  0   865  204  286  0  1  99
 kthr      memory            page            disk          faults      cpu
 r b w   swap  free  re  mf pi po fr de sr cd cd m2 m3   in   sy   cs us sy id
 0 0 0 14459332 7312116 0   0    0  0  0  0  0  0  0  0  0   889  232  306  0  0 100
 0 0 0 14459332 7312116 0   0    0  0  0  0  0  1  1  0  0   938  326  353  0  0 100
 0 0 0 14546272 7391616 0   1    0  0  0  0  0  0  0  0  0 53402 1389 1109  4  3  93 [2]
 0 0 0 14546272 7391616 148 362  0  0  0  0  0  1  1  1  1  1691 3243 1915  3  1  96
 0 0 0 14545648 7390776 0   0    0  0  0  0  0  0  0  0  0   635   97  264  0  0 100
 0 0 0 14545648 7390776 1   3    0  0  0  0  0  0  0  0  0   687  210  340  0  0 100
 0 0 0 14545648 7390772 0   0    0  0  0  0  0  0  0  0  0   718  289  380  0  0 100
 0 0 0 14545648 7390772 0   0    0  0  0  0  0  0  0  0  0   676  224  322  0  0 100
 0 0 0 14545648 7390772 0   0    0  0  0  0  0  0  0  0  0   641  126  278  0  0 100
 0 0 0 14545648 7390772 0   0    0  0  0  0  0  0  0  0  0   642  127  263  0  0 100
[...]

[1] marks when I ran dtrace and [2] marks when I got the
"unresponsiveness" message.

I have read the relevant topic:
http://www.opensolaris.org/jive/thread.jspa?messageID=15073&
and am aware that:
- enabling destructive actions (-w), or
- tuning the deadman parameters below:
    dtrace_deadman_user
    dtrace_deadman_interval
    dtrace_deadman_timeout
can be helpful. I agree that all of these are useful when the server is really
busy, but I wouldn't expect such behaviour on an idle server!

Is there any way to solve the problem without those tweaks? I would also like
to understand the nature of the problem better.
Regards
przemol
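(For anyone who wants to try the deadman tuning mentioned above before a fix
is available, here is a minimal sketch of how it is commonly done. It assumes
the deadman variables live in the dtrace kernel module and that
dtrace_deadman_timeout is a nanosecond value with a default of roughly ten
seconds; verify both against your build before relying on it.)

* /etc/system: raise the deadman timeout to 60 seconds (value in
* nanoseconds); takes effect at the next reboot.
set dtrace:dtrace_deadman_timeout=60000000000

# Or patch the running kernel with mdb (reverts at reboot):
echo "dtrace_deadman_timeout/Z 0t60000000000" | mdb -kw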
Jon Haslam
2006-Dec-15 21:56 UTC
[dtrace-discuss] dtrace: processing aborted: Abort due to systemic unresponsiveness - again
Hi,

Yes, this looks to be an issue. I've had the same problem on some
dual core M2 systems and so has a colleague.

By the look of it, our high res timer gets severely out of whack
every now and then, which causes us to think that things have hung.
I'll let you know when I know more.

Jon.

> I was trying to catch syscalls and got an unexpected message:
> bash-3.00# dtrace -n 'syscall::: { @[execname] = count (); }'
> dtrace: description 'syscall::: ' matched 454 probes
> dtrace: processing aborted: Abort due to systemic unresponsiveness
> [...]
Dan Mick
2006-Dec-15 22:02 UTC
[dtrace-discuss] dtrace: processing aborted: Abort due to systemic unresponsiveness - again
Jon Haslam wrote:
> Yes, this looks to be an issue. I've had the same problem on some
> dual core M2 systems and so has a colleague.
>
> By the look of it, our high res timer gets severely out of whack
> every now and then, which causes us to think that things have hung.
> I'll let you know when I know more.

If that's the problem, it might be fixed by

6342823 Unable to offline CPU 0 on x86 systems

which is in snv_34, and supposed to be in Update 2. I don't know if
the system below is update 2 or not.

(The problem was actually a concomitant fix to timestamp.c under that
bug number, which probably should have had its own bug number. It only
affects multi-CPU machines where the TSCs are very different at boot
time, which is something that has only started happening with the latest
Intel Core2 and AMD RevE/RevF processor BIOSes, apparently... so it's a
latent bug exposed by newer hardware.)
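(A quick, illustrative way to see the kind of boot-time TSC skew described
above is to let DTrace record each CPU's own view of its high-resolution
timer and compare the ranges; the script below is a sketch, not something
from this thread. On a healthy machine the per-CPU min/max ranges overlap,
while on an affected machine one CPU's range sits far away from the others.)

#!/usr/sbin/dtrace -s

/* Sample DTrace's timestamp on every CPU for ten seconds. */
profile-997
{
        @tmin[cpu] = min(timestamp);
        @tmax[cpu] = max(timestamp);
}

/* Print each CPU's observed range and exit. */
tick-10sec
{
        printa("cpu %d  min %@d\n", @tmin);
        printa("cpu %d  max %@d\n", @tmax);
        exit(0);
}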
Jon Haslam
2006-Dec-15 22:08 UTC
[dtrace-discuss] dtrace: processing aborted: Abort due to systemic unresponsiveness - again
>> By the look of it, our high res timer gets severely out of whack
>> every now and then, which causes us to think that things have hung.
>> I'll let you know when I know more.
>
> If that's the problem, it might be fixed by
>
> 6342823 Unable to offline CPU 0 on x86 systems
>
> which is in snv_34, and supposed to be in Update 2. I don't know if
> the system below is update 2 or not.

Thanks Dan, but this happens with the snv gate bits as well. I'll contact
you off-alias with my findings so far in case they ring any bells.

Jon.
przemolicc at poczta.fm
2006-Dec-18 08:06 UTC
[dtrace-discuss] dtrace: processing aborted: Abort due to systemic unresponsiveness - again
On Fri, Dec 15, 2006 at 02:02:43PM -0800, Dan Mick wrote:
> If that's the problem, it might be fixed by
>
> 6342823 Unable to offline CPU 0 on x86 systems
>
> which is in snv_34, and supposed to be in Update 2. I don't know if
> the system below is update 2 or not.

As Jon said, it is not fixed in update 2:

host:> cat /etc/release
                       Solaris 10 6/06 s10x_u2wos_09a X86
           Copyright 2006 Sun Microsystems, Inc.  All Rights Reserved.
                        Use is subject to license terms.
                             Assembled 09 June 2006

przemol
Jon Haslam
2006-Dec-22 18:51 UTC
[dtrace-discuss] dtrace: processing aborted: Abort due to systemic unresponsiveness - again
Hi,

Apologies for taking a while to get back to you on this. This issue is
to do with how we handle (or don't handle) TSCs being different across
CPUs in the timer function that DTrace is using. I've logged the
following CR for this:

6507659 tsc differences between CPU's give dtrace_gethrtime() serious problems

Hopefully a fix will be coming shortly.

Cheers.

Jon.
Robert Milkowski
2007-Apr-19 00:40 UTC
[dtrace-discuss] Re: dtrace: processing aborted: Abort due to systemic
I can see it's been fixed in snv_58.
Is an S10 patch planned?
Should I ask for an escalation via support?
Jon Haslam
2007-Apr-20 17:23 UTC
[dtrace-discuss] Re: dtrace: processing aborted: Abort due to systemic
Hi Robert,

> I can see it's been fixed in snv_58.
> Is an S10 patch planned?
> Should I ask for an escalation via support?

I can't be sure that you are hitting the same problem, but I assume you
are referring to:

6507659 tsc differences between CPU's give dtrace_gethrtime serious problems

If so, then it looks like there is an escalation open for this. If you
need immediate relief you should contact your support channel, who will
probably be able to get you an IDR patch.

You may be aware of this already, but I'll mention it anyway: enable
destructive actions to stop the consumer aborting. Of course, the very
nature of the problem means that your high-resolution timestamps could
occasionally be wrong, so you need to bear that in mind when using the
'timestamp' built-in (for example).

Jon.
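(As a concrete example of that workaround, the original one-liner from this
thread can be run with destructive actions enabled, which also stops the
deadman from aborting the consumer; on an affected machine any
timestamp-based output may still be unreliable.)

bash-3.00# dtrace -w -n 'syscall::: { @[execname] = count(); }'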