thr3ads.net - dtrace discuss - [dtrace-discuss] dtrace : Abort due to systemic unresponsiveness [Nov 2008]

tester

2008-Nov-16 01:07 UTC

[dtrace-discuss] dtrace : Abort due to systemic unresponsiveness

system is Solaris 10 11/06 s10s_u3wos_10 SPARC on a frame.

I am running speculative tracing for fork* and have this clause for fbt:

fbt:::
/self->spec/
{
        /*
         * A speculate() with no other actions speculates the default action:
         * tracing the EPID.
         */
        speculate(self->spec);
}

I got the following messages, ending with an abort:

dtrace: 5 failed speculations (no speculative buffer available)
dtrace: error on enabled probe ID 1 (ID 3068: syscall::fork1:entry): invalid
address (0x0) in action #3 at DIF offset 28
dtrace: error on enabled probe ID 1 (ID 3068: syscall::fork1:entry): invalid
address (0x0) in action #3 at DIF offset 28
dtrace: error on enabled probe ID 1 (ID 3068: syscall::fork1:entry): invalid
address (0x0) in action #3 at DIF offset 28
dtrace: 1 failed speculation (available buffer(s) still busy)
dtrace: 5 failed speculations (no speculative buffer available)
dtrace: 1 failed speculation (no speculative buffer available)
dtrace: 1 failed speculation (no speculative buffer available)
dtrace: 2 failed speculations (no speculative buffer available)
dtrace: processing aborted: Abort due to systemic unresponsiveness

The script ran for more than 20 hours and printed several of the above
messages.

I read that this may be due to load. This is the sar output from around the
time it aborted (14:12):

14:00:05       9      48       0      42
14:05:04       9      34       0      57
14:10:03       8      30       0      62
14:15:04       9      47       0      44
14:20:02      12      36       0      52
14:25:01      22      30       0      47
14:30:03      25      25       0      50
14:35:03      25      20       0      56
14:40:01      18      14       0      68

sar -r (no conversions)
14:00:05   82988 20089718
14:05:04   80843 20107225
14:10:03   77365 19986565
14:15:04   83289 20103278
14:20:02   71336 19884133
14:25:01   72762 19995819
14:30:03   68079 20095938
14:35:03   70012 20155378
14:40:01   93756 19817050

I am going to restart the script; it was written to troubleshoot EAGAIN errors
from fork. Please suggest any modifications I need to make.
-- 
This message posted from opensolaris.org

S h i v

2008-Nov-16 02:28 UTC

From my limited understanding, there are two kinds of speculation-related
failures seen here: failures due to an insufficient number of buffers
("no speculative buffer available") and failures due to using a buffer
that is still busy being discarded or committed ("available buffer(s)
still busy").
For the former, consider increasing the number of buffers with the
nspec option. For the latter, consider raising the cleanrate option
above its default (101hz).
The "Abort due to systemic unresponsiveness" message indicates heavy load.
Once that is addressed, the "error on enabled probe" errors will probably
go away too, assuming the data access in the script is correct.
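
As a rough sketch of that tuning, both options can be set with pragmas at
the top of the D script. The values 16 and 500hz are placeholders picked
for illustration, not recommendations; suitable values depend on how many
threads fork concurrently and on system load.

```d
/* Illustrative values only (assumptions, not tuned for this system). */

/* More speculative buffers, for "no speculative buffer available". */
#pragma D option nspec=16

/* Faster cleaning, for "available buffer(s) still busy". */
#pragma D option cleanrate=500hz
```

The same options can also be passed on the dtrace command line with -x,
for example -x nspec=16.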

When is the speculative buffer created? When is it committed? Can the
paths being speculated be reduced?
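
One possible way to reduce the speculated paths, as a sketch: limit the fbt
clause to a single kernel module instead of enabling every fbt probe. The
module name genunix below is an assumption chosen for illustration;
whichever module actually contains the fork path of interest would go there.

```d
/*
 * Speculate only on fbt probes in one module (genunix is an assumed
 * example) rather than on fbt::: across the whole kernel.
 */
fbt:genunix::
/self->spec/
{
        speculate(self->spec);
}
```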

-Shiv


On Sun, Nov 16, 2008 at 6:37 AM, tester <solaris.identity at gmail.com> wrote:
> system is Solaris 10 11/06 s10s_u3wos_10 SPARC on a frame.
>
> I running speculative tracing for fork* and have this cluse for fbt
>
> fbt:::
> /self->spec/
> {
>        /*
>         * A speculate() with no other actions speculates the default action:
>         * tracing the EPID.
>         */
>        speculate(self->spec);
> }
>
> I had the following messages which ended with abort message.
>
> dtrace: 5 failed speculations (no speculative buffer available)
> dtrace: error on enabled probe ID 1 (ID 3068: syscall::fork1:entry): invalid address (0x0) in action #3 at DIF offset 28
> dtrace: error on enabled probe ID 1 (ID 3068: syscall::fork1:entry): invalid address (0x0) in action #3 at DIF offset 28
> dtrace: error on enabled probe ID 1 (ID 3068: syscall::fork1:entry): invalid address (0x0) in action #3 at DIF offset 28
> dtrace: 1 failed speculation (available buffer(s) still busy)
> dtrace: 5 failed speculations (no speculative buffer available)
> dtrace: 1 failed speculation (no speculative buffer available)
> dtrace: 1 failed speculation (no speculative buffer available)
> dtrace: 2 failed speculations (no speculative buffer available)
> dtrace: processing aborted: Abort due to systemic unresponsiveness
>

tester

2008-Nov-16 03:26 UTC

> From my limited understanding, there are 2 kinds of speculation
> related failures that are seen:
> Failures due to insufficient number of buffers (no speculative buffer
> available) and failures due to using a buffer that is busy getting
> discarded/committed (available buffer(s) still busy)
> For the former consider increasing the number of buffers with the
> nspec option. For the latter, consider setting the CPU cleanrate to
> value beyond the default (101)
> The "Abort due to systemic unresponsiveness" indicates heavy load.
> This if addressed will probably not throw the "error on enabled probe"
> error assuming data access is happening right.
>
> When is the speculative buffer created? When is it committed? Can the
> paths being speculated be reduced?
>
> -Shiv

syscall::fork*:entry
{
        self->spec = speculation();
}

syscall::fork*:return
/self->spec/
{
        speculate(self->spec);
}

syscall::fork*:return
/self->spec && errno != 0/
{
        /*
         * If errno is non-zero, we want to commit the speculation.
         */
        commit(self->spec);
        self->spec = 0;
}

syscall::fork*:return
/self->spec && errno == 0/
{
        /*
         * If errno is not set, we discard the speculation.
         */
        discard(self->spec);
        self->spec = 0;
}

As I mentioned, we want to capture the code path for a fork failure. Are any
changes needed? I am more interested in knowing why it aborted. The sar output
shows around 40% CPU idle (of course, sar's granularity is coarse). Thanks for
any comments.
