tester
2008-Nov-16 01:07 UTC
[dtrace-discuss] dtrace : Abort due to systemic unresponsiveness
The system is Solaris 10 11/06 s10s_u3wos_10 SPARC on a frame. I am running speculative tracing for fork* and have this clause for fbt:

fbt:::
/self->spec/
{
    /*
     * A speculate() with no other actions speculates the default action:
     * tracing the EPID.
     */
    speculate(self->spec);
}

I got the following messages, which ended with an abort message:

dtrace: 5 failed speculations (no speculative buffer available)
dtrace: error on enabled probe ID 1 (ID 3068: syscall::fork1:entry): invalid address (0x0) in action #3 at DIF offset 28
dtrace: error on enabled probe ID 1 (ID 3068: syscall::fork1:entry): invalid address (0x0) in action #3 at DIF offset 28
dtrace: error on enabled probe ID 1 (ID 3068: syscall::fork1:entry): invalid address (0x0) in action #3 at DIF offset 28
dtrace: 1 failed speculation (available buffer(s) still busy)
dtrace: 5 failed speculations (no speculative buffer available)
dtrace: 1 failed speculation (no speculative buffer available)
dtrace: 1 failed speculation (no speculative buffer available)
dtrace: 2 failed speculations (no speculative buffer available)
dtrace: processing aborted: Abort due to systemic unresponsiveness

The script ran for more than 20 hours and printed several of the above messages. I read that this may have been due to load. This is the sar output from around the time it aborted (14:12):

14:00:05 9 48 0 42
14:05:04 9 34 0 57
14:10:03 8 30 0 62
14:15:04 9 47 0 44
14:20:02 12 36 0 52
14:25:01 22 30 0 47
14:30:03 25 25 0 50
14:35:03 25 20 0 56
14:40:01 18 14 0 68

sar -r (no conversions)

14:00:05 82988 20089718
14:05:04 80843 20107225
14:10:03 77365 19986565
14:15:04 83289 20103278
14:20:02 71336 19884133
14:25:01 72762 19995819
14:30:03 68079 20095938
14:35:03 70012 20155378
14:40:01 93756 19817050

I am going to restart the script; this is being done to troubleshoot EAGAIN errors from fork. Please suggest any modifications that I need to make.

--
This message posted from opensolaris.org
S h i v
2008-Nov-16 02:28 UTC
[dtrace-discuss] dtrace : Abort due to systemic unresponsiveness
From my limited understanding, there are two kinds of speculation-related failures seen here: failures due to an insufficient number of buffers ("no speculative buffer available") and failures due to using a buffer that is still busy being discarded or committed ("available buffer(s) still busy").

For the former, consider increasing the number of buffers with the nspec option. For the latter, consider setting the cleanrate option to a value above the default (101).

The "Abort due to systemic unresponsiveness" indicates heavy load. Once the load is addressed, the "error on enabled probe" errors will probably go away as well, assuming the data access in that clause is correct.

When is the speculative buffer created? When is it committed? Can the paths being speculated be reduced?

-Shiv

On Sun, Nov 16, 2008 at 6:37 AM, tester <solaris.identity at gmail.com> wrote:
> system is Solaris 10 11/06 s10s_u3wos_10 SPARC on a frame.
>
> I running speculative tracing for fork* and have this cluse for fbt
>
> fbt:::
> /self->spec/
> {
> /*
> * A speculate() with no other actions speculates the default action:
> * tracing the EPID.
> */
> speculate(self->spec);
> }
>
> I had the following messages which ended with abort message.
>
> dtrace: 5 failed speculations (no speculative buffer available)
> dtrace: error on enabled probe ID 1 (ID 3068: syscall::fork1:entry): invalid address (0x0) in action #3 at DIF offset 28
> dtrace: error on enabled probe ID 1 (ID 3068: syscall::fork1:entry): invalid address (0x0) in action #3 at DIF offset 28
> dtrace: error on enabled probe ID 1 (ID 3068: syscall::fork1:entry): invalid address (0x0) in action #3 at DIF offset 28
> dtrace: 1 failed speculation (available buffer(s) still busy)
> dtrace: 5 failed speculations (no speculative buffer available)
> dtrace: 1 failed speculation (no speculative buffer available)
> dtrace: 1 failed speculation (no speculative buffer available)
> dtrace: 2 failed speculations (no speculative buffer available)
> dtrace: processing aborted: Abort due to systemic unresponsiveness
>
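The nspec and cleanrate tuning suggested in this reply could be sketched as pragmas at the top of the D script. The values below are illustrative assumptions, not figures from the thread; they would need to be adjusted to how many fork* calls can be in flight at once on the system.

```d
/*
 * Sketch only: the numbers are assumptions, not measured values.
 * nspec sets how many speculative buffers exist; more buffers
 * reduce "no speculative buffer available" failures.
 */
#pragma D option nspec=16

/*
 * cleanrate sets how often committed/discarded speculative buffers
 * are cleaned for reuse; a higher rate reduces
 * "available buffer(s) still busy" failures.
 */
#pragma D option cleanrate=500hz
```

The same options can also be passed on the command line with -x (for example, -x nspec=16) instead of editing the script.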
tester
2008-Nov-16 03:26 UTC
[dtrace-discuss] dtrace : Abort due to systemic unresponsiveness
> From my limited understanding, there are 2 kinds of speculation
> related failures that are seen:
> Failures due to insufficient number of buffers (no speculative buffer
> available) and failures due to using a buffer that is busy getting
> discarded/committed (available buffer(s) still busy)
> For the former consider increasing the number of buffers with the
> nspec option. For the latter, consider setting the CPU cleanrate to
> value beyond the default (101)
> The "Abort due to systemic unresponsiveness" indicates heavy load.
> This if addressed will probably not throw the "error on enabled probe"
> error assuming data access is happening right.
>
> When is the speculative buffer created? When is it committed? Can the
> paths being speculated be reduced?
>
> -Shiv
>

syscall::fork*:entry
{
    self->spec = speculation();
}

syscall::fork*:return
/self->spec/
{
    speculate(self->spec);
}

syscall::fork*:return
/self->spec && errno != 0/
{
    /*
     * If errno is non-zero, we want to commit the speculation.
     */
    commit(self->spec);
    self->spec = 0;
}

syscall::fork*:return
/self->spec && errno == 0/
{
    /*
     * If errno is not set, we discard the speculation.
     */
    discard(self->spec);
    self->spec = 0;
}

As I mentioned, we want to capture the code path for a fork failure. Are any changes needed? I am more interested in knowing why it aborted. The sar output shows around 40% CPU idle (of course, the sar sampling interval is coarse). Thanks for any comments.
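On the earlier question of whether the speculated paths can be reduced: one possible approach is to narrow the fbt:::  clause so that fewer probes fire per fork. The sketch below limits it to the genunix module. That choice is an assumption made for illustration (it presumes the failing part of the fork path is in genunix); the thread does not say where the EAGAIN originates.

```d
/*
 * Sketch only: restricting to genunix is an assumption.
 * fbt:genunix:: enables entry and return probes for functions in
 * the genunix module only, instead of every kernel function.
 */
fbt:genunix::
/self->spec/
{
    /* Default action (trace the EPID) goes to the speculative buffer. */
    speculate(self->spec);
}
```

If the failing path turns out to cross into other modules, additional module-scoped clauses of the same form could be added rather than going back to fbt::: for the whole kernel.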