thr3ads.net - dtrace discuss - [dtrace-discuss] How to speculate and commit on hang. [Sep 2007]

Brian Utterback

2007-Sep-20 15:28 UTC

[dtrace-discuss] How to speculate and commit on hang.

I am trying to find the root cause of a bug that results in a system
call that blocks and does not return when it should. I am using
speculations and the fbt provider to trace the kernel function calls
that lead to this. From system dumps I know exactly where the call
blocks.

My problem is that the point at which it blocks is a wait for CV
kind of thing and it is called on most trips through this system
call. The bug is that occasionally a thread checks in but never
checks out. Obviously I have no problem detecting when a block did
not occur, namely the CV wait function returns. In that case I
can discard the speculation. But how do I commit the speculation
when the function doesn't return?

I thought of saving the active speculations in an associative
array with the spec as the key and the timestamp as the value.
That way, any key whose timestamp is more than some threshold
old should be committed, but I can't figure out a way to get
the set of keys and iterate over them.

Any ideas on how to trace something that doesn't happen?
-- 
blu

Screening ideas are indeed thought up by the Office for Annoying
Air Travelers and vetted through the Directorate for Confusion
and Complexity - Kip Hawley, Head of the TSA
----------------------------------------------------------------------
Brian Utterback - Solaris RPE, Sun Microsystems, Inc.
Ph:877-259-7345, Em:brian.utterback-at-ess-you-enn-dot-kom

Bryan Cantrill

2007-Sep-20 15:56 UTC


Hey Brian,
> I am trying to find the root cause of a bug that results in a system
> call that blocks and does not return when it should. I am using
> speculations and the fbt provider to trace the kernel function calls
> that lead to this. From system dumps I know exactly where the call
> blocks.
> 
> My problem is that the point at which it blocks is a wait for CV
> kind of thing and it is called on most trips through this system
> call. The bug is that occasionally a thread checks in but never
> checks out. 
So what do you mean exactly when you say that?  You have a missed
wakeup -- is it your hypothesis that this is a missed cv_signal()?
Or do you mean something else by "never checks out"?
> Obviously I have no problem detecting when a block did
> not occur, namely the CV wait function returns. In that case I
> can discard the speculation. But how do I commit the speculation
> when the function doesn't return?
> 
> I thought of saving the active speculations in an associative
> array with the spec as the key and the timestamp as the value.
> That way, any key whose timestamp is more than some threshold
> old should be committed, but I can't figure out a way to get
> the set of keys and iterate over them.
That's a creative idea, but it won't work for exactly the reason
you described.  My recommendation:  have an associative array that is
keyed by the speculation, for which the value is a timestamp.  Then,
use a high frequency tick probe to sweep through the array (speculation
identifiers start at 1 and proceed monotonically to the number of
speculative buffers), committing speculations that are older than some
threshold.  e.g.:

	tick-1234hz
	{
		this->spec = (ndx++ % nspeculations) + 1;
		this->speculated = myspec[this->spec];
	}

	tick-1234hz
	/this->speculated && timestamp - this->speculated > threshold/
	{
		/* commit() takes the speculation ID; myspec[] holds timestamps */
		commit(this->spec);
	}

(Where "myspec" is an associative array of timestamps keyed by
speculation, "nspeculations" is whatever you have tuned "nspec" to,
"ndx" is an integer, and "threshold" is a nanosecond threshold after
which uncommitted speculations should be committed.  And it should go
without saying that any discarded speculations will need to have
myspec[speculation] zeroed.)

Hopefully that'll do it for you -- with apologies for the contortions
that effecting this requires. ;)
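
For reference, a minimal sketch of how the pieces described above might fit together in a single script. It is not from the thread and is untested here. The traced system call (read), the wait function (cv_wait), the nspec value of 8, and the 5-second threshold are all assumptions chosen for illustration; substitute the real call and wait function from the system dump.

```d
#!/usr/sbin/dtrace -s

/* Assumed tuning: eight speculative buffers. */
#pragma D option nspec=8

inline int NSPEC = 8;			/* must match the nspec option above */
inline uint64_t THRESHOLD = 5000000000;	/* assumed: 5 seconds, in ns */

uint64_t myspec[int];	/* speculation ID -> time the thread began waiting */
int ndx;		/* sweep cursor */

syscall::read:entry		/* assumed: substitute the real system call */
{
	self->spec = speculation();
}

fbt:::
/self->spec/
{
	speculate(self->spec);	/* default action goes to the speculative buffer */
}

fbt::cv_wait:entry		/* assumed wait function: the thread checks in */
/self->spec/
{
	myspec[self->spec] = timestamp;
}

fbt::cv_wait:return		/* the thread checked out: stop the clock */
/self->spec/
{
	myspec[self->spec] = 0;
}

syscall::read:return		/* normal completion: throw the trace away */
/self->spec/
{
	myspec[self->spec] = 0;
	discard(self->spec);
	self->spec = 0;
}

tick-1234hz			/* visit one speculation ID per firing */
{
	this->spec = (ndx++ % NSPEC) + 1;
	this->speculated = myspec[this->spec];
}

tick-1234hz
/this->speculated && timestamp - this->speculated > THRESHOLD/
{
	commit(this->spec);
	myspec[this->spec] = 0;	/* do not commit the same ID twice */
}
```

One caveat with this sketch: a thread that is merely slow, not hung, may return after its speculation has been committed, and its discard() would then name an ID that could already have been handed to another thread. Choosing a threshold well above the longest legitimate wait reduces that risk.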

	- Bryan

--------------------------------------------------------------------------
Bryan Cantrill, Solaris Kernel Development.       http://blogs.sun.com/bmc

Brian Utterback

2007-Sep-20 16:15 UTC


Bryan Cantrill wrote:
> keyed by the speculation, for which the value is timestamp.  Then,
> use a high frequency tick probe to sweep through the array (speculation
> identifiers start at 1 and proceed monotonically to the number of
> speculative buffers), committing speculations that are older than some
> threshold.  e.g.:
Ohhh. Thank you. That is exactly what I needed to know. Everything
else is exactly what I had planned. My missing piece was how to
iterate through the buffers. I thought speculations must be opaque
pieces of data, so I needed a way to get the keys of an associative
array, but knowing the range of speculation buffers works quite as
well. But just for curiosity's sake, is there a way to get a list
of the keys of an associative array?

> 
> 	tick-1234hz
> 	{
> 		this->spec = (ndx++ % nspeculations) + 1;
> 		this->speculated = myspec[this->spec];
> 	}
> 
> 	tick-1234hz
> 	/this->speculated && timestamp - this->speculated > threshold/
> 	{
> 		commit(this->spec);
> 	}

-- 
blu


Bryan Cantrill

2007-Sep-20 16:24 UTC


Hey Brian,
> Bryan Cantrill wrote:
> >keyed by the speculation, for which the value is timestamp.  Then,
> >use a high frequency tick probe to sweep through the array (speculation
> >identifiers start at 1 and proceed monotonically to the number of
> >speculative buffers), committing speculations that are older than some
> >threshold.  e.g.:
> 
> Ohhh. Thank you. That is exactly what I needed to know. Everything
> else is exactly what I had planned. My missing piece was how to
> iterate through the buffers. I thought speculations must be opaque
> pieces of data, so I needed a way to get the keys of an associative
> array, but knowing the range of speculation buffers works quite as
> well. But just for curiosity's sake, is there a way to get a list
> of the keys of an associative array?
No.  For a bunch of reasons, I'm afraid -- not least being safety.
(If one could iterate over the keys of an associative array, would
you allow an iteration routine to add a key to that array?  If so,
that's an infinite loop, and if not it's yet-more odd semantics.)  But
there are also issues of scalability and coherence so it's not that
there's but a single obstacle.  In general, when one is feeling like
iterating over an associative array, aggregations should be used.  Of
course, there are situations where aggregations don't suffice (like
yours), but in general the lack of associative array iteration is
an unfortunate constraint but not a debilitating one...
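
As an illustration of the aggregation approach mentioned above, here is a small sketch (not from the thread). The choice of probe and keys is arbitrary. The point is that the walk over the keys happens at user level when the aggregation is printed, not in probe context.

```d
#!/usr/sbin/dtrace -s

#pragma D option quiet

/* Count system calls per (process name, system call) pair. */
syscall:::entry
{
	@calls[execname, probefunc] = count();
}

/* On exit, keep the ten largest entries and print every remaining key. */
END
{
	trunc(@calls, 10);
	printa("%-16s %-24s %@d\n", @calls);
}
```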

	- Bryan


Brian Utterback

2007-Sep-20 16:39 UTC


Bryan Cantrill wrote:
> No.  For a bunch of reasons, I'm afraid -- not least being safety.
> (If one could iterate over the keys of an associative array, would
> you allow an iteration routine to add a key to that array?  If so,
> that's an infinite loop, and if not it's yet-more odd semantics.)  But
> there are also issues of scalability and coherence so it's not that
> there's but a single obstacle.  In general, when one is feeling like
> iterating over an associative array, aggregations should be used.  Of
> course, there are situations where aggregations don't suffice (like
> yours), but in general the lack of associative array iteration is
> an unfortunate constraint but not a debilitating one...

Rather than having an iterating routine, couldn't you have a function
that would return an array with the keys as the values? Or have a
way to index an array with the n'th key?

-- 
blu


Bryan Cantrill

2007-Sep-20 16:44 UTC


> >No.  For a bunch of reasons, I'm afraid -- not least being safety.
> >(If one could iterate over the keys of an associative array, would
> >you allow an iteration routine to add a key to that array?  If so,
> >that's an infinite loop, and if not it's yet-more odd semantics.)  But
> >there are also issues of scalability and coherence so it's not that
> >there's but a single obstacle.  In general, when one is feeling like
> >iterating over an associative array, aggregations should be used.  Of
> >course, there are situations where aggregations don't suffice (like
> >yours), but in general the lack of associative array iteration is
> >an unfortunate constraint but not a debilitating one...
> 
> 
> Rather than having an iterating routine, couldn't you have a function
> that would return an array with the keys as the values? 
And what would you do with that array?  If you want to print it out or
otherwise transfer it to user-level, you should be using an aggregation...
> Or have a way to index an array with the n'th key?
I think in most situations one can use a set of known (e.g., incrementing)
keys -- as in your case -- so this isn't required.  But yes, we could add
a way (albeit an incoherent one) to get the nth key -- but it would 
have odd (that is, incoherent) semantics that I would rather avoid, and
it seems to be prone to misuse.  (That is, new DTrace users looking to
an associative array when they should be using an aggregation.)  So
I'm open to it, just not very open to it. ;)

	- Bryan


Kris Raney

2007-Oct-08 15:20 UTC


> 	tick-1234hz
> 	{
> 		this->spec = (ndx++ % nspeculations) + 1;
> 		this->speculated = myspec[this->spec];
> 	}
> 
> 	tick-1234hz
> 	/this->speculated && timestamp - this->speculated > threshold/
> 	{
> 		commit(this->spec);
> 	}
Curious: given that this is a backhanded implementation of a loop, why
couldn't loop semantics be supported, and be implemented in exactly
this way underneath?

Or alternately, why couldn't there be a foreach() operator to iterate
over an associative array, treated as a destructive action? Could not the
foreach() operation snapshot the list of keys before beginning, and iterate
through that exact list even if the contents of the array are changed during
iteration?


--
This message posted from opensolaris.org

Alexander Kolbasov

2007-Oct-08 22:03 UTC


> > 	tick-1234hz
> > 	{
> > 		this->spec = (ndx++ % nspeculations) + 1;
> > 		this->speculated = myspec[this->spec];
> > 	}
> > 
> > 	tick-1234hz
> > 	/this->speculated && timestamp - this->speculated > threshold/
> > 	{
> > 		commit(this->spec);
> > 	}
> 
> Curious: given that this is a backhanded implementation of a loop, why
> couldn't loop semantics be supported, and be implemented in exactly
> this way underneath?
> 
> Or alternately, why couldn't there be a foreach() operator to iterate
> over an associative array, treated as a destructive action? Could not the
> foreach() operation snapshot the list of keys before beginning, and iterate
> through that exact list even if the contents of the array are changed during
> iteration?
There are at least two issues here:

1) You need to allocate a potentially big chunk of memory to hold the
snapshot, and the allocation needs to happen in the almost arbitrary
context in which the probe can fire.

2) The number of keys can be huge, and iterating over them will take a
significant amount of time during which no other system activity can
happen. This can be disastrous for the system health.


The time-based loops ensure that only a fixed amount of work is done in the 
probe context and the system can do other stuff between probes.

- akolb

Kris Raney

2007-Oct-09 13:17 UTC


> 1) You need to allocate a potentially big chunk of memory to hold the
> snapshot, and the allocation needs to happen in the almost arbitrary
> context in which the probe can fire.
It need not be implemented as a snapshot. You could timestamp keys, so that keys
newer than the beginning of the loop can be ignored. You could queue new array
items for insertion after the loop exits. You could use a data structure that
helps solve this problem. Or you could make it illegal to add to the associative
array you are iterating over during the iteration.

> 2) The number of keys can be huge, and iterating over them will take a
> significant amount of time during which no other system activity can
> happen. This can be disastrous for the system health.
> 
> The time-based loops ensure that only a fixed amount of work is done in
> the probe context and the system can do other stuff between probes.
It seems I haven't made my proposal clear. The example given solves the
iteration in a dtrace way, using timers. Why not provide syntactic sugar
allowing the user to do exactly that, and have it look like a loop? The
disastrous delay wouldn't happen, because each iteration of the loop
would be in a separate timer. The underlying implementation would be exactly
identical to the example.

By providing the syntactic sugar way, you could also help the user with getting
the loop started at the right time, and with stopping its execution when
it's completed a full iteration.

Even if I grant that an iterator could cause a disastrous delay, my last
question remains, which is why couldn't it be allowed, but be
considered a destructive action?

I have a similar situation to the original poster - I have a third party library
whose memory use grows unbounded. It's not strictly a leak, because it
keeps references to the memory and cleans up at exit time. So it
doesn't show up in memory tools. I would like to 'punch in' with dtrace
when the program is at steady state, trigger the growth and return to
steady state, then 'punch out' and commit() the frees that *didn't*
happen, so I can go fix that library.

I was able to make this work somewhat using the solution above. But assuming the
keys of the speculations, and iterating over the entire range on the assumption
that committing an unwanted one is 'safe' feels pretty kludgey. Also,
it's a nuisance getting the 'loop' to run at the right time, since I
can't associate it with the END probe. The fact that it repeatedly
iterates over the array even when I know logically there's no work to do
yet adds an unwanted system load.
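
One way the idle sweeping might be reduced with the existing language is to gate the tick clauses on a flag, so that a sweep visits each speculation ID once and then stops. This is only a sketch, not something proposed in the thread and untested here. The tick-10sec trigger, the NSPEC and THRESHOLD values, and the "started" array (assumed to be filled in by other clauses that create the speculations) are all stand-ins. Global variables in D are not updated atomically, so the flag is best treated as a hint.

```d
#!/usr/sbin/dtrace -s

inline int NSPEC = 8;			/* assumed: matches the nspec tunable */
inline uint64_t THRESHOLD = 5000000000;	/* assumed: 5 seconds, in ns */

uint64_t started[int];	/* speculation ID -> start time (set elsewhere) */
int sweeping;		/* nonzero while a sweep is in progress */
int ndx;		/* last speculation ID visited in this sweep */

/* Stand-in trigger: begin one sweep every 10 seconds. */
tick-10sec
{
	sweeping = 1;
	ndx = 0;
}

/* While sweeping, look at one speculation ID per firing. */
tick-1234hz
/sweeping/
{
	this->spec = ++ndx;
	this->speculated = started[this->spec];
}

tick-1234hz
/sweeping && this->speculated && timestamp - this->speculated > THRESHOLD/
{
	commit(this->spec);
	started[this->spec] = 0;
}

/* All IDs from 1 to NSPEC have been visited: go idle until re-triggered. */
tick-1234hz
/sweeping && ndx >= NSPEC/
{
	sweeping = 0;
}
```

Between sweeps the tick probe still fires, but only the predicates are evaluated, so the remaining cost is small.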

It seems to me like something that could use improvement in the language. What I
wonder is, how many people have to complain about it before you're
willing to reprioritize your requirements?


