Paul van den Bogaard
2008-Feb-28 13:53 UTC
[dtrace-discuss] Struggling with dynvar space: how big should it become?
Trying to get a script to run for an hour. Up till now every attempt has failed before the hour was over. The latest command was:

dtrace -q -x dynvarsize=16g -x cleanrate=400 -s /DTrace/newstr2.d

growing that dynvarsize from 128m through 256m and 512m, doubling onwards. My system "only" has 64GB of RAM and still dtrace produces many of the following:

dtrace: 1255 dynamic variable drops with non-empty rinsing list
dtrace: 37907 dynamic variable drops with non-empty dirty list
dtrace: 701859 dynamic variable drops
dtrace: 1279 dynamic variable drops with non-empty rinsing list
dtrace: 43404 dynamic variable drops with non-empty dirty list
dtrace: 711953 dynamic variable drops
dtrace: 3990 dynamic variable drops with non-empty rinsing list
dtrace: 42116 dynamic variable drops with non-empty dirty list
dtrace: 170163 dynamic variable drops
dtrace: 9 dynamic variable drops with non-empty rinsing list
dtrace: 8591 dynamic variable drops with non-empty dirty list

With the 16g setting it takes about 20 minutes before the first of these messages is produced. After that it produces them continuously. I am "only" running a 384-process setup with PostgreSQL and hope to be able to do one with 1280 processes.

The DTrace script in use references a couple of "home grown" USDT probes. The provider definition is as follows:

provider postgresql {
    probe transaction__start(int);
    probe transaction__commit(int);
    probe transaction__abort(int);
    probe lock__startwait(int, int);
    probe lock__endwait(int, int);
    probe lwlock__acquire(int, int);
    probe lwlock__acquire__enter(int, int);
    probe lwlock__acquire__return(int, int);
    probe lwlock__release(int);
    probe lwlock__release__enter(int);
    probe lwlock__release__return(int, int);
    probe lwlock__startwait(int, int, int, int);
    probe lwlock__endwait(int, int, int);
    probe lwlock__condacquire(int, int);
    probe lwlock__condacquire__fail(int, int);
    probe getsnapshotdata(int, int, int, int, int);
    probe ready_for_query_start();
    probe ready_for_query_end();
    probe ignore_till_sync();
    probe handle_command_start(int);
    probe handle_command_end(int);
};

The failing script:

self int handleCommand;
self int readyForQuery;
self char commandType;
self int commandTypeInitialized;

BEGIN
{
    started = timestamp;
}

postgresql*:postgres::ready_for_query_start
{
    self->readyForQuery = timestamp;
}

postgresql*:postgres::ready_for_query_end
/self->readyForQuery/
{
    @count["readyForQuery", '-'] = count();
    @elapsed["readyForQuery", '-'] = sum(timestamp - self->readyForQuery);
    self->readyForQuery = 0;
}

postgresql*:postgres::ignore_till_sync
{
    @count["ignore_till_sync", '-'] = count();
}

postgresql*:postgres::handle_command_start
{
    self->commandTypeInitialized = 1;
}

postgresql*:postgres::handle_command_start
/arg0 == 0/
{
    self->handleCommand = 0;
    @count["zero", '-'] = count();
}

postgresql*:postgres::handle_command_start
/arg0 != 0/
{
    self->handleCommand = timestamp;
    self->commandType = (char)arg0;
}

postgresql*:postgres::handle_command_end
/self->handleCommand/
{
    @count["handleCommand", arg0] = count();
    @elapsed["handleCommand", arg0] = sum(timestamp - self->handleCommand);
    self->handleCommand = 0;
    self->commandType = '-';
}

postgresql*:postgres:LWLockAcquire:lwlock-acquire
{
    self->ts[arg0, arg1] = timestamp;
    self->mode[arg0] = arg1;
}

postgresql*:postgres:LWLockRelease:lwlock-release
/self->commandTypeInitialized && arg0 < 25 && self->ts[arg0, self->mode[arg0]]/
{
    this->mode = self->mode[arg0];
    @lockheldCount[arg0, this->mode, self->commandType] = count();
    @lockheldTime[arg0, this->mode, self->commandType] =
        sum(timestamp - self->ts[arg0, this->mode]);
    self->ts[arg0, this->mode] = 0;
}

postgresql*:postgres:LWLockRelease:lwlock-release
/self->commandTypeInitialized && arg0 >= 25 && self->ts[arg0, self->mode[arg0]]/
{
    this->mode = self->mode[arg0];
    @lockheldCount[25, this->mode, self->commandType] = count();
    @lockheldTime[25, this->mode, self->commandType] =
        sum(timestamp - self->ts[arg0, this->mode]);
    self->ts[arg0, this->mode] = 0;
}

postgresql*:postgres:LWLockAcquire:lwlock-startwait
/arg0 < 25/
{
    self->tsw[arg0, arg1] = timestamp;
    @queueLengthStartwaitMax[arg0, arg1] = max(arg2);
    @queueLengthStartwaitAvg[arg0, arg1] = avg(arg2);
    @queueLengt2StartwaitMax[arg0, arg1] = max(arg3);
    @queueLengt2StartwaitAvg[arg0, arg1] = avg(arg3);
}

postgresql*:postgres:LWLockAcquire:lwlock-startwait
/arg0 >= 25/
{
    self->tsw[25, arg1] = timestamp;
    @queueLengthStartwaitMax[25, arg1] = max(arg2);
    @queueLengthStartwaitAvg[25, arg1] = avg(arg2);
    @queueLengt2StartwaitMax[25, arg1] = max(arg3);
    @queueLengt2StartwaitAvg[25, arg1] = avg(arg3);
}

postgresql*:postgres:LWLockAcquire:lwlock-endwait
/self->commandTypeInitialized && self->tsw[arg0, arg1] && (arg0 < 25)/
{
    @lockwaitCount[arg0, arg1, self->commandType] = count();
    @lockwaitTime[arg0, arg1, self->commandType] =
        sum(timestamp - self->tsw[arg0, arg1]);
    @cycleCountMax[arg0, arg1] = max(arg2);
    @cycleCountAvg[arg0, arg1] = avg(arg2);
    self->tsw[arg0, arg1] = 0;
}

postgresql*:postgres:LWLockAcquire:lwlock-endwait
/self->commandTypeInitialized && self->tsw[arg0, arg1] && (arg0 >= 25)/
{
    @lockwaitCount[25, arg1, self->commandType] = count();
    @lockwaitTime[25, arg1, self->commandType] =
        sum(timestamp - self->tsw[arg0, arg1]);
    self->tsw[arg0, arg1] = 0;
    @cycleCountMax[25, arg1] = max(arg2);
    @cycleCountAvg[25, arg1] = avg(arg2);
}

postgresql*:postgres:LWLockAcquire:lwlock-acquire-enter
{
    self->acqtime[arg0, arg1] = timestamp;
}

postgresql*:postgres:LWLockAcquire:lwlock-acquire-return
/self->commandTypeInitialized && arg0 < 25 && self->acqtime[arg0, arg1]/
{
    @lockAcquireCount[arg0, arg1, self->commandType] = count();
    @lockAcquireTime[arg0, arg1, self->commandType] =
        sum(timestamp - self->acqtime[arg0, arg1]);
    self->acqtime[arg0, arg1] = 0;
}

postgresql*:postgres:LWLockAcquire:lwlock-acquire-return
/self->commandTypeInitialized && arg0 >= 25 && self->acqtime[arg0, arg1]/
{
    @lockAcquireCount[25, arg1, self->commandType] = count();
    @lockAcquireTime[25, arg1, self->commandType] =
        sum(timestamp - self->acqtime[arg0, arg1]);
    self->acqtime[arg0, arg1] = 0;
}

postgresql*:postgres:LWLockRelease:lwlock-release-enter
/arg0 < 25/
{
    self->acqtime[arg0, self->mode[arg0]] = timestamp;
}

postgresql*:postgres:LWLockRelease:lwlock-release-enter
/arg0 >= 25/
{
    self->acqtime[arg0, self->mode[25]] = timestamp;
}

postgresql*:postgres:LWLockRelease:lwlock-release-return
/self->commandTypeInitialized && arg0 < 25 && self->acqtime[arg0, self->mode[arg0]]/
{
    this->mode = self->mode[arg0];
    @lockReleaseCount[arg0, this->mode, self->commandType] = count();
    @lockReleaseTime[arg0, this->mode, self->commandType] =
        sum(timestamp - self->acqtime[arg0, this->mode]);
    self->acqtime[arg0, this->mode] = 0;
    self->mode[arg0] = 0;
}

postgresql*:postgres:LWLockRelease:lwlock-release-return
/self->commandTypeInitialized && arg0 >= 25 && self->acqtime[25, self->mode[25]]/
{
    this->mode = self->mode[25];
    @lockReleaseCount[25, this->mode, self->commandType] = count();
    @lockReleaseTime[25, this->mode, self->commandType] =
        sum(timestamp - self->acqtime[25, this->mode]);
    self->acqtime[25, this->mode] = 0;
    self->mode[25] = 0;
}

:::tick-3600s
{
    exit(0);
}

:::END
{
    printf("WALL: %d millisec\n", ((timestamp - started) / 1000000));
    printa("CH %2d %1d %2c %@15d\n", @lockheldCount);
    normalize(@lockheldTime, 1000000);
    printa("TH %2d %1d %2c %@15d\n", @lockheldTime);
    printa("CA %2d %1d %2c %@15d\n", @lockAcquireCount);
    normalize(@lockAcquireTime, 1000000);
    printa("TA %2d %1d %2c %@15d\n", @lockAcquireTime);
    printa("CR %2d %1d %2c %@15d\n", @lockReleaseCount);
    normalize(@lockReleaseTime, 1000000);
    printa("TR %2d %1d %2c %@15d\n", @lockReleaseTime);
    printa("CW %2d %1d %2c %@15d\n", @lockwaitCount);
    normalize(@lockwaitTime, 1000000);
    printa("TW %2d %1d %2c %@15d\n", @lockwaitTime);
    printa("QM %2d %1d %@10d\n", @queueLengthStartwaitMax);
    printa("QA %2d %1d %@10d\n", @queueLengthStartwaitAvg);
    printa("QM2 %2d %1d %@10d\n", @queueLengt2StartwaitMax);
    printa("QA2 %2d %1d %@10d\n", @queueLengt2StartwaitAvg);
    printa("CM %2d %1d %@10d\n", @cycleCountMax);
    printa("CV %2d %1d %@10d\n", @cycleCountAvg);
    normalize(@elapsed, 1000000);
    printa("C %10s %2c %@10d\n", @count);
    printa("T %10s %2c %@10d\n", @elapsed);
}

This might seem a little overwhelming, but once a few things are known it should not be too bad :-)

The following lwlock probes

probe lwlock__acquire(int, int);
probe lwlock__acquire__enter(int, int);
probe lwlock__acquire__return(int, int);
probe lwlock__startwait(int, int, int, int);
probe lwlock__endwait(int, int, int);

all have at least two parameters: in all these cases arg0 is an id for the lock being used and arg1 is a mode, a value of either 0 or 1 (two different values). Lock ids < 25 are special. All others are only of interest when it comes to their combined "strength".

The following probes fire during a release function call, where the lock with the given id (arg0) is freed:

probe lwlock__release(int);
probe lwlock__release__enter(int);
probe lwlock__release__return(int, int);

Since the mode is not passed here, we need a self->mode in the script to recall what it was during the lifetime of request/wait-for-it/use/release. BTW the "lw" part stands for LightWeight, and indeed that is what these locks are supposed to be: very comparable to what Oracle calls latches.

The other probes used are there to set context. Each database session waits on a socket for a new request to arrive. This is handled in a state machine, so the normal parts of SQL processing can be seen. In handle_command_start a character representing the next command/state is passed: 'E' (execute), 'P' (parse) and 'B' (bind) are the ones that I see 99.9999999% of the time. The others are (to me) of no importance at this moment.

I would like to see how many lwlock waits are out there (count and time) for each state/command, and of course also know how long a parse/bind/execute takes.

I went over the script over and over and do not see any vars that should have been null-ed. commandType is always set: it defines the context in which a measurement should be placed.

Any clues on how to make this work are very welcome.

Thanks
Paul

BTW I would also appreciate some feedback on how to format things so they would be a little more readable. The indentation is in the source, but does not show up in the preview or the final post.
--
This message posted from opensolaris.org
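One way to double-check the "nothing left un-nulled" claim is to count sets against clears for each thread-local variable. A minimal sketch for handleCommand, reusing the probes from the script above (the aggregation names and the 60-second interval are made up for illustration):

:::BEGIN
{
    /* Nothing to initialize; aggregations start empty. */
}

postgresql*:postgres::handle_command_start
/arg0 != 0/
{
    /* Fires wherever the script assigns self->handleCommand. */
    @sets["handleCommand"] = count();
}

postgresql*:postgres::handle_command_end
/self->handleCommand/
{
    /* Fires wherever the script zeroes self->handleCommand. */
    @clears["handleCommand"] = count();
}

:::tick-60s
{
    /* A steadily growing gap between the two columns points at a leak. */
    printa("%15s sets %@10d clears %@10d\n", @sets, @clears);
}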
Jon Haslam
2008-Feb-28 14:30 UTC
[dtrace-discuss] Struggling with dynvar space: how big should it become?
Hi Paul,

> Trying to get a script to run for an hour. Up till now every attempt
> failed before the hour was over. Latest command was:
>
> dtrace -q -x dynvarsize=16g -x cleanrate=400 -s /DTrace/newstr2.d

Try specifying a unit for your cleanrate frequency, i.e. 400hz. Without
a unit I'd have expected you to see a warning about 'cleanrate' being
reduced to 5000hz when the consumer starts.

Jon.
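That is, presumably:

    dtrace -q -x dynvarsize=16g -x cleanrate=400hz -s /DTrace/newstr2.d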
Jon Haslam
2008-Feb-28 14:37 UTC
[dtrace-discuss] Struggling with dynvar space: how big should it become?
Sorry for the bad form of replying to myself, but I didn't finish what I
had to say before I sent it. Sigh.

Without the unit specifier we should be cleaning at the highest rate
possible - 5000hz. I just wanted to make sure you were cleaning at this
rate and that you really were getting the rate you think you should be at.

Jon.

> Hi Paul,
>
>> Trying to get a script to run for an hour. Up till now every attempt
>> failed before the hour was over. Latest command was:
>>
>> dtrace -q -x dynvarsize=16g -x cleanrate=400 -s /DTrace/newstr2.d
>
> Try specifying a unit for your cleanrate frequency, i.e. 400hz. Without
> a unit I'd have expected you to see a warning about 'cleanrate' being
> reduced to 5000hz when the consumer starts.
>
> Jon.
Bryan Cantrill
2008-Feb-28 14:38 UTC
[dtrace-discuss] Struggling with dynvar space: how big should it become?
On Thu, Feb 28, 2008 at 05:53:37AM -0800, Paul van den Bogaard wrote:

> Trying to get a script to run for an hour. Up till now every attempt
> failed before the hour was over. Latest command was:
>
> dtrace -q -x dynvarsize=16g -x cleanrate=400 -s /DTrace/newstr2.d
>
> growing that dynvarsize from 128m through 256m and 512m, doubling
> onwards. My system "only" has 64GB of RAM and still dtrace produces
> many of the following:
>
> dtrace: 1255 dynamic variable drops with non-empty rinsing list
> dtrace: 37907 dynamic variable drops with non-empty dirty list
> dtrace: 701859 dynamic variable drops
> dtrace: 1279 dynamic variable drops with non-empty rinsing list
> dtrace: 43404 dynamic variable drops with non-empty dirty list
> dtrace: 711953 dynamic variable drops
> dtrace: 3990 dynamic variable drops with non-empty rinsing list
> dtrace: 42116 dynamic variable drops with non-empty dirty list
> dtrace: 170163 dynamic variable drops
> dtrace: 9 dynamic variable drops with non-empty rinsing list
> dtrace: 8591 dynamic variable drops with non-empty dirty list
>
> With the 16g setting it takes about 20 minutes before the first of
> these messages is produced. After that it is producing them
> continuously.

I suspect that you might be seeing a bug that I ran into recently (and
for which I have a fix that I will be integrating soon). Are your
dynamic variables being allocated on one CPU and deallocated on another
by any chance? (You can determine this by modifying your script to
aggregate on CPU where you would allocate and deallocate a dynamic
variable.) If this is the case, you are likely running into the same
issue that I ran into -- the symptom of which is dynamic variable
exhaustion regardless of dynamic variable size and cleanrate...

        - Bryan

--------------------------------------------------------------------------
Bryan Cantrill, Sun Microsystems Fishworks.       http://blogs.sun.com/bmc
Paul van den Bogaard
2008-Mar-01 14:00 UTC
[dtrace-discuss] Struggling with dynvar space: how big should it become?
I think it is the same bug you mentioned. Not sure if I completely
understood your guidance, though. I changed the original DTrace script
so it contains the following pieces of extension:

postgresql*:postgres::ready_for_query_start
{
    alloc["readyForQuery"] = cpu;
    self->readyForQuery = timestamp;
}

postgresql*:postgres::ready_for_query_end
/self->readyForQuery && alloc["readyForQuery"] != cpu/
{
    @migrations["readyForQuery"] = count();
}

postgresql*:postgres::ready_for_query_end
/self->readyForQuery && alloc["readyForQuery"] == cpu/
{
    @nonmigrations["readyForQuery"] = count();
}

postgresql*:postgres::ready_for_query_end
/self->readyForQuery/
{
    @count["readyForQuery", '-'] = count();
    @elapsed["readyForQuery", '-'] = sum(timestamp - self->readyForQuery);
    self->readyForQuery = 0;
}

Here I add the "alloc" array to recall the CPU used when assigning to a
self variable. If this same self variable is set to zero later on, I
increment either the @migrations or the @nonmigrations aggregation for
the key equal to that self variable's name.

The results of this are (MIG: migrations; NMG: non-migrations):

MIG  readyForQuery    294974
MIG  handleCommand   4288534
MIG  ts[arg0,arg1]   8474999
MIG  mode[arg0]      8512348
NMG  mode[arg0]         5217
NMG  ts[arg0,arg1]     75472
NMG  readyForQuery    948196
NMG  handleCommand   1383594

This shows there is an overwhelming abundance of migrations.

--Paul
--
This message posted from opensolaris.org
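The clauses for the other variables in the table follow the same pattern; for handleCommand they would presumably look like this (a sketch, not shown in full in the post above):

postgresql*:postgres::handle_command_start
/arg0 != 0/
{
    alloc["handleCommand"] = cpu;     /* remember the allocating CPU */
    self->handleCommand = timestamp;
}

postgresql*:postgres::handle_command_end
/self->handleCommand && alloc["handleCommand"] != cpu/
{
    @migrations["handleCommand"] = count();
}

postgresql*:postgres::handle_command_end
/self->handleCommand && alloc["handleCommand"] == cpu/
{
    @nonmigrations["handleCommand"] = count();
}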
Bryan Cantrill
2008-Mar-01 14:30 UTC
[dtrace-discuss] Struggling with dynvar space: how big should it become?
On Sat, Mar 01, 2008 at 06:00:40AM -0800, Paul van den Bogaard wrote:

> I think it is the same bug you mentioned. Not sure if I completely
> understood your guidance, though. I changed the original DTrace script
> so it contains the following pieces of extension:
>
> postgresql*:postgres::ready_for_query_start
> {
>     alloc["readyForQuery"] = cpu;
>     self->readyForQuery = timestamp;
> }
>
> postgresql*:postgres::ready_for_query_end
> /self->readyForQuery && alloc["readyForQuery"] != cpu/
> {
>     @migrations["readyForQuery"] = count();
> }
>
> postgresql*:postgres::ready_for_query_end
> /self->readyForQuery && alloc["readyForQuery"] == cpu/
> {
>     @nonmigrations["readyForQuery"] = count();
> }
>
> postgresql*:postgres::ready_for_query_end
> /self->readyForQuery/
> {
>     @count["readyForQuery", '-'] = count();
>     @elapsed["readyForQuery", '-'] = sum(timestamp - self->readyForQuery);
>     self->readyForQuery = 0;
> }
>
> Here I add the "alloc" array to recall the CPU used when assigning to a
> self variable. If this same self variable is set to zero later on, I
> increment either the @migrations or the @nonmigrations aggregation for
> the key equal to that self variable's name.
>
> The results of this are (MIG: migrations; NMG: non-migrations):
>
> MIG  readyForQuery    294974
> MIG  handleCommand   4288534
> MIG  ts[arg0,arg1]   8474999
> MIG  mode[arg0]      8512348
> NMG  mode[arg0]         5217
> NMG  ts[arg0,arg1]     75472
> NMG  readyForQuery    948196
> NMG  handleCommand   1383594
>
> This shows there is an overwhelming abundance of migrations.

For you to be running into the same problem, you would have to have some
CPU that is getting _nothing_ but deallocations. It would be interesting
to just aggregate on CPU; where you would alloc, do this:

    @alloc[cpu] = count();

And where you would dealloc, do this:

    @dealloc[cpu] = count();

And then in an END or tick probe, do this:

    printf("%8s %8s %8s\n", "CPU", "ALLOC", "DEALLOC");
    printa("%8d %@8d %@8d\n", @alloc, @dealloc);

If you're running into the bug that I recently ran into, you should have
one row that -- for some elongated period -- has zero allocations and
many deallocations.

And in terms of a workaround, try adding this clause to your D script:

    profile-1hz
    {
        /* Allocate and immediately free a dynamic variable on every
           CPU, once a second, so that no CPU ends up doing nothing
           but deallocations. */
        bogus[cpu, timestamp] = timestamp;
        bogus[cpu, timestamp] = 0;
    }

If you're running into this problem, the above clause should eliminate
the drops...

        - Bryan

--------------------------------------------------------------------------
Bryan Cantrill, Sun Microsystems Fishworks.       http://blogs.sun.com/bmc
Paul van den Bogaard
2008-Mar-02 14:30 UTC
[dtrace-discuss] Struggling with dynvar space: how big should it become?
It seems that all CPUs are used for allocating data in quite a balanced
way.

Adding the magic "hz" to the cleanrate flag made it possible to do a run
with 384 concurrent users. Trying 1280 and the old stuff reappears :-(

When increasing the cleanrate while specifying 16g for the dynvar space
I get the message:

dtrace: dynamic variable size lowered to 8g

Increasing the cleanrate even more and this becomes:

dtrace: dynamic variable size lowered to 4g

Is this not counterproductive? It looks to me like I need both. Or is a
single CPU not capable of cleaning that much dynamic space at such a
high rate?

Any other "gotchas" that could be tried to get rid of those dynamic
variable drops?
--
This message posted from opensolaris.org
Paul van den Bogaard
2008-Mar-02 18:21 UTC
[dtrace-discuss] Struggling with dynvar space: how big should it become?
Life is full of surprises. Did a dynvarsize=16g *without* specifying a
cleanrate: the 1280-user run was successful. Having tried so many
cleanrates, I did not expect this. Also no reduction of the dynvarsize
-- well, at least no messages.
--
This message posted from opensolaris.org
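In other words, the run that finally held up was presumably driven by nothing more than the original command line with the cleanrate override dropped:

    dtrace -q -x dynvarsize=16g -s /DTrace/newstr2.d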