Paul van den Bogaard
2008-Feb-28 13:53 UTC
[dtrace-discuss] Struggling with dynvar space: how big should it become?
Trying to get a script to run for an hour. Up till now every attempt has failed before the hour was over. The latest command was:

dtrace -q -x dynvarsize=16g -x cleanrate=400 -s /DTrace/newstr2.d

growing that dynvarsize from 128m through 256m and 512m, doubling onwards. My system "only" has 64GB of RAM and still dtrace produces many of the following:

dtrace: 1255 dynamic variable drops with non-empty rinsing list
dtrace: 37907 dynamic variable drops with non-empty dirty list
dtrace: 701859 dynamic variable drops
dtrace: 1279 dynamic variable drops with non-empty rinsing list
dtrace: 43404 dynamic variable drops with non-empty dirty list
dtrace: 711953 dynamic variable drops
dtrace: 3990 dynamic variable drops with non-empty rinsing list
dtrace: 42116 dynamic variable drops with non-empty dirty list
dtrace: 170163 dynamic variable drops
dtrace: 9 dynamic variable drops with non-empty rinsing list
dtrace: 8591 dynamic variable drops with non-empty dirty list

With the 16g setting it takes about 20 minutes before the first of these messages is produced. After that it produces them continuously. I am "only" running a 384-process setup with PostgreSQL and hope to be able to do one with 1280 processes.

The DTrace script in use references a couple of "home grown" USDT probes. The provider definition is as follows:

provider postgresql {
    probe transaction__start(int);
    probe transaction__commit(int);
    probe transaction__abort(int);
    probe lock__startwait(int, int);
    probe lock__endwait(int, int);
    probe lwlock__acquire(int, int);
    probe lwlock__acquire__enter(int, int);
    probe lwlock__acquire__return(int, int);
    probe lwlock__release(int);
    probe lwlock__release__enter(int);
    probe lwlock__release__return(int, int);
    probe lwlock__startwait(int, int, int, int);
    probe lwlock__endwait(int, int, int);
    probe lwlock__condacquire(int, int);
    probe lwlock__condacquire__fail(int, int);
    probe getsnapshotdata(int, int, int, int, int);
    probe ready_for_query_start();
    probe ready_for_query_end();
    probe ignore_till_sync();
    probe handle_command_start(int);
    probe handle_command_end(int);
};

The failing script:

self int handleCommand;
self int readyForQuery;
self char commandType;
self int commandTypeInitialized;

BEGIN
{
    started = timestamp;
}

postgresql*:postgres::ready_for_query_start
{
    self->readyForQuery = timestamp;
}

postgresql*:postgres::ready_for_query_end
/self->readyForQuery/
{
    @count["readyForQuery", '-'] = count();
    @elapsed["readyForQuery", '-'] = sum(timestamp - self->readyForQuery);
    self->readyForQuery = 0;
}

postgresql*:postgres::ignore_till_sync
{
    @count["ignore_till_sync", '-'] = count();
}

postgresql*:postgres::handle_command_start
{
    self->commandTypeInitialized = 1;
}

postgresql*:postgres::handle_command_start
/arg0 == 0/
{
    self->handleCommand = 0;
    @count["zero", '-'] = count();
}

postgresql*:postgres::handle_command_start
/arg0 != 0/
{
    self->handleCommand = timestamp;
    self->commandType = (char)arg0;
}

postgresql*:postgres::handle_command_end
/self->handleCommand/
{
    @count["handleCommand", arg0] = count();
    @elapsed["handleCommand", arg0] = sum(timestamp - self->handleCommand);
    self->handleCommand = 0;
    self->commandType = '-';
}

postgresql*:postgres:LWLockAcquire:lwlock-acquire
{
    self->ts[arg0, arg1] = timestamp;
    self->mode[arg0] = arg1;
}

postgresql*:postgres:LWLockRelease:lwlock-release
/self->commandTypeInitialized && arg0 < 25 && self->ts[arg0, self->mode[arg0]]/
{
    this->mode = self->mode[arg0];
    @lockheldCount[arg0, this->mode, self->commandType] = count();
    @lockheldTime[arg0, this->mode, self->commandType] =
        sum(timestamp - self->ts[arg0, this->mode]);
    self->ts[arg0, this->mode] = 0;
}

postgresql*:postgres:LWLockRelease:lwlock-release
/self->commandTypeInitialized && arg0 >= 25 && self->ts[arg0, self->mode[arg0]]/
{
    this->mode = self->mode[arg0];
    @lockheldCount[25, this->mode, self->commandType] = count();
    @lockheldTime[25, this->mode, self->commandType] =
        sum(timestamp - self->ts[arg0, this->mode]);
    self->ts[arg0, this->mode] = 0;
}

postgresql*:postgres:LWLockAcquire:lwlock-startwait
/arg0 < 25/
{
    self->tsw[arg0, arg1] = timestamp;
    @queueLengthStartwaitMax[arg0, arg1] = max(arg2);
    @queueLengthStartwaitAvg[arg0, arg1] = avg(arg2);
    @queueLengt2StartwaitMax[arg0, arg1] = max(arg3);
    @queueLengt2StartwaitAvg[arg0, arg1] = avg(arg3);
}

postgresql*:postgres:LWLockAcquire:lwlock-startwait
/arg0 >= 25/
{
    self->tsw[25, arg1] = timestamp;
    @queueLengthStartwaitMax[25, arg1] = max(arg2);
    @queueLengthStartwaitAvg[25, arg1] = avg(arg2);
    @queueLengt2StartwaitMax[25, arg1] = max(arg3);
    @queueLengt2StartwaitAvg[25, arg1] = avg(arg3);
}

postgresql*:postgres:LWLockAcquire:lwlock-endwait
/self->commandTypeInitialized && self->tsw[arg0, arg1] && (arg0 < 25)/
{
    @lockwaitCount[arg0, arg1, self->commandType] = count();
    @lockwaitTime[arg0, arg1, self->commandType] =
        sum(timestamp - self->tsw[arg0, arg1]);
    @cycleCountMax[arg0, arg1] = max(arg2);
    @cycleCountAvg[arg0, arg1] = avg(arg2);
    self->tsw[arg0, arg1] = 0;
}

postgresql*:postgres:LWLockAcquire:lwlock-endwait
/self->commandTypeInitialized && self->tsw[arg0, arg1] && (arg0 >= 25)/
{
    @lockwaitCount[25, arg1, self->commandType] = count();
    @lockwaitTime[25, arg1, self->commandType] =
        sum(timestamp - self->tsw[arg0, arg1]);
    self->tsw[arg0, arg1] = 0;
    @cycleCountMax[25, arg1] = max(arg2);
    @cycleCountAvg[25, arg1] = avg(arg2);
}

postgresql*:postgres:LWLockAcquire:lwlock-acquire-enter
{
    self->acqtime[arg0, arg1] = timestamp;
}

postgresql*:postgres:LWLockAcquire:lwlock-acquire-return
/self->commandTypeInitialized && arg0 < 25 && self->acqtime[arg0, arg1]/
{
    @lockAcquireCount[arg0, arg1, self->commandType] = count();
    @lockAcquireTime[arg0, arg1, self->commandType] =
        sum(timestamp - self->acqtime[arg0, arg1]);
    self->acqtime[arg0, arg1] = 0;
}

postgresql*:postgres:LWLockAcquire:lwlock-acquire-return
/self->commandTypeInitialized && arg0 >= 25 && self->acqtime[arg0, arg1]/
{
    @lockAcquireCount[25, arg1, self->commandType] = count();
    @lockAcquireTime[25, arg1, self->commandType] =
        sum(timestamp - self->acqtime[arg0, arg1]);
    self->acqtime[arg0, arg1] = 0;
}

postgresql*:postgres:LWLockRelease:lwlock-release-enter
/arg0 < 25/
{
    self->acqtime[arg0, self->mode[arg0]] = timestamp;
}

postgresql*:postgres:LWLockRelease:lwlock-release-enter
/arg0 >= 25/
{
    self->acqtime[arg0, self->mode[25]] = timestamp;
}

postgresql*:postgres:LWLockRelease:lwlock-release-return
/self->commandTypeInitialized && arg0 < 25 && self->acqtime[arg0, self->mode[arg0]]/
{
    this->mode = self->mode[arg0];
    @lockReleaseCount[arg0, this->mode, self->commandType] = count();
    @lockReleaseTime[arg0, this->mode, self->commandType] =
        sum(timestamp - self->acqtime[arg0, this->mode]);
    self->acqtime[arg0, this->mode] = 0;
    self->mode[arg0] = 0;
}

postgresql*:postgres:LWLockRelease:lwlock-release-return
/self->commandTypeInitialized && arg0 >= 25 && self->acqtime[25, self->mode[25]]/
{
    this->mode = self->mode[25];
    @lockReleaseCount[25, this->mode, self->commandType] = count();
    @lockReleaseTime[25, this->mode, self->commandType] =
        sum(timestamp - self->acqtime[25, this->mode]);
    self->acqtime[25, this->mode] = 0;
    self->mode[25] = 0;
}

:::tick-3600s
{
    exit(0);
}

:::END
{
    printf("WALL: %d millisec\n", ((timestamp - started) / 1000000));
    printa("CH %2d %1d %2c %@15d\n", @lockheldCount);
    normalize(@lockheldTime, 1000000);
    printa("TH %2d %1d %2c %@15d\n", @lockheldTime);
    printa("CA %2d %1d %2c %@15d\n", @lockAcquireCount);
    normalize(@lockAcquireTime, 1000000);
    printa("TA %2d %1d %2c %@15d\n", @lockAcquireTime);
    printa("CR %2d %1d %2c %@15d\n", @lockReleaseCount);
    normalize(@lockReleaseTime, 1000000);
    printa("TR %2d %1d %2c %@15d\n", @lockReleaseTime);
    printa("CW %2d %1d %2c %@15d\n", @lockwaitCount);
    normalize(@lockwaitTime, 1000000);
    printa("TW %2d %1d %2c %@15d\n", @lockwaitTime);
    printa("QM %2d %1d %@10d\n", @queueLengthStartwaitMax);
    printa("QA %2d %1d %@10d\n", @queueLengthStartwaitAvg);
    printa("QM2 %2d %1d %@10d\n", @queueLengt2StartwaitMax);
    printa("QA2 %2d %1d %@10d\n", @queueLengt2StartwaitAvg);
    printa("CM %2d %1d %@10d\n", @cycleCountMax);
    printa("CV %2d %1d %@10d\n", @cycleCountAvg);
    normalize(@elapsed, 1000000);
    printa("C %10s %2c %@10d\n", @count);
    printa("T %10s %2c %@10d\n", @elapsed);
}

This might seem a little overwhelming, but once a few things are known it should not be too bad :-)

The following lwlock probes

probe lwlock__acquire(int, int);
probe lwlock__acquire__enter(int, int);
probe lwlock__acquire__return(int, int);
probe lwlock__startwait(int, int, int, int);
probe lwlock__endwait(int, int, int);

all have at least two parameters: in all these cases arg0 is an id for the lock being used and arg1 is a mode, a value of either 0 or 1 (two different values). Lock ids < 25 are special. All others are only of interest when it comes to their combined "strength".

The following probes fire during a release function call, where the lock with the given id (arg0) is freed:

probe lwlock__release(int);
probe lwlock__release__enter(int);
probe lwlock__release__return(int, int);

Since the mode is not passed here, we need a self->mode in the script to recall what it was during the lifetime of request/wait-for-it/use/release. BTW the "lw" part stands for LightWeight, and indeed that is what these locks are supposed to be: very comparable to what Oracle calls latches.

The other probes used are there to set context. Each database session waits on a socket for a new request to arrive. This is handled in a state machine, so the normal parts of SQL processing can be seen. In handle_command_start a character representing the next command/state is passed: 'E' (execute), 'P' (parse) and 'B' (bind) are the ones that I see 99.9999999% of the time. The others are (to me) of no importance at this moment.

I would like to see how many lwlock waits are out there (count and time) for each state/command, and of course also know how long a parse/bind/execute takes.

I went over the script over and over and do not see any vars that should have been null-ed. commandType is always set: it defines the context in which a measurement should be placed.

Any clues on how to make this work are very welcome.

Thanks
Paul

BTW I would also appreciate some feedback on how to format things so they would be a little more readable. The indentation is in the source, but does not show up in the preview or the final post.
--
This message posted from opensolaris.org
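One way to double-check the "nothing left un-nulled" claim is to count sets against clears for each thread-local variable. A minimal sketch for handleCommand, reusing the probes from the script above (the aggregation names and the 60-second interval are made up for illustration):

:::BEGIN
{
    /* Nothing to initialize; aggregations start empty. */
}

postgresql*:postgres::handle_command_start
/arg0 != 0/
{
    /* Fires wherever the script assigns self->handleCommand. */
    @sets["handleCommand"] = count();
}

postgresql*:postgres::handle_command_end
/self->handleCommand/
{
    /* Fires wherever the script zeroes self->handleCommand. */
    @clears["handleCommand"] = count();
}

:::tick-60s
{
    /* A steadily growing gap between the two columns points at a leak. */
    printa("%15s sets %@10d clears %@10d\n", @sets, @clears);
}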
Jon Haslam
2008-Feb-28 14:30 UTC
[dtrace-discuss] Struggling with dynvar space: how big should it become?
Hi Paul,

> Trying to get a script to run for an hour. Up till now every attempt
> failed before the hour was over. Latest command was:
>
> dtrace -q -x dynvarsize=16g -x cleanrate=400 -s /DTrace/newstr2.d

Try specifying a unit for your cleanrate frequency, i.e. 400hz. Without
a unit I'd have expected you to see a warning about 'cleanrate' being
reduced to 5000hz when the consumer starts.

Jon.
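That is, presumably:

    dtrace -q -x dynvarsize=16g -x cleanrate=400hz -s /DTrace/newstr2.d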
Jon Haslam
2008-Feb-28 14:37 UTC
[dtrace-discuss] Struggling with dynvar space: how big should it become?
Sorry for the bad form of replying to myself, but I didn't finish what I
had to say before I sent it. Sigh.

Without the unit specifier we should be cleaning at the highest rate
possible - 5000hz. I just wanted to make sure you were cleaning at this
rate and that you really were getting the rate you think you should be at.

Jon.

> Hi Paul,
>
>> Trying to get a script to run for an hour. Up till now every attempt
>> failed before the hour was over. Latest command was:
>>
>> dtrace -q -x dynvarsize=16g -x cleanrate=400 -s /DTrace/newstr2.d
>
> Try specifying a unit for your cleanrate frequency, i.e. 400hz. Without
> a unit I'd have expected you to see a warning about 'cleanrate' being
> reduced to 5000hz when the consumer starts.
>
> Jon.
Bryan Cantrill
2008-Feb-28 14:38 UTC
[dtrace-discuss] Struggling with dynvar space: how big should it become?
On Thu, Feb 28, 2008 at 05:53:37AM -0800, Paul van den Bogaard wrote:

> Trying to get a script to run for an hour. Up till now every attempt
> failed before the hour was over. Latest command was:
>
> dtrace -q -x dynvarsize=16g -x cleanrate=400 -s /DTrace/newstr2.d
>
> growing that dynvarsize from 128m through 256m and 512m, doubling
> onwards. My system "only" has 64GB of RAM and still dtrace produces
> many of the following:
>
> dtrace: 1255 dynamic variable drops with non-empty rinsing list
> dtrace: 37907 dynamic variable drops with non-empty dirty list
> dtrace: 701859 dynamic variable drops
> dtrace: 1279 dynamic variable drops with non-empty rinsing list
> dtrace: 43404 dynamic variable drops with non-empty dirty list
> dtrace: 711953 dynamic variable drops
> dtrace: 3990 dynamic variable drops with non-empty rinsing list
> dtrace: 42116 dynamic variable drops with non-empty dirty list
> dtrace: 170163 dynamic variable drops
> dtrace: 9 dynamic variable drops with non-empty rinsing list
> dtrace: 8591 dynamic variable drops with non-empty dirty list
>
> With the 16g setting it takes about 20 minutes before the first of
> these messages is produced. After that it is producing them
> continuously.

I suspect that you might be seeing a bug that I ran into recently (and
for which I have a fix that I will be integrating soon). Are your
dynamic variables being allocated on one CPU and deallocated on another
by any chance? (You can determine this by modifying your script to
aggregate on CPU where you would allocate and deallocate a dynamic
variable.) If this is the case, you are likely running into the same
issue that I ran into -- the symptom of which is dynamic variable
exhaustion regardless of dynamic variable size and cleanrate...

        - Bryan

--------------------------------------------------------------------------
Bryan Cantrill, Sun Microsystems Fishworks.       http://blogs.sun.com/bmc
Paul van den Bogaard
2008-Mar-01 14:00 UTC
[dtrace-discuss] Struggling with dynvar space: how big should it become?
I think it is the same bug you mentioned. Not sure if I completely
understood your guidance, though. I changed the original DTrace script
so it contains the following pieces of extension:

postgresql*:postgres::ready_for_query_start
{
    alloc["readyForQuery"] = cpu;
    self->readyForQuery = timestamp;
}

postgresql*:postgres::ready_for_query_end
/self->readyForQuery && alloc["readyForQuery"] != cpu/
{
    @migrations["readyForQuery"] = count();
}

postgresql*:postgres::ready_for_query_end
/self->readyForQuery && alloc["readyForQuery"] == cpu/
{
    @nonmigrations["readyForQuery"] = count();
}

postgresql*:postgres::ready_for_query_end
/self->readyForQuery/
{
    @count["readyForQuery", '-'] = count();
    @elapsed["readyForQuery", '-'] = sum(timestamp - self->readyForQuery);
    self->readyForQuery = 0;
}

Here I add the "alloc" array to recall the CPU used when assigning to a
self variable. If this same self variable is set to zero later on, I
increment either the @migrations or the @nonmigrations aggregation for
the key equal to that self variable's name.

The results of this are (MIG: migrations; NMG: non-migrations):

MIG  readyForQuery    294974
MIG  handleCommand   4288534
MIG  ts[arg0,arg1]   8474999
MIG  mode[arg0]      8512348
NMG  mode[arg0]         5217
NMG  ts[arg0,arg1]     75472
NMG  readyForQuery    948196
NMG  handleCommand   1383594

This shows there is an overwhelming abundance of migrations.

--Paul
--
This message posted from opensolaris.org
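The clauses for the other variables in the table follow the same pattern; for handleCommand they would presumably look like this (a sketch, not shown in full in the post above):

postgresql*:postgres::handle_command_start
/arg0 != 0/
{
    alloc["handleCommand"] = cpu;     /* remember the allocating CPU */
    self->handleCommand = timestamp;
}

postgresql*:postgres::handle_command_end
/self->handleCommand && alloc["handleCommand"] != cpu/
{
    @migrations["handleCommand"] = count();
}

postgresql*:postgres::handle_command_end
/self->handleCommand && alloc["handleCommand"] == cpu/
{
    @nonmigrations["handleCommand"] = count();
}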
Bryan Cantrill
2008-Mar-01 14:30 UTC
[dtrace-discuss] Struggling with dynvar space: how big should it become?
On Sat, Mar 01, 2008 at 06:00:40AM -0800, Paul van den Bogaard wrote:

> I think it is the same bug you mentioned. Not sure if I completely
> understood your guidance, though. I changed the original DTrace script
> so it contains the following pieces of extension:
>
> postgresql*:postgres::ready_for_query_start
> {
>     alloc["readyForQuery"] = cpu;
>     self->readyForQuery = timestamp;
> }
>
> postgresql*:postgres::ready_for_query_end
> /self->readyForQuery && alloc["readyForQuery"] != cpu/
> {
>     @migrations["readyForQuery"] = count();
> }
>
> postgresql*:postgres::ready_for_query_end
> /self->readyForQuery && alloc["readyForQuery"] == cpu/
> {
>     @nonmigrations["readyForQuery"] = count();
> }
>
> postgresql*:postgres::ready_for_query_end
> /self->readyForQuery/
> {
>     @count["readyForQuery", '-'] = count();
>     @elapsed["readyForQuery", '-'] = sum(timestamp - self->readyForQuery);
>     self->readyForQuery = 0;
> }
>
> Here I add the "alloc" array to recall the CPU used when assigning to a
> self variable. If this same self variable is set to zero later on, I
> increment either the @migrations or the @nonmigrations aggregation for
> the key equal to that self variable's name.
>
> The results of this are (MIG: migrations; NMG: non-migrations):
>
> MIG  readyForQuery    294974
> MIG  handleCommand   4288534
> MIG  ts[arg0,arg1]   8474999
> MIG  mode[arg0]      8512348
> NMG  mode[arg0]         5217
> NMG  ts[arg0,arg1]     75472
> NMG  readyForQuery    948196
> NMG  handleCommand   1383594
>
> This shows there is an overwhelming abundance of migrations.

For you to be running into the same problem, you would have to have some
CPU that is getting _nothing_ but deallocations. It would be interesting
to just aggregate on CPU; where you would alloc, do this:

    @alloc[cpu] = count();

And where you would dealloc, do this:

    @dealloc[cpu] = count();

And then in an END or tick probe, do this:

    printf("%8s %8s %8s\n", "CPU", "ALLOC", "DEALLOC");
    printa("%8d %@8d %@8d\n", @alloc, @dealloc);

If you're running into the bug that I recently ran into, you should have
one row that -- for some elongated period -- has zero allocations and
many deallocations.

And in terms of a workaround, try adding this clause to your D script:

    profile-1hz
    {
        /* Allocate and immediately free a dynamic variable on every
           CPU, once a second, so that no CPU ends up doing nothing
           but deallocations. */
        bogus[cpu, timestamp] = timestamp;
        bogus[cpu, timestamp] = 0;
    }

If you're running into this problem, the above clause should eliminate
the drops...

        - Bryan

--------------------------------------------------------------------------
Bryan Cantrill, Sun Microsystems Fishworks.       http://blogs.sun.com/bmc
Paul van den Bogaard
2008-Mar-02 14:30 UTC
[dtrace-discuss] Struggling with dynvar space: how big should it become?
It seems that all CPUs are used for allocating data in quite a balanced
way.

Adding the magic "hz" to the cleanrate flag made it possible to do a run
with 384 concurrent users. Trying 1280 and the old stuff reappears :-(

When increasing the cleanrate while specifying 16g for the dynvar space
I get the message:

dtrace: dynamic variable size lowered to 8g

Increasing the cleanrate even more and this becomes:

dtrace: dynamic variable size lowered to 4g

Is this not counterproductive? It looks to me like I need both. Or is a
single CPU not capable of cleaning that much dynamic space at such a
high rate?

Any other "gotchas" that could be tried to get rid of those dynamic
variable drops?
--
This message posted from opensolaris.org
Paul van den Bogaard
2008-Mar-02 18:21 UTC
[dtrace-discuss] Struggling with dynvar space: how big should it become?
Life is full of surprises. Did a dynvarsize=16g *without* specifying a
cleanrate: the 1280-user run was successful. Having tried so many
cleanrates, I did not expect this. Also no reduction of the dynvarsize
-- well, at least no messages.
--
This message posted from opensolaris.org
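In other words, the run that finally held up was presumably driven by nothing more than the original command line with the cleanrate override dropped:

    dtrace -q -x dynvarsize=16g -s /DTrace/newstr2.d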