Hi, I have a script that measures Postgres locks, but it doesn''t seem to work right and wondering if anyone can tell me if the logic is correct. We''re trying to measure the counts and time for two locks which occur in this order Acquire ProcArrayLock -> Acquire WALWriteLock -> Release WALWriteLock -> Release ProcArrayLock WALWriteLock can only be acquired and released after ProcArrayLock, but ProcArrayLock can be acquired without WALWriteLock. The lock counts look right based on the output below, but the times are off. WALWriteLock time should be less than ProcArrayLock time, but it''s the other way around. Are there any logic errors in my script? I wanted to make sure the script is correct before looking into the probe implementation. Thanks, Robert #!/usr/sbin/dtrace -s #pragma D option quiet dtrace:::BEGIN { lckmode[0] = "Exclusive"; lckmode[1] = "Shared"; lckid[0] = "BufMappingLock"; lckid[1] = "BufFreelistLock"; lckid[2] = "ShmemIndexLock"; lckid[3] = "OidGenLock"; lckid[4] = "XidGenLock"; lckid[5] = "ProcArrayLock"; lckid[6] = "SInvalLock"; lckid[7] = "FreeSpaceLock"; lckid[8] = "WALInsertLock"; lckid[9] = "WALWriteLock"; lckid[10] = "ControlFileLock"; lckid[11] = "CheckpointLock"; lckid[12] = "CheckpointStartLock"; lckid[13] = "CLogControlLock"; lckid[14] = "SubtransControlLock"; lckid[15] = "MultiXactGenLock"; lckid[16] = "MultiXactOffsetControlLock"; lckid[17] = "MultiXactMemberControlLock"; lckid[18] = "RelCacheInitLock"; lckid[19] = "BgWriterCommLock"; lckid[20] = "TwoPhaseStateLock"; lckid[21] = "TablespaceCreateLock"; lckid[22] = "BtreeVacuumLock"; lckid[23] = "FirstLockMgrLock"; } pgsql$1:::lwlock-acquire / arg0 == 5 / { @count[lckid[arg0]] = count(); self->vts5=vtimestamp; self->pal = 1; } pgsql$1:::lwlock-acquire / (arg0 == 9) && self->pal / { self->vts9=vtimestamp; @count[lckid[arg0]] = count(); } pgsql$1:::lwlock-release / (arg0 == 5) && self->vts5 / { @time[lckid[arg0]] = sum (vtimestamp - self->vts5); self->vts5=0; } pgsql$1:::lwlock-release / (arg0 == 9) && self->pal && self->vts9 / { @time[lckid[arg0]] = sum (vtimestamp - self->vts9); self->vts9=0; self->pal=0; } dtrace:::END { printf("\n%20s %15s\n", "Lock Id", "Count"); printa("%20s %@15d\n", at count); printf("\n%20s %15s\n", "Lock Id", "Time (ns)"); printa("%20s %@15d\n", at time); } OUTPUT -------- Lock Id Count WALWriteLock 377 ProcArrayLock 5668 Lock Id Time (ns) ProcArrayLock 11783700 WALWriteLock 35862477
On Mon, Jul 10, 2006 at 10:27:12AM -0700, Robert Lor wrote:> Hi, > > I have a script that measures Postgres locks, but it doesn''t seem to > work right and wondering if anyone can tell me if the logic is correct. > We''re trying to measure the counts and time for two locks which occur in > this order > > Acquire ProcArrayLock -> Acquire WALWriteLock -> Release > WALWriteLock -> Release ProcArrayLock > > WALWriteLock can only be acquired and released after ProcArrayLock, but > ProcArrayLock can be acquired without WALWriteLock. > > The lock counts look right based on the output below, but the times are > off. WALWriteLock time should be less than ProcArrayLock time, but it''s > the other way around.[ ... ]> pgsql$1:::lwlock-acquire > / (arg0 == 9) && self->pal / > { > self->vts9=vtimestamp; > @count[lckid[arg0]] = count(); > } > > pgsql$1:::lwlock-release > / (arg0 == 5) && self->vts5 / > { > @time[lckid[arg0]] = sum (vtimestamp - self->vts5); > self->vts5=0; > } > > pgsql$1:::lwlock-release > / (arg0 == 9) && self->pal && self->vts9 / > { > @time[lckid[arg0]] = sum (vtimestamp - self->vts9); > self->vts9=0; > self->pal=0; > }You claim that WALWriteLock is dropped before ProcArrayLock, but your script doesn''t actually assert that. If WALWriteLock is actually dropped after (or long after) ProcArrayLock is dropped, you would expect to see the output that you''re seeing. To explore this, change that lwlock-release clause to: pgsql$1:::lwlock-release /(arg0 == 9) && self->pal && self->vts9/ { @time[!self->vts5 ? strjoin(lckid[arg0], " (ProcArrayLock dropped)") : lckid[arg0]] = sum (vtimestamp - self->vts9); } Also, note that vtimestamp is capturing CPU time, not wall clock time. This is unrelated to whatever issue you''re seeing, but just so that you''re aware that you won''t see disk or scheduling latency show up in the duration number... - Bryan -------------------------------------------------------------------------- Bryan Cantrill, Solaris Kernel Development. http://blogs.sun.com/bmc
Bryan Cantrill wrote:> >You claim that WALWriteLock is dropped before ProcArrayLock, but your >script doesn''t actually assert that. >Yes, WALWriteLock is dropped before ProcArrayLock. I made a mistake in the original script and made the following changes but now the output doesn''t show WALWriteLock. What''s wrong? .... pgsql$1:::lwlock-acquire / arg0 == 5 / { @count[lckid[arg0]] = count(); self->vts5=vtimestamp; self->pal = 1; } pgsql$1:::lwlock-acquire / (arg0 == 9) && self->pal / { self->vts9=vtimestamp; @count[lckid[arg0]] = count(); } pgsql$1:::lwlock-release / (arg0 == 9) && self->pal && self->vts9 / { @time[lckid[arg0]] = sum (vtimestamp - self->vts9); self->vts9=0; } pgsql$1:::lwlock-release / (arg0 == 5) && self->pal && self->vts5 / { @time[lckid[arg0]] = sum (vtimestamp - self->vts5); self->vts5=0; self->pal=0; } ... # pg_lwlock.d 2035 Lock Id Count ProcArrayLock 13712 Lock Id Time (ns) ProcArrayLock 17942326> If WALWriteLock is actually dropped >after (or long after) ProcArrayLock is dropped, you would expect to see >the output that you''re seeing. To explore this, change that lwlock-release >clause to: > >pgsql$1:::lwlock-release >/(arg0 == 9) && self->pal && self->vts9/ >{ > @time[!self->vts5 ? strjoin(lckid[arg0], " (ProcArrayLock dropped)") : > lckid[arg0]] = sum (vtimestamp - self->vts9); >} > >Here''s the output from your change: Lock Id Count WALWriteLock 166 ProcArrayLock 2495 Lock Id Time (ns) ProcArrayLock 4974113 WALWriteLock (ProcArrayLock dropped) 11014604>Also, note that vtimestamp is capturing CPU time, not wall clock time. >This is unrelated to whatever issue you''re seeing, but just so that you''re >aware that you won''t see disk or scheduling latency show up in the >duration number... > >Got it!
Robert Lor wrote:> Bryan Cantrill wrote: > >> >> You claim that WALWriteLock is dropped before ProcArrayLock, but your >> script doesn''t actually assert that. >> > Yes, WALWriteLock is dropped before ProcArrayLock. I made a mistake in > the original script and made the following changes but now the output > doesn''t show WALWriteLock. What''s wrong? > > .... > > pgsql$1:::lwlock-acquire > / arg0 == 5 / > { > @count[lckid[arg0]] = count(); > self->vts5=vtimestamp; > self->pal = 1; > } > > pgsql$1:::lwlock-acquire > / (arg0 == 9) && self->pal / > { > self->vts9=vtimestamp; > @count[lckid[arg0]] = count(); > } > > pgsql$1:::lwlock-release > / (arg0 == 9) && self->pal && self->vts9 / > { > @time[lckid[arg0]] = sum (vtimestamp - self->vts9); > self->vts9=0; > } > > pgsql$1:::lwlock-release > / (arg0 == 5) && self->pal && self->vts5 / > { > @time[lckid[arg0]] = sum (vtimestamp - self->vts5); > self->vts5=0; > self->pal=0; > } > ...It turns out that the script above works find but the two locks do not occur in that order for a single Postgres process. Thanks! -Robert