I have been doing some profiling using the profile provider. I have a
command that runs more slowly on the T1000 than it does on prior
systems, and I am trying to find out why. Using the profile provider at
1000 hz and aggregating on the ustack() output, I find that the same
function appears at the top of the stack on both platforms, but on each
there are specific instruction locations within the function that
appear most often, and these are different on the two platforms. They
are consistent on a given platform. That is, when I re-run the test on
one platform, about 4 specific PC locations will appear in the top
spots, but on that platform it is always the same 4, and the 4 are
different between the platforms.

So, I am trying to figure out whether or not there is something
special happening at those locations, or just before or just after, or
whether they are just artifacts of how the profile provider works.
There are two function calls within this function, but neither set of
4 locations seems to be near the calls. And if the time was really
spent inside the next level down, wouldn't that have been reflected in
the ustack output?

-- 
blu

There are two rules in life:
Rule 1- Don't tell people everything you know
----------------------------------------------------------------------
Brian Utterback - Solaris RPE, Sun Microsystems, Inc.
Ph:877-259-7345, Em:brian.utterback-at-ess-you-enn-dot-kom
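A minimal sketch of this kind of sampling, aggregating directly on the
interrupted user PC instead of the whole stack, might look like the
following. The command name "mycmd" is a placeholder; uaddr() is
assumed to be usable as an aggregation key here (ufunc(arg1) would
fold the counts per function instead), and the prime 997 Hz rate is
used for the reason given in the next reply:

#!/usr/sbin/dtrace -s

/*
 * Sample at 997 Hz. In the profile provider, arg1 is the interrupted
 * user-mode PC and is non-zero only when the CPU was in user code.
 * Aggregating on uaddr(arg1) reports each hot location as
 * symbol+offset rather than one entry per whole stack.
 */
profile-997
/arg1 && execname == "mycmd"/
{
	@hot[uaddr(arg1)] = count();
}

tick-30s
{
	trunc(@hot, 10);	/* keep only the ten hottest locations */
	exit(0);
}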
On Tue, Aug 19, 2008 at 04:26:51PM -0400, Brian Utterback wrote:
> I have been doing some profiling using the profile provider. I have a
> command that runs more slowly on the T1000 than it does on prior
> systems and I am trying to find out why. Using the profile provider at
> 1000 hz, and aggregating on the ustack output, I find that the same

As a side note, it's generally better to use a prime rate, like 997 hz,
instead of something which is a multiple of the clock rate.

> function appears at the top of the stack on both platforms, but on
> each there are specific instruction locations within the function that
> appear most often and these are different on the two platforms. [...]
>
> So, I am trying to figure out whether or not there is something
> special happening at those locations, or just before or just after, or
> are they just artifacts of how the profile provider works? [...] And
> if the time was really spent inside the next lower down level,
> wouldn't that have been reflected in the ustack output?

The top frame of ustack() in a profile probe with a zero "arg0" (i.e. a
probe that fired while the user code was executing) will be the
interrupted instruction.

Without more information about what the "platforms" you speak of are
(different x86 boxes? Intel vs. AMD? Intel vs. SPARC?), it's hard to
say much. Some snippets of disassembly would help, if you can provide
them. Something like:

	echo "the_line_from_the_ustack_output+0xoffset ::dis" | mdb -p pid

would give you ten instructions on either side of the location.

Otherwise, could you provide some information on what the interrupted
locations are doing?

Cheers,
- jonathan
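To make that concrete, here is a hypothetical invocation; the PID
(1234) and the symbol+offset are placeholders standing in for a line
from the actual ustack() output:

# Attach to PID 1234 (briefly stopping it) and disassemble around the
# PC that ustack() reported; ::dis prints a window of instructions
# surrounding the given address.
echo 'libc.so.1`memcpy+0x40::dis' | mdb -p 1234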
One more thing. On the T1000, I run this script:

#!/usr/sbin/dtrace -s

profile-997
/arg1 && execname == "ssh"/
{
	@[execname, pid, ustack()] = count();
}

profile-997
/arg1 && execname == "sshd"/
{
	@[execname, pid, ustack()] = count();
}

profile-50s
{
	exit(0);
}

END
{
	trunc(@, 1000);
}

and I get a whole bunch of errors that look like this:

dtrace: error on enabled probe ID 1 (ID 1853: profile:::profile-997):
invalid address (0x3bb98000) in action #4

The only difference among them is that the address is different each
time. I don't get these messages on the v220 at all. Any ideas what
the problem is?

Brian Utterback wrote:
> I have been doing some profiling using the profile provider. I have a
> command that runs more slowly on the T1000 than it does on prior
> systems and I am trying to find out why. [...]

-- 
blu
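For what it's worth, those errors can also be summarized from within
the script itself by enabling the dtrace:::ERROR probe. A rough sketch,
relying on the documented arguments (arg3 carries the fault type and
arg4 the faulting address for bad-address faults):

dtrace:::ERROR
{
	/*
	 * arg3 is the DTrace fault type; arg4 is the accessed address
	 * for bad-address faults. Count by type and bucket the
	 * addresses to see whether they cluster.
	 */
	@types[arg3] = count();
	@addrs = quantize(arg4);
}

Adding this clause to the script above would show whether the faulting
addresses fall in one region or are scattered across the address space.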
> One more thing. On the T1000, I run this script:
> [...]
> and I get a whole bunch of errors that look like this:
>
> dtrace: error on enabled probe ID 1 (ID 1853: profile:::profile-997):
> invalid address (0x3bb98000) in action #4
>
> The only difference among them is that the address is different each
> time. I don't get these messages on the v220 at all. Any ideas what
> the problem is?

I had a quick look at this a while ago and here are a few observations
I made. Firstly, though: what version of Solaris are you running? And
do you see the errors sporadically or all the time?

It could be down to the fact that early in a process's lifecycle, and
periodically during its lifetime (for MMU synchronisation), we can have
an invalid user MMU context. When we go and walk the user stack we'll
have an invalid secondary context and you'll see these errors.

It could also be that you're catching a process very early in its
lifecycle, when we have an address space set up but no mappings loaded.
When we walk the stack we induce user page faults with the loads and
get the invalid address errors.

I would expect to see these errors on all SPARC, though, and not just
sun4v, so I'm not sure why you don't see them on sun4u. It could be
something completely different...

Jon.
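One way to test the early-lifecycle theory is to record exec times and
look at how old the victim process is when each error fires. This is
only a sketch, and it assumes the ERROR probe fires in the context of
the process whose stack walk failed:

proc:::exec-success
{
	/*
	 * Remember when each process image was exec'ed. Entries are
	 * never cleaned up in this sketch, so the array grows over
	 * the life of the script.
	 */
	birth[pid] = timestamp;
}

dtrace:::ERROR
/birth[pid] != 0/
{
	/* distribution of process age, in ns, at the time of the error */
	@age = quantize(timestamp - birth[pid]);
}

If the theory holds, the quantize() output should be weighted heavily
toward small ages.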
Hmm. Just speculating, since I don't know enough about sun4v, but could
it be something to do with the processor switching to another CMT
thread when the process blocks waiting for memory?

Since almost all of the user time is spent in one function (as
determined by the profile provider), I started doing some timing of
this particular function. I discovered that there are three distinct
times that the thread can take through the code path. The shortest was
about the same on the two processors, but the longest was quite a bit
longer on the sun4v. I suspect that this longer time represents the
case when the data is not in the cache. My understanding of CMT is that
when a thread blocks on memory, the processor switches to a different
hardware thread. That would imply a different address mapping space as
well, wouldn't it? Perhaps there is a window where the PC points one
place but the address space is pointing at another? Just a WAG.

Jon Haslam wrote:
> It could be down to the fact that early in a process's lifecycle, and
> periodically during its lifetime (for MMU synchronisation), we can
> have an invalid user MMU context. When we go and walk the user stack
> we'll have an invalid secondary context and you'll see these errors.
> [...]

-- 
blu
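The per-function timing described above can be gathered with the pid
provider. A minimal sketch, where the function name hot_func is a
placeholder for whatever the actual hot function is:

#!/usr/sbin/dtrace -s

pid$target::hot_func:entry
{
	self->ts = vtimestamp;	/* on-CPU time; use timestamp for
				   elapsed wall time instead */
}

pid$target::hot_func:return
/self->ts/
{
	@lat = quantize(vtimestamp - self->ts);
	self->ts = 0;
}

Run against a live process with something like ./time.d -p `pgrep -x
ssh`; three distinct path times through the function should show up as
three separate clusters in the quantize() output.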
> Hmm. Just speculating, since I don't know enough about sun4v, but
> could it be something to do with the processor switching to another
> CMT thread when the process blocks waiting for memory?
> [...] Perhaps there is a window where the PC points one place but the
> address space is pointing at another? Just a WAG.

Yeah, I wouldn't like to say without looking at the particular issue,
although I can see how you could get the behaviour you're seeing. At
some point I've got to look at another MMU context/ustack()-related
issue on sun4v, and I'll verify the source of this then.

Jon.