thr3ads.net - dtrace discuss - [dtrace-discuss] How come DTrace? (From a TNF perpetator) [Oct 2006]

Nakul Saraiya

2006-Oct-20 08:06 UTC

[dtrace-discuss] How come DTrace? (From a TNF perpetator)

I've posted the TNF goals and constraints from 12+ years ago elsewhere,
but am curious why Sun invested in DTrace, given the history.  It is a great
tool, and is exactly what Sun should have invested in a decade ago.

The question I was asked in PSARC reviews was 'why invest in something
that nobody has asked for' ...  and there was a violent reaction from
the kernel team when one proposed code patching.

Just wondering why this thinking changed (in the right direction) over the past
12 years. rgds...nakul
 
 
This message posted from opensolaris.org

James C. McPherson

2006-Oct-20 10:50 UTC

[dtrace-discuss] How come DTrace? (From a TNF perpetator)

Nakul Saraiya wrote:
> I've posted the TNF goals and constraints from 12+ years ago elsewhere, but
> am curious why Sun invested in DTrace, given the history.  It is a great
> tool, and is exactly what Sun should have invested in a decade ago.
>
> The question I was asked in PSARC reviews was 'why invest in something that
> nobody has asked for' ...  and there was a violent reaction from the kernel
> team when one proposed code patching.
>
> Just wondering why this thinking changed (in the right direction) over the
> past 12 years. rgds...nakul

Hi Nakul,
I'm sure Bryan/Adam/Mike/Jonathan will chime in with the facts,
but for me as an outsider looking in (even while I was still at
Sun) the reason seemed to come down to

The time has come to make things better


oh, and there appears to have been sufficient momentum and feedback
received to give a push to those who needed one.

Actual facts are probably different, the above is merely my
impression of why DTrace was what resulted.



cheers,
James C. McPherson
--
Solaris kernel software engineer, system admin and troubleshooter
               http://www.jmcp.homeunix.com/blog
Find me on LinkedIn @ http://www.linkedin.com/in/jamescmcpherson

Chip Bennett

2006-Oct-20 11:29 UTC

[dtrace-discuss] How come DTrace? (From a TNF perpetator)

James C. McPherson wrote:
> Nakul Saraiya wrote:
>> I've posted the TNF goals and constraints from 12+ years ago
>> elsewhere, but am curious why Sun invested in DTrace, given the
>> history.  It is a great tool, and is exactly what Sun should have
>> invested in a decade ago.
>>
>> The question I was asked in PSARC reviews was 'why invest in
>> something that nobody has asked for' ...  and there was a violent
>> reaction from the kernel team when one proposed code patching.
>>
>> Just wondering why this thinking changed (in the right direction)
>> over the past 12 years. rgds...nakul
>
> Hi Nakul,
> I'm sure Bryan/Adam/Mike/Jonathan will chime in with the facts,
> but for me as an outsider looking in (even while I was still at
> Sun) the reason seemed to come down to
>
> The time has come to make things better
>
> oh, and there appears to have been sufficient momentum and feedback
> received to give a push to those who needed one.
>
> Actual facts are probably different, the above is merely my
> impression of why DTrace was what resulted.

And as long as we're talking about it, one of the things that has me
curious is how they managed to make a tool, that modifies kernel
instructions, so reliable.  I've never heard of a case where DTrace
inadvertently put the wrong instructions at probe points, put the wrong
instructions in the trampoline, forgot to restore the correct
instructions, picked up the wrong instructions to restore, or otherwise
messed up the kernel.  Considering the kernel code gets modified
(sometimes a lot) every time a DTrace consumer starts and exits, that's
amazing to me.  (Should it be?  :-)  )  I can understand why there might
have been objections to this method at the beginning, but they would
seem to be unfounded.

Chip

James C. McPherson

2006-Oct-20 11:58 UTC

[dtrace-discuss] How come DTrace? (From a TNF perpetator)

Chip Bennett wrote:
...
> And as long as we're talking about it, one of the things that has me
> curious is how they managed to make a tool, that modifies kernel
> instructions, so reliable.  I've never heard of a case where DTrace
> inadvertently put the wrong instructions at probe points, put the wrong
> instructions in the trampoline, forgot to restore the correct instructions,
> picked up the wrong instructions to restore, or otherwise messed up the
> kernel.  Considering the kernel code gets modified (sometimes a lot) every
> time a DTrace consumer starts and exits, that's amazing to me.  (Should it
> be?  :-)  )  I can understand why there might have been objections to this
> method at the beginning, but they would seem to be unfounded.

Design, Review, Design, Review, Code Inspection, Insanely Tough Regression
Testing(tm), .....

oh, and I think there was a Design aspect in there too :)

It certainly helps to have some of the smartest minds on the planet
thinking about how to solve the problems, too.


cheers,
James C. McPherson
--
Solaris kernel software engineer, system admin and troubleshooter
               http://www.jmcp.homeunix.com/blog
Find me on LinkedIn @ http://www.linkedin.com/in/jamescmcpherson

Chip Bennett

2006-Oct-20 12:28 UTC

[dtrace-discuss] How come DTrace? (From a TNF perpetator)

James C. McPherson wrote:
> Chip Bennett wrote:
> ...
>> And as long as we're talking about it, one of the things that has me
>> curious is how they managed to make a tool, that modifies kernel
>> instructions, so reliable. ...
>
> Design, Review, Design, Review, Code Inspection, Insanely Tough
> Regression Testing(tm), .....
>
> oh, and I think there was a Design aspect in there too :)
>
> It certainly helps to have some of the smartest minds on the planet
> thinking about how to solve the problems, too.

So lots of good old hard work.  Solid design methodologies.  And sharp
people who know Solaris inside out.

But I was thinking about how, when extolling the virtues of DTrace to a
customer, I avoid mentioning that it modifies the kernel, because I'm
afraid it will turn them off.

No magic methodology involved?  You know like, "Oh yes, we used SunGuard 
to prevent kernel corruption."  ;-)

Chip

James C. McPherson

2006-Oct-20 12:40 UTC

[dtrace-discuss] How come DTrace? (From a TNF perpetator)

Chip Bennett wrote:
> James C. McPherson wrote:
>> Chip Bennett wrote:
>> ...
>>> And as long as we're talking about it, one of the things that has me
>>> curious is how they managed to make a tool, that modifies kernel
>>> instructions, so reliable. ...
>>
>> Design, Review, Design, Review, Code Inspection, Insanely Tough
>> Regression Testing(tm), .....
>>
>> oh, and I think there was a Design aspect in there too :)
>>
>> It certainly helps to have some of the smartest minds on the planet
>> thinking about how to solve the problems, too.
> So lots of good old hard work.  Solid design methodologies.  And sharp
> people who know Solaris inside out.
> But I was thinking about how, when extolling the virtues of DTrace to a
> customer, I avoid mentioning that it modifies the kernel, because I'm
> afraid it will turn them off.
A good point. DTrace interposes itself into the execution stream, but
that's a "soft" or rather, transient, interposition which is removed
as soon as it is no longer needed. I don't think there's any point in
trying to hide that there's kernel stuff going on. I'd make it a point
to emphasize that once your probes are complete (or before DTrace is
invoked for them) they don't actually exist in the kernel at all.
This transient nature is one of the strengths of DTrace. You know this
already because customers are asking "What effect does DTrace have on
my system when I'm not running DTrace?" ... to which the answer is
very clear "none. zip. nada .... zero" :)
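
[Editor's sketch] The "transient interposition" idea can be illustrated with a toy model. This is not DTrace's implementation: a bytearray stands in for kernel text, ToyTracer is a hypothetical name, and the trap byte (0xCC, the x86 int3 opcode) is chosen only for illustration. The point it shows is the round trip: enabling saves the original byte before patching, and disabling puts every saved byte back, so the text is byte-for-byte identical once no probes are enabled.

```python
# Toy model only: a bytearray stands in for kernel text; this is not DTrace's code.
TRAP = 0xCC  # x86 int3 opcode, used here purely as an illustrative patch value


class ToyTracer:
    """Patches one byte per probe site and restores it on disable."""

    def __init__(self, text: bytearray):
        self.text = text
        self.saved = {}  # probe site offset -> original byte

    def enable(self, offset: int) -> None:
        if offset in self.saved:
            return  # already enabled; never record a trap byte as the "original"
        self.saved[offset] = self.text[offset]
        self.text[offset] = TRAP

    def disable_all(self) -> None:
        for offset, original in self.saved.items():
            self.text[offset] = original
        self.saved.clear()


# push %ebp; mov %esp,%ebp; pop %ebp; ret
text = bytearray([0x55, 0x89, 0xE5, 0x5D, 0xC3])
pristine = bytes(text)

tracer = ToyTracer(text)
tracer.enable(0)
patched = bytes(text)      # snapshot while the probe is enabled
tracer.disable_all()
restored = bytes(text)     # snapshot after the consumer has gone away
```

In this model, comparing restored with pristine is the check that answers the customer question above: with no probes enabled, nothing of the instrumentation remains in the text.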
> No magic methodology involved?  You know like, "Oh yes, we used SunGuard
> to prevent kernel corruption."  ;-)
umm, check the patents list? ... and have you met Bryan/Adam/Mike?


cheers,
James C. McPherson
--
Solaris kernel software engineer, system admin and troubleshooter
               http://www.jmcp.homeunix.com/blog
Find me on LinkedIn @ http://www.linkedin.com/in/jamescmcpherson

Chip Bennett

2006-Oct-20 12:50 UTC

[dtrace-discuss] How come DTrace? (From a TNF perpetator)

James C. McPherson wrote:
> A good point. DTrace interposes itself into the execution stream, but
> that's a "soft" or rather, transient, interposition which is removed
> as soon as it is no longer needed. ...

It's exactly the transient nature that would be scary, were it not for
the reputation of the product and the authors:  changing the instructions
all the time, and the potential for getting it wrong.

Chip

Tanel Poder

2006-Oct-20 13:18 UTC

[dtrace-discuss] How come DTrace? (From a TNF perpetator)

Hi,
> It's exactly the transient nature, were it not for the
> reputation of the product and the authors, that would be
> scary:  changing the instructions all the time, and the
> potential for getting it wrong.
I don't see much of a problem here.

Data changes all the time too, including various frequently used pointers and
stacks.

You could crash your entire system by just corrupting one single pointer in
the right place in the kernel. So I don't see how changing code in a controlled
manner is much different from changing data... it's all the same memory and
both code and data are crucial for a functioning system.

Btw, I haven't read the Solaris source code - but are you sure that all kernel
probe points are dynamically inserted with an FBT-like mechanism during
runtime?

If you do have the source code, it is possible to statically create
probepoints... (however for performance reasons it could make more sense to
have dynamic probe insertion..)

Tanel.

James Carlson

2006-Oct-20 13:31 UTC

[dtrace-discuss] How come DTrace? (From a TNF perpetator)

Tanel Poder writes:
> If you do have the source code, it is possible to statically create
> probepoints... (however for performance reasons it could make more sense to
> have dynamic probe insertion..)
That part's not really true.  A static probe point is just a no-op
when not enabled.  It doesn't have a substantial performance impact.

-- 
James Carlson, KISS Network                    <james.d.carlson at sun.com>
Sun Microsystems / 1 Network Drive         71.232W   Vox +1 781 442 2084
MS UBUR02-212 / Burlington MA 01803-2757   42.496N   Fax +1 781 442 1677

Tanel Poder

2006-Oct-20 13:40 UTC

[dtrace-discuss] How come DTrace? (From a TNF perpetator)

> > If you do have the source code, it is possible to statically create
> > probepoints... (however for performance reasons it could make more
> > sense to have dynamic probe insertion..)
>
> That part's not really true.  A static probe point is just a
> no-op when not enabled.  It doesn't have a substantial
> performance impact.
Ok, now that I think about it, I agree with you. That's especially true with
modern pipelined & out-of-order executing CPUs, where a no-op instruction
shouldn't have much impact even in very tight loops.

So, does compiled Solaris kernel binary have actual no-op "padding" at
kernel function boundaries, which will be replaced with call instruction (or
this invalid instruction on x86) when probe is enabled?

Thanks,
Tanel.

James Carlson

2006-Oct-20 13:49 UTC

[dtrace-discuss] How come DTrace? (From a TNF perpetator)

Tanel Poder writes:
> So, does compiled Solaris kernel binary have actual no-op "padding" at
> kernel function boundaries, which will be replaced with call instruction (or
> this invalid instruction on x86) when probe is enabled?
No.  The no-op usage is done only for 'sdt' probes -- where a
programmer has placed an intentional probe point.  'fbt' probes are
done by looking at the information produced by the compiler and the
instruction sequences, and modifying the standard prologue/epilogue on
the fly.
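
[Editor's sketch] The distinction between the two kinds of probe site can be sketched as a toy model. This is not the real provider code: the byte layout, the symbol table and the list of declared static sites are all hypothetical, and only the x86 opcode values (0x90 no-op, 0x55 push %ebp) are real. A static site is a location a programmer declared, which holds a placeholder no-op; a function-boundary site is discovered by inspecting the first instruction of each function for a standard prologue.

```python
NOP = 0x90      # x86 one-byte no-op
PUSH_BP = 0x55  # x86 push %ebp / push %rbp, a common first prologue byte


def sdt_sites(text, declared_offsets):
    """Static probes: declared sites that still hold the placeholder no-op."""
    return [off for off in declared_offsets if text[off] == NOP]


def fbt_entry_sites(text, symbols):
    """Function-boundary probes: functions whose first byte is a standard prologue.

    In this toy model, functions that do not start with PUSH_BP are skipped.
    """
    return {name: off for name, off in symbols.items() if text[off] == PUSH_BP}


text = bytes([
    0x55, 0x89, 0xE5, 0x90, 0x5D, 0xC3,  # func_a: prologue, placeholder no-op, epilogue
    0x31, 0xC0, 0xC3,                    # func_b: xor %eax,%eax; ret (no frame set up)
])
symbols = {"func_a": 0, "func_b": 6}     # hypothetical symbol table
declared = [3]                           # hypothetical build-time list of static sites

static_sites = sdt_sites(text, declared)
entry_sites = fbt_entry_sites(text, symbols)
```

Note that func_a carries no padding for its entry probe: the only no-op in the text is the one at the declared static site, and the entry site is simply the existing prologue byte.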

-- 
James Carlson, KISS Network                    <james.d.carlson at sun.com>
Sun Microsystems / 1 Network Drive         71.232W   Vox +1 781 442 2084
MS UBUR02-212 / Burlington MA 01803-2757   42.496N   Fax +1 781 442 1677

Tanel Poder

2006-Oct-20 13:54 UTC

[dtrace-discuss] How come DTrace? (From a TNF perpetator)

> No.  The no-op usage is done only for 'sdt' probes -- where a
> programmer has placed an intentional probe point.  'fbt'
> probes are done by looking at the information produced by the
> compiler and the instruction sequences, and modifying the
> standard prologue/epilogue on the fly.
Does this apply to kernel probepoints too? (in other words - are kernel
functions compiled without any extra padding and modified on the fly?)

Thanks,
Tanel.

James Carlson

2006-Oct-20 14:00 UTC

[dtrace-discuss] How come DTrace? (From a TNF perpetator)

Tanel Poder writes:
> > No.  The no-op usage is done only for 'sdt' probes -- where a
> > programmer has placed an intentional probe point.  'fbt'
> > probes are done by looking at the information produced by the
> > compiler and the instruction sequences, and modifying the
> > standard prologue/epilogue on the fly.
>
> Does this apply to kernel probepoints too? (in other words - are kernel
> functions compiled without any extra padding and modified on the fly?)
Yes.  You can confirm this fairly easily with mdb ...

-- 
James Carlson, KISS Network                    <james.d.carlson at sun.com>
Sun Microsystems / 1 Network Drive         71.232W   Vox +1 781 442 2084
MS UBUR02-212 / Burlington MA 01803-2757   42.496N   Fax +1 781 442 1677

Nakul Saraiya

2006-Oct-21 01:28 UTC

[dtrace-discuss] Re: How come DTrace? (From a TNF perpetator)

Clearly there's excellent engineering and a real, useful product here.

My question was more about whether this was pushed from engineering or perhaps
based on pull from the field (customers, ISVs, SEs...)

rgds...nakul
 
 

Bart Smaalders

2006-Oct-21 02:57 UTC

[dtrace-discuss] Re: How come DTrace? (From a TNF perpetator)

Nakul Saraiya wrote:
> Clearly there's excellent engineering and a real, useful product here.
>
> My question was more about whether this was pushed from engineering or
> perhaps based on pull from the field (customers, ISVs, SEs...)
>
> rgds...nakul

Pushed from engineering.

Or perhaps, once they started talking about DTrace, the rest
of us trying to speed up Solaris became very eager customers,
so a push/pull model amplifier might be a better model.

- Bart

-- 
Bart Smaalders			Solaris Kernel Performance
barts at cyber.eng.sun.com		http://blogs.sun.com/barts

Nakul Saraiya

2006-Oct-21 03:13 UTC

[dtrace-discuss] Re: Re: How come DTrace? (From a TNF perpetator)

Hi again Bart.

There are three things here that I think are valuable -

1. excellent engineering, obviously :)
2. a product, available all the time to every customer (there are many add-on
packages and/or research efforts that ultimately don't get used - and
getting used and being useful is the only real metric)
3. a push by Sun to build a community and practice around using the tool to
solve real problems in the customer base.

I, for one, am glad that Sun made the commitment to put this out and evangelize
this, many careers after I worked on the performance and tracing stuff while I
was there.

rgds...nakul
 
 

Jarod Jenson

2006-Oct-21 20:32 UTC

[dtrace-discuss] Re: Re: How come DTrace? (From a TNF perpetator)

Nakul Saraiya's email at 10/20/2006 10:13 PM, said:
> Hi again Bart.
>
> There are three things here that I think are valuable -
>
> 1. excellent engineering, obviously :)
> 2. a product, available all the time to every customer (there are many
> add-on packages and/or research efforts that ultimately don't get used
> - and getting used and being useful is the only real metric)
> 3. a push by Sun to build a community and practice around using the tool to
> solve real problems in the customer base.
>
> I, for one, am glad that Sun made the commitment to put this out and
> evangelize this, many careers after I worked on the performance and tracing
> stuff while I was there.
>
> rgds...nakul

I agree with this, and would like to state it another way.

The number one design constraint of DTrace is that it is usable and 
valuable in a production environment. This is a clear indicator that its 
intended audience is the customer. I can say - as a customer -
that this has easily been accomplished.

Certainly there was push from the engineers at Sun for a tool, but 
Bryan/Mike/Adam didn't deliver something that was designed for in-house
use only. Believe me, this investment was due to customer demand. These 
guys are dedicated to customer experience (not just DTrace, but FMA as 
well) and that made DTrace what it is today. I have worked with BMA 
(this acronym needs a Wikipedia page) for some time, and I can tell you 
that any performance issue or system failure of Solaris keeps these guys 
up at night.

DTrace is a differentiator in the world of competitive Operating 
Systems, and its value cannot be touted enough. There should be no 
question of "Why DTrace" - only a heart-felt sense of gratitude for 
these guys taking the initiative and delivering something that ensures 
quality of life for all users of the Solaris OE.

I can only imagine what they have up their sleeves for the next project.

Thanks,

Jarod

Bryan Cantrill

2006-Oct-22 06:36 UTC

[dtrace-discuss] How come DTrace? (From a TNF perpetator)

On Fri, Oct 20, 2006 at 01:06:29AM -0700, Nakul Saraiya wrote:
> I've posted the TNF goals and constraints from 12+ years ago elsewhere,
> but am curious why Sun invested in DTrace, given the history.  It is a
> great tool, and is exactly what Sun should have invested in a decade ago.
>
> The question I was asked in PSARC reviews was 'why invest in something
> that nobody has asked for' ...  and there was a violent reaction from
> the kernel team when one proposed code patching.
>
> Just wondering why this thinking changed (in the right direction) over the
> past 12 years. rgds...nakul
It wasn't so much a change in thinking as it was a change in approach:
unlike similar endeavors that had come before it, DTrace was not developed
by a tools group, but rather by a team within the kernel group itself.
That is, we built something that was useful because we ourselves needed
to use it.  And -- like much (most? all?) great software -- we didn't go
from requirements document to architecture document to implementation to
deployment; we went from an idea (or rather, series of ideas) to a prototype
that we (collectively) could actually use -- and from using that prototype,
we organically discovered missing functionality and architected (or, in
some cases, rearchitected) appropriately and implemented and redeployed
the prototype.

Which is not to say that the development was desultory or that DTrace
itself has not been carefully and deliberately architected -- but rather
that we let actual use guide our development.  From the perspective of
your question, the upshot of all this was severalfold:

  (1)  The initial investment in DTrace was tiny -- just two of us for
       six months

  (2)  By the time we required additional investment (namely, a third
       engineer, six months in), we had demonstrated clear, quantifiable
       return on the investment from (1)

  (3)  By the time we went to PSARC (which was nearly two years into the
       project), there was no question that we were developing something
       of great value to the operating system and to the company:  we
       had (literally) hundreds of internal users, and DTrace had been
       used to solve (literally) hundreds of problems, many of which could
       not have been solved prior to DTrace

In short, as Bart and Jarod mentioned, DTrace is successful because it's
useful -- and it's useful because its use and its development were very
much intertwined, from its inception.  (With, I might add, a tip of the
hat to Bart and Jarod themselves, who were among DTrace's earliest users
and a great source of ideas and feedback.)

Finally, let me note that this pattern of intertwined development and
use is not at all unique to DTrace; if you look at other successful
software (Ruby on Rails comes to mind) you will likely see a similarly
utilitarian ethos...

	- Bryan

--------------------------------------------------------------------------
Bryan Cantrill, Solaris Kernel Development.       http://blogs.sun.com/bmc

Nakul Saraiya

2006-Oct-26 04:27 UTC

[dtrace-discuss] Re: How come DTrace? (From a TNF perpetator)

Bryan & co, thanks for the info and congratulations on a great tool that
customers and ISVs can use (in fact, our engineers are now using it to tune our
virtual I/O drivers and are totally impressed :)

As a small, irrelevant footnote:  we had similar capabilities in a system called
ZetaLisp, probably before you were born :) in 1983, and in Xerox D-machines
before that (which had remote debugging capability over the network.)   Dynamic
typing then was very useful in quickly building production systems - one hopes
that the new wave of Ruby, etc can meet those standards.

The SunPRO team I worked with in the early 90s had a dynamic type system in
Scheme that could deal with all C types, hence we decided to use that for our
add-on tools.  Some day, in an idle moment, it is worth reading Danny
Bobrow's paper on power tools for programmers...

rgds...nakul
 
 

Bryan Cantrill

2006-Oct-26 06:20 UTC

[dtrace-discuss] Re: How come DTrace? (From a TNF perpetator)

Hey Nakul,
> As a small, irrelevant footnote:  we had similar capabilities in a system
> called ZetaLisp, probably before you were born :) in 1983, and in Xerox
> D-machines before that (which had remote debugging capability over the
> network.)  Dynamic typing then was very useful in quickly building
> production systems - one hopes that the new wave of Ruby, etc can meet
> those standards.
I would be curious for details, such as you have them.  From what I can
tell, the ZetaLisp debugger (while interesting) looks much more like a
traditional debugger than it does like DTrace:  it appears to be interactive,
invasive and language-specific.  But what I'm finding might not be what
you're referring to; could you be a little more specific?

	- Bryan

--------------------------------------------------------------------------
Bryan Cantrill, Solaris Kernel Development.       http://blogs.sun.com/bmc

Nakul Saraiya

2006-Oct-27 03:35 UTC

[dtrace-discuss] Re: Re: How come DTrace? (From a TNF perpetator)

My comment was more on the dynamic instrumentation aspect - I'll go
back and dig up the manuals in my garage and try and get some more details to
you - I seem to recall one could do trace(foo()) and get a summary report of the
entire call tree.   BTW, ZetaLisp was not only a language - it was an OS as well.
(I think it may even have spawned off the GNU project since RMS wrote most of
the GUI AFAIK - but that is pure hearsay.)  Like other special-purpose systems,
it could not compete with the general-purpose market, but it set an agenda.

The brilliant step that I think you folks took was to integrate the dynamic
instrumentation with filtering, aggregation, analysis and reporting - in one
convenient package for users.  Making it all work is an impressive
accomplishment.

Like someone else asked, what next?

Rgds...nakul
 
 

Nakul Saraiya

2006-Oct-31 05:11 UTC

[dtrace-discuss] Re: Re: How come DTrace? (From a TNF perpetator)

Bryan, I haven't had the time to go back and research this in any
depth, but here is some more information on the Lisp machine runtime
'meter' facility.

http://common-lisp.net/project/bknr/static/lmman/fd-hac.xml#meter-section-section

Also, a portable system done later at CMU (not covering OS) is
http://www-users.cs.umn.edu/~gini/lisp/metering.cl

More later
 
 
