Dirk Volkmar
2006-Dec-14 07:28 UTC
[dtrace-discuss] Using DTrace to monitor productions systems
Hi,

we are developing and operating a very critical application in the financial sector. Our customer now wants us to report performance data (round-trip times of the routed messages).
My idea is to use DTrace for this: with the pid provider, record timestamps on entry to the inbound and outbound functions of our processes.

Now my questions ;-):

1) Is this an appropriate method for monitoring a production system?
2) Does it affect the stability of the system?
3) I imagine that tracing with DTrace affects performance. Does anybody have experience with how large the impact is?

Thanks for your comments,
Dirk

This message posted from opensolaris.org
Dirk Volkmar
2006-Dec-14 08:05 UTC
[dtrace-discuss] Re: Using DTrace to monitor productions systems
We have a chain of processes working on these messages. The first one takes a message from IBM MQSeries, does some processing, and sends it via TCP/IP to a second process. That process passes the message to another system by calling an API function provided by that system. The other system sends an asynchronous response to the second process, which forwards the answer to the first process. The answer is then sent to the customer by putting it on another MQSeries queue.

---------------       -----------       -----------       --------------
|MQSERIES     | <---> Process A   <---> Process B   <---> |SYSTEM X    |
---------------       -----------       -----------       --------------

We want to measure the time between getting the message from MQ and calling the API function of SYSTEM X, and likewise for the way back. The time the message spends in SYSTEM X should be excluded, so we have to take a timestamp before calling the API function. To be more exact, we want to measure the time messages spend in our system. This time should not exceed 80 ms.
Darren Reed
2006-Dec-14 08:36 UTC
[dtrace-discuss] Using DTrace to monitor productions systems
Dirk Volkmar wrote:
> Hi,
>
> we are developing and operating a very critical application in the financial sector. Now our customer wants us to report performance data (roundtrip times of the messages routed)

I'd press them for more details on precisely what they consider the round trip time to be.

Is it from send() of the original to the recv() of the reply?
Is it from the moment they hand the message on to you to the time they get it back?
Is it from the time the packet is sent out to when the reply is seen?
Or is it something else?

Darren
Adam Leventhal
2006-Dec-14 19:59 UTC
[dtrace-discuss] Using DTrace to monitor productions systems
Hi Dirk,

We've used DTrace with great success on production applications at various financial institutions. In particular I've used the pid provider to examine a trading application to measure latency between different parts of the system in precisely the way you suggest.

DTrace is always safe so there are no special precautions you need to take when using it on a production system. The performance impact will be relative to the frequency with which traced functions are called. On production systems in particular, I'm always careful to trace exactly the functions I need, whereas I might just trace everything in a situation where I could safely kill performance.

If you need some more specific advice let us know.

Adam

--
Adam Leventhal, Solaris Kernel Development       http://blogs.sun.com/ahl
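As a rough illustration of the pid-provider approach confirmed in this reply, the shell function below wraps a small D program that builds a histogram of the time between entry to an inbound and an outbound function. The function names msg_in and msg_out are hypothetical placeholders, and the DTRACE override variable is an assumption made for illustration. The sketch also assumes both functions run on the same thread of one process; correlating a message across Process A and Process B would need a message key rather than a thread-local variable.

```shell
# Sketch only: msg_in and msg_out are hypothetical function names.
# DTRACE may be overridden; it defaults to the standard Solaris path.
trace_latency() {
    # $1 = pid of the process to instrument. Inside the single-quoted
    # program text, $1 is left alone by the shell and expanded by dtrace
    # as its first macro argument.
    "${DTRACE:-/usr/sbin/dtrace}" -q -n '
        pid$1::msg_in:entry
        {
            /* remember when this thread picked up a message */
            self->start = timestamp;
        }

        pid$1::msg_out:entry
        /self->start/
        {
            /* nanoseconds between the two entry probes */
            @["in -> out latency (ns)"] = quantize(timestamp - self->start);
            self->start = 0;
        }' "$1"
}
```

Running `trace_latency <pid>` and interrupting it with Ctrl-C prints the distribution. Only two probes are enabled, so the overhead stays proportional to the message rate, in line with the advice to trace exactly the functions needed.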
Andreas.Haas at Sun.COM
2006-Dec-15 09:56 UTC
[dtrace-discuss] Using DTrace to monitor productions systems
Hi Dirk,

let me join Adam in encouraging you.

With the Grid Engine 6.1 pre-release we are doing exactly the same:

http://wiki.gridengine.info/wiki/index.php/Dtrace#Dtrace-based_Master_monitoring_under_Solaris_10

This specification document contains real-world samples for illustration purposes.

Note that you may need to prevent functions traced through the pid provider from being inlined. Unfortunately this is not yet mentioned in the DTrace docs, but you'll find everything important under

http://bugs.opensolaris.org/view_bug.do?bug_id=6480235

Grüße,
Andreas
Dirk Volkmar
2006-Dec-15 13:24 UTC
[dtrace-discuss] Re: Using DTrace to monitor productions systems
Hi, Adam!

That's good news. I'm not on the wrong path ;-)

At the moment I have one problem to solve: starting the dtrace scripts that collect the performance data. The two processes mentioned above are not started directly but by another process.
My current solution is a set of wrapper shell scripts that run dtrace with the process as its command (-c switch). This solution has some problems:

1) If the controlling process shuts down the system, the dtrace processes are killed but the original processes keep running. Maybe I can handle this by forwarding the signal to the process (using trap or something like that).

2) Startup of the processes is significantly slower than normal. Are there any options to speed up dtrace startup? I noticed that there is a -S switch that shows the DTrace intermediate code, but I haven't found any way to use this code.

Bye,
Dirk
Andreas.Haas at Sun.COM
2006-Dec-15 14:06 UTC
[dtrace-discuss] Re: Using DTrace to monitor productions systems
On Fri, 15 Dec 2006, Dirk Volkmar wrote:
> At the moment my solution is that I created wrapper shell scripts that call dtrace with the process (-c switch).

I don't know if it is applicable to you, but in our case the pids of our two daemons simply get passed as arguments $1 and $2 to the dtrace script

   pid$1::sge_log:entry,
   pid$2::sge_log:entry

The pids of our daemons are read from a pid file by a tiny Bourne shell wrapper script that launches dtrace like this

   /usr/sbin/dtrace -s ./monitor.d $master_pid $schedd_pid

I'm quite happy with using it that way, as it allows the monitor to be started and stopped independently of the daemons' life cycle, as an interactive command. Integrating the monitor into the daemon start-up procedures is still possible ;-)

Regards,
Andreas
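The wrapper described in this message might look roughly like the sketch below. The pid-file paths, the variable names and the DTRACE override are assumptions made for illustration; only the dtrace command line itself is taken from the message. Backticks are used instead of $(...) so the same text also runs under an old Bourne shell.

```shell
# Hypothetical pid-file locations; override them for your installation.
MASTER_PID_FILE=${MASTER_PID_FILE:-/var/run/master.pid}
SCHEDD_PID_FILE=${SCHEDD_PID_FILE:-/var/run/schedd.pid}

# Print the pid stored on the first line of file $1.
# Fails if the file is unreadable or does not hold a plain number.
read_pid() {
    [ -r "$1" ] || return 1
    pid=
    read pid rest < "$1" || :      # tolerate a missing trailing newline
    case $pid in
        ''|*[!0-9]*) return 1 ;;   # empty or non-numeric
    esac
    echo "$pid"
}

# Launch the monitor; the two pids become $1 and $2 inside monitor.d.
start_monitor() {
    master_pid=`read_pid "$MASTER_PID_FILE"` || return 1
    schedd_pid=`read_pid "$SCHEDD_PID_FILE"` || return 1
    "${DTRACE:-/usr/sbin/dtrace}" -s ./monitor.d "$master_pid" "$schedd_pid"
}
```

Validating the pid before handing it to dtrace avoids a confusing probe-description error when a pid file is stale or half written.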
Rayson Ho
2006-Dec-15 15:20 UTC
[dtrace-discuss] Using DTrace to monitor productions systems
Dirk,

Yes, DTrace is designed from the ground up to instrument production systems!!

The USENIX paper "Dynamic Instrumentation of Production Systems" mentions that DTrace was used inside Sun on a server with 170 Sun Ray clients. The server had performance problems. With DTrace, they found that the cause was a stock ticker applet for the GNOME desktop making expensive system calls 100 times per second...

You can read the details on page 11:
http://www.sun.com/bigadmin/content/dtrace/dtrace_usenix.pdf

Andreas, I couldn't append to bug 6480235... I just want to mention that to tell gcc not to inline a function, the user needs to use the noinline attribute:

   <function name> __attribute__ ((noinline))

Rayson

P.S. looking forward to seeing SUNW hit $6 early next year!!! ;)
Andreas.Haas at Sun.COM
2006-Dec-15 15:51 UTC
[dtrace-discuss] Using DTrace to monitor productions systems
On Fri, 15 Dec 2006, Rayson Ho wrote:
> Andreas, I couldn't append to bug 6480235... I just want to mention
> that to tell gcc not to inline a function, the user needs to use the
> noinline attribute:
> <function name> __attribute__ ((noinline))

Thanks, Rayson. Just added it to #6480235, yet for some reason it still doesn't appear in

http://bugs.opensolaris.org/view_bug.do?bug_id=6480235

but maybe it's just delayed.

> Rayson
>
> P.S. looking forward to see SUNW hit $6 early next year!!! ;)

Shiver 8-)

Cheers,
Andreas
Dirk Volkmar
2006-Dec-18 09:01 UTC
[dtrace-discuss] Re: Re: Using DTrace to monitor productions systems
The problem with this solution is that I have to catch the stops and starts of the processes and restart the script.

Normally the processes keep running and are not stopped during the day. But sometimes the processes are stopped and restarted via an admin front end.
Andreas.Haas at Sun.COM
2006-Dec-18 09:43 UTC
[dtrace-discuss] Re: Re: Using DTrace to monitor productions systems
On Mon, 18 Dec 2006, Dirk Volkmar wrote:
> The problem with this solution is, that I have to catch the
> stop and starts of the processes and restart the script.

I see. Catching 'stop' should be doable ... even though I honestly couldn't tell exactly how this would have to be done in a bullet-proof fashion.

> The normal case is that the processes are running and not stopped
> during the day. But sometimes the processes are stopped and restarted
> via an admin frontend.

If there were pid files for your daemon components or similar, polling until restart and relaunching the dtrace monitor wouldn't be that hard.

Andreas
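The polling idea suggested here could be sketched as follows. The function names, the DTRACE override and the 5-second default interval are assumptions for illustration, and the loop only works if ./monitor.d terminates by itself once the traced process is gone (otherwise dtrace would have to be stopped some other way).

```shell
# Block until pid file $1 names a live process, then print that pid.
# $2 is the polling interval in seconds (default 5).
wait_for_daemon() {
    while :; do
        pid=
        if [ -r "$1" ]; then
            read pid rest < "$1" || :
        fi
        case $pid in
            ''|*[!0-9]*) ;;                          # no usable pid yet
            *) if kill -0 "$pid" 2>/dev/null; then   # exists and may be signalled
                   echo "$pid"
                   return 0
               fi ;;
        esac
        sleep "${2:-5}"
    done
}

# Re-attach the monitor whenever the daemon (re)appears.
# Assumes ./monitor.d exits once the traced process is gone.
monitor_forever() {
    while :; do
        daemon_pid=`wait_for_daemon "$1" "$2"` || return 1
        "${DTRACE:-/usr/sbin/dtrace}" -s ./monitor.d "$daemon_pid"
        sleep 1    # avoid a tight loop if dtrace fails immediately
    done
}
```

A call such as `monitor_forever /var/run/mydaemon.pid 2` (path hypothetical) would keep one monitor attached across restarts triggered from an admin front end. Note that `kill -0` needs permission to signal the target, which is normally the case when the wrapper runs with the privileges dtrace requires anyway.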
Roch - PAE
2006-Dec-18 10:55 UTC
[dtrace-discuss] Re: Using DTrace to monitor productions systems
Dirk Volkmar writes:
 > 2) The startup of the processes is significantly slower than the
 > normal startup. Are there any options to speed up the dtrace startup?

I don't know if it still behaves like this but, in the past, I found that instrumenting lots and lots of pid probes would be much faster if the application was itself stopped during that instrumentation. I never got to the root of this issue, or even established whether it was a real or a perceived slowdown (in my case).

-r
Adam Leventhal
2006-Dec-18 18:38 UTC
[dtrace-discuss] Re: Re: Using DTrace to monitor productions systems
Another option might be to embed USDT probes in the application. This would allow you to trace these probes as the processes come and go. For example, if you had a USDT provider 'foo', the instance for pid 567 would be foo567, but you could trace all of them by specifying foo*::: as the probe description. This will match for new processes as well as existing ones.

Adam

On Mon, Dec 18, 2006 at 01:01:44AM -0800, Dirk Volkmar wrote:
> The problem with this solution is, that I have to catch the stop and starts of the processes and restart the script.
>
> The normal case is that the processes are running and not stopped during the day. But sometimes the processes are stopped and restarted via an admin frontend.
Richard L. Hamilton
2006-Dec-19 12:54 UTC
[dtrace-discuss] Re: Re: Re: Using DTrace to monitor productions
What about a predicate on the execname; is there a way to get from that to what you want to do?
Andreas.Haas at Sun.COM
2006-Dec-19 13:27 UTC
[dtrace-discuss] Re: Re: Using DTrace to monitor productions systems
On Mon, 18 Dec 2006, Andreas.Haas at sun.com wrote:
> On Mon, 18 Dec 2006, Dirk Volkmar wrote:
>
>> The problem with this solution is, that I have to catch the stop and starts
>> of the processes and restart the script.
>
> I see. Catching 'stop' should be doable ... even though I honestly couldn't
> tell how exactly this had to be done in a bullet-proof fashion.

Just realized how useful catching a daemon-finish event would be for our Grid Engine monitor.

From the app_crash article I can work out how to react to signal events that lead to a core dump. I'm curious how I'd catch a regular job finish. It appears to be as easy as using

   pid$1::exit:entry
   {
      exit(0);
   }

Are there any other circumstances besides signalling and exit(0) that would need to be handled?

Thanks!
Andreas
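One hedged answer to the question above: a pid-provider probe on exit() only fires when the process actually calls that libc function, so a process that calls _exit() directly or is killed by a signal never passes through it. The proc provider's exit probe fires for any termination and reports the reason in args[0]. The sketch below is an illustration, not something proposed in the thread; the function name and the DTRACE override are assumptions.

```shell
# Block until process $1 terminates for any reason.
# Sketch only: if the process is already gone when dtrace starts,
# the probe never fires and the call does not return.
wait_for_exit() {
    "${DTRACE:-/usr/sbin/dtrace}" -q -n '
        proc:::exit
        /pid == $1/
        {
            /* args[0] is a CLD_ code: CLD_EXITED, CLD_KILLED or CLD_DUMPED */
            printf("pid %d terminated, reason code %d\n", pid, args[0]);
            exit(0);
        }' "$1"
}
```

Because the predicate compares against the pid macro argument, the clause can also be placed in an existing monitor script next to its pid$1 probes so that the whole script stops when the daemon goes away.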
Adam Leventhal
2006-Dec-19 18:18 UTC
[dtrace-discuss] Re: Re: Re: Using DTrace to monitor productions
Yes, but not with pid provider probes.

Adam

On Tue, Dec 19, 2006 at 04:54:33AM -0800, Richard L. Hamilton wrote:
> What about a predicate on the execname; is there a way to get from that
> to what you want to do?
Nicolas Williams
2006-Dec-19 19:12 UTC
[dtrace-discuss] Re: Re: Using DTrace to monitor productions systems
On Tue, Dec 19, 2006 at 02:27:43PM +0100, Andreas.Haas at Sun.COM wrote:
> Just realized how useful catching daemon finish event would be for
> our Grid Engine monitor.

What's wrong with using process contracts instead?

Nico
Andreas.Haas at Sun.COM
2006-Dec-20 10:56 UTC
[dtrace-discuss] Re: Re: Using DTrace to monitor productions systems
On Tue, 19 Dec 2006, Nicolas Williams wrote:
> What's wrong with using process contracts instead?

Process contracts? I searched for 'contracts' in the "Solaris Dynamic Tracing Guide" PDF, but got no hits.

Is it a concept on top of DTrace?

Regards,
Andreas
Casper.Dik at Sun.COM
2006-Dec-20 11:52 UTC
[dtrace-discuss] Re: Re: Using DTrace to monitor productions systems
> Process contracts? I searched for 'contracts' in "Solaris Dynamic
> Tracing Guide" pdf, but got no hits.
>
> Is it a concept on top of Dtrace?

No, it is a separate concept used by SMF; it's how SMF can magically determine whether children and grandchildren of the processes it launched have terminated or continue to run.

Casper
James Carlson
2006-Dec-20 12:31 UTC
[dtrace-discuss] Re: Re: Using DTrace to monitor productions systems
Andreas.Haas at Sun.COM writes:
> > What's wrong with using process contracts instead?
>
> Process contracts? I searched for 'contracts' in "Solaris Dynamic
> Tracing Guide" pdf, but got no hits.
>
> Is it a concept on top of Dtrace?

No; it was introduced for SMF. See contract(4) and libcontract(3LIB).

--
James Carlson, KISS Network                    <james.d.carlson at sun.com>
Sun Microsystems / 1 Network Drive         71.232W   Vox +1 781 442 2084
MS UBUR02-212 / Burlington MA 01803-2757   42.496N   Fax +1 781 442 1677
Andreas.Haas at Sun.COM
2006-Dec-20 17:09 UTC
[dtrace-discuss] Re: Re: Using DTrace to monitor productions systems
On Wed, 20 Dec 2006, James Carlson wrote:
> Andreas.Haas at Sun.COM writes:
>> Is it a concept on top of Dtrace?
>
> No; it was introduced for SMF. See contract(4) and libcontract(3LIB).

Anyway, I would consider it inferior, if only because it requires a certain OS release on the build machine. The DTrace pid provider I can use on S10 instantly -- even with binaries built on Solaris 7.

I need no instrumentation, no additional library, nothing else :)
All I must honour is the no-inline function constraint.

Regards,
Andreas
Nicolas Williams
2006-Dec-20 17:20 UTC
[dtrace-discuss] Re: Re: Using DTrace to monitor productions systems
On Wed, Dec 20, 2006 at 06:09:43PM +0100, Andreas.Haas at Sun.COM wrote:
> Anyways. I would consider it inferior and be it only for it requires a
> certain OS release for the build machine. The Dtrace pid-provider
> I can use on S10 instantly -- even with binaries built on Solaris 7.

Huh?

No, you can use process contracts with binaries built on S7 too, as long as you're on S10 and up (same requirement as for DTrace). Applications need not be aware of process contracts to be in process contracts (every process is in a process contract, always).

Only your job launch/status management application need be process contract aware.

And process contracts are more robust than DTrace. DTrace can drop events; process contracts will not.

See ctrun(1), ctwatch(1) and contract(4).

Source code that sets up contracts and watches for events abounds in OpenSolaris.

> I need no instrumenation, I need no addt'l library, nothing else :)

No, you don't, whether you use DTrace for this or process contracts.

Nico
Andreas.Haas at Sun.COM
2006-Dec-20 17:25 UTC
[dtrace-discuss] Re: Re: Using DTrace to monitor productions systems
On Wed, 20 Dec 2006, Nicolas Williams wrote:
> Huh?

Sorry.

> No, you can use process contracts with binaries built on S7 too, as long
> as you're on S10 and up (same requirement as for DTrace).
>
> Only your job launch/status management application need be process
> contract aware.
>
>> I need no instrumenation, I need no addt'l library, nothing else :)
>
> No, you don't, whether you use DTrace for this or process contracts.

Didn't know, Nicolas.

Thanks,
Andreas
Nicolas Williams
2006-Dec-20 17:39 UTC
[dtrace-discuss] Re: Re: Using DTrace to monitor productions systems
On Wed, Dec 20, 2006 at 06:25:20PM +0100, Andreas.Haas at Sun.COM wrote:
> On Wed, 20 Dec 2006, Nicolas Williams wrote:
> > See ctrun(1), ctwatch(1) and contract(4).

In fact, if you're scripting job launch/monitoring then using ctrun(1) and ctwatch(1) may be best -- no C code to write. Yes, ctwatch(1)'s output is not Committed, but you can probably deal. BTW, IMO, that's a bug; file a CR if you need stable output from ctwatch(1).

The best part is you get to let the customer specify fatal events and restart semantics in a way that is consistent with ctrun and SMF.

Nico