Hi,

I am new to dtrace and I am not sure if I am using it correctly.

I have an application running under Solaris 10 x86 that is behaving strangely. The debug version seems to run well but the optimized version is crashing. Armed with the "pstack core" output, I started to investigate the last function that crashed.

Using the script below,

#!/usr/sbin/dtrace -qs
pid$1:a.out:myfunc:entry
{
        printf("Entering myfunc\n");
        printf("Arg 0 : %x\n",arg0);
        printf("Arg 1 : %d\n",arg1);
}

To run, I used "./script.d <pid>".

But, I am seeing something strange.
1. Without running the "script.d", my application processed about 200+ pages before crashing.
2. With the "script.d", my application processed about 100+ pages and then it crashed.
3. When I used the dapptrace script from the DTraceToolkit-0.82 (./dapptrace -Up <pid>), my application processed about 40+ pages and then it crashed.

I thought Dtrace is not supposed to affect any of the running processes. Not sure why there is a difference. Any ideas?

If anyone has pointers to why the debug version works and the opt version crashes, please let me know as well. That's the main problem that I was trying to solve.

thanks

yee koon
This message posted from opensolaris.org
Hi Yee Koon,

> I thought Dtrace is not supposed to affect any of the
> running processes. Not sure why is there a difference. Any ideas?

It's difficult to give a concrete answer without a bit more knowledge of how your process works, but it could be some sort of timing issue. Whilst DTrace is very unintrusive, it will slow your process down a bit, particularly if you call that function a great deal. I wouldn't normally expect you to notice this overhead, but it's possible that it's just enough to change the behaviour of a timing-related problem.

Of course, it might also just be a coincidence. You didn't mention whether the difference is consistent - it might just have been a one-off, particularly if the crash is normally a bit random.

> If anyone has pointers to why debug version works and
> opt version crashed, please let me know as well. That's the
> main problem that I was trying to solve.

My experience of this sort of thing is that it's generally some sort of stray, uninitialised pointer or variable. It would be worth looking into using libumem or libwatchmalloc to try and track this down.

It's also possible (although much less likely) that there's a problem with the compiler you're using. I know we saw a crash which was a compiler problem, which was limited to optimised code on AMD64 only, in an early access build of the Studio 10 compiler (but this was fixed for the release).

Regards,

--

Philip Beevers
Fidessa Infrastructure Development

mailto:philip.beevers at fidessa.com
phone: +44 1483 206571

> -----Original Message-----
> From: dtrace-discuss-bounces at opensolaris.org
> [mailto:dtrace-discuss-bounces at opensolaris.org]On Behalf Of
> Loh Yee Koon
> Sent: 02 August 2005 09:39
> To: dtrace-discuss at opensolaris.org
> Subject: [dtrace-discuss] dtrace affecting process?
Some day I will learn to press "reply all" on mailing list replies.

On 8/2/05, James Dickens <jamesd.wi at gmail.com> wrote:
> On 8/2/05, Loh Yee Koon <yeekoon.loh at fujixerox.com> wrote:
> > Using the script below,
> > #!/usr/sbin/dtrace -qs
> > pid$1:a.out:myfunc:entry
> > {
> > printf("Entering myfunc\n");
> > printf("Arg 0 : %x\n",arg0);
> > printf("Arg 1 : %d\n",arg1);
> > }
> > To run, I used "./script.d <pid>".
>
> Not sure it will help with your issue, but your probe can be recoded.
> Every action that a probe executes adds overhead, so doing it all in
> one printf() is faster:
>
> printf("Entering myfunc\n Arg 0: %x\n Arg 1: %d\n", arg0, arg1);
>
> Also, if you know the first Y calls to the function work perfectly, you
> can use the following to speed up its execution (replace Y with that
> number):
>
> /* D globals start at 0; declarations cannot take an initialiser */
> int i;
>
> /* count every call */
> pid$1:a.out:myfunc:entry
> {
>         i++;
> }
>
> /* print only once more than Y calls have been seen */
> pid$1:a.out:myfunc:entry
> /i > Y/
> {
>         printf("Entering myfunc\n called %d times, Arg 0: %x\n Arg 1: %d\n", i, arg0, arg1);
> }
>
> James Dickens
> uadmin.blogspot.com <http://uadmin.blogspot.com>
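
Along the same lines, if the goal is only to learn how often the function is called (and so how much probe overhead to expect), an aggregation avoids printing on every call. What follows is a minimal sketch and not something from the thread: the probe description `pid$1:a.out:myfunc:entry` is taken from the original script, while the 10-second reporting interval and the output format are arbitrary assumptions.

```d
#!/usr/sbin/dtrace -qs

/* Count calls in an aggregation instead of printing on every entry. */
pid$1:a.out:myfunc:entry
{
        @calls[probefunc] = count();
}

/* Report and reset the count every 10 seconds (interval chosen arbitrarily). */
profile:::tick-10sec
{
        printa("%-20s %@d calls\n", @calls);
        trunc(@calls);
}
```

It is run the same way as the original script ("./script.d <pid>"). Each probe firing then costs one aggregation update rather than one or more printf() actions, which should perturb timing less, although it still cannot be assumed to have no effect at all.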
G'Day Folks,

On Tue, 2 Aug 2005, Philip Beevers wrote:
> Hi Yee Koon,
>
> > I thought Dtrace is not supposed to affect any of the
> > running processes. Not sure why is there a difference. Any ideas?
>
> It's difficult to give a concrete answer without a bit more knowledge of how
> your process works, but it could be some sort of timing issue. Whilst DTrace
> is very unintrusive, it will slow your process down a bit, particularly if
> you call that function a great deal. I wouldn't normally expect you to
> notice this overhead, but it's possible that it's just enough to change the
> behaviour of a timing-related problem.

Yep - sounds like timing.

I wrote a little document on the timing effects in Docs/Notes/dtruss_notes.txt in the DTraceToolkit - the effect becomes greater on slower machines. For example, here is an UltraSPARC 5 running dapptrace,

# dapptrace -eoUF date
ELAPSD    CPU CALL(args)          = return
     .      . -> ld.so.1:_rt_boot(0x0, 0x0, 0x0)
     .      .   -> ld.so.1:_setup(0xFFBFF9B0, 0x3D100, 0x0)
     .      .     -> ld.so.1:setup(0xFFBFFA3C, 0xFFBFFA88, 0x300)
     .      .       -> ld.so.1:fmap_setup(0x0, 0x0, 0xFF3EC224)
    96     19       <- ld.so.1:fmap_setup = 180
     .      .       -> ld.so.1:addfree(0xFF3EE9E8, 0x1618, 0x0)
   272    200       <- ld.so.1:addfree = 76
     .      .       -> ld.so.1:alist_append(0xFF3EC120, 0x0, 0x10)
     .      .         -> ld.so.1:malloc(0x8C, 0x0, 0x0)
     .      .           -> ld.so.1:align(0x8C, 0x4, 0x0)
    60      7           <- ld.so.1:align = 100
     .      .           -> ld.so.1:split(0xFF3EE9F0, 0x8C, 0x3)
    59      7           <- ld.so.1:split = 72
   338     36         <- ld.so.1:malloc = 268
     .      .         -> ld.so.1:memset(0xFF3EEA0C, 0x0, 0x10)
    66      8         <- ld.so.1:memset = 100
   658     71       <- ld.so.1:alist_append = 220
[...]

Elapsed time and on-CPU time (the first 2 columns) are usually quite different when you have blocking system calls like read(). Here we have a total 302 us discrepancy for malloc() (338 us elapsed minus 36 us on-CPU) - this is due to DTrace slowing things down (slightly). dapptrace really examines a lot - and we must pay a small price for that.

The above demonstration is extreme as it's a 360 MHz UltraSPARC 5. It's 100 us on a 1050 MHz UltraSPARC IV CPU, and 60 us on a 2791 MHz Xeon CPU.

Perhaps a way to rule out DTrace as the cause is to run other CPU-hungry programs and see if they also affect your program. Something to really kick it around could be to run a "prstat -R 0" at the same time (assuming single CPU).

> > If anyone has pointers to why debug version works and
> > opt version crashed, please let me know as well. That's the
> > main problem that I was trying to solve.

Hmm - perhaps a pointer problem. The debug version may be populating more memory which is being referenced by accident. Or the compiler has a funny idea of what optimising means for that architecture (as Philip said).

...

Sounds like when I was an engineering student and wanted to submit my final year project with a CRO still attached, which I had been using to debug timing problems (shudder). Funnily enough, every time I attached the CRO the problem disappeared (grounding). :)

Brendan

[Sydney, Australia]
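
The elapsed versus on-CPU comparison can also be made for a single function, which is much lighter than tracing every call as dapptrace does. The following is a rough sketch under stated assumptions, not part of the DTraceToolkit: it reuses the `a.out` and `myfunc` names from the original script, relies on the built-in `timestamp` (wall-clock nanoseconds) and `vtimestamp` (on-CPU nanoseconds for the current thread) variables, and assumes `myfunc` is not recursive, since a nested call would overwrite the saved start times.

```d
#!/usr/sbin/dtrace -qs

/* Remember when this thread entered myfunc, in wall-clock and on-CPU time. */
pid$1:a.out:myfunc:entry
{
        self->start = timestamp;
        self->vstart = vtimestamp;
}

/* On return, record both deltas; skip returns whose entry was not seen. */
pid$1:a.out:myfunc:return
/self->start/
{
        @elapsed["avg elapsed (ns)"] = avg(timestamp - self->start);
        @oncpu["avg on-CPU (ns)"] = avg(vtimestamp - self->vstart);
        self->start = 0;
        self->vstart = 0;
}
```

Run it as "./script.d <pid>" and stop it with Ctrl-C; the two averages are printed on exit. A large gap between them for a function that does not block would point at time spent off-CPU or in tracing overhead, in the same way as the malloc() figures above.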