Trying to debug a very ugly issue (multisecond disk delays), I found the
following "impossible" info:

"""
...
  0  40286   pwrite64:entry
  0  40287   pwrite64:return 36436
  0  39998   fdsync:entry 5379
  1  39999   fdsync:return 37918785 4
  1  40286   pwrite64:entry
  1  40287   pwrite64:return 69130
  1  39998   fdsync:entry 5394
  0  39999   fdsync:return 213396018 4
  0  40286   pwrite64:entry
  0  40287   pwrite64:return 22620
  0  39998   fdsync:entry 3993
  0  40286   pwrite64:entry
  0  40287   pwrite64:return 63125
  0  39998   fdsync:entry 5358
  0  39999   fdsync:return 35506247 4
  1  39999   fdsync:return 48140410 4
  1  40286   pwrite64:entry
  1  40287   pwrite64:return 52625
  1  39998   fdsync:entry 5068
  1  39999   fdsync:return 13777558 4
  1  39999   fdsync:return 7192502 4
...
"""

The number on each "pwrite64:return" line is the time spent in the
"pwrite64" syscall. The number on "fdsync:entry" is the delay between
"pwrite64" finishing and the "fdsync" syscall starting. The number on
"fdsync:return" is the time spent in fdsync. We can see this syscall is
very slow (comparatively), as expected.

The issue is in the last fragment: there are multiple "return" lines but
only a single call. How can this be possible? What am I missing?

The dtrace code:

"""
[root at tesalia correo]# cat pwrite_y_fdsync.d
self unsigned long long ts;

syscall::pwrite64:entry
/pid==$1/
{
        printf(" ");
        self->ts=timestamp;
}

syscall::pwrite64:return
/pid==$1 && self->ts/
{
        printf("%llu",timestamp-self->ts);
        self->ts=timestamp;
}

syscall::fdsync:entry
/pid==$1/
{
        printf("%llu",timestamp-self->ts);
        self->ts=timestamp;
}

syscall::fdsync:return
/pid==$1 && self->ts/
{
        printf("%llu %d",timestamp-self->ts,tid);
        self->ts=0;
}
"""

Solaris 10 Update 3 here.

Thanks in advance.

--
Jesus Cea Avion <jcea at argo.es> - http://www.argo.es/~jcea/
Hi Jesus,

Note that DTrace output is presented one CPU at a time rather than sorted
chronologically. Without seeing the full output from DTrace, one wouldn't
be able to draw any conclusions about a 'missing' entry. I suggest you
trace the timestamp in addition and then sort based on that column, or
pbind the process to a specific CPU when doing this sort of analysis.

Adam

On Tue, Jun 05, 2007 at 08:58:46PM +0200, Jesus Cea wrote:
> The issue is in the last fragment: there are multiple "return" lines but
> only a single call. How can this be possible? What am I missing?

--
Adam Leventhal, Solaris Kernel Development       http://blogs.sun.com/ahl
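[Editor's note: a minimal sketch of the timestamp-tracing approach Adam
describes above, adapted from Jesus's script. The column layout, the
script name, and the sort(1) postprocessing step are assumptions, not
part of the original thread.]

"""
self unsigned long long ts;

syscall::pwrite64:entry
/pid == $1/
{
        self->ts = timestamp;
}

syscall::pwrite64:return
/pid == $1 && self->ts/
{
        /* print the raw nanosecond timestamp as the first column so the
           per-CPU-interleaved output can be re-sorted chronologically */
        printf("%d pwrite64:return %llu\n", timestamp, timestamp - self->ts);
        self->ts = timestamp;
}

syscall::fdsync:entry
/pid == $1 && self->ts/
{
        printf("%d fdsync:entry %llu\n", timestamp, timestamp - self->ts);
        self->ts = timestamp;
}

syscall::fdsync:return
/pid == $1 && self->ts/
{
        printf("%d fdsync:return %llu %d\n",
            timestamp, timestamp - self->ts, tid);
        self->ts = 0;
}
"""

Running it with something like "dtrace -q -s pwrite_y_fdsync_ts.d PID |
sort -n" (the script name here is made up) then gives a chronological
view regardless of which CPU buffer each record came from.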
Adam Leventhal wrote:
> Note that DTrace output is presented one CPU at a time rather than sorted
> chronologically. Without seeing the full output from DTrace, one wouldn't
> be able to draw any conclusions about a 'missing' entry. I suggest you
> trace the timestamp in addition and then sort based on that column, or
> pbind the process to a specific CPU when doing this sort of analysis.

Oh, I see. That could explain it.

How can I sort the data by timestamp? Must I "printf" the timestamp and
postprocess the data manually?

"pbind" seems to work fine. Now I must investigate why some "fdsync" calls
are so painfully slow. Some take >5 seconds...

--
Jesus Cea Avion <jcea at argo.es> - http://www.argo.es/~jcea/
On Tue, Jun 05, 2007 at 09:27:38PM +0200, Jesus Cea wrote:
> Oh, I see. That could explain it. How can I sort the data by timestamp?
> Must I "printf" the timestamp and postprocess the data manually?

Unfortunately, yes. We've talked about implementing an option that would
sort by timestamp by default, but it would involve more behind-the-scenes
magic than we usually like to provide with DTrace.

> Now I must investigate why some "fdsync" calls are so painfully slow.
> Some take >5 seconds...

You might play with the quantize() aggregating action to see the
distribution of fdsync() latency. Your results will almost certainly
depend on the underlying filesystem.

Adam

--
Adam Leventhal, Solaris Kernel Development       http://blogs.sun.com/ahl
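[Editor's note: a minimal sketch of the quantize() idea, reusing the $1
pid argument from Jesus's original script; the aggregation name and key
string are illustrative assumptions.]

"""
syscall::fdsync:entry
/pid == $1/
{
        self->fts = timestamp;
}

syscall::fdsync:return
/pid == $1 && self->fts/
{
        /* power-of-two histogram of fdsync() latency in nanoseconds */
        @lat["fdsync latency (ns)"] = quantize(timestamp - self->fts);
        self->fts = 0;
}
"""

The histogram printed when the script exits makes multi-second outliers
stand out immediately, without postprocessing per-call traces.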
The Users Guide isn't explicit about what happens when ustack() is called
and there isn't any user address space or stack associated with the
current kernel thread. It only gives examples where there certainly is,
using the syscall provider.

I have used sched:::off-cpu to see user stacks at context-switch events,
and it seems to work just fine, but I'm wondering how to put in an
explicit predicate for user-associated kernel threads...?

thanks,
Pete Lawrence.
You're right that it's not explicitly stated, but if there's no associated
user context, ustack() should return an empty stack and ustackdepth should
have a value of 0.

Adam

On Wed, Jun 06, 2007 at 11:27:48AM -0700, Peter Lawrence wrote:
> I have used sched:::off-cpu to see user stacks at context-switch events,
> and it seems to work just fine, but I'm wondering how to put in an
> explicit predicate for user-associated kernel threads...?

--
Adam Leventhal, Solaris Kernel Development       http://blogs.sun.com/ahl
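[Editor's note: a minimal sketch of one such predicate. This is an
assumption, not something stated in the thread: kernel-only threads
belong to pid 0 (sched), so filtering on pid != 0 restricts the probe to
threads backed by a user process; checking ustackdepth afterwards is
another option, per Adam's note above.]

"""
sched:::off-cpu
/pid != 0/
{
        /* only threads with user context reach here; aggregate the
           user stacks seen at context-switch time */
        @offcpu[execname, ustack()] = count();
}
"""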
Adam Leventhal wrote:
> You're right that it's not explicitly stated, but if there's no associated
> user context, ustack() should return an empty stack and ustackdepth should
> have a value of 0.

Hmm, coincidentally, I just got this output from ustack(), and the stack
trace with only digits gives "addresses" that were not mapped (e.g.,
"13ba8/i" in "mdb -k"):

# dtrace -ae
CPU     ID                    FUNCTION:NAME
  0  43807          di_cache_write:entry cmd 0x16a48
              libc.so.1`ioctl+0x8
              libpicldevtree.so.1`libdevinfo_init+0x20
              libpicldevtree.so.1`picldevtree_init+0xd4
              libpicltree.so.1`construct_picltree+0x98
              libpicltree.so.1`xptree_initialize+0xc8
              picld`main+0x324
              picld`_start+0x108

  0  43808         di_cache_write:return rv 0x1

  0  43807          di_cache_write:entry cmd 0x16a48
              0xff1c1054
              0xff37737c
              0x150e0
              0x1549c
              0x13f00
              0x13ba8

  0  43808         di_cache_write:return rv 0x1

------

# mdb -k
> $m
            BASE            LIMIT             SIZE NAME
         1000000          18fa000           8fa000 ktextseg
         18fa000          1e00000           506000 kvalloc
        70000000         80000000         10000000 kvseg32
        edd00000         f0000000          2300000 kdebugseg
     2a100000000      2a13e680000         3e680000 kpseg
     2a750000000      2a753a7c000          3a7c000 kmapseg
     30000000000      70000000000      40000000000 kvseg
     70000000000      7000090e000           90e000 kmem64
8000000000000000 8000100000000000     100000000000 kpmseg
On Wed, Jun 06, 2007 at 02:59:42PM -0500, Andrew.Rutz at Sun.COM wrote:
> Hmm, coincidentally, I just got this output from ustack(), and the stack
> trace with only digits gives "addresses" that were not mapped (e.g.,
> "13ba8/i" in "mdb -k")

This will happen for ustack() results where the associated process has
gone away before the DTrace consumer has had an opportunity to perform
the address-to-symbol resolution. Recall that ustack() results are
recorded as addresses and then processed some time in the future, when
the consumer grabs the recorded state. If a process dies in that window,
the address-to-name resolution cannot be performed. (The addresses are
not kernel addresses and can't be resolved in 'mdb -k'.)

Adam

--
Adam Leventhal, Solaris Kernel Development       http://blogs.sun.com/ahl