Trying to debug a very ugly issue (multisecond disk delays), I found the
following "impossible" info:

"""
...
  0  40286   pwrite64:entry
  0  40287   pwrite64:return 36436
  0  39998   fdsync:entry 5379
  1  39999   fdsync:return 37918785 4
  1  40286   pwrite64:entry
  1  40287   pwrite64:return 69130
  1  39998   fdsync:entry 5394
  0  39999   fdsync:return 213396018 4
  0  40286   pwrite64:entry
  0  40287   pwrite64:return 22620
  0  39998   fdsync:entry 3993
  0  40286   pwrite64:entry
  0  40287   pwrite64:return 63125
  0  39998   fdsync:entry 5358
  0  39999   fdsync:return 35506247 4
  1  39999   fdsync:return 48140410 4
  1  40286   pwrite64:entry
  1  40287   pwrite64:return 52625
  1  39998   fdsync:entry 5068
  1  39999   fdsync:return 13777558 4
  1  39999   fdsync:return 7192502 4
...
"""

The number on each "pwrite64:return" line is the time spent in the
"pwrite64" syscall. The number on "fdsync:entry" is the delay between
"pwrite64" finishing and the "fdsync" syscall starting. The number on
"fdsync:return" is the time spent in fdsync. We can see this syscall is
very slow (comparatively), as expected.

The issue is in the last fragment: there are multiple "return" lines but
only a single call. How can this be possible? What am I missing?

The dtrace code:

"""
[root at tesalia correo]# cat pwrite_y_fdsync.d
self unsigned long long ts;

syscall::pwrite64:entry
/pid==$1/
{
        printf(" ");
        self->ts=timestamp;
}

syscall::pwrite64:return
/pid==$1 && self->ts/
{
        printf("%llu",timestamp-self->ts);
        self->ts=timestamp;
}

syscall::fdsync:entry
/pid==$1/
{
        printf("%llu",timestamp-self->ts);
        self->ts=timestamp;
}

syscall::fdsync:return
/pid==$1 && self->ts/
{
        printf("%llu %d",timestamp-self->ts,tid);
        self->ts=0;
}
"""

Solaris 10 Update 3 here.

Thanks in advance.

--
Jesus Cea Avion <jcea at argo.es> - http://www.argo.es/~jcea/
Hi Jesus,

Note that DTrace output is presented one CPU at a time rather than sorted
chronologically. Without seeing the full output from DTrace, one wouldn't
be able to draw any conclusions about a 'missing' entry. I suggest you
trace the timestamp in addition and then sort based on that column, or
pbind the process to a specific CPU when doing this sort of analysis.

Adam

On Tue, Jun 05, 2007 at 08:58:46PM +0200, Jesus Cea wrote:
> The issue is in the last fragment: there are multiple "return" lines but
> only a single call. How can this be possible? What am I missing?

--
Adam Leventhal, Solaris Kernel Development       http://blogs.sun.com/ahl
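[Editor's note: a minimal sketch of the timestamp-tracing approach Adam
describes above, adapted from Jesus's script. The column layout, the
script name, and the sort(1) postprocessing step are assumptions, not
part of the original thread.]

"""
self unsigned long long ts;

syscall::pwrite64:entry
/pid == $1/
{
        self->ts = timestamp;
}

syscall::pwrite64:return
/pid == $1 && self->ts/
{
        /* print the raw nanosecond timestamp as the first column so the
           per-CPU-interleaved output can be re-sorted chronologically */
        printf("%d pwrite64:return %llu\n", timestamp, timestamp - self->ts);
        self->ts = timestamp;
}

syscall::fdsync:entry
/pid == $1 && self->ts/
{
        printf("%d fdsync:entry %llu\n", timestamp, timestamp - self->ts);
        self->ts = timestamp;
}

syscall::fdsync:return
/pid == $1 && self->ts/
{
        printf("%d fdsync:return %llu %d\n",
            timestamp, timestamp - self->ts, tid);
        self->ts = 0;
}
"""

Running it with something like "dtrace -q -s pwrite_y_fdsync_ts.d PID |
sort -n" (the script name here is made up) then gives a chronological
view regardless of which CPU buffer each record came from.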
Adam Leventhal wrote:
> Note that DTrace output is presented one CPU at a time rather than sorted
> chronologically. Without seeing the full output from DTrace, one wouldn't
> be able to draw any conclusions about a 'missing' entry. I suggest you
> trace the timestamp in addition and then sort based on that column, or
> pbind the process to a specific CPU when doing this sort of analysis.

Oh, I see. That could explain it.

How can I sort the data by timestamp? Must I "printf" the timestamp and
postprocess the data manually?

"pbind" seems to work fine. Now I must investigate why some "fdsync" calls
are so painfully slow. Some take >5 seconds...

--
Jesus Cea Avion <jcea at argo.es> - http://www.argo.es/~jcea/
On Tue, Jun 05, 2007 at 09:27:38PM +0200, Jesus Cea wrote:
> Oh, I see. That could explain it. How can I sort the data by timestamp?
> Must I "printf" the timestamp and postprocess the data manually?

Unfortunately, yes. We've talked about implementing an option that would
sort by timestamp by default, but it would involve more behind-the-scenes
magic than we usually like to provide with DTrace.

> Now I must investigate why some "fdsync" calls are so painfully slow.
> Some take >5 seconds...

You might play with the quantize() aggregating action to see the
distribution of fdsync() latency. Your results will almost certainly
depend on the underlying filesystem.

Adam

--
Adam Leventhal, Solaris Kernel Development       http://blogs.sun.com/ahl
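[Editor's note: a minimal sketch of the quantize() idea, reusing the $1
pid argument from Jesus's original script; the aggregation name and key
string are illustrative assumptions.]

"""
syscall::fdsync:entry
/pid == $1/
{
        self->fts = timestamp;
}

syscall::fdsync:return
/pid == $1 && self->fts/
{
        /* power-of-two histogram of fdsync() latency in nanoseconds */
        @lat["fdsync latency (ns)"] = quantize(timestamp - self->fts);
        self->fts = 0;
}
"""

The histogram printed when the script exits makes multi-second outliers
stand out immediately, without postprocessing per-call traces.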
The Users Guide isn't explicit about what happens when ustack() is called
and there isn't any user address space or stack associated with the
current kernel thread. It only gives examples where there certainly is,
using the syscall provider.

I have used sched:::off-cpu to see user stacks at context-switch events,
and it seems to work just fine, but I'm wondering how to put in an
explicit predicate for user-associated kernel threads...?

thanks,
Pete Lawrence.
You're right that it's not explicitly stated, but if there's no associated
user context, ustack() should return an empty stack and ustackdepth should
have a value of 0.

Adam

On Wed, Jun 06, 2007 at 11:27:48AM -0700, Peter Lawrence wrote:
> I have used sched:::off-cpu to see user stacks at context-switch events,
> and it seems to work just fine, but I'm wondering how to put in an
> explicit predicate for user-associated kernel threads...?

--
Adam Leventhal, Solaris Kernel Development       http://blogs.sun.com/ahl
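[Editor's note: a minimal sketch of one such predicate. This is an
assumption, not something stated in the thread: kernel-only threads
belong to pid 0 (sched), so filtering on pid != 0 restricts the probe to
threads backed by a user process; checking ustackdepth afterwards is
another option, per Adam's note above.]

"""
sched:::off-cpu
/pid != 0/
{
        /* only threads with user context reach here; aggregate the
           user stacks seen at context-switch time */
        @offcpu[execname, ustack()] = count();
}
"""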
Adam Leventhal wrote:
> You're right that it's not explicitly stated, but if there's no associated
> user context, ustack() should return an empty stack and ustackdepth should
> have a value of 0.

Hmm, coincidentally, I just got this output from ustack(), and the stack
trace with only digits gives "addresses" that were not mapped (e.g.,
"13ba8/i" in "mdb -k"):

# dtrace -ae
CPU     ID                    FUNCTION:NAME
  0  43807          di_cache_write:entry cmd 0x16a48
              libc.so.1`ioctl+0x8
              libpicldevtree.so.1`libdevinfo_init+0x20
              libpicldevtree.so.1`picldevtree_init+0xd4
              libpicltree.so.1`construct_picltree+0x98
              libpicltree.so.1`xptree_initialize+0xc8
              picld`main+0x324
              picld`_start+0x108

  0  43808         di_cache_write:return rv 0x1

  0  43807          di_cache_write:entry cmd 0x16a48
              0xff1c1054
              0xff37737c
              0x150e0
              0x1549c
              0x13f00
              0x13ba8

  0  43808         di_cache_write:return rv 0x1

------

# mdb -k
> $m
            BASE            LIMIT             SIZE NAME
         1000000          18fa000           8fa000 ktextseg
         18fa000          1e00000           506000 kvalloc
        70000000         80000000         10000000 kvseg32
        edd00000         f0000000          2300000 kdebugseg
     2a100000000      2a13e680000         3e680000 kpseg
     2a750000000      2a753a7c000          3a7c000 kmapseg
     30000000000      70000000000      40000000000 kvseg
     70000000000      7000090e000           90e000 kmem64
8000000000000000 8000100000000000     100000000000 kpmseg
On Wed, Jun 06, 2007 at 02:59:42PM -0500, Andrew.Rutz at Sun.COM wrote:
> Hmm, coincidentally, I just got this output from ustack(), and the stack
> trace with only digits gives "addresses" that were not mapped (e.g.,
> "13ba8/i" in "mdb -k")

This will happen for ustack() results where the associated process has
gone away before the DTrace consumer has had an opportunity to perform
the address-to-symbol resolution. Recall that ustack() results are
recorded as addresses and then processed some time in the future, when
the consumer grabs the recorded state. If a process dies in that window,
the address-to-name resolution cannot be performed. (The addresses are
not kernel addresses and can't be resolved in 'mdb -k'.)

Adam

--
Adam Leventhal, Solaris Kernel Development       http://blogs.sun.com/ahl