Ignoring the long story behind why I''m doing this (and it may be the
wrong track to take), I''ve been trying to use my meager ranks in
Knowledge(dtrace) to figure out why a test we''re doing regarding an OAS
HTTP/JVM problem is taking significantly longer on a T2000 than on a dual
dual-core Dell Linux box.
One strange thing I noticed when looking at average system call times for the
java and httpd processes is the accept() times:
syscall::accept:entry
/execname=="java"/
{
self->ts = timestamp;
}
syscall::accept:return
/self->ts/
{
@sc[pid] = avg(timestamp-self->ts);
self->ts=0;
}
This code yields the following PID and average accept() times:
9490 5032706459
10392 19998243844
10400 19998255673
10398 19998289978
10393 19998312751
10395 19998312895
10396 19998842144
So accept() is taking an average of 20 seconds to complete?!
Then I tried tracing using fbt and the numbers don''t add up. What am I
overlooking? Where is the missing time or where is my code and/or understanding
wrong? (The following example, accept() only takes 9 seconds, but it is just
catching the first one.)
BEGIN
{
total = 0;
}
syscall::accept:entry
/execname == "java" && tracing++/
{
self->start_time = timestamp;
self->trace = 1;
}
fbt:::entry
/self->trace && ! self->fbtrace/
{
self->ts = timestamp;
self->fbtrace = 1;
}
fbt:::return
/self->trace && self->fbtrace/
{
delta = timestamp - self->ts;
total += delta;
trace(delta);
self->ts = 0;
self->fbtrace = 0;
}
syscall::accept:return
/self->trace == 1/
{
printf("total time in accept: %d", timestamp -
self->start_time);
printf("total time in fbt: %d", total);
self->trace = 0;
exit(0);
}
CPU FUNCTION
14 <- set_active_fd 12380
14 <- useracc 6352
14 <- avl_numnodes 8796
14 <- avl_numnodes 8316
14 <- as_segcompar 9648
14 <- as_segcompar 5768
14 <- as_segcompar 5588
14 <- as_segcompar 5420
14 <- as_segcompar 5516
14 <- as_segcompar 5652
14 <- as_segcompar 5496
14 <- as_segcompar 5456
14 <- as_segcompar 5632
14 <- segvn_checkprot 6856
14 <- avl_numnodes 8548
14 <- ufalloc 5688
14 <- fd_find 9856
14 <- fd_reserve 5632
14 <- cv_wait_sig_swap 9696
14 <- thread_lock 9284
14 <- ts_sleep 9088
14 <- exp_x 11056
14 <- disp_lock_enter_high 5724
14 <- disp_lock_exit_high 5484
14 <- sleepq_insert 6544
14 <- disp_lock_exit_nopreempt 5544
14 <- sigcheck 5864
14 <- disp_lock_enter 8860
14 <- disp_lock_exit 5720
14 <- disp_getwork 13684
14 <- restore_mstate 5860
14 <- schedctl_save 9724
25 <- set_active_fd 14240
25 <- useracc 6200
25 <- avl_numnodes 8816
25 <- avl_numnodes 8352
25 <- as_segcompar 9312
25 <- as_segcompar 5688
25 <- as_segcompar 5488
25 <- as_segcompar 5364
25 <- as_segcompar 5464
25 <- segvn_checkprot 6420
25 <- avl_numnodes 8080
25 <- ufalloc 5684
25 <- fd_find 10060
25 <- fd_reserve 5644
25 <- cv_wait_sig_swap 9172
25 <- thread_lock 9140
25 <- ts_sleep 9076
25 <- exp_x 10980
25 <- disp_lock_enter_high 5708
25 <- disp_lock_exit_high 5596
25 <- sleepq_insert 5788
25 <- disp_lock_exit_nopreempt 5492
25 <- sigcheck 5972
25 <- disp_lock_enter 8808
25 <- disp_lock_exit 5476
25 <- disp_getwork 13628
25 <- restore_mstate 5724
25 <- schedctl_save 10008
22 <- set_active_fd 11616
22 <- useracc 6340
22 <- avl_numnodes 9616
22 <- avl_numnodes 9332
22 <- as_segcompar 10012
22 <- as_segcompar 6180
22 <- as_segcompar 6388
22 <- as_segcompar 6132
22 <- as_segcompar 5956
22 <- as_segcompar 6316
22 <- as_segcompar 6396
22 <- segvn_checkprot 6648
22 <- avl_numnodes 9360
22 <- ufalloc 6320
22 <- fd_find 11464
22 <- fd_reserve 6596
22 <- cv_wait_sig_swap 9968
22 <- thread_lock 10080
22 <- ts_sleep 10140
22 <- exp_x 11460
22 <- disp_lock_enter_high 6392
22 <- disp_lock_exit_high 6348
22 <- sleepq_insert 7280
22 <- disp_lock_exit_nopreempt 6036
22 <- sigcheck 6584
22 <- disp_lock_enter 9860
22 <- disp_lock_exit 6348
22 <- disp_getwork 8300
22 <- restore_mstate 5492
22 <- schedctl_save 9256
23 <- set_active_fd 13704
23 <- useracc 7052
23 <- avl_numnodes 10264
23 <- avl_numnodes 9968
23 <- as_segcompar 11836
23 <- as_segcompar 6716
23 <- as_segcompar 6404
23 <- as_segcompar 6404
23 <- as_segcompar 6324
23 <- as_segcompar 6476
23 <- as_segcompar 6344
23 <- segvn_checkprot 6944
23 <- avl_numnodes 9500
23 <- ufalloc 6356
23 <- fd_find 11888
23 <- fd_reserve 6408
23 <- cv_wait_sig_swap 10884
23 <- thread_lock 10528
23 <- ts_sleep 10352
23 <- exp_x 11832
23 <- disp_lock_enter_high 6084
23 <- disp_lock_exit_high 6424
23 <- sleepq_insert 6812
23 <- disp_lock_exit_nopreempt 6276
23 <- sigcheck 6784
23 <- disp_lock_enter 10616
23 <- disp_lock_exit 6232
23 <- disp_getwork 14796
23 <- restore_mstate 6448
23 <- schedctl_save 11052
23 <- set_active_fd 11116
23 <- useracc 5840
23 <- avl_numnodes 8640
23 <- avl_numnodes 8192
23 <- as_segcompar 9248
23 <- as_segcompar 5628
23 <- as_segcompar 5524
23 <- as_segcompar 5456
23 <- as_segcompar 5344
23 <- as_segcompar 5396
23 <- as_segcompar 5540
23 <- as_segcompar 5548
23 <- segvn_checkprot 6176
23 <- avl_numnodes 8172
23 <- ufalloc 5732
23 <- fd_find 10420
23 <- fd_reserve 5504
23 <- cv_wait_sig_swap 8820
23 <- thread_lock 9312
23 <- ts_sleep 9000
23 <- exp_x 10788
23 <- disp_lock_enter_high 5724
23 <- disp_lock_exit_high 5628
23 <- sleepq_insert 5956
23 <- disp_lock_exit_nopreempt 5456
23 <- sigcheck 5876
23 <- disp_lock_enter 8880
23 <- disp_lock_exit 5628
23 <- disp_getwork 13112
23 <- restore_mstate 5500
23 <- schedctl_save 9328
29 <- set_active_fd 11988
29 <- useracc 5880
29 <- avl_numnodes 8604
29 <- avl_numnodes 8116
29 <- as_segcompar 9360
29 <- as_segcompar 5724
29 <- as_segcompar 5476
29 <- as_segcompar 5416
29 <- as_segcompar 5500
29 <- as_segcompar 5668
29 <- segvn_checkprot 6112
29 <- avl_numnodes 8256
29 <- ufalloc 5752
29 <- fd_find 10624
29 <- fd_reserve 5568
29 <- cv_wait_sig_swap 8928
29 <- thread_lock 9308
29 <- ts_sleep 9036
29 <- exp_x 11092
29 <- disp_lock_enter_high 5656
29 <- disp_lock_exit_high 5600
29 <- sleepq_insert 6464
29 <- disp_lock_exit_nopreempt 5604
29 <- sigcheck 5856
29 <- disp_lock_enter 9128
29 <- disp_lock_exit 5728
29 <- disp_getwork 13784
29 <- restore_mstate 5644
29 <- schedctl_save 9916
15 <- set_active_fd 13660
15 <- useracc 6912
15 <- avl_numnodes 9976
15 <- avl_numnodes 9256
15 <- as_segcompar 10940
15 <- as_segcompar 6752
15 <- as_segcompar 6400
15 <- as_segcompar 6320
15 <- as_segcompar 6120
15 <- as_segcompar 6436
15 <- as_segcompar 6544
15 <- as_segcompar 6124
15 <- segvn_checkprot 7048
15 <- avl_numnodes 10272
15 <- ufalloc 6972
15 <- fd_find 11868
15 <- fd_reserve 6568
15 <- cv_wait_sig_swap 11380
15 <- thread_lock 10856
15 <- ts_sleep 10400
15 <- exp_x 12360
15 <- disp_lock_enter_high 6424
15 <- disp_lock_exit_high 6420
15 <- sleepq_insert 7392
15 <- disp_lock_exit_nopreempt 6452
15 <- sigcheck 6808
15 <- disp_lock_enter 10336
15 <- disp_lock_exit 6412
15 <- disp_getwork 15236
15 <- restore_mstate 6712
15 <- schedctl_save 11056
14 <- schedctl_restore 13144
14 <- sigcheck 5884
14 <- sogetoff 6156
14 <- bcopy 5724
14 <- kmem_cache_alloc 10104
14 <- getmajor 5544
14 <- vn_reinit 6008
14 <- vn_recycle 5404
14 <- vn_exists 5748
14 <- getzoneid 9004
14 <- so_lock_single 5692
14 <- kmem_cache_alloc 10160
14 <- kmem_cache_alloc 9000
14 <- setq 6568
14 <- set_qend 5744
14 <- kmem_cache_alloc 8780
14 <- setq 6160
14 <- entersq 5964
14 <- getzoneid 8304
14 <- vmem_alloc 9316
14 <- kmem_cache_alloc 6344
14 <- WR 5440
14 <- WR 5336
14 <- WR 8024
14 <- strlock 9132
14 <- set_nfsrv_ptr 5844
14 <- set_nbsrv_ptr 5924
14 <- set_qend 5816
14 <- set_qend 5648
14 <- strsetuio 5852
14 <- strunlock 5664
14 <- claimstr 9384
14 <- cv_broadcast 8808
14 <- claimstr 8128
14 <- cv_broadcast 8132
14 <- leavesq 5788
14 <- cv_broadcast 5456
14 <- so_unlock_single 5580
14 <- WR 8288
14 <- SAMESTR 8300
14 <- SAMESTR 5376
14 <- bcmp 5640
14 <- kmem_cache_alloc 10544
14 <- strsetrputhooks 8832
14 <- strsetwputhooks 5684
14 <- kmem_cache_alloc 8964
14 <- bcopy 6064
14 <- so_automatic_bind 5940
14 <- bcopy 5668
14 <- kmem_cache_alloc 11600
14 <- bcopy 5956
14 <- canputnext 8732
14 <- strmakedata 5840
14 <- crhold 8632
14 <- stream_willservice 5680
14 <- RD 10688
14 <- kmem_cache_alloc 8924
14 <- bcopy 5704
14 <- crhold 5776
14 <- cv_broadcast 11520
14 <- tcp_eager_unlink 6616
14 <- servicing_interrupt 8808
14 <- cv_broadcast 8936
14 <- tcp_maxpsz_set 5660
14 <- tcp_fuse_set_rcv_hiwat 5808
14 <- tcp_fuse_maxpsz_set 9756
14 <- claimstr 9192
14 <- strqset 6400
14 <- cv_broadcast 8684
14 <- kmem_cache_alloc 8984
14 <- strsock_misc 9956
14 <- dblk_lastfree 10760
14 <- kmem_cache_free 6052
14 <- qbackenable 5844
14 <- strsock_misc 9364
14 <- dblk_lastfree 9912
14 <- kmem_cache_free 5832
14 <- qbackenable 5576
14 <- qenable_locked 15604
14 <- tcp_fuse_syncstr_enable 10396
14 <- tcp_fuse_syncstr_enable 5932
14 <- cv_broadcast 5848
14 <- stream_runservice 6320
14 <- dblk_lastfree 8652
14 <- kmem_cache_free 5804
14 <- sowaitack 8880
14 <- bcopy 19364
14 <- crfree 11904
14 <- kmem_cache_free 6284
14 <- clear_active_fd 8456
14 <- cv_broadcast 5528
14 <- copyout_name 7592
14 <- kmem_cache_alloc 11640
14 <- crhold 5272
14 <- cv_broadcast 8228
14 <= accept total time in accept:
9336163080
total time in fbt: 2393624
Thanks,
Justin
Justin C. Lloyd
Senior Engineer and System Administrator
303-684-4166 Office
720-480-0380 Cell
303-684-4100 Fax
jlloyd at digitalglobe.com
DigitalGlobe ?, An Imaging and Information Company
-------------- next part --------------
An HTML attachment was scrubbed...
URL:
<http://mail.opensolaris.org/pipermail/dtrace-discuss/attachments/20070425/307a2d51/attachment.html>
Justin, I don''t think your fbt entry and return probes are pairing off correctly. Let''s suppose that your fbt call stack is a->b->c. When you call "a", since self->fbtrace is 0, the clause is executed. This sets self->fbtrace equal to 1. The next events are the calls to "b" and "c", which are ignored because self->fbtrace is now 1. The next event is the return from "c", which, since self->fbtrace is true, executes its clause and calculates the delta. But this delta is the execution time for the parts of "a" and "b" that happen before "c", and all of "c". The next events are the returns from "b" and "a" which are ignored because self->fbtrace is now 0. At this point, it appears to me that you have only captured part of the time for the kernel execution. Of course my analysis of the flow could be wrong. :-) Chip Justin Lloyd wrote:> Ignoring the long story behind why I''m doing this (and it may be the wrong track to take), I''ve been trying to use my meager ranks in Knowledge(dtrace) to figure out why a test we''re doing regarding an OAS HTTP/JVM problem is taking significantly longer on a T2000 than on a dual dual-core Dell Linux box. > > One strange thing I noticed when looking at average system call times for the java and httpd processes is the accept() times: > > syscall::accept:entry > /execname=="java"/ > { > self->ts = timestamp; > } > > syscall::accept:return > /self->ts/ > { > @sc[pid] = avg(timestamp-self->ts); > self->ts=0; > } > > This code yields the following PID and average accept() times: > > 9490 5032706459 > 10392 19998243844 > 10400 19998255673 > 10398 19998289978 > 10393 19998312751 > 10395 19998312895 > 10396 19998842144 > > So accept() is taking an average of 20 seconds to complete?! > > Then I tried tracing using fbt and the numbers don''t add up. What am I overlooking? Where is the missing time or where is my code and/or understanding wrong? (The following example, accept() only takes 9 seconds, but it is just catching the first one.) > > BEGIN > { > total = 0; > } > > syscall::accept:entry > /execname == "java" && tracing++/ > { > self->start_time = timestamp; > self->trace = 1; > } > > fbt:::entry > /self->trace && ! self->fbtrace/ > { > self->ts = timestamp; > self->fbtrace = 1; > } > > fbt:::return > /self->trace && self->fbtrace/ > { > delta = timestamp - self->ts; > total += delta; > trace(delta); > self->ts = 0; > self->fbtrace = 0; > } > > syscall::accept:return > /self->trace == 1/ > { > printf("total time in accept: %d", timestamp - self->start_time); > printf("total time in fbt: %d", total); > self->trace = 0; > exit(0); > } > > CPU FUNCTION > 14 <- set_active_fd 12380 > 14 <- useracc 6352 > 14 <- avl_numnodes 8796 > 14 <- avl_numnodes 8316 > 14 <- as_segcompar 9648 > 14 <- as_segcompar 5768 > 14 <- as_segcompar 5588 > 14 <- as_segcompar 5420 > 14 <- as_segcompar 5516 > 14 <- as_segcompar 5652 > 14 <- as_segcompar 5496 > 14 <- as_segcompar 5456 > 14 <- as_segcompar 5632 > 14 <- segvn_checkprot 6856 > 14 <- avl_numnodes 8548 > 14 <- ufalloc 5688 > 14 <- fd_find 9856 > 14 <- fd_reserve 5632 > 14 <- cv_wait_sig_swap 9696 > 14 <- thread_lock 9284 > 14 <- ts_sleep 9088 > 14 <- exp_x 11056 > 14 <- disp_lock_enter_high 5724 > 14 <- disp_lock_exit_high 5484 > 14 <- sleepq_insert 6544 > 14 <- disp_lock_exit_nopreempt 5544 > 14 <- sigcheck 5864 > 14 <- disp_lock_enter 8860 > 14 <- disp_lock_exit 5720 > 14 <- disp_getwork 13684 > 14 <- restore_mstate 5860 > 14 <- schedctl_save 9724 > 25 <- set_active_fd 14240 > 25 <- useracc 6200 > 25 <- avl_numnodes 8816 > 25 <- avl_numnodes 8352 > 25 <- as_segcompar 9312 > 25 <- as_segcompar 5688 > 25 <- as_segcompar 5488 > 25 <- as_segcompar 5364 > 25 <- as_segcompar 5464 > 25 <- segvn_checkprot 6420 > 25 <- avl_numnodes 8080 > 25 <- ufalloc 5684 > 25 <- fd_find 10060 > 25 <- fd_reserve 5644 > 25 <- cv_wait_sig_swap 9172 > 25 <- thread_lock 9140 > 25 <- ts_sleep 9076 > 25 <- exp_x 10980 > 25 <- disp_lock_enter_high 5708 > 25 <- disp_lock_exit_high 5596 > 25 <- sleepq_insert 5788 > 25 <- disp_lock_exit_nopreempt 5492 > 25 <- sigcheck 5972 > 25 <- disp_lock_enter 8808 > 25 <- disp_lock_exit 5476 > 25 <- disp_getwork 13628 > 25 <- restore_mstate 5724 > 25 <- schedctl_save 10008 > 22 <- set_active_fd 11616 > 22 <- useracc 6340 > 22 <- avl_numnodes 9616 > 22 <- avl_numnodes 9332 > 22 <- as_segcompar 10012 > 22 <- as_segcompar 6180 > 22 <- as_segcompar 6388 > 22 <- as_segcompar 6132 > 22 <- as_segcompar 5956 > 22 <- as_segcompar 6316 > 22 <- as_segcompar 6396 > 22 <- segvn_checkprot 6648 > 22 <- avl_numnodes 9360 > 22 <- ufalloc 6320 > 22 <- fd_find 11464 > 22 <- fd_reserve 6596 > 22 <- cv_wait_sig_swap 9968 > 22 <- thread_lock 10080 > 22 <- ts_sleep 10140 > 22 <- exp_x 11460 > 22 <- disp_lock_enter_high 6392 > 22 <- disp_lock_exit_high 6348 > 22 <- sleepq_insert 7280 > 22 <- disp_lock_exit_nopreempt 6036 > 22 <- sigcheck 6584 > 22 <- disp_lock_enter 9860 > 22 <- disp_lock_exit 6348 > 22 <- disp_getwork 8300 > 22 <- restore_mstate 5492 > 22 <- schedctl_save 9256 > 23 <- set_active_fd 13704 > 23 <- useracc 7052 > 23 <- avl_numnodes 10264 > 23 <- avl_numnodes 9968 > 23 <- as_segcompar 11836 > 23 <- as_segcompar 6716 > 23 <- as_segcompar 6404 > 23 <- as_segcompar 6404 > 23 <- as_segcompar 6324 > 23 <- as_segcompar 6476 > 23 <- as_segcompar 6344 > 23 <- segvn_checkprot 6944 > 23 <- avl_numnodes 9500 > 23 <- ufalloc 6356 > 23 <- fd_find 11888 > 23 <- fd_reserve 6408 > 23 <- cv_wait_sig_swap 10884 > 23 <- thread_lock 10528 > 23 <- ts_sleep 10352 > 23 <- exp_x 11832 > 23 <- disp_lock_enter_high 6084 > 23 <- disp_lock_exit_high 6424 > 23 <- sleepq_insert 6812 > 23 <- disp_lock_exit_nopreempt 6276 > 23 <- sigcheck 6784 > 23 <- disp_lock_enter 10616 > 23 <- disp_lock_exit 6232 > 23 <- disp_getwork 14796 > 23 <- restore_mstate 6448 > 23 <- schedctl_save 11052 > 23 <- set_active_fd 11116 > 23 <- useracc 5840 > 23 <- avl_numnodes 8640 > 23 <- avl_numnodes 8192 > 23 <- as_segcompar 9248 > 23 <- as_segcompar 5628 > 23 <- as_segcompar 5524 > 23 <- as_segcompar 5456 > 23 <- as_segcompar 5344 > 23 <- as_segcompar 5396 > 23 <- as_segcompar 5540 > 23 <- as_segcompar 5548 > 23 <- segvn_checkprot 6176 > 23 <- avl_numnodes 8172 > 23 <- ufalloc 5732 > 23 <- fd_find 10420 > 23 <- fd_reserve 5504 > 23 <- cv_wait_sig_swap 8820 > 23 <- thread_lock 9312 > 23 <- ts_sleep 9000 > 23 <- exp_x 10788 > 23 <- disp_lock_enter_high 5724 > 23 <- disp_lock_exit_high 5628 > 23 <- sleepq_insert 5956 > 23 <- disp_lock_exit_nopreempt 5456 > 23 <- sigcheck 5876 > 23 <- disp_lock_enter 8880 > 23 <- disp_lock_exit 5628 > 23 <- disp_getwork 13112 > 23 <- restore_mstate 5500 > 23 <- schedctl_save 9328 > 29 <- set_active_fd 11988 > 29 <- useracc 5880 > 29 <- avl_numnodes 8604 > 29 <- avl_numnodes 8116 > 29 <- as_segcompar 9360 > 29 <- as_segcompar 5724 > 29 <- as_segcompar 5476 > 29 <- as_segcompar 5416 > 29 <- as_segcompar 5500 > 29 <- as_segcompar 5668 > 29 <- segvn_checkprot 6112 > 29 <- avl_numnodes 8256 > 29 <- ufalloc 5752 > 29 <- fd_find 10624 > 29 <- fd_reserve 5568 > 29 <- cv_wait_sig_swap 8928 > 29 <- thread_lock 9308 > 29 <- ts_sleep 9036 > 29 <- exp_x 11092 > 29 <- disp_lock_enter_high 5656 > 29 <- disp_lock_exit_high 5600 > 29 <- sleepq_insert 6464 > 29 <- disp_lock_exit_nopreempt 5604 > 29 <- sigcheck 5856 > 29 <- disp_lock_enter 9128 > 29 <- disp_lock_exit 5728 > 29 <- disp_getwork 13784 > 29 <- restore_mstate 5644 > 29 <- schedctl_save 9916 > 15 <- set_active_fd 13660 > 15 <- useracc 6912 > 15 <- avl_numnodes 9976 > 15 <- avl_numnodes 9256 > 15 <- as_segcompar 10940 > 15 <- as_segcompar 6752 > 15 <- as_segcompar 6400 > 15 <- as_segcompar 6320 > 15 <- as_segcompar 6120 > 15 <- as_segcompar 6436 > 15 <- as_segcompar 6544 > 15 <- as_segcompar 6124 > 15 <- segvn_checkprot 7048 > 15 <- avl_numnodes 10272 > 15 <- ufalloc 6972 > 15 <- fd_find 11868 > 15 <- fd_reserve 6568 > 15 <- cv_wait_sig_swap 11380 > 15 <- thread_lock 10856 > 15 <- ts_sleep 10400 > 15 <- exp_x 12360 > 15 <- disp_lock_enter_high 6424 > 15 <- disp_lock_exit_high 6420 > 15 <- sleepq_insert 7392 > 15 <- disp_lock_exit_nopreempt 6452 > 15 <- sigcheck 6808 > 15 <- disp_lock_enter 10336 > 15 <- disp_lock_exit 6412 > 15 <- disp_getwork 15236 > 15 <- restore_mstate 6712 > 15 <- schedctl_save 11056 > 14 <- schedctl_restore 13144 > 14 <- sigcheck 5884 > 14 <- sogetoff 6156 > 14 <- bcopy 5724 > 14 <- kmem_cache_alloc 10104 > 14 <- getmajor 5544 > 14 <- vn_reinit 6008 > 14 <- vn_recycle 5404 > 14 <- vn_exists 5748 > 14 <- getzoneid 9004 > 14 <- so_lock_single 5692 > 14 <- kmem_cache_alloc 10160 > 14 <- kmem_cache_alloc 9000 > 14 <- setq 6568 > 14 <- set_qend 5744 > 14 <- kmem_cache_alloc 8780 > 14 <- setq 6160 > 14 <- entersq 5964 > 14 <- getzoneid 8304 > 14 <- vmem_alloc 9316 > 14 <- kmem_cache_alloc 6344 > 14 <- WR 5440 > 14 <- WR 5336 > 14 <- WR 8024 > 14 <- strlock 9132 > 14 <- set_nfsrv_ptr 5844 > 14 <- set_nbsrv_ptr 5924 > 14 <- set_qend 5816 > 14 <- set_qend 5648 > 14 <- strsetuio 5852 > 14 <- strunlock 5664 > 14 <- claimstr 9384 > 14 <- cv_broadcast 8808 > 14 <- claimstr 8128 > 14 <- cv_broadcast 8132 > 14 <- leavesq 5788 > 14 <- cv_broadcast 5456 > 14 <- so_unlock_single 5580 > 14 <- WR 8288 > 14 <- SAMESTR 8300 > 14 <- SAMESTR 5376 > 14 <- bcmp 5640 > 14 <- kmem_cache_alloc 10544 > 14 <- strsetrputhooks 8832 > 14 <- strsetwputhooks 5684 > 14 <- kmem_cache_alloc 8964 > 14 <- bcopy 6064 > 14 <- so_automatic_bind 5940 > 14 <- bcopy 5668 > 14 <- kmem_cache_alloc 11600 > 14 <- bcopy 5956 > 14 <- canputnext 8732 > 14 <- strmakedata 5840 > 14 <- crhold 8632 > 14 <- stream_willservice 5680 > 14 <- RD 10688 > 14 <- kmem_cache_alloc 8924 > 14 <- bcopy 5704 > 14 <- crhold 5776 > 14 <- cv_broadcast 11520 > 14 <- tcp_eager_unlink 6616 > 14 <- servicing_interrupt 8808 > 14 <- cv_broadcast 8936 > 14 <- tcp_maxpsz_set 5660 > 14 <- tcp_fuse_set_rcv_hiwat 5808 > 14 <- tcp_fuse_maxpsz_set 9756 > 14 <- claimstr 9192 > 14 <- strqset 6400 > 14 <- cv_broadcast 8684 > 14 <- kmem_cache_alloc 8984 > 14 <- strsock_misc 9956 > 14 <- dblk_lastfree 10760 > 14 <- kmem_cache_free 6052 > 14 <- qbackenable 5844 > 14 <- strsock_misc 9364 > 14 <- dblk_lastfree 9912 > 14 <- kmem_cache_free 5832 > 14 <- qbackenable 5576 > 14 <- qenable_locked 15604 > 14 <- tcp_fuse_syncstr_enable 10396 > 14 <- tcp_fuse_syncstr_enable 5932 > 14 <- cv_broadcast 5848 > 14 <- stream_runservice 6320 > 14 <- dblk_lastfree 8652 > 14 <- kmem_cache_free 5804 > 14 <- sowaitack 8880 > 14 <- bcopy 19364 > 14 <- crfree 11904 > 14 <- kmem_cache_free 6284 > 14 <- clear_active_fd 8456 > 14 <- cv_broadcast 5528 > 14 <- copyout_name 7592 > 14 <- kmem_cache_alloc 11640 > 14 <- crhold 5272 > 14 <- cv_broadcast 8228 > 14 <= accept total time in accept: 9336163080 > total time in fbt: 2393624 > > > Thanks, > Justin > > > Justin C. Lloyd > Senior Engineer and System Administrator > 303-684-4166 Office > 720-480-0380 Cell > 303-684-4100 Fax > jlloyd at digitalglobe.com > DigitalGlobe ?, An Imaging and Information Company > > > > > ------------------------------------------------------------------------ > > _______________________________________________ > dtrace-discuss mailing list > dtrace-discuss at opensolaris.org
Chip, More than anything I think you''ve made me realize that my logic here is completely wrong, since in your example my flow would include c''s time in b''s output and b''s time (including c''s) in a''s output (I think...I''m starting to confuse myself). I need to completely re-think how I want to break down the system call''s time, though I don''t think that my incorrect logic accounts for the huge difference in accept() times between the first and second D scripts. Obviously what I''m trying to do here is figure out why the first script reports such a high value and, if that 20 sec value is correct, where that time is being spent. Justin -----Original Message----- From: Chip Bennett [mailto:cbennett at laurustech.com] Sent: Wednesday, April 25, 2007 3:41 PM To: Justin Lloyd Cc: dtrace-discuss at opensolaris.org Subject: Re: [dtrace-discuss] Tracing accept() time Justin, I don''t think your fbt entry and return probes are pairing off correctly. Let''s suppose that your fbt call stack is a->b->c. When you call "a", since self->fbtrace is 0, the clause is executed. This sets self->fbtrace equal to 1. The next events are the calls to "b" and "c", which are ignored because self->fbtrace is now 1. The next event is the return from "c", which, since self->fbtrace is true, executes its clause and calculates the delta. But this delta is the execution time for the parts of "a" and "b" that happen before "c", and all of "c". The next events are the returns from "b" and "a" which are ignored because self->fbtrace is now 0. At this point, it appears to me that you have only captured part of the time for the kernel execution. Of course my analysis of the flow could be wrong. :-) Chip Justin Lloyd wrote:> Ignoring the long story behind why I''m doing this (and it may be thewrong track to take), I''ve been trying to use my meager ranks in Knowledge(dtrace) to figure out why a test we''re doing regarding an OAS HTTP/JVM problem is taking significantly longer on a T2000 than on a dual dual-core Dell Linux box.> > One strange thing I noticed when looking at average system call timesfor the java and httpd processes is the accept() times:> > syscall::accept:entry > /execname=="java"/ > { > self->ts = timestamp; > } > > syscall::accept:return > /self->ts/ > { > @sc[pid] = avg(timestamp-self->ts); > self->ts=0; > } > > This code yields the following PID and average accept() times: > > 9490 5032706459 > 10392 19998243844 > 10400 19998255673 > 10398 19998289978 > 10393 19998312751 > 10395 19998312895 > 10396 19998842144 > > So accept() is taking an average of 20 seconds to complete?! > > Then I tried tracing using fbt and the numbers don''t add up. What am Ioverlooking? Where is the missing time or where is my code and/or understanding wrong? (The following example, accept() only takes 9 seconds, but it is just catching the first one.)> > BEGIN > { > total = 0; > } > > syscall::accept:entry > /execname == "java" && tracing++/ > { > self->start_time = timestamp; > self->trace = 1; > } > > fbt:::entry > /self->trace && ! self->fbtrace/ > { > self->ts = timestamp; > self->fbtrace = 1; > } > > fbt:::return > /self->trace && self->fbtrace/ > { > delta = timestamp - self->ts; > total += delta; > trace(delta); > self->ts = 0; > self->fbtrace = 0; > } > > syscall::accept:return > /self->trace == 1/ > { > printf("total time in accept: %d", timestamp - self->start_time); > printf("total time in fbt: %d", total); > self->trace = 0; > exit(0); > } > > CPU FUNCTION > 14 <- set_active_fd 12380 > 14 <- useracc 6352 > 14 <- avl_numnodes 8796 > 14 <- avl_numnodes 8316 > 14 <- as_segcompar 9648 > 14 <- as_segcompar 5768 > 14 <- as_segcompar 5588 > 14 <- as_segcompar 5420 > 14 <- as_segcompar 5516 > 14 <- as_segcompar 5652 > 14 <- as_segcompar 5496 > 14 <- as_segcompar 5456 > 14 <- as_segcompar 5632 > 14 <- segvn_checkprot 6856 > 14 <- avl_numnodes 8548 > 14 <- ufalloc 5688 > 14 <- fd_find 9856 > 14 <- fd_reserve 5632 > 14 <- cv_wait_sig_swap 9696 > 14 <- thread_lock 9284 > 14 <- ts_sleep 9088 > 14 <- exp_x 11056 > 14 <- disp_lock_enter_high 5724 > 14 <- disp_lock_exit_high 5484 > 14 <- sleepq_insert 6544 > 14 <- disp_lock_exit_nopreempt 5544 > 14 <- sigcheck 5864 > 14 <- disp_lock_enter 8860 > 14 <- disp_lock_exit 5720 > 14 <- disp_getwork 13684 > 14 <- restore_mstate 5860 > 14 <- schedctl_save 9724 > 25 <- set_active_fd 14240 > 25 <- useracc 6200 > 25 <- avl_numnodes 8816 > 25 <- avl_numnodes 8352 > 25 <- as_segcompar 9312 > 25 <- as_segcompar 5688 > 25 <- as_segcompar 5488 > 25 <- as_segcompar 5364 > 25 <- as_segcompar 5464 > 25 <- segvn_checkprot 6420 > 25 <- avl_numnodes 8080 > 25 <- ufalloc 5684 > 25 <- fd_find 10060 > 25 <- fd_reserve 5644 > 25 <- cv_wait_sig_swap 9172 > 25 <- thread_lock 9140 > 25 <- ts_sleep 9076 > 25 <- exp_x 10980 > 25 <- disp_lock_enter_high 5708 > 25 <- disp_lock_exit_high 5596 > 25 <- sleepq_insert 5788 > 25 <- disp_lock_exit_nopreempt 5492 > 25 <- sigcheck 5972 > 25 <- disp_lock_enter 8808 > 25 <- disp_lock_exit 5476 > 25 <- disp_getwork 13628 > 25 <- restore_mstate 5724 > 25 <- schedctl_save 10008 > 22 <- set_active_fd 11616 > 22 <- useracc 6340 > 22 <- avl_numnodes 9616 > 22 <- avl_numnodes 9332 > 22 <- as_segcompar 10012 > 22 <- as_segcompar 6180 > 22 <- as_segcompar 6388 > 22 <- as_segcompar 6132 > 22 <- as_segcompar 5956 > 22 <- as_segcompar 6316 > 22 <- as_segcompar 6396 > 22 <- segvn_checkprot 6648 > 22 <- avl_numnodes 9360 > 22 <- ufalloc 6320 > 22 <- fd_find 11464 > 22 <- fd_reserve 6596 > 22 <- cv_wait_sig_swap 9968 > 22 <- thread_lock 10080 > 22 <- ts_sleep 10140 > 22 <- exp_x 11460 > 22 <- disp_lock_enter_high 6392 > 22 <- disp_lock_exit_high 6348 > 22 <- sleepq_insert 7280 > 22 <- disp_lock_exit_nopreempt 6036 > 22 <- sigcheck 6584 > 22 <- disp_lock_enter 9860 > 22 <- disp_lock_exit 6348 > 22 <- disp_getwork 8300 > 22 <- restore_mstate 5492 > 22 <- schedctl_save 9256 > 23 <- set_active_fd 13704 > 23 <- useracc 7052 > 23 <- avl_numnodes 10264 > 23 <- avl_numnodes 9968 > 23 <- as_segcompar 11836 > 23 <- as_segcompar 6716 > 23 <- as_segcompar 6404 > 23 <- as_segcompar 6404 > 23 <- as_segcompar 6324 > 23 <- as_segcompar 6476 > 23 <- as_segcompar 6344 > 23 <- segvn_checkprot 6944 > 23 <- avl_numnodes 9500 > 23 <- ufalloc 6356 > 23 <- fd_find 11888 > 23 <- fd_reserve 6408 > 23 <- cv_wait_sig_swap 10884 > 23 <- thread_lock 10528 > 23 <- ts_sleep 10352 > 23 <- exp_x 11832 > 23 <- disp_lock_enter_high 6084 > 23 <- disp_lock_exit_high 6424 > 23 <- sleepq_insert 6812 > 23 <- disp_lock_exit_nopreempt 6276 > 23 <- sigcheck 6784 > 23 <- disp_lock_enter 10616 > 23 <- disp_lock_exit 6232 > 23 <- disp_getwork 14796 > 23 <- restore_mstate 6448 > 23 <- schedctl_save 11052 > 23 <- set_active_fd 11116 > 23 <- useracc 5840 > 23 <- avl_numnodes 8640 > 23 <- avl_numnodes 8192 > 23 <- as_segcompar 9248 > 23 <- as_segcompar 5628 > 23 <- as_segcompar 5524 > 23 <- as_segcompar 5456 > 23 <- as_segcompar 5344 > 23 <- as_segcompar 5396 > 23 <- as_segcompar 5540 > 23 <- as_segcompar 5548 > 23 <- segvn_checkprot 6176 > 23 <- avl_numnodes 8172 > 23 <- ufalloc 5732 > 23 <- fd_find 10420 > 23 <- fd_reserve 5504 > 23 <- cv_wait_sig_swap 8820 > 23 <- thread_lock 9312 > 23 <- ts_sleep 9000 > 23 <- exp_x 10788 > 23 <- disp_lock_enter_high 5724 > 23 <- disp_lock_exit_high 5628 > 23 <- sleepq_insert 5956 > 23 <- disp_lock_exit_nopreempt 5456 > 23 <- sigcheck 5876 > 23 <- disp_lock_enter 8880 > 23 <- disp_lock_exit 5628 > 23 <- disp_getwork 13112 > 23 <- restore_mstate 5500 > 23 <- schedctl_save 9328 > 29 <- set_active_fd 11988 > 29 <- useracc 5880 > 29 <- avl_numnodes 8604 > 29 <- avl_numnodes 8116 > 29 <- as_segcompar 9360 > 29 <- as_segcompar 5724 > 29 <- as_segcompar 5476 > 29 <- as_segcompar 5416 > 29 <- as_segcompar 5500 > 29 <- as_segcompar 5668 > 29 <- segvn_checkprot 6112 > 29 <- avl_numnodes 8256 > 29 <- ufalloc 5752 > 29 <- fd_find 10624 > 29 <- fd_reserve 5568 > 29 <- cv_wait_sig_swap 8928 > 29 <- thread_lock 9308 > 29 <- ts_sleep 9036 > 29 <- exp_x 11092 > 29 <- disp_lock_enter_high 5656 > 29 <- disp_lock_exit_high 5600 > 29 <- sleepq_insert 6464 > 29 <- disp_lock_exit_nopreempt 5604 > 29 <- sigcheck 5856 > 29 <- disp_lock_enter 9128 > 29 <- disp_lock_exit 5728 > 29 <- disp_getwork 13784 > 29 <- restore_mstate 5644 > 29 <- schedctl_save 9916 > 15 <- set_active_fd 13660 > 15 <- useracc 6912 > 15 <- avl_numnodes 9976 > 15 <- avl_numnodes 9256 > 15 <- as_segcompar 10940 > 15 <- as_segcompar 6752 > 15 <- as_segcompar 6400 > 15 <- as_segcompar 6320 > 15 <- as_segcompar 6120 > 15 <- as_segcompar 6436 > 15 <- as_segcompar 6544 > 15 <- as_segcompar 6124 > 15 <- segvn_checkprot 7048 > 15 <- avl_numnodes 10272 > 15 <- ufalloc 6972 > 15 <- fd_find 11868 > 15 <- fd_reserve 6568 > 15 <- cv_wait_sig_swap 11380 > 15 <- thread_lock 10856 > 15 <- ts_sleep 10400 > 15 <- exp_x 12360 > 15 <- disp_lock_enter_high 6424 > 15 <- disp_lock_exit_high 6420 > 15 <- sleepq_insert 7392 > 15 <- disp_lock_exit_nopreempt 6452 > 15 <- sigcheck 6808 > 15 <- disp_lock_enter 10336 > 15 <- disp_lock_exit 6412 > 15 <- disp_getwork 15236 > 15 <- restore_mstate 6712 > 15 <- schedctl_save 11056 > 14 <- schedctl_restore 13144 > 14 <- sigcheck 5884 > 14 <- sogetoff 6156 > 14 <- bcopy 5724 > 14 <- kmem_cache_alloc 10104 > 14 <- getmajor 5544 > 14 <- vn_reinit 6008 > 14 <- vn_recycle 5404 > 14 <- vn_exists 5748 > 14 <- getzoneid 9004 > 14 <- so_lock_single 5692 > 14 <- kmem_cache_alloc 10160 > 14 <- kmem_cache_alloc 9000 > 14 <- setq 6568 > 14 <- set_qend 5744 > 14 <- kmem_cache_alloc 8780 > 14 <- setq 6160 > 14 <- entersq 5964 > 14 <- getzoneid 8304 > 14 <- vmem_alloc 9316 > 14 <- kmem_cache_alloc 6344 > 14 <- WR 5440 > 14 <- WR 5336 > 14 <- WR 8024 > 14 <- strlock 9132 > 14 <- set_nfsrv_ptr 5844 > 14 <- set_nbsrv_ptr 5924 > 14 <- set_qend 5816 > 14 <- set_qend 5648 > 14 <- strsetuio 5852 > 14 <- strunlock 5664 > 14 <- claimstr 9384 > 14 <- cv_broadcast 8808 > 14 <- claimstr 8128 > 14 <- cv_broadcast 8132 > 14 <- leavesq 5788 > 14 <- cv_broadcast 5456 > 14 <- so_unlock_single 5580 > 14 <- WR 8288 > 14 <- SAMESTR 8300 > 14 <- SAMESTR 5376 > 14 <- bcmp 5640 > 14 <- kmem_cache_alloc 10544 > 14 <- strsetrputhooks 8832 > 14 <- strsetwputhooks 5684 > 14 <- kmem_cache_alloc 8964 > 14 <- bcopy 6064 > 14 <- so_automatic_bind 5940 > 14 <- bcopy 5668 > 14 <- kmem_cache_alloc 11600 > 14 <- bcopy 5956 > 14 <- canputnext 8732 > 14 <- strmakedata 5840 > 14 <- crhold 8632 > 14 <- stream_willservice 5680 > 14 <- RD 10688 > 14 <- kmem_cache_alloc 8924 > 14 <- bcopy 5704 > 14 <- crhold 5776 > 14 <- cv_broadcast 11520 > 14 <- tcp_eager_unlink 6616 > 14 <- servicing_interrupt 8808 > 14 <- cv_broadcast 8936 > 14 <- tcp_maxpsz_set 5660 > 14 <- tcp_fuse_set_rcv_hiwat 5808 > 14 <- tcp_fuse_maxpsz_set 9756 > 14 <- claimstr 9192 > 14 <- strqset 6400 > 14 <- cv_broadcast 8684 > 14 <- kmem_cache_alloc 8984 > 14 <- strsock_misc 9956 > 14 <- dblk_lastfree 10760 > 14 <- kmem_cache_free 6052 > 14 <- qbackenable 5844 > 14 <- strsock_misc 9364 > 14 <- dblk_lastfree 9912 > 14 <- kmem_cache_free 5832 > 14 <- qbackenable 5576 > 14 <- qenable_locked 15604 > 14 <- tcp_fuse_syncstr_enable 10396 > 14 <- tcp_fuse_syncstr_enable 5932 > 14 <- cv_broadcast 5848 > 14 <- stream_runservice 6320 > 14 <- dblk_lastfree 8652 > 14 <- kmem_cache_free 5804 > 14 <- sowaitack 8880 > 14 <- bcopy 19364 > 14 <- crfree 11904 > 14 <- kmem_cache_free 6284 > 14 <- clear_active_fd 8456 > 14 <- cv_broadcast 5528 > 14 <- copyout_name 7592 > 14 <- kmem_cache_alloc 11640 > 14 <- crhold 5272 > 14 <- cv_broadcast 8228 > 14 <= accept total time in accept:9336163080> total time in fbt:2393624> > > Thanks, > Justin > > > Justin C. Lloyd > Senior Engineer and System Administrator > 303-684-4166 Office > 720-480-0380 Cell > 303-684-4100 Fax > jlloyd at digitalglobe.com > DigitalGlobe (r), An Imaging and Information Company > > > > >------------------------------------------------------------------------> > _______________________________________________ > dtrace-discuss mailing list > dtrace-discuss at opensolaris.org
Whoa! Here''s the output of a new script, with a few lines for context
around something suspicious. The last column is the timestamp difference
from the previous line. Every other time value in the output is between
about 4000 and 7000. I now notice that the huge value is occurring when
the thread is switched to a different CPU. Is this normal?
28 <- swtch 4908
28 -> resume 4372
28 -> savectx 5428
28 -> schedctl_save 5068
28 <- schedctl_save 4324
28 <- savectx 4540
8 -> restorectx 19998571120
8 -> schedctl_restore 9648
8 <- schedctl_restore 5064
8 <- restorectx 5432
8 <- resume 4808
Here''s the code:
#!/usr/sbin/dtrace -s
#pragma D option flowindent
syscall::accept:entry
/execname == "java" && ! tracing++/
{
self->trace = 1;
self->ts = timestamp;
this->ts = timestamp;
trace(this->ts - self->ts);
self->ts = this->ts;
}
fbt:::entry
/self->trace/
{
this->ts = timestamp;
trace(this->ts - self->ts);
self->ts = this->ts;
}
fbt:::return
/self->trace/
{
this->ts = timestamp;
trace(this->ts - self->ts);
self->ts = this->ts;
}
syscall::accept:return
/self->trace/
{
this->ts = timestamp;
trace(this->ts - self->ts);
self->ts = this->ts;
self->trace = 0;
exit(0);
}
-----Original Message-----
From: dtrace-discuss-bounces at opensolaris.org
[mailto:dtrace-discuss-bounces at opensolaris.org] On Behalf Of Justin
Lloyd
Sent: Wednesday, April 25, 2007 3:59 PM
To: Chip Bennett
Cc: dtrace-discuss at opensolaris.org
Subject: RE: [dtrace-discuss] Tracing accept() time
Chip,
More than anything I think you''ve made me realize that my logic here is
completely wrong, since in your example my flow would include c''s time
in b''s output and b''s time (including c''s) in
a''s output (I think...I''m
starting to confuse myself).
I need to completely re-think how I want to break down the system
call''s
time, though I don''t think that my incorrect logic accounts for the
huge
difference in accept() times between the first and second D scripts.
Obviously what I''m trying to do here is figure out why the first script
reports such a high value and, if that 20 sec value is correct, where
that time is being spent.
Justin
-----Original Message-----
From: Chip Bennett [mailto:cbennett at laurustech.com]
Sent: Wednesday, April 25, 2007 3:41 PM
To: Justin Lloyd
Cc: dtrace-discuss at opensolaris.org
Subject: Re: [dtrace-discuss] Tracing accept() time
Justin,
I don''t think your fbt entry and return probes are pairing off
correctly.
Let''s suppose that your fbt call stack is a->b->c. When you call
"a",
since self->fbtrace is 0, the clause is executed. This sets
self->fbtrace equal to 1. The next events are the calls to "b" and
"c",
which are ignored because self->fbtrace is now 1.
The next event is the return from "c", which, since self->fbtrace
is
true, executes its clause and calculates the delta. But this delta is
the execution time for the parts of "a" and "b" that happen
before "c",
and all of "c". The next events are the returns from "b"
and "a" which
are ignored because self->fbtrace is now 0.
At this point, it appears to me that you have only captured part of the
time for the kernel execution.
Of course my analysis of the flow could be wrong. :-)
Chip
Justin Lloyd wrote:> Ignoring the long story behind why I''m doing this (and it may be
the
wrong track to take), I''ve been trying to use my meager ranks in
Knowledge(dtrace) to figure out why a test we''re doing regarding an OAS
HTTP/JVM problem is taking significantly longer on a T2000 than on a
dual dual-core Dell Linux box.>
> One strange thing I noticed when looking at average system call times
for the java and httpd processes is the accept() times:>
> syscall::accept:entry
> /execname=="java"/
> {
> self->ts = timestamp;
> }
>
> syscall::accept:return
> /self->ts/
> {
> @sc[pid] = avg(timestamp-self->ts);
> self->ts=0;
> }
>
> This code yields the following PID and average accept() times:
>
> 9490 5032706459
> 10392 19998243844
> 10400 19998255673
> 10398 19998289978
> 10393 19998312751
> 10395 19998312895
> 10396 19998842144
>
> So accept() is taking an average of 20 seconds to complete?!
>
> Then I tried tracing using fbt and the numbers don''t add up. What
am I
overlooking? Where is the missing time or where is my code and/or
understanding wrong? (The following example, accept() only takes 9
seconds, but it is just catching the first one.)>
> BEGIN
> {
> total = 0;
> }
>
> syscall::accept:entry
> /execname == "java" && tracing++/
> {
> self->start_time = timestamp;
> self->trace = 1;
> }
>
> fbt:::entry
> /self->trace && ! self->fbtrace/
> {
> self->ts = timestamp;
> self->fbtrace = 1;
> }
>
> fbt:::return
> /self->trace && self->fbtrace/
> {
> delta = timestamp - self->ts;
> total += delta;
> trace(delta);
> self->ts = 0;
> self->fbtrace = 0;
> }
>
> syscall::accept:return
> /self->trace == 1/
> {
> printf("total time in accept: %d", timestamp -
self->start_time);
> printf("total time in fbt: %d", total);
> self->trace = 0;
> exit(0);
> }
>
> CPU FUNCTION
> 14 <- set_active_fd 12380
> 14 <- useracc 6352
> 14 <- avl_numnodes 8796
> 14 <- avl_numnodes 8316
> 14 <- as_segcompar 9648
> 14 <- as_segcompar 5768
> 14 <- as_segcompar 5588
> 14 <- as_segcompar 5420
> 14 <- as_segcompar 5516
> 14 <- as_segcompar 5652
> 14 <- as_segcompar 5496
> 14 <- as_segcompar 5456
> 14 <- as_segcompar 5632
> 14 <- segvn_checkprot 6856
> 14 <- avl_numnodes 8548
> 14 <- ufalloc 5688
> 14 <- fd_find 9856
> 14 <- fd_reserve 5632
> 14 <- cv_wait_sig_swap 9696
> 14 <- thread_lock 9284
> 14 <- ts_sleep 9088
> 14 <- exp_x 11056
> 14 <- disp_lock_enter_high 5724
> 14 <- disp_lock_exit_high 5484
> 14 <- sleepq_insert 6544
> 14 <- disp_lock_exit_nopreempt 5544
> 14 <- sigcheck 5864
> 14 <- disp_lock_enter 8860
> 14 <- disp_lock_exit 5720
> 14 <- disp_getwork 13684
> 14 <- restore_mstate 5860
> 14 <- schedctl_save 9724
> 25 <- set_active_fd 14240
> 25 <- useracc 6200
> 25 <- avl_numnodes 8816
> 25 <- avl_numnodes 8352
> 25 <- as_segcompar 9312
> 25 <- as_segcompar 5688
> 25 <- as_segcompar 5488
> 25 <- as_segcompar 5364
> 25 <- as_segcompar 5464
> 25 <- segvn_checkprot 6420
> 25 <- avl_numnodes 8080
> 25 <- ufalloc 5684
> 25 <- fd_find 10060
> 25 <- fd_reserve 5644
> 25 <- cv_wait_sig_swap 9172
> 25 <- thread_lock 9140
> 25 <- ts_sleep 9076
> 25 <- exp_x 10980
> 25 <- disp_lock_enter_high 5708
> 25 <- disp_lock_exit_high 5596
> 25 <- sleepq_insert 5788
> 25 <- disp_lock_exit_nopreempt 5492
> 25 <- sigcheck 5972
> 25 <- disp_lock_enter 8808
> 25 <- disp_lock_exit 5476
> 25 <- disp_getwork 13628
> 25 <- restore_mstate 5724
> 25 <- schedctl_save 10008
> 22 <- set_active_fd 11616
> 22 <- useracc 6340
> 22 <- avl_numnodes 9616
> 22 <- avl_numnodes 9332
> 22 <- as_segcompar 10012
> 22 <- as_segcompar 6180
> 22 <- as_segcompar 6388
> 22 <- as_segcompar 6132
> 22 <- as_segcompar 5956
> 22 <- as_segcompar 6316
> 22 <- as_segcompar 6396
> 22 <- segvn_checkprot 6648
> 22 <- avl_numnodes 9360
> 22 <- ufalloc 6320
> 22 <- fd_find 11464
> 22 <- fd_reserve 6596
> 22 <- cv_wait_sig_swap 9968
> 22 <- thread_lock 10080
> 22 <- ts_sleep 10140
> 22 <- exp_x 11460
> 22 <- disp_lock_enter_high 6392
> 22 <- disp_lock_exit_high 6348
> 22 <- sleepq_insert 7280
> 22 <- disp_lock_exit_nopreempt 6036
> 22 <- sigcheck 6584
> 22 <- disp_lock_enter 9860
> 22 <- disp_lock_exit 6348
> 22 <- disp_getwork 8300
> 22 <- restore_mstate 5492
> 22 <- schedctl_save 9256
> 23 <- set_active_fd 13704
> 23 <- useracc 7052
> 23 <- avl_numnodes 10264
> 23 <- avl_numnodes 9968
> 23 <- as_segcompar 11836
> 23 <- as_segcompar 6716
> 23 <- as_segcompar 6404
> 23 <- as_segcompar 6404
> 23 <- as_segcompar 6324
> 23 <- as_segcompar 6476
> 23 <- as_segcompar 6344
> 23 <- segvn_checkprot 6944
> 23 <- avl_numnodes 9500
> 23 <- ufalloc 6356
> 23 <- fd_find 11888
> 23 <- fd_reserve 6408
> 23 <- cv_wait_sig_swap 10884
> 23 <- thread_lock 10528
> 23 <- ts_sleep 10352
> 23 <- exp_x 11832
> 23 <- disp_lock_enter_high 6084
> 23 <- disp_lock_exit_high 6424
> 23 <- sleepq_insert 6812
> 23 <- disp_lock_exit_nopreempt 6276
> 23 <- sigcheck 6784
> 23 <- disp_lock_enter 10616
> 23 <- disp_lock_exit 6232
> 23 <- disp_getwork 14796
> 23 <- restore_mstate 6448
> 23 <- schedctl_save 11052
> 23 <- set_active_fd 11116
> 23 <- useracc 5840
> 23 <- avl_numnodes 8640
> 23 <- avl_numnodes 8192
> 23 <- as_segcompar 9248
> 23 <- as_segcompar 5628
> 23 <- as_segcompar 5524
> 23 <- as_segcompar 5456
> 23 <- as_segcompar 5344
> 23 <- as_segcompar 5396
> 23 <- as_segcompar 5540
> 23 <- as_segcompar 5548
> 23 <- segvn_checkprot 6176
> 23 <- avl_numnodes 8172
> 23 <- ufalloc 5732
> 23 <- fd_find 10420
> 23 <- fd_reserve 5504
> 23 <- cv_wait_sig_swap 8820
> 23 <- thread_lock 9312
> 23 <- ts_sleep 9000
> 23 <- exp_x 10788
> 23 <- disp_lock_enter_high 5724
> 23 <- disp_lock_exit_high 5628
> 23 <- sleepq_insert 5956
> 23 <- disp_lock_exit_nopreempt 5456
> 23 <- sigcheck 5876
> 23 <- disp_lock_enter 8880
> 23 <- disp_lock_exit 5628
> 23 <- disp_getwork 13112
> 23 <- restore_mstate 5500
> 23 <- schedctl_save 9328
> 29 <- set_active_fd 11988
> 29 <- useracc 5880
> 29 <- avl_numnodes 8604
> 29 <- avl_numnodes 8116
> 29 <- as_segcompar 9360
> 29 <- as_segcompar 5724
> 29 <- as_segcompar 5476
> 29 <- as_segcompar 5416
> 29 <- as_segcompar 5500
> 29 <- as_segcompar 5668
> 29 <- segvn_checkprot 6112
> 29 <- avl_numnodes 8256
> 29 <- ufalloc 5752
> 29 <- fd_find 10624
> 29 <- fd_reserve 5568
> 29 <- cv_wait_sig_swap 8928
> 29 <- thread_lock 9308
> 29 <- ts_sleep 9036
> 29 <- exp_x 11092
> 29 <- disp_lock_enter_high 5656
> 29 <- disp_lock_exit_high 5600
> 29 <- sleepq_insert 6464
> 29 <- disp_lock_exit_nopreempt 5604
> 29 <- sigcheck 5856
> 29 <- disp_lock_enter 9128
> 29 <- disp_lock_exit 5728
> 29 <- disp_getwork 13784
> 29 <- restore_mstate 5644
> 29 <- schedctl_save 9916
> 15 <- set_active_fd 13660
> 15 <- useracc 6912
> 15 <- avl_numnodes 9976
> 15 <- avl_numnodes 9256
> 15 <- as_segcompar 10940
> 15 <- as_segcompar 6752
> 15 <- as_segcompar 6400
> 15 <- as_segcompar 6320
> 15 <- as_segcompar 6120
> 15 <- as_segcompar 6436
> 15 <- as_segcompar 6544
> 15 <- as_segcompar 6124
> 15 <- segvn_checkprot 7048
> 15 <- avl_numnodes 10272
> 15 <- ufalloc 6972
> 15 <- fd_find 11868
> 15 <- fd_reserve 6568
> 15 <- cv_wait_sig_swap 11380
> 15 <- thread_lock 10856
> 15 <- ts_sleep 10400
> 15 <- exp_x 12360
> 15 <- disp_lock_enter_high 6424
> 15 <- disp_lock_exit_high 6420
> 15 <- sleepq_insert 7392
> 15 <- disp_lock_exit_nopreempt 6452
> 15 <- sigcheck 6808
> 15 <- disp_lock_enter 10336
> 15 <- disp_lock_exit 6412
> 15 <- disp_getwork 15236
> 15 <- restore_mstate 6712
> 15 <- schedctl_save 11056
> 14 <- schedctl_restore 13144
> 14 <- sigcheck 5884
> 14 <- sogetoff 6156
> 14 <- bcopy 5724
> 14 <- kmem_cache_alloc 10104
> 14 <- getmajor 5544
> 14 <- vn_reinit 6008
> 14 <- vn_recycle 5404
> 14 <- vn_exists 5748
> 14 <- getzoneid 9004
> 14 <- so_lock_single 5692
> 14 <- kmem_cache_alloc 10160
> 14 <- kmem_cache_alloc 9000
> 14 <- setq 6568
> 14 <- set_qend 5744
> 14 <- kmem_cache_alloc 8780
> 14 <- setq 6160
> 14 <- entersq 5964
> 14 <- getzoneid 8304
> 14 <- vmem_alloc 9316
> 14 <- kmem_cache_alloc 6344
> 14 <- WR 5440
> 14 <- WR 5336
> 14 <- WR 8024
> 14 <- strlock 9132
> 14 <- set_nfsrv_ptr 5844
> 14 <- set_nbsrv_ptr 5924
> 14 <- set_qend 5816
> 14 <- set_qend 5648
> 14 <- strsetuio 5852
> 14 <- strunlock 5664
> 14 <- claimstr 9384
> 14 <- cv_broadcast 8808
> 14 <- claimstr 8128
> 14 <- cv_broadcast 8132
> 14 <- leavesq 5788
> 14 <- cv_broadcast 5456
> 14 <- so_unlock_single 5580
> 14 <- WR 8288
> 14 <- SAMESTR 8300
> 14 <- SAMESTR 5376
> 14 <- bcmp 5640
> 14 <- kmem_cache_alloc 10544
> 14 <- strsetrputhooks 8832
> 14 <- strsetwputhooks 5684
> 14 <- kmem_cache_alloc 8964
> 14 <- bcopy 6064
> 14 <- so_automatic_bind 5940
> 14 <- bcopy 5668
> 14 <- kmem_cache_alloc 11600
> 14 <- bcopy 5956
> 14 <- canputnext 8732
> 14 <- strmakedata 5840
> 14 <- crhold 8632
> 14 <- stream_willservice 5680
> 14 <- RD 10688
> 14 <- kmem_cache_alloc 8924
> 14 <- bcopy 5704
> 14 <- crhold 5776
> 14 <- cv_broadcast 11520
> 14 <- tcp_eager_unlink 6616
> 14 <- servicing_interrupt 8808
> 14 <- cv_broadcast 8936
> 14 <- tcp_maxpsz_set 5660
> 14 <- tcp_fuse_set_rcv_hiwat 5808
> 14 <- tcp_fuse_maxpsz_set 9756
> 14 <- claimstr 9192
> 14 <- strqset 6400
> 14 <- cv_broadcast 8684
> 14 <- kmem_cache_alloc 8984
> 14 <- strsock_misc 9956
> 14 <- dblk_lastfree 10760
> 14 <- kmem_cache_free 6052
> 14 <- qbackenable 5844
> 14 <- strsock_misc 9364
> 14 <- dblk_lastfree 9912
> 14 <- kmem_cache_free 5832
> 14 <- qbackenable 5576
> 14 <- qenable_locked 15604
> 14 <- tcp_fuse_syncstr_enable 10396
> 14 <- tcp_fuse_syncstr_enable 5932
> 14 <- cv_broadcast 5848
> 14 <- stream_runservice 6320
> 14 <- dblk_lastfree 8652
> 14 <- kmem_cache_free 5804
> 14 <- sowaitack 8880
> 14 <- bcopy 19364
> 14 <- crfree 11904
> 14 <- kmem_cache_free 6284
> 14 <- clear_active_fd 8456
> 14 <- cv_broadcast 5528
> 14 <- copyout_name 7592
> 14 <- kmem_cache_alloc 11640
> 14 <- crhold 5272
> 14 <- cv_broadcast 8228
> 14 <= accept total time in accept:
9336163080> total time in fbt:
2393624>
>
> Thanks,
> Justin
>
>
> Justin C. Lloyd
> Senior Engineer and System Administrator
> 303-684-4166 Office
> 720-480-0380 Cell
> 303-684-4100 Fax
> jlloyd at digitalglobe.com
> DigitalGlobe (r), An Imaging and Information Company
>
>
>
>
>
------------------------------------------------------------------------>
> _______________________________________________
> dtrace-discuss mailing list
> dtrace-discuss at opensolaris.org
_______________________________________________
dtrace-discuss mailing list
dtrace-discuss at opensolaris.org
Wow. That number almost looks like an artifact, of overflow or a signed->unsigned conversion.... On 4/25/07, Justin Lloyd <jlloyd at digitalglobe.com> wrote:> Whoa! Here''s the output of a new script, with a few lines for context > around something suspicious. The last column is the timestamp difference > from the previous line. Every other time value in the output is between > about 4000 and 7000. I now notice that the huge value is occurring when > the thread is switched to a different CPU. Is this normal? > > 28 <- swtch 4908 > 28 -> resume 4372 > 28 -> savectx 5428 > 28 -> schedctl_save 5068 > 28 <- schedctl_save 4324 > 28 <- savectx 4540 > 8 -> restorectx 19998571120 > 8 -> schedctl_restore 9648 > 8 <- schedctl_restore 5064 > 8 <- restorectx 5432 > 8 <- resume 4808 > > Here''s the code: > > #!/usr/sbin/dtrace -s > > #pragma D option flowindent > > syscall::accept:entry > /execname == "java" && ! tracing++/ > { > self->trace = 1; > > self->ts = timestamp; > > this->ts = timestamp; > trace(this->ts - self->ts); > self->ts = this->ts; > } > > fbt:::entry > /self->trace/ > { > this->ts = timestamp; > trace(this->ts - self->ts); > self->ts = this->ts; > } > > fbt:::return > /self->trace/ > { > this->ts = timestamp; > trace(this->ts - self->ts); > self->ts = this->ts; > } > > syscall::accept:return > /self->trace/ > { > this->ts = timestamp; > trace(this->ts - self->ts); > self->ts = this->ts; > > self->trace = 0; > > exit(0); > } > > > > -----Original Message----- > From: dtrace-discuss-bounces at opensolaris.org > [mailto:dtrace-discuss-bounces at opensolaris.org] On Behalf Of Justin > Lloyd > Sent: Wednesday, April 25, 2007 3:59 PM > To: Chip Bennett > Cc: dtrace-discuss at opensolaris.org > Subject: RE: [dtrace-discuss] Tracing accept() time > > Chip, > > More than anything I think you''ve made me realize that my logic here is > completely wrong, since in your example my flow would include c''s time > in b''s output and b''s time (including c''s) in a''s output (I think...I''m > starting to confuse myself). > > I need to completely re-think how I want to break down the system call''s > time, though I don''t think that my incorrect logic accounts for the huge > difference in accept() times between the first and second D scripts. > > Obviously what I''m trying to do here is figure out why the first script > reports such a high value and, if that 20 sec value is correct, where > that time is being spent. > > Justin > > -----Original Message----- > From: Chip Bennett [mailto:cbennett at laurustech.com] > Sent: Wednesday, April 25, 2007 3:41 PM > To: Justin Lloyd > Cc: dtrace-discuss at opensolaris.org > Subject: Re: [dtrace-discuss] Tracing accept() time > > Justin, > > I don''t think your fbt entry and return probes are pairing off > correctly. > > Let''s suppose that your fbt call stack is a->b->c. When you call "a", > since self->fbtrace is 0, the clause is executed. This sets > self->fbtrace equal to 1. The next events are the calls to "b" and "c", > which are ignored because self->fbtrace is now 1. > > The next event is the return from "c", which, since self->fbtrace is > true, executes its clause and calculates the delta. But this delta is > the execution time for the parts of "a" and "b" that happen before "c", > and all of "c". The next events are the returns from "b" and "a" which > are ignored because self->fbtrace is now 0. > > At this point, it appears to me that you have only captured part of the > time for the kernel execution. > > Of course my analysis of the flow could be wrong. :-) > > Chip > > Justin Lloyd wrote: > > Ignoring the long story behind why I''m doing this (and it may be the > wrong track to take), I''ve been trying to use my meager ranks in > Knowledge(dtrace) to figure out why a test we''re doing regarding an OAS > HTTP/JVM problem is taking significantly longer on a T2000 than on a > dual dual-core Dell Linux box. > > > > One strange thing I noticed when looking at average system call times > for the java and httpd processes is the accept() times: > > > > syscall::accept:entry > > /execname=="java"/ > > { > > self->ts = timestamp; > > } > > > > syscall::accept:return > > /self->ts/ > > { > > @sc[pid] = avg(timestamp-self->ts); > > self->ts=0; > > } > > > > This code yields the following PID and average accept() times: > > > > 9490 5032706459 > > 10392 19998243844 > > 10400 19998255673 > > 10398 19998289978 > > 10393 19998312751 > > 10395 19998312895 > > 10396 19998842144 > > > > So accept() is taking an average of 20 seconds to complete?! > > > > Then I tried tracing using fbt and the numbers don''t add up. What am I > overlooking? Where is the missing time or where is my code and/or > understanding wrong? (The following example, accept() only takes 9 > seconds, but it is just catching the first one.) > > > > BEGIN > > { > > total = 0; > > } > > > > syscall::accept:entry > > /execname == "java" && tracing++/ > > { > > self->start_time = timestamp; > > self->trace = 1; > > } > > > > fbt:::entry > > /self->trace && ! self->fbtrace/ > > { > > self->ts = timestamp; > > self->fbtrace = 1; > > } > > > > fbt:::return > > /self->trace && self->fbtrace/ > > { > > delta = timestamp - self->ts; > > total += delta; > > trace(delta); > > self->ts = 0; > > self->fbtrace = 0; > > } > > > > syscall::accept:return > > /self->trace == 1/ > > { > > printf("total time in accept: %d", timestamp - self->start_time); > > printf("total time in fbt: %d", total); > > self->trace = 0; > > exit(0); > > } > > > > CPU FUNCTION > > 14 <- set_active_fd 12380 > > 14 <- useracc 6352 > > 14 <- avl_numnodes 8796 > > 14 <- avl_numnodes 8316 > > 14 <- as_segcompar 9648 > > 14 <- as_segcompar 5768 > > 14 <- as_segcompar 5588 > > 14 <- as_segcompar 5420 > > 14 <- as_segcompar 5516 > > 14 <- as_segcompar 5652 > > 14 <- as_segcompar 5496 > > 14 <- as_segcompar 5456 > > 14 <- as_segcompar 5632 > > 14 <- segvn_checkprot 6856 > > 14 <- avl_numnodes 8548 > > 14 <- ufalloc 5688 > > 14 <- fd_find 9856 > > 14 <- fd_reserve 5632 > > 14 <- cv_wait_sig_swap 9696 > > 14 <- thread_lock 9284 > > 14 <- ts_sleep 9088 > > 14 <- exp_x 11056 > > 14 <- disp_lock_enter_high 5724 > > 14 <- disp_lock_exit_high 5484 > > 14 <- sleepq_insert 6544 > > 14 <- disp_lock_exit_nopreempt 5544 > > 14 <- sigcheck 5864 > > 14 <- disp_lock_enter 8860 > > 14 <- disp_lock_exit 5720 > > 14 <- disp_getwork 13684 > > 14 <- restore_mstate 5860 > > 14 <- schedctl_save 9724 > > 25 <- set_active_fd 14240 > > 25 <- useracc 6200 > > 25 <- avl_numnodes 8816 > > 25 <- avl_numnodes 8352 > > 25 <- as_segcompar 9312 > > 25 <- as_segcompar 5688 > > 25 <- as_segcompar 5488 > > 25 <- as_segcompar 5364 > > 25 <- as_segcompar 5464 > > 25 <- segvn_checkprot 6420 > > 25 <- avl_numnodes 8080 > > 25 <- ufalloc 5684 > > 25 <- fd_find 10060 > > 25 <- fd_reserve 5644 > > 25 <- cv_wait_sig_swap 9172 > > 25 <- thread_lock 9140 > > 25 <- ts_sleep 9076 > > 25 <- exp_x 10980 > > 25 <- disp_lock_enter_high 5708 > > 25 <- disp_lock_exit_high 5596 > > 25 <- sleepq_insert 5788 > > 25 <- disp_lock_exit_nopreempt 5492 > > 25 <- sigcheck 5972 > > 25 <- disp_lock_enter 8808 > > 25 <- disp_lock_exit 5476 > > 25 <- disp_getwork 13628 > > 25 <- restore_mstate 5724 > > 25 <- schedctl_save 10008 > > 22 <- set_active_fd 11616 > > 22 <- useracc 6340 > > 22 <- avl_numnodes 9616 > > 22 <- avl_numnodes 9332 > > 22 <- as_segcompar 10012 > > 22 <- as_segcompar 6180 > > 22 <- as_segcompar 6388 > > 22 <- as_segcompar 6132 > > 22 <- as_segcompar 5956 > > 22 <- as_segcompar 6316 > > 22 <- as_segcompar 6396 > > 22 <- segvn_checkprot 6648 > > 22 <- avl_numnodes 9360 > > 22 <- ufalloc 6320 > > 22 <- fd_find 11464 > > 22 <- fd_reserve 6596 > > 22 <- cv_wait_sig_swap 9968 > > 22 <- thread_lock 10080 > > 22 <- ts_sleep 10140 > > 22 <- exp_x 11460 > > 22 <- disp_lock_enter_high 6392 > > 22 <- disp_lock_exit_high 6348 > > 22 <- sleepq_insert 7280 > > 22 <- disp_lock_exit_nopreempt 6036 > > 22 <- sigcheck 6584 > > 22 <- disp_lock_enter 9860 > > 22 <- disp_lock_exit 6348 > > 22 <- disp_getwork 8300 > > 22 <- restore_mstate 5492 > > 22 <- schedctl_save 9256 > > 23 <- set_active_fd 13704 > > 23 <- useracc 7052 > > 23 <- avl_numnodes 10264 > > 23 <- avl_numnodes 9968 > > 23 <- as_segcompar 11836 > > 23 <- as_segcompar 6716 > > 23 <- as_segcompar 6404 > > 23 <- as_segcompar 6404 > > 23 <- as_segcompar 6324 > > 23 <- as_segcompar 6476 > > 23 <- as_segcompar 6344 > > 23 <- segvn_checkprot 6944 > > 23 <- avl_numnodes 9500 > > 23 <- ufalloc 6356 > > 23 <- fd_find 11888 > > 23 <- fd_reserve 6408 > > 23 <- cv_wait_sig_swap 10884 > > 23 <- thread_lock 10528 > > 23 <- ts_sleep 10352 > > 23 <- exp_x 11832 > > 23 <- disp_lock_enter_high 6084 > > 23 <- disp_lock_exit_high 6424 > > 23 <- sleepq_insert 6812 > > 23 <- disp_lock_exit_nopreempt 6276 > > 23 <- sigcheck 6784 > > 23 <- disp_lock_enter 10616 > > 23 <- disp_lock_exit 6232 > > 23 <- disp_getwork 14796 > > 23 <- restore_mstate 6448 > > 23 <- schedctl_save 11052 > > 23 <- set_active_fd 11116 > > 23 <- useracc 5840 > > 23 <- avl_numnodes 8640 > > 23 <- avl_numnodes 8192 > > 23 <- as_segcompar 9248 > > 23 <- as_segcompar 5628 > > 23 <- as_segcompar 5524 > > 23 <- as_segcompar 5456 > > 23 <- as_segcompar 5344 > > 23 <- as_segcompar 5396 > > 23 <- as_segcompar 5540 > > 23 <- as_segcompar 5548 > > 23 <- segvn_checkprot 6176 > > 23 <- avl_numnodes 8172 > > 23 <- ufalloc 5732 > > 23 <- fd_find 10420 > > 23 <- fd_reserve 5504 > > 23 <- cv_wait_sig_swap 8820 > > 23 <- thread_lock 9312 > > 23 <- ts_sleep 9000 > > 23 <- exp_x 10788 > > 23 <- disp_lock_enter_high 5724 > > 23 <- disp_lock_exit_high 5628 > > 23 <- sleepq_insert 5956 > > 23 <- disp_lock_exit_nopreempt 5456 > > 23 <- sigcheck 5876 > > 23 <- disp_lock_enter 8880 > > 23 <- disp_lock_exit 5628 > > 23 <- disp_getwork 13112 > > 23 <- restore_mstate 5500 > > 23 <- schedctl_save 9328 > > 29 <- set_active_fd 11988 > > 29 <- useracc 5880 > > 29 <- avl_numnodes 8604 > > 29 <- avl_numnodes 8116 > > 29 <- as_segcompar 9360 > > 29 <- as_segcompar 5724 > > 29 <- as_segcompar 5476 > > 29 <- as_segcompar 5416 > > 29 <- as_segcompar 5500 > > 29 <- as_segcompar 5668 > > 29 <- segvn_checkprot 6112 > > 29 <- avl_numnodes 8256 > > 29 <- ufalloc 5752 > > 29 <- fd_find 10624 > > 29 <- fd_reserve 5568 > > 29 <- cv_wait_sig_swap 8928 > > 29 <- thread_lock 9308 > > 29 <- ts_sleep 9036 > > 29 <- exp_x 11092 > > 29 <- disp_lock_enter_high 5656 > > 29 <- disp_lock_exit_high 5600 > > 29 <- sleepq_insert 6464 > > 29 <- disp_lock_exit_nopreempt 5604 > > 29 <- sigcheck 5856 > > 29 <- disp_lock_enter 9128 > > 29 <- disp_lock_exit 5728 > > 29 <- disp_getwork 13784 > > 29 <- restore_mstate 5644 > > 29 <- schedctl_save 9916 > > 15 <- set_active_fd 13660 > > 15 <- useracc 6912 > > 15 <- avl_numnodes 9976 > > 15 <- avl_numnodes 9256 > > 15 <- as_segcompar 10940 > > 15 <- as_segcompar 6752 > > 15 <- as_segcompar 6400 > > 15 <- as_segcompar 6320 > > 15 <- as_segcompar 6120 > > 15 <- as_segcompar 6436 > > 15 <- as_segcompar 6544 > > 15 <- as_segcompar 6124 > > 15 <- segvn_checkprot 7048 > > 15 <- avl_numnodes 10272 > > 15 <- ufalloc 6972 > > 15 <- fd_find 11868 > > 15 <- fd_reserve 6568 > > 15 <- cv_wait_sig_swap 11380 > > 15 <- thread_lock 10856 > > 15 <- ts_sleep 10400 > > 15 <- exp_x 12360 > > 15 <- disp_lock_enter_high 6424 > > 15 <- disp_lock_exit_high 6420 > > 15 <- sleepq_insert 7392 > > 15 <- disp_lock_exit_nopreempt 6452 > > 15 <- sigcheck 6808 > > 15 <- disp_lock_enter 10336 > > 15 <- disp_lock_exit 6412 > > 15 <- disp_getwork 15236 > > 15 <- restore_mstate 6712 > > 15 <- schedctl_save 11056 > > 14 <- schedctl_restore 13144 > > 14 <- sigcheck 5884 > > 14 <- sogetoff 6156 > > 14 <- bcopy 5724 > > 14 <- kmem_cache_alloc 10104 > > 14 <- getmajor 5544 > > 14 <- vn_reinit 6008 > > 14 <- vn_recycle 5404 > > 14 <- vn_exists 5748 > > 14 <- getzoneid 9004 > > 14 <- so_lock_single 5692 > > 14 <- kmem_cache_alloc 10160 > > 14 <- kmem_cache_alloc 9000 > > 14 <- setq 6568 > > 14 <- set_qend 5744 > > 14 <- kmem_cache_alloc 8780 > > 14 <- setq 6160 > > 14 <- entersq 5964 > > 14 <- getzoneid 8304 > > 14 <- vmem_alloc 9316 > > 14 <- kmem_cache_alloc 6344 > > 14 <- WR 5440 > > 14 <- WR 5336 > > 14 <- WR 8024 > > 14 <- strlock 9132 > > 14 <- set_nfsrv_ptr 5844 > > 14 <- set_nbsrv_ptr 5924 > > 14 <- set_qend 5816 > > 14 <- set_qend 5648 > > 14 <- strsetuio 5852 > > 14 <- strunlock 5664 > > 14 <- claimstr 9384 > > 14 <- cv_broadcast 8808 > > 14 <- claimstr 8128 > > 14 <- cv_broadcast 8132 > > 14 <- leavesq 5788 > > 14 <- cv_broadcast 5456 > > 14 <- so_unlock_single 5580 > > 14 <- WR 8288 > > 14 <- SAMESTR 8300 > > 14 <- SAMESTR 5376 > > 14 <- bcmp 5640 > > 14 <- kmem_cache_alloc 10544 > > 14 <- strsetrputhooks 8832 > > 14 <- strsetwputhooks 5684 > > 14 <- kmem_cache_alloc 8964 > > 14 <- bcopy 6064 > > 14 <- so_automatic_bind 5940 > > 14 <- bcopy 5668 > > 14 <- kmem_cache_alloc 11600 > > 14 <- bcopy 5956 > > 14 <- canputnext 8732 > > 14 <- strmakedata 5840 > > 14 <- crhold 8632 > > 14 <- stream_willservice 5680 > > 14 <- RD 10688 > > 14 <- kmem_cache_alloc 8924 > > 14 <- bcopy 5704 > > 14 <- crhold 5776 > > 14 <- cv_broadcast 11520 > > 14 <- tcp_eager_unlink 6616 > > 14 <- servicing_interrupt 8808 > > 14 <- cv_broadcast 8936 > > 14 <- tcp_maxpsz_set 5660 > > 14 <- tcp_fuse_set_rcv_hiwat 5808 > > 14 <- tcp_fuse_maxpsz_set 9756 > > 14 <- claimstr 9192 > > 14 <- strqset 6400 > > 14 <- cv_broadcast 8684 > > 14 <- kmem_cache_alloc 8984 > > 14 <- strsock_misc 9956 > > 14 <- dblk_lastfree 10760 > > 14 <- kmem_cache_free 6052 > > 14 <- qbackenable 5844 > > 14 <- strsock_misc 9364 > > 14 <- dblk_lastfree 9912 > > 14 <- kmem_cache_free 5832 > > 14 <- qbackenable 5576 > > 14 <- qenable_locked 15604 > > 14 <- tcp_fuse_syncstr_enable 10396 > > 14 <- tcp_fuse_syncstr_enable 5932 > > 14 <- cv_broadcast 5848 > > 14 <- stream_runservice 6320 > > 14 <- dblk_lastfree 8652 > > 14 <- kmem_cache_free 5804 > > 14 <- sowaitack 8880 > > 14 <- bcopy 19364 > > 14 <- crfree 11904 > > 14 <- kmem_cache_free 6284 > > 14 <- clear_active_fd 8456 > > 14 <- cv_broadcast 5528 > > 14 <- copyout_name 7592 > > 14 <- kmem_cache_alloc 11640 > > 14 <- crhold 5272 > > 14 <- cv_broadcast 8228 > > 14 <= accept total time in accept: > 9336163080 > > total time in fbt: > 2393624 > > > > > > Thanks, > > Justin > > > > > > Justin C. Lloyd > > Senior Engineer and System Administrator > > 303-684-4166 Office > > 720-480-0380 Cell > > 303-684-4100 Fax > > jlloyd at digitalglobe.com > > DigitalGlobe (r), An Imaging and Information Company > > > > > > > > > > > ------------------------------------------------------------------------ > > > > _______________________________________________ > > dtrace-discuss mailing list > > dtrace-discuss at opensolaris.org > > > _______________________________________________ > dtrace-discuss mailing list > dtrace-discuss at opensolaris.org > _______________________________________________ > dtrace-discuss mailing list > dtrace-discuss at opensolaris.org >-- Laissez-nous faire, laissez-nous passer. Le monde va de lui-m?me.
It''s not always that high. I''ve seen as low as just a few
seconds, an order of magnitude difference. Additionally, it only seems to be
happening with our OAS Java processes. I''m hoping to get an OAS test
instance set up on a second T2000 to see if the problem is reproduced there. We
don''t see this particular issue on our production E15K.
bash-3.00# dtrace -n
''syscall::accept:entry{self->ts=timestamp}syscall::accept:return/self->ts/{@sc[execname,
pid]=avg(timestamp-self->ts);self->ts=0}tick-10sec/i++==3/{exit(0)}''
dtrace: description ''syscall::accept:entry'' matched 3 probes
CPU ID FUNCTION:NAME
5 49826 :tick-10sec
opmn 9485 78475
emagent 1633 98236
webcached 890 113215
java 9490 11266338722
bash-3.00#
-----Original Message-----
From: Charles Martin [mailto:chasrmartin at gmail.com]
Sent: Thursday, April 26, 2007 11:07 AM
To: Justin Lloyd
Cc: dtrace-discuss at opensolaris.org
Subject: Re: [dtrace-discuss] Tracing accept() time
Wow.
That number almost looks like an artifact, of overflow or a
signed->unsigned conversion....
On 4/25/07, Justin Lloyd <jlloyd at digitalglobe.com>
wrote:> Whoa! Here''s the output of a new script, with a few lines for
context
> around something suspicious. The last column is the timestamp
> difference from the previous line. Every other time value in the
> output is between about 4000 and 7000. I now notice that the huge
> value is occurring when the thread is switched to a different CPU. Is this
normal?
>
> 28 <- swtch 4908
> 28 -> resume 4372
> 28 -> savectx 5428
> 28 -> schedctl_save 5068
> 28 <- schedctl_save 4324
> 28 <- savectx 4540
> 8 -> restorectx 19998571120
> 8 -> schedctl_restore 9648
> 8 <- schedctl_restore 5064
> 8 <- restorectx 5432
> 8 <- resume 4808
>
> Here''s the code:
>
> #!/usr/sbin/dtrace -s
>
> #pragma D option flowindent
>
> syscall::accept:entry
> /execname == "java" && ! tracing++/
> {
> self->trace = 1;
>
> self->ts = timestamp;
>
> this->ts = timestamp;
> trace(this->ts - self->ts);
> self->ts = this->ts;
> }
>
> fbt:::entry
> /self->trace/
> {
> this->ts = timestamp;
> trace(this->ts - self->ts);
> self->ts = this->ts;
> }
>
> fbt:::return
> /self->trace/
> {
> this->ts = timestamp;
> trace(this->ts - self->ts);
> self->ts = this->ts;
> }
>
> syscall::accept:return
> /self->trace/
> {
> this->ts = timestamp;
> trace(this->ts - self->ts);
> self->ts = this->ts;
>
> self->trace = 0;
>
> exit(0);
> }
>
>
>
> -----Original Message-----
> From: dtrace-discuss-bounces at opensolaris.org
> [mailto:dtrace-discuss-bounces at opensolaris.org] On Behalf Of Justin
> Lloyd
> Sent: Wednesday, April 25, 2007 3:59 PM
> To: Chip Bennett
> Cc: dtrace-discuss at opensolaris.org
> Subject: RE: [dtrace-discuss] Tracing accept() time
>
> Chip,
>
> More than anything I think you''ve made me realize that my logic
here
> is completely wrong, since in your example my flow would include
c''s
> time in b''s output and b''s time (including c''s)
in a''s output (I
> think...I''m starting to confuse myself).
>
> I need to completely re-think how I want to break down the system
> call''s time, though I don''t think that my incorrect logic
accounts for
> the huge difference in accept() times between the first and second D
scripts.
>
> Obviously what I''m trying to do here is figure out why the first
> script reports such a high value and, if that 20 sec value is correct,
> where that time is being spent.
>
> Justin
>
> -----Original Message-----
> From: Chip Bennett [mailto:cbennett at laurustech.com]
> Sent: Wednesday, April 25, 2007 3:41 PM
> To: Justin Lloyd
> Cc: dtrace-discuss at opensolaris.org
> Subject: Re: [dtrace-discuss] Tracing accept() time
>
> Justin,
>
> I don''t think your fbt entry and return probes are pairing off
> correctly.
>
> Let''s suppose that your fbt call stack is a->b->c. When you
call "a",
> since self->fbtrace is 0, the clause is executed. This sets
> self->fbtrace equal to 1. The next events are the calls to
"b" and
> self->"c",
> which are ignored because self->fbtrace is now 1.
>
> The next event is the return from "c", which, since
self->fbtrace is
> true, executes its clause and calculates the delta. But this delta is
> the execution time for the parts of "a" and "b" that
happen before
> "c", and all of "c". The next events are the returns
from "b" and "a"
> which are ignored because self->fbtrace is now 0.
>
> At this point, it appears to me that you have only captured part of
> the time for the kernel execution.
>
> Of course my analysis of the flow could be wrong. :-)
>
> Chip
>
> Justin Lloyd wrote:
> > Ignoring the long story behind why I''m doing this (and it may
be the
> wrong track to take), I''ve been trying to use my meager ranks in
> Knowledge(dtrace) to figure out why a test we''re doing regarding
an
> OAS HTTP/JVM problem is taking significantly longer on a T2000 than on
> a dual dual-core Dell Linux box.
> >
> > One strange thing I noticed when looking at average system call
> > times
> for the java and httpd processes is the accept() times:
> >
> > syscall::accept:entry
> > /execname=="java"/
> > {
> > self->ts = timestamp;
> > }
> >
> > syscall::accept:return
> > /self->ts/
> > {
> > @sc[pid] = avg(timestamp-self->ts);
> > self->ts=0;
> > }
> >
> > This code yields the following PID and average accept() times:
> >
> > 9490 5032706459
> > 10392 19998243844
> > 10400 19998255673
> > 10398 19998289978
> > 10393 19998312751
> > 10395 19998312895
> > 10396 19998842144
> >
> > So accept() is taking an average of 20 seconds to complete?!
> >
> > Then I tried tracing using fbt and the numbers don''t add up.
What am
> > I
> overlooking? Where is the missing time or where is my code and/or
> understanding wrong? (The following example, accept() only takes 9
> seconds, but it is just catching the first one.)
> >
> > BEGIN
> > {
> > total = 0;
> > }
> >
> > syscall::accept:entry
> > /execname == "java" && tracing++/
> > {
> > self->start_time = timestamp;
> > self->trace = 1;
> > }
> >
> > fbt:::entry
> > /self->trace && ! self->fbtrace/
> > {
> > self->ts = timestamp;
> > self->fbtrace = 1;
> > }
> >
> > fbt:::return
> > /self->trace && self->fbtrace/
> > {
> > delta = timestamp - self->ts;
> > total += delta;
> > trace(delta);
> > self->ts = 0;
> > self->fbtrace = 0;
> > }
> >
> > syscall::accept:return
> > /self->trace == 1/
> > {
> > printf("total time in accept: %d", timestamp -
self->start_time);
> > printf("total time in fbt: %d", total);
> > self->trace = 0;
> > exit(0);
> > }
> >
> > CPU FUNCTION
> > 14 <- set_active_fd 12380
> > 14 <- useracc 6352
> > 14 <- avl_numnodes 8796
> > 14 <- avl_numnodes 8316
> > 14 <- as_segcompar 9648
> > 14 <- as_segcompar 5768
> > 14 <- as_segcompar 5588
> > 14 <- as_segcompar 5420
> > 14 <- as_segcompar 5516
> > 14 <- as_segcompar 5652
> > 14 <- as_segcompar 5496
> > 14 <- as_segcompar 5456
> > 14 <- as_segcompar 5632
> > 14 <- segvn_checkprot 6856
> > 14 <- avl_numnodes 8548
> > 14 <- ufalloc 5688
> > 14 <- fd_find 9856
> > 14 <- fd_reserve 5632
> > 14 <- cv_wait_sig_swap 9696
> > 14 <- thread_lock 9284
> > 14 <- ts_sleep 9088
> > 14 <- exp_x 11056
> > 14 <- disp_lock_enter_high 5724
> > 14 <- disp_lock_exit_high 5484
> > 14 <- sleepq_insert 6544
> > 14 <- disp_lock_exit_nopreempt 5544
> > 14 <- sigcheck 5864
> > 14 <- disp_lock_enter 8860
> > 14 <- disp_lock_exit 5720
> > 14 <- disp_getwork 13684
> > 14 <- restore_mstate 5860
> > 14 <- schedctl_save 9724
> > 25 <- set_active_fd 14240
> > 25 <- useracc 6200
> > 25 <- avl_numnodes 8816
> > 25 <- avl_numnodes 8352
> > 25 <- as_segcompar 9312
> > 25 <- as_segcompar 5688
> > 25 <- as_segcompar 5488
> > 25 <- as_segcompar 5364
> > 25 <- as_segcompar 5464
> > 25 <- segvn_checkprot 6420
> > 25 <- avl_numnodes 8080
> > 25 <- ufalloc 5684
> > 25 <- fd_find 10060
> > 25 <- fd_reserve 5644
> > 25 <- cv_wait_sig_swap 9172
> > 25 <- thread_lock 9140
> > 25 <- ts_sleep 9076
> > 25 <- exp_x 10980
> > 25 <- disp_lock_enter_high 5708
> > 25 <- disp_lock_exit_high 5596
> > 25 <- sleepq_insert 5788
> > 25 <- disp_lock_exit_nopreempt 5492
> > 25 <- sigcheck 5972
> > 25 <- disp_lock_enter 8808
> > 25 <- disp_lock_exit 5476
> > 25 <- disp_getwork 13628
> > 25 <- restore_mstate 5724
> > 25 <- schedctl_save 10008
> > 22 <- set_active_fd 11616
> > 22 <- useracc 6340
> > 22 <- avl_numnodes 9616
> > 22 <- avl_numnodes 9332
> > 22 <- as_segcompar 10012
> > 22 <- as_segcompar 6180
> > 22 <- as_segcompar 6388
> > 22 <- as_segcompar 6132
> > 22 <- as_segcompar 5956
> > 22 <- as_segcompar 6316
> > 22 <- as_segcompar 6396
> > 22 <- segvn_checkprot 6648
> > 22 <- avl_numnodes 9360
> > 22 <- ufalloc 6320
> > 22 <- fd_find 11464
> > 22 <- fd_reserve 6596
> > 22 <- cv_wait_sig_swap 9968
> > 22 <- thread_lock 10080
> > 22 <- ts_sleep 10140
> > 22 <- exp_x 11460
> > 22 <- disp_lock_enter_high 6392
> > 22 <- disp_lock_exit_high 6348
> > 22 <- sleepq_insert 7280
> > 22 <- disp_lock_exit_nopreempt 6036
> > 22 <- sigcheck 6584
> > 22 <- disp_lock_enter 9860
> > 22 <- disp_lock_exit 6348
> > 22 <- disp_getwork 8300
> > 22 <- restore_mstate 5492
> > 22 <- schedctl_save 9256
> > 23 <- set_active_fd 13704
> > 23 <- useracc 7052
> > 23 <- avl_numnodes 10264
> > 23 <- avl_numnodes 9968
> > 23 <- as_segcompar 11836
> > 23 <- as_segcompar 6716
> > 23 <- as_segcompar 6404
> > 23 <- as_segcompar 6404
> > 23 <- as_segcompar 6324
> > 23 <- as_segcompar 6476
> > 23 <- as_segcompar 6344
> > 23 <- segvn_checkprot 6944
> > 23 <- avl_numnodes 9500
> > 23 <- ufalloc 6356
> > 23 <- fd_find 11888
> > 23 <- fd_reserve 6408
> > 23 <- cv_wait_sig_swap 10884
> > 23 <- thread_lock 10528
> > 23 <- ts_sleep 10352
> > 23 <- exp_x 11832
> > 23 <- disp_lock_enter_high 6084
> > 23 <- disp_lock_exit_high 6424
> > 23 <- sleepq_insert 6812
> > 23 <- disp_lock_exit_nopreempt 6276
> > 23 <- sigcheck 6784
> > 23 <- disp_lock_enter 10616
> > 23 <- disp_lock_exit 6232
> > 23 <- disp_getwork 14796
> > 23 <- restore_mstate 6448
> > 23 <- schedctl_save 11052
> > 23 <- set_active_fd 11116
> > 23 <- useracc 5840
> > 23 <- avl_numnodes 8640
> > 23 <- avl_numnodes 8192
> > 23 <- as_segcompar 9248
> > 23 <- as_segcompar 5628
> > 23 <- as_segcompar 5524
> > 23 <- as_segcompar 5456
> > 23 <- as_segcompar 5344
> > 23 <- as_segcompar 5396
> > 23 <- as_segcompar 5540
> > 23 <- as_segcompar 5548
> > 23 <- segvn_checkprot 6176
> > 23 <- avl_numnodes 8172
> > 23 <- ufalloc 5732
> > 23 <- fd_find 10420
> > 23 <- fd_reserve 5504
> > 23 <- cv_wait_sig_swap 8820
> > 23 <- thread_lock 9312
> > 23 <- ts_sleep 9000
> > 23 <- exp_x 10788
> > 23 <- disp_lock_enter_high 5724
> > 23 <- disp_lock_exit_high 5628
> > 23 <- sleepq_insert 5956
> > 23 <- disp_lock_exit_nopreempt 5456
> > 23 <- sigcheck 5876
> > 23 <- disp_lock_enter 8880
> > 23 <- disp_lock_exit 5628
> > 23 <- disp_getwork 13112
> > 23 <- restore_mstate 5500
> > 23 <- schedctl_save 9328
> > 29 <- set_active_fd 11988
> > 29 <- useracc 5880
> > 29 <- avl_numnodes 8604
> > 29 <- avl_numnodes 8116
> > 29 <- as_segcompar 9360
> > 29 <- as_segcompar 5724
> > 29 <- as_segcompar 5476
> > 29 <- as_segcompar 5416
> > 29 <- as_segcompar 5500
> > 29 <- as_segcompar 5668
> > 29 <- segvn_checkprot 6112
> > 29 <- avl_numnodes 8256
> > 29 <- ufalloc 5752
> > 29 <- fd_find 10624
> > 29 <- fd_reserve 5568
> > 29 <- cv_wait_sig_swap 8928
> > 29 <- thread_lock 9308
> > 29 <- ts_sleep 9036
> > 29 <- exp_x 11092
> > 29 <- disp_lock_enter_high 5656
> > 29 <- disp_lock_exit_high 5600
> > 29 <- sleepq_insert 6464
> > 29 <- disp_lock_exit_nopreempt 5604
> > 29 <- sigcheck 5856
> > 29 <- disp_lock_enter 9128
> > 29 <- disp_lock_exit 5728
> > 29 <- disp_getwork 13784
> > 29 <- restore_mstate 5644
> > 29 <- schedctl_save 9916
> > 15 <- set_active_fd 13660
> > 15 <- useracc 6912
> > 15 <- avl_numnodes 9976
> > 15 <- avl_numnodes 9256
> > 15 <- as_segcompar 10940
> > 15 <- as_segcompar 6752
> > 15 <- as_segcompar 6400
> > 15 <- as_segcompar 6320
> > 15 <- as_segcompar 6120
> > 15 <- as_segcompar 6436
> > 15 <- as_segcompar 6544
> > 15 <- as_segcompar 6124
> > 15 <- segvn_checkprot 7048
> > 15 <- avl_numnodes 10272
> > 15 <- ufalloc 6972
> > 15 <- fd_find 11868
> > 15 <- fd_reserve 6568
> > 15 <- cv_wait_sig_swap 11380
> > 15 <- thread_lock 10856
> > 15 <- ts_sleep 10400
> > 15 <- exp_x 12360
> > 15 <- disp_lock_enter_high 6424
> > 15 <- disp_lock_exit_high 6420
> > 15 <- sleepq_insert 7392
> > 15 <- disp_lock_exit_nopreempt 6452
> > 15 <- sigcheck 6808
> > 15 <- disp_lock_enter 10336
> > 15 <- disp_lock_exit 6412
> > 15 <- disp_getwork 15236
> > 15 <- restore_mstate 6712
> > 15 <- schedctl_save 11056
> > 14 <- schedctl_restore 13144
> > 14 <- sigcheck 5884
> > 14 <- sogetoff 6156
> > 14 <- bcopy 5724
> > 14 <- kmem_cache_alloc 10104
> > 14 <- getmajor 5544
> > 14 <- vn_reinit 6008
> > 14 <- vn_recycle 5404
> > 14 <- vn_exists 5748
> > 14 <- getzoneid 9004
> > 14 <- so_lock_single 5692
> > 14 <- kmem_cache_alloc 10160
> > 14 <- kmem_cache_alloc 9000
> > 14 <- setq 6568
> > 14 <- set_qend 5744
> > 14 <- kmem_cache_alloc 8780
> > 14 <- setq 6160
> > 14 <- entersq 5964
> > 14 <- getzoneid 8304
> > 14 <- vmem_alloc 9316
> > 14 <- kmem_cache_alloc 6344
> > 14 <- WR 5440
> > 14 <- WR 5336
> > 14 <- WR 8024
> > 14 <- strlock 9132
> > 14 <- set_nfsrv_ptr 5844
> > 14 <- set_nbsrv_ptr 5924
> > 14 <- set_qend 5816
> > 14 <- set_qend 5648
> > 14 <- strsetuio 5852
> > 14 <- strunlock 5664
> > 14 <- claimstr 9384
> > 14 <- cv_broadcast 8808
> > 14 <- claimstr 8128
> > 14 <- cv_broadcast 8132
> > 14 <- leavesq 5788
> > 14 <- cv_broadcast 5456
> > 14 <- so_unlock_single 5580
> > 14 <- WR 8288
> > 14 <- SAMESTR 8300
> > 14 <- SAMESTR 5376
> > 14 <- bcmp 5640
> > 14 <- kmem_cache_alloc 10544
> > 14 <- strsetrputhooks 8832
> > 14 <- strsetwputhooks 5684
> > 14 <- kmem_cache_alloc 8964
> > 14 <- bcopy 6064
> > 14 <- so_automatic_bind 5940
> > 14 <- bcopy 5668
> > 14 <- kmem_cache_alloc 11600
> > 14 <- bcopy 5956
> > 14 <- canputnext 8732
> > 14 <- strmakedata 5840
> > 14 <- crhold 8632
> > 14 <- stream_willservice 5680
> > 14 <- RD 10688
> > 14 <- kmem_cache_alloc 8924
> > 14 <- bcopy 5704
> > 14 <- crhold 5776
> > 14 <- cv_broadcast 11520
> > 14 <- tcp_eager_unlink 6616
> > 14 <- servicing_interrupt 8808
> > 14 <- cv_broadcast 8936
> > 14 <- tcp_maxpsz_set 5660
> > 14 <- tcp_fuse_set_rcv_hiwat 5808
> > 14 <- tcp_fuse_maxpsz_set 9756
> > 14 <- claimstr 9192
> > 14 <- strqset 6400
> > 14 <- cv_broadcast 8684
> > 14 <- kmem_cache_alloc 8984
> > 14 <- strsock_misc 9956
> > 14 <- dblk_lastfree 10760
> > 14 <- kmem_cache_free 6052
> > 14 <- qbackenable 5844
> > 14 <- strsock_misc 9364
> > 14 <- dblk_lastfree 9912
> > 14 <- kmem_cache_free 5832
> > 14 <- qbackenable 5576
> > 14 <- qenable_locked 15604
> > 14 <- tcp_fuse_syncstr_enable 10396
> > 14 <- tcp_fuse_syncstr_enable 5932
> > 14 <- cv_broadcast 5848
> > 14 <- stream_runservice 6320
> > 14 <- dblk_lastfree 8652
> > 14 <- kmem_cache_free 5804
> > 14 <- sowaitack 8880
> > 14 <- bcopy 19364
> > 14 <- crfree 11904
> > 14 <- kmem_cache_free 6284
> > 14 <- clear_active_fd 8456
> > 14 <- cv_broadcast 5528
> > 14 <- copyout_name 7592
> > 14 <- kmem_cache_alloc 11640
> > 14 <- crhold 5272
> > 14 <- cv_broadcast 8228
> > 14 <= accept total time in accept:
> 9336163080
> > total time in fbt:
> 2393624
> >
> >
> > Thanks,
> > Justin
> >
> >
> > Justin C. Lloyd
> > Senior Engineer and System Administrator
> > 303-684-4166 Office
> > 720-480-0380 Cell
> > 303-684-4100 Fax
> > jlloyd at digitalglobe.com
> > DigitalGlobe (r), An Imaging and Information Company
> >
> >
> >
> >
> >
> ----------------------------------------------------------------------
> --
> >
> > _______________________________________________
> > dtrace-discuss mailing list
> > dtrace-discuss at opensolaris.org
>
>
> _______________________________________________
> dtrace-discuss mailing list
> dtrace-discuss at opensolaris.org
> _______________________________________________
> dtrace-discuss mailing list
> dtrace-discuss at opensolaris.org
>
--
Laissez-nous faire, laissez-nous passer.
Le monde va de lui-m?me.
Hmm, I take that back, but I don''t recall seeing it yesterday. And here
it didn''t even change CPUs. Curiouser and curioser.
449 <- swtch 3441
449 -> resume 2080
449 -> savectx 3041
449 -> schedctl_save 2400
449 <- schedctl_save 1921
449 <- savectx 2000
449 -> restorectx 5178600088
449 -> schedctl_restore 12723
449 <- schedctl_restore 3760
449 <- restorectx 4081
449 <- resume 2641
449 -> sigcheck 3681
449 <- sigcheck 2800
-----Original Message-----
From: Justin Lloyd
Sent: Thursday, April 26, 2007 12:09 PM
To: ''Charles Martin''
Cc: dtrace-discuss at opensolaris.org
Subject: RE: [dtrace-discuss] Tracing accept() time
It''s not always that high. I''ve seen as low as just a few
seconds, an order of magnitude difference. Additionally, it only seems to be
happening with our OAS Java processes. I''m hoping to get an OAS test
instance set up on a second T2000 to see if the problem is reproduced there. We
don''t see this particular issue on our production E15K.
bash-3.00# dtrace -n
''syscall::accept:entry{self->ts=timestamp}syscall::accept:return/self->ts/{@sc[execname,
pid]=avg(timestamp-self->ts);self->ts=0}tick-10sec/i++==3/{exit(0)}''
dtrace: description ''syscall::accept:entry'' matched 3 probes
CPU ID FUNCTION:NAME
5 49826 :tick-10sec
opmn 9485 78475
emagent 1633 98236
webcached 890 113215
java 9490 11266338722
bash-3.00#
-----Original Message-----
From: Charles Martin [mailto:chasrmartin at gmail.com]
Sent: Thursday, April 26, 2007 11:07 AM
To: Justin Lloyd
Cc: dtrace-discuss at opensolaris.org
Subject: Re: [dtrace-discuss] Tracing accept() time
Wow.
That number almost looks like an artifact, of overflow or a
signed->unsigned conversion....
On 4/25/07, Justin Lloyd <jlloyd at digitalglobe.com>
wrote:> Whoa! Here''s the output of a new script, with a few lines for
context
> around something suspicious. The last column is the timestamp
> difference from the previous line. Every other time value in the
> output is between about 4000 and 7000. I now notice that the huge
> value is occurring when the thread is switched to a different CPU. Is this
normal?
>
> 28 <- swtch 4908
> 28 -> resume 4372
> 28 -> savectx 5428
> 28 -> schedctl_save 5068
> 28 <- schedctl_save 4324
> 28 <- savectx 4540
> 8 -> restorectx 19998571120
> 8 -> schedctl_restore 9648
> 8 <- schedctl_restore 5064
> 8 <- restorectx 5432
> 8 <- resume 4808
>
> Here''s the code:
>
> #!/usr/sbin/dtrace -s
>
> #pragma D option flowindent
>
> syscall::accept:entry
> /execname == "java" && ! tracing++/
> {
> self->trace = 1;
>
> self->ts = timestamp;
>
> this->ts = timestamp;
> trace(this->ts - self->ts);
> self->ts = this->ts;
> }
>
> fbt:::entry
> /self->trace/
> {
> this->ts = timestamp;
> trace(this->ts - self->ts);
> self->ts = this->ts;
> }
>
> fbt:::return
> /self->trace/
> {
> this->ts = timestamp;
> trace(this->ts - self->ts);
> self->ts = this->ts;
> }
>
> syscall::accept:return
> /self->trace/
> {
> this->ts = timestamp;
> trace(this->ts - self->ts);
> self->ts = this->ts;
>
> self->trace = 0;
>
> exit(0);
> }
>
>
>
> -----Original Message-----
> From: dtrace-discuss-bounces at opensolaris.org
> [mailto:dtrace-discuss-bounces at opensolaris.org] On Behalf Of Justin
> Lloyd
> Sent: Wednesday, April 25, 2007 3:59 PM
> To: Chip Bennett
> Cc: dtrace-discuss at opensolaris.org
> Subject: RE: [dtrace-discuss] Tracing accept() time
>
> Chip,
>
> More than anything I think you''ve made me realize that my logic
here
> is completely wrong, since in your example my flow would include
c''s
> time in b''s output and b''s time (including c''s)
in a''s output (I
> think...I''m starting to confuse myself).
>
> I need to completely re-think how I want to break down the system
> call''s time, though I don''t think that my incorrect logic
accounts for
> the huge difference in accept() times between the first and second D
scripts.
>
> Obviously what I''m trying to do here is figure out why the first
> script reports such a high value and, if that 20 sec value is correct,
> where that time is being spent.
>
> Justin
>
> -----Original Message-----
> From: Chip Bennett [mailto:cbennett at laurustech.com]
> Sent: Wednesday, April 25, 2007 3:41 PM
> To: Justin Lloyd
> Cc: dtrace-discuss at opensolaris.org
> Subject: Re: [dtrace-discuss] Tracing accept() time
>
> Justin,
>
> I don''t think your fbt entry and return probes are pairing off
> correctly.
>
> Let''s suppose that your fbt call stack is a->b->c. When you
call "a",
> since self->fbtrace is 0, the clause is executed. This sets
> self->fbtrace equal to 1. The next events are the calls to
"b" and
> self->"c",
> which are ignored because self->fbtrace is now 1.
>
> The next event is the return from "c", which, since
self->fbtrace is
> true, executes its clause and calculates the delta. But this delta is
> the execution time for the parts of "a" and "b" that
happen before
> "c", and all of "c". The next events are the returns
from "b" and "a"
> which are ignored because self->fbtrace is now 0.
>
> At this point, it appears to me that you have only captured part of
> the time for the kernel execution.
>
> Of course my analysis of the flow could be wrong. :-)
>
> Chip
>
> Justin Lloyd wrote:
> > Ignoring the long story behind why I''m doing this (and it may
be the
> wrong track to take), I''ve been trying to use my meager ranks in
> Knowledge(dtrace) to figure out why a test we''re doing regarding
an
> OAS HTTP/JVM problem is taking significantly longer on a T2000 than on
> a dual dual-core Dell Linux box.
> >
> > One strange thing I noticed when looking at average system call
> > times
> for the java and httpd processes is the accept() times:
> >
> > syscall::accept:entry
> > /execname=="java"/
> > {
> > self->ts = timestamp;
> > }
> >
> > syscall::accept:return
> > /self->ts/
> > {
> > @sc[pid] = avg(timestamp-self->ts);
> > self->ts=0;
> > }
> >
> > This code yields the following PID and average accept() times:
> >
> > 9490 5032706459
> > 10392 19998243844
> > 10400 19998255673
> > 10398 19998289978
> > 10393 19998312751
> > 10395 19998312895
> > 10396 19998842144
> >
> > So accept() is taking an average of 20 seconds to complete?!
> >
> > Then I tried tracing using fbt and the numbers don''t add up.
What am
> > I
> overlooking? Where is the missing time or where is my code and/or
> understanding wrong? (The following example, accept() only takes 9
> seconds, but it is just catching the first one.)
> >
> > BEGIN
> > {
> > total = 0;
> > }
> >
> > syscall::accept:entry
> > /execname == "java" && tracing++/
> > {
> > self->start_time = timestamp;
> > self->trace = 1;
> > }
> >
> > fbt:::entry
> > /self->trace && ! self->fbtrace/
> > {
> > self->ts = timestamp;
> > self->fbtrace = 1;
> > }
> >
> > fbt:::return
> > /self->trace && self->fbtrace/
> > {
> > delta = timestamp - self->ts;
> > total += delta;
> > trace(delta);
> > self->ts = 0;
> > self->fbtrace = 0;
> > }
> >
> > syscall::accept:return
> > /self->trace == 1/
> > {
> > printf("total time in accept: %d", timestamp -
self->start_time);
> > printf("total time in fbt: %d", total);
> > self->trace = 0;
> > exit(0);
> > }
> >
> > CPU FUNCTION
> > 14 <- set_active_fd 12380
> > 14 <- useracc 6352
> > 14 <- avl_numnodes 8796
> > 14 <- avl_numnodes 8316
> > 14 <- as_segcompar 9648
> > 14 <- as_segcompar 5768
> > 14 <- as_segcompar 5588
> > 14 <- as_segcompar 5420
> > 14 <- as_segcompar 5516
> > 14 <- as_segcompar 5652
> > 14 <- as_segcompar 5496
> > 14 <- as_segcompar 5456
> > 14 <- as_segcompar 5632
> > 14 <- segvn_checkprot 6856
> > 14 <- avl_numnodes 8548
> > 14 <- ufalloc 5688
> > 14 <- fd_find 9856
> > 14 <- fd_reserve 5632
> > 14 <- cv_wait_sig_swap 9696
> > 14 <- thread_lock 9284
> > 14 <- ts_sleep 9088
> > 14 <- exp_x 11056
> > 14 <- disp_lock_enter_high 5724
> > 14 <- disp_lock_exit_high 5484
> > 14 <- sleepq_insert 6544
> > 14 <- disp_lock_exit_nopreempt 5544
> > 14 <- sigcheck 5864
> > 14 <- disp_lock_enter 8860
> > 14 <- disp_lock_exit 5720
> > 14 <- disp_getwork 13684
> > 14 <- restore_mstate 5860
> > 14 <- schedctl_save 9724
> > 25 <- set_active_fd 14240
> > 25 <- useracc 6200
> > 25 <- avl_numnodes 8816
> > 25 <- avl_numnodes 8352
> > 25 <- as_segcompar 9312
> > 25 <- as_segcompar 5688
> > 25 <- as_segcompar 5488
> > 25 <- as_segcompar 5364
> > 25 <- as_segcompar 5464
> > 25 <- segvn_checkprot 6420
> > 25 <- avl_numnodes 8080
> > 25 <- ufalloc 5684
> > 25 <- fd_find 10060
> > 25 <- fd_reserve 5644
> > 25 <- cv_wait_sig_swap 9172
> > 25 <- thread_lock 9140
> > 25 <- ts_sleep 9076
> > 25 <- exp_x 10980
> > 25 <- disp_lock_enter_high 5708
> > 25 <- disp_lock_exit_high 5596
> > 25 <- sleepq_insert 5788
> > 25 <- disp_lock_exit_nopreempt 5492
> > 25 <- sigcheck 5972
> > 25 <- disp_lock_enter 8808
> > 25 <- disp_lock_exit 5476
> > 25 <- disp_getwork 13628
> > 25 <- restore_mstate 5724
> > 25 <- schedctl_save 10008
> > 22 <- set_active_fd 11616
> > 22 <- useracc 6340
> > 22 <- avl_numnodes 9616
> > 22 <- avl_numnodes 9332
> > 22 <- as_segcompar 10012
> > 22 <- as_segcompar 6180
> > 22 <- as_segcompar 6388
> > 22 <- as_segcompar 6132
> > 22 <- as_segcompar 5956
> > 22 <- as_segcompar 6316
> > 22 <- as_segcompar 6396
> > 22 <- segvn_checkprot 6648
> > 22 <- avl_numnodes 9360
> > 22 <- ufalloc 6320
> > 22 <- fd_find 11464
> > 22 <- fd_reserve 6596
> > 22 <- cv_wait_sig_swap 9968
> > 22 <- thread_lock 10080
> > 22 <- ts_sleep 10140
> > 22 <- exp_x 11460
> > 22 <- disp_lock_enter_high 6392
> > 22 <- disp_lock_exit_high 6348
> > 22 <- sleepq_insert 7280
> > 22 <- disp_lock_exit_nopreempt 6036
> > 22 <- sigcheck 6584
> > 22 <- disp_lock_enter 9860
> > 22 <- disp_lock_exit 6348
> > 22 <- disp_getwork 8300
> > 22 <- restore_mstate 5492
> > 22 <- schedctl_save 9256
> > 23 <- set_active_fd 13704
> > 23 <- useracc 7052
> > 23 <- avl_numnodes 10264
> > 23 <- avl_numnodes 9968
> > 23 <- as_segcompar 11836
> > 23 <- as_segcompar 6716
> > 23 <- as_segcompar 6404
> > 23 <- as_segcompar 6404
> > 23 <- as_segcompar 6324
> > 23 <- as_segcompar 6476
> > 23 <- as_segcompar 6344
> > 23 <- segvn_checkprot 6944
> > 23 <- avl_numnodes 9500
> > 23 <- ufalloc 6356
> > 23 <- fd_find 11888
> > 23 <- fd_reserve 6408
> > 23 <- cv_wait_sig_swap 10884
> > 23 <- thread_lock 10528
> > 23 <- ts_sleep 10352
> > 23 <- exp_x 11832
> > 23 <- disp_lock_enter_high 6084
> > 23 <- disp_lock_exit_high 6424
> > 23 <- sleepq_insert 6812
> > 23 <- disp_lock_exit_nopreempt 6276
> > 23 <- sigcheck 6784
> > 23 <- disp_lock_enter 10616
> > 23 <- disp_lock_exit 6232
> > 23 <- disp_getwork 14796
> > 23 <- restore_mstate 6448
> > 23 <- schedctl_save 11052
> > 23 <- set_active_fd 11116
> > 23 <- useracc 5840
> > 23 <- avl_numnodes 8640
> > 23 <- avl_numnodes 8192
> > 23 <- as_segcompar 9248
> > 23 <- as_segcompar 5628
> > 23 <- as_segcompar 5524
> > 23 <- as_segcompar 5456
> > 23 <- as_segcompar 5344
> > 23 <- as_segcompar 5396
> > 23 <- as_segcompar 5540
> > 23 <- as_segcompar 5548
> > 23 <- segvn_checkprot 6176
> > 23 <- avl_numnodes 8172
> > 23 <- ufalloc 5732
> > 23 <- fd_find 10420
> > 23 <- fd_reserve 5504
> > 23 <- cv_wait_sig_swap 8820
> > 23 <- thread_lock 9312
> > 23 <- ts_sleep 9000
> > 23 <- exp_x 10788
> > 23 <- disp_lock_enter_high 5724
> > 23 <- disp_lock_exit_high 5628
> > 23 <- sleepq_insert 5956
> > 23 <- disp_lock_exit_nopreempt 5456
> > 23 <- sigcheck 5876
> > 23 <- disp_lock_enter 8880
> > 23 <- disp_lock_exit 5628
> > 23 <- disp_getwork 13112
> > 23 <- restore_mstate 5500
> > 23 <- schedctl_save 9328
> > 29 <- set_active_fd 11988
> > 29 <- useracc 5880
> > 29 <- avl_numnodes 8604
> > 29 <- avl_numnodes 8116
> > 29 <- as_segcompar 9360
> > 29 <- as_segcompar 5724
> > 29 <- as_segcompar 5476
> > 29 <- as_segcompar 5416
> > 29 <- as_segcompar 5500
> > 29 <- as_segcompar 5668
> > 29 <- segvn_checkprot 6112
> > 29 <- avl_numnodes 8256
> > 29 <- ufalloc 5752
> > 29 <- fd_find 10624
> > 29 <- fd_reserve 5568
> > 29 <- cv_wait_sig_swap 8928
> > 29 <- thread_lock 9308
> > 29 <- ts_sleep 9036
> > 29 <- exp_x 11092
> > 29 <- disp_lock_enter_high 5656
> > 29 <- disp_lock_exit_high 5600
> > 29 <- sleepq_insert 6464
> > 29 <- disp_lock_exit_nopreempt 5604
> > 29 <- sigcheck 5856
> > 29 <- disp_lock_enter 9128
> > 29 <- disp_lock_exit 5728
> > 29 <- disp_getwork 13784
> > 29 <- restore_mstate 5644
> > 29 <- schedctl_save 9916
> > 15 <- set_active_fd 13660
> > 15 <- useracc 6912
> > 15 <- avl_numnodes 9976
> > 15 <- avl_numnodes 9256
> > 15 <- as_segcompar 10940
> > 15 <- as_segcompar 6752
> > 15 <- as_segcompar 6400
> > 15 <- as_segcompar 6320
> > 15 <- as_segcompar 6120
> > 15 <- as_segcompar 6436
> > 15 <- as_segcompar 6544
> > 15 <- as_segcompar 6124
> > 15 <- segvn_checkprot 7048
> > 15 <- avl_numnodes 10272
> > 15 <- ufalloc 6972
> > 15 <- fd_find 11868
> > 15 <- fd_reserve 6568
> > 15 <- cv_wait_sig_swap 11380
> > 15 <- thread_lock 10856
> > 15 <- ts_sleep 10400
> > 15 <- exp_x 12360
> > 15 <- disp_lock_enter_high 6424
> > 15 <- disp_lock_exit_high 6420
> > 15 <- sleepq_insert 7392
> > 15 <- disp_lock_exit_nopreempt 6452
> > 15 <- sigcheck 6808
> > 15 <- disp_lock_enter 10336
> > 15 <- disp_lock_exit 6412
> > 15 <- disp_getwork 15236
> > 15 <- restore_mstate 6712
> > 15 <- schedctl_save 11056
> > 14 <- schedctl_restore 13144
> > 14 <- sigcheck 5884
> > 14 <- sogetoff 6156
> > 14 <- bcopy 5724
> > 14 <- kmem_cache_alloc 10104
> > 14 <- getmajor 5544
> > 14 <- vn_reinit 6008
> > 14 <- vn_recycle 5404
> > 14 <- vn_exists 5748
> > 14 <- getzoneid 9004
> > 14 <- so_lock_single 5692
> > 14 <- kmem_cache_alloc 10160
> > 14 <- kmem_cache_alloc 9000
> > 14 <- setq 6568
> > 14 <- set_qend 5744
> > 14 <- kmem_cache_alloc 8780
> > 14 <- setq 6160
> > 14 <- entersq 5964
> > 14 <- getzoneid 8304
> > 14 <- vmem_alloc 9316
> > 14 <- kmem_cache_alloc 6344
> > 14 <- WR 5440
> > 14 <- WR 5336
> > 14 <- WR 8024
> > 14 <- strlock 9132
> > 14 <- set_nfsrv_ptr 5844
> > 14 <- set_nbsrv_ptr 5924
> > 14 <- set_qend 5816
> > 14 <- set_qend 5648
> > 14 <- strsetuio 5852
> > 14 <- strunlock 5664
> > 14 <- claimstr 9384
> > 14 <- cv_broadcast 8808
> > 14 <- claimstr 8128
> > 14 <- cv_broadcast 8132
> > 14 <- leavesq 5788
> > 14 <- cv_broadcast 5456
> > 14 <- so_unlock_single 5580
> > 14 <- WR 8288
> > 14 <- SAMESTR 8300
> > 14 <- SAMESTR 5376
> > 14 <- bcmp 5640
> > 14 <- kmem_cache_alloc 10544
> > 14 <- strsetrputhooks 8832
> > 14 <- strsetwputhooks 5684
> > 14 <- kmem_cache_alloc 8964
> > 14 <- bcopy 6064
> > 14 <- so_automatic_bind 5940
> > 14 <- bcopy 5668
> > 14 <- kmem_cache_alloc 11600
> > 14 <- bcopy 5956
> > 14 <- canputnext 8732
> > 14 <- strmakedata 5840
> > 14 <- crhold 8632
> > 14 <- stream_willservice 5680
> > 14 <- RD 10688
> > 14 <- kmem_cache_alloc 8924
> > 14 <- bcopy 5704
> > 14 <- crhold 5776
> > 14 <- cv_broadcast 11520
> > 14 <- tcp_eager_unlink 6616
> > 14 <- servicing_interrupt 8808
> > 14 <- cv_broadcast 8936
> > 14 <- tcp_maxpsz_set 5660
> > 14 <- tcp_fuse_set_rcv_hiwat 5808
> > 14 <- tcp_fuse_maxpsz_set 9756
> > 14 <- claimstr 9192
> > 14 <- strqset 6400
> > 14 <- cv_broadcast 8684
> > 14 <- kmem_cache_alloc 8984
> > 14 <- strsock_misc 9956
> > 14 <- dblk_lastfree 10760
> > 14 <- kmem_cache_free 6052
> > 14 <- qbackenable 5844
> > 14 <- strsock_misc 9364
> > 14 <- dblk_lastfree 9912
> > 14 <- kmem_cache_free 5832
> > 14 <- qbackenable 5576
> > 14 <- qenable_locked 15604
> > 14 <- tcp_fuse_syncstr_enable 10396
> > 14 <- tcp_fuse_syncstr_enable 5932
> > 14 <- cv_broadcast 5848
> > 14 <- stream_runservice 6320
> > 14 <- dblk_lastfree 8652
> > 14 <- kmem_cache_free 5804
> > 14 <- sowaitack 8880
> > 14 <- bcopy 19364
> > 14 <- crfree 11904
> > 14 <- kmem_cache_free 6284
> > 14 <- clear_active_fd 8456
> > 14 <- cv_broadcast 5528
> > 14 <- copyout_name 7592
> > 14 <- kmem_cache_alloc 11640
> > 14 <- crhold 5272
> > 14 <- cv_broadcast 8228
> > 14 <= accept total time in accept:
> 9336163080
> > total time in fbt:
> 2393624
> >
> >
> > Thanks,
> > Justin
> >
> >
> > Justin C. Lloyd
> > Senior Engineer and System Administrator
> > 303-684-4166 Office
> > 720-480-0380 Cell
> > 303-684-4100 Fax
> > jlloyd at digitalglobe.com
> > DigitalGlobe (r), An Imaging and Information Company
> >
> >
> >
> >
> >
> ----------------------------------------------------------------------
> --
> >
> > _______________________________________________
> > dtrace-discuss mailing list
> > dtrace-discuss at opensolaris.org
>
>
> _______________________________________________
> dtrace-discuss mailing list
> dtrace-discuss at opensolaris.org
> _______________________________________________
> dtrace-discuss mailing list
> dtrace-discuss at opensolaris.org
>
--
Laissez-nous faire, laissez-nous passer.
Le monde va de lui-m?me.
Sorry, I came to this late, so I may not understand what you are looking for. The accept call is a blocking call, so it doesn''t seem like these numbers are very large. This is 5 seconds, that doesn''t seem unreasonable for an accept call. Justin Lloyd wrote:> Hmm, I take that back, but I don''t recall seeing it yesterday. And here it didn''t even change CPUs. Curiouser and curioser. > > 449 <- swtch 3441 > 449 -> resume 2080 > 449 -> savectx 3041 > 449 -> schedctl_save 2400 > 449 <- schedctl_save 1921 > 449 <- savectx 2000 > 449 -> restorectx 5178600088 > 449 -> schedctl_restore 12723 > 449 <- schedctl_restore 3760 > 449 <- restorectx 4081 > 449 <- resume 2641 > 449 -> sigcheck 3681 > 449 <- sigcheck 2800 > > -----Original Message----- > From: Justin Lloyd > Sent: Thursday, April 26, 2007 12:09 PM > To: ''Charles Martin'' > Cc: dtrace-discuss at opensolaris.org > Subject: RE: [dtrace-discuss] Tracing accept() time > > It''s not always that high. I''ve seen as low as just a few seconds, an order of magnitude difference. Additionally, it only seems to be happening with our OAS Java processes. I''m hoping to get an OAS test instance set up on a second T2000 to see if the problem is reproduced there. We don''t see this particular issue on our production E15K. > > bash-3.00# dtrace -n ''syscall::accept:entry{self->ts=timestamp}syscall::accept:return/self->ts/{@sc[execname, pid]=avg(timestamp-self->ts);self->ts=0}tick-10sec/i++==3/{exit(0)}'' > dtrace: description ''syscall::accept:entry'' matched 3 probes > CPU ID FUNCTION:NAME > 5 49826 :tick-10sec > > opmn 9485 78475 > emagent 1633 98236 > webcached 890 113215 > java 9490 11266338722 > bash-3.00# > > > > -----Original Message----- > From: Charles Martin [mailto:chasrmartin at gmail.com] > Sent: Thursday, April 26, 2007 11:07 AM > To: Justin Lloyd > Cc: dtrace-discuss at opensolaris.org > Subject: Re: [dtrace-discuss] Tracing accept() time > > Wow. > > That number almost looks like an artifact, of overflow or a > signed->unsigned conversion.... > > On 4/25/07, Justin Lloyd <jlloyd at digitalglobe.com> wrote: >> Whoa! Here''s the output of a new script, with a few lines for context >> around something suspicious. The last column is the timestamp >> difference from the previous line. Every other time value in the >> output is between about 4000 and 7000. I now notice that the huge >> value is occurring when the thread is switched to a different CPU. Is this normal? >> >> 28 <- swtch 4908 >> 28 -> resume 4372 >> 28 -> savectx 5428 >> 28 -> schedctl_save 5068 >> 28 <- schedctl_save 4324 >> 28 <- savectx 4540 >> 8 -> restorectx 19998571120 >> 8 -> schedctl_restore 9648 >> 8 <- schedctl_restore 5064 >> 8 <- restorectx 5432 >> 8 <- resume 4808 >> >> Here''s the code: >> >> #!/usr/sbin/dtrace -s >> >> #pragma D option flowindent >> >> syscall::accept:entry >> /execname == "java" && ! tracing++/ >> { >> self->trace = 1; >> >> self->ts = timestamp; >> >> this->ts = timestamp; >> trace(this->ts - self->ts); >> self->ts = this->ts; >> } >> >> fbt:::entry >> /self->trace/ >> { >> this->ts = timestamp; >> trace(this->ts - self->ts); >> self->ts = this->ts; >> } >> >> fbt:::return >> /self->trace/ >> { >> this->ts = timestamp; >> trace(this->ts - self->ts); >> self->ts = this->ts; >> } >> >> syscall::accept:return >> /self->trace/ >> { >> this->ts = timestamp; >> trace(this->ts - self->ts); >> self->ts = this->ts; >> >> self->trace = 0; >> >> exit(0); >> } >> >> >> >> -----Original Message----- >> From: dtrace-discuss-bounces at opensolaris.org >> [mailto:dtrace-discuss-bounces at opensolaris.org] On Behalf Of Justin >> Lloyd >> Sent: Wednesday, April 25, 2007 3:59 PM >> To: Chip Bennett >> Cc: dtrace-discuss at opensolaris.org >> Subject: RE: [dtrace-discuss] Tracing accept() time >> >> Chip, >> >> More than anything I think you''ve made me realize that my logic here >> is completely wrong, since in your example my flow would include c''s >> time in b''s output and b''s time (including c''s) in a''s output (I >> think...I''m starting to confuse myself). >> >> I need to completely re-think how I want to break down the system >> call''s time, though I don''t think that my incorrect logic accounts for >> the huge difference in accept() times between the first and second D scripts. >> >> Obviously what I''m trying to do here is figure out why the first >> script reports such a high value and, if that 20 sec value is correct, >> where that time is being spent. >> >> Justin >> >> -----Original Message----- >> From: Chip Bennett [mailto:cbennett at laurustech.com] >> Sent: Wednesday, April 25, 2007 3:41 PM >> To: Justin Lloyd >> Cc: dtrace-discuss at opensolaris.org >> Subject: Re: [dtrace-discuss] Tracing accept() time >> >> Justin, >> >> I don''t think your fbt entry and return probes are pairing off >> correctly. >> >> Let''s suppose that your fbt call stack is a->b->c. When you call "a", >> since self->fbtrace is 0, the clause is executed. This sets >> self->fbtrace equal to 1. The next events are the calls to "b" and >> self->"c", >> which are ignored because self->fbtrace is now 1. >> >> The next event is the return from "c", which, since self->fbtrace is >> true, executes its clause and calculates the delta. But this delta is >> the execution time for the parts of "a" and "b" that happen before >> "c", and all of "c". The next events are the returns from "b" and "a" >> which are ignored because self->fbtrace is now 0. >> >> At this point, it appears to me that you have only captured part of >> the time for the kernel execution. >> >> Of course my analysis of the flow could be wrong. :-) >> >> Chip >> >> Justin Lloyd wrote: >>> Ignoring the long story behind why I''m doing this (and it may be the >> wrong track to take), I''ve been trying to use my meager ranks in >> Knowledge(dtrace) to figure out why a test we''re doing regarding an >> OAS HTTP/JVM problem is taking significantly longer on a T2000 than on >> a dual dual-core Dell Linux box. >>> One strange thing I noticed when looking at average system call >>> times >> for the java and httpd processes is the accept() times: >>> syscall::accept:entry >>> /execname=="java"/ >>> { >>> self->ts = timestamp; >>> } >>> >>> syscall::accept:return >>> /self->ts/ >>> { >>> @sc[pid] = avg(timestamp-self->ts); >>> self->ts=0; >>> } >>> >>> This code yields the following PID and average accept() times: >>> >>> 9490 5032706459 >>> 10392 19998243844 >>> 10400 19998255673 >>> 10398 19998289978 >>> 10393 19998312751 >>> 10395 19998312895 >>> 10396 19998842144 >>> >>> So accept() is taking an average of 20 seconds to complete?! >>> >>> Then I tried tracing using fbt and the numbers don''t add up. What am >>> I >> overlooking? Where is the missing time or where is my code and/or >> understanding wrong? (The following example, accept() only takes 9 >> seconds, but it is just catching the first one.) >>> BEGIN >>> { >>> total = 0; >>> } >>> >>> syscall::accept:entry >>> /execname == "java" && tracing++/ >>> { >>> self->start_time = timestamp; >>> self->trace = 1; >>> } >>> >>> fbt:::entry >>> /self->trace && ! self->fbtrace/ >>> { >>> self->ts = timestamp; >>> self->fbtrace = 1; >>> } >>> >>> fbt:::return >>> /self->trace && self->fbtrace/ >>> { >>> delta = timestamp - self->ts; >>> total += delta; >>> trace(delta); >>> self->ts = 0; >>> self->fbtrace = 0; >>> } >>> >>> syscall::accept:return >>> /self->trace == 1/ >>> { >>> printf("total time in accept: %d", timestamp - self->start_time); >>> printf("total time in fbt: %d", total); >>> self->trace = 0; >>> exit(0); >>> } >>> >>> CPU FUNCTION >>> 14 <- set_active_fd 12380 >>> 14 <- useracc 6352 >>> 14 <- avl_numnodes 8796 >>> 14 <- avl_numnodes 8316 >>> 14 <- as_segcompar 9648 >>> 14 <- as_segcompar 5768 >>> 14 <- as_segcompar 5588 >>> 14 <- as_segcompar 5420 >>> 14 <- as_segcompar 5516 >>> 14 <- as_segcompar 5652 >>> 14 <- as_segcompar 5496 >>> 14 <- as_segcompar 5456 >>> 14 <- as_segcompar 5632 >>> 14 <- segvn_checkprot 6856 >>> 14 <- avl_numnodes 8548 >>> 14 <- ufalloc 5688 >>> 14 <- fd_find 9856 >>> 14 <- fd_reserve 5632 >>> 14 <- cv_wait_sig_swap 9696 >>> 14 <- thread_lock 9284 >>> 14 <- ts_sleep 9088 >>> 14 <- exp_x 11056 >>> 14 <- disp_lock_enter_high 5724 >>> 14 <- disp_lock_exit_high 5484 >>> 14 <- sleepq_insert 6544 >>> 14 <- disp_lock_exit_nopreempt 5544 >>> 14 <- sigcheck 5864 >>> 14 <- disp_lock_enter 8860 >>> 14 <- disp_lock_exit 5720 >>> 14 <- disp_getwork 13684 >>> 14 <- restore_mstate 5860 >>> 14 <- schedctl_save 9724 >>> 25 <- set_active_fd 14240 >>> 25 <- useracc 6200 >>> 25 <- avl_numnodes 8816 >>> 25 <- avl_numnodes 8352 >>> 25 <- as_segcompar 9312 >>> 25 <- as_segcompar 5688 >>> 25 <- as_segcompar 5488 >>> 25 <- as_segcompar 5364 >>> 25 <- as_segcompar 5464 >>> 25 <- segvn_checkprot 6420 >>> 25 <- avl_numnodes 8080 >>> 25 <- ufalloc 5684 >>> 25 <- fd_find 10060 >>> 25 <- fd_reserve 5644 >>> 25 <- cv_wait_sig_swap 9172 >>> 25 <- thread_lock 9140 >>> 25 <- ts_sleep 9076 >>> 25 <- exp_x 10980 >>> 25 <- disp_lock_enter_high 5708 >>> 25 <- disp_lock_exit_high 5596 >>> 25 <- sleepq_insert 5788 >>> 25 <- disp_lock_exit_nopreempt 5492 >>> 25 <- sigcheck 5972 >>> 25 <- disp_lock_enter 8808 >>> 25 <- disp_lock_exit 5476 >>> 25 <- disp_getwork 13628 >>> 25 <- restore_mstate 5724 >>> 25 <- schedctl_save 10008 >>> 22 <- set_active_fd 11616 >>> 22 <- useracc 6340 >>> 22 <- avl_numnodes 9616 >>> 22 <- avl_numnodes 9332 >>> 22 <- as_segcompar 10012 >>> 22 <- as_segcompar 6180 >>> 22 <- as_segcompar 6388 >>> 22 <- as_segcompar 6132 >>> 22 <- as_segcompar 5956 >>> 22 <- as_segcompar 6316 >>> 22 <- as_segcompar 6396 >>> 22 <- segvn_checkprot 6648 >>> 22 <- avl_numnodes 9360 >>> 22 <- ufalloc 6320 >>> 22 <- fd_find 11464 >>> 22 <- fd_reserve 6596 >>> 22 <- cv_wait_sig_swap 9968 >>> 22 <- thread_lock 10080 >>> 22 <- ts_sleep 10140 >>> 22 <- exp_x 11460 >>> 22 <- disp_lock_enter_high 6392 >>> 22 <- disp_lock_exit_high 6348 >>> 22 <- sleepq_insert 7280 >>> 22 <- disp_lock_exit_nopreempt 6036 >>> 22 <- sigcheck 6584 >>> 22 <- disp_lock_enter 9860 >>> 22 <- disp_lock_exit 6348 >>> 22 <- disp_getwork 8300 >>> 22 <- restore_mstate 5492 >>> 22 <- schedctl_save 9256 >>> 23 <- set_active_fd 13704 >>> 23 <- useracc 7052 >>> 23 <- avl_numnodes 10264 >>> 23 <- avl_numnodes 9968 >>> 23 <- as_segcompar 11836 >>> 23 <- as_segcompar 6716 >>> 23 <- as_segcompar 6404 >>> 23 <- as_segcompar 6404 >>> 23 <- as_segcompar 6324 >>> 23 <- as_segcompar 6476 >>> 23 <- as_segcompar 6344 >>> 23 <- segvn_checkprot 6944 >>> 23 <- avl_numnodes 9500 >>> 23 <- ufalloc 6356 >>> 23 <- fd_find 11888 >>> 23 <- fd_reserve 6408 >>> 23 <- cv_wait_sig_swap 10884 >>> 23 <- thread_lock 10528 >>> 23 <- ts_sleep 10352 >>> 23 <- exp_x 11832 >>> 23 <- disp_lock_enter_high 6084 >>> 23 <- disp_lock_exit_high 6424 >>> 23 <- sleepq_insert 6812 >>> 23 <- disp_lock_exit_nopreempt 6276 >>> 23 <- sigcheck 6784 >>> 23 <- disp_lock_enter 10616 >>> 23 <- disp_lock_exit 6232 >>> 23 <- disp_getwork 14796 >>> 23 <- restore_mstate 6448 >>> 23 <- schedctl_save 11052 >>> 23 <- set_active_fd 11116 >>> 23 <- useracc 5840 >>> 23 <- avl_numnodes 8640 >>> 23 <- avl_numnodes 8192 >>> 23 <- as_segcompar 9248 >>> 23 <- as_segcompar 5628 >>> 23 <- as_segcompar 5524 >>> 23 <- as_segcompar 5456 >>> 23 <- as_segcompar 5344 >>> 23 <- as_segcompar 5396 >>> 23 <- as_segcompar 5540 >>> 23 <- as_segcompar 5548 >>> 23 <- segvn_checkprot 6176 >>> 23 <- avl_numnodes 8172 >>> 23 <- ufalloc 5732 >>> 23 <- fd_find 10420 >>> 23 <- fd_reserve 5504 >>> 23 <- cv_wait_sig_swap 8820 >>> 23 <- thread_lock 9312 >>> 23 <- ts_sleep 9000 >>> 23 <- exp_x 10788 >>> 23 <- disp_lock_enter_high 5724 >>> 23 <- disp_lock_exit_high 5628 >>> 23 <- sleepq_insert 5956 >>> 23 <- disp_lock_exit_nopreempt 5456 >>> 23 <- sigcheck 5876 >>> 23 <- disp_lock_enter 8880 >>> 23 <- disp_lock_exit 5628 >>> 23 <- disp_getwork 13112 >>> 23 <- restore_mstate 5500 >>> 23 <- schedctl_save 9328 >>> 29 <- set_active_fd 11988 >>> 29 <- useracc 5880 >>> 29 <- avl_numnodes 8604 >>> 29 <- avl_numnodes 8116 >>> 29 <- as_segcompar 9360 >>> 29 <- as_segcompar 5724 >>> 29 <- as_segcompar 5476 >>> 29 <- as_segcompar 5416 >>> 29 <- as_segcompar 5500 >>> 29 <- as_segcompar 5668 >>> 29 <- segvn_checkprot 6112 >>> 29 <- avl_numnodes 8256 >>> 29 <- ufalloc 5752 >>> 29 <- fd_find 10624 >>> 29 <- fd_reserve 5568 >>> 29 <- cv_wait_sig_swap 8928 >>> 29 <- thread_lock 9308 >>> 29 <- ts_sleep 9036 >>> 29 <- exp_x 11092 >>> 29 <- disp_lock_enter_high 5656 >>> 29 <- disp_lock_exit_high 5600 >>> 29 <- sleepq_insert 6464 >>> 29 <- disp_lock_exit_nopreempt 5604 >>> 29 <- sigcheck 5856 >>> 29 <- disp_lock_enter 9128 >>> 29 <- disp_lock_exit 5728 >>> 29 <- disp_getwork 13784 >>> 29 <- restore_mstate 5644 >>> 29 <- schedctl_save 9916 >>> 15 <- set_active_fd 13660 >>> 15 <- useracc 6912 >>> 15 <- avl_numnodes 9976 >>> 15 <- avl_numnodes 9256 >>> 15 <- as_segcompar 10940 >>> 15 <- as_segcompar 6752 >>> 15 <- as_segcompar 6400 >>> 15 <- as_segcompar 6320 >>> 15 <- as_segcompar 6120 >>> 15 <- as_segcompar 6436 >>> 15 <- as_segcompar 6544 >>> 15 <- as_segcompar 6124 >>> 15 <- segvn_checkprot 7048 >>> 15 <- avl_numnodes 10272 >>> 15 <- ufalloc 6972 >>> 15 <- fd_find 11868 >>> 15 <- fd_reserve 6568 >>> 15 <- cv_wait_sig_swap 11380 >>> 15 <- thread_lock 10856 >>> 15 <- ts_sleep 10400 >>> 15 <- exp_x 12360 >>> 15 <- disp_lock_enter_high 6424 >>> 15 <- disp_lock_exit_high 6420 >>> 15 <- sleepq_insert 7392 >>> 15 <- disp_lock_exit_nopreempt 6452 >>> 15 <- sigcheck 6808 >>> 15 <- disp_lock_enter 10336 >>> 15 <- disp_lock_exit 6412 >>> 15 <- disp_getwork 15236 >>> 15 <- restore_mstate 6712 >>> 15 <- schedctl_save 11056 >>> 14 <- schedctl_restore 13144 >>> 14 <- sigcheck 5884 >>> 14 <- sogetoff 6156 >>> 14 <- bcopy 5724 >>> 14 <- kmem_cache_alloc 10104 >>> 14 <- getmajor 5544 >>> 14 <- vn_reinit 6008 >>> 14 <- vn_recycle 5404 >>> 14 <- vn_exists 5748 >>> 14 <- getzoneid 9004 >>> 14 <- so_lock_single 5692 >>> 14 <- kmem_cache_alloc 10160 >>> 14 <- kmem_cache_alloc 9000 >>> 14 <- setq 6568 >>> 14 <- set_qend 5744 >>> 14 <- kmem_cache_alloc 8780 >>> 14 <- setq 6160 >>> 14 <- entersq 5964 >>> 14 <- getzoneid 8304 >>> 14 <- vmem_alloc 9316 >>> 14 <- kmem_cache_alloc 6344 >>> 14 <- WR 5440 >>> 14 <- WR 5336 >>> 14 <- WR 8024 >>> 14 <- strlock 9132 >>> 14 <- set_nfsrv_ptr 5844 >>> 14 <- set_nbsrv_ptr 5924 >>> 14 <- set_qend 5816 >>> 14 <- set_qend 5648 >>> 14 <- strsetuio 5852 >>> 14 <- strunlock 5664 >>> 14 <- claimstr 9384 >>> 14 <- cv_broadcast 8808 >>> 14 <- claimstr 8128 >>> 14 <- cv_broadcast 8132 >>> 14 <- leavesq 5788 >>> 14 <- cv_broadcast 5456 >>> 14 <- so_unlock_single 5580 >>> 14 <- WR 8288 >>> 14 <- SAMESTR 8300 >>> 14 <- SAMESTR 5376 >>> 14 <- bcmp 5640 >>> 14 <- kmem_cache_alloc 10544 >>> 14 <- strsetrputhooks 8832 >>> 14 <- strsetwputhooks 5684 >>> 14 <- kmem_cache_alloc 8964 >>> 14 <- bcopy 6064 >>> 14 <- so_automatic_bind 5940 >>> 14 <- bcopy 5668 >>> 14 <- kmem_cache_alloc 11600 >>> 14 <- bcopy 5956 >>> 14 <- canputnext 8732 >>> 14 <- strmakedata 5840 >>> 14 <- crhold 8632 >>> 14 <- stream_willservice 5680 >>> 14 <- RD 10688 >>> 14 <- kmem_cache_alloc 8924 >>> 14 <- bcopy 5704 >>> 14 <- crhold 5776 >>> 14 <- cv_broadcast 11520 >>> 14 <- tcp_eager_unlink 6616 >>> 14 <- servicing_interrupt 8808 >>> 14 <- cv_broadcast 8936 >>> 14 <- tcp_maxpsz_set 5660 >>> 14 <- tcp_fuse_set_rcv_hiwat 5808 >>> 14 <- tcp_fuse_maxpsz_set 9756 >>> 14 <- claimstr 9192 >>> 14 <- strqset 6400 >>> 14 <- cv_broadcast 8684 >>> 14 <- kmem_cache_alloc 8984 >>> 14 <- strsock_misc 9956 >>> 14 <- dblk_lastfree 10760 >>> 14 <- kmem_cache_free 6052 >>> 14 <- qbackenable 5844 >>> 14 <- strsock_misc 9364 >>> 14 <- dblk_lastfree 9912 >>> 14 <- kmem_cache_free 5832 >>> 14 <- qbackenable 5576 >>> 14 <- qenable_locked 15604 >>> 14 <- tcp_fuse_syncstr_enable 10396 >>> 14 <- tcp_fuse_syncstr_enable 5932 >>> 14 <- cv_broadcast 5848 >>> 14 <- stream_runservice 6320 >>> 14 <- dblk_lastfree 8652 >>> 14 <- kmem_cache_free 5804 >>> 14 <- sowaitack 8880 >>> 14 <- bcopy 19364 >>> 14 <- crfree 11904 >>> 14 <- kmem_cache_free 6284 >>> 14 <- clear_active_fd 8456 >>> 14 <- cv_broadcast 5528 >>> 14 <- copyout_name 7592 >>> 14 <- kmem_cache_alloc 11640 >>> 14 <- crhold 5272 >>> 14 <- cv_broadcast 8228 >>> 14 <= accept total time in accept: >> 9336163080 >>> total time in fbt: >> 2393624 >>> >>> Thanks, >>> Justin >>> >>> >>> Justin C. Lloyd >>> Senior Engineer and System Administrator >>> 303-684-4166 Office >>> 720-480-0380 Cell >>> 303-684-4100 Fax >>> jlloyd at digitalglobe.com >>> DigitalGlobe (r), An Imaging and Information Company >>> >>> >>> >>> >>> >> ---------------------------------------------------------------------- >> -- >>> _______________________________________________ >>> dtrace-discuss mailing list >>> dtrace-discuss at opensolaris.org >> >> _______________________________________________ >> dtrace-discuss mailing list >> dtrace-discuss at opensolaris.org >> _______________________________________________ >> dtrace-discuss mailing list >> dtrace-discuss at opensolaris.org >> > > > -- > Laissez-nous faire, laissez-nous passer. > Le monde va de lui-m?me. > _______________________________________________ > dtrace-discuss mailing list > dtrace-discuss at opensolaris.org-- blu "When Congress started Daylight Savings Time earlier, did they even consider what affect an extra hour of daylight would have on global warming? Well, did they?" ---------------------------------------------------------------------- Brian Utterback - Solaris RPE, Sun Microsystems, Inc. Ph:877-259-7345, Em:brian.utterback-at-ess-you-enn-dot-kom
Brian, Probably right as I cannot reliably replicate it on the 15K, so I''ll stick to figuring out why it is so severe and replicable on the T2000 for now. I''ve opened a case with Sun on this, as right now I''m running up against my knowledge of dtrace and internals to analyze this much further. If we get this figured out, I''ll follow up if it is relevant to do so. Thanks to everyone who has responded for your input! Justin -----Original Message----- From: Brian Utterback [mailto:brian.utterback at sun.com] Sent: Thursday, April 26, 2007 12:47 PM To: Justin Lloyd Cc: dtrace-discuss at opensolaris.org Subject: Re: [dtrace-discuss] Tracing accept() time Sorry, I came to this late, so I may not understand what you are looking for. The accept call is a blocking call, so it doesn''t seem like these numbers are very large. This is 5 seconds, that doesn''t seem unreasonable for an accept call. Justin Lloyd wrote:> Hmm, I take that back, but I don''t recall seeing it yesterday. And here it didn''t even change CPUs. Curiouser and curioser. > > 449 <- swtch 3441 > 449 -> resume 2080 > 449 -> savectx 3041 > 449 -> schedctl_save 2400 > 449 <- schedctl_save 1921 > 449 <- savectx 2000 > 449 -> restorectx 5178600088 > 449 -> schedctl_restore 12723 > 449 <- schedctl_restore 3760 > 449 <- restorectx 4081 > 449 <- resume 2641 > 449 -> sigcheck 3681 > 449 <- sigcheck 2800 > > -----Original Message----- > From: Justin Lloyd > Sent: Thursday, April 26, 2007 12:09 PM > To: ''Charles Martin'' > Cc: dtrace-discuss at opensolaris.org > Subject: RE: [dtrace-discuss] Tracing accept() time > > It''s not always that high. I''ve seen as low as just a few seconds, an order of magnitude difference. Additionally, it only seems to be happening with our OAS Java processes. I''m hoping to get an OAS test instance set up on a second T2000 to see if the problem is reproduced there. We don''t see this particular issue on our production E15K. > > bash-3.00# dtrace -n ''syscall::accept:entry{self->ts=timestamp}syscall::accept:return/self->ts/{@sc[execname, pid]=avg(timestamp-self->ts);self->ts=0}tick-10sec/i++==3/{exit(0)}'' > dtrace: description ''syscall::accept:entry'' matched 3 probes > CPU ID FUNCTION:NAME > 5 49826 :tick-10sec > > opmn 9485 78475 > emagent 1633 98236 > webcached 890 113215 > java 9490 11266338722 > bash-3.00# > > > > -----Original Message----- > From: Charles Martin [mailto:chasrmartin at gmail.com] > Sent: Thursday, April 26, 2007 11:07 AM > To: Justin Lloyd > Cc: dtrace-discuss at opensolaris.org > Subject: Re: [dtrace-discuss] Tracing accept() time > > Wow. > > That number almost looks like an artifact, of overflow or a > signed->unsigned conversion.... > > On 4/25/07, Justin Lloyd <jlloyd at digitalglobe.com> wrote: >> Whoa! Here''s the output of a new script, with a few lines for context >> around something suspicious. The last column is the timestamp >> difference from the previous line. Every other time value in the >> output is between about 4000 and 7000. I now notice that the huge >> value is occurring when the thread is switched to a different CPU. Is this normal? >> >> 28 <- swtch 4908 >> 28 -> resume 4372 >> 28 -> savectx 5428 >> 28 -> schedctl_save 5068 >> 28 <- schedctl_save 4324 >> 28 <- savectx 4540 >> 8 -> restorectx 19998571120 >> 8 -> schedctl_restore 9648 >> 8 <- schedctl_restore 5064 >> 8 <- restorectx 5432 >> 8 <- resume 4808 >> >> Here''s the code: >> >> #!/usr/sbin/dtrace -s >> >> #pragma D option flowindent >> >> syscall::accept:entry >> /execname == "java" && ! tracing++/ >> { >> self->trace = 1; >> >> self->ts = timestamp; >> >> this->ts = timestamp; >> trace(this->ts - self->ts); >> self->ts = this->ts; >> } >> >> fbt:::entry >> /self->trace/ >> { >> this->ts = timestamp; >> trace(this->ts - self->ts); >> self->ts = this->ts; >> } >> >> fbt:::return >> /self->trace/ >> { >> this->ts = timestamp; >> trace(this->ts - self->ts); >> self->ts = this->ts; >> } >> >> syscall::accept:return >> /self->trace/ >> { >> this->ts = timestamp; >> trace(this->ts - self->ts); >> self->ts = this->ts; >> >> self->trace = 0; >> >> exit(0); >> } >> >> >> >> -----Original Message----- >> From: dtrace-discuss-bounces at opensolaris.org >> [mailto:dtrace-discuss-bounces at opensolaris.org] On Behalf Of Justin >> Lloyd >> Sent: Wednesday, April 25, 2007 3:59 PM >> To: Chip Bennett >> Cc: dtrace-discuss at opensolaris.org >> Subject: RE: [dtrace-discuss] Tracing accept() time >> >> Chip, >> >> More than anything I think you''ve made me realize that my logic here >> is completely wrong, since in your example my flow would include c''s >> time in b''s output and b''s time (including c''s) in a''s output (I >> think...I''m starting to confuse myself). >> >> I need to completely re-think how I want to break down the system >> call''s time, though I don''t think that my incorrect logic accounts >> for the huge difference in accept() times between the first and second D scripts. >> >> Obviously what I''m trying to do here is figure out why the first >> script reports such a high value and, if that 20 sec value is >> correct, where that time is being spent. >> >> Justin >> >> -----Original Message----- >> From: Chip Bennett [mailto:cbennett at laurustech.com] >> Sent: Wednesday, April 25, 2007 3:41 PM >> To: Justin Lloyd >> Cc: dtrace-discuss at opensolaris.org >> Subject: Re: [dtrace-discuss] Tracing accept() time >> >> Justin, >> >> I don''t think your fbt entry and return probes are pairing off >> correctly. >> >> Let''s suppose that your fbt call stack is a->b->c. When you call >> "a", since self->fbtrace is 0, the clause is executed. This sets >> self->fbtrace equal to 1. The next events are the calls to "b" and >> self->"c", >> which are ignored because self->fbtrace is now 1. >> >> The next event is the return from "c", which, since self->fbtrace is >> true, executes its clause and calculates the delta. But this delta >> is the execution time for the parts of "a" and "b" that happen before >> "c", and all of "c". The next events are the returns from "b" and "a" >> which are ignored because self->fbtrace is now 0. >> >> At this point, it appears to me that you have only captured part of >> the time for the kernel execution. >> >> Of course my analysis of the flow could be wrong. :-) >> >> Chip >> >> Justin Lloyd wrote: >>> Ignoring the long story behind why I''m doing this (and it may be the >> wrong track to take), I''ve been trying to use my meager ranks in >> Knowledge(dtrace) to figure out why a test we''re doing regarding an >> OAS HTTP/JVM problem is taking significantly longer on a T2000 than >> on a dual dual-core Dell Linux box. >>> One strange thing I noticed when looking at average system call >>> times >> for the java and httpd processes is the accept() times: >>> syscall::accept:entry >>> /execname=="java"/ >>> { >>> self->ts = timestamp; >>> } >>> >>> syscall::accept:return >>> /self->ts/ >>> { >>> @sc[pid] = avg(timestamp-self->ts); >>> self->ts=0; >>> } >>> >>> This code yields the following PID and average accept() times: >>> >>> 9490 5032706459 >>> 10392 19998243844 >>> 10400 19998255673 >>> 10398 19998289978 >>> 10393 19998312751 >>> 10395 19998312895 >>> 10396 19998842144 >>> >>> So accept() is taking an average of 20 seconds to complete?! >>> >>> Then I tried tracing using fbt and the numbers don''t add up. What am >>> I >> overlooking? Where is the missing time or where is my code and/or >> understanding wrong? (The following example, accept() only takes 9 >> seconds, but it is just catching the first one.) >>> BEGIN >>> { >>> total = 0; >>> } >>> >>> syscall::accept:entry >>> /execname == "java" && tracing++/ >>> { >>> self->start_time = timestamp; >>> self->trace = 1; >>> } >>> >>> fbt:::entry >>> /self->trace && ! self->fbtrace/ >>> { >>> self->ts = timestamp; >>> self->fbtrace = 1; >>> } >>> >>> fbt:::return >>> /self->trace && self->fbtrace/ >>> { >>> delta = timestamp - self->ts; >>> total += delta; >>> trace(delta); >>> self->ts = 0; >>> self->fbtrace = 0; >>> } >>> >>> syscall::accept:return >>> /self->trace == 1/ >>> { >>> printf("total time in accept: %d", timestamp - self->start_time); >>> printf("total time in fbt: %d", total); >>> self->trace = 0; >>> exit(0); >>> } >>> >>> CPU FUNCTION >>> 14 <- set_active_fd 12380 >>> 14 <- useracc 6352 >>> 14 <- avl_numnodes 8796 >>> 14 <- avl_numnodes 8316 >>> 14 <- as_segcompar 9648 >>> 14 <- as_segcompar 5768 >>> 14 <- as_segcompar 5588 >>> 14 <- as_segcompar 5420 >>> 14 <- as_segcompar 5516 >>> 14 <- as_segcompar 5652 >>> 14 <- as_segcompar 5496 >>> 14 <- as_segcompar 5456 >>> 14 <- as_segcompar 5632 >>> 14 <- segvn_checkprot 6856 >>> 14 <- avl_numnodes 8548 >>> 14 <- ufalloc 5688 >>> 14 <- fd_find 9856 >>> 14 <- fd_reserve 5632 >>> 14 <- cv_wait_sig_swap 9696 >>> 14 <- thread_lock 9284 >>> 14 <- ts_sleep 9088 >>> 14 <- exp_x 11056 >>> 14 <- disp_lock_enter_high 5724 >>> 14 <- disp_lock_exit_high 5484 >>> 14 <- sleepq_insert 6544 >>> 14 <- disp_lock_exit_nopreempt 5544 >>> 14 <- sigcheck 5864 >>> 14 <- disp_lock_enter 8860 >>> 14 <- disp_lock_exit 5720 >>> 14 <- disp_getwork 13684 >>> 14 <- restore_mstate 5860 >>> 14 <- schedctl_save 9724 >>> 25 <- set_active_fd 14240 >>> 25 <- useracc 6200 >>> 25 <- avl_numnodes 8816 >>> 25 <- avl_numnodes 8352 >>> 25 <- as_segcompar 9312 >>> 25 <- as_segcompar 5688 >>> 25 <- as_segcompar 5488 >>> 25 <- as_segcompar 5364 >>> 25 <- as_segcompar 5464 >>> 25 <- segvn_checkprot 6420 >>> 25 <- avl_numnodes 8080 >>> 25 <- ufalloc 5684 >>> 25 <- fd_find 10060 >>> 25 <- fd_reserve 5644 >>> 25 <- cv_wait_sig_swap 9172 >>> 25 <- thread_lock 9140 >>> 25 <- ts_sleep 9076 >>> 25 <- exp_x 10980 >>> 25 <- disp_lock_enter_high 5708 >>> 25 <- disp_lock_exit_high 5596 >>> 25 <- sleepq_insert 5788 >>> 25 <- disp_lock_exit_nopreempt 5492 >>> 25 <- sigcheck 5972 >>> 25 <- disp_lock_enter 8808 >>> 25 <- disp_lock_exit 5476 >>> 25 <- disp_getwork 13628 >>> 25 <- restore_mstate 5724 >>> 25 <- schedctl_save 10008 >>> 22 <- set_active_fd 11616 >>> 22 <- useracc 6340 >>> 22 <- avl_numnodes 9616 >>> 22 <- avl_numnodes 9332 >>> 22 <- as_segcompar 10012 >>> 22 <- as_segcompar 6180 >>> 22 <- as_segcompar 6388 >>> 22 <- as_segcompar 6132 >>> 22 <- as_segcompar 5956 >>> 22 <- as_segcompar 6316 >>> 22 <- as_segcompar 6396 >>> 22 <- segvn_checkprot 6648 >>> 22 <- avl_numnodes 9360 >>> 22 <- ufalloc 6320 >>> 22 <- fd_find 11464 >>> 22 <- fd_reserve 6596 >>> 22 <- cv_wait_sig_swap 9968 >>> 22 <- thread_lock 10080 >>> 22 <- ts_sleep 10140 >>> 22 <- exp_x 11460 >>> 22 <- disp_lock_enter_high 6392 >>> 22 <- disp_lock_exit_high 6348 >>> 22 <- sleepq_insert 7280 >>> 22 <- disp_lock_exit_nopreempt 6036 >>> 22 <- sigcheck 6584 >>> 22 <- disp_lock_enter 9860 >>> 22 <- disp_lock_exit 6348 >>> 22 <- disp_getwork 8300 >>> 22 <- restore_mstate 5492 >>> 22 <- schedctl_save 9256 >>> 23 <- set_active_fd 13704 >>> 23 <- useracc 7052 >>> 23 <- avl_numnodes 10264 >>> 23 <- avl_numnodes 9968 >>> 23 <- as_segcompar 11836 >>> 23 <- as_segcompar 6716 >>> 23 <- as_segcompar 6404 >>> 23 <- as_segcompar 6404 >>> 23 <- as_segcompar 6324 >>> 23 <- as_segcompar 6476 >>> 23 <- as_segcompar 6344 >>> 23 <- segvn_checkprot 6944 >>> 23 <- avl_numnodes 9500 >>> 23 <- ufalloc 6356 >>> 23 <- fd_find 11888 >>> 23 <- fd_reserve 6408 >>> 23 <- cv_wait_sig_swap 10884 >>> 23 <- thread_lock 10528 >>> 23 <- ts_sleep 10352 >>> 23 <- exp_x 11832 >>> 23 <- disp_lock_enter_high 6084 >>> 23 <- disp_lock_exit_high 6424 >>> 23 <- sleepq_insert 6812 >>> 23 <- disp_lock_exit_nopreempt 6276 >>> 23 <- sigcheck 6784 >>> 23 <- disp_lock_enter 10616 >>> 23 <- disp_lock_exit 6232 >>> 23 <- disp_getwork 14796 >>> 23 <- restore_mstate 6448 >>> 23 <- schedctl_save 11052 >>> 23 <- set_active_fd 11116 >>> 23 <- useracc 5840 >>> 23 <- avl_numnodes 8640 >>> 23 <- avl_numnodes 8192 >>> 23 <- as_segcompar 9248 >>> 23 <- as_segcompar 5628 >>> 23 <- as_segcompar 5524 >>> 23 <- as_segcompar 5456 >>> 23 <- as_segcompar 5344 >>> 23 <- as_segcompar 5396 >>> 23 <- as_segcompar 5540 >>> 23 <- as_segcompar 5548 >>> 23 <- segvn_checkprot 6176 >>> 23 <- avl_numnodes 8172 >>> 23 <- ufalloc 5732 >>> 23 <- fd_find 10420 >>> 23 <- fd_reserve 5504 >>> 23 <- cv_wait_sig_swap 8820 >>> 23 <- thread_lock 9312 >>> 23 <- ts_sleep 9000 >>> 23 <- exp_x 10788 >>> 23 <- disp_lock_enter_high 5724 >>> 23 <- disp_lock_exit_high 5628 >>> 23 <- sleepq_insert 5956 >>> 23 <- disp_lock_exit_nopreempt 5456 >>> 23 <- sigcheck 5876 >>> 23 <- disp_lock_enter 8880 >>> 23 <- disp_lock_exit 5628 >>> 23 <- disp_getwork 13112 >>> 23 <- restore_mstate 5500 >>> 23 <- schedctl_save 9328 >>> 29 <- set_active_fd 11988 >>> 29 <- useracc 5880 >>> 29 <- avl_numnodes 8604 >>> 29 <- avl_numnodes 8116 >>> 29 <- as_segcompar 9360 >>> 29 <- as_segcompar 5724 >>> 29 <- as_segcompar 5476 >>> 29 <- as_segcompar 5416 >>> 29 <- as_segcompar 5500 >>> 29 <- as_segcompar 5668 >>> 29 <- segvn_checkprot 6112 >>> 29 <- avl_numnodes 8256 >>> 29 <- ufalloc 5752 >>> 29 <- fd_find 10624 >>> 29 <- fd_reserve 5568 >>> 29 <- cv_wait_sig_swap 8928 >>> 29 <- thread_lock 9308 >>> 29 <- ts_sleep 9036 >>> 29 <- exp_x 11092 >>> 29 <- disp_lock_enter_high 5656 >>> 29 <- disp_lock_exit_high 5600 >>> 29 <- sleepq_insert 6464 >>> 29 <- disp_lock_exit_nopreempt 5604 >>> 29 <- sigcheck 5856 >>> 29 <- disp_lock_enter 9128 >>> 29 <- disp_lock_exit 5728 >>> 29 <- disp_getwork 13784 >>> 29 <- restore_mstate 5644 >>> 29 <- schedctl_save 9916 >>> 15 <- set_active_fd 13660 >>> 15 <- useracc 6912 >>> 15 <- avl_numnodes 9976 >>> 15 <- avl_numnodes 9256 >>> 15 <- as_segcompar 10940 >>> 15 <- as_segcompar 6752 >>> 15 <- as_segcompar 6400 >>> 15 <- as_segcompar 6320 >>> 15 <- as_segcompar 6120 >>> 15 <- as_segcompar 6436 >>> 15 <- as_segcompar 6544 >>> 15 <- as_segcompar 6124 >>> 15 <- segvn_checkprot 7048 >>> 15 <- avl_numnodes 10272 >>> 15 <- ufalloc 6972 >>> 15 <- fd_find 11868 >>> 15 <- fd_reserve 6568 >>> 15 <- cv_wait_sig_swap 11380 >>> 15 <- thread_lock 10856 >>> 15 <- ts_sleep 10400 >>> 15 <- exp_x 12360 >>> 15 <- disp_lock_enter_high 6424 >>> 15 <- disp_lock_exit_high 6420 >>> 15 <- sleepq_insert 7392 >>> 15 <- disp_lock_exit_nopreempt 6452 >>> 15 <- sigcheck 6808 >>> 15 <- disp_lock_enter 10336 >>> 15 <- disp_lock_exit 6412 >>> 15 <- disp_getwork 15236 >>> 15 <- restore_mstate 6712 >>> 15 <- schedctl_save 11056 >>> 14 <- schedctl_restore 13144 >>> 14 <- sigcheck 5884 >>> 14 <- sogetoff 6156 >>> 14 <- bcopy 5724 >>> 14 <- kmem_cache_alloc 10104 >>> 14 <- getmajor 5544 >>> 14 <- vn_reinit 6008 >>> 14 <- vn_recycle 5404 >>> 14 <- vn_exists 5748 >>> 14 <- getzoneid 9004 >>> 14 <- so_lock_single 5692 >>> 14 <- kmem_cache_alloc 10160 >>> 14 <- kmem_cache_alloc 9000 >>> 14 <- setq 6568 >>> 14 <- set_qend 5744 >>> 14 <- kmem_cache_alloc 8780 >>> 14 <- setq 6160 >>> 14 <- entersq 5964 >>> 14 <- getzoneid 8304 >>> 14 <- vmem_alloc 9316 >>> 14 <- kmem_cache_alloc 6344 >>> 14 <- WR 5440 >>> 14 <- WR 5336 >>> 14 <- WR 8024 >>> 14 <- strlock 9132 >>> 14 <- set_nfsrv_ptr 5844 >>> 14 <- set_nbsrv_ptr 5924 >>> 14 <- set_qend 5816 >>> 14 <- set_qend 5648 >>> 14 <- strsetuio 5852 >>> 14 <- strunlock 5664 >>> 14 <- claimstr 9384 >>> 14 <- cv_broadcast 8808 >>> 14 <- claimstr 8128 >>> 14 <- cv_broadcast 8132 >>> 14 <- leavesq 5788 >>> 14 <- cv_broadcast 5456 >>> 14 <- so_unlock_single 5580 >>> 14 <- WR 8288 >>> 14 <- SAMESTR 8300 >>> 14 <- SAMESTR 5376 >>> 14 <- bcmp 5640 >>> 14 <- kmem_cache_alloc 10544 >>> 14 <- strsetrputhooks 8832 >>> 14 <- strsetwputhooks 5684 >>> 14 <- kmem_cache_alloc 8964 >>> 14 <- bcopy 6064 >>> 14 <- so_automatic_bind 5940 >>> 14 <- bcopy 5668 >>> 14 <- kmem_cache_alloc 11600 >>> 14 <- bcopy 5956 >>> 14 <- canputnext 8732 >>> 14 <- strmakedata 5840 >>> 14 <- crhold 8632 >>> 14 <- stream_willservice 5680 >>> 14 <- RD 10688 >>> 14 <- kmem_cache_alloc 8924 >>> 14 <- bcopy 5704 >>> 14 <- crhold 5776 >>> 14 <- cv_broadcast 11520 >>> 14 <- tcp_eager_unlink 6616 >>> 14 <- servicing_interrupt 8808 >>> 14 <- cv_broadcast 8936 >>> 14 <- tcp_maxpsz_set 5660 >>> 14 <- tcp_fuse_set_rcv_hiwat 5808 >>> 14 <- tcp_fuse_maxpsz_set 9756 >>> 14 <- claimstr 9192 >>> 14 <- strqset 6400 >>> 14 <- cv_broadcast 8684 >>> 14 <- kmem_cache_alloc 8984 >>> 14 <- strsock_misc 9956 >>> 14 <- dblk_lastfree 10760 >>> 14 <- kmem_cache_free 6052 >>> 14 <- qbackenable 5844 >>> 14 <- strsock_misc 9364 >>> 14 <- dblk_lastfree 9912 >>> 14 <- kmem_cache_free 5832 >>> 14 <- qbackenable 5576 >>> 14 <- qenable_locked 15604 >>> 14 <- tcp_fuse_syncstr_enable 10396 >>> 14 <- tcp_fuse_syncstr_enable 5932 >>> 14 <- cv_broadcast 5848 >>> 14 <- stream_runservice 6320 >>> 14 <- dblk_lastfree 8652 >>> 14 <- kmem_cache_free 5804 >>> 14 <- sowaitack 8880 >>> 14 <- bcopy 19364 >>> 14 <- crfree 11904 >>> 14 <- kmem_cache_free 6284 >>> 14 <- clear_active_fd 8456 >>> 14 <- cv_broadcast 5528 >>> 14 <- copyout_name 7592 >>> 14 <- kmem_cache_alloc 11640 >>> 14 <- crhold 5272 >>> 14 <- cv_broadcast 8228 >>> 14 <= accept total time in accept: >> 9336163080 >>> total time in fbt: >> 2393624 >>> >>> Thanks, >>> Justin >>> >>> >>> Justin C. Lloyd >>> Senior Engineer and System Administrator >>> 303-684-4166 Office >>> 720-480-0380 Cell >>> 303-684-4100 Fax >>> jlloyd at digitalglobe.com >>> DigitalGlobe (r), An Imaging and Information Company >>> >>> >>> >>> >>> >> --------------------------------------------------------------------- >> - >> -- >>> _______________________________________________ >>> dtrace-discuss mailing list >>> dtrace-discuss at opensolaris.org >> >> _______________________________________________ >> dtrace-discuss mailing list >> dtrace-discuss at opensolaris.org >> _______________________________________________ >> dtrace-discuss mailing list >> dtrace-discuss at opensolaris.org >> > > > -- > Laissez-nous faire, laissez-nous passer. > Le monde va de lui-m?me. > _______________________________________________ > dtrace-discuss mailing list > dtrace-discuss at opensolaris.org-- blu "When Congress started Daylight Savings Time earlier, did they even consider what affect an extra hour of daylight would have on global warming? Well, did they?" ---------------------------------------------------------------------- Brian Utterback - Solaris RPE, Sun Microsystems, Inc. Ph:877-259-7345, Em:brian.utterback-at-ess-you-enn-dot-kom
Hi all, Just following up. The response I got from Sun is that this is probably just the normal behavior with the java processes being put to sleep, based on the description in the accept(3SOCKET) man page: "If no pending connections are present on the queue and the socket is not marked as non-blocking, accept() blocks the caller until a connection is present." Thanks for the responses. If anything, this was at least a good learning experience. Justin -----Original Message----- From: dtrace-discuss-bounces at opensolaris.org [mailto:dtrace-discuss-bounces at opensolaris.org] On Behalf Of Justin Lloyd Sent: Thursday, April 26, 2007 1:08 PM To: Brian Utterback Cc: dtrace-discuss at opensolaris.org Subject: RE: [dtrace-discuss] Tracing accept() time Brian, Probably right as I cannot reliably replicate it on the 15K, so I''ll stick to figuring out why it is so severe and replicable on the T2000 for now. I''ve opened a case with Sun on this, as right now I''m running up against my knowledge of dtrace and internals to analyze this much further. If we get this figured out, I''ll follow up if it is relevant to do so. Thanks to everyone who has responded for your input! Justin -----Original Message----- From: Brian Utterback [mailto:brian.utterback at sun.com] Sent: Thursday, April 26, 2007 12:47 PM To: Justin Lloyd Cc: dtrace-discuss at opensolaris.org Subject: Re: [dtrace-discuss] Tracing accept() time Sorry, I came to this late, so I may not understand what you are looking for. The accept call is a blocking call, so it doesn''t seem like these numbers are very large. This is 5 seconds, that doesn''t seem unreasonable for an accept call. Justin Lloyd wrote:> Hmm, I take that back, but I don''t recall seeing it yesterday. And here it didn''t even change CPUs. Curiouser and curioser. > > 449 <- swtch 3441 > 449 -> resume 2080 > 449 -> savectx 3041 > 449 -> schedctl_save 2400 > 449 <- schedctl_save 1921 > 449 <- savectx 2000 > 449 -> restorectx 5178600088 > 449 -> schedctl_restore 12723 > 449 <- schedctl_restore 3760 > 449 <- restorectx 4081 > 449 <- resume 2641 > 449 -> sigcheck 3681 > 449 <- sigcheck 2800 > > -----Original Message----- > From: Justin Lloyd > Sent: Thursday, April 26, 2007 12:09 PM > To: ''Charles Martin'' > Cc: dtrace-discuss at opensolaris.org > Subject: RE: [dtrace-discuss] Tracing accept() time > > It''s not always that high. I''ve seen as low as just a few seconds, an order of magnitude difference. Additionally, it only seems to be happening with our OAS Java processes. I''m hoping to get an OAS test instance set up on a second T2000 to see if the problem is reproduced there. We don''t see this particular issue on our production E15K. > > bash-3.00# dtrace -n ''syscall::accept:entry{self->ts=timestamp}syscall::accept:return/self->ts/{@sc[execname, pid]=avg(timestamp-self->ts);self->ts=0}tick-10sec/i++==3/{exit(0)}'' > dtrace: description ''syscall::accept:entry'' matched 3 probes > CPU ID FUNCTION:NAME > 5 49826 :tick-10sec > > opmn 9485 78475 > emagent 1633 98236 > webcached 890 113215 > java 9490 11266338722 > bash-3.00# > > > > -----Original Message----- > From: Charles Martin [mailto:chasrmartin at gmail.com] > Sent: Thursday, April 26, 2007 11:07 AM > To: Justin Lloyd > Cc: dtrace-discuss at opensolaris.org > Subject: Re: [dtrace-discuss] Tracing accept() time > > Wow. > > That number almost looks like an artifact, of overflow or a > signed->unsigned conversion.... > > On 4/25/07, Justin Lloyd <jlloyd at digitalglobe.com> wrote: >> Whoa! Here''s the output of a new script, with a few lines for context >> around something suspicious. The last column is the timestamp >> difference from the previous line. Every other time value in the >> output is between about 4000 and 7000. I now notice that the huge >> value is occurring when the thread is switched to a different CPU. Is this normal? >> >> 28 <- swtch 4908 >> 28 -> resume 4372 >> 28 -> savectx 5428 >> 28 -> schedctl_save 5068 >> 28 <- schedctl_save 4324 >> 28 <- savectx 4540 >> 8 -> restorectx 19998571120 >> 8 -> schedctl_restore 9648 >> 8 <- schedctl_restore 5064 >> 8 <- restorectx 5432 >> 8 <- resume 4808 >> >> Here''s the code: >> >> #!/usr/sbin/dtrace -s >> >> #pragma D option flowindent >> >> syscall::accept:entry >> /execname == "java" && ! tracing++/ >> { >> self->trace = 1; >> >> self->ts = timestamp; >> >> this->ts = timestamp; >> trace(this->ts - self->ts); >> self->ts = this->ts; >> } >> >> fbt:::entry >> /self->trace/ >> { >> this->ts = timestamp; >> trace(this->ts - self->ts); >> self->ts = this->ts; >> } >> >> fbt:::return >> /self->trace/ >> { >> this->ts = timestamp; >> trace(this->ts - self->ts); >> self->ts = this->ts; >> } >> >> syscall::accept:return >> /self->trace/ >> { >> this->ts = timestamp; >> trace(this->ts - self->ts); >> self->ts = this->ts; >> >> self->trace = 0; >> >> exit(0); >> } >> >> >> >> -----Original Message----- >> From: dtrace-discuss-bounces at opensolaris.org >> [mailto:dtrace-discuss-bounces at opensolaris.org] On Behalf Of Justin >> Lloyd >> Sent: Wednesday, April 25, 2007 3:59 PM >> To: Chip Bennett >> Cc: dtrace-discuss at opensolaris.org >> Subject: RE: [dtrace-discuss] Tracing accept() time >> >> Chip, >> >> More than anything I think you''ve made me realize that my logic here >> is completely wrong, since in your example my flow would include c''s >> time in b''s output and b''s time (including c''s) in a''s output (I >> think...I''m starting to confuse myself). >> >> I need to completely re-think how I want to break down the system >> call''s time, though I don''t think that my incorrect logic accounts >> for the huge difference in accept() times between the first and second D scripts. >> >> Obviously what I''m trying to do here is figure out why the first >> script reports such a high value and, if that 20 sec value is >> correct, where that time is being spent. >> >> Justin >> >> -----Original Message----- >> From: Chip Bennett [mailto:cbennett at laurustech.com] >> Sent: Wednesday, April 25, 2007 3:41 PM >> To: Justin Lloyd >> Cc: dtrace-discuss at opensolaris.org >> Subject: Re: [dtrace-discuss] Tracing accept() time >> >> Justin, >> >> I don''t think your fbt entry and return probes are pairing off >> correctly. >> >> Let''s suppose that your fbt call stack is a->b->c. When you call >> "a", since self->fbtrace is 0, the clause is executed. This sets >> self->fbtrace equal to 1. The next events are the calls to "b" and >> self->"c", >> which are ignored because self->fbtrace is now 1. >> >> The next event is the return from "c", which, since self->fbtrace is >> true, executes its clause and calculates the delta. But this delta >> is the execution time for the parts of "a" and "b" that happen before >> "c", and all of "c". The next events are the returns from "b" and "a" >> which are ignored because self->fbtrace is now 0. >> >> At this point, it appears to me that you have only captured part of >> the time for the kernel execution. >> >> Of course my analysis of the flow could be wrong. :-) >> >> Chip >> >> Justin Lloyd wrote: >>> Ignoring the long story behind why I''m doing this (and it may be the >> wrong track to take), I''ve been trying to use my meager ranks in >> Knowledge(dtrace) to figure out why a test we''re doing regarding an >> OAS HTTP/JVM problem is taking significantly longer on a T2000 than >> on a dual dual-core Dell Linux box. >>> One strange thing I noticed when looking at average system call >>> times >> for the java and httpd processes is the accept() times: >>> syscall::accept:entry >>> /execname=="java"/ >>> { >>> self->ts = timestamp; >>> } >>> >>> syscall::accept:return >>> /self->ts/ >>> { >>> @sc[pid] = avg(timestamp-self->ts); >>> self->ts=0; >>> } >>> >>> This code yields the following PID and average accept() times: >>> >>> 9490 5032706459 >>> 10392 19998243844 >>> 10400 19998255673 >>> 10398 19998289978 >>> 10393 19998312751 >>> 10395 19998312895 >>> 10396 19998842144 >>> >>> So accept() is taking an average of 20 seconds to complete?! >>> >>> Then I tried tracing using fbt and the numbers don''t add up. What am >>> I >> overlooking? Where is the missing time or where is my code and/or >> understanding wrong? (The following example, accept() only takes 9 >> seconds, but it is just catching the first one.) >>> BEGIN >>> { >>> total = 0; >>> } >>> >>> syscall::accept:entry >>> /execname == "java" && tracing++/ >>> { >>> self->start_time = timestamp; >>> self->trace = 1; >>> } >>> >>> fbt:::entry >>> /self->trace && ! self->fbtrace/ >>> { >>> self->ts = timestamp; >>> self->fbtrace = 1; >>> } >>> >>> fbt:::return >>> /self->trace && self->fbtrace/ >>> { >>> delta = timestamp - self->ts; >>> total += delta; >>> trace(delta); >>> self->ts = 0; >>> self->fbtrace = 0; >>> } >>> >>> syscall::accept:return >>> /self->trace == 1/ >>> { >>> printf("total time in accept: %d", timestamp - self->start_time); >>> printf("total time in fbt: %d", total); >>> self->trace = 0; >>> exit(0); >>> } >>> >>> CPU FUNCTION >>> 14 <- set_active_fd 12380 >>> 14 <- useracc 6352 >>> 14 <- avl_numnodes 8796 >>> 14 <- avl_numnodes 8316 >>> 14 <- as_segcompar 9648 >>> 14 <- as_segcompar 5768 >>> 14 <- as_segcompar 5588 >>> 14 <- as_segcompar 5420 >>> 14 <- as_segcompar 5516 >>> 14 <- as_segcompar 5652 >>> 14 <- as_segcompar 5496 >>> 14 <- as_segcompar 5456 >>> 14 <- as_segcompar 5632 >>> 14 <- segvn_checkprot 6856 >>> 14 <- avl_numnodes 8548 >>> 14 <- ufalloc 5688 >>> 14 <- fd_find 9856 >>> 14 <- fd_reserve 5632 >>> 14 <- cv_wait_sig_swap 9696 >>> 14 <- thread_lock 9284 >>> 14 <- ts_sleep 9088 >>> 14 <- exp_x 11056 >>> 14 <- disp_lock_enter_high 5724 >>> 14 <- disp_lock_exit_high 5484 >>> 14 <- sleepq_insert 6544 >>> 14 <- disp_lock_exit_nopreempt 5544 >>> 14 <- sigcheck 5864 >>> 14 <- disp_lock_enter 8860 >>> 14 <- disp_lock_exit 5720 >>> 14 <- disp_getwork 13684 >>> 14 <- restore_mstate 5860 >>> 14 <- schedctl_save 9724 >>> 25 <- set_active_fd 14240 >>> 25 <- useracc 6200 >>> 25 <- avl_numnodes 8816 >>> 25 <- avl_numnodes 8352 >>> 25 <- as_segcompar 9312 >>> 25 <- as_segcompar 5688 >>> 25 <- as_segcompar 5488 >>> 25 <- as_segcompar 5364 >>> 25 <- as_segcompar 5464 >>> 25 <- segvn_checkprot 6420 >>> 25 <- avl_numnodes 8080 >>> 25 <- ufalloc 5684 >>> 25 <- fd_find 10060 >>> 25 <- fd_reserve 5644 >>> 25 <- cv_wait_sig_swap 9172 >>> 25 <- thread_lock 9140 >>> 25 <- ts_sleep 9076 >>> 25 <- exp_x 10980 >>> 25 <- disp_lock_enter_high 5708 >>> 25 <- disp_lock_exit_high 5596 >>> 25 <- sleepq_insert 5788 >>> 25 <- disp_lock_exit_nopreempt 5492 >>> 25 <- sigcheck 5972 >>> 25 <- disp_lock_enter 8808 >>> 25 <- disp_lock_exit 5476 >>> 25 <- disp_getwork 13628 >>> 25 <- restore_mstate 5724 >>> 25 <- schedctl_save 10008 >>> 22 <- set_active_fd 11616 >>> 22 <- useracc 6340 >>> 22 <- avl_numnodes 9616 >>> 22 <- avl_numnodes 9332 >>> 22 <- as_segcompar 10012 >>> 22 <- as_segcompar 6180 >>> 22 <- as_segcompar 6388 >>> 22 <- as_segcompar 6132 >>> 22 <- as_segcompar 5956 >>> 22 <- as_segcompar 6316 >>> 22 <- as_segcompar 6396 >>> 22 <- segvn_checkprot 6648 >>> 22 <- avl_numnodes 9360 >>> 22 <- ufalloc 6320 >>> 22 <- fd_find 11464 >>> 22 <- fd_reserve 6596 >>> 22 <- cv_wait_sig_swap 9968 >>> 22 <- thread_lock 10080 >>> 22 <- ts_sleep 10140 >>> 22 <- exp_x 11460 >>> 22 <- disp_lock_enter_high 6392 >>> 22 <- disp_lock_exit_high 6348 >>> 22 <- sleepq_insert 7280 >>> 22 <- disp_lock_exit_nopreempt 6036 >>> 22 <- sigcheck 6584 >>> 22 <- disp_lock_enter 9860 >>> 22 <- disp_lock_exit 6348 >>> 22 <- disp_getwork 8300 >>> 22 <- restore_mstate 5492 >>> 22 <- schedctl_save 9256 >>> 23 <- set_active_fd 13704 >>> 23 <- useracc 7052 >>> 23 <- avl_numnodes 10264 >>> 23 <- avl_numnodes 9968 >>> 23 <- as_segcompar 11836 >>> 23 <- as_segcompar 6716 >>> 23 <- as_segcompar 6404 >>> 23 <- as_segcompar 6404 >>> 23 <- as_segcompar 6324 >>> 23 <- as_segcompar 6476 >>> 23 <- as_segcompar 6344 >>> 23 <- segvn_checkprot 6944 >>> 23 <- avl_numnodes 9500 >>> 23 <- ufalloc 6356 >>> 23 <- fd_find 11888 >>> 23 <- fd_reserve 6408 >>> 23 <- cv_wait_sig_swap 10884 >>> 23 <- thread_lock 10528 >>> 23 <- ts_sleep 10352 >>> 23 <- exp_x 11832 >>> 23 <- disp_lock_enter_high 6084 >>> 23 <- disp_lock_exit_high 6424 >>> 23 <- sleepq_insert 6812 >>> 23 <- disp_lock_exit_nopreempt 6276 >>> 23 <- sigcheck 6784 >>> 23 <- disp_lock_enter 10616 >>> 23 <- disp_lock_exit 6232 >>> 23 <- disp_getwork 14796 >>> 23 <- restore_mstate 6448 >>> 23 <- schedctl_save 11052 >>> 23 <- set_active_fd 11116 >>> 23 <- useracc 5840 >>> 23 <- avl_numnodes 8640 >>> 23 <- avl_numnodes 8192 >>> 23 <- as_segcompar 9248 >>> 23 <- as_segcompar 5628 >>> 23 <- as_segcompar 5524 >>> 23 <- as_segcompar 5456 >>> 23 <- as_segcompar 5344 >>> 23 <- as_segcompar 5396 >>> 23 <- as_segcompar 5540 >>> 23 <- as_segcompar 5548 >>> 23 <- segvn_checkprot 6176 >>> 23 <- avl_numnodes 8172 >>> 23 <- ufalloc 5732 >>> 23 <- fd_find 10420 >>> 23 <- fd_reserve 5504 >>> 23 <- cv_wait_sig_swap 8820 >>> 23 <- thread_lock 9312 >>> 23 <- ts_sleep 9000 >>> 23 <- exp_x 10788 >>> 23 <- disp_lock_enter_high 5724 >>> 23 <- disp_lock_exit_high 5628 >>> 23 <- sleepq_insert 5956 >>> 23 <- disp_lock_exit_nopreempt 5456 >>> 23 <- sigcheck 5876 >>> 23 <- disp_lock_enter 8880 >>> 23 <- disp_lock_exit 5628 >>> 23 <- disp_getwork 13112 >>> 23 <- restore_mstate 5500 >>> 23 <- schedctl_save 9328 >>> 29 <- set_active_fd 11988 >>> 29 <- useracc 5880 >>> 29 <- avl_numnodes 8604 >>> 29 <- avl_numnodes 8116 >>> 29 <- as_segcompar 9360 >>> 29 <- as_segcompar 5724 >>> 29 <- as_segcompar 5476 >>> 29 <- as_segcompar 5416 >>> 29 <- as_segcompar 5500 >>> 29 <- as_segcompar 5668 >>> 29 <- segvn_checkprot 6112 >>> 29 <- avl_numnodes 8256 >>> 29 <- ufalloc 5752 >>> 29 <- fd_find 10624 >>> 29 <- fd_reserve 5568 >>> 29 <- cv_wait_sig_swap 8928 >>> 29 <- thread_lock 9308 >>> 29 <- ts_sleep 9036 >>> 29 <- exp_x 11092 >>> 29 <- disp_lock_enter_high 5656 >>> 29 <- disp_lock_exit_high 5600 >>> 29 <- sleepq_insert 6464 >>> 29 <- disp_lock_exit_nopreempt 5604 >>> 29 <- sigcheck 5856 >>> 29 <- disp_lock_enter 9128 >>> 29 <- disp_lock_exit 5728 >>> 29 <- disp_getwork 13784 >>> 29 <- restore_mstate 5644 >>> 29 <- schedctl_save 9916 >>> 15 <- set_active_fd 13660 >>> 15 <- useracc 6912 >>> 15 <- avl_numnodes 9976 >>> 15 <- avl_numnodes 9256 >>> 15 <- as_segcompar 10940 >>> 15 <- as_segcompar 6752 >>> 15 <- as_segcompar 6400 >>> 15 <- as_segcompar 6320 >>> 15 <- as_segcompar 6120 >>> 15 <- as_segcompar 6436 >>> 15 <- as_segcompar 6544 >>> 15 <- as_segcompar 6124 >>> 15 <- segvn_checkprot 7048 >>> 15 <- avl_numnodes 10272 >>> 15 <- ufalloc 6972 >>> 15 <- fd_find 11868 >>> 15 <- fd_reserve 6568 >>> 15 <- cv_wait_sig_swap 11380 >>> 15 <- thread_lock 10856 >>> 15 <- ts_sleep 10400 >>> 15 <- exp_x 12360 >>> 15 <- disp_lock_enter_high 6424 >>> 15 <- disp_lock_exit_high 6420 >>> 15 <- sleepq_insert 7392 >>> 15 <- disp_lock_exit_nopreempt 6452 >>> 15 <- sigcheck 6808 >>> 15 <- disp_lock_enter 10336 >>> 15 <- disp_lock_exit 6412 >>> 15 <- disp_getwork 15236 >>> 15 <- restore_mstate 6712 >>> 15 <- schedctl_save 11056 >>> 14 <- schedctl_restore 13144 >>> 14 <- sigcheck 5884 >>> 14 <- sogetoff 6156 >>> 14 <- bcopy 5724 >>> 14 <- kmem_cache_alloc 10104 >>> 14 <- getmajor 5544 >>> 14 <- vn_reinit 6008 >>> 14 <- vn_recycle 5404 >>> 14 <- vn_exists 5748 >>> 14 <- getzoneid 9004 >>> 14 <- so_lock_single 5692 >>> 14 <- kmem_cache_alloc 10160 >>> 14 <- kmem_cache_alloc 9000 >>> 14 <- setq 6568 >>> 14 <- set_qend 5744 >>> 14 <- kmem_cache_alloc 8780 >>> 14 <- setq 6160 >>> 14 <- entersq 5964 >>> 14 <- getzoneid 8304 >>> 14 <- vmem_alloc 9316 >>> 14 <- kmem_cache_alloc 6344 >>> 14 <- WR 5440 >>> 14 <- WR 5336 >>> 14 <- WR 8024 >>> 14 <- strlock 9132 >>> 14 <- set_nfsrv_ptr 5844 >>> 14 <- set_nbsrv_ptr 5924 >>> 14 <- set_qend 5816 >>> 14 <- set_qend 5648 >>> 14 <- strsetuio 5852 >>> 14 <- strunlock 5664 >>> 14 <- claimstr 9384 >>> 14 <- cv_broadcast 8808 >>> 14 <- claimstr 8128 >>> 14 <- cv_broadcast 8132 >>> 14 <- leavesq 5788 >>> 14 <- cv_broadcast 5456 >>> 14 <- so_unlock_single 5580 >>> 14 <- WR 8288 >>> 14 <- SAMESTR 8300 >>> 14 <- SAMESTR 5376 >>> 14 <- bcmp 5640 >>> 14 <- kmem_cache_alloc 10544 >>> 14 <- strsetrputhooks 8832 >>> 14 <- strsetwputhooks 5684 >>> 14 <- kmem_cache_alloc 8964 >>> 14 <- bcopy 6064 >>> 14 <- so_automatic_bind 5940 >>> 14 <- bcopy 5668 >>> 14 <- kmem_cache_alloc 11600 >>> 14 <- bcopy 5956 >>> 14 <- canputnext 8732 >>> 14 <- strmakedata 5840 >>> 14 <- crhold 8632 >>> 14 <- stream_willservice 5680 >>> 14 <- RD 10688 >>> 14 <- kmem_cache_alloc 8924 >>> 14 <- bcopy 5704 >>> 14 <- crhold 5776 >>> 14 <- cv_broadcast 11520 >>> 14 <- tcp_eager_unlink 6616 >>> 14 <- servicing_interrupt 8808 >>> 14 <- cv_broadcast 8936 >>> 14 <- tcp_maxpsz_set 5660 >>> 14 <- tcp_fuse_set_rcv_hiwat 5808 >>> 14 <- tcp_fuse_maxpsz_set 9756 >>> 14 <- claimstr 9192 >>> 14 <- strqset 6400 >>> 14 <- cv_broadcast 8684 >>> 14 <- kmem_cache_alloc 8984 >>> 14 <- strsock_misc 9956 >>> 14 <- dblk_lastfree 10760 >>> 14 <- kmem_cache_free 6052 >>> 14 <- qbackenable 5844 >>> 14 <- strsock_misc 9364 >>> 14 <- dblk_lastfree 9912 >>> 14 <- kmem_cache_free 5832 >>> 14 <- qbackenable 5576 >>> 14 <- qenable_locked 15604 >>> 14 <- tcp_fuse_syncstr_enable 10396 >>> 14 <- tcp_fuse_syncstr_enable 5932 >>> 14 <- cv_broadcast 5848 >>> 14 <- stream_runservice 6320 >>> 14 <- dblk_lastfree 8652 >>> 14 <- kmem_cache_free 5804 >>> 14 <- sowaitack 8880 >>> 14 <- bcopy 19364 >>> 14 <- crfree 11904 >>> 14 <- kmem_cache_free 6284 >>> 14 <- clear_active_fd 8456 >>> 14 <- cv_broadcast 5528 >>> 14 <- copyout_name 7592 >>> 14 <- kmem_cache_alloc 11640 >>> 14 <- crhold 5272 >>> 14 <- cv_broadcast 8228 >>> 14 <= accept total time in accept: >> 9336163080 >>> total time in fbt: >> 2393624 >>> >>> Thanks, >>> Justin >>> >>> >>> Justin C. Lloyd >>> Senior Engineer and System Administrator >>> 303-684-4166 Office >>> 720-480-0380 Cell >>> 303-684-4100 Fax >>> jlloyd at digitalglobe.com >>> DigitalGlobe (r), An Imaging and Information Company >>> >>> >>> >>> >>> >> --------------------------------------------------------------------- >> - >> -- >>> _______________________________________________ >>> dtrace-discuss mailing list >>> dtrace-discuss at opensolaris.org >> >> _______________________________________________ >> dtrace-discuss mailing list >> dtrace-discuss at opensolaris.org >> _______________________________________________ >> dtrace-discuss mailing list >> dtrace-discuss at opensolaris.org >> > > > -- > Laissez-nous faire, laissez-nous passer. > Le monde va de lui-m?me. > _______________________________________________ > dtrace-discuss mailing list > dtrace-discuss at opensolaris.org-- blu "When Congress started Daylight Savings Time earlier, did they even consider what affect an extra hour of daylight would have on global warming? Well, did they?" ---------------------------------------------------------------------- Brian Utterback - Solaris RPE, Sun Microsystems, Inc. Ph:877-259-7345, Em:brian.utterback-at-ess-you-enn-dot-kom _______________________________________________ dtrace-discuss mailing list dtrace-discuss at opensolaris.org