Hi folks,
 This is a client-server environment, oracle export process is running 
on one system and database is running on the other system, the export 
process is running very slow but is running just fine on the database 
server, so I ran dtrace script dtruss on the client side as followed, 
firstly  I wonder why there are some read system calls have much larger 
elapsed time than other read system calls from same file descriptor? and 
secondly how can I troubleshoot further for read system call?
thanks,
Feiwen
===================================================================================#
./dtruss -aeop 19348
PID/LWP   RELATIVE  ELAPSD    CPU SYSCALL(args)          = return
19348/1:     79899    1242    169 write(0xA, "D\021\0", 0x2000)
19348/1:     80303      63     51 write(0x9, "\0", 0x15)        
19348/1:     80366    1021     39 read(0x9, "\a\733\0", 0x810)        
19348/1:     80504      27     16 read(0x9, "\a\733\0", 0x810)        
19348/1:     80593 60002498     35 read(0x9, "\004\710\0", 0x810)
19348/1:     81143    1144     82 write(0xA, "s D\021\0", 0x2000)
19348/1:     81309      63     51 write(0x9, "\0", 0x15)        
19348/1:     81373     827     41 read(0x9, "\a\733\0", 0x810)        
19348/1:     81518      26     15 read(0x9, "\a\733\0", 0x810)        
19348/1:     81696 60003701    115 read(0x9, "\004\724\0", 0x810)
19348/1:     82286    1128     59 write(0x9, "\0", 0x15)        
19348/1:     82363     761     41 read(0x9, "\a\733\0", 0x810)        
19348/1:     82516      24     13 read(0x9, "\a\733\0", 0x810)        
19348/1:     82614 60003141     35 read(0x9, "\004\730\0", 0x810)
19348/1:     83061    1169    100 write(0xA, "56467/03JUL to status D\021
 = 8192 0
19348/1:     83389      60     48 write(0x9, "\0", 0x15)        
19348/1:     83455     961     42 read(0x9, "\a\733\0", 0x810)        
19348/1:     83591      24     13 read(0x9, "\a\733\0", 0x810)        
19348/1:     83688 60003567     39 read(0x9, "\004\725\0", 0x810)
19348/1:     84380    1229    156 write(0x9, "\0", 0x15)        
19348/1:     84465     862     38 read(0x9, "\a\733\0", 0x810)        
19348/1:     84619      24     14 read(0x9, "\a\733\0", 0x810)        
19348/1:     84716 60002793     35 read(0x9, "\004\725\0", 0x810)
19348/1:     84958    1158     80 write(0xA, "tatus D\021\0", 0x2000)
     
19348/1:     85438      65     51 write(0x9, "\0", 0x15)        
19348/1:     85505     940     42 read(0x9, "\a\733\0", 0x810)        
19348/1:     85632      24     13 read(0x9, "\a\733\0", 0x810)        
19348/1:     85808 60003553    122 read(0x9, "\004\725\0", 0x810)
  = 1237 0
19348/1:     86298    1151     82 write(0xA, "933194/03OCT-5W to status 
D\021\0", 0x2000)         = 8192 0
19348/1:     86563      65     52 write(0x9, "\0", 0x15)         = 21
0
19348/1:     86631    1004     42 read(0x9, "\a\733\0", 0x810)        
=
2011 0
19348/1:     86766      23     13 read(0x9, "\a\733\0", 0x810)        
=
2011 0
19348/1:     86863 60002650     35 read(0x9, "\004\732\0", 0x810)
  = 1242 0
19348/1:     87458    1137     68 write(0x9, "\0", 0x15)         = 21
0
19348/1:     87538     791     43 read(0x9, "\a\733\0", 0x810)        
=
2011 0
19348/1:     87693     577     21 read(0x9, "\a\733\0", 0x810)        
=
2011 0
19348/1:     87764      18      8 read(0x9, "\004\736\0", 0x810)
= 1246 0
19348/1:     87997      85     73 write(0xA, " CR 4787092/03OCT to 
status D\021\0", 0x2000)         = 8192 0
19348/1:     88268      34     23 write(0x9, "\0", 0x15)         = 21
0
19348/1:     88303     595     23 read(0x9, "\a\733\0", 0x810)        
=
2011 0
19348/1:     88411      45     18 read(0x9, "\a\733\0", 0x810)        
=
2011 0
19348/1:     88502 60001907     37 read(0x9, "\004\713\0", 0x810)
  = 1227 0
19348/1:     89149    1255    185 write(0xA, "Updated CR 
4922634/03NOV-3W to status D\021\0", 0x2000)        
 = 8192 0
19348/1:     89344      69     57 write(0x9, "\0", 0x15)         = 21
0
19348/1:     89406    1009     39 read(0x9, "\a\733\0", 0x810)        
=
2011 0
19348/1:     89559      26     15 read(0x9, "\a\733\0", 0x810)        
=
2011 0
19348/1:     89659 60002455     37 read(0x9, "\004\715\0", 0x810)
  = 1229 0
19348/1:     90259    1131     59 write(0x9, "\0", 0x15)         = 21
0
19348/1:     90338    1120     43 read(0x9, "\a\733\0", 0x810)        
=
2011 0
19348/1:     90490      26     15 read(0x9, "\a\733\0", 0x810)        
=
2011 0
^C
CALL                                        COUNT
write                                          18
read                                           33
feiwen wan wrote:> > Hi folks, > > This is a client-server environment, oracle export process is running > on one system and database is running on the other system, the export > process is running very slow but is running just fine on the database > server, so I ran dtrace script dtruss on the client side as followed, > firstly I wonder why there are some read system calls have much larger > elapsed time than other read system calls from same file descriptor? and > secondly how can I troubleshoot further for read system call?a few ideas: - you may be causing page faults with your read calls. what kind of file is this? - the script seems somewhat inconsistent, as return values aren''t printed evey time. - can you explain what exactly you''re measuring with "RELATIVE" and "ELAPSED"? I see no connection between the two .. - if indeed "elapsed" is the time this one system call takes, then you could do flowindent tracing on the read system call to see what''s happening. HTH -- Michael Schuster (+49 89) 46008-2974 / x62974 visit the online support center: http://www.sun.com/osc/ Recursion, n.: see ''Recursion''
Michael Schuster - Sun Germany wrote:>feiwen wan wrote: > > >>Hi folks, >> >> This is a client-server environment, oracle export process is running >>on one system and database is running on the other system, the export >>process is running very slow but is running just fine on the database >>server, so I ran dtrace script dtruss on the client side as followed, >>firstly I wonder why there are some read system calls have much larger >>elapsed time than other read system calls from same file descriptor? and >>secondly how can I troubleshoot further for read system call? >> >>The time spent in read() call can vary because of various things : - Like Michael pointed out you could cause page faults in some which could take longer. - Filesystems usually have a read-ahead algorithm which fetch pages in advance. This adds another variable. Also, it would be useful to know what does that fd refer to. Where does the file reside ? Or is it a socket or a pipe ? pfiles would be of help here. Or the other alternative could be to trace all the calls made after read() syscall. That would tell us what code we are using. This option might turn out to be expensive. But, the info would be worth it :) Regards, Sanjeev.> >a few ideas: >- you may be causing page faults with your read calls. what kind of file is >this? >- the script seems somewhat inconsistent, as return values aren''t printed >evey time. >- can you explain what exactly you''re measuring with "RELATIVE" and >"ELAPSED"? I see no connection between the two .. >- if indeed "elapsed" is the time this one system call takes, then you >could do flowindent tracing on the read system call to see what''s >happening. > >HTH > >
G''Day Feiwen, On Mon, 14 Nov 2005, feiwen wan wrote:> Hi folks, > > This is a client-server environment, oracle export process is running > on one system and database is running on the other system, the export > process is running very slow but is running just fine on the database > server, so I ran dtrace script dtruss on the client side as followed, > firstly I wonder why there are some read system calls have much larger > elapsed time than other read system calls from same file descriptor? and > secondly how can I troubleshoot further for read system call? > > thanks, > > Feiwen > ===================================================================================> # ./dtruss -aeop 19348 > PID/LWP RELATIVE ELAPSD CPU SYSCALL(args) = return > 19348/1: 79899 1242 169 write(0xA, "D\021\0", 0x2000) > 19348/1: 80303 63 51 write(0x9, "\0", 0x15) > 19348/1: 80366 1021 39 read(0x9, "\a\733\0", 0x810) > 19348/1: 80504 27 16 read(0x9, "\a\733\0", 0x810) > 19348/1: 80593 60002498 35 read(0x9, "\004\710\0", 0x810) > 19348/1: 81143 1144 82 write(0xA, "s D\021\0", 0x2000) > 19348/1: 81309 63 51 write(0x9, "\0", 0x15) > 19348/1: 81373 827 41 read(0x9, "\a\733\0", 0x810) > 19348/1: 81518 26 15 read(0x9, "\a\733\0", 0x810) > 19348/1: 81696 60003701 115 read(0x9, "\004\724\0", 0x810) > 19348/1: 82286 1128 59 write(0x9, "\0", 0x15) > 19348/1: 82363 761 41 read(0x9, "\a\733\0", 0x810) > 19348/1: 82516 24 13 read(0x9, "\a\733\0", 0x810) > 19348/1: 82614 60003141 35 read(0x9, "\004\730\0", 0x810)[...] The long reads are taking exactly 60.00 seconds elapsed time (the units are microseconds). This sounds a lot like timeouts or polling. As suggested, running "pfiles 19348" will show what file descriptor 9 is (which those reads are waiting for). That should be quite interesting. I can''t say I use relative timestamps a lot (they are vtimestamp based), I''ve found elapsed and on-cpu (-eo) the most useful. cheers, Brendan