Hi folks, This is a client-server environment, oracle export process is running on one system and database is running on the other system, the export process is running very slow but is running just fine on the database server, so I ran dtrace script dtruss on the client side as followed, firstly I wonder why there are some read system calls have much larger elapsed time than other read system calls from same file descriptor? and secondly how can I troubleshoot further for read system call? thanks, Feiwen ===================================================================================# ./dtruss -aeop 19348 PID/LWP RELATIVE ELAPSD CPU SYSCALL(args) = return 19348/1: 79899 1242 169 write(0xA, "D\021\0", 0x2000) 19348/1: 80303 63 51 write(0x9, "\0", 0x15) 19348/1: 80366 1021 39 read(0x9, "\a\733\0", 0x810) 19348/1: 80504 27 16 read(0x9, "\a\733\0", 0x810) 19348/1: 80593 60002498 35 read(0x9, "\004\710\0", 0x810) 19348/1: 81143 1144 82 write(0xA, "s D\021\0", 0x2000) 19348/1: 81309 63 51 write(0x9, "\0", 0x15) 19348/1: 81373 827 41 read(0x9, "\a\733\0", 0x810) 19348/1: 81518 26 15 read(0x9, "\a\733\0", 0x810) 19348/1: 81696 60003701 115 read(0x9, "\004\724\0", 0x810) 19348/1: 82286 1128 59 write(0x9, "\0", 0x15) 19348/1: 82363 761 41 read(0x9, "\a\733\0", 0x810) 19348/1: 82516 24 13 read(0x9, "\a\733\0", 0x810) 19348/1: 82614 60003141 35 read(0x9, "\004\730\0", 0x810) 19348/1: 83061 1169 100 write(0xA, "56467/03JUL to status D\021 = 8192 0 19348/1: 83389 60 48 write(0x9, "\0", 0x15) 19348/1: 83455 961 42 read(0x9, "\a\733\0", 0x810) 19348/1: 83591 24 13 read(0x9, "\a\733\0", 0x810) 19348/1: 83688 60003567 39 read(0x9, "\004\725\0", 0x810) 19348/1: 84380 1229 156 write(0x9, "\0", 0x15) 19348/1: 84465 862 38 read(0x9, "\a\733\0", 0x810) 19348/1: 84619 24 14 read(0x9, "\a\733\0", 0x810) 19348/1: 84716 60002793 35 read(0x9, "\004\725\0", 0x810) 19348/1: 84958 1158 80 write(0xA, "tatus D\021\0", 0x2000) 19348/1: 85438 65 51 write(0x9, "\0", 0x15) 19348/1: 85505 940 42 read(0x9, "\a\733\0", 0x810) 19348/1: 85632 24 13 read(0x9, "\a\733\0", 0x810) 19348/1: 85808 60003553 122 read(0x9, "\004\725\0", 0x810) = 1237 0 19348/1: 86298 1151 82 write(0xA, "933194/03OCT-5W to status D\021\0", 0x2000) = 8192 0 19348/1: 86563 65 52 write(0x9, "\0", 0x15) = 21 0 19348/1: 86631 1004 42 read(0x9, "\a\733\0", 0x810) = 2011 0 19348/1: 86766 23 13 read(0x9, "\a\733\0", 0x810) = 2011 0 19348/1: 86863 60002650 35 read(0x9, "\004\732\0", 0x810) = 1242 0 19348/1: 87458 1137 68 write(0x9, "\0", 0x15) = 21 0 19348/1: 87538 791 43 read(0x9, "\a\733\0", 0x810) = 2011 0 19348/1: 87693 577 21 read(0x9, "\a\733\0", 0x810) = 2011 0 19348/1: 87764 18 8 read(0x9, "\004\736\0", 0x810) = 1246 0 19348/1: 87997 85 73 write(0xA, " CR 4787092/03OCT to status D\021\0", 0x2000) = 8192 0 19348/1: 88268 34 23 write(0x9, "\0", 0x15) = 21 0 19348/1: 88303 595 23 read(0x9, "\a\733\0", 0x810) = 2011 0 19348/1: 88411 45 18 read(0x9, "\a\733\0", 0x810) = 2011 0 19348/1: 88502 60001907 37 read(0x9, "\004\713\0", 0x810) = 1227 0 19348/1: 89149 1255 185 write(0xA, "Updated CR 4922634/03NOV-3W to status D\021\0", 0x2000) = 8192 0 19348/1: 89344 69 57 write(0x9, "\0", 0x15) = 21 0 19348/1: 89406 1009 39 read(0x9, "\a\733\0", 0x810) = 2011 0 19348/1: 89559 26 15 read(0x9, "\a\733\0", 0x810) = 2011 0 19348/1: 89659 60002455 37 read(0x9, "\004\715\0", 0x810) = 1229 0 19348/1: 90259 1131 59 write(0x9, "\0", 0x15) = 21 0 19348/1: 90338 1120 43 read(0x9, "\a\733\0", 0x810) = 2011 0 19348/1: 90490 26 15 read(0x9, "\a\733\0", 0x810) = 2011 0 ^C CALL COUNT write 18 read 33
feiwen wan wrote:> > Hi folks, > > This is a client-server environment, oracle export process is running > on one system and database is running on the other system, the export > process is running very slow but is running just fine on the database > server, so I ran dtrace script dtruss on the client side as followed, > firstly I wonder why there are some read system calls have much larger > elapsed time than other read system calls from same file descriptor? and > secondly how can I troubleshoot further for read system call?a few ideas: - you may be causing page faults with your read calls. what kind of file is this? - the script seems somewhat inconsistent, as return values aren''t printed evey time. - can you explain what exactly you''re measuring with "RELATIVE" and "ELAPSED"? I see no connection between the two .. - if indeed "elapsed" is the time this one system call takes, then you could do flowindent tracing on the read system call to see what''s happening. HTH -- Michael Schuster (+49 89) 46008-2974 / x62974 visit the online support center: http://www.sun.com/osc/ Recursion, n.: see ''Recursion''
Michael Schuster - Sun Germany wrote:>feiwen wan wrote: > > >>Hi folks, >> >> This is a client-server environment, oracle export process is running >>on one system and database is running on the other system, the export >>process is running very slow but is running just fine on the database >>server, so I ran dtrace script dtruss on the client side as followed, >>firstly I wonder why there are some read system calls have much larger >>elapsed time than other read system calls from same file descriptor? and >>secondly how can I troubleshoot further for read system call? >> >>The time spent in read() call can vary because of various things : - Like Michael pointed out you could cause page faults in some which could take longer. - Filesystems usually have a read-ahead algorithm which fetch pages in advance. This adds another variable. Also, it would be useful to know what does that fd refer to. Where does the file reside ? Or is it a socket or a pipe ? pfiles would be of help here. Or the other alternative could be to trace all the calls made after read() syscall. That would tell us what code we are using. This option might turn out to be expensive. But, the info would be worth it :) Regards, Sanjeev.> >a few ideas: >- you may be causing page faults with your read calls. what kind of file is >this? >- the script seems somewhat inconsistent, as return values aren''t printed >evey time. >- can you explain what exactly you''re measuring with "RELATIVE" and >"ELAPSED"? I see no connection between the two .. >- if indeed "elapsed" is the time this one system call takes, then you >could do flowindent tracing on the read system call to see what''s >happening. > >HTH > >
G''Day Feiwen, On Mon, 14 Nov 2005, feiwen wan wrote:> Hi folks, > > This is a client-server environment, oracle export process is running > on one system and database is running on the other system, the export > process is running very slow but is running just fine on the database > server, so I ran dtrace script dtruss on the client side as followed, > firstly I wonder why there are some read system calls have much larger > elapsed time than other read system calls from same file descriptor? and > secondly how can I troubleshoot further for read system call? > > thanks, > > Feiwen > ===================================================================================> # ./dtruss -aeop 19348 > PID/LWP RELATIVE ELAPSD CPU SYSCALL(args) = return > 19348/1: 79899 1242 169 write(0xA, "D\021\0", 0x2000) > 19348/1: 80303 63 51 write(0x9, "\0", 0x15) > 19348/1: 80366 1021 39 read(0x9, "\a\733\0", 0x810) > 19348/1: 80504 27 16 read(0x9, "\a\733\0", 0x810) > 19348/1: 80593 60002498 35 read(0x9, "\004\710\0", 0x810) > 19348/1: 81143 1144 82 write(0xA, "s D\021\0", 0x2000) > 19348/1: 81309 63 51 write(0x9, "\0", 0x15) > 19348/1: 81373 827 41 read(0x9, "\a\733\0", 0x810) > 19348/1: 81518 26 15 read(0x9, "\a\733\0", 0x810) > 19348/1: 81696 60003701 115 read(0x9, "\004\724\0", 0x810) > 19348/1: 82286 1128 59 write(0x9, "\0", 0x15) > 19348/1: 82363 761 41 read(0x9, "\a\733\0", 0x810) > 19348/1: 82516 24 13 read(0x9, "\a\733\0", 0x810) > 19348/1: 82614 60003141 35 read(0x9, "\004\730\0", 0x810)[...] The long reads are taking exactly 60.00 seconds elapsed time (the units are microseconds). This sounds a lot like timeouts or polling. As suggested, running "pfiles 19348" will show what file descriptor 9 is (which those reads are waiting for). That should be quite interesting. I can''t say I use relative timestamps a lot (they are vtimestamp based), I''ve found elapsed and on-cpu (-eo) the most useful. cheers, Brendan