I am not sure if this is a ZFS issue, a Niagara issue, or something else.
Does someone know why the commands below have the latency shown?

*1) do an ls of a directory. 6.9 seconds total; truss only shows .07 seconds.*

itsm-mpk-2% timex truss -fdD -o truss.out ls | wc

real        6.94
user        0.01
sys         0.05
     639     639   37247
itsm-mpk-2% tail truss.out
27449:   0.0714  0.0002 fstat64(1, 0xFFBFDBF8)                          = 0
27449:   0.0720  0.0006 write(1, " C . 2 0 0 6 0 6 3 0 2 2".., 5120)    = 5120
27449:   0.0726  0.0006 write(1, " g z\n C . 2 0 0 6 0 7 1".., 5120)    = 5120
27449:   0.0733  0.0007 write(1, " n . C O M . t a r . g z".., 5120)    = 5120
27449:   0.0739  0.0006 write(1, " . t a r . g z\n C . 2 0".., 5120)    = 5120
27449:   0.0745  0.0006 write(1, " 1 0 9 . s r 1 - e h d b".., 5120)    = 5120
27449:   0.0752  0.0007 write(1, " 2 - 0 1 . i s t a n b u".., 5120)    = 5120
27449:   0.0758  0.0006 write(1, " p e 0 6 - 0 5 . s u n r".., 5120)    = 5120
27449:   0.0763  0.0005 write(1, " c s y d 0 7 - 0 2 . S M".., 1407)    = 1407
27449:   0.0766  0.0003 _exit(0)

*2) do a pwd. 4.7 seconds total; truss only shows .18 seconds.*

itsm-mpk-2% timex truss -fdD -o truss.out pwd
/upload/canary/incoming

real        4.75
user        0.02
sys         0.06
itsm-mpk-2% !tail
tail truss.out
19422:   0.1802  0.0003 close(3)                                        = 0
19422:   0.1820  0.0018 munmap(0xFF3A0000, 8192)                        = 0
19422:   0.1824  0.0004 mmap(0x00010000, 24576, PROT_READ|PROT_WRITE|PROT_EXEC, MAP_PRIVATE|MAP_ANON|MAP_ALIGN, -1, 0) = 0xFF3A0000
19422:   0.1830  0.0006 getcontext(0xFFBFF788)
19422:   0.1832  0.0002 getrlimit(RLIMIT_STACK, 0xFFBFF768)             = 0
19422:   0.1835  0.0003 getpid()                                        = 19422 [18587]
19422:   0.1838  0.0003 setustack(0xFF3A2088)
19422:   0.1843  0.0005 getcwd("/upload/canary/incoming", 1025)         = 0
19422:   0.1849  0.0006 write(1, " / u p l o a d / c a n a".., 24)      = 24
19422:   0.1854  0.0005 _exit(0)

*Try it on a normal Sun Ray server: 0.21 secs instead of 4.71 secs.*

sr1-ubrm-20% timex truss -fdD -o truss.out pwd
/home/smeighan

real        0.21
user        0.01
sys         0.02

*3) do an uptime. 4.7 seconds total; truss only shows .68 seconds.*

timex truss -fdD -o truss.out uptime
  9:49am  up 11 day(s), 16:38,  4 users,  load average: 1.96, 2.39, 2.11

real        4.70
user        0.01
sys         0.04
itsm-mpk-2% !tail
tail truss.out
23966:   0.6823  0.0002 mmap(0x00000000, 8192, PROT_READ|PROT_WRITE|PROT_EXEC, MAP_PRIVATE|MAP_ANON, -1, 0) = 0xFF380000
23966:   0.6826  0.0003 sysinfo(SI_ISALIST, "", 1)                      = 115
23966:   0.6829  0.0003 sysinfo(SI_ISALIST, "sparcv9 sparcv8plus sparcv8 sparcv8-fsmuld sparcv7 sparc sparcv9+vis sparcv9+vis2 sparcv8plus+vis sparcv8plus+vis2", 115) = 115
23966:   0.6834  0.0005 open("/proc/self/auxv", O_RDONLY)               = 3
23966:   0.6837  0.0003 fstat(3, 0xFFBFFB00)                            = 0
23966:   0.6839  0.0002 read(3, "\0\007D8FFBFFFD8\0\007DE".., 152)      = 152
23966:   0.6842  0.0003 close(3)                                        = 0
23966:   0.6844  0.0002 access("/usr/bin/sparcv9/uptime", X_OK)         = 0
23966:   0.6847  0.0003 execve("/usr/bin/sparcv9/uptime", 0xFFBFFCB4, 0xFFBFFCBC)
23966:  *** cannot trace across exec() of /usr/bin/sparcv9/uptime ***

It acts like the command is not getting scheduled right away.
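If it really is scheduling, I suppose the sched provider ought to show the
run-queue wait directly. Here is a minimal sketch, adapted from the
enqueue/dequeue example in the Solaris Dynamic Tracing Guide (I have not
tried it here yet; note that until the exec() the child is still named
after the shell, so this only catches post-exec scheduling):

#!/usr/sbin/dtrace -qs

/* time from run-queue insertion to removal for processes named "pwd" */
sched:::enqueue
/args[1]->pr_fname == "pwd"/
{
        wait[args[0]->pr_lwpid, args[1]->pr_pid] = timestamp;
}

sched:::dequeue
/wait[args[0]->pr_lwpid, args[1]->pr_pid]/
{
        @["run-queue wait (ns)"] =
            quantize(timestamp - wait[args[0]->pr_lwpid, args[1]->pr_pid]);
        wait[args[0]->pr_lwpid, args[1]->pr_pid] = 0;
}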
We don't have a lot of load on this box:

itsm-mpk-2% vmstat 1 11
 kthr      memory            page            disk          faults      cpu
 r b w   swap   free  re   mf  pi po fr de sr s1 s2 -- --   in    sy   cs us sy id
 0 0 0 1298392 660592 338 2708 54  2  2  0  5 408  0  0  0 1754  7387 2113  2  6 91
 0 0 0  430712 390312  25  446  0  0  0  0  0 440  0  0  0 2037  3087 2369  1  2 97
 0 0 0  430528 390136 203 1345  0  0  0  0  0 419  0  0  0 1998  5147 2383  1  3 96
 0 0 0  428024 387584 111 1594  0  0  0  0  0 392  0  0  0 1846  4102 2279  0  3 96
 0 0 0  428104 387488 152 2284  0  0  0  0  0 391  0  0  0 2149  6195 2718  2  4 94
 0 0 0  428048 390240 109 1697  0  0  0  0  0 366  0  0  0 1956  4711 2477  1  3 96
 0 0 0  426560 385792 242 2519 32  0  0  0  0 337  0  0  0 1896  5116 2326  1  5 94
 1 0 0  425648 384992 136 1603  0  0  0  0  0 327  0  0  0 1540  3444 1805  1  5 94
 1 0 0  426336 385584  25  349  0  0  0  0  0 328  0  0  0 1356  1107 1527  0  4 96
 1 0 0  425952 385192  77  711  0  0  0  0  0 322  0  0  0 1414  1785 1598  0  5 95
 1 0 0  425464 384488 249 2263  0  0  0  0  0 347  0  0  0 1855  4636 2264  0  6 94

When some of our batch processes run, we still do not use much of the box:

itsm-mpk-2% vmstat 1 11
 kthr      memory            page            disk          faults      cpu
 r b w   swap    free   re    mf pi po fr de sr s1 s2 -- --   in    sy   cs us sy id
 0 0 0 1299456  660600  339  2707 54  2  2  0  5 408  0  0  0 1753  7388 2113  2  6 91
 0 0 0 1086392 1046240  153  2501  0  0  0  0  0 445  0  0  0 1770 10175 2080  5  4 91
 0 0 0 1074800 1034608  230  3342  0  0  0  0  0 413  0  0  0 1833  6621 2243  1  4 95
 0 0 0 1066920 1029512  474  6529  0  0  0  0  0 493  0  0  0 1931 25615 2372 10 10 80
 0 0 0 1021792  987048  771 13479  0  0  0  0  0 481  0  0  0 2342 37421 3094 29 20 51
 0 0 0  991712  964256 1083 20993  0  0  0  0  0 461  0  0  0 2601 39540 3904  5 30 65
 0 0 0  985880  953208 1125 22201  8  0  0  0  0 430  0  0  0 2695 49500 3996 12 35 54
 0 0 0  971920  936512 1295 26029  0  0  0  0  0 435  0  0  0 2753 48871 4311 10 38 52
 0 0 0  946088  923368 1117 24737  0  0  0  0  0 440  0  0  0 3835 46394 6793  6 36 59
 0 0 0  935304  914856 1029 22261  0  0  0  0  0 418  0  0  0 3836 45741 6734  5 34 61
 0 0 0  940704  902776 1145 25996  0  0  0  0  0 427  0  0  0 3724 46909 6476  6 42 51

The box is a T2000 (32 x 1000 MHz hardware threads, 8 GB of memory)
running Solaris 10 with ZFS. We do have root on this box, so we can run
DTrace. I have included the output from dexplorer.

thanks
sean

PS: I have copied the internal Niagara alias as a BCC since I am including
an external email alias.

(Attachment: de_itsm-mpk-2_200607101001.tar.gz, 38381 bytes - dexplorer output.)
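I can also watch per-thread microstates while this happens. If I
understand prstat(1M) right, the LAT column in microstate mode is the
percentage of time a thread spends waiting for a CPU, so something like
this (untried so far) should say whether anything is sitting on a run
queue:

itsm-mpk-2% prstat -mL 1     # -m: microstate columns, -L: one line per LWP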
Michael Schuster - Sun Microsystems
2006-Jul-11 08:13 UTC
[zfs-discuss] long time to schedule commands
Sean Meighan wrote:
> I am not sure if this is a ZFS issue, a Niagara issue, or something
> else. Does someone know why the commands below have the latency shown?
>
> *1) do an ls of a directory. 6.9 seconds total; truss only shows .07
> seconds.*

[...]

this may be an issue with your $PATH. Do you see the same behaviour if
you use absolute paths for the commands?

HTH
Michael
--
Michael Schuster                        (+49 89) 46008-2974 / x62974
visit the online support center:        http://www.sun.com/osc/

Recursion, n.: see 'Recursion'
Michael Schuster - Sun Microsystems wrote:
> Sean Meighan wrote:
>> I am not sure if this is a ZFS issue, a Niagara issue, or something
>> else. Does someone know why the commands below have the latency shown?
>>
>> *1) do an ls of a directory. 6.9 seconds total; truss only shows .07
>> seconds.*
>
> [...]
>
> this may be an issue with your $PATH. Do you see the same behaviour if
> you use absolute paths for the commands?

It could also be an interaction with your shell. ksh, for example, will
write to your $HISTFILE before the exec. If it takes a long time to
update your $HISTFILE, then it will appear as if the command itself is
slow. truss -d on the shell will show this conclusively.
 -- richard
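For example, from another window (a rough sketch - substitute the pid of
the suspect shell, then run "pwd" in the traced shell and look for large
time deltas around any $HISTFILE writes):

% truss -fd -o /tmp/shell.out -p <shell-pid>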
Richard Elling wrote:
> Michael Schuster - Sun Microsystems wrote:
>> Sean Meighan wrote:
>>> I am not sure if this is a ZFS issue, a Niagara issue, or something
>>> else. Does someone know why the commands below have the latency shown?
>>>
>>> *1) do an ls of a directory. 6.9 seconds total; truss only shows .07
>>> seconds.*
>>
>> [...]
>>
>> this may be an issue with your $PATH. Do you see the same behaviour if
>> you use absolute paths for the commands?
>
> It could also be an interaction with your shell. ksh, for example, will
> write to your $HISTFILE before the exec. If it takes a long time to
> update your $HISTFILE, then it will appear as if the command itself is
> slow. truss -d on the shell will show this conclusively.

ksh doesn't just write to $HISTFILE - ksh fsync()s it, and thus one may
be tripping over:

        6413510 zfs: writing to ZFS filesystem slows down fsync() on
        other files in the same FS

Certainly, I initially found this issue as a result of sluggishness at
the ksh prompt.

Dana
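A quick way to check for slow fsync()s here - a minimal sketch, assuming
fsync(3C) enters the kernel as the fdsync syscall, as it does on
Solaris 10:

#!/usr/sbin/dtrace -qs

/* distribution of fsync()/fdsync() latency, keyed by calling process */
syscall::fdsync:entry
{
        self->ts = timestamp;
}

syscall::fdsync:return
/self->ts/
{
        @["fsync latency (ns)", execname] = quantize(timestamp - self->ts);
        self->ts = 0;
}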
I made sure the PATH is clean, and I also fully qualified the paths. The
time varies from 0.5 seconds to 15 seconds. If I just do a "timex pwd",
it always seems to be fast. We are using csh.

itsm-mpk-2% env
HOME=/app/canary
PATH=/usr/bin:/usr/local/bin:/usr/sbin
LOGNAME=canary
HZ=100
TERM=xterm
TZ=US/Pacific
SHELL=/bin/csh
MAIL=/var/mail/canary
DISPLAY=sr1-ubrm-20:55.0
PWD=/app/canary/data/incoming
USER=canary
JAVA_HOME=/usr/jdk/instances/jdk1.5.0
TOOLS=/app/tools
LD_LIBRARY_PATH=
VIM=/app/tools/vim70/share/vim/vim70
CVSROOT=:pserver:canary@cvs.sfbay:/export/cvs/cvsroot
EDITOR=vi
ENV=/app/canary/.kshrc
LD_LIBRARY_PATH=
PATH=/usr/bin:/usr/local/bin:/usr/ccs/bin:/usr/sbin:/usr/openwin/bin

itsm-mpk-2% /bin/timex /bin/truss -fdD -o truss.out pwd
/upload/canary/incoming

real       13.57
user        0.01
sys         0.03

itsm-mpk-2% tail truss.out
26078:   0.0541  0.0001 close(3)                                        = 0
26078:   0.0553  0.0012 munmap(0xFF3A0000, 8192)                        = 0
26078:   0.0556  0.0003 mmap(0x00010000, 24576, PROT_READ|PROT_WRITE|PROT_EXEC, MAP_PRIVATE|MAP_ANON|MAP_ALIGN, -1, 0) = 0xFF3A0000
26078:   0.0560  0.0004 getcontext(0xFFBFF7F8)
26078:   0.0562  0.0002 getrlimit(RLIMIT_STACK, 0xFFBFF7D8)             = 0
26078:   0.0563  0.0001 getpid()                                        = 26078 [25982]
26078:   0.0565  0.0002 setustack(0xFF3A2088)
26078:   0.0568  0.0003 getcwd("/upload/canary/incoming", 1025)         = 0
26078:   0.0573  0.0005 write(1, " / u p l o a d / c a n a".., 24)      = 24
26078:   0.0576  0.0003 _exit(0)

Michael Schuster - Sun Microsystems wrote:
> Sean Meighan wrote:
>> I am not sure if this is a ZFS issue, a Niagara issue, or something
>> else. Does someone know why the commands below have the latency shown?
>>
>> *1) do an ls of a directory. 6.9 seconds total; truss only shows .07
>> seconds.*
>
> [...]
>
> this may be an issue with your $PATH. Do you see the same behaviour if
> you use absolute paths for the commands?
>
> HTH
> Michael

--
Sean Meighan, Mgr ITSM Engineering, Sun Microsystems, Inc.
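Since truss shows almost nothing happening inside pwd itself, maybe the
gap is before the exec? A proc-provider sketch (untried - "csh" is
assumed as the parent shell) that times the shell's fork-to-exec window:

#!/usr/sbin/dtrace -qs

/* how long does a child of csh take to get from fork() to exec()? */
proc:::create
/execname == "csh"/
{
        birth[args[0]->pr_pid] = timestamp;
}

proc:::exec-success
/birth[pid]/
{
        printf("%d: fork -> exec(%s) took %d us\n",
            pid, curpsinfo->pr_fname, (timestamp - birth[pid]) / 1000);
        birth[pid] = 0;
}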
Sean Meighan wrote:
> I made sure the PATH is clean, and I also fully qualified the paths.
> The time varies from 0.5 seconds to 15 seconds. If I just do a
> "timex pwd", it always seems to be fast. We are using csh.

Here's a simple dscript to figure out how long each syscall is taking:

#!/usr/sbin/dtrace -FCs

syscall:::entry
/execname == "pwd"/
{
        @count[probefunc] = count();
        self->time = timestamp;
}

syscall:::return
/self->time/
{
        @avg[probefunc] = avg(timestamp - self->time);
        @sum[probefunc] = sum(timestamp - self->time);
        self->time = 0;
}

END
{
        printf("\nCOUNT\n");
        printa(@count);
        printf("\n\n\n\nAVG TIME\n");
        printa(@avg);
        printf("\n\n\n\nSUM TIME\n");
        printa(@sum);
}

Then you can run this like:

fsh-hake# /usr/bin/time dtrace -Zs dscript.d -c '/usr/bin/pwd'

Make sure to path out 'pwd' (as opposed to using your shell's built-in
command). If it isn't 'pwd', then you can see if 'time' is taking a long
time too....

eric