Krzys
2008-May-20 13:04 UTC
[dtrace-discuss] help with troubleshooting problem with process on my T2000
Hello folks,

I have an issue where a developer wrote some code that runs in 10 seconds on his PC. He then moved it to a T2000 (16GB RAM, 1.2GHz, 8 cores, Solaris 10), and the same process took 73 seconds. Then he took his code to a V240 server (8GB RAM, 1GHz, 2 CPUs, Solaris 8), and that process completed in around 50 seconds.

We suspect that the single floating-point unit inside the T2000 may be causing this problem. How would I troubleshoot it? I was thinking of jumping into DTrace, since everyone says it's so great, but I am not sure where to start troubleshooting this. It's too late for me to take a DTrace training class at this point, as it would take weeks to get in. I have looked around on the internet and found many different examples, but I am not sure which one is right for troubleshooting this issue on the T2000.

It does not make sense that a two-CPU machine with a slower clock speed and less RAM outperforms the T2000, which has a single CPU but is faster. Not only that, the T2000 is running Solaris 10, which in theory should perform better. It just does not make any sense...

Any suggestions or pointers would be greatly appreciated.

Regards,
Chris
Vladimir Marek
2008-May-20 13:53 UTC
Hi,

> Hello folks, I have an issue where a developer wrote some code that runs in 10 seconds on his PC. He then moved it to a T2000 (16GB RAM, 1.2GHz, 8 cores, Solaris 10), and the same process took 73 seconds. Then he took his code to a V240 server (8GB RAM, 1GHz, 2 CPUs, Solaris 8), and that process completed in around 50 seconds.

Have you considered using some sort of performance analysis tool on your code? Sun Studio has excellent tools:

http://developers.sun.com/solaris/articles/analyzer_qs.html

With GCC you can use a utility called gprof. Those are tools especially designed to find performance issues in code.

> We suspect that the single floating-point unit inside the T2000 may be causing this problem. How would I troubleshoot it?

Is the application using floating point heavily? Is it multi-threaded?

> I was thinking of jumping into DTrace, since everyone says it's so great

An afternoon nap is also great :)

> but I am not sure where to start troubleshooting this. It's too late for me to take a DTrace training class at this point, as it would take weeks to get in. I have looked around on the internet and found many different examples, but I am not sure which one is right for troubleshooting this issue on the T2000.

Is the developer's machine also running Solaris? If not, I would recommend finding a tool that is available on both systems, so you can compare results easily (gcc will probably be on both systems).

> It does not make sense that a two-CPU machine with a slower clock speed and less RAM outperforms the T2000, which has a single CPU but is faster. Not only that, the T2000 is running Solaris 10, which in theory should perform better. It just does not make any sense...

A lower clock speed does not mean a slower machine. The T2000 is good at executing multithreaded code that is not heavy on floating-point arithmetic.

> Any suggestions or pointers would be greatly appreciated.

Use profiling tools designed for the task.
Hope this helps

--
Vlad
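Vlad's point that a lower clock speed does not necessarily mean a slower machine follows from the usual textbook relation: CPU time = instruction count × cycles per instruction (CPI) / clock rate. Below is a minimal Python sketch of that relation. The instruction count and CPI values in it are hypothetical, chosen only to show how a higher CPI can outweigh a higher clock; they are not measurements of the T2000, the V240, or the MATLAB job.

```python
def cpu_time_seconds(instructions: float, cpi: float, clock_hz: float) -> float:
    """Classic CPU-time relation: time = instructions * CPI / clock rate."""
    if clock_hz <= 0:
        raise ValueError("clock_hz must be positive")
    return instructions * cpi / clock_hz


# Hypothetical workload: the same instruction count on both machines.
INSTRUCTIONS = 50e9

# Hypothetical per-thread CPI values (illustrative only, not measured).
fast_clock = cpu_time_seconds(INSTRUCTIONS, cpi=6.0, clock_hz=1.2e9)
slow_clock = cpu_time_seconds(INSTRUCTIONS, cpi=1.5, clock_hz=1.0e9)

print(f"1.2 GHz, CPI 6.0: {fast_clock:.1f} s")
print(f"1.0 GHz, CPI 1.5: {slow_clock:.1f} s")
```

With these made-up numbers the 1.2 GHz machine takes 250.0 s and the 1.0 GHz machine 75.0 s, because the clock advantage (1.2x) is smaller than the CPI penalty (4x). Measuring the real CPI of the job on each machine is what a profiler or hardware counters would tell you.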
michael schuster
2008-May-20 14:00 UTC
Krzys wrote:
> Hello folks, I have an issue where a developer wrote some code that runs in 10 seconds on his PC. He then moved it to a T2000 (16GB RAM, 1.2GHz, 8 cores, Solaris 10), and the same process took 73 seconds. Then he took his code to a V240 server (8GB RAM, 1GHz, 2 CPUs, Solaris 8), and that process completed in around 50 seconds.
>
> We suspect that the single floating-point unit inside the T2000 may be causing this problem.

Do you have any information about what this program is actually doing, or is meant to do? Can you talk to the developer about this?

michael
--
Michael Schuster http://blogs.sun.com/recursion
Recursion, n.: see 'Recursion'
David Lutz
2008-May-20 14:31 UTC
You should also take a look at the Sun Blueprints paper "Developing and Tuning Applications on UltraSPARC T1 Chip Multithreading Systems" at http://www.sun.com/blueprints/0107/819-5144.html, which covers a lot of issues that are specific to this platform.

DTrace is a very powerful and useful tool, but you will still generally start with higher-level tools like prstat, mpstat, vmstat, etc., and use DTrace to drill down once you have some idea of where to look.

HTH,
Dave Lutz
Rayson Ho
2008-May-20 14:34 UTC
cpustat can tell you the usage of FP ops; you can follow the instructions on p. 35 of "Developing and Tuning Applications on UltraSPARC T1 Chip Multithreading Systems".

BTW, I wrote most of the tuning section for T1/T2 on Wikipedia; you may find some useful information there:

http://en.wikipedia.org/wiki/UltraSPARC_T1#Application_tuning

Rayson
Krzys
2008-May-20 15:20 UTC
Thanks so much to everyone who replied to my post. Here is what I noticed while trying the cooltst utility (http://cooltools.sunsource.net/cooltst/):

wstest[root] cat cooltst.err
Minimum observation interval is 10 seconds
cooltst 3.0.1 executed at on wstest/Solaris/UltraSPARC-T1
runtime=1, interval=10
measure_cpu=1, cooltst=/opt/cooltst_v3.01, examine=MATLAB
Workload spike analysis:
Highest thread was 21455/2 (MATLAB) = 3.0%
2008-05-20 11:05:17: MATLAB 1.5% vs. top thread (MATLAB) 1.5%, total 3.2%
2008-05-20 11:05:28: MATLAB 2.1% vs. top thread (MATLAB) 2.1%, total 3%
2008-05-20 11:05:38: MATLAB 2.5% vs. top thread (MATLAB) 2.5%, total 2.9%
2008-05-20 11:05:49: MATLAB 2.8% vs. top thread (MATLAB) 2.8%, total 2.9%
2008-05-20 11:05:59: MATLAB 2.9% vs. top thread (MATLAB) 2.9%, total 2.7%
2008-05-20 11:06:10: MATLAB 3.0% vs. top thread (MATLAB) 3.0%, total 2.8%
In 6 out of 6 observation intervals, MATLAB was the top thread
Internal system type code: solaris.t1. Ver detail: 4.10
wstest[root]
wstest[root] cat cooltst.out
CoolThreads Selection Tool (cooltst) version 3.0.1
Copyright 2008 Sun Microsystems, Inc. All rights reserved
Use is subject to license terms.

Cooltst observes a running workload and applies various heuristics to assess whether that workload may be suitable for a Sun Fire T1000/T2000/T5x20 system, to help you judge how much effort to put into a feasibility study which might include porting, prototyping, and/or performance measurement of your applications.

Cooltst is NOT a system sizing or capacity planning tool, and the rough approximations used internally in cooltst should not substitute for detailed performance analysis.

System Configuration
Host name           wstest
System name         SUNW,Sun-Fire-T200
Effective UID       0
Cooltst version     3.0.1
OS                  Solaris
OS release          5.10
OS version          Generic_127111-11
Distro              Solaris
BIOS/PROM           OBP 4.25.0 2006/11/07 23:24
Memory              16376 MB
Chip                UltraSPARC-T1
MHz                 1200
Architecture        SPARC
# of Virtual CPUs   32
P0: 1200 MHz UltraSPARC-T1
P1: 1200 MHz UltraSPARC-T1
P2: 1200 MHz UltraSPARC-T1
P3: 1200 MHz UltraSPARC-T1
P4: 1200 MHz UltraSPARC-T1
P5: 1200 MHz UltraSPARC-T1
P6: 1200 MHz UltraSPARC-T1
P7: 1200 MHz UltraSPARC-T1
P8: 1200 MHz UltraSPARC-T1
P9: 1200 MHz UltraSPARC-T1
P10: 1200 MHz UltraSPARC-T1
P11: 1200 MHz UltraSPARC-T1
P12: 1200 MHz UltraSPARC-T1
P13: 1200 MHz UltraSPARC-T1
P14: 1200 MHz UltraSPARC-T1
P15: 1200 MHz UltraSPARC-T1
P16: 1200 MHz UltraSPARC-T1
P17: 1200 MHz UltraSPARC-T1
P18: 1200 MHz UltraSPARC-T1
P19: 1200 MHz UltraSPARC-T1
P20: 1200 MHz UltraSPARC-T1
P21: 1200 MHz UltraSPARC-T1
P22: 1200 MHz UltraSPARC-T1
P23: 1200 MHz UltraSPARC-T1
P24: 1200 MHz UltraSPARC-T1
P25: 1200 MHz UltraSPARC-T1
P26: 1200 MHz UltraSPARC-T1
P27: 1200 MHz UltraSPARC-T1
P28: 1200 MHz UltraSPARC-T1
P29: 1200 MHz UltraSPARC-T1
P30: 1200 MHz UltraSPARC-T1
P31: 1200 MHz UltraSPARC-T1
OS release detail:
Solaris 10 6/06 s10s_u2wos_09a SPARC
Copyright 2006 Sun Microsystems, Inc. All Rights Reserved.
Use is subject to license terms.
Assembled 09 June 2006

Workload Measurements
Observed system for 1 min in intervals of 10 sec
Cycles                       1728826254700
Instructions                 14004912581
CPI                          123.44 **
FP instructions              144384919
Emulated FP instructions     100621
FP Percentage                1.0%
The following applies to the measurement interval with the busiest single thread or process:
Peak thread utilization at   2008-05-20 11:06:10
Corresponding file name      1211295970
CPU utilization              3.9%
Command                      MATLAB
PID/LWPID                    21455/2
Thread utilization           3.0%
More detail on processes and threads is in data/process.out

**Cycles per Instruction (CPI) is not comparable between UltraSPARC T1 and T2 processors and conventional processors. Conventional processors execute an idle loop when there is no work to do, so CPI may be artificially low, especially when the system is somewhat idle. The UltraSPARC T1 and T2 "park" idle threads, consuming no energy, when there is no work to do, so CPI may be artificially high, especially when the system is somewhat idle.

Advice
During the observation of the highest utilization thread, a fairly low overall CPU utilization of 3.9375% was seen. Are you sure that the workload of interest was running on the system during the time cooltst was running? If not, please run cooltst again while your workload is active. If your workload was running during this observation, then take cooltst's advice with this caveat.

If you expect your workload to increase to higher levels, do you expect it to do so by adding additional threads, as is common, or do you expect it to add more work to the existing single thread, as sometimes happens? If your response time criteria are being met, then even if a single thread is responsible for most of your CPU consumption, you should still get acceptable performance from a system based on the UltraSPARC T1 processor (i.e. Sun Fire/SPARC Enterprise T1000, T2000, Sun Blade T6300) or a system based on the UltraSPARC T2 processor (i.e. SPARC Enterprise T5120, T5220, Sun Blade T6320), with excess throughput capacity for future growth. But if response time is marginal and workload growth is expected to be in a single thread, then a CMT system may not be appropriate.

Floating Point      YELLOW
Observed floating point content was marginal for an UltraSPARC T1 processor. You may proceed with your evaluation of UltraSPARC T1 for your workload, watching the floating point usage carefully. Or you may instead consider an UltraSPARC T2 processor which can handle floating point heavy workloads.
Parallelism         GREEN
The observed workload has sufficient threads of execution to efficiently utilize the multiple cores and threads of an UltraSPARC T1 or UltraSPARC T2 processor.
wstest[root]

Based on that tool, my system is not heavily utilized, and certainly not overloaded. MATLAB is certainly using floating-point calculations, but why does it still take so much longer than on a PC, and why is it still slower than on the V240? I was really hoping there is a way to use DTrace to look into this and get some more ideas about what is going on and what the limitation is.

Thanks so much for everyone's help.

Regards,
Chris
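The cooltst figures above can be cross-checked with a little arithmetic. The Python sketch below recomputes CPI (cycles divided by instructions) and the FP percentage from the reported counters, then compares MATLAB's 3.0% thread utilization with the most that one fully busy thread could show. That last comparison rests on an assumption not stated in the cooltst output: that utilization is expressed as a share of all 32 virtual CPUs. If that assumption holds, a single-threaded process can never exceed about 3.1% of the system, so a low overall utilization figure does not by itself mean the MATLAB thread was idle.

```python
# Counter values copied from the cooltst.out report above.
CYCLES = 1_728_826_254_700
INSTRUCTIONS = 14_004_912_581
FP_INSTRUCTIONS = 144_384_919
VIRTUAL_CPUS = 32           # "# of Virtual CPUs" in the report
THREAD_UTILIZATION = 3.0    # percent, MATLAB PID/LWPID 21455/2

# CPI as cooltst reports it: total cycles divided by total instructions.
cpi = CYCLES / INSTRUCTIONS

# Share of all executed instructions that were floating-point instructions.
fp_percentage = 100.0 * FP_INSTRUCTIONS / INSTRUCTIONS

# Assumption: utilization is a share of all virtual CPUs, so a single thread
# running flat out can occupy at most one of them (1/32 of the machine).
single_thread_ceiling = 100.0 / VIRTUAL_CPUS

print(f"CPI:                     {cpi:.2f}")
print(f"FP percentage:           {fp_percentage:.1f}%")
print(f"One-thread ceiling:      {single_thread_ceiling:.3f}%")
print(f"MATLAB share of ceiling: {THREAD_UTILIZATION / single_thread_ceiling:.0%}")
```

This prints a CPI of 123.44 and an FP percentage of 1.0%, matching the report, and shows the MATLAB thread at about 96% of the single-thread ceiling. Under the stated assumption, that pattern would fit a job that keeps one hardware thread busy while the other 31 sit parked, which is also why, per the cooltst footnote, the very high CPI says more about the idle machine than about MATLAB's own efficiency.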