Krzys
2008-May-20 13:04 UTC
[dtrace-discuss] help with troubleshooting problem with process on my T2000
Hello folks,
I have an issue where a developer wrote some code that runs in 10 seconds on his PC. He then moved it to a T2000 (16GB RAM, 1.2GHz, 8 cores, Solaris 10), where the same process took 73 seconds. He then took his code to a V240 server (8GB RAM, 1GHz, 2 CPUs, Solaris 8), where the process completed in around 50 seconds.
We suspect that the single floating-point unit in the T2000 may be causing this. How would I troubleshoot this problem? I was thinking of jumping into dtrace, as everyone says it's so great, but I am not sure where to start. It's too late for me to take a dtrace training class, as it would take weeks to get in. I have been looking around on the internet and there are many different examples, but I am not sure which one would be right for troubleshooting this issue on the T2000.
It does not make sense to me that a two-CPU machine with a slower clock and less RAM outperforms the T2000, which has a single CPU but a faster one, and which is also running Solaris 10, which in theory should perform better.
Any suggestions or pointers would be greatly appreciated.
Regards,
Chris
Vladimir Marek
2008-May-20 13:53 UTC
[dtrace-discuss] help with troubleshooting problem with process on my T2000
Hi,
> Hello folks, I have an issue where developer wrote a code and it runs 10 seconds
> on his PC, then he moved it to T2000 (16GB RAM, 1.2GHz, 8 CORE, Solaris 10) and
> the same process took 73 seconds. Then he took his code to V240 server (8GB RAM,
> 1GHz, 2CPU's, Solaris 8) and that process completed in around 50 seconds.
Have you considered running some sort of performance analysis on your code? Sun Studio has excellent tools:
http://developers.sun.com/solaris/articles/analyzer_qs.html
Code built with gcc can be profiled with the 'gprof' utility.
These tools are designed specifically for finding performance issues in code.
> We are suspecting that maybe the single Floating-point inside that T2000 is
> causing this problem? How would I troubleshoot this problem?
Is the application using floating point heavily? Is it multi-threaded?
> I was thinking to jump into dtrace as everyone is saying its so great
Afternoon nap is also great :)
> but I am not sure where to start in trying to troubleshoot this. Its
> to late for me to go to take dtrace training class at this moment as
> it will take weeks for me to get in, and I was looking around on the
> internet and there are many different examples but I am not sure which
> one would be the right for troubleshooting this issue with T2000.
Is the developer's machine also running Solaris? If not, I would recommend finding a tool that is available on both systems, so you can compare results easily (gcc will probably be on both systems).
> It does not make sense that two CPU machine that is slower in speed and has less
> RAM outperforms T2000 which has a single CPU but its faster, not only that its
> running Solaris 10 which in theory should be performing better, just does not
> make any sense...
Fewer gigahertz does not mean a slower machine. The T2000 is good at executing multithreaded code that is not heavy on floating-point arithmetic.
> Any suggestions or pointers would be greatly appreciated.
Use profiling tools designed for the task.
Hope this helps
--
Vlad
michael schuster
2008-May-20 14:00 UTC
[dtrace-discuss] help with troubleshooting problem with process on my T2000
Krzys wrote:
> Hello folks, I have an issue where developer wrote a code and it runs 10 seconds
> on his PC, then he moved it to T2000 (16GB RAM, 1.2GHz, 8 CORE, Solaris 10) and
> the same process took 73 seconds. Then he took his code to V240 server (8GB RAM,
> 1GHz, 2CPU's, Solaris 8) and that process completed in around 50 seconds.
>
> We are suspecting that maybe the single Floating-point inside that T2000 is
> causing this problem?
Do you have any information about what this program is actually doing or meant to do? Can you talk to the developer about this?
michael
--
Michael Schuster http://blogs.sun.com/recursion
Recursion, n.: see 'Recursion'
David Lutz
2008-May-20 14:31 UTC
[dtrace-discuss] help with troubleshooting problem with process on my T2000
You should also take a look at the Sun Blueprints paper "Developing and Tuning Applications on UltraSPARC T1 Chip Multithreading Systems" at http://www.sun.com/blueprints/0107/819-5144.html, which covers a lot of issues that are specific to this platform.
Dtrace is a very powerful and useful tool, but you will still generally start with higher-level tools like prstat, mpstat, vmstat, etc., and use dtrace to drill down once you have some idea of where to look.
HTH,
Dave Lutz
Rayson Ho
2008-May-20 14:34 UTC
[dtrace-discuss] help with troubleshooting problem with process on my T2000
cpustat can tell you the usage of FP ops; you can follow the instructions on p35 of "Developing and Tuning Applications on UltraSPARC T1 Chip Multithreading Systems".
BTW, I wrote most of the tuning section for T1/T2 on Wikipedia; you may find some useful information there:
http://en.wikipedia.org/wiki/UltraSPARC_T1#Application_tuning
Rayson
Krzys
2008-May-20 15:20 UTC
[dtrace-discuss] help with troubleshooting problem with process on my T2000
Thanks so much to everyone who replied to my post. Here is what I noticed
while trying the cooltst utility (http://cooltools.sunsource.net/cooltst/):
wstest[root] cat cooltst.err
Minimum observation interval is 10 seconds
cooltst 3.0.1 executed at on wstest/Solaris/UltraSPARC-T1
runtime=1, interval=10
measure_cpu=1, cooltst=/opt/cooltst_v3.01, examine=MATLAB
Workload spike analysis:
Highest thread was 21455/2 (MATLAB) = 3.0%
2008-05-20 11:05:17: MATLAB 1.5% vs. top thread (MATLAB) 1.5%, total 3.2%
2008-05-20 11:05:28: MATLAB 2.1% vs. top thread (MATLAB) 2.1%, total 3%
2008-05-20 11:05:38: MATLAB 2.5% vs. top thread (MATLAB) 2.5%, total 2.9%
2008-05-20 11:05:49: MATLAB 2.8% vs. top thread (MATLAB) 2.8%, total 2.9%
2008-05-20 11:05:59: MATLAB 2.9% vs. top thread (MATLAB) 2.9%, total 2.7%
2008-05-20 11:06:10: MATLAB 3.0% vs. top thread (MATLAB) 3.0%, total 2.8%
In 6 out of 6 observation intervals, MATLAB was the top thread
Internal system type code: solaris.t1. Ver detail: 4.10
wstest[root]
wstest[root] cat cooltst.out
CoolThreads Selection Tool (cooltst) version 3.0.1
Copyright 2008 Sun Microsystems, Inc. All rights reserved
Use is subject to license terms.
Cooltst observes a running workload and applies various heuristics
to assess whether that workload may be suitable for a Sun Fire
T1000/T2000/T5x20 system, to help you judge how much effort to put
into a feasibility study which might include porting, prototyping,
and/or performance measurement of your applications. Cooltst is
NOT a system sizing or capacity planning tool, and the rough
approximations used internally in cooltst should not substitute
for detailed performance analysis.
System Configuration
Host name wstest
System name SUNW,Sun-Fire-T200
Effective UID 0
Cooltst version 3.0.1
OS Solaris
OS release 5.10
OS version Generic_127111-11
Distro Solaris
BIOS/PROM OBP 4.25.0 2006/11/07 23:24
Memory 16376 MB
Chip UltraSPARC-T1
MHz 1200
Architecture SPARC
# of Virtual CPUs 32
P0: 1200 MHz UltraSPARC-T1
P1: 1200 MHz UltraSPARC-T1
P2: 1200 MHz UltraSPARC-T1
P3: 1200 MHz UltraSPARC-T1
P4: 1200 MHz UltraSPARC-T1
P5: 1200 MHz UltraSPARC-T1
P6: 1200 MHz UltraSPARC-T1
P7: 1200 MHz UltraSPARC-T1
P8: 1200 MHz UltraSPARC-T1
P9: 1200 MHz UltraSPARC-T1
P10: 1200 MHz UltraSPARC-T1
P11: 1200 MHz UltraSPARC-T1
P12: 1200 MHz UltraSPARC-T1
P13: 1200 MHz UltraSPARC-T1
P14: 1200 MHz UltraSPARC-T1
P15: 1200 MHz UltraSPARC-T1
P16: 1200 MHz UltraSPARC-T1
P17: 1200 MHz UltraSPARC-T1
P18: 1200 MHz UltraSPARC-T1
P19: 1200 MHz UltraSPARC-T1
P20: 1200 MHz UltraSPARC-T1
P21: 1200 MHz UltraSPARC-T1
P22: 1200 MHz UltraSPARC-T1
P23: 1200 MHz UltraSPARC-T1
P24: 1200 MHz UltraSPARC-T1
P25: 1200 MHz UltraSPARC-T1
P26: 1200 MHz UltraSPARC-T1
P27: 1200 MHz UltraSPARC-T1
P28: 1200 MHz UltraSPARC-T1
P29: 1200 MHz UltraSPARC-T1
P30: 1200 MHz UltraSPARC-T1
P31: 1200 MHz UltraSPARC-T1
OS release detail:
Solaris 10 6/06 s10s_u2wos_09a SPARC Copyright 2006 Sun Microsystems, Inc. All
Rights Reserved. Use is subject to license terms. Assembled 09 June 2006
Workload Measurements
Observed system for 1 min
in intervals of 10 sec
Cycles 1728826254700
Instructions 14004912581
CPI 123.44 **
FP instructions 144384919
Emulated FP instructions 100621
FP Percentage 1.0%
The following applies to the measurement interval with the
busiest single thread or process:
Peak thread utilization at 2008-05-20 11:06:10
Corresponding file name 1211295970
CPU utilization 3.9%
Command MATLAB
PID/LWPID 21455/2
Thread utilization 3.0%
More detail on processes and threads is in data/process.out
**Cycles per Instruction (CPI) is not comparable between UltraSPARC
T1 and T2 processors and conventional processors. Conventional
processors execute an idle loop when there is no work to do, so
CPI may be artificially low, especially when the system is
somewhat idle. The UltraSPARC T1 and T2 "park" idle threads,
consuming no energy, when there is no work to do, so CPI may
be artificially high, especially when the system is somewhat idle.
Advice
During the observation of the highest utilization thread, a fairly
low overall CPU utilization of 3.9375% was seen. Are you sure that
the workload of interest was running on the system during the time
cooltst was running? If not, please run cooltst again while your
workload is active.
If your workload was running during this observation, then take
cooltst''s advice with this caveat. If you expect your workload to
increase to higher levels, do you expect it to do so by adding
additional threads, as is common, or do you expect it to add more
work to the existing single thread, as sometimes happens? If your
response time criteria are being met, then even if a single thread
is responsible for most of your CPU consumption, you should still
get acceptable performance from a a system based on the UltraSPARC
T1 processor (i.e. Sun Fire/SPARC Enterprise T1000, T2000, Sun Blade
T6300) or a system based on the UltraSPARC T2 processor (i.e. SPARC
Enterprise T5120, T5220, Sun Blade T6320), with excess throughput
capacity for future growth. But if response time is marginal and
workload growth is expected to be in a single thread, then a CMT
system may not be appropriate.
Floating Point YELLOW
Observed floating point content was marginal for an
UltraSPARC T1 processor. You may proceed with your
evaluation of UltraSPARC T1 for your workload, watching
the floating point usage carefully. Or you may instead
consider an UltraSPARC T2 processor which can handle
floating point heavy workloads.
Parallelism GREEN
The observed workload has sufficient threads of execution to
efficiently utilize the multiple cores and threads of an
UltraSPARC T1 or UltraSPARC T2 processor.
wstest[root]
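The derived figures in the Workload Measurements block can be rechecked from the raw counters. The short Python sketch below only redoes that arithmetic; the variable names are made up for illustration, and treating the emulated count as a ratio against the FP instruction count is an assumption, since the report does not say how the two counters relate.

```python
# Counter values copied from the "Workload Measurements" block of cooltst.out.
cycles = 1_728_826_254_700
instructions = 14_004_912_581
fp_instructions = 144_384_919
emulated_fp_instructions = 100_621

# Cycles per instruction over the whole one-minute observation window.
cpi = cycles / instructions

# Share of all counted instructions that were floating point.
fp_percentage = 100 * fp_instructions / instructions

# Assumption: emulated FP instructions compared against the FP instruction
# count; the report does not state whether one counter includes the other.
emulated_ratio = 100 * emulated_fp_instructions / fp_instructions

print(f"CPI            {cpi:.2f}")             # prints 123.44, as reported
print(f"FP percentage  {fp_percentage:.1f}%")  # prints 1.0%, as reported
print(f"Emulated / FP  {emulated_ratio:.2f}%")
```

Both reported values reproduce, so the 1.0% FP figure really is counted FP instructions over all counted instructions for the whole machine during that minute, not a figure for the MATLAB thread alone.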
Based on that tool, my system is neither heavily utilized nor overloaded. MATLAB is
certainly using floating-point calculations, but why does it take so much longer
than on a PC, and why is it still slower than on the V240?
I was seriously hoping there is a way to use dtrace to look into this and
get some more ideas about what is going on and what the limitation is.
Thanks so much for everyone's help.
Regards,
Chris
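One way to read the low utilization numbers in the cooltst report is to convert them into hardware strands. The Python sketch below does that under one loudly stated assumption: that cooltst reports both percentages as a fraction of all 32 virtual CPUs. The report does not say so explicitly, and the variable names are illustrative only.

```python
# Values taken from cooltst.out; the interpretation is an assumption.
virtual_cpus = 32            # "# of Virtual CPUs"
thread_utilization = 3.0     # "Thread utilization", in percent
system_utilization = 3.9375  # "overall CPU utilization", in percent

# Assumption: both percentages are relative to all 32 virtual CPUs,
# so a single fully busy hardware strand shows up as 100 / 32 percent.
one_strand = 100 / virtual_cpus

# How many strands' worth of work each percentage represents.
thread_strands = thread_utilization / one_strand
system_strands = system_utilization / one_strand

print(f"one strand     = {one_strand:.3f}% of the machine")
print(f"MATLAB thread  = {thread_strands:.2f} strands")
print(f"whole system   = {system_strands:.2f} strands")
```

If that reading is right, the busiest MATLAB thread (LWP 21455/2) is keeping one strand almost fully busy while the other 31 sit nearly idle, so the run time would be bounded by single-thread speed rather than by total machine capacity. That would fit the earlier remark in this thread that the T2000 is good at multithreaded code that is not heavy on floating-point arithmetic.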