Kevin McCormack
2010-Mar-30  09:09 UTC
[dtrace-discuss] Analyzing java class loading with dtrace
Hi there, I am trying to diagnose a performance problem on a Sun T5120 running Solaris 10. I see general poor performance using Java and, in particular, loading classes. I have developed a simple java test that loads all of the classes in a Crystal Reports JAR file (around 7000 classes) which takes much longer on the T5120 (40 secs) than on a Core 2 Duo windows PC (5 secs). I recognise that this time difference may be down to hardware specification but if that is the case, I wish to fully understand which aspect of the hardware (CPU, memory, disk etc.) is limiting the performance of the T5120. In order to try to explain this time difference, I the test on Solaris using the dtrace toolkit. Running procsystime gives the following results (I''ve condensed them to keep the email short): Elapsed Times: ? ? ? ? SYSCALL ? ? ? ? ?TIME (ns) ? ? ?... ? ? ? ? ? ?stat ? ? ? ? ? 18498184 ? ? ? ? fstat64 ? ? ? ? ? 18720808 ? ? ? ? ? ?open ? ? ? ? ? 21688773 ? ? ? ? ?open64 ? ? ? ? ? 41127563 ? ? ? ? ?stat64 ? ? ? ? ? 67024580 ? ? resolvepath ? ? ? ? ?104326254 ? ? ? ? ? yield ? ? ? ? ?514242236 ? ? ? ? ? ?read ? ? ? ? ?608379863 lwp_mutex_timedlock ? ? ? ? 1541179345 ? ? ? ? pollsys ? ? ? ?37961554776 ? ? ? ?lwp_park ? ? ? ?38822747920 ? lwp_cond_wait ? ? ?1530739381685 ? ? ? ? ?TOTAL: ? ? ?1610584447807 CPU Times ? ? ? ? SYSCALL ? ? ? ? ?TIME (ns) ? ? ?... ? ? ? ? ? ?stat ? ? ? ? ? 14066206 ? ? ? ? ?munmap ? ? ? ? ? 14522555 ? ? ? ? ?open64 ? ? ? ? ? 22698725 ? ? ? ? pollsys ? ? ? ? ? 24852893 ? lwp_cond_wait ? ? ? ? ? 26192803 lwp_mutex_timedlock ? ? ? ? ? 30490184 ? ? ? ? ?stat64 ? ? ? ? ? 53125558 ? ? ? ? ? ?read ? ? ? ? ? 55471522 ? ? resolvepath ? ? ? ? ? 80374974 ? ? ? ? ? yield ? ? ? ? ?224775263 ? ? ? ? ?TOTAL: ? ? ? ? ?634822857 Syscall Counts ? ? ? ? SYSCALL ? ? ? ? ? ? ?COUNT ? ? ?... ? ? ? ? ?open64 ? ? ? ? ? ? ? ?613 ? ? ? ? ? fcntl ? ? ? ? ? ? ? ?635 ? ? ? ? pollsys ? ? ? ? ? ? ? ?647 ? ? ? ? ? ?stat ? ? ? ? ? ? ? ?704 ? lwp_cond_wait ? ? ? ? ? ? ? ?794 ? ? ? ? ? close ? ? ? ? ? ? ? ?834 lwp_mutex_timedlock ? ? ? ? ? ? ? ?845 ? ? ? ? ? ? brk ? ? ? ? ? ? ? ?848 ? ? ? ? ?llseek ? ? ? ? ? ? ? 1403 ? ? ? ? ? ?read ? ? ? ? ? ? ? 1515 ? ? ? ? fstat64 ? ? ? ? ? ? ? 1593 ? ? resolvepath ? ? ? ? ? ? ? 1700 ? ? ? ? ?stat64 ? ? ? ? ? ? ? 1778 ? ? ? ? ? yield ? ? ? ? ? ? ?27285 ? ? ? ? ?TOTAL: ? ? ? ? ? ? ?43946 I''m not an expert in dtrace, but what I take from this is that of the total execution time (measured at 40 seconds), only 0.63 seconds is spent actually executing on the CPU. Q1: Is this a correct interpretation? Suspecting that perhaps this ''missing'' time was spent in I/O, I executed iosnoop while the program ran, with the following results: STRTIME ? ? ? ? ? ? ?DEVICE ?MAJ MIN ? UID ? PID D ? ?BLOCK ? SIZE ? ? ? ? ? ? ? ?PATHNAME ARGS 2010 Mar 18 09:22:55 sd1 ? ? ?32 ?13 10054 20116 R 32569904 ? 8192 /var/tmp/test-crystal/lib/cr/CrystalReportsRuntime.jar 2010 Mar 18 09:22:55 sd1 ? ? ?32 ?13 10054 20116 R 32569920 ? 8192 /var/tmp/test-crystal/lib/cr/CrystalReportsRuntime.jar 2010 Mar 18 09:22:56 sd1 ? ? ?32 ? 8 10054 20116 R 42673424 ? 8192 /usr/j2sdk-1.5.0_14/jdk1.5.0_14/jre/lib/rt.jar 2010 Mar 18 09:23:02 sd1 ? ? ?32 ?13 10054 20116 R 32569888 ? 8192 /var/tmp/test-crystal/lib/cr/CrystalReportsRuntime.jar 2010 Mar 18 09:23:10 sd1 ? ? ?32 ?13 10054 20116 R 32564784 ? 8192 /var/tmp/test-crystal/lib/cr/CrystalReportsRuntime.jar 2010 Mar 18 09:23:10 sd1 ? ? ?32 ?13 10054 20116 R 32564816 ? 8192 /var/tmp/test-crystal/lib/cr/CrystalReportsRuntime.jar 2010 Mar 18 09:23:12 sd1 ? ? ?32 ? 8 10054 20116 R ?6395504 ? 8192 /usr/lib/libfontconfig.so.1 2010 Mar 18 09:23:12 sd1 ? ? ?32 ? 8 10054 20116 R ?6395488 ? 8192 /usr/lib/libfontconfig.so.1 2010 Mar 18 09:23:13 sd1 ? ? ?32 ? 8 10054 20116 R ?1823458 ? 7168 /usr/openwin/lib/locale/iso_8859_4/X11/fonts/Type1/LucidaSansLat4.pfb 2010 Mar 18 09:23:14 sd1 ? ? ?32 ? 8 10054 20116 R 42698368 ? 8192 /usr/j2sdk-1.5.0_14/jdk1.5.0_14/jre/lib/charsets.jar 2010 Mar 18 09:23:14 sd1 ? ? ?32 ? 8 10054 20116 R 42698464 ?73728 /usr/j2sdk-1.5.0_14/jdk1.5.0_14/jre/lib/charsets.jar 2010 Mar 18 09:23:18 sd1 ? ? ?32 ?13 10054 20116 R 32560384 ? 8192 /var/tmp/test-crystal/lib/cr/CrystalReportsRuntime.jar 2010 Mar 18 09:23:20 sd1 ? ? ?32 ?13 10054 20116 R 32566144 ? 8192 /var/tmp/test-crystal/lib/cr/CrystalReportsRuntime.jar 2010 Mar 18 09:23:23 sd1 ? ? ?32 ? 8 10054 20116 R 42673584 ?32768 /usr/j2sdk-1.5.0_14/jdk1.5.0_14/jre/lib/rt.jar 2010 Mar 18 09:23:24 sd1 ? ? ?32 ?13 10054 20116 R 32549104 ? 8192 /var/tmp/test-crystal/lib/cr/CrystalReportsRuntime.jar 2010 Mar 18 09:23:25 sd1 ? ? ?32 ?13 10054 20116 R 32556848 ? 8192 /var/tmp/test-crystal/lib/cr/CrystalReportsRuntime.jar My interpretation is that this is very little I/O which should not be holding up the CPU. Q2: Am I correct in saying that this is very little I/O? I have looked at various dtrace tutorials online and it always seems obvious from procsystime where the CPU time is going missing but here I don''t see anything so obvious. Q3: Does anyone have any ideas as to how I could continue this investigation to determine where the bulk of the execution time is going? Thanks in advance for any help. Cheers, Kevin.
Jeremiah Campbell
2010-Mar-30  15:10 UTC
[dtrace-discuss] Analyzing java class loading with dtrace
I''m curious if you''re seeing a similar problem to what I experienced recently. We had recently migrated some JVM''s onto a brand new Solaris 10 T5220 and saw horrendous performance. We opened a case with Sun and eventually got to a kernel engineer who discovered we were running into a variant of bug ID 6815915 which is a C++ or libc bug specifically. However, there is a Java variant of it that was affecting us (never did see a specific bug ID for the java variant). The issue was discovered when looking at at least a level 2 guds and looking at the lockstat-H.out file. We were seeing a bunch of mutex locks bubbling up out of the dtrace helper provider (HotSpot). The work-around was to disable the Dtrace helper functionality (can be turned back on via command line if needed) before starting the application. The work-around was to set an environment variable in your JVM startup scripts (tomcat profile for instance if your JVMs run under tomcat). Here''s the specific quote from the Sun engineer; "The system is under heavy lock contention on dtrace_meta_lock because all java processes are busy in register or destroy dtrace helper functions. Java should not always load DTrace probes. This is likely a Java variation of CR 6815915 which is filed for the C++ runtime libray. The workaround is to run the application with the environment variable DTRACE_DOF_INIT_DISABLE set. " The below makes me wonder if you''re getting some mutex locks out of Dtrace as well. You might try finding this with guds. I''m no Dtrace guru, so couldn''t tell you how to pull the same information out of Dtrace as what we found in guds.> Elapsed Times: > ? ? ? ? SYSCALL ? ? ? ? ?TIME (ns) > ? ? ?... > lwp_mutex_timedlock ? ? ? ? 1541179345 > > CPU Times > ? ? ? ? SYSCALL ? ? ? ? ?TIME (ns) > ? ? ?... > lwp_mutex_timedlock ? ? ? ? ? 30490184-- This message posted from opensolaris.org
Marty Itzkowitz
2010-Mar-30  19:02 UTC
[dtrace-discuss] Analyzing java class loading with dtrace
You could also try to use the Oracle Solaris Studio performance tools,
and profile the application.  It does not user DTrace, so
the DTrace lock contention issue won''t come up.
  http://developers.sun.com/sunstudio/overview/topics/analyzing.jsp
    Marty Itzkowitz
        Project lead, Oracle Solaris Studio Performance Tools.
On 3/30/10 8:10 AM, Jeremiah Campbell wrote:
 > I''m curious if you''re seeing a similar problem to what I
experienced
recently. We had recently migrated some JVM''s onto a brand new Solaris 
10 T5220 and saw horrendous performance. We opened a case with Sun and 
eventually got to a kernel engineer who discovered we were running into 
a variant of bug ID 6815915 which is a C++ or libc bug specifically. 
However, there is a Java variant of it that was affecting us (never did 
see a specific bug ID for the java variant).
 > The issue was discovered when looking at at least a level 2 guds and 
looking at the lockstat-H.out file. We were seeing a bunch of mutex 
locks bubbling up out of the dtrace helper provider (HotSpot). The 
work-around was to disable the Dtrace helper functionality (can be 
turned back on via command line if needed) before starting the 
application. The work-around was to set an environment variable in your 
JVM startup scripts (tomcat profile for instance if your JVMs run under 
tomcat). Here''s the specific quote from the Sun engineer;
 >
 > "The system is under heavy lock contention on dtrace_meta_lock 
because all java processes are busy in register or destroy dtrace helper 
functions. Java should not always load DTrace probes. This is likely a 
Java variation of CR 6815915 which is filed for the C++ runtime libray.
 > The workaround is to run the application with the environment 
variable DTRACE_DOF_INIT_DISABLE set. "
 > The below makes me wonder if you''re getting some mutex locks out
of
Dtrace as well. You might try finding this with guds. I''m no Dtrace 
guru, so couldn''t tell you how to pull the same information out of 
Dtrace as what we found in guds.
 >
 >> Elapsed Times:
 >>         SYSCALL          TIME (ns)
 >>      ...
 >> lwp_mutex_timedlock         1541179345
 >>
 >> CPU Times
 >>         SYSCALL          TIME (ns)
 >>      ...
 >> lwp_mutex_timedlock           30490184
 >>
Kevin McCormack
2010-Mar-31  10:19 UTC
[dtrace-discuss] Analyzing java class loading with dtrace
Hi Jeremiah, Thanks very much for the reply. It was great to hear that at least one other person has had a similar problem. I have a few questions if you don''t mind: 1) Was your Java performance problem related to this java variant of bug ID 6815915? It''s not clear if that bug affected your diagnosis using dtrace or was the cause of the problem itself. i.e. did setting the environment variable DTRACE_DOF_INIT_DISABLE fix your performance problems or just enable you to get better results using dtrace? 2) Running "export DTRACE_DOF_INIT_DISABLE=true" before running my program in the shell doesn''t seem to make any difference to the execution time. Is this how you implemented the workaround in your JVM startup scripts? 3) Excuse my ignorance, but what is "guds"? Cheers, Kevin.> Message: 2 > Date: Tue, 30 Mar 2010 08:10:19 PDT > From: Jeremiah Campbell <jeremiah.campbell at jpmchase.com> > To: dtrace-discuss at opensolaris.org > Subject: Re: [dtrace-discuss] Analyzing java class loading with dtrace > Message-ID: <441700064.151269961849235.JavaMail.Twebapp at sf-app1> > Content-Type: text/plain; charset=UTF-8 > > I''m curious if you''re seeing a similar problem to what I experienced recently. We had recently migrated some JVM''s onto a brand new Solaris 10 T5220 and saw horrendous performance. We opened a case with Sun and eventually got to a kernel engineer who discovered we were running into a variant of bug ID 6815915 which is a C++ or libc bug specifically. However, there is a Java variant of it that was affecting us (never did see a specific bug ID for the java variant). > > The issue was discovered when looking at at least a level 2 guds and looking at the lockstat-H.out file. We were seeing a bunch of mutex locks bubbling up out of the dtrace helper provider (HotSpot). The work-around was to disable the Dtrace helper functionality (can be turned back on via command line if needed) before starting the application. The work-around was to set an environment variable in your JVM startup scripts (tomcat profile for instance if your JVMs run under tomcat). Here''s the specific quote from the Sun engineer; > > "The system is under heavy lock contention on dtrace_meta_lock because all java processes are busy in register or destroy dtrace helper functions. Java should not always load DTrace probes. This is likely a Java variation of CR 6815915 which is filed for the C++ runtime libray. > > The workaround is to run the application with the environment variable DTRACE_DOF_INIT_DISABLE set. " > > The below makes me wonder if you''re getting some mutex locks out of Dtrace as well. You might try finding this with guds. I''m no Dtrace guru, so couldn''t tell you how to pull the same information out of Dtrace as what we found in guds. >> Elapsed Times: >> ? ? ? ? SYSCALL ? ? ? ? ?TIME (ns) >> ? ? ?... >> lwp_mutex_timedlock ? ? ? ? 1541179345 >> >> CPU Times >> ? ? ? ? SYSCALL ? ? ? ? ?TIME (ns) >> ? ? ?... >> lwp_mutex_timedlock ? ? ? ? ? 30490184
Jeremiah Campbell
2010-Mar-31  14:02 UTC
[dtrace-discuss] Analyzing java class loading with dtrace
> Hi Jeremiah, > > Thanks very much for the reply. It was great to hear > that at least one > other person has had a similar problem. I have a few > questions if you > don''t mind: > > 1) Was your Java performance problem related to this > java variant of > bug ID 6815915? It''s not clear if that bug affected > your diagnosis > using dtrace or was the cause of the problem itself. > i.e. did setting > the environment variable DTRACE_DOF_INIT_DISABLE fix > your performance > problems or just enable you to get better results > using dtrace? >We were definitely seeing this bug with our systems. The env variable actually was the fix for the performance problem. The root cause of the issue was that the Java Hotspot provider was loading a Dtrace helper that was causing excessive mutex locks which was backing up threads on the CPU. We were seeing high run-queue depth and low CPU utilization. Our issue wasn''t just that the JVMs were slow, but that they were slowing down the whole server.> 2) Running "export DTRACE_DOF_INIT_DISABLE=true" > before running my > program in the shell doesn''t seem to make any > difference to the > execution time. Is this how you implemented the > workaround in your JVM > startup scripts? >Yes, that is how we were enabling the work-around, specifically we were setting it to "1", but I don''t think it really matters either way. We had to do this work around for all JVMs running on these servers. If it didn''t improve your performance, then perhaps it''s unrelated.> 3) Excuse my ignorance, but what is "guds"? >guds is a performance gathering script the Solution Center has traditionally liked to have you run while you are experiencing problems. Though there are multiple references to it via a search on bigadmin, I can''t seem to find a link to where to download it from. I obtained it via e-mail from a trouble call at some point in the past. ----------- Jeremiah -- This message posted from opensolaris.org
Kevin McCormack
2010-Apr-05  09:58 UTC
[dtrace-discuss] Analyzing java class loading with dtrace
Hi Jeremiah, Thanks again for the reply. I''ve been away for a few days hence the late response. Although it sounds like we are seeing the same characteristics as you did, like you, I''m not convinced we are seeing the same problem. One final question - which version of Java are you running? Thanks very much for your help, Kevin.> Date: Wed, 31 Mar 2010 07:02:20 PDT > From: Jeremiah Campbell <jeremiah.campbell at jpmchase.com> > To: dtrace-discuss at opensolaris.org > Subject: Re: [dtrace-discuss] Analyzing java class loading with dtrace > Message-ID: <570590640.291270044175886.JavaMail.Twebapp at sf-app1> > Content-Type: text/plain; charset=UTF-8 > >> Hi Jeremiah, >> >> Thanks very much for the reply. It was great to hear >> that at least one >> other person has had a similar problem. I have a few >> questions if you >> don''t mind: >> >> 1) Was your Java performance problem related to this >> java variant of >> bug ID 6815915? It''s not clear if that bug affected >> your diagnosis >> using dtrace or was the cause of the problem itself. >> i.e. did setting >> the environment variable DTRACE_DOF_INIT_DISABLE fix >> your performance >> problems or just enable you to get better results >> using dtrace? >> > We were definitely seeing this bug with our systems. The env variable actually was the fix for the performance problem. The root cause of the issue was that the Java Hotspot provider was loading a Dtrace helper that was causing excessive mutex locks which was backing up threads on the CPU. We were seeing high run-queue depth and low CPU utilization. Our issue wasn''t just that the JVMs were slow, but that they were slowing down the whole server. > >> 2) Running "export DTRACE_DOF_INIT_DISABLE=true" >> before running my >> program in the shell doesn''t seem to make any >> difference to the >> execution time. Is this how you implemented the >> workaround in your JVM >> startup scripts? >> > Yes, that is how we were enabling the work-around, specifically we were setting it to "1", but I don''t think it really matters either way. We had to do this work around for all JVMs running on these servers. If it didn''t improve your performance, then perhaps it''s unrelated. > >> 3) Excuse my ignorance, but what is "guds"? >> > > guds is a performance gathering script the Solution Center has traditionally liked to have you run while you are experiencing problems. Though there are multiple references to it via a search on bigadmin, I can''t seem to find a link to where to download it from. I obtained it via e-mail from a trouble call at some point in the past. > > ----------- > Jeremiah
Kevin McCormack
2010-Apr-22  08:35 UTC
[dtrace-discuss] Analyzing java class loading with dtrace
Just to close this out, some Sun engineers took a look at the system and concluded that the performance was in line with their expectations given that Java classloading is a single-threaded activity and each hardware thread on a T5120 has a clock speed of 1.2GHz and is only capable of executing 1 operation per clock cycle. Using corestat, they observed that the hardware thread/CPU was experiencing 35% utilization when running the test we developed but that mpstat showed 100% utilization for the CPU, the discrepancy being due to cache misses apparently. Anyway, thanks for your help with this problem and I hope this feedback helps someone else out in the future. Kevin. On 5 April 2010 11:58, Kevin McCormack <kevmccormack at gmail.com> wrote:> Hi Jeremiah, > > Thanks again for the reply. I''ve been away for a few days hence the > late response. Although it sounds like we are seeing the same > characteristics as you did, like you, I''m not convinced we are seeing > the same problem. > > One final question - which version of Java are you running? > > Thanks very much for your help, > > Kevin. > > >> Date: Wed, 31 Mar 2010 07:02:20 PDT >> From: Jeremiah Campbell <jeremiah.campbell at jpmchase.com> >> To: dtrace-discuss at opensolaris.org >> Subject: Re: [dtrace-discuss] Analyzing java class loading with dtrace >> Message-ID: <570590640.291270044175886.JavaMail.Twebapp at sf-app1> >> Content-Type: text/plain; charset=UTF-8 >> >>> Hi Jeremiah, >>> >>> Thanks very much for the reply. It was great to hear >>> that at least one >>> other person has had a similar problem. I have a few >>> questions if you >>> don''t mind: >>> >>> 1) Was your Java performance problem related to this >>> java variant of >>> bug ID 6815915? It''s not clear if that bug affected >>> your diagnosis >>> using dtrace or was the cause of the problem itself. >>> i.e. did setting >>> the environment variable DTRACE_DOF_INIT_DISABLE fix >>> your performance >>> problems or just enable you to get better results >>> using dtrace? >>> >> We were definitely seeing this bug with our systems. The env variable actually was the fix for the performance problem. The root cause of the issue was that the Java Hotspot provider was loading a Dtrace helper that was causing excessive mutex locks which was backing up threads on the CPU. We were seeing high run-queue depth and low CPU utilization. Our issue wasn''t just that the JVMs were slow, but that they were slowing down the whole server. >> >>> 2) Running "export DTRACE_DOF_INIT_DISABLE=true" >>> before running my >>> program in the shell doesn''t seem to make any >>> difference to the >>> execution time. Is this how you implemented the >>> workaround in your JVM >>> startup scripts? >>> >> Yes, that is how we were enabling the work-around, specifically we were setting it to "1", but I don''t think it really matters either way. We had to do this work around for all JVMs running on these servers. If it didn''t improve your performance, then perhaps it''s unrelated. >> >>> 3) Excuse my ignorance, but what is "guds"? >>> >> >> guds is a performance gathering script the Solution Center has traditionally liked to have you run while you are experiencing problems. Though there are multiple references to it via a search on bigadmin, I can''t seem to find a link to where to download it from. I obtained it via e-mail from a trouble call at some point in the past. >> >> ----------- >> Jeremiah >
Anon Y Mous
2010-Apr-29  11:58 UTC
[dtrace-discuss] Analyzing java class loading with dtrace
>Just to close this out, some Sun engineers took a look at >the system and concluded that the performance was in >line with their expectations given that Java classloading >is a single-threaded activity and each hardware thread >on a T5120 has a clock speed of 1.2GHz and is only >capable of executing 1 operation per clock cycle.The CMT SPARC stuff is a good idea, but I often find myself wishing that Sun''s hardware engineers would have just massively cranked up the clock cycle to boost single threaded performance on SPARC. You can''t help but wonder how many high margin, big iron hardware contracts Sun lost because the IBM POWER6 CPUs in a Power 570 running at 5.0Ghz will appear to be almost 4 to 5 times faster than a T5120 for single threaded things like importing an Oracle database or Java class loading, and when you''re dealing with CIO''s, the appearance of having a fast server can be more important than actually having a server that is optimal for your workload- reference the Dilbert cartoon below for more information: http://dilbert.com/strips/comic/2009-03-02/ -- This message posted from opensolaris.org