Fernando Gleiser
2008-Nov-24 19:57 UTC
[dtrace-discuss] debugging a faulty jboss/application
From time to time a jboss process would end up eating all the available CPU and the load average would skyrocket. Once the operators restarted jboss, the system would be normal again (sometimes for weeks) until the next incident.

Since we moved the app from a v440 running Solaris 10 8/07 to a t2000 running Solaris 10 5/08, the problem started happening more frequently (2-3 times a week). The only other modification (besides hardware and OS release) is that we put the app server inside a sparse-zone container with FSS and a 95% cap on CPU usage.

Here's what uptime says:

Mon Nov 24 10:10:00 ARST 2008
10:10am  up 10 day(s), 14:40,  6 users,  load average: 325.52, 320.07, 318.72

(yes, the load average is almost 320, but the server was still usable)

A little output from mpstat shows this:

CPU minf mjf xcal  intr ithr  csw icsw migr smtx  srw syscl  usr sys  wt idl
  0    0   0  531   259  150   46   27   17   43    0    29   99   1   0   0
  1    0   0    3    66    0  124   65   36   10    0    86   99   1   0   0
  2    0   0    0    23    0   34   22   17    9    0     8  100   0   0   0
  3    0   0    0    13    0   13   12    6    9    0     1  100   0   0   0
  4    0   0    0    13    0   14   12    9    9    0     2  100   0   0   0
  5    0   0    0    38    0   69   37   17    8    0    23  100   0   0   0
  6    0   0    0    25    0   40   24   13    8    0     7   95   5   0   0
  7    0   0    0    15    0   17   14   10    9    0     3  100   0   0   0
  8    0   0    0    11    0   10   10    6    8    0     0  100   0   0   0
  9    0   0    0    12    0   11   11    7    9    0     0  100   0   0   0
 10    0   0    0    14    0   13   13    7    9    0     0  100   0   0   0
 11    0   0    0    13    0   12   12    6    9    0     0  100   0   0   0
 12    0   0    0    25    0   37   24   14    8    0    25  100   0   0   0
 13    0   0    0    13    0   13   12    9    9    0     1  100   0   0   0
 14    0   0    0    13    0   12   12    4    9    0     0  100   0   0   0
 15    0   0    0    13    0   12   12    7    9    0     0  100   0   0   0
 16    0   0    0    12    0   11   11    5    7    0     0  100   0   0   0
 17    0   0    0    14    0   13   13    8    7    0     0  100   0   0   0
 18    0   0    0    13    0   12   12    8    9    0     0  100   0   0   0
 19    0   0    1    13    0   12   12    7    8    0     0  100   0   0   0
 20    0   0    7    25    0   35   25   13    9    0     3  100   0   0   0
 21    0   0   72    45    3   78   41   29   11    0    19   99   1   0   0
 22    0   0    3    44    4   71   39   21   11    0   135  100   0   0   0
 23    0   0    3    20    5   16   14    7    9    0     2  100   0   0   0
 24    0   0    3    17    5   13   12    6    9    0     2   99   1   0   0
 25    0   0    1    15    4   10   10    8    8    0     0  100   0   0   0
 26    0   0    8    15    0   17   14    5   10    0     0  100   0   0   0
 27    0   0    0    12    0   11   11    7    8    0     0  100   0   0   0
 28    0   0    1    11    0   10   10    6    8    0     0  100   0   0   0
 29    0   0    2    67    0  126   66   30   11    0    48  100   0   0   0
 30    0   0    0    19    0   25   18   11    8    0   232  100   0   0   0
 31    0   0    1    27    0   47   26   12   10    0    10  100   0   0   0

So 100% CPU in usr mode looks a lot like some kind of infinite loop. I tried to dig into it using dtrace:

profile-1001us
/(execname == "java") && (pid == $1)/
{
        @[jstack()] = count();
}

tick-20s
{
        exit(0);
}

But I got a LOT of these errors:

dtrace: error on enabled probe ID 1 (ID 52070: profile:::profile-1001us): invalid address (0x96a7e000) in action #2

and the stack traces are mostly hex addresses without the names:

  0xf886ca64
  0xf884cec0
  0xf8805d3c
  0xf8805874
  0xf8805c70
  0xf8805c70
  0xf8805d3c
  0xf88380b8
  0xf9509440
  0xf88b7b44
  0xf8805874
  0xf8805874
  0xf8805764
  0xf8805874
  0xf8805874
  0xf8805d3c
  0xf8805874
  0xf8805874
  0xf8805874
  0xf8805874
  0xf8805874
  0xf8805874
  0xf8805764
  0xf8805764
  0xf8805764
  0xf886993c
  0xf9999004
  0xf884cec0
  0xf8805874
  0xf8805d3c
  0xf884cc7c
  0xf994090c
  0xf8838df8
  0xf8800218
  libjvm.so`__1cJJavaCallsLcall_helper6FpnJJavaValue_pnMmethodHandle_pnRJavaCallArguments_pnGThread__v_+0x5a0
  libjvm.so`JVM_DoPrivileged+0x500
  libjava.so`Java_java_security_AccessController_doPrivileged__Ljava_security_PrivilegedExceptionAction_2Ljava_security_AccessControlContext_2+0x14
  0xf880e22c
  0xf880e1d0
  0xf89c6470
  0xf9dfc018
  0xf99ee350
  0xf9d74fd8
  0xf8838df8
  0xf8800218
  libjvm.so`__1cJJavaCallsLcall_helper6FpnJJavaValue_pnMmethodHandle_pnRJavaCallArguments_pnGThread__v_+0x5a0
  libjvm.so`JVM_DoPrivileged+0x500
  libjava.so`Java_java_security_AccessController_doPrivileged__Ljava_security_PrivilegedExceptionAction_2+0x14
  0xf9467cc8
  0xf9badd44
  9852

So my questions are: any ideas on what to do next? I'm pretty sure it's a jboss/application problem, but I need more data to show the jboss devel people. And what's causing those dtrace errors?

Thanks in advance.
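[For reference, a cleaned-up, runnable version of that script, as a sketch: the script name is hypothetical, and the jstackstrsize setting is an assumption -- the default string space can be too small for deep JVM stacks, which is one reason jstack() frames may come out as bare hex.]

#!/usr/sbin/dtrace -s
/*
 * profile-java.d (hypothetical name): sample on-CPU stacks of the
 * java process whose pid is passed as $1, for 20 seconds.
 */

/* Assumed value; raise it if jstack() output is truncated or frames
 * keep printing as raw addresses. */
#pragma D option jstackstrsize=1024

profile-1001us
/execname == "java" && pid == $1/
{
        /* aggregate by mixed Java/native stack */
        @[jstack()] = count();
}

tick-20s
{
        exit(0);
}

Run as ./profile-java.d <jboss-pid>; aggregations print in ascending order, so the hottest stacks appear last when the script exits.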
Adam Leventhal
2008-Nov-26 22:30 UTC
[dtrace-discuss] debugging a faulty jboss/application
Is java running with the -server option on x86? There's a known bug where jstack() can't capture stacks.

On Nov 24, 2008, at 11:57 AM, Fernando Gleiser wrote:
> [original message quoted in full; trimmed]
-- 
Adam Leventhal, Fishworks  http://blogs.sun.com/ahl
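[A quick way to check which VM a running process was started with, as a sketch: the pgrep pattern assumes a single java process on the box, and jinfo availability depends on the JDK version in use.]

# Show the JVM's full argument list; an explicit -server or -client
# flag will appear here:
pargs $(pgrep -x java)

# With a recent-enough JDK, jinfo can report a live process's VM flags:
jinfo -flags <pid>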
William D. Hathaway
2008-Nov-27 02:07 UTC
[dtrace-discuss] debugging a faulty jboss/application
Sorry DTrace isn't working for you here. Try the command-line pstack (Solaris) or jstack (ships with the JDK) tools. I saw a similar situation last year where garbage collection would occasionally go nuts (where "nuts" meant taking several hours to complete a GC cycle). You can use prstat -L to see which java threads are chewing the CPU, then look for those threads in the pstack output, as sketched below.
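[That workflow as a minimal sketch: <pid> and <lwpid> are placeholders; on Solaris the LWP id prstat reports corresponds to the nid= field, printed in hex, in jstack's thread dump.]

# Per-thread microstate accounting; the hottest LWPs sort to the top:
prstat -mLp <pid> 1 1

# Native stack of one hot thread, using pstack's pid/lwpid syntax:
pstack <pid>/<lwpid>

# Full Java-level thread dump; match the hot LWP id against each
# thread's nid= value (hexadecimal) to find its Java stack:
jstack <pid>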
Marcelo Fukushima
2008-Nov-29 00:18 UTC
[dtrace-discuss] debugging a faulty jboss/application
For the stack traces, try running with a different JDK. I was only able to see the stacks properly when I installed JDK 7 (from the OpenJDK builds).

On Wed, Nov 26, 2008 at 8:30 PM, Adam Leventhal <ahl at eng.sun.com> wrote:
> Is java running with the -server option on x86? There's a known bug
> where jstack() can't capture stacks.
> [rest of quoted thread trimmed]
-- 
[]'s
Marcelo Takeshi Fukushima