Hi,

I really need help.

How can I identify why my process is 100% in LCK and SLP?

   PID USERNAME  USR SYS TRP TFL DFL LCK SLP LAT VCX ICX SCL SIG PROCESS/LWPID
  6769 root      0.0 0.0 0.0 0.0 0.0 0.0 100 0.0  30   0  18   0 java/28
  6769 root      0.0 0.0 0.0 0.0 0.0 0.0 100 0.0   2   0   1   0 java/94
  6769 root      0.0 0.0 0.0 0.0 0.0 0.0 100 0.0   3   0   1   0 java/93
  6769 root      0.0 0.0 0.0 0.0 0.0 0.0 100 0.0   1   0   0   0 java/100
  6769 root      0.0 0.0 0.0 0.0 0.0 100 0.0 0.0   1   0   1   0 java/18
  6769 root      0.0 0.0 0.0 0.0 0.0 100 0.0 0.0   0   0   0   0 java/49
  6769 root      0.0 0.0 0.0 0.0 0.0 100 0.0 0.0   0   0   0   0 java/47
  6769 root      0.0 0.0 0.0 0.0 0.0 100 0.0 0.0   0   0   0   0 java/48
  6769 root      0.0 0.0 0.0 0.0 0.0 0.0 100 0.0   0   0   0   0 java/32
  6769 root      0.0 0.0 0.0 0.0 0.0 0.0 100 0.0   0   0   0   0 java/31
  6769 root      0.0 0.0 0.0 0.0 0.0 0.0 100 0.0   0   0   0   0 java/30
  6769 root      0.0 0.0 0.0 0.0 0.0 0.0 100 0.0   0   0   0   0 java/29
  6769 root      0.0 0.0 0.0 0.0 0.0 0.0 100 0.0   0   0   0   0 java/27
  6769 root      0.0 0.0 0.0 0.0 0.0 100 0.0 0.0   0   0   0   0 java/26
  6769 root      0.0 0.0 0.0 0.0 0.0 100 0.0 0.0   0   0   0   0 java/25
  6769 root      0.0 0.0 0.0 0.0 0.0 100 0.0 0.0   0   0   0   0 java/24
  6769 root      0.0 0.0 0.0 0.0 0.0 100 0.0 0.0   0   0   0   0 java/23

Regards,

------------------------------------------------------------------

Kleyson Rios.
Gerência de Suporte Técnico
Analista de Suporte / Líder de Equipe

Governo do Estado de Goiás
Agência de Administração
Diretoria de Informática
+55 62 3201-6582

"Do you know the difference between successful people and unsuccessful people? Successful people are willing to do the things that unsuccessful people are not."
Kleyson Rios wrote:
> How can I identify why my process is 100% in LCK and SLP?

from the prstat man-page:

  LCK  The percentage of time the process has spent waiting for user locks.

  SLP  The percentage of time the process has spent sleeping.

it looks like your processes have deadlocked themselves - not much one can
do without knowledge about what they're actually doing and access to the
source code. I'd talk to the people developing this app.

Michael
--
Michael Schuster                  http://blogs.sun.com/recursion
Recursion, n.: see 'Recursion'
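For reference, per-LWP microstate output like the table above comes from prstat's microstate accounting mode; a minimal invocation, assuming the same PID 6769 and a 5-second sampling interval, would be:

  prstat -mLp 6769 5

The LCK and SLP columns are percentages of each sampling interval, so a thread that did nothing at all during the interval shows 100 in one of them.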
Michael Schuster wrote:
> it looks like your processes have deadlocked themselves - not much one can
> do without knowledge about what they're actually doing and access to the
> source code. I'd talk to the people developing this app.

If it is a deadlock, and if it is a recent Java virtual machine, then
you should be able to connect to it with jconsole (from the same directory
where you get your java command), go to the "Threads" tab and use the
"Detect Deadlock" button to get a thread stack dump showing where you
have a cycle among your Java-level locks.

			... peter
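If there is no GUI available for jconsole, a command-line check is also possible; this is only a sketch, assuming a JDK 5/6 jstack sitting in the same bin directory as the java command (the path below is hypothetical) and the PID from the prstat output:

  /usr/jdk/latest/bin/jstack 6769 > /var/tmp/threads.6769.txt
  grep -i deadlock /var/tmp/threads.6769.txt

HotSpot thread dumps include built-in deadlock detection and print a "Found one Java-level deadlock" section when there is a cycle; kill -QUIT 6769 produces the same dump on the JVM's stdout/stderr.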
It could very well be that your process is waiting on condition variables and
this behavior is normal. Please see the attached script as a place to start.
A pstack of the process would also be informative.

rick

On Wed, Apr 16, 2008 at 12:35:53PM -0300, Kleyson Rios wrote:
> How can I identify why my process is 100% in LCK and SLP?

--
Rickey C. Weisner
Software Development and Performance Specialist
Sun Microsystems, INC
cell phone: 615-308-1147
email: rick.weisner at sun.com
-------------- next part --------------
#!/usr/sbin/dtrace -s

sched:::off-cpu
/ arg1 != 0 && pid == $1 && curlwpsinfo->pr_state == SSLEEP/
{
        /*
         * We're sleeping.  Track our sobj type.
         */
        self->sobj = curlwpsinfo->pr_stype;
        self->bedtime = timestamp;
}

sched:::off-cpu
/ arg1 != 0 && pid == $1 && curlwpsinfo->pr_state == SRUN/
{
        self->bedtime = timestamp;
        /* @s[pid,ustack(5)] = count(); */
}

sched:::on-cpu
/self->bedtime && !self->sobj /
{
        @[execname, pid, "preempted"] = sum(timestamp - self->bedtime);
        /* @c[execname, pid, "preempted"] = count(); */
        self->bedtime = 0;
}

sched:::on-cpu
/self->sobj /
{
        @[execname, pid,
            self->sobj == SOBJ_MUTEX ? "kernel-level lock" :
            self->sobj == SOBJ_RWLOCK ? "rwlock" :
            self->sobj == SOBJ_CV ? "condition variable" :
            self->sobj == SOBJ_SEMA ? "semaphore" :
            self->sobj == SOBJ_USER ? "user-level lock" :
            self->sobj == SOBJ_USER_PI ? "user-level prio-inheriting lock" :
            self->sobj == SOBJ_SHUTTLE ? "shuttle" : "unknown"] =
            sum(timestamp - self->bedtime);
        /*
        @c[execname, pid,
            self->sobj == SOBJ_MUTEX ? "kernel-level lock" :
            self->sobj == SOBJ_RWLOCK ? "rwlock" :
            self->sobj == SOBJ_CV ? "condition variable" :
            self->sobj == SOBJ_SEMA ? "semaphore" :
            self->sobj == SOBJ_USER ? "user-level lock" :
            self->sobj == SOBJ_USER_PI ? "user-level prio-inheriting lock" :
            self->sobj == SOBJ_SHUTTLE ? "shuttle" : "unknown"] = count();
        */
        self->sobj = 0;
        self->bedtime = 0;
}

tick-10sec
{
        exit(0);
}
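A usage sketch for the script above, assuming it is saved as sleepers.d (the file name is arbitrary); the script takes the target PID as $1 and exits on its own after 10 seconds via the tick-10sec probe:

  chmod +x sleepers.d
  ./sleepers.d 6769          # or: dtrace -s sleepers.d 6769

The aggregation prints, per synchronization-object type (condition variable, user-level lock, and so on), the total nanoseconds the process's threads spent off-CPU waiting on that object type during the sampling window.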
Peter B. Kessler's email at 4/16/08 2:01 PM, said:
> If it is a deadlock, and if it is a recent Java virtual machine, then
> you should be able to connect to it with jconsole (from the same directory
> where you get your java command), go to the "Threads" tab and use the
> "Detect Deadlock" button to get a thread stack dump showing where you
> have a cycle among your Java-level locks.

*Highly* unlikely it is a deadlock. Insanely, Solaris charges time in
condition wait ([pthread_]cond_wait or lwp_cond_wait) as lock time. This
has never made sense to me, and I have yet to wield a big enough stick to
get it fixed.

Take either a Java thread dump or a pstack(1) dump. If a Java dump, you
will be in Object.wait (or similar) or sleep(). If a pstack(1), you will
be in some form of cond_wait.

If it really is a deadlock (and I'll buy you a beer if it is because I am
sure it isn't), Peter's suggestion is the way to go.

As for sleep, the threads are tired. Long day of garbage collection I
suppose. Seriously though, they have no (or very little) work to do since
the top thread is 100% sleep but made 18 system calls (rounding error).

Thanks,

Jarod
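A quick way to check Jarod's point from the shell; just a sketch, assuming PID 6769 (the exact libc frame names vary a little between Solaris releases):

  pstack 6769 | grep -c cond_wait      # LWPs parked on condition variables
  pstack 6769 | grep -c mutex_lock     # LWPs actually blocked on mutexes

If the first number is large and the second is near zero, the LCK time is condition-variable waiting rather than lock contention.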
Thanks everybody. I will try to do what you guys said. When I get new issues I will post again.

Regards,

Kleyson Rios.
przemolicc at poczta.fm
2008-Apr-18 09:49 UTC
[dtrace-discuss] Process in LCK / SLP (Please)
On Wed, Apr 16, 2008 at 12:35:53PM -0300, Kleyson Rios wrote:
> How can I identify why my process is 100% in LCK and SLP?

Could you please use 'plockstat' and show us the output? E.g.:

  plockstat -e 30 -s 10 -A -p <pid of java proc>

Regards
przemol
--
http://przemol.blogspot.com/
Hi przemol,
Below is the output of plockstat for malloc and for libumem. Both show many locks.
Why didn't I get fewer locks after changing to libumem?
********** Plockstat using malloc (many many locks):
Mutex hold
Count nsec Lock Caller
-------------------------------------------------------------------------------
 4036    14391 libc.so.1`libc_malloc_lock   libjava.so`JNU_GetStringPlatformChars+0x290
 2118     7385 libc.so.1`libc_malloc_lock   libjava.so`JNU_ReleaseStringPlatformChars+0x18
 3174     4700 libc.so.1`libc_malloc_lock   libjvm.so`__1cCosGmalloc6FI_pv_+0x29
  181    63407 libc.so.1`_uberdata+0x40     libjvm.so`__1cCosRpd_suspend_thread6FpnGThread_i_i_+0x46
 3170     3407 libc.so.1`libc_malloc_lock   libjvm.so`__1cCosEfree6Fpv_v_+0x18
  588    12443 libc.so.1`libc_malloc_lock   libnet.so`Java_java_net_SocketOutputStream_socketWrite0+0x7f
  172    37572 libc.so.1`_uberdata+0x40     libjvm.so`__1cCosQpd_resume_thread6FpnGThread__i_+0x1f
  176    26701 libc.so.1`libc_malloc_lock   libjvm.so`__1cCosGmalloc6FI_pv_+0x29
  596     7124 libc.so.1`libc_malloc_lock   libnet.so`Java_java_net_SocketOutputStream_socketWrite0+0x1e0
  450     7254 libc.so.1`libc_malloc_lock   0x858b2167
(...)
********** Plockstat using libumem (many many locks too):
Mutex hold
Count nsec Lock Caller
-------------------------------------------------------------------------------
    4  1450455 libumem.so.1`umem_cache_lock libumem.so.1`umem_cache_applyall+0x51
  100    46388 libc.so.1`_uberdata+0x40     libjvm.so`__1cCosRpd_suspend_thread6FpnGThread_i_i_+0x46
  100    23226 libc.so.1`_uberdata+0x40     libjvm.so`__1cCosQpd_resume_thread6FpnGThread__i_+0x1f
  486     4314 0x807b680                    libumem.so.1`umem_cache_alloc+0xc1
  488     4236 0x807b680                    libumem.so.1`umem_cache_free+0x194
  356     4859 0x807b300                    libumem.so.1`umem_cache_alloc+0xc1
  150    11499 0x8073030                    libumem.so.1`vmem_xfree+0xfe
  115    14473 0x8073030                    libumem.so.1`vmem_alloc+0x13c
  297     5374 0x807ad80                    libumem.so.1`umem_cache_alloc+0xc1
  362     4258 0x807b300                    libumem.so.1`umem_cache_free+0x194
  297     4304 0x807ad80                    libumem.so.1`umem_cache_free+0x194
   48    21805 0x807b6c0                    libumem.so.1`umem_cache_free+0x194
  150     6635 libumem.so.1`vmem0+0x30      libumem.so.1`vmem_alloc+0x126
(...)
Mutex hold
-------------------------------------------------------------------------------
Count     nsec Lock                         Caller
  543    13110 0x807b680                    libumem.so.1`umem_cache_alloc+0xc1

      nsec ---- Time Distribution --- count Stack
      8192 |@@@@@@@@@@@@             |   289 libumem.so.1`umem_cache_alloc+0xc1
     16384 |@@@@@@@@@                |   224 libumem.so.1`umem_alloc+0x3f
     32768 |@                        |    27 libumem.so.1`malloc+0x23
     65536 |                         |     3 libjava.so`JNU_GetStringPlatformChars+0x290
                                             0x20ac
-------------------------------------------------------------------------------
Count     nsec Lock                         Caller
   78    89901 libc.so.1`_uberdata+0x40     libjvm.so`__1cCosRpd_suspend_thread6FpnGThread_i_i_+0x46

      nsec ---- Time Distribution --- count Stack
     65536 |@@@@@@@@@@@@@@@@@        |    57 libc.so.1`fork_lock_exit+0x2f
    131072 |@@@@@                    |    19 libc.so.1`_thrp_suspend+0x22c
    262144 |                         |     1 libc.so.1`thr_suspend+0x1a
    524288 |                         |     1 libjvm.so`__1cCosRpd_suspend_thread6FpnGThread_i_i_+0x46
                                             libjvm.so`__1cGThreadNdo_vm_suspend6M_i_+0x42
                                             libjvm.so`__1cGThreadKvm_suspend6M_i_+0x33
                                             libjvm.so`__1cUThreadSafepointStateXexamine_state_of_thread6Mi_v_+0x101
                                             libjvm.so`__1cUSafepointSynchronizeFbegin6F_v_+0x12b
                                             libjvm.so`__1cIVMThreadEloop6M_v_+0x1b4
(...)
Regards.
Kleyson Rios.
-----Original Message-----
From: dtrace-discuss-bounces at opensolaris.org [mailto:dtrace-discuss-bounces at opensolaris.org] On behalf of przemolicc at poczta.fm
Sent: Friday, April 18, 2008 06:49
To: dtrace-discuss at opensolaris.org
Subject: Re: [dtrace-discuss] Process in LCK / SLP (Please)

Could you please use 'plockstat' and show us the output?
E.g.: plockstat -e 30 -s 10 -A -p <pid of java proc>
We had such a case in the past. I don't think it is the same reason, but it
might give you some ideas.
We had a lot of locks. By using plockstat it turned out that those locks
came from malloc. Using mtmalloc or libumem reduced the number of locks, but
it was still a huge number (and eating our CPU). By using dtrace it turned
out that those locks came from threads which were ... supposed to be
killed. Our programmers found a bug in the application (the already unnecessary
threads were still alive and all of them were going in an endless loop),
changed the code and ... all locks went away.

As Jim said: start with watching who is using mallocs (which thread(s)).
Regards
przemol
--
http://przemol.blogspot.com/
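A minimal sketch of the "watch which threads call malloc" check suggested above, assuming the allocator of interest is still libc's malloc (adjust the module name if libumem or mtmalloc is interposed); tid is the calling LWP id, which maps back to the LWPID column of prstat -mL:

  dtrace -n 'pid$target:libc:malloc:entry { @calls[tid] = count(); }' -p <pid of java proc>

Each output line is an LWP id and the number of malloc() calls it made while the probe was enabled.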
On Apr 18, 2008, at 1:03 PM, Kleyson Rios wrote:
> Hi przemol,
>
> Below is the output of plockstat for malloc and for libumem. Both show many locks.
> Why didn't I get fewer locks after changing to libumem?

You're looking at Mutex hold statistics, which don't mean a lot
(unless contention is caused by long hold times).

The important thing for multi-threaded performance is *contention*.
(Spinning and blocking)  Those are the statistics you should be
looking at.

Both malloc and libumem use locks to protect their state; libumem
just uses many locks, in order to reduce contention.

Cheers,
- jonathan

--------------------------------------------------------------------------
Jonathan Adams, Sun Microsystems, ZFS Team       http://blogs.sun.com/jwadams
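To look at contention rather than hold events, plockstat can be told to record only contention; a sketch reusing the 30-second window from the earlier suggestion:

  plockstat -C -e 30 -p <pid of java proc>    # -C = contention events (spins and blocks) only

If this stays nearly empty while prstat still reports high LCK, the LCK time is most likely condition-variable waiting rather than lock contention.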
przemolicc at poczta.fm
2008-Apr-21 08:37 UTC
[dtrace-discuss] RES: Process in LCK / SLP (Please)
On Fri, Apr 18, 2008 at 02:40:15PM -0400, Jonathan Adams wrote:
> You're looking at Mutex hold statistics, which don't mean a lot
> (unless contention is caused by long hold times).
>
> The important thing for multi-threaded performance is *contention*.
> (Spinning and blocking)  Those are the statistics you should be
> looking at.
>
> Both malloc and libumem use locks to protect their state; libumem
> just uses many locks, in order to reduce contention.

So it seems, Kleyson, that I was wrong and Rickey C. Weisner's script is the way to go.

Regards
przemol
--
http://przemol.blogspot.com/
Kleyson Rios
2008-Apr-22 17:36 UTC
[dtrace-discuss] RES: RES: Process in LCK / SLP (Please)
OK Jonathan,

I understand.

So, looking in the right place now, I can see few locks and sometimes no locks
(just Mutex Hold). But I still have many threads at 100% LCK.

If I don't have a lot of lock contention, where is my problem?

Running Rickey C. Weisner's script I get:
(...)
25736
libc.so.1`_so_send+0x15
libjvm.so`__1cDhpiEsend6Fipcii_i_+0x67
libjvm.so`JVM_Send+0x32
libnet.so`Java_java_net_SocketOutputStream_socketWrite0+0x131
0xc3c098d3
10
25736
0xc3d2a33a
14
25736
libc.so.1`_write+0x15
libjvm.so`__1cDhpiFwrite6FipkvI_I_+0x5d
libjvm.so`JVM_Write+0x30
libjava.so`0xc8f7c04b
16
25736
libc.so.1`stat64+0x15
21
25736
libc.so.1`_write+0x15
libjvm.so`__1cDhpiFwrite6FipkvI_I_+0x5d
libjvm.so`JVM_Write+0x30
libjava.so`0xc8f80ce9
76
java 25736 kernel-level lock 1
java 25736 shuttle 6
java 25736 preempted 7
java 25736 user-level lock 511
java 25736 condition variable 748
Regards,

------------------------------------------------------------------

Kleyson Rios.
Gerência de Suporte Técnico
Analista de Suporte / Líder de Equipe
You may want to cross-post to a Java alias, but I've been down this
road before.

Java will call into malloc() for buffers for network reads and writes
that are larger than 2k bytes (the 2k is from memory, and I think it
was a 1.5 JVM). A large number of malloc calls,
and resulting contention on locks in the library, are due to the application
doing network writes of larger than 2k.

Newer JVMs (1.6) may improve this, but I'm not sure. There's also
an alternative set of classes and methods, NIO, which also can
help (although I've heard tell that NIO brings other problems along
with it, but I can not speak from personal experience).

At this point, I think you need to consult with Java experts to determine
what options you have for buffer allocation for network IO from the
Java heap, versus the current behavior of the JVM dropping back
to malloc for allocating buffers for network IO.

The other option of course is determining if the code can be changed to
use buffers smaller than 2k.

Thanks,
/jim

Kleyson Rios wrote:
> So, looking in the right place now, I can see few locks and sometimes no
> locks (just Mutex Hold). But I still have many threads at 100% LCK.
>
> If I don't have a lot of lock contention, where is my problem?
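One way to check the >2k hypothesis before changing any code is to look at the size distribution of the JVM's native allocations; this is only a sketch, assuming the process is still using libc's malloc (change the module name if libumem is interposed):

  # arg0 to malloc() is the requested size in bytes
  dtrace -n 'pid$target:libc:malloc:entry { @["malloc size (bytes)"] = quantize(arg0); }' -p [PID_OF_JVM]

A distribution clustered above 2048 bytes would be consistent with the network-buffer path described above.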
Hi Jim,

But if the problem is malloc buffers for network I/O, shouldn't I see those
locks when running plockstat?

Regards,

------------------------------------------------------------------

Kleyson Rios.
D'oh! My bad. My brain was thinking malloc and lock contention,
I saw the network calls on the stack, connected the dots and started
typing. I should have looked at the top of the stack. Sorry.
Let's back up a minute. You have a Java application and the JVM has a
large number of threads in LCK time (user lock), right?
plockstat showed a lot of lock activity in libc malloc, at which point
you tried mtmalloc and libumem, neither of which made much of
a difference, right?
We know empirically (and heard from Jarod) that LCK can be deceiving,
because threads that call cond_wait(3C) to wait on a condition variable
will be charged with LCK time, but it's really sleep time most of the time.
So it is sometimes an artifact of the application design that threads show
up at 100% LCK time - threads are waiting for a cond_signal() (or cond_broadcast())
to be put to work, but there's no work to do. Anyway...

The problem I thought we were chasing was lock activity in malloc.
If you have more recent data (plockstat, prstat) please pass it to me (sorry
I missed it), and I'll get back in sync on the current problem.
I would add that there's nothing here that I think belongs on dtrace-discuss.
I would start posting to the performance group and a Java performance
alias. Also, and I'm sorry if I missed this, how well (or poorly) is the
Java application performing, and what metric do we have to determine
application performance?
Ricky's script grabs a user stack when a thread goes off CPU to sleep, and
tallies what the threads are sleeping on. It's mostly condition variables, which
doesn't really help us much (everything sleeps on CVs...mostly). There's a
lot of user locks in there as well. So...
The CVs look like threads blocking on writes, including writes to a
network socket.
You need to figure out the write target, and go from there.
Get the PID of the target JVM -

dtrace -n 'syscall::write:entry / pid == $target / { @[fds[arg0].fi_pathname] = count(); }' -p [PID_OF_JVM]

The above will list the files being written to. The next step depends on
what we see.
The same goes for the user locks. If they are indeed malloc locks (which
I think they are), I suspect that will come back around to network IOs
larger than 2k. Try this:

dtrace -n 'pid$target:libc:malloc:entry { @j[jstack()] = count(); }' -p [PID_OF_JVM]
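If that last one-liner turns out to be too noisy, a possible refinement is to keep only the large allocations; the 2048-byte cutoff and the 20-frame stack depth below are assumptions, not documented JVM behavior:

  dtrace -n 'pid$target:libc:malloc:entry /arg0 > 2048/ { @j[jstack(20)] = count(); }' -p [PID_OF_JVM]

The Java frames in the jstack() output show which application code paths drive the large native allocations.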
Thanks,
/jim
OK. Let's go.

Now I have about 100 simultaneous users. The box has 4 CPUs and it's running
JBoss.

My application is very slow. Looking at prstat I saw that I have many
threads, and most of the time only one or two of them are running at the
same time. Using the -Lm prstat options I saw that most of the time the
threads are in the LCK/SLP state.

Using plockstat I saw many locks for malloc, so I changed to libumem. I no
longer have contention locks (spin, block), just many Mutex hold events.
But the problem still exists. I have 7 boxes running JBoss, and all of them
have the same problem.

Question: Is there any difference in the output when running prstat with a
1-second interval versus a 5-second interval? Because running prstat for
1 second I get threads at 100% LCK.
*******************************************************************
Prstat (now); I got a little less LCK after changing to libumem:
[root at aga253distp156:/] # prstat -Lmp 25736
   PID USERNAME USR SYS TRP TFL DFL LCK SLP LAT VCX ICX SCL SIG PROCESS/LWPID
25736 root 57 0.0 0.0 0.0 0.0 43 0.0 0.1 36 83 76 0 java/2
25736 root 0.6 0.5 0.0 0.0 0.0 16 78 0.0 32 0 1K 0 java/7211
25736 root 0.4 0.6 0.0 0.0 0.0 29 70 0.0 19 5 1K 0 java/7237
25736 root 0.4 0.6 0.0 0.0 0.0 27 61 0.2 25 15 1K 0 java/7364
25736 root 0.4 0.5 0.0 0.0 0.0 20 80 0.0 19 8 1K 0 java/7218
25736 root 0.3 0.4 0.0 0.0 0.0 29 70 0.0 18 5 1K 0 java/7204
25736 root 0.3 0.4 0.0 0.0 0.0 16 73 0.0 23 6 864 0 java/7374
25736 root 0.2 0.4 0.0 0.0 0.0 20 79 0.0 9 11 1K 0 java/7305
25736 root 0.3 0.4 0.0 0.0 0.0 23 76 0.0 7 9 900 0 java/7475
25736 root 0.4 0.3 0.0 0.0 0.0 11 88 0.0 15 1 553 0 java/7236
25736 root 0.3 0.3 0.0 0.0 0.0 0.8 93 0.0 17 2 734 0 java/7370
25736 root 0.1 0.3 0.0 0.0 0.0 8.7 91 0.0 7 2 773 0 java/7235
25736 root 0.2 0.2 0.0 0.0 0.0 9.5 90 0.0 8 0 316 0 java/7219
25736 root 0.2 0.1 0.0 0.0 0.0 9.8 84 0.0 17 11 306 0 java/7269
25736 root 0.1 0.1 0.0 0.0 0.0 0.1 100 0.0 2 1 419 0 java/7471
25736 root 0.2 0.0 0.0 0.0 0.0 8.0 92 0.0 14 3 68 0 java/7264
25736 root 0.1 0.1 0.0 0.0 0.0 10 89 0.0 6 0 327 0 java/7470
25736 root 0.1 0.1 0.0 0.0 0.0 0.0 100 0.0 10 0 302 0 java/7365
25736 root 0.2 0.0 0.0 0.0 0.0 6.4 88 0.1 14 5 49 0 java/7317
25736 root 0.1 0.1 0.0 0.0 0.0 93 7.0 0.1 4 4 209 0 java/7367
25736 root 0.1 0.0 0.0 0.0 0.0 4.0 91 0.0 10 1 55 0 java/7229
25736 root 0.1 0.0 0.0 0.0 0.0 3.9 96 0.0 11 1 57 0 java/7304
25736 root 0.1 0.1 0.0 0.0 0.0 8.6 91 0.0 4 1 196 0 java/14
25736 root 0.0 0.1 0.0 0.0 0.0 0.0 100 0.0 1 1 263 0 java/7297
25736 root 0.1 0.1 0.0 0.0 0.0 0.0 100 0.1 154 0 86 0 java/10
25736 root 0.1 0.0 0.0 0.0 0.0 0.6 99 0.0 9 0 55 0 java/7377
25736 root 0.1 0.0 0.0 0.0 0.0 100 0.0 0.0 12 4 54 0 java/4
25736 root 0.0 0.0 0.0 0.0 0.0 100 0.0 0.0 16 0 40 0 java/3
25736 root 0.0 0.0 0.0 0.0 0.0 3.6 96 0.0 2 0 4 0 java/7481
25736 root 0.0 0.0 0.0 0.0 0.0 100 0.0 0.0 8 0 12 0 java/170
25736 root 0.0 0.0 0.0 0.0 0.0 100 0.0 0.0 12 0 12 0 java/231
25736 root 0.0 0.0 0.0 0.0 0.0 100 0.0 0.0 19 4 25 0 java/171
25736 root 0.0 0.0 0.0 0.0 0.0 0.0 100 0.0 1 0 2 0 java/7221
25736 root 0.0 0.0 0.0 0.0 0.0 95 0.0 0.1 20 0 20 0 java/167
25736 root 0.0 0.0 0.0 0.0 0.0 0.0 100 0.0 1 0 2 0 java/7378
Total: 1 processes, 256 lwps, load averages: 0.74, 0.68, 0.62
******************************************************************
Plockstat (now):
[root at aga253distp213:/dtrace] # plockstat -A -e 30 -p 25736
0
Mutex hold
Count nsec Lock Caller
----------------------------------------------------------------------------
---
3546 4765 0x8079580 libumem.so.1`umem_cache_alloc+0xc1
3509 3865 0x8079580 libumem.so.1`umem_cache_free+0x194
328 12542 0x8073030 libumem.so.1`vmem_xfree+0xfe
3 1340378 libumem.so.1`umem_cache_lock
libumem.so.1`umem_cache_applyall+0x51
261 13414 0x8073030 libumem.so.1`vmem_alloc+0x13c
471 5696 0x807b680
libumem.so.1`umem_cache_alloc+0xc1
475 5441 0x807b680
libumem.so.1`umem_cache_free+0x194
123 20767 0x807b6c0
libumem.so.1`umem_cache_free+0x194
28 90103 0x807a1c0
libumem.so.1`umem_cache_free+0x194
329 6788 libumem.so.1`vmem0+0x30 libumem.so.1`vmem_alloc+0x126
309 6900 0x807ad80
libumem.so.1`umem_cache_alloc+0xc1
329 6310 libumem.so.1`vmem0+0x4a4 libumem.so.1`vmem_xalloc+0x43b
328 6248 libumem.so.1`vmem0+0x30 libumem.so.1`vmem_xfree+0x127
355 5727 0x807b300
libumem.so.1`umem_cache_alloc+0xc1
216 9080 0x807b940
libumem.so.1`umem_cache_alloc+0xc1
355 5338 0x807b300
libumem.so.1`umem_cache_free+0x194
328 5393 libumem.so.1`vmem0+0x4a4 libumem.so.1`vmem_xfree+0xfe
310 5308 0x807ad80
libumem.so.1`umem_cache_free+0x194
329 4921 0x8073030 libumem.so.1`vmem_xalloc+0x43b
154 10299 0x807b9c0
libumem.so.1`umem_cache_alloc+0xc1
216 6043 0x807b940
libumem.so.1`umem_cache_free+0x194
329 3960 0x8073030 libumem.so.1`vmem_xalloc+0x315
161 7968 0x807b980
libumem.so.1`umem_cache_alloc+0xc1
235 5438 0x807b500
libumem.so.1`umem_cache_alloc+0xc1
329 3784 libumem.so.1`vmem0+0x4a4 libumem.so.1`vmem_alloc+0x13c
241 5029 0x807b500
libumem.so.1`umem_cache_free+0x194
85 13906 0x8082c80
libumem.so.1`umem_cache_alloc+0xc1
329 3577 libumem.so.1`vmem0+0x4a4 libumem.so.1`vmem_xalloc+0x315
39 29926 0x8079980
libumem.so.1`umem_cache_free+0x194
130 8876 0x807ba00
libumem.so.1`umem_cache_alloc+0xc1
34 33388 0xc8e149c0 libc.so.1`thr_suspend+0x1a
155 7304 0x807ba40
libumem.so.1`umem_cache_alloc+0xc1
13 86856 0x807ac80
libumem.so.1`umem_cache_alloc+0xc1
94 11979 0x80828c0
libumem.so.1`umem_cache_alloc+0xc1
143 7742 0x807b5c0
libumem.so.1`umem_cache_alloc+0xc1
153 5759 0x807b5c0
libumem.so.1`umem_cache_free+0x194
98 8816 0x807b540
libumem.so.1`umem_cache_alloc+0xc1
115 7499 0x807b8c0
libumem.so.1`umem_cache_alloc+0xc1
26 32518 0xc8e12100 libc.so.1`thr_suspend+0x1a
158 5210 0x807b640
libumem.so.1`umem_cache_free+0x194
95 8504 0x80828c0
libumem.so.1`umem_cache_free+0x194
155 5199 0x807ba40
libumem.so.1`umem_cache_free+0x194
8 97711 0xc8e145c0 libc.so.1`thr_suspend+0x1a
130 5915 0x807ba00
libumem.so.1`umem_cache_free+0x194
41 18736 0x807b2c0
libumem.so.1`umem_cache_alloc+0xc1
101 7524 0x8082940
libumem.so.1`umem_cache_alloc+0xc1
33 22842 0x807d640
libumem.so.1`umem_cache_free+0x194
35 21064 0x807acc0
libumem.so.1`umem_cache_alloc+0xc1
16 45814 0xc8e12900 libc.so.1`thr_suspend+0x1a
129 5641 0x807b880
libumem.so.1`umem_cache_free+0x194
5 142887 0xc8e13100 libc.so.1`thr_suspend+0x1a
43 16584 0x807b240
libumem.so.1`umem_cache_alloc+0xc1
21 33838 0xc8e12b80 libc.so.1`thr_suspend+0x1a
101 7012 0x8082940
libumem.so.1`umem_cache_free+0x194
8 42677 0xc8e13f00 libc.so.1`thr_suspend+0x1a
7 48677 0xc8e12940 libc.so.1`thr_suspend+0x1a
9 37763 0xc8e13080 libc.so.1`thr_suspend+0x1a
30 11317 0x8079880
libumem.so.1`umem_cache_alloc+0xc1
61 5553 0x807fcc0
libumem.so.1`umem_cache_alloc+0xc1
7 38522 0xc8e14240 libc.so.1`thr_suspend+0x1a
47 5664 0x8079cc0
26 9348 0x807a280
libumem.so.1`umem_cache_alloc+0xc1
6 39964 0xc8e14480 libc.so.1`thr_suspend+0x1a
26 9198 0x807bc40
libumem.so.1`umem_cache_alloc+0xc1
5 47600 0xc8e15440 libc.so.1`thr_suspend+0x1a
30 7853 0x807a240
libumem.so.1`umem_cache_alloc+0xc1
14 16771 0x80824b4
libumem.so.1`umem_depot_alloc+0xa4
3 3747 0x807f4b4
libumem.so.1`umem_cache_update+0xe5
3 3743 0x807d134
libumem.so.1`umem_cache_update+0xe5
(...)
Mutex block
Count nsec Lock Caller
----------------------------------------------------------------------------
---
1 1624523 0x8073030 libumem.so.1`vmem_xfree+0x24
1 23803 0x80825c0
libumem.so.1`umem_cache_free+0x51
1 18619 0x8073030 libumem.so.1`vmem_xfree+0x24
1 15304 0x807b5c0
libumem.so.1`umem_cache_free+0x51
******************************************************************
# time dtrace -n 'syscall::write:entry / pid == $target / { @[fds[arg0].fi_pathname] = count(); }' -p 25736
dtrace: description 'syscall::write:entry ' matched 1 probe
^C
/zonas/srvintra/root/var/tmp/imageio158908.tmp 16
/zonas/srvintra/root/var/tmp/imageio158909.tmp 16
/zonas/srvintra/root/var/tmp/imageio158910.tmp 16
/zonas/srvintra/root/var/tmp/imageio158911.tmp 16
/zonas/srvintra/root/var/tmp/imageio158912.tmp 16
/zonas/srvintra/root/var/tmp/imageio158913.tmp 16
/zonas/srvintra/root/var/tmp/imageio158914.tmp 16
/zonas/srvintra/root/var/tmp/imageio158915.tmp 16
/zonas/srvintra/root/var/tmp/imageio158916.tmp 16
/zonas/srvintra/root/var/tmp/imageio158917.tmp 16
/zonas/srvintra/root/var/tmp/imageio158918.tmp 16
/zonas/srvintra/root/var/tmp/imageio158919.tmp 16
/zonas/srvintra/root/var/tmp/imageio158920.tmp 16
/zonas/srvintra/root/var/tmp/imageio158921.tmp 16
/zonas/srvintra/root/var/tmp/imageio158922.tmp 16
/zonas/srvintra/root/var/tmp/imageio158923.tmp 16
/zonas/srvintra/root/var/tmp/imageio158924.tmp 16
/zonas/srvintra/root/var/tmp/imageio158925.tmp 16
/zonas/srvintra/root/var/tmp/imageio158926.tmp 16
/zonas/srvintra/root/var/tmp/imageio158927.tmp 16
/zonas/srvintra/root/var/tmp/imageio158928.tmp 16
/zonas/srvintra/root/var/tmp/imageio158929.tmp 16
/zonas/srvintra/root/var/tmp/imageio158930.tmp 16
/zonas/srvintra/root/var/tmp/imageio158931.tmp 16
/zonas/srvintra/root/var/tmp/imageio158932.tmp 16
/zonas/srvintra/root/var/tmp/imageio158933.tmp 16
/zonas/srvintra/root/var/tmp/imageio158934.tmp 16
/zonas/srvintra/root/var/tmp/imageio158935.tmp 16
/zonas/srvintra/root/var/tmp/imageio158936.tmp 16
/zonas/srvintra/root/var/tmp/imageio158937.tmp 16
/zonas/srvintra/root/var/tmp/imageio158938.tmp 16
/zonas/srvintra/root/var/tmp/imageio158939.tmp 16
/zonas/srvintra/root/var/tmp/imageio158940.tmp 16
/zonas/srvintra/root/var/tmp/imageio158941.tmp 16
/zonas/srvintra/root/var/tmp/imageio158942.tmp 16
/zonas/srvintra/root/var/tmp/imageio158943.tmp 16
/zonas/srvintra/root/var/tmp/imageio158944.tmp 16
/zonas/srvintra/root/var/tmp/imageio158945.tmp 16
/zonas/srvintra/root/var/tmp/imageio158946.tmp 16
/zonas/srvintra/root/var/tmp/imageio158947.tmp 16
/zonas/srvintra/root/opt/jboss-3.2.3_jetty-4.2.17/server/default/log/2008_04
_23.controller.csv 247
/zonas/srvintra/root/opt/jboss-3.2.3_jetty-4.2.17/server/default/deploy/detr
an.war/WEB-INF/control-detran.log 784
/zonas/srvintra/root/opt/jboss-3.2.3_jetty-4.2.17/server/default/log/2008_04
_23.request.log 89872
real 0m46.632s
user 0m0.535s
sys 0m0.412s
*************************************************************
# time dtrace -n 'pid$target:libc:malloc:entry { @j[jstack()] = count(); }' -p 5736
dtrace: description 'pid$target:libc:malloc:entry ' matched 1 probe
^C
real 0m52.971s
user 0m0.581s
sys 0m0.452s
Regards,
------------------------------------------------------------------
Kleyson Rios.
-----Mensagem original-----
De: James.Mauro at Sun.COM [mailto:James.Mauro at Sun.COM]
Enviada em: quarta-feira, 23 de abril de 2008 13:08
Para: Kleyson Rios
Cc: dtrace-discuss at opensolaris.org
Assunto: Re: RES: Process in LCK / SLP (Please)
D''oh! My bad. My brain was thinking malloc and lock contention,
I saw the network calls on the stack, connected the dots and starting
typing. I should have looked at the top of the stack. Sorry.
Let''s back up a minute. You have a Java application and the JVM has a
large number of threads in LCK time (user lock), right?
plockstat showed a lot of lock activity in libc malloc, at which point
you tried mtmalloc and libumem, neither of which made much of
a difference, right?
We know empirically (and heard from Jarod) that LCK can be deceiving,
because threads that call cond_wait(3C) to wait on a condition variable
will be charged with LCK time, but it''s really sleep time most of the
time.
So it is sometimes an artifact of the application design that threads show
up at 100% LCK time - threads are waiting for a cond_signal() (or
cond_broadcast())
to be put to work, but there''s no work to do. Anyway...
The problem I thought we were chasing was lock activity in malloc.
If you have more recent data (plockstat, prstat) please pass it to me (sorry
I missed it), and I''ll get back in sync on the current problem.
I would add that there's nothing here that I think belongs on
dtrace-discuss. I would start posting to the performance group and a Java
performance alias. Also - and I'm sorry if I missed this - how well (or
poorly) is the Java application performing, and what metric do we have to
determine application performance?
Ricky's script grabs a user stack when a thread goes off-CPU to sleep, and
tallies what the threads are sleeping on. It's mostly condition variables,
which doesn't really help us much (everything sleeps on CVs... mostly).
There's a lot of user locks in there as well. So...
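For reference, a rough sketch of that kind of off-CPU tally (a
reconstruction, not Ricky's actual script) - save it as, say, sleepers.d
and run it with dtrace -s sleepers.d -p [PID_OF_JVM]:

sched:::off-cpu
/pid == $target && curlwpsinfo->pr_state == SSLEEP/
{
        /* tally the user stack and the sync-object type being slept on */
        @stacks[ustack(5)] = count();
        @sobjs[curlwpsinfo->pr_stype] = count();
}

pr_stype is the numeric sync-object type - the same categories (condition
variable, user-level lock, shuttle, and so on) that show up in the output
Kleyson posted.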
The CVs look like threads blocking on writes, including writes to a
network socket.
You need to figure out the write target, and go from there.
Get the PID of the target JVM, then:

dtrace -n 'syscall::write:entry / pid == $target / { @[fds[arg0].fi_pathname] = count(); }' -p [PID_OF_JVM]
The above will list the files being written to. The next step depends on
what we see.
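A variation on the same one-liner also shows the distribution of write
sizes per file, which is handy for the 2k question below:

dtrace -n 'syscall::write:entry / pid == $target / { @[fds[arg0].fi_pathname] = quantize(arg2); }' -p [PID_OF_JVM]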
The same goes for the user locks. If they are indeed malloc locks (which I
think they are), I suspect that will come back around to network IOs larger
than 2k. Try this:
dtrace -n 'pid$target:libc:malloc:entry { @j[jstack()] = count(); }' -p [PID_OF_JVM]
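If the jstack() frames come back truncated or as bare hex addresses,
raising the jstack string size can help, for example:

dtrace -x jstackstrsize=1024 -n 'pid$target:libc:malloc:entry { @j[jstack()] = count(); }' -p [PID_OF_JVM]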
Thanks,
/jim
Kleyson Rios wrote:
> Hi Jim,
>
> But, if there are problems in malloc for network buffers, shouldn't I see
> the locks when running plockstat?
>
>
> Regards,
>
> ------------------------------------------------------------------
>
> Kleyson Rios.
>
> -----Original Message-----
> From: James.Mauro at Sun.COM [mailto:James.Mauro at Sun.COM]
> Sent: Wednesday, April 23, 2008 01:11
> To: Kleyson Rios
> Cc: dtrace-discuss at opensolaris.org
> Subject: Re: [dtrace-discuss] RES: RES: Process in LCK / SLP (Please)
>
> You may want to cross-post to a Java alias, but I've been down this
> road before.
>
> Java will call into malloc() for buffers for network reads and writes
> that are larger than 2k bytes (the 2k is from memory, and I think it
> was a 1.5 JVM). A large number of malloc calls,
> and resulting contention on locks in the library, are due to the application
> doing network writes of larger than 2k.
>
> Newer JVMs (1.6) may improve this, but I'm not sure. There's also
> an alternative set of classes and methods, NIO, which also can
> help (although I've heard tell that NIO brings other problems along
> with it, but I cannot speak from personal experience).
>
> At this point, I think you need to consult with Java experts to determine
> what options you have for buffer allocation for network IO from the
> Java heap, versus the current behavior of the JVM dropping back
> to malloc for allocating buffers for network IO.
>
> The other option of course is determining if the code can be changed to
> use buffers smaller than 2k.
>
> Thanks,
> /jim
>
>
>
>
> Kleyson Rios wrote:
>
>> OK jonathan,
>>
>> I understand.
>>
>> So, looking in the right place now, I can see few locks and sometimes no locks
>> (just Mutex Hold). But I still have many threads at 100% LCK.
>>
>> If I don't have a lot of locks, where is my problem?
>>
>> Running Rickey C. Weisner's script I get:
>>
>> (...)
>> 25736
>> libc.so.1`_so_send+0x15
>> libjvm.so`__1cDhpiEsend6Fipcii_i_+0x67
>> libjvm.so`JVM_Send+0x32
>> libnet.so`Java_java_net_SocketOutputStream_socketWrite0+0x131
>> 0xc3c098d3
>> 10
>> 25736
>> 0xc3d2a33a
>> 14
>> 25736
>> libc.so.1`_write+0x15
>> libjvm.so`__1cDhpiFwrite6FipkvI_I_+0x5d
>> libjvm.so`JVM_Write+0x30
>> libjava.so`0xc8f7c04b
>> 16
>> 25736
>> libc.so.1`stat64+0x15
>> 21
>> 25736
>> libc.so.1`_write+0x15
>> libjvm.so`__1cDhpiFwrite6FipkvI_I_+0x5d
>> libjvm.so`JVM_Write+0x30
>> libjava.so`0xc8f80ce9
>> 76
>> java 25736 kernel-level lock 1
>> java 25736 shuttle 6
>> java 25736 preempted 7
>> java 25736 user-level lock 511
>> java 25736 condition variable 748
>>
>>
>> Regards,
>>
>> ------------------------------------------------------------------
>>
>> Kleyson Rios.
>> Technical Support Management
>> Support Analyst / Team Lead
>>
>>
>> -----Original Message-----
>> From: Jonathan Adams [mailto:jonathan.adams at sun.com]
>> Sent: Friday, April 18, 2008 15:40
>> To: Kleyson Rios
>> Cc: dtrace-discuss at opensolaris.org
>> Subject: Re: [dtrace-discuss] RES: Process in LCK / SLP (Please)
>>
>>
>> On Apr 18, 2008, at 1:03 PM, Kleyson Rios wrote:
>>
>>
>>> Hi przemol,
>>>
>>> Below is the plockstat output for malloc and libumem. Both show many locks.
>>> Why didn't I get fewer locks after changing to libumem?
>>>
>>>
>>>
>> You're looking at Mutex hold statistics, which don't mean a lot
>> (unless contention is caused by long hold times).
>>
>> The important thing for multi-threaded performance is *contention*.
>> (Spinning and blocking) Those are the statistics you should be
>> looking at.
>>
>> Both malloc and libumem use locks to protect their state; libumem
>> just uses many locks, in order to reduce contention.
>>
>> Cheers,
>> - jonathan
>>
>>
>>
>>
>> _______________________________________________
>> dtrace-discuss mailing list
>> dtrace-discuss at opensolaris.org
>>
>>
>

> OK. Let's go.
> Now i have about 100 users simultaneous. That box has 4 CPU and it's
> running JBoss. My application it's very slow. Looking at prstat I saw
> that I have many threads and most of time one or two threads are running
> in the same time. Using -Lm prstat options I saw that the most of time
> the threads are in LCK/SLP state.

Yep. Performance is slow because this threaded application isn't really
getting much in the way of concurrency. In fact, there are 256 threads in
this app, with only 1 getting work done, so my first question is: why 256
threads? Did you try tweaking this down to, say, 8 threads? Or 4 threads?

> Using plockstat I saw many locks for malloc, so I changed for libumem. I
> don't have more contention locks (Spin, block), I have just many Mutex
> hold locks.

The hold times are not atrociously bad, but some are fairly long. More to
the point, this indicates all the LCK time is in malloc-related locks. So
you may have improved the situation here with the move to libumem, but it
doesn't necessarily mean the problem has been solved.

The data tells us that some Java threads are spending 20% or more of their
time waiting on locks, and those locks all seem to be in malloc. So we need
to figure out why the JVM is calling malloc. That's the second dtrace line
I sent, and I'm not sure why it came back empty. I would run it again. We
need to know where the malloc calls are coming from.

> But the problem still exist. I have 7 box running jboss, all box have
> the same problem.

That's not surprising. It looks like an application issue to me.

> Doubt: Are there some difference for output running prstat for 1 second
> or 5 second? Because running prstat for 1 second I get thread with
> 100% LCK.

Only the difference one would expect by averaging over a longer period.
The actual microstates are event-based, and prstat computes the percentage
of time spent in each microstate based on the interval, so changing the
interval from 1 second to 5 seconds means a longer window from which to
compute averages. I would not expect things to change appreciably in terms
of what's reported (and, in my experience, it generally does not).

Your prstat data shows significantly more sleep time than lock time,
although between the two you account for 98% of time for almost all the
threads. So....

1. The locks. You need to determine where the mallocs are coming from.
Only an application change can improve this.

2. Sleep. What are the threads sleeping on? Looks like writes to files and
network sockets account for a fair bit (based on earlier stack traces).
You could also grab some random pstack(1) samples (pstack PID/LWPID) for
long-sleeping threads.

If you want to reduce sleep time on writes, you need to make writes faster,
or change the code to spin off a "writer" thread to do the writes
asynchronously to the other work the application does (assuming the
application design and specs permit such a thing) - otherwise, you need to
optimize writes.

Looks like the file writes are all to temp files in /var/tmp. Check your
iostat data and make sure you're getting good write performance. Are the
files opened O_SYNC/O_DSYNC? That will really slow writes down, and should
only be done if it's an absolute requirement for the application (run
pfiles on the JVM to determine if the files' SYNC flags are set; if they
are, ask the programmers if sync writes are necessary).

Again, this isn't a dtrace issue, it's a Java application performance
issue.
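For the pstack/pfiles suggestions above, something along these lines is
enough (the LWPID here is just a placeholder - pick a long-sleeping thread
from your prstat output):

pstack 25736/7211
pfiles 25736 | egrep 'O_SYNC|O_DSYNC'

Repeat the pstack a few times, a few seconds apart; if the egrep prints
anything, those files are being written synchronously.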
You'll probably get more help by posting to a performance group
(perf-discuss at opensolaris.org) and a Java interest group (sorry, I'm
not a Java expert).

You should also consider trying a newer JVM - the 1.6 JVM includes the
hotspot dtrace provider, so we can learn more about what the code is doing
with dtrace. Alternatively, download and link in the agent libs for 1.4/1.5
Java to get some dtrace probes:
https://solaris10-dtrace-vm-agents.dev.java.net/

And again, I don't know why 256 threads are being started. I would
seriously consider taking the thread count way down for some testing. Too
many threads put more pressure on already contended resources, and the
threads are not getting any work done anyway...

HTH,
/jim

> (...)