Hi,

I really need help.

How can I identify why my process is 100% in LCK and SLP?

   PID USERNAME  USR SYS TRP TFL DFL LCK SLP LAT VCX ICX SCL SIG PROCESS/LWPID
  6769 root      0.0 0.0 0.0 0.0 0.0 0.0 100 0.0  30   0  18   0 java/28
  6769 root      0.0 0.0 0.0 0.0 0.0 0.0 100 0.0   2   0   1   0 java/94
  6769 root      0.0 0.0 0.0 0.0 0.0 0.0 100 0.0   3   0   1   0 java/93
  6769 root      0.0 0.0 0.0 0.0 0.0 0.0 100 0.0   1   0   0   0 java/100
  6769 root      0.0 0.0 0.0 0.0 0.0 100 0.0 0.0   1   0   1   0 java/18
  6769 root      0.0 0.0 0.0 0.0 0.0 100 0.0 0.0   0   0   0   0 java/49
  6769 root      0.0 0.0 0.0 0.0 0.0 100 0.0 0.0   0   0   0   0 java/47
  6769 root      0.0 0.0 0.0 0.0 0.0 100 0.0 0.0   0   0   0   0 java/48
  6769 root      0.0 0.0 0.0 0.0 0.0 0.0 100 0.0   0   0   0   0 java/32
  6769 root      0.0 0.0 0.0 0.0 0.0 0.0 100 0.0   0   0   0   0 java/31
  6769 root      0.0 0.0 0.0 0.0 0.0 0.0 100 0.0   0   0   0   0 java/30
  6769 root      0.0 0.0 0.0 0.0 0.0 0.0 100 0.0   0   0   0   0 java/29
  6769 root      0.0 0.0 0.0 0.0 0.0 0.0 100 0.0   0   0   0   0 java/27
  6769 root      0.0 0.0 0.0 0.0 0.0 100 0.0 0.0   0   0   0   0 java/26
  6769 root      0.0 0.0 0.0 0.0 0.0 100 0.0 0.0   0   0   0   0 java/25
  6769 root      0.0 0.0 0.0 0.0 0.0 100 0.0 0.0   0   0   0   0 java/24
  6769 root      0.0 0.0 0.0 0.0 0.0 100 0.0 0.0   0   0   0   0 java/23

Regards,

------------------------------------------------------------------

Kleyson Rios.
Gerência de Suporte Técnico
Analista de Suporte / Líder de Equipe

Governo do Estado de Goiás
Agência de Administração
Diretoria de Informática
+55 62 3201-6582

"Do you know the difference between successful people and unsuccessful people? Successful people are willing to do the things that unsuccessful people are not."
Kleyson Rios wrote:
> How can I identify why my process is 100% in LCK and SLP?

from the prstat man-page:

  LCK  The percentage of time the process has spent waiting for user locks.

  SLP  The percentage of time the process has spent sleeping.

it looks like your processes have deadlocked themselves - not much one can
do without knowledge about what they're actually doing and access to the
source code. I'd talk to the people developing this app.

Michael
--
Michael Schuster                  http://blogs.sun.com/recursion
Recursion, n.: see 'Recursion'
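For reference, per-LWP microstate output like the table above comes from prstat's microstate accounting mode; a minimal invocation, assuming the same PID 6769 and a 5-second sampling interval, would be:

  prstat -mLp 6769 5

The LCK and SLP columns are percentages of each sampling interval, so a thread that did nothing at all during the interval shows 100 in one of them.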
Michael Schuster wrote:
> it looks like your processes have deadlocked themselves - not much one can
> do without knowledge about what they're actually doing and access to the
> source code. I'd talk to the people developing this app.

If it is a deadlock, and if it is a recent Java virtual machine, then
you should be able to connect to it with jconsole (from the same directory
where you get your java command), go to the "Threads" tab and use the
"Detect Deadlock" button to get a thread stack dump showing where you
have a cycle among your Java-level locks.

			... peter
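If there is no GUI available for jconsole, a command-line check is also possible; this is only a sketch, assuming a JDK 5/6 jstack sitting in the same bin directory as the java command (the path below is hypothetical) and the PID from the prstat output:

  /usr/jdk/latest/bin/jstack 6769 > /var/tmp/threads.6769.txt
  grep -i deadlock /var/tmp/threads.6769.txt

HotSpot thread dumps include built-in deadlock detection and print a "Found one Java-level deadlock" section when there is a cycle; kill -QUIT 6769 produces the same dump on the JVM's stdout/stderr.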
It could very well be that your process is waiting on condition variables and
this behavior is normal. Please see the attached script as a place to start.
A pstack of the process would also be informative.

rick

On Wed, Apr 16, 2008 at 12:35:53PM -0300, Kleyson Rios wrote:
> How can I identify why my process is 100% in LCK and SLP?

--
Rickey C. Weisner
Software Development and Performance Specialist
Sun Microsystems, INC
cell phone: 615-308-1147
email: rick.weisner at sun.com
-------------- next part --------------
#!/usr/sbin/dtrace -s

sched:::off-cpu
/ arg1 != 0 && pid == $1 && curlwpsinfo->pr_state == SSLEEP/
{
        /*
         * We're sleeping.  Track our sobj type.
         */
        self->sobj = curlwpsinfo->pr_stype;
        self->bedtime = timestamp;
}

sched:::off-cpu
/ arg1 != 0 && pid == $1 && curlwpsinfo->pr_state == SRUN/
{
        self->bedtime = timestamp;
        /* @s[pid,ustack(5)] = count(); */
}

sched:::on-cpu
/self->bedtime && !self->sobj /
{
        @[execname, pid, "preempted"] = sum(timestamp - self->bedtime);
        /* @c[execname, pid, "preempted"] = count(); */
        self->bedtime = 0;
}

sched:::on-cpu
/self->sobj /
{
        @[execname, pid,
            self->sobj == SOBJ_MUTEX ? "kernel-level lock" :
            self->sobj == SOBJ_RWLOCK ? "rwlock" :
            self->sobj == SOBJ_CV ? "condition variable" :
            self->sobj == SOBJ_SEMA ? "semaphore" :
            self->sobj == SOBJ_USER ? "user-level lock" :
            self->sobj == SOBJ_USER_PI ? "user-level prio-inheriting lock" :
            self->sobj == SOBJ_SHUTTLE ? "shuttle" : "unknown"] =
            sum(timestamp - self->bedtime);
        /*
        @c[execname, pid,
            self->sobj == SOBJ_MUTEX ? "kernel-level lock" :
            self->sobj == SOBJ_RWLOCK ? "rwlock" :
            self->sobj == SOBJ_CV ? "condition variable" :
            self->sobj == SOBJ_SEMA ? "semaphore" :
            self->sobj == SOBJ_USER ? "user-level lock" :
            self->sobj == SOBJ_USER_PI ? "user-level prio-inheriting lock" :
            self->sobj == SOBJ_SHUTTLE ? "shuttle" : "unknown"] = count();
        */
        self->sobj = 0;
        self->bedtime = 0;
}

tick-10sec
{
        exit(0);
}
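A usage sketch for the script above, assuming it is saved as sleepers.d (the file name is arbitrary); the script takes the target PID as $1 and exits on its own after 10 seconds via the tick-10sec probe:

  chmod +x sleepers.d
  ./sleepers.d 6769          # or: dtrace -s sleepers.d 6769

The aggregation prints, per synchronization-object type (condition variable, user-level lock, and so on), the total nanoseconds the process's threads spent off-CPU waiting on that object type during the sampling window.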
Peter B. Kessler's email at 4/16/08 2:01 PM, said:
> If it is a deadlock, and if it is a recent Java virtual machine, then
> you should be able to connect to it with jconsole (from the same directory
> where you get your java command), go to the "Threads" tab and use the
> "Detect Deadlock" button to get a thread stack dump showing where you
> have a cycle among your Java-level locks.

*Highly* unlikely it is a deadlock. Insanely, Solaris charges time in
condition wait ([pthread_]cond_wait or lwp_cond_wait) as lock time. This
has never made sense to me, and I have yet to wield a big enough stick to
get it fixed.

Take either a Java thread dump or a pstack(1) dump. If a Java dump, you
will be in Object.wait (or similar) or sleep(). If a pstack(1), you will
be in some form of cond_wait.

If it really is a deadlock (and I'll buy you a beer if it is because I am
sure it isn't), Peter's suggestion is the way to go.

As for sleep, the threads are tired. Long day of garbage collection I
suppose. Seriously though, they have no (or very little) work to do since
the top thread is 100% sleep but made 18 system calls (rounding error).

Thanks,

Jarod
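A quick way to check Jarod's point from the shell; just a sketch, assuming PID 6769 (the exact libc frame names vary a little between Solaris releases):

  pstack 6769 | grep -c cond_wait      # LWPs parked on condition variables
  pstack 6769 | grep -c mutex_lock     # LWPs actually blocked on mutexes

If the first number is large and the second is near zero, the LCK time is condition-variable waiting rather than lock contention.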
Thanks everybody. I will try to do what you guys said. When I get new issues I will post again.

Regards,

Kleyson Rios.
przemolicc at poczta.fm
2008-Apr-18 09:49 UTC
[dtrace-discuss] Process in LCK / SLP (Please)
On Wed, Apr 16, 2008 at 12:35:53PM -0300, Kleyson Rios wrote:
> How can I identify why my process is 100% in LCK and SLP?

Could you please use 'plockstat' and show us the output? E.g.:

  plockstat -e 30 -s 10 -A -p <pid of java proc>

Regards
przemol
--
http://przemol.blogspot.com/
Hi przemol,
Below is the output of plockstat for malloc and for libumem. Both show many locks.
Why didn't I get fewer locks after changing to libumem?
********** Plockstat using malloc (many many locks):
Mutex hold
Count nsec Lock Caller
-------------------------------------------------------------------------------
 4036    14391 libc.so.1`libc_malloc_lock   libjava.so`JNU_GetStringPlatformChars+0x290
 2118     7385 libc.so.1`libc_malloc_lock   libjava.so`JNU_ReleaseStringPlatformChars+0x18
 3174     4700 libc.so.1`libc_malloc_lock   libjvm.so`__1cCosGmalloc6FI_pv_+0x29
  181    63407 libc.so.1`_uberdata+0x40     libjvm.so`__1cCosRpd_suspend_thread6FpnGThread_i_i_+0x46
 3170     3407 libc.so.1`libc_malloc_lock   libjvm.so`__1cCosEfree6Fpv_v_+0x18
  588    12443 libc.so.1`libc_malloc_lock   libnet.so`Java_java_net_SocketOutputStream_socketWrite0+0x7f
  172    37572 libc.so.1`_uberdata+0x40     libjvm.so`__1cCosQpd_resume_thread6FpnGThread__i_+0x1f
  176    26701 libc.so.1`libc_malloc_lock   libjvm.so`__1cCosGmalloc6FI_pv_+0x29
  596     7124 libc.so.1`libc_malloc_lock   libnet.so`Java_java_net_SocketOutputStream_socketWrite0+0x1e0
  450     7254 libc.so.1`libc_malloc_lock   0x858b2167
(...)
********** Plockstat using libumem (many many locks too):
Mutex hold
Count nsec Lock Caller
-------------------------------------------------------------------------------
    4  1450455 libumem.so.1`umem_cache_lock libumem.so.1`umem_cache_applyall+0x51
  100    46388 libc.so.1`_uberdata+0x40     libjvm.so`__1cCosRpd_suspend_thread6FpnGThread_i_i_+0x46
  100    23226 libc.so.1`_uberdata+0x40     libjvm.so`__1cCosQpd_resume_thread6FpnGThread__i_+0x1f
  486     4314 0x807b680                    libumem.so.1`umem_cache_alloc+0xc1
  488     4236 0x807b680                    libumem.so.1`umem_cache_free+0x194
  356     4859 0x807b300                    libumem.so.1`umem_cache_alloc+0xc1
  150    11499 0x8073030                    libumem.so.1`vmem_xfree+0xfe
  115    14473 0x8073030                    libumem.so.1`vmem_alloc+0x13c
  297     5374 0x807ad80                    libumem.so.1`umem_cache_alloc+0xc1
  362     4258 0x807b300                    libumem.so.1`umem_cache_free+0x194
  297     4304 0x807ad80                    libumem.so.1`umem_cache_free+0x194
   48    21805 0x807b6c0                    libumem.so.1`umem_cache_free+0x194
  150     6635 libumem.so.1`vmem0+0x30      libumem.so.1`vmem_alloc+0x126
(...)
Mutex hold
-------------------------------------------------------------------------------
Count     nsec Lock                         Caller
  543    13110 0x807b680                    libumem.so.1`umem_cache_alloc+0xc1

      nsec ---- Time Distribution --- count Stack
      8192 |@@@@@@@@@@@@             |   289 libumem.so.1`umem_cache_alloc+0xc1
     16384 |@@@@@@@@@                |   224 libumem.so.1`umem_alloc+0x3f
     32768 |@                        |    27 libumem.so.1`malloc+0x23
     65536 |                         |     3 libjava.so`JNU_GetStringPlatformChars+0x290
                                             0x20ac
-------------------------------------------------------------------------------
Count     nsec Lock                         Caller
   78    89901 libc.so.1`_uberdata+0x40     libjvm.so`__1cCosRpd_suspend_thread6FpnGThread_i_i_+0x46

      nsec ---- Time Distribution --- count Stack
     65536 |@@@@@@@@@@@@@@@@@        |    57 libc.so.1`fork_lock_exit+0x2f
    131072 |@@@@@                    |    19 libc.so.1`_thrp_suspend+0x22c
    262144 |                         |     1 libc.so.1`thr_suspend+0x1a
    524288 |                         |     1 libjvm.so`__1cCosRpd_suspend_thread6FpnGThread_i_i_+0x46
                                             libjvm.so`__1cGThreadNdo_vm_suspend6M_i_+0x42
                                             libjvm.so`__1cGThreadKvm_suspend6M_i_+0x33
                                             libjvm.so`__1cUThreadSafepointStateXexamine_state_of_thread6Mi_v_+0x101
                                             libjvm.so`__1cUSafepointSynchronizeFbegin6F_v_+0x12b
                                             libjvm.so`__1cIVMThreadEloop6M_v_+0x1b4
(...)
Regards.
Kleyson Rios.
-----Original Message-----
From: dtrace-discuss-bounces at opensolaris.org [mailto:dtrace-discuss-bounces at opensolaris.org] On behalf of przemolicc at poczta.fm
Sent: Friday, April 18, 2008 06:49
To: dtrace-discuss at opensolaris.org
Subject: Re: [dtrace-discuss] Process in LCK / SLP (Please)

Could you please use 'plockstat' and show us the output?
E.g.: plockstat -e 30 -s 10 -A -p <pid of java proc>
We had such a case in the past. I don't think it is the same reason, but it
might give you some ideas.
We had a lot of locks. By using plockstat it turned out that those locks
came from malloc. Using mtmalloc or libumem reduced the number of locks, but
it was still a huge number (and eating our CPU). By using dtrace it turned
out that those locks came from threads which were ... supposed to be
killed. Our programmers found a bug in the application (the already unnecessary
threads were still alive and all of them were going in an endless loop),
changed the code and ... all locks went away.

As Jim said: start with watching who is using mallocs (which thread(s)).
Regards
przemol
--
http://przemol.blogspot.com/
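A minimal sketch of the "watch which threads call malloc" check suggested above, assuming the allocator of interest is still libc's malloc (adjust the module name if libumem or mtmalloc is interposed); tid is the calling LWP id, which maps back to the LWPID column of prstat -mL:

  dtrace -n 'pid$target:libc:malloc:entry { @calls[tid] = count(); }' -p <pid of java proc>

Each output line is an LWP id and the number of malloc() calls it made while the probe was enabled.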
On Apr 18, 2008, at 1:03 PM, Kleyson Rios wrote:
> Hi przemol,
>
> Below is the output of plockstat for malloc and for libumem. Both show many locks.
> Why didn't I get fewer locks after changing to libumem?

You're looking at Mutex hold statistics, which don't mean a lot
(unless contention is caused by long hold times).

The important thing for multi-threaded performance is *contention*.
(Spinning and blocking)  Those are the statistics you should be
looking at.

Both malloc and libumem use locks to protect their state; libumem
just uses many locks, in order to reduce contention.

Cheers,
- jonathan

--------------------------------------------------------------------------
Jonathan Adams, Sun Microsystems, ZFS Team       http://blogs.sun.com/jwadams
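To look at contention rather than hold events, plockstat can be told to record only contention; a sketch reusing the 30-second window from the earlier suggestion:

  plockstat -C -e 30 -p <pid of java proc>    # -C = contention events (spins and blocks) only

If this stays nearly empty while prstat still reports high LCK, the LCK time is most likely condition-variable waiting rather than lock contention.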
przemolicc at poczta.fm
2008-Apr-21 08:37 UTC
[dtrace-discuss] RES: Process in LCK / SLP (Please)
On Fri, Apr 18, 2008 at 02:40:15PM -0400, Jonathan Adams wrote:
> You're looking at Mutex hold statistics, which don't mean a lot
> (unless contention is caused by long hold times).
>
> The important thing for multi-threaded performance is *contention*.
> (Spinning and blocking)  Those are the statistics you should be
> looking at.
>
> Both malloc and libumem use locks to protect their state; libumem
> just uses many locks, in order to reduce contention.

So it seems, Kleyson, that I was wrong and Rickey C. Weisner's script is the way to go.

Regards
przemol
--
http://przemol.blogspot.com/
Kleyson Rios
2008-Apr-22 17:36 UTC
[dtrace-discuss] RES: RES: Process in LCK / SLP (Please)
OK Jonathan,

I understand.

So, looking in the right place now, I can see few locks and sometimes no locks
(just Mutex Hold). But I still have many threads at 100% LCK.

If I don't have a lot of lock contention, where is my problem?

Running Rickey C. Weisner's script I get:
(...)
25736
libc.so.1`_so_send+0x15
libjvm.so`__1cDhpiEsend6Fipcii_i_+0x67
libjvm.so`JVM_Send+0x32
libnet.so`Java_java_net_SocketOutputStream_socketWrite0+0x131
0xc3c098d3
10
25736
0xc3d2a33a
14
25736
libc.so.1`_write+0x15
libjvm.so`__1cDhpiFwrite6FipkvI_I_+0x5d
libjvm.so`JVM_Write+0x30
libjava.so`0xc8f7c04b
16
25736
libc.so.1`stat64+0x15
21
25736
libc.so.1`_write+0x15
libjvm.so`__1cDhpiFwrite6FipkvI_I_+0x5d
libjvm.so`JVM_Write+0x30
libjava.so`0xc8f80ce9
76
java 25736 kernel-level lock 1
java 25736 shuttle 6
java 25736 preempted 7
java 25736 user-level lock 511
java 25736 condition variable 748
Regards,

------------------------------------------------------------------

Kleyson Rios.
Gerência de Suporte Técnico
Analista de Suporte / Líder de Equipe
You may want to cross-post to a Java alias, but I've been down this
road before.

Java will call into malloc() for buffers for network reads and writes
that are larger than 2k bytes (the 2k is from memory, and I think it
was a 1.5 JVM). A large number of malloc calls,
and resulting contention on locks in the library, are due to the application
doing network writes of larger than 2k.

Newer JVMs (1.6) may improve this, but I'm not sure. There's also
an alternative set of classes and methods, NIO, which also can
help (although I've heard tell that NIO brings other problems along
with it, but I can not speak from personal experience).

At this point, I think you need to consult with Java experts to determine
what options you have for buffer allocation for network IO from the
Java heap, versus the current behavior of the JVM dropping back
to malloc for allocating buffers for network IO.

The other option of course is determining if the code can be changed to
use buffers smaller than 2k.

Thanks,
/jim

Kleyson Rios wrote:
> So, looking in the right place now, I can see few locks and sometimes no
> locks (just Mutex Hold). But I still have many threads at 100% LCK.
>
> If I don't have a lot of lock contention, where is my problem?
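One way to check the >2k hypothesis before changing any code is to look at the size distribution of the JVM's native allocations; this is only a sketch, assuming the process is still using libc's malloc (change the module name if libumem is interposed):

  # arg0 to malloc() is the requested size in bytes
  dtrace -n 'pid$target:libc:malloc:entry { @["malloc size (bytes)"] = quantize(arg0); }' -p [PID_OF_JVM]

A distribution clustered above 2048 bytes would be consistent with the network-buffer path described above.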
Hi Jim,

But if the problem is malloc buffers for network I/O, shouldn't I see those
locks when running plockstat?

Regards,

------------------------------------------------------------------

Kleyson Rios.
D'oh! My bad. My brain was thinking malloc and lock contention,
I saw the network calls on the stack, connected the dots and started
typing. I should have looked at the top of the stack. Sorry.
Let's back up a minute. You have a Java application and the JVM has a
large number of threads in LCK time (user lock), right?
plockstat showed a lot of lock activity in libc malloc, at which point
you tried mtmalloc and libumem, neither of which made much of
a difference, right?
We know empirically (and heard from Jarod) that LCK can be deceiving,
because threads that call cond_wait(3C) to wait on a condition variable
will be charged with LCK time, but it's really sleep time most of the time.
So it is sometimes an artifact of the application design that threads show
up at 100% LCK time - threads are waiting for a cond_signal() (or cond_broadcast())
to be put to work, but there's no work to do. Anyway...

The problem I thought we were chasing was lock activity in malloc.
If you have more recent data (plockstat, prstat) please pass it to me (sorry
I missed it), and I'll get back in sync on the current problem.
I would add that there's nothing here that I think belongs on dtrace-discuss.
I would start posting to the performance group and a Java performance
alias. Also, and I'm sorry if I missed this, how well (or poorly) is the
Java application performing, and what metric do we have to determine
application performance?
Ricky's script grabs a user stack when a thread goes off CPU to sleep, and
tallies what the threads are sleeping on. It's mostly condition variables, which
doesn't really help us much (everything sleeps on CVs...mostly). There's a
lot of user locks in there as well. So...
The CVs look like threads blocking on writes, including writes to a
network socket.
You need to figure out the write target, and go from there.
Get the PID of the target JVM -

dtrace -n 'syscall::write:entry / pid == $target / { @[fds[arg0].fi_pathname] = count(); }' -p [PID_OF_JVM]

The above will list the files being written to. The next step depends on
what we see.
The same goes for the user locks. If they are indeed malloc locks (which
I think they are), I suspect that will come back around to network IOs
larger than 2k. Try this:

dtrace -n 'pid$target:libc:malloc:entry { @j[jstack()] = count(); }' -p [PID_OF_JVM]
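If that last one-liner turns out to be too noisy, a possible refinement is to keep only the large allocations; the 2048-byte cutoff and the 20-frame stack depth below are assumptions, not documented JVM behavior:

  dtrace -n 'pid$target:libc:malloc:entry /arg0 > 2048/ { @j[jstack(20)] = count(); }' -p [PID_OF_JVM]

The Java frames in the jstack() output show which application code paths drive the large native allocations.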
Thanks,
/jim
OK. Let's go.

Now I have about 100 simultaneous users. The box has 4 CPUs and it's running
JBoss.

My application is very slow. Looking at prstat I saw that I have many
threads, and most of the time only one or two of them are running at the
same time. Using the -Lm prstat options I saw that most of the time the
threads are in the LCK/SLP state.

Using plockstat I saw many locks for malloc, so I changed to libumem. I no
longer have contention locks (spin, block), just many Mutex hold events.
But the problem still exists. I have 7 boxes running JBoss, and all of them
have the same problem.

Question: Is there any difference in the output when running prstat with a
1-second interval versus a 5-second interval? Because running prstat for
1 second I get threads at 100% LCK.
*******************************************************************
Prstat (now); I got a little less LCK after changing to libumem:
[root at aga253distp156:/] # prstat -Lmp 25736
   PID USERNAME USR SYS TRP TFL DFL LCK SLP LAT VCX ICX SCL SIG PROCESS/LWPID
25736 root 57 0.0 0.0 0.0 0.0 43 0.0 0.1 36 83 76 0 java/2
25736 root 0.6 0.5 0.0 0.0 0.0 16 78 0.0 32 0 1K 0 java/7211
25736 root 0.4 0.6 0.0 0.0 0.0 29 70 0.0 19 5 1K 0 java/7237
25736 root 0.4 0.6 0.0 0.0 0.0 27 61 0.2 25 15 1K 0 java/7364
25736 root 0.4 0.5 0.0 0.0 0.0 20 80 0.0 19 8 1K 0 java/7218
25736 root 0.3 0.4 0.0 0.0 0.0 29 70 0.0 18 5 1K 0 java/7204
25736 root 0.3 0.4 0.0 0.0 0.0 16 73 0.0 23 6 864 0 java/7374
25736 root 0.2 0.4 0.0 0.0 0.0 20 79 0.0 9 11 1K 0 java/7305
25736 root 0.3 0.4 0.0 0.0 0.0 23 76 0.0 7 9 900 0 java/7475
25736 root 0.4 0.3 0.0 0.0 0.0 11 88 0.0 15 1 553 0 java/7236
25736 root 0.3 0.3 0.0 0.0 0.0 0.8 93 0.0 17 2 734 0 java/7370
25736 root 0.1 0.3 0.0 0.0 0.0 8.7 91 0.0 7 2 773 0 java/7235
25736 root 0.2 0.2 0.0 0.0 0.0 9.5 90 0.0 8 0 316 0 java/7219
25736 root 0.2 0.1 0.0 0.0 0.0 9.8 84 0.0 17 11 306 0 java/7269
25736 root 0.1 0.1 0.0 0.0 0.0 0.1 100 0.0 2 1 419 0 java/7471
25736 root 0.2 0.0 0.0 0.0 0.0 8.0 92 0.0 14 3 68 0 java/7264
25736 root 0.1 0.1 0.0 0.0 0.0 10 89 0.0 6 0 327 0 java/7470
25736 root 0.1 0.1 0.0 0.0 0.0 0.0 100 0.0 10 0 302 0 java/7365
25736 root 0.2 0.0 0.0 0.0 0.0 6.4 88 0.1 14 5 49 0 java/7317
25736 root 0.1 0.1 0.0 0.0 0.0 93 7.0 0.1 4 4 209 0 java/7367
25736 root 0.1 0.0 0.0 0.0 0.0 4.0 91 0.0 10 1 55 0 java/7229
25736 root 0.1 0.0 0.0 0.0 0.0 3.9 96 0.0 11 1 57 0 java/7304
25736 root 0.1 0.1 0.0 0.0 0.0 8.6 91 0.0 4 1 196 0 java/14
25736 root 0.0 0.1 0.0 0.0 0.0 0.0 100 0.0 1 1 263 0 java/7297
25736 root 0.1 0.1 0.0 0.0 0.0 0.0 100 0.1 154 0 86 0 java/10
25736 root 0.1 0.0 0.0 0.0 0.0 0.6 99 0.0 9 0 55 0 java/7377
25736 root 0.1 0.0 0.0 0.0 0.0 100 0.0 0.0 12 4 54 0 java/4
25736 root 0.0 0.0 0.0 0.0 0.0 100 0.0 0.0 16 0 40 0 java/3
25736 root 0.0 0.0 0.0 0.0 0.0 3.6 96 0.0 2 0 4 0 java/7481
25736 root 0.0 0.0 0.0 0.0 0.0 100 0.0 0.0 8 0 12 0 java/170
25736 root 0.0 0.0 0.0 0.0 0.0 100 0.0 0.0 12 0 12 0 java/231
25736 root 0.0 0.0 0.0 0.0 0.0 100 0.0 0.0 19 4 25 0 java/171
25736 root 0.0 0.0 0.0 0.0 0.0 0.0 100 0.0 1 0 2 0 java/7221
25736 root 0.0 0.0 0.0 0.0 0.0 95 0.0 0.1 20 0 20 0 java/167
25736 root 0.0 0.0 0.0 0.0 0.0 0.0 100 0.0 1 0 2 0 java/7378
Total: 1 processes, 256 lwps, load averages: 0.74, 0.68, 0.62
******************************************************************
Plockstat (now):
[root at aga253distp213:/dtrace] # plockstat -A -e 30 -p 25736
0
Mutex hold
Count nsec Lock Caller
----------------------------------------------------------------------------
---
3546 4765 0x8079580 libumem.so.1`umem_cache_alloc+0xc1
3509 3865 0x8079580 libumem.so.1`umem_cache_free+0x194
328 12542 0x8073030 libumem.so.1`vmem_xfree+0xfe
3 1340378 libumem.so.1`umem_cache_lock
libumem.so.1`umem_cache_applyall+0x51
261 13414 0x8073030 libumem.so.1`vmem_alloc+0x13c
471 5696 0x807b680
libumem.so.1`umem_cache_alloc+0xc1
475 5441 0x807b680
libumem.so.1`umem_cache_free+0x194
123 20767 0x807b6c0
libumem.so.1`umem_cache_free+0x194
28 90103 0x807a1c0
libumem.so.1`umem_cache_free+0x194
329 6788 libumem.so.1`vmem0+0x30 libumem.so.1`vmem_alloc+0x126
309 6900 0x807ad80
libumem.so.1`umem_cache_alloc+0xc1
329 6310 libumem.so.1`vmem0+0x4a4 libumem.so.1`vmem_xalloc+0x43b
328 6248 libumem.so.1`vmem0+0x30 libumem.so.1`vmem_xfree+0x127
355 5727 0x807b300
libumem.so.1`umem_cache_alloc+0xc1
216 9080 0x807b940
libumem.so.1`umem_cache_alloc+0xc1
355 5338 0x807b300
libumem.so.1`umem_cache_free+0x194
328 5393 libumem.so.1`vmem0+0x4a4 libumem.so.1`vmem_xfree+0xfe
310 5308 0x807ad80
libumem.so.1`umem_cache_free+0x194
329 4921 0x8073030 libumem.so.1`vmem_xalloc+0x43b
154 10299 0x807b9c0
libumem.so.1`umem_cache_alloc+0xc1
216 6043 0x807b940
libumem.so.1`umem_cache_free+0x194
329 3960 0x8073030 libumem.so.1`vmem_xalloc+0x315
161 7968 0x807b980
libumem.so.1`umem_cache_alloc+0xc1
235 5438 0x807b500
libumem.so.1`umem_cache_alloc+0xc1
329 3784 libumem.so.1`vmem0+0x4a4 libumem.so.1`vmem_alloc+0x13c
241 5029 0x807b500
libumem.so.1`umem_cache_free+0x194
85 13906 0x8082c80
libumem.so.1`umem_cache_alloc+0xc1
329 3577 libumem.so.1`vmem0+0x4a4 libumem.so.1`vmem_xalloc+0x315
39 29926 0x8079980
libumem.so.1`umem_cache_free+0x194
130 8876 0x807ba00
libumem.so.1`umem_cache_alloc+0xc1
34 33388 0xc8e149c0 libc.so.1`thr_suspend+0x1a
155 7304 0x807ba40
libumem.so.1`umem_cache_alloc+0xc1
13 86856 0x807ac80
libumem.so.1`umem_cache_alloc+0xc1
94 11979 0x80828c0
libumem.so.1`umem_cache_alloc+0xc1
143 7742 0x807b5c0
libumem.so.1`umem_cache_alloc+0xc1
153 5759 0x807b5c0
libumem.so.1`umem_cache_free+0x194
98 8816 0x807b540
libumem.so.1`umem_cache_alloc+0xc1
115 7499 0x807b8c0
libumem.so.1`umem_cache_alloc+0xc1
26 32518 0xc8e12100 libc.so.1`thr_suspend+0x1a
158 5210 0x807b640
libumem.so.1`umem_cache_free+0x194
95 8504 0x80828c0
libumem.so.1`umem_cache_free+0x194
155 5199 0x807ba40
libumem.so.1`umem_cache_free+0x194
8 97711 0xc8e145c0 libc.so.1`thr_suspend+0x1a
130 5915 0x807ba00
libumem.so.1`umem_cache_free+0x194
41 18736 0x807b2c0
libumem.so.1`umem_cache_alloc+0xc1
101 7524 0x8082940
libumem.so.1`umem_cache_alloc+0xc1
33 22842 0x807d640
libumem.so.1`umem_cache_free+0x194
35 21064 0x807acc0
libumem.so.1`umem_cache_alloc+0xc1
16 45814 0xc8e12900 libc.so.1`thr_suspend+0x1a
129 5641 0x807b880
libumem.so.1`umem_cache_free+0x194
5 142887 0xc8e13100 libc.so.1`thr_suspend+0x1a
43 16584 0x807b240
libumem.so.1`umem_cache_alloc+0xc1
21 33838 0xc8e12b80 libc.so.1`thr_suspend+0x1a
101 7012 0x8082940
libumem.so.1`umem_cache_free+0x194
8 42677 0xc8e13f00 libc.so.1`thr_suspend+0x1a
7 48677 0xc8e12940 libc.so.1`thr_suspend+0x1a
9 37763 0xc8e13080 libc.so.1`thr_suspend+0x1a
30 11317 0x8079880
libumem.so.1`umem_cache_alloc+0xc1
61 5553 0x807fcc0
libumem.so.1`umem_cache_alloc+0xc1
7 38522 0xc8e14240 libc.so.1`thr_suspend+0x1a
47 5664 0x8079cc0
26 9348 0x807a280
libumem.so.1`umem_cache_alloc+0xc1
6 39964 0xc8e14480 libc.so.1`thr_suspend+0x1a
26 9198 0x807bc40
libumem.so.1`umem_cache_alloc+0xc1
5 47600 0xc8e15440 libc.so.1`thr_suspend+0x1a
30 7853 0x807a240
libumem.so.1`umem_cache_alloc+0xc1
14 16771 0x80824b4
libumem.so.1`umem_depot_alloc+0xa4
3 3747 0x807f4b4
libumem.so.1`umem_cache_update+0xe5
3 3743 0x807d134
libumem.so.1`umem_cache_update+0xe5
(...)
Mutex block
Count nsec Lock Caller
----------------------------------------------------------------------------
---
1 1624523 0x8073030 libumem.so.1`vmem_xfree+0x24
1 23803 0x80825c0
libumem.so.1`umem_cache_free+0x51
1 18619 0x8073030 libumem.so.1`vmem_xfree+0x24
1 15304 0x807b5c0
libumem.so.1`umem_cache_free+0x51
******************************************************************
# time dtrace -n 'syscall::write:entry / pid == $target / { @[fds[arg0].fi_pathname] = count(); }' -p 25736
dtrace: description 'syscall::write:entry ' matched 1 probe
^C
/zonas/srvintra/root/var/tmp/imageio158908.tmp 16
/zonas/srvintra/root/var/tmp/imageio158909.tmp 16
/zonas/srvintra/root/var/tmp/imageio158910.tmp 16
/zonas/srvintra/root/var/tmp/imageio158911.tmp 16
/zonas/srvintra/root/var/tmp/imageio158912.tmp 16
/zonas/srvintra/root/var/tmp/imageio158913.tmp 16
/zonas/srvintra/root/var/tmp/imageio158914.tmp 16
/zonas/srvintra/root/var/tmp/imageio158915.tmp 16
/zonas/srvintra/root/var/tmp/imageio158916.tmp 16
/zonas/srvintra/root/var/tmp/imageio158917.tmp 16
/zonas/srvintra/root/var/tmp/imageio158918.tmp 16
/zonas/srvintra/root/var/tmp/imageio158919.tmp 16
/zonas/srvintra/root/var/tmp/imageio158920.tmp 16
/zonas/srvintra/root/var/tmp/imageio158921.tmp 16
/zonas/srvintra/root/var/tmp/imageio158922.tmp 16
/zonas/srvintra/root/var/tmp/imageio158923.tmp 16
/zonas/srvintra/root/var/tmp/imageio158924.tmp 16
/zonas/srvintra/root/var/tmp/imageio158925.tmp 16
/zonas/srvintra/root/var/tmp/imageio158926.tmp 16
/zonas/srvintra/root/var/tmp/imageio158927.tmp 16
/zonas/srvintra/root/var/tmp/imageio158928.tmp 16
/zonas/srvintra/root/var/tmp/imageio158929.tmp 16
/zonas/srvintra/root/var/tmp/imageio158930.tmp 16
/zonas/srvintra/root/var/tmp/imageio158931.tmp 16
/zonas/srvintra/root/var/tmp/imageio158932.tmp 16
/zonas/srvintra/root/var/tmp/imageio158933.tmp 16
/zonas/srvintra/root/var/tmp/imageio158934.tmp 16
/zonas/srvintra/root/var/tmp/imageio158935.tmp 16
/zonas/srvintra/root/var/tmp/imageio158936.tmp 16
/zonas/srvintra/root/var/tmp/imageio158937.tmp 16
/zonas/srvintra/root/var/tmp/imageio158938.tmp 16
/zonas/srvintra/root/var/tmp/imageio158939.tmp 16
/zonas/srvintra/root/var/tmp/imageio158940.tmp 16
/zonas/srvintra/root/var/tmp/imageio158941.tmp 16
/zonas/srvintra/root/var/tmp/imageio158942.tmp 16
/zonas/srvintra/root/var/tmp/imageio158943.tmp 16
/zonas/srvintra/root/var/tmp/imageio158944.tmp 16
/zonas/srvintra/root/var/tmp/imageio158945.tmp 16
/zonas/srvintra/root/var/tmp/imageio158946.tmp 16
/zonas/srvintra/root/var/tmp/imageio158947.tmp 16
/zonas/srvintra/root/opt/jboss-3.2.3_jetty-4.2.17/server/default/log/2008_04
_23.controller.csv 247
/zonas/srvintra/root/opt/jboss-3.2.3_jetty-4.2.17/server/default/deploy/detr
an.war/WEB-INF/control-detran.log 784
/zonas/srvintra/root/opt/jboss-3.2.3_jetty-4.2.17/server/default/log/2008_04
_23.request.log 89872
real 0m46.632s
user 0m0.535s
sys 0m0.412s
*************************************************************
# time dtrace -n 'pid$target:libc:malloc:entry { @j[jstack()] = count(); }' -p 5736
dtrace: description 'pid$target:libc:malloc:entry ' matched 1 probe
^C
real 0m52.971s
user 0m0.581s
sys 0m0.452s
Regards,
------------------------------------------------------------------
Kleyson Rios.
-----Mensagem original-----
De: James.Mauro at Sun.COM [mailto:James.Mauro at Sun.COM]
Enviada em: quarta-feira, 23 de abril de 2008 13:08
Para: Kleyson Rios
Cc: dtrace-discuss at opensolaris.org
Assunto: Re: RES: Process in LCK / SLP (Please)
D''oh! My bad. My brain was thinking malloc and lock contention,
I saw the network calls on the stack, connected the dots and starting
typing. I should have looked at the top of the stack. Sorry.
Let''s back up a minute. You have a Java application and the JVM has a
large number of threads in LCK time (user lock), right?
plockstat showed a lot of lock activity in libc malloc, at which point
you tried mtmalloc and libumem, neither of which made much of
a difference, right?
We know empirically (and heard from Jarod) that LCK can be deceiving,
because threads that call cond_wait(3C) to wait on a condition variable
will be charged with LCK time, but it''s really sleep time most of the
time.
So it is sometimes an artifact of the application design that threads show
up at 100% LCK time - threads are waiting for a cond_signal() (or
cond_broadcast())
to be put to work, but there''s no work to do. Anyway...
The problem I thought we were chasing was lock activity in malloc.
If you have more recent data (plockstat, prstat) please pass it to me (sorry
I missed it), and I''ll get back in sync on the current problem.
I would add that there's nothing here that I think belongs on
dtrace-discuss. I would start posting to the performance group and a Java
performance alias. Also - and I'm sorry if I missed this - how well (or
poorly) is the Java application performing, and what metric do we have to
determine application performance?
Ricky's script grabs a user stack when a thread goes off-CPU to sleep, and
tallies what the threads are sleeping on. It's mostly condition variables,
which doesn't really help us much (everything sleeps on CVs... mostly).
There's a lot of user locks in there as well. So...
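For reference, a rough sketch of that kind of off-CPU tally (a
reconstruction, not Ricky's actual script) - save it as, say, sleepers.d
and run it with dtrace -s sleepers.d -p [PID_OF_JVM]:

sched:::off-cpu
/pid == $target && curlwpsinfo->pr_state == SSLEEP/
{
        /* tally the user stack and the sync-object type being slept on */
        @stacks[ustack(5)] = count();
        @sobjs[curlwpsinfo->pr_stype] = count();
}

pr_stype is the numeric sync-object type - the same categories (condition
variable, user-level lock, shuttle, and so on) that show up in the output
Kleyson posted.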
The CVs look like threads blocking on writes, including writes to a
network socket.
You need to figure out the write target, and go from there.
Get the PID of the target JVM, then:

dtrace -n 'syscall::write:entry / pid == $target / { @[fds[arg0].fi_pathname] = count(); }' -p [PID_OF_JVM]
The above will list the files being written to. The next step depends on
what we see.
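A variation on the same one-liner also shows the distribution of write
sizes per file, which is handy for the 2k question below:

dtrace -n 'syscall::write:entry / pid == $target / { @[fds[arg0].fi_pathname] = quantize(arg2); }' -p [PID_OF_JVM]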
The same goes for the user locks. If they are indeed malloc locks (which I
think they are), I suspect that will come back around to network IOs larger
than 2k. Try this:
dtrace -n 'pid$target:libc:malloc:entry { @j[jstack()] = count(); }' -p [PID_OF_JVM]
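If the jstack() frames come back truncated or as bare hex addresses,
raising the jstack string size can help, for example:

dtrace -x jstackstrsize=1024 -n 'pid$target:libc:malloc:entry { @j[jstack()] = count(); }' -p [PID_OF_JVM]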
Thanks,
/jim
Kleyson Rios wrote:
> Hi Jim,
>
> But, if there are problems in malloc for network buffers, shouldn't I see
> the locks when running plockstat?
>
>
> Regards,
>
> ------------------------------------------------------------------
>
> Kleyson Rios.
>
> -----Original Message-----
> From: James.Mauro at Sun.COM [mailto:James.Mauro at Sun.COM]
> Sent: Wednesday, April 23, 2008 01:11
> To: Kleyson Rios
> Cc: dtrace-discuss at opensolaris.org
> Subject: Re: [dtrace-discuss] RES: RES: Process in LCK / SLP (Please)
>
> You may want to cross-post to a Java alias, but I've been down this
> road before.
>
> Java will call into malloc() for buffers for network reads and writes
> that are larger than 2k bytes (the 2k is from memory, and I think it
> was a 1.5 JVM). A large number of malloc calls,
> and resulting contention on locks in the library, are due to the application
> doing network writes of larger than 2k.
>
> Newer JVMs (1.6) may improve this, but I'm not sure. There's also
> an alternative set of classes and methods, NIO, which also can
> help (although I've heard tell that NIO brings other problems along
> with it, but I cannot speak from personal experience).
>
> At this point, I think you need to consult with Java experts to determine
> what options you have for buffer allocation for network IO from the
> Java heap, versus the current behavior of the JVM dropping back
> to malloc for allocating buffers for network IO.
>
> The other option of course is determining if the code can be changed to
> use buffers smaller than 2k.
>
> Thanks,
> /jim
>
>
>
>
> Kleyson Rios wrote:
>
>> OK jonathan,
>>
>> I understand.
>>
>> So, looking in the right place now, I can see few locks and sometimes no locks
>> (just Mutex Hold). But I still have many threads at 100% LCK.
>>
>> If I don't have a lot of locks, where is my problem?
>>
>> Running Rickey C. Weisner's script I get:
>>
>> (...)
>> 25736
>> libc.so.1`_so_send+0x15
>> libjvm.so`__1cDhpiEsend6Fipcii_i_+0x67
>> libjvm.so`JVM_Send+0x32
>> libnet.so`Java_java_net_SocketOutputStream_socketWrite0+0x131
>> 0xc3c098d3
>> 10
>> 25736
>> 0xc3d2a33a
>> 14
>> 25736
>> libc.so.1`_write+0x15
>> libjvm.so`__1cDhpiFwrite6FipkvI_I_+0x5d
>> libjvm.so`JVM_Write+0x30
>> libjava.so`0xc8f7c04b
>> 16
>> 25736
>> libc.so.1`stat64+0x15
>> 21
>> 25736
>> libc.so.1`_write+0x15
>> libjvm.so`__1cDhpiFwrite6FipkvI_I_+0x5d
>> libjvm.so`JVM_Write+0x30
>> libjava.so`0xc8f80ce9
>> 76
>> java 25736 kernel-level lock 1
>> java 25736 shuttle 6
>> java 25736 preempted 7
>> java 25736 user-level lock 511
>> java 25736 condition variable 748
>>
>>
>> Regards,
>>
>> ------------------------------------------------------------------
>>
>> Kleyson Rios.
>> Technical Support Management
>> Support Analyst / Team Lead
>>
>>
>> -----Original Message-----
>> From: Jonathan Adams [mailto:jonathan.adams at sun.com]
>> Sent: Friday, April 18, 2008 15:40
>> To: Kleyson Rios
>> Cc: dtrace-discuss at opensolaris.org
>> Subject: Re: [dtrace-discuss] RES: Process in LCK / SLP (Please)
>>
>>
>> On Apr 18, 2008, at 1:03 PM, Kleyson Rios wrote:
>>
>>
>>> Hi przemol,
>>>
>>> Below is the plockstat output for malloc and libumem. Both show many locks.
>>> Why didn't I get fewer locks after changing to libumem?
>>>
>>>
>>>
>> You're looking at Mutex hold statistics, which don't mean a lot
>> (unless contention is caused by long hold times).
>>
>> The important thing for multi-threaded performance is *contention*.
>> (Spinning and blocking) Those are the statistics you should be
>> looking at.
>>
>> Both malloc and libumem use locks to protect their state; libumem
>> just uses many locks, in order to reduce contention.
>>
>> Cheers,
>> - jonathan
>>
>>
>>
>>
>> _______________________________________________
>> dtrace-discuss mailing list
>> dtrace-discuss at opensolaris.org
>>
>>
>

> OK. Let's go.
> Now i have about 100 users simultaneous. That box has 4 CPU and it's
> running JBoss. My application it's very slow. Looking at prstat I saw
> that I have many threads and most of time one or two threads are running
> in the same time. Using -Lm prstat options I saw that the most of time
> the threads are in LCK/SLP state.

Yep. Performance is slow because this threaded application isn't really
getting much in the way of concurrency. In fact, there are 256 threads in
this app, with only 1 getting work done, so my first question is: why 256
threads? Did you try tweaking this down to, say, 8 threads? Or 4 threads?

> Using plockstat I saw many locks for malloc, so I changed for libumem. I
> don't have more contention locks (Spin, block), I have just many Mutex
> hold locks.

The hold times are not atrociously bad, but some are fairly long. More to
the point, this indicates all the LCK time is in malloc-related locks. So
you may have improved the situation here with the move to libumem, but it
doesn't necessarily mean the problem has been solved.

The data tells us that some Java threads are spending 20% or more of their
time waiting on locks, and those locks all seem to be in malloc. So we need
to figure out why the JVM is calling malloc. That's the second dtrace line
I sent, and I'm not sure why it came back empty. I would run it again. We
need to know where the malloc calls are coming from.

> But the problem still exist. I have 7 box running jboss, all box have
> the same problem.

That's not surprising. It looks like an application issue to me.

> Doubt: Are there some difference for output running prstat for 1 second
> or 5 second? Because running prstat for 1 second I get thread with
> 100% LCK.

Only the difference one would expect by averaging over a longer period.
The actual microstates are event-based, and prstat computes the percentage
of time spent in each microstate based on the interval, so changing the
interval from 1 second to 5 seconds means a longer window from which to
compute averages. I would not expect things to change appreciably in terms
of what's reported (and, in my experience, it generally does not).

Your prstat data shows significantly more sleep time than lock time,
although between the two you account for 98% of time for almost all the
threads. So....

1. The locks. You need to determine where the mallocs are coming from.
Only an application change can improve this.

2. Sleep. What are the threads sleeping on? Looks like writes to files and
network sockets account for a fair bit (based on earlier stack traces).
You could also grab some random pstack(1) samples (pstack PID/LWPID) for
long-sleeping threads.

If you want to reduce sleep time on writes, you need to make writes faster,
or change the code to spin off a "writer" thread to do the writes
asynchronously to the other work the application does (assuming the
application design and specs permit such a thing) - otherwise, you need to
optimize writes.

Looks like the file writes are all to temp files in /var/tmp. Check your
iostat data and make sure you're getting good write performance. Are the
files opened O_SYNC/O_DSYNC? That will really slow writes down, and should
only be done if it's an absolute requirement for the application (run
pfiles on the JVM to determine if the files' SYNC flags are set; if they
are, ask the programmers if sync writes are necessary).

Again, this isn't a dtrace issue, it's a Java application performance
issue.
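For the pstack/pfiles suggestions above, something along these lines is
enough (the LWPID here is just a placeholder - pick a long-sleeping thread
from your prstat output):

pstack 25736/7211
pfiles 25736 | egrep 'O_SYNC|O_DSYNC'

Repeat the pstack a few times, a few seconds apart; if the egrep prints
anything, those files are being written synchronously.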
You'll probably get more help by posting to a performance group
(perf-discuss at opensolaris.org) and a Java interest group (sorry, I'm
not a Java expert).

You should also consider trying a newer JVM - the 1.6 JVM includes the
hotspot dtrace provider, so we can learn more about what the code is doing
with dtrace. Alternatively, download and link in the agent libs for 1.4/1.5
Java to get some dtrace probes:
https://solaris10-dtrace-vm-agents.dev.java.net/

And again, I don't know why 256 threads are being started. I would
seriously consider taking the thread count way down for some testing. Too
many threads put more pressure on already contended resources, and the
threads are not getting any work done anyway...

HTH,
/jim

> (...)