Justin Lloyd

2007-Apr-11 18:57 UTC

[dtrace-discuss] Tracing PIDs to TIME_WAIT states?

Hi all,

We have a system on which about 17 OAS JVM processes are running and to which a
number of OAS Apache processes connect (used to be local but since moved to
another system for testing the impact). However, for reasons we have not yet
been able to determine, the system will have thousands (we've seen
10K+) of TCP connections in the TIME_WAIT state, having ramped up to that in
seconds. (There's a lot more details to this problem, but this covers
the gist of it.) So I am trying to determine which JVM PIDs are the ones that
are receiving the TCP connections that are ending up in TIME_WAIT to help narrow
down the problem. Is there a way to do this via DTrace?

Thanks,
Justin


Justin C. Lloyd
Senior Engineer and System Administrator
303-684-4166 Office
720-480-0380 Cell
303-684-4100 Fax
jlloyd at digitalglobe.com
DigitalGlobe, An Imaging and Information Company


James Carlson

2007-Apr-11 19:03 UTC

Justin Lloyd writes:
> We have a system on which about 17 OAS JVM processes are running and
> to which a number of OAS Apache processes connect (used to be local
> but since moved to another system for testing the impact). However,
> for reasons we have not yet been able to determine, the system will
> have thousands (we've seen 10K+) of TCP connections in the TIME_WAIT
> state, having ramped up to that in seconds. (There's a lot more
> details to this problem, but this covers the gist of it.) So I am
> trying to determine which JVM PIDs are the ones that are receiving
> the TCP connections that are ending up in TIME_WAIT to help narrow
> down the problem. Is there a way to do this via DTrace?

Back up a bit: why is TIME_WAIT a problem?

It just means that your application issued close() before receiving
TCP FIN.  It typically happens on the "client" side, but not always.

-- 
James Carlson, Solaris Networking          <james.d.carlson at sun.com>
Sun Microsystems / 1 Network Drive         71.232W   Vox +1 781 442 2084
MS UBUR02-212 / Burlington MA 01803-2757   42.496N   Fax +1 781 442 1677

Stefan Parvu

2007-Apr-11 19:47 UTC

Hi Justin,

> We have a system on which about 17 OAS JVM processes are running and to

You have a lot of JVMs. Are you on Solaris 10 or Express?

> problem, but this covers the gist of it.) So I am trying to determine
> which JVM PIDs are the ones that are receiving the TCP connections that
> are ending up in TIME_WAIT to help narrow down the problem. Is there a
> way to do this via DTrace?

Basically, you can use tcpsnoop from the DTraceToolkit (DTT):
http://www.brendangregg.com/dtrace.html#DTraceToolkit

Read the tcpsnoop examples, and have a look at tcptop as well.


Hope it helps,
Stefan

Justin Lloyd

2007-Apr-11 21:10 UTC

We're on Solaris 10. The JVMs are various OAS containers, and that is
just one of the two OAS instances running on this server, but the other
one only has 4 JVMs, I think.

I've looked at tcpsnoop in the past, so I'll check it out again wrt this
issue. Thanks!

Justin Lloyd

2007-Apr-11 21:11 UTC

TIME_WAIT itself isn't a problem. Having many thousands when we should
normally have a few hundred is the problem. :)

Bart Smaalders

2007-Apr-11 21:37 UTC

Justin Lloyd wrote:
> TIME_WAIT itself isn't a problem. Having many thousands when we should
> normally have a few hundred is the problem. :)

See who calls socket, and who calls close on a socket....

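#!/usr/sbin/dtrace -s

/* Count socket creations per executable. */
syscall::so_socket:entry
{
         @[execname, "socket"] = count();
}

/* Arm the getf() probe while this thread is inside open() or close(). */
syscall::open:entry,
syscall::close:entry
{
         self->trigger = 1;
}

/* getf() returns the file_t for the descriptor; remember its vnode type. */
fbt::getf:return
/self->trigger && arg1/
{
         self->v_type = ((file_t *)arg1)->f_vnode->v_type;
         self->trigger = 0;
}

/* A v_type of 9 is VSOCK, i.e. the descriptor is a socket. */
syscall::open:return
/self->v_type == 9/
{
         @[execname, "open socket"] = count();
}

syscall::close:return
/self->v_type == 9/
{
         @[execname, "close socket"] = count();
}

/* Clear per-thread state so a stale v_type is not counted again. */
syscall::open:return,
syscall::close:return
{
         self->trigger = 0;
         self->v_type = 0;
}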

-- 
Bart Smaalders			Solaris Kernel Performance
barts at cyber.eng.sun.com		http://blogs.sun.com/barts
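
Editorial note: the original question asks for PIDs, and all of the OAS containers presumably share one execname (java), so a per-executable count cannot tell the 17 JVMs apart. The following is a minimal sketch, not taken from the thread, that keys the counts on pid instead. It assumes a Solaris 10 release whose DTrace provides the fds[] array and exposes the accept system call as syscall::accept; both should be confirmed with dtrace -l on the target system. The aggregation name, the 10-second interval and the column layout are arbitrary choices for illustration.

```d
#!/usr/sbin/dtrace -s

#pragma D option quiet

/* Count successfully accepted connections per process. */
syscall::accept:return
/errno == 0/
{
        @events[pid, execname, "accept"] = count();
}

/*
 * Count close() calls on descriptors that belong to sockfs.
 * Assumption: the fds[] array is available in this DTrace release.
 */
syscall::close:entry
/fds[arg0].fi_fs == "sockfs"/
{
        @events[pid, execname, "close socket"] = count();
}

/* Print and reset the counts every 10 seconds (interval is arbitrary). */
profile:::tick-10sec
{
        printf("\n%-8s %-16s %-14s %s\n", "PID", "EXECNAME", "EVENT", "COUNT");
        printa("%-8d %-16s %-14s %@d\n", @events);
        trunc(@events);
}
```

Run it as root with dtrace -s and compare the per-pid rates against the moments when the TIME_WAIT count spikes. A JVM that accepts and closes sockets at a much higher rate than its peers is a candidate for closer inspection.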

Stefan Parvu

2007-Apr-11 22:30 UTC

Justin Lloyd wrote:
> We're on Solaris 10. The JVMs are various OAS containers, and that is
> just one of the two OAS instances running on this server, but the other
> one only has 4 JVMs, I think.

Make sure you are all set at the JVM level before going further with
DTrace:

  - are you using Sun's HotSpot JVM, with -server or -client?
  - Java 1.4.2_xx or Java 5?
  - your heap size
  - frequency of the GC
  - which collector are you using: serial, concurrent, or parallel?
  - are there any external plugins installed in the web layer that
    redirect requests to OAS?
  - check the TCP tuning once more
  - experiment with the dvm agent based on your Java version

> I've looked at tcpsnoop in the past, so I'll check it out again wrt this
> issue. Thanks!

Use Bart's sample, tcpsnoop, or even socketsnoop.d to discover which
pids are creating traffic.


Stefan

Justin Lloyd

2007-Apr-11 22:59 UTC

Our DBA group is on top of the Oracle JVM issue, working with Oracle
support. We (the IS team) are assisting with trying to determine what is
happening "under the hood". The thousands of connections are the cause of
a deeper problem: dropped connections. Many of the things you mention are
being evaluated and tested in our development and test runtimes (e.g.
GC, TCP tuning, etc.)

I'm looking over Bart's sample now. I had started writing something
similar, looking at so_socket calls, but his is a bit more in-depth.

Thanks,
Justin
