Hi,

I'm trying to locate a function that I can use to tie an incoming TCP
packet to the thread that finally accepts the connection.

I noted that Brendan's tcpsnoop uses tcp_accept_finish(), but that is
not always correct, given that tcp_accept_finish() is called by
squeue_enter(), which may fire under a different thread.

Is there an easy solution, or do I really have to work through the
squeue stuff for this (I'm sure it'll be rewarding though)?

--
Just me,
Wire ...
Brendan Gregg
2005-Nov-23 11:55 UTC
[dtrace-discuss] Context of tcp_connect_finish & squeue
G'Day Folks,

On Wed, 23 Nov 2005, Wee Yeh Tan wrote:
> Hi,
>
> I'm trying to locate a function that I can use to tie an incoming tcp
> packet to the thread that finally accepts the connection.
>
> I noted that Brendan's tcpsnoop uses tcp_accept_finish() but that is
> not always correct given that tcp_accept_finish() is called by
> squeue_enter() which may fire under a different thread.

If any of my scripts are problematic, please post samples that demonstrate
it, including instructions on how I can recreate it, or just email me.

As an example, the following tests tcp_accept_finish() on a single-CPU
system under varying load, with a client system connecting 10000 times
using finger (inetd),

   # dtrace -n 'fbt::tcp_accept_finish:entry { @[execname] = count(); }'
   dtrace: description 'fbt::tcp_accept_finish:entry ' matched 1 probe
   ^C

     inetd                                                         10000

That's 100% correct. Now a 4-CPU server (which I rarely have access to).
This time 1000 connections (10000 took too long!),

   # dtrace -n 'fbt::tcp_accept_finish:entry { @[execname] = count(); }'
   dtrace: description 'fbt::tcp_accept_finish:entry ' matched 1 probe
   ^C

     inetd                                                          1000

Great. Now the same test but with a different server load (now mostly idle
rather than busy),

   # dtrace -n 'fbt::tcp_accept_finish:entry { @[execname] = count(); }'
   dtrace: description 'fbt::tcp_accept_finish:entry ' matched 1 probe
   ^C

     sched                                                             2
     inetd                                                           998

Bugger. I'm not at all happy with 99.8% correct for some scenarios. Looks
like I'll need to update all my tcp tools.

Brendan
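To see where the stray `sched` entries come from, the same one-liner can be keyed on the kernel stack as well as the process name. This is a minimal sketch, not part of Brendan's tools; the aggregation name `@calls` and the stack depth of 6 are arbitrary choices made here for illustration.

```d
#!/usr/sbin/dtrace -s

/*
 * Count tcp_accept_finish() calls by process name and by the kernel
 * stack that led to the call. The depth of 6 frames is an arbitrary
 * choice for this sketch.
 */
fbt::tcp_accept_finish:entry
{
	@calls[execname, stack(6)] = count();
}
```

If the calls attributed to `sched` arrive through squeue frames while the `inetd` calls arrive through the socket code, that would be consistent with the squeue being drained by a thread other than the acceptor.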
Hi Brendan,

On 11/23/05, Brendan Gregg <brendan.gregg at tpg.com.au> wrote:
> If any of my scripts are problematic, please post samples that demonstrate
> it, including instructions on how I can recreate it, or just email me.

Sure :). It didn't cross my mind that your script was problematic, though.
I spent half a day trying to weave through tcp.c (and many others) and was
looking at tcpsnoop for ideas on which function to look at before I gave up
and posted to the board.

> # dtrace -n 'fbt::tcp_accept_finish:entry { @[execname] = count(); }'
> dtrace: description 'fbt::tcp_accept_finish:entry ' matched 1 probe
> ^C
>   sched                                                             2
>   inetd                                                           998

Technically, the situation should arise whenever a thread drains the squeue
and processes a request that it does not own.

> Bugger. I'm not at all happy with 99.8% correct for some scenarios. Looks
> like I'll need to update all my tcp tools.

Please let me know if you see any solution. I've been trying to do this on
and off, and would love to get something going.

--
Just me,
Wire ...
Kais Belgaied
2005-Nov-28 19:43 UTC
[dtrace-discuss] Context of tcp_connect_finish & squeue
Wee Yeh Tan wrote On 11/23/05 00:29,:
> Hi,
>
> I'm trying to locate a function that I can use to tie an incoming tcp
> packet to the thread that finally accepts the connection.
>
> I noted that Brendan's tcpsnoop uses tcp_accept_finish() but that is
> not always correct given that tcp_accept_finish() is called by
> squeue_enter() which may fire under a different thread.

You gotta catch it at the socket level, where the context of execution
is the acceptor thread's. It happens upstream from tcp_accept_finish().

The acceptor thread calls the socket function sotpi_accept(), which
in turn calls sowaitconnind() and blocks until a connection indication
arrives.
sotpi_accept() then produces a TPI M_PROTO message that carries the
T_conn_res (the socket's response to the incoming TCP connection). It sends
that stream message downstream to ip (the TCP/IP module). There is a
putnext() call in the sending, thus the squeue_enter().
TCP handles that message via tcp_accept_finish().

To use a variant of Brendan's one-liner, you may try catching the returns
from sowaitconnind():

   # dtrace -n 'fbt:sockfs:sowaitconnind:return { trace(execname); }'

Kais.

> Is there an easy solution or do I really have to work through the
> squeue stuff for this (I'm sure it'll be rewarding though).
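The sequence Kais describes can be checked with a thread-local flag. This is a minimal sketch under stated assumptions: the flag name `self->in_accept` and the aggregation name `@ctx` are invented here, and the script only distinguishes threads that are currently inside sotpi_accept() from all others. It does not prove which thread owns the connection.

```d
#!/usr/sbin/dtrace -s

/* Mark the acceptor thread while it is inside sotpi_accept(). */
fbt:sockfs:sotpi_accept:entry
{
	self->in_accept = 1;
}

/*
 * Classify each tcp_accept_finish() call by whether the current thread
 * has the flag set (it is inside sotpi_accept()) or not (for example,
 * a different thread draining the squeue).
 */
fbt:ip:tcp_accept_finish:entry
{
	@ctx[self->in_accept ? "inside sotpi_accept" : "outside sotpi_accept"] = count();
}

/* Clear the flag so the thread-local variable is freed. */
fbt:sockfs:sotpi_accept:return
{
	self->in_accept = 0;
}
```

A non-zero "outside sotpi_accept" count would match the cases where execname is unreliable at tcp_accept_finish().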
Kais,

Thanks... I'm gonna need to work this a little more. I wasn't able to
extract anything off the sonode*. Will look at this over the weekend.

--
Just me,
Wire ...

On 11/29/05, Kais Belgaied <Kais.Belgaied at sun.com> wrote:
> you gotta catch it at the socket level, where the context of execution
> is the acceptor thread's. It happens upstreams from tcp_accept_finish().
>
> The acceptor thread calls the socket function sotpi_accept() which
> in turn calls sowaitconnind() and blocks until a connection indication
> arrives. sotpi_accept() then produces a TPI M_PROTO message that carries the
> T_conn_res (socket's response to the incoming TCP connection). It sends
> that stream message downstreams to ip (TCP/IP module). There is a
> putnext() call in the sending, thus the squeue_enter().
> TCP handles that message via tcp_accept_finish().
>
> To use a variant of Brendan's 1-liner, you may try catching the returns
> from sowaitconnind() :
>
> # dtrace -n 'fbt:sockfs:sowaitconnind:return { trace(execname); }'
>
> Kais.
Brendan Gregg
2005-Dec-03 14:42 UTC
[dtrace-discuss] Context of tcp_connect_finish & squeue
G'Day Folks,

On Mon, 28 Nov 2005, Kais Belgaied wrote:
> Wee Yeh Tan wrote On 11/23/05 00:29,:
> > Hi,
> >
> > I'm trying to locate a function that I can use to tie an incoming tcp
> > packet to the thread that finally accepts the connection.

So to recap the problem:

* We have tcp events, and we'd like to know thread/execname/pid details.
* The tcp events can be keyed by an (int)(conn_t *).
* If we can build hashes of the form,

     tname[(conn_t *)] = execname
     tpid[(conn_t *)] = pid
     ...

  then we can easily look up thread details for tcp events.
  (I deliberately didn't use a hash of structs, btw.)

I tried,

   fbt:ip:tcp_accept_finish:entry
   {
           self->connp = (conn_t *)arg0;
           tname[(int)self->connp] = execname;
           tpid[(int)self->connp] = pid;
           tuid[(int)self->connp] = uid;
   }

which we now know doesn't always work on multi-CPU servers. And being
slightly wrong isn't good enough (unless clearly marked as an estimation
tool).

A number of people in the past have suggested I look at socket events,
for which there are a few problems I won't go into yet. The main one was
that there was no obvious path from the socket events (struct sonode *) to
anything from the tcp events, either (conn_t *) or (tcp_t *) (which are
both linked). I never figured this one out (maybe I should mention that I
began writing these tools without access to the kernel code (the safe
public access we enjoy today!)).

> > I noted that Brendan's tcpsnoop uses tcp_accept_finish() but that is
> > not always correct given that tcp_accept_finish() is called by
> > squeue_enter() which may fire under a different thread.
>
> you gotta catch it at the socket level, where the context of execution
> is the acceptor thread's. It happens upstreams from tcp_accept_finish().
>
> The acceptor thread calls the socket function sotpi_accept() which
> in turn calls sowaitconnind() and blocks until a connection indication
> arrives.
> sotpi_accept() then produces a TPI M_PROTO message that carries the
> T_conn_res (socket's response to the incoming TCP connection). It sends
> that stream message downstreams to ip (TCP/IP module). There is a
> putnext() call in the sending, thus the squeue_enter().
> TCP handles that message via tcp_accept_finish().
>
> To use a variant of Brendan's 1-liner, you may try catching the returns
> from sowaitconnind() :
>
> # dtrace -n 'fbt:sockfs:sowaitconnind:return { trace(execname); }'

Ahh, thank you Kais - this gives me a sensible reference point where I
can trust execname/etc, and you have explained the process well. (At one
point I tried tracing putnext()s, but I've steered away from them as they
can occur rapidly.)

sowaitconnind looks suitable in testing (100% correct so far); however, at
that point the new socket and its details aren't populated. If I wait until
sotpi_accept:return, then I can fetch the details we need to find the
(conn_t *). In testing, sotpi_accept:return is also 100% correct; I just
need to think carefully about what could happen between
sowaitconnind:return and sotpi_accept:return to cause the acceptor thread
to be incorrect again (hopefully nothing).

Testing went like this,

--------socktest.d--------
#!/usr/sbin/dtrace -s

#pragma D option quiet

fbt::tcp_accept_finish:entry
{
        @tcp[execname] = count();
}

fbt:sockfs:sowaitconnind:return
{
        @sowait[execname] = count();
}

fbt:sockfs:sotpi_accept:return
{
        @soaccept[execname] = count();
}

dtrace:::END
{
        printf("tcp:\n");
        printa(@tcp);
        printf("sowait:\n");
        printa(@sowait);
        printf("soaccept:\n");
        printa(@soaccept);
}
--------socktest.d--------

Then after 10000 connections on a loaded multi-CPU server,

   # ./socktest.d
   ^C
   tcp:

     sched                                                             6
     inetd                                                          9994
   sowait:

     inetd                                                         10000
   soaccept:

     inetd                                                         10000

Cool. tcp_accept_finish is wrong (as we now expect), and both
sowaitconnind:return and sotpi_accept:return are 100% correct.
So, back to what I WAS using,

   fbt:ip:tcp_accept_finish:entry
   {
           self->connp = (conn_t *)arg0;
           tname[(int)self->connp] = execname;
           tpid[(int)self->connp] = pid;
           tuid[(int)self->connp] = uid;
   }

And now the NEW code (that I'm still testing),

----------------
fbt:sockfs:sotpi_accept:entry
{
        self->sop = args[0];
}

fbt:sockfs:sotpi_create:return
/self->sop/
{
        self->nsop = (struct sonode *)arg1;
}

fbt:sockfs:sotpi_accept:return
/self->nsop/
{
        this->tcpp = (tcp_t *)self->nsop->so_priv;
        this->connp = (conn_t *)this->tcpp->tcp_connp;
        tname[(int)this->connp] = execname;
        tpid[(int)this->connp] = pid;
        tuid[(int)this->connp] = uid;
}

fbt:sockfs:sotpi_accept:return
{
        self->nsop = 0;
        self->sop = 0;
}
----------------

... the key is tracking the new sonodes, knowing that so_priv will contain
the (tcp_t *), and knowing when to read it (sotpi_accept:return).

I'll add some code so that it skips localhost connections, which I'm
usually not interested in (maybe my tools should have a switch for that).

cheers,

Brendan
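One way to test the new code is to check that the key derived from the sonode at sotpi_accept:return is the same key the TCP-level probe sees. This is a rough sketch under two assumptions that the thread does not confirm: that tcp_accept_finish() fires for a connection before sotpi_accept() returns for it, and that arg0 of tcp_accept_finish() is the same (conn_t *) reached through so_priv. The names `seen` and `@check` are invented here.

```d
#!/usr/sbin/dtrace -s

/* Remember every key the old TCP-level code would have used. */
fbt:ip:tcp_accept_finish:entry
{
	seen[(int)arg0] = 1;
}

fbt:sockfs:sotpi_accept:entry
{
	self->sop = args[0];
}

fbt:sockfs:sotpi_create:return
/self->sop/
{
	self->nsop = (struct sonode *)arg1;
}

/*
 * Derive the key the same way as the new code above, then count whether
 * the TCP-level probe recorded that key. Clearing the entry afterwards
 * frees the dynamic variable.
 */
fbt:sockfs:sotpi_accept:return
/self->nsop/
{
	this->tcpp = (tcp_t *)self->nsop->so_priv;
	this->connp = (conn_t *)this->tcpp->tcp_connp;
	@check[seen[(int)this->connp] ? "key matches" : "key not seen"] = count();
	seen[(int)this->connp] = 0;
}

fbt:sockfs:sotpi_accept:return
{
	self->nsop = 0;
	self->sop = 0;
}
```

A run that reports only "key matches" would suggest that hashes filled in at sotpi_accept:return can be looked up by the tcp events. Any "key not seen" count would mean one of the two assumptions does not hold, or that the (int) cast is truncating distinct pointers to the same value.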