(Forgive the previous incomplete message)

I spent some time a couple of weeks ago working on an intermittent problem we've been seeing for at least 8 months now. It's a complex problem, so I'm just trying to determine what I can see with regard to network connection states. Basically, we have an image processing server (IP) that sends a request via a web cache server (WC) to an OAS HTTP server (WS), which in turn sends its response back to the IP via the WC. On very rare occasions, the IP doesn't get a response from the WS, leading to a job getting into a funky state and the IP ending up with its connections in a FIN_WAIT_2 state since they're not receiving the ACKs. What we don't know is whether the WS is sending them and, if so, whether they're getting lost somewhere between the WS HTTP server processes and the IP.

So, I was trying to use DTrace to get down into the kernel to see if I could even see something as simple as TCP packets being sent from the WS and noting what kind of packets they are, e.g. ACK, FIN/ACK, etc., and then once I could do that, I could dig deeper and look at the actual problem. But I'm stumped as to how to do that, especially without a DTrace network provider.

I was trying tcpsnoop in the 0.99 DTrace toolkit, but I get the following:

    $ sudo ./tcpsnoop -p 14877
    dtrace: failed to compile script /dev/fd/11: "/usr/include/sys/kstat.h", line 439: invalid type combination
    $

This is on S10 6/06.

Any thoughts on this error and my problem in general?

Thanks,
Justin
I may have figured out a bit:

    [jlloyd at dubnium-a:~] cat net.d
    #!/usr/sbin/dtrace -s

    fbt:ip:tcp_send:entry
    {
            t = (tcp_t *) arg1;

            @[execname, t->tcp_tcph->th_flags] = count();
    }
    [jlloyd at dubnium-a:~] sudo ./net.d
    dtrace: script './net.d' matched 1 probe
    ^C

      awk              24    1
      httpd            24    1
      java             16    1
      sshd             24    2
      opmn             16    3
      zsched           24    3
      sdtperfmeter     24    4
      java             24    6
      cvfsiod          24   12
      opmn             24   13
      sched            16   14
      sched            24   36
    [jlloyd at dubnium-a:~]

From netinet/tcp.h:

    #define TH_FIN  0x01
    #define TH_SYN  0x02
    #define TH_RST  0x04
    #define TH_PUSH 0x08
    #define TH_ACK  0x10
    #define TH_URG  0x20
    #define TH_ECE  0x40
    #define TH_CWR  0x80

So if I'm interpreting this correctly, and my dtrace code is giving me what I think, the packets being sent above are URG|RST (0x20|0x04) and ACK|RST|SYN (0x10|0x04|0x02). Is that right?

Thanks,
Justin
Brendan Gregg - Sun Microsystems
2007-Oct-25 23:29 UTC
[dtrace-discuss] Tracing network packets
On Thu, Oct 25, 2007 at 04:48:27PM -0600, Justin Lloyd wrote:
> I was trying tcpsnoop in the 0.99 DTrace toolkit, but I get the
> following:
>
> $ sudo ./tcpsnoop -p 14877
> dtrace: failed to compile script /dev/fd/11: "/usr/include/sys/kstat.h",
> line 439: invalid type combination
> $
>
> This is on S10 6/06.

There are two versions of tcpsnoop in DTT ver 0.99; try "tcpsnoop_snv" if the first doesn't work. However, this probably won't help in this case, since this looks like bug 6315039. This was supposed to be fixed in patch 119578; however, the patch itself was broken, and that was logged as bug 6468001. The patch is supposed to be fixed now (I think). Please let us know if it does or doesn't work.
Brendan -- Brendan [CA, USA]
tcpsnoop_snv does exhibit the same problem. We do have 119578-29 on the affected systems, and I see that the patch is up to -30, though the README for -30 doesn't list bug 6468001 as being fixed. It may take me some time to get a system on which I can test -30, but as soon as I can I'll let you know.

Thanks,
Justin
Justin Lloyd wrote:
> So if I'm interpreting this correctly, and my dtrace code is giving me
> what I think, the packets being sent above are URG|RST (0x20|0x04) and
> ACK|RST|SYN (0x10|0x04|0x02). Is that right?

No, afraid not. The numbers printed by the dtrace script are in decimal, while the flags are in hex. Decimal 24 = 0x18, which is 0x10 | 0x08, meaning flags TH_ACK and TH_PUSH. The 16s are really TH_ACK (0x10) without TH_PUSH.

This reminds me of a joke. Why is Christmas like Halloween? Because Dec 25 = Oct 31.

--
blu

Brian Utterback - Solaris RPE, Sun Microsystems, Inc.
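
For anyone following along, the decimal-to-flags decoding described above can be sketched in a few lines of Python. This is only an illustration built from the TH_* values quoted from netinet/tcp.h earlier in the thread; the helper name decode_flags is made up for this example and is not part of DTrace or the DTraceToolkit.

```python
# Flag bits as quoted from netinet/tcp.h earlier in the thread.
TCP_FLAGS = [
    (0x01, "TH_FIN"),
    (0x02, "TH_SYN"),
    (0x04, "TH_RST"),
    (0x08, "TH_PUSH"),
    (0x10, "TH_ACK"),
    (0x20, "TH_URG"),
    (0x40, "TH_ECE"),
    (0x80, "TH_CWR"),
]


def decode_flags(value):
    """Hypothetical helper: list the names of the bits set in a th_flags byte."""
    return [name for bit, name in TCP_FLAGS if value & bit]


if __name__ == "__main__":
    # 16 and 24 are the decimal values that appeared in the net.d output.
    for value in (16, 24):
        names = "|".join(decode_flags(value))
        print(f"{value} = 0x{value:02x} -> {names}")
```

Reading the same digits as hex instead (0x24 and 0x16) is what produces the URG|RST and ACK|RST|SYN interpretation, which is why the original reading looked so odd.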
Oops, color me embarrassed - I know better than that. :) Thanks for the correction, which actually makes more sense because the URGs and RSTs were confusing!

Justin

-----Original Message-----
From: Brian Utterback [mailto:brian.utterback at sun.com]
Sent: Friday, October 26, 2007 8:54 AM
To: Justin Lloyd
Cc: dtrace-discuss at opensolaris.org
Subject: Re: [dtrace-discuss] Tracing network packets

> No, afraid not. The numbers printed from the dtrace script are in
> decimal while the flags are in hex.