Hiya,

I'm trying to track down some throughput latency that our customer seems to be attributing to our product. I can't see what he's talking about, but I want to get deeper granularity than I might get with something like smokeping, and maybe even see if it's down to something tunable on our end.

I've been looking for examples of how to use the DTrace network provider to do this, but haven't been able to come up with anything, and I'm afraid I'm VERY new to DTrace in general. Does anyone have some simple examples of how to look at this that I can build on? Even if they don't apply to my exact situation (which I may not be able to explain), I can alter them once I have something to go on. Once I'm pointed in the right direction, I can take it from there.

-Trish

--
Siobhan P. Lynch          trish at reliantsec.net
Security Engineer
Reliant Security
Mobile: 646-401-1405
Siobhan P. Lynch wrote:
> Hiya,
>
> I'm trying to track down some throughput latency that our customer seems
> to be attributing to our product, I can't see what he's talking about,
[...]

Please excuse if this is not what you want to hear:

The *first* thing you have to do to solve a problem is to understand it. You might be "getting some deeper granularity" only to find out, after having spent some time on that, that your customer *actually* means something slightly different than you do when you speak of "throughput latency".

BTW: what is "throughput latency" supposed to mean? In my experience, throughput and latency are orthogonal concepts: throughput is the average amount of data you get over a connection over time, and latency is the response time to a given request. Have your customer clarify that first.

HTH
Michael
--
Michael Schuster        http://blogs.sun.com/recursion
Recursion, n.: see 'Recursion'
What I'm trying to do is get timestamps on echo packets going out and the timestamp on a reply coming in. Very specifically, I know what the client thinks he's looking at; what I'm trying to prove is that the issue is that he's putting 30 lbs of garbage in a 20 lb bag (squeezing more data down a 144k IDSL line than they should be).

The issue, without getting too much into it, is that they are seeing extremely high ICMP echo reply times through the box. They are blaming it on the box itself, so what I'm trying to do is establish a baseline.

I'm not looking for "the answer" from you, I'm looking for examples of using DTrace to measure the exact times between two packets (one request, one reply, ICMP or TCP ping) - essentially RTT times.

It seems that anyone I ask doesn't know how to use the network provider to do something like this, which is why I ask. I really have no examples to go on in this particular situation, so I can't even "teach myself" like I usually do.

-Trish

> -----Original Message-----
> From: Michael.Schuster at Sun.COM [mailto:Michael.Schuster at Sun.COM]
> Sent: Tuesday, March 04, 2008 10:22 AM
> To: Siobhan P. Lynch
> Cc: dtrace-discuss at opensolaris.org
> Subject: Re: [dtrace-discuss] Network Latency
>
> [...]
G'Day Trish,

On Tue, Mar 04, 2008 at 06:17:40PM -0800, Siobhan P. Lynch wrote:
> What I'm trying to do is get timestamps on echo packets going out and
> the timestamp on a reply coming in,

"ping -s" is supposed to provide RTT down to the microsecond, if that's
the information you need. If you meant reading timestamps closer to the
NIC rather than in ping/userland - then that's a great question! :)

I've done this before (I thought I blogged about it but I can't find
the entry); the surprising result was that for nearby hosts (<< 1 ms)
the times ping gives you have more to do with ping code and CPU thread
scheduling than with network latency. I had cooked up an enhanced
version of ping which involved wrapping /usr/sbin/ping with a DTrace
script to do the actual measurements.

> very specifically, I know what the
> client thinks he's looking at, what I'm trying to prove is the issue is
> that he's putting 30 lbs of garbage in a 20 lb. bag (squeezing more data
> down a 144k IDSL line than they should be)
>
> The issue, without getting too much into it, is that they are seeing
> extremely high icmp echo reply times through the box, they are blaming
> it on the box itself, so what I'm trying to do is establish a baseline.
>
> I'm not looking for "the answer" from you, I'm looking for examples
> using DTrace on how to measure the exact times between two packets (one
> request, one reply, icmp or tcp ping) essentially RTT times.

Measuring such times (RTT, 1st byte, TCP connection, etc.) is something
DTrace should do very well - and provide processed output such as time
distribution plots.

However, the only stable provider that covers networking is "mib", and
it doesn't provide arguments to allow us to match packets, among other
things. There is a plan (which some of us have been working on today)
to create stable network providers for IP, TCP, ICMP, etc. for customer
observability. The following output is a demo from a prototype:

# ./tcp1bl.d
^C
  1st Byte Latency (ns)                        192.168.101.101

           value  ------------- Distribution ------------- count
           32768 |                                         0
           65536 |@@@@@@                                   2
          131072 |@@@@@                                    1
          262144 |@@@@@@@@@@@@@@@                          4
          524288 |@@@@@@@@@@@                              3
         1048576 |@@@@                                     1
         2097152 |                                         0

That script delivers TCP 1st byte latency by hostname. Hot stuff -
and desperately needed. But it isn't available yet - and when it is,
it will be OpenSolaris to start with.

So for now we'll use fbt - the dynamic kernel provider that lets us
see everything. The good news: we can answer those questions. The bad
news: it's hard (you'll learn more Solaris TCP/IP stack code than you
probably want to :), and your scripts may break from one Solaris
release to the next. Which is why we'd prefer stable providers...

DISCLAIMER: THE FOLLOWING FBT SCRIPTS ARE NOT GUARANTEED TO WORK OR BE
ACCURATE. PLEASE CHECK THE DATE ON THIS EMAIL - THE MORE MONTHS THAT
HAVE ELAPSED, THE GREATER THE CHANCE THESE WILL NOT WORK.

I'll paste several more steps than usual, which might be more useful
for people than the fbt scripts themselves. I want to emphasise that
DTracing networking using the fbt provider will require much digging,
experimentation, and code reading; you are unlikely to get an answer
in a hurry.

So we are interested in tracing icmp for accurate timestamps.
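(First, an aside on that ping-wrapper idea: the script below is a
minimal sketch of the approach, not the one I actually used. It times
at the syscall layer, and assumes your ping sends with sendto(2) and
receives with recvfrom(2), with only one echo request in flight at a
time - adjust the probe names if your ping uses sendmsg/recvmsg:

# cat pingwrap.d
#!/usr/sbin/dtrace -s

#pragma D option quiet

/*
 * Time from the sendto(2) of an echo request to the recvfrom(2) of
 * its reply, in the target ping process only. Assumes one
 * outstanding request at a time.
 */
syscall::sendto:entry
/pid == $target/
{
        sent = timestamp;
}

syscall::recvfrom:return
/pid == $target && sent/
{
        @["syscall-level RTT (ns)"] = quantize(timestamp - sent);
        sent = 0;
}

Run it as ./pingwrap.d -c 'ping -s 192.168.1.108 56 7'. That still
measures above the IP stack, but without ping's own scheduling noise.
Now, into the kernel.)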
I'll start by grepping for icmp probes:

# dtrace -l | grep icmp
30097        fbt            ipf                  frpr_icmp6 entry
30098        fbt            ipf                  frpr_icmp6 return
30111        fbt            ipf                   frpr_icmp entry
30112        fbt            ipf                   frpr_icmp return
30199        fbt            ipf   fr_checkicmpmatchingstate entry
30200        fbt            ipf   fr_checkicmpmatchingstate return
30203        fbt            ipf  fr_checkicmp6matchingstate entry
30204        fbt            ipf  fr_checkicmp6matchingstate return
30251        fbt            ipf          nat_icmpquerytype4 entry
...

There were 359 probes listed - maybe too many to read through to start
with. Knowing how programmers name functions may lead to some luck:

# dtrace -l | egrep 'icmp_(send|receive|read|write|tx|rx)'
32796        mib             ip          icmp_send_redirect icmpOutRedirects
55711        fbt             ip          icmp_send_redirect entry
55712        fbt             ip          icmp_send_redirect return
55935        fbt             ip       icmp_send_redirect_v6 entry
55936        fbt             ip       icmp_send_redirect_v6 return

but not in this case.

Now I'll try frequency counting with a known number of events. This is
a quiet workstation where there shouldn't be much other ICMP happening
to confuse the output:

# dtrace -Fn 'fbt::*icmp*:entry { @[probefunc] = count(); }' -c 'ping -s 192.168.1.108 56 7'
dtrace: description 'fbt::*icmp*:entry ' matched 100 probes
PING 192.168.1.108: 56 data bytes
64 bytes from phobos (192.168.1.108): icmp_seq=0.    time=0.248 ms
64 bytes from phobos (192.168.1.108): icmp_seq=1.    time=0.488 ms
64 bytes from phobos (192.168.1.108): icmp_seq=2.    time=0.325 ms
64 bytes from phobos (192.168.1.108): icmp_seq=3.    time=0.618 ms
64 bytes from phobos (192.168.1.108): icmp_seq=4.    time=0.225 ms
64 bytes from phobos (192.168.1.108): icmp_seq=5.    time=0.635 ms
64 bytes from phobos (192.168.1.108): icmp_seq=6.    time=0.678 ms

----192.168.1.108 PING Statistics----
7 packets transmitted, 7 packets received, 0% packet loss
round-trip (ms)  min/avg/max/stddev = 0.225/0.460/0.678/0.192
dtrace: pid 582501 has exited

  icmp_bind                                                         1
  icmp_bind_proto                                                   1
  icmp_build_hdrs                                                   1
  icmp_capability_req                                               1
  icmp_close                                                        1
  icmp_close_free                                                   1
  icmp_copy_info                                                    1
  icmp_open                                                         1
  icmp_openv4                                                       1
  icmp_bind_ack                                                     2
  icmp_bind_result                                                  2
  icmp_opt_set                                                      3
  icmp_opt_set_locked                                               3
  icmp_wput_other                                                   5
  icmp_inbound                                                      7
  icmp_wput                                                        12
  icmp_input                                                       14

I sent 7 pings, and was hoping for 7 to show up in the output. I got
icmp_inbound(), which sounds promising for the receive, and maybe
icmp_wput() for the send - although it fired 12 times instead of 7.

Let's try mib:

# dtrace -Fn 'mib::: { @[probename] = count(); }' -c 'ping -s 192.168.1.108 56 7'
dtrace: description 'mib::: ' matched 568 probes
PING 192.168.1.108: 56 data bytes
64 bytes from phobos (192.168.1.108): icmp_seq=0.    time=0.765 ms
64 bytes from phobos (192.168.1.108): icmp_seq=1.    time=0.420 ms
64 bytes from phobos (192.168.1.108): icmp_seq=2.    time=0.697 ms
64 bytes from phobos (192.168.1.108): icmp_seq=3.    time=0.668 ms
64 bytes from phobos (192.168.1.108): icmp_seq=4.    time=0.185 ms
64 bytes from phobos (192.168.1.108): icmp_seq=5.    time=0.678 ms
64 bytes from phobos (192.168.1.108): icmp_seq=6.    time=0.852 ms

----192.168.1.108 PING Statistics----
7 packets transmitted, 7 packets received, 0% packet loss
round-trip (ms)  min/avg/max/stddev = 0.185/0.609/0.852/0.229
dtrace: pid 582512 has exited

  ipIfStatsForwProhibits                                            1
  ipIfStatsHCInMcastOctets                                          1
  ipIfStatsHCInMcastPkts                                            1
  ipv6IfIcmpInMsgs                                                  1
  ipv6IfIcmpInNeighborAdvertisements                                1
  tcpOutAckDelayed                                                  3
  tcpOutAck                                                         4
  icmpInEchoReps                                                    7
  icmpInMsgs                                                        7
  rawipOutDatagrams                                                 7
  tcpInAckBytes                                                     7
  tcpInAckSegs                                                      7
  tcpOutDataBytes                                                   7
  tcpOutDataSegs                                                    7
  tcpRttUpdate                                                      7
  tcpInDataInorderBytes                                             9
  tcpInDataInorderSegs                                              9
  rawipInDatagrams                                                 14
  ipIfStatsHCInOctets                                              18
  ipIfStatsHCInReceives                                            18
  ipIfStatsHCOutOctets                                             18
  ipIfStatsHCOutRequests                                           18
  ipIfStatsHCOutTransmits                                          18
  ipIfStatsHCInDelivers                                            24

Better? I get icmpInMsgs and icmpInEchoReps for inbound, but still no
clear outbound. There is rawipOutDatagrams with a 7 count, which
sounds like it could be icmp.

Let's see who calls rawipOutDatagrams:

# dtrace -Fn 'mib:::rawipOutDatagrams { @[probefunc, stack()] = count(); }' -c 'ping -s 192.168.1.108 56 7'
dtrace: description 'mib:::rawipOutDatagrams ' matched 2 probes
PING 192.168.1.108: 56 data bytes
64 bytes from phobos (192.168.1.108): icmp_seq=0.    time=1.794 ms
64 bytes from phobos (192.168.1.108): icmp_seq=1.    time=0.462 ms
64 bytes from phobos (192.168.1.108): icmp_seq=2.    time=0.457 ms
64 bytes from phobos (192.168.1.108): icmp_seq=3.    time=0.457 ms
64 bytes from phobos (192.168.1.108): icmp_seq=4.    time=0.468 ms
64 bytes from phobos (192.168.1.108): icmp_seq=5.    time=0.447 ms
64 bytes from phobos (192.168.1.108): icmp_seq=6.    time=0.458 ms

----192.168.1.108 PING Statistics----
7 packets transmitted, 7 packets received, 0% packet loss
round-trip (ms)  min/avg/max/stddev = 0.447/0.649/1.794/0.505
dtrace: pid 582521 has exited

  icmp_wput
              unix`putnext+0x2f1
              genunix`strput+0x1cf
              genunix`kstrputmsg+0x2bf
              sockfs`sosend_dgram+0x2dd
              sockfs`sotpi_sendmsg+0x566
              sockfs`sendit+0x1b8
              sockfs`sendto+0xb8
              sockfs`sendto32+0x2d
              unix`sys_syscall32+0x1fc
                7

The top-level function was icmp_wput() - which sounds like our guy.

So far, we have the following possible icmp receive probes:

    fbt::icmp_inbound:entry
    mib:::icmpInMsgs
    mib:::icmpInEchoReps

And the following possible icmp send probe:

    mib::icmp_wput:rawipOutDatagrams

Now might be a good time to read the source - such as bringing up the
icmp_inbound() and icmp_wput() functions on cvs.opensolaris.org, and
checking how and when they bump the MIBs. Having taken a quick look,
they still look promising.

Now to get some timestamps:

# dtrace -n 'mib::icmp_wput:rawipOutDatagrams,mib:::icmpInEchoReps { trace(timestamp); }' -c 'ping -s 192.168.1.108 56 7'
dtrace: description 'mib::icmp_wput:rawipOutDatagrams,mib:::icmpInEchoReps ' matched 2 probes
PING 192.168.1.108: 56 data bytes
64 bytes from phobos (192.168.1.108): icmp_seq=0.    time=0.765 ms
CPU     ID                    FUNCTION:NAME
  0  33301    icmp_wput:rawipOutDatagrams    7698507169846581
  1  33052    icmp_inbound:icmpInEchoReps    7698507170499718
  1  33301    icmp_wput:rawipOutDatagrams    7698508170106314
  1  33052    icmp_inbound:icmpInEchoReps    7698508170223800
64 bytes from phobos (192.168.1.108): icmp_seq=1.    time=0.366 ms
  0  33301    icmp_wput:rawipOutDatagrams    7698509170070724
  1  33052    icmp_inbound:icmpInEchoReps    7698509170173117
64 bytes from phobos (192.168.1.108): icmp_seq=2.    time=0.306 ms
  0  33301    icmp_wput:rawipOutDatagrams    7698510170071133
  1  33052    icmp_inbound:icmpInEchoReps    7698510170171805
64 bytes from phobos (192.168.1.108): icmp_seq=3.    time=0.232 ms
64 bytes from phobos (192.168.1.108): icmp_seq=4.    time=0.161 ms
  1  33301    icmp_wput:rawipOutDatagrams    7698511170081670
  1  33052    icmp_inbound:icmpInEchoReps    7698511170150712
64 bytes from phobos (192.168.1.108): icmp_seq=5.    time=0.179 ms
  1  33301    icmp_wput:rawipOutDatagrams    7698512170195437
  1  33052    icmp_inbound:icmpInEchoReps    7698512170273791
  1  33301    icmp_wput:rawipOutDatagrams    7698513170087531
64 bytes from phobos (192.168.1.108): icmp_seq=6.    time=0.184 ms

----192.168.1.108 PING Statistics----
7 packets transmitted, 7 packets received, 0% packet loss
round-trip (ms)  min/avg/max/stddev = 0.161/0.313/0.765/0.213
dtrace: pid 582528 has exited
  1  33052    icmp_inbound:icmpInEchoReps    7698513170175988

Beware of the CPU column shuffling - the events may not be printed in
the correct order (post-sort those timestamps). So this is working,
but the output is rather rough.

So, cutting corners based on some assumptions - in particular that
your server is ICMP quiet and packet associations are not needed:

# cat ping.d
#!/usr/sbin/dtrace -s

#pragma D option quiet

mib::icmp_wput:rawipOutDatagrams
{
        /*
         * Assume your server is ICMP quiet, and no packet association
         * is necessary.
         */
        start = timestamp;
}

mib:::icmpInEchoReps
/start/
{
        @["ICMP round-trip (ns)"] = quantize(timestamp - start);
        start = 0;
}

# ./ping.d -c 'ping -s 192.168.1.108 56 7'
PING 192.168.1.108: 56 data bytes
64 bytes from phobos (192.168.1.108): icmp_seq=0.    time=0.547 ms
64 bytes from phobos (192.168.1.108): icmp_seq=1.    time=0.186 ms
64 bytes from phobos (192.168.1.108): icmp_seq=2.    time=0.470 ms
64 bytes from phobos (192.168.1.108): icmp_seq=3.    time=0.461 ms
64 bytes from phobos (192.168.1.108): icmp_seq=4.    time=0.450 ms
64 bytes from phobos (192.168.1.108): icmp_seq=5.    time=0.450 ms
64 bytes from phobos (192.168.1.108): icmp_seq=6.    time=0.456 ms

----192.168.1.108 PING Statistics----
7 packets transmitted, 7 packets received, 0% packet loss
round-trip (ms)  min/avg/max/stddev = 0.186/0.431/0.547/0.113

  ICMP round-trip (ns)
           value  ------------- Distribution ------------- count
           32768 |                                         0
           65536 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ 7
          131072 |                                         0

Oh, that quantize is rather compressed... Let's make it an lquantize():

# cat ping.d
#!/usr/sbin/dtrace -s

#pragma D option quiet

mib::icmp_wput:rawipOutDatagrams
{
        /*
         * Assume your server is ICMP quiet, and no packet association
         * is necessary.
         */
        start = timestamp;
}

mib:::icmpInEchoReps
/start/
{
        @["ICMP round-trip (ns)"] = lquantize(timestamp - start, 0,
            1000000, 1000);
        start = 0;
}

# ./ping.d -c 'ping -s 192.168.1.108 56 7'
PING 192.168.1.108: 56 data bytes
64 bytes from phobos (192.168.1.108): icmp_seq=0.    time=0.526 ms
64 bytes from phobos (192.168.1.108): icmp_seq=1.    time=0.189 ms
64 bytes from phobos (192.168.1.108): icmp_seq=2.    time=0.457 ms
64 bytes from phobos (192.168.1.108): icmp_seq=3.    time=0.460 ms
64 bytes from phobos (192.168.1.108): icmp_seq=4.    time=0.444 ms
64 bytes from phobos (192.168.1.108): icmp_seq=5.    time=0.474 ms
64 bytes from phobos (192.168.1.108): icmp_seq=6.    time=0.463 ms

----192.168.1.108 PING Statistics----
7 packets transmitted, 7 packets received, 0% packet loss
round-trip (ms)  min/avg/max/stddev = 0.189/0.430/0.526/0.110

  ICMP round-trip (ns)
           value  ------------- Distribution ------------- count
           89000 |                                         0
           90000 |@@@@@@                                   1
           91000 |@@@@@@@@@@@                              2
           92000 |                                         0
           93000 |@@@@@@                                   1
           94000 |                                         0
           95000 |                                         0
           96000 |                                         0
           97000 |                                         0
           98000 |@@@@@@                                   1
           99000 |                                         0
          100000 |@@@@@@                                   1
          101000 |@@@@@@                                   1
          102000 |                                         0

Looking good. Note that the measured times are between 0.090 and
0.102 ms, unlike the summary from ping - which has an average of
0.430 ms!
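If you'd also like numbers directly comparable to ping's own
min/avg/max summary line, min(), avg() and max() aggregations can ride
along in the same clause. Here's an untested variant of ping.d's
second clause (the @dist/@mn/@av/@mx aggregation names are arbitrary;
the same quiet-box caveats apply):

mib:::icmpInEchoReps
/start/
{
        this->rtt = timestamp - start;
        @dist["ICMP round-trip (ns)"] = lquantize(this->rtt, 0,
            1000000, 1000);
        @mn["min (ns)"] = min(this->rtt);
        @av["avg (ns)"] = avg(this->rtt);
        @mx["max (ns)"] = max(this->rtt);
        start = 0;
}

On exit that prints the distribution plus one-line summaries, measured
in the kernel rather than in ping's userland loop.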
That ping.d script is a bit junky; the following exercises may
illustrate why...

Exercises for the reader:

1) Read the source behind those probes - are they measuring what
   we assumed them to measure?
2) Associate packets properly by digging out IP and ICMP fields
   (difficult, and would currently require fbt probes, not mib probes).
3) Can fbt probes be picked that are closer in the code path to the
   interface, to improve the accuracy of the time measurements?
   (also difficult).
4) mib::icmp_wput:rawipOutDatagrams is unstable, since it specifies
   the probefunc field. Is there a better mib probe, or combination of
   mib probes, that can be used to make this script stable? (this would
   exclude tasks (2) and (3)). One starting point is sketched in the
   PS below.

...

This is by no means an authoritative answer on the topic, and covers
unstable interfaces that are unlikely to work in the future. But it is
an exciting topic for DTrace, and once we have stable network providers
we'll be able to really get started writing some cool and useful
scripts...

Brendan

--
Brendan
[CA, USA]
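PS: for exercise (4), one possible starting point - untested, and
still assuming an ICMP-quiet box - is to drop the probefunc field and
key off the stable mib probe names alone, as they appeared in the
frequency counts above:

# cat ping_stable.d
#!/usr/sbin/dtrace -s

#pragma D option quiet

mib:::rawipOutDatagrams
{
        /*
         * This fires for any raw IP datagram sent, not just our ICMP
         * echo requests - another reason the box needs to be quiet.
         */
        start = timestamp;
}

mib:::icmpInEchoReps
/start/
{
        @["ICMP round-trip (ns)"] = lquantize(timestamp - start, 0,
            1000000, 1000);
        start = 0;
}

Whether that is stable enough depends on how much other rawip traffic
the box sees.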
Siobhan P. Lynch wrote:
> What I'm trying to do is get timestamps on echo packets going out and
> the timestamp on a reply coming in, very specifically, I know what the
> client thinks he's looking at, what I'm trying to prove is the issue is
> that he's putting 30 lbs of garbage in a 20 lb. bag (squeezing more data
> down a 144k IDSL line than they should be)

hmm ... that sounds a lot more specific than "I can't see what he's
talking about". I'm optimistic that with the detail you gave, more help
will be forthcoming.

Please tell us what build of OpenSolaris you're using.

Have you tried snoop? It's not DTrace, but it's very good for observing
network traffic, and *does* include timestamps (or so I recall).

Maybe networking-discuss at opensolaris.org is also a good place to ask
for suggestions about analysing this.

HTH
Michael

> [...]

--
Michael Schuster        Sun Microsystems, Inc.
recursion, n: see 'recursion'
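PS: on the snoop timestamps: snoop(1M)'s -t option selects the
timestamp mode, and -t d prints the delta from the previous packet,
which puts the request-to-reply gap right in the output. Something
like the following (the interface name here is just an example):

# snoop -t d -d e1000g0 icmp

The delta printed on each echo-reply line is then the round trip as
seen on the wire by the snooping host.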
This is perfect, Brendan - *exactly* what I needed to get me started
with this. What I'm trying to prove is that the machine is not causing
the issue, that it is the network itself, and this puts me well on the
way to solving that.

I understand this may not work down the line, but I also understand
from your talk that the unstable providers are just that - unstable -
and I may have to alter this as the kernel code changes.

-Trish

> -----Original Message-----
> From: Brendan Gregg - Sun Microsystems [mailto:brendan at sun.com]
> Sent: Tuesday, March 04, 2008 11:22 PM
> To: Siobhan P. Lynch
> Cc: Michael.Schuster at sun.com; dtrace-discuss at opensolaris.org
> Subject: Re: [dtrace-discuss] Network Latency
>
> [...]