Bogdan Ghita
2005-Apr-29 11:11 UTC
[netflow-tools] Softflowd and rrd running on the same machine
Hello everybody

I've installed softflowd and the rrd-associated tools (flow-capture/flowscan) a few days ago on a monitoring machine connected to a local network. First, I'll start with the praise - it's great software; I've been looking at netflow-related software for a long time, but couldn't find anything that would allow me to implement it via a monitoring machine. Everything seems to be working reasonably well at the moment, but I still have a couple of problems that I can't find the solution for:

- out-of-order packets. In order to get softflowd and rrd to work, I'm sending packets via the local interface of the machine. I thought this would work just fine, but flow-capture continually reports 'lost' packets:

Apr 29 11:27:18 linux flow-capture[23080]: ftpdu_seq_check(): src_ip=xxx.xxx.xxx.xxx dst_ip=xxx.xxx.xxx.xxx d_version=5 expecting=28199800 received=28199770 lost=4294967265
Apr 29 11:27:18 linux flow-capture[23080]: ftpdu_seq_check(): src_ip=xxx.xxx.xxx.xxx dst_ip=xxx.xxx.xxx.xxx d_version=5 expecting=28199800 received=28199770 lost=4294967265
Apr 29 11:27:18 linux flow-capture[23080]: ftpdu_seq_check(): src_ip=xxx.xxx.xxx.xxx dst_ip=xxx.xxx.xxx.xxx d_version=5 expecting=28199790 received=28199934 lost=144
Apr 29 11:27:18 linux flow-capture[23080]: ftpdu_seq_check(): src_ip=xxx.xxx.xxx.xxx dst_ip=xxx.xxx.xxx.xxx d_version=5 expecting=28199935 received=28199936 lost=1
Apr 29 11:27:19 linux flow-capture[23080]: ftpdu_seq_check(): src_ip=xxx.xxx.xxx.xxx dst_ip=xxx.xxx.xxx.xxx d_version=5 expecting=28199969 received=28199940 lost=4294967266
Apr 29 11:27:19 linux last message repeated 2 times
Apr 29 11:27:19 linux flow-capture[23080]: ftpdu_seq_check(): src_ip=xxx.xxx.xxx.xxx dst_ip=xxx.xxx.xxx.xxx d_version=5 expecting=28199944 received=28200056 lost=112
Apr 29 11:27:20 linux flow-capture[23080]: ftpdu_seq_check(): src_ip=xxx.xxx.xxx.xxx dst_ip=xxx.xxx.xxx.xxx d_version=5 expecting=28200093 received=28200064 lost=4294967266
Apr 29 11:27:20 linux last message repeated 2 times
Apr 29 11:27:20 linux flow-capture[23080]: ftpdu_seq_check(): src_ip=xxx.xxx.xxx.xxx dst_ip=xxx.xxx.xxx.xxx d_version=5 expecting=28200068 received=28200174 lost=106

Is it possible (I've done very little in terms of socket programming, so the answer might be straightforward) to pipe softflowd straight into flow-capture, so that the communication will be (hopefully) smoother?

- CPU usage - softflowd seems to behave very strangely when changing the priority. With the default priority (0), it continuously uses about 80% of the processor (yes, it is a big network and, again, yes, it is a slow machine - a P3-900MHz); when I renice it (via 'top') to a lower priority (e.g. 10), the utilisation drops to about 20%, although the processor is, overall, only about 50% used. Is the collector dropping packets/flows during that time? Is the machine that slow?

- Reports - I am not sure whether this is a softflowd problem or an rrd-related problem. I've noticed a continuous udp flow running over the network, quite considerable in terms of bandwidth. However, when drawing the graphs, there is only an hourly spike, and nothing else. What could be causing this type of reporting?

Thank you very much for your time.

Regards
Bogdan
Damien Miller
2005-Apr-29 23:11 UTC
[netflow-tools] Softflowd and rrd running on the same machine
Bogdan Ghita wrote:
> Hello everybody
>
> I've installed a few days ago softflowd and rrd-associated tools
> (flow-capture/flowscan) on a monitoring machine connected to a local
> network. First, I'll start with the praise - it's great software; I've
> been looking for a long time at netflow-related software, but couldn't
> find anything that would allow me to implement it via a monitoring
> machine. Everything seems to be working reasonably well at the moment,
> but I still have a couple of problems that I can't find the solution
> for:
>
> - out-of-order packets. In order to get softflowd and rrd to work, I'm
> sending packets via the local interface of the machine. I thought this
> would work just fine but flow-capture reports continually 'lost'
> packets:
>
> Apr 29 11:27:18 linux flow-capture[23080]: ftpdu_seq_check():
> src_ip=xxx.xxx.xxx.xxx dst_ip=xxx.xxx.xxx.xxx d_version=5
> expecting=28199800 received=28199770 lost=4294967265
> Apr 29 11:27:18 linux flow-capture[23080]: ftpdu_seq_check():
> src_ip=xxx.xxx.xxx.xxx dst_ip=xxx.xxx.xxx.xxx d_version=5
> expecting=28199800 received=28199770 lost=4294967265

This might be a bug in the sequence number generation of softflowd, or a bug in flow-capture - flow-capture is certainly printing negative sequence number offsets incorrectly.

Can you capture some softflowd output packets with another netflow-capable tool to manually check the sequence numbers? E.g. tcpdump's cnfp mode, flowd, etc.

> Is it possible (I've done very little in terms of socket programming, so
> the answer might be straightforward) to pipe softflowd straight into
> flow-capture, so that the communication will be (hopefully) smoother?

It is highly unlikely that the packets are actually getting dropped. It is more likely a bug in one/both packages.

> - CPU usage - softflowd seems to behave very strangely when changing the
> priority - with the default priority (0), it uses continuously about 80%
> of the processor (yes, it is a big network and, again, yes, it is a slow
> machine - P3-900MHz); when I renice it (via 'top') to lower priority
> (e.g. 10), the utilisation drops to about 20%, although the processor
> is, overall, only about 50% used. Is the collector dropping packets/flows
> during that time? Is the machine that slow?

It is difficult to tell whether or not you are dropping packets - it depends a lot on what happens on the kernel side too. I'm not sure what information on packet drops libpcap makes available, but the attached patch adds printing of the statistics that it does provide to those printable using "softflowctl statistics".

I'm not sure what a "dropped packet" in libpcap's parlance is - it may be a drop because the client couldn't keep up, or a drop from a bpf filter program. It should be obvious once you start playing with it though :)

> - Reports - I am not sure whether this is a softflowd problem or an
> rrd-related problem. I've noticed a continuous udp flow running over the
> network, quite considerable in terms of bandwidth. However, when drawing
> the graphs, there is only an hourly spike, and nothing else. What could
> be causing this type of reporting?

That depends on how the data is being presented - if you are showing flow records vs time, then a long UDP conversation may be represented by only a single flow record. Retrospectively producing throughput vs time representations of long-lived flow conversations is a little tricky and requires some suppositions, as the flow record summarises away any dynamics in the conversation.

-d
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: softflowd-pcap-stats.diff
Url: http://lists.mindrot.org/pipermail/netflow-tools/attachments/20050430/05f7807f/attachment.ksh
Bogdan Ghita
2005-May-02 15:02 UTC
[netflow-tools] Softflowd and rrd running on the same machine
Hello Damien

Thank you very much for your response. Although I've still got a few unclear issues, at least now I've cleared some of them and I've got plenty to try for the next few days :). See a few more comments below.

Best regards
Bogdan

> -----Original Message-----
> From: Damien Miller [mailto:djm at mindrot.org]
> Sent: 30 April 2005 00:12
> To: Bogdan Ghita
> Cc: netflow-tools at mindrot.org
> Subject: Re: [netflow-tools] Softflowd and rrd running on the same machine
>
> Bogdan Ghita wrote:
> > Hello everybody
> >
> > I've installed a few days ago softflowd and rrd-associated tools
> > (flow-capture/flowscan) on a monitoring machine connected to a local
> > network. First, I'll start with the praise - it's great software; I've
> > been looking for a long time at netflow-related software, but couldn't
> > find anything that would allow me to implement it via a monitoring
> > machine. Everything seems to be working reasonably well at the moment,
> > but I still have a couple of problems that I can't find the solution
> > for:
> >
> > - out-of-order packets. In order to get softflowd and rrd to work, I'm
> > sending packets via the local interface of the machine. I thought this
> > would work just fine but flow-capture reports continually 'lost'
> > packets:
> >
> > Apr 29 11:27:18 linux flow-capture[23080]: ftpdu_seq_check():
> > src_ip=xxx.xxx.xxx.xxx dst_ip=xxx.xxx.xxx.xxx d_version=5
> > expecting=28199800 received=28199770 lost=4294967265
> > Apr 29 11:27:18 linux flow-capture[23080]: ftpdu_seq_check():
> > src_ip=xxx.xxx.xxx.xxx dst_ip=xxx.xxx.xxx.xxx d_version=5
> > expecting=28199800 received=28199770 lost=4294967265
>
> This might be a bug in the sequence number generation of softflowd, or a
> bug in flow-capture - flow-capture is certainly printing negative
> sequence number offsets incorrectly.
>
> Can you capture some softflowd output packets with another netflow
> capable tool to manually check the sequence numbers? E.g.
> tcpdump's cnfp mode, flowd, etc.

Thank you for the tcpdump tip - although I use it quite often, I wasn't aware that it can decode NetFlow packets. I've looked at it and got the following trace:

15:00:51.234336 IP (tos 0x0, ttl 64, id 37603, offset 0, flags [DF], length: 1444) tester.32782 > tester.2000: NetFlow v5, 261557.484 uptime, 1115042451.234285000, #92222382, 29 recs
15:00:51.235204 IP (tos 0x0, ttl 64, id 37604, offset 0, flags [DF], length: 1444) tester.32782 > tester.2000: NetFlow v5, 261557.484 uptime, 1115042451.234285000, #92222382, 29 recs
[lines deleted...]
15:00:51.277266 IP (tos 0x0, ttl 64, id 37656, offset 0, flags [DF], length: 1492) tester.32782 > tester.2000: NetFlow v5, 261557.484 uptime, 1115042451.234285000, #92222382, 30 recs
15:00:51.277933 IP (tos 0x0, ttl 64, id 37657, offset 0, flags [DF], length: 1492) tester.32782 > tester.2000: NetFlow v5, 261557.484 uptime, 1115042451.234285000, #92222382, 30 recs
15:01:01.242400 IP (tos 0x0, ttl 64, id 37718, offset 0, flags [DF], length: 1444) tester.32782 > tester.2000: NetFlow v5, 261567.493 uptime, 1115042461.242348000, #92227192, 29 recs
15:01:01.243251 IP (tos 0x0, ttl 64, id 37719, offset 0, flags [DF], length: 1444) tester.32782 > tester.2000: NetFlow v5, 261567.493 uptime, 1115042461.242348000, #92227192, 29 recs
[lines deleted...]
15:01:01.322057 IP (tos 0x0, ttl 64, id 37835, offset 0, flags [DF], length: 1492) tester.32782 > tester.2000: NetFlow v5, 261567.493 uptime, 1115042461.242348000, #92227192, 30 recs
15:01:01.322110 IP (tos 0x0, ttl 64, id 37836, offset 0, flags [DF], length: 1444) tester.32782 > tester.2000: NetFlow v5, 261567.493 uptime, 1115042461.242348000, #92227192, 29 recs
15:01:01.322157 IP (tos 0x0, ttl 64, id 37837, offset 0, flags [DF], length: 772) tester.32782 > tester.2000: NetFlow v5, 261567.493 uptime, 1115042461.242348000, #92227192, 15 recs
15:01:11.230365 IP (tos 0x0, ttl 64, id 37838, offset 0, flags [DF], length: 1444) tester.32782 > tester.2000: NetFlow v5, 261577.480 uptime, 1115042471.230315000, #92231968, 29 recs
15:01:11.231270 IP (tos 0x0, ttl 64, id 37839, offset 0, flags [DF], length: 1444) tester.32782 > tester.2000: NetFlow v5, 261577.480 uptime, 1115042471.230315000, #92231968, 29 recs
15:01:11.232049 IP (tos 0x0, ttl 64, id 37840, offset 0, flags [DF], length: 1444) tester.32782 > tester.2000: NetFlow v5, 261577.480 uptime, 1115042471.230315000, #92231968, 29 recs
[lines deleted...]
15:01:11.276308 IP (tos 0x0, ttl 64, id 37952, offset 0, flags [DF], length: 1444) tester.32782 > tester.2000: NetFlow v5, 261577.480 uptime, 1115042471.230315000, #92231968, 29 recs
15:01:11.276335 IP (tos 0x0, ttl 64, id 37953, offset 0, flags [DF], length: 1492) tester.32782 > tester.2000: NetFlow v5, 261577.480 uptime, 1115042471.230315000, #92231968, 30 recs
15:01:11.276353 IP (tos 0x0, ttl 64, id 37954, offset 0, flags [DF], length: 340) tester.32782 > tester.2000: NetFlow v5, 261577.480 uptime, 1115042471.230315000, #92231968, 6 recs
15:01:21.225781 IP (tos 0x0, ttl 64, id 37955, offset 0, flags [DF], length: 1444) tester.32782 > tester.2000: NetFlow v5, 261587.476 uptime, 1115042481.225734000, #92236610, 29 recs
15:01:21.226594 IP (tos 0x0, ttl 64, id 37956, offset 0, flags [DF], length: 1444) tester.32782 > tester.2000: NetFlow v5, 261587.476 uptime, 1115042481.225734000, #92236610, 29 recs
15:01:21.227292 IP (tos 0x0, ttl 64, id 37957, offset 0, flags [DF], length: 1444) tester.32782 > tester.2000: NetFlow v5, 261587.476 uptime, 1115042481.225734000, #92236610, 29 recs

I've looked at print-cnfp.c within tcpdump and the # values are the sequence numbers within NetFlow. I am not sure why they remain the same between datagrams (are they supposed to? the specification says that they should be increased with the corresponding number of flows every time a new datagram arrives), but one thing is sure - there is no gap in the id numbers, so no datagrams are lost in the process.
Related to this, I've changed the expiry interval to 10 seconds, hoping to get more granularity (and less loss, due to reduced bursts) - has anybody obtained better/worse results while varying the expiry interval?

> > Is it possible (I've done very little in terms of socket programming, so
> > the answer might be straightforward) to pipe softflowd straight into
> > flow-capture, so that the communication will be (hopefully) smoother?
>
> It is highly unlikely that the packets are actually getting dropped. It
> is more likely a bug in one/both packages.

Judging by the trace above (no gaps in the id numbers), I am more tempted to believe there is (at least) some sort of misinterpretation, if not a bug.

> > - CPU usage - softflowd seems to behave very strangely when changing the
> > priority - with the default priority (0), it uses continuously about 80%
> > of the processor (yes, it is a big network and, again, yes, it is a slow
> > machine - P3-900MHz); when I renice it (via 'top') to lower priority
> > (e.g. 10), the utilisation drops to about 20%, although the processor
> > is, overall, only about 50% used. Is the collector dropping packets/flows
> > during that time? Is the machine that slow?
>
> It is difficult to tell whether or not you are dropping packets - it
> depends a lot on what happens on the kernel side too. I'm not sure what
> information on packet drops libpcap makes available, but the attached
> patch adds printing of the statistics that it does provide to those
> printable using "softflowctl statistics".
>
> I'm not sure what a "dropped packet" in libpcap's parlance is - it may
> be a drop because the client couldn't keep up, or a drop from a bpf
> filter program. It should be obvious once you start playing with it
> though :)

Thank you very much for the patch. I will apply it later today and come back with the details.

> > - Reports - I am not sure whether this is a softflowd problem or an
> > rrd-related problem.
> > I've noticed a continuous udp flow running over the
> > network, quite considerable in terms of bandwidth. However, when drawing
> > the graphs, there is only an hourly spike, and nothing else. What could
> > be causing this type of reporting?
>
> That depends on how the data is being presented - if you are showing
> flow records vs time, then a long UDP conversation may be represented by
> only a single flow record. Retrospectively producing throughput vs time
> representations of long-lived flow conversations is a little tricky and
> requires some suppositions, as the flow record summarises away any
> dynamics in the conversation.

I got this one, understood.

> -d
Damien Miller
2005-May-03 00:03 UTC
[netflow-tools] Softflowd and rrd running on the same machine
Bogdan Ghita wrote:
> 15:01:21.226594 IP (tos 0x0, ttl 64, id 37956, offset 0, flags [DF],
> length: 1444) tester.32782 > tester.2000: NetFlow v5, 261587.476 uptime,
> 1115042481.225734000, #92236610, 29 recs
> 15:01:21.227292 IP (tos 0x0, ttl 64, id 37957, offset 0, flags [DF],
> length: 1444) tester.32782 > tester.2000: NetFlow v5, 261587.476 uptime,
> 1115042481.225734000, #92236610, 29 recs
>
> I've looked at print-cnfp.c within tcpdump and the # values are the
> sequence numbers within NetFlow. I am not sure why they remain the same
> between datagrams (are they supposed to? the specification says that
> they should be increased with the corresponding number of flows every
> time a new datagram arrives), but one thing is sure - there is no gap in
> the id numbers, so no datagrams are lost in the process. Related to
> this, I've changed the expiry interval to 10 seconds, hoping to get more
> granularity (and less loss, due to reduced bursts) - has anybody
> obtained better/worse results while varying the expiry interval?

This is a bug in softflowd. The fix is attached :)

Please test and let me know whether it helps.

-d
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: softflowd-seqno.diff
Url: http://lists.mindrot.org/pipermail/netflow-tools/attachments/20050503/450c924f/attachment.ksh
Hello again

I'm aware that the subject is rather off-topic for the list, but it seems that either nobody has had problems with it before or it's my machine that really needs upgrading.

I've installed the statistics patch from Damien and now I get the full picture - "pcap packets dropped" are about 10% of "pcap packets received". An example below:

---
Number of active flows: 550792
Packets processed: 468154791
Fragments: 1823
Ignored packets: 40871 (40871 non-IP, 0 too short)
Flows expired: 7131866 (0 forced)
Flows exported: 10501294 in 358453 packets (0 failures)
pcap packets received: 3767680
pcap packets dropped: 373100
pcap packets dropped by interface: 0
---

(the percentage does vary considerably; the number of lost packets is not precisely 10%)

which, according to my understanding :), means that my network card driver is dropping packets... I've got an onboard Intel Ethernet, so I've looked on the eepro100 mailing list, but it looks rather quiet. So, has anybody had any experience tweaking the buffers(?) on the eepro100 network card driver?

Best regards
Bogdan

P.S. The other softflowd patch works like a charm - no more daily archives of /var/log/messages due to flow-capture errors.
Bogdan Ghita wrote:
> Hello again
>
> I'm aware that the subject is rather off-topic for the list, but it
> seems that either nobody has had problems before with it or it's my
> machine that really needs upgrading.
>
> I've installed the statistics patch from Damien and now I get the full
> picture - "pcap packets dropped" are about 10% of "pcap packets
> received".

Are you using a BPF filter? E.g. "tcp and port http"

-d
Not that I'm aware of :)

softflowd -i eth0 -m 3000000 -v 5 -n tester:2000

/usr/local/netflow/bin/flow-capture -w /var/netflow/ft tester/tester/2000 - 5 -V5 -E1G -n 287 -N 0 -R/usr/local/netflow/bin/linkme

(is there anything else that is setting any filters?)

Regards
Bogdan

> -----Original Message-----
> From: Damien Miller [mailto:djm at mindrot.org]
> Sent: 03 May 2005 23:11
> To: Bogdan Ghita
> Cc: netflow-tools at mindrot.org
> Subject: Re: Pcap dropping packets
>
> Bogdan Ghita wrote:
> > Hello again
> >
> > I'm aware that the subject is rather off-topic for the list, but it
> > seems that either nobody has had problems before with it or it's my
> > machine that really needs upgrading.
> >
> > I've installed the statistics patch from Damien and now I get the full
> > picture - "pcap packets dropped" are about 10% of "pcap packets
> > received".
>
> Are you using a BPF filter? E.g. "tcp and port http"
>
> -d
Bogdan Ghita wrote:
> Not that I'm aware of :)
>
> softflowd -i eth0 -m 3000000 -v 5 -n tester:2000
>
> /usr/local/netflow/bin/flow-capture -w /var/netflow/ft
> tester/tester/2000 - 5 -V5 -E1G -n 287 -N 0
> -R/usr/local/netflow/bin/linkme
>
> (is there anything else that is setting any filters?)

I have had a chance to test this a bit now. Either your libpcap reports different things than mine in the statistics (possible), or you are indeed overrunning your box's ability to process packets.

-d