thr3ads.net - netflow tools - [netflow-tools] flowd benchmark [Jul 2005]

Gijs Molenaar

2005-Jul-13 08:12 UTC

[netflow-tools] flowd benchmark

Hello people,

I'm doing some research into which flow analysis tool is best for us at
the moment. We have routers generating around 1,000,000 flows every
5 minutes, and this is already sampled at a rate of 1:100, so speed is
very important for us. The two tools I like most are flowd and
flow-tools. Flowd supports v9 (and with that IPv6), so I prefer flowd.

The first thing I looked at was the load of the capture
daemon. There isn't a big difference between the two. I use quite a slow
computer (Pentium III 450, 1 GB RAM), and both daemons use about 10%
CPU time. When the PC is very busy, flow-tools (flow-capture) starts to
drop packets and logs this. My question is: what happens with flowd
when the CPU load is too high to keep up with a heavy stream of flows? The
fact that flows are dropped isn't important for us, but knowing how many
are dropped would be interesting.

The next thing I did was flow analysis. I tried both Python libraries
for this job. I captured 5 minutes with each daemon; both flowd and
flow-tools write all the information they have to the file. The results
were stunning. These are the results (scripts are attached):

$ python flowtools.py
finished in 20 seconds
flowcount: 931711
45769 flows/s

$ python flowd.py
finished in 256 seconds
flowcount: 944281
3688 flows/s

The flowd python library is about 12x slower! I was really not happy
when I saw this output.
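A quick sanity check on these figures, assuming the roughly 1,000,000 flows per 5 minutes mentioned earlier is the sustained load. The sketch only does the arithmetic (the names are illustrative), comparing the arrival rate with the two measured reader rates.

```python
# Rough throughput arithmetic for the figures quoted in this message.

FLOWS_PER_INTERVAL = 1_000_000  # "around 1,000,000 flows every 5 minutes"
INTERVAL_SECONDS = 5 * 60

measured = {
    "flowtools.py": 45769,  # flows/s reported by the script
    "flowd.py": 3688,
}

# Flows per second a reader must sustain just to keep up with the routers.
need = FLOWS_PER_INTERVAL / INTERVAL_SECONDS
print(f"required: {need:.0f} flows/s")
for name, rate in measured.items():
    # headroom > 1 means the reader is faster than the arrival rate
    print(f"{name}: {rate} flows/s, headroom x{rate / need:.1f}")
print(f"flowtools/flowd ratio: {measured['flowtools.py'] / measured['flowd.py']:.1f}")
```

At about 3,333 flows/s arriving, flowd.py's 3,688 flows/s leaves only about 1.1x headroom for a bare read pass, before any per-flow computation is added; flowtools.py has roughly 13.7x.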

The thing is, I can't use flowd now. I need to do a _lot_ more
computation than just calculating in and out AS traffic. Running the
flowtools Python program on a machine that is fast by today's standards
speeds it up by about a factor of 5, but even then flowd would still be
much too slow.

Maybe it has something to do with the fact that with flow-tools I do a
readlines() to load the whole file into memory, whereas with flowd it
'walks' through the file, which can be much slower. But I'm not sure. The
flowtools Python library is also written entirely in C.

I'd like to use flowd, so I wanted to try changing the flowtools Python
source to be able to read the flowd binary format. I'm not really a good
C programmer, but I can give it a try :).

Greetings,

--
Gijs Molenaar
gijs at looze.net
http://gijs.looze.net

-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: flowtools.py
Url:
http://lists.mindrot.org/pipermail/netflow-tools/attachments/20050713/164659e0/attachment.ksh
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: flowd.py
Url:
http://lists.mindrot.org/pipermail/netflow-tools/attachments/20050713/164659e0/attachment-0001.ksh

Gijs Molenaar

2005-Jul-13 08:16 UTC

[netflow-tools] flowd benchmark

Whoops,

last line of flowd.py should be:
print "%i flows/s" % (flowcount/(time.time()-starttime))
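The attached scripts were scrubbed from the archive, so here is a minimal sketch of a harness in the same shape, written for Python 3 (print() instead of the print statement) and timed with time.perf_counter(). It assumes only that the reader yields one object per flow; the generator in the demo is a hypothetical placeholder, not the API of either library.

```python
import time
from typing import Iterable, Tuple


def benchmark(flows: Iterable) -> Tuple[int, float]:
    """Consume a flow iterator once; return (flow count, elapsed seconds)."""
    start = time.perf_counter()  # monotonic, higher resolution than time.time()
    flowcount = 0
    for _flow in flows:
        # Real per-flow work (e.g. summing octets per AS) would go here.
        flowcount += 1
    return flowcount, time.perf_counter() - start


def report(flowcount: int, elapsed: float) -> None:
    """Print the same three lines as the scripts in this thread."""
    print("finished in %i seconds" % elapsed)
    print("flowcount: %i" % flowcount)
    print("%i flows/s" % (flowcount / max(elapsed, 1e-9)))  # avoid dividing by 0


if __name__ == "__main__":
    # Hypothetical stand-in source; replace with the flowd or flow-tools reader.
    fake_flows = ({"octets": i} for i in range(100_000))
    report(*benchmark(fake_flows))
```

Keeping the loop body empty apart from the counter measures the reader's own cost; any analysis added later shows up as a drop in flows/s.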

Gijs Molenaar

2005-Jul-13 10:27 UTC

[netflow-tools] flowd benchmark

Gijs Molenaar wrote:
> Maybe it has to do something with the fact that with flow-tools I do a
> readlines() to load the whole file in memory. With flowd it 'walks'
> through the file, which can be much slower. But I'm not sure. flowtools
> python library is also completely written in C.
Again I have to correct myself. It doesn't do a readlines(); it loops
through a FlowSet object that behaves like a list in Python. I still think
I/O is the bottleneck.
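One way to check whether I/O really is the bottleneck is to time a raw read of the log file on its own and compare it with the time a full parse takes. A minimal sketch; the chunk size is arbitrary and `dummy_parse` is a hypothetical stand-in for whichever reader is being tested.

```python
import os
import tempfile
import time
from typing import Callable

CHUNK = 1 << 20  # read 1 MiB at a time


def time_raw_read(path: str) -> float:
    """Seconds needed just to pull the file's bytes in, with no parsing."""
    start = time.perf_counter()
    with open(path, "rb") as f:
        while f.read(CHUNK):
            pass
    return time.perf_counter() - start


def time_parse(path: str, parse_all: Callable[[str], int]) -> float:
    """Seconds needed for a full parse by the reader under test."""
    start = time.perf_counter()
    parse_all(path)
    return time.perf_counter() - start


def dummy_parse(path: str) -> int:
    """Hypothetical stand-in reader: counts 64-byte records."""
    count = 0
    with open(path, "rb") as f:
        while f.read(64):
            count += 1
    return count


if __name__ == "__main__":
    # Demo on a throwaway 8 MiB file; substitute a real flow log and reader.
    with tempfile.NamedTemporaryFile(delete=False) as tmp:
        tmp.write(os.urandom(8 * CHUNK))
        path = tmp.name
    try:
        time_raw_read(path)  # warm the page cache first
        io_s = time_raw_read(path)
        parse_s = time_parse(path, dummy_parse)
        print(f"raw read {io_s:.3f}s, parse {parse_s:.3f}s, "
              f"I/O share {io_s / max(parse_s, 1e-9):.0%}")
    finally:
        os.remove(path)
```

If the raw read is only a small fraction of the parse time on a warm cache, the time is going into decoding and object creation rather than disk I/O.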

Damien Miller

2005-Jul-13 10:52 UTC

[netflow-tools] flowd benchmark

Gijs Molenaar wrote:
> Hello people,
>
> I'm doing some research for what is the best flow analyse tool for us at
> the moment. We have routers generating around the 1.000.000 flows every
> 5 minutes, and this is already sampled with a rate of 100. So speed is
> very important for us. The 2 tools I like the most are flowd and
> flow-tools. Flowd supports v9 (and with that ipv6), so I prefer flowd.
(I assume that you are using flowd-0.8.5)
> The first thing that I was looking at was the load of the capture
> daemon. There isn't a big difference between the 2. I use a quite slow
> computer (pentium III 450, 1 GB ram), and both daemons use about 10%
> CPU time. When the PC is very busy, flow-tools (flow-capture) starts to
> drop packets and logs this. My question is, what will happen with flowd
> when the CPU load is too high to process a high flow of flows? The fact
> that flows are dropped isn't important for us, but how many can be
> interesting.
flowd doesn't detect if packets are dropped by the kernel before they
reach the daemon. It should check the NetFlow v5+ sequence numbers, and
this is already on the todo list.
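The sequence-number check can be sketched in a few lines. Every NetFlow v5 packet header carries flow_sequence, the running count of flows the exporter has sent, so the gap between what one packet predicts (its sequence plus its record count) and what the next packet reports is the number of flows that never arrived. The class below is a hypothetical illustration of that idea, not flowd code.

```python
import struct
from typing import Dict, Tuple

# NetFlow v5 packet header (24 bytes, network byte order): version, count,
# sys_uptime, unix_secs, unix_nsecs, flow_sequence, engine_type, engine_id,
# sampling_interval.
V5_HEADER = struct.Struct("!HHIIIIBBH")


class V5LossCounter:
    """Estimate flows lost in transit from NetFlow v5 flow_sequence numbers.

    Sketch only: state is kept per (exporter, engine_type, engine_id), and a
    backward jump is treated as a restart, so reordered packets may be
    counted as loss.
    """

    def __init__(self) -> None:
        self.expected: Dict[Tuple[str, int, int], int] = {}
        self.lost = 0

    def packet(self, exporter: str, data: bytes) -> None:
        (version, count, _uptime, _secs, _nsecs,
         seq, engine_type, engine_id, _sampling) = V5_HEADER.unpack_from(data)
        if version != 5:
            return  # other versions are ignored in this sketch
        key = (exporter, engine_type, engine_id)
        if key in self.expected:
            gap = (seq - self.expected[key]) & 0xFFFFFFFF  # 32-bit wraparound
            if gap < 0x80000000:  # forward jump: flows we never received
                self.lost += gap
        # flow_sequence counts flows, so the next packet should start here.
        self.expected[key] = (seq + count) & 0xFFFFFFFF


if __name__ == "__main__":
    counter = V5LossCounter()
    counter.packet("192.0.2.1", V5_HEADER.pack(5, 30, 0, 0, 0, 1000, 0, 0, 0))
    # The packet carrying flows 1030..1059 never arrived:
    counter.packet("192.0.2.1", V5_HEADER.pack(5, 30, 0, 0, 0, 1060, 0, 0, 0))
    print("lost flows:", counter.lost)  # lost flows: 30
```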
> The next thing I did was flow analysis. I tried both python libraries
> for this job. I captured 5 minutes with each daemon. Flowd will write
> all info it has to the file, flow-tools does this also. The results
> were stunning. These are the results (scripts are attached):
> 
> $ python flowtools.py
> finished in 20 seconds
> flowcount: 931711
> 45769 flows/s
> 
> $ python flowd.py
> finished in 256 seconds
> flowcount: 944281
> 3688 flows/s
Does turning off storing the CRC32 in flowd.conf speed this up?

flowd is always going to have to do a little more work, because the set
of fields that it stores is variable. That being said, it should be
possible to speed up the reader function by moving more of it from the
pure-Python part of the module to the C implementation.
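To see why a variable field set costs more per record, here is a toy comparison in pure Python. The record layout (a 32-bit presence mask followed by the present fields in a fixed order) is a hypothetical format chosen for illustration, not flowd's actual on-disk format; the point is only that the variable path has to test a mask and walk a table for every record, while the fixed path is a single precompiled unpack.

```python
import struct
import timeit
from typing import Dict, List, Tuple

# Hypothetical field table for illustration only (not flowd's real format):
# (presence bit, field name, precompiled struct for that field).
FIELDS: List[Tuple[int, str, struct.Struct]] = [
    (1 << 0, "src_addr", struct.Struct("!4s")),
    (1 << 1, "dst_addr", struct.Struct("!4s")),
    (1 << 2, "ports", struct.Struct("!HH")),
    (1 << 3, "packets", struct.Struct("!Q")),
    (1 << 4, "octets", struct.Struct("!Q")),
    (1 << 5, "as_info", struct.Struct("!II")),
]
MASK = struct.Struct("!I")
FIXED = struct.Struct("!4s4sHHQQII")  # same fields, always all present


def decode_fixed(buf: bytes) -> tuple:
    """Fixed layout: a single unpack per record."""
    return FIXED.unpack(buf)


def decode_variable(buf: bytes) -> Dict[str, tuple]:
    """Variable layout: read the presence mask, then walk the field table."""
    (mask,) = MASK.unpack_from(buf, 0)
    offset = MASK.size
    out: Dict[str, tuple] = {}
    for bit, name, fmt in FIELDS:
        if mask & bit:
            out[name] = fmt.unpack_from(buf, offset)
            offset += fmt.size
    return out


if __name__ == "__main__":
    payload = FIXED.pack(b"\x0a\x00\x00\x01", b"\x0a\x00\x00\x02",
                         80, 1234, 10, 1500, 64512, 65000)
    record = MASK.pack(0x3F) + payload  # all six fields present
    n = 100_000
    print("fixed    %.3fs" % timeit.timeit(lambda: decode_fixed(payload), number=n))
    print("variable %.3fs" % timeit.timeit(lambda: decode_variable(record), number=n))
```

The variable decoder also builds more Python objects per record (a dict plus one tuple per field), and in pure Python that object creation is a large part of the per-record cost.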

If I get time, I'll look at it on the weekend.

-d

Gijs Molenaar

2005-Jul-13 11:05 UTC

head link

[netflow-tools] flowd benchmark

Damien Miller wrote:
> (I assume that you are using flowd-0.8.5)
yes
> flowd doesn't detect if packets are dropped by the kernel before they
> reach the daemon. It should check the netflow v5+ sequence numbers, and
> this is already on the todo list.
ah ok, good :)
> Does turning off storing the CRC32 in flowd.conf speed this up?
I just did another test with flowd logging only AS info and octets,
with the following results:

finished in 145 seconds
flowcount: 961501
6631 flows/s

Nearly twice as fast, but still much slower than flow-tools.
> If I get time, I'll look at it on the weekend.
Great. I really prefer flowd, but I need speed. If I can be of any help, 
let me know.


thanks for the fast reply!

Damien Miller

2005-Jul-13 12:53 UTC

[netflow-tools] flowd benchmark

Damien Miller wrote:
> flowd is always going to have to do a little more work, because the set
> of fields that it stores is variable. That being said, it should be
> possible to speed up the reader function by moving more of it from the pure
> python part of the module to the C implementation.
OK, I moved all of the flow reading into the C part of the Python module
and it didn't help much.

So the problem is a little deeper. I probably need to break out gprof to
analyse it properly, but I think the problem is that the C part of the
python module always converts all of the flow fields to python objects
when the flow is loaded. This is a waste of time if not all of those
fields are subsequently used.

It is probably better to make the deserialiser return a first-class
object with tp_dict or tp_members hooked to do the C struct -> python
object conversion either on demand or lazily.
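The on-demand conversion idea can be tried in pure Python before touching the C module: keep the raw record bytes and decode a field only the first time it is accessed. This is a sketch of the technique, not the flowd module's API; the class name, field names and byte offsets are hypothetical and assume a fixed 36-byte record. In the C module the equivalent would be attribute hooks on the new type, as described above.

```python
import ipaddress
import struct
from functools import cached_property  # Python 3.8+

# Hypothetical 36-byte record used only for this example:
# src addr, dst addr, src port, dst port, packets, octets, src AS, dst AS.
LAYOUT = struct.Struct("!4s4sHHQQII")


class LazyFlow:
    """Keeps the raw record; each field is decoded on first access, then cached."""

    def __init__(self, raw: bytes) -> None:
        self._raw = raw  # nothing is converted to Python objects yet

    @cached_property
    def src_addr(self) -> ipaddress.IPv4Address:
        return ipaddress.IPv4Address(self._raw[0:4])

    @cached_property
    def octets(self) -> int:
        return struct.unpack_from("!Q", self._raw, 20)[0]

    @cached_property
    def src_as(self) -> int:
        return struct.unpack_from("!I", self._raw, 28)[0]

    @cached_property
    def dst_as(self) -> int:
        return struct.unpack_from("!I", self._raw, 32)[0]


if __name__ == "__main__":
    raw = LAYOUT.pack(bytes([10, 0, 0, 1]), bytes([10, 0, 0, 2]),
                      80, 1234, 10, 1500, 64512, 65000)
    flow = LazyFlow(raw)
    # An AS-traffic report touches only these fields; the rest stay as bytes.
    print(flow.src_as, flow.dst_as, flow.octets)  # 64512 65000 1500
```

An AS-traffic script that never reads addresses or ports then never pays for converting them.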

Unfortunately, it is quite a bit more work, but it does fall into the
Python API renovation that is already in the TODO. I'll try to have a
look at it on the weekend but it will likely take a while longer. If
there are any Python hackers on the list, now would be a good time to
delurk and help out :)

In the meantime, you can get a direct speed increase by only storing the
fields that you are interested in.

-d
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: flowd-python-slightly-faster.diff
Url:
http://lists.mindrot.org/pipermail/netflow-tools/attachments/20050713/53797700/attachment.ksh

Gijs Molenaar

2005-Jul-13 13:39 UTC

[netflow-tools] flowd benchmark

Damien Miller wrote:
> Damien Miller wrote:
>
>> flowd is always going to have to do a little more work, because the set
>> of fields that it stores is variable. That being said, it should be
>> possible to speed up the reader function by moving more of it from the pure
>> python part of the module to the C implementation.
>
>
> OK, I moved all of the flow reading into the C part of the Python module
> and it didn't help much.
That's fast! :)

I did a little more research. Because I thought flowd's Python API
was slow, I wanted to write a flowd-reader parser in Python. I tried three
flow capture programs and their readers.

The test is the most basic operation: just read the flow log file
and print the fields.

$ time flowd-reader ./flowd-log | wc -l
944282
real    1m55.109s
user    1m23.378s
sys     0m42.296s

$ time flow-export -f2 < ./flowtools-log | wc -l
flow-export: Exported 931711 records
931712
real    0m24.225s
user    0m23.384s
sys     0m2.037s

Much faster, but without variable field length.

$ time ./ipflow grep ./netflow.log | wc -l
1280538
real    1m45.336s
user    1m44.407s
sys     0m3.188s

This is a new one I tried, which supports v9. It isn't much faster than
flowd, so it really is the variable field length that makes it slow.

All tests were done with v5 Cisco flows, on a two-processor system.
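Turning the `time` output into throughput makes the comparison easier to read. The sketch below just divides the `wc -l` line counts by the real (wall-clock) times printed above; the counts include any header lines the tools emit, and the ipflow run read a different capture file, so treat the numbers as rough.

```python
# Lines per second for the three `time ... | wc -l` runs above, using the
# wc -l counts and the "real" (wall-clock) times exactly as printed.
runs = {
    "flowd-reader": (944282, 60 + 55.109),
    "flow-export": (931712, 24.225),
    "ipflow": (1280538, 60 + 45.336),
}

rates = {name: lines / real for name, (lines, real) in runs.items()}
for name, rate in rates.items():
    print(f"{name:12s} {rate:8.0f} lines/s")
```

That works out to roughly 8,200 lines/s for flowd-reader, 38,500 for flow-export and 12,200 for ipflow. flowd-reader also spends about 42 s in sys time against 2 to 3 s for the other two, which may be worth a look with a profiler alongside the variable-field decoding.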

Jason Dixon

2005-Jul-13 14:14 UTC

[netflow-tools] flowd benchmark

On Jul 13, 2005, at 9:39 AM, Gijs Molenaar wrote:
> All tests were done with v5 cisco flows, and on a 2 processor system.
I hear SGI Altix are on sale these days.  ;-)


--
Jason Dixon
DixonGroup Consulting
http://www.dixongroup.net

Gijs Molenaar

2005-Jul-13 14:58 UTC

[netflow-tools] flowd benchmark

Jason Dixon wrote:
> I hear SGI Altix are on sale these days.  ;-)
I had one, but it doesn't run Half-Life 2...
