I *think* we are finally making some progress in tracking our elusive performance problems. After employing a second 10Mb link from our ISP, along with another firewall box and proxy, we were able to determine the problem *is* our firewall. We don't know exactly why yet, but our sporadic slow web access seems to have gone away since swapping a new firewall in this morning.

The original firewall is a PPro200 with 256Mb, an Intel E100 and a DLink DFE500 (Tulip) card. Kernel 2.4.22-37 and 2.4.22-28, but both were compiled to remove most of the unnecessary junk, but care was taken to include netfilter stuff needed for firewalling.

The current firewall that works is a P3/667, 768Mb, same NICs as above, same shorewall config as above, but a stock 2.4.22-37 Mandrake "secure" kernel.

The *most* interesting thing is that this *identical* machine (the current firewall) has *not* undergone any changes aside from installing another 512Mb of RAM. Kernel is the same, and shorewall config is essentially the same.

In searching for an answer, I came across this link which suggests that a dedicated firewall should have the ip_conntrack hashsize = ip_conntrack_max:

http://www.wallfire.org/misc/netfilter_conntrack_perf.txt

I know this isn't strictly a shorewall issue, but I mention it here in case it is relevant. I plan to visit netfilter lists to investigate more.

Now for a shorewall issue: it occurred to me that if I took a "shorewall status" of our current firewall, and then put our old one back in for a period of time, and did the same, would comparing the results tell me anything useful? (or more likely, those of you on this list). What else should I compare?

It would seem to me that all I did by increasing RAM from 256 to 768Mb is triple the values of ip_conntrack_max and the hashsize. I am puzzled as to why this should make a difference, since our peak ip_conntrack value is around 2000. Speed of the hardware shouldn't be an issue either, as neither machine sees a load avg much over 0.01.

Thanks for any insight you can offer.

-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
Shawn Wright, I.T. Manager
Shawnigan Lake School
http://www.sls.bc.ca
swright@sls.bc.ca
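The "triple the values" arithmetic above can be sketched roughly as follows. This assumes the 32-bit 2.4 defaults as I understand them from the linked netfilter_conntrack_perf.txt (ip_conntrack_max = RAM in bytes / 16384, hashsize = ip_conntrack_max / 8); the function names are made up for illustration, and the real values should be read from each box rather than trusted from this sketch.

```python
# Rough sketch of default conntrack sizing on a 32-bit 2.4 kernel.
# ASSUMPTION: the defaults follow the rules described in
# netfilter_conntrack_perf.txt; verify against the running kernel.


def default_conntrack_max(ram_mb: int) -> int:
    """Assumed default ip_conntrack_max: RAM in bytes / 16384."""
    return ram_mb * 1024 * 1024 // 16384


def default_hashsize(ram_mb: int) -> int:
    """Assumed default hash table size: ip_conntrack_max / 8."""
    return default_conntrack_max(ram_mb) // 8


def avg_chain_length(entries: int, hashsize: int) -> float:
    """Average number of tracked connections per hash bucket."""
    return entries / hashsize


if __name__ == "__main__":
    peak_entries = 2000  # peak ip_conntrack count reported above
    for ram_mb in (256, 768):
        cmax = default_conntrack_max(ram_mb)
        hsize = default_hashsize(ram_mb)
        chain = avg_chain_length(peak_entries, hsize)
        print(f"{ram_mb} MB: max={cmax} hashsize={hsize} "
              f"avg entries per bucket at peak={chain:.2f}")
```

If these defaults hold, both machines average under one entry per bucket at a peak of about 2000 connections, which fits the puzzlement above: the larger tables alone would not obviously explain the difference.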
On Fri, 2004-11-26 at 16:33 -0800, Shawn Wright wrote:
>
> In searching for an answer, I came across this link which suggests that a
> dedicated firewall should have the ip_conntrack hashsize =
> ip_conntrack_max:
>
> http://www.wallfire.org/misc/netfilter_conntrack_perf.txt
>
> I know this isn't strictly a shorewall issue, but I mention it here in case it is
> relevant. I plan to visit netfilter lists to investigate more.
>

While tweaking the ratio of the hash table size and the conntrack table size could result in a measurable performance change, the kind of poor performance you are seeing would only occur if your CPU utilization is high (which as I recall it is not).

> Now for a shorewall issue: it occurred to me that if I took a "shorewall
> status" of our current firewall, and then put our old one back in for a
> period of time, and did the same, would comparing the results tell me
> anything useful? (or more likely, those of you on this list).

Probably not, given that when you "fixed" your NAT problem the output of "shorewall show nat" didn't change.

> What else should I compare?

Loaded drivers, driver versions (from dmesg).

> It would seem to me that all I did by increasing RAM from 256 to 768Mb is
> triple the values of ip_conntrack_max and the hashsize. I am puzzled as
> to why this should make a difference, since our peak ip_conntrack value
> is around 2000. Speed of the hardware shouldn't be an issue either, as
> neither machine sees a load avg much over 0.01.

On a firewall, Netfilter runs in interrupt handlers (and bottom halves) and will not have any effect on load avg. The % of time that you are spending in the system is more significant.

-Tom
--
Tom Eastep    \ Nothing is foolproof to a sufficiently talented fool
Shoreline,     \ http://shorewall.net
Washington USA  \ teastep@shorewall.net
PGP Public Key   \ https://lists.shorewall.net/teastep.pgp.key
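One way to compare the loaded drivers on the two boxes is sketched below. It assumes the contents of /proc/modules from each machine have been saved as text (first field on each line is the module name); driver versions would still have to be read from dmesg by eye. The function names are hypothetical.

```python
# Sketch: list kernel modules loaded on one firewall but not the other.
# ASSUMPTION: each input is the text of /proc/modules from one machine,
# where the first whitespace-separated field is the module name.


def module_names(proc_modules_text: str) -> set:
    """Return the set of module names found in a /proc/modules dump."""
    return {
        line.split()[0]
        for line in proc_modules_text.splitlines()
        if line.strip()
    }


def compare_modules(good_text: str, bad_text: str) -> tuple:
    """Return (only on good box, only on bad box) as sorted lists."""
    good = module_names(good_text)
    bad = module_names(bad_text)
    return sorted(good - bad), sorted(bad - good)
```

For example, compare_modules(open("good-modules.txt").read(), open("bad-modules.txt").read()) would return the two lists; the file names here are placeholders.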
On 27 Nov 2004 at 10:09, Tom Eastep wrote:

> On Fri, 2004-11-26 at 16:33 -0800, Shawn Wright wrote:
> >
> > In searching for an answer, I came across this link which suggests that a
> > dedicated firewall should have the ip_conntrack hashsize =
> > ip_conntrack_max:
> >
> > http://www.wallfire.org/misc/netfilter_conntrack_perf.txt
> >
> > I know this isn't strictly a shorewall issue, but I mention it here in case it is
> > relevant. I plan to visit netfilter lists to investigate more.
>
> While tweaking the ratio of the hash table size and the conntrack table
> size could result in a measurable performance change, the kind of poor
> performance you are seeing would only occur if your CPU utilization is
> high (which as I recall it is not).

This is correct, CPU load is less than 1%.

> > Now for a shorewall issue: it occurred to me that if I took a "shorewall
> > status" of our current firewall, and then put our old one back in for a
> > period of time, and did the same, would comparing the results tell me
> > anything useful? (or more likely, those of you on this list).
>
> Probably not, given that when you "fixed" your NAT problem the output of
> "shorewall show nat" didn't change.

This part remains a mystery to me.

> > What else should I compare?
>
> Loaded drivers, driver versions (from dmesg).

This looks like the most likely source at the moment. I will investigate further.

> > It would seem to me that all I did by increasing RAM from 256 to 768Mb is
> > triple the values of ip_conntrack_max and the hashsize. I am puzzled as
> > to why this should make a difference, since our peak ip_conntrack value
> > is around 2000. Speed of the hardware shouldn't be an issue either, as
> > neither machine sees a load avg much over 0.01.
>
> On a firewall, Netfilter runs in interrupt handlers (and bottom halves)
> and will not have any effect on load avg. The % of time that you are
> spending in the system is more significant.

Could you elaborate a bit more on this please? I understand why load average is not useful in this case, but what other tools can I use to keep tabs on CPU load imposed by netfilter? Top shows system CPU%, but not trends or averages.

Thanks.

-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
Shawn Wright, I.T. Manager
Shawnigan Lake School
http://www.sls.bc.ca
swright@sls.bc.ca
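A minimal console-side sketch of the "trends or averages" idea: sample the aggregate "cpu" line of /proc/stat at intervals and report the share of time spent in the kernel between samples. This is an illustration, not a tested tool; it assumes the usual field order (user, nice, system, idle, with iowait, irq and softirq appended on newer kernels), and the function names are made up.

```python
# Sketch: average system CPU% over an interval, from /proc/stat.
# ASSUMPTION: the aggregate "cpu" line lists user, nice, system, idle,
# optionally followed by iowait, irq, softirq, steal on newer kernels.
import time


def read_cpu_fields(stat_text: str) -> list:
    """Return the counters from the aggregate 'cpu' line of /proc/stat."""
    for line in stat_text.splitlines():
        parts = line.split()
        if parts and parts[0] == "cpu":
            return [int(x) for x in parts[1:]]
    raise ValueError("no aggregate 'cpu' line found")


def system_percent(before: list, after: list) -> float:
    """Percent of elapsed ticks spent in kernel context between samples.

    Counts the 'system' field plus irq and softirq when they are present.
    """
    deltas = [b - a for a, b in zip(before, after)][:8]
    total = sum(deltas)
    if total <= 0:
        return 0.0
    kernel = deltas[2] + sum(deltas[5:7])
    return 100.0 * kernel / total


def monitor(interval: float = 60.0, samples: int = 10,
            path: str = "/proc/stat") -> None:
    """Print one averaged system CPU% figure per interval."""
    with open(path) as fh:
        prev = read_cpu_fields(fh.read())
    for _ in range(samples):
        time.sleep(interval)
        with open(path) as fh:
            cur = read_cpu_fields(fh.read())
        stamp = time.strftime("%H:%M:%S")
        print(f"{stamp} system {system_percent(prev, cur):5.1f}%")
        prev = cur
```

Calling monitor(60, 60) would print one averaged figure per minute for an hour, which could be redirected to a file during a peak period.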
On Mon, 2004-11-29 at 09:42 -0800, Shawn Wright wrote:
>
> Could you elaborate a bit more on this please? I understand why load
> average is not useful in this case, but what other tools can I use to keep
> tabs on CPU load imposed by netfilter? Top shows system CPU%, but
> not trends or averages.

I would guess that MRTG coupled with a suitable SNMP MIB would do the job but I haven't tried to set up something like that. Hopefully someone on the list who is more MRTG literate than I am can help.

-Tom
--
Tom Eastep    \ Nothing is foolproof to a sufficiently talented fool
Shoreline,     \ http://shorewall.net
Washington USA  \ teastep@shorewall.net
PGP Public Key   \ https://lists.shorewall.net/teastep.pgp.key
On 29 Nov 2004 at 9:51, Tom Eastep wrote:

> On Mon, 2004-11-29 at 09:42 -0800, Shawn Wright wrote:
> >
> > Could you elaborate a bit more on this please? I understand why load
> > average is not useful in this case, but what other tools can I use to keep
> > tabs on CPU load imposed by netfilter? Top shows system CPU%, but
> > not trends or averages.
>
> I would guess that MRTG coupled with a suitable SNMP MIB would do the
> job but I haven't tried to set up something like that. Hopefully someone
> on the list who is more MRTG literate than I am can help.

I was hoping for a console tool, but mrtg will do fine - I'll see what I can come up with here. We're already using mrtg to monitor the traffic across the firewall, so it shouldn't be hard to add. I'll post results if I find a good tool. Thanks!

-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
Shawn Wright, I.T. Manager
Shawnigan Lake School
http://www.sls.bc.ca
swright@sls.bc.ca
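If SNMP proves awkward, MRTG can also graph the output of an external program. As I understand the MRTG documentation, such a program prints four lines: two values, an uptime string, and a target name. The sketch below formats cumulative system and user tick counters that way; the script path in the comment is hypothetical, and treating ticks per second as a percentage assumes 100 ticks per second on a single CPU.

```python
# Sketch: emit system/user CPU counters in MRTG's four-line format.
# ASSUMPTIONS: MRTG target configured as an external command, e.g.
#   Target[fwcpu]: `/usr/local/bin/fw_cpu_mrtg.py`   (hypothetical path)
# and /proc/stat counts 100 ticks per second on a single CPU, so the
# per-second rate MRTG derives from the counters reads roughly as percent.


def mrtg_lines(stat_text: str, uptime_text: str, name: str) -> str:
    """Build the four output lines from /proc/stat and /proc/uptime text."""
    fields = None
    for line in stat_text.splitlines():
        parts = line.split()
        if parts and parts[0] == "cpu":
            fields = [int(x) for x in parts[1:]]
            break
    if fields is None:
        raise ValueError("no aggregate 'cpu' line found")
    user, system = fields[0], fields[2]

    # First number in /proc/uptime is seconds since boot.
    seconds = int(float(uptime_text.split()[0]))
    days, rest = divmod(seconds, 86400)
    hours, rest = divmod(rest, 3600)
    minutes = rest // 60
    uptime = f"{days} days, {hours}:{minutes:02d}"

    return "\n".join([str(system), str(user), uptime, name])


def main() -> None:
    """Read the live /proc files and print the four lines for MRTG."""
    with open("/proc/stat") as stat, open("/proc/uptime") as up:
        print(mrtg_lines(stat.read(), up.read(), "fw"))
```

To use it as a script, a call to main() would be added at the bottom; the first value then graphs system time and the second user time.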
For those who use MRTG, this one is, by far, easier to implement:

http://www.cacti.net/

It's based on RRDTOOL (MRTG's engine) and it's free. You can also use bash scripts as data input for graphics. All administration is done via Web Interface.

[Guilsson]

On Mon, 29 Nov 2004 10:22:47 -0800, Shawn Wright <swright@sls.bc.ca> wrote:
> On 29 Nov 2004 at 9:51, Tom Eastep wrote:
>
> > On Mon, 2004-11-29 at 09:42 -0800, Shawn Wright wrote:
> > >
> > > Could you elaborate a bit more on this please? I understand why load
> > > average is not useful in this case, but what other tools can I use to keep
> > > tabs on CPU load imposed by netfilter? Top shows system CPU%, but
> > > not trends or averages.
> >
> > I would guess that MRTG coupled with a suitable SNMP MIB would do the
> > job but I haven't tried to set up something like that. Hopefully someone
> > on the list who is more MRTG literate than I am can help.
>
> I was hoping for a console tool, but mrtg will do fine - I'll see what I can
> come up with here. We're already using mrtg to monitor the traffic across
> the firewall, so it shouldn't be hard to add. I'll post results if I find a good
> tool. Thanks!
On 29 Nov 2004 at 10:22, Shawn Wright wrote:

> On 29 Nov 2004 at 9:51, Tom Eastep wrote:
>
> > On Mon, 2004-11-29 at 09:42 -0800, Shawn Wright wrote:
> > >
> > > Could you elaborate a bit more on this please? I understand why load
> > > average is not useful in this case, but what other tools can I use to keep
> > > tabs on CPU load imposed by netfilter? Top shows system CPU%, but
> > > not trends or averages.
> >
> > I would guess that MRTG coupled with a suitable SNMP MIB would do the
> > job but I haven't tried to set up something like that. Hopefully someone
> > on the list who is more MRTG literate than I am can help.
>
> I was hoping for a console tool, but mrtg will do fine - I'll see what I can
> come up with here. We're already using mrtg to monitor the traffic across
> the firewall, so it shouldn't be hard to add. I'll post results if I find a good
> tool. Thanks!

I haven't had much luck with getting CPU utilization via snmp yet, but have monitored 'top' during peak periods, and never seen System CPU% go over 10-15%.

Back to our other problem: I have now reconfigured our original firewall with a vanilla 2.4.28 kernel, iptables 1.2.11, 2 new NICs (using kernel tulip driver) and identical shorewall setup as the current server (which is now running well, but is temporary). Swapping the original box into service yielded *immediate* performance problems of the type we had seen before - random delays/timeouts hitting new websites, but 2nd attempt at same site was nearly always fast.

Speed tests on both 'good' and 'bad' firewall units yielded similar results - consistently over 1000kB/s downloads from a remote test site (link is 10Mb FD), using a 70Mb test file and http through proxy in each case. During these downloads, system CPU% as reported by 'top' was 5-10% on the 'good' firewall (PIII-667), and 7-14% on the 'bad' firewall (a PPro200).
If you recall, earlier tests used nearly identical setups between these two servers (2.4.22 kernel, one Tulip NIC, one EEPro100 NIC), and yielded similar results. The Intel E100/EEPro100 drivers were a possible source of the problem, so I switched to 2 DEC 21142 cards.

I am just about ready to give up on the PPro200 server, unless anyone can suggest anything else to try. I have attached status outputs from both firewalls, although the "bad" one was taken while the unit was not actually serving in its role as firewall, but rather on a secondary IP. During the actual live test, IPs were changed of course. (139.142.65.146 eth1, 139.142.66.253 eth0)

Here's basic info on "bad" firewall while "offline"

[root@fw log]# shorewall version
2.0.10
[root@fw log]# ip addr show
1: lo: <LOOPBACK,UP> mtu 16436 qdisc noqueue
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 brd 127.255.255.255 scope host lo
2: tunl0@NONE: <NOARP> mtu 1480 qdisc noop
    link/ipip 0.0.0.0 brd 0.0.0.0
3: gre0@NONE: <NOARP> mtu 1476 qdisc noop
    link/gre 0.0.0.0 brd 0.0.0.0
4: eth0: <BROADCAST,MULTICAST,UP> mtu 1500 qdisc pfifo_fast qlen 1000
    link/ether 00:80:c8:64:a2:61 brd ff:ff:ff:ff:ff:ff
    inet 139.142.66.9/24 brd 139.142.66.255 scope global eth0
5: eth1: <BROADCAST,MULTICAST,UP> mtu 1500 qdisc pfifo_fast qlen 1000
    link/ether 00:80:c8:67:96:5c brd ff:ff:ff:ff:ff:ff
    inet 139.142.65.147/29 brd 139.142.65.151 scope global eth1
[root@fw log]# ip route show
139.142.65.144/29 dev eth1 scope link
139.142.66.0/24 dev eth0 scope link
10.0.0.0/8 via 139.142.66.245 dev eth0
127.0.0.0/8 dev lo scope link
default via 139.142.65.145 dev eth1

Here's the "good" firewall, while "online"

[root@proxy4 console]# shorewall version
2.0.10
[root@proxy4 console]# ip addr show
1: lo: <LOOPBACK,UP> mtu 16436 qdisc noqueue
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 brd 127.255.255.255 scope host lo
2: eth0: <BROADCAST,MULTICAST,UP> mtu 1500 qdisc pfifo_fast qlen 100
    link/ether 00:80:c8:64:67:da brd ff:ff:ff:ff:ff:ff
    inet 139.142.66.253/24 brd 139.142.66.255 scope global eth0
6: eth1: <BROADCAST,MULTICAST,UP> mtu 1500 qdisc pfifo_fast qlen 100
    link/ether 00:a0:c9:0f:9d:c8 brd ff:ff:ff:ff:ff:ff
    inet 139.142.65.146/29 brd 139.142.65.151 scope global eth1
[root@proxy4 console]# ip route show
139.142.65.144/29 dev eth1 scope link
139.142.66.0/24 dev eth0 scope link
10.0.0.0/8 via 139.142.66.245 dev eth0
127.0.0.0/8 dev lo scope link
default via 139.142.65.145 dev eth1

Shorewall status output attached, "good.txt" & "bad.txt"

-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
Shawn Wright, I.T. Manager
Shawnigan Lake School
http://www.sls.bc.ca
swright@sls.bc.ca
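For comparing the two attached status files, a plain diff would be swamped by packet and byte counters, so a rough sketch that strips them first may help. It assumes the status output contains iptables-style listings where each rule line starts with two counter columns and chain headers carry "N packets, N bytes"; the function names are made up, and anything the patterns miss will simply show up as a difference.

```python
# Sketch: diff two "shorewall status" dumps while ignoring counters.
# ASSUMPTION: rule lines begin with packet and byte counter columns and
# chain headers look like "(policy DROP 5 packets, 300 bytes)".
import difflib
import re

_LEADING_COUNTERS = re.compile(r"^\s*\d+[KMG]?\s+\d+[KMG]?\s+")
_POLICY_COUNTERS = re.compile(r"\d+[KMG]? packets, \d+[KMG]? bytes")


def normalize(line: str) -> str:
    """Remove counter values so only the rule text is compared."""
    line = _LEADING_COUNTERS.sub("", line.rstrip())
    return _POLICY_COUNTERS.sub("", line)


def diff_status(good_text: str, bad_text: str) -> list:
    """Return unified-diff lines between the two normalized dumps."""
    good = [normalize(l) for l in good_text.splitlines()]
    bad = [normalize(l) for l in bad_text.splitlines()]
    return list(difflib.unified_diff(good, bad, "good.txt", "bad.txt",
                                     lineterm=""))
```

Running diff_status on the contents of good.txt and bad.txt should leave only structural differences. Separately, the "ip addr show" listings above already differ in qlen (1000 versus 100) and in the extra tunl0 and gre0 interfaces on the "bad" box, which may or may not matter.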