thr3ads.net - Shorewall users - Finally making some progress [Nov 2004]

Shawn Wright

2004-Nov-27 00:33 UTC

Finally making some progress

I *think* we are finally making some progress in tracking our elusive
performance problems. After employing a second 10Mb link from our ISP,
along with another firewall box and proxy, we were able to determine the
problem *is* our firewall. We don't know exactly why yet, but our sporadic
slow web access seems to have gone away since swapping a new firewall
in this morning.

The original firewall is a PPro200 with 256Mb, an Intel E100 and a DLink
DFE500 (Tulip) card. Kernels 2.4.22-37 and 2.4.22-28 were used; both were
compiled to remove most of the unnecessary junk, but care was taken to
include the netfilter stuff needed for firewalling.

The current firewall that works is a P3/667, 768Mb, same NICs as above, 
same shorewall config as above, but a stock 2.4.22-37 Mandrake 
"secure" kernel. 

The *most* interesting thing is that this *identical* machine (the current 
firewall) has *not* undergone any changes aside from installing another 
512Mb of RAM. Kernel is the same, and shorewall config is essentially 
the same. 

In searching for an answer, I came across this link which suggests that a 
dedicated firewall should have the ip_conntrack hashsize = 
ip_conntrack_max:

http://www.wallfire.org/misc/netfilter_conntrack_perf.txt

I know this isn't strictly a shorewall issue, but I mention it here in
case it is relevant. I plan to visit netfilter lists to investigate more.

Now for a shorewall issue: it occurred to me that if I took a "shorewall 
status" of our current firewall, and then put our old one back in for a 
period of time, and did the same, would comparing the results tell me 
anything useful? (or more likely, those of you on this list). What else 
should I compare?

It would seem to me that all I did by increasing RAM from 256 to 768Mb is 
triple the values of ip_conntrack_max and the hashsize. I am puzzled as 
to why this should make a difference, since our peak ip_conntrack value 
is around 2000. Speed of the hardware shouldn't be an issue either, as
neither machine sees a load avg much over 0.01.  

Thanks for any insight you can offer.
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
Shawn Wright, I.T. Manager
Shawnigan Lake School
http://www.sls.bc.ca
swright@sls.bc.ca
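
A rough worked example of the sizing question raised above. It assumes the default rule described in the linked conntrack tuning note for 32-bit 2.4 kernels (ip_conntrack_max = RAM in bytes / 16384, hashsize = ip_conntrack_max / 8). That rule is an assumption here, not something confirmed in this thread, and the function names are invented for illustration.

```python
# Assumed default sizing rule for 2.4-era ip_conntrack on 32-bit x86
# (taken from the linked tuning note, not verified in this thread):
#   ip_conntrack_max = RAM in bytes / 16384
#   hashsize         = ip_conntrack_max / 8


def default_conntrack_sizes(ram_mb):
    """Return (conntrack_max, hashsize) under the assumed default rule."""
    conntrack_max = ram_mb * 1024 * 1024 // 16384
    hashsize = conntrack_max // 8
    return conntrack_max, hashsize


def average_chain_length(entries, hashsize):
    """Average number of tracked connections per hash bucket."""
    return entries / hashsize


PEAK_ENTRIES = 2000  # peak ip_conntrack count reported in the post

for ram_mb in (256, 768):
    conntrack_max, hashsize = default_conntrack_sizes(ram_mb)
    chain = average_chain_length(PEAK_ENTRIES, hashsize)
    print(f"{ram_mb} MB: max={conntrack_max} hashsize={hashsize} "
          f"avg entries/bucket at peak={chain:.2f}")
```

Under those assumptions, both RAM sizes leave fewer than one tracked connection per hash bucket on average at a peak of 2000 entries, which fits the puzzlement above about why the larger tables alone should make a difference.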

Tom Eastep

2004-Nov-27 18:09 UTC

Re: Finally making some progress

On Fri, 2004-11-26 at 16:33 -0800, Shawn Wright wrote:
> 
> In searching for an answer, I came across this link which suggests that a 
> dedicated firewall should have the ip_conntrack hashsize = 
> ip_conntrack_max:
> 
> http://www.wallfire.org/misc/netfilter_conntrack_perf.txt
> 
> I know this isn't strictly a shorewall issue, but I mention it here in case it is
> relevant. I plan to visit netfilter lists to investigate more.
>

While tweaking the ratio of the hash table size and the conntrack table
size could result in a measurable performance change, the kind of poor
performance you are seeing would only occur if your CPU utilization is
high (which as I recall it is not).

> Now for a shorewall issue: it occurred to me that if I took a "shorewall
> status" of our current firewall, and then put our old one back in for a
> period of time, and did the same, would comparing the results tell me
> anything useful? (or more likely, those of you on this list).

Probably not, given that when you "fixed" your NAT problem the output of
"shorewall show nat" didn't change.

> What else should I compare?

Loaded drivers, driver versions (from dmesg).

> It would seem to me that all I did by increasing RAM from 256 to 768Mb is
> triple the values of ip_conntrack_max and the hashsize. I am puzzled as
> to why this should make a difference, since our peak ip_conntrack value
> is around 2000. Speed of the hardware shouldn't be an issue either, as
> neither machine sees a load avg much over 0.01.

On a firewall, Netfilter runs in interrupt handlers (and bottom halves)
and will not have any effect on load avg. The % of time that you are
spending in the system is more significant.

-Tom
-- 
Tom Eastep    \ Nothing is foolproof to a sufficiently talented fool
Shoreline,     \ http://shorewall.net
Washington USA  \ teastep@shorewall.net
PGP Public Key   \ https://lists.shorewall.net/teastep.pgp.key
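
One way to act on the "loaded drivers" suggestion is to capture `lsmod` output on each box and compare the module names. The sketch below assumes the usual `lsmod` layout (a `Module Size Used by` header, then one module per line). The sample listings and their sizes are hypothetical illustration data, not taken from either firewall.

```python
def module_names(lsmod_text):
    """Return the set of module names found in lsmod-style output."""
    names = set()
    for line in lsmod_text.splitlines():
        fields = line.split()
        # Skip blank lines and the "Module Size Used by" header.
        if not fields or fields[0] == "Module":
            continue
        names.add(fields[0])
    return names


def compare_modules(good_text, bad_text):
    """Return (only_on_good, only_on_bad) as sorted lists of names."""
    good = module_names(good_text)
    bad = module_names(bad_text)
    return sorted(good - bad), sorted(bad - good)


# Hypothetical sample listings; sizes and use counts are made up.
GOOD_LSMOD = """Module Size Used by
tulip 40000 1
e100 50000 1
ip_conntrack 30000 2
"""

BAD_LSMOD = """Module Size Used by
tulip 40000 2
ip_conntrack 30000 2
ipip 6000 0
"""

only_good, only_bad = compare_modules(GOOD_LSMOD, BAD_LSMOD)
print("only on good:", only_good)
print("only on bad:", only_bad)
```

This compares names only. Driver versions would still have to be read from `dmesg` on each machine, as suggested above.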

Shawn Wright

2004-Nov-29 17:42 UTC

Re: Finally making some progress

On 27 Nov 2004 at 10:09, Tom Eastep wrote:
> On Fri, 2004-11-26 at 16:33 -0800, Shawn Wright wrote:
> 
> > 
> > In searching for an answer, I came across this link which suggests that a
> > dedicated firewall should have the ip_conntrack hashsize =
> > ip_conntrack_max:
> >
> > http://www.wallfire.org/misc/netfilter_conntrack_perf.txt
> >
> > I know this isn't strictly a shorewall issue, but I mention it here in case it is
> > relevant. I plan to visit netfilter lists to investigate more.
>
> While tweaking the ratio of the hash table size and the conntrack table
> size could result in a measurable performance change, the kind of poor
> performance you are seeing would only occur if your CPU utilization is
> high (which as I recall it is not).

This is correct, CPU load is less than 1%.

> > Now for a shorewall issue: it occurred to me that if I took a "shorewall
> > status" of our current firewall, and then put our old one back in for a
> > period of time, and did the same, would comparing the results tell me
> > anything useful? (or more likely, those of you on this list).
>
> Probably not, given that when you "fixed" your NAT problem the output of
> "shorewall show nat" didn't change.

This part remains a mystery to me.

> > What else should I compare?
>
> Loaded drivers, driver versions (from dmesg).

This looks like the most likely source at the moment. I will investigate
further.

> > It would seem to me that all I did by increasing RAM from 256 to 768Mb is
> > triple the values of ip_conntrack_max and the hashsize. I am puzzled as
> > to why this should make a difference, since our peak ip_conntrack value
> > is around 2000. Speed of the hardware shouldn't be an issue either, as
> > neither machine sees a load avg much over 0.01.
>
> On a firewall, Netfilter runs in interrupt handlers (and bottom halves)
> and will not have any effect on load avg. The % of time that you are
> spending in the system is more significant.

Could you elaborate a bit more on this please? I understand why load
average is not useful in this case, but what other tools can I use to keep
tabs on CPU load imposed by netfilter? Top shows system CPU%, but
not trends or averages.

Thanks.
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
Shawn Wright, I.T. Manager
Shawnigan Lake School
http://www.sls.bc.ca
swright@sls.bc.ca
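
For a console-level view of the trend, one option is to sample the aggregate `cpu` line of `/proc/stat` at fixed intervals and log the share of time spent in the kernel. The sketch below is a minimal illustration, not a finished tool: the function names are invented, the two samples are made-up numbers, and it assumes the counters appear in the order user, nice, system, idle, optionally followed by iowait, irq and softirq on newer kernels.

```python
FIELD_NAMES = ["user", "nice", "system", "idle", "iowait", "irq", "softirq"]


def parse_cpu_line(stat_text):
    """Return the counters from the aggregate 'cpu' line as a dict."""
    for line in stat_text.splitlines():
        fields = line.split()
        # "cpu" is the aggregate line; "cpu0", "cpu1", ... are per-CPU lines.
        if fields and fields[0] == "cpu":
            values = [int(v) for v in fields[1:1 + len(FIELD_NAMES)]]
            # Older kernels report only the first four counters; pad with zeros.
            values += [0] * (len(FIELD_NAMES) - len(values))
            return dict(zip(FIELD_NAMES, values))
    raise ValueError("no aggregate 'cpu' line found")


def kernel_share(before, after):
    """Percentage of elapsed CPU time spent in system + irq + softirq."""
    delta = {name: after[name] - before[name] for name in FIELD_NAMES}
    total = sum(delta.values())
    if total == 0:
        return 0.0
    kernel = delta["system"] + delta["irq"] + delta["softirq"]
    return 100.0 * kernel / total


# Two made-up samples in the four-counter layout.
SAMPLE_1 = "cpu  1000 0 500 8500\ncpu0 1000 0 500 8500\n"
SAMPLE_2 = "cpu  1010 0 560 9430\ncpu0 1010 0 560 9430\n"

share = kernel_share(parse_cpu_line(SAMPLE_1), parse_cpu_line(SAMPLE_2))
print(f"kernel share over the interval: {share:.1f}%")
```

In practice the two samples would come from reading `/proc/stat` a fixed number of seconds apart, with each result appended to a log file alongside a timestamp so that peaks can be lined up against the slow periods.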

Tom Eastep

2004-Nov-29 17:51 UTC

Re: Finally making some progress

On Mon, 2004-11-29 at 09:42 -0800, Shawn Wright wrote:
> 
> Could you elaborate a bit more on this please? I understand why load 
> average is not useful in this case, but what other tools can I use to keep 
> tabs on CPU load imposed by netfilter? Top shows system CPU%, but 
> not trends or averages.

I would guess that MRTG coupled with a suitable SNMP MIB would do the
job but I haven't tried to set up something like that. Hopefully someone
on the list who is more MRTG literate than I am can help.

-Tom 
-- 
Tom Eastep    \ Nothing is foolproof to a sufficiently talented fool
Shoreline,     \ http://shorewall.net
Washington USA  \ teastep@shorewall.net
PGP Public Key   \ https://lists.shorewall.net/teastep.pgp.key

Shawn Wright

2004-Nov-29 18:22 UTC

Re: Finally making some progress

On 29 Nov 2004 at 9:51, Tom Eastep wrote:
> On Mon, 2004-11-29 at 09:42 -0800, Shawn Wright wrote:
> 
> > 
> > Could you elaborate a bit more on this please? I understand why load
> > average is not useful in this case, but what other tools can I use to keep
> > tabs on CPU load imposed by netfilter? Top shows system CPU%, but
> > not trends or averages.
>
> I would guess that MRTG coupled with a suitable SNMP MIB would do the
> job but I haven't tried to set up something like that. Hopefully someone
> on the list who is more MRTG literate than I am can help.

I was hoping for a console tool, but mrtg will do fine - I'll see what I can
come up with here. We're already using mrtg to monitor the traffic across
the firewall, so it shouldn't be hard to add. I'll post results if I find a good
tool. Thanks!
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
Shawn Wright, I.T. Manager
Shawnigan Lake School
http://www.sls.bc.ca
swright@sls.bc.ca
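
If SNMP turns out to be awkward, MRTG can also graph the output of an external script. Such a script is expected to print four lines: the first value, the second value, an uptime string, and a target name. The sketch below only formats that report. The function name, the sample percentages, the uptime text and the target name are all placeholders, and the values are rounded to whole numbers on the assumption that the target is configured as a gauge.

```python
def mrtg_report(first_value, second_value, uptime, target_name):
    """Format two readings in the four-line layout MRTG reads from a script."""
    lines = [
        str(int(round(first_value))),   # line 1: first variable
        str(int(round(second_value))),  # line 2: second variable
        uptime,                         # line 3: uptime string
        target_name,                    # line 4: name of the target
    ]
    return "\n".join(lines) + "\n"


# Placeholder readings: system CPU% and user CPU% for one polling interval.
print(mrtg_report(12.4, 3.0, "5 days, 2:10", "fw system/user CPU %"), end="")
```

The two readings would normally come from whatever measures CPU time on the firewall. This sketch deliberately leaves that part out.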

Guilsson

2004-Dec-01 13:36 UTC

Re: Finally making some progress

For those who use MRTG, this one is, by far, easier to implement:
http://www.cacti.net/

It's based on RRDTOOL (MRTG's engine) and it's free. You can also use
bash scripts as data input for graphics. All administration is done
via Web Interface.

[Guilsson]


On Mon, 29 Nov 2004 10:22:47 -0800, Shawn Wright <swright@sls.bc.ca> wrote:
> On 29 Nov 2004 at 9:51, Tom Eastep wrote:
>
> > On Mon, 2004-11-29 at 09:42 -0800, Shawn Wright wrote:
> >
> > > Could you elaborate a bit more on this please? I understand why load
> > > average is not useful in this case, but what other tools can I use to keep
> > > tabs on CPU load imposed by netfilter? Top shows system CPU%, but
> > > not trends or averages.
> >
> > I would guess that MRTG coupled with a suitable SNMP MIB would do the
> > job but I haven't tried to set up something like that. Hopefully someone
> > on the list who is more MRTG literate than I am can help.
>
> I was hoping for a console tool, but mrtg will do fine - I'll see what I can
> come up with here. We're already using mrtg to monitor the traffic across
> the firewall, so it shouldn't be hard to add. I'll post results if I find a good
> tool. Thanks!
> 
> 
> -=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
> Shawn Wright, I.T. Manager
> Shawnigan Lake School
> http://www.sls.bc.ca
> swright@sls.bc.ca
> 

Shawn Wright

2004-Dec-02 00:01 UTC

Re: Finally making some progress

On 29 Nov 2004 at 10:22, Shawn Wright wrote:
> On 29 Nov 2004 at 9:51, Tom Eastep wrote:
> 
> > On Mon, 2004-11-29 at 09:42 -0800, Shawn Wright wrote:
> > 
> > > 
> > > Could you elaborate a bit more on this please? I understand why load
> > > average is not useful in this case, but what other tools can I use to keep
> > > tabs on CPU load imposed by netfilter? Top shows system CPU%, but
> > > not trends or averages.
> >
> > I would guess that MRTG coupled with a suitable SNMP MIB would do the
> > job but I haven't tried to set up something like that. Hopefully someone
> > on the list who is more MRTG literate than I am can help.
>
> I was hoping for a console tool, but mrtg will do fine - I'll see what I can
> come up with here. We're already using mrtg to monitor the traffic across
> the firewall, so it shouldn't be hard to add. I'll post results if I find a good
> tool. Thanks!

I haven't had much luck with getting CPU utilization via snmp yet, but
have monitored 'top' during peak periods, and never seen System CPU%
go over 10-15%.

Back to our other problem: I have now reconfigured our original firewall 
with a vanilla 2.4.28 kernel, iptables 1.2.11, 2 new NICs (using kernel tulip 
driver) and identical shorewall setup as the current server (which is now 
running well, but is temporary). 
Swapping the original box into service yielded *immediate* performance 
problems of the type we had seen before - random delays/timeouts hitting 
new websites, but 2nd attempt at same site was nearly always fast. 
Speed tests on both 'good' and 'bad' firewall units yielded similar results -
consistently over 1000kB/s downloads from a remote test site (link is
10Mb FD) using a 70Mb test file and http through proxy in each case.
During these downloads, system CPU% as reported by 'top' was 5-10%
on the 'good' firewall (PIII-667), and 7-14% on the 'bad' firewall (a
PPro200).

If you recall, earlier tests used nearly identical setups between these two
servers (2.4.22 kernel, one Tulip NIC, one EEPro100 NIC), and yielded 
similar results. The Intel E100/EEPro100 drivers were a possible source 
of the problem, so I switched to 2 DEC 21142 cards.

I am just about ready to give up on the PPro200 server, unless anyone 
can suggest anything else to try. 

I have attached status outputs from both firewalls, although the "bad" one
was taken while the unit was not actually serving in its role as firewall, but
rather on a secondary IP. During the actual live test, IPs were changed of 
course. (139.142.65.146 eth1, 139.142.66.253 eth0)

Here's basic info on the "bad" firewall while "offline":

[root@fw log]# shorewall version
2.0.10
[root@fw log]# ip addr show
1: lo: <LOOPBACK,UP> mtu 16436 qdisc noqueue
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 brd 127.255.255.255 scope host lo
2: tunl0@NONE: <NOARP> mtu 1480 qdisc noop
    link/ipip 0.0.0.0 brd 0.0.0.0
3: gre0@NONE: <NOARP> mtu 1476 qdisc noop
    link/gre 0.0.0.0 brd 0.0.0.0
4: eth0: <BROADCAST,MULTICAST,UP> mtu 1500 qdisc pfifo_fast qlen 1000
    link/ether 00:80:c8:64:a2:61 brd ff:ff:ff:ff:ff:ff
    inet 139.142.66.9/24 brd 139.142.66.255 scope global eth0
5: eth1: <BROADCAST,MULTICAST,UP> mtu 1500 qdisc pfifo_fast qlen 1000
    link/ether 00:80:c8:67:96:5c brd ff:ff:ff:ff:ff:ff
    inet 139.142.65.147/29 brd 139.142.65.151 scope global eth1
[root@fw log]# ip route show
139.142.65.144/29 dev eth1  scope link
139.142.66.0/24 dev eth0  scope link
10.0.0.0/8 via 139.142.66.245 dev eth0
127.0.0.0/8 dev lo  scope link
default via 139.142.65.145 dev eth1

Here's the "good" firewall, while "online":

[root@proxy4 console]# shorewall version
2.0.10
[root@proxy4 console]# ip addr show
1: lo: <LOOPBACK,UP> mtu 16436 qdisc noqueue
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 brd 127.255.255.255 scope host lo
2: eth0: <BROADCAST,MULTICAST,UP> mtu 1500 qdisc pfifo_fast qlen 100
    link/ether 00:80:c8:64:67:da brd ff:ff:ff:ff:ff:ff
    inet 139.142.66.253/24 brd 139.142.66.255 scope global eth0
6: eth1: <BROADCAST,MULTICAST,UP> mtu 1500 qdisc pfifo_fast qlen 100
    link/ether 00:a0:c9:0f:9d:c8 brd ff:ff:ff:ff:ff:ff
    inet 139.142.65.146/29 brd 139.142.65.151 scope global eth1
[root@proxy4 console]# ip route show
139.142.65.144/29 dev eth1  scope link
139.142.66.0/24 dev eth0  scope link
10.0.0.0/8 via 139.142.66.245 dev eth0
127.0.0.0/8 dev lo  scope link
default via 139.142.65.145 dev eth1

Shorewall status output attached, "good.txt" & "bad.txt"

-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
Shawn Wright, I.T. Manager
Shawnigan Lake School
http://www.sls.bc.ca
swright@sls.bc.ca
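
The two `ip addr show` listings above can be compared mechanically rather than by eye. The sketch below is an illustration only: it parses the interface header lines (copied from the post, with each wrapped `qlen` value rejoined onto its line) and reports attributes that differ. The regular expression and function names are assumptions made for this example, and the script does not say whether any difference matters.

```python
import re

# Matches header lines such as "4: eth0: <FLAGS> mtu 1500 qdisc pfifo_fast qlen 1000"
# and "2: tunl0@NONE: <NOARP> mtu 1480 qdisc noop".
HEADER = re.compile(r"^\d+:\s+(?P<name>[^:@\s]+)(?:@\S+)?:\s+<[^>]*>\s+(?P<rest>.*)$")


def parse_interfaces(ip_addr_text):
    """Map interface name to its 'key value' attributes (mtu, qdisc, qlen)."""
    interfaces = {}
    for line in ip_addr_text.splitlines():
        match = HEADER.match(line)
        if not match:
            continue  # indented link/inet lines are ignored here
        words = match.group("rest").split()
        interfaces[match.group("name")] = dict(zip(words[0::2], words[1::2]))
    return interfaces


def diff_interfaces(good, bad):
    """Return human-readable lines describing how the two hosts differ."""
    report = []
    for name in sorted(set(good) | set(bad)):
        if name not in good:
            report.append(f"{name}: only on bad")
        elif name not in bad:
            report.append(f"{name}: only on good")
        else:
            for key in sorted(set(good[name]) | set(bad[name])):
                g, b = good[name].get(key), bad[name].get(key)
                if g != b:
                    report.append(f"{name}: {key} good={g} bad={b}")
    return report


BAD_IP_ADDR = """1: lo: <LOOPBACK,UP> mtu 16436 qdisc noqueue
2: tunl0@NONE: <NOARP> mtu 1480 qdisc noop
3: gre0@NONE: <NOARP> mtu 1476 qdisc noop
4: eth0: <BROADCAST,MULTICAST,UP> mtu 1500 qdisc pfifo_fast qlen 1000
5: eth1: <BROADCAST,MULTICAST,UP> mtu 1500 qdisc pfifo_fast qlen 1000
"""

GOOD_IP_ADDR = """1: lo: <LOOPBACK,UP> mtu 16436 qdisc noqueue
2: eth0: <BROADCAST,MULTICAST,UP> mtu 1500 qdisc pfifo_fast qlen 100
6: eth1: <BROADCAST,MULTICAST,UP> mtu 1500 qdisc pfifo_fast qlen 100
"""

for line in diff_interfaces(parse_interfaces(GOOD_IP_ADDR),
                            parse_interfaces(BAD_IP_ADDR)):
    print(line)
```

On these listings the differences are the transmit queue length on eth0 and eth1 (100 on the good box, 1000 on the bad one) and the tunl0 and gre0 tunnel interfaces, which exist only on the bad box. These are leads to check against kernel and driver differences, not a diagnosis.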
