thr3ads.net - Shorewall users - Serious stability issues [Nov 2004]

Shawn Wright

2004-Nov-17 18:07 UTC

Serious stability issues

We have encountered a number of problems with our firewall recently, 
and the past 24 hours have left me quite concerned. Here is what we are 
seeing:

1. Original firewall, a PentiumPro/200 with 96Mb RAM, serving approx
500 client PCs for a 10Mb internet connection. Running Mandrake 9.2, we
began seeing severe swapping a few weeks ago, with kernel mem usage
exceeding 200Mb. Given an ip_conntrack value of 1000-2000 or so
during normal loads, the mem usage seemed excessive. Kernel 2.4.22-28,
shorewall 1.4.6c

2. In an effort to improve things until we can replace the machine, we 
installed an additional 128Mb and restarted. The machine failed to boot... 
Strange. 

3. We had a standby machine also running Mandrake 9.2, and quickly 
transferred ifcfgs and shorewall configs to it. It ran for a few minutes 
before *locking up*. This is a machine that has run flawlessly for 5 years 
24/7, albeit not with Mandrake 9.2, but NT 4.0. Restarted, and the lockup 
repeated 3 more times within 30 minutes. This machine was a dual 
PII/233 with 256Mb. Kernel 2.4.22-10, shorewall 1.4.6c

4. Transferred firewall to another spare server, this one also Mandrake 
9.2, dual PIII/733, 256Mb. It ran well from 8pm until 4am when it also 
*locked up*. Again, this hardware has been flawless 24/7 since new, and 
has been running Mandrake 9.2 for over 6 months, and had latest 
updates from Mandrake a few weeks ago. Kernel 2.4.22-10mdk, 
shorewall 2.0.10.
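
As a rough sanity check on item 1, the arithmetic can be sketched in Python. The per-entry size used here is an assumed, illustrative figure (it was not measured on this system and varies by kernel build). The point is only that a couple of thousand tracked connections should cost on the order of a megabyte, nowhere near 200Mb, so the connection count alone would not explain the swapping.

```python
# Rough estimate of memory used by the connection-tracking table.
# ASSUMPTION: 400 bytes per entry is a hypothetical round figure chosen for
# illustration; the real per-entry cost depends on the kernel build.

def conntrack_memory_mib(entries: int, bytes_per_entry: int = 400) -> float:
    """Return the estimated table size in MiB for `entries` tracked connections."""
    return entries * bytes_per_entry / (1024 * 1024)


if __name__ == "__main__":
    # The thread reports 1000-2000 entries under normal load.
    for entries in (1000, 2000):
        print(f"{entries} entries: ~{conntrack_memory_mib(entries):.2f} MiB")
```

Even if the assumed per-entry size were off by a factor of ten, the estimate would stay in the single-digit megabyte range.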

This morning I used urpmi to apply the following new updates.
Reading through the notes on these updates, I can find nothing that would 
suggest an exploit or bug causing our lockups. 

libxml2-2.5.11-1.3.92mdk.i586.rpm
libxml2-utils-2.5.11-1.3.92mdk.i586.rpm
iptables-1.2.8-2.1.92mdk.i586.rpm
webmin-1.100-3.3.92mdk.src.rpm
perl-devel-5.8.1-0.RC4.3.1.92mdk.i586.rpm
perl-5.8.1-0.RC4.3.1.92mdk.i586.rpm
perl-base-5.8.1-0.RC4.3.1.92mdk.i586.rpm
shadow-utils-4.0.3-5.1.92mdk.i586.rpm

This seems unlikely to be a shorewall issue, but I'm hoping someone here
has seen this before. Are there any known issues with SMP hardware and
the 2.4.22-10 kernel? That seems to be one common link between the
two systems suffering lockup.

Thanks for any assistance you can provide.

-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
Shawn Wright, I.T. Manager
Shawnigan Lake School
http://www.sls.bc.ca
swright@sls.bc.ca

Philipp Rusch

2004-Nov-17 19:50 UTC

Re: Serious stability issues

Hi Shawn,
I had a similar problem about a year ago when one of my customers was hit
by the slammer virus. This "ate" up the arp cache on the external interface,
and the machine was dying because it did not have enough memory to buffer
all those "half open" connections when slammer scanned the networks for
a victim machine. I got rid of this by dropping the packets instead of
rejecting them, seems that this uses much less resources than a drop.

HTH, Philipp

Shawn Wright

2004-Nov-17 20:20 UTC

Re: Serious stability issues

Thanks Philipp, I will look into this. I have the old machine running (offline) 
again to examine logs. In addition, it appears the lockups may be a result 
of some problems I've seen mentioned with SMP, aic7xxx driver, and the
2.4.22 kernel, all of which are common on the two other systems.

Philipp Rusch

2004-Nov-17 22:39 UTC

Re: Serious stability issues- corr.

Philipp Rusch wrote:
> I got rid of this by dropping the packets instead of
> rejecting them, seems that this uses much less resources than a drop.

Correction: ... "this uses much less resources than a reject", of course!

Shawn Wright

2004-Nov-18 01:08 UTC

Re: Serious stability issues- corr.

On 17 Nov 2004 at 23:39, Philipp Rusch wrote:
> correction: ... "this uses much less resources than a reject", of course
Thanks - I assumed that was a typo. I've so far found nothing too unusual
in the logs except that a large number of rejects were from inside the
firewall, from hosts with broken routing tables. Most of our local machines
don't use the fw as a gateway, but instead go through a Cisco Cat6K.
In the 100 minute period of log that I looked at, I had approx 100k entries,
of which 5200 were rejects, and 4600 were drops. Most of the rejects
were from inside our network.
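
A tally like the one described above can be produced with a short script. This is a minimal sketch, assuming log lines carry a Shorewall-style prefix of the form "Shorewall:<chain>:<DISPOSITION>:" followed by netfilter's "SRC=" field. The sample lines and addresses are made up for illustration, and the pattern would need adjusting to whatever log format is actually configured.

```python
import re
from collections import Counter

# ASSUMPTION: each logged packet has a prefix "Shorewall:<chain>:<DISPOSITION>:"
# followed later on the same line by netfilter's "SRC=<address>" field.
LOG_RE = re.compile(
    r"Shorewall:(?P<chain>[^:\s]+):(?P<disp>[A-Z]+):.*?\bSRC=(?P<src>\S+)"
)


def tally(lines):
    """Count log entries per disposition and per (disposition, source) pair."""
    by_disp = Counter()
    by_src = Counter()
    for line in lines:
        match = LOG_RE.search(line)
        if match is None:
            continue  # not a firewall log line
        by_disp[match["disp"]] += 1
        by_src[(match["disp"], match["src"])] += 1
    return by_disp, by_src


# Hypothetical sample lines; addresses come from private/documentation ranges.
SAMPLE = [
    "Nov 17 10:00:01 fw kernel: Shorewall:all2all:REJECT:IN=eth1 OUT=eth0 SRC=10.0.0.5 DST=192.0.2.1 PROTO=TCP",
    "Nov 17 10:00:02 fw kernel: Shorewall:all2all:REJECT:IN=eth1 OUT=eth0 SRC=10.0.0.5 DST=192.0.2.7 PROTO=TCP",
    "Nov 17 10:00:03 fw kernel: Shorewall:net2all:DROP:IN=eth0 OUT= SRC=203.0.113.9 DST=198.51.100.2 PROTO=UDP",
    "Nov 17 10:00:04 fw sshd[123]: unrelated message",
]

if __name__ == "__main__":
    by_disp, by_src = tally(SAMPLE)
    print(dict(by_disp))
    # Show the busiest sources first; internal hosts with broken routing
    # would be expected near the top of the REJECT entries.
    for (disp, src), count in by_src.most_common(5):
        print(disp, src, count)
```

To run it against a real log, pass an open file object to tally() in place of SAMPLE, since the function only needs an iterable of lines.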

So far today, we've had two more lockups. I installed the 2.4.22-37 kernel
and rebooted into it after the 1st lockup, in hopes it would address the
problem. It lasted two hours before the next lockup, less than an hour
ago. I have yanked the 2nd CPU out, and will see who locks up first - me
or the fw...

At the moment, it appears to be a hardware or kernel/driver issue (I was
suspecting the 2.4.22 kernel bug with SMP/aic7xxx driver, but am not 
sure what to think now). But if anyone has any ideas on what to look for in 
the logs or my shorewall config, please let me know. After 8 years of 
nearly flawless firewall operation, I admit I am not well versed in tracking 
these things down.

Thanks.

Gary Buckmaster

2004-Nov-18 01:37 UTC

Re: Serious stability issues- corr.

This may seem like a somewhat ignorant question, but I'm ignorant, so
I'll ask.  Is there a reason you're not running the latest 2.4 series
kernel, or running a stable 2.6 series kernel to take advantage of all
the new SMP work that's been done?

Also, have you checked to make sure it's not a physical hardware fault
in the machines?  I realize that you've swapped a few out, but nasty
coincidences do happen with old hardware.

Michael Loftis

2004-Nov-18 01:51 UTC

Re: Serious stability issues- corr.

OK this is time for a twilight moment, I've an old box at home that ran
fine for god knows how long, and is now experiencing similar lockup issues
-- it's behind a firewall, and I honestly am not sure what kernel version
it's running (2.2.something for sure, 2.2.17 maybe?).  It's a totally
non-critical box so I haven't even looked into the matter yet...but it has
been going on for quite a while now.

It's almost certainly a kernel issue, or some RAM going bad in my case, but
as I said, haven't even looked at it.

Shawn Wright

2004-Nov-18 17:28 UTC

Re: Serious stability issues- corr.

On 17 Nov 2004 at 19:37, Gary Buckmaster wrote:
> This may seem like a somewhat ignorant question, but I'm ignorant, so
> I'll ask.  Is there a reason you're not running the latest 2.4 series
> kernel, or running a stable 2.6 series kernel to take advantage of all
> the new SMP work that's been done?

Most of our linux machines are not SMP, and I'm fairly cautious of major
new releases, so 2.4 remains the newest production kernel here. We also
still run NT4.0 servers for similar reasons, and they are still rock solid - I
can't say the same for win2k/xp.

> Also, have you checked to make sure it's not a physical hardware fault
> in the machines?  I realize that you've swapped a few out, but nasty
> coincidences do happen with old hardware.

I haven't ruled anything out yet, but two SMP machines running the same
kernel and SCSI cards seemed like a possible link until it locked up on the
newer kernel also. However, the newly improved *single* CPU machine
(after I pulled the 2nd one) has now survived overnight, rather than the
2-4 hours I was seeing yesterday.


Tom Eastep

2004-Nov-18 18:38 UTC

Re: Serious stability issues- corr.

On Thu, 2004-11-18 at 09:28 -0800, Shawn Wright wrote:
> I haven't ruled anything out yet, but two SMP machines running the same
> kernel and SCSI cards seemed like a possible link until it locked up on the
> newer kernel also. However, the newly improved *single* CPU machine
> (after I pulled the 2nd one) has now survived overnight, rather than the
> 2-4 hours I was seeing yesterday.

Netfilter itself also has a history of SMP-related bugs.

-Tom
-- 
Tom Eastep    \ Nothing is foolproof to a sufficiently talented fool
Shoreline,     \ http://shorewall.net
Washington USA  \ teastep@shorewall.net
PGP Public Key   \ https://lists.shorewall.net/teastep.pgp.key

Shawn Wright

2004-Nov-18 19:51 UTC

Re: Serious stability issues- corr.

On 18 Nov 2004 at 10:38, Tom Eastep wrote:
> Netfilter itself also has a history of SMP-related bugs.
I was wondering about that, but had not yet done any research into it. I
am still trying to recoup some of the lost sleep from the past few days. I
think the best option is to avoid SMP for now. Interestingly, I've yet to find
a Windows firewall that can handle SMP properly either, since I run an
SMP Windows box at home. I'm still trying to find time to build a proper
(Linux) firewall for home.

Thanks for the tip.

