We have encountered a number of problems with our firewall recently, and the past 24 hours have left me quite concerned. Here is what we are seeing:

1. Original firewall, a PentiumPro/200 with 96MB RAM, serving approx 500 client PCs for a 10Mb internet connection. Running Mandrake 9.2, we began seeing severe swapping a few weeks ago, with kernel mem usage exceeding 200MB. Given an ip_conntrack value of 1000-2000 or so during normal loads, the mem usage seemed excessive. Kernel 2.4.22-28, shorewall 1.4.6c

2. In an effort to improve things until we can replace the machine, we installed an additional 128MB and restarted. The machine failed to boot... Strange.

3. We had a standby machine also running Mandrake 9.2, and quickly transferred ifcfgs and shorewall configs to it. It ran for a few minutes before *locking up*. This is a machine that has run flawlessly for 5 years 24/7, albeit not with Mandrake 9.2, but NT 4.0. Restarted, and the lockup repeated 3 more times within 30 minutes. This machine was a dual PII/233 with 256MB. Kernel 2.4.22-10, shorewall 1.4.6c

4. Transferred firewall to another spare server, this one also Mandrake 9.2, dual PIII/733, 256MB. It ran well from 8pm until 4am when it also *locked up*. Again, this hardware has been flawless 24/7 since new, and has been running Mandrake 9.2 for over 6 months, and had latest updates from Mandrake a few weeks ago. Kernel 2.4.22-10mdk, shorewall 2.0.10.

This morning I applied the following updates that were new using urpmi. Reading through the notes on these updates, I can find nothing that would suggest an exploit or bug causing our lockups.

libxml2-2.5.11-1.3.92mdk.i586.rpm
libxml2-utils-2.5.11-1.3.92mdk.i586.rpm
iptables-1.2.8-2.1.92mdk.i586.rpm
webmin-1.100-3.3.92mdk.src.rpm
perl-devel-5.8.1-0.RC4.3.1.92mdk.i586.rpm
perl-5.8.1-0.RC4.3.1.92mdk.i586.rpm
perl-base-5.8.1-0.RC4.3.1.92mdk.i586.rpm
shadow-utils-4.0.3-5.1.92mdk.i586.rpm

This seems unlikely to be a shorewall issue, but I'm hoping someone here has seen this before. Are there any known issues with SMP hardware and the 2.4.22-10 kernel? That seems to be one common link between the two systems suffering lockup.

Thanks for any assistance you can provide.

-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
Shawn Wright, I.T. Manager
Shawnigan Lake School
http://www.sls.bc.ca
swright@sls.bc.ca
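A rough sanity check on the conntrack figures in the message above can be sketched in Python. This is only an illustration, not something from the thread: the per-entry size of 350 bytes is an assumed ballpark, and the helper names are hypothetical. The second helper counts entries in text laid out like /proc/net/ip_conntrack, which lists one tracked connection per line.

```python
# Rough estimate of memory held by connection-tracking entries.
# ASSUMPTION: 350 bytes per entry is a ballpark figure chosen for
# illustration; the real size depends on the kernel build.
ASSUMED_BYTES_PER_ENTRY = 350


def conntrack_memory_bytes(entries, bytes_per_entry=ASSUMED_BYTES_PER_ENTRY):
    """Return the estimated number of bytes used by `entries` tracked connections."""
    return entries * bytes_per_entry


def count_conntrack_entries(table_text):
    """Count entries in text laid out like /proc/net/ip_conntrack (one per line)."""
    return sum(1 for line in table_text.splitlines() if line.strip())


if __name__ == "__main__":
    # The 1000-2000 range is the one reported in the message above.
    for entries in (1000, 2000):
        kib = conntrack_memory_bytes(entries) / 1024
        print(f"{entries} entries -> about {kib:.0f} KiB")
```

Under that assumption, even 2000 entries come to well under 1 MB, which is consistent with the observation that 200MB of kernel memory looks excessive for this connection count and that something other than the normal conntrack table is probably consuming it.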
Hi Shawn,

I had a similar problem about a year ago when one of my customers was hit by the slammer virus. This "ate" up the ARP cache on the external interface, and the machine was dying because it did not have enough memory to buffer all those "half open" connections when slammer scanned the networks for a victim machine. I got rid of this by dropping the packets instead of rejecting them, seems that this uses much less resources than a drop.

HTH, Philipp
Thanks Philipp, I will look into this. I have the old machine running (offline) again to examine logs. In addition, it appears the lockups may be a result of some problems I've seen mentioned with SMP, the aic7xxx driver, and the 2.4.22 kernel, all of which are common to the two other systems.

On 17 Nov 2004 at 20:50, Philipp Rusch wrote:
> I got rid of this by dropping the packets instead of
> rejecting them, seems that this uses much less resources than a drop.

-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
Shawn Wright, I.T. Manager
Shawnigan Lake School
http://www.sls.bc.ca
swright@sls.bc.ca
Philipp Rusch wrote:
> I got rid of this by dropping the packets instead of
> rejecting them, seems that this uses much less resources than a drop.

Correction: "this uses much less resources than a reject", of course!
On 17 Nov 2004 at 23:39, Philipp Rusch wrote:
> correction: ... "this uses much less resources than a reject " of course

Thanks - I assumed that was a typo.

I've so far found nothing too unusual in the logs, except that a large number of rejects were from inside the firewall, from hosts with broken routing tables. Most of our local machines don't use the fw as a gateway, but instead go through a Cisco Cat6K. In the 100-minute period of log that I looked at, I had approx 100k entries, of which 5200 were rejects and 4600 were drops. Most of the rejects were from inside our network.

So far today, we've had two more lockups. I installed the 2.4.22-37 kernel and rebooted into it after the 1st lockup, in hopes it would address the problem. It lasted two hours before the next lockup, less than an hour ago. I have yanked the 2nd CPU out, and will see who locks up first - me or the fw...

At the moment, it appears to be a hardware or kernel/driver issue (I was suspecting the 2.4.22 kernel bug with SMP/aic7xxx driver, but am not sure what to think now). But if anyone has any ideas on what to look for in the logs or my shorewall config, please let me know. After 8 years of nearly flawless firewall operation, I admit I am not well versed in tracking these things down.

Thanks.

-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
Shawn Wright, I.T. Manager
Shawnigan Lake School
http://www.sls.bc.ca
swright@sls.bc.ca
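The kind of log tally described in the message above (rejects vs. drops, and how many come from inside the network) could be sketched roughly as follows. This is a hedged illustration, not the method actually used: it assumes log lines carry a "Shorewall:<chain>:<action>:" prefix followed by a SRC= field, and the internal network 192.168.0.0/16, the sample lines, and the function name are all hypothetical.

```python
import re
from collections import Counter
from ipaddress import ip_address, ip_network

# ASSUMPTION: log lines look like "... Shorewall:<chain>:<ACTION>:IN=... SRC=a.b.c.d ...".
LOG_PATTERN = re.compile(
    r"Shorewall:(?P<chain>[^:]+):(?P<action>[A-Z]+):.*?\bSRC=(?P<src>\d+\.\d+\.\d+\.\d+)"
)

# Hypothetical internal address range; substitute the real local network.
INTERNAL_NET = ip_network("192.168.0.0/16")

# Made-up sample lines, for illustration only.
SAMPLE_LOG = [
    "Nov 17 20:01:02 fw kernel: Shorewall:loc2net:REJECT:IN=eth1 OUT=eth0 SRC=192.168.1.50 DST=10.9.8.7 PROTO=TCP DPT=445",
    "Nov 17 20:01:03 fw kernel: Shorewall:net2all:DROP:IN=eth0 OUT= SRC=203.0.113.9 DST=198.51.100.1 PROTO=UDP DPT=1434",
    "Nov 17 20:01:04 fw kernel: Shorewall:loc2net:REJECT:IN=eth1 OUT=eth0 SRC=192.168.1.50 DST=10.9.8.8 PROTO=TCP DPT=445",
    "Nov 17 20:01:05 fw sshd[123]: unrelated line",
]


def summarize(lines, internal_net=INTERNAL_NET):
    """Count log entries per action, and per (action, source) for internal sources."""
    actions = Counter()
    internal_sources = Counter()
    for line in lines:
        match = LOG_PATTERN.search(line)
        if match is None:
            continue  # not a Shorewall log entry
        action = match.group("action")
        actions[action] += 1
        try:
            source = ip_address(match.group("src"))
        except ValueError:
            continue  # malformed address such as 999.1.1.1
        if source in internal_net:
            internal_sources[(action, str(source))] += 1
    return actions, internal_sources


if __name__ == "__main__":
    actions, internal_sources = summarize(SAMPLE_LOG)
    print("entries per action:", dict(actions))
    for (action, source), count in internal_sources.most_common(10):
        print(f"{count:6d}  {action:7s} {source}")
```

Sorting internal sources by count is one way to spot the hosts with broken routing tables, since those would be expected to dominate the REJECT entries.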
This may seem like a somewhat ignorant question, but I'm ignorant, so I'll ask. Is there a reason you're not running the latest 2.4 series kernel, or running a stable 2.6 series kernel to take advantage of all the new SMP work that's been done?

Also, have you checked to make sure it's not a physical hardware fault in the machines? I realize that you've swapped a few out, but nasty coincidences do happen with old hardware.
OK, this is time for a twilight moment: I have an old box at home that ran fine for god knows how long, and is now experiencing similar lockup issues. It's behind a firewall, and I honestly am not sure what kernel version it's running (2.2.something for sure, 2.2.17 maybe?). It's a totally non-critical box so I haven't even looked into the matter yet... but it has been going on for quite a while now. In my case it's almost certainly a kernel issue or some RAM going bad, but as I said, I haven't even looked at it.
On 17 Nov 2004 at 19:37, Gary Buckmaster wrote:
> This may seem like a somewhat ignorant question, but I'm ignorant, so
> I'll ask. Is there a reason you're not running the latest 2.4 series
> kernel, or running a stable 2.6 series kernel to take advantage of all
> the new SMP work that's been done?

Most of our linux machines are not SMP, and I'm fairly cautious of major new releases, so 2.4 remains the newest production kernel here. We also still run NT4.0 servers for similar reasons, and they are still rock solid - I can't say the same for win2k/xp.

> Also, have you checked to make sure its not a physical hardware fault
> in the machines? I realize that you've swapped a few out, but nasty
> coincidences do happen with old hardware.

I haven't ruled anything out yet, but two SMP machines running the same kernel and SCSI cards seemed like a possible link until it locked up on the newer kernel also. However, the newly improved *single* CPU machine (after I pulled the 2nd one) has now survived overnight, rather than the 2-4 hours I was seeing yesterday.

-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
Shawn Wright, I.T. Manager
Shawnigan Lake School
http://www.sls.bc.ca
swright@sls.bc.ca
On Thu, 2004-11-18 at 09:28 -0800, Shawn Wright wrote:
> I haven't ruled anything out yet, but two SMP machines running the same
> kernel and SCSI cards seemed like a possible link until it locked up on the
> newer kernel also. However, the newly improved *single* CPU machine
> (after I pulled the 2nd one) has now survived overnight, rather than the
> 2-4 hours I was seeing yesterday.

Netfilter itself also has a history of SMP-related bugs.

-Tom
--
Tom Eastep    \ Nothing is foolproof to a sufficiently talented fool
Shoreline,     \ http://shorewall.net
Washington USA  \ teastep@shorewall.net
PGP Public Key   \ https://lists.shorewall.net/teastep.pgp.key
On 18 Nov 2004 at 10:38, Tom Eastep wrote:
> Netfilter itself also has a history of SMP-related bugs.

I was wondering about that, but had not yet done any research into it. I am still trying to recoup some of the lost sleep from the past few days. I think the best option is to avoid SMP for now. Interestingly, I've yet to find a windows firewall that can handle SMP properly either, since I run an SMP windows box at home. I'm still trying to find time to build a proper (linux) firewall for home.

Thanks for the tip.

-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
Shawn Wright, I.T. Manager
Shawnigan Lake School
http://www.sls.bc.ca
swright@sls.bc.ca