Hi folks,

Wondering if this rings any bells for anyone:

After upgrading a handful of web servers from FreeBSD 4.11 with ipfw to 6.1-STABLE with pf, customers started reporting that occasionally their server side scripts would fail to connect to the SQL servers (which are still 4.11 and are attached via a separate dedicated gigabit network).

A test page that makes 10,000 rapid SQL connections which connected 100% of the time before, now will usually see anywhere from one or two failed connections to a dozen or so (per 10,000).

After trying many other things first, we finally found that 'pf' seems to be the culprit.

Disabling pf with pfctl -d allows 100% of all connections to work, and as soon as we enable it we see connection failures again.

I've tried changing the pf rule set in different ways, with and without scrubbing, with and without queues, even to the point where I have a single rule that just allows everything. It doesn't seem to matter what the rules actually are, just whether or not pf is enabled.

I recompiled the kernel with pf disabled and ipfw enabled, and it works fine with 100% successful connections. We have no funky compiler options or anything like that.

Any thoughts?

Mark

--
Mark Morley
Owner / Administrator
Islandnet.com
Mark Morley wrote:
> Wondering if this rings any bells for anyone:

Yes it does... I had been seeing similar issues for some time on a couple HP Proliant servers - saw it in 5.4 as well - but have been attributing this to driver related issues (the bge driver in particular, which has seen many changes, fixes and enhancements in relatively recent history).

In trying to isolate that particular problem I had been applying kernel updates regularly, pf was disabled along with a few other things (also switched from using mpd/netgraph to openvpn/udp), and the problem vanished at some point in between. I cannot definitely name pf as being the culprit as no testing of this was done at the time to confirm it. I had assumed the bge driver changes were responsible for things now working as they should.

In addition to the occasional connection failure, I've also seen established connections broken (ssh, http, mysql/ssl and pptp/gre). This was causing havoc with mysql replication over the link, which became very brittle, and required manual fixing (it would get stuck, unable to read the last event in its relay log whenever a disconnection occurred and had to be manually pushed onto the next - mysql 5.0.[3 - .11 or so]).
On Wed, Jun 07, 2006 at 04:25:37PM -0700, Mark Morley wrote:
> Disabling pf with pfctl -d allows 100% of all connections to work, and
> as soon as we enable it we see connection failures again.
>
> I've tried changing the pf rule set in different ways, with and without
> scrubbing, with and without queues, even to the point where I have a single
> rule that just allows everything. It doesn't seem to matter what the rules
> actually are, just whether or not pf is enabled.

Was that single pass rule using 'keep state'? There is a default limit of 10,000 state entries (configurable with 'set limit states' in pf.conf). A state entry persists for several seconds even after a connection is closed, so quickly establishing 10,000 connections could easily hit that limit.

Enable pf and load an empty ruleset (pfctl -e -Fa). Note the output of pfctl -si. Then repeat the test. Then run pfctl -si again, and compare the output with the previous one. Are any counters increasing?

Daniel
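The before/after comparison Daniel describes can be done by eye, but a small script makes it harder to miss a counter. The following is only a sketch, not part of pf: `parse_counters` and `increased` are hypothetical helper names, and it assumes that every counter line in the saved pfctl -si output is indented and has the shape `name  total  [rate/s]`. That layout may differ between pf versions, so check it against real output first.

```python
import re

# Assumed line shape: leading whitespace, a counter name, an integer total,
# and optionally a rate such as "0.1/s". Header lines without leading
# whitespace (for example "Counters") are skipped.
_LINE = re.compile(r"^\s+([A-Za-z][\w\- ]*?)\s+(\d+)(?:\s+[\d.]+/s)?\s*$")

def parse_counters(text):
    """Return {counter name: total} for every line matching the assumed shape."""
    counters = {}
    for line in text.splitlines():
        m = _LINE.match(line)
        if m:
            counters[m.group(1)] = int(m.group(2))
    return counters

def increased(before_text, after_text):
    """Return {counter name: delta} for counters that grew between two snapshots."""
    before = parse_counters(before_text)
    after = parse_counters(after_text)
    return {
        name: total - before.get(name, 0)
        for name, total in after.items()
        if total > before.get(name, 0)
    }
```

To use it, save the output of pfctl -si to a file before and after the test run, read both files, and pass their contents to `increased`. Counters such as searches and match are expected to grow with any traffic, so the interesting entries are the ones that stay at zero when everything works.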
> A test page that makes 10,000 rapid SQL connections which
> connected 100% of the time before, now will usually see
> anywhere from one or two failed connections to a dozen or so
> (per 10,000)

Have you kept track of state table entries during this process with pfctl -si? You may find that you need to increase 'set limit states' from the default as a consequence.

Greg
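Whether the default limit is likely to be reached can be estimated on the back of an envelope. The sketch below assumes one state entry per connection, a steady connection rate, and a fixed time that each entry stays in the table; `peak_states` is a hypothetical helper, and the connection rates are made-up illustration values, not measurements. The 10,000 default and the 90-second lifetime of closed entries are the figures quoted elsewhere in this thread.

```python
def peak_states(total_conns, conns_per_sec, linger_sec):
    """Rough upper bound on state entries held at the same time.

    At a steady rate, about conns_per_sec * linger_sec entries are alive
    at once, but never more than the number of connections in the test.
    """
    return min(total_conns, int(conns_per_sec * linger_sec))

DEFAULT_STATE_LIMIT = 10_000  # default quoted in this thread

# Hypothetical rates for the 10,000-connection test page.
for rate in (50, 500):
    peak = peak_states(10_000, rate, 90)
    status = "reaches the limit" if peak >= DEFAULT_STATE_LIMIT else "stays below the limit"
    print(f"{rate} conn/s -> about {peak} states, {status}")
```

Under these assumptions a fast test run keeps nearly all 10,000 entries alive at once, which leaves no room for any other traffic on the same firewall.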
On Wed, Jun 07, 2006 at 04:25:37PM -0700, Mark Morley wrote:
> I recompiled the kernel with pf disabled and ipfw enabled, and it works
> fine with 100% successful connections. We have no funky compiler options
> or anything like that.
>
> Any thoughts?

Could you show us the following:
- pf.conf
- kernel configuration file
- uname -a

Next time, please include technical information along with the textual description of your problem.

Bye,

Gergely Czuczy
mailto: gergely.czuczy@harmless.hu
PGP: http://phoemix.harmless.hu/phoemix.pgp

Weenies test. Geniuses solve problems that arise.
Hi.

I'm not sure it is related to your case, but I have seen a situation where a load-testing application running on an MS Windows box failed to establish HTTP connections to the web server under test. Investigation showed that this happens because Windows reuses source TCP port numbers for these outbound connections relatively quickly. I'm not sure whether Microsoft violates the TCP standard with that or not. The fact is that pf keeps "closed" entries in the state table for 90 seconds, so it still remembers the old source port when Windows sends a SYN from it to establish a new connection. As a result, pf considers that packet invalid and drops it.

You can check pfctl -s info. In my case the state-mismatch counter was increasing with every failed connection. In any case, the output of that tool can be very useful to you: if you see one of the counters for dropped packets increasing, you will have an idea why.

Regards,
Dmitry Andrianov

PS: my problem was solved by adding the following lines to pf.conf:

    # set short timeout for TCP closed state because Windows tends to reuse
    # the same outgoing port very quickly and pf starts refusing new connections
    # because of invalid state
    # (This occurs when load testing DMZ server from LAN)
    set timeout { tcp.closed 15 }

-----Original Message-----
From: owner-freebsd-pf@freebsd.org [mailto:owner-freebsd-pf@freebsd.org] On Behalf Of Mark Morley
Sent: Thursday, June 08, 2006 3:26 AM
To: freebsd-pf@freebsd.org; freebsd-stable@freebsd.org
Subject: pf buggy on 6.1-STABLE?

Hi folks,

Wondering if this rings any bells for anyone:

[...]
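The port-reuse effect Dmitry describes can be illustrated with a toy model. This is a deliberate simplification and not how pf is implemented: `ToyStateTable` and its methods are hypothetical names, the model tracks only the time at which a connection was closed, and real state tracking looks at much more than the address and port tuple. It only shows why a shorter closed-state timeout lets a quickly reused source port through.

```python
class ToyStateTable:
    """Toy model: a closed connection's entry lingers for closed_timeout seconds."""

    def __init__(self, closed_timeout):
        self.closed_timeout = closed_timeout
        self.closed_at = {}      # (src_ip, src_port, dst_ip, dst_port) -> close time
        self.state_mismatch = 0  # stands in for the state-mismatch counter

    def close(self, key, now):
        """Record that the connection identified by key was closed at time now."""
        self.closed_at[key] = now

    def syn(self, key, now):
        """Return True if a new SYN for key is accepted, False if it is dropped."""
        closed = self.closed_at.get(key)
        if closed is not None and now - closed < self.closed_timeout:
            # The old entry has not expired yet, so the new SYN does not fit it.
            self.state_mismatch += 1
            return False
        self.closed_at.pop(key, None)  # entry expired or never existed
        return True


# Hypothetical addresses and ports; the client reuses its port 20 seconds later.
key = ("192.0.2.10", 1025, "192.0.2.20", 3306)
for timeout in (90, 15):
    table = ToyStateTable(closed_timeout=timeout)
    table.close(key, now=0)
    accepted = table.syn(key, now=20)
    print(f"timeout={timeout}s accepted={accepted} mismatches={table.state_mismatch}")
```

With the 90-second timeout the reused port is rejected and the mismatch counter goes up, while with the 15-second timeout from the PS above the same SYN is accepted.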
Mark Morley wrote:
> Hi folks,
>
> Wondering if this rings any bells for anyone:
>
> After upgrading a handful of web servers from FreeBSD 4.11 with ipfw
> to 6.1-STABLE with pf, customers started reporting that occasionally
> their server side scripts would fail to connect to the SQL servers
> (which are still 4.11 and are attached via a separate dedicated
> gigabit network).
>
> A test page that makes 10,000 rapid SQL connections which connected
> 100% of the time before, now will usually see anywhere from one or two
> failed connections to a dozen or so (per 10,000)
>
> After trying many other things first, we finally found that 'pf' seems
> to be the culprit.

I've experienced the same. If you have a lot of concurrent connections going on, it seems that every so often a connection will be blocked, even if it doesn't match any rule.

In my case I experienced this with apache22 acting as a reverse proxy/virtual host.

Symptoms:
1. Sudden burst of traffic to a specific virtual host.
2. After some time, normally <30 seconds, one of the connection attempts is reset.
3. Apache immediately stops proxying for any subsequent connections and returns a 'too busy' message.

The project this was related to got shelved, so it hasn't bothered me again yet, but I didn't find any workaround.

> Disabling pf with pfctl -d allows 100% of all connections to work, and
> as soon as we enable it we see connection failures again.

Snap.

> I've tried changing the pf rule set in different ways, with and
> without scrubbing, with and without queues, even to the point where I
> have a single rule that just allows everything. It doesn't seem to
> matter what the rules actually are, just whether or not pf is enabled.

Same as me.

> I recompiled the kernel with pf disabled and ipfw enabled, and it
> works fine with 100% successful connections. We have no funky compiler
> options or anything like that.
>
> Any thoughts?

Cheers, Dom