Jeremy Chadwick
2011-May-03 01:59 UTC
RELENG_8 pf stack issue (state count spiraling out of control)
(Please keep me CC'd as I'm not subscribed to freebsd-pf. And apologies for cross-posting, but the issue is severe enough that I wanted to make it known on -stable)

The below issue I'm describing is from a machine running 8.2-PRERELEASE (RELENG_8) using src dated Tue Feb 15 05:46:02 PST 2011.

Please read the story in full, as I have taken the time to describe everything I did, plus log output, as well as induce a panic via "call doadump" from ddb so I have a capture of the system at the time. I also have a theory as to what caused the problem, but how to trigger it is unknown; it may be a rare race condition.

This morning I woke up to find a report from one of our users that he could not connect to a specific TCP port (not SSH) on one of our servers. I also found that I couldn't SSH into the same box. Serial console was working fine, and the serial console log showed no sign of any problems.

I started to debug the issue of me not being able to SSH into the machine and within a few minutes became immediately concerned: pfctl indicated we had reached the maximum number of permitted state table entries (10,000).

===========================================================
# pfctl -s info
Status: Enabled for 76 days 06:49:10           Debug: Urgent

Interface Stats for em0               IPv4             IPv6
  Bytes In                      8969748840                0
  Bytes Out                     8296135477                0
  Packets In
    Passed                       128211763                0
    Blocked                         621379                0
  Packets Out
    Passed                       138483868                0
    Blocked                           2579                0

State Table                          Total             Rate
  current entries                    10000
  searches                       267316807           40.6/s
  inserts                          4440553            0.7/s
  removals                         4430553            0.7/s
Counters
  match                            5067474            0.8/s
  bad-offset                             0            0.0/s
  fragment                             324            0.0/s
  short                                  0            0.0/s
  normalize                             32            0.0/s
  memory                            336946            0.1/s
  bad-timestamp                          0            0.0/s
  congestion                             0            0.0/s
  ip-option                              0            0.0/s
  proto-cksum                         1611            0.0/s
  state-mismatch                       509            0.0/s
  state-insert                           0            0.0/s
  state-limit                            0            0.0/s
  src-limit                              0            0.0/s
  synproxy                               0            0.0/s

# pfctl -s memory
states        hard limit    10000
src-nodes     hard limit    10000
frags         hard limit     5000
tables        hard limit     1000
table-entries hard limit   100000
===========================================================

The above is mainly for em0 (our WAN interface); our LAN interface (em1) was not impacted because we use "set skip on em1". And it's a good thing too: we have lots of LAN-based services that this machine provides that would have been impacted. We also use "set skip on lo0".

I immediately went to look at our monitoring graphs, which monitor pf state (specifically state table entries), polled via bsnmpd(8). This data is even more frightening:

http://jdc.parodius.com/freebsd/pf-issue/pf_states-day.png
http://jdc.parodius.com/freebsd/pf-issue/pf_states-week.png

Literally, something was spiraling out of control, starting at approx. 2011/05/01 (Sun) at 12:30 PDT. The situation became dire at approx. 19:45 PDT the same day, but I wasn't aware of it until said user brought an issue to my attention.

You can see from the network I/O graphs (taken from SNMP polling our switch, NOT from the host/box itself) that there was no DoS attack or anything like that occurring -- this was something within FreeBSD itself. More evidence of that will become apparent.

http://jdc.parodius.com/freebsd/pf-issue/port_03-day.png
http://jdc.parodius.com/freebsd/pf-issue/port_03-week.png

The first thing I did was "/etc/rc.d/pf reload". This command hung. Any attempt to send Ctrl-C/SIGINT did nothing. I was able to Ctrl-Z/SIGSTOP it, then use kill %1, but the actual reload process did not truly die (despite csh stating "Terminated"). The only way to kill it was to kill -9.
Attempts to shut down any daemons which utilised the network -- including things that only used em1 -- would not shut down. This included things like postfix, mysqld, and some inet-based services. I was forced to kill -9 them. Things like bsnmpd, however, did shut down.

Equally uncomfortable, "shutdown -r now" did not reboot the system. That is to say, wall(1)'s announcement was shown, but the actual stopping of services did not begin.

The next thing I tried was "/etc/rc.d/pf stop", which worked. Then I did "/etc/rc.d/pf start", which also worked. However, what I saw next surely indicated a bug in the pf layer somewhere -- "pfctl -s states" and "pfctl -s info" disagreed on the state count:

===========================================================
# pfctl -s info
Status: Enabled for 0 days 00:00:16            Debug: Urgent

Interface Stats for em0               IPv4             IPv6
  Bytes In                            3459                0
  Bytes Out                              0                0
  Packets In
    Passed                                 0                0
    Blocked                               29                0
  Packets Out
    Passed                                 0                0
    Blocked                                0                0

State Table                          Total             Rate
  current entries                    10000
  searches                              29            1.8/s
  inserts                                0            0.0/s
  removals                               0            0.0/s
Counters
  match                                 29            1.8/s
  bad-offset                             0            0.0/s
  fragment                               0            0.0/s
  short                                  0            0.0/s
  normalize                              0            0.0/s
  memory                                18            1.1/s
  bad-timestamp                          0            0.0/s
  congestion                             0            0.0/s
  ip-option                              0            0.0/s
  proto-cksum                            0            0.0/s
  state-mismatch                         0            0.0/s
  state-insert                           0            0.0/s
  state-limit                            0            0.0/s
  src-limit                              0            0.0/s
  synproxy                               0            0.0/s

# pfctl -s state | wc -l
       0
===========================================================

The "pf uptime" shown above, by the way, matches system uptime.

I then attempted "pfctl -F state", but nothing changed (looked the same as above).

Since I could not reboot the box, I was forced to drop to ddb via serial console. I did some commands like "ps" and the like, then "call doadump" to induce a kernel panic, and then "reboot" (which worked).

Once the machine came back up, savecore(8) ran, wrote the data out, and everything has been fine since. /var/crash/core.txt.0 is ~68 KBytes and I do not feel comfortable sharing its content publicly, but will be happy to hand it to developer(s) who are interested. Relevant tidbits I can discern:

------------------------------------------------------------------------
ps -axl

UID  PID PPID CPU PRI NI VSZ RSS MWCHAN STAT TT              TIME COMMAND
  0  422    0   0 -16  0   0   0 pftm   DL   ??  1362773081:04.00 [pfpurge]
------------------------------------------------------------------------

------------------------------------------------------------------------
vmstat -z

ITEM           SIZE    LIMIT     USED     FREE  REQUESTS  FAILURES
pfsrctrpl:      152,   10000,       0,       0,        0,        0
pfrulepl:       912,       0,      40,      88,      806,        0
pfstatepl:      392,   10000,   10000,       0,  4440553,   341638
pfaltqpl:       240,       0,       0,       0,        0,        0
pfpooladdrpl:    88,       0,       0,       0,        0,        0
pfrktable:     1296,    1002,       4,      20,      112,        0
pfrkentry:      216,  100008,     603,     891,    15384,        0
pfrkentry2:     216,       0,       0,       0,        0,        0
pffrent:         32,    5050,       0,     303,     1620,        0
pffrag:          80,       0,       0,     135,      807,        0
pffrcache:       80,   10035,       0,       0,        0,        0
pffrcent:        24,   50022,       0,       0,        0,        0
pfstatescrub:    40,       0,       0,       0,        0,        0
pfiaddrpl:      120,       0,       0,       0,        0,        0
pfospfen:       112,       0,     696,      30,    18096,        0
pfosfp:          40,       0,     407,      97,    10582,        0
------------------------------------------------------------------------

You can see evidence of processes not exiting/doing what they should do here:

------------------------------------------------------------------------
fstat

USER CMD      PID   FD   MOUNT     INUM  MODE        SZ|DV   R/W
root shutdown 91155 root /             2 drwxr-xr-x    512   r
root shutdown 91155 wd   /             2 drwxr-xr-x    512   r
root shutdown 91155 text /         47195 -r-sr-x---  15912   r
root shutdown 91155 0    /dev         38 crw-------  ttyu0   rw
root shutdown 91155 1    /dev         38 crw-------  ttyu0   rw
root shutdown 91155 2    /dev         38 crw-------  ttyu0   rw
root sh       91129 root /             2 drwxr-xr-x    512   r
root sh       91129 wd   /             2 drwxr-xr-x    512   r
root sh       91129 text /            44 -r-xr-xr-x 134848   r
root sh       91129 0    /dev         38 crw-------  ttyu0   rw
root sh       91129 1*   pipe ffffff01e78fc9e0 <-> ffffff01e78fc888  0 rw
root sh       91129 2    /dev         20 crw-rw-rw-   null   w
root shutdown 91115 root /             2 drwxr-xr-x    512   r
root shutdown 91115 wd   /storage      5 drwx------     37   r
root shutdown 91115 text /         47195 -r-sr-x---  15912   r
root shutdown 91115 0    /dev         38 crw-------  ttyu0   rw
root shutdown 91115 1    /dev         38 crw-------  ttyu0   rw
root shutdown 91115 2    /dev         38 crw-------  ttyu0   rw
root shutdown 91115 3*   local dgram ffffff008ff92960
root sh       90818 root /             2 drwxr-xr-x    512   r
root sh       90818 wd   /         70659 drwxr-xr-x    512   r
root sh       90818 text /            44 -r-xr-xr-x 134848   r
root sh       90818 0    /dev         38 crw-------  ttyu0   rw
root sh       90818 1*   pipe ffffff0043f1ecb8 <-> ffffff0043f1eb60  0 rw
root sh       90818 2    /dev         20 crw-rw-rw-   null   w
root csh      90802 root /             2 drwxr-xr-x    512   r
root csh      90802 wd   /             2 drwxr-xr-x    512   r
root csh      90802 text /            51 -r-xr-xr-x 358752   r
root csh      90802 15   /dev         38 crw-------  ttyu0   rw
root csh      90802 16   /dev         38 crw-------  ttyu0   rw
root csh      90802 17   /dev         38 crw-------  ttyu0   rw
root csh      90802 18   /dev         38 crw-------  ttyu0   rw
root csh      90802 19   /dev         38 crw-------  ttyu0   rw
------------------------------------------------------------------------

No indication of mbuf exhaustion, putting further focus on the pf stack:

------------------------------------------------------------------------
netstat -m

2054/1786/3840 mbufs in use (current/cache/total)
2048/1414/3462/25600 mbuf clusters in use (current/cache/total/max)
2048/896 mbuf+clusters out of packet secondary zone in use (current/cache)
0/320/320/12800 4k (page size) jumbo clusters in use (current/cache/total/max)
0/0/0/19200 9k jumbo clusters in use (current/cache/total/max)
0/0/0/12800 16k jumbo clusters in use (current/cache/total/max)
4609K/4554K/9164K bytes allocated to network (current/cache/total)
0/0/0 requests for mbufs denied (mbufs/clusters/mbuf+clusters)
0/0/0 requests for jumbo clusters denied (4k/9k/16k)
0 requests for sfbufs denied
0 requests for sfbufs delayed
0 requests for I/O initiated by sendfile
0 calls to protocol drain routines
------------------------------------------------------------------------

Here's one piece of core.0.txt which makes no sense to me -- the "rate" column. I have a very hard time believing that was the interrupt rate of all the relevant devices at the time (way too high). Maybe this data becomes wrong only during a coredump? The total column I could believe.

------------------------------------------------------------------------
vmstat -i

interrupt                          total       rate
irq4: uart0                        54768        912
irq6: fdc0                             1          0
irq17: uhci1+                        172          2
irq23: uhci3 ehci1+                 2367         39
cpu0: timer                  13183882632  219731377
irq256: em0                    260491055    4341517
irq257: em1                    127555036    2125917
irq258: ahci0                  225923164    3765386
cpu2: timer                  13183881837  219731363
cpu1: timer                  13002196469  216703274
cpu3: timer                  13183881783  219731363
Total                        53167869284  886131154
------------------------------------------------------------------------

Here's what a normal "vmstat -i" shows from the command-line:

# vmstat -i
interrupt                          total       rate
irq4: uart0                          518          0
irq6: fdc0                             1          0
irq23: uhci3 ehci1+                  145          0
cpu0: timer                     19041199       1999
irq256: em0                       614280         64
irq257: em1                       168529         17
irq258: ahci0                     355536         37
cpu2: timer                     19040462       1999
cpu1: timer                     19040458       1999
cpu3: timer                     19040454       1999
Total                           77301582       8119

We graph many aspects of this box, including CPU load, memory/swap usage, etc., and none show any sign that the interrupt rate on all of those devices was even remotely out of control. (I would expect to see CPU through the roof given the above data.)

I have since rebuilt/reinstalled world/kernel on the machine with the latest RELENG_8 code (the box is now 8.2-STABLE #0: Mon May 2 14:44:18 PDT 2011), hoping whatever this was may have been fixed.

As for what I think may have triggered it -- though I have no hard evidence of this: on April 29th, I changed our pf.conf and did "/etc/rc.d/pf reload". The pf.conf change was a single line:

Old: scrub on em0 all
New: scrub in on em0 all

Why it took the problem approximately 3 days to start is unknown. It's the only change we've made to the system (truly/honestly), and it was a change to pf.conf.

If anyone has advice (or has seen the above problem), or is interested in debugging it -- as I said, I have a vmcore -- I'm happy to assist in any way I can. I would hate for someone else to get bit by this, and I really am hoping it's something that has been fixed between February and now.

-- 
| Jeremy Chadwick                                 jdc@parodius.com |
| Parodius Networking                     http://www.parodius.com/ |
| UNIX Systems Administrator                Mountain View, CA, USA |
| Making life hard for others since 1977.          PGP 4BD6C0CB    |
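[As background for the 10,000 figure quoted above: pf's state hard limit is configured with pf.conf's "set limit" directive, and 10,000 has long been the default. The lines below are an illustrative sketch of that directive only, not the configuration from this report:]

# hypothetical pf.conf fragment (not from the original report)
set limit states 10000
# or several limits at once:
set limit { states 10000, src-nodes 10000, frags 5000 }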
Jason Hellenthal
2011-May-03 05:06 UTC
RELENG_8 pf stack issue (state count spiraling out of control)
Jeremy,

On Mon, May 02, 2011 at 06:58:54PM -0700, Jeremy Chadwick wrote:

> [...]
That's quite the deduction there. I've noticed recently that you were also experimenting with the new NFS server, recompiling the kernel, etc. Seeing as weird things can happen with DNS, NFS and mountpoints, is this the same machine that you were doing that on? If so, can you check how many NFS operations were done to/from that box, as well as which names would have been resolved and whether that machine can resolve them?

Also, I believe you are using tables in your pf.conf; if so, do any of those tables contain a FQDN that cannot be resolved from that machine?

I think you probably see what I am getting at here: it could be some sort of concurrent recursive DNS failure that can only be seen from the machine, caused possibly by the new NFS backend or by a change in one of the tables that pf would use.

-- 
Regards, (jhell)
Jason Hellenthal
Vlad Galu
2011-May-03 05:53 UTC
RELENG_8 pf stack issue (state count spiraling out of control)
On Tue, May 3, 2011 at 3:58 AM, Jeremy Chadwick <freebsd@jdc.parodius.com> wrote:

> [...]
I'm seeing this as well. You could change your scrub rules so that you specifically avoid TCP reassembly (that creates states).

-- 
Good, fast & cheap. Pick any two.
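[For concreteness, a scrub rule narrowed along the lines Vlad suggests -- limited explicitly to IP fragment reassembly, with no "reassemble tcp" modifier -- might look like the line below. This is a sketch based on the stock pf.conf(5) scrub syntax of that era, not the rule actually in use on the affected box:]

# hypothetical pf.conf fragment; the report itself only shows "scrub in on em0 all"
scrub in on em0 all fragment reassemble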
Daniel Hartmeier
2011-May-03 07:29 UTC
RELENG_8 pf stack issue (state count spiraling out of control)
I read those graphs differently: the problem doesn't arise slowly, but rather seems to start suddenly at 13:00.

Right after 13:00, traffic on em0 drops, i.e. the firewall seems to stop forwarding packets completely. Yet, at the same time, the states start to increase, almost linearly at about one state every two seconds, until the limit of 10,000 is reached.

Reaching the limit seems to be only a side-effect of a problem that started at 13:00.

> Here's one piece of core.0.txt which makes no sense to me -- the "rate"
> column. I have a very hard time believing that was the interrupt rate
> of all the relevant devices at the time (way too high). Maybe this data
> becomes wrong only during a coredump? The total column I could believe.
>
> ------------------------------------------------------------------------
> vmstat -i
>
> interrupt                          total       rate
> irq4: uart0                        54768        912
> irq6: fdc0                             1          0
> irq17: uhci1+                        172          2
> irq23: uhci3 ehci1+                 2367         39
> cpu0: timer                  13183882632  219731377
> irq256: em0                    260491055    4341517
> irq257: em1                    127555036    2125917
> irq258: ahci0                  225923164    3765386
> cpu2: timer                  13183881837  219731363
> cpu1: timer                  13002196469  216703274
> cpu3: timer                  13183881783  219731363
> Total                        53167869284  886131154
> ------------------------------------------------------------------------

I find this suspect as well, but I don't have an explanation yet. Are you using anything non-GENERIC related to timers, like changing HZ or enabling polling?

Are you sure the problem didn't start right at 13:00 and cause complete packet loss for the entire period, rather than growing gradually worse?

Daniel
Daniel Hartmeier
2011-May-03 08:48 UTC
RELENG_8 pf stack issue (state count spiraling out of control)
On Mon, May 02, 2011 at 06:58:54PM -0700, Jeremy Chadwick wrote:

> Status: Enabled for 76 days 06:49:10           Debug: Urgent

> The "pf uptime" shown above, by the way, matches system uptime.

> ps -axl
>
> UID  PID PPID CPU PRI NI VSZ RSS MWCHAN STAT TT              TIME COMMAND
>   0  422    0   0 -16  0   0   0 pftm   DL   ??  1362773081:04.00 [pfpurge]

This looks weird, too. 1362773081 minutes would be >2500 years.

Usually, you should see [idle] with almost the uptime in minutes, and [pfpurge] with much less, as in:

# uptime
10:22AM  up 87 days, 19:36, 1 user, load averages: 0.00, 0.03, 0.05
# echo "((87*24)+19)*60+36" | bc
126456
# ps -axl
UID  PID PPID CPU PRI NI VSZ RSS MWCHAN STAT TT          TIME COMMAND
  0    7    0   0  44  0   0   8 pftm   DL   ??       0:13.16 [pfpurge]
  0   11    0   0 171  0   0   8 -      RL   ??  124311:23.04 [idle]

How is time handled on your machine? ntpdate on boot and then ntpd? Any manual time changes since the last boot?

Daniel
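[For the record, the same bc arithmetic used above puts that [pfpurge] TIME figure at roughly 2592 years, confirming the ">2500 years" estimate:]

# echo "1362773081/60/24/365" | bc
2592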
Daniel Hartmeier
2011-May-03 09:23 UTC
RELENG_8 pf stack issue (state count spiraling out of control)
On Mon, May 02, 2011 at 06:58:54PM -0700, Jeremy Chadwick wrote:

> Here's one piece of core.0.txt which makes no sense to me -- the "rate"
> column. I have a very hard time believing that was the interrupt rate
> of all the relevant devices at the time (way too high). Maybe this data
> becomes wrong only during a coredump? The total column I could believe.
>
> ------------------------------------------------------------------------
> vmstat -i
>
> interrupt                          total       rate
> irq4: uart0                        54768        912
> irq6: fdc0                             1          0
> irq17: uhci1+                        172          2
> irq23: uhci3 ehci1+                 2367         39
> cpu0: timer                  13183882632  219731377
> irq256: em0                    260491055    4341517
> irq257: em1                    127555036    2125917
> irq258: ahci0                  225923164    3765386
> cpu2: timer                  13183881837  219731363
> cpu1: timer                  13002196469  216703274
> cpu3: timer                  13183881783  219731363
> Total                        53167869284  886131154
> ------------------------------------------------------------------------
>
> Here's what a normal "vmstat -i" shows from the command-line:
>
> # vmstat -i
> interrupt                          total       rate
> irq4: uart0                          518          0
> irq6: fdc0                             1          0
> irq23: uhci3 ehci1+                  145          0
> cpu0: timer                     19041199       1999
> irq256: em0                       614280         64
> irq257: em1                       168529         17
> irq258: ahci0                     355536         37
> cpu2: timer                     19040462       1999
> cpu1: timer                     19040458       1999
> cpu3: timer                     19040454       1999
> Total                           77301582       8119

The cpu0-3 timer totals seem consistent in the first output: 13183881783/1999/60/60/24 matches 76 days of uptime.

The high rate in the first output comes from vmstat.c dointr()'s division of the total by the uptime:

	struct timespec sp;

	clock_gettime(CLOCK_MONOTONIC, &sp);
	uptime = sp.tv_sec;
	for (i = 0; i < nintr; i++) {
		printf("%-*s %20lu %10lu\n", istrnamlen, intrname,
		    *intrcnt, *intrcnt / uptime);
	}

From this we can deduce that the value of uptime must have been 13183881783/219731363 = 60 (seconds).

Since the uptime was 76 days (and not just 60 seconds), the CLOCK_MONOTONIC clock must have reset, wrapped, or been overwritten. I don't know how that's possible, but if this means that the kernel variable time_second was possibly going back, that could very well have messed up pf's state purging.

Daniel
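[If anyone wants to check this on a live system, a small userland sketch along these lines compares CLOCK_MONOTONIC (the value vmstat divides by) against wall-clock time minus kern.boottime; on a healthy box the two agree to within a few seconds, while a large gap would point at the monotonic clock having reset or wrapped. This is a hypothetical helper, not something from the original thread:]

/* clkchk.c -- compare CLOCK_MONOTONIC with (wall clock - kern.boottime).
 * Hypothetical diagnostic: cc -o clkchk clkchk.c && ./clkchk */
#include <sys/types.h>
#include <sys/sysctl.h>
#include <sys/time.h>
#include <stdio.h>
#include <time.h>

int
main(void)
{
	struct timespec mono, real;
	struct timeval boot;
	size_t len = sizeof(boot);

	if (clock_gettime(CLOCK_MONOTONIC, &mono) != 0 ||
	    clock_gettime(CLOCK_REALTIME, &real) != 0 ||
	    sysctlbyname("kern.boottime", &boot, &len, NULL, 0) != 0) {
		perror("clkchk");
		return (1);
	}
	printf("CLOCK_MONOTONIC:            %ld s\n", (long)mono.tv_sec);
	printf("wall clock - kern.boottime: %ld s\n",
	    (long)(real.tv_sec - boot.tv_sec));
	/* A large mismatch here would mean the monotonic clock reset,
	 * wrapped, or was overwritten, as deduced above. */
	return (0);
}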
Daniel Hartmeier
2011-May-03 10:40 UTC
RELENG_8 pf stack issue (state count spiraling out of control)
On Mon, May 02, 2011 at 06:58:54PM -0700, Jeremy Chadwick wrote:

> The next thing I tried was "/etc/rc.d/pf stop", which worked. Then I
> did "/etc/rc.d/pf start", which also worked. However, what I saw next
> surely indicated a bug in the pf layer somewhere -- "pfctl -s states"
> and "pfctl -s info" disagreed on the state count:

This can be explained. Note that "/etc/rc.d/pf start" first flushes all states by calling pfctl -F all. This calls pf_unlink_state() for every state in the kernel, which marks each state with PFTM_UNLINKED, but doesn't free it yet.

Such states do not show up in pfctl -s state output, but are still counted in pfctl -s info output.

Normally, they are freed the next time the pfpurge thread runs (once per second). It looks like the pfpurge thread was either

a) sleeping indefinitely, not returning once a second from

	tsleep(pf_purge_thread, PWAIT, "pftm", 1 * hz);

or

b) constantly failing to acquire a lock with

	if (!sx_try_upgrade(&pf_consistency_lock))
		return (0);

Maybe a) is possible when CLOCK_MONOTONIC is decreasing? And maybe that explains the "POKED TIMER" messages you get from BIND, too?

Kind regards,
Daniel
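[To make the accounting above concrete, here is a small userland toy model of the mechanism Daniel describes. It is NOT pf source code, just an illustration of why "pfctl -s info" can keep reporting 10,000 entries while "pfctl -s state" lists none, for as long as the purge pass never actually runs:]

/* statecount.c -- toy model of unlinked-but-not-freed pf states.
 * Hypothetical illustration, not pf source code. */
#include <stdio.h>

#define NSTATES 10000

struct state {
	int in_use;     /* counted by "pfctl -s info"          */
	int unlinked;   /* hidden from the "pfctl -s state" list */
};

static struct state table[NSTATES];

static int
count_info(void)                 /* what "current entries" reports */
{
	int i, n = 0;

	for (i = 0; i < NSTATES; i++)
		n += table[i].in_use;
	return (n);
}

static int
count_listing(void)              /* what "pfctl -s state" would list */
{
	int i, n = 0;

	for (i = 0; i < NSTATES; i++)
		n += table[i].in_use && !table[i].unlinked;
	return (n);
}

static void
flush_all(void)                  /* pf_unlink_state() for every state */
{
	int i;

	for (i = 0; i < NSTATES; i++)
		if (table[i].in_use)
			table[i].unlinked = 1;
}

static void
purge_pass(void)                 /* what the pfpurge thread should do */
{
	int i;

	for (i = 0; i < NSTATES; i++)
		if (table[i].unlinked)
			table[i].in_use = table[i].unlinked = 0;
}

int
main(void)
{
	int i;

	for (i = 0; i < NSTATES; i++)   /* the table is completely full */
		table[i].in_use = 1;

	flush_all();                    /* pfctl -F all                  */
	printf("after flush: info=%d listing=%d\n",
	    count_info(), count_listing());     /* 10000 vs 0, as observed */

	purge_pass();                   /* never ran on the affected box */
	printf("after purge: info=%d listing=%d\n",
	    count_info(), count_listing());     /* both drop to 0          */
	return (0);
}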