thr3ads.net - freebsd stable - pf killing NFS [Dec 2006]

Charles Sprickman

2006-Dec-12 22:10 UTC

pf killing NFS

Hi all,

I'm running a 6.2-RC1 box (cvsup'd today) that has two broadcom nics. One
is an internal network (nfs) and the other is external.

PF has this rule for all traffic on the private net:

[root@archive /home/jails]# pfctl -sr|grep bge1
pass in quick on bge1 inet from 192.168.1.0/24 to any
pass out quick on bge1 inet from any to 192.168.1.0/24

No state since these are "quick" and symmetrical.

Doing something like "ls /usr/ports" will just hang until interrupted.
Using tcp for nfs makes it workable, but very slow.

If I disable pf (pfctl -d), both types of mounts work, and speed is 
excellent.  I also just found that if I remove the "scrub in all" 
statement and change it to "scrub in on bge0", things are fine.

Any idea what's going on?  The tcpdump output confuses me (see "bad 
cksum!"), so I'm posting some snippets here.

Looking at tcpdump, things look a bit odd. 192.168.1.111 is the nfs 
client (6.2-RC1), 192.168.1.100 is the nfs server (4.11):

[root@archive /home/spork]# tcpdump -i bge1 -v
tcpdump: listening on bge1, link-type EN10MB (Ethernet), capture size 96 bytes

00:59:16.269659 IP (tos 0x0, ttl  64, id 5395, offset 0, flags [none], 
proto: UDP (17), length: 132, bad cksum 0 (->e132)!)
192.168.1.111.1861387036 > 192.168.1.100.nfs: 104 access [|nfs]

bad checksum before even hitting the wire??

00:59:16.269920 IP (tos 0x0, ttl  64, id 46705, offset 0, flags [none], 
proto: UDP (17), length: 148) 192.168.1.100.nfs > 
192.168.1.111.1861387036: reply ok 120 access attr: DIR 755 ids 0/0 [|nfs]

We get a reply (dir is mode 755)

00:59:16.270010 IP (tos 0x0, ttl  64, id 5396, offset 0, flags [none], 
proto: UDP (17), length: 132, bad cksum 0 (->e131)!) 
192.168.1.111.1861387037 > 192.168.1.100.nfs: 104 access [|nfs]

Again, bad checksum FROM nfs client to server...

00:59:16.270211 IP (tos 0x0, ttl  64, id 58236, offset 0, flags [none], 
proto: UDP (17), length: 148) 192.168.1.100.nfs > 
192.168.1.111.1861387037: reply ok 120 access attr: DIR 755 ids 0/0 [|nfs]
00:59:16.270306 IP (tos 0x0, ttl  64, id 5397, offset 0, flags [none], 
proto: UDP (17), length: 132, bad cksum 0 (->e130)!) 
192.168.1.111.1861387038 > 192.168.1.100.nfs: 104 access [|nfs]

Now to confuse things further, if I disable pf (pfctl -d), speeds are 
great, but I still get these bad checksum errors:

01:04:21.498293 IP (tos 0x0, ttl  64, id 5482, offset 0, flags [none], 
proto: UDP (17), length: 132, bad cksum 0 (->e0db)!) 
192.168.1.111.1861387048 > 192.168.1.100.nfs: 104 access [|nfs]
01:04:21.498607 IP (tos 0x0, ttl  64, id 16228, offset 0, flags [none], 
proto: UDP (17), length: 148) 192.168.1.100.nfs > 
192.168.1.111.1861387048: reply ok 120 access attr: DIR 755 ids 0/0 [|nfs]
01:04:21.498675 IP (tos 0x0, ttl  64, id 5483, offset 0, flags [none], 
proto: UDP (17), length: 132, bad cksum 0 (->e0da)!) 
192.168.1.111.1861387049 > 192.168.1.100.nfs: 104 access [|nfs]
01:04:21.498900 IP (tos 0x0, ttl  64, id 13349, offset 0, flags [none], 
proto: UDP (17), length: 148) 192.168.1.100.nfs > 
192.168.1.111.1861387049: reply ok 120 access attr: DIR 755 ids 0/0 [|nfs]
01:04:21.498924 IP (tos 0x0, ttl  64, id 5484, offset 0, flags [none], 
proto: UDP (17), length: 132, bad cksum 0 (->e0d9)!) 
192.168.1.111.1861387050 > 192.168.1.100.nfs: 104 access [|nfs]
01:04:21.499195 IP (tos 0x0, ttl  64, id 34907, offset 0, flags [none], 
proto: UDP (17), length: 148) 192.168.1.100.nfs > 
192.168.1.111.1861387050: reply ok 120 access attr: DIR 755 ids 0/0 [|nfs]

Luke Dean

2006-Dec-12 22:51 UTC

pf killing NFS

On Wed, 13 Dec 2006, Charles Sprickman wrote:
> Hi all,
>
> I'm running a 6.2-RC1 box (cvsup'd today) that has two broadcom nics.  One is
> an internal network (nfs) and the other is external.
>
> PF has this rule for all traffic on the private net:
>
> [root@archive /home/jails]# pfctl -sr|grep bge1
> pass in quick on bge1 inet from 192.168.1.0/24 to any
> pass out quick on bge1 inet from any to 192.168.1.0/24
>
> No state since these are "quick" and symmetrical.
>
> Doing something like "ls /usr/ports" will just hang until interrupted. Using
> tcp for nfs makes it workable, but very slow.
>
> If I disable pf (pfctl -d), both types of mounts work, and speed is
> excellent.  I also just found that if I remove the "scrub in all" statement
> and change it to "scrub in on bge0", things are fine.

I believe it's a bad idea to run NFS traffic through scrub unless you use
the "no-df" option with it.  I just don't scrub my internal network
traffic at all.
I got this from "man pf.conf":

      scrub has the following options:

      no-df
            Clears the dont-fragment bit from a matching IP packet.  Some
            operating systems are known to generate fragmented packets with
            the dont-fragment bit set.  This is particularly true with NFS.
            Scrub will drop such fragmented dont-fragment packets unless
            no-df is specified.
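The man page rule Luke quotes can be sketched as a small decision over the IPv4 flags/fragment-offset field. This is a hypothetical simplification for illustration, not pf's code: the names `is_fragment` and `scrub_df_fragment` are invented here, and real scrub also reassembles or caches fragments. It assumes the standard IPv4 layout, where 0x4000 is the dont-fragment (DF) bit, 0x2000 is more-fragments (MF), and the low 13 bits hold the fragment offset.

```python
# Hypothetical sketch of the "no-df" rule from pf.conf(5); not pf's implementation.

DF = 0x4000           # dont-fragment bit in the IPv4 flags/offset field
MF = 0x2000           # more-fragments bit
OFFSET_MASK = 0x1FFF  # 13-bit fragment offset, in 8-byte units


def is_fragment(flags_offset: int) -> bool:
    """A packet is a fragment if MF is set or its offset is non-zero."""
    return bool(flags_offset & MF) or (flags_offset & OFFSET_MASK) != 0


def scrub_df_fragment(flags_offset: int, no_df: bool) -> tuple[str, int]:
    """Return (action, flags_offset) for one packet.

    Fragments with DF set are dropped unless no_df is given; with no_df
    the DF bit is cleared and the packet continues.
    """
    if no_df:
        # "no-df" clears the dont-fragment bit from a matching packet.
        flags_offset &= ~DF
    elif is_fragment(flags_offset) and flags_offset & DF:
        return ("drop", flags_offset)
    return ("pass", flags_offset)


if __name__ == "__main__":
    first_frag = DF | MF  # first fragment of a large datagram, DF set
    print(scrub_df_fragment(first_frag, no_df=False))  # ('drop', 24576)
    print(scrub_df_fragment(first_frag, no_df=True))   # ('pass', 8192)
```

Without no-df the first fragment of a DF-marked datagram is dropped; with it, the DF bit is cleared and the fragment passes, which is the behaviour the man page describes.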

Max Laier

2006-Dec-13 01:12 UTC

pf killing NFS

On Wednesday 13 December 2006 07:10, Charles Sprickman wrote:
> Hi all,
>
> I'm running a 6.2-RC1 box (cvsup'd today) that has two broadcom nics.
> One is an internal network (nfs) and the other is external.
>
> PF has this rule for all traffic on the private net:
>
> [root@archive /home/jails]# pfctl -sr|grep bge1
> pass in quick on bge1 inet from 192.168.1.0/24 to any
> pass out quick on bge1 inet from any to 192.168.1.0/24
>
> No state since these are "quick" and symmetrical.
>
> Doing something like "ls /usr/ports" will just hang until interrupted.
> Using tcp for nfs makes it workable, but very slow.
>
> If I disable pf (pfctl -d), both types of mounts work, and speed is
> excellent.  I also just found that if I remove the "scrub in all"
> statement and change it to "scrub in on bge0", things are fine.
>
> Any idea what's going on?  The tcpdump output confuses me (see "bad
> cksum!"), so I'm posting some snippets here.

As Luke already pointed out, "no-df" on the scrub rule should help.  As
for the "bad cksum!" - this is a symptom of checksumming done in
hardware.  ifconfig bge1 -rxcsum -txcsum should get rid of them.

-- 
/"\  Best regards,                      | mlaier@freebsd.org
\ /  Max Laier                          | ICQ #67774661
 X   pf4freebsd.love2party.net  | mlaier@EFnet
/ \  ASCII Ribbon Campaign              | Against HTML Mail and News

Pete French

2006-Dec-13 03:04 UTC

pf killing NFS

> I'm running a 6.2-RC1 box (cvsup'd today) that has two broadcom nics.  One
> is an internal network (nfs) and the other is external.
...
> Doing something like "ls /usr/ports" will just hang until interrupted.
> Using tcp for nfs makes it workable, but very slow.

Oddly enough I hit precisely this problem last night - with a cvsup from a
few days ago. I have tried adding the 'no-df' flag to the scrub rules, but
this did not help much. What I ended up doing was this:

scrub in on bge0 proto tcp fragment reassemble random-id

so that I am not scrubbing UDP traffic. This works fine.
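Leaving UDP out of scrub matters because UDP NFS is where IP fragments come from: a UDP datagram larger than the link MTU is split into fragments, whereas TCP sizes its segments to fit and is normally not fragmented at the IP layer. A minimal sketch of how one large datagram splits, assuming a 1500-byte Ethernet MTU, a 20-byte IPv4 header with no options, and a hypothetical 8200-byte UDP datagram (8-byte UDP header plus 8192 bytes of data; real NFS/RPC messages add their own headers, and the thread does not give the mounts' read/write sizes). The function name `fragment` is invented for illustration.

```python
# Sketch: split one UDP datagram into IPv4 fragments.
# Assumed values: MTU 1500, 20-byte IP header, 8200-byte UDP datagram.

MTU = 1500
IP_HEADER = 20


def fragment(datagram_len: int, mtu: int = MTU) -> list[tuple[int, int, bool]]:
    """Return (offset_in_8_byte_units, payload_bytes, more_fragments) per fragment."""
    # Every fragment except the last must carry a multiple of 8 bytes.
    max_payload = (mtu - IP_HEADER) // 8 * 8
    frags = []
    offset = 0
    while offset < datagram_len:
        size = min(max_payload, datagram_len - offset)
        more = offset + size < datagram_len
        frags.append((offset // 8, size, more))
        offset += size
    return frags


if __name__ == "__main__":
    for off, size, mf in fragment(8 + 8192):
        print(f"offset={off:4d} payload={size:4d} MF={int(mf)}")
```

Under these assumptions the datagram becomes six fragments: five carry 1480 bytes with MF set, and the last carries 800 bytes. If the sender also sets DF on them, these are exactly the packets the no-df option is about.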

-pete.

Pete French

2006-Dec-13 03:05 UTC

pf killing NFS

> As Luke already pointed out, "no-df" on the scrub rule should help.  As
> for the "bad cksum!" - this is a symptom of checksumming done in
> hardware.  ifconfig bge1 -rxcsum -txcsum should get rid of them.

I am a bit concerned by this - we use a lot of bge interfaces, and I have
hardware checksumming enabled on all of them. Are they known to produce
bad checksums?

-pete.

Max Laier

2006-Dec-13 03:10 UTC

pf killing NFS

On Wednesday 13 December 2006 12:05, Pete French wrote:
> > As Luke already pointed out, "no-df" on the scrub rule should help.
> > As for the "bad cksum!" - this is a symptom of checksumming done
> > in hardware.  ifconfig bge1 -rxcsum -txcsum should get rid of
> > them.
>
> I am a bit concerned by this - we use a lot of bge interfaces, and I
> have hardware checksumming enabled on all of them. Are they known to
> produce bad checksums ?

You are misunderstanding.  The problem is simply that the bpf device sees
bad checksums as it sees the packet before the hardware has calculated
it.  On the receiver the checksum will be correct.
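The "bad cksum 0 (->e132)" lines in the first capture fit this explanation: the header checksum field bpf captured is still 0, and tcpdump prints in parentheses the value it computes itself. A minimal sketch of the standard IPv4 header checksum (the RFC 1071 one's-complement sum), fed with the header fields from that first capture line (version 4, 20-byte header, tos 0, length 132, id 5395, no flags, ttl 64, UDP, 192.168.1.111 to 192.168.1.100), reproduces that value. It assumes a 20-byte header with no IP options; the helper names `ip_checksum` and `ipv4_header` are invented for illustration.

```python
import socket
import struct


def ip_checksum(header: bytes) -> int:
    """RFC 1071 Internet checksum over an even-length IPv4 header."""
    total = 0
    for (word,) in struct.iter_unpack("!H", header):
        total += word
    # Fold carries back into the low 16 bits (one's-complement addition).
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


def ipv4_header(ident: int, total_len: int, src: str, dst: str) -> bytes:
    """Build a 20-byte IPv4 header with the checksum field left at 0,
    as bpf sees it when the NIC fills in the checksum later."""
    return struct.pack(
        "!BBHHHBBH4s4s",
        0x45,       # version 4, header length 5 words (20 bytes)
        0x00,       # tos 0x0
        total_len,  # length: 132
        ident,      # id
        0x0000,     # flags [none], offset 0
        64,         # ttl 64
        17,         # proto: UDP (17)
        0x0000,     # checksum not yet filled in
        socket.inet_aton(src),
        socket.inet_aton(dst),
    )


if __name__ == "__main__":
    hdr = ipv4_header(5395, 132, "192.168.1.111", "192.168.1.100")
    print(hex(ip_checksum(hdr)))  # 0xe132, matching "(->e132)" in the capture
```

Raising the id by one lowers the checksum by one, which is why the next client packets (ids 5396 and 5397) show e131 and e130.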

Pete French

2006-Dec-13 03:15 UTC

pf killing NFS

> You are misunderstanding.  The problem is simply that the bpf device sees
> bad checksums as it sees the packet before the hardware has calculated
> it.  On the receiver the checksum will be correct.

Ah, gotcha. That makes perfect sense now.

-pete.
