Paul Klapperich
2017-Feb-08 21:59 UTC
[Samba] Need help troubleshooting TCP thrashing, possible kernel bug?
I have a FreeNAS 9.3 server running Samba version 4.3.6 and a bunch of Windows and Linux clients. Everything's been running fine for a while and nothing changed on the server.

Recently (Jan 27th) some of the Archlinux clients updated from a 4.8.x kernel to a 4.9.x kernel. Again, things ran fine. Then on Jan 30th around 2am the Archlinux clients using 4.9.x kernels and utilizing mount.cifs to access Samba shares began thrashing on TCP port 445, causing high CPU load on the server. These machines now cause thrashing after 15-20 minutes whenever a share is mounted using mount.cifs.

When it's thrashing, I see thousands of opened ports from a single client:

    # sockstat -4 | grep 10.0.1.87 | wc
    10013 70091 740962

And on the client, the port is constantly changing:

    $ netstat -net | grep 10.0.0.8
    tcp 0 0 10.0.1.87:53122 10.0.0.8:445 ESTABLISHED 0 1253359
    $ netstat -net | grep 10.0.0.8
    tcp 0 0 10.0.1.87:53700 10.0.0.8:445 ESTABLISHED 0 1253439
    $ netstat -net | grep 10.0.0.8
    tcp 0 0 10.0.1.87:53926 10.0.0.8:445 ESTABLISHED 0 1254557
    $ netstat -net | grep 10.0.0.8
    tcp 0 0 10.0.1.87:54148 10.0.0.8:445 ESTABLISHED 0 1253578
    $ netstat -net | grep 10.0.0.8
    tcp 0 0 10.0.1.87:54352 10.0.0.8:445 ESTABLISHED 0 1253604
    $ netstat -net | grep 10.0.0.8
    tcp 0 0 10.0.1.87:54518 10.0.0.8:445 ESTABLISHED 0 1254685
    $ netstat -net | grep 10.0.0.8
    tcp 0 0 10.0.1.87:54698 10.0.0.8:445 ESTABLISHED 0 1252177

As a workaround, I can downgrade these client machines to any 4.8.x kernel and the issue goes away. My suspicion is that something is weird in my smb.conf and a change in the 4.9.x kernels exposes that weirdness. Or maybe there's a bug that was introduced in 4.9 and our setup exposes it.

I've built 4.10rc kernels from Linus's git repo and they also have the problem. The 4.9 kernel I built from Linus's git has the problem, but the 4.8 kernel I built does not, so I don't think it's related to any patching done by Archlinux. I don't understand why the issue didn't happen immediately after upgrading kernels on the 27th, but now it very consistently acts up after less than 20 minutes.

Attached is the smb.conf used on one of my FreeNAS servers. I was able to copy that config to an Archlinux system running Samba version 4.5.3 (commenting out lines 24, 25, 55, and 79 and adjusting the "interfaces =" line) and the problem persists, so it doesn't appear to be specific to FreeNAS or Samba 4.3.6.

--
Paul Klapperich
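For anyone wanting to put a number on the port churn rather than eyeballing repeated netstat runs, here is a minimal sketch in Python. It assumes the `netstat -net` column layout shown above (local address in the fourth column, foreign address in the fifth); the function names `local_ports` and `count_port_changes` are made up for illustration, and the sample lines are copied from the output in this message.

```python
def local_ports(netstat_output, server="10.0.0.8", port=445):
    """Return the set of local TCP ports connected to server:port.

    Assumes `netstat -net` style lines:
    Proto Recv-Q Send-Q Local-Address Foreign-Address State User Inode
    """
    target = f"{server}:{port}"
    ports = set()
    for line in netstat_output.splitlines():
        fields = line.split()
        # Skip headers, blank lines, and non-TCP entries.
        if len(fields) < 6 or not fields[0].startswith("tcp"):
            continue
        if fields[4] != target:
            continue
        # The local port is whatever follows the last colon.
        ports.add(int(fields[3].rsplit(":", 1)[1]))
    return ports


def count_port_changes(samples, server="10.0.0.8", port=445):
    """Count consecutive samples whose set of local ports differs."""
    changes = 0
    previous = None
    for sample in samples:
        current = local_ports(sample, server, port)
        if previous is not None and current != previous:
            changes += 1
        previous = current
    return changes


# Three of the samples quoted in this message.
SAMPLES = [
    "tcp 0 0 10.0.1.87:53122 10.0.0.8:445 ESTABLISHED 0 1253359",
    "tcp 0 0 10.0.1.87:53700 10.0.0.8:445 ESTABLISHED 0 1253439",
    "tcp 0 0 10.0.1.87:53926 10.0.0.8:445 ESTABLISHED 0 1254557",
]

if __name__ == "__main__":
    print("port changes between samples:", count_port_changes(SAMPLES))
```

On a healthy mount the local port should stay the same from one sample to the next, so a count that keeps climbing is one way to confirm that the client is reconnecting over and over. Feeding it live data (for example by capturing `netstat -net` once per second) is left out here to keep the sketch self-contained.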
Rowland Penny
2017-Feb-08 22:36 UTC
[Samba] Need help troubleshooting TCP thrashing, possible kernel bug?
On Wed, 8 Feb 2017 15:59:16 -0600 Paul Klapperich via samba <samba at lists.samba.org> wrote:

> Attached is the smb.conf used on one of my FreeNAS servers.

Unfortunately, this list removes attachments; you will need to post your smb.conf in the actual message.

Rowland
Paul Klapperich
2017-Feb-08 22:43 UTC
[Samba] Need help troubleshooting TCP thrashing, possible kernel bug?
Very well. Here is the affected smb.conf.
------
[global]
server min protocol = NT1
server max protocol = SMB3
interfaces = 127.0.0.1 10.0.0.8
bind interfaces only = yes
encrypt passwords = yes
dns proxy = no
strict locking = no
oplocks = yes
deadtime = 15
max log size = 51200
max open files = 2830016
logging = file
load printers = no
printing = bsd
printcap name = /dev/null
disable spoolss = yes
getwd cache = yes
guest account = nobody
map to guest = Bad User
obey pam restrictions = yes
directory name cache size = 0
kernel change notify = no
panic action = /usr/local/libexec/samba/samba-backtrace
nsupdate command = /usr/local/bin/samba-nsupdate -g
server string = backup of files
ea support = yes
store dos attributes = yes
lm announce = yes
hostname lookups = yes
unix extensions = no
acl allow execute always = true
dos filemode = yes
multicast dns register = no
local master = no
idmap config *: backend = tdb
idmap config *: range = 10000-90000
server role = member server
security = user
passdb backend = ldapsam:ldap://ldap0.packetdigital.com
ldap admin dn = cn=admin,dc=packetdigital,dc=com
ldap suffix = dc=packetdigital,dc=com
ldap user suffix = ou=Users
ldap group suffix = ou=Groups
ldap ssl = off
ldap replication sleep = 1000
ldap passwd sync = yes
ldapsam:trusted = yes
netbios name = HAMMER
workgroup = PACKETDIGITAL
domain logons = yes
idmap config PACKETDIGITAL: backend = ldap
idmap config PACKETDIGITAL: range = 10000-90000
idmap config PACKETDIGITAL: ldap url = ldap0.packetdigital.com
pid directory = /var/run/samba
create mask = 0666
directory mask = 0777
client ntlmv2 auth = yes
dos charset = CP437
unix charset = UTF-8
log level = 1
#map unix users to 1 or more names
## can map an @group to a username
#username map = /mnt/storage/configs/samba_users.map
follow symlinks = yes
wide links = yes
unix extensions = no
create mask = 0660
idmap uid = 10000-90000
idmap gid = 10000-90000
[Software]
path = /mnt/storage/cifs-share/Software
printable = no
veto files = /.snapshot/.windows/.mac/.zfs/
writeable = yes
browseable = yes
vfs objects = zfs_space zfsacl
hide dot files = yes
guest ok = no
nfs4:mode = special
nfs4:acedup = merge
nfs4:chown = true
zfsacl:acesort = dontcare
create mask = 0775
force create mode = 0775
directory mask = 0775
force directory mode = 0775
force group = Software
valid users = @Software, @Software-RO
read only = yes
write list = @Software
------
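One thing that stands out in the config above is that [global] sets `unix extensions` and `create mask` twice each. A minimal sketch for spotting such repeats is below. It is a hand-rolled parser written for illustration, not Samba's own loader: the function name `find_duplicate_params` is hypothetical, backslash line continuations are not handled, and synonyms or inverted synonyms are not detected. Parameter names are compared without regard to case or whitespace, which is how I understand smb.conf(5) to treat them.

```python
from collections import defaultdict


def find_duplicate_params(text):
    """Return {(section, parameter): [values]} for parameters set more than once.

    Parameter names are lowercased and stripped of all whitespace before
    comparison, so "Create Mask" and "createmask" count as the same name.
    """
    seen = defaultdict(list)
    section = None
    for raw in text.splitlines():
        line = raw.strip()
        # Skip blank lines and comments (smb.conf allows both # and ;).
        if not line or line[0] in "#;":
            continue
        if line.startswith("[") and line.endswith("]"):
            section = line[1:-1].strip().lower()
            continue
        if "=" not in line:
            continue
        # Only the first equals sign separates name from value.
        name, value = line.split("=", 1)
        key = "".join(name.split()).lower()
        seen[(section, key)].append(value.strip())
    return {k: v for k, v in seen.items() if len(v) > 1}


# A few lines taken from the [global] section posted above.
EXAMPLE = """\
[global]
unix extensions = no
create mask = 0666
directory mask = 0777
unix extensions = no
create mask = 0660
"""

if __name__ == "__main__":
    for (section, key), values in find_duplicate_params(EXAMPLE).items():
        print(f"[{section}] {key}: {values}")
```

Running `testparm` against the real file is the better way to see which of the repeated values Samba actually ends up using; the sketch only points out where to look.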
--
Paul Klapperich