Paul Klapperich
2017-Feb-08 21:59 UTC
[Samba] Need help troubleshooting TCP thrashing, possible kernel bug?
I have a FreeNAS 9.3 server running Samba version 4.3.6 and a bunch of Windows and Linux clients. Everything had been running fine for a while, and nothing changed on the server.

Recently (Jan 27th) some of the Archlinux clients updated from a 4.8.x kernel to a 4.9.x kernel. Again, things ran fine. Then, on Jan 30th at around 2am, the Archlinux clients running 4.9.x kernels and using mount.cifs to access Samba shares began thrashing on TCP port 445, causing high CPU load on the server. These machines now start thrashing 15-20 minutes after a share is mounted with mount.cifs.

When it's thrashing, I see thousands of open ports from a single client:

    # sockstat -4 | grep 10.0.1.87 | wc
       10013   70091  740962

And on the client, the port is constantly changing:

    $ netstat -net | grep 10.0.0.8
    tcp 0 0 10.0.1.87:53122 10.0.0.8:445 ESTABLISHED 0 1253359
    $ netstat -net | grep 10.0.0.8
    tcp 0 0 10.0.1.87:53700 10.0.0.8:445 ESTABLISHED 0 1253439
    $ netstat -net | grep 10.0.0.8
    tcp 0 0 10.0.1.87:53926 10.0.0.8:445 ESTABLISHED 0 1254557
    $ netstat -net | grep 10.0.0.8
    tcp 0 0 10.0.1.87:54148 10.0.0.8:445 ESTABLISHED 0 1253578
    $ netstat -net | grep 10.0.0.8
    tcp 0 0 10.0.1.87:54352 10.0.0.8:445 ESTABLISHED 0 1253604
    $ netstat -net | grep 10.0.0.8
    tcp 0 0 10.0.1.87:54518 10.0.0.8:445 ESTABLISHED 0 1254685
    $ netstat -net | grep 10.0.0.8
    tcp 0 0 10.0.1.87:54698 10.0.0.8:445 ESTABLISHED 0 1252177

As a workaround, I can downgrade these client machines to any 4.8.x kernel and the issue goes away. My suspicion is that something is weird in my smb.conf and a change in the 4.9.x kernels exposes that weirdness. Or maybe a bug was introduced in 4.9 and our setup exposes it.

I've built 4.10rc kernels from Linus's git repo and they also have the problem. The 4.9 kernel I built from Linus's git has the problem, but the 4.8 kernel I built does not, so I don't think it's related to any patching done by Archlinux. I don't understand why the issue didn't happen immediately after upgrading kernels on the 27th, but now it very consistently acts up in less than 20 minutes.

Attached is the smb.conf used on one of my FreeNAS servers. I was able to copy that config to an Archlinux system running Samba version 4.5.3 (commenting out lines 24, 25, 55, and 79 and adjusting the "interfaces =" line), and the problem persists, so it doesn't appear to be specific to FreeNAS or Samba 4.3.6.

-- 
Paul Klapperich
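Repeating `netstat` by hand shows a new client port each time. The small parser below makes that check repeatable. `smb_connections` and `NETSTAT_TCP` are hypothetical names, not part of any tool, and the regex assumes the Linux `netstat -net` column layout seen in the samples (Proto, Recv-Q, Send-Q, Local, Foreign, State, User, Inode; `-e` adds the last two). Because every socket has its own inode, a different inode in each sample means a brand-new TCP connection rather than one long-lived session. On the server side, `wc` prints line, word and byte counts, so the `sockstat` pipeline above matched 10013 lines for that client.

```python
import re

# One IPv4 TCP line of Linux `netstat -net` output, e.g.
#   tcp 0 0 10.0.1.87:53122 10.0.0.8:445 ESTABLISHED 0 1253359
# Assumed columns: Proto Recv-Q Send-Q Local Foreign State User Inode.
NETSTAT_TCP = re.compile(
    r"^tcp\s+\d+\s+\d+\s+"
    r"(?P<lip>[\d.]+):(?P<lport>\d+)\s+"
    r"(?P<rip>[\d.]+):(?P<rport>\d+)\s+"
    r"(?P<state>\S+)\s+\S+\s+(?P<inode>\d+)"
)


def smb_connections(lines, server_ip, server_port=445):
    """Return (local_port, inode) for each ESTABLISHED connection to the server."""
    found = []
    for line in lines:
        m = NETSTAT_TCP.match(line.strip())
        if (m and m["rip"] == server_ip
                and int(m["rport"]) == server_port
                and m["state"] == "ESTABLISHED"):
            found.append((int(m["lport"]), int(m["inode"])))
    return found


# The seven samples posted in the thread, taken one after another on 10.0.1.87.
SAMPLES = [
    "tcp 0 0 10.0.1.87:53122 10.0.0.8:445 ESTABLISHED 0 1253359",
    "tcp 0 0 10.0.1.87:53700 10.0.0.8:445 ESTABLISHED 0 1253439",
    "tcp 0 0 10.0.1.87:53926 10.0.0.8:445 ESTABLISHED 0 1254557",
    "tcp 0 0 10.0.1.87:54148 10.0.0.8:445 ESTABLISHED 0 1253578",
    "tcp 0 0 10.0.1.87:54352 10.0.0.8:445 ESTABLISHED 0 1253604",
    "tcp 0 0 10.0.1.87:54518 10.0.0.8:445 ESTABLISHED 0 1254685",
    "tcp 0 0 10.0.1.87:54698 10.0.0.8:445 ESTABLISHED 0 1252177",
]

if __name__ == "__main__":
    conns = smb_connections(SAMPLES, "10.0.0.8")
    ports = {port for port, _ in conns}
    inodes = {inode for _, inode in conns}
    # prints "7 samples, 7 local ports, 7 sockets"
    print(f"{len(conns)} samples, {len(ports)} local ports, {len(inodes)} sockets")
```

Feeding it live `netstat -net` output in a loop (for example via `subprocess.run`) would automate the comparison. That wrapper is left out to keep the sketch self-contained.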
Rowland Penny
2017-Feb-08 22:36 UTC
[Samba] Need help troubleshooting TCP thrashing, possible kernel bug?
On Wed, 8 Feb 2017 15:59:16 -0600
Paul Klapperich via samba <samba at lists.samba.org> wrote:

> Attached is the smb.conf used on one of my FreeNAS servers.

Unfortunately, this list removes attachments, you will need to post
your smb.conf in the actual message.

Rowland
Paul Klapperich
2017-Feb-08 22:43 UTC
[Samba] Need help troubleshooting TCP thrashing, possible kernel bug?
Very well. Here is the affected smb.conf.

------
[global]
    server min protocol = NT1
    server max protocol = SMB3
    interfaces = 127.0.0.1 10.0.0.8
    bind interfaces only = yes
    encrypt passwords = yes
    dns proxy = no
    strict locking = no
    oplocks = yes
    deadtime = 15
    max log size = 51200
    max open files = 2830016
    logging = file
    load printers = no
    printing = bsd
    printcap name = /dev/null
    disable spoolss = yes
    getwd cache = yes
    guest account = nobody
    map to guest = Bad User
    obey pam restrictions = yes
    directory name cache size = 0
    kernel change notify = no
    panic action = /usr/local/libexec/samba/samba-backtrace
    nsupdate command = /usr/local/bin/samba-nsupdate -g
    server string = backup of files
    ea support = yes
    store dos attributes = yes
    lm announce = yes
    hostname lookups = yes
    unix extensions = no
    acl allow execute always = true
    dos filemode = yes
    multicast dns register = no
    local master = no
    idmap config *: backend = tdb
    idmap config *: range = 10000-90000
    server role = member server
    security = user
    passdb backend = ldapsam:ldap://ldap0.packetdigital.com
    ldap admin dn = cn=admin,dc=packetdigital,dc=com
    ldap suffix = dc=packetdigital,dc=com
    ldap user suffix = ou=Users
    ldap group suffix = ou=Groups
    ldap ssl = off
    ldap replication sleep = 1000
    ldap passwd sync = yes
    ldapsam:trusted = yes
    netbios name = HAMMER
    workgroup = PACKETDIGITAL
    domain logons = yes
    idmap config PACKETDIGITAL: backend = ldap
    idmap config PACKETDIGITAL: range = 10000-90000
    idmap config PACKETDIGITAL: ldap url = ldap0.packetdigital.com
    pid directory = /var/run/samba
    create mask = 0666
    directory mask = 0777
    client ntlmv2 auth = yes
    dos charset = CP437
    unix charset = UTF-8
    log level = 1
    #map unix users to 1 or more names
    ## can map an @group to a username
    #username map = /mnt/storage/configs/samba_users.map
    follow symlinks = yes
    wide links = yes
    unix extensions = no
    create mask = 0660
    idmap uid = 10000-90000
    idmap gid = 10000-90000

[Software]
    path = /mnt/storage/cifs-share/Software
    printable = no
    veto files = /.snapshot/.windows/.mac/.zfs/
    writeable = yes
    browseable = yes
    vfs objects = zfs_space zfsacl
    hide dot files = yes
    guest ok = no
    nfs4:mode = special
    nfs4:acedup = merge
    nfs4:chown = true
    zfsacl:acesort = dontcare
    create mask = 0775
    force create mode = 0775
    directory mask = 0775
    force directory mode = 0775
    force group = Software
    valid users = @Software, @Software-RO
    read only = yes
    write list = @Software
------

-- 
Paul Klapperich

On Wed, Feb 8, 2017 at 4:36 PM, Rowland Penny via samba <samba at lists.samba.org> wrote:

> Unfortunately, this list removes attachments, you will need to post
> your smb.conf in the actual message.
>
> Rowland
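Paul suspected something odd in his smb.conf. One thing visible in the posted [global] section is that `create mask` is set twice (0666, later 0660) and `unix extensions = no` appears twice. Nothing in the thread ties these repeats to the TCP thrashing, and `testparm`, which ships with Samba, remains the standard tool for checking smb.conf. As a quick aid, the hypothetical helper `duplicate_parameters` below (not part of Samba) lists parameters assigned more than once within the same section. Its parsing rules are simplifying assumptions, stated in the docstring.

```python
from collections import defaultdict


def duplicate_parameters(conf_text):
    """Return {section: {parameter: [values]}} for parameters set more than once.

    Hypothetical helper with simplifying assumptions: '#'/';' lines are
    comments, '[name]' opens a section, parameter names compare
    case-insensitively with whitespace runs collapsed, and backslash
    line continuations are not handled.
    """
    values = defaultdict(lambda: defaultdict(list))
    section = "global"  # lines before any header are counted as [global] here
    for raw in conf_text.splitlines():
        line = raw.strip()
        if not line or line.startswith(("#", ";")):
            continue
        if line.startswith("[") and line.endswith("]"):
            section = line[1:-1].strip().lower()
            continue
        name, sep, value = line.partition("=")
        if not sep:
            continue  # not a "name = value" line
        key = " ".join(name.lower().split())
        values[section][key].append(value.strip())

    result = {}
    for sec, params in values.items():
        dups = {k: v for k, v in params.items() if len(v) > 1}
        if dups:
            result[sec] = dups
    return result


# Excerpt of the posted config: only the lines involved in the repeats.
EXCERPT = """
[global]
    unix extensions = no
    create mask = 0666
    directory mask = 0777
    unix extensions = no
    create mask = 0660

[Software]
    create mask = 0775
    directory mask = 0775
"""

if __name__ == "__main__":
    # prints {'global': {'unix extensions': ['no', 'no'], 'create mask': ['0666', '0660']}}
    print(duplicate_parameters(EXCERPT))
```

Splitting on the first `=` keeps values such as `cn=admin,dc=packetdigital,dc=com` intact, and names like `idmap config *: backend` are treated as a single parameter name.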