Paul Klapperich
2017-Feb-08  21:59 UTC
[Samba] Need help troubleshooting TCP thrashing, possible kernel bug?
I have a FreeNAS 9.3 server running Samba version 4.3.6 and a bunch of Windows and Linux clients. Everything's been running fine for a while and nothing changed on the server.

Recently (Jan 27th) some of the Archlinux clients updated from a 4.8.x kernel to a 4.9.x kernel. Again, things ran fine. Then on Jan 30th around 2am the Archlinux clients using 4.9.x kernels and utilizing mount.cifs to access Samba shares began thrashing on TCP port 445, causing high CPU load on the server. These machines now cause thrashing after 15-20 minutes whenever a share is mounted using mount.cifs.

When it's thrashing, I see thousands of open ports from a single client:

# sockstat -4 | grep 10.0.1.87 | wc
   10013   70091  740962

And on the client, the port is constantly changing:

$ netstat -net | grep 10.0.0.8
tcp        0      0 10.0.1.87:53122         10.0.0.8:445  ESTABLISHED 0          1253359
$ netstat -net | grep 10.0.0.8
tcp        0      0 10.0.1.87:53700         10.0.0.8:445  ESTABLISHED 0          1253439
$ netstat -net | grep 10.0.0.8
tcp        0      0 10.0.1.87:53926         10.0.0.8:445  ESTABLISHED 0          1254557
$ netstat -net | grep 10.0.0.8
tcp        0      0 10.0.1.87:54148         10.0.0.8:445  ESTABLISHED 0          1253578
$ netstat -net | grep 10.0.0.8
tcp        0      0 10.0.1.87:54352         10.0.0.8:445  ESTABLISHED 0          1253604
$ netstat -net | grep 10.0.0.8
tcp        0      0 10.0.1.87:54518         10.0.0.8:445  ESTABLISHED 0          1254685
$ netstat -net | grep 10.0.0.8
tcp        0      0 10.0.1.87:54698         10.0.0.8:445  ESTABLISHED 0          1252177

As a workaround, I can downgrade these client machines to any 4.8.x kernel and the issue goes away. My suspicion is that something is weird in my smb.conf and a change in the 4.9.x kernels exposes that weirdness. Or maybe there's a bug that was introduced in 4.9 and our setup exposes it.

I've built 4.10rc kernels from Linus's git repo and they also have the problem. The 4.9 kernel I built from Linus's git has the problem, but the 4.8 kernel I built does not, so I don't think it's related to any patching done by Archlinux. I don't understand why the issue didn't happen immediately after upgrading kernels on the 27th, but now it very consistently acts up after less than 20 minutes.

Attached is the smb.conf used on one of my FreeNAS servers. I was able to copy that config to an Archlinux system running Samba version 4.5.3 (commenting out lines 24, 25, 55, and 79 and adjusting the "interfaces =" line) and the problem persists, so it doesn't appear to be specific to FreeNAS or Samba 4.3.6.

--
Paul Klapperich
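The repeated netstat samples in the message above could be collected automatically rather than by hand. What follows is only a minimal sketch, assuming the Linux `netstat -net` column layout shown in the message (protocol, queues, local address, foreign address, state); the function names `extract_ports` and `count_distinct` are hypothetical helpers made up for illustration, not part of any tool.

```shell
#!/bin/sh
# Sketch only: helper names are hypothetical, and the column positions
# assume the `netstat -net` layout quoted in the message.

# Read `netstat -net` output on stdin and print the client-side source
# port of every ESTABLISHED TCP connection to <server>:445.
extract_ports() {
    server="$1"
    awk -v peer="${server}:445" '
        $1 == "tcp" && $5 == peer && $6 == "ESTABLISHED" {
            n = split($4, a, ":")   # local address is host:port
            print a[n]              # keep only the port
        }'
}

# Read a list of ports on stdin (one per line) and print how many
# distinct values it contains.
count_distinct() {
    sort -u | wc -l | tr -d ' '
}
```

One possible way to use it on an affected client: run `while sleep 1; do netstat -net | extract_ports 10.0.0.8; done | tee ports.log` for a few minutes, then `count_distinct < ports.log`. A healthy mount would be expected to keep a small, stable set of source ports, while the thrashing described above should show the count climbing steadily.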
Rowland Penny
2017-Feb-08  22:36 UTC
[Samba] Need help troubleshooting TCP thrashing, possible kernel bug?
On Wed, 8 Feb 2017 15:59:16 -0600 Paul Klapperich via samba <samba at lists.samba.org> wrote:

> Attached is the smb.conf used on one of my FreeNAS servers.

Unfortunately, this list removes attachments; you will need to post your smb.conf in the actual message.

Rowland
Paul Klapperich
2017-Feb-08  22:43 UTC
[Samba] Need help troubleshooting TCP thrashing, possible kernel bug?
Very well. Here is the affected smb.conf.
------
[global]
    server min protocol = NT1
    server max protocol = SMB3
    interfaces = 127.0.0.1 10.0.0.8
    bind interfaces only = yes
    encrypt passwords = yes
    dns proxy = no
    strict locking = no
    oplocks = yes
    deadtime = 15
    max log size = 51200
    max open files = 2830016
    logging = file
    load printers = no
    printing = bsd
    printcap name = /dev/null
    disable spoolss = yes
    getwd cache = yes
    guest account = nobody
    map to guest = Bad User
    obey pam restrictions = yes
    directory name cache size = 0
    kernel change notify = no
    panic action = /usr/local/libexec/samba/samba-backtrace
    nsupdate command = /usr/local/bin/samba-nsupdate -g
    server string = backup of files
    ea support = yes
    store dos attributes = yes
    lm announce = yes
    hostname lookups = yes
    unix extensions = no
    acl allow execute always = true
    dos filemode = yes
    multicast dns register = no
    local master = no
    idmap config *: backend = tdb
    idmap config *: range = 10000-90000
    server role = member server
    security = user
    passdb backend = ldapsam:ldap://ldap0.packetdigital.com
    ldap admin dn = cn=admin,dc=packetdigital,dc=com
    ldap suffix = dc=packetdigital,dc=com
    ldap user suffix = ou=Users
    ldap group suffix = ou=Groups
    ldap ssl = off
    ldap replication sleep = 1000
    ldap passwd sync = yes
    ldapsam:trusted = yes
    netbios name = HAMMER
    workgroup = PACKETDIGITAL
    domain logons = yes
    idmap config PACKETDIGITAL: backend = ldap
    idmap config PACKETDIGITAL: range = 10000-90000
    idmap config PACKETDIGITAL: ldap url = ldap0.packetdigital.com
    pid directory = /var/run/samba
    create mask = 0666
    directory mask = 0777
    client ntlmv2 auth = yes
    dos charset = CP437
    unix charset = UTF-8
    log level = 1
    #map unix users to 1 or more names
    ## can map an @group to a username
    #username map = /mnt/storage/configs/samba_users.map
    follow symlinks = yes
    wide links = yes
    unix extensions = no
    create mask = 0660
    idmap uid = 10000-90000
    idmap gid = 10000-90000
[Software]
    path = /mnt/storage/cifs-share/Software
    printable = no
    veto files = /.snapshot/.windows/.mac/.zfs/
    writeable = yes
    browseable = yes
    vfs objects = zfs_space zfsacl
    hide dot files = yes
    guest ok = no
    nfs4:mode = special
    nfs4:acedup = merge
    nfs4:chown = true
    zfsacl:acesort = dontcare
    create mask = 0775
    force create mode = 0775
    directory mask = 0775
    force directory mode = 0775
    force group = Software
    valid users = @Software, @Software-RO
    read only = yes
    write list = @Software
------
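Since the suspicion in this thread is that something in the smb.conf is "weird", one cheap check is to look for parameters that are set more than once in the same section; the [global] section above sets both "unix extensions" and "create mask" twice. The following is only a sketch under that assumption: `find_duplicate_params` is a hypothetical helper name, it does a purely textual scan, and it does not know about Samba's parameter synonyms or include files. Samba's own `testparm -s` remains the way to see which values are actually in effect.

```shell
#!/bin/sh
# Sketch only: the function name is hypothetical. Reads an smb.conf on
# stdin and prints "section: parameter" for every parameter that is
# assigned more than once within the same section.
find_duplicate_params() {
    awk '
        /^[ \t]*[#;]/ { next }                  # skip comment lines
        /^[ \t]*\[/ {                           # section header, e.g. [global]
            section = $0
            gsub(/^[ \t]*\[|\][ \t]*$/, "", section)
            next
        }
        index($0, "=") {                        # "key = value" line
            key = substr($0, 1, index($0, "=") - 1)
            gsub(/^[ \t]+|[ \t]+$/, "", key)    # trim surrounding whitespace
            key = tolower(key)                  # compare case-insensitively
            if (seen[section SUBSEP key]++) print section ": " key
        }'
}
```

Usage would be along the lines of `find_duplicate_params < smb.conf` (the path is whatever applies on the server in question). A repeated parameter is not necessarily the cause of the thrashing, but it narrows down what the effective configuration really is.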
--
Paul Klapperich