Andrew Bartlett
2018-Sep-04 02:15 UTC
[Samba] authentication performance with 4.7.6 -> 4.7.8 upgrade (was: Re: gencache.tdb size and cache flush)
On Wed, 2018-08-29 at 15:36 +0200, Peter Eriksson via samba wrote:
> For what it’s worth you are not alone in seeing similar problems with Samba and gencache.
>
> Our site has some 110K users (university with staff & students (including former ones), and currently around 2000 active (SMB) clients connecting to 5 different Samba servers (around 400-500 clients per server). When we previously just let things “run” gencache.tdb would grow forever and authentication login performance would start to deteriorate after a little while (would take more than 10 seconds). So we now delete it (and locks/locking.tdb that also tends to grow forever) and restart our samba processes every morning at 7 am - which gives us much more stable performance.
>
> - Servers with 256GB of RAM, 10Gbps ethernet interfaces and around 110TB of disk per server.
> - FreeBSD 11.2-p2
> - Samba 4.7.6 with some local patches to allow (much) bigger socket listening queues in order to handle the case of many clients connecting at the same time.
>
> (We are trying to upgrade to a more recent Samba but 4.7.8 and 4.7.9 gave us horrible authentication performance every 10:th hour where the servers basically denied clients to login for about 2 hours so we had to back down to 4.7.6 again).

I realise testing in production is difficult, but is there any chance
you can pin down where between 4.7.6 and 4.7.8 it broke? There are not
that many changes between, and while some appear authentication related
nothing stands out.

Also, do you run Samba as an AD DC, or are these file servers in a
windows domain?

Thanks,

Andrew Bartlett

--
Andrew Bartlett                       https://samba.org/~abartlet/
Authentication Developer, Samba Team  https://samba.org
Samba Development and Support, Catalyst IT
https://catalyst.net.nz/services/samba
Andrew Bartlett
2018-Sep-04 02:42 UTC
[Samba] authentication performance with 4.7.6 -> 4.7.8 upgrade (was: Re: gencache.tdb size and cache flush)
On Tue, 2018-09-04 at 14:15 +1200, Andrew Bartlett via samba wrote:
> On Wed, 2018-08-29 at 15:36 +0200, Peter Eriksson via samba wrote:
> > For what it’s worth you are not alone in seeing similar problems with Samba and gencache.
> >
> > Our site has some 110K users (university with staff & students (including former ones), and currently around 2000 active (SMB) clients connecting to 5 different Samba servers (around 400-500 clients per server). When we previously just let things “run” gencache.tdb would grow forever and authentication login performance would start to deteriorate after a little while (would take more than 10 seconds). So we now delete it (and locks/locking.tdb that also tends to grow forever) and restart our samba processes every morning at 7 am - which gives us much more stable performance.
> >
> > - Servers with 256GB of RAM, 10Gbps ethernet interfaces and around 110TB of disk per server.
> > - FreeBSD 11.2-p2
> > - Samba 4.7.6 with some local patches to allow (much) bigger socket listening queues in order to handle the case of many clients connecting at the same time.
> >
> > (We are trying to upgrade to a more recent Samba but 4.7.8 and 4.7.9 gave us horrible authentication performance every 10:th hour where the servers basically denied clients to login for about 2 hours so we had to back down to 4.7.6 again).
>
> I realise testing in production is difficult, but is there any chance
> you can pin down where between 4.7.6 and 4.7.8 it broke? There are not
> that many changes between, and while some appear authentication related
> nothing stands out.
>
> Also, do you run Samba as an AD DC, or are these file servers in a
> windows domain?

BTW, the main caching change made in that set of versions is:

commit 0f2e2711e92a433abdc9436ecaf3ba9d773902c8
Author: Volker Lendecke <vl at samba.org>
Date:   Tue Aug 8 14:24:27 2017 +0200

    winbindd: Name<->SID cache is not sequence number based anymore

    BUG: https://bugzilla.samba.org/show_bug.cgi?id=13369

    Signed-off-by: Volker Lendecke <vl at samba.org>
    Reviewed-by: Ralph Boehme <slow at samba.org>

commit a92c5dc7800a32c4dc58051c111a43b4749d0854
Author: Volker Lendecke <vl at samba.org>
Date:   Sun Aug 6 18:13:10 2017 +0200

    winbindd: Move name<->sid cache to gencache

    The mapping from name to sid and vice versa has nothing to do with a
    specific domain. It is publically available. Thus put it into gencache
    without referring to the domain this was retrieved from

    BUG: https://bugzilla.samba.org/show_bug.cgi?id=13369

    Signed-off-by: Volker Lendecke <vl at samba.org>
    Reviewed-by: Ralph Boehme <slow at samba.org>

Perhaps this gives something to try and revert to pin this down.

Andrew Bartlett
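One way to act on the revert suggestion, sketched here under assumptions: the helper below only builds the `git revert` commands for the two commits quoted above, newest first, so they could be applied to a local 4.7.8 source tree before rebuilding. The function names and the idea of scripting this are illustrative and do not come from the thread. The reverts may not apply cleanly, in which case git stops and the conflict has to be resolved by hand.

```python
import subprocess
from typing import List, Optional

# Commit IDs as quoted in the message above, newest first so that the
# later change is undone before the one it builds on.
SUSPECT_COMMITS = [
    "0f2e2711e92a433abdc9436ecaf3ba9d773902c8",  # Name<->SID cache not sequence number based
    "a92c5dc7800a32c4dc58051c111a43b4749d0854",  # Move name<->sid cache to gencache
]


def revert_commands(commits: List[str]) -> List[List[str]]:
    """Build one `git revert --no-edit <sha>` argv per commit, in the given order."""
    return [["git", "revert", "--no-edit", sha] for sha in commits]


def run_reverts(repo_dir: str, commits: Optional[List[str]] = None) -> None:
    """Run the reverts inside repo_dir (a hypothetical path to a Samba checkout)."""
    chosen = SUSPECT_COMMITS if commits is None else commits
    for argv in revert_commands(chosen):
        # check=True raises CalledProcessError if a revert fails, e.g. on conflict.
        subprocess.run(argv, cwd=repo_dir, check=True)
```

Reverting one commit at a time and re-testing after each build would show whether either change alone is responsible.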
Peter Eriksson
2018-Sep-04 13:13 UTC
[Samba] authentication performance with 4.7.6 -> 4.7.8 upgrade (was: Re: gencache.tdb size and cache flush)
I’m going to try to upgrade from 4.7.6 to 4.7.7 on one of our servers soon and see if things break or not. With 4.7.6 things are stable at least.

Our file servers are in a Microsoft Windows domain (consisting of 6 Microsoft Windows 2016 AD servers).

The graphs (logarithmic time scale) below are login times from a probe station that times a connect using “smbclient” with a Kerberos ticket and basically just quits directly. Each graph shows five of our Samba servers (so not so easy to read :-)

Samba 4.7.6 (2018-09-04, right now, around 400-500 users per server):

Samba 4.7.8 (2018-08-29):

The probe software has a 10-second timeout, so the “spikes” are most likely connection attempts that timed out. We probe all servers every minute. The holes in the graphs come 10 hours apart, counted from the last reboot (07:00 every day), i.e. at 17:00 and 03:00. At those times basically nobody could connect (or the attempts took more than a minute, so the whole session was aborted and no data was recorded in the RRD databases).

- Peter

> On 4 Sep 2018, at 04:15, Andrew Bartlett via samba <samba at lists.samba.org> wrote:
>
> On Wed, 2018-08-29 at 15:36 +0200, Peter Eriksson via samba wrote:
>> For what it’s worth you are not alone in seeing similar problems with Samba and gencache.
>>
>> Our site has some 110K users (university with staff & students (including former ones), and currently around 2000 active (SMB) clients connecting to 5 different Samba servers (around 400-500 clients per server). When we previously just let things “run” gencache.tdb would grow forever and authentication login performance would start to deteriorate after a little while (would take more than 10 seconds). So we now delete it (and locks/locking.tdb that also tends to grow forever) and restart our samba processes every morning at 7 am - which gives us much more stable performance.
>>
>> - Servers with 256GB of RAM, 10Gbps ethernet interfaces and around 110TB of disk per server.
>> - FreeBSD 11.2-p2
>> - Samba 4.7.6 with some local patches to allow (much) bigger socket listening queues in order to handle the case of many clients connecting at the same time.
>>
>> (We are trying to upgrade to a more recent Samba but 4.7.8 and 4.7.9 gave us horrible authentication performance every 10:th hour where the servers basically denied clients to login for about 2 hours so we had to back down to 4.7.6 again).
>
> I realise testing in production is difficult, but is there any chance
> you can pin down where between 4.7.6 and 4.7.8 it broke? There are not
> that many changes between, and while some appear authentication related
> nothing stands out.
>
> Also, do you run Samba as an AD DC, or are these file servers in a
> windows domain?
>
> Thanks,
>
> Andrew Bartlett
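For readers who want to reproduce this kind of measurement, here is a minimal sketch of a probe along the lines described in the message above: time one `smbclient` connect that authenticates with Kerberos and quits immediately, with a 10-second timeout. It is not the actual probe software from the thread. The function names, the `IPC$` share, and the exact command line are assumptions; `-k` is the Kerberos switch in smbclient of the Samba 4.7 era, while newer releases use `--use-kerberos=required` instead.

```python
import subprocess
import time
from typing import List, Optional


def smbclient_probe_argv(server: str, share: str = "IPC$") -> List[str]:
    """Hypothetical probe command: connect using the Kerberos ticket cache, then quit."""
    return ["smbclient", f"//{server}/{share}", "-k", "-c", "quit"]


def time_command(argv: List[str], timeout: float = 10.0) -> Optional[float]:
    """Return the wall-clock seconds argv took, or None if it hit the timeout.

    The exit status is ignored: a fast failure still yields a duration.
    """
    start = time.monotonic()
    try:
        subprocess.run(argv, stdout=subprocess.DEVNULL,
                       stderr=subprocess.DEVNULL, timeout=timeout)
    except subprocess.TimeoutExpired:
        return None
    return time.monotonic() - start
```

Calling `time_command(smbclient_probe_argv("fs1.example.org"))` once a minute per server (the host name is a placeholder) and storing the result would approximate the setup described; a `None` corresponds to an attempt that hit the 10-second limit.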
Rowland Penny
2018-Sep-04 13:35 UTC
[Samba] authentication performance with 4.7.6 -> 4.7.8 upgrade (was: Re: gencache.tdb size and cache flush)
On Tue, 4 Sep 2018 15:13:10 +0200 Peter Eriksson via samba <samba at lists.samba.org> wrote:
> I’m going to try to upgrade from 4.7.6 to 4.7.7 on one of our servers
> soon and see if things break or not. With 4.7.6 things are stable at
> least.
>
> Our file servers are in a Microsoft Windows domain (consisting of 6
> Microsoft Windows 2016 AD servers).
>
> The graphs (logarithmic time scale) below are login times from a
> probe station that times a connect using “smbclient” with a Kerberos
> ticket and basically just quits directly. It shows five of our Samba
> servers in one graph (so not so easy to read :-)
>
> Samba 4.7.6 (2018-09-04, right now, around 400-500 users per server):
>
> Samba 4.7.8 (2018-08-29):
>
> The probe software has a 10 seconds timeout so the “spikes” are
> probably/basically connection attempts that timed out. We probe all
> servers every minute. The holes in the graphs are 10 hours apart from
> last reboot (07:00 every day) -17:00, 03:00) and then nobody could
> connect basically (or the attempts took more than a minute so the
> whole session was aborted - and thus no data recorded in the RRD
> databases).
>
> - Peter

Sorry, but this list strips attachments. Can you post them somewhere and then post links?

Rowland