thr3ads.net - samba - [Samba] gencache.tdb size and cache flush [Sep 2018]

Francesco Malvezzi

2018-Sep-04 09:59 UTC

[Samba] gencache.tdb size and cache flush

On 04/09/18 06:00, Volker Lendecke wrote:
> Hi!
> 
> Technical description below, but the exec summary is: Yes, we have a
> performance problem with gencache.
> 
> On Wed, Aug 29, 2018 at 10:28:05AM +0200, Francesco Malvezzi via samba wrote:
>> Hi all,
>>
>> I have a midsize AD domain with some 50k users but only 100 workstations
>> joined.
>>
>> Sometimes I find server CPU throttling at 100%. In order to let it drop
> 
> Can you find out where *exactly* that 100% is spent? gstack on the
> spinning process with debug symbols would be very helpful here.
I am not sure how to do it.

Would something like this do?
https://gist.github.com/francescm/8e396f5470da8df8451be13777e18810

> 
>> and have smooth performance I delete cache:
>>
>> systemctl stop samba
>> net cache flush
>> systemctl start samba
>>
>> First of all, is a samba stop needed to flush the cache?
> 
> No.
thank you.
> 
>> Even if cache flush does the job to restore performance, I am clueless
>> about the root cause of the problem. Before flushing cache the
>> gencache.tdb had 15k entries. Is that large? Do you think it is worth the
>> time to investigate why it grows so much, or is it just normal?
> 
> 15k entries is not really silly large. I've seen much larger ones.
> What kind of OS do you have? The question is -- does it have the
> ability to use robust mutexes? (FreeBSD 11 and recent Linux).
Debian GNU/Linux 9 (stretch)
Linux addc 4.9.0-8-amd64 #1 SMP Debian 4.9.110-3+deb9u3 (2018-08-19) x86_64 GNU/Linux

I absolutely agree that this needs further investigation. The gencache
lead was only a suspicion. What I know for sure is that I see high load
spikes from a process labelled "samba: task[dcesrv]".

The stop/delete cache/start procedure does work, but I increasingly
believe that the "delete cache" part is useless.
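
Because no stop is needed, the cache can be inspected and flushed while Samba is running. The following is a rough sketch, assuming tdbdump (packaged as tdb-tools on Debian) is available. The helper name tdb_entries is hypothetical, and the gencache.tdb path in the usage comment is an assumption that varies by distribution and build.

```shell
#!/bin/sh
# Count the records in a tdb file.
# tdbdump prints one 'key(N) = "..."' line per record, so counting
# those lines gives the number of entries.
tdb_entries() {
    tdbdump "$1" | grep -c '^key('
}

# Usage on a running server (the path is an assumption, adjust it):
#   tdb_entries /var/cache/samba/gencache.tdb
#   net cache flush
```

Comparing the count before and after a load spike may help decide whether the cache size is related to the problem at all.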

thank you,

franz

Volker Lendecke

2018-Sep-04 10:42 UTC

On Tue, Sep 04, 2018 at 11:59:04AM +0200, Francesco Malvezzi wrote:
> On 04/09/18 06:00, Volker Lendecke wrote:
> > Hi!
> > 
> > Technical description below, but the exec summary is: Yes, we have a
> > performance problem with gencache.
> > 
> > On Wed, Aug 29, 2018 at 10:28:05AM +0200, Francesco Malvezzi via samba wrote:
> >> Hi all,
> >>
> >> I have a midsize AD domain with some 50k users but only 100 workstations
> >> joined.
> >>
> >> Sometimes I find server CPU throttling at 100%. In order to let it drop
> > 
> > Can you find out where *exactly* that 100% is spent? gstack on the
> > spinning process with debug symbols would be very helpful here.
> 
> not sure how to do it.
> 
> can be like that
> https://gist.github.com/francescm/8e396f5470da8df8451be13777e18810
> ?
Yes, exactly. The relevant line is

#19 0x00007fe50c1c5a3c in dcesrv_samr_EnumDomainUsers

which means that some client is listing all users in your domain. With
50,000 users this takes a while. If the client times out and
reconnects, the requests can pile up pretty quickly.
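
A backtrace like the one in the gist can also be captured with gdb where gstack is not installed. This is a minimal sketch, assuming gdb and the Samba debug symbols are present. The function name backtrace_of and the PID 1234 are illustrative only.

```shell
#!/bin/sh
# Print a backtrace of every thread in a running process.
# gdb attaches to the PID, runs the command given with -ex, then detaches
# because of -batch. Readable frames require debug symbols.
backtrace_of() {
    pid=$1
    gdb -p "$pid" -batch -ex "thread apply all bt" 2>/dev/null
}

# Usage (as root). First find the busiest processes, then dump one of them:
#   ps -eo pid,pcpu,args --sort=-pcpu | head -n 5
#   backtrace_of 1234 | grep -i samr
```

Frames such as dcesrv_samr_EnumDomainUsers in the output point at the RPC call that keeps the process busy.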

Do you have Linux clients with winbind and "winbind enum users = yes"
in your network? This would probably do that to your DC.
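
To check a Linux client for this, the effective winbind setting can be read on the client itself. This is a sketch, assuming testparm is installed there. The function name enum_users_setting is hypothetical.

```shell
#!/bin/sh
# Show the effective value of "winbind enum users" on this machine.
# testparm -s skips the interactive prompt, -v also prints parameters
# that are left at their default values.
enum_users_setting() {
    testparm -s -v 2>/dev/null | grep -i 'winbind enum users'
}

# Usage on each winbind client:
#   enum_users_setting
```

A value of Yes means that calls listing the passwd database (for example getent passwd) can make winbind enumerate the whole domain.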

Volker

-- 
SerNet GmbH, Bahnhofsallee 1b, 37081 Göttingen
phone: +49-551-370000-0, fax: +49-551-370000-9
AG Göttingen, HRB 2816, GF: Dr. Johannes Loxen
http://www.sernet.de, mailto:kontakt at sernet.de

Meet us at Storage Developer Conference (SDC)
Santa Clara, CA USA, September 24th-27th 2018

Francesco Malvezzi

2018-Sep-04 13:36 UTC

On 04/09/18 12:42, Volker Lendecke wrote:
> On Tue, Sep 04, 2018 at 11:59:04AM +0200, Francesco Malvezzi wrote:
>> On 04/09/18 06:00, Volker Lendecke wrote:
>>> Hi!
>>>
>>> Technical description below, but the exec summary is: Yes, we have a
>>> performance problem with gencache.
>>>
>>> On Wed, Aug 29, 2018 at 10:28:05AM +0200, Francesco Malvezzi via samba wrote:
>>>> Hi all,
>>>>
>>>> I have a midsize AD domain with some 50k users but only 100 workstations
>>>> joined.
>>>>
>>>> Sometimes I find server CPU throttling at 100%. In order to let it drop
>>>
>>> Can you find out where *exactly* that 100% is spent? gstack on the
>>> spinning process with debug symbols would be very helpful here.
>>
>> not sure how to do it.
>>
>> can be like that
>> https://gist.github.com/francescm/8e396f5470da8df8451be13777e18810
>> ?
> 
> Yes, exactly. The relevant line is
> 
> #19 0x00007fe50c1c5a3c in dcesrv_samr_EnumDomainUsers
Thank you for reading through all of that.
> 
> which means that some client is listing all users in your domain. With
> 50,000 users this takes a while. If the client times out and
> reconnects, this can pretty quickly pile up.
If I simulate it by listing all users in the Active Directory Users and
Computers utility, the load rises to 100% CPU, but only briefly, because
the client disconnects at around 1000 users.

A call to:
time sudo ./bin/ldbsearch -H private/sam.ldb "(objectClass=user)" > /dev/null

real	0m22,410s
user	0m20,132s
sys	0m2,072s

describes your scenario better: one CPU is fully loaded for about 20
seconds and then the load drops.
> 
> Do you have Linux clients with winbind and "winbind enum users = yes"
> in your network? This would probably do that to your DC.
As far as I know, the winbindd clients in our environment do not enumerate
users unless they are misconfigured (but I cannot speak for the Mac OS X
clients).

thank you,

franz

Andrew Bartlett

2018-Sep-04 19:15 UTC

On Tue, 2018-09-04 at 12:42 +0200, Volker Lendecke via samba wrote:
> On Tue, Sep 04, 2018 at 11:59:04AM +0200, Francesco Malvezzi wrote:
> > 
> > On 04/09/18 06:00, Volker Lendecke wrote:
> > > 
> > > Hi!
> > > 
> > > Technical description below, but the exec summary is: Yes, we
> > > have a
> > > performance problem with gencache.
> > > 
> > > On Wed, Aug 29, 2018 at 10:28:05AM +0200, Francesco Malvezzi via
> > > samba wrote:
> > > > 
> > > > Hi all,
> > > > 
> > > > I have a midsize AD domain with some 50k users but only 100
> > > > workstations
> > > > joined.
> > > > 
> > > > Sometimes I find server CPU throttling at 100%. In order to let
> > > > it drop
> > > Can you find out where *exactly* that 100% is spent? gstack on
> > > the
> > > spinning process with debug symbols would be very helpful here.
> > not sure how to do it.
> > 
> > can be like that
> > https://gist.github.com/francescm/8e396f5470da8df8451be13777e18810
> > ?
> Yes, exactly. The relevant line is
> 
> #19 0x00007fe50c1c5a3c in dcesrv_samr_EnumDomainUsers
> 
> which means that some client is listing all users in your domain.
> With 50,000 users this takes a while. If the client times out and
> reconnects, this can pretty quickly pile up.
> 
> Do you have Linux clients with winbind and "winbind enum users = yes"
> in your network? This would probably do that to your DC.
And if the client can't be fixed, the implementation in the Samba AD DC
SAMR server could certainly be made much, much more efficient. As far as
I can see, we do an objectclass=user search for every page of 54 users,
which makes a lot of searches for 50,000 users!
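
To put a rough number on that, the count of searches can be worked out with a ceiling division. This is only a sketch, assuming exactly one search per page of 54 users as described above. The helper name searches_needed is hypothetical.

```shell
#!/bin/sh
# Number of page-sized searches needed to enumerate all users,
# assuming one search per page (integer ceiling division).
searches_needed() {
    users=$1
    page=$2
    echo $(( (users + page - 1) / page ))
}

searches_needed 50000 54   # prints 926
```

Under that assumption a single full enumeration of 50,000 users costs 926 separate searches.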

Thanks,

Andrew Bartlett
-- 
Andrew Bartlett                       http://samba.org/~abartlet/
Authentication Developer, Samba Team  http://samba.org
Samba Developer, Catalyst IT          http://catalyst.net.nz/services/samba
