Hello Andrew,

> On Mon, 2020-02-17 at 19:11 +0300, Alex via samba wrote:
>> I'm running Samba AD DC in a VM under Proxmox. And it's eaten all RAM (1.8GB) within 3
>> days of running:

> Exactly which version is this?

Sorry, forgot to mention it. Samba version is 4.11.6. Some more info (if needed):

[root@vm-dc3 ~]# wbinfo -u | wc -l
62
[root@vm-dc3 ~]# wbinfo -g | wc -l
78

> Which sub-process is eating the memory?

I was trying to determine that w/o success. Looks like all Samba processes do that.

> Can you get a talloc dump using 'smbcontrol $PID pool-usage' for
> whichever PID is leaking the memory?

Not sure which one is _leaking_ the memory, so I took the one which ate memory more than the others (by "ps auxw"). Please, find the report here:
https://www.dropbox.com/s/76qdq1x89brmib0/samba.1520.pool-usage.txt.gz?dl=0

> Does the problem reproduce on a current Samba 4.11?

Yes, it's on the latest.

> Our new LDAP server design is much more memory efficient, particularly
> if you have pathological clients that search for the whole DB and then
> keep the socket open.

I don't think we have such clients.

--
Best regards,
Alex
Guys, I still need help with this. After a week of uptime almost all swap space is taken:

top - 19:24:32 up 7 days,  5:26,  1 user,  load average: 0.24, 0.16, 0.22
Tasks: 169 total,   1 running, 168 sleeping,   0 stopped,   0 zombie
%Cpu(s):  0.2 us,  0.2 sy,  0.0 ni, 99.7 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
KiB Mem :  1794860 total,   202308 free,  1458900 used,   133652 buff/cache
KiB Swap:  1048572 total,   147568 free,   901004 used.   194116 avail Mem

Current "ps auxw" along with "smbcontrol all pool-usage" are here:
https://paste.ee/p/P3doR
https://www.dropbox.com/s/76qdq1x89brmib0/samba.all.pool-usage.txt.gz?dl=0

Any help is kindly appreciated!

--
Best regards,
Alex
G'Day,

Can you rebuild Samba with libbsd so we get better process titles?

If you can't, then please use 'samba-tool processes' to line up pids with names.

Then, please run smbcontrol not against 'all' (which clearly didn't reach all the processes; it only returned data from eight), but against each of the largest processes individually, and put the output in distinct files for me?

If you can run Samba 4.12rc we will give some better info in that pool-usage output.

I've written a patch to improve our process titles further; in the future you will be able to use 'ps -ef -o pid,comm' and get some idea what each process is, without the rebuild (on Linux):

https://gitlab.com/samba-team/samba/-/merge_requests/1154

Andrew Bartlett

--
Andrew Bartlett                       https://samba.org/~abartlet/
Authentication Developer, Samba Team  https://samba.org
Samba Developer, Catalyst IT          https://catalyst.net.nz/services/samba
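Andrew's request above (line up PIDs with names via `samba-tool processes`, then pick out the largest processes) can be scripted. The following is a minimal POSIX `sh` sketch, not a Samba-provided tool. It assumes the `samba-tool processes` layout shown later in this thread (a `Service: PID` header, a dashed rule, then one `name pid` row per service) and uses procps `ps` for the resident set size. The function names `samba_procs` and `rank_by_rss` are hypothetical.

```shell
#!/bin/sh
# Sketch: rank Samba AD DC processes by resident memory, with service names.
# Assumptions: run on the DC as root, samba-tool and procps ps on PATH.

# Read `samba-tool processes` output on stdin, print "pid<TAB>service".
# Only rows whose last field is numeric are kept, which skips the
# "Service: PID" header and the dashed rule. Names such as
# "kdc_server(worker 3)" contain spaces, so the PID is the last field
# and everything before it is the service name.
samba_procs() {
    awk 'NF >= 2 && $NF ~ /^[0-9]+$/ {
        name = $0
        sub(/[ \t]+[0-9]+[ \t]*$/, "", name)   # drop the trailing PID
        sub(/^[ \t]+/, "", name)               # drop any leading padding
        printf "%s\t%s\n", $NF, name
    }'
}

# Read "pid<TAB>service" lines, print "rss_kib pid service",
# largest resident set first. PIDs that no longer exist are skipped.
rank_by_rss() {
    tab=$(printf '\t')
    while IFS="$tab" read -r pid name; do
        rss=$(ps -o rss= -p "$pid" 2>/dev/null) || continue
        # $rss is left unquoted on purpose to drop the padding ps adds.
        printf '%s %s %s\n' $rss "$pid" "$name"
    done | sort -rn
}
```

After sourcing the functions on the DC, `samba-tool processes | samba_procs | rank_by_rss | head -n 8` lists the eight largest processes with their service names. RSS does not count swapped-out pages, so for a ranking by swap (like Alex's "sorted by swap" paste) the `VmSwap` line of `/proc/<pid>/status` is a Linux alternative.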
On Tue, 2020-02-25 at 18:56 +0300, Alex wrote:
> Hello Andrew,
>
> Thanks for getting back on this.
>
> > If you can't, then please use 'samba-tool processes' to line up pids
> > with names.
>
> [root@vm-dc3 var]# samba-tool processes
> Service:                PID
> --------------------------------------
> cldap_server            1529
> dnssrv                  1555
> dnsupdate               1553
> kccsrv                  1548
> kdc_server              1533
> kdc_server(worker 3)    1552
> kdc_server(worker 0)    1540
> kdc_server(worker 1)    1544
> kdc_server(worker 2)    1549
> ldap_server             1525
> ldap_server(worker 1)   1569
> ldap_server(worker 2)   1571
> ldap_server(worker 0)   1567
> ldap_server(worker 3)   1573
> nbt_server              1517
> notify-daemon           1563
> rpc_server              1514
> rpc_server(worker 2)    1528
> rpc_server(worker 0)    1520
> rpc_server(worker 1)    1524
> rpc_server(worker 3)    1532
> samba                   1508
> winbind_server          1547
>
> > Then, please run the smbcontrol not against 'all' (which hasn't got all
> > the processes, clearly only returned data from eight), but against each
> > of the largest processes individually, and put them in distinct files
> > for me?
>
> sorted by swap: https://paste.ee/p/w6TL5
>
> I tried to grab pool-usage for PIDs 1540, 1552 and 1555 and got "No replies
> received" for each of them.
>
> sorted by rss: https://paste.ee/p/Rjsgw
>
> pool-usage for the PID 1520 returns the same "no replies"
>
> Let me know if you need anything else.

It looks to me like the worst of the issue is in the KDC.

You could run Samba under valgrind:

PYTHONMALLOC=malloc valgrind --trace-children=yes bin/samba -i

You could also try giving that pool-usage longer to run with the --timeout option to smbcontrol, as the default timeout is 10 seconds:

smbcontrol --timeout 60 ...

Andrew Bartlett
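Combining Andrew's two suggestions (one report per process, and a longer `--timeout` than smbcontrol's 10-second default), here is a small POSIX `sh` sketch that saves each PID's `pool-usage` report to its own file. The function name `dump_pool_usage` and the `pool-usage.<pid>.txt` naming are hypothetical; the sketch assumes `smbcontrol` is on `PATH` and that it runs as root on the DC.

```shell
#!/bin/sh
# Sketch: save the talloc pool-usage report of each given PID in its own file.
# Assumptions: smbcontrol is on PATH and the caller is root on the DC.

# Usage: dump_pool_usage OUTDIR PID...
dump_pool_usage() {
    outdir=$1
    shift
    mkdir -p "$outdir" || return 1
    for pid in "$@"; do
        out="$outdir/pool-usage.$pid.txt"
        # Allow 60 seconds instead of the 10-second default: in this thread
        # the default produced "No replies received" for several PIDs.
        if smbcontrol --timeout 60 "$pid" pool-usage > "$out" 2>&1; then
            echo "$pid: saved $out"
        else
            echo "$pid: smbcontrol failed, output kept in $out" >&2
        fi
    done
}
```

For example, `dump_pool_usage ./pool-usage 1533 1540 1544 1549 1552` would collect the KDC processes from the listing above, which is where Andrew suspected the worst of the growth.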
Thanks!

I've got a fix for one issue seen here.

I'd love to credit you with a fix. Please let me know what name and any affiliation (e.g. company) you would like listed.

There will probably be more to come, and if you can try and get me the report for any other large processes, that would be awesome.

See

BUG: https://bugzilla.samba.org/show_bug.cgi?id=14299
MR: https://gitlab.com/samba-team/samba/-/merge_requests/1168

Andrew Bartlett

On Wed, 2020-02-26 at 14:15 +0300, Alex wrote:
> Tweaking timeout value did the trick:
>
> (log files removed)
>
> Please, let me know if you still need me to run samba under valgrind
I should say, while it is good to fix, I don't think the fix below is the main issue.

Your logs show Samba has loaded its schema from the DB 10 times, and each of those is 2MB. I think that is why the processes are so large, but it could be something else also.

Do you have custom schema loaded?

Thanks,

Andrew Bartlett

On Thu, 2020-02-27 at 11:51 +1300, Andrew Bartlett wrote:
> I've got a fix for one issue seen here.
> ...
> BUG: https://bugzilla.samba.org/show_bug.cgi?id=14299
> MR: https://gitlab.com/samba-team/samba/-/merge_requests/1168
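To check a saved report for the pattern Andrew describes (the schema loaded 10 times at about 2MB each, i.e. roughly 10 × 2MB = 20MB), the report can be scanned for schema allocations. This is a sketch under an unconfirmed assumption: that each loaded schema appears in the `pool-usage` output under the talloc name `struct dsdb_schema`, on a line laid out the way talloc's full report prints entries (`name contains N bytes in M blocks (ref R) 0x...`). The function name `count_schemas` is hypothetical.

```shell
#!/bin/sh
# Sketch: count schema copies in a saved smbcontrol pool-usage report.
# Assumption (not confirmed in this thread): each loaded schema shows up as
#   struct dsdb_schema   contains 2097152 bytes in 1234 blocks (ref 0) 0x...
# i.e. talloc's full-report line layout with the name "struct dsdb_schema".

# Usage: count_schemas REPORT_FILE
count_schemas() {
    awk '$1 == "struct" && $2 == "dsdb_schema" && $3 == "contains" {
             copies++          # one matching line per schema object
             bytes += $4       # field 4 is the total size in bytes
         }
         END { printf "%d schema copies, %d bytes total\n", copies, bytes }' "$1"
}
```

Running `count_schemas pool-usage.1540.txt` prints the number of matching lines and the sum of their byte counts for that one report; comparing the result across processes shows whether the repeated schema loads account for most of their size.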
Hello Andrew,

> I've got a fix for one issue seen here.
> BUG: https://bugzilla.samba.org/show_bug.cgi?id=14299
> MR: https://gitlab.com/samba-team/samba/-/merge_requests/1168

Yesterday I deployed the freshly released 4.12.0, and it still eats memory:

top - 18:59:10 up 1 day, 25 min,  1 user,  load average: 0.08, 0.13, 0.10
Tasks: 176 total,   1 running, 175 sleeping,   0 stopped,   0 zombie
%Cpu(s):  0.2 us,  0.0 sy,  0.0 ni, 99.8 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
MiB Mem :   1829.4 total,    198.7 free,   1229.6 used,    401.1 buff/cache
MiB Swap:   1536.0 total,   1508.2 free,     27.8 used.    392.4 avail Mem

Right after samba started, it took around 500MB of "used".

The OS is CentOS 8.

# uname -r
4.18.0-147.5.1.el8_1.x86_64

smem output: https://paste.ee/p/W9hO9
samba-tool processes: https://paste.ee/p/OGoav
smbcontrol pool-usage for the most memory-consuming processes:
https://www.dropbox.com/s/ll1iwbzumafdmah/smbcontrol.pool-usage.tgz?dl=0

Anything I can do to help troubleshoot it further?

--
Best regards,
Alex