I have upgraded to Samba 2.2.2. Our memory problem seems to be
corrected, but we still have CPU usage issues. Samba quickly starts
eating away at the available CPU. We are running it on a Sun E3500
with four 400MHz processors and 2GB of RAM. Below is top output
showing that the top processes are all smbd. I currently have
approximately 210 smbd processes running.
last pid: 1473; load averages: 17.02, 14.48, 12.18 14:01:03
414 processes: 392 sleeping, 15 running, 3 zombie, 4 on cpu
CPU states: 0.0% idle, 17.9% user, 82.1% kernel, 0.0% iowait, 0.0% swap
Memory: 2048M real, 34M free, 1615M swap in use, 433M swap free
PID USERNAME THR PRI NICE SIZE RES STATE TIME CPU COMMAND
3768 root 1 33 0 5200K 3832K sleep 6:31 3.94% smbd
16898 root 1 33 0 5528K 4168K sleep 0:50 3.84% smbd
25270 root 1 12 0 5512K 4240K run 0:20 3.78% smbd
21839 root 1 21 0 5496K 4264K run 1:17 3.78% smbd
28038 root 1 13 0 5536K 4456K run 0:13 3.76% smbd
16262 root 1 21 0 5440K 4192K sleep 3:10 3.64% smbd
28783 root 1 12 0 5528K 4400K run 0:06 3.52% smbd
25273 root 1 22 0 5792K 4456K run 0:29 3.51% smbd
29415 root 1 31 0 5528K 4424K cpu15 0:20 3.44% smbd
4026 root 1 20 0 5512K 4320K run 6:12 3.43% smbd
21534 root 1 22 0 5904K 4872K sleep 2:41 3.27% smbd
27639 root 1 21 0 5528K 4320K sleep 0:15 3.16% smbd
842 root 1 22 0 5720K 4752K run 5:52 3.15% smbd
23081 root 1 38 0 5776K 4536K sleep 0:57 2.89% smbd
28384 root 1 13 0 5536K 4040K run 0:20 2.68% smbd
When I truss any of these processes they appear to be in a loop for long
periods of time. Below is a section of the truss.
...
20232: fcntl(13, F_SETLKW64, 0xEFFFE880) = 0
20232: fcntl(13, F_SETLKW64, 0xEFFFE810) = 0
20232: fcntl(13, F_SETLKW64, 0xEFFFE810) = 0
20232: fcntl(13, F_SETLKW64, 0xEFFFE810) = 0
20232: fcntl(13, F_SETLKW64, 0xEFFFE810) = 0
20232: fcntl(13, F_SETLKW64, 0xEFFFE810) = 0
...
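
For readers who have not met the call in the trace: fcntl() with F_SETLKW
requests a byte-range lock and blocks until the lock is granted (F_SETLKW64
is the large-file variant that truss reports). The trace alone does not show
which file descriptor 13 refers to, so the following is only a minimal
sketch of the call pattern, not Samba's code; lock_range() is a hypothetical
helper name.

```c
#include <errno.h>
#include <fcntl.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical helper (not taken from Samba): apply a byte-range lock of
 * the given type (F_RDLCK, F_WRLCK or F_UNLCK) to [start, start + len)
 * and wait until the kernel grants it. Returns 0 on success, -1 on error. */
int lock_range(int fd, short type, off_t start, off_t len)
{
    struct flock fl;
    int ret;

    memset(&fl, 0, sizeof(fl));
    fl.l_type = type;
    fl.l_whence = SEEK_SET;   /* l_start is an absolute file offset */
    fl.l_start = start;
    fl.l_len = len;

    /* F_SETLKW sleeps while another process holds a conflicting lock;
     * retry if the wait is interrupted by a signal. */
    do {
        ret = fcntl(fd, F_SETLKW, &fl);
    } while (ret == -1 && errno == EINTR);

    return ret;
}
```

A process that takes and releases such a lock around every small operation
on a shared file would be expected to show a long run of successful fcntl()
calls like the one in the trace.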
If anyone has any suggestions it would be greatly appreciated.
Thanks,
Frank
At 09:14 AM 10/8/2001 +0200, Wagner Guenter wrote:
>RE: Memory/CPU usage
>
>Hi Frank,
>
>There has been a discussion of this subject on the samba-technical
>mailing list, under the subject
>"Time-critical problem at Sun: exploding smbd memory usage"
>
>We had the same problem when we upgraded Samba 2.0.6 to 2.0.7.
>A memory leak was introduced in 2.0.7 and is still present in
>2.2.1a. It should be fixed in the upcoming 2.2.2.
>
>The problem shows up when there is a large number of printers.
>We also found that each time the configuration file smb.conf is
>changed (or even just "touch"ed) while Samba is running, the
>smbd processes grow ("explode") in size.
>
>Richard Bollinger wrote an early patch for 2.0.7.
>We use this patch with 2.0.10 (the relevant part of the code
>is the same) and it works fine. But there have been
>discussions about this fix.
>
>Then, on 6 Sep 2001, Richard Bollinger posted "the real fix",
>and it seems that this will go into 2.2.2.
>
>Here are some of the mailings on this subject
>(if you do not have time to read them all, look at the end):
>
>
>
>exploding-smbd-memory-2.txt
>
>From: samba-technical digest, Vol 1 #823 - 7 msgs
>From: samba-technical digest, Vol 1 #827 - 14 msgs
>From: samba digest, Vol 1 #550 - 41 msgs
>From: samba-technical digest, Vol 1 #852 - 10 msgs, 4 (Final Patch!), 5
>
>
>
>Message: 4
>Date: Thu, 23 Aug 2001 09:32:12 -0700
>From: Jeremy Allison <jeremy@valinux.com>
>To: Richard Bollinger <rabollinger@home.com>
>Cc: Gerald Carter <gcarter@valinux.com>,
> Samba Technical <samba-technical@samba.org>
>Subject: Re: Time-critical problem at Sun: exploding smbd memory usage
>
>Richard Bollinger wrote:
> >
> > Funny... I don't see the change in 2_2 CVS... did you really apply it?
> >
> > Ahhh I see... Jeremy took it back out. Nice of him to do so, but I think
> > he's wrong. The "main loop" doesn't clean things up after each printer is
> > added when we're using a [printers] clause in smb.conf. That loop occurs
> > inside pcap_printer_fn. Per my testing, this only seems to make a
> > difference on Solaris. Maybe fragmentation is occurring inside their
> > malloc() free()?
>
>I took it out as it's not safe to do that free
>inside the printer loop. It's only safe to
>do that talloc delete in the main loop, outside
>of any incoming smb processing.
>
>The talloc delete in the main loop should free
>any memory allocated in the tallocs inside the
>printer allocation. Why does this cause the RSS
>to grow on Solaris ?
>
>insure on Linux does not flag this as an alloc
>bug (and believe me, it would....).
>
>Jeremy.
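
The pattern Jeremy describes, many small temporary allocations that are all
released together at one safe point in the main loop, can be sketched as
follows. This is a toy pool written for illustration, not Samba's talloc
implementation; struct pool, pool_alloc() and pool_free_all() are
hypothetical names.

```c
#include <stddef.h>
#include <stdlib.h>

/* Header placed in front of every block so the pool can find it again.
 * The max_align_t member keeps the user data suitably aligned. */
union pool_header {
    union pool_header *next;
    max_align_t align;
};

struct pool {
    union pool_header *head;  /* singly linked list of live blocks */
    size_t count;             /* number of blocks currently held */
};

/* Allocate size bytes that stay valid until pool_free_all() is called. */
void *pool_alloc(struct pool *p, size_t size)
{
    union pool_header *h = malloc(sizeof(*h) + size);

    if (h == NULL)
        return NULL;
    h->next = p->head;
    p->head = h;
    p->count++;
    return h + 1;             /* user data starts right after the header */
}

/* Release every block at once. Only safe at a point where no caller
 * still holds a pointer into the pool, e.g. the top of the main loop. */
void pool_free_all(struct pool *p)
{
    while (p->head != NULL) {
        union pool_header *next = p->head->next;

        free(p->head);
        p->head = next;
    }
    p->count = 0;
}
```

In this sketch nothing is returned to malloc() until pool_free_all() runs,
so a long burst of pool_alloc() calls between two sweeps is held in full,
which is the situation the thread is arguing about.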
>
>
>--__--__--
>
>Message: 5
>Date: Thu, 23 Aug 2001 12:48:57 -0400
>From: David Collier-Brown <davecb@canada.sun.com>
>Reply-To: David.Collier-Brown@Sun.COM
>Organization: Private Person
>To: Jeremy Allison <jeremy@valinux.com>
>Cc: Richard Bollinger <rabollinger@home.com>,
> Gerald Carter <gcarter@valinux.com>,
> Samba Technical <samba-technical@samba.org>
>Subject: Re: Time-critical problem at Sun: exploding smbd memory usage
>
>Jeremy Allison wrote:
>
> > The talloc delete in the main loop should free
> > any memory allocated in the tallocs inside the
> > printer allocation. Why does this cause the RSS
> > to grow on Solaris ?
> >
> > insure on Linux does not flag this as an alloc
> > bug (and believe me, it would....).
>
> Ok, let's look at this when the conference
> is over: it may be a subtle Solaris bug,
> and I like to find those.
>
>--dave
>--
>David Collier-Brown, | Always do right. This will gratify
>Performance & Engineering Team | some people and astonish the rest.
>Americas Customer Engineering | -- Mark Twain
>(905) 415-2849 | davecb@canada.sun.com
>
>
>
>
>From: samba-technical digest, Vol 1 #827 - 14 msgs
>
>
>Message: 4
>From: "Richard Bollinger" <rabollinger@home.com>
>To: <David.Collier-Brown@sun.com>,
> "Michael E Osborne" <mosborne@jacads.com>
>Cc: <jeremy@valinux.com>, <farrar@parc.xerox.com>,
> <David.Collier-Brown@sun.com>, "Gerald Carter" <gcarter@valinux.com>,
> "Kris Desjardins" <kris_desjardins@hotmail.com>, <tonys@aus.sun.com>,
> <craig@aus.sun.com>, <allenw@sun.com>, <samba-technical@samba.org>
>Subject: Re: Time-critical problem at Sun: exploding smbd memory usage
>Date: Tue, 21 Aug 2001 08:33:28 -0400
>
>This is a multi-part message in MIME format.
>
>------=_NextPart_000_000F_01C12A1B.F3D52400
>Content-Type: text/plain;
> charset="iso-8859-1"
>Content-Transfer-Encoding: 7bit
>
>Try the attached patch to fix the printer related leakage on Solaris. We
>also have about 300 printers.
>
>Rich Bollinger, Elliott Company
>
>------=_NextPart_000_000F_01C12A1B.F3D52400
>Content-Type: application/octet-stream;
> name="fixleaks.patch"
>Content-Transfer-Encoding: quoted-printable
>Content-Disposition: attachment;
> filename="fixleaks.patch"
>
>*** ../samba-2.0.7/source.Linux/param/loadparm.c Fri Nov 17 09:24:27 2000
>--- ./param/loadparm.c Sat Nov 18 00:46:50 2000
>***************
>*** 2711,2716 ****
>--- 2711,2718 ----
> if ((i=lp_servicenumber(name)) >= 0)
> string_set(&iSERVICE(i).comment,comment);
> }
>+ /* free up temporary memory */
>+ lp_talloc_free();
> }
>
>/***************************************************************************
>*** ../samba-2.0.7/source.Linux/smbd/server.c Thu Mar 16 17:59:52 2000
>--- ./smbd/server.c Fri Nov 17 22:58:15 2000
>***************
>*** 183,188 ****
>--- 183,191 ----
> fd_set lfds;
> int num;
>
>+ /* free up temporary memory */
>+ lp_talloc_free();
>+
> memcpy((char *)&lfds, (char *)&listen_set,
> sizeof(listen_set));
>
>------=_NextPart_000_000F_01C12A1B.F3D52400--
>
>
>
>--__--__--
>
>Message: 5
>Date: Thu, 23 Aug 2001 16:30:21 +1000
>From: tony shepherd <tony.shepherd@aus.sun.com>
>To: Richard Bollinger <rabollinger@home.com>
>Cc: David.Collier-Brown@sun.com,
> Michael E Osborne <mosborne@jacads.com>, jeremy@valinux.com,
> farrar@parc.xerox.com, Gerald Carter <gcarter@valinux.com>,
> Kris Desjardins <kris_desjardins@hotmail.com>, tonys@aus.sun.com,
> craig@aus.sun.com, allenw@sun.com, samba-technical@samba.org
>Subject: Re: Time-critical problem at Sun: exploding smbd memory usage
>
>Many thanks to all of you for help in tracking down the problem and
>providing a fix. I have applied the patch and it appears to be
>working. I will let you know if our testing over the next week or so
>turns up any problems.
>
>Again, thanks for this. I did not expect the fix to be so quick.
>
>regards
>
>tony
>
>Richard Bollinger wrote:
> >
> > Try the attached patch to fix the printer related leakage on Solaris. We
> > also have about 300 printers.
> >
> > Rich Bollinger, Elliott Company
> >
> >
> > ----------------------------------------------------------------------
> > Name: fixleaks.patch
> > fixleaks.patch Type: unspecified type (application/octet-stream)
> > Encoding: quoted-printable
>
>
>
>From: samba digest, Vol 1 #550 - 41 msgs
>
>Message: 3
>Date: Mon, 27 Aug 2001 08:18:16 +1000
>From: tony shepherd <tony.shepherd@aus.sun.com>
>To: "Baker, Byran" <Byran_Baker@bmc.com>
>Cc: "'samba@lists.samba.org'" <samba@lists.samba.org>
>Subject: Re: Ultra 60 with 500+ users running out of memory
>
>We also recently had a problem with memory usage, although in 2.0.10.
>There was a memory leak which was causing the parent smbd to get very
>large, and with it every smbd spawned for a new connection. This
>problem was not only in 2.0.10, but also in earlier versions. Richard
>Bollinger provided a patch (see attached) which seems to have fixed
>the problem.
>
>We also found that the memory requirements under Solaris seemed to be
>quite a bit higher than under Linux (on Cobalt Qubes). The size
>of the smbd's was also dependent on the number of shares you were
>providing. For example, on one particular installation on Solaris 8,
>Samba 2.0.10 (after patch installation):
>
>35 shares:
>                                        SIZE   RES-MEM
>parent smbd                             2952   1560
>each new smbd (for each new session)    4568   2416
>
>313 shares:
>                                        SIZE   RES-MEM
>parent smbd                             3536   1784
>each new smbd (for each new session)    5080   2968
>
>
>Also, old smbd processes were not "going away". To recover resources,
>we set the "deadtime" parameter to 5. This removed any inactive
>processes after 5 minutes.
>
>Hope this helps.
>
>tony
>
>"Baker, Byran" wrote:
> >
> > I administer a Sun Ultra60 (2x450MHz, 1.5GB real memory, 2.5GB Swap)
> > running Samba 2.0.7. I run an average of 450 users at any given time
> > without problems. When I get many more than 500 users, I begin to have
> > memory shortage problems.
> >
> > I am trying to find out how I can tune Samba, to reduce the amount of
> > memory needed per user so that I do not have to upgrade machines again (I
> > have upgraded from a SPARCstation 5, to an Ultra 10, to the current Ultra
> > 60 over the years). The CPUs are idle most of the time, so my only real
> > concern is cutting down the memory usage.
> >
> > Thanks in advance,
> > -Byran
> > --
> > To unsubscribe from this list go to the following URL and read the
> > instructions: http://lists.samba.org/mailman/listinfo/samba
>
>[demime 0.98b removed an attachment of type application/octet-stream which
>had a name of fixleaks.patch]
>
>--__--__--
>
>Message: 4
>Date: Mon, 27 Aug 2001 08:51:25 +1000
>From: tony shepherd <tony.shepherd@aus.sun.com>
>To: "Baker, Byran" <Byran_Baker@bmc.com>,
> "'samba@lists.samba.org'" <samba@lists.samba.org>
>Subject: Re: Ultra 60 with 500+ users running out of memory
>
>[snip]
>
> >
> > [demime 0.98b removed an attachment of type application/octet-stream
> > which had a name of fixleaks.patch]
> > --
>[snip]
>
>Seems the patch got cut by the list server. Here it is again.
>
>tony
>*** ../samba-2.0.7/source.Linux/param/loadparm.c Fri Nov 17 09:24:27 2000
>--- ./param/loadparm.c Sat Nov 18 00:46:50 2000
>***************
>*** 2711,2716 ****
>--- 2711,2718 ----
> if ((i=lp_servicenumber(name)) >= 0)
> string_set(&iSERVICE(i).comment,comment);
> }
>+ /* free up temporary memory */
>+ lp_talloc_free();
> }
>
>
>/***************************************************************************
>*** ../samba-2.0.7/source.Linux/smbd/server.c Thu Mar 16 17:59:52 2000
>--- ./smbd/server.c Fri Nov 17 22:58:15 2000
>***************
>*** 183,188 ****
>--- 183,191 ----
> fd_set lfds;
> int num;
>
>+ /* free up temporary memory */
>+ lp_talloc_free();
>+
> memcpy((char *)&lfds, (char *)&listen_set,
> sizeof(listen_set));
>
>
>
>
>From: samba-technical digest, Vol 1 #852 - 10 msgs, 4
>
>Message: 4
>From: "Richard Bollinger" <rabollinger@home.com>
>To: <davecb@canada.sun.com>, "Gerald Carter" <gcarter@valinux.com>,
> <jeremy@valinux.com>
>Cc: <samba-technical@lists.samba.org>
>Subject: Re: Time-critical problem at Sun: exploding smbd memory usage ---
>Here's the real fix!
>Date: Thu, 6 Sep 2001 00:02:36 -0400
>
>This is a multi-part message in MIME format.
>
>------=_NextPart_000_0040_01C13667.3F142220
>Content-Type: text/plain;
> charset="iso-8859-1"
>Content-Transfer-Encoding: 7bit
>
>Using an old memory allocation debugging / tracking tool (mem_man), I
>monitored what was going on while smbd processed our 300+ printer
>printcap file...
>
>After processing 300 printers, the stats were as follows:
>Mem Manager : 196110 blocks, allocation 11553K, real allocation 11553K,
>0 errors
>
>Of that, talloc() accounted for 192036 of the malloc() calls and 11280K
>of the space allocated.
>
>Sure, all of that would be freed eventually, but it amounts to a torture
>test for the system's malloc() / free() capabilities, which apparently
>aren't as aggressive at recovering / releasing free space with Solaris as
>with Linux :-).
>
>I tracked the problem to an O(N^2) loop... add_all_printers() calls
>pcap_printer_fn(), which in turn calls lp_add_one_printer(), which in
>turn calls lp_servicenumber(), which in turn calls lp_servicename(),
>which in turn calls lp_string(), which in turn calls talloc().
>
>Here's the fix to lp_servicenumber(), based on similar code in
>getservicebyname()...
>
>--- ../source/param/loadparm.c Fri Aug 31 07:15:36 2001
>+++ ./param/loadparm.c Wed Sep 5 22:11:36 2001
>@@ -3418,7 +3424,8 @@
>
> for (iService = iNumServices - 1; iService >= 0; iService--)
> if (VALID(iService) &&
>- strequal(lp_servicename(iService), pszServiceName))
>+ ServicePtrs[iService]->szService &&
>+ strequal(ServicePtrs[iService]->szService, pszServiceName))
> break;
>
> if (iService < 0)
>
>After the fix is in, the same memory monitoring tools reveal these stats:
>Mem Manager : 4054 blocks, allocation 537K, real allocation 537K, 0 errors
>
>Of that, talloc() now accounts for only 7 of the malloc() calls and 357
>bytes allocated.
>
>Rich Bollinger, Elliott Company
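
Why a name lookup that copies every candidate name turns a printer load into
O(N^2) allocations can be seen in a toy model. The code below is an
illustration only, not Samba source: names[] stands in for
ServicePtrs[i]->szService, name_copy() for the allocating lp_servicename()
path, strcmp() for strequal() (which is case-insensitive in Samba), and all
function names are hypothetical. Unlike talloc, the toy frees each copy
immediately and only counts the allocations.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define MAX_SERVICES 512

static char *names[MAX_SERVICES];  /* stand-in for ServicePtrs[i]->szService */
static int num_services;
static long temp_allocs;           /* temporary copies made during lookups */

/* Stand-in for the allocating path: returns a fresh copy of the name. */
static char *name_copy(int i)
{
    char *copy = malloc(strlen(names[i]) + 1);

    if (copy != NULL) {
        strcpy(copy, names[i]);
        temp_allocs++;
    }
    return copy;
}

/* Lookup in the style of the old code: one allocation per entry examined. */
int find_copying(const char *name)
{
    int i;

    for (i = num_services - 1; i >= 0; i--) {
        char *copy = name_copy(i);
        int match = (copy != NULL && strcmp(copy, name) == 0);

        free(copy);
        if (match)
            break;
    }
    return i;                      /* -1 when the name is not present */
}

/* Lookup in the style of the fix: compare the stored string in place. */
int find_direct(const char *name)
{
    int i;

    for (i = num_services - 1; i >= 0; i--)
        if (names[i] != NULL && strcmp(names[i], name) == 0)
            break;
    return i;
}

/* Add n distinct printers, looking each one up first, and return how many
 * temporary allocations the chosen lookup function made along the way. */
long add_printers(int n, int (*find)(const char *))
{
    char buf[32];
    int k;

    for (k = 0; k < num_services; k++)
        free(names[k]);
    num_services = 0;
    temp_allocs = 0;

    for (k = 0; k < n && k < MAX_SERVICES; k++) {
        snprintf(buf, sizeof(buf), "printer%d", k);
        if (find(buf) < 0) {
            names[num_services] = malloc(strlen(buf) + 1);
            if (names[num_services] == NULL)
                break;
            strcpy(names[num_services], buf);
            num_services++;
        }
    }
    return temp_allocs;
}
```

In this model, add_printers(300, find_copying) makes 0 + 1 + ... + 299 =
44850 temporary allocations, while add_printers(300, find_direct) makes
none. The model performs one lookup per printer and starts from an empty
service list, so it is not meant to reproduce the 192036 calls measured
above, only the quadratic growth.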
>
>------=_NextPart_000_0040_01C13667.3F142220
>Content-Type: application/octet-stream;
> name="fixleaks.patch"
>Content-Transfer-Encoding: quoted-printable
>Content-Disposition: attachment;
> filename="fixleaks.patch"
>
>--- ../source/param/loadparm.c Fri Aug 31 07:15:36 2001
>+++ ./param/loadparm.c Wed Sep 5 22:11:36 2001
>@@ -3418,7 +3424,8 @@
>
> for (iService = iNumServices - 1; iService >= 0; iService--)
> if (VALID(iService) &&
>- strequal(lp_servicename(iService), pszServiceName))
>+ ServicePtrs[iService]->szService &&
>+ strequal(ServicePtrs[iService]->szService, pszServiceName))
> break;
>
> if (iService < 0)
>------=_NextPart_000_0040_01C13667.3F142220--
>
>
>
>--__--__--
>
>
>
>
>From: samba-technical digest, Vol 1 #852 - 10 msgs, 5
>
>Message: 5
>Date: Wed, 05 Sep 2001 23:54:22 -0700
>From: Jeremy Allison <jeremy@valinux.com>
>Reply-To: jra@samba.org
>To: Richard Bollinger <rabollinger@home.com>
>Cc: davecb@canada.sun.com, Gerald Carter <gcarter@valinux.com>,
> samba-technical@lists.samba.org
>Subject: Re: Time-critical problem at Sun: exploding smbd memory usage ---
>Here's
> the real fix!
>
>Richard Bollinger wrote:
> >
> > Using an old memory allocation debugging / tracking tool (mem_man), I
> > monitored what was going on while smbd processed our 300+ printer
> > printcap file...
> >
> > After processing 300 printers, the stats were as follows:
> > Mem Manager : 196110 blocks, allocation 11553K, real allocation 11553K,
> > 0 errors
> >
> > Of that, talloc() accounted for 192036 of the malloc() calls and 11280K
> > of the space allocated.
> >
> > Sure, all of that would be freed eventually, but it amounts to a torture
> > test for the system's malloc() / free() capabilities, which apparently
> > aren't as aggressive at recovering / releasing free space with Solaris as
> > with Linux :-).
> >
> > I tracked the problem to an O(N^2) loop... add_all_printers() calls
> > pcap_printer_fn(), which in turn calls lp_add_one_printer(), which in
> > turn calls lp_servicenumber(), which in turn calls lp_servicename(),
> > which in turn calls lp_string(), which in turn calls talloc().
> >
> > Here's the fix to lp_servicenumber(), based on similar code in
> > getservicebyname()...
> >
> > --- ../source/param/loadparm.c Fri Aug 31 07:15:36 2001
> > +++ ./param/loadparm.c Wed Sep 5 22:11:36 2001
> > @@ -3418,7 +3424,8 @@
> >
> > for (iService = iNumServices - 1; iService >= 0; iService--)
> > if (VALID(iService) &&
> > - strequal(lp_servicename(iService), pszServiceName))
> > + ServicePtrs[iService]->szService &&
> > + strequal(ServicePtrs[iService]->szService, pszServiceName))
> > break;
> >
> > if (iService < 0)
> >
> > After the fix is in, the same memory monitoring tools reveal these stats:
> > Mem Manager : 4054 blocks, allocation 537K, real allocation 537K, 0 errors
> >
> > Of that, talloc() now accounts for only 7 of the malloc() calls and 357
> > bytes allocated.
>
>*Great* detective work ! Thanks. I'll commit this fix
>to 2.2 and HEAD as soon as samba.org comes back on
>line for me :-).
>
>Jeremy.
>
>
>
>
>
>
>
>
>G. Wagner
>
>--------------------------------------
>Günter Wagner
>MKG Kreditbank GmbH
>Schieferstein 5
>D-65439 Flörsheim
>
>Telefon: +49 6145 506 358
>FAX: +49 6145 506 356
>E-Mail: g.wagner@mkg-bank.de
>--------------------------------------