thr3ads.net - samba - [Samba] Samba 3.5.8 (and 3.5.5) shipped with Solaris 10 keeps crashing when smbd process count hits about 500-600 [Sep 2011]

Matti Rintala

2011-Sep-01 08:32 UTC

[Samba] Samba 3.5.8 (and 3.5.5) shipped with Solaris 10 keeps crashing when smbd process count hits about 500-600

Hi,

We are running Samba on a Solaris 10 cluster as an HA service. There are two
nodes in the cluster; the Samba version is 3.5.8 on one node and 3.5.5 on the
other. The Samba build is the one that ships with Solaris 10. We use Sun
(Oracle) LDAP for user account data, so passwd and group database information
is retrieved from there. Authentication is done against Windows 2008 AD.

This Samba service serves users' home directories. The same data is also shared
over NFS. We have over 11000 user accounts. During the summer this new service
worked nicely, but now that the user count has increased we are experiencing
severe problems. When the smbd process count reaches about 500, Samba just
stops responding and we have to restart it. Usually Oracle Solaris Cluster
performs the restart, but it fails because one smbd process won't die even with
signal 9 (kill -9). Nothing actually crashes, and at least for some time the
parent smbd keeps forking new children, so the process count keeps increasing.

We have opened a support case with Oracle, and together with them we have
speculated that this might be caused by a naming service and/or LDAP issue. So
we disabled nscd, but that didn't have any effect. We have also switched the
hosts' ldap_cachemgr to use a more efficient LDAP server, without success.

Any ideas what could be wrong, or how to debug the problem? We are still
continuing investigations with Oracle too.


Matti

Volker Lendecke

2011-Sep-01 09:48 UTC

On Thu, Sep 01, 2011 at 11:32:32AM +0300, Matti Rintala wrote:
> We are running Samba on a Solaris 10 cluster as an HA
> service. There are two nodes in the cluster; the Samba
> version is 3.5.8 on one node and 3.5.5 on the other. The
> Samba build is the one that ships with Solaris 10. We use
> Sun (Oracle) LDAP for user account data, so passwd and
> group database information is retrieved from there.
> Authentication is done against Windows 2008 AD.
>
> This Samba service serves users' home directories. The
> same data is also shared over NFS. We have over 11000 user
> accounts. During the summer this new service worked
> nicely, but now that the user count has increased we are
> experiencing severe problems. When the smbd process count
> reaches about 500, Samba just stops responding and we have
> to restart it. Usually Oracle Solaris Cluster performs the
> restart, but it fails because one smbd process won't die
> even with signal 9 (kill -9). Nothing actually crashes,
> and at least for some time the parent smbd keeps forking
> new children, so the process count keeps increasing.

Not being able to kill a process with -9 is a kernel
problem. You need to find out what the process is doing,
although I'm not sure how to do that under Solaris. Can
truss or some other tool inspect a process that is stuck?

> We have opened a support case with Oracle, and together
> with them we have speculated that this might be caused by
> a naming service and/or LDAP issue. So we disabled nscd,
> but that didn't have any effect. We have also switched the
> hosts' ldap_cachemgr to use a more efficient LDAP server,
> without success.

Naa, not being able to kill -9 is VERY unlikely to be an
LDAP issue. That's mostly user space, except that Sun might
have some door-based implementation of nss.

Volker
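
Before the stuck process can be inspected, it has to be identified. The following is only a rough sketch of one way to do that: take a process listing before and after the restart attempt and see which smbd PIDs are in both. It assumes Python 3.7 or later is available on the node and that the local ps accepts the column list `pid,s,fname`; the function names are made up for illustration and are not part of Samba or Solaris.

```python
import subprocess


def parse_ps(text):
    """Parse `ps -e -o pid,s,fname` style output into (pid, state, name) tuples."""
    rows = []
    for line in text.splitlines()[1:]:  # skip the header line
        parts = line.split(None, 2)
        if len(parts) == 3 and parts[0].isdigit():
            rows.append((int(parts[0]), parts[1], parts[2].strip()))
    return rows


def survivors(before, after, name="smbd"):
    """Return sorted PIDs of processes called `name` present in both listings."""
    pids_before = {pid for pid, _state, pname in before if pname == name}
    pids_after = {pid for pid, _state, pname in after if pname == name}
    return sorted(pids_before & pids_after)


def snapshot():
    """Run ps once and return the parsed rows."""
    # Assumption: this column list is accepted by the ps on the node.
    out = subprocess.run(["ps", "-e", "-o", "pid,s,fname"],
                         capture_output=True, text=True, check=True).stdout
    return parse_ps(out)
```

Calling `snapshot()` before the restart and again afterwards, then passing both results to `survivors()`, gives the candidate PIDs. A PID could in principle be reused by a new smbd in between, so the result is a hint rather than proof.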

-- 
SerNet GmbH, Bahnhofsallee 1b, 37081 Göttingen
phone: +49-551-370000-0, fax: +49-551-370000-9
AG Göttingen, HRB 2816, GF: Dr. Johannes Loxen

Thomas Nau

2011-Sep-01 10:36 UTC

On 09/01/2011 10:32 AM, Matti Rintala wrote:
> Hi,
>
> We are running Samba on a Solaris 10 cluster as an HA service. There are
> two nodes in the cluster; the Samba version is 3.5.8 on one node and 3.5.5
> on the other. The Samba build is the one that ships with Solaris 10. We use
> Sun (Oracle) LDAP for user account data, so passwd and group database
> information is retrieved from there. Authentication is done against
> Windows 2008 AD.
>
> This Samba service serves users' home directories. The same data is also
> shared over NFS. We have over 11000 user accounts. During the summer this
> new service worked nicely, but now that the user count has increased we are
> experiencing severe problems. When the smbd process count reaches about
> 500, Samba just stops responding and we have to restart it. Usually Oracle
> Solaris Cluster performs the restart, but it fails because one smbd process
> won't die even with signal 9 (kill -9). Nothing actually crashes, and at
> least for some time the parent smbd keeps forking new children, so the
> process count keeps increasing.

I'm not sure whether any of the p* commands or truss will be of much help in
that state. Nevertheless, you could check the call stack and open files using
pstack and pfiles.

If those don't help, one idea that comes to mind is to use dtrace.

> We have opened a support case with Oracle, and together with them we have
> speculated that this might be caused by a naming service and/or LDAP issue.
> So we disabled nscd, but that didn't have any effect. We have also switched
> the hosts' ldap_cachemgr to use a more efficient LDAP server, without
> success.

I doubt that, as those are not kernel related, and the "kill -9" issue
points to some kernel "problem".

> Any ideas what could be wrong, or how to debug the problem? We are still
> continuing investigations with Oracle too.

Thomas
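
The pstack and pfiles check suggested above could be scripted roughly as follows, so that the output can be saved and attached to the support case. This is only a sketch under assumptions: Python 3.7 or later on the node, both tools on the PATH, and hypothetical function names. Since it is unclear whether the tools respond at all for a process in that state, each call gets a timeout.

```python
import subprocess

# Solaris proc tools named in the thread; each takes a PID as its argument.
TOOLS = ("pstack", "pfiles")


def build_commands(pid, tools=TOOLS):
    """Return one argv list per tool for the given PID."""
    return [[tool, str(int(pid))] for tool in tools]


def run_with_timeout(argv, timeout=10):
    """Run argv and return (status, output).

    Gives up after `timeout` seconds. Caveat: if the tool itself gets
    stuck in the kernel and cannot be killed, this call can still block.
    """
    try:
        done = subprocess.run(argv, capture_output=True, text=True,
                              timeout=timeout)
        return ("exit %d" % done.returncode, done.stdout + done.stderr)
    except subprocess.TimeoutExpired:
        return ("timeout", "")
    except FileNotFoundError:
        return ("missing", "")


def inspect(pid, timeout=10):
    """Run every tool against `pid`; map tool name to (status, output)."""
    return {argv[0]: run_with_timeout(argv, timeout)
            for argv in build_commands(pid)}
```

For example, `inspect(12345)` (with the PID of the smbd that survives kill -9) returns a dictionary with one entry for pstack and one for pfiles. A "timeout" status for either tool would itself be worth reporting.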
