Experts,
We've just migrated from Samba 2.2.8a to Samba 3.0.20b in a very large
corporate environment. Everything was fine in our lab, but we began
experiencing serious load problems on the production servers the morning
after the migration took place. I'll try (briefly) to describe the
characteristics of the scenario:
Resources:

Old Environment:
        Hardware:
                Dell PowerEdge 2650
                        Intel Xeon Processor
                        2 GB RAM
                        RAID 5 (via PERC RAID controller) on 10k SCSI disks
        Software:
                SuSE Linux Enterprise Server 8
                Samba 2.2.8a servers
                CUPS printing service
                openldap2 as backend (with replicas all over the country,
                about 3000 objects in the tree)
                Heartbeat as high-availability service

Everything worked like a charm here!

New Environment:
        Hardware:
                Dell PowerEdge 2850 servers
                        2 Intel Xeon 3.2 GHz processors (HT, I think; I see 4 of them)
                        4 GB RAM
                        RAID 5 (via PERC RAID controller) on 15k SCSI disks
        Software:
                SuSE Linux Enterprise Server 9
                Samba 3.0.20b servers
                CUPS printing service
                Novell eDirectory 8.7.3.4 as backend (very distributed too,
                about 4000 objects in the tree)
                Heartbeat as high-availability service
                drbd to keep the Samba configuration replicated among the
                cluster nodes
Problems we're having (or had; included in case they are useful):

eDirectory turned out to be much slower than openldap2 when responding
to nss_ldap queries (about 7 or 8 times slower), so queries asking for
the members of large groups (i.e. groups with about 1500 users or more)
usually ended with an RPC timeout.

Everything started to work when we added the ldapsam:trusted=yes
parameter. It dramatically reduced the response times, and the affected
queries began to work.
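
To put a number on the group-lookup slowdown described above, a small timing sketch can help. The Python below is not from the original post; it assumes the server resolves groups through NSS (so nss_ldap is exercised by the standard grp module), and time_group_lookup is a name made up for illustration.

```python
import grp
import time


def time_group_lookup(group_name):
    """Return (seconds, member_count) for a single group lookup.

    grp.getgrnam() goes through the C library's NSS layer, so on a host
    configured for nss_ldap the elapsed time includes the LDAP round trip.
    """
    start = time.perf_counter()
    entry = grp.getgrnam(group_name)
    elapsed = time.perf_counter() - start
    return elapsed, len(entry.gr_mem)
```

Running time_group_lookup() for one of the large groups against each backend would give a rough like-for-like comparison. Any caching layer such as nscd would need to be taken into account when reading the numbers.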
Enabling this feature caused some other problems (we've found
workarounds, but I'll mention them to provide some feedback):

        1) The Samba server used to die seconds after it was started.
           Something about the nobody user and its primary group
           prevented it from working properly. We solved this by adding
           the user nobody and its primary group to the backend.
        2) The root user was no longer recognized (we are still trying to
           figure out why; the user has been added to the tree, but
           nothing changed), so we used the new role-based administration
           provided by Samba 3 as a workaround (SeMachineAccount...), and
           have had no more trouble with it.
        3) THIS ISSUE IS KILLING US!

Something happens at a particular time of day (rush hour). Everything
is running smoothly (load average of 0.3 - 0.4) when the load starts to
grow without bound. It rises from 0.3 to 50 in a matter of seconds and
keeps growing until the server dies. We couldn't find the reason for
this, but it happens within a two-hour interval. Before and after this
interval, there are no errors of any kind.
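
One piece of information that could be gathered during such a spike is which processes are runnable or stuck in uninterruptible sleep, since those are the states the Linux load average counts. The sketch below is an illustration, not part of the original report; it assumes a Linux /proc filesystem, and parse_stat and count_processes are hypothetical helper names.

```python
import os
from collections import Counter


def parse_stat(text):
    """Extract (command name, state letter) from the text of /proc/<pid>/stat."""
    # The command name is wrapped in parentheses and may itself contain
    # spaces or parentheses, so split on the first "(" and the last ")".
    open_paren = text.index("(")
    close_paren = text.rindex(")")
    comm = text[open_paren + 1:close_paren]
    state = text[close_paren + 1:].split()[0]
    return comm, state


def count_processes(proc_root="/proc"):
    """Count processes grouped by (command name, state)."""
    counts = Counter()
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # not a process directory
        try:
            with open(os.path.join(proc_root, entry, "stat")) as handle:
                counts[parse_stat(handle.read())] += 1
        except OSError:
            continue  # the process exited while we were scanning
    return counts
```

Filtering the result for states R (running) and D (uninterruptible sleep) and printing the most common entries every few seconds would show whether smbd, nmbd or something else is piling up.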
        I'll paste some log errors (just the ones I saw). I don't think
they're the cause of our problems, but you're the experts.

Any clue? Do you need me to gather any information? Is there any DoS
bug reported for this Samba version?

        Any help will be highly appreciated.
Regards, 
Martin
--
        from /var/log/messages

        Oct 25 04:34:15 srvsmb01 smbd[2961]: [2005/10/25 04:34:15, 0] lib/util_sock.c:send_smb(762)
        Oct 25 04:34:15 srvsmb01 smbd[2961]:   Error writing 4 bytes to client. -1. (Connection reset by peer)
        Oct 25 04:40:36 srvsmb01 smbd[2983]: [2005/10/25 04:40:36, 0] lib/util_sock.c:get_peer_addr(1222)
        Oct 25 04:40:36 srvsmb01 smbd[2983]:   getpeername failed. Error was Transport endpoint is not connected
        Oct 25 04:40:36 srvsmb01 smbd[2983]: [2005/10/25 04:40:36, 0] lib/util_sock.c:write_data(554)
        Oct 25 04:40:36 srvsmb01 smbd[2983]:   write_data: write failure in writing to client 167.252.104.98. Error Connection reset by peer

        (this happens very often)

        From /var/log/samba/log.nmbd

        [2005/10/26 04:17:01, 2] tdb/tdbutil.c:tdb_log(767)
        tdb(unnamed): tdb_open_ex: /var/lib/samba/unexpected.tdb (2059,2959) is already open in this process
        (the same message appears 19 times in this excerpt, timestamped
        04:17:01 and 04:17:02)

        from /var/log/samba/log.smbd
          smbldap_open: cannot access LDAP when not root..
        [2005/10/25 01:29:28, 1] lib/smbldap.c:another_ldap_try(951)
        Connection to LDAP server failed for the 1 try!
        [2005/10/25 01:29:29, 0] lib/smbldap.c:smbldap_open(822)
        smbldap_open: cannot access LDAP when not root..
        [2005/10/25 01:29:29, 1] lib/smbldap.c:another_ldap_try(951)
        Connection to LDAP server failed for the 2 try!
        [2005/10/25 01:29:29, 2] smbd/close.c:close_normal_file(270)
        cmqtbe4 closed file Planta/TPM/Envasado/Linea4/LLENADORA/Merma Linea 4.xls (numopen=0)
        [2005/10/25 01:29:29, 2] smbd/open.c:open_file(372)
        CMQTBE4 opened file Planta/TPM/Envasado/Linea4/LLENADORA/Merma Linea 4.xls read=No write=Yes (numopen=1)
        [2005/10/25 01:29:29, 2] smbd/close.c:close_normal_file(270)
        cmqtbe4 closed file Planta/TPM/Envasado/Linea4/LLENADORA/Merma Linea 4.xls (numopen=0)
        [2005/10/25 01:29:30, 0] lib/smbldap.c:smbldap_open(822)
        smbldap_open: cannot access LDAP when not root..
        [2005/10/25 01:29:30, 1] lib/smbldap.c:another_ldap_try(951)
        Connection to LDAP server failed for the 3 try!
        [2005/10/25 01:29:31, 0] lib/smbldap.c:smbldap_open(822)
        smbldap_open: cannot access LDAP when not root..
        [2005/10/25 01:29:31, 1] lib/smbldap.c:another_ldap_try(951)
        Connection to LDAP server failed for the 4 try!
        [2005/10/25 01:29:32, 2] rpc_server/srv_spoolss_nt.c:find_printer_index_by_hnd(270)
        find_printer_index_by_hnd: Printer handle not found: _spoolss_writeprinter: Invalid handle (OTHER:15976:11737)
        [2005/10/25 01:29:32, 0] lib/smbldap.c:smbldap_open(822)
        smbldap_open: cannot access LDAP when not root..
        [2005/10/25 01:29:32, 1] lib/smbldap.c:another_ldap_try(951)
        Connection to LDAP server failed for the 5 try!
        [2005/10/25 01:29:33, 0] lib/smbldap.c:smbldap_open(822)
        smbldap_open: cannot access LDAP when not root..
        [2005/10/25 01:29:33, 1] lib/smbldap.c:another_ldap_try(951)
        Connection to LDAP server failed for the 6 try!
        [2005/10/25 01:29:34, 2] smbd/sesssetup.c:setup_new_vc_session(704)
        setup_new_vc_session: New VC == 0, if NT4.x compatible we would close all old resources.
        [2005/10/25 01:29:34, 2] smbd/sesssetup.c:setup_new_vc_session(704)
        setup_new_vc_session: New VC == 0, if NT4.x compatible we would close all old resources.
        [2005/10/25 01:29:34, 0] lib/smbldap.c:smbldap_open(822)
        smbldap_open: cannot access LDAP when not root..
        [2005/10/25 01:29:34, 1] lib/smbldap.c:another_ldap_try(951)
        Connection to LDAP server failed for the 7 try!
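
To check whether the LDAP failures in log.smbd line up with the load spike, the retries can be bucketed per minute. This is a minimal sketch, not something from the original post; it assumes the two-line layout seen in the excerpt above (a bracketed timestamp header followed by the message), and ldap_failures_per_minute is a hypothetical helper name.

```python
import re
from collections import Counter

# Matches a Samba log header such as "[2005/10/25 01:29:28, 1] ..." and
# captures the timestamp truncated to the minute.
HEADER = re.compile(r"\[(\d{4}/\d{2}/\d{2} \d{2}:\d{2}):\d{2}, *\d+\]")
NEEDLE = "Connection to LDAP server failed"


def ldap_failures_per_minute(lines):
    """Count LDAP connection failures, keyed by 'YYYY/MM/DD HH:MM'."""
    counts = Counter()
    minute = None
    for line in lines:
        match = HEADER.search(line)
        if match:
            minute = match.group(1)  # remember the most recent header
        if NEEDLE in line and minute is not None:
            counts[minute] += 1
    return counts
```

Passing an open file object for /var/log/samba/log.smbd works, since the function only iterates over lines. Comparing the busiest minutes with the time the load average starts climbing would show whether the two are related.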

First of all, why run SuSe when CentOS is free, runs faster and is more
up to date? I have basically the same setup you have except our system
is a quad xeon system and CentOS runs flawlessly 24/7. We used to
experiment with SuSe but it is not good for a corporate environment.
Just a heads up as I have been doing this for 17 years and CentOS is the
cream of the crop for the money.

> First of all, why run SuSe when CentOS is free, runs faster
> and is more
<snipped the rest>

This is the samba list and he was asking for samba help, not for a
suggestion that he should change his, possibly corporately mandated,
platform choice. Regardless of your personal or tested *opinions*, it
was not asked for here. People have reasons for running what they do,
some of which are out of their control.

By the way, your Mozilla install is horribly out of date.

I am running SuSE 9.2 Pro in a corporate environment with 3.0.14a and it
works great. CentOS is nice as well but I see no problem with SuSE.
Just my 0.02...

If you truly think this is a samba problem, try a different version to
either replicate the issue or to have it point to a different piece of
the puzzle. What is your complete config? You said the load went sky
high in a matter of seconds... do you see which process is running wild
(smbd, nmbd, winbindd...)?

Good luck,
Michael Barber
WPTZ/WNNE Computer Services Administrator

-----Original Message-----
From: samba-bounces+mjbarber=hearst.com@lists.samba.org On Behalf Of merle@gardenfreshcorp.com
Sent: Friday, October 28, 2005 12:48 PM
To: masc@intraredes.com
Cc: samba@lists.samba.org
Subject: Re: [Samba] Overloaded samba server. Is it a bug?

On Thu, 2005-10-27 at 03:12 -0300, Martin Scandroli wrote:
> Experts,

> The implementation of this feature produced some other problems (we've
> found workarounds but I'll comment on them just to provide some feedback).
>
> 1) The samba server used to die seconds after it was started.
> Something about the nobody user and its primary group prevented it from
> working properly. We solved this inconvenience by adding the user
> nobody and its corresponding primary group to the backend.

Yep, this is a known requirement for that feature. I'm not sure it should die, but it can't work without all the accounts it will deal with in LDAP. (Otherwise we have to use the slower method, which is why you turned this on in the first place).

> 2) Root user was no longer recognized (we're still trying to figure out
> why; the user's been added to the tree, but nothing changed), so we used
> the new role-based administration provided by samba 3 as a workaround
> (SeMachinAccount...), and no more troubles about it.

Yep.

> 3) THIS ISSUE IS KILLING US!!!!!!!
>
> Something happens at a particular moment of the day (rush hour).
> Everything is running smoothly (0.3 - 0.4 of load average) when the load
> starts to grow indefinitely!!!!!! It rises from 0.3 to 50 in a matter of
> seconds, and it keeps growing till the server dies. We couldn't find the
> reason for this, but it happens within a two-hour interval. Before and
> after this interval, there are no errors of any kind.
>
> I'll paste some log errors (just the ones I saw). I don't think
> they're the cause of our problems, but you're the experts.
>
> Any clue? Do you need me to gather some kind of information? Any DoS
> bug reported for this samba version?

My guess is this:

Your LDAP server is getting backed up because of a bug, perhaps involving a lock in the database. Then Samba processes start backing up, trying to access LDAP, which is wedged. They keep hammering at the LDAP server in the backoff pattern, then fail (causing the client to try again). Because the questions are not being answered, the load goes through the roof, and this causes the LDAP server more pain.

One option is to separate your LDAP server from your samba server, and have more than one LDAP server available per Samba server. This allows Samba to use the other server while the local one recovers (assuming some short-term lock).

Andrew Bartlett

--
Andrew Bartlett                                http://samba.org/~abartlet/
Samba Developer, SuSE Labs, Novell Inc.        http://suse.de
Authentication Developer, Samba Team           http://samba.org
Student Network Administrator, Hawker College  http://hawkerc.net
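The benefit of having more than one LDAP server to fall back on can be illustrated with a minimal sketch. This is not Samba's code and makes no claim about how smbd implements its retries; the function name `connect_with_failover`, the `try_connect` callback, and the delay values are hypothetical and chosen only for illustration.

```python
import time
from typing import Callable, Optional, Sequence


def connect_with_failover(
    servers: Sequence[str],
    try_connect: Callable[[str], bool],
    max_rounds: int = 5,
    base_delay: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
) -> Optional[str]:
    """Return the first server that accepts a connection, or None.

    Hypothetical illustration: every server is tried once per round, and
    the wait between rounds doubles each time (exponential backoff).
    """
    for attempt in range(max_rounds):
        for server in servers:
            if try_connect(server):
                return server
        # Every server failed in this round: wait longer before retrying.
        if attempt < max_rounds - 1:
            sleep(base_delay * 2 ** attempt)
    return None
```

With a single server in the list, a wedged LDAP instance means every round fails and the caller only ever waits. With two or more, a short-term lock on one server costs a single failed attempt before the next server answers.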
Hi,
> Is there any place in samba where I should be looking?
> Any info/pointers would be much appreciated.

We don't have any problems with memberships in more than two hundred groups.
OS: SuSE SLES 9, Samba 3.0.14a
Kind regards,
Dirk Laurenz
Systems Engineer	
Fujitsu Siemens Computers
S CE DE SE PS N/O
Sales Central Europe Deutschland 
Professional Service Nord / Ost
Hildesheimer Strasse 25
30880 Laatzen
Germany
Telephone:	+49 (511) 84 89 - 18 08
Telefax:	+49 (511) 84 89 - 25 18 08
Mobile:	+49 (170) 22 10 781
Email:	mailto:dirk.laurenz@fujitsu-siemens.com
Internet:	http://www.fujitsu-siemens.com
            http://www.fujitsu-siemens.de/services/index.html
On Wednesday 02 November 2005 19:50, Jeremy Allison wrote:
> On Wed, Nov 02, 2005 at 06:53:36PM -0300, Martin wrote:
> > #> strace -f -p <PID_OF_SMBD>
> >
> > RDONLY|O_NONBLOCK|O_LARGEFILE|O_DIRECTORY) = 18
> > fstat64(18, {st_mode=S_IFDIR|0770, st_size=128, ...}) = 0
> > fcntl64(18, F_SETFD, FD_CLOEXEC) = 0
> > getdents64(18, /* 4 entries */, 4096) = 136
> > getdents64(18, /* 0 entries */, 4096) = 0
[ ... ]
> > write(45, " reply_unlink : Estructura_Cent"..., 98) = 98
> > stat64("Estructura_Central/marketing/Medios/Victor/insitucional 2005/INVERSION", {st_mode=S_IFDIR|0770, st_size=128, ...}) = 0
> > stat64("Estructura_Central/marketing/Medios/Victor/insitucional 2005/INVERSION/cao 2.xls", 0xbfffcec0) = -1 ENOENT (No such file or directory)
> > stat64("Estructura_Central/marketing/Medios/Victor/insitucional 2005/INVERSION/cao 2.xls", 0xbfffcec0) = -1 ENOENT (No such file or directory)
>
> What filesystem is this ?

1TB with reiserfs in LVM

--
Mrtn
>> 1TB with reiserfs in LVM
>
> We have a similar installation: Kernel 2.6.5-7.201-smp (the official
> kernel of SuSE 9.1 Professional) and we are using openldap and reiserfs
> too. Additionally we are using quota on the filesystem. Our server often
> hangs in this situation with a load of 350!!! The interesting part is
> that the CPUs are 92% idle. If we deactivate the quota subsystem the
> server will work for a longer time, but it could also happen that the
> load reaches 350... Only a reboot will solve this problem...
>
> Martin: Which kernel are you using? Do you use quota on your filesystem?

My 2 Eurocents:

With the same setup I've had a similar problem when there was a slight inconsistency in the spelling of a user name in the group. Can you exclude the possibility that a user name is misspelled somewhere?
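One way to rule out a misspelled group member is to list every member name that does not resolve to a user. The sketch below is only an illustration: the function names are hypothetical, and it assumes that users and groups are visible through the system's NSS configuration (for example via nss_ldap), which Python reads through its standard `pwd` and `grp` modules.

```python
import grp
import pwd
from typing import Callable, Dict, Iterable, List


def unknown_members(
    groups: Dict[str, Iterable[str]],
    user_exists: Callable[[str], bool],
) -> Dict[str, List[str]]:
    """Return {group: [member names that do not resolve to a user]}."""
    problems: Dict[str, List[str]] = {}
    for group, members in groups.items():
        missing = [name for name in members if not user_exists(name)]
        if missing:
            problems[group] = missing
    return problems


def system_user_exists(name: str) -> bool:
    """True if the name resolves through NSS (passwd database lookup)."""
    try:
        pwd.getpwnam(name)
        return True
    except KeyError:
        return False


def system_groups() -> Dict[str, List[str]]:
    """All groups visible through NSS, mapped to their member lists."""
    return {g.gr_name: list(g.gr_mem) for g in grp.getgrall()}
```

On a live server the check would be `unknown_members(system_groups(), system_user_exists)`. Enumerating every group through a slow directory backend can itself take a long time, so it is probably best run outside the rush hour.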
Martin Scandroli
2005-Nov-11  03:07 UTC
[Samba] Overloaded samba server. Is it a bug? (but not a samba bug)
Well, we finally resolved it. The problem was with the QLA driver: we applied a kernel patch (kernel-bigsmp-2.6.5-7.234.i586.rpm) provided by SuSE support and it is working fine. The patch will be provided soon in the next SLES9 Support Pack 3.

Anyway, thank you all for your help!

Martín

On Nov 04, 2005 01:36 PM, Jeremy Allison <jra@samba.org> wrote:
> On Fri, Nov 04, 2005 at 10:51:52AM -0300, Martin wrote:
> >
> > How could we find it out? How could we get enough debugging level to
> > reach this information?
> >
> > When the smbd process stopped in D state the strace does not show
> > any line...
>
> Attach to it with gdb and type "bt".
>
> Jeremy.
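On Linux, tasks in uninterruptible sleep (state D) count toward the load average, which is consistent with a very high load on mostly idle CPUs when many smbd processes are blocked in a driver or filesystem. The sketch below picks such tasks out of process-list output so they can be inspected further. It assumes the text comes from procps `ps -eo pid,stat,comm`; the function name `uninterruptible` is hypothetical.

```python
from typing import List, Tuple


def uninterruptible(ps_output: str) -> List[Tuple[int, str]]:
    """Parse `ps -eo pid,stat,comm` output and return (pid, command)
    for every task whose state starts with D (uninterruptible sleep)."""
    found: List[Tuple[int, str]] = []
    lines = ps_output.strip().splitlines()
    for line in lines[1:]:  # the first line is the column header
        parts = line.split(None, 2)
        if len(parts) == 3 and parts[1].startswith("D"):
            found.append((int(parts[0]), parts[2].strip()))
    return found
```

The input can be captured with `subprocess.run(["ps", "-eo", "pid,stat,comm"], capture_output=True, text=True).stdout`. A long list of smbd entries that stay in state D suggests the processes are stuck below Samba, in the kernel, which matches the driver fix reported in this thread.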