On 10/05/2010 19:14, Jim Kusznir wrote:
> Hi all:
>
> I've got a couple Ubuntu 9.10 machines that are suffering from a
> recurring failure of winbind that essentially crashes the machine. When
> the system is in the "crashed state", one can ping the system, but all
> forms of login fail.
That is expected: once winbind stops working, every service that relies on
PAM is out of service as well.
> It will not even respond to tftpd requests; ssh
> connections "time out", but the initial port is opened (just no
> connect). Rebooting does NOT recover from this; in order to recover,
> I need to:
>
> 1) reboot into single user mode
>
Do you have enough free space on your partitions at this point?
> 2) edit /etc/nsswitch.conf and remove winbind
> 3) remove winbind from all pam.d/*
> 4) boot normally
> 5) stop samba and winbind
> 6) delete /var/lib/samba/* and /var/cache/samba/*
> 7) start samba
> 8) rejoin domain
> 9) start winbind
> 10) undo #2 and 3 above
>
> After this, winbind will work for a week or two. If I stop after step
> 4 above the system is usable, but without domain users able to log in.
> My diagnostics show that net ads users (and all other "samba"
> commands) work just fine and find all users. All winbind-specific
> commands (wbinfo -u, etc) fail. Oh, if I leave the system up in the
> crashed state, it begins to fill up logs to the tune of 32gigs in a
> few days. The above procedure repeats approximately once every 5 days
> on our main production system. I have a second workstation that sees
> very little use, and it has suffered the same crash, but far less
> frequently. I have also tried inserting step 6.5 where I delete the
> machine account on the DC, but that doesn't change anything. Also,
> our Ubuntu 9.04 system running the same configuration files has no
> issues. We have not tried 10.04.
>
> This problem has been plaguing our operations for over two months now,
> so any assistance would be greatly appreciated.
>
> Some log file snippets:
>
> (from some point "in the middle" of the crash):
> May 7 15:32:45 casas-lin winbindd[20677]: sys_select: pipe failed (Too many open files)
>
"Too many open files" (error 24, EMFILE) means a process has reached its
limit of open file descriptors. Try the lsof command to see which process
has too many files open:
lsof | wc -l
shows roughly how many files are open in total,
lsof | less
lists all open files, and
cat /proc/sys/fs/file-max
shows the system-wide limit. The per-process limit for the current shell is
shown by
ulimit -n
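If lsof is too slow or too noisy on a busy machine, the same information can be read from /proc. The following is only a rough sketch, assuming a Linux /proc layout; the helper name count_fds is made up for illustration. It prints the ten processes holding the most descriptors as "count pid name":

```shell
#!/bin/sh
# List the processes holding the most open file descriptors.
# count_fds is an illustrative helper; run it as root to see all processes.
count_fds() {
    proc_root="${1:-/proc}"   # overridable, so it can be tried on a test tree
    for dir in "$proc_root"/[0-9]*; do
        [ -d "$dir/fd" ] || continue
        # One entry in fd/ per open descriptor.
        n=$(ls "$dir/fd" 2>/dev/null | wc -l | tr -d ' ')
        # First word of the command line (empty for kernel threads).
        name=$(tr '\0' ' ' 2>/dev/null <"$dir/cmdline" | cut -d' ' -f1)
        echo "$n ${dir##*/} $name"
    done | sort -rn | head -n 10
}

count_fds
```

If winbindd shows up at the top with a count close to its limit, that would point to a descriptor leak in winbindd rather than a general shortage on the system.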
> May 7 15:32:45 casas-lin winbindd[20677]: [2010/05/07 15:32:45, 0] lib/events.c:287(s3_event_debug)
> May 7 15:32:45 casas-lin winbindd[20677]: s3_event: sys_select() failed: 24:Too many open files
> May 7 15:32:45 casas-lin winbindd[20677]: [2010/05/07 15:32:45, 0] lib/select.c:64(sys_select)
> May 7 15:32:45 casas-lin winbindd[20677]: [2010/05/07 15:32:45, 0] lib/debug.c:663(reopen_logs)
> May 7 15:32:45 casas-lin winbindd[20677]: Unable to open new log file /var/log/samba/log.wb-CASAS: Too many open files
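These lines come from winbindd itself, so it is worth comparing how many descriptors that one process holds against its own soft limit before it gets this far. A minimal sketch, assuming a kernel that provides /proc/<pid>/limits; fd_usage is a hypothetical helper name and the pgrep call in the comment is just one way to find the pid:

```shell
#!/bin/sh
# Print "used soft_limit" for one process, read from /proc.
# fd_usage is an illustrative helper, not part of Samba.
fd_usage() {
    pid="$1"
    proc_root="${2:-/proc}"   # overridable, so it can be tried on a test tree
    used=$(ls "$proc_root/$pid/fd" 2>/dev/null | wc -l | tr -d ' ')
    # /proc/<pid>/limits has a row "Max open files <soft> <hard> files".
    soft=$(awk '/^Max open files/ { print $4 }' "$proc_root/$pid/limits" 2>/dev/null)
    echo "$used $soft"
}

# Example: inspect the current shell. For winbindd, pass its pid instead,
# e.g. fd_usage "$(pgrep -o winbindd)".
fd_usage "$$"
```

Running this from cron every few minutes and logging the result would show whether the count climbs steadily over the days before a crash.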
> ------
> From startup (step 4 above):
> May 10 08:36:50 casas-lin kernel:
> May 10 08:38:42 casas-lin winbindd[1571]: [2010/05/10 08:38:42, 0] libsmb/smb_signing.c:255(signing_good)
> May 10 08:38:42 casas-lin winbindd[1571]: signing_good: BAD SIG: seq 41
> May 10 08:42:25 casas-lin winbindd[1562]: [2010/05/10 08:42:25, 0] winbindd/winbindd_dual.c:186(async_request_timeout_handler)
> May 10 08:42:25 casas-lin winbindd[1562]: async_request_timeout_handler: child pid 1571 is not responding. Closing connection to it.
> May 10 08:42:25 casas-lin winbindd[1571]: [2010/05/10 08:42:25, 0] winbindd/winbindd.c:190(winbindd_sig_term_handler)
> May 10 08:42:25 casas-lin winbindd[1571]: Got sig[15] terminate (is_parent=0)
> May 10 08:42:25 casas-lin winbindd[1825]: [2010/05/10 08:42:25, 0] rpc_client/cli_pipe.c:687(cli_pipe_verify_schannel)
> May 10 08:42:25 casas-lin winbindd[1825]: cli_pipe_verify_schannel: auth_len 56.
> May 10 08:43:37 casas-lin winbindd[1825]: [2010/05/10 08:43:37, 0] libsmb/smb_signing.c:255(signing_good)
> May 10 08:43:37 casas-lin winbindd[1825]: signing_good: BAD SIG: seq 23
> May 10 08:47:25 casas-lin winbindd[1562]: [2010/05/10 08:47:25, 0] winbindd/winbindd_dual.c:186(async_request_timeout_handler)
> May 10 08:47:25 casas-lin winbindd[1562]: async_request_timeout_handler: child pid 1825 is not responding. Closing connection to it.
> May 10 08:47:25 casas-lin winbindd[1825]: [2010/05/10 08:47:25, 0] winbindd/winbindd.c:190(winbindd_sig_term_handler)
> May 10 08:47:25 casas-lin winbindd[1825]: Got sig[15] terminate (is_parent=0)
> May 10 08:47:25 casas-lin winbindd[1832]: [2010/05/10 08:47:25, 0] rpc_client/cli_pipe.c:687(cli_pipe_verify_schannel)
> May 10 08:47:25 casas-lin winbindd[1832]: cli_pipe_verify_schannel: auth_len 56.
> May 10 08:48:38 casas-lin winbindd[1832]: [2010/05/10 08:48:38, 0] libsmb/smb_signing.c:255(signing_good)
> May 10 08:48:38 casas-lin winbindd[1832]: signing_good: BAD SIG: seq 23
> May 10 08:52:25 casas-lin winbindd[1562]: [2010/05/10 08:52:25, 0] winbindd/winbindd_dual.c:186(async_request_timeout_handler)
> May 10 08:52:25 casas-lin winbindd[1562]: async_request_timeout_handler: child pid 1832 is not responding. Closing connection to it.
> May 10 08:52:25 casas-lin winbindd[1832]: [2010/05/10 08:52:25, 0] winbindd/winbindd.c:190(winbindd_sig_term_handler)
>
> ---------
> log.wb-CASAS (my domain is CASAS.WSU.EDU)
> [2010/05/10 09:12:26, 1] libsmb/clikrb5.c:697(ads_krb5_mk_req)
> ads_krb5_mk_req: krb5_get_credentials failed for ad1$@CASAS (KDC reply did not match expectations)
> [2010/05/10 09:12:26, 1] libsmb/cliconnect.c:745(cli_session_setup_kerberos)
> cli_session_setup_kerberos: spnego_gen_negTokenTarg failed: KDC reply did not match expectations
> [2010/05/10 09:12:26, 0] rpc_client/cli_pipe.c:687(cli_pipe_verify_schannel)
> cli_pipe_verify_schannel: auth_len 56.
> [2010/05/10 09:12:26, 1] rpc_client/cli_pipe.c:948(cli_pipe_validate_current_pdu)
> cli_pipe_validate_current_pdu: RPC fault code DCERPC fault 0x00000721 received from host ad1.casas.wsu.edu!
> -------
> log-wb-CASAS.old (during "crashed state"):
> [2010/04/19 08:17:23, 1] libsmb/clikrb5.c:697(ads_krb5_mk_req)
> ads_krb5_mk_req: krb5_get_credentials failed for ad1$@CASAS (Cannot resolve network address for KDC in requested realm)
> [2010/04/19 08:17:23, 1] libsmb/cliconnect.c:745(cli_session_setup_kerberos)
> cli_session_setup_kerberos: spnego_gen_negTokenTarg failed: Cannot resolve network address for KDC in requested realm
> [2010/04/19 08:17:23, 0] rpc_client/cli_pipe.c:687(cli_pipe_verify_schannel)
> cli_pipe_verify_schannel: auth_len 56.
> [2010/04/19 08:17:23, 1] rpc_client/cli_pipe.c:948(cli_pipe_validate_current_pdu)
> cli_pipe_validate_current_pdu: RPC fault code DCERPC fault 0x00000721 received from host ad1.casas.wsu.edu!
> ------------
> My configuration
> ------------
> smb.conf
> ------------
> [global]
> security = ads
> netbios name = casas-lin
> realm = CASAS.WSU.EDU
> workgroup = CASAS
> password server = ad1.casas.wsu.edu
> workgroup = CASAS
> idmap uid = 10000-20000
> idmap gid = 10000-20000
> idmap backend = rid:CASAS.WSU.EDU=10000-20000
> winbind enum users = yes
> winbind enum groups = yes
> winbind use default domain = yes
> #template homedir = /home/%U
> template homedir = /net/files/home/%U
> template shell = /bin/bash
> ; client use spnego = yes
> domain master = no
> --------------
> /etc/krb5.conf
> -------------
> [logging]
> default =FILE:/var/log/krb5libs.log
> kdc =FILE:/var/log/krb5kdc.log
> admin_server =FILE:/var/log/kadmind.log
>
> [libdefaults]
> default_realm = CASAS.WSU.EDU
> dns_lookup_realm = false
> dns_lookup_kdc = true
> ticket_lifetime = 24h
> forwardable = yes
>
> [realms]
> EXAMPLE.COM = {
> kdc = kerberos.example.com:88
> admin_server = kerberos.example.com:749
> default_domain = example.com
> }
>
> CASAS.WSU.EDU = {
> kdc = ad1.casas.wsu.edu
> admin_server = ad1.casas.wsu.edu
> kdc = ad1.casas.wsu.edu
> }
>
> CASAS = {
> kdc = ad1.casas.wsu.edu
> admin_server = ad1.casas.wsu.edu
> kdc = ad1.casas.wsu.edu
> }
>
> [domain_realm]
> .example.com = EXAMPLE.COM
> example.com = EXAMPLE.COM
>
> casas.wsu.edu = CASAS.WSU.EDU
> .casas.wsu.edu = CASAS.WSU.EDU
> [appdefaults]
> pam = {
> debug = false
> ticket_lifetime = 36000
> renew_lifetime = 36000
> forwardable = true
> krb4_convert = false
> }
> ---------------
> /etc/pam.d/common-account
> ---------------
> account [success=1 new_authtok_reqd=done default=ignore] pam_unix.so
> account requisite pam_deny.so
> account required pam_permit.so
> account sufficient pam_winbind.so
> account required pam_krb5.so minimum_uid=1000
> ------------
> /etc/pam.d/common-auth
> ------------
> auth [success=3 default=ignore] pam_winbind.so krb5_auth krb5_ccache_type=FILE
> auth [success=2 default=ignore] pam_krb5.so minimum_uid=1000 try_first_pass
> auth [success=1 default=ignore] pam_unix.so nullok_secure try_first_pass
> auth requisite pam_deny.so
> auth required pam_permit.so
> ------------
> /etc/pam.d/common-password
> ------------
> password requisite pam_winbind.so
> password requisite pam_krb5.so minimum_uid=1000 use_authtok
> password [success=1 default=ignore] pam_unix.so obscure use_authtok try_first_pass sha512
> password requisite pam_deny.so
> password required pam_permit.so
> password optional pam_gnome_keyring.so
> -------------
> /etc/nsswitch.conf
> -------------
> passwd: compat winbind
> group: compat winbind
> shadow: compat
>
> hosts: files dns mdns4
> networks: files
>
> protocols: db files
> services: db files
> ethers: db files
> rpc: db files
>
> netgroup: nis
> ----------------
>
> Thanks!
> --Jim
>