Packet send failed to 62.101.92.90(137) ERRNO=Network is unreachable
[2004/05/03 10:27:45, 0] nmbd/nmbd_packets.c:reply_netbios_packet(975)

Hi list,

once again looking for some good advice. I apologize for the long email and for any mistakes I may have made.

I'm running Samba 3.0.2-6.3E on White Box Linux 3.0, kernel 2.4.21-4.EL (a clone of Red Hat EL 3.0); the environment is a Win 2k ADS domain. Users log into the shares (just common folders, no home dirs) using winbind.

Everything works fine for days, but a couple of times the server froze and I was unable even to log in from the console. The last time it happened, I could look at the processes running on the box because I already had a shell open, and I found winbindd was using 149M of memory. I could barely ping from or to the box (about 1 ping in 6 succeeded), and restarting winbindd and the network solved the problem. Now everything is working fine; the server has not been restarted.

I had a look at the logs, and this is what I think is most important:

log.smbd

a lot of:

[2004/05/05 14:54:54, 0] lib/util_sock.c:get_peer_addr(952)
  getpeername failed. Error was Transport endpoint is not connected

I read that this means a broken connection, but then how can it be that just restarting the network service makes everything work again? Could it be that Samba, interacting with the network, somehow caused the network to collapse?

log.winbind

[2004/05/05 14:50:25, 1] libads/ldap.c:ads_connect(222)
  Failed to get ldap server info
[2004/05/05 14:55:54, 1] libads/ldap.c:ads_connect(222)
  Failed to get ldap server info
[2004/05/05 14:57:16, 0] rpc_client/cli_pipe.c:rpc_api_pipe(424)
  cli_pipe: return critical error. Error was Call timed out: server did not respond after 10000 milliseconds
[2004/05/05 15:00:47, 1] libads/ldap.c:ads_connect(222)
  Failed to get ldap server info
[2004/05/05 15:01:13, 1] libsmb/cliconnect.c:cli_start_connection(1372)
  failed negprot
[2004/05/05 15:01:23, 1] libsmb/cliconnect.c:cli_start_connection(1372)
  failed negprot
[2004/05/05 15:01:37, 1] libsmb/cliconnect.c:cli_start_connection(1372)
  failed negprot
[2004/05/05 15:05:48, 1] libads/ldap.c:ads_connect(222)
  Failed to get ldap server info
[2004/05/05 15:13:10, 0] rpc_client/cli_pipe.c:rpc_api_pipe(424)
  cli_pipe: return critical error. Error was Call timed out: server did not respond after 10000 milliseconds
[2004/05/05 15:17:32, 0] rpc_client/cli_pipe.c:cli_nt_session_open(1437)
  cli_nt_session_open: cli_nt_create failed on pipe \NETLOGON to machine FBCSRVDC01. Error was Call timed out: server did not respond after 10000 milliseconds
[2004/05/05 15:21:02, 0] lib/pidfile.c:pidfile_create(84)
  ERROR: winbindd is already running. File /var/run/winbindd.pid exists and process id 1127 is running.
[2004/05/05 15:21:32, 1] libads/ldap.c:ads_connect(222)
  Failed to get ldap server info
[2004/05/05 15:22:25, 1] libads/ldap.c:ads_connect(222)
  Failed to get ldap server info
[2004/05/05 15:22:25, 1] libads/ldap_utils.c:ads_do_search_retry(77)
  ads_search_retry: failed to reconnect (Can't contact LDAP server)
[2004/05/05 15:22:25, 1] libads/ads_ldap.c:ads_name_to_sid(58)
  name_to_sid ads_search: Can't contact LDAP server
[2004/05/05 15:22:25, 1] nsswitch/winbindd_group.c:winbindd_getgroups(954)
  user 'fbcrompc116$' does not exist
[2004/05/05 15:22:25, 0] lib/fault.c:fault_report(36)
  ==============================================================
[2004/05/05 15:22:25, 0] lib/fault.c:fault_report(37)
  INTERNAL ERROR: Signal 11 in pid 1127 (3.0.2-6.3E)
  Please read the appendix Bugs of the Samba HOWTO collection
[2004/05/05 15:22:25, 0] lib/fault.c:fault_report(39)
  ==============================================================
[2004/05/05 15:22:25, 0] lib/util.c:smb_panic(1422)
  PANIC: internal error
[2004/05/05 15:22:25, 0] lib/util.c:smb_panic(1430)
  BACKTRACE: 14 stack frames:
   #0 winbindd(smb_panic+0x13f) [0x80cb96f]
   #1 winbindd [0x80b7428]
   #2 /lib/tls/libc.so.6 [0xb73b3c08]
   #3 winbindd(ads_name_to_sid+0x5c) [0x8181a6c]
   #4 winbindd [0x8084980]
   #5 winbindd [0x807912c]
   #6 winbindd(winbindd_lookup_sid_by_name+0x66) [0x8074cf6]
   #7 winbindd(winbindd_getpwnam+0x249) [0x806f0f9]
   #8 winbindd(strftime+0x14bc) [0x806d6d8]
   #9 winbindd(winbind_process_packet+0x2f) [0x806da2f]
   #10 winbindd(strftime+0x2197) [0x806e3b3]
   #11 winbindd(main+0x43e) [0x806e97e]
   #12 /lib/tls/libc.so.6(__libc_start_main+0xf8) [0xb73a1748]
   #13 winbindd(chroot+0x35) [0x806cdf1]
[2004/05/05 15:23:18, 1] libads/ldap.c:ads_connect(222)
  Failed to get ldap server info
[2004/05/05 15:23:38, 1] libsmb/cliconnect.c:cli_start_connection(1372)
  failed negprot
[2004/05/05 15:24:55, 0] rpc_client/cli_pipe.c:rpc_api_pipe(424)
  cli_pipe: return critical error. Error was Call timed out: server did not respond after 10000 milliseconds
[2004/05/05 15:24:55, 1] nsswitch/winbindd_util.c:add_trusted_domain(166)
  Added domain FBCMEDIA FBCMEDIA.COM S-0-0
[2004/05/05 15:24:55, 1] libsmb/clikrb5.c:ads_krb5_mk_req(269)
  krb5_cc_get_principal failed (No credentials cache found)

Is it OK that the SID for the domain is shown as FBCMEDIA.COM S-0-0? If I do net getlocalsid fbcmedia I get S-1-5-21-735.....and so on. All net commands and group mappings are working, and wbinfo is OK.

messages.log

May 5 14:52:44 fbcsrvsmb01 smbd[8786]: write_socket_data: write failure. Error = Broken pipe
May 5 14:52:44 fbcsrvsmb01 smbd[8786]: [2004/05/05 14:52:44, 0] lib/util_sock.c:write_socket(413)
May 5 14:52:44 fbcsrvsmb01 smbd[8786]: write_socket: Error writing 61503 bytes to socket 5: ERRNO = Broken pipe
May 5 14:52:44 fbcsrvsmb01 smbd[8786]: [2004/05/05 14:52:44, 0] lib/util_sock.c:send_smb(605)
May 5 14:52:44 fbcsrvsmb01 smbd[8786]: Error writing 61503 bytes to client. -1. (Broken pipe)
May 5 14:52:50 fbcsrvsmb01 smbd[8915]: [2004/05/05 14:52:50, 0] lib/util_sock.c:read_socket_data(342)
May 5 14:52:50 fbcsrvsmb01 smbd[8915]: read_socket_data: recv failure for 4. Error = Connection reset by peer
May 5 14:53:29 fbcsrvsmb01 smbd[3587]: [2004/05/05 14:53:28, 0] lib/util_sock.c:read_socket_data(342)
May 5 14:53:29 fbcsrvsmb01 smbd[3587]: read_socket_data: recv failure for 4. Error = Connection reset by peer
May 5 14:54:25 fbcsrvsmb01 smbd[8953]: [2004/05/05 14:54:25, 0] lib/util_sock.c:read_socket_data(342)
May 5 14:54:25 fbcsrvsmb01 smbd[8953]: read_socket_data: recv failure for 4. Error = Connection reset by peer
May 5 14:54:34 fbcsrvsmb01 smbd[8959]: [2004/05/05 14:54:34, 0] lib/util_sock.c:read_socket_data(342)
May 5 14:54:34 fbcsrvsmb01 smbd[8959]: read_socket_data: recv failure for 4. Error = Connection reset by peer
May 5 14:54:54 fbcsrvsmb01 smbd[8969]: [2004/05/05 14:54:54, 0] lib/util_sock.c:get_peer_addr(952)
May 5 14:54:54 fbcsrvsmb01 smbd[8969]: getpeername failed. Error was Transport endpoint is not connected
May 5 14:54:54 fbcsrvsmb01 smbd[8969]: [2004/05/05 14:54:54, 0] lib/util_sock.c:get_peer_addr(952)
May 5 14:54:54 fbcsrvsmb01 smbd[8969]: getpeername failed. Error was Transport endpoint is not connected
May 5 14:54:54 fbcsrvsmb01 smbd[8969]: [2004/05/05 14:54:54, 0] lib/access.c:check_access(328)
May 5 14:54:54 fbcsrvsmb01 smbd[8969]: [2004/05/05 14:54:54, 0] lib/util_sock.c:get_peer_addr(952)
May 5 14:54:54 fbcsrvsmb01 smbd[8969]: getpeername failed. Error was Transport endpoint is not connected
May 5 14:54:54 fbcsrvsmb01 smbd[8969]: Denied connection from (0.0.0.0)
May 5 14:54:54 fbcsrvsmb01 smbd[8969]: [2004/05/05 14:54:54, 0] lib/util_sock.c:get_peer_addr(952)
May 5 14:54:54 fbcsrvsmb01 smbd[8969]: getpeername failed. Error was Transport endpoint is not connected
May 5 14:54:54 fbcsrvsmb01 smbd[8969]: Connection denied from 0.0.0.0

What does "Connection denied from 0.0.0.0" mean? I also have a 0.0.0.0.log file in the log dir; what does that mean?

I have been looking through the mailing list and googling for the last two days, but I couldn't find a definitive answer. It looks like it could be related to network problems (but then I don't think restarting the network service would fix it) or to iptables, but it looks and behaves like a random issue. It had been working fine for many days, and nothing has been changed lately.

If you're still there, thanks for reading. Any idea is really welcome, and even more welcome, if possible, would be a hint on how to monitor the Linux box (for example, how can I find out what froze the network?) and which tools to use (I can figure out myself how to use them, I'm not asking for a tutorial), so that I can be more useful to the list than just asking for help ;-)

Thanks for your time
Simone
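
In case it is useful for comparing notes, here is a minimal sketch of how the log entries quoted in this mail could be tallied by source location, to see which errors dominate around a freeze. It is plain Python and only assumes the header format visible in the excerpts, that is `[date time, level] file:function(line)`. The names `HEADER_RE` and `tally_log_headers` are made up for illustration and are not part of Samba.

```python
import re
from collections import Counter

# Matches headers such as:
#   [2004/05/05 14:50:25, 1] libads/ldap.c:ads_connect(222)
# The pattern is an assumption based only on the excerpts in this mail.
HEADER_RE = re.compile(
    r"\[(\d{4}/\d{2}/\d{2} \d{2}:\d{2}:\d{2}),\s*(\d+)\]\s+"
    r"([\w./-]+):(\w+)\((\d+)\)"
)


def tally_log_headers(text):
    """Count log headers in `text`, keyed by 'source_file:function'."""
    counts = Counter()
    for match in HEADER_RE.finditer(text):
        _timestamp, _level, source_file, function, _line = match.groups()
        counts[f"{source_file}:{function}"] += 1
    return counts
```

Feeding it the contents of a log file and calling `.most_common()` on the result should list the noisiest code paths first. Because it searches for the bracketed header anywhere in the text, it should also cope with the syslog-prefixed lines from messages.log.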