Dear All,

We are experiencing severe problems with Samba 2.2.0 (with quota support) running on a dual processor (400MHz) Sun E450 running Solaris 2.7. This is used as a central file server for student diskspace, accessed by approx 1200 PCs running NT 4.

Up until recently we experienced what we assume to be loading issues, with connections being slow during the middle of the day. However, recently we have been encountering severe problems. Everything seems fine until midday; then we start to see the number of smbd processes going up whilst the number of connections (determined from smbstatus -b) drops. Students with connections start getting slow responses, no new connections are made, and load on the system skyrockets. Stopping and restarting Samba seems to cure the problem, but the problem can recur. We are in a desperate panic at the moment as the students are all doing assignments and this is seriously affecting their work.

We have tried various tweaks to Samba (deadtime, change notify timeout) and to the tcp stack, and have tripled system memory, all to no avail. We also seem to have an issue with keepalives and tcp_nodelay, neither of which seems to work at all. We see the following messages in the log about keepalives:

[2001/12/13 11:55:29, 0] lib/util_sock.c:set_socket_options(165)
  Failed to set socket option SO_KEEPALIVE (Error Invalid argument)
[2001/12/13 11:55:29, 0] lib/util_sock.c:set_socket_options(165)
  Failed to set socket option TCP_NODELAY (Error Invalid argument)

The following are a selection of messages appearing just before Samba was stopped:

[2001/12/13 11:39:51, 0] lib/util_sock.c:write_socket(566)
  write_socket: Error writing 4 bytes to socket 12: ERRNO = Broken pipe
[2001/12/13 11:39:51, 0] lib/util_sock.c:send_smb(753)
  Error writing 4 bytes to client. -1. Exiting
[2001/12/13 11:40:29, 0] lib/util_sock.c:get_socket_addr(1084)
  getpeername failed. Error was Transport endpoint is not connected
[2001/12/13 11:40:30, 0] lib/util_sock.c:get_socket_addr(1084)
  getpeername failed. Error was Transport endpoint is not connected
[2001/12/13 11:40:30, 0] lib/util_sock.c:write_socket_data(542)
  write_socket_data: write failure. Error = Broken pipe
[2001/12/13 11:40:30, 0] lib/util_sock.c:write_socket(566)
  write_socket: Error writing 4 bytes to socket 12: ERRNO = Broken pipe
[2001/12/13 11:40:30, 0] lib/util_sock.c:send_smb(753)
  Error writing 4 bytes to client. -1. Exiting
[2001/12/13 11:40:30, 0] lib/util_sock.c:write_socket_data(542)
  write_socket_data: write failure. Error = Broken pipe
[2001/12/13 11:40:30, 0] lib/util_sock.c:write_socket(566)
  write_socket: Error writing 4 bytes to socket 12: ERRNO = Broken pipe
[2001/12/13 11:40:30, 0] lib/util_sock.c:send_smb(753)
  Error writing 4 bytes to client. -1. Exiting
[2001/12/13 11:40:33, 0] lib/util_sock.c:read_socket_data(479)
  read_socket_data: recv failure for 4. Error = Connection reset by peer
[2001/12/13 11:40:49, 0] smbd/server.c:open_sockets(251)
  open_sockets: accept: Software caused connection abort
[2001/12/13 11:40:53, 0] lib/util_sock.c:read_socket_data(479)
  read_socket_data: recv failure for 4. Error = Connection reset by peer

We think we may have loading problems; however, if so, the load doesn't seem to be directly proportional to the number of connections. In fact there is a significant rise in the load at, and for 10 - 15 mins past, the hour (this happens all day long, not just at midday). We assume that this is because logging in exacts a high load on the system. It's also possible that the midday problems are caused by different patterns of working, as students will be logging in for short periods to check e-mail before going to get lunch etc.

Another oddity we see is some samba connections left running from the day before (or sometimes longer), so we are wondering whether connections are not getting killed properly, thereby adding to the load.

So, please, any pointers as to what the problem is would be very helpful. At the moment we're struggling, I'm considering getting a less stressful job - something like a fork lift truck driver in an explosives factory - and people are starting to question whether we should replace the whole system with a Novell based one!

Thanks in advance

Martin Rootes
Systems Support

------------------------------------------------------------------------------
Martin Rootes - Senior Systems Programmer/Analyst, Sheffield Hallam University
Email : M.J.Rootes@shu.ac.uk
Phone: 0114 225 3828
------------------------------------------------------------------------------
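The log excerpt in the message above can be summarised by counting entries per source file and function, which makes it easier to see which failure dominates at a given time of day. The following is only a minimal sketch in Python: the `tally` helper and the `HEADER` pattern are illustrative names, not part of Samba, and the pattern assumes the entry header looks exactly like the ones quoted in the message.

```python
import re
from collections import Counter

# Header of a log entry as quoted in the message above, for example:
#   [2001/12/13 11:39:51, 0] lib/util_sock.c:write_socket(566)
# Assumption: every entry header follows this shape.
HEADER = re.compile(
    r"\[(\d{4}/\d{2}/\d{2} \d{2}:\d{2}:\d{2}), (\d+)\] (\S+?):(\w+)\((\d+)\)"
)


def tally(log_text):
    """Count log entries per (source file, function) pair."""
    counts = Counter()
    for match in HEADER.finditer(log_text):
        _timestamp, _level, source, function, _line = match.groups()
        counts[(source, function)] += 1
    return counts


# A few entries copied from the message above.
sample = """
[2001/12/13 11:39:51, 0] lib/util_sock.c:write_socket(566)
  write_socket: Error writing 4 bytes to socket 12: ERRNO = Broken pipe
[2001/12/13 11:39:51, 0] lib/util_sock.c:send_smb(753)
  Error writing 4 bytes to client. -1. Exiting
[2001/12/13 11:40:29, 0] lib/util_sock.c:get_socket_addr(1084)
  getpeername failed. Error was Transport endpoint is not connected
[2001/12/13 11:40:30, 0] lib/util_sock.c:get_socket_addr(1084)
  getpeername failed. Error was Transport endpoint is not connected
[2001/12/13 11:40:49, 0] smbd/server.c:open_sockets(251)
  open_sockets: accept: Software caused connection abort
"""

for (source, function), count in tally(sample).most_common():
    print(f"{count:4d}  {source}:{function}")
```

Because the pattern is searched across the whole text rather than line by line, the same helper also works on log text whose line breaks have been lost.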
You may want to check out this article from Sysadmin Mag about Solaris performance tuning; it seems like some of it may apply to your situation.

http://www.samag.com/documents/s=1323/sam0110e/0110e.htm

Also this site, which is referenced in the article:

http://www.sean.de/Solaris/tune.html

Cheers,
Lonny

-----Original Message-----
From: samba-admin@lists.samba.org [mailto:samba-admin@lists.samba.org]On Behalf Of Martin Rootes
Sent: Thursday, December 13, 2001 10:08 AM
To: Samba
Subject: Severe problem with Samba
Can anyone confirm (or deny) that the problems with file-locking when accessing a shared database (ACT, Access, FoxPro, etc) are fixed yet? This is the only thing preventing me from killing our last Win2K Server. Thanks guys, for all the hard work - Samba rocks (other than this problem)! Charles
On Thu, Dec 13, 2001 at 06:08:03PM +0000, Martin Rootes wrote:
> Dear All,
>
> we are experiencing severe problems with Samba 2.2.0 (with quota support) running on a
> dual processor (400MHz) Sun E450 running Solaris 2.7. This is used as a central file server for
> student diskspace, accessed by approx 1200 PCs running NT 4. Up until recently we
> experienced some, what we assume to be, loading issues with connections during the middle of
> the day being slow. However, recently we have been encountering severe problems. Everything
> seems fine until midday, then what we start to see is the number of smbd processes going up
> whilst the number of connections (determined from smbstatus -b) dropping, students with
> connections starting getting slow responses and no new connections are being made, load on
> the system skyrockets. stopping samba and restarting seems to cure the problem, but the
> problem can re-occur. We are in a desperate panic at the moment as the students are all doing
> assignments and this is seriously affecting their work. We have tried various tweaks to Samba
> (deadtime, change notify timeout), the tcp stack and have tripled system memory, all to no avail.
> We also seem to have an issue with keepalives and tcp_nodelay, neither of which seem to work
> at all, we see the following messages in the log about keepalives:-

We think we've solved these in the latest Samba 2.2.x CVS tree. Unfortunately this isn't released as "stable" 2.2.3 code yet (getting close though). If you'd like to test this the CVS branch is SAMBA_2_2. It has been confirmed to fix this problem on other Solaris and HPUX boxes.

Jeremy.
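For readers unfamiliar with the two options named in the keepalive messages, here is a minimal sketch of what setting them involves. It is written in Python rather than Samba's C, it is not Samba's code, and the `set_option` helper is a made-up name for illustration. It enables SO_KEEPALIVE and TCP_NODELAY on an accepted loopback connection and prints a message in the style of the log if the call fails. Note that POSIX documents EINVAL for setsockopt() both for an option that is invalid at the given level and for a socket that has already been shut down, so one possible reading of "Invalid argument" (an assumption, not something confirmed in this thread) is that the client had gone away before the options were applied.

```python
import socket


def set_option(sock, level, option, name):
    """Try to enable one boolean socket option; return True on success."""
    try:
        sock.setsockopt(level, option, 1)
        return True
    except OSError as err:
        # Mirrors the wording of the log messages quoted in the thread.
        print(f"Failed to set socket option {name} (Error {err.strerror})")
        return False


# A loopback listener stands in for the server; port 0 asks the OS for a free port.
listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
listener.bind(("127.0.0.1", 0))
listener.listen(1)

client = socket.create_connection(listener.getsockname())
conn, _addr = listener.accept()

keepalive_ok = set_option(conn, socket.SOL_SOCKET, socket.SO_KEEPALIVE, "SO_KEEPALIVE")
nodelay_ok = set_option(conn, socket.IPPROTO_TCP, socket.TCP_NODELAY, "TCP_NODELAY")

# Read the options back to confirm what the kernel actually recorded.
keepalive_on = conn.getsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE) != 0
nodelay_on = conn.getsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY) != 0

for s in (client, conn, listener):
    s.close()
```

On a healthy connection both calls succeed; the point of the sketch is only to show where the options are applied and how a failure would surface.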
Thanks Jeremy,

I'll compile it up and test it out.

Martin.

Date sent: Thu, 13 Dec 2001 15:17:32 -0800
To: Martin Rootes <M.J.Rootes@shu.ac.uk>
Copies to: Samba <Samba@lists.samba.org>
Subject: Re: Severe problem with Samba
From: jra@samba.org (Jeremy Allison)

> We think we've solved these in the latest Samba 2.2.x CVS tree. Unfortunately
> this isn't released as "stable" 2.2.3 code yet (getting close though). If you'd
> like to test this the CVS branch is SAMBA_2_2. It has been confirmed to fix this
> problem on other Solaris and HPUX boxes.
>
> Jeremy.