We recently committed to replacing our Netware server with a Samba server
running on an old SPARC box (using RedHat 6.0). This seemed like a good
idea at the time, but now I'm wondering...
We have two labs of PCs (each of 16 machines, all running Windows 95) and
we use PC-Rdist to ensure that the hard disk images on the PCs are kept
fairly clean and up-to-date. The master disk images are stored on the
Samba server.
Our initial testing with only a few clients was successful, and the image
updates were appreciably faster than they had been with the Netware server.
However, once we started using the system "in anger" (that is, once a
larger number of machines started updating their images from the Samba
server) things started to go bad.
Within a few hours we were getting massive smb log files, and the most
common error messages looked like this:
  [1999/07/09 21:59:45, 0] smbd/oplock.c:oplock_break(742)
    oplock_break: receive_smb timed out after 30 seconds.
    oplock_break failed for file <a file name here...>
There were over 6000 of these in a 2-day period. There was a similar
number of the following messages too (probably one for each of the above):
  [1999/07/09 21:59:56, 0] smbd/oplock.c:oplock_break(812)
    oplock_break: client failure in break - shutting down this smbd.
This goes hand in hand with a high CPU load and an apparently endless list
of locked files (actually the same file each time) from the smbstatus
command. I end up aborting the listing because it doesn't look like it's
ever going to stop.
My first assumption was that the problem lay with the RedHat 6 SPARC
kernel, since I'm not sure it is as stable as the Intel version. To test the
theory, we moved the whole system to a PC running RedHat 6.0. To cut a long
story short, we had exactly the same problems.
To take our testing further, we recently migrated back to the SPARC box and
turned oplocks off in Samba, expecting a performance slowdown but hopefully
an increase in reliability. In this configuration I've just watched the
smb.log file grow to 32M in 3 hours. This time there are plenty of
errors (over 150,000!) looking like this:
  [1999/07/16 16:47:36, 0] locking/shmem.c:smb_shm_global_unlock(131)
    ERROR smb_shm_global_unlock : shmem not locked
  [1999/07/16 16:47:36, 0] locking/shmem.c:smb_shm_global_lock(112)
    ERROR smb_shm_global_lock : fcntl_lock failed with code Resource deadlock avoided
My earlier experiences with Samba were good, but I'm really concerned about
this apparent inability to handle heavy loads. Has anyone seen this kind
of behaviour, or can anyone suggest any workarounds?
Thanks,
Tony
--
Tony Gray, Technical Services Manager,
School of Computing
University of Tasmania
--
"Imagine the disincentive to software development if after months of work
another company could come along and copy your work and market it under its
own name...without legal restraints to such copying, companies like Apple
could not afford to advance the state of the art." -- Bill Gates (New York
Times, 25 Sep 1983).