Hello Samba Community,
I have what is probably a fairly unusual problem. Allow me to explain:
Background:
We build software for Windows, among other things. Most of our developers
are not on Windows, but they need to do Windows builds. To facilitate
this, we've set up a complex build system where calling "make"
automatically connects (rsh/ssh) to a cmd shell on the Windows build
server, translates our Makefile into something more suitable for Windows,
and executes the build. The source code is not on the build server's local
disks, but is instead sitting on a file server which the build server
accesses through Samba. This leads to the problem.
The Problem(s):
We're seeing mysterious and unpredictable problems in this environment.
Looking through the Event Viewer, we've seen 2658 "Delayed Write Failed"
messages since October. Only 19 of them did not relate to ".pdb" files.
The Samba logs don't indicate a problem.
We're getting messages from the compiler that it can't find header files
which definitely exist and are definitely in the include path. We're also
getting the occasional "gmake: *** Makefile: Permission denied. Stop."
message. Simply starting the "make" again without changing any permissions
allows the build to continue.
Build Server:
* Windows Server 2003 SP2
* 4x 3 GHz Xeon (5160)
* 4 GB RAM
* 2x 10k RPM SAS drives, hardware RAID 1
File Server:
* CentOS 5.2
* 8x 3 GHz Xeon (5450)
* 4 GB RAM
* 14x 15k RPM SAS drives, hardware RAID 6
* Samba 3.0.25b-1.el5_1.4
* Authenticates against Windows domain controller(s)
What I've tried already (not necessarily in this order):
* Rebooted the Build Server.
* Swapped OSs on the Build Server. We started with NT, then moved to XP
and are now on Server 2003.
* Swapped Ethernet cable on the Build Server.
* Swapped Ethernet switch port for the Build Server.
* Swapped Ethernet switch for the Build Server.
* Swapped Ethernet NIC on the Build Server.
* Swapped the Build Server hardware itself.
* Switched from explicitly mapping drives at the start of each remote cmd
session to using UNC paths.
* Swapped OSs on the File Server. We started with Red Hat Linux 8 for i386
and have moved up through several iterations to CentOS 5.2 for x86_64.
* Swapped Ethernet cable on the File Server.
* Swapped Ethernet switch port for the File Server.
* Swapped Ethernet switch for the File Server.
* Swapped Ethernet NIC on the File Server.
* Swapped the File Server hardware itself.
* Upgraded to the latest version of Samba available from the CentOS team.
This broke domain authentication for us, so we rolled back to 3.0.25b.
* Added a backup domain controller. (We're still in an NT4 domain
environment. Yes, I know, I'm working on it.)
* Changed the Samba socket options from "TCP_NODELAY SO_RCVBUF=8192
SO_SNDBUF=8192" to "TCP_NODELAY IPTOS_LOWDELAY".
* Set "large readwrite = no"
* Set "write raw = no"
* Explicitly turned on oplocks and level2 oplocks, though I believe they
are on by default.
* Set "dos filetimes = yes"
* Set "fake directory create times = yes"
* Set "dos filetime resolution = yes"
* Set "allocation roundup size = 0"
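For reference, the Samba changes listed above would look roughly like this in smb.conf. This is only a sketch of the relevant lines, not a copy of the live file: placing everything in [global] is an assumption (the oplock settings could equally be set per share), and the two oplock lines are written the way the parameters are normally spelled.

```ini
[global]
    ; was: TCP_NODELAY SO_RCVBUF=8192 SO_SNDBUF=8192
    socket options = TCP_NODELAY IPTOS_LOWDELAY

    large readwrite = no
    write raw = no

    ; explicitly enabled, though these should already be the defaults
    oplocks = yes
    level2 oplocks = yes

    dos filetimes = yes
    fake directory create times = yes
    dos filetime resolution = yes
    allocation roundup size = 0
```

Running "testparm -s" against the file should show which of these values Samba actually picked up.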
Thus far, any time we've managed to improve performance back to the expected
level, it has been unclear what did the trick... and it didn't last. If
anyone has any thoughts on other things I can try, I would certainly
appreciate it. If there's any further information that would help in
making an assessment, I'd be happy to post what I can.
Thanks,
James