Over the last three days I've been upgrading my Samba infrastructure. This
involved moving from Red Hat Enterprise 3.0 (Samba 3.0.9) to Ubuntu 5.10 (Samba
3.0.14) and some new hardware. For the most part things went well, but I do
have some unresolved issues that I would like to get some feedback on. Keep in
mind that this entire setup has been working properly for more than two years in
this fashion.
First a bit of ASCII art:
                ---------------
                | Main Office |
                |     PDC     |
                | Master LDAP |
                ---------------
                       |
                       |
                      VPN
                       |
                       |
             ----------------------
             |                    |
             |                    |
     ----------------     ----------------
     | Office No. 1 |     | Office No. 2 |
     |     BDC      |     |     BDC      |
     |  Slave LDAP  |     |  Slave LDAP  |
     ----------------     ----------------
In the "Main Office", we run a 60/40 split of machines running Windows
XP and Windows 2000, leaning heavier toward XP. One laptop (running Windows XP)
gave us problems logging on to the domain for about 20 minutes. After a minor
change to the LDAP configuration and a restart of Samba on the PDC, this machine
came online. The remaining machines in this office came online with very few
issues - the only issue being a slow logon the very first time.
In "Office No. 1" every machine runs Windows 2000 and every one of them
had to be removed and re-added to the domain before logons would work. We kept
getting errors stating that the domain controller was unavailable or the
computer account password in the domain was incorrect. These errors happened
immediately on the Windows clients and nothing was recorded in the Samba logs.
In "Office No. 2" we are running Windows 2000 on one machine and
Windows XP Pro on all other machines. The Windows 2000 client exhibited the
same symptoms as described in Office No. 1. One of the Windows XP clients
exhibited the same symptoms as well. The remaining XP machines worked fine. To
cure the troublesome XP client, we had to remove the machine from the domain,
delete the LDAP computer account and then rejoin the domain. After that process
everything seems to be functional.
The upgrade process went like this: On Friday of last week, we had every user
turn their computer off as they left for the day. We left all of the servers
online through the weekend. On Monday, we upgraded the PDC and checked a few
workstations to make sure that things were OK. On Tuesday we were involved in
getting the rack in the server room buttoned up - no changes with the exception
of a machine or two being taken offline for a few minutes while cables were
routed. On Wednesday, we upgraded the Office No. 1 BDC, handled the problem
with the laptop in the Main Office and then upgraded the Office No. 2 BDC. Because
of the problems seen in both of the remote offices, this morning we went to
every workstation in the Main Office to make sure that they functioned properly.
So my questions are: Why did we have the problems in the remote offices? Why could
they not contact the domain controller? Why would a removal and rejoin cause
the problem to go away? Should I be worried about future occurrences of this
phenomenon?
--
Kevin L. Collins, MCSE
Systems Manager
Nesbitt Engineering, Inc.