Mike Ray
2017-Sep-29 22:37 UTC
[Samba] Replication Error Between Differing Samba Versions During Upgrade
Hey all-
Trying to upgrade the domain and running into issues getting my data into the
new controller.
Current configuration:
dc0 - Ubuntu 12.04.2 - Samba: 2:4.0.6-12
dc1 - Ubuntu 12.04.2 - Samba: 2:4.0.6-8
dc2 - Ubuntu 12.04.3 - Samba: 2:4.0.6-8
I'm trying to upgrade to Ubuntu 16.04.3, Samba: 2:4.3.11+dfsg-0ubuntu0.16.04.10
The documentation
(https://wiki.samba.org/index.php/Updating_Samba#The_Update_Process) recommends
updating in place, but the standard practice I have to work around is to create
a new server and decommission old servers. In my test instance, I'm trying to do
this by introducing dc3 already at the OS and Samba version listed above. This
section
(https://wiki.samba.org/index.php/Updating_Samba#Updating_Multiple_Samba_Domain_Controllers)
suggests that running different versions of domain controllers in the same
domain is OK and should not be the inherent problem.
The current state of things is that for dc0, dc1, dc2:
* "samba-tool dbcheck --cross-ncs" returns no errors
* "samba-tool drs showrepl" returns no errors (and just the expected
warning about "No NC replicated for Connection")
* "samba-tool ldapcmp --filter=msDS-NcType,serverState,subrefs"
returns no errors
dc3 is a bit funky:
* attempts to connect to this dc in ADUC results in "The RPC server is
unavailable"
* "samba-tool dbcheck --cross-ncs" returns no errors
* "samba-tool drs showrepl" returns no errors (and just the expected
warning about "No NC replicated for Connection")
* "samba-tool ldapcmp --filter=msDS-NcType,serverState,subrefs"
returns a HUGE amount of errors, e.g.:
Comparing:
'CN=NTDS Quotas,DC=example,DC=com' [ldap://dc0.example.com]
'CN=NTDS Quotas,DC=example,DC=com' [ldap://dc3.example.com]
Attributes found only in ldap://dc0.example.com:
distinguishedName
isCriticalSystemObject
cn
name
objectCategory
objectClass
msDS-TombstoneQuotaFactor
objectGUID
systemFlags
whenCreated
showInAdvancedViewOnly
instanceType
description
FAILED
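With this much output it can help to condense it to one entry per object. A
minimal sketch, assuming the ldapcmp output keeps the layout shown above
(the function name missing_attributes is hypothetical, and any other
sections ldapcmp may print are simply skipped):

```python
def missing_attributes(ldapcmp_output):
    """Map each compared DN to {url: [attributes found only on that url]}.

    Assumed layout: a "Comparing:" line, two lines of 'DN' [url], then
    "Attributes found only in <url>:" followed by one attribute per line,
    ending at "FAILED" or at the next "Comparing:".
    """
    result = {}
    dn = None          # DN of the object currently being compared
    only_in = None     # URL named by the current "found only in" header
    expect_dn = False  # True right after "Comparing:"
    prefix = "Attributes found only in "
    for raw in ldapcmp_output.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line == "Comparing:":
            expect_dn, dn, only_in = True, None, None
        elif expect_dn and line.startswith("'"):
            dn = line.split("'")[1]
            expect_dn = False
        elif line.startswith(prefix):
            only_in = line[len(prefix):].rstrip(":")
        elif line == "FAILED":
            dn, only_in = None, None
        elif line.endswith(":"):
            # Some other section header: stop collecting attribute names.
            only_in = None
        elif dn is not None and only_in is not None:
            result.setdefault(dn, {}).setdefault(only_in, []).append(line)
    return result
```

Feeding it the saved output of the ldapcmp run gives a dictionary keyed by
DN, which makes it easier to see whether every object is affected or only
some of them.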
Forcing replication from a good domain controller "samba-tool drs replicate dc3
dc0 'DC=EXAMPLE,DC=COM'" returns successfully but does not remove any of the
errors shown from the ldapcmp command. Adding the "--sync-forced --sync-all
--full-sync" flags does not change the outcome. Replicating on all the NCs (e.g.
"CN=Configuration,DC=EXAMPLE,DC=COM") does not change the outcome.
I also found that the LDB databases (which I believe are initially populated on
domain join) are missing data. On one of the old controllers, "ldbsearch -H
/var/lib/samba/private/sam.ldb.d/DC%3DEXAMPLE,DC%3DCOM.ldb -b
'cn=dc0,OU=Domain Controllers,DC=EXAMPLE,DC=COM'" returns good data:
# record 1
dn: CN=DC0,OU=Domain Controllers,DC=example,DC=com
objectClass: top
objectClass: person
objectClass: organizationalPerson
objectClass: user
objectClass: computer
cn: DC0
instanceType: 4
whenCreated: 20130703181107.0Z
uSNCreated: 3583
But on dc3, it returns only:
# record 1
dn: CN=DC0,OU=Domain Controllers,DC=example,DC=com
# record 2
dn: CN=RID Set,CN=DC0,OU=Domain Controllers,DC=example,DC=com
# returned 2 records 2 entries 0 referrals
Before I ran into this issue, I was having issues just getting replication to
work at all ("samba-tool drs showrepl" would show errors). dc3 complained about
WERR_GENERAL_FAILURE from dc0 and dc1 (it is unclear why, but dc3 never
complained about dc2). After spending a long time looking and checking many
other things, I eventually found this thread
https://lists.samba.org/archive/samba/2014-August/184479.html. While the
original author seems to not have needed it, I found that this level of
replication issue (WERR_GENERAL_FAILURE) was fixed by adding the GUID CNAME
records to the /etc/hosts file of dc3. Other than that, I have not done anything
to the domain controller after provisioning it (to my recollection -- it's been
a long couple of days).
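The exact lines added to /etc/hosts are not given here. As a rough sketch,
assuming the usual Active Directory form <DSA object GUID>._msdcs.<DNS domain>
for the GUID-based name of each DC, an entry could be built like this (the
function name guid_hosts_line is hypothetical, and the IP address and GUID in
the usage line are made-up placeholders, not values from this domain):

```python
import uuid


def guid_hosts_line(ip, dsa_guid, dns_domain):
    """Build one /etc/hosts line mapping a DC's GUID-based name to its IP.

    dsa_guid is assumed to be the "DSA object GUID" of the partner DC;
    uuid.UUID() validates it and normalises it to lower case.
    """
    guid = str(uuid.UUID(dsa_guid))
    return "{}\t{}._msdcs.{}".format(ip, guid, dns_domain.lower())


# Placeholder values only; substitute the real IP and GUID of each partner DC.
print(guid_hosts_line("192.0.2.10",
                      "0F1E2D3C-4B5A-6978-8796-A5B4C3D2E1F0",
                      "example.com"))
```

One such line would be needed for each DC that dc3 has to resolve.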
I've copied the running domain into an isolated environment and can play with it
without impacting any users/services, so feel free to recommend anything.
Thanks,
Mike Ray
Mike Ray
2017-Oct-02 22:32 UTC
[Samba] Replication Error Between Differing Samba Versions During Upgrade
Spent more time on this today and found the following:
* manually copying over the sam.ldb from a working controller did not fix the
broken controller (in fact, it completely destroyed the domain)
* upgrading to 4.1.6 shows the same issue as 4.3.11 (dbcheck, drs showrepl,
dnsupdate all return happily, but ldapcmp shows missing data in the new
controller)
* this issue does not show up if I add another 4.0.6(*) dc
(*) - this is a bastardized, custom version, not true "4.0.6"
My next step is to try just getting a 4.0.7 controller to work. If that fails,
just a 4.0.6. My current theory is that at some point the replication code was
changed to ask for data in a way that isn't compatible with the old version, but
that if I do smaller version upgrades, it'll handle it internally.
If anyone has any 4.0 to 4.1 upgrade information/links, I would appreciate them.
Mike Ray
2017-Oct-03 22:31 UTC
[Samba] Replication Error Between Differing Samba Versions During Upgrade
Hello all-
On the recommendation of the Catalyst team, instead of trying to get 4.0.6
replicating with 4.0.7, we tried a later version (>=4.5), since those versions
are much more recent and may handle issues that older versions just could not.
Unfortunately that too did not work.
They then suggested an in-place upgrade: if a bug in our 4.0.6 code was causing
the problems, there was not much else that could be done.
The server survived an upgrade from Ubuntu 12.04 (Samba 4.0.6) to Ubuntu 14.04
(Samba 4.3.11) with minimal work afterwards (samba-dsdb-modules and
samba-vfs-modules packages were missing) and seemingly no loss of data.
We still plan on going up to 4.7 via SerNet packages or similar, but at this
point it seems we are out of the woods.
TL;DR - we fixed a replication issue that was blocking an upgrade by first
upgrading our existing hosts.