Mike Ray
2017-Sep-29 22:37 UTC
[Samba] Replication Error Between Differing Samba Versions During Upgrade
Hey all-

I'm trying to upgrade the domain and am running into issues getting my data into the new controller.

Current configuration:

* dc0 - Ubuntu 12.04.2 - Samba: 2:4.0.6-12
* dc1 - Ubuntu 12.04.2 - Samba: 2:4.0.6-8
* dc2 - Ubuntu 12.04.3 - Samba: 2:4.0.6-8

I'm trying to upgrade to Ubuntu 16.04.3 with Samba 2:4.3.11+dfsg-0ubuntu0.16.04.10.

The documentation (https://wiki.samba.org/index.php/Updating_Samba#The_Update_Process) recommends updating in place, but the standard practice I have to follow is to create a new server and decommission the old ones. In my test instance, I'm trying to do this by introducing dc3, already at the OS and Samba versions listed above. This section (https://wiki.samba.org/index.php/Updating_Samba#Updating_Multiple_Samba_Domain_Controllers) suggests that running domain controllers of different versions in the same domain is OK, so that should not be the inherent problem.

The current state of things is that for dc0, dc1, and dc2:

* "samba-tool dbcheck --cross-ncs" returns no errors
* "samba-tool drs showrepl" returns no errors (just the expected warning about "No NC replicated for Connection")
* "samba-tool ldapcmp --filter=msDS-NcType,serverState,subrefs" returns no errors

dc3 is a bit funky:

* attempts to connect to this DC in ADUC result in "The RPC server is unavailable"
* "samba-tool dbcheck --cross-ncs" returns no errors
* "samba-tool drs showrepl" returns no errors (just the expected warning about "No NC replicated for Connection")
* "samba-tool ldapcmp --filter=msDS-NcType,serverState,subrefs" returns a HUGE number of errors, e.g.:

    Comparing:
    'CN=NTDS Quotas,DC=example,DC=com' [ldap://dc0.example.com]
    'CN=NTDS Quotas,DC=example,DC=com' [ldap://dc3.example.com]
        Attributes found only in ldap://dc0.example.com:
            distinguishedName
            isCriticalSystemObject
            cn
            name
            objectCategory
            objectClass
            msDS-TombstoneQuotaFactor
            objectGUID
            systemFlags
            whenCreated
            showInAdvancedViewOnly
            instanceType
            description
        FAILED

Forcing replication from a good domain controller ("samba-tool drs replicate dc3 dc0 'DC=EXAMPLE,DC=COM'") returns successfully but does not clear any of the errors reported by ldapcmp. Adding the "--sync-forced --sync-all --full-sync" flags does not change the outcome. Replicating all of the NCs (e.g. "CN=Configuration,DC=EXAMPLE,DC=COM") does not change the outcome either.

I also found that the LDB databases (which I believe are initially populated on domain join) are missing data. On one of the old controllers,

    ldbsearch -H /var/lib/samba/private/sam.ldb.d/DC%3DEXAMPLE,DC%3DCOM.ldb -b "cn=dc0,OU=Domain Controllers,DC=EXAMPLE,DC=COM"

returns good data:

    # record 1
    dn: CN=DC0,OU=Domain Controllers,DC=example,DC=com
    objectClass: top
    objectClass: person
    objectClass: organizationalPerson
    objectClass: user
    objectClass: computer
    cn: DC0
    instanceType: 4
    whenCreated: 20130703181107.0Z
    uSNCreated: 3583

But on dc3, it returns only:

    # record 1
    dn: CN=DC0,OU=Domain Controllers,DC=example,DC=com

    # record 2
    dn: CN=RID Set,CN=DC0,OU=Domain Controllers,DC=example,DC=com

    # returned 2 records
    # 2 entries
    # 0 referrals

Before I ran into this issue, I was having trouble getting replication to work at all ("samba-tool drs showrepl" would show errors). dc3 complained about WERR_GENERAL_FAILURE from dc0 and dc1 (it is unclear why, but dc3 never complained about dc2). After spending a long time checking many other things, I eventually found this thread: https://lists.samba.org/archive/samba/2014-August/184479.html. Although the original author seems not to have needed it, I found that this replication failure (WERR_GENERAL_FAILURE) was fixed by adding the GUID CNAME records to the /etc/hosts file of dc3. Other than that, I have not done anything to the domain controller after provisioning it (to my recollection; it's been a long couple of days).

I've copied the running domain into an isolated environment and can experiment with it without impacting any users or services, so feel free to recommend anything.

Thanks,
Mike Ray
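For reference, the /etc/hosts workaround described in the first message can be sketched in Python. This is a minimal sketch under stated assumptions: the DSA object GUID of each DC is read off the "DSA object GUID" line of "samba-tool drs showrepl" run on that DC, and the GUID-based name has the usual AD form <guid>._msdcs.<forest root domain>. The function name hosts_entries, the IP addresses, and the GUIDs are hypothetical placeholders. Because a hosts file cannot express a CNAME, each GUID name is mapped directly to the DC's IP address.

```python
import ipaddress
import uuid


def hosts_entries(domain: str, dcs: dict[str, tuple[str, str]]) -> list[str]:
    """Build /etc/hosts lines for each DC's GUID-based name (hypothetical helper).

    `dcs` maps a DC host name to (IP address, DSA object GUID). A hosts file
    cannot hold a CNAME, so the <guid>._msdcs.<domain> name points at the IP.
    """
    lines = []
    for host, (ip, guid) in sorted(dcs.items()):
        # Both constructors raise ValueError on malformed input.
        addr = ipaddress.ip_address(ip)
        # str(uuid.UUID(...)) normalises the GUID to lowercase, hyphenated form.
        guid_name = f"{uuid.UUID(guid)}._msdcs.{domain}"
        # Text after "#" is a comment in /etc/hosts; it records which DC this is.
        lines.append(f"{addr}\t{guid_name}\t# {host}")
    return lines


if __name__ == "__main__":
    # Placeholder values only; substitute the real IPs and DSA object GUIDs.
    example_dcs = {
        "dc0": ("192.0.2.10", "0F1E2D3C-4B5A-6978-8796-A5B4C3D2E1F0"),
        "dc1": ("192.0.2.11", "11111111-2222-3333-4444-555555555555"),
    }
    for line in hosts_entries("example.com", example_dcs):
        print(line)
```

Validating the GUID and IP before writing them out helps avoid a typo in a hand-copied GUID silently producing a hosts entry that never matches.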
Mike Ray
2017-Oct-02 22:32 UTC
[Samba] Replication Error Between Differing Samba Versions During Upgrade
Spent more time on this today and found the following:

* manually copying over the sam.ldb from a working controller did not fix the broken controller (in fact, it completely destroyed the domain)
* upgrading to 4.1.6 shows the same issue as 4.3.11 (dbcheck, drs showrepl, and dnsupdate all return happily, but ldapcmp shows missing data on the new controller)
* this issue does not show up if I add another 4.0.6(*) DC

(*) - this is a bastardized, custom version, not a true "4.0.6"

My next step is to try getting a 4.0.7 controller working on its own. If that fails, a plain 4.0.6 one. My current theory is that at some point the replication code was changed to request data in a way that isn't compatible with the old version, but that if I upgrade in smaller version steps, each step will handle the change internally.

If anyone has any 4.0 to 4.1 upgrade information or links, I would appreciate them.
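The "missing data" that ldapcmp reports can also be checked offline against saved ldbsearch output, such as the dc0 and dc3 dumps quoted in the first message. The following is a minimal Python sketch, assuming each DC's ldbsearch output has been saved as text; the function names parse_ldif and missing_attributes are hypothetical, wrapped (multi-line) values are skipped, and base64 "dn::" values are compared undecoded.

```python
def parse_ldif(text: str) -> dict[str, set[str]]:
    """Map each DN in an ldbsearch/LDIF dump to the attribute names it carries."""
    records: dict[str, set[str]] = {}
    current = None
    for line in text.splitlines():
        # Skip blank lines, comments ("# record 1", "# returned ..."), and
        # continuation lines of wrapped values (which start with a space).
        if not line.strip() or line.startswith("#") or line.startswith(" "):
            continue
        name, sep, value = line.partition(":")
        if not sep:
            continue
        if name.lower() == "dn":
            # DNs compare case-insensitively; a "dn::" (base64) value stays undecoded.
            current = value.lstrip(":").strip().lower()
            records.setdefault(current, set())
        elif current is not None:
            records[current].add(name)
    return records


def missing_attributes(good: str, suspect: str) -> dict[str, set[str]]:
    """Per DN, the attribute names present on the good DC but absent on the suspect DC."""
    good_recs = parse_ldif(good)
    suspect_recs = parse_ldif(suspect)
    report = {}
    for dn, attrs in good_recs.items():
        gap = attrs - suspect_recs.get(dn, set())
        if gap:
            report[dn] = gap
    return report


# Excerpts of the dc0 and dc3 output quoted in the first message.
DC0_DUMP = """# record 1
dn: CN=DC0,OU=Domain Controllers,DC=example,DC=com
objectClass: top
objectClass: person
objectClass: organizationalPerson
objectClass: user
objectClass: computer
cn: DC0
instanceType: 4
whenCreated: 20130703181107.0Z
uSNCreated: 3583
"""

DC3_DUMP = """# record 1
dn: CN=DC0,OU=Domain Controllers,DC=example,DC=com

# record 2
dn: CN=RID Set,CN=DC0,OU=Domain Controllers,DC=example,DC=com

# returned 2 records
# 2 entries
# 0 referrals
"""

if __name__ == "__main__":
    for dn, attrs in missing_attributes(DC0_DUMP, DC3_DUMP).items():
        print(dn, "->", sorted(attrs))
```

Comparing attribute names rather than values keeps the check focused on the symptom in this thread (objects that arrive as bare DNs); a value-level diff would also have to ignore attributes such as uSNCreated and uSNChanged, which are local to each DC and legitimately differ.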
Mike Ray
2017-Oct-03 22:31 UTC
[Samba] Replication Error Between Differing Samba Versions During Upgrade
Hello all-

At the recommendation of the Catalyst team, instead of trying to get 4.0.6 replicating with 4.0.7, we tried a later version (>=4.5), as those versions are much more recent and may handle issues the older versions just could not. Unfortunately that did not work either.

They then suggested an in-place upgrade, since if there was a bug in our 4.0.6 code causing the problems, there was not much else that could be done. The server survived an upgrade from Ubuntu 12.04 (Samba 4.0.6) to Ubuntu 14.04 (Samba 4.3.11) with minimal work afterwards (the samba-dsdb-modules and samba-vfs-modules packages were missing) and seemingly no loss of data.

We still plan on going up to 4.7 via SerNet packages or similar, but at this point it seems we are out of the woods.

TL;DR - we fixed a replication issue that was blocking an upgrade by first upgrading our existing hosts.