Pinja-Liina Jalkanen
2015-Jul-03 13:25 UTC
[Samba] NT_STATUS_INTERNAL_DB_CORRUPTION messages in log.samba--proper course of action?
Hi all,

We've recently migrated from a separate DNS server that was dynamically updated with BIND's update-policy, using a manually generated tkey-gssapi-keytab (plus a second server functioning as an ordinary slave to the first), to BIND9_DLZ. The setup predated Samba's AD DC support and BIND's DLZ support, and was originally established because even though we needed AD, we were unwilling to use Windows's own DNS server.

After the migration, replication seems to work, Windows is finally gone for good (yay!), Kerberos seems to work and dynamic DNS updates certainly work, but there is still a lingering problem manifested as errors in the Samba logs that I'd like to ask about. The messages are as follows (this is with -d3):

[2015/07/03 14:04:15.034263, 1] ../source4/dsdb/kcc/kcc_topology.c:1437(kcctpl_color_vertices)
  ../source4/dsdb/kcc/kcc_topology.c:1437: failed to find nCName attribute of object CN=75a3f420-60ef-4728-8608-3ead61de4555,CN=Partitions,CN=Configuration,DC=mydomain,DC=tld
[2015/07/03 14:04:15.034308, 1] ../source4/dsdb/kcc/kcc_topology.c:3236(kcctpl_create_connections)
  ../source4/dsdb/kcc/kcc_topology.c:3236: failed to color vertices: NT_STATUS_INTERNAL_DB_CORRUPTION
[2015/07/03 14:04:15.034317, 1] ../source4/dsdb/kcc/kcc_topology.c:3496(kcctpl_create_intersite_connections)
  ../source4/dsdb/kcc/kcc_topology.c:3496: failed to create connections: NT_STATUS_INTERNAL_DB_CORRUPTION
[2015/07/03 14:04:26.299572, 3] ../source4/auth/kerberos/krb5_init_context.c:80(smb_krb5_debug_wrapper)

Some background about how I did the migration:

Because one of our DCs was still running Windows 2003, and its support ends this very month, we needed to migrate away from it entirely. To that end we'd already transferred all the FSMO roles to Samba, which worked as it should. But we also wanted to finally move from the separate DNS to BIND9_DLZ. This is hardly a documented procedure; e.g. the Samba Wiki page at https://wiki.samba.org/index.php/Changing_the_DNS_backend has no mention of it.

The obvious way seemed to be to install Samba on our primary DNS server for the migration and then join that server to the domain. So I joined it to the domain as a new DC, using the "--dns-backend=BIND9_DLZ" flag. But this didn't seem to work. The join went OK, but for some reason it didn't create named.conf or dns.keytab in Samba's private directory.

In retrospect, it might have been better to install BIND on the DC holding the FSMO roles and run samba_upgradedns on that, but I didn't even know to think about such an option in advance, because the documentation for samba_upgradedns didn't take into account a situation like ours, where the previous DNS backend had been NONE. And that box was never supposed to run the DNS.

The next thing I tried was to run "samba_upgradedns --dns-backend=BIND9_DLZ" on the newly promoted machine. After I manually created the "DnsAdmins" AD group, it actually seemed to work. But I had forgotten that the DNS/primarydns.domain.tld SPN was already assigned to the user that had previously been used to do the dynamic updates. (I'll return to the consequences of that mistake below.)

At this point I ran a short script that called "samba-tool dns add" to add, one by one, all the old A records for hosts with static IPs from the domain.tld zonefile back to the domain.tld zone DB that was now managed by Samba. I also noticed that the other DC's records weren't there; I tried to run samba_dnsupdate on the FSMO server, but it failed, complaining about Kerberos.

Because replication was seriously broken at this point due to missing DNS records, I added the right records for the other DCs manually, and pointed pdc._msdcs.mydomain.tld to the right DC.
After this, the two Samba DCs replicated with each other without errors, but the Windows DC didn't; it complained about lingering objects, which was odd because the DNS had been broken only for a short while, and no deletions had been done during that period (the lingering object that Windows was complaining about was a long since deleted, ordinary domain user).

After some futile repair attempts that failed mostly due to the mixed Windows/Samba environment, I decided not to waste any more time on Windows, because that box was to be demoted really soon anyway; I just ran "dcpromo /forceremoval", cleaned up the metadata by running the script on page https://gallery.technet.microsoft.com/scriptcenter/d31f091f-2642-4ede-9f97-0e1cc4d577f3 through RSAT, and manually cleaned up the relevant records from the DNS.

Now, with Windows finally gone, I had just two Samba DCs left: one running the primary DNS (A), the other holding the FSMO roles (B)--and replication worked! DNS updates still didn't work, but there were hints of SPN problems in log.samba, and at this point I finally realised my aforementioned SPN mistake. After sorting these out and performing the procedure described at https://wiki.samba.org/index.php/Dns_tkey_negotiategss:_TKEY_is_unacceptable Kerberos finally started to work; "samba_dnsupdate --all-names --verbose --fail-immediately" passed on both DCs and workstations finally started to re-register themselves.

All now sorted out, except... the aforementioned INTERNAL_DB_CORRUPTION errors. They're appearing in the log.samba of the current FSMO box (B). Our future plan is to transfer the FSMO roles from DC B to DC A, join our still-ordinary-slave secondary DNS to the domain as a new DC (C)--migrating that to BIND9_DLZ in the process--and finally demote and remove B, leaving just DCs A and C, both running Samba with BIND9_DLZ backends.
Before proceeding any further, however, I wish to sort the errors out; I've already had my share of scary moments, when I envisioned starting over by ditching DC A and restoring DC B from a backup.

So, to my question. What is the best option:

a) To try to manually equalise the attributes (with ADSI Edit or some other LDAP tool) of the CN=75a3f420-60ef-4728-8608-3ead61de4555,CN=Partitions,CN=Configuration,DC=mydomain,DC=tld object (e.g. the nCName attribute that Samba is complaining about, which has the value "DC=ForestDnsZones,DC=mydomain,DC=tld" on DC A but "<none>" on DC B)? Or would this actually be a risky/dangerous procedure?

b) Just stop worrying and proceed with migrating the FSMO roles to DC A, joining DC C and ditching DC B, trusting that when DC B is finally demoted and gone, all will be fine? It'd be wonderful if I could just trust this option to work, because that'd be the least time-consuming.

c) Judge the Samba DB to be beyond repair, ditch DC A, restore DC B from a backup, start over again, and re-perform the DNS upgrade somehow differently (how?). Obviously not my favourite option, because of the extra work involved, and because things seem to mostly work already, with normal replication working without errors.

What is the recommended course of action by the Samba team?
Our Samba version is 4.2.2. BIND is 9.9.5-9-Debian.

Thanks for any advice,

--
Pinja-Liina Jalkanen
Vihreät / De Gröna
https://www.vihreat.fi/
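For option (a), it may help to see exactly which crossRef objects differ between the two DCs before touching anything. The following is only a minimal sketch, not an official Samba tool. It assumes that the Partitions container of each DC has first been dumped to LDIF text, for instance with something like `ldbsearch -H <path to sam.ldb> -b "CN=Partitions,CN=Configuration,DC=mydomain,DC=tld" "(objectClass=crossRef)" nCName`; that invocation and the location of sam.ldb are assumptions to verify against your own installation. The helper names `parse_ldif` and `ncname_differences` are hypothetical.

```python
"""Compare the nCName attribute of directory objects in two LDIF dumps.

Sketch only: the LDIF parser is deliberately simple and the helper names
are hypothetical, not part of Samba.
"""


def parse_ldif(text):
    """Return {dn: {attribute_name_lowercased: [values]}} from LDIF text.

    Handles blank-line-separated entries, '#' comment lines and folded
    lines (a continuation line starts with one space). Base64 values
    ('attr:: ...') are kept undecoded, which is enough for comparing them.
    """
    # Unfold continuation lines first.
    lines = []
    for raw in text.splitlines():
        if raw.startswith(" ") and lines and lines[-1]:
            lines[-1] += raw[1:]
        else:
            lines.append(raw)

    entries = {}
    current = None
    for line in lines:
        if not line.strip():
            current = None  # a blank line ends the current entry
            continue
        if line.startswith("#"):
            continue
        attr, sep, value = line.partition(":")
        if not sep:
            continue
        value = value.lstrip(": ").strip()
        if attr.lower() == "dn":
            current = entries.setdefault(value, {})
        elif current is not None:
            current.setdefault(attr.lower(), []).append(value)
    return entries


def ncname_differences(ldif_a, ldif_b):
    """Return {dn: (values_on_a, values_on_b)} where nCName differs.

    A side that lacks the object or the attribute is reported as None.
    DNs are compared as plain strings (case-sensitive) in this sketch.
    """
    a, b = parse_ldif(ldif_a), parse_ldif(ldif_b)
    report = {}
    for dn in sorted(set(a) | set(b)):
        on_a = a.get(dn, {}).get("ncname")
        on_b = b.get(dn, {}).get("ncname")
        if on_a != on_b:
            report[dn] = (on_a, on_b)
    return report
```

Feeding it one dump from DC A and one from DC B returns only the objects whose nCName differs, which gives a short list to review by hand. It does not modify anything in the directory.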
Rowland Penny
2015-Jul-03 14:32 UTC
[Samba] NT_STATUS_INTERNAL_DB_CORRUPTION messages in log.samba--proper course of action?
On 03/07/15 14:25, Pinja-Liina Jalkanen wrote:
> Hi all,
>
> We've recently migrated from a separate DNS server that was dynamically
> updated with BIND's update-policy, using a manually generated
> tkey-gssapi-keytab (plus a second server functioning as an ordinary
> slave to the first), to BIND9_DLZ.
> [...]
> What is the recommended course of action by the Samba team?
> Our Samba version is 4.2.2. BIND is 9.9.5-9-Debian.
>
> Thanking for any advice,

Why did you go with '--dns-backend=NONE'? Did you miss the 'NONE skips the DNS setup entirely (not recommended)' part in the command's help? Don't bother answering, this is a rhetorical question.

OK, I suggest that you look in /usr/share/samba/provision/sambadns.py, and in particular at 'create_dns_partitions'. This is what *didn't* get run when you provisioned. You should be able to work out what you need to do now.

Rowland
Rowland Penny
2015-Jul-03 18:07 UTC
[Samba] NT_STATUS_INTERNAL_DB_CORRUPTION messages in log.samba--proper course of action?
On 03/07/15 18:51, Pinja-Liina Jalkanen wrote:
> On 03/07/15 17:32, Rowland Penny wrote:
>> On 03/07/15 14:25, Pinja-Liina Jalkanen wrote:
>>> Hi all,
>>>
>>> We've recently migrated from a separate DNS server that was dynamically
>>> updated with BIND's update-policy, using a manually generated
>>> tkey-gssapi-keytab (plus a second server functioning as an ordinary
>>> slave to the first), to BIND9_DLZ. The setup predated Samba's AD DC
>>> support and BIND's DLZ support, and was originally established because
>>> even though we needed AD, we were unwilling to use Windows's own DNS
>>> server.

OK, so I missed that you hadn't provisioned samba4, but the message when you run 'samba-tool domain join --help' is even more explicit:

    --dns-backend=NAMESERVER-BACKEND
        The DNS server backend. SAMBA_INTERNAL is the builtin
        name server (default), BIND9_DLZ uses samba4 AD to
        store zone information, NONE skips the DNS setup
        entirely (this DC will not be a DNS server)

You need DNS for an AD domain, no ifs or buts, and experience of this mailing list leads me to think that not running it on the DCs is a bad idea.

>> Why did you go with '--dns-backend=NONE'? Did you miss the 'NONE skips
>> the DNS setup entirely (not recommended)' part in the command's help?
>> Don't bother answering, this is a rhetorical question.
> You're throwing me rhetorical questions, but didn't bother to actually
> _read_ my message, did you? I explained quite carefully that we used to
> have a DNS setup that is separate from AD and that _predates_ Samba's AD
> support--that is Samba 4.0--entirely.
>
> You could have just as well asked me why we didn't back then just go
> with the MS DNS but decided to use BIND instead. Because we've never
> ever _provisioned_ Samba; at least not as in "samba-tool domain
> provision". The "provisioning" of our domain was, once upon a time, done
> with Windows' dcpromo.exe.
>
> When we first added a Samba DC to the mix we were absolutely NOT going
> to change the existing DNS setup, as Samba's AD support was still very
> bleeding edge and that's why the first Samba DC was joined to our domain
> with --dns-backend=NONE. This whole problem arose when we were finally
> brave enough to try to change that setup, but there wasn't any
> documentation explaining how to do that.

There isn't any documentation for what you need to do now, because nobody ever thought that somebody would set up AD with samba4 (in any form) without a DNS server running on the DC. Your only hope is to go through the files in /usr/share/pyshared/samba/provision/ and pick out the required info from them. I certainly won't be helping you any further with this problem of your own making!

Rowland

>> OK, I suggest that you look in /usr/share/samba/provision/sambadns.py
>> and then 'create_dns_partitions'. This is what *didn't* get run when you
>> provisioned, You should be able to work out what you need to do now.
> Given what you wrote before, ignoring entirely the fact that our domain
> was never provisioned using Samba, I'm taking your advice with a grain
> of salt. But I'll take a look at that script.
>
> By the way, while the DNS updates do work now, I happened to notice the
> following message in the BIND log right after a successful update. It's
> most likely related:
>
> Jul 3 19:00:42 dc-a named[18846]: failed to find dnsRecord for
> DC=mydomain.tld,CN=MicrosoftDNS,DC=DomainDnsZones,DC=mydomain,DC=tld
>
>
> Pinja-Liina Jalkanen
Pinja-Liina Jalkanen
2015-Jul-07 13:23 UTC
[Samba] (SOLVED) NT_STATUS_INTERNAL_DB_CORRUPTION messages in log.samba--proper course of action?
(Sorry to the other list users that my previous reply in this thread went to Rowland Penny only. That wasn't my intention.)

On 03/07/15 21:07, Rowland Penny wrote:
> On 03/07/15 18:51, Pinja-Liina Jalkanen wrote:
>> On 03/07/15 17:32, Rowland Penny wrote:
>>> On 03/07/15 14:25, Pinja-Liina Jalkanen wrote:
>>>> Hi all,
>>>>
>>>> We've recently migrated from a separate DNS server that was dynamically
>>>> updated with BIND's update-policy, using a manually generated
>>>> tkey-gssapi-keytab (plus a second server functioning as an ordinary
>>>> slave to the first), to BIND9_DLZ. The setup predated Samba's AD DC
>>>> support and BIND's DLZ support, and was originally established because
>>>> even though we needed AD, we were unwilling to use Windows's own DNS
>>>> server.
>
> OK, so I missed that you hadn't provisioned samba4, but the message when
> you run 'samba-tool domain join --help' is even more explicit:
>
>     --dns-backend=NAMESERVER-BACKEND
>         The DNS server backend. SAMBA_INTERNAL is the builtin
>         name server (default), BIND9_DLZ uses samba4 AD to
>         store zone information, NONE skips the DNS setup
>         entirely (this DC will not be a DNS server)
>
> You need DNS for an AD domain, no ifs or buts,

Very true. And we've always had one.

> and experience of this mailing list leads me to think that not
> running it on the DCs is a bad idea.

That's entirely a matter of preference. The only thing that you can't achieve with separate DNS servers is multiple dynamically updatable servers, because with separate servers only the one acting as the primary can be dynamically updated. Secure dynamic updates have been possible from BIND 9.5 onwards, if I recall correctly. We first implemented them with BIND 9.7.

>>> Why did you go with '--dns-backend=NONE'? Did you miss the 'NONE skips
>>> the DNS setup entirely (not recommended)' part in the command's help?
>>> Don't bother answering, this is a rhetorical question.
>> You're throwing me rhetorical questions, but didn't bother to actually
>> _read_ my message, did you? I explained quite carefully that we used to
>> have a DNS setup that is separate from AD and that _predates_ Samba's AD
>> support--that is Samba 4.0--entirely.
>>
>> You could have just as well asked me why we didn't back then just go
>> with the MS DNS but decided to use BIND instead. Because we've never
>> ever _provisioned_ Samba; at least not as in "samba-tool domain
>> provision". The "provisioning" of our domain was, once upon a time, done
>> with Windows' dcpromo.exe.
>>
>> When we first added a Samba DC to the mix we were absolutely NOT going
>> to change the existing DNS setup, as Samba's AD support was still very
>> bleeding edge and that's why the first Samba DC was joined to our domain
>> with --dns-backend=NONE. This whole problem arose when we were finally
>> brave enough to try to change that setup, but there wasn't any
>> documentation explaining how to do that.
>
> There isn't any documentation for what you need to do now, because
> nobody ever thought that somebody would set up AD with samba4 (in any
> form) without a DNS server running on the DC.

Except the person who added the "NONE" option to the code, perhaps? And Microsoft had certainly thought about it--I vaguely recall them having documentation on how to manually register the records required by the DC back in 2000, when AD was first launched. Migrating the DNS to Windows was never mandatory, and many were never thrilled to run MS DNS. We certainly weren't. But for some of us, a Samba NT domain wasn't enough, either.

> Your only hope is to go through the files in
> /usr/share/pyshared/samba/provision/ and pick out the required info from
> them.

Actually, no. It wasn't my only hope. My original mistake was that I should have demoted all non-FSMO DCs first and not joined any others before upgrading, and then run "samba_upgradedns --dns-backend=BIND9_DLZ" on the remaining single DC. Then join further DCs again, this time with the right DNS backend. I'm pretty certain now that this would've worked right away.

The INTERNAL_DB_CORRUPTION message was just due to the old DC not being aware of all the changes that had only been made on the newly joined DC.

So what did I do to actually fix our setup? (DCs A, B and C refer to my original post in this thread.)

1. Made sure that I had good backups of everything.
2. Transferred FSMO roles from DC B to DC A. No errors.
3. Demoted DC B, so as to have only one DC (A) left. No errors. Then ran metadata cleanup just to make sure.
4. Joined DC C, with the option --dns-backend=BIND9_DLZ. No errors.
5. Performed the "Check and fix DNS entries on DC joins" procedure (see https://wiki.samba.org/index.php/Check_and_fix_DNS_entries_on_DC_joins).
6. Hey presto, it works! No errors left anywhere anymore!

(Actually it didn't go quite this easily, as I accidentally pointed DC C's objectGUID CNAME record to DC A at first, which totally broke replication after DC C was joined. But after spotting the error it was easy to fix!)

> I certainly won't be helping you any further with this problem of
> your own making!

Your comment irks me rather badly. We had very good reasons (unless you're a great fan of MS) for our previous setup, and the "NONE" option for the DNS backend was always there. I think it's downright vile on your part to just label it "our own making"--as if we'd hacked the Samba source with our private patches or the like and were now asking for help. If you don't want to help someone, I suggest you just keep quiet instead! And if you really think the "NONE" option shouldn't be there at all, I think you can file a bug to remove it.

-----------

For others potentially in the same situation:

1. Make good backups!
2. Make a script that calls "samba-tool dns add" to migrate your old zonefile(s).
3. Strip your network down to a single DC (remember FSMO transfers and metadata cleanups!).
4. Create the DnsAdmins security group.
5. Remove conflicting SPNs, if any.
6. Shut down Samba.
7. Upgrade your domain using "samba_upgradedns --dns-backend=BIND9_DLZ".
8. Set up BIND (see Samba wiki) on your DC and start it.
9. Point your DC to resolve DNS from localhost, if it doesn't already.
10. Start Samba and run "samba_dnsupdate --all-names --verbose --fail-immediately"; it should pass.
11. Migrate your old zone(s) using the script you made at #2.
12. When you've confirmed that everything works, you can join other DCs again.

>> By the way, while the DNS updates do work now, I happened to notice the
>> following message in the BIND log right after a successful update. It's
>> most likely related:
>>
>> Jul 3 19:00:42 dc-a named[18846]: failed to find dnsRecord for
>> DC=mydomain.tld,CN=MicrosoftDNS,DC=DomainDnsZones,DC=mydomain,DC=tld

OK, this seems to be nothing to worry about; named complains about this every time a Samba-managed zone is AXFR'd (e.g. if one runs "dig @dc-a _msdcs.mydomain.tld. AXFR").

--
Pinja-Liina Jalkanen
Vihreät / De Gröna
https://www.vihreat.fi/
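The script in steps 2 and 11 of the list above was not posted to the thread, so the following is only a rough sketch of what such a script could look like, not the one that was actually used. It reads simple one-line A records from an old BIND zonefile and builds the matching "samba-tool dns add" command lines without running them. It assumes the usual argument order `samba-tool dns add <server> <zone> <name> A <data>`, and the function names `a_records` and `dns_add_commands` as well as the sample host names are hypothetical. Authentication options (Kerberos ticket or -U) are left out and would have to be added for a real run.

```python
"""Build 'samba-tool dns add' commands from the A records of a zonefile.

Sketch only: handles plain one-line records of the form
    name [ttl] [IN] A address
and skips everything else (SOA, other record types, records that
inherit the owner name from the previous line, TTLs such as '1h').
"""
import shlex


def a_records(zone_text):
    """Yield (name, address) for each simple A record in zone_text."""
    for raw in zone_text.splitlines():
        line = raw.split(";", 1)[0].strip()  # drop zonefile comments
        if not line or line.startswith("$"):
            continue  # empty line or directive such as $TTL / $ORIGIN
        if raw[0].isspace():
            continue  # owner name inherited from previous record: skipped
        fields = line.split()
        idx = next(
            (i for i, f in enumerate(fields) if i >= 1 and f.upper() == "A"),
            None,
        )
        if idx is None or idx + 1 >= len(fields):
            continue
        # Between the name and 'A' only a numeric TTL and/or class IN may appear.
        if any(not (f.isdigit() or f.upper() == "IN") for f in fields[1:idx]):
            continue
        yield fields[0], fields[idx + 1]


def dns_add_commands(zone_text, server, zone):
    """Return one argument list per A record; names are passed through unchanged."""
    return [
        ["samba-tool", "dns", "add", server, zone, name, "A", address]
        for name, address in a_records(zone_text)
    ]


if __name__ == "__main__":
    sample = """\
$TTL 3600
host1        IN A 192.0.2.10
host2 3600   IN A 192.0.2.11 ; printer
www          IN CNAME host1
"""
    # Print the commands for review instead of executing them.
    for cmd in dns_add_commands(sample, "dc-a.mydomain.tld", "mydomain.tld"):
        print(shlex.join(cmd))
```

Printing the commands first, rather than executing them straight away, leaves room to review the list and to strip out records that Samba registers by itself, such as those of the DCs.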