Hi,

Last evening we upgraded one of our 4 Domain Controllers from 4.6.5 to 4.7.1, and DNS resolution problems started immediately. We are using the BIND9_DLZ back end.

The server we upgraded holds all 7 FSMO roles. Some time back we upgraded the other 3 servers from version 4.6.5 to 4.7.1 and there was no issue at all.

I have also observed a very peculiar behaviour: even though the samba service is running, Windows RSAT suddenly shows the servers as down, and then, without us doing anything, they come back up.

We are not seeing any unusual error messages in either the samba logs or the named logs.

Could someone guide us forward?

--
Thanks & Regards,
Anantha Raghava

Do not print this e-mail unless required. Save Paper & trees.

Hi,

The upgrade from 4.6.5 broke all the servers. Although the services were running and there were no error messages, DNS resolution failed. Even from inside the domain controllers, DNS queries failed.

Samba version 4.7.1 and named version 9.9.4. The same issue happened with Samba versions 4.7.3 and 4.7.4.

We had to revert to 4.6.5 to bring the servers back online. Now I have a few questions:

1. On the DCs, we stopped both the samba-ad-dc and named services, then compiled and installed samba 4.7.1 without making any changes to the folders or files. On three of our 4 servers this worked without any issues. On the fourth server, which owned all FSMO roles, both samba-ad-dc and named were running (no errors were thrown), but DNS could not respond to any queries. For some time two of the servers kept working, and then they also stopped resolving names. We are not clear whether database replication faulted the other working servers as well.

2. Is there any specific procedure to follow for the upgrade? As mentioned above, we just compiled the latest version from source and installed it over 4.6.5. The process that worked on 3 servers failed and broke DNS on the 4th server, which owned all FSMO roles. Should we have transferred the FSMO roles to other working servers before the upgrade?

3. Since samba 4.7.x is a multi-process server, is there a DB locking issue that stops the named process from reading the DB?

4. In the named logs, we saw that it was unable to obtain the DNSKEY (.) and was timing out. But this error was shown even before the upgrade. It turned out to be a warning that can be safely ignored.

5. We switched back from the BIND9_DLZ DNS back end to the SAMBA_INTERNAL DNS back end, and the same issue continued. DNS failed to respond to any queries. Even within the DCs, pinging the DC's own name failed to resolve and returned an error.

6. Finally, when we downgraded, everything started working properly. On top of version 4.7.4, we just compiled 4.6.5 and installed it, simply reversing the upgrade process. It started working and all servers came back to normal. This is really surprising. Since it is a production setup that serves the whole of India, we reverted all servers to 4.6.5.

7. We observed some network latency, but it is not the cause of the DNS queries going unanswered, which stopped all 9000 users from logging in.

We are now reproducing the error in our lab setup to find the root cause before upgrading the production servers.

I seek your expert guidance and help to get to the root cause of the problem.

--
Thanks & Regards,
Anantha Raghava
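
The symptom described in this message (services running, no errors logged, yet no DNS answers) can be confirmed per DC by querying each server directly for the SRV records that AD clients depend on. The following is only a minimal sketch, not a procedure from this thread: it assumes `dig` is installed, the realm and server address are hypothetical placeholders, and the helper names (`ad_srv_records`, `dig_command`, `check_dc`) are made up for illustration.

```python
import subprocess


def ad_srv_records(realm):
    """Return core SRV record names that an AD DC normally serves."""
    realm = realm.lower().rstrip(".")
    return [
        f"_ldap._tcp.{realm}",
        f"_kerberos._tcp.{realm}",
        f"_kerberos._udp.{realm}",
        f"_ldap._tcp.dc._msdcs.{realm}",
    ]


def dig_command(server, name, rtype="SRV"):
    """Build a dig command that asks one specific server, with a short timeout."""
    return ["dig", f"@{server}", name, rtype, "+short", "+time=2", "+tries=1"]


def check_dc(server, realm, run=subprocess.run):
    """Map each record name to True if the server returned a non-empty answer."""
    results = {}
    for name in ad_srv_records(realm):
        proc = run(dig_command(server, name), capture_output=True, text=True)
        # An empty +short answer or a non-zero exit code counts as a failure.
        results[name] = proc.returncode == 0 and bool(proc.stdout.strip())
    return results


# Hypothetical usage, one call per DC:
#   check_dc("192.0.2.10", "samdom.example.com")
```

Running this against each DC before and after an upgrade would show which server stops answering first, which may help separate a local DNS failure from one that spreads through replication.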

On Mon, 22 Jan 2018 08:09:01 +0530
Anantha Raghava via samba <samba at lists.samba.org> wrote:

> Hi,
>
> The upgrade from 4.6.5 broke all the servers. Although the services
> were running and there is no error message, DNS resolution failed.
> Even from inside the domain controllers, DNS queries failed.
>
> Samba Version 4.7.1 and Named Version 9.9.4. The same issue happened
> with samba version 4.7.3 and 4.7.4
>
> We had to revert back to 4.6.5 to bring the servers back online. Now
> I have few questions:
>

This should work. Can you post your smb.conf files and your named conf files?

Rowland

Hi Anantha,

> The upgrade from 4.6.5 broke all the servers. Although the services were
> running and there is no error message, DNS resolution failed. Even from
> inside the domain controllers, DNS queries failed.
>
> Samba Version 4.7.1 and Named Version 9.9.4. The same issue happened
> with samba version 4.7.3 and 4.7.4
>
> We had to revert back to 4.6.5 to bring the servers back online. Now I
> have few questions:
>
> 1. On the DCs, we stopped both samba-ad-dc and named services, just
> compiled samba 4.7.1 & installed without making any changes to the
> folders or the files. On three of our 4 servers it worked without any
> issues. On the fourth server which owned all FSMO roles, although both
> samba-ad-dc and named services were running (no errors were thrown), DNS
> could not respond to any queries. For some time, two of the servers were
> working and they also stopped resolving any names. Whether database
> replication faulted the other working servers as well, we are not having
> clarity.
>
> 2. Is there any specific procedure to be followed to upgrade? As
> mentioned above, we have just compiled the latest version from sources
> and installed on 4.6.5. The process that worked on 3 servers failed &
> broke the DNS on 4th server that owned all FSMO roles. Should we have
> transferred the FSMO roles to other working servers before upgrade process?
>
> 3. Since samba 4.7.x is a multi-process server, is there any DB locking
> issue, that is stopping named process from reading DB?

As you point out, 4.7 gets a performance boost from more multi-processing. However, until 4.8 is released, you may run into memory consumption issues if you have too many simultaneous LDAP or RPC connections. This can be mitigated (see the previous post on the subject), and it did not trigger any issues on the DNS side for us.

By the way, in any case you should go with 4.7.4 and not 4.7.1.

> 4. In named logs, we saw that it was unable to obtain the DNSKEY (.) and
> it was timing out. But this error was shown even before the upgrade. It
> turned out to be a warning that can be safely ignored.
>
> 5. We switched back from BIND9_DLZ DNS Back end to SAMBA_INTERNAL DNS
> Back end, the same issue continued. DNS failed to respond to any
> queries. Even internally within DCs, ping the self could not resolve
> name & returned error.

The fact that the internal DNS was not working either would rule out a problem with your Bind-DLZ configuration...

Did you run samba-tool dbcheck --cross-ncs after the upgrade? There have been some issues with group membership on 4.6 to 4.7 upgrades, but I have never had an issue on the DNS partitions.

Do you have any Windows AD servers joined to your Samba domain?

As Rowland said, could you post your smb.conf file too, to see if there is anything weird in there?

By the way, unless you have specific requirements, you should stick to pre-compiled deb packages. There are several options: the Samba Plus packages with direct support from SerNet, the packages from LPH van Belle, or ours at TIS. That would help rule out any build/compilation issues.

Cheers,

Denis

--
Denis Cardon
Tranquil IT Systems
Les Espaces Jules Verne, bâtiment A
12 avenue Jules Verne
44230 Saint Sébastien sur Loire
tel : +33 (0) 2.40.97.57.55
http://www.tranquil-it-systems.fr
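
The checks mentioned in this reply (`samba-tool dbcheck --cross-ncs`), together with a look at FSMO ownership and replication state, could be collected into one read-only pass per DC after an upgrade. This is a rough sketch under stated assumptions rather than a recommended procedure: it assumes `samba-tool` is on the PATH and is run with sufficient privileges, it treats any non-zero exit code as a problem, and the names `POST_UPGRADE_CHECKS`, `run_checks` and `failed` are hypothetical.

```python
import subprocess

# Read-only samba-tool subcommands; none of them modify the database.
POST_UPGRADE_CHECKS = [
    ["samba-tool", "dbcheck", "--cross-ncs"],  # consistency check across all naming contexts
    ["samba-tool", "fsmo", "show"],            # which DC currently owns each FSMO role
    ["samba-tool", "drs", "showrepl"],         # inbound/outbound replication status
]


def run_checks(checks=POST_UPGRADE_CHECKS, run=subprocess.run):
    """Run each check and return (command, exit code, combined output) tuples."""
    report = []
    for cmd in checks:
        proc = run(cmd, capture_output=True, text=True)
        report.append((" ".join(cmd), proc.returncode, proc.stdout + proc.stderr))
    return report


def failed(report):
    """Return the commands that exited non-zero (assumed here to mean trouble)."""
    return [cmd for cmd, code, _ in report if code != 0]


# Hypothetical usage on a DC:
#   report = run_checks()
#   print(failed(report))
```

Keeping the output of a pass like this from both the 4.6.5 and the 4.7.x state of the lab DCs would give something concrete to compare and to post alongside the smb.conf files.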