thr3ads.net - samba - [Samba] samba AD problem after re-join domain [Oct 2020]


Jason Keltz

2020-Oct-12 14:36 UTC

[Samba] samba AD problem after re-join domain

On 10/12/2020 4:06 AM, Rowland penny via samba wrote:
> On 12/10/2020 02:54, Jason Keltz via samba wrote:
>> I've been working on a Samba AD setup with a bunch of test machines:
>> the one DC, and a bunch of clients. Last night, I ended up switching
>> the name of the test machines temporarily (except the DC), and
>> re-joining the domain (that's for another e-mail later). When things
>> didn't work the way I had planned, I switched the hostnames back,
>> and re-joined the domain today on all the test machines. I was
>> shocked to find that I am only able to log in to the domain on one of
>> my hosts. It fails on all the other ones. I ensured that I deleted
>> the machine entries from AD. I haven't changed my Samba config in
>> months, which Rowland had last verified was fine. I haven't changed
>> my /etc/krb5.conf Kerberos config in months. I even did a complete
>> rebuild of one of the machines, since I automated the installation
>> process, and that rebuild had worked perfectly many, many times, but
>> now it fails. In the winbind log, every time I try to log in, I'm
>> mostly seeing:
>
> Did you leave the domain before you changed the hostname ?
>
> Why did you change the hostnames ? In a case like this, I would have 
> set up a new computer, joined this to the domain and then removed the 
> old computer from the domain. 

Hi Rowland,

I did not leave the domain, but I did delete the entry with either the
Windows AD tool or "samba-tool computer delete". I can't
remember which one at this point. I think that clears up all the bits.
Is that correct? On the local host, I also deleted
/etc/krb5.keytab and all the Samba bits so that the join was fresh.
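For reference, the clean re-join order that Rowland recommends later in the thread (leave first, then join, then restore any extra service principals such as nfs/) can be written down as a dry-run plan. This is a minimal Python sketch that only builds argument lists and runs nothing; the account name `administrator` and the exact `net ads` subcommands are assumptions to check against `man net` for the installed Samba version.

```python
from typing import List


def rejoin_plan(admin_user: str, is_nfs_server: bool) -> List[List[str]]:
    """Return, in order, the commands for a clean re-join (dry run only).

    Assumption: these `net ads` subcommands and the admin account name are
    illustrative; nothing is executed here.
    """
    plan = [
        # Leave first, so the AD machine account and the local secrets
        # are cleaned up together instead of being deleted by hand.
        ["net", "ads", "leave", "-U", admin_user],
        # Join again under the (restored) hostname.
        ["net", "ads", "join", "-U", admin_user],
    ]
    if is_nfs_server:
        # Assumed step: re-add the nfs/ service principals to the keytab,
        # otherwise Kerberos NFS clients can no longer mount.
        plan.append(["net", "ads", "keytab", "add", "nfs", "-U", admin_user])
    return plan


if __name__ == "__main__":
    for cmd in rejoin_plan("administrator", is_nfs_server=True):
        print(" ".join(cmd))
```

Keeping each command as an argument list (rather than a shell string) means the same plan could later be handed to `subprocess.run` one step at a time.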

Things are better today. I discovered one issue which, although it seemed
unrelated to the errors, appears to have caused a lot of
the trouble. I was chasing errors in the winbind log, but several of the
test servers are NFS servers, and when I rejoined them to the domain, I
didn't replace the nfs/X entries in their keytab. As a result, the clients
couldn't mount, and that definitely caused some trouble whose signs I
didn't recognize. I'm still watching, though. However, I can log in
to all the hosts now.
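A quick way to catch missing nfs/X entries like these is to compare the principals listed by `klist -k /etc/krb5.keytab` against what the host should have. A minimal Python sketch, assuming the MIT Kerberos `klist -k` output format without `-t` (a KVNO column followed by the principal); the host name `test.ad.eecs.yorku.ca` used below is hypothetical.

```python
import re
from typing import Iterable, Set

# A keytab entry line looks like "   2 host/test.example.com@REALM";
# the header, dashes and "Keytab name:" lines do not start with a number.
ENTRY = re.compile(r"^\s*\d+\s+(\S+)")


def principals(klist_output: str) -> Set[str]:
    """Extract principal names from `klist -k` style output."""
    found = set()
    for line in klist_output.splitlines():
        match = ENTRY.match(line)
        if match:
            found.add(match.group(1))
    return found


def missing_services(klist_output: str, fqdn: str, realm: str,
                     services: Iterable[str] = ("host", "nfs")) -> Set[str]:
    """Return the service/fqdn@REALM principals absent from the keytab."""
    wanted = {f"{svc}/{fqdn}@{realm}" for svc in services}
    return wanted - principals(klist_output)
```

On a freshly rejoined NFS server whose keytab only has host/ entries, `missing_services(output, "test.ad.eecs.yorku.ca", "AD.EECS.YORKU.CA")` would return just the nfs/ principal.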

By the way, at one point, I rebooted the DC, and I noticed that all the 
AD clients showed something like this:

[2020/10/12 09:25:19.183616,  1, pid=36145, effective(0, 0), real(0, 0)] ../../source3/rpc_client/cli_pipe.c:422(cli_pipe_validate_current_pdu)
  ../../source3/rpc_client/cli_pipe.c:422: Bind NACK received from host dc1.ad.eecs.yorku.ca!
[2020/10/12 09:44:11.598150,  1, pid=36145, effective(0, 0), real(0, 0)] ../../source3/libads/ldap_utils.c:93(ads_do_search_retry_internal)
  Reducing LDAP page size from 1000 to 500 due to IO_TIMEOUT

(Which is strange, because it means that if you reboot the DC, the
clients talk to it more slowly when it comes back up? I don't think
the number ever increases again unless you restart winbind everywhere?)
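The sequence in these logs (1000, 500, 250, 125) is consistent with each IO_TIMEOUT halving the page size, starting from the default `ldap page size` of 1000. The Python sketch below models only that arithmetic. The floor rule (stop halving once the size falls below a quarter of the configured value) is an assumption for illustration, and the model never raises the size again, which matches the observation above but is not confirmed by these logs.

```python
from typing import List


def page_size_after_timeouts(configured: int, timeouts: int) -> List[int]:
    """Model the LDAP page size after each successive IO_TIMEOUT.

    Assumptions: each timeout halves the current size, but only while the
    current size is still >= configured // 4 (a hypothetical floor rule);
    nothing in this model ever increases the size again.
    """
    sizes = []
    current = configured
    for _ in range(timeouts):
        if configured > 4 and current >= configured // 4:
            current //= 2
        sizes.append(current)
    return sizes


if __name__ == "__main__":
    print(page_size_after_timeouts(1000, 4))
```

As Rowland points out below, the page size is only the number of records per reply; the smaller pages are a symptom, and the timeouts are what need explaining.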

and since that reboot, I've seen a few of them do this:

[2020/10/12 10:00:19.814381,  1, pid=36145, effective(0, 0), real(0, 0)] ../../source3/libads/ldap_utils.c:93(ads_do_search_retry_internal)
  Reducing LDAP page size from 500 to 250 due to IO_TIMEOUT
[2020/10/12 10:16:19.557261,  1, pid=36145, effective(0, 0), real(0, 0)] ../../source3/libads/ldap_utils.c:93(ads_do_search_retry_internal)
  Reducing LDAP page size from 250 to 125 due to IO_TIMEOUT

Two of them are VirtualBox VMs, so I figured maybe it's some kind of
VirtualBox thing, but one of them is a physical machine and still has the
same error. The DC is very lightly loaded. How would I debug what is
causing these IO_TIMEOUTs and the resulting page-size reductions?
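One starting point for debugging is to pull every page-size reduction out of the winbind log with its timestamp and pid, so the events can be lined up against DC reboots, network changes or hypervisor activity. A minimal Python sketch, assuming each record starts with the bracketed Samba debug header shown above and the message sits on a following line; the log file location varies by distribution and is not assumed here.

```python
import re
from datetime import datetime
from typing import List, Optional, Tuple

# Header of a Samba debug record, e.g.
# "[2020/10/12 10:00:19.814381,  1, pid=36145, effective(0, 0), real(0, 0)] ..."
HEADER = re.compile(
    r"^\[(\d{4}/\d{2}/\d{2} \d{2}:\d{2}:\d{2})\.\d+,\s*\d+, pid=(\d+)")
# Message text logged when the LDAP page size is reduced.
REDUCE = re.compile(
    r"Reducing LDAP page size from (\d+) to (\d+) due to IO_TIMEOUT")


def page_size_events(log_text: str) -> List[Tuple[datetime, int, int, int]]:
    """Return (timestamp, pid, old_size, new_size) for every reduction."""
    events = []
    stamp: Optional[datetime] = None
    pid = 0
    for line in log_text.splitlines():
        header = HEADER.match(line)
        if header:
            # Remember the timestamp and pid of the record we are inside.
            stamp = datetime.strptime(header.group(1), "%Y/%m/%d %H:%M:%S")
            pid = int(header.group(2))
            continue
        reduced = REDUCE.search(line)
        if reduced and stamp is not None:
            events.append((stamp, pid, int(reduced.group(1)),
                           int(reduced.group(2))))
    return events
```

Applied to the two records above, it reports reductions 500 to 250 and 250 to 125 in pid 36145, 16 minutes apart.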

I know that various errors in the Samba logs are not "issues", but this
one seems to be an issue. I don't like seeing IO_TIMEOUTs.

Another distracting error in the log included:

[2020/10/11 22:43:29.843630,  1, pid=969, effective(0, 0), real(0, 0)] ../../source3/libads/ldap.c:565(ads_find_dc)
  ads_find_dc: name resolution for realm 'AD.EECS.YORKU.CA' (domain 'EECSYORKUCA') failed: NT_STATUS_NO_LOGON_SERVERS

... after boot, which sounds serious, but it turns out that if I try to
authenticate before everything is up and running, that's what I get. The
error makes sense, but there's no follow-up message saying: "OK, OK, I found
it now. Sorry to give you a heart attack." It's all a learning
experience.
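As a crude pre-login check for the boot-time NT_STATUS_NO_LOGON_SERVERS case, a script could wait until the DC's LDAP port (389) accepts TCP connections. A minimal Python sketch: it only tests name resolution plus TCP reachability of one named host, not the DC discovery winbind itself performs, and the timeout and interval values are arbitrary assumptions.

```python
import socket
import time


def wait_for_port(host: str, port: int = 389, timeout: float = 60.0,
                  interval: float = 2.0) -> bool:
    """Poll until a TCP connection to host:port succeeds or timeout expires."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            # create_connection resolves the name and connects; failures of
            # either step raise OSError (socket.gaierror is a subclass).
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            if time.monotonic() + interval > deadline:
                return False
            time.sleep(interval)
```

For example, `wait_for_port("dc1.ad.eecs.yorku.ca")` returns True as soon as the DC answers on port 389, or False after about a minute.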

The real reason I was trying to change the hostnames was to deal with a
scenario particular to our environment. We have many dual-boot machines
running Windows and Linux. I know that I can't join the domain with the
same name on both the Linux and Windows sides, because joining one would
change the machine password, and then the other would no longer be joined.
I understand that it's possible to generate a machine password manually
and use it from both sides, but as I understand it, this interferes
with the system's ability to change the machine password regularly, which
seems more secure. I don't know if Samba does that. I also don't want
a different IP address for each side, because that would be
wasteful, and I would prefer the hostname to be the same on both
sides as well. I was trying to explore how tightly the name in the
AD computer database is tied to the "real" DNS name of the host.

What I was trying to do was to add "netbios name = <system hostname>-linux"
to /etc/samba/smb.conf, so that when I joined the hosts under Linux, they
would take on a "-linux" name, but only in the AD computer database.
When booted into Linux, the host would have an AD name of
"<system hostname>-linux" but a real name of just "<system hostname>". On
Windows, both the AD name and the hostname would be "<system hostname>".
This would mean that on Windows you could have a computer called
"test", and under Linux "test-linux", but both would really be the same
physical PC, and both would be host "test" with one IP address. It wasn't
working. I am pretty sure I forgot the nfs/X entries on the NFS servers
after rejoining the domain, so that may be the issue. However, thinking
back, I also think that "net ads keytab" would not let me add an entry
for "host/test...." because it wanted "host/test-linux....", but I could
be wrong. If the host *had* to take on its real identity as "test-linux",
then test-linux could just be an alias for test, I guess, but then the
machine build would be a headache... and when the Linux machines boot,
they use DHCP (just like Windows), so the machine wouldn't know whether it's
"test" or "test-linux". Lots of "fun".

Jason.
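One practical constraint on the "-linux" naming scheme described above: NetBIOS names are limited to 15 characters, so any host whose short name is longer than nine characters would overflow once "-linux" is appended. A minimal Python sketch that derives the Linux-side name and the corresponding smb.conf line, refusing names that would not fit; the function names are hypothetical, and the scheme itself is the one from the message, not a recommended practice.

```python
NETBIOS_MAX = 15  # NetBIOS names are limited to 15 characters


def linux_netbios_name(hostname: str, suffix: str = "-linux") -> str:
    """Derive the AD/NetBIOS name used when the Linux side joins.

    Uses the short hostname (before the first dot), appends the suffix,
    and rejects results longer than 15 characters.
    """
    short = hostname.split(".", 1)[0]
    name = short + suffix
    if len(name) > NETBIOS_MAX:
        raise ValueError(f"{name!r} is {len(name)} characters; "
                         f"NetBIOS names allow at most {NETBIOS_MAX}")
    return name


def smb_conf_line(hostname: str) -> str:
    """Hypothetical helper: the [global] line for the Linux side."""
    return f"netbios name = {linux_netbios_name(hostname)}"


if __name__ == "__main__":
    print(smb_conf_line("test.ad.eecs.yorku.ca"))
```

Checking the length at build time would catch an over-long name before it turns into a confusing join failure.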

Jason Keltz

2020-Oct-12 15:11 UTC


[Samba] samba AD problem after re-join domain

On 10/12/2020 10:36 AM, Jason Keltz wrote:
> <snipped>
> Jason

I wanted to add one more thing... It seems that I'm actually still
getting this everywhere when a user logs in:

[2020/10/12 10:59:29.958617,  1, pid=23338, effective(1004, 0), real(1004, 0)] ../../source3/librpc/crypto/gse_krb5.c:417(fill_mem_keytab_from_system_keytab)
  ../../source3/librpc/crypto/gse_krb5.c:417: krb5_kt_start_seq_get failed (Permission denied)

... but at least the user can still login.

I wonder if this is a regular error and everyone is seeing it in their
logs? Just for fun, I temporarily changed the permissions of
/etc/krb5.keytab to 644, and sure enough, the error goes
away... so somehow, when the user is logging in, it seems that winbind
is trying to read the keytab as the user. It's not clear why that would be,
and while a Google search hasn't revealed the reason for this error, I
do see it in a whole lot of log files. It's just that when I'm trying to
ensure there are no problems with my setup, and trying to understand the
errors that do show up, it can cause panic. Whether it's a problem or
not, I do not know.
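Before experimenting with modes, it helps to record what the permissions actually are. A keytab is usually readable only by root (mode 0600), while /etc/krb5.conf, which Rowland asks about below, is usually world-readable (0644) because every Kerberos client reads it. A minimal Python sketch that reports the mode, owner and whether "other" users can read each file; it makes no judgement about which mode is correct for a given setup.

```python
import os
import stat


def describe_mode(st_mode: int) -> str:
    """Format permission bits and flag whether 'other' users may read."""
    perms = stat.S_IMODE(st_mode)
    readable = "world-readable" if perms & stat.S_IROTH else "not world-readable"
    return f"{perms:04o} ({readable})"


def mode_report(path: str) -> str:
    """One-line summary of a file's permissions and ownership."""
    st = os.stat(path)
    return f"{path}: {describe_mode(st.st_mode)} uid={st.st_uid} gid={st.st_gid}"


if __name__ == "__main__":
    for path in ("/etc/krb5.keytab", "/etc/krb5.conf"):
        try:
            print(mode_report(path))
        except OSError as exc:
            print(f"{path}: {exc}")
```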

Jason.

Rowland penny

2020-Oct-12 15:51 UTC


[Samba] samba AD problem after re-join domain

On 12/10/2020 16:11, Jason Keltz wrote:
>
>> Hi Rowland,
>>
>> I did not leave the domain, but I did delete the entry by either the
>> Windows AD tool or "samba-tool computer delete" option. I can't
>> remember which one at this point. I think that clears up all the
>> bits. Is that correct? On the local host, I also deleted the
>> /etc/krb5.keytab, and deleted all the samba bits so that the join was
>> fresh.

I would always 'leave' the domain first, before doing anything else.

>>
>>
>> By the way, at one point, I rebooted the DC, and I noticed that all 
>> the AD clients showed something like this:
>>
>> [2020/10/12 09:25:19.183616,  1, pid=36145, effective(0, 0), real(0, 0)] ../../source3/rpc_client/cli_pipe.c:422(cli_pipe_validate_current_pdu)
>>   ../../source3/rpc_client/cli_pipe.c:422: Bind NACK received from host dc1.ad.eecs.yorku.ca!
>> [2020/10/12 09:44:11.598150,  1, pid=36145, effective(0, 0), real(0, 0)] ../../source3/libads/ldap_utils.c:93(ads_do_search_retry_internal)
>>   Reducing LDAP page size from 1000 to 500 due to IO_TIMEOUT
>>
>> (Which is strange because this means that if you reboot the DC, then
>> the clients start talking slower to it when it comes back up? I
>> don't think the number ever increases unless you restart winbind
>> everywhere?)

'page size' refers to the number of records returned; I would be more
worried about the 'IO_TIMEOUT'.

>>
>> and since that reboot, I've seen a few of them do this:
>>
>> [2020/10/12 10:00:19.814381,  1, pid=36145, effective(0, 0), real(0, 0)] ../../source3/libads/ldap_utils.c:93(ads_do_search_retry_internal)
>>   Reducing LDAP page size from 500 to 250 due to IO_TIMEOUT
>> [2020/10/12 10:16:19.557261,  1, pid=36145, effective(0, 0), real(0, 0)] ../../source3/libads/ldap_utils.c:93(ads_do_search_retry_internal)
>>   Reducing LDAP page size from 250 to 125 due to IO_TIMEOUT
>>
>> Two of them are virtualbox VMs, so I figured maybe it's some kind of
>> virtualbox thing, but one of them is an actual machine and still has
>> the same error. The DC is very lightly loaded. How would I debug
>> what is causing this reduction in IO?

I would be checking your network connections etc.

>>
>> I know that various errors in the Samba logs are not "issues" but
>> this one seems to be an issue. I don't like seeing IO_TIMEOUTs.
>>
>> Another distracting error in the log included:
>>
>> [2020/10/11 22:43:29.843630,  1, pid=969, effective(0, 0), real(0, 0)] ../../source3/libads/ldap.c:565(ads_find_dc)
>>   ads_find_dc: name resolution for realm 'AD.EECS.YORKU.CA' (domain 'EECSYORKUCA') failed: NT_STATUS_NO_LOGON_SERVERS

That makes me think of DNS/network problems.

>>
>> ... after boot which sounds serious but it turns out if I try to
>> authenticate before everything is up and running, that's what I get.
>> The error makes sense but there's no "follow up" to say: "Ok ok - I
>> found it now - Sorry to give you a heart attack.". It's all a
>> learning experience.
>>
>> <snipped>
>> Jason
>
>
>
> I wonder if this is a regular error and everyone is seeing this in their
> logs? Just for fun, I tried to change the permission of
> /etc/krb5.keytab temporarily to 644, and sure enough, the error goes
> away... so somehow when the user is logging in, it seems that
> winbind is trying to read the keytab as user. It's not clear why that
> would be, but while a google search hasn't revealed the reason for
> this error, I do see it in a whole lot of log files. It's just that
> when I'm trying to ensure there are no problems with my setup, and
> trying to understand the errors that do show up, it can cause panic.
> Whether it's a problem or not, I do not know

The keytab shouldn't be a problem. What are the permissions on
/etc/krb5.conf?

Rowland


