Hi all, James,

After following James' suggestions fixing the several dbcheck errors, and having observed things for a few days, I'd like to update this issue, and hope for some new input again. :-)

Summary: three DCs, all three running Version 4.5.10-SerNet-Debian-16.wheezy. samba-tool dbcheck --cross-ncs reports no errors, except for two (supposedly innocent) dangling forward links that I'm ignoring for now. Time is synced. Very basic smb.conf, posted earlier, can post again if needed.

samba-tool ldapcmp dcX dcY --filter=whenChanged shows that they are in sync, and samba-tool drs showrepl also shows that replication seems to be stable.

The "getting stuck" from the subject line has not occurred for a few days; perhaps the dbcheck fixes have solved that, or perhaps we've just been lucky.

All in all this appears pretty healthy, but there is a remaining problem:

At ANY given time, ONE RANDOM single DC shows high CPU usage on one samba process. And on that DC (can be any of the three DCs) the logs fill up with this:

> [2017/10/12 08:38:57.956586, 3] ../source4/smbd/service_stream.c:66(stream_terminate_connection)
>   Terminating connection - 'ldapsrv_accept_tls_loop: tstream_tls_accept_recv() - 104:Connection reset by peer'
> [2017/10/12 08:38:57.956638, 3] ../source4/smbd/process_single.c:114(single_terminate)
>   single_terminate: reason[ldapsrv_accept_tls_loop: tstream_tls_accept_recv() - 104:Connection reset by peer]
> [2017/10/12 08:38:57.956823, 3] ../source4/smbd/service_stream.c:66(stream_terminate_connection)
>   Terminating connection - 'ldapsrv_accept_tls_loop: tstream_tls_accept_recv() - 104:Connection reset by peer'
> [2017/10/12 08:38:57.956869, 3] ../source4/smbd/process_single.c:114(single_terminate)
>   single_terminate: reason[ldapsrv_accept_tls_loop: tstream_tls_accept_recv() - 104:Connection reset by peer]
> [2017/10/12 08:38:57.956990, 3] ../source4/auth/ntlm/auth.c:271(auth_check_password_send)
>   auth_check_password_send: Checking password for unmapped user []\[]@[(null)]
>   auth_check_password_send: mapped user is: []\[]@[(null)]
> [2017/10/12 08:38:57.958675, 3] ../source4/smbd/service_stream.c:66(stream_terminate_connection)
>   Terminating connection - 'ldapsrv_call_loop: tstream_read_pdu_blob_recv() - NT_STATUS_CONNECTION_RESET'
> [2017/10/12 08:38:57.958728, 3] ../source4/smbd/process_single.c:114(single_terminate)
>   single_terminate: reason[ldapsrv_call_loop: tstream_read_pdu_blob_recv() - NT_STATUS_CONNECTION_RESET]
> [2017/10/12 08:38:57.958948, 3] ../source4/smbd/service_stream.c:66(stream_terminate_connection)
>   Terminating connection - 'ldapsrv_call_loop: tstream_read_pdu_blob_recv() - NT_STATUS_CONNECTION_RESET'
> [2017/10/12 08:38:57.958994, 3] ../source4/smbd/process_single.c:114(single_terminate)
>   single_terminate: reason[ldapsrv_call_loop: tstream_read_pdu_blob_recv() - NT_STATUS_CONNECTION_RESET]
> [2017/10/12 08:38:57.969111, 0] ../source4/rpc_server/drsuapi/getncchanges.c:1961(dcesrv_drsuapi_DsGetNCChanges)
>   ../source4/rpc_server/drsuapi/getncchanges.c:1961: DsGetNCChanges 2nd replication on DN DC=samba,DC=company,DC=com older highwatermark (last_dn CN=Schema Admins,CN=Users,DC=samba,DC=company,DC=com)
> [2017/10/12 08:38:57.969762, 2] ../source4/rpc_server/drsuapi/getncchanges.c:1483(getncchanges_collect_objects)
>   ../source4/rpc_server/drsuapi/getncchanges.c:1483: getncchanges on DC=samba,DC=company,DC=com using filter (uSNChanged>=1)
> [2017/10/12 08:38:58.378265, 0] ../source4/rpc_server/drsuapi/getncchanges.c:1961(dcesrv_drsuapi_DsGetNCChanges)
>   ../source4/rpc_server/drsuapi/getncchanges.c:1961: DsGetNCChanges 2nd replication on DN DC=samba,DC=company,DC=com older highwatermark (last_dn CN=Schema Admins,CN=Users,DC=samba,DC=company,DC=com)
> [2017/10/12 08:38:58.379160, 2] ../source4/rpc_server/drsuapi/getncchanges.c:1483(getncchanges_collect_objects)
>   ../source4/rpc_server/drsuapi/getncchanges.c:1483: getncchanges on DC=samba,DC=company,DC=com using filter (uSNChanged>=1)
> [2017/10/12 08:38:58.810202, 0] ../source4/rpc_server/drsuapi/getncchanges.c:1961(dcesrv_drsuapi_DsGetNCChanges)
>   ../source4/rpc_server/drsuapi/getncchanges.c:1961: DsGetNCChanges 2nd replication on DN DC=samba,DC=company,DC=com older highwatermark (last_dn CN=Schema Admins,CN=Users,DC=samba,DC=company,DC=com)
> [2017/10/12 08:38:58.810868, 2] ../source4/rpc_server/drsuapi/getncchanges.c:1483(getncchanges_collect_objects)
>   ../source4/rpc_server/drsuapi/getncchanges.c:1483: getncchanges on DC=samba,DC=company,DC=com using filter (uSNChanged>=1)
> [2017/10/12 08:38:59.251863, 0] ../source4/rpc_server/drsuapi/getncchanges.c:1961(dcesrv_drsuapi_DsGetNCChanges)
>   ../source4/rpc_server/drsuapi/getncchanges.c:1961: DsGetNCChanges 2nd replication on DN DC=samba,DC=company,DC=com older highwatermark (last_dn CN=Schema Admins,CN=Users,DC=samba,DC=company,DC=com)
> [2017/10/12 08:38:59.252418, 2] ../source4/rpc_server/drsuapi/getncchanges.c:1483(getncchanges_collect_objects)
>   ../source4/rpc_server/drsuapi/getncchanges.c:1483: getncchanges on DC=samba,DC=company,DC=com using filter (uSNChanged>=1)
> [2017/10/12 08:38:59.692247, 0] ../source4/rpc_server/drsuapi/getncchanges.c:1961(dcesrv_drsuapi_DsGetNCChanges)
>   ../source4/rpc_server/drsuapi/getncchanges.c:1961: DsGetNCChanges 2nd replication on DN DC=samba,DC=company,DC=com older highwatermark (last_dn CN=Schema Admins,CN=Users,DC=samba,DC=company,DC=com)

I've seen "last_dn" be various things: system groups like above, but also regular users, computers, and groups that we created. We have even had (very few) cases where it was:

> ./log.samba.3.gz: ../source4/rpc_server/drsuapi/getncchanges.c:1961: DsGetNCChanges 2nd replication on DN DC=samba,DC=company,DC=com older highwatermark (last_dn DC=samba,DC=company,DC=com)

Can anyone explain what is happening here, or help me understand this?

I have read that highwatermark errors are not necessarily bad, but the fact that they cause continuous high CPU usage on a DC (80-90%), until the point where this behaviour "transfers" to the next DC, makes me feel that in this case it is not normal and indicates some kind of problem.

Thanks for input!

MJ
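To see whether the same objects keep turning up as "last_dn", the messages can be tallied from the log. The following is only a minimal sketch: the regular expression is inferred from the log excerpts quoted in this thread (an assumption, not taken from the Samba source), and the function name `count_last_dns` is made up for illustration.

```python
import re
from collections import Counter

# Pattern inferred from the log excerpts in this thread (an assumption,
# not taken from the Samba source).
HWM_RE = re.compile(r"older highwatermark \(last_dn ([^)]*)\)")


def count_last_dns(lines):
    """Count how often each last_dn appears in 'older highwatermark' messages."""
    counts = Counter()
    for line in lines:
        for dn in HWM_RE.findall(line):
            counts[dn] += 1
    return counts


# Two sample messages taken from the excerpts quoted in this thread.
sample = [
    "DsGetNCChanges 2nd replication on DN DC=samba,DC=company,DC=com "
    "older highwatermark (last_dn CN=Schema Admins,CN=Users,DC=samba,DC=company,DC=com)",
    "DsGetNCChanges 2nd replication on DN DC=samba,DC=company,DC=com "
    "older highwatermark (last_dn DC=samba,DC=company,DC=com)",
]

# Print each distinct last_dn with the number of times it was seen.
for dn, n in count_last_dns(sample).most_common():
    print(n, dn)
```

On a real DC the function could be given an open log file instead of the sample list; any iterable of lines works. A handful of DNs dominating the tally would point at specific objects, while an even spread would suggest the whole partition is being walked again and again.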
lingpanda101
2017-Oct-12 14:12 UTC
[Samba] samba getting stuck, highwatermark replication issue?
On 10/12/2017 3:17 AM, mj wrote:
> Summary: three DCs, all three running Version
> 4.5.10-SerNet-Debian-16.wheezy, samba-tool dbcheck --cross-ncs reports
> no errors, except for two (supposedly innocent) dangling forward links
> that I'm ignoring for now.
> [...]
> At ANY given time, ONE RANDOM single DC shows high cpu usage on one
> samba process. And on that DC (can be any of the three DCs) the logs
> fill up with this:
> [...]
>> ../source4/rpc_server/drsuapi/getncchanges.c:1961: DsGetNCChanges
>> 2nd replication on DN DC=samba,DC=company,DC=com older highwatermark
>> (last_dn CN=Schema Admins,CN=Users,DC=samba,DC=company,DC=com)
> [...]
> Can anyone explain what is happening here, or help me understand this?

MJ,

A dev or someone else may be able to assist, but your replication isn't syncing correctly among the DCs. Those dangling links should have been purged by now if they refer to a DC removed several years ago.

Did you do a full replication from a known good DC to the other two? This doesn't always fix the issue but is a good start. You didn't by chance restore a DC recently from backup, or have one offline and recently powered on?

The highwatermark value tells the source DC what objects the destination DC is requesting to update.
The high CPU usage seems to be due to the DC doing a full partition replication. The fact that you stated this issue can happen on all 3 makes it even tougher to help. I would normally advise to just demote the affected DC and join it again.

--
James
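For reference, a full replication is normally requested per naming context with `samba-tool drs replicate <destinationDC> <sourceDC> <NC> --full-sync`. The sketch below only builds and prints those command lines so they can be reviewed before anything is run; the host names `dc3` and `dc2` are placeholders, the helper `full_sync_commands` is hypothetical, and the naming contexts are the ones that appear in this thread's logs.

```python
import shlex

# Naming contexts as they appear in the logs of this thread.
NAMING_CONTEXTS = [
    "DC=samba,DC=company,DC=com",
    "CN=Configuration,DC=samba,DC=company,DC=com",
    "CN=Schema,CN=Configuration,DC=samba,DC=company,DC=com",
    "DC=DomainDnsZones,DC=samba,DC=company,DC=com",
    "DC=ForestDnsZones,DC=samba,DC=company,DC=com",
]


def full_sync_commands(dest_dc, source_dc, naming_contexts=NAMING_CONTEXTS):
    """Build one 'samba-tool drs replicate ... --full-sync' argv per naming context."""
    return [
        ["samba-tool", "drs", "replicate", dest_dc, source_dc, nc, "--full-sync"]
        for nc in naming_contexts
    ]


# Placeholder host names: dc3 pulls from dc2. Nothing is executed here,
# the commands are only printed for review.
for argv in full_sync_commands("dc3", "dc2"):
    print(shlex.join(argv))
```

Printing rather than executing is deliberate: which DC counts as the "known good" source is a judgment call that should be made before any command is run.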
Hi James, list,

We really appreciate your input on this, thanks!

On 10/12/2017 04:12 PM, lingpanda101 via samba wrote:
> MJ,
>
> A dev or someone else may be able to assist, but your replication isn't
> syncing correctly among the DCs. Those dangling links should have been
> purged by now if they refer to a DC removed several years ago.

This is rather worrying :-| Especially since I have all kinds of scripts in place that continuously check replication: hourly using "samba-tool drs showrepl", plus "samba-tool ldapcmp" every other hour.

So one can even have problems when all built-in checks succeed. :-(

Currently DC2 has high CPU usage, and grepping the log.samba for "Succeeded" gives this kind of result:

> Replicated 0 objects (0 linked attributes) for DC=ForestDnsZones,DC=samba,DC=company,DC=com
> Replicated 0 objects (0 linked attributes) for DC=ForestDnsZones,DC=samba,DC=company,DC=com
> Replicated 0 objects (0 linked attributes) for CN=Configuration,DC=samba,DC=company,DC=com
> Replicated 0 objects (0 linked attributes) for CN=Configuration,DC=samba,DC=company,DC=com
> Replicated 0 objects (0 linked attributes) for DC=samba,DC=company,DC=com
> Replicated 0 objects (0 linked attributes) for DC=samba,DC=company,DC=com
> Replicated 0 objects (0 linked attributes) for CN=Schema,CN=Configuration,DC=samba,DC=company,DC=com
> Replicated 0 objects (0 linked attributes) for CN=Schema,CN=Configuration,DC=samba,DC=company,DC=com
> Replicated 3 objects (0 linked attributes) for DC=DomainDnsZones,DC=samba,DC=company,DC=com
> Replicated 0 objects (0 linked attributes) for DC=DomainDnsZones,DC=samba,DC=company,DC=com
> Replicated 0 objects (0 linked attributes) for DC=DomainDnsZones,DC=samba,DC=company,DC=com
> Replicated 0 objects (0 linked attributes) for DC=ForestDnsZones,DC=samba,DC=company,DC=com
> Replicated 0 objects (0 linked attributes) for DC=ForestDnsZones,DC=samba,DC=company,DC=com
> Replicated 0 objects (0 linked attributes) for CN=Configuration,DC=samba,DC=company,DC=com

All zero, with some exceptions... I imagine this looks better, a sample from the non-high-CPU DCs:

> Replicated 1 objects (0 linked attributes) for DC=DomainDnsZones,DC=samba,DC=company,DC=com
> Replicated 1 objects (0 linked attributes) for DC=DomainDnsZones,DC=samba,DC=company,DC=com
> Replicated 1 objects (0 linked attributes) for DC=DomainDnsZones,DC=samba,DC=company,DC=com
> Replicated 1 objects (0 linked attributes) for DC=DomainDnsZones,DC=samba,DC=company,DC=com
> Replicated 1 objects (0 linked attributes) for DC=DomainDnsZones,DC=samba,DC=company,DC=com
> Replicated 1 objects (0 linked attributes) for DC=DomainDnsZones,DC=samba,DC=company,DC=com
> Replicated 1 objects (0 linked attributes) for DC=DomainDnsZones,DC=samba,DC=company,DC=com
> Replicated 1 objects (0 linked attributes) for DC=DomainDnsZones,DC=samba,DC=company,DC=com
> Replicated 1 objects (0 linked attributes) for DC=DomainDnsZones,DC=samba,DC=company,DC=com
> Replicated 0 objects (0 linked attributes) for DC=DomainDnsZones,DC=samba,DC=company,DC=com
> Replicated 0 objects (0 linked attributes) for DC=DomainDnsZones,DC=samba,DC=company,DC=com
> Replicated 0 objects (0 linked attributes) for DC=ForestDnsZones,DC=samba,DC=company,DC=com
> Replicated 0 objects (0 linked attributes) for DC=ForestDnsZones,DC=samba,DC=company,DC=com
> Replicated 0 objects (0 linked attributes) for CN=Configuration,DC=samba,DC=company,DC=com
> Replicated 0 objects (0 linked attributes) for CN=Configuration,DC=samba,DC=company,DC=com
> Replicated 0 objects (0 linked attributes) for DC=samba,DC=company,DC=com
> Replicated 0 objects (0 linked attributes) for DC=samba,DC=company,DC=com
> Replicated 0 objects (0 linked attributes) for CN=Schema,CN=Configuration,DC=samba,DC=company,DC=com
> Replicated 0 objects (0 linked attributes) for CN=Schema,CN=Configuration,DC=samba,DC=company,DC=com
> Replicated 2 objects (0 linked attributes) for DC=DomainDnsZones,DC=samba,DC=company,DC=com
> Replicated 2 objects (0 linked attributes) for DC=DomainDnsZones,DC=samba,DC=company,DC=com
> Replicated 4 objects (0 linked attributes) for DC=DomainDnsZones,DC=samba,DC=company,DC=com
> Replicated 4 objects (0 linked attributes) for DC=DomainDnsZones,DC=samba,DC=company,DC=com
> Replicated 1 objects (0 linked attributes) for DC=DomainDnsZones,DC=samba,DC=company,DC=com
> Replicated 1 objects (0 linked attributes) for DC=DomainDnsZones,DC=samba,DC=company,DC=com
> Replicated 1 objects (0 linked attributes) for DC=DomainDnsZones,DC=samba,DC=company,DC=com
> Replicated 1 objects (0 linked attributes) for DC=DomainDnsZones,DC=samba,DC=company,DC=com
> Replicated 2 objects (0 linked attributes) for DC=DomainDnsZones,DC=samba,DC=company,DC=com
> Replicated 1 objects (0 linked attributes) for DC=DomainDnsZones,DC=samba,DC=company,DC=com
> Replicated 1 objects (0 linked attributes) for DC=DomainDnsZones,DC=samba,DC=company,DC=com
> Replicated 1 objects (0 linked attributes) for DC=DomainDnsZones,DC=samba,DC=company,DC=com
> Replicated 1 objects (0 linked attributes) for DC=DomainDnsZones,DC=samba,DC=company,DC=com

Some zeros, but many indications that it is actually replicating data.

> Did you do a full replication from a known good DC to the other two?

Well, at this point I have no idea which DC I can consider "a good DC".

> This doesn't always fix the issue but is a good start. You didn't by
> chance restore a DC recently from backup, or have one offline and
> recently powered on?

No. These three DCs have been online for many years, ever since DC1 was removed. (We never demoted it, since it had crashed, so we manually removed DC1 from the database; that's perhaps why there are some remains.) The fact that there are still two 'dangling forward links', identical on all DCs, makes me think that we simply missed those when we manually removed all DC1 references.
This happened back in the samba 4.1 days.

> The highwatermark value tells the source DC what objects the destination
> DC is requesting to update. The high CPU usage seems to be due to the DC
> doing a full partition replication. The fact that you stated this issue
> can happen on all 3 makes it even tougher to help. I would normally
> advise to just demote the affected DC and join it again.

Perhaps I should try to find a combination of two DCs that works (check replication, verify with ldapcmp, make sure there is no high CPU, etc.), and then trust those two and demote the third.

Any input here would be very welcome...

Here's a bit of the logs leading up to the "Replicated 0 objects" on the current high-CPU DC; hopefully that reveals something..?

>   Not authoritative for '_kerberos.com', forwarding
> [2017/10/12 06:00:16.744615, 2] ../source4/dns_server/dns_query.c:1019(dns_server_process_query_send)
>   Not authoritative for '_kerberos.com', forwarding
> [2017/10/12 06:00:16.745393, 2] ../source4/dns_server/dns_query.c:1019(dns_server_process_query_send)
>   Not authoritative for '_kerberos.com', forwarding
> [2017/10/12 06:00:16.745731, 3] ../source4/auth/kerberos/krb5_init_context.c:80(smb_krb5_debug_wrapper)
>   Kerberos: AS-REQ authtime: 2017-10-12T06:00:16 starttime: unset endtime: 2017-10-12T16:00:16 renew till: unset
> [2017/10/12 06:00:16.745830, 3] ../source4/auth/kerberos/krb5_init_context.c:80(smb_krb5_debug_wrapper)
>   Kerberos: Client supported enctypes: aes256-cts-hmac-sha1-96, aes128-cts-hmac-sha1-96, des3-cbc-sha1, des3-cbc-md5, arcfour-hmac-md5, using arcfour-hmac-md5/arcfour-hmac-md5
> [2017/10/12 06:00:16.745975, 3] ../source4/auth/kerberos/krb5_init_context.c:80(smb_krb5_debug_wrapper)
>   Kerberos: Requested flags: forwardable
> [2017/10/12 06:00:16.748679, 3] ../source4/auth/kerberos/krb5_init_context.c:80(smb_krb5_debug_wrapper)
>   Kerberos: TGS-REQ MEMBERSERVER$@SAMBA.COMPANY.COM from ipv4:192.168.89.2:40725 for ldap/dc2.SAMBA.COMPANY.COM at SAMBA.COMPANY.COM [canonicalize]
> [2017/10/12 06:00:16.754551, 3] ../source4/auth/kerberos/krb5_init_context.c:80(smb_krb5_debug_wrapper)
>   Kerberos: TGS-REQ authtime: 2017-10-12T06:00:16 starttime: 2017-10-12T06:00:16 endtime: 2017-10-12T16:00:16 renew till: unset
> [2017/10/12 06:00:16.755962, 3] ../source4/auth/kerberos/krb5_init_context.c:80(smb_krb5_debug_wrapper)
>   Kerberos: TGS-REQ DC2$@SAMBA.COMPANY.COM from ipv4:192.87.143.15:41634 for ldap/DC2.SAMBA.COMPANY.COM at SAMBA.COMPANY.COM [canonicalize]
> [2017/10/12 06:00:16.762012, 3] ../source4/auth/kerberos/krb5_init_context.c:80(smb_krb5_debug_wrapper)
>   Kerberos: TGS-REQ authtime: 2017-10-12T06:00:16 starttime: 2017-10-12T06:00:16 endtime: 2017-10-12T16:00:16 renew till: unset
> [2017/10/12 06:00:16.762249, 3] ../source4/smbd/service_stream.c:66(stream_terminate_connection)
>   Terminating connection - 'kdc_tcp_call_loop: tstream_read_pdu_blob_recv() - NT_STATUS_CONNECTION_DISCONNECTED'
> [2017/10/12 06:00:16.762249, 3] ../source4/smbd/service_stream.c:66(stream_terminate_connection)
>   Terminating connection - 'kdc_tcp_call_loop: tstream_read_pdu_blob_recv() - NT_STATUS_CONNECTION_DISCONNECTED'
> [2017/10/12 06:00:16.762320, 3] ../source4/smbd/process_single.c:114(single_terminate)
>   single_terminate: reason[kdc_tcp_call_loop: tstream_read_pdu_blob_recv() - NT_STATUS_CONNECTION_DISCONNECTED]
> [2017/10/12 06:00:16.762967, 3] ../source4/auth/kerberos/krb5_init_context.c:80(smb_krb5_debug_wrapper)
>   Kerberos: TGS-REQ MEMBERSERVER$@SAMBA.COMPANY.COM from ipv4:192.168.89.2:40726 for krbtgt/SAMBA.COMPANY.COM at SAMBA.COMPANY.COM [forwarded, forwardable]
> [2017/10/12 06:00:16.765363, 3] ../source4/auth/kerberos/krb5_init_context.c:80(smb_krb5_debug_wrapper)
>   Kerberos: TGS-REQ authtime: 2017-10-12T06:00:16 starttime: 2017-10-12T06:00:16 endtime: 2017-10-12T16:00:16 renew till: unset
> [2017/10/12 06:00:16.765585, 3] ../source4/smbd/service_stream.c:66(stream_terminate_connection)
>   Terminating connection - 'kdc_tcp_call_loop: tstream_read_pdu_blob_recv() - NT_STATUS_CONNECTION_DISCONNECTED'
> [2017/10/12 06:00:16.765679, 3] ../source4/smbd/process_single.c:114(single_terminate)
>   single_terminate: reason[kdc_tcp_call_loop: tstream_read_pdu_blob_recv() - NT_STATUS_CONNECTION_DISCONNECTED]
> [2017/10/12 06:00:16.766324, 3] ../source4/auth/kerberos/krb5_init_context.c:80(smb_krb5_debug_wrapper)
>   Kerberos: TGS-REQ DC2$@SAMBA.COMPANY.COM from ipv4:192.87.143.15:41635 for krbtgt/SAMBA.COMPANY.COM at SAMBA.COMPANY.COM [forwarded, forwardable]
> [2017/10/12 06:00:16.768612, 3] ../source4/auth/kerberos/krb5_init_context.c:80(smb_krb5_debug_wrapper)
>   Kerberos: TGS-REQ authtime: 2017-10-12T06:00:16 starttime: 2017-10-12T06:00:16 endtime: 2017-10-12T16:00:16 renew till: unset
> [2017/10/12 06:00:16.768836, 3] ../source4/smbd/service_stream.c:66(stream_terminate_connection)
>   Terminating connection - 'kdc_tcp_call_loop: tstream_read_pdu_blob_recv() - NT_STATUS_CONNECTION_DISCONNECTED'
> [2017/10/12 06:00:16.768907, 3] ../source4/smbd/process_single.c:114(single_terminate)
>   single_terminate: reason[kdc_tcp_call_loop: tstream_read_pdu_blob_recv() - NT_STATUS_CONNECTION_DISCONNECTED]
> [2017/10/12 06:00:16.769475, 3] ../source4/smbd/service_stream.c:66(stream_terminate_connection)
>   Terminating connection - 'kdc_tcp_call_loop: tstream_read_pdu_blob_recv() - NT_STATUS_CONNECTION_DISCONNECTED'
> [2017/10/12 06:00:16.769542, 3] ../source4/smbd/process_single.c:114(single_terminate)
>   single_terminate: reason[kdc_tcp_call_loop: tstream_read_pdu_blob_recv() - NT_STATUS_CONNECTION_DISCONNECTED]
> [2017/10/12 06:00:16.799101, 3] ../source4/auth/kerberos/krb5_init_context.c:80(smb_krb5_debug_wrapper)
>   Kerberos: TGS-REQ DC2$@SAMBA.COMPANY.COM from ipv4:192.87.143.15:41637 for ldap/dc2.SAMBA.COMPANY.COM at SAMBA.COMPANY.COM [canonicalize]
> [2017/10/12 06:00:16.808786, 3] ../source4/auth/kerberos/krb5_init_context.c:80(smb_krb5_debug_wrapper)
>   Kerberos: TGS-REQ authtime: 2017-10-12T06:00:16 starttime: 2017-10-12T06:00:16 endtime: 2017-10-12T16:00:16 renew till: unset
> [2017/10/12 06:00:16.809681, 3] ../source4/smbd/service_stream.c:66(stream_terminate_connection)
>   Terminating connection - 'kdc_tcp_call_loop: tstream_read_pdu_blob_recv() - NT_STATUS_CONNECTION_DISCONNECTED'
> [2017/10/12 06:00:16.809767, 3] ../source4/smbd/process_single.c:114(single_terminate)
>   single_terminate: reason[kdc_tcp_call_loop: tstream_read_pdu_blob_recv() - NT_STATUS_CONNECTION_DISCONNECTED]
> [2017/10/12 06:00:16.817237, 3] ../source4/auth/kerberos/krb5_init_context.c:80(smb_krb5_debug_wrapper)
>   Kerberos: TGS-REQ DC2$@SAMBA.COMPANY.COM from ipv4:192.87.143.15:41638 for krbtgt/SAMBA.COMPANY.COM at SAMBA.COMPANY.COM [forwarded, forwardable]
> [2017/10/12 06:00:16.819573, 3] ../source4/auth/kerberos/krb5_init_context.c:80(smb_krb5_debug_wrapper)
>   Kerberos: TGS-REQ authtime: 2017-10-12T06:00:16 starttime: 2017-10-12T06:00:16 endtime: 2017-10-12T16:00:16 renew till: unset
> [2017/10/12 06:00:16.820289, 3] ../source4/smbd/service_stream.c:66(stream_terminate_connection)
>   Terminating connection - 'kdc_tcp_call_loop: tstream_read_pdu_blob_recv() - NT_STATUS_CONNECTION_DISCONNECTED'
> [2017/10/12 06:00:16.820368, 3] ../source4/smbd/process_single.c:114(single_terminate)
>   single_terminate: reason[kdc_tcp_call_loop: tstream_read_pdu_blob_recv() - NT_STATUS_CONNECTION_DISCONNECTED]
> [2017/10/12 06:00:16.843259, 2] ../source4/dsdb/repl/replicated_objects.c:1016(dsdb_replicated_objects_commit)
>   Replicated 0 objects (0 linked attributes) for DC=DomainDnsZones,DC=samba,DC=company,DC=com

Lots of NT_STATUS_CONNECTION_DISCONNECTED.

Ideas anyone..?

MJ
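Comparing the "Replicated N objects" lines by eye is error-prone, so here is a rough sketch of how they could be totalled per naming context on each DC. The regular expression is based only on the message format shown in this thread, it assumes the naming-context DNs contain no spaces (true for the ones here), and the function name `summarize` is hypothetical.

```python
import re
from collections import defaultdict

# Assumes naming-context DNs contain no spaces, which holds for the
# ones shown in this thread.
REPL_RE = re.compile(
    r"Replicated (\d+) objects \((\d+) linked attributes\) for (\S+)"
)


def summarize(lines):
    """Per naming context: number of commits, empty commits, and objects replicated."""
    stats = defaultdict(lambda: {"commits": 0, "empty": 0, "objects": 0})
    for line in lines:
        for objects, _links, nc in REPL_RE.findall(line):
            entry = stats[nc]
            entry["commits"] += 1
            entry["objects"] += int(objects)
            if int(objects) == 0:
                entry["empty"] += 1
    return dict(stats)


# Three sample lines in the format shown in this thread.
sample = [
    "Replicated 3 objects (0 linked attributes) for DC=DomainDnsZones,DC=samba,DC=company,DC=com",
    "Replicated 0 objects (0 linked attributes) for DC=DomainDnsZones,DC=samba,DC=company,DC=com",
    "Replicated 0 objects (0 linked attributes) for DC=samba,DC=company,DC=com",
]

for nc, entry in summarize(sample).items():
    print(nc, entry)
```

Running the same summary over the logs of all three DCs for the same time window would make it easier to tell whether the busy DC really replicates less than the others, or whether it just commits many empty rounds.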
Andrew Bartlett
2017-Oct-14 10:16 UTC
[Samba] samba getting stuck, highwatermark replication issue?
On Thu, 2017-10-12 at 09:17 +0200, mj via samba wrote:
> Hi all, James,
>
> After following James' suggestions fixing the several dbcheck errors,
> and having observed things for a few days, I'd like to update this
> issue, and hope for some new input again. :-)
>
> Summary: three DCs, all three running Version
> 4.5.10-SerNet-Debian-16.wheezy, samba-tool dbcheck --cross-ncs reports
> no errors, except for two (supposedly innocent) dangling forward links
> that I'm ignoring for now. Time is synced. Very basic smb.conf, posted
> earlier, can post again if needed.
>
> samba-tool ldapcmp dcX dcY --filter=whenChanged shows that they are in
> sync, and also samba-tool drs showrepl shows that replication seems to
> be stable.
>
> The "getting stuck" from the subject line has not occurred for a few
> days, perhaps the dbcheck fixes have solved that, or perhaps we've just
> been lucky.
>
> All in all this appears pretty healthy, but there is a remaining problem:
>
> At ANY given time, ONE RANDOM single DC shows high cpu usage on one
> samba process. And on that DC (can be any of the three DCs) the logs
> fill up with this:

I would upgrade to Samba 4.7. The work on locking in LDB and the mention of replication issues was serious. Likewise we fixed a number of other issues in over-replication of linked attributes (group memberships) for 4.6.

We are carefully following the reports here, but we do expect replication should be much more stable with Samba 4.7.

Thanks,

Andrew Bartlett

--
Andrew Bartlett                       http://samba.org/~abartlet/
Authentication Developer, Samba Team  http://samba.org
Samba Developer, Catalyst IT          http://catalyst.net.nz/services/samba
Hi Andrew,

Thanks for chiming in!

On 10/14/2017 12:16 PM, Andrew Bartlett via samba wrote:
> We are carefully following the reports here, but we do expect
> replication should be much more stable with Samba 4.7.

OK, that's interesting, because I actually wanted to upgrade ASAP, but the few 4.7 upgrade experiences that have been posted are mostly about replication issues after upgrading. See https://lists.samba.org/archive/samba/2017-October/thread.html

Have people here generally upgraded to 4.7 already, without major issues? (Does that explain the lack of discussion on 4.7?) Or are people mostly waiting until version 4.7.1 or 4.7.2 has been released?

MJ