Hi James,

Thanks for the quick reply.

On 10/09/2017 08:52 PM, lingpanda101 via samba wrote:
> You should be able to fix the 'replPropertyMetaData' errors with;
>
> samba-tool dbcheck --cross-ncs --fix --yes 'fix_replmetadata_unsorted_attid'

Yep, worked great! Fixed all of those replPropertyMetaData errors! :-)

> The highwatermark doesn't necessarily reflect an issue. It's part of how
> the destination DC keeps track of changes from the source DC. Can you
> verify the time and date is correct on all DC's?

Date & time matches. But the fact that the same identical message is logged multiple times per second, without an end seems a bit strange... Combined with high cpu usage on the DC where this happens. (yesterday DC2, currently on DC4)

> The GUID errors seem related to your old DC offline and NTDS connections
> still lingering. Open Microsoft Sites and Services and remove the ones
> no longer needed.

There is no DC1 mentioned anywhere there. And the two errors remain:

> ERROR: no target object found for GUID component for msDS-NC-Replica-Locations in object CN=84bea0a7-82dd-4237-9296-030573700698,CN=Partitions,CN=Configuration,DC=samba,DC=merit,DC=unu,DC=edu - <GUID=81a27497-bdfb-4977-9874-675bbfba490f>;<RMD_ADDTIME=130405075610000000>;<RMD_CHANGETIME=130405075610000000>;<RMD_FLAGS=0>;<RMD_INVOCID=556b2cb4-e576-48e2-bb7c-7f62caee84fc>;<RMD_LOCAL_USN=4605>;<RMD_ORIGINATING_USN=3630>;<RMD_VERSION=0>;CN=NTDS Settings,CN=DC1,CN=Servers,CN=Default-First-Site-Name,CN=Sites,CN=Configuration,DC=samba,DC=merit,DC=unu,DC=edu
> Not removing dangling forward link
> ERROR: no target object found for GUID component for msDS-NC-Replica-Locations in object CN=d9d76e21-8cae-457d-b212-6cb192612739,CN=Partitions,CN=Configuration,DC=samba,DC=merit,DC=unu,DC=edu - <GUID=81a27497-bdfb-4977-9874-675bbfba490f>;<RMD_ADDTIME=130405075610000000>;<RMD_CHANGETIME=130405075610000000>;<RMD_FLAGS=0>;<RMD_INVOCID=556b2cb4-e576-48e2-bb7c-7f62caee84fc>;<RMD_LOCAL_USN=4579>;<RMD_ORIGINATING_USN=3631>;<RMD_VERSION=0>;CN=NTDS Settings,CN=DC1,CN=Servers,CN=Default-First-Site-Name,CN=Sites,CN=Configuration,DC=samba,DC=merit,DC=unu,DC=edu
> Not removing dangling forward link

I was asked a question during the samba-tool dbcheck:

> Add yourself to the replica locations for DC=DomainDnsZones,DC=samba,DC=company,DC=com? [y/N/all/none] N
> Not fixing missing/incorrect attributes on DC=DomainDnsZones,DC=samba,DC=company,DC=com
>
> Add yourself to the replica locations for DC=ForestDnsZones,DC=samba,DC=company,DC=com? [y/N/all/none] N
> Not fixing missing/incorrect attributes on DC=ForestDnsZones,DC=samba,DC=company,DC=com

Should I answer Yes to those two questions?

MJ
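As an aside on the RMD_ADDTIME and RMD_CHANGETIME values in the dbcheck output above: assuming they are NTTIME values (100-nanosecond ticks since 1601-01-01 UTC, which is my understanding of how Samba stores them, not something stated in this thread), they can be turned into a readable date to see when the link to the old DC1 was created. A minimal sketch; the function name nttime_to_utc is made up for this example and it relies on GNU date:

```shell
#!/bin/bash
# Convert an NTTIME value to a human-readable UTC timestamp.
# Assumption: the input counts 100 ns ticks since 1601-01-01 UTC.
nttime_to_utc() {
    local ticks=$1
    # 10000000 ticks per second; 11644473600 seconds separate
    # 1601-01-01 from the Unix epoch 1970-01-01.
    local unix=$(( ticks / 10000000 - 11644473600 ))
    # "-d @SECONDS" is GNU date syntax.
    date -u -d "@$unix" '+%Y-%m-%d %H:%M:%S UTC'
}

# Value taken from the RMD_ADDTIME field in the dbcheck error above.
nttime_to_utc 130405075610000000
```

If the date printed is from around the time DC1 was originally set up, that would fit James' reading that these links are leftovers of the removed DC.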
lingpanda101
2017-Oct-10 13:19 UTC
[Samba] samba getting stuck, highwatermark replication issue?
On 10/10/2017 3:14 AM, mj via samba wrote:
> I was asked a question during the samba-tool dbcheck:
>
>> Add yourself to the replica locations for
>> DC=DomainDnsZones,DC=samba,DC=company,DC=com? [y/N/all/none] N
>> Not fixing missing/incorrect attributes on
>> DC=DomainDnsZones,DC=samba,DC=company,DC=com
>>
>> Add yourself to the replica locations for
>> DC=ForestDnsZones,DC=samba,DC=company,DC=com? [y/N/all/none] N
>> Not fixing missing/incorrect attributes on
>> DC=ForestDnsZones,DC=samba,DC=company,DC=com
>
> Should I answer Yes to those two questions?
>
> MJ
>

MJ,

I must have missed this snippet in your first email.

"Not removing dangling forward link"

These are deleted NTDS objects and harmless. However, you can clean them up with:

    # samba-tool domain tombstones expunge

It should be safe to say yes to those questions. You could also run the following command for those:

    # samba-tool dbcheck --cross-ncs --fix --yes 'fix_replica_locations'

It may be best to run a manual full replication from a good DC to one that is having the problems. See

https://wiki.samba.org/index.php/Manually_Replicating_Directory_Partitions

--
James
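For the manual full replication mentioned above, a sketch of what the commands could look like. It only prints the commands instead of running them, so they can be reviewed first. The hostnames dc4 and dc2 are placeholders (pick the DC with the problem as destination and a known good DC as source), the base DN is the anonymized one from this thread, and the list of five partitions is the usual AD set rather than something confirmed for this domain. The function name print_full_sync_cmds is made up for this example.

```shell
#!/bin/bash
# Print (do not run) one "samba-tool drs replicate ... --full-sync" command
# per directory partition. Argument order for drs replicate is:
#   <destinationDC> <sourceDC> <NC>
print_full_sync_cmds() {
    local dest=$1 src=$2 base=$3 nc
    for nc in "$base" \
              "CN=Configuration,$base" \
              "CN=Schema,CN=Configuration,$base" \
              "DC=DomainDnsZones,$base" \
              "DC=ForestDnsZones,$base"; do
        printf 'samba-tool drs replicate %s %s "%s" --full-sync\n' \
            "$dest" "$src" "$nc"
    done
}

# Placeholder hostnames and base DN; adjust before use.
print_full_sync_cmds dc4 dc2 'DC=samba,DC=company,DC=com'
```

The printed lines can then be run one by one as root on a DC, checking samba-tool drs showrepl in between.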
Hi all, James,

After following James' suggestions fixing the several dbcheck errors, and having observed things for a few days, I'd like to update this issue, and hope for some new input again. :-)

Summary: three DCs, all three running Version 4.5.10-SerNet-Debian-16.wheezy. samba-tool dbcheck --cross-ncs reports no errors, except for two (supposedly innocent) dangling forward links that I'm ignoring for now. Time is synced. Very basic smb.conf, posted earlier, can post again if needed.

samba-tool ldapcmp dcX dcY --filter=whenChanged shows that they are in sync, and samba-tool drs showrepl also shows that replication seems to be stable.

The "getting stuck" from the subject line has not occurred for a few days. Perhaps the dbcheck fixes have solved that, or perhaps we've just been lucky.

All in all this appears pretty healthy, but there is a remaining problem:

At ANY given time, ONE RANDOM single DC shows high cpu usage on one samba process. And on that DC (can be any of the three DCs) the logs fill up with this:

> [2017/10/12 08:38:57.956586, 3] ../source4/smbd/service_stream.c:66(stream_terminate_connection)
>   Terminating connection - 'ldapsrv_accept_tls_loop: tstream_tls_accept_recv() - 104:Connection reset by peer'
> [2017/10/12 08:38:57.956638, 3] ../source4/smbd/process_single.c:114(single_terminate)
>   single_terminate: reason[ldapsrv_accept_tls_loop: tstream_tls_accept_recv() - 104:Connection reset by peer]
> [2017/10/12 08:38:57.956823, 3] ../source4/smbd/service_stream.c:66(stream_terminate_connection)
>   Terminating connection - 'ldapsrv_accept_tls_loop: tstream_tls_accept_recv() - 104:Connection reset by peer'
> [2017/10/12 08:38:57.956869, 3] ../source4/smbd/process_single.c:114(single_terminate)
>   single_terminate: reason[ldapsrv_accept_tls_loop: tstream_tls_accept_recv() - 104:Connection reset by peer]
> [2017/10/12 08:38:57.956990, 3] ../source4/auth/ntlm/auth.c:271(auth_check_password_send)
>   auth_check_password_send: Checking password for unmapped user []\[]@[(null)]
>   auth_check_password_send: mapped user is: []\[]@[(null)]
> [2017/10/12 08:38:57.958675, 3] ../source4/smbd/service_stream.c:66(stream_terminate_connection)
>   Terminating connection - 'ldapsrv_call_loop: tstream_read_pdu_blob_recv() - NT_STATUS_CONNECTION_RESET'
> [2017/10/12 08:38:57.958728, 3] ../source4/smbd/process_single.c:114(single_terminate)
>   single_terminate: reason[ldapsrv_call_loop: tstream_read_pdu_blob_recv() - NT_STATUS_CONNECTION_RESET]
> [2017/10/12 08:38:57.958948, 3] ../source4/smbd/service_stream.c:66(stream_terminate_connection)
>   Terminating connection - 'ldapsrv_call_loop: tstream_read_pdu_blob_recv() - NT_STATUS_CONNECTION_RESET'
> [2017/10/12 08:38:57.958994, 3] ../source4/smbd/process_single.c:114(single_terminate)
>   single_terminate: reason[ldapsrv_call_loop: tstream_read_pdu_blob_recv() - NT_STATUS_CONNECTION_RESET]
> [2017/10/12 08:38:57.969111, 0] ../source4/rpc_server/drsuapi/getncchanges.c:1961(dcesrv_drsuapi_DsGetNCChanges)
>   ../source4/rpc_server/drsuapi/getncchanges.c:1961: DsGetNCChanges 2nd replication on DN DC=samba,DC=company,DC=com older highwatermark (last_dn CN=Schema Admins,CN=Users,DC=samba,DC=company,DC=com)
> [2017/10/12 08:38:57.969762, 2] ../source4/rpc_server/drsuapi/getncchanges.c:1483(getncchanges_collect_objects)
>   ../source4/rpc_server/drsuapi/getncchanges.c:1483: getncchanges on DC=samba,DC=company,DC=com using filter (uSNChanged>=1)
> [2017/10/12 08:38:58.378265, 0] ../source4/rpc_server/drsuapi/getncchanges.c:1961(dcesrv_drsuapi_DsGetNCChanges)
>   ../source4/rpc_server/drsuapi/getncchanges.c:1961: DsGetNCChanges 2nd replication on DN DC=samba,DC=company,DC=com older highwatermark (last_dn CN=Schema Admins,CN=Users,DC=samba,DC=company,DC=com)
> [2017/10/12 08:38:58.379160, 2] ../source4/rpc_server/drsuapi/getncchanges.c:1483(getncchanges_collect_objects)
>   ../source4/rpc_server/drsuapi/getncchanges.c:1483: getncchanges on DC=samba,DC=company,DC=com using filter (uSNChanged>=1)
> [2017/10/12 08:38:58.810202, 0] ../source4/rpc_server/drsuapi/getncchanges.c:1961(dcesrv_drsuapi_DsGetNCChanges)
>   ../source4/rpc_server/drsuapi/getncchanges.c:1961: DsGetNCChanges 2nd replication on DN DC=samba,DC=company,DC=com older highwatermark (last_dn CN=Schema Admins,CN=Users,DC=samba,DC=company,DC=com)
> [2017/10/12 08:38:58.810868, 2] ../source4/rpc_server/drsuapi/getncchanges.c:1483(getncchanges_collect_objects)
>   ../source4/rpc_server/drsuapi/getncchanges.c:1483: getncchanges on DC=samba,DC=company,DC=com using filter (uSNChanged>=1)
> [2017/10/12 08:38:59.251863, 0] ../source4/rpc_server/drsuapi/getncchanges.c:1961(dcesrv_drsuapi_DsGetNCChanges)
>   ../source4/rpc_server/drsuapi/getncchanges.c:1961: DsGetNCChanges 2nd replication on DN DC=samba,DC=company,DC=com older highwatermark (last_dn CN=Schema Admins,CN=Users,DC=samba,DC=company,DC=com)
> [2017/10/12 08:38:59.252418, 2] ../source4/rpc_server/drsuapi/getncchanges.c:1483(getncchanges_collect_objects)
>   ../source4/rpc_server/drsuapi/getncchanges.c:1483: getncchanges on DC=samba,DC=company,DC=com using filter (uSNChanged>=1)
> [2017/10/12 08:38:59.692247, 0] ../source4/rpc_server/drsuapi/getncchanges.c:1961(dcesrv_drsuapi_DsGetNCChanges)
>   ../source4/rpc_server/drsuapi/getncchanges.c:1961: DsGetNCChanges 2nd replication on DN DC=samba,DC=company,DC=com older highwatermark (last_dn CN=Schema Admins,CN=Users,DC=samba,DC=company,DC=com)

I've seen "last_dn" be various things: system groups like above, but also regular users, computers, and groups that we created. We have even had (very few) cases where it was:

> ./log.samba.3.gz: ../source4/rpc_server/drsuapi/getncchanges.c:1961: DsGetNCChanges 2nd replication on DN DC=samba,DC=company,DC=com older highwatermark (last_dn DC=samba,DC=company,DC=com)

Can anyone explain what is happening here, or help me understand this?

I have read that highwatermark errors are not necessarily bad. But they cause continuous high cpu usage on a DC (80, 90%), until the point where this behaviour "transfers" to the next DC. That makes me feel that in this case this is not normal, and indicates some kind of problem.

Thanks for input!

MJ
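To get a rough picture of how often the "older highwatermark" message is logged and for which last_dn values, the log can be summarized with standard tools. This is only a sketch based on the log format quoted above. The function name count_last_dn is made up here, and the log path in the usage note is an assumption that may differ on a SerNet install.

```shell
#!/bin/bash
# Count "older highwatermark" messages per last_dn in a Samba log file.
# Prints "<count> <last_dn>" lines, most frequent first.
count_last_dn() {
    local logfile=$1
    # Keep only the "older highwatermark (last_dn ...)" part of each line,
    # strip the fixed text around the DN, then tally identical DNs.
    grep -o 'older highwatermark (last_dn .*)' "$logfile" \
        | sed -e 's/^older highwatermark (last_dn //' -e 's/)$//' \
        | sort | uniq -c | sort -rn
}

# Only run when a log file is given on the command line.
if [ "$#" -gt 0 ]; then
    count_last_dn "$1"
fi
```

Saved as a script, it could be called with something like /var/log/samba/log.samba as its argument (path assumed). Comparing the counts between the busy DC and the idle ones might show whether a single object keeps triggering the restart of the replication cycle.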