thr3ads.net - Fedora directory users - [Fedora-directory-users] Replication - consumer failed to replay change [Dec 2005]


Kevin M. Myer

2005-Dec-17 16:05 UTC

[Fedora-directory-users] Replication - consumer failed to replay change

Hello,

I have three instances of FDS running - two are in a multimaster config 
(directory1 and directory2), with all subtrees replicated and one is a 
dedicated slave (garnet), with half a dozen subtrees replicated.  
directory1 and directory2 are running FDS 7.1, garnet is running FDS 
1.0.1.

Most of the writes go to directory1, and although I have not tested 
writing to every subtree from directory1 -> directory2 and directory2 
-> directory1, replication seems to be working fine for the most part.

However, I noticed yesterday morning the following entry:

[16/Dec/2005:09:06:16 -0500] NSMMReplicationPlugin - agmt="cn=IU13" 
(directory2:636): Consumer failed to replay change (uniqueid 
6a76611a-1dd211b2-8027b642-689d0000, CSN 43a2c9d8000000010000): 
Operations error. Will retry later.

This is repeated every five minutes to the present time.  Is there a 
way to look at the changelog entries to see what modification caused 
this problem?  And if not, what's the best way to go about clearing up
the error?

Also, is FDS smart enough so that if you have a two-server multimaster
replication setup, and you use one master to initialize the other,
which has an existing replication setup with the master, that it won't
send the changes back?  In other words, if I have directory1 and
directory2, and they are set up in multimaster, with replication
agreements in place for a subtree, and there's a problem in the subtree
on directory2, can I use directory1 to initialize directory2, or will
directory2 then turn around and try to initialize directory1?

Kevin
-- 
Kevin M. Myer
Senior Systems Administrator
Lancaster-Lebanon Intermediate Unit 13  http://www.iu13.org
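
The CSN in the error line above can be unpacked to see when, and on which replica, the failing change originated. A minimal sketch in Python, assuming the usual 20-hex-digit CSN layout (8 digits of Unix time, then a 4-digit sequence number, replica ID and sub-sequence number). That layout is not stated anywhere in this thread, and `Csn` and `parse_csn` are hypothetical names used only for illustration:

```python
from datetime import datetime, timezone
from typing import NamedTuple


class Csn(NamedTuple):
    timestamp: datetime  # time the change was originally made, in UTC
    seqnum: int          # orders changes made within the same second
    replica_id: int      # ID of the replica where the change originated
    subseqnum: int


def parse_csn(csn: str) -> Csn:
    """Split a CSN string into its fields (assumed layout: 8+4+4+4 hex digits)."""
    if len(csn) != 20:
        raise ValueError(f"expected 20 hex digits, got {len(csn)}")
    seconds = int(csn[0:8], 16)
    return Csn(
        timestamp=datetime.fromtimestamp(seconds, tz=timezone.utc),
        seqnum=int(csn[8:12], 16),
        replica_id=int(csn[12:16], 16),
        subseqnum=int(csn[16:20], 16),
    )


failing = parse_csn("43a2c9d8000000010000")
print(failing.timestamp.isoformat(), "replica", failing.replica_id)
```

Under that assumption the CSN decodes to 14:06:16 UTC on 16 Dec 2005, which lines up with the 09:06:16 -0500 timestamp in the error log, and to replica ID 1.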

David Boreham

2005-Dec-17 17:40 UTC

Re: [Fedora-directory-users] Replication - consumer failed to replay change

Kevin M. Myer wrote:
> [16/Dec/2005:09:06:16 -0500] NSMMReplicationPlugin - agmt="cn=IU13"
> (directory2:636): Consumer failed to replay change (uniqueid 
> 6a76611a-1dd211b2-8027b642-689d0000, CSN 43a2c9d8000000010000): 
> Operations error. Will retry later.
>
> This is repeated every five minutes to the present time.  Is there a 
> way to look at the changelog entries to see what modification caused 
> this problem?  And if not, what's the best way to go about clearing up
> the error?
Try looking in the access and error logs on the replica server (the
server that is receiving this update). That should tell us which
operation is failing. Exactly what is going on I'm not sure, I've not
seen a problem like this before. Perhaps someone else on the list has.
> Also, is FDS smart enough so that if you have a two-server multimaster
> replication setup, and you use one master to initialize the other,
> which has an existing replication setup with the master, that it won't
> send the changes back?  In other words, if I have directory1 and
> directory2, and they are set up in multimaster, with replication
> agreements in place for a subtree, and there's a problem in the
> subtree on directory2, can I use directory1 to initialize directory2,
> or will directory2 then turn around and try to initialize directory1?
No, it's smart enough not to do that.

Kevin M. Myer

2005-Dec-17 18:57 UTC

Re: [Fedora-directory-users] Replication - consumer failed to replay change

Quoting David Boreham <david_list@boreham.org>:
> Try looking in the access and error logs on the replica server (the 
> server that is receiving this update).
> That should tell us which operation is failing. Exactly what is going
> on I'm not sure, I've not seen a
> problem like this before. Perhaps someone else on the list has.
Here's the action it's trying to perform:

[16/Dec/2005:09:06:16 -0500] conn=900959 op=3 EXT oid="2.16.840.1.113730.3.5.3" name="Netscape Replication Start Session"
[16/Dec/2005:09:06:16 -0500] conn=900959 op=3 RESULT err=0 tag=120 nentries=0 etime=0
[16/Dec/2005:09:06:16 -0500] conn=900959 op=4 DEL dn="uid=<username>,ou=people,dc=base"
[16/Dec/2005:09:06:16 -0500] conn=900959 op=4 RESULT err=1 tag=107 nentries=0 etime=0 csn=43a2c9d8000000010000
[16/Dec/2005:09:06:18 -0500] conn=900959 op=5 EXT oid="2.16.840.1.113730.3.5.5" name="Netscape Replication End Session"

The replication to the slave (garnet) did occur properly for the
account that was being deleted.  It's also not inhibiting other changes
from occurring in the same replication session.  I just made a minor
modification to my account and it replicated, while the deletion of the
account giving errors still failed.  I restarted the server that was
receiving the changes, and now the deletion operation that was failing
isn't occurring at all :/  So I guess I'll just manually delete the
account, since the one master seems to be convinced that the change
went through.

Kevin
-- 
Kevin M. Myer
Senior Systems Administrator
Lancaster-Lebanon Intermediate Unit 13  http://www.iu13.org
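
Picking the failing operation out of a busy access log by eye gets tedious, so here is a rough sketch of how it could be automated: pair each RESULT line with the request that shares its conn/op numbers and report any non-zero err. The regular expressions only cover the line shapes quoted in this thread, and `failed_operations` is a hypothetical helper name, not a tool shipped with FDS:

```python
import re
from typing import Dict, List, Tuple

# Sample lines copied from the access log excerpt in this thread.
SAMPLE_LOG = """\
[16/Dec/2005:09:06:16 -0500] conn=900959 op=3 EXT oid="2.16.840.1.113730.3.5.3" name="Netscape Replication Start Session"
[16/Dec/2005:09:06:16 -0500] conn=900959 op=3 RESULT err=0 tag=120 nentries=0 etime=0
[16/Dec/2005:09:06:16 -0500] conn=900959 op=4 DEL dn="uid=<username>,ou=people,dc=base"
[16/Dec/2005:09:06:16 -0500] conn=900959 op=4 RESULT err=1 tag=107 nentries=0 etime=0 csn=43a2c9d8000000010000
[16/Dec/2005:09:06:18 -0500] conn=900959 op=5 EXT oid="2.16.840.1.113730.3.5.5" name="Netscape Replication End Session"
"""

LINE_RE = re.compile(r"^\[(?P<time>[^\]]+)\] conn=(?P<conn>\d+) op=(?P<op>\d+) (?P<rest>.*)$")
ERR_RE = re.compile(r"\berr=(\d+)")


def failed_operations(log_text: str) -> List[Tuple[str, str, int]]:
    """Return (timestamp, request, err) for every RESULT with a non-zero err."""
    requests: Dict[Tuple[str, str], str] = {}
    failures: List[Tuple[str, str, int]] = []
    for line in log_text.splitlines():
        match = LINE_RE.match(line)
        if match is None:
            continue  # e.g. "connection from" lines, which carry no op number
        key = (match["conn"], match["op"])
        rest = match["rest"]
        if rest.startswith("RESULT"):
            err = ERR_RE.search(rest)
            if err and err.group(1) != "0":
                request = requests.get(key, "<request not seen>")
                failures.append((match["time"], request, int(err.group(1))))
        else:
            requests[key] = rest  # remember the request until its RESULT shows up
    return failures


for when, request, err in failed_operations(SAMPLE_LOG):
    print(when, request, "err=%d" % err)
```

On the excerpt above this reports only the DEL with err=1; in LDAP, result code 1 is operationsError, which matches the "Operations error" text in the supplier's error log.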

Richard Megginson

2005-Dec-19 15:31 UTC

Re: [Fedora-directory-users] Replication - consumer failed to replay change

Kevin M. Myer wrote:
> Quoting David Boreham <david_list@boreham.org>:
>
>> Try looking in the access and error logs on the replica server (the 
>> server that is receiving this update).
>> That should tell us which operation is failing. Exactly what is going
>> on I'm not sure, I've not seen a
>> problem like this before. Perhaps someone else on the list has.
>
>
> Here's the action it's trying to perform:
>
> [16/Dec/2005:09:06:16 -0500] conn=900959 op=3 EXT oid="2.16.840.1.113730.3.5.3" name="Netscape Replication Start Session"
> [16/Dec/2005:09:06:16 -0500] conn=900959 op=3 RESULT err=0 tag=120 nentries=0 etime=0
> [16/Dec/2005:09:06:16 -0500] conn=900959 op=4 DEL dn="uid=<username>,ou=people,dc=base"
> [16/Dec/2005:09:06:16 -0500] conn=900959 op=4 RESULT err=1 tag=107 nentries=0 etime=0 csn=43a2c9d8000000010000
> [16/Dec/2005:09:06:18 -0500] conn=900959 op=5 EXT oid="2.16.840.1.113730.3.5.5" name="Netscape Replication End Session"
>
> The replication to the slave (garnet) did occur properly for the 
> account that was being deleted.
Is this the access log from one of the masters?
> It's also not inhibiting other changes from occurring in the same
> replication session.  I just made a minor modification to my account
> and it replicated, while the deletion of the account giving errors
> still failed.  I restarted the server that was receiving the changes,
> and now the deletion operation that was failing isn't occurring at
> all :/  So I guess I'll just manually delete the account, since the
> one master seems to be convinced that the change went through.
So after the restart, everything is ok?
>
> Kevin

Kevin M. Myer

2005-Dec-19 19:54 UTC

Re: [Fedora-directory-users] Replication - consumer failed to replay change

Quoting Richard Megginson <rmeggins@redhat.com>:
> Kevin M. Myer wrote:
>
>> Quoting David Boreham <david_list@boreham.org>:
>>
>>> Try looking in the access and error logs on the replica server (the
>>> server that is receiving this update).
>>> That should tell us which operation is failing. Exactly what is
>>> going on I'm not sure, I've not seen a
>>> problem like this before. Perhaps someone else on the list has.
>>
>>
>> Here's the action it's trying to perform:
>>
>> [16/Dec/2005:09:06:16 -0500] conn=900959 op=3 EXT oid="2.16.840.1.113730.3.5.3" name="Netscape Replication Start Session"
>> [16/Dec/2005:09:06:16 -0500] conn=900959 op=3 RESULT err=0 tag=120 nentries=0 etime=0
>> [16/Dec/2005:09:06:16 -0500] conn=900959 op=4 DEL dn="uid=<username>,ou=people,dc=base"
>> [16/Dec/2005:09:06:16 -0500] conn=900959 op=4 RESULT err=1 tag=107 nentries=0 etime=0 csn=43a2c9d8000000010000
>> [16/Dec/2005:09:06:18 -0500] conn=900959 op=5 EXT oid="2.16.840.1.113730.3.5.5" name="Netscape Replication End Session"
>>
>> The replication to the slave (garnet) did occur properly for the 
>> account that was being deleted.
>
> Is this the access log from one of the masters?
Yes, it's from the master that the changes were sent to.
>> It's also not inhibiting other changes from occurring in the same
>> replication session.  I just made a minor modification to my account
>> and it replicated, while the deletion of the account giving errors
>> still failed.  I restarted the server that was receiving the changes,
>> and now the deletion operation that was failing isn't occurring at
>> all :/  So I guess I'll just manually delete the account, since the
>> one master seems to be convinced that the change went through.
>
> So after the restart, everything is ok?
Unfortunately, no.  What has stopped is the attempt to do the 
replication from the master where the initial change was committed.  
Further, if I try to manually delete the entry from the master the 
changes were to be replicated to, I get the same operation error.

[17/Dec/2005:14:07:41 -0500] conn=471 fd=210 slot=210 connection from XX.XX.XX.XX to XX.XX.XX.XX
[17/Dec/2005:14:07:41 -0500] conn=471 op=0 BIND dn="cn=Directory Manager" method=128 version=3
[17/Dec/2005:14:07:41 -0500] conn=471 op=0 RESULT err=0 tag=97 nentries=0 etime=0 dn="cn=directory manager"
[17/Dec/2005:14:07:41 -0500] conn=471 op=1 DEL dn="uid=<username>,ou=People,dc=base"
[17/Dec/2005:14:07:41 -0500] conn=471 op=1 RESULT err=1 tag=107 nentries=0 etime=0 csn=43a461fe000000650000
[17/Dec/2005:14:07:41 -0500] conn=471 op=2 UNBIND
[17/Dec/2005:14:07:41 -0500] conn=471 op=2 fd=210 closed - U1

Now to the best of my knowledge, this server has not gone down
uncleanly, and it's only this one entry that is causing problems.  Any
ideas on what to try next, or how I might fix it?

Kevin
-- 
Kevin M. Myer
Senior Systems Administrator
Lancaster-Lebanon Intermediate Unit 13  http://www.iu13.org

Richard Megginson

2005-Dec-19 20:27 UTC

Re: [Fedora-directory-users] Replication - consumer failed to replay change

Kevin M. Myer wrote:
> Quoting Richard Megginson <rmeggins@redhat.com>:
>
>> Kevin M. Myer wrote:
>>
>>> Quoting David Boreham <david_list@boreham.org>:
>>>
>>>> Try looking in the access and error logs on the replica server (the
>>>> server that is receiving this update).
>>>> That should tell us which operation is failing. Exactly what is
>>>> going on I'm not sure, I've not seen a
>>>> problem like this before. Perhaps someone else on the list has.
>>>
>>>
>>>
>>> Here's the action it's trying to perform:
>>>
>>> [16/Dec/2005:09:06:16 -0500] conn=900959 op=3 EXT oid="2.16.840.1.113730.3.5.3" name="Netscape Replication Start Session"
>>> [16/Dec/2005:09:06:16 -0500] conn=900959 op=3 RESULT err=0 tag=120 nentries=0 etime=0
>>> [16/Dec/2005:09:06:16 -0500] conn=900959 op=4 DEL dn="uid=<username>,ou=people,dc=base"
>>> [16/Dec/2005:09:06:16 -0500] conn=900959 op=4 RESULT err=1 tag=107 nentries=0 etime=0 csn=43a2c9d8000000010000
>>> [16/Dec/2005:09:06:18 -0500] conn=900959 op=5 EXT oid="2.16.840.1.113730.3.5.5" name="Netscape Replication End Session"
>>>
>>> The replication to the slave (garnet) did occur properly for the 
>>> account that was being deleted.
>>
>>
>> Is this the access log from one of the masters?
>
>
> Yes, it's from the master that the changes were sent to.
>
>>> It's also not inhibiting other changes from occurring in the same
>>> replication session.  I just made a minor modification to my account
>>> and it replicated, while the deletion of the account giving errors
>>> still failed.  I restarted the server that was receiving the changes,
>>> and now the deletion operation that was failing isn't occurring at
>>> all :/  So I guess I'll just manually delete the account, since the
>>> one master seems to be convinced that the change went through.
>>
>>
>> So after the restart, everything is ok?
>
>
> Unfortunately, no.  What has stopped is the attempt to do the 
> replication from the master where the initial change was committed.  
> Further, if I try to manually delete the entry from the master the 
> changes were to be replicated to, I get the same operation error.
>
> [17/Dec/2005:14:07:41 -0500] conn=471 fd=210 slot=210 connection from XX.XX.XX.XX to XX.XX.XX.XX
> [17/Dec/2005:14:07:41 -0500] conn=471 op=0 BIND dn="cn=Directory Manager" method=128 version=3
> [17/Dec/2005:14:07:41 -0500] conn=471 op=0 RESULT err=0 tag=97 nentries=0 etime=0 dn="cn=directory manager"
> [17/Dec/2005:14:07:41 -0500] conn=471 op=1 DEL dn="uid=<username>,ou=People,dc=base"
> [17/Dec/2005:14:07:41 -0500] conn=471 op=1 RESULT err=1 tag=107 nentries=0 etime=0 csn=43a461fe000000650000
> [17/Dec/2005:14:07:41 -0500] conn=471 op=2 UNBIND
> [17/Dec/2005:14:07:41 -0500] conn=471 op=2 fd=210 closed - U1
>
> Now to the best of my knowledge, this server has not gone down
> uncleanly, and it's only this one entry that is causing problems.  Any
> ideas on what to try next, or how I might fix it?
I think you should just re-initialize it, e.g. reinit this master from
the other master.
>
> Kevin
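
For anyone who would rather script the re-initialization than use the console: on the supplier side it can normally be requested by setting nsds5BeginReplicaRefresh to "start" on the replication agreement entry. The sketch below only builds the LDIF for that modify; it does not contact any server. The agreement DN shown is an assumption pieced together from the agmt name in the error log and the redacted suffix, so substitute the real one, and `build_reinit_ldif` is a hypothetical helper name. Note that a total update overwrites the consumer's copy of the replicated subtree.

```python
def build_reinit_ldif(agreement_dn: str) -> str:
    """Build LDIF that asks the supplier to start a total update for one agreement."""
    lines = [
        f"dn: {agreement_dn}",
        "changetype: modify",
        "replace: nsds5BeginReplicaRefresh",
        "nsds5BeginReplicaRefresh: start",
        "",  # blank line terminates the LDIF record
    ]
    return "\n".join(lines)


# Hypothetical agreement DN; the real suffix is redacted in the thread.
AGREEMENT_DN = 'cn=IU13,cn=replica,cn="dc=base",cn=mapping tree,cn=config'

print(build_reinit_ldif(AGREEMENT_DN))
```

The output is meant to be applied with an LDAP modify tool against the healthy master (the one holding the agreement that points at the broken server), bound as Directory Manager.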
