Chris St. Pierre
2007-Mar-22 14:38 UTC
[Fedora-directory-users] MMR broken, reinitialization erases db
Sometime earlier this week (still trying to determine when), the multi-master replication on one of our databases broke. I tried to reinitialize it between a few of the hosts, and I got a bunch of errors:

[22/Mar/2007:09:27:39 -0500] NSMMReplicationPlugin - multimaster_be_state_change: replica o=isp is going offline; disabling replication
[22/Mar/2007:09:27:41 -0500] - WARNING: Import is running with nsslapd-db-private-import-mem on; No other process is allowed to access the database
[22/Mar/2007:09:27:45 -0500] - ERROR bulk import abandoned
[22/Mar/2007:09:27:45 -0500] - import userRoot: Aborting all import threads...
[22/Mar/2007:09:27:53 -0500] - import userRoot: Import threads aborted.
[22/Mar/2007:09:27:53 -0500] - import userRoot: Closing files...
[22/Mar/2007:09:27:56 -0500] - libdb: userRoot/owner.db4: unable to flush: No such file or directory
[22/Mar/2007:09:27:56 -0500] - libdb: userRoot/mail.db4: unable to flush: No such file or directory
[22/Mar/2007:09:27:56 -0500] - libdb: userRoot/modifytimestamp.db4: unable to flush: No such file or directory
[22/Mar/2007:09:27:56 -0500] - libdb: userRoot/telephoneNumber.db4: unable to flush: No such file or directory
[22/Mar/2007:09:27:56 -0500] - libdb: userRoot/nsUniqueId.db4: unable to flush: No such file or directory
[22/Mar/2007:09:27:56 -0500] - libdb: userRoot/objectclass.db4: unable to flush: No such file or directory
[22/Mar/2007:09:27:56 -0500] - libdb: userRoot/ou.db4: unable to flush: No such file or directory
[22/Mar/2007:09:27:56 -0500] - libdb: userRoot/icsCalendar.db4: unable to flush: No such file or directory
[22/Mar/2007:09:27:56 -0500] - libdb: userRoot/sambaSID.db4: unable to flush: No such file or directory
[22/Mar/2007:09:27:56 -0500] - libdb: userRoot/givenName.db4: unable to flush: No such file or directory
[22/Mar/2007:09:27:57 -0500] - libdb: userRoot/gidnumber.db4: unable to flush: No such file or directory
[22/Mar/2007:09:27:57 -0500] - libdb: userRoot/createtimestamp.db4: unable to flush: No such file or directory
[22/Mar/2007:09:27:57 -0500] - libdb: userRoot/cn.db4: unable to flush: No such file or directory
[22/Mar/2007:09:27:57 -0500] - libdb: userRoot/sn.db4: unable to flush: No such file or directory
[22/Mar/2007:09:27:57 -0500] - libdb: userRoot/uid.db4: unable to flush: No such file or directory
[22/Mar/2007:09:27:57 -0500] - libdb: userRoot/uidNumber.db4: unable to flush: No such file or directory
[22/Mar/2007:09:27:57 -0500] - libdb: userRoot/aci.db4: unable to flush: No such file or directory
[22/Mar/2007:09:27:57 -0500] - libdb: userRoot/uniquemember.db4: unable to flush: No such file or directory
[22/Mar/2007:09:27:57 -0500] - libdb: userRoot/parentid.db4: unable to flush: No such file or directory
[22/Mar/2007:09:27:57 -0500] - libdb: userRoot/entrydn.db4: unable to flush: No such file or directory
[22/Mar/2007:09:27:57 -0500] - libdb: userRoot/id2entry.db4: unable to flush: No such file or directory
[22/Mar/2007:09:27:57 -0500] - import userRoot: Import failed.
[22/Mar/2007:09:27:57 -0500] - process_bulk_import_op: NULL backend

This erased the database, and I was left with no data. Subsequently, I've restarted FDS, restored from backup using bak2db.pl, and it still doesn't work.

Any ideas?

Chris St. Pierre
Unix Systems Administrator
Nebraska Wesleyan University
----------------------------
Never send mail to thobrux@nebrwesleyan.edu
Richard Megginson
2007-Mar-22 15:05 UTC
Re: [Fedora-directory-users] MMR broken, reinitialization erases db
Chris St. Pierre wrote:
> Sometime earlier this week (still trying to determine when), the
> multi-master replication on one of our databases broke. I tried to
> reinitialize it between a few of the hosts, and I got a bunch of
> errors:
>
> [22/Mar/2007:09:27:39 -0500] NSMMReplicationPlugin -
> multimaster_be_state_change: replica o=isp is going offline; disabling
> replication
> [22/Mar/2007:09:27:41 -0500] - WARNING: Import is running with
> nsslapd-db-private-import-mem on; No other process is allowed to
> access the database
> [22/Mar/2007:09:27:45 -0500] - ERROR bulk import abandoned

???? You might try enabling the replication log level to see what is going on here.
Chris St. Pierre
2007-Mar-22 16:12 UTC
Re: [Fedora-directory-users] MMR broken, reinitialization erases db
On Thu, 22 Mar 2007, Richard Megginson wrote:
> ???? You might try enabling the replication log level to see what
> is going on here.

How do I do that?
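One common way to raise the log level is to modify nsslapd-errorlog-level on cn=config; 8192 is the documented value for replication debugging. The sketch below only prints the LDIF change. The function name is made up for illustration, and the bind DN, host, and port in the usage comment are assumptions to adjust for your instance.

```shell
#!/bin/sh
# Print an LDIF change that sets the error log level on cn=config.
# 8192 is the replication debugging level.
errorlog_level_ldif() {
  level="$1"
  cat <<EOF
dn: cn=config
changetype: modify
replace: nsslapd-errorlog-level
nsslapd-errorlog-level: $level
EOF
}

# Usage (ASSUMED bind DN, host and port; substitute your own):
#   errorlog_level_ldif 8192 | ldapmodify -D "cn=Directory Manager" -w PASSWORD -h localhost -p 389
errorlog_level_ldif 8192
```

Replication debugging is verbose, so it is worth setting the attribute back to its previous value once the failing reinitialization has been captured in the errors log.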
Chris St. Pierre
2007-Mar-22 16:20 UTC
Re: [Fedora-directory-users] MMR broken, reinitialization erases db
On Thu, 22 Mar 2007, Richard Megginson wrote:
> ???? You might try enabling the replication log level to see what
> is going on here.

Not much more data from that:

[22/Mar/2007:11:18:09 -0500] - repl5_inc_waitfor_async_results: 0 0
[22/Mar/2007:11:18:09 -0500] - repl5_inc_result_threadmain starting
[22/Mar/2007:11:18:09 -0500] NSMMReplicationPlugin - conn=1067 op=61 repl="o=isp": Released replica
[22/Mar/2007:11:18:10 -0500] NSMMReplicationPlugin - conn=1074 op=3 repl="o=isp": Begin total protocol
[22/Mar/2007:11:18:10 -0500] NSMMReplicationPlugin - conn=1074 op=3 repl="o=isp": Acquired replica
[22/Mar/2007:11:18:10 -0500] NSMMReplicationPlugin - multimaster_be_state_change: replica o=isp is going offline; disabling replication
[22/Mar/2007:11:18:10 -0500] - repl5_inc_result_threadmain exiting
[22/Mar/2007:11:18:10 -0500] agmt="cn="Replication to zeppo.nebrwesleyan.edu (o=isp)"" (zeppo:389) - session end: state=0 load=0 sent=0 skipped=0
[22/Mar/2007:11:18:11 -0500] NSMMReplicationPlugin - agmt="cn="Replication to zeppo.nebrwesleyan.edu (o=isp)"" (zeppo:389): Successfully released consumer
[22/Mar/2007:11:18:11 -0500] NSMMReplicationPlugin - agmt="cn="Replication to zeppo.nebrwesleyan.edu (o=isp)"" (zeppo:389): Beginning linger on the connection
[22/Mar/2007:11:18:11 -0500] NSMMReplicationPlugin - agmt="cn="Replication to zeppo.nebrwesleyan.edu (o=isp)"" (zeppo:389): State: sending_updates -> wait_for_changes
[22/Mar/2007:11:18:11 -0500] NSMMReplicationPlugin - agmt="cn="Replication to zeppo.nebrwesleyan.edu (o=isp)"" (zeppo:389): State: wait_for_changes -> wait_for_changes
[22/Mar/2007:11:18:11 -0500] NSMMReplicationPlugin - agmt="cn="Replication to zeppo.nebrwesleyan.edu (o=isp)"" (zeppo:389): Cancelling linger on the connection
[22/Mar/2007:11:18:11 -0500] NSMMReplicationPlugin - agmt="cn="Replication to zeppo.nebrwesleyan.edu (o=isp)"" (zeppo:389): Disconnected from the consumer
[22/Mar/2007:11:18:11 -0500] NSMMReplicationPlugin - agmt="cn="Replication to zeppo.nebrwesleyan.edu (o=isp)"" (zeppo:389): repl5_inc_stop: protocol stopped after 0 seconds
[22/Mar/2007:11:18:11 -0500] NSMMReplicationPlugin - conn=0 op=0 repl="o=isp": Replica in use locking_purl=conn=1074 id=3
[22/Mar/2007:11:18:11 -0500] NSMMReplicationPlugin - replica_disable_replication: replica o=isp is acquired
[22/Mar/2007:11:18:12 -0500] - WARNING: Import is running with nsslapd-db-private-import-mem on; No other process is allowed to access the database
[22/Mar/2007:11:18:12 -0500] NSMMReplicationPlugin - conn=1074 op=3 repl="o=isp": StartNSDS50ReplicationRequest: response=0 rc=0
[22/Mar/2007:11:18:16 -0500] - ERROR bulk import abandoned
[22/Mar/2007:11:18:16 -0500] - import userRoot: Aborting all import threads...
[22/Mar/2007:11:18:24 -0500] - import userRoot: Import threads aborted.
[22/Mar/2007:11:18:24 -0500] - import userRoot: Closing files...
[22/Mar/2007:11:18:27 -0500] - libdb: userRoot/owner.db4: unable to flush: No such file or directory
[22/Mar/2007:11:18:27 -0500] - libdb: userRoot/mail.db4: unable to flush: No such file or directory
[22/Mar/2007:11:18:27 -0500] - libdb: userRoot/modifytimestamp.db4: unable to flush: No such file or directory
[22/Mar/2007:11:18:27 -0500] - libdb: userRoot/icsCalendar.db4: unable to flush: No such file or directory
[22/Mar/2007:11:18:27 -0500] - libdb: userRoot/telephoneNumber.db4: unable to flush: No such file or directory
[22/Mar/2007:11:18:27 -0500] - libdb: userRoot/nsUniqueId.db4: unable to flush: No such file or directory
[22/Mar/2007:11:18:27 -0500] - libdb: userRoot/objectclass.db4: unable to flush: No such file or directory
[22/Mar/2007:11:18:27 -0500] - libdb: userRoot/ou.db4: unable to flush: No such file or directory
[22/Mar/2007:11:18:27 -0500] - libdb: userRoot/sambaSID.db4: unable to flush: No such file or directory
[22/Mar/2007:11:18:27 -0500] - libdb: userRoot/givenName.db4: unable to flush: No such file or directory
[22/Mar/2007:11:18:27 -0500] - libdb: userRoot/gidnumber.db4: unable to flush: No such file or directory
[22/Mar/2007:11:18:28 -0500] - libdb: userRoot/createtimestamp.db4: unable to flush: No such file or directory
[22/Mar/2007:11:18:28 -0500] - libdb: userRoot/cn.db4: unable to flush: No such file or directory
[22/Mar/2007:11:18:28 -0500] - libdb: userRoot/sn.db4: unable to flush: No such file or directory
[22/Mar/2007:11:18:28 -0500] - libdb: userRoot/uid.db4: unable to flush: No such file or directory
[22/Mar/2007:11:18:28 -0500] - libdb: userRoot/uidNumber.db4: unable to flush: No such file or directory
[22/Mar/2007:11:18:28 -0500] - libdb: userRoot/aci.db4: unable to flush: No such file or directory
[22/Mar/2007:11:18:28 -0500] - libdb: userRoot/uniquemember.db4: unable to flush: No such file or directory
[22/Mar/2007:11:18:28 -0500] - libdb: userRoot/parentid.db4: unable to flush: No such file or directory
[22/Mar/2007:11:18:28 -0500] - libdb: userRoot/entrydn.db4: unable to flush: No such file or directory
[22/Mar/2007:11:18:28 -0500] - libdb: userRoot/id2entry.db4: unable to flush: No such file or directory
[22/Mar/2007:11:18:28 -0500] - import userRoot: Import failed.
[22/Mar/2007:11:18:28 -0500] NSMMReplicationPlugin - Aborting total update in progress for replicated area o=isp connid=1074
[22/Mar/2007:11:18:28 -0500] - process_bulk_import_op: NULL backend
[22/Mar/2007:11:18:28 -0500] NSMMReplicationPlugin - conn=1074 op=-1 repl="o=isp": Released replica
[22/Mar/2007:11:18:29 -0500] NSMMReplicationPlugin - conn=1077 op=3 repl="o=isp": Begin incremental protocol
[22/Mar/2007:11:18:30 -0500] NSMMReplicationPlugin - conn=1077 op=3 repl="o=isp": Acquired replica
[22/Mar/2007:11:18:30 -0500] NSMMReplicationPlugin - conn=1077 op=3 repl="o=isp": StartNSDS50ReplicationRequest: response=0 rc=0
Noriko Hosoi
2007-Mar-22 16:58 UTC
Re: [Fedora-directory-users] MMR broken, reinitialization erases db
The message is displayed when the "connection is destroyed"... Could there be any error messages on the other side? Do you see something related in the errors and/or access logs?

/* connection was destroyed while we were still storing the extension --
 * this is bad news and means we have a bulk import that needs to be
 * aborted! */
LDAPDebug(LDAP_DEBUG_ANY, "ERROR bulk import abandoned\n", 0, 0, 0);
Chris St. Pierre
2007-Mar-22 18:03 UTC
Re: [Fedora-directory-users] MMR broken, reinitialization erases db
On Thu, 22 Mar 2007, Noriko Hosoi wrote:
> The message is displayed when the "connection is destroyed"... Could there be
> any error messages on the other side? Do you see something related in the
> errors and/or access logs?

Here's what I get on the supplier:

[22/Mar/2007:12:58:11 -0500] NSMMReplicationPlugin - Beginning total update of replica "agmt="cn="Replication to groucho.nebrwesleyan.edu (o=isp)"" (groucho:389)".
[22/Mar/2007:12:58:26 -0500] NSMMReplicationPlugin - agmt="cn="Replication to groucho.nebrwesleyan.edu (o=isp)"" (groucho:389): Failed to send extended operation: LDAP error 81 (Can't contact LDAP server)
[22/Mar/2007:12:58:27 -0500] NSMMReplicationPlugin - agmt="cn="Replication to groucho.nebrwesleyan.edu (o=isp)"" (groucho:389): Received error 89: NULL for totalupdate operation
[22/Mar/2007:12:58:27 -0500] NSMMReplicationPlugin - agmt="cn="Replication to groucho.nebrwesleyan.edu (o=isp)"" (groucho:389): Received error 89: NULL for totalupdate operation
[22/Mar/2007:12:58:27 -0500] NSMMReplicationPlugin - agmt="cn="Replication to groucho.nebrwesleyan.edu (o=isp)"" (groucho:389): Received error 89: NULL for totalupdate operation
[22/Mar/2007:12:58:27 -0500] NSMMReplicationPlugin - agmt="cn="Replication to groucho.nebrwesleyan.edu (o=isp)"" (groucho:389): Received error 89: NULL for totalupdate operation
[22/Mar/2007:12:58:27 -0500] NSMMReplicationPlugin - agmt="cn="Replication to groucho.nebrwesleyan.edu (o=isp)"" (groucho:389): Received error 89: NULL for totalupdate operation
[22/Mar/2007:12:58:27 -0500] NSMMReplicationPlugin - agmt="cn="Replication to groucho.nebrwesleyan.edu (o=isp)"" (groucho:389): Received error 89: NULL for totalupdate operation
[22/Mar/2007:12:58:27 -0500] NSMMReplicationPlugin - agmt="cn="Replication to groucho.nebrwesleyan.edu (o=isp)"" (groucho:389): Received error 89: NULL for totalupdate operation
[22/Mar/2007:12:58:27 -0500] NSMMReplicationPlugin - agmt="cn="Replication to groucho.nebrwesleyan.edu (o=isp)"" (groucho:389): Received error 89: NULL for totalupdate operation
[22/Mar/2007:12:58:27 -0500] NSMMReplicationPlugin - agmt="cn="Replication to groucho.nebrwesleyan.edu (o=isp)"" (groucho:389): Received error 89: NULL for totalupdate operation
[22/Mar/2007:12:58:27 -0500] NSMMReplicationPlugin - agmt="cn="Replication to groucho.nebrwesleyan.edu (o=isp)"" (groucho:389): Received error 89: NULL for totalupdate operation
[22/Mar/2007:12:58:27 -0500] NSMMReplicationPlugin - agmt="cn="Replication to groucho.nebrwesleyan.edu (o=isp)"" (groucho:389): Received error 89: NULL for totalupdate operation
[22/Mar/2007:12:58:27 -0500] NSMMReplicationPlugin - agmt="cn="Replication to groucho.nebrwesleyan.edu (o=isp)"" (groucho:389): Received error 89: NULL for totalupdate operation
[22/Mar/2007:12:58:27 -0500] NSMMReplicationPlugin - agmt="cn="Replication to groucho.nebrwesleyan.edu (o=isp)"" (groucho:389): Received error 89: NULL for totalupdate operation
[22/Mar/2007:12:58:27 -0500] NSMMReplicationPlugin - agmt="cn="Replication to groucho.nebrwesleyan.edu (o=isp)"" (groucho:389): Received error 89: NULL for totalupdate operation
[22/Mar/2007:12:58:27 -0500] NSMMReplicationPlugin - agmt="cn="Replication to groucho.nebrwesleyan.edu (o=isp)"" (groucho:389): Warning: unable to send endReplication extended operation (Bad parameter to an ldap routine)
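The numeric codes in the supplier log are LDAP client-side result codes: 81 is LDAP_SERVER_DOWN ("Can't contact LDAP server") and 89 is LDAP_PARAM_ERROR ("Bad parameter to an ldap routine"). As a rough aid for reading long logs, the sketch below tallies the codes found on standard input. The function names are made up for illustration, and only the two codes seen in this thread are mapped.

```shell
#!/bin/sh
# Map the two client-side LDAP result codes seen in this thread to their names.
ldap_result_name() {
  case "$1" in
    81) echo "LDAP_SERVER_DOWN" ;;
    89) echo "LDAP_PARAM_ERROR" ;;
    *)  echo "unmapped" ;;
  esac
}

# Read an errors log on stdin and print "count code name" for each
# "error NN" code that appears, in ascending code order.
summarize_ldap_errors() {
  sed -n 's/.*[Ee]rror \([0-9][0-9]*\)[ :].*/\1/p' | sort -n | uniq -c |
  while read -r count code; do
    echo "$count $code $(ldap_result_name "$code")"
  done
}

# Usage: summarize_ldap_errors < /path/to/errors
```

In the log above, the single error 81 comes first and every error 89 follows it, so it seems reasonable to read the 89s as fallout from the dropped connection and not as a separate problem.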
Chris St. Pierre
2007-Mar-22 21:15 UTC
Re: [Fedora-directory-users] MMR broken, reinitialization erases db
With the help of a couple folks on IRC (thanks richm, uffe!), here's what I figured out I can do:

In order to get two machines doing MMR again, I first got rid of any MMR agreements between them, and then shut them both down. Then I chose one and exported the LDAP database with:

/opt/fedora-ds/slapd-instance/db2ldif -n userRoot

I copied the LDIF file to the other node. Then I imported it on both:

/opt/fedora-ds/slapd-instance/ldif2db -n userRoot -i /opt/fedora-ds/slapd-instance/ldif/2007_03_22_141131.ldif

Then I went into the changelogdb/ folder and blew away all of the __db.*, *.db4, and log.* files. At this point, I started Fedora DS on both nodes again. I then used mmr.pl to re-initialize the MMR agreement between the two of them, and all was well.

I've now got MMR working again between three nodes; the fourth will get added back in late tonight.

This may be more cautious than is necessary, but it's working. I still have no clue what caused this initially, but I don't really care (unless it happens again).

Thanks for everyone's help!
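The recovery procedure Chris describes (stop both servers, export on one node, import on both, clear the changelog, restart) can be sketched as a dry-run script that only prints the commands for review. The db2ldif and ldif2db paths come from his message. The stop-slapd and start-slapd script names and the changelogdb location under the instance directory are assumptions, and the function names are made up for illustration.

```shell
#!/bin/sh
# Dry-run sketch: prints the commands for review instead of running them.
INSTANCE_DIR=/opt/fedora-ds/slapd-instance   # instance path used in the message
BACKEND=userRoot                             # backend name used in the message
CHANGELOG_DIR="$INSTANCE_DIR/changelogdb"    # ASSUMPTION: location of changelogdb/

# Both nodes: stop the server first (ASSUMED script name).
stop_plan() {
  printf '%s\n' "$INSTANCE_DIR/stop-slapd"
}

# Source node only: export the backend to LDIF.
export_plan() {
  printf '%s\n' "$INSTANCE_DIR/db2ldif -n $BACKEND"
}

# Both nodes, after copying the LDIF over: import it, then clear the changelog.
import_plan() {
  ldif_file="$1"
  printf '%s\n' \
    "$INSTANCE_DIR/ldif2db -n $BACKEND -i $ldif_file" \
    "rm -f $CHANGELOG_DIR/__db.* $CHANGELOG_DIR/*.db4 $CHANGELOG_DIR/log.*"
}

# Both nodes: start the server again (ASSUMED script name).
start_plan() {
  printf '%s\n' "$INSTANCE_DIR/start-slapd"
}

stop_plan
export_plan
import_plan "$INSTANCE_DIR/ldif/2007_03_22_141131.ldif"
start_plan
```

Removing the replication agreements beforehand and re-creating them with mmr.pl afterwards are left out on purpose, since the message does not give the exact invocations.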
Chris St. Pierre
2007-Mar-23 15:02 UTC
Re: [Fedora-directory-users] MMR broken, reinitialization erases db
On Thu, 22 Mar 2007, Chris St. Pierre wrote:
> I still have no clue what caused this initially, but I don't really
> care (unless it happens again).

Predictably, it happened again. In fact, it happened as soon as I made the MMR cluster live again and operations started coming in. Here's the error message:

[23/Mar/2007:09:59:02 -0500] - csngen_adjust_time: adjustment limit exceeded; value - 86401, limit - 86400
[23/Mar/2007:09:59:03 -0500] NSMMReplicationPlugin - conn=12 op=21 replica="o=isp": Unable to acquire replica: error: excessive clock skew

The 'value' in the first line is always 86401. All four of our nodes have the same time; all use NTP against the same NTP server. All were patched for DST and subsequently rebooted.

What the heck? Any ideas?
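The limit in that message, 86400, is one day in seconds. One way to look for a timestamp that is too far ahead is to decode the CSNs a replica has seen: the first 8 hex digits of a CSN are seconds since the epoch. The sketch below does that arithmetic only and is not the server's own check. The function names and the sample CSN are hypothetical, and reading CSNs from the replica's nsds50ruv values is an assumption about where to find them.

```shell
#!/bin/sh
SKEW_LIMIT=86400   # one day in seconds, the "limit" value in the log line

# Print the epoch-seconds timestamp held in the first 8 hex digits of a CSN.
csn_epoch() {
  hex=$(printf '%s' "$1" | cut -c1-8)
  echo $((0x$hex))
}

# Print how many seconds a CSN is ahead of a reference time (epoch seconds).
csn_skew() {
  echo $(( $(csn_epoch "$1") - $2 ))
}

# Succeed (exit status 0) when the CSN is ahead by more than SKEW_LIMIT.
csn_exceeds_limit() {
  [ "$(csn_skew "$1" "$2")" -gt "$SKEW_LIMIT" ]
}

# Usage with a hypothetical CSN and the local clock:
#   csn_skew 4603e8d6000000010000 "$(date +%s)"
```

If a CSN turns out to be more than a day ahead while the system clocks agree, the skew would be in the replication state and not in NTP, which would fit the symptoms described here; that reading is a guess, not a confirmed diagnosis.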