Kotte, Christian (Ext)
2018-Sep-11 10:49 UTC
[Gluster-users] 4.1.x geo-replication "changelogs could not be processed completely" issue
Hi all,

I use glusterfs 4.1.3 non-root user geo-replication in a cascading setup. The gsyncd.log on the master is fine, but I have some strange changelog warnings and errors on the interimmaster:

gsyncd.log
...
[2018-09-11 10:38:35.575464] I [master(worker /bricks/brick1/brick):1460:crawl] _GMaster: slave's time stime=(1536662250, 0)
[2018-09-11 10:38:37.126749] I [master(worker /bricks/brick1/brick):1944:syncjob] Syncer: Sync Time Taken duration=1.4698 num_files=1 job=1 return_code=23
[2018-09-11 10:38:37.128668] W [master(worker /bricks/brick1/brick):1346:process] _GMaster: incomplete sync, retrying changelogs files=['CHANGELOG.1536662311']
[2018-09-11 10:38:39.353209] I [master(worker /bricks/brick1/brick):1944:syncjob] Syncer: Sync Time Taken duration=1.4057 num_files=1 job=2 return_code=23
[2018-09-11 10:38:39.354737] W [master(worker /bricks/brick1/brick):1346:process] _GMaster: incomplete sync, retrying changelogs files=['CHANGELOG.1536662311']
[2018-09-11 10:38:41.501187] I [master(worker /bricks/brick1/brick):1944:syncjob] Syncer: Sync Time Taken duration=1.4781 num_files=1 job=3 return_code=23
[2018-09-11 10:38:41.503048] W [master(worker /bricks/brick1/brick):1346:process] _GMaster: incomplete sync, retrying changelogs files=['CHANGELOG.1536662311']
[2018-09-11 10:38:43.575047] I [master(worker /bricks/brick1/brick):1944:syncjob] Syncer: Sync Time Taken duration=1.4385 num_files=1 job=1 return_code=23
[2018-09-11 10:38:43.576597] W [master(worker /bricks/brick1/brick):1346:process] _GMaster: incomplete sync, retrying changelogs files=['CHANGELOG.1536662311']
[2018-09-11 10:38:45.838089] I [master(worker /bricks/brick1/brick):1944:syncjob] Syncer: Sync Time Taken duration=1.4765 num_files=1 job=2 return_code=23
[2018-09-11 10:38:45.840205] W [master(worker /bricks/brick1/brick):1346:process] _GMaster: incomplete sync, retrying changelogs files=['CHANGELOG.1536662311']
[2018-09-11 10:38:47.969033] I [master(worker /bricks/brick1/brick):1944:syncjob] Syncer: Sync Time Taken duration=1.4602 num_files=1 job=3 return_code=23
[2018-09-11 10:38:47.970118] W [master(worker /bricks/brick1/brick):1346:process] _GMaster: incomplete sync, retrying changelogs files=['CHANGELOG.1536662311']
[2018-09-11 10:38:50.54420] I [master(worker /bricks/brick1/brick):1944:syncjob] Syncer: Sync Time Taken duration=1.4717 num_files=1 job=1 return_code=23
[2018-09-11 10:38:50.56072] W [master(worker /bricks/brick1/brick):1346:process] _GMaster: incomplete sync, retrying changelogs files=['CHANGELOG.1536662311']
[2018-09-11 10:38:52.317955] I [master(worker /bricks/brick1/brick):1944:syncjob] Syncer: Sync Time Taken duration=1.4711 num_files=1 job=2 return_code=23
[2018-09-11 10:38:52.319642] W [master(worker /bricks/brick1/brick):1346:process] _GMaster: incomplete sync, retrying changelogs files=['CHANGELOG.1536662311']
[2018-09-11 10:38:54.448926] I [master(worker /bricks/brick1/brick):1944:syncjob] Syncer: Sync Time Taken duration=1.4715 num_files=1 job=3 return_code=23
[2018-09-11 10:38:54.451127] W [master(worker /bricks/brick1/brick):1346:process] _GMaster: incomplete sync, retrying changelogs files=['CHANGELOG.1536662311']
[2018-09-11 10:38:56.538007] I [master(worker /bricks/brick1/brick):1944:syncjob] Syncer: Sync Time Taken duration=1.4759 num_files=1 job=1 return_code=23
[2018-09-11 10:38:56.538914] E [master(worker /bricks/brick1/brick):1325:process] _GMaster: changelogs could not be processed completely - moving on... files=['CHANGELOG.1536662311']
[2018-09-11 10:38:56.544816] I [master(worker /bricks/brick1/brick):1374:process] _GMaster: Entry Time Taken MKD=0 MKN=0 LIN=0 SYM=0 REN=0 RMD=0 CRE=0 duration=0.0000 UNL=0
[2018-09-11 10:38:56.545031] I [master(worker /bricks/brick1/brick):1384:process] _GMaster: Data/Metadata Time Taken SETA=0 SETX=0 meta_duration=0.0000 data_duration=1536662336.5450 DATA=0 XATT=0
[2018-09-11 10:38:56.545356] I [master(worker /bricks/brick1/brick):1394:process] _GMaster: Batch Completed changelog_end=1536662311 entry_stime=None changelog_start=1536662311 stime=(1536662310, 0) duration=20.9674 num_changelogs=1 mode=live_changelog

I had those issues in the past with 4.1.2 as well. I could fix them only by deleting the geo-replication and the gluster volume and re-creating everything.

If I delete the geo-replication and delete the changelogs directory or the CHANGELOG files, I get this error:

gsyncd.log
...
[2018-09-11 10:26:44.928277] E [repce(agent /bricks/brick1/brick):105:worker] <top>: call failed:
Traceback (most recent call last):
  File "/usr/libexec/glusterfs/python/syncdaemon/repce.py", line 101, in worker
    res = getattr(self.obj, rmeth)(*in_data[2:])
  File "/usr/libexec/glusterfs/python/syncdaemon/changelogagent.py", line 53, in history
    num_parallel)
  File "/usr/libexec/glusterfs/python/syncdaemon/libgfchangelog.py", line 101, in cl_history_changelog
    cls.raise_changelog_err()
  File "/usr/libexec/glusterfs/python/syncdaemon/libgfchangelog.py", line 28, in raise_changelog_err
    raise ChangelogException(errn, os.strerror(errn))
ChangelogException: [Errno 61] No data available
...

Or

Traceback (most recent call last):
  File "/usr/libexec/glusterfs/python/syncdaemon/repce.py", line 101, in worker
    res = getattr(self.obj, rmeth)(*in_data[2:])
  File "/usr/libexec/glusterfs/python/syncdaemon/changelogagent.py", line 53, in history
    num_parallel)
  File "/usr/libexec/glusterfs/python/syncdaemon/libgfchangelog.py", line 101, in cl_history_changelog
    cls.raise_changelog_err()
  File "/usr/libexec/glusterfs/python/syncdaemon/libgfchangelog.py", line 28, in raise_changelog_err
    raise ChangelogException(errn, os.strerror(errn))
ChangelogException: [Errno 2] No such file or directory

I read somewhere that if I delete the geo-replication with "reset-sync-time", the changelogs are cleared, but this doesn't happen.

How can I reset the changelog without deleting all data?

Regards,

Christian Kotte
Kotresh Hiremath Ravishankar
2018-Sep-12 05:31 UTC
[Gluster-users] 4.1.x geo-replication "changelogs could not be processed completely" issue
Answer inline.

On Tue, Sep 11, 2018 at 4:19 PM, Kotte, Christian (Ext) <christian.kotte at novartis.com> wrote:

> Hi all,
>
> I use glusterfs 4.1.3 non-root user geo-replication in a cascading setup. The gsyncd.log on the master is fine, but I have some strange changelog warnings and errors on the interimmaster:
>
> gsyncd.log
> ...
> [2018-09-11 10:38:35.575464] I [master(worker /bricks/brick1/brick):1460:crawl] _GMaster: slave's time stime=(1536662250, 0)
> [2018-09-11 10:38:37.126749] I [master(worker /bricks/brick1/brick):1944:syncjob] Syncer: Sync Time Taken duration=1.4698 num_files=1 job=1 return_code=23
> [2018-09-11 10:38:37.128668] W [master(worker /bricks/brick1/brick):1346:process] _GMaster: incomplete sync, retrying changelogs files=['CHANGELOG.1536662311']
> [2018-09-11 10:38:39.353209] I [master(worker /bricks/brick1/brick):1944:syncjob] Syncer: Sync Time Taken duration=1.4057 num_files=1 job=2 return_code=23
> [2018-09-11 10:38:39.354737] W [master(worker /bricks/brick1/brick):1346:process] _GMaster: incomplete sync, retrying changelogs files=['CHANGELOG.1536662311']
> [2018-09-11 10:38:41.501187] I [master(worker /bricks/brick1/brick):1944:syncjob] Syncer: Sync Time Taken duration=1.4781 num_files=1 job=3 return_code=23
> [2018-09-11 10:38:41.503048] W [master(worker /bricks/brick1/brick):1346:process] _GMaster: incomplete sync, retrying changelogs files=['CHANGELOG.1536662311']
> [2018-09-11 10:38:43.575047] I [master(worker /bricks/brick1/brick):1944:syncjob] Syncer: Sync Time Taken duration=1.4385 num_files=1 job=1 return_code=23
> [2018-09-11 10:38:43.576597] W [master(worker /bricks/brick1/brick):1346:process] _GMaster: incomplete sync, retrying changelogs files=['CHANGELOG.1536662311']
> [2018-09-11 10:38:45.838089] I [master(worker /bricks/brick1/brick):1944:syncjob] Syncer: Sync Time Taken duration=1.4765 num_files=1 job=2 return_code=23
> [2018-09-11 10:38:45.840205] W [master(worker /bricks/brick1/brick):1346:process] _GMaster: incomplete sync, retrying changelogs files=['CHANGELOG.1536662311']
> [2018-09-11 10:38:47.969033] I [master(worker /bricks/brick1/brick):1944:syncjob] Syncer: Sync Time Taken duration=1.4602 num_files=1 job=3 return_code=23
> [2018-09-11 10:38:47.970118] W [master(worker /bricks/brick1/brick):1346:process] _GMaster: incomplete sync, retrying changelogs files=['CHANGELOG.1536662311']
> [2018-09-11 10:38:50.54420] I [master(worker /bricks/brick1/brick):1944:syncjob] Syncer: Sync Time Taken duration=1.4717 num_files=1 job=1 return_code=23
> [2018-09-11 10:38:50.56072] W [master(worker /bricks/brick1/brick):1346:process] _GMaster: incomplete sync, retrying changelogs files=['CHANGELOG.1536662311']
> [2018-09-11 10:38:52.317955] I [master(worker /bricks/brick1/brick):1944:syncjob] Syncer: Sync Time Taken duration=1.4711 num_files=1 job=2 return_code=23
> [2018-09-11 10:38:52.319642] W [master(worker /bricks/brick1/brick):1346:process] _GMaster: incomplete sync, retrying changelogs files=['CHANGELOG.1536662311']
> [2018-09-11 10:38:54.448926] I [master(worker /bricks/brick1/brick):1944:syncjob] Syncer: Sync Time Taken duration=1.4715 num_files=1 job=3 return_code=23
> [2018-09-11 10:38:54.451127] W [master(worker /bricks/brick1/brick):1346:process] _GMaster: incomplete sync, retrying changelogs files=['CHANGELOG.1536662311']
> [2018-09-11 10:38:56.538007] I [master(worker /bricks/brick1/brick):1944:syncjob] Syncer: Sync Time Taken duration=1.4759 num_files=1 job=1 return_code=23
> [2018-09-11 10:38:56.538914] E [master(worker /bricks/brick1/brick):1325:process] _GMaster: changelogs could not be processed completely - moving on... files=['CHANGELOG.1536662311']
> [2018-09-11 10:38:56.544816] I [master(worker /bricks/brick1/brick):1374:process] _GMaster: Entry Time Taken MKD=0 MKN=0 LIN=0 SYM=0 REN=0 RMD=0 CRE=0 duration=0.0000 UNL=0
> [2018-09-11 10:38:56.545031] I [master(worker /bricks/brick1/brick):1384:process] _GMaster: Data/Metadata Time Taken SETA=0 SETX=0 meta_duration=0.0000 data_duration=1536662336.5450 DATA=0 XATT=0
> [2018-09-11 10:38:56.545356] I [master(worker /bricks/brick1/brick):1394:process] _GMaster: Batch Completed changelog_end=1536662311 entry_stime=None changelog_start=1536662311 stime=(1536662310, 0) duration=20.9674 num_changelogs=1 mode=live_changelog

This looks like a bug; please file a bug report. For now, as a workaround, add the following line at the end of the geo-replication configuration file on every master node, using any editor. After adding it on all master nodes, stop and start geo-rep.

rsync-options = --ignore-missing-args

Configuration file: /var/lib/glusterd/geo-replication/<mastervol>_<slave_node>_<slavevol>/gsyncd.conf

> I had those issues in the past with 4.1.2 as well. I could fix it only by deleting the geo-replication and the gluster volume and re-create everything.
>
> If I delete the geo-replication and delete the changelogs directory or the CHANGELOG files, I get this error:
>
> gsyncd.log
> ...
> [2018-09-11 10:26:44.928277] E [repce(agent /bricks/brick1/brick):105:worker] <top>: call failed:
> Traceback (most recent call last):
>   File "/usr/libexec/glusterfs/python/syncdaemon/repce.py", line 101, in worker
>     res = getattr(self.obj, rmeth)(*in_data[2:])
>   File "/usr/libexec/glusterfs/python/syncdaemon/changelogagent.py", line 53, in history
>     num_parallel)
>   File "/usr/libexec/glusterfs/python/syncdaemon/libgfchangelog.py", line 101, in cl_history_changelog
>     cls.raise_changelog_err()
>   File "/usr/libexec/glusterfs/python/syncdaemon/libgfchangelog.py", line 28, in raise_changelog_err
>     raise ChangelogException(errn, os.strerror(errn))
> ChangelogException: [Errno 61] No data available
> ...
>
> Or
>
> Traceback (most recent call last):
>   File "/usr/libexec/glusterfs/python/syncdaemon/repce.py", line 101, in worker
>     res = getattr(self.obj, rmeth)(*in_data[2:])
>   File "/usr/libexec/glusterfs/python/syncdaemon/changelogagent.py", line 53, in history
>     num_parallel)
>   File "/usr/libexec/glusterfs/python/syncdaemon/libgfchangelog.py", line 101, in cl_history_changelog
>     cls.raise_changelog_err()
>   File "/usr/libexec/glusterfs/python/syncdaemon/libgfchangelog.py", line 28, in raise_changelog_err
>     raise ChangelogException(errn, os.strerror(errn))
> ChangelogException: [Errno 2] No such file or directory

Please share the changelog log file so we can debug this. On the same node where you got this traceback, in the same location, share the following log file: "changes-bricks-brick1-brick.log"

> I read somewhere that if I delete the geo-replication with "reset-sync-time", the changelogs are cleared, but this doesn't happen.

The changelogs are not cleared, but in the new geo-rep session the old changelogs are not used for syncing.

> How can I reset the changelog without deleting all data?

I didn't clearly understand what the requirement is here. Could you elaborate?

> Regards,
>
> Christian Kotte

--
Thanks and Regards,
Kotresh H R
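
Editorial note: the workaround in this reply is a manual edit that has to be repeated on every master node. The sketch below shows one way the same append could be scripted so that running it twice does not duplicate the line. It is an illustration under assumptions, not an official Gluster tool: the helper name ensure_line is hypothetical, the caller supplies the real gsyncd.conf path for the session, and it only appends at the end of the file as the reply describes.

```python
from pathlib import Path

# The exact line given in the reply above.
OPTION_LINE = "rsync-options = --ignore-missing-args"


def ensure_line(conf_path, line=OPTION_LINE):
    """Append `line` to the file at `conf_path` unless it is already there.

    Returns True if the file was modified, False if the line already existed.
    """
    path = Path(conf_path)
    text = path.read_text()
    # Compare whole lines (ignoring surrounding whitespace) to stay idempotent.
    if any(existing.strip() == line for existing in text.splitlines()):
        return False
    # Make sure the new option starts on its own line.
    if text and not text.endswith("\n"):
        text += "\n"
    path.write_text(text + line + "\n")
    return True


if __name__ == "__main__":
    import sys

    # Usage: python ensure_line.py /path/to/gsyncd.conf
    changed = ensure_line(sys.argv[1])
    print("line appended" if changed else "line already present")
```

Taking a backup copy of the configuration file before running anything like this is prudent, and the geo-replication session still has to be stopped and started afterwards, as the reply instructs.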