thr3ads.net - Gluster users - [Gluster-users] 4.1.x geo-replication "changelogs could not be processed completely" issue [Sep 2018]

Kotte, Christian (Ext)

2018-Sep-11 10:49 UTC

[Gluster-users] 4.1.x geo-replication "changelogs could not be processed completely" issue

Hi all,

I use glusterfs 4.1.3 non-root user geo-replication in a cascading setup. The
gsyncd.log on the master is fine, but I have some strange changelog warnings and
errors on the interimmaster:

gsyncd.log
...
[2018-09-11 10:38:35.575464] I [master(worker /bricks/brick1/brick):1460:crawl] _GMaster: slave's time    stime=(1536662250, 0)
[2018-09-11 10:38:37.126749] I [master(worker /bricks/brick1/brick):1944:syncjob] Syncer: Sync Time Taken    duration=1.4698    num_files=1    job=1    return_code=23
[2018-09-11 10:38:37.128668] W [master(worker /bricks/brick1/brick):1346:process] _GMaster: incomplete sync, retrying changelogs    files=['CHANGELOG.1536662311']
[2018-09-11 10:38:39.353209] I [master(worker /bricks/brick1/brick):1944:syncjob] Syncer: Sync Time Taken    duration=1.4057    num_files=1    job=2    return_code=23
[2018-09-11 10:38:39.354737] W [master(worker /bricks/brick1/brick):1346:process] _GMaster: incomplete sync, retrying changelogs    files=['CHANGELOG.1536662311']
[2018-09-11 10:38:41.501187] I [master(worker /bricks/brick1/brick):1944:syncjob] Syncer: Sync Time Taken    duration=1.4781    num_files=1    job=3    return_code=23
[2018-09-11 10:38:41.503048] W [master(worker /bricks/brick1/brick):1346:process] _GMaster: incomplete sync, retrying changelogs    files=['CHANGELOG.1536662311']
[2018-09-11 10:38:43.575047] I [master(worker /bricks/brick1/brick):1944:syncjob] Syncer: Sync Time Taken    duration=1.4385    num_files=1    job=1    return_code=23
[2018-09-11 10:38:43.576597] W [master(worker /bricks/brick1/brick):1346:process] _GMaster: incomplete sync, retrying changelogs    files=['CHANGELOG.1536662311']
[2018-09-11 10:38:45.838089] I [master(worker /bricks/brick1/brick):1944:syncjob] Syncer: Sync Time Taken    duration=1.4765    num_files=1    job=2    return_code=23
[2018-09-11 10:38:45.840205] W [master(worker /bricks/brick1/brick):1346:process] _GMaster: incomplete sync, retrying changelogs    files=['CHANGELOG.1536662311']
[2018-09-11 10:38:47.969033] I [master(worker /bricks/brick1/brick):1944:syncjob] Syncer: Sync Time Taken    duration=1.4602    num_files=1    job=3    return_code=23
[2018-09-11 10:38:47.970118] W [master(worker /bricks/brick1/brick):1346:process] _GMaster: incomplete sync, retrying changelogs    files=['CHANGELOG.1536662311']
[2018-09-11 10:38:50.54420] I [master(worker /bricks/brick1/brick):1944:syncjob] Syncer: Sync Time Taken    duration=1.4717    num_files=1    job=1    return_code=23
[2018-09-11 10:38:50.56072] W [master(worker /bricks/brick1/brick):1346:process] _GMaster: incomplete sync, retrying changelogs    files=['CHANGELOG.1536662311']
[2018-09-11 10:38:52.317955] I [master(worker /bricks/brick1/brick):1944:syncjob] Syncer: Sync Time Taken    duration=1.4711    num_files=1    job=2    return_code=23
[2018-09-11 10:38:52.319642] W [master(worker /bricks/brick1/brick):1346:process] _GMaster: incomplete sync, retrying changelogs    files=['CHANGELOG.1536662311']
[2018-09-11 10:38:54.448926] I [master(worker /bricks/brick1/brick):1944:syncjob] Syncer: Sync Time Taken    duration=1.4715    num_files=1    job=3    return_code=23
[2018-09-11 10:38:54.451127] W [master(worker /bricks/brick1/brick):1346:process] _GMaster: incomplete sync, retrying changelogs    files=['CHANGELOG.1536662311']
[2018-09-11 10:38:56.538007] I [master(worker /bricks/brick1/brick):1944:syncjob] Syncer: Sync Time Taken    duration=1.4759    num_files=1    job=1    return_code=23
[2018-09-11 10:38:56.538914] E [master(worker /bricks/brick1/brick):1325:process] _GMaster: changelogs could not be processed completely - moving on...    files=['CHANGELOG.1536662311']
[2018-09-11 10:38:56.544816] I [master(worker /bricks/brick1/brick):1374:process] _GMaster: Entry Time Taken    MKD=0    MKN=0    LIN=0    SYM=0    REN=0    RMD=0    CRE=0    duration=0.0000    UNL=0
[2018-09-11 10:38:56.545031] I [master(worker /bricks/brick1/brick):1384:process] _GMaster: Data/Metadata Time Taken    SETA=0    SETX=0    meta_duration=0.0000    data_duration=1536662336.5450    DATA=0    XATT=0
[2018-09-11 10:38:56.545356] I [master(worker /bricks/brick1/brick):1394:process] _GMaster: Batch Completed    changelog_end=1536662311    entry_stime=None    changelog_start=1536662311    stime=(1536662310, 0)    duration=20.9674    num_changelogs=1    mode=live_changelog
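
For readers triaging a similar log: every sync job above exits with return_code=23, which the rsync man page lists as "Partial transfer due to error", and the same changelog is retried until the worker gives up. The following is only a rough sketch of how such a log could be summarized. The function name summarize and the regular expressions are illustrative assumptions based on the excerpt above; they are not part of gsyncd.

```python
import re
from collections import Counter

# Subset of rsync exit values, as listed in the rsync man page.
RSYNC_EXIT = {
    0: "success",
    23: "partial transfer due to error",
    24: "partial transfer due to vanished source files",
}

# A few entries copied from the log above, still wrapped as in the mail.
SAMPLE = """\
[2018-09-11 10:38:54.448926] I [master(worker
/bricks/brick1/brick):1944:syncjob] Syncer: Sync Time Taken
duration=1.4715 num_files=1     job=3   return_code=23
[2018-09-11 10:38:54.451127] W [master(worker
/bricks/brick1/brick):1346:process] _GMaster: incomplete sync, retrying
changelogs        files=['CHANGELOG.1536662311']
[2018-09-11 10:38:56.538007] I [master(worker
/bricks/brick1/brick):1944:syncjob] Syncer: Sync Time Taken
duration=1.4759 num_files=1     job=1   return_code=23
[2018-09-11 10:38:56.538914] E [master(worker
/bricks/brick1/brick):1325:process] _GMaster: changelogs could not be processed
completely - moving on... files=['CHANGELOG.1536662311']
"""


def summarize(log_text):
    """Return (rsync return codes, retry counts per changelog, skipped changelogs)."""
    # Collapse all whitespace so that wrapped and unwrapped logs parse alike.
    text = " ".join(log_text.split())
    codes = Counter(int(c) for c in re.findall(r"return_code=(\d+)", text))
    retries = Counter(
        re.findall(r"incomplete sync, retrying changelogs files=\['([^']+)'\]", text)
    )
    skipped = re.findall(
        r"could not be processed completely - moving on\.\.\. files=\['([^']+)'\]",
        text,
    )
    return codes, retries, skipped


if __name__ == "__main__":
    codes, retries, skipped = summarize(SAMPLE)
    for code, count in codes.items():
        print(code, RSYNC_EXIT.get(code, "see rsync man page"), "x", count)
    print("retried:", dict(retries))
    print("skipped:", skipped)
```

A changelog that shows up in the skipped list is one the worker moved past without a clean sync, so the files it names are worth checking on the slave.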

I had those issues in the past with 4.1.2 as well. I could fix it only by
deleting the geo-replication and the gluster volume and re-creating everything.

If I delete the geo-replication and delete the changelogs directory or the
CHANGELOG files, I get this error:

gsyncd.log
...
[2018-09-11 10:26:44.928277] E [repce(agent /bricks/brick1/brick):105:worker] <top>: call failed:
Traceback (most recent call last):
  File "/usr/libexec/glusterfs/python/syncdaemon/repce.py", line 101, in worker
    res = getattr(self.obj, rmeth)(*in_data[2:])
  File "/usr/libexec/glusterfs/python/syncdaemon/changelogagent.py", line 53, in history
    num_parallel)
  File "/usr/libexec/glusterfs/python/syncdaemon/libgfchangelog.py", line 101, in cl_history_changelog
    cls.raise_changelog_err()
  File "/usr/libexec/glusterfs/python/syncdaemon/libgfchangelog.py", line 28, in raise_changelog_err
    raise ChangelogException(errn, os.strerror(errn))
ChangelogException: [Errno 61] No data available
...

Or

Traceback (most recent call last):
  File "/usr/libexec/glusterfs/python/syncdaemon/repce.py", line 101, in worker
    res = getattr(self.obj, rmeth)(*in_data[2:])
  File "/usr/libexec/glusterfs/python/syncdaemon/changelogagent.py", line 53, in history
    num_parallel)
  File "/usr/libexec/glusterfs/python/syncdaemon/libgfchangelog.py", line 101, in cl_history_changelog
    cls.raise_changelog_err()
  File "/usr/libexec/glusterfs/python/syncdaemon/libgfchangelog.py", line 28, in raise_changelog_err
    raise ChangelogException(errn, os.strerror(errn))
ChangelogException: [Errno 2] No such file or directory
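
As a reading aid for the two tracebacks: the final frame builds the exception from a raw errno value and os.strerror(errn), so the message is the operating system's text for that number. The sketch below reproduces the formatting with a stand-in class. That ChangelogException derives from OSError is an assumption here, and the helper names are made up for illustration. On Linux, 61 is ENODATA and 2 is ENOENT; other platforms number errors differently, and nothing in the tracebacks alone says which lookup failed.

```python
import errno
import os


class ChangelogException(OSError):
    """Stand-in for the syncdaemon exception (assumed to derive from OSError)."""


def changelog_error(errn):
    # Same constructor arguments as the raise statement in the traceback.
    return ChangelogException(errn, os.strerror(errn))


def symbolic_name(errn):
    # errno.errorcode maps numbers to names for the platform running this code.
    return errno.errorcode.get(errn, "unknown")


if __name__ == "__main__":
    for errn in (61, 2):
        print(symbolic_name(errn), "->", changelog_error(errn))
```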

I read somewhere that if I delete the geo-replication with "reset-sync-time",
the changelogs are cleared, but this doesn't happen.

How can I reset the changelog without deleting all data?

Regards,

Christian Kotte

Kotresh Hiremath Ravishankar

2018-Sep-12 05:31 UTC


Answers inline.

On Tue, Sep 11, 2018 at 4:19 PM, Kotte, Christian (Ext) <
christian.kotte at novartis.com> wrote:
> Hi all,
>
>
>
> I use glusterfs 4.1.3 non-root user geo-replication in a cascading setup.
> The gsyncd.log on the master is fine, but I have some strange changelog
> warnings and errors on the interimmaster:
>
>
>
> gsyncd.log
>
> ...
>
> [2018-09-11 10:38:35.575464] I [master(worker
> /bricks/brick1/brick):1460:crawl] _GMaster: slave's time
> stime=(1536662250, 0)
>
> [2018-09-11 10:38:37.126749] I [master(worker
> /bricks/brick1/brick):1944:syncjob] Syncer: Sync Time Taken
> duration=1.4698 num_files=1     job=1   return_code=23
>
> [2018-09-11 10:38:37.128668] W [master(worker
> /bricks/brick1/brick):1346:process] _GMaster: incomplete sync, retrying
> changelogs        files=['CHANGELOG.1536662311']
>
> [2018-09-11 10:38:39.353209] I [master(worker
> /bricks/brick1/brick):1944:syncjob] Syncer: Sync Time Taken
> duration=1.4057 num_files=1     job=2   return_code=23
>
> [2018-09-11 10:38:39.354737] W [master(worker
> /bricks/brick1/brick):1346:process] _GMaster: incomplete sync, retrying
> changelogs        files=['CHANGELOG.1536662311']
>
> [2018-09-11 10:38:41.501187] I [master(worker
> /bricks/brick1/brick):1944:syncjob] Syncer: Sync Time Taken
> duration=1.4781 num_files=1     job=3   return_code=23
>
> [2018-09-11 10:38:41.503048] W [master(worker
> /bricks/brick1/brick):1346:process] _GMaster: incomplete sync, retrying
> changelogs        files=['CHANGELOG.1536662311']
>
> [2018-09-11 10:38:43.575047] I [master(worker
> /bricks/brick1/brick):1944:syncjob] Syncer: Sync Time Taken
> duration=1.4385 num_files=1     job=1   return_code=23
>
> [2018-09-11 10:38:43.576597] W [master(worker
> /bricks/brick1/brick):1346:process] _GMaster: incomplete sync, retrying
> changelogs        files=['CHANGELOG.1536662311']
>
> [2018-09-11 10:38:45.838089] I [master(worker
> /bricks/brick1/brick):1944:syncjob] Syncer: Sync Time Taken
> duration=1.4765 num_files=1     job=2   return_code=23
>
> [2018-09-11 10:38:45.840205] W [master(worker
> /bricks/brick1/brick):1346:process] _GMaster: incomplete sync, retrying
> changelogs        files=['CHANGELOG.1536662311']
>
> [2018-09-11 10:38:47.969033] I [master(worker
> /bricks/brick1/brick):1944:syncjob] Syncer: Sync Time Taken
> duration=1.4602 num_files=1     job=3   return_code=23
>
> [2018-09-11 10:38:47.970118] W [master(worker
> /bricks/brick1/brick):1346:process] _GMaster: incomplete sync, retrying
> changelogs        files=['CHANGELOG.1536662311']
>
> [2018-09-11 10:38:50.54420] I [master(worker
> /bricks/brick1/brick):1944:syncjob] Syncer: Sync Time Taken
> duration=1.4717 num_files=1     job=1   return_code=23
>
> [2018-09-11 10:38:50.56072] W [master(worker
> /bricks/brick1/brick):1346:process] _GMaster: incomplete sync, retrying
> changelogs        files=['CHANGELOG.1536662311']
>
> [2018-09-11 10:38:52.317955] I [master(worker
> /bricks/brick1/brick):1944:syncjob] Syncer: Sync Time Taken
> duration=1.4711 num_files=1     job=2   return_code=23
>
> [2018-09-11 10:38:52.319642] W [master(worker
> /bricks/brick1/brick):1346:process] _GMaster: incomplete sync, retrying
> changelogs        files=['CHANGELOG.1536662311']
>
> [2018-09-11 10:38:54.448926] I [master(worker
> /bricks/brick1/brick):1944:syncjob] Syncer: Sync Time Taken
> duration=1.4715 num_files=1     job=3   return_code=23
>
> [2018-09-11 10:38:54.451127] W [master(worker
> /bricks/brick1/brick):1346:process] _GMaster: incomplete sync, retrying
> changelogs        files=['CHANGELOG.1536662311']
>
> [2018-09-11 10:38:56.538007] I [master(worker
> /bricks/brick1/brick):1944:syncjob] Syncer: Sync Time Taken
> duration=1.4759 num_files=1     job=1   return_code=23
>
> [2018-09-11 10:38:56.538914] E [master(worker
> /bricks/brick1/brick):1325:process] _GMaster: changelogs could not be
> processed completely - moving on... files=['CHANGELOG.1536662311']
>
> [2018-09-11 10:38:56.544816] I [master(worker
> /bricks/brick1/brick):1374:process] _GMaster: Entry Time Taken    MKD=0
> MKN=0   LIN=0   SYM=0   REN=0   RMD=0   CRE=0   duration=0.0000 UNL=0
>
> [2018-09-11 10:38:56.545031] I [master(worker
> /bricks/brick1/brick):1384:process] _GMaster: Data/Metadata Time Taken
> SETA=0  SETX=0  meta_duration=0.0000    data_duration=1536662336.5450
> DATA=0  XATT=0
>
> [2018-09-11 10:38:56.545356] I [master(worker
> /bricks/brick1/brick):1394:process] _GMaster: Batch Completed
> changelog_end=1536662311        entry_stime=None
> changelog_start=1536662311      stime=(1536662310, 0)
> duration=20.9674        num_changelogs=1        mode=live_changelog
>

This looks like a bug; please file a bug report. As a workaround for now, add
the following line to the end of the configuration file on every master node,
using any editor. Once it has been added on all master nodes, stop and start
geo-rep.

rsync-options = --ignore-missing-args

configuration file:
/var/lib/glusterd/geo-replication/<mastervol>_<slave_node>_<slavevol>/gsyncd.conf
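
For anyone applying the workaround on several nodes, the manual edit can also be scripted. This is a minimal sketch, assuming the session's gsyncd.conf is plain text that accepts the extra line at the end, as described above. The function name ensure_workaround is hypothetical, and the demo writes to a temporary file with made-up content rather than to a real session directory. The --ignore-missing-args option needs rsync 3.1.0 or later.

```python
import os
import tempfile
from pathlib import Path

# The line suggested in the reply above.
WORKAROUND = "rsync-options = --ignore-missing-args"


def ensure_workaround(conf_path):
    """Append WORKAROUND unless an rsync-options line already exists.

    Returns True if the file was modified, False otherwise.
    """
    path = Path(conf_path)
    text = path.read_text()
    for line in text.splitlines():
        if line.strip().startswith("rsync-options"):
            # Leave an existing setting alone so a human can review it.
            return False
    # Make sure the new line starts on a line of its own.
    separator = "" if (not text or text.endswith("\n")) else "\n"
    path.write_text(text + separator + WORKAROUND + "\n")
    return True


if __name__ == "__main__":
    # Demo on a throwaway file; the content below is invented for the example.
    with tempfile.TemporaryDirectory() as tmp:
        demo = os.path.join(tmp, "gsyncd.conf")
        Path(demo).write_text("example-key = example-value\n")
        print(ensure_workaround(demo))  # first call appends the line
        print(ensure_workaround(demo))  # second call finds it and does nothing
```

The helper is idempotent, so it can be re-run safely on each master node. Geo-rep still has to be stopped and started afterwards, as the reply says.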






>
> I had those issues in the past with 4.1.2 as well. I could fix it only by
> deleting the geo-replication and the gluster volume and re-creating
> everything.
>
>
>
> If I delete the geo-replication and delete the changelogs directory or the
> CHANGELOG files, I get this error:
>
>
>
> gsyncd.log
>
> ...
>
> [2018-09-11 10:26:44.928277] E [repce(agent /bricks/brick1/brick):105:worker] <top>: call failed:
> Traceback (most recent call last):
>   File "/usr/libexec/glusterfs/python/syncdaemon/repce.py", line 101, in worker
>     res = getattr(self.obj, rmeth)(*in_data[2:])
>   File "/usr/libexec/glusterfs/python/syncdaemon/changelogagent.py", line 53, in history
>     num_parallel)
>   File "/usr/libexec/glusterfs/python/syncdaemon/libgfchangelog.py", line 101, in cl_history_changelog
>     cls.raise_changelog_err()
>   File "/usr/libexec/glusterfs/python/syncdaemon/libgfchangelog.py", line 28, in raise_changelog_err
>     raise ChangelogException(errn, os.strerror(errn))
> ChangelogException: [Errno 61] No data available
>
> ...
>
>
>
> Or
>
>
>
> Traceback (most recent call last):
>   File "/usr/libexec/glusterfs/python/syncdaemon/repce.py", line 101, in worker
>     res = getattr(self.obj, rmeth)(*in_data[2:])
>   File "/usr/libexec/glusterfs/python/syncdaemon/changelogagent.py", line 53, in history
>     num_parallel)
>   File "/usr/libexec/glusterfs/python/syncdaemon/libgfchangelog.py", line 101, in cl_history_changelog
>     cls.raise_changelog_err()
>   File "/usr/libexec/glusterfs/python/syncdaemon/libgfchangelog.py", line 28, in raise_changelog_err
>     raise ChangelogException(errn, os.strerror(errn))
> ChangelogException: [Errno 2] No such file or directory
>

Please share the changelog log file so that this can be debugged. On the same
node where you got this traceback, in the same location, share the following
log file: "changes-bricks-brick1-brick.log"

>
> I read somewhere that if I delete the geo-replication with
> "reset-sync-time", the changelogs are cleared, but this doesn't happen.
>

The changelogs are not cleared, but the new geo-rep session does not use the
old changelogs for syncing.

>
>
> How can I reset the changelog without deleting all data?
>
>

I didn't clearly understand what the requirement is here. Could you
elaborate?

> Regards,
>
>
>
> Christian Kotte
>
>


-- 
Thanks and Regards,
Kotresh H R
