Kotresh Hiremath Ravishankar
2019-Jun-04 11:57 UTC
[Gluster-users] Geo Replication stops replicating
Could you please try adding /usr/sbin to $PATH for the user 'sas'? If the login shell is bash, add 'export PATH=/usr/sbin:$PATH' to /home/sas/.bashrc.

On Tue, Jun 4, 2019 at 5:24 PM deepu srinivasan <sdeepugd at gmail.com> wrote:

> Hi Kotresh
> Please find the logs for the above error.
>
> *Master log snippet*
>
>> [2019-06-04 11:52:09.254731] I [resource(worker /home/sas/gluster/data/code-misc):1379:connect_remote] SSH: Initializing SSH connection between master and slave...
>> [2019-06-04 11:52:09.308923] D [repce(worker /home/sas/gluster/data/code-misc):196:push] RepceClient: call 89724:139652759443264:1559649129.31 __repce_version__() ...
>> [2019-06-04 11:52:09.602792] E [syncdutils(worker /home/sas/gluster/data/code-misc):311:log_raise_exception] <top>: connection to peer is broken
>> [2019-06-04 11:52:09.603312] E [syncdutils(worker /home/sas/gluster/data/code-misc):805:errlog] Popen: command returned error cmd=ssh -oPasswordAuthentication=no -oStrictHostKeyChecking=no -i /var/lib/glusterd/geo-replication/secret.pem -p 22 -oControlMaster=auto -S /tmp/gsyncd-aux-ssh-4aL2tc/d893f66e0addc32f7d0080bb503f5185.sock sas at 192.168.185.107 /usr/libexec/glusterfs/gsyncd slave code-misc sas@192.168.185.107::code-misc --master-node 192.168.185.106 --master-node-id 851b64d0-d885-4ae9-9b38-ab5b15db0fec --master-brick /home/sas/gluster/data/code-misc --local-node 192.168.185.122 --local-node-id bcaa7af6-c3a1-4411-8e99-4ebecb32eb6a --slave-timeout 120 --slave-log-level DEBUG --slave-gluster-log-level INFO --slave-gluster-command-dir /usr/sbin error=1
>> [2019-06-04 11:52:09.614996] I [repce(agent /home/sas/gluster/data/code-misc):97:service_loop] RepceServer: terminating on reaching EOF.
>> [2019-06-04 11:52:09.615545] D [monitor(monitor):271:monitor] Monitor: worker(/home/sas/gluster/data/code-misc) connected
>> [2019-06-04 11:52:09.616528] I [monitor(monitor):278:monitor] Monitor: worker died in startup phase brick=/home/sas/gluster/data/code-misc
>> [2019-06-04 11:52:09.619391] I [gsyncdstatus(monitor):248:set_worker_status] GeorepStatus: Worker Status Change status=Faulty
>
> *Slave log snippet*
>
>> [2019-06-04 11:50:09.782668] E [syncdutils(slave 192.168.185.106/home/sas/gluster/data/code-misc):809:logerr] Popen: /usr/sbin/gluster> 2 : failed with this errno (No such file or directory)
>> [2019-06-04 11:50:11.188167] W [gsyncd(slave 192.168.185.125/home/sas/gluster/data/code-misc):305:main] <top>: Session config file not exists, using the default config path=/var/lib/glusterd/geo-replication/code-misc_192.168.185.107_code-misc/gsyncd.conf
>> [2019-06-04 11:50:11.201070] I [resource(slave 192.168.185.125/home/sas/gluster/data/code-misc):1098:connect] GLUSTER: Mounting gluster volume locally...
>> [2019-06-04 11:50:11.271231] E [resource(slave 192.168.185.125/home/sas/gluster/data/code-misc):1006:handle_mounter] MountbrokerMounter: glusterd answered mnt
>> [2019-06-04 11:50:11.271998] E [syncdutils(slave 192.168.185.125/home/sas/gluster/data/code-misc):805:errlog] Popen: command returned error cmd=/usr/sbin/gluster --remote-host=localhost system:: mount sas user-map-root=sas aux-gfid-mount acl log-level=INFO log-file=/var/log/glusterfs/geo-replication-slaves/code-misc_192.168.185.107_code-misc/mnt-192.168.185.125-home-sas-gluster-data-code-misc.log volfile-server=localhost volfile-id=code-misc client-pid=-1 error=1
>> [2019-06-04 11:50:11.272113] E [syncdutils(slave 192.168.185.125/home/sas/gluster/data/code-misc):809:logerr] Popen: /usr/sbin/gluster> 2 : failed with this errno (No such file or directory)
>
> On Tue, Jun 4, 2019 at 5:10 PM deepu srinivasan <sdeepugd at gmail.com> wrote:
>
>> Hi
>> As discussed, I have upgraded gluster from version 4.1 to 6.2, but geo-replication fails to start and stays in the Faulty state.
>>
>> On Fri, May 31, 2019, 5:32 PM deepu srinivasan <sdeepugd at gmail.com> wrote:
>>
>>> Checked the data. It remains at 2708. No progress.
>>>
>>> On Fri, May 31, 2019 at 4:36 PM Kotresh Hiremath Ravishankar <khiremat at redhat.com> wrote:
>>>
>>>> That means it could be working, and the defunct process might be some old zombie one. Could you check whether the data is progressing?
>>>>
>>>> On Fri, May 31, 2019 at 4:29 PM deepu srinivasan <sdeepugd at gmail.com> wrote:
>>>>
>>>>> Hi
>>>>> When I change the rsync option, the rsync process doesn't seem to start; only a defunct process is listed in ps aux. Only when I set the rsync option to " " and restart all the processes does the rsync process appear in ps aux.
>>>>>
>>>>> On Fri, May 31, 2019 at 4:23 PM Kotresh Hiremath Ravishankar <khiremat at redhat.com> wrote:
>>>>>
>>>>>> Yes, the rsync config option should have fixed this issue.
>>>>>>
>>>>>> Could you share the output of the following?
>>>>>>
>>>>>> 1. gluster volume geo-replication <MASTERVOL> <SLAVEHOST>::<SLAVEVOL> config rsync-options
>>>>>> 2. ps -ef | grep rsync
>>>>>>
>>>>>> On Fri, May 31, 2019 at 4:11 PM deepu srinivasan <sdeepugd at gmail.com> wrote:
>>>>>>
>>>>>>> Done.
>>>>>>> We got the following result:
>>>>>>>
>>>>>>>> 1559298781.338234 write(2, "rsync: link_stat \"/tmp/gsyncd-aux-mount-EEJ_sY/.gfid/3fa6aed8-802e-4efe-9903-8bc171176d88\" failed: No such file or directory (2)", 128
>>>>>>>
>>>>>>> Seems like a file is missing?
>>>>>>>
>>>>>>> On Fri, May 31, 2019 at 3:25 PM Kotresh Hiremath Ravishankar <khiremat at redhat.com> wrote:
>>>>>>>
>>>>>>>> Hi,
>>>>>>>>
>>>>>>>> Could you take the strace with a larger string size? The argument strings are truncated.
>>>>>>>>
>>>>>>>> strace -s 500 -ttt -T -p <rsync pid>
>>>>>>>>
>>>>>>>> On Fri, May 31, 2019 at 3:17 PM deepu srinivasan <sdeepugd at gmail.com> wrote:
>>>>>>>>
>>>>>>>>> Hi Kotresh
>>>>>>>>> The above-mentioned workaround did not work properly.
>>>>>>>>>
>>>>>>>>> On Fri, May 31, 2019 at 3:16 PM deepu srinivasan <sdeepugd at gmail.com> wrote:
>>>>>>>>>
>>>>>>>>>> Hi Kotresh
>>>>>>>>>> We have tried the above-mentioned rsync option and we are planning to upgrade to version 6.0.
>>>>>>>>>>
>>>>>>>>>> On Fri, May 31, 2019 at 11:04 AM Kotresh Hiremath Ravishankar <khiremat at redhat.com> wrote:
>>>>>>>>>>
>>>>>>>>>>> Hi,
>>>>>>>>>>>
>>>>>>>>>>> This looks like a hang because the stderr buffer filled up with error messages and nothing was reading it. I think this issue is fixed in the latest releases. As a workaround, you can do the following and check whether it works.
>>>>>>>>>>>
>>>>>>>>>>> Prerequisite:
>>>>>>>>>>> rsync version should be > 3.1.0
>>>>>>>>>>>
>>>>>>>>>>> Workaround:
>>>>>>>>>>> gluster volume geo-replication <MASTERVOL> <SLAVEHOST>::<SLAVEVOL> config rsync-options "--ignore-missing-args"
>>>>>>>>>>>
>>>>>>>>>>> Thanks,
>>>>>>>>>>> Kotresh HR
>>>>>>>>>>>
>>>>>>>>>>> On Thu, May 30, 2019 at 5:39 PM deepu srinivasan <sdeepugd at gmail.com> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> Hi
>>>>>>>>>>>> We were evaluating Gluster geo-replication between two DCs, one in US west and one in US east, and ran multiple trials with different file sizes.
>>>>>>>>>>>> Geo-replication tends to stop replicating, but when we check the status it appears to be Active, while the slave volume does not grow in size.
>>>>>>>>>>>> So we restarted the geo-replication session and checked the status: it was Active and stayed in History Crawl for a long time. We enabled DEBUG logging and checked for errors.
>>>>>>>>>>>> Around 2000 files appeared as syncing candidates. The rsync process starts, but nothing is synced to the slave volume. Every time, the rsync process shows up in the "ps auxxx" list, yet replication does not happen on the slave end. What could cause this problem, and is there any way to debug it?
>>>>>>>>>>>>
>>>>>>>>>>>> We have also checked the strace of the rsync program; it displays something like this:
>>>>>>>>>>>>
>>>>>>>>>>>> "write(2, "rsync: link_stat \"/tmp/gsyncd-au"..., 128"
>>>>>>>>>>>>
>>>>>>>>>>>> We are using the specs below:
>>>>>>>>>>>>
>>>>>>>>>>>> Gluster version - 4.1.7
>>>>>>>>>>>> Sync mode - rsync
>>>>>>>>>>>> Volume - 1x3 on each end (master and slave)
>>>>>>>>>>>> Intranet Bandwidth - 10 Gig

--
Thanks and Regards,
Kotresh H R
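For reference, a minimal sketch of how the $PATH suggestion above can be verified: gsyncd runs its remote commands over non-interactive SSH, and a stock .bashrc often returns early for non-interactive shells, so an export placed below that guard never takes effect. Assuming the session user 'sas' and the slave 192.168.185.107 from the logs above:

    # From the master: should print /usr/sbin/gluster and a PATH that
    # contains /usr/sbin; an empty result means the export in ~/.bashrc
    # is not reaching non-interactive sessions.
    ssh sas@192.168.185.107 'command -v gluster; echo $PATH'

    # If it is empty, place the export at the very top of
    # /home/sas/.bashrc, before any interactivity guard such as
    # '[ -z "$PS1" ] && return':
    export PATH=/usr/sbin:$PATH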
Have already added the path in .bashrc. Still in the Faulty state.

On Tue, Jun 4, 2019, 5:27 PM Kotresh Hiremath Ravishankar <khiremat at redhat.com> wrote:

> Could you please try adding /usr/sbin to $PATH for the user 'sas'? If the login shell is bash, add 'export PATH=/usr/sbin:$PATH' to /home/sas/.bashrc.
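If the shell-profile route keeps the session Faulty, the gluster binary location can also be pinned in the geo-replication session config itself, which avoids depending on shell startup files entirely. A sketch using the volume and slave host from this thread; slave-gluster-command-dir matches the flag visible in the gsyncd invocation in the master log above, while gluster-command-dir (the master-side counterpart) is assumed to be the analogous option:

    # Point the slave-side and master-side gluster lookups at /usr/sbin.
    gluster volume geo-replication code-misc sas@192.168.185.107::code-misc \
        config slave-gluster-command-dir /usr/sbin
    gluster volume geo-replication code-misc sas@192.168.185.107::code-misc \
        config gluster-command-dir /usr/sbin

    # Restart the session so the workers pick up the new config.
    gluster volume geo-replication code-misc sas@192.168.185.107::code-misc stop
    gluster volume geo-replication code-misc sas@192.168.185.107::code-misc start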
Kotresh Hiremath Ravishankar
2019-Jun-04 17:49 UTC
[Gluster-users] Geo Replication stops replicating
Ccing Sunny, who was investigating a similar issue.

On Tue, Jun 4, 2019 at 5:46 PM deepu srinivasan <sdeepugd at gmail.com> wrote:

> Have already added the path in .bashrc. Still in the Faulty state.
>
> On Tue, Jun 4, 2019, 5:27 PM Kotresh Hiremath Ravishankar <khiremat at redhat.com> wrote:
>
>> Could you please try adding /usr/sbin to $PATH for the user 'sas'? If the login shell is bash, add 'export PATH=/usr/sbin:$PATH' to /home/sas/.bashrc.

--
Thanks and Regards,
Kotresh H R
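The slave log earlier in this thread also shows the mountbroker mount itself failing for the unprivileged user ("MountbrokerMounter: glusterd answered mnt"), so the non-root geo-replication setup on the slave is worth inspecting as well. A sketch, assuming the glusterfs geo-replication tooling is installed on the slave node; the mount root and group names below are illustrative, not taken from this thread:

    # On the slave node, as root: show the current mountbroker
    # configuration and flag inconsistencies.
    gluster-mountbroker status

    # A typical non-root setup, for comparison (example values):
    gluster-mountbroker setup /var/mountbroker-root geogroup
    gluster-mountbroker add code-misc sas

    # glusterd must be restarted on the slave for changes to apply.
    systemctl restart glusterd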