Hi Kotresh, Sunny
I found these logs on the slave machine:
> [2019-06-05 08:49:10.632583] I [MSGID: 106488]
> [glusterd-handler.c:1559:__glusterd_handle_cli_get_volume] 0-management:
> Received get vol req
>
> The message "I [MSGID: 106488]
> [glusterd-handler.c:1559:__glusterd_handle_cli_get_volume] 0-management:
> Received get vol req" repeated 2 times between [2019-06-05 08:49:10.632583]
> and [2019-06-05 08:49:10.670863]
>
> The message "I [MSGID: 106496]
> [glusterd-handler.c:3187:__glusterd_handle_mount] 0-glusterd: Received
> mount req" repeated 34 times between [2019-06-05 08:48:41.005398] and
> [2019-06-05 08:50:37.254063]
>
> The message "E [MSGID: 106061]
> [glusterd-mountbroker.c:555:glusterd_do_mount] 0-management: 'option
> mountbroker-root' missing in glusterd vol file" repeated 34 times between
> [2019-06-05 08:48:41.005434] and [2019-06-05 08:50:37.254079]
>
> The message "W [MSGID: 106176]
> [glusterd-mountbroker.c:719:glusterd_do_mount] 0-management: unsuccessful
> mount request [No such file or directory]" repeated 34 times between
> [2019-06-05 08:48:41.005444] and [2019-06-05 08:50:37.254080]
>
> [2019-06-05 08:50:46.361347] I [MSGID: 106496]
> [glusterd-handler.c:3187:__glusterd_handle_mount] 0-glusterd: Received
> mount req
>
> [2019-06-05 08:50:46.361384] E [MSGID: 106061]
> [glusterd-mountbroker.c:555:glusterd_do_mount] 0-management: 'option
> mountbroker-root' missing in glusterd vol file
>
> [2019-06-05 08:50:46.361419] W [MSGID: 106176]
> [glusterd-mountbroker.c:719:glusterd_do_mount] 0-management: unsuccessful
> mount request [No such file or directory]
>
> The message "I [MSGID: 106496]
> [glusterd-handler.c:3187:__glusterd_handle_mount] 0-glusterd: Received
> mount req" repeated 33 times between [2019-06-05 08:50:46.361347] and
> [2019-06-05 08:52:34.019741]
>
> The message "E [MSGID: 106061]
> [glusterd-mountbroker.c:555:glusterd_do_mount] 0-management: 'option
> mountbroker-root' missing in glusterd vol file" repeated 33 times between
> [2019-06-05 08:50:46.361384] and [2019-06-05 08:52:34.019757]
>
> The message "W [MSGID: 106176]
> [glusterd-mountbroker.c:719:glusterd_do_mount] 0-management: unsuccessful
> mount request [No such file or directory]" repeated 33 times between
> [2019-06-05 08:50:46.361419] and [2019-06-05 08:52:34.019758]
>
> [2019-06-05 08:52:44.426839] I [MSGID: 106496]
> [glusterd-handler.c:3187:__glusterd_handle_mount] 0-glusterd: Received
> mount req
>
> [2019-06-05 08:52:44.426886] E [MSGID: 106061]
> [glusterd-mountbroker.c:555:glusterd_do_mount] 0-management: 'option
> mountbroker-root' missing in glusterd vol file
>
> [2019-06-05 08:52:44.426896] W [MSGID: 106176]
> [glusterd-mountbroker.c:719:glusterd_do_mount] 0-management: unsuccessful
> mount request [No such file or directory]
>
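Looking at the repeated "'option mountbroker-root' missing in glusterd vol file" error, I suspect the slave-side /etc/glusterfs/glusterd.vol never got the mountbroker settings for the non-root user. Going by the geo-replication docs, I believe the management volume definition on each slave node would need something like the sketch below; the mountbroker root path and the group name are my assumptions, only the user 'sas' and the volume 'code-misc' come from our setup:

    # /etc/glusterfs/glusterd.vol on each slave node (sketch; path and group assumed)
    volume management
        type mgmt/glusterd
        option working-directory /var/lib/glusterd
        # ... existing options ...
        option mountbroker-root /var/mountbroker-root
        option mountbroker-geo-replication.sas code-misc
        option geo-replication-log-group sas
        option rpc-auth-allow-insecure on
    end-volume

followed by a glusterd restart on the slave nodes. Is that the right direction, or should we rerun the gluster-mountbroker setup steps instead?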
On Wed, Jun 5, 2019 at 1:06 AM deepu srinivasan <sdeepugd at gmail.com>
wrote:
> Thank you, Kotresh.
>
> On Tue, Jun 4, 2019, 11:20 PM Kotresh Hiremath Ravishankar <
> khiremat at redhat.com> wrote:
>
>> Ccing Sunny, who was investigating a similar issue.
>>
>> On Tue, Jun 4, 2019 at 5:46 PM deepu srinivasan <sdeepugd at gmail.com>
>> wrote:
>>
>>> I have already added the path in .bashrc. It is still in the Faulty state.
>>>
>>> On Tue, Jun 4, 2019, 5:27 PM Kotresh Hiremath Ravishankar <
>>> khiremat at redhat.com> wrote:
>>>
>>>> Could you please try adding /usr/sbin to $PATH for user 'sas'? If it's
>>>> bash, add 'export PATH=/usr/sbin:$PATH' in /home/sas/.bashrc
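>>>>
>>>> As a quick sanity check (just a suggestion; the slave host below is a
>>>> placeholder), something like this, run from the master, should show whether
>>>> the non-interactive session for 'sas' picks up /usr/sbin and can find the
>>>> gluster binary:
>>>>
>>>>     ssh sas@<slave-host> 'echo $PATH; which gluster'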
>>>>
>>>> On Tue, Jun 4, 2019 at 5:24 PM deepu srinivasan <sdeepugd at gmail.com>
>>>> wrote:
>>>>
>>>>> Hi Kotresh,
>>>>> Please find the logs for the above error below.
>>>>> *Master log snippet*
>>>>>
>>>>>> [2019-06-04 11:52:09.254731] I [resource(worker /home/sas/gluster/data/code-misc):1379:connect_remote] SSH: Initializing SSH connection between master and slave...
>>>>>> [2019-06-04 11:52:09.308923] D [repce(worker /home/sas/gluster/data/code-misc):196:push] RepceClient: call 89724:139652759443264:1559649129.31 __repce_version__() ...
>>>>>> [2019-06-04 11:52:09.602792] E [syncdutils(worker /home/sas/gluster/data/code-misc):311:log_raise_exception] <top>: connection to peer is broken
>>>>>> [2019-06-04 11:52:09.603312] E [syncdutils(worker /home/sas/gluster/data/code-misc):805:errlog] Popen: command returned error cmd=ssh -oPasswordAuthentication=no -oStrictHostKeyChecking=no -i /var/lib/glusterd/geo-replication/secret.pem -p 22 -oControlMaster=auto -S /tmp/gsyncd-aux-ssh-4aL2tc/d893f66e0addc32f7d0080bb503f5185.sock sas@192.168.185.107 /usr/libexec/glusterfs/gsyncd slave code-misc sas@192.168.185.107::code-misc --master-node 192.168.185.106 --master-node-id 851b64d0-d885-4ae9-9b38-ab5b15db0fec --master-brick /home/sas/gluster/data/code-misc --local-node 192.168.185.122 --local-node-id bcaa7af6-c3a1-4411-8e99-4ebecb32eb6a --slave-timeout 120 --slave-log-level DEBUG --slave-gluster-log-level INFO --slave-gluster-command-dir /usr/sbin error=1
>>>>>> [2019-06-04 11:52:09.614996] I [repce(agent /home/sas/gluster/data/code-misc):97:service_loop] RepceServer: terminating on reaching EOF.
>>>>>> [2019-06-04 11:52:09.615545] D [monitor(monitor):271:monitor] Monitor: worker(/home/sas/gluster/data/code-misc) connected
>>>>>> [2019-06-04 11:52:09.616528] I [monitor(monitor):278:monitor] Monitor: worker died in startup phase brick=/home/sas/gluster/data/code-misc
>>>>>> [2019-06-04 11:52:09.619391] I [gsyncdstatus(monitor):248:set_worker_status] GeorepStatus: Worker Status Change status=Faulty
>>>>>>
>>>>>
>>>>> *Slave log snippet*
>>>>>
>>>>>> [2019-06-04 11:50:09.782668] E [syncdutils(slave 192.168.185.106/home/sas/gluster/data/code-misc):809:logerr] Popen: /usr/sbin/gluster> 2 : failed with this errno (No such file or directory)
>>>>>> [2019-06-04 11:50:11.188167] W [gsyncd(slave 192.168.185.125/home/sas/gluster/data/code-misc):305:main] <top>: Session config file not exists, using the default config path=/var/lib/glusterd/geo-replication/code-misc_192.168.185.107_code-misc/gsyncd.conf
>>>>>> [2019-06-04 11:50:11.201070] I [resource(slave 192.168.185.125/home/sas/gluster/data/code-misc):1098:connect] GLUSTER: Mounting gluster volume locally...
>>>>>> [2019-06-04 11:50:11.271231] E [resource(slave 192.168.185.125/home/sas/gluster/data/code-misc):1006:handle_mounter] MountbrokerMounter: glusterd answered mnt
>>>>>> [2019-06-04 11:50:11.271998] E [syncdutils(slave 192.168.185.125/home/sas/gluster/data/code-misc):805:errlog] Popen: command returned error cmd=/usr/sbin/gluster --remote-host=localhost system:: mount sas user-map-root=sas aux-gfid-mount acl log-level=INFO log-file=/var/log/glusterfs/geo-replication-slaves/code-misc_192.168.185.107_code-misc/mnt-192.168.185.125-home-sas-gluster-data-code-misc.log volfile-server=localhost volfile-id=code-misc client-pid=-1 error=1
>>>>>> [2019-06-04 11:50:11.272113] E [syncdutils(slave 192.168.185.125/home/sas/gluster/data/code-misc):809:logerr] Popen: /usr/sbin/gluster> 2 : failed with this errno (No such file or directory)
>>>>>
>>>>>
>>>>> On Tue, Jun 4, 2019 at 5:10 PM deepu srinivasan <sdeepugd at gmail.com>
>>>>> wrote:
>>>>>
>>>>>> Hi
>>>>>> As discussed, I have upgraded Gluster from 4.1 to 6.2, but geo-replication
>>>>>> fails to start and stays in the Faulty state.
>>>>>>
>>>>>> On Fri, May 31, 2019, 5:32 PM deepu srinivasan <sdeepugd at gmail.com>
>>>>>> wrote:
>>>>>>
>>>>>>> I checked the data. It remains at 2708; no progress.
>>>>>>>
>>>>>>> On Fri, May 31, 2019 at 4:36 PM Kotresh Hiremath Ravishankar <
>>>>>>> khiremat at redhat.com> wrote:
>>>>>>>
>>>>>>>> That means it could be working, and the defunct process might be
>>>>>>>> some old zombie. Could you check whether the data is progressing?
>>>>>>>>
>>>>>>>> On Fri, May 31, 2019 at 4:29 PM deepu srinivasan <
>>>>>>>> sdeepugd at gmail.com> wrote:
>>>>>>>>
>>>>>>>>> Hi
>>>>>>>>> When I change the rsync option, the rsync process doesn't seem to
>>>>>>>>> start; only a defunct process is listed in ps aux. Only when I set the
>>>>>>>>> rsync option to " " and restart all the processes does the rsync
>>>>>>>>> process show up in ps aux.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On Fri, May 31, 2019 at 4:23 PM Kotresh Hiremath Ravishankar <
>>>>>>>>> khiremat at redhat.com> wrote:
>>>>>>>>>
>>>>>>>>>> Yes, the rsync config option should have fixed this issue.
>>>>>>>>>>
>>>>>>>>>> Could you share the output of the following?
>>>>>>>>>>
>>>>>>>>>> 1. gluster volume geo-replication <MASTERVOL> <SLAVEHOST>::<SLAVEVOL> config rsync-options
>>>>>>>>>> 2. ps -ef | grep rsync
>>>>>>>>>>
>>>>>>>>>> On Fri, May 31, 2019 at 4:11 PM deepu srinivasan <
>>>>>>>>>> sdeepugd at gmail.com> wrote:
>>>>>>>>>>
>>>>>>>>>>> Done.
>>>>>>>>>>> We got the following result:
>>>>>>>>>>>
>>>>>>>>>>>> 1559298781.338234 write(2, "rsync: link_stat \"/tmp/gsyncd-aux-mount-EEJ_sY/.gfid/3fa6aed8-802e-4efe-9903-8bc171176d88\" failed: No such file or directory (2)", 128
>>>>>>>>>>>
>>>>>>>>>>> It seems like a file is missing?
>>>>>>>>>>>
>>>>>>>>>>> On Fri, May 31, 2019 at 3:25 PM Kotresh Hiremath Ravishankar <
>>>>>>>>>>> khiremat at redhat.com> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> Hi,
>>>>>>>>>>>>
>>>>>>>>>>>> Could you take the strace with a larger string size? The argument
>>>>>>>>>>>> strings are truncated.
>>>>>>>>>>>>
>>>>>>>>>>>> strace -s 500 -ttt -T -p <rsync pid>
>>>>>>>>>>>>
>>>>>>>>>>>> On Fri, May 31, 2019 at 3:17 PM deepu srinivasan <
>>>>>>>>>>>> sdeepugd at gmail.com> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> Hi Kotresh
>>>>>>>>>>>>> The above-mentioned workaround did not work properly.
>>>>>>>>>>>>>
>>>>>>>>>>>>> On Fri, May 31, 2019 at 3:16 PM deepu srinivasan <
>>>>>>>>>>>>> sdeepugd at gmail.com> wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>>> Hi Kotresh
>>>>>>>>>>>>>> We have tried the above-mentioned rsync option, and we are
>>>>>>>>>>>>>> planning to upgrade to version 6.0.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> On Fri, May 31, 2019 at 11:04 AM Kotresh Hiremath Ravishankar <
>>>>>>>>>>>>>> khiremat at redhat.com> wrote:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Hi,
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> This looks like a hang because the stderr buffer filled up with
>>>>>>>>>>>>>>> error messages and nothing was reading it.
>>>>>>>>>>>>>>> I think this issue is fixed in the latest releases. As a
>>>>>>>>>>>>>>> workaround, you can do the following and check whether it works.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Prerequisite:
>>>>>>>>>>>>>>> rsync version should be > 3.1.0
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Workaround:
>>>>>>>>>>>>>>> gluster volume geo-replication <MASTERVOL> <SLAVEHOST>::<SLAVEVOL> config rsync-options "--ignore-missing-args"
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>> Kotresh HR
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> On Thu, May 30, 2019 at 5:39 PM deepu srinivasan <
>>>>>>>>>>>>>>> sdeepugd at gmail.com> wrote:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Hi
>>>>>>>>>>>>>>>> We were evaluating Gluster geo-replication between two DCs,
>>>>>>>>>>>>>>>> one in US west and one in US east. We ran multiple trials with
>>>>>>>>>>>>>>>> different file sizes.
>>>>>>>>>>>>>>>> Geo-replication tends to stop replicating, but when checking
>>>>>>>>>>>>>>>> the status it appears to be in the Active state, yet the slave
>>>>>>>>>>>>>>>> volume does not increase in size.
>>>>>>>>>>>>>>>> So we restarted the geo-replication session and checked the
>>>>>>>>>>>>>>>> status. The status was Active and it stayed in History Crawl
>>>>>>>>>>>>>>>> for a long time. We enabled DEBUG logging and checked for any
>>>>>>>>>>>>>>>> errors.
>>>>>>>>>>>>>>>> There were around 2000 files that appeared as syncing
>>>>>>>>>>>>>>>> candidates. The rsync process starts, but the sync does not
>>>>>>>>>>>>>>>> happen on the slave volume. Every time, the rsync process
>>>>>>>>>>>>>>>> appears in the "ps auxxx" list, but the replication does not
>>>>>>>>>>>>>>>> happen on the slave end. What could be the cause of this
>>>>>>>>>>>>>>>> problem? Is there any way to debug it?
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> We have also checked the strace of the rsync program. It
>>>>>>>>>>>>>>>> displays something like this:
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> "write(2, "rsync: link_stat \"/tmp/gsyncd-au"..., 128"
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> We are using the below specs:
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Gluster version - 4.1.7
>>>>>>>>>>>>>>>> Sync mode - rsync
>>>>>>>>>>>>>>>> Volume - 1x3 on each end (master and slave)
>>>>>>>>>>>>>>>> Intranet bandwidth - 10 Gig
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> --
>>>>>>>>>>>>>>> Thanks and Regards,
>>>>>>>>>>>>>>> Kotresh H R
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> --
>>>>>>>>>>>> Thanks and Regards,
>>>>>>>>>>>> Kotresh H R
>>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> --
>>>>>>>>>> Thanks and Regards,
>>>>>>>>>> Kotresh H R
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>
>>>>>>>> --
>>>>>>>> Thanks and Regards,
>>>>>>>> Kotresh H R
>>>>>>>>
>>>>>>>
>>>>
>>>> --
>>>> Thanks and Regards,
>>>> Kotresh H R
>>>>
>>>
>>
>> --
>> Thanks and Regards,
>> Kotresh H R
>>
>