Kotresh Hiremath Ravishankar
2019-Jun-04 11:57 UTC
[Gluster-users] Geo Replication stops replicating
Could you please try adding /usr/sbin to $PATH for the user 'sas'? If the login shell is bash, add 'export PATH=/usr/sbin:$PATH' to /home/sas/.bashrc.

On Tue, Jun 4, 2019 at 5:24 PM deepu srinivasan <sdeepugd at gmail.com> wrote:

> Hi Kotresh
> Please find the logs for the above error.
>
> *Master log snippet*
>
>> [2019-06-04 11:52:09.254731] I [resource(worker /home/sas/gluster/data/code-misc):1379:connect_remote] SSH: Initializing SSH connection between master and slave...
>> [2019-06-04 11:52:09.308923] D [repce(worker /home/sas/gluster/data/code-misc):196:push] RepceClient: call 89724:139652759443264:1559649129.31 __repce_version__() ...
>> [2019-06-04 11:52:09.602792] E [syncdutils(worker /home/sas/gluster/data/code-misc):311:log_raise_exception] <top>: connection to peer is broken
>> [2019-06-04 11:52:09.603312] E [syncdutils(worker /home/sas/gluster/data/code-misc):805:errlog] Popen: command returned error cmd=ssh -oPasswordAuthentication=no -oStrictHostKeyChecking=no -i /var/lib/glusterd/geo-replication/secret.pem -p 22 -oControlMaster=auto -S /tmp/gsyncd-aux-ssh-4aL2tc/d893f66e0addc32f7d0080bb503f5185.sock sas at 192.168.185.107 /usr/libexec/glusterfs/gsyncd slave code-misc sas@192.168.185.107::code-misc --master-node 192.168.185.106 --master-node-id 851b64d0-d885-4ae9-9b38-ab5b15db0fec --master-brick /home/sas/gluster/data/code-misc --local-node 192.168.185.122 --local-node-id bcaa7af6-c3a1-4411-8e99-4ebecb32eb6a --slave-timeout 120 --slave-log-level DEBUG --slave-gluster-log-level INFO --slave-gluster-command-dir /usr/sbin error=1
>> [2019-06-04 11:52:09.614996] I [repce(agent /home/sas/gluster/data/code-misc):97:service_loop] RepceServer: terminating on reaching EOF.
>> [2019-06-04 11:52:09.615545] D [monitor(monitor):271:monitor] Monitor: worker(/home/sas/gluster/data/code-misc) connected
>> [2019-06-04 11:52:09.616528] I [monitor(monitor):278:monitor] Monitor: worker died in startup phase brick=/home/sas/gluster/data/code-misc
>> [2019-06-04 11:52:09.619391] I [gsyncdstatus(monitor):248:set_worker_status] GeorepStatus: Worker Status Change status=Faulty
>
> *Slave log snippet*
>
>> [2019-06-04 11:50:09.782668] E [syncdutils(slave 192.168.185.106/home/sas/gluster/data/code-misc):809:logerr] Popen: /usr/sbin/gluster> 2 : failed with this errno (No such file or directory)
>> [2019-06-04 11:50:11.188167] W [gsyncd(slave 192.168.185.125/home/sas/gluster/data/code-misc):305:main] <top>: Session config file not exists, using the default config path=/var/lib/glusterd/geo-replication/code-misc_192.168.185.107_code-misc/gsyncd.conf
>> [2019-06-04 11:50:11.201070] I [resource(slave 192.168.185.125/home/sas/gluster/data/code-misc):1098:connect] GLUSTER: Mounting gluster volume locally...
>> [2019-06-04 11:50:11.271231] E [resource(slave 192.168.185.125/home/sas/gluster/data/code-misc):1006:handle_mounter] MountbrokerMounter: glusterd answered mnt
>> [2019-06-04 11:50:11.271998] E [syncdutils(slave 192.168.185.125/home/sas/gluster/data/code-misc):805:errlog] Popen: command returned error cmd=/usr/sbin/gluster --remote-host=localhost system:: mount sas user-map-root=sas aux-gfid-mount acl log-level=INFO log-file=/var/log/glusterfs/geo-replication-slaves/code-misc_192.168.185.107_code-misc/mnt-192.168.185.125-home-sas-gluster-data-code-misc.log volfile-server=localhost volfile-id=code-misc client-pid=-1 error=1
>> [2019-06-04 11:50:11.272113] E [syncdutils(slave 192.168.185.125/home/sas/gluster/data/code-misc):809:logerr] Popen: /usr/sbin/gluster> 2 : failed with this errno (No such file or directory)
>
> On Tue, Jun 4, 2019 at 5:10 PM deepu srinivasan <sdeepugd at gmail.com> wrote:
>
>> Hi
>> As discussed, I have upgraded gluster from version 4.1 to 6.2, but geo-replication fails to start and stays in the Faulty state.
>>
>> On Fri, May 31, 2019, 5:32 PM deepu srinivasan <sdeepugd at gmail.com> wrote:
>>
>>> Checked the data. It remains at 2708. No progress.
>>>
>>> On Fri, May 31, 2019 at 4:36 PM Kotresh Hiremath Ravishankar <khiremat at redhat.com> wrote:
>>>
>>>> That means it could be working, and the defunct process might be some old zombie one. Could you check whether the data is progressing?
>>>>
>>>> On Fri, May 31, 2019 at 4:29 PM deepu srinivasan <sdeepugd at gmail.com> wrote:
>>>>
>>>>> Hi
>>>>> When I change the rsync option, the rsync process doesn't seem to start; only a defunct process is listed in ps aux. Only when I set the rsync option to " " and restart all the processes does the rsync process appear in ps aux.
>>>>>
>>>>> On Fri, May 31, 2019 at 4:23 PM Kotresh Hiremath Ravishankar <khiremat at redhat.com> wrote:
>>>>>
>>>>>> Yes, the rsync config option should have fixed this issue.
>>>>>>
>>>>>> Could you share the output of the following?
>>>>>>
>>>>>> 1. gluster volume geo-replication <MASTERVOL> <SLAVEHOST>::<SLAVEVOL> config rsync-options
>>>>>> 2. ps -ef | grep rsync
>>>>>>
>>>>>> On Fri, May 31, 2019 at 4:11 PM deepu srinivasan <sdeepugd at gmail.com> wrote:
>>>>>>
>>>>>>> Done.
>>>>>>> We got the following result:
>>>>>>>
>>>>>>>> 1559298781.338234 write(2, "rsync: link_stat \"/tmp/gsyncd-aux-mount-EEJ_sY/.gfid/3fa6aed8-802e-4efe-9903-8bc171176d88\" failed: No such file or directory (2)", 128
>>>>>>>
>>>>>>> Seems like a file is missing?
>>>>>>>
>>>>>>> On Fri, May 31, 2019 at 3:25 PM Kotresh Hiremath Ravishankar <khiremat at redhat.com> wrote:
>>>>>>>
>>>>>>>> Hi,
>>>>>>>>
>>>>>>>> Could you take the strace with a larger string size? The argument strings are truncated.
>>>>>>>>
>>>>>>>> strace -s 500 -ttt -T -p <rsync pid>
>>>>>>>>
>>>>>>>> On Fri, May 31, 2019 at 3:17 PM deepu srinivasan <sdeepugd at gmail.com> wrote:
>>>>>>>>
>>>>>>>>> Hi Kotresh
>>>>>>>>> The above-mentioned workaround did not work properly.
>>>>>>>>>
>>>>>>>>> On Fri, May 31, 2019 at 3:16 PM deepu srinivasan <sdeepugd at gmail.com> wrote:
>>>>>>>>>
>>>>>>>>>> Hi Kotresh
>>>>>>>>>> We have tried the above-mentioned rsync option and we are planning to upgrade to version 6.0.
>>>>>>>>>>
>>>>>>>>>> On Fri, May 31, 2019 at 11:04 AM Kotresh Hiremath Ravishankar <khiremat at redhat.com> wrote:
>>>>>>>>>>
>>>>>>>>>>> Hi,
>>>>>>>>>>>
>>>>>>>>>>> This looks like a hang because the stderr buffer filled up with error messages and nothing was reading it. I think this issue is fixed in the latest releases. As a workaround, you can do the following and check whether it works.
>>>>>>>>>>>
>>>>>>>>>>> Prerequisite:
>>>>>>>>>>> rsync version should be > 3.1.0
>>>>>>>>>>>
>>>>>>>>>>> Workaround:
>>>>>>>>>>> gluster volume geo-replication <MASTERVOL> <SLAVEHOST>::<SLAVEVOL> config rsync-options "--ignore-missing-args"
>>>>>>>>>>>
>>>>>>>>>>> Thanks,
>>>>>>>>>>> Kotresh HR
>>>>>>>>>>>
>>>>>>>>>>> On Thu, May 30, 2019 at 5:39 PM deepu srinivasan <sdeepugd at gmail.com> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> Hi
>>>>>>>>>>>> We were evaluating Gluster geo-replication between two DCs, one in US west and one in US east, and ran multiple trials with different file sizes.
>>>>>>>>>>>> Geo-replication tends to stop replicating, but when we check the status it appears to be Active, while the slave volume does not grow in size.
>>>>>>>>>>>> So we restarted the geo-replication session and checked the status: it was Active and stayed in History Crawl for a long time. We enabled DEBUG logging and checked for errors.
>>>>>>>>>>>> Around 2000 files appeared as syncing candidates. The rsync process starts, but nothing is synced to the slave volume. Every time, the rsync process shows up in the "ps auxxx" list, yet replication does not happen on the slave end. What could cause this problem, and is there any way to debug it?
>>>>>>>>>>>>
>>>>>>>>>>>> We have also checked the strace of the rsync program; it displays something like this:
>>>>>>>>>>>>
>>>>>>>>>>>> "write(2, "rsync: link_stat \"/tmp/gsyncd-au"..., 128"
>>>>>>>>>>>>
>>>>>>>>>>>> We are using the specs below:
>>>>>>>>>>>>
>>>>>>>>>>>> Gluster version - 4.1.7
>>>>>>>>>>>> Sync mode - rsync
>>>>>>>>>>>> Volume - 1x3 on each end (master and slave)
>>>>>>>>>>>> Intranet Bandwidth - 10 Gig

--
Thanks and Regards,
Kotresh H R
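For reference, a minimal sketch of how the $PATH suggestion above can be verified: gsyncd runs its remote commands over non-interactive SSH, and a stock .bashrc often returns early for non-interactive shells, so an export placed below that guard never takes effect. Assuming the session user 'sas' and the slave 192.168.185.107 from the logs above:

    # From the master: should print /usr/sbin/gluster and a PATH that
    # contains /usr/sbin; an empty result means the export in ~/.bashrc
    # is not reaching non-interactive sessions.
    ssh sas@192.168.185.107 'command -v gluster; echo $PATH'

    # If it is empty, place the export at the very top of
    # /home/sas/.bashrc, before any interactivity guard such as
    # '[ -z "$PS1" ] && return':
    export PATH=/usr/sbin:$PATH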
Have already added the path in .bashrc. Still in the Faulty state.

On Tue, Jun 4, 2019, 5:27 PM Kotresh Hiremath Ravishankar <khiremat at redhat.com> wrote:

> Could you please try adding /usr/sbin to $PATH for the user 'sas'? If the login shell is bash, add 'export PATH=/usr/sbin:$PATH' to /home/sas/.bashrc.
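If the shell-profile route keeps the session Faulty, the gluster binary location can also be pinned in the geo-replication session config itself, which avoids depending on shell startup files entirely. A sketch using the volume and slave host from this thread; slave-gluster-command-dir matches the flag visible in the gsyncd invocation in the master log above, while gluster-command-dir (the master-side counterpart) is assumed to be the analogous option:

    # Point the slave-side and master-side gluster lookups at /usr/sbin.
    gluster volume geo-replication code-misc sas@192.168.185.107::code-misc \
        config slave-gluster-command-dir /usr/sbin
    gluster volume geo-replication code-misc sas@192.168.185.107::code-misc \
        config gluster-command-dir /usr/sbin

    # Restart the session so the workers pick up the new config.
    gluster volume geo-replication code-misc sas@192.168.185.107::code-misc stop
    gluster volume geo-replication code-misc sas@192.168.185.107::code-misc start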
Kotresh Hiremath Ravishankar
2019-Jun-04 17:49 UTC
[Gluster-users] Geo Replication stops replicating
Ccing Sunny, who was investigating a similar issue.

On Tue, Jun 4, 2019 at 5:46 PM deepu srinivasan <sdeepugd at gmail.com> wrote:

> Have already added the path in .bashrc. Still in the Faulty state.
>
> On Tue, Jun 4, 2019, 5:27 PM Kotresh Hiremath Ravishankar <khiremat at redhat.com> wrote:
>
>> Could you please try adding /usr/sbin to $PATH for the user 'sas'? If the login shell is bash, add 'export PATH=/usr/sbin:$PATH' to /home/sas/.bashrc.

--
Thanks and Regards,
Kotresh H R
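The slave log earlier in this thread also shows the mountbroker mount itself failing for the unprivileged user ("MountbrokerMounter: glusterd answered mnt"), so the non-root geo-replication setup on the slave is worth inspecting as well. A sketch, assuming the glusterfs geo-replication tooling is installed on the slave node; the mount root and group names below are illustrative, not taken from this thread:

    # On the slave node, as root: show the current mountbroker
    # configuration and flag inconsistencies.
    gluster-mountbroker status

    # A typical non-root setup, for comparison (example values):
    gluster-mountbroker setup /var/mountbroker-root geogroup
    gluster-mountbroker add code-misc sas

    # glusterd must be restarted on the slave for changes to apply.
    systemctl restart glusterd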