Kotresh Hiremath Ravishankar
2019-May-31 11:05 UTC
[Gluster-users] Geo Replication stops replicating
That means it could be working, and the defunct process might be an old zombie one. Could you check whether the data is progressing?

On Fri, May 31, 2019 at 4:29 PM deepu srinivasan <sdeepugd at gmail.com> wrote:
> Hi,
> When I change the rsync option, the rsync process doesn't seem to start; only a defunct process is listed in ps aux. Only when I set the rsync option back to " " and restart all the processes does an rsync process show up in ps aux.

On Fri, May 31, 2019 at 4:23 PM Kotresh Hiremath Ravishankar <khiremat at redhat.com> wrote:
> Yes, the rsync config option should have fixed this issue.
>
> Could you share the output of the following?
>
> 1. gluster volume geo-replication <MASTERVOL> <SLAVEHOST>::<SLAVEVOL> config rsync-options
> 2. ps -ef | grep rsync

On Fri, May 31, 2019 at 4:11 PM deepu srinivasan <sdeepugd at gmail.com> wrote:
> Done. We got the following result:
>
>     1559298781.338234 write(2, "rsync: link_stat \"/tmp/gsyncd-aux-mount-EEJ_sY/.gfid/3fa6aed8-802e-4efe-9903-8bc171176d88\" failed: No such file or directory (2)", 128
>
> Seems like a file is missing?

On Fri, May 31, 2019 at 3:25 PM Kotresh Hiremath Ravishankar <khiremat at redhat.com> wrote:
> Hi,
> Could you take the strace with a larger string size? The argument strings are truncated.
>
>     strace -s 500 -ttt -T -p <rsync pid>

On Fri, May 31, 2019 at 3:17 PM deepu srinivasan <sdeepugd at gmail.com> wrote:
> Hi Kotresh,
> The above-mentioned workaround did not work properly.

On Fri, May 31, 2019 at 3:16 PM deepu srinivasan <sdeepugd at gmail.com> wrote:
> Hi Kotresh,
> We have tried the above-mentioned rsync option, and we are planning to upgrade to version 6.0.

On Fri, May 31, 2019 at 11:04 AM Kotresh Hiremath Ravishankar <khiremat at redhat.com> wrote:
> Hi,
> This looks like a hang because the stderr buffer filled up with error messages and nothing was reading it. I think this issue is fixed in the latest releases. As a workaround, you can do the following and check whether it works.
>
> Prerequisite: rsync version should be > 3.1.0
>
> Workaround:
>     gluster volume geo-replication <MASTERVOL> <SLAVEHOST>::<SLAVEVOL> config rsync-options "--ignore-missing-args"
>
> Thanks,
> Kotresh HR

On Thu, May 30, 2019 at 5:39 PM deepu srinivasan <sdeepugd at gmail.com> wrote:
> Hi,
> We were evaluating Gluster geo-replication between two DCs, one in US west and one in US east. We ran multiple trials with different file sizes.
> Geo-replication tends to stop replicating, but while checking the status it still appears to be Active, and the slave volume does not grow in size.
> So we restarted the geo-replication session and checked the status again. It was Active and stayed in History Crawl for a long time. We enabled DEBUG mode in logging and checked for errors.
> Around 2000 files appeared as syncing candidates. The rsync process starts, but the rsync does not happen on the slave volume. Every time, an rsync process appears in the "ps aux" list, yet replication does not happen on the slave end. What could be the cause of this problem? Is there any way to debug it?
>
> We have also checked the strace of the rsync program. It displays something like this:
>
>     write(2, "rsync: link_stat \"/tmp/gsyncd-au"..., 128
>
> We are using the specs below:
>
> Gluster version - 4.1.7
> Sync mode - rsync
> Volume - 1x3 on each end (master and slave)
> Intranet bandwidth - 10 Gig

--
Thanks and Regards,
Kotresh H R
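For anyone following along, a minimal sketch of applying and verifying the workaround discussed above, using the same placeholder session names <MASTERVOL>, <SLAVEHOST> and <SLAVEVOL> and assuming rsync newer than 3.1.0 on all nodes (the stop/start at the end mirrors the process restart done in this thread; whether it is strictly required may depend on the release):

    # Confirm the rsync version on the master and slave nodes
    # (--ignore-missing-args needs rsync > 3.1.0).
    rsync --version | head -1

    # Set the extra rsync option for the geo-replication session.
    gluster volume geo-replication <MASTERVOL> <SLAVEHOST>::<SLAVEVOL> \
        config rsync-options "--ignore-missing-args"

    # Verify that the option was stored.
    gluster volume geo-replication <MASTERVOL> <SLAVEHOST>::<SLAVEVOL> \
        config rsync-options

    # Restart the session so the worker processes pick up the new option.
    gluster volume geo-replication <MASTERVOL> <SLAVEHOST>::<SLAVEVOL> stop
    gluster volume geo-replication <MASTERVOL> <SLAVEHOST>::<SLAVEVOL> start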
deepu srinivasan
2019-May-31
[Gluster-users] Geo Replication stops replicating

Checked the data. It remains at 2708. No progress.

On Fri, May 31, 2019 at 4:36 PM Kotresh Hiremath Ravishankar <khiremat at redhat.com> wrote:
> That means it could be working, and the defunct process might be an old zombie one. Could you check whether the data is progressing?
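As a rough way to track progress figures like the "2708" above, the commands below are a sketch using the same placeholder session names; /mnt/slave is a hypothetical mount point of the slave volume, and the exact columns of the detail view vary by Gluster release:

    # Per-worker sync status; the detail view adds pending-work counters.
    gluster volume geo-replication <MASTERVOL> <SLAVEHOST>::<SLAVEVOL> status detail

    # Size of the slave volume as seen through a mount on the slave side.
    df -h /mnt/slave

    # Live rsync/gsyncd workers versus defunct (zombie) ones.
    ps -ef | grep -E 'rsync|gsyncd' | grep -v grep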
deepu srinivasan
2019-Jun-04
[Gluster-users] Geo Replication stops replicating

Hi,
As discussed, I have upgraded Gluster from 4.1 to 6.2. But geo-replication fails to start; it stays in a Faulty state.

On Fri, May 31, 2019, 5:32 PM deepu srinivasan <sdeepugd at gmail.com> wrote:
> Checked the data. It remains at 2708. No progress.
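A minimal sketch of where to start looking when a session stays Faulty, assuming default log locations and the placeholder names used earlier in the thread; <session-dir> here stands in for the per-session log directory, whose exact name encodes the master and slave volume names:

    # Session state; Faulty workers keep restarting until the underlying
    # error is fixed.
    gluster volume geo-replication <MASTERVOL> <SLAVEHOST>::<SLAVEVOL> status

    # Master-side geo-replication logs; the gsyncd log for the session
    # usually contains the traceback behind the Faulty state.
    ls /var/log/glusterfs/geo-replication/
    tail -n 100 /var/log/glusterfs/geo-replication/<session-dir>/gsyncd.log

    # Slave-side logs live under the corresponding directory on the slave nodes.
    ls /var/log/glusterfs/geo-replication-slaves/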