Kotresh Hiremath Ravishankar
2019-May-31 09:55 UTC
[Gluster-users] Geo Replication stops replicating
Hi,

Could you take the strace with a larger string size? The argument strings
are truncated.

    strace -s 500 -ttt -T -p <rsync pid>

On Fri, May 31, 2019 at 3:17 PM deepu srinivasan <sdeepugd at gmail.com> wrote:

> Hi Kotresh
> The above-mentioned workaround did not work properly.
>
> On Fri, May 31, 2019 at 3:16 PM deepu srinivasan <sdeepugd at gmail.com>
> wrote:
>
>> Hi Kotresh
>> We have tried the above-mentioned rsync option and we are planning to
>> upgrade to version 6.0.
>>
>> On Fri, May 31, 2019 at 11:04 AM Kotresh Hiremath Ravishankar
>> <khiremat at redhat.com> wrote:
>>
>>> Hi,
>>>
>>> This looks like a hang because the stderr buffer filled up with error
>>> messages and nothing was reading it.
>>> I think this issue is fixed in the latest releases. As a workaround,
>>> you can do the following and check whether it works.
>>>
>>> Prerequisite:
>>> rsync version should be > 3.1.0
>>>
>>> Workaround:
>>> gluster volume geo-replication <MASTERVOL> <SLAVEHOST>::<SLAVEVOL>
>>> config rsync-options "--ignore-missing-args"
>>>
>>> Thanks,
>>> Kotresh HR
>>>
>>> On Thu, May 30, 2019 at 5:39 PM deepu srinivasan <sdeepugd at gmail.com>
>>> wrote:
>>>
>>>> Hi
>>>> We were evaluating Gluster geo-replication between two DCs, one in
>>>> US west and one in US east. We ran multiple trials with different
>>>> file sizes.
>>>> Geo-replication tends to stop replicating, but the status still
>>>> shows it as Active. The slave volume, however, does not grow in size.
>>>> So we restarted the geo-replication session and checked the status.
>>>> It was Active and stayed in History Crawl for a long time. We enabled
>>>> DEBUG logging and checked for errors.
>>>> Around 2000 files appeared as sync candidates. The rsync process
>>>> starts, but nothing is synced to the slave volume. Every time, the
>>>> rsync process appears in the "ps auxxx" list, yet no replication
>>>> happens on the slave end. What could be the cause of this problem?
>>>> Is there any way to debug it?
>>>>
>>>> We also checked the strace of the rsync program. It displays
>>>> something like this:
>>>>
>>>> "write(2, "rsync: link_stat \"/tmp/gsyncd-au"..., 128"
>>>>
>>>> We are using the below specs:
>>>>
>>>> Gluster version - 4.1.7
>>>> Sync mode - rsync
>>>> Volume - 1x3 on each end (master and slave)
>>>> Intranet bandwidth - 10 Gig
>>>
>>> --
>>> Thanks and Regards,
>>> Kotresh H R

--
Thanks and Regards,
Kotresh H R
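For readers following along, a minimal sketch of the workaround sequence
described above, assuming hypothetical volume names gv-master and gv-slave
on host slavehost; the stop/start step to make running workers pick up the
new option is an assumption, not something stated in the thread:

    # Prerequisite: rsync newer than 3.1.0 (needed for --ignore-missing-args).
    rsync --version | head -n1

    # Apply the workaround (volume and host names are hypothetical).
    gluster volume geo-replication gv-master slavehost::gv-slave \
        config rsync-options "--ignore-missing-args"

    # Restart the session so the workers pick up the option (an assumption,
    # not stated in the thread).
    gluster volume geo-replication gv-master slavehost::gv-slave stop
    gluster volume geo-replication gv-master slavehost::gv-slave start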
Done. We got the following result:

> 1559298781.338234 write(2, "rsync: link_stat
> \"/tmp/gsyncd-aux-mount-EEJ_sY/.gfid/3fa6aed8-802e-4efe-9903-8bc171176d88\"
> failed: No such file or directory (2)", 128

Seems like a file is missing?

On Fri, May 31, 2019 at 3:25 PM Kotresh Hiremath Ravishankar
<khiremat at redhat.com> wrote:

> Hi,
>
> Could you take the strace with a larger string size? The argument
> strings are truncated.
>
> strace -s 500 -ttt -T -p <rsync pid>
>
> [...]
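A quick way to check whether the file behind that GFID still exists on the
master is to look for its backend entry under a brick's .glusterfs
directory, which is keyed by the first two byte pairs of the GFID; the
brick path below is hypothetical:

    # GFID taken from the strace output above.
    GFID=3fa6aed8-802e-4efe-9903-8bc171176d88

    # On a master brick (path is hypothetical), the backend entry lives at
    # .glusterfs/<first 2 hex chars>/<next 2 hex chars>/<full gfid>.
    ls -l /bricks/brick1/.glusterfs/${GFID:0:2}/${GFID:2:2}/$GFID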
Kotresh Hiremath Ravishankar
2019-May-31 10:52 UTC
[Gluster-users] Geo Replication stops replicating
Yes, the rsync config option should have fixed this issue. Could you share
the output of the following?

1. gluster volume geo-replication <MASTERVOL> <SLAVEHOST>::<SLAVEVOL>
   config rsync-options
2. ps -ef | grep rsync

On Fri, May 31, 2019 at 4:11 PM deepu srinivasan <sdeepugd at gmail.com> wrote:

> Done.
> We got the following result:
>
>> 1559298781.338234 write(2, "rsync: link_stat
>> \"/tmp/gsyncd-aux-mount-EEJ_sY/.gfid/3fa6aed8-802e-4efe-9903-8bc171176d88\"
>> failed: No such file or directory (2)", 128
>
> Seems like a file is missing?
>
> [...]

--
Thanks and Regards,
Kotresh H R
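Besides the ps listing, the full argument vector of a running rsync worker
can confirm whether --ignore-missing-args actually reached it; the PID
placeholder below would come from the ps output requested above:

    # /proc/<pid>/cmdline is NUL-separated; convert the NULs to spaces so
    # the arguments are readable, then look for --ignore-missing-args.
    tr '\0' ' ' < /proc/<rsync pid>/cmdline; echo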