vyyy杨雨阳
2016-May-19 08:45 UTC
[Gluster-users] Re: Re: Re: geo-replication status partial faulty
I have checked all the nodes, both masters and slaves; the software is the same. I am puzzled why half of the masters work and half are faulty.

[admin@SVR6996HW2285 ~]$ rpm -qa | grep gluster
glusterfs-api-3.6.3-1.el6.x86_64
glusterfs-fuse-3.6.3-1.el6.x86_64
glusterfs-geo-replication-3.6.3-1.el6.x86_64
glusterfs-3.6.3-1.el6.x86_64
glusterfs-cli-3.6.3-1.el6.x86_64
glusterfs-server-3.6.3-1.el6.x86_64
glusterfs-libs-3.6.3-1.el6.x86_64

Best Regards
杨雨阳 Yuyang Yang

OPS
Ctrip Infrastructure Service (CIS)
Ctrip Computer Technology (Shanghai) Co., Ltd
Phone: + 86 21 34064880-15554 | Fax: + 86 21 52514588-13389
Web: www.Ctrip.com

From: Saravanakumar Arumugam [mailto:sarumuga at redhat.com]
Sent: Thursday, May 19, 2016 4:33 PM
To: vyyy杨雨阳 <yuyangyang at Ctrip.com>; Gluster-users at gluster.org; Aravinda Vishwanathapura Krishna Murthy <avishwan at redhat.com>; Kotresh Hiremath Ravishankar <khiremat at redhat.com>
Subject: Re: Re: [Gluster-users] Re: geo-replication status partial faulty

Hi,
+geo-rep team.

Can you get the gluster version you are using?

# For example:
rpm -qa | grep gluster

I hope you have the same gluster version installed everywhere.
Please double check and share the same.

Thanks,
Saravana

On 05/19/2016 01:37 PM, vyyy杨雨阳 wrote:

Hi, Saravana

I have changed the log level to DEBUG, then started geo-replication with the log-file option; the file is attached.

gluster volume geo-replication filews glusterfs01.sh3.ctripcorp.com::filews_slave start --log-file=geo.log

I have checked /root/.ssh/authorized_keys on glusterfs01.sh3.ctripcorp.com. It has the entries from /var/lib/glusterd/geo-replication/common_secret.pem.pub, and I have removed the lines that do not start with "command=".

Running

ssh -i /var/lib/glusterd/geo-replication/secret.pem root@glusterfs01.sh3.ctripcorp.com

I can see gsyncd messages and no ssh error.
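For reference, one quick way to confirm that every key in the master's common_secret.pem.pub really made it into the slave's authorized_keys is a sketch like the one below; it assumes root SSH access from a master node to the slave host and a throwaway copy under /tmp, and uses only the file paths already mentioned in this thread:

# run on a master node; prints any key from common_secret.pem.pub that is
# missing from the slave's /root/.ssh/authorized_keys
ssh root@glusterfs01.sh3.ctripcorp.com 'cat /root/.ssh/authorized_keys' > /tmp/slave_keys
grep -F -v -f /tmp/slave_keys /var/lib/glusterd/geo-replication/common_secret.pem.pub \
    || echo "all keys from common_secret.pem.pub are present on the slave"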
The attached etc-glusterfs-glusterd.vol.log from a faulty node shows:

[2016-05-19 06:39:23.405974] I [glusterd-geo-rep.c:3516:glusterd_read_status_file] 0-: Using passed config template(/var/lib/glusterd/geo-replication/filews_glusterfs01.sh3.ctripcorp.com_filews_slave/gsyncd.conf).
[2016-05-19 06:39:23.541169] E [glusterd-geo-rep.c:3200:glusterd_gsync_read_frm_status] 0-: Unable to read gsyncd status file
[2016-05-19 06:39:23.541210] E [glusterd-geo-rep.c:3603:glusterd_read_status_file] 0-: Unable to read the statusfile for /export/sdb/filews brick for filews(master), glusterfs01.sh3.ctripcorp.com::filews_slave(slave) session
[2016-05-19 06:39:29.472047] I [glusterd-geo-rep.c:1835:glusterd_get_statefile_name] 0-: Using passed config template(/var/lib/glusterd/geo-replication/filews_glusterfs01.sh3.ctripcorp.com_filews_slave/gsyncd.conf).
[2016-05-19 06:39:34.939709] I [glusterd-geo-rep.c:3516:glusterd_read_status_file] 0-: Using passed config template(/var/lib/glusterd/geo-replication/filews_glusterfs01.sh3.ctripcorp.com_filews_slave/gsyncd.conf).
[2016-05-19 06:39:35.058520] E [glusterd-geo-rep.c:3200:glusterd_gsync_read_frm_status] 0-: Unable to read gsyncd status file

/var/log/glusterfs/geo-replication/filews/ssh%3A%2F%2Froot%4010.15.65.66%3Agluster%3A%2F%2F127.0.0.1%3Afilews_slave.log shows the following:

[2016-05-19 15:11:37.307755] I [monitor(monitor):215:monitor] Monitor: ------------------------------------------------------------
[2016-05-19 15:11:37.308059] I [monitor(monitor):216:monitor] Monitor: starting gsyncd worker
[2016-05-19 15:11:37.423320] D [gsyncd(agent):627:main_i] <top>: rpc_fd: '7,11,10,9'
[2016-05-19 15:11:37.423882] I [changelogagent(agent):72:__init__] ChangelogAgent: Agent listining...
[2016-05-19 15:11:37.423906] I [monitor(monitor):267:monitor] Monitor: worker(/export/sdb/filews) died before establishing connection
[2016-05-19 15:11:37.424151] I [repce(agent):92:service_loop] RepceServer: terminating on reaching EOF.
[2016-05-19 15:11:37.424335] I [syncdutils(agent):214:finalize] <top>: exiting.

Best Regards
Yuyang Yang

From: Saravanakumar Arumugam [mailto:sarumuga at redhat.com]
Sent: Thursday, May 19, 2016 1:59 PM
To: vyyy杨雨阳 <yuyangyang at Ctrip.com>; Gluster-users at gluster.org
Subject: Re: [Gluster-users] Re: geo-replication status partial faulty

Hi,

There seems to be some issue on the glusterfs01.sh3.ctripcorp.com slave node. Can you share the complete logs?

You can increase the verbosity of debug messages like this:

gluster volume geo-replication <master volume> <slave host>::<slave volume> config log-level DEBUG

Also, check /root/.ssh/authorized_keys on glusterfs01.sh3.ctripcorp.com. It should have the entries from /var/lib/glusterd/geo-replication/common_secret.pem.pub (present on the master node).

Have a look at this one for example:
https://www.gluster.org/pipermail/gluster-users/2015-August/023174.html

Thanks,
Saravana
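Applied to the session in this thread, the placeholder command above becomes something like the following (run on a master node; the per-session log path is the one quoted earlier):

# raise the log level for this geo-replication session
gluster volume geo-replication filews glusterfs01.sh3.ctripcorp.com::filews_slave config log-level DEBUG

# then follow the per-session log on the faulty master node
tail -f '/var/log/glusterfs/geo-replication/filews/ssh%3A%2F%2Froot%4010.15.65.66%3Agluster%3A%2F%2F127.0.0.1%3Afilews_slave.log'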
On 05/19/2016 07:53 AM, vyyy杨雨阳 wrote:

Hello,

I have tried to configure a geo-replication volume; all the master nodes have the same configuration. When I start this volume, the status shows some bricks faulty, as follows:

gluster volume geo-replication filews glusterfs01.sh3.ctripcorp.com::filews_slave status

MASTER NODE      MASTER VOL    MASTER BRICK          SLAVE                                          STATUS     CHECKPOINT STATUS    CRAWL STATUS
-------------------------------------------------------------------------------------------------------------------------------------------------
SVR8048HW2285    filews        /export/sdb/filews    glusterfs01.sh3.ctripcorp.com::filews_slave    faulty     N/A                  N/A
SVR8050HW2285    filews        /export/sdb/filews    glusterfs03.sh3.ctripcorp.com::filews_slave    Passive    N/A                  N/A
SVR8047HW2285    filews        /export/sdb/filews    glusterfs01.sh3.ctripcorp.com::filews_slave    Active     N/A                  Hybrid Crawl
SVR8049HW2285    filews        /export/sdb/filews    glusterfs05.sh3.ctripcorp.com::filews_slave    Active     N/A                  Hybrid Crawl
SH02SVR5951      filews        /export/sdb/brick1    glusterfs06.sh3.ctripcorp.com::filews_slave    Passive    N/A                  N/A
SH02SVR5953      filews        /export/sdb/brick1    glusterfs01.sh3.ctripcorp.com::filews_slave    faulty     N/A                  N/A
SVR6995HW2285    filews        /export/sdb/filews    glusterfs01.sh3.ctripcorp.com::filews_slave    faulty     N/A                  N/A
SH02SVR5954      filews        /export/sdb/brick1    glusterfs01.sh3.ctripcorp.com::filews_slave    faulty     N/A                  N/A
SVR6994HW2285    filews        /export/sdb/filews    glusterfs02.sh3.ctripcorp.com::filews_slave    Passive    N/A                  N/A
SVR6993HW2285    filews        /export/sdb/filews    glusterfs01.sh3.ctripcorp.com::filews_slave    faulty     N/A                  N/A
SH02SVR5952      filews        /export/sdb/brick1    glusterfs01.sh3.ctripcorp.com::filews_slave    faulty     N/A                  N/A
SVR6996HW2285    filews        /export/sdb/filews    glusterfs04.sh3.ctripcorp.com::filews_slave    Passive    N/A                  N/A

On the faulty node, the log file under /var/log/glusterfs/geo-replication/filews shows "worker(/export/sdb/filews) died before establishing connection":

[2016-05-18 16:55:46.402622] I [monitor(monitor):215:monitor] Monitor: ------------------------------------------------------------
[2016-05-18 16:55:46.402930] I [monitor(monitor):216:monitor] Monitor: starting gsyncd worker
[2016-05-18 16:55:46.517460] I [changelogagent(agent):72:__init__] ChangelogAgent: Agent listining...
[2016-05-18 16:55:46.518066] I [repce(agent):92:service_loop] RepceServer: terminating on reaching EOF.
[2016-05-18 16:55:46.518279] I [syncdutils(agent):214:finalize] <top>: exiting.
[2016-05-18 16:55:46.518194] I [monitor(monitor):267:monitor] Monitor: worker(/export/sdb/filews) died before establishing connection
[2016-05-18 16:55:56.697036] I [monitor(monitor):215:monitor] Monitor: ------------------------------------------------------------

Any advice and suggestions will be greatly appreciated.

Best Regards
杨雨阳 Yuyang Yang

_______________________________________________
Gluster-users mailing list
Gluster-users at gluster.org
http://www.gluster.org/mailman/listinfo/gluster-users
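Since the same packages are expected on every node (as requested above), a small loop can compare the gluster versions across masters and slaves in one pass; a rough sketch, assuming passwordless admin SSH to each node and using a few hostnames from the status output above purely as examples:

# node list is illustrative; extend it to all master and slave nodes
for h in SVR8047HW2285 SVR8048HW2285 SH02SVR5953 \
         glusterfs01.sh3.ctripcorp.com glusterfs03.sh3.ctripcorp.com; do
    echo "== $h =="
    ssh "$h" 'rpm -qa | grep gluster | sort'
done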
Kotresh Hiremath Ravishankar
2016-May-19 09:06 UTC
[Gluster-users] Re: Re: Re: geo-replication status partial faulty
Hi,

Could you just try 'create force' once to fix those status file errors? e.g.,

gluster volume geo-rep <master vol> <slave host>::<slave vol> create push-pem force

Thanks and Regards,
Kotresh H R
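Filled in with the volume and slave names used in this thread, the suggested command would look roughly like this (run from a master node, followed by a status check to see whether the faulty bricks recover):

gluster volume geo-replication filews glusterfs01.sh3.ctripcorp.com::filews_slave create push-pem force
gluster volume geo-replication filews glusterfs01.sh3.ctripcorp.com::filews_slave status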