Kotresh Hiremath Ravishankar
2016-May-24 10:40 UTC
[Gluster-users] Re: Re: Re: Re: Re: geo-replication status partial faulty
Ok, it looks like there is a problem with ssh key distribution. Before I suggest cleaning those up and doing the setup again, could you share the output of the following commands?

1. gluster vol geo-rep <master_vol> <slave_host>::<slave_vol> status
2. ls -l /var/lib/glusterd/geo-replication/

Are there multiple geo-rep sessions from this master volume, or only one?

Thanks and Regards,
Kotresh H R

----- Original Message -----
> From: "vyyy杨雨阳" <yuyangyang at ctrip.com>
> To: "Kotresh Hiremath Ravishankar" <khiremat at redhat.com>
> Cc: "Saravanakumar Arumugam" <sarumuga at redhat.com>, Gluster-users at gluster.org, "Aravinda Vishwanathapura Krishna Murthy" <avishwan at redhat.com>
> Sent: Tuesday, May 24, 2016 3:19:55 PM
> Subject: Re: Re: Re: Re: [Gluster-users] Re: geo-replication status partial faulty
>
> We can establish passwordless ssh directly with the 'ssh' command, but when we run 'create push-pem' it shows 'Passwordless ssh login has not been setup' unless we copy secret.pem to *id_rsa.pub.
>
> [root at SVR8048HW2285 ~]# ssh -i /var/lib/glusterd/geo-replication/secret.pem root at glusterfs01.sh3.ctripcorp.com
> Last login: Tue May 24 17:23:53 2016 from 10.8.230.213
> This is a private network server, in monitoring state.
> It is strictly prohibited to unauthorized access and used.
> [root at SVR6519HW2285 ~]#
>
> [root at SVR8048HW2285 filews]# gluster volume geo-replication filews glusterfs01.sh3.ctripcorp.com::filews_slave create push-pem force
> Passwordless ssh login has not been setup with glusterfs01.sh3.ctripcorp.com for user root.
> geo-replication command failed
> [root at SVR8048HW2285 filews]#
>
> Best Regards
> 杨雨阳 Yuyang Yang
>
> -----Original Message-----
> From: Kotresh Hiremath Ravishankar [mailto:khiremat at redhat.com]
> Sent: Tuesday, May 24, 2016 3:22 PM
> To: vyyy杨雨阳 <yuyangyang at Ctrip.com>
> Cc: Saravanakumar Arumugam <sarumuga at redhat.com>; Gluster-users at gluster.org; Aravinda Vishwanathapura Krishna Murthy <avishwan at redhat.com>
> Subject: Re: Re: Re: Re: [Gluster-users] Re: geo-replication status partial faulty
>
> Hi,
>
> Could you try the following command from the corresponding masters to the faulty slave nodes and share the output?
> The command below should not ask for a password and should run gsyncd.
>
> ssh -i /var/lib/glusterd/geo-replication/secret.pem root@<faulty hosts>
>
> To establish passwordless ssh, it is not necessary to copy secret.pem to *id_rsa.pub.
>
> If the geo-rep session is already established, passwordless ssh would already be there.
> My suspicion is that when I asked you to do 'create force', you did it using another slave where passwordless ssh was not set up. This would create another session directory in '/var/lib/glusterd/geo-replication', i.e. <master_vol>_<slave_host>_<slave_vol>.
>
> Please check and let us know.
>
> Thanks and Regards,
> Kotresh H R
>
> ----- Original Message -----
> > From: "vyyy杨雨阳" <yuyangyang at ctrip.com>
> > To: "Kotresh Hiremath Ravishankar" <khiremat at redhat.com>
> > Cc: "Saravanakumar Arumugam" <sarumuga at redhat.com>, Gluster-users at gluster.org, "Aravinda Vishwanathapura Krishna Murthy" <avishwan at redhat.com>
> > Sent: Friday, May 20, 2016 12:35:58 PM
> > Subject: Re: Re: Re: [Gluster-users] Re: geo-replication status partial faulty
> >
> > Hello, Kotresh
> >
> > I ran 'create force', but still some nodes work and some nodes are faulty.
> >
> > On the faulty nodes, etc-glusterfs-glusterd.vol.log shows:
> >
> > [2016-05-20 06:27:03.260870] I [glusterd-geo-rep.c:3516:glusterd_read_status_file] 0-: Using passed config template(/var/lib/glusterd/geo-replication/filews_glusterfs01.sh3.ctripcorp.com_filews_slave/gsyncd.conf).
> > [2016-05-20 06:27:03.404544] E [glusterd-geo-rep.c:3200:glusterd_gsync_read_frm_status] 0-: Unable to read gsyncd status file
> > [2016-05-20 06:27:03.404583] E [glusterd-geo-rep.c:3603:glusterd_read_status_file] 0-: Unable to read the statusfile for /export/sdb/brick1 brick for filews(master), glusterfs01.sh3.ctripcorp.com::filews_slave(slave) session
> >
> > /var/log/glusterfs/geo-replication/filews/ssh%3A%2F%2Froot%4010.15.65.66%3Agluster%3A%2F%2F127.0.0.1%3Afilews_slave.log shows:
> >
> > [2016-05-20 15:04:01.858340] I [monitor(monitor):215:monitor] Monitor: ------------------------------------------------------------
> > [2016-05-20 15:04:01.858688] I [monitor(monitor):216:monitor] Monitor: starting gsyncd worker
> > [2016-05-20 15:04:01.986754] D [gsyncd(agent):627:main_i] <top>: rpc_fd: '7,11,10,9'
> > [2016-05-20 15:04:01.987505] I [changelogagent(agent):72:__init__] ChangelogAgent: Agent listining...
> > [2016-05-20 15:04:01.988079] I [repce(agent):92:service_loop] RepceServer: terminating on reaching EOF.
> > [2016-05-20 15:04:01.988238] I [syncdutils(agent):214:finalize] <top>: exiting.
> > [2016-05-20 15:04:01.988250] I [monitor(monitor):267:monitor] Monitor: worker(/export/sdb/brick1) died before establishing connection
> >
> > Can you help me?
> >
> > Best Regards
> > 杨雨阳 Yuyang Yang
> >
> > -----Original Message-----
> > From: vyyy杨雨阳
> > Sent: Thursday, May 19, 2016 7:45 PM
> > To: 'Kotresh Hiremath Ravishankar' <khiremat at redhat.com>
> > Cc: Saravanakumar Arumugam <sarumuga at redhat.com>; Gluster-users at gluster.org; Aravinda Vishwanathapura Krishna Murthy <avishwan at redhat.com>
> > Subject: Re: Re: Re: [Gluster-users] Re: geo-replication status partial faulty
> >
> > It still does not work.
> >
> > I need to copy /var/lib/glusterd/geo-replication/secret.* to /root/.ssh/id_rsa to make passwordless ssh work.
> >
> > I generated the /var/lib/glusterd/geo-replication/secret.pem file on every master node.
> >
> > I am not sure whether this is right.
> >
> > [root at sh02svr5956 ~]# gluster volume geo-replication filews glusterfs01.sh3.ctripcorp.com::filews_slave create push-pem force
> > Passwordless ssh login has not been setup with glusterfs01.sh3.ctripcorp.com for user root.
> > geo-replication command failed
> >
> > [root at sh02svr5956 .ssh]# cp /var/lib/glusterd/geo-replication/secret.pem ./id_rsa
> > cp: overwrite `./id_rsa'? y
> > [root at sh02svr5956 .ssh]# cp /var/lib/glusterd/geo-replication/secret.pem.pub ./id_rsa.pub
> > cp: overwrite `./id_rsa.pub'?
> >
> > [root at sh02svr5956 ~]# gluster volume geo-replication filews glusterfs01.sh3.ctripcorp.com::filews_slave create push-pem force
> > Creating geo-replication session between filews & glusterfs01.sh3.ctripcorp.com::filews_slave has been successful
> > [root at sh02svr5956 ~]#
> >
> > Best Regards
> > 杨雨阳 Yuyang Yang
> > OPS
> > Ctrip Infrastructure Service (CIS)
> > Ctrip Computer Technology (Shanghai) Co., Ltd
> > Phone: +86 21 34064880-15554 | Fax: +86 21 52514588-13389
> > Web: www.Ctrip.com
> >
> > -----Original Message-----
> > From: Kotresh Hiremath Ravishankar [mailto:khiremat at redhat.com]
> > Sent: Thursday, May 19, 2016 5:07 PM
> > To: vyyy杨雨阳 <yuyangyang at Ctrip.com>
> > Cc: Saravanakumar Arumugam <sarumuga at redhat.com>; Gluster-users at gluster.org; Aravinda Vishwanathapura Krishna Murthy <avishwan at redhat.com>
> > Subject: Re: Re: Re: [Gluster-users] Re: geo-replication status partial faulty
> >
> > Hi,
> >
> > Could you just try 'create force' once to fix those status file errors?
> >
> > e.g., 'gluster volume geo-rep <master vol> <slave host>::<slave vol> create push-pem force'
> >
> > Thanks and Regards,
> > Kotresh H R
> >
> > ----- Original Message -----
> > > From: "vyyy杨雨阳" <yuyangyang at ctrip.com>
> > > To: "Saravanakumar Arumugam" <sarumuga at redhat.com>, Gluster-users at gluster.org, "Aravinda Vishwanathapura Krishna Murthy" <avishwan at redhat.com>, "Kotresh Hiremath Ravishankar" <khiremat at redhat.com>
> > > Sent: Thursday, May 19, 2016 2:15:34 PM
> > > Subject: Re: Re: [Gluster-users] Re: geo-replication status partial faulty
> > >
> > > I have checked all the nodes, both masters and slaves; the software is the same.
> > >
> > > I am puzzled why half of the masters work and half are faulty.
> > >
> > > [admin at SVR6996HW2285 ~]$ rpm -qa |grep gluster
> > > glusterfs-api-3.6.3-1.el6.x86_64
> > > glusterfs-fuse-3.6.3-1.el6.x86_64
> > > glusterfs-geo-replication-3.6.3-1.el6.x86_64
> > > glusterfs-3.6.3-1.el6.x86_64
> > > glusterfs-cli-3.6.3-1.el6.x86_64
> > > glusterfs-server-3.6.3-1.el6.x86_64
> > > glusterfs-libs-3.6.3-1.el6.x86_64
> > >
> > > Best Regards
> > > 杨雨阳 Yuyang Yang
> > >
> > > OPS
> > > Ctrip Infrastructure Service (CIS)
> > > Ctrip Computer Technology (Shanghai) Co., Ltd
> > > Phone: +86 21 34064880-15554 | Fax: +86 21 52514588-13389
> > > Web: www.Ctrip.com<http://www.ctrip.com/>
> > >
> > > From: Saravanakumar Arumugam [mailto:sarumuga at redhat.com]
> > > Sent: Thursday, May 19, 2016 4:33 PM
> > > To: vyyy杨雨阳 <yuyangyang at Ctrip.com>; Gluster-users at gluster.org; Aravinda Vishwanathapura Krishna Murthy <avishwan at redhat.com>; Kotresh Hiremath Ravishankar <khiremat at redhat.com>
> > > Subject: Re: Re: [Gluster-users] Re: geo-replication status partial faulty
> > >
> > > Hi,
> > > +geo-rep team.
> > >
> > > Can you get the gluster version you are using?
> > >
> > > # For example:
> > > rpm -qa | grep gluster
> > >
> > > I hope you have the same gluster version installed everywhere.
> > > Please double check and share the same.
> > >
> > > Thanks,
> > > Saravana
> > >
> > > On 05/19/2016 01:37 PM, vyyy杨雨阳 wrote:
> > > Hi, Saravana
> > >
> > > I have changed the log level to DEBUG, then started geo-replication with the log-file option; the file is attached.
> > >
> > > gluster volume geo-replication filews glusterfs01.sh3.ctripcorp.com::filews_slave start --log-file=geo.log
> > >
> > > I have checked /root/.ssh/authorized_keys on glusterfs01.sh3.ctripcorp.com. It has the entries from /var/lib/glusterd/geo-replication/common_secret.pem.pub, and I have removed the lines not starting with "command=".
> > >
> > > With ssh -i /var/lib/glusterd/geo-replication/secret.pem root@glusterfs01.sh3.ctripcorp.com I can see gsyncd messages and no ssh error.
> > >
> > > The attached etc-glusterfs-glusterd.vol.log from a faulty node shows:
> > >
> > > [2016-05-19 06:39:23.405974] I [glusterd-geo-rep.c:3516:glusterd_read_status_file] 0-: Using passed config template(/var/lib/glusterd/geo-replication/filews_glusterfs01.sh3.ctripcorp.com_filews_slave/gsyncd.conf).
> > > [2016-05-19 06:39:23.541169] E [glusterd-geo-rep.c:3200:glusterd_gsync_read_frm_status] 0-: Unable to read gsyncd status file
> > > [2016-05-19 06:39:23.541210] E [glusterd-geo-rep.c:3603:glusterd_read_status_file] 0-: Unable to read the statusfile for /export/sdb/filews brick for filews(master), glusterfs01.sh3.ctripcorp.com::filews_slave(slave) session
> > > [2016-05-19 06:39:29.472047] I [glusterd-geo-rep.c:1835:glusterd_get_statefile_name] 0-: Using passed config template(/var/lib/glusterd/geo-replication/filews_glusterfs01.sh3.ctripcorp.com_filews_slave/gsyncd.conf).
> > > [2016-05-19 06:39:34.939709] I [glusterd-geo-rep.c:3516:glusterd_read_status_file] 0-: Using passed config template(/var/lib/glusterd/geo-replication/filews_glusterfs01.sh3.ctripcorp.com_filews_slave/gsyncd.conf).
> > > [2016-05-19 06:39:35.058520] E [glusterd-geo-rep.c:3200:glusterd_gsync_read_frm_status] 0-: Unable to read gsyncd status file
> > >
> > > /var/log/glusterfs/geo-replication/filews/ssh%3A%2F%2Froot%4010.15.65.66%3Agluster%3A%2F%2F127.0.0.1%3Afilews_slave.log shows the following:
> > >
> > > [2016-05-19 15:11:37.307755] I [monitor(monitor):215:monitor] Monitor: ------------------------------------------------------------
> > > [2016-05-19 15:11:37.308059] I [monitor(monitor):216:monitor] Monitor: starting gsyncd worker
> > > [2016-05-19 15:11:37.423320] D [gsyncd(agent):627:main_i] <top>: rpc_fd: '7,11,10,9'
> > > [2016-05-19 15:11:37.423882] I [changelogagent(agent):72:__init__] ChangelogAgent: Agent listining...
> > > [2016-05-19 15:11:37.423906] I [monitor(monitor):267:monitor] Monitor: worker(/export/sdb/filews) died before establishing connection
> > > [2016-05-19 15:11:37.424151] I [repce(agent):92:service_loop] RepceServer: terminating on reaching EOF.
> > > [2016-05-19 15:11:37.424335] I [syncdutils(agent):214:finalize] <top>: exiting.
> > >
> > > Best Regards
> > > Yuyang Yang
> > >
> > > From: Saravanakumar Arumugam [mailto:sarumuga at redhat.com]
> > > Sent: Thursday, May 19, 2016 1:59 PM
> > > To: vyyy杨雨阳 <yuyangyang at Ctrip.com>; Gluster-users at gluster.org
> > > Subject: Re: [Gluster-users] Re: geo-replication status partial faulty
> > >
> > > Hi,
> > >
> > > There seems to be some issue on the glusterfs01.sh3.ctripcorp.com slave node.
> > > Can you share the complete logs?
> > >
> > > You can increase the verbosity of debug messages like this:
> > > gluster volume geo-replication <master volume> <slave host>::<slave volume> config log-level DEBUG
> > >
> > > Also, check /root/.ssh/authorized_keys on glusterfs01.sh3.ctripcorp.com. It should have the entries from /var/lib/glusterd/geo-replication/common_secret.pem.pub (present on the master node).
> > >
> > > Have a look at this one for example:
> > > https://www.gluster.org/pipermail/gluster-users/2015-August/023174.html
> > >
> > > Thanks,
> > > Saravana
> > >
> > > On 05/19/2016 07:53 AM, vyyy杨雨阳 wrote:
> > > Hello,
> > >
> > > I have tried to configure a geo-replication volume; the configuration on all the master nodes is the same. When I start this volume, the status shows some nodes as faulty, as follows:
> > >
> > > gluster volume geo-replication filews glusterfs01.sh3.ctripcorp.com::filews_slave status
> > >
> > > MASTER NODE      MASTER VOL    MASTER BRICK          SLAVE                                          STATUS     CHECKPOINT STATUS    CRAWL STATUS
> > > -------------------------------------------------------------------------------------------------------------------------------------------------
> > > SVR8048HW2285    filews        /export/sdb/filews    glusterfs01.sh3.ctripcorp.com::filews_slave    faulty     N/A                  N/A
> > > SVR8050HW2285    filews        /export/sdb/filews    glusterfs03.sh3.ctripcorp.com::filews_slave    Passive    N/A                  N/A
> > > SVR8047HW2285    filews        /export/sdb/filews    glusterfs01.sh3.ctripcorp.com::filews_slave    Active     N/A                  Hybrid Crawl
> > > SVR8049HW2285    filews        /export/sdb/filews    glusterfs05.sh3.ctripcorp.com::filews_slave    Active     N/A                  Hybrid Crawl
> > > SH02SVR5951      filews        /export/sdb/brick1    glusterfs06.sh3.ctripcorp.com::filews_slave    Passive    N/A                  N/A
> > > SH02SVR5953      filews        /export/sdb/brick1    glusterfs01.sh3.ctripcorp.com::filews_slave    faulty     N/A                  N/A
> > > SVR6995HW2285    filews        /export/sdb/filews    glusterfs01.sh3.ctripcorp.com::filews_slave    faulty     N/A                  N/A
> > > SH02SVR5954      filews        /export/sdb/brick1    glusterfs01.sh3.ctripcorp.com::filews_slave    faulty     N/A                  N/A
> > > SVR6994HW2285    filews        /export/sdb/filews    glusterfs02.sh3.ctripcorp.com::filews_slave    Passive    N/A                  N/A
> > > SVR6993HW2285    filews        /export/sdb/filews    glusterfs01.sh3.ctripcorp.com::filews_slave    faulty     N/A                  N/A
> > > SH02SVR5952      filews        /export/sdb/brick1    glusterfs01.sh3.ctripcorp.com::filews_slave    faulty     N/A                  N/A
> > > SVR6996HW2285    filews        /export/sdb/filews    glusterfs04.sh3.ctripcorp.com::filews_slave    Passive    N/A                  N/A
> > >
> > > On the faulty node, the log file under /var/log/glusterfs/geo-replication/filews shows "worker(/export/sdb/filews) died before establishing connection":
> > >
> > > [2016-05-18 16:55:46.402622] I [monitor(monitor):215:monitor] Monitor: ------------------------------------------------------------
> > > [2016-05-18 16:55:46.402930] I [monitor(monitor):216:monitor] Monitor: starting gsyncd worker
> > > [2016-05-18 16:55:46.517460] I [changelogagent(agent):72:__init__] ChangelogAgent: Agent listining...
> > > [2016-05-18 16:55:46.518066] I [repce(agent):92:service_loop] RepceServer: terminating on reaching EOF.
> > > [2016-05-18 16:55:46.518279] I [syncdutils(agent):214:finalize] <top>: exiting.
> > > [2016-05-18 16:55:46.518194] I [monitor(monitor):267:monitor] Monitor: worker(/export/sdb/filews) died before establishing connection
> > > [2016-05-18 16:55:56.697036] I [monitor(monitor):215:monitor] Monitor: ------------------------------------------------------------
> > >
> > > Any advice and suggestions will be greatly appreciated.
> > >
> > > Best Regards
> > > Yuyang Yang
> > >
> > > _______________________________________________
> > > Gluster-users mailing list
> > > Gluster-users at gluster.org
> > > http://www.gluster.org/mailman/listinfo/gluster-users
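The checks suggested in the thread above can be condensed into a short sequence. The following is a minimal sketch, not a transcript from the thread: it reuses the hostnames and paths shown in this setup (filews, glusterfs01.sh3.ctripcorp.com::filews_slave), and the exact command= wrapper that push-pem writes into authorized_keys varies by GlusterFS version.

# On each faulty master node: the session key must log in without a password and,
# per Kotresh's note above, should execute gsyncd rather than drop into an interactive shell.
ssh -i /var/lib/glusterd/geo-replication/secret.pem root@glusterfs01.sh3.ctripcorp.com

# Still on the master: only one <master_vol>_<slave_host>_<slave_vol> directory is expected;
# an extra one indicates a stale session created against a different slave host.
find /var/lib/glusterd/geo-replication -maxdepth 1 -type d

# Rough consistency check of key distribution: the number of command=-restricted entries
# on the slave should match the number of keys in the master's common_secret.pem.pub.
grep -c 'command=' /root/.ssh/authorized_keys              # run on the slave node
wc -l /var/lib/glusterd/geo-replication/common_secret.pem.pub   # run on a master node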
vyyy杨雨阳
2016-May-25 01:41 UTC
[Gluster-users] Re: Re: Re: Re: Re: Re: geo-replication status partial faulty
The command outputs are as follows. Thanks.

[root at SVR8048HW2285 ~]# gluster volume geo-replication filews glusterfs01.sh3.ctripcorp.com::filews_slave status

MASTER NODE      MASTER VOL    MASTER BRICK          SLAVE                                          STATUS     CHECKPOINT STATUS    CRAWL STATUS
-------------------------------------------------------------------------------------------------------------------------------------------------
SVR8048HW2285    filews        /export/sdb/filews    glusterfs01.sh3.ctripcorp.com::filews_slave    faulty     N/A                  N/A
SH02SVR5954      filews        /export/sdb/brick1    glusterfs01.sh3.ctripcorp.com::filews_slave    faulty     N/A                  N/A
SH02SVR5951      filews        /export/sdb/brick1    glusterfs06.sh3.ctripcorp.com::filews_slave    Passive    N/A                  N/A
SVR8050HW2285    filews        /export/sdb/filews    glusterfs03.sh3.ctripcorp.com::filews_slave    Passive    N/A                  N/A
SVR8049HW2285    filews        /export/sdb/filews    glusterfs05.sh3.ctripcorp.com::filews_slave    Active     N/A                  Hybrid Crawl
SVR8047HW2285    filews        /export/sdb/filews    glusterfs01.sh3.ctripcorp.com::filews_slave    Active     N/A                  Hybrid Crawl
SVR6995HW2285    filews        /export/sdb/filews    glusterfs01.sh3.ctripcorp.com::filews_slave    faulty     N/A                  N/A
SVR6993HW2285    filews        /export/sdb/filews    glusterfs01.sh3.ctripcorp.com::filews_slave    faulty     N/A                  N/A
SH02SVR5953      filews        /export/sdb/brick1    glusterfs01.sh3.ctripcorp.com::filews_slave    faulty     N/A                  N/A
SH02SVR5952      filews        /export/sdb/brick1    glusterfs01.sh3.ctripcorp.com::filews_slave    faulty     N/A                  N/A
SVR6996HW2285    filews        /export/sdb/filews    glusterfs04.sh3.ctripcorp.com::filews_slave    Passive    N/A                  N/A
SVR6994HW2285    filews        /export/sdb/filews    glusterfs02.sh3.ctripcorp.com::filews_slave    Passive    N/A                  N/A

[root at SVR8048HW2285 ~]# ls -l /var/lib/glusterd/geo-replication/
total 40
-rw------- 1 root root 14140 May 20 16:00 common_secret.pem.pub
drwxr-xr-x 2 root root  4096 May 25 09:35 filews_glusterfs01.sh3.ctripcorp.com_filews_slave
-rwxr-xr-x 1 root root  1845 May 17 15:04 gsyncd_template.conf
-rw------- 1 root root  1675 May 20 11:03 secret.pem
-rw-r--r-- 1 root root   400 May 20 11:03 secret.pem.pub
-rw------- 1 root root  1675 May 20 16:00 tar_ssh.pem
-rw-r--r-- 1 root root   400 May 20 16:00 tar_ssh.pem.pub
[root at SVR8048HW2285 ~]#

Best Regards
杨雨阳 Yuyang Yang

OPS
Ctrip Infrastructure Service (CIS)
Ctrip Computer Technology (Shanghai) Co., Ltd
Phone: +86 21 34064880-15554 | Fax: +86 21 52514588-13389
Web: www.Ctrip.com

-----Original Message-----
From: Kotresh Hiremath Ravishankar [mailto:khiremat at redhat.com]
Sent: Tuesday, May 24, 2016 6:41 PM
To: vyyy杨雨阳 <yuyangyang at Ctrip.com>
Cc: Saravanakumar Arumugam <sarumuga at redhat.com>; Gluster-users at gluster.org; Aravinda Vishwanathapura Krishna Murthy <avishwan at redhat.com>
Subject: Re: Re: Re: Re: Re: [Gluster-users] Re: geo-replication status partial faulty

Ok, it looks like there is a problem with ssh key distribution. Before I suggest cleaning those up and doing the setup again, could you share the output of the following commands?

1. gluster vol geo-rep <master_vol> <slave_host>::<slave_vol> status
2. ls -l /var/lib/glusterd/geo-replication/

Are there multiple geo-rep sessions from this master volume, or only one?

Thanks and Regards,
Kotresh H R
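The directory listing above shows only one session directory, so the problem is more likely stale or incompletely distributed keys than a second session. For reference, the cleanup-and-redo that Kotresh alludes to ("clean those up and do setup again") commonly looks like the sketch below. This is an illustrative sequence using the volume and slave names from this thread, not a procedure prescribed anywhere in it; gsec_create rebuilds common_secret.pem.pub from every master node's secret.pem.pub, and push-pem redistributes it to the slave cluster.

# Run on one master node of the filews volume.

# Stop the existing session before touching the keys.
gluster volume geo-replication filews glusterfs01.sh3.ctripcorp.com::filews_slave stop

# Regenerate the combined public-key file from all master nodes.
gluster system:: execute gsec_create

# Recreate the session so push-pem copies the keys to the slave nodes again.
gluster volume geo-replication filews glusterfs01.sh3.ctripcorp.com::filews_slave create push-pem force

# Restart and verify that the previously faulty bricks come up Active/Passive.
gluster volume geo-replication filews glusterfs01.sh3.ctripcorp.com::filews_slave start
gluster volume geo-replication filews glusterfs01.sh3.ctripcorp.com::filews_slave status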