Hi all,

I have a 14-node cluster with two volumes: one replica 7 and one stripe 7.
Last night node 10 added a peer file and stopped.
I corrected it using the data in glusterd.info and deleted the bad peer file.

Now glusterd is stopped on all the nodes, and node 10 fails to start with the following message:

[2015-04-09 12:36:34.441595] I [MSGID: 100030] [glusterfsd.c:2018:main] 0-/usr/sbin/glusterd: Started running /usr/sbin/glusterd version 3.6.2 (args: /usr/sbin/glusterd --pid-file=/var/run/glusterd.pid)
[2015-04-09 12:36:34.447117] I [glusterd.c:1214:init] 0-management: Maximum allowed open file descriptors set to 65536
[2015-04-09 12:36:34.447181] I [glusterd.c:1259:init] 0-management: Using /var/lib/glusterd as working directory
[2015-04-09 12:36:34.452105] W [rdma.c:4221:__gf_rdma_ctx_create] 0-rpc-transport/rdma: rdma_cm event channel creation failed (No such device)
[2015-04-09 12:36:34.452140] E [rdma.c:4519:init] 0-rdma.management: Failed to initialize IB Device
[2015-04-09 12:36:34.452156] E [rpc-transport.c:333:rpc_transport_load] 0-rpc-transport: 'rdma' initialization failed
[2015-04-09 12:36:34.452233] W [rpcsvc.c:1524:rpcsvc_transport_create] 0-rpc-service: cannot create listener, initing the transport failed
[2015-04-09 12:36:41.418761] I [glusterd-store.c:2043:glusterd_restore_op_version] 0-glusterd: retrieved op-version: 2
[2015-04-09 12:36:42.107207] I [glusterd-handler.c:3146:glusterd_friend_add_from_peerinfo] 0-management: connect returned 0
[2015-04-09 12:36:42.118716] I [glusterd-handler.c:3146:glusterd_friend_add_from_peerinfo] 0-management: connect returned 0
[2015-04-09 12:36:42.130187] I [glusterd-handler.c:3146:glusterd_friend_add_from_peerinfo] 0-management: connect returned 0
[2015-04-09 12:36:42.141720] I [glusterd-handler.c:3146:glusterd_friend_add_from_peerinfo] 0-management: connect returned 0
[2015-04-09 12:36:42.153222] I [glusterd-handler.c:3146:glusterd_friend_add_from_peerinfo] 0-management: connect returned 0
[2015-04-09 12:36:42.164689] I [glusterd-handler.c:3146:glusterd_friend_add_from_peerinfo] 0-management: connect returned 0
[2015-04-09 12:36:42.176217] I [glusterd-handler.c:3146:glusterd_friend_add_from_peerinfo] 0-management: connect returned 0
[2015-04-09 12:36:42.187721] I [glusterd-handler.c:3146:glusterd_friend_add_from_peerinfo] 0-management: connect returned 0
[2015-04-09 12:36:42.199244] I [glusterd-handler.c:3146:glusterd_friend_add_from_peerinfo] 0-management: connect returned 0
[2015-04-09 12:36:42.210729] I [glusterd-handler.c:3146:glusterd_friend_add_from_peerinfo] 0-management: connect returned 0
[2015-04-09 12:36:42.222230] I [glusterd-handler.c:3146:glusterd_friend_add_from_peerinfo] 0-management: connect returned 0
[2015-04-09 12:36:42.233736] I [glusterd-handler.c:3146:glusterd_friend_add_from_peerinfo] 0-management: connect returned 0
[2015-04-09 12:36:42.233867] I [rpc-clnt.c:969:rpc_clnt_connection_init] 0-management: setting frame-timeout to 600
[2015-04-09 12:36:42.240940] I [rpc-clnt.c:969:rpc_clnt_connection_init] 0-management: setting frame-timeout to 600
[2015-04-09 12:36:42.246986] I [rpc-clnt.c:969:rpc_clnt_connection_init] 0-management: setting frame-timeout to 600
[2015-04-09 12:36:42.252953] I [rpc-clnt.c:969:rpc_clnt_connection_init] 0-management: setting frame-timeout to 600
[2015-04-09 12:36:42.258897] I [rpc-clnt.c:969:rpc_clnt_connection_init] 0-management: setting frame-timeout to 600
[2015-04-09 12:36:42.264829] I [rpc-clnt.c:969:rpc_clnt_connection_init] 0-management: setting frame-timeout to 600
[2015-04-09 12:36:42.270775] I [rpc-clnt.c:969:rpc_clnt_connection_init] 0-management: setting frame-timeout to 600
[2015-04-09 12:36:42.276730] I [rpc-clnt.c:969:rpc_clnt_connection_init] 0-management: setting frame-timeout to 600
[2015-04-09 12:36:42.282703] I [rpc-clnt.c:969:rpc_clnt_connection_init] 0-management: setting frame-timeout to 600
[2015-04-09 12:36:42.288624] I [rpc-clnt.c:969:rpc_clnt_connection_init] 0-management: setting frame-timeout to 600
[2015-04-09 12:36:42.294571] I [rpc-clnt.c:969:rpc_clnt_connection_init] 0-management: setting frame-timeout to 600
[2015-04-09 12:36:42.300498] I [rpc-clnt.c:969:rpc_clnt_connection_init] 0-management: setting frame-timeout to 600
[2015-04-09 12:36:42.306460] I [glusterd-store.c:3497:glusterd_store_retrieve_missed_snaps_list] 0-management: No missed snaps list.
[2015-04-09 12:36:42.315087] E [glusterd-store.c:4244:glusterd_resolve_all_bricks] 0-glusterd: resolve brick failed in restore
[2015-04-09 12:36:42.315145] E [xlator.c:425:xlator_init] 0-management: Initialization of volume 'management' failed, review your volfile again
[2015-04-09 12:36:42.315168] E [graph.c:322:glusterfs_graph_init] 0-management: initializing translator failed
[2015-04-09 12:36:42.315183] E [graph.c:525:glusterfs_graph_activate] 0-graph: init failed
[2015-04-09 12:36:42.315750] W [glusterfsd.c:1194:cleanup_and_exit] (--> 0-: received signum (0), shutting down

So something happened in the local configuration that made the gluster daemon fail. I have checked all the peer files, but nothing has changed.
The release is the latest, 3.6.2-1.
Any idea?

Many thanks for your help.

--
INRA <http://www.inra.fr>
*Pierre Léonard*
*Senior IT Manager*
*MetaGenoPolis*
Pierre.Leonard at jouy.inra.fr
Tél. : +33 (0)1 34 65 29 78

Centre de recherche INRA
Domaine de Vilvert - Bât. 325 R+1
78 352 Jouy-en-Josas CEDEX
France
www.mgps.eu <http://www.mgps.eu>
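A small script can make the manual review of the peer files more systematic. In the 3.6 sources, "resolve brick failed in restore" appears to be raised when a brick's host cannot be matched either to the local node or to a known peer. A leftover or mismatched peer file is therefore a plausible suspect.

Below is a minimal Python sketch. It is not a Gluster tool, and `check_peer_store` is a hypothetical name. It assumes the plain key=value layout that glusterd 3.6.x keeps under /var/lib/glusterd:
- `UUID=` in glusterd.info;
- `uuid=`, `state=` and `hostnameN=` in each file under peers/.

Please verify that layout against your own files before trusting the output.

```python
import os
from collections import defaultdict


def parse_kv(path):
    """Parse a glusterd-style 'key=value' file into a dict (layout assumed)."""
    data = {}
    with open(path) as fh:
        for line in fh:
            line = line.strip()
            if not line or "=" not in line:
                continue
            key, value = line.split("=", 1)
            data[key.strip()] = value.strip()
    return data


def check_peer_store(workdir="/var/lib/glusterd"):
    """Return human-readable problems found in glusterd.info and peers/.

    Hypothetical diagnostic helper: it only reads files, never modifies them.
    """
    problems = []
    local_uuid = parse_kv(os.path.join(workdir, "glusterd.info")).get("UUID", "")
    if not local_uuid:
        problems.append("glusterd.info has no UUID entry")

    peers_dir = os.path.join(workdir, "peers")
    hosts = defaultdict(set)  # hostname -> set of peer files naming it
    for name in sorted(os.listdir(peers_dir)):
        path = os.path.join(peers_dir, name)
        if not os.path.isfile(path):
            continue
        peer = parse_kv(path)
        uuid = peer.get("uuid", "")
        if not uuid:
            problems.append(f"{name}: no uuid= entry")
        elif uuid != name:
            problems.append(f"{name}: file name differs from uuid={uuid}")
        # A node should never hold a peer file describing itself.
        if uuid and uuid == local_uuid:
            problems.append(f"{name}: peer file carries this node's own UUID")
        # Collect every hostnameN= value to spot one host listed in several files.
        for key, value in peer.items():
            if key.startswith("hostname"):
                hosts[value].add(name)

    for host, files in sorted(hosts.items()):
        if len(files) > 1:
            problems.append(
                f"host {host} appears in several peer files: {', '.join(sorted(files))}"
            )
    return problems
```

On node 10, running `print(check_peer_store("/var/lib/glusterd"))` as root lists anything suspicious. An empty list only means that none of these particular checks fired. It does not prove the store is consistent.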
Hi all,

Back to you. I have updated all the nodes: the system to CentOS 6.6 and Gluster to the latest release.
All nodes are OK except node 10, which still does not start the gluster daemon.

Many thanks in advance.

--
Pierre Léonard
On 04/09/2015 06:21 PM, Pierre Léonard wrote:
> Hi all,
>
> I have a 14-node cluster with two volumes: one replica 7 and one stripe 7.
> Last night node 10 added a peer file and stopped.
> I corrected it using the data in glusterd.info and deleted the bad peer file.

Could you clarify which steps you performed here? Also, could you try to start glusterd with -LDEBUG and share the glusterd log file with us? And do you see any delta in the glusterd.info file between node 10 and the other nodes?

--
~Atin
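One way to answer the question about deltas in glusterd.info is to copy the file from every node (for example with scp) and compare the copies.

Below is a minimal Python sketch that assumes the same key=value format, with `UUID=` and `operating-version=` lines. `diff_glusterd_info` and the node names are hypothetical. It reports three things:
- UUIDs held by more than one node;
- nodes whose UUID is missing;
- differing `operating-version` values, which in a healthy cluster are normally identical on every node.

```python
from collections import defaultdict


def parse_info(text):
    """Parse the key=value lines of a glusterd.info file (layout assumed)."""
    info = {}
    for line in text.splitlines():
        if "=" in line:
            key, value = line.split("=", 1)
            info[key.strip()] = value.strip()
    return info


def diff_glusterd_info(infos):
    """Compare glusterd.info contents collected from several nodes.

    `infos` maps a node name to the raw text of that node's glusterd.info.
    Returns a list of discrepancies worth investigating.
    """
    problems = []
    parsed = {node: parse_info(text) for node, text in infos.items()}

    # Every node should own a distinct UUID.
    owners = defaultdict(list)
    for node, info in parsed.items():
        owners[info.get("UUID", "<missing>")].append(node)
    for uuid, nodes in sorted(owners.items()):
        if uuid == "<missing>" or len(nodes) > 1:
            problems.append(f"UUID {uuid} held by: {', '.join(sorted(nodes))}")

    # Nodes of one cluster are normally expected to share one operating-version.
    versions = defaultdict(list)
    for node, info in parsed.items():
        versions[info.get("operating-version", "<missing>")].append(node)
    if len(versions) > 1:
        for version, nodes in sorted(versions.items()):
            problems.append(f"operating-version {version}: {', '.join(sorted(nodes))}")
    return problems
```

Pass it a dict such as `{"node01": open("node01-glusterd.info").read(), ...}`, built from the copied files. Grouping by value, rather than diffing pairs of files, makes the odd node stand out directly even with 14 inputs.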