Jon Cope
2014-Mar-04 21:45 UTC
[Gluster-users] glusterd service fails to start from AWS AMI
Hello all.

I have a working replica 2 cluster (4 nodes) up and running happily over Amazon EC2. My end goal is to create AMIs of each machine and then quickly reproduce the same, but new, cluster from those AMIs. Essentially, I'd like a cluster "template".

-Assigned original instances' Elastic IPs to new machines to reduce resolution issues.
-Passwordless SSH works on initial boot across all machines
-Node1: has no evident issue. Starts with glusterd running.
-Node1: 'gluster peer status' returns correct public DNS / hostnames for peer nodes. Status: (Disconnected) --since the service is off on them

Since my goal is to create a cluster template, reinstalling gluster for each node, though it'll probably work, isn't an optimal solution.

Thank You

# Node2: etc-glusterfs-glusterd.vol.log
# Begins at 'service glusterd start' command entry

[2014-03-04 21:20:30.532138] I [glusterfsd.c:2024:main] 0-/usr/sbin/glusterd: Started running /usr/sbin/glusterd version 3.4.0.44rhs (/usr/sbin/glusterd --pid-file=/var/run/glusterd.pid)
[2014-03-04 21:20:30.539331] I [glusterd.c:1020:init] 0-management: Using /var/lib/glusterd as working directory
[2014-03-04 21:20:30.542578] I [socket.c:3485:socket_init] 0-socket.management: SSL support is NOT enabled
[2014-03-04 21:20:30.542603] I [socket.c:3500:socket_init] 0-socket.management: using system polling thread
[2014-03-04 21:20:30.543203] C [rdma.c:4099:gf_rdma_init] 0-rpc-transport/rdma: Failed to get IB devices
[2014-03-04 21:20:30.543342] E [rdma.c:4990:init] 0-rdma.management: Failed to initialize IB Device
[2014-03-04 21:20:30.543375] E [rpc-transport.c:320:rpc_transport_load] 0-rpc-transport: 'rdma' initialization failed
[2014-03-04 21:20:30.543471] W [rpcsvc.c:1387:rpcsvc_transport_create] 0-rpc-service: cannot create listener, initing the transport failed
[2014-03-04 21:20:37.116571] I [glusterd-store.c:1388:glusterd_restore_op_version] 0-glusterd: retrieved op-version: 2
[2014-03-04 21:20:37.120082] E [glusterd-store.c:1905:glusterd_store_retrieve_volume] 0-: Unknown key: brick-0
[2014-03-04 21:20:37.120118] E [glusterd-store.c:1905:glusterd_store_retrieve_volume] 0-: Unknown key: brick-1
[2014-03-04 21:20:37.120137] E [glusterd-store.c:1905:glusterd_store_retrieve_volume] 0-: Unknown key: brick-2
[2014-03-04 21:20:37.120154] E [glusterd-store.c:1905:glusterd_store_retrieve_volume] 0-: Unknown key: brick-3
[2014-03-04 21:20:37.761785] I [glusterd-handler.c:2886:glusterd_friend_add] 0-management: connect returned 0
[2014-03-04 21:20:37.765059] I [glusterd-handler.c:2886:glusterd_friend_add] 0-management: connect returned 0
[2014-03-04 21:20:37.767677] I [glusterd-handler.c:2886:glusterd_friend_add] 0-management: connect returned 0
[2014-03-04 21:20:37.767783] I [rpc-clnt.c:974:rpc_clnt_connection_init] 0-management: setting frame-timeout to 600
[2014-03-04 21:20:37.767850] I [socket.c:3485:socket_init] 0-management: SSL support is NOT enabled
[2014-03-04 21:20:37.767866] I [socket.c:3500:socket_init] 0-management: using system polling thread
[2014-03-04 21:20:37.772356] I [rpc-clnt.c:974:rpc_clnt_connection_init] 0-management: setting frame-timeout to 600
[2014-03-04 21:20:37.772441] I [socket.c:3485:socket_init] 0-management: SSL support is NOT enabled
[2014-03-04 21:20:37.772459] I [socket.c:3500:socket_init] 0-management: using system polling thread
[2014-03-04 21:20:37.776131] I [rpc-clnt.c:974:rpc_clnt_connection_init] 0-management: setting frame-timeout to 600
[2014-03-04 21:20:37.776185] I [socket.c:3485:socket_init] 0-management: SSL support is NOT enabled
[2014-03-04 21:20:37.776201] I [socket.c:3500:socket_init] 0-management: using system polling thread
[2014-03-04 21:20:37.780363] E [glusterd-store.c:2548:glusterd_resolve_all_bricks] 0-glusterd: resolve brick failed in restore
[2014-03-04 21:20:37.780395] E [xlator.c:423:xlator_init] 0-management: Initialization of volume 'management' failed, review your volfile again
[2014-03-04 21:20:37.780410] E [graph.c:292:glusterfs_graph_init] 0-management: initializing translator failed
[2014-03-04 21:20:37.780422] E [graph.c:479:glusterfs_graph_activate] 0-graph: init failed
[2014-03-04 21:20:37.780723] W [glusterfsd.c:1097:cleanup_and_exit] (-->/usr/sbin/glusterd(main+0x6b1) [0x406a91] (-->/usr/sbin/glusterd(glusterfs_volumes_init+0xb7) [0x405247] (-->/usr/sbin/glusterd(glusterfs_process_volfp+0x106) [0x405156]))) 0-: received signum (0), shutting down
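For anyone skimming a log like the one above, here is a minimal Python sketch that keeps only the error (E) and critical (C) entries, which is where "resolve brick failed in restore" shows up. The line layout it assumes ([timestamp] LEVEL [file:line:function] message) is inferred from this excerpt only, not from any GlusterFS specification, and the names LOG_LINE, serious_entries and SAMPLE are made up for illustration.

```python
import re

# Layout inferred from the log excerpt in this thread (an assumption):
# [timestamp] LEVEL [file:line:function] message
LOG_LINE = re.compile(
    r"^\[(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d+)\] "
    r"(?P<level>[A-Z]) "
    r"\[(?P<src>[^:\]]+):(?P<line>\d+):(?P<func>[^\]]+)\] "
    r"(?P<msg>.*)$"
)


def serious_entries(lines, levels=("E", "C")):
    """Return (timestamp, level, function, message) for matching levels."""
    found = []
    for raw in lines:
        match = LOG_LINE.match(raw.strip())
        if match and match.group("level") in levels:
            found.append(
                (
                    match.group("ts"),
                    match.group("level"),
                    match.group("func"),
                    match.group("msg"),
                )
            )
    return found


# A few lines copied from the log above, one entry per line.
SAMPLE = [
    "[2014-03-04 21:20:30.539331] I [glusterd.c:1020:init] 0-management: Using /var/lib/glusterd as working directory",
    "[2014-03-04 21:20:30.543203] C [rdma.c:4099:gf_rdma_init] 0-rpc-transport/rdma: Failed to get IB devices",
    "[2014-03-04 21:20:30.543471] W [rpcsvc.c:1387:rpcsvc_transport_create] 0-rpc-service: cannot create listener, initing the transport failed",
    "[2014-03-04 21:20:37.780363] E [glusterd-store.c:2548:glusterd_resolve_all_bricks] 0-glusterd: resolve brick failed in restore",
]

if __name__ == "__main__":
    for ts, level, func, msg in serious_entries(SAMPLE):
        print(ts, level, func, msg)
```

To run it against a real file, pass the open file object instead of SAMPLE; entries that wrap across several lines would need to be rejoined first.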
Carlos Capriotti
2014-Mar-04 22:29 UTC
[Gluster-users] glusterd service fails to start from AWS AMI
I don't want to sound simplistic, but this seems to be name resolution/network related.

Again, I DO know your email ends with redhat.com, but just to make sure, what distro is Gluster running on? I have never dealt with Amazon's platform, so ignorance here is abundant.

The reason I am asking is that I am stress-testing my first (on-premises) install, and I ran into a problem that I am choosing to ignore for now but will have to solve in the future: DNS resolution stops working after a while.

I am using CentOS 6.5, with Gluster 3.4.2. I have a bonded NIC, made out of two physical ones, and a third NIC for management.

I realized that, despite the fact that I have manually configured all interfaces, disabled user control (maybe this is it), disabled NM access to them, and even tried to update resolv.conf, name resolution does not work after a reboot. While the NICs were working with NM and/or DHCP, all went fine, but after tailoring my ifcfg-* files, DNS went south.

You said your name resolution does work. Maybe add an entry to your hosts file just to test?

Another thought would be using 3.4.2 instead of 3.4.0.

Just wanted to share.

KR,
Carlos
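A quick way to try the name-resolution test suggested above is to ask the system resolver (which normally consults the hosts file as well as DNS) about each peer name from the affected node. The sketch below is only an illustration: check_resolution is a made-up helper, and the real peer hostnames are not given in this thread, so the demo resolves localhost only.

```python
import socket


def check_resolution(hostnames, resolver=socket.gethostbyname):
    """Map each hostname to its resolved IPv4 address, or None on failure."""
    results = {}
    for name in hostnames:
        try:
            results[name] = resolver(name)
        except OSError:
            # socket.gaierror (name lookup failure) is a subclass of OSError.
            results[name] = None
    return results


if __name__ == "__main__":
    # Replace "localhost" with the peer names reported by 'gluster peer status'.
    for name, addr in check_resolution(["localhost"]).items():
        print(f"{name}: {addr if addr else 'DOES NOT RESOLVE'}")
```

Any name that comes back as None on Node2 but resolves on Node1 would be a candidate for a temporary hosts file entry.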