Marcus Bointon
2012-Jan-18 09:54 UTC
[Gluster-users] Auto-mount on boot failing, works later [SOLVED]
On 17 Jan 2012, at 20:59, John Mark Walker wrote:

> Marcus - which builds are you using? This is a known issue. See this article:
>
> http://community.gluster.org/a/howto-mount-glusterfs-volumes-on-servers-at-boot-time-ubuntu/

I'm using the stock builds. That's exactly the kind of 'proper' fix I was looking for, so thanks for the pointer. However, it still doesn't mount, though at least it doesn't hang on boot with that in place. I suspect the "mounting TYPE=glusterfs" may mean that it won't apply to nfs mounts. In theory they will be handled by the _netdev option, though that combination causes a hang for me. For now I've switched to a native mount.

This is what I see in the gluster log for this volume on boot:

[2012-01-18 08:47:10.195918] I [glusterfsd.c:1493:main] 0-/usr/sbin/glusterfs: Started running /usr/sbin/glusterfs version 3.2.5
[2012-01-18 08:47:10.265797] E [common-utils.c:125:gf_resolve_ip6] 0-resolver: getaddrinfo failed (Name or service not known)
[2012-01-18 08:47:10.265916] E [name.c:253:af_inet_client_get_remote_sockaddr] 0-glusterfs: DNS resolution failed on host 127.0.0.1
[2012-01-18 08:47:10.266010] E [glusterfsd-mgmt.c:740:mgmt_rpc_notify] 0-glusterfsd-mgmt: failed to connect with remote-host: Success
[2012-01-18 08:47:10.394013] W [glusterfsd.c:727:cleanup_and_exit] (-->/usr/sbin/glusterfs(glusterfs_mgmt_init+0x1d0) [0x407d50] (-->/usr/lib/libgfrpc.so.0(rpc_clnt_start+0x12) [0x7fa73ab99c72] (-->/usr/sbin/glusterfs() [0x407fbf]))) 0-: received signum (1), shutting down
[2012-01-18 08:47:10.394097] I [fuse-bridge.c:3727:fini] 0-fuse: Unmounting '/var/lib/sitedata/aegir'.

The good news here is you can see that gluster is starting before it tries to mount, so the init script is doing its job. I'm not sure that the error saying that it failed successfully is helpful!
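For illustration only, here is a small Python sketch of how a literal address like the 127.0.0.1 in the log above can be handled without involving a resolver at all. It is not what GlusterFS does internally, and the helper names are made up for this example. It relies on the standard AI_NUMERICHOST flag, which tells getaddrinfo to treat the string purely as a numeric address and to fail straight away for anything that would need a name lookup.

```python
import socket


def resolve_numeric(host):
    """Return the address for a literal IP string without any name lookup.

    Hypothetical helper for illustration; raises socket.gaierror if
    `host` is a name rather than a numeric address.
    """
    infos = socket.getaddrinfo(
        host,
        None,                       # no service/port needed for this check
        socket.AF_UNSPEC,           # accept IPv4 or IPv6 literals
        socket.SOCK_STREAM,
        0,
        socket.AI_NUMERICHOST,      # parse only, never consult a resolver
    )
    # Each entry is (family, type, proto, canonname, sockaddr);
    # sockaddr[0] is the address string.
    return infos[0][4][0]


def is_ip_literal(host):
    """True if `host` is already a numeric address (hypothetical helper)."""
    try:
        resolve_numeric(host)
        return True
    except socket.gaierror:
        return False


if __name__ == "__main__":
    for candidate in ("127.0.0.1", "localhost"):
        print(candidate, is_ip_literal(candidate))
```

A client that ran a check like this first would only need a working resolver when it had actually been given a hostname.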
After boot I do a mount -a and it does this instead (and mounts successfully):

[2012-01-18 08:49:51.159713] I [glusterfsd.c:1493:main] 0-/usr/sbin/glusterfs: Started running /usr/sbin/glusterfs version 3.2.5
[2012-01-18 08:49:51.190783] W [write-behind.c:3023:init] 0-shared-write-behind: disabling write-behind for first 0 bytes
[2012-01-18 08:49:51.197977] I [client.c:1935:notify] 0-shared-client-0: parent translators are ready, attempting connect on transport
[2012-01-18 08:49:51.198390] I [client.c:1935:notify] 0-shared-client-1: parent translators are ready, attempting connect on transport
Given volfile:
[...]
[2012-01-18 08:49:51.199287] I [rpc-clnt.c:1536:rpc_clnt_reconfig] 0-shared-client-1: changing port to 24009 (from 0)
[2012-01-18 08:49:51.199569] I [rpc-clnt.c:1536:rpc_clnt_reconfig] 0-shared-client-0: changing port to 24009 (from 0)
[2012-01-18 08:49:55.177048] I [client-handshake.c:1090:select_server_supported_programs] 0-shared-client-1: Using Program GlusterFS 3.2.5, Num (1298437), Version (310)
[2012-01-18 08:49:55.177384] I [client-handshake.c:913:client_setvolume_cbk] 0-shared-client-1: Connected to 192.168.0.2:24009, attached to remote volume '/var/shared'.
[2012-01-18 08:49:55.177409] I [afr-common.c:3141:afr_notify] 0-shared-replicate-0: Subvolume 'shared-client-1' came back up; going online.
[2012-01-18 08:49:55.178043] I [client-handshake.c:1090:select_server_supported_programs] 0-shared-client-0: Using Program GlusterFS 3.2.5, Num (1298437), Version (310)
[2012-01-18 08:49:55.178602] I [client-handshake.c:913:client_setvolume_cbk] 0-shared-client-0: Connected to 192.168.0.3:24009, attached to remote volume '/var/shared'.
[2012-01-18 08:49:55.186862] I [fuse-bridge.c:3339:fuse_graph_setup] 0-fuse: switched to graph 0
[2012-01-18 08:49:55.187028] I [fuse-bridge.c:2927:fuse_init] 0-glusterfs-fuse: FUSE inited with protocol versions: glusterfs 7.13 kernel 7.16
[2012-01-18 08:49:55.187763] I [afr-common.c:1520:afr_set_root_inode_on_first_lookup] 0-shared-replicate-0: added root inode

So it appears that the thing that needs fixing to make this work properly is that spurious lookup for 127.0.0.1. Now it occurred to me that I'm not running a local DNS server that could provide that service (rather than relying on remote resolution), so I installed dnsmasq and it worked!

Given that I had previously had my nfs mount hang on the same kind of dns failure, I tried that too, and that does still hang on the name lookup. This can probably be fixed by tweaking the upstart options for dnsmasq to make it start earlier, or making mountall depend on dns (which seems wrong).

All of this is a workaround of course - it shouldn't be doing DNS lookups for IP addresses in the first place.

I had a play with autofs as well, but it feels like a workaround and I had trouble getting it to work, though I didn't persevere after I saw this suggestion.

Marcus
-- 
Marcus Bointon
Synchromedia Limited: Creators of http://www.smartmessages.net/
UK info at hand CRM solutions
marcus at synchromedia.co.uk | http://www.synchromedia.co.uk/