Michael Bushey
2015-Feb-25 19:52 UTC
[Gluster-users] Upgrading from 3.6.2-1 to 3.6.2-2 causes "failed to get the 'volume file' from server"
On a Debian testing glusterfs cluster, one node of six (web1) was
upgraded from 3.6.2-1 to 3.6.2-2. Everything looks good on the server
side, and gdash looks happy. The problem is that this node is no longer
able to mount the volumes. The server config is managed with Ansible,
so the nodes should be consistent.

web1# mount -t glusterfs localhost:/site-private
Mount failed. Please check the log file for more details.

web1# gluster volume info site-private

Volume Name: site-private
Type: Distributed-Replicate
Volume ID: 53cb154d-7e44-439f-b52c-ca10414327cb
Status: Started
Number of Bricks: 2 x 2 = 4
Transport-type: tcp
Bricks:
Brick1: web4:/var/gluster/site-private
Brick2: web5:/var/gluster/site-private
Brick3: web3:/var/gluster/site-private
Brick4: webw:/var/gluster/site-private
Options Reconfigured:
nfs.disable: on
auth.allow: 10.*

web1# gluster volume status site-private
Status of volume: site-private
Gluster process                                 Port    Online  Pid
------------------------------------------------------------------------------
Brick web4:/var/gluster/site-private            49152   Y       18544
Brick web5:/var/gluster/site-private            49152   Y       3460
Brick web3:/var/gluster/site-private            49152   Y       1171
Brick webw:/var/gluster/site-private            49152   Y       8954
Self-heal Daemon on localhost                   N/A     Y       1410
Self-heal Daemon on web3                        N/A     Y       6394
Self-heal Daemon on web5                        N/A     Y       3726
Self-heal Daemon on web4                        N/A     Y       18928
Self-heal Daemon on 10.0.0.22                   N/A     Y       3601
Self-heal Daemon on 10.0.0.153                  N/A     Y       23269

Task Status of Volume site-private
------------------------------------------------------------------------------
There are no active volume tasks

10.0.0.22 is web2, 10.0.0.153 is webw. It's irritating that gluster
intermittently swaps out some of the hostnames with IPs. Is there any
way to fix this inconsistency?

web1# tail -f /var/log/glusterfs/var-www-html-site.example.com-sites-default-private.log

[2015-02-25 00:49:14.294562] I [MSGID: 100030] [glusterfsd.c:2018:main] 0-/usr/sbin/glusterfs: Started running /usr/sbin/glusterfs version 3.6.2 (args: /usr/sbin/glusterfs --volfile-server=localhost --volfile-id=/site-private /var/www/html/site.example.com/sites/default/private)
[2015-02-25 00:49:14.303008] E [glusterfsd-mgmt.c:1494:mgmt_getspec_cbk] 0-glusterfs: failed to get the 'volume file' from server
[2015-02-25 00:49:14.303153] E [glusterfsd-mgmt.c:1596:mgmt_getspec_cbk] 0-mgmt: failed to fetch volume file (key:/site-private)
[2015-02-25 00:49:14.303595] W [glusterfsd.c:1194:cleanup_and_exit] (--> 0-: received signum (0), shutting down
[2015-02-25 00:49:14.303673] I [fuse-bridge.c:5599:fini] 0-fuse: Unmounting '/var/www/html/site.example.com/sites/default/private'.

These lines appear in /var/log/glusterfs/etc-glusterfs-glusterd.vol.log
about every 5 seconds:

[2015-02-25 01:00:04.312532] W [socket.c:611:__socket_rwv] 0-management: readv on /var/run/0ecb037a7fd562bf0d7ed973ccd33ed8.socket failed (Invalid argument)

Thanks in advance for your time/help. :)
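A quick way to narrow this down -- a sketch only, reusing the mountpoint,
volume name and hosts from the output above (web4 is just one of the brick
hosts; any reachable peer would do) -- is to compare package versions across
nodes and ask a different node's glusterd for the volfile instead of the
local one:

    # Compare installed gluster packages on web1 with a node that still mounts fine
    dpkg -l | grep -i glusterfs     # run on web1 and, say, web3, then diff the output

    # How does web1 currently see its peers (hostname vs. IP)?
    gluster peer status

    # Ask a different node's glusterd for the volfile, bypassing the local one.
    # These are the same arguments the failing mount used in the log above,
    # with only --volfile-server changed from localhost to web4.
    /usr/sbin/glusterfs --volfile-server=web4 --volfile-id=/site-private \
        /var/www/html/site.example.com/sites/default/private

    # If that mounts, the volume itself is fine and the problem is local to
    # web1's glusterd / the 3.6.2-2 packaging; unmount again with:
    umount /var/www/html/site.example.com/sites/default/private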
Niels de Vos
2015-Feb-26 07:59 UTC
[Gluster-users] Upgrading from 3.6.2-1 to 3.6.2-2 causes "failed to get the 'volume file' from server"
On Wed, Feb 25, 2015 at 11:52:02AM -0800, Michael Bushey wrote:
> On a Debian testing glusterfs cluster, one node of six (web1) was
> upgraded from 3.6.2-1 to 3.6.2-2. All looks good server side, and
> gdash looks happy. The problem is this node is no longer able to mount
> the volumes. The server config is in Ansible so the nodes should be
> consistent.

This sounds very much like this issue:
www.gluster.org/pipermail/gluster-users/2015-February/020781.html

We're now working on getting packagers of different distributions
aligned and better informed. In the future, packaging differences like
this should be identified earlier and issues prevented.

HTH,
Niels
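If the 3.6.2-2 packaging change really is the culprit, one possible
stop-gap on the affected node -- a sketch only, assuming the Debian
package names glusterfs-server/glusterfs-client/glusterfs-common and
that the 3.6.2-1 packages are still fetchable from the archive or the
local apt cache -- is to downgrade and hold them until a fixed revision
is uploaded:

    # On web1: roll the gluster packages back to the known-good revision
    apt-get install glusterfs-server=3.6.2-1 glusterfs-client=3.6.2-1 glusterfs-common=3.6.2-1

    # Keep apt from pulling 3.6.2-2 back in on the next upgrade
    apt-mark hold glusterfs-server glusterfs-client glusterfs-common

    # Once a fixed package lands, release the hold and upgrade normally
    apt-mark unhold glusterfs-server glusterfs-client glusterfs-common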