Michael Bushey
2015-Feb-25 19:52 UTC
[Gluster-users] Upgrading from 3.6.2-1 to 3.6.2-2 causes "failed to get the 'volume file' from server"
On a Debian testing glusterfs cluster, one node of six (web1) was
upgraded from 3.6.2-1 to 3.6.2-2. Everything looks good on the server
side, and gdash looks happy. The problem is that this node is no longer
able to mount the volumes. The server config is managed with Ansible,
so the nodes should be consistent.

web1# mount -t glusterfs localhost:/site-private
Mount failed. Please check the log file for more details.

web1# gluster volume info site-private

Volume Name: site-private
Type: Distributed-Replicate
Volume ID: 53cb154d-7e44-439f-b52c-ca10414327cb
Status: Started
Number of Bricks: 2 x 2 = 4
Transport-type: tcp
Bricks:
Brick1: web4:/var/gluster/site-private
Brick2: web5:/var/gluster/site-private
Brick3: web3:/var/gluster/site-private
Brick4: webw:/var/gluster/site-private
Options Reconfigured:
nfs.disable: on
auth.allow: 10.*

web1# gluster volume status site-private
Status of volume: site-private
Gluster process                                 Port    Online  Pid
------------------------------------------------------------------------------
Brick web4:/var/gluster/site-private            49152   Y       18544
Brick web5:/var/gluster/site-private            49152   Y       3460
Brick web3:/var/gluster/site-private            49152   Y       1171
Brick webw:/var/gluster/site-private            49152   Y       8954
Self-heal Daemon on localhost                   N/A     Y       1410
Self-heal Daemon on web3                        N/A     Y       6394
Self-heal Daemon on web5                        N/A     Y       3726
Self-heal Daemon on web4                        N/A     Y       18928
Self-heal Daemon on 10.0.0.22                   N/A     Y       3601
Self-heal Daemon on 10.0.0.153                  N/A     Y       23269

Task Status of Volume site-private
------------------------------------------------------------------------------
There are no active volume tasks

10.0.0.22 is web2, 10.0.0.153 is webw. It's irritating that gluster
intermittently swaps out some of the hostnames with IPs. Is there any
way to fix this inconsistency?

web1# tail -f /var/log/glusterfs/var-www-html-site.example.com-sites-default-private.log

[2015-02-25 00:49:14.294562] I [MSGID: 100030] [glusterfsd.c:2018:main] 0-/usr/sbin/glusterfs: Started running /usr/sbin/glusterfs version 3.6.2 (args: /usr/sbin/glusterfs --volfile-server=localhost --volfile-id=/site-private /var/www/html/site.example.com/sites/default/private)
[2015-02-25 00:49:14.303008] E [glusterfsd-mgmt.c:1494:mgmt_getspec_cbk] 0-glusterfs: failed to get the 'volume file' from server
[2015-02-25 00:49:14.303153] E [glusterfsd-mgmt.c:1596:mgmt_getspec_cbk] 0-mgmt: failed to fetch volume file (key:/site-private)
[2015-02-25 00:49:14.303595] W [glusterfsd.c:1194:cleanup_and_exit] (--> 0-: received signum (0), shutting down
[2015-02-25 00:49:14.303673] I [fuse-bridge.c:5599:fini] 0-fuse: Unmounting '/var/www/html/site.example.com/sites/default/private'.

These lines appear in /var/log/glusterfs/etc-glusterfs-glusterd.vol.log
about every 5 seconds:

[2015-02-25 01:00:04.312532] W [socket.c:611:__socket_rwv] 0-management: readv on /var/run/0ecb037a7fd562bf0d7ed973ccd33ed8.socket failed (Invalid argument)

Thanks in advance for your time/help. :)
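A quick way to narrow this down -- a sketch only, reusing the mountpoint,
volume name and hosts from the output above (web4 is just one of the brick
hosts; any reachable peer would do) -- is to compare package versions across
nodes and ask a different node's glusterd for the volfile instead of the
local one:

    # Compare installed gluster packages on web1 with a node that still mounts fine
    dpkg -l | grep -i glusterfs     # run on web1 and, say, web3, then diff the output

    # How does web1 currently see its peers (hostname vs. IP)?
    gluster peer status

    # Ask a different node's glusterd for the volfile, bypassing the local one.
    # These are the same arguments the failing mount used in the log above,
    # with only --volfile-server changed from localhost to web4.
    /usr/sbin/glusterfs --volfile-server=web4 --volfile-id=/site-private \
        /var/www/html/site.example.com/sites/default/private

    # If that mounts, the volume itself is fine and the problem is local to
    # web1's glusterd / the 3.6.2-2 packaging; unmount again with:
    umount /var/www/html/site.example.com/sites/default/private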
Niels de Vos
2015-Feb-26 07:59 UTC
[Gluster-users] Upgrading from 3.6.2-1 to 3.6.2-2 causes "failed to get the 'volume file' from server"
On Wed, Feb 25, 2015 at 11:52:02AM -0800, Michael Bushey wrote:
> On a Debian testing glusterfs cluster, one node of six (web1) was
> upgraded from 3.6.2-1 to 3.6.2-2. All looks good server side, and
> gdash looks happy. The problem is this node is no longer able to mount
> the volumes. The server config is in Ansible so the nodes should be
> consistent.

This sounds very much like this issue:
www.gluster.org/pipermail/gluster-users/2015-February/020781.html

We're now working on getting packagers of different distributions
aligned and better informed. In the future, packaging differences like
this should be identified earlier and issues prevented.

HTH,
Niels
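If the 3.6.2-2 packaging change really is the culprit, one possible
stop-gap on the affected node -- a sketch only, assuming the Debian
package names glusterfs-server/glusterfs-client/glusterfs-common and
that the 3.6.2-1 packages are still fetchable from the archive or the
local apt cache -- is to downgrade and hold them until a fixed revision
is uploaded:

    # On web1: roll the gluster packages back to the known-good revision
    apt-get install glusterfs-server=3.6.2-1 glusterfs-client=3.6.2-1 glusterfs-common=3.6.2-1

    # Keep apt from pulling 3.6.2-2 back in on the next upgrade
    apt-mark hold glusterfs-server glusterfs-client glusterfs-common

    # Once a fixed package lands, release the hold and upgrade normally
    apt-mark unhold glusterfs-server glusterfs-client glusterfs-common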