>> I'm still trying to figure out why the self-heal-daemon doesn't seem to be
>> working, and what "unable to get index-dir" means. Any advice on what to
>> look at would be appreciated. Thanks!

> At any point did you have one node with 3.7.6 and another in 3.7.8 version?

Yes. I upgraded each server in turn by stopping all gluster server and
client processes on a machine, installing the update, and restarting
everything on that machine. Then I waited for "heal info" on all volumes to
say it had nothing to work on before proceeding to the next server.

> I couldn't find information about the setup (number of nodes, vol info etc).

I wasn't sure how much to spam the list :-)

There are 3 servers in the gluster pool.

amillar at chunxy:~$ sudo gluster pool list
UUID					Hostname	State
c1531143-229e-44a8-9fbb-089769ec999d	pve4		Connected
e82b0dc0-a490-47e4-bd11-6dcfcd36fd62	pve3		Connected
0422e779-7219-4392-ad8d-6263af4372fa	localhost	Connected

Example volume:

amillar at chunxy:~$ sudo gluster vol info public

Volume Name: public
Type: Replicate
Volume ID: f75bbc7b-5db7-4497-b14c-c0433a84bcd9
Status: Started
Number of Bricks: 1 x 3 = 3
Transport-type: tcp
Bricks:
Brick1: chunxy:/data/glfs23/public
Brick2: pve4:/data/glfs240/public
Brick3: pve3:/data/glfs830/public
Options Reconfigured:
cluster.metadata-self-heal: on
cluster.data-self-heal: on
cluster.entry-self-heal: on
cluster.self-heal-daemon: enable
performance.readdir-ahead: on

amillar at chunxy:~$ sudo gluster vol status public
Status of volume: public
Gluster process                             TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------------
Brick chunxy:/data/glfs23/public            49184     0          Y       13119
Brick pve4:/data/glfs240/public             49168     0          Y       837
Brick pve3:/data/glfs830/public             49167     0          Y       21074
NFS Server on localhost                     2049      0          Y       10237
Self-heal Daemon on localhost               N/A       N/A        Y       10243
NFS Server on pve3                          2049      0          Y       30177
Self-heal Daemon on pve3                    N/A       N/A        Y       30183
NFS Server on pve4                          2049      0          Y       13070
Self-heal Daemon on pve4                    N/A       N/A        Y       13069

Task Status of Volume public
------------------------------------------------------------------------------
There are no active volume tasks

amillar at chunxy:~$ sudo gluster vol heal public info
Brick chunxy:/data/glfs23/public
Number of entries: 0

Brick pve4:/data/glfs240/public
Number of entries: 0

Brick pve3:/data/glfs830/public
Number of entries: 0

amillar at chunxy:~$ sudo grep public /var/log/glusterfs/glustershd.log | tail -10
[2016-03-01 18:38:41.006543] W [MSGID: 108034] [afr-self-heald.c:445:afr_shd_index_sweep] 0-public-replicate-0: unable to get index-dir on public-client-1
[2016-03-01 18:48:42.001580] W [MSGID: 108034] [afr-self-heald.c:445:afr_shd_index_sweep] 0-public-replicate-0: unable to get index-dir on public-client-1
[2016-03-01 18:58:43.001631] W [MSGID: 108034] [afr-self-heald.c:445:afr_shd_index_sweep] 0-public-replicate-0: unable to get index-dir on public-client-1
[2016-03-01 19:08:43.002149] W [MSGID: 108034] [afr-self-heald.c:445:afr_shd_index_sweep] 0-public-replicate-0: unable to get index-dir on public-client-1
[2016-03-01 19:18:44.001444] W [MSGID: 108034] [afr-self-heald.c:445:afr_shd_index_sweep] 0-public-replicate-0: unable to get index-dir on public-client-1
[2016-03-01 19:28:44.001444] W [MSGID: 108034] [afr-self-heald.c:445:afr_shd_index_sweep] 0-public-replicate-0: unable to get index-dir on public-client-1
[2016-03-01 19:38:44.001445] W [MSGID: 108034] [afr-self-heald.c:445:afr_shd_index_sweep] 0-public-replicate-0: unable to get index-dir on public-client-1
[2016-03-01 19:48:44.001555] W [MSGID: 108034] [afr-self-heald.c:445:afr_shd_index_sweep] 0-public-replicate-0: unable to get index-dir on public-client-1
[2016-03-01 19:58:44.001405] W [MSGID: 108034] [afr-self-heald.c:445:afr_shd_index_sweep] 0-public-replicate-0: unable to get index-dir on public-client-1
[2016-03-01 20:08:44.001444] W [MSGID: 108034] [afr-self-heald.c:445:afr_shd_index_sweep] 0-public-replicate-0: unable to get index-dir on public-client-1

in glustershd.log:
39: volume public-client-1
40:     type protocol/client
41:     option clnt-lk-version 1
42:     option volfile-checksum 0
43:     option volfile-key gluster/glustershd
44:     option client-version 3.7.8
45:     option process-uuid chunxy-10236-2016/03/01-15:38:17:614674-public-client-1-0-0
46:     option fops-version 1298437
47:     option ping-timeout 42
48:     option remote-host chunxy
49:     option remote-subvolume /data/glfs23/public
50:     option transport-type socket
51:     option username 1aafe2d7-79ea-46b8-9d3e-XXXXXXXXXXXX
52:     option password 7aaa0ce0-2a56-45e1-bc0e-YYYYYYYYYYYY
53: end-volume

What other information would help? Thanks!
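[Editorial note: one quick way to narrow down an "unable to get index-dir" warning is to look for the self-heal index directories directly on each brick. The sketch below is an assumption: the `.glusterfs/indices/...` paths follow the typical 3.7.x brick layout and may differ on your build, and the `check_index_dirs` helper is hypothetical, not a gluster tool.]

```shell
# Check whether a brick carries the index directories the self-heal
# daemon sweeps. NOTE: the exact paths are an assumption based on the
# 3.7.x brick layout; the "dirty" index was only added in 3.7.7, so it
# is expected to be absent on bricks created by older versions.
check_index_dirs() {
  brick="$1"
  for d in "$brick/.glusterfs/indices/xattrop" "$brick/.glusterfs/indices/dirty"; do
    if [ -d "$d" ]; then
      echo "present: $d"
    else
      echo "MISSING: $d"
    fi
  done
}

# Example (run on the server that hosts the brick):
#   check_index_dirs /data/glfs23/public
```

If an index turns out to be missing or stale, `gluster volume heal <vol> full` walks the bricks instead of relying on the index, and `gluster volume heal <vol> info` shows what is still pending.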
Anuradha Talur
2016-Mar-02 09:17 UTC
[Gluster-users] Broken after 3.7.8 upgrade from 3.7.6
----- Original Message -----
> From: "Alan Millar" <grunthos503 at yahoo.com>
> To: "Anuradha Talur" <atalur at redhat.com>
> Cc: gluster-users at gluster.org
> Sent: Wednesday, March 2, 2016 2:00:49 AM
> Subject: Re: [Gluster-users] Broken after 3.7.8 upgrade from 3.7.6
>
> >> I'm still trying to figure out why the self-heal-daemon doesn't seem to be
> >> working, and what "unable to get index-dir" means. Any advice on what to
> >> look at would be appreciated. Thanks!
> >
> > At any point did you have one node with 3.7.6 and another in 3.7.8 version?
>
> Yes. I upgraded each server in turn by stopping all gluster server and
> client processes on a machine, installing the update, and restarting
> everything on that machine. Then I waited for "heal info" on all volumes to
> say it had nothing to work on before proceeding to the next server.
>
Thanks for the information. The "unable to get index-dir on .." messages you
saw in the log are not harmful in this scenario. A simple explanation: when
you have 1 new node and 2 old nodes, the self-heal-daemon and heal commands
run on the new node expect the index-dir "<brickpath>/.glusterfs/xattrop/dirty"
to exist so they can process entries from it. But that directory doesn't exist
on the old bricks (it was introduced as part of 3.7.7). As a result you see
these logs.

But this doesn't explain why heal didn't happen on replacing the brick. :-/
After replacing, did you check volume heal info to know the status of heal?

> > I couldn't find information about the setup (number of nodes, vol info etc).
>
> I wasn't sure how much to spam the list :-)
>
> There are 3 servers in the gluster pool.
> [...]

-- 
Thanks,
Anuradha.
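[Editorial note: since the same warning repeats every ten minutes, it can help to summarize glustershd.log per client subvolume and then match a name like public-client-1 against the "remote-host"/"remote-subvolume" options in the volfile dump shown earlier in the thread. The `summarize_index_warnings` helper below is a hypothetical sketch, not part of gluster.]

```shell
# Count "unable to get index-dir" warnings per client subvolume in a
# glustershd.log. The client name in the last field (e.g. public-client-1)
# can then be mapped to a brick via the remote-host/remote-subvolume
# options in the glustershd volfile dump.
summarize_index_warnings() {
  grep 'unable to get index-dir on' "$1" \
    | sed 's/.*unable to get index-dir on //' \
    | sort | uniq -c | sort -rn
}

# Example:
#   summarize_index_warnings /var/log/glusterfs/glustershd.log
```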