Dietmar Putz
2015-Dec-18 16:32 UTC
[Gluster-users] geo-replication 3.6.7 - no trusted.gfid on some slave nodes - stale file handle
Hello again...

After having some big trouble with an xfs issue in kernel 3.13.0-x and 3.19.0-39, which has been 'solved' by downgrading to 3.8.4 (http://comments.gmane.org/gmane.comp.file-systems.xfs.general/71629), we decided to start a new geo-replication attempt from scratch. We deleted the former geo-replication session and started a new one as described in:
http://www.gluster.org/community/documentation/index.php/Upgrade_to_3.6

Master and slave are both distributed replicated volumes running on gluster 3.6.7 / Ubuntu 14.04.
The setup worked as described, but unfortunately geo-replication isn't syncing files and remains in the status shown below.

In the ~geo-replication-slaves/...gluster.log I can find messages like the following on all slave nodes:

[2015-12-16 15:06:46.837748] W [dht-layout.c:180:dht_layout_search] 0-aut-wien-01-dht: no subvolume for hash (value) = 1448787070
[2015-12-16 15:06:46.837789] W [fuse-bridge.c:1261:fuse_err_cbk] 0-glusterfs-fuse: 74203: SETXATTR() /.gfid/d4815ee4-3348-4105-9136-d0219d956ed8 => -1 (No such file or directory)
[2015-12-16 15:06:47.090212] I [dht-layout.c:663:dht_layout_normalize] 0-aut-wien-01-dht: Found anomalies in (null) (gfid = d4815ee4-3348-4105-9136-d0219d956ed8). Holes=1 overlaps=0

[2015-12-16 20:25:55.327874] W [fuse-bridge.c:1967:fuse_create_cbk] 0-glusterfs-fuse: 199968: /.gfid/603de79d-8d41-44bd-845e-3727cf64a617 => -1 (Operation not permitted)
[2015-12-16 20:25:55.617016] W [fuse-bridge.c:1967:fuse_create_cbk] 0-glusterfs-fuse: 199971: /.gfid/8622fb7d-8909-42de-adb5-c67ed6f006c0 => -1 (Operation not permitted)

The following is found only on gluster-wien-03-int, which is in 'Hybrid Crawl':

[2015-12-16 17:17:07.219939] W [fuse-bridge.c:1261:fuse_err_cbk] 0-glusterfs-fuse: 123841: SETXATTR() /.gfid/00000000-0000-0000-0000-000000000001 => -1 (File exists)
[2015-12-16 17:17:07.220658] W [client-rpc-fops.c:306:client3_3_mkdir_cbk] 0-aut-wien-01-client-3: remote operation failed: File exists. Path: /2301
[2015-12-16 17:17:07.220702] W [client-rpc-fops.c:306:client3_3_mkdir_cbk] 0-aut-wien-01-client-2: remote operation failed: File exists. Path: /2301

But first of all I would like to have a look at this message, found about 6000 times on gluster-wien-05-int and ~07-int, which are in 'History Crawl':

[2015-12-16 13:03:25.658359] W [fuse-bridge.c:483:fuse_entry_cbk] 0-glusterfs-fuse: 119569: LOOKUP() /.gfid/d4815ee4-3348-4105-9136-d0219d956ed8/.dstXXXfDyaP9 => -1 (Stale file handle)

As shown, the gfid d4815ee4-3348-4105-9136-d0219d956ed8 (1050="d4815ee4-3348-4105-9136-d0219d956ed8") belongs to the folder 1050 in the brick directory.
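A minimal sketch of how the gfid-to-directory mapping can be checked directly on a brick (the brick path /gluster-export and the gfid are taken from above; the .glusterfs layout, where a directory's gfid appears as a symlink under .glusterfs/<xx>/<yy>/<gfid>, is assumed):

# read the gfid xattr of the suspected directory
GFID=d4815ee4-3348-4105-9136-d0219d956ed8
BRICK=/gluster-export
getfattr -n trusted.gfid -e hex "$BRICK/1050"

# directories are kept as symlinks under .glusterfs/<xx>/<yy>/<gfid>,
# pointing back to <parent-gfid>/<dirname>
ls -l "$BRICK/.glusterfs/${GFID:0:2}/${GFID:2:2}/$GFID"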
Any brick in the master volume looks like this one:

Host : gluster-ger-ber-12-int
# file: gluster-export/1050
trusted.afr.dirty=0x000000000000000000000000
trusted.afr.ger-ber-01-client-0=0x000000000000000000000000
trusted.afr.ger-ber-01-client-1=0x000000000000000000000000
trusted.afr.ger-ber-01-client-2=0x000000000000000000000000
trusted.afr.ger-ber-01-client-3=0x000000000000000000000000
trusted.gfid=0xd4815ee4334841059136d0219d956ed8
trusted.glusterfs.6a071cfa-b150-4f0b-b1ed-96ab5d4bd671.1c31dc4d-7ee3-423b-8577-c7b0ce2e356a.stime=0x56606290000c7e4e
trusted.glusterfs.6a071cfa-b150-4f0b-b1ed-96ab5d4bd671.xtime=0x567428e000042116
trusted.glusterfs.dht=0x000000010000000055555555aaaaaaa9

On the slave volume, only the bricks of wien-02 and wien-03 have the same trusted.gfid:

Host : gluster-wien-03
# file: gluster-export/1050
trusted.afr.aut-wien-01-client-0=0x000000000000000000000000
trusted.afr.aut-wien-01-client-1=0x000000000000000000000000
trusted.gfid=0xd4815ee4334841059136d0219d956ed8
trusted.glusterfs.6a071cfa-b150-4f0b-b1ed-96ab5d4bd671.xtime=0x5638bfb5000379c0
trusted.glusterfs.dht=0x00000001000000000000000055555554

None of the nodes in 'History Crawl' has this trusted.gfid assigned:

Host : gluster-wien-05
# file: gluster-export/1050
trusted.afr.aut-wien-01-client-2=0x000000000000000000000000
trusted.afr.aut-wien-01-client-3=0x000000000000000000000000
trusted.glusterfs.6a071cfa-b150-4f0b-b1ed-96ab5d4bd671.xtime=0x5638bfb5000379c0
trusted.glusterfs.dht=0x0000000100000000aaaaaaaaffffffff

I'm not sure whether this is normal or whether that trusted.gfid should have been assigned on all slave nodes by the slave-upgrade.sh script.

bash slave-upgrade.sh localhost:<aut-wien-01> /tmp/master_gfid_file.txt $PWD/gsync-sync-gfid

was run on wien-02, which has passwordless login to every other slave node. As I could see in the process list, slave-upgrade.sh was running on each slave node and, as far as I remember, starts with a 'rm -rf ~/.glusterfs/...'. So the mentioned gfid should have been removed by slave-upgrade.sh - but should the trusted.gfid also be re-assigned by the script?

I'm confused: is the 'Stale file handle' message caused by the missing trusted.gfid for /gluster-export/1050/ on the nodes where the message appears?
Does it make sense to stop geo-replication and run the slave-upgrade.sh script on the affected nodes, without having access to the other nodes, to fix this?

Currently I'm not sure whether the 'stale file handle' messages prevent us from getting a running geo-replication, but I guess the best way is to try to get it running step by step...
Any help is appreciated.
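For comparing the xattr across all slave bricks, this is roughly what I run from wien-02 (a sketch only; the -int hostnames are taken from the status output below and passwordless ssh to them is assumed):

# compare trusted.gfid of /gluster-export/1050 on every slave brick
for h in gluster-wien-02-int gluster-wien-03-int gluster-wien-04-int \
         gluster-wien-05-int gluster-wien-06-int gluster-wien-07-int; do
    echo "== $h =="
    ssh "$h" getfattr -n trusted.gfid -e hex /gluster-export/1050 2>&1
done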
Best regards
dietmar

[ 14:45:42 ] - root at gluster-ger-ber-07 /var/log/glusterfs/geo-replication/ger-ber-01
$ gluster volume geo-replication ger-ber-01 gluster-wien-02::aut-wien-01 status detail

MASTER NODE           MASTER VOL    MASTER BRICK       SLAVE                               STATUS     CHECKPOINT STATUS    CRAWL STATUS     FILES SYNCD    FILES PENDING    BYTES PENDING    DELETES PENDING    FILES SKIPPED
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
gluster-ger-ber-07    ger-ber-01    /gluster-export    gluster-wien-07-int::aut-wien-01    Active     N/A                  History Crawl    -6500          0                0                5                  6500
gluster-ger-ber-12    ger-ber-01    /gluster-export    gluster-wien-06-int::aut-wien-01    Passive    N/A                  N/A              0              0                0                0                  0
gluster-ger-ber-11    ger-ber-01    /gluster-export    gluster-wien-03-int::aut-wien-01    Active     N/A                  Hybrid Crawl     0              8191             0                0                  0
gluster-ger-ber-09    ger-ber-01    /gluster-export    gluster-wien-05-int::aut-wien-01    Active     N/A                  History Crawl    -5792          0                0                0                  5793
gluster-ger-ber-10    ger-ber-01    /gluster-export    gluster-wien-02-int::aut-wien-01    Passive    N/A                  N/A              0              0                0                0                  0
gluster-ger-ber-08    ger-ber-01    /gluster-export    gluster-wien-04-int::aut-wien-01    Passive    N/A                  N/A              0              0                0                0                  0

[ 14:45:46 ] - root at gluster-ger-ber-07 /var/log/glusterfs/geo-replication/ger-ber-01 $
Saravanakumar Arumugam
2015-Dec-21 07:08 UTC
[Gluster-users] geo-replication 3.6.7 - no trusted.gfid on some slave nodes - stale file handle
Hi,
Replies inline.

Thanks,
Saravana

On 12/18/2015 10:02 PM, Dietmar Putz wrote:

> [...]
>
> [2015-12-16 20:25:55.327874] W [fuse-bridge.c:1967:fuse_create_cbk] 0-glusterfs-fuse: 199968: /.gfid/603de79d-8d41-44bd-845e-3727cf64a617 => -1 (Operation not permitted)
> [2015-12-16 20:25:55.617016] W [fuse-bridge.c:1967:fuse_create_cbk] 0-glusterfs-fuse: 199971: /.gfid/8622fb7d-8909-42de-adb5-c67ed6f006c0 => -1 (Operation not permitted)

Please check whether SELinux is enabled on both master and slave. I remember seeing such errors when SELinux is enabled. (A quick way to check is sketched a few lines below, after the quoted excerpts.)

> The following is found only on gluster-wien-03-int, which is in 'Hybrid Crawl':
>
> [2015-12-16 17:17:07.219939] W [fuse-bridge.c:1261:fuse_err_cbk] 0-glusterfs-fuse: 123841: SETXATTR() /.gfid/00000000-0000-0000-0000-000000000001 => -1 (File exists)
> [2015-12-16 17:17:07.220658] W [client-rpc-fops.c:306:client3_3_mkdir_cbk] 0-aut-wien-01-client-3: remote operation failed: File exists. Path: /2301
> [2015-12-16 17:17:07.220702] W [client-rpc-fops.c:306:client3_3_mkdir_cbk] 0-aut-wien-01-client-2: remote operation failed: File exists. Path: /2301

Some errors like "File exists" can be ignored.
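A minimal sketch of the SELinux check mentioned above (the getenforce/sestatus commands are only available if the SELinux userland tools are installed on the node):

# prints Enforcing / Permissive / Disabled
getenforce

# more detailed report, if policycoreutils is installed
sestatus

# selinuxfs view: 1 = enforcing, 0 = permissive; the file is absent
# when SELinux is not loaded at all
cat /sys/fs/selinux/enforce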
> [...]
>
> I'm not sure whether this is normal or whether that trusted.gfid should have been assigned on all slave nodes by the slave-upgrade.sh script.

As per the doc, it applies the gfid on all slave nodes.

> bash slave-upgrade.sh localhost:<aut-wien-01> /tmp/master_gfid_file.txt $PWD/gsync-sync-gfid
>
> was run on wien-02, which has passwordless login to every other slave node. As I could see in the process list, slave-upgrade.sh was running on each slave node and, as far as I remember, starts with a 'rm -rf ~/.glusterfs/...'. So the mentioned gfid should have been removed by slave-upgrade.sh - but should the trusted.gfid also be re-assigned by the script?
>
> I'm confused: is the 'Stale file handle' message caused by the missing trusted.gfid for /gluster-export/1050/ on the nodes where the message appears?
> Does it make sense to stop geo-replication and run the slave-upgrade.sh script on the affected nodes, without having access to the other nodes, to fix this?
>
> Currently I'm not sure whether the 'stale file handle' messages prevent us from getting a running geo-replication, but I guess the best way is to try to get it running step by step...
> Any help is appreciated.
> Best regards
> dietmar
>
> [...]