Rama Shenai
2016-Oct-21 18:05 UTC
[Gluster-users] remote operation failed errors on Glusterfs 3.7.15
Hi Gluster team & users,

We are seeing multiple instances of the following error on our gluster clients:

    remote operation failed [No such file or directory]

It affects cases where files hosted on the volume are open or memory-mapped. The errors began after we added another brick to a replica 2 gluster volume a couple of days ago, making it a volume backed by three replicated bricks. Any information on this error would be useful. If needed, we can supply any of the client or brick logs.

Excerpt from the client log (grep byte offsets stripped, one entry per line):

[2016-10-21 14:50:07.806214] I [dict.c:473:dict_get] (-->/usr/lib/x86_64-linux-gnu/glusterfs/3.7.15/xlator/debug/io-stats.so(io_stats_lookup_cbk+0x148) [0x7f68a0cc5f68] -->/usr/lib/x86_64-linux-gnu/glusterfs/3.7.15/xlator/system/posix-acl.so(posix_acl_lookup_cbk+0x284) [0x7f68a0aada94] -->/usr/lib/x86_64-linux-gnu/libglusterfs.so.0(dict_get+0xac) [0x7f68a7f30dbc] ) 6-dict: !this || key=system.posix_acl_default [Invalid argument]
[2016-10-21 14:50:07.837879] I [dict.c:473:dict_get] (-->/usr/lib/x86_64-linux-gnu/glusterfs/3.7.15/xlator/debug/io-stats.so(io_stats_lookup_cbk+0x148) [0x7f68a0cc5f68] -->/usr/lib/x86_64-linux-gnu/glusterfs/3.7.15/xlator/system/posix-acl.so(posix_acl_lookup_cbk+0x230) [0x7f68a0aada40] -->/usr/lib/x86_64-linux-gnu/libglusterfs.so.0(dict_get+0xac) [0x7f68a7f30dbc] ) 6-dict: !this || key=system.posix_acl_access [Invalid argument]
[2016-10-21 14:50:07.837928] I [dict.c:473:dict_get] (-->/usr/lib/x86_64-linux-gnu/glusterfs/3.7.15/xlator/debug/io-stats.so(io_stats_lookup_cbk+0x148) [0x7f68a0cc5f68] -->/usr/lib/x86_64-linux-gnu/glusterfs/3.7.15/xlator/system/posix-acl.so(posix_acl_lookup_cbk+0x284) [0x7f68a0aada94] -->/usr/lib/x86_64-linux-gnu/libglusterfs.so.0(dict_get+0xac) [0x7f68a7f30dbc] ) 6-dict: !this || key=system.posix_acl_default [Invalid argument]
[2016-10-21 14:50:10.784317] W [MSGID: 114031] [client-rpc-fops.c:3057:client3_3_readv_cbk] 6-volume1-client-1: remote operation failed [No such file or directory]
[2016-10-21 14:50:10.784757] W [MSGID: 114031] [client-rpc-fops.c:1572:client3_3_fstat_cbk] 6-volume1-client-0: remote operation failed [No such file or directory]
[2016-10-21 14:50:10.784763] W [MSGID: 114031] [client-rpc-fops.c:1572:client3_3_fstat_cbk] 6-volume1-client-1: remote operation failed [No such file or directory]
[2016-10-21 14:50:10.785575] W [MSGID: 114031] [client-rpc-fops.c:1572:client3_3_fstat_cbk] 6-volume1-client-2: remote operation failed [No such file or directory]
[2016-10-21 14:50:10.786208] W [MSGID: 108008] [afr-read-txn.c:244:afr_read_txn] 6-volume1-replicate-0: Unreadable subvolume -1 found with event generation 3 for gfid 10495074-82d0-4961-8212-5a4f32895f37. (Possible split-brain)
[2016-10-21 14:50:10.787439] W [MSGID: 114031] [client-rpc-fops.c:1572:client3_3_fstat_cbk] 6-volume1-client-2: remote operation failed [No such file or directory]
[2016-10-21 14:50:10.788730] E [MSGID: 109040] [dht-helper.c:1190:dht_migration_complete_check_task] 6-volume1-dht: (null): failed to lookup the file on volume1-dht [Stale file handle]
[2016-10-21 14:50:10.788778] W [fuse-bridge.c:2227:fuse_readv_cbk] 0-glusterfs-fuse: 622070230: READ => -1 gfid=10495074-82d0-4961-8212-5a4f32895f37 fd=0x7f68951a75bc (Stale file handle)
The message "W [MSGID: 114031] [client-rpc-fops.c:1572:client3_3_fstat_cbk] 6-volume1-client-1: remote operation failed [No such file or directory]" repeated 2 times between [2016-10-21 14:50:10.784763] and [2016-10-21 14:50:10.789213]
[2016-10-21 14:50:10.790080] W [MSGID: 114031] [client-rpc-fops.c:1572:client3_3_fstat_cbk] 6-volume1-client-2: remote operation failed [No such file or directory]
The message "W [MSGID: 114031] [client-rpc-fops.c:1572:client3_3_fstat_cbk] 6-volume1-client-0: remote operation failed [No such file or directory]" repeated 3 times between [2016-10-21 14:50:10.784757] and [2016-10-21 14:50:10.791118]
[2016-10-21 14:50:10.791176] W [MSGID: 114031] [client-rpc-fops.c:1572:client3_3_fstat_cbk] 6-volume1-client-1: remote operation failed [No such file or directory]
[2016-10-21 14:50:10.793395] W [fuse-bridge.c:2227:fuse_readv_cbk] 0-glusterfs-fuse: 622070238: READ => -1 gfid=10495074-82d0-4961-8212-5a4f32895f37 fd=0x7f68951a75bc (Stale file handle)
[2016-10-21 14:50:11.036804] I [dict.c:473:dict_get] (-->/usr/lib/x86_64-linux-gnu/glusterfs/3.7.15/xlator/debug/io-stats.so(io_stats_lookup_cbk+0x148) [0x7f68a0cc5f68] -->/usr/lib/x86_64-linux-gnu/glusterfs/3.7.15/xlator/system/posix-acl.so(posix_acl_lookup_cbk+0x230) [0x7f68a0aada40] -->/usr/lib/x86_64-linux-gnu/libglusterfs.so.0(dict_get+0xac) [0x7f68a7f30dbc] ) 6-dict: !this || key=system.posix_acl_access [Invalid argument]
[2016-10-21 14:50:11.036889] I [dict.c:473:dict_get] (-->/usr/lib/x86_64-linux-gnu/glusterfs/3.7.15/xlator/debug/io-stats.so(io_stats_lookup_cbk+0x148) [0x7f68a0cc5f68] -->/usr/lib/x86_64-linux-gnu/glusterfs/3.7.15/xlator/system/posix-acl.so(posix_acl_lookup_cbk+0x284) [0x7f68a0aada94] -->/usr/lib/x86_64-linux-gnu/libglusterfs.so.0(dict_get+0xac) [0x7f68a7f30dbc] ) 6-dict: !this || key=system.posix_acl_default [Invalid argument]
The message "W [MSGID: 108008] [afr-read-txn.c:244:afr_read_txn] 6-volume1-replicate-0: Unreadable subvolume -1 found with event generation 3 for gfid 10495074-82d0-4961-8212-5a4f32895f37. (Possible split-brain)" repeated 3 times between [2016-10-21 14:50:10.786208] and [2016-10-21 14:50:11.223498]
[2016-10-21 14:50:11.223949] W [MSGID: 114031] [client-rpc-fops.c:1572:client3_3_fstat_cbk] 6-volume1-client-0: remote operation failed [No such file or directory]
The message "W [MSGID: 114031] [client-rpc-fops.c:1572:client3_3_fstat_cbk] 6-volume1-client-2: remote operation failed [No such file or directory]" repeated 2 times between [2016-10-21 14:50:10.790080] and [2016-10-21 14:50:11.224945]
[2016-10-21 14:50:11.225264] W [MSGID: 108008] [afr-read-txn.c:244:afr_read_txn] 6-volume1-replicate-0: Unreadable subvolume -1 found with event generation 3 for gfid 10495074-82d0-4961-8212-5a4f32895f37. (Possible split-brain)
The message "W [MSGID: 114031] [client-rpc-fops.c:1572:client3_3_fstat_cbk] 6-volume1-client-1: remote operation failed [No such file or directory]" repeated 2 times between [2016-10-21 14:50:10.791176] and [2016-10-21 14:50:11.225783]
[2016-10-21 14:50:11.226648] W [MSGID: 114031] [client-rpc-fops.c:1572:client3_3_fstat_cbk] 6-volume1-client-2: remote operation failed [No such file or directory]
[2016-10-21 14:50:11.228115] W [fuse-bridge.c:2227:fuse_readv_cbk] 0-glusterfs-fuse: 622070413: READ => -1 gfid=10495074-82d0-4961-8212-5a4f32895f37 fd=0x7f68951a75bc (Stale file handle)
The message "W [MSGID: 114031] [client-rpc-fops.c:1572:client3_3_fstat_cbk] 6-volume1-client-0: remote operation failed [No such file or directory]" repeated 2 times between [2016-10-21 14:50:11.223949] and [2016-10-21 14:50:11.239505]
[2016-10-21 14:50:11.239646] W [MSGID: 114031] [client-rpc-fops.c:1572:client3_3_fstat_cbk] 6-volume1-client-1: remote operation failed [No such file or directory]
The message "W [MSGID: 108008] [afr-read-txn.c:244:afr_read_txn] 6-volume1-replicate-0: Unreadable subvolume -1 found with event generation 3 for gfid 10495074-82d0-4961-8212-5a4f32895f37. (Possible split-brain)" repeated 2 times between [2016-10-21 14:50:11.225264] and [2016-10-21 14:50:11.241102]
[2016-10-21 14:50:11.241441] W [MSGID: 114031] [client-rpc-fops.c:1572:client3_3_fstat_cbk] 6-volume1-client-0: remote operation failed [No such file or directory]
[2016-10-21 14:50:11.243704] W [fuse-bridge.c:2227:fuse_readv_cbk] 0-glusterfs-fuse: 622070416: READ => -1 gfid=10495074-82d0-4961-8212-5a4f32895f37 fd=0x7f68951a75bc (Stale file handle)

Below is the current volume status/configuration:

$ sudo gluster volume status
Status of volume: volume1
Gluster process                                          TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------------
Brick ip-172-25-2-91.us-west-1.compute.internal:/data/glusterfs/volume1/brick1/brick
                                                         49152     0          Y       26520
Brick ip-172-25-2-206.us-west-1.compute.internal:/data/glusterfs/volume1/brick1/brick
                                                         49152     0          Y       17782
Brick ip-172-25-33-75.us-west-1.compute.internal:/data/glusterfs/volume1/brick1/brick
                                                         49152     0          Y       7225
NFS Server on localhost                                  2049      0          Y       7245
Self-heal Daemon on localhost                            N/A       N/A        Y       7253
NFS Server on ip-172-25-2-206.us-west-1.compute.internal 2049      0          Y       17436
Self-heal Daemon on ip-172-25-2-206.us-west-1.compute.internal
                                                         N/A       N/A        Y       17456
NFS Server on ip-172-25-2-91.us-west-1.compute.internal  2049      0          Y       10576
Self-heal Daemon on ip-172-25-2-91.us-west-1.compute.internal
                                                         N/A       N/A        Y       10610

Task Status of Volume volume1
------------------------------------------------------------------------------
There are no active volume tasks

$ sudo gluster volume info
Volume Name: volume1
Type: Replicate
Volume ID: 3bcca83e-2be5-410c-9a23-b159f570ee7e
Status: Started
Number of Bricks: 1 x 3 = 3
Transport-type: tcp
Bricks:
Brick1: ip-172-25-2-91.us-west-1.compute.internal:/data/glusterfs/volume1/brick1/brick
Brick2: ip-172-25-2-206.us-west-1.compute.internal:/data/glusterfs/volume1/brick1/brick
Brick3: ip-172-25-33-75.us-west-1.compute.internal:/data/glusterfs/volume1/brick1/brick  <-- brick added a couple of days ago
Options Reconfigured:
cluster.quorum-type: fixed
cluster.quorum-count: 2

From the client log (mnt-repos-volume1.log.1), the volume graph:

  1: volume volume1-client-0
  2:     type protocol/client
  3:     option clnt-lk-version 1
  4:     option volfile-checksum 0
  5:     option volfile-key /volume1
  6:     option client-version 3.7.15
  7:     option process-uuid production-collab-8-18739-2016/10/04-20:46:19:350684-volume1-client-0-6-0
  8:     option fops-version 1298437
  9:     option ping-timeout 42
 10:     option remote-host ip-172-25-2-91.us-west-1.compute.internal
 11:     option remote-subvolume /data/glusterfs/volume1/brick1/brick
 12:     option transport-type socket
 13:     option send-gids true
 14: end-volume
 15:
 16: volume volume1-client-1
 17:     type protocol/client
 18:     option ping-timeout 42
 19:     option remote-host ip-172-25-2-206.us-west-1.compute.internal
 20:     option remote-subvolume /data/glusterfs/volume1/brick1/brick
 21:     option transport-type socket
 22:     option send-gids true
 23: end-volume
 24:
 25: volume volume1-client-2
 26:     type protocol/client
 27:     option ping-timeout 42
 28:     option remote-host ip-172-25-33-75.us-west-1.compute.internal
 29:     option remote-subvolume /data/glusterfs/volume1/brick1/brick
 30:     option transport-type socket
 31:     option send-gids true
 32: end-volume
 33:
 34: volume volume1-replicate-0
 35:     type cluster/replicate
 36:     option quorum-type fixed
 37:     option quorum-count 2
 38:     subvolumes volume1-client-0 volume1-client-1 volume1-client-2
 39: end-volume
 40:
 41: volume volume1-dht
 42:     type cluster/distribute
 43:     subvolumes volume1-replicate-0
 44: end-volume
 45:
 46: volume volume1-write-behind
 47:     type performance/write-behind
 48:     subvolumes volume1-dht
 49: end-volume
 50:
 51: volume volume1-read-ahead
 52:     type performance/read-ahead
 53:     subvolumes volume1-write-behind
 54: end-volume
 55:
 56: volume volume1-io-cache
 57:     type performance/io-cache
 58:     subvolumes volume1-read-ahead
 59: end-volume
 60:
 61: volume volume1-quick-read
 62:     type performance/quick-read
 63:     subvolumes volume1-io-cache
 64: end-volume
 65:
 66: volume volume1-open-behind
 67:     type performance/open-behind
 68:     subvolumes volume1-quick-read
 69: end-volume
 70:
 71: volume volume1-md-cache
 72:     type performance/md-cache
 73:     option cache-posix-acl true
 74:     subvolumes volume1-open-behind
 75: end-volume
 76:
 77: volume volume1
 78:     type debug/io-stats
 79:     option log-level INFO
 80:     option latency-measurement off
 81:     option count-fop-hits off
 82:     subvolumes volume1-md-cache
 83: end-volume
 84:
 85: volume posix-acl-autoload
 86:     type system/posix-acl
 87:     subvolumes volume1
 88: end-volume
 89:
 90: volume meta-autoload
 91:     type meta
 92:     subvolumes posix-acl-autoload
 93: end-volume
 94: +------------------------------------------------------------------------------+

Thanks,
Rama
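
P.S. For reference, here is roughly how the brick addition was done and how we have been checking heal state afterwards. The add-brick command is reconstructed from memory, so the exact invocation may have differed slightly; the heal commands are the standard Gluster CLI and we can re-run them and attach the output if that helps:

```
# Presumed command that grew the replica 2 volume to replica 3
# (host/path match the Brick3 entry in "gluster volume info" above)
sudo gluster volume add-brick volume1 replica 3 \
    ip-172-25-33-75.us-west-1.compute.internal:/data/glusterfs/volume1/brick1/brick

# Check whether entries are still pending heal to the new brick --
# files not yet copied to Brick3 could plausibly return ENOENT there
sudo gluster volume heal volume1 info
sudo gluster volume heal volume1 info split-brain
```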