Kaushal M
2016-May-05  09:56 UTC
[Gluster-users] GLUSTER fuse client, mounted volume becomes read only
On Thu, May 5, 2016 at 2:26 PM, Egidijus Ligeika <egidijus.ligeika at made.com> wrote:
> Hi Kaushal,
> This is the volume info:
>
> Volume Name: thevolume
> Type: Replicate
> Volume ID: da774a83-b426-42bd-b1ec-359b4e71314f
> Status: Started
> Number of Bricks: 1 x 3 = 3
> Transport-type: tcp
> Bricks:
> Brick1: gluster-a.mag-test-madeinternal.com:/gluster/brick
> Brick2: gluster-b.mag-test-madeinternal.com:/gluster/brick
> Brick3: gluster-c.mag-test-madeinternal.com:/gluster/brick
> Options Reconfigured:
> performance.quick-read: off
> network.ping-timeout: 30
> performance.cache-size: 1853171712
> performance.cache-refresh-timeout: 60
> performance.io-thread-count: 32
> performance.write-behind-window-size: 4MB
> diagnostics.client-log-level: WARNING
> diagnostics.brick-log-level: WARNING
> cluster.self-heal-daemon: on
> nfs.disable: true
> performance.readdir-ahead: on
>
>
> So my gluster FUSE clients are using not vanilla centos AMI, the AMI is
> one of the versions by bashton:
> https://www.bashton.com/blog/2015/centos-7-2-1511-ami/
>
> My gluster cluster server AMI is using, our in-house AMI, that has some
> sysadmin packages, epel and docker baked-in.
> I created a pretend client machine using our in-house AMI, I mounted the
> same cluster, the same way, with the same version of glusterfuse as on the
> broken client. Then I copied the same files again. The new pretend client
> had not a single issue. no issues in logs, everything completed
> successfully without problems.
>
> I think - the problem is solved with the in-house AMI.
>

I'm glad that you found an alternate solution to the problem.

> What can I share to help people investigate if it's the OS + package
> combination that might be breaking their gluster?
>

The bashton AMI has a different network driver. I'm not saying it could be the issue, but that is something that could be investigated.

Do you see any ping-timeouts in the log? You could look at the logs around the time you get the read-only mount for more information.

Also, what version of GlusterFS are you using?

>
> Regards,
>
> E.
>
> On Thu, 5 May 2016 at 07:40 Kaushal M <kshlmster at gmail.com> wrote:
>
>> On Wed, May 4, 2016 at 9:12 PM, Egidijus Ligeika <
>> egidijus.ligeika at made.com> wrote:
>>
>>> Hello,
>>>
>>> I have glusterfs client and server v3.7.11.
>>> The servers are running inside docker on debian image, on a centos host.
>>>
>>> I am using the fuse client mounting to a list of DNS a records.
>>> I use XFS as the underlying storage, inode size 512 and xfs is formatted
>>> with 512 inode size.
>>>
>>> I can see the client and the cluster is clean and happy, heals work.
>>> when writing tens of gigabytes to the cluster via the fuse client I see
>>> errors like this ON THE FUSE CLIENT /var/log/glusterfs/mountname:
>>>
>>> [2016-05-04 14:43:17.779936] W [rpc-clnt.c:1606:rpc_clnt_submit] 0-thevolume-client-1: failed to submit rpc-request (XID: 0xe75f8 Program: GlusterFS 3.3, ProgVers: 330, Proc: 27) to rpc-transport (thevolume-client-1)
>>> [2016-05-04 14:43:17.779958] W [MSGID: 114031] [client-rpc-fops.c:2974:client3_3_lookup_cbk] 0-thevolume-client-1: remote operation failed.
Path: / (00000000-0000-0000-0000-000000000001) [Transport endpoint is not connected]
>>> [2016-05-04 14:43:17.780382] E [MSGID: 114031] [client-rpc-fops.c:1676:client3_3_finodelk_cbk] 0-thevolume-client-2: remote operation failed [Transport endpoint is not connected]
>>> [2016-05-04 14:43:17.780525] E [rpc-clnt.c:362:saved_frames_unwind] (--> /lib64/libglusterfs.so.0(_gf_log_callingfn+0x192)[0x7f03bf304ae2] (--> /lib64/libgfrpc.so.0(saved_frames_unwind+0x1de)[0x7f03bf0cf90e] (--> /lib64/libgfrpc.so.0(saved_frames_destroy+0xe)[0x7f03bf0cfa1e] (--> /lib64/libgfrpc.so.0(rpc_clnt_connection_cleanup+0x7a)[0x7f03bf0d140a] (--> /lib64/libgfrpc.so.0(rpc_clnt_notify+0x88)[0x7f03bf0d1c38] ))))) 0-thevolume-client-1: forced unwinding frame type(GF-DUMP) op(NULL(2)) called at 2016-05-04 14:42:47.766130 (xid=0xe75df)
>>> [2016-05-04 14:43:17.780541] W [rpc-clnt-ping.c:208:rpc_clnt_ping_cbk] 0-thevolume-client-1: socket disconnected
>>> [2016-05-04 14:43:17.780645] E [rpc-clnt.c:362:saved_frames_unwind] (--> /lib64/libglusterfs.so.0(_gf_log_callingfn+0x192)[0x7f03bf304ae2] (--> /lib64/libgfrpc.so.0(saved_frames_unwind+0x1de)[0x7f03bf0cf90e] (--> /lib64/libgfrpc.so.0(saved_frames_destroy+0xe)[0x7f03bf0cfa1e] (--> /lib64/libgfrpc.so.0(rpc_clnt_connection_cleanup+0x7a)[0x7f03bf0d140a] (--> /lib64/libgfrpc.so.0(rpc_clnt_notify+0x88)[0x7f03bf0d1c38] ))))) 0-thevolume-client-1: forced unwinding frame type(GlusterFS 3.3) op(LOOKUP(27)) called at 2016-05-04 14:42:55.772720 (xid=0xe75e0)
>>> [2016-05-04 14:43:17.780675] W [MSGID: 114031] [client-rpc-fops.c:2974:client3_3_lookup_cbk] 0-thevolume-client-1: remote operation failed. Path: / (00000000-0000-0000-0000-000000000001) [Transport endpoint is not connected]
>>> [2016-05-04 14:43:17.780880] E [rpc-clnt.c:362:saved_frames_unwind] (--> /lib64/libglusterfs.so.0(_gf_log_callingfn+0x192)[0x7f03bf304ae2] (--> /lib64/libgfrpc.so.0(saved_frames_unwind+0x1de)[0x7f03bf0cf90e] (--> /lib64/libgfrpc.so.0(saved_frames_destroy+0xe)[0x7f03bf0cfa1e] (--> /lib64/libgfrpc.so.0(rpc_clnt_connection_cleanup+0x7a)[0x7f03bf0d140a] (--> /lib64/libgfrpc.so.0(rpc_clnt_notify+0x88)[0x7f03bf0d1c38] ))))) 0-thevolume-client-1: forced unwinding frame type(GlusterFS 3.3) op(LOOKUP(27)) called at 2016-05-04 14:43:12.445399 (xid=0xe75e1)
>>> [2016-05-04 14:43:17.780898] W [MSGID: 114031] [client-rpc-fops.c:2974:client3_3_lookup_cbk] 0-thevolume-client-1: remote operation failed. Path: / (00000000-0000-0000-0000-000000000001) [Transport endpoint is not connected]
>>> [2016-05-04 14:43:17.780963] E [MSGID: 108006] [afr-common.c:4046:afr_notify] 0-thevolume-replicate-0: All subvolumes are down. Going offline until atleast one of them comes back up.
>>> [2016-05-04 14:43:17.781039] W [MSGID: 114031] [client-rpc-fops.c:2974:client3_3_lookup_cbk] 0-thevolume-client-1: remote operation failed. Path: / (00000000-0000-0000-0000-000000000001) [Transport endpoint is not connected]
>>> [2016-05-04 14:43:17.781088] E [dht-helper.c:1597:dht_inode_ctx_time_update] (-->/usr/lib64/glusterfs/3.7.11/xlator/cluster/replicate.so(afr_discover_cbk+0x3fc) [0x7f03b7965edc] -->/usr/lib64/glusterfs/3.7.11/xlator/cluster/distribute.so(dht_lookup_dir_cbk+0x359) [0x7f03b76dc0f9] -->/usr/lib64/glusterfs/3.7.11/xlator/cluster/distribute.so(dht_inode_ctx_time_update+0x210) [0x7f03b76b8b20] ) 0-thevolume-dht: invalid argument: inode [Invalid argument]
>>> [2016-05-04 14:43:17.781843] E [dht-helper.c:1597:dht_inode_ctx_time_update] (-->/usr/lib64/glusterfs/3.7.11/xlator/cluster/replicate.so(afr_discover+0x14a) [0x7f03b795db3a] -->/usr/lib64/glusterfs/3.7.11/xlator/cluster/distribute.so(dht_lookup_dir_cbk+0x359) [0x7f03b76dc0f9] -->/usr/lib64/glusterfs/3.7.11/xlator/cluster/distribute.so(dht_inode_ctx_time_update+0x210) [0x7f03b76b8b20] ) 0-thevolume-dht: invalid argument: inode [Invalid argument]
>>> [2016-05-04 14:43:17.781937] E [dht-helper.c:1597:dht_inode_ctx_time_update] (-->/usr/lib64/glusterfs/3.7.11/xlator/cluster/replicate.so(afr_discover+0x14a) [0x7f03b795db3a] -->/usr/lib64/glusterfs/3.7.11/xlator/cluster/distribute.so(dht_lookup_dir_cbk+0x359) [0x7f03b76dc0f9] -->/usr/lib64/glusterfs/3.7.11/xlator/cluster/distribute.so(dht_inode_ctx_time_update+0x210) [0x7f03b76b8b20] ) 0-thevolume-dht: invalid argument: inode [Invalid argument]
>>> [2016-05-04 14:43:17.784540] E [dht-helper.c:1597:dht_inode_ctx_time_update] (-->/usr/lib64/glusterfs/3.7.11/xlator/cluster/replicate.so(afr_lookup+0x199) [0x7f03b795e1c9] -->/usr/lib64/glusterfs/3.7.11/xlator/cluster/distribute.so(dht_lookup_dir_cbk+0x359) [0x7f03b76dc0f9] -->/usr/lib64/glusterfs/3.7.11/xlator/cluster/distribute.so(dht_inode_ctx_time_update+0x210) [0x7f03b76b8b20] ) 0-thevolume-dht: invalid argument: inode [Invalid argument]
>>> [2016-05-04 14:43:17.784576] W [fuse-bridge.c:467:fuse_entry_cbk] 0-glusterfs-fuse: 630384: LOOKUP() /magento/catalog/product/e/m/emmett_3seater_sierra_blue_lb5.jpg => -1 (Transport endpoint is not connected)
>>> [2016-05-04 14:43:17.789047] E [dht-helper.c:1597:dht_inode_ctx_time_update] (-->/usr/lib64/glusterfs/3.7.11/xlator/cluster/replicate.so(afr_lookup+0x199) [0x7f03b795e1c9] -->/usr/lib64/glusterfs/3.7.11/xlator/cluster/distribute.so(dht_lookup_dir_cbk+0x359) [0x7f03b76dc0f9] -->/usr/lib64/glusterfs/3.7.11/xlator/cluster/distribute.so(dht_inode_ctx_time_update+0x210) [0x7f03b76b8b20] ) 0-thevolume-dht: invalid argument: inode [Invalid argument]
>>> [2016-05-04 14:43:17.789080] W [fuse-bridge.c:467:fuse_entry_cbk] 0-glusterfs-fuse: 630386: LOOKUP() /magento/catalog/product/e/m/emmett_3seater_sierra_blue_lb5_1.jpg => -1 (Transport endpoint is not connected)
>>> [2016-05-04 14:43:17.792013] E [dht-helper.c:1597:dht_inode_ctx_time_update] (-->/usr/lib64/glusterfs/3.7.11/xlator/cluster/replicate.so(afr_lookup+0x199) [0x7f03b795e1c9] -->/usr/lib64/glusterfs/3.7.11/xlator/cluster/distribute.so(dht_lookup_dir_cbk+0x359) [0x7f03b76dc0f9] -->/usr/lib64/glusterfs/3.7.11/xlator/cluster/distribute.so(dht_inode_ctx_time_update+0x210) [0x7f03b76b8b20] ) 0-thevolume-dht: invalid argument: inode [Invalid argument]
>>> [2016-05-04 14:43:17.792047] W [fuse-bridge.c:467:fuse_entry_cbk] 0-glusterfs-fuse: 630388: LOOKUP() /magento/catalog/product/e/m/emmett_3seater_sierra_blue_lb6.jpg => -1 (Transport endpoint is not connected)
>>> [2016-05-04 14:43:17.792272] E [MSGID: 114031] [client-rpc-fops.c:1676:client3_3_finodelk_cbk] 0-thevolume-client-1: remote operation failed [Transport endpoint is not connected]
>>> [2016-05-04 14:43:17.794634] E [dht-helper.c:1597:dht_inode_ctx_time_update] (-->/usr/lib64/glusterfs/3.7.11/xlator/cluster/replicate.so(afr_lookup+0x199) [0x7f03b795e1c9] -->/usr/lib64/glusterfs/3.7.11/xlator/cluster/distribute.so(dht_lookup_dir_cbk+0x359) [0x7f03b76dc0f9] -->/usr/lib64/glusterfs/3.7.11/xlator/cluster/distribute.so(dht_inode_ctx_time_update+0x210) [0x7f03b76b8b20] ) 0-thevolume-dht: invalid argument: inode [Invalid argument]
>>> [2016-05-04 14:43:17.794664] W [fuse-bridge.c:467:fuse_entry_cbk] 0-glusterfs-fuse: 630390: LOOKUP() /magento/catalog/product/e/m/emmett_3seater_sierra_blue_lb6_1.jpg => -1 (Transport endpoint is not connected)
>>> [2016-05-04 14:43:17.796674] E [dht-helper.c:1597:dht_inode_ctx_time_update] (-->/usr/lib64/glusterfs/3.7.11/xlator/cluster/replicate.so(afr_discover+0x14a) [0x7f03b795db3a] -->/usr/lib64/glusterfs/3.7.11/xlator/cluster/distribute.so(dht_lookup_dir_cbk+0x359) [0x7f03b76dc0f9] -->/usr/lib64/glusterfs/3.7.11/xlator/cluster/distribute.so(dht_inode_ctx_time_update+0x210) [0x7f03b76b8b20] ) 0-thevolume-dht: invalid argument: inode [Invalid argument]
>>> [2016-05-04 14:43:17.796770] E [dht-helper.c:1597:dht_inode_ctx_time_update] (-->/usr/lib64/glusterfs/3.7.11/xlator/cluster/replicate.so(afr_discover+0x14a) [0x7f03b795db3a] -->/usr/lib64/glusterfs/3.7.11/xlator/cluster/distribute.so(dht_lookup_dir_cbk+0x359) [0x7f03b76dc0f9] -->/usr/lib64/glusterfs/3.7.11/xlator/cluster/distribute.so(dht_inode_ctx_time_update+0x210) [0x7f03b76b8b20] ) 0-thevolume-dht: invalid argument: inode [Invalid argument]
>>> [2016-05-04 14:43:17.796923] E [dht-helper.c:1597:dht_inode_ctx_time_update] (-->/usr/lib64/glusterfs/3.7.11/xlator/cluster/replicate.so(afr_discover+0x14a) [0x7f03b795db3a] -->/usr/lib64/glusterfs/3.7.11/xlator/cluster/distribute.so(dht_lookup_dir_cbk+0x359) [0x7f03b76dc0f9] -->/usr/lib64/glusterfs/3.7.11/xlator/cluster/distribute.so(dht_inode_ctx_time_update+0x210) [0x7f03b76b8b20] ) 0-thevolume-dht: invalid argument: inode [Invalid argument]
>>> [2016-05-04 14:43:17.799405] E [dht-helper.c:1597:dht_inode_ctx_time_update] (-->/usr/lib64/glusterfs/3.7.11/xlator/cluster/replicate.so(afr_discover+0x14a) [0x7f03b795db3a] -->/usr/lib64/glusterfs/3.7.11/xlator/cluster/distribute.so(dht_lookup_dir_cbk+0x359) [0x7f03b76dc0f9] -->/usr/lib64/glusterfs/3.7.11/xlator/cluster/distribute.so(dht_inode_ctx_time_update+0x210) [0x7f03b76b8b20] ) 0-thevolume-dht: invalid argument: inode [Invalid argument]
>>> [2016-05-04 14:43:17.799580] E [dht-helper.c:1597:dht_inode_ctx_time_update] (-->/usr/lib64/glusterfs/3.7.11/xlator/cluster/replicate.so(afr_lookup+0x199) [0x7f03b795e1c9] -->/usr/lib64/glusterfs/3.7.11/xlator/cluster/distribute.so(dht_lookup_dir_cbk+0x359) [0x7f03b76dc0f9] -->/usr/lib64/glusterfs/3.7.11/xlator/cluster/distribute.so(dht_inode_ctx_time_update+0x210) [0x7f03b76b8b20] ) 0-thevolume-dht: invalid argument: inode [Invalid argument]
>>> [2016-05-04 14:43:17.799604] W [fuse-bridge.c:467:fuse_entry_cbk] 0-glusterfs-fuse: 630396: LOOKUP() /magento/catalog/product/e/m/emmett_3seater_sierra_blue_lb6_2.jpg => -1 (Transport endpoint is not connected)
>>> [2016-05-04 14:43:17.801365] E [dht-helper.c:1597:dht_inode_ctx_time_update] (-->/usr/lib64/glusterfs/3.7.11/xlator/cluster/replicate.so(afr_discover+0x14a) [0x7f03b795db3a] -->/usr/lib64/glusterfs/3.7.11/xlator/cluster/distribute.so(dht_lookup_dir_cbk+0x359) [0x7f03b76dc0f9] -->/usr/lib64/glusterfs/3.7.11/xlator/cluster/distribute.so(dht_inode_ctx_time_update+0x210) [0x7f03b76b8b20] ) 0-thevolume-dht: invalid argument: inode [Invalid argument]
>>> [2016-05-04 14:43:17.802111] E [dht-helper.c:1597:dht_inode_ctx_time_update] (-->/usr/lib64/glusterfs/3.7.11/xlator/cluster/replicate.so(afr_discover+0x14a) [0x7f03b795db3a] -->/usr/lib64/glusterfs/3.7.11/xlator/cluster/distribute.so(dht_lookup_dir_cbk+0x359) [0x7f03b76dc0f9] -->/usr/lib64/glusterfs/3.7.11/xlator/cluster/distribute.so(dht_inode_ctx_time_update+0x210) [0x7f03b76b8b20] ) 0-thevolume-dht: invalid argument: inode [Invalid argument]
>>> [2016-05-04 14:43:17.818490] E [dht-helper.c:1597:dht_inode_ctx_time_update] (-->/usr/lib64/glusterfs/3.7.11/xlator/cluster/replicate.so(afr_discover+0x14a) [0x7f03b795db3a] -->/usr/lib64/glusterfs/3.7.11/xlator/cluster/distribute.so(dht_lookup_dir_cbk+0x359) [0x7f03b76dc0f9] -->/usr/lib64/glusterfs/3.7.11/xlator/cluster/distribute.so(dht_inode_ctx_time_update+0x210) [0x7f03b76b8b20] ) 0-thevolume-dht: invalid argument: inode [Invalid argument]
>>> [2016-05-04 14:43:19.770865] E [MSGID: 114031] [client-rpc-fops.c:1676:client3_3_finodelk_cbk] 0-thevolume-client-2: remote operation failed [Transport endpoint is not connected]
>>> [2016-05-04 14:43:19.770930] E [MSGID: 114031] [client-rpc-fops.c:1676:client3_3_finodelk_cbk] 0-thevolume-client-2: remote operation failed [Transport endpoint is not connected]
>>> [2016-05-04 14:43:19.771811] E [MSGID: 114031] [client-rpc-fops.c:1676:client3_3_finodelk_cbk] 0-thevolume-client-1: remote operation failed [Invalid argument]
>>> [2016-05-04 14:43:19.771852] E [MSGID: 108010] [afr-lk-common.c:665:afr_unlock_inodelk_cbk] 0-thevolume-replicate-0: path=(null) gfid=00000000-0000-0000-0000-000000000000: unlock failed on subvolume thevolume-client-1 with lock owner d03a00a8037f0000 [Invalid argument]
>>> [2016-05-04 14:43:19.771878] W [fuse-bridge.c:1287:fuse_err_cbk] 0-glusterfs-fuse: 630361: FLUSH() ERR => -1 (Transport endpoint is not connected)
>>> [2016-05-04 14:43:19.771937] E [MSGID: 114031] [client-rpc-fops.c:1676:client3_3_finodelk_cbk] 0-thevolume-client-1: remote operation failed [Invalid argument]
>>> [2016-05-04 14:43:19.771959] E [MSGID: 108010] [afr-lk-common.c:665:afr_unlock_inodelk_cbk] 0-thevolume-replicate-0: path=(null) gfid=00000000-0000-0000-0000-000000000000: unlock failed on subvolume thevolume-client-1 with lock owner b43800a8037f0000 [Invalid argument]
>>> [2016-05-04 14:43:19.771979] W [fuse-bridge.c:1287:fuse_err_cbk] 0-glusterfs-fuse: 630375: FLUSH() ERR => -1 (Transport endpoint is not connected)
>>> The message "W [MSGID: 114031] [client-rpc-fops.c:1917:client3_3_fxattrop_cbk]
0-thevolume-client-2: remote operation failed" repeated 19 times between [2016-05-04 14:43:15.770856] and [2016-05-04 14:43:15.773223]
>>>
>>>
>>> The mounted volume becomes read only on the client only. The "gluster"
>>> volume is still writeable on the gluster servers.
>>> gluster volume status does not report anything funky.
>>> If I kill the glusterfs process on the client (the box that has gluster
>>> volume mounted via fuse), and then MOUNT again, I can carry on copying and
>>> writing, until I get those errors again.
>>>
>>> I have successfully copied the same amount of data directly to the XFS
>>> volume on the glusterfs server nodes, I believe my XFS works.
>>>
>>> All the machines are on AWS, and none of the resources are exhausted
>>> IO/RAM/CPU/NETWORK, not on client, not on gluster cluster.
>>>
>>> Please help!
>>>
>> Posting the `gluster vol info` output for the volume will help everyone
>> get a better picture about your volume (sanitize it to remove any sensitive
>> information first).
>>
>> Have you enabled any options on the volume, particularly any quorum
>> options? Client-quorum makes a client read-only when it cannot connect to a
>> quorum of servers (normally 50% of replica count + 1).
>> Your mount log shows that you've had connection issues (a lot of
>> 'Transport endpoint is not connected' messages).
>> Can you verify that the network between the client and server isn't
>> having problems when you see these errors?
>>
>>> _______________________________________________
>>> Gluster-users mailing list
>>> Gluster-users at gluster.org
>>> http://www.gluster.org/mailman/listinfo/gluster-users
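As a rough illustration of the client-quorum rule mentioned in the quoted reply above (a client stays writable only while it can reach more than half of the replica set), here is a minimal Python sketch. The function names are hypothetical, and it assumes a plain majority rule; the real behaviour depends on the `cluster.quorum-type` and `cluster.quorum-count` volume options, neither of which appears under "Options Reconfigured" in the volume info posted in this thread.

```python
def quorum_size(replica_count: int) -> int:
    """Smallest number of reachable bricks that forms a simple majority.

    Assumption: a plain "half the replica count, plus one" rule.
    """
    return replica_count // 2 + 1


def client_is_writable(replica_count: int, reachable_bricks: int) -> bool:
    """True if the client can still reach a majority of the replica set."""
    return reachable_bricks >= quorum_size(replica_count)


if __name__ == "__main__":
    # The volume in this thread is "1 x 3 = 3", so the replica count is 3.
    for reachable in range(4):
        state = "writable" if client_is_writable(3, reachable) else "read-only"
        print(f"{reachable} of 3 bricks reachable: {state}")
```

Under this assumption, a replica-3 client that loses two of its three brick connections would drop to read-only, which is why the connection errors in the mount log are worth checking first.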
Egidijus Ligeika
2016-May-05  10:08 UTC
[Gluster-users] GLUSTER fuse client, mounted volume becomes read only
Hey Kaushal,
Bashton AMI:
There are timeouts.
[2016-05-04 13:47:52.160729] D [rpc-clnt.c:1021:rpc_clnt_connection_init] 0-glusterfs: disable ping-timeout
[2016-05-04 13:47:52.171008] D [rpc-clnt-ping.c:281:rpc_clnt_start_ping] 0-glusterfs: ping timeout is 0, returning
  9:     option ping-timeout 30
 18:     option ping-timeout 30
 27:     option ping-timeout 30
[2016-05-05 09:51:53.707843] C [rpc-clnt-ping.c:165:rpc_clnt_ping_timer_expired] 0-thevolume-client-0: server 10.10.10.239:49152 has not responded in the last 30 seconds, disconnecting.
[2016-05-05 09:51:53.709692] W [rpc-clnt-ping.c:208:rpc_clnt_ping_cbk] 0-thevolume-client-0: socket disconnected
These same options do not produce any timeouts when I am using the
in-house AMI.
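For anyone wanting to compare the two AMIs more systematically, a small script can count ping-timer expiries per brick in a client mount log. This is only a sketch: the regular expression is based on the single `rpc_clnt_ping_timer_expired` line quoted above, the function name is made up for illustration, and other GlusterFS versions may word the message differently.

```python
import re
from collections import Counter

# Pattern modelled on the one "ping timer expired" line shown in this thread.
PING_EXPIRED = re.compile(
    r"\[(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d+)\] C "
    r"\[rpc-clnt-ping\.c:\d+:rpc_clnt_ping_timer_expired\] "
    r"(?P<xlator>\S+): server (?P<server>\S+) has not responded "
    r"in the last (?P<secs>\d+) seconds"
)


def find_ping_timeouts(log_text: str):
    """Return one dict per ping-timeout message found in log_text."""
    # Mail clients wrap long log lines, so collapse all whitespace runs first.
    flat = " ".join(log_text.split())
    return [m.groupdict() for m in PING_EXPIRED.finditer(flat)]


SAMPLE = """\
[2016-05-05 09:51:53.707843] C
[rpc-clnt-ping.c:165:rpc_clnt_ping_timer_expired] 0-thevolume-client-0:
server 10.10.10.239:49152 has not responded in the last 30 seconds,
disconnecting.
[2016-05-05 09:51:53.709692] W [rpc-clnt-ping.c:208:rpc_clnt_ping_cbk]
0-thevolume-client-0: socket disconnected
"""

if __name__ == "__main__":
    hits = find_ping_timeouts(SAMPLE)
    per_server = Counter(hit["server"] for hit in hits)
    for server, count in per_server.items():
        print(f"{server}: {count} ping timeout(s)")
```

Running the same function over the mount log from each AMI (read with `open(path).read()`) would show whether the timeouts really are confined to one image.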
I am using the glusterfs-fuse.x86_64 3.7.11-1.el7 version from the
glusterfs-epel repo:
wget -P /etc/yum.repos.d/ http://download.gluster.org/pub/gluster/glusterfs/3.7/3.7.8/EPEL.repo/glusterfs-epel.repo
[root at web-i8faf7d03 yum.repos.d]# yum --showduplicates list glusterfs-fuse
Loaded plugins: cob, fastestmirror
Loading mirror speeds from cached hostfile
 * base: ftp.heanet.ie
 * epel: s3-mirror-eu-west-1.fedoraproject.org
 * extras: ftp.heanet.ie
 * updates: ftp.heanet.ie
Installed Packages
glusterfs-fuse.x86_64           3.7.1-16.0.1.el7.centos           @updates
Available Packages
glusterfs-fuse.x86_64           3.7.1-16.el7                      base
glusterfs-fuse.x86_64           3.7.1-16.0.1.el7.centos           updates
glusterfs-fuse.x86_64           3.7.11-1.el7                      glusterfs-epel
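When comparing client images, it can help to pull the installed and available versions out of the `yum --showduplicates list` output mechanically. The sketch below is an illustration only: `parse_yum_list` is a hypothetical helper, it assumes the column layout shown above, and it tolerates the repo name wrapping onto the next line as it does in this message.

```python
def parse_yum_list(output: str, package: str):
    """Return (installed, available) lists of (version, repo) tuples."""
    found = {"Installed Packages": [], "Available Packages": []}
    section = None
    pending = None  # version whose repo name wrapped onto the next line
    for raw in output.splitlines():
        line = raw.strip()
        if line in found:
            section, pending = line, None
            continue
        if section is None or not line:
            continue  # skip plugin/mirror chatter before the first section
        fields = line.split()
        if fields[0] == package:
            if len(fields) >= 3:
                found[section].append((fields[1], fields[2]))
            elif len(fields) == 2:
                pending = fields[1]
        elif pending is not None and len(fields) == 1:
            found[section].append((pending, fields[0]))
            pending = None
    return found["Installed Packages"], found["Available Packages"]


SAMPLE = """\
Loaded plugins: cob, fastestmirror
Installed Packages
glusterfs-fuse.x86_64           3.7.1-16.0.1.el7.centos           @updates
Available Packages
glusterfs-fuse.x86_64           3.7.1-16.el7                      base
glusterfs-fuse.x86_64           3.7.1-16.0.1.el7.centos           updates
glusterfs-fuse.x86_64           3.7.11-1.el7
 glusterfs-epel
"""

if __name__ == "__main__":
    installed, available = parse_yum_list(SAMPLE, "glusterfs-fuse.x86_64")
    print("installed:", installed)
    print("available:", available)
```

In the listing pasted above, the entry under "Installed Packages" is 3.7.1-16.0.1.el7.centos, so checking the installed version against the one you expect (3.7.11-1.el7 here) is exactly the comparison this helper is meant to make easy.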
Regards,
E.
On Thu, 5 May 2016 at 10:56 Kaushal M <kshlmster at gmail.com> wrote:
> On Thu, May 5, 2016 at 2:26 PM, Egidijus Ligeika <
> egidijus.ligeika at made.com> wrote:
>
>> Hi Kaushal,
>> This is the volume info:
>>
>> Volume Name: thevolume
>> Type: Replicate
>> Volume ID: da774a83-b426-42bd-b1ec-359b4e71314f
>> Status: Started
>> Number of Bricks: 1 x 3 = 3
>> Transport-type: tcp
>> Bricks:
>> Brick1: gluster-a.mag-test-madeinternal.com:/gluster/brick
>> Brick2: gluster-b.mag-test-madeinternal.com:/gluster/brick
>> Brick3: gluster-c.mag-test-madeinternal.com:/gluster/brick
>> Options Reconfigured:
>> performance.quick-read: off
>> network.ping-timeout: 30
>> performance.cache-size: 1853171712
>> performance.cache-refresh-timeout: 60
>> performance.io-thread-count: 32
>> performance.write-behind-window-size: 4MB
>> diagnostics.client-log-level: WARNING
>> diagnostics.brick-log-level: WARNING
>> cluster.self-heal-daemon: on
>> nfs.disable: true
>> performance.readdir-ahead: on
>>
>>
>> So my gluster FUSE clients are using not vanilla centos AMI, the AMI is
>> one of the versions by bashton:
>> https://www.bashton.com/blog/2015/centos-7-2-1511-ami/
>>
>> My gluster cluster server AMI is using, our in-house AMI, that has some
>> sysadmin packages, epel and docker baked-in.
>> I created a pretend client machine using our in-house AMI, I mounted the
>> same cluster, the same way, with the same version of glusterfuse as on the
>> broken client. Then I copied the same files again. The new pretend client
>> had not a single issue. no issues in logs, everything completed
>> successfully without problems.
>>
>> I think - the problem is solved with the in-house AMI.
>>
>>
> I'm glad that you found an alternate solution to the problem.
>
>
>> What can I share to help people investigate if it's the OS + package
>> combination that might be breaking their gluster?
>>
>
> The bashton AMI has a different network driver. I'm not saying it could be
> the issue, but that is something that could be investigated.
>
> Do you see any ping-timeouts in the log? You could look at the logs around
> the time you get the read-only mount for more information.
>
> Also, what version of GlusterFS are you using?
>
>
>>
>> Regards,
>>
>> E.
>>
>>
>>
>>
>>
>> On Thu, 5 May 2016 at 07:40 Kaushal M <kshlmster at gmail.com> wrote:
>>
>>> On Wed, May 4, 2016 at 9:12 PM, Egidijus Ligeika <
>>> egidijus.ligeika at made.com> wrote:
>>>
>>>> Hello,
>>>>
>>>> I have glusterfs client and server v3.7.11.
>>>> The servers are running inside docker on debian image, on a centos host.
>>>>
>>>> I am using the fuse client mounting to a list of DNS a records.
>>>> I use XFS as the underlying storage, inode size 512 and xfs is
>>>> formatted with 512 inode size.
>>>>
>>>> I can see the client and the cluster is clean and happy, heals work.
>>>> when writing tens of gigabytes to the cluster via the fuse client I see
>>>> errors like this ON THE FUSE CLIENT /var/log/glusterfs/mountname:
>>>>
>>>> [2016-05-04 14:43:17.779936] W [rpc-clnt.c:1606:rpc_clnt_submit]
0-thevolume-client-1: failed to submit rpc-request (XID: 0xe75f8 Program:
GlusterFS 3.3, ProgVers: 330, Proc: 27) to rpc-transport (thevolume-client-1)
>>>> [2016-05-04 14:43:17.779958] W [MSGID: 114031]
[client-rpc-fops.c:2974:client3_3_lookup_cbk] 0-thevolume-client-1: remote
operation failed. Path: / (00000000-0000-0000-0000-000000000001) [Transport
endpoint is not connected]
>>>> [2016-05-04 14:43:17.780382] E [MSGID: 114031]
[client-rpc-fops.c:1676:client3_3_finodelk_cbk] 0-thevolume-client-2: remote
operation failed [Transport endpoint is not connected]
>>>> [2016-05-04 14:43:17.780525] E
[rpc-clnt.c:362:saved_frames_unwind] (-->
/lib64/libglusterfs.so.0(_gf_log_callingfn+0x192)[0x7f03bf304ae2] (-->
/lib64/libgfrpc.so.0(saved_frames_unwind+0x1de)[0x7f03bf0cf90e] (-->
/lib64/libgfrpc.so.0(saved_frames_destroy+0xe)[0x7f03bf0cfa1e] (-->
/lib64/libgfrpc.so.0(rpc_clnt_connection_cleanup+0x7a)[0x7f03bf0d140a] (-->
/lib64/libgfrpc.so.0(rpc_clnt_notify+0x88)[0x7f03bf0d1c38] )))))
0-thevolume-client-1: forced unwinding frame type(GF-DUMP) op(NULL(2)) called at
2016-05-04 14:42:47.766130 (xid=0xe75df)
>>>> [2016-05-04 14:43:17.780541] W
[rpc-clnt-ping.c:208:rpc_clnt_ping_cbk] 0-thevolume-client-1: socket
disconnected
>>>> [2016-05-04 14:43:17.780645] E
[rpc-clnt.c:362:saved_frames_unwind] (-->
/lib64/libglusterfs.so.0(_gf_log_callingfn+0x192)[0x7f03bf304ae2] (-->
/lib64/libgfrpc.so.0(saved_frames_unwind+0x1de)[0x7f03bf0cf90e] (-->
/lib64/libgfrpc.so.0(saved_frames_destroy+0xe)[0x7f03bf0cfa1e] (-->
/lib64/libgfrpc.so.0(rpc_clnt_connection_cleanup+0x7a)[0x7f03bf0d140a] (-->
/lib64/libgfrpc.so.0(rpc_clnt_notify+0x88)[0x7f03bf0d1c38] )))))
0-thevolume-client-1: forced unwinding frame type(GlusterFS 3.3) op(LOOKUP(27))
called at 2016-05-04 14:42:55.772720 (xid=0xe75e0)
>>>> [2016-05-04 14:43:17.780675] W [MSGID: 114031]
[client-rpc-fops.c:2974:client3_3_lookup_cbk] 0-thevolume-client-1: remote
operation failed. Path: / (00000000-0000-0000-0000-000000000001) [Transport
endpoint is not connected]
>>>> [2016-05-04 14:43:17.780880] E
[rpc-clnt.c:362:saved_frames_unwind] (-->
/lib64/libglusterfs.so.0(_gf_log_callingfn+0x192)[0x7f03bf304ae2] (-->
/lib64/libgfrpc.so.0(saved_frames_unwind+0x1de)[0x7f03bf0cf90e] (-->
/lib64/libgfrpc.so.0(saved_frames_destroy+0xe)[0x7f03bf0cfa1e] (-->
/lib64/libgfrpc.so.0(rpc_clnt_connection_cleanup+0x7a)[0x7f03bf0d140a] (-->
/lib64/libgfrpc.so.0(rpc_clnt_notify+0x88)[0x7f03bf0d1c38] )))))
0-thevolume-client-1: forced unwinding frame type(GlusterFS 3.3) op(LOOKUP(27))
called at 2016-05-04 14:43:12.445399 (xid=0xe75e1)
>>>> [2016-05-04 14:43:17.780898] W [MSGID: 114031]
[client-rpc-fops.c:2974:client3_3_lookup_cbk] 0-thevolume-client-1: remote
operation failed. Path: / (00000000-0000-0000-0000-000000000001) [Transport
endpoint is not connected]
>>>> [2016-05-04 14:43:17.780963] E [MSGID: 108006]
[afr-common.c:4046:afr_notify] 0-thevolume-replicate-0: All subvolumes are down.
Going offline until atleast one of them comes back up.
>>>> [2016-05-04 14:43:17.781039] W [MSGID: 114031]
[client-rpc-fops.c:2974:client3_3_lookup_cbk] 0-thevolume-client-1: remote
operation failed. Path: / (00000000-0000-0000-0000-000000000001) [Transport
endpoint is not connected]
>>>> [2016-05-04 14:43:17.781088] E
[dht-helper.c:1597:dht_inode_ctx_time_update]
(-->/usr/lib64/glusterfs/3.7.11/xlator/cluster/replicate.so(afr_discover_cbk+0x3fc)
[0x7f03b7965edc]
-->/usr/lib64/glusterfs/3.7.11/xlator/cluster/distribute.so(dht_lookup_dir_cbk+0x359)
[0x7f03b76dc0f9]
-->/usr/lib64/glusterfs/3.7.11/xlator/cluster/distribute.so(dht_inode_ctx_time_update+0x210)
[0x7f03b76b8b20] ) 0-thevolume-dht: invalid argument: inode [Invalid argument]
>>>> [2016-05-04 14:43:17.781843] E
[dht-helper.c:1597:dht_inode_ctx_time_update]
(-->/usr/lib64/glusterfs/3.7.11/xlator/cluster/replicate.so(afr_discover+0x14a)
[0x7f03b795db3a]
-->/usr/lib64/glusterfs/3.7.11/xlator/cluster/distribute.so(dht_lookup_dir_cbk+0x359)
[0x7f03b76dc0f9]
-->/usr/lib64/glusterfs/3.7.11/xlator/cluster/distribute.so(dht_inode_ctx_time_update+0x210)
[0x7f03b76b8b20] ) 0-thevolume-dht: invalid argument: inode [Invalid argument]
>>>> [2016-05-04 14:43:17.781937] E
[dht-helper.c:1597:dht_inode_ctx_time_update]
(-->/usr/lib64/glusterfs/3.7.11/xlator/cluster/replicate.so(afr_discover+0x14a)
[0x7f03b795db3a]
-->/usr/lib64/glusterfs/3.7.11/xlator/cluster/distribute.so(dht_lookup_dir_cbk+0x359)
[0x7f03b76dc0f9]
-->/usr/lib64/glusterfs/3.7.11/xlator/cluster/distribute.so(dht_inode_ctx_time_update+0x210)
[0x7f03b76b8b20] ) 0-thevolume-dht: invalid argument: inode [Invalid argument]
>>>> [2016-05-04 14:43:17.784540] E
[dht-helper.c:1597:dht_inode_ctx_time_update]
(-->/usr/lib64/glusterfs/3.7.11/xlator/cluster/replicate.so(afr_lookup+0x199)
[0x7f03b795e1c9]
-->/usr/lib64/glusterfs/3.7.11/xlator/cluster/distribute.so(dht_lookup_dir_cbk+0x359)
[0x7f03b76dc0f9]
-->/usr/lib64/glusterfs/3.7.11/xlator/cluster/distribute.so(dht_inode_ctx_time_update+0x210)
[0x7f03b76b8b20] ) 0-thevolume-dht: invalid argument: inode [Invalid argument]
>>>> [2016-05-04 14:43:17.784576] W
[fuse-bridge.c:467:fuse_entry_cbk] 0-glusterfs-fuse: 630384: LOOKUP()
/magento/catalog/product/e/m/emmett_3seater_sierra_blue_lb5.jpg => -1
(Transport endpoint is not connected)
>>>> [2016-05-04 14:43:17.789047] E
[dht-helper.c:1597:dht_inode_ctx_time_update]
(-->/usr/lib64/glusterfs/3.7.11/xlator/cluster/replicate.so(afr_lookup+0x199)
[0x7f03b795e1c9]
-->/usr/lib64/glusterfs/3.7.11/xlator/cluster/distribute.so(dht_lookup_dir_cbk+0x359)
[0x7f03b76dc0f9]
-->/usr/lib64/glusterfs/3.7.11/xlator/cluster/distribute.so(dht_inode_ctx_time_update+0x210)
[0x7f03b76b8b20] ) 0-thevolume-dht: invalid argument: inode [Invalid argument]
>>>> [2016-05-04 14:43:17.789080] W
[fuse-bridge.c:467:fuse_entry_cbk] 0-glusterfs-fuse: 630386: LOOKUP()
/magento/catalog/product/e/m/emmett_3seater_sierra_blue_lb5_1.jpg => -1
(Transport endpoint is not connected)
>>>> [2016-05-04 14:43:17.792013] E
[dht-helper.c:1597:dht_inode_ctx_time_update]
(-->/usr/lib64/glusterfs/3.7.11/xlator/cluster/replicate.so(afr_lookup+0x199)
[0x7f03b795e1c9]
-->/usr/lib64/glusterfs/3.7.11/xlator/cluster/distribute.so(dht_lookup_dir_cbk+0x359)
[0x7f03b76dc0f9]
-->/usr/lib64/glusterfs/3.7.11/xlator/cluster/distribute.so(dht_inode_ctx_time_update+0x210)
[0x7f03b76b8b20] ) 0-thevolume-dht: invalid argument: inode [Invalid argument]
>>>> [2016-05-04 14:43:17.792047] W
[fuse-bridge.c:467:fuse_entry_cbk] 0-glusterfs-fuse: 630388: LOOKUP()
/magento/catalog/product/e/m/emmett_3seater_sierra_blue_lb6.jpg => -1
(Transport endpoint is not connected)
>>>> [2016-05-04 14:43:17.792272] E [MSGID: 114031]
[client-rpc-fops.c:1676:client3_3_finodelk_cbk] 0-thevolume-client-1: remote
operation failed [Transport endpoint is not connected]
>>>> [2016-05-04 14:43:17.794634] E
[dht-helper.c:1597:dht_inode_ctx_time_update]
(-->/usr/lib64/glusterfs/3.7.11/xlator/cluster/replicate.so(afr_lookup+0x199)
[0x7f03b795e1c9]
-->/usr/lib64/glusterfs/3.7.11/xlator/cluster/distribute.so(dht_lookup_dir_cbk+0x359)
[0x7f03b76dc0f9]
-->/usr/lib64/glusterfs/3.7.11/xlator/cluster/distribute.so(dht_inode_ctx_time_update+0x210)
[0x7f03b76b8b20] ) 0-thevolume-dht: invalid argument: inode [Invalid argument]
>>>> [2016-05-04 14:43:17.794664] W
[fuse-bridge.c:467:fuse_entry_cbk] 0-glusterfs-fuse: 630390: LOOKUP()
/magento/catalog/product/e/m/emmett_3seater_sierra_blue_lb6_1.jpg => -1
(Transport endpoint is not connected)
>>>> [2016-05-04 14:43:17.796674] E
[dht-helper.c:1597:dht_inode_ctx_time_update]
(-->/usr/lib64/glusterfs/3.7.11/xlator/cluster/replicate.so(afr_discover+0x14a)
[0x7f03b795db3a]
-->/usr/lib64/glusterfs/3.7.11/xlator/cluster/distribute.so(dht_lookup_dir_cbk+0x359)
[0x7f03b76dc0f9]
-->/usr/lib64/glusterfs/3.7.11/xlator/cluster/distribute.so(dht_inode_ctx_time_update+0x210)
[0x7f03b76b8b20] ) 0-thevolume-dht: invalid argument: inode [Invalid argument]
>>>> [2016-05-04 14:43:17.796770] E
[dht-helper.c:1597:dht_inode_ctx_time_update]
(-->/usr/lib64/glusterfs/3.7.11/xlator/cluster/replicate.so(afr_discover+0x14a)
[0x7f03b795db3a]
-->/usr/lib64/glusterfs/3.7.11/xlator/cluster/distribute.so(dht_lookup_dir_cbk+0x359)
[0x7f03b76dc0f9]
-->/usr/lib64/glusterfs/3.7.11/xlator/cluster/distribute.so(dht_inode_ctx_time_update+0x210)
[0x7f03b76b8b20] ) 0-thevolume-dht: invalid argument: inode [Invalid argument]
>>>> [2016-05-04 14:43:17.796923] E
[dht-helper.c:1597:dht_inode_ctx_time_update]
(-->/usr/lib64/glusterfs/3.7.11/xlator/cluster/replicate.so(afr_discover+0x14a)
[0x7f03b795db3a]
-->/usr/lib64/glusterfs/3.7.11/xlator/cluster/distribute.so(dht_lookup_dir_cbk+0x359)
[0x7f03b76dc0f9]
-->/usr/lib64/glusterfs/3.7.11/xlator/cluster/distribute.so(dht_inode_ctx_time_update+0x210)
[0x7f03b76b8b20] ) 0-thevolume-dht: invalid argument: inode [Invalid argument]
>>>> [2016-05-04 14:43:17.799405] E
[dht-helper.c:1597:dht_inode_ctx_time_update]
(-->/usr/lib64/glusterfs/3.7.11/xlator/cluster/replicate.so(afr_discover+0x14a)
[0x7f03b795db3a]
-->/usr/lib64/glusterfs/3.7.11/xlator/cluster/distribute.so(dht_lookup_dir_cbk+0x359)
[0x7f03b76dc0f9]
-->/usr/lib64/glusterfs/3.7.11/xlator/cluster/distribute.so(dht_inode_ctx_time_update+0x210)
[0x7f03b76b8b20] ) 0-thevolume-dht: invalid argument: inode [Invalid argument]
>>>> [2016-05-04 14:43:17.799580] E
[dht-helper.c:1597:dht_inode_ctx_time_update]
(-->/usr/lib64/glusterfs/3.7.11/xlator/cluster/replicate.so(afr_lookup+0x199)
[0x7f03b795e1c9]
-->/usr/lib64/glusterfs/3.7.11/xlator/cluster/distribute.so(dht_lookup_dir_cbk+0x359)
[0x7f03b76dc0f9]
-->/usr/lib64/glusterfs/3.7.11/xlator/cluster/distribute.so(dht_inode_ctx_time_update+0x210)
[0x7f03b76b8b20] ) 0-thevolume-dht: invalid argument: inode [Invalid argument]
>>>> [2016-05-04 14:43:17.799604] W
[fuse-bridge.c:467:fuse_entry_cbk] 0-glusterfs-fuse: 630396: LOOKUP()
/magento/catalog/product/e/m/emmett_3seater_sierra_blue_lb6_2.jpg => -1
(Transport endpoint is not connected)
>>>> [2016-05-04 14:43:17.801365] E
[dht-helper.c:1597:dht_inode_ctx_time_update]
(-->/usr/lib64/glusterfs/3.7.11/xlator/cluster/replicate.so(afr_discover+0x14a)
[0x7f03b795db3a]
-->/usr/lib64/glusterfs/3.7.11/xlator/cluster/distribute.so(dht_lookup_dir_cbk+0x359)
[0x7f03b76dc0f9]
-->/usr/lib64/glusterfs/3.7.11/xlator/cluster/distribute.so(dht_inode_ctx_time_update+0x210)
[0x7f03b76b8b20] ) 0-thevolume-dht: invalid argument: inode [Invalid argument]
>>>> [2016-05-04 14:43:17.802111] E
[dht-helper.c:1597:dht_inode_ctx_time_update]
(-->/usr/lib64/glusterfs/3.7.11/xlator/cluster/replicate.so(afr_discover+0x14a)
[0x7f03b795db3a]
-->/usr/lib64/glusterfs/3.7.11/xlator/cluster/distribute.so(dht_lookup_dir_cbk+0x359)
[0x7f03b76dc0f9]
-->/usr/lib64/glusterfs/3.7.11/xlator/cluster/distribute.so(dht_inode_ctx_time_update+0x210)
[0x7f03b76b8b20] ) 0-thevolume-dht: invalid argument: inode [Invalid argument]
>>>> [2016-05-04 14:43:17.818490] E
[dht-helper.c:1597:dht_inode_ctx_time_update]
(-->/usr/lib64/glusterfs/3.7.11/xlator/cluster/replicate.so(afr_discover+0x14a)
[0x7f03b795db3a]
-->/usr/lib64/glusterfs/3.7.11/xlator/cluster/distribute.so(dht_lookup_dir_cbk+0x359)
[0x7f03b76dc0f9]
-->/usr/lib64/glusterfs/3.7.11/xlator/cluster/distribute.so(dht_inode_ctx_time_update+0x210)
[0x7f03b76b8b20] ) 0-thevolume-dht: invalid argument: inode [Invalid argument]
>>>> [2016-05-04 14:43:19.770865] E [MSGID: 114031]
[client-rpc-fops.c:1676:client3_3_finodelk_cbk] 0-thevolume-client-2: remote
operation failed [Transport endpoint is not connected]
>>>> [2016-05-04 14:43:19.770930] E [MSGID: 114031]
[client-rpc-fops.c:1676:client3_3_finodelk_cbk] 0-thevolume-client-2: remote
operation failed [Transport endpoint is not connected]
>>>> [2016-05-04 14:43:19.771811] E [MSGID: 114031]
[client-rpc-fops.c:1676:client3_3_finodelk_cbk] 0-thevolume-client-1: remote
operation failed [Invalid argument]
>>>> [2016-05-04 14:43:19.771852] E [MSGID: 108010]
[afr-lk-common.c:665:afr_unlock_inodelk_cbk] 0-thevolume-replicate-0:
path=(null) gfid=00000000-0000-0000-0000-000000000000: unlock failed on
subvolume thevolume-client-1 with lock owner d03a00a8037f0000 [Invalid argument]
>>>> [2016-05-04 14:43:19.771878] W
[fuse-bridge.c:1287:fuse_err_cbk] 0-glusterfs-fuse: 630361: FLUSH() ERR => -1
(Transport endpoint is not connected)
>>>> [2016-05-04 14:43:19.771937] E [MSGID: 114031]
[client-rpc-fops.c:1676:client3_3_finodelk_cbk] 0-thevolume-client-1: remote
operation failed [Invalid argument]
>>>> [2016-05-04 14:43:19.771959] E [MSGID: 108010]
[afr-lk-common.c:665:afr_unlock_inodelk_cbk] 0-thevolume-replicate-0:
path=(null) gfid=00000000-0000-0000-0000-000000000000: unlock failed on
subvolume thevolume-client-1 with lock owner b43800a8037f0000 [Invalid argument]
>>>> [2016-05-04 14:43:19.771979] W
[fuse-bridge.c:1287:fuse_err_cbk] 0-glusterfs-fuse: 630375: FLUSH() ERR => -1
(Transport endpoint is not connected)
>>>> The message "W [MSGID: 114031]
[client-rpc-fops.c:1917:client3_3_fxattrop_cbk] 0-thevolume-client-2: remote
operation failed" repeated 19 times between [2016-05-04 14:43:15.770856]
and [2016-05-04 14:43:15.773223]
>>>>
>>>>
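To see which brick connections the failures cluster on, the mount log can be tallied per client subvolume. The following is only a rough sketch: it assumes each log entry sits on a single line (as in the log file itself, not as wrapped by the mail client), and the `count_disconnects` helper and the regular expression are illustrative names, not part of any Gluster tooling. The sample lines are entries from the excerpt above, rejoined.

```python
import re
from collections import Counter

# Matches e.g. "0-thevolume-client-2: ... Transport endpoint is not connected"
# and captures the client subvolume name ("thevolume-client-2").
PATTERN = re.compile(
    r"\b\d+-(?P<subvol>[\w.-]+-client-\d+):.*Transport endpoint is not connected"
)


def count_disconnects(lines):
    """Count 'Transport endpoint is not connected' entries per client subvolume."""
    counts = Counter()
    for line in lines:
        match = PATTERN.search(line)
        if match:
            counts[match.group("subvol")] += 1
    return counts


# Entries taken from the log excerpt in this thread, one entry per line.
sample = [
    "[2016-05-04 14:43:19.770865] E [MSGID: 114031] "
    "[client-rpc-fops.c:1676:client3_3_finodelk_cbk] 0-thevolume-client-2: "
    "remote operation failed [Transport endpoint is not connected]",
    "[2016-05-04 14:43:19.770930] E [MSGID: 114031] "
    "[client-rpc-fops.c:1676:client3_3_finodelk_cbk] 0-thevolume-client-2: "
    "remote operation failed [Transport endpoint is not connected]",
    "[2016-05-04 14:43:19.771811] E [MSGID: 114031] "
    "[client-rpc-fops.c:1676:client3_3_finodelk_cbk] 0-thevolume-client-1: "
    "remote operation failed [Invalid argument]",
]

for subvol, n in sorted(count_disconnects(sample).items()):
    print(f"{subvol}: {n} disconnect error(s)")
```

If one subvolume dominates the tally, the link between the client and that particular brick is the first thing worth checking.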
>>>> The mounted volume becomes read only on the client only. The "gluster"
>>>> volume is still writeable on the gluster servers.
>>>> gluster volume status does not report anything funky.
>>>> If I kill the glusterfs process on the client (the box that has gluster
>>>> volume mounted via fuse), and then MOUNT again, I can carry on copying and
>>>> writing, until I get those errors again.
>>>>
>>>> I have successfully copied the same amount of data directly to the XFS
>>>> volume on the glusterfs server nodes, so I believe my XFS works.
>>>>
>>>> All the machines are on AWS, and none of the resources (IO, RAM, CPU,
>>>> network) are exhausted, neither on the client nor on the gluster cluster.
>>>>
>>>> Please help!
>>>>
>>>>
>>>>
>>> Posting the `gluster vol info` output for the volume will help everyone
>>> get a better picture of your volume (sanitize it to remove any sensitive
>>> information first).
>>>
>>> Have you enabled any options on the volume, particularly any quorum
>>> options? Client-quorum makes a client read-only when it cannot connect to a
>>> quorum of servers (normally 50% of replica count + 1).
>>> Your mount log shows that you've had connection issues (a lot of
>>> 'Transport endpoint is not connected' messages).
>>> Can you verify that the network between the client and server isn't
>>> having problems when you see these errors?
>>>
>>>
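The quorum rule mentioned above ("50% of replica count + 1") can be worked through for this 1 x 3 replica volume. This is a simplified sketch of the arithmetic only; the function names are made up for illustration, and the real behaviour depends on how the volume's `cluster.quorum-type` and `cluster.quorum-count` options are set.

```python
def quorum_needed(replica_count: int) -> int:
    """Bricks a client must reach under the '50% of replica count + 1' rule."""
    return replica_count // 2 + 1


def client_is_writable(replica_count: int, connected_bricks: int) -> bool:
    """True if the client still sees enough bricks to keep write access."""
    return connected_bricks >= quorum_needed(replica_count)


# Replica 3, as in the volume discussed in this thread.
REPLICA_COUNT = 3
for connected in range(REPLICA_COUNT, -1, -1):
    state = "read-write" if client_is_writable(REPLICA_COUNT, connected) else "read-only"
    print(f"{connected} of {REPLICA_COUNT} bricks reachable -> {state}")
```

Under this rule a replica 3 volume needs two reachable bricks, so a client that loses its connection to two of the three bricks would drop to read-only if client-quorum were enabled.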