Egidijus Ligeika
2016-May-05 08:56 UTC
[Gluster-users] GLUSTER fuse client, mounted volume becomes read only
Hi Kaushal,

This is the volume info:

Volume Name: thevolume
Type: Replicate
Volume ID: da774a83-b426-42bd-b1ec-359b4e71314f
Status: Started
Number of Bricks: 1 x 3 = 3
Transport-type: tcp
Bricks:
Brick1: gluster-a.mag-test-madeinternal.com:/gluster/brick
Brick2: gluster-b.mag-test-madeinternal.com:/gluster/brick
Brick3: gluster-c.mag-test-madeinternal.com:/gluster/brick
Options Reconfigured:
performance.quick-read: off
network.ping-timeout: 30
performance.cache-size: 1853171712
performance.cache-refresh-timeout: 60
performance.io-thread-count: 32
performance.write-behind-window-size: 4MB
diagnostics.client-log-level: WARNING
diagnostics.brick-log-level: WARNING
cluster.self-heal-daemon: on
nfs.disable: true
performance.readdir-ahead: on

My gluster FUSE clients are not using a vanilla CentOS AMI; the AMI is one of the versions by bashton:
https://www.bashton.com/blog/2015/centos-7-2-1511-ami/

My gluster cluster servers use our in-house AMI, which has some sysadmin packages, epel and docker baked in.
I created a pretend client machine using our in-house AMI and mounted the same cluster, the same way, with the same version of glusterfuse as on the broken client. Then I copied the same files again. The new pretend client had not a single issue: no issues in the logs, and everything completed successfully.

I think the problem is solved with the in-house AMI.

What can I share to help people investigate whether it's the OS + package combination that might be breaking their gluster?

Regards,

E.

On Thu, 5 May 2016 at 07:40 Kaushal M <kshlmster at gmail.com> wrote:

> On Wed, May 4, 2016 at 9:12 PM, Egidijus Ligeika <egidijus.ligeika at made.com> wrote:
>
>> Hello,
>>
>> I have glusterfs client and server v3.7.11.
>> The servers are running inside docker on a debian image, on a centos host.
>>
>> I am using the fuse client, mounting to a list of DNS A records.
>> I use XFS as the underlying storage, formatted with inode size 512.
>>
>> I can see the client and the cluster are clean and happy, and heals work. When
>> writing tens of gigabytes to the cluster via the fuse client, I see errors
>> like this ON THE FUSE CLIENT in /var/log/glusterfs/mountname:
>>
>> [2016-05-04 14:43:17.779936] W [rpc-clnt.c:1606:rpc_clnt_submit] 0-thevolume-client-1: failed to submit rpc-request (XID: 0xe75f8 Program: GlusterFS 3.3, ProgVers: 330, Proc: 27) to rpc-transport (thevolume-client-1)
>> [2016-05-04 14:43:17.779958] W [MSGID: 114031] [client-rpc-fops.c:2974:client3_3_lookup_cbk] 0-thevolume-client-1: remote operation failed. Path: / (00000000-0000-0000-0000-000000000001) [Transport endpoint is not connected]
>> [2016-05-04 14:43:17.780382] E [MSGID: 114031] [client-rpc-fops.c:1676:client3_3_finodelk_cbk] 0-thevolume-client-2: remote operation failed [Transport endpoint is not connected]
>> [2016-05-04 14:43:17.780525] E [rpc-clnt.c:362:saved_frames_unwind] (--> /lib64/libglusterfs.so.0(_gf_log_callingfn+0x192)[0x7f03bf304ae2] (--> /lib64/libgfrpc.so.0(saved_frames_unwind+0x1de)[0x7f03bf0cf90e] (--> /lib64/libgfrpc.so.0(saved_frames_destroy+0xe)[0x7f03bf0cfa1e] (--> /lib64/libgfrpc.so.0(rpc_clnt_connection_cleanup+0x7a)[0x7f03bf0d140a] (--> /lib64/libgfrpc.so.0(rpc_clnt_notify+0x88)[0x7f03bf0d1c38] ))))) 0-thevolume-client-1: forced unwinding frame type(GF-DUMP) op(NULL(2)) called at 2016-05-04 14:42:47.766130 (xid=0xe75df)
>> [2016-05-04 14:43:17.780541] W [rpc-clnt-ping.c:208:rpc_clnt_ping_cbk] 0-thevolume-client-1: socket disconnected
>> [2016-05-04 14:43:17.780645] E [rpc-clnt.c:362:saved_frames_unwind] (--> /lib64/libglusterfs.so.0(_gf_log_callingfn+0x192)[0x7f03bf304ae2] (--> /lib64/libgfrpc.so.0(saved_frames_unwind+0x1de)[0x7f03bf0cf90e] (--> /lib64/libgfrpc.so.0(saved_frames_destroy+0xe)[0x7f03bf0cfa1e] (--> /lib64/libgfrpc.so.0(rpc_clnt_connection_cleanup+0x7a)[0x7f03bf0d140a] (--> /lib64/libgfrpc.so.0(rpc_clnt_notify+0x88)[0x7f03bf0d1c38] ))))) 0-thevolume-client-1: forced unwinding frame type(GlusterFS 3.3) op(LOOKUP(27)) called at 2016-05-04 14:42:55.772720 (xid=0xe75e0)
>> [2016-05-04 14:43:17.780675] W [MSGID: 114031] [client-rpc-fops.c:2974:client3_3_lookup_cbk] 0-thevolume-client-1: remote operation failed. Path: / (00000000-0000-0000-0000-000000000001) [Transport endpoint is not connected]
>> [2016-05-04 14:43:17.780880] E [rpc-clnt.c:362:saved_frames_unwind] (--> /lib64/libglusterfs.so.0(_gf_log_callingfn+0x192)[0x7f03bf304ae2] (--> /lib64/libgfrpc.so.0(saved_frames_unwind+0x1de)[0x7f03bf0cf90e] (--> /lib64/libgfrpc.so.0(saved_frames_destroy+0xe)[0x7f03bf0cfa1e] (--> /lib64/libgfrpc.so.0(rpc_clnt_connection_cleanup+0x7a)[0x7f03bf0d140a] (--> /lib64/libgfrpc.so.0(rpc_clnt_notify+0x88)[0x7f03bf0d1c38] ))))) 0-thevolume-client-1: forced unwinding frame type(GlusterFS 3.3) op(LOOKUP(27)) called at 2016-05-04 14:43:12.445399 (xid=0xe75e1)
>> [2016-05-04 14:43:17.780898] W [MSGID: 114031] [client-rpc-fops.c:2974:client3_3_lookup_cbk] 0-thevolume-client-1: remote operation failed. Path: / (00000000-0000-0000-0000-000000000001) [Transport endpoint is not connected]
>> [2016-05-04 14:43:17.780963] E [MSGID: 108006] [afr-common.c:4046:afr_notify] 0-thevolume-replicate-0: All subvolumes are down. Going offline until atleast one of them comes back up.
>> [2016-05-04 14:43:17.781039] W [MSGID: 114031] [client-rpc-fops.c:2974:client3_3_lookup_cbk] 0-thevolume-client-1: remote operation failed. Path: / (00000000-0000-0000-0000-000000000001) [Transport endpoint is not connected]
>> [2016-05-04 14:43:17.781088] E [dht-helper.c:1597:dht_inode_ctx_time_update] (-->/usr/lib64/glusterfs/3.7.11/xlator/cluster/replicate.so(afr_discover_cbk+0x3fc) [0x7f03b7965edc] -->/usr/lib64/glusterfs/3.7.11/xlator/cluster/distribute.so(dht_lookup_dir_cbk+0x359) [0x7f03b76dc0f9] -->/usr/lib64/glusterfs/3.7.11/xlator/cluster/distribute.so(dht_inode_ctx_time_update+0x210) [0x7f03b76b8b20] ) 0-thevolume-dht: invalid argument: inode [Invalid argument]
>> [2016-05-04 14:43:17.781843] E [dht-helper.c:1597:dht_inode_ctx_time_update] (-->/usr/lib64/glusterfs/3.7.11/xlator/cluster/replicate.so(afr_discover+0x14a) [0x7f03b795db3a] -->/usr/lib64/glusterfs/3.7.11/xlator/cluster/distribute.so(dht_lookup_dir_cbk+0x359) [0x7f03b76dc0f9] -->/usr/lib64/glusterfs/3.7.11/xlator/cluster/distribute.so(dht_inode_ctx_time_update+0x210) [0x7f03b76b8b20] ) 0-thevolume-dht: invalid argument: inode [Invalid argument]
>> [2016-05-04 14:43:17.781937] E [dht-helper.c:1597:dht_inode_ctx_time_update] (-->/usr/lib64/glusterfs/3.7.11/xlator/cluster/replicate.so(afr_discover+0x14a) [0x7f03b795db3a] -->/usr/lib64/glusterfs/3.7.11/xlator/cluster/distribute.so(dht_lookup_dir_cbk+0x359) [0x7f03b76dc0f9] -->/usr/lib64/glusterfs/3.7.11/xlator/cluster/distribute.so(dht_inode_ctx_time_update+0x210) [0x7f03b76b8b20] ) 0-thevolume-dht: invalid argument: inode [Invalid argument]
>> [2016-05-04 14:43:17.784540] E [dht-helper.c:1597:dht_inode_ctx_time_update] (-->/usr/lib64/glusterfs/3.7.11/xlator/cluster/replicate.so(afr_lookup+0x199) [0x7f03b795e1c9] -->/usr/lib64/glusterfs/3.7.11/xlator/cluster/distribute.so(dht_lookup_dir_cbk+0x359) [0x7f03b76dc0f9] -->/usr/lib64/glusterfs/3.7.11/xlator/cluster/distribute.so(dht_inode_ctx_time_update+0x210) [0x7f03b76b8b20] ) 0-thevolume-dht: invalid argument: inode [Invalid argument]
>> [2016-05-04 14:43:17.784576] W [fuse-bridge.c:467:fuse_entry_cbk] 0-glusterfs-fuse: 630384: LOOKUP() /magento/catalog/product/e/m/emmett_3seater_sierra_blue_lb5.jpg => -1 (Transport endpoint is not connected)
>> [2016-05-04 14:43:17.789047] E [dht-helper.c:1597:dht_inode_ctx_time_update] (-->/usr/lib64/glusterfs/3.7.11/xlator/cluster/replicate.so(afr_lookup+0x199) [0x7f03b795e1c9] -->/usr/lib64/glusterfs/3.7.11/xlator/cluster/distribute.so(dht_lookup_dir_cbk+0x359) [0x7f03b76dc0f9] -->/usr/lib64/glusterfs/3.7.11/xlator/cluster/distribute.so(dht_inode_ctx_time_update+0x210) [0x7f03b76b8b20] ) 0-thevolume-dht: invalid argument: inode [Invalid argument]
>> [2016-05-04 14:43:17.789080] W [fuse-bridge.c:467:fuse_entry_cbk] 0-glusterfs-fuse: 630386: LOOKUP() /magento/catalog/product/e/m/emmett_3seater_sierra_blue_lb5_1.jpg => -1 (Transport endpoint is not connected)
>> [2016-05-04 14:43:17.792013] E [dht-helper.c:1597:dht_inode_ctx_time_update] (-->/usr/lib64/glusterfs/3.7.11/xlator/cluster/replicate.so(afr_lookup+0x199) [0x7f03b795e1c9] -->/usr/lib64/glusterfs/3.7.11/xlator/cluster/distribute.so(dht_lookup_dir_cbk+0x359) [0x7f03b76dc0f9] -->/usr/lib64/glusterfs/3.7.11/xlator/cluster/distribute.so(dht_inode_ctx_time_update+0x210) [0x7f03b76b8b20] ) 0-thevolume-dht: invalid argument: inode [Invalid argument]
>> [2016-05-04 14:43:17.792047] W [fuse-bridge.c:467:fuse_entry_cbk] 0-glusterfs-fuse: 630388: LOOKUP() /magento/catalog/product/e/m/emmett_3seater_sierra_blue_lb6.jpg => -1 (Transport endpoint is not connected)
>> [2016-05-04 14:43:17.792272] E [MSGID: 114031] [client-rpc-fops.c:1676:client3_3_finodelk_cbk] 0-thevolume-client-1: remote operation failed [Transport endpoint is not connected]
>> [2016-05-04 14:43:17.794634] E [dht-helper.c:1597:dht_inode_ctx_time_update] (-->/usr/lib64/glusterfs/3.7.11/xlator/cluster/replicate.so(afr_lookup+0x199) [0x7f03b795e1c9] -->/usr/lib64/glusterfs/3.7.11/xlator/cluster/distribute.so(dht_lookup_dir_cbk+0x359) [0x7f03b76dc0f9] -->/usr/lib64/glusterfs/3.7.11/xlator/cluster/distribute.so(dht_inode_ctx_time_update+0x210) [0x7f03b76b8b20] ) 0-thevolume-dht: invalid argument: inode [Invalid argument]
>> [2016-05-04 14:43:17.794664] W [fuse-bridge.c:467:fuse_entry_cbk] 0-glusterfs-fuse: 630390: LOOKUP() /magento/catalog/product/e/m/emmett_3seater_sierra_blue_lb6_1.jpg => -1 (Transport endpoint is not connected)
>> [2016-05-04 14:43:17.796674] E [dht-helper.c:1597:dht_inode_ctx_time_update] (-->/usr/lib64/glusterfs/3.7.11/xlator/cluster/replicate.so(afr_discover+0x14a) [0x7f03b795db3a] -->/usr/lib64/glusterfs/3.7.11/xlator/cluster/distribute.so(dht_lookup_dir_cbk+0x359) [0x7f03b76dc0f9] -->/usr/lib64/glusterfs/3.7.11/xlator/cluster/distribute.so(dht_inode_ctx_time_update+0x210) [0x7f03b76b8b20] ) 0-thevolume-dht: invalid argument: inode [Invalid argument]
>> [2016-05-04 14:43:17.796770] E [dht-helper.c:1597:dht_inode_ctx_time_update] (-->/usr/lib64/glusterfs/3.7.11/xlator/cluster/replicate.so(afr_discover+0x14a) [0x7f03b795db3a] -->/usr/lib64/glusterfs/3.7.11/xlator/cluster/distribute.so(dht_lookup_dir_cbk+0x359) [0x7f03b76dc0f9] -->/usr/lib64/glusterfs/3.7.11/xlator/cluster/distribute.so(dht_inode_ctx_time_update+0x210) [0x7f03b76b8b20] ) 0-thevolume-dht: invalid argument: inode [Invalid argument]
>> [2016-05-04 14:43:17.796923] E [dht-helper.c:1597:dht_inode_ctx_time_update] (-->/usr/lib64/glusterfs/3.7.11/xlator/cluster/replicate.so(afr_discover+0x14a) [0x7f03b795db3a] -->/usr/lib64/glusterfs/3.7.11/xlator/cluster/distribute.so(dht_lookup_dir_cbk+0x359) [0x7f03b76dc0f9] -->/usr/lib64/glusterfs/3.7.11/xlator/cluster/distribute.so(dht_inode_ctx_time_update+0x210) [0x7f03b76b8b20] ) 0-thevolume-dht: invalid argument: inode [Invalid argument]
>> [2016-05-04 14:43:17.799405] E [dht-helper.c:1597:dht_inode_ctx_time_update] (-->/usr/lib64/glusterfs/3.7.11/xlator/cluster/replicate.so(afr_discover+0x14a) [0x7f03b795db3a] -->/usr/lib64/glusterfs/3.7.11/xlator/cluster/distribute.so(dht_lookup_dir_cbk+0x359) [0x7f03b76dc0f9] -->/usr/lib64/glusterfs/3.7.11/xlator/cluster/distribute.so(dht_inode_ctx_time_update+0x210) [0x7f03b76b8b20] ) 0-thevolume-dht: invalid argument: inode [Invalid argument]
>> [2016-05-04 14:43:17.799580] E [dht-helper.c:1597:dht_inode_ctx_time_update] (-->/usr/lib64/glusterfs/3.7.11/xlator/cluster/replicate.so(afr_lookup+0x199) [0x7f03b795e1c9] -->/usr/lib64/glusterfs/3.7.11/xlator/cluster/distribute.so(dht_lookup_dir_cbk+0x359) [0x7f03b76dc0f9] -->/usr/lib64/glusterfs/3.7.11/xlator/cluster/distribute.so(dht_inode_ctx_time_update+0x210) [0x7f03b76b8b20] ) 0-thevolume-dht: invalid argument: inode [Invalid argument]
>> [2016-05-04 14:43:17.799604] W [fuse-bridge.c:467:fuse_entry_cbk] 0-glusterfs-fuse: 630396: LOOKUP() /magento/catalog/product/e/m/emmett_3seater_sierra_blue_lb6_2.jpg => -1 (Transport endpoint is not connected)
>> [2016-05-04 14:43:17.801365] E [dht-helper.c:1597:dht_inode_ctx_time_update] (-->/usr/lib64/glusterfs/3.7.11/xlator/cluster/replicate.so(afr_discover+0x14a) [0x7f03b795db3a] -->/usr/lib64/glusterfs/3.7.11/xlator/cluster/distribute.so(dht_lookup_dir_cbk+0x359) [0x7f03b76dc0f9] -->/usr/lib64/glusterfs/3.7.11/xlator/cluster/distribute.so(dht_inode_ctx_time_update+0x210) [0x7f03b76b8b20] ) 0-thevolume-dht: invalid argument: inode [Invalid argument]
>> [2016-05-04 14:43:17.802111] E [dht-helper.c:1597:dht_inode_ctx_time_update] (-->/usr/lib64/glusterfs/3.7.11/xlator/cluster/replicate.so(afr_discover+0x14a) [0x7f03b795db3a] -->/usr/lib64/glusterfs/3.7.11/xlator/cluster/distribute.so(dht_lookup_dir_cbk+0x359) [0x7f03b76dc0f9] -->/usr/lib64/glusterfs/3.7.11/xlator/cluster/distribute.so(dht_inode_ctx_time_update+0x210) [0x7f03b76b8b20] ) 0-thevolume-dht: invalid argument: inode [Invalid argument]
>> [2016-05-04 14:43:17.818490] E [dht-helper.c:1597:dht_inode_ctx_time_update] (-->/usr/lib64/glusterfs/3.7.11/xlator/cluster/replicate.so(afr_discover+0x14a) [0x7f03b795db3a] -->/usr/lib64/glusterfs/3.7.11/xlator/cluster/distribute.so(dht_lookup_dir_cbk+0x359) [0x7f03b76dc0f9] -->/usr/lib64/glusterfs/3.7.11/xlator/cluster/distribute.so(dht_inode_ctx_time_update+0x210) [0x7f03b76b8b20] ) 0-thevolume-dht: invalid argument: inode [Invalid argument]
>> [2016-05-04 14:43:19.770865] E [MSGID: 114031] [client-rpc-fops.c:1676:client3_3_finodelk_cbk] 0-thevolume-client-2: remote operation failed [Transport endpoint is not connected]
>> [2016-05-04 14:43:19.770930] E [MSGID: 114031] [client-rpc-fops.c:1676:client3_3_finodelk_cbk] 0-thevolume-client-2: remote operation failed [Transport endpoint is not connected]
>> [2016-05-04 14:43:19.771811] E [MSGID: 114031] [client-rpc-fops.c:1676:client3_3_finodelk_cbk] 0-thevolume-client-1: remote operation failed [Invalid argument]
>> [2016-05-04 14:43:19.771852] E [MSGID: 108010] [afr-lk-common.c:665:afr_unlock_inodelk_cbk] 0-thevolume-replicate-0: path=(null) gfid=00000000-0000-0000-0000-000000000000: unlock failed on subvolume thevolume-client-1 with lock owner d03a00a8037f0000 [Invalid argument]
>> [2016-05-04 14:43:19.771878] W [fuse-bridge.c:1287:fuse_err_cbk] 0-glusterfs-fuse: 630361: FLUSH() ERR => -1 (Transport endpoint is not connected)
>> [2016-05-04 14:43:19.771937] E [MSGID: 114031] [client-rpc-fops.c:1676:client3_3_finodelk_cbk] 0-thevolume-client-1: remote operation failed [Invalid argument]
>> [2016-05-04 14:43:19.771959] E [MSGID: 108010] [afr-lk-common.c:665:afr_unlock_inodelk_cbk] 0-thevolume-replicate-0: path=(null) gfid=00000000-0000-0000-0000-000000000000: unlock failed on subvolume thevolume-client-1 with lock owner b43800a8037f0000 [Invalid argument]
>> [2016-05-04 14:43:19.771979] W [fuse-bridge.c:1287:fuse_err_cbk] 0-glusterfs-fuse: 630375: FLUSH() ERR => -1 (Transport endpoint is not connected)
>> The message "W [MSGID: 114031] [client-rpc-fops.c:1917:client3_3_fxattrop_cbk] 0-thevolume-client-2: remote operation failed" repeated 19 times between [2016-05-04 14:43:15.770856] and [2016-05-04 14:43:15.773223]
>>
>>
>> The mounted volume becomes read only on the client only. The "gluster"
>> volume is still writeable on the gluster servers.
>> gluster volume status does not report anything funky.
>> If I kill the glusterfs process on the client (the box that has gluster
>> volume mounted via fuse), and then MOUNT again, I can carry on copying and
>> writing, until I get those errors again.
>>
>> I have successfully copied the same amount of data directly to the XFS
>> volume on the glusterfs server nodes, I believe my XFS works.
>>
>> All the machines are on AWS, and none of the resources are exhausted
>> IO/RAM/CPU/NETWORK, not on client, not on gluster cluster.
>>
>> Please help!
>>
>
> Posting the `gluster vol info` output for the volume will help everyone
> get a better picture about your volume (sanitize it to remove any sensitive
> information first).
>
> Have you enabled any options on the volume, particularly any quorum
> options? Client-quorum makes a client read-only when it cannot connect to a
> quorum of servers (normally 50% of replica count +1).
> Your mount log shows that you've had connection issues (a lot of 'Transport
> endpoint is not connected' messages).
> Can you verify that the network between the client and server isn't having
> problems when you see these errors?
>
>> _______________________________________________
>> Gluster-users mailing list
>> Gluster-users at gluster.org
>> http://www.gluster.org/mailman/listinfo/gluster-users
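
The client-quorum rule quoted above ("normally 50% of replica count +1") can be read as simple integer arithmetic. The sketch below is only an illustration of that rule as stated in the reply, assuming "50%" means integer division by two; the function names are hypothetical and not part of GlusterFS. Note that the volume info in this thread lists no quorum option under "Options Reconfigured", so whether client-quorum is in effect here is not established.

```python
def quorum_needed(replica_count: int) -> int:
    """Bricks a client must reach under the rule quoted in the thread.

    Assumption: "50% of replica count + 1" is taken as integer
    division by two, plus one (so 2 out of 3 for a replica-3 volume).
    """
    return replica_count // 2 + 1


def client_is_writable(replica_count: int, reachable_bricks: int) -> bool:
    """True if the client still reaches a quorum of bricks."""
    return reachable_bricks >= quorum_needed(replica_count)


if __name__ == "__main__":
    # The volume in this thread is "1 x 3 = 3", i.e. replica count 3.
    for reachable in range(3, -1, -1):
        state = "read-write" if client_is_writable(3, reachable) else "read-only"
        print(f"replica 3, {reachable} brick(s) reachable: {state}")
```

Under this reading, a replica-3 client that loses contact with two bricks (as the `client-1` and `client-2` errors in the log suggest) would drop below quorum if client-quorum were enabled.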
Kaushal M
2016-May-05 09:56 UTC
[Gluster-users] GLUSTER fuse client, mounted volume becomes read only
On Thu, May 5, 2016 at 2:26 PM, Egidijus Ligeika <egidijus.ligeika at made.com> wrote:
> Hi Kaushal,
> This is the volume info:
>
> Volume Name: thevolume
> Type: Replicate
> Volume ID: da774a83-b426-42bd-b1ec-359b4e71314f
> Status: Started
> Number of Bricks: 1 x 3 = 3
> Transport-type: tcp
> Bricks:
> Brick1: gluster-a.mag-test-madeinternal.com:/gluster/brick
> Brick2: gluster-b.mag-test-madeinternal.com:/gluster/brick
> Brick3: gluster-c.mag-test-madeinternal.com:/gluster/brick
> Options Reconfigured:
> performance.quick-read: off
> network.ping-timeout: 30
> performance.cache-size: 1853171712
> performance.cache-refresh-timeout: 60
> performance.io-thread-count: 32
> performance.write-behind-window-size: 4MB
> diagnostics.client-log-level: WARNING
> diagnostics.brick-log-level: WARNING
> cluster.self-heal-daemon: on
> nfs.disable: true
> performance.readdir-ahead: on
>
>
> So my gluster FUSE clients are using not vanilla centos AMI, the AMI is
> one of the versions by bashton:
> https://www.bashton.com/blog/2015/centos-7-2-1511-ami/
>
> My gluster cluster server AMI is using, our in-house AMI, that has some
> sysadmin packages, epel and docker baked-in.
> I created a pretend client machine using our in-house AMI, I mounted the
> same cluster, the same way, with the same version of glusterfuse as on the
> broken client. Then I copied the same files again. The new pretend client
> had not a single issue. no issues in logs, everything completed
> successfully without problems.
>
> I think - the problem is solved with the in-house AMI.
>

I'm glad that you found an alternate solution to the problem.

> What can I share to help people investigate if it's the OS + package
> combination that might be breaking their gluster?
>

The bashton AMI has a different network driver. I'm not saying it could be the issue, but that is something that could be investigated. Do you see any ping-timeouts in the log?
You could look at the logs around the time you get the read-only mount for more information. Also, what version of GlusterFS are you using?

> Regards,
>
> E.
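
One way to follow the suggestion to look at the mount log around the time the mount goes read-only is to count the disconnect-related messages per subvolume. The sketch below is a rough illustration, not a GlusterFS tool: the regular expressions are assumptions derived only from the log lines quoted in this thread, the marker strings are the ones that appear in those lines, and `summarize` is a hypothetical helper name.

```python
import re
from collections import Counter

# Line shape assumed from the thread, e.g.:
# [2016-05-04 14:43:17.780541] W [rpc-clnt-ping.c:208:rpc_clnt_ping_cbk] 0-thevolume-client-1: socket disconnected
LINE_RE = re.compile(
    r"^\[(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d+)\] "
    r"(?P<level>[A-Z]) "
    r"(?:\[MSGID: \d+\] )?"
    r"\[(?P<src>[^\]]+)\] "
    r"(?P<rest>.*)$"
)
# Translator name such as "0-thevolume-client-1" followed by ": ".
SUBVOL_RE = re.compile(r"(\d+-[A-Za-z][\w-]*): ")

# Strings taken verbatim from the log excerpt in this thread.
MARKERS = (
    "Transport endpoint is not connected",
    "socket disconnected",
    "All subvolumes are down",
)


def summarize(lines):
    """Count marker messages per (subvolume, marker); keep first timestamp."""
    counts = Counter()
    first_seen = {}
    for line in lines:
        match = LINE_RE.match(line.strip())
        if not match:
            continue  # not a log entry in the assumed shape
        rest = match.group("rest")
        sub = SUBVOL_RE.search(rest)
        name = sub.group(1) if sub else "unknown"
        for marker in MARKERS:
            if marker in rest:
                key = (name, marker)
                counts[key] += 1
                first_seen.setdefault(key, match.group("ts"))
    return counts, first_seen


# Three entries copied from the log quoted earlier in the thread.
SAMPLE = [
    "[2016-05-04 14:43:17.780541] W [rpc-clnt-ping.c:208:rpc_clnt_ping_cbk] 0-thevolume-client-1: socket disconnected",
    "[2016-05-04 14:43:17.780382] E [MSGID: 114031] [client-rpc-fops.c:1676:client3_3_finodelk_cbk] 0-thevolume-client-2: remote operation failed [Transport endpoint is not connected]",
    "[2016-05-04 14:43:17.780963] E [MSGID: 108006] [afr-common.c:4046:afr_notify] 0-thevolume-replicate-0: All subvolumes are down. Going offline until atleast one of them comes back up.",
]

if __name__ == "__main__":
    counts, first_seen = summarize(SAMPLE)
    for key in sorted(counts):
        print(f"{key[0]:26} {key[1]:38} x{counts[key]}  first at {first_seen[key]}")
```

To run it against a real log, pass an open file object to `summarize`, for example the client mount log under /var/log/glusterfs/ mentioned in the original post. Any ping-timeout message format would need to be added to `MARKERS` after checking what the installed GlusterFS version actually logs.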