Kaushal M
2016-May-05 10:13 UTC
[Gluster-users] GLUSTER fuse client, mounted volume becomes read only
On Thu, May 5, 2016 at 3:38 PM, Egidijus Ligeika <egidijus.ligeika at made.com> wrote:
> Hey Kaushal,
>
> Bashton AMI:
> There are timeouts.
>
> [2016-05-04 13:47:52.160729] D [rpc-clnt.c:1021:rpc_clnt_connection_init] 0-glusterfs: disable ping-timeout
> [2016-05-04 13:47:52.171008] D [rpc-clnt-ping.c:281:rpc_clnt_start_ping] 0-glusterfs: ping timeout is 0, returning
> 9: option ping-timeout 30
> 18: option ping-timeout 30
> 27: option ping-timeout 30
> [2016-05-05 09:51:53.707843] C [rpc-clnt-ping.c:165:rpc_clnt_ping_timer_expired] 0-thevolume-client-0: server 10.10.10.239:49152 has not responded in the last 30 seconds, disconnecting.
> [2016-05-05 09:51:53.709692] W [rpc-clnt-ping.c:208:rpc_clnt_ping_cbk] 0-thevolume-client-0: socket disconnected
>
> These same options, do not produce any timeouts when I am using the in-house AMI.
>
> I am using the glusterfs-fuse.x86_64 3.7.11-1.el7 glusterfs-epel version
>
> wget -P /etc/yum.repos.d/ http://download.gluster.org/pub/gluster/glusterfs/3.7/3.7.8/EPEL.repo/glusterfs-epel.repo
> [root at web-i8faf7d03 yum.repos.d]# yum --showduplicates list glusterfs-fuse | expand
> Loaded plugins: cob, fastestmirror
> Loading mirror speeds from cached hostfile
> * base: ftp.heanet.ie
> * epel: s3-mirror-eu-west-1.fedoraproject.org
> * extras: ftp.heanet.ie
> * updates: ftp.heanet.ie
> Installed Packages
> glusterfs-fuse.x86_64 3.7.1-16.0.1.el7.centos @updates
> Available Packages
> glusterfs-fuse.x86_64 3.7.1-16.el7 base
> glusterfs-fuse.x86_64 3.7.1-16.0.1.el7.centos updates
> glusterfs-fuse.x86_64 3.7.11-1.el7 glusterfs-epel

Is this for the bashton AMI?
You're using the glusterfs-fuse provided by the centos repos (3.7.1-16.0.1.el7.centos), which are rebuilds of the Red Hat Gluster Storage client bits.
These are not directly compatible with the community versions. (They should work with community servers, but not guaranteed).
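The mismatch described here can be read straight off the `yum --showduplicates list` output quoted in this message: the row under "Installed Packages" comes from `@updates`, not from `glusterfs-epel`. A minimal sketch in Python, assuming the listing has been saved as text with one package per line and three whitespace-separated columns; the function name `split_yum_listing` and the sample string are illustrative only and are not part of yum or GlusterFS:

```python
def split_yum_listing(text):
    """Split `yum list` output into installed and available (name, version, repo) rows."""
    installed, available = [], []
    current = None
    for line in text.splitlines():
        line = line.strip()
        if line == "Installed Packages":
            current = installed
        elif line == "Available Packages":
            current = available
        elif current is not None:
            fields = line.split()
            # Assumption: each package row has exactly name, version, repo.
            if len(fields) == 3:
                current.append(tuple(fields))
    return installed, available


# Sample copied from the listing quoted in the thread (mirror lines omitted).
listing = """\
Installed Packages
glusterfs-fuse.x86_64 3.7.1-16.0.1.el7.centos @updates
Available Packages
glusterfs-fuse.x86_64 3.7.1-16.el7 base
glusterfs-fuse.x86_64 3.7.1-16.0.1.el7.centos updates
glusterfs-fuse.x86_64 3.7.11-1.el7 glusterfs-epel
"""

installed, available = split_yum_listing(listing)
wanted_repo = "glusterfs-epel"
installed_versions = {version for _, version, _ in installed}
epel_versions = {version for _, version, repo in available if repo == wanted_repo}

for name, version, repo in installed:
    status = "matches" if version in epel_versions else "does NOT match"
    print(f"{name} {version} ({repo}) {status} the {wanted_repo} build")
```

On a live system the installed build can also be confirmed with `rpm -q glusterfs-fuse` or `glusterfs --version`; the sketch only helps when all you have is a pasted listing.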
Could you check using the glusterfs-epel package (3.7.11-1.el7), which is what you used in the in-house image?

> Regards,
>
> E.
>
> On Thu, 5 May 2016 at 10:56 Kaushal M <kshlmster at gmail.com> wrote:
>
>> On Thu, May 5, 2016 at 2:26 PM, Egidijus Ligeika <egidijus.ligeika at made.com> wrote:
>>
>>> Hi Kaushal,
>>> This is the volume info:
>>>
>>> Volume Name: thevolume
>>> Type: Replicate
>>> Volume ID: da774a83-b426-42bd-b1ec-359b4e71314f
>>> Status: Started
>>> Number of Bricks: 1 x 3 = 3
>>> Transport-type: tcp
>>> Bricks:
>>> Brick1: gluster-a.mag-test-madeinternal.com:/gluster/brick
>>> Brick2: gluster-b.mag-test-madeinternal.com:/gluster/brick
>>> Brick3: gluster-c.mag-test-madeinternal.com:/gluster/brick
>>> Options Reconfigured:
>>> performance.quick-read: off
>>> network.ping-timeout: 30
>>> performance.cache-size: 1853171712
>>> performance.cache-refresh-timeout: 60
>>> performance.io-thread-count: 32
>>> performance.write-behind-window-size: 4MB
>>> diagnostics.client-log-level: WARNING
>>> diagnostics.brick-log-level: WARNING
>>> cluster.self-heal-daemon: on
>>> nfs.disable: true
>>> performance.readdir-ahead: on
>>>
>>> So my gluster FUSE clients are using not vanilla centos AMI, the AMI is one of the versions by bashton:
>>> https://www.bashton.com/blog/2015/centos-7-2-1511-ami/
>>>
>>> My gluster cluster server AMI is using, our in-house AMI, that has some sysadmin packages, epel and docker baked-in.
>>> I created a pretend client machine using our in-house AMI, I mounted the same cluster, the same way, with the same version of glusterfuse as on the broken client. Then I copied the same files again. The new pretend client had not a single issue. no issues in logs, everything completed successfully without problems.
>>>
>>> I think - the problem is solved with the in-house AMI.
>>
>> I'm glad that you found an alternate solution to the problem.
>>
>>> What can I share to help people investigate if it's the OS + package combination that might be breaking their gluster?
>>
>> The bashton AMI has a different network driver. I'm not saying it is the issue, but it is something that could be investigated.
>>
>> Do you see any ping-timeouts in the log? You could look at the logs around the time you get the read-only mount for more information.
>>
>> Also, what version of GlusterFS are you using?
>>
>>> Regards,
>>>
>>> E.
>>>
>>> On Thu, 5 May 2016 at 07:40 Kaushal M <kshlmster at gmail.com> wrote:
>>>
>>>> On Wed, May 4, 2016 at 9:12 PM, Egidijus Ligeika <egidijus.ligeika at made.com> wrote:
>>>>
>>>>> Hello,
>>>>>
>>>>> I have glusterfs client and server v3.7.11.
>>>>> The servers are running inside docker on a debian image, on a centos host.
>>>>>
>>>>> I am using the fuse client mounting to a list of DNS A records.
>>>>> I use XFS as the underlying storage, formatted with an inode size of 512.
>>>>>
>>>>> I can see the client and the cluster is clean and happy, heals work.
>>>>> When writing tens of gigabytes to the cluster via the fuse client I see errors like this ON THE FUSE CLIENT /var/log/glusterfs/mountname:
>>>>>
>>>>> [2016-05-04 14:43:17.779936] W [rpc-clnt.c:1606:rpc_clnt_submit] 0-thevolume-client-1: failed to submit rpc-request (XID: 0xe75f8 Program: GlusterFS 3.3, ProgVers: 330, Proc: 27) to rpc-transport (thevolume-client-1)
>>>>> [2016-05-04 14:43:17.779958] W [MSGID: 114031] [client-rpc-fops.c:2974:client3_3_lookup_cbk] 0-thevolume-client-1: remote operation failed.
Path: / (00000000-0000-0000-0000-000000000001) [Transport endpoint is not connected]
>>>>> [2016-05-04 14:43:17.780382] E [MSGID: 114031] [client-rpc-fops.c:1676:client3_3_finodelk_cbk] 0-thevolume-client-2: remote operation failed [Transport endpoint is not connected]
>>>>> [2016-05-04 14:43:17.780525] E [rpc-clnt.c:362:saved_frames_unwind] (--> /lib64/libglusterfs.so.0(_gf_log_callingfn+0x192)[0x7f03bf304ae2] (--> /lib64/libgfrpc.so.0(saved_frames_unwind+0x1de)[0x7f03bf0cf90e] (--> /lib64/libgfrpc.so.0(saved_frames_destroy+0xe)[0x7f03bf0cfa1e] (--> /lib64/libgfrpc.so.0(rpc_clnt_connection_cleanup+0x7a)[0x7f03bf0d140a] (--> /lib64/libgfrpc.so.0(rpc_clnt_notify+0x88)[0x7f03bf0d1c38] ))))) 0-thevolume-client-1: forced unwinding frame type(GF-DUMP) op(NULL(2)) called at 2016-05-04 14:42:47.766130 (xid=0xe75df)
>>>>> [2016-05-04 14:43:17.780541] W [rpc-clnt-ping.c:208:rpc_clnt_ping_cbk] 0-thevolume-client-1: socket disconnected
>>>>> [2016-05-04 14:43:17.780645] E [rpc-clnt.c:362:saved_frames_unwind] (--> /lib64/libglusterfs.so.0(_gf_log_callingfn+0x192)[0x7f03bf304ae2] (--> /lib64/libgfrpc.so.0(saved_frames_unwind+0x1de)[0x7f03bf0cf90e] (--> /lib64/libgfrpc.so.0(saved_frames_destroy+0xe)[0x7f03bf0cfa1e] (--> /lib64/libgfrpc.so.0(rpc_clnt_connection_cleanup+0x7a)[0x7f03bf0d140a] (--> /lib64/libgfrpc.so.0(rpc_clnt_notify+0x88)[0x7f03bf0d1c38] ))))) 0-thevolume-client-1: forced unwinding frame type(GlusterFS 3.3) op(LOOKUP(27)) called at 2016-05-04 14:42:55.772720 (xid=0xe75e0)
>>>>> [2016-05-04 14:43:17.780675] W [MSGID: 114031] [client-rpc-fops.c:2974:client3_3_lookup_cbk] 0-thevolume-client-1: remote operation failed. Path: / (00000000-0000-0000-0000-000000000001) [Transport endpoint is not connected]
>>>>> [2016-05-04 14:43:17.780880] E [rpc-clnt.c:362:saved_frames_unwind] (--> /lib64/libglusterfs.so.0(_gf_log_callingfn+0x192)[0x7f03bf304ae2] (--> /lib64/libgfrpc.so.0(saved_frames_unwind+0x1de)[0x7f03bf0cf90e] (--> /lib64/libgfrpc.so.0(saved_frames_destroy+0xe)[0x7f03bf0cfa1e] (--> /lib64/libgfrpc.so.0(rpc_clnt_connection_cleanup+0x7a)[0x7f03bf0d140a] (--> /lib64/libgfrpc.so.0(rpc_clnt_notify+0x88)[0x7f03bf0d1c38] ))))) 0-thevolume-client-1: forced unwinding frame type(GlusterFS 3.3) op(LOOKUP(27)) called at 2016-05-04 14:43:12.445399 (xid=0xe75e1)
>>>>> [2016-05-04 14:43:17.780898] W [MSGID: 114031] [client-rpc-fops.c:2974:client3_3_lookup_cbk] 0-thevolume-client-1: remote operation failed. Path: / (00000000-0000-0000-0000-000000000001) [Transport endpoint is not connected]
>>>>> [2016-05-04 14:43:17.780963] E [MSGID: 108006] [afr-common.c:4046:afr_notify] 0-thevolume-replicate-0: All subvolumes are down. Going offline until atleast one of them comes back up.
>>>>> [2016-05-04 14:43:17.781039] W [MSGID: 114031] [client-rpc-fops.c:2974:client3_3_lookup_cbk] 0-thevolume-client-1: remote operation failed. Path: / (00000000-0000-0000-0000-000000000001) [Transport endpoint is not connected]
>>>>> [2016-05-04 14:43:17.781088] E [dht-helper.c:1597:dht_inode_ctx_time_update] (-->/usr/lib64/glusterfs/3.7.11/xlator/cluster/replicate.so(afr_discover_cbk+0x3fc) [0x7f03b7965edc] -->/usr/lib64/glusterfs/3.7.11/xlator/cluster/distribute.so(dht_lookup_dir_cbk+0x359) [0x7f03b76dc0f9] -->/usr/lib64/glusterfs/3.7.11/xlator/cluster/distribute.so(dht_inode_ctx_time_update+0x210) [0x7f03b76b8b20] ) 0-thevolume-dht: invalid argument: inode [Invalid argument]
>>>>> [2016-05-04 14:43:17.781843] E [dht-helper.c:1597:dht_inode_ctx_time_update] (-->/usr/lib64/glusterfs/3.7.11/xlator/cluster/replicate.so(afr_discover+0x14a) [0x7f03b795db3a] -->/usr/lib64/glusterfs/3.7.11/xlator/cluster/distribute.so(dht_lookup_dir_cbk+0x359) [0x7f03b76dc0f9] -->/usr/lib64/glusterfs/3.7.11/xlator/cluster/distribute.so(dht_inode_ctx_time_update+0x210) [0x7f03b76b8b20] ) 0-thevolume-dht: invalid argument: inode [Invalid argument]
>>>>> [2016-05-04 14:43:17.781937] E [dht-helper.c:1597:dht_inode_ctx_time_update] (-->/usr/lib64/glusterfs/3.7.11/xlator/cluster/replicate.so(afr_discover+0x14a) [0x7f03b795db3a] -->/usr/lib64/glusterfs/3.7.11/xlator/cluster/distribute.so(dht_lookup_dir_cbk+0x359) [0x7f03b76dc0f9] -->/usr/lib64/glusterfs/3.7.11/xlator/cluster/distribute.so(dht_inode_ctx_time_update+0x210) [0x7f03b76b8b20] ) 0-thevolume-dht: invalid argument: inode [Invalid argument]
>>>>> [2016-05-04 14:43:17.784540] E [dht-helper.c:1597:dht_inode_ctx_time_update] (-->/usr/lib64/glusterfs/3.7.11/xlator/cluster/replicate.so(afr_lookup+0x199) [0x7f03b795e1c9] -->/usr/lib64/glusterfs/3.7.11/xlator/cluster/distribute.so(dht_lookup_dir_cbk+0x359) [0x7f03b76dc0f9] -->/usr/lib64/glusterfs/3.7.11/xlator/cluster/distribute.so(dht_inode_ctx_time_update+0x210) [0x7f03b76b8b20] ) 0-thevolume-dht: invalid argument: inode [Invalid argument]
>>>>> [2016-05-04 14:43:17.784576] W [fuse-bridge.c:467:fuse_entry_cbk] 0-glusterfs-fuse: 630384: LOOKUP() /magento/catalog/product/e/m/emmett_3seater_sierra_blue_lb5.jpg => -1 (Transport endpoint is not connected)
>>>>> [2016-05-04 14:43:17.789047] E [dht-helper.c:1597:dht_inode_ctx_time_update] (-->/usr/lib64/glusterfs/3.7.11/xlator/cluster/replicate.so(afr_lookup+0x199) [0x7f03b795e1c9] -->/usr/lib64/glusterfs/3.7.11/xlator/cluster/distribute.so(dht_lookup_dir_cbk+0x359) [0x7f03b76dc0f9] -->/usr/lib64/glusterfs/3.7.11/xlator/cluster/distribute.so(dht_inode_ctx_time_update+0x210) [0x7f03b76b8b20] ) 0-thevolume-dht: invalid argument: inode [Invalid argument]
>>>>> [2016-05-04 14:43:17.789080] W [fuse-bridge.c:467:fuse_entry_cbk] 0-glusterfs-fuse: 630386: LOOKUP() /magento/catalog/product/e/m/emmett_3seater_sierra_blue_lb5_1.jpg => -1 (Transport endpoint is not connected)
>>>>> [2016-05-04 14:43:17.792013] E [dht-helper.c:1597:dht_inode_ctx_time_update] (-->/usr/lib64/glusterfs/3.7.11/xlator/cluster/replicate.so(afr_lookup+0x199) [0x7f03b795e1c9] -->/usr/lib64/glusterfs/3.7.11/xlator/cluster/distribute.so(dht_lookup_dir_cbk+0x359) [0x7f03b76dc0f9] -->/usr/lib64/glusterfs/3.7.11/xlator/cluster/distribute.so(dht_inode_ctx_time_update+0x210) [0x7f03b76b8b20] ) 0-thevolume-dht: invalid argument: inode [Invalid argument]
>>>>> [2016-05-04 14:43:17.792047] W [fuse-bridge.c:467:fuse_entry_cbk] 0-glusterfs-fuse: 630388: LOOKUP() /magento/catalog/product/e/m/emmett_3seater_sierra_blue_lb6.jpg => -1 (Transport endpoint is not connected)
>>>>> [2016-05-04 14:43:17.792272] E [MSGID: 114031] [client-rpc-fops.c:1676:client3_3_finodelk_cbk] 0-thevolume-client-1: remote operation failed [Transport endpoint is not connected]
>>>>> [2016-05-04 14:43:17.794634] E [dht-helper.c:1597:dht_inode_ctx_time_update] (-->/usr/lib64/glusterfs/3.7.11/xlator/cluster/replicate.so(afr_lookup+0x199) [0x7f03b795e1c9] -->/usr/lib64/glusterfs/3.7.11/xlator/cluster/distribute.so(dht_lookup_dir_cbk+0x359) [0x7f03b76dc0f9] -->/usr/lib64/glusterfs/3.7.11/xlator/cluster/distribute.so(dht_inode_ctx_time_update+0x210) [0x7f03b76b8b20] ) 0-thevolume-dht: invalid argument: inode [Invalid argument]
>>>>> [2016-05-04 14:43:17.794664] W [fuse-bridge.c:467:fuse_entry_cbk] 0-glusterfs-fuse: 630390: LOOKUP() /magento/catalog/product/e/m/emmett_3seater_sierra_blue_lb6_1.jpg => -1 (Transport endpoint is not connected)
>>>>> [2016-05-04 14:43:17.796674] E [dht-helper.c:1597:dht_inode_ctx_time_update] (-->/usr/lib64/glusterfs/3.7.11/xlator/cluster/replicate.so(afr_discover+0x14a) [0x7f03b795db3a] -->/usr/lib64/glusterfs/3.7.11/xlator/cluster/distribute.so(dht_lookup_dir_cbk+0x359) [0x7f03b76dc0f9] -->/usr/lib64/glusterfs/3.7.11/xlator/cluster/distribute.so(dht_inode_ctx_time_update+0x210) [0x7f03b76b8b20] ) 0-thevolume-dht: invalid argument: inode [Invalid argument]
>>>>> [2016-05-04 14:43:17.796770] E [dht-helper.c:1597:dht_inode_ctx_time_update] (-->/usr/lib64/glusterfs/3.7.11/xlator/cluster/replicate.so(afr_discover+0x14a) [0x7f03b795db3a] -->/usr/lib64/glusterfs/3.7.11/xlator/cluster/distribute.so(dht_lookup_dir_cbk+0x359) [0x7f03b76dc0f9] -->/usr/lib64/glusterfs/3.7.11/xlator/cluster/distribute.so(dht_inode_ctx_time_update+0x210) [0x7f03b76b8b20] ) 0-thevolume-dht: invalid argument: inode [Invalid argument]
>>>>> [2016-05-04 14:43:17.796923] E [dht-helper.c:1597:dht_inode_ctx_time_update] (-->/usr/lib64/glusterfs/3.7.11/xlator/cluster/replicate.so(afr_discover+0x14a) [0x7f03b795db3a] -->/usr/lib64/glusterfs/3.7.11/xlator/cluster/distribute.so(dht_lookup_dir_cbk+0x359) [0x7f03b76dc0f9] -->/usr/lib64/glusterfs/3.7.11/xlator/cluster/distribute.so(dht_inode_ctx_time_update+0x210) [0x7f03b76b8b20] ) 0-thevolume-dht: invalid argument: inode [Invalid argument]
>>>>> [2016-05-04 14:43:17.799405] E [dht-helper.c:1597:dht_inode_ctx_time_update] (-->/usr/lib64/glusterfs/3.7.11/xlator/cluster/replicate.so(afr_discover+0x14a) [0x7f03b795db3a] -->/usr/lib64/glusterfs/3.7.11/xlator/cluster/distribute.so(dht_lookup_dir_cbk+0x359) [0x7f03b76dc0f9] -->/usr/lib64/glusterfs/3.7.11/xlator/cluster/distribute.so(dht_inode_ctx_time_update+0x210) [0x7f03b76b8b20] ) 0-thevolume-dht: invalid argument: inode [Invalid argument]
>>>>> [2016-05-04 14:43:17.799580] E [dht-helper.c:1597:dht_inode_ctx_time_update] (-->/usr/lib64/glusterfs/3.7.11/xlator/cluster/replicate.so(afr_lookup+0x199) [0x7f03b795e1c9] -->/usr/lib64/glusterfs/3.7.11/xlator/cluster/distribute.so(dht_lookup_dir_cbk+0x359) [0x7f03b76dc0f9] -->/usr/lib64/glusterfs/3.7.11/xlator/cluster/distribute.so(dht_inode_ctx_time_update+0x210) [0x7f03b76b8b20] ) 0-thevolume-dht: invalid argument: inode [Invalid argument]
>>>>> [2016-05-04 14:43:17.799604] W [fuse-bridge.c:467:fuse_entry_cbk] 0-glusterfs-fuse: 630396: LOOKUP() /magento/catalog/product/e/m/emmett_3seater_sierra_blue_lb6_2.jpg => -1 (Transport endpoint is not connected)
>>>>> [2016-05-04 14:43:17.801365] E [dht-helper.c:1597:dht_inode_ctx_time_update] (-->/usr/lib64/glusterfs/3.7.11/xlator/cluster/replicate.so(afr_discover+0x14a) [0x7f03b795db3a] -->/usr/lib64/glusterfs/3.7.11/xlator/cluster/distribute.so(dht_lookup_dir_cbk+0x359) [0x7f03b76dc0f9] -->/usr/lib64/glusterfs/3.7.11/xlator/cluster/distribute.so(dht_inode_ctx_time_update+0x210) [0x7f03b76b8b20] ) 0-thevolume-dht: invalid argument: inode [Invalid argument]
>>>>> [2016-05-04 14:43:17.802111] E [dht-helper.c:1597:dht_inode_ctx_time_update] (-->/usr/lib64/glusterfs/3.7.11/xlator/cluster/replicate.so(afr_discover+0x14a) [0x7f03b795db3a] -->/usr/lib64/glusterfs/3.7.11/xlator/cluster/distribute.so(dht_lookup_dir_cbk+0x359) [0x7f03b76dc0f9] -->/usr/lib64/glusterfs/3.7.11/xlator/cluster/distribute.so(dht_inode_ctx_time_update+0x210) [0x7f03b76b8b20] ) 0-thevolume-dht: invalid argument: inode [Invalid argument]
>>>>> [2016-05-04 14:43:17.818490] E [dht-helper.c:1597:dht_inode_ctx_time_update] (-->/usr/lib64/glusterfs/3.7.11/xlator/cluster/replicate.so(afr_discover+0x14a) [0x7f03b795db3a] -->/usr/lib64/glusterfs/3.7.11/xlator/cluster/distribute.so(dht_lookup_dir_cbk+0x359) [0x7f03b76dc0f9] -->/usr/lib64/glusterfs/3.7.11/xlator/cluster/distribute.so(dht_inode_ctx_time_update+0x210) [0x7f03b76b8b20] ) 0-thevolume-dht: invalid argument: inode [Invalid argument]
>>>>> [2016-05-04 14:43:19.770865] E [MSGID: 114031] [client-rpc-fops.c:1676:client3_3_finodelk_cbk] 0-thevolume-client-2: remote operation failed [Transport endpoint is not connected]
>>>>> [2016-05-04 14:43:19.770930] E [MSGID: 114031] [client-rpc-fops.c:1676:client3_3_finodelk_cbk] 0-thevolume-client-2: remote operation failed [Transport endpoint is not connected]
>>>>> [2016-05-04 14:43:19.771811] E [MSGID: 114031] [client-rpc-fops.c:1676:client3_3_finodelk_cbk] 0-thevolume-client-1: remote operation failed [Invalid argument]
>>>>> [2016-05-04 14:43:19.771852] E [MSGID: 108010] [afr-lk-common.c:665:afr_unlock_inodelk_cbk] 0-thevolume-replicate-0: path=(null) gfid=00000000-0000-0000-0000-000000000000: unlock failed on subvolume thevolume-client-1 with lock owner d03a00a8037f0000 [Invalid argument]
>>>>> [2016-05-04 14:43:19.771878] W [fuse-bridge.c:1287:fuse_err_cbk] 0-glusterfs-fuse: 630361: FLUSH() ERR => -1 (Transport endpoint is not connected)
>>>>> [2016-05-04 14:43:19.771937] E [MSGID: 114031] [client-rpc-fops.c:1676:client3_3_finodelk_cbk] 0-thevolume-client-1: remote operation failed [Invalid argument]
>>>>> [2016-05-04 14:43:19.771959] E [MSGID: 108010] [afr-lk-common.c:665:afr_unlock_inodelk_cbk] 0-thevolume-replicate-0: path=(null) gfid=00000000-0000-0000-0000-000000000000: unlock failed on subvolume thevolume-client-1 with lock owner b43800a8037f0000 [Invalid argument]
>>>>> [2016-05-04 14:43:19.771979] W [fuse-bridge.c:1287:fuse_err_cbk] 0-glusterfs-fuse: 630375: FLUSH() ERR => -1 (Transport endpoint is not connected)
>>>>> The message "W [MSGID: 114031]
[client-rpc-fops.c:1917:client3_3_fxattrop_cbk] 0-thevolume-client-2: remote operation failed" repeated 19 times between [2016-05-04 14:43:15.770856] and [2016-05-04 14:43:15.773223]
>>>>>
>>>>> The mounted volume becomes read only on the client only. The "gluster" volume is still writeable on the gluster servers.
>>>>> gluster volume status does not report anything funky.
>>>>> If I kill the glusterfs process on the client (the box that has the gluster volume mounted via fuse), and then MOUNT again, I can carry on copying and writing, until I get those errors again.
>>>>>
>>>>> I have successfully copied the same amount of data directly to the XFS volume on the glusterfs server nodes, I believe my XFS works.
>>>>>
>>>>> All the machines are on AWS, and none of the resources are exhausted IO/RAM/CPU/NETWORK, not on client, not on gluster cluster.
>>>>>
>>>>> Please help!
>>>>
>>>> Posting the `gluster vol info` output for the volume will help everyone get a better picture about your volume (sanitize it to remove any sensitive information first).
>>>>
>>>> Have you enabled any options on the volume, particularly any quorum options? Client-quorum makes a client read-only when it cannot connect to a quorum of servers (normally 50% of replica count + 1).
>>>> Your mount log shows that you've had connection issues (a lot of 'Transport endpoint is not connected' messages).
>>>> Can you verify that the network between the client and server isn't having problems when you see these errors?
>>>>
>>>>> _______________________________________________
>>>>> Gluster-users mailing list
>>>>> Gluster-users at gluster.org
>>>>> http://www.gluster.org/mailman/listinfo/gluster-users
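The questions raised in this message (are there ping-timeouts, and what does the log show around the time the mount turns read-only) amount to scanning the client mount log for a handful of messages. A rough sketch in Python, assuming the log has one entry per line in the `[timestamp] LEVEL [file:line:function] xlator: message` layout seen in this thread; the function name `find_disconnects` and the list of marker strings are illustrative choices, not a GlusterFS tool:

```python
import re
from collections import Counter

# Layout seen in the thread:
# [timestamp] LEVEL [MSGID: n] [file:line:function] xlator: message
# (the MSGID part is optional)
ENTRY = re.compile(
    r"^\[(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d+)\] "
    r"(?P<level>[A-Z]) "
    r"(?:\[MSGID: \d+\] )?"
    r"\[(?P<src>[^\]]+)\] "
    r"(?P<xlator>\S+): "
    r"(?P<msg>.*)$"
)

# Messages from the thread that point at lost connections (illustrative list).
MARKERS = (
    "has not responded in the last",
    "socket disconnected",
    "All subvolumes are down",
)


def find_disconnects(lines):
    """Return (timestamp, xlator, message) for entries containing a marker."""
    hits = []
    for line in lines:
        match = ENTRY.match(line.strip())
        if match and any(marker in match["msg"] for marker in MARKERS):
            hits.append((match["ts"], match["xlator"], match["msg"]))
    return hits


# Sample entries copied from the logs quoted in this thread.
sample = [
    "[2016-05-05 09:51:53.707843] C [rpc-clnt-ping.c:165:rpc_clnt_ping_timer_expired] "
    "0-thevolume-client-0: server 10.10.10.239:49152 has not responded in the last 30 seconds, disconnecting.",
    "[2016-05-05 09:51:53.709692] W [rpc-clnt-ping.c:208:rpc_clnt_ping_cbk] "
    "0-thevolume-client-0: socket disconnected",
    "[2016-05-04 14:43:17.780963] E [MSGID: 108006] [afr-common.c:4046:afr_notify] "
    "0-thevolume-replicate-0: All subvolumes are down. Going offline until atleast one of them comes back up.",
    "[2016-05-04 14:43:19.771878] W [fuse-bridge.c:1287:fuse_err_cbk] "
    "0-glusterfs-fuse: 630361: FLUSH() ERR => -1 (Transport endpoint is not connected)",
]

hits = find_disconnects(sample)
per_xlator = Counter(xlator for _, xlator, _ in hits)
for ts, xlator, msg in hits:
    print(ts, xlator, msg)
print(dict(per_xlator))
```

To run it against a real client, pass the open mount log (the thread mentions `/var/log/glusterfs/mountname`) instead of `sample`. Grouping hits by xlator name shows whether one brick connection or all of them dropped at the same moment, which helps when comparing the log with network checks made at the same time.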
Egidijus Ligeika
2016-May-05 10:25 UTC
[Gluster-users] GLUSTER fuse client, mounted volume becomes read only
(in-house AMI, GLUSTER SERVER CLUSTER)
glusterfs 3.7.11 built on Apr 18 2016 14:19:11
Repository revision: git://git.gluster.com/glusterfs.git
Copyright (c) 2006-2011 Gluster Inc. <http://www.gluster.com>
GlusterFS comes with ABSOLUTELY NO WARRANTY.
You may redistribute copies of GlusterFS under the terms of the GNU General Public License.

(in-house AMI, GLUSTER FUSE CLIENT)
glusterfs-fuse-3.7.11-1.el7.x86_64
glusterfs 3.7.11 built on Apr 18 2016 13:20:46
Repository revision: git://git.gluster.com/glusterfs.git
Copyright (c) 2006-2013 Red Hat, Inc. <http://www.redhat.com/>
GlusterFS comes with ABSOLUTELY NO WARRANTY.
It is licensed to you under your choice of the GNU Lesser General Public License, version 3 or any later version (LGPLv3 or later), or the GNU General Public License, version 2 (GPLv2), in all cases as published by the Free Software Foundation.

(bashton AMI, GLUSTER FUSE CLIENT)
glusterfs-fuse-3.7.11-1.el7.x86_64
glusterfs 3.7.11 built on Apr 18 2016 13:20:46
Repository revision: git://git.gluster.com/glusterfs.git
Copyright (c) 2006-2013 Red Hat, Inc. <http://www.redhat.com/>
GlusterFS comes with ABSOLUTELY NO WARRANTY.
It is licensed to you under your choice of the GNU Lesser General Public License, version 3 or any later version (LGPLv3 or later), or the GNU General Public License, version 2 (GPLv2), in all cases as published by the Free Software Foundation.

Although I have multiple versions available in yum, I definitely have the latest version installed from EPEL.
All servers and clients have the same version of glusterfs installed.

Regards,
E.

On Thu, 5 May 2016 at 11:13 Kaushal M <kshlmster at gmail.com> wrote:
> On Thu, May 5, 2016 at 3:38 PM, Egidijus Ligeika <egidijus.ligeika at made.com> wrote:
>
>> Hey Kaushal,
>>
>> Bashton AMI:
>> There are timeouts.
>>
>> [2016-05-04 13:47:52.160729] D [rpc-clnt.c:1021:rpc_clnt_connection_init] 0-glusterfs: disable ping-timeout
>> [2016-05-04 13:47:52.171008] D [rpc-clnt-ping.c:281:rpc_clnt_start_ping] 0-glusterfs: ping timeout is 0, returning
>> 9: option ping-timeout 30
>> 18: option ping-timeout 30
>> 27: option ping-timeout 30
>> [2016-05-05 09:51:53.707843] C [rpc-clnt-ping.c:165:rpc_clnt_ping_timer_expired] 0-thevolume-client-0: server 10.10.10.239:49152 has not responded in the last 30 seconds, disconnecting.
>> [2016-05-05 09:51:53.709692] W [rpc-clnt-ping.c:208:rpc_clnt_ping_cbk] 0-thevolume-client-0: socket disconnected
>>
>> These same options, do not produce any timeouts when I am using the in-house AMI.
>>
>> I am using the glusterfs-fuse.x86_64 3.7.11-1.el7 glusterfs-epel version
>>
>> wget -P /etc/yum.repos.d/ http://download.gluster.org/pub/gluster/glusterfs/3.7/3.7.8/EPEL.repo/glusterfs-epel.repo
>> [root at web-i8faf7d03 yum.repos.d]# yum --showduplicates list glusterfs-fuse | expand
>> Loaded plugins: cob, fastestmirror
>> Loading mirror speeds from cached hostfile
>> * base: ftp.heanet.ie
>> * epel: s3-mirror-eu-west-1.fedoraproject.org
>> * extras: ftp.heanet.ie
>> * updates: ftp.heanet.ie
>> Installed Packages
>> glusterfs-fuse.x86_64 3.7.1-16.0.1.el7.centos @updates
>> Available Packages
>> glusterfs-fuse.x86_64 3.7.1-16.el7 base
>> glusterfs-fuse.x86_64 3.7.1-16.0.1.el7.centos updates
>> glusterfs-fuse.x86_64 3.7.11-1.el7 glusterfs-epel
>
> Is this for the bashton AMI?
> You're using the glusterfs-fuse provided by the centos repos (3.7.1-16.0.1.el7.centos), which are rebuilds of the Red Hat Gluster Storage client bits.
> These are not directly compatible with the community versions. (They should work with community servers, but not guaranteed).
> Could you check using the glusterfs-epel package (3.7.11-1.el7), which is what you used in the in-house image?
>
>> Regards,
>>
>> E.
>>
>> On Thu, 5 May 2016 at 10:56 Kaushal M <kshlmster at gmail.com> wrote:
>>
>>> On Thu, May 5, 2016 at 2:26 PM, Egidijus Ligeika <egidijus.ligeika at made.com> wrote:
>>>
>>>> Hi Kaushal,
>>>> This is the volume info:
>>>>
>>>> Volume Name: thevolume
>>>> Type: Replicate
>>>> Volume ID: da774a83-b426-42bd-b1ec-359b4e71314f
>>>> Status: Started
>>>> Number of Bricks: 1 x 3 = 3
>>>> Transport-type: tcp
>>>> Bricks:
>>>> Brick1: gluster-a.mag-test-madeinternal.com:/gluster/brick
>>>> Brick2: gluster-b.mag-test-madeinternal.com:/gluster/brick
>>>> Brick3: gluster-c.mag-test-madeinternal.com:/gluster/brick
>>>> Options Reconfigured:
>>>> performance.quick-read: off
>>>> network.ping-timeout: 30
>>>> performance.cache-size: 1853171712
>>>> performance.cache-refresh-timeout: 60
>>>> performance.io-thread-count: 32
>>>> performance.write-behind-window-size: 4MB
>>>> diagnostics.client-log-level: WARNING
>>>> diagnostics.brick-log-level: WARNING
>>>> cluster.self-heal-daemon: on
>>>> nfs.disable: true
>>>> performance.readdir-ahead: on
>>>>
>>>> So my gluster FUSE clients are using not vanilla centos AMI, the AMI is one of the versions by bashton:
>>>> https://www.bashton.com/blog/2015/centos-7-2-1511-ami/
>>>>
>>>> My gluster cluster server AMI is using, our in-house AMI, that has some sysadmin packages, epel and docker baked-in.
>>>> I created a pretend client machine using our in-house AMI, I mounted the same cluster, the same way, with the same version of glusterfuse as on the broken client. Then I copied the same files again. The new pretend client had not a single issue. no issues in logs, everything completed successfully without problems.
>>>>
>>>> I think - the problem is solved with the in-house AMI.
>>>
>>> I'm glad that you found an alternate solution to the problem.
>>>
>>>> What can I share to help people investigate if it's the OS + package combination that might be breaking their gluster?
>>>
>>> The bashton AMI has a different network driver. I'm not saying it is the issue, but it is something that could be investigated.
>>>
>>> Do you see any ping-timeouts in the log? You could look at the logs around the time you get the read-only mount for more information.
>>>
>>> Also, what version of GlusterFS are you using?
>>>
>>>> Regards,
>>>>
>>>> E.
>>>>
>>>> On Thu, 5 May 2016 at 07:40 Kaushal M <kshlmster at gmail.com> wrote:
>>>>
>>>>> On Wed, May 4, 2016 at 9:12 PM, Egidijus Ligeika <egidijus.ligeika at made.com> wrote:
>>>>>
>>>>>> Hello,
>>>>>>
>>>>>> I have glusterfs client and server v3.7.11.
>>>>>> The servers are running inside docker on a debian image, on a centos host.
>>>>>>
>>>>>> I am using the fuse client mounting to a list of DNS A records.
>>>>>> I use XFS as the underlying storage, formatted with an inode size of 512.
>>>>>>
>>>>>> I can see the client and the cluster is clean and happy, heals work.
>>>>>> When writing tens of gigabytes to the cluster via the fuse client I see errors like this ON THE FUSE CLIENT /var/log/glusterfs/mountname:
>>>>>>
>>>>>> [2016-05-04 14:43:17.779936] W [rpc-clnt.c:1606:rpc_clnt_submit] 0-thevolume-client-1: failed to submit rpc-request (XID: 0xe75f8 Program: GlusterFS 3.3, ProgVers: 330, Proc: 27) to rpc-transport (thevolume-client-1)
>>>>>> [2016-05-04 14:43:17.779958] W [MSGID: 114031] [client-rpc-fops.c:2974:client3_3_lookup_cbk] 0-thevolume-client-1: remote operation failed.
Path: / (00000000-0000-0000-0000-000000000001) [Transport endpoint is not connected]
>>>>>> [2016-05-04 14:43:17.780382] E [MSGID: 114031] [client-rpc-fops.c:1676:client3_3_finodelk_cbk] 0-thevolume-client-2: remote operation failed [Transport endpoint is not connected]
>>>>>> [2016-05-04 14:43:17.780525] E [rpc-clnt.c:362:saved_frames_unwind] (--> /lib64/libglusterfs.so.0(_gf_log_callingfn+0x192)[0x7f03bf304ae2] (--> /lib64/libgfrpc.so.0(saved_frames_unwind+0x1de)[0x7f03bf0cf90e] (--> /lib64/libgfrpc.so.0(saved_frames_destroy+0xe)[0x7f03bf0cfa1e] (--> /lib64/libgfrpc.so.0(rpc_clnt_connection_cleanup+0x7a)[0x7f03bf0d140a] (--> /lib64/libgfrpc.so.0(rpc_clnt_notify+0x88)[0x7f03bf0d1c38] ))))) 0-thevolume-client-1: forced unwinding frame type(GF-DUMP) op(NULL(2)) called at 2016-05-04 14:42:47.766130 (xid=0xe75df)
>>>>>> [2016-05-04 14:43:17.780541] W [rpc-clnt-ping.c:208:rpc_clnt_ping_cbk] 0-thevolume-client-1: socket disconnected
>>>>>> [2016-05-04 14:43:17.780645] E [rpc-clnt.c:362:saved_frames_unwind] (--> /lib64/libglusterfs.so.0(_gf_log_callingfn+0x192)[0x7f03bf304ae2] (--> /lib64/libgfrpc.so.0(saved_frames_unwind+0x1de)[0x7f03bf0cf90e] (--> /lib64/libgfrpc.so.0(saved_frames_destroy+0xe)[0x7f03bf0cfa1e] (--> /lib64/libgfrpc.so.0(rpc_clnt_connection_cleanup+0x7a)[0x7f03bf0d140a] (--> /lib64/libgfrpc.so.0(rpc_clnt_notify+0x88)[0x7f03bf0d1c38] ))))) 0-thevolume-client-1: forced unwinding frame type(GlusterFS 3.3) op(LOOKUP(27)) called at 2016-05-04 14:42:55.772720 (xid=0xe75e0)
>>>>>> [2016-05-04 14:43:17.780675] W [MSGID: 114031] [client-rpc-fops.c:2974:client3_3_lookup_cbk] 0-thevolume-client-1: remote operation failed. Path: / (00000000-0000-0000-0000-000000000001) [Transport endpoint is not connected]
>>>>>> [2016-05-04 14:43:17.780880] E [rpc-clnt.c:362:saved_frames_unwind] (--> /lib64/libglusterfs.so.0(_gf_log_callingfn+0x192)[0x7f03bf304ae2] (--> /lib64/libgfrpc.so.0(saved_frames_unwind+0x1de)[0x7f03bf0cf90e] (--> /lib64/libgfrpc.so.0(saved_frames_destroy+0xe)[0x7f03bf0cfa1e] (--> /lib64/libgfrpc.so.0(rpc_clnt_connection_cleanup+0x7a)[0x7f03bf0d140a] (--> /lib64/libgfrpc.so.0(rpc_clnt_notify+0x88)[0x7f03bf0d1c38] ))))) 0-thevolume-client-1: forced unwinding frame type(GlusterFS 3.3) op(LOOKUP(27)) called at 2016-05-04 14:43:12.445399 (xid=0xe75e1)
>>>>>> [2016-05-04 14:43:17.780898] W [MSGID: 114031] [client-rpc-fops.c:2974:client3_3_lookup_cbk] 0-thevolume-client-1: remote operation failed. Path: / (00000000-0000-0000-0000-000000000001) [Transport endpoint is not connected]
>>>>>> [2016-05-04 14:43:17.780963] E [MSGID: 108006] [afr-common.c:4046:afr_notify] 0-thevolume-replicate-0: All subvolumes are down. Going offline until atleast one of them comes back up.
>>>>>> [2016-05-04 14:43:17.781039] W [MSGID: 114031] [client-rpc-fops.c:2974:client3_3_lookup_cbk] 0-thevolume-client-1: remote operation failed. Path: / (00000000-0000-0000-0000-000000000001) [Transport endpoint is not connected]
>>>>>> [2016-05-04 14:43:17.781088] E [dht-helper.c:1597:dht_inode_ctx_time_update] (-->/usr/lib64/glusterfs/3.7.11/xlator/cluster/replicate.so(afr_discover_cbk+0x3fc) [0x7f03b7965edc] -->/usr/lib64/glusterfs/3.7.11/xlator/cluster/distribute.so(dht_lookup_dir_cbk+0x359) [0x7f03b76dc0f9] -->/usr/lib64/glusterfs/3.7.11/xlator/cluster/distribute.so(dht_inode_ctx_time_update+0x210) [0x7f03b76b8b20] ) 0-thevolume-dht: invalid argument: inode [Invalid argument]
>>>>>> [2016-05-04 14:43:17.781843] E [dht-helper.c:1597:dht_inode_ctx_time_update] (-->/usr/lib64/glusterfs/3.7.11/xlator/cluster/replicate.so(afr_discover+0x14a) [0x7f03b795db3a] -->/usr/lib64/glusterfs/3.7.11/xlator/cluster/distribute.so(dht_lookup_dir_cbk+0x359) [0x7f03b76dc0f9] -->/usr/lib64/glusterfs/3.7.11/xlator/cluster/distribute.so(dht_inode_ctx_time_update+0x210) [0x7f03b76b8b20] ) 0-thevolume-dht: invalid argument: inode [Invalid argument]
>>>>>> [2016-05-04 14:43:17.781937] E [dht-helper.c:1597:dht_inode_ctx_time_update] (-->/usr/lib64/glusterfs/3.7.11/xlator/cluster/replicate.so(afr_discover+0x14a) [0x7f03b795db3a] -->/usr/lib64/glusterfs/3.7.11/xlator/cluster/distribute.so(dht_lookup_dir_cbk+0x359) [0x7f03b76dc0f9] -->/usr/lib64/glusterfs/3.7.11/xlator/cluster/distribute.so(dht_inode_ctx_time_update+0x210) [0x7f03b76b8b20] ) 0-thevolume-dht: invalid argument: inode [Invalid argument]
>>>>>> [2016-05-04 14:43:17.784540] E [dht-helper.c:1597:dht_inode_ctx_time_update] (-->/usr/lib64/glusterfs/3.7.11/xlator/cluster/replicate.so(afr_lookup+0x199) [0x7f03b795e1c9] -->/usr/lib64/glusterfs/3.7.11/xlator/cluster/distribute.so(dht_lookup_dir_cbk+0x359) [0x7f03b76dc0f9] -->/usr/lib64/glusterfs/3.7.11/xlator/cluster/distribute.so(dht_inode_ctx_time_update+0x210) [0x7f03b76b8b20] ) 0-thevolume-dht: invalid argument: inode [Invalid argument]
>>>>>> [2016-05-04 14:43:17.784576] W [fuse-bridge.c:467:fuse_entry_cbk] 0-glusterfs-fuse: 630384: LOOKUP() /magento/catalog/product/e/m/emmett_3seater_sierra_blue_lb5.jpg => -1 (Transport endpoint is not connected)
>>>>>> [2016-05-04 14:43:17.789047] E [dht-helper.c:1597:dht_inode_ctx_time_update] (-->/usr/lib64/glusterfs/3.7.11/xlator/cluster/replicate.so(afr_lookup+0x199) [0x7f03b795e1c9] -->/usr/lib64/glusterfs/3.7.11/xlator/cluster/distribute.so(dht_lookup_dir_cbk+0x359) [0x7f03b76dc0f9] -->/usr/lib64/glusterfs/3.7.11/xlator/cluster/distribute.so(dht_inode_ctx_time_update+0x210) [0x7f03b76b8b20] ) 0-thevolume-dht: invalid argument: inode [Invalid argument]
>>>>>> [2016-05-04 14:43:17.789080] W [fuse-bridge.c:467:fuse_entry_cbk] 0-glusterfs-fuse: 630386: LOOKUP() /magento/catalog/product/e/m/emmett_3seater_sierra_blue_lb5_1.jpg => -1 (Transport endpoint is not connected)
>>>>>> [2016-05-04 14:43:17.792013] E [dht-helper.c:1597:dht_inode_ctx_time_update] (-->/usr/lib64/glusterfs/3.7.11/xlator/cluster/replicate.so(afr_lookup+0x199) [0x7f03b795e1c9] -->/usr/lib64/glusterfs/3.7.11/xlator/cluster/distribute.so(dht_lookup_dir_cbk+0x359) [0x7f03b76dc0f9] -->/usr/lib64/glusterfs/3.7.11/xlator/cluster/distribute.so(dht_inode_ctx_time_update+0x210) [0x7f03b76b8b20] ) 0-thevolume-dht: invalid argument: inode [Invalid argument]
>>>>>> [2016-05-04 14:43:17.792047] W [fuse-bridge.c:467:fuse_entry_cbk] 0-glusterfs-fuse: 630388: LOOKUP() /magento/catalog/product/e/m/emmett_3seater_sierra_blue_lb6.jpg => -1 (Transport endpoint is not connected)
>>>>>> [2016-05-04 14:43:17.792272] E [MSGID: 114031] [client-rpc-fops.c:1676:client3_3_finodelk_cbk] 0-thevolume-client-1: remote operation failed [Transport endpoint is not connected]
>>>>>> [2016-05-04 14:43:17.794634] E [dht-helper.c:1597:dht_inode_ctx_time_update] (-->/usr/lib64/glusterfs/3.7.11/xlator/cluster/replicate.so(afr_lookup+0x199) [0x7f03b795e1c9] -->/usr/lib64/glusterfs/3.7.11/xlator/cluster/distribute.so(dht_lookup_dir_cbk+0x359) [0x7f03b76dc0f9] -->/usr/lib64/glusterfs/3.7.11/xlator/cluster/distribute.so(dht_inode_ctx_time_update+0x210) [0x7f03b76b8b20] ) 0-thevolume-dht: invalid argument: inode [Invalid argument]
>>>>>> [2016-05-04 14:43:17.794664] W [fuse-bridge.c:467:fuse_entry_cbk] 0-glusterfs-fuse: 630390: LOOKUP() /magento/catalog/product/e/m/emmett_3seater_sierra_blue_lb6_1.jpg => -1 (Transport endpoint is not connected)
>>>>>> [2016-05-04 14:43:17.796674] E [dht-helper.c:1597:dht_inode_ctx_time_update] (-->/usr/lib64/glusterfs/3.7.11/xlator/cluster/replicate.so(afr_discover+0x14a) [0x7f03b795db3a] -->/usr/lib64/glusterfs/3.7.11/xlator/cluster/distribute.so(dht_lookup_dir_cbk+0x359) [0x7f03b76dc0f9] -->/usr/lib64/glusterfs/3.7.11/xlator/cluster/distribute.so(dht_inode_ctx_time_update+0x210) [0x7f03b76b8b20] ) 0-thevolume-dht: invalid argument: inode [Invalid argument]
>>>>>> [2016-05-04 14:43:17.796770] E [dht-helper.c:1597:dht_inode_ctx_time_update] (-->/usr/lib64/glusterfs/3.7.11/xlator/cluster/replicate.so(afr_discover+0x14a) [0x7f03b795db3a] -->/usr/lib64/glusterfs/3.7.11/xlator/cluster/distribute.so(dht_lookup_dir_cbk+0x359) [0x7f03b76dc0f9] -->/usr/lib64/glusterfs/3.7.11/xlator/cluster/distribute.so(dht_inode_ctx_time_update+0x210) [0x7f03b76b8b20] ) 0-thevolume-dht: invalid argument: inode [Invalid argument]
>>>>>> [2016-05-04 14:43:17.796923] E [dht-helper.c:1597:dht_inode_ctx_time_update] (-->/usr/lib64/glusterfs/3.7.11/xlator/cluster/replicate.so(afr_discover+0x14a) [0x7f03b795db3a] -->/usr/lib64/glusterfs/3.7.11/xlator/cluster/distribute.so(dht_lookup_dir_cbk+0x359) [0x7f03b76dc0f9] -->/usr/lib64/glusterfs/3.7.11/xlator/cluster/distribute.so(dht_inode_ctx_time_update+0x210) [0x7f03b76b8b20] ) 0-thevolume-dht: invalid argument: inode [Invalid argument]
>>>>>> [2016-05-04 14:43:17.799405] E [dht-helper.c:1597:dht_inode_ctx_time_update] (-->/usr/lib64/glusterfs/3.7.11/xlator/cluster/replicate.so(afr_discover+0x14a) [0x7f03b795db3a]
-->/usr/lib64/glusterfs/3.7.11/xlator/cluster/distribute.so(dht_lookup_dir_cbk+0x359) [0x7f03b76dc0f9] -->/usr/lib64/glusterfs/3.7.11/xlator/cluster/distribute.so(dht_inode_ctx_time_update+0x210) [0x7f03b76b8b20] ) 0-thevolume-dht: invalid argument: inode [Invalid argument] >>>>>> [2016-05-04 14:43:17.799580] E [dht-helper.c:1597:dht_inode_ctx_time_update] (-->/usr/lib64/glusterfs/3.7.11/xlator/cluster/replicate.so(afr_lookup+0x199) [0x7f03b795e1c9] -->/usr/lib64/glusterfs/3.7.11/xlator/cluster/distribute.so(dht_lookup_dir_cbk+0x359) [0x7f03b76dc0f9] -->/usr/lib64/glusterfs/3.7.11/xlator/cluster/distribute.so(dht_inode_ctx_time_update+0x210) [0x7f03b76b8b20] ) 0-thevolume-dht: invalid argument: inode [Invalid argument] >>>>>> [2016-05-04 14:43:17.799604] W [fuse-bridge.c:467:fuse_entry_cbk] 0-glusterfs-fuse: 630396: LOOKUP() /magento/catalog/product/e/m/emmett_3seater_sierra_blue_lb6_2.jpg => -1 (Transport endpoint is not connected) >>>>>> [2016-05-04 14:43:17.801365] E [dht-helper.c:1597:dht_inode_ctx_time_update] (-->/usr/lib64/glusterfs/3.7.11/xlator/cluster/replicate.so(afr_discover+0x14a) [0x7f03b795db3a] -->/usr/lib64/glusterfs/3.7.11/xlator/cluster/distribute.so(dht_lookup_dir_cbk+0x359) [0x7f03b76dc0f9] -->/usr/lib64/glusterfs/3.7.11/xlator/cluster/distribute.so(dht_inode_ctx_time_update+0x210) [0x7f03b76b8b20] ) 0-thevolume-dht: invalid argument: inode [Invalid argument] >>>>>> [2016-05-04 14:43:17.802111] E [dht-helper.c:1597:dht_inode_ctx_time_update] (-->/usr/lib64/glusterfs/3.7.11/xlator/cluster/replicate.so(afr_discover+0x14a) [0x7f03b795db3a] -->/usr/lib64/glusterfs/3.7.11/xlator/cluster/distribute.so(dht_lookup_dir_cbk+0x359) [0x7f03b76dc0f9] -->/usr/lib64/glusterfs/3.7.11/xlator/cluster/distribute.so(dht_inode_ctx_time_update+0x210) [0x7f03b76b8b20] ) 0-thevolume-dht: invalid argument: inode [Invalid argument] >>>>>> [2016-05-04 14:43:17.818490] E [dht-helper.c:1597:dht_inode_ctx_time_update] 
(-->/usr/lib64/glusterfs/3.7.11/xlator/cluster/replicate.so(afr_discover+0x14a) [0x7f03b795db3a] -->/usr/lib64/glusterfs/3.7.11/xlator/cluster/distribute.so(dht_lookup_dir_cbk+0x359) [0x7f03b76dc0f9] -->/usr/lib64/glusterfs/3.7.11/xlator/cluster/distribute.so(dht_inode_ctx_time_update+0x210) [0x7f03b76b8b20] ) 0-thevolume-dht: invalid argument: inode [Invalid argument] >>>>>> [2016-05-04 14:43:19.770865] E [MSGID: 114031] [client-rpc-fops.c:1676:client3_3_finodelk_cbk] 0-thevolume-client-2: remote operation failed [Transport endpoint is not connected] >>>>>> [2016-05-04 14:43:19.770930] E [MSGID: 114031] [client-rpc-fops.c:1676:client3_3_finodelk_cbk] 0-thevolume-client-2: remote operation failed [Transport endpoint is not connected] >>>>>> [2016-05-04 14:43:19.771811] E [MSGID: 114031] [client-rpc-fops.c:1676:client3_3_finodelk_cbk] 0-thevolume-client-1: remote operation failed [Invalid argument] >>>>>> [2016-05-04 14:43:19.771852] E [MSGID: 108010] [afr-lk-common.c:665:afr_unlock_inodelk_cbk] 0-thevolume-replicate-0: path=(null) gfid=00000000-0000-0000-0000-000000000000: unlock failed on subvolume thevolume-client-1 with lock owner d03a00a8037f0000 [Invalid argument] >>>>>> [2016-05-04 14:43:19.771878] W [fuse-bridge.c:1287:fuse_err_cbk] 0-glusterfs-fuse: 630361: FLUSH() ERR => -1 (Transport endpoint is not connected) >>>>>> [2016-05-04 14:43:19.771937] E [MSGID: 114031] [client-rpc-fops.c:1676:client3_3_finodelk_cbk] 0-thevolume-client-1: remote operation failed [Invalid argument] >>>>>> [2016-05-04 14:43:19.771959] E [MSGID: 108010] [afr-lk-common.c:665:afr_unlock_inodelk_cbk] 0-thevolume-replicate-0: path=(null) gfid=00000000-0000-0000-0000-000000000000: unlock failed on subvolume thevolume-client-1 with lock owner b43800a8037f0000 [Invalid argument] >>>>>> [2016-05-04 14:43:19.771979] W [fuse-bridge.c:1287:fuse_err_cbk] 0-glusterfs-fuse: 630375: FLUSH() ERR => -1 (Transport endpoint is not connected) >>>>>> The message "W [MSGID: 114031] 
[client-rpc-fops.c:1917:client3_3_fxattrop_cbk] 0-thevolume-client-2: remote operation failed" repeated 19 times between [2016-05-04 14:43:15.770856] and [2016-05-04 14:43:15.773223]
>>>>>>
>>>>>> The mounted volume becomes read only on the client only. The
>>>>>> "gluster" volume is still writeable on the gluster servers.
>>>>>> gluster volume status does not report anything funky.
>>>>>> If I kill the glusterfs process on the client (the box that has
>>>>>> gluster volume mounted via fuse), and then MOUNT again, I can carry on
>>>>>> copying and writing, until I get those errors again.
>>>>>>
>>>>>> I have successfully copied the same amount of data directly to the
>>>>>> XFS volume on the glusterfs server nodes, I believe my XFS works.
>>>>>>
>>>>>> All the machines are on AWS, and none of the resources are exhausted
>>>>>> IO/RAM/CPU/NETWORK, not on client, not on gluster cluster.
>>>>>>
>>>>>> Please help!
>>>>>>
>>>>> Posting the `gluster vol info` output for the volume will help
>>>>> everyone get a better picture of your volume (sanitize it to remove any
>>>>> sensitive information first).
>>>>>
>>>>> Have you enabled any options on the volume, particularly any quorum
>>>>> options? Client-quorum makes a client read-only when it cannot connect to a
>>>>> quorum of servers (normally 50% of the replica count + 1).
>>>>> Your mount log shows that you've had connection issues (a lot of
>>>>> 'Transport endpoint is not connected' messages).
>>>>> Can you verify that the network between the client and server isn't
>>>>> having problems when you see these errors?
>>>>>
>>>>>> _______________________________________________
>>>>>> Gluster-users mailing list
>>>>>> Gluster-users at gluster.org
>>>>>> http://www.gluster.org/mailman/listinfo/gluster-users
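One way to act on the suggestion above (check whether the disconnects line up with network trouble) is to tally the 'Transport endpoint is not connected' messages per client subvolume in the mount log. The following is only a rough sketch in Python, not a Gluster tool: the function name `count_disconnects` and the regular expression are made up for illustration, and the sample lines are copied from the log quoted in this thread.

```python
import re
from collections import Counter

# Assumption: client translator names look like "0-thevolume-client-1:",
# as they do in the log excerpt quoted in this thread.
CLIENT_RE = re.compile(r"\b\d+-(\S+-client-\d+):")
DISCONNECTED = "Transport endpoint is not connected"


def count_disconnects(lines):
    """Count, per client subvolume, the log lines reporting a disconnected transport."""
    counts = Counter()
    for line in lines:
        if DISCONNECTED not in line:
            continue
        match = CLIENT_RE.search(line)
        if match:  # fuse-bridge lines carry no client name and are skipped
            counts[match.group(1)] += 1
    return counts


# Sample entries taken verbatim from the mount log posted above.
sample = [
    "[2016-05-04 14:43:17.780898] W [MSGID: 114031] "
    "[client-rpc-fops.c:2974:client3_3_lookup_cbk] 0-thevolume-client-1: "
    "remote operation failed. Path: / (00000000-0000-0000-0000-000000000001) "
    "[Transport endpoint is not connected]",
    "[2016-05-04 14:43:19.770865] E [MSGID: 114031] "
    "[client-rpc-fops.c:1676:client3_3_finodelk_cbk] 0-thevolume-client-2: "
    "remote operation failed [Transport endpoint is not connected]",
    "[2016-05-04 14:43:19.771811] E [MSGID: 114031] "
    "[client-rpc-fops.c:1676:client3_3_finodelk_cbk] 0-thevolume-client-1: "
    "remote operation failed [Invalid argument]",
    "[2016-05-04 14:43:19.771878] W [fuse-bridge.c:1287:fuse_err_cbk] "
    "0-glusterfs-fuse: 630361: FLUSH() ERR => -1 "
    "(Transport endpoint is not connected)",
]

for client, hits in sorted(count_disconnects(sample).items()):
    print(client, hits)
```

To run it against a real mount log, pass an open file object instead of `sample`; the log path depends on the mount point, so it is left out here. If several client subvolumes show disconnects in the same second, that would be consistent with the "All subvolumes are down" message in the log, though it does not by itself say whether the network or the bricks are at fault.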