When the transport is changed from x to y, the change should be reflected
in all the vol files. Unfortunately, the volume set command failed to
update the nfs server volfile (which is, of course, a bug). As I clearly
mentioned in my previous mails, changing the volume files using the volume
set command is not recommended; I suggested it only to check whether tcp
works fine or not.
The reason you are getting the rdma connection error is that the bricks
are now running over tcp, so the brick processes are listening on a
socket (tcp) port. The nfs server, however, still asks for an rdma
connection, so it is trying to connect from an rdma port to a tcp port.
Obviously, the connection will be rejected.
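If you want to check for this kind of mismatch yourself, here is a rough sketch (not a gluster tool; the function names are made up for illustration) that reads a volfile and lists the protocol/client volumes whose transport-type differs from what you expect. The sample text is a trimmed copy of the nfs-server.vol quoted below, and I am assuming the simple "volume / type / option / end-volume" layout shown there.

```python
# Trimmed copy of the nfs-server.vol quoted in this thread (credentials omitted).
SAMPLE_VOLFILE = """\
volume gluster_disk-client-0
    type protocol/client
    option transport-type rdma
    option remote-subvolume /bricks/brick1
    option remote-host duke-ib
end-volume

volume gluster_disk-client-1
    type protocol/client
    option transport-type rdma
    option remote-subvolume /bricks/brick1
    option remote-host duchess-ib
end-volume

volume gluster_disk-replicate-0
    type cluster/replicate
    subvolumes gluster_disk-client-0 gluster_disk-client-1
end-volume
"""


def client_transports(volfile_text):
    """Map each protocol/client volume name to its transport-type (or None)."""
    result = {}
    name = vtype = transport = None
    for raw in volfile_text.splitlines():
        tokens = raw.split()
        if not tokens:
            continue
        if tokens[0] == "volume" and len(tokens) == 2:
            # Start of a new volume block: reset per-block state.
            name, vtype, transport = tokens[1], None, None
        elif tokens[0] == "type" and len(tokens) == 2:
            vtype = tokens[1]
        elif tokens[:2] == ["option", "transport-type"] and len(tokens) == 3:
            transport = tokens[2]
        elif tokens[0] == "end-volume":
            if name is not None and vtype == "protocol/client":
                result[name] = transport
            name = None
    return result


def mismatched_clients(volfile_text, expected):
    """Names of client volumes whose transport-type is not `expected`."""
    found = client_transports(volfile_text)
    return sorted(n for n, t in found.items() if t != expected)


if __name__ == "__main__":
    # The volume is configured for tcp, so any client listed here is stale.
    print(mismatched_clients(SAMPLE_VOLFILE, "tcp"))
```

A client volume with no transport-type option at all is reported too, since the sketch cannot tell which default would apply.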
Regards
Rafi KC
On 03/27/2015 12:28 AM, Jonathan Heese wrote:
>
> Rafi,
>
>
> Here is my nfs-server.vol file:
>
>
> [root@duke ~]# cat /var/lib/glusterd/nfs/nfs-server.vol
> volume gluster_disk-client-0
> type protocol/client
> option send-gids true
> option password 562ab460-7754-4b5a-82e6-18ed6c130786
> option username ad5d5754-cf02-4b96-9f85-ff3129ae0405
> option transport-type rdma
> option remote-subvolume /bricks/brick1
> option remote-host duke-ib
> end-volume
>
> volume gluster_disk-client-1
> type protocol/client
> option send-gids true
> option password 562ab460-7754-4b5a-82e6-18ed6c130786
> option username ad5d5754-cf02-4b96-9f85-ff3129ae0405
> option transport-type rdma
> option remote-subvolume /bricks/brick1
> option remote-host duchess-ib
> end-volume
>
> volume gluster_disk-replicate-0
> type cluster/replicate
> subvolumes gluster_disk-client-0 gluster_disk-client-1
> end-volume
>
> volume gluster_disk-dht
> type cluster/distribute
> subvolumes gluster_disk-replicate-0
> end-volume
>
> volume gluster_disk-write-behind
> type performance/write-behind
> subvolumes gluster_disk-dht
> end-volume
>
> volume gluster_disk
> type debug/io-stats
> option count-fop-hits off
> option latency-measurement off
> subvolumes gluster_disk-write-behind
> end-volume
>
> volume nfs-server
> type nfs/server
> option nfs3.gluster_disk.volume-id 2307a5a8-641e-44f4-8eaf-7cc2b704aafd
> option rpc-auth.addr.gluster_disk.allow *
> option nfs.drc off
> option nfs.nlm on
> option nfs.dynamic-volumes on
> subvolumes gluster_disk
> end-volume
>
> I see that "transport-type rdma" is listed a couple times here, but
> "gluster volume info" indicates that the volume is using the tcp
> transport:
>
>
> [root@duke ~]# gluster volume info gluster_disk
>
> Volume Name: gluster_disk
> Type: Replicate
> Volume ID: 2307a5a8-641e-44f4-8eaf-7cc2b704aafd
> Status: Started
> Number of Bricks: 1 x 2 = 2
> Transport-type: tcp
> Bricks:
> Brick1: duke-ib:/bricks/brick1
> Brick2: duchess-ib:/bricks/brick1
> Options Reconfigured:
> config.transport: tcp
>
> Please let me know if you need any further information from me to
> determine how to correct this discrepancy.
>
>
> Also, I feel compelled to ask: Since the TCP connections are going
> over the InfiniBand connections between the two Gluster servers (based
> on the hostnames which are pointed to the IB IPs via hosts files), are
> there any (significant) drawbacks to using TCP instead of RDMA here?
> Thanks.
>
>
> Regards,
>
> Jon Heese
>
>
> ------------------------------------------------------------------------
> *From:* Mohammed Rafi K C <rkavunga at redhat.com>
> *Sent:* Monday, March 23, 2015 3:29 AM
> *To:* Jonathan Heese
> *Cc:* gluster-users
> *Subject:* Re: [Gluster-users] I/O error on replicated volume
>
>
> On 03/23/2015 11:28 AM, Jonathan Heese wrote:
>> On Mar 23, 2015, at 1:20 AM, "Mohammed Rafi K C" <rkavunga at redhat.com
>> <mailto:rkavunga at redhat.com>> wrote:
>>
>>>
>>> On 03/21/2015 07:49 PM, Jonathan Heese wrote:
>>>>
>>>> Mohamed,
>>>>
>>>>
>>>> I have completed the steps you suggested (unmount all, stop the
>>>> volume, set the config.transport to tcp, start the volume, mount,
>>>> etc.), and the behavior has indeed changed.
>>>>
>>>>
>>>> [root@duke ~]# gluster volume info
>>>>
>>>> Volume Name: gluster_disk
>>>> Type: Replicate
>>>> Volume ID: 2307a5a8-641e-44f4-8eaf-7cc2b704aafd
>>>> Status: Started
>>>> Number of Bricks: 1 x 2 = 2
>>>> Transport-type: tcp
>>>> Bricks:
>>>> Brick1: duke-ib:/bricks/brick1
>>>> Brick2: duchess-ib:/bricks/brick1
>>>> Options Reconfigured:
>>>> config.transport: tcp
>>>>
>>>>
>>>> [root@duke ~]# gluster volume status
>>>> Status of volume: gluster_disk
>>>> Gluster process                   Port    Online  Pid
>>>> ------------------------------------------------------------
>>>> Brick duke-ib:/bricks/brick1      49152   Y       16362
>>>> Brick duchess-ib:/bricks/brick1   49152   Y       14155
>>>> NFS Server on localhost           2049    Y       16374
>>>> Self-heal Daemon on localhost     N/A     Y       16381
>>>> NFS Server on duchess-ib          2049    Y       14167
>>>> Self-heal Daemon on duchess-ib    N/A     Y       14174
>>>>
>>>> Task Status of Volume gluster_disk
>>>> ------------------------------------------------------------
>>>> There are no active volume tasks
>>>>
>>>> I am no longer seeing the I/O errors during prolonged periods of
>>>> write I/O that I was seeing when the transport was set to rdma.
>>>> However, I am seeing this message on both nodes every 3 seconds
>>>> (almost exactly):
>>>>
>>>>
>>>> ==> /var/log/glusterfs/nfs.log <==
>>>> [2015-03-21 14:17:40.379719] W
>>>> [rdma.c:1076:gf_rdma_cm_event_handler] 0-gluster_disk-client-1: cma
>>>> event RDMA_CM_EVENT_REJECTED, error 8 (me:10.10.10.1:1023
>>>> peer:10.10.10.2:49152)
>>>>
>>>>
>>>> Is this something to worry about?
>>>>
>>> If you are not using nfs to export the volumes, there is nothing to
>>> worry about.
>>
>> I'm using the native glusterfs FUSE component to mount the volume
>> locally on both servers -- I assume that you're referring to the
>> standard NFS protocol stuff, which I'm not using here.
>>
>> Incidentally, I would like to keep my logs from filling up with junk
>> if possible. Is there something I can do to get rid of these
>> (useless?) error messages?
>
> If I understand correctly, you are getting this flood of log messages
> only in the nfs log, and all the other logs are fine now, right?
> If that is the case, and you are not using nfs at all to export
> the volume, as a workaround you can disable nfs for your volume or
> cluster (gluster v set <volname> nfs.disable on). This will turn off your
> gluster nfs server, and you will no longer get those log messages.
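Before disabling anything, it may be worth confirming that the repeating nfs.log entries are all the same rejected rdma connect. The following is only a sketch under my own assumptions (the regular expression is written against the single log entry quoted in this thread, and the function name is hypothetical); it pulls out the client name and the peer port from each RDMA_CM_EVENT_REJECTED line so they can be compared with the brick ports from "gluster volume status".

```python
import re

# The nfs.log entry quoted in this thread, re-joined onto one line.
SAMPLE_LOG = (
    "[2015-03-21 14:17:40.379719] W [rdma.c:1076:gf_rdma_cm_event_handler] "
    "0-gluster_disk-client-1: cma event RDMA_CM_EVENT_REJECTED, error 8 "
    "(me:10.10.10.1:1023 peer:10.10.10.2:49152)"
)

# Pattern derived from that one sample line; other gluster versions may differ.
REJECTED = re.compile(
    r"(?P<client>\S+): cma event RDMA_CM_EVENT_REJECTED, error (?P<error>\d+) "
    r"\(me:\S+ peer:(?P<peer_host>[\d.]+):(?P<peer_port>\d+)\)"
)


def rejected_connections(log_text):
    """Return (client, peer_host, peer_port) for every rejected rdma connect."""
    # Collapse whitespace first so entries wrapped across lines still match.
    flat = " ".join(log_text.split())
    return [
        (m["client"], m["peer_host"], int(m["peer_port"]))
        for m in REJECTED.finditer(flat)
    ]


if __name__ == "__main__":
    tcp_brick_ports = {49152}  # brick ports taken from "gluster volume status"
    for client, host, port in rejected_connections(SAMPLE_LOG):
        where = "a tcp brick port" if port in tcp_brick_ports else "another port"
        print(f"{client} tried rdma against {host}:{port} ({where})")
```

If every rejected peer port is a port the bricks are serving over tcp, the messages are consistent with a stale rdma client definition rather than a new problem.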
>
>
>>>> Any idea why there are rdma pieces in play when I've set my
>>>> transport to tcp?
>>>>
>>>
>>> There should not be any rdma pieces in play. If possible, can you
>>> paste the volfile for the nfs server? You can find the volfile in
>>> /var/lib/glusterd/nfs/nfs-server.vol or
>>> /usr/local/var/lib/glusterd/nfs/nfs-server.vol
>>
>> I will get this for you when I can. Thanks.
>
> If you can, that will be a great help in understanding the problem.
>
>
> Rafi KC
>
>>
>> Regards,
>> Jon Heese
>>
>>> Rafi KC
>>>>
>>>> The actual I/O appears to be handled properly and I've seen no
>>>> further errors in the testing I've done so far.
>>>>
>>>>
>>>> Thanks.
>>>>
>>>>
>>>> Regards,
>>>>
>>>> Jon Heese
>>>>
>>>>
>>>> ------------------------------------------------------------------------
>>>> *From:* gluster-users-bounces at gluster.org
>>>> <gluster-users-bounces at gluster.org> on behalf of Jonathan Heese
>>>> <jheese at inetu.net>
>>>> *Sent:* Friday, March 20, 2015 7:04 AM
>>>> *To:* Mohammed Rafi K C
>>>> *Cc:* gluster-users
>>>> *Subject:* Re: [Gluster-users] I/O error on replicated volume
>>>>
>>>> Mohammed,
>>>>
>>>> Thanks very much for the reply. I will try that and report back.
>>>>
>>>> Regards,
>>>> Jon Heese
>>>>
>>>> On Mar 20, 2015, at 3:26 AM, "Mohammed Rafi K C"
>>>> <rkavunga at redhat.com <mailto:rkavunga at redhat.com>> wrote:
>>>>
>>>>>
>>>>> On 03/19/2015 10:16 PM, Jonathan Heese wrote:
>>>>>>
>>>>>> Hello all,
>>>>>>
>>>>>>
>>>>>>
>>>>>> Does anyone else have any further suggestions for troubleshooting
>>>>>> this?
>>>>>>
>>>>>> To sum up: I have a 2 node 2 brick replicated volume, which holds
>>>>>> a handful of iSCSI image files which are mounted and served up by
>>>>>> tgtd (CentOS 6) to a handful of devices on a dedicated iSCSI
>>>>>> network. The most important iSCSI clients (initiators) are four
>>>>>> VMware ESXi 5.5 hosts that use the iSCSI volumes as backing for
>>>>>> their datastores for virtual machine storage.
>>>>>>
>>>>>> After a few minutes of sustained writing to the volume, I am
>>>>>> seeing a massive flood (over 1500 per second at times) of this
>>>>>> error in /var/log/glusterfs/mnt-gluster-disk.log:
>>>>>>
>>>>>> [2015-03-16 02:24:07.582801] W
>>>>>> [fuse-bridge.c:2242:fuse_writev_cbk] 0-glusterfs-fuse: 635358:
>>>>>> WRITE => -1 (Input/output error)
>>>>>>
>>>>>> When this happens, the ESXi box fails its write operation and
>>>>>> returns an error to the effect of "Unable to write data to
>>>>>> datastore". I don't see anything else in the supporting logs to
>>>>>> explain the root cause of the i/o errors.
>>>>>>
>>>>>>
>>>>>>
>>>>>> Any and all suggestions are appreciated. Thanks.
>>>>>>
>>>>>>
>>>>>>
>>>>>
>>>>> From the mount logs, I assume that your volume transport type is
>>>>> rdma. There are some known issues with rdma in 3.5.3, and the
>>>>> patches to address those issues have already been sent upstream [1].
>>>>> From the logs, it is hard to tell whether this problem is related
>>>>> to the rdma transport or not. To make sure that the tcp transport
>>>>> works well in this scenario, can you, if possible, try to reproduce
>>>>> the same problem using a tcp type volume?
>>>>> You can change the transport type of a volume with the following
>>>>> steps (not recommended in normal use cases).
>>>>>
>>>>> 1) unmount every client
>>>>> 2) stop the volume
>>>>> 3) run gluster volume set volname config.transport tcp
>>>>> 4) start the volume again
>>>>> 5) mount the clients
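As a rough illustration of the five steps above, the sketch below only builds the command lines in order and prints them; it does not run anything. The function name and the mount point /mnt/gluster_disk are assumptions for illustration. The umount and mount commands belong on each client, and the gluster commands on one of the servers (note that "gluster volume stop" normally asks for confirmation).

```python
def transport_change_commands(volname, transport, mounts):
    """Build the ordered commands for the five steps.

    mounts is a list of (server, mountpoint) pairs, one per FUSE client
    mount; both values are supplied by the caller (hypothetical here).
    """
    # 1) unmount every client
    cmds = [["umount", mountpoint] for _server, mountpoint in mounts]
    # 2) stop the volume
    cmds.append(["gluster", "volume", "stop", volname])
    # 3) set the transport
    cmds.append(
        ["gluster", "volume", "set", volname, "config.transport", transport]
    )
    # 4) start the volume again
    cmds.append(["gluster", "volume", "start", volname])
    # 5) mount the clients
    cmds += [
        ["mount", "-t", "glusterfs", f"{server}:/{volname}", mountpoint]
        for server, mountpoint in mounts
    ]
    return cmds


if __name__ == "__main__":
    # Assumed mount point; adjust to wherever the volume is really mounted.
    plan = transport_change_commands(
        "gluster_disk", "tcp", [("duke-ib", "/mnt/gluster_disk")]
    )
    for cmd in plan:
        print(" ".join(cmd))
```

Printing the plan first makes it easy to review the order before typing the commands by hand.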
>>>>>
>>>>> [1] : http://goo.gl/2PTL61
>>>>>
>>>>> Regards
>>>>> Rafi KC
>>>>>
>>>>>> /Jon Heese/
>>>>>> /Systems Engineer/
>>>>>> *INetU Managed Hosting*
>>>>>> P: 610.266.7441 x 261
>>>>>> F: 610.266.7434
>>>>>> www.inetu.net <https://www.inetu.net/>
>>>>>>
>>>>>> /** This message contains confidential information, which also
>>>>>> may be privileged, and is intended only for the person(s)
>>>>>> addressed above. Any unauthorized use, distribution, copying or
>>>>>> disclosure of confidential and/or privileged information is
>>>>>> strictly prohibited. If you have received this communication in
>>>>>> error, please erase all copies of the message and its attachments
>>>>>> and notify the sender immediately via reply e-mail. **/
>>>>>>
>>>>>>
>>>>>>
>>>>>> *From:*Jonathan Heese
>>>>>> *Sent:* Tuesday, March 17, 2015 12:36 PM
>>>>>> *To:* 'Ravishankar N'; gluster-users at gluster.org
>>>>>> *Subject:* RE: [Gluster-users] I/O error on replicated volume
>>>>>>
>>>>>>
>>>>>>
>>>>>> Ravi,
>>>>>>
>>>>>>
>>>>>>
>>>>>> The last lines in the mount log before the massive vomit of I/O
>>>>>> errors are from 22 minutes prior, and seem innocuous to me:
>>>>>>
>>>>>>
>>>>>>
>>>>>> [2015-03-16 01:37:07.126340] E
>>>>>> [client-handshake.c:1760:client_query_portmap_cbk]
>>>>>> 0-gluster_disk-client-0: failed to get the port number for remote
>>>>>> subvolume. Please run 'gluster volume status' on server to see if
>>>>>> brick process is running.
>>>>>>
>>>>>> [2015-03-16 01:37:07.126587] W [rdma.c:4273:gf_rdma_disconnect]
>>>>>> (-->/usr/lib64/libgfrpc.so.0(rpc_clnt_notify+0x13f) [0x7fd9c557bccf]
>>>>>> (-->/usr/lib64/libgfrpc.so.0(rpc_clnt_handle_reply+0xa5) [0x7fd9c557a995]
>>>>>> (-->/usr/lib64/glusterfs/3.5.3/xlator/protocol/client.so(client_query_portmap_cbk+0x1ea)
>>>>>> [0x7fd9c0d8fb9a]))) 0-gluster_disk-client-0: disconnect called
>>>>>> (peer:10.10.10.1:24008)
>>>>>>
>>>>>> [2015-03-16 01:37:07.126687] E
>>>>>> [client-handshake.c:1760:client_query_portmap_cbk]
>>>>>> 0-gluster_disk-client-1: failed to get the port number for remote
>>>>>> subvolume. Please run 'gluster volume status' on server to see if
>>>>>> brick process is running.
>>>>>>
>>>>>> [2015-03-16 01:37:07.126737] W [rdma.c:4273:gf_rdma_disconnect]
>>>>>> (-->/usr/lib64/libgfrpc.so.0(rpc_clnt_notify+0x13f) [0x7fd9c557bccf]
>>>>>> (-->/usr/lib64/libgfrpc.so.0(rpc_clnt_handle_reply+0xa5) [0x7fd9c557a995]
>>>>>> (-->/usr/lib64/glusterfs/3.5.3/xlator/protocol/client.so(client_query_portmap_cbk+0x1ea)
>>>>>> [0x7fd9c0d8fb9a]))) 0-gluster_disk-client-1: disconnect called
>>>>>> (peer:10.10.10.2:24008)
>>>>>>
>>>>>> [2015-03-16 01:37:10.730165] I
>>>>>> [rpc-clnt.c:1729:rpc_clnt_reconfig] 0-gluster_disk-client-0:
>>>>>> changing port to 49152 (from 0)
>>>>>>
>>>>>> [2015-03-16 01:37:10.730276] W [rdma.c:4273:gf_rdma_disconnect]
>>>>>> (-->/usr/lib64/libgfrpc.so.0(rpc_clnt_notify+0x13f) [0x7fd9c557bccf]
>>>>>> (-->/usr/lib64/libgfrpc.so.0(rpc_clnt_handle_reply+0xa5) [0x7fd9c557a995]
>>>>>> (-->/usr/lib64/glusterfs/3.5.3/xlator/protocol/client.so(client_query_portmap_cbk+0x1ea)
>>>>>> [0x7fd9c0d8fb9a]))) 0-gluster_disk-client-0: disconnect called
>>>>>> (peer:10.10.10.1:24008)
>>>>>>
>>>>>> [2015-03-16 01:37:10.739500] I
>>>>>> [rpc-clnt.c:1729:rpc_clnt_reconfig] 0-gluster_disk-client-1:
>>>>>> changing port to 49152 (from 0)
>>>>>>
>>>>>> [2015-03-16 01:37:10.739560] W [rdma.c:4273:gf_rdma_disconnect]
>>>>>> (-->/usr/lib64/libgfrpc.so.0(rpc_clnt_notify+0x13f) [0x7fd9c557bccf]
>>>>>> (-->/usr/lib64/libgfrpc.so.0(rpc_clnt_handle_reply+0xa5) [0x7fd9c557a995]
>>>>>> (-->/usr/lib64/glusterfs/3.5.3/xlator/protocol/client.so(client_query_portmap_cbk+0x1ea)
>>>>>> [0x7fd9c0d8fb9a]))) 0-gluster_disk-client-1: disconnect called
>>>>>> (peer:10.10.10.2:24008)
>>>>>>
>>>>>> [2015-03-16 01:37:10.741883] I
>>>>>> [client-handshake.c:1677:select_server_supported_programs]
>>>>>> 0-gluster_disk-client-0: Using Program GlusterFS 3.3, Num
>>>>>> (1298437), Version (330)
>>>>>>
>>>>>> [2015-03-16 01:37:10.744524] I
>>>>>> [client-handshake.c:1462:client_setvolume_cbk]
>>>>>> 0-gluster_disk-client-0: Connected to 10.10.10.1:49152, attached
>>>>>> to remote volume '/bricks/brick1'.
>>>>>>
>>>>>> [2015-03-16 01:37:10.744537] I
>>>>>> [client-handshake.c:1474:client_setvolume_cbk]
>>>>>> 0-gluster_disk-client-0: Server and Client lk-version numbers are
>>>>>> not same, reopening the fds
>>>>>>
>>>>>> [2015-03-16 01:37:10.744566] I [afr-common.c:4267:afr_notify]
>>>>>> 0-gluster_disk-replicate-0: Subvolume 'gluster_disk-client-0'
>>>>>> came back up; going online.
>>>>>>
>>>>>> [2015-03-16 01:37:10.744627] I
>>>>>> [client-handshake.c:450:client_set_lk_version_cbk]
>>>>>> 0-gluster_disk-client-0: Server lk version = 1
>>>>>>
>>>>>> [2015-03-16 01:37:10.753037] I
>>>>>> [client-handshake.c:1677:select_server_supported_programs]
>>>>>> 0-gluster_disk-client-1: Using Program GlusterFS 3.3, Num
>>>>>> (1298437), Version (330)
>>>>>>
>>>>>> [2015-03-16 01:37:10.755657] I
>>>>>> [client-handshake.c:1462:client_setvolume_cbk]
>>>>>> 0-gluster_disk-client-1: Connected to 10.10.10.2:49152, attached
>>>>>> to remote volume '/bricks/brick1'.
>>>>>>
>>>>>> [2015-03-16 01:37:10.755676] I
>>>>>> [client-handshake.c:1474:client_setvolume_cbk]
>>>>>> 0-gluster_disk-client-1: Server and Client lk-version numbers are
>>>>>> not same, reopening the fds
>>>>>>
>>>>>> [2015-03-16 01:37:10.761945] I
>>>>>> [fuse-bridge.c:5016:fuse_graph_setup] 0-fuse: switched to graph 0
>>>>>>
>>>>>> [2015-03-16 01:37:10.762144] I
>>>>>> [client-handshake.c:450:client_set_lk_version_cbk]
>>>>>> 0-gluster_disk-client-1: Server lk version = 1
>>>>>>
>>>>>> [*2015-03-16 01:37:10.762279*] I [fuse-bridge.c:3953:fuse_init]
>>>>>> 0-glusterfs-fuse: FUSE inited with protocol versions: glusterfs
>>>>>> 7.22 kernel 7.14
>>>>>>
>>>>>> [*2015-03-16 01:59:26.098670*] W
>>>>>> [fuse-bridge.c:2242:fuse_writev_cbk] 0-glusterfs-fuse: 292084:
>>>>>> WRITE => -1 (Input/output error)
>>>>>>
>>>>>> ...
>>>>>>
>>>>>>
>>>>>>
>>>>>> I've seen no indication of split-brain on any files at any point
>>>>>> in this (ever since downgrading from 3.6.2 to 3.5.3, which is when
>>>>>> this particular issue started):
>>>>>>
>>>>>> [root@duke gfapi-module-for-linux-target-driver-]# gluster v heal
>>>>>> gluster_disk info
>>>>>>
>>>>>> Brick duke.jonheese.local:/bricks/brick1/
>>>>>>
>>>>>> Number of entries: 0
>>>>>>
>>>>>>
>>>>>>
>>>>>> Brick duchess.jonheese.local:/bricks/brick1/
>>>>>>
>>>>>> Number of entries: 0
>>>>>>
>>>>>>
>>>>>>
>>>>>> Thanks.
>>>>>>
>>>>>>
>>>>>>
>>>>>> /Jon Heese/
>>>>>> /Systems Engineer/
>>>>>> *INetU Managed Hosting*
>>>>>> P: 610.266.7441 x 261
>>>>>> F: 610.266.7434
>>>>>> www.inetu.net <https://www.inetu.net/>
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>> *From:*Ravishankar N [mailto:ravishankar at redhat.com]
>>>>>> *Sent:* Tuesday, March 17, 2015 12:35 AM
>>>>>> *To:* Jonathan Heese; gluster-users at gluster.org
>>>>>> <mailto:gluster-users at gluster.org>
>>>>>> *Subject:* Re: [Gluster-users] I/O error on replicated volume
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>> On 03/17/2015 02:14 AM, Jonathan Heese wrote:
>>>>>>
>>>>>> Hello,
>>>>>>
>>>>>> So I resolved my previous issue with split-brains and the
>>>>>> lack of self-healing by dropping my installed glusterfs*
>>>>>> packages from 3.6.2 to 3.5.3, but now I've picked up a new
>>>>>> issue, which actually makes normal use of the volume
>>>>>> practically impossible.
>>>>>>
>>>>>> A little background for those not already paying close attention:
>>>>>> I have a 2 node 2 brick replicating volume whose purpose in
>>>>>> life is to hold iSCSI target files, primarily for use to
>>>>>> provide datastores to a VMware ESXi cluster. The plan is to
>>>>>> put a handful of image files on the Gluster volume, mount
>>>>>> them locally on both Gluster nodes, and run tgtd on both,
>>>>>> pointed to the image files on the mounted gluster volume.
>>>>>> Then the ESXi boxes will use multipath (active/passive) iSCSI
>>>>>> to connect to the nodes, with automatic failover in case of
>>>>>> planned or unplanned downtime of the Gluster nodes.
>>>>>>
>>>>>> In my most recent round of testing with 3.5.3, I'm seeing a
>>>>>> massive failure to write data to the volume after about 5-10
>>>>>> minutes, so I've simplified the scenario a bit (to minimize
>>>>>> the variables) to: both Gluster nodes up, only one node
>>>>>> (duke) mounted and running tgtd, and just regular (single
>>>>>> path) iSCSI from a single ESXi server.
>>>>>>
>>>>>> About 5-10 minutes into migrating a VM onto the test
>>>>>> datastore, /var/log/messages on duke gets blasted with a ton
>>>>>> of messages exactly like this:
>>>>>>
>>>>>> Mar 15 22:24:06 duke tgtd: bs_rdwr_request(180) io error
>>>>>> 0x1781e00 2a -1 512 22971904, Input/output error
>>>>>>
>>>>>> And /var/log/glusterfs/mnt-gluster_disk.log gets blasted with
>>>>>> a ton of messages exactly like this:
>>>>>>
>>>>>> [2015-03-16 02:24:07.572279] W
>>>>>> [fuse-bridge.c:2242:fuse_writev_cbk] 0-glusterfs-fuse:
>>>>>> 635299: WRITE => -1 (Input/output error)
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>> Are there any messages in the mount log from AFR about
>>>>>> split-brain just before the above line appears?
>>>>>> Does `gluster v heal <VOLNAME> info` show any files? Performing
>>>>>> I/O on files that are in split-brain fails with EIO.
>>>>>>
>>>>>> -Ravi
>>>>>>
>>>>>> And the write operation from VMware's side fails as soon as
>>>>>> these messages start.
>>>>>>
>>>>>> I don't see any other errors (in the log files I know of)
>>>>>> indicating the root cause of these i/o errors. I'm sure that
>>>>>> this is not enough information to tell what's going on, but
>>>>>> can anyone help me figure out what to look at next to figure
>>>>>> this out?
>>>>>>
>>>>>> I've also considered using Dan Lambright's libgfapi gluster
>>>>>> module for tgtd (or something similar) to avoid going through
>>>>>> FUSE, but I'm not sure whether that would be irrelevant to
>>>>>> this problem, since I'm not 100% sure if it lies in FUSE or
>>>>>> elsewhere.
>>>>>>
>>>>>>
>>>>>>
>>>>>> Thanks!
>>>>>>
>>>>>>
>>>>>>
>>>>>> /Jon Heese/
>>>>>> /Systems Engineer/
>>>>>> *INetU Managed Hosting*
>>>>>> P: 610.266.7441 x 261
>>>>>> F: 610.266.7434
>>>>>> www.inetu.net <https://www.inetu.net/>
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>> _______________________________________________
>>>>>> Gluster-users mailing list
>>>>>> Gluster-users at gluster.org <mailto:Gluster-users at gluster.org>
>>>>>> http://www.gluster.org/mailman/listinfo/gluster-users
>>>>>
>>>
>